Sequence Alignment Tool for Global and Local DNA RNA Protein Alignment
{
  "skill_name": "Sequence Alignment Tool",
  "version": "1.0.0",
  "description": "Perform global or local sequence alignment on DNA, RNA, or protein sequences",
  "input_schema": {
    "type": "object",
    "required": [ "sequences" ],
    "properties": {
      "sequences": {
        "type": "object",
        "description": "Input sequences for alignment",
        "required": [ "seq1", "seq2" ],
        "properties": {
          "seq1": { "type": "string", "description": "First sequence" },
          "seq2": { "type": "string", "description": "Second sequence" }
        }
      },
      "alignment_mode": {
        "type": "string",
        "enum": [ "global", "local" ],
        "default": "global",
        "description": "Alignment algorithm"
      },
      "sequence_type": {
        "type": "string",
        "enum": [ "DNA", "RNA", "protein", "auto" ],
        "default": "auto"
      }
    }
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "success": { "type": "boolean" },
      "alignment": { "type": "object" },
      "statistics": { "type": "object" }
    }
  },
  "execution": {
    "type": "local",
    "command": "python execute.py --seq1 {seq1} --seq2 {seq2} --mode {alignment_mode}",
    "environment": {
      "python_version": ">=3.8",
      "dependencies": []
    }
  },
  "test_case": {
    "input": {
      "sequences": {
        "seq1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL",
        "seq2": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL"
      },
      "alignment_mode": "global"
    },
    "expected_output": {
      "success": true,
      "statistics": { "identity": 100 }
    }
  }
}
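The manifest does not say how an agent runtime fills the `{seq1}`, `{seq2}`, and `{alignment_mode}` placeholders in `execution.command`. The following is a minimal sketch of one plausible approach, assuming plain string substitution with shell quoting. The helper name `build_command` is hypothetical and not part of the skill.

```python
import shlex

# Command template copied from the manifest's "execution.command" field.
COMMAND = "python execute.py --seq1 {seq1} --seq2 {seq2} --mode {alignment_mode}"

def build_command(payload):
    """Fill the command template from an input payload (hypothetical helper).

    Assumes the payload follows input_schema: a required "sequences" object
    with "seq1" and "seq2", and an optional "alignment_mode" defaulting to "global".
    """
    seqs = payload["sequences"]
    mode = payload.get("alignment_mode", "global")
    if mode not in ("global", "local"):
        raise ValueError(f"unsupported alignment_mode: {mode!r}")
    # shlex.quote guards against shell metacharacters in user-supplied values.
    return COMMAND.format(
        seq1=shlex.quote(seqs["seq1"]),
        seq2=shlex.quote(seqs["seq2"]),
        alignment_mode=shlex.quote(mode),
    )

print(build_command({"sequences": {"seq1": "ACGT", "seq2": "AGT"}}))
```

Quoting each value before substitution keeps the command safe if a sequence or file path contains spaces or other characters the shell would interpret.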
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# Sequence Alignment Tool

## Protocol for Agent Execution

### Name

Sequence Alignment Tool

### Description

A tool for performing local or global alignment of two protein or nucleic acid sequences. It supports the Needleman-Wunsch global alignment algorithm and the Smith-Waterman local alignment algorithm.

### Input

Two FASTA-formatted biological sequences (protein or nucleic acid)

### Steps

1. **Read Sequences**
   - Parse FASTA format files
   - Validate each sequence (only valid characters allowed)
   - Protein character set: ACDEFGHIKLMNPQRSTVWY
   - Nucleic acid character set: ACGTUN
2. **Select Alignment Algorithm**
   - Global alignment (Needleman-Wunsch): for overall similarity analysis
   - Local alignment (Smith-Waterman): for finding the best-matching subsequences
3. **Execute Alignment**
   - Use a dynamic programming algorithm
   - Configure match/mismatch scores
   - Configure gap penalties (opening penalty + extension penalty)
4. **Calculate Similarity and Identity**
   - Similarity = (number of matches) / (alignment length) x 100%
   - Identity = (number of identical positions) / (shorter sequence length) x 100%
   - Gap rate = (number of gaps) / (alignment length) x 100%
5. **Output Alignment Results**
   - Aligned sequences (with gaps)
   - Position markers (`*` = exact match, `:` = similar, `.` = mismatch)
   - Alignment score
   - Statistical report

### Output

- Aligned sequences (with gap insertions)
- Similarity report (score, similarity percentage, identity, gap rate)
- Alignment method description

### Tools

- **Python**: Biopython `Bio.pairwise2` module
- **Alternative**: EMBOSS toolkit (`water` for local alignment, `needle` for global alignment)

### Default Parameters

```
match_score: 2
mismatch_score: -1
gap_open: -10
gap_extend: -0.5
```

### Example Usage

```bash
# Global alignment
python execute.py --seq1 test_inputs/seq1.fasta --seq2 test_inputs/seq2.fasta --mode global

# Local alignment
python execute.py --seq1 test_inputs/seq1.fasta --seq2 test_inputs/seq2.fasta --mode local
```

### Supported Sequence Types

- DNA (deoxyribonucleic acid): A, C, G, T
- RNA (ribonucleic acid): A, C, G, U
- Protein: 20 standard amino acids

### Error Handling

- Invalid character detection and reporting
- Empty sequence detection
- File read error handling
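Steps 1 to 4 of the protocol can be illustrated with a dependency-free sketch. It is not the skill's `execute.py`, whose contents are not shown here. The function names (`detect_type`, `align`, `alignment_stats`) are hypothetical, and two simplifications are assumed: a single linear `gap` penalty stands in for the opening/extension scheme listed under Default Parameters, and "matches" in the similarity formula is read as identical positions because no substitution matrix is used.

```python
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")
NUCLEIC = set("ACGTUN")

def detect_type(seq):
    """Heuristic check: return 'nucleic' or 'protein', or raise ValueError."""
    if not seq:
        raise ValueError("empty sequence")
    chars = set(seq.upper())
    if chars <= NUCLEIC:  # short peptides made only of A/C/G/T/N also land here
        return "nucleic"
    invalid = chars - PROTEIN
    if invalid:
        raise ValueError(f"invalid characters: {sorted(invalid)}")
    return "protein"

def align(seq1, seq2, mode="global", match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch (global) or Smith-Waterman (local), linear gap penalty."""
    local = mode == "local"
    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:  # global alignment pays for leading gaps
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + sub,
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # local alignment may restart anywhere
                if cell > best:
                    best, best_pos = cell, (i, j)
            score[i][j] = cell
    # Trace back from the end (global) or from the best cell (local).
    i, j = best_pos if local else (n, m)
    final_score = score[i][j]
    a1, a2 = [], []
    while i > 0 or j > 0:
        if local and score[i][j] == 0:
            break
        sub = match if (i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a1.append(seq1[i - 1]); a2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a1.append(seq1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(seq2[j - 1]); j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2)), final_score

def alignment_stats(a1, a2):
    """Similarity, identity, and gap rate (percent) as defined in step 4."""
    length = len(a1)
    identical = sum(1 for x, y in zip(a1, a2) if x == y and x != "-")
    gaps = a1.count("-") + a2.count("-")
    shorter = min(len(a1.replace("-", "")), len(a2.replace("-", "")))
    return {
        "similarity": 100.0 * identical / length if length else 0.0,
        "identity": 100.0 * identical / shorter if shorter else 0.0,
        "gap_rate": 100.0 * gaps / length if length else 0.0,
    }

a1, a2, s = align("ACGT", "AGT")
print(a1, a2, s, alignment_stats(a1, a2))
```

With the sketch's defaults, aligning `ACGT` against `AGT` globally yields `ACGT` / `A-GT` with score 4, identity 100%, similarity 75%, and gap rate 25%. Identity is higher than similarity because it is divided by the shorter sequence length rather than the alignment length. A faithful implementation of the stated defaults would need affine gap handling, which this sketch leaves out for brevity.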