Sequence Alignment Tool for Global and Local DNA RNA Protein Alignment
Abstract
Perform global or local pairwise alignment of DNA, RNA, or protein sequences. The tool supports the Needleman-Wunsch (global) and Smith-Waterman (local) dynamic programming algorithms for bioinformatics analysis.
Cleaned Submission Note
This revision replaces a raw JSON display with readable Markdown. The underlying tool description and skill instructions are preserved.
Tool Summary
Sequence Alignment Tool (version 1.0.0): performs global or local sequence alignment on DNA, RNA, or protein sequences.
Input Schema
The original structured input schema is retained conceptually. Use the SKILL section below for executable instructions.
SKILL
Sequence Alignment Tool
Protocol for Agent Execution
Name
Sequence Alignment Tool
Description
A tool for performing local or global alignment of two protein or nucleic acid sequences. It supports the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment.
Input
Two FASTA-formatted biological sequences (protein or nucleic acid)
Steps
Read Sequences
- Parse FASTA format files
- Validate that each sequence contains only valid characters
- Protein character set: ACDEFGHIKLMNPQRSTVWY
- Nucleic acid character set: ACGTUN
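The reading and validation steps above can be sketched in plain Python as follows. This is a minimal illustration, not the tool's own code: the function names `parse_fasta` and `validate` are hypothetical, and the rule "anything that fits the nucleic acid alphabet is treated as nucleic acid" is an assumption (a string such as `ACGT` is also a valid protein sequence, so some tie-break is needed). The two alphabets are copied from the step list.

```python
# Alphabets copied from the step list; helper names are hypothetical.
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")
NUCLEIC = set("ACGTUN")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) records, upper-casing residues."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate(seq: str) -> str:
    """Return 'nucleic' or 'protein'; raise ValueError for empty or invalid input."""
    if not seq:
        raise ValueError("empty sequence")
    chars = set(seq)
    # Assumption: prefer the nucleic acid interpretation when both alphabets fit.
    if chars <= NUCLEIC:
        return "nucleic"
    if chars <= PROTEIN:
        return "protein"
    bad = "".join(sorted(chars - PROTEIN))
    raise ValueError(f"invalid characters: {bad}")
```

For example, `parse_fasta(">s1\nACGT\nacgu\n")` yields `[("s1", "ACGTACGU")]`, and `validate` then classifies that sequence as nucleic acid.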
Select Alignment Algorithm
- Global alignment (Needleman-Wunsch): For overall similarity analysis
- Local alignment (Smith-Waterman): For finding best matching subsequences
Execute Alignment
- Use a dynamic programming algorithm
- Configure match/mismatch scores
- Configure gap penalties (opening penalty + extension penalty)
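The dynamic programming step with opening and extension penalties can be illustrated with Gotoh's three-matrix formulation, which covers both Needleman-Wunsch (global) and Smith-Waterman (local) scoring. The sketch below is score-only (no traceback) and uses simple match/mismatch scores rather than a substitution matrix such as BLOSUM62. It assumes that a gap of length k scores `gap_open + (k - 1) * gap_extend`; the function name `align_score` is hypothetical, and the defaults mirror the Default Parameters listed in this document.

```python
NEG_INF = float("-inf")


def align_score(a: str, b: str, mode: str = "global", match: float = 2.0,
                mismatch: float = -1.0, gap_open: float = -10.0,
                gap_extend: float = -0.5) -> float:
    """Optimal pairwise score with affine gaps (Gotoh's three matrices).

    Assumed gap model: a gap of length k scores gap_open + (k - 1) * gap_extend.
    """
    if mode not in ("global", "local"):
        raise ValueError("mode must be 'global' or 'local'")
    local = mode == "local"
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    if not local:
        # Needleman-Wunsch: leading gaps are penalised like any other gap.
        for i in range(1, n + 1):
            X[i][0] = gap_open + (i - 1) * gap_extend
        for j in range(1, m + 1):
            Y[0][j] = gap_open + (j - 1) * gap_extend
    best = 0.0  # in local mode the reported score is never negative
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if local:
                prev = max(prev, 0.0)  # Smith-Waterman: allow a fresh start here
            M[i][j] = prev + s
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
            if local:
                best = max(best, M[i][j])
    if local:
        return best
    return max(M[n][m], X[n][m], Y[n][m])
```

With the defaults, globally aligning `AATT` to `AAGGGTT` gives 4 matches (+8) and one gap of length 3 (-10 - 0.5 - 0.5 = -11), for a score of -3.0. The local score of `ACGT` against `ACT` is 4.0, from the ungapped `AC`/`AC` match, because introducing the gap would cost more than the extra match gains.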
Calculate Similarity and Identity
- Similarity = (number of matches) / (alignment length) x 100%
- Identity = (number of identical positions) / (shorter sequence length) x 100%
- Gap rate = (number of gaps) / (alignment length) x 100%
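The three formulas can be computed directly from the two gapped rows of an alignment. In this sketch (hypothetical name `alignment_stats`), "number of matches" is taken to mean identical non-gap columns, so similarity and identity differ only in their denominator. A substitution-matrix-based similarity would also count conservative substitutions. Gaps are counted as alignment columns that contain a gap character, which is also an assumption.

```python
def alignment_stats(aln1: str, aln2: str, gap: str = "-") -> dict[str, float]:
    """Similarity, identity and gap rate (percent) for two gapped rows."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned rows must have equal length")
    length = len(aln1)
    # Identical residues in the same column (gap/gap columns are not counted).
    matches = sum(1 for x, y in zip(aln1, aln2) if x == y and x != gap)
    # Columns in which either row carries a gap.
    gaps = sum(1 for x, y in zip(aln1, aln2) if gap in (x, y))
    # Length of the shorter input sequence, recovered by removing gaps.
    shorter = min(len(aln1.replace(gap, "")), len(aln2.replace(gap, "")))
    if length == 0 or shorter == 0:
        raise ValueError("statistics are undefined for an empty alignment")
    return {
        "similarity": 100.0 * matches / length,
        "identity": 100.0 * matches / shorter,
        "gap_rate": 100.0 * gaps / length,
    }
```

As a worked example, the rows `ACG-T` and `AC-GT` have 3 matches, 2 gap columns, an alignment length of 5, and a shorter sequence length of 4. This gives a similarity of 60%, an identity of 75%, and a gap rate of 40%.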
Output Alignment Results
- Aligned sequences (with gaps)
- Position markers (* = exact match, : = similar, . = mismatch)
- Alignment score
- Statistical report
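The position-marker line can be generated column by column. The document does not define which residues count as "similar". The sketch below borrows Clustal-style strong conservation groups purely as an assumption, only applies them to proteins (for DNA, grouping such as A/T in `STA` would be meaningless), and writes a space under gap columns (also an assumption). The name `marker_line` is hypothetical.

```python
# Assumed similarity groups (Clustal-style "strong" groups); not specified by the tool.
SIMILAR_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def marker_line(aln1: str, aln2: str, protein: bool = True, gap: str = "-") -> str:
    """Return '*' for identity, ':' for similar, '.' for mismatch, ' ' for gaps."""
    out = []
    for x, y in zip(aln1, aln2):
        if gap in (x, y):
            out.append(" ")
        elif x == y:
            out.append("*")
        elif protein and any(x in g and y in g for g in SIMILAR_GROUPS):
            out.append(":")
        else:
            out.append(".")
    return "".join(out)
```

For example, `marker_line("MKVAL", "MRI-L")` marks K/R and V/I as similar and returns `"*:: *"`.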
Output
- Aligned sequences (with gap insertions)
- Similarity report (score, similarity percentage, identity, gap rate)
- Alignment method description
Tools
- Python: Biopython Bio.pairwise2 module (recent Biopython releases deprecate Bio.pairwise2 in favour of Bio.Align.PairwiseAligner)
- Alternative: EMBOSS toolkit (water for local alignment, needle for global alignment)
Default Parameters
match_score: 2
mismatch_score: -1
gap_open: -10
gap_extend: -0.5
Example Usage
# Global alignment
python execute.py --seq1 test_inputs/seq1.fasta --seq2 test_inputs/seq2.fasta --mode global
# Local alignment
python execute.py --seq1 test_inputs/seq1.fasta --seq2 test_inputs/seq2.fasta --mode local
Supported Sequence Types
- DNA (deoxyribonucleic acid): A, C, G, T
- RNA (ribonucleic acid): A, C, G, U
- Protein: 20 standard amino acids
Error Handling
- Invalid character detection and reporting
- Empty sequence detection
- File read error handling
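File read errors and empty inputs can be funnelled into a single, reportable error type before parsing begins. The function below is a hypothetical sketch (the name `read_fasta_text` and the UTF-8 encoding are assumptions). It only covers the file-level checks; invalid characters and empty sequences inside a file still need a separate validation step.

```python
from pathlib import Path


def read_fasta_text(path: str) -> str:
    """Read a FASTA file, turning I/O problems and empty files into ValueError."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        raise ValueError(f"file not found: {path}") from None
    except (OSError, UnicodeDecodeError) as exc:  # permissions, directories, bad bytes
        raise ValueError(f"cannot read {path}: {exc}") from exc
    if not text.strip():
        raise ValueError(f"file is empty: {path}")
    return text
```

A command-line wrapper could catch `ValueError`, print its message, and exit with a non-zero status, so every failure mode is reported in the same way.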
Integrity Note
This is a formatting cleanup revision. It does not introduce a new scientific claim.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# Sequence Alignment Tool

## Protocol for Agent Execution

### Name
Sequence Alignment Tool

### Description
A tool for performing local or global alignment of two protein or nucleic acid sequences. It supports the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment.

### Input
Two FASTA-formatted biological sequences (protein or nucleic acid)

### Steps
1. **Read Sequences**
   - Parse FASTA format files
   - Validate that each sequence contains only valid characters
   - Protein character set: ACDEFGHIKLMNPQRSTVWY
   - Nucleic acid character set: ACGTUN
2. **Select Alignment Algorithm**
   - Global alignment (Needleman-Wunsch): For overall similarity analysis
   - Local alignment (Smith-Waterman): For finding best matching subsequences
3. **Execute Alignment**
   - Use a dynamic programming algorithm
   - Configure match/mismatch scores
   - Configure gap penalties (opening penalty + extension penalty)
4. **Calculate Similarity and Identity**
   - Similarity = (number of matches) / (alignment length) x 100%
   - Identity = (number of identical positions) / (shorter sequence length) x 100%
   - Gap rate = (number of gaps) / (alignment length) x 100%
5. **Output Alignment Results**
   - Aligned sequences (with gaps)
   - Position markers (`*` = exact match, `:` = similar, `.` = mismatch)
   - Alignment score
   - Statistical report

### Output
- Aligned sequences (with gap insertions)
- Similarity report (score, similarity percentage, identity, gap rate)
- Alignment method description

### Tools
- **Python**: Biopython `Bio.pairwise2` module
- **Alternative**: EMBOSS toolkit (`water` for local alignment, `needle` for global alignment)

### Default Parameters
```
match_score: 2
mismatch_score: -1
gap_open: -10
gap_extend: -0.5
```

### Example Usage
```bash
# Global alignment
python execute.py --seq1 test_inputs/seq1.fasta --seq2 test_inputs/seq2.fasta --mode global
# Local alignment
python execute.py --seq1 test_inputs/seq1.fasta --seq2 test_inputs/seq2.fasta --mode local
```

### Supported Sequence Types
- DNA (deoxyribonucleic acid): A, C, G, T
- RNA (ribonucleic acid): A, C, G, U
- Protein: 20 standard amino acids

### Error Handling
- Invalid character detection and reporting
- Empty sequence detection
- File read error handling