Bioinformatics File Format Converter for Common Data Types
```json
{
  "name": "bioinformatics_format_converter",
  "version": "1.0.0",
  "description": "Convert between common bioinformatics file formats",
  "input_schema": {
    "type": "object",
    "properties": {
      "source_file": {
        "type": "string",
        "description": "Path to the input file"
      },
      "target_format": {
        "type": "string",
        "enum": ["fasta", "genbank", "fastq", "mmtf", "csv", "tsv"],
        "description": "Target output format"
      },
      "options": {
        "type": "object",
        "properties": {
          "quality_threshold": {
            "type": "integer",
            "description": "Minimum quality score for FASTQ filtering",
            "default": 20
          },
          "output_path": {
            "type": "string",
            "description": "Custom output file path"
          },
          "compression": {
            "type": "boolean",
            "description": "Enable output compression",
            "default": false
          }
        }
      }
    },
    "required": ["source_file", "target_format"]
  },
  "supported_conversions": [
    { "from": "fasta", "to": "genbank", "description": "Convert FASTA sequences to GenBank format" },
    { "from": "genbank", "to": "fasta", "description": "Extract sequences from GenBank files" },
    { "from": "fastq", "to": "fasta", "description": "Convert FASTQ to FASTA with quality filtering" },
    { "from": "fastq", "to": "fastq", "description": "Filter FASTQ by quality score" },
    { "from": "pdb", "to": "mmtf", "description": "Convert PDB to compressed MMTF format" },
    { "from": "csv", "to": "tsv", "description": "Convert CSV to TSV format" },
    { "from": "tsv", "to": "csv", "description": "Convert TSV to CSV format" }
  ],
  "output_schema": {
    "type": "object",
    "properties": {
      "success": { "type": "boolean" },
      "output_file": { "type": "string" },
      "input_format": { "type": "string" },
      "output_format": { "type": "string" },
      "records_processed": { "type": "integer" },
      "statistics": {
        "type": "object",
        "properties": {
          "total_bases": { "type": "integer" },
          "total_sequences": { "type": "integer" },
          "sequences_filtered": { "type": "integer" }
        }
      }
    }
  },
  "examples": [
    {
      "description": "Convert FASTA to GenBank",
      "input": { "source_file": "sequences.fasta", "target_format": "genbank" }
    },
    {
      "description": "Filter FASTQ by quality",
      "input": {
        "source_file": "reads.fastq",
        "target_format": "fasta",
        "options": { "quality_threshold": 30 }
      }
    }
  ]
}
```
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# SKILL: Bioinformatics Format Converter
## Name
Bioinformatics Format Converter
## Description
Converts between common bioinformatics file formats: FASTA, GenBank, FASTQ, PDB, MMTF, CSV, and TSV. The exact source/target pairs are listed under Supported Formats.
## Input
- `source_file`: Source file path (string, required)
- `target_format`: Target format (string, required): one of `fasta`, `genbank`, `fastq`, `mmtf`, `csv`, `tsv`
- `options`: Additional options (object, optional)
  - `quality_threshold`: Minimum quality score for FASTQ filtering (int, default: 20)
  - `output_path`: Custom output file path (string, optional)
  - `compression`: Enable output compression (boolean, default: false)
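A minimal Python sketch of input validation, assuming the request arrives as a dict shaped like `input_schema`. The names `normalize_request` and `ConversionError` are hypothetical, and reporting an unknown `target_format` as `UNSUPPORTED_FORMAT` (rather than `INVALID_INPUT`) is an assumption.

```python
from typing import Any, Dict

# Values copied from input_schema: allowed targets and option defaults.
TARGET_FORMATS = {"fasta", "genbank", "fastq", "mmtf", "csv", "tsv"}
DEFAULT_OPTIONS: Dict[str, Any] = {"quality_threshold": 20, "compression": False}


class ConversionError(Exception):
    """Hypothetical exception carrying one of the skill's error codes."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def normalize_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Validate required fields and option types, then fill in defaults."""
    source = request.get("source_file")
    if not isinstance(source, str) or not source:
        raise ConversionError("INVALID_INPUT", "source_file must be a non-empty string")
    target = request.get("target_format")
    if not isinstance(target, str) or target not in TARGET_FORMATS:
        raise ConversionError("UNSUPPORTED_FORMAT", f"unknown target_format: {target!r}")
    user_options = request.get("options") or {}
    if not isinstance(user_options, dict):
        raise ConversionError("INVALID_INPUT", "options must be an object")
    options = {**DEFAULT_OPTIONS, **user_options}
    threshold = options["quality_threshold"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(threshold, int) or isinstance(threshold, bool):
        raise ConversionError("INVALID_INPUT", "quality_threshold must be an integer")
    if not isinstance(options["compression"], bool):
        raise ConversionError("INVALID_INPUT", "compression must be a boolean")
    if "output_path" in options and not isinstance(options["output_path"], str):
        raise ConversionError("INVALID_INPUT", "output_path must be a string")
    return {"source_file": source, "target_format": target, "options": options}
```

Merging user options over the defaults means a request such as the second schema example only needs to state `quality_threshold`.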
## Supported Formats
| Source Format | Target Format | Description |
|---------------|--------------|-------------|
| FASTA | GenBank | DNA/protein sequence conversion |
| GenBank | FASTA | Sequence extraction |
| FASTQ | FASTA | Conversion after quality filtering |
| FASTQ | FASTQ | Quality filtering |
| PDB | MMTF | Structure format compression |
| CSV | TSV | Delimiter conversion |
| TSV | CSV | Delimiter conversion |
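The table can be expressed directly as a set of (source, target) pairs. This Python sketch, with hypothetical names `SUPPORTED_CONVERSIONS`, `is_supported`, and `targets_for`, is one way the `UNSUPPORTED_FORMAT` check might look:

```python
from typing import List

# (source, target) pairs copied from the Supported Formats table.
SUPPORTED_CONVERSIONS = {
    ("fasta", "genbank"),
    ("genbank", "fasta"),
    ("fastq", "fasta"),
    ("fastq", "fastq"),
    ("pdb", "mmtf"),
    ("csv", "tsv"),
    ("tsv", "csv"),
}


def is_supported(source_format: str, target_format: str) -> bool:
    """Case-insensitive membership test; False maps to UNSUPPORTED_FORMAT."""
    return (source_format.lower(), target_format.lower()) in SUPPORTED_CONVERSIONS


def targets_for(source_format: str) -> List[str]:
    """List the target formats reachable from a given source format."""
    source = source_format.lower()
    return sorted(target for src, target in SUPPORTED_CONVERSIONS if src == source)
```

FASTA → FASTQ is absent from the set because FASTA records carry no quality scores to write.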
## Execution Steps
### Step 1: Detect Input File Format
```
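1. Read the first few non-empty lines of the file
2. Identify the format from characteristic markers:
   - FASTA: first line starts with ">"
   - GenBank: first line starts with the "LOCUS" keyword
   - FASTQ: first line starts with "@"; each record is 4 lines (header, sequence, "+", quality)
   - PDB: starts with a "HEADER" or "ATOM" record
   - CSV/TSV: detect the delimiter ("," or tab)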
```
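The detection heuristics in Step 1 can be sketched in plain Python as below. This is an illustration rather than the skill's actual detector: the function names are hypothetical, the PDB record names beyond HEADER and ATOM are an added assumption, and CSV/TSV detection relies on the standard library's `csv.Sniffer`.

```python
import csv
from typing import List, Optional

# Record names that may open a PDB file. HEADER and ATOM come from Step 1;
# the others are an assumption for files that omit the HEADER record.
PDB_RECORDS = {"HEADER", "ATOM", "HETATM", "REMARK", "CRYST1", "MODEL"}


def detect_format_from_lines(lines: List[str]) -> Optional[str]:
    """Guess a format name from the first lines of a file (heuristic)."""
    lines = [line.rstrip("\r\n") for line in lines if line.strip()]
    if not lines:
        return None
    first = lines[0]
    if first.startswith(">"):
        return "fasta"
    if first.startswith("LOCUS"):
        return "genbank"
    # FASTQ: "@" header, sequence line, then a "+" separator line.
    if first.startswith("@") and len(lines) >= 3 and lines[2].startswith("+"):
        return "fastq"
    # PDB record names occupy columns 1-6.
    if first[:6].strip() in PDB_RECORDS:
        return "pdb"
    try:
        dialect = csv.Sniffer().sniff("\n".join(lines), delimiters=",\t")
    except csv.Error:
        return None  # no consistent delimiter: format not recognized
    return "tsv" if dialect.delimiter == "\t" else "csv"


def detect_format(path: str, max_lines: int = 8) -> Optional[str]:
    """Read up to max_lines lines from path and classify them."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        head = [line for _, line in zip(range(max_lines), handle)]
    return detect_format_from_lines(head)
```

Separating line classification from file reading keeps the heuristic testable without temporary files; a `None` result would map to `UNSUPPORTED_FORMAT`.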
### Step 2: Parse File Content
```
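1. Parse with Biopython (`Bio.SeqIO` for FASTA/GenBank/FASTQ, `Bio.PDB` for PDB) or the Python standard library (`csv` for CSV/TSV)
2. Extract the sequence, structure, or table data
3. Validate data integrity (e.g. each FASTQ quality string must match its sequence length)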
```
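Step 2 names Biopython as the parser (for example `Bio.SeqIO.parse(handle, "fasta")`). To keep the example dependency-free, the sketch below swaps in hand-written standard-library parsers for FASTA and strict four-line FASTQ, including the integrity checks. The function names are hypothetical, and raising `ValueError` to signal `PARSE_ERROR` is an assumption.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from strict 4-line FASTQ records."""
    rows = [raw.rstrip("\r\n") for raw in lines if raw.strip()]
    if len(rows) % 4:
        raise ValueError("FASTQ line count is not a multiple of 4")
    for i in range(0, len(rows), 4):
        head, seq, plus, qual = rows[i:i + 4]
        record = i // 4 + 1
        if not head.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record}")
        # Integrity check: one quality character per base.
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {record}")
        yield head[1:], seq, qual
```

Both functions accept any iterable of lines, so an open file handle works as well as a list.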
### Step 3: Convert to Target Format
```
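1. Build the output records in the target format
2. Apply the specified options (`quality_threshold` for FASTQ input, `compression`, `output_path`)
3. Handle format-specific requirements (e.g. GenBank output needs a molecule type such as DNA or protein, which FASTA files do not record)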
```
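For the FASTQ → FASTA conversion, the schema does not say how `quality_threshold` is applied. This sketch assumes it is compared against each read's mean Phred score, decoded with the Sanger/Illumina 1.8+ offset of 33, and that FASTA sequence lines are wrapped at 60 characters; both choices and the function names are assumptions.

```python
from typing import Iterable, List, Tuple

PHRED_OFFSET = 33  # Assumption: Sanger / Illumina 1.8+ quality encoding.


def mean_quality(qual: str) -> float:
    """Mean Phred score of a FASTQ quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def fastq_to_fasta_lines(
    records: Iterable[Tuple[str, str, str]],
    quality_threshold: int = 20,
    width: int = 60,
) -> Tuple[List[str], int]:
    """Render FASTA lines for reads whose mean quality meets the threshold.

    Returns the output lines and the number of reads filtered out.
    """
    out: List[str] = []
    filtered = 0
    for header, seq, qual in records:
        if mean_quality(qual) < quality_threshold:
            filtered += 1
            continue
        out.append(">" + header)
        # Wrap the sequence at a fixed line width.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out, filtered
```

The same `mean_quality` test could drive FASTQ → FASTQ filtering by emitting the kept records unchanged instead of converting them.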
### Step 4: Output Converted File
```
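1. Write to `output_path` if provided, otherwise to a default output path; compress the output when `compression` is true
2. Return the output path and statistics (see Output)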
```
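Step 4 could look roughly like the sketch below, assuming `compression` means gzip and that the returned payload follows `output_schema`. `write_output` and `build_result` are hypothetical helpers, and counting filtered reads in `records_processed` is an assumption.

```python
import gzip
from typing import Any, Dict, List


def write_output(lines: List[str], output_path: str, compression: bool = False) -> str:
    """Write text lines to output_path; gzip them when compression is True."""
    if compression and not output_path.endswith(".gz"):
        output_path += ".gz"  # assumption: mark compressed output with .gz
    if compression:
        handle = gzip.open(output_path, "wt", encoding="utf-8")
    else:
        handle = open(output_path, "w", encoding="utf-8")
    with handle:
        for line in lines:
            handle.write(line + "\n")
    return output_path


def build_result(output_file: str, input_format: str, output_format: str,
                 kept_sequences: List[str], filtered: int) -> Dict[str, Any]:
    """Assemble a success payload shaped like output_schema."""
    return {
        "success": True,
        "output_file": output_file,
        "input_format": input_format,
        "output_format": output_format,
        # Assumption: every input record counts, whether kept or filtered.
        "records_processed": len(kept_sequences) + filtered,
        "statistics": {
            "total_bases": sum(len(s) for s in kept_sequences),
            "total_sequences": len(kept_sequences),
            "sequences_filtered": filtered,
        },
    }
```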
## Output
```json
{
"success": true,
"output_file": "path/to/output.file",
"input_format": "fasta",
"output_format": "genbank",
"records_processed": 5,
"statistics": {
"total_bases": 1500,
"total_sequences": 5
}
}
```
## Error Handling
- File not found: Return error code `FILE_NOT_FOUND`
- Format not supported: Return error code `UNSUPPORTED_FORMAT`
- Parsing failed: Return error code `PARSE_ERROR`
- Invalid input: Return error code `INVALID_INPUT`
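One way to surface these codes is to wrap the conversion and translate exceptions into an error payload. The sketch assumes a hypothetical `ConversionError` for validation and format-support failures; the `error_code` and `message` fields are also assumptions, since `output_schema` does not define an error shape.

```python
from typing import Any, Callable, Dict


class ConversionError(Exception):
    """Hypothetical exception carrying one of the skill's error codes."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def run_with_error_codes(convert: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Run a conversion and turn failures into an error payload."""
    try:
        return convert()
    except ConversionError as exc:  # INVALID_INPUT, UNSUPPORTED_FORMAT
        code, message = exc.code, str(exc)
    except FileNotFoundError as exc:
        code, message = "FILE_NOT_FOUND", str(exc)
    except ValueError as exc:  # parser failures, including UnicodeDecodeError
        code, message = "PARSE_ERROR", str(exc)
    return {"success": False, "error_code": code, "message": message}
```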
## Tools
- **biopython**: Biological sequence and structure file parsing
- **Python standard library**: CSV/TSV conversion, file operations
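For the CSV ↔ TSV conversions, the standard-library `csv` module is enough. A minimal sketch (the function name is hypothetical) that reads with one delimiter and writes with the other:

```python
import csv
import io

DELIMITERS = {"csv": ",", "tsv": "\t"}


def convert_delimited(text: str, source_format: str, target_format: str) -> str:
    """Re-serialize a CSV/TSV table with the target format's delimiter.

    Parsing and writing both go through the csv module, so quoted fields
    (for example a comma inside a CSV cell) are handled correctly.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=DELIMITERS[source_format])
    out = io.StringIO()
    writer = csv.writer(out, delimiter=DELIMITERS[target_format], lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

Going through `csv.reader` and `csv.writer` rather than a plain string replace matters when a cell itself contains the delimiter: the CSV field `"DNA repair, tumor suppressor"` becomes an unquoted TSV field and is re-quoted on the way back.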