Protein Properties Calculator for Sequence Analysis and Feature Extraction
{ "payload": { "tool": "protein_properties_calculator", "version": "1.0.0", "input": { "sequence_type": "fasta", "sequence_source": "inline", "format": "fasta" }, "parameters": { "calculate_mw": true, "calculate_pi": true, "calculate_composition": true, "calculate_gravy": true, "calculate_instability_index": true, "precision": { "molecular_weight": 2, "isoelectric_point": 2, "hydropathicity": 3, "instability_index": 2, "composition_percentage": 2 } }, "example_sequences": { "hemoglobin_alpha": ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha\nMKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR", "lysozyme": ">sp|P00698|LYZ_CHICK Lysozyme C\nMVLLILVLVFLVGVGVNPTIQAELPALSTTDDLAAANGNLLDFVKSNLDRYRPGGNNRPGAIAVRDNSVNWGSSGGRIRLLSHRDDPAYAAPYLGRGYYFYSSYVNNDGRTLTLNDIALWMRDVNAGWLSATDYGILQINSRYWCNDGKGRDVQLAARNVKLFGNFGADKRAASRERNPLSIDKFIAIKDASGKFTCSWTAADNAYHAIDQYDSTDMKFSSFAKALGIKADKDLNYTLDVNAAHAAPLSKEAAAIAKLLKSIKDNKDLKEVFAEAKEKAFKDLKEVVFEAAFKVFSQYADLGCYCGVGSSKDVQLINLNNKPFVDLKNKYFNDICHVALGGLSQTPLFAILHR", "insulin": ">sp|P01308|INS_HUMAN Insulin\nMALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN", "small_peptide": "MKFLLK" }, "output_options": { "format": "json", "include_metadata": true, "pretty_print": true } }, "api_spec": { "method": "POST", "endpoint": "/api/v1/protein/properties", "content_type": "application/json", "request_body": { "type": "object", "properties": { "sequence": { "type": "string", "description": "Protein sequence in FASTA format or plain format", "example": "MKVLSPADKTNVKAAWGKVGAHAGEY" }, "sequence_id": { "type": "string", "description": "Optional identifier for the sequence", "example": "P69905" }, "options": { "type": "object", "properties": { "all_properties": { "type": "boolean", "default": true }, "precision": { "type": "object" } } } }, "required": [ "sequence" ] }, "response": { "success": { 
"status_code": 200, "schema": "SKILL.md#Output" }, "error": { "status_codes": [ 400, 422 ], "schema": { "status": "error", "error": { "code": "string", "message": "string" } } } } }, "cli_usage": { "command": "python execute.py [OPTIONS]", "options": [ { "flag": "--input, -i", "type": "string", "description": "Input FASTA file path", "required": true }, { "flag": "--output, -o", "type": "string", "description": "Output JSON file path (default: stdout)", "required": false }, { "flag": "--sequence, -s", "type": "string", "description": "Direct sequence input (overrides --input)", "required": false }, { "flag": "--format, -f", "type": "string", "description": "Output format: json, yaml, table", "default": "json" } ], "examples": [ "python execute.py --input test_inputs/protein.fasta", "python execute.py --input test_inputs/protein.fasta --output result.json", "python execute.py --sequence MKVLSPADKTNVKAAWGKVGAHAGEY" ] } }
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# SKILL: Protein Properties Calculator
## Metadata
- **Name**: Protein Properties Calculator
- **Version**: 1.0.0
- **Category**: biochemical-analysis
- **Tags**: protein, biophysics, amino-acid, physicochemical
## Description
Calculates physicochemical properties of protein sequences, including molecular weight, isoelectric point, amino acid composition, hydrophobicity, and instability index.
## Input
### Format
- **Type**: FASTA format protein sequence
- **Encoding**: UTF-8
- **Validation**:
  - Only the 20 standard amino acid letters (ACDEFGHIKLMNPQRSTVWY) are allowed
  - Sequence must not be empty
  - Optional: a header line starting with `>` is ignored
### Example Input
```
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha
MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSH
CLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
```
## Steps
### Step 1: Parse Amino Acid Sequence
```
1. Remove header line if present (any line starting with '>')
2. Remove all whitespace and newline characters
3. Convert to uppercase
4. Validate: reject if any non-standard amino acid character found
5. Store cleaned sequence
```
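The parsing steps can be sketched as follows. This is a minimal illustration, not the implementation shipped in `execute.py`; the function name `clean_sequence` and the use of `ValueError` are assumptions made for the example. Header lines are dropped before whitespace is stripped so that header text never merges into the sequence.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(text: str) -> str:
    """Return the cleaned, validated sequence from FASTA or plain input.

    Hypothetical helper: name and exception type are illustrative only.
    """
    # Drop FASTA header lines first, while line boundaries still exist.
    body = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    # Remove all remaining whitespace and normalize case.
    cleaned = "".join("".join(body).split()).upper()
    if not cleaned:
        raise ValueError("Input sequence is empty")
    invalid = sorted(set(cleaned) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(
            "Sequence contains invalid amino acid character(s): "
            + ", ".join(invalid)
        )
    return cleaned
```

For instance, `clean_sequence(">header\nmkf\nLLK")` yields `"MKFLLK"`.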
### Step 2: Calculate Molecular Weight (MW)
```
1. Sum molecular weights of all free amino acids
2. Subtract (n-1) * 18.015 for the water lost per peptide bond (n = sequence length)
3. Equivalent form: sum residue masses, then add 18.015 once for the terminal H and OH
4. Unit: Daltons (Da)
5. Precision: 2 decimal places
```
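A minimal sketch of the molecular weight calculation, assuming standard average residue masses (free amino acid minus one water). Summing residue masses and adding one water is arithmetically the same as summing free amino acid masses and subtracting (n-1) waters. The table name `RESIDUE_MASS` and the function name are illustrative, and the table used by the actual implementation may differ in the last decimals.

```python
# Average residue masses in Da (assumed standard values, for illustration).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # Da, as used in the steps above


def molecular_weight(sequence: str) -> float:
    """Molecular weight in Da, rounded to 2 decimal places."""
    # One water accounts for the free N-terminal H and C-terminal OH.
    total = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return round(total, 2)
```

A quick sanity check is free glycine: `molecular_weight("G")` gives 75.07 Da.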
### Step 3: Calculate Isoelectric Point (pI)
```
1. Use bisection method to find pH where net charge = 0
2. At each pH estimate:
- Calculate positive charges (N-terminus + K, R, H)
- Calculate negative charges (C-terminus + D, E)
- Net charge = positive - negative
3. Search range: pH 0 to 14
4. Precision: 2 decimal places
5. Use Henderson-Hasselbalch equation for ionizable groups
```
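The bisection search can be sketched as below. The pK values are assumptions chosen for illustration and are not taken from this skill file; the sketch follows the ionizable groups listed in the steps (N-terminus, K, R, H, C-terminus, D, E). Each group's fractional charge comes from the Henderson-Hasselbalch equation, and because net charge decreases monotonically with pH, halving the interval converges on the zero crossing.

```python
# Assumed pK values (illustrative; substitute the table your implementation uses).
POSITIVE_PK = {"N_term": 7.5, "K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PK = {"C_term": 3.55, "D": 4.05, "E": 4.45}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    positive = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PK["N_term"]))
    negative = 1.0 / (1.0 + 10 ** (NEGATIVE_PK["C_term"] - ph))
    for aa in "KRH":
        positive += sequence.count(aa) / (1.0 + 10 ** (ph - POSITIVE_PK[aa]))
    for aa in "DE":
        negative += sequence.count(aa) / (1.0 + 10 ** (NEGATIVE_PK[aa] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection over pH 0 to 14 for the pH where net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return round((lo + hi) / 2, 2)
```

Lysine-rich sequences should land well above pH 7 and aspartate-rich ones well below, which is a useful smoke test for any pK table.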
### Step 4: Calculate Amino Acid Composition
```
1. Count occurrences of each of 20 standard amino acids
2. Calculate percentage for each amino acid
3. Return both count and percentage
```
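A short sketch of the composition step, assuming the input is already cleaned and non-empty. The function name `composition` is illustrative; the returned shape mirrors the `composition` object in the output schema.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict:
    """Count and percentage for each of the 20 standard amino acids."""
    counts = Counter(sequence)
    length = len(sequence)
    return {
        aa: {
            "count": counts[aa],  # Counter returns 0 for absent residues
            "percentage": round(100.0 * counts[aa] / length, 2),
        }
        for aa in STANDARD_AMINO_ACIDS
    }
```

Residues that do not occur are still reported with a count of 0, so every result has the same 20 keys.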
### Step 5: Calculate Grand Average of Hydropathicity (GRAVY)
```
1. Use Kyte-Doolittle hydropathy scale
2. GRAVY = sum(amino_acid_hydropathy) / sequence_length
3. Positive = hydrophobic, Negative = hydrophilic
```
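The GRAVY formula can be sketched with the Kyte-Doolittle scale as follows. The function and table names are illustrative, and the input is assumed to be a cleaned, non-empty sequence.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity, rounded to 3 decimal places."""
    total = sum(KYTE_DOOLITTLE[aa] for aa in sequence)
    return round(total / len(sequence), 3)
```

For the small peptide `MKFLLK` the hydropathy values sum to 4.5 over 6 residues, giving a GRAVY of 0.75 (mildly hydrophobic).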
### Step 6: Calculate Instability Index (II)
```
1. Use Guruprasad dipeptide instability weights
2. II = (10 / sequence_length) * sum(dipeptide_weights), summed over all consecutive residue pairs
3. II < 40 = stable protein
4. II >= 40 = unstable protein
```
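A hedged sketch of the instability index, using the 10/L scaling from Guruprasad et al. (1990). The full 400-entry dipeptide weight table is not reproduced here: the function takes it as a parameter, and `TOY_WEIGHTS` contains placeholder numbers that are not the published values. All names are illustrative.

```python
def instability_index(sequence: str, dipeptide_weights: dict) -> float:
    """II = (10 / L) * sum of weights over consecutive residue pairs."""
    if len(sequence) < 2:
        raise ValueError("At least two residues are needed to form a dipeptide")
    total = sum(
        dipeptide_weights[sequence[i:i + 2]]
        for i in range(len(sequence) - 1)
    )
    return round(10.0 / len(sequence) * total, 2)


def classify(ii: float) -> str:
    """Apply the threshold from the steps above: below 40 is stable."""
    return "stable" if ii < 40 else "unstable"


# Placeholder weights for demonstration only; NOT the Guruprasad table.
TOY_WEIGHTS = {"MK": 2.0, "KK": 4.0}
```

With the placeholder table, `instability_index("MKK", TOY_WEIGHTS)` evaluates to (10 / 3) * (2.0 + 4.0) = 20.0, which `classify` labels as stable.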
### Step 7: Generate Complete Properties Report
```
1. Compile all calculated properties
2. Generate JSON output with all results
```
## Output
### Format
- **Type**: JSON (JavaScript Object Notation)
- **Encoding**: UTF-8
- **Content-Type**: application/json
### Schema
```json
{
"status": "success" | "error",
"sequence": {
"raw": "original input",
"cleaned": "cleaned sequence",
"length": number
},
"properties": {
"molecular_weight": {
"value": number,
"unit": "Da"
},
"isoelectric_point": {
"value": number,
"description": "pH at which net charge is zero"
},
"composition": {
"A": {"count": number, "percentage": number},
"C": {"count": number, "percentage": number},
...
},
"hydropathicity": {
"gravy": number,
"description": "Kyte-Doolittle GRAVY score"
},
"instability_index": {
"value": number,
"classification": "stable" | "unstable"
}
},
"metadata": {
"calculated_at": "ISO timestamp",
"calculator_version": "1.0.0"
}
}
```
### Example Output
```json
{
"status": "success",
"sequence": {
"raw": ">sp|P69905|HBA_HUMAN\nMKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
"cleaned": "MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
"length": 143
},
"properties": {
"molecular_weight": {"value": 15162.87, "unit": "Da"},
"isoelectric_point": {"value": 9.14, "description": "pH at which net charge is zero"},
"composition": {
"A": {"count": 21, "percentage": 14.69},
"C": {"count": 1, "percentage": 0.7},
...
},
"hydropathicity": {"gravy": -0.425, "description": "Kyte-Doolittle GRAVY score"},
"instability_index": {"value": 38.56, "classification": "stable"}
},
"metadata": {
"calculated_at": "2026-04-29T00:00:00Z",
"calculator_version": "1.0.0"
}
}
```
## Tools
### Primary Implementation
- **Python Standard Library**: No external dependencies required
- **Method**: Direct implementation of the ProtParam algorithms
### Reference Algorithms
- Guruprasad et al. (1990) - Instability Index
- Kyte & Doolittle (1982) - Hydropathy Scale
- Bjellqvist et al. (1994) - pI calculation
## Error Handling
### Invalid Sequence
```json
{
"status": "error",
"error": {
"code": "INVALID_SEQUENCE",
"message": "Sequence contains invalid amino acid character(s): X, Z"
}
}
```
### Empty Sequence
```json
{
"status": "error",
"error": {
"code": "EMPTY_SEQUENCE",
"message": "Input sequence is empty"
}
}
```
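One way to produce the two error payloads above is sketched here. The helper names `error_payload` and `validate` are hypothetical; only the error codes, messages, and JSON shape come from this document.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def error_payload(code: str, message: str) -> dict:
    """Build an error response in the documented shape."""
    return {"status": "error", "error": {"code": code, "message": message}}


def validate(cleaned: str):
    """Return None for a valid sequence, else the matching error payload."""
    if not cleaned:
        return error_payload("EMPTY_SEQUENCE", "Input sequence is empty")
    invalid = sorted(set(cleaned) - VALID_AMINO_ACIDS)
    if invalid:
        return error_payload(
            "INVALID_SEQUENCE",
            "Sequence contains invalid amino acid character(s): "
            + ", ".join(invalid),
        )
    return None
```

Returning a payload instead of raising keeps the caller free to serialize the result directly, for example with `json.dumps(validate(seq))`.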
## Usage Examples
### Direct Function Call
```python
from execute import ProteinPropertiesCalculator
calculator = ProteinPropertiesCalculator()
result = calculator.analyze("MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR")
print(result)
```
### Command Line
```bash
python execute.py --input test_inputs/protein.fasta --output result.json
```
## Notes
- All calculations assume standard amino acids only
- Post-translational modifications (phosphorylation, glycosylation) are not considered
- Disulfide bonds affect protein stability but are not included in the basic II calculation