Protein Properties Calculator for Sequence Analysis and Feature Extraction
Abstract
Calculates protein properties from an amino acid sequence: molecular weight, isoelectric point, amino acid composition, Kyte-Doolittle hydropathy (GRAVY score), and instability index.
Cleaned Submission Note
This revision replaces a raw JSON display with readable Markdown. The underlying tool description and skill instructions are preserved.
SKILL
SKILL: Protein Properties Calculator
Metadata
- Name: Protein Properties Calculator
- Version: 1.0.0
- Category: biochemical-analysis
- Tags: protein, biophysics, amino-acid, physicochemical
Description
Calculates physicochemical properties of protein sequences, including molecular weight, isoelectric point, amino acid composition, hydrophobicity, and instability index.
Input
Format
- Type: FASTA format protein sequence
- Encoding: UTF-8
- Validation:
- Only standard 20 amino acid letters (ACDEFGHIKLMNPQRSTVWY) allowed
- Sequence must not be empty
- Optional: header line starting with > is ignored
Example Input
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha
MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSH
CLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Steps
Step 1: Parse Amino Acid Sequence
1. Remove the header line if present (any line starting with '>')
2. Remove all whitespace and newline characters
3. Convert to uppercase
4. Validate: reject if any non-standard amino acid character is found
5. Store cleaned sequence
Step 2: Calculate Molecular Weight (MW)
1. Sum the molecular weights of all free amino acids
2. Subtract (n-1) * 18.015 for the water released by the n-1 peptide bonds (n = sequence length)
3. Equivalently, sum the residue masses and add 18.015 once for the terminal H and OH; do not apply both corrections
4. Unit: Daltons (Da)
5. Precision: 2 decimal places
Step 3: Calculate Isoelectric Point (pI)
1. Use bisection method to find pH where net charge = 0
2. At each pH estimate:
- Calculate positive charges (N-terminus + K, R, H)
- Calculate negative charges (C-terminus + D, E)
- Net charge = positive - negative
3. Search range: pH 0 to 14
4. Precision: 2 decimal places
5. Use Henderson-Hasselbalch equation for ionizable groups
Step 4: Calculate Amino Acid Composition
1. Count occurrences of each of 20 standard amino acids
2. Calculate percentage for each amino acid
3. Return both count and percentage
Step 5: Calculate Grand Average of Hydropathicity (GRAVY)
1. Use Kyte-Doolittle hydropathy scale
2. GRAVY = sum(amino_acid_hydropathy) / sequence_length
3. Positive = hydrophobic, Negative = hydrophilic
Step 6: Calculate Instability Index (II)
1. Use Guruprasad dipeptide instability weights
2. II = (10 / sequence_length) * sum(dipeptide_weights)
3. II < 40 = stable protein
4. II >= 40 = unstable protein
Step 7: Generate Complete Properties Report
1. Compile all calculated properties
2. Generate JSON output with all results
Output
Format
- Type: JSON (JavaScript Object Notation)
- Encoding: UTF-8
- Content-Type: application/json
Schema
{
"status": "success" | "error",
"sequence": {
"raw": "original input",
"cleaned": "cleaned sequence",
"length": number
},
"properties": {
"molecular_weight": {
"value": number,
"unit": "Da"
},
"isoelectric_point": {
"value": number,
"description": "pH at which net charge is zero"
},
"composition": {
"A": {"count": number, "percentage": number},
"C": {"count": number, "percentage": number},
...
},
"hydropathicity": {
"gravy": number,
"description": "Kyte-Doolittle GRAVY score"
},
"instability_index": {
"value": number,
"classification": "stable" | "unstable"
}
},
"metadata": {
"calculated_at": "ISO timestamp",
"calculator_version": "1.0.0"
}
}
Example Output
{
"status": "success",
"sequence": {
"raw": ">sp|P69905|HBA_HUMAN\nMKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
"cleaned": "MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
"length": 143
},
"properties": {
"molecular_weight": {"value": 15162.87, "unit": "Da"},
"isoelectric_point": {"value": 9.14, "description": "pH at which net charge is zero"},
"composition": {
"A": {"count": 21, "percentage": 14.69},
"C": {"count": 1, "percentage": 0.7},
...
},
"hydropathicity": {"gravy": -0.425, "description": "Kyte-Doolittle GRAVY score"},
"instability_index": {"value": 38.56, "classification": "stable"}
},
"metadata": {
"calculated_at": "2026-04-29T00:00:00Z",
"calculator_version": "1.0.0"
}
}
Tools
Primary Implementation
- Python Standard Library: No external dependencies required
- Method: Direct implementation of ProtParam algorithm
Reference Algorithms
- Guruprasad et al. (1990) - Instability Index
- Kyte & Doolittle (1982) - Hydropathy Scale
- Bjellqvist et al. (1994) - pI calculation
Error Handling
Invalid Sequence
{
"status": "error",
"error": {
"code": "INVALID_SEQUENCE",
"message": "Sequence contains invalid amino acid character(s): X, Z"
}
}
Empty Sequence
{
"status": "error",
"error": {
"code": "EMPTY_SEQUENCE",
"message": "Input sequence is empty"
}
}
Usage Examples
Direct Function Call
from execute import ProteinPropertiesCalculator
calculator = ProteinPropertiesCalculator()
result = calculator.analyze("MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR")
print(result)
Command Line
python execute.py --input test_inputs/protein.fasta --output result.json
Notes
- All calculations assume standard amino acids only
- Post-translational modifications (phosphorylation, glycosylation) are not considered
- Disulfide bonds affect protein stability but are not included in basic II calculation
Integrity Note
This is a formatting cleanup revision. It does not introduce a new scientific claim.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
# SKILL: Protein Properties Calculator
## Metadata
- **Name**: Protein Properties Calculator
- **Version**: 1.0.0
- **Category**: biochemical-analysis
- **Tags**: protein, biophysics, amino-acid, physicochemical
## Description
Calculates physicochemical properties of protein sequences, including molecular weight, isoelectric point, amino acid composition, hydrophobicity, and instability index.
## Input
### Format
- **Type**: FASTA format protein sequence
- **Encoding**: UTF-8
- **Validation**:
- Only standard 20 amino acid letters (ACDEFGHIKLMNPQRSTVWY) allowed
- Sequence must not be empty
- Optional: header line starting with `>` is ignored
### Example Input
```
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha
MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSH
CLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
```
## Steps
### Step 1: Parse Amino Acid Sequence
```
1. Remove the header line if present (any line starting with '>')
2. Remove all whitespace and newline characters
3. Convert to uppercase
4. Validate: reject if any non-standard amino acid character is found
5. Store cleaned sequence
```
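The parsing step can be sketched in a few lines of standard-library Python. The function name `parse_fasta_protein` is hypothetical and not taken from `execute.py`. The sketch drops header lines before stripping whitespace, because a header can only be recognized while line breaks are still present.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_protein(text: str) -> str:
    """Return the cleaned, validated sequence from FASTA-style input."""
    # Drop header lines first, while line boundaries still exist.
    body_lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    # Remove all remaining whitespace and normalize to uppercase.
    cleaned = "".join("".join(body_lines).split()).upper()
    if not cleaned:
        raise ValueError("Input sequence is empty")
    invalid = sorted(set(cleaned) - VALID_AMINO_ACIDS)
    if invalid:
        raise ValueError(
            "Sequence contains invalid amino acid character(s): " + ", ".join(invalid)
        )
    return cleaned
```

Raising `ValueError` is one possible design; a caller could catch it and translate it into the JSON error payloads described under Error Handling.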
### Step 2: Calculate Molecular Weight (MW)
```
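1. Sum the molecular weights of all free amino acids
2. Subtract (n-1) * 18.015 for the water released by the n-1 peptide bonds (n = sequence length)
3. Equivalently, sum the residue masses and add 18.015 once for the terminal H and OH; do not apply both corrections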
4. Unit: Daltons (Da)
5. Precision: 2 decimal places
```
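A minimal sketch of the molecular weight calculation, assuming commonly tabulated average masses for the free amino acids (the table and the name `molecular_weight` are illustrative, not taken from `execute.py`). Each of the n-1 peptide bonds releases one water molecule, so n-1 waters are subtracted from the sum of free amino acid masses and no further water is added.

```python
# Approximate average masses (Da) of the free amino acids (assumed values).
FREE_AA_MASS = {
    "A": 89.0932, "C": 121.1582, "D": 133.1027, "E": 147.1293, "F": 165.1891,
    "G": 75.0666, "H": 155.1546, "I": 131.1729, "K": 146.1876, "L": 131.1729,
    "M": 149.2113, "N": 132.1179, "P": 115.1305, "Q": 146.1445, "R": 174.2010,
    "S": 105.0926, "T": 119.1192, "V": 117.1463, "W": 204.2252, "Y": 181.1885,
}
WATER = 18.015  # Da, mass of one water molecule


def molecular_weight(seq: str) -> float:
    """Average molecular weight in Da, rounded to 2 decimal places."""
    n = len(seq)
    if n == 0:
        raise ValueError("Input sequence is empty")
    total = sum(FREE_AA_MASS[aa] for aa in seq)
    # One water is lost per peptide bond; a chain of n residues has n - 1 bonds.
    return round(total - (n - 1) * WATER, 2)
```

As a sanity check, the dipeptide `"GG"` (glycylglycine) comes out at 132.12 Da with these masses.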
### Step 3: Calculate Isoelectric Point (pI)
```
1. Use bisection method to find pH where net charge = 0
2. At each pH estimate:
- Calculate positive charges (N-terminus + K, R, H)
- Calculate negative charges (C-terminus + D, E)
- Net charge = positive - negative
3. Search range: pH 0 to 14
4. Precision: 2 decimal places
5. Use Henderson-Hasselbalch equation for ionizable groups
```
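The bisection search can be sketched as follows. The pK values are illustrative assumptions (published pK sets differ), and the function names are hypothetical. The sketch follows the groups listed above (N-terminus, K, R, H and C-terminus, D, E); some published pK sets also treat the Cys and Tyr side chains as ionizable, which this sketch omits.

```python
# Illustrative pK values; treat these as assumptions, not a reference set.
POSITIVE_PK = {"Nterm": 9.0, "K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PK = {"Cterm": 2.0, "D": 4.05, "E": 4.45}


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    pos_counts = {"Nterm": 1, "K": seq.count("K"), "R": seq.count("R"), "H": seq.count("H")}
    neg_counts = {"Cterm": 1, "D": seq.count("D"), "E": seq.count("E")}
    # Fraction of each basic group that is protonated (charge +1).
    positive = sum(n / (1.0 + 10 ** (ph - POSITIVE_PK[g])) for g, n in pos_counts.items())
    # Fraction of each acidic group that is deprotonated (charge -1).
    negative = sum(n / (1.0 + 10 ** (NEGATIVE_PK[g] - ph)) for g, n in neg_counts.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect pH in [0, 14] until the net charge changes sign within tol."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Net charge decreases monotonically with pH, so a positive
        # charge means the pI lies above the midpoint.
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return round((lo + hi) / 2.0, 2)
```

Bisection is sufficient here because the net charge is a monotonically decreasing function of pH, so there is exactly one zero crossing in the search range.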
### Step 4: Calculate Amino Acid Composition
```
1. Count occurrences of each of 20 standard amino acids
2. Calculate percentage for each amino acid
3. Return both count and percentage
```
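The composition step maps directly onto `collections.Counter`. The sketch below returns the same `count`/`percentage` shape used in the output schema; the function name `composition` is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict:
    """Count and percentage for each of the 20 standard amino acids."""
    counts = Counter(seq)
    n = len(seq)
    return {
        aa: {
            "count": counts[aa],
            # Guard against division by zero for an empty sequence.
            "percentage": round(100.0 * counts[aa] / n, 2) if n else 0.0,
        }
        for aa in AMINO_ACIDS
    }
```

Residues that do not occur are still reported with a count of zero, so the result always has 20 entries.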
### Step 5: Calculate Grand Average of Hydropathicity (GRAVY)
```
1. Use Kyte-Doolittle hydropathy scale
2. GRAVY = sum(amino_acid_hydropathy) / sequence_length
3. Positive = hydrophobic, Negative = hydrophilic
```
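A short sketch of the GRAVY calculation using the Kyte-Doolittle hydropathy values. The function name `gravy` and the rounding to three decimals are assumptions chosen to match the example output format.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    if not seq:
        raise ValueError("Input sequence is empty")
    return round(sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq), 3)
```

For example, `gravy("AI")` averages 1.8 and 4.5 to give 3.15, a strongly hydrophobic score.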
### Step 6: Calculate Instability Index (II)
```
1. Use Guruprasad dipeptide instability weights
2. II = (10 / sequence_length) * sum(dipeptide_weights)
3. II < 40 = stable protein
4. II >= 40 = unstable protein
```
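The instability index can be sketched as a sum over adjacent residue pairs. The published definition scales the sum by 10 / L, where L is the sequence length. The full 400-entry dipeptide weight table from Guruprasad et al. is not reproduced here; the sketch takes it as a parameter, and the function name `instability_index` is hypothetical.

```python
def instability_index(seq: str, dipeptide_weights: dict) -> dict:
    """Instability index from a caller-supplied dipeptide weight table.

    dipeptide_weights maps two-letter strings such as "AC" to a weight.
    The real table is not included here and must be provided by the caller.
    """
    if len(seq) < 2:
        raise ValueError("need at least two residues to form a dipeptide")
    # Sum the weight of every overlapping pair (x_i, x_{i+1}).
    total = sum(dipeptide_weights[seq[i:i + 2]] for i in range(len(seq) - 1))
    value = (10.0 / len(seq)) * total
    return {
        "value": round(value, 2),
        "classification": "stable" if value < 40 else "unstable",
    }
```

With a made-up two-entry table `{"AC": 2.0, "CA": -1.0}` (not real weights), the sequence `"ACA"` gives a pair sum of 1.0 and an index of 10 / 3 * 1.0, which rounds to 3.33.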
### Step 7: Generate Complete Properties Report
```
1. Compile all calculated properties
2. Generate JSON output with all results
```
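Compiling the report amounts to assembling a dictionary in the shape of the output schema and serializing it with the standard `json` module. The function name `build_report` and its parameters are hypothetical; the `properties` argument is assumed to hold the results of the earlier steps.

```python
import json
from datetime import datetime, timezone


def build_report(raw: str, cleaned: str, properties: dict, version: str = "1.0.0") -> str:
    """Serialize the calculated properties as a JSON success report."""
    report = {
        "status": "success",
        "sequence": {"raw": raw, "cleaned": cleaned, "length": len(cleaned)},
        "properties": properties,
        "metadata": {
            # ISO 8601 timestamp in UTC, truncated to whole seconds.
            "calculated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "calculator_version": version,
        },
    }
    return json.dumps(report, indent=2)
```

Deriving `length` from the cleaned sequence, rather than passing it in separately, keeps the reported length consistent with the sequence that was actually analyzed.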
## Output
### Format
- **Type**: JSON (JavaScript Object Notation)
- **Encoding**: UTF-8
- **Content-Type**: application/json
### Schema
```json
{
"status": "success" | "error",
"sequence": {
"raw": "original input",
"cleaned": "cleaned sequence",
"length": number
},
"properties": {
"molecular_weight": {
"value": number,
"unit": "Da"
},
"isoelectric_point": {
"value": number,
"description": "pH at which net charge is zero"
},
"composition": {
"A": {"count": number, "percentage": number},
"C": {"count": number, "percentage": number},
...
},
"hydropathicity": {
"gravy": number,
"description": "Kyte-Doolittle GRAVY score"
},
"instability_index": {
"value": number,
"classification": "stable" | "unstable"
}
},
"metadata": {
"calculated_at": "ISO timestamp",
"calculator_version": "1.0.0"
}
}
```
### Example Output
```json
{
"status": "success",
"sequence": {
"raw": ">sp|P69905|HBA_HUMAN\nMKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
"cleaned": "MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR",
"length": 143
},
"properties": {
"molecular_weight": {"value": 15162.87, "unit": "Da"},
"isoelectric_point": {"value": 9.14, "description": "pH at which net charge is zero"},
"composition": {
"A": {"count": 21, "percentage": 14.69},
"C": {"count": 1, "percentage": 0.7},
...
},
"hydropathicity": {"gravy": -0.425, "description": "Kyte-Doolittle GRAVY score"},
"instability_index": {"value": 38.56, "classification": "stable"}
},
"metadata": {
"calculated_at": "2026-04-29T00:00:00Z",
"calculator_version": "1.0.0"
}
}
```
## Tools
### Primary Implementation
- **Python Standard Library**: No external dependencies required
- **Method**: Direct implementation of ProtParam algorithm
### Reference Algorithms
- Guruprasad et al. (1990) - Instability Index
- Kyte & Doolittle (1982) - Hydropathy Scale
- Bjellqvist et al. (1994) - pI calculation
## Error Handling
### Invalid Sequence
```json
{
"status": "error",
"error": {
"code": "INVALID_SEQUENCE",
"message": "Sequence contains invalid amino acid character(s): X, Z"
}
}
```
### Empty Sequence
```json
{
"status": "error",
"error": {
"code": "EMPTY_SEQUENCE",
"message": "Input sequence is empty"
}
}
```
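One way to produce these two payloads is a small validation helper that returns an error dictionary, or `None` when the sequence is acceptable. The names `error_payload` and `check_sequence` are hypothetical; the codes and messages follow the examples above.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def error_payload(code: str, message: str) -> dict:
    """Build an error response in the documented shape."""
    return {"status": "error", "error": {"code": code, "message": message}}


def check_sequence(cleaned: str):
    """Return an error payload for a bad sequence, or None if it is valid."""
    if not cleaned:
        return error_payload("EMPTY_SEQUENCE", "Input sequence is empty")
    # Report each offending character once, in sorted order.
    invalid = sorted(set(cleaned) - VALID_AMINO_ACIDS)
    if invalid:
        return error_payload(
            "INVALID_SEQUENCE",
            "Sequence contains invalid amino acid character(s): " + ", ".join(invalid),
        )
    return None
```

Checking for emptiness before checking characters means an empty input always yields `EMPTY_SEQUENCE` rather than passing validation vacuously.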
## Usage Examples
### Direct Function Call
```python
from execute import ProteinPropertiesCalculator
calculator = ProteinPropertiesCalculator()
result = calculator.analyze("MKVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR")
print(result)
```
### Command Line
```bash
python execute.py --input test_inputs/protein.fasta --output result.json
```
## Notes
- All calculations assume standard amino acids only
- Post-translational modifications (phosphorylation, glycosylation) are not considered
- Disulfide bonds affect protein stability but are not included in basic II calculation