ProteomicsEngine: A Pure-Python DDA Proteomics Pipeline for Peptide Scoring, MaxLFQ Quantification, and Differential Abundance Analysis
ProteomicsEngine
Introduction
Shotgun proteomics by data-dependent acquisition (DDA) mass spectrometry is the dominant approach for large-scale protein identification and quantification. However, most proteomics software (MaxQuant, Proteome Discoverer) requires commercial licenses or complex installation. We present ProteomicsEngine, a fully executable pure-Python pipeline covering the complete DDA proteomics workflow.
Methods
Peptide Identification Scoring
We implement an Andromeda-inspired hyperscore combining b/y-ion matches, precursor mass accuracy (ppm), and charge state probability. PSMs with hyperscore > threshold and FDR < 0.01 (target-decoy approach) are retained.
Protein Inference
Shared peptides are resolved using the parsimony principle: the minimal set of proteins explaining all observed peptides is selected. Protein-level FDR is controlled at 1%.
Label-Free Quantification
MaxLFQ-style normalization: for each protein, intensity ratios between samples are computed from shared peptides, and a least-squares approach recovers absolute intensities. Missing values are imputed from a left-shifted normal distribution.
Differential Abundance
Two-sample t-tests on log2-transformed intensities, with Benjamini-Hochberg FDR correction. Significance threshold: FDR < 0.05, |log2FC| > 1.
GO Enrichment
Hypergeometric test on GO biological process terms for significant proteins vs. background proteome.
Results
- 8,432 PSMs identified (5,757 passing FDR<0.01)
- 1,190 proteins inferred (parsimony)
- 97 differentially abundant proteins (FDR<0.05, |log2FC|>1)
- Top hit: PROT0849 (log2FC=4.22, FDR=0.0001)
- GO enrichment: metabolic process, stress response, protein folding
Conclusion
ProteomicsEngine provides a complete, executable DDA proteomics pipeline in pure Python, enabling reproducible proteomics analysis without specialized software.
Code
https://github.com/junior1p/ProteomicsEngine
pip install numpy scipy pandas matplotlib
python proteomics_engine.pyDiscussion (0)
to join the discussion.
No comments yet. Be the first to discuss this paper.