Phylogenetic analysis is fundamental to evolutionary biology, comparative genomics, and molecular epidemiology. We present PhyloEngine, a pure Python implementation of core phylogenetic algorithms requiring only NumPy and SciPy.
Network medicine leverages the topology of protein-protein interaction (PPI) networks to understand disease mechanisms and identify drug repurposing opportunities. We present NetworkMedicineEngine, a pure Python framework implementing core network medicine algorithms: disease module identification via largest connected component (LCC) analysis with permutation-based significance testing, module expansion via the DIAMOnD algorithm, drug-target network proximity computation, and disease-disease similarity analysis.
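The LCC significance test described above can be sketched with SciPy's sparse graph routines. This is a minimal illustration and not NetworkMedicineEngine's API. The helper names (`build_adjacency`, `lcc_size`, `lcc_significance`) are hypothetical. Random gene sets are drawn uniformly from the network here, although degree-matched sampling is a common refinement.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def build_adjacency(edges, n_nodes):
    """Symmetric sparse adjacency matrix from an (m, 2) array of undirected edges."""
    edges = np.asarray(edges, dtype=int)
    rows = np.concatenate([edges[:, 0], edges[:, 1]])
    cols = np.concatenate([edges[:, 1], edges[:, 0]])
    data = np.ones(len(rows))
    return csr_matrix((data, (rows, cols)), shape=(n_nodes, n_nodes))


def lcc_size(adj, nodes):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = np.asarray(nodes, dtype=int)
    if nodes.size == 0:
        return 0
    sub = adj[nodes][:, nodes]
    _, labels = connected_components(sub, directed=False)
    return int(np.bincount(labels).max())


def lcc_significance(adj, disease_nodes, n_perm=1000, seed=0):
    """Observed LCC size, z-score and empirical p-value against random node sets."""
    rng = np.random.default_rng(seed)
    k = len(disease_nodes)
    observed = lcc_size(adj, disease_nodes)
    # Null distribution: LCC sizes of uniformly sampled node sets of the same size.
    null = np.array([
        lcc_size(adj, rng.choice(adj.shape[0], size=k, replace=False))
        for _ in range(n_perm)
    ])
    sd = null.std()
    z = (observed - null.mean()) / sd if sd > 0 else float("nan")
    # Add-one correction keeps the empirical p-value strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, z, p
```

Consider ten nodes with edges 0-1, 1-2 and 3-4. In this graph, `lcc_size(adj, [0, 1, 2])` is 3, but `lcc_size(adj, [0, 2])` is 1. That is because nodes 0 and 2 are connected only through node 1, which lies outside the set.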
Metabolomics provides a functional readout of cellular biochemistry, capturing the downstream effects of genetic variation, environmental exposures, and disease states. We present MetabolomicsEngine, a pure Python framework for plasma metabolomics analysis implementing differential metabolite testing, dimensionality reduction, and pathway enrichment.
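Pathway enrichment in metabolomics is often run as an over-representation analysis. One common choice is a one-sided hypergeometric test followed by Benjamini-Hochberg correction. The abstract does not say which enrichment test MetabolomicsEngine uses, so the sketch below is an assumption, and its function names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def pathway_enrichment(hits, pathways, universe):
    """One-sided hypergeometric over-representation test for each pathway."""
    universe = set(universe)
    hits = set(hits) & universe
    M, N = len(universe), len(hits)  # population size, number of draws
    names, overlaps, sizes, pvals = [], [], [], []
    for name, members in pathways.items():
        members = set(members) & universe
        n = len(members)           # pathway metabolites in the universe ("successes")
        k = len(hits & members)    # observed overlap with the significant metabolites
        # P(X >= k) for X ~ Hypergeom(M, n, N)
        pvals.append(float(hypergeom.sf(k - 1, M, n, N)))
        names.append(name)
        overlaps.append(k)
        sizes.append(n)
    qvals = benjamini_hochberg(pvals)
    return [
        {"pathway": nm, "overlap": k, "size": n, "p": p, "q": float(q)}
        for nm, k, n, p, q in zip(names, overlaps, sizes, pvals, qvals)
    ]
```

Here `hits` would be the metabolites called significant by the differential test, and `universe` would be every metabolite that was measured. Restricting pathways to the measured universe avoids inflating enrichment with metabolites the assay could never detect.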
Cell-cell communication via ligand-receptor (LR) interactions orchestrates tissue homeostasis, immune responses, and disease progression. We present LigandReceptorEngine, a pure Python framework for inferring intercellular signaling from single-cell RNA-seq data.
Spatial transcriptomics enables the measurement of gene expression while preserving spatial context, revealing how cellular organization drives tissue function. Here we present SpatialEngine, a pure Python framework for comprehensive spatial transcriptomics analysis that requires no specialized bioinformatics infrastructure.
We present NeoantigenEngine, a complete neoantigen prediction pipeline implemented entirely in Python using NumPy, SciPy, pandas, and matplotlib, with no dependence on NetMHCpan, pVACtools, IEDB, or R. NeoantigenEngine provides five analysis modules: (1) generation of mutant peptides from somatic mutations (9-mer and 10-mer sliding windows), (2) MHC-I binding prediction via built-in position-specific scoring matrices (PSSMs) for HLA-A*02:01, HLA-A*01:01, and HLA-B*07:02, (3) immunogenicity feature computation (Kyte-Doolittle hydrophobicity, net charge, foreignness, aliphatic index), (4) multi-factor neoantigen prioritization (binding × expression × clonal fraction × immunogenicity), and (5) a 6-panel visualization dashboard.
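Modules (1), (3) and (4) can be sketched in plain Python. This is an illustrative sketch, not NeoantigenEngine's code, and all function names are hypothetical. It makes three simplifying assumptions:

- the net-charge rule ignores histidine and the peptide termini;
- each priority factor is already scaled to [0, 1];
- PSSM binding prediction and foreignness are left out.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mutant_peptides(protein, pos, alt, lengths=(9, 10)):
    """All k-mer windows that contain a missense substitution at 0-based `pos`."""
    if protein[pos] == alt:
        raise ValueError("alt allele equals the reference residue")
    mutant = protein[:pos] + alt + protein[pos + 1:]
    peptides = []
    for k in lengths:
        # Window starts are chosen so that pos lies inside [start, start + k).
        for start in range(max(0, pos - k + 1), min(pos, len(protein) - k) + 1):
            peptides.append({
                "length": k,
                "start": start,
                "wt": protein[start:start + k],
                "mut": mutant[start:start + k],
            })
    return peptides


def gravy(peptide):
    """Mean Kyte-Doolittle hydropathy (GRAVY score)."""
    return sum(KD[aa] for aa in peptide) / len(peptide)


def net_charge(peptide):
    """Crude net charge at neutral pH: K/R = +1, D/E = -1 (His and termini ignored)."""
    return sum(aa in "KR" for aa in peptide) - sum(aa in "DE" for aa in peptide)


def aliphatic_index(peptide):
    """Ikai (1980) aliphatic index from mole percentages of A, V, I and L."""
    n = len(peptide)
    pct = {aa: 100.0 * peptide.count(aa) / n for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


def priority_score(binding, expression, clonal_fraction, immunogenicity):
    """Multiplicative priority, assuming every factor is pre-scaled to [0, 1]."""
    return binding * expression * clonal_fraction * immunogenicity
```

Consider a residue far from either terminus. It yields 9 overlapping 9-mers and 10 overlapping 10-mers, and each mutant peptide differs from its wild-type counterpart at exactly one position. Because the score is a product, a candidate with any single weak factor (for example, a subclonal mutation) is strongly down-weighted.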
We present ImmunRepertoire, a complete immune repertoire analysis pipeline implemented entirely in Python using NumPy, SciPy, pandas, and matplotlib — no TRUST4, MiXCR, VDJtools, immunarch, or R required. ImmunRepertoire provides six analysis modules: (1) CDR3 length distribution and amino acid composition profiling, (2) V/D/J gene usage frequency analysis, (3) clonotype definition by exact CDR3 match or Hamming distance clustering, (4) clonal diversity metrics (Shannon entropy, Gini coefficient, D50, Simpson index, clonality), (5) public clonotype detection across multiple samples, and (6) a 6-panel visualization dashboard.
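The diversity metrics in module (4) can be computed from per-clonotype counts with NumPy alone. Definitions vary between tools, so the sketch below states its conventions explicitly:

- Shannon entropy uses the natural log.
- Clonality is 1 minus Pielou evenness.
- Simpson is reported as the sum of squared frequencies; some tools report 1 minus this value or its inverse instead.
- D50 is a percentage of unique clonotypes.

The function name is hypothetical, and the code is not ImmunRepertoire's implementation.

```python
import numpy as np


def diversity_metrics(counts):
    """Clonal diversity summaries from per-clonotype read (or cell) counts."""
    x = np.asarray([c for c in counts if c > 0], dtype=float)
    n = x.size
    total = x.sum()
    p = x / total
    shannon = float(-np.sum(p * np.log(p)))      # natural-log Shannon entropy
    # Clonality = 1 - Pielou evenness; a single clonotype is treated as fully clonal.
    clonality = 1.0 - shannon / np.log(n) if n > 1 else 1.0
    simpson = float(np.sum(p ** 2))              # chance two reads share a clonotype
    # Gini coefficient on counts sorted in ascending order.
    xs = np.sort(x)
    ranks = np.arange(1, n + 1)
    gini = float(2 * np.sum(ranks * xs) / (n * total) - (n + 1) / n)
    # D50: % of the most abundant clonotypes needed to cover half of all reads.
    cum = np.cumsum(xs[::-1])
    n_top = int(np.searchsorted(cum, 0.5 * total)) + 1
    d50 = 100.0 * n_top / n
    return {"shannon": shannon, "clonality": float(clonality),
            "simpson": simpson, "gini": gini, "d50": d50}
```

A perfectly even repertoire scores clonality 0, Gini 0 and D50 50%. A repertoire dominated by one expanded clone pushes D50 down and Gini and clonality up.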
We present RNAVelocity, a complete RNA velocity analysis engine implemented entirely in Python using NumPy and SciPy — no scVelo, velocyto, loom, or anndata required. RNAVelocity implements four velocity models: (1) steady-state ratio estimation (La Manno et al.
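The steady-state model can be written as du/dt = α - βu and ds/dt = βu - γs. With the splicing rate β fixed at 1, the velocity of spliced RNA is v = u - γs. In the sketch below, γ is fit per gene by least squares through the origin, using only cells at the low and high extremes of spliced expression, which are assumed to be near steady state.

This is a minimal sketch, not RNAVelocity's code, and the function name and the 5% quantile are assumptions. Count normalization and neighbor smoothing, which real pipelines typically apply first, are omitted.

```python
import numpy as np


def steady_state_velocity(U, S, quantile=0.05):
    """Steady-state RNA velocity v = u - gamma * s (splicing rate beta = 1).

    U, S: (n_cells, n_genes) unspliced and spliced abundance matrices.
    Returns the per-gene gamma and the (n_cells, n_genes) velocity matrix.
    """
    U = np.asarray(U, dtype=float)
    S = np.asarray(S, dtype=float)
    n_genes = S.shape[1]
    gamma = np.zeros(n_genes)
    for g in range(n_genes):
        s, u = S[:, g], U[:, g]
        # Use cells in the lower and upper quantiles of spliced expression.
        lo, hi = np.quantile(s, [quantile, 1.0 - quantile])
        extreme = (s <= lo) | (s >= hi)
        denom = np.dot(s[extreme], s[extreme])
        # Least-squares slope through the origin: sum(u*s) / sum(s*s).
        gamma[g] = np.dot(u[extreme], s[extreme]) / denom if denom > 0 else 0.0
    velocity = U - gamma * S   # gamma broadcasts across cells
    return gamma, velocity
```

Positive velocity means a cell holds more unspliced RNA than the steady-state line predicts, so the gene is inferred to be inducing. Negative velocity indicates repression.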
We present EpigenomicsEngine, a complete epigenomics analysis pipeline implemented entirely in Python using NumPy, SciPy, and scikit-learn — no MACS2, HOMER, deepTools, Bowtie2, or R required. EpigenomicsEngine provides five analysis modules: (1) fragment-level peak calling via a Poisson-based local background model, (2) differential accessibility testing with DESeq2-style negative binomial dispersion estimation, (3) de novo motif discovery using position weight matrices and JASPAR-style scoring, (4) transcription factor footprinting via Tn5 insertion bias correction, and (5) chromatin state segmentation using a Hidden Markov Model.
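A Poisson test against a local background can be sketched on binned fragment counts, in the spirit of MACS's local lambda. The rate for each bin is the maximum of the genome-wide mean and moving averages over surrounding windows. EpigenomicsEngine works at fragment level, and its exact windows and thresholds are not given. The bin size, window widths, `alpha` and the function names below are therefore illustrative assumptions.

```python
import numpy as np
from scipy.stats import poisson


def call_peaks(counts, bin_size=50, windows_bp=(1000, 10000), alpha=1e-5):
    """Flag bins whose count is improbable under a Poisson local-background model.

    counts: fragment (or Tn5 insertion) counts in fixed-width genomic bins.
    The local rate for each bin is the maximum of the genome-wide mean and the
    mean count in each surrounding window.
    """
    counts = np.asarray(counts)
    lam = np.full(counts.shape, counts.mean(), dtype=float)
    for w in windows_bp:
        width = max(1, w // bin_size)
        # Moving average; zero padding at the edges is offset by the max() below.
        local = np.convolve(counts, np.ones(width) / width, mode="same")
        lam = np.maximum(lam, local)
    pvals = poisson.sf(counts - 1, lam)   # P(X >= observed count)
    return pvals < alpha, pvals


def merge_bins(mask, bin_size=50):
    """Merge runs of consecutive significant bins into (start, end) intervals in bp."""
    peaks, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            peaks.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        peaks.append((start * bin_size, len(mask) * bin_size))
    return peaks
```

Taking the maximum over several window sizes makes the test conservative. A bin has to stand out against both broad regional enrichment and the genome-wide baseline.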
Transcription factor (TF) activity inference from gene expression data is a powerful approach to identify master regulators of cellular states. However, different computational methods often yield inconsistent results, and no consensus exists on which method to use for a given dataset.
Molecular dynamics (MD) simulation analysis typically requires specialized libraries such as MDTraj or MDAnalysis, which have complex dependencies and installation requirements. We present MDAnalysisEngine, a pure NumPy/SciPy implementation of core MD trajectory analysis algorithms that requires only standard scientific Python packages.
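Per-frame RMSD after optimal superposition is one standard trajectory analysis that needs nothing beyond NumPy. The sketch below uses the Kabsch algorithm. It assumes plain (n_atoms, 3) coordinate arrays with atoms already selected and matched, and it applies no mass weighting. The function names are hypothetical rather than MDAnalysisEngine's API.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between (n_atoms, 3) coordinates P and Q after optimal superposition.

    Both sets are centred, then P is rotated onto Q with the Kabsch algorithm
    (SVD of the 3x3 covariance matrix, with a reflection check).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def trajectory_rmsd(traj, reference):
    """Per-frame RMSD of an (n_frames, n_atoms, 3) trajectory to a reference frame."""
    return np.array([kabsch_rmsd(frame, reference) for frame in traj])
```

The reflection check matters. Without it, the SVD solution can mirror a structure onto its enantiomer and report an artificially low RMSD.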
We present CensusDisease, a computational framework for mining disease-specific transcriptional signatures and transcription factor (TF) activity from the CZ CELLxGENE Census, which aggregates over 74 million real single-cell RNA-seq profiles across hundreds of diseases and tissues. Unlike tools that rely on synthetic or curated benchmark datasets, CensusDisease queries live public data directly, enabling zero-download reproducibility and continuous updating as new datasets are deposited.