FusionGeneEngine: RNA-seq Fusion Gene Detection with In-Frame Prediction, Oncogenic Scoring, and COSMIC Cancer Gene Lookup
FusionGeneEngine
Introduction
Fusion genes arising from chromosomal translocations, inversions, and deletions are among the most clinically actionable cancer alterations. BCR-ABL1 in chronic myeloid leukemia (CML) and EML4-ALK in non-small cell lung cancer (NSCLC) are targets of approved kinase inhibitors, and TMPRSS2-ERG is a highly recurrent fusion in prostate cancer. We present FusionGeneEngine, a pure-Python pipeline for fusion gene detection and functional characterization.
Methods
Read-Level Evidence Filtering
For each fusion candidate, evidence is aggregated from:
- Split reads: reads spanning the fusion breakpoint
- Discordant pairs: read pairs mapping to different genes
- Junction reads: reads with soft-clipped sequences matching the partner gene
Quality filters: minimum 3 spanning reads, allele frequency > 0.05, mapping quality > 20.
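The filters above can be sketched as a single predicate over the aggregated evidence. This is a minimal illustration, not the pipeline's actual code: the `FusionEvidence` record and its field names are hypothetical, and it assumes that "spanning reads" refers to the split-read count and that all three filters must pass.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_SPANNING_READS = 3
MIN_ALLELE_FREQ = 0.05
MIN_MAPQ = 20


@dataclass
class FusionEvidence:
    """Hypothetical per-candidate evidence record (field names are assumptions)."""
    gene5: str             # 5' partner gene
    gene3: str             # 3' partner gene
    split_reads: int       # reads spanning the fusion breakpoint
    discordant_pairs: int  # read pairs mapping to different genes
    junction_reads: int    # soft-clipped reads matching the partner gene
    allele_freq: float
    mean_mapq: float


def passes_filters(ev: FusionEvidence) -> bool:
    """Return True only if the candidate passes all three quality filters."""
    return (
        ev.split_reads >= MIN_SPANNING_READS  # "minimum 3" is inclusive
        and ev.allele_freq > MIN_ALLELE_FREQ  # strictly greater, as written
        and ev.mean_mapq > MIN_MAPQ           # strictly greater, as written
    )


# Made-up example values for illustration only.
candidate = FusionEvidence("BCR", "ABL1", 5, 4, 2, 0.12, 42.0)
print(passes_filters(candidate))
```

Keeping the thresholds as module-level constants makes it easy to tighten or relax them for a given sequencing depth.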
In-Frame Fusion Prediction
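Exon boundary phases (0, 1, 2) are used to predict whether the reading frame is preserved across the junction. A fusion is called in-frame when the phase at the breakpoint of the 5' partner matches the phase of the 3' partner; in-frame fusions are prioritized because they are likely to produce functional chimeric proteins.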
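A phase-match check of this kind might look like the sketch below. The function name and argument names are hypothetical, and the sketch assumes the common annotation convention in which the end phase of the last retained 5' exon is compared with the start phase of the first retained 3' exon.

```python
VALID_PHASES = (0, 1, 2)


def is_in_frame(end_phase_5p: int, start_phase_3p: int) -> bool:
    """Predict frame preservation from exon boundary phases.

    end_phase_5p:   phase at the end of the last retained exon of the 5' partner
    start_phase_3p: phase at the start of the first retained exon of the 3' partner
    """
    if end_phase_5p not in VALID_PHASES or start_phase_3p not in VALID_PHASES:
        raise ValueError("exon phases must be 0, 1, or 2")
    # The reading frame is preserved when the two phases agree.
    return end_phase_5p == start_phase_3p


print(is_in_frame(1, 1))  # matching phases: in-frame
print(is_in_frame(0, 2))  # mismatched phases: out-of-frame
```

Breakpoints that fall inside an exon or in an untranslated region would need separate handling, which this sketch does not attempt.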
Oncogenic Scoring
Each candidate receives a composite score integrating:
- COSMIC cancer gene census membership (both partners)
- Known fusion database match (20 canonical fusions: BCR-ABL1, EML4-ALK, TMPRSS2-ERG, etc.)
- In-frame status
- Expression level of chimeric transcript
- Recurrence across samples
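One way to combine these five components is a weighted sum, sketched below. The text does not state the weights or the normalization, so every number in `WEIGHTS` is a hypothetical choice (picked so that the maximum is 10), and expression and recurrence are assumed to be pre-scaled to the range 0 to 1. The gene and fusion sets are passed in as arguments; the example sets are illustrative stand-ins, not the real COSMIC census or the full 20-fusion database.

```python
# Hypothetical weights: NOT taken from the source, chosen so the maximum is 10.
WEIGHTS = {
    "cosmic_per_partner": 1.5,  # applied once per partner in the census
    "known_fusion": 3.0,
    "in_frame": 2.0,
    "expression": 1.0,
    "recurrence": 1.0,
}


def clamp01(x: float) -> float:
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def oncogenic_score(gene5, gene3, in_frame, expression, recurrence,
                    cancer_genes, known_fusions):
    """Weighted sum of the five components listed in the text."""
    score = 0.0
    # Census membership is counted for both partners.
    score += WEIGHTS["cosmic_per_partner"] * sum(
        g in cancer_genes for g in (gene5, gene3)
    )
    if f"{gene5}-{gene3}" in known_fusions:
        score += WEIGHTS["known_fusion"]
    if in_frame:
        score += WEIGHTS["in_frame"]
    score += WEIGHTS["expression"] * clamp01(expression)
    score += WEIGHTS["recurrence"] * clamp01(recurrence)
    return score


# Only the three fusions named in the text; the remaining entries are not listed there.
known = {"BCR-ABL1", "EML4-ALK", "TMPRSS2-ERG"}
# Illustrative stand-in for a census lookup.
census = {"BCR", "ABL1"}
print(oncogenic_score("BCR", "ABL1", True, 1.0, 1.0, census, known))
```

A linear sum keeps each component's contribution easy to inspect; a real implementation could just as well use a different functional form.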
Precision/Recall Evaluation
The ground truth consists of 50 true fusions injected into synthetic data. Precision and recall are computed over candidates scoring above the 5.0 threshold.
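The evaluation can be sketched as a set comparison between the thresholded calls and the injected truth set. The function name and the toy fusion names and scores below are made up for illustration; the sketch assumes a call is any candidate scoring strictly above the threshold.

```python
def evaluate(predictions, truth, threshold=5.0):
    """Compute precision, recall, and F1 for calls scoring above the threshold.

    predictions: iterable of (fusion_name, score) pairs
    truth:       set of fusion names injected as ground truth
    """
    called = {name for name, score in predictions if score > threshold}
    tp = len(called & truth)   # true fusions that were called
    fp = len(called - truth)   # calls that are not true fusions
    fn = len(truth - called)   # true fusions that were missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data for illustration only (not the paper's dataset).
toy_truth = {"A-B", "C-D"}
toy_predictions = [("A-B", 9.8), ("C-D", 6.1), ("E-F", 5.5), ("G-H", 2.0)]
precision, recall, f1 = evaluate(toy_predictions, toy_truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With 50 true fusions all recovered among 52 calls, the same function gives a precision of 50/52, which rounds to 0.962.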
Results
- 200 fusion candidates evaluated
- 52 high-confidence fusions (score > 5.0)
- Precision = 0.962 (50 of 52 calls), recall = 1.000 (50 of 50 injected fusions), F1 = 0.980
- 17 in-frame fusions
- 14 matching known oncogenic database
- Top: BCR-ABL1 (score=9.8, CML driver, in-frame, COSMIC tier 1)
Conclusion
FusionGeneEngine provides a complete, executable fusion gene detection pipeline that recovered all injected fusions (recall 1.000) at a precision of 0.962 on synthetic RNA-seq data.
Code
https://github.com/junior1p/FusionGeneEngine
pip install numpy scipy pandas matplotlib
python fusion_gene_engine.py