LongReadGenomicsEngine: Structural Variant Detection, Haplotype Phasing, and Repeat Expansion Genotyping
Long-read sequencing technologies (PacBio HiFi, Oxford Nanopore) enable structural variant detection, haplotype-resolved assembly, and repeat expansion genotyping that are inaccessible to short reads. We present LongReadGenomicsEngine, a pure-Python pipeline for long-read genomics analysis. The engine implements structural variant (SV) detection (deletions, insertions, inversions, and translocations), haplotype phasing based on heterozygous SNPs, repeat expansion genotyping by tandem repeat unit counting, assembly quality assessment, and SV functional annotation. Applied to a synthetically generated dataset of 50 samples × 500 SVs, the pipeline reports a median SV size of 1026 bp, a phase block N50 of 571 kb, and a switch error rate of 4.12%.
Introduction
Long reads (>10 kb) can span repetitive regions and structural variants within a single read. HiFi reads (>99% accuracy) additionally enable haplotype-resolved assembly. Structural variants (SVs) are rearrangements larger than 50 bp and include deletions, insertions, inversions, and translocations.
Methods
SV Detection
SVs are called from split-read and soft-clipping patterns in the read alignments. Each call is then genotyped from its read support.
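The calling and genotyping steps can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names `sv_signatures` and `genotype`, the 50 bp size cutoff applied to clips, and the 0.2/0.8 allele-fraction thresholds are all assumptions made for illustration. The first function walks a CIGAR string and reports large deletions, insertions, and soft clips; the second turns supporting and total read counts into a diploid genotype.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the reference coordinate
REF_CONSUMING = set("MDN=X")


def sv_signatures(ref_start: int, cigar: str,
                  min_size: int = 50) -> List[Tuple[str, int, int]]:
    """Return (kind, reference_position, length) signatures from one alignment.

    min_size is an assumed cutoff; it is applied to clips as well as indels.
    """
    signatures = []
    pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if length >= min_size:
            if op == "D":
                signatures.append(("DEL", pos, length))
            elif op == "I":
                signatures.append(("INS", pos, length))
            elif op == "S":
                # A long soft clip hints at a breakpoint at this position
                signatures.append(("CLIP", pos, length))
        if op in REF_CONSUMING:
            pos += length
    return signatures


def genotype(support: int, total: int,
             het_min: float = 0.2, hom_min: float = 0.8) -> str:
    """Genotype an SV from read support (thresholds are hypothetical)."""
    if total == 0:
        return "./."  # no coverage, no call
    fraction = support / total
    if fraction >= hom_min:
        return "1/1"
    if fraction >= het_min:
        return "0/1"
    return "0/0"


if __name__ == "__main__":
    print(sv_signatures(1000, "20S500M120D300M80I100M"))
    print(genotype(support=5, total=10))
```

In practice, signatures from many reads would be clustered by position and size before genotyping; that step is omitted here to keep the sketch short.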
Haplotype Phasing
Phasing is based on heterozygous SNPs: each read is assigned to a haplotype according to the alleles it carries at heterozygous SNP sites.
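Read-to-haplotype assignment can be sketched as a simple vote over heterozygous sites. This is an illustrative example under stated assumptions, not the engine's implementation: the function name `assign_haplotype`, the dictionary-based inputs, and the `min_margin` tie-breaking parameter are hypothetical.

```python
from typing import Dict, Tuple


def assign_haplotype(read_alleles: Dict[int, str],
                     phased_snps: Dict[int, Tuple[str, str]],
                     min_margin: int = 1) -> int:
    """Return 1 or 2 for the better-matching haplotype, or 0 if ambiguous.

    read_alleles: position -> base observed in the read.
    phased_snps:  position -> (haplotype 1 allele, haplotype 2 allele).
    min_margin:   assumed minimum vote difference needed to make a call.
    """
    votes = [0, 0]
    for pos, base in read_alleles.items():
        if pos not in phased_snps:
            continue  # not a heterozygous site, carries no phase information
        hap1_allele, hap2_allele = phased_snps[pos]
        if base == hap1_allele:
            votes[0] += 1
        elif base == hap2_allele:
            votes[1] += 1
        # any other base is treated as a sequencing error and ignored
    if abs(votes[0] - votes[1]) < min_margin:
        return 0
    return 1 if votes[0] > votes[1] else 2


if __name__ == "__main__":
    snps = {100: ("A", "G"), 250: ("C", "T"), 900: ("G", "A")}
    print(assign_haplotype({100: "A", 250: "C", 900: "G"}, snps))
    print(assign_haplotype({100: "G", 250: "T"}, snps))
```

Reads that cover no heterozygous site, or that split their votes evenly, are left unassigned rather than forced onto a haplotype.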
Repeat Expansion
Repeat expansions are genotyped by counting tandem repeat units in read alignments that span the repeat locus.
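A minimal sketch of repeat unit counting is given below. It assumes exact motif matches and that each input sequence is the portion of a spanning read covering the locus; the names `count_repeat_units` and `locus_repeat_count` and the use of the lower median across reads are illustrative choices, not taken from the pipeline.

```python
import statistics
from typing import List, Optional


def count_repeat_units(read_seq: str, motif: str) -> int:
    """Length (in units) of the longest run of back-to-back exact motif copies."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    step = len(motif)
    # Try every start offset so runs in any phase of the read are found
    for start in range(len(read_seq) - step + 1):
        run = 0
        pos = start
        while read_seq.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


def locus_repeat_count(spanning_reads: List[str], motif: str) -> Optional[int]:
    """Summarize per-read counts with the lower median (an assumed choice)."""
    if not spanning_reads:
        return None  # no spanning reads, no estimate
    counts = [count_repeat_units(seq, motif) for seq in spanning_reads]
    return statistics.median_low(counts)


if __name__ == "__main__":
    reads = ["TTCAGCAGCAGTT", "GGCAGCAGCAGAA", "CAGCAGCAGCAG"]
    print([count_repeat_units(r, "CAG") for r in reads])
    print(locus_repeat_count(reads, "CAG"))
```

Exact matching is a simplification: real long reads contain errors inside repeats, so a production implementation would need to tolerate interrupted units. For a heterozygous locus, the counts would be summarized per haplotype rather than pooled.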
Results
On the synthetic dataset (50 samples × 500 SVs), the median SV size was 1026 bp, the phase block N50 was 571 kb, and the switch error rate was 4.12%.
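For readers unfamiliar with the two phasing metrics, the sketch below shows their conventional definitions. Whether the pipeline computes them in exactly this way is an assumption; the function names `n50` and `switch_error_rate` are hypothetical. N50 is the block length at which the cumulative length of blocks, sorted from longest to shortest, first reaches half of the total. Switch error is the fraction of adjacent heterozygous site pairs at which the inferred phase flips relative to the truth.

```python
from typing import List, Sequence


def n50(lengths: List[int]) -> int:
    """Length L such that blocks of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def switch_error_rate(truth: Sequence[int], inferred: Sequence[int]) -> float:
    """Fraction of adjacent site pairs where the inferred phase switches.

    truth and inferred hold a haplotype label (0 or 1) per heterozygous site
    within one phase block. A globally flipped block counts as zero switches.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must have the same length")
    if len(truth) < 2:
        return 0.0
    agree = [t == p for t, p in zip(truth, inferred)]
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(truth) - 1)


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))
    print(switch_error_rate([0, 0, 1, 1, 0], [0, 0, 0, 0, 1]))
```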
Code Availability
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: long-read-genomics-engine
description: Structural variant detection, haplotype phasing, and repeat expansion genotyping from long reads
allowed-tools: Bash(python *)
---

# Steps to reproduce

1. Clone the repository:
   ```bash
   git clone https://github.com/BioTender-max/LongReadGenomicsEngine
   cd LongReadGenomicsEngine
   ```
2. Install dependencies:
   ```bash
   pip install numpy scipy matplotlib
   ```
3. Run the analysis:
   ```bash
   python long_read_genomics_engine.py
   ```
4. Output: `long_read_genomics_engine_dashboard.png`, a 9-panel dark-theme dashboard summarizing all key results.

> Requires Python 3.8+. No external data downloads needed: all data is synthetically generated with seed=42 for full reproducibility.