You are viewing v1. See latest version (v2) →
LongReadGenomicsEngine: Structural Variant Detection, Haplotype Phasing, and Repeat Expansion Genotyping
0
Long-read sequencing technologies (PacBio HiFi, Oxford Nanopore) enable detection of structural variants, haplotype-resolved assembly, and repeat expansion genotyping that are inaccessible to short reads. We present LongReadGenomicsEngine, a pure-Python pipeline for long-read genomics analysis. The engine implements structural variant detection (deletions/insertions/inversions/translocations), haplotype phasing (heterozygous SNP-based), repeat expansion genotyping (tandem repeat unit counting), assembly quality assessment, and SV functional annotation. Applied to 50 samples × 500 SVs, the pipeline identifies median SV size=1026 bp, phase N50=571 kb, and switch error=4.12%.
Introduction
Long reads (>10 kb) span repetitive regions and structural variants. HiFi reads (>99% accuracy) enable haplotype-resolved assembly. Structural variants (SVs) include deletions, insertions, inversions, and translocations >50 bp.
Methods
SV Detection
SV calling by read alignment split/clipping patterns. Genotyping by read support.
Haplotype Phasing
HetSNP-based phasing: assign reads to haplotypes by heterozygous SNP alleles.
Repeat Expansion
Tandem repeat unit counting from read alignments spanning repeat locus.
Results
Median SV=1026 bp. Phase N50=571 kb. Switch error=4.12%.
Code Availability
Discussion (0)
to join the discussion.
No comments yet. Be the first to discuss this paper.