MutationalSignatureEngine: NMF Trinucleotide Signature Extraction, COSMIC Assignment, and Mutational Etiology Inference
Introduction
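Mutational signatures decompose the somatic mutation spectrum into the contributions of underlying mutagenic processes. The COSMIC database catalogs more than 60 single-base-substitution (SBS) signatures, including SBS1 (aging), SBS2 and SBS13 (APOBEC cytidine deaminase activity), SBS3 (homologous recombination deficiency, HRD), SBS4 (tobacco smoking), and SBS7 (ultraviolet light). Non-negative matrix factorization (NMF) decomposes the 96-channel trinucleotide count matrix into a set of signatures and their per-sample exposures.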
Methods
96-Channel Spectrum
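Each single-base substitution is classified by its substitution type and its flanking context. Because a mutation and its reverse complement are indistinguishable, substitutions are referenced to the pyrimidine of the mutated base pair, giving six types (C>A, C>G, C>T, T>A, T>C, T>G). Combined with the 16 possible pairs of 5′ and 3′ flanking bases, this yields 6 × 16 = 96 channels.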
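The classification above can be sketched as follows. This is a minimal illustration, assuming each mutation is given as its reference trinucleotide (5′ base, mutated base, 3′ base) plus the alternate base; the function names `channel_labels`, `classify`, and `spectrum` are hypothetical, and channels are ordered by substitution type, then 5′ base, then 3′ base.

```python
from itertools import product

BASES = "ACGT"
# Six substitution types, referenced to the pyrimidine (C or T) of the mutated base pair
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def channel_labels():
    """Return the 96 channel labels in the form 'A[C>A]A'."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]

def classify(context, alt):
    """Map a reference trinucleotide and alternate base to one of the 96 channels."""
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        # Purine reference: switch to the reverse-complement strand so the
        # mutated base is a pyrimidine
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def spectrum(mutations):
    """Count (context, alt) pairs into a 96-length vector ordered like channel_labels()."""
    labels = channel_labels()
    index = {label: i for i, label in enumerate(labels)}
    counts = [0] * len(labels)
    for context, alt in mutations:
        counts[index[classify(context, alt)]] += 1
    return counts
```

For example, a G>A change in the context TGA is counted in the same channel as its strand equivalent, T[C>T]A.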
NMF
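The mutation count matrix M (96 × n, for n samples) is approximated as M ≈ W × H, where W (96 × k) holds the k signatures and H (k × n) holds each sample's exposures; all three matrices are non-negative. The rank k is chosen by the cophenetic correlation coefficient, which measures how stably samples cluster across repeated NMF runs.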
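The factorization and rank selection can be sketched as below. The source does not state which NMF solver or initialization was used; this sketch assumes Lee–Seung multiplicative updates for the Frobenius loss with random initialization, and computes the cophenetic correlation in the consensus-clustering style of Brunet et al. (2004), assigning each sample to its dominant signature in each run. The functions `nmf` and `cophenetic_correlation` and their parameters are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import average, cophenet
from scipy.spatial.distance import squareform

def nmf(M, k, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative M (96 x n) as W (96 x k) @ H (k x n) using
    multiplicative updates, which keep W and H non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((M.shape[0], k)) + eps
    H = rng.random((k, M.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cophenetic_correlation(M, k, n_runs=20):
    """Stability of rank k: build a consensus matrix of how often two samples
    share a dominant signature across runs, then correlate 1 - consensus
    with the cophenetic distances of its average-linkage tree."""
    n = M.shape[1]
    consensus = np.zeros((n, n))
    for run in range(n_runs):
        _, H = nmf(M, k, seed=run)
        labels = H.argmax(axis=0)  # dominant signature per sample
        consensus += labels[:, None] == labels[None, :]
    consensus /= n_runs
    dist = 1.0 - consensus
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    c, _ = cophenet(average(condensed), condensed)
    return c
```

In practice one would evaluate `cophenetic_correlation` over a range of k and pick a rank where the coefficient stays high before dropping off.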
COSMIC Assignment
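Each extracted signature is compared with the COSMIC reference signatures by cosine similarity: cos(sig_i, COSMIC_j) = (sig_i · COSMIC_j) / (||sig_i|| × ||COSMIC_j||). For non-negative profiles the value lies between 0 and 1, where 1 means the two 96-channel profiles have identical shape up to scale.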
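A sketch of the similarity computation and a simple assignment step follows. It assumes each extracted signature is matched to the single reference with the highest cosine similarity (no minimum-similarity cutoff), and the etiology lookup reuses only the labels listed in the introduction. The names `cosine_matrix`, `assign`, and `ETIOLOGY` are hypothetical, and the reference matrix `R` would hold real COSMIC profiles in practice.

```python
import numpy as np

# Etiologies as listed in the introduction, keyed by COSMIC SBS identifier
ETIOLOGY = {"SBS1": "aging", "SBS2": "APOBEC", "SBS13": "APOBEC",
            "SBS3": "HRD", "SBS4": "tobacco", "SBS7": "UV"}

def cosine_matrix(W, R):
    """Cosine similarity between extracted signatures (columns of W, 96 x k)
    and reference signatures (columns of R, 96 x m); returns a k x m matrix."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    Rn = R / np.linalg.norm(R, axis=0, keepdims=True)
    return Wn.T @ Rn

def assign(W, R, ref_names):
    """Assign each extracted signature to its most similar reference signature,
    returning (reference name, similarity, etiology) per extracted signature."""
    sims = cosine_matrix(W, R)
    best = sims.argmax(axis=1)
    return [(ref_names[j], float(sims[i, j]), ETIOLOGY.get(ref_names[j], "unknown"))
            for i, j in enumerate(best)]
```

Because each column is normalized first, a signature and a rescaled copy of it score exactly 1, so differences in total mutation burden do not affect the match.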
Results
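The optimal rank was k = 5, with a cophenetic correlation of 0.849 and a reconstruction error of 0.010. Inferred etiologies included reactive oxygen species (ROS), UV light, and APOBEC activity.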
Code Availability