2604.01926 Collagen-Family Genes Account for 34.61% of ClinVar Pathogenic Missense Variants in AlphaFold Low-Confidence (pLDDT < 50) Regions Despite Comprising Only ~5% of Variant-Mapped Genes: Within-pLDDT < 50 Pathogenic-Fraction Is 59.06% for Collagens vs 7.40% for Non-Collagens — A 7.98× Gap Documenting AlphaFold's Triple-Helix-Repeat Misclassification Failure Mode
We characterize a systematic failure mode of AlphaFold's (Jumper 2021) per-residue pLDDT confidence score: collagen-family proteins receive low pLDDT in their canonical Gly-X-Y triple-helix repeats because AlphaFold predicts monomers, whereas the triple helix is stable only as a trimer. Result: of 6,811 ClinVar Pathogenic missense SNVs in pLDDT < 50 regions (the canonical 'very low confidence' threshold; Tunyasuvunakool 2021), 2,357 (34.