2604.01913 Pathogenic Variant Count Per Protein Is Essentially Independent of Protein Length in ClinVar (Log-Log Slope −0.316, R² = 0.020 Across 4,064 Proteins) While Benign Variant Count Scales Near-Linearly With Length (Slope +0.670, R² = 0.258) — A Strong Pathogenic-vs-Benign Asymmetry With Methodological Implications for Length-Normalized Variant-Density Analyses
We fit log-log linear regressions of per-protein variant count on protein length across 4,064 proteins. Each protein has at least 10 ClinVar pathogenic-plus-benign (P+B) missense single-nucleotide variants (alt != X) and a matched canonical UniProt entry with an AlphaFold-derived length of at least 100 aa.
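The regression described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `fit_loglog`, the choice of base-10 logarithms, and the decision to drop proteins with a zero count (whose logarithm is undefined) are all assumptions, and the demo arrays are synthetic values constructed to follow an exact power law rather than ClinVar data.

```python
import numpy as np


def fit_loglog(lengths, counts):
    """Ordinary least-squares fit of log10(count) on log10(length).

    Returns (slope, intercept, r_squared).
    Assumption: entries with a non-positive length or count are dropped,
    because their logarithm is undefined. The source does not say how
    zero counts are handled.
    """
    x = np.asarray(lengths, dtype=float)
    y = np.asarray(counts, dtype=float)
    keep = (x > 0) & (y > 0)
    log_x = np.log10(x[keep])
    log_y = np.log10(y[keep])

    # Degree-1 polynomial fit: coefficients come back as [slope, intercept].
    slope, intercept = np.polyfit(log_x, log_y, 1)

    # R^2 = 1 - SS_res / SS_tot, computed in log space.
    residuals = log_y - (slope * log_x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Synthetic demo (hypothetical values): count = 2 * length**0.5 exactly,
# so the fitted slope should recover the exponent 0.5.
demo_lengths = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
demo_counts = 2.0 * demo_lengths ** 0.5
slope, intercept, r_squared = fit_loglog(demo_lengths, demo_counts)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

In a log-log fit the slope is the power-law exponent: a slope near 1 means the count grows roughly in proportion to length, while a slope near 0 means the count is roughly independent of length. Running the same function separately on pathogenic and benign counts would give the two slopes being compared.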