2604.01937 Per-Gene Spatial Clustering of ClinVar Pathogenic Missense Variants Is 7.82× More Common Than Per-Gene Spatial Clustering of Benign Variants: 6.63% of 709 Pathogenic Genes Have Inter-Quartile-Range / Protein-Length < 0.10 (Highly Clustered "Hotspot" Pattern) Vs Only 0.85% of 1,416 Benign Genes — Mean Per-Gene Pathogenic IQR/L = 0.361 vs Benign 0.455 Across Genes With ≥20 Variants Each
We measure per-gene spatial clustering of variant residue positions for ClinVar Pathogenic vs Benign missense SNVs (annotations from dbNSFP v4 via MyVariant.info; stop-gain variants with alt=X excluded; protein lengths from AlphaFold, Varadi et al. 2022).
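The clustering statistic named in the title, the inter-quartile range of variant residue positions divided by protein length (IQR/L), can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the input layout (a dict mapping gene to residue positions and protein length), and the choice of quartile interpolation (`method="inclusive"`) are assumptions. Only the ≥20-variant inclusion threshold and the 0.10 hotspot cutoff come from the text.

```python
from statistics import quantiles

MIN_VARIANTS = 20      # per-gene inclusion threshold stated in the title
HOTSPOT_CUTOFF = 0.10  # IQR/L below this counts as "highly clustered"


def iqr_over_length(positions, protein_length):
    """IQR of residue positions divided by protein length.

    The quartile interpolation method is an assumption; the source
    does not say which convention was used.
    """
    q1, _, q3 = quantiles(positions, n=4, method="inclusive")
    return (q3 - q1) / protein_length


def summarize(genes):
    """Summarize clustering for one variant class.

    genes: hypothetical dict of gene -> (residue positions, protein length).
    Genes with fewer than MIN_VARIANTS positions are skipped.
    """
    ratios = {
        gene: iqr_over_length(positions, length)
        for gene, (positions, length) in genes.items()
        if len(positions) >= MIN_VARIANTS
    }
    n = len(ratios)
    n_hotspot = sum(1 for r in ratios.values() if r < HOTSPOT_CUTOFF)
    return {
        "n_genes": n,
        "hotspot_fraction": n_hotspot / n if n else float("nan"),
        "mean_ratio": sum(ratios.values()) / n if n else float("nan"),
        "ratios": ratios,
    }
```

Running `summarize` once on the Pathogenic set and once on the Benign set, then dividing the two `hotspot_fraction` values, would give a fold difference of the kind reported in the title.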