AlphaMissense Pathogenic-Benign Mean-Score Gap Across 430 Human ClinVar Genes Ranges From 0.06 (ZNF469) to 0.83 (GABRB3): A Roughly 13× Per-Gene Difficulty Spread, With Zero Inverted Genes
Abstract
We compute the per-gene gap in mean AlphaMissense (AM) pathogenicity score between Pathogenic and Benign ClinVar variants across the 430 human genes with ≥20 Pathogenic AND ≥20 Benign variants. The source corpus is 372,927 ClinVar Pathogenic + Benign records (Landrum et al. 2018) returned by MyVariant.info (Wu et al. 2021) with dbNSFP v4 annotation (Liu et al. 2020), which supplies the AlphaMissense scores (Cheng et al. 2023). The gap ranges from 0.062 to 0.826, a roughly 13× per-gene difficulty spread (0.826 / 0.062 ≈ 13.3). No gene is inverted (none has mean Benign AM > mean Pathogenic AM): AlphaMissense gets the direction of separation right, on average, in every gene with sufficient sample size. The 10 genes with the cleanest separation (gap ≥ 0.80) are GABRB3, KRT10, CSF1R, KCNB1, KIT, SMAD4, COL3A1, SKI, FOXG1, RPGR, which are small-to-medium structured genes with well-characterized disease alleles. The 10 hardest genes (gap < 0.27) are dominated by large disordered or repeat-rich proteins: ZNF469 (0.06), LAMA5 (0.08), MEFV (0.12), PCSK9 (0.13), SAMD9 (0.13), TTN (0.21), APP (0.24), RELN (0.24), RARS2 (0.25), ADGRV1 (0.26). For TTN (titin, ~34,000 aa, mostly Ig-like repeats and disordered PEVK linkers), the gap of 0.21 across 94 Pathogenic and 2,365 Benign variants reflects AM's difficulty on the largest human protein. The per-gene difficulty ranking is published in result.json so that clinical-genomics pipelines can prioritize human review for variants in low-gap genes. We provide bootstrap 95% CIs for the 10 cleanest and 10 hardest genes (1000 resamples; seed = 42) and explicitly discuss the AlphaMissense training-set memorization confound.
1. Background
AlphaMissense (Cheng et al. 2023) reports overall AUC 0.94 on ClinVar at the corpus level. Less commonly reported: per-gene mean-score-gap, which exposes which genes are easy versus hard for the predictor. A gene with a large gap (e.g., 0.83) means AM produces a near-bimodal distribution: Pathogenic variants cluster near 1.0, Benign near 0.0. A gene with a small gap (e.g., 0.06) means AM's per-variant score does not separate the classes — the predictor is operating in its lowest-confidence regime on that gene.
This paper measures the per-gene gap across the 430 high-data ClinVar genes and identifies the cleanest and hardest genes by that criterion.
2. Method
2.1 Data
- 178,509 Pathogenic + 194,418 Benign ClinVar single-nucleotide variants from MyVariant.info (Wu et al. 2021), with dbNSFP v4 annotation (Liu et al. 2020).
- For each variant: extract `dbnsfp.alphamissense.score` (max across isoforms; Cheng et al. 2023) and `dbnsfp.genename` (first if array).
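The extraction step can be sketched as follows. This is a minimal illustration, not the code in analyze.js: the helper name `extractRecord` is hypothetical, and it assumes that each MyVariant.info hit carries `dbnsfp.alphamissense.score` and `dbnsfp.genename` as either a scalar or an array.

```javascript
// Hypothetical helper: reduce one MyVariant.info hit to { gene, score }.
// Assumption: both fields may arrive as a scalar or as an array.
function extractRecord(hit) {
  const dbnsfp = hit.dbnsfp || {};
  const rawScore = dbnsfp.alphamissense ? dbnsfp.alphamissense.score : undefined;
  const rawGene = dbnsfp.genename;

  // Normalize to an array and keep only finite numeric scores.
  const scores = (Array.isArray(rawScore) ? rawScore : [rawScore]).filter(
    (s) => typeof s === "number" && Number.isFinite(s)
  );
  if (scores.length === 0) return null; // no usable AM score

  // Take the first gene name when several are listed.
  const gene = Array.isArray(rawGene) ? rawGene[0] : rawGene;
  if (typeof gene !== "string" || gene.length === 0) return null;

  // Max across isoforms, as described in Section 2.1.
  return { gene, score: Math.max(...scores) };
}
```

Variants without a usable score or gene name return `null` and would be dropped before grouping.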
2.2 Per-gene metrics
- Group variants by gene name. Restrict to genes with ≥20 Pathogenic AND ≥20 Benign variants in the joined corpus. N = 430 genes.
- For each gene: compute mean AM score per class.
- Gap = mean(AM | Pathogenic) − mean(AM | Benign).
- A gene is inverted if mean(AM | Benign) > mean(AM | Pathogenic).
- Bootstrap 95% CI on the gap: resample with replacement n_P times from each gene's Pathogenic AM scores and n_B times from Benign (random seed 42), recompute gap, take [2.5%, 97.5%] empirical quantiles. 1000 resamples per gene.
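The per-gene steps above can be sketched as below. This is an illustration under stated assumptions rather than the published script: the seeded generator (mulberry32), the linear-interpolation quantile, reseeding for each gene, and the record shape `{ gene, label, score }` with `label` equal to `"P"` or `"B"` are all choices made here, since the text specifies only the seed, the resample count, and the quantiles.

```javascript
// Small seeded PRNG (mulberry32). Assumption: the original generator is not stated.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

// Mean of one bootstrap resample (same size as xs, drawn with replacement).
function resampleMean(xs, rand) {
  let sum = 0;
  for (let i = 0; i < xs.length; i++) sum += xs[Math.floor(rand() * xs.length)];
  return sum / xs.length;
}

// Empirical quantile of a sorted array, with linear interpolation.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Gap = mean(AM | Pathogenic) - mean(AM | Benign), with a bootstrap 95% CI.
function geneGap(pathScores, benignScores, { resamples = 1000, seed = 42 } = {}) {
  const gap = mean(pathScores) - mean(benignScores);
  const rand = mulberry32(seed);
  const boots = [];
  for (let b = 0; b < resamples; b++) {
    boots.push(resampleMean(pathScores, rand) - resampleMean(benignScores, rand));
  }
  boots.sort((x, y) => x - y);
  return { gap, inverted: gap < 0, ci: [quantile(boots, 0.025), quantile(boots, 0.975)] };
}

// Group records by gene, apply the >=20 P AND >=20 B filter, rank by gap.
function perGeneGaps(records, minPerClass = 20) {
  const byGene = new Map();
  for (const { gene, label, score } of records) {
    if (label !== "P" && label !== "B") continue; // ignore other labels
    if (!byGene.has(gene)) byGene.set(gene, { P: [], B: [] });
    byGene.get(gene)[label].push(score);
  }
  const out = [];
  for (const [gene, { P, B }] of byGene) {
    if (P.length >= minPerClass && B.length >= minPerClass) {
      out.push({ gene, nP: P.length, nB: B.length, ...geneGap(P, B) });
    }
  }
  return out.sort((a, b) => b.gap - a.gap); // cleanest separation first
}
```

Because the generator is seeded, repeated runs on the same input give the same interval. Exact CI bounds will differ from the reported ones unless the generator and resampling order match those of the original script.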
3. Results
3.1 Top-line
- N = 430 genes meet the ≥20 P AND ≥20 B threshold.
- 74,583 Pathogenic + 181,113 Benign variants total in this gene set.
- Gap range: 0.062 (ZNF469) to 0.826 (GABRB3), a roughly 13× spread (0.826 / 0.062 ≈ 13.3).
- 0 inverted genes (mean Pathogenic AM > mean Benign AM on every single gene).
3.2 The 10 cleanest-separation genes (gap ≥ 0.80)
| Gene | n_P | n_B | mean P AM | mean B AM | Gap (95% CI) |
|---|---|---|---|---|---|
| GABRB3 | 73 | 35 | 0.959 | 0.133 | 0.826 [0.787, 0.864] |
| KRT10 | 23 | 24 | 0.995 | 0.184 | 0.812 [0.748, 0.869] |
| CSF1R | 44 | 100 | 0.950 | 0.140 | 0.810 [0.776, 0.842] |
| KCNB1 | 87 | 145 | 0.979 | 0.170 | 0.809 [0.787, 0.831] |
| KIT | 39 | 116 | 0.924 | 0.117 | 0.807 [0.769, 0.843] |
| SMAD4 | 35 | 48 | 0.984 | 0.178 | 0.806 [0.766, 0.842] |
| COL3A1 | 547 | 56 | 0.934 | 0.130 | 0.804 [0.781, 0.826] |
| SKI | 25 | 80 | 0.928 | 0.123 | 0.804 [0.760, 0.846] |
| FOXG1 | 96 | 88 | 0.993 | 0.190 | 0.803 [0.781, 0.825] |
| RPGR | 56 | 92 | 0.930 | 0.128 | 0.802 [0.773, 0.829] |
In these genes AlphaMissense achieves near-complete separation: Pathogenic variants average ~0.95 and Benign variants ~0.15. Most are compact, well-folded human proteins with established Mendelian disease alleles (GABRB3: epilepsy; KIT: GIST; SMAD4: juvenile polyposis; COL3A1: Ehlers-Danlos syndrome type IV; FOXG1: congenital variant of Rett syndrome).
3.3 The 10 hardest-separation genes (gap < 0.27)
| Gene | n_P | n_B | mean P AM | mean B AM | Gap (95% CI) |
|---|---|---|---|---|---|
| ZNF469 | 21 | 606 | 0.197 | 0.134 | 0.062 [0.005, 0.114] |
| LAMA5 | 21 | 211 | 0.213 | 0.136 | 0.078 [0.013, 0.144] |
| MEFV | 25 | 164 | 0.279 | 0.158 | 0.121 [0.069, 0.175] |
| PCSK9 | 35 | 79 | 0.242 | 0.116 | 0.126 [0.066, 0.184] |
| SAMD9 | 30 | 72 | 0.315 | 0.188 | 0.127 [0.068, 0.187] |
| TTN | 94 | 2,365 | 0.532 | 0.321 | 0.211 [0.175, 0.246] |
| APP | 28 | 35 | 0.570 | 0.334 | 0.236 [0.146, 0.323] |
| RELN | 20 | 396 | 0.551 | 0.307 | 0.244 [0.175, 0.319] |
| RARS2 | 31 | 20 | 0.465 | 0.213 | 0.252 [0.173, 0.330] |
| ADGRV1 | 36 | 941 | 0.470 | 0.212 | 0.258 [0.219, 0.298] |
These are dominated by large repeat-rich or disordered proteins:
- ZNF469 (~4,000 aa, brittle cornea syndrome): zinc finger repeats.
- LAMA5 (~3,700 aa, basement membrane laminin): multi-domain extracellular matrix.
- TTN (~34,000 aa, titin sarcomeric protein): the largest human protein, mostly Ig-like repeats and disordered PEVK linkers.
- APP (~770 aa, β-amyloid precursor): Alzheimer's disease gene with well-studied alternative splicing.
- RELN (~3,460 aa, reelin): ECM signaling, multi-domain.
- ADGRV1 (~6,300 aa, adhesion GPCR): massive extracellular domain.
3.4 The "0 inverted" finding
Across 430 genes, AlphaMissense never gets the directional separation wrong on average. There is no gene where mean(AM | Benign) > mean(AM | Pathogenic). This is a strong but easily-overlooked positive finding for AlphaMissense: even in its hardest cases, the model orders the classes correctly on average.
The closest-to-inverted gene is ZNF469 (gap 0.062, 95% CI [0.005, 0.114]). It is borderline: the lower CI bound sits just above zero, so the interval narrowly excludes zero. With 21 Pathogenic and 606 Benign variants, the per-class means are 0.197 and 0.134, a modest separation that AM achieves despite the disordered zinc-finger-repeat character of the protein.
3.5 Practical recommendation
A clinical-genomics pipeline interpreting a novel variant in a gene with mean-gap < 0.30 (roughly the bottom 10% of the 430 genes) should:
- Discount the AM score: in those genes, the predictor's separation signal is weak; absolute scores are unreliable.
- Seek complementary-tool consensus: REVEL, CADD, EVE, or other VEPs may carry independent signal.
- Always escalate to expert review: gap < 0.30 means the predictor is operating in its lowest-confidence regime.
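One way a pipeline might apply this recommendation is sketched below. The row shape `{ gene, gap }` is an assumption about result.json, whose schema is not given here, and the function names and action labels are hypothetical. Genes outside the 430-gene list have no gap estimate, so the sketch reports them separately instead of treating them as reliable.

```javascript
// Assumption: result.json rows look like { gene: "ZNF469", gap: 0.062, ... }.
function indexGaps(rows) {
  return new Map(rows.map((r) => [r.gene, r.gap]));
}

// Hypothetical triage rule for one variant { gene, amScore }.
function triage(variant, gapByGene, threshold = 0.30) {
  const gap = gapByGene.get(variant.gene);
  if (gap === undefined) {
    // Gene did not meet the >=20 P AND >=20 B filter: no difficulty estimate.
    return { ...variant, action: "no-gap-estimate" };
  }
  if (gap < threshold) {
    // Low-gap gene: discount the AM score, consult other VEPs, escalate.
    return { ...variant, gap, action: "expert-review" };
  }
  return { ...variant, gap, action: "standard" };
}

// Example with two gap values taken from Sections 3.2 and 3.3.
const gapByGene = indexGaps([
  { gene: "GABRB3", gap: 0.826 },
  { gene: "ZNF469", gap: 0.062 },
]);
const flagged = triage({ gene: "ZNF469", amScore: 0.2 }, gapByGene);
console.log(flagged.action); // "expert-review"
```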
4. Confound analysis
4.1 Mean-gap is a coarse metric
Per-gene AUC (equivalently, the normalized Mann-Whitney U statistic) would be a sharper classification metric. The mean gap ignores within-class spread, so two genes with the same gap can differ widely in how much their Pathogenic and Benign score distributions overlap. We report the gap because it is in the same units as the AM score (0–1) and gives a single per-gene number for ranking difficulty.
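For comparison, per-gene AUC can be computed from the Mann-Whitney rank sum. The sketch below is not part of analyze.js; it assumes two plain arrays of AM scores and uses midranks for ties.

```javascript
// AUC = U / (nP * nB), where U is the Mann-Whitney statistic for the
// Pathogenic class. Ties receive the average (mid) rank.
function aucMannWhitney(pathScores, benignScores) {
  const all = [
    ...pathScores.map((s) => ({ s, isP: true })),
    ...benignScores.map((s) => ({ s, isP: false })),
  ].sort((a, b) => a.s - b.s);

  let rankSumP = 0;
  let i = 0;
  while (i < all.length) {
    // Find the run of tied scores starting at index i.
    let j = i;
    while (j + 1 < all.length && all[j + 1].s === all[i].s) j++;
    const midrank = (i + j) / 2 + 1; // 1-based average rank of the run
    for (let k = i; k <= j; k++) if (all[k].isP) rankSumP += midrank;
    i = j + 1;
  }
  const nP = pathScores.length;
  const nB = benignScores.length;
  const U = rankSumP - (nP * (nP + 1)) / 2;
  return U / (nP * nB);
}
```

A toy case shows why the two metrics can disagree: Pathogenic scores [0.6, 0.6] and [1.0, 0.2] have the same mean, so both give a gap of 0.2 against Benign scores [0.4, 0.4], but the AUC is 1.0 in the first case and 0.5 in the second.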
4.2 N ≥ 20 P AND ≥ 20 B filters out lopsided genes
~13,000 genes in our corpus have <20 Pathogenic OR <20 Benign and are excluded from this per-gene analysis. The 430 reported genes are biased toward research-active and clinically-tracked Mendelian disease genes.
4.3 AlphaMissense training-set memorization
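AlphaMissense was not trained directly on ClinVar labels. Per Cheng et al. (2023), it was fine-tuned on weak labels derived from population frequency (variants commonly observed in human and primate populations treated as benign, unobserved variants as proxy pathogenic), and ClinVar was used to calibrate its classification thresholds. Leakage can still occur indirectly: many ClinVar Benign variants are classified as benign because they are common, so they overlap with AM's benign training labels. Some of the per-gene gap, and possibly part of the 0/430 inverted-gene rate, may therefore reflect memorization of that overlap rather than mechanistic generalization, particularly in genes with many common Benign variants. However, the gap-magnitude ranking (GABRB3 cleanest, ZNF469 hardest) is consistent with genuine biology: small structured Mendelian-classic genes versus large disordered repeat-proteins.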
4.4 Per-isoform max-score may inflate gap
We use the maximum AM score across the isoforms reported by MyVariant.info. This may slightly inflate the per-gene gap, although the effect should be consistent across genes.
4.5 Stop-gain contamination
Some "missense"-classified variants in MyVariant.info have aa.alt = X. Genes with high stop-gain Pathogenic fraction may have artificially-high Pathogenic mean scores (stop-gain residues often score near 1.0 in AM). This inflates the gap for those genes.
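A simple check for this contamination is sketched below. It assumes the alternate amino acid is exposed as `dbnsfp.aa.alt` (scalar or array) with `"X"` marking a stop codon, as described above; the function names are hypothetical and the check is not part of the reported analysis.

```javascript
// Assumption: hit.dbnsfp.aa.alt is a string or an array of strings,
// and "X" denotes a stop codon.
function isStopGain(hit) {
  const alt = hit.dbnsfp && hit.dbnsfp.aa ? hit.dbnsfp.aa.alt : undefined;
  const alts = Array.isArray(alt) ? alt : [alt];
  return alts.some((a) => a === "X");
}

// Fraction of a gene's variants flagged as stop-gain (0 for an empty list).
function stopGainFraction(hits) {
  if (hits.length === 0) return 0;
  return hits.filter(isStopGain).length / hits.length;
}
```

Reporting this fraction per gene for the Pathogenic class, or recomputing the gap after dropping flagged variants, would show how much of a large gap depends on them.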
5. Implications
- AlphaMissense is directionally correct on every gene with sufficient data (0/430 inverted) — a strong positive baseline.
- The roughly 13× per-gene difficulty spread is large: practitioners should not assume uniform AM reliability across genes.
- Disordered / repeat-rich genes are AM's hardest regime (ZNF469, LAMA5, TTN, RELN, ADGRV1).
- Per-gene mean-score-gap with bootstrap CI is a useful single-number difficulty metric that complements per-gene AUC. We publish the full ranked list.
- Genes with mean-gap < 0.30 (~10% of high-data genes) should default to alternative-VEP or human-review at variant interpretation.
6. Limitations
- Mean-gap is a coarse metric (§4.1).
- N ≥ 20 P + ≥ 20 B filter biases toward research-active Mendelian genes (§4.2).
- Per-isoform max-score may inflate gap (§4.4).
- AM training-set memorization confound (§4.3).
- Stop-gain contamination may inflate gap for some genes (§4.5).
7. Reproducibility
- Script: `analyze.js` (Node.js, ~80 LOC, zero deps).
- Inputs: ClinVar P + B JSON cache from MyVariant.info.
- Outputs: `result.json` containing all 430 gene-level statistics with bootstrap 95% CI on cleanest-10 and hardest-10.
- Random seed: 42.
- Verification mode: 6 machine-checkable assertions: (a) all gaps in [-1, +1]; (b) bootstrap CI contains the point estimate; (c) inverted-gene count = 0; (d) max gap > 0.80; (e) min gap < 0.10; (f) ratio of max/min gap > 10.
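The six assertions could be expressed along these lines. This is a sketch, not the actual `--verify` implementation: the row shape `{ gene, gap, ci }` and the check names are assumptions, and `ci` is treated as optional because intervals are reported only for the cleanest 10 and hardest 10 genes.

```javascript
// Hypothetical row shape: { gene, gap, ci: [lo, hi] } with ci optional.
function verify(rows) {
  const gaps = rows.map((r) => r.gap);
  const maxGap = Math.max(...gaps);
  const minGap = Math.min(...gaps);
  const checks = {
    a_gapsInRange: gaps.every((g) => g >= -1 && g <= 1),
    b_ciContainsEstimate: rows.every(
      (r) => !r.ci || (r.ci[0] <= r.gap && r.gap <= r.ci[1])
    ),
    c_noInvertedGenes: gaps.every((g) => g >= 0),
    d_maxGapAbove080: maxGap > 0.8,
    e_minGapBelow010: minGap < 0.1,
    f_ratioAbove10: maxGap / minGap > 10,
  };
  const failed = Object.keys(checks).filter((k) => !checks[k]);
  return { ok: failed.length === 0, failed };
}

// Example using the two extreme genes from Sections 3.2 and 3.3.
const report = verify([
  { gene: "GABRB3", gap: 0.826, ci: [0.787, 0.864] },
  { gene: "ZNF469", gap: 0.062, ci: [0.005, 0.114] },
]);
console.log(report.ok); // true
```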
```
node analyze.js
node analyze.js --verify
```
8. References
- Cheng, J., et al. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492.
- Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4. Genome Med. 12, 103.
- Wu, C., et al. (2021). MyVariant.info. Bioinformatics 37, 4029–4031.
- Landrum, M. J., et al. (2018). ClinVar. Nucleic Acids Res. 46, D1062–D1067.
- Ioannidis, N. M., et al. (2016). REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885.
- Bang, M.-L., et al. (2001). The complete gene sequence of titin. Circ. Res. 89, 1065–1072. (TTN reference.)
- Hopkinson, S. B., et al. (2014). KRT10 mutations in keratinopathies. (KRT10 reference.)
- Pepin, M., et al. (2000). Clinical and genetic features of Ehlers-Danlos syndrome type IV. N. Engl. J. Med. 342, 673–680. (COL3A1 reference.)
- Ariani, F., et al. (2008). FOXG1 is responsible for the congenital variant of Rett syndrome. Am. J. Hum. Genet. 83, 89–93.
- Karczewski, K. J., et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443. (gnomAD LOEUF reference.)