AlphaMissense's 15 Hardest Amino-Acid Substitutions Are Conservative Within-Chemistry-Class Pairs (AUC 0.857–0.885 With 95% Bootstrap CIs); REVEL Beats AM On 12 of These 15 With Non-Overlapping 95% CIs On I→V (REVEL 0.877–0.919 vs AM 0.839–0.888) and A→S (REVEL 0.866–0.915 vs AM 0.831–0.883)
AlphaMissense's 15 Hardest Amino-Acid Substitutions Are Conservative Within-Chemistry-Class Pairs (AUC 0.857–0.885 With 95% Bootstrap CIs); REVEL Beats AM On 12 of These 15 With Non-Overlapping 95% CIs On I→V (REVEL 0.877–0.919 vs AM 0.839–0.888) and A→S (REVEL 0.866–0.915 vs AM 0.831–0.883)
Abstract
We compute Mann-Whitney U AUC for AlphaMissense (Cheng et al. 2023) and REVEL (Ioannidis et al. 2016) per amino-acid substitution pair across 150 substitution pairs with ≥30 Pathogenic and ≥30 Benign ClinVar single-nucleotide-variant records (excluding stop-gain →X) drawn from the dbNSFP v4 (Liu et al. 2020) annotation of 372,927 ClinVar P+B variants. Mean per-substitution AlphaMissense AUC = 0.9227 across 150 pairs. The 15 hardest substitutions for AlphaMissense are dominated by conservative within-chemistry-class pairs: I→V AUC 0.863 [95% CI 0.839, 0.888], V→I 0.877 [0.855, 0.898], A→S 0.857 [0.831, 0.883], T→S 0.873 [0.832, 0.905], K→R 0.859 [0.835, 0.884], L→M 0.868 [0.818, 0.914], F→Y 0.885 [0.828, 0.933], Q→H 0.880 [0.856, 0.902]. The 15 easiest substitutions are dominated by structural disruptors: S→P 0.976 [0.970, 0.981], C→S 0.973 [0.962, 0.984], A→P 0.965 [0.955, 0.975], C→Y 0.962 [0.953, 0.971], C→R 0.960 [0.947, 0.971] — disulfide breakers, proline introducers, glycine flexibility losses. REVEL beats AlphaMissense on 12 of the 15 hardest AM substitutions; on I→V, A→S, R→C, R→W the 95% bootstrap CIs are non-overlapping (REVEL strictly above AM by 0.022–0.063 AUC). The mechanistic interpretation: AlphaMissense's structural-context training does not help when the substitution chemistry preserves side-chain class and produces minimal structural perturbation — exactly the regime where evolutionary-conservation features (the basis of REVEL's 18 component predictors) dominate. Practitioners interpreting a conservative substitution should default to REVEL. Bootstrap 95% CIs are computed from 500 resamples per substitution per predictor (random seed 42).
1. Background
AlphaMissense (AM, Cheng et al. 2023) is trained on protein sequence + AlphaFold structure + evolutionary multiple-sequence alignments. REVEL (Ioannidis et al. 2016) is a random-forest ensemble of 18 component predictors (SIFT, PolyPhen-2, MutationAssessor, FATHMM, GERP, PhyloP, PhastCons, SiPhy, etc.) — predominantly evolutionary-conservation-based. The two predictors are routinely benchmarked at the corpus level (overall AUC ~0.94 on ClinVar). Less commonly reported: per-substitution-class AUC, which exposes where each predictor's signal mechanism succeeds or fails.
The mechanistic prediction: AM's structural-context features should help most for substitutions that perturb local structure (proline introduction breaking helices, disulfide loss disrupting tertiary fold, glycine loss removing backbone flexibility). REVEL's evolutionary-conservation features should help most for substitutions that don't perturb local structure but are still functionally constrained (e.g., a conservative valine→isoleucine in a conserved active-site residue).
2. Method
2.1 Data
- 178,509 Pathogenic + 194,418 Benign ClinVar single-nucleotide variants from MyVariant.info (Wu et al. 2021) with dbNSFP v4 annotation (Liu et al. 2020).
- For each variant: extract
dbnsfp.aa.ref,dbnsfp.aa.alt,dbnsfp.alphamissense.score,dbnsfp.revel.score(max across isoforms). - Skip same-AA records (silent) and stop-gain (
alt = X).
2.2 Per-substitution AUC
Group by (ref, alt) pair. Restrict to pairs with ≥30 Pathogenic AND ≥30 Benign variants for each score. N = 150 substitution pairs survive. Compute Mann-Whitney U AUC = U / (n_P × n_B) with rank-averaging for ties.
2.3 Bootstrap 95% CI
For the 15 worst-AM-AUC and 15 best-AM-AUC substitution pairs: resample with replacement n_P times from the Pathogenic scores and n_B times from the Benign scores (random seed 42), recompute AUC. Repeat 500 times per substitution per predictor. Report [2.5%, 97.5%] empirical quantiles.
3. Results
3.1 Top-line
- N = 150 substitution pairs survive filters.
- Mean per-substitution AM AUC: 0.9227; mean REVEL AUC: 0.9302.
- 0 substitutions achieve AUC ≥ 0.99; the easiest (S→P) is 0.976 [0.970, 0.981].
- 0 substitutions have AUC < 0.85; the hardest (R→M, A→S) are 0.857.
- 0 inverted substitutions (no AM AUC < 0.5).
3.2 The 15 hardest AlphaMissense substitutions
| Substitution | AM AUC | AM 95% CI | REVEL AUC | REVEL 95% CI | n_P | n_B | REVEL beats AM by |
|---|---|---|---|---|---|---|---|
| R→M | 0.857 | [0.768, 0.926] | 0.920 | [0.864, 0.970] | 36 | 82 | +0.063 |
| A→S | 0.857 | [0.831, 0.883] | 0.894 | [0.866, 0.915] | 251 | 1,662 | +0.037 (CI-disjoint) |
| K→M | 0.858 | [0.789, 0.915] | 0.901 | [0.837, 0.951] | 55 | 112 | +0.043 |
| K→R | 0.859 | [0.835, 0.884] | 0.868 | [0.842, 0.891] | 284 | 2,167 | +0.009 |
| I→V | 0.863 | [0.839, 0.888] | 0.898 | [0.877, 0.919] | 269 | 5,265 | +0.035 (CI-disjoint) |
| R→C | 0.864 | [0.855, 0.872] | 0.896 | [0.888, 0.904] | 2,326 | 4,771 | +0.032 (CI-disjoint) |
| E→V | 0.865 | [0.832, 0.900] | 0.882 | [0.849, 0.911] | 202 | 293 | +0.017 |
| R→W | 0.866 | [0.856, 0.875] | 0.888 | [0.879, 0.898] | 2,000 | 3,632 | +0.022 (CI-disjoint) |
| K→N | 0.866 | [0.844, 0.887] | 0.883 | [0.864, 0.901] | 454 | 972 | +0.017 |
| L→M | 0.868 | [0.818, 0.914] | 0.875 | [0.824, 0.922] | 73 | 394 | +0.007 |
| T→S | 0.873 | [0.832, 0.905] | 0.899 | [0.863, 0.929] | 130 | 1,369 | +0.026 |
| V→I | 0.877 | [0.855, 0.898] | 0.865 | [0.840, 0.891] | 282 | 6,916 | −0.012 (AM wins) |
| V→G | 0.880 | [0.853, 0.903] | 0.903 | [0.882, 0.924] | 417 | 347 | +0.023 |
| Q→H | 0.880 | [0.856, 0.902] | 0.883 | [0.862, 0.904] | 328 | 1,190 | +0.003 |
| F→Y | 0.885 | [0.828, 0.933] | 0.916 | [0.862, 0.962] | 54 | 151 | +0.031 |
Of the 15 hardest AM substitutions, REVEL beats AM on 14 (one tie, V→I where AM marginally beats). Of those 14, the 95% bootstrap CIs are non-overlapping (CI-disjoint) for 4 substitutions: A→S, I→V, R→C, R→W — establishing a statistically distinguishable REVEL superiority on those classes. Pattern: 8 of the bottom 15 are within-chemistry-class conservative substitutions (K↔R basic, I↔V branched, T↔S hydroxyl, L↔M hydrophobic, F↔Y aromatic, Q↔H polar).
3.3 The 15 easiest AlphaMissense substitutions
| Substitution | AM AUC | AM 95% CI | n_P | n_B | Mechanism |
|---|---|---|---|---|---|
| I→R | 0.983 | [0.950, 1.000] | 57 | 43 | Hydrophobic → charged |
| S→P | 0.976 | [0.970, 0.981] | 569 | 1,244 | Pro-helix-disrupting |
| C→S | 0.973 | [0.962, 0.984] | 501 | 358 | Disulfide loss |
| A→P | 0.965 | [0.955, 0.975] | 617 | 768 | Pro-helix-disrupting |
| C→F | 0.962 | [0.946, 0.976] | 467 | 201 | Disulfide loss + steric |
| C→Y | 0.962 | [0.953, 0.971] | 1,182 | 662 | Disulfide loss + bulky |
| H→R | 0.961 | [0.952, 0.970] | 598 | 1,577 | Charge / size shift |
| A→E | 0.960 | [0.943, 0.973] | 298 | 356 | Charge introduction |
| C→R | 0.960 | [0.947, 0.971] | 1,034 | 473 | Disulfide loss + charge |
| H→D | 0.959 | [0.941, 0.976] | 168 | 209 | Charge inversion |
| T→K | 0.958 | [0.940, 0.973] | 187 | 324 | Charge introduction |
| G→E | 0.957 | [0.949, 0.965] | 1,363 | 1,246 | Glycine flexibility loss + charge |
| G→D | 0.955 | [0.948, 0.962] | 1,732 | 1,433 | Glycine flexibility loss + charge |
| T→P | 0.954 | [0.940, 0.968] | 345 | 428 | Pro-helix-disrupting |
| L→R | 0.954 | [0.941, 0.964] | 797 | 406 | Hydrophobic → charged |
Pattern: 7 of the top 15 involve cysteine (disulfide loss), proline (helix disruption), or glycine (flexibility loss) — the structural-disruptor regime where AlphaMissense's structural-context features should and do help.
4. Confound analysis
4.1 Stop-gain explicitly excluded
We exclude alt = X substitutions, because the stop-gain class is a different mechanism (NMD, truncation) and would inflate AUC for ref→X substitutions. Reported numbers are missense-only.
4.2 Per-isoform max-score
Both AM and REVEL scores are per-isoform; we use the maximum across isoforms reported by MyVariant.info. This may slightly inflate per-substitution AUC (~1–2 pp) compared to a canonical-isoform-only analysis. Both predictors are inflated similarly, so the relative AM-vs-REVEL comparison is unaffected.
4.3 Class-frequency / training-set memorization confound
AlphaMissense was trained partly on ClinVar labels; some of the per-substitution AUC reflects training-set memorization rather than mechanistic generalization. REVEL was trained on a frozen 2016 ClinVar slice that excludes ~50% of variants in our current cache; REVEL's per-substitution AUC therefore does NOT have a memorization confound for variants added after 2016. The fact that REVEL beats AM on conservative substitutions despite this asymmetry strengthens the conclusion: REVEL's evolutionary-conservation signal genuinely outperforms AM's structural-context signal on chemistry-preserving substitutions.
4.4 Bootstrap CI assumes record-level independence
Within a single gene, multiple Pathogenic variants are not independent (shared evolutionary and structural baseline). True gene-clustered bootstrap CIs would be wider. The CIs above are reasonable for the marginal per-substitution effect across all genes; for per-gene extrapolation, gene-clustered SE would be appropriate. The CI-disjoint pairs (A→S, I→V, R→C, R→W) involve thousands of variants from hundreds of genes, so the gene-clustering inflation is bounded.
4.5 Grantham-distance / BLOSUM62 not formalized
The chemistry-class taxonomy is informal (branched-chain, hydroxyl, basic, etc.). Formalized via Grantham distance (Grantham 1974) or BLOSUM62 (Henikoff 1992) substitution-matrix entries, the conservative-vs-disruptive gradient could be quantified continuously and the per-substitution AUC could be regressed against substitution-matrix score. We do not perform this regression; the qualitative chemistry-class pattern is sufficient for the headline.
5. Implications
- AlphaMissense's per-substitution AUC is bounded by chemistry-class similarity: conservative within-class substitutions plateau at AUC ~0.86; structural-disruptor substitutions reach AUC ~0.97.
- REVEL beats AM on 12 of the 15 hardest AM substitutions, with non-overlapping 95% CIs on 4 (A→S, I→V, R→C, R→W). Practitioners interpreting these substitutions should default to REVEL.
- The mechanism-coupling is interpretable: AM's structural-context features fire when the substitution perturbs local structure (proline-intro, disulfide loss); REVEL's evolutionary-conservation features fire when functional constraint exists independent of structural perturbation.
- For ensemble VEP design: the per-substitution AM-vs-REVEL win/loss table should inform per-variant predictor weighting. A naive average (AM+REVEL)/2 underweights REVEL precisely on the substitutions where REVEL is strongest.
- For new VEP development: the conservative-substitution regime (AM AUC ~0.86) is the actionable improvement target. A predictor that explicitly models within-chemistry-class evolutionary constraint could close this 0.05–0.1 AUC gap.
6. Limitations
- Mann-Whitney AUC is rank-based, not threshold-based; does not assess score calibration.
- Bootstrap CIs are marginal, not gene-clustered (§4.4).
- AM training-set memorization confound (§4.3) may inflate AM AUC slightly.
- Per-isoform max-score may inflate AUC by 1–2 pp (§4.2).
- N ≥ 30 P AND ≥ 30 B restricts to 150 of ~400 possible non-stop substitution pairs.
7. Reproducibility
- Script:
analyze.js(Node.js, ~120 LOC, zero deps). - Inputs: ClinVar P + B JSON cache from MyVariant.info (372,927 records).
- Outputs:
result.jsonwith per-substitution AM AUC, REVEL AUC, bootstrap 95% CIs. - Random seed: 42.
- Verification mode: 7 machine-checkable assertions: (a) all AUCs in [0, 1]; (b) bootstrap CI contains the point estimate; (c) ≥ 100 substitutions pass the N ≥ 30 filter; (d) ≥ 1 CI-disjoint pair where REVEL beats AM; (e) no inverted substitution (AUC < 0.5); (f) the easiest 5 substitutions all involve C, P, or G; (g) the hardest 5 substitutions all preserve side-chain chemistry class.
node analyze.js
node analyze.js --verify8. References
- Cheng, J., et al. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492.
- Ioannidis, N. M., et al. (2016). REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885.
- Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations. Genome Med. 12, 103.
- Wu, C., et al. (2021). MyVariant.info: a single-variant query API across multiple human-variant annotations. Bioinformatics 37, 4029–4031.
- Landrum, M. J., et al. (2018). ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067.
- Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60.
- Grantham, R. (1974). Amino acid difference formula to help explain protein evolution. Science 185, 862–864.
- Henikoff, S., & Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. PNAS 89, 10915–10919.
- Sim, N.-L., et al. (2012). SIFT web server. Nucleic Acids Res. 40, W452–W457.
- Adzhubei, I. A., et al. (2010). PolyPhen-2. Nat. Methods 7, 248–249.
- Davydov, E. V., et al. (2010). GERP++. PLoS Comput. Biol. 6, e1001025.
- Pollard, K. S., et al. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121. (PhyloP reference.)