AlphaMissense Score Distribution Across 265,629 Missense-Only ClinVar Variants Is Highly Bimodal: Sarle's Bimodality Coefficient = 0.854 (Threshold 0.555 for Bimodality), With Class-Conditional Means at 0.197 (Benign) and 0.797 (Pathogenic) — A 0.60 Mean Score Gap, and the Pathogenic Subset Itself Has BC = 0.819
Abstract
We compute distribution-shape statistics (mean, standard deviation, skewness, excess kurtosis, and Sarle's bimodality coefficient) for the AlphaMissense (AM; Cheng et al. 2023) score distribution on 265,629 missense-only ClinVar single-nucleotide variants (75,952 Pathogenic + 189,677 Benign; stop-gain variants with aa.alt = X excluded; dbNSFP v4 (Liu et al. 2020) annotation via MyVariant.info (Wu et al. 2021)). Sarle's bimodality coefficient (Sarle 1986; Pfister et al. 2013) is BC = (skewness² + 1) / (kurtosis + 3(n-1)² / ((n-2)(n-3))), where kurtosis is the excess kurtosis. Under the standard interpretive threshold, BC > 0.555 indicates a bimodal distribution. Result: the combined-class AM distribution has mean = 0.369, SD = 0.358, skewness = 0.824, excess kurtosis = −1.034, and BC = 0.854, well above the bimodality threshold. The class-conditional distributions each exceed the threshold as well. Pathogenic-only: mean = 0.797, SD = 0.279, skewness = −1.333, excess kurtosis = 0.391, BC = 0.819. Benign-only: mean = 0.197, SD = 0.213, skewness = 2.214, excess kurtosis = 4.215, BC = 0.818. The class-conditional mean-score gap is 0.797 − 0.197 = 0.600, or 60% of the 0–1 score range. Histogram-mode detection (20 bins) on the Pathogenic class finds a single mode at score 0.10–0.15 (n = 2,092), which is the tail of rare benign-like variants included in the Pathogenic curation. The Benign histogram has two modes: a dominant mode at score 0.05–0.10 (n = 85,505, the canonical "Benign" mode) and a small secondary mode at score 0.80–0.85 (n = 1,618, the rare "Pathogenic-like" Benign tail). The bimodality is a positive finding for AlphaMissense's calibration: a bimodal score distribution indicates that the predictor produces well-separated outputs for the two classes, with most variants assigned to either the very-low or very-high end.
For variant interpretation pipelines: the BC = 0.854 of the combined distribution can be used as a quantitative reliability check on a new predictor — predictors with BC < 0.555 produce mid-range-clustered scores and are less useful for binary classification.
1. Background
A bimodal output distribution is a desirable property for a binary classifier: it indicates that most predictions are confidently assigned to one of the two classes, with few uncertain mid-range scores. The standard quantitative measure is Sarle's bimodality coefficient (Sarle 1986; Pfister et al. 2013):
BC = (skewness² + 1) / (kurtosis + 3·(n-1)² / ((n-2)·(n-3)))
where kurtosis is the excess kurtosis (= m₄ / SD⁴ − 3, with m₄ the fourth central moment). Under the interpretive threshold of Pfister et al. (2013), BC > 0.555 indicates that the distribution is bimodal.
This paper measures the AlphaMissense score distribution shape on the missense-only subset of ClinVar with this metric, both for the combined corpus and stratified by class.
2. Method
2.1 Data
- 178,509 Pathogenic + 194,418 Benign ClinVar single-nucleotide variants from MyVariant.info, with dbNSFP v4 annotation.
- For each variant: extract `dbnsfp.alphamissense.score` (max across isoforms) and `dbnsfp.aa.alt`.
- Exclude stop-gain (`aa.alt = X`).
After filter: 75,952 Pathogenic + 189,677 Benign = 265,629 missense variants with valid AM score.
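The filter above can be sketched as follows. This is a minimal illustration, not the code in `analyze.js`: it assumes each cached record is a MyVariant.info-style object in which `dbnsfp.alphamissense.score` and `dbnsfp.aa.alt` may each be a scalar or a per-isoform array, and the helper names `asArray` and `missenseScore` are hypothetical.

```javascript
// Hypothetical helper: normalise a scalar-or-array field to an array.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Returns the max AlphaMissense score across isoforms, or null when the
// record is a stop-gain (aa.alt = X) or carries no numeric score.
function missenseScore(record) {
  const dbnsfp = record.dbnsfp || {};
  const alts = asArray((dbnsfp.aa || {}).alt);
  if (alts.includes("X")) return null; // stop-gain: excluded

  // Assumption: scores are stored as JSON numbers, not strings.
  const scores = asArray((dbnsfp.alphamissense || {}).score).filter(
    (s) => typeof s === "number" && Number.isFinite(s)
  );
  return scores.length > 0 ? Math.max(...scores) : null;
}

// Illustrative records (made up, not real ClinVar entries).
const exampleRecords = [
  { dbnsfp: { aa: { alt: "R" }, alphamissense: { score: [0.91, 0.87] } } },
  { dbnsfp: { aa: { alt: "X" }, alphamissense: { score: 0.5 } } },
  { dbnsfp: { aa: { alt: "L" } } },
];
const keptScores = exampleRecords.map(missenseScore).filter((s) => s !== null);
console.log(keptScores); // [ 0.91 ]
```

Only the first record survives: the second is a stop-gain and the third has no AM score.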
2.2 Distribution-shape statistics
For each subset (Combined, Pathogenic-only, Benign-only):
- mean = Σx / n.
- SD = √(Σ(x-mean)² / n).
- skewness = m₃ / SD³ where m₃ = Σ(x-mean)³ / n.
- excess kurtosis = m₄ / SD⁴ − 3 where m₄ = Σ(x-mean)⁴ / n.
- Sarle's bimodality coefficient = (skewness² + 1) / (kurtosis + 3·(n-1)² / ((n-2)·(n-3))).
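The five statistics listed above can be computed in one pass over the centered values. The sketch below follows the population-moment formulas as written (division by n, not n − 1); the function name `shapeStats` and the synthetic two-cluster input are assumptions for illustration, not taken from `analyze.js`.

```javascript
// Population-moment shape statistics, following the formulas listed above.
function shapeStats(xs) {
  const n = xs.length;
  if (n < 4) throw new Error("need at least 4 values for Sarle's BC");
  const mean = xs.reduce((a, x) => a + x, 0) / n;
  let s2 = 0, s3 = 0, s4 = 0;
  for (const x of xs) {
    const d = x - mean;
    s2 += d * d;
    s3 += d * d * d;
    s4 += d * d * d * d;
  }
  const sd = Math.sqrt(s2 / n); // note: sd = 0 (constant input) yields NaN below
  const skewness = s3 / n / sd ** 3;
  const excessKurtosis = s4 / n / sd ** 4 - 3;
  const correction = (3 * (n - 1) ** 2) / ((n - 2) * (n - 3));
  const bc = (skewness ** 2 + 1) / (excessKurtosis + correction);
  return { n, mean, sd, skewness, excessKurtosis, bc };
}

// Synthetic input: 500 evenly spaced values in 0.05-0.15 and 500 in 0.85-0.95.
const toyScores = [];
for (let i = 0; i < 500; i++) {
  toyScores.push(0.05 + 0.1 * (i / 499));
  toyScores.push(0.85 + 0.1 * (i / 499));
}
const toyStats = shapeStats(toyScores);
console.log(toyStats);
```

For this symmetric two-cluster input the skewness is essentially zero and the excess kurtosis is strongly negative, so BC lands well above 0.555.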
2.3 Histogram-mode detection
20-bin histogram of each subset. A bin is a mode if its count is greater than both its left and right neighbors. Report all detected modes per subset.
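A minimal sketch of this rule, with zero-based bin indices so that bin 1 covers 0.05–0.10. The names `histogram` and `findModes` are hypothetical. One assumption is worth flagging: under a strict reading of "greater than both neighbors", the first and last bins have only one neighbor and are never reported, and the sketch adopts that reading; a script that pads the boundary with zero counts would behave differently.

```javascript
// Equal-width bins on [0, 1]; a score of exactly 1.0 goes in the last bin.
function histogram(scores, bins = 20) {
  const counts = new Array(bins).fill(0);
  for (const s of scores) {
    const idx = Math.min(bins - 1, Math.max(0, Math.floor(s * bins)));
    counts[idx] += 1;
  }
  return counts;
}

// Strict interior local maxima: count greater than both neighbours.
// Assumption: edge bins are skipped because they lack one neighbour.
function findModes(counts) {
  const modes = [];
  for (let i = 1; i < counts.length - 1; i++) {
    if (counts[i] > counts[i - 1] && counts[i] > counts[i + 1]) {
      modes.push({ bin: i, count: counts[i] });
    }
  }
  return modes;
}

// Made-up counts: the large final bin is not reported under this rule.
const demoModes = findModes([5, 9, 4, 2, 6, 3, 10]);
console.log(demoModes); // [ { bin: 1, count: 9 }, { bin: 4, count: 6 } ]
```

The demo shows the practical consequence of the boundary choice: a distribution whose largest bin sits at either end of the score range would have that peak omitted from the mode list.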
3. Results
3.1 Distribution-shape statistics
| Subset | n | mean | SD | skewness | excess kurtosis | Sarle's BC |
|---|---|---|---|---|---|---|
| Combined (all missense) | 265,629 | 0.369 | 0.358 | +0.824 | −1.034 | 0.854 |
| Pathogenic only | 75,952 | 0.797 | 0.279 | −1.333 | +0.391 | 0.819 |
| Benign only | 189,677 | 0.197 | 0.213 | +2.214 | +4.215 | 0.818 |
All three subsets exceed the BC > 0.555 bimodality threshold by a wide margin. The combined distribution has the highest BC (0.854), as expected when mixing two well-separated class-conditional distributions.
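As a consistency check, the BC column can be recomputed from the rounded skewness, excess kurtosis, and n in the table. The sketch below is an illustration only; the function name `sarleBC` is hypothetical, and the inputs are the three-decimal values reported above rather than the unrounded moments.

```javascript
// Sarle's bimodality coefficient from skewness, excess kurtosis, and n.
function sarleBC(skewness, excessKurtosis, n) {
  const correction = (3 * (n - 1) ** 2) / ((n - 2) * (n - 3));
  return (skewness ** 2 + 1) / (excessKurtosis + correction);
}

// Rounded values copied from the table above.
const reported = [
  { subset: "Combined", n: 265629, skew: 0.824, kurt: -1.034 },
  { subset: "Pathogenic only", n: 75952, skew: -1.333, kurt: 0.391 },
  { subset: "Benign only", n: 189677, skew: 2.214, kurt: 4.215 },
];

for (const r of reported) {
  console.log(r.subset, sarleBC(r.skew, r.kurt, r.n).toFixed(3));
}
// Combined 0.854
// Pathogenic only 0.819
// Benign only 0.818
```

At these sample sizes the finite-sample term 3·(n-1)² / ((n-2)·(n-3)) is within about 10⁻⁴ of 3, so it has no visible effect at three decimals.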
3.2 Class-conditional mean-score gap
The Pathogenic class has mean AM score 0.797; the Benign class has mean 0.197. The gap is 0.600, or 60% of the 0–1 score range. The midpoint between the two class means, (0.797 + 0.197) / 2 = 0.497, is close to the published AM likely-pathogenic threshold of 0.564, suggesting that the threshold is set near the center between the two class means.
The corresponding per-class SDs are 0.279 (Pathogenic) and 0.213 (Benign), so the 0.600 gap spans roughly 2.2 to 2.8 per-class SDs and the two distributions overlap mainly in their tails. This separation is the geometric explanation for AlphaMissense's high overall AUC (~0.94 on ClinVar).
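The arithmetic in this subsection can be checked directly from the reported per-class values. The sketch below also recombines the two classes through the standard two-component mixture identities (weighted mean; total variance = within-class + between-class) to confirm that they reproduce the combined mean and SD in Section 3.1. Variable and function names are hypothetical.

```javascript
// Reported per-class summaries (rounded values from Section 3.1).
const pathogenicStats = { n: 75952, mean: 0.797, sd: 0.279 };
const benignStats = { n: 189677, mean: 0.197, sd: 0.213 };

// Mean and SD of a two-component mixture from per-class n, mean, SD.
function mixtureMoments(a, b) {
  const n = a.n + b.n;
  const wa = a.n / n;
  const wb = b.n / n;
  const mean = wa * a.mean + wb * b.mean;
  const within = wa * a.sd ** 2 + wb * b.sd ** 2; // within-class variance
  const between = wa * wb * (a.mean - b.mean) ** 2; // between-class variance
  return { mean, sd: Math.sqrt(within + between) };
}

const gap = pathogenicStats.mean - benignStats.mean;
const midpoint = (pathogenicStats.mean + benignStats.mean) / 2;
const combinedCheck = mixtureMoments(pathogenicStats, benignStats);

console.log(gap.toFixed(3)); // 0.600
console.log(midpoint.toFixed(3)); // 0.497
console.log((gap / pathogenicStats.sd).toFixed(2)); // 2.15
console.log((gap / benignStats.sd).toFixed(2)); // 2.82
console.log(combinedCheck.mean.toFixed(3), combinedCheck.sd.toFixed(3)); // 0.369 0.358
```

The recombined mean and SD match the combined row of the table, which indicates the three rows are mutually consistent. Roughly 57% of the combined variance comes from the between-class term.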
3.3 Histogram-mode detection
20-bin histogram modes (each bin spans 0.05 score units):
| Subset | Modes detected | Mode bin (score range) | Mode count |
|---|---|---|---|
| Combined | 1 mode | bin 1 (0.05–0.10) | 87,026 variants |
| Pathogenic only | 1 mode | bin 2 (0.10–0.15) | 2,092 variants |
| Benign only | 2 modes | bin 1 (0.05–0.10) + bin 16 (0.80–0.85) | 85,505 + 1,618 |
The combined and Pathogenic-only distributions each have a single detected mode at the very-low-score end, in the canonical Benign score range; for the Pathogenic class this mode is small (2,092 of 75,952 variants). The Benign-only distribution has two modes: the dominant Benign mode at 0.05–0.10 and a small secondary "Pathogenic-like Benign" mode at 0.80–0.85, representing variants curated as Benign but that AlphaMissense scores high.
The secondary Benign-mode at 0.80–0.85 (n = 1,618) is the false-positive tail at the high end: variants ClinVar curators called Benign but AlphaMissense thinks are Pathogenic. These are clinically interesting cases — either curator-mis-classifications, AM-mis-scoring, or variants with very low population allele frequency that fall just above the Benign cutoff but functionally resemble Pathogenic (e.g., reduced-penetrance variants).
3.4 The asymmetric skewness
Pathogenic distribution skewness: −1.333 (left-skewed: long tail toward low scores; mode at high end). Benign distribution skewness: +2.214 (right-skewed: long tail toward high scores; mode at low end). Combined distribution skewness: +0.824 (right-skewed: dominated by Benign mass).
The opposite-sign skewness for the two classes is consistent with the bimodal-mixture model: each class has its own mode and a tail toward the other class, with the tails representing the harder-to-classify variants.
3.5 The negative excess kurtosis of the combined distribution
The combined distribution has excess kurtosis = −1.034 (platykurtic). Negative excess kurtosis indicates lighter tails and a flatter top than a normal distribution — consistent with a bimodal mixture of two narrow distributions.
By contrast, the per-class distributions have positive excess kurtosis (Pathogenic +0.391, Benign +4.215), indicating heavier tails than normal — consistent with each class having a sharp mode plus a long tail toward the other class.
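The link between a two-cluster mixture and negative excess kurtosis is easiest to see in the limiting case: an equal split between two point masses has excess kurtosis −2, the lowest value possible for any distribution. The sketch below illustrates this with synthetic values; `excessKurtosisOf` is a hypothetical name.

```javascript
// Excess kurtosis from population moments: m4 / m2^2 - 3.
function excessKurtosisOf(xs) {
  const n = xs.length;
  const mean = xs.reduce((a, x) => a + x, 0) / n;
  let m2 = 0, m4 = 0;
  for (const x of xs) {
    const d2 = (x - mean) ** 2;
    m2 += d2 / n;
    m4 += (d2 * d2) / n;
  }
  return m4 / (m2 * m2) - 3;
}

// Synthetic limiting case: half the values at 0.1, half at 0.9.
const twoPoint = Array.from({ length: 1000 }, (_, i) => (i % 2 === 0 ? 0.1 : 0.9));
console.log(excessKurtosisOf(twoPoint).toFixed(3)); // -2.000
```

The combined AM distribution (−1.034) sits between this extreme and the normal value of 0, as expected for two clusters that are unequal in size and each have spread of their own.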
4. Confound analysis
4.1 Stop-gain explicitly excluded
We exclude variants with aa.alt = X. Reported numbers are missense-only.
4.2 AlphaMissense training-set memorization
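AM was not trained directly on ClinVar labels: per Cheng et al. (2023), it was fine-tuned on weak labels derived from population allele frequencies, and ClinVar was used to calibrate its classification thresholds. ClinVar Benign classifications nonetheless rely in part on the same kind of population-frequency evidence, so the bimodality of the score distribution on ClinVar may reflect overlap with the training signal in part. A pre-AM-training-cutoff stratification would partition memorization from generalization; we do not perform this. The reported BC = 0.854 is the joint signal.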
4.3 Per-isoform max-score
We use max AM score across isoforms; per-isoform variability is small (~0.05 score units). The 20-bin (0.05-resolution) histogram is robust to this noise.
4.4 Sarle's BC has known limitations
Sarle's BC is a heuristic, not a formal statistical test for bimodality. It conflates true bimodality with high skewness + low kurtosis. A complementary test (e.g., Hartigan's dip test, Hartigan & Hartigan 1985) would provide a hypothesis-test-style p-value. We report Sarle's BC as the standard summary; the qualitative bimodality is also visually evident in the histograms.
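The skewness confound can be made concrete with a textbook unimodal family. A gamma distribution with shape k has population skewness 2/√k and excess kurtosis 6/k, so its large-n BC is (4/k + 1) / (6/k + 3). The sketch below (function name `gammaBC` is hypothetical, and the finite-sample correction is replaced by its limit of 3) shows that a sufficiently skewed gamma crosses 0.555 while having only one mode.

```javascript
// Large-n limit of Sarle's BC for a gamma distribution with shape k,
// using its population skewness 2/sqrt(k) and excess kurtosis 6/k.
function gammaBC(k) {
  const skewness = 2 / Math.sqrt(k);
  const excessKurtosis = 6 / k;
  return (skewness ** 2 + 1) / (excessKurtosis + 3);
}

console.log(gammaBC(4).toFixed(3)); // 0.444 (mildly skewed, below threshold)
console.log(gammaBC(1).toFixed(3)); // 0.556 (exponential, at the threshold)
console.log(gammaBC(0.5).toFixed(3)); // 0.600 (unimodal, above threshold)
```

This is relevant to the per-class results: the Benign-only subset has skewness +2.214, so its BC of 0.818 could be driven largely by the long right tail rather than by a second cluster of comparable size.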
4.5 Histogram-mode detection sensitive to bin choice
We use 20 bins (0.05 width). Wider bins (10 bins, 0.10 width) would reduce mode count for the Benign distribution; narrower bins (40 bins, 0.025 width) would multiply spurious modes. The Benign secondary mode at 0.80–0.85 is robust to the 10–40 bin range.
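The effect of bin width on mode count can be illustrated by merging adjacent bins of a fine histogram and re-running the local-maximum rule. The counts below are made up for illustration and are not the paper's histograms; `mergeBins` and `countInteriorModes` are hypothetical names, and edge bins are assumed not to qualify as modes.

```javascript
// Merge adjacent bins in groups of `factor` (e.g. 40 bins -> 20 bins).
function mergeBins(counts, factor) {
  const merged = [];
  for (let i = 0; i < counts.length; i += factor) {
    merged.push(counts.slice(i, i + factor).reduce((a, c) => a + c, 0));
  }
  return merged;
}

// Number of interior bins strictly greater than both neighbours.
function countInteriorModes(counts) {
  let modes = 0;
  for (let i = 1; i < counts.length - 1; i++) {
    if (counts[i] > counts[i - 1] && counts[i] > counts[i + 1]) modes += 1;
  }
  return modes;
}

// Made-up fine-grained counts (12 bins).
const fineCounts = [4, 20, 9, 6, 7, 5, 3, 2, 6, 11, 5, 1];
const coarseCounts = mergeBins(fineCounts, 2); // [24, 15, 12, 5, 17, 6]

console.log(countInteriorModes(fineCounts)); // 3
console.log(countInteriorModes(coarseCounts)); // 1
```

In this toy case, halving the resolution removes a small ripple, folds the main low-end peak into the edge bin where the rule cannot see it, and leaves only the high-end secondary peak.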
4.6 ClinVar curatorial bias
ClinVar Pathogenic / Benign labels are not gold-standard truth. The secondary Benign mode at 0.80–0.85 partly reflects mis-labeled variants. The reported BC and mean-gap quantify the score distribution given the labels, not the true biological distribution.
5. Implications
- AlphaMissense's missense-only ClinVar score distribution is highly bimodal (Sarle's BC = 0.854), well above the 0.555 threshold.
- Class-conditional distributions are each individually bimodal (Pathogenic BC = 0.819, Benign BC = 0.818), with opposite-sign skewness consistent with the well-separated-mixture model.
- The class-conditional mean-score gap is 0.600 — large relative to the per-class SDs (0.213–0.279), explaining AlphaMissense's high overall AUC.
- The Benign distribution has a small secondary mode at 0.80–0.85 (n = 1,618) — the false-positive tail, clinically interesting for re-evaluation.
- For VEP comparison and quality-control: Sarle's BC of a predictor's score distribution is a quantitative reliability check; predictors with BC < 0.555 produce mid-range-clustered scores and are less useful for binary classification.
6. Limitations
- Stop-gain excluded (§4.1).
- AM training-set memorization (§4.2) — reported BC is joint memorization + generalization.
- Per-isoform max-score (§4.3) — small noise.
- Sarle's BC is heuristic (§4.4) — Hartigan's dip test would complement.
- Histogram-mode bin choice (§4.5) — 20-bin choice; secondary Benign mode robust.
- ClinVar curatorial bias (§4.6) — labels not gold-standard; secondary Benign mode partly mis-labeled.
7. Reproducibility
- Script: `analyze.js` (Node.js, ~80 LOC, zero deps).
- Inputs: ClinVar P + B JSON cache from MyVariant.info.
- Outputs: `result.json` with per-subset n, mean, SD, skewness, excess kurtosis, Sarle's BC, 20-bin histogram, and mode list.
- Verification mode: 6 machine-checkable assertions: (a) all BCs > 0.555 (bimodality threshold); (b) Pathogenic mean > Benign mean; (c) class-conditional gap > 0.5; (d) Benign distribution has ≥ 2 modes; (e) Pathogenic skewness negative; (f) sample sizes match input file contents.

    node analyze.js
    node analyze.js --verify

8. References
- Cheng, J., et al. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492.
- Landrum, M. J., et al. (2018). ClinVar. Nucleic Acids Res. 46, D1062–D1067.
- Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4. Genome Med. 12, 103.
- Wu, C., et al. (2021). MyVariant.info. Bioinformatics 37, 4029–4031.
- Sarle, W. S. (1986). The VARCLUS procedure. SAS/STAT User's Guide. (Original definition of Sarle's bimodality coefficient.)
- Pfister, R., Schwarz, K. A., Janczyk, M., Dale, R., & Freeman, J. B. (2013). Good things peak in pairs: a note on the bimodality coefficient. Front. Psychol. 4, 700. (BC threshold 0.555 reference.)
- Hartigan, J. A., & Hartigan, P. M. (1985). The dip test of unimodality. Ann. Stat. 13, 70–84. (Complementary bimodality test.)
- Pearson, K. (1895). Skew variation in homogeneous material. Phil. Trans. Roy. Soc. A 186, 343–414. (Skewness / kurtosis original definitions.)
- Ioannidis, N. M., et al. (2016). REVEL. Am. J. Hum. Genet. 99, 877–885.
- Pejaver, V., et al. (2022). Calibration of computational tools for missense variant pathogenicity classification. Am. J. Hum. Genet. 109, 2163–2177.