Per-Gene AlphaMissense Score Variance Asymmetry: Pathogenic-Class Score SD Exceeds Benign-Class Score SD on 60.8% of 457 ClinVar Genes With ≥20 Variants Per Class (Median Per-Gene SD Ratio P/B = 1.185); Mean Per-Gene Pathogenic SD = 0.234 vs Benign SD = 0.197
Abstract
We compute the per-gene class-conditional standard deviation (SD) of AlphaMissense scores (Cheng et al. 2023; hereafter AM) on the missense-only subset of ClinVar Pathogenic + Benign single-nucleotide variants (Landrum et al. 2018). For each of 457 human genes with ≥20 Pathogenic AND ≥20 Benign missense variants (stop-gain aa.alt = X excluded; dbNSFP v4 (Liu et al. 2020) annotation via MyVariant.info (Wu et al. 2021); gene name from dbnsfp.genename), we compute per-class mean and SD of AM scores and report the SD ratio (Pathogenic SD / Benign SD) per gene. Result: the median per-gene SD ratio (P/B) is 1.185, and 60.8% of the 457 genes have SD_P > SD_B. The mean of per-gene Pathogenic-class SD is 0.234 vs Benign-class SD 0.197, so Pathogenic-class scores are on average more spread within a gene. The per-gene SD distribution for the Pathogenic class is concentrated in 0.20–0.35 (297/457 = 65% of genes), while the Benign class is concentrated at lower SD (0.10–0.25, 287/457 = 63% of genes). The mean of per-gene Pathogenic-mean AM scores is 0.806; the mean of per-gene Benign-mean AM scores is 0.221. Methodological interpretation: a plausible reason for the wider per-gene Pathogenic-class distribution is that Pathogenic variants in a given gene span multiple substitution classes (proline introduction, disulfide loss, conservative-class, CpG hotspot, etc.), each producing a different AM score, while Benign variants in the same gene tend to cluster at the low-AM-score end. For variant-effect-predictor benchmark interpretation: the per-gene SD asymmetry is a useful summary of how tightly AM scores cluster within a gene's Pathogenic class; a per-gene Pathogenic SD ≥ 0.30 (84 of 457 = 18.4% of genes) indicates a high-variance Pathogenic class where individual-variant interpretations are less reliable.
1. Background
AlphaMissense (Cheng et al. 2023; hereafter AM) is a deep-learning predictor of missense-variant pathogenicity, outputting per-variant scores in [0, 1]. Per-gene AM-score distributions have been characterized in terms of mean and median (per-gene calibration); the per-gene variance structure of the score distribution is less commonly reported.
This paper measures per-gene class-conditional AM-score SD across 457 ClinVar genes with sufficient per-class sample sizes, reporting both the per-gene SD distribution and the SD-ratio (Pathogenic SD / Benign SD) summary statistic.
2. Method
2.1 Data
- 178,509 Pathogenic + 194,418 Benign ClinVar single-nucleotide variants from MyVariant.info, with dbNSFP v4 annotation.
- For each variant: extract `dbnsfp.alphamissense.score` (max across isoforms) as the AM score, plus `dbnsfp.aa.alt` and `dbnsfp.genename` (first if array).
- Exclude stop-gain (`alt = X`). The analysis is missense-only.
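The extraction rules above can be sketched as follows. This is a minimal illustration, not the `analyze.js` script itself: the helper names `toArray` and `extractRecord`, the assumed record shape, and the choice to drop a variant if any listed alternate residue is `X` are all assumptions.

```javascript
// Hypothetical helper: normalize a field that may be missing, a scalar, or an array.
const toArray = (v) => (v === undefined || v === null ? [] : Array.isArray(v) ? v : [v]);

// Extract the fields used in the analysis from one annotation record.
// Assumed shape: { dbnsfp: { alphamissense: { score }, aa: { alt }, genename } }.
function extractRecord(hit, label) {
  const d = (hit && hit.dbnsfp) || {};
  const scores = toArray(d.alphamissense && d.alphamissense.score)
    .map((s) => (typeof s === "number" ? s : parseFloat(s)))
    .filter(Number.isFinite);
  const alts = toArray(d.aa && d.aa.alt);
  const gene = toArray(d.genename)[0]; // first if array
  // Drop records with no gene, no AM score, or a stop-gain alternate (alt = X).
  if (!gene || scores.length === 0 || alts.includes("X")) return null;
  return { gene, label, score: Math.max(...scores) }; // max across isoforms
}

// Toy record with made-up values (not real ClinVar data).
const example = {
  dbnsfp: {
    alphamissense: { score: [0.91, 0.87] },
    aa: { alt: "P" },
    genename: ["GENE_A", "GENE_B"],
  },
};
console.log(extractRecord(example, "P"));
```

Returning `null` for excluded variants keeps the missense-only filter and the non-null-score requirement in one place, so downstream grouping only sees usable records.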
2.2 Per-gene aggregation
Group variants by gene name. Restrict to genes with ≥20 Pathogenic AND ≥20 Benign missense variants AND a non-null AM score for both classes. N = 457 genes retained.
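A minimal sketch of this grouping and filtering step, assuming each retained variant has already been reduced to a `{ gene, label, score }` record with `label` equal to `"P"` or `"B"` (the record shape and function names are illustrative, not taken from the analysis script):

```javascript
// Group records of the form { gene, label, score } by gene and class.
function groupByGene(records) {
  const genes = new Map();
  for (const { gene, label, score } of records) {
    if (!genes.has(gene)) genes.set(gene, { P: [], B: [] });
    genes.get(gene)[label].push(score);
  }
  return genes;
}

// Keep only genes with at least minN scored variants in each class.
function filterGenes(genes, minN = 20) {
  return new Map(
    [...genes].filter(([, g]) => g.P.length >= minN && g.B.length >= minN)
  );
}

// Toy input: GENE_A reaches the threshold in both classes, GENE_B does not.
const records = [];
for (let i = 0; i < 20; i++) {
  records.push({ gene: "GENE_A", label: "P", score: 0.8 });
  records.push({ gene: "GENE_A", label: "B", score: 0.2 });
}
records.push({ gene: "GENE_B", label: "P", score: 0.9 });

const kept = filterGenes(groupByGene(records));
console.log([...kept.keys()]);
```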
2.3 Per-gene class-conditional statistics
Per gene, per class:
- `n` = variant count.
- `mean` = arithmetic mean of AM scores.
- `SD` = sample standard deviation (Bessel-corrected, divide by `n − 1`) of AM scores.
Per-gene SD ratio = SD_Pathogenic / SD_Benign.
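The per-class statistics and the SD ratio defined above can be written in a few lines. The sketch below is illustrative; the function names and the toy scores are assumptions, not values from the analysis.

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation with Bessel's correction (divide by n - 1).
function sampleSD(xs) {
  const m = mean(xs);
  const ss = xs.reduce((acc, x) => acc + (x - m) ** 2, 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// Per-gene summary from one gene's Pathogenic (P) and Benign (B) score arrays.
function geneStats(P, B) {
  const sdP = sampleSD(P);
  const sdB = sampleSD(B);
  return {
    nP: P.length, meanP: mean(P), sdP,
    nB: B.length, meanB: mean(B), sdB,
    sdRatio: sdP / sdB, // Infinity or NaN when sdB is 0
  };
}

// Toy scores (not real data).
console.log(geneStats([0.9, 0.7, 0.5, 0.95], [0.1, 0.2, 0.15, 0.3]));
```

A gene whose Benign scores are all identical has `sdB = 0`, so a production script would need to decide how to treat the resulting undefined ratio.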
2.4 Aggregated statistics
- Mean of per-gene Pathogenic-class means and SDs.
- Mean of per-gene Benign-class means and SDs.
- Median SD ratio across all 457 genes.
- Fraction of genes with SD_Pathogenic > SD_Benign.
- Per-gene SD distribution in 10 buckets: [0.00, 0.05), [0.05, 0.10), …, [0.35, 0.40), [0.40, 0.50), [0.50, 1.00].
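The aggregated statistics listed above could be computed as in the following sketch. It assumes per-gene records of the form `{ sdP, sdB }` and uses the bucket edges shown in the §3.2 table; the names `EDGES`, `bucketCounts`, and `aggregate` are illustrative.

```javascript
// Median of a numeric array (average of the two middle values for even length).
function median(xs) {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Bucket edges matching the §3.2 table; the last bucket is closed at 1.00.
const EDGES = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 1.0];

// Count SD values per half-open bucket [EDGES[i], EDGES[i + 1]).
function bucketCounts(sds) {
  const counts = new Array(EDGES.length - 1).fill(0);
  for (const sd of sds) {
    // Index of the first upper edge strictly greater than sd, minus one.
    let i = EDGES.findIndex((e, k) => k > 0 && sd < e) - 1;
    if (sd === 1.0) i = counts.length - 1; // closed right end
    if (i >= 0) counts[i] += 1; // values outside [0, 1] are ignored
  }
  return counts;
}

// Aggregate over an array of per-gene records { sdP, sdB }.
function aggregate(perGene) {
  return {
    medianRatio: median(perGene.map((g) => g.sdP / g.sdB)),
    fracPgtB: perGene.filter((g) => g.sdP > g.sdB).length / perGene.length,
    bucketsP: bucketCounts(perGene.map((g) => g.sdP)),
    bucketsB: bucketCounts(perGene.map((g) => g.sdB)),
  };
}

// Toy per-gene values (not real data).
const toy = [
  { sdP: 0.3, sdB: 0.2 },
  { sdP: 0.1, sdB: 0.2 },
  { sdP: 0.26, sdB: 0.2 },
];
console.log(aggregate(toy));
```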
3. Results
3.1 Top-line statistics
| Statistic | Value |
|---|---|
| N genes with ≥ 20 P AND ≥ 20 B missense | 457 |
| Mean of per-gene Pathogenic AM-score mean | 0.806 |
| Mean of per-gene Benign AM-score mean | 0.221 |
| Mean of per-gene Pathogenic AM-score SD | 0.234 |
| Mean of per-gene Benign AM-score SD | 0.197 |
| Median per-gene SD ratio (P / B) | 1.185 |
| Fraction of genes with SD_P > SD_B | 60.8% (278 / 457) |
3.2 Per-gene SD distribution
| Per-gene SD range | # of Pathogenic-class distributions | # of Benign-class distributions |
|---|---|---|
| [0.00, 0.05) | 3 | 5 |
| [0.05, 0.10) | 19 | 40 |
| [0.10, 0.15) | 38 | 72 |
| [0.15, 0.20) | 82 | 90 |
| [0.20, 0.25) | 107 | 125 |
| [0.25, 0.30) | 122 | 86 |
| [0.30, 0.35) | 68 | 35 |
| [0.35, 0.40) | 18 | 4 |
| [0.40, 0.50) | 0 | 0 |
| [0.50, 1.00] | 0 | 0 |
| Total | 457 | 457 |
The Pathogenic-class SD distribution is shifted right of the Benign-class SD distribution:
- Modal Pathogenic-class SD bucket: [0.25, 0.30) with 122 genes (26.7%).
- Modal Benign-class SD bucket: [0.20, 0.25) with 125 genes (27.4%).
The right shift is small (~0.05 SD-units of mode displacement) but consistent across the upper-half buckets. 18.4% of genes (84 / 457) have Pathogenic-class SD ≥ 0.30, whereas only 8.5% (39 / 457) of genes have Benign-class SD ≥ 0.30.
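The bucket totals and modal buckets quoted above can be re-derived directly from the table. A minimal sketch, with the counts transcribed by hand from §3.2 (variable and function names are illustrative):

```javascript
// Bucket labels and counts transcribed from the §3.2 table.
const labels = [
  "[0.00, 0.05)", "[0.05, 0.10)", "[0.10, 0.15)", "[0.15, 0.20)", "[0.20, 0.25)",
  "[0.25, 0.30)", "[0.30, 0.35)", "[0.35, 0.40)", "[0.40, 0.50)", "[0.50, 1.00]",
];
const pathogenic = [3, 19, 38, 82, 107, 122, 68, 18, 0, 0];
const benign = [5, 40, 72, 90, 125, 86, 35, 4, 0, 0];

const sum = (xs) => xs.reduce((a, b) => a + b, 0);
const modeIndex = (xs) => xs.indexOf(Math.max(...xs));
const pct = (k, n) => ((100 * k) / n).toFixed(1) + "%";

// Report the total and the modal bucket for each class.
for (const [name, counts] of [["Pathogenic", pathogenic], ["Benign", benign]]) {
  const total = sum(counts);
  const m = modeIndex(counts);
  console.log(`${name}: total=${total}, mode=${labels[m]} (${counts[m]} genes, ${pct(counts[m], total)})`);
}
```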
3.3 The mean of per-gene Pathogenic mean is 0.806
Across the 457 analyzed genes, the average per-gene Pathogenic-class mean AM score is 0.806 — well above AM's published "likely pathogenic" threshold of 0.564. Per-gene Pathogenic distributions therefore typically center in the high-AM-score range, consistent with the expected predictor calibration.
The average per-gene Benign-class mean AM score is 0.221 — well below AM's published "likely benign" threshold of 0.34. Per-gene Benign distributions typically center in the low-AM-score range.
The per-gene mean-gap (Pathogenic mean − Benign mean) averages 0.585, slightly less than the corpus-level mean-gap of 0.600 reported in independent AM calibration analyses.
3.4 Methodological interpretation of the SD asymmetry
The per-gene Pathogenic-class SD tends to exceed the per-gene Benign-class SD. Two plausible reasons:
- Substitution-class heterogeneity within the Pathogenic class: a single gene's Pathogenic missense variants may span proline introduction, disulfide loss, conservative-class within-chemistry substitution, and CpG-hotspot substitution. Each substitution class produces a different AM score. The within-gene Pathogenic SD therefore inherits the cross-substitution-class variance.
- Benign-class score floor: Benign variants in a gene tend to cluster at the low-AM-score end (mean ~0.221), where the score is bounded below by 0, which compresses the spread. Pathogenic scores are bounded above by 1 as well, so this explanation requires the floor to compress Benign scores more than the ceiling compresses Pathogenic scores; we do not test that here.
Both factors plausibly contribute to the 60.8% fraction with SD_P > SD_B and the 1.185 median SD ratio.
3.5 Genes with extreme high-Pathogenic-SD signal
The top 5 genes by Pathogenic-class SD are listed by gene identifier in result.json; we interpret them as genes with broad substitution-class diversity in their Pathogenic catalog. The bottom 5 (Pathogenic-class SD < 0.10) are genes where most Pathogenic variants have very high AM scores (likely concentrated in a single functional motif).
For practical variant interpretation: a per-gene Pathogenic-class SD ≥ 0.30 indicates that AM's per-variant scores have high gene-internal variance; individual-variant calls in such genes carry higher uncertainty than the gene-level mean would suggest.
4. Confound analysis
4.1 Stop-gain explicitly excluded
We filter alt = X. Reported numbers are missense-only.
4.2 AM training-set memorization
AM was trained partly on ClinVar labels. Per-gene AM-score statistics on ClinVar therefore reflect training-set fit in part. A pre-AM-training-cutoff stratification would partition memorization from generalization; we do not perform this. The reported per-gene SD asymmetry is the joint memorization + generalization signal.
4.3 Per-gene N variation
N varies from 20 (the cutoff) to 1,000+ per class per gene, and the standard error of a per-gene SD is larger at smaller N. The aggregated statistics (median SD ratio, fraction with SD_P > SD_B) pool all 457 genes and should be less sensitive to this N variation, but per-gene SD values for small-N genes (N = 20) carry wider uncertainty (roughly ±0.05 SD-units) than those for large-N genes (N = 200+).
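As a rough guide to how this uncertainty scales with N, the standard error of a sample SD is approximately s / sqrt(2(n − 1)) when the underlying values are normally distributed. AM scores are bounded and often skewed, so the normal-theory figures from the sketch below are an illustrative assumption rather than a result of this analysis; the function name `seOfSD` is hypothetical.

```javascript
// Normal-theory approximation to the standard error of a sample SD.
// Assumes approximately normal data, which AM scores only loosely satisfy.
function seOfSD(s, n) {
  return s / Math.sqrt(2 * (n - 1));
}

const s = 0.234; // mean per-gene Pathogenic SD reported in §3.1
for (const n of [20, 200, 1000]) {
  console.log(`n=${n}: approximate SE of SD = ${seOfSD(s, n).toFixed(3)}`);
}
```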
4.4 Per-isoform max-score
Per-isoform variability of AM scores is small (~0.05 score units). The 0.05-SD-bucket-width binning is robust to this noise.
4.5 ClinVar curatorial bias
Pathogenic / Benign labels are curator assertions, not gold-standard. The per-gene SD distribution reflects label assignment as well as biology. Gene-level calibration of AM may differ on a curator-independent gold-standard set (e.g., functional-assay-validated variant subsets).
4.6 Bessel-corrected SD
We use sample SD (n − 1 in denominator). For per-gene N = 20+, the difference vs population SD (n in denominator) is < 3% and does not affect the qualitative ranking.
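The size of this correction follows from the ratio of the two estimators on the same data, sqrt(n / (n − 1)). A short sketch (the function name `besselInflation` is illustrative):

```javascript
// Ratio of sample SD (divide by n - 1) to population SD (divide by n)
// computed on the same data: sqrt(n / (n - 1)).
function besselInflation(n) {
  return Math.sqrt(n / (n - 1));
}

for (const n of [20, 50, 200]) {
  const pctDiff = 100 * (besselInflation(n) - 1);
  console.log(`n=${n}: sample SD exceeds population SD by ${pctDiff.toFixed(2)}%`);
}
```

The ratio is largest at the smallest admissible N, so the N = 20 cutoff gives the upper bound on the difference across all 457 genes.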
4.7 No formal statistical test of SD asymmetry
We report the descriptive median SD ratio of 1.185 and the 60.8% fraction with SD_P > SD_B. A formal test of "median ratio = 1.0" (e.g., sign test, Wilcoxon signed-rank) would be expected to yield a highly significant p-value at this N (457 genes); we omit the formal test because the magnitude, not the significance, is the actionable quantity.
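Should a formal check be wanted, an exact sign test can be computed from the reported counts alone (278 of 457 genes with SD_P > SD_B). The sketch below is an illustration rather than part of the reported analysis; it assumes no ties and treats genes as independent, and the function name `signTestTwoSided` is hypothetical.

```javascript
// Exact two-sided sign test under H0: P(SD_P > SD_B) = 0.5.
// Assumes no ties and independent genes (both simplifying assumptions).
function signTestTwoSided(k, n) {
  const logHalfN = n * Math.log(0.5);
  // Build log C(n, j) iteratively to avoid overflow in the binomial coefficients.
  let logC = 0;
  const pmf = [Math.exp(logHalfN)];
  for (let j = 1; j <= n; j++) {
    logC += Math.log((n - j + 1) / j);
    pmf.push(Math.exp(logC + logHalfN));
  }
  // Upper-tail probability at the more extreme of k and n - k, doubled.
  const hi = Math.max(k, n - k);
  let tail = 0;
  for (let j = hi; j <= n; j++) tail += pmf[j];
  return Math.min(1, 2 * tail);
}

const p = signTestTwoSided(278, 457);
console.log(`two-sided sign-test p-value: ${p.toExponential(2)}`);
```

The sign test uses only the direction of each per-gene difference; a Wilcoxon signed-rank test would also use the magnitudes but needs the per-gene SD values from `result.json`.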
5. Implications
- Per-gene Pathogenic-class AM-score SD systematically exceeds Benign-class SD (60.8% of 457 genes; median ratio 1.185).
- The mean of per-gene Pathogenic AM-mean is 0.806; Benign 0.221 — AM is well-calibrated at the per-gene aggregate level.
- 18.4% of genes (84 / 457) have Pathogenic SD ≥ 0.30 — these are gene-level "high-internal-uncertainty" cases for variant-by-variant AM interpretation.
- The SD asymmetry is interpretable as substitution-class-heterogeneity within Pathogenic plus Benign-score-floor.
- For per-gene variant prioritization: per-gene Pathogenic SD > 0.30 indicates that AM's per-variant scores in that gene are noisier than the per-gene mean would suggest.
6. Limitations
- Stop-gain excluded (§4.1).
- AM training-set memorization (§4.2) — joint signal.
- Per-gene N variation (§4.3).
- Per-isoform max-score (§4.4).
- ClinVar curatorial bias (§4.5) — labels are not gold-standard.
- No formal SD-asymmetry hypothesis test (§4.7) — descriptive only.
7. Reproducibility
- Script: `analyze.js` (Node.js, ~80 LOC, zero deps).
- Inputs: ClinVar P + B JSON cache from MyVariant.info.
- Outputs: `result.json` with per-gene per-class mean and SD, SD-ratio, top-5 / bottom-5 lists, and per-bucket distribution.
- Verification mode: 6 machine-checkable assertions: (a) all per-gene SDs in [0, 1]; (b) all per-gene means in [0, 1]; (c) Σ per-bucket gene counts = 457; (d) median SD ratio > 1.0; (e) > 50% of genes have SD_P > SD_B; (f) sample sizes match input file contents.
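Checks (a) through (e) could be expressed along the following lines. The layout of `result.json` is not specified in this paper, so the field names used here (`genes`, `bucketsP`, `bucketsB`, `medianSdRatio`) and the function name `verify` are hypothetical; check (f) is left out because it needs the input cache.

```javascript
// Hypothetical result shape:
// { genes: [{ meanP, sdP, meanB, sdB }], bucketsP: [...], bucketsB: [...], medianSdRatio }
function verify(result) {
  const inUnit = (x) => Number.isFinite(x) && x >= 0 && x <= 1;
  const sum = (xs) => xs.reduce((a, b) => a + b, 0);
  const g = result.genes;
  const checks = {
    a_sdInRange: g.every((r) => inUnit(r.sdP) && inUnit(r.sdB)),
    b_meanInRange: g.every((r) => inUnit(r.meanP) && inUnit(r.meanB)),
    c_bucketTotal: sum(result.bucketsP) === g.length && sum(result.bucketsB) === g.length,
    d_medianRatio: result.medianSdRatio > 1.0,
    e_majority: g.filter((r) => r.sdP > r.sdB).length / g.length > 0.5,
  };
  const failed = Object.keys(checks).filter((k) => !checks[k]);
  return { ok: failed.length === 0, failed };
}

// Toy result with two genes (made-up values, not real output).
const toyResult = {
  genes: [
    { meanP: 0.8, sdP: 0.25, meanB: 0.2, sdB: 0.2 },
    { meanP: 0.9, sdP: 0.3, meanB: 0.25, sdB: 0.22 },
  ],
  bucketsP: [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
  bucketsB: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0],
  medianSdRatio: 1.3,
};
console.log(verify(toyResult));
```

Check (c) is written against the number of genes in the file rather than the literal 457 so the same function works on a toy input.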
node analyze.js
node analyze.js --verify
8. References
- Cheng, J., et al. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492.
- Landrum, M. J., et al. (2018). ClinVar. Nucleic Acids Res. 46, D1062–D1067.
- Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4. Genome Med. 12, 103.
- Wu, C., et al. (2021). MyVariant.info. Bioinformatics 37, 4029–4031.
- Bessel, F. W. (1838). Untersuchungen über die Wahrscheinlichkeit der Beobachtungsfehler. Astron. Nachr. 15, 369–404. (Bessel correction for sample SD reference.)
- Ioannidis, N. M., et al. (2016). REVEL. Am. J. Hum. Genet. 99, 877–885.
- Pejaver, V., et al. (2022). Calibration of computational tools for missense variant pathogenicity classification. Am. J. Hum. Genet. 109, 2163–2177.
- Richards, S., et al. (2015). ACMG/AMP variant interpretation guidelines. Genet. Med. 17, 405–424.
- Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics 1, 80–83. (Signed-rank test reference.)
- Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60.