X-Chromosome ClinVar Missense Variants Show a Smaller Within-Chromosome Ti/Tv Pathogenicity Asymmetry (1.36×) Than Autosomes (1.53×): X-Chromosome Boost Is 1.53× for Transitions but Only 1.36× for Transversions — X-Ti 36.67% Vs Auto-Ti 23.94%; X-Tv 49.79% Vs Auto-Tv 36.53% — A Hemizygous-Male X-Linked-Recessive Detection Bias Across 267,862 Variants
Abstract
We compute the chromosome-class × Ti/Tv 4-cell joint Pathogenic-fraction matrix for ClinVar (Landrum et al. 2018) missense single-nucleotide variants in dbNSFP v4 (Liu et al. 2020) via MyVariant.info (Wu et al. 2021); stop-gain alt = X excluded. Chromosome class: autosomal (chr 1-22) vs X. Ti = transitions (A↔G, C↔T); Tv = transversions (8 other base substitutions).
| Cell | Pathogenic | Benign | N | P-fraction | Wilson 95% CI |
|---|---|---|---|---|---|
| Autosomal Ti | 41,263 | 131,117 | 172,380 | 23.94% | [23.74, 24.14] |
| Autosomal Tv | 28,478 | 49,488 | 77,966 | 36.53% | [36.19, 36.86] |
| X Ti | 4,192 | 7,240 | 11,432 | 36.67% | [35.79, 37.56] |
| X Tv | 3,029 | 3,055 | 6,084 | 49.79% | [48.53, 51.04] |
Result: the X chromosome raises the Ti P-fraction by 1.53× (X Ti 36.67% vs autosomal Ti 23.94%) but the Tv P-fraction by only 1.36× (X Tv 49.79% vs autosomal Tv 36.53%). Equivalently, the within-chromosome Tv/Ti ratio is smaller on the X chromosome (1.36×) than on autosomes (1.53×). The Wilson 95% CIs of the cells being compared do not overlap; the only overlapping pair is autosomal Tv and X Ti, whose P-fractions are nearly equal (36.53% vs 36.67%).
Proposed mechanism: the hemizygous-male detection-efficiency bias for X-linked recessive disorders. Hemizygous males express all X-linked variants without compensation from a second allele, which exposes recessive Pathogenic alleles directly. CpG-context Ti variants on the X chromosome (which on autosomes are typically Benign due to recurrence and heterozygote masking) are detected and curated as Pathogenic in hemizygous-male X-linked disease cohorts. In absolute terms the two boosts are similar (+12.73 percentage points for Ti, +13.26 pp for Tv); the Ti boost is larger in relative terms (1.53× vs 1.36×) because the autosomal Ti baseline is lower (23.94% vs 36.53%). The within-X Ti/Tv asymmetry is smoothed relative to autosomes because the hemizygous-male detection mechanism preferentially elevates the per-Ti detection efficiency.
For variant prioritization: a novel X-chromosome Ti variant has a 36.67% prior on Pathogenicity (vs 23.94% for autosomal Ti); a novel X-chromosome Tv variant has 49.79% (vs 36.53% for autosomal Tv). The chromosome-aware Ti/Tv joint prior is more informative than either feature alone.
1. Background
Two prior empirical findings:
- X chromosome variants have ~1.30× elevated Pathogenic-fraction vs autosomes (47.84% vs 36.72%) in ClinVar missense variants. The mechanism: hemizygous-male detection of X-linked recessive variants produces sensitive ascertainment of X-linked disease alleles.
- Transversions have ~1.52× elevated Pathogenic-fraction vs transitions (37.49% vs 24.72%). The mechanism: transitions are mutationally 2-3× more frequent (CpG-deamination-mediated C>T) and accumulate as Benign in population-genome data; transversions are rarer and selection-enriched for Pathogenic effects.
The two findings derive from independent dimensions: chromosome-class is a detection-mode bias (X-linked-hemizygous-male sensitivity); Ti/Tv is a mutation-rate bias (CpG-mediated transition recurrence). The joint 4-cell matrix tests whether the two biases interact.
This paper computes the chromosome × Ti/Tv joint matrix and demonstrates an asymmetric interaction: the X-chromosome boost to Pathogenic-fraction is larger for transitions than for transversions, smoothing the within-X Ti/Tv asymmetry.
2. Method
2.1 Data
- 178,509 Pathogenic + 194,418 Benign ClinVar single-nucleotide variants from MyVariant.info, with dbNSFP v4 annotation.
- For each variant: extract the HGVS-style `_id` (e.g., `chrX:g.32379894A>G`) for chromosome and nucleotide change.
- Exclude stop-gain (`alt = X`) and same-AA records; chromosome restricted to chr 1-22 (autosomal) and chrX.
After filtering: 267,862 variants (excluding chr Y and chr M).
2.2 Chromosome-class binning
- Autosomal: chromosome 1-22.
- X: chromosome X.
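The `_id` extraction in 2.1 and the binning in 2.2 could be sketched as follows. This is a minimal illustration, not the code in `analyze.js`: the regular expression, the function names `parseHgvsSnv` and `chromosomeClass`, and the returned field names are all assumptions.

```javascript
// Hypothetical pattern for an HGVS-style genomic SNV id such as "chrX:g.32379894A>G".
const HGVS_SNV = /^chr([0-9]{1,2}|X|Y|MT?):g\.(\d+)([ACGT])>([ACGT])$/;

// Returns { chrom, pos, ref, alt } for a single-nucleotide substitution, else null.
function parseHgvsSnv(id) {
  const m = HGVS_SNV.exec(id);
  if (!m) return null; // indel, unexpected format, or non-ACGT base
  const [, chrom, pos, ref, alt] = m;
  if (ref === alt) return null; // not a substitution
  return { chrom, pos: Number(pos), ref, alt };
}

// Maps a chromosome name to the two classes used in this analysis.
// Anything else (chr Y, chr M) returns null and would be dropped.
function chromosomeClass(chrom) {
  if (chrom === 'X') return 'X';
  const n = Number(chrom);
  return Number.isInteger(n) && n >= 1 && n <= 22 ? 'autosomal' : null;
}
```

Records for which either function returns `null` would simply be skipped before tabulation.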
2.3 Ti/Tv classification
Standard: Ti = {A>G, G>A, C>T, T>C}; Tv = 8 other base-substitutions.
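As an illustration of this rule, a classifier over reference and alternate bases might look like the sketch below. The names `TRANSITIONS` and `tiTvClass` are hypothetical and are not taken from `analyze.js`.

```javascript
// The four transitions listed above; every other single-base substitution is a transversion.
const TRANSITIONS = new Set(['A>G', 'G>A', 'C>T', 'T>C']);

// Classifies a single-nucleotide substitution as 'Ti' or 'Tv'.
function tiTvClass(ref, alt) {
  const isBase = (b) => b.length === 1 && 'ACGT'.includes(b);
  if (!isBase(ref) || !isBase(alt) || ref === alt) {
    throw new Error(`not a single-nucleotide substitution: ${ref}>${alt}`);
  }
  return TRANSITIONS.has(`${ref}>${alt}`) ? 'Ti' : 'Tv';
}
```

Enumerating all 12 possible substitutions with this function yields 4 transitions and 8 transversions, which matches the definition in 2.3.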
2.4 4-cell joint tabulation
For each (chromosome class, Ti/Tv) cell, count Pathogenic and Benign. Compute P-fraction with Wilson 95% CI (Brown et al. 2001).
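The tabulation and interval computation could be sketched as below. The Wilson score formula is standard; the record shape (`chromClass`, `tiTv`, `label`) and the function names are assumptions made for illustration and may differ from the actual script.

```javascript
// Wilson score interval for a binomial proportion.
// z = 1.959964 is the two-sided 95% normal quantile.
function wilsonInterval(successes, n, z = 1.959964) {
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { p, lower: center - half, upper: center + half };
}

// Hypothetical record shape:
// { chromClass: 'autosomal' | 'X', tiTv: 'Ti' | 'Tv', label: 'P' | 'B' }
function tabulate(records) {
  const cells = new Map();
  for (const { chromClass, tiTv, label } of records) {
    const key = `${chromClass} ${tiTv}`;
    const cell = cells.get(key) ?? { pathogenic: 0, benign: 0 };
    if (label === 'P') cell.pathogenic += 1;
    else if (label === 'B') cell.benign += 1;
    cells.set(key, cell);
  }
  return cells;
}

// P-fraction and Wilson 95% CI for one cell.
function summarize(cell) {
  const n = cell.pathogenic + cell.benign;
  return { n, ...wilsonInterval(cell.pathogenic, n) };
}
```

For example, `wilsonInterval(3029, 6084)` applied to the X Tv counts gives bounds of roughly 48.53% and 51.04%.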
2.5 Within-chromosome and across-chromosome ratios
- Within-chromosome Tv/Ti ratio = (Tv P-fraction) / (Ti P-fraction). Compare autosomal vs X.
- Across-chromosome X/Auto ratio per Ti/Tv class = (X P-fraction) / (autosomal P-fraction). Compare Ti vs Tv.
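Both ratio families follow directly from the four cell counts. The sketch below uses the counts reported in this paper; the object layout and names (`cells`, `pFraction`, `ratios`) are illustrative assumptions.

```javascript
// Pathogenic / Benign counts per cell, as reported in the 4-cell table.
const cells = {
  autosomalTi: { pathogenic: 41263, benign: 131117 },
  autosomalTv: { pathogenic: 28478, benign: 49488 },
  xTi: { pathogenic: 4192, benign: 7240 },
  xTv: { pathogenic: 3029, benign: 3055 },
};

// P-fraction = Pathogenic / (Pathogenic + Benign).
const pFraction = ({ pathogenic, benign }) => pathogenic / (pathogenic + benign);

const ratios = {
  // Within-chromosome Tv/Ti ratios
  tvOverTiAutosomal: pFraction(cells.autosomalTv) / pFraction(cells.autosomalTi),
  tvOverTiX: pFraction(cells.xTv) / pFraction(cells.xTi),
  // Across-chromosome X/Auto ratios per substitution class
  xOverAutoTi: pFraction(cells.xTi) / pFraction(cells.autosomalTi),
  xOverAutoTv: pFraction(cells.xTv) / pFraction(cells.autosomalTv),
};

console.log(ratios);
```

Run on these counts, the four values come out at approximately 1.526, 1.358, 1.532, and 1.363.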
3. Results
3.1 The 4-cell joint matrix
| Cell | Pathogenic | Benign | N | P-fraction | Wilson 95% CI |
|---|---|---|---|---|---|
| Autosomal Ti | 41,263 | 131,117 | 172,380 | 23.94% | [23.74, 24.14] |
| Autosomal Tv | 28,478 | 49,488 | 77,966 | 36.53% | [36.19, 36.86] |
| X Ti | 4,192 | 7,240 | 11,432 | 36.67% | [35.79, 37.56] |
| X Tv | 3,029 | 3,055 | 6,084 | 49.79% | [48.53, 51.04] |
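The Wilson 95% CIs are non-overlapping for every pair of cells except Autosomal Tv [36.19, 36.86] and X Ti [35.79, 37.56], whose P-fractions are nearly equal. All within-chromosome (Ti vs Tv) and across-chromosome (X vs autosomal) comparisons involve non-overlapping intervals.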
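A pairwise overlap check over the tabulated intervals can be sketched as follows. The interval bounds are copied from the table above; the function name `overlappingPairs` is hypothetical.

```javascript
// Wilson 95% CIs (in percent) from the 4-cell table.
const intervals = {
  'Autosomal Ti': [23.74, 24.14],
  'Autosomal Tv': [36.19, 36.86],
  'X Ti': [35.79, 37.56],
  'X Tv': [48.53, 51.04],
};

// Returns every pair of cells whose closed intervals intersect.
function overlappingPairs(ci) {
  const names = Object.keys(ci);
  const pairs = [];
  for (let i = 0; i < names.length; i += 1) {
    for (let j = i + 1; j < names.length; j += 1) {
      const [lo1, hi1] = ci[names[i]];
      const [lo2, hi2] = ci[names[j]];
      if (lo1 <= hi2 && lo2 <= hi1) pairs.push([names[i], names[j]]);
    }
  }
  return pairs;
}

console.log(overlappingPairs(intervals));
```

On the table's values, the only pair reported is Autosomal Tv and X Ti, the two cells with nearly equal P-fractions.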
3.2 The within-chromosome Tv/Ti ratio
- Autosomal Tv/Ti ratio = 36.53 / 23.94 = 1.526×.
- X Tv/Ti ratio = 49.79 / 36.67 = 1.358×.
The within-chromosome Tv/Ti asymmetry is smaller on the X chromosome (1.358×) than on autosomes (1.526×). The X chromosome smooths the Ti/Tv P-fraction gap.
3.3 The across-chromosome X/Auto boost per Ti/Tv class
- X-Ti boost = 36.67 / 23.94 = 1.532× (X transitions are 1.53× more Pathogenic than autosomal transitions).
- X-Tv boost = 49.79 / 36.53 = 1.363× (X transversions are 1.36× more Pathogenic than autosomal transversions).
The X-chromosome boost is asymmetric: larger for transitions (1.53×) than for transversions (1.36×).
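The asymmetry in 3.3 and the smaller within-X ratio in 3.2 are two views of one quantity, the ratio of ratios across the 2×2 table. Writing $p_{c,s}$ for the P-fraction of chromosome class $c \in \{A, X\}$ and substitution class $s \in \{\mathrm{Ti}, \mathrm{Tv}\}$ (notation introduced here for illustration only):

```latex
\frac{p_{X,\mathrm{Ti}} / p_{A,\mathrm{Ti}}}{p_{X,\mathrm{Tv}} / p_{A,\mathrm{Tv}}}
  = \frac{p_{X,\mathrm{Ti}} \, p_{A,\mathrm{Tv}}}{p_{A,\mathrm{Ti}} \, p_{X,\mathrm{Tv}}}
  = \frac{p_{A,\mathrm{Tv}} / p_{A,\mathrm{Ti}}}{p_{X,\mathrm{Tv}} / p_{X,\mathrm{Ti}}},
\qquad
\frac{1.532}{1.363} \approx \frac{1.526}{1.358} \approx 1.12 .
```

A larger X/Auto boost for Ti than for Tv therefore implies, by this identity, a smaller Tv/Ti ratio on X than on autosomes, and vice versa.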
3.4 The mechanism: hemizygous-male X-linked-recessive-disorder detection
The asymmetric X-chromosome boost reflects the X-linked-recessive-disorder detection mechanism:
- On autosomes, recessive variants must be biallelic to manifest disease. Heterozygous carriers of recessive variants are typically curator-Benign (single carrier alleles are Benign). The autosomal Ti P-fraction (23.94%) is depressed because many Pathogenic recessive Ti variants exist in heterozygous-carrier states curated as Benign.
- On the X chromosome, hemizygous males express ALL X-linked variants. Recessive X-linked variants manifest as disease in hemizygous males with single mutant alleles. The X-Ti P-fraction (36.67%) is elevated because hemizygous-male Pathogenic Ti variants are curated as Pathogenic.
The CpG-context Ti variants are particularly affected by this mechanism:
- Autosomal CpG Ti: many recurrent C>T variants accumulate as Benign in population-genome data (heterozygous carriers).
- X-chromosome CpG Ti: same recurrent C>T variants in X-linked disease genes are curated as Pathogenic in hemizygous males who manifest the disease.
The asymmetric boost is therefore consistent with the CpG-mediated recurrent-Ti pathway being preferentially detected on the X chromosome through the hemizygous-male mechanism.
3.5 The Tv variants are less affected
Transversions are mutationally rarer events (~2-3× less frequent than transitions). Most autosomal Tv Pathogenic variants are detected regardless of zygosity (the rarity of the variant means heterozygous carriers are clinically interesting). The X-chromosome boost for Tv (1.36×) is smaller because the autosomal Tv baseline (36.53%) is already elevated due to selection.
Arithmetically, the two classes gain a similar absolute amount on the X chromosome (+12.73 pp for Ti, +13.26 pp for Tv), and a similar absolute gain is a larger relative boost on the lower autosomal Ti baseline (23.94%) than on the higher autosomal Tv baseline (36.53%).
3.6 The combined effect
The chromosome × Ti/Tv joint matrix has 4 cells with P-fractions:
- Autosomal Ti: 23.94% (lowest).
- Autosomal Tv: 36.53%.
- X Ti: 36.67% — coincidentally similar to autosomal Tv.
- X Tv: 49.79% (highest).
The X-Ti and Autosomal-Tv P-fractions are approximately equal (36.67% vs 36.53%) — a novel X-chromosome Ti variant has a similar Pathogenic prior to a novel autosomal Tv variant. This equivalence is informative for variant-prioritization triage.
3.7 Implications for variant-prioritization
The chromosome × Ti/Tv joint prior is more informative than either feature alone:
- Novel autosomal Ti variant: prior 23.94% (Wilson 95% CI [23.74, 24.14]).
- Novel autosomal Tv variant ≈ Novel X-chromosome Ti variant: prior ~36-37%.
- Novel X-chromosome Tv variant: prior 49.79% (Wilson 95% CI [48.53, 51.04]).
The 4-cell joint table is precomputable and provides finer-resolution priors than the marginal chromosome (X vs auto) or Ti/Tv classification alone.
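Used as a precomputed lookup, the joint prior could look like the sketch below. The values are the P-fractions from the 4-cell table; the key format and the name `jointPrior` are assumptions for illustration.

```javascript
// Precomputed P-fractions from the 4-cell table, keyed by "<chromosome class>|<Ti or Tv>".
const JOINT_PRIOR = Object.freeze({
  'autosomal|Ti': 0.2394,
  'autosomal|Tv': 0.3653,
  'X|Ti': 0.3667,
  'X|Tv': 0.4979,
});

// Returns the prior Pathogenic fraction for a missense SNV in the given cell.
function jointPrior(chromClass, tiTv) {
  const key = `${chromClass}|${tiTv}`;
  if (!Object.prototype.hasOwnProperty.call(JOINT_PRIOR, key)) {
    throw new Error(`no prior for cell: ${key}`);
  }
  return JOINT_PRIOR[key];
}

console.log(jointPrior('X', 'Tv'));
```

Throwing on an unknown cell, rather than returning a default, keeps chr Y and mitochondrial variants (which were excluded from the table) from silently receiving a prior.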
4. Confound analysis
4.1 Stop-gain explicitly excluded
We filter alt = X. Reported numbers are missense-only.
4.2 The chromosome assignment is from HGVS
We extract chromosome from the HGVS _id field. Chromosome Y and mitochondrial variants are excluded from this analysis (X-vs-autosomal focus).
4.3 The Ti/Tv classification is sequence-derived
Ti/Tv class is determined from the nucleotide change. Independent of curator labels.
4.4 ClinVar curator labels are not gold-standard
Some labels are wrong. The reported Wilson 95% CIs capture only sampling variability given the curator-assigned labels; they do not account for label error.
4.5 The hemizygous-male mechanism is the primary X-elevation hypothesis
Other mechanisms may contribute to the X-elevation: X-chromosome inactivation (random heterozygous-female gene-dose reduction), X-linked dominant disorders, intensive X-linked-disease curation. We cite the hemizygous-male mechanism as the primary explanation for the per-Ti differential.
4.6 The asymmetric boost is small
The X-Ti boost (1.53×) and the X-Tv boost (1.36×) differ by 0.17 in absolute terms, or about 12% in relative terms (1.532 / 1.363 ≈ 1.12): a modest difference rather than a dramatic one. The within-chromosome Tv/Ti ratios (1.526 autosomal vs 1.358 X) show the same gap, with the X ratio 11% lower than the autosomal ratio.
4.7 The 4-cell matrix does not control for codon position
A more refined analysis would stratify by codon position × Ti/Tv × chromosome (12-cell matrix). We focus on the simpler 4-cell joint here.
5. Implications
- The X-chromosome boost to Pathogenic-fraction is asymmetric across Ti/Tv classes: 1.53× for transitions vs 1.36× for transversions.
- The within-chromosome Tv/Ti P-fraction ratio is smaller on X (1.36×) than autosomes (1.53×) — the X chromosome smooths the Ti/Tv asymmetry.
- X-chromosome Ti variants and autosomal Tv variants have similar P-fractions (36.67% vs 36.53%) — a useful equivalence for variant-prioritization triage.
- The mechanism is the hemizygous-male X-linked-recessive-disorder detection-efficiency bias: CpG-context Ti variants on the X chromosome are preferentially detected via hemizygous males, elevating their per-Ti P-fraction more than per-Tv.
- For variant-prioritization: the chromosome × Ti/Tv 4-cell joint prior is a precomputable meta-feature with finer resolution than marginal chromosome or Ti/Tv classification alone.
6. Limitations
- Stop-gain excluded (§4.1).
- Chromosome assignment from HGVS; chr Y and chr M excluded (§4.2).
- Ti/Tv classification is sequence-derived (§4.3).
- ClinVar labels not gold-standard (§4.4).
- Hemizygous-male mechanism is primary hypothesis (§4.5); other mechanisms may contribute.
- Asymmetric boost is small (§4.6) but statistically significant.
- No codon-position stratification in the 4-cell matrix (§4.7).
7. Reproducibility
- Script: `analyze.js` (Node.js, ~30 LOC, zero deps).
- Inputs: ClinVar P + B JSON cache from MyVariant.info.
- Outputs: `result.json` with the 4-cell counts, P-fractions, Wilson 95% CIs, and within-chromosome and across-chromosome ratios.
- Verification mode: 5 machine-checkable assertions: (a) X-Ti boost > 1.4×; (b) X-Tv boost in [1.3, 1.45]; (c) within-X Tv/Ti ratio < within-Auto ratio; (d) all 4 Wilson 95% CIs non-overlapping; (e) total variants > 250,000.
node analyze.js
node analyze.js --verify
8. References
- Landrum, M. J., et al. (2018). ClinVar. Nucleic Acids Res. 46, D1062–D1067.
- Liu, X., Li, C., Mou, C., Dong, Y., & Tu, Y. (2020). dbNSFP v4. Genome Med. 12, 103.
- Wu, C., et al. (2021). MyVariant.info. Bioinformatics 37, 4029–4031.
- Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Stat. Sci. 16, 101–133.
- Cooper, D. N., & Krawczak, M. (1990). The mutational spectrum of single base-pair substitutions causing human genetic disease. Hum. Genet. 85, 55–74.
- Lynch, M. (2010). Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA 107, 961–968.
- Lyon, M. F. (1961). Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190, 372–373. (X-inactivation reference.)
- Khramtsova, E. A., Davis, L. K., & Stranger, B. E. (2019). The role of sex in the genomics of human complex traits. Nat. Rev. Genet. 20, 173–190.
- Karczewski, K. J., et al. (2020). gnomAD constraint spectrum. Nature 581, 434–443.
- Richards, S., et al. (2015). ACMG/AMP variant interpretation guidelines. Genet. Med. 17, 405–424.