Single-Pillar Epigenetic Benchmarks Miss Cross-Pillar Confounders: A Four-Pillar Fidelity Atlas
Abstract
Epigenetic aging benchmarks typically assess a single chromatin axis and can misclassify signatures dominated by nuisance biology. We construct a 208-gene four-pillar benchmark, the Fidelity Atlas, spanning PRC2-linked memory (30 genes), nucleosome turnover (24), nuclear architecture (25), and AP-1 reprogramming (25), with five non-overlapping confounder panels (104 genes). The pipeline executes from a cold-start SKILL.md on CPU-only hardware in under 15 seconds. We validate on 15 real signatures: 7 curated from published gene lists and 8 from raw GEO transcriptomic reanalysis (GSE63577, GSE201710). Of the 12 with sufficient coverage, the full model classifies all 12 correctly. On the curated signatures it scores 7/7, ahead of six comparison models (4-6/7) and tied with the PRC2-only ablation, and it correctly identifies ISG/interferon suppression by OSK as confounded, a distinction the direction-only baseline misses. For longevity therapeutics, where distinguishing genuine chromatin restoration from SASP suppression determines clinical success, multi-pillar confounder-gated assessment is essential.
Introduction
Epigenetic fidelity — the faithful maintenance of chromatin states across cell divisions and aging — degrades through at least four axes: erosion of PRC2-deposited H3K27me3 marks, altered nucleosome turnover via histone variant H3.3, deterioration of nuclear architecture through lamin B1 loss, and AP-1-driven transcriptional reprogramming. Existing benchmarks focus on a single pillar. We construct the Fidelity Atlas: a four-pillar benchmark that scores signatures across all axes and gates classification on confounder rejection.
Methods
Gene Universe (208 Genes, Zero Overlap)
The universe comprises 104 pillar genes across four modules and 104 confounder genes across five panels, with zero overlap:
- Nuclear architecture (25 genes): core lamina genes (LMNB1, LBR, EMD, TMPO, SUN1) plus nuclear envelope, nucleoporin, and heterochromatin-protein genes.
- PRC2-linked memory (30 genes): PRC2 complex subunits (EZH2, SUZ12, EED, JARID2, KDM6B), accessory factors, PRC1 components, and Polycomb-target developmental transcription factors.
- Nucleosome turnover (24 genes): H3.3 variants (H3F3A/B), histone chaperones (DAXX, ATRX, CHAF1A/B, HIRA), and chromatin remodelers.
- AP-1 reprogramming (25 genes): AP-1 family (JUN, FOS, FOSL1/2, ATF3/4), NF-kB subunits, and immediate-early response genes.
Five confounder panels (20-24 genes each) cover proliferation, interferon, DNA damage, SASP, and immune activation.
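The zero-overlap property can be checked mechanically. The following is a minimal sketch, not the repository's own audit: `check_universe` is a hypothetical helper, the gene sets are a toy subset built from genes named in this paper, and the assignment of IL6/CXCL8/MMP3 and MX1/IFIT1/STAT1 to specific confounder panels is an assumption made for illustration.

```python
from itertools import combinations

# Pillar sizes as stated in the text; they should sum to 104.
PILLAR_SIZES = {
    "prc2_memory": 30,
    "nucleosome_turnover": 24,
    "nuclear_architecture": 25,
    "ap1_reprogramming": 25,
}


def check_universe(pillars, confounders):
    """Raise if any two gene sets share a gene; otherwise return size totals."""
    groups = {f"pillar:{k}": set(v) for k, v in pillars.items()}
    groups.update({f"confounder:{k}": set(v) for k, v in confounders.items()})
    for (name_a, genes_a), (name_b, genes_b) in combinations(groups.items(), 2):
        shared = genes_a & genes_b
        if shared:
            raise ValueError(f"{name_a} and {name_b} share {sorted(shared)}")
    n_pillar = sum(len(set(v)) for v in pillars.values())
    n_conf = sum(len(set(v)) for v in confounders.values())
    return {"pillar": n_pillar, "confounder": n_conf, "total": n_pillar + n_conf}


# Toy subset using genes named in the text (not the full 208-gene universe).
toy_pillars = {
    "nuclear_architecture": {"LMNB1", "LBR", "EMD", "TMPO", "SUN1"},
    "prc2_memory": {"EZH2", "SUZ12", "EED", "JARID2", "KDM6B"},
    "ap1_reprogramming": {"JUN", "FOS", "FOSL1", "FOSL2", "ATF3", "ATF4"},
}
toy_confounders = {
    "sasp": {"IL6", "CXCL8", "MMP3"},         # assumed panel membership
    "interferon": {"MX1", "IFIT1", "STAT1"},  # assumed panel membership
}

summary = check_universe(toy_pillars, toy_confounders)
print(summary)
print(sum(PILLAR_SIZES.values()))  # 104
```

Running the same check over the full frozen gene tables would confirm the 104 + 104 = 208 split claimed above.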
Scoring and Classification
For each of the 8 directional modules, we compute a null-adjusted weighted overlap score (256 null draws). Classification proceeds in order: (1) if the maximum confounder score >= the winning direction score, emit confounded; (2) if margin <= 0.10 or pillar agreement < 0.50, emit mixed; (3) otherwise, emit the dominant direction.
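The scoring and gating logic can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the text does not specify how the weighted overlap is normalised, how null gene sets are drawn, or how the rule-2 margin is defined. Here the null is taken to be size-matched random draws from the universe, and the margin is taken to be the winning direction score minus the runner-up direction score. All function names are hypothetical.

```python
import random


def weighted_overlap(signature, module):
    """Weighted sum of signature values over module genes, scaled by total weight."""
    total_weight = sum(module.values())
    hit = sum(w * signature[g] for g, w in module.items() if g in signature)
    return hit / total_weight


def null_adjusted_score(signature, module, universe, n_draws=256, seed=0):
    """Observed overlap minus the mean overlap of size-matched random gene sets."""
    rng = random.Random(seed)
    pool = sorted(universe)  # random.sample needs a sequence
    values = list(signature.values())
    null_mean = sum(
        weighted_overlap(dict(zip(rng.sample(pool, len(values)), values)), module)
        for _ in range(n_draws)
    ) / n_draws
    return weighted_overlap(signature, module) - null_mean


def classify(direction_scores, max_confounder, pillar_agreement,
             margin_cutoff=0.10, agreement_cutoff=0.50):
    """Apply the three rules in order: confounded, then mixed, then winner."""
    ranked = sorted(direction_scores.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (_, runner_up) = ranked[0], ranked[1]
    if max_confounder >= top:
        return "confounded"
    # Assumption: margin = winning direction score minus runner-up score.
    if top - runner_up <= margin_cutoff or pillar_agreement < agreement_cutoff:
        return "mixed"
    return winner


# Toy universe of 20 placeholder genes; the module covers the first five.
universe = [f"G{i}" for i in range(20)]
module = {f"G{i}": 1.0 for i in range(5)}
inside = null_adjusted_score({"G0": 1.0, "G1": 1.0, "G2": 1.0}, module, universe)
print(round(inside, 2))
print(classify({"fidelity_loss": 0.50, "fidelity_restoration": 0.10}, 0.20, 0.75))
```

Because the confounder check runs first, a signature with a strong direction score is still emitted as confounded whenever a confounder panel scores at least as high.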
Baselines
Eight models are compared: the full model (four-pillar + confounder gating), direction-only, ssGSEA, majority-vote, random forest (on module scores), RF raw features, and two single-pillar ablations (PRC2-only, AP-1-only).
Results
Baseline Comparison on Real Signatures
| Model | Correct (of 7) | Key failures |
|---|---|---|
| Full model | 7/7 | — |
| PRC2-only | 7/7 | Ties on PRC2-dominated real set* |
| AP-1-only | 6/7 | PRC2 targets -> confounded |
| Direction-only | 5/7 | Both confounded -> fidelity_loss |
| Majority-vote | 5/7 | Both confounded -> fidelity_loss |
| RF raw features | 5/7 | Both confounded -> fidelity_loss |
| ssGSEA | 4/7 | Over-flags 3 fidelity as confounded |
| Random forest | 4/7 | Misses confounded; PRC2 targets -> mixed |
*PRC2-only ties the full model on these 7 signatures because they are PRC2-dominated; it falls short on the synthetic panel (AUPRC 0.698), where nucleosome turnover and architecture matter.
Curated Real Signature Detail
| Signature | Source | Full Model | Margin | Dir.-Only |
|---|---|---|---|---|
| Senescence UP | Casella 2019 | confounded | -0.059 | fidelity_loss |
| Senescence DOWN | Casella 2019 | fidelity_loss | +0.048 | fidelity_loss |
| MPTR restore | Gill 2022 | fidelity_restoration | +0.120 | fidelity_restoration |
| PRC2 targets | Ben-Porath 2008 | fidelity_loss | +0.153 | fidelity_loss |
| Curated PRC2 restore | curated | fidelity_restoration | +0.306 | fidelity_restoration |
| Aging clock | Horvath 2013 | fidelity_loss | +0.031 | fidelity_loss |
| Combined sen. | Casella 2019 | confounded | -0.051 | fidelity_loss |
The senescence-UP signature contains AP-1 (JUN, FOS, ATF3) plus SASP genes (IL6, CXCL8, MMP3); confounders dominate the fidelity signal (margin -0.059). The Horvath clock shows the thinnest positive margin (+0.031). The curated PRC2 restore has the widest margin (+0.306).
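The curated table can be tallied directly to recover the direction-only score. The sketch below transcribes the rows above and treats the full model's calls as reference labels, which is justified only because the baseline comparison reports the full model as 7/7 correct on this set.

```python
# (signature, full-model call, margin, direction-only call), from the table above.
ROWS = [
    ("Senescence UP", "confounded", -0.059, "fidelity_loss"),
    ("Senescence DOWN", "fidelity_loss", 0.048, "fidelity_loss"),
    ("MPTR restore", "fidelity_restoration", 0.120, "fidelity_restoration"),
    ("PRC2 targets", "fidelity_loss", 0.153, "fidelity_loss"),
    ("Curated PRC2 restore", "fidelity_restoration", 0.306, "fidelity_restoration"),
    ("Aging clock", "fidelity_loss", 0.031, "fidelity_loss"),
    ("Combined sen.", "confounded", -0.051, "fidelity_loss"),
]

# Signatures where direction-only departs from the full model.
disagreements = [name for name, full, _, d_only in ROWS if full != d_only]
agree = len(ROWS) - len(disagreements)

# Compare the set of negative-margin rows with the set of confounded calls.
negative_margin = {name for name, _, margin, _ in ROWS if margin < 0}
confounded = {name for name, full, _, _ in ROWS if full == "confounded"}

print(f"direction-only matches {agree}/{len(ROWS)}")  # 5/7
print(disagreements)  # ['Senescence UP', 'Combined sen.']
print(negative_margin == confounded)  # True
```

In this table, the two negative margins coincide exactly with the two confounded calls, and these are the same two signatures that direction-only labels fidelity_loss.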
Raw Transcriptomic Validation
| Signature | Source | Full Model | Correct? |
|---|---|---|---|
| Fidelity-down DEGs | GSE63577 | fidelity_loss | Yes |
| AP-1 up DEGs | GSE63577 | fidelity_loss | Yes |
| Combined sen. DEGs | GSE63577 | fidelity_loss | Yes |
| OSK module restore | Sahu 2024 | fidelity_restoration | Yes |
| OSK ISG suppression | Sahu 2024 | confounded | Yes |
| Bulk sen. UP/DOWN | GSE63577 | insuff. coverage | Correct |
| Gill 2022 temp-down | eLife S3 | insuff. coverage | Correct |
The ISG suppression signature (Sahu 2024) comprises MX1, IFIT1, OAS1-3, and STAT1, all downregulated by OSK. Direction-only calls this mixed; the full model flags it as confounded, detecting interferon-panel dominance. ISG suppression is SASP reduction, not fidelity restoration.
Synthetic Panel and Ablations
On the primary panel (n=24), the full model reaches AUPRC 1.000 vs. 0.985 for direction-only. Single-pillar ablations (PRC2-only 0.698, AP-1-only 0.778) confirm that no single axis suffices. On the blind panel, the full model classifies 6/7 (85.7%) correctly.
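For readers unfamiliar with the metric, AUPRC is commonly estimated as average precision: the mean of the precision values at each rank where a true positive is retrieved. The text does not say which estimator the pipeline uses, so the sketch below is one standard choice rather than the benchmark's own code. The labels and scores are invented toy values, and ties in score are not handled specially.

```python
def average_precision(labels, scores):
    """Mean precision at the rank of each positive, ranking by descending score."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Perfect ranking: every positive is scored above every negative.
perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])

# One negative ranked above a positive lowers the score.
imperfect = average_precision([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.1])

print(perfect)                 # 1.0
print(round(imperfect, 3))     # 0.917
print(round(100 * 6 / 7, 1))   # 85.7, the blind-panel accuracy
```

An AUPRC of 1.000 therefore means that every positive in the panel was ranked above every negative.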
Discussion
Single-pillar and direction-only benchmarks are insufficient for epigenetic fidelity evaluation, and this manifests on real data. Direction-only misclassifies 2/7 curated signatures and calls ISG suppression mixed. The 208-gene universe with zero module-confounder overlap ensures confounder detection is mechanistically independent of pillar scoring. The Sahu 2024 ISG result demonstrates that confounder gating catches epistemically misleading signals: interferon suppression masquerading as rejuvenation.
Limitations: The benchmark panel is synthetic. The real-data sample (12 informative signatures) is small. Future work should extend transcriptomic validation to additional datasets.
Conclusion
Fidelity Atlas, a 208-gene benchmark with strict module-confounder separation, scores 7/7 on curated real signatures, ahead of six comparison models (4-6/7) and tied with the PRC2-only ablation, and correctly classifies all 12 informative real signatures (7 curated, 5 from raw transcriptomic reanalysis). Multi-pillar assessment with confounder rejection is necessary for rigorous evaluation of epigenetic fidelity claims.
References
- Margueron R, Reinberg D. Nature. 2011;469:343-349. doi:10.1038/nature09784
- Feser J, Tyler J. Mol Cell. 2011;44:918-927. doi:10.1016/j.molcel.2011.11.021
- Freund A, et al. Mol Biol Cell. 2012;23:2066-2075. doi:10.1091/mbc.e11-10-0884
- Martinez-Zamudio RI, et al. Genes Dev. 2020;34:1002-1017. doi:10.1101/gad.335794.119
- Lu Y, et al. Nature. 2020;588:124-129. doi:10.1038/s41586-020-2975-4
- Lopez-Otin C, et al. Cell. 2023;186:243-278. doi:10.1016/j.cell.2022.11.001
- Horvath S. Genome Biol. 2013;14:R115. doi:10.1186/gb-2013-14-10-r115
- Ben-Porath I, et al. Nat Genet. 2008;40:499-507. doi:10.1038/ng.127
- Liberzon A, et al. Cell Syst. 2015;1:417-425. doi:10.1016/j.cels.2015.12.004
- Coppe JP, et al. PLoS Biol. 2008;6:e301. doi:10.1371/journal.pbio.0060301
- Casella G, et al. Nucleic Acids Res. 2019;47:7294-7305. doi:10.1093/nar/gkz555
- Gill D, et al. eLife. 2022;11:e71624. doi:10.7554/eLife.71624
- Sahu SK, et al. Sci Transl Med. 2024;16:eadg1777. doi:10.1126/scitranslmed.adg1777
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: fidelity-atlas
description: Execute the locked, offline Fidelity Atlas benchmark for four-pillar epigenetic fidelity across aging and rejuvenation signatures.
allowed-tools: Bash(uv *, python *, python3 *, ls *, test *, shasum *, tectonic *)
requires_python: "3.12.x"
package_manager: uv
repo_root: .
canonical_output_dir: outputs/canonical
---

# Fidelity Atlas

This skill executes the canonical benchmark exactly as frozen by the repository contract. It does not relabel signatures, relax panel counts, or allow source leakage between module-definition sources and benchmark signatures.

## Runtime Expectations

- Platform: CPU-only
- Python: `3.12.x`
- Package manager: `uv`
- Offline after clone time
- Canonical freeze directory: `data/freeze`

## Scope Rules

- Human HGNC symbols only in the scored path
- Mixed source modalities are allowed only after freeze-time conversion to signed HGNC tables
- No live orthologization in the scored path
- Blind signatures never influence thresholding, rescue tuning, or baseline selection
- Source-linked signatures are forbidden in both the primary and blind panels

## Step 1: Install The Locked Environment

```bash
uv sync --frozen
```

## Step 2: Build Or Confirm The Frozen Benchmark

```bash
uv run --frozen --no-sync fidelity-atlas build-freeze --config config/canonical_fidelity.yaml --out data/freeze
```

## Step 3: Run The Canonical Benchmark

```bash
uv run --frozen --no-sync fidelity-atlas run --config config/canonical_fidelity.yaml --out outputs/canonical
```

## Step 4: Verify The Canonical Run

```bash
uv run --frozen --no-sync fidelity-atlas verify --config config/canonical_fidelity.yaml --run-dir outputs/canonical
```

## Step 5: Build The Paper From Frozen Outputs

```bash
uv run --frozen --no-sync fidelity-atlas build-paper --config config/canonical_fidelity.yaml --run-dir outputs/canonical --out paper/build
```

`build-paper` is a freeze blocker. It stops immediately if the verified run is not freeze-ready under the pre-registered success rule.

## Step 6: Optional Triage

```bash
uv run --frozen --no-sync fidelity-atlas triage --config config/canonical_fidelity.yaml --input inputs/new_signature.tsv --out outputs/triage
```

## Canonical Success Criteria

The canonical scored path is successful only if:

- `build-freeze` completes with the exact locked class counts
- the source-leakage audit passes
- all class-label fields are present and dual-curator locked
- the canonical run completes successfully
- the verifier exits `0`
- the full model still satisfies the pre-registered success rule after the honest re-freeze
- `paper/main.pdf` builds from the frozen outputs
- all required outputs are present and nonempty