From Exciting Hits to Durable Claims: A Self-Auditing Robustness Ranking of Longevity Interventions from DrugAge
Abstract
DrugAge contains many promising lifespan-extension results, but striking effects in isolated experiments do not automatically become durable scientific claims. We present an offline, agent-executable workflow that turns DrugAge into a robustness-first screen for longevity interventions. Rather than rewarding the single most exciting reported result, the workflow favors compounds whose pro-longevity signal is broad across species, survives prespecified stress tests, and remains measurably above a species-matched empirical null baseline. The canonical run uses a vendored DrugAge Build 5 snapshot, explicit normalization rules, evidence tiers, a Claim Stability Certificate, and an Empirical Null Certificate. In the frozen rerun, the workflow retained 3,372 scored experiments spanning 1,038 compounds, 33 normalized species, and 9 taxon labels, and identified 48 robust compounds. The observed top-10 mean robustness score was 0.94097 versus a null mean of 0.91330, while the robust-compound count was 48 versus a null mean of 29.49 (both empirical p = 0.00775). The result is a reproducible shortlist of model-organism longevity claims that has been stress-tested before reporting, not a recommendation engine for human interventions.
Introduction
This submission presents an agent-executable workflow for ranking DrugAge longevity intervention claims by robustness. The scored path is fully offline and uses a vendored Human Ageing Genomic Resources (HAGR) DrugAge Build 5 snapshot, deterministic normalization, evidence tiers, a Claim Stability Certificate, and an Empirical Null Certificate.
The contribution is not a leaderboard of raw lifespan effects. The contribution is a self-auditing scientific skill that asks which model-organism longevity claims remain convincing after perturbation and falsification pressure.
Data
The canonical input is data/drugage_build5_2024-11-29.csv, a vendored DrugAge Build 5 snapshot dated November 29, 2024. The scored path validates the file hash, required columns, and release metadata before processing. No network access is needed after the repository is cloned.
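The hash check can be sketched with the standard library alone. This is a minimal illustration, not the repository's validation code: the function names are hypothetical, and the expected digest is the one listed in Step 1 of the skill file.

```python
import hashlib
from pathlib import Path

# Digest pinned in the skill file (Step 1).
EXPECTED_SHA256 = "7ed9771440fa4e1e30be0d3c8e92d919254b572ab40c81e2440ba78c885401d4"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_is_valid(path, expected=EXPECTED_SHA256):
    """True only if the file exists and its digest matches the pinned value."""
    p = Path(path)
    return p.is_file() and sha256_of_file(p) == expected
```

Calling `snapshot_is_valid("data/drugage_build5_2024-11-29.csv")` before any scoring step would stop a run that points at a modified or missing snapshot.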
Rows are dropped only if compound name, species, or numeric avg_lifespan_change_percent is missing. In the frozen rerun, the workflow retained 3,372 of 3,423 DrugAge rows, covering 1,038 compounds, 33 normalized species, and 9 scored taxon labels.
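The row filter described above might look like the following sketch. Only avg_lifespan_change_percent is named in the text; the column names `compound_name` and `species` are assumptions about the CSV header, and treating non-finite values as missing is also an assumption.

```python
import csv
import io
import math


def parse_effect(value):
    """Return the effect as a float, or None if it is missing or non-numeric."""
    try:
        effect = float(value)
    except (TypeError, ValueError):
        return None
    return effect if math.isfinite(effect) else None  # treat NaN/inf as missing


def filter_rows(rows):
    """Keep rows that have a compound name, a species, and a numeric effect."""
    kept = []
    for row in rows:
        compound = (row.get("compound_name") or "").strip()  # assumed column name
        species = (row.get("species") or "").strip()         # assumed column name
        effect = parse_effect(row.get("avg_lifespan_change_percent"))
        if compound and species and effect is not None:
            kept.append({**row, "compound_name": compound, "species": species,
                         "avg_lifespan_change_percent": effect})
    return kept


# Tiny illustrative input (made-up rows, not DrugAge data).
sample = io.StringIO(
    "compound_name,species,avg_lifespan_change_percent\n"
    "Rapamycin,Mus musculus,12.5\n"
    ",Mus musculus,3\n"
    "Metformin,,4\n"
    "Aspirin,Drosophila melanogaster,NA\n"
    "Aspirin,Drosophila melanogaster, -2 \n"
)
kept = filter_rows(csv.DictReader(sample))
print(len(kept))
```

Negative effects are kept: the filter removes only unusable rows, so lifespan-shortening results still count against a compound.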
Methods
The canonical ranking uses DrugAge's average lifespan change field, avg_lifespan_change_percent, because it is the most consistently populated and directly comparable effect-size field in DrugAge. Significance annotations are retained for description only and are not scored, because they are heterogeneous across studies and do not provide a stable standalone ranking signal. For each compound, the workflow computes:
- number of experiments
- number of species
- number of taxa
- number of PMIDs
- median effect
- 10% trimmed-mean effect
- sign consistency
- leave-one-species-out stability
- leave-one-taxon-out stability
- aggregation stability
- breadth score
- robustness score
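Several of the simpler statistics in the list above can be sketched as follows. The exact definitions used by the workflow are not given in this paper, so the conventions here are assumptions: the 10% trimmed mean drops 10% of values from each tail, and sign consistency is the fraction of experiments with a strictly positive effect. The record layout and function names are hypothetical.

```python
from statistics import mean, median


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of values from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])


def sign_consistency(values):
    """Fraction of experiments with a strictly positive effect (assumed definition)."""
    return sum(v > 0 for v in values) / len(values)


def summarize(records):
    """records: list of (species, taxon, pmid, effect) tuples for one compound."""
    effects = [r[3] for r in records]
    return {
        "n_experiments": len(records),
        "n_species": len({r[0] for r in records}),
        "n_taxa": len({r[1] for r in records}),
        "n_pmids": len({r[2] for r in records}),
        "median_effect": median(effects),
        "trimmed_mean_effect": trimmed_mean(effects),
        "sign_consistency": sign_consistency(effects),
    }


# Made-up records for one hypothetical compound.
effects = [-5, 10, 12, 14, 16, 18, 20, 22, 24, 100]
records = [("worm" if i < 6 else "fly", "nematode" if i < 6 else "insect", 1000 + i % 3, e)
           for i, e in enumerate(effects)]
profile = summarize(records)
print(profile)
```

In this toy profile the outlier of 100 and the negative value are both trimmed, so the trimmed mean and the median agree, which is the kind of agreement an aggregation-stability measure would reward.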
Compounds are then assigned to four evidence tiers:
- robust
- promising
- thin evidence
- conflicted
Ranking is robustness-first: tier priority dominates sort order, followed by robustness score, species breadth, taxonomic breadth, PMID breadth, and effect magnitude.
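A lexicographic sort key is one natural way to express this ordering. The sketch below assumes the tier priority follows the order in which the tiers are listed, uses the median effect as the "effect magnitude" tiebreaker, and invents the profile field names; none of these details are confirmed by the text.

```python
# Lower number sorts first; order assumed to follow the tier list.
TIER_PRIORITY = {"robust": 0, "promising": 1, "thin_evidence": 2, "conflicted": 3}


def ranking_key(profile):
    """Tier first, then descending robustness, breadth measures, and effect."""
    return (
        TIER_PRIORITY[profile["tier"]],
        -profile["robustness_score"],
        -profile["n_species"],
        -profile["n_taxa"],
        -profile["n_pmids"],
        -profile["median_effect"],  # assumed stand-in for "effect magnitude"
    )


def rank(profiles):
    return sorted(profiles, key=ranking_key)


# Made-up profiles for three hypothetical compounds.
profiles = [
    {"name": "X", "tier": "promising", "robustness_score": 0.99,
     "n_species": 5, "n_taxa": 3, "n_pmids": 9, "median_effect": 40.0},
    {"name": "Y", "tier": "robust", "robustness_score": 0.90,
     "n_species": 3, "n_taxa": 2, "n_pmids": 4, "median_effect": 8.0},
    {"name": "Z", "tier": "robust", "robustness_score": 0.90,
     "n_species": 4, "n_taxa": 2, "n_pmids": 4, "median_effect": 5.0},
]
ranked_names = [p["name"] for p in rank(profiles)]
print(ranked_names)
```

Compound X has the highest score and the largest effect but still ranks last here, because tier priority dominates every later key.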
The scored path emits two scientific certificates.
The Claim Stability Certificate evaluates the top-ranked compounds under five fixed perturbations:
- leave-one-species-out positivity
- leave-one-taxon-out positivity
- positive median and trimmed mean
- exclusion of single-PMID compounds
- exclusion of mixed-sign compounds
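The five perturbations can be read as boolean checks on one compound's experiments. The sketch below is an interpretation under stated assumptions, not the certificate's actual logic: "positivity" is taken to mean a positive median after each group is left out, a compound with a single species or taxon fails the corresponding leave-one-out check, and the record fields and function names are hypothetical.

```python
from statistics import mean, median


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of values from each tail (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])


def stays_positive_without_each(records, key):
    """True if the median effect stays > 0 when each group is dropped in turn."""
    groups = {r[key] for r in records}
    if len(groups) < 2:
        return False  # assumption: nothing remains after leaving out the only group
    return all(
        median([r["effect"] for r in records if r[key] != g]) > 0
        for g in groups
    )


def stability_checks(records):
    """Evaluate the five perturbations for one compound's experiment records."""
    effects = [r["effect"] for r in records]
    return {
        "leave_one_species_out_positive": stays_positive_without_each(records, "species"),
        "leave_one_taxon_out_positive": stays_positive_without_each(records, "taxon"),
        "positive_median_and_trimmed_mean": median(effects) > 0 and trimmed_mean(effects) > 0,
        "survives_single_pmid_exclusion": len({r["pmid"] for r in records}) > 1,
        "survives_mixed_sign_exclusion": all(e > 0 for e in effects),
    }


# Made-up records for one hypothetical compound.
records = [
    {"species": "C. elegans", "taxon": "nematode", "pmid": 1, "effect": 10},
    {"species": "C. elegans", "taxon": "nematode", "pmid": 1, "effect": 12},
    {"species": "D. melanogaster", "taxon": "insect", "pmid": 2, "effect": 8},
    {"species": "M. musculus", "taxon": "mammal", "pmid": 3, "effect": -2},
    {"species": "M. musculus", "taxon": "mammal", "pmid": 3, "effect": 6},
]
checks = stability_checks(records)
print(checks)
```

This toy compound passes four checks and fails only the mixed-sign exclusion, because a single negative experiment is enough to fail that check.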
The Empirical Null Certificate runs 128 fixed-seed, species-stratified effect permutations. Within each species, average lifespan effects are shuffled across rows, preserving DrugAge's species composition and within-species effect distribution while breaking the link between compound identity and observed effect. With 128 permutations, the smallest attainable empirical p-value is 1/129, approximately 0.0078.
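A species-stratified permutation and the resulting empirical p-value can be sketched as below. The add-one estimator, (1 + number of null values at or above the observed value) / (1 + number of permutations), is an assumption that is consistent with the 1/129 floor stated above. The row layout, function names, and seed are hypothetical.

```python
import random
from collections import defaultdict


def permute_within_species(rows, rng):
    """rows: list of (compound, species, effect). Shuffle effects within each species."""
    by_species = defaultdict(list)
    for idx, (_, species, _) in enumerate(rows):
        by_species[species].append(idx)
    permuted = list(rows)
    for species in sorted(by_species):  # fixed order keeps seeded runs reproducible
        indices = by_species[species]
        effects = [rows[i][2] for i in indices]
        rng.shuffle(effects)
        for i, effect in zip(indices, effects):
            compound, sp, _ = rows[i]
            permuted[i] = (compound, sp, effect)
    return permuted


def null_distribution(rows, statistic, n_permutations=128, seed=0):
    """Recompute `statistic` on n_permutations stratified shuffles of the data."""
    rng = random.Random(seed)
    return [statistic(permute_within_species(rows, rng)) for _ in range(n_permutations)]


def empirical_p_value(observed, null_values):
    """One-sided add-one estimator (assumed): (1 + #{null >= observed}) / (1 + n)."""
    exceed = sum(v >= observed for v in null_values)
    return (1 + exceed) / (1 + len(null_values))
```

Here `statistic` would be any function of the permuted rows, such as the top-10 mean robustness score or the count of robust compounds. If none of the 128 null values reaches the observed value, the estimator returns 1/129.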
Results
In the frozen rerun, the evidence tiers were:
- robust: 48 compounds
- promising: 174 compounds
- thin_evidence: 435 compounds
- conflicted: 381 compounds
The top 10 compounds were:
- Spermidine
- Apple extract
- N-acetyl-L-cysteine
- Minocycline
- Alpha-ketoglutarate
- Carnosine
- Rapamycin
- Mycophenolic acid
- Epigallocatechin-3-gallate
- Vitamin E
Some top-ranked compounds may look surprising; this ranking reflects internal robustness within curated model-organism evidence, not human plausibility or mechanistic priority.
All top-10 compounds passed the leave-one-species-out positivity, leave-one-taxon-out positivity, and positive median and trimmed-mean checks, as well as the single-PMID exclusion perturbation. Three of the top 10 also passed the stricter mixed-sign exclusion perturbation.
The observed top-10 mean robustness score was 0.94097, compared with a null mean of 0.91330 and a null standard deviation of 0.01130. The corresponding empirical p-value was 0.00775, the smallest value attainable with 128 permutations; the absolute gap of about 0.028 indicates measurable rather than overwhelming score separation. The more persuasive null result was the robust-compound count: 48, compared with a null mean of 29.49, a null standard deviation of 4.07, and an empirical p-value of 0.00775.
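Since both p-values are identical, one way to compare the two results is to express each observed value in units of its null standard deviation. This standardized difference is not a statistic reported by the workflow; it is a back-of-the-envelope reading of the numbers above, and the function name is hypothetical.

```python
def null_sd_units(observed, null_mean, null_sd):
    """Distance of the observed value from the null mean, in null standard deviations."""
    return (observed - null_mean) / null_sd


top10_sep = null_sd_units(0.94097, 0.91330, 0.01130)
count_sep = null_sd_units(48, 29.49, 4.07)

print(f"top-10 mean score:     {top10_sep:.2f} null SDs above the null mean")
print(f"robust-compound count: {count_sep:.2f} null SDs above the null mean")
# top-10 mean score:     2.45 null SDs above the null mean
# robust-compound count: 4.55 null SDs above the null mean
```

On this reading the robust-compound count sits almost twice as far from its null distribution as the top-10 mean score does, which matches the statement that the count is the more persuasive result.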
Optional AnAge Context
The optional AnAge context report joins normalized DrugAge species to a vendored AnAge snapshot for descriptive context only. It does not alter the canonical ranking, scores, tiers, or certificates. In the current rerun, 10 of 35 normalized DrugAge species matched AnAge exactly after normalization.
Limitations
This workflow ranks model-organism longevity evidence and does not recommend interventions for humans. It does not harmonize doses, perform causal mechanism inference, or treat DrugAge significance fields as scored inputs. Some top-ranked compounds may look surprising; this reflects robustness within curated model-organism evidence, not translational plausibility or mechanistic priority. The optional AnAge join is intentionally descriptive and partial.
Conclusion
This repository contributes a lightweight, offline, agent-native longevity workflow that ranks claims by robustness, certifies perturbation stability, and measures separation from a species-matched empirical null. The main result is not a static list of compounds. The main result is an executable skill that interrogates its own conclusions before reporting them.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: drugage-robustness-null-certified
description: Execute a locked, offline DrugAge robustness ranking workflow with evidence tiers, a claim stability certificate, and an empirical null certificate.
allowed-tools: Bash(uv *, python *, ls *, test *, shasum *)
requires_python: "3.12.x"
package_manager: uv
repo_root: .
canonical_output_dir: outputs/canonical
---

# Claim-Certified DrugAge Robustness Skill

This skill executes the canonical scored path only. It does not run the optional AnAge context report or posting helpers.

## Runtime Expectations

- Platform: CPU-only
- Python: 3.12.x
- Package manager: `uv`
- Canonical input: `data/drugage_build5_2024-11-29.csv`
- Offline execution: no network access required after the repo is cloned

## Step 1: Confirm Canonical Input

```bash
test -f data/drugage_build5_2024-11-29.csv
shasum -a 256 data/drugage_build5_2024-11-29.csv
```

Expected SHA256:

```text
7ed9771440fa4e1e30be0d3c8e92d919254b572ab40c81e2440ba78c885401d4
```

## Step 2: Install the Locked Environment

```bash
uv sync --frozen
```

Success condition:

- `uv` completes without changing the lockfile

## Step 3: Run the Canonical Pipeline

```bash
uv run --frozen --no-sync drugage-skill run --config config/canonical_drugage.yaml --out outputs/canonical
```

Success condition:

- `outputs/canonical/manifest.json` exists
- all required CSV, JSON, and PNG artifacts are present

## Step 4: Verify the Run

```bash
uv run --frozen --no-sync drugage-skill verify --run-dir outputs/canonical
```

Success condition:

- exit code is `0`
- `outputs/canonical/verification.json` exists
- verification status is `passed`

## Step 5: Confirm Required Artifacts

Required files:

- `outputs/canonical/manifest.json`
- `outputs/canonical/normalization_audit.json`
- `outputs/canonical/robustness_rankings.csv`
- `outputs/canonical/compound_evidence_profiles.csv`
- `outputs/canonical/claim_stability_certificate.json`
- `outputs/canonical/claim_stability_heatmap.png`
- `outputs/canonical/empirical_null_certificate.json`
- `outputs/canonical/compound_null_significance.csv`
- `outputs/canonical/null_separation_plot.png`
- `outputs/canonical/verification.json`

## Step 6: Canonical Success Criteria

The canonical path is successful only if:

- the vendored DrugAge snapshot is used
- the run command finishes successfully
- the verify command exits `0`
- all required artifacts are present and nonempty


