Compact Frozen Atlas Snapshots Prioritize Safe Cell-Therapy Single Targets and Bulk-Supported Logic-Gate Rescue Hypotheses across Solid Tumors
Safety-Filtered Cell-Therapy Target Prioritization from Compact Frozen Atlas Snapshots across Five Solid Tumor Types
We present a deterministic, offline target-prioritization workflow that ranks single-antigen cell-therapy leads only after they pass explicit safety filters against bulk-normal RNA, bulk-normal protein, and adult healthy single-cell expression data. The workflow operates on compact frozen snapshots covering five epithelial solid tumor types (ovarian, pancreatic, gastric, hepatocellular, lung adenocarcinoma) with nine candidate surface antigens and three independent safety data layers.
Method
For each candidate gene and tumor type, the workflow computes: tumor prevalence (fraction of samples above a log2(TPM+1) >= 2.0 threshold), tumor intensity (capped median of positive samples), patient patchiness (Gini heterogeneity plus prevalence loss), bulk RNA risk (tiered by maximum normal-tissue nTPM), bulk protein risk (ordinal mapping), and adult healthy single-cell risk (maximum adjusted positive fraction with 1.5x critical-compartment multiplier). The composite score is a fixed weighted sum: 0.35 * prevalence + 0.25 * intensity + 0.15 * surface_confidence - 0.10 * RNA_risk - 0.05 * protein_risk - 0.05 * single_cell_risk - 0.05 * patchiness. Candidates must pass both an off-tumor safety certificate and a tumor coverage certificate to qualify.
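The weighted sum and the prevalence feature described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the names `WEIGHTS`, `composite_score`, `prevalence`, and `qualifies` are hypothetical, and the sketch assumes every feature has already been scaled to the range [0, 1], which the text does not state.

```python
# Weights copied from the Method text. Tumor-side terms are positive;
# normal-tissue risk terms and patchiness are negative.
WEIGHTS = {
    "prevalence": 0.35,
    "intensity": 0.25,
    "surface_confidence": 0.15,
    "rna_risk": -0.10,
    "protein_risk": -0.05,
    "single_cell_risk": -0.05,
    "patchiness": -0.05,
}


def prevalence(log2_tpm_plus_1: list[float], threshold: float = 2.0) -> float:
    """Fraction of tumor samples at or above the log2(TPM+1) threshold."""
    if not log2_tpm_plus_1:
        raise ValueError("at least one sample is required")
    positives = sum(1 for value in log2_tpm_plus_1 if value >= threshold)
    return positives / len(log2_tpm_plus_1)


def composite_score(features: dict[str, float]) -> float:
    """Fixed weighted sum; assumes each feature is pre-scaled to [0, 1]."""
    return sum(weight * features[name] for name, weight in WEIGHTS.items())


def qualifies(safety_certificate_passed: bool, coverage_certificate_passed: bool) -> bool:
    """A candidate qualifies only if both certificates pass."""
    return safety_certificate_passed and coverage_certificate_passed
```

Under the [0, 1] scaling assumption, the score is bounded above by 0.75 (all tumor-side features at 1, all risk terms at 0), so a candidate with perfect tumor-side features and no measured risk would score 0.75.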
Results
Across five individually scored tumor types, the workflow promotes MSLN in ovarian (score 0.691) and pancreatic (0.613), and GPC3 in hepatocellular (0.737), while rejecting all candidates in gastric and lung adenocarcinoma due to safety or coverage failures. These results are consistent with the clinical literature: MSLN and GPC3 are under active CAR-T clinical investigation for these indications.
In a separate rediscovery benchmark (45 gene-tumor pairs, 3 positives from registered clinical trials and preclinical studies, 4 negative controls), the full model achieves AUPRC of 1.0 versus 0.867 for a tumor-overexpression-only baseline. However, this benchmark is a compact fixture-level validation: the perfect AUPRC reflects a trivially small evaluation set (3 positives among 45 pairs) rather than broad predictive power. The benchmark confirms that adding protein and single-cell safety data improves negative-control suppression.
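To make concrete how coarse an AUPRC is with only 3 positives among 45 pairs, the sketch below computes average precision on hypothetical rankings. It assumes AUPRC is computed as average precision (the text does not say which estimator the benchmark uses), and the rankings are toy data, not the benchmark's actual output.

```python
def average_precision(scores: list[float], labels: list[int]) -> float:
    """Mean of precision-at-rank over the ranks where a positive appears."""
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_positive


# Hypothetical toy rankings over 45 pairs with strictly decreasing scores.
toy_scores = [float(45 - i) for i in range(45)]

# All three positives ranked first, second, and third.
perfect_labels = [1, 1, 1] + [0] * 42

# One positive slips from rank 3 to rank 5.
shifted_labels = [1, 1, 0, 0, 1] + [0] * 40

perfect_ap = average_precision(toy_scores, perfect_labels)
shifted_ap = average_precision(toy_scores, shifted_labels)
```

In this toy setting a single positive moving two ranks changes average precision from 1.0 to about 0.867, which illustrates why differences of this size on 3 positives should not be read as evidence of broad predictive power.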
Optional logic-gate rescue hypotheses are generated from bulk tumor co-detection but are explicitly not validated at the single-cell level and should be treated as exploratory.
Limitations
All results derive from compact vendored snapshots (5 samples per tumor type, 9 genes). The AUPRC of 1.0 is trivially achieved on 3 positives. Logic-gate hypotheses lack single-cell co-expression validation. Fetal cell types are excluded from safety scoring. No immunopeptidomics, HLA modeling, or microenvironment features are included. This is a methodology demonstration, not a discovery claim.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: cell-therapy-target-cartographer
description: Execute a deterministic, offline target-prioritization workflow for safety-filtered solid-tumor cell-therapy single-antigen leads and optional bulk-supported logic-gate rescue hypotheses.
allowed-tools: Bash(uv *, python *, ls *, test *, shasum *)
requires_python: "3.12.x"
package_manager: uv
repo_root: .
canonical_output_dir: outputs/canonical
---

# Cell Therapy Target Cartographer

This skill executes the canonical scored path. It can also run the rediscovery benchmark and the logic-gate hypothesis benchmark as optional steps.

## Scientific Context

CAR-T cell therapy is effective in blood cancers but limited in solid tumors by on-target/off-tumor toxicity. This workflow prioritizes single-antigen targets that pass explicit safety filters against three independent normal-tissue data layers: bulk RNA (HPA-style), bulk protein (HPA-style), and adult healthy single-cell expression (SCA-style). Tumor-side features include prevalence, intensity, and patient heterogeneity. The composite score is a fixed weighted sum with documented weights and thresholds.

The workflow covers five epithelial solid tumor types: ovarian (OV), pancreatic (PAAD), gastric (STAD), hepatocellular (LIHC), and lung adenocarcinoma (LUAD).

## Runtime Expectations

- Platform: CPU-only
- Python: 3.12.x
- Package manager: `uv`
- Offline execution: no network access required after clone time
- Canonical input: `inputs/canonical_tumor_type.txt` (single tumor type) or `inputs/canonical_tumor_panel.txt` (multi-type panel)

## Step 1: Confirm Canonical Input

```bash
test -f inputs/canonical_tumor_type.txt
shasum -a 256 inputs/canonical_tumor_type.txt
```

Expected SHA256:

```text
103d49f5a3df9387156dcdef7bd1e6f2756bafee0303528550c2e093079b5450
```

## Step 2: Install the Locked Environment

```bash
uv sync --frozen
```

Success condition:

- `uv` completes without changing `uv.lock`

## Step 3: Run the Canonical Pipeline

```bash
uv run --frozen --no-sync cell-therapy-target-cartographer run --config config/canonical_targeting.yaml --input inputs/canonical_tumor_type.txt --out outputs/canonical
```

Success condition:

- `outputs/canonical/manifest.json` exists
- all required canonical JSON and CSV artifacts are present

## Step 4: Verify the Run

```bash
uv run --frozen --no-sync cell-therapy-target-cartographer verify --run-dir outputs/canonical
```

Success condition:

- exit code is `0`
- `outputs/canonical/verification.json` exists
- verification status is `passed`

## Step 5: Confirm Required Artifacts

Required files:

- `outputs/canonical/manifest.json`
- `outputs/canonical/normalization_audit.json`
- `outputs/canonical/single_target_scores.csv`
- `outputs/canonical/top_single_targets.csv`
- `outputs/canonical/off_tumor_safety_certificate.json`
- `outputs/canonical/tumor_coverage_patchiness_certificate.json`
- `outputs/canonical/verification.json`

## Step 6: Canonical Success Criteria

The canonical path is successful only if:

- all vendored scored-path assets match the configured SHA256 hashes
- the run command finishes successfully
- the verify command exits `0`
- all required canonical artifacts are present and nonempty
- the top ranked safety-filtered target identities match the frozen expectations
- the certificate verdicts match the frozen expectations

## Step 7: Run Tests

```bash
uv run --frozen --no-sync pytest -q
```

Success condition: 10 tests pass.

## Optional: Multi-Tumor-Type Runs

The workflow supports five tumor types. To run on individual types:

```bash
uv run --frozen --no-sync cell-therapy-target-cartographer run --config config/canonical_targeting.yaml --input <(echo "OV") --out outputs/run_OV
uv run --frozen --no-sync cell-therapy-target-cartographer run --config config/canonical_targeting.yaml --input <(echo "PAAD") --out outputs/run_PAAD
uv run --frozen --no-sync cell-therapy-target-cartographer run --config config/canonical_targeting.yaml --input <(echo "LIHC") --out outputs/run_LIHC
```

Expected qualifying leads: MSLN in OV (0.691), MSLN in PAAD (0.613), GPC3 in LIHC (0.737). STAD and LUAD produce no qualifying targets.

## Optional: Rediscovery Benchmark

```bash
uv run --frozen --no-sync cell-therapy-target-cartographer benchmark-rediscovery --config config/canonical_targeting.yaml --out outputs/rediscovery
```

Note: The benchmark uses 3 positive labels from clinical trials and preclinical studies, not from atlas data. The AUPRC of 1.0 reflects a trivially small evaluation set (3 positives among 45 pairs).

## Optional: Logic-Gate Hypothesis Benchmark

```bash
uv run --frozen --no-sync cell-therapy-target-cartographer benchmark-logic-gates --config config/canonical_targeting.yaml --out outputs/logic_gate_benchmark
```

Logic-gate outputs are bulk-supported rescue hypotheses only. Same-malignant-cell co-expression is unobserved in v1 and would require tumor single-cell RNA-seq data for validation.
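
The input hash check in Step 1 and the artifact check in Step 5 can also be performed from Python instead of the shell. The sketch below is not part of the skill file or the repository; `sha256_of` and `missing_or_empty` are hypothetical helper names, and the `verify` subcommand remains the authoritative check.

```python
import hashlib
from pathlib import Path

# File names taken from Step 5 of the skill file.
REQUIRED_ARTIFACTS = [
    "manifest.json",
    "normalization_audit.json",
    "single_target_scores.csv",
    "top_single_targets.csv",
    "off_tumor_safety_certificate.json",
    "tumor_coverage_patchiness_certificate.json",
    "verification.json",
]


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_or_empty(run_dir: Path) -> list[str]:
    """Names of required artifacts that are absent or zero bytes in run_dir."""
    problems = []
    for name in REQUIRED_ARTIFACTS:
        candidate = run_dir / name
        if not candidate.is_file() or candidate.stat().st_size == 0:
            problems.append(name)
    return problems
```

For example, `missing_or_empty(Path("outputs/canonical"))` should return an empty list after a successful canonical run, and `sha256_of(Path("inputs/canonical_tumor_type.txt"))` can be compared against the expected digest listed in Step 1.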