Self-Verifying PBMC3k Scanpy Skill
Introduction
This submission presents an agent-executable single-cell RNA-seq workflow for the public PBMC3k dataset. The contribution is not merely a Scanpy pipeline. The contribution is a self-verifying scientific skill with a minimal canonical execution path, explicit semantic verification, and an optional Claim Stability Certificate that tests whether biological conclusions remain stable under controlled perturbations.
The scored path is intentionally narrow. It uses a vendored canonical PBMC3k snapshot, a locked Python 3.12 environment, a fixed set of clustering resolutions, and a verifier that checks biologically meaningful outputs rather than brittle floating-point identity. Optional rigor-enhancing analyses, including the legacy-reference benchmark and the perturbation-panel certificate, are kept off the scored path.
Data
The canonical dataset is the public PBMC3k AnnData snapshot vendored in the repository as data/pbmc3k_raw.h5ad. Vendoring this small public dataset removes an avoidable network dependency from the canonical run while preserving public-data provenance. For the paper-only benchmark, the workflow also uses the processed PBMC3k reference object exposed by Scanpy, but only as a legacy Louvain reference-cluster object rather than as expert-curated cell-type ground truth.
Methods
The canonical workflow is packaged as a locked uv project in Python 3.12 with pinned dependencies, including scanpy[leiden]==1.12. The scored path requires only three commands:
uv sync --frozen
uv run --frozen --no-sync scrna-skill run --config config/canonical_pbmc3k.yaml --out outputs/canonical
uv run --frozen --no-sync scrna-skill verify --run-dir outputs/canonical
Quality control follows the legacy PBMC3k thresholds for benchmark comparability:
- sc.pp.filter_cells(adata, min_genes=200)
- sc.pp.filter_genes(adata, min_cells=3)
- restrict to n_genes_by_counts < 2500
- restrict to pct_counts_mt < 5
This QC choice is made for comparability; it is not a claim of universally optimal modern preprocessing.
Downstream analysis is intentionally modern rather than a literal reproduction of the full legacy PBMC3k tutorial. Raw counts are preserved in a layer, the matrix is normalized and log-transformed, highly variable genes are flagged without hard subsetting, and PCA and neighbor-graph construction consume the HVG flags. Leiden clustering is swept over the fixed candidate set {0.4, 0.6, 0.8, 1.0, 1.2}.
Marker ranking uses filtered Wilcoxon rank_genes_groups results on the full log-normalized matrix. Cluster annotation is marker based and explicitly putative. For each cluster, the workflow scores overlap against curated PBMC lineage signatures, records evidence genes, computes best and runner-up lineage support, and emits an Unresolved label when score, support, or margin thresholds are not met.
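The thresholded annotation rule can be sketched as follows. This is a minimal illustration, assuming overlap is scored as the fraction of a lineage signature found among a cluster's top markers. The signature lists, the threshold values, and the name `annotate_cluster` are hypothetical and are not taken from the repository.

```python
from typing import Dict, List, Tuple

# Hypothetical lineage signatures; the repository's curated lists may differ.
SIGNATURES: Dict[str, List[str]] = {
    "B": ["MS4A1", "CD79A", "CD79B"],
    "NK": ["NKG7", "GNLY", "KLRD1"],
    "CD14 Mono": ["CD14", "LYZ", "S100A8"],
}


def annotate_cluster(
    markers: List[str],
    min_score: float = 0.5,   # assumed threshold on the best overlap score
    min_support: int = 2,     # assumed minimum number of evidence genes
    min_margin: float = 0.2,  # assumed gap between best and runner-up scores
) -> Tuple[str, List[str]]:
    """Return (putative label, evidence genes), or 'Unresolved' if any threshold fails."""
    marker_set = set(markers)
    scored = []
    for lineage, genes in SIGNATURES.items():
        evidence = [g for g in genes if g in marker_set]
        scored.append((len(evidence) / len(genes), lineage, evidence))
    # Rank lineages by overlap score, best first.
    scored.sort(key=lambda item: item[0], reverse=True)
    best, label, evidence = scored[0]
    runner_up = scored[1][0]
    if best < min_score or len(evidence) < min_support or best - runner_up < min_margin:
        return "Unresolved", evidence
    return label, evidence
```

Checking score, support, and margin separately means a cluster with weak or ambiguous evidence is labeled Unresolved rather than forced into the nearest lineage.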
The semantic verifier checks canonical input shape, post-QC shape, resolution choice, cluster count, artifact existence, readable output files, and rerun stability at the level of selected resolution, cluster count, resolved label set, unresolved fraction, and label cell fractions.
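The rerun-stability part of this check can be sketched as a comparison of run summaries rather than raw arrays. The field names, the tolerance `fraction_tol`, and the function `compare_runs` below are assumptions for illustration; the actual verifier's schema and thresholds are not specified here.

```python
from typing import List


def compare_runs(canonical: dict, rerun: dict, fraction_tol: float = 0.02) -> List[str]:
    """Return human-readable mismatches between two run summaries; empty means stable."""
    problems = []
    # Discrete choices must match exactly.
    for key in ("selected_resolution", "n_clusters"):
        if canonical[key] != rerun[key]:
            problems.append(f"{key}: {canonical[key]} != {rerun[key]}")
    # The resolved label set must be identical.
    if set(canonical["label_fractions"]) != set(rerun["label_fractions"]):
        problems.append("resolved label set differs")
    # Fractions are compared with a tolerance instead of floating-point identity.
    if abs(canonical["unresolved_fraction"] - rerun["unresolved_fraction"]) > fraction_tol:
        problems.append("unresolved fraction drifted")
    for label, frac in canonical["label_fractions"].items():
        other = rerun["label_fractions"].get(label, 0.0)
        if abs(frac - other) > fraction_tol:
            problems.append(f"label fraction drifted for {label}")
    return problems
```

Comparing at this level tolerates harmless numerical noise while still flagging a change in the selected resolution, the cluster count, or the label composition.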
The optional Claim Stability Certificate reruns a small perturbation panel over seed, neighbor count, and HVG count, then asks whether claims such as T-cell, B-cell, NK, monocyte, and megakaryocyte-like support remain present. This reframes reproducibility around stable biological conclusions rather than exact cluster IDs or UMAP coordinates.
Results
In the frozen clean rerun, the canonical path selected Leiden resolution 0.6 and produced 8 resolved clusters with an unresolved fraction of 0.0. The resolved label set was:
- B
- CD14 Mono
- CD4 T
- CD8 T
- Dendritic
- FCGR3A Mono
- Megakaryocyte
- NK
The canonical artifact set includes:
- outputs/canonical/manifest.json
- outputs/canonical/qc_summary.json
- outputs/canonical/resolution_sweep.csv
- outputs/canonical/cluster_markers.csv
- outputs/canonical/cluster_annotations.csv
- outputs/canonical/umap_clusters.png
- outputs/canonical/umap_annotations.png
- outputs/canonical/marker_dotplot.png
- outputs/canonical/pbmc3k_annotated.h5ad
- outputs/canonical/verification.json
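A quick way to confirm that the artifact set is complete is to test each file for existence and nonzero size. The helper name `missing_or_empty` is hypothetical; the file names are those listed above.

```python
from pathlib import Path
from typing import List

# Artifact names from the canonical artifact set, relative to the run directory.
REQUIRED_ARTIFACTS = [
    "manifest.json",
    "qc_summary.json",
    "resolution_sweep.csv",
    "cluster_markers.csv",
    "cluster_annotations.csv",
    "umap_clusters.png",
    "umap_annotations.png",
    "marker_dotplot.png",
    "pbmc3k_annotated.h5ad",
    "verification.json",
]


def missing_or_empty(run_dir: str) -> List[str]:
    """Return the required artifacts that are absent or zero bytes in run_dir."""
    root = Path(run_dir)
    return [
        name
        for name in REQUIRED_ARTIFACTS
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]
```

Calling `missing_or_empty("outputs/canonical")` after a run should return an empty list when every artifact is present and nonempty.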
Legacy Reference Concordance
Against the legacy Louvain labels in the processed PBMC3k reference object, the frozen clean rerun reached 0.9359363153904473 majority purity on 2638 shared barcodes. This result is reported only as legacy reference-cluster concordance. It is not presented as cell-type ground truth accuracy.
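For readers unfamiliar with the metric, the sketch below assumes the standard definition of cluster purity: each cluster is credited with its most frequent reference label, and the credited cells are divided by the number of shared barcodes. The function name `majority_purity` and the dictionary-based inputs are illustrative assumptions, not the repository's interface.

```python
from collections import Counter, defaultdict
from typing import Dict


def majority_purity(clusters: Dict[str, str], reference: Dict[str, str]) -> float:
    """Fraction of shared barcodes whose reference label is the majority label of their cluster."""
    # Only barcodes present in both assignments are compared.
    shared = clusters.keys() & reference.keys()
    if not shared:
        raise ValueError("no shared barcodes between the two assignments")
    by_cluster = defaultdict(Counter)
    for barcode in shared:
        by_cluster[clusters[barcode]][reference[barcode]] += 1
    # Credit each cluster with the size of its most common reference label.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(shared)
```

Because purity rewards over-clustering (many small clusters are trivially pure), it is best read alongside the cluster count rather than on its own.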
Claim Stability Certificate
The Claim Stability Certificate reran a perturbation panel over seed, neighbor count, and HVG count:
- seed-1
- seed-2
- neighbors-12
- hvg-1800
- hvg-2200
The certificate passed.
Across the canonical run plus the perturbation panel:
- all claim-support rates were 1.0
- selected resolutions varied across 0.4, 0.6, 1.0, and 1.2
- all runs stayed inside the accepted resolution and cluster-count band
- unresolved fraction stayed at 0.0 for every run
- minimum label-set Jaccard relative to the canonical run was 0.875
- Dendritic persisted in 5/6 runs
- every other canonical label persisted in 6/6 runs
This is the intended interpretation of the certificate: clustering resolutions and cluster identities can vary under controlled perturbations while the core biological conclusions remain stable.
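The label-set Jaccard and per-label persistence statistics can be computed from the resolved label sets alone. The sketch below uses the canonical label set from the Results section; the function names `jaccard` and `persistence` are hypothetical, and the per-run label sets of the actual panel are not reproduced here.

```python
from typing import Dict, List, Set

# Canonical resolved label set reported in the Results section.
CANONICAL: Set[str] = {
    "B", "CD14 Mono", "CD4 T", "CD8 T",
    "Dendritic", "FCGR3A Mono", "Megakaryocyte", "NK",
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def persistence(canonical: Set[str], runs: List[Set[str]]) -> Dict[str, int]:
    """Count, for each canonical label, the number of runs in which it appears."""
    return {label: sum(label in run for run in runs) for label in sorted(canonical)}
```

As a consistency check, a run that resolves every canonical label except Dendritic gives `jaccard(CANONICAL, CANONICAL - {"Dendritic"})` = 7/8 = 0.875, which matches the reported minimum.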
Limitations
This workflow makes intentionally narrow claims.
- QC choices were selected for benchmark comparability.
- Cluster count depends on the chosen resolution.
- UMAP is a visualization, not a biological truth object.
- The processed PBMC3k concordance benchmark is not expert label accuracy.
- The workflow is a single-sample PBMC analysis, not a batch-integration or atlas-scale pipeline.
- Annotation output is putative and marker driven.
Conclusion
This repository contributes a locked, agent-executable single-cell analysis skill whose outputs are self-verifying and whose conclusions are stress-tested. The key result is not that one exact clustering or embedding is reproduced. The key result is that the workflow can certify which biological conclusions remain stable when reasonable analysis settings are perturbed.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: scrna-pbmc3k-self-verifying
description: Execute a locked, CPU-only PBMC3k Scanpy workflow with semantic self-verification and canonical artifact generation.
allowed-tools: Bash(uv *, python *, ls *, test *, shasum *)
requires_python: "3.12.x"
package_manager: uv
repo_root: .
canonical_output_dir: outputs/canonical
---

# Self-Verifying PBMC3k Scanpy Skill

This skill executes the canonical scored path only. It does not run the optional paper benchmark.

## Runtime Expectations

- Platform: CPU-only
- Python: 3.12.x
- Package manager: `uv`
- Canonical input: `data/pbmc3k_raw.h5ad`

## Step 1: Confirm Canonical Input

```bash
test -f data/pbmc3k_raw.h5ad
shasum -a 256 data/pbmc3k_raw.h5ad
```

Expected SHA256:

```text
89a96f1beaa2dd83a687666d3f19a4513ac27a2a2d12581fcd77afed7ea653a1
```

## Step 2: Install the Locked Environment

```bash
uv sync --frozen
```

Success condition:

- `uv` completes without changing the lockfile

## Step 3: Run the Canonical Pipeline

```bash
uv run --frozen --no-sync scrna-skill run --config config/canonical_pbmc3k.yaml --out outputs/canonical
```

Success condition:

- `outputs/canonical/manifest.json` exists
- `outputs/canonical/pbmc3k_annotated.h5ad` exists

## Step 4: Verify the Run

```bash
uv run --frozen --no-sync scrna-skill verify --run-dir outputs/canonical
```

Success condition:

- exit code is `0`
- `outputs/canonical/verification.json` exists
- verification status is `passed`

## Step 5: Confirm Required Artifacts

Required files:

- `outputs/canonical/manifest.json`
- `outputs/canonical/qc_summary.json`
- `outputs/canonical/resolution_sweep.csv`
- `outputs/canonical/cluster_markers.csv`
- `outputs/canonical/cluster_annotations.csv`
- `outputs/canonical/umap_clusters.png`
- `outputs/canonical/umap_annotations.png`
- `outputs/canonical/marker_dotplot.png`
- `outputs/canonical/pbmc3k_annotated.h5ad`
- `outputs/canonical/verification.json`

## Step 6: Canonical Success Criteria

The canonical path is successful only if:

- the vendored PBMC3k input is used
- the run command finishes successfully
- the verify command exits `0`
- all required artifacts are present and nonempty


