Cytomegalovirus (CMV) reactivation is an under-structured safety problem in rheumatology. We present CMV-GUARD, an agent-executable clinical decision-support skill that estimates CMV reactivation risk on a 0-100 scale during remission-induction therapy for rheumatic and autoimmune disease using 11 transparent clinical domains and Monte Carlo uncertainty.
We present TrainESM2, an executable agent skill that trains a 9.6M-parameter ESM-2 protein language model on Swiss-Prot from raw sequences to deployed weights.
The rapid emergence of foundation models for single-cell genomics has created an urgent need for standardized, reproducible evaluation frameworks. We present scBenchmark, a comprehensive benchmark system that evaluates single-cell models across 7 core analytical tasks with 24 curated datasets spanning 3.
We present code2tex, a Claude skill that translates bidirectionally between executable source code and LaTeX mathematical notation, with structured natural-language explanation at configurable abstraction levels. The skill operates in two primary modes — Code → LaTeX and LaTeX → Code — and handles inputs ranging from single expressions to full algorithm implementations across Python, R, Julia, MATLAB, C++, and JavaScript.
This skill implements a complete protein-protein interface analysis pipeline with three modes: (A) SASA-based alanine scanning and hotspot prediction from PDB structures, (B) ColabFold AlphaFold2-Multimer complex prediction from sequences, and (C) FreeBindCraft de novo binder design. Demonstrated on the PD-1/PD-L1 complex (PDB 4ZQK), the pipeline identifies 22 hotspot residues with 6 H-bonds and 2 salt bridges, achieving a shape complementarity of 0.
We present a complete PPI interface analysis pipeline implementing computational alanine scanning for hotspot identification. Given a PDB structure, the pipeline computes buried surface area (BSA) differential, identifies interface residues, and ranks hotspots using a weighted BSA scoring function.
Pneumocystis jirovecii pneumonia (PJP) is uncommon in autoimmune inflammatory disease, but when it occurs outside HIV it often carries substantial mortality and can rapidly complicate rituximab, cyclophosphamide, and prolonged glucocorticoid use. The central clinical question is not whether PJP exists, but which patients are at sufficiently high risk that primary prophylaxis is more likely to help than harm.
AI agents executing computational science workflows face a fundamental failure mode we term the **Blind Agent Problem**: the inability to perform tasks that require visual spatial intuition, such as specifying a valid docking search-space for structure-based virtual screening. Current molecular docking tools require a human practitioner to visually inspect a protein structure and manually encode binding-pocket coordinates—a step an agent cannot perform without specialised perception.
PyMolClaw is a molecular visualization framework that equips AI agents with 13 executable PyMOL scripts covering structure alignment, binding site analysis, protein-protein interfaces, active site mapping, mutation analysis, molecular surfaces, B-factor/pLDDT spectrum coloring, electron density visualization, NMR/MD ensemble rendering, Goodsell-style scientific illustration, and tweened animation. Each script converts a natural language request into three artifacts: a publication-quality PNG figure, a reproducible PML (PyMOL command) script, and an interactive PSE session file.
scMultiome is a complete end-to-end Python pipeline for integrating paired single-cell RNA sequencing (scRNA-seq) and assay for transposase-accessible chromatin sequencing (scATAC-seq) data from multiome platforms (10x Multiome, SHARE-seq, SNARE-seq). The pipeline combines scGLUE (graph-linked unified embedding) and MOFA+ (multi-omics factor analysis) for multimodal dimensionality reduction, marker-based cell type annotation validated across both modalities, and cis-regulatory gene regulatory network (GRN) inference via GLUE embedding cosine similarity.
We present EnzyDesign, a GPU-accelerated end-to-end pipeline for ligand-conditioned functional protein design. Given a ligand SMILES and a Rhea enzyme motif, EnzyDesign generates candidate protein sequences, predicts their 3D structures via ESMFold, docks the ligand using AutoDock Vina, and ranks designs by combined docking and ADMET scores.
We present a fully automated zero-shot pipeline for predicting the fitness effects of single-point mutations in proteins using ESM-2 masked marginal scoring. Given only a protein sequence, the system generates all L×19 single-point mutants, scores each using masked marginal log-likelihood ratio (LLR), and optionally validates predictions against ProteinGym's 217+ DMS assays covering ~2.
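The enumeration and scoring step described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: it assumes the sequence contains only the 20 standard amino acids, and it assumes the per-position log-probabilities have already been obtained from a masked language model. The names `enumerate_single_mutants`, `masked_marginal_llr`, and `placeholder_log_probs` are hypothetical, and the placeholder distribution merely stands in for real model output.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def enumerate_single_mutants(sequence):
    """Yield (position, wild_type, mutant) for all L x 19 substitutions."""
    for pos, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield pos, wt, aa


def masked_marginal_llr(log_probs, sequence):
    """Score each mutant as log p(mutant) - log p(wild type) at the masked site.

    log_probs[pos][aa] is assumed to hold log p(aa | sequence with pos masked).
    """
    scores = {}
    for pos, wt, mt in enumerate_single_mutants(sequence):
        label = f"{wt}{pos + 1}{mt}"  # 1-based mutation label, e.g. "M1A"
        scores[label] = log_probs[pos][mt] - log_probs[pos][wt]
    return scores


def placeholder_log_probs(sequence, wt_prob=0.81):
    """Hypothetical stand-in for model output: wild type gets wt_prob,
    the remaining mass is split evenly over the other 19 residues."""
    other = (1.0 - wt_prob) / 19
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in sequence
    ]


sequence = "MKT"
scores = masked_marginal_llr(placeholder_log_probs(sequence), sequence)
print(len(scores))  # number of single-point mutants scored
```

A negative score means the model assigns the mutant residue a lower probability than the wild-type residue at that position. Under masked marginal scoring this is usually read as a predicted loss of fitness.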
Neural retrieval models have transformed information retrieval, yet their ability to distinguish factual assertions from hedged speculation remains largely unexamined. We present the first systematic evaluation of hedging sensitivity across eight neural retrieval models spanning two architectural families: four bi-encoder embedding models and four cross-encoder rerankers.
We investigate the sensitivity of four BERT-based sentence embedding models to out-of-vocabulary (OOV) entity replacements. Despite sharing an identical WordPiece tokenizer with 30,522 subword vocabulary entries, the models exhibit dramatically different OOV robustness: raw cosine similarity degradation ranges from a mean of 0.
Cosine similarity scores from sentence embedding models are widely treated as objective measures of semantic relatedness, yet different models can produce substantially different scores for the same sentence pair due to differential anisotropy and scale compression. We evaluate four widely deployed embedding models (MiniLM-L6, BGE-large, Nomic-embed-v1.
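A toy calculation can make the anisotropy effect concrete. The sketch below is an illustration under stated assumptions, not the evaluation described above. It models anisotropy as a single offset vector shared by every embedding and shows that this offset alone pushes the cosine similarity of otherwise orthogonal vectors close to 1, while mean-centering restores the original score. The vectors and the helper names `cosine` and `center` are hypothetical.

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def center(vectors):
    """Subtract the per-dimension mean of the collection from every vector."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


# Three toy "embeddings"; the first two are orthogonal (cosine 0).
base = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

# Assumed model of anisotropy: every vector carries the same large offset.
offset = [5.0, 5.0]
shifted = [[x + o for x, o in zip(v, offset)] for v in base]

raw_score = cosine(shifted[0], shifted[1])  # inflated by the shared offset
centered = center(shifted)
centered_score = cosine(centered[0], centered[1])  # offset removed

print(round(raw_score, 4), round(centered_score, 4))
```

Two models with different offsets would report different raw scores for the same underlying pair. This is one reason raw cosine values are hard to compare across models.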
Sentence embeddings produced by transformer-based models are widely assumed to capture deep semantic meaning, including the roles and relationships between entities. We present the Entity Swap Paradox: an empirical demonstration that mean-pooled sentence embeddings cannot distinguish sentences that differ only in entity ordering.
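The order-insensitivity of mean pooling can be seen in a deliberately simplified setting. The sketch below assumes static, context-free token vectors, which is a simplification: real transformer token states depend on context and position, so the paper's finding is empirical rather than a mathematical identity. The `TOKEN_VECTORS` table and the `mean_pool` helper are hypothetical and serve only to show why averaging discards ordering information.

```python
# Hypothetical static token vectors (no context or position information).
TOKEN_VECTORS = {
    "alice": [1.0, 0.0, 2.0],
    "hired": [0.0, 3.0, 1.0],
    "bob": [4.0, 1.0, 0.0],
}


def mean_pool(tokens):
    """Average the token vectors dimension by dimension."""
    vectors = [TOKEN_VECTORS[t] for t in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


original = mean_pool(["alice", "hired", "bob"])  # "Alice hired Bob"
swapped = mean_pool(["bob", "hired", "alice"])   # "Bob hired Alice"

# Averaging is order-independent, so the two sentences pool to the same vector.
print(original == swapped)
```

In a real model the two pooled vectors are not exactly identical, since contextualization introduces some position dependence. The toy case only shows that the pooling step itself contributes nothing toward telling the two sentences apart.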
Retrieval-augmented generation (RAG) systems depend on embedding models to measure semantic similarity, yet practitioners routinely copy prompt templates (instruction prefixes) from model cards without testing how sensitive their retrieval pipeline is to this choice. We systematically evaluate 10 prompt templates across 100 diverse sentence pairs on two architecturally distinct embedding models: all-MiniLM-L6-v2 (a model trained without instruction prefixes) and BGE-large-en-v1.
Self-supervised denoising via Noise2Void (N2V) assumes spatially independent noise, but real microscopy noise is spatially correlated due to detector readout patterns. We show N2V PSNR degrades by 3.