This protocol predicts RNA secondary and tertiary structures using AlphaFold 3 and extends to RNA-protein complex prediction for RNA-binding proteins (RBPs). The workflow identifies structured regions, disordered regions, and potential RBP binding interfaces, supporting research on non-coding RNA function and post-transcriptional regulation.
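One simple way to separate structured from disordered stretches is to threshold per-residue pLDDT and report contiguous runs. The sketch below assumes per-residue values have already been extracted from the AlphaFold 3 output, for example by averaging each residue's atom-level pLDDT. It uses a cutoff of 50, a common convention for low-confidence and often disordered regions. Both the cutoff and the function name `segment_by_plddt` are illustrative assumptions, not documented parameters of the protocol.

```python
from typing import List, Tuple


def segment_by_plddt(plddt: List[float], threshold: float = 50.0) -> List[Tuple[int, int, str]]:
    """Split per-residue pLDDT into contiguous runs.

    Returns (start, end_exclusive, label) tuples with 0-based indices, where
    label is "disordered" if pLDDT < threshold, else "structured".
    The threshold of 50 is an assumed convention, not a protocol setting.
    """
    segments = []
    start = 0
    for i in range(1, len(plddt) + 1):
        # Close the current run at the end of the array or when the label flips.
        if i == len(plddt) or (plddt[i] < threshold) != (plddt[start] < threshold):
            label = "disordered" if plddt[start] < threshold else "structured"
            segments.append((start, i, label))
            start = i
    return segments
```

For example, `segment_by_plddt([90, 85, 40, 30, 70])` yields a structured run, a two-residue disordered run, and a final structured residue.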
This protocol predicts multiple conformational states of the same protein with AlphaFold 3 by generating alternative inputs that vary the MSA configuration, bound ligands, or templates. The workflow enables exploration of conformational heterogeneity, including open/closed states, ligand-bound conformations, and different oligomeric states, supporting research on allostery, enzyme catalysis, and molecular machines.
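Here is a minimal sketch of generating alternative inputs. It assumes the AlphaFold 3 JSON input format (`"dialect": "alphafold3"`, `"version": 1`). Setting `unpairedMsa` and `pairedMsa` to empty strings with an empty `templates` list requests an MSA-free, template-free run, and the holo variant adds a ligand by CCD code. The helper names are hypothetical, and the field names should be checked against the AlphaFold 3 input documentation for the release in use.

```python
import copy
import json


def make_variants(name, sequence, ligand_ccd=None, seeds=(1, 2, 3)):
    """Build hypothetical AF3 input variants for one protein chain."""
    base = {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": [{"protein": {"id": "A", "sequence": sequence}}],
        "dialect": "alphafold3",
        "version": 1,
    }
    variants = {"default": base}

    # MSA-free, template-free variant: empty MSAs and no templates.
    no_msa = copy.deepcopy(base)
    no_msa["name"] = f"{name}_no_msa"
    no_msa["sequences"][0]["protein"].update({"unpairedMsa": "", "pairedMsa": "", "templates": []})
    variants["no_msa"] = no_msa

    # Ligand-bound variant, e.g. to probe a holo conformation.
    if ligand_ccd:
        holo = copy.deepcopy(base)
        holo["name"] = f"{name}_holo_{ligand_ccd}"
        holo["sequences"].append({"ligand": {"id": "B", "ccdCodes": [ligand_ccd]}})
        variants["holo"] = holo
    return variants


def to_json_strings(variants):
    """Serialize each variant so it can be written to its own input file."""
    return {key: json.dumps(job, indent=2) for key, job in variants.items()}
```

Each JSON string can be written to a separate file and run independently. Alternative conformations would then be looked for by comparing the models across variants and seeds.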
This protocol predicts and compares protein structures across multiple species to identify conserved structural elements and evolutionary relationships. The workflow combines AlphaFold 3 predictions with structural alignment and conservation analysis, supporting comparative genomics, evolutionary biology, and cross-species functional annotation.
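Structural comparison across species usually comes down to superimposing matched residues and reporting RMSD. Below is a minimal NumPy sketch of the Kabsch superposition. It assumes residue correspondences between the two predictions are already established, for example CA atoms at aligned orthologous positions. Dedicated tools such as TM-align or US-align would otherwise handle that step. The function name is illustrative and not part of the described workflow.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate row vectors of P onto Q and compute the RMSD.
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum(axis=1).mean()))
```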
Recent preprints on single-cell reasoning emphasize that language-model outputs in biology should be grounded in direct evidence rather than generated as free-form labels. This submission introduces MarkerLens, an original agent-executable workflow for auditing proposed single-cell cluster annotations against marker-gene evidence.
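The core check can be sketched as an overlap test. For each cluster, the top differentially expressed genes are compared with a marker set for the proposed label. The marker dictionary, the `min_support` cutoff of 0.5, and the function name are hypothetical illustrations, not MarkerLens's actual rules.

```python
def audit_annotations(proposed, top_genes, markers, min_support=0.5):
    """Score each proposed cluster label by marker-gene support.

    proposed:  cluster id -> proposed cell-type label
    top_genes: cluster id -> list of top differentially expressed genes
    markers:   label -> iterable of expected marker genes (hypothetical reference)
    """
    report = {}
    for cluster, label in proposed.items():
        expected = set(markers.get(label, ()))
        if not expected:
            # No evidence available: the label cannot be audited.
            report[cluster] = {"label": label, "support": None, "matched": [], "status": "no_marker_set"}
            continue
        found = expected & set(top_genes.get(cluster, []))
        support = len(found) / len(expected)
        status = "supported" if support >= min_support else "flagged"
        report[cluster] = {"label": label, "support": support, "matched": sorted(found), "status": status}
    return report
```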
Design of sequence-specific DNA-binding proteins (DBPs) enables applications in gene regulation, biosensing, and genome editing. This submission presents DNA-Binder-Design, an agent-executable workflow that combines DNA recognition motif selection, structure-guided scaffolding, inverse-folding sequence design, and AlphaFold 3-based structure validation to predict and design proteins that bind specific DNA target sequences.
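AlphaFold 3 takes double-stranded DNA as two separate `dna` chains, so a validation job needs both the target strand and its reverse complement. The sketch below assumes the AlphaFold 3 JSON input format. The helper names, chain IDs, and single-seed default are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def binder_dna_job(name, protein_seq, dna_target, seed=1):
    """Hypothetical AF3 job: one designed protein plus both DNA strands."""
    target = dna_target.upper()
    if set(target) - set("ACGT"):
        raise ValueError("DNA target must contain only A, C, G, T")
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            {"protein": {"id": "A", "sequence": protein_seq}},
            {"dna": {"id": "B", "sequence": target}},
            {"dna": {"id": "C", "sequence": reverse_complement(target)}},
        ],
        "dialect": "alphafold3",
        "version": 1,
    }
```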
This protocol presents a computational pipeline for virtual screening of peptide candidates against target proteins, combining AlphaFold 3 structure prediction with binding interface analysis. By predicting peptide-protein complex structures and scoring binding likelihood from interface confidence metrics (pLDDT and PAE) together with the interface contact count, researchers can efficiently prioritize peptide libraries for experimental validation.
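Below is a sketch of how the three signals could be computed from one prediction and combined into a ranking score. It makes several assumptions:

- one representative atom per token (for example CA), with chain IDs, per-token pLDDT, and the PAE matrix already loaded;
- an 8 Å contact cutoff;
- a weighted sum with the weights and caps shown.

All of these settings, and the function names, are hypothetical rather than the protocol's actual scoring function.

```python
import numpy as np


def interface_metrics(coords, chain_ids, plddt, pae, receptor="A", peptide="B", cutoff=8.0):
    """Interface summary for one predicted complex.

    coords: (N, 3) one representative atom per token; chain_ids: (N,) labels;
    plddt: (N,) per-token pLDDT; pae: (N, N) PAE matrix in angstroms.
    """
    coords = np.asarray(coords, dtype=float)
    chain_ids = np.asarray(chain_ids)
    plddt = np.asarray(plddt, dtype=float)
    pae = np.asarray(pae, dtype=float)
    rec = chain_ids == receptor
    pep = chain_ids == peptide
    # Pairwise receptor-peptide distances; a contact is any pair closer than `cutoff`.
    dist = np.linalg.norm(coords[rec][:, None, :] - coords[pep][None, :, :], axis=-1)
    contacts = dist < cutoff
    pep_iface = contacts.any(axis=0)  # peptide tokens touching the receptor
    iface_plddt = float(plddt[pep][pep_iface].mean()) if pep_iface.any() else 0.0
    # Average the two off-diagonal PAE blocks (receptor->peptide and peptide->receptor).
    cross_pae = float((pae[np.ix_(rec, pep)].mean() + pae[np.ix_(pep, rec)].mean()) / 2)
    return {"contacts": int(contacts.sum()), "iface_plddt": iface_plddt, "cross_pae": cross_pae}


def binding_score(m, w_plddt=0.4, w_pae=0.4, w_contacts=0.2, pae_cap=30.0, contact_cap=50):
    """Hypothetical weighted score in [0, 1]; higher suggests more confident binding."""
    plddt_term = m["iface_plddt"] / 100.0
    pae_term = 1.0 - min(m["cross_pae"], pae_cap) / pae_cap
    contact_term = min(m["contacts"], contact_cap) / contact_cap
    return w_plddt * plddt_term + w_pae * pae_term + w_contacts * contact_term
```

Candidates would then be sorted by `binding_score` in descending order. The weights would need calibration against known binders before the ranking means much.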
This submission introduces ChIPPeakAuditor, an original agent-executable workflow that audits ChIP-seq peak calling results against quality metrics including the fraction of reads in peaks (FRiP), the irreproducible discovery rate (IDR), and replicate concordance. Inspired by the ENCODE ChIP-seq standards, it turns a recurring quality control problem into a reproducible CSV-and-rules audit that produces machine-readable JSON, a compact CSV report, and a Markdown handoff.
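The FRiP and IDR checks reduce to a few ratios. The sketch follows the ENCODE convention of rescue and self-consistency ratios, each computed as the max/min of two peak counts. Reproducibility is classed as ideal when both ratios stay within 2, acceptable when one exceeds 2, and concerning when both do. The 1% FRiP floor, the flag names, and the function signature are assumptions. Check them against the ENCODE standards; they are not ChIPPeakAuditor's documented rules.

```python
def _ratio(a, b):
    """max/min of two peak counts; infinite if either count is zero."""
    lo, hi = sorted((a, b))
    return float("inf") if lo == 0 else hi / lo


def audit_chip(reads_in_peaks, total_reads, n_true_rep, n_pooled_pseudo,
               n_self_1, n_self_2, min_frip=0.01, max_ratio=2.0):
    """Hypothetical FRiP + IDR reproducibility audit for one experiment."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    frip = reads_in_peaks / total_reads
    # Rescue ratio: IDR peaks from true replicates vs pooled pseudoreplicates.
    rescue = _ratio(n_true_rep, n_pooled_pseudo)
    # Self-consistency ratio: IDR peaks from each replicate's self-pseudoreplicates.
    self_consistency = _ratio(n_self_1, n_self_2)
    n_failing = sum(r > max_ratio for r in (rescue, self_consistency))
    reproducibility = ("ideal", "acceptable", "concerning")[n_failing]
    flags = []
    if frip < min_frip:
        flags.append("LOW_FRIP")
    if reproducibility != "ideal":
        flags.append("IDR_" + reproducibility.upper())
    return {"frip": frip, "rescue_ratio": rescue, "self_consistency_ratio": self_consistency,
            "reproducibility": reproducibility, "flags": flags}
```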
This submission introduces MotifEnrichGuard, an original audit skill that validates ChIP-seq and ATAC-seq motif enrichment results for statistical rigor, database consistency, and biological plausibility. The workflow processes standard TSV-format motif enrichment tables and produces machine-readable JSON, compact CSV, and human-readable Markdown outputs with actionable quality flags.
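One concrete statistical-rigor check recomputes Benjamini-Hochberg q-values from the reported p-values. Rows whose reported q-value is smaller than the recomputed one are flagged, since that suggests a missing or weaker correction. The sketch assumes the TSV has already been parsed into p-value and q-value columns. The flag names and tolerance are hypothetical.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking the running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def flag_motif_rows(pvals, reported_q, alpha=0.05, tol=1e-6):
    """Compare reported q-values with BH recomputation; return (q, per-row flags)."""
    q = bh_adjust(pvals)
    rep = np.asarray(reported_q, dtype=float)
    flags = []
    for i in range(q.size):
        row = []
        if rep[i] < q[i] - tol:
            row.append("Q_UNDERSTATED")  # reported q is smaller than BH allows
        if rep[i] < alpha <= q[i]:
            row.append("NOT_SIGNIFICANT_AFTER_BH")
        flags.append(row)
    return q, flags
```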
This protocol provides a computational pipeline for CRISPR guide RNA design that combines sgRNA efficiency prediction with optional AlphaFold 3 structural validation. The efficiency predictor extracts sequence features including GC content (40-70% treated as optimal), positional nucleotide preferences based on the Doench rules, thermodynamic stability from a nearest-neighbor model, and self-complementarity.
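Two of the listed features are simple enough to sketch directly:

- GC fraction, checked against the 40-70% window from the text;
- the longest stem, meaning a segment whose reverse complement occurs downstream after a minimum loop, used as a crude proxy for self-complementarity.

The stem proxy, the `min_loop=3` default, and the function names are assumptions. The Doench positional weights and nearest-neighbor energies are deliberately left out.

```python
COMP = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of G and C bases in the guide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def max_self_complementary_stem(seq, min_loop=3):
    """Length of the longest segment whose reverse complement appears downstream,
    separated by at least `min_loop` bases (a hairpin-stem proxy, not a fold)."""
    seq = seq.upper()
    n = len(seq)
    for k in range(n // 2, 0, -1):  # try the longest stems first
        for i in range(0, n - 2 * k - min_loop + 1):
            partner = seq[i:i + k].translate(COMP)[::-1]
            if partner in seq[i + k + min_loop:]:
                return k
    return 0


def sgrna_features(seq, gc_range=(0.40, 0.70)):
    """Hypothetical feature subset for one guide sequence."""
    gc = gc_fraction(seq)
    return {"gc": gc, "gc_ok": gc_range[0] <= gc <= gc_range[1],
            "stem": max_self_complementary_stem(seq)}
```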
Large language models (LLMs) have rapidly evolved from text generators to autonomous agents capable of executing complex, multi-step research pipelines. We present a framework for **Autonomous Scientific Research with LLMs (ASR-LLM)** that integrates literature mining, public data retrieval, analysis, and peer-reviewed publication into an end-to-end pipeline.
Colorectal cancer (CRC) is the third most common malignancy globally, with microsatellite instability (MSI) present in approximately 15% of cases. MSI is driven by deficiency in the DNA mismatch repair (MMR) system and confers distinct therapeutic vulnerabilities, particularly immunotherapy responsiveness.
We train a residual variational autoencoder (SR-VAE) that performs 2x super-resolution on Hi-C contact maps (128x128 low-resolution (LR) input to 256x256 high-resolution (HR) output at 10 kb). The output is parameterized as bicubic(LR) + gain * decoder(z), so the decoder only has to learn a residual correction to the bicubic upsampling. On held-out GM12878 chromosomes, SR-VAE outperforms a faithful reimplementation of HiCPlus by 19 percent in MSE, 13 percent in SSIM, and 8 percent in HiC-Spector score.
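The residual parameterization can be sketched in a few lines. Here `scipy.ndimage.zoom` with `order=3` (cubic-spline interpolation) stands in for bicubic upsampling. `decoder` is any callable that maps a latent `z` to a 256x256 array. Both are assumptions, not the SR-VAE code.

```python
import numpy as np
from scipy.ndimage import zoom


def residual_sr_output(lr, decoder, z, gain):
    """HR estimate = upsample(LR) + gain * decoder(z).

    Cubic-spline zoom is used as a stand-in for bicubic interpolation;
    `decoder` is a hypothetical callable returning the residual map.
    """
    base = zoom(np.asarray(lr, dtype=float), 2, order=3)  # 128x128 -> 256x256
    residual = np.asarray(decoder(z), dtype=float)
    if residual.shape != base.shape:
        raise ValueError(f"decoder output {residual.shape} != upsampled shape {base.shape}")
    return base + gain * residual
```

With `gain` at zero the model reduces exactly to the interpolation baseline. This is one way to see why the residual form is a safe starting point for training.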
Biomedical researchers spend a disproportionate amount of time navigating fragmented literature to identify viable therapeutic hypotheses. We introduce BioLit-Scout, a modular, agent-executable skill that automates the aggregation, filtering, and synthesis of published evidence for hypothesis prioritization in disease mechanism research.
Reliable biomedical language modeling requires not only factual recall but also robust handling of invalid evidence. We present a bioinformatics-oriented contamination benchmark that measures whether LLMs rely on retracted medical papers in clinically framed tasks, using a versioned Kaggle dataset snapshot and a two-stage evaluation protocol.
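One metric such a benchmark could report is the fraction of model answers that cite at least one retracted paper. The sketch assumes citations have already been extracted as DOIs and that the retracted set comes from the dataset snapshot. The function name and the DOI-matching rule (case-insensitive, whitespace-stripped) are illustrative assumptions, not the two-stage protocol itself.

```python
def contamination_rate(responses, retracted):
    """Fraction of responses citing at least one retracted DOI.

    responses: list of iterables of cited DOIs, one per model answer
    retracted: iterable of retracted DOIs
    """
    if not responses:
        return 0.0
    # DOIs are case-insensitive, so normalize before matching.
    bad = {doi.strip().lower() for doi in retracted}
    hits = sum(1 for cited in responses if any(doi.strip().lower() in bad for doi in cited))
    return hits / len(responses)
```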
Biological literature synthesis for therapeutic target identification remains a manual, time-consuming process with limited reproducibility. Researchers navigating thousands of publications across PubMed, bioRxiv, and domain databases face fragmented evidence, inconsistent nomenclature, and difficulty prioritizing candidate targets.
Clinical enzyme panels are among the most frequently ordered laboratory tests in healthcare, yet their interpretation remains heavily dependent on physician experience and implicit knowledge. We present **ClinicalEnzymeDiagnostics-Skill**, an open-source AI agent that transforms routine clinical chemistry data into structured differential diagnoses using Bayesian probabilistic reasoning.
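Bayesian differential diagnosis can be sketched as a naive Bayes posterior over candidate diagnoses. The code makes three assumptions:

- findings are conditionally independent given the diagnosis;
- only observed findings enter the update;
- any likelihood not listed is floored at `floor`.

Priors and likelihoods passed in are placeholders, not clinical values. This is not the skill's actual model.

```python
import math


def posterior(priors, likelihoods, findings, floor=1e-6):
    """Naive Bayes posterior P(dx | findings), sorted from most to least probable.

    priors:      dx -> P(dx)
    likelihoods: dx -> {finding: P(finding | dx)}
    findings:    observed abnormal findings
    """
    log_scores = {}
    for dx, prior in priors.items():
        s = math.log(prior)
        for f in findings:
            # Unlisted findings get a small floor instead of zero probability.
            s += math.log(likelihoods.get(dx, {}).get(f, floor))
        log_scores[dx] = s
    # Normalize in log space for numerical stability.
    m = max(log_scores.values())
    unnorm = {dx: math.exp(s - m) for dx, s in log_scores.items()}
    total = sum(unnorm.values())
    ranked = sorted(((dx, v / total) for dx, v in unnorm.items()), key=lambda kv: kv[1], reverse=True)
    return dict(ranked)
```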
We present MetaGenomics, a pure NumPy/SciPy/scikit-learn metagenomics analysis engine implemented entirely in Python without external bioinformatics frameworks (no QIIME2, mothur, HUMAnN3, or R). MetaGenomics bundles six analysis modules built on published statistical methods: (1) taxonomic profiling with rarefaction and CLR normalization, (2) alpha diversity (Shannon, Simpson, Chao1, Pielou evenness), (3) beta diversity with PCoA ordination and PERMANOVA significance testing, (4) differential abundance via LEfSe, ALDEx2, and ANCOM-BC, (5) functional profiling with COG/KEGG mapping and ARG detection across 20 resistance gene classes, and (6) SparCC-inspired co-occurrence network inference.
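CLR normalization and the alpha-diversity indices have compact closed forms. The sketch below makes these choices:

- a pseudocount of 0.5 is added before the log, which is an assumption since CLR is undefined at zero;
- Simpson uses the Gini-Simpson form 1 - sum(p^2);
- Chao1 uses the bias-corrected estimator S_obs + F1(F1 - 1) / (2(F2 + 1)).

The document does not say which variants MetaGenomics uses.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform along the last axis (taxa)."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=-1, keepdims=True)


def alpha_diversity(counts):
    """Shannon, Gini-Simpson, Pielou evenness, and bias-corrected Chao1 for one sample."""
    c = np.asarray(counts, dtype=float)
    c = c[c > 0]  # only observed taxa contribute
    p = c / c.sum()
    shannon = float(-(p * np.log(p)).sum())
    simpson = float(1.0 - (p ** 2).sum())
    s_obs = c.size
    pielou = shannon / float(np.log(s_obs)) if s_obs > 1 else float("nan")
    f1 = int((c == 1).sum())  # singletons
    f2 = int((c == 2).sum())  # doubletons
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return {"shannon": shannon, "simpson": simpson, "pielou": pielou, "chao1": chao1}
```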
CancerGenomics is a self-contained Python pipeline for tumor genomic analysis that uses only NumPy, SciPy, and scikit-learn; GATK, CNVkit, maftools, and R are not required. The engine provides six analysis modules: (1) Circular Binary Segmentation for copy-number variation detection, (2) TMB/MSI computation from somatic mutation calls, (3) COSMIC SBS96 mutational signature decomposition via NNLS, (4) MHC-I neoantigen prediction using position weight matrices, (5) clonal architecture inference via cancer cell fraction estimation and KMeans clustering, and (6) genomic instability scoring including LOH fraction and HRD score.
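Signature refitting by NNLS solves catalog ≈ S · e subject to e ≥ 0. Below is a minimal sketch using `scipy.optimize.nnls`. In practice S would be the 96 x K COSMIC SBS matrix; the test here uses random synthetic signatures instead. The TMB helper divides the nonsynonymous mutation count by the callable megabases. Both function names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls


def fit_signatures(catalog, signatures):
    """Non-negative exposures of a 96-channel catalog to K signatures.

    catalog:    length-96 mutation counts for one tumor
    signatures: (96, K) matrix, each column a probability profile
    Returns (exposures, relative contributions, residual norm).
    """
    exposures, residual_norm = nnls(np.asarray(signatures, dtype=float),
                                     np.asarray(catalog, dtype=float))
    total = exposures.sum()
    fractions = exposures / total if total > 0 else exposures
    return exposures, fractions, residual_norm


def tmb(n_nonsynonymous, callable_mb):
    """Tumor mutational burden in mutations per megabase."""
    return n_nonsynonymous / callable_mb
```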