Compare a neutral drift model vs frequency-dependent selection for antibiotic resistance gene (ARG) frequency distributions in 3 databases (CARD, ResFinder, AMRFinderPlus) across 2,400 bacterial genomes. Neutral drift (Wright-Fisher with mutation) fits the observed frequency spectra with KS p>0.
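A minimal sketch of the neutral baseline and the goodness-of-fit comparison. All parameter values are placeholders, and the binomial resampling step is replaced by its Gaussian diffusion approximation for speed; a full analysis would use exact binomial sampling and a calibrated KS p-value.

```python
import bisect
import math
import random

def wf_drift_spectrum(n_genes=300, pop_size=1000, mu=1e-3,
                      generations=500, seed=0):
    """Per-gene carriage frequencies under neutral Wright-Fisher drift
    with symmetric mutation (Gaussian diffusion approximation)."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_genes):
        p = rng.random()                               # arbitrary start
        for _ in range(generations):
            p = p * (1 - mu) + (1 - p) * mu            # mutation step
            # drift step: variance p(1-p)/N per generation
            p += rng.gauss(0.0, math.sqrt(max(p * (1 - p), 0.0) / pop_size))
            p = min(max(p, 0.0), 1.0)                  # clamp to [0, 1]
        freqs.append(p)
    return freqs

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum gap between the
    empirical CDFs of simulated vs observed ARG frequency spectra."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d
```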
Compare the centered log-ratio (CLR), additive log-ratio (ALR), and isometric log-ratio (ILR) transforms and raw relative abundance on 4 published microbiome-disease association datasets (IBD, obesity, colorectal cancer, diabetes). The 'winning' method (highest number of significant associations at FDR<0.
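The three log-ratio transforms can be sketched as follows. The pseudocount for zero counts and the pivot-coordinate basis for ILR are common but debatable choices, not the only ones.

```python
import math

def clr(xs, pseudo=0.5):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(x + pseudo) for x in xs]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]

def alr(xs, ref=0, pseudo=0.5):
    """Additive log-ratio: log of each part over a reference part."""
    logs = [math.log(x + pseudo) for x in xs]
    return [v - logs[ref] for i, v in enumerate(logs) if i != ref]

def ilr(xs, pseudo=0.5):
    """Isometric log-ratio via pivot coordinates (one sequential binary
    partition; other orthonormal bases are equally valid)."""
    z = [x + pseudo for x in xs]
    D = len(z)
    out = []
    for j in range(D - 1):
        gm = math.exp(sum(math.log(v) for v in z[j + 1:]) / (D - j - 1))
        out.append(math.sqrt((D - j - 1) / (D - j)) * math.log(z[j] / gm))
    return out
```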
Benchmark ML survival models (Cox-PH, RSF, DeepSurv, Cox-nnet) on genomics/transcriptomics/proteomics features vs TNM clinical staging alone across 12 TCGA cohorts (N=5,847). Mean C-index: clinical staging 0.
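The evaluation metric here is Harrell's concordance index, which can be computed directly (this handles right-censoring in the standard way; tied risks count 0.5):

```python
def c_index(times, events, risks):
    """Harrell's C-index: among comparable pairs, the fraction where the
    higher predicted risk has the shorter survival time. A pair (i, j)
    is comparable only if i's event was observed and occurred first."""
    conc, comp = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comp += 1
                if risks[i] > risks[j]:
                    conc += 1.0
                elif risks[i] == risks[j]:
                    conc += 0.5
    return conc / comp if comp else float('nan')
```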
Batch effects are a major confounder in genomics, and multiple correction methods exist. We compare ComBat, limma removeBatchEffect, Harmony, scVI, and MNN on 5 paired RNA-seq datasets where the same biological comparison was performed in two independent batches.
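A location-only sketch of the simplest correction in this family: per-gene batch mean-centering, roughly what limma's removeBatchEffect reduces to for a one-factor design with no covariates. ComBat additionally applies empirical-Bayes shrinkage and variance scaling, which this sketch omits.

```python
def remove_batch_means(expr, batches):
    """Subtract each batch's per-gene mean and add back the global mean.
    `expr` is a list of genes, each a list of per-sample values aligned
    with the `batches` labels."""
    out = []
    for gene in expr:
        global_mean = sum(gene) / len(gene)
        by_batch = {}
        for v, b in zip(gene, batches):
            by_batch.setdefault(b, []).append(v)
        bmean = {b: sum(vs) / len(vs) for b, vs in by_batch.items()}
        out.append([v - bmean[b] + global_mean
                    for v, b in zip(gene, batches)])
    return out
```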
Alternative polyadenylation (APA) has been proposed as a cancer biomarker, with studies reporting widespread 3'UTR shortening in tumors. We test whether APA changes are cancer-specific or tissue-specific by analyzing RNA-seq data from 8 TCGA cancer types across 5 tissue origins (4,200 tumor, 800 normal samples).
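One way the cancer-vs-tissue question can be operationalized: compute a per-sample 3'UTR shortening index, then compare tumor-minus-normal deltas grouped by tissue origin vs grouped by cancer type. The index and field names below are illustrative assumptions, not the study's exact statistic.

```python
from statistics import mean

def shortening_index(proximal_reads, distal_reads):
    """Fraction of 3'UTR reads supporting the proximal (short) isoform;
    higher values indicate more 3'UTR shortening."""
    return proximal_reads / (proximal_reads + distal_reads)

def mean_delta_by_group(samples, group_key):
    """Mean tumor-minus-normal shortening index per group. Similar
    per-tissue deltas shared across the cancer types of one tissue
    would point to tissue-specific rather than cancer-specific APA."""
    groups = {}
    for s in samples:
        delta = s["tumor_si"] - s["normal_si"]
        groups.setdefault(s[group_key], []).append(delta)
    return {g: mean(v) for g, v in groups.items()}
```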
GC-content bias in microarray and RNA-seq platforms is well-documented but rarely corrected in differential expression analyses. We audit 20 widely-cited microarray datasets from GEO, applying a permutation-based test that evaluates whether the overlap between differentially expressed gene lists and GC-content-correlated genes exceeds chance.
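The permutation test described above can be sketched as follows (the null model, resampling gene sets of the same size from the array's gene universe, is an assumption about the test's exact design):

```python
import random

def overlap_permutation_p(deg_list, gc_list, universe, n_perm=2000, seed=0):
    """P-value for whether |DEG ∩ GC-correlated| exceeds the overlap
    expected when a random gene set of size |DEG| is drawn from the
    universe of assayed genes."""
    rng = random.Random(seed)
    a, b = set(deg_list), set(gc_list)
    observed = len(a & b)
    uni = list(universe)
    hits = 0
    for _ in range(n_perm):
        perm = set(rng.sample(uni, len(a)))
        if len(perm & b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)       # add-one to avoid p = 0
```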
Spiking neural networks (SNNs) promise energy efficiency via sparse spike trains, but accuracy requires sufficient timesteps, creating a latency-accuracy tradeoff. We characterize this tradeoff for 8 SNN architectures on CIFAR-10/100 and DVS-Gesture at timesteps 1-128.
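A toy illustration of why accuracy improves with timesteps under rate coding: each output neuron fires as a Bernoulli process per timestep, so more timesteps lower the variance of the spike-count estimate of the underlying activation. This is a caricature of one coding scheme, not a model of the benchmarked architectures.

```python
import random

def rate_coded_decision(probs, timesteps, rng):
    """Each output neuron fires Bernoulli(p) per timestep; the class
    with the most spikes wins."""
    counts = [sum(rng.random() < p for _ in range(timesteps))
              for p in probs]
    return counts.index(max(counts))

def accuracy_vs_timesteps(probs, true_cls, timesteps, trials=500, seed=0):
    """Empirical accuracy of the spike-count readout at a given T."""
    rng = random.Random(seed)
    ok = sum(rate_coded_decision(probs, timesteps, rng) == true_cls
             for _ in range(trials))
    return ok / trials
```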
The sim-to-real transfer gap is assumed to grow with task complexity, but we find a U-shaped relationship. Across 6 manipulation tasks (reaching, pushing, pick-and-place, stacking, insertion, bimanual assembly) with 5 domain randomization levels on Franka Emika: simple tasks transfer well (gap 8-12%), moderate tasks show maximum gap (28-41%), complex tasks show reduced gap (18-24%).
In cooperative multi-agent reinforcement learning (MARL), free-riding agents contribute minimally while benefiting from shared team rewards. We propose Shapley Contribution Tracking (SCT), which estimates each agent's marginal contribution via online Shapley value approximation.
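The core estimator can be sketched as Monte Carlo Shapley sampling over random join orders; SCT's online variant would update these averages during training rather than from a fixed value oracle, so treat this as the offline skeleton only.

```python
import random

def shapley_estimates(agents, team_value, n_samples=2000, seed=0):
    """Monte Carlo Shapley values: average each agent's marginal
    contribution over random join orders. `team_value` maps a frozenset
    of agents to a scalar team reward."""
    rng = random.Random(seed)
    phi = {a: 0.0 for a in agents}
    order = list(agents)
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition, prev = set(), team_value(frozenset())
        for a in order:
            coalition.add(a)
            cur = team_value(frozenset(coalition))
            phi[a] += cur - prev       # marginal gain from adding a
            prev = cur
    return {a: v / n_samples for a, v in phi.items()}
```

For an additive reward the estimate is exact; free-riders (near-zero marginals) get Shapley values near zero regardless of the team reward they collect.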
Multi-agent LLM systems chain multiple model instances via natural language, but scaling properties are unknown. We study 2-16 agents across four patterns (sequential, broadcast, hierarchical, peer-to-peer).
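One axis along which the four patterns scale differently is messages (and hence tokens) per round. The counts below follow from simple structural assumptions stated in the comments; the actual study's communication model may differ.

```python
def messages_per_round(pattern, n):
    """Messages per round under simple assumptions: sequential passes
    one message down a chain; broadcast sends each agent's output to
    every other agent; hierarchical routes through one coordinator
    (one message up, one back per worker); peer-to-peer exchanges one
    message per unordered pair."""
    if pattern == "sequential":
        return n - 1
    if pattern == "broadcast":
        return n * (n - 1)
    if pattern == "hierarchical":
        return 2 * (n - 1)
    if pattern == "peer_to_peer":
        return n * (n - 1) // 2
    raise ValueError(pattern)
```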
Fault-tolerant LLM training requires periodic checkpointing. We analyze the cost structure across 64-4,096 GPUs, comparing checkpoint overhead against failure recovery cost.
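The standard first-order answer to this tradeoff is the Young/Daly checkpoint interval, combined with the observation that cluster MTBF shrinks inversely with node count under independent failures:

```python
import math

def cluster_mtbf(node_mtbf_s, n_nodes):
    """Independent, identical node failures: cluster MTBF = node MTBF / N,
    which is why checkpointing cost grows sharply at 4,096 GPUs."""
    return node_mtbf_s / n_nodes

def youngdaly_interval(checkpoint_cost_s, mtbf_s):
    """Young/Daly first-order optimal interval between checkpoints:
    tau* = sqrt(2 * C * MTBF), balancing checkpoint overhead (~C/tau)
    against expected lost work on failure (~tau / (2 * MTBF))."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)
```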
Distributed LLM training suffers from straggler nodes that impose synchronization barriers. We analyze 2,400 training runs on clusters of 10-512 GPUs with data/tensor/pipeline parallelism.
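The straggler penalty can be illustrated with a toy simulation: under a synchronization barrier, each step costs the maximum of the per-worker step times, and the expected maximum grows with worker count. The lognormal step-time model and sigma value are assumptions for illustration only.

```python
import random

def straggler_slowdown(n_workers, n_steps=2000, sigma=0.1, seed=0):
    """Ratio of mean barrier-synchronized step time (max over workers)
    to mean single-worker step time, with lognormal per-worker times."""
    rng = random.Random(seed)
    tot_max, tot_one = 0.0, 0.0
    for _ in range(n_steps):
        ts = [rng.lognormvariate(0.0, sigma) for _ in range(n_workers)]
        tot_max += max(ts)     # barrier waits for the slowest worker
        tot_one += ts[0]
    return tot_max / tot_one
```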
LLM APIs process inputs autoregressively, coupling response latency to input/output length. We demonstrate this creates an exploitable timing side channel: observing only response time reveals input token count with 93.
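The attack's calibration step can be sketched as an ordinary least-squares fit of latency against token count on the attacker's own probe requests, then inverted against a victim's observed latency. Real measurements are noisy; the exact model is an assumption.

```python
def fit_latency_model(token_counts, latencies):
    """Least-squares fit of latency = a + b * tokens."""
    n = len(token_counts)
    mx = sum(token_counts) / n
    my = sum(latencies) / n
    cov = sum((x - mx) * (y - my)
              for x, y in zip(token_counts, latencies))
    var = sum((x - mx) ** 2 for x in token_counts)
    b = cov / var
    a = my - b * mx
    return a, b

def infer_token_count(latency, a, b):
    """Invert the calibrated model to estimate a victim's input size."""
    return round((latency - a) / b)
```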
Prompt injection is a critical LLM security vulnerability. We analyze the tradeoff between injection resistance and helpfulness across 12 models from 4 families.
LLMs generate unit tests with impressive coverage, but we challenge this optimism using mutation testing. We evaluate GPT-4, Claude-3, CodeLlama-34B, and DeepSeek-Coder-33B on 200 Python functions from popular libraries.
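A minimal illustration of mutation testing with a single mutation operator (swapping `<` for `>=`); a generated test suite "kills" the mutant only if some assertion fails against the mutated code. Real mutation tools apply many operators per function.

```python
import ast

class FlipComparisons(ast.NodeTransformer):
    """Replace every `<` comparison with `>=` (one classic operator
    mutation)."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.GtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

def mutant_killed(src, func_name, test):
    """Compile the mutated source and report whether `test` fails on it.
    A surviving mutant means the tests exercise lines without actually
    checking the behavior those lines implement."""
    tree = FlipComparisons().visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    ns = {}
    exec(compile(tree, "<mutant>", "exec"), ns)
    try:
        test(ns[func_name])
        return False        # mutant survived: coverage without assertions
    except AssertionError:
        return True         # mutant killed
```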