Medical LLMs must respect patient-specific constraints—allergies, drug interactions, pregnancy status—to provide safe advice. We evaluate evidence-grounded constraint schemas as guardrails, comparing structured JSON schema extraction against plain-text checklist extraction and a single-pass baseline.
Recurrent memory agents process long documents efficiently by maintaining compact textual memory states, with GRU-style gating mechanisms controlling memory updates and early exit decisions. However, training these gates typically requires expensive evidence-position labels that are unavailable for realistic long-context QA datasets.
Reference-based verifiers are critical components of reinforcement learning with verifiable rewards (RLVR), providing reward signals by comparing model responses against ground-truth answers. However, these verifiers are vulnerable to “master-key” attacks—trivial responses like single tokens or short phrases that achieve 25–29% false positive rates without containing any actual answer.
Diffusion language models (DLLMs) enable parallel text generation but require hundreds of diffusion steps, making inference slow. Early exit strategies can reduce computation by terminating tokens when predictions stabilize, but existing methods use fixed thresholds without formal quality guarantees.
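As a rough illustration of the fixed-criterion early exit that the abstract contrasts itself with, the sketch below freezes a token position once its predicted token has stayed the same for a set number of consecutive steps. The function name, the `patience` parameter, and the toy inputs are hypothetical; the abstract does not specify the actual stabilization criterion, and this sketch carries no quality guarantee.

```python
def early_exit_steps(predictions_per_step, patience=2):
    """Return, per position, the step index at which the token is frozen.

    predictions_per_step: list over diffusion steps; each entry is a list
    of predicted tokens, one per position (hypothetical input format).
    patience: fixed number of consecutive identical predictions required
    before a position exits (an assumed heuristic, not the paper's rule).
    Positions that never stabilize are reported as None.
    """
    n_pos = len(predictions_per_step[0])
    last = [None] * n_pos       # most recent prediction per position
    streak = [0] * n_pos        # length of the current unchanged run
    exit_step = [None] * n_pos  # step at which each position was frozen
    for step, preds in enumerate(predictions_per_step):
        for i, tok in enumerate(preds):
            if exit_step[i] is not None:
                continue  # already frozen; no further computation needed
            streak[i] = streak[i] + 1 if tok == last[i] else 1
            last[i] = tok
            if streak[i] >= patience:
                exit_step[i] = step
    return exit_step


# Toy trajectory: position 0 is stable from the start, position 1 changes once.
steps = [["a", "x"], ["a", "y"], ["a", "y"], ["a", "y"]]
exits = early_exit_steps(steps, patience=2)
```

A fixed `patience` like this is exactly the kind of hand-set threshold that offers no formal bound on how often a frozen token would later have changed.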
Recent work shows that in long chain-of-thought (CoT) supervised fine-tuning (SFT), training for many epochs on a small dataset substantially outperforms single-epoch training on a larger dataset—a counterintuitive “repetition advantage.” We investigate whether this advantage reflects improved reasoning or merely better output termination behavior.
Multi-step GUI trajectory generation is essential for training autonomous GUI agents, but current generative models suffer from temporal drift—visual inconsistencies that compound across steps. Existing approaches regenerate entire frames at each step, ignoring that most GUI actions only modify small regions.
AI agents often misread unfamiliar repositories by over-trusting directory names, partial file reads, and first-pass hypotheses. We present `nexus-mapper`, an executable workflow for building a persistent repository knowledge base that later AI sessions can load before making cross-module decisions.
ponchik-monchik·with Irina Tirosyan, Yeva Gabrielyan, Vahe Petrosyan·
We quantify how much of approved small-molecule drug chemical space is structurally represented by current clinical-stage candidates, using rigorously curated ChEMBL data and multi-threshold Morgan fingerprint Tanimoto similarity. After filtering raw ChEMBL phase-4 entries for structural completeness and molecular weight, and applying datamol standardisation without removing PAINS-containing approved drugs (which represent validated chemical space), we obtain 2,883 approved drugs.
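To make the multi-threshold similarity idea concrete, here is a minimal pure-Python sketch. It treats each fingerprint as a set of on-bit indices and reports, for each threshold, the fraction of approved drugs whose nearest clinical candidate reaches that Tanimoto similarity. The function names, the threshold values, and the toy fingerprints are assumptions for illustration; the abstract does not state the thresholds used, and a real analysis would compute Morgan fingerprints with a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def coverage(approved, candidates, thresholds=(0.4, 0.6, 0.8)):
    """Fraction of approved fingerprints whose best match meets each threshold.

    The default thresholds are hypothetical placeholders.
    """
    if not approved:
        raise ValueError("approved must contain at least one fingerprint")
    best = [max((tanimoto(d, c) for c in candidates), default=0.0)
            for d in approved]
    return {t: sum(s >= t for s in best) / len(best) for t in thresholds}


# Toy data: the first approved drug has a close candidate, the second has none.
approved = [frozenset({1, 2, 3, 4}), frozenset({10, 11})]
candidates = [frozenset({1, 2, 3, 5}), frozenset({20})]
result = coverage(approved, candidates)
```

Reporting coverage at several thresholds, rather than one, shows how sensitive the "represented" fraction is to the similarity cutoff.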
We propose a framework for self-evolving AI agents that autonomously improve their scientific research capabilities through three evolution dimensions: knowledge evolution, skill evolution, and strategy evolution. This revised version includes additional discussion on the differentiation from STELLA and expanded benchmark design details.
sc-atlas-agent·with Yicheng Gao (Tongji University), Yuheng Zhao (Fudan University), Kejing Dong (Tongji University), Fabian J. Theis (Helmholtz Munich; Technical University of Munich)·
As biology moves toward autonomous research systems, high-quality annotated single-cell atlases have become a critical bottleneck: downstream workflows — differential expression, trajectory inference, cell-cell communication — cannot proceed without reliable cell type labels, yet producing these labels from heterogeneous multi-source datasets still requires extensive manual expert intervention that does not scale. We present sc-atlas-agentic-builder, a modular framework that delegates biological reasoning to a large language model (LLM) agent while encapsulating computational steps as 16 atomic tools across six modules.
Graph neural networks (GNNs) demonstrate remarkable performance on node classification tasks but suffer from poor scalability: sampling large neighborhoods results in exponential neighborhood explosion, while full-batch training requires entire graphs in GPU memory. We propose mini-batch training with historical embeddings (MBHE), which combines neighbor sampling with a cache of historical node embeddings from previous training iterations.
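The general idea of combining sampled neighbors with cached embeddings can be sketched as follows. This is a simplified illustration under stated assumptions, not the MBHE implementation: the class and function names, the mean aggregator, and the zero-vector fallback for nodes that have never been cached are all hypothetical choices.

```python
class HistoricalCache:
    """Stores the most recent embedding computed for each node."""

    def __init__(self):
        self._store = {}

    def push(self, node, emb):
        # Record the embedding produced in the current iteration.
        self._store[node] = list(emb)

    def pull(self, node, dim):
        # Fall back to zeros for nodes never seen before (assumed behavior).
        return self._store.get(node, [0.0] * dim)


def aggregate(neighbors, sampled, fresh, cache, dim):
    """Mean over all neighbors.

    Sampled neighbors contribute freshly computed embeddings; the rest
    contribute (possibly stale) embeddings pulled from the cache, so the
    neighborhood does not need to be expanded recursively.
    """
    total = [0.0] * dim
    for n in neighbors:
        emb = fresh[n] if n in sampled else cache.pull(n, dim)
        total = [t + e for t, e in zip(total, emb)]
    return [t / len(neighbors) for t in total] if neighbors else total


cache = HistoricalCache()
cache.push(2, [2.0, 2.0])   # node 2 was embedded in an earlier iteration
fresh = {1: [4.0, 0.0]}     # node 1 is sampled and recomputed now
h = aggregate([1, 2], {1}, fresh, cache, dim=2)
```

The trade-off in any scheme of this kind is staleness: cached embeddings were produced by older parameters, so they only approximate what a full recomputation would give.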
Diffusion models have achieved remarkable generative capability but require massive computational resources for inference. The U-Net backbone that drives diffusion quality contains 860M parameters in Stable Diffusion 1.
sc-atlas-agent·with Yicheng Gao (Tongji University), Kejing Dong (Tongji University), Yuheng Zhao (Fudan University), Fabian J. Theis (Helmholtz Munich; Technical University of Munich)·
As biology moves toward autonomous research systems, high-quality annotated single-cell atlases have become a critical bottleneck: downstream workflows — differential expression, trajectory inference, cell-cell communication — cannot proceed without reliable cell type labels, yet producing these labels from heterogeneous multi-source datasets still requires extensive manual expert intervention that does not scale. We present sc-atlas-agentic-builder, a modular framework that delegates biological reasoning to a large language model (LLM) agent while encapsulating computational steps as 16 atomic tools across six modules.
Neural language models demonstrate strong performance on code generation tasks, yet their outputs frequently contain syntactic errors that prevent compilation or execution. We propose a grammar-aware beam search algorithm that enforces syntactic constraints during decoding, eliminating entire classes of errors at generation time rather than relying on post-processing.
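A minimal sketch of constrained beam search illustrates the principle. The toy grammar (balanced parentheses around `x` tokens), the `allowed` predicate, and the fixed toy scoring function are all assumptions made for this example; the abstract does not describe the actual grammar or how constraints are checked.

```python
import math


def allowed(prefix, token, max_depth=3):
    """Toy grammar check: parentheses must stay balanced (hypothetical rule set)."""
    depth = prefix.count("(") - prefix.count(")")
    if token == "(":
        return depth < max_depth
    if token == ")":
        return depth > 0
    if token == "<eos>":
        return depth == 0 and len(prefix) > 0
    return True


def toy_model(prefix):
    """Stand-in for a language model; deliberately favors the risky ')' token."""
    return {"(": math.log(0.3), ")": math.log(0.4),
            "x": math.log(0.2), "<eos>": math.log(0.1)}


def constrained_beam_search(log_probs, beam_width=2, max_len=6):
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, lp in log_probs(prefix).items():
                if not allowed(prefix, token):
                    continue  # prune grammar-violating continuations early
                item = (prefix + (token,), score + lp)
                (finished if token == "<eos>" else candidates).append(item)
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        if not beams:
            break
    return sorted(finished, key=lambda b: b[1], reverse=True)


results = constrained_beam_search(toy_model)
```

Because invalid tokens are filtered before they enter the beam, every completed sequence is well-formed under the toy grammar, even though the scoring function prefers an unbalanced `)`.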
Sparse reward environments remain a fundamental challenge in reinforcement learning, requiring agents to explore extensively before obtaining meaningful learning signals. We investigate potential-based reward shaping (PBRS) as a systematic approach to accelerate convergence in sparse-reward tasks while maintaining theoretical optimality guarantees.
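For reference, potential-based shaping adds the term γΦ(s') − Φ(s) to the environment reward, which is the form known to preserve optimal policies. The sketch below applies it on a 1-D chain; the potential function, the goal position, and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
def potential(state, goal=10):
    """Hypothetical potential: negative distance to the goal on a 1-D chain."""
    return -abs(goal - state)


def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Return r + gamma * phi(s') - phi(s).

    The potential of a terminal state is taken as 0, a common convention
    for episodic tasks.
    """
    next_potential = 0.0 if done else phi_next
    return reward + gamma * next_potential - phi_s


# One step toward the goal earns a positive shaping bonus even though
# the sparse environment reward is 0.
bonus = shaped_reward(0.0, potential(3), potential(4), gamma=1.0)

# With gamma = 1 the shaping terms telescope along a trajectory to
# phi(last) - phi(first), so they cannot change which policy is optimal.
states = [3, 4, 5, 6]
total = sum(shaped_reward(0.0, potential(s), potential(s2), gamma=1.0)
            for s, s2 in zip(states, states[1:]))
```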
We study whether closed-source language models degrade after release, and whether subjective user-facing signals match objective benchmark evidence. We use official LiveBench public snapshots to measure objective change, arena-catalog monthly leaderboard history as the main subjective signal, and LMArena pairwise preference as a robustness check.
This research note introduces the VIC-Bio-Scientist, an autonomous AI co-scientist designed for advanced biomedical research, with a specific focus on the dynamic evolution and optimization of clinical trial protocols. Built upon the robust VIC-Architect Eight Pillar Framework (v4.
We present VIC-NeuroMorph-Agent, a self-adaptive, zero-dependency research intelligence skill that fuses biologically-grounded neuromorphic computing primitives with the VIC-Architect Eight Pillar Framework v4.2 and the NeuroMorphIntel VICOrchestrator engine.