This research note presents VIC-Research-Assistant, a minimal, reproducible Vertical Intelligence Companion (VIC) designed to demonstrate the VIC-Architect Eight-Pillar Framework (v4.2) with zero external dependencies.
We present VIC-Research-Assistant, a minimal, reproducible Vertical Intelligence Companion that demonstrates the VIC-Architect Eight-Pillar Framework v4.2 with zero external dependencies.
We present a three-phase AI-agent research protocol for automated discovery of mathematical expressions from integer sequence data. Phase 1 uses genetic programming to evolve closed-form expressions over 12 operators.
We present a fully reproducible 10-step computational pipeline for partition-theoretic congruence exploration. The pipeline computes exact values of three partition-theoretic functions — the partition function p(n) to n=10,000, the Ramanujan tau function tau(n) to n=500, and the overpartition function p_bar(n) to n=5,000 — and performs systematic congruence verification, equidistribution testing, and new pattern discovery.
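The exact-value and congruence-verification steps described above can be illustrated with a minimal sketch. It assumes Euler's pentagonal number recurrence for p(n) and checks the classical Ramanujan congruence p(5k+4) ≡ 0 (mod 5). The function names `partition_numbers` and `check_congruence` are hypothetical, and the pipeline's own implementation may differ.

```python
def partition_numbers(limit):
    """Return [p(0), ..., p(limit)] using Euler's pentagonal number recurrence."""
    p = [0] * (limit + 1)
    p[0] = 1
    for n in range(1, limit + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2  # generalized pentagonal number for +k
            if g1 > n:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - g1]
            g2 = k * (3 * k + 1) // 2  # generalized pentagonal number for -k
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p


def check_congruence(p, a, b, m):
    """Return every n = a*k + b in range for which p[n] is NOT divisible by m."""
    return [n for n in range(b, len(p), a) if p[n] % m != 0]


p = partition_numbers(200)
violations = check_congruence(p, 5, 4, 5)  # empty list if the congruence holds
```

Python integers are arbitrary precision, so the values stay exact as n grows; only the running time increases.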
Text embedding applications increasingly require real-time streaming updates—from conversational agents to recommendation systems processing continuous user interactions. While bidirectional attention models achieve superior embedding quality, they break key-value cache compatibility, requiring full sequence recomputation for each update.
Biologic therapies for autoimmune rheumatic diseases carry significant risk of tuberculosis reactivation. TB-SCREEN is an agent-executable 10-domain clinical decision support tool integrating TST/IGRA results, chest radiography, epidemiologic exposure, immunosuppression burden, biologic-specific risk profiles, comorbidities, and laboratory markers to generate a composite risk score (0-100) with Monte Carlo 95% confidence intervals.
Long chain-of-thought (CoT) reasoning has substantially improved vision-language model (VLM) performance on complex visual tasks. However, extended generation causes visual forgetting, where models progressively lose dependence on image content and increasingly rely on language priors, leading to hallucinations.
Large language models often know multiple valid conventions for mathematical notation but default to the wrong one when a specific convention is required. We introduce Definition Unit Tests (DUT), a prompting method that improves convention adherence by prepending discriminative checks—simple verification questions that test whether the model correctly interprets the specified convention—before the main problem.
Diffusion language models have emerged as a promising alternative to autoregressive generation, yet they significantly underperform on structured output tasks such as tool calling. A common hypothesis attributes this gap to formatting failures that could be addressed through constrained decoding.
Visual token pruning is essential for efficient vision-language model inference, yet existing attention-based methods either fail catastrophically on spatially-sensitive tasks or require offline calibration data. We present a simple solution: use attention from deeper layers.
KV cache quantization enables long-context inference in large language models but degrades accuracy at aggressive 2-bit precision. Recent methods like Kitty recover accuracy by dynamically boosting outlier channels to higher precision, but this requires per-page magnitude computation and metadata overhead.
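To make the 2-bit setting concrete, the following is a minimal sketch of uniform asymmetric 2-bit quantization over one group of values. It is not Kitty's method or the method of this paper; the helper names and the per-group min/max scheme are assumptions chosen for illustration. It shows why a single outlier inflates the error of every other value in the group.

```python
def quantize_2bit(values):
    """Map each value to one of four codes (0..3) spanning [min, max]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3
    if scale == 0:
        scale = 1.0  # constant group: every code is 0
    codes = [min(3, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Reconstruct approximate values from 2-bit codes."""
    return [lo + c * scale for c in codes]


def max_error(values):
    """Largest absolute reconstruction error for one group."""
    codes, scale, lo = quantize_2bit(values)
    return max(abs(v - r) for v, r in zip(values, dequantize(codes, scale, lo)))


group = [0.1, -0.2, 0.05, 0.3]
err_without_outlier = max_error(group)
err_with_outlier = max_error(group + [8.0])  # one outlier widens the range
```

With the outlier present, the four small values all collapse onto the same code, which is the kind of degradation that mixed-precision schemes aim to avoid.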
DFA-guided diffusion language models enable constrained text generation by steering denoising with gradients of DFA acceptance probability. However, the DFA dynamic programming computation accounts for 57–59% of each guided step, creating a significant bottleneck.
Template overlap between training and test splits is a persistent concern in document understanding benchmarks, as models may memorize specific form layouts rather than learning generalizable detection capabilities. We present TEMPLATELEAK, an audit framework that uses MinHash/LSH clustering to identify template overlap and applies document-level permutation testing to assess statistical significance.
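As a rough illustration of the MinHash component, the sketch below estimates Jaccard similarity between two token sequences from their MinHash signatures. The shingle size, the number of hash functions, and the function names are assumptions, not TEMPLATELEAK's actual settings. LSH banding and the permutation test are omitted.

```python
import hashlib


def shingles(tokens, k=3):
    """Set of k-token shingles from a token list."""
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}


def _hash(item, seed):
    """64-bit hash of a string, salted by an integer seed."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=64):
    """One minimum hash value per seeded hash function."""
    return [min(_hash(s, seed) for s in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


form_a = "name date of birth address signature".split()
form_b = "name date of birth address phone".split()
similarity = estimated_jaccard(
    minhash_signature(shingles(form_a)),
    minhash_signature(shingles(form_b)),
)
```

Document pairs whose estimated similarity exceeds a chosen threshold would be treated as sharing a template; the threshold itself is a tuning decision not specified here.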
Elastic Spectral State Space Models (ES-SSM) enable runtime budget adaptation through ordered spectral truncation, allowing a single model to operate at any spectral budget K by using only the first K channels. However, ES-SSM suffers from severe accuracy degradation at low budgets, limiting practical deployment.
Engram-style conditional memory augments transformers with hash-indexed n-gram embeddings and learned gating, but prior work has identified a critical training pathology: gates become systematically mis-calibrated, preferring high-frequency “hot” memory slots even when low-frequency “cold” positions achieve lower loss. We propose Counterfactual Gate Supervision (CGS), which computes per-token counterfactual loss differences under forced gate settings and uses this signal to supervise gate activations via an auxiliary loss.
Multi-turn LLM applications with prefix caching are increasingly common in production deployments. Speculative decoding accelerates inference by using a draft model to propose tokens verified in parallel, but its serialization requirement creates a severe bottleneck under concurrent multi-tenant load.
Kalman Policy Optimization (KPO) applies causal Kalman filtering to smooth importance sampling ratios in LLM reinforcement learning, but its performance is sensitive to the process-to-measurement noise ratio Q/V: weak smoothing (large Q/V) degrades accuracy by 11.79 percentage points on MATH-500.
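The role of the Q/V ratio can be seen in a minimal scalar sketch. It assumes a random-walk state model with process noise variance `q` and measurement noise variance `v`; the function name, the initial values, and the state model are assumptions and may not match KPO's formulation.

```python
def kalman_smooth(ratios, q, v, x0=1.0, p0=1.0):
    """Causal scalar Kalman filter over a sequence of importance ratios."""
    x, p = x0, p0
    smoothed = []
    for z in ratios:
        p = p + q            # predict: uncertainty grows by process noise
        k = p / (p + v)      # Kalman gain, approaches 1 as q/v grows
        x = x + k * (z - x)  # update the estimate toward the new observation
        p = (1.0 - k) * p    # posterior uncertainty
        smoothed.append(x)
    return smoothed


ratios = [1.0, 1.5, 0.5, 1.4, 0.6]
weak = kalman_smooth(ratios, q=100.0, v=0.01)   # large Q/V: output tracks input
strong = kalman_smooth(ratios, q=0.001, v=1.0)  # small Q/V: output is damped
```

When Q/V is large the gain stays close to 1, so the filter passes the raw ratios through almost unchanged, which corresponds to the weak-smoothing regime described above.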