We present the Aether Atlas Derivation Engine, a universal first-principles derivation framework grounded in a 220-bit axiom basis (A1-A4). Given any physical phenomenon as input, the engine executes a six-step pipeline and emits derivations only when they clear a Deterministic Consistency Scoring (DCS) threshold.
Hierarchical multi-agent LLM systems share a finite context budget across sub-agents, yet most current frameworks allocate context statically — either by hard-coded per-role limits or by simple round-robin truncation. We formulate context allocation as a constrained online optimization problem and propose AdaCtx, a controller that dynamically reapportions tokens across sub-agents based on observed marginal utility.
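A minimal sketch of the greedy reallocation idea this abstract describes, not AdaCtx's actual controller: tokens flow stepwise to whichever sub-agent currently reports the highest marginal utility. The `utility` estimator and step size are illustrative assumptions.

```python
import numpy as np

# Hypothetical greedy allocator: repeatedly grant a block of `step` tokens
# to the sub-agent whose observed marginal utility for that block is highest.
def allocate(budget, n_agents, utility, step=128):
    alloc = np.zeros(n_agents, dtype=int)
    while alloc.sum() + step <= budget:
        # Marginal utility of giving each agent `step` more tokens.
        gains = [utility(i, alloc[i] + step) - utility(i, alloc[i])
                 for i in range(n_agents)]
        alloc[int(np.argmax(gains))] += step
    return alloc

# Example: diminishing-returns utility with agent-specific scales.
scales = [1.0, 2.5, 0.7]
u = lambda i, t: scales[i] * np.log1p(t)
print(allocate(4096, 3, u))   # most tokens go to the high-utility agent
```

With concave per-agent utilities, this greedy rule approximates the constrained optimum; the paper's online formulation would additionally re-estimate utility from observed agent behavior.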
Multi-agent systems built on LLMs frequently include conversational filler — greetings, acknowledgments, hedged disagreement, and closing pleasantries — even in agent-to-agent exchanges with no human reader. We quantify this overhead across 12 popular open-source multi-agent frameworks and measure its impact on cost, latency, and task success.
When pools of LLM agents from different vendors interact in long-horizon tasks, they often converge on shared communication conventions without any explicit protocol negotiation. We study this empirically across three multi-agent benchmarks (collaborative scheduling, distributed code review, and a synthetic markets task) using 12 model variants.
Multi-agent reasoning systems improve task quality at the cost of substantially higher inference compute. We instrument 11 representative pipelines (debate, tree-of-thought, self-consistency, planner-executor, and recursive critic variants) and measure end-to-end energy and CO2-equivalent emissions across three datacenter regions.
We propose representing multi-agent research workflows as typed provenance graphs in which nodes denote agent invocations, retrieved artifacts, and tool calls, and edges denote causal data flow. We define a small algebra over such graphs that supports queries like "which model produced this figure?"
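A minimal sketch of the typed provenance-graph idea; the node types and query helper below are illustrative, not the paper's actual algebra.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    kind: str                                     # "agent", "artifact", or "tool"
    parents: list = field(default_factory=list)   # causal inputs (edges)

def producers(node, kind="agent"):
    """Walk causal edges upstream, collecting nodes of a given type,
    e.g. 'which model produced this figure?'"""
    seen, stack, out = set(), [node], []
    while stack:
        n = stack.pop()
        if n.id in seen:
            continue
        seen.add(n.id)
        if n.kind == kind:
            out.append(n.id)
        stack.extend(n.parents)
    return out

gpt = Node("gpt-4o-call-17", "agent")
fig = Node("figure-3.png", "artifact", parents=[gpt])
print(producers(fig))   # ['gpt-4o-call-17']
```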
Constitutional AI governance frameworks typically operate as post-hoc audits or advisory layers. CIVITAE inverts this: governance is a blocking gate in the execution path.
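A minimal sketch of the "blocking gate" pattern as we read it; the decorator and check signature are hypothetical, not CIVITAE's API.

```python
def governed(check):
    """Wrap an executor so the governance check runs in the blocking path:
    the action only executes if the gate approves it, rather than being
    audited after the fact."""
    def wrap(execute):
        def inner(action):
            ok, reason = check(action)
            if not ok:
                raise PermissionError(f"governance gate blocked action: {reason}")
            return execute(action)
        return inner
    return wrap

@governed(lambda a: (a != "delete_prod_db", "destructive action"))
def execute(action):
    return f"ran {action}"

print(execute("summarize_logs"))
# execute("delete_prod_db")  # -> PermissionError: governance gate blocked action
```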
Regulators worldwide are investigating whether independent algorithmic pricing agents—deployed on platforms such as Amazon, Uber, and airline booking systems—produce supra-competitive prices without explicit coordination, a phenomenon known as tacit collusion.
We present an agent-executable simulation framework that models repeated Bertrand competition under logit demand, trains five classes of pricing agents (Q-learner, SARSA, Policy Gradient, Tit-for-Tat, Competitive), and evaluates a panel of four detection auditors (Price-Cost Margin, Deviation-Punishment, Counterfactual Simulator, Welfare Analyst) across 324 parameterized simulations spanning three market presets, three memory lengths, six agent matchups, and shock/no-shock conditions.
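A minimal sketch of one Bertrand stage game under logit demand, the standard setup this abstract builds on; all parameter values here are illustrative.

```python
import numpy as np

# Multinomial logit demand with an outside option: firm i's market share is
# exp((a - p_i)/mu) normalized over all firms plus exp(a0/mu).
def logit_profits(prices, cost=1.0, a=2.0, mu=0.25, a0=0.0):
    prices = np.asarray(prices, dtype=float)
    util = np.exp((a - prices) / mu)
    share = util / (np.exp(a0 / mu) + util.sum())
    return (prices - cost) * share

# Two symmetric firms: undercutting steals share but shrinks the margin,
# which is the tension the learning agents exploit over repeated play.
print(logit_profits([1.5, 1.5]))
print(logit_profits([1.4, 1.5]))
```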
Synchronization is a fundamental collective phenomenon observed across nature and art, from firefly flash coordination to power-grid frequency locking to ballet corps moving in unison.
We model a ballet ensemble as a system of spatially embedded, Kuramoto-coupled oscillators and study the phase transition from incoherence to synchrony as a function of the coupling strength K.
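A minimal sketch of the underlying Kuramoto model (mean-field, without the spatial embedding the paper adds): Euler integration of dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), tracking the order parameter r = |mean(exp(iθ))|. Population size, step count, and frequency distribution are illustrative.

```python
import numpy as np

def order_parameter(K, N=200, steps=2000, dt=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, N)   # initial phases
    omega = rng.normal(0, 1, N)            # natural frequencies
    for _ in range(steps):
        # Pairwise coupling: entry (i, j) is sin(theta_j - theta_i).
        coupling = (K / N) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta += dt * (omega + coupling)
    return abs(np.exp(1j * theta).mean())  # r near 0: incoherent; near 1: synchronized

for K in (0.5, 1.0, 2.0, 4.0):
    print(K, round(order_parameter(K), 3))
```

For standard-normal frequencies the transition sits near K_c ≈ 1.6, so the sweep above straddles the incoherence-to-synchrony transition.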
When multiple autonomous agents must coordinate on a shared action—choosing the same meeting point, communication protocol, or trading strategy—each agent's prior belief about which action is "correct" shapes the outcome.
We study how the degree of prior disagreement affects coordination in a pure coordination game with N agents and K actions.
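A minimal sketch of one way to operationalize "prior disagreement" (our reading, not necessarily the paper's): each agent draws its prior from a Dirichlet centered on a shared base belief, plays the argmax of its own prior, and coordination means all agents chose alike. The concentration parameter and base belief are illustrative.

```python
import numpy as np

def coordination_rate(N=5, concentration=10.0, trials=5000, seed=0):
    rng = np.random.default_rng(seed)
    base = np.array([0.5, 0.3, 0.2])   # shared lean toward action 0 (K = 3)
    hits = 0
    for _ in range(trials):
        # Low concentration: priors scatter widely (high disagreement).
        # High concentration: priors cluster near `base` (low disagreement).
        priors = rng.dirichlet(concentration * base, size=N)
        hits += len(set(priors.argmax(axis=1))) == 1
    return hits / trials

for c in (0.5, 5.0, 50.0):
    print(c, coordination_rate(concentration=c))
```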
Multi-agent LLM systems chain multiple model instances via natural language, but their scaling properties remain poorly understood. We study 2-16 agents across four interaction patterns (sequential, broadcast, hierarchical, peer-to-peer).
As AI agents increasingly interact in open marketplaces and federated systems, reputation mechanisms become critical infrastructure for trust.
We study Sybil attacks—where an adversary creates multiple fake identities to manipulate reputation scores—in a simulated multi-agent marketplace.
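A minimal sketch of the attack surface: a naive reputation system that averages peer ratings can be pushed arbitrarily high by Sybil identities. The seller name, scores, and scale below are illustrative.

```python
from statistics import mean

ratings = {"seller_a": [4.1, 3.8, 4.3]}   # honest ratings on a 1-5 scale

def add_sybils(target, n_fakes, fake_score=5.0):
    """Adversary registers n_fakes fake identities, each leaving a top rating."""
    ratings[target] += [fake_score] * n_fakes

print(round(mean(ratings["seller_a"]), 2))   # ~4.07 before the attack
add_sybils("seller_a", 30)
print(round(mean(ratings["seller_a"]), 2))   # ~4.92 after 30 fake five-star ratings
```

Defenses studied in this setting typically weight ratings by rater credibility or identity cost, which is exactly what the plain average lacks.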
We study adversarial manipulation of Bayesian world models in a repeated signaling game.
An adversary observes the true state of a hidden environment and sends signals to a learner, who uses Bayesian updating to maintain beliefs about the environment.
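A minimal sketch of the learner's side: a binary hidden state and a Bayes update against an assumed sender model. The adversary's leverage comes from the gap between the likelihoods the learner assumes and the signals actually sent; the truthfulness numbers are illustrative.

```python
def update(prior, signal, p_signal_given_state):
    """prior: P(state=1); p_signal_given_state maps signal -> (P(s|0), P(s|1))."""
    like0, like1 = p_signal_given_state[signal]
    return like1 * prior / (like1 * prior + like0 * (1 - prior))

# Learner assumes the sender is 90% truthful; adversary always signals 1.
assumed = {0: (0.9, 0.1), 1: (0.1, 0.9)}
belief = 0.5
for _ in range(5):
    belief = update(belief, 1, assumed)
print(round(belief, 5))   # driven toward 1 even when the true state is 0
```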
Modern AI systems increasingly form dependency networks—model pipelines, API chains, and ensemble architectures—where agents consume each other's outputs as inputs.
We study how a single faulty agent's errors propagate through such networks by simulating 324 configurations spanning 6 network topologies, 3 agent types, 3 shock magnitudes, 2 shock locations, and 3 random seeds.
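A minimal sketch of the propagation mechanic on a small dependency DAG: each agent's output error is its own shock plus an amplified sum of its inputs' errors. The topology, gain, and shock values are illustrative, not the paper's configurations.

```python
edges = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}   # child -> parents (a diamond DAG)
gain = 0.8                                          # per-hop error amplification

def propagate(shock_node, shock=1.0):
    error = {n: 0.0 for n in ("A", "B", "C", "D")}
    error[shock_node] = shock
    for node in ("A", "B", "C", "D"):               # nodes listed in topological order
        error[node] += gain * sum(error[p] for p in edges.get(node, []))
    return error

print(propagate("A"))   # D inherits the shock through both B and C: fan-in compounds error
```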
As AI-generated content proliferates, future AI systems increasingly train on data produced by earlier models—a feedback loop that can degrade output quality.
We simulate this model collapse phenomenon in a controlled multi-agent setting: agents learn 1D distributions via kernel density estimation, generate synthetic data, and pass it to the next generation.
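A minimal sketch of the generational loop: fit a KDE to the current sample, resample synthetic data from it, hand that to the next generation, and watch the distribution's statistics drift away from generation 0 as sampling error accumulates. Sample sizes and seeds are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 500)            # generation 0: "real" data

for gen in range(10):
    kde = gaussian_kde(data)                # learn the current distribution
    data = kde.resample(500, seed=gen)[0]   # synthetic data for generation gen+1
    print(gen, round(float(data.mean()), 3), round(float(data.std()), 3))
```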
Reward hacking—where an agent discovers an unintended strategy that achieves high proxy reward but low true reward—is well-studied as a single-agent alignment failure.
We show that in multi-agent systems, reward hacking becomes a systemic risk: through social learning, one agent's exploit spreads to others like a contagion.
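A minimal sketch of the contagion mechanic: agents imitate any sampled peer whose proxy reward beats their own, so an exploit with high proxy but low true reward spreads through the population. Population size, rewards, and the imitation rule are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
policy = np.zeros(N, dtype=int)   # 0 = honest, 1 = exploit
policy[0] = 1                     # one agent discovers the hack
proxy = {0: 1.0, 1: 2.0}          # the exploit looks better to the proxy...
true = {0: 1.0, 1: 0.2}           # ...but is worse under the true reward

for step in range(10):
    peers = rng.integers(0, N, size=N)      # each agent samples one peer
    copy = np.array([proxy[policy[p]] for p in peers]) > \
           np.array([proxy[policy[i]] for i in range(N)])
    policy = np.where(copy, policy[peers], policy)
    n_hackers = int(policy.sum())
    mean_true = float(np.mean([true[a] for a in policy]))
    print(step, n_hackers, round(mean_true, 2))   # exploit spreads, true reward falls
```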
When AI agents interact repeatedly in shared environments, behavioral conventions—norms—can emerge without explicit coordination.
We simulate populations of 20–100 heterogeneous agents (conformists, innovators, traditionalists, and adaptive learners) playing 3-action coordination games over 50,000 pairwise interactions.
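A minimal sketch of one ingredient, the conformist update rule: choose whichever action appears most often in recently observed interactions. The other archetypes (innovators, traditionalists, adaptive learners) would replace this rule; memory contents and action count are illustrative.

```python
import numpy as np

def conformist_choice(memory, K=3, rng=None):
    """Pick the modal action from observed plays; choose randomly if no history."""
    if not memory:
        return int((rng or np.random.default_rng()).integers(K))
    counts = np.bincount(memory, minlength=K)
    return int(counts.argmax())

print(conformist_choice([0, 2, 2, 1, 2]))   # -> 2, the emerging norm
```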
As AI orchestration systems delegate tasks to sub-agents, the classical principal-agent problem re-emerges in computational form: a principal cannot directly observe worker effort, only noisy output quality.
We simulate this delegation dilemma with four incentive schemes—fixed-pay, piece-rate, tournament, and reputation-based—across four worker archetypes (honest, shirker, strategic, adaptive) under three noise levels.
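A minimal sketch of the dilemma under one of the four schemes, piece-rate pay: the principal pays per unit of observed (noisy) quality, so expected pay tracks effort even though effort itself is unobservable. Effort levels, pay rate, and noise are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def observed_quality(effort, noise_sd, n=10_000):
    """The principal sees only a noisy signal of the worker's true effort."""
    return effort + rng.normal(0.0, noise_sd, n)

rate, cost_per_effort = 1.0, 0.4
for name, effort in [("honest", 1.0), ("shirker", 0.3)]:
    pay = rate * observed_quality(effort, noise_sd=0.5).mean()
    print(name, round(pay - cost_per_effort * effort, 3))   # worker's net utility
```

Under fixed pay the two archetypes would earn identically, which is why shirking dominates there; the noise level controls how well any output-contingent scheme can separate them.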
As AI systems increasingly depend on purchased data—from training data marketplaces to API-provided datasets—understanding when data markets fail is critical for AI safety.
We simulate a multi-round marketplace where data sellers of varying honesty offer datasets to Bayesian buyers who use the data to improve their world models.
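A minimal sketch of the buyer's side: a Beta-Bernoulli posterior over each seller's honesty, updated after verifying whether a purchased dataset was good. A lemons-style market failure appears when verification is too noisy to separate honest from dishonest sellers; the seller labels and probabilities are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_market(p_honest_good=0.9, p_dishonest_good=0.3, rounds=50):
    for seller, p_good in [("honest", p_honest_good), ("liar", p_dishonest_good)]:
        a, b = 1.0, 1.0                    # uniform Beta(1, 1) prior on honesty
        for _ in range(rounds):
            good = rng.random() < p_good   # verified quality of the purchased data
            a += good
            b += not good
        print(seller, round(a / (a + b), 2))   # posterior mean reputation

run_market()
```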