Emergent Coordination Protocols Among Heterogeneous Large-Language-Model Agents
1. Introduction
It has become routine to compose pipelines from agents drawn from multiple vendors --- e.g., a planner from one provider, a code-writer from another, an evaluator from a third. The conventional wisdom is that such pipelines require an explicit communication protocol designed up-front. We study an alternative: what happens if no protocol is specified and agents are only told the high-level task?
The phenomenon is reminiscent of emergent communication studied in multi-agent reinforcement learning [Lazaridou and Baroni 2020], but with an LLM-specific twist: the agents bring strong linguistic priors and substantial in-context reasoning, so convergence is typically faster and the converged protocol is human-legible.
We ask:
- How fast does convergence occur, and to what?
- Do heterogeneous pools converge differently from homogeneous ones?
- Are the converged protocols efficient relative to a hand-designed baseline?
2. Setup
2.1 Tasks
- Scheduling: 5 agents jointly schedule 24 meetings under shared constraints. Communication is unrestricted text.
- Distributed code review: 4 agents review a 600-line PR sequentially with arbitrary inter-message format.
- Synthetic market: 8 agents trade in a posted-offer market, communicating intent via free-form messages.
2.2 Agents
We sample from 12 model variants spanning four vendors. A pool is a tuple specifying which models occupy which roles. We test 24 pool configurations, 8 of which are homogeneous (all-from-one-vendor) and 16 heterogeneous.
2.3 Metrics
- Convergence time: turns until the message format stabilizes (normalized Levenshtein distance between consecutive same-role messages stays below 0.15 for 3 consecutive turns).
- Vocabulary entropy: Shannon entropy over message-prefix tokens, computed in a 50-message sliding window.
- Task success: domain-specific (e.g., schedule satisfies all constraints).
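The two format-level metrics above can be sketched as follows. Helper names (`normalized_levenshtein`, `convergence_turn`, `prefix_entropy`) are illustrative, not the paper's released code; the 0.15 threshold, 3-turn streak, and 50-message window come from the definitions in this section.

```python
import math
from collections import Counter

def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer string's length (0 = identical)."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b))

def convergence_turn(messages, threshold=0.15, streak=3):
    """First turn at which `streak` consecutive message pairs have
    fallen below `threshold`; None if the format never stabilizes."""
    run = 0
    for t in range(1, len(messages)):
        run = run + 1 if normalized_levenshtein(messages[t - 1], messages[t]) < threshold else 0
        if run >= streak:
            return t
    return None

def prefix_entropy(messages, window=50):
    """Shannon entropy (bits) of first-token frequencies in a sliding window."""
    counts = Counter(m.split()[0] for m in messages[-window:] if m.strip())
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

In practice these would be applied per role, over the full message trace of a run; the sketch operates on a flat list of strings for brevity.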
3. Results
3.1 Convergence dynamics
Median convergence time across all configurations is 7.4 turns; the 90th percentile is 13. Convergence is faster in homogeneous pools than in heterogeneous ones, unsurprisingly.
Vocabulary entropy in the converged regime stabilizes at a similar level across pools, slightly above that of a hand-designed JSON protocol for the same tasks, suggesting emergent protocols are somewhat more verbose but in the same complexity band.
3.2 Convergence regimes
We observe three regimes:
- Vendor-dominant (38% of heterogeneous pools): the protocol of one vendor's model becomes canonical.
- Hybrid (44%): a fused protocol with structural elements from multiple agents.
- Stigmergic (18%): no per-agent protocol; agents communicate via shared scratchpad updates rather than messages.
The stigmergic regime is surprising: it appears most often when one agent in the pool prefers terse outputs and others prefer verbose ones, and tends to be most efficient by token count.
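A minimal illustration of the stigmergic regime: agents exchange no direct messages and coordinate only through a shared workspace. All names here (`Scratchpad`, `agent_step`) are hypothetical sketches, not the paper's implementation.

```python
class Scratchpad:
    """Shared workspace: last-writer-wins per key, no peer addressing."""
    def __init__(self):
        self.entries = {}

    def write(self, key, value):
        self.entries[key] = value

    def read(self):
        return dict(self.entries)

def agent_step(agent_id, pad, propose):
    """One agent turn: read the whole pad, then deposit a contribution
    derived from it. Coordination arises from the shared state alone."""
    state = pad.read()
    pad.write(agent_id, propose(state))
```

Because no agent addresses another directly, the per-round cost is one write per agent, which is consistent with the token-efficiency observation above.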
3.3 Task success
| Regime | Scheduling success | Code-review F1 | Market efficiency |
|---|---|---|---|
| Hand-designed protocol | 0.94 | 0.83 | 0.79 |
| Vendor-dominant | 0.91 | 0.79 | 0.74 |
| Hybrid | 0.92 | 0.81 | 0.77 |
| Stigmergic | 0.93 | 0.78 | 0.81 |
No emergent regime is statistically inferior to the hand-designed protocol on more than one metric.
4. Noise-Injection Intervention
In vendor-dominant convergence we worry about protocol monoculture and the loss of redundancy. We test a simple intervention: each turn, with probability ρ, an agent's outgoing message is rewritten by a small auxiliary model.

```python
import random

def inject_noise(msg, rho=0.05):
    """With probability rho, paraphrase the outgoing message via a small
    auxiliary model (paraphrase_via_aux is defined elsewhere)."""
    if random.random() < rho:
        return paraphrase_via_aux(msg)
    return msg
```

At ρ = 0.05, the share of vendor-dominant convergences drops from 38% to 17% with no significant loss of task success. Higher noise levels degrade task success roughly linearly.
5. Discussion
The key takeaway is that under-specified multi-agent systems can self-organize to operating points comparable to hand-designed protocols, at least at the scale tested. This is not an argument against explicit protocols --- which remain superior for safety-critical use --- but it changes the calculus for prototyping and for systems where protocol designers are unavailable.
The stigmergic regime deserves special attention. It transfers state via a shared workspace rather than messages, and it scales better in agent count: at 12 agents, message-based protocols incur quadratic chatter while stigmergic costs grow linearly.
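The scaling claim can be checked with a toy cost model, assuming all-to-all messaging each round; this is illustrative arithmetic, not measured data from the experiments.

```python
def messaging_cost(n: int) -> int:
    """Directed messages per round when every agent addresses every peer."""
    return n * (n - 1)

def stigmergic_cost(n: int) -> int:
    """Scratchpad writes per round: one per agent."""
    return n

for n in (4, 8, 12):
    print(f"n={n}: messages={messaging_cost(n)}, writes={stigmergic_cost(n)}")
```

At 12 agents the toy model gives 132 directed messages per round versus 12 scratchpad writes, matching the quadratic-versus-linear contrast noted above.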
6. Limitations
Our tasks are synthetic; whether the same convergence dynamics hold in production workflows is unstudied. We did not control for prompt engineering effects beyond a fixed system message. Pool size was capped at 8 agents; behavior at 30+ agents may differ qualitatively.
7. Conclusion
Heterogeneous LLM pools coordinate faster and more competently than the literature on emergent communication might suggest. We catalog three convergence regimes and provide a simple lever (noise injection) to reduce monoculture risk. Code, prompts, and traces are released.
References
- Lazaridou, A. and Baroni, M. (2020). Emergent Multi-Agent Communication.
- Park, J. S. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior.
- Du, Y. et al. (2024). Improving Factuality and Reasoning via Multiagent Debate.
- Gibney, E. (2025). When AI Agents Talk to Each Other.