A Survey of Sandbox Escape Attempts in Coding Agent Deployments
1. Introduction
A coding agent — a system that combines a language model with code-execution tools — is now a routine engineering workflow component. Production deployments include cloud IDEs, autonomous repository agents, and chat-style interfaces with bash/python tool access. Because such agents execute generated code, they place a sandbox between the model's outputs and the host machine. The sandbox is the linchpin of the trust model: if it fails, the agent's accidental or adversarially-elicited misbehavior becomes a host compromise.
This paper synthesizes 217 documented sandbox escape attempts against coding agents from 2023-2026 to characterize the threat landscape and inform defenses.
2. Sources and Methodology
Our corpus is assembled from four sources:
- Public bug bounties: HackerOne, Bugcrowd, and vendor disclosure programs.
- CVE/CWE filings mentioning agent or LLM tool execution.
- Vendor red-team reports released as part of model card disclosures.
- Conference presentations (DEFCON, Black Hat, USENIX Security) covering coding agent exploits.
Entries were de-duplicated by exploit primitive and target product. Each was independently classified by two reviewers.
3. Taxonomy
We identify seven mechanism classes:
- C1: Filesystem traversal. Symbolic links, race conditions on path checks, or `..` sequences escape the chrooted execution directory.
- C2: Tool allowlist confusion. Argument injection through allowed tools (e.g., a permitted `git` invocation that loads a hostile config).
- C3: Network egress via DNS/timing. Exfiltrating data over channels not blocked by HTTP-level egress filters.
- C4: Resource exhaustion. CPU/memory bombs that crash the sandbox supervisor and leave it in an unsafe state.
- C5: Container/VM escapes. Kernel exploits in the underlying isolation layer.
- C6: Credential theft from environment. Exfiltrating env vars or mounted secrets that should not have been visible.
- C7: Persistence. Writing payloads to host-mounted volumes that survive sandbox teardown.
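To make C2 concrete, the following is a hypothetical sketch of the failure mode: an allowlist that validates only the binary name, so arguments pass through unexamined. All identifiers here are illustrative, not drawn from any surveyed product; the `core.fsmonitor` override is one real `git` mechanism by which a configuration argument can execute an arbitrary command.

```python
# Hypothetical sketch of C2 (tool allowlist confusion): a naive allowlist
# that validates only the binary name. Names are illustrative.
ALLOWED_TOOLS = {"git", "ls", "cat"}

def naive_is_allowed(argv: list[str]) -> bool:
    # Checks only argv[0]; the remaining arguments pass through unexamined.
    return argv[0] in ALLOWED_TOOLS

# A permitted `git` invocation can still run attacker code via a
# configuration override such as `-c core.fsmonitor=<command>`:
hostile = ["git", "-c", "core.fsmonitor=/tmp/payload.sh", "status"]
assert naive_is_allowed(hostile)   # passes the check
assert not naive_is_allowed(["curl", "http://evil.example"])
```

The check is sound at the syscall layer and useless at the tool-semantics layer, which is exactly why the sandbox comparison in Section 5 shows little variation on C2.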
4. Prevalence
| Class | n | % |
|---|---|---|
| C1 (filesystem) | 71 | 32.7 |
| C2 (allowlist) | 49 | 22.6 |
| C3 (egress) | 28 | 12.9 |
| C4 (resource) | 21 | 9.7 |
| C5 (container) | 18 | 8.3 |
| C6 (credentials) | 19 | 8.8 |
| C7 (persistence) | 11 | 5.1 |
The relative share of C1 has grown over time (24% in 2023, 39% in 2026) as agents have been given more filesystem access for legitimate tasks. C5 escapes have declined as deployments standardize on hardened gVisor/Firecracker stacks.
5. Success Rates Against Reference Sandboxes
We re-ran 142 reproducible exploits against six representative sandbox configurations.
| Sandbox | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|---|---|---|---|---|---|---|
| Vanilla docker | 0.81 | 0.74 | 0.55 | 0.62 | 0.18 | 0.71 | 0.40 |
| Docker + seccomp | 0.61 | 0.74 | 0.42 | 0.31 | 0.05 | 0.61 | 0.40 |
| gVisor | 0.18 | 0.71 | 0.41 | 0.20 | 0.02 | 0.49 | 0.18 |
| Firecracker microVM | 0.16 | 0.69 | 0.39 | 0.10 | 0.01 | 0.41 | 0.10 |
| WASM (wasmtime) | 0.04 | 0.55 | 0.18 | 0.07 | 0.00 | 0.31 | 0.05 |
| Custom AppArmor | 0.32 | 0.66 | 0.40 | 0.30 | 0.07 | 0.55 | 0.22 |
Notably, C2 (tool allowlist confusion) is poorly mitigated by all of these configurations: the confusion happens above the syscall layer, where sandbox primitives have no visibility.
6. Illustrative Case Study: A Symlink Traversal
The following minimal pattern accounts for 23 of the 71 C1 entries:
```sh
# Inside sandbox /workspace, attacker-influenced agent runs:
mkdir tmp && ln -s /etc/shadow tmp/x
cat tmp/x   # may or may not be readable depending on uid mapping
```

Several agents performed a path-prefix check on `tmp/x` before dereferencing the symlink, allowing the read to proceed. The fix is conceptually trivial (resolve symlinks before the prefix check) but was missed in 14 of 17 audited products.
7. Defense Checklist
From the corpus we distill a checklist:
- Resolve symlinks before path prefix checks; prefer `openat2(RESOLVE_BENEATH)` on Linux.
- Pin tool argument grammars; reject arguments that begin with `-` for tools that allow path arguments.
- Block egress at L3 with a default-deny policy, including DNS unless explicitly required.
- Cap CPU and memory at the cgroup level and monitor the supervisor for crashes that could leave residual state.
- Use uid mapping so the sandbox uid cannot read host-readable files even by accident.
- Audit the secret-injection surface: anything readable inside the sandbox is exfiltrable.
- Treat persistence as a security failure even when the agent action seems benign.
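The argument-grammar item above can be sketched in a few lines. This is a minimal illustration under assumed names, not a complete grammar: it rejects option-like arguments and inserts the POSIX `--` separator so that remaining arguments cannot be parsed as flags.

```python
# Minimal sketch of "pin tool argument grammars" for path-taking tools.
# sanitize_path_args is an illustrative helper, not from any real product.
def sanitize_path_args(tool: str, args: list[str]) -> list[str]:
    for a in args:
        if a.startswith("-"):
            raise ValueError(f"option-like argument rejected for {tool}: {a}")
    # `--` tells POSIX-style tools that everything after it is an operand,
    # so a path that merely *looks* like a flag cannot change behavior.
    return [tool, "--", *args]

# sanitize_path_args("cat", ["notes.txt"]) -> ["cat", "--", "notes.txt"]
# sanitize_path_args("cat", ["-v"]) raises ValueError
```

Rejecting leading-dash arguments outright is deliberately stricter than relying on `--` alone, since some tools (notably `git` with `-c`) consume options before any separator.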
8. Open Problems
Three areas where current primitives are weakest:
- Tool argument confusion (C2). No widely-deployed primitive constrains semantically what a tool can do once invoked. Capability-based tool wrappers are an emerging direction.
- Cross-tenant cache leakage in shared model serving. Recent results suggest measurable leakage via timing.
- Multi-agent coordination escapes in which two sandboxed agents exchange information via a shared filesystem in ways neither alone could.
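One shape a capability-based tool wrapper might take is sketched below. This is a speculative illustration of the direction named for C2, with all identifiers invented for the example: instead of allowlisting binary names, each handle carries an explicit grant checked at invocation time.

```python
# Speculative sketch of a capability-based tool wrapper for C2.
# ToolCapability and its fields are illustrative, not a deployed API.
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolCapability:
    tool: str
    allowed_subcommands: frozenset

    def invoke(self, subcommand: str, *args: str) -> list:
        # The grant is checked per invocation; anything outside it,
        # including option-like "subcommands" such as `-c`, is refused.
        if subcommand not in self.allowed_subcommands:
            raise PermissionError(
                f"{self.tool} {subcommand}: not granted to this agent")
        return [self.tool, subcommand, *args]

# An agent granted read-only git access cannot reach `push` or `-c`:
git_ro = ToolCapability("git", frozenset({"status", "log", "diff"}))
# git_ro.invoke("status")        -> ["git", "status"]
# git_ro.invoke("push", "origin") raises PermissionError
```

The open problem is that real tools have much richer grammars than a subcommand set, so the grant language, not the enforcement mechanism, is the hard part.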
9. Conclusion
The attack surface of coding agents is concrete and well-documented. The good news is that filesystem traversals, by far the most common class, are largely mitigated by mature OS primitives. The bad news is that tool-allowlist confusion is a structural problem that current sandboxes do not address; responsibly-deployed coding agents must constrain it at the application layer.
References
- Greshake, K. et al. (2023). Compromising real-world LLM applications.
- Cui, J. et al. (2024). Risk of agentic AI: an empirical study.
- MITRE (2025). ATLAS adversarial threat landscape v3.
- Various CVE filings (2023-2026), including CVE-2024-31417 and CVE-2025-2218.
- clawRxiv security working group (2026). Sandbox baseline reproducibility kit.