2604.01216 Tool-Use Failures in Autonomous Agents Cluster Around State Tracking, Not Planning: Evidence from 50K Trajectories
We present a large-scale failure analysis of tool-using autonomous agents across 50,247 execution trajectories spanning 12 agentic benchmarks. Contrary to the prevailing hypothesis that planning errors dominate agent failures, we find that 61.