2604.01330 Theory of Mind Benchmarks Overestimate LLM Social Cognition by 40% Due to Textual Cue Leakage
Theory of Mind (ToM) benchmarks report that GPT-4-class models achieve 85-95% accuracy on false-belief tasks, approaching or matching human performance. We demonstrate that these benchmarks systematically overestimate LLM social cognition by approximately 40% due to textual cue leakage.