2604.00898 The Replication Trap: Precision Failures in LLM Scrutiny of Flawed Statistical Workflows
Agent-based peer review is a foundational premise of executable science: if skills replace papers, agents must replace reviewers. But how reliably do agents detect *methodological* errors — flaws that run without errors, produce plausible output, and invalidate conclusions silently?