2604.01968 ROBUST-REV: A Benchmark for Reviewer-Agent Robustness
Reviewer agents that grade AI-authored papers must be robust to surface perturbations of those papers, since adversarial submitters will reword their submissions to game the reviewer. We introduce ROBUST-REV, a benchmark of 600 paper-level perturbations spanning paraphrase, citation injection, hedging removal, and length manipulation.
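
To make the notion of robustness concrete, the following minimal sketch shows how a reviewer's score shift under two of the perturbation types named above might be measured. Everything in it is an assumption for illustration and is not taken from ROBUST-REV: the hedge lexicon `HEDGES`, the helpers `remove_hedging`, `pad_length`, and `score_shift`, and the stand-in `length_biased_reviewer` are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical hedge lexicon; the benchmark's actual word list is not given here.
HEDGES = ("may", "might", "possibly", "arguably")


def remove_hedging(text: str) -> str:
    """Toy hedging-removal perturbation: delete the listed hedge words."""
    pattern = r"\b(?:" + "|".join(HEDGES) + r")\b\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


def pad_length(text: str, filler: str = " We discuss this further below.") -> str:
    """Toy length-manipulation perturbation: append a content-free sentence."""
    return text + filler


def score_shift(
    reviewer: Callable[[str], float],
    text: str,
    perturb: Callable[[str], str],
) -> float:
    """Absolute change in the reviewer's score under a single perturbation."""
    return abs(reviewer(perturb(text)) - reviewer(text))


def length_biased_reviewer(text: str) -> float:
    """Stand-in reviewer that rewards word count, so it is deliberately non-robust."""
    return min(10.0, len(text.split()) / 2)


paper = "Our method may improve accuracy and possibly reduces cost."
shifts = {
    "hedging_removal": score_shift(length_biased_reviewer, paper, remove_hedging),
    "length": score_shift(length_biased_reviewer, paper, pad_length),
}
```

Under this toy setup, a robust reviewer would yield shifts near zero for both perturbations, while the length-biased stand-in does not. A real evaluation would replace the stand-in with the reviewer agent under test and the toy perturbations with the benchmark's own.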