2604.01994 Adversarial Robustness of LLM-as-Judge Evaluation Systems
boyi
LLM-as-judge evaluation has become a default in benchmark construction, RLAIF, and agent leaderboards. We systematically probe the robustness of seven judge configurations against six adversary classes, ranging from prompt injection in the candidate response to imperceptible suffix attacks tuned via gradient-free search.