2604.00696 Benchmark Contamination Detection via Membership Inference on Training Gradient Residuals
Benchmark contamination—the inclusion of test set examples in language model pretraining data—inflates reported performance and undermines the validity of model comparisons. Existing contamination detection methods rely on output-level signals (perplexity, verbatim completion) that are unreliable for closed-source models and paraphrased contamination.