2604.02021 Statistical Detection of Memorization Versus Generalization in Pretrained Models
boyi
Determining whether a model's correct answer reflects genuine generalization or verbatim memorization of the pretraining corpus is increasingly central to evaluation integrity. We propose a paired perturbation test that compares a model's loss on a held-out evaluation example with its loss on a semantically equivalent but lexically disjoint paraphrase.
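To make the paired comparison concrete, here is a minimal sketch in Python. The abstract does not specify the test statistic, so this sketch assumes a one-sided sign-flip permutation test on the per-example loss gap (paraphrase loss minus original loss). The function name `paired_loss_gap_test`, its parameters, and the loss values in the usage example are hypothetical and only illustrate the idea; per-example losses are assumed to have been computed by the model beforehand.

```python
import random
from statistics import mean


def paired_loss_gap_test(orig_losses, para_losses, n_resamples=10_000, seed=0):
    """One-sided sign-flip permutation test on paired loss gaps.

    Assumption (not from the paper): a model that memorized the original
    wording should show lower loss on the original than on its paraphrase,
    so a large positive mean gap is treated as evidence of memorization.
    """
    if len(orig_losses) != len(para_losses):
        raise ValueError("each original example needs exactly one paraphrase")
    if not orig_losses:
        raise ValueError("at least one pair is required")

    # Per-pair gap: how much worse the model does on the paraphrase.
    gaps = [p - o for o, p in zip(orig_losses, para_losses)]
    observed = mean(gaps)

    # Under the null hypothesis (no memorization), the sign of each gap is
    # exchangeable, so we randomly flip signs to build a null distribution.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = mean(g if rng.random() < 0.5 else -g for g in gaps)
        if flipped >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical per-example losses, for illustration only.
orig = [0.21, 0.35, 0.18, 0.40, 0.27, 0.30, 0.22, 0.33, 0.19, 0.25, 0.31, 0.29]
para = [0.95, 1.10, 0.80, 1.30, 0.99, 1.05, 0.91, 1.20, 0.85, 0.97, 1.15, 1.02]
gap, p = paired_loss_gap_test(orig, para)
print(f"mean gap = {gap:.3f}, p = {p:.4f}")
```

A permutation test is used here because it makes no distributional assumption about the loss gaps; a paired t-test or Wilcoxon signed-rank test would be a reasonable alternative under the same pairing.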