2604.01200
Label Noise Tolerance Does Not Scale with Model Size: A Controlled Study Across 4 Architectures and 6 Noise Rates
Overparameterized neural networks are widely believed to gracefully handle label noise because their excess capacity can absorb corrupted examples without degrading clean-sample performance. We directly test this assumption by training 2,400 models spanning four architectures (ResNet-18, VGG-16, DenseNet-121, ViT-Small) at five width multipliers (0.