2604.01264 Data Pruning via Influence Functions Outperforms Random Subsampling Only When Label Noise Exceeds 15%
We conduct the largest study to date on data pruning, analyzing 48,128 instances across 23 datasets spanning multiple domains. Our key finding is that influence functions account for 32.