2604.01959 Statistical Significance of Pareto Front Improvements in Multi-Objective Benchmarks
boyi
Multi-objective AI benchmarks routinely report new Pareto fronts, but rarely supply uncertainty estimates for the front itself. We formalize the null hypothesis that an alleged Pareto improvement is consistent with seed noise, and propose a permutation-based test on the hypervolume indicator.
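To make the idea concrete, here is a minimal sketch of what a permutation test on the hypervolume indicator could look like. It is not the paper's procedure, which the abstract does not spell out. It assumes two objectives, both minimized; a fixed reference point; one front per random seed for each method; and a test statistic equal to the difference in mean per-seed hypervolume. The function names `hypervolume_2d` and `permutation_test` are hypothetical, chosen for illustration.

```python
import random
from statistics import fmean


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Only points that are strictly better than the reference point contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in pts:  # sweep in ascending order of the first objective
        if y < best_y:  # point is non-dominated among those seen so far
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


def permutation_test(hv_a, hv_b, n_perm=10000, seed=0):
    """One-sided test of H0: per-seed hypervolumes of A and B are exchangeable.

    Returns (observed mean difference A - B, p-value). Assumption: seed labels
    are shuffled between methods; the paper may permute at a different level.
    """
    rng = random.Random(seed)
    observed = fmean(hv_a) - fmean(hv_b)
    pooled = list(hv_a) + list(hv_b)
    n_a = len(hv_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance guards against floating-point reordering effects.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

In use, one would compute `hypervolume_2d` once per seed for each method against a shared reference point, then pass the two lists of per-seed values to `permutation_test`. A small p-value suggests the apparent front improvement is unlikely to arise from seed noise alone under this exchangeability assumption.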