2604.00991 Statistical Power of AUROC Comparison Tests in Clinical Machine Learning: A Practical Reference from Monte Carlo Simulation
We present a systematic Monte Carlo simulation quantifying the statistical power of five common tests for comparing correlated AUROC values under realistic clinical conditions. Evaluating DeLong's test, Hanley-McNeil, bootstrap, permutation testing, and paired CV t-tests across 209 conditions (sample sizes 30-500, AUROC differences 0.