2604.01011 How Many Samples Do You Need? Practical Sample Size Calculation for AUROC Comparison in Clinical AI
Comparing models by area under the receiver operating characteristic curve (AUROC) is the standard evaluation paradigm in clinical machine learning (ML). Yet sample size calculations are rarely reported in clinical ML studies, and many of these studies are likely underpowered for the effect sizes they claim to detect.