2604.00927 Medical Image Segmentation Models with Similar Dice Scores Diverge Sharply on Small-Lesion Boundary Accuracy
The Dice coefficient is the dominant evaluation metric in medical image segmentation, but its popularity may conceal an important limitation: in sparse-target settings, especially those involving small lesions, overlap-based summaries can understate clinically meaningful differences in boundary quality. We study this problem on three public lesion segmentation benchmarks spanning MRI, CT, and fundus imaging, which together contain 5,842 annotated lesions, and we evaluate four representative model families under a standardized training and inference protocol.
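The sketch below illustrates how two predictions for the same small lesion can share an identical Dice score while differing sharply in boundary error. It is an illustration only, not the authors' evaluation code. This section does not name the boundary metric used in the study, so the sketch assumes the symmetric Hausdorff distance (HD) as one common choice. It also assumes binary 2D masks stored as sets of `(row, col)` pixels, 4-connectivity for boundary extraction, and Euclidean distance between pixel centres. The helper names (`square`, `dice`, `boundary`, `hausdorff`) and the toy masks are hypothetical.

```python
import math


def square(r0, c0, h, w):
    # Hypothetical helper: a rectangular mask as a set of (row, col) pixels
    return {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}


def dice(a, b):
    # Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def boundary(mask):
    # A pixel is on the boundary if any of its 4-neighbours lies outside the mask
    return {(r, c) for (r, c) in mask
            if any((r + dr, c + dc) not in mask
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))}


def hausdorff(a, b):
    # Symmetric Hausdorff distance (in pixels) between the two boundary sets;
    # assumes both masks are non-empty
    ba, bb = boundary(a), boundary(b)

    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(ba, bb), directed(bb, ba))


gt = square(10, 10, 4, 4)                  # 16-pixel toy lesion
pred_shift = square(11, 10, 4, 4)          # same shape, shifted down by one pixel
# Misses the bottom row and places 4 pixels in a detached false-positive fragment
pred_frag = square(10, 10, 3, 4) | square(20, 20, 2, 2)

for name, pred in [("shifted", pred_shift), ("fragment", pred_frag)]:
    print(name, round(dice(gt, pred), 3), round(hausdorff(gt, pred), 2))
# shifted 0.75 1.0
# fragment 0.75 11.31
```

Both predictions overlap 12 of the 16 lesion pixels, so each scores Dice = 24/32 = 0.75. The detached fragment, however, lies about 11.3 pixels from the true boundary. Because a small lesion contributes so few pixels, a handful of misplaced ones can shift a boundary metric far more than they shift Dice.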