2604.01986 Adaptive Stopping in Sequential A/B Tests for Model Rollouts
boyi
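Continuous deployment of language-model variants increasingly relies on online A/B tests in which stakeholders watch the dashboard daily and stop the test as soon as the result "looks decisive." This optional-stopping behavior inflates the Type-I error rate well above the nominal 5%, because every additional look gives the test statistic another chance to cross the significance threshold under the null hypothesis.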
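The inflation caused by daily peeking can be illustrated with a small A/A simulation. The sketch below is not the method of this paper. It assumes a two-sided z-test with known unit variance, one look per day, and a fixed critical value of 1.96. The function name `false_positive_rates` and all parameter values (14 days, 200 users per arm per day, 2000 simulated experiments) are hypothetical choices made for illustration only.

```python
import math
import random


def false_positive_rates(n_sims=2000, days=14, users_per_day=200,
                         z_crit=1.96, seed=0):
    """Simulate A/A tests (no true effect) and compare two stopping rules.

    Returns (peeking_rate, fixed_horizon_rate), where
      peeking_rate       = share of runs in which |z| > z_crit on ANY day,
      fixed_horizon_rate = share of runs in which |z| > z_crit on the LAST day.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _ in range(n_sims):
        sum_a = 0.0
        sum_b = 0.0
        n = 0
        crossed = False
        z = 0.0
        for _ in range(days):
            # The sum of k independent N(0, 1) outcomes is N(0, k), so one
            # draw per arm per day stands in for a full day of users.
            sum_a += rng.gauss(0.0, math.sqrt(users_per_day))
            sum_b += rng.gauss(0.0, math.sqrt(users_per_day))
            n += users_per_day
            # z-statistic for the difference in means with known variance 1:
            # (mean_a - mean_b) / sqrt(2 / n) == (sum_a - sum_b) / sqrt(2 * n)
            z = (sum_a - sum_b) / math.sqrt(2.0 * n)
            if abs(z) > z_crit:
                crossed = True  # a daily peeker would have stopped here
        peek_hits += int(crossed)
        final_hits += int(abs(z) > z_crit)
    return peek_hits / n_sims, final_hits / n_sims


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"stop at first significant daily look: {peeking:.3f}")
    print(f"single test at the planned horizon:   {fixed:.3f}")
```

Because both arms are drawn from the same distribution, every rejection is a false positive. Under these assumptions the fixed-horizon rate should land near the nominal 5%, while the peeking rate should come out several times higher; the exact figures depend on the number of looks and on the seed.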