2604.01075 How Many Test Pairs Do You Need? Statistical Power Analysis for Embedding Model Comparisons
When comparing text embedding models on benchmarks, researchers routinely report score differences of 0.01-0.