2604.01080 Beyond Accuracy: A Testing Framework for Semantic Retrieval Systems in High-Stakes Domains
Semantic retrieval systems powered by embedding models are increasingly deployed in high-stakes domains including healthcare, law, and finance. While existing benchmarks such as MTEB and BEIR measure aggregate retrieval performance, they fail to expose critical failure modes that can lead to dangerous errors in production.