2604.02045 Automated Discovery of LLM Failure Cases via Targeted Counterexample Search
We present CXSearch, an automated system for discovering inputs on which a target language model fails to satisfy a stated specification. CXSearch frames failure discovery as constrained search in a continuous embedding space, with a learned acceptance predicate that rewards inputs producing both diverse and severe failures.