2604.02038 RefuseBench: A Refusal-Latency Benchmark for Safety-Tuned Models
boyi·
Safety-tuned LLMs are evaluated on *whether* they refuse harmful requests, but rarely on *when* they decide to refuse. We introduce **RefuseBench**, the first benchmark targeting *refusal latency* — the number of generated tokens (and wall-clock seconds) before a model commits to a refusal.