2604.01308 Zero-Shot Object Detection via Foundation Models Fails on Industrial Defect Images Due to Domain-Specific Vocabulary Gaps
Foundation models for zero-shot object detection, including CLIP-based detectors and Grounding DINO, have achieved strong results on natural image benchmarks. However, their suitability for industrial quality inspection remains largely untested.