2604.00941 REF-VERIFY: Live Reference Verification Skill Exposing LLM Peer Review Calibration Failure
LLM-based peer review systems systematically misclassify recent references as 'hallucinated' when the cited works postdate the model's training-data cutoff. REF-VERIFY exposes this calibration failure by querying the PubMed, CrossRef, and Semantic Scholar APIs to verify each reference in real time.
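The sketch below shows what a live lookup across these three services could look like, and why it avoids the calibration failure. It is not REF-VERIFY's implementation. The helper names (`build_queries`, `source_confirms`, `classify`, `verify`), the 0.9 title-similarity threshold, and the verdict labels are hypothetical. The endpoints are the public CrossRef `works`, Semantic Scholar Graph `paper/search`, and NCBI E-utilities `esearch` URLs. The key assumption is that the model's training cutoff only decides how an *unresolved* reference is labelled and is never treated as evidence of fabrication.

```python
import json
import re
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

# Public search endpoints; the query parameters used below are assumptions for this sketch.
CROSSREF = "https://api.crossref.org/works"
SEMANTIC_SCHOLAR = "https://api.semanticscholar.org/graph/v1/paper/search"
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_queries(title: str) -> dict:
    """Hypothetical helper: one title-search URL per source."""
    enc = urllib.parse.urlencode
    return {
        "crossref": f"{CROSSREF}?{enc({'query.bibliographic': title, 'rows': 3})}",
        "semantic_scholar": f"{SEMANTIC_SCHOLAR}?{enc({'query': title, 'fields': 'title,year', 'limit': 3})}",
        "pubmed": f"{PUBMED_ESEARCH}?{enc({'db': 'pubmed', 'term': title + '[Title]', 'retmode': 'json'})}",
    }


def normalize(title: str) -> str:
    """Lower-case, turn punctuation runs into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def titles_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy title comparison; the 0.9 threshold is an illustrative assumption."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def source_confirms(source: str, payload: dict, title: str) -> bool:
    """Does one API response contain the cited work?"""
    if source == "pubmed":
        # ESearch returns PMIDs only, so a hit on a [Title]-field search counts as a match.
        return bool(payload.get("esearchresult", {}).get("idlist"))
    if source == "crossref":
        # CrossRef stores each item's title as a list of strings.
        found = [t for item in payload.get("message", {}).get("items", []) for t in item.get("title", [])]
    else:  # semantic_scholar
        found = [p.get("title") or "" for p in payload.get("data", [])]
    return any(titles_match(title, f) for f in found)


def classify(title: str, year: int, payloads: dict, model_cutoff_year: int) -> str:
    """Verdict from live evidence; the cutoff only labels references no source resolved."""
    if any(source_confirms(src, p, title) for src, p in payloads.items()):
        return "verified"
    if year > model_cutoff_year:
        # Postdating the training data is not evidence of hallucination.
        return "unresolved (post-cutoff)"
    return "not found"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    req = urllib.request.Request(url, headers={"User-Agent": "ref-verify-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def verify(title: str, year: int, model_cutoff_year: int) -> str:
    """Query every source live, then classify; a failed source is treated as silent."""
    payloads = {}
    for src, url in build_queries(title).items():
        try:
            payloads[src] = fetch_json(url)
        except (OSError, ValueError):  # HTTP/network errors or malformed JSON
            payloads[src] = {}
    return classify(title, year, payloads, model_cutoff_year)
```

For example, `verify("Attention Is All You Need", 2017, model_cutoff_year=2023)` issues three HTTP requests and returns a verdict. Separating `classify` from the network calls keeps the decision rule testable offline. It also makes the contrast explicit: a model that relies only on its parametric memory would collapse "unresolved (post-cutoff)" into "hallucinated".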