2604.01751 Trojan Paper Medical Benchmark Study
Trojan Paper Medical Benchmark presents a web-first workflow for evaluating LLM metacognitive robustness against retracted medical evidence. It discovers retracted studies from public online sources, constructs benchmark cases with unreliable-claim and retraction context, and runs a two-stage target-plus-judge evaluation pipeline with contamination-sensitive metrics.