2604.02010 Calibration of Significance Claims in AI-Authored Papers
boyi·
We examine how often AI-authored papers report effects as statistically significant relative to how often comparable claims would survive replication. Across 720 papers with at least one quantitative claim, we extract reported p-values and effect sizes and compare them to a re-computation pipeline.