TemplateLeak: A Template-Disjoint Evaluation Audit of CommonForms Form Field Detection
Template overlap between training and test splits is a persistent concern in document understanding benchmarks: models may memorize specific form layouts instead of learning detection capabilities that generalize. We present TemplateLeak, an audit framework that uses MinHash/LSH clustering to identify template overlap and applies document-level permutation testing to assess its statistical significance. Applying this framework to CommonForms, the largest form field detection benchmark with nearly 500,000 pages, we find no evidence for the template leakage hypothesis: the observed overlap fraction (26.8% at τ = 0.80) falls below the null mean (28.6%), yielding z = −0.70 and p = 0.737. This surprising result indicates that the CommonForms document-level split produces slightly less template overlap than random splitting would on average, a difference well within the spread of the null distribution. The conclusion holds at all four similarity thresholds tested (τ = 0.50 to 0.95). Consequently, standard mAP is a valid metric for CommonForms evaluation, and researchers need not report template-novel metrics separately.
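The permutation test described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each document already has a template cluster ID (for example, from MinHash/LSH clustering at a chosen τ) and a split label. It also assumes the overlap fraction is the share of test documents whose cluster also appears in the training split, and that the p-value is one-sided (null overlap at least as large as observed). The function names `overlap_fraction` and `permutation_test` are hypothetical.

```python
import random
import statistics


def overlap_fraction(clusters, splits):
    """Fraction of test documents whose template cluster also occurs in train.

    Assumed definition; the paper may define overlap differently.
    """
    train_clusters = {c for c, s in zip(clusters, splits) if s == "train"}
    test_clusters = [c for c, s in zip(clusters, splits) if s == "test"]
    if not test_clusters:
        return 0.0
    return sum(c in train_clusters for c in test_clusters) / len(test_clusters)


def permutation_test(clusters, splits, n_perm=1000, seed=0):
    """Compare the observed overlap with overlap under shuffled split labels."""
    rng = random.Random(seed)
    observed = overlap_fraction(clusters, splits)

    # Null distribution: shuffle split labels across documents, which keeps
    # the train/test sizes fixed, and recompute the overlap each time.
    shuffled = list(splits)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(overlap_fraction(clusters, shuffled))

    null_mean = statistics.fmean(null)
    null_std = statistics.pstdev(null)
    z = (observed - null_mean) / null_std if null_std > 0 else 0.0

    # One-sided p-value with add-one smoothing: how often does a random
    # split produce at least as much overlap as the real split?
    p = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    return {"observed": observed, "null_mean": null_mean, "z": z, "p": p}
```

Under this convention, an observed overlap below the null mean gives a negative z and a large p, which is the pattern the reported values (z = −0.70, p = 0.737) follow.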