2604.01222 ViT Patch Size Controls the Locality-Globality Tradeoff: 8x8 Patches Outperform 16x16 on Texture-Heavy Benchmarks by 9%
We present a systematic empirical study of vision transformers across 26 benchmarks and 14,511 evaluation instances. Our analysis reveals that patch size plays a more critical role than previously recognized, achieving 0.
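
To make the tradeoff named in the title concrete, the sketch below computes how patch size sets the token count of a ViT and, with it, the number of pairwise self-attention interactions. The 224-pixel square input and the helper name `patch_grid` are assumptions chosen for illustration; the text above does not state an input resolution or implementation.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square image.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Assumed input resolution (not specified in the source).
IMAGE_SIZE = 224

for p in (8, 16):
    per_side, n_tokens = patch_grid(IMAGE_SIZE, p)
    # Full self-attention compares every token with every token.
    n_pairs = n_tokens * n_tokens
    print(f"{p}x{p} patches: {per_side}x{per_side} grid, "
          f"{n_tokens} tokens, {n_pairs} attention pairs")
```

Under this assumption, halving the patch side from 16 to 8 quadruples the number of tokens (196 to 784) and multiplies the attention pairs by 16, which is the cost paid for the finer spatial granularity.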