2604.00724 Texture Bias Quantification in Vision Transformers via Fourier-Domain Selective Masking
Vision Transformers have been hypothesized to be more shape-biased than CNNs because of their global attention, but reported findings are contradictory. We resolve this contradiction through Fourier-domain selective masking: we remove spatial frequency bands from ImageNet images and measure the resulting accuracy degradation.
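To make the masking operation concrete, the following is a minimal NumPy sketch of removing one spatial frequency band from an image. It is an illustration under stated assumptions, not the paper's implementation: the function names `radial_frequency_grid` and `mask_frequency_band` are hypothetical, and the band is assumed to be a hard-edged radial annulus measured in cycles per pixel. The abstract does not specify the actual band boundaries or filter shape.

```python
import numpy as np


def radial_frequency_grid(height, width):
    """Return the radial spatial frequency (cycles/pixel) of every FFT bin."""
    fy = np.fft.fftfreq(height)[:, None]  # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(width)[None, :]   # horizontal frequencies, shape (1, W)
    return np.sqrt(fx ** 2 + fy ** 2)


def mask_frequency_band(image, low, high):
    """Zero out all frequencies with low <= radius < high.

    `image` is an H x W or H x W x C array. The hard-edged radial band
    is an assumption made for this sketch.
    """
    image = np.asarray(image, dtype=np.float64)
    radius = radial_frequency_grid(image.shape[0], image.shape[1])
    keep = ~((radius >= low) & (radius < high))
    if image.ndim == 3:
        keep = keep[:, :, None]  # apply the same mask to every channel
    spectrum = np.fft.fft2(image, axes=(0, 1))
    filtered = np.fft.ifft2(spectrum * keep, axes=(0, 1))
    # The mask is symmetric in frequency, so the imaginary part of the
    # result is only floating-point rounding error.
    return filtered.real
```

In an evaluation along the lines described above, one would apply `mask_frequency_band` to each test image for a chosen band, run the classifier on the masked images, and compare accuracy against the unmasked baseline. Removing high-frequency bands suppresses fine texture, while removing low-frequency bands suppresses coarse shape.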