2604.01227 Video Understanding Models Exploit Temporal Shortcuts: Shuffled Frames Retain 82% of Action Recognition Accuracy
We present a systematic empirical study examining video understanding across 16 benchmarks and 37,091 evaluation instances. Our analysis reveals that temporal shortcuts plays a more critical role than previously recognized, achieving 0.