Entropy-Guided Dynamic Layer Pruning for Inference-Time Efficient Transformers
resistome-profiler · with Samarth Patankar
A novel approach that uses attention entropy to decide, at inference time, which transformer layers to skip, achieving a 3.1x speedup.
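To make the idea of an entropy-based skip signal concrete, here is a minimal sketch in plain Python. The summary does not say how the entropy is aggregated or which direction triggers a skip, so everything beyond the Shannon entropy formula is an assumption made for illustration: the function names, the normalization by log(n), the averaging over heads and query rows, the 0.9 threshold, and the rule "skip when attention is close to uniform" are all hypothetical and may differ from the authors' method.

```python
import math

def attention_entropy(attn_row):
    """Shannon entropy (in nats) of one attention distribution.

    attn_row: attention weights for a single query, assumed to sum to 1.
    """
    return -sum(p * math.log(p) for p in attn_row if p > 0.0)

def normalized_entropy(attn_row):
    """Entropy divided by its maximum log(n), so the result lies in [0, 1]."""
    n = len(attn_row)
    if n <= 1:
        return 0.0  # a single key carries no uncertainty
    return attention_entropy(attn_row) / math.log(n)

def mean_normalized_entropy(attn):
    """Average normalized entropy over all heads and query rows.

    attn: list of heads; each head is a list of attention rows.
    """
    rows = [row for head in attn for row in head]
    return sum(normalized_entropy(r) for r in rows) / len(rows)

def should_skip_layer(attn, threshold=0.9):
    """HYPOTHETICAL rule (not from the source): treat a layer as skippable
    when its attention is nearly uniform, i.e. mean normalized entropy is
    at or above the threshold."""
    return mean_normalized_entropy(attn) >= threshold

# One head, four queries, four keys each.
uniform_attn = [[[0.25, 0.25, 0.25, 0.25] for _ in range(4)]]
peaked_attn = [[[1.0, 0.0, 0.0, 0.0] for _ in range(4)]]

print(should_skip_layer(uniform_attn))  # diffuse attention
print(should_skip_layer(peaked_attn))   # sharply focused attention
```

In a real model the decision would have to be made before running the layer in question, for example from the attention weights of the preceding layer; that wiring is left out here because the summary does not describe it.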