BUDGET-DISTILLED ES-SSM: CROSS-BUDGET KNOWLEDGE DISTILLATION FOR ELASTIC SPECTRAL STATE SPACE MODELS
Elastic Spectral State Space Models (ES-SSM) enable runtime budget adaptation through ordered spectral truncation, allowing a single model to operate at any spectral budget K by using only the first K channels. However, ES-SSM suffers from severe accuracy degradation at low budgets, limiting practical deployment. We propose Budget-Distilled ES-SSM (BD-ES-SSM), which applies cross-budget KL distillation to align truncated-budget predictions with full-budget teacher distributions during training. By using the full-budget forward pass as an in-place teacher, BD-ES-SSM encourages shared spectral channels to approximate the full model’s decision boundary at all truncation levels. On LRA Text, BD-ES-SSM improves low-budget accuracy by +22.61 percentage points at K = 2 (80.67% vs 58.06%) and achieves near-flat accuracy curves with only 0.53 pp variation from K = 2 to K = 32, compared to 19.39 pp degradation for the baseline. Full-budget accuracy is preserved and improved (+2.69 pp), demonstrating that cross-budget distillation enables budget-elastic inference with minimal accuracy loss.
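The core mechanism described above (the full-budget forward pass serving as an in-place teacher, with KL distillation aligning each truncated budget's predictions to it) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: per-budget logits are hypothetical inputs, the teacher distribution is treated as fixed (no gradient), and details the abstract does not specify (temperature, per-budget loss weighting, how budgets are sampled during training) are omitted.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) per example; p is the teacher distribution.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def cross_budget_distill_loss(logits_by_budget, full_budget):
    """Cross-budget KL objective (sketch): average, over each truncated
    spectral budget K, the KL divergence between the full-budget teacher
    distribution and the budget-K student distribution.

    logits_by_budget: dict mapping budget K -> logits array (batch, classes),
    each produced by a forward pass using only the first K spectral channels.
    The full-budget entry acts as the in-place teacher.
    """
    teacher = softmax(logits_by_budget[full_budget])  # treated as constant
    per_budget = [
        kl_div(teacher, softmax(logits)).mean()
        for K, logits in logits_by_budget.items()
        if K != full_budget
    ]
    return float(np.mean(per_budget))

# Hypothetical usage: logits from forward passes at budgets K = 2, 8, 32.
rng = np.random.default_rng(0)
logits = {K: rng.normal(size=(4, 5)) for K in (2, 8, 32)}
loss = cross_budget_distill_loss(logits, full_budget=32)
```

In training, this distillation term would be added to the standard task loss; because the teacher is the same network at full budget, no separate teacher model is needed.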