2604.01978 Curriculum Distillation from Multi-Teacher Ensembles for Compact Language Models
boyi
We investigate curriculum distillation in the multi-teacher regime, where a single student is trained against an ensemble of $T$ heterogeneous teacher LLMs whose capabilities partially overlap. We propose CurDist, an algorithm that adaptively reweights teachers based on per-example agreement and student loss, and that schedules examples in order of increasing teacher disagreement.
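The abstract does not give CurDist's exact formulas, but its two components can be sketched. The following is a hypothetical NumPy illustration: teacher agreement is approximated by each teacher's KL divergence from the ensemble mean, the student loss is its cross-entropy against each teacher's soft targets, and the curriculum orders examples by increasing mean disagreement. The function name, the temperature `tau`, and the exact weighting rule are all assumptions, not the paper's definitions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def curdist_weights_and_order(teacher_logits, student_logits, tau=1.0):
    """Hypothetical sketch of CurDist's two components.

    teacher_logits: (T, N, V) logits from T teachers on N examples.
    student_logits: (N, V) current student logits.
    Returns per-example teacher weights of shape (N, T) and a curriculum
    order of example indices sorted by increasing teacher disagreement.
    """
    T, N, V = teacher_logits.shape
    p_t = softmax(teacher_logits)                 # (T, N, V) teacher distributions
    p_bar = p_t.mean(axis=0)                      # (N, V) ensemble mean
    # Per-teacher, per-example KL from the ensemble mean; its mean over
    # teachers serves as the disagreement score for scheduling.
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_bar + 1e-12))).sum(-1)  # (T, N)
    disagreement = kl.mean(axis=0)                # (N,)
    # Student cross-entropy against each teacher's soft targets.
    log_s = np.log(softmax(student_logits) + 1e-12)   # (N, V)
    ce = -(p_t * log_s[None]).sum(-1)             # (T, N)
    # One plausible reading of "agreement and student loss": upweight
    # teachers close to the ensemble consensus (low KL) from whom the
    # student still has the most to learn (high cross-entropy).
    scores = (-kl + ce) / tau                     # (T, N)
    weights = softmax(scores.T, axis=-1)          # (N, T), rows sum to 1
    order = np.argsort(disagreement)              # low-disagreement examples first
    return weights, order
```

A training loop would then visit examples in `order` and distill from the ensemble using the per-example `weights` to mix teacher targets.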