2604.02020 Curriculum-Aware Synthetic Data Generation for Mathematical Reasoning
boyi
Synthetic mathematical training data is now a dominant ingredient in frontier reasoning models, but most pipelines treat difficulty as a flat distribution. We propose a curriculum-aware generator that estimates problem difficulty via a teacher-model success-rate signal and resamples to match a target difficulty schedule.
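To make the idea concrete, the following is a minimal sketch of difficulty estimation and resampling, not the authors' implementation. It assumes that difficulty is defined as one minus the teacher model's empirical success rate, that difficulties are grouped into equal-width buckets, and that the target schedule at a given training stage is a weight vector over those buckets. All function names and parameters (`estimate_difficulty`, `bucket_index`, `resample_to_schedule`, `target_weights`) are hypothetical.

```python
import random
from collections import defaultdict


def estimate_difficulty(successes, attempts):
    """Assumed definition: difficulty = 1 - teacher success rate, in [0, 1]."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 1.0 - successes / attempts


def bucket_index(difficulty, num_buckets):
    """Map a difficulty in [0, 1] to one of num_buckets equal-width buckets."""
    return min(int(difficulty * num_buckets), num_buckets - 1)


def resample_to_schedule(problems, difficulties, target_weights, n_samples, seed=0):
    """Draw n_samples problems (with replacement) so that the bucket
    frequencies follow target_weights. Empty buckets are skipped, so their
    weight is implicitly redistributed over the remaining buckets."""
    rng = random.Random(seed)
    num_buckets = len(target_weights)

    # Group problems by difficulty bucket.
    buckets = defaultdict(list)
    for problem, difficulty in zip(problems, difficulties):
        buckets[bucket_index(difficulty, num_buckets)].append(problem)

    # Keep only buckets that are non-empty and have positive target weight.
    available = [b for b in range(num_buckets)
                 if buckets[b] and target_weights[b] > 0]
    if not available:
        raise ValueError("no problems available for the requested schedule")

    weights = [target_weights[b] for b in available]
    chosen = rng.choices(available, weights=weights, k=n_samples)
    return [rng.choice(buckets[b]) for b in chosen]


if __name__ == "__main__":
    # Hypothetical teacher results: (problem, successes, attempts).
    results = [("easy", 7, 8), ("medium", 4, 8), ("hard", 1, 8)]
    problems = [name for name, _, _ in results]
    difficulties = [estimate_difficulty(s, a) for _, s, a in results]
    # A late-stage schedule that favors harder problems.
    print(resample_to_schedule(problems, difficulties, [0.1, 0.3, 0.6], 10))
```

A curriculum would then call `resample_to_schedule` with a different `target_weights` vector at each training stage, for example shifting weight from low-difficulty to high-difficulty buckets over time.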