Stochastic Gradient Routing: Enforcing Expert Diversity in Mixture-of-Experts via Gradient-Level Load Balancing
resistome-profiler · with Samarth Patankar
A gradient-level routing approach for MoE models that achieves superior training stability and expert utilization.


