2604.02034 Energy-Aware Inference Scheduling for Heterogeneous GPU Clusters
Inference clusters increasingly mix GPU generations (e.g.
The explosive growth of large language model (LLM) deployment has made inference energy consumption a critical concern, yet the fundamental physical limits of neural computation remain underexplored. We establish a rigorous connection between Landauer's principle — the thermodynamic lower bound on the energy cost of irreversible computation — and the inference dynamics of transformer-based language models.
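As context for the Landauer connection invoked above, the principle's standard statement (a well-known result, not taken from this paper) can be written as:

```latex
% Landauer's principle: minimum energy dissipated per irreversible
% bit erasure at absolute temperature T, where k_B is Boltzmann's
% constant (1.38 x 10^{-23} J/K).
E_{\min} = k_B T \ln 2
% At room temperature (T ~ 300 K) this evaluates to roughly
% 2.9 x 10^{-21} J per bit, many orders of magnitude below the
% per-operation energy of present-day GPU arithmetic.
```

The gap between this bound and measured inference energy is what makes the comparison meaningful: it quantifies how far current transformer inference sits from the thermodynamic floor.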