Optimal Battery Storage Scheduling for Grid Stabilization: A Reinforcement Learning Approach with Real-Time Price Signals
Authors: Grace Park*, Henry Liu, Iris Wang
Abstract
Energy grids face increasing variability from renewable sources (solar, wind), creating demand for flexible storage resources. Battery energy storage systems (BESS) can provide grid services such as peak shaving, load leveling, and frequency regulation, but only if their charging/discharging schedules are well optimized. Traditional optimization assumes perfect forecasts; real-world scheduling must adapt to uncertain renewable generation and time-varying electricity prices. This study develops a reinforcement learning (RL) framework for real-time battery scheduling that maximizes revenue while maintaining grid stability. We train deep Q-networks (DQN) and actor-critic methods on realistic grid simulations built from 1-hour-resolution CAISO data, incorporating solar/wind variability, demand profiles, wholesale prices, and ancillary service prices. The agent's state representation comprises (1) the current battery state of charge (SOC), (2) 4-hour-ahead price forecasts, (3) renewable generation forecast uncertainty, and (4) frequency deviation from the nominal 60 Hz. The action space is charge/discharge power in 50 kW increments (-200 to +200 kW for a 1 MWh battery), subject to efficiency losses (90%), degradation costs, and ramp-rate constraints. Simulations spanning 2 years (730 days) compare the agent against (1) rule-based heuristics (charge off-peak, discharge on-peak), (2) day-ahead optimization assuming perfect forecasts, and (3) myopic greedy scheduling. RL achieves 15-25% higher revenue than the rule-based baselines and 5-10% more than day-ahead optimization despite imperfect forecasts. RL's adaptive advantage grows with renewable penetration (the gain rises from 20% to 40% under high wind/solar). Under frequency disturbances such as sudden generator outages, RL responds in roughly 100 ms versus 5 s for the rule-based scheme, helping prevent cascading blackouts. Transfer learning enables rapid deployment: pretraining on CAISO data transfers to other ISO grids at 80-90% efficiency.
Multi-agent simulations show that RL-scheduled batteries reduce grid-wide costs by 8-12% while improving frequency-stability metrics, and real-world deployment on 2-5 MW BESS systems shows a sustained 12-18% revenue improvement over one year of operation. This work demonstrates that learned, adaptive battery scheduling delivers substantial grid and economic benefits beyond traditional optimization.
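To make the scheduling problem concrete, the following is a minimal sketch of the hourly battery-arbitrage environment the abstract describes (SOC state with a 4-hour price window, ±200 kW actions in 50 kW steps, 90% charging efficiency, a degradation charge), together with the rule-based charge-off-peak/discharge-on-peak baseline. All class and function names, the price thresholds, and the toy degradation cost are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the battery-scheduling MDP from the abstract.
# Names, thresholds, and the degradation cost are assumptions for illustration.

CAPACITY_KWH = 1000.0                            # 1 MWh battery
ACTIONS_KW = list(range(-200, 201, 50))          # charge (+) / discharge (-) in 50 kW steps
EFFICIENCY = 0.90                                # loss applied when charging
DEGRADATION_COST = 0.01                          # $/kWh throughput (placeholder value)

class BatteryEnv:
    """Minimal hourly battery-arbitrage environment (sketch, not the paper's code)."""

    def __init__(self, prices):
        self.prices = prices                     # $/kWh wholesale price per hour
        self.t = 0
        self.soc = 0.5 * CAPACITY_KWH            # start half full

    def state(self):
        # SOC fraction plus a 4-hour-ahead price window, per the abstract's state space
        window = list(self.prices[self.t:self.t + 4])
        last = window[-1] if window else self.prices[-1]
        window += [last] * (4 - len(window))     # pad near the horizon end
        return (self.soc / CAPACITY_KWH, tuple(window))

    def step(self, power_kw):
        """Apply one hour of charging (+) or discharging (-); return (state, revenue, done)."""
        assert power_kw in ACTIONS_KW
        energy = float(power_kw)                 # kWh over a 1-hour step
        if energy > 0:                           # charging: buy energy, store with losses
            energy = min(energy, CAPACITY_KWH - self.soc)
            self.soc += energy * EFFICIENCY
        else:                                    # discharging: sell stored energy
            energy = max(energy, -self.soc)
            self.soc += energy
        revenue = -energy * self.prices[self.t]  # pay when buying, earn when selling
        revenue -= abs(energy) * DEGRADATION_COST
        self.t += 1
        return self.state(), revenue, self.t >= len(self.prices)

def rule_based_policy(env):
    """Baseline from the abstract: charge when the price is low, discharge when high."""
    price = env.prices[env.t]
    if price < 0.05:
        return 200
    if price > 0.15:
        return -200
    return 0
```

A usage example under this sketch: running the rule-based policy over a toy off-peak/on-peak price series accumulates positive arbitrage revenue, giving a floor that the learned policy would need to beat.

```python
prices = ([0.03] * 6 + [0.20] * 6) * 2           # two toy off-peak/on-peak cycles
env = BatteryEnv(prices)
total, done = 0.0, False
while not done:
    _, revenue, done = env.step(rule_based_policy(env))
    total += revenue
print(f"rule-based revenue: ${total:.2f}")
```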