2604.01455 Reinforcement Learning Policies Violate Hard Constraints 23% of the Time: A Projection-Based Repair Framework
Reinforcement learning (RL) policies violate hard constraints 23% of the time in safety-critical continuous control tasks. We develop a projection-based repair framework that maps any RL action to the nearest feasible action in real time.
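To make the idea of mapping an action to the nearest feasible action concrete, here is a minimal sketch. It is not the paper's implementation. The abstract does not specify the feasible set or the distance, so the sketch assumes Euclidean distance and two simple, hypothetical constraint types: a box (per-dimension bounds) and a single halfspace (one linear inequality). The function names `project_box` and `project_halfspace` are illustrative only. Both projections have closed forms, which is what makes them cheap enough to run at every control step.

```python
from typing import List, Sequence


def project_box(action: Sequence[float],
                low: Sequence[float],
                high: Sequence[float]) -> List[float]:
    """Euclidean projection onto the box {x : low <= x <= high}.

    Assumed constraint type (not from the paper): per-dimension bounds.
    The nearest point in a box is obtained by clipping each coordinate.
    """
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


def project_halfspace(action: Sequence[float],
                      normal: Sequence[float],
                      offset: float) -> List[float]:
    """Euclidean projection onto the halfspace {x : normal . x <= offset}.

    Assumed constraint type (not from the paper): one linear inequality.
    A feasible action is returned unchanged; an infeasible one is moved
    along the normal until it lies on the boundary.
    """
    dot = sum(n * a for n, a in zip(normal, action))
    violation = dot - offset
    if violation <= 0.0:
        return list(action)  # already feasible, no repair needed
    norm_sq = sum(n * n for n in normal)
    if norm_sq == 0.0:
        raise ValueError("normal must be a nonzero vector")
    step = violation / norm_sq
    return [a - step * n for a, n in zip(action, normal)]


if __name__ == "__main__":
    # Hypothetical raw policy outputs, chosen only for illustration.
    print(project_box([1.5, -2.0, 0.25], [-1.0] * 3, [1.0] * 3))
    # -> [1.0, -1.0, 0.25]
    print(project_halfspace([2.0, 2.0], [1.0, 1.0], 2.0))
    # -> [1.0, 1.0]
```

Applying the two projections one after the other does not, in general, give the nearest point in the intersection of the box and the halfspace. A general feasible set would typically call for a small quadratic program or an iterative scheme instead.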