arXiv:2603.00331
Prompt-Space Actor-Critic: Online Reinforcement Learning of System Prompts Without Weight Modification
We present a reinforcement learning framework for continuously adapting LLM system prompts during deployment, formalized as an actor-critic architecture operating entirely in prompt space. Unlike RLHF and related methods that optimize model weights, our approach treats the LLM as a fixed component of the environment and learns a prompt policy online from implicit human feedback signals. The actor is the current system prompt, a discrete text policy conditioning the frozen LLM, while the critic is a separate meta-level LLM reasoner that evaluates reward trends and proposes prompt revisions. Because neither component modifies model weights, the approach is privacy-preserving, model-agnostic, and deployable without fine-tuning infrastructure. We describe the full architecture of our system, Human-Watch, including its content-blind critic design, convergence-gated updates, hybrid reward aggregation, and population-based prompt leaderboard, and argue that prompt-space RL is a principled and underexplored alternative to weight-space optimization for deployment-time LLM adaptation.
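The control loop described above can be sketched in miniature. Everything here is a hypothetical stand-in: the real actor is a natural-language system prompt and the real critic is an LLM reasoner, but for illustration we model a prompt as a set of trait flags, simulate the implicit-feedback reward with a toy scoring function (`implicit_reward`, `TARGET_TRAITS`, and `critic_propose` are all invented for this sketch), gate critic proposals on a plateaued reward trend, and keep a leaderboard of prompt candidates:

```python
import random
from collections import deque

random.seed(0)

# Hypothetical stand-in for the frozen LLM plus implicit user feedback:
# a prompt is a set of trait flags, and its reward is overlap with an
# unknown target trait set (in the real system, reward comes from
# implicit human feedback signals, not a known target).
TARGET_TRAITS = {"concise", "cite-sources", "ask-clarifying"}
CANDIDATE_TRAITS = sorted(TARGET_TRAITS | {"verbose", "emoji", "formal"})

def implicit_reward(prompt_traits):
    """Simulated reward: +1 per desired trait, -0.5 per undesired one."""
    hits = len(prompt_traits & TARGET_TRAITS)
    misses = len(prompt_traits - TARGET_TRAITS)
    return hits - 0.5 * misses

def critic_propose(prompt_traits):
    """Content-blind critic stand-in: toggles one trait at random.
    It sees only the prompt and reward trend, never user content."""
    new = set(prompt_traits)
    new.symmetric_difference_update({random.choice(CANDIDATE_TRAITS)})
    return frozenset(new)

def plateaued(rewards, window=5, eps=1e-6):
    """Convergence gate: propose an update only once the reward trend
    over the last `window` interactions has flattened out."""
    if len(rewards) < window:
        return False
    recent = list(rewards)[-window:]
    return max(recent) - min(recent) <= eps

actor = frozenset({"verbose"})                  # initial system prompt
leaderboard = {actor: implicit_reward(actor)}   # population of prompts
history = deque(maxlen=20)                      # recent reward trend

for step in range(200):
    r = implicit_reward(actor)
    history.append(r)
    if plateaued(history):
        candidate = critic_propose(actor)
        # crude stand-in for hybrid reward aggregation: adopt the
        # candidate only if its (simulated) reward beats the incumbent
        if implicit_reward(candidate) > r:
            actor = candidate
            history.clear()
        leaderboard[candidate] = implicit_reward(candidate)
        leaderboard[actor] = implicit_reward(actor)

best = max(leaderboard, key=leaderboard.get)
```

The key structural point the sketch preserves is that the loop never touches model weights: all state lives in the prompt population and the reward history, so the same loop could wrap any frozen, API-only LLM.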