Persona Drift Across Long Multi-Turn Conversations with Large Language Models
1. Introduction
Production chat applications routinely instruct models to adopt a persona ("You are Aria, a calm and concise scheduling assistant who avoids contractions"). Users notice when the persona slips: contractions creep in, the assistant becomes verbose, the calm tone is replaced with effusive enthusiasm. We call this phenomenon persona drift.
Drift has been studied in narrow forms — for example, refusal degradation under jailbreaks [Wei et al. 2023] — but a systematic measurement across naturalistic conversation has been lacking. This paper provides one.
2. Methodology
2.1 Personas
We constructed 24 personas, each defined by a system prompt of approximately 80 words and a set of measurable behavioral signatures:
- Lexical preference (e.g., "prefers semicolons over em-dashes").
- Value expression (e.g., "declines to recommend products").
- Length distribution (e.g., "replies in 1-3 sentences").
- Self-reference rate (e.g., "avoids first-person pronouns").
Signatures are measured by automatic classifiers applied to model outputs and calibrated against human labels.
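As a purely illustrative sketch of how such signatures can be scored automatically, the following measures a self-reference rate and a sentence-count length; the pronoun list and regexes are assumptions of ours, not the calibrated classifiers used in the study.

import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

def self_reference_rate(reply: str) -> float:
    # Fraction of word tokens that are first-person pronouns (crude proxy).
    tokens = re.findall(r"[a-z']+", reply.lower())
    return sum(t in FIRST_PERSON for t in tokens) / max(len(tokens), 1)

def sentence_count(reply: str) -> int:
    # Naive sentence count, used to check length-distribution signatures.
    return len([s for s in re.split(r"[.!?]+", reply) if s.strip()])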
2.2 Conversation Generation
For each persona we run 50 simulated conversations of 200 turns each. The user side is a separate, prompted LLM that emits a mixture of on-topic queries, off-topic queries, and casual chat. We deliberately include occasional emotionally loaded turns and explicit attempts to change the topic, mimicking organic dialog.
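A minimal sketch of the user-side turn sampler; the mixture weights below are illustrative assumptions, since the exact proportions are not reported here.

import random

# Assumed mixture over simulated-user turn types (weights are illustrative).
TURN_MIX = {"on_topic": 0.5, "off_topic": 0.2, "casual": 0.2,
            "emotional": 0.05, "topic_change": 0.05}

def sample_turn_type(rng: random.Random) -> str:
    # Draw the next simulated-user turn type from the assumed mixture.
    kinds, weights = zip(*TURN_MIX.items())
    return rng.choices(kinds, weights=weights, k=1)[0]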
2.3 Drift Measurement
At fixed checkpoints throughout the conversation we inject a probe turn designed to elicit each behavioral signature. Drift is reported as the standardized deviation of probe-elicited behavior from the persona's specified behavior, computed as

D_{p,t} = (x_{p,t} - x*_p) / σ_t,

where x_{p,t} is the probe-elicited value for persona p at turn t, x*_p is the specified value, and σ_t is the standard deviation across personas at turn t.
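In code, the checkpoint-level drift score can be computed as in the following sketch, assuming probe-elicited values and specified targets are collected into per-persona arrays.

import numpy as np

def drift_scores(probe_values: np.ndarray, specified: np.ndarray) -> np.ndarray:
    # probe_values: probe-elicited signature values per persona at turn t.
    # specified:    the persona-specified target values.
    sigma_t = probe_values.std(ddof=1)  # spread across personas at this checkpoint
    return (probe_values - specified) / sigma_t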
3. Results
Prevalence. By turn 80, a sizable fraction of personas show detectable drift on at least one signature, and the fraction rises further by turn 200. The most-affected dimensions are formality and self-reference rate.
Model variation. Mean drift at turn 200 differed across the four production-grade models tested, with a gap between the most and least stable model. We do not name models in this paper because drift behaviors change across releases and we want to avoid implying static rankings.
Trigger analysis. Drift accelerates after high-emotion user turns and after long user messages. A regression of the per-turn drift increment on covariates (user-turn length, user-turn sentiment magnitude, model-turn length) explains a meaningful share of the variance, with sentiment magnitude carrying the largest effect.
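The regression can be reproduced with an ordinary least-squares fit along these lines; the exact covariate preprocessing is an assumption on our part.

import numpy as np

def fit_drift_regression(user_len, sent_mag, model_len, drift_inc):
    # OLS of per-turn drift increment on three covariates plus an intercept.
    y = np.asarray(drift_inc, dtype=float)
    X = np.column_stack([np.ones(len(y)), user_len, sent_mag, model_len])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1.0 - ((y - X @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return coef, r2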
4. Mitigations
We evaluated three lightweight interventions:
Periodic re-injection. Repeating the persona system prompt at a fixed turn interval. At the interval we tested, drift at turn 200 drops by 47%. Token cost: about an 8% increase in total prompt tokens.
Summary-and-restate. Periodically summarizing the conversation and re-stating the persona. Drift drops by 53% but token cost rises by 18% on average due to the summary tokens.
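A sketch of summary-and-restate, assuming a summarize callable backed by the same model and the SystemMessage type used in the re-injection snippet below; the interval and wording are illustrative.

def summarize_and_restate(history, persona_prompt, summarize, every=40):
    # Every `every` messages, collapse the history into a summary and
    # re-state the persona so it sits near the end of the context.
    if history and len(history) % every == 0:
        summary = summarize(history)  # assumed LLM-backed summarizer
        history = [SystemMessage(persona_prompt),
                   SystemMessage("Conversation so far: " + summary)]
    return history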
Persona-anchor tokens. A short, distinctive token sequence is appended to every model output as an internal anchor ("<persona:aria>"), and the system prompt instructs the model to keep emitting it. Drift drops by 31% with negligible token overhead, but anchors are visible in raw outputs and require post-processing to strip.
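The periodic re-injection mitigation reduces to a few lines; the sketch below assumes a chat framework that exposes a SystemMessage type and uses an illustrative interval of 20 turns.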
def inject_persona(history, persona_prompt, every=20):
    # Re-append the persona system prompt once the message count hits a
    # multiple of `every`, so the persona text reappears late in the context.
    if len(history) % every == 0:
        history.append(SystemMessage(persona_prompt))
    return history

5. Discussion
Drift appears to be partly a training-time phenomenon — models trained heavily on cooperative, conversational data revert toward an "average helpful assistant" attractor. It is also partly an inference-time phenomenon: long contexts dilute the influence of early-context system prompts in attention computations [Liu et al. 2024 "lost in the middle"].
Mitigations target the inference-time component. The training-time component would require fine-tuning or RL with persona-fidelity rewards, which is out of scope for our deployment-friendly remit.
6. Limitations
Our user simulator is itself a language model and may not produce conversations that are fully representative of human interaction patterns. We attempted to control for this with a smaller human-driven study (40 conversations of 50 turns each) and found qualitatively similar drift patterns, but the smaller scale prevents strong claims.
We also focus on personas with measurable signatures. More elusive aspects of persona — humor, warmth, intellectual texture — are not captured by our metrics and may drift differently.
7. Conclusion
Persona drift is real, measurable, and substantial in long conversations. Lightweight interventions reduce but do not eliminate it. Practitioners deploying long-running personas should budget for one of the mitigations we describe and should evaluate drift on the specific signatures they care about.
References
- Wei, A. et al. (2023). Jailbroken: How Does LLM Safety Training Fail?
- Liu, N. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts.
- Park, J. et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior.
- Shao, Y. et al. (2023). Character-LLM: A Trainable Agent for Role-Playing.
- Bai, Y. et al. (2024). Long-Form Conversation Coherence in Large Language Models.