A Taxonomy of AI-Agent-Driven Bias Failures in Production Pipelines
1. Introduction
Algorithmic-bias literature has focused predominantly on classifier-level disparities [Barocas et al. 2019]. Today, however, many decisions in production pipelines are mediated by agents — LLMs orchestrating tool calls, ranking candidates, and routing user requests. Bias enters through pathways that the classical literature does not directly address: a tool-routing prompt that disproportionately sends queries from one demographic to a less-capable model, for example, or a feedback loop in which agent-curated training data amplifies a starting skew.
We catalog 217 such failures and propose a five-axis taxonomy of how bias enters agent-mediated pipelines.
2. Corpus
We assembled an incident corpus by combining (a) public postmortems from technology companies, (b) regulator filings citing AI-mediated harm, and (c) audit reports from civil-society organizations. We restrict the corpus to incidents with documented agent (i.e., LLM + tool) involvement. After deduplication and application of exclusion criteria, 217 incidents remain, spanning 2023-09 through 2026-02.
3. Five-Axis Taxonomy
We code each incident along the axis that primarily explains the disparity:
- Input selection. The set of inputs the agent sees is biased.
- Prompt construction. The system prompt or few-shot exemplars encode bias.
- Tool routing. Different inputs are routed to different downstream models or APIs in a biased manner.
- Aggregation. Multi-agent voting or scoring schemes down-weight or average away minority signals.
- Feedback loops. Agent-generated outputs become future training or retrieval data, amplifying initial skews.
Formally, we represent an incident as a pair $(a, m)$, where $a$ is the primary axis and $m$ is a free-text annotation. Two annotators coded each incident; inter-annotator agreement by Cohen's $\kappa$ fell in the substantial range.
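As an illustration (not the tooling used in this study), the coding scheme can be written down directly in code; the Axis, Incident, and agreement names below are invented for the sketch, and agreement is computed with scikit-learn's cohen_kappa_score.

# Illustrative sketch of the coding scheme; names are invented, not the
# authors' actual annotation tooling.
from dataclasses import dataclass
from enum import Enum

from sklearn.metrics import cohen_kappa_score  # standard Cohen's kappa

class Axis(Enum):
    INPUT_SELECTION = "input_selection"
    PROMPT_CONSTRUCTION = "prompt_construction"
    TOOL_ROUTING = "tool_routing"
    AGGREGATION = "aggregation"
    FEEDBACK_LOOP = "feedback_loop"

@dataclass
class Incident:
    axis: Axis        # primary axis a
    annotation: str   # free-text annotation m

def agreement(coder_a: list[Axis], coder_b: list[Axis]) -> float:
    """Cohen's kappa between two annotators' axis labels for the same incidents."""
    return cohen_kappa_score([a.value for a in coder_a],
                             [b.value for b in coder_b])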
4. Distribution
| Axis | Count | Share |
|---|---|---|
| Feedback loops | 71 | 32.7% |
| Tool routing | 53 | 24.4% |
| Prompt construction | 41 | 18.9% |
| Input selection | 33 | 15.2% |
| Aggregation | 19 | 8.8% |
4.1 Examples
- Feedback loop: A code review assistant trained on its own past acceptances developed a 14.6% gap in acceptance rate between code authored by maintainers and code from drive-by contributors (a toy simulation of this amplification dynamic follows these examples).
- Tool routing: A customer-service agent sent non-English queries to a fallback model with a 38% lower resolution rate, which went undetected for nine months.
- Aggregation: A five-judge panel reached a majority decision in 91% of cases, with the dissenting judges' opinions correlated with under-represented stakeholder groups.
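To make the feedback-loop mechanism concrete, the toy simulation below (our illustration; the dynamics and parameters are invented, not estimated from any incident in the corpus) retrains an acceptance policy on its own accepted submissions. Groups A and B submit work of identical quality, yet a small initial skew in the acceptance threshold widens over rounds.

# Toy feedback-loop amplification; all numbers are invented for illustration.
import random

random.seed(0)

def simulate(rounds: int = 10, n: int = 5000, lr: float = 0.5) -> None:
    # Both groups produce submissions of identical underlying quality, but
    # group B starts with a slightly higher acceptance threshold.
    threshold = {"A": 0.50, "B": 0.55}
    for r in range(rounds):
        accepted = {}
        for g in ("A", "B"):
            scores = [random.gauss(0.7, 0.15) for _ in range(n)]
            accepted[g] = sum(s >= threshold[g] for s in scores)
        total = accepted["A"] + accepted["B"]
        # "Retraining" on the agent's own acceptances: a group that contributed
        # more accepted examples gets an easier threshold next round.
        for g in ("A", "B"):
            threshold[g] -= lr * (accepted[g] / total - 0.5)
        gap = (accepted["A"] - accepted["B"]) / n
        print(f"round {r}: acceptance-rate gap = {gap:.3f}")

simulate()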
5. Mitigation Effectiveness
We coded each incident for whether a documented mitigation was deployed and whether it reduced the disparity by more than 50% (our definition of mitigation success). A logistic regression of mitigation success on candidate covariates takes the form
$\mathrm{logit}\,\Pr(\mathrm{success}) = \beta_0 + \beta_{\mathrm{traces}} \cdot \mathrm{traces} + \boldsymbol{\beta}^{\top}\mathbf{z},$
where traces indicates the existence of a human-reviewable agent decision trace and $\mathbf{z}$ collects the remaining covariates. The traces coefficient is statistically significant (Wald test).
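A sketch of how such a descriptive regression could be fit is below; the incidents DataFrame and its success, traces, and axis columns are hypothetical stand-ins for the coded incident data, and the fit uses statsmodels.

# Illustrative sketch of the descriptive regression; column names and the
# `incidents` DataFrame are hypothetical, not the paper's released dataset.
import statsmodels.formula.api as smf

def fit_mitigation_model(incidents):
    """Logistic regression of mitigation success on candidate covariates.

    `incidents` is expected to carry a binary `success` column (disparity
    reduction > 50%), a binary `traces` indicator, and other covariates
    such as the primary axis.
    """
    model = smf.logit("success ~ traces + C(axis)", data=incidents)
    result = model.fit(disp=False)
    # result.summary() reports per-coefficient Wald z-statistics and p-values.
    return result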
# Schema for a minimal decision trace
{
  "agent_id": "router-v3",
  "input_features": {"lang": "es", "channel": "web"},
  "chosen_tool": "qa_model_b",
  "alternatives": [{"id": "qa_model_a", "score": 0.72}],
  "timestamp": "2026-04-01T12:14:33Z"
}
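Given traces in roughly this shape, a reviewer can audit routing decisions for disparities. The helper below is a hypothetical illustration (not part of any incident's tooling); it follows the field names of the example trace and compares how often each input language is routed to each tool.

# Hypothetical audit helper over decision traces in the schema above.
import json
from collections import Counter, defaultdict

def routing_rates_by_language(trace_lines):
    """Return, per input language, the share of traces routed to each tool."""
    counts = defaultdict(Counter)
    for line in trace_lines:
        trace = json.loads(line)
        lang = trace["input_features"].get("lang", "unknown")
        counts[lang][trace["chosen_tool"]] += 1
    return {
        lang: {tool: n / sum(tools.values()) for tool, n in tools.items()}
        for lang, tools in counts.items()
    }

# Example: flag languages routed to a fallback model far more often than others.
# with open("traces.jsonl") as f:
#     rates = routing_rates_by_language(f)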
6. Discussion and Limitations
The corpus is biased toward incidents that became visible; quiet bias goes uncounted. Our coding scheme treats an incident as having a single primary axis, which is a simplification; many incidents involve multiple axes. We did not attempt to estimate causal effects of mitigation; the regression is descriptive.
The finding that traceability strongly correlates with mitigation success suggests that platform-level requirements for agent decision logging may carry outsized fairness benefits.
7. Conclusion
AI-agent-driven bias has structure beyond classifier-level disparities. A five-axis taxonomy organizes the observed incidents, with feedback loops dominating. The single best-supported recommendation we draw is: log agent decisions in a human-reviewable form.
References
- Barocas, S., Hardt, M., Narayanan, A. (2019). Fairness and Machine Learning.
- Sandvig, C. et al. (2014). Auditing Algorithms.
- Raji, I. D. et al. (2020). Closing the AI Accountability Gap.
- Bommasani, R. et al. (2023). Holistic Evaluation of Language Models.