Sybil Resilience in AI Agent Reputation Networks: How Many Fakes Break Trust?
Introduction
The proliferation of autonomous AI agents operating in shared environments—from automated marketplaces to federated learning coalitions—creates a pressing need for robust trust mechanisms[douceur2002sybil]. Reputation systems, where agents accumulate trust through repeated interactions, are a natural solution. However, these systems are vulnerable to Sybil attacks: an adversary who can cheaply create multiple fake identities to inflate its own reputation or deflate competitors'[douceur2002sybil, levine2006survey].
Understanding which reputation algorithms survive Sybil attacks is essential for deploying AI agents in open systems. Prior work has analyzed Sybil resilience theoretically[kamvar2003eigentrust, levine2006survey], but agent-executable experimental comparisons remain scarce. We contribute a reproducible simulation comparing four algorithms across three attack strategies and five attacker population sizes, yielding 156 parameterized simulations with full statistical reporting.
Methodology
Marketplace Model
We simulate a marketplace with 20 honest agents, each with a fixed true quality qᵢ ∈ [0.2, 0.9]. Each round, 5 random honest pairs transact; both parties rate the other as the partner's true quality plus noise, clipped to [0, 1]. A Sybil attacker introduces K fake identities at round 500 (of 5000 total), simulating late-arriving adversaries.
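A single marketplace round can be sketched as follows. This is a minimal illustration, not the submission's implementation: the Gaussian noise scale and the pairing procedure are assumptions.

```python
import random

def run_round(qualities, rng, noise=0.05, pairs_per_round=5):
    """One marketplace round: 5 random honest pairs transact and
    rate each other as true quality plus noise, clipped to [0, 1].
    The noise model here is an illustrative assumption."""
    agents = list(qualities)
    ratings = []  # (rater, ratee, score) triples
    for _ in range(pairs_per_round):
        a, b = rng.sample(agents, 2)
        for rater, ratee in ((a, b), (b, a)):
            score = qualities[ratee] + rng.gauss(0, noise)
            ratings.append((rater, ratee, min(1.0, max(0.0, score))))
    return ratings

rng = random.Random(0)
qualities = {i: 0.2 + 0.7 * i / 19 for i in range(20)}  # quality in [0.2, 0.9]
ledger = run_round(qualities, rng)
```

Each round appends ten rating triples (two per transacting pair) to the ledger that the reputation algorithms later consume.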
Reputation Algorithms
Simple Average. Reputation is the arithmetic mean of all ratings received. No Sybil defense.
Weighted-by-History. Ratings are weighted by the rater's account age: w = (1 + age)². Newer accounts (including Sybils) receive quadratically lower weight, amplifying the advantage of long-standing honest agents.
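A minimal sketch of this scheme, assuming the quadratic weight w = (1 + age)² and a simple weighted mean (the exact normalization in the submission may differ):

```python
def weighted_reputation(ratings, account_age):
    """Weighted-by-history reputation: each (rater, ratee, score)
    rating is weighted by the rater's account age, w = (1 + age)**2.
    Sketch only; assumes a plain weighted mean per ratee."""
    scores = {}
    for rater, ratee, score in ratings:
        w = (1 + account_age[rater]) ** 2
        num, den = scores.get(ratee, (0.0, 0.0))
        scores[ratee] = (num + w * score, den + w)
    return {agent: num / den for agent, (num, den) in scores.items()}
```

With one brand-new rater (age 0, weight 1) and one veteran (age 9, weight 100), the veteran's rating dominates by two orders of magnitude.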
PageRank Trust. We build a directed graph from transactions, where positive ratings create edges from rater to ratee. PageRank (standard damping, 30 iterations) propagates trust through the network, and scores are then normalized to [0, 1].
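The trust propagation can be sketched with a small power iteration. The damping value 0.85 and the dense-matrix form are illustrative assumptions, not the submission's implementation:

```python
import numpy as np

def pagerank_trust(edges, n, d=0.85, iters=30):
    """PageRank over the positive-rating graph (sketch).
    `edges` is a list of (src, dst) pairs; d=0.85 is the
    conventional damping factor, assumed here."""
    # Build a column-stochastic transition matrix
    M = np.zeros((n, n))
    for src, dst in edges:
        M[dst, src] += 1.0
    col_sums = M.sum(axis=0)
    dangling = col_sums == 0
    M[:, ~dangling] /= col_sums[~dangling]
    M[:, dangling] = 1.0 / n  # dangling nodes teleport uniformly
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * M @ r
    return (r - r.min()) / (r.max() - r.min())  # normalize to [0, 1]
```

A node that receives edges from many others accumulates the most trust mass, which is exactly why isolated Sybil clusters score poorly.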
EigenTrust. Following Kamvar et al.[kamvar2003eigentrust], we compute local trust values cᵢⱼ from agent i's ratings about agent j, clip negatives, normalize rows to form the trust matrix C, and iterate t ← (1 − a)·Cᵀt + a·p with mixing weight a and uniform prior p.
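The EigenTrust iteration can be sketched as below; the mixing weight a = 0.1 and the handling of all-zero rows are assumptions for illustration:

```python
import numpy as np

def eigentrust(local_trust, a=0.1, iters=30):
    """EigenTrust sketch: clip negative local trust, row-normalize
    to get C, then iterate t <- (1-a) C^T t + a p with uniform
    prior p. The value a=0.1 is an illustrative assumption."""
    C = np.clip(local_trust, 0.0, None)
    n = C.shape[0]
    row_sums = C.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    C = np.where(row_sums > 0, C / safe, 1.0 / n)  # zero rows -> uniform
    p = np.full(n, 1.0 / n)
    t = p.copy()
    for _ in range(iters):
        t = (1 - a) * C.T @ t + a * p
    return t
```

Because C is row-stochastic and p sums to 1, the trust vector t remains a probability distribution at every iteration.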
Sybil Strategies
Ballot Stuffing. Sybil agents rate each other 0.95--1.0 to inflate mutual reputation.
Bad-Mouthing. Sybil agents rate the top-3 honest agents 0.0--0.1 while inflating each other.
Whitewashing. Sybils give moderate ratings to some honest agents (0.3--0.7) to build credibility while inflating each other. Account ages reset every 500 rounds.
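A strategy can be expressed as a function producing rating triples, following the strategy signature given in the How to Extend section below; ballot stuffing, the simplest case, is sketched here (the per-round call pattern is an assumption):

```python
import random
from typing import List, Tuple

def ballot_stuffing(sybil_agents, honest_agents, rng) -> List[Tuple[int, int, float]]:
    """Ballot stuffing (sketch): each Sybil rates every other
    Sybil 0.95-1.0; honest agents are untouched (the honest_agents
    argument is part of the shared strategy signature)."""
    ratings = []
    for rater in sybil_agents:
        for ratee in sybil_agents:
            if rater != ratee:
                ratings.append((rater, ratee, rng.uniform(0.95, 1.0)))
    return ratings
```

Bad-mouthing and whitewashing would follow the same signature, additionally emitting low ratings for top honest agents or moderate ratings for random honest agents, respectively.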
Metrics
We evaluate four metrics over honest agents only (except detection rate):
- Reputation Accuracy: Spearman rank correlation between reputation scores and true quality.
- Sybil Detection Rate: Fraction of Sybil agents with reputation below the honest median.
- Honest Welfare: Mean reputation score of honest agents.
- Market Efficiency: Normalized Kendall τ between reputation and quality rankings, mapped to [0, 1].
All experiments use 3 seeds per configuration for variance estimation.
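The Sybil detection metric, for instance, reduces to a median comparison; a minimal sketch (the dict-based reputation representation is an assumption):

```python
import statistics

def sybil_detection_rate(rep, honest_ids, sybil_ids):
    """Fraction of Sybil agents whose reputation falls below the
    median reputation of honest agents."""
    honest_median = statistics.median(rep[i] for i in honest_ids)
    flagged = sum(1 for s in sybil_ids if rep[s] < honest_median)
    return flagged / len(sybil_ids)
```

A rate of 1.0 means every Sybil ranks below the typical honest agent; 0.0 means the attack fully evaded the metric.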
Results
Reputation Accuracy
Reputation accuracy (Spearman ρ) by algorithm and Sybil count, averaged over 3 seeds and all strategies.
| Algorithm | K=0 | K=2 | K=5 | K=10 | K=20 |
|---|---|---|---|---|---|
| Simple Average | 0.999 | 0.725 | 0.712 | 0.708 | 0.699 |
| Weighted History | 0.999 | 0.767 | 0.742 | 0.742 | 0.736 |
| PageRank Trust | 0.994 | 0.989 | 0.983 | 0.976 | 0.977 |
| EigenTrust | 0.979 | 0.971 | 0.969 | 0.969 | 0.971 |
The table reveals a stark divide. All algorithms achieve near-perfect accuracy (ρ ≥ 0.979) without Sybils. With attackers (K=2), simple average drops sharply to 0.725; weighted history fares slightly better at 0.767 due to its quadratic age weighting. At K=20 (equal to the honest population), the graph-based algorithms lose less than 2.5% accuracy; simple average loses 30%, while weighted history loses 26%.
Sybil Detection
PageRank achieves perfect detection (1.000) across all strategies and K values. EigenTrust detects 33% of Sybils on average—better than random but imperfect. Simple average and weighted history fail completely (0.000), as Sybil agents' inflated mutual ratings push their scores above the honest median.
Strategy Comparison
Bad-mouthing is the most damaging strategy, reducing mean accuracy to 0.607 at K=20 (averaged across algorithms). Ballot stuffing (0.992) is the least damaging because it only inflates Sybil scores without directly depressing honest agent rankings. Whitewashing (0.947) falls between them: Sybil account resets eliminate accumulated reputation but do not directly attack honest agents, and the weighted-history algorithm strongly neutralizes whitewashers by discounting their low-age ratings.
Honest Welfare and Efficiency
Honest welfare remains stable (0.39--0.59) across conditions for the graph-based algorithms. Market efficiency tracks accuracy closely: PageRank maintains efficiency above 0.95 at K=20, while simple average drops to 0.87.
Discussion
When account age helps and when it fails. Weighted-by-history uses a quadratic age weight (w = (1 + age)²), giving honest agents with thousands of rounds a large weight advantage. Against ballot stuffing and bad-mouthing, where Sybils maintain consistent identities, this provides only modest benefit: Sybils join at round 500, so at round 5000 their weight ((1 + 4500)²) is 81% of honest agents' ((1 + 5000)²), insufficient to blunt the attack. Against whitewashing, however, the algorithm excels: each time Sybils reset their account age to zero, their weight drops to 1 versus roughly 2.5 × 10⁷ for honest agents, making their injected ratings negligible. This demonstrates that account-age weighting is a selective defense: it fails against sustained-identity attacks but is highly effective against identity-cycling strategies.
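The arithmetic behind this asymmetry is worth checking directly, assuming the quadratic weight w = (1 + age)²:

```python
# Worked check of the account-age weight arithmetic,
# assuming w = (1 + age)**2:
honest_w = (1 + 5000) ** 2   # honest agent active since round 0
sybil_w = (1 + 4500) ** 2    # consistent Sybil that joined at round 500
print(sybil_w / honest_w)    # ~0.81: Sybils keep 81% of honest weight
reset_w = (1 + 0) ** 2       # whitewasher just after an age reset
print(reset_w / honest_w)    # ~4e-8: reset ratings are negligible
```

An 0.81 weight ratio barely discounts a sustained Sybil, while a reset collapses the ratio by seven orders of magnitude.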
Graph structure as defense. PageRank and EigenTrust succeed because they propagate trust through the transaction graph. Sybil agents, transacting only among themselves, form a weakly connected cluster that receives little trust flow from the honest network. This finding aligns with theoretical results on social-graph-based Sybil defenses[levine2006survey].
AI safety implications. As AI agents are deployed in open systems—multi-agent marketplaces, decentralized AI coordination, federated learning—Sybil attacks threaten the trust infrastructure. Our results suggest that deploying graph-based reputation (PageRank or EigenTrust) is essential for any multi-agent system where identity creation is cheap.
Limitations. Our simulation assumes honest agents always rate truthfully, Sybil agents have fixed strategies, and the transaction graph is random. Real systems feature strategic honest agents, adaptive adversaries, and structured interaction patterns. Future work should explore adaptive Sybil strategies that learn to evade detection and heterogeneous honest agent behavior.
Conclusion
We presented an agent-executable experiment comparing four reputation algorithms against three Sybil attack strategies across five attacker population sizes.
The key finding is a sharp resilience divide: graph-based algorithms (PageRank, EigenTrust) maintain accuracy above 0.97 with equal numbers of Sybil and honest agents, while simple averaging degrades by 30%.
Account-age weighting provides strategy-dependent defense: negligible against ballot stuffing and bad-mouthing, but highly effective against whitewashing (accuracy 0.98 vs. 0.87 for simple average at K=20).
Bad-mouthing is the most damaging strategy across all algorithms, and PageRank achieves perfect Sybil detection.
The full experiment (156 simulations) is reproducible via a single SKILL.md and runs in under 4 minutes.
References
[douceur2002sybil] J. R. Douceur, "The Sybil Attack," in Proc. IPTPS, 2002, pp. 251--260.
[kamvar2003eigentrust] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, "The EigenTrust Algorithm for Reputation Management in P2P Networks," in Proc. WWW, 2003, pp. 640--651.
[levine2006survey] B. N. Levine, C. Shields, and N. B. Margolin, "A Survey of Solutions to the Sybil Attack," Technical Report 2006-052, UMass Amherst, 2006.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: sybil-reputation
description: Simulate Sybil attacks on multi-agent reputation networks. Tests 4 reputation algorithms (simple average, weighted-by-history, PageRank trust, EigenTrust) against 3 Sybil strategies (ballot stuffing, bad-mouthing, whitewashing) across 5 attacker counts. Measures reputation accuracy, Sybil detection, honest welfare, and market efficiency.
allowed-tools: Bash(python *), Bash(python3 *), Bash(pip *), Bash(.venv/*), Bash(cat *), Read, Write
---
# Sybil Resilience in AI Agent Reputation Networks
This skill simulates Sybil attacks on multi-agent reputation systems and measures which reputation algorithms are most resilient. It runs 156 simulations across a full parameter grid with multiprocessing.
## Prerequisites
- Requires **Python 3.10+**. No internet access needed (pure simulation).
- Expected runtime: **2-4 minutes** on a modern machine (12 cores).
- All commands must be run from the **submission directory** (`submissions/sybil-reputation/`).
## Step 0: Get the Code
Clone the repository and navigate to the submission directory:
```bash
git clone https://github.com/davidydu/Claw4S.git
cd Claw4S/submissions/sybil-reputation/
```
All subsequent commands assume you are in this directory.
## Step 1: Environment Setup
Create a virtual environment and install requirements:
```bash
python3 -m venv .venv
.venv/bin/pip install --upgrade pip
.venv/bin/pip install -r requirements.txt
```
Verify the local modules are importable:
```bash
.venv/bin/python -c "from src.simulation import run_single_sim; import validate, pytest; print('Environment OK')"
```
Expected output: `Environment OK`
## Step 2: Run Unit Tests
Verify the simulation modules work correctly:
```bash
.venv/bin/python -m pytest tests/ -v
```
(`pytest` is provided as a local module in this submission for offline execution.)
Expected: `31 passed` and exit code 0.
## Step 3: Run Diagnostic
Sanity-check with a small simulation grid before the full experiment:
```bash
.venv/bin/python run.py --diagnostic
```
Expected: Prints 4 diagnostic result rows (algorithm, K value, and four metrics) and exits with code 0.
## Step 4: Run the Full Experiment
Execute the 156-simulation grid (4 algorithms x 3 strategies x 5 Sybil counts x 3 seeds, with K=0 baselines):
```bash
.venv/bin/python run.py
```
Expected: Script prints `[3/3] Saved results to results/results.json` and generates `results/report.md`. Runtime ~2-4 minutes.
This runs:
1. 20 honest agents with true quality in [0.2, 0.9]
2. Sybil agents (K=0,2,5,10,20) join at round 500 of 5000
3. Honest agents trade and rate each other; Sybils inject fake ratings
4. Reputation computed via each algorithm after all rounds
5. Four metrics evaluated: reputation accuracy, Sybil detection rate, honest welfare, market efficiency
## Step 5: Validate Results
Check that results are complete and scientifically sound:
```bash
.venv/bin/python validate.py
```
Expected: `Validation passed.` with 156 simulations, baseline accuracy > 0.5.
## Step 6: Review the Report
Read the generated report:
```bash
cat results/report.md
```
Expected: Four tables (accuracy, detection, welfare, efficiency) plus key findings. In typical runs, PageRank remains the top performer at high Sybil counts, while simple average degrades notably under attack.
## How to Extend
- **Add algorithms:** Implement a new function in `src/reputation.py` matching the signature `(agents, ledger) -> Dict[int, float]` and register it in the `ALGORITHMS` dict.
- **Add strategies:** Implement in `src/sybil_strategies.py` matching `(sybil_agents, honest_agents, rng) -> List[Tuple[int, int, float]]` and register in `STRATEGIES`.
- **Change parameters:** Edit `src/experiment.py` constants: `N_HONEST`, `SYBIL_COUNTS`, `SEEDS`, `N_ROUNDS`.
- **Scale up:** Increase `N_ROUNDS` for more statistical power, or add more seeds for tighter confidence intervals.
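A new algorithm following the extension signature above might look like this hypothetical median-based variant (`median_reputation` is an illustrative name, not part of the submission; the `(rater, ratee, score)` ledger shape is an assumption):

```python
import statistics
from typing import Dict, Iterable, List, Tuple

def median_reputation(agents: List[int],
                      ledger: Iterable[Tuple[int, int, float]]) -> Dict[int, float]:
    """Hypothetical example algorithm matching the extension
    signature: reputation as the median (not mean) of received
    ratings, defaulting to 0.0 for unrated agents."""
    received: Dict[int, List[float]] = {}
    for rater, ratee, score in ledger:
        received.setdefault(ratee, []).append(score)
    return {a: statistics.median(received.get(a, [0.0])) for a in agents}

# Registration sketch (in src/reputation.py):
# ALGORITHMS["median"] = median_reputation
```

Once registered, the new algorithm is swept across the same strategy and K grid automatically by `run.py`.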