Incorporating AI-Specific and Public Sector Failure Modes into Government AI Investment Appraisal: A Monte Carlo Simulation Framework Applied to Tax and Municipal Services
Introduction
Government agencies evaluating AI investments typically use deterministic ROI calculations that assume on-time, on-budget delivery with full adoption. These calculations ignore well-documented risk factors specific to both public sector procurement and AI technology deployment. This paper presents a Monte Carlo simulation framework that incorporates two categories of empirically grounded failure modes: (1) general government project risks documented in public administration literature, and (2) AI-specific technical risks absent from standard ROI tools.
We apply the framework to two illustrative case studies — tax administration in Brazil and municipal services in Saudi Arabia — to demonstrate how failure-adjusted projections differ from deterministic estimates. We present these as illustrative applications of the methodology, not as generalizable findings.
Risk Taxonomy
Category 1: Government Project Risks
These risks apply to any large-scale government technology project:
| Risk Factor | Distribution | Calibration Source |
|---|---|---|
| Procurement delay | Uniform(6, 24) months | OECD, Government at a Glance 2023, Chapter 9: Public Procurement |
| Cost overrun | Bernoulli(0.45) × Uniform(1.1, 1.6) | Standish Group, CHAOS Report 2020 — 45% of large IT projects exceed budget |
| Political defunding | Annual Bernoulli(0.03-0.05) | Flyvbjerg, "Survival of the Unfittest," Oxford Review of Economic Policy 25(3), pp. 344-367, 2009 |
| Adoption ceiling | Uniform(0.65, 0.85) | World Bank, GovTech Maturity Index 2022 — government systems rarely achieve full adoption |
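The Category 1 distributions can be drawn as in the following minimal sketch. It is an illustration rather than the authors' code: the function name, the dictionary keys, and the 10-year horizon default are hypothetical, and the cost overrun multiplier is assumed to be 1.0 in simulations where no overrun occurs.

```python
import numpy as np

def sample_government_risks(rng, n_sims, horizon_years=10, p_defund=0.05):
    """Draw one set of Category 1 risk factors per simulation (illustrative)."""
    # Procurement delay: Uniform(6, 24) months.
    delay_months = rng.uniform(6, 24, size=n_sims)
    # Cost overrun: Bernoulli(0.45) x Uniform(1.1, 1.6).
    # Assumption: the multiplier is 1.0 when the Bernoulli draw is 0.
    overrun_hit = rng.random(n_sims) < 0.45
    cost_multiplier = np.where(overrun_hit, rng.uniform(1.1, 1.6, size=n_sims), 1.0)
    # Political defunding: independent annual Bernoulli draws.
    # Record the first year hit, or horizon_years + 1 if never defunded.
    hits = rng.random((n_sims, horizon_years)) < p_defund
    defund_year = np.where(hits.any(axis=1), hits.argmax(axis=1) + 1, horizon_years + 1)
    # Adoption ceiling: Uniform(0.65, 0.85).
    adoption_ceiling = rng.uniform(0.65, 0.85, size=n_sims)
    return {
        "delay_months": delay_months,
        "cost_multiplier": cost_multiplier,
        "defund_year": defund_year,
        "adoption_ceiling": adoption_ceiling,
    }

draws = sample_government_risks(np.random.default_rng(0), n_sims=5000)
```

Passing `p_defund=0.03` or `p_defund=0.05` covers the two ends of the annual defunding range in the table.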
Category 2: AI-Specific Technical Risks
These risks are unique to AI/ML deployments and absent from standard government IT risk frameworks:
| Risk Factor | Distribution | Rationale |
|---|---|---|
| Data drift requiring retraining | Annual Bernoulli(0.30) × cost of retraining cycle | ML models degrade as input data distributions shift; government data changes with policy and demographics. Sculley et al., "Hidden Technical Debt in ML Systems," NeurIPS 2015 |
| Algorithmic bias litigation/remediation | Annual Bernoulli(0.10) × Uniform(5M, 50M) remediation cost | Government AI systems face public scrutiny and legal challenge on fairness. Obermeyer et al., "Dissecting racial bias," Science 366, 2019 |
| Specialized talent scarcity premium | Multiplier Uniform(1.2, 1.8) on personnel costs | Government AI teams compete with private sector for ML engineers at 1.2-1.8x standard IT salary levels. OECD, OECD Skills Outlook 2023 |
| Model performance degradation | Annual decay factor Uniform(0.90, 0.98) on benefits | Without continuous retraining, ML model accuracy declines. Estimated 2-10% annual degradation based on deployment context |
| AI vendor concentration risk | Bernoulli(0.05) × 6-month benefit interruption | Dependency on single AI vendor creates supply chain risk |
These AI-specific factors compound with standard government risks, increasing the gap between deterministic and failure-adjusted projections.
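A comparable sketch for the Category 2 factors follows, for a single simulation. Several details are assumptions because the table does not fix them: the retraining cycle cost (`retrain_cost_m`) is a hypothetical placeholder, remediation costs are expressed in millions of local currency, and the vendor interruption is modeled as a single 5% draw that removes half a year of benefits in one randomly chosen year.

```python
import numpy as np

def sample_ai_risks(rng, horizon_years=10, retrain_cost_m=10.0):
    """Draw one simulation's Category 2 factors (costs in millions, illustrative)."""
    years = np.arange(1, horizon_years + 1)
    # Data drift: 30% chance each year of paying for one retraining cycle.
    drift_cost = (rng.random(horizon_years) < 0.30) * retrain_cost_m
    # Bias litigation/remediation: 10% chance each year, cost ~ Uniform(5, 50).
    bias_cost = (rng.random(horizon_years) < 0.10) * rng.uniform(5.0, 50.0, horizon_years)
    # Talent scarcity premium applied to personnel costs.
    talent_multiplier = rng.uniform(1.2, 1.8)
    # Model degradation: one annual decay factor, compounded year over year.
    decay = rng.uniform(0.90, 0.98)
    benefit_factor = decay ** years
    # Vendor concentration (assumed form): one 5% draw over the horizon,
    # losing 6 months of benefits in a randomly chosen year.
    if rng.random() < 0.05:
        benefit_factor[rng.integers(horizon_years)] *= 0.5
    return {
        "extra_cost": drift_cost + bias_cost,
        "talent_multiplier": talent_multiplier,
        "benefit_factor": benefit_factor,
    }

ai_draw = sample_ai_risks(np.random.default_rng(1))
```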
Methodology
Monte Carlo Simulation
We run 5,000 simulations per case study. Each simulation samples from all risk distributions simultaneously and computes NPV at government-appropriate discount rates:
$$\mathrm{NPV}_i = \sum_{t=0}^{T} \frac{B_t \cdot \alpha_i(t) \cdot m_i \cdot d_i^t - C_t \cdot o_i}{(1+r)^t}$$
where:
- $\alpha_i(t)$ is the adoption S-curve with sampled ceiling and procurement delay
- $m_i$ is the sampled benefit multiplier
- $d_i$ is the annual model degradation factor
- $o_i$ is the cost overrun multiplier
- Benefits and costs are zeroed after any sampled defunding year
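The NPV expression above translates directly into code for a single simulation. The sketch below is illustrative: the function and argument names are hypothetical, and "zeroed after any sampled defunding year" is read as zeroing every year strictly later than the defunding year.

```python
import numpy as np

def simulate_npv(benefits, costs, adoption, m, d, o, r, defund_year=None):
    """NPV of one simulation: sum_t (B_t * alpha(t) * m * d^t - C_t * o) / (1 + r)^t."""
    benefits = np.asarray(benefits, dtype=float)
    costs = np.asarray(costs, dtype=float)
    adoption = np.asarray(adoption, dtype=float)
    t = np.arange(len(benefits))
    net = benefits * adoption * m * d ** t - costs * o
    if defund_year is not None:
        # Assumption: flows in years after the defunding year are zero.
        net = np.where(t > defund_year, 0.0, net)
    return float(np.sum(net / (1.0 + r) ** t))

# Risk-free check using the Brazil inputs: BRL 450M up front, then
# BRL 1,700M per year for 10 years at full adoption, discounted at 8%.
benefits = np.array([0.0] + [1700.0] * 10)
costs = np.array([450.0] + [0.0] * 10)
npv_flat = simulate_npv(benefits, costs, np.ones(11), m=1.0, d=1.0, o=1.0, r=0.08)
```

With full adoption from year 1 and no risk factors this gives an upper bound; a deterministic estimate that follows an adoption S-curve will be lower.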
The adoption S-curve follows a logistic function:
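The closed form is not reproduced here, so the following is only a generic logistic sketch consistent with the description (a sampled ceiling, shifted by the procurement delay). The steepness and midpoint parameters are assumptions introduced for illustration and do not come from the paper.

```python
import numpy as np

def adoption_curve(t_years, ceiling, delay_months, steepness=1.0, midpoint_years=2.0):
    """Logistic adoption share at time t, zero until the procurement delay has elapsed.

    steepness and midpoint_years are hypothetical shape parameters.
    """
    t_eff = np.asarray(t_years, dtype=float) - delay_months / 12.0
    curve = ceiling / (1.0 + np.exp(-steepness * (t_eff - midpoint_years)))
    return np.where(t_eff < 0, 0.0, curve)

# Adoption over a 10-year horizon with a 0.75 ceiling and a 12-month delay.
alpha = adoption_curve(np.arange(11), ceiling=0.75, delay_months=12)
```

Under this form, adoption reaches half the ceiling `midpoint_years` after the delay ends and approaches the ceiling asymptotically.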
Input Parameter Estimation
Investment and benefit estimates are derived from comparable government technology procurements:
Brazil — Tax Administration AI:
- Investment: BRL 450M. Derived from Brazil's Receita Federal technology modernization budget allocations (BRL 300-500M range for major system overhauls, per Receita Federal Annual Report 2023) and comparable international tax AI procurement scales (HMRC Connect: GBP 100M+, ATO analytics: AUD 200M+).
- Annual benefit estimate: BRL 1,700M at full adoption. Composed of: (a) revenue uplift of BRL 1,100M based on 0.05% of BRL 2.2T tax collection — the benchmark reference is HMRC Connect which achieved approximately 1.5% uplift (reported in UK NAO, HMRC's Approach to Tackling Tax Evasion and Avoidance, HC 978, Session 2022-23, p. 24); we apply a deep discount to account for Brazil's more complex tax environment; (b) BRL 600M in operational efficiency from audit targeting, error reduction, and compliance deterrence effects.
- Benefit multiplier: Uniform(0.5, 1.5). This represents uncertainty around the already-discounted base estimate, not around the original benchmark. Since our base estimate (0.05%) is already at 1/30th of the HMRC benchmark (1.5%), the multiplier captures whether actual performance is even lower than our conservative base (0.5x) or somewhat better (1.5x) — noting that even at 1.5x, the estimate remains at only 1/20th of the HMRC benchmark. The range is consistent with the general parameter uncertainty ranges specified in UK HM Treasury, The Green Book 2022, Supplementary Guidance, for early-stage appraisals where empirical calibration data is unavailable.
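The arithmetic behind the Brazil benefit estimate can be checked in a few lines. The variable names are illustrative; all figures are taken from the bullets above.

```python
# All monetary values in BRL millions.
tax_collection_m = 2_200_000      # BRL 2.2T
base_uplift_rate = 0.0005         # 0.05% revenue uplift (base estimate)
hmrc_uplift_rate = 0.015          # approx. 1.5% uplift (HMRC Connect benchmark)
efficiency_gain_m = 600           # operational efficiency component

revenue_uplift_m = tax_collection_m * base_uplift_rate      # about 1,100
annual_benefit_m = revenue_uplift_m + efficiency_gain_m     # about 1,700

# How far the base and the 1.5x case sit below the benchmark.
benchmark_ratio_base = hmrc_uplift_rate / base_uplift_rate          # about 30
benchmark_ratio_high = hmrc_uplift_rate / (1.5 * base_uplift_rate)  # about 20

print(revenue_uplift_m, annual_benefit_m, benchmark_ratio_base, benchmark_ratio_high)
```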
Saudi Arabia — Municipal Services AI:
- Investment: SAR 280M. Estimated based on comparable international municipal AI procurement scales (Singapore BCA CORENET: SGD 150M+; Dubai Smart Dubai operations investments of similar magnitude reported in municipal annual reports) and Saudi government technology spending patterns documented in OECD Government at a Glance 2023.
- Annual benefit estimate: SAR 470M at full adoption. Composed of: (a) SAR 250M labor cost savings from 20% reduction in expatriate municipal operations workforce — conservative relative to Singapore's reported 35% operational efficiency gains (Singapore BCA, Annual Report 2022/2023); (b) SAR 220M in processing efficiency, fee acceleration, and error reduction.
- Benefit multiplier: Uniform(0.5, 1.5), same basis as Brazil — uncertainty around the already-conservative base estimate.
Discount Rates
- Brazil: 8%. Reflects Brazilian sovereign risk premium. The Brazilian Central Bank SELIC rate was 10.5% in Q3 2024; 8% represents a real discount rate appropriate for long-term government investment appraisal.
- Saudi Arabia: 6%. Reflects lower sovereign risk and sovereign wealth fund benchmark returns. Saudi Arabia's credit rating (Fitch: A+) supports a lower risk premium.
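To make the effect of the two discount rates concrete, the sketch below computes the present value of one currency unit received at the end of each of 10 years. The helper name is hypothetical and the calculation is a standard annuity factor, not something specific to this framework.

```python
def annuity_factor(rate, years):
    """Present value of 1 unit received at the end of each year for `years` years."""
    return sum(1.0 / (1.0 + rate) ** t for t in range(1, years + 1))

factor_brazil = annuity_factor(0.08, 10)   # roughly 6.71
factor_saudi = annuity_factor(0.06, 10)    # roughly 7.36
print(round(factor_brazil, 2), round(factor_saudi, 2))
```

A flat benefit stream is therefore worth about 10% more in present-value terms at 6% than at 8%, before any risk adjustment.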
Results
Case Study 1: Brazil Tax Administration
| Metric | Deterministic | Failure-Adjusted (MC Median) |
|---|---|---|
| NPV (10yr, 8%) | BRL 8,420M | BRL 3,361M |
| IRR | 125% | 50% |
| BCR | 9.8:1 | 4.0:1 |
| P(NPV > 0) | 100% (assumed) | 81.5% |
| P5 (5th percentile) | N/A | BRL -679M |
| P95 (95th percentile) | N/A | BRL 5,535M |
Sensitivity ranking: Adoption ceiling (highest impact), benefit multiplier, procurement delay, model degradation rate, cost overrun. The dominance of adoption and benefit parameters over cost parameters indicates that the primary risk is whether the system achieves operational integration, not whether it can be built.
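The ranking method is not specified in the text. One common approach, sketched here as an assumption, orders inputs by the absolute Spearman rank correlation between each sampled parameter and the simulated NPV. The toy data below is synthetic and only demonstrates the mechanics.

```python
import numpy as np

def rank_correlation(x, y):
    """Spearman rank correlation for continuous samples (no tie handling)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def rank_sensitivities(inputs, npv):
    """Order input names by |rank correlation| with NPV, strongest first."""
    scores = {name: abs(rank_correlation(vals, npv)) for name, vals in inputs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Synthetic example: NPV driven strongly by one input and weakly by another.
rng = np.random.default_rng(0)
strong = rng.uniform(0, 1, 2000)
weak = rng.uniform(0, 1, 2000)
toy_npv = 10 * strong + 0.1 * weak
ranking = rank_sensitivities({"strong": strong, "weak": weak}, toy_npv)
```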
Case Study 2: Saudi Arabia Municipal Services
| Metric | Deterministic | Failure-Adjusted (MC Median) |
|---|---|---|
| NPV (10yr, 6%) | SAR 2,870M | SAR 1,119M |
| IRR | 82% | 38% |
| BCR | 5.8:1 | 2.5:1 |
| P(NPV > 0) | 100% (assumed) | 84.5% |
| P5 | N/A | SAR -378M |
| P95 | N/A | SAR 1,468M |
Saudi Arabia shows slightly higher P(NPV>0) than Brazil despite lower BCR, driven by lower political defunding risk (3% vs 5%) attributable to the centralized Vision 2030 mandate.
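The failure-adjusted columns in the tables above are summary statistics over the simulated NPVs. A minimal sketch of that step, with a hypothetical function name and placeholder input data:

```python
import numpy as np

def summarize_npv(npv_samples):
    """Median, 5th/95th percentiles, and share of simulations with positive NPV."""
    npv = np.asarray(npv_samples, dtype=float)
    return {
        "median": float(np.median(npv)),
        "p5": float(np.percentile(npv, 5)),
        "p95": float(np.percentile(npv, 95)),
        "p_positive": float(np.mean(npv > 0)),
    }

# Placeholder array standing in for 5,000 simulated NPVs.
summary = summarize_npv(np.arange(-20, 80))
```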
Context: Historical Government IT Outcomes
We compare our failure-adjusted BCRs against published outcomes from comparable (not identical) government technology programs:
| Program | BCR | Source |
|---|---|---|
| HMRC Connect (tax analytics, UK) | 10-15:1 | UK NAO HC 978, 2022-23 |
| Singapore BCA CORENET (permits) | 2.8:1 | Singapore BCA Annual Report 2022/23 |
| India Aadhaar (identity platform) | 2.0:1 | World Bank Independent Evaluation Group, 2023 |
| Brazil case study (adjusted) | 4.0:1 | This paper |
| Saudi case study (adjusted) | 2.5:1 | This paper |
Our estimates fall within the range of historical outcomes. This suggests plausibility but does not constitute validation — the comparison is between different project types, scales, and institutional contexts.
Discussion
Contribution and Scope
This paper contributes a simulation framework, not a generalizable finding. The deterministic-vs-adjusted gaps observed in our two case studies (ratios of 2.5x and 2.6x) are illustrative of the methodology's output, not a universal correction factor. Establishing a reliable correction factor would require application to a large sample of completed government AI projects with known outcomes — a dataset that does not yet exist.
AI-Specific vs General IT Risks
The inclusion of AI-specific risk factors (data drift, algorithmic bias, talent scarcity, model degradation, vendor concentration) distinguishes this framework from standard government IT risk assessment. To quantify their marginal impact, we ran the Monte Carlo twice for each case study: once with only government project risks (Category 1) and once with both categories. The difference in median NPV was 12% for Brazil and 9% for Saudi Arabia, a 9-12% additional reduction driven primarily by model degradation (cumulative accuracy loss reducing benefits by 15-45% over 10 years) and talent cost premiums (20-80% higher personnel costs).
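The two-run comparison can be sketched with a deliberately simplified model. This toy version is not the paper's simulator: it keeps only the adoption ceiling, cost overrun, and (optionally) model degradation, uses the Brazil inputs, and omits the S-curve, procurement delay, defunding, and the other AI-specific factors, so it will not reproduce the reported percentages. Drawing the shared factors first with the same seed keeps them identical across both runs.

```python
import numpy as np

def median_npv(include_ai_risks, n_sims=2000, seed=0):
    """Median NPV (BRL M) of a toy model, with or without model degradation."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, 11)
    # Shared Category 1 draws come first so both runs see the same values.
    ceiling = rng.uniform(0.65, 0.85, n_sims)[:, None]
    overrun = np.where(rng.random(n_sims) < 0.45, rng.uniform(1.1, 1.6, n_sims), 1.0)
    # Category 2 (degradation only): annual decay factor, compounded.
    decay = rng.uniform(0.90, 0.98, n_sims)[:, None] if include_ai_risks else 1.0
    benefits = 1700.0 * ceiling * decay ** t
    pv_benefits = (benefits / 1.08 ** t).sum(axis=1)
    return float(np.median(pv_benefits - 450.0 * overrun))

npv_gov_only = median_npv(include_ai_risks=False)
npv_both = median_npv(include_ai_risks=True)
marginal_reduction = 1.0 - npv_both / npv_gov_only
```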
Limitations
- Two case studies provide illustration, not evidence of generalizability. The framework must be applied to additional sectors, countries, and — critically — retrospectively to completed projects before any generalizable conclusions can be drawn.
- Input parameters are estimates. While we derive them from published benchmarks with documented reasoning, they are not verified against actual project data from Brazil or Saudi Arabia.
- The benefit multiplier range is based on HM Treasury guidance for general IT projects. AI-specific benefit uncertainty may follow a different distribution. Empirical calibration from completed government AI deployments would improve this parameter.
- AI risk factor distributions are estimated from general ML deployment literature, not government-specific studies. Government AI deployment failure modes may differ systematically from private sector patterns documented in the ML literature.
- No ex-post validation. The framework has not been tested against actual government AI project outcomes.
Conclusion
We present a Monte Carlo simulation framework for government AI investment appraisal that incorporates both standard public sector project risks and AI-specific technical risks. Application to two case studies demonstrates substantial gaps between deterministic and failure-adjusted projections, driven primarily by adoption uncertainty and benefit estimation rather than cost factors. The framework is intended as a practical tool for government analysts preparing investment cases, not as a source of generalizable correction factors. Validation against actual completed government AI project outcomes is the necessary next step.
References (all 2024 or earlier)
- Standish Group, "CHAOS Report 2020: Beyond Infinity," The Standish Group International, 2020.
- Flyvbjerg B., "Survival of the Unfittest: Why the Worst Infrastructure Gets Built," Oxford Review of Economic Policy 25(3), pp. 344-367, 2009.
- UK HM Treasury, "The Green Book: Central Government Guidance on Appraisal and Evaluation," 2022.
- OECD, "Government at a Glance 2023," OECD Publishing, Paris, 2023.
- World Bank, "GovTech Maturity Index," Washington DC, 2022.
- UK National Audit Office, "HMRC's Approach to Tackling Tax Evasion and Avoidance," HC 978, Session 2022-23.
- Singapore Building and Construction Authority, "Annual Report 2022/2023," 2023.
- Sculley D. et al., "Hidden Technical Debt in Machine Learning Systems," Advances in Neural Information Processing Systems 28, 2015.
- Obermeyer Z. et al., "Dissecting racial bias in an algorithm," Science 366(6464), pp. 447-453, 2019.
- OECD, "OECD Skills Outlook 2023," OECD Publishing, Paris, 2023.
- OECD, "Tax Administration 2023," OECD Publishing, Paris, 2023.
- Frey C.B. & Osborne M.A., "The Future of Employment," Technological Forecasting and Social Change 114, pp. 254-280, 2017.
- Janssen M. et al., "Data governance: Organizing data for trustworthy AI," Government Information Quarterly 37(3), 2020.
- IMF, "World Economic Outlook Database," October 2024.
- IBGE, "Continuous National Household Sample Survey (PNAD Continua)," July 2024.
- Longinotti F.P., "Collection Efficiency and the Tax Gap in Latin America and the Caribbean," CIAT Working Document No. 5866, 2024.
- Chambers and Partners, "Tax Controversy 2024: Brazil," Global Practice Guides, 2024.
- CNJ, "Justica em Numeros 2024," Conselho Nacional de Justica, Brasilia, 2024.
- UN DESA, "E-Government Survey 2024," United Nations, September 2024.
- GASTAT, "Labour Force Survey Q3 2024," General Authority for Statistics, Saudi Arabia, 2024.
- World Bank Independent Evaluation Group, "Identification for Development (ID4D) Initiative," 2023.
- Receita Federal do Brasil, "Relatorio Anual de Atividades 2023," Brasilia, 2023.
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: govai-scout
description: >
  Monte Carlo framework for government AI investment appraisal incorporating
  both standard public sector risks (Standish CHAOS, Flyvbjerg defunding) and
  AI-specific technical risks (data drift, algorithmic bias, model degradation,
  talent scarcity). Demonstrates gap between deterministic and risk-adjusted
  projections on Brazil and Saudi Arabia case studies.
allowed-tools: Bash(python *), Bash(pip *)
---

# GovAI-Scout: Risk-Adjusted Government AI Investment Appraisal

Monte Carlo framework incorporating 9 empirically-grounded risk factors:

**Government risks:** procurement delay, cost overrun, political defunding, adoption ceiling

**AI-specific risks:** data drift, algorithmic bias, talent scarcity, model degradation, vendor lock-in

```bash
pip install numpy scipy pandas matplotlib seaborn --break-system-packages
python govai_scout_v4.py
```

Results: Brazil tax admin (NPV BRL 3.4B, P(NPV>0) 81.5%), Saudi municipal (NPV SAR 1.1B, P(NPV>0) 84.5%)