Singapore AI Verify: The World's First Government-Built AI Testing Toolkit Explained

Last reviewed: April 30, 2026

AI Verify is the world’s first government-built open-source AI governance testing toolkit. Singapore launched it in 2022 as a pilot, opened it to the global community via the AI Verify Foundation in June 2023, and has since extended it twice — Project Moonshot for LLM evaluation in May 2024, and the Global AI Assurance Sandbox in 2025. The framework converts the abstract principles in Singapore’s Model AI Governance Framework into 11 governance principles, 85 testable criteria, and 4 built-in technical test toolboxes that produce auditable AI Governance Reports. This article is the practitioner walkthrough — what AI Verify covers, how the toolkit works in practice, where Project Moonshot fits, and how the framework maps to the EU AI Act, NIST AI RMF, and ISO/IEC 42001.

Key Takeaways

AI Verify is the only government-built AI assurance platform. Developed by IMDA, governed by the AI Verify Foundation (a global open-source community with 7 premier members and 50+ general members), distributed under open-source license on GitHub.
The framework structures testing around 11 governance principles in 5 focus areas. Transparency, Explainability, Repeatability, Safety, Security, Robustness, Fairness, Data Governance, Accountability, Human Oversight, and Inclusive Growth — 85 testable criteria operationalize them.
Four built-in technical test toolboxes ship with the toolkit. SHAP for explainability, robustness toolbox for adversarial testing, fairness metrics for classification, fairness metrics for regression. Plus extensible plug-in architecture.
Project Moonshot extends AI Verify for LLMs. Combines red-teaming, benchmarking with 100+ datasets, and automated evaluation that integrates into CI/CD. Open-source on GitHub with 316 stars.
AI Verify maps to international frameworks. EU AI Act technical documentation requirements, NIST AI RMF Govern/Map/Measure/Manage functions, OECD AI Principles, G7 Hiroshima Process, ISO/IEC 42001 — making AI Verify operationally useful well beyond Singapore.

Why does AI Verify exist?

The problem AI Verify solves is the gap between AI governance principles and audit-ready evidence. Every major AI governance framework — Singapore’s Model AI Governance Framework, the NIST AI Risk Management Framework, the EU AI Act’s high-risk requirements, the OECD AI Principles — tells organizations what their AI should do. None of them tell organizations how to demonstrate that their AI does it.

AI Verify was launched in May 2022 as a Minimum Viable Product to fill that gap. IMDA piloted the toolkit with 50+ local and multinational companies through 2022 and 2023. The pilot tested whether structured technical and process testing could produce credible governance evidence. It could. On June 7, 2023 — at the Asia Tech x Singapore (ATxSG) conference — Minister Josephine Teo announced the AI Verify Foundation. The foundation opened the toolkit to the global open-source community and structured ongoing development through a multi-stakeholder governance model.

The foundation’s strategic membership reflects the reach AI Verify has achieved. Seven premier members guide direction:

IMDA (Infocomm Media Development Authority — Singapore government)
Aicadium (Temasek’s AI Centre of Excellence)
IBM
Microsoft
Google
Red Hat
Salesforce

Plus 50+ general members across sectors — Adobe, AWS, DBS Bank, Dell, Meta, SenseTime, Singapore Airlines, DataRobot, X0PA, and others. The membership demonstrates that AI Verify is not a Singapore-only artifact; it is technical infrastructure that the major US technology companies have invested in maintaining.

What does the AI Verify Testing Framework cover?

The framework structures testing around 11 governance principles, organized into 5 focus areas. Each principle has both process checks (governance documentation) and technical tests (where applicable). The structure converts abstract governance into auditable evidence.

Five focus areas, eleven principles

Focus Area	Principles
1. Transparency on the Use of AI and AI Systems	Transparency
2. Understanding How AI Models Reach Decisions	Explainability
3. Safety & Resilience of AI System	Repeatability/Reproducibility, Safety, Security, Robustness
4. Fairness / No Unintended Discrimination	Fairness
5. Governance and Accountability	Data Governance, Accountability, Human Oversight, Inclusive Growth/Wellbeing

The 11 principles are derived from international consensus — they map to OECD AI Principles, ISO/IEC 42001, EU AI Act trustworthiness requirements, and NIST AI RMF Trustworthy AI characteristics. AI Verify is not asserting a Singapore-specific view of what AI governance requires; it is implementing the international consensus in a structured testing format.

The 85 testable criteria

Each principle is operationalized through process checks. The framework contains 85 testable criteria across 11 process checklists (one per principle). Each criterion is a specific question with documentary evidence requirements — for example, under Accountability:

Does the organization have a designated AI governance owner? Evidence: organization chart, role description, signed accountability letter.
Is there a process for AI incident management? Evidence: incident response plan, escalation matrix, recent incident logs.
Are AI risks assessed before deployment? Evidence: risk assessment template, completed assessments for production systems.

The criteria are intentionally specific. They convert “the organization should be accountable for AI” (an aspirational statement) into “produce these documents” (an auditable test). This is the framework’s most consequential design choice. Process maturity gets measured by document existence and quality, not by leadership statements about governance posture.

Per-principle structure

Each of the 11 principles follows a consistent four-layer structure:

Layer	Description
Principle	The overarching consideration (e.g., “AI should not unintentionally discriminate”)
Outcomes	Desired outcomes — what the organization wants to achieve
Processes	Actionable steps to achieve outcomes — testable criteria
Evidence	Documentary evidence validating implementation

This four-layer structure is the framework’s auditable spine. Internal teams know what to produce. External auditors know what to verify. Regulators reviewing post-deployment evidence know what to expect.

Four built-in technical test toolboxes

Process checks measure governance maturity. Technical tests measure model behavior. AI Verify ships with four built-in technical test toolboxes that run automatically against your model:

Toolbox	Principle	What It Does
SHAP Toolbox	Explainability (Global)	SHapley Additive exPlanations — feature attribution showing which inputs drive model predictions
Robustness Toolbox	Robustness	Adversarial perturbation testing — measures how model performance degrades under input noise
Fairness Metrics Toolbox (Classification)	Fairness	Statistical parity, equalised odds, calibration metrics across protected attributes for classification models
Fairness Metrics Toolbox (Regression)	Fairness	Same fairness measurement adapted for regression models

Three additional technical tests are available but not in default templates:

Image Corruption Toolbox — tests robustness to image corruption and noise patterns
Partial Dependence Plot — visualizes marginal effect of features on predictions
Accumulated Local Effect — measures how features influence predictions on average

The technical tests run against a serialized model file or folder (black-box testing). The model itself is treated as a closed object; AI Verify tests it through inputs and observed outputs. This matters operationally: developers using AI Verify do not need to expose model weights or architecture details to the testing toolkit. The Foundation does not see your model. The output is yours.

Extensibility

AI Verify supports adding new technical tests through plug-ins:

Veritas Toolkit (MAS financial sector responsible AI methodology) is integrated as an extension. FIs running MAS AI Risk Management compliance can run Veritas FEAT assessments inside AI Verify.
CCCS plug-in — Competition and Consumer Commission of Singapore contributed an extension for competition-relevant assessments.
Custom modules — third-party developers can create proprietary test modules. AI Verify’s developer-tools repository (aiverify-foundation/aiverify-developer-tools) provides scaffolding.

How does the AI Verify Toolkit work in practice?

The toolkit converts the framework into runnable software. The workflow is parallelizable — technical teams run technical tests while compliance teams complete process checklists, and both feed into a single output report.

Workflow

Start with end-goal: design the report. AI Verify uses a customizable report canvas. Select which principles you want to report against; the toolkit determines which technical tests and process checks are required.
Provide inputs. Serialized model file or folder. Test datasets appropriate for the tests being run. Test arguments (configuration). User inputs and documentary evidence for process checks.
Run tests in parallel. Technical tests execute against the model. Process check responses are entered by compliance teams independently. The two paths combine into the final report without blocking each other.
Generate the AI Governance Report. A comprehensive document showing performance against selected principles. Quantitative test results (fairness scores, robustness scores, explainability visualizations). Process check completion status. Custom layout per organization needs.

Important: only reports generated by the official AI Verify Toolkit without modification are considered “AI Verify Reports.” Organizations cannot manually edit a generated report and re-label it as AI Verify-attested. This preserves the integrity of the audit trail.

Report templates

Pre-defined layouts simplify common use cases. A binary classification model for credit risk has different requirements than a generative AI customer service chatbot. Templates pre-select appropriate tests and process checks. The AI Verify Foundation publishes a sample report — a 71-page assessment of a binary classification credit-risk model — that demonstrates the output format. The sample is downloadable from aiverifyfoundation.sg.

Project Moonshot — LLM evaluation extension

Project Moonshot extends AI Verify into LLM-specific evaluation. Launched May 31, 2024 at ATxSG, Moonshot fills a gap that the original AI Verify did not address: traditional AI testing methods do not transfer cleanly to large language models.

Capability	What It Does
Red-teaming	One-stop tool for jailbreaks, customised attacks, and adversarial testing of LLMs
Benchmarking	Repository of 100+ pre-built benchmark datasets with built-in evaluators; curate relevant benchmarks per application
Automated Evaluation	Tools for generative AI applications that integrate into CI/CD pipelines

The conceptual difference between AI Verify Toolkit and Moonshot:

	AI Verify Toolkit	Project Moonshot
Covers	Traditional AI (ML models) and now GenAI extension	LLMs and LLM applications specifically
Key question	“Is this model fair, explainable, robust?”	“Which LLM is best?” and “Is our LLM app safe?”
Testing type	Governance process checks + technical tests	Benchmarking + red-teaming + safety testing
Inputs	Serialized model file/folder	API endpoint or local model
Stars (GitHub)	58 (aiverify)	316 (moonshot)

Both tools are open-source. Both ship with web UI for non-technical users and Python library for technical teams. Project Moonshot’s design partners include DataRobot (which integrated Moonshot into its commercial AI platform), IBM, Singtel, and Temasek.

Moonshot’s web interface implements IMDA’s Starter Kit for Safety Testing of LLM-Based Applications — a recommended baseline of tests every LLM deployment should pass. The Starter Kit is meant as a floor, not a ceiling. Production LLM deployments should expand beyond it based on use case risk.

For a deeper practitioner-focused walkthrough of Moonshot specifically — installation, first evaluation run, comparison to HELM/OpenAI Evals/Anthropic Evals — see our forthcoming Project Moonshot deep dive.

The GenAI Testing Framework extension

The original AI Verify Testing Framework was designed for traditional machine learning models. Generative AI introduced testing requirements that did not map cleanly to the original framework. The AI Verify Foundation extended the testing framework to address GenAI-specific concerns:

Content provenance testing — detecting AI-generated content; verifying provenance metadata (C2PA-compatible)
Safety alignment evaluation — measuring whether model behavior aligns with organizational safety standards
Hallucination detection — measuring rate and severity of factual fabrication
Prompt injection resilience — testing whether system prompts can be overridden by adversarial user inputs
Data leakage assessment — measuring whether the model leaks training data through inference

The GenAI extension is aligned with international frameworks from the EU, G7, OECD, and US. Organizations running both traditional and generative AI can use AI Verify as the unified testing framework for both.

Global AI Assurance Sandbox and ISAGO 2.0

Two additional initiatives extend AI Verify’s reach beyond the testing toolkit itself.

Global AI Assurance Sandbox (2025). Launched at the Singapore PDP Summit by AI Verify Foundation and IMDA. The sandbox is a marketplace platform where companies deploying AI bring their systems for testing, certified testers conduct governance assessments, and results contribute to evolving best practices and standards. It is currently accepting expressions of interest. The sandbox addresses a gap that pure self-assessment leaves: third-party testing capacity. Most organizations cannot credibly self-attest. The sandbox provides a structured route to independent assessment without the cost of a full external audit.

ISAGO 2.0 (Implementation and Self-Assessment Guide for Organisations). The updated guide that helps organizations self-assess against AI Verify principles. ISAGO 2.0 is the practitioner-facing companion to the testing framework — a structured workbook that walks compliance teams through process checks and connects them to documentary evidence requirements. Used by Google, Microsoft, DBS Bank, Singapore Airlines, Dell, X0PA, and others through AI Verify pilots.

How does AI Verify map to international frameworks?

AI Verify’s strategic value beyond Singapore comes from explicit mapping to other major AI governance frameworks. Organizations operating across jurisdictions can use AI Verify outputs as evidence for multiple regimes simultaneously.

International Framework	AI Verify Mapping
EU AI Act	AI Verify principles align with EU AI Act technical documentation requirements (Article 11, Annex IV); fairness and robustness testing supports conformity assessment evidence (Articles 8-15); audit trail supports post-market monitoring (Article 72)
NIST AI RMF (US)	Process checks align with Govern function; technical tests align with Map and Measure functions; ISAGO 2.0 supports Manage function
OECD AI Principles	The 11 AI Verify principles are derived from international consensus including OECD; direct mapping by design
G7 Hiroshima Process	Voluntary code of conduct for AI developers — AI Verify provides operational evidence of code compliance
ASEAN AI Governance Guidelines	Singapore’s framework directly informed ASEAN guidance; AI Verify is the operational layer for ASEAN-aligned governance
ISO/IEC 42001	Singapore’s framework informed ISO standard development; AI Verify principles overlap substantially with ISO/IEC 42001 management system requirements

Multinational AI deployments increasingly use AI Verify outputs to satisfy multiple regulatory requirements through a single testing pass. A single AI Governance Report can serve as evidence for EU AI Act conformity assessment, NIST AI RMF documentation, and ISO/IEC 42001 certification audit. This integration value is the framework’s strongest argument for adoption beyond Singapore.

What companies are using AI Verify?

Adoption metrics are not fully published — IMDA has not released definitive statistics on total organizations using AI Verify or total reports generated. The publicly named users include:

Premier members (active in the Foundation): IMDA, Aicadium, IBM, Microsoft, Google, Red Hat, Salesforce
Major Singapore corporates: DBS Bank, Singapore Airlines, OCBC, UOB
Multinationals: Google, Microsoft, IBM, Adobe, Meta, AWS, Dell
Vendors with AI Verify integration: DataRobot (integrated Project Moonshot into commercial AI platform), IBM, SAS

The pattern: AI Verify adoption clusters in (1) major Singapore-headquartered corporates, (2) US technology companies that participate in the Foundation governance, and (3) AI vendors that have integrated AI Verify outputs into their commercial assurance offerings. Adoption among small and mid-size enterprises outside these categories is harder to verify but appears to be growing through the Global AI Assurance Sandbox.

Limitations and roadmap gaps

AI Verify is the most concrete AI assurance toolkit globally — but it has gaps. Honest assessment for practitioners:

No risk-tiering guidance. The framework presents all 11 principles with similar weight. A small organization with limited resources may struggle to prioritize. Compare to the EU AI Act’s risk-tier model, which explicitly classifies high-risk vs limited-risk vs minimal-risk systems.
No direct mapping to specific legal obligations. AI Verify maps conceptually to EU AI Act requirements but does not produce the specific documentation forms regulators expect. Organizations using AI Verify for EU compliance need additional translation work.
Limited operational guidance on “how.” The framework defines what to do well. It is lighter on how to implement efficiently — automation, tooling, workflow integration. Practitioners often need vendor support to operationalize at scale.
GenAI adaptation is still maturing. Some sections of the GenAI extension are adapted from traditional ML thinking rather than fully reimagined for generative models. Expect continued evolution.
Adoption metrics are opaque. No published statistics on total organizations using AI Verify or reports generated. This makes it hard to assess network effects in adoption.
No agentic-specific module yet. Singapore’s Agentic AI Governance Framework was published January 2026. AI Verify has not yet shipped agentic-specific testing capabilities. Expect this in 2026-2027.

The roadmap gaps are not framework failures. They are signals about where extensions are most needed. Organizations should engage with the AI Verify Foundation through the Global AI Assurance Sandbox and Foundation membership channels to influence priorities.

For broader Singapore AI governance context — how AI Verify sits alongside the Model AI Governance Framework, Agentic AI Framework, MAS guidelines, and PDPC advisory — see Singapore AI Governance: All Frameworks in One Place. For comparable cross-jurisdiction assurance approaches see EU vs UK AI Regulation and EU vs US AI Regulation: The Definitive Comparison.

Sources

AI Verify Foundation. Official website. https://aiverifyfoundation.sg/
AI Verify Foundation. “AI Verify Testing Framework.” https://aiverifyfoundation.sg/what-is-ai-verify/
AI Verify Foundation. “AI Verify Toolkit.” https://aiverifyfoundation.sg/what-is-ai-verify/toolkit/
AI Verify Foundation. AI Verify GitHub organization. https://github.com/aiverify-foundation
AI Verify Foundation. AI Verify main repository. https://github.com/aiverify-foundation/aiverify
AI Verify Foundation. Project Moonshot repository. https://github.com/aiverify-foundation/moonshot
AI Verify Foundation. Sample AI Governance Report (binary classification credit risk). https://aiverifyfoundation.sg/downloads/AI_Verify_Sample_Report.pdf
AI Verify User Guide. https://aiverify-foundation.github.io/aiverify/introduction/how-it-works/
IMDA. “AI Verify launch announcement.” June 7, 2023.
IMDA. “Project Moonshot launch press release.” May 31, 2024. https://www.imda.gov.sg/resources/press-releases-factsheets-and-speeches/press-releases/2024/sg-launches-project-moonshot
IMDA. “Starter Kit for Safety Testing of LLM-Based Applications.”
AI Verify Foundation. “Global AI Assurance Sandbox launch.” PDP Summit 2025.
Microsoft / IMDA joint announcement. June 12, 2023. https://news.microsoft.com/source/asia/2023/06/12/singapore-launches-ai-verify-foundation/
AIGL Blog. “AI Verify Testing Framework analysis.” https://www.aigl.blog/ai-verify-testing-framework-for-traditional-and-generative-ai/
MAS / Veritas Toolkit GitHub. https://github.com/veritas-toolkit
ISO/IEC 42001:2023. AI Management System Standard.

Reg Intel is not a law firm and does not provide legal services. This article is for informational purposes only and should not be relied upon as legal advice. Consult qualified counsel for your specific compliance situation.

Singapore AI Verify: The World’s First Government-Built AI Testing Toolkit Explained

Key Takeaways

Why does AI Verify exist?

What does the AI Verify Testing Framework cover?

Five focus areas, eleven principles

The 85 testable criteria

Per-principle structure

Four built-in technical test toolboxes

Extensibility

How does the AI Verify Toolkit work in practice?

Workflow

Report templates

Project Moonshot — LLM evaluation extension

The GenAI Testing Framework extension

Global AI Assurance Sandbox and ISAGO 2.0

How does AI Verify map to international frameworks?

What companies are using AI Verify?

Limitations and roadmap gaps

Sources

Singapore Wave 2 — Deep Dives + EU Comparison

Singapore AI Verify: The World’s First Government-Built AI Testing Toolkit Explained

Key Takeaways

Why does AI Verify exist?

What does the AI Verify Testing Framework cover?

Five focus areas, eleven principles

The 85 testable criteria

Per-principle structure

Four built-in technical test toolboxes

Extensibility

How does the AI Verify Toolkit work in practice?

Workflow

Report templates

Project Moonshot — LLM evaluation extension

The GenAI Testing Framework extension

Global AI Assurance Sandbox and ISAGO 2.0

How does AI Verify map to international frameworks?

What companies are using AI Verify?

Limitations and roadmap gaps

Sources

Singapore Wave 2 — Deep Dives + EU Comparison

The Weekly Brief