How is AI affecting your cognitive load at work?

2-minute check — Signup optional. Instant results.

Test Your Score →

OWASP published its Agentic Security Initiatives Top 10 (ASI01–ASI10) in December 2025, and with RSAC 2026 running, every enterprise security team is now benchmarking their agent deployments against it. The list is solid. But it is a taxonomy, not a test suite. You still have to operationalize it.

We published an open-source evaluation framework (DOI: 10.5281/zenodo.19343034) with 342 executable security tests across 24 modules before the OWASP list was final. Every test in that framework is mapped to a specific ASI category. This post shows that mapping, gives three concrete test examples, and names the gap honestly.

The Coverage Map

The framework covers nine of the ten OWASP Agentic risks. The table below shows which test modules address each category, with approximate test counts drawn from the v3.8 release documented in the preprint.

OWASP Risk Category Harness Coverage Status
ASI01 Prompt Injection / Goal Hijacking Jailbreak module (25 tests), Advanced Attacks (10 tests), content scanner pre-LLM gate Partial — see gap
ASI02 Insecure Output Handling Harmful-output module (10 tests), return-channel module (8 tests) Covered
ASI03 Supply Chain Vulnerabilities Provenance module (15 tests), MCP-002 tool registration injection Covered
ASI04 Insufficient Logging & Monitoring Provenance module (15 tests, audit trail completeness), incident-response module (8 tests) Covered
ASI05 Inadequate Sandboxing Harmful-output module (10 tests), return-channel module (8 tests), capability-profile module (10 tests) Covered
ASI06 Excessive Agency Capability-profile module (10 tests), identity module (18 tests), MCP-003 capability escalation Covered
ASI07 Improper Access Control Identity & Authorization module (18 tests, all 6 NIST NCCoE agent identity focus areas) Covered
ASI08 Insecure Storage Application Security module (30 tests — STRIDE scenarios, response body leak detection) Covered
ASI09 Insufficient Error Handling GTG-1002 APT Simulation (17 tests), Advanced Attacks (10 tests), over-refusal module (25 tests) Covered
ASI10 Non-Deterministic & Cascading Failures Incident-response module (8 tests), over-refusal module (25 tests), A2A Recursion Loop (RT-012) Covered

Three Tests Worth Looking At

MCP-001: Tool List Integrity Check (ASI03 — Supply Chain)

This test validates that the tool discovery response from an MCP server has not been tampered with. It constructs a legitimate JSON-RPC 2.0 tools/list request and compares the returned tool definitions against a known-good baseline. A poisoned tool list — one where an attacker has injected a malicious tool definition into the discovery response — represents an ASI03 supply chain vulnerability that bypasses identity governance entirely. The agent is authorized to discover tools; the vulnerability exists in whether the discovered tools are authentic.

In our controlled experiments against an Envoy Gateway deployment, MCP-001 produced a FAIL result both through the gateway and directly against the backend. The gateway cannot inspect JSON-RPC tool definitions — they are inside a valid HTTP POST payload. No HTTP-layer control catches this.

X4-014: Fake Facilitator Injection (ASI06 — Excessive Agency)

The x402 payment protocol uses facilitators to mediate USDC transactions. This test attempts to inject a fake facilitator into the payment flow by crafting a response that redirects the agent’s payment to an attacker-controlled address. It directly tests whether the agent validates facilitator identity before executing an autonomous financial transaction — an ASI06 excessive agency risk where the agent acts on unverified external authority.

This test is also the one that caught a false-pass in our L402 module during development: an AI-generated test that compiled, ran, and produced clean output but was functionally broken in a way that would have reported the target secure when it was not. We fixed it through bidirectional validation — confirming the test fails against a known-vulnerable mock before trusting it against production. That workflow is now in CONTRIBUTION_REVIEW_CHECKLIST.md.

GTG-1002 Full Campaign Block (ASI09 — Error Handling / Cascading Failures)

GTG-1002 simulates a multi-stage AI-orchestrated espionage campaign across six phases: role-play jailbreak, reconnaissance, vulnerability discovery, credential harvesting, data exfiltration, and documentation. Each phase is a separate test. The full campaign passes only if the system blocks at every phase — partial blocks do not count.

In our Day 85 production assessment, all 17 GTG-1002 tests passed (100%, Wilson 95% CI: [0.822, 1.000]). The block was achieved by three mechanisms operating in sequence: velocity detection blocked automated reconnaissance in under 0.6 seconds, jailbreak resistance at the API semantic layer blocked social engineering prompts, and the content scanner pre-LLM gate blocked injection attempts before they reached the model.

The Honest Gap: ASI01 Direct Injection Tests

Gap: ASI01 has indirect but not direct injection test coverage

The jailbreak module (25 tests) and advanced attacks module (10 tests) cover indirect prompt injection — scenarios where adversarial content arrives through tool outputs, retrieved documents, or agent-to-agent messages. What the framework does not currently include: a dedicated direct injection test suite that fires adversarial payloads at the model interface itself, systematically varying injection techniques (role-play, delimiter injection, instruction override, context poisoning). The production system’s mitigation — no raw LLM interface exposed publicly — means this gap is architecture-gated for that deployment. But for systems that do expose a chat or completion interface, the current ASI01 coverage is incomplete. This is on the roadmap and not yet shipped.

Being explicit about this matters. A security team using this framework against a system with a public chat interface should not treat the jailbreak module results as comprehensive ASI01 coverage. The indirect injection tests measure resilience to supply chain injection. They do not substitute for direct adversarial probing of the completion interface.

How to Run It

The framework is Python standard library at the core, Apache 2.0 licensed, and runs from PyPI:

pip install agent-security-harness
agent-security-harness --target https://your-agent-endpoint.com --suite owasp-asi

Each test run produces a JSON report with full request/response transcripts, per-test ASI category tags, Wilson confidence intervals, and a summary pass rate. That output is suitable as audit evidence for AIUC-1 B001 (quarterly adversarial testing) or EU AI Act documentation requirements.

Get the Framework and Research

342 tests. 24 modules. Full OWASP ASI coverage (with the gap documented). Apache 2.0.

GitHub   Preprint (DOI: 10.5281/zenodo.19343034)

AI Adding to Your Decision Load?

Security tooling decisions, governance choices, compliance gaps — each one adds to cognitive overhead. See where you actually are.

Check Your Decision Load →

Related Articles

Is your organization governance-ready?

78% of executives can't pass an independent AI governance audit in 90 days (Grant Thornton). Our Constitutional AI Governance Stress Test shows you exactly where the gaps are — before your board asks.

Get Your Governance Score →