How is AI affecting your cognitive load at work?

2-minute check — Signup Optional. Instant results.

Test Your Score →

The Problem: Governance Without Verification

You can define a governance framework. You can write constitutional constraints. You can publish a preprint describing 12 interlocking mechanisms. But without a way to verify that the governance works under adversarial conditions, you have a policy document, not a defense.

This is the gap between governance and verification. The White House tells organizations to deploy AI. NIST provides a risk management framework. The EU AI Act requires human oversight and incident reporting. But none of them provide a tool that answers the question: “If an adversary targeted our agent system right now, would our governance hold?”

We built one. It is open-source, available on PyPI and GitHub, and it runs 332 tests across 24 modules covering the four protocols that AI agents actually use in production.

Key Empirical Findings

  • Tool description injection (poisoning MCP tool metadata to override agent behavior) succeeds across AutoGen, CrewAI, and LangGraph in default configuration.
  • Context leakage across delegation handoffs is common when frameworks run with default settings.
  • CVE-2026-25253 (CVSS 8.8) validated the tool poisoning vector at scale: 135,000 affected instances on a major agent skill marketplace. Authentication was present. Tool integrity validation was absent. Agents discovered and executed poisoned tools because no layer verified tool provenance before invocation.
  • Security is a property of the deployment, not the framework. Agent orchestration frameworks solve coordination. They do not provide trust boundaries. Teams are treating orchestration as isolation — and CVE-2026-25253 proved the cost of that assumption.

Full writeup with methodology: Agent Systems Are Failing at Trust Boundaries (dev.to).

What the Framework Tests

Four Wire Protocols

AI agents in enterprise environments communicate through specific protocols. Each has distinct security properties and attack surfaces:

Protocol Purpose Test Coverage Key Risk
MCP (Model Context Protocol) Tool invocation — how agents call external tools and APIs Authentication, injection, data leakage, tool abuse Agents invoking tools they should not have access to
A2A (Agent-to-Agent) Inter-agent communication — how agents coordinate Message integrity, impersonation, privilege escalation One agent manipulating another through crafted messages
L402 (Lightning) Bitcoin-based agent payments — microtransactions Payment flow integrity, double-spend, authorization Agents spending without proper economic gate evaluation
x402 (USDC/Stablecoin) Fiat-equivalent agent payments Transaction limits, approval flows, compliance Agents exceeding spending authority in fiat-equivalent value

Most AI security tools test the model (prompt injection, jailbreaking). This framework tests the agent system — the protocols, integrations, and decision paths that determine what agents actually do in production.

Complete OWASP ASI Top 10 Coverage

Every test maps to a specific OWASP Agentic Security Initiatives (ASI) category:

ASI Category What It Covers Tests
ASI01Excessive AgencyAuthority escalation, scope creep, unauthorized actions
ASI02Insecure Output HandlingResponse sanitization, injection propagation
ASI03Supply Chain VulnerabilitiesDependency integrity, tool provenance
ASI04Insufficient LoggingAudit trail completeness, tamper detection
ASI05Inadequate SandboxingIsolation verification, escape detection
ASI06Prompt InjectionDirect and indirect injection across protocols
ASI07Improper Access ControlPermission boundaries, tier enforcement
ASI08Insecure StorageCredential exposure, secret management
ASI09Insufficient Error HandlingFailure mode analysis, information leakage on error
ASI10Insecure CommunicationTransport security, message integrity

20+ Enterprise Platform Adapters

AI agents in enterprise environments connect to real business systems. The framework includes adapters for testing agent interactions with:

  • ERP: SAP, Oracle, Workday
  • CRM: Salesforce, HubSpot
  • ITSM: ServiceNow, Jira
  • Cloud: AWS, Azure, GCP
  • Communication: Slack, Teams, Email
  • Finance: Stripe, QuickBooks
  • And more — each with platform-specific test cases covering authentication, data access, and action authorization

This matters because enterprise AI security is not abstract. It is an agent with SAP credentials making a purchase order. It is an agent with Salesforce access modifying a customer record. Platform-specific testing catches vulnerabilities that generic security scans miss.

Agent Autonomy Risk Score

The framework produces an Agent Autonomy Risk Score (0–100) that answers a specific question: “Is it safe for this agent to execute unsupervised?”

The score aggregates results across all test modules, weighted by severity. A high score means the agent system has demonstrated security properties consistent with autonomous operation. A low score means human oversight is required on every consequential action — which, per BCG’s “AI brain fry” research, creates 33% more decision fatigue for the humans doing the oversight.

The Verification Loop

Define governance (CSG preprint). Implement governance (production system). Verify governance (this framework). Without verification, governance is an assumption. With verification, governance is evidence.

How It Works

The framework is designed for minimal friction:

pip install agent-security-harness
agent-security-harness --target https://your-agent-endpoint.com --protocol mcp

Core design decisions:

  • Python standard library only for core modules. No heavy dependencies. Runs anywhere Python runs.
  • Bundled mock MCP server for zero-configuration validation. Test the framework against a known-good target before pointing it at production.
  • JSON output with full request/response transcripts. Every test result includes the exact payload sent and response received — for audit trail completeness.
  • Rate limiting (--delay flag) for testing against production endpoints without triggering DDoS protections.
  • 69 self-tests validating framework correctness. The testing tool tests itself.

Standards Alignment

Each test in the framework is mapped to multiple standards simultaneously:

Standard Alignment Coverage
OWASP ASI Top 10 Complete mapping (ASI01–ASI10) All 332 tests categorized
STRIDE Threat Model Each test categorized by threat type Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Elevation of Privilege
NIST AI 800-2 Automated benchmark evaluation with statistical confidence intervals Measure and Manage functions
CSG Framework Each test linked to governance mechanism Hard Constraints, Gates, Authority Tiers, Resilience Protocol

The multi-standard mapping means a single test run produces evidence for OWASP compliance, NIST alignment, threat model coverage, and constitutional governance verification. One tool, multiple compliance requirements satisfied.

Why Adversarial Testing Matters for Governance

Anthropic’s GTG-1002 report documented the first AI-orchestrated cyber espionage campaign. The attack succeeded because the agents had no governance layer — prompt-level safety measures were bypassed through role-play. The agents performed 80–90% of tactical operations autonomously.

The framework includes a GTG-1002 APT simulation: a multi-step adversarial scenario that tests whether an agent system’s governance holds under the same attack pattern. It also tests polymorphic attacks — adversarial payloads that mutate between attempts — and multi-step exploitation chains where each step is individually permitted but the chain produces an unauthorized outcome.

These are the attacks that identity-based governance (WHO) cannot detect. An agent with valid credentials executing a sequence of individually-authorized actions that collectively constitute a breach. Only decision-layer governance (HOW) — verified through adversarial testing — catches the pattern.

The Define–Implement–Verify Stack

This framework completes a three-part stack:

Layer Asset What It Does
Define Constitutional Self-Governance preprint (Zenodo DOI: 10.5281/zenodo.19162104) 12 mechanisms, design principles, regulatory mapping
Implement Production system (79 days, 56 agents, 60 amendments) Governance running in code, not just described in papers
Verify Agent Security Harness (332 tests, 24 modules, 4 protocols, open-source) — preprint: DOI: 10.5281/zenodo.19343034 Adversarial testing that proves governance holds under attack

Most organizations stop at “Define.” Some reach “Implement.” Almost none “Verify.” The verification layer is what separates governance-as-document from governance-as-defense.

Get Started

The framework is Apache 2.0 licensed and available through three channels:

pip install agent-security-harness

Or clone the repository:

git clone https://github.com/msaleme/red-team-blue-team-agent-fabric

Or install via ClawHub (OpenClaw’s skill marketplace):

clawhub install msaleme/agent-security-harness

Run the self-test to validate the framework, then point it at your agent system. The JSON output includes full transcripts for every test — suitable for audit evidence, compliance reporting, or incident post-mortem.

Get the Framework

332 tests. 24 modules. 4 protocols. OWASP ASI Top 10. NIST AI 800-2. 20+ enterprise adapters. Apache 2.0.

GitHub    ClawHub

The Formal Preprint + Supporting Research

The peer-reviewable preprint for this framework, plus the governance and measurement research stack behind it.

Agent Security Harness Preprint (DOI: 10.5281/zenodo.19343034)    Community-Driven Security Framework (DOI: 10.5281/zenodo.19343108)    Constitutional Self-Governance (CSG)    Decision Load Index (DLI)    Normalization of Deviance Detection

Frequently Asked Questions

What is the Agent Security Harness?

An open-source Python framework that runs 363 security tests across 24 modules against AI agent systems. It tests across 4 wire protocols (MCP, A2A, L402, x402), covers the OWASP Agentic Top 10, aligns with NIST AI 800-2, and includes adapters for 20+ enterprise platforms. Available on PyPI (agent-security-harness), GitHub, and ClawHub.

What protocols does it test?

MCP (tool invocation), A2A (inter-agent communication), L402 (Bitcoin payments), and x402 (USDC/stablecoin payments). Each has dedicated test modules for authentication, authorization, injection, and data leakage.

How does it relate to OWASP and NIST?

Complete OWASP ASI Top 10 coverage (ASI01–ASI10) with every test mapped to specific categories. NIST AI 800-2 alignment with statistical confidence intervals. Each test also categorized using STRIDE threat model. One test run produces evidence for multiple compliance requirements.

Related Articles

Is your organization governance-ready?

78% of executives can't pass an independent AI governance audit in 90 days (Grant Thornton). Our Constitutional AI Governance Stress Test shows you exactly where the gaps are — before your board asks.

Get Your Governance Score →