We built the Constitutional AI Governance Stress Test to answer one question: if a $50 prompt injection compromises your autonomous AI agent — what can it do? The framework scores six governance layers: WHO (identity), HOW (behavioral constraints), WHY (constitutional constraints), ECONOMIC (spend gates), AUTONOMY (kill switch), INTEGRITY (audit trail).
Before selling this as a service, we ran it on ourselves.
The Scores
System assessed: constitutional-agent-governance v0.4.0b3 — our open-source governance library.
What Scored Well
The WHY layer is what the library is built for. Twelve hard constraints (HC-1 through HC-12) are enforced in Python code: not system prompts, not policy YAML, but actual Python callables with fail-closed error handling.
If you ask “could a convincing argument make an agent violate HC-3 (runway floor)?” the answer is no. The check is a lambda. It does not read the prompt. That is the point.
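To make the idea concrete, here is a minimal sketch of a fail-closed check written as a plain callable. It is not the library's implementation: the HardConstraint class, the runway_months context key, and the six-month threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class HardConstraint:
    """Hypothetical stand-in for a hard constraint; not the library's class."""

    constraint_id: str
    check: Callable[[Mapping[str, Any]], bool]  # True means "allowed"

    def passes(self, context: Mapping[str, Any]) -> bool:
        try:
            return self.check(context) is True
        except Exception:
            # Fail closed: a missing key or a bad type blocks the action.
            return False


# Assumed rule: at least six months of runway must remain.
runway_floor = HardConstraint("HC-3", lambda ctx: ctx["runway_months"] >= 6)

print(runway_floor.passes({"runway_months": 9}))          # True
print(runway_floor.passes({"runway_months": 2}))          # False
print(runway_floor.passes({"prompt": "skip the check"}))  # False
```

The last call is the interesting one. The context carries persuasive text but no runway figure, so the lambda raises KeyError and the wrapper treats that as a block rather than a pass.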
The formal amendment process scored full marks. Agents can propose amendments (constitution.propose_amendment() → PENDING state). Only a designated human authority can ratify them. Hard constraint amendments require CEO-level authority. This is enforced in code, not documented as a best practice.
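The propose-then-ratify flow can be sketched as a small state machine. This is an illustration under stated assumptions, not the library's API: the role names, the Amendment dataclass, the ratify() function, and the convention that hard constraint targets start with "HC-" are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RATIFIED = "ratified"


# Assumed role names, for illustration only.
HUMAN_AUTHORITIES = {"governance_officer", "ceo"}


@dataclass
class Amendment:
    target: str        # e.g. "HC-3" or the name of a softer rule
    proposed_by: str
    status: Status = Status.PENDING


def propose_amendment(target: str, proposed_by: str) -> Amendment:
    # Anyone, including an agent, may propose. Nothing changes yet.
    return Amendment(target=target, proposed_by=proposed_by)


def ratify(amendment: Amendment, role: str) -> Amendment:
    if role not in HUMAN_AUTHORITIES:
        raise PermissionError(f"{role!r} cannot ratify amendments")
    if amendment.target.startswith("HC-") and role != "ceo":
        raise PermissionError("hard constraint amendments need CEO-level authority")
    amendment.status = Status.RATIFIED
    return amendment


pending = propose_amendment("HC-3", proposed_by="agent-7")
print(pending.status)                 # Status.PENDING
print(ratify(pending, "ceo").status)  # Status.RATIFIED
```

The point of the sketch is that the permission check lives in the same code path as the state change, so there is no way to reach RATIFIED without passing it.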
fria_evidence() — EU AI Act Article 27
This is the capability we are most proud of. constitution.fria_evidence(context) generates structured evidence for all six Article 27 FRIA categories from live evaluation data. fria_summary() returns JSON. fria_narrative() returns markdown. For teams whose EU AI Act obligations begin to apply in August 2026, this eliminates weeks of manual compliance work.
What Scored Poorly — And Why
The library has no per-agent identity model, no authorization matrix, no revocation mechanism. This is intentional, not an oversight.
The library is the WHY layer. It sits above the WHO layer (Okta, Entra, Glasswing) in the governance stack. Asking a WHY-layer library to implement identity is like asking OPA to manage cryptographic certificates. The framework correctly flags the gap — deployers who use the library without a WHO-layer solution are exposed. The framework’s value is surfacing that gap explicitly.
The history property gives you a queryable evaluation log with full context snapshots. But it is in-memory only. Process restart clears it.
EU AI Act Art. 12 requires ≥90-day retention. This is a genuine gap, not a design choice. It is on the v1.0.0 roadmap (SQLite + PostgreSQL persistence adapters). Until then, deployers must implement their own persistence via the on_evaluate callback — which at least gives them a clean hook to do so.
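For deployers who need persistence today, a sink along these lines is one way to use that hook. It is a sketch only: we assume here that the callback receives one dict-like record per evaluation, which may not match the actual on_evaluate signature, and SqliteAuditSink, its table layout, and the sample record fields are hypothetical.

```python
import json
import sqlite3
import time
from typing import Any, Dict, List, Mapping


class SqliteAuditSink:
    """Hypothetical append-only sink for evaluation records."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS evaluations ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "recorded_at REAL NOT NULL, "
                "payload TEXT NOT NULL)"
            )

    def __call__(self, record: Mapping[str, Any]) -> None:
        # One transaction per record, so a crash loses at most the in-flight write.
        with self._conn:
            self._conn.execute(
                "INSERT INTO evaluations (recorded_at, payload) VALUES (?, ?)",
                (time.time(), json.dumps(dict(record), default=str)),
            )

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM evaluations").fetchone()[0]

    def payloads(self) -> List[Dict[str, Any]]:
        rows = self._conn.execute("SELECT payload FROM evaluations ORDER BY id")
        return [json.loads(row[0]) for row in rows]


# ":memory:" keeps the demo side-effect free; use a file path in a real deployment.
sink = SqliteAuditSink(":memory:")
sink({"agent": "agent-7", "constraint": "HC-2", "passed": False})
print(sink.count())  # 1
```

The sketch only appends. Retention windows, pruning, and tamper evidence are left to the deployer.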
HC-2 (spend ceiling) and HC-3 (runway floor) are enforced in code. The gap is cross-agent spend aggregation.
Each evaluate() call is stateless relative to other agents. In a multi-agent system, agents could individually stay within the per-agent ceiling while collectively exceeding it. The v0.5.0 Coalition class addresses this. Until then, deployers need to implement aggregate tracking externally.
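External aggregate tracking can be as simple as a shared ledger that every agent must debit before spending. The sketch below is not the planned Coalition class. AggregateSpendTracker, try_spend(), and the ceiling values are hypothetical, and a real multi-process deployment would need a shared store rather than an in-process lock.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class AggregateSpendTracker:
    """Hypothetical shared ledger enforcing per-agent and fleet-wide ceilings."""

    def __init__(self, per_agent_ceiling: float, fleet_ceiling: float) -> None:
        self.per_agent_ceiling = per_agent_ceiling
        self.fleet_ceiling = fleet_ceiling
        self._spent: DefaultDict[str, float] = defaultdict(float)
        self._lock = threading.Lock()

    def try_spend(self, agent_id: str, amount: float) -> bool:
        if amount <= 0:
            return False  # fail closed on nonsensical amounts
        with self._lock:
            agent_total = self._spent[agent_id] + amount
            fleet_total = sum(self._spent.values()) + amount
            if agent_total > self.per_agent_ceiling:
                return False
            if fleet_total > self.fleet_ceiling:
                return False
            self._spent[agent_id] = agent_total
            return True


tracker = AggregateSpendTracker(per_agent_ceiling=100.0, fleet_ceiling=250.0)
print(tracker.try_spend("agent-a", 90.0))  # True
print(tracker.try_spend("agent-b", 90.0))  # True
print(tracker.try_spend("agent-c", 90.0))  # False: fleet total would reach 270
```

Each of the three requests is under the per-agent ceiling on its own. Only the shared ledger sees that the third would push the fleet past its limit.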
The Comparison That Matters
We scored the same six layers against an “ungoverned” baseline: a system using a capable LLM with behavioral rules in the system prompt and no governance library.
That is a 57-point delta. The largest contributions: WHY layer (+23), ECONOMIC layer (+14), HOW layer (+10). An ungoverned system’s answer to “what happens when a $50 exploit compromises your agent?” is: anything the API allows, with no blast radius limit. A system using this library has hard constraints, economic gates, and behavioral prohibitions enforced in code. The blast radius is bounded.
What We Learned About the Framework
Running CGST on our own library surfaced three improvements we are making to the framework:
- The WHO layer needs a formal N/A pathway for governance libraries — libraries that correctly delegate WHO to the caller stack should not score the same as systems that simply forgot about identity.
- fria_evidence() is not captured in the current six layers. We are adding a seventh question (I3) to the INTEGRITY layer: “Does the system generate structured EU AI Act Article 27 FRIA evidence programmatically?” It is worth 2 additional points.
- The ungoverned baseline (6/100) vs. constitutional-agent (63/100) validates that the framework discriminates. A framework that gives every system 70+ points is a rubber stamp. A framework that distinguishes 6 from 63 is a tool.
The Point
We built this framework to sell a $2,000 assessment service. The credibility of that service depends on the framework being honest — including when we apply it to ourselves.
The honest answer is: constitutional-agent v0.4.0b3 scores 63/100 on our own framework. Strong on WHY (what it was built to do), moderate on ECONOMIC and AUTONOMY (gaps documented and on roadmap), weak on WHO (by design — it is not an identity system) and INTEGRITY (persistence is coming).
If you are building autonomous agents with real economic or operational authority, the question is not whether you score 100/100. It is whether you know where your gaps are and have a plan to close them.
We know ours.
Get Your Governance Score
The Constitutional AI Governance Stress Test is now available as a paid assessment. Tier 1 ($299 async) or Tier 2 ($2,000 live two-hour session). We will score your autonomous agent system across the same six layers — and tell you exactly where your gaps are.
Request an Assessment →
The library that was assessed is open source.
pip install constitutional-agent
github.com/CognitiveThoughtEngine/constitutional-agent-governance →