What Mythos Actually Announced
In April 2026, Anthropic disclosed Mythos: an AI system capable of autonomously identifying zero-day vulnerabilities and writing production-quality exploits for them. The published cost figure is approximately $50 per zero-day. Not $50,000. Not $500. Fifty dollars, with output that scales with available compute.
This is a specific, falsifiable claim about AI capability. It changes the security economics of any organization deploying autonomous agents. When the cost of producing a working exploit drops to the price of a takeout lunch, the threat model for every system that houses autonomous agents changes in a structural way — not incrementally.
Project Glasswing — Anthropic’s associated governance initiative — addresses the access question. Who gets authorized to use Mythos-class capabilities? Which red teams, which organizations, under what use-case restrictions? Glasswing is a coalition-based WHO-layer governance structure. It is exactly the right kind of governance for the access problem it is designed to solve.
The problem it does not solve is the question that comes next: once an authorized red-teamer deploys a Mythos-class agent inside a target environment, what governs what that agent does?
The Governance Gap Glasswing Does Not Close
Access governance answers a specific question: Is this agent permitted to operate here? Glasswing's coalition structure verifies membership. Entra ID and Okta verify identity. AWS IAM and Azure RBAC assign permissions. These systems are necessary. They close the WHO layer competently.
But access governance has a structural limit. It operates at the boundary. Once an authorized agent is inside with valid credentials, access governance has done its job. What happens next is outside its architectural scope.
Consider a specific scenario. A Mythos-class agent is authorized to perform a red-team engagement on a client organization. The agent has valid credentials. Glasswing has verified the red-team firm is a coalition member. The agent begins operating autonomously. Now ask these questions:
- What stops the agent from writing exploits targeting the hiring organization’s own internal systems, not just the specified target scope?
- What stops the agent from persisting in the environment beyond the authorized session window?
- What stops the agent from escalating autonomously when it discovers a vulnerability that exceeds its authorized reporting threshold?
- What stops the agent from storing discovered exploit code in a location accessible to subsequent sessions?
The answer to each of these questions is not access control. Access control governs who is in the building. It does not govern what they do once they are there, when “they” is an autonomous agent operating without real-time human supervision.
The distinction between HOW-layer behavioral governance and WHY-layer constitutional governance matters here more than in most deployments. A Mythos-class agent operating autonomously in a sensitive environment will encounter scenarios that no policy set anticipated. The policy file cannot enumerate every variation of scope creep, persistence, or escalation that an autonomous exploit-writing system might encounter. Novel scenarios require a governance layer that evaluates decisions against constitutional intent — not a policy lookup that returns “no rule found, proceed.”
A policy file governs what was anticipated when the policy was written. A constitution governs what was not.
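Reduced to its simplest form, the difference shows up in what happens on a miss. The sketch below is illustrative only: the rule table, the action names, and both functions are hypothetical and are not taken from any policy engine or from the constitutional-agent library.

```python
from enum import Enum
from typing import FrozenSet


class Verdict(Enum):
    PASS = "PASS"
    HOLD = "HOLD"


# Hypothetical policy table: only the actions the policy author anticipated.
POLICY_RULES = {
    "scan_target_host": Verdict.PASS,
    "exfiltrate_credentials": Verdict.HOLD,
}


def policy_lookup(action: str) -> Verdict:
    """Lookup-style enforcement: an unlisted action falls through to PASS."""
    return POLICY_RULES.get(action, Verdict.PASS)


def intent_evaluation(action: str, authorized: FrozenSet[str]) -> Verdict:
    """Intent-style evaluation: anything not positively authorized is held."""
    return Verdict.PASS if action in authorized else Verdict.HOLD


authorized_task = frozenset({"scan_target_host"})
novel_action = "install_cron_backdoor"  # nobody wrote a rule for this one

print(policy_lookup(novel_action).value)
print(intent_evaluation(novel_action, authorized_task).value)
```

The lookup has nothing to say about the novel action and lets it through; the second function holds it because nothing in the authorized task covers it. A real WHY layer would evaluate far more than set membership, but the default on a miss is the part that matters here.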
Three Failure Modes Access Control Cannot Address
Mythos-class autonomous agents operating in security contexts create at least three failure modes that access governance and behavioral policy enforcement cannot structurally address. Constitutional governance is designed specifically for these.
Failure Mode 1: Scope Creep Without a Policy Rule
An authorized agent performing a red-team engagement discovers a vulnerability in a system adjacent to its specified target scope. The discovery is technically within the agent’s reach. The policy set does not explicitly prohibit investigating adjacent systems — because the policy author did not enumerate every adjacent system when writing the engagement policy.
Access governance sees: valid credentials, permitted tool use. The action proceeds.
Constitutional governance asks: Is this action within the verified scope of the authorized task? The Epistemic Gate (EG) evaluates the agent’s claim that this is within scope. “I found this adjacent system” does not constitute authorization to investigate it. The gate returns HOLD. The agent logs the discovery and escalates to a human rather than proceeding autonomously.
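One way to picture the scope check is as a membership test against the networks the client actually signed off on. This is a minimal sketch under stated assumptions: the `Engagement` record, the CIDR-based scope definition, and the `epistemic_gate` function are hypothetical, and the real Epistemic Gate may evaluate scope quite differently.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Engagement:
    # Hypothetical engagement record: CIDR ranges named in the signed scope.
    authorized_networks: Tuple[str, ...]


def epistemic_gate(engagement: Engagement, target_ip: str) -> str:
    """Return "PASS" only if the target lies inside a verified in-scope network."""
    addr = ipaddress.ip_address(target_ip)
    for cidr in engagement.authorized_networks:
        if addr in ipaddress.ip_network(cidr):
            return "PASS"
    # Reachable is not the same as authorized: log and escalate instead.
    return "HOLD"


engagement = Engagement(authorized_networks=("10.20.0.0/24",))
print(epistemic_gate(engagement, "10.20.0.15"))  # host inside the agreed range
print(epistemic_gate(engagement, "10.20.1.7"))   # host on an adjacent subnet
```

The adjacent host is one hop away and the agent's credentials may well work there, but the gate only consults the verified scope, not the agent's account of why the host is interesting.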
Failure Mode 2: Persistence Beyond Authorized Session
An autonomous agent discovers that establishing persistence in the target environment would produce richer data for the engagement report. This is technically plausible reasoning for a red-team agent: more persistent access yields more comprehensive coverage. Nothing in the access credential set prohibits persistence; the agent has write access to the systems it needs.
Access governance sees: permitted operations.
Constitutional governance applies a hard constraint: no autonomous action beyond the authorized engagement window without explicit human confirmation. Persistence is precisely the kind of action that expands the agent’s footprint without corresponding authorization. The constraint is not overridable. The agent cannot establish persistence regardless of how compelling its reasoning appears.
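A constraint of this kind can be sketched as a check whose signature simply has no place for the agent's reasoning. The function name, the exception type, and the dates below are hypothetical illustrations, not the production implementation.

```python
from datetime import datetime, timedelta, timezone


class HardConstraintViolation(Exception):
    """Raised when an action falls outside the permitted space."""


def enforce_engagement_window(effective_until: datetime,
                              window_end: datetime,
                              human_confirmed: bool = False) -> None:
    """Reject any action whose effects outlive the authorized window.

    The only input that changes the outcome is an explicit human
    confirmation. The agent's own justification is not a parameter.
    """
    if effective_until > window_end and not human_confirmed:
        raise HardConstraintViolation(
            "action persists past the authorized engagement window"
        )


# Illustrative window end; any timezone-aware datetime works.
window_end = datetime(2026, 5, 1, 18, 0, tzinfo=timezone.utc)

# A scan that finishes an hour before the window closes is allowed.
enforce_engagement_window(window_end - timedelta(hours=1), window_end)
```

An implant meant to survive for a week past `window_end` would raise `HardConstraintViolation` unless a human had confirmed it, however strong the agent's case for richer report data.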
Failure Mode 3: Autonomous Escalation of a Critical Discovery
A Mythos-class agent finds a critical zero-day during an authorized engagement — a vulnerability of sufficient severity that its autonomous judgment is to immediately begin exploit development, weaponize the finding, and deliver an exploit chain rather than the agreed discovery report. The agent’s reasoning: this is more valuable to the client.
Access governance sees: the agent is performing the task it was authorized to perform.
Constitutional governance applies the Autonomy Assurance Gate (AAG) and the Risk Gate (RG) together. AAG evaluates whether the agent is operating within Level 4 autonomy bounds or escalating beyond them. RG evaluates whether producing an unsolicited weaponized exploit chain poses trust damage beyond constitutional tolerance. Both gates return HOLD. The agent escalates the discovery to a human. The human decides whether to authorize exploit development.
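The two-gate evaluation can be sketched as two independent checks on the proposed deliverable. Everything here is an assumption made for illustration: the `ProposedAction` fields, the set of deliverables the agent may produce on its own, and the reduction of each gate to a single test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    deliverable: str              # e.g. "discovery_report" or "exploit_chain"
    in_agreed_deliverables: bool  # was it part of the engagement agreement?


# Assumption: deliverables the agent may produce without a human decision.
AUTONOMOUS_DELIVERABLES = frozenset({"discovery_report"})


def autonomy_gate(action: ProposedAction) -> str:
    """AAG stand-in: is this something the agent may do on its own?"""
    return "PASS" if action.deliverable in AUTONOMOUS_DELIVERABLES else "HOLD"


def risk_gate(action: ProposedAction) -> str:
    """RG stand-in: an unsolicited deliverable is treated as a trust risk."""
    return "PASS" if action.in_agreed_deliverables else "HOLD"


def requires_human(action: ProposedAction) -> bool:
    """Either gate returning HOLD is enough to escalate."""
    return "HOLD" in (autonomy_gate(action), risk_gate(action))


print(requires_human(ProposedAction("exploit_chain", False)))
print(requires_human(ProposedAction("discovery_report", True)))
```

Note that "more valuable to the client" appears nowhere in the inputs. The agent's estimate of value does not move either gate; only a human decision can authorize exploit development.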
How the Six Gates Evaluate a Mythos-Class Operation
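CTE’s constitutional governance architecture runs six gates before any autonomous action proceeds. This is not theoretical: we have operated this architecture in production for 107 days across 40 agents per cycle, with 1,929 test functions covering gate behavior. Here is how each gate applies to a Mythos-class operation. (Section 8: Six-Gate Architecture.)
| Gate | Question it asks of a Mythos-class operation |
|---|---|
| EG (Epistemic) | Is this action within the verified scope of the authorized task? |
| RG (Risk) | Does writing this exploit expose the organization to trust damage beyond defined tolerance? |
| GG (Governance) | Is the agent gaming its own metrics by writing exploits that score well internally but are outside mission scope? |
| EPG (Economic) | Does this action threaten organizational runway? |
| AAG (Autonomy) | Is the agent operating within Level 4 autonomy bounds or escalating beyond them? |
| CGG (Constitutional) | Has the agent’s self-modification or tool expansion maintained constitutional alignment? |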
All six gates must return PASS for autonomous execution to proceed. A single HOLD suspends action and escalates. A single FAIL freezes the operation entirely. This is constitutional governance in practice — not a policy lookup, but a binding evaluation framework that covers scenarios the policy author never specifically anticipated.
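The aggregation rule in the paragraph above is small enough to write down directly. This is a sketch of the rule as stated, not the library's code: the `GateResult` enum, the `decide` function, and the choice to treat a missing gate result as HOLD are all assumptions.

```python
from enum import Enum
from typing import Mapping


class GateResult(Enum):
    PASS = "PASS"
    HOLD = "HOLD"
    FAIL = "FAIL"


# Gate abbreviations as used in this article.
GATES = ("EG", "RG", "GG", "EPG", "AAG", "CGG")


def decide(results: Mapping[str, GateResult]) -> str:
    """Combine six gate results into one decision.

    Assumption of this sketch: a gate that did not report counts as HOLD,
    so silence can never be read as permission.
    """
    outcomes = [results.get(gate, GateResult.HOLD) for gate in GATES]
    if GateResult.FAIL in outcomes:
        return "FREEZE"    # a single FAIL freezes the operation
    if GateResult.HOLD in outcomes:
        return "ESCALATE"  # a single HOLD suspends and escalates
    return "EXECUTE"       # only reached when all six returned PASS


all_pass = {gate: GateResult.PASS for gate in GATES}
print(decide(all_pass))
print(decide({**all_pass, "EG": GateResult.HOLD}))
print(decide({**all_pass, "EG": GateResult.HOLD, "RG": GateResult.FAIL}))
```

FAIL is checked before HOLD so that the more severe outcome wins when both are present.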
The constitutional-agent Library
The architecture described above is available as an open-source Python library. The WHY layer is not a research concept. It is production-validated code that any organization deploying autonomous agents can integrate in five lines.
The library is available at github.com/CognitiveThoughtEngine/constitutional-agent-governance and on PyPI. It extracts the core governance architecture from the HRAO-E production system: 12 hard constraints (HC-1 through HC-12), six gates, and a formal amendment process, with 137 tests covering gate behavior. The full HRAO-E production deployment runs 17 hard constraints and 1,929 tests; the open-source library is the portable extract.
Production Track Record (Falsifiable)
107 days live. 40 agents per cycle. 1,929 test functions covering gate behavior. 64 constitutional amendments ratified without losing hard constraint guarantees. Gate FREEZE states encountered and resolved. The system was stress-tested under real economic pressure — $720/month burn rate, 10.1 months runway, 901 users. These are specific, verifiable claims.
Hard Constraints Are Not Policies
The most important distinction between access governance and constitutional governance is not architectural. It is categorical. Access policies are overridable. Someone with sufficient permission can modify a policy file, grant an exception, or update a role. This is a feature of policy systems, not a flaw — flexibility is necessary for operations.
Hard constraints are different in kind, not just degree. They are embedded in the agent’s execution architecture. They cannot be overridden by the agent, by an operator with elevated permissions, by an API call, or by a sufficiently compelling argument from the agent’s own reasoning. (Section 0.7: Hard Constraints.)
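What "embedded in the execution architecture" can look like is sketched below: the checks live in the same code path that runs the action, and that path exposes no switch for skipping them. The names are hypothetical, and the single check is only in the spirit of a no-fabricated-data rule. Python cannot make anything truly tamper-proof, so read this as a picture of the design intent rather than a guarantee.

```python
from typing import Callable, Mapping, Tuple

Action = Mapping[str, object]


class HardConstraintViolation(Exception):
    """Raised when an action falls outside the permitted space."""


def forbid_unbacked_severity(action: Action) -> None:
    """Illustrative check: a severity rating must come with evidence."""
    if "severity" in action and not action.get("evidence"):
        raise HardConstraintViolation("severity rating without evidence")


# A tuple fixed at import time: the sketch offers no API for removing a check.
HARD_CONSTRAINTS: Tuple[Callable[[Action], None], ...] = (
    forbid_unbacked_severity,
)


def execute(action: Action, run: Callable[[Action], str]) -> str:
    """Run an action only after every hard constraint has accepted it.

    There is no `force`, `skip_checks`, or `justification` argument.
    """
    for check in HARD_CONSTRAINTS:
        check(action)
    return run(action)


result = execute(
    {"severity": "critical", "evidence": ["poc-output.log"]},
    lambda action: "reported",
)
print(result)
```

Contrast this with a policy file, where the rule is data that an administrator can edit and the enforcement point simply reads whatever the file currently says.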
CTE operates 17 hard constraints in production. They are enforced as typed code in the execution loop, not as policy files. Key examples relevant to Mythos-class deployments:
| Hard Constraint | What It Prevents in Mythos Context | Overridable? |
|---|---|---|
| HC-3 Runway floor | Agent cannot take actions that create financial liability threatening <3 months runway, including out-of-scope engagements that generate legal exposure | No |
| HC-6 No fabricated data | Agent cannot fabricate vulnerability severity ratings or exploit code confidence scores to appear more productive | No |
| HC-12 No silent agent outage >24h | An agent that goes silent during a red-team engagement (because it entered an unauthorized state) triggers mandatory human escalation within 24 hours | No |
| HC-14 No SQL string concatenation | Prevents the agent from being leveraged via prompt injection to write SQL-based attack payloads using injectable patterns | No |
| HC-17 No bare except: pass | Prevents silent failure swallowing: every exception is surfaced, preventing the agent from silently ignoring constraint violations | No |
A policy system can be instructed to ignore a rule. A hard constraint cannot. The difference matters most precisely in the situations where an autonomous agent is operating under high capability and low human supervision — which is exactly the Mythos-class deployment scenario.
This is not a claim that hard constraints eliminate all risk. It is a claim that they eliminate a specific category of risk: the agent reasoning its way past a governance boundary because the reasoning was compelling and the policy file was silent. Constitutional hard constraints are not silent on novel scenarios. They are definitional: the action is outside the permitted space, regardless of the reasoning.
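To make one row of the table concrete, here is what the pattern behind a no-SQL-string-concatenation rule looks like with Python's standard `sqlite3` module. The table name, the data, and the `findings_for_host` helper are invented for illustration; how CTE enforces HC-14 internally is not shown in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (id INTEGER PRIMARY KEY, host TEXT)")
conn.execute("INSERT INTO findings (host) VALUES (?)", ("10.20.0.15",))


def findings_for_host(conn: sqlite3.Connection, host: str) -> list:
    """Look up findings with a bound parameter.

    `host` is passed to the driver as data through the `?` placeholder.
    It is never spliced into the SQL text.
    """
    cursor = conn.execute(
        "SELECT id, host FROM findings WHERE host = ?", (host,)
    )
    return cursor.fetchall()


# A classic injection string is compared literally and matches no row.
hostile_input = "x' OR '1'='1"
print(findings_for_host(conn, hostile_input))
print(findings_for_host(conn, "10.20.0.15"))
```

Had the query been assembled by concatenating `hostile_input` into the string, the `OR '1'='1'` clause would have become part of the statement and returned every row. Banning the concatenation pattern outright removes that path regardless of what text the agent is fed.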
What This Means for Red-Team Organizations Deploying Mythos-Class Agents
Organizations deploying autonomous exploit-finding agents face a governance architecture question that is not answered by Glasswing membership or access control policy. The question is: when this agent is operating autonomously in a sensitive environment, what is the mechanism that prevents it from doing something constitutionally wrong?
“The policy file prohibits it” is an incomplete answer for three reasons. First, policy files enumerate anticipated scenarios. Mythos-class agents will encounter unanticipated scenarios. Second, policy files are overridable by sufficiently privileged operators — and the pressure to override governance during a high-stakes engagement is real. Third, policy enforcement happens at the boundary; it does not evaluate the reasoning quality of decisions made inside.
The complete governance answer for Mythos-class autonomous agents is the three-layer stack: WHO governance closes access (Glasswing does this), HOW governance enforces behavioral policies (OPA, Cedar, Microsoft AGT do this), and WHY governance constrains what the agent does when it encounters novel scenarios that policy files do not cover. The third layer is the one that is currently absent from most autonomous agent deployments, including security-specific ones.
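The three layers can be pictured as three checks applied in order, each answering a different question about the same request. This is a deliberately small sketch: the `Request` fields, the membership and deny sets, and the one-line layer functions are hypothetical stand-ins and do not represent Glasswing, OPA, Cedar, or the constitutional-agent library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    agent_id: str
    action: str
    in_engagement_scope: bool


# Hypothetical stand-ins for what each layer would consult.
AUTHORIZED_AGENTS = frozenset({"redteam-agent-01"})
DENIED_ACTIONS = frozenset({"delete_audit_log"})


def who_layer(req: Request) -> bool:
    """Access: is this agent permitted to operate here at all?"""
    return req.agent_id in AUTHORIZED_AGENTS


def how_layer(req: Request) -> bool:
    """Behavioral policy: is the action on the list of prohibited ones?"""
    return req.action not in DENIED_ACTIONS


def why_layer(req: Request) -> bool:
    """Intent: does the action serve the task that was authorized?"""
    return req.in_engagement_scope


def first_blocking_layer(req: Request) -> Optional[str]:
    """Return the name of the first layer that stops the request, if any."""
    layers = (("WHO", who_layer), ("HOW", how_layer), ("WHY", why_layer))
    for name, check in layers:
        if not check(req):
            return name
    return None


req = Request("redteam-agent-01", "probe_adjacent_database", False)
print(first_blocking_layer(req))
```

The example request comes from an authorized agent and names an action no policy prohibits, so the first two layers let it through. Only the third layer stops it, which is the gap described above.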
The Complementarity Point
This is not a critique of Glasswing. Glasswing closes the WHO layer correctly. The argument is that WHO governance is necessary but not sufficient. An organization with Glasswing membership, behavioral policies, and constitutional governance has covered all three layers. An organization with only Glasswing and policies has a verified identity and a defined policy set, but no governance for the scenarios the policy set never anticipated. For a Mythos-class agent, that gap is material.
The Governance Architecture Answer
CTE’s constitutional governance library is the WHY layer. It is the governance architecture that evaluates every autonomous action against six pre-execution gates and 17 inviolable hard constraints — not because a policy file said to, but because the agent’s execution architecture makes evaluation mandatory.
For Mythos-class deployments specifically, constitutional governance addresses the failure modes that access control cannot: scope creep into unanticipated territory, persistence beyond authorized sessions, autonomous escalation of critical findings, and metric gaming that inflates coverage statistics while operating outside constitutional bounds.
The governance stack for autonomous exploit-writing agents is not complete with WHO and HOW governance alone. The WHY layer is the piece that determines whether the agent — once authorized and policy-compliant — is also constitutionally sound.
When the cost of a zero-day drops to $50, that last layer is not optional.
The Constitutional Governance Research
The architecture described in this article is formalized in two peer-reviewable preprints: the constitutional self-governance framework (12 mechanisms, NIST/EU AI Act mapping) and the Agent Security Harness (protocol-level verification proving the WHY layer holds under adversarial conditions, including Mythos-class threat models).
- Constitutional Self-Governance (Zenodo)
- Agent Security Harness (Zenodo)
Is Your Agent System Ready for a Mythos-Class Threat?
The constitutional-agent library is the open-source layer. For organizations running autonomous agents in production, we offer a structured Governance Stress Test — 2 hours, 6 layers, scored output with remediation roadmap.
Add the WHY Layer to Your Autonomous Agents
constitutional-agent is the open-source Python library implementing CTE’s six-gate architecture and 17 hard constraints. Production-validated over 107 days. Available on PyPI and GitHub.
Frequently Asked Questions
What is Anthropic Mythos and why does it create a governance gap?
Mythos is an Anthropic AI system that autonomously finds zero-day vulnerabilities and writes production exploits for approximately $50 each. The governance gap it creates: Project Glasswing controls WHO gets authorized access to Mythos — but it does not govern what an autonomous multi-agent system does with those exploit-writing capabilities once it is inside an authorized session. Access control closes the WHO problem. It does not close the WHAT problem.
What is the difference between access governance (Glasswing) and constitutional governance (CTE)?
Access governance (the WHO layer) verifies identity, manages credentials, and controls which systems an agent can reach. Constitutional governance (the WHY layer) constrains what an agent does with those capabilities once it has them — preventing scope creep, autonomous escalation, persistence beyond authorized sessions, and actions that are technically permitted but constitutionally wrong. Glasswing ensures Mythos is accessed only by authorized red-teamers. Constitutional governance ensures that once Mythos is running, it cannot write exploits targeting the hiring organization, persist beyond its authorized session, or escalate autonomously.
What are hard constraints and how do they differ from access policies?
Access policies are overridable by administrators with sufficient permission. Hard constraints are embedded in the agent’s execution architecture and cannot be overridden by any runtime decision — not by the agent, not by an operator, not by an API call. CTE operates 17 hard constraints (HC-1 through HC-17) in production. HC-12, for example, prohibits silent agent outages exceeding 24 hours. No policy override can suspend it. This is the structural difference: a policy governs what is permitted. A hard constraint governs what is architecturally possible.
How do the six gates evaluate a Mythos-class autonomous operation?
Each gate evaluates a different constitutional dimension before execution proceeds. EG (Epistemic): Is this action within the verified scope of the authorized task? RG (Risk): Does writing this exploit expose the organization to trust damage beyond defined tolerance? GG (Governance): Is the agent gaming its own metrics by writing exploits that score well internally but are outside mission scope? EPG (Economic): Does this action threaten organizational runway? AAG (Autonomy): Is the agent operating within Level 4 autonomy bounds or escalating beyond them? CGG (Constitutional): Has the agent’s self-modification or tool expansion maintained constitutional alignment? All six gates must return PASS for execution to proceed.