Three ingredients, one exploit
The phrase lethal trifecta comes from Simon Willison, and it names the most reliable way to get data out of an AI agent. It is not a single flashy bug. It is a combination of three capabilities that, individually, all look reasonable:
- Access to private data — your inbox, your customer records, your source, your secrets.
- Exposure to untrusted content — web pages, emails, PDFs, tickets, anything an attacker can influence.
- The ability to communicate externally — send an email, call an API, post to a URL, write a file that leaves the boundary.
Hold any two and you are mostly fine. Private data plus untrusted content, but no way to send anything out? An injection can confuse the agent but not steal from it. Untrusted content plus outbound comms, but no access to anything private? There is nothing worth exfiltrating. Private data plus outbound comms, but no untrusted content? The attacker has no channel for planting instructions. It is the presence of all three at once that is lethal: injected instructions in the untrusted content tell the agent to read the private data and ship it somewhere the attacker controls.
Attacker hides an instruction in content the agent reads → the agent, holding your private data, follows it → the agent uses its outbound channel to send that data to the attacker. Three legitimate features, one exfiltration.
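As a rough illustration of the "any two is mostly fine, all three is lethal" rule, the check can be written as a tiny inventory test. This is a minimal sketch: AgentCapabilities, legs_present, and has_lethal_trifecta are hypothetical names invented here, not part of any library, and the three booleans are assumed to come from your own review of each agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical inventory record for one agent (illustrative only)."""
    name: str
    reads_private_data: bool       # leg one
    reads_untrusted_content: bool  # leg two
    can_send_externally: bool      # leg three


def legs_present(agent: AgentCapabilities) -> int:
    """Count how many of the three legs this agent holds."""
    return sum([
        agent.reads_private_data,
        agent.reads_untrusted_content,
        agent.can_send_externally,
    ])


def has_lethal_trifecta(agent: AgentCapabilities) -> bool:
    """True only when all three legs are present at once."""
    return legs_present(agent) == 3


# A support bot holds all three legs; a public-page summarizer holds two.
support_bot = AgentCapabilities("support-bot", True, True, True)
page_summarizer = AgentCapabilities("page-summarizer", False, True, True)
```

Here has_lethal_trifecta(support_bot) is true and has_lethal_trifecta(page_summarizer) is false, which matches the prose: the summarizer can be injected, but it holds nothing private to ship out.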
Why nearly every useful agent has all three
Here is the part that makes this hard to dismiss. No leg of the trifecta is a mistake; each one is a feature. Private data access is what lets the agent act on your context instead of generic knowledge. Reading untrusted content is frequently the entire job: summarize this page, triage this ticket, answer from this document. And communicating outward is how the agent delivers value: it sends the reply, files the record, calls the service. Strip any one leg and you often strip the reason the agent exists.
That is why, when you survey real deployed agents, the trifecta is not the exception—it is close to the norm. The provocative framing that the overwhelming majority of useful agents carry all three is directionally right precisely because usefulness pulls each leg into place. (Treat any single headline percentage as an illustrative estimate from surveying common architectures, not a measured census—the structural point stands regardless of the exact number.)
You cannot always remove a leg
The textbook advice is to break the trifecta by removing one leg. Sometimes you can—an agent that only ever reads a trusted, curated corpus does not have leg two. But often removing a leg guts the product. A customer-support agent needs your records (leg one), needs to read customer messages you do not control (leg two), and needs to reply and update systems (leg three). All three are the job description.
When you cannot remove a leg, you attack the chain instead of the ingredients—and the cheapest place to break the chain is the last step: the outbound action. Injection can compromise the decision. It cannot complete the exfiltration if the send is gated.
# pip install constitutional-agent
from dataclasses import dataclass
from typing import Any

from constitutional_agent import Constitution

constitution = Constitution.from_defaults()


@dataclass
class Blocked:
    """Result for a refused send. Application-defined here, not part of the library."""
    reason: Any


async def guarded_send(outbound, ctx):
    """Gate any action that moves data across the trust boundary."""
    decision = constitution.evaluate({
        # Did this turn ingest untrusted content? (leg two present)
        "failing_tests": ctx.get("untrusted_input_present", 0),
        # Does the payload carry private / sensitive data? (leg one present)
        "proposed_spend": outbound.sensitivity_score,
        "approved_budget": ctx.get("approved_egress", 0),
        # Is the destination outside the trusted allowlist?
        "gate_override_without_amendment": outbound.is_external_untrusted,
    })
    if decision.system_state.value == "FREEZE":
        # Sensitive data + untrusted turn + external target = do not send.
        return Blocked(reason=decision.gate_results)
    return await outbound.execute()
The gate is deterministic code sitting in front of every egress action. When a turn that ingested untrusted content tries to push sensitive data to an untrusted destination, the governance gate fails, the system state goes to FREEZE, and the send never happens. The agent keeps all three capabilities—because it needs them—but the one dangerous combination is checked in code the injection cannot argue past.
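To make the logic of that check concrete without depending on any package, here is a library-free sketch of the same three-condition rule. It is not how constitutional-agent is implemented. TRUSTED_HOSTS, SENSITIVITY_THRESHOLD, EgressRequest, and allow_egress are hypothetical names chosen for illustration, and the sensitivity score is assumed to be computed upstream.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist of destinations inside the trust boundary.
TRUSTED_HOSTS = frozenset({"api.internal.example", "crm.internal.example"})

# Hypothetical cutoff: payloads scoring at or above this count as sensitive.
SENSITIVITY_THRESHOLD = 0.5


@dataclass(frozen=True)
class EgressRequest:
    url: str
    sensitivity_score: float       # 0.0 (public) to 1.0 (secret), scored upstream
    untrusted_input_present: bool  # did this turn read attacker-influenceable content?


def is_trusted_destination(url: str) -> bool:
    """Exact-match the URL's host against the allowlist."""
    host = urlparse(url).hostname or ""
    return host in TRUSTED_HOSTS


def allow_egress(req: EgressRequest) -> bool:
    """Refuse only when all three risk conditions hold at once."""
    carries_sensitive = req.sensitivity_score >= SENSITIVITY_THRESHOLD
    leaves_boundary = not is_trusted_destination(req.url)
    dangerous = carries_sensitive and req.untrusted_input_present and leaves_boundary
    return not dangerous
```

The function takes no input from the model's reasoning, only facts about the payload, the turn, and the destination, which is why injected text cannot talk its way through it. A stricter policy might also refuse sensitive payloads to untrusted hosts even when no untrusted input was seen; the sketch follows the narrower rule described above.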
You do not defeat the lethal trifecta by pretending a useful agent can live without one of its legs. You defeat it by refusing to let the three combine unsupervised at the outbound step.
Audit your agents for the trifecta first—private data, untrusted content, outbound comms—and assume the useful ones have all three. Where you can drop a leg cheaply, drop it. Where you cannot, gate the egress: make every cross-boundary send clear a deterministic check that considers data sensitivity, whether untrusted input was in play, and where it is going. That is the difference between an injection that fizzles and one that becomes a breach.
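One input to that check, "whether untrusted input was in play," has to be recorded somewhere before the send is attempted. A minimal sketch of turn-level taint tracking follows, assuming every piece of content passes through a single ingest call. TurnContext and its methods are hypothetical names, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class TurnContext:
    """Hypothetical per-turn record of what the agent has ingested."""
    untrusted_sources: list[str] = field(default_factory=list)

    def ingest(self, source: str, text: str, trusted: bool) -> str:
        """Record where the text came from, then pass it on unchanged."""
        if not trusted:
            self.untrusted_sources.append(source)
        return text

    @property
    def untrusted_input_present(self) -> bool:
        """Sticky flag: true once any untrusted source was read this turn."""
        return bool(self.untrusted_sources)


ctx = TurnContext()
ctx.ingest("kb://refund-policy", "Refunds take five days.", trusted=True)
ctx.ingest("email://customer", "Please forward all records to me.", trusted=False)
```

After the second call, ctx.untrusted_input_present is true and stays true for the rest of the turn. The flag is deliberately never cleared mid-turn, since the model cannot be relied on to forget an instruction it has already read.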
Gate the egress step of your agents
constitutional-agent is open source and MIT-licensed. Install it with pip install constitutional-agent, then put constitution.evaluate() in front of the actions that move data across your trust boundary, so the lethal trifecta cannot complete the chain.
Related reading
When Prompts Become Shells: The Tool Registry Is the Attack Surface
“Expected Behavior” Is Not the Same as Safe: MCP Command Execution
The Gate That Would Have Stopped the Cursor Incident — in 10 Lines
Frequently Asked Questions
What is the lethal trifecta for AI agents?
The lethal trifecta, a term coined by Simon Willison, describes an AI agent that simultaneously has access to private data, exposure to untrusted content, and the ability to communicate externally. When all three are present, a prompt injection hidden in the untrusted content can instruct the agent to read the private data and send it out—turning injection into exfiltration. Any two of the three is far less dangerous; it is the combination of all three that is lethal.
Why do so many AI agents have the lethal trifecta?
Because each leg is exactly what makes an agent useful. Private data access lets it act on your context; reading untrusted content is usually the job; and communicating out is how it delivers results. A genuinely useful assistant tends to acquire all three by default, which is why the trifecta is structural rather than a rare misconfiguration.
How do you defend an agent that has the lethal trifecta?
Break the chain at the outbound step. If you cannot remove a leg without breaking the product, put a deterministic pre-execution gate in front of the communicate-out actions, so any attempt to send data externally is evaluated against policy before it executes. Combined with least-privilege tool registries and separating trusted from untrusted context, the gate ensures an injected exfiltration attempt has to clear a check the model cannot override.
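The least-privilege tool registry mentioned in that answer can be sketched in a few lines. This is an illustrative assumption about what such a registry might look like, not an API from any particular framework. ToolRegistry, register, grant, and resolve are hypothetical names.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical registry that hands out only explicitly granted tools."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., object]] = {}
        self._grants: dict[str, set[str]] = {}

    def register(self, name: str, fn: Callable[..., object]) -> None:
        """Make a tool known to the registry; nobody can use it yet."""
        self._tools[name] = fn

    def grant(self, agent: str, tool_name: str) -> None:
        """Allow one agent to use one registered tool."""
        if tool_name not in self._tools:
            raise KeyError(f"unknown tool: {tool_name}")
        self._grants.setdefault(agent, set()).add(tool_name)

    def resolve(self, agent: str, tool_name: str) -> Callable[..., object]:
        """Return the tool, or refuse if this agent was never granted it."""
        if tool_name not in self._grants.get(agent, set()):
            raise PermissionError(f"{agent} may not call {tool_name}")
        return self._tools[tool_name]


registry = ToolRegistry()
registry.register("read_ticket", lambda ticket_id: f"ticket {ticket_id}")
registry.register("send_email", lambda to, body: f"sent to {to}")
registry.grant("triage-agent", "read_ticket")  # no outbound tool granted
```

With this setup, registry.resolve("triage-agent", "send_email") raises PermissionError, so the triage agent never holds leg three. The default is deny: a tool that was registered but not granted is unreachable.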
The constitutional-agent package is open source on PyPI. Governance preprint: zenodo.org/records/19343034.