A prompt is only text until there is a tool behind it
A language model with no tools is, from a security standpoint, mostly harmless. It can be tricked into saying regrettable things, but it cannot reach out and touch a database, send an email, spend money, or run a shell. The instant you register tools, that changes categorically. Now the model’s output can name a function, and that function does something in the real world. The prompt has stopped being mere text and become, effectively, a shell: an interpreter that turns natural language into privileged calls.
This reframes the whole security question. Teams spend enormous effort trying to make the model refuse bad instructions. Useful, but secondary. The primary fact is simpler and more uncomfortable: the agent can only ever do what is in its tool registry. What a successful prompt injection can reach is exactly the set of tools you registered, no more and no less. The registry is the attack surface.
Prompt injection is not the vulnerability—it is the delivery mechanism. The vulnerability is a powerful, unscoped tool sitting in the registry where an injected prompt can reach it.
Untrusted input is closer than you think
“But I control the prompt” is the reassurance that fails. You control the system prompt. You do not control the web page the agent summarizes, the PDF it parses, the support ticket it triages, the email it reads, or the API response it ingests. All of that is untrusted content flowing into the same context window where the model decides which tool to call. Any of it can carry instructions. If a destructive tool is registered, any of those channels can, in principle, fire it.
| Registry choice | What an injection can reach | Blast radius |
|---|---|---|
| Read-only search + summarize | Information disclosure at most | Low |
| + send_email, post_message | Exfiltration, impersonation | Medium |
| + run_shell, delete_records, transfer_funds | Destruction, financial loss | High — unbounded |
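The table can be read as a worst-case calculation: an injection can name any registered tool, so the most dangerous tool in the registry sets the bound. A minimal sketch of that reading follows. The `Radius` tiers, the `Tool` record, and `reachable_radius` are hypothetical names introduced for illustration, not part of any library.

```python
from dataclasses import dataclass
from enum import IntEnum


class Radius(IntEnum):
    """Hypothetical blast-radius tiers mirroring the table rows."""
    LOW = 1      # read-only: information disclosure at most
    MEDIUM = 2   # outbound messaging: exfiltration, impersonation
    HIGH = 3     # destructive: deletion, execution, spending


@dataclass(frozen=True)
class Tool:
    name: str
    radius: Radius


def reachable_radius(registry: list[Tool]) -> Radius:
    """Worst-case effect an injected prompt could trigger through this registry."""
    # An injection can name any registered tool, so the worst tool sets the bound.
    # An empty registry is treated as LOW here: the model can only emit text.
    return max((tool.radius for tool in registry), default=Radius.LOW)


read_only = [Tool("search", Radius.LOW), Tool("summarize", Radius.LOW)]
with_messaging = read_only + [Tool("send_email", Radius.MEDIUM)]
with_destructive = with_messaging + [Tool("run_shell", Radius.HIGH)]

for registry in (read_only, with_messaging, with_destructive):
    print([t.name for t in registry], "->", reachable_radius(registry).name)
```

Adding one destructive tool moves the whole registry to the highest tier, however many harmless tools sit beside it.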
Two moves that actually shrink it
You reduce this attack surface the same way you reduce any privilege surface: fewer grants, and a checkpoint on the dangerous ones.
1. Treat every registered tool as a privilege grant
Before adding a tool, ask the question you would ask before granting a production credential: does this task actually need it, and what is the worst a single call can do? Split read-only tools from destructive ones. Scope each tool as tightly as it will go—a query_orders that can only read the caller’s own orders is a smaller surface than a generic run_sql. A registry curated this way turns most injections into dead ends because the tool the attacker wants was never registered.
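To make the query_orders versus run_sql contrast concrete, here is one way a scoped tool might be built, using Python's standard sqlite3 module. The `orders` schema, the `make_query_orders` factory, and the `status` parameter are assumptions made for this sketch. The point is that the caller's ID is bound in code and the SQL text is fixed, so the model only ever chooses a filter value.

```python
import sqlite3


def make_query_orders(conn: sqlite3.Connection, caller_id: int):
    """Build a read-only tool bound to one caller; the model never supplies the ID."""

    def query_orders(status: str) -> list[tuple]:
        # Fixed, parameterized statement: the model chooses only the status filter.
        rows = conn.execute(
            "SELECT id, status, total FROM orders "
            "WHERE customer_id = ? AND status = ?",
            (caller_id, status),
        )
        return rows.fetchall()

    return query_orders


# Demo data in an in-memory database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 7, "shipped", 20.0), (2, 7, "open", 5.5), (3, 9, "shipped", 99.0)],
)

query_orders = make_query_orders(conn, caller_id=7)
print(query_orders("shipped"))
# An injected argument is treated as data, not SQL, so it matches no rows.
print(query_orders("shipped' OR '1'='1"))
```

Even a fully injected call to this tool can read only customer 7's orders. Another customer's rows, or a DROP TABLE, are not expressible through its interface.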
2. Put a pre-execution gate on the destructive ones
For the tools you must keep—the ones that write, delete, spend, or execute—add a deterministic gate between the model’s decision and the effect. The model can still be injected into asking. The gate decides whether the ask is honored, in code the attacker cannot talk their way past.
    # pip install constitutional-agent
    from dataclasses import dataclass
    from typing import Any

    from constitutional_agent import Constitution

    constitution = Constitution.from_defaults()


    @dataclass
    class Blocked:
        # Minimal result type for refused calls. Not part of constitutional-agent;
        # defined here so the example is complete.
        reason: Any


    async def dispatch_tool(tool_call, ctx):
        # Read-only tools clear a lightweight path.
        if tool_call.is_read_only:
            return await tool_call.execute()

        # Destructive tools must clear the gate first.
        decision = constitution.evaluate({
            "proposed_spend": tool_call.blast_radius,
            "approved_budget": ctx.get("approved_blast_radius", 0),
            "gate_override_without_amendment": tool_call.is_destructive,
            "failing_tests": ctx.get("untrusted_input_present", 0),
        })
        if decision.system_state.value == "FREEZE":
            return Blocked(reason=decision.gate_results)  # prompt-as-shell defused
        return await tool_call.execute()
The gate is framework-agnostic—it does not care whether the tools come from MCP, native function-calling, LangGraph, or a hand-rolled loop. It cares only that a destructive invocation, however it was triggered, has to pass a deterministic check whose answer the model does not get a vote on.
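The same idea can be sketched with the standard library alone, which shows how little the gate depends on any particular framework. Everything here is hypothetical: the `GatedTool` record, the `Refusal` type, and the policy itself (refuse destructive calls while untrusted input is in context unless a human has approved). It is one possible deterministic rule, not the policy that constitutional-agent applies.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Refusal:
    reason: str


@dataclass(frozen=True)
class GatedTool:
    name: str
    fn: Callable[..., Any]
    destructive: bool


def dispatch(tool: GatedTool, args: dict, *, untrusted_input_present: bool,
             human_approved: bool = False) -> Any:
    """Run a tool only if a fixed policy allows it."""
    if not tool.destructive:
        return tool.fn(**args)
    # The policy is plain code: nothing in the model's context can change it.
    if untrusted_input_present and not human_approved:
        return Refusal(f"{tool.name}: destructive call with untrusted input in context")
    return tool.fn(**args)


deleted: list[int] = []
delete_record = GatedTool("delete_record", lambda record_id: deleted.append(record_id), True)
lookup = GatedTool("lookup", lambda record_id: {"id": record_id}, False)

# Read-only call passes; the destructive call is refused until approved.
print(dispatch(lookup, {"record_id": 1}, untrusted_input_present=True))
print(dispatch(delete_record, {"record_id": 1}, untrusted_input_present=True))
dispatch(delete_record, {"record_id": 1}, untrusted_input_present=True, human_approved=True)
print(deleted)
```

The flags are passed by the calling code rather than derived from model output. That separation keeps the decision out of reach of injected text.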
You cannot make a language model injection-proof; that is an open research problem. You can make sure that even a perfect injection lands on a small, scoped registry with a gate in front of anything that bites.
Stop treating the tool registry as plumbing you configure once and forget. It is the definition of your agent’s capability and therefore the definition of your attack surface. Curate it like a permission set, split read from write, and gate the destructive calls. When prompts become shells, the registry is the thing you actually control.
Gate the destructive tools in your registry
constitutional-agent is open-source and MIT-licensed. Run pip install constitutional-agent, split read-only from destructive tools, and put constitution.evaluate() in front of anything that writes, deletes, spends, or executes.
Related reading
“Expected Behavior” Is Not the Same as Safe: MCP Command Execution
Most Useful Agents Carry the Lethal Trifecta
The Gate That Would Have Stopped the Cursor Incident — in 10 Lines
Frequently Asked Questions
Why is the tool registry the real attack surface of an AI agent?
Because the set of registered tools defines everything the agent can do in the world, and prompt injection can steer the model into calling any of them. The model is the decision-maker, but the registry is the reachable capability. A large registry with destructive tools means any untrusted text the model reads is a potential command against those tools. Shrinking and scoping the registry directly shrinks the attack surface.
How do you reduce an AI agent’s attack surface?
Apply least privilege to the tool registry: register only the tools the task needs, scope each tool’s blast radius, separate read-only tools from destructive ones, and put a deterministic pre-execution gate in front of the destructive ones. The registry is a privilege grant, so treat adding a tool the way you would treat granting a new production permission.
Does prompt injection let attackers run tools the agent has?
Yes. If untrusted content reaches the model and a powerful tool is registered, injected instructions can steer the model into invoking that tool with attacker-chosen arguments. The tool functions normally; the intent came from the injected text. The defense is not detecting every injection but ensuring that even a successfully injected call must clear a gate before it executes.
The constitutional-agent package is open-source on PyPI. Governance preprint: zenodo.org/records/19343034.