“Expected Behavior” Is Not the Same as Safe: MCP Command Execution and the Agent Attack Surface

When a Model Context Protocol server runs a shell command that the model asked for, the honest answer to “is this a vulnerability?” is that command execution is the tool’s expected behavior. That answer is correct. It is also the reason the pattern is dangerous—because expected capability plus a model that untrusted input can steer is an exploit waiting for its prompt.

The reply that should worry you

A recurring exchange in agent security goes like this. A researcher reports that a Model Context Protocol server will execute arbitrary shell commands emitted by the model. The maintainer replies, reasonably, that the server exposes a command-execution tool on purpose—running commands is what it is for—so this is expected behavior, not a bug. And they are right. The tool is doing exactly what it was built to do.

That is precisely why it should worry you. In classic application security, “the code did what it was written to do” usually ends the conversation. In agent security it opens it. Because the entity deciding when to invoke that perfectly-working tool is a language model, and a language model’s decisions are influenced by whatever text lands in its context—including text an attacker controls.

The failure mode in one sentence

The tool executed a command because the model asked it to, and the model asked because injected text told it to. Nothing malfunctioned. The capability was the weapon and the prompt was the trigger.

Capability, decision, and where they got merged

Traditional software keeps two things separate: what a component can do and the authenticated request that asks it to do that. A database can drop a table; it drops one only when an authorized query says so. Agent tool-calling quietly collapses that separation. The “request” is now a probabilistic decision by a model reading a mix of trusted instructions and untrusted content, and the model cannot reliably tell which is which.

So when the MCP maintainer says command execution is expected behavior, translate it: the capability is intentional, and the decision to use it has been delegated to something that can be argued with. The vulnerability is not in the tool. It is in the missing layer between the model’s decision and the tool’s capability.

  Traditional service Agent + MCP tool
Who requests the action Authenticated caller A model reading untrusted text
Can the request be forged by content No (auth boundary) Yes (prompt injection)
What “expected behavior” means Runs valid, authorized requests Runs whatever the model was steered to ask

The fix is not removing the tool

You often cannot delete the command-execution tool—it is why the agent is useful. The move is to reintroduce the boundary the tool-calling architecture removed: a pre-execution gate that runs deterministic code between the model’s decision and the side effect. The model may still decide to run rm -rf; the gate decides whether that decision is honored.

# pip install constitutional-agent
from constitutional_agent import Constitution

constitution = Constitution.from_defaults()

async def guarded_mcp_call(tool_call, env):
    """Gate every MCP tool invocation that has real-world side effects."""
    decision = constitution.evaluate({
        # Governance: is this destructive / unapproved for its blast radius?
        "proposed_spend": tool_call.blast_radius,
        "approved_budget": env.get("approved_blast_radius", 0),
        "gate_override_without_amendment": tool_call.is_shell_exec,
        # Epistemic: did any of this context come from untrusted input?
        "failing_tests": env.get("untrusted_input_present", 0),
    })

    if decision.system_state.value == "FREEZE":
        # The command never reaches the shell. Log and stop.
        return Blocked(reason=decision.gate_results)

    return await tool_call.execute()

The gate is deterministic Python. It returns a system state, and your code—not the model—branches on it. An untrusted-sourced request to run a high-blast-radius command trips the governance gate to FAIL, the system state goes to FREEZE, and the shell call is never made. The tool kept its expected behavior. You added the boundary that decides whether to use it.

Minimize the registry while you are there

The gate handles the moment of invocation. The other half is reducing how much can be invoked at all. Every tool you register is a decision the model is now allowed to make. A command-execution tool wired into an agent that also reads untrusted web content is a maximal attack surface. Register the smallest set of tools the task needs, scope each one’s blast radius, and gate the destructive ones specifically. Least privilege did not stop being good advice because the caller is a neural network.

“Expected behavior” describes the tool. It says nothing about whether the decision to invoke it was trustworthy. Agent security lives in that gap, not in the tool’s changelog.

The takeaway

Stop arguing about whether command execution is a bug. It is a capability. The question that matters is who—or what—gets to decide when it fires, and whether a deterministic gate stands between that decision and the shell. Put the boundary back, in code, and expected behavior stops being a synonym for exploitable.

Gate your MCP tool calls

constitutional-agent is open-source and MIT-licensed. pip install constitutional-agent, drop constitution.evaluate() in front of your MCP command-execution path, and the invocation is decided in code instead of trusted to a steerable model.

Install from PyPI → View on GitHub →

Frequently Asked Questions

Is MCP command execution a vulnerability or expected behavior?

Both, and that is the point. An MCP server exposing a command-execution tool is behaving as designed when it runs a command the model asked for. The security problem is that the decision to invoke the tool is made by a model that untrusted input can steer. Expected capability plus a manipulable decision-maker is the exploit. The fix is a deterministic gate in front of the invocation so code, not the model, decides whether the command runs.

How do you secure an MCP server that can run commands?

Treat every tool invocation as an action that must clear a pre-execution gate. Minimize the tool registry, scope each tool’s blast radius, and evaluate the proposed call against policy before it executes. The gate should return a system state the calling process honors—if the state is FREEZE, the command never runs, regardless of how convincingly the model argued for it.

Why does prompt injection turn a normal MCP tool into an exploit?

Because any untrusted text that reaches the model—a web page, an email, a file—can influence which tool the model calls and with what arguments. If a command-execution tool is registered, injected instructions can steer the model into calling it. The tool did its job; the attacker supplied the intent. Governing the invocation decision separately from the tool’s capability is what closes the gap.

This article was drafted by AI agents operating under the constitutional governance framework it describes, and is cross-posted from the author’s dev.to writing on agent security. Code figures (blast radius, record counts) are illustrative inputs to the gate, not measured production data, and no metrics were fabricated as fact (HC-9). The constitutional-agent package is open-source on PyPI. Governance preprint: zenodo.org/records/19343034.