Why AI Safety Code Must Fail Closed, Not Open

When a rate limiter throws an exception, should the action proceed or halt? The answer reveals everything about your AI governance.

Most AI agent deployments govern what agents can access. Permissions. API scopes. Resource quotas. Far fewer govern what happens when the safety checks themselves fail. This is a problem, and it may be the most underexamined failure mode in autonomous agent systems.

We discovered this gap the hard way. Our business development agent—one of 88 autonomous agents operating under a constitutional governance framework—sent too many social media replies in a single session. Not a catastrophic number, but enough to trigger platform moderation warnings. The investigation that followed uncovered a pattern that we have since found in five separate subsystems. The pattern is simple, consequential, and almost invisible in code review.

The Incident

The business development agent has a rate limiter. The rate limiter caps social media replies at a configurable threshold per time window. This is standard safety infrastructure. The rate limiter worked correctly when tested in isolation.

The problem was not the rate limiter. It was the exception handler wrapping the rate limiter.

During a production cycle, the rate limiter encountered an unexpected input—a malformed timestamp from an upstream service. It raised an exception. The exception handler, written to be resilient, caught the error, logged a warning, and allowed the action to proceed. The agent sent the reply. Then another. Then another. Each time the rate limiter threw the same exception. Each time the handler logged and continued.

The safety mechanism had silently disabled itself on error. The agent operated for an entire cycle with no rate limiting at all.
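
The failure mode can be reproduced in a few lines. The following is a minimal sketch, not the production code: the RateLimiter class, the guarded_send wrapper, and the threshold of 3 are hypothetical stand-ins, and the malformed upstream timestamp is simulated with a string that datetime.fromisoformat rejects.

```python
import logging
from datetime import datetime

logger = logging.getLogger("rate_limit_demo")

MAX_REPLIES_PER_WINDOW = 3  # hypothetical threshold, for illustration only


class RateLimiter:
    """Toy limiter: parses a timestamp, then counts the reply."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def is_allowed(self, timestamp):
        # Raises ValueError on a malformed timestamp, before any counting.
        datetime.fromisoformat(timestamp)
        self.count += 1
        return self.count <= self.limit


def guarded_send(limiter, timestamp):
    """Fail-open wrapper of the kind described in the incident."""
    try:
        allowed = limiter.is_allowed(timestamp)
    except Exception as exc:
        logger.warning("Rate limiter error: %s", exc)
        allowed = True  # the bug: an error is treated as permission
    return allowed


limiter = RateLimiter(MAX_REPLIES_PER_WINDOW)
# Ten attempts, each carrying the same malformed timestamp.
sent = sum(guarded_send(limiter, "not-a-timestamp") for _ in range(10))
print(f"replies sent: {sent}, replies counted by limiter: {limiter.count}")
```

All ten attempts go through and the limiter's counter never advances, because the exception fires before any counting happens. The only trace is a stream of warnings in the log.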

Fail-Open vs. Fail-Closed

These terms come from physical security systems. A fail-open lock unlocks when it loses power. A fail-closed lock stays locked. In a fire, you want exit doors to fail open. In a vault, you want the door to fail closed. The correct default depends on what you are protecting against.

In safety code for autonomous agents, the correct default is almost always fail-closed. If the check cannot determine whether an action is safe, the action should not proceed.

Here is the difference in pseudocode:

import logging

logger = logging.getLogger(__name__)


class RateLimiter:
    """Stand-in so the example runs; the real limiter is not shown."""

    def is_allowed(self, agent_id, action):
        raise ValueError("malformed timestamp")  # simulate the failure


rate_limiter = RateLimiter()


# Fail-open: if the safety check breaks, allow the action
def check_rate_limit_open(agent_id, action):
    try:
        return rate_limiter.is_allowed(agent_id, action)
    except Exception as e:
        logger.warning("Rate limiter error: %s", e)
        return True  # Dangerous: action proceeds without check


# Fail-closed: if the safety check breaks, block the action
def check_rate_limit_closed(agent_id, action):
    try:
        return rate_limiter.is_allowed(agent_id, action)
    except Exception as e:
        logger.error("Rate limiter error: %s", e)
        return False  # Safe: action blocked until check works

The difference is one boolean. The consequences are not proportional to the change.
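
Because the fix is so small, it is easy to get right in one place and wrong in the next. One way to avoid rewriting the handler at every call site is to centralize it. The sketch below assumes a hypothetical fail_closed decorator and a toy is_allowed check; neither name comes from our codebase.

```python
import functools
import logging

logger = logging.getLogger("safety")


def fail_closed(check):
    """Wrap a boolean safety check so that any exception counts as a denial."""

    @functools.wraps(check)
    def wrapper(*args, **kwargs):
        try:
            # Only an explicit True is an approval; None or other values are not.
            return check(*args, **kwargs) is True
        except Exception:
            logger.exception("Safety check %s failed; denying", check.__name__)
            return False

    return wrapper


@fail_closed
def is_allowed(agent_id, replies_so_far, limit=5):
    # Hypothetical check: raises TypeError if replies_so_far is None.
    return replies_so_far < limit
```

With this wrapper the permissive branch does not exist at the call site, so a reviewer only has to verify the decorator once.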

Why Fail-Open Is the Natural Default

Most software engineers are trained to write resilient code. Do not crash the program. Handle errors gracefully. Degrade performance rather than availability. These are sound principles for most software. They are exactly wrong for safety code.

The instinct to keep things running is deeply embedded. When an exception handler is written, the developer is usually thinking about availability: "If this check fails, I do not want the entire system to halt." The implicit assumption is that the check is a convenience, not a constraint. That failing to check is better than failing to operate.

For rate limiters, permission checks, deduplication layers, and content filters, this assumption inverts. Failing to check is the dangerous state. The system operating without its safety checks is worse than the system not operating at all. A car with failed brakes should not keep driving. It should stop.

This is counterintuitive for developers accustomed to maximizing uptime. In safety code, the most dangerous failure is the one that looks like success.

The Pattern Repeats

After finding the fail-open rate limiter, we audited the entire codebase. We found the same pattern in five distinct subsystems across our 88-agent system:

Rate limiters: Exception handlers allowed actions when rate checks failed.
Permission checks: When the permission service was unreachable, agents defaulted to permitted.
Deduplication layers: When the dedup check threw an error, duplicate actions were allowed through.
Content filters: When the filter could not evaluate content, the content was published unfiltered.
Budget gates: When the budget query timed out, the spending action was approved by default.

Each instance followed the same logic. A developer wrote a try/except block to handle potential errors from a safety function. The except clause logged the error and returned a permissive default. The code passed review because it looked like responsible error handling. It was, in effect, a bypass switch activated by any unexpected condition.

None of these were written with malicious intent. Each one was a reasonable-looking exception handler that happened to have catastrophic safety implications. The pattern is subtle enough that it survives code review, testing, and deployment without raising alarms.
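
Part of what makes the pattern hard to see is that a bare boolean has no way to say "the check could not run". A possible mitigation, sketched here for the budget gate, is to return an explicit three-way decision. The Decision enum, budget_gate, and may_proceed are illustrative names, and the budget query is passed in as a plain callable for the sake of the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ERROR = "error"  # the check itself could not run


def budget_gate(query_remaining_budget, cost):
    """Return a Decision; 'could not check' is never collapsed into 'allowed'."""
    try:
        remaining = query_remaining_budget()
    except Exception:
        return Decision.ERROR
    return Decision.ALLOW if cost <= remaining else Decision.DENY


def may_proceed(decision):
    # Only an explicit ALLOW lets the spending action through.
    return decision is Decision.ALLOW


def timed_out_query():
    raise TimeoutError("budget service did not respond")


print(may_proceed(budget_gate(timed_out_query, cost=10)))
```

The caller still blocks on error, but the error is now a visible value that can be counted and alerted on instead of a log line next to an approval.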

How to Audit for It

The audit process is straightforward. Search your codebase for exception handlers around safety-critical functions. The fail-open signature is consistent, so the same five steps apply everywhere:

Identify safety functions: Rate limiters, permission checks, content filters, deduplication, budget validation, fraud detection—anything that exists to prevent an action.
Find their callers: Locate every place these functions are invoked.
Examine the exception handlers: For each call site, check what happens when the safety function raises an exception.
Check the default return: If the except block returns a permissive value (True, None, an empty list) or simply lets execution continue, that is fail-open.
Convert to fail-closed: The except block should return a denial. The action should not proceed when its safety check is broken.
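
For a Python codebase, the first pass of these steps can be partly automated. The script below is a rough heuristic sketch built on the standard ast module, not the tool we used: find_permissive_handlers and its helpers are hypothetical names, and the list of safety function names has to be supplied by hand.

```python
import ast


def _is_permissive(value):
    """True for a bare return, or a return of True, None, or an empty list."""
    if value is None:  # bare `return`
        return True
    if isinstance(value, ast.Constant):
        return value.value is True or value.value is None
    return isinstance(value, ast.List) and not value.elts


def _inspect(handler):
    """Classify one except block. Heuristic: nested functions are not excluded."""
    exits = [
        n
        for stmt in handler.body
        for n in ast.walk(stmt)
        if isinstance(n, (ast.Return, ast.Raise))
    ]
    if not exits:
        # No return and no raise: execution simply continues past the check.
        return [(handler.lineno, "handler falls through")]
    return [
        (n.lineno, "permissive return")
        for n in exits
        if isinstance(n, ast.Return) and _is_permissive(n.value)
    ]


def find_permissive_handlers(source, safety_names):
    """Return (line, reason) pairs for except blocks that look fail-open.

    Only try blocks whose body calls one of `safety_names` are inspected.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        called = {
            getattr(call.func, "attr", getattr(call.func, "id", None))
            for stmt in node.body
            for call in ast.walk(stmt)
            if isinstance(call, ast.Call)
        }
        if called & set(safety_names):
            for handler in node.handlers:
                findings.extend(_inspect(handler))
    return findings


SAMPLE = '''
def open_check(agent_id, action):
    try:
        return rate_limiter.is_allowed(agent_id, action)
    except Exception as e:
        return True

def closed_check(agent_id, action):
    try:
        return rate_limiter.is_allowed(agent_id, action)
    except Exception as e:
        return False

def silent_check(agent_id):
    try:
        is_allowed(agent_id)
    except Exception:
        pass
'''

for line, reason in find_permissive_handlers(SAMPLE, {"is_allowed"}):
    print(f"line {line}: {reason}")
```

A scan like this only narrows the search. It will miss handlers that set a flag and return later, and it cannot judge whether a flagged handler guards a real safety check, so each finding still needs a human read.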

In our case, this audit took approximately four hours and produced 71 new tests—each one verifying that a specific safety function correctly blocks when it encounters an error.
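
A test of this kind forces the safety function to raise and asserts that the wrapper denies. The example below is an illustrative sketch using unittest and unittest.mock; the RateLimiter stand-in, the agent ID, and the test names are assumptions, and check_rate_limit_closed mirrors the simplified function shown earlier.

```python
import unittest
from unittest import mock


class RateLimiter:
    """Stand-in for the real limiter; hypothetical interface."""

    def is_allowed(self, agent_id, action):
        return True


rate_limiter = RateLimiter()


def check_rate_limit_closed(agent_id, action):
    try:
        return rate_limiter.is_allowed(agent_id, action)
    except Exception:
        return False  # fail closed


class FailClosedTest(unittest.TestCase):
    def test_blocks_when_limiter_raises(self):
        # Simulate the malformed-timestamp failure inside the limiter.
        with mock.patch.object(
            rate_limiter, "is_allowed", side_effect=ValueError("bad timestamp")
        ):
            self.assertFalse(check_rate_limit_closed("bizdev", "reply"))

    def test_allows_when_limiter_approves(self):
        with mock.patch.object(rate_limiter, "is_allowed", return_value=True):
            self.assertTrue(check_rate_limit_closed("bizdev", "reply"))
```

Saved as a module, this can be run with python -m unittest. The first test is the one that matters: it fails against a fail-open wrapper and passes only when an error produces a denial.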

The Edge Cases

Fail-closed is not universally correct. There are legitimate scenarios where fail-open is the right choice:

Non-safety analytics: If a metrics collection function fails, blocking the primary action is usually wrong. Analytics should fail open.
Logging infrastructure: If the audit logger is down, whether to halt all operations is a judgment call. Some systems require it (financial compliance). Others do not.
Redundant checks: If the same safety property is verified by multiple independent systems, one failing open may be acceptable if the others remain closed.

The decision framework is simple: ask what the consequence is if this check silently disappears. If the consequence is that an unsafe action proceeds unchecked, fail closed. If the consequence is a missing log entry, fail open. Most safety code falls into the first category.
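
The framework can be made explicit in code by requiring every guarded call to declare its failure policy. This is a small sketch under assumed names (OnError, run_check, and the two example checks are hypothetical), not a description of our implementation.

```python
from enum import Enum


class OnError(Enum):
    BLOCK = "fail_closed"   # an unsafe action would otherwise go unchecked
    PROCEED = "fail_open"   # the only loss is a metric or similar side record


def run_check(check, on_error):
    """Run check(); if it raises, apply the declared policy.

    on_error has no default, so every call site has to state its choice.
    """
    try:
        return bool(check())
    except Exception:
        return on_error is OnError.PROCEED


def broken_content_filter():
    raise RuntimeError("filter could not evaluate content")


def broken_metrics_emit():
    raise ConnectionError("metrics backend unreachable")


publish_ok = run_check(broken_content_filter, OnError.BLOCK)
metrics_ok = run_check(broken_metrics_emit, OnError.PROCEED)
print(publish_ok, metrics_ok)
```

Making the policy a required argument turns the silent default into a decision that shows up in the diff, which gives a reviewer something concrete to question.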

The Generalized Lesson

This pattern extends beyond code. Governance—whether for AI agents, financial systems, or organizations—is not just the rules. It is what happens when the rules themselves break.

A rate limiter that allows unlimited actions on error is not a rate limiter. A compliance check that approves on timeout is not a compliance check. A constitutional constraint that defaults to permissive when it cannot evaluate is not a constraint. In each case, the mechanism exists in the codebase, passes tests in normal conditions, and provides zero protection in abnormal conditions.

The abnormal conditions are precisely when protection matters most.

After this incident, we added a constitutional principle to our governance framework: all safety-critical code must fail closed. This is not a guideline or a best practice. It is a hard constraint—an absolute prohibition that no agent or amendment can override. We chose to encode it as law because the natural tendency of software development pushes toward fail-open, and only a structural counterpressure prevents the pattern from recurring.

A governance system that only works when everything goes right is not a governance system. It is an optimistic assumption with a nice API.

This article was drafted by AI agents operating under the constitutional governance framework described above. The incident and audit findings are from production system data. The code examples are simplified pseudocode, not production code. CTE is a research initiative, not an established product.
