Does this prevent prompt injection?

Not on its own — nothing does. But it makes prompt injection structurally unable to escalate. The LLM can be persuaded to ask for anything; capability-based tool registration and the sandbox decide what actually runs.

What happens on internal errors?

Fail-secure: every error path returns Err(_) with a sanitised message. We never leak internal state, never default to 'permit on error', never panic on user input.

A zero-trust runtime for AI agents

The premise of zero-trust is simple: no input is trusted, no internal call is trusted, no error path is trusted. You validate at every boundary, you fail closed, you redact secrets even from yourself.

openclaw-rs applies this at five layers.

Layer 1: Input validation at boundaries

Every external byte hits a validator before anything else looks at it.

pub const MAX_MESSAGE_SIZE: usize = 100_000;        // 100 KB
pub const MAX_JSON_DEPTH: usize  = 32;
pub const MAX_ATTACHMENT_SIZE: u64 = 50_000_000;   // 50 MB
pub const MAX_ATTACHMENTS: usize = 10;

We reject oversized payloads, deeply-nested JSON, null bytes, control characters, and malformed UTF-8. Boring. Effective. Cheap.
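To make those checks concrete, here is a minimal sketch of a boundary validator using only the standard library. The ValidationError enum, the nesting_depth helper, and the exact signature of validate_message are assumptions for illustration, not the project's actual API; attachment limits are left out for brevity.

```rust
/// Limits copied from the constants above.
pub const MAX_MESSAGE_SIZE: usize = 100_000;
pub const MAX_JSON_DEPTH: usize = 32;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    TooLarge,
    InvalidUtf8,
    ForbiddenCharacter,
    TooDeep,
}

/// Checks raw bytes before any parser sees them. Cheapest checks run first.
pub fn validate_message(raw: &[u8]) -> Result<&str, ValidationError> {
    if raw.len() > MAX_MESSAGE_SIZE {
        return Err(ValidationError::TooLarge);
    }
    let text = std::str::from_utf8(raw).map_err(|_| ValidationError::InvalidUtf8)?;
    // Reject NUL and other control characters, but keep ordinary whitespace.
    if text.chars().any(|c| c.is_control() && !matches!(c, '\n' | '\r' | '\t')) {
        return Err(ValidationError::ForbiddenCharacter);
    }
    if nesting_depth(text) > MAX_JSON_DEPTH {
        return Err(ValidationError::TooDeep);
    }
    Ok(text)
}

/// Returns the deepest nesting of `{` / `[` found outside string literals.
/// This runs before JSON parsing, so a hostile payload never reaches the parser.
fn nesting_depth(text: &str) -> usize {
    let (mut depth, mut max, mut in_string, mut escaped) = (0usize, 0usize, false, false);
    for c in text.chars() {
        if in_string {
            match (escaped, c) {
                (true, _) => escaped = false,
                (false, '\\') => escaped = true,
                (false, '"') => in_string = false,
                _ => {}
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' | '[' => {
                depth += 1;
                max = max.max(depth);
            }
            '}' | ']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max
}
```

The depth scan is deliberately dumb: it does not validate JSON, it only refuses input that would make a recursive parser work too hard.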

Layer 2: Capability-based tool registration

Agents can’t invent tools. They invoke tools by name, and the name has to be in the ToolRegistry.

let mut tools = ToolRegistry::new();
tools.register("bash", BashTool::new(sandbox_config.clone()))?;
tools.register("read_file", ReadFileTool::new(workspace.clone()))?;
// Tools NOT registered here cannot be called, period.

If a prompt-injected LLM asks for tool: delete_all_data, the registry says no, and the runtime never spins up a process for it.
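The refusal path can be sketched in a few lines. This is an illustrative stand-in, not the real ToolRegistry: the Tool trait, the invoke method, the RegistryError variants, and EchoTool are all hypothetical, and the real interface is probably async.

```rust
use std::collections::HashMap;

/// Hypothetical tool interface for this sketch.
pub trait Tool {
    fn run(&self, args: &str) -> Result<String, String>;
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    Duplicate(String),
    UnknownTool,
    ToolFailed,
}

#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registration is the only way a name becomes callable.
    pub fn register(&mut self, name: &str, tool: impl Tool + 'static) -> Result<(), RegistryError> {
        if self.tools.contains_key(name) {
            return Err(RegistryError::Duplicate(name.to_string()));
        }
        self.tools.insert(name.to_string(), Box::new(tool));
        Ok(())
    }

    /// The single dispatch path: an unknown name is refused before any work starts.
    pub fn invoke(&self, name: &str, args: &str) -> Result<String, RegistryError> {
        let tool = self.tools.get(name).ok_or(RegistryError::UnknownTool)?;
        // The tool's own error text is dropped here so it cannot leak to the caller.
        tool.run(args).map_err(|_| RegistryError::ToolFailed)
    }
}

/// Trivial tool used only to exercise the registry.
pub struct EchoTool;

impl Tool for EchoTool {
    fn run(&self, args: &str) -> Result<String, String> {
        Ok(args.to_string())
    }
}
```

The design point is that lookup and dispatch share one code path. There is no fallback branch for names the registry has never seen.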

Layer 3: The sandbox

When a tool does run, it runs inside a platform sandbox: bubblewrap on Linux, sandbox-exec on macOS, or Job Objects on Windows. The sandbox caps CPU and memory, restricts the filesystem to the workspace, and blocks network access unless explicitly granted.

The sandbox is the bridge between “the LLM asked for it” and “your machine actually does it.” That bridge has a guard.
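For a feel of what the guard looks like on Linux, here is a rough sketch that builds a bubblewrap command line. SandboxPolicy, bwrap_args, and sandboxed_command are hypothetical names, and the bind mounts are a guess at a minimal layout rather than the project's real policy. The bwrap flags themselves are standard ones. Note that bubblewrap on its own does not cap CPU or memory, so those limits would have to come from elsewhere (rlimits or cgroups, for example).

```rust
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical policy type for this sketch.
pub struct SandboxPolicy {
    pub workspace: PathBuf,
    pub allow_network: bool,
}

/// Builds the argument list for `bwrap`. Kept separate from process spawning
/// so the policy can be inspected and tested without running anything.
pub fn bwrap_args(policy: &SandboxPolicy, program: &str, program_args: &[&str]) -> Vec<String> {
    let ws = policy.workspace.display().to_string();
    let mut args: Vec<String> = [
        "--unshare-all",      // new namespaces for everything, network included
        "--die-with-parent",  // kill the tool if the runtime exits
        "--new-session",      // detach from the controlling terminal
        "--ro-bind", "/usr", "/usr", // read-only system files (assumed layout)
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    // The workspace is the only writable path from the host.
    args.extend(["--bind".to_string(), ws.clone(), ws.clone(), "--chdir".to_string(), ws]);
    if policy.allow_network {
        args.push("--share-net".to_string()); // explicit opt-in only
    }
    args.push(program.to_string());
    args.extend(program_args.iter().map(|s| s.to_string()));
    args
}

/// Wraps the tool invocation in a `bwrap` process (not spawned here).
pub fn sandboxed_command(policy: &SandboxPolicy, program: &str, program_args: &[&str]) -> Command {
    let mut cmd = Command::new("bwrap");
    cmd.args(bwrap_args(policy, program, program_args));
    cmd
}
```

A real policy would also bind or symlink whatever else the tool needs (on many distributions /bin and /lib are symlinks into /usr), so treat this as the shape of the idea, not a working profile.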

Layer 4: Secrets that protect themselves

API keys live inside ApiKey:

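use std::fmt;
use secrecy::SecretBox;

pub struct ApiKey(SecretBox<str>);

impl fmt::Debug   for ApiKey { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { write!(f, "[REDACTED]") } }
impl fmt::Display for ApiKey { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { write!(f, "[REDACTED]") } }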

You can format-print an ApiKey anywhere — including a panic message or a tracing span — and the literal characters [REDACTED] come out. You unlock the contents only with an explicit .expose_secret() call, in the code path that actually makes the HTTP request.
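The redaction behaviour is easy to check in isolation. The sketch below swaps SecretBox<str> for a plain String so it compiles without dependencies, which means it shows only the formatting behaviour, not the zeroize-on-drop that secrecy provides. ProviderConfig and ApiKey::new are hypothetical.

```rust
use std::fmt;

/// Stand-in for the real type: a plain String replaces SecretBox<str> here.
pub struct ApiKey(String);

impl ApiKey {
    pub fn new(raw: impl Into<String>) -> Self {
        Self(raw.into())
    }

    /// Mirrors the explicit opt-in of secrecy's `expose_secret()`.
    pub fn expose_secret(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

impl fmt::Display for ApiKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical config struct: deriving Debug is safe because the
/// derived impl delegates to ApiKey's redacting Debug.
#[derive(Debug)]
pub struct ProviderConfig {
    pub base_url: String,
    pub api_key: ApiKey,
}
```

Because redaction lives on the type, any struct that contains an ApiKey can derive Debug without leaking the key.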

At rest, secrets go through the CredentialStore: AES-256-GCM, Argon2id key derivation, file permissions 0600, per-record nonces. See the AES-GCM piece for the long version.

Layer 5: Fail-secure errors

Every function that touches external state returns Result<T, E>. There is no unwrap() on user input, no panic!() on parse failure. Errors return sanitised messages; the underlying cause is logged but never returned to the caller. If we can’t decide whether something is safe, the answer is “no.”

match validate_message(&input) {
    Ok(msg) => process(msg).await,
    Err(_) => {
        // The error has been logged with full context internally.
        // The caller gets a generic 400.
        Err(GatewayError::InvalidInput)
    }
}
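One way to keep the two audiences apart is to give them separate error types. The sketch below is an assumption about how that split might look: InternalError, the Unavailable variant, status, and sanitise are hypothetical names, and a Vec<String> stands in for the tracing log.

```rust
use std::fmt;

/// Internal error: full detail, meant for operator logs only.
#[derive(Debug)]
pub enum InternalError {
    Parse { offset: usize, detail: String },
    Storage { path: String },
}

/// Public error: fixed variants with fixed messages, nothing interpolated.
#[derive(Debug, PartialEq)]
pub enum GatewayError {
    InvalidInput,
    Unavailable,
}

impl fmt::Display for GatewayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            GatewayError::InvalidInput => "invalid input",
            GatewayError::Unavailable => "service unavailable",
        })
    }
}

impl GatewayError {
    /// HTTP status the caller sees.
    pub fn status(&self) -> u16 {
        match self {
            GatewayError::InvalidInput => 400,
            GatewayError::Unavailable => 503,
        }
    }
}

/// Records the full cause in the log, then returns only the sanitised variant.
pub fn sanitise(err: InternalError, log: &mut Vec<String>) -> GatewayError {
    log.push(format!("{:?}", err));
    match err {
        InternalError::Parse { .. } => GatewayError::InvalidInput,
        InternalError::Storage { .. } => GatewayError::Unavailable,
    }
}
```

Since GatewayError carries no fields, there is nowhere for a file path or parser message to hide in the response.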

Rate limiting

Tower middleware enforces:

60 requests/min per client (gateway-level).
30 messages/min per session (agent-level).

This isn’t security in the classical sense — it’s the difference between an attacker burning through a budget in seconds and burning through it in months.
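The counting logic behind limits like these can be sketched as a fixed-window counter keyed by client or session. This is not Tower's API and not necessarily the algorithm the middleware uses; RateLimiter and check are hypothetical names, and the clock is passed in so the behaviour is testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window counter: at most `limit` hits per `window`, per key.
pub struct RateLimiter {
    limit: u32,
    window: Duration,
    // key -> (window start, hits counted in that window)
    buckets: HashMap<String, (Instant, u32)>,
}

impl RateLimiter {
    pub fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, buckets: HashMap::new() }
    }

    /// Returns true if the request is allowed, false if the key is over budget.
    pub fn check(&mut self, key: &str, now: Instant) -> bool {
        let entry = self.buckets.entry(key.to_string()).or_insert((now, 0));
        // Start a fresh window once the old one has fully elapsed.
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 >= self.limit {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```

With this shape, the gateway limit would be RateLimiter::new(60, Duration::from_secs(60)) keyed by client, and the agent limit RateLimiter::new(30, Duration::from_secs(60)) keyed by session. A fixed window allows short bursts at window edges; a sliding window or token bucket smooths that out at the cost of more state.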

Audit logging

Authentication attempts, authorisation decisions, tool executions, configuration changes, rate-limit triggers — all logged via tracing with structured fields. The log is your record of “what was attempted” even when policies blocked it.

Dependency hygiene

cargo-deny is part of CI. We reject:

Crates with known security advisories.
Copyleft licenses (we ship MIT).
Crates flagged as unmaintained, unless we have a documented reason to keep them.

We prefer rustls over openssl, aes-gcm over hand-rolled crypto, secrecy over raw String for sensitive types.

What you still own

Zero-trust at the runtime doesn’t make your tool code safe. If you register a tool that does format!("rm {}", user_input), the sandbox limits where the damage can land but won’t stop it: the policy allows the command because it came from your registered code.

Validate inside your tools too. Use serde to parse params into typed structures. Use validator or garde to constrain fields. Never unwrap() on a model-supplied value.
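As a dependency-free illustration of the same idea, the sketch below parses a model-supplied path into a typed value and passes it as a single argv entry instead of splicing it into a shell string. It uses only the standard library rather than serde, validator, or garde. WorkspacePath, ParamError, and remove_command are hypothetical names, and the check does not resolve symlinks.

```rust
use std::path::{Component, Path, PathBuf};
use std::process::Command;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParamError {
    Empty,
    EscapesWorkspace,
}

/// A path known to be relative and free of `..`, already joined to the
/// workspace root. Symlinks are NOT resolved in this sketch.
#[derive(Debug, PartialEq)]
pub struct WorkspacePath(PathBuf);

impl WorkspacePath {
    pub fn parse(workspace: &Path, raw: &str) -> Result<Self, ParamError> {
        if raw.is_empty() {
            return Err(ParamError::Empty);
        }
        let rel = Path::new(raw);
        // Only plain names and `.` are allowed; `/`, `..`, and drive
        // prefixes all fail this check.
        let confined = rel
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
        if !confined {
            return Err(ParamError::EscapesWorkspace);
        }
        Ok(Self(workspace.join(rel)))
    }

    pub fn as_path(&self) -> &Path {
        &self.0
    }
}

/// Builds argv directly. No shell is involved, so `;`, `$()`, and spaces
/// are just bytes in one filename argument. `--` stops option parsing in rm.
pub fn remove_command(target: &WorkspacePath) -> Command {
    let mut cmd = Command::new("rm");
    cmd.arg("--").arg(target.as_path());
    cmd
}
```

The type does the remembering: any function that takes a WorkspacePath cannot be handed an unchecked string by accident.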

The structure that makes this work

Defence in depth means each layer assumes the ones around it can fail, and contains the damage when they do.

If input validation slips: capability-based registration catches the missing tool.
If registration slips: the sandbox catches the dangerous tool process.
If the sandbox slips: secret redaction stops sensitive data leaving the process.
If redaction slips: audit logs tell you what happened.

That’s not paranoia. That’s how you build a runtime that ships AI agents to production without panic-deploying every time a new prompt-injection paper drops.