An independent LLM-based safety reviewer that validates destructive tool calls against the user's original intent before execution. Prevents autonomous AI actions that contradict user instructions — such as the incident where an AI agent deleted emails despite the user saying "don't action until I tell you to".
ClawGuard intercepts destructive tool calls (delete, modify, exec, etc.) and compares them with the user's original message using a fast, small LLM. It decides:
- pass — action aligns with user intent → execute normally
- block — action clearly violates user intent → reject
- escalate — uncertain → ask user for confirmation
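The three outcomes above can be sketched as a small dispatch function. This is a minimal illustration, not the actual hook code: `GuardianDecision`, `ToolCallOutcome`, and `applyDecision` are hypothetical names, and the real definitions live in `src/security/guardian-types.ts`.

```typescript
// Hypothetical types for illustration; not taken from the ClawGuard source.
type GuardianDecision = "pass" | "block" | "escalate";

type ToolCallOutcome =
  | { kind: "execute" }
  | { kind: "reject"; reason: string }
  | { kind: "confirm"; prompt: string };

// Map a reviewer decision to what happens to the pending tool call.
function applyDecision(decision: GuardianDecision, toolName: string): ToolCallOutcome {
  switch (decision) {
    case "pass":
      // Action aligns with user intent: run the tool as usual.
      return { kind: "execute" };
    case "block":
      // Action clearly violates user intent: refuse to run it.
      return { kind: "reject", reason: `ClawGuard blocked ${toolName}` };
    case "escalate":
      // Reviewer is unsure: hand the decision back to the user.
      return { kind: "confirm", prompt: `Allow ${toolName}?` };
  }
}
```

For example, `applyDecision("escalate", "fs_delete")` yields a `confirm` outcome, so the delete only runs after the user approves it.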
| Level | Tools | ClawGuard Review |
|---|---|---|
| Red (destructive) | exec, fs_delete, fs_move, sessions_spawn, sessions_send, gateway, message/deleteMessage, message/editMessage, cron/create,update,remove, nodes/run,invoke | Yes |
| Yellow (mutating) | fs_write, message/send, browser/act,navigate | No |
| Green (safe) | read, browser/snapshot,status, web_search | No |
Plugins can declare their own risk level via riskLevel: "safe" | "mutating" | "destructive".
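A rough sketch of how such a classification could work, combining a static table, sub-actions, and plugin-declared levels. All function and variable names here are hypothetical, the table holds only a few of the rows listed above, and two behaviours are assumptions rather than documented facts: built-in entries take precedence over plugin declarations, and unknown tools are treated as destructive.

```typescript
type RiskLevel = "safe" | "mutating" | "destructive";

// A few rows from the table above, keyed by "tool" or "tool/action" (not exhaustive).
const staticRisk = new Map<string, RiskLevel>([
  ["exec", "destructive"],
  ["fs_delete", "destructive"],
  ["message/deleteMessage", "destructive"],
  ["fs_write", "mutating"],
  ["message/send", "mutating"],
  ["read", "safe"],
  ["web_search", "safe"],
]);

// Risk levels declared by plugins at registration time.
const pluginRisk = new Map<string, RiskLevel>();

function registerPluginRisk(toolName: string, riskLevel: RiskLevel): void {
  pluginRisk.set(toolName, riskLevel);
}

function classifyRisk(toolName: string, action?: string): RiskLevel {
  const key = action ? `${toolName}/${action}` : toolName;
  // Assumption: built-in entries win over plugin declarations, and unknown
  // tools fall back to "destructive" so they still get reviewed.
  return (
    staticRisk.get(key) ??
    staticRisk.get(toolName) ??
    pluginRisk.get(toolName) ??
    "destructive"
  );
}

// Only destructive calls are sent to the reviewer LLM.
function needsGuardianReview(toolName: string, action?: string): boolean {
  return classifyRisk(toolName, action) === "destructive";
}
```

With this sketch, `classifyRisk("message", "send")` returns `"mutating"` while `classifyRisk("message", "deleteMessage")` returns `"destructive"`, mirroring the sub-action split in the table.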
- An OpenClaw source checkout (git clone)
- Git CLI available in PATH
chmod +x install.sh
./install.sh /path/to/openclaw

.\install.ps1 -OpenClawRoot C:\path\to\openclaw

Add to your openclaw.json:
{
  "safety": {
    "guardian": {
      "enabled": true,
      "model": "openrouter/qwen/qwen3-32b"
    }
  }
}

- `enabled` (boolean): Enable/disable ClawGuard. Default: `false`.
- `model` (string, optional): LLM model reference. Default: `openrouter/qwen/qwen3-32b`.
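As an illustration of the documented defaults, the following sketch resolves the `safety.guardian` section into a fully populated config. `SafetyConfig` is named in the modified-files table; the `GuardianConfig` shape and the `resolveGuardianConfig` helper are assumptions made for this example.

```typescript
// Assumed shapes; the real types are defined in the ClawGuard source.
interface GuardianConfig {
  enabled?: boolean;
  model?: string;
}

interface SafetyConfig {
  guardian?: GuardianConfig;
}

const DEFAULT_GUARDIAN_MODEL = "openrouter/qwen/qwen3-32b";

// Fill in the documented defaults: disabled unless enabled explicitly,
// and the default model when none is configured.
function resolveGuardianConfig(safety?: SafetyConfig): Required<GuardianConfig> {
  return {
    enabled: safety?.guardian?.enabled ?? false,
    model: safety?.guardian?.model ?? DEFAULT_GUARDIAN_MODEL,
  };
}
```

Under these assumptions, a config file with no `safety` key resolves to a disabled guardian, so installing ClawGuard changes nothing until `enabled` is set to `true`.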
Run the ClawGuard test suite:
cd /path/to/openclaw
npx vitest run src/security/guardian-risk.test.ts src/security/guardian-agent.test.ts src/security/guardian-hook.test.ts src/security/guardian-audit.test.ts src/security/guardian-config.test.ts src/security/guardian-scenario.test.ts

Expected: 120 tests passed.
./uninstall.sh /path/to/openclaw

.\uninstall.ps1 -OpenClawRoot C:\path\to\openclaw

Then remove the "safety" key from your openclaw.json.
User message ──┐
               ├──▶ Agent LLM ──▶ Tool Call ──▶ [before_tool_call hook]
               │                                           │
               │                                ┌──────────▼──────────┐
               │                                │ Risk Classification │
               │                                │ (guardian-risk.ts)  │
               │                                └──────────┬──────────┘
               │                                   safe?   │  destructive?
               │                                     │     │
               │                                   pass    ▼
               │                                ┌─────────────────────┐
               └───────────────────────────────▶│    ClawGuard LLM    │
                  (original user message)       │     (qwen3-32b)     │
                                                └──────────┬──────────┘
                                                           │
                                                pass / block / escalate
                                                           │
                                                           ▼
                                                  ┌──────────────────┐
                                                  │    Audit Log     │
                                                  │ (guardian-audit) │
                                                  └──────────────────┘
- Fail-closed: If the ClawGuard LLM is unavailable, times out, or returns unparseable output, the tool call is blocked (never silently allowed).
- Audit trail: Every ClawGuard decision is persisted to `~/.openclaw/guardian/audit.jsonl`.
- Minimal latency: Only destructive tool calls are reviewed; safe/mutating operations pass through without LLM overhead.
- Plugin support: Plugins declare `riskLevel` at registration; ClawGuard respects plugin-declared risk levels.
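The fail-closed property in the list above can be sketched as a wrapper that turns every failure mode (error, timeout, unparseable reply) into `block`. This is an assumption-laden illustration: `parseDecision`, `reviewFailClosed`, the `callReviewer` callback, and the `{"decision": ...}` reply format are all hypothetical and not taken from `guardian-agent.ts`.

```typescript
type GuardianDecision = "pass" | "block" | "escalate";

// Parse the reviewer's raw reply. Assumed format: {"decision": "pass" | "block" | "escalate"}.
// Anything that does not match is treated as "block".
function parseDecision(raw: string): GuardianDecision {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const decision = (parsed as { decision?: unknown }).decision;
      if (decision === "pass" || decision === "block" || decision === "escalate") {
        return decision;
      }
    }
  } catch {
    // Invalid JSON falls through to the fail-closed default.
  }
  return "block";
}

// Run the reviewer with a timeout; errors and timeouts both resolve to "block".
async function reviewFailClosed(
  callReviewer: () => Promise<string>,
  timeoutMs: number,
): Promise<GuardianDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("guardian timeout")), timeoutMs);
  });
  try {
    return parseDecision(await Promise.race([callReviewer(), timeout]));
  } catch {
    return "block";
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The design choice worth noting is that there is no code path that returns `pass` by default: a tool call is only allowed when the reviewer answers in time with a well-formed `pass`.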
| File | Description |
|---|---|
| `src/security/guardian-types.ts` | Type definitions (config, risk levels, decisions, audit) |
| `src/security/guardian-agent.ts` | LLM safety reviewer (prompt, model resolution, response parsing) |
| `src/security/guardian-audit.ts` | JSONL audit logging |
| `src/security/guardian-risk.ts` | Tool risk classification (static + plugin + sub-action) |
| `src/security/guardian-*.test.ts` | Test suite (120 tests) |
| File | Change |
|---|---|
| `src/agents/pi-tools.before-tool-call.ts` | Guardian hook integration + `runGuardianCheck` |
| `src/agents/pi-tools.ts` | Pass `originalUserMessage`, `config`, `agentDir` to hook context |
| `src/agents/pi-embedded-runner/run/attempt.ts` | Pass user prompt as `originalUserMessage` |
| `src/config/types.openclaw.ts` | Add `safety?: SafetyConfig` to `OpenClawConfig` |
| `src/plugins/registry.ts` | Add `riskLevel` to `PluginToolRegistration` |
| `src/plugins/tools.ts` | Register plugin risk levels with guardian |
| `src/plugins/types.ts` | Add `riskLevel` to `OpenClawPluginToolOptions` |