CommonstackAI/ClawGuard

ClawGuard

ClawGuard is an independent LLM-based safety reviewer that validates destructive tool calls against the user's original intent before execution. It prevents autonomous AI actions that contradict user instructions, such as the incident in which an AI agent deleted emails despite the user saying "don't action until I tell you to".

What it does

ClawGuard intercepts destructive tool calls (delete, modify, exec, etc.) and compares them with the user's original message using a fast, small LLM. It decides:

  • pass — action aligns with user intent → execute normally
  • block — action clearly violates user intent → reject
  • escalate — uncertain → ask user for confirmation
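The three outcomes above can be sketched as a small dispatch function. This is only an illustration: the type and function names (`GuardianDecision`, `ToolCallOutcome`, `applyDecision`) are hypothetical and are not taken from the ClawGuard source, whose real types live in `src/security/guardian-types.ts`.

```typescript
// Hypothetical names for illustration; not the actual ClawGuard API.
type GuardianDecision = "pass" | "block" | "escalate";

type ToolCallOutcome =
  | { kind: "executed" }
  | { kind: "rejected"; reason: string }
  | { kind: "needs_confirmation"; prompt: string };

// Map a reviewer decision to what happens to the pending tool call.
function applyDecision(decision: GuardianDecision, toolName: string): ToolCallOutcome {
  switch (decision) {
    case "pass":
      // Action aligns with user intent: run the tool normally.
      return { kind: "executed" };
    case "block":
      // Action clearly violates user intent: reject it.
      return { kind: "rejected", reason: `${toolName} contradicts the user's instructions` };
    case "escalate":
      // Reviewer is uncertain: hand the choice back to the user.
      return { kind: "needs_confirmation", prompt: `Allow ${toolName}?` };
  }
}
```

Modelling the outcome as a discriminated union means a caller has to handle the confirmation case explicitly rather than treating "escalate" as a silent pass.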

Risk Classification

| Level | Tools | ClawGuard Review |
| --- | --- | --- |
| Red (destructive) | exec, fs_delete, fs_move, sessions_spawn, sessions_send, gateway, message/deleteMessage, message/editMessage, cron/create,update,remove, nodes/run,invoke | Yes |
| Yellow (mutating) | fs_write, message/send, browser/act,navigate | No |
| Green (safe) | read, browser/snapshot,status, web_search | No |

Plugins can declare their own risk level via riskLevel: "safe" | "mutating" | "destructive".
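A minimal sketch of how such a classification could work, assuming a static lookup table plus a registry for plugin-declared levels. Only a handful of tools from the table are included. The function names, the precedence of static entries over plugin entries, and the choice to treat unknown tools as destructive are all assumptions made for this example; the actual logic is in `src/security/guardian-risk.ts`.

```typescript
type RiskLevel = "safe" | "mutating" | "destructive";

// A few entries copied from the table above (not the full list).
const staticRisk = new Map<string, RiskLevel>([
  ["exec", "destructive"],
  ["fs_delete", "destructive"],
  ["fs_move", "destructive"],
  ["message/deleteMessage", "destructive"],
  ["fs_write", "mutating"],
  ["message/send", "mutating"],
  ["read", "safe"],
  ["web_search", "safe"],
]);

// Risk levels declared by plugins at registration time (hypothetical registry).
const pluginRisk = new Map<string, RiskLevel>();

function registerPluginRisk(toolName: string, level: RiskLevel): void {
  pluginRisk.set(toolName, level);
}

// Look up "tool/action" first, then the bare tool name.
// ASSUMPTION: unknown tools default to "destructive" so they get reviewed.
function classify(toolName: string, action?: string): RiskLevel {
  const key = action ? `${toolName}/${action}` : toolName;
  return (
    staticRisk.get(key) ??
    staticRisk.get(toolName) ??
    pluginRisk.get(key) ??
    pluginRisk.get(toolName) ??
    "destructive"
  );
}

// Only destructive calls are sent to the reviewer LLM.
function needsReview(toolName: string, action?: string): boolean {
  return classify(toolName, action) === "destructive";
}
```

For example, `needsReview("message", "deleteMessage")` is true in this sketch, while `needsReview("read")` is false.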

Install

Prerequisites

  • An OpenClaw source checkout (git clone)
  • Git CLI available in PATH

Linux / macOS

chmod +x install.sh
./install.sh /path/to/openclaw

Windows (PowerShell)

.\install.ps1 -OpenClawRoot C:\path\to\openclaw

Configure

Add to your openclaw.json:

{
  "safety": {
    "guardian": {
      "enabled": true,
      "model": "openrouter/qwen/qwen3-32b"
    }
  }
}

  • enabled (boolean): turns ClawGuard on or off. Default: false.
  • model (string, optional): reference to the reviewer LLM. Default: openrouter/qwen/qwen3-32b.
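To illustrate how the documented defaults apply, here is a sketch of resolving the `safety.guardian` block into a fully populated config. The interface and function names are hypothetical; only the two fields and their defaults come from the description above.

```typescript
// Shape of the config block shown above; type names are illustrative only.
interface GuardianConfig {
  enabled?: boolean;
  model?: string;
}

interface OpenClawConfigSketch {
  safety?: { guardian?: GuardianConfig };
}

const DEFAULT_MODEL = "openrouter/qwen/qwen3-32b";

// Fill in the documented defaults: disabled unless explicitly enabled,
// and the default model when none is given.
function resolveGuardianConfig(config: OpenClawConfigSketch): Required<GuardianConfig> {
  const guardian = config.safety?.guardian ?? {};
  return {
    enabled: guardian.enabled ?? false,
    model: guardian.model ?? DEFAULT_MODEL,
  };
}
```

With an empty `openclaw.json` this resolves to `enabled: false`, so ClawGuard stays off until it is switched on explicitly.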

Verify

Run the ClawGuard test suite:

cd /path/to/openclaw
npx vitest run src/security/guardian-risk.test.ts src/security/guardian-agent.test.ts src/security/guardian-hook.test.ts src/security/guardian-audit.test.ts src/security/guardian-config.test.ts src/security/guardian-scenario.test.ts

Expected: 120 tests passed.

Uninstall

Linux / macOS

./uninstall.sh /path/to/openclaw

Windows (PowerShell)

.\uninstall.ps1 -OpenClawRoot C:\path\to\openclaw

Then remove the "safety" key from your openclaw.json.

Architecture

User message ──┐
               ├──▶ Agent LLM ──▶ Tool Call ──▶ [before_tool_call hook]
               │                                        │
               │                              ┌─────────▼──────────┐
               │                              │  Risk Classification │
               │                              │  (guardian-risk.ts)  │
               │                              └─────────┬──────────┘
               │                                 safe?  │  destructive?
               │                                  │     │
               │                               pass     ▼
               │                              ┌─────────────────────┐
               └──────────────────────────────▶│   ClawGuard LLM     │
                  (original user message)      │   (qwen3-32b)       │
                                               └─────────┬──────────┘
                                                         │
                                              pass / block / escalate
                                                         │
                                                         ▼
                                               ┌──────────────────┐
                                               │   Audit Log       │
                                               │  (guardian-audit)  │
                                               └──────────────────┘

Key Design Decisions

  • Fail-closed: If the ClawGuard LLM is unavailable, times out, or returns unparseable output, the tool call is blocked (never silently allowed).
  • Audit trail: Every ClawGuard decision is persisted to ~/.openclaw/guardian/audit.jsonl.
  • Minimal latency: Only destructive tool calls are reviewed; safe/mutating operations pass through without LLM overhead.
  • Plugin support: Plugins declare riskLevel at registration, and ClawGuard classifies their tools accordingly.
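The fail-closed and audit-trail decisions can be sketched as follows. This assumes the reviewer replies with a JSON object of the form `{"decision": ..., "reason": ...}`; that response format, the audit entry fields, and every function name here are assumptions for illustration, not the actual ClawGuard implementation.

```typescript
type GuardianDecision = "pass" | "block" | "escalate";

interface ReviewResult {
  decision: GuardianDecision;
  reason: string;
}

// Parse the reviewer's raw reply. Anything that is not a well-formed
// decision is turned into a block (fail-closed).
// ASSUMPTION: the reviewer answers with JSON like {"decision": "...", "reason": "..."}.
function parseDecision(raw: string): ReviewResult {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const { decision, reason } = parsed as { decision?: unknown; reason?: unknown };
      if (decision === "pass" || decision === "block" || decision === "escalate") {
        return { decision, reason: typeof reason === "string" ? reason : "" };
      }
    }
  } catch {
    // Invalid JSON: fall through to the block below.
  }
  return { decision: "block", reason: "unparseable reviewer output" };
}

// Call the reviewer with a timeout; errors and timeouts also block.
async function reviewFailClosed(
  callReviewer: () => Promise<string>,
  timeoutMs: number,
): Promise<ReviewResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });
  try {
    return parseDecision(await Promise.race([callReviewer(), timeout]));
  } catch {
    return { decision: "block", reason: "reviewer unavailable or timed out" };
  } finally {
    clearTimeout(timer);
  }
}

// One JSONL record per decision; the field names are hypothetical.
function toAuditLine(toolName: string, result: ReviewResult, timestamp: string): string {
  return JSON.stringify({ timestamp, toolName, ...result }) + "\n";
}
```

Each line produced by `toAuditLine` could then be appended to `~/.openclaw/guardian/audit.jsonl`. Routing every failure path through a single "block" result means there is no code path in which an error silently allows the tool call.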

File Manifest

New Files

| File | Description |
| --- | --- |
| src/security/guardian-types.ts | Type definitions (config, risk levels, decisions, audit) |
| src/security/guardian-agent.ts | LLM safety reviewer (prompt, model resolution, response parsing) |
| src/security/guardian-audit.ts | JSONL audit logging |
| src/security/guardian-risk.ts | Tool risk classification (static + plugin + sub-action) |
| src/security/guardian-*.test.ts | Test suite (120 tests) |

Modified Files (via patch)

| File | Change |
| --- | --- |
| src/agents/pi-tools.before-tool-call.ts | Guardian hook integration + runGuardianCheck |
| src/agents/pi-tools.ts | Pass originalUserMessage, config, agentDir to hook context |
| src/agents/pi-embedded-runner/run/attempt.ts | Pass user prompt as originalUserMessage |
| src/config/types.openclaw.ts | Add safety?: SafetyConfig to OpenClawConfig |
| src/plugins/registry.ts | Add riskLevel to PluginToolRegistration |
| src/plugins/tools.ts | Register plugin risk levels with guardian |
| src/plugins/types.ts | Add riskLevel to OpenClawPluginToolOptions |
