
Hook-Based Guardrails

Pre and post tool-call hooks are the most practical, deployable defensive control for agent systems today. They enforce behavior at the tool boundary — the exact point where an agent's reasoning becomes a real-world action. Here's how I use them, with real code from my own systems.

Prerequisite: This article is part of the Agentic AI Kill Chain — read it first for the full threat model.

Why Hooks

An AI agent reasons in natural language. It plans in natural language. You can't reliably validate natural language reasoning — that's the prompt injection problem. But at some point, the agent's reasoning turns into a concrete action: a file write, a shell command, an API call. That's the tool boundary. And that's where hooks operate.

Hooks are shell scripts or configuration rules that run before or after every tool call. They can block, warn, modify, or log. They're not sophisticated — they're regex, file patterns, and exit codes. But they're deployable today, in any agent framework that supports them, with no model changes required.
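
Stripped to its essence, a hook is just this contract: read a payload, act, signal by exit code. A minimal log-only sketch (the payload is passed as an argument here for clarity; a real hook reads it from stdin with INPUT=$(cat)):

```shell
#!/bin/bash
# Minimal log-only hook sketch: accept a tool-call payload, append an
# audit line, never block.
log_call() {   # $1 = tool-call payload JSON
  printf '%s %s\n' "$(date -u +%FT%TZ)" "$1" >> "${AUDIT_LOG:-/tmp/agent-audit.log}"
}

log_call '{"tool_name":"Bash","tool_input":{"command":"ls"}}'
# exit 0 allows the call; exit 2 blocks it (stderr becomes the reason)
```

Everything else in this article elaborates that one contract.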

The key insight: You can't validate what an agent thinks. You can validate what it does. Hooks validate what it does.
Kill Chain stages where hooks apply

  • Hooks can block injected instructions from reaching sensitive files (protect-files.sh) and enforce structured input validation on tool parameters.
  • Hooks enforce least privilege by blocking destructive operations (block-dangerous-commands.sh) and gating tool access by agent role (pre-tool-gate.json).
  • Hooks protect instruction files and configs from modification, preventing an attacker from poisoning the agent's memory or startup instructions.

Three Architectures

Hooks work differently depending on whether you're using an AI coding assistant, building with an agent SDK, or working in a spec-driven IDE. The core contract is the same — event, matcher, handler, decision — but the implementation shapes what's possible.

AI Coding Assistant Hooks: shell scripts in settings.json

Hooks are shell scripts defined in settings.json and triggered by 20+ lifecycle events. They receive JSON on stdin (tool name, input parameters, session context) and block via exit code 2 or JSON output. This is the DevOps model: write a bash script, match it to a tool, deploy via git.

When to use

Solo developer or small team using AI coding assistants for development. You want guardrails on file writes, shell commands, and git operations. Your hooks live in the repo and apply to everyone who clones it.

Key capabilities

  • 20+ lifecycle events (as of March 2026): PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, SubagentStart/Stop, PreCompact, and more
  • 4 hook types: shell command, HTTP endpoint, LLM prompt evaluation, agent (spawns a sub-agent with Read/Grep/Glob)
  • Blocking: exit code 2 (stderr becomes the reason), or JSON permissionDecision: "deny"
  • Input modification: return updatedInput to rewrite tool arguments before execution
  • Git-shareable: .claude/settings.json commits with the project
Agent SDK Hooks: typed callbacks in Python/TypeScript

Hooks are callback functions passed in options.hooks when creating an agent. Same blocking contract (permissionDecision: deny/allow/ask), but implemented as typed code rather than shell scripts. This is the developer model: write functions with full access to your application state.

When to use

Building a multi-agent system, a production agent application, or any architecture where agents run programmatically. You need hooks that can query databases, call APIs, or make decisions based on application state — not just regex matching.

Key capabilities

  • Typed callback functions with access to tool name, input parameters, and session state
  • Same blocking contract as CLI hooks: permissionDecision: "deny" to block, "allow" to auto-approve, "ask" to prompt user
  • Can access application state, databases, and external APIs within the callback
  • Subagent tracking: SubagentStart and SubagentStop events with agent IDs
  • Can also load CLI-style settings.json hooks alongside SDK callbacks via setting_sources

Note: SDK callback APIs vary by language and version. Consult the SDK documentation for current typed interfaces — the blocking contract (permissionDecision) is stable, but callback signatures evolve.

Spec-Driven IDE Hooks: IDE-native with agent prompts

Hooks are configured through the IDE UI and stored in the project's configuration directory. Two action types: shell commands (like CLI hooks) or agent prompts (the IDE's agent evaluates a natural language instruction). This is the designer model: hooks that can reason about context, not just match patterns.

When to use

Spec-driven development where hooks integrate with the specification lifecycle. Unique capability: hooks that fire before/after spec task execution — connecting security checks to the development workflow at the spec level, not just the tool level.

Documented features

  • Agent prompt actions: "Summarize scan results" or "Check for security violations" — the IDE agent evaluates the instruction
  • File-pattern triggers: hooks that fire on file save with glob matching (demonstrated in my on-policy-save.json)
  • Agent turn completion hooks for post-action logging (demonstrated in my on-scan-complete.json)
  • Pre-tool hooks for tool gating (demonstrated in my pre-tool-gate.json)

Note: IDE hook blocking behavior and spec-task integration are evolving. Verify current capabilities in the IDE documentation. The shell command action type runs commands but the blocking mechanism is not as explicitly documented as the CLI's exit-code-2 contract.

The pattern is the same everywhere: an event fires → a matcher checks whether the hook applies → a handler runs (script, callback, or agent prompt) → a decision is returned. The CLI and SDK share the same blocking contract (permissionDecision); IDE hooks use a different mechanism, but the defensive intent is identical. The logic you write in a bash hook (check the file path, decide block or allow) carries over regardless of where it runs.

Hooks in Multi-Agent Systems

In a multi-agent system, hooks become critical at the delegation boundary. When Agent A spawns Agent B, what tools should Agent B have? When Agent B completes, should its output be trusted? The SDK provides SubagentStart and SubagentStop events specifically for this.

SubagentStart: scope the child agent's permissions

When a parent agent spawns a sub-agent, the SubagentStart event fires with the child's agent_id and agent_type. A hook can inspect what the child was asked to do and restrict its tool access. This is the confused deputy defense from Kill Chain Stage 4 — the child agent inherits only the permissions the hook grants, not the parent's full access.

Kill Chain mapping: This directly addresses Stage 4 ESCALATE — preventing privilege escalation through multi-agent delegation chains. Without this hook, a compromised parent can spawn children with its own elevated permissions.
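
One deployable way to sketch this gate is to check the spawn request before allowing it. The example below assumes Claude Code conventions (sub-agents are spawned via the Task tool, whose input carries a subagent_type field); the role names are hypothetical:

```shell
#!/bin/bash
# Hedged sketch: gate sub-agent spawning by requested role. In a real
# PreToolUse hook matched on the Task tool, the payload arrives on
# stdin, e.g. CHILD_TYPE=$(jq -r '.tool_input.subagent_type // empty'),
# and a deny is signalled with exit 2 plus a reason on stderr.
gate_spawn() {   # $1 = requested sub-agent type; returns 0=allow, 1=deny
  case "$1" in
    researcher|doc-writer) return 0 ;;   # read-only roles: allow
    *)                     return 1 ;;   # everything else: deny by default
  esac
}

CHILD_TYPE="deployer"   # example request
if ! gate_spawn "$CHILD_TYPE"; then
  echo "Blocked: spawning '$CHILD_TYPE' sub-agents requires review." >&2
fi
```

Deny-by-default is the point: a role the hook has never heard of gets nothing.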
SubagentStop: validate the child's output

When a sub-agent completes, SubagentStop fires with its transcript path and output. A hook can scan the child's output for sensitive data before it flows back to the parent — preventing cross-agent data leakage. This is output guardrailing at the agent boundary, not just the tool boundary.

Kill Chain mapping: Addresses Stage 5 EXFILTRATE — preventing data exfiltration through inter-agent communication channels. The sub-agent's output is the exfiltration vector; the hook is the inspection point.
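
A hedged sketch of such an output scan follows. SubagentStop payload fields and blocking semantics vary by framework, so the example operates on the output text directly; the token patterns are illustrative:

```shell
#!/bin/bash
# Hedged sketch: scan a sub-agent's output for credential-like tokens
# before it flows back to the parent. Patterns cover AWS-style access
# keys and two common token prefixes; extend for your stack.
scan_output() {   # $1 = sub-agent output text; returns 1 if leaky
  printf '%s' "$1" | grep -qE 'AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}|ghp_[A-Za-z0-9]{36}' && return 1
  return 0
}

OUTPUT='Done. The key is AKIAIOSFODNN7EXAMPLE.'   # example output text
if ! scan_output "$OUTPUT"; then
  echo "Blocked: sub-agent output contains a credential-like token." >&2
  # An exit 2 here would surface the reason, mirroring the PreToolUse
  # contract; verify SubagentStop blocking semantics in your framework.
fi
```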

Five Patterns

From my own agent systems. Each pattern covers what it prevents and its kill chain mapping, with the real code inline where it's self-contained.

Pattern 1: Secret file protection

A pre-tool hook that blocks Edit/Write operations on sensitive files — .env, .xcconfig, deployment configs, service credentials. Allows .template and .example versions so developers can still document the expected format.

protect-files.sh
#!/bin/bash
# Blocks Edit/Write to sensitive files
set -euo pipefail

INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')

[[ -z "$FILE_PATH" ]] && exit 0
FILE_PATH=$(realpath -m "$FILE_PATH" 2>/dev/null || echo "$FILE_PATH")

PROTECTED_PATTERNS=(
  ".env"  ".xcconfig"  ".githooks/"
  "GoogleService-Info.plist"  "wrangler.toml"
)

for pattern in "${PROTECTED_PATTERNS[@]}"; do
  if [[ "$FILE_PATH" == *"$pattern"* ]]; then
    # Allow template/example files
    if [[ "$FILE_PATH" == *".template"* ]] || \
       [[ "$FILE_PATH" == *".example"* ]]; then
      exit 0
    fi
    echo "Blocked: '$FILE_PATH' matches protected pattern '$pattern'. Use the .template version instead." >&2
    exit 2
  fi
done
exit 0
What it prevents: An agent that's been hijacked via prompt injection cannot write API keys to .env, modify deployment configs, or inject secrets into committed code. The exit 2 blocks the tool call entirely — the agent gets an error, not a success.
Pattern 2: Destructive command denylist

A pre-Bash hook that blocks shell commands that destroy state without recovery. Force push, reset --hard, rm -rf on project directories, DROP TABLE. Allows rm -rf on known-safe targets (node_modules, .build, /tmp).

block-dangerous-commands.sh
#!/bin/bash
set -euo pipefail

INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty')
[[ -z "$COMMAND" ]] && exit 0

# Block force push
if echo "$COMMAND" | grep -qE 'git\s+push\s+.*--force|git\s+push\s+-f\b'; then
  echo "Blocked: force push." >&2; exit 2
fi

# Block reset --hard
if echo "$COMMAND" | grep -qE 'git\s+reset\s+--hard'; then
  echo "Blocked: git reset --hard." >&2; exit 2
fi

# Block git clean -f
if echo "$COMMAND" | grep -qE 'git\s+clean\s+-[a-zA-Z]*f'; then
  echo "Blocked: git clean -f." >&2; exit 2
fi

# Block rm -rf (except safe targets)
if echo "$COMMAND" | grep -qE 'rm\s+-[a-zA-Z]*r[a-zA-Z]*f|rm\s+-[a-zA-Z]*f[a-zA-Z]*r'; then
  if echo "$COMMAND" | grep -qE 'rm\s+-rf\s+(/tmp/|node_modules|\.build/|DerivedData)'; then
    exit 0
  fi
  echo "Blocked: rm -rf on non-temp directory." >&2; exit 2
fi

# Block DROP TABLE/DATABASE
if echo "$COMMAND" | grep -qiE 'DROP\s+(TABLE|DATABASE|SCHEMA)'; then
  echo "Blocked: DROP statement." >&2; exit 2
fi
exit 0
What it prevents: A compromised agent cannot force-push malicious code, destroy git history, delete project files, or drop databases. The denylist is coarse but effective — most destructive commands follow predictable patterns.
Pattern 3: Post-edit hardcoded value detection

A post-Edit hook that scans edited Swift files for hardcoded secrets and configs. Context-aware: allows hex colors in Theme.swift, skips test files and Constants.swift, checks for Supabase URLs and API key patterns. Warns but doesn't block (exit 0). Adapt the file extension and patterns for your language.

The pattern: Not every hook needs to block. Post-tool hooks that warn create a feedback loop: the agent receives the warning and corrects its behavior in the next edit. Over a session, the agent learns "use AppConstants, not hardcoded values" — without being blocked.
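
A hedged sketch in the spirit of that hook (the original isn't reproduced here; the file names and regexes below are illustrative, not the author's):

```shell
#!/bin/bash
# Hedged sketch of a post-Edit warning hook. Skips files where
# literals are expected, flags Supabase URLs and long key-looking
# strings, and always exits cleanly: warn, never block.
scan_file() {   # $1 = path to the just-edited file
  local path="$1"
  case "$path" in
    *Tests*|*Constants.swift|*Theme.swift) return 0 ;;   # literals expected here
  esac
  if grep -qE 'supabase\.co|[A-Za-z0-9_-]{40,}' "$path" 2>/dev/null; then
    echo "Warning: possible hardcoded secret/config in $path; use AppConstants." >&2
  fi
  return 0
}

# Demo: a file containing a Supabase URL triggers the warning
f=$(mktemp)
echo 'let url = "https://abc.supabase.co"' > "$f"
warning=$(scan_file "$f" 2>&1)
rm -f "$f"
```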
Pattern 4: Role-based tool access control

A pre-tool hook that gates Write/Edit tools based on agent mode. In audit mode, all write operations are blocked. Audit agents can read everything but modify nothing. This is fail-safe: blocking is the default.

pre-tool-gate.json (Kiro hook)
{
  "name": "Audit Mode Write Gate",
  "event": {
    "type": "preToolUse",
    "toolPattern": "write|edit"
  },
  "action": {
    "type": "runCommand",
    "command": "echo 'BLOCKED: Write operations disabled in audit mode.'"
  }
}
What it prevents: A hijacked audit agent cannot escalate from read to write. Even if the attacker convinces the agent to modify files, the hook blocks at the tool layer. This is the confused deputy defense: the tool system doesn't trust the agent's reasoning about what it should be allowed to do.
Pattern 5: Context reinforcement after compaction

When an agent compacts its context window (drops older messages to fit new ones), security instructions can be lost. This post-compaction hook re-injects critical project context — build commands, security rules, API proxy requirements — so the agent never forgets the threat model.

What it prevents: Context window displacement attacks (Kill Chain Stage 2). An attacker who floods the context to push out safety instructions fails because the compaction hook re-injects them. The security instructions survive even when the context is truncated.
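
A minimal sketch, assuming Claude Code's behavior that SessionStart fires with source "compact" after compaction and that the hook's stdout is appended to the fresh context; SECURITY-CONTEXT.md is a hypothetical file name, so substitute your own rules file:

```shell
#!/bin/bash
# Hedged sketch of a post-compaction context hook. A real hook reads
# the payload from stdin: INPUT=$(cat).
should_reinject() {   # $1 = hook payload JSON
  printf '%s' "$1" | grep -q '"source"[[:space:]]*:[[:space:]]*"compact"'
}

INPUT='{"hook_event_name":"SessionStart","source":"compact"}'   # example payload
if should_reinject "$INPUT" && [[ -f SECURITY-CONTEXT.md ]]; then
  cat SECURITY-CONTEXT.md    # this output re-enters the agent's context
fi
```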

What Hooks Can't Do

Hooks are tool-layer defenses. They validate actions, not reasoning. Here's what they don't solve:

Prompt injection at the reasoning layer

Hooks can't prevent an agent from being hijacked (Kill Chain Stage 3). The agent's reasoning chain is modified before any tool call happens. Hooks only intervene after the agent has already decided what to do. You need instruction hierarchy, input/output separation, and behavioral monitoring for Stage 3 — hooks aren't enough. See the Kill Chain defensive checklist for the full set of controls.

Semantic understanding of intent

A hook that blocks rm -rf can be bypassed by find . -delete or a Python script that achieves the same result. Hooks match patterns, not intent. A determined attacker can craft equivalent operations that don't match the denylist. Defense in depth matters — hooks are one layer, not the only layer.

Data exfiltration through agent output

If the agent reads secrets and includes them in its text response (not a tool call), hooks don't see it. The exfiltration happens through the agent's natural language output, not through a tool. Output guardrails (classifiers on agent responses) address this — hooks don't.

Hooks themselves can be disabled

If an attacker can modify settings.json or the hooks directory, all defenses collapse. This is why protect-files.sh should protect its own config files — and why hook configurations should be version-controlled, code-reviewed, and monitored for changes. Hooks are only as strong as the integrity of their configuration.

Multi-step bypass

Hooks validate individual tool calls, not sequences. An agent can accomplish something destructive across multiple individually-benign operations that no single hook catches. A hook blocks rm -rf / but can't detect that ten separate rm commands targeting different paths achieve the same result.

Implementing Hooks

Start here
.claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{
          "type": "command",
          "command": "./hooks/protect-files.sh"
        }]
      },
      {
        "matcher": "Bash",
        "hooks": [{
          "type": "command",
          "command": "./hooks/block-dangerous-commands.sh"
        }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [{
          "type": "command",
          "command": "./hooks/check-hardcoded-values.sh"
        }]
      }
    ]
  }
}
1. List your agent's tools

What can it do? File read/write, shell, web requests, API calls, database queries? Each tool type needs a different hook pattern.

2. Identify what should never happen

Force push? Write to .env? DROP TABLE? rm -rf on non-temp dirs? These are your denylist. Start with the destructive operations.

3. Add hooks incrementally

Start with one blocking hook (protect-files.sh) and one warning hook (check-hardcoded-values.sh). Monitor for false positives. Add more as you learn your agent's patterns.
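
Before enabling a hook, exercise it locally: pipe a synthetic payload in and check the exit code (2 = blocked, 0 = allowed). The inline stand-in hook below is illustrative; in practice point the pipe at your real ./hooks/protect-files.sh:

```shell
#!/bin/bash
# Smoke-test a hook before wiring it into settings.json.
hook=$(mktemp)
cat > "$hook" <<'EOF'
#!/bin/bash
INPUT=$(cat)
case "$INPUT" in
  *'.env'*) echo "Blocked: protected file." >&2; exit 2 ;;
esac
exit 0
EOF

printf '%s' '{"tool_input":{"file_path":".env"}}' | bash "$hook"
blocked=$?     # expect 2: the hook vetoed the call

printf '%s' '{"tool_input":{"file_path":"README.md"}}' | bash "$hook"
allowed=$?     # expect 0: the call would proceed

rm -f "$hook"
```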

The hooks shown here are from my own agent systems. Adapt them for your stack — the patterns are framework-agnostic.

References
[1] Kiro Hooks Documentation — Pre and post-action hooks for agent behavior enforcement. kiro.dev/docs/hooks
[2] Claude Code Hooks — PreToolUse and PostToolUse hooks for tool-call interception. code.claude.com/docs/en/hooks
[3] NVIDIA NeMo Guardrails — Programmable guardrails using Colang for LLM applications. github.com/NVIDIA/NeMo-Guardrails
[4] Guardrails AI — Input/output validation framework for LLM applications. guardrailsai.com
[5] Dhanasekaran, M. "The Agentic AI Kill Chain." magesh.ai/kill-chain (2026). Stages 2, 4, 6 defensive controls.
[6] MCP Specification — Tool annotations (readOnlyHint, destructiveHint). modelcontextprotocol.io/specification
[7] Claude Agent SDK — Agent permission models and tool-use safety patterns. platform.claude.com/docs/en/agent-sdk/hooks

This work represents the author's independent research and personal views. It is not related to or endorsed by the author's employer.