magesh.ai agent v1.0 (views are my own) · resources kill-chain
viewing: kill_chain · 6 stages · 13 references · 00:00:00
← agent.navigate: home

The Agentic AI
Kill Chain

A 6-stage attack lifecycle framework for autonomous AI agent systems. Building on MITRE ATLAS (16 tactics, 84 techniques) and OWASP LLM Top 10 v2.0 — extending them for agents that chain decisions, use tools, delegate, and persist.

cite as:
Dhanasekaran, M. (2026). The Agentic AI Kill Chain. magesh.ai/kill-chain

Why a New Framework

"

Existing frameworks don't account for the emergent security properties that arise when autonomy, long-term memory access, and dynamic tool usage are combined.

— "Securing Agentic AI: A Comprehensive Threat Model", arxiv 2504.19956
01
Adapted from Lockheed Martin

Same structural concept as the Cyber Kill Chain (sequential attack lifecycle), adapted for autonomous AI agents instead of network intrusion. 7 stages become 6 — weaponization is implicit in agent attacks.

02
Extends ATLAS + OWASP

MITRE ATLAS (16 tactics, 84 techniques) covers AI model attacks. OWASP LLM Top 10 covers application risks. This framework extends both into the agent-specific lifecycle — tool chains, delegation, memory persistence.

03
Practitioner-focused

Every stage maps to a defensive control. Not an academic taxonomy — operational guidance built from hands-on experience building and securing agentic systems with Claude Code, Kiro, and MCP.

04
Cloud-agnostic

No vendor-specific recommendations. The framework applies to any agent architecture — whether you're building with Claude, GPT, Gemini, or open-source models. Patterns over products.

What makes agentic AI attacks fundamentally different
Traditional
Attack a system from the outside
Agentic
Hijack a system that attacks for you
Traditional
Map network topology
Agentic
Map capability topology — tools, permissions, delegation chains
Traditional
Exploit code vulnerabilities
Agentic
Exploit trust and reasoning — the agent does the work
Traditional
Install malware for persistence
Agentic
Poison memory and config files — the agent persists for you
$ agent.map --topology --attack-surface

Every connection is an attack surface. Click a stage below to see where it strikes.

USER input / prompts AI AGENT reasoning chain planning loop tool selection system prompt MCP SERVER tool registry schema dispatch TOOLS fs / shell / code DATA docs / db / rag EXT APIs email / slack / web MEMORY CLAUDE.md / .kiro/ SUB-AGENTS delegated tasks probe enumerate payload tool poison indirect HIJACK delegate tool abuse exfil channel data leak memory poison config inject
$ kill_chain.load --stages 6 --mode interactive
agent.log
> select a stage to begin attack simulation...
01 RECON — Probe Agent Capabilities

What the attacker does

Maps the agent's tool access, permission boundaries, connected MCP servers, model type, system prompt constraints, and behavioral limits. Agent recon maps capability topology — not network topology.

How agents change this: Traditional recon maps network topology — ports, services, versions. Agent recon maps what the agent can do: which tools it has, what permissions are auto-approved, what its system prompt constrains, and how it connects to other agents and MCP servers.

Techniques

  • Enumerate available tools by asking the agent what it can do
  • Test permission boundaries by requesting escalating actions
  • Probe system prompt by asking about instructions or constraints
  • Map MCP server connections by observing tool call patterns
  • Identify model family through response characteristics

Real-world example

Documented

Researchers demonstrated system prompt extraction from ChatGPT, Claude, and Gemini through iterative probing — asking models to repeat their instructions verbatim, or using multi-turn conversations to gradually extract constraint boundaries.

Source: Perez & Ribeiro, "Ignore This Title and HackAPrompt" (2023); Simon Willison, "Prompt injection and jailbreaking are not the same thing" (2024)

Extends existing frameworks

ATLAS AML.TA0001 (Reconnaissance) covers ML model recon. This stage extends it with agent-specific vectors: tool enumeration, permission boundary probing, and MCP server discovery.

Defensive control

Minimize information disclosure about agent capabilities. Don't reveal tool lists, permission structures, or system prompt details in responses.

02 INJECT — Deliver the Payload

What the attacker does

Delivers adversarial input to alter agent behavior — through direct prompts, retrieved documents, tool responses, or data sources the agent consumes. The goal: change what the agent does, not just what it says.

How agents change this: Traditional prompt injection targets a single LLM response. Agent injection targets the planning/action loop — the agent doesn't just say something wrong, it does something wrong. Autonomously. Across multiple tool calls.

Techniques

  • Direct prompt injection in user input
  • Indirect injection in documents, web pages, or retrieved context
  • Tool-response poisoning — malicious data from MCP servers
  • Tool schema injection — malicious tool descriptions that alter behavior
  • Context window displacement — flooding context to push out safety instructions

Real-world example

Documented

Indirect prompt injection via web pages: a researcher embedded hidden instructions in a webpage that, when retrieved by a Bing Chat agent, caused it to exfiltrate the user's conversation history through a crafted URL. The agent followed the injected instruction because it couldn't distinguish retrieved content from user intent.

Source: Johann Rehberger, "Bing Chat Data Exfiltration via Indirect Prompt Injection" (2023); Greshake et al., "Not what you've signed up for" (2023)

Extends existing frameworks

ATLAS AML.T0051 (Prompt Injection), AML.T0099 (Tool Data Poisoning) cover prompt and data poisoning. This stage extends them with tool schema injection, MCP protocol-level vectors, and context displacement attacks.

OWASP LLM01 (Prompt Injection) covers direct and indirect injection. This stage extends it with tool-response and protocol-level injection vectors specific to agent architectures.

Defensive control

Validate and sanitize all external inputs. Treat tool responses as untrusted. Pin system instructions outside the context window where possible.

03 HIJACK — Override Agent Behavior

What the attacker does

Takes control of the agent's decision-making — redirecting goals, overriding instructions, or manipulating the reasoning chain. The agent continues operating autonomously — toward the attacker's objectives.

How agents change this: This isn't getting a bad output — the agent's ongoing autonomous behavior is redirected. It continues operating, reasoning through each step, using tools, making decisions. But now it's working toward the attacker's objectives. The victim becomes the weapon.

Techniques

  • Goal substitution — replace the agent's current objective
  • Instruction override — make the agent ignore system constraints
  • Reasoning chain manipulation — influence chain-of-thought
  • Persona hijacking — alter agent's role through accumulated context

Real-world example

Key Extension

Anthropic's own red-team research showed that Claude agents given tool access could be manipulated into executing multi-step attack sequences — reading files, modifying configs, and calling APIs — all from a single injected instruction that overrode the system prompt's constraints. The agent reasoned its way through each step autonomously.

Source: Anthropic, "Challenges in Red-Teaming AI Systems" (2024); Compound AI systems risk analysis, arxiv 2504.19956

Extends existing frameworks

This is where the Kill Chain adds the most. ATLAS and OWASP focus on model-level and application-level threats. This stage extends both into autonomous decision chain hijacking and goal substitution at the planning layer — building on OWASP LLM06 (Excessive Agency) with active hijacking of running agents.

Defensive control

Immutable system instructions. Reasoning chain monitoring. Behavioral anomaly detection against established baselines.

04 ESCALATE — Expand Access

What the attacker does

Uses the hijacked agent to gain broader access — abusing tool permissions, chaining through multi-agent delegation, or bypassing human-in-the-loop controls. Agents trust other agents — exploit the trust model.

How agents change this: Traditional privilege escalation exploits OS or network vulnerabilities. Agent escalation exploits the trust model — agents trust other agents, tools trust agent calls, autoApprove bypasses human review. A compromised orchestrator agent inherits the permissions of every agent it coordinates.

Techniques

  • Abuse existing tool permissions beyond intended scope
  • Chain multi-agent delegation to inherit higher privileges
  • Confused deputy — make a high-privilege agent act on attacker's behalf
  • Bypass autoApprove to execute without human review
  • Orchestrator compromise — hijack the coordinating agent

Real-world example

Multi-Agent

Researchers at UIUC demonstrated "confused deputy" attacks in multi-agent systems where a low-privilege agent crafted requests to a higher-privilege agent, inheriting its tool access. The orchestrator passed the request because inter-agent messages weren't treated as untrusted input — a trust boundary that doesn't exist in traditional systems.

Source: "Securing Agentic AI: A Comprehensive Threat Model", arxiv 2504.19956; Zenity Labs multi-agent attack research (2025)

Extends existing frameworks

ATLAS AML.TA0012 (Privilege Escalation) covers single-system escalation. This stage extends it into multi-agent delegation chains, confused deputy patterns in agent systems, and orchestrator compromise.

Defensive control

Least privilege for every tool and agent. No autoApprove for sensitive operations. Inter-agent authentication. Explicit delegation scoping.

05 EXFILTRATE — Extract Value

What the attacker does

Uses the agent's legitimate access to extract sensitive data. The agent is the exfiltration channel — it has legitimate access and legitimate output channels. Exfiltration looks like normal agent behavior.

How agents change this: The agent itself is the exfiltration channel. It has legitimate access to data and legitimate channels to send it — APIs, emails, file writes, web requests. The exfiltration looks identical to normal agent behavior. No anomaly to detect.

Techniques

  • Read sensitive data through agent's tool access
  • Encode data in legitimate outputs (tool parameters, emails, docs)
  • Cross-session memory leakage — data persisted across sessions
  • Side-channel exfiltration through behavioral patterns

Real-world example

Documented

The Bing Chat markdown rendering attack: an injected instruction caused the agent to encode conversation data into an image URL. When the browser rendered the markdown, it sent an HTTP request to the attacker's server with the user's data as URL parameters — exfiltration through a legitimate rendering feature.

Source: Johann Rehberger, "Data Exfiltration from Bing Chat via Markdown Rendering" (2023); Roman Samoilenko, "ChatGPT Plugin Data Exfiltration" (2023)

Extends existing frameworks

ATLAS AML.T0055 (Exfiltration via Tool Invocation) covers tool-based exfiltration. This stage extends it with cross-session memory leakage and behavioral side-channel patterns specific to persistent agents.

Defensive control

Monitor and log all tool invocations. Data loss prevention on agent outputs. Session-scoped memory with no cross-session persistence of sensitive data.

06 PERSIST — Maintain Access

What the attacker does

Establishes long-term presence by poisoning agent memory, injecting into configuration files, or creating callbacks. The agent itself becomes the persistence mechanism.

How agents change this: Traditional persistence installs malware or backdoors. Agent persistence poisons the information the agent trusts — memory, config files, instruction documents. No binary is modified. No process is running. The agent reloads the poisoned instructions on every startup and re-compromises itself.

Techniques

  • Poison agent memory for future sessions
  • Inject into CLAUDE.md, .kiro/ configs, project instructions
  • Modify agent configuration files for persistent behavior change
  • Establish callbacks through agent-accessible APIs
  • Backdoor skills/plugins the agent loads on startup

Real-world example

Documented

SpaiwareAI research demonstrated persistent memory injection in ChatGPT — an attacker embedded instructions into a document that, when processed by the agent, wrote malicious directives into the long-term memory. Every future conversation then followed the injected instructions, across sessions, without the user's knowledge.

Source: Johann Rehberger / SpaiwareAI, "Persistent Memory Injection in ChatGPT" (2024); MITRE ATLAS AML.T0056

Extends existing frameworks

ATLAS AML.T0056 (Memory Manipulation) covers memory poisoning. This stage extends it into ecosystem persistence: instruction files, skill backdoors, MCP config manipulation, and agent startup poisoning.

Defensive control

Memory integrity verification. Config file integrity monitoring. Skill/plugin signing and verification. Regular memory audit and pruning.

Walk Through an Attack

Step through a realistic attack scenario against an AI coding agent. Each step maps to a kill chain stage.

attack_sim.sh
> ready. click [START] to begin simulation.
READY

MITRE ATLAS Cross-Reference

Mapping all 16 ATLAS tactics against the Agentic AI Kill Chain. Shaded rows show where the Kill Chain extends ATLAS coverage into agent-specific vectors.

ATLAS Tactic ID Kill Chain Stage Agent Extension
Reconnaissance AML.TA0001 01 RECON + Tool enumeration, permission probing, MCP discovery
Resource Development AML.TA0002 02 INJECT + Crafted tool schemas, poisoned MCP servers
Initial Access AML.TA0003 02 INJECT + Indirect injection via retrieved context, tool responses
ML Model Access AML.TA0004 01 RECON + Agent capability mapping beyond model access
Execution AML.TA0005 03 HIJACK + Autonomous execution via reasoning chain hijack
Persistence AML.TA0006 06 PERSIST + Memory poisoning, config injection, skill backdoors
Defense Evasion AML.TA0007 03 HIJACK + Reasoning chain manipulation to bypass safety checks
Discovery AML.TA0008 01 RECON + MCP server discovery, tool registry enumeration
Collection AML.TA0009 05 EXFIL + Agent reads data through legitimate tool access
ML Attack Staging AML.TA0010 02 INJECT + Context window displacement, schema poisoning
Credential Access AML.TA0011 04 ESCALATE + Tool credential harvesting (AML.T0098)
Privilege Escalation AML.TA0012 04 ESCALATE + Multi-agent delegation chains, confused deputy, orchestrator compromise
Lateral Movement AML.TA0013 04 ESCALATE + Inter-agent trust exploitation, sub-agent delegation
Exfiltration AML.TA0014 05 EXFIL + Cross-session memory leakage, behavioral side channels
Impact AML.TA0015 03–06 Impact spans multiple stages in agentic context
Command and Control AML.TA0016 06 PERSIST + Agent callbacks via APIs, webhook persistence
ATLAS added 14 agent-specific techniques in late 2025 through Zenity Labs collaboration (AML.T0055, T0056, T0098, T0099, T0100, T0102). This Kill Chain extends those into the full agent attack lifecycle — covering multi-agent systems, MCP protocol attacks, and ecosystem persistence vectors that no single ATLAS tactic addresses.

OWASP LLM Top 10 Agent Severity

Every OWASP LLM vulnerability amplifies in agentic context. This matrix assesses agent-specific severity for each category — because an agent that acts on a vulnerability is fundamentally different from a chatbot that outputs one.

OWASP Category Chatbot Risk Agent Risk Why It Amplifies
LLM01 — Prompt Injection HIGH CRITICAL Agents act on injected instructions — tool calls, file writes, API requests
LLM02 — Sensitive Info Disclosure MEDIUM HIGH Agents have broader system access — files, databases, credentials
LLM03 — Supply Chain MEDIUM HIGH Each MCP server, tool, and plugin is a supply chain link
LLM04 — Data/Model Poisoning MEDIUM HIGH Poisoned data affects autonomous decisions with real consequences
LLM05 — Improper Output Handling HIGH CRITICAL Agent outputs become real actions — shell commands, code execution
LLM06 — Excessive Agency MEDIUM CRITICAL The core agent risk — too many tools, too few guardrails, autoApprove enabled
LLM07 — System Prompt Leakage LOW MED-HIGH Reveals agent capabilities, tool lists, permission structures
LLM08 — Vector/Embedding Weaknesses MEDIUM HIGH Persistent memory poisoning across sessions
LLM09 — Misinformation MEDIUM HIGH Hallucinations trigger real actions — wrong API calls, wrong file edits
LLM10 — Unbounded Consumption MEDIUM HIGH Agent loops amplify cost attacks — recursive tool calls, infinite delegation
Assessment based on OWASP LLM Top 10 v2.0 (2025). The OWASP GenAI working group (genai.owasp.org) has an active agentic AI initiative — as of March 2026, no standalone "Top 10 for Agentic AI" has been published. This severity assessment bridges that gap.

How It Fits Together

MITRE ATLAS maps 16 tactics for attacking AI models. OWASP lists 10 ways LLM applications fail. The Agentic AI Kill Chain extends both into the full attack lifecycle against autonomous agent systems — agents that chain decisions, use tools, delegate, and persist.
MITRE ATLAS
16 tactics · 84 techniques
Covers ML model attacks — adversarial examples, model poisoning, evasion. Added 14 agent-specific techniques in late 2025 (Zenity Labs collaboration). The Kill Chain extends ATLAS into multi-agent delegation chains, tool protocol attacks, and autonomous decision hijacking.
Scope: AI model security
OWASP LLM Top 10
10 vulnerability categories (v2.0, 2025)
Covers LLM application vulnerabilities — prompt injection, excessive agency, information disclosure. Designed for the chatbot/RAG era. The Kill Chain extends OWASP into multi-agent privilege escalation, cross-session memory poisoning, and orchestrator compromise.
Scope: LLM application risks
Agentic AI Kill Chain
6 stages · attack lifecycle
Extends both frameworks into the full attack lifecycle against autonomous agent systems — from reconnaissance through persistence. Each stage builds on existing ATLAS tactics and OWASP categories, adding agent-specific vectors and defensive controls.
Scope: autonomous agent systems

References & Sources

Frameworks Reviewed
[1] MITRE ATLAS — Adversarial Threat Landscape for AI Systems. 16 tactics, 84 techniques, 56 sub-techniques, 32 mitigations, 42 case studies. atlas.mitre.org
[2] OWASP Top 10 for LLM Applications v2.0 (2025). owasp.org/www-project-top-10-for-large-language-model-applications/
[3] Lockheed Martin Cyber Kill Chain — 7-stage intrusion lifecycle model. Structural model adapted for autonomous AI agents.
Academic Papers
[4] "Securing Agentic AI: A Comprehensive Threat Model." arxiv 2504.19956. Identifies emergent security properties when autonomy, memory, and tool use combine.
[5] Greshake, K. et al. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." (2023). First systematic analysis of indirect injection vectors.
[6] Perez, F. & Ribeiro, I. "Ignore This Title and HackAPrompt: Exposing Systemic Weaknesses of LLMs through a Global Scale Prompt Hacking Competition." (2023).
Industry Research
[7] Zenity Labs + MITRE ATLAS collaboration — contributed 14 agent-specific techniques to ATLAS in late 2025. zenity.io/blog/current-events/zenity-labs-and-mitre-atlas-collaborate-to-advances-ai-agent-security
[8] Anthropic. "Challenges in Red-Teaming AI Systems." (2024). Demonstrated multi-step attack sequences through tool-using agents.
[9] NIST Presentation on ATLAS. csrc.nist.gov/csrc/media/Presentations/2025/mitre-atlas/TuePM2.1-MITRE%20ATLAS%20Overview%20Sept%202025.pdf
Documented Attacks
[10] Rehberger, J. "Data Exfiltration from Bing Chat via Indirect Prompt Injection." (2023). Demonstrated exfiltration through markdown rendering and crafted URLs.
[11] Rehberger, J. / SpaiwareAI. "Persistent Memory Injection in ChatGPT." (2024). Demonstrated cross-session memory poisoning via document processing.
[12] Samoilenko, R. "ChatGPT Plugin Data Exfiltration." (2023). Plugin-based data extraction through legitimate API calls.
[13] Willison, S. "Prompt injection and jailbreaking are not the same thing." (2024). Distinction between prompt injection (security) and jailbreaking (policy).

Builds on

  • MITRE ATLAS v2026 — 16 tactics, 84 techniques (atlas.mitre.org)
  • OWASP LLM Top 10 v2.0 (2025)
  • Lockheed Martin Cyber Kill Chain
  • "Securing Agentic AI" — arxiv 2504.19956

Author

Magesh Dhanasekaran — Senior Security Consultant, 17 years in cybersecurity. Built from hands-on experience securing and building agentic AI systems with Claude Code, Kiro, and MCP.

LinkedIn · X

License & citation

This framework is open for reference, citation, and use in security assessments. Please cite as:

Dhanasekaran, M. (2026). The Agentic AI Kill Chain. magesh.ai/kill-chain

Views are my own. This is a practitioner framework — it prioritizes operational utility over completeness. Cloud-agnostic. No vendor-specific recommendations.