The security thread, updated.
Most of this manual was written against the agent-security model of about eighteen months ago. That model was: keep your prompts clean, check tool outputs, watch for jailbreaks, and don't let any agent do something it shouldn't. The trust, guardrails, control plane, and adversarial chapters all teach pieces of that model. None of them is wrong. All of them are incomplete now.
The threat surface has moved. Two specific attacks demonstrated in late 2025, plus the research that followed them, made it clear that agent security is no longer a single-call problem. It's a multi-hop delegation problem, and the field has converged on what the fix needs to look like, even though the fix isn't standardized yet. This chapter names the attacks, explains the pattern they share, and walks through what the next layer looks like. Everything below is grounded in research published in the last twelve months.
The two attacks that named the problem
In September 2025, security researcher Johann Rehberger published an attack he called Cross-Agent Privilege Escalation (Rehberger 2025). The setup is simple: two agents are running on the same machine. GitHub Copilot is one. Claude Code is the other. They share the file system. Rehberger showed that if you can compromise one agent through a prompt injection, you can use it to write into the other agent's configuration files. The next time the second agent starts up, it loads the poisoned config and runs whatever the attacker wanted. The first agent compromised the second one without ever directly talking to it.
Two months later, Palo Alto's Unit 42 published Agent Session Smuggling (Unit 42 2025). This one targets agents that talk to each other over the Agent2Agent (A2A) protocol. A2A is stateful, which means agents in a session remember what they've said to each other. Unit 42's researchers showed that a malicious agent in a session can slip extra instructions in between the legitimate back-and-forth, and the receiving agent will pick them up as if they were part of the original conversation. They demonstrated a research-assistant agent quietly steering a financial-assistant agent into making unauthorized stock trades. The end user saw none of it.
These two attacks look different. They share one root cause. The protocols that manage trust between agents weren't designed for a world where agents reason, delegate to each other, and spawn other agents to do work. The trust mechanisms in this manual's earlier chapters are correct for an agent acting on its own; they aren't enough for an agent passing work to another agent passing work to a third one.
The pattern: multi-hop delegation
Almost every interesting agent system today is built on task decomposition. A primary agent gets a request, breaks it into smaller tasks, hands those off to specialist agents, and those specialists may hand work off to others (CSA 2026). Each handoff crosses a trust boundary. At each handoff, authority gets passed along, and unless something stops it, the authority gets passed along at full strength.
This is the multi-hop delegation problem, and it's where most of 2026's biggest agent security incidents trace back to. Some examples:
- An agent gets a request. It calls a sub-agent. The sub-agent calls a third one. By the time something goes wrong, nobody can prove which step authorized which action.
- The original user said "look up flight prices." Three hops later, an agent is buying tickets. Each hop in isolation looked fine.
- An audit log shows agents X, Y, and Z each performed legal actions. The chain of who-asked-whom-to-do-what is missing.
The OWASP Top 10 for Agentic Applications, published in December 2025, names this directly (OWASP 2026). It introduces the concept of least agency: grant agents only the minimum freedom they need to do safe, bounded tasks. That's the spiritual successor to "least privilege" from older security frameworks, with one important difference: it isn't just about what an agent can access, it's about how much room it has to act without checking back in.
What a fix has to do
The Cloud Security Alliance laid out four properties any fix has to provide (CSA 2026). They're worth knowing because the IETF drafts and shipping libraries below all map onto them.
First, scope attenuation. When agent A passes work to agent B, B can never end up with more authority than A had. If A could read this file, B can read this file. If A could not write to it, B cannot write to it either, even if B has its own permissions that would otherwise allow it. Authority only goes down across hops, never up.
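The rule reduces to set intersection over scopes. A minimal sketch, with function and scope names that are my own illustration rather than anything from the drafts:

```python
def attenuate(parent_scope: set[str], requested_scope: set[str]) -> set[str]:
    # Delegated authority is the intersection of what the delegator holds
    # and what the delegatee asks for: B can never exceed A.
    return parent_scope & requested_scope

user = {"files:read", "files:write", "mail:send"}
agent_a = attenuate(user, {"files:read", "files:write"})   # A gets read + write
agent_b = attenuate(agent_a, {"files:read", "mail:send"})  # B also asks for mail:send
# B ends up with only files:read -- mail:send was not in A's scope,
# so it cannot reappear at hop two no matter what B requests.
```

The important property is that the operation is monotone: applying `attenuate` at every hop means the scope set can only shrink as the chain grows.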
Second, verifiable provenance. At any point in the chain, anyone holding the token should be able to verify the full history of who delegated to whom, all the way back to the original human. This has to work without phoning home to a central server. The token itself carries the proof.
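A toy sketch of the chaining idea, assuming my own block layout. For brevity it uses HMAC with a shared key where a real token would use public-key signatures (so verifiers need only the original public key); the structure, with each hop's block bound to the previous block's signature, is the part that matters:

```python
import hmac, hashlib, json

def sign(key: bytes, payload: dict) -> str:
    data = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def delegate(key: bytes, chain: list, delegator: str, delegatee: str) -> list:
    # Each hop appends a block bound to the previous block's signature,
    # so hops can't be reordered, dropped, or rewritten after the fact.
    prev_sig = chain[-1]["sig"] if chain else ""
    block = {"from": delegator, "to": delegatee, "prev": prev_sig}
    return chain + [{**block, "sig": sign(key, block)}]

def verify(key: bytes, chain: list) -> bool:
    prev_sig = ""
    for entry in chain:
        block = {"from": entry["from"], "to": entry["to"], "prev": entry["prev"]}
        if entry["prev"] != prev_sig or entry["sig"] != sign(key, block):
            return False
        prev_sig = entry["sig"]
    return True

key = b"root-authorization-key"       # stands in for the human's signing key
chain = delegate(key, [], "alice", "planner")
chain = delegate(key, chain, "planner", "browser")
assert verify(key, chain)             # full history checks out
chain[0]["to"] = "attacker"           # tamper with hop 1
assert not verify(key, chain)         # verification fails everywhere downstream
```

Verification walks the chain front to back and needs nothing beyond the token and the key material, which is the "no central registry" property the drafts below formalize.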
Third, intent preservation. The original goal, meaning what the human actually asked for, has to stay tied to the work as it moves through the chain. If a sub-agent ends up doing something that doesn't match the original ask, the system has to be able to notice. This is where Agent Session Smuggling lives. The fix is to keep the original intent attached to every step.
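A minimal sketch of intent binding, using a tagging scheme of my own: digest the original human request once, attach the digest to every downstream task, and have receivers refuse work that arrives without it. This catches the crudest smuggling (injected instructions carry no valid tag); deciding whether a tagged action semantically matches the intent needs a policy check layered on top.

```python
import hashlib

def bind_intent(user_request: str) -> str:
    # Digest of the original human ask, attached to every downstream task.
    return hashlib.sha256(user_request.encode()).hexdigest()

def make_task(intent: str, action: str) -> dict:
    return {"intent": intent, "action": action}

def accept(task: dict, expected_intent: str) -> bool:
    # A receiving agent refuses work whose intent tag doesn't match the
    # session it joined -- smuggled instructions arrive without the tag.
    return task.get("intent") == expected_intent

intent = bind_intent("look up flight prices to Lisbon")
assert accept(make_task(intent, "search_flights"), intent)
smuggled = {"action": "buy_tickets"}   # injected mid-session, no intent tag
assert not accept(smuggled, intent)
```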
Fourth, out-of-band approval. For anything that's irreversible or expensive, the approval has to come through a different channel from the one the agents are using. If the agent chain is talking over A2A, the approval can't come over A2A. It has to be a push notification, a separate UI, a phone call, anything the agents can't influence. If the agent runs the channel, the agent can manipulate the approval.
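The enforcement point can be small. A sketch, with channel names and the class shape invented for illustration: sensitive actions park in a pending queue, and an approval is accepted only if it arrives over a channel the agents don't control.

```python
class ApprovalGate:
    # Channels the agent chain itself can write to; approvals over these
    # are rejected because a compromised agent could forge them.
    AGENT_CHANNELS = {"a2a"}

    def __init__(self):
        self.pending: dict[str, str] = {}

    def request(self, action_id: str, description: str) -> None:
        # A sensitive action pauses here until a human approves it.
        self.pending[action_id] = description

    def approve(self, action_id: str, channel: str) -> bool:
        if channel in self.AGENT_CHANNELS or action_id not in self.pending:
            return False
        del self.pending[action_id]
        return True

gate = ApprovalGate()
gate.request("buy-tickets-991", "purchase 2 flights, $840")
assert not gate.approve("buy-tickets-991", channel="a2a")   # smuggled approval
assert gate.approve("buy-tickets-991", channel="push_notification")
```

The point is structural: the check is on *which channel* delivered the approval, not on the approval's content, so an attacker who fully controls the agent conversation still can't produce one.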
What the standards bodies are drafting
Two IETF drafts take this on. They're worth knowing about not because either has won yet, but because between them they cover the four foundations above and point at what production agent security will look like in 2027.
The first is HDP, short for Human Delegation Provenance, submitted as draft-helixar-hdp-agentic-delegation-00 (HDP draft 2026). HDP defines a token where the human authorization is signed at the start, every agent that handles the token adds its own signed block to the chain, and any node anywhere in the chain can verify the whole history offline using only the original public key. No central registry. No third-party trust anchor. The verification is fully self-contained.
The second is AIP, short for Agent Identity Protocol, submitted as draft-prakash-aip-00 (AIP draft 2026). AIP introduces what it calls Invocation-Bound Capability Tokens, which try to do four things at once: identity, authorization, scope constraints, and provenance, all in one token. It comes in two flavors: a small JWT for single-hop calls, and a chained Biscuit token for multi-hop delegation. The reference implementation reports verification times under a millisecond.
These two drafts overlap. They don't agree on every detail. They probably won't both win. But they're useful right now as a way to see what direction the field is moving, and someone building a real production agent system in 2026 should know they exist and read both.
Where this changes earlier chapters of this manual
Four pieces of the manual are affected by what's above. Each gets updated in the rest of the manual, but here's where to look and why.
Trust chapter. The capability tokens described in Trust are flat: one signature, one subject, one audience. That's correct for a single-hop call. For multi-hop delegation, the token has to chain: it gains a new signed block at every hop and attenuates scope as it goes. The Biscuit token format from AIP, or the macaroon format that predates it, is how real systems do this. The teaching kit (next page) ships a small implementation of chained tokens so you can see how it works in roughly a hundred lines of Python.
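A compact sketch of the macaroon-style pattern that Biscuit tokens generalize, with my own field names: each caveat is appended and the signature re-keyed on the previous one, so any holder can narrow the token further but nobody can strip a caveat back out.

```python
import hmac, hashlib

def _mac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def mint(root_key: bytes, identifier: str) -> dict:
    # Token = (identifier, caveats, sig). Only the minter knows root_key.
    return {"id": identifier, "caveats": [], "sig": _mac(root_key, identifier)}

def attenuate(token: dict, caveat: str) -> dict:
    # Anyone holding the token can ADD a caveat (narrowing its scope);
    # the new sig is keyed on the old one, so caveats can't be removed.
    return {"id": token["id"],
            "caveats": token["caveats"] + [caveat],
            "sig": _mac(token["sig"], caveat)}

def verify(root_key: bytes, token: dict) -> bool:
    sig = _mac(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])

root = b"service-root-key"
t = mint(root, "user=alice")
t = attenuate(t, "scope=files:read")   # hop 1 narrows to read-only
t = attenuate(t, "path=/reports/*")    # hop 2 narrows further
assert verify(root, t)
stripped = {"id": t["id"], "caveats": t["caveats"][:1], "sig": t["sig"]}
assert not verify(root, stripped)      # dropping a caveat breaks the chain
```

Real Biscuit tokens use public-key blocks and a Datalog policy language on top of this chaining; the HMAC re-keying here is the attenuation core in miniature.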
Trust chapter again, on reputation. The reputation system in Trust scores agents on four global dimensions. That's defensible if every agent does one kind of work. It breaks the moment an agent does several. An agent with a great refund-handling reputation should not get a head start on database deletions just because the two scores share an agent ID. Reputation needs to be sliced by task class and by tenant. The math stays the same; the index gets bigger.
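What "the index gets bigger" means concretely: key the reputation store by a tuple instead of an agent ID. A sketch with a made-up class and field names, not the manual's actual API:

```python
from collections import defaultdict

class ScopedReputation:
    """Reputation keyed by (agent, task_class, tenant), not agent alone."""
    def __init__(self):
        self._scores = defaultdict(lambda: {"successes": 0, "trials": 0})

    def record(self, agent: str, task_class: str, tenant: str, success: bool):
        s = self._scores[(agent, task_class, tenant)]
        s["trials"] += 1
        s["successes"] += int(success)

    def score(self, agent: str, task_class: str, tenant: str) -> float:
        s = self._scores[(agent, task_class, tenant)]
        # An unseen (agent, task, tenant) combination starts at zero trust:
        # a refund-handling record earns nothing on database deletions.
        return s["successes"] / s["trials"] if s["trials"] else 0.0

rep = ScopedReputation()
for _ in range(50):
    rep.record("agent-7", "refunds", "acme", success=True)
assert rep.score("agent-7", "refunds", "acme") == 1.0
assert rep.score("agent-7", "db_delete", "acme") == 0.0   # no head start
```

The scoring math itself is whatever the Trust chapter already uses; only the lookup key changed.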
Control plane, on audit. The audit log in Control Plane is per-agent and hash-chained. That's correct for tamper evidence on one agent's actions. For a multi-hop system, the question that matters isn't "what did this agent do?" The right question is "the user asked for X; show me everything that happened because of that, across every agent involved." The fix is to make every audit entry reference both the previous entry's hash and the token that authorized this action. Now you can replay forward from the human's original ask and see every action it ultimately produced, across every agent. The EU AI Act's Article 14 effectively requires this for high-risk systems.
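A sketch of that doubly-linked entry, with invented field names. Each entry carries the previous entry's hash (tamper evidence, as before) plus the ID of the authorizing token, so the log can be filtered forward from the original request. A real system would carry the full delegation chain rather than the flat token ID used here:

```python
import hashlib, json

def append_entry(log: list, agent: str, action: str, token_id: str) -> list:
    # Each entry chains the previous hash AND names the authorizing token,
    # so the log can be replayed forward from the original human request.
    prev_hash = log[-1]["hash"] if log else ""
    body = {"agent": agent, "action": action, "token": token_id, "prev": prev_hash}
    data = json.dumps(body, sort_keys=True).encode()
    return log + [{**body, "hash": hashlib.sha256(data).hexdigest()}]

def trace(log: list, token_id: str) -> list:
    # "The user asked for X; show everything that happened because of it."
    return [e for e in log if e["token"] == token_id]

log = []
log = append_entry(log, "planner", "search_flights", token_id="req-42")
log = append_entry(log, "browser", "fetch_prices",  token_id="req-42")
log = append_entry(log, "other",   "unrelated_job", token_id="req-99")
assert [e["agent"] for e in trace(log, "req-42")] == ["planner", "browser"]
```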
Guardrails, on inputs. The guardrails in the Guardrails chapter check inputs at the boundary. Real attacks don't always come in at the boundary. They often come in through tool outputs, retrieved documents, or messages from other agents. The OWASP Top 10 ranks tool-output injection as the fastest-growing class. The fix is taint labels: every piece of data the agent sees gets tagged with where it came from and how trusted that source is. When a tool call would mix highly trusted and untrusted data, the system either blocks it, downgrades the call's permissions, or routes it through a sanitizer. This is borrowed from operating systems, where it's been a hard rule since the 1970s.
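A minimal sketch of the taint-label idea, with the label fields and policy outcomes invented for illustration. Every value carries its origin; the gate in front of a tool call inspects the labels rather than the content:

```python
from dataclasses import dataclass

@dataclass
class Tainted:
    value: str
    source: str      # e.g. "user", "tool_output", "retrieved_doc"
    trusted: bool

def guard_tool_call(args: list[Tainted], sensitive: bool) -> str:
    # Policy sketch: untrusted data never reaches a sensitive tool;
    # on non-sensitive tools it forces a permission downgrade instead.
    if any(not a.trusted for a in args):
        return "block" if sensitive else "downgrade"
    return "allow"

user_arg = Tainted("refund order 1123", source="user", trusted=True)
web_arg = Tainted("<fetched page text>", source="tool_output", trusted=False)
assert guard_tool_call([user_arg], sensitive=True) == "allow"
assert guard_tool_call([user_arg, web_arg], sensitive=True) == "block"
assert guard_tool_call([web_arg], sensitive=False) == "downgrade"
```

The decision is made on provenance, not on scanning the payload for attack strings, which is why it catches injections that boundary input checks miss.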
The honest state of things
Most production agent systems running in 2026 do not yet do any of this. The standards aren't finished. The libraries are young. The known attacks are still mostly proof-of-concept, though there have already been real-world incidents in the same family. A team shipping today has to make a judgment call: how exposed is what we're shipping to multi-hop delegation attacks, and how much of the 2026-style defense should we build in now versus add later?
The honest answer for most teams: if your agents only do single-hop work (they take a request, do it, and return), you can keep using the patterns from the earlier chapters and revisit this when you add a second hop. If your agents delegate, even in a small way, the four foundations above are not optional anymore. You don't need a Biscuit-Datalog implementation on day one. You do need scope that attenuates, audit that traces, intent that survives the chain, and a separate channel for sensitive approvals. The companion kit on the next page shows what that looks like in code.