field manual · agentic orchestration

Many minds.
One product.

A hands-on guide to building AI agent systems. How agents work together in parallel, how to think about what could go wrong, when something needs an alert, how to design safety checks that bend without breaking, and how to run agents safely from your laptop to production. With diagrams, working code, and real examples.

28 chapters · 30+ code examples · 80+ citations · 2025–26 research integrated
See it work · live demo

Every chapter that follows is built on one idea: agents that perceive, decide, and act. Watch two of the most common shapes those systems take. The animation runs; the subtitle below it narrates what each agent is thinking and why.

[Live demo animation. Panel 1: Detector, Diagnoser, and Patcher agents working across six services (auth, api, db, cache, queue, worker). Panel 2: two agents negotiating on behalf of a buyer and seller over a used GPU, target range $130–$160, on a price axis from $80 to $200; Buyer at $90, Seller at $190, deal at $145.]
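The perceive, decide, act loop behind the negotiation demo can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the `Negotiator` class, the fixed `step` concession, and the private `limit` values are all assumptions made here. Only the $90 and $190 figures and the $130–$160 range come from the demo labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Negotiator:
    name: str
    offer: float    # current public offer
    limit: float    # private walk-away price (assumed, not from the demo)
    step: float     # fixed concession per turn (assumed)
    is_buyer: bool

    def next_offer(self) -> float:
        # Buyers concede upward, sellers downward, never past their limit.
        if self.is_buyer:
            return min(self.offer + self.step, self.limit)
        return max(self.offer - self.step, self.limit)

    def perceive(self, other: "Negotiator") -> float:
        # The only thing this agent observes is the other side's offer.
        return other.offer

    def decide(self, observed: float) -> str:
        # Accept if the observed offer is at least as good as our next one.
        planned = self.next_offer()
        good_enough = observed <= planned if self.is_buyer else observed >= planned
        return "accept" if good_enough else "concede"

    def act(self) -> None:
        # Publish the conceded offer.
        self.offer = self.next_offer()


def negotiate(buyer: Negotiator, seller: Negotiator,
              max_rounds: int = 20) -> Optional[float]:
    """Alternate turns until one side accepts; return the price or None."""
    for _ in range(max_rounds):
        for agent, other in ((buyer, seller), (seller, buyer)):
            observed = agent.perceive(other)
            if agent.decide(observed) == "accept":
                return observed
            agent.act()
    return None


buyer = Negotiator("Buyer", offer=90.0, limit=160.0, step=10.0, is_buyer=True)
seller = Negotiator("Seller", offer=190.0, limit=130.0, step=10.0, is_buyer=False)
price = negotiate(buyer, seller)
print(price)
```

With these assumed parameters the two sides settle at 140.0, inside the target range but not the demo's $145; the demo agents presumably follow a different concession rule. The `max_rounds` cap is the simplest guard against a negotiation that never converges.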
Where to start

Pick a path through the manual.

Six ways to read this, depending on where you're starting:

Companion code is available. The kit contains seven runnable agent frameworks (tiny_agent, orchestrator, context_exchange, agent_profile, verification, trust_engine, guardrails) with 201 passing tests, and it pairs chapter by chapter with the manual.
Get the kit
★ Start here
Don't know where to begin?
The manual has 28 chapters. The build path is a single seven-step ladder from "I have never built an agent" to "I have a multi-agent system with safety, trust, and evaluation that I would ship." Each step takes 30 minutes to a weekend and produces something that runs.
1. First agent
2. Real LLM
3. Two agents
4. Share context
5. Add guards
6. Add trust
7. Ship
Open the build path →

The twenty-eight chapters

CHAPTER 01
Tutorial & prerequisites
Start here if AI agents are new to you. Build a tiny agent in 30 lines of Python, learn the basic loop, and figure out when one agent beats many.
Begin tutorial
CHAPTER 02
Architecture & flow
A practical seven-layer way to think about agent systems: identity, router, state, messaging, memory, tools, and observability. One author's mental model, not a standard.
Explore architecture
CHAPTER 03
Where the work comes from
How a task reaches an agent (five concrete sources). How an agent perceives its environment (five layers of bootstrap). And the most honest answer in the manual to "where does the agent actually live as a software entity?"
Trace the locus
CHAPTER 04
Protocols & interop
MCP, A2A, ACP, ANP. The open standards that emerged in 2025 for connecting agents to tools and to each other, plus the security pitfalls.
Learn protocols
CHAPTER 05
Seven patterns
Seven common ways to organize a group of agents: orchestrator, hierarchy, pipeline, peer swarm, blackboard, debate, and time-aware. Each one has a "what / when / why / watch out" breakdown.
Compare patterns
CHAPTER 06
Context exchange
How agents share what they know without leaking what they shouldn't. Typed envelopes with provenance and TTL, capability handshakes that pin the contract before any data flows, and compartments at the boundary that minimize and redact every crossing.
See the three building blocks
CHAPTER 07
Parallel collaboration
How multiple agents can work on the same thing at the same time, and why surfacing their disagreements is more useful than hiding them.
See parallel build
CHAPTER 08
Memory & reasoning
How agents remember things across sessions: long context, RAG, memory agents, graph memory. Plus reasoning models and when self-reflection actually helps.
Manage memory
CHAPTER 09
Generalists & specialists
Three legitimate shapes for an agent in production: generalist, specialist, generalist plus RAG. Where knowledge actually lives across five anchors (weights, fine-tune, prompt, tools, retrieved). How the choice changes which guards do real work.
Pick the right shape
CHAPTER 10
When the agent itself is wrong
LLM agents hallucinate their own capabilities, drift away from the original ask, and try to call tools they were never authorized for. Three external checks (capability registry, pinned ask, tool gate) close the gap. No attacker required.
Defend against drift
CHAPTER 11
Heuristics & rewards
The four ways to guide an agent: prompts, hand-written rules, rewards, and learned preferences. How they layer, when to use which, and how to avoid reward hacking.
Guide your agent
CHAPTER 12
Trust, privileges & RAG
What your agent is allowed to do, and how it earns more. Pre-config vs runtime privileges, RAG access as a security boundary, the six trust mechanisms, and a working behavior-tracking system: Beta-distributed reputation with exponential decay, signed Ed25519 capability tokens, Sybil-detection by correlation analysis, append-only audit chains.
Manage privileges
CHAPTER 13
Control plane
Real-time enforcement over heterogeneous data sources. Sub-80ms compliance budgets, in-memory policy engines, classification-aware retrieval, PII redaction, contextvar-threaded lineage, multi-store right-to-erasure with signed deletion certificates, and residency routing. Compliance that actually runs while the agent is running.
Enforce in real time
CHAPTER 14
Predictability
Modeling what an agent will do next. The four kinds of predictability, MDPs as the foundation, HMMs for behavior auditing, conformal prediction for runtime confidence intervals with statistical guarantees, world models, and how to combine them so surprises become things you knew about in advance.
Model behavior
CHAPTER 15
Risk modeling
A simple way to figure out which risks need strict controls and which you can just log. The 5×5 matrix, a scoring formula, and how risk changes with the pattern you pick.
Model risks
CHAPTER 16
Alerting
Alerts that ask for action, not alerts that just describe status. Four severity tiers, dedup rules, and a live demo of an alert stream.
Watch alerts
CHAPTER 17
Evaluation
SWE-bench, OSWorld, GAIA, TAU-bench, MCP-Bench. What each one measures, why they can be cheated, and how to build evaluation that actually works for your system.
Evaluate properly
CHAPTER 18
Guardrails
Safety checks that protect your agent and your users. Eight categories of guardrail, ten rules for making them durable, and the latest research on hidden-instruction attacks.
Build guardrails
CHAPTER 19
Infra & deployment
Running agents in dev, test, staging, and production. Blue-green and canary deployments, version pinning, and five real-world case studies from software, fintech, retail, e-commerce, and healthcare.
Deploy to prod
CHAPTER 20
Adversarial & consensus
What happens when agents disagree, on purpose or by accident. Consensus protocols (Raft, BFT), how attackers can spread between agents, and how to combine answers safely.
Handle disagreement
CHAPTER 21
The 2026 frontier
Where agent security has moved in the last twelve months. Cross-Agent Privilege Escalation, Agent Session Smuggling, the OWASP Top 10 for Agentic Applications, the four foundations from CSA, and the IETF drafts (HDP, AIP) converging on the fix. Plain language, verified sources.
Cross the frontier
CHAPTER 22
Production case studies
Real shipped systems with public sources. GitHub Copilot Workspace, Anthropic's computer use, Cursor's background agent, the Devin benchmark controversy, Shopify Sidekick, the Berkeley scanning-agent results. What each engineering team reported in writing, and what those choices reveal.
See real systems
CHAPTER 23
End-to-end walkthrough
One coherent customer support agent, designed from architecture through trust, control plane, predictability, risk, alerting, evaluation, guardrails, and deployment. Each section names which earlier chapter it draws from. The whole system in one place.
Walk it through
CHAPTER 24
The road ahead
Where this is all heading. Self-improving agents, world models, multi-agent economies, embodied agents, formal verification, and the shift from instructing agents to supervising them.
See what's next
CHAPTER 25
Beyond software
Agents that watch instead of answer. The case for moving agentic AI out of software workflows and into clinical floors, neuroimaging streams, autonomous labs. Grounded in 2025–26 research from BMC, Nature, Royal Society Open Science, Frontiers, Meta FAIR. The next decade's frontier.
Cross the frontier
CHAPTER 26
Q&A
Common questions with practical answers. When does using more agents actually help? How do I prevent infinite loops? How much extra does this cost?
Read Q&A
CHAPTER 27
Glossary
Every term defined in one place. Agent, blackboard, blast radius, BFT, MCP, A2A, semantic consensus, and more.
Look up terms
CHAPTER 28
References
Curated bibliography. Foundational papers (ReAct, Reflexion, Toolformer), 2025–26 protocols (MCP, A2A), benchmarks, security research, and the multi-agent frameworks that shaped this manual.
Browse references
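Several chapter summaries above name concrete mechanisms. As one illustration, the "Beta-distributed reputation with exponential decay" mentioned for Chapter 12 could look roughly like the sketch below. The `BetaReputation` class, the uniform prior, and the per-observation `decay` factor are assumptions made here for illustration; the manual's own trust_engine may track time-based decay or use different parameters.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    alpha: float = 1.0   # prior plus decayed count of good outcomes
    beta: float = 1.0    # prior plus decayed count of bad outcomes
    decay: float = 0.9   # assumed per-observation decay factor

    def update(self, success: bool) -> None:
        # Shrink old evidence toward the uniform prior, then add the new outcome.
        self.alpha = 1.0 + self.decay * (self.alpha - 1.0)
        self.beta = 1.0 + self.decay * (self.beta - 1.0)
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def score(self) -> float:
        # Mean of Beta(alpha, beta): expected probability of good behavior.
        return self.alpha / (self.alpha + self.beta)


rep = BetaReputation()
for _ in range(10):
    rep.update(True)
after_good_run = rep.score

for _ in range(10):
    rep.update(False)
after_bad_run = rep.score

print(round(after_good_run, 3), round(after_bad_run, 3))
```

Because older observations are discounted at every update, a long clean history does not shield an agent from a recent run of failures: the score rises above 0.5 during the good run and falls below it after the bad one.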

Case study domains

The case studies in this manual span five domains: software, financial services, retail, e-commerce, and healthcare, with retail and e-commerce treated as separate examples:

Tech & SaaS · Fintech · Retail · E-commerce · Healthcare

In our experience, most agent systems that fail don't fail because the model picked the wrong words. They fail because of state management, deployment, infrastructure, and how (or whether) humans stay in the loop. This manual treats those as the main event, not as afterthoughts.