field manual · agentic orchestration

Many minds.
One product.

A hands-on guide to building AI agent systems. How agents work together in parallel, how to think about what could go wrong, when something needs an alert, how to design safety checks that bend without breaking, and how to run agents safely from your laptop to production. With diagrams, working code, and real examples.

28 chapters · 30+ code examples · 80+ citations · 2025–26 research integrated

▸ See it work · live demo

Every chapter that follows is built on one idea: agents that perceive, decide, and act. Watch two of the most common shapes those systems take. The animation runs; the subtitle below it narrates what each agent is thinking and why.

The full walkthrough of both scenarios is in the Tutorial chapter.

▸ Where to start

Pick a path through the manual.

Seven ways to read this, depending on where you're starting:

New to AI agents? Start with Tutorial & prerequisites, then read straight through in order.
Building your first multi-agent system? Read Architecture, then Protocols, then Seven patterns, and finish with Guardrails.
Working on memory or reasoning? Go straight to Memory & reasoning, then Heuristics & rewards to see how to shape behavior beyond prompts.
Wiring up RAG, permissions, or agent-to-agent trust? Trust, privileges & RAG covers pre-config vs runtime privileges, scoped RAG access, and reputation that doesn't get gamed.
Worried about compliance, audits, or data privacy? Control plane covers real-time policy enforcement, classification-aware retrieval, lineage tracking, GDPR right-to-erasure across heterogeneous stores, and residency routing, with sub-80ms latency budgets.
Putting an agent into production? Read Infra & deployment, Evaluation, Risk modeling, and Alerting.
Curious where this is all heading? Skip ahead to The road ahead for a forward look at self-improving agents, world models, multi-agent economies, and what to start preparing for.

↓ Companion code is available. Seven runnable agent frameworks (tiny_agent, orchestrator, context_exchange, agent_profile, verification, trust_engine, guardrails) with 201 passing tests. Pairs chapter-by-chapter with the manual. Get the kit

★ Start here

Don't know where to begin?

The manual has 28 chapters. The build path is a single seven-step ladder from "I have never built an agent" to "I have a multi-agent system with safety, trust, and evaluation that I would ship." Each step takes 30 minutes to a weekend and produces something that runs.

1. First agent → 2. Real LLM → 3. Two agents → 4. Share context → 5. Add guards → 6. Add trust → 7. Ship

Open the build path →

The twenty-eight chapters

CHAPTER 01

Tutorial & prerequisites

Start here if AI agents are new to you. Build a tiny agent in 30 lines of Python, learn the basic loop, and figure out when one agent beats many.

Begin tutorial
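The perceive, decide, act loop this chapter builds can be sketched without any LLM at all. This is a minimal sketch, not the tutorial's own code: it assumes a hypothetical toy environment (a room temperature) and a rule-based `decide` step standing in for the model call. `Environment`, `TinyAgent`, and their methods are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical toy environment: a room temperature the agent tries to reach."""
    temperature: float = 15.0

    def observe(self) -> float:
        return self.temperature

    def apply(self, action: str) -> None:
        # Each action nudges the temperature by one degree.
        if action == "heat":
            self.temperature += 1.0
        elif action == "cool":
            self.temperature -= 1.0


@dataclass
class TinyAgent:
    target: float
    history: list = field(default_factory=list)

    def decide(self, observation: float) -> str:
        # Rule-based stand-in for the LLM call a real agent would make here.
        if observation < self.target:
            return "heat"
        if observation > self.target:
            return "cool"
        return "stop"

    def run(self, env: Environment, max_steps: int = 20) -> list:
        for _ in range(max_steps):          # hard step limit guards against endless loops
            obs = env.observe()             # perceive
            action = self.decide(obs)       # decide
            self.history.append((obs, action))
            if action == "stop":
                break
            env.apply(action)               # act
        return self.history


env = Environment()
trace = TinyAgent(target=18.0).run(env)
```

Swapping `decide` for a model call is the only change needed to turn this into an LLM agent; the step limit stays because a model can fail to converge.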

CHAPTER 02

Architecture & flow

A practical seven-layer way to think about agent systems: identity, router, state, messaging, memory, tools, and observability. One author's mental model, not a standard.

Explore architecture

CHAPTER 03

Where the work comes from

How a task reaches an agent (five concrete sources). How an agent perceives its environment (five layers of bootstrap). And the most honest answer in the manual to "where does the agent actually live as a software entity?"

Trace the locus

CHAPTER 04

Protocols & interop

MCP, A2A, ACP, ANP: the open standards that emerged in 2024–25 for connecting agents to tools and to each other, plus their security pitfalls.

Learn protocols

CHAPTER 05

Seven patterns

Seven common ways to organize a group of agents: orchestrator, hierarchy, pipeline, peer swarm, blackboard, debate, and time-aware. Each one has a "what / when / why / watch out" breakdown.

Compare patterns

CHAPTER 06

Context exchange

How agents share what they know without leaking what they shouldn't. Typed envelopes with provenance and TTL, capability handshakes that pin the contract before any data flows, and compartments at the boundary that minimize and redact every crossing.

See the three building blocks
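A typed envelope and a boundary compartment can be sketched in a few lines. This is a hypothetical illustration, not the chapter's implementation: `Envelope`, `cross_boundary`, the `allowed`/`redact` field sets, and the `[REDACTED]` marker are all assumptions. The capability handshake is left out; only provenance, TTL, minimization, and redaction are shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical typed envelope: a payload plus provenance and a time-to-live."""
    payload: dict
    source_agent: str      # provenance: which agent produced this context
    created_at: float      # Unix timestamp when the envelope was created
    ttl_seconds: float     # how long the context stays valid after creation

    def is_expired(self, now: float) -> bool:
        return now >= self.created_at + self.ttl_seconds


def cross_boundary(env: Envelope, allowed: set[str], redact: set[str], now: float) -> Envelope:
    """Compartment check at a boundary: refuse stale context, then minimize and redact."""
    if env.is_expired(now):
        raise ValueError(f"stale context from {env.source_agent}")
    outgoing = {}
    for key, value in env.payload.items():
        if key in redact:
            outgoing[key] = "[REDACTED]"   # receiver learns the field exists, not its value
        elif key in allowed:
            outgoing[key] = value          # explicitly allowed through unchanged
        # every other field is dropped (data minimization)
    return Envelope(outgoing, env.source_agent, env.created_at, env.ttl_seconds)


now = 1_000.0
original = Envelope(
    {"order_id": "A17", "email": "jo@example.com", "card_last4": "4242", "status": "shipped"},
    source_agent="billing-agent",
    created_at=now,
    ttl_seconds=60.0,
)
shared = cross_boundary(original, allowed={"order_id", "status"}, redact={"email"}, now=now + 10)
```

Here `card_last4` never leaves the billing compartment, and the same envelope presented 60 seconds after creation is rejected as stale.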

CHAPTER 07

Parallel collaboration

How multiple agents can work on the same thing at the same time, and why surfacing their disagreements is more useful than hiding them.

See parallel build

CHAPTER 08

Memory & reasoning

How agents remember things across sessions: long context, RAG, memory agents, graph memory. Plus reasoning models and when self-reflection actually helps.

Manage memory

CHAPTER 09

Generalists & specialists

Three legitimate shapes for an agent in production: generalist, specialist, generalist plus RAG. Where knowledge actually lives across five anchors (weights, fine-tune, prompt, tools, retrieved). How the choice changes which guards do real work.

Pick the right shape

CHAPTER 10

When the agent itself is wrong

LLM agents hallucinate their own capabilities, drift away from the original ask, and try to call tools they were never authorized for. Three external checks (capability registry, pinned ask, tool gate) close the gap. No attacker required.

Defend against drift
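Two of the three external checks, the capability registry and the tool gate, fit in one small function. This is a sketch under assumptions: the registry contents, the tool implementations, and the names `gated_call` and `ToolNotAuthorized` are hypothetical. The key design choice is that the gate consults a registry the platform owns, never the agent's own claim about what it can do.

```python
from __future__ import annotations

from typing import Any, Callable


class ToolNotAuthorized(Exception):
    """Raised when an agent asks for a tool it was never granted."""


# Hypothetical capability registry, owned by the platform rather than the agent:
# agent id -> the set of tools it is actually allowed to call.
CAPABILITY_REGISTRY: dict[str, frozenset[str]] = {
    "support-agent": frozenset({"lookup_order", "send_reply"}),
    "refund-agent": frozenset({"lookup_order", "issue_refund"}),
}

# Hypothetical tool implementations.
TOOLS: dict[str, Callable[..., Any]] = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "send_reply": lambda text: f"sent: {text}",
    "issue_refund": lambda order_id, amount: f"refunded {amount} on {order_id}",
}


def gated_call(agent_id: str, tool_name: str, **kwargs: Any) -> Any:
    """Tool gate: check the registry before dispatching; unknown agents get nothing."""
    granted = CAPABILITY_REGISTRY.get(agent_id, frozenset())
    if tool_name not in granted:
        raise ToolNotAuthorized(f"{agent_id} is not registered for {tool_name!r}")
    return TOOLS[tool_name](**kwargs)
```

If the support agent hallucinates that it can issue refunds, `gated_call("support-agent", "issue_refund", ...)` raises instead of executing, with no attacker involved.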

CHAPTER 11

Heuristics & rewards

The four ways to guide an agent: prompts, hand-written rules, rewards, and learned preferences. How they layer, when to use which, and how to avoid reward hacking.

Guide your agent

CHAPTER 12

Trust, privileges & RAG

What your agent is allowed to do, and how it earns more. Pre-config vs runtime privileges, RAG access as a security boundary, the six trust mechanisms, and a working behavior-tracking system: Beta-distributed reputation with exponential decay, signed Ed25519 capability tokens, Sybil detection via correlation analysis, and append-only audit chains.

Manage privileges
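Beta-distributed reputation with exponential decay can be sketched as follows. The exact update rule the chapter uses isn't given here, so this assumes a uniform Beta(1, 1) prior and a half-life parameter; `BetaReputation` and its fields are hypothetical names. Reputation is the mean of Beta(successes + 1, failures + 1), and both counts shrink by half every `half_life` seconds.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Reputation as the mean of Beta(successes + 1, failures + 1).

    Old evidence fades exponentially, so an agent cannot coast on past behavior.
    """
    half_life: float                # seconds for evidence weight to halve (hypothetical)
    successes: float = 0.0
    failures: float = 0.0
    last_update: float = 0.0

    def _factor(self, now: float) -> float:
        # Exponential decay: weight multiplies by 0.5 for every half-life elapsed.
        return 0.5 ** ((now - self.last_update) / self.half_life)

    def record(self, ok: bool, now: float) -> None:
        factor = self._factor(now)
        self.successes *= factor
        self.failures *= factor
        self.last_update = now
        if ok:
            self.successes += 1.0
        else:
            self.failures += 1.0

    def score(self, now: float) -> float:
        factor = self._factor(now)
        s, f = self.successes * factor, self.failures * factor
        return (s + 1.0) / (s + f + 2.0)   # Beta(1, 1) prior: 0.5 with no evidence


rep = BetaReputation(half_life=10.0)
for outcome in [True] * 8 + [False] * 2:
    rep.record(outcome, now=0.0)
```

After eight successes and two failures the score is 9/12 = 0.75; one half-life later, with no new evidence, it falls to 5/7 ≈ 0.71 as it drifts back toward the 0.5 prior.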

CHAPTER 13

Control plane

Real-time enforcement over heterogeneous data sources. Sub-80ms compliance budgets, in-memory policy engines, classification-aware retrieval, PII redaction, contextvar-threaded lineage, multi-store right-to-erasure with signed deletion certificates, and residency routing. Compliance that actually runs while the agent is running.

Enforce in real time

CHAPTER 14

Predictability

Modeling what an agent will do next. The four kinds of predictability, MDPs as the foundation, HMMs for behavior auditing, conformal prediction for runtime confidence intervals with statistical guarantees, world models, and how to combine them so surprises become things you knew about in advance.

Model behavior
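Split conformal prediction is the piece of this chapter that turns a point guess into an interval with a stated coverage guarantee. A minimal sketch, assuming a regression-style prediction (for example, how many tool calls a run will take, a hypothetical target) and absolute residuals as the nonconformity score; the function names are illustrative.

```python
from __future__ import annotations

import math


def conformal_threshold(calibration_scores: list[float], alpha: float) -> float:
    """Split conformal: the ceil((n + 1)(1 - alpha))-th smallest nonconformity score."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf   # not enough calibration data for this alpha: interval is unbounded
    return sorted(calibration_scores)[k - 1]


def prediction_interval(point_estimate: float, threshold: float) -> tuple[float, float]:
    # If calibration and test runs are exchangeable, the true value lands in
    # this interval with probability at least 1 - alpha (marginally, over runs).
    return point_estimate - threshold, point_estimate + threshold


# Hypothetical calibration set: |actual - predicted| tool calls on 20 past runs.
residuals = [0.5 * r for r in range(1, 21)]
q = conformal_threshold(residuals, alpha=0.1)     # k = ceil(21 * 0.9) = 19
low, high = prediction_interval(12.0, q)
```

With 20 calibration runs and alpha = 0.1, the threshold is the 19th smallest residual (9.5), so a prediction of 12 tool calls becomes the interval 2.5 to 21.5. The guarantee is marginal coverage across runs, not a promise about any single run.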

CHAPTER 15

Risk modeling

A simple way to figure out which risks need strict controls and which you can just log. The 5×5 matrix, a scoring formula, and how risk changes with the pattern you pick.

Model risks
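The chapter's own scoring formula isn't spelled out here, so this sketch assumes the common 5×5 convention: likelihood and impact are each rated 1 to 5 and multiplied. The band thresholds (15 and 8) and the `Treatment` names are hypothetical and should be tuned to your own risk appetite.

```python
from enum import Enum


class Treatment(Enum):
    STRICT = "strict control"    # e.g. a hard block or human approval before acting
    ALERT = "monitor and alert"
    LOG = "log only"


def risk_score(likelihood: int, impact: int) -> int:
    """5x5 matrix score: both axes are rated 1 (low) to 5 (high) and multiplied."""
    for name, value in (("likelihood", likelihood), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be in 1..5, got {value}")
    return likelihood * impact


def treatment(score: int) -> Treatment:
    # Hypothetical band thresholds on the 1..25 score range.
    if score >= 15:
        return Treatment.STRICT
    if score >= 8:
        return Treatment.ALERT
    return Treatment.LOG
```

For example, an agent issuing an unauthorized refund might be rated likelihood 4, impact 5: a score of 20 that lands in the strict band, while a likely-but-harmless formatting slip stays in the log-only band.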

CHAPTER 16

Alerting

Alerts that ask for action, not alerts that just describe status. Four severity tiers, dedup rules, and a live demo of an alert stream.

Watch alerts
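A dedup rule plus four severity tiers can be sketched as below. The tier names, the 300-second window, and the "always let an escalation through" rule are assumptions for illustration, not the chapter's exact policy. Alerts are keyed by a fingerprint so repeats of the same problem collapse into one page.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical tier names; a lower number means more urgent.
    PAGE = 1      # act now, wake someone up
    URGENT = 2    # act within the hour
    TICKET = 3    # act within a working day
    REVIEW = 4    # act at the next scheduled review


@dataclass(frozen=True)
class Alert:
    fingerprint: str     # e.g. "agent=refund-bot:check=tool_gate_denied"
    severity: Severity
    timestamp: float


class Deduplicator:
    """Drop repeats of a fingerprint within a window of the last sent copy, unless severity escalates."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._last_sent: dict[str, Alert] = {}

    def should_send(self, alert: Alert) -> bool:
        prev = self._last_sent.get(alert.fingerprint)
        is_repeat = prev is not None and alert.timestamp - prev.timestamp < self.window
        escalated = prev is not None and alert.severity < prev.severity
        if is_repeat and not escalated:
            return False                         # suppressed; window still runs from prev
        self._last_sent[alert.fingerprint] = alert
        return True


dedup = Deduplicator(window_seconds=300.0)
sent = [
    dedup.should_send(Alert("x", Severity.TICKET, 0.0)),    # first occurrence
    dedup.should_send(Alert("x", Severity.TICKET, 60.0)),   # repeat inside window
    dedup.should_send(Alert("x", Severity.PAGE, 120.0)),    # escalation
    dedup.should_send(Alert("x", Severity.PAGE, 200.0)),    # repeat of the page
    dedup.should_send(Alert("x", Severity.PAGE, 500.0)),    # window has expired
]
```

The window is measured from the last alert actually sent, so a steady stream of duplicates cannot keep extending its own suppression.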

CHAPTER 17

Evaluation

SWE-bench, OSWorld, GAIA, TAU-bench, MCP-Bench. What each one measures, why they can be cheated, and how to build evaluation that actually works for your system.

Evaluate properly

CHAPTER 18

Guardrails

Safety checks that protect your agent and your users. Eight categories of guardrail, ten rules for making them durable, and the latest research on hidden-instruction attacks.

Build guardrails

CHAPTER 19

Infra & deployment

Running agents in dev, test, staging, and production. Blue-green and canary deployments, version pinning, and five real-world case studies from software, fintech, retail, e-commerce, and healthcare.

Deploy to prod

CHAPTER 20

Adversarial & consensus

What happens when agents disagree, on purpose or by accident. Consensus protocols (Raft, BFT), how an attack can spread from one agent to others, and how to combine answers safely.

Handle disagreement
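One safe way to combine answers borrows the quorum arithmetic from Byzantine fault tolerance: with n ≥ 3f + 1 agents, of which at most f may be faulty or compromised, accept an answer only when at least 2f + 1 agents agree. This is a naive exact-match sketch of that rule, not a full BFT protocol such as PBFT; `quorum_answer` is a hypothetical name, and real agent outputs usually need semantic rather than string matching.

```python
from __future__ import annotations

from collections import Counter


def quorum_answer(answers: list[str], f: int) -> str | None:
    """Accept an answer only if at least 2f + 1 of n >= 3f + 1 agents agree exactly.

    With at most f faulty agents, 2f + 1 matching votes means at least
    f + 1 honest agents gave this answer.
    """
    n = len(answers)
    if n < 3 * f + 1:
        raise ValueError(f"need at least {3 * f + 1} agents to tolerate {f} faulty ones, got {n}")
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= 2 * f + 1 else None   # None: no safe consensus, escalate
```

Returning `None` instead of the plurality answer is deliberate: a split vote is a signal to escalate to a human, not something to paper over.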

CHAPTER 21

The 2026 frontier

Where agent security has moved in the last twelve months. Cross-Agent Privilege Escalation, Agent Session Smuggling, the OWASP Top 10 for Agentic Applications, the four foundations from CSA, and the IETF drafts (HDP, AIP) converging on the fix. Plain language, verified sources.

Cross the frontier

CHAPTER 22

Production case studies

Real shipped systems with public sources. GitHub Copilot Workspace, Anthropic's computer use, Cursor's background agent, the Devin benchmark controversy, Shopify Sidekick, the Berkeley scanning-agent results. What each engineering team reported in writing, and what those choices reveal.

See real systems

CHAPTER 23

End-to-end walkthrough

One coherent customer support agent, designed from architecture through trust, control plane, predictability, risk, alerting, evaluation, guardrails, and deployment. Each section names which earlier chapter it draws from. The whole system in one place.

Walk it through

CHAPTER 24

The road ahead

Where this is all heading. Self-improving agents, world models, multi-agent economies, embodied agents, formal verification, and the shift from instructing agents to supervising them.

See what's next

CHAPTER 25

Beyond software

Agents that watch instead of answer. The case for moving agentic AI out of software workflows and onto clinical floors, into neuroimaging streams, and into autonomous labs. Grounded in 2025–26 research from BMC, Nature, Royal Society Open Science, Frontiers, and Meta FAIR. The next decade's frontier.

Look beyond software

CHAPTER 26

Q&A

Common questions with practical answers. When does using more agents actually help? How do I prevent infinite loops? How much extra does this cost?

Read Q&A

CHAPTER 27

Glossary

Every term defined in one place. Agent, blackboard, blast radius, BFT, MCP, A2A, semantic consensus, and more.

Look up terms

CHAPTER 28

References

Curated bibliography. Foundational papers (ReAct, Reflexion, Toolformer), 2025–26 protocols (MCP, A2A), benchmarks, security research, and the multi-agent frameworks that shaped this manual.

Browse references

Case study domains

The case studies in this manual span software, financial services, and healthcare, with separate examples for retail and e-commerce sub-domains:

Tech & SaaS · Fintech · Retail · E-commerce · Healthcare

In our experience, most agent systems that fail don't fail because the model picked the wrong words. They fail because of state management, deployment, infrastructure, and how (or whether) humans stay in the loop. This manual treats those as the main event, not as afterthoughts.
