02 Architecture · anatomy & flow

An agent system is mostly routing, state, and rules.

Once you start building real agent systems, the same handful of pieces show up in every project: who the agent is, where requests get routed, where state lives, how messages flow, what the agent remembers, what tools it can use, and how you watch it work. The decomposition shown here is informed by surveys including Architectures & Taxonomies, arXiv 2026 and Prompt-Response to Goal-Directed, arXiv 2026, plus the design choices in AutoGen, Wu et al. 2023 and MetaGPT, Hong et al. ICLR 2024. There's no agreed-upon standard the way networking has the OSI model, so the seven layers below are one working model that has been useful in practice. The value isn't the specific layers; it's having names for things. Once each piece has a name, problems become easier to find. Without names, every bug feels like a mystery.

A practical seven-layer way to think about it

Layered architecture diagram
Each layer has one responsibility. Cross-cutting concerns (guardrails, observability) thread through all of them.
  • L1 · Identity — agent ID · role · system prompt · tool allow-list · model binding
  • L2 · Router — deterministic rules · LLM planner · hybrid · dispatcher
  • L3 · State store — single source of truth · typed object · checkpointable · replay-friendly
  • L4 · Message bus — in-process · async queues · gRPC · pub-sub · typed envelopes
  • L5 · Memory — scratchpad · workflow state · long-term store · provenance
  • L6 · Tool gateway — audit · rate-limit · sandbox · per-role allow-lists · output validation
  • L7 · Observer — traces · costs · drift · loops · escalation triggers
Guardrails and the request flow run through all seven layers.
Reading the diagram. The layers are concerns, not request hops. Each one names a thing your system has to handle.
  • L1 Identity: who each agent is and what it's allowed to do.
  • L2 Router: which agent handles a given task.
  • L3 State: the canonical record.
  • L4 Bus: how messages move between agents.
  • L5 Memory: what agents read and write across turns.
  • L6 Tool Gateway: the audited boundary to the outside world.
  • L7 Observer: what watches all of it.
A real request touches most of these layers, often more than once, in whatever order the workflow needs.

Guardrails (the right-hand ribbon) are checkpoints between layers, not a layer of their own. When one layer fails, the observer sees it and the router picks the next move; the system doesn't crash.
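A checkpoint between layers can be as small as a predicate that runs on every message before a layer sees it. A minimal sketch (the wrapper and the trace-id check are illustrative, not part of any framework):

```python
def guardrail(check, layer_fn):
    """Wrap a layer entry point with a checkpoint. If the check fails,
    return an error event instead of raising -- errors are normal events."""
    def guarded(msg):
        problem = check(msg)
        if problem:
            return {"kind": "error", "reason": problem, "msg": msg}
        return layer_fn(msg)
    return guarded

# Example checkpoint: reject messages that are missing a trace id.
def has_trace_id(msg):
    return None if msg.get("trace_id") else "missing trace_id"

deliver = guardrail(has_trace_id, lambda msg: {"kind": "ok", "msg": msg})
print(deliver({"payload": {}}))   # error event, not a crash
```

Because the failure comes back as an ordinary event, the observer can see it and the router can pick the next move.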

The smallest version that's still real

Here's the simplest honest version of an orchestrator. The big frameworks (LangGraph, CrewAI, AutoGen (Wu 2023), OpenAI Swarm) are all richer versions of this same idea.

from dataclasses import dataclass, field
import uuid, time

@dataclass
class Message:
    sender: str
    recipient: str
    kind: str            # 'task' | 'result' | 'error' | 'vote'
    payload: dict
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts: float = field(default_factory=time.time)

@dataclass
class WorkflowState:
    """L3: single source of truth."""
    goal: str
    decisions: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)
    history: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    iteration: int = 0
    max_iterations: int = 25
    token_budget: int = 100_000
    tokens_used: int = 0

class Agent:
    """L1: identity + scoped capability."""
    def __init__(self, name, role, tools, llm):
        self.name = name
        self.role = role
        self.tools = {t.__name__: t for t in tools}   # allow-list
        self.llm = llm

    def handle(self, msg, state):
        ctx = self._project_context(msg, state)        # curated, not full
        decision = self.llm(ctx)
        # Schema matches the tutorial chapter: {"action": "call_tool" | "final_answer", ...}
        if decision["action"] == "call_tool":
            if decision["tool"] not in self.tools:
                raise PermissionError(f"{self.name} denied: {decision['tool']}")
            result = self.tools[decision["tool"]](**decision["args"])
            return Message(self.name, msg.sender, "result", {"data": result})
        # action == "final_answer"
        return Message(self.name, msg.sender, "result", decision["payload"])

    def _project_context(self, msg, state):
        # Each role declares which artifact keys it cares about (role isolation)
        keymap = {"researcher": ["sources"],
                  "writer": ["outline", "sources"],
                  "reviewer": ["draft"]}
        keys = keymap.get(self.role, [])
        return {
            "role": self.role,
            "goal": state.goal,
            "task": msg.payload,
            "artifacts": {k: state.artifacts.get(k) for k in keys},
            "tools": list(self.tools.keys()),
        }

class Orchestrator:
    """L2 router + L7 observer wired together."""
    def __init__(self, agents, router, observer=None):
        self.agents = agents
        self.router = router         # state -> next agent name | 'done'
        self.observer = observer

    def run(self, goal):
        state = WorkflowState(goal=goal)
        while state.iteration < state.max_iterations:
            state.iteration += 1
            # Token-budget halt (the field is enforced, not decorative)
            if state.tokens_used >= state.token_budget:
                state.errors.append({"reason": "token_budget_exhausted"})
                break
            next_agent = self.router(state)
            if next_agent == "done": break
            agent = self.agents[next_agent]
            msg = Message("orchestrator", next_agent, "task",
                          {"instruction": self._task_for(next_agent, state)})
            try:
                reply = agent.handle(msg, state)
                self._merge(state, next_agent, reply)
            except Exception as e:
                state.errors.append({"agent": next_agent, "error": str(e)})
            if self.observer:
                self.observer(state)        # trace · alert · halt
        return state

    def _task_for(self, agent_name, state):
        # Build the natural-language instruction for the next agent.
        # Real implementations templatize per-role; this is the simplest version.
        last_step = state.history[-1] if state.history else None
        return {
            "goal": state.goal,
            "prior_step": last_step,
            "iteration": state.iteration,
        }

    def _merge(self, state, agent_name, reply):
        # Persist the agent's reply into shared state.
        state.history.append({"agent": agent_name, "payload": reply.payload})
        if isinstance(reply.payload, dict) and "artifact_key" in reply.payload:
            state.artifacts[reply.payload["artifact_key"]] = reply.payload.get("data")
        # Account for token usage if the agent reports it
        if isinstance(reply.payload, dict):
            state.tokens_used += int(reply.payload.get("tokens_used", 0))

The idea of separating "decide what to do" from "actually do it" comes from two well-known papers: ReAct, Yao et al. ICLR 2023 and Toolformer, Schick et al. NeurIPS 2023. Four habits to keep:
  • Each agent gets its own list of tools, not access to everything.
  • Show only the context the agent actually needs, not the whole history.
  • Errors are normal events, not crashes.
  • The observer is a separate piece, not mixed into the agents.
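The decide/act split can be sketched in a few lines. The stub `decide` stands in for a model call, and the tool name is illustrative:

```python
# ReAct-style split: the model proposes (decide), plain code disposes (act).
ALLOWED = {"search": lambda q: f"results for {q!r}"}   # per-agent allow-list

def decide(observation):
    # Stand-in for an LLM call; a real system parses the model's JSON here.
    return {"action": "call_tool", "tool": "search", "args": {"q": observation}}

def act(decision):
    if decision["action"] != "call_tool":
        return decision.get("payload")
    if decision["tool"] not in ALLOWED:                # scoped, not open access
        return {"kind": "error", "reason": f"{decision['tool']} not permitted"}
    return ALLOWED[decision["tool"]](**decision["args"])

print(act(decide("agent architectures")))   # results for 'agent architectures'
```

The point is that `act` never trusts the model: the allow-list check sits in ordinary code, outside anything the model can talk its way around.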

Deterministic or stochastic? The choice shapes everything downstream

The loop runs a model. Models are not pure functions. Given the same prompt, the same model can produce different outputs on two calls, and on most setups it does. This single property of the underlying engine reaches into every later chapter, so it is worth being clear about what it means before composition makes the question harder to think about.

The thing that changes is the decide step. Perceive reads from a tool or a state store; that is deterministic up to whatever the source returns. Act calls a tool; the tool itself may be deterministic or not, but the call is intentional. Only decide is where stochasticity comes from in the loop, because decide is where the model gets invoked and the model samples from a probability distribution rather than picking the single most likely token.
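A toy illustration of why two calls differ: pure-Python softmax sampling over a two-token "vocabulary" (the logit values are made up for the example):

```python
import math, random

def sample(logits, temperature, rng):
    """Greedy when temperature == 0; otherwise sample from the softmax."""
    if temperature == 0:
        return max(logits, key=logits.get)          # deterministic argmax
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(weights.values())
    r, acc = rng.random() * total, 0.0
    for token, w in weights.items():
        acc += w
        if r <= acc:
            return token
    return token                                    # float-rounding fallback

logits = {"call_tool": 2.0, "final_answer": 1.6}    # made-up scores
rng = random.Random(0)
greedy = {sample(logits, 0, rng) for _ in range(100)}
sampled = {sample(logits, 1.0, rng) for _ in range(100)}
print(greedy)    # {'call_tool'} -- 100 calls, one answer
print(sampled)   # both tokens show up across 100 calls
```

Same inputs, same model (here, the same logits), different outputs: that is all "stochastic" means, and it is the default behavior of a sampled decode.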

Operators have three knobs that control how stochastic decide actually is: temperature (zero makes decoding effectively greedy), the sampling parameters (top-p, top-k), and the seed, where the provider supports pinning one.

Most production systems use a fourth pattern that the three knobs above do not name: mixed determinism. Different decisions in the same loop run with different settings.

Mixed determinism is implemented as multiple model calls in the same loop iteration, each with different decoding settings, with the outputs composed by orchestration code. It is not a setting on a single call; it is a discipline about which decisions get which settings.
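A minimal sketch of that discipline, with `call_model` as a stand-in for your provider client (the stub bodies and temperatures are illustrative):

```python
def call_model(prompt, temperature):
    # Stand-in for a provider call; a real client would decode here.
    if temperature == 0:
        return "call_tool:search"                  # greedy decode: stable
    return f"(sampled at T={temperature}) progress update"

def one_iteration(state_summary, user_goal):
    # Tool-call decision runs greedy so the route is reproducible;
    # user-facing prose runs sampled so replies don't read canned.
    action = call_model(f"next action given {state_summary}", temperature=0)
    prose = call_model(f"explain progress on {user_goal}", temperature=0.7)
    return action, prose
```

The orchestration code, not the model, decides which call gets which settings; that is what makes the discipline auditable.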

What changes downstream when the loop is stochastic

Six places where this matters, with the implication for each.

  • Goal confirmation (ch 03) — the agent's restatement of the goal is not deterministic, so hash equality is brittle. Mitigation: run confirmation at temperature zero, or use semantic similarity above a threshold rather than hash equality.
  • Reputation math (ch 12) — the same agent on the same task can produce different outcomes; the Beta counter measures the agent's output distribution, not a single deterministic ability. Mitigation: treat reputation as a distribution estimate and set thresholds aware of the agent's variance, not just its mean. The credible-lower-bound rule already does this; it works correctly here, but be intentional about the variance.
  • Caching — you cannot cache stochastic outputs by input alone; two calls with identical input may legitimately produce different outputs. Mitigation: cache only at the deterministic boundary (temperature-zero tool-call decisions, structured output schemas), and cache prose only if your operators have agreed that one canned reply is acceptable.
  • Reproducing failures — "run the same input again" does not reproduce a failure; the audit log has to record the actual output, not the intent to produce one. Mitigation: log the full model response on every call, not just the parsed action. The audit log from chapter 12 already does this if implemented correctly; if yours does not, fix that first.
  • Evaluation harness (ch 17) — pass rates are distributions, not numbers; "95% accuracy" from a stochastic agent and "95% accuracy" from a deterministic agent are different claims. Mitigation: run each evaluation case multiple times (typically 5 to 20), report mean and confidence interval, and block merges on the lower bound dropping, not on the mean.
  • Adversarial robustness (ch 20) — "the agent always refuses this prompt" cannot be proven from a single test; a 1-in-50 refusal failure looks like full compliance over five tests. Mitigation: run each adversarial case at least 50 times and flag any failure, not just a majority; set the bar for refusal at "never failed in 50 runs" rather than "passed once".

The pattern across the table is that every place you treated the agent's output as a single value, you now have to treat it as a sample from a distribution. The math gets a little more careful; the practice gets a lot more honest about what the system is actually doing.
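The evaluation-harness point can be made concrete. A minimal repeated-run scorer — the normal-approximation interval used here is one common choice, not the only defensible one:

```python
import math

def pass_rate_interval(run_case, n=20, z=1.96):
    """Run a boolean eval case n times; return (mean, lower, upper) using
    a normal-approximation confidence interval on the pass rate."""
    passes = sum(1 for _ in range(n) if run_case())
    p = passes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

# Deterministic stand-in case: always passes, so the interval collapses.
mean, lo, hi = pass_rate_interval(lambda: True, n=20)
print(mean, lo, hi)            # 1.0 1.0 1.0
# Gate on the lower bound, not the mean:
assert lo >= 0.9, "merge blocked: lower bound below threshold"
```

With a real stochastic agent in place of the lambda, the interval widens, and gating on `lo` is what turns "95% accuracy" from a point claim into an honest one.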

Common mistakes

The AgentProfile from the kit's agent_profile/ module records this choice as a DecodingPolicy field, with separate settings for tool-call decisions, confirmation steps, and prose. Operators set the policy at deploy time; the audit log records which policy was active when each action was taken. When something goes wrong, the policy is the first thing to check after the model version.

Composing two agents creates a new system

Two agents that pass their own tests don't automatically pass the test of being chained together. This trips up almost every team the first time. Three things go wrong predictably:

The rule: every composed system is a new system. Test it as one. "It's just A plus B plus a connector" is wrong in the same way "two single-process programs glued together is just a single-process program twice" was wrong twenty years ago.
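Testing the composition as its own system can be as simple as asserting on the chained behavior rather than on each stage alone. A hypothetical two-stage pipeline with stubbed agents:

```python
# Two stages that would each pass their own unit tests...
def summarize(text):            # agent A (stubbed): truncates to 20 chars
    return text[:20]

def translate(text):            # agent B (stubbed): upper-cases as a stand-in
    return text.upper()

def pipeline(text):             # the composed system -- a new thing to test
    return translate(summarize(text))

# The composition test is where interaction bugs surface: B only ever sees
# A's truncated output, a fact no unit test of A or B alone exercises.
assert pipeline("hello world, this is long") == "HELLO WORLD, THIS IS"
```

The stubs are trivial on purpose: even here, the interesting property (what B receives) only exists at the seam, which is exactly why the composed system needs its own tests.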

The orchestrator doesn't think. It routes. The agents think. Mixing the two up is the biggest reason agent code becomes unmaintainable.

The loop is correct as far as it goes, but it leaves three questions unanswered: where does the work come from in the first place, how does the agent know its world, and where does the agent really live as a software entity? Chapter 03 (Where the work comes from) is the honest answer to all three. Read it before chapter 04 (Protocols), because protocols only make sense once you know what is being routed and who is asking.