Build path · what to build first, second, third

The manual has 28 chapters. Most people just want to know what to build first.

Reading the whole manual front to back is one way through. It is not the way most people learn. Most people learn by building a small thing that runs, breaking it on purpose, fixing it, and adding the next piece. This page is that ladder. Seven steps from "I have never built an agent" to "I have a multi-agent system with safety, trust, and evaluation that I would ship." Each step takes between thirty minutes and a weekend, produces something that runs, and points to the chapter and kit module that go with it.

The total commitment is around fifteen to twenty hours spread over two weekends if you go end to end. You can stop at any step and have something useful. Many production systems live happily at step 3 or step 5 for years.

The build path is opinionated. There are other valid orders. If you are debugging a specific problem in a system you already have, jump to the chapter that matches the problem. The path is for greenfield learning.


The ladder, at a glance

| # | Step | Time | Deliverable | Read | Build |
|---|------|------|-------------|------|-------|
| 1 | First agent | 30 min | A perceive-decide-act loop that runs offline | Ch 01, Ch 02 | tiny_agent/ |
| 2 | Real LLM | 1 hour | Same loop, swapped to a real model | Ch 04 | examples/real_llm_adapter.py |
| 3 | Two agents | 1 evening | Conductor plus specialists, working in parallel | Ch 05, Ch 07 | orchestrator/ |
| 4 | Share context safely | 1 evening | Typed envelopes plus a handshake between agents | Ch 06 | context_exchange/ |
| 5 | Add guards | 1 evening | Five built-in guards on the boundary | Ch 18 | guardrails/ |
| 6 | Add trust | 1 weekend | Reputation, signed tokens, audit log | Ch 12 | trust_engine/ |
| 7 | Evaluate & ship | 1 weekend | Test harness, traces, deploy plan | Ch 17, Ch 19 | your code |

Step 1 · First agent (30 minutes)

The simplest useful thing you can build. An agent is a loop that perceives the situation, decides on an action, takes the action, and repeats. The kit ships tiny_agent/ as a reference implementation in less than 150 lines. Do not start by reading the code. Start by running it.

cd cenpie-agent-kit
python cli.py tiny

What you should see: the agent receives a goal, picks a tool, gets a result, and returns an answer. Two example goals are pre-loaded so you can see the loop work without an API key. Now open tiny_agent/agent.py and read it. Notice three things: the loop has a hard max_steps cap, every tool call is wrapped in a try/except so a broken tool does not crash the agent, and every step goes through an observer hook that lets you trace what the agent is thinking.
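For orientation, here is a minimal sketch of the same shape in plain Python. It is not the kit's tiny_agent code; the Tool dataclass and the llm.decide interface are assumptions. The three properties above (hard step cap, wrapped tool calls, observer hook) are the point.

```python
# Illustrative perceive-decide-act loop, NOT the kit's tiny_agent implementation.
# Tool and llm.decide are assumed interfaces for the sketch.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]

def run_agent(goal: str, llm, tools: dict[str, Tool],
              observer=print, max_steps: int = 5) -> str:
    context = f"Goal: {goal}"
    for step in range(max_steps):                       # hard cap: the loop cannot run forever
        decision = llm.decide(context, list(tools))     # perceive + decide
        observer(f"step {step}: {decision}")            # observer hook traces every step
        if decision.get("final"):
            return decision["answer"]
        tool = tools.get(decision["tool"])
        try:                                            # a broken tool must not crash the agent
            result = tool.run(decision["input"])
        except Exception as exc:
            result = f"tool error: {exc}"
        context += f"\n{decision['tool']} -> {result}"  # act, then loop again
    return "stopped: hit max_steps without a final answer"
```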

Stop here if you only need a single-step agent. Plenty of production systems are exactly this loop with one tool. The rest of the path is for when you need more agents, more safety, or more trust.

Read alongside: Chapter 01 (Tutorial) for a from-scratch walkthrough of the same code, and Chapter 02 (Architecture) for the perceive-decide-act loop in context.

Step 2 · Swap to a real LLM (1 hour)

The kit's ScriptedLLM is a deterministic stub. Replace it with a real model. The kit ships adapters for OpenAI, Anthropic, and Ollama in examples/real_llm_adapter.py; pick whichever you have a key for. Set the API key in your environment, change one import in your test script, and run again. The agent should now do real reasoning instead of following the script.

What to notice here: the agent loop is unchanged. The only thing that changed is the LLM object passed in. This is the value of keeping the loop and the model decoupled. When a new model comes out tomorrow, you swap the adapter, not the system.
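A sketch of that decoupling is below. The single-method LLM interface is an assumption (the real adapters in examples/real_llm_adapter.py may define more), and the OpenAI call uses the standard openai-python client.

```python
# Sketch of keeping the loop and the model decoupled; assumed interface,
# not the kit's actual adapter API.
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class ScriptedLLM:
    """Deterministic stub: replays canned responses, needs no API key."""
    def __init__(self, script: list[str]):
        self._script = iter(script)
    def complete(self, prompt: str) -> str:
        return next(self._script)

class OpenAILLM:
    """Real model behind the same interface; only this class knows the vendor API."""
    def __init__(self, client, model: str = "gpt-4o-mini"):
        self.client, self.model = client, model
    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

# The loop does not change; only the object passed in does:
# agent = Agent(llm=OpenAILLM(OpenAI()), tools=tools)   # was: ScriptedLLM([...])
```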

Read alongside: Chapter 04 (Protocols) for MCP and how tools should be defined so models can pick them reliably.

Step 3 · Two agents (1 evening)

One agent is rarely enough for real work. The most common pattern in production is a conductor that delegates to specialists. Run the orchestrator demo:

python cli.py orch

The demo shows a conductor handing off three subtasks to three specialists. Specialists do not talk to each other; they only talk to the conductor. Read orchestrator/core.py: the whole pattern is about 80 lines. Notice how the conductor handles a specialist failure (failure isolation) without taking down the others.

Then write your own. Pick a small task you understand well (summarize an article, plan a trip, review a code patch) and split it into two or three subtasks. Wire it up. The first version will be slow because the specialists run sequentially. Chapter 07 (Parallel collaboration) shows how to fan out and merge in parallel using asyncio.gather; expect to cut latency in half.
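The parallel fan-out looks roughly like this sketch, using only the standard library. The names (run_specialist, conduct) and the sleep stand-in are illustrative, not the orchestrator module's API.

```python
# Illustrative fan-out/merge with asyncio.gather; not the kit's orchestrator code.
import asyncio

async def run_specialist(name: str, subtask: str) -> dict:
    # In a real system this would run the specialist agent's own loop.
    await asyncio.sleep(1)  # stand-in for a slow LLM call
    return {"specialist": name, "result": f"done: {subtask}"}

async def conduct(subtasks: dict[str, str]) -> list[dict]:
    # Sequential: total latency = sum of specialists. Parallel: ≈ the slowest one.
    results = await asyncio.gather(
        *(run_specialist(name, task) for name, task in subtasks.items()),
        return_exceptions=True,   # failure isolation: one crash does not sink the rest
    )
    return [r for r in results if not isinstance(r, Exception)]

if __name__ == "__main__":
    print(asyncio.run(conduct({
        "researcher": "find three sources",
        "summarizer": "summarize each source",
        "reviewer": "check the summary for errors",
    })))
```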

Read alongside: Chapter 05 (Seven patterns) to see the other six shapes (hierarchical, pipeline, swarm, blackboard, debate, time-aware) and pick the one that fits your problem. Most people pick orchestrator and stay. Then Chapter 09 (Generalists & specialists) for the structural choice every specialist needs upfront: fine-tune, system prompt, or RAG, and where each one's knowledge actually lives.

Step 4 · Share context safely between agents (1 evening)

Now you have multiple agents passing data to each other. The default in most frameworks is to dump the whole conversation into every sub-call. That works until it leaks. Step 4 fixes the leak before it happens.

python cli.py context

The demo shows the conductor and a specialist running a five-rule capability handshake before any data flows, then exchanging a ContextEnvelope through a compartment that minimizes (drops keys not in the agreed need-to-know set), redacts (scrubs default PII patterns like card numbers and emails), and validates (checks tags and freshness on the response). Three small ideas. About 200 lines of code. Three failure modes prevented: surplus context leak, PII in logs, stale data drift.

Add this to the system you built in step 3. Wrap every cross-agent message in an envelope. Put a Compartment on every boundary. The first time you do this it feels like overhead. Two weeks later, when a new specialist gets added by someone else and they forget to filter the input, the compartment catches it and the audit log says exactly which contract was violated.
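A compressed sketch of the three moves, with assumed field names; the kit's ContextEnvelope and Compartment carry more metadata and stricter checks.

```python
# Illustrative minimize / redact / validate on a boundary; not the kit's
# context_exchange implementation.
import re, time
from dataclasses import dataclass, field

PII_PATTERNS = [re.compile(r"\b\d{13,16}\b"), re.compile(r"\S+@\S+\.\S+")]

@dataclass
class ContextEnvelope:
    sender: str
    payload: dict
    tags: set[str] = field(default_factory=set)
    created_at: float = field(default_factory=time.time)

class Compartment:
    def __init__(self, need_to_know: set[str], max_age_s: float = 300.0):
        self.need_to_know, self.max_age_s = need_to_know, max_age_s

    def send(self, env: ContextEnvelope) -> ContextEnvelope:
        # minimize: drop keys outside the agreed need-to-know set
        payload = {k: v for k, v in env.payload.items() if k in self.need_to_know}
        # redact: scrub obvious PII patterns before anything crosses the boundary
        for k, v in payload.items():
            if isinstance(v, str):
                for pat in PII_PATTERNS:
                    v = pat.sub("[REDACTED]", v)
                payload[k] = v
        return ContextEnvelope(env.sender, payload, env.tags, env.created_at)

    def validate(self, env: ContextEnvelope, required_tags: set[str]) -> None:
        # validate: check tags and freshness on the response
        if not required_tags <= env.tags:
            raise ValueError(f"missing tags: {required_tags - env.tags}")
        if time.time() - env.created_at > self.max_age_s:
            raise ValueError("stale envelope")
```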

Read alongside: Chapter 06 (Context exchange) for the full reasoning, the failure-mode table, and how the three pieces map onto each of the seven patterns from chapter 05. Then Chapter 10 (When the agent itself is wrong) for the next layer up: the agent on the other end of the handshake might lie about its capabilities, drift away from the original ask, or call tools it was never authorized for. Three external checks (capability registry, pinned ask, tool gate) close the gap. The kit ships them as the verification/ module; run python cli.py verify to see all three in action.

Step 5 · Add guards (1 evening)

Your agents are now passing typed context. Next: stop them from being attacked. The kit's guardrails/ module ships five built-in guards that cover the common cases:

python cli.py guards

Read guardrails/builtin.py. Each guard is a small class with a single check() method. The pipeline is fail-closed: if any guard raises, the whole pipeline blocks rather than passes through. Add the pipeline to every entry point in your system and every cross-agent handoff.
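The shape is roughly this sketch, with two invented guards standing in for the built-ins; the names and thresholds here are assumptions, not guardrails/builtin.py.

```python
# Illustrative fail-closed guard pipeline; not the kit's built-in guards.
class GuardViolation(Exception):
    pass

class MaxLengthGuard:
    def __init__(self, limit: int = 8000):
        self.limit = limit
    def check(self, text: str) -> None:
        if len(text) > self.limit:
            raise GuardViolation(f"input exceeds {self.limit} chars")

class DenyPhraseGuard:
    def __init__(self, phrases: tuple[str, ...] = ("ignore previous instructions",)):
        self.phrases = phrases
    def check(self, text: str) -> None:
        lowered = text.lower()
        for phrase in self.phrases:
            if phrase in lowered:
                raise GuardViolation(f"blocked phrase: {phrase!r}")

def run_pipeline(guards, text: str) -> str:
    # Fail-closed: any raise blocks the message; nothing passes through on error.
    for guard in guards:
        guard.check(text)
    return text
```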

Read alongside: Chapter 18 (Guardrails) for the principles behind layered defense, plus Chapter 20 (Adversarial) for the threats these guards are answering.

Step 6 · Add trust (1 weekend)

At this point your system has multiple agents, safe context exchange, and guards on every boundary. The remaining problem is privilege. Which agent is allowed to do what? When does a rogue or compromised agent get its privileges revoked? How do you prove later that a particular agent did or did not take a particular action? trust_engine/ is the answer.

python cli.py trust

Three pieces working together. Reputation: Beta-distributed counters track each agent's success rate over time, with exponential decay so old behavior matters less. Capability tokens: Ed25519-signed bearer tokens that bind an agent to a specific tool and a specific tenant for a short window; replay attacks are blocked by a unique jti; tampering breaks the signature; expiry is enforced against wall-clock time. Audit log: every action recorded in a hash-chained log; tampering with any entry breaks the chain at exactly that position.

This is the kit's technical centerpiece. The math, the cryptography, and the tests all match the chapter. After you have run it, port the parts that fit your stack: the reputation tracker can run alongside any agent system; the token broker is a separate process; the audit log can ship to whatever durable store you already use.
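To make that porting concrete, here is an illustrative sketch of a decayed Beta reputation counter and a hash-chained log. It is not trust_engine's implementation, just the same two ideas in a few dozen lines.

```python
# Illustrative reputation + audit-log sketch; not trust_engine's code.
import hashlib, json, time

class Reputation:
    def __init__(self, decay: float = 0.99):
        self.alpha, self.beta, self.decay = 1.0, 1.0, decay  # Beta(1, 1) prior

    def record(self, success: bool) -> None:
        # decay first so old behavior counts for less than recent behavior
        self.alpha *= self.decay
        self.beta *= self.decay
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def score(self) -> float:
        return self.alpha / (self.alpha + self.beta)  # posterior mean success rate

class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, agent: str, action: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"agent": agent, "action": action, "ts": time.time(), "prev": prev_hash}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def verify(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("agent", "action", "ts", "prev")}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False  # tampering breaks the chain at exactly this entry
            prev = e["hash"]
        return True
```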

Read alongside: Chapter 12 (Trust, privileges & RAG). If you are working in 2026 with multi-hop delegation, the V2 graduations in trust_engine/chains.py, contextual.py, audit_dag.py, and the guardrails/taint.py module are the next layer; Chapter 21 (The 2026 frontier) covers them.

Step 7 · Evaluate and ship (1 weekend)

The last step is the one most teams skip. Before you put any of this in front of users, write the evaluation harness. The kit ships 191 tests covering the modules; you need an evaluation harness covering your end-to-end task. Chapter 17 (Evaluation) walks through how to build one: golden cases, regression tests, online metrics, drift detection.
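A golden-case harness can start as small as this sketch; agent.run, the cases, and the pass-rate threshold are placeholders for your own task, not the kit's test suite.

```python
# Illustrative golden-case harness; swap in your own cases and scoring.
GOLDEN_CASES = [
    {"goal": "summarize the attached release notes", "must_include": ["breaking change"]},
    {"goal": "plan a two-day Lisbon trip", "must_include": ["day 1", "day 2"]},
]

def score(answer: str, case: dict) -> bool:
    return all(term.lower() in answer.lower() for term in case["must_include"])

def run_harness(agent) -> float:
    passed = 0
    for case in GOLDEN_CASES:
        answer = agent.run(case["goal"])
        if score(answer, case):
            passed += 1
        else:
            print(f"REGRESSION: {case['goal']!r}")
    return passed / len(GOLDEN_CASES)

# Run it in CI and gate deploys on a threshold, e.g.:
# assert run_harness(my_agent) >= 0.9
```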

At the end of step 7 you have a multi-agent system with typed context exchange, guards, signed capability tokens, an audit log, an evaluation harness, traces, alerts, and a deploy plan. That is a complete system. Most teams take longer than a weekend on this step because production complexity is real; the chapter and the kit will cut the time in half compared to figuring it out from scratch.

Side branches

Once you have the seven steps in place, the rest of the manual is reachable in any order. A few common branches:

What if you only have an evening?

Steps 1, 2, and 3. By the end you will have run a real LLM through a perceive-decide-act loop, then split the work between a conductor and two specialists. You will understand most of what makes agent systems different from regular software. The other steps are about making it production-safe; the first three are about understanding the shape.

What if you only have a weekend?

Steps 1 through 5. By the end you will have a multi-agent system with typed context exchange and guards on every boundary. That is enough to ship to internal users at most companies. Steps 6 and 7 are the hardening for external users and high-stakes domains.

The kit ships a six-exercise workbook that mirrors the build path. After step 6, run python earn_certificate.py to mint a verifiable completion certificate with a SHA-256 hash of your work. It is not a credential; it is a forcing function.