Build path · what to build first, second, third

The manual has 28 chapters. Most people just want to know what to build first.

Reading the whole manual front to back is one way through. It is not the way most people learn. Most people learn by building a small thing that runs, breaking it on purpose, fixing it, and adding the next piece. This page is that ladder. Seven steps from "I have never built an agent" to "I have a multi-agent system with safety, trust, and evaluation that I would ship." Each step takes between thirty minutes and a weekend, produces something that runs, and points to the chapter and kit module that go with it.

The total commitment is around fifteen to twenty hours spread over two weekends if you go end to end. You can stop at any step and have something useful. Many production systems live happily at step 3 or step 5 for years.

The build path is opinionated. There are other valid orders. If you are debugging a specific problem in a system you already have, jump to the chapter that matches the problem. The path is for greenfield learning.


The ladder, at a glance

| # | Step | Time | Deliverable | Read | Build |
|---|------|------|-------------|------|-------|
| 1 | First agent | 30 min | A perceive-decide-act loop that runs offline | Ch 01, Ch 02 | tiny_agent/ |
| 2 | Real LLM | 1 hour | Same loop, swapped to a real model | Ch 04 | examples/real_llm_adapter.py |
| 3 | Two agents | 1 evening | Conductor plus specialists, working in parallel | Ch 05, Ch 07 | orchestrator/ |
| 4 | Share context safely | 1 evening | Typed envelopes plus a handshake between agents | Ch 06 | context_exchange/ |
| 5 | Add guards | 1 evening | Five built-in guards on the boundary | Ch 18 | guardrails/ |
| 6 | Add trust | 1 weekend | Reputation, signed tokens, audit log | Ch 12 | trust_engine/ |
| 7 | Evaluate & ship | 1 weekend | Test harness, traces, deploy plan | Ch 17, Ch 19 | your code |

Step 1 · First agent (30 minutes)

The simplest useful thing you can build. An agent is a loop that perceives the situation, decides on an action, takes the action, and repeats. The kit ships tiny_agent/ as a reference implementation in less than 150 lines. Do not start by reading the code. Start by running it.

cd cenpie-agent-kit
python cli.py tiny

What you should see: the agent receives a goal, picks a tool, gets a result, and returns an answer. Two example goals are pre-loaded so you can see the loop work without an API key. Now open tiny_agent/agent.py and read it. Notice three things: the loop has a hard max_steps cap, every tool call is wrapped in a try/except so a broken tool does not crash the agent, and every step goes through an observer hook that lets you trace what the agent is thinking.
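For orientation, here is a minimal sketch of the same shape in plain Python. It is not the kit's tiny_agent code; the Tool dataclass and the llm.decide interface are assumptions. The three properties above (hard step cap, wrapped tool calls, observer hook) are the point.

```python
# Illustrative perceive-decide-act loop, NOT the kit's tiny_agent implementation.
# Tool and llm.decide are assumed interfaces for the sketch.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]

def run_agent(goal: str, llm, tools: dict[str, Tool],
              observer=print, max_steps: int = 5) -> str:
    context = f"Goal: {goal}"
    for step in range(max_steps):                       # hard cap: the loop cannot run forever
        decision = llm.decide(context, list(tools))     # perceive + decide
        observer(f"step {step}: {decision}")            # observer hook traces every step
        if decision.get("final"):
            return decision["answer"]
        tool = tools.get(decision["tool"])
        try:                                            # a broken tool must not crash the agent
            result = tool.run(decision["input"])
        except Exception as exc:
            result = f"tool error: {exc}"
        context += f"\n{decision['tool']} -> {result}"  # act, then loop again
    return "stopped: hit max_steps without a final answer"
```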

Stop here if you only need a single-step agent. Plenty of production systems are exactly this loop with one tool. The rest of the path is for when you need more agents, more safety, or more trust.

Read alongside: Chapter 01 (Tutorial) for a from-scratch walkthrough of the same code, and Chapter 02 (Architecture) for the perceive-decide-act loop in context.

Step 2 · Swap to a real LLM (1 hour)

The kit's ScriptedLLM is a deterministic stub. Replace it with a real model. The kit ships adapters for OpenAI, Anthropic, and Ollama in examples/real_llm_adapter.py; pick whichever you have a key for. Set the API key in your environment, change one import in your test script, and run again. The agent should now do real reasoning instead of following the script.

What to notice here: the agent loop is unchanged. The only thing that changed is the LLM object passed in. This is the value of keeping the loop and the model decoupled. When a new model comes out tomorrow, you swap the adapter, not the system.
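A sketch of that decoupling is below. The single-method LLM interface is an assumption (the real adapters in examples/real_llm_adapter.py may define more), and the OpenAI call uses the standard openai-python client.

```python
# Sketch of keeping the loop and the model decoupled; assumed interface,
# not the kit's actual adapter API.
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class ScriptedLLM:
    """Deterministic stub: replays canned responses, needs no API key."""
    def __init__(self, script: list[str]):
        self._script = iter(script)
    def complete(self, prompt: str) -> str:
        return next(self._script)

class OpenAILLM:
    """Real model behind the same interface; only this class knows the vendor API."""
    def __init__(self, client, model: str = "gpt-4o-mini"):
        self.client, self.model = client, model
    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

# The loop does not change; only the object passed in does:
# agent = Agent(llm=OpenAILLM(OpenAI()), tools=tools)   # was: ScriptedLLM([...])
```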

Read alongside: Chapter 04 (Protocols) for MCP and how tools should be defined so models can pick them reliably.

Step 3 · Two agents (1 evening)

One agent is rarely enough for real work. The most common pattern in production is a conductor that delegates to specialists. Run the orchestrator demo:

python cli.py orch

The demo shows a conductor handing off three subtasks to three specialists. Specialists do not talk to each other; they only talk to the conductor. Read orchestrator/core.py: the whole pattern is about 80 lines. Notice how the conductor handles a specialist failure (failure isolation) without taking down the others.

Then write your own. Pick a small task you understand well (summarize an article, plan a trip, review a code patch) and split it into two or three subtasks. Wire it up. The first version will be slow because the specialists run sequentially. Chapter 07 (Parallel collaboration) shows how to fan out and merge in parallel using asyncio.gather; expect to cut latency in half.
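The parallel fan-out looks roughly like this sketch, using only the standard library. The names (run_specialist, conduct) and the sleep stand-in are illustrative, not the orchestrator module's API.

```python
# Illustrative fan-out/merge with asyncio.gather; not the kit's orchestrator code.
import asyncio

async def run_specialist(name: str, subtask: str) -> dict:
    # In a real system this would run the specialist agent's own loop.
    await asyncio.sleep(1)  # stand-in for a slow LLM call
    return {"specialist": name, "result": f"done: {subtask}"}

async def conduct(subtasks: dict[str, str]) -> list[dict]:
    # Sequential: total latency = sum of specialists. Parallel: ≈ the slowest one.
    results = await asyncio.gather(
        *(run_specialist(name, task) for name, task in subtasks.items()),
        return_exceptions=True,   # failure isolation: one crash does not sink the rest
    )
    return [r for r in results if not isinstance(r, Exception)]

if __name__ == "__main__":
    print(asyncio.run(conduct({
        "researcher": "find three sources",
        "summarizer": "summarize each source",
        "reviewer": "check the summary for errors",
    })))
```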

Read alongside: Chapter 05 (Seven patterns) to see the other six shapes (hierarchical, pipeline, swarm, blackboard, debate, time-aware) and pick the one that fits your problem. Most people pick orchestrator and stay. Then Chapter 09 (Generalists & specialists) for the structural choice every specialist needs upfront: fine-tune, system prompt, or RAG, and where each one's knowledge actually lives.

Step 4 · Share context safely between agents (1 evening)

Now you have multiple agents passing data to each other. The default in most frameworks is to dump the whole conversation into every sub-call. That works until it leaks. Step 4 fixes the leak before it happens.

python cli.py context

The demo shows the conductor and a specialist running a five-rule capability handshake before any data flows, then exchanging a ContextEnvelope through a compartment that minimizes (drops keys not in the agreed need-to-know set), redacts (scrubs default PII patterns like card numbers and emails), and validates (checks tags and freshness on the response). Three small ideas. About 200 lines of code. Three failure modes prevented: surplus context leak, PII in logs, stale data drift.

Add this to the system you built in step 3. Wrap every cross-agent message in an envelope. Put a Compartment on every boundary. The first time you do this it feels like overhead. Two weeks later, when a new specialist gets added by someone else and they forget to filter the input, the compartment catches it and the audit log says exactly which contract was violated.
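A compressed sketch of the three moves, with assumed field names; the kit's ContextEnvelope and Compartment carry more metadata and stricter checks.

```python
# Illustrative minimize / redact / validate on a boundary; not the kit's
# context_exchange implementation.
import re, time
from dataclasses import dataclass, field

PII_PATTERNS = [re.compile(r"\b\d{13,16}\b"), re.compile(r"\S+@\S+\.\S+")]

@dataclass
class ContextEnvelope:
    sender: str
    payload: dict
    tags: set[str] = field(default_factory=set)
    created_at: float = field(default_factory=time.time)

class Compartment:
    def __init__(self, need_to_know: set[str], max_age_s: float = 300.0):
        self.need_to_know, self.max_age_s = need_to_know, max_age_s

    def send(self, env: ContextEnvelope) -> ContextEnvelope:
        # minimize: drop keys outside the agreed need-to-know set
        payload = {k: v for k, v in env.payload.items() if k in self.need_to_know}
        # redact: scrub obvious PII patterns before anything crosses the boundary
        for k, v in payload.items():
            if isinstance(v, str):
                for pat in PII_PATTERNS:
                    v = pat.sub("[REDACTED]", v)
                payload[k] = v
        return ContextEnvelope(env.sender, payload, env.tags, env.created_at)

    def validate(self, env: ContextEnvelope, required_tags: set[str]) -> None:
        # validate: check tags and freshness on the response
        if not required_tags <= env.tags:
            raise ValueError(f"missing tags: {required_tags - env.tags}")
        if time.time() - env.created_at > self.max_age_s:
            raise ValueError("stale envelope")
```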

Read alongside: Chapter 06 (Context exchange) for the full reasoning, the failure-mode table, and how the three pieces map onto each of the seven patterns from chapter 05. Then Chapter 10 (When the agent itself is wrong) for the next layer up: the agent on the other end of the handshake might lie about its capabilities, drift away from the original ask, or call tools it was never authorized for. Three external checks (capability registry, pinned ask, tool gate) close the gap. The kit ships them as the verification/ module; run python cli.py verify to see all three in action.

Step 5 · Add guards (1 evening)

Your agents are now passing typed context. Next: stop them from being attacked. The kit's guardrails/ module ships five built-in guards that cover the common cases:

python cli.py guards

Read guardrails/builtin.py. Each guard is a small class with a single check() method. The pipeline is fail-closed: if any guard raises, the whole pipeline blocks rather than passes through. Add the pipeline to every entry point in your system and every cross-agent handoff.
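The shape is roughly this sketch, with two invented guards standing in for the built-ins; the names and thresholds here are assumptions, not guardrails/builtin.py.

```python
# Illustrative fail-closed guard pipeline; not the kit's built-in guards.
class GuardViolation(Exception):
    pass

class MaxLengthGuard:
    def __init__(self, limit: int = 8000):
        self.limit = limit
    def check(self, text: str) -> None:
        if len(text) > self.limit:
            raise GuardViolation(f"input exceeds {self.limit} chars")

class DenyPhraseGuard:
    def __init__(self, phrases: tuple[str, ...] = ("ignore previous instructions",)):
        self.phrases = phrases
    def check(self, text: str) -> None:
        lowered = text.lower()
        for phrase in self.phrases:
            if phrase in lowered:
                raise GuardViolation(f"blocked phrase: {phrase!r}")

def run_pipeline(guards, text: str) -> str:
    # Fail-closed: any raise blocks the message; nothing passes through on error.
    for guard in guards:
        guard.check(text)
    return text
```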

Read alongside: Chapter 18 (Guardrails) for the principles behind layered defense, plus Chapter 20 (Adversarial) for the threats these guards are answering.

Step 6 · Add trust (1 weekend)

At this point your system has multiple agents, safe context exchange, and guards on every boundary. The remaining problem is privilege. Which agent is allowed to do what? When does a rogue or compromised agent get its privileges revoked? How do you prove later that a particular agent did or did not take a particular action? trust_engine/ is the answer.

python cli.py trust

Three pieces working together. Reputation: Beta-distributed counters track each agent's success rate over time, with exponential decay so old behavior matters less. Capability tokens: Ed25519-signed bearer tokens that bind an agent to a specific tool and a specific tenant for a short window; replay attacks are blocked by a unique jti; tampering breaks the signature; expiry is enforced against wall-clock time. Audit log: every action recorded in a hash-chained log; tampering with any entry breaks the chain at exactly that position.

This is the kit's technical centerpiece. The math, the cryptography, and the tests all match the chapter. After you have run it, port the parts that fit your stack: the reputation tracker can run alongside any agent system; the token broker is a separate process; the audit log can ship to whatever durable store you already use.
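To make that porting concrete, here is an illustrative sketch of a decayed Beta reputation counter and a hash-chained log. It is not trust_engine's implementation, just the same two ideas in a few dozen lines.

```python
# Illustrative reputation + audit-log sketch; not trust_engine's code.
import hashlib, json, time

class Reputation:
    def __init__(self, decay: float = 0.99):
        self.alpha, self.beta, self.decay = 1.0, 1.0, decay  # Beta(1, 1) prior

    def record(self, success: bool) -> None:
        # decay first so old behavior counts for less than recent behavior
        self.alpha *= self.decay
        self.beta *= self.decay
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def score(self) -> float:
        return self.alpha / (self.alpha + self.beta)  # posterior mean success rate

class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, agent: str, action: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"agent": agent, "action": action, "ts": time.time(), "prev": prev_hash}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def verify(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("agent", "action", "ts", "prev")}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False  # tampering breaks the chain at exactly this entry
            prev = e["hash"]
        return True
```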

Read alongside: Chapter 12 (Trust, privileges & RAG). If you are working in 2026 with multi-hop delegation, the V2 graduations in trust_engine/chains.py, contextual.py, audit_dag.py, and the guardrails/taint.py module are the next layer; Chapter 21 (The 2026 frontier) covers them.

Step 7 · Evaluate and ship (1 weekend)

The last step is the one most teams skip. Before you put any of this in front of users, write the evaluation harness. The kit ships 191 tests covering the modules; you need an evaluation harness covering your end-to-end task. Chapter 17 (Evaluation) walks through how to build one: golden cases, regression tests, online metrics, drift detection.
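A golden-case harness can start as small as this sketch; agent.run, the cases, and the pass-rate threshold are placeholders for your own task, not the kit's test suite.

```python
# Illustrative golden-case harness; swap in your own cases and scoring.
GOLDEN_CASES = [
    {"goal": "summarize the attached release notes", "must_include": ["breaking change"]},
    {"goal": "plan a two-day Lisbon trip", "must_include": ["day 1", "day 2"]},
]

def score(answer: str, case: dict) -> bool:
    return all(term.lower() in answer.lower() for term in case["must_include"])

def run_harness(agent) -> float:
    passed = 0
    for case in GOLDEN_CASES:
        answer = agent.run(case["goal"])
        if score(answer, case):
            passed += 1
        else:
            print(f"REGRESSION: {case['goal']!r}")
    return passed / len(GOLDEN_CASES)

# Run it in CI and gate deploys on a threshold, e.g.:
# assert run_harness(my_agent) >= 0.9
```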

At the end of step 7 you have a multi-agent system with typed context exchange, guards, signed capability tokens, an audit log, an evaluation harness, traces, alerts, and a deploy plan. That is a complete system. Most teams take longer than a weekend on this step because production complexity is real; the chapter and the kit will cut the time in half compared to figuring it out from scratch.

Side branches

Once you have the seven steps in place, the rest of the manual is reachable in any order. A few common branches:

What if you only have an evening?

Steps 1, 2, and 3. By the end you will have run a real LLM through a perceive-decide-act loop, then split the work between a conductor and two specialists. You will understand most of what makes agent systems different from regular software. The other steps are about making it production-safe; the first three are about understanding the shape.

What if you only have a weekend?

Steps 1 through 5. By the end you will have a multi-agent system with typed context exchange and guards on every boundary. That is enough to ship to internal users at most companies. Steps 6 and 7 are the hardening for external users and high-stakes domains.

The kit ships a six-exercise workbook that mirrors the build path. After step 6, run python earn_certificate.py to mint a verifiable completion certificate with a SHA-256 hash of your work. It is not a credential; it is a forcing function.