01 Tutorial · prerequisites & first principles

Start here if Agentic AI is new to you.

Beginner-friendly  

This chapter assumes nothing. By the end, you will know what an agent is, what makes a system "agentic", what tools and memory are doing, and you will have walked through the lifecycle of a single agent answering one question. The rest of the manual will then make sense.

Before you start  //  what you should know
  • Basic Python (required): you can read a function, a class, and a list comprehension. We will not teach Python here.
  • What an LLM is (required): a model that takes text in and produces text out. ChatGPT, Claude, and Gemini are examples. You have used at least one.
  • Prompting basics (helpful): you have written a prompt, gotten a response, and tweaked the prompt. You know "system prompt" vs "user prompt".
  • API calls (helpful): comfortable with HTTP requests, JSON, and reading API documentation.
  • Async programming (optional): familiarity with async/await in Python or JavaScript. The parallel chapters lean on this.
  • Distributed systems vocabulary (optional): words like consensus, quorum, fault tolerance. Useful for Chapter 20, not required.

What is an "agent", really?

An LLM by itself is a function: text in, text out. It has no memory between calls, no ability to take action, no way to look things up. It just predicts the next token.

An agent wraps an LLM in a loop and gives it three things it lacks: the ability to use tools, the ability to remember across turns, and the ability to decide what to do next. That loop, perceive, decide, act, observe, repeat, is what makes a system "agentic".

  • LLM call: single input, single output. Stateless. Cannot reach the world. Example: "Summarize this article" → text
  • Tool: a function the agent can call. Search, calculator, database, API. Example: search_web(query) → results
  • Memory: storage the agent reads from and writes to across turns. Examples: scratchpad, vector store, KV store.
  • Loop: the repeated cycle of think, act, observe, until done. In code: while not done: step()
  • Goal: the objective the agent is pursuing across the whole loop. Example: "Book me a flight under $400"
  • Stop condition: when the agent declares itself done, either goal met or budget exhausted. In code: return Done(answer=...)
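The vocabulary above maps directly onto a few lines of Python. A minimal sketch, assuming nothing about any framework: `search_web` is a stand-in tool, and `Done` is one illustrative way to represent a stop condition.

```python
from dataclasses import dataclass

def search_web(query: str) -> str:
    """A stand-in tool: a real agent would call a search API here."""
    return f"results for: {query}"

# Tools are just named functions the agent is allowed to call.
tools = {"search_web": search_web}

@dataclass
class Done:
    """The stop condition: the agent declares itself finished."""
    answer: str

# Memory can start as a plain list the loop appends to (a scratchpad).
memory: list[str] = []
memory.append(search_web("agentic AI"))
```

Nothing here is exotic: the "agentic" part comes entirely from how the loop wires these pieces together.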

The three things every agent does, in order

1 Perceive: read the situation

The agent receives the goal plus whatever context is relevant: prior messages, tool results from earlier in the loop, retrieved documents. This becomes the prompt sent to the LLM.

The hard part is not what to include. It is what to exclude. Stuffing too much into context confuses the model and burns tokens. A good agent passes only what is needed for the current step.
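One simple way to pass only what is needed is to cap how much history goes to the model: always keep the system message carrying the goal, and drop the oldest tool results first. A minimal sketch (the message format and `trim_context` name are illustrative, not any particular library's API):

```python
def trim_context(msgs, max_messages=8):
    """Keep the system/goal message plus the most recent turns.

    Older tool results go first: the model rarely needs step-3
    search output when it is deciding what to do at step 9.
    """
    if len(msgs) <= max_messages:
        return msgs
    # Always keep the first message (the system prompt with the goal),
    # then the most recent (max_messages - 1) entries.
    return [msgs[0]] + msgs[-(max_messages - 1):]

msgs = [{"role": "system", "content": "Goal: ..."}] + [
    {"role": "tool", "content": f"result {i}"} for i in range(20)
]
trimmed = trim_context(msgs)
assert trimmed[0]["role"] == "system" and len(trimmed) == 8
```

Production systems replace this blunt cutoff with summarization or retrieval, but the principle is the same: the prompt is a curated view, not a transcript dump.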

2 Decide: pick the next action

The LLM responds with a structured decision. Modern systems use function calling or tool use, where the model produces something like:

{
  "action": "call_tool",
  "tool": "search_web",
  "args": { "query": "agentic AI orchestration patterns" }
}

Or it can declare it is finished:

{
  "action": "final_answer",
  "text": "Based on the search results, the four main patterns are..."
}
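In practice the model's raw output has to be parsed and validated before the loop acts on it: malformed JSON or an unrecognized action should become an observation the agent can recover from, not a crash. A minimal sketch (the `parse_decision` helper and its error shape are illustrative):

```python
import json

VALID_ACTIONS = {"call_tool", "final_answer"}

def parse_decision(raw: str) -> dict:
    """Parse the model's decision; on failure, return an error
    observation the agent will see on its next step."""
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        return {"action": "error", "text": "Model output was not valid JSON"}
    if decision.get("action") not in VALID_ACTIONS:
        return {"action": "error",
                "text": f"Unknown action: {decision.get('action')!r}"}
    return decision

# A well-formed tool call passes through unchanged:
d = parse_decision('{"action": "call_tool", "tool": "search_web", '
                   '"args": {"query": "x"}}')
assert d["tool"] == "search_web"
# Garbage becomes an observation, not an exception:
assert parse_decision("not json")["action"] == "error"
```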

3 Act: execute the decision, observe the result

The agent runs the chosen tool, captures the output, appends it to memory, then loops back to step 1. The next perception now includes what just happened. This is how the agent "learns" within a single task.

If the model returned final_answer, the loop exits and the answer is returned to the user.

A complete tiny agent in Python

Here is the smallest honest agent. Roughly 30 lines. Read it top to bottom. The patterns you will see in production systems are elaborations of this.

def tiny_agent(goal, llm, tools, max_steps=10):
    """A minimal agent. Loops until the LLM says 'done' or budget runs out."""
    history = [{"role": "system", "content": f"You are an agent. Goal: {goal}"}]

    for step in range(max_steps):
        # 1. PERCEIVE: the history is everything the agent can see.
        # 2. DECIDE: the model turns that view into a structured decision.
        decision = llm(history, available_tools=list(tools.keys()))

        # Did the model finish, or does it want a tool?
        if decision["action"] == "final_answer":
            return decision["text"]

        if decision["action"] == "call_tool":
            tool_name = decision["tool"]
            args = decision["args"]

            # 3. ACT: run the tool, observe the result
            if tool_name not in tools:
                result = f"Error: tool '{tool_name}' is not available"
            else:
                try:
                    result = tools[tool_name](**args)
                except Exception as e:
                    result = f"Tool error: {e}"

            # Append the tool call AND its result to history (this is memory)
            history.append({"role": "assistant", "content": f"calling {tool_name}({args})"})
            history.append({"role": "tool", "content": str(result)})

    return "Agent stopped: max steps reached without final answer"

Three things to internalize from this code. First, history is everything: it is the only memory the agent has within one task. Second, the loop has a hard upper bound (max_steps): without this, a confused agent loops forever. Third, tool errors are caught and fed back: the agent gets the chance to recover instead of crashing.
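To watch the loop fire end to end, you can drive the agent with a scripted stand-in for the model and a fake search tool. Everything below is illustrative: `ScriptedLLM` and `fake_search` are test stubs, and the loop is repeated verbatim from above so the snippet runs standalone; a real deployment swaps in an actual LLM client.

```python
def tiny_agent(goal, llm, tools, max_steps=10):
    # Same loop as above, repeated so this snippet is self-contained.
    history = [{"role": "system", "content": f"You are an agent. Goal: {goal}"}]
    for step in range(max_steps):
        decision = llm(history, available_tools=list(tools.keys()))
        if decision["action"] == "final_answer":
            return decision["text"]
        if decision["action"] == "call_tool":
            tool_name, args = decision["tool"], decision["args"]
            if tool_name not in tools:
                result = f"Error: tool '{tool_name}' is not available"
            else:
                try:
                    result = tools[tool_name](**args)
                except Exception as e:
                    result = f"Tool error: {e}"
            history.append({"role": "assistant",
                            "content": f"calling {tool_name}({args})"})
            history.append({"role": "tool", "content": str(result)})
    return "Agent stopped: max steps reached without final answer"

def fake_search(query):
    """Stand-in tool: a real agent would call a search API here."""
    return "Tokyo's population is about 14 million."

class ScriptedLLM:
    """Stand-in model: searches on its first turn, answers on its second."""
    def __init__(self):
        self.calls = 0
    def __call__(self, history, available_tools=None):
        self.calls += 1
        if self.calls == 1:
            return {"action": "call_tool", "tool": "search_web",
                    "args": {"query": "population of Tokyo"}}
        # The tool result is now in history; declare the task done.
        return {"action": "final_answer",
                "text": "Tokyo has about 14 million residents."}

answer = tiny_agent("What is the population of Tokyo?",
                    ScriptedLLM(), {"search_web": fake_search})
print(answer)  # the scripted model's final answer
```

Two iterations: one tool call, one final answer. Swap `ScriptedLLM` for a real model and the control flow is identical.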

Exercise 1 · think before you scroll
Imagine running this agent with the goal "What is the population of Tokyo?" and a single tool search_web. Mentally walk through the loop. What does step 1 look like? Step 2? When does the loop exit? What would happen if search_web threw an error on every call?

From one agent to many: why orchestration?

A single agent is enough for narrow, well-scoped tasks. But three things break when you push it harder: the context window bloats as tool results pile up, one generalist prompt cannot be expert in every role the task demands, and nothing checks the agent's work except the agent itself.

Multi-agent systems address all three: shorter contexts per agent, specialization by role, and cross-agent verification. That is what the rest of this manual covers.

Exercise 2 · identify the right shape
For each of these tasks, decide: single agent or multi-agent, and why?
  1. Translating a 200-word email from English to Japanese.
  2. Researching a market and producing a 10-page report with citations.
  3. Reviewing a pull request that changes the payment system.
  4. Answering "what time is it in Paris right now?".

Answers: 1. single (narrow, no skill spread). 2. multi (research, writing, fact-checking are different skills, plus context bloat). 3. multi (proposer plus critic plus security is the textbook debate pattern). 4. single (one tool call, done).

The conceptual map: what we will cover

Each chapter answers one question. Use this as a roadmap.

CH 02 · Architecture
What are the layers of a real agentic system, and what does each one do?
CH 03 · Protocols
How do agents talk to tools and to each other? MCP (MCP 2025), A2A (Yang 2025), ACP, ANP.
CH 04 · Patterns
When should agents talk in a line, in a tree, freely, or through a shared board?
CH 05 · Parallelism
How do you make agents work concurrently on the same artifact without stepping on each other?
CH 06 · Memory
How do agents remember things across sessions? Long context, RAG, memory agents, graph memory.
CH 07 · Heuristics & rewards
How do you guide agent behavior beyond just prompts? Rules, reward signals, learned preferences.
CH 08 · Trust & privileges
What's an agent allowed to do? Pre-config vs runtime privileges, RAG access, reputation tracking.
CH 09 · Control plane
Real-time enforcement over data: policy engines, classification-aware retrieval, lineage, erasure, residency. Compliance that runs in microseconds, not policy documents.
CH 10 · Predictability
Modeling what an agent will do next. MDPs, hidden Markov models for behavior auditing, conformal prediction for runtime confidence intervals, world models. The instrumentation that turns surprises into things you knew were coming.
CH 11 · Risk
What can go wrong, and how do you score severity to decide where to spend safety budget?
CH 12 · Alerts
When something does go wrong, who finds out, how fast, and what do they do about it?
CH 13 · Evaluation
How do you actually measure whether your agent works? The benchmarks, what they miss, and what to do instead.
CH 14 · Guardrails
How do you prevent the worst failures by design, in layers that survive contact with reality?
CH 15 · Infra & deployment
How do you take all this from your laptop to production safely?
CH 16 · Adversarial
What if an agent gets compromised, lies, or has goals that conflict with yours?
CH 17 · The 2026 frontier
The security model from chapters 8-15 was built for single-call agents. Two attacks in late 2025 and the IETF drafts that followed reshape what trust looks like for agents that delegate.
CH 18 · Production case studies
Real shipped systems with public sources: Copilot Workspace, Anthropic computer use, Cursor, Devin postmortem, Shopify Sidekick, Berkeley scanning-agent results.
CH 19 · End-to-end walkthrough
One coherent customer support agent system using every layer of the manual, in twelve steps.
CH 20 · The road ahead
Where the field is going next: self-improvement, world models, agent economies, embodied agents.
CH 21 · Beyond software
The unbuilt architecture: agents that watch instead of answer, embedded in clinics, neuroimaging streams, autonomous labs. Grounded in 2025-26 medical and scientific research.

The fastest way to learn agentic AI is to build a tiny agent end-to-end before reading any framework documentation. Then read the manual. The frameworks will feel like solutions to problems you have already met.

Recommended next steps

Foundational papers worth reading after this tutorial: ReAct (Yao et al., ICLR 2023) for the reason-act loop; Toolformer (Schick et al., NeurIPS 2023) for tool use; Reflexion (Shinn et al., NeurIPS 2023) for self-reflection; AutoGen (Wu et al., 2023) and MetaGPT (Hong et al., ICLR 2024) for the leading multi-agent frameworks. For a 2026 landscape view, see the survey Architectures & Taxonomies, arXiv 2026.