# Agents guide

This guide shows how to build production AI agents with [Kitaru](https://docs.zenml.io/kitaru), ZenML's sibling project for running, replaying, and improving agents. By the end you'll be able to do three things:

1. **Run** an agent durably, so a crash never re-pays for work that already finished.
2. **Replay** a real run with one thing changed, such as a different model or a different prompt, and diff the result against a faithful baseline.
3. **Improve** the agent by rolling the winning change across a cohort of recent runs and keeping the version that wins on cost, latency, and quality.

Replay is what sets this apart from other tooling. An eval re-scores outputs after the fact. Kitaru re-executes the actual run from a durable checkpoint with one input swapped, so you find out what *would have happened* if you'd shipped the change.

{% hint style="info" %}
A Kitaru **flow** is a dynamic ZenML pipeline, and a **checkpoint** plays the role of a step. Agents and pipelines run on the same stacks, the same server, and the same dashboard.
{% endhint %}

## The learning path

The guide has three parts. Parts 1 and 2 are the spine: together they're enough to run and improve a single agent. Part 3 is for teams that go on to operate many agents on shared rails.

### Part 1 — Run

Wrap a PydanticAI agent in a Kitaru flow so that every model call and tool call becomes a durable checkpoint. If the run crashes, a retry resumes from the point of failure instead of paying for the whole run twice. The rest of the guide builds on this.

* [Run a durable agent](/user-guides/agents-guide/01-durable-agent.md)

### Part 2 — Replay and improve

This is the differentiator. Take a recorded run, reproduce it faithfully as a control, then replay it again with exactly one thing changed and diff the two.
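
To make the durable-checkpoint and replay ideas concrete, here is a toy model in plain Python. It is a conceptual sketch under stated assumptions, not Kitaru's API: `Store`, `run_agent`, `replay`, and the deterministic `fake_model` are hypothetical stand-ins. Completed checkpoints are cached per run, so a retry skips finished work, and a control replay with nothing changed must match the baseline before a one-change variant is worth diffing.

```python
# Toy model of durable checkpoints and replay. Hypothetical helpers only;
# this is not Kitaru's API.
from dataclasses import dataclass, field


@dataclass
class Store:
    """Persists checkpoint results keyed by (run_id, checkpoint_name)."""
    results: dict = field(default_factory=dict)
    executions: int = 0  # how many checkpoint bodies actually ran

    def checkpoint(self, run_id, name, fn, *args):
        key = (run_id, name)
        if key not in self.results:  # only pay for work not already recorded
            self.executions += 1
            self.results[key] = fn(*args)
        return self.results[key]


def fake_model(model, prompt):
    """Deterministic stand-in for an LLM call, so replays can reproduce."""
    return f"{model}:{prompt.upper()}"


def run_agent(store, run_id, model, prompt, crash_after_plan=False):
    plan = store.checkpoint(run_id, "plan", fake_model, model, prompt)
    if crash_after_plan:
        raise RuntimeError("worker died after the plan checkpoint")
    return store.checkpoint(run_id, "answer", fake_model, model, plan)


def replay(store, source_run, new_run, recorded_inputs, keep=(), **overrides):
    # Seed the new run with the source run's results for the checkpoints in
    # `keep`, so the replay resumes from them instead of recomputing them.
    for name in keep:
        store.results[(new_run, name)] = store.results[(source_run, name)]
    inputs = {**recorded_inputs, **overrides}  # swap only what was overridden
    return run_agent(store, new_run, **inputs)


store = Store()
inputs = {"model": "model-a", "prompt": "triage ticket 42"}

# Run: the first attempt crashes after "plan"; the retry reuses that checkpoint.
try:
    run_agent(store, "run-1", **inputs, crash_after_plan=True)
except RuntimeError:
    pass
baseline = run_agent(store, "run-1", **inputs)
print(store.executions)  # → 2 ("plan" once, "answer" once)

# Replay: a control with nothing changed must reproduce the baseline...
control = replay(store, "run-1", "control", inputs, keep=("plan",))
assert control == baseline, "replay is not faithful; don't trust the diff"

# ...and only then is a one-change variant worth diffing against it.
variant = replay(store, "run-1", "variant", inputs, keep=("plan",), model="model-b")
print(f"{baseline!r} -> {variant!r}")
```

The `keep` argument stands in for replaying from a checkpoint: the recorded `plan` is reused, so only the steps after it are re-executed with the swapped model, and any difference in `answer` is attributable to that one change.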
Because the control reproduced the baseline, any difference in the variant comes from your change, not from replay noise. Then roll the winning change across a cohort of recent runs and measure cost, latency, and quality. Replay and diff are exposed over a CLI and an MCP server, so a coding agent can drive the loop and hill-climb on its own.

* [Replay and improve](/user-guides/agents-guide/replay-and-improve.md)

### Part 3 — Operate at scale

When several teams start building agents, the same platform questions come up every time:

* Where do logs live?
* How do shell commands run without touching the host?
* How do tools call internal services without handing the model raw credentials?
* How does a run pause for a human and resume from the same point?
* How does each team get its own tools and rules without copying glue code?

Part 3 builds a small **internal agent harness platform** that answers those questions once. A team describes an agent with a `Profile`: its name, model, system prompt, allowed tools, allowed services, skill files, sandbox rules, and approval points. Shared platform code turns that profile into a runnable, durable agent. The result is reusable rails plus per-agent configuration, so Team A can build a support-triage agent and Team B a release-notes agent without each re-solving durability, logs, secrets, approvals, and safe command execution.

Each of the following stages adds one capability while keeping the earlier ones working:


| Stage | What it adds |
| --- | --- |
| [Sandboxed command execution](/pages/Pg2Qt5UDJIBmVQ1olxjS) | Put shell commands in a Docker sandbox with its own filesystem and network namespace, rather than running agent-generated commands on the host. |
| [Operator-editable procedures](/pages/t2JZ39WPu66cDYTUd7nC) | Move repeatable agent instructions into skill markdown files, so teams can change procedures without burying every rule in the system prompt. |
| [Credential isolation](/pages/WnTuBKPRQQs6L4xSgbKU) | Keep secrets out of the worker. A separate proxy process holds credentials and adds auth headers for approved internal calls. |
| [Typed service boundaries](/pages/FTShFcd4zW2cljRHs4Ki) | Route structured service requests through a typed dispatcher, so the platform can decide exactly which internal actions an agent may call. |
| [Durable human approval](/pages/bnKRh2VYepJPgyfJYFfD) | Pause a run with `kitaru.wait()`, ask a human for a decision, and resume the same flow after the answer arrives. |
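
The `Profile` idea and the typed-service-boundary stage fit together roughly as follows. This is a hypothetical sketch: `Profile`, `ServiceRequest`, `Dispatcher`, and their field names are illustrative and are not the classes in the example's `agent_harness_platform/` library, and two plain functions stand in for the mocked wiki and webhook services. It shows the division of labor: each team supplies a profile, while one shared dispatcher enforces the allowed services and turns approval points into a pause marker instead of executing the call.

```python
# Hypothetical sketch of a per-agent profile plus a typed service dispatcher.
# Names and fields are illustrative, not the example repo's actual classes.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    name: str
    model: str
    system_prompt: str
    allowed_tools: frozenset = frozenset()
    allowed_services: frozenset = frozenset()
    skill_files: tuple = ()
    sandbox_network: bool = False  # example sandbox rule: no network by default
    approval_points: frozenset = frozenset()  # (service, action) pairs needing a human


@dataclass(frozen=True)
class ServiceRequest:
    """A structured request the agent asks the platform to perform."""
    service: str
    action: str
    payload: dict = field(default_factory=dict)


class Dispatcher:
    """Shared platform code: the profile, not the model, decides what runs."""

    def __init__(self, handlers):
        self.handlers = handlers  # maps (service, action) -> callable(payload)

    def dispatch(self, profile, request):
        key = (request.service, request.action)
        if request.service not in profile.allowed_services:
            raise PermissionError(f"{profile.name} may not call {request.service}")
        if key not in self.handlers:
            raise LookupError(f"unknown action: {key}")
        if key in profile.approval_points:
            # Return a marker instead of running; a human must decide first.
            return {"status": "needs_approval", "action": key}
        return self.handlers[key](request.payload)


def read_wiki(payload):  # stand-in for the mocked wiki service
    return {"page": payload["slug"], "body": "mock wiki text"}


def post_webhook(payload):  # stand-in for the mocked webhook service
    return {"status": "sent"}


dispatcher = Dispatcher({("wiki", "read"): read_wiki, ("webhook", "post"): post_webhook})

triage = Profile(
    name="support-triage",
    model="model-a",
    system_prompt="Triage incoming support tickets.",
    allowed_services=frozenset({"wiki", "webhook"}),
    approval_points=frozenset({("webhook", "post")}),
)
release_notes = Profile(
    name="release-notes",
    model="model-a",
    system_prompt="Draft release notes.",
    allowed_services=frozenset({"wiki"}),
)

page = dispatcher.dispatch(triage, ServiceRequest("wiki", "read", {"slug": "refunds"}))
pending = dispatcher.dispatch(triage, ServiceRequest("webhook", "post", {"text": "hi"}))
denied = None
try:
    dispatcher.dispatch(release_notes, ServiceRequest("webhook", "post"))
except PermissionError as err:
    denied = str(err)
```

In the durable-human-approval stage, the `needs_approval` branch is the kind of point where a run would pause with `kitaru.wait()` and resume once a human answers; the sketch only returns a marker.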

The platform stages are a **runnable local reference architecture**, not a turnkey enterprise platform. They don't include an identity provider, policy engine, observability stack, or production secret store, and the sandbox provides local isolation, not a security boundary against hostile code. To see which pieces are teaching stand-ins and what to harden first, read [Production notes and upgrade paths](/user-guides/agents-guide/production-notes.md).

## Get the code

The local tour needs Docker and one model-provider API key. The wiki and webhook services are mocked locally.

```bash
# Clone the Kitaru repo and move into the end-to-end example
git clone https://github.com/zenml-io/kitaru.git
cd kitaru/examples/end_to_end/agent_harness_platform

# Install dependencies and initialize Kitaru
uv sync
uv run kitaru init

# Provide a model-provider key, then run the first stage
export OPENAI_API_KEY=sk-...
uv run python stage_1_basic_agent.py
```

The full source lives in [`examples/end_to_end/agent_harness_platform/`](https://github.com/zenml-io/kitaru/tree/develop/examples/end_to_end/agent_harness_platform) on GitHub. It includes the runnable stage files, the reusable `agent_harness_platform/` library, mocks, skills, and Dockerfiles.

If you only want to make one function durable, start with the [Kitaru quickstart](https://docs.zenml.io/kitaru/getting-started/quickstart). Come back here when you want the full run → replay → improve loop, and then the platform shape around it.