# Welcome to Kitaru

Kitaru is the runtime for production AI agents: **run, replay, improve**. It records every model call and tool call as a durable checkpoint. You can then re-execute a real run faithfully with one thing changed (a different model, a different prompt) and diff the result against the original. Because the baseline reproduces, the difference you see is your change, not replay noise.

The harness you already picked (PydanticAI, OpenAI Agents SDK, LangGraph, Claude Agent SDK, raw Python) keeps owning how the agent thinks. Kitaru owns the run record and the replay loop. A Kitaru flow is a dynamic ZenML pipeline, so agents run on the same [stacks](/kitaru/agent-runtime-stacks/stacks.md), server, and dashboard as your ZenML pipelines.

## Run, replay, improve

* **Run (durable).** Every `@checkpoint` is a durable unit of work: its output is persisted automatically, and every model and tool call is recorded. If a flow fails partway, replaying it reuses recorded results instead of re-running expensive work.
* **Replay (the differentiator).** Re-execute a recorded run from any checkpoint. A plain rerun with no change reproduces the original, and that is your baseline. Replay again with one input overridden and diff the two. This re-executes the real run from a checkpoint. It does not re-score saved outputs the way an eval does.
* **Improve.** Apply the same change across a cohort of recent runs, measure cost, latency, and quality, and keep the winner.

Kitaru is self-host-first: a single-service server on your own Kubernetes, with artifacts in your own S3, GCS, or Azure Blob storage. No SaaS control plane is required in the path of your agent's data.
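The run and replay contract described above can be illustrated with a toy model in plain Python. This is a minimal sketch under stated assumptions, not Kitaru's implementation or API. `fake_llm`, `execute`, and `STEPS` are hypothetical names invented for illustration. The "model" is a deterministic string function standing in for an LLM. The sketch assumes that replaying from `at` reuses the recorded outputs of earlier checkpoints and re-executes `at` and every step after it.

```python
from typing import Callable, Optional


# Hypothetical stand-in for an LLM call. It is deterministic, so reruns reproduce.
def fake_llm(model: str, prompt: str) -> str:
    return f"{model}: {prompt}"


def research(topic: str, model: str) -> str:
    return fake_llm(model, f"Summarize {topic}")


def draft_report(summary: str, model: str) -> str:
    return fake_llm(model, f"Report on ({summary})")


# Ordered toy "checkpoints". Each step receives the previous step's output.
STEPS: list[tuple[str, Callable[[str, str], str]]] = [
    ("research", research),
    ("draft_report", draft_report),
]


def execute(
    topic: str,
    model: str,
    recorded: Optional[dict[str, str]] = None,
    at: Optional[str] = None,
) -> tuple[dict[str, str], list[str]]:
    """Run STEPS in order and record each step's output by name.

    With `recorded` and `at`, steps before `at` reuse their recorded output
    instead of re-running. `at` and every later step re-execute.
    Returns (record, names_of_steps_actually_executed).
    """
    record: dict[str, str] = {}
    executed: list[str] = []
    reusing = recorded is not None and at is not None
    value = topic
    for name, fn in STEPS:
        if name == at:
            reusing = False  # replay point reached: re-execute from here on
        if reusing and recorded is not None:
            value = recorded[name]  # reuse the recorded output, skip the work
        else:
            value = fn(value, model)
            executed.append(name)
        record[name] = value
    return record, executed


original, _ = execute("durable execution", model="model-a")

# Baseline: replay with no change, which should reproduce the original exactly.
baseline, baseline_ran = execute(
    "durable execution", model="model-a", recorded=original, at="draft_report"
)

# Variant: same replay point, with one input (the model) overridden.
variant, variant_ran = execute(
    "durable execution", model="model-b", recorded=original, at="draft_report"
)

# Diff the variant against the baseline to find which steps changed.
changed = [name for name in baseline if baseline[name] != variant[name]]

assert baseline == original
assert changed == ["draft_report"]
```

In both replays `research` is not re-executed (`baseline_ran == ["draft_report"]`). That is the toy analogue of reusing recorded results instead of re-running expensive work. The baseline matches here only because `fake_llm` is deterministic. The sketch does not model how Kitaru obtains a faithful baseline from real model calls.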
See [Harness, Runtime, Platform](/kitaru/core-concepts/harness-runtime-platform.md) for where Kitaru fits.

## The replay loop

```python
import kitaru
from kitaru import checkpoint, flow


@checkpoint
def research(topic: str) -> str:
    return kitaru.llm(f"Summarize {topic} in two sentences.")


@checkpoint
def draft_report(summary: str) -> str:
    return kitaru.llm(f"Write a short report based on: {summary}")


@flow
def research_agent(topic: str) -> str:
    summary = research(topic)
    return draft_report(summary)


if __name__ == "__main__":
    # Run, then replay from a checkpoint with one input changed.
    run = research_agent.run(topic="Why do agents need durable execution?").wait()

    # Replay from draft_report with no change: this is the baseline.
    baseline = research_agent.replay(run.exec_id, at="draft_report")

    # Replay from the same checkpoint with one flow input overridden.
    variant = research_agent.replay(
        run.exec_id,
        at="draft_report",
        flow_overrides={"model": "anthropic/claude-opus-4"},
    )
    # baseline reproduces the original; diff variant against it to isolate your change.
```

`run(...)` returns a handle. `.wait()` blocks for the result and exposes `.exec_id`. `replay(exec_id, at="", flow_overrides={...})` re-executes from the checkpoint named by `at`, overriding flow inputs such as the model or prompt profile. The same loop is available through the [CLI](https://sdkdocs.kitaru.ai) and the [MCP server](/kitaru/agent-native/mcp-server.md), so a coding agent can drive it. See the [Quickstart](/kitaru/getting-started/quickstart.md) to install and run this yourself.

## Where ZenML fits

Kitaru is built by the team behind [ZenML](https://docs.zenml.io), the open-source framework for production ML and LLM pipelines, and runs on the same foundations. Each project works on its own, and you can use Kitaru without ever touching ZenML. If you use both, they compose rather than merely coexist: a Kitaru flow is a dynamic ZenML pipeline under the hood. Your agents and pipelines run on the same [stacks](/kitaru/agent-runtime-stacks/stacks.md), persist artifacts to the same stores, and show up in the same server and dashboard.
If your work is ML pipelines rather than agents, start with the [ZenML docs](https://docs.zenml.io). If you want the narrative tutorial for agents, the [Agents guide](https://docs.zenml.io/user-guides/agents-guide) sits alongside ZenML's Starter, Production, and LLMOps guides in the shared [Learn](https://docs.zenml.io/user-guides) section.

## Runtime primitives

These are the primitives Kitaru adds on top of your existing Python agent code. You keep your harness and your control flow. Kitaru records the run and makes it replayable.

* **Replay and override:** Re-execute any run from any checkpoint, either to recover from a failure or with [overrides](/kitaru/guides/replay-and-overrides.md) (a different model or parameter) to isolate the effect of a change before you ship it. Use invocation overrides when you need to change one recorded checkpoint, tool, or model call instead of every call with the same checkpoint name.
* **Durable execution:** Wrap steps in [`@checkpoint`](/kitaru/core-concepts/checkpoints.md), and your agent picks up where it left off without re-running expensive work.
* **Wait and resume:** Add [`kitaru.wait()`](/kitaru/guides/wait-and-resume.md) to let agents pause for a human, another system, or later input. After the polling timeout, compute is released, and the run resumes when the input arrives.
* **Artifact lineage:** Every checkpoint output is written to your object store as a typed, versioned artifact. You can step through runs, diff outputs across runs, and trace a bad final output back to the exact step that produced it.
* **Execution management:** [`KitaruClient`](/kitaru/guides/execution-management.md) lets you inspect, replay, retry, resume, and cancel executions from code or the CLI.
* **Tracked LLM calls:** Use [`kitaru.llm()`](/kitaru/guides/llm-calls.md), and every call gets automatic secret resolution, prompt/response capture, and token/latency logging.
* **Persistent data:** [`kitaru.save()` / `kitaru.load()`](/kitaru/guides/artifacts.md) let agents store and retrieve files, objects, and results across executions.
* **Structured observability:** [`kitaru.log()`](/kitaru/core-concepts/logging.md) attaches key-value metadata to any checkpoint or flow for debugging and for display in the UI.
* **Runtime configuration:** [`kitaru.configure()`](/kitaru/guides/configuration.md) sets your model, log store, and stack defaults in one call.
* **Framework and infrastructure portability:** Keep your Python control flow, use your preferred framework, and run locally or on remote stacks such as Kubernetes, Vertex AI, SageMaker, and AzureML.

## Next steps


| Page | Description |
| --- | --- |
| [Installation](/pages/cDb4N92M787W6Uf33vjO) | Install Kitaru with uv or pip. |
| [Quickstart](/pages/knQ03wCkSkWfSSXcMRor) | Run a tiny flow end to end. |
| [Examples](/pages/avGnXrIi7fgY7KLr0o2L) | Browse runnable workflows grouped by goal. |
| [Harness, Runtime, Platform](/pages/jFEpVFR4YYhJvoEp1r9K) | Where Kitaru fits in an agent stack, and where it doesn't. |
| [How It Works](/pages/fpgU4WBhT9hosGDLfA42) | Server, runner, execution targets, and what lives where in local dev vs production. |
| [Core Concepts](/pages/qw8hIFEbl4taSEvy4SNP) | Flows, checkpoints, and the execution model. |
| [Execution Management](/pages/m1ms9iW3v3U2tkSxyRWm) | Inspect runs, replay, retry, resume, and fetch logs. |
| [Wait, Input, and Resume](/pages/BUp6cWRuU8VUfknQKRto) | Pause flows for external input and continue the same execution. |
| [Tracked LLM Calls](/pages/wljT8fZIU4BA8fs9S8aB) | Use `kitaru.llm()` with aliases, secrets, and captured artifacts. |
| [Secrets + Model Registration](/pages/DJsLPOTXT5IAsfz7v4WZ) | Store provider credentials, register a model alias, and use `kitaru.llm()`. |
| [Configuration](/pages/jfZrP31z5ehu33Ct8Ljy) | Set runtime defaults and understand override precedence. |
| [Stacks](/pages/Md0YgNiF5z5NwLEvQ5aR) | Create, inspect, switch, and clean up local and remote stacks across Kubernetes, AWS, GCP, and Azure. |
| [MCP Server](/pages/bKWyQ7nmVr76lemvYneQ) | Query and manage executions via MCP tools. |
| [Agent Skills](/pages/dxfY4zN6l6d8rDPipYj2) | Install quickstart, scoping, authoring, and adapter migration skills. |
| [CLI Reference](https://sdkdocs.kitaru.ai) | Browse the generated command reference. |
| [Blog](https://kitaru.ai/blog/) | Read essays on durable execution, long-running agents, and Kitaru's design. |
