# Welcome to Kitaru

Kitaru is the runtime layer underneath your agent stack. It gives you durable execution for Python agents: checkpoints, replay, resume, `wait()`, and versioned deployments. The harness you already picked (Pydantic AI, Deep Agents, LangGraph, Claude Agent SDK, or raw Python) keeps owning how the agent thinks, and your existing platform keeps owning auth, observability, and policy.

Kitaru is self-host-first: it runs as a single-service server on your own Kubernetes cluster and stores artifacts in your own S3, GCS, or Azure Blob storage. No SaaS control plane is required in the path of your agent's data. See [Harness, Runtime, Platform](/kitaru/core-concepts/harness-runtime-platform.md) for the full picture of where Kitaru fits.

## Create a durable agent

```python
import kitaru
from kitaru import checkpoint, flow

@checkpoint
def research(topic: str) -> str:
    return kitaru.llm(f"Summarize {topic} in two sentences.")

@checkpoint
def draft_report(summary: str) -> str:
    return kitaru.llm(f"Write a short report based on: {summary}")

@flow
def research_agent(topic: str) -> str:
    summary = research(topic)
    return draft_report(summary)

if __name__ == "__main__":
    research_agent.run(topic="Why do AI agents need durable execution?")
```

Each `@checkpoint` is a durable unit of work whose output is persisted automatically. If the flow fails at `draft_report`, replaying it skips `research` and reuses its recorded result. `kitaru.llm()` logs the prompt, response, token usage, and latency of each model call.

See the [Quickstart](/kitaru/getting-started/quickstart.md) to install and run this yourself.

## What your agent can do with Kitaru

These are the runtime primitives Kitaru adds on top of your existing Python agent code. You keep your harness and your control flow; Kitaru makes the run durable.

* **Durable execution:** Wrap steps in [`@checkpoint`](/kitaru/core-concepts/checkpoints.md) and your agent picks up where it left off without re-running expensive work
* **Replay from failure:** Re-run only the failed part of a flow by replaying from a checkpoint instead of starting from scratch
* **Wait and resume:** Add [`kitaru.wait()`](/kitaru/guides/wait-and-resume.md) and let agents pause for a human, another system, or later input; after the polling timeout, compute is released and the run resumes when input lands
* **Artifact lineage:** Every checkpoint output is written to your object store as a typed, versioned artifact — step through runs, diff outputs across runs, and trace a bad final output back to the exact step that produced it
* **Execution management:** [`KitaruClient`](/kitaru/guides/execution-management.md) lets you inspect, replay, retry, resume, and cancel executions from code or the CLI
* **Tracked LLM calls:** Use [`kitaru.llm()`](/kitaru/guides/llm-calls.md) and every call gets automatic secret resolution, prompt/response capture, and token/latency logging
* **Persistent data:** [`kitaru.save()` / `kitaru.load()`](/kitaru/guides/artifacts.md) let agents store and retrieve files, objects, and results across executions
* **Structured observability:** [`kitaru.log()`](/kitaru/core-concepts/logging.md) attaches key-value metadata to any checkpoint or flow for debugging and the UI
* **Runtime configuration:** [`kitaru.configure()`](/kitaru/guides/configuration.md) sets your model, log store, and stack defaults in one call
* **Framework and infrastructure portability:** Keep your Python control flow, use your preferred framework, and run locally or on remote stacks — Kubernetes, Vertex AI, SageMaker, AzureML
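The wait-and-resume primitive in the list above can be pictured with a small plain-Python model. This is a sketch under stated assumptions, not how Kitaru implements `kitaru.wait()`: `ToyRun`, `toy_step`, `toy_wait`, `approval_flow`, and `drive` are hypothetical names, and the model assumes resumption works by re-entering the flow and reusing recorded step outputs.

```python
from dataclasses import dataclass, field


class WaitingForInput(Exception):
    """Raised by toy_wait when the requested input has not arrived yet."""


@dataclass
class ToyRun:
    """Hypothetical run record; everything needed to resume lives here."""
    inputs: dict = field(default_factory=dict)    # external inputs by name
    results: dict = field(default_factory=dict)   # recorded step outputs
    executed: list = field(default_factory=list)  # step bodies that actually ran
    status: str = "pending"


def toy_step(run: ToyRun, name: str, fn):
    """Run fn once per run; later passes reuse the recorded output."""
    if name not in run.results:
        run.executed.append(name)
        run.results[name] = fn()
    return run.results[name]


def toy_wait(run: ToyRun, name: str):
    """Return the input if it has landed, otherwise suspend the run."""
    if name not in run.inputs:
        raise WaitingForInput(name)
    return run.inputs[name]


def approval_flow(run: ToyRun) -> str:
    draft = toy_step(run, "draft", lambda: "draft v1")
    approved = toy_wait(run, "approval")
    if not approved:
        return "rejected"
    return toy_step(run, "publish", lambda: f"published {draft}")


def drive(run: ToyRun):
    """Execute the flow until it finishes or suspends on a wait."""
    try:
        output = approval_flow(run)
    except WaitingForInput:
        run.status = "waiting"  # the process can exit; only the record remains
        return None
    run.status = "completed"
    return output


run = ToyRun()
first = drive(run)             # suspends at the wait, no output yet
run.inputs["approval"] = True  # the human decision lands later
second = drive(run)            # resumes; "draft" is not executed again
```

In this model the suspended run holds no live compute, only the `ToyRun` record. When the input arrives, the flow is driven again and the already-recorded `draft` step is skipped, so `run.executed` ends up listing each step once.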

## Next steps

<table data-view="cards"><thead><tr><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>Installation</strong></td><td>Install Kitaru with uv or pip.</td><td><a href="/pages/cDb4N92M787W6Uf33vjO">/pages/cDb4N92M787W6Uf33vjO</a></td></tr><tr><td><strong>Quickstart</strong></td><td>Run a tiny flow end to end.</td><td><a href="/pages/knQ03wCkSkWfSSXcMRor">/pages/knQ03wCkSkWfSSXcMRor</a></td></tr><tr><td><strong>Examples</strong></td><td>Browse runnable workflows grouped by goal.</td><td><a href="/pages/avGnXrIi7fgY7KLr0o2L">/pages/avGnXrIi7fgY7KLr0o2L</a></td></tr><tr><td><strong>Harness, Runtime, Platform</strong></td><td>Where Kitaru fits in an agent stack, and where it doesn't.</td><td><a href="/pages/jFEpVFR4YYhJvoEp1r9K">/pages/jFEpVFR4YYhJvoEp1r9K</a></td></tr><tr><td><strong>How It Works</strong></td><td>Server, runner, execution targets, and what lives where in local dev vs production.</td><td><a href="/pages/fpgU4WBhT9hosGDLfA42">/pages/fpgU4WBhT9hosGDLfA42</a></td></tr><tr><td><strong>Core Concepts</strong></td><td>Flows, checkpoints, and the execution model.</td><td><a href="/pages/qw8hIFEbl4taSEvy4SNP">/pages/qw8hIFEbl4taSEvy4SNP</a></td></tr><tr><td><strong>Execution Management</strong></td><td>Inspect runs, replay, retry, resume, and fetch logs.</td><td><a href="/pages/m1ms9iW3v3U2tkSxyRWm">/pages/m1ms9iW3v3U2tkSxyRWm</a></td></tr><tr><td><strong>Wait, Input, and Resume</strong></td><td>Pause flows for external input and continue the same execution.</td><td><a href="/pages/BUp6cWRuU8VUfknQKRto">/pages/BUp6cWRuU8VUfknQKRto</a></td></tr><tr><td><strong>Tracked LLM Calls</strong></td><td>Use kitaru.llm() with aliases, secrets, and captured artifacts.</td><td><a href="/pages/wljT8fZIU4BA8fs9S8aB">/pages/wljT8fZIU4BA8fs9S8aB</a></td></tr><tr><td><strong>Secrets + Model Registration</strong></td><td>Store provider credentials, register a model alias, and use kitaru.llm().</td><td><a 
href="/pages/DJsLPOTXT5IAsfz7v4WZ">/pages/DJsLPOTXT5IAsfz7v4WZ</a></td></tr><tr><td><strong>Configuration</strong></td><td>Set runtime defaults and understand override precedence.</td><td><a href="/pages/jfZrP31z5ehu33Ct8Ljy">/pages/jfZrP31z5ehu33Ct8Ljy</a></td></tr><tr><td><strong>Stacks</strong></td><td>Create, inspect, switch, and clean up local and remote stacks across Kubernetes, AWS, GCP, and Azure.</td><td><a href="/pages/Md0YgNiF5z5NwLEvQ5aR">/pages/Md0YgNiF5z5NwLEvQ5aR</a></td></tr><tr><td><strong>MCP Server</strong></td><td>Query and manage executions via MCP tools.</td><td><a href="/pages/bKWyQ7nmVr76lemvYneQ">/pages/bKWyQ7nmVr76lemvYneQ</a></td></tr><tr><td><strong>Agent Skills</strong></td><td>Install quickstart, scoping, authoring, and adapter migration skills.</td><td><a href="/pages/dxfY4zN6l6d8rDPipYj2">/pages/dxfY4zN6l6d8rDPipYj2</a></td></tr><tr><td><strong>CLI Reference</strong></td><td>Browse the generated command reference.</td><td><a href="https://docs.zenml.io/sdk-reference">https://docs.zenml.io/sdk-reference</a></td></tr><tr><td><strong>Blog</strong></td><td>Read essays on durable execution, long-running agents, and Kitaru's design.</td><td><a href="https://kitaru.ai/blog/">https://kitaru.ai/blog/</a></td></tr></tbody></table>

