
How It Works

What runs where when you execute a Kitaru flow — server, runner, execution targets, and the contract between them.

When you call .run() on a flow, three things work together to make it durable: the Kitaru server (shared metadata, auth, deployment registry), the runner (per-run durable control flow), and one or more execution targets (where each checkpoint's code actually executes). During local development all three collapse into a single Python process. In production they separate across your infrastructure.

[Figure: The Kitaru server, runner, and execution targets, and how they relate.]

Kitaru separates durable control flow from code execution:

  • The Kitaru server stores shared metadata, deployment snapshots, checkpoint state, execution logs, and control-plane data.

  • For each run, a runner (the durable brain of an execution) executes the selected flow snapshot, manages checkpoint order, persists state, and handles retry, replay, resume, and wait.

  • Individual checkpoints can run inline in the runner or in an isolated runtime (a separate container, Kubernetes job, or cloud job on the configured stack). The runner/target split is also where sandboxes, external tools, and custom compute backends conceptually plug in — the two shipped execution targets today are inline and isolated.

Key idea. The runner owns the durable run: checkpoint order, state, retry, replay, resume, and wait. Execution targets do the work. Checkpoints are the contract between the two.
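The split described above can be sketched in a few lines of plain Python. This is a toy model, not Kitaru's implementation: `ToyRunner`, `inline_target`, and `isolated_target` are hypothetical names invented for illustration, and the "isolated" target simply calls the function in the same process where a real one would launch a container or job.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for illustration only; none of these are Kitaru APIs.


def inline_target(fn: Callable[[Any], Any], value: Any) -> Any:
    """Run the checkpoint body in the runner's own process."""
    return fn(value)


def isolated_target(fn: Callable[[Any], Any], value: Any) -> Any:
    """Placeholder for a separate container or job; here it just calls fn."""
    return fn(value)


@dataclass
class ToyRunner:
    state: Dict[str, Any] = field(default_factory=dict)  # persisted outputs
    order: List[str] = field(default_factory=list)       # completed checkpoints

    def run_checkpoint(self, name, fn, value, target=inline_target):
        if name in self.state:       # already completed: reuse the stored output
            return self.state[name]
        result = target(fn, value)   # the execution target does the work
        self.state[name] = result    # the runner persists the result
        self.order.append(name)
        return result


runner = ToyRunner()
docs = runner.run_checkpoint("fetch", lambda q: [q.upper()], "kitaru")
summary = runner.run_checkpoint(
    "summarize", lambda d: " ".join(d), docs, target=isolated_target
)
```

The runner never cares how a target does its job; it only sees a named checkpoint, an input, and an output it can persist. That narrow interface is what the text calls the contract.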

[Figure: Execution architecture. The runner delegates checkpoint work to inline or isolated execution targets.]

Control / orchestration / execution

Kitaru splits runtime responsibilities into three planes. (This is separate from the harness / runtime / platform split, which is about where Kitaru sits in the broader agent stack — not about how a single run executes.)

[Figure: The control, orchestration, and execution planes of a Kitaru run.]
| Plane | What lives here | Responsibility |
| --- | --- | --- |
| Control plane | Kitaru server, UI, metadata DB, deployment registry, CLI/SDK/MCP APIs, auth and credential brokering | Knows what exists and who can call what |
| Orchestration plane | The runner for a single execution | Owns durable control flow for one run |
| Execution plane | Inline process or isolated container (shipped today); sandboxes, external tools, and custom backends are conceptual extensions of the same contract | Performs work |

The control plane is long-lived and shared. The orchestration plane is per-run and durable. The execution plane is where your code (and your agent's code) actually executes.

What runs where?

The most common confusion is which component runs user code. This table makes it explicit.

| Component | What it does | Runs user code? |
| --- | --- | --- |
| Kitaru server (control plane) | Stores deployment registry, execution metadata, checkpoint state, log metadata, auth and session state | No |
| Runner (orchestration plane) | Runs the selected flow snapshot, controls checkpoint order, persists durable state, handles retry / replay / resume / wait | Yes, for inline checkpoints |
| Inline execution | Runs a checkpoint inside the runner process/pod | Yes |
| Isolated runtime | Runs a checkpoint in a separate container, job, pod, or remote compute backend | Yes |
| Sandbox (conceptual) | The same contract as isolated, tightened with stronger isolation or restricted egress. Not a shipped Kitaru execution target today; provided via adapters or your platform. | Yes, where integrated |
| External tool / MCP server | Performs work through a remote API or capability | Outside Kitaru |
| Metadata store | Stores runs, versions, checkpoint statuses, replay lineage | No |
| Artifact / state store | Stores checkpoint outputs, files, logs, replay lineage | No |

The run, step by step

Here is what actually happens when a consumer invokes a flow.

1. Request arrives. A user, service, or upstream agent calls the Kitaru invocation API (via CLI, SDK, MCP, or HTTP).

2. Server resolves the flow. The server authenticates the caller, resolves the target flow (and optionally a version or tag), validates the input schema, and creates a run record plus a FlowHandle.

3. Runner starts. The control plane schedules a runner on your configured stack: a Kubernetes pod, a cloud job, or the local process in dev. The runner loads the selected flow snapshot.

4. Runner executes checkpoints in order. For each checkpoint, the runner either executes inline or delegates to an isolated target. It waits for the result, persists the output to the artifact/state store, and advances.

5. State is durable the entire time. If a checkpoint fails, if the runner dies, or if a kitaru.wait() suspends the run, the server retains everything needed to retry, replay, or resume later.

6. Consumer observes results. The caller uses the returned FlowHandle (or the UI / CLI / SDK / MCP) to tail logs, inspect checkpoints, provide human input, replay, or cancel.
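To make the sequence concrete, here is a minimal single-process sketch of steps 2 through 6. Everything in it is an assumption made for illustration: `REGISTRY`, `invoke`, and `ToyHandle` are invented names, and `ToyHandle` is a stand-in that should not be read as the shape of Kitaru's real FlowHandle.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyHandle:
    """Hypothetical stand-in for a run handle; not Kitaru's FlowHandle."""
    run_id: str
    status: str = "pending"
    outputs: Dict[str, Any] = field(default_factory=dict)


# Hypothetical registry: (flow name, version) -> ordered checkpoint functions.
REGISTRY: Dict[Tuple[str, str], List[Callable[[Any], Any]]] = {
    ("greet", "v1"): [str.strip, str.title],
}


def invoke(flow: str, version: str, payload: Any) -> ToyHandle:
    # Step 2: resolve the flow, validate the input, create a run record.
    steps = REGISTRY[(flow, version)]
    if not isinstance(payload, str):
        raise TypeError("payload must be a string for this toy flow")
    handle = ToyHandle(run_id=uuid.uuid4().hex)

    # Steps 3-5: run checkpoints in order, persisting each output as we go.
    handle.status = "running"
    value = payload
    for step in steps:
        value = step(value)
        handle.outputs[step.__name__] = value
    handle.status = "completed"

    # Step 6: the caller inspects the returned handle.
    return handle


handle = invoke("greet", "v1", "  hello kitaru ")
```

In a real deployment these steps are spread across the server, the runner, and the execution targets; the sketch only shows the order in which responsibilities hand off.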

Runner vs sandbox

This is the idea that tends to click last and matter most.

The runner is the durable brain of a run. The sandbox (or isolated runtime) is the hands that perform work.

If a sandbox dies mid-execution — a container evicted, a network partition, a pod OOM — the runner still holds durable checkpoint state and can retry that single checkpoint, resume from the last known boundary, or replay the run with a modified input or code version. The sandbox's failure is localized to the checkpoint that was executing, not the whole agent.

This is why platform teams should not confuse "I have a sandbox provider" with "I have durable execution". A sandbox is a bounded execution environment. Durable execution is a property of the surrounding runner — and of the checkpoints it persists.
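A small simulation illustrates the distinction. In this sketch (hypothetical names throughout, no Kitaru APIs), the `embed` checkpoint loses its sandbox on the first attempt. Because the runner loop already persisted the output of `fetch`, only `embed` is retried.

```python
class SandboxDied(Exception):
    """Hypothetical error standing in for an evicted container or OOM-killed pod."""


calls = {"fetch": 0, "embed": 0}


def fetch(_):
    calls["fetch"] += 1
    return ["doc-a", "doc-b"]


def embed(docs):
    calls["embed"] += 1
    if calls["embed"] == 1:  # first attempt: the sandbox is lost mid-execution
        raise SandboxDied("pod OOM")
    return [len(d) for d in docs]


def run(checkpoints, state, max_attempts=3):
    """Toy runner loop: persist each output, retry only the failing checkpoint."""
    value = None
    for name, fn in checkpoints:
        if name in state:  # completed earlier: reuse the persisted output
            value = state[name]
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                value = fn(value)
                break
            except SandboxDied:
                if attempt == max_attempts:
                    raise
        state[name] = value
    return value


state = {}
result = run([("fetch", fetch), ("embed", embed)], state)
```

The sandbox contributes nothing to durability here. The retry works because the loop around it kept the earlier output and knew which boundary to re-enter.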

Inline vs isolated checkpoints

Every checkpoint picks an execution target. Two are built in today: inline (same process as the runner) and isolated (a separate container or job on the configured stack). Code examples, decision rules for when to reach for isolated, and the interaction with .submit() live on the Checkpoints page.

A failed checkpoint is durable context

In classical pipelines, a failed step is a crash. In Kitaru, a failed checkpoint is durable context — something the runner, the agent loop, a human, or a retry policy can reason about.

Consider a document-synthesis agent:

[Figure: A failed checkpoint is persisted as a typed artifact the runner, agent, or human can act on.]

Because the retrieval checkpoint's failure is persisted as a typed artifact, a downstream consumer has several real options:

  • Retry the same checkpoint with the same input

  • Replay with a modified input (e.g. a corrected document id)

  • Replay with modified code (e.g. a new retrieval strategy)

  • Feed the error artifact back into the agent loop so it can self-correct

  • Wait for a human to provide a correction via kitaru.wait(), then resume

This is what "agent-native error handling" means in practice: failures become data, and durable state survives them.
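As a rough illustration of failures becoming data, the sketch below captures an exception as a small typed record and then replays the checkpoint with a corrected input. `FailureArtifact`, `run_checkpoint`, and the in-memory `STORE` are hypothetical names for this example; Kitaru's actual artifact schema is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class FailureArtifact:
    """Hypothetical typed record of a failed checkpoint."""
    checkpoint: str
    error_type: str
    message: str
    inputs: Dict[str, Any]


STORE = {"doc-42": "Quarterly report text"}  # toy document store


def retrieve(doc_id: str) -> str:
    return STORE[doc_id]  # raises KeyError for an unknown id


def run_checkpoint(name, fn, **inputs):
    """Return (output, None) on success or (None, FailureArtifact) on failure."""
    try:
        return fn(**inputs), None
    except Exception as exc:
        return None, FailureArtifact(name, type(exc).__name__, str(exc), dict(inputs))


# First attempt fails, but the failure is kept as data rather than a crash.
output, failure = run_checkpoint("retrieve", retrieve, doc_id="doc-24")
first_failure = failure

# Replay with a modified input, derived from the persisted failure.
if failure is not None and failure.error_type == "KeyError":
    corrected = {**failure.inputs, "doc_id": "doc-42"}
    output, failure = run_checkpoint("retrieve", retrieve, **corrected)
```

The same record could equally be handed to an agent loop or shown to a human; the point is that the inputs and error type survive the failure.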

How deep do you integrate?

You don't have to restructure your agent to get value. Pick the depth that fits.

Level 0 — Black-box harness

Wrap the entire agent run as one checkpoint.

  • Fastest integration

  • Minimal code changes

  • Framework-agnostic

The tradeoff: the replay boundary is coarse (one per agent run), and you see less of the agent's internal state.

Level 1 — Coarse workflow checkpoints

Add checkpoints around the phases that matter to your team.

  • Useful replay points

  • Better audit trail

  • Good balance of portability and durability

The tradeoff: you (not the framework) decide where the boundaries go.
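The difference between Level 0 and Level 1 is only where the boundaries go. The sketch below uses a toy `checkpoint` decorator written for this example (it is not Kitaru's @checkpoint and only records which boundaries completed) to show the same two-phase agent wrapped both ways.

```python
import functools

RECORDED = []  # stand-in for persisted checkpoint boundaries


def checkpoint(fn):
    """Toy decorator, not Kitaru's: records each completed boundary by name."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        RECORDED.append(fn.__name__)
        return result
    return wrapper


def plan(task):
    return f"plan for {task}"


def act(plan_text):
    return f"result of {plan_text}"


# Level 0: the whole agent run is a single boundary.
@checkpoint
def agent_run(task):
    return act(plan(task))


# Level 1: each phase that matters gets its own boundary.
plan_cp = checkpoint(plan)
act_cp = checkpoint(act)


def phased_run(task):
    return act_cp(plan_cp(task))


level0_result = agent_run("triage")
level0 = list(RECORDED)

RECORDED.clear()
level1_result = phased_run("triage")
level1 = list(RECORDED)
```

Both variants produce the same result. Level 0 leaves one replay point (`agent_run`), while Level 1 leaves two (`plan`, then `act`), so a failure in `act` would not require redoing `plan`.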

Level 2 — Framework-aware adapter

Use a Kitaru adapter that tracks the framework's internals (model calls, tool calls, intermediate state) as child events under the enclosing checkpoint.

  • Richer introspection

  • Better debugging

  • Tighter developer experience

The tradeoff: adapters are per-framework and need maintenance. Supported framework integrations now live in the Adapters section.

Framework-agnostic by construction

Kitaru does not require your agent to be written as a graph. @checkpoint wraps ordinary Python function boundaries, independent of the harness.

That means a platform team supporting multiple harnesses — PydanticAI here, LangGraph there, Claude Agent SDK for one team, a raw-Python loop for another — can still standardize durability, replay, and execution metadata on a single runtime primitive. The harness choice stays a per-team decision.

Fits behind your platform

Kitaru can be used directly through its invocation API, or placed behind your existing platform/gateway:

[Figure: Kitaru placed behind an existing platform gateway that owns auth, entitlements, and UI.]

You keep your auth, entitlements, interceptors, observability, and UI. Kitaru handles the durable execution layer underneath. This is how Kitaru drops into an internal agent platform in finance or another regulated industry without asking you to rebuild the surrounding system.
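A gateway in front of Kitaru can be as thin as an entitlement check followed by a forwarded call. The sketch below assumes a hypothetical `kitaru_invoke` placeholder where a real gateway would call the Kitaru invocation API; the `ENTITLEMENTS` table and the `gateway` function are likewise invented for illustration.

```python
# Hypothetical entitlement table owned by your platform, not by Kitaru.
ENTITLEMENTS = {"alice": {"summarize"}, "bob": set()}


def kitaru_invoke(flow, payload):
    """Placeholder for a call to the Kitaru invocation API (not a real client)."""
    return {"flow": flow, "status": "accepted", "payload": payload}


def gateway(user, flow, payload):
    """Platform-owned entry point: check entitlements, then forward the request."""
    if flow not in ENTITLEMENTS.get(user, set()):
        raise PermissionError(f"{user} may not invoke {flow}")
    return kitaru_invoke(flow, payload)


accepted = gateway("alice", "summarize", {"doc": "q3-report"})
```

Authorization decisions stay entirely in the gateway; the layer underneath only ever sees requests that have already passed them.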

Local development

When you are developing locally, all three components run inside a single Python process on your machine. The server is embedded — no separate service to start, no database to configure. Checkpoint outputs are written to your local filesystem.

This means you can install Kitaru, run kitaru init, and have a fully working durable execution environment in under a minute. Your flows behave exactly the same as they will in production — same checkpointing, same replay, same observability — just without the cloud infrastructure underneath.

Optionally, you can run a local server to browse executions in a web UI.

Production

In production, the three components separate across your infrastructure:

  • The server runs as a long-lived Kubernetes pod (deployed via Helm). It stores execution state in a database and serves the UI. Your whole team connects to it.

  • The runner runs on the compute backend defined by your stack — Kubernetes, Vertex AI, SageMaker, AzureML. When you call .run(), the client fetches short-lived credentials from the server and dispatches the execution directly to the compute backend. The runner executes your checkpoints and writes outputs to cloud storage. If the execution crashes, replay picks up from the last completed checkpoint.

  • Artifacts and state live in your own S3 / GCS / Azure Blob bucket. The server tracks metadata but does not access storage directly; when a client needs to read files, it fetches temporary credentials brokered by the server.

There is no mandatory SaaS control plane in the path of your agent's data.
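The claim that replay picks up from the last completed checkpoint can be illustrated with a toy runner that writes each output to storage before advancing. This sketch uses a temporary local directory in place of a cloud bucket, and the `run` function, step names, and `crash_after` parameter are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

executed = []  # names of checkpoints whose code actually ran


def run(steps, store: Path, crash_after=None):
    """Toy runner: skip checkpoints whose output is already in the store."""
    value = None
    for name, fn in steps:
        out = store / f"{name}.json"
        if out.exists():  # completed in an earlier attempt
            value = json.loads(out.read_text())
            continue
        value = fn(value)
        out.write_text(json.dumps(value))  # persist before advancing
        executed.append(name)
        if name == crash_after:
            raise RuntimeError("runner crashed")
    return value


steps = [
    ("load", lambda _: [1, 2, 3]),
    ("double", lambda xs: [x * 2 for x in xs]),
    ("total", lambda xs: sum(xs)),
]

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    try:
        run(steps, store, crash_after="double")  # first attempt dies mid-run
    except RuntimeError:
        pass
    first_attempt = list(executed)

    executed.clear()
    total = run(steps, store)  # second attempt resumes from stored outputs
    second_attempt = list(executed)
```

On the second attempt only `total` executes, because `load` and `double` already have outputs in the store.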
