For the complete documentation index, see llms.txt. This page is also available as Markdown.

Claude Agent SDK

Wrap Claude Agent SDK invocations in Kitaru checkpoints, capture session context, and replay completed Claude calls honestly

The Claude Agent SDK gives you Claude Code as a library: Claude can read files, edit files, run commands, use MCP servers, call tools, follow permissions, and keep a session transcript. Kitaru does not replace that agent loop.

Kitaru adds an outer durable execution boundary around it:

one completed Claude Agent SDK invocation = one Kitaru checkpoint

That boundary is useful when a Claude call is one part of a larger workflow. Imagine this flow:

collect inputs → ask Claude to analyze them → write report → notify reviewer

If Claude finishes the analysis and the later write report checkpoint fails, Kitaru can replay the flow and reuse the completed Claude result instead of calling Claude again. You keep the Claude session ID, final text, usage/cost metadata, message records, and Kitaru artifacts that explain what happened.

The adapter focuses on the completed Claude SDK invocation as the durable unit: one prompt enters the SDK, Claude finishes, and Kitaru stores the completed result and capture envelope.

The mental model

Think of Claude Agent SDK as the driver and Kitaru as the trip recorder and checkpoint gate.

Claude still drives inside the invocation:

Claude prompt
  └─ Claude Agent SDK / Claude Code loop
      ├─ model calls
      ├─ built-in tools
      ├─ optional Bash commands
      ├─ optional MCP/custom tool calls
      ├─ permissions and hooks
      └─ final ResultMessage with a session_id

Kitaru wraps the outside:

So Kitaru can say:

“This Claude invocation completed. Here is the result and session context. On replay, I can return that completed boundary output without calling Claude again.”

Kitaru cannot honestly say:

“I can resume from Claude's sixth internal tool call and deterministically avoid every side effect Claude already performed.”

The difference matters. If Claude runs Bash and creates report.md, Kitaru can store the final Claude result saying that happened. Kitaru does not automatically snapshot and recreate report.md in a fresh workspace. If that file must be a durable workflow output, write it in a Kitaru-owned checkpoint after Claude returns the content.

What you get

The adapter gives existing Claude Agent SDK users:

  • one durable Kitaru checkpoint around each completed Claude invocation

  • replay-skip for completed Claude invocations inside larger Kitaru flows

  • optional live stream events from run_stream(...) / run_stream_sync(...) while that invocation checkpoint is running

  • a typed ClaudeRunResult with final text, session ID, stop reason, turn count, usage, model usage, and cost fields when the SDK reports them

  • captured SDK message records

  • best-effort local transcript capture when the Claude SDK writes a transcript file where the adapter can find it

  • a redacted options manifest for debugging and audit trails

  • Kitaru event-log and run-summary artifacts

  • an explicit resume path through Claude's native session_id

Claude's own docs are still the source of truth for the inner SDK behavior:

Install

Add the Claude Agent SDK extra. Include local if you want the local Kitaru server and dashboard:

Initialize the project once:

For direct Anthropic API usage, set ANTHROPIC_API_KEY before making a real Claude call:

The Claude SDK also supports Bedrock and Vertex modes when their provider-specific environment is configured.

Migrating an existing Claude Agent SDK project? The zenml-io/kitaru-skills package includes /kitaru:kitaru-claude-agent-sdk-migration for wrapping one Claude invocation checkpoint while keeping Claude-owned tools, Bash, MCP, sessions, and workspace-file caveats explicit. See Agent Skills.

Minimal flow

This example keeps Claude's tools disabled so the first run is easy to reason about: one prompt goes in, one completed Claude result comes back, and Kitaru stores that completed invocation.

The checkpoint name is derived from the runner name. In this example, the adapter-created checkpoint is named:

How a run works, step by step

When runner.run_sync(ClaudeRunRequest.start(...)) executes inside a Kitaru flow, this is the concrete sequence:

On replay, if this checkpoint is already complete and cache/replay rules allow Kitaru to reuse it, Kitaru serves the saved ClaudeRunResult. Claude is not called again for that completed invocation.

Requests, sessions, and resume

Use ClaudeRunRequest.start(...) for a fresh Claude SDK session:

Use ClaudeRunRequest.resume(...) when you want the Claude SDK to continue from a previous Claude session_id:

If the prompt comes from a previous checkpoint or from a flow-body kitaru.llm() call, remember that the value in the flow body is a Kitaru checkpoint output handle, not the concrete string yet. Load it before building a Claude request when Claude needs the actual text:

If you want to preserve Kitaru's durable data edge instead, pass the original handle into a downstream @checkpoint and build the ClaudeRunRequest inside that checkpoint.

This uses Claude's native session mechanism. Kitaru records the session ID and passes it back through ClaudeAgentOptions(resume=...) when you resume.

Session resume and Kitaru replay are related but different:

Thing
What it means

Claude session_id

Claude's conversation/session handle. Use it to continue a Claude SDK session.

Kitaru checkpoint

Kitaru's durable workflow boundary. Use it to skip or replay completed workflow work.

Claude transcript

The conversation record written by Claude SDK when available. Useful for audit/debugging.

Workspace files

Real files on disk. Kitaru does not snapshot them in this adapter.

A practical consequence: if a Claude invocation fails halfway through, Kitaru has no completed invocation result to reuse. Retrying that checkpoint starts the Claude invocation again. If you need stronger mid-invocation transcript persistence, look at Claude's session and session-store features in the official SDK docs; this adapter's Kitaru boundary is the completed invocation.

On Kubernetes (or any distributed orchestrator), treat local Claude transcript files and workspace files as pod-local best-effort state. A later checkpoint or replay may run in a different pod with a different filesystem. The durable part is the completed Kitaru checkpoint output/artifacts that Kitaru persists; local Claude JSONL transcript files are only reliably reusable across pods when you back them with shared persistent storage and/or a Claude session store.

Result shape

KitaruClaudeRunner.run_sync(...) returns a ClaudeRunResult with fields such as:

  • final_text

  • session_id

  • transcript_path

  • usage

  • cost_usd

  • model_usage

  • stop_reason

  • num_turns

  • artifact names for captured messages, transcript, output, usage, event log, and run summary

  • warnings for best-effort capture gaps, such as a missing transcript file or a non-fatal artifact/event/log persistence failure

Failed Claude invocations raise an exception instead of returning a ClaudeRunResult(status="failed"). If the failure happens inside a Kitaru checkpoint, the adapter still records best-effort failure metadata before the exception propagates.

Live streaming with Kitaru durability

Use run_stream(...) / run_stream_sync(...) when you want to watch a Claude invocation while the Kitaru checkpoint is still running. The durable boundary is unchanged: Kitaru still saves one final ClaudeRunResult for one Claude SDK query.

The concrete story is:

Async flow:

Sync flow:

The Claude stream event kinds are:

  • claude_agent_sdk.stream.started

  • claude_agent_sdk.stream.event

  • claude_agent_sdk.stream.completed

  • claude_agent_sdk.stream.failed

You can import the constants instead of typing the strings yourself:

To watch the live events, submit the flow and read execution events from the client while .wait() is still pending:

The important shape is: start the optional watcher in the background, keep .wait() on the main path, and stop the watcher after the durable result is available. If the active backend cannot publish or watch live events, or if the watcher misses a terminal event, the Claude invocation can still finish and .wait() can still return the saved ClaudeRunResult.

The live payloads are intentionally smaller than the durable artifacts. By default, they do not include text deltas, the raw prompt, full SDK options, raw SDK stream events, full tool input JSON, full assistant messages, final result text, or structured output. If you deliberately want clipped live text deltas, opt in with ClaudeCapturePolicy(include_stream_text_deltas=True). If you need the durable conversation record, inspect the final ClaudeRunResult and captured artifacts after the checkpoint completes.

A few gotchas matter:

  • Live events are best effort. Missing stream events do not mean Claude failed; check the final ClaudeRunResult.

  • Replay may emit stream events again if the stream checkpoint body re-executes. For example, if a later checkpoint crashes, replay may call Claude again and send new live events for the retried invocation.

  • Stream calls use a separate cache surface from non-stream calls, so a cached run_sync(...) result does not silently satisfy run_stream_sync(...). A repeated stream call can still hit the stream cache, though. When that happens, Kitaru reuses the saved final ClaudeRunResult, the checkpoint body is skipped, and no fresh Claude stream events may appear.

  • checkpoint_strategy="invocation" is still the only Claude strategy. The stream updates are progress reports from inside that one invocation boundary; they are not per-tool or per-token replay checkpoints.

  • If you set allow_direct_execution_inside_checkpoint=True, stream events are published on the outer checkpoint's live-event lane. The replay warning still applies: replaying that outer checkpoint can call Claude again and duplicate tool actions, file edits, or API cost.

  • Claude's own file checkpointing remains separate from Kitaru checkpoints. A Claude file checkpoint can help Claude rewind supported file edits inside a session; it is not the Kitaru checkpoint output and it is not a workspace snapshot.

The runnable version lives at examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_streaming.py. It uses allowed_tools=[], max_turns=1, disables checkpoint caching for the demo, watches CLAUDE_STREAM_EVENT_KINDS, and then prints the final durable ClaudeRunResult.

Capture policy

By default, the adapter captures the boundary data that is useful for replay inspection and audits:

  • prompt and SDK message records

  • best-effort local transcript JSONL payload

  • redacted options manifest

  • final output

  • usage and cost information when the SDK reports it

  • one invocation event and one run summary

Usage and cost statistics

Each successful Claude invocation logs one canonical llm_usage_v1 record. The record uses the adapter run label as its stable identity, includes the SDK usage payload when Claude reports one, and stores cost_usd as provider-reported actual_cost_usd when the Claude SDK provides it. Kitaru does not run a separate Claude cost estimator in this adapter path.

Some Claude SDK results expose model_usage instead of the top-level usage payload. In that case, Kitaru uses model_usage as a fallback for the canonical usage record. If both are present, Kitaru uses usage and does not add model_usage on top; otherwise the same tokens could be counted twice.

The canonical record is independent of the durable adapter event log. Turning emit_events=False suppresses the invocation event and run summary artifacts, but it does not stop the lightweight LLM usage metadata record. Set save_usage=False when you do not want Claude usage persisted; that disables both the separate usage artifact and the canonical invocation record used for execution-level LLM summaries.

These records roll up after your code observes the terminal execution with FlowHandle.wait() or FlowHandle.get(). If Claude reports usage but not cost, the summary can still count tokens while marking that record as missing a dollar-cost value.

You can reduce what is stored with ClaudeCapturePolicy:

Treat messages and transcripts as conversation data. They may contain prompts, retrieved document snippets, tool arguments, command output, and model output. The options manifest is redacted by default, including common secret-bearing mapping keys, key/value sequence pairs such as [("Authorization", "Bearer ...")], [("x-api-key", "...")], cookie/env pairs, and env-list entries shaped like {"name": "ANTHROPIC_API_KEY", "value": "..."}. Message/transcript artifacts are not a secret store; turn them off when the conversation itself is sensitive.

Artifact, event-log, run-summary, and metadata persistence are best-effort by default. That matters for replay safety: if Claude has already completed and a later Kitaru save/log call fails, the adapter does not fail the completed Claude call just because observability capture had a problem. Instead, the returned ClaudeRunResult.warnings and ClaudeRunResult.metadata describe what failed, and failed artifact-name fields are left empty.

ClaudeCapturePolicy.emit_events controls this durable adapter event log and run summary capture. It is not the switch for live claude_agent_sdk.stream.* publishing. Live stream publishing happens only when you call run_stream(...) / run_stream_sync(...), and it remains best-effort progress rather than durable transcript state.

If you want observability persistence to be fail-fast, opt in explicitly:

Even in strict mode, a failed attempt to save failure metadata does not hide the original Claude SDK error.

Options factory

Prefer options_factory over passing one static options object when request fields should affect the Claude SDK call.

Why this matters:

  • cwd controls which project directory Claude sees.

  • resume_session_id must become ClaudeAgentOptions(resume=...) for resumed requests.

  • max_turns is often different per call.

  • MCP servers, hooks, permission callbacks, and session stores can be live Python objects. Building fresh options per request avoids mutating shared state.

If you pass a static options= object and then send a request with cwd, resume_session_id, or max_turns, the adapter raises instead of guessing how to mutate your options.

Checkpoint strategy

The Claude Agent SDK adapter currently supports one strategy:

"invocation" is also the default. It is the only replay-honest boundary the Claude SDK exposes to Kitaru today: one Claude query goes in, one durable result comes out. Strategies such as "calls", "runner_call", "model_call", and "tool_call" are rejected on purpose because they would imply granular durability that this adapter does not provide.

checkpoint_config= accepts the same small checkpoint knobs used by other adapter-created checkpoints:

runtime="isolated" is not supported for adapter-managed Claude checkpoints. Claude SDK options can contain live process objects such as MCP servers, hooks, and callbacks, and those are not reconstructible across Kitaru's isolated runtime boundary yet.

Calling from inside an existing checkpoint

By default, runner.run(...) and runner.run_sync(...) reject calls made from inside an existing Kitaru checkpoint. The reason is concrete: Kitaru cannot open the adapter-created Claude invocation checkpoint inside another checkpoint. If the adapter silently called Claude directly there, replaying the outer checkpoint could call Claude again and duplicate tool actions, file edits, or API cost.

The recommended pattern is:

Use the direct-execution opt-in only when you have accepted that replay risk:

When this opt-in is used, the returned result includes a warning and metadata["direct_execution_inside_checkpoint"] = True.

Claude tools, MCP, Bash, hooks, and permissions

You can still use Claude Agent SDK features through ClaudeAgentOptions:

Kitaru passes those options to Claude. Claude's own runtime decides what tools are available, what permissions apply, and which hooks run. See the official hooks and permissions docs for that layer.

The important Kitaru point is: these features remain inside the one Claude invocation checkpoint.

A concrete failure story:

On replay, Kitaru can return the saved ClaudeRunResult without calling Claude again. But if the replay is running in a fresh workspace, Kitaru does not magically recreate report.md, because that file write happened inside Claude's own Bash/tool loop, not in a Kitaru-owned checkpoint. The adapter also does not snapshot the working directory before or after Claude runs.

If a side effect must be durable, make it a Kitaru-owned step:

That way the durable file write is visible to Kitaru as its own checkpoint.

Claude file checkpointing is different

Claude Agent SDK has its own file checkpointing feature for rewinding certain file changes made by Claude's built-in file-edit tools. That is useful, but it is not the same thing as a Kitaru checkpoint.

Keep the two ideas separate:

Feature
Owned by
What it helps with

Kitaru checkpoint

Kitaru

Replay/skip completed workflow work and store typed outputs.

Claude session

Claude SDK

Continue a Claude conversation from a session ID.

Claude file checkpointing

Claude SDK

Rewind supported Claude file-edit tool changes in a session.

Workspace snapshot

Not provided by this adapter

Recreate arbitrary files/process state after a crash.

The adapter records what it can observe at the invocation boundary. It does not turn Claude file checkpointing into Kitaru checkpointing, and it does not provide a general workspace snapshot system.

Put the Claude runner call directly in the flow body so the adapter can create its own checkpoint around the invocation. Put side effects that must be durable in separate Kitaru checkpoints after Claude returns.

That gives you a concrete sequence like this:

If a later checkpoint fails, Kitaru can reuse the completed Claude invocation result instead of calling Claude again.

Runnable examples

Run the educational non-streaming integration example:

To inspect the example without making a Claude API call:

The example prints the final text, session ID, usage/cost details when reported, and Kitaru artifact names. It uses allowed_tools=[] and max_turns=1 so the first run teaches the adapter boundary without introducing tool side effects.

Run the live streaming example when you have a REST-backed stream-event backend:

That script submits the flow, starts a watcher for claude_agent_sdk.stream.* events, and then prints the final durable ClaudeRunResult. If live watching is not available on your backend, the .wait() path still demonstrates the saved result.

Larger Claude example

For a richer workflow, see the compliance-review example:

That example shows staged audit checkpoints, artifact persistence, and wait/resume around Claude turns. It uses the same high-level boundary: one Claude SDK invocation is one Kitaru checkpoint. It also has a custom transcript materializer for its multi-turn remote-stack story. That extra materializer is example-specific; the generic adapter still does not claim granular replay of Claude-internal side effects.

Troubleshooting

“Why do I need @kitaru.flow?”

The adapter creates a Kitaru checkpoint. Checkpoints need a flow execution to belong to. Wrap Claude calls in @kitaru.flow so Kitaru can record and replay that boundary.

“Why did checkpoint_strategy="calls" fail?”

Because Claude's Python SDK does not expose replay-safe Python call bodies for each internal model/tool/Bash/MCP action. The adapter would be lying if it called hook observations “call checkpoints.” Use checkpoint_strategy="invocation" and move durable side effects into explicit Kitaru checkpoints.

“Why did calling the runner from my checkpoint fail?”

The adapter refused the call because it could not create its own Claude invocation checkpoint inside your existing checkpoint. Move the runner call to the flow body when possible. If you deliberately want Claude to run directly inside the existing checkpoint, create the runner with allow_direct_execution_inside_checkpoint=True and treat replay of the outer checkpoint as capable of calling Claude again.

“Why is my transcript artifact missing?”

The adapter always captures SDK-emitted messages when save_messages=True. Transcript-file capture is best-effort because Claude owns where and when local JSONL transcripts are written. Missing transcript files create a warning, not a failed run.

“Why is an artifact name missing even though Claude completed?”

Capture and event persistence are best-effort by default. If Claude completed but saving one artifact or log entry failed, the adapter returns the Claude result and records the persistence problem in result.warnings and result.metadata instead of turning the completed Claude call into a failure. Use fail_on_artifact_capture_error=True or fail_on_event_persistence_error=True when you want strict behavior.

“Why did my stream watcher show no events?”

First check the final ClaudeRunResult. Live events are progress only, not the durable source of truth. Common causes are: the active backend does not support live event watching, the checkpoint hit the stream cache and skipped the body, or Claude text deltas were not explicitly enabled with ClaudeCapturePolicy(include_stream_text_deltas=True), or Claude produced only coarse SDK messages. For a repeatable demo, the streaming example sets checkpoint_config={"cache": False}.

“Can this resume a half-finished Claude invocation?”

Not by itself. If the invocation does not complete, Kitaru has no completed checkpoint output to reuse. Claude session/session-store features may help with conversation recovery at the Claude layer, but the Kitaru adapter boundary is still one completed invocation.

Last updated

Was this helpful?