> For the complete documentation index, see [llms.txt](https://docs.zenml.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.zenml.io/kitaru/adapters/claude-agent-sdk.md).

# Claude Agent SDK

Run a [Claude Agent SDK](https://code.claude.com/docs/en/agent-sdk) invocation as a Kitaru checkpoint so each completed Claude call becomes a durable unit you can replay, skip, and audit inside a larger flow. The Claude SDK gives you Claude Code as a library (read/edit files, run commands, use MCP servers, call tools, follow permissions, keep a session transcript); Kitaru does **not** replace that agent loop, it records the boundary around it:

```
one completed Claude Agent SDK invocation = one Kitaru checkpoint
```

That boundary matters when a Claude call is one step of a multi-step workflow:

```
collect inputs → ask Claude to analyze them → write report → notify reviewer
```

If Claude finishes the analysis and the later `write report` checkpoint fails, replaying the flow reuses the completed Claude result instead of calling Claude again. You keep the Claude session ID, final text, usage/cost metadata, message records, and Kitaru artifacts that explain what happened. One prompt enters the SDK, Claude finishes, and Kitaru stores the completed result and capture envelope as the durable unit.

## The mental model

Think of Claude Agent SDK as the **driver** and Kitaru as the **trip recorder and checkpoint gate**.

Claude still drives inside the invocation:

```
Claude prompt
  └─ Claude Agent SDK / Claude Code loop
      ├─ model calls
      ├─ built-in tools
      ├─ optional Bash commands
      ├─ optional MCP/custom tool calls
      ├─ permissions and hooks
      └─ final ResultMessage with a session_id
```

Kitaru wraps the outside:

```
@kitaru.flow
  └─ claude_summary_claude_invocation checkpoint
      └─ one call to claude_agent_sdk.query(...)
```

So Kitaru can say:

> “This Claude invocation completed. Here is the result and session context. On replay, I can return that completed boundary output without calling Claude again.”

Kitaru cannot honestly say:

> “I can resume from Claude's sixth internal tool call and deterministically avoid every side effect Claude already performed.”

The difference matters. If Claude runs Bash and creates `report.md`, Kitaru can store the final Claude result saying that happened. Kitaru does not automatically snapshot and recreate `report.md` in a fresh workspace. If that file must be a durable workflow output, write it in a Kitaru-owned checkpoint after Claude returns the content.

## What you get

The adapter gives existing Claude Agent SDK users:

* one durable Kitaru checkpoint around each completed Claude invocation
* replay-skip for completed Claude invocations inside larger Kitaru flows
* optional live stream events from `run_stream(...)` / `run_stream_sync(...)` while that invocation checkpoint is running
* a typed `ClaudeRunResult` with final text, session ID, stop reason, turn count, usage, model usage, and cost fields when the SDK reports them
* captured SDK message records
* best-effort local transcript capture when the Claude SDK writes a transcript file where the adapter can find it
* a redacted options manifest for debugging and audit trails
* Kitaru event-log and run-summary artifacts
* an explicit resume path through Claude's native `session_id`

Claude's own docs are still the source of truth for the inner SDK behavior:

* [Agent SDK overview](https://code.claude.com/docs/en/agent-sdk)
* [Python SDK reference](https://code.claude.com/docs/en/agent-sdk/python)
* [Sessions](https://code.claude.com/docs/en/agent-sdk/sessions)
* [Hooks](https://code.claude.com/docs/en/agent-sdk/hooks)
* [Permissions](https://code.claude.com/docs/en/agent-sdk/permissions)
* [File checkpointing](https://code.claude.com/docs/en/agent-sdk/file-checkpointing)

## Install

Add the Claude Agent SDK extra. Include `local` if you want the local Kitaru server and dashboard:

```bash
uv add "kitaru[claude-agent-sdk,local]"
```

Initialize the project once:

```bash
kitaru init
kitaru login        # local server; add a URL for a deployed one
kitaru status
```

For direct Anthropic API usage, set `ANTHROPIC_API_KEY` before making a real Claude call:

```bash
export ANTHROPIC_API_KEY='<your-anthropic-api-key>'
```

The Claude SDK also supports Bedrock and Vertex modes when their provider-specific environment is configured.

{% hint style="info" %}
Migrating an existing Claude Agent SDK project? The [`zenml-io/kitaru-skills`](https://github.com/zenml-io/kitaru-skills) package includes `/kitaru:kitaru-claude-agent-sdk-migration` for wrapping one Claude invocation checkpoint while keeping Claude-owned tools, Bash, MCP, sessions, and workspace-file caveats explicit. See [Agent Skills](/kitaru/agent-native/claude-code-skill.md).
{% endhint %}

## Minimal flow

This example keeps Claude's tools disabled so the first run is easy to reason about: one prompt goes in, one completed Claude result comes back, and Kitaru stores that completed invocation.

```python
from pathlib import Path

from claude_agent_sdk import ClaudeAgentOptions
from kitaru import flow
from kitaru.adapters.claude_agent_sdk import (
    ClaudeRunRequest,
    ClaudeRunResult,
    KitaruClaudeRunner,
)

runner = KitaruClaudeRunner(
    name="claude_summary",
    options_factory=lambda request: ClaudeAgentOptions(
        # Empty allowed_tools keeps this first example non-destructive.
        allowed_tools=[],
        cwd=request.cwd,
        resume=request.resume_session_id,
        max_turns=request.max_turns,
    ),
)

@flow
def summarize(prompt: str) -> ClaudeRunResult:
    request = ClaudeRunRequest.start(
        prompt,
        cwd=str(Path.cwd()),
        max_turns=1,
    )
    return runner.run_sync(request)

result = summarize.run("Explain Kitaru checkpoints in five sentences.").wait()
print(result.final_text)
print(result.session_id)
```

The checkpoint name is derived from the runner name. In this example, the adapter-created checkpoint is named:

```
claude_summary_claude_invocation
```

## How a run works, step by step

When `runner.run_sync(ClaudeRunRequest.start(...))` executes inside a Kitaru flow, this is the concrete sequence:

```
1. Kitaru opens one synthetic checkpoint.
2. The adapter builds ClaudeAgentOptions from options_factory(request).
3. The adapter calls claude_agent_sdk.query(prompt=..., options=...).
4. The adapter drains the SDK messages until the final ResultMessage arrives.
5. It extracts final text, session_id, usage, cost, stop reason, and turn count.
6. It tries to locate the local Claude transcript file for that session.
7. It saves configured artifacts: messages, transcript, manifest, output, usage,
   event log, and run summary.
8. It returns ClaudeRunResult as the checkpoint output.
```

On replay, if this checkpoint is already complete and cache/replay rules allow Kitaru to reuse it, Kitaru serves the saved `ClaudeRunResult`. Claude is not called again for that completed invocation.

## Requests, sessions, and resume

Use `ClaudeRunRequest.start(...)` for a fresh Claude SDK session:

```python
request = ClaudeRunRequest.start(
    "Review this policy for missing controls.",
    cwd="/path/to/project",
    max_turns=3,
    metadata={"document": "it-policy"},
)
```

Use `ClaudeRunRequest.resume(...)` when you want the Claude SDK to continue from a previous Claude `session_id`:

```python
request = ClaudeRunRequest.resume(
    "Now explain the highest-risk finding.",
    session_id=previous_result.session_id,
    cwd="/path/to/project",
    max_turns=2,
)
```

If the prompt comes from a previous checkpoint or from a flow-body `kitaru.llm()` call, remember that the value in the flow body is a Kitaru checkpoint output handle, not the concrete string yet. Load it before building a Claude request when Claude needs the actual text:

```python
prompt_text = render_prompt(topic).load()
request = ClaudeRunRequest.start(prompt_text)
```

If you want to preserve Kitaru's durable data edge instead, pass the original handle into a downstream `@checkpoint` and build the `ClaudeRunRequest` inside that checkpoint.

This uses Claude's native session mechanism. Kitaru records the session ID and passes it back through `ClaudeAgentOptions(resume=...)` when you resume.

Session resume and Kitaru replay are related but different:

| Thing               | What it means                                                                             |
| ------------------- | ----------------------------------------------------------------------------------------- |
| Claude `session_id` | Claude's conversation/session handle. Use it to continue a Claude SDK session.            |
| Kitaru checkpoint   | Kitaru's durable workflow boundary. Use it to skip or replay completed workflow work.     |
| Claude transcript   | The conversation record written by Claude SDK when available. Useful for audit/debugging. |
| Workspace files     | Real files on disk. Kitaru does not snapshot them in this adapter.                        |

A practical consequence: if a Claude invocation fails halfway through, Kitaru has no completed invocation result to reuse. Retrying that checkpoint starts the Claude invocation again. If you need stronger mid-invocation transcript persistence, look at Claude's session and session-store features in the official SDK docs; this adapter's Kitaru boundary is the completed invocation.

On Kubernetes (or any distributed orchestrator), treat local Claude transcript files and workspace files as pod-local best-effort state. A later checkpoint or replay may run in a different pod with a different filesystem. The durable part is the completed Kitaru checkpoint output/artifacts that Kitaru persists; local Claude JSONL transcript files are only reliably reusable across pods when you back them with shared persistent storage and/or a Claude session store.

## Result shape

`KitaruClaudeRunner.run_sync(...)` returns a `ClaudeRunResult` with fields such as:

* `final_text`
* `session_id`
* `transcript_path`
* `usage`
* `cost_usd`
* `model_usage`
* `stop_reason`
* `num_turns`
* artifact names for captured messages, transcript, output, usage, event log, and run summary
* `warnings` for best-effort capture gaps, such as a missing transcript file or a non-fatal artifact/event/log persistence failure

Failed Claude invocations raise an exception instead of returning a `ClaudeRunResult(status="failed")`. If the failure happens inside a Kitaru checkpoint, the adapter still records best-effort failure metadata before the exception propagates.

## Live streaming with Kitaru durability

Use `run_stream(...)` / `run_stream_sync(...)` when you want to watch a Claude invocation while the Kitaru checkpoint is still running. The durable boundary is unchanged: Kitaru still saves one final `ClaudeRunResult` for one Claude SDK query.

The concrete story is:

```
Kitaru opens claude_summary_claude_invocation
  -> Claude SDK sends stream updates while it works
  -> Kitaru publishes best-effort claude_agent_sdk.stream.* live events
  -> Claude finishes with a ResultMessage
  -> Kitaru saves the final ClaudeRunResult as the checkpoint output
```

Async flow:

```python
from kitaru import flow
from kitaru.adapters.claude_agent_sdk import ClaudeRunRequest, KitaruClaudeRunner

runner = KitaruClaudeRunner(name="claude_summary")

@flow
async def summarize(prompt: str) -> str:
    result = await runner.run_stream(ClaudeRunRequest.start(prompt, max_turns=1))
    return result.final_text or ""
```

Sync flow:

```python
from kitaru import flow
from kitaru.adapters.claude_agent_sdk import ClaudeRunRequest, KitaruClaudeRunner

runner = KitaruClaudeRunner(name="claude_summary")

@flow
def summarize(prompt: str) -> str:
    result = runner.run_stream_sync(ClaudeRunRequest.start(prompt, max_turns=1))
    return result.final_text or ""
```

The Claude stream event kinds are:

* `claude_agent_sdk.stream.started`
* `claude_agent_sdk.stream.event`
* `claude_agent_sdk.stream.completed`
* `claude_agent_sdk.stream.failed`

You can import the constants instead of typing the strings yourself:

```python
from kitaru.adapters.claude_agent_sdk import (
    CLAUDE_STREAM_EVENT_KINDS,
    CLAUDE_STREAM_TERMINAL_EVENT_KINDS,
)
```

To watch the live events, submit the flow and read execution events from the client while `.wait()` is still pending:

```python
import threading

from kitaru.adapters.claude_agent_sdk import (
    CLAUDE_STREAM_EVENT_KINDS,
    CLAUDE_STREAM_TERMINAL_EVENT_KINDS,
)
from kitaru.client import KitaruClient


def watch_claude_stream(exec_id: str, stop_event: threading.Event) -> None:
    try:
        for event in KitaruClient().executions.events(
            exec_id,
            kinds=list(CLAUDE_STREAM_EVENT_KINDS),
        ):
            if stop_event.is_set():
                return
            data = event.payload.get("data", {})
            data = data if isinstance(data, dict) else {}
            print(data.get("display", event.kind))
            if event.kind in CLAUDE_STREAM_TERMINAL_EVENT_KINDS:
                return
    except Exception as exc:
        print(f"Live event watching unavailable; reading saved result instead: {exc}")


handle = summarize.run("Explain the project in one paragraph.", cache=False)
stop_watching = threading.Event()
watcher = threading.Thread(
    target=watch_claude_stream,
    args=(handle.exec_id, stop_watching),
    daemon=True,
)
watcher.start()

try:
    result = handle.wait()
finally:
    stop_watching.set()
    watcher.join(timeout=1.0)

if watcher.is_alive():
    print("Live watcher is still open; showing the durable result now.")
```

The important shape is: start the optional watcher in the background, keep `.wait()` on the main path, and stop the watcher after the durable result is available. If the active backend cannot publish or watch live events, or if the watcher misses a terminal event, the Claude invocation can still finish and `.wait()` can still return the saved `ClaudeRunResult`.

The live payloads are intentionally smaller than the durable artifacts. By default, they do not include text deltas, the raw prompt, full SDK options, raw SDK stream events, full tool input JSON, full assistant messages, final result text, or structured output. If you deliberately want clipped live text deltas, opt in with `ClaudeCapturePolicy(include_stream_text_deltas=True)`. If you need the durable conversation record, inspect the final `ClaudeRunResult` and captured artifacts after the checkpoint completes.

A few gotchas matter:

* Live events are best effort. Missing stream events do **not** mean Claude failed; check the final `ClaudeRunResult`.
* Replay may emit stream events again if the stream checkpoint body re-executes. For example, if a later checkpoint crashes, replay may call Claude again and send new live events for the retried invocation.
* Stream calls use a separate cache surface from non-stream calls, so a cached `run_sync(...)` result does not silently satisfy `run_stream_sync(...)`. A repeated stream call can still hit the stream cache, though. When that happens, Kitaru reuses the saved final `ClaudeRunResult`, the checkpoint body is skipped, and no fresh Claude stream events may appear.
* `checkpoint_strategy="invocation"` is still the only Claude strategy. The stream updates are progress reports from inside that one invocation boundary; they are not per-tool or per-token replay checkpoints.
* If you set `allow_direct_execution_inside_checkpoint=True`, stream events are published on the outer checkpoint's live-event lane. The replay warning still applies: replaying that outer checkpoint can call Claude again and duplicate tool actions, file edits, or API cost.
* Claude's own file checkpointing remains separate from Kitaru checkpoints. A Claude file checkpoint can help Claude rewind supported file edits inside a session; it is not the Kitaru checkpoint output and it is not a workspace snapshot.

The runnable version lives at [`examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_streaming.py`](https://github.com/zenml-io/kitaru/blob/develop/examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_streaming.py). It uses `allowed_tools=[]`, `max_turns=1`, disables checkpoint caching for the demo, watches `CLAUDE_STREAM_EVENT_KINDS`, and then prints the final durable `ClaudeRunResult`.

## Capture policy

By default, the adapter captures the boundary data that is useful for replay inspection and audits:

* prompt and SDK message records
* best-effort local transcript JSONL payload
* redacted options manifest
* final output
* usage and estimated cost information when the SDK reports it or your calculator provides it
* one invocation event and one run summary

## Usage and cost statistics

Each successful Claude invocation logs one canonical `llm_usage_v1` record. The record uses the adapter run label as its stable identity and includes the SDK `usage` payload when Claude reports one.

When the Claude SDK reports `total_cost_usd`, Kitaru records it as `estimated_cost_usd`, not `actual_cost_usd`. The SDK value is useful for spend dashboards, but it is still a calculated SDK value rather than a provider invoice line. If the SDK does not report a cost, a `cost_calculator=` you pass to `KitaruClaudeRunner` takes priority; Kitaru calls it with a `ClaudeUsageSummary` and records the returned non-negative number as an estimated USD cost. Without an SDK estimate or user calculator, Kitaru estimates with `genai-prices` when the usage points to one known Anthropic model.

```python
from kitaru.adapters.claude_agent_sdk import ClaudeUsageSummary, KitaruClaudeRunner


def calculate_claude_cost(usage: ClaudeUsageSummary) -> float | None:
    if usage.input_tokens is None or usage.output_tokens is None:
        return None
    # Replace these example rates with your own contract or billing source.
    return (usage.input_tokens / 1_000_000 * 3.0) + (
        usage.output_tokens / 1_000_000 * 15.0
    )


runner = KitaruClaudeRunner(
    name="claude_summary",
    cost_calculator=calculate_claude_cost,
)
```

The fallback `genai-prices` estimate is still an observability estimate, not a provider invoice. If Claude usage spans multiple models or does not identify the model, Kitaru records tokens only instead of guessing.

Some Claude SDK results expose `model_usage` instead of the top-level `usage` payload. In that case, Kitaru uses `model_usage` as a fallback for the canonical usage record. If both are present, Kitaru uses `usage` and does not add `model_usage` on top; otherwise the same tokens could be counted twice.

The canonical record is independent of the durable adapter event log. Turning `emit_events=False` suppresses the invocation event and run summary artifacts, but it does not stop the lightweight LLM usage metadata record. Set `save_usage=False` when you do not want Claude usage persisted; that disables both the separate usage artifact and the canonical invocation record used for execution-level LLM summaries.

Kitaru normally rolls these records into the execution-level usage summary when the execution finishes. `FlowHandle.wait()` and `FlowHandle.get()` can populate missing summaries for older executions or executions where the finish-time summary was not written. If Claude reports usage but not cost, the usage record keeps token counts and leaves cost fields empty unless you pass a calculator.

You can reduce what is stored with `ClaudeCapturePolicy`:

```python
from kitaru.adapters.claude_agent_sdk import ClaudeCapturePolicy, KitaruClaudeRunner

runner = KitaruClaudeRunner(
    name="private_claude_run",
    capture=ClaudeCapturePolicy(
        save_prompt=False,
        save_messages=False,
        save_transcript_file=False,
        save_usage=True,
    ),
)
```

Treat messages and transcripts as conversation data. They may contain prompts, retrieved document snippets, tool arguments, command output, and model output. The options manifest is redacted by default, including common secret-bearing mapping keys, key/value sequence pairs such as `[("Authorization", "Bearer ...")]`, `[("x-api-key", "...")]`, cookie/env pairs, and env-list entries shaped like `{"name": "ANTHROPIC_API_KEY", "value": "..."}`. Message/transcript artifacts are not a secret store; turn them off when the conversation itself is sensitive.

Artifact, event-log, run-summary, and metadata persistence are best-effort by default. That matters for replay safety: if Claude has already completed and a later Kitaru save/log call fails, the adapter does not fail the completed Claude call just because observability capture had a problem. Instead, the returned `ClaudeRunResult.warnings` and `ClaudeRunResult.metadata` describe what failed, and failed artifact-name fields are left empty.

`ClaudeCapturePolicy.emit_events` controls this durable adapter event log and run summary capture. It is not the switch for live `claude_agent_sdk.stream.*` publishing. Live stream publishing happens only when you call `run_stream(...)` / `run_stream_sync(...)`, and it remains best-effort progress rather than durable transcript state.

If you want observability persistence to be fail-fast, opt in explicitly:

```python
runner = KitaruClaudeRunner(
    name="strict_claude_run",
    capture=ClaudeCapturePolicy(
        fail_on_artifact_capture_error=True,
        fail_on_event_persistence_error=True,
    ),
)
```

Even in strict mode, a failed attempt to save failure metadata does not hide the original Claude SDK error.

## Options factory

Prefer `options_factory` over passing one static options object when request fields should affect the Claude SDK call.

```python
runner = KitaruClaudeRunner(
    name="claude_project_agent",
    options_factory=lambda request: ClaudeAgentOptions(
        cwd=request.cwd,
        resume=request.resume_session_id,
        max_turns=request.max_turns,
        allowed_tools=[],
        setting_sources=["project"],
    ),
)
```

Why this matters:

* `cwd` controls which project directory Claude sees.
* `resume_session_id` must become `ClaudeAgentOptions(resume=...)` for resumed requests.
* `max_turns` is often different per call.
* MCP servers, hooks, permission callbacks, and session stores can be live Python objects. Building fresh options per request avoids mutating shared state.

If you pass a static `options=` object and then send a request with `cwd`, `resume_session_id`, or `max_turns`, the adapter raises instead of guessing how to mutate your options.

## Checkpoint strategy

The Claude Agent SDK adapter currently supports one strategy:

```python
KitaruClaudeRunner(name="claude_reviewer", checkpoint_strategy="invocation")
```

`"invocation"` is also the default. It is the only replay-honest boundary the Claude SDK exposes to Kitaru today: one Claude query goes in, one durable result comes out. Strategies such as `"calls"`, `"runner_call"`, `"model_call"`, and `"tool_call"` are rejected on purpose because they would imply granular durability that this adapter does not provide.

`checkpoint_config=` accepts the same small checkpoint knobs used by other adapter-created checkpoints:

```python
runner = KitaruClaudeRunner(
    name="claude_reviewer",
    checkpoint_config={"cache": True, "retries": 1},
)
```

`runtime="isolated"` is not supported for adapter-managed Claude checkpoints. Claude SDK options can contain live process objects such as MCP servers, hooks, and callbacks, and those are not reconstructible across Kitaru's isolated runtime boundary yet.

### Calling from inside an existing checkpoint

By default, `runner.run(...)` and `runner.run_sync(...)` reject calls made from inside an existing Kitaru checkpoint. The reason is concrete: Kitaru cannot open the adapter-created Claude invocation checkpoint inside another checkpoint. If the adapter silently called Claude directly there, replaying the outer checkpoint could call Claude again and duplicate tool actions, file edits, or API cost.

The recommended pattern is:

```python
@flow
def review_flow(prompt: str) -> ClaudeRunResult:
    return runner.run_sync(ClaudeRunRequest.start(prompt, max_turns=1))
```

Use the direct-execution opt-in only when you have accepted that replay risk:

```python
runner = KitaruClaudeRunner(
    name="claude_inside_existing_checkpoint",
    allow_direct_execution_inside_checkpoint=True,
)
```

When this opt-in is used, the returned result includes a warning and `metadata["direct_execution_inside_checkpoint"] = True`.

## Claude tools, MCP, Bash, hooks, and permissions

You can still use Claude Agent SDK features through `ClaudeAgentOptions`:

```python
runner = KitaruClaudeRunner(
    name="claude_code_review",
    options_factory=lambda request: ClaudeAgentOptions(
        cwd=request.cwd,
        allowed_tools=["Read", "Grep", "Glob"],
        disallowed_tools=["Bash"],
        setting_sources=["project"],
    ),
)
```

Kitaru passes those options to Claude. Claude's own runtime decides what tools are available, what permissions apply, and which hooks run. See the official [hooks](https://code.claude.com/docs/en/agent-sdk/hooks) and [permissions](https://code.claude.com/docs/en/agent-sdk/permissions) docs for that layer.

The important Kitaru point is: these features remain **inside** the one Claude invocation checkpoint.

A concrete failure story:

```
1. Claude runs Bash and writes report.md.
2. Claude returns "I wrote report.md".
3. Kitaru stores the invocation result as a checkpoint.
4. Later, the flow crashes in a different checkpoint.
5. You replay from after the Claude checkpoint.
```

On replay, Kitaru can return the saved `ClaudeRunResult` without calling Claude again. But if the replay is running in a fresh workspace, Kitaru does not magically recreate `report.md`, because that file write happened inside Claude's own Bash/tool loop, not in a Kitaru-owned checkpoint. The adapter also does not snapshot the working directory before or after Claude runs.

If a side effect must be durable, make it a Kitaru-owned step:

```python
from pathlib import Path

import kitaru

@kitaru.checkpoint
def write_report(text: str, path: str) -> str:
    Path(path).write_text(text)
    return path

@kitaru.flow
def report_flow(prompt: str) -> str:
    claude_result = runner.run_sync(ClaudeRunRequest.start(prompt, max_turns=1))
    return write_report(claude_result.final_text or "", "report.md")
```

That way the durable file write is visible to Kitaru as its own checkpoint.

## Kitaru-owned sandbox command tool

Sometimes you do want Claude to run a command, but you want that command to go through the sandbox on your current Kitaru stack instead of through Claude's built-in `Bash` tool. Use Kitaru's Claude Agent SDK MCP helper for that case.

The concrete path is:

```
Claude prompt
  -> Claude Agent SDK invocation
  -> Claude calls mcp__kitaru__run_command
  -> Kitaru runs kitaru.run_sandbox_command(...) in your current stack's sandbox
  -> Claude receives stdout, stderr, exit_code, and cleanup metadata
  -> Kitaru stores one completed ClaudeRunResult for the whole invocation
```

Here is the supported registration shape:

```python
from claude_agent_sdk import ClaudeAgentOptions
from kitaru.adapters.claude_agent_sdk import (
    KITARU_SANDBOX_COMMAND_ALLOWED_TOOL_NAME,
    KitaruClaudeRunner,
    create_kitaru_sandbox_mcp_server,
)

sandbox_server = create_kitaru_sandbox_mcp_server()

runner = KitaruClaudeRunner(
    name="claude_sandboxed",
    options_factory=lambda request: ClaudeAgentOptions(
        cwd=request.cwd,
        resume=request.resume_session_id,
        max_turns=request.max_turns,
        mcp_servers={"kitaru": sandbox_server},
        strict_mcp_config=True,
        tools=[],
        permission_mode="dontAsk",
        disallowed_tools=["Bash"],
        allowed_tools=[KITARU_SANDBOX_COMMAND_ALLOWED_TOOL_NAME],
    ),
)
```

Do not pass the Kitaru MCP tool through `ClaudeAgentOptions(tools=[...])`. Claude Agent SDK uses `tools=` for built-in tool names and presets. Custom tools created with `tool(...)` are registered through `create_sdk_mcp_server(...)` and `ClaudeAgentOptions(mcp_servers=...)`, which is what `create_kitaru_sandbox_mcp_server()` builds for you.

If you customize the server or tool name, build the matching allowed-tools entry with `allowed_tool_name(server_name, tool_name)` instead of hand-writing the `mcp__...` string:

```python
from kitaru.adapters.claude_agent_sdk import allowed_tool_name
```

This helper does **not** transparently redirect Claude's built-in `Bash`. If you want command execution to be Kitaru-owned, disable Claude's built-in tools with `tools=[]`, pre-approve the Kitaru MCP tool with `allowed_tools=[...]`, and use `permission_mode="dontAsk"` so any non-approved tool request is denied instead of prompting. Claude's own `ClaudeAgentOptions(sandbox=...)` is also separate: it configures Claude-owned Bash sandboxing, not Kitaru stack sandbox execution.

The MCP tool accepts these inputs from Claude:

* `command`: a shell command string or argv-style list; Kitaru bounds command length before sending it to the sandbox
* `cwd`: optional working directory inside the sandbox; Kitaru bounds its length before sending it to the sandbox
* `env`: optional command environment; Kitaru bounds the number and size of variables so one MCP call cannot send an enormous env payload
* `max_chars`: optional per-stream output collection limit; Claude may lower the per-call limit but cannot raise it above the helper's configured `default_max_chars`
* `cleanup`: optional session cleanup policy, either `"destroy"` or `"close"`

The tool output is JSON text. A successful sandbox call returns `status="completed"`, the command, cwd, stdout, stderr, exit code, truncation flags, stack identity, sandbox identity, session ID, and cleanup status. The MCP tool advertises Claude's `maxResultSizeChars` annotation so Claude can receive large sandbox outputs without applying its smaller default MCP truncation path. Claude Code still caps that annotation at 500,000 characters, so Kitaru's Claude MCP helper uses a lower `default_max_chars` than raw `kitaru.run_sandbox_command(...)`: stdout, stderr, and the surrounding JSON fields must all fit inside Claude's result-size ceiling. Kitaru checks the exact JSON text that will be returned to Claude; if JSON escaping makes the result too large, it trims stdout and/or stderr and flips the corresponding truncation flag before Claude receives it. If you configure a larger `default_max_chars` than the helper can safely advertise, Kitaru raises at setup time instead of returning output that says `stdout_truncated=false` while Claude only received part of the JSON. A non-zero `exit_code` is still `status="completed"`; the sandbox ran the command and returned process data, so Claude can decide what to do next.

If Kitaru cannot run the command, the tool returns `status="failed"` with an error object containing the exception type, a category such as `usage`, `state`, `feature_not_available`, `backend`, or `runtime`, and a message.

Your current Kitaru stack must have exactly one sandbox component. If it has none, Kitaru refuses to run the command. If it has more than one, Kitaru refuses to guess. That avoids the bad outcome where Claude asks for a simple inspection command and Kitaru silently runs it in the wrong sandbox.

{% hint style="warning" %}
Kitaru redacts non-trivial `env` values, plus values from secret-like keys such as `TOKEN`, `API_KEY`, or `PASSWORD`, from the returned tool output. Claude may still record the tool **input** in its own messages or transcript. Avoid passing secrets through `env` unless you are comfortable with those values appearing in Claude-side records.
{% endhint %}

This still uses the adapter's invocation checkpoint model. If the whole Claude invocation is re-run before Kitaru has a completed checkpoint to reuse, Claude may call `mcp__kitaru__run_command` again. Treat commands with side effects the same way you would treat any command that might be retried: make them safe to repeat, or move the side effect into a separate Kitaru checkpoint after Claude returns.

A runnable showcase lives in the repository at `examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_sandbox_tool.py`. By default, that showcase passes a small temporary directory as Claude's own working directory, uses Claude Code's `--bare` mode, selects the tool-capable `sonnet` model alias, and sets a small Claude SDK budget cap. The sandbox command still runs through your current stack's sandbox. This default avoids the bad surprise where a tiny `python --version` demo causes Claude Code to load a large repository context. Pass `--claude-cwd /path/to/project` only when you actually want Claude to see that project, and pass `--model <model>` when you want a different Claude model.

## Claude file checkpointing is different

Claude Agent SDK has its own [file checkpointing](https://code.claude.com/docs/en/agent-sdk/file-checkpointing) feature for rewinding certain file changes made by Claude's built-in file-edit tools. That is useful, but it is not the same thing as a Kitaru checkpoint.

Keep the two ideas separate:

| Feature                   | Owned by                     | What it helps with                                           |
| ------------------------- | ---------------------------- | ------------------------------------------------------------ |
| Kitaru checkpoint         | Kitaru                       | Replay/skip completed workflow work and store typed outputs. |
| Claude session            | Claude SDK                   | Continue a Claude conversation from a session ID.            |
| Claude file checkpointing | Claude SDK                   | Rewind supported Claude file-edit tool changes in a session. |
| Workspace snapshot        | Not provided by this adapter | Recreate arbitrary files/process state after a crash.        |

The adapter records what it can observe at the invocation boundary. It does not turn Claude file checkpointing into Kitaru checkpointing, and it does not provide a general workspace snapshot system.

## Recommended durability pattern

Put the Claude runner call directly in the flow body so the adapter can create its own checkpoint around the invocation. Put side effects that must be durable in separate Kitaru checkpoints after Claude returns.

That gives you a concrete sequence like this:

```
flow body asks Claude
  -> adapter-created Claude invocation checkpoint completes
  -> Kitaru-owned write_report checkpoint writes a durable file
  -> Kitaru-owned notify_reviewer checkpoint sends a notification
```

If a later checkpoint fails, Kitaru can reuse the completed Claude invocation result instead of calling Claude again.

## Runnable examples

Run the educational non-streaming integration example:

```bash
uv sync --extra local --extra claude-agent-sdk
uv run kitaru init
export ANTHROPIC_API_KEY='<your-anthropic-api-key>'
uv run python examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_adapter.py
```

To inspect the example without making a Claude API call:

```bash
uv run python examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_adapter.py --help
```

The example prints the final text, session ID, usage/cost details when reported, and Kitaru artifact names. It uses `allowed_tools=[]` and `max_turns=1` so the first run teaches the adapter boundary without introducing tool side effects.

Run the live streaming example when you have a REST-backed stream-event backend:

```bash
uv run python examples/integrations/claude_agent_sdk_agent/claude_agent_sdk_streaming.py
```

That script submits the flow, starts a watcher for `claude_agent_sdk.stream.*` events, and then prints the final durable `ClaudeRunResult`. If live watching is not available on your backend, the `.wait()` path still demonstrates the saved result.

## Larger Claude example

For a richer workflow, see the compliance-review example:

```bash
uv run python examples/end_to_end/compliance_review/stage_1_single_turn.py
```

That example shows staged audit checkpoints, artifact persistence, and wait/resume around Claude turns. It uses the same high-level boundary: one Claude SDK invocation is one Kitaru checkpoint. It also has a custom transcript materializer for its multi-turn remote-stack story. That extra materializer is example-specific; the generic adapter still does not claim granular replay of Claude-internal side effects.

## Troubleshooting

### “Why do I need `@kitaru.flow`?”

The adapter creates a Kitaru checkpoint. Checkpoints need a flow execution to belong to. Wrap Claude calls in `@kitaru.flow` so Kitaru can record and replay that boundary.

### “Why did `checkpoint_strategy="calls"` fail?”

Because Claude's Python SDK does not expose replay-safe Python call bodies for each internal model/tool/Bash/MCP action. The adapter would be lying if it called hook observations “call checkpoints.” Use `checkpoint_strategy="invocation"` and move durable side effects into explicit Kitaru checkpoints.

### “Why did calling the runner from my checkpoint fail?”

The adapter refused the call because it could not create its own Claude invocation checkpoint inside your existing checkpoint. Move the runner call to the flow body when possible. If you deliberately want Claude to run directly inside the existing checkpoint, create the runner with `allow_direct_execution_inside_checkpoint=True` and treat replay of the outer checkpoint as capable of calling Claude again.

### “Why is my transcript artifact missing?”

The adapter always captures SDK-emitted messages when `save_messages=True`. Transcript-file capture is best-effort because Claude owns where and when local JSONL transcripts are written. Missing transcript files create a warning, not a failed run.

### “Why is an artifact name missing even though Claude completed?”

Capture and event persistence are best-effort by default. If Claude completed but saving one artifact or log entry failed, the adapter returns the Claude result and records the persistence problem in `result.warnings` and `result.metadata` instead of turning the completed Claude call into a failure. Use `fail_on_artifact_capture_error=True` or `fail_on_event_persistence_error=True` when you want strict behavior.

### “Why did my stream watcher show no events?”

First check the final `ClaudeRunResult`. Live events are progress only, not the durable source of truth. Common causes are: the active backend does not support live event watching, the checkpoint hit the stream cache and skipped the body, or Claude text deltas were not explicitly enabled with `ClaudeCapturePolicy(include_stream_text_deltas=True)`, or Claude produced only coarse SDK messages. For a repeatable demo, the streaming example sets `checkpoint_config={"cache": False}`.

### “Can this resume a half-finished Claude invocation?”

Not by itself. If the invocation does not complete, Kitaru has no completed checkpoint output to reuse. Claude session/session-store features may help with conversation recovery at the Claude layer, but the Kitaru adapter boundary is still one completed invocation.

## Related guides

<table data-view="cards"><thead><tr><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>Replay and overrides</strong></td><td>Re-run a flow with cached outputs for completed checkpoints</td><td><a href="/pages/ieoi9kJkRqHsTAPEUquq">/pages/ieoi9kJkRqHsTAPEUquq</a></td></tr><tr><td><strong>Artifacts</strong></td><td>Save named outputs from Kitaru-owned checkpoints</td><td><a href="/pages/gZ9fn4QuMS1d9OUqDiuH">/pages/gZ9fn4QuMS1d9OUqDiuH</a></td></tr></tbody></table>


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://docs.zenml.io/kitaru/adapters/claude-agent-sdk.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.