
Execution Management

Inspect execution status, fetch runtime logs, resolve waits, and manage lifecycle actions

KitaruClient is the programmatic API for managing and inspecting executions outside your flow functions.

KitaruClient and the CLI use your current Kitaru connection context. If you want to inspect executions from a deployed Kitaru server, connect first with kitaru login ... or provide KITARU_* connection variables in the current environment.

Create a client

import kitaru

client = kitaru.KitaruClient()

The client uses your current Kitaru connection/project context.

Inspect a single execution

# exec_id is the ID of the execution to inspect (assumed to be defined already)
execution = client.executions.get(exec_id)
print(execution.exec_id)
print(execution.flow_name)
print(execution.status)  # running/waiting/completed/failed/cancelled

if execution.pending_wait:
    print(execution.pending_wait.name, execution.pending_wait.question)

if execution.failure:
    print(execution.failure.origin, execution.failure.message)

Execution details include:

  • start/end timestamps

  • stack name

  • summary metadata

  • checkpoint calls

  • pending wait details (execution.pending_wait)

  • execution failure details (execution.failure) when status is failed

  • checkpoint retry/failure attempt history (checkpoint.attempts)

  • artifacts

  • frozen execution spec (when available)
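As a small illustration of how these fields combine, the helper below builds a one-line summary. It is only a sketch: `describe_execution` is a hypothetical helper (not part of Kitaru), and it reads only the attributes shown in the snippet above. The `SimpleNamespace` object is a stand-in so the example runs without a server.

```python
from types import SimpleNamespace


def describe_execution(execution) -> str:
    """Build a one-line summary from the attributes shown above (hypothetical helper)."""
    parts = [f"{execution.exec_id} [{execution.flow_name}] {execution.status}"]
    if execution.pending_wait:
        wait = execution.pending_wait
        parts.append(f"waiting on {wait.name!r}: {wait.question}")
    if execution.failure:
        failure = execution.failure
        parts.append(f"failed in {failure.origin}: {failure.message}")
    return " | ".join(parts)


# Stand-in object for illustration only; the field values are made up.
demo = SimpleNamespace(
    exec_id="exec-123",
    flow_name="demo_flow",
    status="waiting",
    pending_wait=SimpleNamespace(name="approve", question="Ship it?"),
    failure=None,
)
print(describe_execution(demo))
```

In real use you would pass the object returned by client.executions.get(exec_id) instead of the stand-in.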

List and query executions

Execution statistics

Use execution statistics when you want counts, trends, or health checks without fetching every individual execution first. This is the difference between asking "show me the last 20 executions" and asking "how many executions failed this week?" Kitaru sends the aggregate question to the active Kitaru runtime and returns a small grouped result. Each group always includes an execution count. You can also ask for numeric metrics, such as average duration or the sum of a numeric execution metadata key.
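To make the idea of a "small grouped result" concrete, here is a plain-Python sketch of what grouping by status with an average-duration metric computes. It does not call Kitaru and does not show Kitaru's return type; the record layout, `group_stats`, and the result keys are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up execution records; field names are illustrative only.
records = [
    {"status": "completed", "duration": 12.0},
    {"status": "completed", "duration": 8.0},
    {"status": "failed", "duration": 3.0},
]


def group_stats(rows, key, metric):
    """Group rows by `key`; report a count and the average of `metric` per group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[metric])
    return {
        group: {"count": len(values), f"avg_{metric}": mean(values)}
        for group, values in buckets.items()
    }


stats = group_stats(records, key="status", metric="duration")
print(stats)
```

The point of the statistics surface is that this aggregation happens in the runtime, so the client receives only the grouped rows rather than every execution.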

The CLI exposes the same surface:

Text output is intentionally small:

When you request metrics, text output adds one column per metric:

Supported groupings are:

  • status → public Kitaru status (running, waiting, completed, failed, cancelled)

  • flow → flow_id

  • stack → stack_id

  • tag → tag value

  • time:hour, time:day, time:week, time:month

  • metadata:<key> → the value stored for that execution metadata key

Supported metric sources are:

  • duration

  • step_count

  • cached_step_count

  • output_artifact_count

  • metadata:<key> through a metric spec that sets source="metadata" and metadata_key="<key>"

Supported aggregations are avg, sum, min, and max.

CLI metric specs use this format:

  • <name>:<source>:<avg|sum|min|max> for built-in sources

  • <name>:metadata:<metadata_key>:<avg|sum|min|max> for metadata
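The two spec shapes above can be told apart by their number of colon-separated fields. The parser below is a minimal sketch of that rule, not Kitaru's own parser: `MetricSpec` and `parse_metric_spec` are hypothetical names, and the sketch assumes metadata keys contain no colon.

```python
from dataclasses import dataclass
from typing import Optional

AGGREGATIONS = {"avg", "sum", "min", "max"}
BUILTIN_SOURCES = {"duration", "step_count", "cached_step_count", "output_artifact_count"}


@dataclass(frozen=True)
class MetricSpec:
    name: str
    source: str
    aggregation: str
    metadata_key: Optional[str] = None


def parse_metric_spec(text: str) -> MetricSpec:
    """Parse '<name>:<source>:<agg>' or '<name>:metadata:<key>:<agg>'."""
    parts = text.split(":")
    if len(parts) == 3 and parts[1] in BUILTIN_SOURCES:
        name, source, aggregation = parts
        key = None
    elif len(parts) == 4 and parts[1] == "metadata":
        name, source, key, aggregation = parts
    else:
        raise ValueError(f"unrecognized metric spec: {text!r}")
    if not name or (source == "metadata" and not key):
        raise ValueError(f"metric spec has an empty field: {text!r}")
    if aggregation not in AGGREGATIONS:
        raise ValueError(f"unsupported aggregation in {text!r}")
    return MetricSpec(name, source, aggregation, key)


print(parse_metric_spec("avg_duration:duration:avg"))
print(parse_metric_spec("cost:metadata:kitaru_llm_display_cost_usd_v1:sum"))
```

The metric names `avg_duration` and `cost` are arbitrary labels chosen for this example.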

LLM usage and cost metadata

When an execution makes LLM calls through kitaru.llm() or the supported agent adapters, Kitaru records canonical llm_usage_v1 metadata on the checkpoint that made or reused the provider work. One usage record usually means one provider interaction or one adapter-level graph/agent invocation, depending on which adapter produced it. When FlowHandle.wait() or FlowHandle.get() observes the execution finishing, Kitaru reads those checkpoint records and writes two execution-level views:

  • llm_usage_summary_v1 is the inspection view. kitaru executions get and the Python client parse it into execution.llm_usage_summary. It tells you what happened in one execution. Its usage_record_count, incurred_usage_record_count, and reused_usage_record_count fields count Kitaru usage records, not raw provider API calls.

  • Flat numeric metadata keys such as kitaru_llm_display_cost_usd_v1 and kitaru_llm_total_tokens_v1 are the statistics view. Kitaru execution statistics can sum or average these because they are top-level numbers, not nested objects.

Cost fields are intentionally split:

  • actual_cost_usd means the provider reported a cost. Claude Agent SDK exposes this via total_cost_usd.

  • estimated_cost_usd means Kitaru used an adapter cost calculator. OpenAI Agents and LangGraph can report this when you configure their calculator hook.

  • display_cost_usd uses actual cost for a record when present, otherwise estimated cost. Treat it as observability, not as a billing invoice.

Direct kitaru.llm() records token counts and latency, but it does not invent a cost number. If the provider call does not return a real cost source, cost stays empty and the execution summary increments records_without_cost_count.
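The cost rules above reduce to a simple precedence: actual cost wins, estimated cost is the fallback, and a record with neither is counted rather than priced. The sketch below illustrates that logic on plain dictionaries. It is not Kitaru's implementation; the function names and the record layout are assumptions, and only the field names come from this page.

```python
from typing import Optional


def display_cost_usd(actual: Optional[float], estimated: Optional[float]) -> Optional[float]:
    """Prefer the provider-reported cost; fall back to the estimate; else None."""
    return actual if actual is not None else estimated


def summarize_costs(records):
    """Sum display costs and count records that carry no cost source at all."""
    total = 0.0
    without_cost = 0
    for record in records:
        cost = display_cost_usd(
            record.get("actual_cost_usd"), record.get("estimated_cost_usd")
        )
        if cost is None:
            without_cost += 1
        else:
            total += cost
    return {"display_cost_usd": total, "records_without_cost_count": without_cost}


# Made-up usage records for illustration.
usage_records = [
    {"actual_cost_usd": 0.5, "estimated_cost_usd": 0.25},  # actual wins
    {"estimated_cost_usd": 0.25},                          # estimate used
    {},                                                    # no cost source
]
print(summarize_costs(usage_records))
```

Note that the sketch treats an actual cost of 0.0 as present, so it is not replaced by an estimate.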

Useful statistics queries:

Supported filters are flow, status, stack, tags, and max_groups. Multiple tag filters mean "executions that have all of these tags". When max_groups truncates a time-grouped result, Kitaru keeps the newest time rows and still displays the rows from oldest to newest.
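Two of these rules are easy to misread, so here is a plain-Python sketch of the semantics as described: tag filters are combined with AND, and truncation of time groups keeps the newest buckets while preserving oldest-to-newest display order. Both helper names and the row layout are hypothetical, and the code does not call Kitaru.

```python
def has_all_tags(execution_tags, required_tags):
    """True only when every required tag is present (AND semantics)."""
    return set(required_tags) <= set(execution_tags)


def truncate_time_groups(rows, max_groups):
    """Keep the newest `max_groups` time buckets, returned oldest to newest.

    `rows` is a list of (bucket, count) pairs whose bucket labels sort
    chronologically, such as ISO dates.
    """
    if max_groups <= 0:
        return []
    ordered = sorted(rows, key=lambda row: row[0])
    return ordered[-max_groups:]


rows = [("2024-01-03", 5), ("2024-01-01", 2), ("2024-01-02", 7)]
print(truncate_time_groups(rows, max_groups=2))
print(has_all_tags(["prod", "nightly"], ["prod", "nightly"]))
```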

flow and stack groupings currently return IDs (flow_id and stack_id), not display names. This avoids guessing when a flow or stack has been renamed or deleted. You can still filter by a flow or stack name.

The current statistics surface supports grouping by time and metadata, but not filtering by time range or metadata values yet. If you need "last 7 days" or "only executions where customer_tier=enterprise", fetch/list those executions separately or add a stable tag for that cohort before querying statistics.
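One way to approximate the missing filters is to select the cohort client-side after fetching executions. The sketch below shows only the selection step on plain dictionaries. The `started_at` and `metadata` field names, and the `select_cohort` helper, are assumptions for illustration rather than Kitaru's execution model.

```python
from datetime import datetime, timedelta, timezone


def select_cohort(executions, now, days=7, metadata=None):
    """Keep executions started within `days` of `now` that match all metadata pairs."""
    cutoff = now - timedelta(days=days)
    wanted = metadata or {}
    return [
        execution
        for execution in executions
        if execution["started_at"] >= cutoff
        and all(execution.get("metadata", {}).get(k) == v for k, v in wanted.items())
    ]


# Made-up records; a fixed `now` keeps the example deterministic.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
executions = [
    {"id": "a", "started_at": datetime(2024, 1, 9, tzinfo=timezone.utc),
     "metadata": {"customer_tier": "enterprise"}},
    {"id": "b", "started_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
     "metadata": {"customer_tier": "enterprise"}},
    {"id": "c", "started_at": datetime(2024, 1, 8, tzinfo=timezone.utc),
     "metadata": {"customer_tier": "free"}},
]
cohort = select_cohort(executions, now, days=7, metadata={"customer_tier": "enterprise"})
print([execution["id"] for execution in cohort])
```

For cohorts you query often, a stable tag is likely the simpler option, because tags are already a supported statistics filter.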

Agent and operations summaries should use this same general surface. For example, an assistant can ask for daily volume first, then drill into only the unhealthy cohort:

Fetch runtime logs

Runtime log retrieval requires a server-backed connection. For CLI options, follow mode, grouped output, and retrieval caveats, see View Execution Runtime Logs.

Resolve wait input

On local interactive runs, the runtime prompts for input in the same terminal. For non-interactive or timed-out executions, resolve the pending wait externally:

If the execution does not continue automatically after input (e.g. the original runner already exited), call resume(...):

Retry, replay, and cancel

Execution convenience methods

Execution objects returned by client.executions.get(...) also expose convenience methods that call back into the same client:

These are equivalent to calling client.executions.retry(exec_id) etc. They return a new Execution snapshot rather than mutating the existing object.

Inspect or abort waits programmatically

List all pending wait conditions for an execution:

Abort a pending wait instead of continuing it:

Browse and load artifacts

You can also filter artifact lists:

Manage executions from the CLI

Query executions through MCP

If you want assistant-native tooling (Claude Code, Cursor, etc.), install and run the MCP server:

Then use tool calls like:

  • kitaru_executions_list(status="waiting")

  • kitaru_executions_statistics(group_by=["status"])

  • kitaru_executions_input(exec_id=..., wait=..., value=...) (MCP requires explicit wait)

  • get_execution_logs(exec_id=...)

  • kitaru_artifacts_get(artifact_id=...)

  • kitaru_status()

If the execution does not continue automatically after wait input is resolved (e.g. the original runner already exited), use the CLI or SDK resume(...) call. MCP does not currently expose a separate resume tool.

See the full setup guide at MCP Server.

Try the examples

For the broader catalog, see Examples.
