Sandbox

Run the agent's shell commands inside a Docker sandbox, so a mistaken command hits a throwaway container instead of your host.

Stage 1 made a crash survivable. When a turn finished, its output went to durable storage, so a retry picked up the saved work instead of paying for the same model call twice. The agent itself was deliberately plain: a PydanticAI agent with one tool, exec, that ran shell commands directly in the host process.

That host process is what this stage fixes. Running commands on your own machine is fine for a first demo. It turns into a liability the moment the agent does something you did not foresee, and an agent doing something you did not foresee is the normal case, not the rare one.

This walkthrough uses stage_2_sandboxed_exec.py from the runnable Agent Harness Platform example. If you have not cloned the repo yet, start with Get the code on the overview page.

The agent's shell is your shell

Think about what the exec tool actually is. The agent picks a shell command, the host runs it, and the output goes back to the model. Most of the time the command is harmless: an ls to look around, a cat to read a file, a grep to find something. The trouble is that the same opening that lets the agent run ls lets it run rm -rf against the wrong directory, or a script that quietly fills the disk, or a curl ... | bash it talked itself into.

In Stage 1 those commands run with whatever access the worker has. On your laptop that is your home directory and every other project sitting beside this one. On a server it is the production worker's filesystem. The agent reasons about the files it can see, and the files it can see are yours, so one wrong command can delete things that had nothing to do with the task.

Stage 2 gives the agent a room of its own. Each run happens inside a with DockerSandbox(...) block, and every shell command runs inside that container instead of on the host. The container has its own filesystem, its own process table, and its own network view, so a bad command lands on a throwaway container and a mounted workspace rather than reaching back to your machine.

The sandbox boundary: shell commands run inside a throwaway container, not on the host.

One-time setup

Build the sandbox image once:

If you would rather build everything at once, bash setup.sh builds this image along with the proxy and mock images that stages 4 and up use.

Run it

The flow runs two agent.run_sync() turns that share a single with DockerSandbox(...). Turn 1 looks around the machine and cds into /tmp. Turn 2 writes a summary.txt "in the current directory," and because the container's shell is long-lived, turn 1's cd is still in effect, so the file lands at /tmp/summary.txt even though turn 2 never names an absolute path.

DISABLE_CACHE=1 forces every checkpoint to re-execute. Leave it off and a repeat run serves both turns from cache, so you never get to watch the sandbox run a command. While you are learning what the sandbox does, you want the commands to really run.

What you should see in the logs

Every line that starts with [sandbox] is a command running inside the container, never on your host:

The line to watch is cwd=/tmp. Turn 1 runs cd /tmp; when turn 2 starts, ls -la && pwd still reports /tmp. The working directory carried across the turn boundary because both turns spoke to the same shell process inside the container.

You can watch that container live from another terminal while the flow runs:

What just happened?

DockerSandbox is a context manager. On the way in it runs docker run -d to start the container; on the way out it runs docker stop. Each step prints a [sandbox] line and records structured metadata through kitaru.log(), so the same events show up in the dashboard too.
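The enter/exit pairing can be sketched as follows. This is a minimal illustration, not the example's actual DockerSandbox: the class name SketchSandbox, the injectable runner parameter, and the sleep infinity keep-alive command are all assumptions made so the shape is visible and testable without Docker.

```python
import subprocess
from typing import Callable, List, Optional


class SketchSandbox:
    """Hypothetical stand-in for DockerSandbox, showing only the lifecycle."""

    def __init__(self, image: str,
                 runner: Optional[Callable[[List[str]], str]] = None):
        self.image = image
        self.container_id: Optional[str] = None
        # The runner is injectable (an assumption) so the sketch can be
        # exercised without a Docker daemon.
        self._run = runner or self._docker

    @staticmethod
    def _docker(argv: List[str]) -> str:
        done = subprocess.run(argv, check=True, capture_output=True, text=True)
        return done.stdout.strip()

    def __enter__(self) -> "SketchSandbox":
        # `docker run -d` prints the new container's ID on stdout.
        # `sleep infinity` is an assumed command that keeps the container alive.
        self.container_id = self._run(
            ["docker", "run", "-d", self.image, "sleep", "infinity"]
        )
        print(f"[sandbox] started {self.container_id[:12]}")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and on exceptions, so the container is
        # stopped either way.
        self._run(["docker", "stop", self.container_id])
        print(f"[sandbox] stopped {self.container_id[:12]}")
        return False  # never swallow the caller's exception
```

Because the stop lives in __exit__, a flow that raises halfway through a turn still tears the container down.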

The exec tool does not know any container exists. build_tools takes an optional sandbox argument, and when one is passed in, exec sends each command through sandbox.run(command) instead of running it in the host process. Same tool, same agent code. The only thing that changed between Stage 1 and Stage 2 is whether a sandbox was handed in.
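A rough sketch of that dispatch, assuming a hypothetical build_exec_tool helper (the real build_tools signature is not shown here) and a sandbox whose run() result exposes a stdout attribute:

```python
import subprocess
from typing import Any, Callable, Optional


def build_exec_tool(sandbox: Optional[Any] = None) -> Callable[[str], str]:
    """Return an exec function; hypothetical, for illustration only."""

    def exec_command(command: str) -> str:
        if sandbox is not None:
            # Stage 2 path: the command goes to whatever sandbox was handed in.
            return sandbox.run(command).stdout
        # Stage 1 path: the command runs directly on the host.
        done = subprocess.run(command, shell=True, capture_output=True, text=True)
        return done.stdout

    return exec_command
```

The tool body has no Docker-specific code in it. The only branch is whether a sandbox object exists, which is why the agent code stays identical between the two stages.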

Two details make the container behave like a working environment rather than a fresh shell per command.

The first is one long-lived shell. Every run(command) call goes through a single bash --noprofile --norc process inside the container. Shell state, such as a cd, an export, an open file descriptor, or a background job, survives from one exec call to the next, the same way it would in a terminal you left open. The host writes each command into that shell's stdin and reads the output back up to a unique marker line. That is why turn 1's cd /tmp was still in effect for turn 2.
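The marker technique can be tried locally with a few lines of Python. The sketch below is an assumption about the general approach, not the example's code: PersistentShell, its marker format, and the choice to fold stderr into stdout are all illustrative. It starts bash on the local machine; pointing argv at something like docker exec -i <container> bash --noprofile --norc would be the container-shaped variant.

```python
import subprocess
import uuid


class PersistentShell:
    """Illustrative long-lived shell; not the example's actual implementation."""

    def __init__(self, argv=("bash", "--noprofile", "--norc")):
        # One bash process stays alive for the lifetime of this object.
        self.proc = subprocess.Popen(
            list(argv),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            text=True,
            bufsize=1,
        )

    def run(self, command):
        """Run one command and return (exit_code, output)."""
        marker = f"__DONE_{uuid.uuid4().hex}__"
        # After the command, print the marker and the command's exit status
        # on a line of their own so the reader knows where the output ends.
        self.proc.stdin.write(f"{command}\nprintf '\\n%s %s\\n' {marker} $?\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("shell exited before the marker appeared")
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                break
            lines.append(line)
        output = "".join(lines)
        # Drop the single newline that printf added in front of the marker.
        return exit_code, (output[:-1] if output.endswith("\n") else output)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Calling run("cd /") and then run("pwd") on the same object reports /, because both calls reach the same bash process. A fresh random marker per call keeps ordinary command output from being mistaken for the end-of-output line. One limitation of this sketch: a command that reads from stdin would consume the marker line itself.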

The second is a workspace that outlives the container. A named volume, workspace_<execution_id>, is mounted at /workspace. Files written there survive the container being stopped and started again, which is what lets a flow pause and resume later without losing its working files.
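As a small illustration of why the volume survives, the mount can be expressed as part of the docker run arguments. The helper name and the image name below are hypothetical; the -v <volume>:<path> form is standard Docker named-volume syntax, and Docker creates the volume on first use if it does not exist yet.

```python
from typing import List


def workspace_run_argv(image: str, execution_id: str) -> List[str]:
    """Build `docker run` arguments that mount the per-execution volume.

    Hypothetical helper for illustration; not taken from the example code.
    """
    volume = f"workspace_{execution_id}"
    return ["docker", "run", "-d", "-v", f"{volume}:/workspace", image]


# Two containers started for the same execution ID mount the same volume,
# so files under /workspace outlive either container.
first = workspace_run_argv("sandbox-image", "exec42")
second = workspace_run_argv("sandbox-image", "exec42")
```

The volume name depends only on the execution ID, not on the container, which is what ties a resumed flow back to its earlier files.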

Try one small change

Stage 1's durability did not disappear when the sandbox arrived. This time leave DISABLE_CACHE off so the cache is live, and make the flow crash between the two turns:

Turn 1 runs its commands in the container and gets checkpointed, then the flow raises before turn 2 starts. Run it once more with no flags, and turn 1's saved output comes back from cache while turn 2 runs fresh.

There is one important difference from Stage 1: cached shell turns do not replay their side effects. The saved output comes back, but the earlier cd /tmp or file writes do not run again. If the next live turn needs filesystem state, put it in /workspace or write it as its own checkpointed value. The diagram and warning below spell this out because it is exactly the kind of detail that matters when agents start touching real files.

On replay a cached shell turn returns its saved output but does not re-run its side effects.
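The same effect can be reproduced in a toy model. Everything here is a stand-in: checkpoint is a hypothetical in-memory cache rather than the real durable checkpoint, and FakeShell only understands cd and pwd.

```python
cache = {}


def checkpoint(name, fn):
    """Hypothetical checkpoint: run fn once, then replay the saved value."""
    if name not in cache:
        cache[name] = fn()
    return cache[name]


class FakeShell:
    """Toy shell whose only state is the working directory."""

    def __init__(self):
        self.cwd = "/"

    def run(self, command):
        if command.startswith("cd "):
            self.cwd = command[3:]
            return ""
        if command == "pwd":
            return self.cwd
        raise ValueError(f"unsupported command: {command}")


def turn_1(shell):
    shell.run("cd /tmp")
    return shell.run("pwd")


# First run: the turn really executes and its output is saved.
first_shell = FakeShell()
first_output = checkpoint("turn_1", lambda: turn_1(first_shell))

# Second run, fresh container: the output is replayed, the cd is not.
second_shell = FakeShell()
replayed_output = checkpoint("turn_1", lambda: turn_1(second_shell))

print(replayed_output)   # /tmp
print(second_shell.cwd)  # /
```

The replayed output still says /tmp, but the new shell never moved. Any later turn that trusts the cached text over the live filesystem will be working from a stale picture.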

What's simplified for the tutorial

Docker here is a workspace boundary, not a wall against hostile code. It gives a real filesystem, process, and network boundary, which is the right size for the risk in this example: a confused agent that runs a bad rm hits the container and a mounted workspace, not your home directory. It is not built to hold code that is actively trying to escape. The container shares one Linux kernel with the host, so a kernel exploit or a careless mount can still reach across. For genuinely untrusted code you want stronger isolation, which the upgrade path below covers.

Shell state is also not carried across separate runs, and that is deliberate. The bash process dies when the container stops, and the example does not try to replay shell state into the next run. Bash commands have side effects (rm, git push, curl POST, psql -c "INSERT…") that a cd plus declare -px snapshot cannot capture or undo, so "restoring" such a snapshot would silently drop every real mutation while looking like it recreated the session. If the agent needs state to survive across runs, it should write to /workspace (the named volume, which outlives container teardown) or persist specific values with kitaru.save() and reload them with kitaru.load() at the top of the next run.

Production upgrade path

Stage 2 uses Docker, but the exec tool never learns that. It calls sandbox.run(command) and expects an ExecResult back (an exit code plus captured stdout). Anything that satisfies that one method is a valid sandbox, so changing the backend never touches the agent's tool wiring. DockerSandbox is simply the implementation that runs easily on a laptop.
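One way to picture that seam is a structural type with a single method. The sketch below is an assumption about the shape, not the example's definitions: the field names exit_code and stdout, the Sandbox protocol, and DryRunSandbox are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class ExecResult:
    # Field names are assumed; the text only says "an exit code plus
    # captured stdout".
    exit_code: int
    stdout: str


class Sandbox(Protocol):
    """Anything with a matching run() method counts as a sandbox."""

    def run(self, command: str) -> ExecResult: ...


class DryRunSandbox:
    """Hypothetical backend that records commands instead of running them."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def run(self, command: str) -> ExecResult:
        self.commands.append(command)
        return ExecResult(exit_code=0, stdout=f"(dry run) {command}\n")


def exec_tool(sandbox: Sandbox, command: str) -> str:
    # The tool depends only on run() and ExecResult, never on the backend.
    result = sandbox.run(command)
    return f"exit={result.exit_code}\n{result.stdout}"
```

DryRunSandbox does not inherit from anything, yet it type-checks as a Sandbox because the method signature matches. A gVisor, Firecracker, or hosted backend would slot in the same way.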

When you need a real boundary against untrusted code, that seam is where you swap it. A few backends to reach for, roughly in order of how much isolation they buy:

  • gVisor runs each container against a user-space kernel, so syscalls hit gVisor rather than the host kernel directly. Stronger than plain Docker, still container-shaped.

  • Kata Containers wraps each container in a lightweight VM, so a kernel escape lands in the VM instead of on the host.

  • Firecracker microVMs give each run its own minimal VM with a small attack surface (the same technology behind AWS Lambda).

  • Hosted sandboxes like E2B, Modal, or Daytona run commands on isolated infrastructure, so nothing executes on your own machine.

  • WebAssembly isolates by default, though arbitrary bash is not its strong suit, so it fits best when you can constrain what the agent runs.

Credentials are the next gap. If the next tool needs a secret, the obvious shortcut is to put that secret in the worker's environment, which puts it one shell command away from the agent.

Where this leaves us

Stage 2 took the agent's shell off your host and put it in a container. A bad command now ruins a throwaway workspace instead of your home directory, and the durability from Stage 1 still holds around it.

What the agent does inside that sandbox is still decided in Python, though. Its procedure, the steps it follows and the checks it runs, lives in the system prompt as a string literal in the code. The next stage pulls that procedure out into a markdown file an operator can edit, so changing how the agent behaves no longer means editing and redeploying code.
