Containerization
Customize Docker builds to run your pipelines in isolated, well-defined environments.
ZenML executes pipeline steps sequentially in the active Python environment when running locally. However, with remote orchestrators or step operators, ZenML builds Docker images to run your pipeline in an isolated, well-defined environment.
This page explains how ZenML's Docker build process works and how you can customize it to meet your specific requirements.
When a pipeline is run with a remote orchestrator, a Dockerfile is dynamically generated at runtime. It is then used to build the Docker image using the image builder component of your stack. The Dockerfile consists of the following steps:
Starts from a parent image that has ZenML installed. By default, this will use the official ZenML base image for the Python and ZenML version that you're using in the active Python environment.
Installs additional pip dependencies. ZenML automatically detects which integrations are used in your stack and installs the required dependencies.
Optionally copies your source files. Your source files need to be available inside the Docker container so ZenML can execute your step code.
Sets user-defined environment variables.
The process described above is automated by ZenML and covers most basic use cases. The sections below cover the various ways to customize the Docker build process to fit your specific needs.
ZenML uses the following process to decide how to build Docker images:
No `dockerfile` specified: If any of the options regarding requirements, environment variables, or copying files require an image to be built, ZenML will build this image. Otherwise, the `parent_image` will be used to run the pipeline.
`dockerfile` specified: ZenML will first build an image based on the specified Dockerfile. If any additional options regarding requirements, environment variables, or copying files require an image built on top of that, ZenML will build a second image. If not, the image built from the specified Dockerfile will be used to run the pipeline.
Depending on the configuration of your Docker settings, requirements will be installed in the following order (each step is optional):
The packages installed in your local Python environment (if enabled)
The packages required by the stack (unless disabled by setting `install_stack_requirements=False`)
The packages specified via the `required_integrations` attribute
The packages specified via the `requirements` attribute
You can customize Docker builds for your pipelines and steps using the `DockerSettings` class:
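For example (a minimal sketch; the requirement shown is just a placeholder):

```python
from zenml.config import DockerSettings

# All Docker build customizations are expressed through this settings object.
docker_settings = DockerSettings(requirements=["scikit-learn"])
```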
There are multiple ways to supply these settings:
Configuring settings on a pipeline applies them to all steps of that pipeline:
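A sketch of pipeline-level configuration (the pipeline name and requirement are illustrative):

```python
from zenml import pipeline
from zenml.config import DockerSettings

docker_settings = DockerSettings(requirements=["scikit-learn"])

# These Docker settings apply to every step of the pipeline.
@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    ...
```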
For more fine-grained control, configure settings on individual steps. This is particularly useful when different steps have conflicting requirements or when some steps need specialized environments:
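For example (the step name and requirement are illustrative):

```python
from zenml import step
from zenml.config import DockerSettings

# Only this step runs in an image built with the extra requirement.
@step(settings={"docker": DockerSettings(requirements=["torch"])})
def training_step() -> None:
    ...
```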
Define settings in a YAML configuration file for better separation of code and configuration:
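A sketch of what such a configuration file might contain (the keys follow ZenML's settings layout; the values are placeholders):

```yaml
settings:
  docker:
    requirements:
      - scikit-learn
    required_integrations:
      - sklearn
```

You can then apply the file to a pipeline, for example via `with_options(config_path="config.yaml")`.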
You can customize the build process by specifying build options that get passed to the build method of the image builder:
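A sketch, assuming a `build_options` field that is forwarded to the image builder (the available option keys depend on the image builder in use):

```python
from zenml.config import DockerSettings

# These options are forwarded to the image builder's build method.
docker_settings = DockerSettings(
    build_options={"buildargs": {"MY_BUILD_ARG": "value"}}
)
```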
To use a static parent image (e.g., with internal dependencies pre-installed):
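For example (the image name is a placeholder):

```python
from zenml.config import DockerSettings

# Use a pre-built image with internal dependencies as the base.
docker_settings = DockerSettings(parent_image="my-registry.io/image-name:tag")
```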
ZenML will use this image as the base and still perform the following steps:
Install additional pip dependencies
Copy source files (if configured)
Set environment variables
To use the image directly to run your steps without including any code or installing any requirements on top of it, skip the Docker builds by setting `skip_build=True`:
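For example (the image name is a placeholder):

```python
from zenml.config import DockerSettings

# Run steps directly on the parent image; no image is built at all.
docker_settings = DockerSettings(
    parent_image="my-registry.io/image-name:tag",
    skip_build=True,
)
```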
When `skip_build` is enabled, the `parent_image` will be used directly to run the steps of your pipeline without any additional Docker builds on top of it. This means that ZenML will skip all of the following:
Installing packages from your local Python environment
Installing stack requirements
Installing required integrations
Installing specified requirements
Installing apt packages
Including source files in the container
Setting environment variables
This is an advanced feature and may cause unintended behavior when running your pipelines. If you use this, ensure your image contains everything necessary to run your pipeline:
Your stack requirements
Integration requirements
Project-specific requirements
Any system packages
Your project code files (unless a code repository is registered or `allow_download_from_artifact_store` is enabled)
Make sure that Python, `pip` and `zenml` are installed in your image, and that your code is in the `/app` directory set as the active working directory.
Also note that the Docker settings validator will raise an error if you set `skip_build=True` without specifying a `parent_image`. A parent image is required when skipping the build, as it will be used directly to run your pipeline steps.
For greater control, you can specify a custom Dockerfile and build context:
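A sketch using the `dockerfile` and `build_context_root` attributes (both paths are placeholders):

```python
from zenml.config import DockerSettings

docker_settings = DockerSettings(
    dockerfile="/path/to/Dockerfile",
    build_context_root="/path/to/build/context",
)
```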
Here is how the build process looks with a custom Dockerfile:
`Dockerfile` specified: ZenML will first build an image based on the specified `Dockerfile`. If any options regarding requirements, environment variables, or copying files require an additional image built on top of that, ZenML will build a second image. Otherwise, the image built from the specified `Dockerfile` will be used to run the pipeline.
ZenML offers several ways to specify dependencies for your Docker containers:
Replicate Local Environment: This feature allows you to easily replicate your local Python environment in the Docker container, ensuring that your pipeline runs with the same dependencies.
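A sketch, assuming the `replicate_local_python_environment` attribute; how it is specified (a boolean flag or an export method such as `pip_freeze`) varies across ZenML versions, so check the SDK docs for yours:

```python
from zenml.config import DockerSettings

# Export the local environment and install the same packages in the image.
docker_settings = DockerSettings(replicate_local_python_environment=True)
```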
Specify Requirements Directly:
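For example (the package pins are placeholders):

```python
from zenml.config import DockerSettings

docker_settings = DockerSettings(requirements=["torch==2.2.0", "transformers"])
```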
Use Requirements File:
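The `requirements` attribute also accepts a path to a requirements file (the path is a placeholder):

```python
from zenml.config import DockerSettings

docker_settings = DockerSettings(requirements="/path/to/requirements.txt")
```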
Specify ZenML Integrations:
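For example, using the integration name constants that ZenML ships:

```python
from zenml.config import DockerSettings
from zenml.integrations.constants import PYTORCH, SKLEARN

# ZenML installs the pip packages required by these integrations.
docker_settings = DockerSettings(required_integrations=[PYTORCH, SKLEARN])
```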
Control Stack Requirements: By default, ZenML installs the requirements needed by your active stack. You can disable this behavior if needed:
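For example:

```python
from zenml.config import DockerSettings

# Skip installing the requirements of the active stack's components.
docker_settings = DockerSettings(install_stack_requirements=False)
```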
Depending on the options specified in your Docker settings, ZenML installs the requirements in the following order (each step optional):
The packages installed in your local Python environment
The packages required by the stack (unless disabled by setting `install_stack_requirements=False`)
The packages specified via the `required_integrations` attribute
The packages specified via the `requirements` attribute
Specify apt packages to be installed in the Docker image:
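For example (the package names are placeholders):

```python
from zenml.config import DockerSettings

# System packages installed with apt inside the image.
docker_settings = DockerSettings(apt_packages=["git", "libgomp1"])
```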
Control how packages are installed:
The available package installers are:
`pip`: The default Python package installer
`uv`: A faster alternative to pip (experimental)
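For example, to switch to `uv` (string values like this are accepted in recent ZenML versions):

```python
from zenml.config import DockerSettings

docker_settings = DockerSettings(python_package_installer="uv")
```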
For packages that require authentication from private repositories:
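A minimal sketch using an extra pip index passed through an environment variable; the package name, index URL, and `PYPI_TOKEN` variable are all placeholders, and the token should be injected securely rather than hardcoded:

```python
import os

from zenml.config import DockerSettings

docker_settings = DockerSettings(
    requirements=["my-internal-package==0.1.0"],
    environment={
        # pip reads this variable when resolving packages inside the image.
        "PIP_EXTRA_INDEX_URL": f"https://user:{os.environ['PYPI_TOKEN']}@pypi.my-company.com/simple"
    },
)
```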
Be cautious when handling credentials. Always use secure methods to manage and distribute authentication information within your team. Consider using secrets management tools or environment variables passed securely.
ZenML determines the root directory of your source files in the following order:
If you've initialized zenml (`zenml init`) in your current working directory or one of its parent directories, the repository root directory will be used.
Otherwise, the parent directory of the Python file you're executing will be the source root. For example, when running `python /path/to/file.py`, the source root would be `/path/to`.
You can specify how the files inside this root directory are handled:
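A sketch showing the three relevant flags (the values shown are illustrative):

```python
from zenml.config import DockerSettings

docker_settings = DockerSettings(
    allow_download_from_code_repository=True,
    allow_download_from_artifact_store=True,
    allow_including_files_in_images=True,
)
```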
ZenML handles your source code in the following order:
If `allow_download_from_code_repository` is `True` and your files are inside a registered code repository with no local changes, the files will be downloaded from the code repository and not included in the image.
If the previous option is disabled, or no code repository without local changes exists for the root directory, ZenML will archive and upload your code to the artifact store if `allow_download_from_artifact_store` is `True`.
If both previous options were disabled or not possible, ZenML will include your files in the Docker image if `allow_including_files_in_images` is enabled. This means a new Docker image has to be built each time you modify one of your code files.
Setting all of the above attributes to `False` is not recommended and will most likely cause unintended and unanticipated behavior when running your pipelines. If you do this, you're responsible for making sure all your files are at the correct paths in the Docker images that will be used to run your pipeline steps.
When downloading files from a code repository, use a `.gitignore` file to exclude files.
When including files in the image, use a `.dockerignore` file to exclude files and keep the image smaller:
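A sketch, assuming the `dockerignore` attribute that points ZenML at your ignore file (the path is a placeholder):

```python
from zenml.config import DockerSettings

docker_settings = DockerSettings(dockerignore="/path/to/.dockerignore")
```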
You can set environment variables that will be available in the Docker container:
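For example (the names and values are placeholders):

```python
from zenml.config import DockerSettings

# Available inside the container when the steps run.
docker_settings = DockerSettings(environment={"MY_VARIABLE": "value"})
```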
Environment variables can reference other environment variables by using the `${VAR_NAME}` syntax. ZenML will substitute these at runtime.
ZenML automatically reuses Docker builds when possible to save time and resources:
A pipeline build is an encapsulation of a pipeline and the stack it was run on. It contains the Docker images that were built for the pipeline with all required dependencies from the stack, integrations and the user. Optionally, it also contains the pipeline code.
List all available builds for a pipeline:
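For example, using the ZenML CLI (the available filter flags may vary by version):

```shell
zenml pipeline builds list
```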
Create a build manually (useful for pre-building images):
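A sketch using the pipeline's `build` method (the import is hypothetical; `build` may return nothing if no image needs to be built):

```python
from my_module import my_pipeline  # hypothetical import

# Builds the Docker image(s) for the active stack without running the pipeline.
build = my_pipeline.build()
if build:
    print(build.id)
```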
By default, when you run a pipeline, ZenML will check if a build with the same pipeline and stack exists. If it does, it will reuse that build automatically. However, you can also force using a specific build by providing its ID:
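A sketch (the build ID is a placeholder):

```python
from my_module import my_pipeline  # hypothetical import

# Reuse an existing build instead of building images for this run.
my_pipeline.with_options(build="<BUILD_ID>")()
```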
You can also specify this in configuration files:
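For example, in a run configuration YAML file (the ID is a placeholder):

```yaml
build: <BUILD_ID>
```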
Specifying a custom build when running a pipeline will not run the code on your client machine but will use the code included in the Docker images of the build. Even if you make local code changes, reusing a build will always execute the code bundled in the Docker image, rather than the local code.
You can control where your Docker image is pushed by specifying a target repository name:
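For example:

```python
from zenml.config import DockerSettings

# The image is pushed to <registry URI>/zenml-pipelines.
docker_settings = DockerSettings(target_repository="zenml-pipelines")
```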
The repository name will be appended to the registry URI of your container registry stack component. For example, if your container registry URI is `gcr.io/my-project` and you set `target_repository="zenml-pipelines"`, the full image name would be `gcr.io/my-project/zenml-pipelines`.
If you don't specify a target repository, the default repository name configured in your container registry stack component settings will be used.
To reuse Docker builds while still using your latest code changes, you need to decouple your code from the build. There are two main approaches:
You can let ZenML use the artifact store to upload your code. This is the default behavior if no code repository is detected and the `allow_download_from_artifact_store` flag is not set to `False` in your `DockerSettings`.
ZenML will automatically figure out which builds match your pipeline and reuse the appropriate build id. Therefore, you do not need to explicitly pass in the build id when you have a clean repository state and a connected git repository.
To benefit from the advantages of having a code repository in a project, you need to make sure that the relevant integrations are installed for your ZenML installation. For instance, let's assume you are working on a project with ZenML and one of your team members has already registered a corresponding code repository of type `github` for it. If you run `zenml code-repository list`, you would also be able to see this repository. However, to fully use this repository, you still need to install the corresponding integration, in this example the `github` integration.
Once you have registered one or more code repositories, ZenML will check whether the files you use when running a pipeline are tracked inside one of those code repositories. This happens as follows:
First, the source root is computed
Next, ZenML checks whether this source root directory is included in a local checkout of one of the registered code repositories
If a local code repository checkout is detected when running a pipeline, ZenML will store a reference to the current commit for the pipeline run, so you'll be able to know exactly which code was used.
Note that this reference is only tracked if your local checkout is clean (i.e. it does not contain any untracked or uncommitted files). This is to ensure that your pipeline is actually running with the exact code stored at the specific code repository commit.
There might be cases where you want to force a new build, even if a suitable existing build is available. You can do this by setting `prevent_build_reuse=True`:
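For example:

```python
from zenml.config import DockerSettings

# Always build fresh images for this pipeline, even if a matching build exists.
docker_settings = DockerSettings(prevent_build_reuse=True)
```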
This is useful in scenarios like:
When you've made changes to your image building process that aren't tracked by ZenML
When troubleshooting issues in your Docker image
When you want to ensure your Docker image uses the most up-to-date base images
Clean Repository State: The file download is only possible if the local checkout is clean (no untracked or uncommitted files) and the latest commit has been pushed to the remote repository.
Team Collaboration: Using code repositories allows team members to reuse images that colleagues might have built for the same stack, enhancing collaboration efficiency.
Build Selection: ZenML automatically selects matching builds, but you can override this with explicit build IDs for special cases.
By default, Docker containers often run as the `root` user, which can pose security risks. ZenML allows you to specify a different user to run your containers:
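For example (the user name is a placeholder and must exist in the image):

```python
from zenml.config import DockerSettings

# The container entrypoint runs as this user instead of root.
docker_settings = DockerSettings(user="non_root_user")
```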
When you set the `user` parameter:
The specified user will become the owner of the `/app` directory, which contains all your code
The container entrypoint will run as this user instead of root
This can help improve security by following the principle of least privilege
Use code repositories to speed up builds and enable team collaboration. This approach is highly recommended for production environments.
Keep dependencies minimal to reduce build times. Only include packages you actually need.
Use fine-grained Docker settings at the step level for conflicting requirements. This prevents dependency conflicts and reduces image sizes.
Use pre-built images for common environments. This can significantly speed up your workflow.
Configure dockerignore files to reduce image size. Large Docker images take longer to build, push, and pull.
Leverage build caching by structuring your Dockerfiles and build processes to maximize cache hits.
Use environment variables for configuration instead of hardcoding values in your images.
Test your Docker builds locally before using them in production pipelines.
Keep your repository clean (no uncommitted changes) when running pipelines to ensure ZenML can correctly track code versions.
Use metadata and labels to help identify and manage your Docker images.
Run containers as non-root users when possible to improve security.
By following these practices, you can optimize your Docker builds in ZenML and create a more efficient workflow.
For a full list of configuration options, check out the `DockerSettings` SDK documentation.
Check out the general settings documentation for more information on the hierarchy and precedence of the various ways in which you can supply the settings.
For the default local image builder, these options are passed to the Docker client's build method.
If you're going to use a custom parent image, you need to make sure that it has Python, pip, and ZenML installed for it to work. If you need a starting point, you can take a look at the Dockerfile that ZenML uses for its own images.
Full documentation for how `uv` works with PyTorch can be found on the Astral Docs website. It covers some of the particular gotchas and details you might need to know.
You can use options to specify the configuration file and the stack to use for the build. Learn more about the build function in the SDK documentation.
Registering a code repository lets you avoid building images each time you run a pipeline and quickly iterate on your code. When running a pipeline that is part of a local code repository checkout, ZenML can instead build the Docker images without including any of your source files, and download the files inside the container before running your code.
Configuration Options: If you want to disable or enforce downloading of files, check the `DockerSettings` attributes for the available options.
By default, execution environments are created locally using the local Docker client. However, this requires Docker installation and permissions. ZenML offers image builders, a special stack component, allowing users to build and push Docker images in a different specialized image builder environment.
Note that even if you don't configure an image builder in your stack, ZenML still uses the local image builder to retain consistency across all builds. In this case, the image builder environment is the same as the client environment.
You don't need to directly interact with any image builder in your code. As long as the image builder that you want to use is part of your active ZenML stack, it will be used automatically by any component that needs to build container images.