Containerization

Customize Docker builds to run your pipelines in isolated, well-defined environments.


ZenML executes pipeline steps sequentially in the active Python environment when running locally. However, with remote orchestrators or step operators, ZenML builds Docker images to run your pipeline in an isolated, well-defined environment.

This page explains how ZenML's Docker build process works and how you can customize it to meet your specific requirements.

Understanding Docker Builds in ZenML

When a pipeline is run with a remote orchestrator, a Dockerfile is dynamically generated at runtime. It is then used to build the Docker image using the image builder component of your stack. The Dockerfile consists of the following steps:

  1. Starts from a parent image that has ZenML installed. By default, this will use the official ZenML image for the Python and ZenML version that you're using in the active Python environment.

  2. Installs additional pip dependencies. ZenML automatically detects which integrations are used in your stack and installs the required dependencies.

  3. Optionally copies your source files. Your source files need to be available inside the Docker container so ZenML can execute your step code.

  4. Sets user-defined environment variables.

The process described above is automated by ZenML and covers most basic use cases. This page covers various ways to customize the Docker build process to fit your specific needs.
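To make these steps concrete, here is a simplified, illustrative sketch of what such a generated Dockerfile might look like. ZenML generates the real file at runtime; the image tag and file names below are hypothetical:

# Illustrative sketch only -- ZenML generates the actual Dockerfile at runtime
FROM zenmldocker/zenml:0.55.0-py3.11  # 1. parent image with ZenML installed (tag is hypothetical)

# 2. Install additional pip dependencies (stack, integration, and user requirements)
COPY requirements.txt .
RUN pip install -r requirements.txt

# 3. Optionally copy your source files
COPY . /app
WORKDIR /app

# 4. Set user-defined environment variables
ENV MY_VAR=value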

Docker Build Process

ZenML uses the following process to decide how to build Docker images:

  • No dockerfile specified: If any of the options regarding requirements, environment variables, or copying files require an image to be built, ZenML builds one on top of the parent image. Otherwise, the parent_image will be used to run the pipeline.

  • dockerfile specified: ZenML will first build an image based on the specified Dockerfile. If any additional options regarding requirements, environment variables, or copying files require an image built on top of that, ZenML will build a second image. If not, the image built from the specified Dockerfile will be used to run the pipeline.

Requirements Installation Order

Depending on the configuration of your Docker settings, requirements will be installed in the following order (each step is optional):

  1. The packages installed in your local Python environment (if enabled)

  2. The packages required by the stack (unless disabled by setting install_stack_requirements=False)

  3. The packages specified via the required_integrations

  4. The packages specified via the requirements attribute
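As a sketch, a single DockerSettings object can drive all four of these steps (the specific values below are illustrative):

from zenml.config import DockerSettings

docker_settings = DockerSettings(
    replicate_local_python_environment="pip_freeze",  # 1. local environment packages
    install_stack_requirements=True,                  # 2. stack requirements (the default)
    required_integrations=["sklearn"],                # 3. integration requirements
    requirements=["pandas==2.0.3"],                   # 4. explicitly listed requirements
)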

Configuring Docker Settings

You can customize Docker builds for your pipelines and steps using the DockerSettings class:

from zenml.config import DockerSettings
For a full list of configuration options, check out the DockerSettings object on the SDKDocs.

There are multiple ways to supply these settings:

Pipeline-Level Settings

Configuring settings on a pipeline applies them to all steps of that pipeline:

from zenml.config import DockerSettings
docker_settings = DockerSettings()

# Either add it to the decorator
@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    my_step()

# Or configure the pipelines options
my_pipeline = my_pipeline.with_options(
    settings={"docker": docker_settings}
)

Step-Level Settings

For more fine-grained control, configure settings on individual steps. This is particularly useful when different steps have conflicting requirements or when some steps need specialized environments:

docker_settings = DockerSettings()

# Either add it to the decorator
@step(settings={"docker": docker_settings})
def my_step() -> None:
    pass

# Or configure the step options
my_step = my_step.with_options(
    settings={"docker": docker_settings}
)
Check out this page for more information on the hierarchy and precedence of the various ways in which you can supply the settings.

Using YAML Configuration

Define settings in a YAML configuration file for better separation of code and configuration:

settings:
    docker:
        parent_image: python:3.9-slim
        apt_packages:
          - git
          - curl
        requirements:
          - tensorflow==2.8.0
          - pandas

steps:
  training_step:
    settings:
        docker:
            parent_image: pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime
            required_integrations:
              - wandb
              - mlflow
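Assuming the YAML above is saved as config.yaml (the file name is an assumption), you can apply it to a pipeline before running it:

# A sketch: apply the YAML configuration file to the pipeline
my_pipeline = my_pipeline.with_options(config_path="config.yaml")
my_pipeline()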

Specifying Docker Build Options

You can customize the build process by specifying build options that get passed to the build method of the image builder:

docker_settings = DockerSettings(
    build_config={"build_options": {"buildargs": {"MY_ARG": "value"}}}
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...
For the default local image builder, these options are passed to the docker build command.

If you're running your pipelines on macOS with ARM architecture, local Docker caching does not work unless you specify the target platform of the image:

docker_settings = DockerSettings(
    build_config={"build_options": {"platform": "linux/amd64"}}
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...

Using Custom Parent Images

Pre-built Parent Images

To use a static parent image (e.g., with internal dependencies pre-installed):

docker_settings = DockerSettings(parent_image="my_registry.io/image_name:tag")

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...
If you're going to use a custom parent image, you need to make sure that it has Python, pip, and ZenML installed for it to work. If you need a starting point, you can take a look at the Dockerfile that ZenML uses.

ZenML will use this image as the base and still perform the following steps:

  1. Install additional pip dependencies

  2. Copy source files (if configured)

  3. Set environment variables

Skip Build Process

To use the image directly to run your steps without including any code or installing any requirements on top of it, skip the Docker builds by setting skip_build=True:

docker_settings = DockerSettings(
    parent_image="my_registry.io/image_name:tag",
    skip_build=True
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...

When skip_build is enabled, the parent_image will be used directly to run the steps of your pipeline without any additional Docker builds on top of it. This means ZenML will not:

  • Install packages from your local Python environment

  • Install stack requirements

  • Install required integrations

  • Install specified requirements

  • Install apt packages

  • Include source files in the container

  • Set environment variables

This is an advanced feature and may cause unintended behavior when running your pipelines. If you use this, ensure your image contains everything necessary to run your pipeline:

  1. Your stack requirements

  2. Integration requirements

  3. Project-specific requirements

  4. Any system packages

  5. Your project code files (unless a code repository is registered or allow_download_from_artifact_store is enabled)

Make sure that Python, pip, and ZenML are installed in your image, and that your code is in the /app directory, which is set as the active working directory.

Also note that the Docker settings validator will raise an error if you set skip_build=True without specifying a parent_image. A parent image is required when skipping the build as it will be used directly to run your pipeline steps.
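As a rough sketch, a parent image that satisfies these requirements could be built from a Dockerfile like the following (the base image, file names, and versions are illustrative assumptions):

FROM python:3.11-slim

# Python and pip come with the base image; install ZenML plus all
# stack, integration, and project requirements your pipeline needs
COPY requirements.txt .
RUN pip install zenml -r requirements.txt

# Your code must be available in /app, the expected working directory
COPY . /app
WORKDIR /app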

Custom Dockerfiles

For greater control, you can specify a custom Dockerfile and build context:

docker_settings = DockerSettings(
    dockerfile="/path/to/dockerfile",
    build_context_root="/path/to/build/context",
    parent_image_build_config={
        "build_options": {"buildargs": {"MY_ARG": "value"}},
        "dockerignore": "/path/to/.dockerignore"
    }
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(...):
    ...

Here is how the build process works with a custom Dockerfile:

  • Dockerfile specified: ZenML will first build an image based on the specified Dockerfile. If any options regarding requirements, environment variables, or copying files require an additional image built on top of that, ZenML will build a second image. Otherwise, the image built from the specified Dockerfile will be used to run the pipeline.

Important notes about using a custom Dockerfile:

  • When you specify a custom dockerfile, the parent_image attribute will be ignored

  • The image built from your Dockerfile must have ZenML installed

  • If you set build_context_root, that directory will be used as the build context for the Docker build. If left empty, the build context will only contain the Dockerfile

  • You can configure the build options by setting parent_image_build_config with specific build options and dockerignore settings

Managing Dependencies

ZenML offers several ways to specify dependencies for your Docker containers:

Python Dependencies

  1. Replicate Local Environment:

    # Use pip freeze (outputs a requirements file with exact package versions)
    from zenml.config import DockerSettings, PythonEnvironmentExportMethod
    docker_settings = DockerSettings(
        replicate_local_python_environment=PythonEnvironmentExportMethod.PIP_FREEZE
    )
    # Or as a string
    docker_settings = DockerSettings(replicate_local_python_environment="pip_freeze")
    
    # Or use poetry (requires Poetry to be installed)
    docker_settings = DockerSettings(
        replicate_local_python_environment=PythonEnvironmentExportMethod.POETRY_EXPORT
    )
    # Or as a string
    docker_settings = DockerSettings(replicate_local_python_environment="poetry_export")
    
    # Use custom command (provide a list of command arguments)
    docker_settings = DockerSettings(replicate_local_python_environment=[
        "poetry", "export", "--extras=train", "--format=requirements.txt"
    ])

    This feature allows you to easily replicate your local Python environment in the Docker container, ensuring that your pipeline runs with the same dependencies.

  2. Specify Requirements Directly:

    docker_settings = DockerSettings(requirements=["torch==1.12.0", "torchvision"])
  3. Use Requirements File:

    docker_settings = DockerSettings(requirements="/path/to/requirements.txt")
  4. Specify ZenML Integrations:

    from zenml.integrations.constants import PYTORCH, EVIDENTLY
    
    docker_settings = DockerSettings(required_integrations=[PYTORCH, EVIDENTLY])
  5. Control Stack Requirements: By default, ZenML installs the requirements needed by your active stack. You can disable this behavior if needed:

    docker_settings = DockerSettings(install_stack_requirements=False)

You can combine these methods, but make sure that your list of requirements does not overlap with those specified explicitly in the Docker settings, to avoid version conflicts. The packages are installed in the order described under Requirements Installation Order above.


System Packages

Specify apt packages to be installed in the Docker image:

docker_settings = DockerSettings(apt_packages=["git", "curl", "libsm6", "libxext6"])

Installation Control

Control how packages are installed:

# Use custom installer arguments
docker_settings = DockerSettings(python_package_installer_args={"timeout": 1000})

# Use uv instead of pip (experimental)
from zenml.config import DockerSettings, PythonPackageInstaller
docker_settings = DockerSettings(python_package_installer=PythonPackageInstaller.UV)
# Or as a string
docker_settings = DockerSettings(python_package_installer="uv")

# Use pip (default)
docker_settings = DockerSettings(python_package_installer=PythonPackageInstaller.PIP)

The available package installers are:

  • pip: The default Python package installer

  • uv: A faster alternative to pip (experimental)

uv is a relatively new project and not as stable as pip yet, which might lead to errors during package installation. If this happens, try switching the installer back to pip and see if that solves the issue.
Full documentation for how uv works with PyTorch can be found on the Astral Docs website. It covers some of the particular gotchas and details you might need to know.

Private PyPI Repositories

For packages that require authentication from private repositories:

import os

docker_settings = DockerSettings(
    requirements=["my-internal-package==0.1.0"],
    environment={
        "PIP_EXTRA_INDEX_URL": f"https://{os.environ.get('PYPI_TOKEN', '')}@my-private-pypi-server.com/{os.environ.get('PYPI_USERNAME', '')}/"
    }
)

Be cautious with handling credentials. Always use secure methods to manage and distribute authentication information within your team. Consider using secrets management tools or environment variables passed securely.
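For example, instead of reading raw environment variables you could pull the credentials from a ZenML secret. This is a sketch; the secret name pypi_credentials and its keys are assumptions:

from zenml.client import Client
from zenml.config import DockerSettings

# Assumes a secret was created beforehand, e.g.:
#   zenml secret create pypi_credentials --username=... --token=...
secret = Client().get_secret("pypi_credentials")

docker_settings = DockerSettings(
    requirements=["my-internal-package==0.1.0"],
    environment={
        "PIP_EXTRA_INDEX_URL": (
            f"https://{secret.secret_values['token']}"
            f"@my-private-pypi-server.com/{secret.secret_values['username']}/"
        )
    },
)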

Source Code Management

ZenML determines the root directory of your source files in the following order:

  1. If you've initialized zenml (zenml init) in your current working directory or one of its parent directories, the repository root directory will be used.

  2. Otherwise, the parent directory of the Python file you're executing will be the source root. For example, when running python /path/to/file.py, the source root would be /path/to.

You can specify how the files inside this root directory are handled:

docker_settings = DockerSettings(
    # Download files from code repository if available
    allow_download_from_code_repository=True,
    # If no code repository, upload code to artifact store
    allow_download_from_artifact_store=True,
    # If neither of the above, include files in the image
    allow_including_files_in_images=True
)

ZenML handles your source code in the following order:

  1. If allow_download_from_code_repository is True and your files are inside a registered code repository with no local changes, the files will be downloaded from the code repository instead of being included in the image.

  2. If the previous option is disabled, or no code repository without local changes exists for the root directory, ZenML will archive and upload your code to the artifact store if allow_download_from_artifact_store is True.

  3. If both previous options are disabled or not possible, ZenML will include your files in the Docker image if allow_including_files_in_images is enabled. This means a new Docker image has to be built each time you modify one of your code files.

Setting all of the above attributes to False is not recommended and will most likely cause unintended and unanticipated behavior when running your pipelines. If you do this, you are responsible for ensuring that all your files are at the correct paths in the Docker images that will be used to run your pipeline steps.

Controlling Included Files

  • When downloading files from a code repository, use a .gitignore file to exclude files.

  • When including files in the image, use a .dockerignore file to exclude files and keep the image smaller:

    # Have a file called .dockerignore in your source root directory
    # Or explicitly specify a .dockerignore file to use:
    docker_settings = DockerSettings(build_config={"dockerignore": "/path/to/.dockerignore"})
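For example, a .dockerignore for a typical ML project might look like this (the patterns are illustrative):

# Exclude version control metadata, virtual environments, caches, and data
.git
.venv/
__pycache__/
*.ipynb_checkpoints
data/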

Environment Variables

You can set environment variables that will be available in the Docker container:

docker_settings = DockerSettings(
    environment={
        "PYTHONUNBUFFERED": "1",
        "MODEL_DIR": "/models",
        "API_KEY": "${GLOBAL_API_KEY}"  # Reference environment variables
    }
)

Environment variables can reference other environment variables by using the ${VAR_NAME} syntax. ZenML will substitute these at runtime.
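For the example above to work, GLOBAL_API_KEY must be set in the environment from which the pipeline is launched (the value and entrypoint script here are placeholders):

export GLOBAL_API_KEY="my-secret-key"  # placeholder value
python run_pipeline.py                 # hypothetical pipeline entrypoint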

Build Reuse and Optimization

ZenML automatically reuses Docker builds when possible to save time and resources:

What is a Pipeline Build?

A pipeline build is an encapsulation of a pipeline and the stack it was run on. It contains the Docker images that were built for the pipeline with all required dependencies from the stack, integrations and the user. Optionally, it also contains the pipeline code.

List all available builds for a pipeline:

zenml pipeline builds list --pipeline_id='startswith:ab53ca'

Create a build manually (useful for pre-building images):

zenml pipeline build --stack vertex-stack my_module.my_pipeline_instance
You can use options to specify the configuration file and the stack to use for the build.

Reusing Builds

By default, when you run a pipeline, ZenML will check if a build with the same pipeline and stack exists. If it does, it will reuse that build automatically. However, you can also force using a specific build by providing its ID:

pipeline_instance.run(build="<build_id>")

You can also specify this in configuration files:

build: your-build-id-here

Specifying a custom build when running a pipeline will not run the code on your client machine but will use the code included in the Docker images of the build. Even if you make local code changes, reusing a build will always execute the code bundled in the Docker image, rather than the local code.

Controlling Image Repository Names

You can control where your Docker image is pushed by specifying a target repository name:

docker_settings = DockerSettings(target_repository="my-custom-repo-name")

The repository name will be appended to the registry URI of your container registry stack component. For example, if your container registry URI is gcr.io/my-project and you set target_repository="zenml-pipelines", the full image name would be gcr.io/my-project/zenml-pipelines.

If you don't specify a target repository, the default repository name configured in your container registry stack component settings will be used.

Decoupling Code from Builds

To reuse Docker builds while still using your latest code changes, you need to decouple your code from the build. There are two main approaches:

1. Using the Artifact Store to Upload Code

You can let ZenML use the artifact store to upload your code. This is the default behavior if no code repository is detected and the allow_download_from_artifact_store flag is not set to False in your DockerSettings.

2. Using Code Repositories for Faster Builds
Registering a code repository lets you avoid building images each time you run a pipeline and quickly iterate on your code. When running a pipeline that is part of a local code repository checkout, ZenML can instead build the Docker images without including any of your source files, and download the files inside the container before running your code.

ZenML will automatically figure out which builds match your pipeline and reuse the appropriate build id. Therefore, you do not need to explicitly pass in the build id when you have a clean repository state and a connected git repository.

To benefit from a code repository in a project, you need to make sure that the relevant integrations are installed in your ZenML environment. For instance, suppose you are working on a project with ZenML and one of your team members has already registered a corresponding code repository of type github for it. If you run zenml code-repository list, you will be able to see this repository. However, to fully use it, you still need to install the corresponding integration, in this example the github integration:

zenml integration install github

Detecting local code repository checkouts

Once you have registered one or more code repositories, ZenML will check whether the files you use when running a pipeline are tracked inside one of those code repositories. This happens as follows:

  • First, the source root is computed

  • Next, ZenML checks whether this source root directory is included in a local checkout of one of the registered code repositories

Tracking code versions for pipeline runs

If a local code repository checkout is detected when running a pipeline, ZenML will store a reference to the current commit for the pipeline run, so you'll be able to know exactly which code was used.

Note that this reference is only tracked if your local checkout is clean (i.e. it does not contain any untracked or uncommitted files). This is to ensure that your pipeline is actually running with the exact code stored at the specific code repository commit.

If you want to ignore untracked files, you can set the ZENML_CODE_REPOSITORY_IGNORE_UNTRACKED_FILES environment variable to True. When doing this, you are responsible for ensuring that the files committed to the repository include everything necessary to run your pipeline.
Configuration Options: If you want to disable or enforce the downloading of files, check the DockerSettings class for the available options.

Preventing Build Reuse

There might be cases where you want to force a new build, even if a suitable existing build is available. You can do this by setting prevent_build_reuse=True:

docker_settings = DockerSettings(prevent_build_reuse=True)

This is useful in scenarios like:

  • When you've made changes to your image building process that aren't tracked by ZenML

  • When troubleshooting issues in your Docker image

  • When you want to ensure your Docker image uses the most up-to-date base images

Tips and Best Practices for Build Reuse

  • Clean Repository State: The file download is only possible if the local checkout is clean (no untracked or uncommitted files) and the latest commit has been pushed to the remote repository.

  • Team Collaboration: Using code repositories allows team members to reuse images that colleagues might have built for the same stack, enhancing collaboration efficiency.

  • Build Selection: ZenML automatically selects matching builds, but you can override this with explicit build IDs for special cases.

Image Build Location
By default, execution environments are created locally using the local Docker client. However, this requires Docker to be installed and the user to have permission to use it. ZenML offers image builders, a special stack component, allowing users to build and push Docker images in a different, specialized image builder environment.

Note that even if you don't configure an image builder in your stack, ZenML still uses the local image builder to retain consistency across all builds. In this case, the image builder environment is the same as the client environment.

You don't need to directly interact with any image builder in your code. As long as the image builder that you want to use is part of your active ZenML stack, it will be used automatically by any component that needs to build container images.

Container User Permissions

By default, Docker containers often run as the root user, which can pose security risks. ZenML allows you to specify a different user to run your containers:

docker_settings = DockerSettings(user="non-root-user")

When you set the user parameter:

  • The specified user will become the owner of the /app directory, which contains all your code

  • The container entrypoint will run as this user instead of root

  • This can help improve security by following the principle of least privilege

Best Practices

  1. Use code repositories to speed up builds and enable team collaboration. This approach is highly recommended for production environments.

  2. Keep dependencies minimal to reduce build times. Only include packages you actually need.

  3. Use fine-grained Docker settings at the step level for conflicting requirements. This prevents dependency conflicts and reduces image sizes.

  4. Use pre-built images for common environments. This can significantly speed up your workflow.

  5. Configure dockerignore files to reduce image size. Large Docker images take longer to build, push, and pull.

  6. Leverage build caching by structuring your Dockerfiles and build processes to maximize cache hits.

  7. Use environment variables for configuration instead of hardcoding values in your images.

  8. Test your Docker builds locally before using them in production pipelines.

  9. Keep your repository clean (no uncommitted changes) when running pipelines to ensure ZenML can correctly track code versions.

  10. Use metadata and labels to help identify and manage your Docker images.

  11. Run containers as non-root users when possible to improve security.

By following these practices, you can optimize your Docker builds in ZenML and create a more efficient workflow.

