Manage Docker Images
How ZenML uses Docker images to run your pipeline
This is an older version of the ZenML documentation. To read and view the latest version please visit this up-to-date URL.
When running locally, ZenML will execute the steps of your pipeline in the active Python environment. When using remote orchestrators or step operators instead, ZenML builds Docker images to transport and run your pipeline code in an isolated and well-defined environment. For this purpose, a Dockerfile is dynamically generated and used to build the image using the local Docker client. This Dockerfile consists of the following steps:
Starts from a parent image which needs to have ZenML installed. By default, this will use the official ZenML image for the Python and ZenML version that you're using in the active Python environment. If you want to use a different image as the base for the following steps, check out this guide.
Installs additional pip dependencies. ZenML will automatically detect which integrations are used in your stack and install the required dependencies. If your pipeline needs any additional requirements, check out our guide on including custom dependencies.
Copies your active stack configuration. This is needed so that ZenML can execute your code on the stack that you specified.
Copies your source files. These files need to be included in the Docker image so ZenML can execute your step code. Check out this section for more information on which files get included by default and how to exclude files.
Sets user-defined environment variables.
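To make the order of these steps concrete, here is a minimal sketch that renders a Dockerfile with the same five stages. It is not ZenML's actual generator: the function name render_dockerfile, its parameters, the /app working directory, and the .zenconfig directory name are all assumptions made for illustration.

```python
from typing import Dict, List


def render_dockerfile(
    parent_image: str,
    requirements_files: List[str],
    environment: Dict[str, str],
    copy_stack_config: bool = True,
    copy_source_files: bool = True,
) -> str:
    """Return Dockerfile text that mirrors the five steps listed above."""
    # Step 1: start from a parent image that already has ZenML installed.
    lines = [f"FROM {parent_image}", "WORKDIR /app"]
    # Step 2: install additional pip dependencies, one requirements file at a time.
    for req_file in requirements_files:
        lines.append(f"COPY {req_file} .")
        lines.append(f"RUN pip install --no-cache-dir -r {req_file}")
    # Step 3: copy the active stack configuration (hypothetical directory name).
    if copy_stack_config:
        lines.append("COPY .zenconfig /app/.zenconfig")
    # Step 4: copy the source files from the build context.
    if copy_source_files:
        lines.append("COPY . .")
    # Step 5: set user-defined environment variables in a stable order.
    for key, value in sorted(environment.items()):
        lines.append(f"ENV {key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("example/parent:latest", ["requirements.txt"], {"MY_VAR": "1"}))
```

Installing dependencies before copying the source files means that, in a layout like this one, Docker can reuse the cached dependency layers when only your code changes.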
ZenML uses the official Docker Python library to build and push your images. This library loads the credentials it uses to push images from the default config location, $HOME/.docker/config.json. If your Docker configuration is stored in a different directory, set the environment variable DOCKER_CONFIG to that directory to override this behavior. The directory that you specify must contain your Docker configuration in a file called config.json.
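The lookup described above can be sketched as follows. This only illustrates the rule (DOCKER_CONFIG if set, otherwise $HOME/.docker) and is not code taken from ZenML or the Docker library; the helper name docker_config_file is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def docker_config_file(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the config.json path that would hold the Docker credentials."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    config_dir = env.get("DOCKER_CONFIG")
    if config_dir:
        # DOCKER_CONFIG names a directory, not the config.json file itself.
        return Path(config_dir) / "config.json"
    return Path.home() / ".docker" / "config.json"


if __name__ == "__main__":
    print(docker_config_file())
```

To point the build at a different directory, set the variable before running your pipeline, for example with os.environ["DOCKER_CONFIG"] = "/path/to/config_dir" or the equivalent export in your shell.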
Customizing the build process
The process explained above is all done automatically by ZenML and covers most basic use cases. This section covers all the different ways in which you can hook into the Docker building process to customize the resulting image to your needs.
For a full list of configuration options, check out our API Docs.
The configuration examples described below all use the DockerConfiguration class, which you'll need to import first.
Which files get included
ZenML will try to determine the root directory of your source files in the following order:
If you've created a ZenML repository for your project, the repository directory will be used.
Otherwise, the parent directory of the Python file you're executing will be the source root. For example, when running python /path/to/file.py, the source root would be /path/to.
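The two rules above can be sketched as a small lookup function. This is an illustration rather than ZenML's implementation: it assumes that a ZenML repository is recognizable by a .zen marker directory and that the search starts at the directory of the executed file and walks upward. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_repository_root(start: Path, marker: str = ".zen") -> Optional[Path]:
    """Walk upward from start and return the first directory containing marker."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    return None


def resolve_source_root(main_file: Path, marker: str = ".zen") -> Path:
    """Apply the two rules in order: repository directory first, then the file's parent."""
    main_file = main_file.resolve()
    repo_root = find_repository_root(main_file.parent, marker)
    # Rule 2 is the fallback when no repository directory was found.
    return repo_root if repo_root is not None else main_file.parent


if __name__ == "__main__":
    print(resolve_source_root(Path(__file__)))
```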
By default, ZenML will copy all contents of this root directory into the Docker image. If you want to exclude files to keep the image smaller, you can do so using a .dockerignore file in either of the following two ways:
Have a file called .dockerignore in the source root directory described above.
Explicitly specify the .dockerignore file that you want to use on the Docker configuration.
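As a rough sketch of how the two options could interact, the helper below picks an explicitly specified file when one is given and otherwise falls back to a .dockerignore in the source root. The precedence of the explicit file is an assumption, not something this page states, and select_dockerignore is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def select_dockerignore(source_root: Path, explicit: Optional[Path] = None) -> Optional[Path]:
    """Return the .dockerignore file to use, or None if no file applies."""
    if explicit is not None:
        # Assumption: an explicitly configured file wins over the default one.
        if not explicit.is_file():
            raise FileNotFoundError(f"No .dockerignore file at {explicit}")
        return explicit
    default = source_root / ".dockerignore"
    # Without a default file, the whole source root is copied.
    return default if default.is_file() else None


if __name__ == "__main__":
    print(select_dockerignore(Path.cwd()))
```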
Don't include any user files
If you want to prevent ZenML from copying any of your source files, you can do so by setting the copy_files attribute on the Docker configuration to False.
This is an advanced feature and will most likely break your pipelines. If you use this, you're on your own and need to copy all the necessary files to the correct paths yourself.
Don't include the stack configuration
If you want to prevent ZenML from copying the configuration of your active stack, you can do so by setting the copy_profile attribute on the Docker configuration to False.
This is an advanced feature and will most likely break your pipelines. If you use this, you're on your own and need to copy a stack configuration to the correct path yourself.
How to install additional pip dependencies
By default, ZenML will automatically install all the packages required by your active ZenML stack. There are, however, various ways in which you can specify additional packages that should be installed:
Install all the packages in your local Python environment (this will use the pip or poetry package manager to get a list of your local packages).
Specify a list of pip requirements in code.
Specify a pip requirements file.
Specify a list of ZenML integrations that you're using in your pipeline.
Prevent ZenML from automatically installing the requirements of your stack.
You can even combine these methods, but do make sure that your list of pip requirements doesn't overlap with the ones specified by your required integrations.
Depending on all the options specified in your Docker configuration, ZenML will install the requirements in the following order (each step optional):
The packages installed in your local Python environment
The packages specified via the requirements attribute
The packages specified via the required_integrations attribute and potentially stack requirements
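The ordering above can be sketched as a function that returns the requirement groups in install order and skips any group that is empty. It is a simplified illustration, not ZenML's code: the function and parameter names are hypothetical, and merging integration and stack requirements into one deduplicated, sorted group is an assumption.

```python
from typing import List, Sequence


def ordered_requirement_groups(
    local_environment: Sequence[str] = (),
    requirements: Sequence[str] = (),
    integration_requirements: Sequence[str] = (),
    stack_requirements: Sequence[str] = (),
    install_stack_requirements: bool = True,
) -> List[List[str]]:
    """Return the non-empty requirement groups in the order they get installed."""
    # Group 3 combines integration requirements with optional stack requirements.
    third = set(integration_requirements)
    if install_stack_requirements:
        third.update(stack_requirements)
    groups = [list(local_environment), list(requirements), sorted(third)]
    # Every step is optional, so empty groups are dropped.
    return [group for group in groups if group]


if __name__ == "__main__":
    print(ordered_requirement_groups(requirements=["numpy"], stack_requirements=["kfp"]))
```

Because later groups are installed after earlier ones, a package that appears in more than one group may end up at the version requested by the last group that mentions it, which is why overlapping requirement lists are best avoided.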
Using a custom parent image
By default, ZenML will perform all the steps described above on top of the official ZenML image for the Python and ZenML version that you're using in the active Python environment. To have more control over the entire environment which is used to execute your pipelines, you can either specify a custom pre-built parent image or a Dockerfile which ZenML will use to build a parent image for you.
If you're going to use a custom parent image (either pre-built or by specifying a Dockerfile), you need to make sure that it has Python, pip and ZenML installed for it to work. If you need a starting point, you can take a look at the Dockerfile that ZenML uses here.
Using a pre-built parent image
If you want to use a static parent image (one that, for example, has some internal dependencies installed) that doesn't need to be rebuilt on every pipeline run, you can do so by specifying it on the Docker configuration for your pipeline.
Specifying a Dockerfile to dynamically build a parent image
In some cases you might want full control over the resulting Docker image but want to build a Docker image dynamically each time a pipeline is executed. To make this process easier, ZenML allows you to specify a custom Dockerfile as well as build context directory and build options. ZenML then builds an intermediate image based on the Dockerfile you specified and uses the intermediate image as the parent image.
Depending on the options set in your Docker configuration, this intermediate image might also be used directly to execute your pipeline steps.