Containerize your pipeline
Using Docker images to run your pipeline.
There are three ways to control this containerization process:
- Define where an image is built
- Reuse Docker image builds from previous runs
- Customize the Docker building
You can build the Docker images for a pipeline without actually running it, either in Python or using the CLI:
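A minimal sketch, assuming a pipeline object named `my_pipeline` and a ZenML version that exposes a `build()` method on pipelines:

```python
from zenml import pipeline


@pipeline
def my_pipeline() -> None:
    ...


# Build (and push) the Docker images for this pipeline
# without actually running it.
my_pipeline.build()
```

From the CLI, the equivalent is `zenml pipeline build` (run `zenml pipeline build --help` to see the exact arguments for your version).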
This will register the build output in the ZenML database and allow you to use the built images when running a pipeline later.
You can see all pipeline builds with the command:
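For example (the exact output columns vary by ZenML version):

```shell
# List all registered pipeline builds
zenml pipeline builds list
```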
To use a registered build when running a pipeline, pass it as an argument in Python or when running the pipeline from the CLI:
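A hedged sketch, assuming a ZenML version where `with_options` accepts a `build` argument; the build ID below is a hypothetical placeholder:

```python
# Run the pipeline using a previously registered build
# (replace the placeholder with a real build ID).
my_pipeline = my_pipeline.with_options(build="<BUILD_ID>")
my_pipeline()
```

From the CLI, something along the lines of `zenml pipeline run <PIPELINE_NAME> --build=<BUILD_ID>` should work; check `zenml pipeline run --help` for your version.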
Automate build reuse by connecting a code repository
Customize the Docker building
The process described above is automated by ZenML and covers the most basic use cases. This section covers various ways to customize the Docker build process to fit your needs, including setting user-defined environment variables.
How to configure Docker builds for your pipelines
Customizing the Docker builds for your pipelines and steps is done using the `DockerSettings` class, which you can import like this:
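The import as documented by ZenML:

```python
from zenml.config import DockerSettings
```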
There are many ways in which you can supply these settings:
Configuring them on a pipeline applies the settings to all steps of that pipeline:
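A sketch of pipeline-level configuration (the pipeline and step names are illustrative):

```python
from zenml import pipeline, step
from zenml.config import DockerSettings

docker_settings = DockerSettings()


@step
def my_step() -> None:
    ...


# Settings passed to the pipeline apply to all of its steps.
@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    my_step()
```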
Configuring them on a step gives you more fine-grained control and enables you to build separate specialized Docker images for different steps of your pipelines:
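A sketch of step-level configuration (the step name is illustrative):

```python
from zenml import step
from zenml.config import DockerSettings

docker_settings = DockerSettings()


# Settings passed to a step apply only to that step's image.
@step(settings={"docker": docker_settings})
def my_step() -> None:
    ...
```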
Handling source files
ZenML determines the root directory of your source files in the following order:
1. If you've initialized zenml (`zenml init`), the repository root directory will be used.
2. Otherwise, the parent directory of the Python file you're executing will be the source root. For example, when running `python /path/to/file.py`, the source root would be `/path/to`.
You can specify how these files are handled using the `source_files` attribute on the `DockerSettings`:
- If you want your files to be included in the image in any case, set the `source_files` attribute to `include`.
- If you want your files to be downloaded in any case, set the `source_files` attribute to `download`. If this is specified, the files must be inside a registered code repository and the repository must have no local changes, otherwise the Docker build will fail.
- If you want to prevent ZenML from copying or downloading any of your source files, set the `source_files` attribute to `ignore`. This is an advanced feature and will most likely cause unintended and unanticipated behavior when running your pipelines. If you use this, make sure to copy all the necessary files to the correct paths yourself.
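A configuration sketch using the `source_files` values described above (the exact accepted values may differ between ZenML versions):

```python
from zenml.config import DockerSettings

# Download source files from a registered code repository
# instead of including them in the image.
docker_settings = DockerSettings(source_files="download")
```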
Which files get included
You can use a `.dockerignore` file to control which files get included, in either of the following ways:
- Have a file called `.dockerignore` in your source root directory.
- Explicitly specify a `.dockerignore` file to use:
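A configuration sketch (the path is a placeholder; `dockerignore` is the `DockerSettings` attribute for this, to the best of my knowledge):

```python
from zenml.config import DockerSettings

# Use an explicit .dockerignore file for the build context.
docker_settings = DockerSettings(dockerignore="/path/to/.dockerignore")
```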
Installing additional pip dependencies or apt packages
By default, ZenML automatically installs all packages required by your active ZenML stack. However, you can specify additional packages to be installed in various ways:
- Install all the packages in your local Python environment (this will use the `pip` or `poetry` package manager to get a list of your local packages).
- Specify a list of pip requirements in code.
- Specify a pip requirements file.
- Specify a list of apt packages in code.
- Prevent ZenML from automatically installing the requirements of your stack.
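A configuration sketch covering each of the options above; the attribute names follow ZenML's `DockerSettings`, but they have changed between versions, so check the API reference for yours. The package versions and paths are placeholders:

```python
from zenml.config import DockerSettings

# Replicate your local Python environment (pip freeze / poetry export).
docker_settings = DockerSettings(replicate_local_python_environment="pip_freeze")

# A list of pip requirements in code.
docker_settings = DockerSettings(requirements=["torch==1.12.0", "torchvision"])

# A pip requirements file.
docker_settings = DockerSettings(requirements="/path/to/requirements.txt")

# A list of apt packages.
docker_settings = DockerSettings(apt_packages=["git"])

# Prevent ZenML from installing the requirements of your stack.
docker_settings = DockerSettings(install_stack_requirements=False)
```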
In some cases the steps of your pipeline will have conflicting requirements or some steps of your pipeline will require large dependencies that don't need to be installed to run the remaining steps of your pipeline. For this case, ZenML allows you to specify custom Docker settings for steps in your pipeline.
You can combine these methods, but make sure that your list of pip requirements does not overlap with the ones specified explicitly in the Docker settings.
Depending on the options specified in your Docker settings, ZenML installs the requirements in the following order (each step optional):
1. The packages installed in your local Python environment
2. The packages specified via the `requirements` attribute (step level overwrites pipeline level)
3. The packages specified via the `required_integrations` and potentially stack requirements
Using a custom parent image
Using a pre-built parent image
To use a static parent image (e.g., with internal dependencies installed) that doesn't need to be rebuilt on every pipeline run, specify it in the Docker settings for your pipeline:
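A configuration sketch (the registry and image name are placeholders):

```python
from zenml.config import DockerSettings

# Use a pre-built image as the parent for the pipeline image.
docker_settings = DockerSettings(parent_image="my_registry.io/image_name:tag")
```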
To use this image directly to run your steps without including any code or installing any requirements on top of it, skip the Docker builds by specifying it in the Docker settings:
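A configuration sketch; `skip_build` is the `DockerSettings` attribute for this as far as I know, and the image name is a placeholder:

```python
from zenml.config import DockerSettings

# Run steps directly on the parent image, skipping the Docker build.
docker_settings = DockerSettings(
    parent_image="my_registry.io/image_name:tag",
    skip_build=True,
)
```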
This is an advanced feature and may cause unintended behavior when running your pipelines. If you use this, ensure your code files are correctly included in the image you specified.
Specifying a Dockerfile to dynamically build a parent image
In some cases, you might want full control over the resulting Docker image but want to build a Docker image dynamically each time a pipeline is executed. To make this process easier, ZenML allows you to specify a custom Dockerfile as well as build context
directory and build options. ZenML then builds an intermediate image based on the Dockerfile you specified and uses the intermediate image as the parent image.
Depending on the configuration of your Docker settings, this intermediate image might also be used directly to execute your pipeline steps.
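A configuration sketch for a dynamically built parent image; the attribute names (`dockerfile`, `build_context_root`, `build_options`) follow ZenML's `DockerSettings`, and the paths and build argument are placeholders:

```python
from zenml.config import DockerSettings

# Build an intermediate parent image from a custom Dockerfile.
docker_settings = DockerSettings(
    dockerfile="/path/to/dockerfile",
    build_context_root="/build/context/path",
    build_options={"buildargs": {"MY_ARG": "value"}},
)
```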