Not all ZenML Pipelines (and Steps) are executed in a host-native environment (e.g. your local development machine). Some Backends rather rely on Docker images.
When users need to think about Docker Images¶
Whenever a pipeline is executing non-locally, i.e., when a non-local Backend is specified, there is usually an
image parameter exposed that takes a Docker image as input.
Because ML practitioners may not be familiar with the Docker paradigm, ZenML ensures that there is a series of sane defaults that kick-in for users who do not want (or need) to build their own images. If no image is used a default Base Image is used instead. This Base Image contains all dependencies that come bundled with ZenML.
Some examples of when a Docker image is required:
Creating custom images¶
In cases where the dependencies specified in the base images are not enough, you can easily create a custom image based on the corresponding base image. The base images are hosted on a public Container Registry, namely
[eu.gcr.io/maiot-zenml](http://eu.gcr.io/maiot-zenml). The Dockerfiles of all base images can be found in the
zenml/docker directory of the source code.
The easiest way to create your own, custom ZenML Docker Image, is by starting a new Dockerfile, using the ZenML Base Image as
FROM eu.gcr.io/maiot-zenml/zenml:base-0.1.5 # The ZenML Base Image ADD . . # adds your working directory to the resulting Docker Image RUN pip install -r requirements.txt # install your custom requirements
More seasoned readers might notice, that there is no definition of an
ENTRYPOINT anywhere. The ZenML Docker Images are deliberately designed without an
ENTRYPOINT , as every backend can ship with a generic or specialised pipeline entrypoint. Routing is therefore handled via container invocation, not through defaults.
docker build . -f path/to/your/dockerfile -t your-container-registry/your-image-name:your-image-tag (given you adjust the paths and names) will yield you with your very own Docker image. You should push that image to your own container registry, to ensure availability for your pipeline backends, via
docker push your-container-registry/your-image-name:your-image-tag.
This is by no means a complete guide on building, pushing, and hosting Docker images. We strongly recommend you familiarize yourself with a Container registry of your choosing (e.g. Google Container Registry or Docker Hub), and how the backend of your choosing can interact with the individual Registry options to ensure a pipeline’s backend can pull and run your newly built containers.
Using custom images¶
Once you’ve successfully built and pushed your new Docker Image to a registry you’re ready to use it in your ZenML pipelines. For this example I’ll be re-using our Tutorial on running a pipeline on GCP.
Your new orchestration backend instantiation looks like this:
# Let's define the image you'll want to use: custom_docker_image = 'your-container-registry/your-image-name:your-image-tag' # taken straigt from the Google Cloud VM example artifact_store = 'gs://your-bucket-name/optional-subfolder' project = 'PROJECT' # the project to launch the VM in zone = 'europe-west1-b' # the zone to launch the VM in cloudsql_connection_name = 'PROJECT:REGION:INSTANCE' mysql_db = 'DATABASE' mysql_user = 'USERNAME' mysql_pw = 'PASSWORD' # Run the pipeline on a Google Cloud VM training_pipeline.run( backend=OrchestratorGCPBackend( cloudsql_connection_name=cloudsql_connection_name, project=project, zone=zone, image=custom_docker_image, # this is it - your pipeline will use your very own Docker image. ), metadata_store=MySQLMetadataStore( host='127.0.0.1', port=3306, database=mysql_db, username=mysql_user, password=mysql_pw, ), artifact_store=ArtifactStore(artifact_store) )
For more information on individual backends and if they have support for Docker Images please check the corresponding documentation.