Self-hosted deployment
Guide for installing ZenML Pro self-hosted in a Kubernetes cluster.
This page provides instructions for installing ZenML Pro - the ZenML Pro Control Plane and one or more ZenML Pro Workspace servers - on-premise in a Kubernetes cluster. For more general information on deploying ZenML, visit our documentation where we explain the different options you have.
Overview
ZenML Pro can be installed as a self-hosted deployment. You need to be granted access to the ZenML Pro container images and you'll have to provide your own infrastructure: a Kubernetes cluster, a database server and a few other common prerequisites usually needed to expose Kubernetes services via HTTPS - a load balancer, an Ingress controller, HTTPS certificate(s) and DNS record(s).
This document will guide you through the process.
Preparation and prerequisites
Software Artifacts
The ZenML Pro on-prem installation relies on a set of container images and Helm charts. The container images are stored in private ZenML container registries that are not available to the public.
If you haven't done so already, please book a demo to get access to the private ZenML Pro container images.
ZenML Pro Control Plane Artifacts
The following artifacts are required to install the ZenML Pro control plane in your own Kubernetes cluster:
private container images for the ZenML Pro API server:
715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api in AWS
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api in GCP
private container images for the ZenML Pro dashboard:
715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard in AWS
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard in GCP
the public ZenML Pro Helm chart (as an OCI artifact):
oci://public.ecr.aws/zenml/zenml-pro
ZenML Pro Workspace Server Artifacts
The following artifacts are required to install ZenML Pro workspace servers in your own Kubernetes cluster:
private container images for the ZenML Pro workspace server:
715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server in AWS
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server in GCP
the public open-source ZenML Helm chart (as an OCI artifact):
oci://public.ecr.aws/zenml/zenml
ZenML Pro Client Artifacts
If you're planning on running containerized ZenML pipelines, or using other containerization-related ZenML features, you'll also need access to the public ZenML client container image located on Docker Hub at zenmldocker/zenml. This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the DockerSettings documentation for more information).
Accessing the ZenML Pro Container Images
This section provides instructions for how to access the private ZenML Pro container images.
AWS
To access the ZenML Pro container images stored in AWS ECR, you need to set up an AWS IAM user or IAM role in your AWS account. The steps below outline how to create an AWS account, configure the necessary IAM entities, and pull images from the private repositories. If you're familiar with AWS or plan on using an AWS EKS cluster to deploy ZenML Pro, you can simply use your existing IAM user or IAM role and skip steps 1 and 2.
Step 1: Create a Free AWS Account
Visit the AWS Free Tier page.
Click Create a Free Account.
Follow the on-screen instructions to provide your email address, create a root user, and set a secure password.
Enter your contact and payment information for verification purposes. While a credit or debit card is required, you won't be charged for free-tier eligible services.
Confirm your email and complete the verification process.
Log in to the AWS Management Console using your root user credentials.
Step 2: Create an IAM User or IAM Role
A. Create an IAM User
Log in to the AWS Management Console.
Navigate to the IAM service.
Click Users in the left-hand menu, then click Add Users.
Provide a user name (e.g., zenml-ecr-access).
Select Access Key - Programmatic access as the AWS credential type.
Click Next: Permissions.
Choose Attach policies directly, then select the following policies:
AmazonEC2ContainerRegistryReadOnly
Click Next: Tags and optionally add tags for organization purposes.
Click Next: Review, then Create User.
Note the Access Key ID and Secret Access Key displayed after creation. Save these securely.
B. Create an IAM Role
Navigate to the IAM service.
Click Roles in the left-hand menu, then click Create Role.
Choose the type of trusted entity:
Select AWS Account.
Enter your AWS account ID and click Next.
Select the AmazonEC2ContainerRegistryReadOnly policy.
Click Next: Tags, optionally add tags, then click Next: Review.
Provide a role name (e.g., zenml-ecr-access-role) and click Create Role.
Step 3: Provide the IAM User/Role ARN
For an IAM user, the ARN can be found in the Users section under the Summary tab.
For an IAM role, the ARN is displayed in the Roles section under the Summary tab.
Send the ARN to ZenML Support so it can be granted permission to access the ZenML Pro container images and Helm charts.
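If you're not sure which ARN to send and you already have the AWS CLI configured, you can look it up directly. A quick sketch, assuming your default credentials are the IAM user or role you intend to use:

aws sts get-caller-identity --query Arn --output text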
Step 4: Authenticate your Docker Client
Run these steps on the machine that you'll use to pull the ZenML Pro images. It is recommended that you copy the container images into your own container registry, one that is accessible from the Kubernetes cluster where ZenML Pro will be deployed. Otherwise, you'll have to find a way to configure the Kubernetes cluster to authenticate directly to the ZenML Pro container registry, which will be problematic if your Kubernetes cluster is not running on AWS.
A. Install AWS CLI
Follow the instructions to install the AWS CLI: AWS CLI Installation Guide.
B. Configure AWS CLI Credentials
Open a terminal and run
aws configure
Enter the following when prompted:
Access Key ID: Provided during IAM user creation.
Secret Access Key: Provided during IAM user creation.
Default region name: eu-west-1
Default output format: leave blank or enter json.
If you chose to use an IAM role, update the AWS CLI configuration file to specify the role you want to assume. Open the configuration file located at ~/.aws/config and add the following:

[profile zenml-ecr-access]
role_arn = <IAM-ROLE-ARN>
source_profile = default
region = eu-west-1

Replace <IAM-ROLE-ARN> with the ARN of the role you created and ensure source_profile points to a profile with sufficient permissions to assume the role.
C. Authenticate Docker with ECR
Run the following command to authenticate your Docker client with the ZenML ECR repository:
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-west-1.amazonaws.com
aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-central-1.amazonaws.com
If you used an IAM role, use the specified profile to execute commands. For example:
aws ecr get-login-password --region eu-west-1 --profile zenml-ecr-access | docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-west-1.amazonaws.com
aws ecr get-login-password --region eu-central-1 --profile zenml-ecr-access | docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-central-1.amazonaws.com
This will allow you to authenticate to the ZenML Pro container registries and pull the necessary images with Docker, e.g.:
# Pull the ZenML Pro API image
docker pull 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api:<zenml-pro-version>
# Pull the ZenML Pro Dashboard image
docker pull 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard:<zenml-pro-version>
# Pull the ZenML Pro Server image
docker pull 715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server:<zenml-oss-version>
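If you plan to mirror the images into your own container registry, you can retag and push them right after pulling. A minimal sketch, where my-registry.example.com/zenml is a placeholder for your own registry:

# Retag and push the ZenML Pro API image to your own registry
docker tag 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api:<zenml-pro-version> my-registry.example.com/zenml/zenml-pro-api:<zenml-pro-version>
docker push my-registry.example.com/zenml/zenml-pro-api:<zenml-pro-version>
# Repeat the same tag/push steps for the dashboard and workspace server images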
GCP
To access the ZenML Pro container images stored in Google Cloud Platform (GCP) Artifact Registry, you need to set up a GCP account and configure the necessary permissions. The steps below outline how to create a GCP account, configure authentication, and pull images from the private repositories. If you're familiar with GCP or plan on using a GKE cluster to deploy ZenML Pro, you can use your existing GCP account and skip step 1.
Step 1: Create a GCP Account
Visit the Google Cloud Console.
Click Get Started for Free or sign in with an existing Google account.
Follow the on-screen instructions to set up your account and create a project.
Set up billing information (required for using GCP services).
Step 2: Create a Service Account
Navigate to the IAM & Admin > Service Accounts page in the Google Cloud Console.
Click Create Service Account.
Enter a service account name (e.g., zenml-gar-access).
Add a description (optional) and click Create and Continue.
No additional permissions are needed as access will be granted directly to the Artifact Registry.
Click Done.
After creation, click on the service account to view its details.
Go to the Keys tab and click Add Key > Create new key.
Choose JSON as the key type and click Create.
Save the downloaded JSON key file securely - you'll need it later.
Step 3: Provide the Service Account Email
In the service account details page, copy the service account email address (it should look like
[email protected]
).Send this email address to ZenML Support so it can be granted permission to access the ZenML Pro container images.
Step 4: Authenticate your Docker Client
Run these steps on the machine that you'll use to pull the ZenML Pro images. It is recommended that you copy the container images into your own container registry, one that is accessible from the Kubernetes cluster where ZenML Pro will be deployed.
A. Install Google Cloud CLI
Follow the instructions to install the Google Cloud CLI.
Initialize the CLI by running:
gcloud init
B. Configure Authentication
Activate the service account using the JSON key file you downloaded:
gcloud auth activate-service-account --key-file=/path/to/your-key-file.json
Configure Docker authentication for Artifact Registry:
gcloud auth configure-docker europe-west3-docker.pkg.dev
C. Pull the Container Images
You can now pull the ZenML Pro images:
# Pull the ZenML Pro API image
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api:<zenml-pro-version>
# Pull the ZenML Pro Dashboard image
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard:<zenml-pro-version>
# Pull the ZenML Pro Server image
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server:<zenml-oss-version>
Air-Gapped Installation
If you need to install ZenML Pro in an air-gapped environment (a network with no direct internet access), you'll need to transfer all required artifacts to your internal infrastructure. Here's a step-by-step process:
1. Prepare a Machine with Internet Access
First, you'll need a machine with both internet access and sufficient storage space to temporarily store all artifacts. On this machine:
Follow the authentication steps described above to gain access to the private repositories
Install the required tools:
Docker
Helm
2. Download All Required Artifacts
A Bash script like the following can be used to download all necessary components, or you can run the listed commands manually:
#!/bin/bash
set -e
# Set the version numbers
ZENML_PRO_VERSION="<version>" # e.g., "0.10.24"
ZENML_OSS_VERSION="<version>" # e.g., "0.73.0"
# Create directories for artifacts
mkdir -p zenml-artifacts/images
mkdir -p zenml-artifacts/charts
# Set registry URLs
# Use the following if you're pulling from the ZenML private ECR registry
ZENML_PRO_REGISTRY="715803424590.dkr.ecr.eu-west-1.amazonaws.com"
ZENML_PRO_SERVER_REGISTRY="715803424590.dkr.ecr.eu-central-1.amazonaws.com"
# Use the following if you're pulling from the ZenML private GCP Artifact Registry
# ZENML_PRO_REGISTRY="europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro"
# ZENML_PRO_SERVER_REGISTRY=$ZENML_PRO_REGISTRY
ZENML_HELM_REGISTRY="public.ecr.aws/zenml"
ZENML_DOCKERHUB_REGISTRY="zenmldocker"
# Download container images
echo "Downloading container images..."
docker pull ${ZENML_PRO_REGISTRY}/zenml-pro-api:${ZENML_PRO_VERSION}
docker pull ${ZENML_PRO_REGISTRY}/zenml-pro-dashboard:${ZENML_PRO_VERSION}
docker pull ${ZENML_PRO_SERVER_REGISTRY}/zenml-pro-server:${ZENML_OSS_VERSION}
docker pull ${ZENML_DOCKERHUB_REGISTRY}/zenml:${ZENML_OSS_VERSION}
# Save images to tar files
echo "Saving images to tar files..."
docker save ${ZENML_PRO_REGISTRY}/zenml-pro-api:${ZENML_PRO_VERSION} > zenml-artifacts/images/zenml-pro-api.tar
docker save ${ZENML_PRO_REGISTRY}/zenml-pro-dashboard:${ZENML_PRO_VERSION} > zenml-artifacts/images/zenml-pro-dashboard.tar
docker save ${ZENML_PRO_SERVER_REGISTRY}/zenml-pro-server:${ZENML_OSS_VERSION} > zenml-artifacts/images/zenml-pro-server.tar
docker save ${ZENML_DOCKERHUB_REGISTRY}/zenml:${ZENML_OSS_VERSION} > zenml-artifacts/images/zenml-client.tar
# Download Helm charts
echo "Downloading Helm charts..."
helm pull oci://${ZENML_HELM_REGISTRY}/zenml-pro --version ${ZENML_PRO_VERSION} -d zenml-artifacts/charts
helm pull oci://${ZENML_HELM_REGISTRY}/zenml --version ${ZENML_OSS_VERSION} -d zenml-artifacts/charts
# Create a manifest file with versions
echo "Creating manifest file..."
cat > zenml-artifacts/manifest.txt << EOF
ZenML Pro Version: ${ZENML_PRO_VERSION}
ZenML OSS Version: ${ZENML_OSS_VERSION}
Date Created: $(date)
Container Images:
- zenml-pro-api:${ZENML_PRO_VERSION}
- zenml-pro-dashboard:${ZENML_PRO_VERSION}
- zenml-pro-server:${ZENML_OSS_VERSION}
- zenml-client:${ZENML_OSS_VERSION}
Helm Charts:
- zenml-pro-${ZENML_PRO_VERSION}.tgz
- zenml-${ZENML_OSS_VERSION}.tgz
EOF
# Create final archive
echo "Creating final archive..."
tar czf zenml-artifacts.tar.gz zenml-artifacts/
3. Transfer Artifacts to Air-Gapped Environment
Copy the zenml-artifacts.tar.gz file to your preferred transfer medium (e.g., USB drive, approved file transfer system).
Transfer the archive to a machine in your air-gapped environment that has access to your internal container registry.
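Since archives can get corrupted in transit, it's good practice to verify the integrity of the archive on both sides of the transfer. A minimal sketch using sha256sum:

# On the machine with internet access, before the transfer
sha256sum zenml-artifacts.tar.gz > zenml-artifacts.tar.gz.sha256
# On the air-gapped machine, after the transfer
sha256sum -c zenml-artifacts.tar.gz.sha256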
4. Load Artifacts in Air-Gapped Environment
Create a script to load the artifacts in your air-gapped environment or run the listed commands manually:
#!/bin/bash
set -e
# Extract the archive
echo "Extracting archive..."
tar xzf zenml-artifacts.tar.gz
# Read the manifest
echo "Manifest:"
cat zenml-artifacts/manifest.txt
# Load images and track which ones were loaded
echo "Loading images into Docker..."
LOADED_IMAGES=()
# Load each image and capture its reference
image_ref=$(docker load < zenml-artifacts/images/zenml-pro-api.tar | grep "Loaded image:" | cut -d' ' -f3)
LOADED_IMAGES+=("$image_ref")
echo "Loaded image: $image_ref"
image_ref=$(docker load < zenml-artifacts/images/zenml-pro-dashboard.tar | grep "Loaded image:" | cut -d' ' -f3)
LOADED_IMAGES+=("$image_ref")
echo "Loaded image: $image_ref"
image_ref=$(docker load < zenml-artifacts/images/zenml-pro-server.tar | grep "Loaded image:" | cut -d' ' -f3)
LOADED_IMAGES+=("$image_ref")
echo "Loaded image: $image_ref"
image_ref=$(docker load < zenml-artifacts/images/zenml-client.tar | grep "Loaded image:" | cut -d' ' -f3)
LOADED_IMAGES+=("$image_ref")
echo "Loaded image: $image_ref"
# Tag and push images to your internal registry
INTERNAL_REGISTRY="internal-registry.company.com"
echo "Pushing images to internal registry..."
for img in "${LOADED_IMAGES[@]}"; do
# Get the image name without the repository and tag
img_name=$(echo $img | awk -F/ '{print $NF}' | cut -d: -f1)
# Get the tag
tag=$(echo $img | cut -d: -f2)
echo "Processing $img"
docker tag "$img" "${INTERNAL_REGISTRY}/zenml/$img_name:$tag"
docker push "${INTERNAL_REGISTRY}/zenml/$img_name:$tag"
echo "Pushed image: ${INTERNAL_REGISTRY}/zenml/$img_name:$tag"
done
# Copy Helm charts to your internal Helm repository (if applicable)
echo "Helm charts are available in: zenml-artifacts/charts/"
5. Update Configuration
When deploying ZenML Pro in your air-gapped environment, make sure to update all references to container images in your Helm values to point to your internal registry. For example:
zenml:
  image:
    api:
      repository: internal-registry.company.com/zenml/zenml-pro-api
    dashboard:
      repository: internal-registry.company.com/zenml/zenml-pro-dashboard
The scripts provided above are examples and may need to be adjusted based on your specific security requirements and internal infrastructure setup.
6. Using the Helm Charts
After downloading the Helm charts, you can use their local paths instead of a remote OCI registry to deploy ZenML Pro components. Here's an example of how to use them:
# Install the ZenML Pro Control Plane (e.g. zenml-pro-0.10.24.tgz)
helm install zenml-pro ./zenml-artifacts/charts/zenml-pro-<version>.tgz \
--namespace zenml-pro \
--create-namespace \
--values your-values.yaml
# Install a ZenML Pro Workspace Server (e.g. zenml-0.73.0.tgz)
helm install zenml-workspace ./zenml-artifacts/charts/zenml-<version>.tgz \
--namespace zenml-workspace \
--create-namespace \
--values your-workspace-values.yaml
Infrastructure Requirements
To deploy the ZenML Pro control plane and one or more ZenML Pro workspace servers, ensure the following prerequisites are met:
Kubernetes Cluster
A functional Kubernetes cluster is required as the primary runtime environment.
Database Server(s)
The ZenML Pro Control Plane and the ZenML Pro Workspace servers need to connect to an external database server. To minimize the amount of infrastructure resources needed, you can share a single database server between the Control Plane and all workspaces, or you can use separate database servers for server-level database isolation, as long as you keep the following limitations in mind:
the ZenML Pro Control Plane can be connected to either MySQL or Postgres as the external database
the ZenML Pro Workspace servers can only be connected to a MySQL database (no Postgres support is available)
the ZenML Pro Control Plane as well as every ZenML Pro Workspace server needs to use its own individual database (especially important when they share the same database server)
Ensure you have a valid username and password for the different ZenML Pro services. For improved security, it is recommended to use different users for different services. If the database user does not have permissions to create databases, you must also create a database and give the user full permissions to access and manage it (i.e. create, update and delete tables); a sketch of this is shown below.
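For example, here is a minimal sketch of how a database and a dedicated user could be prepared for the control plane on a MySQL server (hostnames, user names and passwords are placeholders; adapt them to your own setup, and repeat with a different database name for each workspace server):

mysql -h my-database.my.domain -u root -p <<'SQL'
-- Create the database used by the ZenML Pro Control Plane
CREATE DATABASE zenmlpro;
-- Create a dedicated user and grant it full access to that database only
CREATE USER 'zenml'@'%' IDENTIFIED BY 'my-password';
GRANT ALL PRIVILEGES ON zenmlpro.* TO 'zenml'@'%';
FLUSH PRIVILEGES;
SQL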
Ingress Controller
Install an Ingress provider in the cluster (e.g., NGINX, Traefik) to handle HTTP(S) traffic routing. Ensure the Ingress provider is properly configured to expose the cluster's services externally.
Domain Name
You'll need an FQDN for the ZenML Pro Control Plane as well as for every ZenML Pro workspace. For this reason, it's highly recommended to use a DNS prefix and associated SSL certificate instead of individual FQDNs and SSL certificates, to make this process easier.
FQDN or DNS Prefix Setup
Obtain a Fully Qualified Domain Name (FQDN) or DNS prefix (e.g., *.zenml-pro.mydomain.com) from your DNS provider.
Identify the external Load Balancer IP address of the Ingress controller using the command kubectl get svc -n <ingress-namespace>. Look for the EXTERNAL-IP field of the Load Balancer service.
Create a DNS A record (or CNAME for subdomains) pointing the FQDN to the Load Balancer IP. Example:
Host: zenml-pro.mydomain.com
Type: A
Value: <Load Balancer IP>
Use a DNS propagation checker to confirm that the DNS record is resolving correctly.
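You can also check the resolution directly from a terminal. A quick sketch (replace the hostname with your own):

dig +short zenml-pro.mydomain.com
# or
nslookup zenml-pro.mydomain.com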
Make sure you don't use a simple DNS prefix for the servers (e.g. https://zenml.cluster is not recommended). This is especially relevant for the TLS certificates that you have to prepare for these endpoints. Always use a fully qualified domain name (e.g. https://zenml.ml.cluster), otherwise the TLS certificates will not be accepted by some browsers (e.g. Chrome).
SSL Certificate
The ZenML Pro services do not terminate SSL traffic. It is your responsibility to generate and configure the necessary SSL certificates for the ZenML Pro Control Plane as well as all the ZenML Pro workspaces that you will deploy (see the previous point on how to use a DNS prefix to make the process easier).
Obtaining SSL Certificates
Acquire an SSL certificate for the domain. You can use:
A commercial SSL certificate provider (e.g., DigiCert, Sectigo).
Free services like Let's Encrypt for domain validation and issuance.
Self-signed certificates (not recommended for production environments). IMPORTANT: If you are using self-signed certificates, it is highly recommended to use the same self-signed CA certificate for all the ZenML Pro services (control plane and workspace servers), otherwise it will be difficult to manage the certificates on the client machines. With a single CA certificate, you only have to install it system-wide on each client machine once and can then use it to sign all the TLS certificates for the ZenML Pro services.
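If you do go the self-signed route, here is a minimal openssl sketch for creating a single CA and using it to sign one wildcard certificate for all ZenML Pro endpoints (file names and the domain are placeholders):

# Create the CA key and self-signed CA certificate
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout ca.key -out ca.crt -subj "/CN=My ZenML Pro CA"
# Create a key and certificate signing request for the wildcard domain
openssl req -newkey rsa:4096 -nodes -keyout tls.key -out tls.csr \
  -subj "/CN=*.zenml-pro.mydomain.com"
# Sign the server certificate with the CA, including the SAN extension
openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 365 -out tls.crt \
  -extfile <(printf "subjectAltName=DNS:*.zenml-pro.mydomain.com")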
Configuring SSL Termination
Once the SSL certificate is obtained, configure your load balancer or Ingress controller to terminate HTTPS traffic:
For NGINX Ingress Controller:
You can configure SSL termination globally for the NGINX Ingress Controller by setting up a default SSL certificate or configuring it at the ingress controller level, or you can specify SSL certificates when configuring the ingress in the ZenML server Helm values.
Here's how you can do it globally:
Create a TLS Secret
Store your SSL certificate and private key as a Kubernetes TLS secret in the namespace where the NGINX Ingress Controller is deployed.
kubectl create secret tls default-ssl-secret \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key \
  -n <nginx-ingress-namespace>
Update NGINX Ingress Controller Configurations
Configure the NGINX Ingress Controller to use the default SSL certificate.
If using the NGINX Ingress Controller Helm chart, modify the values.yaml file or use --set during installation:

controller:
  extraArgs:
    default-ssl-certificate: <nginx-ingress-namespace>/default-ssl-secret
Or directly pass the argument during Helm installation or upgrade:
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace <nginx-ingress-namespace> \
  --set controller.extraArgs.default-ssl-certificate=<nginx-ingress-namespace>/default-ssl-secret
If the NGINX Ingress Controller was installed manually, edit its deployment to include the argument in the args section of the container:

spec:
  containers:
    - name: controller
      args:
        - --default-ssl-certificate=<nginx-ingress-namespace>/default-ssl-secret
For Traefik:
Configure Traefik to use TLS by creating a certificate resolver for Let's Encrypt or specifying the certificates manually in the traefik.yml or values.yaml file. Example for Let's Encrypt:

certificatesResolvers:
  letsencrypt:
    acme:
      email: your-email@example.com
      storage: acme.json
      httpChallenge:
        entryPoint: web
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
Reference the domain in your IngressRoute or Middleware configuration.
If you used a custom CA certificate to sign the TLS certificates for the ZenML Pro services, you will need to install the CA certificates on every client machine, as covered in the Install CA Certificates section.
The above are the infrastructure requirements for ZenML Pro itself. If, in addition, you would also like to reuse the same Kubernetes cluster to run machine learning workloads with ZenML, you will need the following additional infrastructure resources and services to set up a remote ZenML Stack:
a Kubernetes ZenML Orchestrator can be set up to run on the same cluster as ZenML Pro. For authentication, you will be able to configure a ZenML Kubernetes Service Connector using service account tokens
you'll need a container registry to store the container images built by ZenML. If you don't have one already, you can install Docker registry on the same cluster as ZenML Pro.
you'll also need some form of centralized object storage to store the artifacts generated by ZenML. If you don't have one already, you can install MinIO on the same cluster as ZenML Pro and then configure the ZenML S3 Artifact Store to use it.
(optional) you can install Kaniko in your Kubernetes cluster to build the container images for your ZenML pipelines and then configure it as a ZenML Kaniko Image Builder in your ZenML Stack.
Stage 1/2: Install the ZenML Pro Control Plane
Set up Credentials
If your Kubernetes cluster is not already able to authenticate to the container registry where the ZenML Pro container images are hosted, you will need to create a secret to allow the ZenML Pro server to pull the images. The following is an example of how to do this if you've received a private access key for the ZenML GCP Artifact Registry from ZenML, but you can use the same approach for your own private container registry:
kubectl create ns zenml-pro
kubectl -n zenml-pro create secret docker-registry image-pull-secret \
--docker-server=europe-west3-docker.pkg.dev \
--docker-username=_json_key_base64 \
--docker-password="$(cat key.base64)" \
--docker-email=unused
The key.base64 file should contain the base64-encoded JSON key for the GCP service account, as received from the ZenML support team. The image-pull-secret secret will be used in the next step when installing the ZenML Pro Helm chart.
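The equivalent for the private ZenML AWS ECR registry looks something like the sketch below. Keep in mind that ECR login passwords expire after 12 hours, so this is mainly useful for testing; for production, mirroring the images into your own registry is the more robust option:

kubectl -n zenml-pro create secret docker-registry image-pull-secret \
  --docker-server=715803424590.dkr.ecr.eu-west-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region eu-west-1)" \
  --docker-email=unused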
Configure the Helm Chart
There are a variety of options that can be configured for the ZenML Pro helm chart before installation.
You can take a look at the Helm chart README and values.yaml file and familiarize yourself with some of the configuration settings that you can customize for your ZenML Pro deployment. Alternatively, you can unpack the README.md and values.yaml files included in the helm chart:
helm pull --untar oci://public.ecr.aws/zenml/zenml-pro --version <version>
less zenml-pro/README.md
less zenml-pro/values.yaml
This is an example Helm values YAML file that covers the most common configuration options:
# Set up imagePullSecrets to authenticate to the container registry where the
# ZenML Pro container images are hosted, if necessary (see the previous step)
imagePullSecrets:
  - name: image-pull-secret

# ZenML Pro server related options.
zenml:
  image:
    api:
      # Change this to point to your own container repository or use this for direct ECR access
      repository: 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api
      # Use this for direct GAR access
      # repository: europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api
    dashboard:
      # Change this to point to your own container repository or use this for direct ECR access
      repository: 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard
      # Use this for direct GAR access
      # repository: europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard

  # The external URL where the ZenML Pro server API and dashboard are reachable.
  #
  # This should be set to a hostname that is associated with the Ingress
  # controller.
  serverURL: https://zenml-pro.my.domain

  # Database configuration.
  database:
    # Credentials to use to connect to an external Postgres or MySQL database.
    external:
      # The type of the external database service to use:
      # - postgres: use an external Postgres database service.
      # - mysql: use an external MySQL database service.
      type: mysql
      # The host of the external database service.
      host: my-database.my.domain
      # The username to use to connect to the external database service.
      username: zenml
      # The password to use to connect to the external database service.
      password: my-password
      # The name of the database to use. Will be created on first run if it
      # doesn't exist.
      #
      # NOTE: if the database user doesn't have permissions to create this
      # database, the database should be created manually before installing
      # the helm chart.
      database: zenmlpro

  ingress:
    enabled: true
    # Use the same hostname configured in `serverURL`
    host: zenml-pro.my.domain
Minimum required settings:
the database credentials (zenml.database.external)
the URL (zenml.serverURL) and Ingress hostname (zenml.ingress.host) where the ZenML Pro Control Plane API and Dashboard will be reachable
In addition to the above, the following might also be relevant for you:
configure container registry credentials (imagePullSecrets)
injecting custom CA certificates (zenml.certificates), especially important if the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority
configure HTTP proxy settings (zenml.proxy)
custom container image repository locations (zenml.image.api and zenml.image.dashboard)
the username and password used for the default admin account (zenml.auth.password)
additional Ingress settings (zenml.ingress)
Kubernetes resources allocated to the pods (resources)
If you set up a common DNS prefix that you plan on using for all the ZenML Pro services, you may configure the domain of the HTTP cookies used by the ZenML Pro dashboard to match it by setting zenml.auth.authCookieDomain to the DNS prefix (e.g. .my.domain instead of zenml-pro.my.domain)
Install the Helm Chart
To install the helm chart (assuming the customized configuration values are in a my-values.yaml file), run:
helm --namespace zenml-pro upgrade --install --create-namespace zenml-pro oci://public.ecr.aws/zenml/zenml-pro --version <version> --values my-values.yaml
If the installation is successful, you should be able to see the following workloads running in your cluster:
$ kubectl -n zenml-pro get all
NAME READY STATUS RESTARTS AGE
pod/zenml-pro-5db4c4d9d-jwp6x 1/1 Running 0 1m
pod/zenml-pro-dashboard-855c4849-qf2f6 1/1 Running 0 1m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/zenml-pro ClusterIP 172.20.230.49 <none> 80/TCP 162m
service/zenml-pro-dashboard ClusterIP 172.20.163.154 <none> 80/TCP 162m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/zenml-pro 1/1 1 1 1m
deployment.apps/zenml-pro-dashboard 1/1 1 1 1m
NAME DESIRED CURRENT READY AGE
replicaset.apps/zenml-pro-5db4c4d9d 1 1 1 1m
replicaset.apps/zenml-pro-dashboard-855c4849 1 1 1 1m
The Helm chart will output information explaining how to connect and authenticate to the ZenML Pro dashboard:
You may access the ZenML Pro server at: https://zenml-pro.my.domain
Use the following credentials:
Username: admin@zenml.pro
Password: fetch the password by running:
kubectl get secret --namespace zenml-pro zenml-pro -o jsonpath="{.data.ZENML_CLOUD_ADMIN_PASSWORD}" | base64 --decode; echo
The credentials are for the default administrator user account provisioned on installation. With these in hand, you can proceed to the next step and onboard additional users.
Install CA Certificates
If the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority, you need to install the CA certificates on every machine that needs to access the ZenML server:
installing the CA certificates system-wide is usually the easiest solution. For example, on Ubuntu and Debian-based systems, you can install the CA certificates system-wide by copying them into the /usr/local/share/ca-certificates directory and running update-ca-certificates.
for some browsers (e.g. Chrome), updating the system's CA certificates is not enough. You will also need to import the CA certificates into the browser.
for Python, you also need to set the REQUESTS_CA_BUNDLE environment variable to the path of the system's CA certificates bundle file (e.g. export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt)
later on, when you're running containerized pipelines with ZenML, you'll also want to install those same CA certificates into the container images built by ZenML by customizing the build process via DockerSettings. For example:
customize the ZenML client container image using a Dockerfile like this:
# Use the original ZenML client image as a base image. The ZenML version
# should match the version of the ZenML server you're using (e.g. 0.73.0).
FROM zenmldocker/zenml:<zenml-version>

# Install certificates
COPY my-custom-ca.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates
ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
then build and push that image to your private container registry:
docker build -t my.docker.registry/my-custom-zenml-image:<zenml-version> .
docker push my.docker.registry/my-custom-zenml-image:<zenml-version>
and finally update your ZenML pipeline code to use the custom ZenML client image by using the DockerSettings class:

from zenml import __version__, pipeline
from zenml.config import DockerSettings

# Define the custom base image
CUSTOM_BASE_IMAGE = f"my.docker.registry/my-custom-zenml-image:{__version__}"

docker_settings = DockerSettings(
    parent_image=CUSTOM_BASE_IMAGE,
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    ...
Onboard Additional Users
The deployed ZenML Pro service comes with a pre-installed default administrator account. This admin account serves the purpose of creating and recovering other users. First, you will need to get the admin password following the instructions in the previous step.
kubectl get secret --namespace zenml-pro zenml-pro -o jsonpath="{.data.ZENML_CLOUD_ADMIN_PASSWORD}" | base64 --decode; echo
Create a users.yml file that contains a list of all the users that you want to create for ZenML. Also set a default password. The users will be asked to change this password on their first login.

users:
  - email: user@mydomain.com
    password: tu3]4_Xz{5$9
Run the create_users.py script below. This will create all of the users.
[file: create_users.py]
import getpass
import sys
from typing import Optional

import requests
import yaml

# Configuration
LOGIN_ENDPOINT = "/api/v1/auth/login"
USERS_ENDPOINT = "/api/v1/users"


def login(base_url: str, username: str, password: str):
    """Log in and return the authentication token."""
    # Define the headers
    headers = {
        "accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Define the data payload
    data = {
        "grant_type": "",
        "username": username,
        "password": password,
        "client_id": "",
        "client_secret": "",
        "device_code": "",
        "audience": "",
    }
    login_url = f"{base_url}{LOGIN_ENDPOINT}"
    response = requests.post(login_url, headers=headers, data=data)
    if response.status_code == 200:
        return response.json().get("access_token")
    else:
        print(f"Login failed. Status code: {response.status_code}")
        print(f"Response: {response.text}")
        sys.exit(1)


def create_user(token: str, base_url: str, email: str, password: Optional[str]):
    """Create a user with the given email."""
    users_url = f"{base_url}{USERS_ENDPOINT}"
    params = {
        "email": email,
        "password": password,
    }
    # Define the headers
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    # Make the POST request
    response = requests.post(users_url, params=params, headers=headers, data="")
    if response.status_code == 200:
        print(f"User created successfully: {email}")
    else:
        print(f"Failed to create user: {email}")
        print(f"Status code: {response.status_code}")
        print(f"Response: {response.text}")


def main():
    # Get login credentials
    base_url = input("ZenML URL: ")
    username = input("Enter username: ")
    password = getpass.getpass("Enter password: ")

    # Get the YAML file path
    yaml_file = input("Enter the path to the YAML file containing email addresses: ")

    # Login and get token
    token = login(base_url, username, password)
    print("Login successful.")

    # Read users from YAML file
    try:
        with open(yaml_file, "r") as file:
            data = yaml.safe_load(file)
    except Exception as e:
        print(f"Error reading YAML file: {e}")
        sys.exit(1)

    users = data["users"]

    # Create users
    if isinstance(users, list):
        for user in users:
            create_user(token, base_url, user["email"], user["password"])
    else:
        print("Invalid YAML format. Expected a list of email addresses.")


if __name__ == "__main__":
    main()
The script will prompt you for the URL of your deployment, the admin account email, the admin account password, and finally the location of your users.yml file.
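The script only depends on the requests and PyYAML packages, so running it might look like this:

pip install requests pyyaml
python create_users.py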
Create an Organization
The ZenML Pro admin user should only be used for administrative operations: creating other users, resetting the password of existing users and enrolling workspaces. All other operations should be executed while logged in as a regular user.
Head on over to your deployment in the browser and use one of the users you just created to log in.
After logging in for the first time, you will need to set a new password. (Be aware: for the time being, only the admin account is able to reset this password.)
Finally, you can create an Organization. This Organization will host all the workspaces you enroll at the next stage.
Invite Other Users to the Organization
Now you can invite your whole team to the org. To do this, open the drop-down in the top right and head over to the settings.
Here, in the Members tab, add all the users you created in the previous step.
For each user, head over to the Pending Invites screen and copy the invite link.
Finally, send the invitation link, along with the account's email and initial password, to your team members.
Stage 2/2: Enroll and Deploy ZenML Pro workspaces
Installing and updating on-prem ZenML Pro workspace servers is not automated, as it is with the SaaS version. You will be responsible for enrolling workspace servers in the right ZenML Pro organization, installing them and regularly updating them. Some scripts are provided to simplify this task as much as possible.
Enrolling a Workspace
Run the enroll-workspace.py script below.
This will collect all the necessary data, then enroll the workspace in the organization and generate a Helm values.yaml file template that you can use to install the workspace server:
[file: enroll-workspace.py]
import getpass
import sys
import uuid
from typing import List, Optional, Tuple

import requests

DEFAULT_API_ROOT_PATH = "/api/v1"
DEFAULT_REPOSITORY = (
    "715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server"
)

# Configuration
LOGIN_ENDPOINT = "/api/v1/auth/login"
WORKSPACE_ENDPOINT = "/api/v1/workspaces"
ORGANIZATION_ENDPOINT = "/api/v1/organizations"


def login(base_url: str, username: str, password: str) -> str:
    """Log in and return the authentication token."""
    # Define the headers
    headers = {
        "accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Define the data payload
    data = {
        "grant_type": "",
        "username": username,
        "password": password,
        "client_id": "",
        "client_secret": "",
        "device_code": "",
        "audience": "",
    }
    login_url = f"{base_url}{LOGIN_ENDPOINT}"
    response = requests.post(login_url, headers=headers, data=data)
    if response.status_code == 200:
        return response.json().get("access_token")
    else:
        print(f"Login failed. Status code: {response.status_code}")
        print(f"Response: {response.text}")
        sys.exit(1)


def workspace_exists(
    token: str,
    base_url: str,
    org_id: str,
    workspace_name: Optional[str] = None,
) -> Optional[str]:
    """Get a workspace with a given name or url."""
    workspace_url = f"{base_url}{WORKSPACE_ENDPOINT}"
    # Define the headers
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    params = {
        "organization_id": org_id,
    }
    if workspace_name:
        params["workspace_name"] = workspace_name
    # Fetch the matching workspaces
    response = requests.get(
        workspace_url,
        params=params,
        headers=headers,
    )
    if response.status_code == 200:
        json_response = response.json()
        if len(json_response) > 0:
            return json_response[0]["id"]
    else:
        print(f"Failed to fetch workspaces for organization: {org_id}")
        print(f"Status code: {response.status_code}")
        print(f"Response: {response.text}")
        sys.exit(1)
    return None


def list_organizations(
    token: str,
    base_url: str,
) -> List[Tuple[str, str]]:
    """Get a list of organizations."""
    organization_url = f"{base_url}{ORGANIZATION_ENDPOINT}"
    # Define the headers
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    # Fetch the organizations
    response = requests.get(
        organization_url,
        headers=headers,
    )
    if response.status_code == 200:
        json_response = response.json()
        return [(org["id"], org["name"]) for org in json_response]
    else:
        print("Failed to fetch organizations")
        print(f"Status code: {response.status_code}")
        print(f"Response: {response.text}")
        sys.exit(1)


def enroll_workspace(
    token: str,
    base_url: str,
    org_id: str,
    workspace_name: str,
    delete_existing: Optional[str] = None,
) -> dict:
    """Enroll a workspace."""
    workspace_url = f"{base_url}{WORKSPACE_ENDPOINT}"
    # Define the headers
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    if delete_existing:
        # Delete the existing workspace
        response = requests.delete(
            f"{workspace_url}/{delete_existing}",
            headers=headers,
        )
        if response.status_code == 200:
            print(f"Workspace deleted successfully: {delete_existing}")
        else:
            print(f"Failed to delete workspace: {delete_existing}")
            print(f"Status code: {response.status_code}")
            print(f"Response: {response.text}")
            sys.exit(1)
    # Enroll the workspace
    response = requests.post(
        workspace_url,
        json={
            "name": workspace_name,
            "organization_id": org_id,
        },
        params={
            "enroll": True,
        },
        headers=headers,
    )
    if response.status_code == 200:
        workspace = response.json()
        workspace_id = workspace.get("id")
        print(
            f"Workspace enrolled successfully: {workspace_name} [{workspace_id}]"
        )
        return workspace
    else:
        print(f"Failed to enroll workspace: {workspace_name}")
        print(f"Status code: {response.status_code}")
        print(f"Response: {response.text}")
        sys.exit(1)


def prompt(
    prompt_text: str,
    default_value: Optional[str] = None,
    password: bool = False,
) -> str:
    """Prompt the user with a default value."""
    while True:
        if default_value:
            text = f"{prompt_text} [{default_value}]: "
        else:
            text = f"{prompt_text}: "
        if password:
            user_input = getpass.getpass(text)
        else:
            user_input = input(text)
        if user_input.strip() == "":
            if default_value:
                return default_value
            print("Please provide a value.")
            continue
        return user_input


def get_workspace_config(
    zenml_pro_url: str,
    organization_id: str,
    organization_name: str,
    workspace_id: str,
    workspace_name: str,
    enrollment_key: str,
    repository: str = DEFAULT_REPOSITORY,
) -> str:
    """Get the workspace configuration.

    Args:
        zenml_pro_url: ZenML Pro control plane URL.
        organization_id: Organization ID.
        organization_name: Organization name.
        workspace_id: Workspace ID.
        workspace_name: Workspace name.
        enrollment_key: Enrollment key.
        repository: Workspace docker image repository.

    Returns:
        The workspace configuration.
    """
    # Generate a secret key to encrypt the SQL database secrets
    encryption_key = f"{uuid.uuid4().hex}{uuid.uuid4().hex}"
    # Generate a hostname and database name from the workspace ID
    short_workspace_id = workspace_id.replace("-", "")
    return f"""
zenml:
  analyticsOptIn: false
  threadPoolSize: 20
  database:
    maxOverflow: "-1"
    poolSize: "10"
    # TODO: use the actual database host and credentials
    url: mysql://root:password@mysql.example.com:3306/zenml{short_workspace_id}
  image:
    # TODO: use your actual image repository (omit the tag, which is
    # assumed to be the same as the helm chart version)
    repository: {repository}
  # TODO: use your actual server domain here
  serverURL: https://zenml.{short_workspace_id}.example.com
  ingress:
    enabled: true
    # TODO: use your actual domain here
    host: zenml.{short_workspace_id}.example.com
  pro:
    apiURL: {zenml_pro_url}/api/v1
    dashboardURL: {zenml_pro_url}
    enabled: true
    enrollmentKey: {enrollment_key}
    organizationID: {organization_id}
    organizationName: {organization_name}
    workspaceID: {workspace_id}
    workspaceName: {workspace_name}
  replicaCount: 1
  secretsStore:
    sql:
      encryptionKey: {encryption_key}
    type: sql
# TODO: these are the minimum resources required for the ZenML server. You can
# adjust them to your needs.
resources:
  limits:
    memory: 800Mi
  requests:
    cpu: 100m
    memory: 450Mi
"""


def main() -> None:
    zenml_pro_url = prompt(
        "What is the URL of your ZenML Pro instance? (e.g. https://zenml-pro.mydomain.com)",
    )
    username = prompt(
        "Enter the ZenML Pro admin account username",
        default_value="admin@zenml.pro",
    )
    password = prompt(
        "Enter the ZenML Pro admin account password", password=True
    )

    # Login and get token
    token = login(zenml_pro_url, username, password)
    print("Login successful.")

    organizations = list_organizations(
        token=token,
        base_url=zenml_pro_url,
    )
    if len(organizations) == 0:
        print("No organizations found. Please create an organization first.")
        sys.exit(1)
    elif len(organizations) == 1:
        organization_id, organization_name = organizations[0]
        confirm = prompt(
            f"The following organization was found: {organization_name} "
            f"[{organization_id}]. Use this organization? (y/n)",
            default_value="n",
        )
        if confirm.lower() != "y":
            print("Exiting.")
            sys.exit(0)
    else:
        while True:
            org_list = "\n".join(
                [f"{name} [{id}]" for id, name in organizations]
            )
            print(f"The following organizations are available:\n{org_list}")
            organization_id = prompt(
                "Which organization ID should the workspace be enrolled in?",
            )
            if organization_id in [id for id, _ in organizations]:
                break
            print("Invalid organization ID. Please try again.")

    # Generate a default workspace name
    workspace_name = f"zenml-{str(uuid.uuid4())[:8]}"
    workspace_name = prompt(
        "Choose a name for the workspace, or press enter to use a generated "
        "name (only lowercase letters, numbers, and hyphens are allowed)",
        default_value=workspace_name,
    )

    existing_workspace_id = workspace_exists(
        token=token,
        base_url=zenml_pro_url,
        org_id=organization_id,
        workspace_name=workspace_name,
    )
    if existing_workspace_id:
        confirm = prompt(
            f"A workspace with name {workspace_name} already exists in the "
            f"organization {organization_id}. Overwrite? (y/n)",
            default_value="n",
        )
        if confirm.lower() != "y":
            print("Exiting.")
            sys.exit(0)

    workspace = enroll_workspace(
        token=token,
        base_url=zenml_pro_url,
        org_id=organization_id,
        workspace_name=workspace_name,
        delete_existing=existing_workspace_id,
    )

    workspace_id = workspace.get("id")
    organization_name = workspace.get("organization").get("name")
    enrollment_key = workspace.get("enrollment_key")

    workspace_config = get_workspace_config(
        zenml_pro_url=zenml_pro_url,
        workspace_name=workspace_name,
        workspace_id=workspace_id,
        organization_id=organization_id,
        organization_name=organization_name,
        enrollment_key=enrollment_key,
    )

    # Write the workspace configuration to a file
    values_file = f"zenml-{workspace_name}-values.yaml"
    with open(values_file, "w") as file:
        file.write(workspace_config)

    print(
        f"""
The workspace was enrolled successfully. It can be accessed at:

    {zenml_pro_url}/workspaces/{workspace_name}

The workspace server Helm values were written to: {values_file}

Please note the TODOs in the file and adjust them to your needs.

To install the workspace, run e.g.:

    helm --namespace zenml-pro-{workspace_name} upgrade --install --create-namespace \\
        zenml oci://public.ecr.aws/zenml/zenml --version <version> \\
        --values {values_file}
"""
    )


if __name__ == "__main__":
    main()
Running the script does two things:
it creates a workspace entry in the ZenML Pro database. The workspace will remain in a "provisioning" state and won't be accessible until you actually install it using Helm.
it outputs a YAML file with Helm chart configuration values that you can use to deploy the ZenML Pro workspace server in your Kubernetes cluster.
This is an example of a generated Helm YAML file:
zenml:
  analyticsOptIn: false
  threadPoolSize: 20
  database:
    maxOverflow: "-1"
    poolSize: "10"
    # TODO: use the actual database host and credentials
    url: mysql://root:password@mysql.example.com:3306/zenmlf8e306ef90e74b2f99db28298834feed
  image:
    # TODO: use your actual image repository (omit the tag, which is
    # assumed to be the same as the helm chart version)
    repository: 715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server
  # TODO: use your actual server domain here
  serverURL: https://zenml.f8e306ef90e74b2f99db28298834feed.example.com
  ingress:
    enabled: true
    # TODO: use your actual domain here
    host: zenml.f8e306ef90e74b2f99db28298834feed.example.com
  pro:
    apiURL: https://zenml-pro.staging.cloudinfra.zenml.io/api/v1
    dashboardURL: https://zenml-pro.staging.cloudinfra.zenml.io
    enabled: true
    enrollmentKey: Mt9Rw-Cdjlumel7GTCrbLpCQ5KhhtfmiDt43mVOYYsDKEjboGg9R46wWu53WQ20OzAC45u-ZmxVqQkMGj-0hWQ
    organizationID: 0e99e236-0aeb-44cc-aff7-590e41c9a702
    organizationName: MyOrg
    workspaceID: f8e306ef-90e7-4b2f-99db-28298834feed
    workspaceName: zenml-eab14ff8
  replicaCount: 1
  secretsStore:
    sql:
      encryptionKey: 155b20a388064423b1943d64f1686dd0d0aa6454be0a46839b1ee830f6565904
    type: sql
# TODO: these are the minimum resources required for the ZenML server. You can
# adjust them to your needs.
resources:
  limits:
    memory: 800Mi
  requests:
    cpu: 100m
    memory: 450Mi
Configure the ZenML Pro workspace Helm chart
IMPORTANT: In configuring the ZenML Pro workspace Helm chart, keep the following in mind:
don't use the same database name for multiple workspaces
don't reuse the control plane database name for the workspace server database
The ZenML Pro workspace server is nothing more than a slightly modified open-source ZenML server. The deployment even uses the official open-source helm chart.
There are a variety of options that can be configured for the ZenML Pro workspace server chart before installation. You can start by taking a look at the Helm chart README and values.yaml file and familiarize yourself with some of the configuration settings that you can customize for your ZenML server deployment. Alternatively, you can unpack the README.md and values.yaml files included in the helm chart:

helm pull --untar oci://public.ecr.aws/zenml/zenml --version <version>
less zenml/README.md
less zenml/values.yaml
To configure the Helm chart, use the YAML file generated at the previous step as a template and fill in the necessary values marked by TODO comments. At a minimum, you'll need to configure the following:
container registry credentials (imagePullSecrets, same as described for the control plane)
the MySQL database credentials (zenml.database.url)
the container image repository where the ZenML Pro workspace server container images are stored (zenml.image.repository)
the hostname where the ZenML Pro workspace server will be reachable (zenml.ingress.host and zenml.serverURL)
You may also choose to configure additional features documented in the official OSS ZenML Helm deployment documentation pages, if you need them:
injecting custom CA certificates (zenml.certificates), especially important if the TLS certificate used for the ZenML Pro control plane is signed by a custom Certificate Authority
configure HTTP proxy settings (zenml.proxy)
set up secrets stores
configure database backup and restore
customize Kubernetes resources
etc.
Deploy the ZenML Pro workspace server with Helm
To install the helm chart (assuming the customized configuration values are in the generated zenml-my-workspace-values.yaml file), run e.g.:

helm --namespace zenml-pro-f8e306ef-90e7-4b2f-99db-28298834feed upgrade --install --create-namespace \
  zenml oci://public.ecr.aws/zenml/zenml --version <version> \
  --values zenml-f8e306ef-90e7-4b2f-99db-28298834feed-values.yaml
The deployment is ready when the ZenML server pod is running and healthy:
$ kubectl -n zenml-pro-f8e306ef-90e7-4b2f-99db-28298834feed get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/zenml-5c4b6d9dcd-7bhfp   1/1     Running   0          85m

NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/zenml   ClusterIP   172.20.43.140   <none>        80/TCP    85m

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zenml   1/1     1            1           85m

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/zenml-5c4b6d9dcd   1         1         1       85m
After deployment, your workspace should show up as running in the ZenML Pro dashboard and can be accessed as described in the next section.
If you need to deploy multiple workspaces, simply run the enrollment script again with different values.
Accessing the Workspace
If you use TLS certificates for the ZenML Pro control plane or workspace server signed by a custom Certificate Authority, remember to install them on the client machines.
Accessing the Workspace Dashboard
The newly enrolled workspace should now be accessible in the ZenML Pro dashboard and from the CLI. You need to log in as an organization member and add yourself as a workspace member first.
Then follow the instructions in the checklist to unlock the full dashboard.
Accessing the Workspace from the ZenML CLI
To log in to the workspace with the ZenML CLI, you need to pass the custom ZenML Pro API URL to the zenml login command:
zenml login --pro-api-url https://zenml-pro.staging.cloudinfra.zenml.io/api/v1
Alternatively, you can set the ZENML_PRO_API_URL environment variable:
export ZENML_PRO_API_URL=https://zenml-pro.staging.cloudinfra.zenml.io/api/v1
zenml login
Enabling Run Templates Support
The ZenML Pro workspace server can be configured to optionally support Run Templates - the ability to run pipelines straight from the dashboard. This feature is not enabled by default and needs a few additional steps to be set up.
The Run Templates feature is only available from ZenML workspace server version 0.81.0 onwards.
The Run Templates feature comes with some optional sub-features that can be turned on or off to customize the behavior of the feature:
Building runner container images: Running pipelines from the dashboard relies on Kubernetes jobs (aka "runner" jobs) that are triggered by the ZenML workspace server. These jobs need to use container images that have the correct Python software packages installed on them to be able to launch the pipelines.
The good news is that run templates are based on pipeline runs that have already run in the past and already have container images built and associated with them. The same container images can be reused by the ZenML workspace server for the "runner jobs". However, for this to work, the Kubernetes cluster itself has to be able to access the container registries where these images are stored. This can be achieved in several ways:
use implicit workload identity access to the container registry - available in most cloud providers by granting the Kubernetes service account access to the container registry
configure a service account with implicit access to the container registry - associating some cloud service identity (e.g. a GCP service account, an AWS IAM role, etc.) with the Kubernetes service account used by the "runner" jobs
configure an image pull secret for the service account - similar to the previous option, but using a Kubernetes secret instead of a cloud service identity
When none of the above are available or desirable, an alternative approach is to configure the ZenML workspace server itself to build these "runner" container images and push them to a different container registry. This can be achieved by setting the ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE environment variable to true and the ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY environment variable to the container registry where the "runner" images will be pushed.
Yet another alternative is to configure the ZenML workspace server to use a single pre-built "runner" image for all pipeline runs. This can be achieved by keeping the ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE environment variable set to false and setting the ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE environment variable to the container image URI where the "runner" image is stored. Note that this image needs to have all requirements installed to instantiate the stack that will be used for the template run.
Store logs externally: By default, the ZenML workspace server uses the logs extracted from the "runner" job pods to populate the run template logs shown in the ZenML dashboard. These pods may disappear after a while, so the logs may no longer be available.
To avoid this, you can configure the ZenML workspace server to store the logs in an external location, like an S3 bucket. This can be achieved by setting the ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS environment variable to true.
This option is currently only available with the AWS implementation of the Run Templates feature and also requires the ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET environment variable to be set to point to the S3 bucket where the logs will be stored.
Decide on an implementation.
There are currently three different implementations of the Run Templates feature:
Kubernetes: runs pipelines in the same Kubernetes cluster as the ZenML Pro workspace server.
AWS: extends the Kubernetes implementation to be able to build and push container images to AWS ECR and to store the run template logs in AWS S3.
GCP: currently, this is the same as the Kubernetes implementation, but we plan to extend it to be able to push container images to GCP GCR and to store run template logs in GCP GCS.
If you're going for a fast, minimalistic setup, you should go for the Kubernetes implementation. If you want a complete cloud provider solution with all features enabled, you should go for the AWS implementation.
Step 2: Prepare the Run Templates configuration
You'll need to prepare a list of environment variables that will be added to the Helm chart values used to deploy the ZenML workspace server.
For all implementations, the following variables are supported:
- `ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE` (mandatory): one of the following values, matching the implementation you chose in Step 1:
  - `zenml_cloud_plugins.kubernetes_workload_manager.KubernetesWorkloadManager`
  - `zenml_cloud_plugins.aws_kubernetes_workload_manager.AWSKubernetesWorkloadManager`
  - `zenml_cloud_plugins.gcp_kubernetes_workload_manager.GCPKubernetesWorkloadManager`
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE` (mandatory): the Kubernetes namespace where the "runner" jobs will be launched. It must exist before run templates are enabled.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT` (mandatory): the Kubernetes service account to use for the "runner" jobs. It must exist before run templates are enabled.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` (optional): whether to build the "runner" container images. Defaults to `false`.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY` (optional): the container registry where the "runner" images will be pushed. Mandatory if `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` is set to `true`, ignored otherwise.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE` (optional): the pre-built "runner" container image to use. Only used if `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` is set to `false`, ignored otherwise.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS` (optional): whether to store the logs of the "runner" jobs in an external location. Defaults to `false`. Currently only supported with the AWS implementation, and requires the `ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET` variable to be set as well.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_POD_RESOURCES` (optional): the Kubernetes pod resources specification to use for the "runner" jobs, in JSON format. Example: `{"requests": {"cpu": "100m", "memory": "400Mi"}, "limits": {"memory": "700Mi"}}`.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_TTL_SECONDS_AFTER_FINISHED` (optional): the time in seconds after which finished jobs and their pods are cleaned up. Defaults to 2 days.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_NODE_SELECTOR` (optional): the Kubernetes node selector to use for the "runner" jobs, in JSON format. Example: `{"node-pool": "zenml-pool"}`.
- `ZENML_KUBERNETES_WORKLOAD_MANAGER_TOLERATIONS` (optional): the Kubernetes tolerations to use for the "runner" jobs, in JSON format. Example: `[{"key": "node-pool", "operator": "Equal", "value": "zenml-pool", "effect": "NoSchedule"}]`.
- `ZENML_SERVER_MAX_CONCURRENT_TEMPLATE_RUNS` (optional): the maximum number of template runs that can be started concurrently by each server container or pod. Defaults to 2. If a client exceeds this limit, the request is rejected with a 429 Too Many Requests HTTP error. Note that this only limits how many template runs can be started at the same time, not the number of parallel pipeline runs.
For the AWS implementation, the following additional variables are supported:
- `ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET` (optional): the S3 bucket where the logs will be stored (e.g. `s3://my-bucket/run-template-logs`). Mandatory if `ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS` is set to `true`.
- `ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_REGION` (optional): the AWS region where the container images will be pushed (e.g. `eu-central-1`). Mandatory if `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` is set to `true`.
Step 3: Create the Kubernetes resources
For the Kubernetes implementation, you'll need to create the following resources:
- the Kubernetes namespace passed in the `ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE` variable
- the Kubernetes service account passed in the `ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT` variable. This service account will be used to build images and run the "runner" jobs, so it needs the necessary permissions to do so (e.g. access to pull the container images, permissions to push container images to the configured container registry, permissions to access the configured bucket, etc.). See the sketch below.
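Here is a minimal kubectl sketch of creating these resources, reusing the hypothetical names from the examples in the next step. Granting cloud permissions is provider-specific; the IRSA annotation shown for EKS uses a placeholder role ARN:

```bash
# Create the namespace and service account for the "runner" jobs.
kubectl create namespace zenml-workspace-namespace
kubectl -n zenml-workspace-namespace create serviceaccount zenml-workspace-service-account

# On AWS EKS, cloud permissions (e.g. ECR push, S3 access) can be granted
# by binding an IAM role to the service account via IRSA; the role ARN
# below is a placeholder.
kubectl -n zenml-workspace-namespace annotate serviceaccount zenml-workspace-service-account \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/zenml-runner-role
```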
Step 4: Update the ZenML workspace server configuration to use the new implementation
The environment variables you prepared in Step 2 need to be added to the Helm chart values used to deploy the ZenML workspace server, and the server has to be updated as covered in the Day 2 Operations: Upgrades and Updates section.
Example updated Helm values file (minimal configuration):
```yaml
zenml:
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.kubernetes_workload_manager.KubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workspace-namespace
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-workspace-service-account
```
Example updated Helm values file (full AWS configuration):
```yaml
zenml:
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.aws_kubernetes_workload_manager.AWSKubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workspace-namespace
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-workspace-service-account
    ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE: "true"
    ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY: 339712793861.dkr.ecr.eu-central-1.amazonaws.com
    ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS: "true"
    ZENML_KUBERNETES_WORKLOAD_MANAGER_POD_RESOURCES: '{"requests": {"cpu": "100m", "memory": "400Mi"}, "limits": {"memory": "700Mi"}}'
    ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET: s3://my-bucket/run-template-logs
    ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_REGION: eu-central-1
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NODE_SELECTOR: '{"node-pool": "zenml-pool"}'
    ZENML_KUBERNETES_WORKLOAD_MANAGER_TOLERATIONS: '[{"key": "node-pool", "operator": "Equal", "value": "zenml-pool", "effect": "NoSchedule"}]'
    ZENML_SERVER_MAX_CONCURRENT_TEMPLATE_RUNS: 10
```
Example updated Helm values file (full GCP configuration):
```yaml
zenml:
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.gcp_kubernetes_workload_manager.GCPKubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workspace-namespace
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-workspace-service-account
    ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE: "true"
    ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY: europe-west3-docker.pkg.dev/zenml-project/zenml-run-templates/zenml
    ZENML_KUBERNETES_WORKLOAD_MANAGER_POD_RESOURCES: '{"requests": {"cpu": "100m", "memory": "400Mi"}, "limits": {"memory": "700Mi"}}'
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NODE_SELECTOR: '{"node-pool": "zenml-pool"}'
    ZENML_KUBERNETES_WORKLOAD_MANAGER_TOLERATIONS: '[{"key": "node-pool", "operator": "Equal", "value": "zenml-pool", "effect": "NoSchedule"}]'
    ZENML_SERVER_MAX_CONCURRENT_TEMPLATE_RUNS: 10
```
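Applying an updated values file is an ordinary Helm upgrade, covered in detail in the next section; for instance (the values file name and version are placeholders):

```bash
# Apply the updated values to the workspace server release.
helm --namespace zenml-pro-<workspace-name-or-id> upgrade zenml oci://public.ecr.aws/zenml/zenml \
  --version <existing-version> --values updated-workspace-values.yaml
```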
Day 2 Operations: Upgrades and Updates
This section covers how to upgrade or update your ZenML Pro deployment. The process involves updating both the ZenML Pro Control Plane and the ZenML Pro workspace servers.
Always upgrade the ZenML Pro Control Plane first, then upgrade the workspace servers. This ensures compatibility and prevents potential issues.
Upgrade Checklist
Check Available Versions and Release Notes
For ZenML Pro Control Plane:
Check available versions in the ZenML Pro ArtifactHub repository
For ZenML Pro Workspace Servers:
Check available versions in the ZenML OSS ArtifactHub repository
Review the ZenML GitHub releases page for release notes and breaking changes
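If you prefer the command line to browsing ArtifactHub, Helm 3.8+ can read chart metadata directly from the OCI registry; a quick sketch:

```bash
# Show the latest published chart metadata (name, version, app version).
helm show chart oci://public.ecr.aws/zenml/zenml-pro
helm show chart oci://public.ecr.aws/zenml/zenml
```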
Fetch and Prepare New Software Artifacts
Follow the Software Artifacts section to get access to the new versions of:
ZenML Pro Control Plane container images and Helm chart
ZenML Pro workspace server container images and Helm chart
If using a private registry, copy the new container images to your private registry
If you are using an air-gapped installation, follow the Air-Gapped Installation instructions
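Copying images into a private registry is a plain pull/tag/push sequence; a sketch for one of the workspace server images, with a placeholder target registry:

```bash
# Pull the new image from the ZenML registry, retag it and push it to
# your private registry (registry.example.com is a placeholder).
docker pull 715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server:<new-version>
docker tag 715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server:<new-version> \
  registry.example.com/zenml-pro-server:<new-version>
docker push registry.example.com/zenml-pro-server:<new-version>
```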
Upgrade the ZenML Pro Control Plane
Option A - In-place upgrade with existing values. Use this if you don't need to change any configuration values as part of the upgrade:
```bash
helm --namespace zenml-pro upgrade zenml-pro oci://public.ecr.aws/zenml/zenml-pro \
  --version <new-version> --reuse-values
```
Option B - Retrieve, modify and reapply values. Use this if you need to change configuration values as part of the upgrade, or if you are performing a configuration update without upgrading the ZenML Pro Control Plane:
```bash
# Get the current values
helm --namespace zenml-pro get values zenml-pro > current-values.yaml

# Edit current-values.yaml if needed, then upgrade
helm --namespace zenml-pro upgrade zenml-pro oci://public.ecr.aws/zenml/zenml-pro \
  --version <new-or-existing-version> --values current-values.yaml
```
Upgrade ZenML Pro Workspace Servers
For each workspace, perform one of the following:
Option A - In-place upgrade with existing values. Use this if you don't need to change any configuration values as part of the upgrade:
```bash
helm --namespace zenml-pro-<workspace-name-or-id> upgrade zenml oci://public.ecr.aws/zenml/zenml \
  --version <new-version> --reuse-values
```
Option B - Retrieve, modify and reapply values. Use this if you need to change configuration values as part of the upgrade, or if you are performing a configuration update without upgrading the ZenML Pro Workspace Server:
```bash
# Get the current values
helm --namespace zenml-pro-<workspace-name-or-id> get values zenml > current-workspace-values.yaml

# Edit current-workspace-values.yaml if needed, then upgrade
helm --namespace zenml-pro-<workspace-name-or-id> upgrade zenml oci://public.ecr.aws/zenml/zenml \
  --version <new-version> --values current-workspace-values.yaml
```
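Whichever option you used, it's worth verifying that the upgraded releases rolled out cleanly, for example:

```bash
# Confirm the deployed chart versions and pod health.
helm --namespace zenml-pro list
helm --namespace zenml-pro-<workspace-name-or-id> list
kubectl --namespace zenml-pro-<workspace-name-or-id> get pods
```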
