Self-hosted deployment
Guide for installing ZenML Pro self-hosted in a Kubernetes cluster.
This page provides instructions for installing ZenML Pro - the ZenML Pro Control Plane and one or more ZenML Pro Workspace servers - on-premise in a Kubernetes cluster. For more general information on deploying ZenML, visit our documentation where we explain the different options you have.
Overview
ZenML Pro can be installed as a self-hosted deployment. You need to be granted access to the ZenML Pro container images and you'll have to provide your own infrastructure: a Kubernetes cluster, a database server and a few other common prerequisites usually needed to expose Kubernetes services via HTTPS - a load balancer, an Ingress controller, HTTPS certificate(s) and DNS rule(s).
This document will guide you through the process.
Preparation and prerequisites
Software Artifacts
The ZenML Pro on-prem installation relies on a set of container images and Helm charts. The container images are stored in private ZenML container registries that are not available to the public.
If you haven't done so already, please book a demo to get access to the private ZenML Pro container images.
ZenML Pro Control Plane Artifacts
The following artifacts are required to install the ZenML Pro control plane in your own Kubernetes cluster:
private container images for the ZenML Pro API server:
`715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api` in AWS
`europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api` in GCP
private container images for the ZenML Pro dashboard:
`715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard` in AWS
`europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard` in GCP
the public ZenML Pro helm chart (as an OCI artifact):
oci://public.ecr.aws/zenml/zenml-pro
ZenML Pro Workspace Server Artifacts
The following artifacts are required to install ZenML Pro workspace servers in your own Kubernetes cluster:
private container images for the ZenML Pro workspace server:
`715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server` in AWS
`europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server` in GCP
the public open-source ZenML Helm chart (as an OCI artifact):
oci://public.ecr.aws/zenml/zenml
ZenML Pro Client Artifacts
If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image located in Docker Hub at zenmldocker/zenml. This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the DockerSettings documentation for more information).
Accessing the ZenML Pro Container Images
This section provides instructions for how to access the private ZenML Pro container images.
AWS
To access the ZenML Pro container images stored in AWS ECR, you need to set up an AWS IAM user or IAM role in your AWS account. The steps below outline how to create an AWS account, configure the necessary IAM entities, and pull images from the private repositories. If you're already familiar with AWS, or plan on using an AWS EKS cluster to deploy ZenML Pro, you can simply use your existing IAM user or IAM role and skip steps 1 and 2.
Step 1: Create a Free AWS Account
Visit the AWS Free Tier page.
Click Create a Free Account.
Follow the on-screen instructions to provide your email address, create a root user, and set a secure password.
Enter your contact and payment information for verification purposes. While a credit or debit card is required, you won't be charged for free-tier eligible services.
Confirm your email and complete the verification process.
Log in to the AWS Management Console using your root user credentials.
Step 2: Create an IAM User or IAM Role
A. Create an IAM User
Log in to the AWS Management Console.
Navigate to the IAM service.
Click Users in the left-hand menu, then click Add Users.
Provide a user name (e.g., `zenml-ecr-access`).
Select Access Key - Programmatic access as the AWS credential type.
Click Next: Permissions.
Choose Attach policies directly, then select the following policies:
AmazonEC2ContainerRegistryReadOnly
Click Next: Tags and optionally add tags for organization purposes.
Click Next: Review, then Create User.
Note the Access Key ID and Secret Access Key displayed after creation. Save these securely.
B. Create an IAM Role
Navigate to the IAM service.
Click Roles in the left-hand menu, then click Create Role.
Choose the type of trusted entity:
Select AWS Account.
Enter your AWS account ID and click Next.
Select the AmazonEC2ContainerRegistryReadOnly policy.
Click Next: Tags, optionally add tags, then click Next: Review.
Provide a role name (e.g., `zenml-ecr-access-role`) and click Create Role.
Step 3: Provide the IAM User/Role ARN
For an IAM user, the ARN can be found in the Users section under the Summary tab.
For an IAM role, the ARN is displayed in the Roles section under the Summary tab.
Send the ARN to ZenML Support so it can be granted permission to access the ZenML Pro container images and Helm charts.
Step 4: Authenticate your Docker Client
Run these steps on the machine that you'll use to pull the ZenML Pro images. It is recommended that you copy the container images into your own container registry, one that is accessible from the Kubernetes cluster where ZenML Pro will be installed. Otherwise, you'll have to find a way to configure the Kubernetes cluster to authenticate directly against the ZenML Pro container registry, which is problematic if your Kubernetes cluster is not running on AWS.
A. Install AWS CLI
Follow the instructions to install the AWS CLI: AWS CLI Installation Guide.
B. Configure AWS CLI Credentials
Open a terminal and run `aws configure`. Enter the following when prompted:
Access Key ID: Provided during IAM user creation.
Secret Access Key: Provided during IAM user creation.
Default region name: `eu-west-1`
Default output format: Leave blank or enter `json`.
If you chose to use an IAM role, update the AWS CLI configuration file to specify the role you want to assume. Open the configuration file located at `~/.aws/config` and add a profile entry like the one sketched below. Replace `<IAM-ROLE-ARN>` with the ARN of the role you created and ensure `source_profile` points to a profile with sufficient permissions to assume the role.
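A minimal sketch of the `~/.aws/config` entry; the profile names are illustrative and assume your IAM user credentials live in the `default` profile:

```ini
[profile zenml-ecr-access-role]
role_arn = <IAM-ROLE-ARN>
source_profile = default
region = eu-west-1
```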
C. Authenticate Docker with ECR
Run the following command to authenticate your Docker client with the ZenML ECR repository:
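The standard ECR login command, using the control plane registry region listed above:

```bash
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-west-1.amazonaws.com

# The workspace server images live in a different region; repeat accordingly:
aws ecr get-login-password --region eu-central-1 | \
  docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-central-1.amazonaws.com
```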
If you used an IAM role, use the specified profile to execute commands. For example:
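Assuming the illustrative profile name from the configuration sketch above:

```bash
aws ecr get-login-password --region eu-west-1 --profile zenml-ecr-access-role | \
  docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-west-1.amazonaws.com
```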
This will allow you to authenticate to the ZenML Pro container registries and pull the necessary images with Docker, e.g.:
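The tags are placeholders; use the versions you were granted access to:

```bash
docker pull 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api:<version>
docker pull 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard:<version>
docker pull 715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server:<version>
```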
GCP
To access the ZenML Pro container images stored in Google Cloud Platform (GCP) Artifact Registry, you need to set up a GCP account and configure the necessary permissions. The steps below outline how to create a GCP account, configure authentication, and pull images from the private repositories. If you're familiar with GCP or plan on using a GKE cluster to deploy ZenML Pro, you can use your existing GCP account and skip step 1.
Step 1: Create a GCP Account
Visit the Google Cloud Console.
Click Get Started for Free or sign in with an existing Google account.
Follow the on-screen instructions to set up your account and create a project.
Set up billing information (required for using GCP services).
Step 2: Create a Service Account
Navigate to the IAM & Admin > Service Accounts page in the Google Cloud Console.
Click Create Service Account.
Enter a service account name (e.g., `zenml-gar-access`).
Add a description (optional) and click Create and Continue.
No additional permissions are needed as access will be granted directly to the Artifact Registry.
Click Done.
After creation, click on the service account to view its details.
Go to the Keys tab and click Add Key > Create new key.
Choose JSON as the key type and click Create.
Save the downloaded JSON key file securely - you'll need it later.
Step 3: Provide the Service Account Email
In the service account details page, copy the service account email address (it should look like `zenml-gar-access@<project-id>.iam.gserviceaccount.com`).
Send this email address to ZenML Support so it can be granted permission to access the ZenML Pro container images.
Step 4: Authenticate your Docker Client
Run these steps on the machine that you'll use to pull the ZenML Pro images. It is recommended that you copy the container images into your own container registry, one that is accessible from the Kubernetes cluster where ZenML Pro will be installed.
A. Install Google Cloud CLI
Follow the instructions to install the Google Cloud CLI.
Initialize the CLI by running:
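```bash
gcloud init
```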
B. Configure Authentication
Activate the service account using the JSON key file you downloaded:
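For example, assuming the key file was saved as `key.json`:

```bash
gcloud auth activate-service-account --key-file=key.json
```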
Configure Docker authentication for Artifact Registry:
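This registers the `gcloud` credential helper for the Artifact Registry host used by the ZenML Pro images:

```bash
gcloud auth configure-docker europe-west3-docker.pkg.dev
```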
C. Pull the Container Images
You can now pull the ZenML Pro images:
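The tags are placeholders; use the versions you were granted access to:

```bash
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api:<version>
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard:<version>
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server:<version>
```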
Air-Gapped Installation
If you need to install ZenML Pro in an air-gapped environment (a network with no direct internet access), you'll need to transfer all required artifacts to your internal infrastructure. Here's a step-by-step process:
1. Prepare a Machine with Internet Access
First, you'll need a machine with both internet access and sufficient storage space to temporarily store all artifacts. On this machine:
Follow the authentication steps described above to gain access to the private repositories
Install the required tools:
Docker
Helm
2. Download All Required Artifacts
A Bash script like the following can be used to download all necessary components, or you can run the listed commands manually:
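The following sketch assumes the GCP registry locations listed above (substitute the AWS ECR locations if that is where your access was granted), placeholder versions, and that the workspace server image is tagged with the matching ZenML version. It produces the `zenml-artifacts.tar.gz` archive referenced below:

```bash
#!/bin/bash
set -euo pipefail

# Versions are placeholders -- use the releases you were granted access to.
ZENML_PRO_VERSION=<zenml-pro-version>
ZENML_VERSION=<zenml-version>

IMAGES=(
  "europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api:${ZENML_PRO_VERSION}"
  "europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard:${ZENML_PRO_VERSION}"
  "europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server:${ZENML_VERSION}"
)

mkdir -p zenml-artifacts

# Pull and save the private container images (after authenticating as
# described above).
for image in "${IMAGES[@]}"; do
  docker pull "${image}"
done
docker save "${IMAGES[@]}" -o zenml-artifacts/zenml-images.tar

# Download the Helm charts as OCI artifacts.
helm pull oci://public.ecr.aws/zenml/zenml-pro --version "${ZENML_PRO_VERSION}" --destination zenml-artifacts
helm pull oci://public.ecr.aws/zenml/zenml --version "${ZENML_VERSION}" --destination zenml-artifacts

# Bundle everything into a single archive for transfer.
tar -czf zenml-artifacts.tar.gz zenml-artifacts
```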
3. Transfer Artifacts to Air-Gapped Environment
Copy the `zenml-artifacts.tar.gz` file to your preferred transfer medium (e.g., USB drive, approved file transfer system).
Transfer the archive to a machine in your air-gapped environment that has access to your internal container registry.
4. Load Artifacts in Air-Gapped Environment
Create a script to load the artifacts in your air-gapped environment or run the listed commands manually:
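A matching sketch for the loading side; the internal registry name is illustrative:

```bash
#!/bin/bash
set -euo pipefail

# Your internal registry -- the name is illustrative.
INTERNAL_REGISTRY=registry.internal.example.com

tar -xzf zenml-artifacts.tar.gz

# Load the images into the local Docker daemon.
docker load -i zenml-artifacts/zenml-images.tar

# Re-tag and push every image to the internal registry.
for image in \
  "zenml-pro-api:<zenml-pro-version>" \
  "zenml-pro-dashboard:<zenml-pro-version>" \
  "zenml-pro-server:<zenml-version>"; do
  docker tag "europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/${image}" \
    "${INTERNAL_REGISTRY}/${image}"
  docker push "${INTERNAL_REGISTRY}/${image}"
done
```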
5. Update Configuration
When deploying ZenML Pro in your air-gapped environment, make sure to update all references to container images in your Helm values to point to your internal registry. For example:
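A sketch of the relevant control plane Helm values override, reusing the illustrative internal registry name from above; check the chart's `values.yaml` for the exact structure of the image keys:

```yaml
# ZenML Pro control plane values -- key structure is illustrative.
zenml:
  image:
    api:
      repository: registry.internal.example.com/zenml-pro-api
    dashboard:
      repository: registry.internal.example.com/zenml-pro-dashboard
```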
The scripts provided above are examples and may need to be adjusted based on your specific security requirements and internal infrastructure setup.
6. Using the Helm Charts
After downloading the Helm charts, you can use their local paths instead of a remote OCI registry to deploy ZenML Pro components. Here's an example of how to use them:
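For example, assuming the chart archives downloaded by the script above; release and namespace names are illustrative:

```bash
# Install the control plane from the local chart archive
helm install zenml-pro ./zenml-artifacts/zenml-pro-<zenml-pro-version>.tgz \
  --namespace zenml-pro --create-namespace --values my-values.yaml

# Install a workspace server from the local chart archive
helm install zenml ./zenml-artifacts/zenml-<zenml-version>.tgz \
  --namespace zenml-my-workspace --create-namespace \
  --values zenml-my-workspace-values.yaml
```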
Infrastructure Requirements
To deploy the ZenML Pro control plane and one or more ZenML Pro workspace servers, ensure the following prerequisites are met:
Kubernetes Cluster
A functional Kubernetes cluster is required as the primary runtime environment.
Database Server(s)
The ZenML Pro Control Plane and ZenML Pro Workspace servers need to connect to an external database server. To minimize the amount of infrastructure resources needed, you can share a single database server between the Control Plane and all workspaces, or you can use separate database servers for server-level database isolation, as long as you keep the following limitations in mind:
the ZenML Pro Control Plane can be connected to either MySQL or Postgres as the external database
the ZenML Pro Workspace servers can only be connected to a MySQL database (no Postgres support is available)
the ZenML Pro Control Plane as well as every ZenML Pro Workspace server needs to use its own individual database (especially important when connected to the same server)
Ensure you have a valid username and password for the different ZenML Pro services. For improved security, it is recommended to have different users for different services. If the database user does not have permissions to create databases, you must also create a database and give the user full permissions to access and manage it (i.e. create, update and delete tables).
Ingress Controller
Install an Ingress provider in the cluster (e.g., NGINX, Traefik) to handle HTTP(S) traffic routing. Ensure the Ingress provider is properly configured to expose the cluster's services externally.
Domain Name
You'll need an FQDN for the ZenML Pro Control Plane as well as for every ZenML Pro workspace. For this reason, it's highly recommended to use a DNS prefix and associated SSL certificate instead of individual FQDNs and SSL certificates, to make this process easier.
FQDN or DNS Prefix Setup
Obtain a Fully Qualified Domain Name (FQDN) or DNS prefix (e.g., `*.zenml-pro.mydomain.com`) from your DNS provider.
Identify the external Load Balancer IP address of the Ingress controller using the command `kubectl get svc -n <ingress-namespace>`. Look for the `EXTERNAL-IP` field of the Load Balancer service.
Create a DNS `A` record (or `CNAME` for subdomains) pointing the FQDN to the Load Balancer IP. Example:
Host: `zenml-pro.mydomain.com`
Type: `A`
Value: `<Load Balancer IP>`
Use a DNS propagation checker to confirm that the DNS record is resolving correctly.
Make sure you don't use a simple DNS prefix for the servers (e.g. https://zenml.cluster is not recommended). This is especially relevant for the TLS certificates that you have to prepare for these endpoints. Always use a fully qualified domain name (e.g. https://zenml.ml.cluster); otherwise, some browsers (e.g. Chrome) will not accept the TLS certificates.
SSL Certificate
The ZenML Pro services do not terminate SSL traffic. It is your responsibility to generate and configure the necessary SSL certificates for the ZenML Pro Control Plane as well as all the ZenML Pro workspaces that you will deploy (see the previous point on how to use a DNS prefix to make the process easier).
Obtaining SSL Certificates
Acquire an SSL certificate for the domain. You can use:
A commercial SSL certificate provider (e.g., DigiCert, Sectigo).
Free services like Let's Encrypt for domain validation and issuance.
Self-signed certificates (not recommended for production environments). IMPORTANT: If you are using self-signed certificates, it is highly recommended to use the same self-signed CA certificate for all the ZenML Pro services (control plane and workspace servers), otherwise it will be difficult to manage the certificates on the client machines. With only one CA certificate, you can install it system-wide on all the client machines only once and then use it to sign all the TLS certificates for the ZenML Pro services.
Configuring SSL Termination
Once the SSL certificate is obtained, configure your load balancer or Ingress controller to terminate HTTPS traffic:
For NGINX Ingress Controller:
You can configure SSL termination globally for the NGINX Ingress Controller by setting up a default SSL certificate or configuring it at the ingress controller level, or you can specify SSL certificates when configuring the ingress in the ZenML server Helm values.
Here's how you can do it globally:
Create a TLS Secret
Store your SSL certificate and private key as a Kubernetes TLS secret in the namespace where the NGINX Ingress Controller is deployed.
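For example (the secret name and namespace are illustrative):

```bash
kubectl create secret tls default-ssl-certificate \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key \
  --namespace ingress-nginx
```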
Update NGINX Ingress Controller Configurations
Configure the NGINX Ingress Controller to use the default SSL certificate.
If using the NGINX Ingress Controller Helm chart, modify the `values.yaml` file or use `--set` during installation, or directly pass the argument during a Helm install or upgrade, as sketched below:
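Both variants below assume the TLS secret created above lives in the `ingress-nginx` namespace. In `values.yaml`:

```yaml
controller:
  extraArgs:
    default-ssl-certificate: ingress-nginx/default-ssl-certificate
```

Or directly on the command line:

```bash
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.extraArgs.default-ssl-certificate=ingress-nginx/default-ssl-certificate
```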
If the NGINX Ingress Controller was installed manually, edit its deployment to include the argument in the `args` section of the container:
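A sketch of the relevant fragment of the controller deployment:

```yaml
spec:
  containers:
    - name: controller
      args:
        - /nginx-ingress-controller
        - --default-ssl-certificate=ingress-nginx/default-ssl-certificate
        # ...keep the other existing arguments...
```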
For Traefik:
Configure Traefik to use TLS by creating a certificate resolver for Let's Encrypt or specifying the certificates manually in the `traefik.yml` or `values.yaml` file (see the Let's Encrypt example below). Reference the domain in your IngressRoute or Middleware configuration.
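A minimal sketch of a Let's Encrypt certificate resolver in Traefik's static configuration; the email address, storage path and resolver name are illustrative:

```yaml
# traefik.yml -- static configuration
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@mydomain.com
      storage: /data/acme.json
      httpChallenge:
        entryPoint: web
```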
If you used a custom CA certificate to sign the TLS certificates for the ZenML Pro services, you will need to install the CA certificates on every client machine, as covered in the Install CA Certificates section.
The above are infrastructure requirements for ZenML Pro. If, in addition to ZenML, you would also like to reuse the same Kubernetes cluster to run machine learning workloads with ZenML, you will require the following additional infrastructure resources and services to be able to set up a remote ZenML Stack:
a Kubernetes ZenML Orchestrator can be set up to run on the same cluster as ZenML Pro. For authentication, you will be able to configure a ZenML Kubernetes Service Connector using service account tokens
you'll need a container registry to store the container images built by ZenML. If you don't have one already, you can install Docker registry on the same cluster as ZenML Pro.
you'll also need some form of centralized object storage to store the artifacts generated by ZenML. If you don't have one already, you can install MinIO on the same cluster as ZenML Pro and then configure the ZenML S3 Artifact Store to use it.
(optional) you can install Kaniko in your Kubernetes cluster to build the container images for your ZenML pipelines and then configure it as a ZenML Kaniko Image Builder in your ZenML Stack.
Stage 1/2: Install the ZenML Pro Control Plane
Set up Credentials
If your Kubernetes cluster is not already able to authenticate to the container registry where the ZenML Pro container images are hosted, you will need to create a secret to allow the ZenML Pro server to pull the images. The following is an example of how to do this if you've received a private access key for the ZenML GCP Artifact Registry from ZenML, but you can use the same approach for your own private container registry:
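A sketch, assuming the control plane will be installed in a `zenml-pro` namespace; `_json_key_base64` is the Artifact Registry username used with base64-encoded JSON keys:

```bash
kubectl create secret docker-registry image-pull-secret \
  --namespace zenml-pro \
  --docker-server=europe-west3-docker.pkg.dev \
  --docker-username=_json_key_base64 \
  --docker-password="$(cat key.base64)" \
  --docker-email=<your-email>
```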
The `key.base64` file should contain the base64-encoded JSON key for the GCP service account as received from the ZenML support team. The `image-pull-secret` secret will be used in the next step when installing the ZenML Pro Helm chart.
Configure the Helm Chart
There are a variety of options that can be configured for the ZenML Pro helm chart before installation.
You can take a look at the Helm chart README and `values.yaml` file and familiarize yourself with some of the configuration settings that you can customize for your ZenML Pro deployment. Alternatively, you can unpack the `README.md` and `values.yaml` files included in the Helm chart:
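```bash
helm pull oci://public.ecr.aws/zenml/zenml-pro --version <version> --untar
less zenml-pro/README.md zenml-pro/values.yaml
```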
This is an example Helm values YAML file that covers the most common configuration options:
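A sketch of such a file, with illustrative hostnames and placeholders; the exact sub-keys of `zenml.database.external` are documented in the chart's `values.yaml`:

```yaml
imagePullSecrets:
  - name: image-pull-secret

zenml:
  auth:
    password: <admin-password>            # default admin account password
  serverURL: https://zenml-pro.mydomain.com
  ingress:
    host: zenml-pro.mydomain.com
  database:
    external:
      # Sub-keys are illustrative -- check the chart's values.yaml.
      host: <database-host>
      username: <database-user>
      password: <database-password>
      database: zenml-pro                  # dedicated database for the control plane
```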
Minimum required settings:
the database credentials (`zenml.database.external`)
the URL (`zenml.serverURL`) and Ingress hostname (`zenml.ingress.host`) where the ZenML Pro Control Plane API and Dashboard will be reachable
In addition to the above, the following might also be relevant for you:
container registry credentials (`imagePullSecrets`)
injecting custom CA certificates (`zenml.certificates`), especially important if the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority
HTTP proxy settings (`zenml.proxy`)
custom container image repository locations (`zenml.image.api` and `zenml.image.dashboard`)
the username and password used for the default admin account (`zenml.auth.password`)
additional Ingress settings (`zenml.ingress`)
Kubernetes resources allocated to the pods (`resources`)
If you set up a common DNS prefix that you plan on using for all the ZenML Pro services, you may configure the domain of the HTTP cookies used by the ZenML Pro dashboard to match it by setting `zenml.auth.authCookieDomain` to the DNS prefix (e.g. `.my.domain` instead of `zenml-pro.my.domain`)
Install the Helm Chart
To install the helm chart (assuming the customized configuration values are in a my-values.yaml file), run:
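A sketch, with illustrative release and namespace names:

```bash
helm --namespace zenml-pro upgrade --install zenml-pro \
  oci://public.ecr.aws/zenml/zenml-pro --version <version> \
  --create-namespace --values my-values.yaml
```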
If the installation is successful, you should be able to see the following workloads running in your cluster:
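For example, reusing the namespace from the install command above; you should see the ZenML Pro API server and dashboard pods in a Running state:

```bash
kubectl get pods -n zenml-pro
```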
The Helm chart will output information explaining how to connect and authenticate to the ZenML Pro dashboard:
The credentials are for the default administrator user account provisioned on installation. With these on hand, you can proceed to the next step and onboard additional users.
Install CA Certificates
If the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority, you need to install the CA certificates on every machine that needs to access the ZenML server:
installing the CA certificates system-wide is usually the easiest solution. For example, on Ubuntu and Debian-based systems, you can install the CA certificates system-wide by copying them into the `/usr/local/share/ca-certificates` directory and running `update-ca-certificates`.
for some browsers (e.g. Chrome), updating the system's CA certificates is not enough. You will also need to import the CA certificates into the browser.
for Python, you also need to set the `REQUESTS_CA_BUNDLE` environment variable to the path of the system's CA certificates bundle file (e.g. `export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`)
later on, when you're running containerized pipelines with ZenML, you'll also want to install those same CA certificates into the container images built by ZenML by customizing the build process via DockerSettings. For example:
customize the ZenML client container image using a Dockerfile like this:
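A sketch, assuming the ZenML client image is Debian-based; the certificate file name is illustrative:

```dockerfile
FROM zenmldocker/zenml:<version>

# Install the custom CA certificate(s) system-wide.
COPY my-custom-ca.crt /usr/local/share/ca-certificates/my-custom-ca.crt
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && update-ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Make sure Python's requests library also uses the system CA bundle.
ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
```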
then build and push that image to your private container registry:
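The registry name and tag are illustrative:

```bash
docker build -t registry.internal.example.com/zenml-client-ca:<version> .
docker push registry.internal.example.com/zenml-client-ca:<version>
```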
and finally update your ZenML pipeline code to use the custom ZenML client image by using the `DockerSettings` class:
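A minimal sketch; the image name is the illustrative one built above:

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Use the custom client image as the parent image for the container
# images that ZenML builds for this pipeline.
docker_settings = DockerSettings(
    parent_image="registry.internal.example.com/zenml-client-ca:<version>",
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    ...
```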
Onboard Additional Users
The deployed ZenML Pro service comes with a pre-installed default administrator account. This admin account serves the purpose of creating and recovering other users. First, you will need to get the admin password by following the instructions in the previous step.
Create a `users.yaml` file that contains a list of all the users that you want to create for ZenML. Also set a default password. The users will be asked to change this password on their first login.
Run the `create_users.py` script below. This will create all of the users.
[file: create_users.py]
The script will prompt you for the URL of your deployment, the admin account username and password, and finally the location of your `users.yaml` file.

Create an Organization
The ZenML Pro admin user should only be used for administrative operations: creating other users, resetting the password of existing users and enrolling workspaces. All other operations should be executed while logged in as a regular user.
Head on over to your deployment in the browser and use one of the users you just created to log in.

After logging in for the first time, you will need to create a new password. (Be aware: for the time being, only the admin account will be able to reset this password.)

Finally you can create an Organization. This Organization will host all the workspaces you enroll at the next stage.

Invite Other Users to the Organization
Now you can invite your whole team to the organization. To do this, open the drop-down in the top right and head over to the settings.

In the Members tab, add all the users you created in the previous step. Make sure to assign the appropriate role to each user.


Finally, send the account's username and initial password over to your team members.
Stage 2/2: Enroll and Deploy ZenML Pro workspaces
Installing and updating on-prem ZenML Pro workspace servers is not automated, as it is with the SaaS version. You will be responsible for enrolling workspace servers in the right ZenML Pro organization, installing them and regularly updating them. Some scripts are provided to simplify this task as much as possible.
Enrolling a Workspace
Run the `enroll-workspace.py` script below. This will collect all the necessary data, then enroll the workspace in the organization and generate a Helm `values.yaml` file template that you can use to install the workspace server:
[file: enroll-workspace.py]
Running the script does two things:
it creates a workspace entry in the ZenML Pro database. The workspace will remain in a "provisioning" state and won't be accessible until you actually install it using Helm.
it outputs a YAML file with Helm chart configuration values that you can use to deploy the ZenML Pro workspace server in your Kubernetes cluster.
This is an example of a generated Helm YAML file:
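The actual file is produced by the enrollment script and contains the real workspace identifiers; the following is only an illustrative sketch of its shape:

```yaml
# Illustrative shape only -- use the file generated by enroll-workspace.py.
zenml:
  image:
    repository: # TODO: registry hosting the zenml-pro-server image
  database:
    url: # TODO: mysql://<user>:<password>@<host>:3306/<workspace-database>
  serverURL: # TODO: https://zenml.my-workspace.mydomain.com
  ingress:
    host: # TODO: zenml.my-workspace.mydomain.com
  pro:
    enabled: true
    apiURL: https://zenml-pro.mydomain.com/api/v1
    organizationID: <generated-organization-id>
    workspaceID: <generated-workspace-id>
    enrollmentKey: <generated-enrollment-key>
```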
Configure the ZenML Pro workspace Helm chart
IMPORTANT: In configuring the ZenML Pro workspace Helm chart, keep the following in mind:
don't use the same database name for multiple workspaces
don't reuse the control plane database name for the workspace server database
The ZenML Pro workspace server is nothing more than a slightly modified open-source ZenML server. The deployment even uses the official open-source helm chart.
There are a variety of options that can be configured for the ZenML Pro workspace server chart before installation. You can start by taking a look at the Helm chart README and `values.yaml` file and familiarize yourself with some of the configuration settings that you can customize for your ZenML server deployment. Alternatively, you can unpack the `README.md` and `values.yaml` files included in the Helm chart (see the example after this list).
To configure the Helm chart, use the YAML file generated at the previous step as a template and fill in the necessary values marked by `TODO` comments. At a minimum, you'll need to configure the following:
container registry credentials (`imagePullSecrets`, same as described for the control plane)
the MySQL database credentials (`zenml.database.url`)
the container image repository where the ZenML Pro workspace server container images are stored (`zenml.image.repository`)
the hostname where the ZenML Pro workspace server will be reachable (`zenml.ingress.host` and `zenml.serverURL`)
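To unpack the open-source chart's documentation and default values:

```bash
helm pull oci://public.ecr.aws/zenml/zenml --version <version> --untar
less zenml/README.md zenml/values.yaml
```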
You may also choose to configure additional features documented in the official OSS ZenML Helm deployment documentation pages, if you need them:
injecting custom CA certificates (`zenml.certificates`), especially important if the TLS certificate used for the ZenML Pro control plane is signed by a custom Certificate Authority
configure HTTP proxy settings (`zenml.proxy`)
set up secrets stores
configure database backup and restore
customize Kubernetes resources
etc.
Deploy the ZenML Pro workspace server with Helm
To install the Helm chart (assuming the customized configuration values are in the generated `zenml-my-workspace-values.yaml` file), run e.g. the command sketched below. The deployment is ready when the ZenML server pod is running and healthy, which you can check with `kubectl`:
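Release and namespace names are illustrative:

```bash
helm --namespace zenml-my-workspace upgrade --install zenml \
  oci://public.ecr.aws/zenml/zenml --version <version> \
  --create-namespace --values zenml-my-workspace-values.yaml

# Check that the ZenML server pod is running and healthy
kubectl get pods -n zenml-my-workspace
```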
After deployment, your workspace should show up as running in the ZenML Pro dashboard and can be accessed at the next step.
If you need to deploy multiple workspaces, simply run the enrollment script again with different values.
Accessing the Workspace
If you use TLS certificates for the ZenML Pro control plane or workspace server signed by a custom Certificate Authority, remember to install them on the client machines.
Accessing the Workspace Dashboard
The newly enrolled workspace should now be accessible in the ZenML Pro dashboard and from the CLI. If you're the organization admin, you may also need to add other users as workspace members, if they don't have access to the workspace yet.




Then follow the instructions in the "Get Started" checklist to unlock the full dashboard:

Accessing the Workspace from the ZenML CLI
To log in to the workspace with the ZenML CLI, you need to pass the custom ZenML Pro API URL to the `zenml login` command:
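A sketch with illustrative URLs; the exact flag name may differ between ZenML versions, so check `zenml login --help`:

```bash
zenml login https://zenml.my-workspace.mydomain.com \
  --pro-api-url https://zenml-pro.mydomain.com/api/v1
```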
Alternatively, you can set the ZENML_PRO_API_URL environment variable:
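The URLs are illustrative; use the control plane and workspace URLs from your deployment:

```bash
export ZENML_PRO_API_URL=https://zenml-pro.mydomain.com/api/v1
zenml login https://zenml.my-workspace.mydomain.com
```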
Enabling Snapshot Support
The ZenML Pro workspace server can be configured to optionally support running pipeline snapshots straight from the dashboard. This feature is not enabled by default and needs a few additional steps to be set up.
Snapshots are only available from ZenML workspace server version 0.90.0 onwards.
Snapshots come with some optional sub-features that can be turned on or off to customize the behavior of the feature:
Building runner container images: Running pipelines from the dashboard relies on Kubernetes jobs (aka "runner" jobs) that are triggered by the ZenML workspace server. These jobs need to use container images that have the correct Python software packages installed on them to be able to launch the pipelines.
The good news is that snapshots are based on pipeline runs that have already run in the past and already have container images built and associated with them. The same container images can be reused by the ZenML workspace server for the "runner jobs". However, for this to work, the Kubernetes cluster itself has to be able to access the container registries where these images are stored. This can be achieved in several ways:
use implicit workload identity access to the container registry - available in most cloud providers by granting the Kubernetes service account access to the container registry
configure a service account with implicit access to the container registry - associating some cloud service identity (e.g. a GCP service account, an AWS IAM role, etc.) with the Kubernetes service account used by the "runner" jobs
configure an image pull secret for the service account - similar to the previous option, but using a Kubernetes secret instead of a cloud service identity
When none of the above are available or desirable, an alternative approach is to configure the ZenML workspace server itself to build these "runner" container images and push them to a different container registry. This can be achieved by setting the `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` environment variable to `true` and the `ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY` environment variable to the container registry where the "runner" images will be pushed.
Yet another alternative is to configure the ZenML workspace server to use a single pre-built "runner" image for all the pipeline runs. This can be achieved by keeping the `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` environment variable set to `false` and setting the `ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE` environment variable to the container image registry URI where the "runner" image is stored. Note that this image needs to have all requirements installed to instantiate the stack that will be used for the template run.
Store logs externally: By default, the ZenML workspace server will use the logs extracted from the "runner" job pods to populate the run template logs shown in the ZenML dashboard. These pods may disappear after a while, so the logs may not be available anymore.
To avoid this, you can configure the ZenML workspace server to store the logs in an external location, like an S3 bucket. This can be achieved by setting the `ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS` environment variable to `true`.
This option is currently only available with the AWS implementation of the snapshots feature and also requires the `ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET` environment variable to be set to point to the S3 bucket where the logs will be stored.
Decide on an implementation.
There are currently three different implementations of the snapshots feature:
Kubernetes: runs pipelines in the same Kubernetes cluster as the ZenML Pro workspace server.
AWS: extends the Kubernetes implementation to be able to build and push container images to AWS ECR and to store the run template logs in AWS S3.
GCP: currently, this is the same as the Kubernetes implementation, but we plan to extend it to be able to push container images to GCP GCR and to store run template logs in GCP GCS.
If you're going for a fast, minimalistic setup, you should go for the Kubernetes implementation. If you want a complete cloud provider solution with all features enabled, you should go for the AWS implementation.
Prepare Snapshots configuration.
You'll need to prepare a list of environment variables that will be added to the Helm chart values used to deploy the ZenML workspace server.
For all implementations, the following variables are supported:
`ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE` (mandatory): one of the values associated with the implementation you've chosen in step 1:
`zenml_cloud_plugins.kubernetes_workload_manager.KubernetesWorkloadManager`
`zenml_cloud_plugins.aws_kubernetes_workload_manager.AWSKubernetesWorkloadManager`
`zenml_cloud_plugins.gcp_kubernetes_workload_manager.GCPKubernetesWorkloadManager`
`ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE` (mandatory): the Kubernetes namespace where the "runner" jobs will be launched. It must exist before the snapshots are enabled.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT` (mandatory): the Kubernetes service account to use for the "runner" jobs. It must exist before the snapshots are enabled.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` (optional): whether to build the "runner" container images or not. Defaults to `false`.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY` (optional): the container registry where the "runner" images will be pushed. Mandatory if `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` is set to `true`, ignored otherwise.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE` (optional): the "runner" container image to use. Only used if `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` is set to `false`, ignored otherwise.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS` (optional): whether to store the logs of the "runner" jobs in an external location. Defaults to `false`. Currently only supported with the AWS implementation and requires the `ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET` variable to be set as well.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_POD_RESOURCES` (optional): the Kubernetes pod resources specification to use for the "runner" jobs, in JSON format. Example: `{"requests": {"cpu": "100m", "memory": "400Mi"}, "limits": {"memory": "700Mi"}}`.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_TTL_SECONDS_AFTER_FINISHED` (optional): the time in seconds after which to clean up finished jobs and their pods. Defaults to 2 days.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_NODE_SELECTOR` (optional): the Kubernetes node selector to use for the "runner" jobs, in JSON format. Example: `{"node-pool": "zenml-pool"}`.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_TOLERATIONS` (optional): the Kubernetes tolerations to use for the "runner" jobs, in JSON format. Example: `[{"key": "node-pool", "operator": "Equal", "value": "zenml-pool", "effect": "NoSchedule"}]`.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_JOB_BACKOFF_LIMIT` (optional): the Kubernetes backoff limit to use for the builder and runner jobs.
`ZENML_KUBERNETES_WORKLOAD_MANAGER_POD_FAILURE_POLICY` (optional): the Kubernetes pod failure policy to use for the builder and runner jobs.
`ZENML_SERVER_MAX_CONCURRENT_TEMPLATE_RUNS` (optional): the maximum number of concurrent snapshot runs that can be started at the same time by each server container or pod. Defaults to 2. If a client exceeds this number, the request will be rejected with a 429 Too Many Requests HTTP error. Note that this only limits the number of parallel snapshots that can be started at the same time, not the number of parallel pipeline runs.
For the AWS implementation, the following additional variables are supported:
`ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET` (optional): the S3 bucket where the logs will be stored (e.g. `s3://my-bucket/run-template-logs`). Mandatory if `ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS` is set to `true`.
`ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_REGION` (optional): the AWS region where the container images will be pushed (e.g. `eu-central-1`). Mandatory if `ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE` is set to `true`.
Create the Kubernetes resources.
For the Kubernetes implementation, you'll need to create the following resources:
the Kubernetes namespace passed in the `ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE` variable.
the Kubernetes service account passed in the `ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT` variable. This service account will be used to build images and run the "runner" jobs, so it needs to have the necessary permissions to do so (e.g. access to the container images, permissions to push container images to the configured container registry, permissions to access the configured bucket, etc.).
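For example, with illustrative names (the same names are reused in the values sketches below):

```bash
kubectl create namespace zenml-workloads
kubectl create serviceaccount zenml-runner --namespace zenml-workloads
```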
Finally, update the ZenML workspace server configuration to use the new implementation.
The environment variables you prepared in step 2 need to be added to the Helm chart values used to deploy the ZenML workspace server and the ZenML server has to be updated as covered in the Day 2 Operations: Upgrades and Updates section.
Example updated Helm values file (minimal configuration):
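A sketch, assuming the chart's `zenml.environment` map is used to inject the variables and reusing the illustrative namespace and service account names from above:

```yaml
zenml:
  # ...existing workspace server values...
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.kubernetes_workload_manager.KubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workloads
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-runner
```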
Example updated Helm values file (full AWS configuration):
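A sketch with illustrative registry, bucket and region values:

```yaml
zenml:
  # ...existing workspace server values...
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.aws_kubernetes_workload_manager.AWSKubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workloads
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-runner
    ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE: "true"
    ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY: <account>.dkr.ecr.eu-central-1.amazonaws.com
    ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS: "true"
    ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET: s3://my-bucket/run-template-logs
    ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_REGION: eu-central-1
```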
Example updated Helm values file (full GCP configuration):
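Since the GCP implementation currently matches the Kubernetes one, the sketch only differs in the implementation source:

```yaml
zenml:
  # ...existing workspace server values...
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.gcp_kubernetes_workload_manager.GCPKubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workloads
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-runner
    ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE: <pre-built-runner-image>
```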
Day 2 Operations: Upgrades and Updates
This section covers how to upgrade or update your ZenML Pro deployment. The process involves updating both the ZenML Pro Control Plane and the ZenML Pro workspace servers.
Always upgrade the ZenML Pro Control Plane first, then upgrade the workspace servers. This ensures compatibility and prevents potential issues.
Upgrade Checklist
Check Available Versions and Release Notes
For ZenML Pro Control Plane:
Check available versions in the ZenML Pro ArtifactHub repository
For ZenML Pro Workspace Servers:
Check available versions in the ZenML OSS ArtifactHub repository
Review the ZenML GitHub releases page for release notes and breaking changes
Fetch and Prepare New Software Artifacts
Follow the Software Artifacts section to get access to the new versions of:
ZenML Pro Control Plane container images and Helm chart
ZenML Pro workspace server container images and Helm chart
If using a private registry, copy the new container images to your private registry
If you are using an air-gapped installation, follow the Air-Gapped Installation instructions
Upgrade the ZenML Pro Control Plane
Option A - In-place upgrade with existing values. Use this if you don't need to change any configuration values as part of the upgrade:
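A sketch, reusing the illustrative release and namespace names from the installation step:

```bash
helm --namespace zenml-pro upgrade zenml-pro \
  oci://public.ecr.aws/zenml/zenml-pro --version <new-version> \
  --reuse-values
```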
Option B - Retrieve, modify and reapply values, if necessary. Use this if you need to change any configuration values as part of the upgrade or if you are performing a configuration update without upgrading the ZenML Pro Control Plane.
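A sketch, with the same illustrative names:

```bash
# Retrieve the currently deployed values
helm --namespace zenml-pro get values zenml-pro > my-values.yaml

# ...edit my-values.yaml as needed, then re-apply:
helm --namespace zenml-pro upgrade zenml-pro \
  oci://public.ecr.aws/zenml/zenml-pro --version <new-version> \
  --values my-values.yaml
```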
Upgrade ZenML Pro Workspace Servers
For each workspace, perform either:
Option A - In-place upgrade with existing values. Use this if you don't need to change any configuration values as part of the upgrade:
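A sketch, reusing the illustrative workspace release and namespace names from the deployment step:

```bash
helm --namespace zenml-my-workspace upgrade zenml \
  oci://public.ecr.aws/zenml/zenml --version <new-version> \
  --reuse-values
```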
Option B - Retrieve, modify and reapply values, if necessary. Use this if you need to change any configuration values as part of the upgrade or if you are performing a configuration update without upgrading the ZenML Pro Workspace Server.
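A sketch, with the same illustrative names:

```bash
# Retrieve the currently deployed values
helm --namespace zenml-my-workspace get values zenml > zenml-my-workspace-values.yaml

# ...edit the file as needed, then re-apply:
helm --namespace zenml-my-workspace upgrade zenml \
  oci://public.ecr.aws/zenml/zenml --version <new-version> \
  --values zenml-my-workspace-values.yaml
```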