Self-hosted deployment
Guide for installing ZenML Pro self-hosted in a Kubernetes cluster.
This page provides instructions for installing ZenML Pro (the ZenML Pro Control Plane and one or more ZenML Pro Workspace servers) on-premises in a Kubernetes cluster.
ZenML Pro can be installed as a self-hosted deployment. You need to be granted access to the ZenML Pro container images, and you'll have to provide your own infrastructure: a Kubernetes cluster, a database server, and a few other common prerequisites usually needed to expose Kubernetes services via HTTPS: a load balancer, an Ingress controller, HTTPS certificate(s) and DNS record(s).
This document will guide you through the process.
The ZenML Pro on-prem installation relies on a set of container images and Helm charts. The container images are stored in private ZenML container registries that are not available to the public.
If you haven't done so already, please contact ZenML Support to get access to the private ZenML Pro container images.
The following artifacts are required to install the ZenML Pro control plane in your own Kubernetes cluster:
private container images for the ZenML Pro API server:
715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api
in AWS
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api
in GCP
private container images for the ZenML Pro dashboard:
715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard
in AWS
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard
in GCP
the public ZenML Pro helm chart (as an OCI artifact): oci://public.ecr.aws/zenml/zenml-pro
The following artifacts are required to install ZenML Pro workspace servers in your own Kubernetes cluster:
private container images for the ZenML Pro workspace server:
715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server
in AWS
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server
in GCP
the public open-source ZenML Helm chart (as an OCI artifact): oci://public.ecr.aws/zenml/zenml
This section provides instructions for how to access the private ZenML Pro container images.
To access the ZenML Pro container images stored in AWS ECR, you need to set up an AWS IAM user or IAM role in your AWS account. The steps below outline how to create an AWS account, configure the necessary IAM entities, and pull images from the private repositories. If you're familiar with AWS or plan on using an AWS EKS cluster to deploy ZenML Pro, you can simply use your existing IAM user or IAM role and skip steps 1 and 2.
Step 1: Create a Free AWS Account
Click Create a Free Account.
Follow the on-screen instructions to provide your email address, create a root user, and set a secure password.
Enter your contact and payment information for verification purposes. While a credit or debit card is required, you won't be charged for free-tier eligible services.
Confirm your email and complete the verification process.
Log in to the AWS Management Console using your root user credentials.
Step 2: Create an IAM User or IAM Role
A. Create an IAM User
Log in to the AWS Management Console.
Navigate to the IAM service.
Click Users in the left-hand menu, then click Add Users.
Provide a user name (e.g., zenml-ecr-access).
Select Access Key - Programmatic access as the AWS credential type.
Click Next: Permissions.
Choose Attach policies directly, then select the following policies:
AmazonEC2ContainerRegistryReadOnly
Click Next: Tags and optionally add tags for organization purposes.
Click Next: Review, then Create User.
Note the Access Key ID and Secret Access Key displayed after creation. Save these securely.
B. Create an IAM Role
Navigate to the IAM service.
Click Roles in the left-hand menu, then click Create Role.
Choose the type of trusted entity:
Select AWS Account.
Enter your AWS account ID and click Next.
Select the AmazonEC2ContainerRegistryReadOnly policy.
Click Next: Tags, optionally add tags, then click Next: Review.
Provide a role name (e.g., zenml-ecr-access-role) and click Create Role.
Step 3: Provide the IAM User/Role ARN
For an IAM user, the ARN can be found in the Users section under the Summary tab.
For an IAM role, the ARN is displayed in the Roles section under the Summary tab.
Send the ARN to ZenML Support so it can be granted permission to access the ZenML Pro container images and Helm charts.
Step 4: Authenticate your Docker Client
Run these steps on the machine that you'll use to pull the ZenML Pro images. It is recommended that you copy the container images into your own container registry, accessible from the Kubernetes cluster where ZenML Pro will be installed; otherwise, you'll have to configure the Kubernetes cluster to authenticate directly against the ZenML Pro container registry, which is problematic if your Kubernetes cluster is not running on AWS.
A. Install AWS CLI
B. Configure AWS CLI Credentials
Open a terminal and run aws configure.
Enter the following when prompted:
Access Key ID: Provided during IAM user creation.
Secret Access Key: Provided during IAM user creation.
Default region name: eu-west-1
Default output format: Leave blank or enter json.
If you chose to use an IAM role, update the AWS CLI configuration file to specify the role you want to assume. Open the configuration file located at ~/.aws/config and add the following:
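A minimal sketch of such a profile (the profile names are placeholders):

```ini
[profile zenml-ecr]
role_arn = <IAM-ROLE-ARN>
source_profile = default
region = eu-west-1
```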
Replace <IAM-ROLE-ARN> with the ARN of the role you created and ensure source_profile points to a profile with sufficient permissions to assume the role.
C. Authenticate Docker with ECR
Run the following command to authenticate your Docker client with the ZenML ECR repository:
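For example (the eu-west-1 registry hosts the control plane images; repeat with eu-central-1 for the workspace server images):

```bash
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-west-1.amazonaws.com
```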
If you used an IAM role, use the specified profile to execute commands. For example:
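Here, zenml-ecr is the example profile name from the configuration sketch above:

```bash
aws ecr get-login-password --region eu-west-1 --profile zenml-ecr | \
  docker login --username AWS --password-stdin 715803424590.dkr.ecr.eu-west-1.amazonaws.com
```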
This will allow you to authenticate to the ZenML Pro container registries and pull the necessary images with Docker, e.g.:
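The image tag is a placeholder for the ZenML Pro release you were given access to:

```bash
docker pull 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-api:<tag>
docker pull 715803424590.dkr.ecr.eu-west-1.amazonaws.com/zenml-pro-dashboard:<tag>
```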
To access the ZenML Pro container images stored in Google Cloud Platform (GCP) Artifact Registry, you need to set up a GCP account and configure the necessary permissions. The steps below outline how to create a GCP account, configure authentication, and pull images from the private repositories. If you're familiar with GCP or plan on using a GKE cluster to deploy ZenML Pro, you can use your existing GCP account and skip step 1.
Step 1: Create a GCP Account
Click Get Started for Free or sign in with an existing Google account.
Follow the on-screen instructions to set up your account and create a project.
Set up billing information (required for using GCP services).
Step 2: Create a Service Account
Click Create Service Account.
Enter a service account name (e.g., zenml-gar-access).
Add a description (optional) and click Create and Continue.
No additional permissions are needed as access will be granted directly to the Artifact Registry.
Click Done.
After creation, click on the service account to view its details.
Go to the Keys tab and click Add Key > Create new key.
Choose JSON as the key type and click Create.
Save the downloaded JSON key file securely - you'll need it later.
Step 3: Provide the Service Account Email
In the service account details page, copy the service account email address (it should look like zenml-gar-access@your-project.iam.gserviceaccount.com).
Send this email address to ZenML Support so it can be granted permission to access the ZenML Pro container images.
Step 4: Authenticate your Docker Client
Run these steps on the machine that you'll use to pull the ZenML Pro images. It is recommended that you copy the container images into your own container registry, accessible from the Kubernetes cluster where ZenML Pro will be installed.
A. Install Google Cloud CLI
Initialize the CLI by running:
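```bash
gcloud init
```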
B. Configure Authentication
Activate the service account using the JSON key file you downloaded:
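For example (the key file path is a placeholder):

```bash
gcloud auth activate-service-account --key-file=/path/to/key.json
```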
Configure Docker authentication for Artifact Registry:
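```bash
gcloud auth configure-docker europe-west3-docker.pkg.dev
```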
C. Pull the Container Images
You can now pull the ZenML Pro images:
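The image tag is a placeholder for the ZenML Pro release you were given access to:

```bash
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-api:<tag>
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-dashboard:<tag>
docker pull europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server:<tag>
```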
If you need to install ZenML Pro in an air-gapped environment (a network with no direct internet access), you'll need to transfer all required artifacts to your internal infrastructure. Here's a step-by-step process:
1. Prepare a Machine with Internet Access
First, you'll need a machine with both internet access and sufficient storage space to temporarily store all artifacts. On this machine:
Follow the authentication steps described above to gain access to the private repositories
Install the required tools:
Docker
Helm
2. Download All Required Artifacts
A Bash script like the following can be used to download all necessary components, or you can run the listed commands manually:
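A sketch of such a script, assuming access to the GCP registry and using placeholder versions:

```bash
#!/bin/bash
set -euo pipefail

ZENML_PRO_VERSION=<zenml-pro-version>   # placeholder
ZENML_OSS_VERSION=<zenml-oss-version>   # placeholder
REGISTRY=europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro

mkdir -p zenml-artifacts
cd zenml-artifacts

# Pull the private ZenML Pro images and save them as tar archives.
# Note: the workspace server image may be versioned with the OSS
# release; adjust the tags to the versions you were given access to.
for image in zenml-pro-api zenml-pro-dashboard zenml-pro-server; do
  docker pull "${REGISTRY}/${image}:${ZENML_PRO_VERSION}"
  docker save "${REGISTRY}/${image}:${ZENML_PRO_VERSION}" -o "${image}.tar"
done

# Optionally also save the public ZenML client image if you plan to
# run containerized pipelines (see the note on air-gapped environments).
docker pull "zenmldocker/zenml:${ZENML_OSS_VERSION}"
docker save "zenmldocker/zenml:${ZENML_OSS_VERSION}" -o zenml-client.tar

# Download the Helm charts as local archives
helm pull oci://public.ecr.aws/zenml/zenml-pro --version "${ZENML_PRO_VERSION}"
helm pull oci://public.ecr.aws/zenml/zenml --version "${ZENML_OSS_VERSION}"

# Bundle everything into a single archive for transfer
cd ..
tar -czf zenml-artifacts.tar.gz zenml-artifacts
```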
3. Transfer Artifacts to Air-Gapped Environment
Copy the zenml-artifacts.tar.gz file to your preferred transfer medium (e.g., USB drive, approved file transfer system)
Transfer the archive to a machine in your air-gapped environment that has access to your internal container registry
4. Load Artifacts in Air-Gapped Environment
Create a script to load the artifacts in your air-gapped environment or run the listed commands manually:
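A matching sketch for the loading side (the internal registry URL is a placeholder):

```bash
#!/bin/bash
set -euo pipefail

INTERNAL_REGISTRY=registry.internal.example.com/zenml   # placeholder
SRC_REGISTRY=europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro
VERSION=<version>   # must match the downloaded tags

tar -xzf zenml-artifacts.tar.gz
cd zenml-artifacts

# Load all saved images into the local Docker daemon
for archive in *.tar; do
  docker load -i "${archive}"
done

# Re-tag the images for the internal registry and push them
for image in zenml-pro-api zenml-pro-dashboard zenml-pro-server; do
  docker tag "${SRC_REGISTRY}/${image}:${VERSION}" "${INTERNAL_REGISTRY}/${image}:${VERSION}"
  docker push "${INTERNAL_REGISTRY}/${image}:${VERSION}"
done
```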
5. Update Configuration
When deploying ZenML Pro in your air-gapped environment, make sure to update all references to container images in your Helm values to point to your internal registry. For example:
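For instance, if the images were pushed to the placeholder registry used above (the exact sub-keys depend on the chart version; check its values.yaml):

```yaml
zenml:
  image:
    api:
      repository: registry.internal.example.com/zenml/zenml-pro-api
    dashboard:
      repository: registry.internal.example.com/zenml/zenml-pro-dashboard
```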
The scripts provided above are examples and may need to be adjusted based on your specific security requirements and internal infrastructure setup.
6. Using the Helm Charts
After downloading the Helm charts, you can use their local paths instead of a remote OCI registry to deploy ZenML Pro components. Here's an example of how to use them:
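For example, assuming the chart archives downloaded by the script above:

```bash
helm --namespace zenml-pro upgrade --install zenml-pro \
  ./zenml-pro-<version>.tgz \
  --create-namespace \
  --values my-values.yaml
```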
To deploy the ZenML Pro control plane and one or more ZenML Pro workspace servers, ensure the following prerequisites are met:
Kubernetes Cluster
A functional Kubernetes cluster is required as the primary runtime environment.
Database Server(s)
The ZenML Pro Control Plane and ZenML Pro Workspace servers need to connect to an external database server. To minimize the amount of infrastructure required, you can share a single database server between the Control Plane and all workspaces, or you can use different database servers to ensure server-level database isolation, as long as you keep in mind the following limitations:
the ZenML Pro Control Plane can be connected to either MySQL or Postgres as the external database
the ZenML Pro Workspace servers can only be connected to a MySQL database (no Postgres support is available)
the ZenML Pro Control Plane as well as every ZenML Pro Workspace server needs to use its own individual database (especially important when connected to the same server)
Ensure you have a valid username and password for the different ZenML Pro services. For improved security, it is recommended to have different users for different services. If the database user does not have permissions to create databases, you must also create a database and give the user full permissions to access and manage it (i.e. create, update and delete tables).
Ingress Controller
Install an Ingress provider in the cluster (e.g., NGINX, Traefik) to handle HTTP(S) traffic routing. Ensure the Ingress provider is properly configured to expose the cluster's services externally.
Domain Name
You'll need an FQDN for the ZenML Pro Control Plane as well as for every ZenML Pro workspace. For this reason, it's highly recommended to use a DNS prefix and associated SSL certificate instead of individual FQDNs and SSL certificates, to make this process easier.
FQDN or DNS Prefix Setup
Obtain a Fully Qualified Domain Name (FQDN) or DNS prefix (e.g., *.zenml-pro.mydomain.com) from your DNS provider.
Identify the external Load Balancer IP address of the Ingress controller using the command kubectl get svc -n <ingress-namespace>. Look for the EXTERNAL-IP field of the Load Balancer service.
Create a DNS A record (or CNAME record for subdomains) pointing the FQDN to the Load Balancer IP. Example:
Host: zenml-pro.mydomain.com
Type: A
Value: <Load Balancer IP>
Use a DNS propagation checker to confirm that the DNS record is resolving correctly.
Make sure you don't use a simple DNS prefix for the servers (e.g. https://zenml.cluster is not recommended). This is especially relevant for the TLS certificates that you have to prepare for these endpoints. Always use a fully qualified domain name (e.g. https://zenml.ml.cluster). The TLS certificates will not be accepted by some browsers (e.g. Chrome) otherwise.
SSL Certificate
The ZenML Pro services do not terminate SSL traffic. It is your responsibility to generate and configure the necessary SSL certificates for the ZenML Pro Control Plane as well as all the ZenML Pro workspaces that you will deploy (see the previous point on how to use a DNS prefix to make the process easier).
Obtaining SSL Certificates
Acquire an SSL certificate for the domain. You can use:
A commercial SSL certificate provider (e.g., DigiCert, Sectigo).
Self-signed certificates (not recommended for production environments). IMPORTANT: If you are using self-signed certificates, it is highly recommended to use the same self-signed CA certificate for all the ZenML Pro services (control plane and workspace servers), otherwise it will be difficult to manage the certificates on the client machines. With only one CA certificate, you can install it system-wide on all the client machines only once and then use it to sign all the TLS certificates for the ZenML Pro services.
Configuring SSL Termination
Once the SSL certificate is obtained, configure your load balancer or Ingress controller to terminate HTTPS traffic:
For NGINX Ingress Controller:
You can configure SSL termination globally for the NGINX Ingress Controller by setting up a default SSL certificate or configuring it at the ingress controller level, or you can specify SSL certificates when configuring the ingress in the ZenML server Helm values.
Here's how you can do it globally:
Create a TLS Secret
Store your SSL certificate and private key as a Kubernetes TLS secret in the namespace where the NGINX Ingress Controller is deployed.
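For example (the secret name and namespace are placeholders):

```bash
kubectl create secret tls default-tls \
  --namespace ingress-nginx \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key
```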
Update NGINX Ingress Controller Configurations
Configure the NGINX Ingress Controller to use the default SSL certificate.
If using the NGINX Ingress Controller Helm chart, modify the values.yaml file or use --set during installation:
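For instance, in the ingress-nginx chart values (referencing the secret created above):

```yaml
controller:
  extraArgs:
    default-ssl-certificate: ingress-nginx/default-tls
```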
Or directly pass the argument during Helm installation or upgrade:
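```bash
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.extraArgs.default-ssl-certificate=ingress-nginx/default-tls
```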
If the NGINX Ingress Controller was installed manually, edit its deployment to include the argument in the args section of the container:
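For example:

```yaml
containers:
  - name: controller
    args:
      - /nginx-ingress-controller
      - --default-ssl-certificate=ingress-nginx/default-tls
      # ... other existing arguments ...
```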
For Traefik:
Configure Traefik to use TLS by creating a certificate resolver for Let's Encrypt or specifying the certificates manually in the traefik.yml or values.yaml file. Example for Let's Encrypt:
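A minimal static-configuration sketch (email and storage path are placeholders):

```yaml
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@mydomain.com
      storage: /data/acme.json
      httpChallenge:
        entryPoint: web
```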
Reference the domain in your IngressRoute or Middleware configuration.
If your Kubernetes cluster is not set to be authenticated to the container registry where the ZenML Pro container images are hosted, you will need to create a secret to allow the ZenML Pro server to pull the images. The following is an example of how to do this if you've received a private access key for the ZenML GCP Artifact Registry from ZenML, but you can use the same approach for your own private container registry:
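A sketch of the secret creation, assuming the GCP Artifact Registry and the zenml-pro namespace (the _json_key_base64 username tells the registry that the password is a base64-encoded service account key):

```bash
kubectl create secret docker-registry image-pull-secret \
  --namespace zenml-pro \
  --docker-server=europe-west3-docker.pkg.dev \
  --docker-username=_json_key_base64 \
  --docker-password="$(cat key.base64)"
```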
The key.base64 file should contain the base64 encoded JSON key for the GCP service account as received from the ZenML support team. The image-pull-secret secret will be used in the next step when installing the ZenML Pro helm chart.
There are a variety of options that can be configured for the ZenML Pro helm chart before installation.
This is an example Helm values YAML file that covers the most common configuration options:
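A sketch covering the settings discussed below; all hostnames, credentials and repository locations are placeholders, and the authoritative schema is the chart's own values.yaml:

```yaml
imagePullSecrets:
  - name: image-pull-secret

zenml:
  serverURL: https://zenml-pro.mydomain.com
  ingress:
    enabled: true
    host: zenml-pro.mydomain.com
  database:
    external:
      # MySQL or Postgres credentials for the control plane database
      host: mysql.example.com
      port: 3306
      username: zenml-pro
      password: <password>
      database: zenml_pro
  auth:
    password: <admin-password>
    # Only needed when sharing a DNS prefix across all ZenML Pro services
    authCookieDomain: .mydomain.com
```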
Minimum required settings:
the database credentials (zenml.database.external)
the URL (zenml.serverURL) and Ingress hostname (zenml.ingress.host) where the ZenML Pro Control Plane API and Dashboard will be reachable
In addition to the above, the following might also be relevant for you:
configure container registry credentials (imagePullSecrets)
injecting custom CA certificates (zenml.certificates), especially important if the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority
configure HTTP proxy settings (zenml.proxy)
custom container image repository locations (zenml.image.api and zenml.image.dashboard)
the username and password used for the default admin account (zenml.auth.password)
additional Ingress settings (zenml.ingress)
Kubernetes resources allocated to the pods (resources)
If you set up a common DNS prefix that you plan on using for all the ZenML Pro services, you may configure the domain of the HTTP cookies used by the ZenML Pro dashboard to match it by setting zenml.auth.authCookieDomain to the DNS prefix (e.g. .my.domain instead of zenml-pro.my-domain)
To install the helm chart (assuming the customized configuration values are in a my-values.yaml file), run:
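For example:

```bash
helm --namespace zenml-pro upgrade --install zenml-pro \
  oci://public.ecr.aws/zenml/zenml-pro \
  --version <version> \
  --create-namespace \
  --values my-values.yaml
```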
If the installation is successful, you should be able to see the following workloads running in your cluster:
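For example (assuming the zenml-pro namespace used above):

```bash
kubectl -n zenml-pro get pods
```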
The Helm chart will output information explaining how to connect and authenticate to the ZenML Pro dashboard:
The credentials are for the default administrator user account provisioned on installation. With these on-hand, you can proceed to the next step and on-board additional users.
If the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority, you need to install the CA certificates on every machine that needs to access the ZenML server:
installing the CA certificates system-wide is usually the easiest solution. For example, on Ubuntu and Debian-based systems, you can install the CA certificates system-wide by copying them into the /usr/local/share/ca-certificates directory and running update-ca-certificates.
for some browsers (e.g. Chrome), updating the system's CA certificates is not enough. You will also need to import the CA certificates into the browser.
for Python, you also need to set the REQUESTS_CA_BUNDLE environment variable to the path of the system's CA certificates bundle file (e.g. export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt)
customize the ZenML client container image using a Dockerfile like this:
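A minimal sketch, assuming your CA certificate is named my-ca.crt and using the public ZenML client image as the base:

```dockerfile
FROM zenmldocker/zenml:<version>

# Install the custom CA certificate system-wide inside the image
COPY my-ca.crt /usr/local/share/ca-certificates/my-ca.crt
RUN update-ca-certificates

# Make sure Python HTTP clients also pick up the system CA bundle
ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
```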
then build and push that image to your private container registry:
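For example (registry and tag are placeholders):

```bash
docker build -t registry.internal.example.com/zenml/zenml-client:<version> .
docker push registry.internal.example.com/zenml/zenml-client:<version>
```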
and finally update your ZenML pipeline code to use the custom ZenML client image by using the DockerSettings class:
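A sketch using the image name from the previous step:

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Use the custom client image as the base for all container images
# that ZenML builds for this pipeline.
docker_settings = DockerSettings(
    parent_image="registry.internal.example.com/zenml/zenml-client:<version>",
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline():
    ...
```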
The deployed ZenML Pro service comes with a pre-installed default administrator account. This admin account serves the purpose of creating and recovering other users. First, you will need to get the admin password following the instructions in the previous step.
Create a users.yml file that contains a list of all the users that you want to create for ZenML. Also set a default password. The users will be asked to change this password on their first login.
Run the create_users.py script below. This will create all of the users.
[file: create_users.py]
The script will prompt you for the URL of your deployment, the admin account email and admin account password, and finally the location of your users.yml file.
The ZenML Pro admin user should only be used for administrative operations: creating other users, resetting the password of existing users and enrolling workspaces. All other operations should be executed while logged in as a regular user.
Head on over to your deployment in the browser and use one of the users you just created to log in.
After logging in for the first time, you will need to create a new password. (Be aware: for the time being, only the admin account will be able to reset this password.)
Finally you can create an Organization. This Organization will host all the workspaces you enroll at the next stage.
Now you can invite your whole team to the org. To do this, open the drop-down in the top right and head over to the settings.
Here in the members tab, add all the users you created in the previous step.
Then head over to the Pending invited screen and copy the invite link for each user.
Finally, send the invitation link, along with the account's email and initial password, over to your team members.
Installing and updating on-prem ZenML Pro workspace servers is not automated, as it is with the SaaS version. You will be responsible for enrolling workspace servers in the right ZenML Pro organization, installing them and regularly updating them. Some scripts are provided to simplify this task as much as possible.
Run the enroll-workspace.py script below.
This will collect all the necessary data, enroll the workspace in the organization, and generate a Helm values.yaml file template that you can use to install the workspace server:
[file: enroll-workspace.py]
Running the script does two things:
it creates a workspace entry in the ZenML Pro database. The workspace will remain in a "provisioning" state and won't be accessible until you actually install it using Helm.
it outputs a YAML file with Helm chart configuration values that you can use to deploy the ZenML Pro workspace server in your Kubernetes cluster.
This is an example of a generated Helm YAML file:
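The exact contents are produced by the enrollment script; a trimmed sketch might look like this:

```yaml
zenml:
  serverURL: https://zenml-my-workspace.mydomain.com   # TODO: workspace FQDN
  ingress:
    enabled: true
    host: zenml-my-workspace.mydomain.com              # TODO: workspace FQDN
  image:
    repository: <registry>/zenml-pro-server            # TODO: image location
  database:
    url: mysql://<user>:<password>@<host>:3306/<database>   # TODO: credentials
  # ... plus the workspace identity and control plane connection values
  # filled in by the enrollment script ...
```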
Configure the ZenML Pro workspace Helm chart
IMPORTANT: When configuring the ZenML Pro workspace Helm chart, keep the following in mind:
don't use the same database name for multiple workspaces
don't reuse the control plane database name for the workspace server database
The ZenML Pro workspace server is nothing more than a slightly modified open-source ZenML server. The deployment even uses the official open-source helm chart.
To configure the Helm chart, use the YAML file generated at the previous step as a template and fill in the necessary values marked by TODO comments. At a minimum, you'll need to configure the following:
the MySQL database credentials (zenml.database.url)
the container image repository where the ZenML Pro workspace server container images are stored (zenml.image.repository)
the hostname where the ZenML Pro workspace server will be reachable (zenml.ingress.host and zenml.serverURL)
injecting custom CA certificates (zenml.certificates), especially important if the TLS certificate used for the ZenML Pro control plane is signed by a custom Certificate Authority
configure HTTP proxy settings (zenml.proxy)
set up secrets stores
configure database backup and restore
customize Kubernetes resources
etc.
Deploy the ZenML Pro workspace server with Helm
To install the helm chart (assuming the customized configuration values are in the generated zenml-my-workspace-values.yaml file), run e.g.:
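```bash
helm --namespace zenml-my-workspace upgrade --install zenml \
  oci://public.ecr.aws/zenml/zenml \
  --version <version> \
  --create-namespace \
  --values zenml-my-workspace-values.yaml
```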
The deployment is ready when the ZenML server pod is running and healthy:
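For example:

```bash
kubectl -n zenml-my-workspace get pods
```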
After deployment, your workspace should show up as running in the ZenML Pro dashboard and can be accessed at the next step.
If you need to deploy multiple workspaces, simply run the enrollment script again with different values.
The newly enrolled workspace should now be accessible in the ZenML Pro workspace dashboard and the CLI. Note that you need to log in as an organization member and add yourself as a workspace member first:
Then follow the instructions in the checklist to unlock the full dashboard:
To log in to the workspace with the ZenML CLI, you need to pass the custom ZenML Pro API URL to the zenml login command:
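For example, assuming the CLI version you use exposes a --pro-api-url option for this purpose (URLs are placeholders):

```bash
zenml login https://zenml-my-workspace.mydomain.com \
  --pro-api-url https://zenml-pro.mydomain.com
```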
Alternatively, you can set the ZENML_PRO_API_URL environment variable:
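```bash
export ZENML_PRO_API_URL=https://zenml-pro.mydomain.com
zenml login https://zenml-my-workspace.mydomain.com
```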
The ZenML Pro workspace server can be configured to optionally support Run Templates - the ability to run pipelines straight from the dashboard. This feature is not enabled by default and needs a few additional steps to be set up.
The Run Templates feature is only available from ZenML workspace server version 0.81.0 onwards.
The Run Templates feature comes with some optional sub-features that can be turned on or off to customize the behavior of the feature:
Building runner container images: Running pipelines from the dashboard relies on Kubernetes jobs (aka "runner" jobs) that are triggered by the ZenML workspace server. These jobs need to use container images that have the correct Python software packages installed on them to be able to launch the pipelines.
The good news is that run templates are based on pipeline runs that have already completed in the past and therefore already have container images built and associated with them. The same container images can be reused by the ZenML workspace server for the "runner" jobs. However, for this to work, the Kubernetes cluster itself has to be able to access the container registries where these images are stored. This can be achieved in several ways:
use implicit workload identity access to the container registry - available in most cloud providers by granting the Kubernetes service account access to the container registry
configure a service account with implicit access to the container registry - associating some cloud service identity (e.g. a GCP service account, an AWS IAM role, etc.) with the Kubernetes service account used by the "runner" jobs
configure an image pull secret for the service account - similar to the previous option, but using a Kubernetes secret instead of a cloud service identity
When none of the above are available or desirable, an alternative approach is to configure the ZenML workspace server itself to build these "runner" container images and push them to a different container registry. This can be achieved by setting the ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE environment variable to true and the ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY environment variable to the container registry where the "runner" images will be pushed.
Yet another alternative is to configure the ZenML workspace server to use a single pre-built "runner" image for all the pipeline runs. This can be achieved by keeping the ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE environment variable set to false and setting the ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE environment variable to the container image registry URI where the "runner" image is stored. Note that this image needs to have all requirements installed to instantiate the stack that will be used for the template run.
Store logs externally: By default, the ZenML workspace server will use the logs extracted from the "runner" job pods to populate the run template logs shown in the ZenML dashboard. These pods may disappear after a while, so the logs may not be available anymore.
To avoid this, you can configure the ZenML workspace server to store the logs in an external location, like an S3 bucket. This can be achieved by setting the ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS environment variable to true.
This option is currently only available with the AWS implementation of the Run Templates feature and also requires the ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET environment variable to be set to point to the S3 bucket where the logs will be stored.
Decide on an implementation.
There are currently three different implementations of the Run Templates feature:
Kubernetes: runs pipelines in the same Kubernetes cluster as the ZenML Pro workspace server.
AWS: extends the Kubernetes implementation to be able to build and push container images to AWS ECR and to store the run template logs in AWS S3.
GCP: currently, this is the same as the Kubernetes implementation, but we plan to extend it to be able to push container images to GCP GCR and to store run template logs in GCP GCS.
If you're going for a fast, minimalistic setup, you should go for the Kubernetes implementation. If you want a complete cloud provider solution with all features enabled, you should go for the AWS implementation.
Prepare Run Templates configuration.
You'll need to prepare a list of environment variables that will be added to the Helm chart values used to deploy the ZenML workspace server.
For all implementations, the following variables are supported:
ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE (mandatory): one of the values associated with the implementation you've chosen in step 1:
zenml_cloud_plugins.kubernetes_workload_manager.KubernetesWorkloadManager
zenml_cloud_plugins.aws_kubernetes_workload_manager.AWSKubernetesWorkloadManager
zenml_cloud_plugins.gcp_kubernetes_workload_manager.GCPKubernetesWorkloadManager
ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE (mandatory): the Kubernetes namespace where the "runner" jobs will be launched. It must exist before the run templates are enabled.
ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT (mandatory): the Kubernetes service account to use for the "runner" jobs. It must exist before the run templates are enabled.
ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE (optional): whether to build the "runner" container images or not. Defaults to false.
ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY (optional): the container registry where the "runner" images will be pushed. Mandatory if ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE is set to true, ignored otherwise.
ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE (optional): the "runner" container image to use. Only used if ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE is set to false, ignored otherwise.
ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS (optional): whether to store the logs of the "runner" jobs in an external location. Defaults to false. Currently only supported with the AWS implementation and requires the ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET variable to be set as well.
ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE_POD_RESOURCES (optional): the Kubernetes pod resources specification to use for the "runner" jobs, in JSON format. Example: {"requests": {"cpu": "100m", "memory": "400Mi"}, "limits": {"memory": "700Mi"}}.
For the AWS implementation, the following additional variables are supported:
ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET (optional): the S3 bucket where the logs will be stored (e.g. s3://my-bucket/run-template-logs). Mandatory if ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS is set to true.
ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_REGION (optional): the AWS region where the container images will be pushed (e.g. eu-central-1). Mandatory if ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE is set to true.
Create the Kubernetes resources.
For the Kubernetes implementation, you'll need to create the following resources:
the Kubernetes namespace passed in the ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE variable.
the Kubernetes service account passed in the ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT variable. This service account will be used to build images and run the "runner" jobs, so it needs to have the necessary permissions to do so (e.g. access to the container images, permissions to push container images to the configured container registry, permissions to access the configured bucket, etc.).
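For example, using placeholder names that match the configuration sketches below:

```bash
kubectl create namespace zenml-workloads
kubectl -n zenml-workloads create serviceaccount zenml-runner
```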
Finally, update the ZenML workspace server configuration to use the new implementation.
Example updated Helm values file (minimal configuration):
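A sketch, assuming the chart exposes a zenml.environment map for extra environment variables and using the placeholder namespace and service account created above:

```yaml
zenml:
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.kubernetes_workload_manager.KubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workloads
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-runner
```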
Example updated Helm values file (full AWS configuration):
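A sketch with all AWS sub-features enabled (registry, bucket and region are placeholders):

```yaml
zenml:
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.aws_kubernetes_workload_manager.AWSKubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workloads
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-runner
    ZENML_KUBERNETES_WORKLOAD_MANAGER_BUILD_RUNNER_IMAGE: "true"
    ZENML_KUBERNETES_WORKLOAD_MANAGER_DOCKER_REGISTRY: <account>.dkr.ecr.eu-central-1.amazonaws.com
    ZENML_KUBERNETES_WORKLOAD_MANAGER_ENABLE_EXTERNAL_LOGS: "true"
    ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_BUCKET: s3://my-bucket/run-template-logs
    ZENML_AWS_KUBERNETES_WORKLOAD_MANAGER_REGION: eu-central-1
```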
Example updated Helm values file (full GCP configuration):
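A sketch for the GCP implementation, which currently mirrors the Kubernetes one and therefore uses a pre-built runner image (placeholder URI):

```yaml
zenml:
  environment:
    ZENML_SERVER_WORKLOAD_MANAGER_IMPLEMENTATION_SOURCE: zenml_cloud_plugins.gcp_kubernetes_workload_manager.GCPKubernetesWorkloadManager
    ZENML_KUBERNETES_WORKLOAD_MANAGER_NAMESPACE: zenml-workloads
    ZENML_KUBERNETES_WORKLOAD_MANAGER_SERVICE_ACCOUNT: zenml-runner
    ZENML_KUBERNETES_WORKLOAD_MANAGER_RUNNER_IMAGE: <registry>/zenml-runner:<tag>
```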
This section covers how to upgrade or update your ZenML Pro deployment. The process involves updating both the ZenML Pro Control Plane and the ZenML Pro workspace servers.
Always upgrade the ZenML Pro Control Plane first, then upgrade the workspace servers. This ensures compatibility and prevents potential issues.
Check Available Versions and Release Notes
For ZenML Pro Control Plane:
For ZenML Pro Workspace Servers:
Fetch and Prepare New Software Artifacts
ZenML Pro Control Plane container images and Helm chart
ZenML Pro workspace server container images and Helm chart
If using a private registry, copy the new container images to your private registry
Upgrade the ZenML Pro Control Plane
Option A - In-place upgrade with existing values. Use this if you don't need to change any configuration values as part of the upgrade:
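For example:

```bash
helm --namespace zenml-pro upgrade zenml-pro \
  oci://public.ecr.aws/zenml/zenml-pro \
  --version <new-version> \
  --reuse-values
```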
Option B - Retrieve, modify and reapply values, if necessary. Use this if you need to change any configuration values as part of the upgrade or if you are performing a configuration update without upgrading the ZenML Pro Control Plane.
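For example:

```bash
# Save the currently deployed values, edit them, then reapply
helm --namespace zenml-pro get values zenml-pro -o yaml > my-values.yaml
# ... edit my-values.yaml ...
helm --namespace zenml-pro upgrade zenml-pro \
  oci://public.ecr.aws/zenml/zenml-pro \
  --version <new-version> \
  --values my-values.yaml
```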
Upgrade ZenML Pro Workspace Servers
For each workspace, perform either:
Option A - In-place upgrade with existing values. Use this if you don't need to change any configuration values as part of the upgrade:
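For example:

```bash
helm --namespace zenml-my-workspace upgrade zenml \
  oci://public.ecr.aws/zenml/zenml \
  --version <new-version> \
  --reuse-values
```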
Option B - Retrieve, modify and reapply values, if necessary. Use this if you need to change any configuration values as part of the upgrade or if you are performing a configuration update without upgrading the ZenML Pro Workspace Server.
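For example:

```bash
helm --namespace zenml-my-workspace get values zenml -o yaml > zenml-my-workspace-values.yaml
# ... edit zenml-my-workspace-values.yaml ...
helm --namespace zenml-my-workspace upgrade zenml \
  oci://public.ecr.aws/zenml/zenml \
  --version <new-version> \
  --values zenml-my-workspace-values.yaml
```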
The container image tags and the Helm chart versions are both synchronized and linked to the ZenML Pro releases. You can find the ZenML Pro Helm chart along with the available released versions in the public Helm chart registry (oci://public.ecr.aws/zenml/zenml-pro).
The container image tags and the Helm chart versions are both synchronized and linked to the ZenML open-source releases. To find the latest ZenML OSS release, please check the ZenML GitHub releases page or the public Helm chart registry (oci://public.ecr.aws/zenml/zenml).
If you're planning on running containerized ZenML pipelines, or using other containerization related ZenML features, you'll also need to access the public ZenML client container image (zenmldocker/zenml on Docker Hub). This isn't a problem unless you're deploying ZenML Pro in an air-gapped environment, in which case you'll also have to copy the client container image into your own container registry. You'll also have to configure your code to use the correct base container registry via DockerSettings (see the ZenML documentation on DockerSettings for more information).
Visit the AWS website (https://aws.amazon.com).
Follow the official AWS documentation to install the AWS CLI.
For the available ZenML Pro versions, check the public Helm chart registry (oci://public.ecr.aws/zenml/zenml-pro).
For the available ZenML OSS versions, check the ZenML GitHub releases page or the public Helm chart registry (oci://public.ecr.aws/zenml/zenml).
Visit the Google Cloud website (https://cloud.google.com).
Navigate to the Service Accounts page in the Google Cloud Console.
Follow the official Google Cloud documentation to install the Google Cloud CLI.
Free services like Let's Encrypt for domain validation and issuance.
If you used a custom CA certificate to sign the TLS certificates for the ZenML Pro services, you will need to install the CA certificates on every client machine, as covered in the custom CA certificates section above.
The above are infrastructure requirements for ZenML Pro. If, in addition to ZenML, you would also like to reuse the same Kubernetes cluster to run machine learning workloads with ZenML, you will require the following additional infrastructure resources and services to be able to set up a ZenML Stack:
the ZenML Kubernetes Orchestrator can be set up to run on the same cluster as ZenML Pro. For authentication, you will be able to configure a Kubernetes Service Connector.
you'll need a container registry to store the container images built by ZenML. If you don't have one already, you can install one on the same cluster as ZenML Pro.
you'll also need some form of centralized object storage to store the artifacts generated by ZenML. If you don't have one already, you can install an object storage service (e.g. MinIO) on the same cluster as ZenML Pro and then configure the ZenML Artifact Store to use it.
(optional) you can install an image building service (e.g. Kaniko) in your Kubernetes cluster to build the container images for your ZenML pipelines and then configure it as an Image Builder in your ZenML Stack.
You can take a look at the ZenML Pro Helm chart README and default values and familiarize yourself with some of the configuration settings that you can customize for your ZenML Pro deployment. Alternatively, you can unpack the README.md and values.yaml files included in the helm chart:
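For example (the version is a placeholder):

```bash
helm pull oci://public.ecr.aws/zenml/zenml-pro --version <version> --untar
less zenml-pro/README.md zenml-pro/values.yaml
```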
later on, when you're running containerized pipelines with ZenML, you'll also want to install those same CA certificates into the container images built by ZenML by customizing the build process via DockerSettings. For example:
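A sketch, assuming a custom Dockerfile (placeholder name) that installs the CA certificates as shown earlier:

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Build the pipeline images from a custom Dockerfile that installs
# the CA certificates (see the Dockerfile example earlier in this guide).
docker_settings = DockerSettings(dockerfile="Dockerfile.custom-ca")

@pipeline(settings={"docker": docker_settings})
def my_pipeline():
    ...
```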
There are a variety of options that can be configured for the ZenML Pro workspace server chart before installation. You can start by taking a look at the open-source ZenML Helm chart README and default values and familiarize yourself with some of the configuration settings that you can customize for your ZenML server deployment. Alternatively, you can unpack the README.md and values.yaml files included in the helm chart:
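For example (the version is a placeholder):

```bash
helm pull oci://public.ecr.aws/zenml/zenml --version <version> --untar
less zenml/README.md zenml/values.yaml
```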
configure container registry credentials (imagePullSecrets, same as for the ZenML Pro Control Plane)
You may also choose to configure additional features documented in the open-source ZenML server deployment documentation, if you need them:
If you use TLS certificates for the ZenML Pro control plane or workspace server signed by a custom Certificate Authority, remember to install the CA certificates on the client machines, as described above.
The environment variables you prepared in step 2 need to be added to the Helm chart values used to deploy the ZenML workspace server, and the ZenML server has to be updated, as covered in the upgrade section.
Check available versions in the public Helm chart registry (oci://public.ecr.aws/zenml/zenml-pro)
Check available versions in the public Helm chart registry (oci://public.ecr.aws/zenml/zenml)
Review the release notes for breaking changes
Follow the access instructions above to get the new versions of the container images and Helm charts
If you are using an air-gapped installation, follow the air-gapped installation instructions above