Kubernetes with Helm
Deploy ZenML Pro Hybrid using Kubernetes and Helm charts.
This guide provides step-by-step instructions for deploying ZenML Pro in a Hybrid setup using Kubernetes and Helm charts. In this deployment model, the Workspace Server runs in your infrastructure while the Control Plane is managed by ZenML.
What you'll configure:
Workspace Server with database connection
Network connectivity to ZenML Control Plane
Workload manager for running pipelines from the UI
TLS/SSL certificates and domain name
Prerequisites
Kubernetes cluster (1.24+) - EKS, GKE, AKS, or self-managed
kubectl configured to access your cluster
helm CLI (3.0+) installed
A domain name and TLS certificate for your ZenML server
MySQL database (managed or self-hosted)
Outbound HTTPS access to cloudapi.zenml.io
Tools (on a machine with internet access for initial setup):
Docker
Helm (3.0+)
Access to pull ZenML Pro images from private registries (contact ZenML Support)
Before starting, complete the setup described in Hybrid Deployment Overview:
Step 1: Set up ZenML Pro organization
Step 2: Configure your infrastructure (database, networking, TLS)
Step 3: Obtain Pro credentials from ZenML Support
Step 1: Prepare Helm Chart and Docker Images
Pull Container Images
Access and pull from the ZenML Pro container registries:
Authenticate to the ZenML Pro container registries (AWS ECR or GCP Artifact Registry)
Use the credentials that you provided to ZenML Support to access the private ZenML container registry
Pull all required images:
Workspace Server image (AWS ECR):
715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server:<version>
Workspace Server image (GCP Artifact Registry):
europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server:<version>
Client image (for pipelines):
zenmldocker/zenml:<version>
Example pull commands (AWS ECR):
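The original commands are not reproduced here. As a minimal sketch, assuming the AWS CLI and Docker are installed and your AWS identity has been granted access to the registry, the pull could look like the helper below. The function name pull_zenml_images_ecr is hypothetical; the registry host and image names are the ones listed above.

```shell
# Hypothetical helper: log in to the ZenML Pro ECR registry and pull both images.
# Usage: pull_zenml_images_ecr <version>
pull_zenml_images_ecr() {
  version="$1"
  registry="715803424590.dkr.ecr.eu-central-1.amazonaws.com"

  # Exchange your AWS credentials for a short-lived Docker login token.
  aws ecr get-login-password --region eu-central-1 |
    docker login --username AWS --password-stdin "$registry" || return 1

  # Workspace Server image first, then the client image used by pipelines.
  docker pull "$registry/zenml-pro-server:$version" || return 1
  docker pull "zenmldocker/zenml:$version"
}
```

Nothing runs until the function is called with the version you intend to deploy, for example pull_zenml_images_ecr <version>.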
Example pull commands (GCP Artifact Registry):
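The original commands are not reproduced here. A comparable sketch for GCP, assuming the gcloud CLI is installed and already authenticated as an identity that has been granted access, might be the following. The function name pull_zenml_images_gcp is hypothetical; the registry host and image path are the ones listed above.

```shell
# Hypothetical helper: register gcloud as a Docker credential helper for the
# Artifact Registry host, then pull both images.
# Usage: pull_zenml_images_gcp <version>
pull_zenml_images_gcp() {
  version="$1"
  registry_host="europe-west3-docker.pkg.dev"

  # Adds a credHelpers entry for this host to your Docker configuration.
  gcloud auth configure-docker "$registry_host" --quiet || return 1

  # Workspace Server image first, then the client image used by pipelines.
  docker pull "$registry_host/zenml-cloud/zenml-pro/zenml-pro-server:$version" || return 1
  docker pull "zenmldocker/zenml:$version"
}
```

Call it as pull_zenml_images_gcp <version> once your access has been confirmed.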
Pull Helm chart
For OCI-based Helm charts, you can either pull the chart or install directly. To pull the chart first:
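As a rough sketch of the pull step: the chart reference below is an assumption (it is not stated in this guide), so confirm the actual OCI location before relying on it. The variable and function names are hypothetical.

```shell
# ASSUMPTION: OCI location of the ZenML chart; verify before use.
ZENML_CHART_REF="oci://public.ecr.aws/zenml/zenml"

# Hypothetical helper: download one chart version into the current directory.
# Usage: pull_zenml_chart <version>
pull_zenml_chart() {
  helm pull "$ZENML_CHART_REF" --version "$1"
}
```

helm pull saves the archive as <chart-name>-<version>.tgz in the working directory, which is the file a later local install would point at.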
Alternatively, you can install directly from OCI (see Step 3 below).
Step 2: Create Helm Values File
Create a file zenml-hybrid-values.yaml with your configuration:
Minimum required settings:
the database credentials (zenml.database.url)
the URL (zenml.serverURL) and Ingress hostname (zenml.ingress.host) where the ZenML Hybrid workspace server will be reachable
the Pro configuration (zenml.pro.*) with your organization and workspace details
Additional relevant settings:
container registry credentials (imagePullSecrets), if your cluster cannot authenticate directly to the ZenML Pro container registry
custom CA certificates (zenml.certificates), especially important if the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority
HTTP proxy settings (zenml.proxy)
custom container image repository location (zenml.image.repository)
additional Ingress settings (zenml.ingress)
Kubernetes resources allocated to the pods (resources)
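The minimum settings listed above can be sketched as a starting-point values file. Only the key paths named in this guide are used; every value is a placeholder, the shape of the MySQL URL is an assumption, and the zenml.pro.* keys are left as a comment because their exact names come from the chart and from ZenML Support.

```shell
# Write a starting-point values file; replace every placeholder before deploying.
cat > zenml-hybrid-values.yaml <<'EOF'
zenml:
  # Public URL and Ingress hostname where the workspace server will be reachable.
  serverURL: https://zenml.mycompany.com
  ingress:
    host: zenml.mycompany.com
  database:
    # ASSUMPTION: URL shape; substitute your own MySQL user, password, host and database.
    url: "mysql://<user>:<password>@<host>:3306/<database>"
  # pro:
  #   Organization and workspace details go here (the chart's zenml.pro.* keys).
EOF
```

Because the file contains database credentials, keep it out of version control or inject the secrets at deploy time.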
Step 3: Deploy with Helm
Install the ZenML chart directly from OCI:
Or if you pulled the chart in Step 1, install from the local file:
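Both variants can be sketched with one helper, since Helm accepts either an OCI URL or a local archive as the chart argument. The release name zenml, the namespace zenml-hybrid and the function name are hypothetical choices, not values prescribed by this guide.

```shell
# Hypothetical helper: install the release, or upgrade it if it already exists.
# Usage: deploy_zenml <chart-ref> [extra helm arguments...]
#   <chart-ref> is an OCI URL or the path to a pulled .tgz archive.
deploy_zenml() {
  chart_ref="$1"
  shift
  helm upgrade --install zenml "$chart_ref" \
    --namespace zenml-hybrid --create-namespace \
    --values zenml-hybrid-values.yaml "$@"
}
```

For a local archive, call deploy_zenml ./zenml-<version>.tgz; for a direct OCI install, pass the chart's OCI URL followed by --version <version>. Using upgrade --install keeps the same command valid for later configuration changes.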
Monitor the deployment:
Wait for the pod to be running:
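A minimal sketch of both steps, assuming the release is named zenml in the namespace zenml-hybrid and that the chart applies the conventional app.kubernetes.io/instance label to its pods (check the labels on your pods if the selector matches nothing). The function name is hypothetical.

```shell
# Hypothetical helper: list the release's pods, then block until they are Ready.
# Usage: wait_for_zenml [namespace] [release]
wait_for_zenml() {
  namespace="${1:-zenml-hybrid}"
  release="${2:-zenml}"
  # ASSUMPTION: pods carry the standard Helm instance label.
  selector="app.kubernetes.io/instance=$release"

  kubectl get pods --namespace "$namespace" --selector "$selector"
  kubectl wait pod --namespace "$namespace" --selector "$selector" \
    --for=condition=Ready --timeout=300s
}
```

kubectl wait exits non-zero if the pods are not Ready within the timeout, which makes the helper usable in scripts.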
Step 4: Verify the Deployment
Check Service is Running
Verify Control Plane Connection
Look for messages indicating successful connection to the control plane.
Test HTTPS Connectivity
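One way to check this from a client machine is to request a lightweight endpoint and print only the HTTP status. This is a sketch: the /health path is an assumption about the server, and the function name is hypothetical.

```shell
# Hypothetical helper: print the HTTP status code returned over HTTPS.
# Usage: check_zenml_https [base-url]
check_zenml_https() {
  base_url="${1:-https://zenml.mycompany.com}"
  # ASSUMPTION: the server answers on /health; any known path would do.
  curl --silent --show-error --output /dev/null \
    --write-out '%{http_code}\n' "$base_url/health"
}
```

A printed status means the TLS handshake succeeded and the server answered; a certificate error from curl instead points at the Ingress or certificate setup.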
Access the Dashboard
Navigate to https://zenml.mycompany.com in your browser
You should be redirected to ZenML Cloud login
Sign in with your organization credentials
You should see your workspace listed
Step 5: Configure Workload Manager
The Workspace Server includes a workload manager that enables running pipelines directly from the ZenML Pro UI. This requires the workspace server to have access to a Kubernetes cluster where ad-hoc runner pods can be created.
Snapshots are only available from ZenML workspace server version 0.90.0 onwards.
1. Create Kubernetes Resources for Workload Manager
Create a dedicated namespace and service account:
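As a minimal sketch, the two resources can be declared in one manifest. The name zenml-workload-manager and the file name are placeholders chosen for illustration; any permissions the service account needs are not covered here.

```shell
# Write a manifest declaring the namespace and the service account.
# Both names are placeholders; reuse whatever you choose in later settings.
cat > workload-manager.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: zenml-workload-manager
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: zenml-workload-manager
  namespace: zenml-workload-manager
EOF
```

Apply it with kubectl apply -f workload-manager.yaml. Keeping the manifest in a file makes the setup repeatable across clusters.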
2. Configure Workload Manager in Helm Values
Add environment variables to your zenml-hybrid-values.yaml:
Option A: Kubernetes-based (Simplest)
Option B: AWS-based (if running on EKS)
Option C: GCP-based (if running on GKE)
3. Configure Pod Resources (Optional but Recommended)
4. Redeploy with Updated Values
Domain Name
You'll need an FQDN for the ZenML Hybrid workspace server.
FQDN Setup
Obtain a Fully Qualified Domain Name (FQDN) (e.g., zenml.mycompany.com) from your DNS provider.
Identify the external Load Balancer IP address of the Ingress controller using the command kubectl get svc -n <ingress-namespace>. Look for the EXTERNAL-IP field of the Load Balancer service.
Create a DNS A record (or CNAME for subdomains) pointing the FQDN to the Load Balancer IP. Example:
Host: zenml.mycompany.com
Type: A
Value: <Load Balancer IP>
Use a DNS propagation checker to confirm that the DNS record is resolving correctly.
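The same check can be scripted locally. The sketch below assumes dig is installed and that the record is an A record; the function name is hypothetical.

```shell
# Hypothetical helper: succeed only if <fqdn> resolves to <expected-ip>.
# Usage: dns_points_to <fqdn> <expected-ip>
dns_points_to() {
  # dig +short prints one address per line; grep -Fx requires an exact line match.
  dig +short "$1" A | grep -Fxq "$2"
}
```

For example, dns_points_to zenml.mycompany.com <Load Balancer IP> && echo "DNS is ready". Results depend on the resolver you query, so a recent change may take time to show up.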
Make sure you don't use a simple DNS prefix for the server (e.g. https://zenml.cluster is not recommended). Always use a fully qualified domain name (FQDN) (e.g. https://zenml.ml.cluster). The TLS certificates will not be accepted by some browsers otherwise (e.g. Chrome).
SSL Certificate
The ZenML Hybrid workspace server does not terminate SSL traffic. It is your responsibility to generate and configure the necessary SSL certificates for the workspace server.
Obtaining SSL Certificates
Acquire an SSL certificate for the domain. You can use:
A commercial SSL certificate provider (e.g., DigiCert, Sectigo).
Free services like Let's Encrypt for domain validation and issuance.
Self-signed certificates (not recommended for production environments). IMPORTANT: If you are using self-signed certificates, you will need to install the CA certificate on every client machine that connects to the workspace server.
Configuring SSL Termination
Once the SSL certificate is obtained, configure your load balancer or Ingress controller to terminate HTTPS traffic:
For NGINX Ingress Controller:
You can configure SSL termination globally for the NGINX Ingress Controller by setting up a default SSL certificate or configuring it at the ingress controller level, or you can specify SSL certificates when configuring the ingress in the ZenML server Helm values.
Here's how you can do it globally:
Create a TLS Secret
Store your SSL certificate and private key as a Kubernetes TLS secret in the namespace where the NGINX Ingress Controller is deployed.
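A minimal sketch of this step, assuming the certificate and key are available as PEM files. The default secret name zenml-tls and the function name are hypothetical.

```shell
# Hypothetical helper: store a certificate and private key as a TLS secret.
# Usage: create_tls_secret <ingress-namespace> <cert-file> <key-file> [secret-name]
create_tls_secret() {
  kubectl create secret tls "${4:-zenml-tls}" \
    --namespace "$1" --cert="$2" --key="$3"
}
```

For example, create_tls_secret <ingress-namespace> tls.crt tls.key. The certificate file should contain the full chain if your CA issues intermediates.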
Update NGINX Ingress Controller Configurations
Configure the NGINX Ingress Controller to use the default SSL certificate.
If using the NGINX Ingress Controller Helm chart, modify the values.yaml file or use --set during installation:
Or directly pass the argument during Helm installation or upgrade:
If the NGINX Ingress Controller was installed manually, edit its deployment to include the argument in the args section of the container:
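The original snippets are not reproduced here. For the Helm-chart route, a minimal values fragment might look like the sketch below, assuming the community ingress-nginx chart, which passes entries under controller.extraArgs to the controller as command-line arguments. The file name and both placeholders are illustrative.

```shell
# Write a values fragment for the ingress-nginx chart (assumed chart).
cat > ingress-nginx-values.yaml <<'EOF'
controller:
  extraArgs:
    # Format: <namespace>/<secret-name> of the TLS secret holding the certificate.
    default-ssl-certificate: "<ingress-namespace>/<tls-secret-name>"
EOF
```

The same setting on the command line is --set controller.extraArgs.default-ssl-certificate=<ingress-namespace>/<tls-secret-name>, and for a manually installed controller the container argument is --default-ssl-certificate=<ingress-namespace>/<tls-secret-name>.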
For Traefik:
Configure Traefik to use TLS by creating a certificate resolver for Let's Encrypt or specifying the certificates manually in the traefik.yml or values.yaml file. Example for Let's Encrypt:
Reference the domain in your IngressRoute or Middleware configuration.
If you used a custom CA certificate to sign the TLS certificates for the ZenML Hybrid workspace server, you will need to install the CA certificates on every client machine.
Configure Ingress in Helm Values
After setting up SSL termination at the ingress controller level, configure the ZenML Helm values to use ingress:
For NGINX:
For Traefik:
Database Backup Strategy (Optional)
ZenML supports backing up the database before migrations are performed. Configure the backup strategy in your values file:
Scaling & High Availability
Multiple Replicas
Horizontal Pod Autoscaler
Monitoring & Logging
Debug Logging
Enable verbose debug logging in the ZenML server:
Collecting Logs
View server logs with:
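A sketch of a log helper, assuming the release is named zenml in the namespace zenml-hybrid and that its pods carry the conventional app.kubernetes.io/instance label. The function name and the two environment variables are hypothetical.

```shell
# Hypothetical helper: show recent server logs; extra kubectl flags pass through.
# Usage: zenml_logs [extra kubectl logs arguments...]
zenml_logs() {
  # ASSUMPTION: pods carry the standard Helm instance label.
  kubectl logs --namespace "${ZENML_NAMESPACE:-zenml-hybrid}" \
    --selector "app.kubernetes.io/instance=${ZENML_RELEASE:-zenml}" \
    --tail=200 "$@"
}
```

Call zenml_logs --follow to stream logs, or zenml_logs --previous to inspect a container that has restarted.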
Updating the Deployment
Update Configuration
Modify zenml-hybrid-values.yaml
Upgrade with Helm:
Upgrade ZenML Version
Check available versions:
For the latest available ZenML Helm chart versions, visit: https://artifacthub.io/packages/helm/zenml/zenml
Update values file with new version
Upgrade:
Troubleshooting
Pod won't start
Uninstalling