Kubernetes with Helm

Deploy ZenML Pro Hybrid using Kubernetes and Helm charts.

This guide provides step-by-step instructions for deploying ZenML Pro in a Hybrid setup using Kubernetes and Helm charts. In this deployment model, the Workspace Server runs in your infrastructure while the Control Plane is managed by ZenML.

What you'll configure:

  • Workspace Server with database connection

  • Network connectivity to ZenML Control Plane

  • Workload manager for running pipelines from the UI

  • TLS/SSL certificates and domain name

Prerequisites

  • Kubernetes cluster (1.24+) - EKS, GKE, AKS, or self-managed

  • kubectl configured to access your cluster

  • helm CLI (3.0+) installed

  • A domain name and TLS certificate for your ZenML server

  • MySQL database (managed or self-hosted)

  • Outbound HTTPS access to cloudapi.zenml.io

Tools (on a machine with internet access for initial setup):

Before starting, complete the setup described in Hybrid Deployment Overview:

  • Step 1: Set up ZenML Pro organization

  • Step 2: Configure your infrastructure (database, networking, TLS)

  • Step 3: Obtain Pro credentials from ZenML Support

Step 1: Prepare Helm Chart and docker images

Pull Container Images

Access and pull from the ZenML Pro container registries:

  1. Authenticate to the ZenML Pro container registries (AWS ECR or GCP Artifact Registry)

    • Use the credentials that you provided to the ZenML Support to access the private zenml container registry

  2. Pull all required images:

    • Workspace Server image (AWS ECR):

      • 715803424590.dkr.ecr.eu-central-1.amazonaws.com/zenml-pro-server:<version>

    • Workspace Server image (GCP Artifact Registry):

      • europe-west3-docker.pkg.dev/zenml-cloud/zenml-pro/zenml-pro-server:<version>

    • Client image (for pipelines):

      • zenmldocker/zenml:<version>

    Example pull commands (AWS ECR):

    Example pull commands (GCP Artifact Registry):

Pull Helm chart

For OCI-based Helm charts, you can either pull the chart or install directly. To pull the chart first:

Alternatively, you can install directly from OCI (see Step 3 below).

Step 2: Create Helm Values File

Create a file zenml-hybrid-values.yaml with your configuration:

Minimum required settings:

  • the database credentials (zenml.database.url)

  • the URL (zenml.serverURL) and Ingress hostname (zenml.ingress.host) where the ZenML Hybrid workspace server will be reachable

  • the Pro configuration (zenml.pro.*) with your organization and workspace details

Additional relevant settings:

  • configure container registry credentials (imagePullSecrets) if your cluster cannot authenticate directly to the ZenML Pro container registry

  • injecting custom CA certificates (zenml.certificates), especially important if the TLS certificates used by the ZenML Pro services are signed by a custom Certificate Authority

  • configure HTTP proxy settings (zenml.proxy)

  • custom container image repository location (zenml.image.repository)

  • additional Ingress settings (zenml.ingress)

  • Kubernetes resources allocated to the pods (resources)

Step 3: Deploy with Helm

Install the ZenML chart directly from OCI:

Or if you pulled the chart in Step 1, install from the local file:

Monitor the deployment:

Wait for the pod to be running:

Step 4: Verify the Deployment

Check Service is Running

Verify Control Plane Connection

Look for messages indicating successful connection to the control plane.

Test HTTPS Connectivity

Access the Dashboard

  1. Navigate to https://zenml.mycompany.com in your browser

  2. You should be redirected to ZenML Cloud login

  3. Sign in with your organization credentials

  4. You should see your workspace listed

Step 5: Configure Workload Manager

The Workspace Server includes a workload manager that enables running pipelines directly from the ZenML Pro UI. This requires the workspace server to have access to a Kubernetes cluster where ad-hoc runner pods can be created.

circle-exclamation

1. Create Kubernetes Resources for Workload Manager

Create a dedicated namespace and service account:

2. Configure Workload Manager in Helm Values

Add environment variables to your zenml-hybrid-values.yaml:

Option A: Kubernetes-based (Simplest)

Option B: AWS-based (if running on EKS)

Option C: GCP-based (if running on GKE)

4. Redeploy with Updated Values

Domain Name

You'll need an FQDN for the ZenML Hybrid workspace server.

  • FQDN Setup Obtain a Fully Qualified Domain Name (FQDN) (e.g., zenml.mycompany.com) from your DNS provider.

    • Identify the external Load Balancer IP address of the Ingress controller using the command kubectl get svc -n <ingress-namespace>. Look for the EXTERNAL-IP field of the Load Balancer service.

    • Create a DNS A record (or CNAME for subdomains) pointing the FQDN to the Load Balancer IP. Example:

      • Host: zenml.mycompany.com

      • Type: A

      • Value: <Load Balancer IP>

    • Use a DNS propagation checker to confirm that the DNS record is resolving correctly.

circle-exclamation

SSL Certificate

The ZenML Hybrid workspace server does not terminate SSL traffic. It is your responsibility to generate and configure the necessary SSL certificates for the workspace server.

Obtaining SSL Certificates

Acquire an SSL certificate for the domain. You can use:

  • A commercial SSL certificate provider (e.g., DigiCert, Sectigo).

  • Free services like Let's Encryptarrow-up-right for domain validation and issuance.

  • Self-signed certificates (not recommended for production environments). IMPORTANT: If you are using self-signed certificates, you will need to install the CA certificate on every client machine that connects to the workspace server.

Configuring SSL Termination

Once the SSL certificate is obtained, configure your load balancer or Ingress controller to terminate HTTPS traffic:

For NGINX Ingress Controller:

You can configure SSL termination globally for the NGINX Ingress Controller by setting up a default SSL certificate or configuring it at the ingress controller level, or you can specify SSL certificates when configuring the ingress in the ZenML server Helm values.

Here's how you can do it globally:

  1. Create a TLS Secret

    Store your SSL certificate and private key as a Kubernetes TLS secret in the namespace where the NGINX Ingress Controller is deployed.

  2. Update NGINX Ingress Controller Configurations

    Configure the NGINX Ingress Controller to use the default SSL certificate.

    • If using the NGINX Ingress Controller Helm chart, modify the values.yaml file or use --set during installation:

      Or directly pass the argument during Helm installation or upgrade:

    • If the NGINX Ingress Controller was installed manually, edit its deployment to include the argument in the args section of the container:

For Traefik:

  • Configure Traefik to use TLS by creating a certificate resolver for Let's Encrypt or specifying the certificates manually in the traefik.yml or values.yaml file. Example for Let's Encrypt:

  • Reference the domain in your IngressRoute or Middleware configuration.

circle-exclamation

Configure Ingress in Helm Values

After setting up SSL termination at the ingress controller level, configure the ZenML Helm values to use ingress:

For NGINX:

For Traefik:

Database Backup Strategy (Optional)

ZenML supports backing up the database before migrations are performed. Configure the backup strategy in your values file:

circle-info

Local SQLite persistence (zenml.database.persistence) is only relevant when not using an external MySQL database. For hybrid deployments with external MySQL, configure backups at the database level.

Scaling & High Availability

Multiple Replicas

Horizontal Pod Autoscaler

Monitoring & Logging

Debug Logging

Enable verbose debug logging in the ZenML server:

Collecting Logs

View server logs with:

Updating the Deployment

Update Configuration

  1. Modify zenml-hybrid-values.yaml

  2. Upgrade with Helm:

Upgrade ZenML Version

  1. Check available versions:

For the latest available ZenML Helm chart versions, visit: https://artifacthub.io/packages/helm/zenml/zenml

  1. Update values file with new version

  2. Upgrade:

Troubleshooting

Pod won't start

Uninstalling

Next Steps

Last updated

Was this helpful?