Enable Resource Pools

Enable the ZenML Pro resource pool reconciler microservice for self-hosted workspace servers on Kubernetes.

Resource pools let you model shared capacity (GPUs, custom keys, and related limits) for dynamic pipelines. ZenML Pro keeps pool state consistent with a background reconciler process.

On self-hosted Kubernetes deployments, you enable that process by adding a microservice, the resource pool reconciler, which runs the command plugins start-resource-pool-reconciler from the same image as the workspace server.

Prerequisites

  • Resource pools apply to dynamic pipelines. Ensure your teams understand that contract before enabling the reconciler.

  • Enough cluster resources for one extra microservice (see the example resources below).

What to configure in Helm

The ZenML Helm chart deploys optional background processes as additional microservices, each declared under the workerDeployments key in your workspace values.yaml. Each map entry becomes its own Kubernetes Deployment.

Add the resource pool reconciler under workerDeployments next to your existing server: configuration. That microservice uses the same container image as the ZenML Pro server by default and overrides the entrypoint to run the reconciler.

Set SQLAlchemy pool sizes appropriate for a dedicated pod. The example below is a reasonable starting point; adjust resources and probes for your environment.

workerDeployments:
  resource-pool-reconciler:
    enabled: true
    replicaCount: 1
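    # Override the image entrypoint to start the reconciler instead of the server.
    command: ["plugins"]
    args: ["start-resource-pool-reconciler"]
    strategy:
      # Recreate stops the old pod before the new one starts.
      type: Recreate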
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    environment:
      # Keep the SQLAlchemy connection pool small for this dedicated pod.
      ZENML_STORE_POOL_SIZE: "1"
      ZENML_STORE_MAX_OVERFLOW: "1"
    livenessProbe:
      httpGet:
        path: /health
        port: 8000
      initialDelaySeconds: 30
      periodSeconds: 60
      timeoutSeconds: 2
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 30
      timeoutSeconds: 2
      failureThreshold: 3

Environment variables (reference)

| Variable | Purpose |
| --- | --- |
| ZENML_STORE_POOL_SIZE | SQLAlchemy pool size for store access in this microservice |
| ZENML_STORE_MAX_OVERFLOW | SQLAlchemy max overflow for the store connection pool |
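
To size these values, it can help to estimate the upper bound on database connections the reconciler may open. The sketch below assumes the two variables map directly onto the pool_size and max_overflow settings of SQLAlchemy's default queue pool, where one process holds at most pool size plus max overflow connections. The function name max_db_connections is illustrative and not part of ZenML.

```shell
# Upper bound on database connections for a worker deployment, assuming
# SQLAlchemy's default queue pool: replicas * (pool size + max overflow).
max_db_connections() {
  replicas="$1"
  pool_size="$2"
  max_overflow="$3"
  echo $(( replicas * (pool_size + max_overflow) ))
}

# Values from the example configuration: 1 replica, pool size 1, overflow 1.
max_db_connections 1 1 1
```

With the example values this gives two connections, which should leave the database's connection budget largely available to the workspace server.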

Apply the change

After updating your values file, upgrade the Helm release, substituting your own release name and namespace.
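
A minimal sketch of the upgrade step, using standard helm and kubectl commands. The release name, namespace, chart reference, and values file path are all placeholders (assumptions, not values from this guide), and the wrapper function upgrade_workspace is only there for illustration.

```shell
# All four values are placeholders; replace them with your own.
RELEASE="zenml-workspace"         # hypothetical release name
NAMESPACE="zenml"                 # hypothetical namespace
CHART="<your-chart-reference>"    # the chart you originally installed from
VALUES_FILE="values.yaml"         # the file containing workerDeployments

upgrade_workspace() {
  # Re-render the release with the updated values file.
  helm upgrade "$RELEASE" "$CHART" \
    --namespace "$NAMESPACE" \
    --values "$VALUES_FILE" || return 1
  # List Deployments so you can check that the reconciler was created.
  kubectl get deployments --namespace "$NAMESPACE"
}
```

Run upgrade_workspace once the placeholders are set. The exact name of the reconciler Deployment depends on how the chart names worker deployments, so look for a new entry alongside the existing server Deployment.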
