Using ZenML server in production

Learn about best practices for using ZenML server in production environments.

Setting up a ZenML server for testing is a quick process. However, most teams eventually need to move beyond these so-called 'day zero' operations, and at that point it helps to follow best practices for setting up your ZenML server in a production-ready way. This guide collects the tips and tricks we've learned ourselves and from working with people who use ZenML in production environments. Following are some of the best practices we recommend.

If you are using ZenML Pro, you don't have to worry about any of these. We have got you covered! You can sign up for a free trial here.

Autoscaling replicas

In production, you often have to run bigger, longer-running pipelines that might strain your server's resources. It is a good idea to set up autoscaling for your ZenML server so that you don't have to worry about pipeline runs getting interrupted or your dashboard slowing down due to high traffic.

How you do it depends greatly on the environment in which you have deployed your ZenML server. Below are some common deployment options and how to set up autoscaling for them.

If you are using the official ZenML Helm chart, you can take advantage of the autoscaling.enabled flag to enable autoscaling for your ZenML server. For example:

autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

This will create a horizontal pod autoscaler for your ZenML server that will scale the number of replicas up to 10 and down to 1 based on the CPU utilization of the pods.
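
Once enabled, you can verify that the horizontal pod autoscaler exists and watch it react to load. A minimal check, assuming the chart is installed in a namespace called zenml (adjust to your installation):

kubectl -n zenml get hpa
kubectl -n zenml describe hpa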

High connection pool values

Another way to improve the performance of your ZenML server is to increase the number of threads that your server process uses, provided that your hardware can support it.

You can control this by setting the zenml.threadPoolSize value in the ZenML Helm chart values. For example:

zenml:
  threadPoolSize: 100

By default, it is set to 40. If you are using any other deployment option, you can set the ZENML_SERVER_THREAD_POOL_SIZE environment variable to the desired value.
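
For example, if you run the server as a standalone container, a minimal sketch (the port mapping and image tag are illustrative):

docker run -d \
  -e ZENML_SERVER_THREAD_POOL_SIZE=100 \
  -p 8080:8080 \
  zenmldocker/zenml-server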

Once this is set, you should also modify the zenml.database.poolSize and zenml.database.maxOverflow values to ensure that the ZenML server workers do not block on database connections (i.e. the sum of the pool size and max overflow should be greater than or equal to the thread pool size). If you manage your own database, ensure these values are set appropriately.
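
For example, a sketch of values consistent with a thread pool size of 100 (the exact numbers should reflect your database's connection limits):

zenml:
  threadPoolSize: 100
  database:
    poolSize: 60
    maxOverflow: 60

Here, 60 + 60 = 120 connections are available in the worst case, which comfortably covers the 100 worker threads.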

Scaling the backing database

An important component of the ZenML server deployment is the backing database. When you start scaling your ZenML server instances, you will also need to scale your database to avoid any bottlenecks.

We recommend starting with a single database instance and monitoring it to decide whether it needs scaling. Some common metrics to watch:

  • CPU Utilization: If CPU utilization is consistently above 50%, you may need to scale your database. Occasional spikes in utilization are expected, but it should not be consistently high.

  • Freeable Memory: It is natural for freeable memory to decrease over time as your database uses it for caching and buffering, but if it drops below 100-200 MB, you may need to scale your database.
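
How you read these metrics depends on where your database runs. As a sketch, if you use Amazon RDS, you can pull CPU utilization from CloudWatch with the AWS CLI (the instance identifier and time window are placeholders; swap in FreeableMemory to check memory):

aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=<your-db-instance> \
  --statistics Average \
  --period 300 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T06:00:00Z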

Setting up an ingress/load balancer

Exposing your ZenML server to the internet securely and reliably is a must for production use cases. One way to do this is to set up an ingress/load balancer.

If you are using the official ZenML Helm chart, you can take advantage of the zenml.ingress.enabled flag to enable ingress for your ZenML server. For example:

zenml:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      # nginx.ingress.kubernetes.io/ssl-redirect: "true"
      # nginx.ingress.kubernetes.io/rewrite-target: /$1
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
      # cert-manager.io/cluster-issuer: "letsencrypt"

This will create an NGINX Ingress for your ZenML service; the ingress controller behind it provisions a LoadBalancer on whatever cloud provider you are using.
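
To serve the dashboard on your own domain with TLS, you can extend the same values. A sketch assuming cert-manager is installed and DNS points at the load balancer (the host name is a placeholder, and the exact keys may vary between chart versions):

zenml:
  ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt"
    host: zenml.example.com
    tls:
      enabled: true
      secretName: zenml-tls-certs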

Monitoring

Monitoring your service is crucial to ensure that it is running smoothly and to catch any issues early before they can cause problems. Depending on the deployment option you are using, you can use different tools to monitor your service.

You can set up Prometheus and Grafana to monitor your ZenML server. We recommend using the kube-prometheus-stack Helm chart from the prometheus-community to get started quickly.
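
A minimal installation, with illustrative release and namespace names:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace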

Once you have deployed the chart, you can find the Grafana service by listing the services in the namespace where you deployed the chart. Port-forward it to your local machine or expose it through an ingress.
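
For example, assuming the release name monitoring from the installation above (kube-prometheus-stack names the service <release>-grafana and exposes it on port 80):

kubectl -n monitoring get svc
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80

Grafana is then available at http://localhost:3000.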

You can now use queries like the following to monitor your ZenML server:

sum by(namespace) (rate(container_cpu_usage_seconds_total{namespace=~"zenml.*"}[5m]))

This query gives you the CPU utilization of your server pods in all namespaces that start with zenml. The image below shows what this query looks like in Grafana.
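
A similar query, assuming the same namespace convention, shows the memory working set of those pods:

sum by(namespace) (container_memory_working_set_bytes{namespace=~"zenml.*"})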

Backups

The data in your ZenML server is critical as it contains your pipeline runs, stack configurations, and other important information. It is, therefore, recommended to have a backup strategy in place to avoid losing any data.

Some common strategies include:

  • Setting up automated backups with a good retention period (say 30 days).

  • Periodically exporting the data to external storage (e.g. S3 or GCS).

  • Manual backups before upgrading your server to avoid any problems (see the sketch below).
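
As a sketch of a manual export, assuming a MySQL backing database (the connection details are placeholders; the database name zenml is the Helm chart default):

mysqldump --host=<db-host> --user=<db-user> -p zenml > zenml-backup-$(date +%F).sql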
