Using ZenML server in production
Learn about best practices for using ZenML server in production environments.
Setting up a ZenML server for testing is a quick process. However, most teams eventually have to move beyond so-called 'day zero' operations, and at that point it helps to follow best practices for setting up your ZenML server in a production-ready way. This guide encapsulates the tips and tricks we've learned ourselves and from working with people who run ZenML in production environments. Following are some of the best practices we recommend.
If you are using ZenML Pro, you don't have to worry about any of these. We have got you covered! You can sign up for a free trial here.
In production, you often have to run bigger and longer-running pipelines that can strain your server's resources. It is a good idea to set up autoscaling for your ZenML server so that you don't have to worry about pipeline runs getting interrupted or your dashboard slowing down due to high traffic.
How you do it depends greatly on the environment in which you have deployed your ZenML server. Below are some common deployment options and how to set up autoscaling for them.
If you are using the official ZenML Helm chart, you can take advantage of the `autoscaling.enabled` flag to enable autoscaling for your ZenML server. For example:
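A minimal `values.yaml` sketch might look like the following (the key names follow the official chart's conventions; verify them against the chart version you deploy):

```yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```

Apply the values with `helm upgrade --install` as you would any other chart override.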
This will create a horizontal pod autoscaler for your ZenML server that will scale the number of replicas up to 10 and down to 1 based on the CPU utilization of the pods.
One other way to improve the performance of your ZenML server is to increase the number of threads that your server process uses, provided that you have hardware that can support it.
You can control this by setting the `zenml.threadPoolSize` value in the ZenML Helm chart values. For example:
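A `values.yaml` override along these lines (the value 100 is only illustrative; size it to your hardware):

```yaml
zenml:
  threadPoolSize: 100
```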
By default, it is set to 40. If you are using any other deployment option, you can set the `ZENML_SERVER_THREAD_POOL_SIZE` environment variable to the desired value.
Once this is set, you should also modify the `zenml.database.poolSize` and `zenml.database.maxOverflow` values to ensure that the ZenML server workers do not block on database connections: the sum of the pool size and max overflow should be greater than or equal to the thread pool size. If you manage your own database, ensure these values are set appropriately.
An important component of the ZenML server deployment is the backing database. When you start scaling your ZenML server instances, you will also need to scale your database to avoid any bottlenecks.
We would recommend starting out with a simple (single) database instance and then monitoring it to decide if it needs scaling. Some common metrics to look out for:
CPU Utilization: If the CPU utilization is consistently above 50%, you may need to scale your database. Occasional spikes in utilization are expected, but it should not be consistently high.
Freeable Memory: It is natural for the freeable memory to go down with time as your database uses it for caching and buffering but if it drops below 100-200 MB, you may need to scale your database.
Exposing your ZenML server to the internet securely and reliably is a must for production use cases. One way to do this is to set up an ingress/load balancer.
If you are using the official ZenML Helm chart, you can take advantage of the `zenml.ingress.enabled` flag to enable ingress for your ZenML server. For example:
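A minimal sketch, assuming an NGINX ingress controller and a hypothetical hostname (check the exact keys against your chart version):

```yaml
zenml:
  ingress:
    enabled: true
    className: nginx
    host: zenml.example.com
```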
This will create an NGINX ingress for your ZenML service, which in turn provisions a LoadBalancer on whatever cloud provider you are using.
Monitoring your service is crucial to ensure that it is running smoothly and to catch any issues early before they can cause problems. Depending on the deployment option you are using, you can use different tools to monitor your service.
You can set up Prometheus and Grafana to monitor your ZenML server. We recommend using the `kube-prometheus-stack` Helm chart from the prometheus-community repository to get started quickly.
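If you don't have the chart installed yet, the standard installation commands are:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```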
Once you have deployed the chart, you can find your grafana service by searching for services in the namespace you have deployed the chart in. Port-forward it to your local machine or deploy it through an ingress.
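For example, assuming you installed the chart with the release name `kube-prometheus-stack` in the `monitoring` namespace (adjust the service name to match your release), you can port-forward Grafana like this:

```shell
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
```

Grafana is then reachable at http://localhost:3000.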
You can now use queries like the following to monitor your ZenML server:
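One possible PromQL query, assuming the default container metrics shipped by `kube-prometheus-stack` are available in your cluster:

```promql
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace=~"zenml.*"}[5m])
)
```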
This query gives you the CPU utilization of your server pods in all namespaces that start with `zenml`. The image below shows how this query looks in Grafana.
The data in your ZenML server is critical as it contains your pipeline runs, stack configurations, and other important information. It is, therefore, recommended to have a backup strategy in place to avoid losing any data.
Some common strategies include:
Setting up automated backups with a good retention period (say 30 days).
Periodically exporting the data to an external storage (e.g. S3, GCS, etc.).
Manual backups before upgrading your server to avoid any problems.
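As an illustration of the last point, assuming your server is backed by a MySQL database named `zenml` (substitute your actual host, user, and database name), a manual dump before an upgrade could look like:

```shell
# Hypothetical example: dump the "zenml" MySQL database to a timestamped file
mysqldump --host=<db-host> --user=<db-user> -p --single-transaction zenml \
  > zenml-backup-$(date +%F).sql
```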