Kubernetes Orchestrator
Orchestrating your pipelines to run on Kubernetes clusters.
This Kubernetes-native orchestrator is a minimalist, lightweight alternative to other distributed orchestrators like Airflow or Kubeflow.
Overall, the Kubernetes orchestrator is quite similar to the Kubeflow orchestrator in that it runs each pipeline step in a separate Kubernetes pod. However, the orchestration of the different pods is not done by Kubeflow but by a separate master pod that orchestrates the step execution via topological sort.
Compared to Kubeflow, this means that the Kubernetes-native orchestrator is faster and much simpler to start with since you do not need to install and maintain Kubeflow on your cluster. The Kubernetes-native orchestrator is an ideal choice for teams new to distributed orchestration that do not want to go with a fully-managed offering.
However, since Kubeflow is much more mature, you should, in most cases, aim to move your pipelines to Kubeflow in the long run. A smooth way to production-grade orchestration could be to set up a Kubernetes cluster first and get started with the Kubernetes-native orchestrator. If needed, you can then install and set up Kubeflow later and simply switch out the orchestrator of your stack as soon as your full setup is ready.
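To make the "orchestrates the step execution via topological sort" idea concrete, here is a minimal sketch using Python's standard-library graphlib module. It is not the orchestrator's actual implementation: the step names and the execution_waves helper are hypothetical, and the sketch only shows how a dependency graph can be turned into an order in which steps (and therefore pods) could be started.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline DAG: each step maps to the set of steps it depends on.
step_dependencies = {
    "load_data": set(),
    "load_labels": set(),
    "train": {"load_data", "load_labels"},
    "evaluate": {"train"},
}


def execution_waves(dependencies):
    """Group steps into waves; every step in a wave has all upstream steps done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        # Steps whose dependencies are all marked done; sorted for stable output.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # In a real orchestrator, this is where one pod per step would run.
        sorter.done(*ready)
    return waves


for index, wave in enumerate(execution_waves(step_dependencies)):
    print(f"wave {index}: {wave}")
```

Steps that land in the same wave have no dependency on each other, so in principle they could run in parallel pods.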
When to use it
You should use the Kubernetes orchestrator if:
you're looking for a lightweight way of running your pipelines on Kubernetes.
you don't need a UI to list all your pipeline runs.
How to deploy it
The Kubernetes orchestrator requires a Kubernetes cluster in order to run. There are many ways to deploy a Kubernetes cluster using different cloud providers or on your custom infrastructure, and we can't possibly cover all of them, but you can check out our cloud guide.
Infrastructure Deployment
A Kubernetes orchestrator can be deployed directly from the ZenML CLI:
You can pass other configurations specific to the stack components as key-value arguments. If you don't provide a name, a random one is generated for you. For more information about how to use the CLI for this, please refer to the dedicated documentation section.
How to use it
To use the Kubernetes orchestrator, we need:
The ZenML kubernetes integration installed. If you haven't done so, run zenml integration install kubernetes
We can then register the orchestrator and use it in our active stack. This can be done in two ways:
You can now run any ZenML pipeline using the Kubernetes orchestrator:
If all went well, you should now see the logs of all Kubernetes pods in your terminal, and when running kubectl get pods -n zenml, you should also see that a pod was created in your cluster for each pipeline step.
Interacting with pods via kubectl
For debugging, it can sometimes be handy to interact with the Kubernetes pods directly via kubectl. To make this easier, we have added the following labels to all pods:
run: the name of the ZenML run.
pipeline: the name of the ZenML pipeline associated with this run.
E.g., you can use these labels to manually delete all pods related to a specific pipeline:
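As an illustration of selecting pods by these labels, the following sketch builds a kubectl command line in Python. The run and pipeline labels and the zenml namespace come from the text above; the helper name delete_pods_command and the pipeline name my_example_pipeline are hypothetical. The sketch only prints the command and does not execute it.

```python
import shlex

# Labels that the text above says are attached to every pod.
KNOWN_LABELS = {"run", "pipeline"}


def delete_pods_command(label, value, namespace="zenml"):
    """Build a kubectl command that deletes all pods carrying label=value."""
    if label not in KNOWN_LABELS:
        raise ValueError(f"unknown label {label!r}, expected one of {sorted(KNOWN_LABELS)}")
    args = ["kubectl", "delete", "pod", "-n", namespace, "-l", f"{label}={value}"]
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


# "my_example_pipeline" is a placeholder; substitute your own pipeline name.
print(delete_pods_command("pipeline", "my_example_pipeline"))
```

Swapping delete for get in the argument list gives a read-only variant that is safer to try first.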
Additional configuration
The Kubernetes orchestrator will by default use a Kubernetes namespace called zenml to run pipelines. In that namespace, it will automatically create a Kubernetes service account called zenml-service-account and grant it the edit RBAC role in that namespace. To customize these settings, you can configure the following additional attributes in the Kubernetes orchestrator:
kubernetes_namespace: The Kubernetes namespace to use for running the pipelines. The namespace must already exist in the Kubernetes cluster.
service_account_name: The name of a Kubernetes service account to use for running the pipelines. If configured, it must point to an existing service account in the default or configured namespace that has associated RBAC roles granting permissions to create and manage pods in that namespace. This can also be configured as an individual pipeline setting in addition to the global orchestrator setting.
For additional configuration of the Kubernetes orchestrator, you can pass KubernetesOrchestratorSettings, which allows you to configure (among others) the following attributes:
pod_settings: Node selectors, affinity, and tolerations to apply to the Kubernetes Pods running your pipeline. These can be either specified using the Kubernetes model objects or as dictionaries.
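A rough sketch of what the dictionary form of pod_settings could look like is shown below. It uses only plain Python dictionaries, so it runs without ZenML installed. The top-level key names (node_selectors, affinity, tolerations) are assumptions based on the attribute description above, the nested affinity and toleration fields follow the Kubernetes pod spec, and every label key and value is a hypothetical placeholder.

```python
import json

# Assumption: top-level keys mirror the "node selectors, affinity, and
# tolerations" named in the text. All labels and values are placeholders.
pod_settings = {
    # Only schedule onto nodes carrying this label.
    "node_selectors": {"example.com/node-pool": "cpu-pool"},
    # Node affinity, written in the shape of the Kubernetes pod spec.
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "example.com/node-pool",
                                "operator": "In",
                                "values": ["cpu-pool"],
                            }
                        ]
                    }
                ]
            }
        }
    },
    # Allow scheduling onto nodes tainted for dedicated pipeline use.
    "tolerations": [
        {
            "key": "example.com/dedicated",
            "operator": "Equal",
            "value": "pipelines",
            "effect": "NoSchedule",
        }
    ],
}

# Print the settings to check that they are plain, serializable data.
print(json.dumps(pod_settings, indent=2))
```

Under these assumptions, the dictionary would be handed to KubernetesOrchestratorSettings as its pod_settings attribute; check the integration's reference documentation for the exact field names before relying on them.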
Enabling CUDA for GPU-backed hardware