Kubeflow Orchestrator
Orchestrating your pipelines to run on Kubeflow.
When to use it
You should use the Kubeflow orchestrator if:
* you're looking for a proven production-grade orchestrator.
* you're looking for a UI in which you can track your pipeline runs.
* you're already using Kubernetes or are not afraid of setting up and maintaining a Kubernetes cluster.
* you're willing to deploy and maintain Kubeflow Pipelines on your cluster.
How to deploy it
To run ZenML pipelines on Kubeflow, you'll need to set up a Kubernetes cluster and deploy Kubeflow Pipelines on it. This can be done in a variety of ways, depending on whether you want to use a cloud provider or your own infrastructure:
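Whichever route you choose, the Kubeflow Pipelines installation itself typically boils down to applying the upstream Kustomize manifests to your cluster. A minimal sketch, assuming a standalone deployment and a release version that you pick yourself:

```shell
# Standalone Kubeflow Pipelines deployment onto an existing Kubernetes cluster.
# The version below is only an example -- pick the release you want to install.
export PIPELINE_VERSION=2.0.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
```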
If one or more of the deployments are not in the `Running` state, try increasing the number of nodes in your cluster.
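You can check the deployment status with `kubectl` (the `kubeflow` namespace is the default for a standard installation; adjust if yours differs):

```shell
kubectl -n kubeflow get deployments
```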
If you're installing Kubeflow Pipelines manually, make sure the Kubernetes service is called exactly `ml-pipeline`. This is a requirement for ZenML to connect to your Kubeflow Pipelines deployment.
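A quick way to verify this, again assuming the `kubeflow` namespace:

```shell
kubectl -n kubeflow get svc ml-pipeline
```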
Infrastructure Deployment
A Kubeflow orchestrator can be deployed directly from the ZenML CLI:
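A sketch of that command, assuming your ZenML version ships the stack-component deployment CLI (the orchestrator name and provider are placeholders):

```shell
zenml orchestrator deploy my_kubeflow_orchestrator --flavor=kubeflow --provider=gcp
```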
You can pass other configurations specific to the stack components as key-value arguments. If you don't provide a name, a random one is generated for you. For more information about how to use the CLI for this, please refer to the dedicated documentation section.
How to use it
To use the Kubeflow orchestrator, we need:
* The ZenML `kubeflow` integration installed. If you haven't done so, run the install command shown below.
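Installing the integration pulls in the Kubeflow Pipelines client that ZenML needs:

```shell
zenml integration install kubeflow
```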
We can then register the orchestrator and use it in our active stack. This can be done in two ways:
The following example demonstrates how to register the orchestrator and connect it to a remote Kubernetes cluster using a Service Connector:
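A sketch of those steps (the orchestrator, connector, and stack names are placeholders, and the stack will need other components such as an artifact store as well):

```shell
# Register the orchestrator with the Kubeflow flavor
zenml orchestrator register <ORCHESTRATOR_NAME> --flavor kubeflow

# Connect it to a remote Kubernetes cluster via an existing Service Connector
zenml orchestrator connect <ORCHESTRATOR_NAME> --connector <CONNECTOR_NAME>

# Register and activate a stack containing the new orchestrator
zenml stack register <STACK_NAME> -o <ORCHESTRATOR_NAME> ... --set
```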
You can now run any ZenML pipeline using the Kubeflow orchestrator:
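For example, by executing the Python file that defines and triggers your pipeline (the filename is a placeholder):

```shell
python file_that_runs_a_zenml_pipeline.py
```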
Kubeflow UI
Kubeflow comes with its own UI that you can use to find further details about your pipeline runs, such as the logs of your steps. For any runs executed on Kubeflow, you can get the URL to the Kubeflow UI in Python using the following code snippet:
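A sketch of that lookup (replace the run name with one of your own; how the metadata value is accessed may vary slightly between ZenML versions):

```python
from zenml.client import Client

# Fetch the pipeline run by name and read the orchestrator URL from its metadata
pipeline_run = Client().get_pipeline_run("<PIPELINE_RUN_NAME>")
url = pipeline_run.run_metadata["orchestrator_url"]

# Depending on the ZenML version, the entry is either a plain value or an
# object exposing a `.value` attribute.
print(getattr(url, "value", url))
```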
Additional configuration
For additional configuration of the Kubeflow orchestrator, you can pass `KubeflowOrchestratorSettings`, which allow you to configure (among others) the following attributes, as shown in the example after this list:
* `client_args`: Arguments to pass when initializing the KFP client.
* `user_namespace`: The user namespace to use when creating experiments and runs.
* `pod_settings`: Node selectors, affinity, and tolerations to apply to the Kubernetes Pods running your pipeline. These can be either specified using the Kubernetes model objects or as dictionaries.
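A sketch of how these settings can be attached to a pipeline; the import path and the `"orchestrator.kubeflow"` settings key follow the usual ZenML conventions, and the namespace, node selector terms, and toleration values are placeholders:

```python
from kubernetes.client.models import V1Toleration

from zenml import pipeline
from zenml.integrations.kubeflow.flavors.kubeflow_orchestrator_flavor import (
    KubeflowOrchestratorSettings,
)

kubeflow_settings = KubeflowOrchestratorSettings(
    client_args={},
    user_namespace="my_namespace",  # placeholder namespace
    pod_settings={
        # Affinity specified as a plain dictionary...
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "node.kubernetes.io/name",
                                    "operator": "In",
                                    "values": ["my_powerful_node_group"],
                                }
                            ]
                        }
                    ]
                }
            }
        },
        # ...and a toleration using the Kubernetes model objects.
        "tolerations": [
            V1Toleration(
                key="node.kubernetes.io/name",
                operator="Equal",
                value="",
                effect="NoSchedule",
            )
        ],
    },
)


@pipeline(settings={"orchestrator.kubeflow": kubeflow_settings})
def my_pipeline():
    ...
```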
Enabling CUDA for GPU-backed hardware
If you want this orchestrator to run steps on a GPU, additional settings are required to enable CUDA so that the hardware is actually used for acceleration; see the dedicated ZenML documentation section on GPU-backed hardware for the details.
Important Note for Multi-Tenancy Deployments
Using the ZenML Kubeflow orchestrator on a multi-tenant deployment without any additional settings will result in an error, because a multi-tenant Kubeflow Pipelines installation requires every run to be scoped to a user namespace and authenticated.
To get it to work, we need to leverage the `KubeflowOrchestratorSettings` referenced above: by setting the namespace option and passing the right authentication credentials to the Kubeflow Pipelines client, the orchestrator can talk to the multi-tenant deployment.
First, when registering your Kubeflow orchestrator, please make sure to include the `kubeflow_hostname` parameter. The `kubeflow_hostname` must end with the `/pipeline` suffix.
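A sketch of such a registration (the orchestrator name and hostname are placeholders):

```shell
zenml orchestrator register <ORCHESTRATOR_NAME> \
    --flavor=kubeflow \
    --kubeflow_hostname=https://mykubeflow.example.com/pipeline  # must end with /pipeline
```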
Then, ensure that you pass the right settings before triggering a pipeline run. The following snippet will prove useful:
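A sketch of those settings, assuming username/password authentication against the multi-tenant deployment (credentials and namespace are placeholders):

```python
from zenml import pipeline
from zenml.integrations.kubeflow.flavors.kubeflow_orchestrator_flavor import (
    KubeflowOrchestratorSettings,
)

kubeflow_settings = KubeflowOrchestratorSettings(
    client_username="admin",          # placeholder credentials
    client_password="abc123",
    user_namespace="namespace_name",  # your Kubeflow user/profile namespace
)


@pipeline(settings={"orchestrator.kubeflow": kubeflow_settings})
def my_pipeline():
    ...
```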
Using secrets in settings
The above example encoded the username and password in plain text as settings. You can also set them as secrets.
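For example, by creating a ZenML secret that holds the credentials (the secret name and values are placeholders):

```shell
zenml secret create kubeflow_secret \
    --username=admin \
    --password=abc123
```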
And then you can use them in code:
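The settings can then reference the secret values using ZenML's secret reference syntax (the secret name matches the one created above):

```python
from zenml.integrations.kubeflow.flavors.kubeflow_orchestrator_flavor import (
    KubeflowOrchestratorSettings,
)

kubeflow_settings = KubeflowOrchestratorSettings(
    client_username="{{kubeflow_secret.username}}",  # resolved from the secret at runtime
    client_password="{{kubeflow_secret.password}}",
    user_namespace="namespace_name",
)
```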