Kubeflow Orchestrator
Orchestrating your pipelines to run on Kubeflow.
The Kubeflow orchestrator is an orchestrator flavor provided by the ZenML kubeflow integration that uses Kubeflow Pipelines to run your pipelines.
This component is only meant to be used within the context of a remote ZenML deployment. Usage with a local ZenML deployment may lead to unexpected behavior!
You should use the Kubeflow orchestrator if:
you're looking for a proven production-grade orchestrator.
you're looking for a UI in which you can track your pipeline runs.
you're already using Kubernetes or are not afraid of setting up and maintaining a Kubernetes cluster.
you're willing to deploy and maintain Kubeflow Pipelines on your cluster.
To run ZenML pipelines on Kubeflow, you'll need to set up a Kubernetes cluster and deploy Kubeflow Pipelines on it. This can be done in a variety of ways, depending on whether you want to use a cloud provider or your own infrastructure:
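Have an existing AWS EKS cluster set up.
Make sure you have the AWS CLI set up.
Download and install kubectl and configure it to talk to your EKS cluster using aws eks update-kubeconfig --region <REGION> --name <CLUSTER_NAME>, where both placeholders stand for your own values.
Install Kubeflow Pipelines onto your cluster.
(optional) Set up a ZenML Service Connector to grant ZenML Stack Components easy and secure access to the remote EKS cluster.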
If you're installing Kubeflow Pipelines manually, make sure the Kubernetes service is called exactly ml-pipeline. This is a requirement for ZenML to connect to your Kubeflow Pipelines deployment.
To use the Kubeflow orchestrator, we need:
The ZenML kubeflow integration installed. If you haven't done so, run zenml integration install kubeflow.
We can then register the orchestrator and use it in our active stack. This can be done in two ways:
The following example demonstrates how to register the orchestrator and connect it to a remote Kubernetes cluster using a Service Connector:
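As a rough sketch of that flow, the Python helper below only assembles the ZenML CLI calls as argument lists, so they can be reviewed first or handed to subprocess.run. The function name and every component name are hypothetical placeholders, and the exact flags may differ between ZenML versions, so treat this as an assumption to verify against zenml orchestrator --help rather than as the canonical procedure.

```python
import shlex
from typing import List


def connector_registration_commands(
    orchestrator: str,
    connector: str,
    stack: str,
    artifact_store: str,
    container_registry: str,
) -> List[List[str]]:
    """Return the CLI calls that register and connect a Kubeflow orchestrator.

    Hypothetical helper: all names are placeholders chosen by the caller.
    """
    return [
        # Register an orchestrator of the kubeflow flavor.
        ["zenml", "orchestrator", "register", orchestrator, "--flavor", "kubeflow"],
        # Connect it to an already registered Service Connector.
        ["zenml", "orchestrator", "connect", orchestrator, "--connector", connector],
        # Put it into a stack and make that stack the active one.
        [
            "zenml", "stack", "register", stack,
            "-o", orchestrator,
            "-a", artifact_store,
            "-c", container_registry,
            "--set",
        ],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for command in connector_registration_commands(
        "kubeflow_orchestrator",
        "eks_connector",
        "kubeflow_stack",
        "s3_store",
        "ecr_registry",
    ):
        print(shlex.join(command))
```

Printing the commands first keeps the sketch free of side effects; each list can be passed to subprocess.run(command, check=True) once the names match your own setup.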
You can now run any ZenML pipeline using the Kubeflow orchestrator by executing the Python script that runs it.
Kubeflow comes with its own UI that you can use to find further details about your pipeline runs, such as the logs of your steps. For any runs executed on Kubeflow, you can get the URL to the Kubeflow UI in Python using the following code snippet:
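A minimal sketch of such a lookup is shown here. It assumes the URL is stored in the run metadata under the key orchestrator_url, and that the metadata entry is either a plain value or an object with a value attribute, which has varied across ZenML versions. The helper names are hypothetical, and the ZenML import is deferred so the first function can be tried without ZenML installed.

```python
from typing import Any, Mapping, Optional


def orchestrator_url_from_metadata(run_metadata: Mapping[str, Any]) -> Optional[str]:
    """Extract the Kubeflow UI URL from a run's metadata mapping, if present."""
    # Assumption: the URL is stored under the "orchestrator_url" key.
    entry = run_metadata.get("orchestrator_url")
    if entry is None:
        return None
    # Some ZenML versions wrap metadata in an object exposing `.value`,
    # others store the plain value; handle both cases.
    return str(getattr(entry, "value", entry))


def kubeflow_ui_url(run_name: str) -> Optional[str]:
    """Look up a pipeline run by name and return its Kubeflow UI URL."""
    # Imported lazily so the helper above works without ZenML installed.
    from zenml.client import Client

    run = Client().get_pipeline_run(run_name)
    return orchestrator_url_from_metadata(run.run_metadata)
```

Calling kubeflow_ui_url with the name of a run that was executed on Kubeflow should return the link to that run in the Kubeflow UI, or None if no URL was recorded.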
For additional configuration of the Kubeflow orchestrator, you can pass KubeflowOrchestratorSettings, which allows you to configure (among others) the following attributes:
client_args: Arguments to pass when initializing the KFP client.
user_namespace: The user namespace to use when creating experiments and runs.
pod_settings: Node selectors, affinity, and tolerations to apply to the Kubernetes Pods running your pipeline. These can be specified either using the Kubernetes model objects or as dictionaries.
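The attributes above can be sketched as a plain dictionary, which keeps the example runnable without ZenML installed. The function name, the node label, and the taint key are hypothetical. Two further assumptions are that pod_settings accepts the keys node_selectors and tolerations in dictionary form, and that the dictionary can be passed under the settings key orchestrator.kubeflow. Both should be checked against the SDK docs for your ZenML version.

```python
from typing import Any, Dict


def build_kubeflow_settings(user_namespace: str, node_pool: str) -> Dict[str, Any]:
    """Build a settings dictionary mirroring KubeflowOrchestratorSettings fields."""
    return {
        # Extra keyword arguments for the KFP client (none in this sketch).
        "client_args": {},
        # Namespace used when creating experiments and runs.
        "user_namespace": user_namespace,
        # Pod scheduling options, given as plain dictionaries.
        "pod_settings": {
            "node_selectors": {"pool": node_pool},  # hypothetical node label
            "tolerations": [
                {
                    "key": "dedicated",  # hypothetical taint key
                    "operator": "Equal",
                    "value": node_pool,
                    "effect": "NoSchedule",
                }
            ],
        },
    }


settings = build_kubeflow_settings("my-namespace", "ml-workers")

# With ZenML installed, the dictionary could be attached to a pipeline, e.g.:
#
#   from zenml import pipeline
#
#   @pipeline(settings={"orchestrator.kubeflow": settings})
#   def my_pipeline():
#       ...
```

The same values can also be supplied through a KubeflowOrchestratorSettings object instead of a dictionary.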
Using the ZenML Kubeflow orchestrator on a multi-tenant deployment without any settings will result in an error.
In order to get it to work, we need to leverage the KubeflowOrchestratorSettings referenced above. By setting the namespace option, and by passing in the right authentication credentials to the Kubeflow Pipelines Client, we can make it work.
First, when registering your Kubeflow orchestrator, please make sure to include the kubeflow_hostname parameter. The kubeflow_hostname must end with the /pipeline suffix.
Then, ensure that you pass the right settings before triggering a pipeline run. The following snippet will prove useful:
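A minimal sketch of such settings, again as a plain dictionary so it runs without ZenML installed. It assumes that KubeflowOrchestratorSettings exposes client_username and client_password alongside user_namespace, and that ZenML uses them to authenticate the Kubeflow Pipelines client. The helper name and all three values are hypothetical placeholders.

```python
from typing import Dict


def multi_tenant_settings(username: str, password: str, namespace: str) -> Dict[str, str]:
    """Collect the credentials and namespace for a multi-tenant Kubeflow."""
    return {
        # Assumed field names on KubeflowOrchestratorSettings.
        "client_username": username,
        "client_password": password,
        # The user namespace of the Kubeflow profile you want to use.
        "user_namespace": namespace,
    }


# Placeholder values for illustration only.
NAMESPACE = "namespace_name"
USERNAME = "admin"
PASSWORD = "abc123"

kubeflow_settings = multi_tenant_settings(USERNAME, PASSWORD, NAMESPACE)

# With ZenML installed, these settings could be attached to a pipeline, e.g.:
#
#   @pipeline(settings={"orchestrator.kubeflow": kubeflow_settings})
#   def my_pipeline():
#       ...
```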
The above example encoded the username and password in plain text as settings. You can also set them as secrets.
And then you can use them in code:
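As a sketch, assume a secret named kubeflow_secret with the keys username and password has already been created, for example with zenml secret create kubeflow_secret --username=<USERNAME> --password=<PASSWORD>. The settings can then reference the secret values using ZenML's {{secret_name.key}} syntax instead of containing them directly. The secret name, its keys, the helper function, and the client_username and client_password fields are assumptions made for illustration.

```python
def secret_ref(secret_name: str, key: str) -> str:
    """Format a ZenML secret reference of the form {{secret_name.key}}."""
    return "{{" + secret_name + "." + key + "}}"


kubeflow_settings = {
    # Resolved by ZenML at runtime from the (hypothetical) secret.
    "client_username": secret_ref("kubeflow_secret", "username"),
    "client_password": secret_ref("kubeflow_secret", "password"),
    "user_namespace": "namespace_name",  # placeholder namespace
}
```

This keeps the credentials out of the source code while leaving the rest of the pipeline configuration unchanged.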
Download and install kubectl and configure it to talk to your Kubernetes cluster.
Install Kubeflow Pipelines onto your cluster.
(optional) Set up a ZenML Service Connector to grant ZenML Stack Components easy and secure access to the remote Kubernetes cluster. This is especially useful if your Kubernetes cluster is remotely accessible, as this enables other ZenML users to use it to run pipelines without needing to configure and set up kubectl on their local machines.
A Kubernetes cluster with Kubeflow Pipelines installed. See the deployment instructions for more information.
A ZenML server deployed remotely where it can be accessed from the Kubernetes cluster. See the ZenML deployment guide for more information.
Docker installed and running (unless you are using a remote Image Builder in your ZenML stack).
kubectl installed (optional, see below)
If you are using a single-tenant Kubeflow installed in a Kubernetes cluster managed by a cloud provider like AWS, GCP or Azure, it is recommended that you set up a Service Connector and use it to connect ZenML Stack Components to the remote Kubernetes cluster. This makes your Stack portable to other environments and keeps your pipelines reproducible.
The name of your Kubernetes context which points to your remote cluster. Run kubectl config get-contexts to see a list of available contexts. NOTE: this is no longer required if you are using a Service Connector to connect your Kubeflow Orchestrator Stack Component to the remote Kubernetes cluster.
A remote artifact store as part of your stack.
A remote container registry as part of your stack.
If you have configured a Service Connector to access the remote Kubernetes cluster, you no longer need to set the kubernetes_context attribute to a local kubectl context. In fact, you don't need the local Kubernetes CLI at all. You can connect the orchestrator to the Service Connector instead.
If you don't have a Service Connector on hand and you don't want to register one, the local Kubernetes kubectl client needs to be configured with a configuration context pointing to the remote cluster. The kubernetes_context attribute of the orchestrator must also be set to the name of that context.
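The registration with an explicit context can be sketched as follows. The helper only builds the ZenML CLI call as an argument list. The function name, the orchestrator name, and the context name are hypothetical, and the --kubernetes_context flag is assumed to mirror the kubernetes_context attribute described above.

```python
import shlex
from typing import List


def context_registration_command(orchestrator: str, kubernetes_context: str) -> List[str]:
    """Build the CLI call that registers a Kubeflow orchestrator for a kubectl context."""
    return [
        "zenml", "orchestrator", "register", orchestrator,
        "--flavor=kubeflow",
        # Name of the local kubectl context pointing to the remote cluster.
        f"--kubernetes_context={kubernetes_context}",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(shlex.join(
        context_registration_command("kubeflow_orchestrator", "my-remote-context")
    ))
```

The context name should be one of those listed by kubectl config get-contexts.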
ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME>, which includes all required software dependencies, and use it to run your pipeline steps in Kubeflow. Check out the ZenML containerization documentation if you want to learn more about how ZenML builds these images and how you can customize them.
Check out the SDK docs for a full list of available attributes and the ZenML settings documentation for more information on how to specify settings.
Note that if you wish to use this orchestrator to run steps on a GPU, you will need to follow the GPU-specific instructions in the ZenML documentation to ensure that it works. Doing so requires some extra settings customization and is essential for enabling CUDA, without which the GPU cannot deliver its full acceleration.
Kubeflow has a notion of multi-tenancy built into its deployment. Kubeflow's multi-user isolation simplifies user operations because each user only views and edits the Kubeflow components and model artifacts defined in their configuration.
Note that the above has not been tested on all Kubeflow versions, so there might be further bugs with older Kubeflow versions. In this case, please reach out to us on Slack.
See the full documentation on using ZenML secrets.
For more information and a full list of configurable attributes of the Kubeflow orchestrator, check out the SDK docs.