This is an older version of the ZenML documentation. To read and view the latest version please visit this up-to-date URL.
The Kubeflow orchestrator is an orchestrator flavor provided with the ZenML kubeflow integration that uses Kubeflow Pipelines to run your pipelines.
When to use it
You should use the Kubeflow orchestrator if:
you're looking for a proven production-grade orchestrator.
you're looking for a UI in which you can track your pipeline runs.
you're already using Kubernetes or are not afraid of setting up and maintaining a Kubernetes cluster.
you're willing to deploy and maintain Kubeflow Pipelines on your cluster.
How to deploy it
The Kubeflow orchestrator supports two different modes: Local and remote. In case you want to run the orchestrator on a local Kubernetes cluster running on your machine, there is no additional infrastructure setup necessary.
If you want to run your pipelines on a remote cluster instead, you'll need to set up a Kubernetes cluster and deploy Kubeflow Pipelines:
. However, the workflow controller installed with the Kubeflow installation has Docker set as the default runtime. In order to make your pipelines work, you have to change the value to one of the options
This change has to be made by editing the containerRuntimeExecutor property of the ConfigMap corresponding to the workflow controller. Run the following commands to first know what config map to change and then to edit it to reflect your new value.
kubectl get configmap -n kubeflow
kubectl edit configmap CONFIGMAP_NAME -n kubeflow
# This opens up an editor that can be used to make the change.
If one or more of the deployments are not in the Running state, try increasing the number of nodes in your cluster.
If you're installing Kubeflow Pipelines manually, make sure the Kubernetes service is called exactly ml-pipeline. This is a requirement for ZenML to connect to your Kubeflow Pipelines deployment.
How to use it
To use the Kubeflow orchestrator, we need:
The ZenML kubeflow integration installed. If you haven't done so, run
The local Kubeflow Pipelines deployment requires more than 2 GB of RAM, so if you're using Docker Desktop make sure to update the resource limits in the preferences.
We can then register the orchestrator and use it in our active stack:
zenmlorchestratorregister<NAME> \--flavor=kubeflow# Add the orchestrator to the active stackzenmlstackupdate-o<NAME>
When using the Kubeflow orchestrator with a remote cluster, you'll additionally need:
Kubeflow pipelines deployed on a remote cluster. See the deployment section for more information.
The name of your Kubernetes context which points to your remote cluster. Run kubectl config get-contexts to see a list of available contexts.
We can then register the orchestrator and use it in our active stack:
zenmlorchestratorregister<NAME> \--flavor=kubeflow \--kubernetes_context=<KUBERNETES_CONTEXT># Add the orchestrator to the active stackzenmlstackupdate-o<NAME>
ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME> which includes your code and use it to run your pipeline steps in Kubeflow. Check out this page if you want to learn more about how ZenML builds these images and how you can customize them.
Once the orchestrator is part of the active stack, we need to run zenml stack up before running any pipelines. This command
forwards a port, so you can view the Kubeflow UI in your browser.
(in the local case) uses K3D to provision a Kubernetes cluster on your machine and deploys Kubeflow Pipelines on it.
You can now run any ZenML pipeline using the Kubeflow orchestrator:
pythonfile_that_runs_a_zenml_pipeline.py
For additional configuration of the Kubeflow orchestrator, you can pass KubeflowOrchestratorSettings which allows you to configure the following attributes:
client_args: Arguments to pass when initializing the KFP client.
user_namespace: The user namespace to use when creating experiments and runs.
Kubeflow has a notion of multi-tenancy built into its deployment. Kubeflow’s multi-user isolation simplifies user operations because each user only views and edited\s the Kubeflow components and model artifacts defined in their configuration.
Using the ZenML Kubeflow orchestrator on a multi-tenant deployment without any settings will result in the following error:
HTTP response body: {"error":"Invalid input error: Invalid resource references for experiment. ListExperiment requires filtering by namespace.","code":3,"message":"Invalid input error: Invalid resource references for experiment. ListExperiment requires filtering by
namespace.","details":[{"@type":"type.googleapis.com/api.Error","error_message":"Invalid resource references for experiment. ListExperiment requires filtering by namespace.","error_details":"Invalid input error: Invalid resource references for experiment. ListExperiment requires filtering by namespace."}]}
In order to get it to work, we need to leverage the KubeflowOrchestratorSettings referenced above. By setting the namespace option, and by passing in the right authentication credentials to the Kubeflow Pipelines Client, we can make it work.
First, when registering your kubeflow orchestrator, please make sure to include the kubeflow_hostname parameter. The kubeflow_hostnamemust end with the /pipeline post-fix.
zenmlorchestratorregister<NAME> \--flavor=kubeflow \--kubernetes_context=<KUBERNETES_CONTEXT> \ --kubeflow_hostname=<KUBEFLOW_HOSTNAME># e.g. https://mykubeflow.example.com/pipeline
Then, ensure that you use the pass the right settings before triggerling a pipeline run. The following snipper will prove useful:
import requestsfrom zenml.client import Clientfrom zenml.integrations.kubeflow.flavors.kubeflow_orchestrator_flavor import ( KubeflowOrchestratorSettings,)NAMESPACE ="namespace_name"# This is the user namespace for the profile you want to useUSERNAME ="username"# This is the username for the profile you want to usePASSWORD ="password"# This is the password for the profile you want to usedefget_kfp_token(username:str,password:str) ->str:"""Get token for kubeflow authentication."""# Resolve host from active stack orchestrator =Client().active_stack.orchestratorif orchestrator.flavor !="kubeflow":raiseAssertionError("You can only use this function with an ""orchestrator of flavor `kubeflow` in the ""active stack!" )try: kubeflow_host = orchestrator.config.kubeflow_hostnameexceptAttributeError:raiseAssertionError("You must configure the Kubeflow orchestrator ""with the `kubeflow_hostname` parameter which ends ""with `/pipeline` (e.g. `https://mykubeflow.com/pipeline`). ""Please update the current kubeflow orchestrator with: "f"`zenml orchestrator update {orchestrator.name} ""--kubeflow_hostname=<MY_KUBEFLOW_HOST>`" ) session = requests.Session() response = session.get(kubeflow_host) headers ={"Content-Type":"application/x-www-form-urlencoded",} data ={"login": username,"password": password} session.post(response.url, headers=headers, data=data) session_cookie = session.cookies.get_dict()["authservice_session"]return session_cookietoken =get_kfp_token(USERNAME, PASSWORD)session_cookie ="authservice_session="+ tokenkubeflow_settings =KubeflowOrchestratorSettings( client_args={"cookies": session_cookie}, user_namespace=NAMESPACE)@pipeline( settings={"orchestrator.kubeflow": kubeflow_settings }): ...if"__name__"=="__main__":# Run the pipeline
Note that the above is also currently not tested on all Kubeflow versions, so there might be further bugs with older Kubeflow versions. In this case, please reach out to us on Slack.
A concrete example of using the Kubeflow orchestrator can be found here.
For more information and a full list of configurable attributes of the Kubeflow orchestrator, check out the API Docs.