Develop a custom orchestrator
Learning how to develop a custom orchestrator.
Before diving into the specifics of this component type, it is beneficial to familiarize yourself with our general guide to writing custom component flavors in ZenML. This guide provides an essential understanding of ZenML's component flavor concepts.
Base Implementation
ZenML aims to enable orchestration with any orchestration tool. This is where the BaseOrchestrator comes into play. It abstracts away many of the ZenML-specific details from the actual implementation and exposes a simplified interface:
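A rough, illustrative sketch of that interface follows; the exact method signatures differ slightly between ZenML versions, so treat this as an outline rather than the definitive definition:

```python
from abc import ABC, abstractmethod
from typing import Dict

from zenml.stack import StackComponent


class BaseOrchestrator(StackComponent, ABC):
    """Base class that all ZenML orchestrators inherit from."""

    @abstractmethod
    def prepare_or_run_pipeline(
        self,
        deployment: "PipelineDeploymentResponse",
        stack: "Stack",
        environment: Dict[str, str],
    ) -> None:
        """Runs the pipeline (or schedules it) on the orchestration tool."""

    @abstractmethod
    def get_orchestrator_run_id(self) -> str:
        """Returns a unique ID for the active orchestrator run."""
```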
This is a slimmed-down version of the base implementation which aims to highlight the abstraction layer. In order to see the full implementation and get the complete docstrings, please check the source code on GitHub.
Build your own custom orchestrator
If you want to create your own custom flavor for an orchestrator, you can follow these steps:

1. Create a class that inherits from the BaseOrchestrator class and implement the abstract prepare_or_run_pipeline(...) and get_orchestrator_run_id() methods.
2. If you need to provide any configuration, create a class that inherits from the BaseOrchestratorConfig class and add your configuration parameters.
3. Bring both the implementation and the configuration together by inheriting from the BaseOrchestratorFlavor class. Make sure that you give a name to the flavor through its abstract property.
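A minimal skeleton tying these three pieces together might look like the following. The MyOrchestrator* names are hypothetical, and the property names should be double-checked against the BaseOrchestratorFlavor source for your ZenML version:

```python
from typing import Type

from zenml.orchestrators import (
    BaseOrchestrator,
    BaseOrchestratorConfig,
    BaseOrchestratorFlavor,
)


class MyOrchestratorConfig(BaseOrchestratorConfig):
    """Configuration parameters for the custom orchestrator."""

    my_tool_url: str = "http://localhost:8080"  # example parameter


class MyOrchestrator(BaseOrchestrator):
    """Custom orchestrator implementation (see the implementation guide below)."""

    def prepare_or_run_pipeline(self, deployment, stack, environment):
        ...

    def get_orchestrator_run_id(self) -> str:
        ...


class MyOrchestratorFlavor(BaseOrchestratorFlavor):
    """Flavor that ties the configuration and the implementation together."""

    @property
    def name(self) -> str:
        return "my_orchestrator"

    @property
    def config_class(self) -> Type[MyOrchestratorConfig]:
        return MyOrchestratorConfig

    @property
    def implementation_class(self) -> Type[MyOrchestrator]:
        return MyOrchestrator
```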
Once you are done with the implementation, you can register it through the CLI. Please ensure you point to the flavor class via dot notation.

For example, if your flavor class MyOrchestratorFlavor is defined in flavors/my_flavor.py, you'd register it by doing:
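(The command below is a sketch; check zenml orchestrator flavor register --help for the exact syntax of your ZenML version.)

```shell
zenml orchestrator flavor register flavors.my_flavor.MyOrchestratorFlavor
```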
ZenML resolves the flavor class by taking the path where you initialized zenml (via zenml init) as the starting point of resolution. Therefore, please ensure you follow the best practice of initializing zenml at the root of your repository.
If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.
Afterward, you should see the new flavor in the list of available flavors:
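For instance, using the flavor list command (the exact output will vary depending on your installed integrations):

```shell
zenml orchestrator flavor list
```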
It is important to draw attention to when and how these base abstractions come into play in a ZenML workflow.

- The CustomOrchestratorFlavor class is imported and utilized upon the creation of the custom flavor through the CLI.
- The CustomOrchestratorConfig class is imported when someone tries to register/update a stack component with this custom flavor. In particular, during the registration process of the stack component, the config will be used to validate the values given by the user. As Config objects are inherently pydantic objects, you can also add your own custom validators here (a short example is sketched below).
- The CustomOrchestrator only comes into play when the component is ultimately in use.

The design behind this interaction lets us separate the configuration of the flavor from its implementation. This way we can register flavors and components even when the major dependencies behind their implementation are not installed in our local setting (assuming the CustomOrchestratorFlavor and the CustomOrchestratorConfig are implemented in a different module/path than the actual CustomOrchestrator).
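Building on the config sketched earlier, a custom validator might look like this. The field name is hypothetical, and whether you use pydantic's field_validator or validator depends on the pydantic version your ZenML release pins:

```python
from pydantic import field_validator  # `validator` on pydantic v1

from zenml.orchestrators import BaseOrchestratorConfig


class MyOrchestratorConfig(BaseOrchestratorConfig):
    """Config whose values are validated when the stack component is registered."""

    my_tool_url: str = "http://localhost:8080"  # hypothetical parameter

    @field_validator("my_tool_url")
    @classmethod
    def _validate_url(cls, value: str) -> str:
        # Reject obviously malformed URLs at registration time.
        if not value.startswith(("http://", "https://")):
            raise ValueError("my_tool_url must start with http:// or https://")
        return value
```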
Implementation guide
1. Create your orchestrator class: This class should either inherit from BaseOrchestrator, or more commonly from ContainerizedOrchestrator. If your orchestrator uses container images to run code, you should inherit from ContainerizedOrchestrator, which handles building all Docker images for the pipeline to be executed. If your orchestrator does not use container images, you'll be responsible for making sure the execution environment contains all the necessary requirements and code files to run the pipeline.
2. Implement the prepare_or_run_pipeline(...) method: This method is responsible for running or scheduling the pipeline. In most cases, this means converting the pipeline into a format that your orchestration tool understands and running it. To do so, you should:
   - Loop over all steps of the pipeline and configure your orchestration tool to run the correct command and arguments in the correct Docker image.
   - Make sure the passed environment variables are set when the container is run.
   - Make sure the containers are run in the correct order.
   Check out the code sample below for more details on how to fetch the Docker image, command, arguments, and step order.
3. Implement the get_orchestrator_run_id() method: This must return an ID that is different for each pipeline run, but identical if called from within Docker containers running different steps of the same pipeline run. If your orchestrator is based on an external tool like Kubeflow or Airflow, it is usually best to use a unique ID provided by this tool.
Optional features
There are some additional optional features that your orchestrator can implement:
- Running pipelines on a schedule: If your orchestrator supports running pipelines on a schedule, make sure to handle deployment.schedule if it exists. If your orchestrator does not support schedules, you should either log a warning or raise an exception in case the user tries to schedule a pipeline.
- Specifying hardware resources: If your orchestrator supports setting resources like CPUs, GPUs, or memory for the pipeline or specific steps, make sure to handle the values defined in step.config.resource_settings. See the code sample below for additional helper methods to check whether any resources are required from your orchestrator.
Code sample
To see a full end-to-end worked example of a custom orchestrator, see here.
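The full code sample is best read in the linked example; below is a condensed, hypothetical sketch of what a ContainerizedOrchestrator subclass could look like. The MyOrchestrator name, the MY_ORCHESTRATOR_RUN_ID variable, and the final submission step are placeholders, and the exact ZenML helper signatures may differ between versions:

```python
import os
from typing import Dict
from uuid import uuid4

from zenml.entrypoints import StepEntrypointConfiguration
from zenml.logger import get_logger
from zenml.orchestrators import ContainerizedOrchestrator

logger = get_logger(__name__)

# Hypothetical environment variable used to propagate the run ID to step containers.
ENV_MY_ORCHESTRATOR_RUN_ID = "MY_ORCHESTRATOR_RUN_ID"


class MyOrchestrator(ContainerizedOrchestrator):
    """Sketch of a custom orchestrator that hands steps over to an external tool."""

    def get_orchestrator_run_id(self) -> str:
        # Must be identical for all steps of the same pipeline run. Here we
        # assume the run ID is injected into every step container as an
        # environment variable (see below).
        try:
            return os.environ[ENV_MY_ORCHESTRATOR_RUN_ID]
        except KeyError:
            raise RuntimeError(
                "Unable to read run ID from environment variable "
                f"{ENV_MY_ORCHESTRATOR_RUN_ID}."
            )

    def prepare_or_run_pipeline(
        self,
        deployment: "PipelineDeploymentResponse",
        stack: "Stack",
        environment: Dict[str, str],
    ) -> None:
        # Optional feature: warn (or raise) if the user tries to schedule a
        # pipeline and this orchestrator does not support schedules.
        if deployment.schedule:
            logger.warning(
                "This orchestrator does not support schedules; running the "
                "pipeline immediately instead."
            )

        # One ID per pipeline run, shared by all step containers.
        orchestrator_run_id = str(uuid4())

        for step_name, step in deployment.step_configurations.items():
            # Docker image that was built (or reused) for this step.
            image = self.get_image(deployment=deployment, step_name=step_name)

            # Command and arguments that execute the step inside the container.
            command = StepEntrypointConfiguration.get_entrypoint_command()
            arguments = StepEntrypointConfiguration.get_entrypoint_arguments(
                step_name=step_name, deployment_id=deployment.id
            )

            # Environment variables that must be set in the step container.
            step_env = {**environment, ENV_MY_ORCHESTRATOR_RUN_ID: orchestrator_run_id}

            # Optional feature: hardware requirements for this step.
            resources = step.config.resource_settings

            # Replace the log call below with the submission API of your
            # orchestration tool: run `image` with `command + arguments`,
            # set `step_env` as container environment variables, and make sure
            # upstream steps (step.spec.upstream_steps) finish first.
            logger.info(
                "Would run step %s with image %s, command %s, env %s and resources %s",
                step_name,
                image,
                command + arguments,
                step_env,
                resources,
            )
```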
Enabling CUDA for GPU-backed hardware
Note that if you wish to use your custom orchestrator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. It requires some additional settings customization and is essential for enabling CUDA so that the GPU can deliver its full acceleration.