The orchestrator backend is especially important, as it defines where the actual
pipeline job runs. Think of it as the root of any pipeline job: it controls how and
where each individual step within a pipeline is executed. The combination of the
orchestrator and other backends can therefore be used to great effect to scale
jobs in production.
The orchestrator environment can be the same as the processing environment, but it does not have to be.
For example, by default a pipeline.run() call results in a local orchestrator and a local
processing backend, meaning that orchestration happens locally along with the actual steps.
However, if, say, a Dataflow processing backend is chosen, the chosen steps are executed
not in the local environment but in the cloud on Google Dataflow.
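The following sketch makes this contrast concrete. It is illustrative only: the ProcessingDataFlowBackend class and the backends= argument are assumptions based on the description above, so check the docstrings in the source code for the exact names and signatures.

```python
# Illustrative sketch -- class names and the `backends=` argument are
# assumptions; consult the source code docstrings for the exact API.
from zenml.pipelines import TrainingPipeline
from zenml.backends.processing import ProcessingDataFlowBackend

pipeline = TrainingPipeline(name="my-pipeline")
# (datasource and steps omitted for brevity)

# Default: orchestration and all steps run locally.
pipeline.run()

# With a Dataflow processing backend, orchestration stays local,
# but the steps that support it execute on Google Dataflow.
pipeline.run(
    backends=[ProcessingDataFlowBackend(project="my-gcp-project")]
)
```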
Please refer to the docstrings within the source code for precise details on the following orchestrators.
The local orchestrator is used by default. It runs pipelines sequentially as a Python process on your local machine, and is meant for smaller datasets and quick experimentation.
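To make the default explicit, running without any backend arguments is conceptually equivalent to passing the local orchestrator yourself. The class name below (OrchestratorBaseBackend) is an assumption; verify it against the source code.

```python
# Conceptually equivalent to the default pipeline.run() -- each step runs
# sequentially in the local Python process.
# OrchestratorBaseBackend is an assumed name; verify it in the source.
from zenml.backends.orchestrator import OrchestratorBaseBackend

pipeline.run(backends=[OrchestratorBaseBackend()])
```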
The GCPOrchestrator can be found at
It spins up a VM in your GCP project, copies a ZIP archive of your code to the instance, and executes the ZenML pipeline with a Docker image of your choice.
Best of all, the Orchestrator
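As a rough sketch of what a run with the GCP orchestrator could look like (the OrchestratorGCPBackend class, its module path, and its parameters are assumptions based on the description above; consult the docstrings for the real signature):

```python
# Hypothetical usage sketch -- the class name and parameters below are
# assumptions; the authoritative reference is the source code docstrings.
from zenml.backends.orchestrator import OrchestratorGCPBackend

pipeline.run(
    backends=[
        OrchestratorGCPBackend(
            project="my-gcp-project",  # GCP project that hosts the VM
            zone="europe-west1-b",     # zone in which the VM is spun up
            image="eu.gcr.io/my-project/zenml:latest",  # Docker image of your choice
        )
    ]
)
```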