Orchestrate on the cloud
Orchestrate using cloud resources.
Until now, we've only run pipelines locally. The next step is to free ourselves from our local machines and transition our pipelines to execute on the cloud. This will enable you to run your MLOps pipelines in a cloud environment, leveraging the scalability and robustness that cloud platforms offer.
In order to do this, we need to get familiar with two more stack components:
- The **orchestrator** manages the workflow and execution of your pipelines.
- The **container registry** is a storage and content delivery system that holds your Docker container images.

These, along with a remote artifact store, complete a basic cloud stack where our pipeline is entirely running on the cloud.
The easiest cloud orchestrator to start with is the SkyPilot orchestrator running on a public cloud. The advantage of SkyPilot is that it simply provisions a VM to execute the pipeline on your cloud provider.
Coupled with SkyPilot, we need a mechanism to package your code and ship it to the cloud for SkyPilot to do its thing. ZenML uses Docker to achieve this. Every time you run a pipeline with a remote orchestrator, ZenML builds an image for the entire pipeline (and optionally each step of a pipeline, depending on your Docker settings). This image contains the code, requirements, and everything else needed to run the steps of the pipeline in any environment. ZenML then pushes this image to the container registry configured in your stack, and the orchestrator pulls the image when it's ready to execute a step.
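For instance, if your steps need extra requirements baked into that image, you can pass Docker settings when defining the pipeline. A minimal sketch (the `scikit-learn` requirement here is just an illustration):

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Ask ZenML to include extra requirements in the image it builds
docker_settings = DockerSettings(requirements=["scikit-learn"])

@pipeline(settings={"docker": docker_settings})
def training_pipeline():
    ...
```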
To summarize, here is the broad sequence of events that happen when you run a pipeline with such a cloud stack:
1. The user runs a pipeline on the client machine. This executes the `run.py` script, where ZenML reads the `@pipeline` function and understands what steps need to be executed.
2. The client asks the server for the stack info, which returns it with the configuration of the cloud stack.
3. Based on the stack info and pipeline specification, the client builds and pushes an image to the container registry. The image contains the environment needed to execute the pipeline and the code of the steps.
4. The client creates a run in the orchestrator. For example, in the case of the SkyPilot orchestrator, it creates a virtual machine in the cloud with some commands to pull and run a Docker image from the specified container registry.
5. The orchestrator pulls the appropriate image from the container registry as it's executing the pipeline (each step has an image).
6. As each pipeline runs, it stores artifacts physically in the artifact store. Of course, this artifact store needs to be some form of cloud storage.
7. As each pipeline runs, it reports status back to the ZenML server and optionally queries the server for metadata.
In order to launch a pipeline on AWS with the SkyPilot orchestrator, the first thing that you need to do is to install the AWS and Skypilot integrations:
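For the AWS path this walkthrough follows, that looks like this:

```shell
zenml integration install aws skypilot_aws -y
```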
While there are detailed docs on each of these components and on setting them up on each public cloud, we have put the most relevant details here for convenience:
**AWS**

Before we start registering any components, there is another step that we have to execute. As mentioned before, components such as orchestrators and container registries often require you to set up the right permissions. In ZenML, this process is simplified with the use of service connectors. For this example, we need to use the AWS Service Connector:
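A sketch of registering such a connector, assuming your local AWS credentials are already configured (the name `cloud_connector` is a placeholder):

```shell
zenml service-connector register cloud_connector --type aws --auto-configure
```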
Once the service connector is set up, we can register the orchestrator:
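For example, registering a SkyPilot-on-AWS orchestrator and attaching the connector from above (component names are placeholders):

```shell
zenml orchestrator register cloud_orchestrator -f vm_aws
zenml orchestrator connect cloud_orchestrator --connector cloud_connector
```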
The next step is to register the container registry. Similar to the orchestrator, we will use our connector as we are setting up the container registry:
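For ECR, that might look like the following; replace the URI placeholders with your account ID and region:

```shell
zenml container-registry register cloud_container_registry -f aws --uri=<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com
zenml container-registry connect cloud_container_registry --connector cloud_connector
```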
For more information, you can always check the dedicated SkyPilot orchestrator documentation.
**GCP**

Before we start registering any components, there is another step that we have to execute. As mentioned before, components such as orchestrators and container registries often require you to set up the right permissions. In ZenML, this process is simplified with the use of service connectors. For this example, we need to use the GCP Service Connector:
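A sketch, assuming your local GCP credentials are already configured (the connector name is a placeholder):

```shell
zenml service-connector register cloud_connector --type gcp --auto-configure
```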
Once the service connector is set up, we can register the orchestrator:
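For example, using the SkyPilot-on-GCP flavor with the connector from above (names are placeholders):

```shell
zenml orchestrator register cloud_orchestrator -f vm_gcp
zenml orchestrator connect cloud_orchestrator --connector cloud_connector
```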
The next step is to register the container registry. Similar to the orchestrator, we will use our connector as we are setting up the container registry:
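For GCR, that might look like this; the project ID is a placeholder you would fill in:

```shell
zenml container-registry register cloud_container_registry -f gcp --uri=gcr.io/<PROJECT_ID>
zenml container-registry connect cloud_container_registry --connector cloud_connector
```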
For more information, you can always check the dedicated SkyPilot orchestrator documentation.
**Azure**

Alongside ZenML's switch to `pydantic` v2, an incompatibility between the new `pydantic` version and the `azure-cli` means the `skypilot[azure]` flavor cannot be installed at the same time. Therefore, for Azure users, an alternative is to use the Kubernetes orchestrator. You can easily deploy a Kubernetes cluster in your subscription using the Azure Kubernetes Service (AKS). You should also ensure you have kubectl installed.
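Under this alternative, you would install the Azure and Kubernetes integrations instead of the SkyPilot one; for example:

```shell
zenml integration install azure kubernetes -y
```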
Before we start registering any components, there is another step that we have to execute. As mentioned before, components such as orchestrators and container registries often require you to set up the right permissions. In ZenML, this process is simplified with the use of service connectors. For this example, we will need to use the Azure Service Connector:
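A sketch using a service principal; all IDs are placeholders you would fill in with your own values:

```shell
zenml service-connector register cloud_connector --type azure --auth-method service-principal --tenant_id=<TENANT_ID> --client_id=<CLIENT_ID> --client_secret=<CLIENT_SECRET>
```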
Once the service connector is set up, we can register the orchestrator:
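Given the AKS alternative above, this would be a Kubernetes orchestrator attached to the same connector (names are placeholders):

```shell
zenml orchestrator register cloud_orchestrator -f kubernetes
zenml orchestrator connect cloud_orchestrator --connector cloud_connector
```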
The next step is to register the container registry. Similar to the orchestrator, we will use our connector as we are setting up the container registry:
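For ACR, that might look like this; the registry name is a placeholder:

```shell
zenml container-registry register cloud_container_registry -f azure --uri=<REGISTRY_NAME>.azurecr.io
zenml container-registry connect cloud_container_registry --connector cloud_connector
```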
For more information, you can always check the dedicated Kubernetes orchestrator documentation.
With the components registered, everything is set up for the next steps.

Having trouble with setting up infrastructure? Try reading the stack deployment section of the docs to gain more insight. If that still doesn't work, join the ZenML Slack community and ask!
Now that we have our orchestrator and container registry registered, we can register a new stack, just like we did in the previous chapter:
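For example, assuming the placeholder component names used above and a remote artifact store named `cloud_artifact_store` from the previous chapter:

```shell
zenml stack register minimal_cloud_stack -o cloud_orchestrator -a cloud_artifact_store -c cloud_container_registry
```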
Now, using the code from the previous chapter, we can run a training pipeline. First, set the minimal cloud stack active:
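```shell
zenml stack set minimal_cloud_stack
```

and then, run the training pipeline:

```shell
# assuming your project's run.py exposes a --training-pipeline flag,
# as in the starter project from the previous chapters
python run.py --training-pipeline
```

You will notice this time your pipeline behaves differently. After it has built the Docker image with all your code, it will push that image and spin up a VM on the cloud. That is where your pipeline will execute, and the logs will be streamed back to you. So with a few commands, we were able to ship our entire code to the cloud!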
Curious to see what other stacks you can create? The ZenML component guide has an exhaustive list of various artifact stores, container registries, and orchestrators that are integrated with ZenML. Try playing around with more stack components to see how easy it is to switch between MLOps stacks with ZenML.