Configure your pipeline to add compute
Add more resources to your pipeline configuration.
Now that we have our pipeline up and running in the cloud, you might be wondering how ZenML figured out which dependencies to install in the Docker image that we just ran on the VM. The answer lies in the runner script we executed (i.e. run.py), in particular the lines that configure our training pipeline with a YAML configuration called training_rf.yaml (found in the source code) before running it. Let's learn more about this configuration file.
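The snippet from run.py is not reproduced here. As a rough sketch, assuming the config file sits in a hypothetical configs folder in the working directory, the relevant lines might look like the following. The with_options call is left as a comment so the sketch runs without a ZenML installation; training_pipeline stands for the pipeline defined elsewhere in the project.

```python
import os

# Assumption: the YAML files live in a "configs" folder in the working directory.
config_folder = os.path.join(os.getcwd(), "configs")

# Keyword arguments that are forwarded to the pipeline's with_options() method.
pipeline_args = {
    "config_path": os.path.join(config_folder, "training_rf.yaml"),
}

# With ZenML installed and training_pipeline imported, the configured
# pipeline would then be created and run like this:
# training_pipeline.with_options(**pipeline_args)()

print(pipeline_args["config_path"])
```

Keeping the path in a dictionary makes it straightforward to add further options later without changing the call itself.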
The with_options method that points to a YAML config is only one way to configure a pipeline. We can also configure a pipeline or a step directly in its decorator.
However, it is best to keep configuration out of the code to ensure separation of concerns in our codebase.
Breaking down our configuration YAML
The YAML configuration of a ZenML pipeline can be very simple, as in this case. Let's break it down and go through each section one by one:
The Docker settings
The first section is the so-called settings of the pipeline. This section has a docker key, which controls the containerization process. Here, we are simply telling ZenML that we need pyarrow as a pip requirement, and that we want to enable the sklearn integration of ZenML, which will in turn install the scikit-learn library. This docker section can be populated with many different options and corresponds to the DockerSettings class in the Python SDK.
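To make the nesting concrete, here is a minimal sketch of the settings section described above, written as the equivalent Python dictionary rather than YAML. The key names requirements and required_integrations are assumed to match the corresponding DockerSettings attributes; the exact contents of training_rf.yaml may differ.

```python
# The "settings" section of the config, expressed as a nested dictionary.
# The nesting mirrors the YAML file: settings -> docker -> options.
config = {
    "settings": {
        "docker": {
            # Extra pip packages to install into the Docker image.
            "requirements": ["pyarrow"],
            # ZenML integrations to enable; sklearn pulls in scikit-learn.
            "required_integrations": ["sklearn"],
        }
    }
}

docker = config["settings"]["docker"]
print(docker["requirements"])
print(docker["required_integrations"])
```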
Associating a ZenML Model
The next section is about associating a ZenML Model with this pipeline.
This configuration lines up with the model that is created after executing these pipelines.
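As an illustration only, a model section could carry fields like the ones below, again written as a Python dictionary that mirrors the YAML layout. Every value here is a hypothetical placeholder, not the content of the actual training_rf.yaml; the field names are assumed to match the attributes of the ZenML Model class.

```python
# Hypothetical values: the real name, version, and tags come from your project.
model_section = {
    "model": {
        "name": "my_classifier",
        "version": "rf",
        "license": "Apache 2.0",
        "description": "Example classifier trained by the pipeline",
        "tags": ["classifier", "example"],
    }
}

# The name is the field that ties pipeline runs to a model.
print(model_section["model"]["name"])
```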
Passing parameters
The last part of the config YAML is the parameters key. It aligns with the parameters that the pipeline expects. In this case, the pipeline expects a string called model_type that tells it which type of model to use.
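The link between the parameters key and the pipeline signature can be sketched with a plain Python function standing in for the real pipeline. This is not the actual ZenML pipeline (there is no @pipeline decorator here), and the accepted values "rf" and "sgd" are assumptions made for the example.

```python
def training_pipeline(model_type: str) -> str:
    """Stand-in for the real pipeline: returns the model family selected by model_type."""
    # Hypothetical mapping from config value to model family.
    supported = {
        "rf": "random forest",
        "sgd": "stochastic gradient descent",
    }
    if model_type not in supported:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return supported[model_type]


# The "parameters" section of the config, as a dictionary.
config = {"parameters": {"model_type": "rf"}}

# Each key under "parameters" is passed as a keyword argument with the same name.
chosen = training_pipeline(**config["parameters"])
print(chosen)
```

Because the keys are matched by name, a typo in the config (for example model_typ) would fail immediately with a TypeError in this sketch instead of being silently ignored.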
As you can see, the YAML config is fairly easy to use and is an important part of the codebase for controlling the execution of our pipeline. You can read more about how to configure a pipeline in the how-to section, but for now, we can move on to scaling our pipeline.
Scaling compute on the cloud
When we ran our pipeline with the above config, ZenML used some sane defaults to pick the resource requirements for that pipeline. However, in the real world, you might want to add more memory, CPU, or even a GPU depending on the pipeline at hand.
This is as easy as adding a section to your local training_rf.yaml file that configures the entire pipeline with a certain amount of memory and, for the trainer step, an additional 8 CPU cores. The orchestrator key used in this section corresponds to the SkypilotBaseOrchestratorSettings class in the Python SDK.
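The resource section described above can be sketched as a Python dictionary with the same nesting as the YAML. The step name model_trainer and the memory value of 32 are assumptions made for illustration. The helper function is a simplified picture of how step-level settings layer on top of pipeline-level ones, not ZenML's actual implementation.

```python
config = {
    "settings": {
        # Applies to every step of the pipeline (assumed value, in GB).
        "orchestrator": {"memory": 32},
    },
    "steps": {
        # Hypothetical step name for the trainer step.
        "model_trainer": {
            "settings": {"orchestrator": {"cpus": 8}},
        },
    },
}


def effective_orchestrator_settings(config: dict, step_name: str) -> dict:
    """Return pipeline-level orchestrator settings overlaid with step-level ones."""
    pipeline_level = config.get("settings", {}).get("orchestrator", {})
    step_config = config.get("steps", {}).get(step_name, {})
    step_level = step_config.get("settings", {}).get("orchestrator", {})
    # Step-level keys win when both levels define the same key.
    return {**pipeline_level, **step_level}


print(effective_orchestrator_settings(config, "model_trainer"))
print(effective_orchestrator_settings(config, "some_other_step"))
```

In this sketch the trainer step ends up with both the pipeline-wide memory setting and its own CPU setting, while any other step only receives the pipeline-wide memory setting.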
Now let's run the pipeline again.
You should notice that the machine provisioned on your cloud provider has a different configuration than last time. As easy as that!
Bear in mind that not every orchestrator supports ResourceSettings directly. To learn more, read the ResourceSettings documentation, which also covers attaching a GPU.