AzureML
Executing individual steps in AzureML.
AzureML offers specialized compute instances to run your training jobs and has a comprehensive UI to track and manage your models and logs. ZenML's AzureML step operator allows you to submit individual steps to be run on AzureML compute instances.
When to use it
You should use the AzureML step operator if:
one or more steps of your pipeline require computing resources (CPU, GPU, memory) that are not provided by your orchestrator.
How to deploy it
Would you like to skip ahead and deploy a full ZenML cloud stack already, including an AzureML step operator? Check out the in-browser stack deployment wizard, the stack registration wizard, or the ZenML Azure Terraform module for a shortcut on how to deploy & register this stack component.
Create a Machine learning workspace on Azure. This should include an Azure container registry and an Azure storage account that will be used as part of your stack.
(Optional) Once your resource is created, you can head over to the Azure Machine Learning Studio and create a compute instance or cluster to run your pipelines. If omitted, the AzureML step operator will use the serverless compute target or will provision a new compute target on the fly, depending on the settings used to configure the step operator.
(Optional) Create a Service Principal for authentication. This is required if you intend to use a service connector to authenticate your step operator.
How to use it
To use the AzureML step operator, we need:
The ZenML azure integration installed. If you haven't done so, run zenml integration install azure.
Docker installed and running.
An Azure container registry as part of your stack. Take a look here for a guide on how to set that up.
An Azure artifact store as part of your stack. This is needed so that both your orchestration environment and AzureML can read and write step artifacts. Take a look here for a guide on how to set that up.
An AzureML workspace and an optional compute cluster. Note that the AzureML workspace can share the Azure container registry and Azure storage account that are required above. See the deployment section for detailed instructions.
There are two ways you can authenticate your step operator to be able to run steps on Azure:
The recommended way to authenticate your AzureML step operator is by registering or using an existing Azure Service Connector and connecting it to your AzureML step operator. The credentials configured for the connector must have permissions to create and manage AzureML jobs (e.g. the AzureML Data Scientist and AzureML Compute Operator managed roles). The AzureML step operator uses the azure-generic resource type, so make sure to configure the connector accordingly.
Once you have added the step operator to your active stack, you can use it to execute individual steps of your pipeline by passing its name to the step_operator argument of the @step decorator.
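As a minimal sketch, assuming the step operator was registered under the hypothetical name azureml_step_operator (replace it with the name of your own component), a step can be sent to AzureML like this:

```python
from zenml import step


# "azureml_step_operator" is a placeholder: use the name under which the
# AzureML step operator is registered in your active stack.
@step(step_operator="azureml_step_operator")
def trainer() -> None:
    """Train a model."""
    # The body of this step runs on AzureML compute rather than in the
    # environment provided by the orchestrator.
    ...
```

Steps that do not set step_operator keep running in the environment provided by the orchestrator.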
ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME>, which includes your code, and use it to run your steps in AzureML. Check out this page if you want to learn more about how ZenML builds these images and how you can customize them.
Additional configuration
The ZenML AzureML step operator comes with a dedicated settings class called AzureMLStepOperatorSettings, which controls the compute resources used for step execution in AzureML.
Currently, it supports three different modes of operation.
Serverless Compute (Default)
Set mode to serverless.
Other parameters are ignored.
Compute Instance
Set mode to compute-instance.
Requires a compute_name.
If a compute instance with the same name exists, it uses the existing compute instance and ignores other parameters.
If a compute instance with the same name doesn't exist, it creates a new compute instance with the compute_name. For this process, you can specify compute_size and idle_time_before_shutdown_minutes.
Compute Cluster
Set mode to compute-cluster.
Requires a compute_name.
If a compute cluster with the same name exists, it uses the existing cluster and ignores other parameters.
If a compute cluster with the same name doesn't exist, it creates a new compute cluster. Additional parameters can be used for configuring this process.
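The rules for the three modes can be summarized in a few lines of plain Python. This is only an illustrative sketch: plan_compute and its existing argument are hypothetical names invented here, the function is not part of ZenML, and it never contacts Azure.

```python
from typing import Optional, Set

VALID_MODES = {"serverless", "compute-instance", "compute-cluster"}


def plan_compute(
    mode: str = "serverless",
    compute_name: Optional[str] = None,
    existing: Optional[Set[str]] = None,
) -> str:
    """Describe what each mode would do (hypothetical helper, not ZenML API)."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode: {mode!r}")
    if mode == "serverless":
        # Serverless is the default; every other parameter is ignored.
        return "use serverless compute"
    if not compute_name:
        # Compute instances and clusters are always addressed by name.
        raise ValueError(f"mode {mode!r} requires a compute_name")
    if compute_name in (existing or set()):
        # An existing target is reused as-is; other parameters are ignored.
        return f"reuse existing {mode} {compute_name!r}"
    # Otherwise a new target is created, and the additional parameters apply.
    return f"create new {mode} {compute_name!r}"


print(plan_compute())
print(plan_compute("compute-instance", "my-instance", existing={"my-instance"}))
print(plan_compute("compute-cluster", "my-cluster"))
```

The point to notice is that sizing parameters only matter on the last branch, when a named target does not exist yet and has to be created.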
Here is an example of how you can use AzureMLStepOperatorSettings to define a compute instance:
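The sketch below sets only mode and compute_name, the two attributes described above that a compute instance always needs. The instance name my-compute-instance is a placeholder, and the import path and the step_operator settings key are assumptions that may differ between ZenML versions, so verify them against the SDK docs before relying on them.

```python
from zenml import step

# Assumed import path; check the SDK docs for your ZenML version.
from zenml.integrations.azure.flavors.azureml_step_operator_flavor import (
    AzureMLStepOperatorSettings,
)

# Reuse the compute instance with this name if it exists, otherwise let the
# step operator create it. "my-compute-instance" is a placeholder name.
azureml_settings = AzureMLStepOperatorSettings(
    mode="compute-instance",
    compute_name="my-compute-instance",
)


# The settings key "step_operator" is assumed here.
@step(settings={"step_operator": azureml_settings})
def my_azureml_step() -> None:
    """Runs on the compute instance configured above."""
    ...
```

Sizing and idle-shutdown attributes are left out on purpose; they only take effect when a new instance has to be created.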
You can check out the AzureMLStepOperatorSettings SDK docs for a full list of available attributes and this docs page for more information on how to specify settings.
Enabling CUDA for GPU-backed hardware
Note that if you wish to use this step operator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. This requires some extra settings customization and is essential for enabling CUDA so that the GPU can deliver its full acceleration.