AzureML

Executing individual steps in AzureML.

AzureML offers specialized compute instances to run your training jobs and has a comprehensive UI to track and manage your models and logs. ZenML's AzureML step operator allows you to submit individual steps to be run on AzureML compute instances.

When to use it

You should use the AzureML step operator if:

  • one or more steps of your pipeline require computing resources (CPU, GPU, memory) that are not provided by your orchestrator.

  • you have access to AzureML. If you're using a different cloud provider, take a look at the SageMaker or Vertex step operators.

How to deploy it

Would you like to skip ahead and deploy a full ZenML cloud stack already, including an AzureML step operator? Check out the in-browser stack deployment wizard, the stack registration wizard, or the ZenML Azure Terraform module for a shortcut on how to deploy & register this stack component.

  • Create a Machine learning workspace on Azure. This should include an Azure container registry and an Azure storage account that will be used as part of your stack.

  • (Optional) Once your resource is created, you can head over to the Azure Machine Learning Studio and create a compute instance or cluster to run your pipelines (a scripted alternative is sketched after this list). If omitted, the AzureML step operator will use the serverless compute target or provision a new compute target on the fly, depending on the settings used to configure the step operator.

  • (Optional) Create a Service Principal for authentication. This is required if you intend to use a service connector to authenticate your step operator.
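
If you would rather script the optional compute setup than click through the Studio UI, a compute cluster can also be created with the azure-ai-ml Python SDK. The sketch below is only illustrative: the cluster name, VM size, and scaling limits are placeholder assumptions you should adapt to your workloads.

from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Connect to the AzureML workspace created above.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<AZURE_SUBSCRIPTION_ID>",
    resource_group_name="<AZURE_RESOURCE_GROUP>",
    workspace_name="<AZURE_WORKSPACE_NAME>",
)

# An autoscaling CPU cluster that scales back to zero when idle.
cluster = AmlCompute(
    name="zenml-cluster",            # placeholder name
    size="Standard_DS3_v2",          # placeholder VM size
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=300, # seconds
)
ml_client.compute.begin_create_or_update(cluster).result()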

How to use it

To use the AzureML step operator, we need:

  • The ZenML azure integration installed. If you haven't done so, run

    zenml integration install azure
  • Docker installed and running.

  • An Azure container registry as part of your stack. Take a look here for a guide on how to set that up.

  • An Azure artifact store as part of your stack. This is needed so that both your orchestration environment and AzureML can read and write step artifacts. Take a look here for a guide on how to set that up.

  • An AzureML workspace and an optional compute cluster. Note that the AzureML workspace can share the Azure container registry and Azure storage account that are required above. See the deployment section for detailed instructions.

There are two ways you can authenticate your step operator to be able to run steps on Azure:

The recommended way to authenticate your AzureML step operator is by registering or using an existing Azure Service Connector and connecting it to your AzureML step operator. The credentials configured for the connector must have permissions to create and manage AzureML jobs (e.g. the AzureML Data Scientist and AzureML Compute Operator managed roles). The AzureML step operator uses the azure-generic resource type, so make sure to configure the connector accordingly:

zenml service-connector register <CONNECTOR_NAME> --type azure -i
zenml step-operator register <STEP_OPERATOR_NAME> \
    --flavor=azureml \
    --subscription_id=<AZURE_SUBSCRIPTION_ID> \
    --resource_group=<AZURE_RESOURCE_GROUP> \
    --workspace_name=<AZURE_WORKSPACE_NAME> \
#   --compute_target_name=<AZURE_COMPUTE_TARGET_NAME> # optionally specify an existing compute target

zenml step-operator connect <STEP_OPERATOR_NAME> --connector <CONNECTOR_NAME>
zenml stack register <STACK_NAME> -s <STEP_OPERATOR_NAME> ... --set

Once you've added the step operator to your active stack, you can use it to execute individual steps of your pipeline by specifying it in the @step decorator as follows:

from zenml import step


@step(step_operator=<NAME>)
def trainer(...) -> ...:
    """Train a model."""
    # This step will be executed in AzureML.
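
For context, here is a minimal sketch of how such a step fits into a pipeline. load_data is a hypothetical upstream step; only the step decorated with the step operator is submitted to AzureML, while the rest of the pipeline runs on your orchestrator.

from zenml import pipeline


@pipeline
def training_pipeline():
    data = load_data()  # hypothetical upstream step, runs on your orchestrator
    trainer(data)       # this step is submitted to AzureML via the step operator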

ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME> that includes your code and use it to run your steps in AzureML. Check out this page if you want to learn more about how ZenML builds these images and how you can customize them.
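
For example, if your step only needs a few extra pip packages in that image, one option (a minimal sketch; the package list and step operator name are placeholders) is to attach DockerSettings to the step:

from zenml import step
from zenml.config import DockerSettings

# Hypothetical extra pip requirements for the step's Docker image.
docker_settings = DockerSettings(requirements=["scikit-learn", "pandas"])


@step(step_operator="<NAME>", settings={"docker": docker_settings})
def trainer() -> None:
    """Train a model inside the customized image on AzureML."""
    ...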

Additional configuration

The ZenML AzureML step operator comes with a dedicated settings class, AzureMLStepOperatorSettings, which controls the compute resources used for step execution in AzureML.

Currently, it supports three different modes of operation.

  1. Serverless Compute (Default)

  • Set mode to serverless.

  • Other parameters are ignored.

  2. Compute Instance

  • Set mode to compute-instance.

  • Requires a compute_name.

    • If a compute instance with the same name exists, it uses the existing compute instance and ignores other parameters.

    • If a compute instance with the same name doesn't exist, it creates a new compute instance with that compute_name. For this process, you can specify compute_size and idle_time_before_shutdown_minutes.

  3. Compute Cluster

  • Set mode to compute-cluster.

  • Requires a compute_name.

    • If a compute cluster with the same name exists, it uses the existing cluster and ignores other parameters.

    • If a compute cluster with the same name doesn't exist, it creates a new compute cluster. Additional parameters can be used to configure this process.

Here is an example of how you can use the AzureMLStepOperatorSettings to define a compute instance:

from zenml.integrations.azure.flavors import AzureMLStepOperatorSettings

azureml_settings = AzureMLStepOperatorSettings(
    mode="compute-instance",
    compute_name="MyComputeInstance",
    compute_size="Standard_NC6s_v3",
)

@step(
    settings={
        "step_operator": azureml_settings
    }
)
def my_azureml_step():
    # YOUR STEP CODE
    ...
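
For comparison, the other two modes need little more than the mode itself (serverless) or the name of an existing cluster (compute-cluster). This is a minimal sketch; the cluster name is a placeholder.

from zenml.integrations.azure.flavors import AzureMLStepOperatorSettings

# Serverless compute (the default): all other parameters are ignored.
serverless_settings = AzureMLStepOperatorSettings(mode="serverless")

# Reuse an existing compute cluster by name: remaining parameters are ignored.
cluster_settings = AzureMLStepOperatorSettings(
    mode="compute-cluster",
    compute_name="my-existing-cluster",  # placeholder name of an existing cluster
)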

You can check out the AzureMLStepOperatorSettings SDK docs for a full list of available attributes and this docs page for more information on how to specify settings.

Enabling CUDA for GPU-backed hardware

Note that if you wish to use this step operator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. This requires some additional settings customization and is essential for CUDA to be enabled so the GPU can deliver its full acceleration.
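
As a rough illustration (not a substitute for the linked page), enabling GPU acceleration typically combines a GPU-sized compute target with a CUDA-enabled parent image. The image tag, compute name, and VM size below are assumptions you should adapt to your framework and subscription.

from zenml import step
from zenml.config import DockerSettings
from zenml.integrations.azure.flavors import AzureMLStepOperatorSettings

# Assumed CUDA-enabled base image; pick one matching your framework version.
docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime",
)

azureml_settings = AzureMLStepOperatorSettings(
    mode="compute-instance",
    compute_name="MyGPUInstance",     # placeholder name
    compute_size="Standard_NC6s_v3",  # GPU-backed VM size
)


@step(
    step_operator="<NAME>",
    settings={"docker": docker_settings, "step_operator": azureml_settings},
)
def gpu_trainer() -> None:
    # CUDA is available here if the parent image and VM size are GPU-ready.
    ...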
