Amazon SageMaker

Executing individual steps in SageMaker.

SageMaker offers specialized compute instances to run your training jobs and has a comprehensive UI to track and manage your models and logs. ZenML's SageMaker step operator allows you to submit individual steps to be run on Sagemaker compute instances.

When to use it

You should use the SageMaker step operator if:

  • one or more steps of your pipeline require computing resources (CPU, GPU, memory) that are not provided by your orchestrator.

  • you have access to SageMaker. If you're using a different cloud provider, take a look at the Vertex or AzureML step operators.

How to deploy it

  • Create a role in the IAM console that you want the jobs running in SageMaker to assume. This role should at least have the AmazonS3FullAccess and AmazonSageMakerFullAccess policies applied. Check here for a guide on how to set up this role.

Infrastructure Deployment

A Sagemaker step operator can be deployed directly from the ZenML CLI:

zenml orchestrator deploy sagemaker_step_operator --flavor=sagemaker --provider=aws ...

You can pass other configurations specific to the stack components as key-value arguments. If you don't provide a name, a random one is generated for you. For more information about how to work use the CLI for this, please refer to the dedicated documentation section.

How to use it

To use the SageMaker step operator, we need:

  • The ZenML aws integration installed. If you haven't done so, run

    zenml integration install aws
  • Docker installed and running.

  • An IAM role with the correct permissions. See the deployment section for detailed instructions.

  • An AWS container registry as part of our stack. Take a look here for a guide on how to set that up.

  • A remote artifact store as part of your stack. This is needed so that both your orchestration environment and SageMaker can read and write step artifacts. Check out the documentation page of the artifact store you want to use for more information on how to set that up and configure authentication for it.

  • An instance type that we want to execute our steps on. See here for a list of available instance types.

  • (Optional) An experiment that is used to group SageMaker runs. Check this guide to see how to create an experiment.

There are two ways you can authenticate your orchestrator to AWS to be able to run steps on SageMaker:

The recommended way to authenticate your SageMaker step operator is by registering or using an existing AWS Service Connector and connecting it to your SageMaker step operator. The credentials configured for the connector must have permissions to create and manage SageMaker runs (e.g. the AmazonSageMakerFullAccess managed policy permissions). The SageMaker step operator uses these aws-generic resource type, so make sure to configure the connector accordingly:

zenml service-connector register <CONNECTOR_NAME> --type aws -i
zenml step-operator register <STEP_OPERATOR_NAME> \
    --flavor=sagemaker \
    --role=<SAGEMAKER_ROLE> \
    --instance_type=<INSTANCE_TYPE> \
#   --experiment_name=<EXPERIMENT_NAME> # optionally specify an experiment to assign this run to

zenml step-operator connect <STEP_OPERATOR_NAME> --connector <CONNECTOR_NAME>
zenml stack register <STACK_NAME> -s <STEP_OPERATOR_NAME> ... --set

Once you added the step operator to your active stack, you can use it to execute individual steps of your pipeline by specifying it in the @step decorator as follows:

from zenml import step


@step(step_operator= <NAME>)
def trainer(...) -> ...:
    """Train a model."""
    # This step will be executed in SageMaker.

ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME> which includes your code and use it to run your steps in SageMaker. Check out this page if you want to learn more about how ZenML builds these images and how you can customize them.

Additional configuration

For additional configuration of the SageMaker step operator, you can pass SagemakerStepOperatorSettings when defining or running your pipeline. Check out the SDK docs for a full list of available attributes and this docs page for more information on how to specify settings.

For more information and a full list of configurable attributes of the SageMaker step operator, check out the API Docs .

Enabling CUDA for GPU-backed hardware

Note that if you wish to use this step operator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. It requires adding some extra settings customization and is essential to enable CUDA for the GPU to give its full acceleration.

Last updated