Amazon SageMaker

Executing individual steps in SageMaker.

Last updated 22 days ago

SageMaker offers specialized compute instances to run your training jobs and has a comprehensive UI to track and manage your models and logs. ZenML's SageMaker step operator allows you to submit individual steps to be run on SageMaker compute instances.

When to use it

You should use the SageMaker step operator if:

  • one or more steps of your pipeline require computing resources (CPU, GPU, memory) that are not provided by your orchestrator.

  • you have access to SageMaker. If you're using a different cloud provider, take a look at the Vertex or AzureML step operators.

How to deploy it

Create a role in the IAM console that you want the jobs running in SageMaker to assume. This role should have at least the AmazonS3FullAccess and AmazonSageMakerFullAccess policies applied. Check here for a guide on how to set up this role.
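For a role to be assumable by SageMaker jobs, its trust policy has to name the SageMaker service as a principal. The following is a minimal sketch of that trust policy and of the two managed policies mentioned above, written as plain Python data so it can be inspected or printed. The helper name sagemaker_trust_policy and the constant REQUIRED_POLICY_ARNS are hypothetical names chosen for illustration; they are not part of ZenML or of any AWS SDK.

```python
import json

# The two AWS-managed policies named in the text above, as policy ARNs.
REQUIRED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
]


def sagemaker_trust_policy() -> dict:
    """Return a trust policy that lets the SageMaker service assume the role.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be compared with, or pasted into,
    # the trust policy editor of the role in the IAM console.
    print(json.dumps(sagemaker_trust_policy(), indent=2))
```

The sketch only builds the documents; it does not create anything in AWS. Creating the role and attaching the policies still happens in the IAM console as described above.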

How to use it

To use the SageMaker step operator, we need:

  • The ZenML aws integration installed. If you haven't done so, run

    zenml integration install aws

  • Docker installed and running.

  • An IAM role with the correct permissions. See the deployment section for detailed instructions.

  • An AWS container registry as part of your stack. Take a look here for a guide on how to set that up.

  • A remote artifact store as part of your stack. This is needed so that both your orchestration environment and SageMaker can read and write step artifacts. Check out the documentation page of the artifact store you want to use for more information on how to set that up and configure authentication for it.

  • An instance type that we want to execute our steps on. See here for a list of available instance types.

  • (Optional) An experiment that is used to group SageMaker runs. Check this guide to see how to create an experiment.
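Before registering anything, it can be useful to confirm that the command-line tools from the list above are reachable. The following is a small sketch of such a pre-flight check, assuming that both tools are exposed as executables named zenml and docker on PATH. The function missing_tools and the constant REQUIRED_TOOLS are hypothetical names for illustration.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables the prerequisites above rely on (assumed names on PATH).
REQUIRED_TOOLS = ("zenml", "docker")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH.

    The `which` parameter exists so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only checks that the executables exist. It does not verify that the Docker daemon is running, that the aws integration is installed, or that the IAM role, container registry, and artifact store are configured.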

There are two ways you can authenticate your orchestrator to AWS to be able to run steps on SageMaker:

The recommended way to authenticate your SageMaker step operator is by registering or using an existing AWS Service Connector and connecting it to your SageMaker step operator. The credentials configured for the connector must have permissions to create and manage SageMaker runs (e.g. the AmazonSageMakerFullAccess managed policy permissions). The SageMaker step operator uses the aws-generic resource type, so make sure to configure the connector accordingly:

zenml service-connector register <CONNECTOR_NAME> --type aws -i
zenml step-operator register <STEP_OPERATOR_NAME> \
    --flavor=sagemaker \
    --role=<SAGEMAKER_ROLE> \
    --instance_type=<INSTANCE_TYPE> \
#   --experiment_name=<EXPERIMENT_NAME> # optionally specify an experiment to assign this run to

zenml step-operator connect <STEP_OPERATOR_NAME> --connector <CONNECTOR_NAME>
zenml stack register <STACK_NAME> -s <STEP_OPERATOR_NAME> ... --set

If you don't connect your step operator to a service connector:

  • If using a local orchestrator: ZenML will try to implicitly authenticate to AWS via the default profile in your local AWS configuration file. Make sure this profile has permissions to create and manage SageMaker runs (e.g. the AmazonSageMakerFullAccess managed policy permissions).

  • If using a remote orchestrator: the remote environment in which the orchestrator runs needs to be able to implicitly authenticate to AWS and assume the IAM role specified when registering the SageMaker step operator. This is only possible if the orchestrator is also running in AWS and uses a form of implicit workload authentication like the IAM role of an EC2 instance. If this is not the case, you will need to use a service connector.

zenml step-operator register <STEP_OPERATOR_NAME> \
    --flavor=sagemaker \
    --role=<SAGEMAKER_ROLE> \
    --instance_type=<INSTANCE_TYPE> \
#   --experiment_name=<EXPERIMENT_NAME> # optionally specify an experiment to assign this run to

zenml stack register <STACK_NAME> -s <STEP_OPERATOR_NAME> ... --set
python run.py  # Authenticates with `default` profile in `~/.aws/config`

Once you have added the step operator to your active stack, you can use it to execute individual steps of your pipeline by specifying it in the @step decorator as follows:

from zenml import step


# Replace <STEP_OPERATOR_NAME> with the name used when registering the step operator.
@step(step_operator="<STEP_OPERATOR_NAME>")
def trainer() -> None:  # inputs and outputs omitted; declare your own here
    """Train a model."""
    # This step will be executed in SageMaker.

ZenML will build a Docker image called <CONTAINER_REGISTRY_URI>/zenml:<PIPELINE_NAME> which includes your code and use it to run your steps in SageMaker. Check out this page if you want to learn more about how ZenML builds these images and how you can customize them.

Additional configuration

For additional configuration of the SageMaker step operator, you can pass SagemakerStepOperatorSettings when defining or running your pipeline. Check out the SDK docs for a full list of available attributes and this docs page for more information on how to specify settings.

For more information and a full list of configurable attributes of the SageMaker step operator, check out the SDK Docs.

Enabling CUDA for GPU-backed hardware

Note that if you wish to use this step operator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. This requires some extra settings customization and is essential to enable CUDA so that the GPU can deliver its full acceleration.
