AWS Sagemaker Orchestrator
How to orchestrate pipelines with Amazon Sagemaker
You should use the Sagemaker orchestrator if:
- you're already using AWS.
- you're looking for a proven production-grade orchestrator.
- you're looking for a UI in which you can track your pipeline runs.
- you're looking for a managed solution for running your pipelines.
- you're looking for a serverless solution for running your pipelines.
To use the Sagemaker orchestrator, you first need to deploy ZenML to the cloud. We recommend deploying ZenML in the same region you plan to use for Sagemaker, although this is not required. You must also be connected to the remote ZenML server before using this stack component.
The only other thing necessary to use the ZenML Sagemaker orchestrator is enabling the relevant permissions for your particular role.
To quickly enable APIs and create the other resources needed for this integration, we will soon provide a Sagemaker stack recipe via our mlops-stacks recipe repository, which will help you set up the infrastructure with one click.
To use the Sagemaker orchestrator, we need:
- The ZenML aws and s3 integrations installed. If you haven't done so, run
zenml integration install aws s3
- An IAM role or user with the AmazonSageMakerFullAccess managed policy applied to it, as well as sagemaker.amazonaws.com added as a Principal Service. Full details on these permissions can be found here, or use the ZenML recipe (when available), which will set up the necessary permissions for you. The creation of this role is described in more detail in the instructions for using our
- The local client (whoever is running the pipeline) also needs the necessary permissions or role to launch Sagemaker jobs. (This would be covered by the AmazonSageMakerFullAccess policy suggested above.)
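The "Principal Service" requirement above corresponds to the trust policy attached to the IAM role. As a minimal sketch, the snippet below builds such a trust policy document using the standard IAM JSON layout. The function name and the constant are hypothetical helpers introduced for illustration; nothing here is ZenML-specific, and you should review the result against your own account's security requirements.

```python
import json

# ARN of the AWS managed policy named in the requirements above.
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def sagemaker_trust_policy() -> dict:
    """Return a trust policy that lets the Sagemaker service assume the role.

    Hypothetical helper: it only builds the JSON document and does not
    call AWS.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The service principal mentioned in the requirements list.
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be saved to a file or pasted elsewhere.
    print(json.dumps(sagemaker_trust_policy(), indent=2))
```

The printed JSON could be supplied as the trust relationship when creating the role, for example in the IAM console, with the managed policy attached afterwards.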
We can then register the orchestrator and use it in our active stack:
zenml orchestrator register <ORCHESTRATOR_NAME> --flavor=sagemaker ...
# Register and activate a stack with the new orchestrator
zenml stack register <STACK_NAME> -o <ORCHESTRATOR_NAME> ... --set
You can now run any ZenML pipeline using the Sagemaker orchestrator.
For additional configuration of the Sagemaker orchestrator, you can pass SagemakerOrchestratorSettings, which allows you to configure (among others) the following attributes:
- instance_type: The instance type to use for the Sagemaker training job. (Defaults to
- processor_role: The IAM role to use for the Sagemaker processing job/step.
- volume_size_in_gb: The size of the volume to use for the Sagemaker training job. (Defaults to 30 GB.)
- max_runtime_in_seconds: The maximum runtime of the Sagemaker training job. (Defaults to 1 day or 86400 seconds.)
- processor_tags: Any tags you want to add to the particular step or pipeline as a whole.
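The attributes above could be collected as follows. This is a sketch under stated assumptions: the import path reflects the ZenML AWS integration layout and may differ between ZenML versions, the helper name build_sagemaker_settings is hypothetical, and every concrete value (instance type, tag) is illustrative rather than a recommendation.

```python
# Illustrative values for the attributes listed above.
SETTINGS_KWARGS = {
    "instance_type": "ml.m5.large",  # assumed value; pick a type available in your region
    "volume_size_in_gb": 30,  # same as the documented default
    "max_runtime_in_seconds": 24 * 60 * 60,  # 1 day, same as the documented default
    "processor_tags": {"team": "ml-platform"},  # hypothetical tag
}


def build_sagemaker_settings(**overrides):
    """Create a SagemakerOrchestratorSettings object from SETTINGS_KWARGS.

    The import happens inside the function so this module loads even
    without ZenML installed. The import path is an assumption; check it
    against the ZenML release you use.
    """
    from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import (
        SagemakerOrchestratorSettings,
    )

    # Keyword overrides take precedence over the illustrative defaults.
    return SagemakerOrchestratorSettings(**{**SETTINGS_KWARGS, **overrides})
```

With ZenML and the aws integration installed, calling build_sagemaker_settings(processor_role="<IAM_ROLE_ARN>") would additionally set the processing role. The resulting object is typically passed through the settings argument of a step or pipeline decorator; the exact settings key depends on the ZenML version.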
Note that if you wish to use this orchestrator to run steps on a GPU, you will need to follow the instructions on this page to ensure that it works. Doing so requires some extra settings customization and is essential for enabling CUDA, without which the GPU cannot deliver its full acceleration.