Run on AWS
A simple guide to creating an AWS stack to run your ZenML pipelines
This page aims to quickly set up a minimal production stack on AWS. With just a few simple steps, you will set up an IAM role with specifically-scoped permissions that ZenML can use to authenticate with the relevant AWS resources.
Would you like to skip ahead and deploy a full AWS ZenML cloud stack right away?
Check out the in-browser stack deployment wizard, the stack registration wizard, or the ZenML AWS Terraform module for a shortcut on how to deploy & register this stack.
1) Set up credentials and local environment
To follow this guide, you need:
An active AWS account with necessary permissions for AWS S3, SageMaker, ECR, and ECS.
ZenML installed
AWS CLI installed and configured with your AWS credentials. You can follow the instructions here.
Once ready, navigate to the AWS console:
Choose an AWS region
In the AWS console, choose the region where you want to deploy your ZenML stack resources. Make note of the region name (e.g., us-east-1, eu-west-2, etc.) as you will need it in subsequent steps.
Create an IAM role
For this, you'll need to find out your AWS account ID. You can find this by running:
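One way to look it up is with the AWS CLI's `sts get-caller-identity` command:

```shell
# Print the 12-digit account ID of the credentials the CLI is currently using
aws sts get-caller-identity --query Account --output text
```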
This will output your AWS account ID. Make a note of this as you will need it in the next steps. (If you're doing anything more esoteric with your AWS account and IAM roles, this might not work for you. The account ID here that we're trying to get is the root account ID that you use to log in to the AWS console.)
Then create a file named assume-role-policy.json with the following content:
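A trust policy along the following lines should work; it allows principals in your own account to assume the role. It is written here as a shell heredoc so you can paste it straight into a terminal:

```shell
# Write the trust policy that lets your own AWS account assume the role.
# Replace <YOUR_ACCOUNT_ID> with the 12-digit ID from the previous step.
cat > assume-role-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```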
Make sure to replace the placeholder <YOUR_ACCOUNT_ID> with the actual AWS account ID you found earlier.
Now create a new IAM role that ZenML will use to access AWS resources. We'll use zenml-role as the role name in this example, but feel free to choose something else if you prefer. Run the following command to create the role:
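Assuming the trust policy file created above, role creation could look like this:

```shell
# Create the IAM role using the trust policy defined earlier
aws iam create-role \
  --role-name zenml-role \
  --assume-role-policy-document file://assume-role-policy.json
```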
Be sure to take note of the information that is output to the terminal, as you will need it in the next steps, especially the Role ARN.
Attach policies to the role
Attach the following policies to the role to grant access to the necessary AWS services:
AmazonS3FullAccess
AmazonEC2ContainerRegistryFullAccess
AmazonSageMakerFullAccess
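The three AWS-managed policies above can be attached with one `attach-role-policy` call each:

```shell
# Attach the managed policies granting S3, ECR, and SageMaker access
aws iam attach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```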
If you have not already, install the AWS and S3 ZenML integrations:
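Both integrations can be installed in one command:

```shell
# Install the AWS and S3 integrations (-y skips the confirmation prompt)
zenml integration install aws s3 -y
```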
2) Create a Service Connector within ZenML
Create an AWS Service Connector within ZenML. The service connector will allow ZenML and other ZenML components to authenticate themselves with AWS using the IAM role.
Replace <ROLE_ARN> with the ARN of the IAM role you created in the previous step and <YOUR_REGION> with the region you chose earlier, and use your own AWS access key ID and secret access key.
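A registration command could look like the following; the connector name aws-connector is just an example, and the access-key placeholders stand for your own credentials:

```shell
# Register a service connector that assumes the IAM role created earlier
zenml service-connector register aws-connector \
  --type aws \
  --auth-method iam-role \
  --role_arn=<ROLE_ARN> \
  --region=<YOUR_REGION> \
  --aws_access_key_id=<YOUR_ACCESS_KEY_ID> \
  --aws_secret_access_key=<YOUR_SECRET_ACCESS_KEY>
```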
3) Create Stack Components
Artifact Store (S3)
An artifact store is used for storing and versioning data flowing through your pipelines.
Before you run anything within the ZenML CLI, create an AWS S3 bucket. If you already have one, you can skip this step. (Note: the bucket name should be unique, so you might need to try a few times to find a unique name.)
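Bucket creation via the AWS CLI could look like this (the bucket name below is a placeholder; pick your own globally unique name):

```shell
# Create the S3 bucket for pipeline artifacts.
# Note: for us-east-1, omit the --create-bucket-configuration flag.
aws s3api create-bucket \
  --bucket your-bucket-name \
  --region <YOUR_REGION> \
  --create-bucket-configuration LocationConstraint=<YOUR_REGION>
```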
Once this is done, you can create the ZenML stack component as follows:
Register an S3 Artifact Store with the connector
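For example, with the bucket and connector names used as placeholders here (substitute your own):

```shell
# Register the S3 artifact store and link it to the service connector
zenml artifact-store register cloud_artifact_store -f s3 \
  --path=s3://your-bucket-name \
  --connector aws-connector
```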
More details here.
Orchestrator (SageMaker Pipelines)
An orchestrator is the compute backend to run your pipelines.
Before you run anything within the ZenML CLI, head over to AWS and create a SageMaker domain (skip this step if you already have one). The instructions for creating a domain can be found in the AWS core documentation.
A SageMaker domain is a central management unit for all SageMaker users and resources within a region. It provides a single sign-on (SSO) experience and enables users to create and manage SageMaker resources, such as notebooks, training jobs, and endpoints, within a collaborative environment.
When you create a SageMaker domain, you specify the configuration settings, such as the domain name, user profiles, and security settings. Each user within a domain gets their own isolated workspace, which includes a JupyterLab interface, a set of compute resources, and persistent storage.
The SageMaker orchestrator in ZenML requires a SageMaker domain to run pipelines because it leverages the SageMaker Pipelines service, which is part of the SageMaker ecosystem. SageMaker Pipelines allows you to define, execute, and manage end-to-end machine learning workflows using a declarative approach.
By creating a SageMaker domain, you establish the necessary environment and permissions for the SageMaker orchestrator to interact with SageMaker Pipelines and other SageMaker resources seamlessly. The domain acts as a prerequisite for using the SageMaker orchestrator in ZenML.
Once this is done, you can create the ZenML stack component as follows:
Register a SageMaker Pipelines orchestrator stack component:
You'll need the IAM role ARN that we noted down earlier to register the orchestrator. This is the 'execution role' ARN you need to pass to the orchestrator.
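A registration command could look like the following; the orchestrator name is an example:

```shell
# Register a SageMaker orchestrator, passing the IAM role as the execution role
zenml orchestrator register sagemaker-orchestrator \
  --flavor=sagemaker \
  --region=<YOUR_REGION> \
  --execution_role=<ROLE_ARN>
```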
Note: The SageMaker orchestrator utilizes the AWS configuration for operation and does not require direct connection via a service connector for authentication, as it relies on your AWS CLI configurations or environment variables.
More details here.
Container Registry (ECR)
A container registry is used to store Docker images for your pipelines.
You'll need to create a repository in ECR. If you already have one, you can skip this step.
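The repository can be created with the AWS CLI; the repository name zenml below is just an example:

```shell
# Create an ECR repository to hold the Docker images ZenML builds
aws ecr create-repository \
  --repository-name zenml \
  --region <YOUR_REGION>
```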
Once this is done, you can create the ZenML stack component as follows:
Register an ECR container registry stack component:
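For example, assuming the connector name used earlier in this guide (substitute your own account ID, region, and names):

```shell
# Register the ECR container registry and link it to the service connector
zenml container-registry register ecr-registry \
  --flavor=aws \
  --uri=<YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_REGION>.amazonaws.com \
  --connector aws-connector
```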
More details here.
4) Create stack
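A minimal sketch of registering and activating the stack, assuming example component names like cloud_artifact_store, sagemaker-orchestrator, and ecr-registry (substitute whatever names you registered):

```shell
# Assemble the components into a stack and set it as the active stack
zenml stack register aws-stack \
  -a cloud_artifact_store \
  -o sagemaker-orchestrator \
  -c ecr-registry \
  --set
```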
In case you want to also add any other stack components to this stack, feel free to do so.
5) And you're already done!
Just like that, you now have a fully working AWS stack ready to go. Feel free to take it for a spin by running a pipeline on it.
Define a ZenML pipeline:
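A minimal example pipeline could look like this; the step and pipeline names are illustrative:

```python
from zenml import pipeline, step


@step
def hello_step() -> str:
    """Produce a greeting string as a pipeline artifact."""
    return "Hello from SageMaker!"


@step
def print_step(message: str) -> None:
    """Consume the artifact produced by the previous step."""
    print(message)


@pipeline
def aws_hello_pipeline():
    message = hello_step()
    print_step(message)


if __name__ == "__main__":
    aws_hello_pipeline()
```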
Save this code to run.py and execute it. The pipeline will use AWS S3 for artifact storage, Amazon SageMaker Pipelines for orchestration, and Amazon ECR as the container registry.
Read more in the production guide.
Cleanup
Make sure you no longer need the resources before deleting them. The instructions and commands that follow are DESTRUCTIVE.
Delete any AWS resources you no longer use to avoid additional charges. You'll want to do the following:
Make sure to run these commands in the same AWS region where you created the resources.
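The cleanup could look like the following, assuming the example resource names used earlier in this guide (substitute your own):

```shell
# Empty and delete the S3 bucket
aws s3 rm s3://your-bucket-name --recursive
aws s3api delete-bucket --bucket your-bucket-name

# Delete the ECR repository along with any images it contains
aws ecr delete-repository --repository-name zenml --force

# Detach the managed policies, then delete the IAM role
aws iam detach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam detach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam detach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam delete-role --role-name zenml-role

# Delete the SageMaker domain (look up the ID with `aws sagemaker list-domains`)
aws sagemaker delete-domain --domain-id <DOMAIN_ID>
```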
By running these cleanup commands, you will delete the S3 bucket, SageMaker domain, ECR repository, and IAM role, along with their associated policies. This will help you avoid any unnecessary charges for resources you no longer need.
Remember to be cautious when deleting resources and ensure that you no longer require them before running the deletion commands.
Conclusion
In this guide, we walked through the process of setting up an AWS stack with ZenML to run your machine learning pipelines in a scalable and production-ready environment. The key steps included:
Setting up credentials and the local environment by creating an IAM role with the necessary permissions.
Creating a ZenML service connector to authenticate with AWS services using the IAM role.
Configuring stack components, including an S3 artifact store, a SageMaker Pipelines orchestrator, and an ECR container registry.
Registering the stack components and creating a ZenML stack.
By following these steps, you can leverage the power of AWS services, such as S3 for artifact storage, SageMaker Pipelines for orchestration, and ECR for container management, all within the ZenML framework. This setup allows you to build, deploy, and manage machine learning pipelines efficiently and scale your workloads based on your requirements.
The benefits of using an AWS stack with ZenML include:
Scalability: Leverage the scalability of AWS services to handle large-scale machine learning workloads.
Reproducibility: Ensure reproducibility of your pipelines with versioned artifacts and containerized environments.
Collaboration: Enable collaboration among team members by using a centralized stack and shared resources.
Flexibility: Customize and extend your stack components based on your specific needs and preferences.
Now that you have a functional AWS stack set up with ZenML, you can explore more advanced features and capabilities offered by ZenML. Some next steps to consider:
Dive deeper into ZenML's production guide to learn best practices for deploying and managing production-ready pipelines.
Explore ZenML's integrations with other popular tools and frameworks in the machine learning ecosystem.
Join the ZenML community to connect with other users, ask questions, and get support.
By leveraging the power of AWS and ZenML, you can streamline your machine learning workflows, improve collaboration, and deploy production-ready pipelines with ease. What follows is a set of best practices for using your AWS stack with ZenML.
Best Practices for Using an AWS Stack with ZenML
When working with an AWS stack in ZenML, consider the following best practices to optimize your workflow, enhance security, and improve cost-efficiency. These are all things you might want to do or amend in your own setup once you have tried running some pipelines on your AWS stack.
Use IAM Roles and Least Privilege Principle
Always adhere to the principle of least privilege when setting up IAM roles. Only grant the minimum permissions necessary for your ZenML pipelines to function. Regularly review and audit your IAM roles to ensure they remain appropriate and secure.
Leverage AWS Resource Tagging
Implement a consistent tagging strategy for all of your AWS resources that you use for your pipelines. For example, if you have S3 as an artifact store in your stack, you should tag it like shown below:
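For instance, the tag keys and values below are examples; choose a scheme that fits your organization:

```shell
# Tag the artifact-store bucket for cost allocation and cleanup tracking
aws s3api put-bucket-tagging --bucket your-bucket-name \
  --tagging 'TagSet=[{Key=Project,Value=zenml},{Key=Environment,Value=production}]'
```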
These tags will help you with billing and cost allocation tracking and also with any cleanup efforts.
Implement Cost Management Strategies
Use AWS Cost Explorer and AWS Budgets to monitor and manage your spending. To create a cost budget:
Create a JSON file (e.g., budget-config.json) defining the budget:
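A minimal budget definition might look like this; the budget name and the $100 monthly limit are example values (written as a shell heredoc for easy pasting):

```shell
# Define a simple monthly cost budget; adjust the name and amount to taste
cat > budget-config.json <<'EOF'
{
  "BudgetName": "zenml-monthly-budget",
  "BudgetLimit": {
    "Amount": "100",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
EOF
```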
Create the cost budget:
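Assuming the budget-config.json file defined above:

```shell
# Create the budget in AWS Budgets for your account
aws budgets create-budget \
  --account-id <YOUR_ACCOUNT_ID> \
  --budget file://budget-config.json
```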
Set up cost allocation tags to track expenses related to your ZenML projects:
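One way to do this, assuming a user-defined tag key like Project is already applied to your resources, is via the Cost Explorer CLI (available in recent AWS CLI versions):

```shell
# Activate the Project tag key for cost allocation reporting
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=Project,Status=Active
```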
Use Warm Pools for your SageMaker Pipelines
Warm Pools in SageMaker can significantly reduce the startup time of your pipeline steps, leading to faster iterations and improved development efficiency. This feature keeps compute instances in a "warm" state, ready to quickly start new jobs.
To enable Warm Pools, use the SagemakerOrchestratorSettings class:
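A sketch of the configuration, assuming the `keep_alive_period_in_seconds` setting and the `"orchestrator.sagemaker"` settings key (the exact key may vary between ZenML versions):

```python
from zenml import pipeline
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import (
    SagemakerOrchestratorSettings,
)

# Keep SageMaker instances warm for 5 minutes (300 seconds) after each job
sagemaker_settings = SagemakerOrchestratorSettings(
    keep_alive_period_in_seconds=300,
)


@pipeline(settings={"orchestrator.sagemaker": sagemaker_settings})
def my_pipeline():
    ...
```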
This configuration keeps instances warm for 5 minutes after each job completes, allowing subsequent jobs to start faster if initiated within this timeframe. The reduced startup time can be particularly beneficial for iterative development processes or frequently run pipelines.
Implement a Robust Backup Strategy
Regularly backup your critical data and configurations. For S3, enable versioning and consider using cross-region replication for disaster recovery.
By following these best practices and implementing the provided examples, you can create a more secure, efficient, and cost-effective AWS stack for your ZenML projects. Remember to regularly review and update your practices as your projects evolve and as AWS introduces new features and services.