AWS
A simple guide to create an AWS stack to run your ZenML pipelines
This page shows you how to quickly set up a minimal production stack on AWS. With just a few simple steps, you will set up an IAM role with specifically-scoped permissions that ZenML can use to authenticate with the relevant AWS resources.
To follow this guide, you need:
An active AWS account with necessary permissions for AWS S3, SageMaker, ECR, and ECS.
ZenML installed locally
AWS CLI installed and configured with your AWS credentials. You can follow the instructions in the AWS CLI documentation.
Once ready, navigate to the AWS console:
Choose an AWS region: In the AWS console, choose the region where you want to deploy your ZenML stack resources. Make note of the region name (e.g., us-east-1, eu-west-2) as you will need it in subsequent steps.
Create an IAM role:
For this, you'll need to find out your AWS account ID. You can find this by running:
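One common way to print the account ID with the AWS CLI (assuming your credentials are already configured) is:

```shell
# Print the 12-digit account ID of the currently configured identity
aws sts get-caller-identity --query Account --output text
```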
This will output your AWS account ID. Make a note of this as you will need it in the next steps. (If you're doing anything more esoteric with your AWS account and IAM roles, this might not work for you. The account ID here that we're trying to get is the root account ID that you use to log in to the AWS console.)
Then create a file named assume-role-policy.json
with the following content:
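A minimal trust policy along these lines allows principals in your account to assume the role; the placeholder must be replaced before use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```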
Make sure to replace the placeholder <YOUR_ACCOUNT_ID> with the actual AWS account ID that we found earlier.
Now create a new IAM role that ZenML will use to access AWS resources. We'll use zenml-role as the role name in this example, but feel free to choose something else. Run the following command to create the role:
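A sketch of the command, assuming the trust policy was saved as assume-role-policy.json in the current directory:

```shell
# Create the role using the trust policy defined in assume-role-policy.json
aws iam create-role \
  --role-name zenml-role \
  --assume-role-policy-document file://assume-role-policy.json
```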
Be sure to take note of the information that is output to the terminal, as you will need it in the next steps, especially the Role ARN.
Attach policies to the role:
Attach the following policies to the role to grant access to the necessary AWS services:
AmazonS3FullAccess
AmazonEC2ContainerRegistryFullAccess
AmazonSageMakerFullAccess
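Assuming the role is named zenml-role as above, the three managed policies can be attached like this:

```shell
aws iam attach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```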
If you have not already, install the AWS and S3 ZenML integrations:
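```shell
zenml integration install aws s3 -y
```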
Create an AWS Service Connector within ZenML. The service connector will allow ZenML and other ZenML components to authenticate themselves with AWS using the IAM role.
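A sketch of the registration command, using aws_connector as an example connector name:

```shell
zenml service-connector register aws_connector \
  --type aws \
  --auth-method iam-role \
  --role_arn=<ROLE_ARN> \
  --region=<YOUR_REGION> \
  --aws_access_key_id=<YOUR_ACCESS_KEY_ID> \
  --aws_secret_access_key=<YOUR_SECRET_ACCESS_KEY>
```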
Replace <ROLE_ARN> with the ARN of the IAM role you created in the previous step, <YOUR_REGION> with your chosen region, and use the AWS access key ID and secret access key that we noted down earlier.
Before you run anything within the ZenML CLI, create an AWS S3 bucket. If you already have one, you can skip this step. (Note: the bucket name should be unique, so you might need to try a few times to find a unique name.)
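One way to create the bucket, with your-unique-bucket-name as a placeholder you must replace:

```shell
# `mb` (make bucket) creates the bucket in the given region
aws s3 mb s3://your-unique-bucket-name --region <YOUR_REGION>
```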
Once this is done, you can create the ZenML stack component as follows:
Register an S3 Artifact Store with the connector:
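A sketch of the registration, assuming the connector was named aws_connector and using cloud_artifact_store as an example component name:

```shell
zenml artifact-store register cloud_artifact_store -f s3 \
  --path=s3://your-unique-bucket-name \
  --connector aws_connector
```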
A SageMaker domain is a central management unit for all SageMaker users and resources within a region. It provides a single sign-on (SSO) experience and enables users to create and manage SageMaker resources, such as notebooks, training jobs, and endpoints, within a collaborative environment.
When you create a SageMaker domain, you specify the configuration settings, such as the domain name, user profiles, and security settings. Each user within a domain gets their own isolated workspace, which includes a JupyterLab interface, a set of compute resources, and persistent storage.
The SageMaker orchestrator in ZenML requires a SageMaker domain to run pipelines because it leverages the SageMaker Pipelines service, which is part of the SageMaker ecosystem. SageMaker Pipelines allows you to define, execute, and manage end-to-end machine learning workflows using a declarative approach.
By creating a SageMaker domain, you establish the necessary environment and permissions for the SageMaker orchestrator to interact with SageMaker Pipelines and other SageMaker resources seamlessly. The domain acts as a prerequisite for using the SageMaker orchestrator in ZenML.
Once this is done, you can create the ZenML stack component as follows:
Register a SageMaker Pipelines orchestrator stack component:
You'll need the IAM role ARN that we noted down earlier to register the orchestrator. This is the 'execution role' ARN you need to pass to the orchestrator.
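A sketch of the registration command, using sagemaker-orchestrator as an example component name:

```shell
zenml orchestrator register sagemaker-orchestrator \
  --flavor=sagemaker \
  --region=<YOUR_REGION> \
  --execution_role=<ROLE_ARN>
```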
Note: The SageMaker orchestrator utilizes the AWS configuration for operation and does not require direct connection via a service connector for authentication, as it relies on your AWS CLI configurations or environment variables.
You'll need to create a repository in ECR. If you already have one, you can skip this step.
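Assuming a repository name of zenml, the repository can be created like this:

```shell
aws ecr create-repository --repository-name zenml --region <YOUR_REGION>
```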
Once this is done, you can create the ZenML stack component as follows:
Register an ECR container registry stack component:
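A sketch of the registration, assuming the connector and component names used earlier in this guide; the final command assembles the three components into a stack and sets it as active:

```shell
zenml container-registry register ecr-registry -f aws \
  --uri=<YOUR_ACCOUNT_ID>.dkr.ecr.<YOUR_REGION>.amazonaws.com \
  --connector aws_connector

# Combine the orchestrator, artifact store, and container registry
# into a stack and make it the active stack
zenml stack register aws-stack \
  -o sagemaker-orchestrator \
  -a cloud_artifact_store \
  -c ecr-registry \
  --set
```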
Just like that, you now have a fully working AWS stack ready to go. Feel free to take it for a spin by running a pipeline on it.
Define a ZenML pipeline:
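A minimal example pipeline, with hypothetical step names chosen for illustration; running it requires the ZenML integrations installed and the AWS stack active:

```python
from zenml import pipeline, step


@step
def load_data() -> dict:
    """Simulate loading some training data."""
    return {"features": [[1, 2], [3, 4]], "labels": [0, 1]}


@step
def train_model(data: dict) -> None:
    """Pretend to train a model on the loaded data."""
    total = sum(len(row) for row in data["features"])
    print(f"Trained a model on {total} feature values.")


@pipeline
def simple_ml_pipeline():
    data = load_data()
    train_model(data)


if __name__ == "__main__":
    simple_ml_pipeline()
```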
Save this code to run.py and execute it. The pipeline will use AWS S3 for artifact storage, Amazon SageMaker Pipelines for orchestration, and Amazon ECR for container registry.
Make sure you no longer need the resources before deleting them. The instructions and commands that follow are DESTRUCTIVE.
Delete any AWS resources you no longer use to avoid additional charges. You'll want to do the following:
Make sure to run these commands in the same AWS region where you created the resources.
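A sketch of the cleanup, assuming the resource names used earlier in this guide; double-check each name before running:

```shell
# Empty and delete the S3 bucket
aws s3 rb s3://your-unique-bucket-name --force

# Delete the SageMaker domain (find the ID with `aws sagemaker list-domains`)
aws sagemaker delete-domain --domain-id <DOMAIN_ID>

# Delete the ECR repository and all images in it
aws ecr delete-repository --repository-name zenml --force

# Detach the managed policies, then delete the IAM role
aws iam detach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam detach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam detach-role-policy --role-name zenml-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam delete-role --role-name zenml-role
```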
By running these cleanup commands, you will delete the S3 bucket, SageMaker domain, ECR repository, and IAM role, along with their associated policies. This will help you avoid any unnecessary charges for resources you no longer need.
Remember to be cautious when deleting resources and ensure that you no longer require them before running the deletion commands.
In this guide, we walked through the process of setting up an AWS stack with ZenML to run your machine learning pipelines in a scalable and production-ready environment. The key steps included:
Setting up credentials and the local environment by creating an IAM role with the necessary permissions.
Creating a ZenML service connector to authenticate with AWS services using the IAM role.
Configuring stack components, including an S3 artifact store, a SageMaker Pipelines orchestrator, and an ECR container registry.
Registering the stack components and creating a ZenML stack.
By following these steps, you can leverage the power of AWS services, such as S3 for artifact storage, SageMaker Pipelines for orchestration, and ECR for container management, all within the ZenML framework. This setup allows you to build, deploy, and manage machine learning pipelines efficiently and scale your workloads based on your requirements.
The benefits of using an AWS stack with ZenML include:
Scalability: Leverage the scalability of AWS services to handle large-scale machine learning workloads.
Reproducibility: Ensure reproducibility of your pipelines with versioned artifacts and containerized environments.
Collaboration: Enable collaboration among team members by using a centralized stack and shared resources.
Flexibility: Customize and extend your stack components based on your specific needs and preferences.
Now that you have a functional AWS stack set up with ZenML, you can explore more advanced features and capabilities offered by ZenML. Some next steps to consider:
By leveraging the power of AWS and ZenML, you can streamline your machine learning workflows, improve collaboration, and deploy production-ready pipelines with ease. What follows is a set of best practices for using your AWS stack with ZenML.
When working with an AWS stack in ZenML, consider the following best practices to optimize your workflow, enhance security, and improve cost-efficiency. These are all things you might want to do or amend in your own setup once you have tried running some pipelines on your AWS stack.
These tags will help you with billing and cost allocation tracking and also with any cleanup efforts.
Create a JSON file (e.g., budget-config.json) defining the budget:
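A sketch of the file, with the budget name and a USD 100 monthly limit as example values:

```json
{
  "BudgetName": "zenml-monthly-budget",
  "BudgetLimit": {
    "Amount": "100",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
```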
Create the cost budget:
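Assuming the file above was saved as budget-config.json:

```shell
aws budgets create-budget \
  --account-id <YOUR_ACCOUNT_ID> \
  --budget file://budget-config.json
```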
Set up cost allocation tags to track expenses related to your ZenML projects:
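One possible way to activate a user-defined tag (using a hypothetical Project tag key) is via the Cost Explorer CLI; note that tags only become available for activation after they have been applied to resources, and activation can take up to 24 hours:

```shell
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=Project,Status=Active
```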
To enable Warm Pools, use the SagemakerOrchestratorSettings class:
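A sketch of the configuration; the import path and the settings key ("orchestrator.sagemaker") follow recent ZenML versions and may vary with the version you have installed:

```python
from zenml import pipeline
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import (
    SagemakerOrchestratorSettings,
)

# Keep the instance warm for 5 minutes after each job completes
sagemaker_settings = SagemakerOrchestratorSettings(
    keep_alive_period_in_seconds=300,
)


@pipeline(settings={"orchestrator.sagemaker": sagemaker_settings})
def my_pipeline():
    ...
```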
This configuration keeps instances warm for 5 minutes after each job completes, allowing subsequent jobs to start faster if initiated within this timeframe. The reduced startup time can be particularly beneficial for iterative development processes or frequently run pipelines.
By following these best practices and implementing the provided examples, you can create a more secure, efficient, and cost-effective AWS stack for your ZenML projects. Remember to regularly review and update your practices as your projects evolve and as AWS introduces new features and services.
An S3 Artifact Store is used for storing and versioning data flowing through your pipelines.
More details can be found in the ZenML artifact store documentation.
A SageMaker orchestrator is the compute backend to run your pipelines.
Before you run anything within the ZenML CLI, head on over to AWS and create a SageMaker domain (skip this if you already have one). The instructions for creating a domain can be found in the AWS documentation.
More details can be found in the ZenML orchestrator documentation.
An ECR container registry is used to store Docker images for your pipelines.
More details can be found in the ZenML container registry documentation.
Dive deeper into ZenML's production guide to learn best practices for deploying and managing production-ready pipelines.
Explore ZenML's integrations with other popular tools and frameworks in the machine learning ecosystem.
Join the ZenML Slack community to connect with other users, ask questions, and get support.
Always adhere to the principle of least privilege when setting up IAM roles. Only grant the minimum permissions necessary for your ZenML pipelines to function. Regularly review and audit your IAM roles and policies to ensure they remain appropriate and secure.
Implement a consistent tagging strategy for all of the AWS resources that you use for your pipelines. For example, if you have S3 as an artifact store in your stack, you should tag it like shown below:
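A sketch of tagging the artifact-store bucket, with hypothetical Project and Environment tag keys:

```shell
aws s3api put-bucket-tagging \
  --bucket your-unique-bucket-name \
  --tagging 'TagSet=[{Key=Project,Value=zenml},{Key=Environment,Value=production}]'
```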
Use AWS Cost Explorer and AWS Budgets to monitor and manage your spending. To create a cost budget:
SageMaker Warm Pools can significantly reduce the startup time of your pipeline steps, leading to faster iterations and improved development efficiency. This feature keeps compute instances in a "warm" state, ready to quickly start new jobs.
Regularly back up your critical data and configurations. For S3, enable versioning and consider using S3 Cross-Region Replication for disaster recovery.