AWS

A simple guide to create an AWS stack to run your ZenML pipelines

This page shows you how to quickly set up a minimal production stack on AWS. With just a few simple steps, you will create an IAM role with specifically-scoped permissions that ZenML can use to authenticate with the relevant AWS resources.

Would you like to skip ahead and deploy a full AWS ZenML cloud stack right away?

Check out the in-browser stack deployment wizard, the stack registration wizard, or the ZenML AWS Terraform module for a shortcut on how to deploy & register this stack.

1) Set up credentials and local environment

To follow this guide, you need:

  • An active AWS account with the necessary permissions for AWS S3, SageMaker, and ECR.

  • ZenML installed

  • The AWS CLI installed and configured with your AWS credentials. You can follow the instructions in the AWS CLI documentation.

Once ready, navigate to the AWS console:

  1. Choose an AWS region: In the AWS console, choose the region where you want to deploy your ZenML stack resources. Make note of the region name (e.g., us-east-1, eu-west-2, etc.) as you will need it in subsequent steps.

  2. Create an IAM role:

For this, you'll need to find out your AWS account ID. You can find this by running:

aws sts get-caller-identity --query Account --output text

This will output your AWS account ID; make a note of it, as you will need it in the next steps. (If you are doing anything more esoteric with your AWS account and IAM roles, this might not work for you. The account ID we are after here is the one for the account you use to log in to the AWS console.)
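
Equivalently, if you prefer Python, a minimal boto3 sketch (assuming boto3 is installed and your AWS credentials are configured):

import boto3

# Returns the account ID of the credentials currently configured.
account_id = boto3.client("sts").get_caller_identity()["Account"]
print(account_id)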

Then create a file named assume-role-policy.json with the following content:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:root",
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Make sure to replace the <YOUR_ACCOUNT_ID> placeholder with the AWS account ID you found earlier.

Now create a new IAM role that ZenML will use to access AWS resources. We'll use zenml-role as the role name in this example, but feel free to choose something else. Run the following command to create the role:

aws iam create-role --role-name zenml-role --assume-role-policy-document file://assume-role-policy.json

Be sure to take note of the information that is output to the terminal, as you will need it in the next steps, especially the Role ARN.
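
If you are scripting the setup instead, here is a hedged boto3 sketch of the same step, which also captures the Role ARN you will need later (it reads the policy file created above):

import boto3

iam = boto3.client("iam")
with open("assume-role-policy.json") as f:
    trust_policy = f.read()

role = iam.create_role(
    RoleName="zenml-role",
    AssumeRolePolicyDocument=trust_policy,
)
print(role["Role"]["Arn"])  # the Role ARN needed in later steps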

  3. Attach policies to the role:

Attach the following policies to the role to grant access to the necessary AWS services:

  • AmazonS3FullAccess

  • AmazonEC2ContainerRegistryFullAccess

  • AmazonSageMakerFullAccess

aws iam attach-role-policy --role-name zenml-role --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name zenml-role --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy --role-name zenml-role --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
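
The equivalent step in Python, as a hedged boto3 sketch:

import boto3

iam = boto3.client("iam")
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
):
    # Grant the role access to S3, ECR, and SageMaker.
    iam.attach_role_policy(RoleName="zenml-role", PolicyArn=policy_arn)
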
  4. If you have not already, install the AWS and S3 ZenML integrations:

zenml integration install aws s3 -y

2) Create a Service Connector within ZenML

Create an AWS Service Connector within ZenML. The service connector allows ZenML and its stack components to authenticate with AWS using the IAM role.

zenml service-connector register aws_connector \
  --type aws \
  --auth-method iam-role \
  --role_arn=<ROLE_ARN> \
  --region=<YOUR_REGION> \
  --aws_access_key_id=<YOUR_ACCESS_KEY_ID> \
  --aws_secret_access_key=<YOUR_SECRET_ACCESS_KEY>

Replace <ROLE_ARN> with the ARN of the IAM role you created in the previous step and <YOUR_REGION> with your chosen region, and supply the AWS access key ID and secret access key from your configured AWS credentials. Once registered, you can check that the connector has access by running zenml service-connector verify aws_connector.

3) Create Stack Components

Artifact Store (S3)

An artifact store is used for storing and versioning data flowing through your pipelines.

  1. Before you run anything within the ZenML CLI, create an AWS S3 bucket. If you already have one, you can skip this step. (Note: S3 bucket names must be globally unique, so you may need to try a few names before you find a free one.)

aws s3api create-bucket --bucket your-bucket-name
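
Note that outside of us-east-1, bucket creation must also include a location constraint. A minimal boto3 sketch that handles both cases (the bucket name and region are placeholders):

import boto3

region = "<YOUR_REGION>"
s3 = boto3.client("s3", region_name=region)
if region == "us-east-1":
    # us-east-1 is the default and must NOT be passed as a LocationConstraint.
    s3.create_bucket(Bucket="your-bucket-name")
else:
    s3.create_bucket(
        Bucket="your-bucket-name",
        CreateBucketConfiguration={"LocationConstraint": region},
    )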

Once this is done, you can create the ZenML stack component as follows:

  2. Register an S3 Artifact Store with the connector:

zenml artifact-store register cloud_artifact_store -f s3 --path=s3://your-bucket-name --connector aws_connector

Orchestrator (SageMaker Pipelines)

An orchestrator is the compute backend used to run your pipelines.

  1. Before you run anything within the ZenML CLI, head over to AWS and create a SageMaker domain (skip this step if you already have one). The instructions for creating a domain can be found in the AWS core documentation; a programmatic sketch also follows the explanation below.

A SageMaker domain is a central management unit for all SageMaker users and resources within a region. It provides a single sign-on (SSO) experience and enables users to create and manage SageMaker resources, such as notebooks, training jobs, and endpoints, within a collaborative environment.

When you create a SageMaker domain, you specify the configuration settings, such as the domain name, user profiles, and security settings. Each user within a domain gets their own isolated workspace, which includes a JupyterLab interface, a set of compute resources, and persistent storage.

The SageMaker orchestrator in ZenML requires a SageMaker domain to run pipelines because it leverages the SageMaker Pipelines service, which is part of the SageMaker ecosystem. SageMaker Pipelines allows you to define, execute, and manage end-to-end machine learning workflows using a declarative approach.

By creating a SageMaker domain, you establish the necessary environment and permissions for the SageMaker orchestrator to interact with SageMaker Pipelines and other SageMaker resources seamlessly. The domain acts as a prerequisite for using the SageMaker orchestrator in ZenML.
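
If you would rather create the domain programmatically, here is a hedged boto3 sketch; the domain name is a hypothetical choice, and the VPC, subnet, and execution-role values are placeholders you must supply:

import boto3

sagemaker = boto3.client("sagemaker", region_name="<YOUR_REGION>")
response = sagemaker.create_domain(
    DomainName="zenml-domain",  # hypothetical domain name
    AuthMode="IAM",
    DefaultUserSettings={"ExecutionRole": "<ROLE_ARN>"},
    SubnetIds=["<SUBNET_ID>"],
    VpcId="<VPC_ID>",
)
print(response["DomainArn"])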

Once this is done, you can create the ZenML stack component as follows:

  2. Register a SageMaker Pipelines orchestrator stack component:

You'll need the IAM role ARN that you noted down earlier to register the orchestrator; this is the 'execution role' ARN you pass to the orchestrator.

zenml orchestrator register sagemaker-orchestrator --flavor=sagemaker --region=<YOUR_REGION> --execution_role=<ROLE_ARN>

Note: The SageMaker orchestrator relies on your AWS CLI configuration or environment variables for authentication, so it does not need to be connected to the service connector directly.

Container Registry (ECR)

A container registry is used to store Docker images for your pipelines.

  1. You'll need to create a repository in ECR. If you already have one, you can skip this step.

aws ecr create-repository --repository-name zenml --region <YOUR_REGION>
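
The same step in Python, as a hedged boto3 sketch that also prints the repository URI used in the registration command below:

import boto3

ecr = boto3.client("ecr", region_name="<YOUR_REGION>")
repo = ecr.create_repository(repositoryName="zenml")
# The URI has the form <ACCOUNT_ID>.dkr.ecr.<YOUR_REGION>.amazonaws.com/zenml
print(repo["repository"]["repositoryUri"])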

Once this is done, you can create the ZenML stack component as follows:

  2. Register an ECR container registry stack component:

zenml container-registry register ecr-registry --flavor=aws --uri=<ACCOUNT_ID>.dkr.ecr.<YOUR_REGION>.amazonaws.com --connector aws_connector

4) Create stack

Combine the three stack components and you have your AWS stack. Feel free to add any other stack components of your choice as well.

export STACK_NAME=aws_stack
export ORCHESTRATOR_NAME=sagemaker-orchestrator
export ARTIFACT_STORE_NAME=cloud_artifact_store
export CONTAINER_REGISTRY_NAME=ecr-registry

zenml stack register ${STACK_NAME} -o ${ORCHESTRATOR_NAME} \
    -a ${ARTIFACT_STORE_NAME} -c ${CONTAINER_REGISTRY_NAME} --set

5) And you're already done!

Just like that, you now have a fully working AWS stack ready to go. Feel free to take it for a spin by running a pipeline on it.

Define a ZenML pipeline:

from zenml import pipeline, step

@step
def hello_world() -> str:
    return "Hello from SageMaker!"

@pipeline
def aws_sagemaker_pipeline():
    hello_world()

if __name__ == "__main__":
    aws_sagemaker_pipeline()

Save this code to run.py and execute it. The pipeline will use AWS S3 for artifact storage, SageMaker Pipelines for orchestration, and Amazon ECR as the container registry. Read more in the production guide.

python run.py
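
After the run has been submitted, you can inspect it from Python with the ZenML client. A small sketch, assuming the pipeline name from run.py (response attributes may vary slightly across ZenML versions):

from zenml.client import Client

# Fetch the latest run of the pipeline we just executed.
run = Client().get_pipeline("aws_sagemaker_pipeline").last_run
print(run.status)

# Load the hello_world step's output from the S3 artifact store.
print(run.steps["hello_world"].output.load())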

Cleanup

Make sure you no longer need the resources before deleting them. The instructions and commands that follow are DESTRUCTIVE.

Delete any AWS resources you no longer use to avoid additional charges. You'll want to do the following:

# delete the S3 bucket
aws s3 rm s3://your-bucket-name --recursive
aws s3api delete-bucket --bucket your-bucket-name

# delete the SageMaker domain
aws sagemaker delete-domain --domain-id <DOMAIN_ID>

# delete the ECR repository
aws ecr delete-repository --repository-name zenml --force

# detach policies from the IAM role
aws iam detach-role-policy --role-name zenml-role --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam detach-role-policy --role-name zenml-role --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam detach-role-policy --role-name zenml-role --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

# delete the IAM role
aws iam delete-role --role-name zenml-role

Make sure to run these commands in the same AWS region where you created the resources.

By running these cleanup commands, you will delete the S3 bucket, SageMaker domain, ECR repository, and IAM role, along with their associated policies, helping you avoid unnecessary charges for resources you no longer need.

Conclusion

In this guide, we walked through the process of setting up an AWS stack with ZenML to run your machine learning pipelines in a scalable and production-ready environment. The key steps included:

  1. Setting up credentials and the local environment by creating an IAM role with the necessary permissions.

  2. Creating a ZenML service connector to authenticate with AWS services using the IAM role.

  3. Configuring stack components, including an S3 artifact store, a SageMaker Pipelines orchestrator, and an ECR container registry.

  4. Registering the stack components and creating a ZenML stack.

By following these steps, you can leverage the power of AWS services, such as S3 for artifact storage, SageMaker Pipelines for orchestration, and ECR for container management, all within the ZenML framework. This setup allows you to build, deploy, and manage machine learning pipelines efficiently and scale your workloads based on your requirements.

The benefits of using an AWS stack with ZenML include:

  • Scalability: Leverage the scalability of AWS services to handle large-scale machine learning workloads.

  • Reproducibility: Ensure reproducibility of your pipelines with versioned artifacts and containerized environments.

  • Collaboration: Enable collaboration among team members by using a centralized stack and shared resources.

  • Flexibility: Customize and extend your stack components based on your specific needs and preferences.

Now that you have a functional AWS stack set up with ZenML, you can explore more advanced features and capabilities offered by ZenML. Some next steps to consider:

  • Dive deeper into ZenML's production guide to learn best practices for deploying and managing production-ready pipelines.

  • Explore ZenML's integrations with other popular tools and frameworks in the machine learning ecosystem.

  • Join the ZenML community to connect with other users, ask questions, and get support.

By leveraging the power of AWS and ZenML, you can streamline your machine learning workflows, improve collaboration, and deploy production-ready pipelines with ease. What follows is a set of best practices for using your AWS stack with ZenML.

Best Practices for Using an AWS Stack with ZenML

When working with an AWS stack in ZenML, consider the following best practices to optimize your workflow, enhance security, and improve cost-efficiency. These are all things you may want to adopt or adapt in your own setup once you have run a few pipelines on your AWS stack.

Use IAM Roles and Least Privilege Principle

Always adhere to the principle of least privilege when setting up IAM roles: grant only the minimum permissions necessary for your ZenML pipelines to function, and regularly review and audit your IAM roles to ensure they remain appropriate and secure.

Leverage AWS Resource Tagging

Implement a consistent tagging strategy for all the AWS resources you use for your pipelines. For example, if you use S3 as an artifact store in your stack, you can tag it as shown below:

aws s3api put-bucket-tagging --bucket your-bucket-name --tagging 'TagSet=[{Key=Project,Value=ZenML},{Key=Environment,Value=Production}]'

These tags will help you with billing and cost allocation tracking and also with any cleanup efforts.
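
The same strategy applies to the other components of your stack. For instance, a hedged boto3 sketch that tags the ECR repository created earlier (the repository ARN is a placeholder you must fill in):

import boto3

ecr = boto3.client("ecr", region_name="<YOUR_REGION>")
ecr.tag_resource(
    resourceArn="arn:aws:ecr:<YOUR_REGION>:<ACCOUNT_ID>:repository/zenml",
    tags=[
        {"Key": "Project", "Value": "ZenML"},
        {"Key": "Environment", "Value": "Production"},
    ],
)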

Implement Cost Management Strategies

Use AWS Cost Explorer and AWS Budgets to monitor and manage your spending. To create a cost budget:

  1. Create a JSON file (e.g., budget-config.json) defining the budget:

{
  "BudgetLimit": {
    "Amount": "100",
    "Unit": "USD"
  },
  "BudgetName": "ZenML Monthly Budget",
  "BudgetType": "COST",
  "CostFilters": {
    "TagKeyValue": [
      "user:Project$ZenML"
    ]
  },
  "CostTypes": {
    "IncludeTax": true,
    "IncludeSubscription": true,
    "UseBlended": false
  },
  "TimeUnit": "MONTHLY"
}
  2. Create the cost budget:

aws budgets create-budget --account-id your-account-id --budget file://budget-config.json
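
You can also attach an alert to this budget so you are notified before it is exhausted. A hedged boto3 sketch (the threshold and email address are example values):

import boto3

budgets = boto3.client("budgets")
budgets.create_notification(
    AccountId="your-account-id",
    BudgetName="ZenML Monthly Budget",
    Notification={
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80.0,  # notify at 80% of the budgeted amount
        "ThresholdType": "PERCENTAGE",
    },
    Subscribers=[{"SubscriptionType": "EMAIL", "Address": "you@example.com"}],
)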

Set up cost allocation tags to track expenses related to your ZenML projects:

aws ce create-cost-category-definition --name ZenML-Projects --rule-version CostCategoryExpression.v1 --rules file://rules.json

Use Warm Pools for your SageMaker Pipelines

Warm Pools in SageMaker can significantly reduce the startup time of your pipeline steps, leading to faster iterations and improved development efficiency. This feature keeps compute instances in a "warm" state, ready to quickly start new jobs.

To enable Warm Pools, use the SagemakerOrchestratorSettings class:

from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import SagemakerOrchestratorSettings

sagemaker_orchestrator_settings = SagemakerOrchestratorSettings(
    keep_alive_period_in_seconds=300,  # 5 minutes, the default value
)

This configuration keeps instances warm for 5 minutes after each job completes, allowing subsequent jobs to start faster if initiated within this timeframe. The reduced startup time can be particularly beneficial for iterative development processes or frequently run pipelines.
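
These settings take effect once they are passed to a pipeline (or an individual step) via the settings parameter. A minimal sketch using ZenML's component_type.flavor settings key:

from zenml import pipeline

@pipeline(settings={"orchestrator.sagemaker": sagemaker_orchestrator_settings})
def aws_sagemaker_pipeline():
    ...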

Implement a Robust Backup Strategy

Regularly back up your critical data and configurations. For S3, enable versioning and consider using cross-region replication for disaster recovery.
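
As a starting point, a minimal boto3 sketch that enables versioning on the artifact-store bucket used earlier:

import boto3

# Enable versioning so overwritten or deleted artifacts can be recovered.
s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="your-bucket-name",
    VersioningConfiguration={"Status": "Enabled"},
)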

By following these best practices and implementing the provided examples, you can create a more secure, efficient, and cost-effective AWS stack for your ZenML projects. Remember to regularly review and update your practices as your projects evolve and as AWS introduces new features and services.
