Deploy a cloud stack with Terraform
ZenML maintains a collection of Terraform modules designed to streamline the provisioning of cloud resources and seamlessly integrate them with ZenML Stacks. These modules simplify the setup process, allowing users to quickly provision cloud resources as well as configure and authorize ZenML to utilize them for running pipelines and other AI/ML operations.
These modules give you a repeatable way to deploy and scale your machine learning infrastructure. Their implementation can also serve as a reference for writing custom Terraform configurations tailored to specific cloud environments and requirements.
Terraform requires you to manage your infrastructure as code yourself. Among other things, this means that you will need to have Terraform installed on your machine and you will need to manually manage the state of your infrastructure.
If you prefer a more automated approach, you can use the 1-click stack deployment feature to deploy a cloud stack with ZenML with minimal knowledge of Terraform or cloud infrastructure for that matter.
If you have the required infrastructure pieces already deployed on your cloud, you can also use the stack wizard to seamlessly register your stack.
Prerequisites
To use this feature, you need a deployed ZenML server instance that is reachable from the cloud provider where you wish to have the stack provisioned (this can't be a local server started via zenml login --local). If you do not already have one set up, you can try out a ZenML Pro server by running zenml login --pro, or register for a free ZenML Pro account. If you prefer to host your own, the ZenML self-hosting documentation explains how to deploy a server.
Once you are connected to your deployed ZenML server, you need to create a service account and an API key for it. You will use the API key to give the Terraform module programmatic access to your ZenML server. The ZenML documentation on service accounts and API keys has more detail, but the process is as simple as running the following CLI command while connected to your ZenML server:
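A minimal sketch of that step, assuming the ZenML CLI's service-account create subcommand. The account name terraform-account is a placeholder chosen for illustration, and the guard around the call is only there so the snippet fails gently when the CLI is missing.

```shell
# Placeholder account name (an assumption for illustration); any unused name works.
SERVICE_ACCOUNT_NAME="terraform-account"

# Requires that you are already connected to your deployed ZenML server.
if command -v zenml >/dev/null 2>&1; then
    # Creates the service account; the output includes an API key for it.
    zenml service-account create "$SERVICE_ACCOUNT_NAME"
else
    echo "zenml CLI not found on PATH; install ZenML and log in first." >&2
fi
```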
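The command output includes the API key value. Store it somewhere safe, because you will need it to give the Terraform module access to your ZenML server.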
Finally, you will need the following on the machine where you will be running Terraform:
Terraform installed on your machine (version at least 1.9).
Local authentication with your cloud provider. The ZenML Terraform stack modules assume you are already authenticated through the provider's CLI or SDK tool and have permissions to create the resources that the modules will provision. The exact setup depends on the cloud provider you are using and is covered in the following sections.
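The Terraform version requirement above can be checked before going further. The following is a sketch, not part of the ZenML tooling: it assumes the first line of terraform version looks like "Terraform v1.9.5" and that sort supports the -V (version sort) option, as GNU coreutils does. The helper name version_ge is hypothetical.

```shell
# Minimum Terraform version stated in the prerequisites.
REQUIRED_VERSION="1.9.0"

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v terraform >/dev/null 2>&1; then
    # Strip the "Terraform v" prefix from the first line of the output.
    INSTALLED_VERSION="$(terraform version | head -n 1 | sed 's/^Terraform v//')"
    if version_ge "$INSTALLED_VERSION" "$REQUIRED_VERSION"; then
        echo "Terraform $INSTALLED_VERSION is recent enough."
    else
        echo "Terraform $INSTALLED_VERSION is older than $REQUIRED_VERSION." >&2
    fi
else
    echo "Terraform is not installed or not on PATH." >&2
fi
```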
How to use the Terraform stack deployment modules
If you are already familiar with Terraform and the cloud provider where you want to deploy the stack, this process will be straightforward. In a nutshell, you will need to:
Create a new Terraform configuration file (e.g., main.tf), preferably in a new directory, with content that looks like this (<cloud provider> can be aws, gcp, or azure):
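One way to produce such a file is to write it from the shell. This is a sketch under several assumptions that should be checked against the Terraform Registry: the provider source zenml-io/zenml, the module source zenml-io/zenml-stack/<cloud provider>, the input names zenml_stack_name and orchestrator, and the output names zenml_stack_id and zenml_stack_name. The directory name zenml-stack and the stack name my-stack are placeholders.

```shell
# Which module to use: "aws", "gcp" or "azure".
CLOUD_PROVIDER="aws"

# Keep the configuration in its own directory (placeholder name).
mkdir -p zenml-stack

# The heredoc is unquoted so that ${CLOUD_PROVIDER} is expanded by the shell.
cat > zenml-stack/main.tf <<EOF
terraform {
  required_providers {
    zenml = {
      source = "zenml-io/zenml"
    }
  }
}

# Assumption: when server_url and api_key are not set here, the provider
# reads the ZENML_SERVER_URL and ZENML_API_KEY environment variables.
provider "zenml" {}

# Depending on the cloud, a provider block for it (region, project, ...)
# and further module inputs may also be required.
module "zenml_stack" {
  source = "zenml-io/zenml-stack/${CLOUD_PROVIDER}"

  # Optional inputs (assumed names; see the module's registry page).
  zenml_stack_name = "my-stack"
  orchestrator     = "local"
}

output "zenml_stack_id" {
  value = module.zenml_stack.zenml_stack_id
}

output "zenml_stack_name" {
  value = module.zenml_stack.zenml_stack_name
}
EOF
```

Pinning the module to a specific version in the module block is usually a good idea once you have settled on one.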
There might be a few additional required or optional inputs depending on the cloud provider you are using. You can find the full list of inputs for each module in the Terraform Registry documentation for the relevant module or you can read on in the following sections.
Run the following commands in the directory where you have your Terraform configuration file:
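A sketch of that step, assuming the ZenML provider picks up the server URL and API key from the ZENML_SERVER_URL and ZENML_API_KEY environment variables. Both values below are placeholders, and the check for main.tf is an added safeguard rather than something Terraform requires.

```shell
# Placeholders: replace with your server URL and the service account API key.
export ZENML_SERVER_URL="https://your-zenml-server.example.com"
export ZENML_API_KEY="<your-api-key>"

# Only proceed from a directory that actually holds the configuration.
if [ -f main.tf ] && command -v terraform >/dev/null 2>&1; then
    terraform init    # download the required providers and modules
    terraform apply   # show the planned changes, then ask for confirmation
else
    echo "Run this in the directory containing main.tf, with Terraform installed." >&2
fi
```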
The directory where you keep the Terraform configuration file and run the terraform commands is important: this is where Terraform stores the state of your infrastructure. Do not delete this directory or the state file it contains until you have deprovisioned the resources with terraform destroy, or you are sure you no longer need to manage them with Terraform.
Terraform will prompt you to confirm the changes it will make to your cloud infrastructure. If you are happy with the changes, type yes and hit enter. Terraform will then provision the resources you have specified in your configuration file. Once the process is complete, you will see a message indicating that the resources have been successfully created, followed by the ZenML stack ID and name.
At this point, a ZenML stack has also been created and registered with your ZenML server and you can start using it to run your pipelines:
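A sketch of how the new stack could be activated from the same directory. It assumes your configuration declares an output named zenml_stack_id and uses terraform output -raw together with zenml stack set. The function name activate_stack is hypothetical.

```shell
# Read the stack ID from the Terraform outputs and make that stack active.
activate_stack() {
    # The lookup fails if the output does not exist yet, for example
    # before "terraform apply" has completed.
    if STACK_ID="$(terraform output -raw zenml_stack_id 2>/dev/null)" && [ -n "$STACK_ID" ]; then
        zenml stack set "$STACK_ID"
    else
        echo "No zenml_stack_id output found; run 'terraform apply' first." >&2
    fi
}

activate_stack
```

The integrations required by the stack components also need to be installed before running pipelines on it.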
You can find more details specific to the cloud provider of your choice in the next section:
The original documentation for the ZenML AWS Terraform module contains extensive information about required permissions, inputs, outputs and provisioned resources. This is a summary of the key points from that documentation.
Authentication
To authenticate with AWS, you need to have the AWS CLI installed on your machine and you need to have run aws configure to set up your credentials.
Example Terraform Configuration
Here is an example Terraform configuration file for deploying a ZenML stack on AWS:
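The sketch below writes such a file from the shell. The provider and module sources (zenml-io/zenml and zenml-io/zenml-stack/aws), the input and output names, and the environment variable fallback are assumptions to verify against the module's registry page. The region, directory name and stack name are placeholders.

```shell
# Keep the configuration in its own directory (placeholder name).
mkdir -p zenml-aws-stack

# The heredoc is quoted, so the content is written exactly as shown.
cat > zenml-aws-stack/main.tf <<'EOF'
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    zenml = {
      source = "zenml-io/zenml"
    }
  }
}

provider "zenml" {
  # Assumption: server_url and api_key fall back to the ZENML_SERVER_URL
  # and ZENML_API_KEY environment variables when not set here.
}

provider "aws" {
  region = "eu-central-1" # example region; use your own
}

module "zenml_stack" {
  source = "zenml-io/zenml-stack/aws"

  # Optional inputs
  orchestrator     = "sagemaker" # "local", "sagemaker" (default) or "skypilot"
  zenml_stack_name = "my-aws-stack"
}

output "zenml_stack_id" {
  value = module.zenml_stack.zenml_stack_id
}

output "zenml_stack_name" {
  value = module.zenml_stack.zenml_stack_name
}
EOF
```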
Stack Components
The Terraform module will create a ZenML stack configuration with the following components:
an S3 Artifact Store linked to an S3 bucket
an ECR Container Registry linked to an ECR repository
depending on the orchestrator input variable:
  a local Orchestrator, if orchestrator is set to local. This can be used in combination with the SageMaker Step Operator to selectively run some steps locally and some on SageMaker.
  a SageMaker Orchestrator linked to the AWS account, if orchestrator is set to sagemaker (default)
  a SkyPilot Orchestrator linked to the AWS account, if orchestrator is set to skypilot
a SageMaker Step Operator linked to the AWS account
an AWS Service Connector configured with the IAM role credentials and used to authenticate all ZenML components with your AWS account
To use the ZenML stack, you will need to install the required integrations:
for the local or SageMaker orchestrator:
for the SkyPilot orchestrator:
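The two cases can be sketched as a small helper around zenml integration install. The integration names aws, s3 and skypilot_aws are assumptions to confirm against the ZenML integration list, and the function name integrations_for is hypothetical.

```shell
# Print the integrations needed for a given orchestrator setting.
integrations_for() {
    case "$1" in
        local|sagemaker) echo "aws s3" ;;
        skypilot)        echo "aws s3 skypilot_aws" ;;
        *) echo "Unknown orchestrator: $1" >&2; return 1 ;;
    esac
}

# Orchestrator value used in the Terraform configuration.
ORCHESTRATOR="sagemaker"

if command -v zenml >/dev/null 2>&1; then
    # Word splitting is intentional: each integration is a separate argument.
    zenml integration install $(integrations_for "$ORCHESTRATOR")
else
    echo "zenml CLI not found on PATH." >&2
fi
```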
How to clean up the Terraform stack deployments
Cleaning up the resources provisioned by Terraform is as simple as running the terraform destroy command in the directory where you have your Terraform configuration file. This will remove all the resources that were provisioned by the Terraform module and will also delete the ZenML stack that was registered with your ZenML server.