Connecting remote storage
Transitioning to remote artifact storage.
In the previous chapters, we've been working with artifacts stored locally on our machines. This setup is fine for individual experiments, but as we move towards a collaborative and production-ready environment, we need a solution that is more robust, shareable, and scalable. Enter remote storage!
Remote storage allows us to store our artifacts in the cloud, which means they're accessible from anywhere and by anyone with the right permissions. This is essential for team collaboration and for managing the larger datasets and models that come with production workloads.
When using a stack with remote storage, nothing changes except that the artifacts are materialized in a central, remote storage location. This diagram explains the flow:
With the URI to your S3 bucket known, registering an S3 Artifact Store can be done as follows:
Once we have our service connector, we can now attach it to stack components. In this case, we are going to connect it to our remote artifact store:
Now, every time you (or anyone else with access) use the cloud_artifact_store, a temporary token is issued that grants access to the remote storage. Therefore, your colleagues don't need to worry about setting up credentials and installing clients locally!
Set our local_with_remote_storage stack active:
Let us continue with the example from the previous page and run the training pipeline:
You can list your artifact versions as follows:
You will notice above that some artifacts are stored locally, while others are stored in a remote storage location.
By connecting remote storage, you're taking a significant step towards building a collaborative and scalable MLOps workflow. Your artifacts are no longer tied to a single machine but are now part of a cloud-based ecosystem, ready to be shared and built upon.
Check out the, the , or for a shortcut on how to deploy & register a cloud stack.
Out of the box, ZenML ships with . For convenience, here are some brief instructions on how to quickly get up and running on the major cloud providers:
You will need to install and set up the AWS CLI on your machine as a prerequisite, as covered in , before you register the S3 Artifact Store.
The Amazon Web Services S3 Artifact Store flavor is provided by the , you need to install it on your local machine to be able to register an S3 Artifact Store and add it to your stack:
The only mandatory configuration parameter for registering an S3 Artifact Store is the root path URI, which needs to point to an S3 bucket and take the form s3://bucket-name. In order to create an S3 bucket, refer to the .
For more information, read the .
You will need to install and set up the Google Cloud CLI on your machine as a prerequisite, as covered in , before you register the GCS Artifact Store.
The Google Cloud Storage Artifact Store flavor is provided by the , you need to install it on your local machine to be able to register a GCS Artifact Store and add it to your stack:
The only mandatory configuration parameter for registering a GCS Artifact Store is the root path URI, which needs to point to a GCS bucket and take the form gs://bucket-name. Please read on how to provision a GCS bucket.
For more information, read the .
You will need to install and set up the Azure CLI on your machine as a prerequisite, as covered in , before you register the Azure Artifact Store.
The Microsoft Azure Artifact Store flavor is provided by the , you need to install it on your local machine to be able to register an Azure Artifact Store and add it to your stack:
The only mandatory configuration parameter for registering an Azure Artifact Store is the root path URI, which needs to point to an Azure Blob Storage container and take the form az://container-name or abfs://container-name. Please read on how to provision an Azure Blob Storage container.
For more information, read the .
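The root path forms quoted above for the three providers can be sanity-checked before registering anything. The following is a minimal sketch in plain Python and is not part of ZenML: the helper name check_artifact_store_uri and the FLAVOR_SCHEMES table are hypothetical, and they encode only the URI forms stated in this guide.

```python
from urllib.parse import urlparse

# URI schemes quoted in this guide for each provider.
# Hypothetical lookup table for illustration, not a ZenML data structure.
FLAVOR_SCHEMES = {
    "s3": "AWS S3 bucket",
    "gs": "GCS bucket",
    "az": "Azure Blob Storage container",
    "abfs": "Azure Blob Storage container",
}


def check_artifact_store_uri(uri: str) -> str:
    """Describe the storage a root path URI points to, or raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme not in FLAVOR_SCHEMES:
        raise ValueError(f"Unsupported scheme in {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"Missing bucket or container name in {uri!r}")
    return f"{FLAVOR_SCHEMES[parsed.scheme]} '{parsed.netloc}'"


if __name__ == "__main__":
    for candidate in ("s3://bucket-name", "gs://bucket-name", "az://container-name"):
        print(candidate, "->", check_artifact_store_uri(candidate))
```

A check like this only validates the shape of the URI. It does not confirm that the bucket or container exists or that your credentials can reach it.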
You can create a remote artifact store in pretty much any environment, including other cloud providers using a cloud-agnostic artifact storage such as .
It is also relatively simple to create a for your use case.
Having trouble with setting up infrastructure? Join the and ask for help!
While you can go ahead and if your local client is configured to access it, it is best practice to use a service connector for this purpose. Service connectors are a fairly complex concept (we have a whole on them), but we're going to start with a very basic approach.
First, let's understand what a service connector does. In simple words, a service connector contains credentials that grant stack components access to cloud infrastructure. These credentials are stored in the form of a, and are available to the ZenML server to use. Using these credentials, the service connector brokers a short-lived token and grants temporary permissions to the stack component to access that infrastructure. This diagram represents this process:
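The brokering idea described above can be illustrated with a toy model. This is a conceptual sketch only, not ZenML's implementation or API: the class names ToyServiceConnector and TemporaryToken, the one-hour default lifetime, and the random token value are all assumptions made for illustration.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporaryToken:
    value: str
    expires_at: float  # Unix timestamp from which the token is rejected

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class ToyServiceConnector:
    """Holds a long-lived credential and hands out short-lived tokens."""

    def __init__(self, long_lived_secret: str, ttl_seconds: float = 3600.0):
        # The long-lived credential stays inside the connector and is
        # never handed to stack components.
        self._secret = long_lived_secret
        self._ttl = ttl_seconds

    def issue_token(self, now: Optional[float] = None) -> TemporaryToken:
        issued_at = time.time() if now is None else now
        # A real connector would ask the cloud provider for temporary
        # credentials here; this toy just generates a random string.
        return TemporaryToken(secrets.token_hex(16), issued_at + self._ttl)


if __name__ == "__main__":
    connector = ToyServiceConnector("long-lived-credential", ttl_seconds=900)
    token = connector.issue_token()
    print("valid now:", token.is_valid(time.time()))
    print("valid in an hour:", token.is_valid(time.time() + 3600))
```

The point of the design is that a stack component only ever sees the short-lived token, so a leaked token expires on its own while the stored credential stays in one place.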
There are , but for the sake of this guide, we recommend creating one by .
Now that we have our remote artifact store registered, we can register a stack with it, just like we did in the previous chapter:
Now, using the , we run a training pipeline:
When you run that pipeline, ZenML will automatically store the artifacts in the specified remote storage, ensuring that they are preserved and accessible for future runs and by your team members. You can ask your colleagues to connect to the same , and you will notice that if they run the same pipeline, the pipeline would be partially cached, even if they have not run the pipeline themselves before.
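To see why a colleague's first run can be partially cached, consider the simplified model below. It is an assumption-laden sketch, not a description of how ZenML computes cache keys: cache_key and SharedBackend are hypothetical names, and the shared backend here stands in for both the remote artifact storage and the record of previous runs.

```python
import hashlib
import json


def cache_key(step_source: str, inputs: dict) -> str:
    """Hash a step's code together with its inputs (simplified)."""
    payload = json.dumps({"source": step_source, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SharedBackend:
    """Stand-in for the storage and run history that the whole team shares."""

    def __init__(self):
        self._artifacts = {}

    def run_step(self, step_source, inputs, compute):
        key = cache_key(step_source, inputs)
        if key in self._artifacts:
            # Someone already produced this artifact: reuse it.
            return self._artifacts[key], True
        result = compute(**inputs)
        self._artifacts[key] = result
        return result, False


if __name__ == "__main__":
    backend = SharedBackend()
    source = "def double(x): return 2 * x"

    # You run the step first, so it is computed and stored.
    print(backend.run_step(source, {"x": 21}, lambda x: 2 * x))
    # A colleague runs the same step with the same inputs and gets a cache hit.
    print(backend.run_step(source, {"x": 21}, lambda x: 2 * x))
```

With purely local storage, each person's cache would live on their own machine, so the second call above would have to recompute the result.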
features an to visualize artifact versions: