Connecting remote storage
Transitioning to remote artifact storage.
In the previous chapters, we've been working with artifacts stored locally on our machines. This setup is fine for individual experiments, but as we move towards a collaborative and production-ready environment, we need a solution that is more robust, shareable, and scalable. Enter remote storage!
Remote storage allows us to store our artifacts in the cloud, which means they're accessible from anywhere and by anyone with the right permissions. This is essential for team collaboration and for managing the larger datasets and models that come with production workloads.
When you use a stack with remote storage, nothing changes except that the artifacts are materialized in a central, remote storage location. This diagram explains the flow:
Provisioning and registering a remote artifact store
Having trouble with this command? You can use poetry or pip to install the requirements of any ZenML integration directly. In order to obtain the exact requirements of the AWS S3 integration you can use zenml integration requirements s3.
With the URI to your S3 bucket known, registering an S3 Artifact Store can be done as follows:
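A minimal sketch of that registration step, written as a small Python helper that assembles the CLI call. The store name cloud_artifact_store is the one used later on this page; the bucket URI s3://my-bucket is a placeholder and the helper name is hypothetical. The -f s3 and --path flags are assumed from the ZenML CLI; confirm them with zenml artifact-store register --help for your installed version.

```python
import shlex


def s3_artifact_store_register_cmd(name: str, bucket_uri: str) -> list[str]:
    """Build the `zenml artifact-store register` call for an S3 bucket.

    Hypothetical helper for illustration; it only assembles arguments
    and does not contact ZenML or AWS.
    """
    if not bucket_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {bucket_uri!r}")
    return [
        "zenml", "artifact-store", "register", name,
        "-f", "s3",                # flavor of the artifact store
        f"--path={bucket_uri}",    # root URI where artifacts are written
    ]


# "s3://my-bucket" is a placeholder; substitute the URI of your own bucket.
register_cmd = s3_artifact_store_register_cmd("cloud_artifact_store", "s3://my-bucket")
print(shlex.join(register_cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run(register_cmd, check=True) once the S3 integration is installed.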
Configuring permissions with your first service connector
Once we have our service connector, we can attach it to stack components. In this case, we are going to connect it to our remote artifact store:
Now, every time you (or anyone else with access) use the cloud_artifact_store, you will be issued a temporary token that grants access to the remote storage. As a result, your colleagues don't need to worry about setting up credentials and installing clients locally!
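The attachment step can be sketched as a single CLI call, assembled here in Python. The name cloud_artifact_store comes from this page; cloud_connector is a placeholder for whatever name your service connector was registered under, and the --connector flag is assumed from the ZenML CLI (check zenml artifact-store connect --help).

```python
import shlex

# "cloud_artifact_store" is the store named on this page.
ARTIFACT_STORE = "cloud_artifact_store"
# Placeholder: replace with the name of your registered service connector.
CONNECTOR = "cloud_connector"

# Attach the connector to the artifact store so that access is brokered
# through short-lived credentials instead of locally configured ones.
connect_cmd = [
    "zenml", "artifact-store", "connect", ARTIFACT_STORE,
    "--connector", CONNECTOR,
]
print(shlex.join(connect_cmd))
```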
Running a pipeline on a cloud stack
Set our local_with_remote_storage stack active:
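As a rough sketch, activating the stack from a script might look like the following. It assumes a stack named local_with_remote_storage has already been registered and that the zenml stack set command is available in your CLI version; the helper run_if_zenml_available is hypothetical and simply skips execution when the zenml executable is not on PATH.

```python
import shutil
import subprocess

STACK_NAME = "local_with_remote_storage"
set_stack_cmd = ["zenml", "stack", "set", STACK_NAME]


def run_if_zenml_available(cmd: list[str]) -> bool:
    """Run cmd only when its executable is on PATH; return whether it ran."""
    if shutil.which(cmd[0]) is None:
        print("executable not found; would run:", " ".join(cmd))
        return False
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)
    return True
```

Calling run_if_zenml_available(set_stack_cmd) then either activates the stack or reports the command it would have run.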
Let us continue with the example from the previous page and run the training pipeline:
You can list your artifact versions as follows:
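In the listing, you will notice that some artifacts are stored locally, while others are stored in a remote storage location: artifacts produced by earlier runs on the local stack stay where they were written, and runs on the new stack write to remote storage.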
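One way to tell the two kinds of artifacts apart is to look at the scheme of each artifact URI. The sketch below is illustrative only: the example URIs are made up, the function name is hypothetical, and the set of remote schemes is an assumption you would adjust to the storage backends you actually use.

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as remote object storage.
REMOTE_SCHEMES = {"s3", "gs", "az", "abfs"}


def storage_location(uri: str) -> str:
    """Classify an artifact URI as 'remote' or 'local' based on its scheme."""
    scheme = urlparse(uri).scheme
    return "remote" if scheme in REMOTE_SCHEMES else "local"


# Made-up example URIs, not real output from a ZenML run.
example_uris = [
    "/home/user/zenml/local_stores/trainer/output/1",
    "s3://my-bucket/trainer/output/2",
]
for uri in example_uris:
    print(storage_location(uri), uri)
```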
By connecting remote storage, you're taking a significant step towards building a collaborative and scalable MLOps workflow. Your artifacts are no longer tied to a single machine but are now part of a cloud-based ecosystem, ready to be shared and built upon.