Amazon Simple Cloud Storage (S3)

How to store artifacts in an AWS S3 bucket

This is an older version of the ZenML documentation. To read and view the latest version please visit this up-to-date URL.

The S3 Artifact Store is an Artifact Store flavor provided with the S3 ZenML integration that uses the AWS S3 managed object storage service or one of the self-hosted S3 alternatives, such as MinIO or Ceph RGW, to store artifacts in an S3 compatible object storage backend.

When would you want to use it?

Running ZenML pipelines with the local Artifact Store is usually sufficient if you just want to evaluate ZenML or get started quickly without incurring the trouble and the cost of employing cloud storage services in your stack. However, the local Artifact Store becomes insufficient or unsuitable if you have more elaborate needs for your project:

  • if you want to share your pipeline run results with other team members or stakeholders inside or outside your organization

  • if you have other components in your stack that are running remotely (e.g. a Kubeflow or Kubernetes Orchestrator running in public cloud).

  • if you outgrow what your local machine can offer in terms of storage space and need to use some form of private or public storage service that is shared with others

  • if you are running pipelines at scale and need an Artifact Store that can handle the demands of production grade MLOps

In all these cases, you need an Artifact Store that is backed by a form of public cloud or self-hosted shared object storage service.

You should use the S3 Artifact Store when you decide to keep your ZenML artifacts in a shared object storage and if you have access to the AWS S3 managed service or one of the S3 compatible alternatives (e.g. Minio, Ceph RGW). You should consider one of the other Artifact Store flavors if you don't have access to an S3 compatible service.

How do you deploy it?

The S3 Artifact Store flavor is provided by the S3 ZenML integration, you need to install it on your local machine to be able to register an S3 Artifact Store and add it to your stack:

zenml integration install s3 -y

The only configuration parameter mandatory for registering an S3 Artifact Store is the root path URI, which needs to point to an S3 bucket and takes the form s3://bucket-name. Please read the documentation relevant to the S3 service that you are using on how to create an S3 bucket. For example, the AWS S3 documentation is available here.

With the URI to your S3 bucket known, registering an S3 Artifact Store and using it in a stack can be done as follows:

# Register the S3 artifact-store
zenml artifact-store register s3_store -f s3 --path=s3://bucket-name

# Register and set a stack with the new artifact store
zenml stack register custom_stack -a s3_store ... --set

Depending on your use-case, however, you may also need to provide additional configuration parameters pertaining to authentication or pass advanced configuration parameters to match your S3 compatible service or deployment scenario.

Infrastructure Deployment

An S3 Artifact Store can be deployed directly from the ZenML CLI:

zenml artifact-store deploy s3_artifact_store --flavor=s3

You can pass other configuration specific to the stack components as key-value arguments. If you don't provide a name, a random one is generated for you. For more information about how to work use the CLI for this, please refer to the dedicated documentation section.

Authentication Methods

Integrating and using an S3 compatible Artifact Store in your pipelines is not possible without employing some form of authentication. ZenML currently provides two options for managing S3 authentication: one where you don't need to manage credentials explicitly, the other one that requires you to generate an AWS access key and store it in a ZenML Secret. Each method has advantages and disadvantages, and you should choose the one that best suits your use-case. If you're looking for a quick way to get started locally, we recommend using the Implicit Authentication method. However, if you would like to experiment with ZenML stacks that combine the S3 Artifact Store with other remote stack components, we recommend using the AWS Credentials method, especially if you don't have a lot of experience with AWS IAM roles and policies.

This method uses the implicit AWS authentication available in the environment where the ZenML code is running. On your local machine, this is the quickest way to configure an S3 Artifact Store. You don't need to supply credentials explicitly when you register the S3 Artifact Store, as it leverages the local credentials and configuration that the AWS CLI stores on your local machine. However, you will need to install and set up the AWS CLI on your machine as a prerequisite, as covered in the AWS CLI documentation, before you register the S3 Artifact Store.

Certain dashboard functionality, such as visualizing or deleting artifacts, is not available when using an implicitly authenticated artifact store together with a deployed ZenML server because the ZenML server will not have permissions to access the filesystem.

The implicit authentication method also needs to be coordinated with other stack components that are highly dependent on the Artifact Store and need to interact with it directly to function. If these components are not running on your machine, they do not have access to the local AWS CLI configuration and will encounter authentication failures while trying to access the S3 Artifact Store:

  • Orchestrators need to access the Artifact Store to manage pipeline artifacts

  • Step Operators need to access the Artifact Store to manage step level artifacts

  • Model Deployers need to access the Artifact Store to load served models

These remote stack components can still use the implicit authentication method: if they are also running on Amazon EC2 or EKS nodes, ZenML will try to load credentials from the instance metadata service. In order to take advantage of this feature, you must have specified an IAM role to use when you launched your EC2 instance or EKS cluster. This mechanism allows AWS workloads like EC2 instances and EKS pods to access other AWS services without requiring explicit credentials. For more information on how to configure IAM roles:

If you have remote stack components that are not running in AWS Cloud, or if you are unsure how to configure them to use IAM roles, you should use one of the other authentication methods.

Advanced Configuration

The S3 Artifact Store accepts a range of advanced configuration options that can be used to further customize how ZenML connects to the S3 storage service that you are using. These are accessible via the client_kwargs, config_kwargs and s3_additional_kwargs configuration attributes and are passed transparently to the underlying S3Fs library:

  • client_kwargs: arguments that will be transparently passed to the botocore client. You can use it to configure parameters like endpoint_url and region_name when connecting to an S3 compatible endpoint (e.g. Minio).

  • config_kwargs: advanced parameters passed to botocore.client.Config.

  • s3_additional_kwargs: advanced parameters that are used when calling S3 API, typically used for things like ServerSideEncryption and ACL.

To include these advanced parameters in your Artifact Store configuration, pass them using JSON format during registration, e.g.:

zenml artifact-store register minio_store -f s3 \
    --path='s3://minio_bucket' \
    --client_kwargs='{"endpoint_url:"http://minio.cluster.local:9000", "region_name":"us-east-1"}'

For more, up-to-date information on the S3 Artifact Store implementation and its configuration, you can have a look at the API docs.

How do you use it?

Aside from the fact that the artifacts are stored in an S3 compatible backend, using the S3 Artifact Store is no different than using any other flavor of Artifact Store.

Last updated