๐ŸชArtifact Stores

Setting up a persistent storage for your artifacts.

The Artifact Store is a central component in any MLOps stack. As the name suggests, it acts as a data persistence layer where artifacts (e.g. datasets, models) ingested or generated by the machine learning pipelines are stored.

ZenML automatically serializes and saves the data circulated through your pipelines in the Artifact Store: datasets, models, data profiles, data and model validation reports, and generally any object that is returned by a pipeline step. Combined with ZenML's tracking, this enables features such as caching, provenance/lineage tracking, and pipeline reproducibility.

Not all objects returned by pipeline steps are physically stored in the Artifact Store, nor do they have to be. How artifacts are serialized and deserialized, and where their contents are stored, is determined by the particular implementation of the Materializer associated with the artifact data type. The majority of Materializers shipped with ZenML use the Artifact Store that is part of the active Stack as the location where artifacts are kept.

If you need to store a particular type of pipeline artifact in a different medium (e.g. use an external model registry to store model artifacts, or an external data lake or data warehouse to store dataset artifacts), you can write your own Materializer to implement the custom logic required for it. In contrast, if you need to use an entirely different storage backend to store artifacts, one that isn't already covered by one of the ZenML integrations, you can extend the Artifact Store abstraction to provide your own Artifact Store implementation.

In addition to pipeline artifacts, the Artifact Store may also be used as a storage backend by other specialized stack components that need to keep their data in persistent object storage. The Great Expectations Data Validator is one such example.

Related concepts:

  • the Artifact Store is a type of Stack Component that needs to be registered as part of your ZenML Stack.

  • the objects circulated through your pipelines are serialized and stored in the Artifact Store using Materializers. Materializers implement the logic required to serialize and deserialize the artifact contents and to write them to and read them from the Artifact Store.
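To make the Materializer idea concrete, here is a minimal, library-independent sketch. `JsonDictMaterializer` and its `save`/`load` methods are hypothetical names chosen for illustration and are not ZenML's actual Materializer base class; a local temporary directory stands in for the Artifact Store.

```python
import json
import os
import tempfile


class JsonDictMaterializer:
    """Toy materializer: persists a dict as JSON under a base URI.

    Hypothetical class for illustration only; it does not use or mirror
    the real ZenML Materializer API.
    """

    FILENAME = "data.json"

    def __init__(self, uri: str) -> None:
        # Directory (inside the store) reserved for this one artifact.
        self.uri = uri

    def save(self, data: dict) -> None:
        """Serialize the object and write it into the store."""
        os.makedirs(self.uri, exist_ok=True)
        with open(os.path.join(self.uri, self.FILENAME), "w") as f:
            json.dump(data, f)

    def load(self) -> dict:
        """Read the stored file back and deserialize it."""
        with open(os.path.join(self.uri, self.FILENAME)) as f:
            return json.load(f)


# A temporary directory plays the role of the Artifact Store root.
with tempfile.TemporaryDirectory() as store_root:
    materializer = JsonDictMaterializer(os.path.join(store_root, "step_output"))
    materializer.save({"accuracy": 0.93})
    restored = materializer.load()
```

The point of the split is that the step code only ever sees the Python object, while the materializer alone decides the on-disk format and location.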

When to use it

The Artifact Store is a mandatory component in the ZenML stack. It is used to store all artifacts produced by pipeline runs, and you are required to configure it in all of your stacks.

Artifact Store Flavors

Out of the box, ZenML comes with a local Artifact Store that is already part of the default stack and stores artifacts on your local filesystem. Additional Artifact Stores are provided by integrations:

| Flavor | Integration | URI Schema(s) | Notes |
|--------|-------------|---------------|-------|
| local | built-in | None | This is the default Artifact Store. It stores artifacts on your local filesystem. Should be used only for running ZenML locally. |
| s3 | s3 | s3:// | Uses AWS S3 as an object store backend |
| gcp | gcp | gs:// | Uses Google Cloud Storage as an object store backend |
| azure | azure | abfs://, az:// | Uses Azure Blob Storage as an object store backend |
| custom | | custom | Extend the Artifact Store abstraction and provide your own implementation |

If you would like to see the available flavors of Artifact Stores, you can use the command:

zenml artifact-store flavor list

Every Artifact Store has a path attribute that must be configured when it is registered with ZenML. This is a URI pointing to the root path where all objects are stored in the Artifact Store. It must use a URI schema that is supported by the Artifact Store flavor. For example, the S3 Artifact Store will need a URI that contains the s3:// schema:

zenml artifact-store register s3_store -f s3 --path s3://my_bucket
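The relationship between a flavor and its accepted URI schemas can be illustrated with a small check. This is only a sketch: `SUPPORTED_SCHEMES` and `path_matches_flavor` are hypothetical names, the mapping is copied from the flavors listed on this page, and ZenML's own validation logic may differ.

```python
# URI schemas per flavor, copied from the flavors listed on this page.
# The local flavor is left out because it has no URI schema.
SUPPORTED_SCHEMES = {
    "s3": ("s3://",),
    "gcp": ("gs://",),
    "azure": ("abfs://", "az://"),
}


def path_matches_flavor(flavor: str, path: str) -> bool:
    """Return True if `path` starts with a schema supported by `flavor`."""
    # str.startswith accepts a tuple and matches any of its entries.
    return path.startswith(SUPPORTED_SCHEMES[flavor])


s3_ok = path_matches_flavor("s3", "s3://my_bucket")
s3_wrong_schema = path_matches_flavor("s3", "gs://my_bucket")
```

A check like this is useful in scripts that generate registration commands, so that a mismatched path is caught before the store is registered.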

How to use it

The Artifact Store provides low-level object storage services for other ZenML mechanisms. When you develop ZenML pipelines, you normally don't even have to be aware of its existence or interact with it directly. ZenML provides higher-level APIs that can be used as an alternative to store and access artifacts:

  • return one or more objects from your pipeline steps to have them automatically saved in the active Artifact Store as pipeline artifacts.

  • retrieve pipeline artifacts from the active Artifact Store after a pipeline run is complete.

You will probably need to interact with the low-level Artifact Store API directly:

  • if you implement custom Materializers for your artifact data types

  • if you want to store custom objects in the Artifact Store

The Artifact Store API

All ZenML Artifact Stores implement the same IO API that resembles a standard file system. This allows you to access and manipulate the objects stored in the Artifact Store in the same manner you would normally handle files on your computer and independently of the particular type of Artifact Store that is configured in your ZenML stack.

Accessing the low-level Artifact Store API can be done through the following Python modules:

  • zenml.io.fileio provides low-level utilities for manipulating Artifact Store objects (e.g. open, copy, rename, remove, mkdir). These functions work seamlessly across Artifact Store types. They have the same signatures as the Artifact Store abstraction methods (in fact, they are one and the same under the hood).

  • zenml.utils.io_utils includes some higher-level helper utilities that make it easier to find and transfer objects between the Artifact Store and the local filesystem or memory.
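The benefit of a single file-system-like interface is that calling code does not change when the backend does. The sketch below shows that idea in isolation. `SimpleStore`, `LocalStore`, `InMemoryStore` and `save_and_reload` are hypothetical names invented for this illustration; they are not the ZenML Artifact Store base class, whose method set is larger.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class SimpleStore(ABC):
    """Hypothetical, deliberately tiny file-system-like interface."""

    @abstractmethod
    def write_text(self, path: str, text: str) -> None:
        """Store `text` under `path`, creating parent folders if needed."""

    @abstractmethod
    def read_text(self, path: str) -> str:
        """Return the text stored under `path`."""

    @abstractmethod
    def exists(self, path: str) -> bool:
        """Report whether an object is stored under `path`."""


class LocalStore(SimpleStore):
    """Backend that keeps objects as files below a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _full_path(self, path: str) -> str:
        return os.path.join(self.root, path)

    def write_text(self, path: str, text: str) -> None:
        full_path = self._full_path(path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as f:
            f.write(text)

    def read_text(self, path: str) -> str:
        with open(self._full_path(path)) as f:
            return f.read()

    def exists(self, path: str) -> bool:
        return os.path.exists(self._full_path(path))


class InMemoryStore(SimpleStore):
    """Backend that keeps objects in a dictionary (handy for tests)."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def write_text(self, path: str, text: str) -> None:
        self._objects[path] = text

    def read_text(self, path: str) -> str:
        return self._objects[path]

    def exists(self, path: str) -> bool:
        return path in self._objects


def save_and_reload(store: SimpleStore, text: str) -> str:
    """Backend-agnostic caller: it only relies on the shared interface."""
    path = "reports/summary.txt"
    store.write_text(path, text)
    assert store.exists(path)
    return store.read_text(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    local_result = save_and_reload(LocalStore(tmp_dir), "42 rows validated")
memory_result = save_and_reload(InMemoryStore(), "42 rows validated")
```

`save_and_reload` is identical for both backends, which is the property the shared IO API is meant to provide.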

When calling the Artifact Store API, you should always use URIs that are located under the Artifact Store root path. Otherwise, you risk using an unsupported protocol or storing objects outside the store. You can use the Client singleton to retrieve the root path of the active Artifact Store and then use it as a base path for artifact URIs, e.g.:

import os
from zenml.client import Client
from zenml.io import fileio

root_path = Client().active_stack.artifact_store.path

artifact_contents = "example artifact"
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")
fileio.makedirs(artifact_path)
with fileio.open(artifact_uri, "w") as f:
    f.write(artifact_contents)

When using the Artifact Store API to write custom Materializers, the base artifact URI path is already provided. See the documentation on Materializers for an example.
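Returning to the advice on keeping URIs under the Artifact Store root path: a small helper can enforce it. The following is a sketch only. `artifact_uri_under_root` is a hypothetical function that is not part of ZenML, and it assumes that the root path is either a plain local path or a `scheme://...` URI with `/` as separator.

```python
import posixpath


def artifact_uri_under_root(root_path: str, *parts: str) -> str:
    """Join `parts` onto `root_path` and reject results outside the root."""
    # Split off the scheme first: normpath would otherwise collapse "://".
    scheme, sep, rest = root_path.partition("://")
    if not sep:
        # No scheme present, e.g. a local filesystem path.
        scheme, rest = "", root_path

    root = posixpath.normpath(rest)
    candidate = posixpath.normpath(posixpath.join(root, *parts))

    # After normalization, ".." segments and absolute parts show up as a
    # candidate that no longer starts with the root.
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError(f"{candidate!r} is outside the store root {root!r}")
    return f"{scheme}{sep}{candidate}"


safe_uri = artifact_uri_under_root("s3://my_bucket", "artifacts", "examples", "test.txt")
```

Using `posixpath` rather than `os.path` keeps the separator as `/` on every operating system, which matters for remote URIs built on Windows.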

The following are some code examples showing how to use the Artifact Store API for various operations:

  • creating folders, writing and reading data directly to/from an artifact store object

import os
from zenml.utils import io_utils
from zenml.io import fileio

from zenml.client import Client

root_path = Client().active_stack.artifact_store.path

# Write a string to a new object below the Artifact Store root
artifact_contents = "example artifact"
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")
fileio.makedirs(artifact_path)
io_utils.write_file_contents_as_string(artifact_uri, artifact_contents)

import os
from zenml.utils import io_utils

from zenml.client import Client

root_path = Client().active_stack.artifact_store.path

# Read the contents of the same object back as a string
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")
artifact_contents = io_utils.read_file_contents_as_string(artifact_uri)

  • using a temporary local file/folder to serialize and copy in-memory objects to/from the artifact store (heavily used in Materializers to transfer information between the Artifact Store and external libraries that don't support writing/reading directly to/from the artifact store backend):

import os
import tempfile

import external_lib  # placeholder for a library that only works with local files

from zenml.client import Client
from zenml.io import fileio

root_path = Client().active_stack.artifact_store.path

artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.json")
fileio.makedirs(artifact_path)

with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=True
) as f:
    # Let the external library serialize the object to the local temp file
    external_lib.external_object.save_to_file(f.name)
    # Copy it into the artifact store
    fileio.copy(f.name, artifact_uri)

import os
import tempfile

import external_lib  # placeholder for a library that only works with local files

from zenml.client import Client
from zenml.io import fileio

root_path = Client().active_stack.artifact_store.path

artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.json")

with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=True
) as f:
    # Copy the serialized object from the artifact store. The temporary
    # file already exists locally, so the copy must be allowed to overwrite it.
    fileio.copy(artifact_uri, f.name, overwrite=True)
    external_lib.external_object.load_from_file(f.name)
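The temporary-file pattern can also be tried end to end without ZenML or a real external library. In the sketch below, `json` stands in for the external library, `shutil.copy` stands in for the Artifact Store copy call, and a temporary directory plays the role of the Artifact Store. All of these substitutions are assumptions made for illustration.

```python
import json
import os
import shutil
import tempfile

model = {"weights": [0.1, 0.2], "bias": 0.5}

with tempfile.TemporaryDirectory() as store_root:
    artifact_uri = os.path.join(store_root, "artifacts", "examples", "test.json")
    os.makedirs(os.path.dirname(artifact_uri), exist_ok=True)

    # Save: serialize to a local temporary file, then copy it into the store.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "test.json")
        with open(local_path, "w") as f:
            json.dump(model, f)
        shutil.copy(local_path, artifact_uri)

    # Load: copy out of the store into a temporary file, then deserialize.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "test.json")
        shutil.copy(artifact_uri, local_path)
        with open(local_path) as f:
            restored_model = json.load(f)
```

This variant uses a temporary directory instead of `NamedTemporaryFile` so that the file can be closed and reopened by name, which also works on Windows, where an open named temporary file generally cannot be opened a second time.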
