Artifact Stores

Setting up persistent storage for your artifacts.

The Artifact Store is a central component in any MLOps stack. As the name suggests, it acts as a data persistence layer where artifacts (e.g. datasets, models) ingested or generated by the machine learning pipelines are stored.

ZenML automatically serializes and saves the data circulated through your pipelines in the Artifact Store: datasets, models, data profiles, data and model validation reports, and generally any object that is returned by a pipeline step. Coupled with ZenML's tracking, this enables features such as caching, provenance/lineage tracking, and pipeline reproducibility.
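To make the link between stored artifacts and caching concrete, here is a minimal sketch that uses only the Python standard library. It is not ZenML's implementation: the names `cache_key` and `run_cached`, the pickle-based serialization, and the directory layout are all assumptions chosen for illustration. The idea is that a step's output is stored under a key derived from the step name and its inputs, so a second run with the same inputs can load the stored artifact instead of recomputing it.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cache_key(step_name: str, inputs: dict) -> str:
    """Derive a deterministic key from the step name and its (JSON-serializable) inputs."""
    payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(store_root: Path, step_name: str, func, **inputs):
    """Run `func(**inputs)` unless an artifact for the same key already exists.

    Returns a (result, was_cached) tuple. Hypothetical helper, for illustration only.
    """
    artifact_file = store_root / step_name / f"{cache_key(step_name, inputs)}.pkl"
    if artifact_file.exists():
        # Cache hit: deserialize the stored artifact instead of recomputing it.
        return pickle.loads(artifact_file.read_bytes()), True
    result = func(**inputs)
    artifact_file.parent.mkdir(parents=True, exist_ok=True)
    artifact_file.write_bytes(pickle.dumps(result))
    return result, False


def normalize(values):
    """Example step: scale the values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)  # a temporary folder plays the role of the artifact store
    first, first_cached = run_cached(store, "normalize", normalize, values=[1, 1, 2])
    second, second_cached = run_cached(store, "normalize", normalize, values=[1, 1, 2])
```

The first call computes and stores the result; the second call finds the stored artifact and skips the computation. A real system would also need to account for changes to the step's code and configuration, which this sketch ignores.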

In addition to pipeline artifacts, the Artifact Store may also be used as a storage backend by other specialized stack components that need persistent object storage for their data. The Great Expectations Data Validator is one such example.

Related concepts:

  • the Artifact Store is a type of Stack Component that needs to be registered as part of your ZenML Stack.

  • the objects circulated through your pipelines are serialized and stored in the Artifact Store using Materializers. Materializers implement the logic required to serialize and deserialize artifact contents and to store them in and retrieve them from the Artifact Store.
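As a rough illustration of the Materializer idea, the sketch below pairs a `save` and a `load` method around a single artifact location. `ToyJsonMaterializer` is a hypothetical stand-in written with the standard library only; it does not derive from ZenML's materializer base class, and it uses the built-in `open` on a local folder where a real Materializer would go through the Artifact Store IO layer.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class ToyJsonMaterializer:
    """Hypothetical stand-in for a materializer; not a ZenML class."""

    DATA_FILENAME = "data.json"

    def __init__(self, uri: str) -> None:
        # `uri` is the folder reserved for this artifact inside the store
        # (a local folder in this sketch).
        self.uri = Path(uri)

    def save(self, data: Dict[str, Any]) -> None:
        """Serialize `data` as JSON into the artifact folder."""
        self.uri.mkdir(parents=True, exist_ok=True)
        with open(self.uri / self.DATA_FILENAME, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def load(self) -> Dict[str, Any]:
        """Deserialize the JSON file written by `save`."""
        with open(self.uri / self.DATA_FILENAME, "r", encoding="utf-8") as f:
            return json.load(f)


with tempfile.TemporaryDirectory() as store_root:
    artifact_dir = str(Path(store_root) / "metrics")
    ToyJsonMaterializer(artifact_dir).save({"accuracy": 0.93, "epochs": 5})
    # A fresh instance pointed at the same location can restore the object.
    restored = ToyJsonMaterializer(artifact_dir).load()
```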

When to use it

The Artifact Store is a mandatory component in the ZenML stack. It is used to store all artifacts produced by pipeline runs, and you are required to configure it in all of your stacks.

Artifact Store Flavors

Out of the box, ZenML comes with a local Artifact Store that is already part of the default stack and stores artifacts on your local filesystem. Additional Artifact Stores are provided by integrations:

| Flavor | Integration | URI Schema(s) | Notes |
|--------|-------------|---------------|-------|
| local  | built-in    | None          | This is the default Artifact Store. It stores artifacts on your local filesystem. Should be used only for running ZenML locally. |
| s3     | s3          | s3://         | Uses AWS S3 as an object store backend |
| gcp    | gcp         | gs://         | Uses Google Cloud Storage as an object store backend |
| azure  | azure       | abfs://, az:// | Uses Azure Blob Storage as an object store backend |
| custom |             | custom        | Extend the Artifact Store abstraction and provide your own implementation |

If you would like to see the available flavors of Artifact Stores, you can use the command:

zenml artifact-store flavor list
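The URI schemes listed above are what distinguish one flavor's paths from another's. The helper below is a small sketch of that relationship: `guess_flavor` and `SCHEME_TO_FLAVOR` are hypothetical names, not part of ZenML, and the mapping only covers the schemes named in this section.

```python
from urllib.parse import urlparse

# Scheme-to-flavor mapping taken from the flavors listed in this section.
# The helper itself is illustrative and not a ZenML API.
SCHEME_TO_FLAVOR = {
    "s3": "s3",
    "gs": "gcp",
    "abfs": "azure",
    "az": "azure",
}


def guess_flavor(uri: str) -> str:
    """Return the flavor matching the URI scheme.

    Paths without a scheme are treated as local; unknown schemes are
    reported as "custom".
    """
    scheme = urlparse(uri).scheme
    if not scheme:
        return "local"
    return SCHEME_TO_FLAVOR.get(scheme, "custom")


flavors = [
    guess_flavor(uri)
    for uri in ("s3://my-bucket/artifacts", "gs://my-bucket/artifacts", "/home/user/zenml")
]
```

Note that this simple parsing treats a Windows drive letter such as `C:` as a scheme, so it is only suitable for POSIX-style local paths.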

How to use it

The Artifact Store provides low-level object storage services for other ZenML mechanisms. When you develop ZenML pipelines, you normally don't even have to be aware of its existence or interact with it directly. ZenML provides higher-level APIs that can be used as an alternative to store and access artifacts:

  • return one or more objects from your pipeline steps to have them automatically saved in the active Artifact Store as pipeline artifacts.

  • retrieve pipeline artifacts from the active Artifact Store after a pipeline run is complete.

You will probably need to interact with the low-level Artifact Store API directly in the following cases:

  • if you implement custom Materializers for your artifact data types

  • if you want to store custom objects in the Artifact Store

The Artifact Store API

All ZenML Artifact Stores implement the same IO API that resembles a standard file system. This allows you to access and manipulate the objects stored in the Artifact Store in the same manner you would normally handle files on your computer and independently of the particular type of Artifact Store that is configured in your ZenML stack.

Accessing the low-level Artifact Store API can be done through the following Python modules:

  • zenml.io.fileio provides low-level utilities for manipulating Artifact Store objects (e.g. open, copy, rename, remove, mkdir). These functions work seamlessly across Artifact Store types. They have the same signature as the Artifact Store abstraction methods (in fact, they are one and the same under the hood).

  • zenml.utils.io_utils includes some higher-level helper utilities that make it easier to find and transfer objects between the Artifact Store and the local filesystem or memory.
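To illustrate what a shared, file-system-like IO API buys you, here is a toy sketch using only the standard library. None of these classes exist in ZenML: `ToyStore`, `LocalToyStore`, and `InMemoryToyStore` are hypothetical, and the in-memory class merely stands in for a remote object store. The point is that `save_text` and `load_text` are written once against the interface and work unchanged with either backend.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod
from typing import IO, Dict


class ToyStore(ABC):
    """Hypothetical file-system-like interface; not ZenML's actual base class."""

    @abstractmethod
    def open(self, path: str, mode: str = "r") -> IO[str]:
        ...

    @abstractmethod
    def exists(self, path: str) -> bool:
        ...

    @abstractmethod
    def makedirs(self, path: str) -> None:
        ...


class LocalToyStore(ToyStore):
    """Backs the interface with the local filesystem."""

    def open(self, path: str, mode: str = "r") -> IO[str]:
        return open(path, mode, encoding="utf-8")

    def exists(self, path: str) -> bool:
        return os.path.exists(path)

    def makedirs(self, path: str) -> None:
        os.makedirs(path, exist_ok=True)


class _MemoryWriter(io.StringIO):
    """Text buffer that stores its contents in a dict when closed."""

    def __init__(self, objects: Dict[str, str], key: str) -> None:
        super().__init__()
        self._objects = objects
        self._key = key

    def close(self) -> None:
        if not self.closed:
            self._objects[self._key] = self.getvalue()
        super().close()


class InMemoryToyStore(ToyStore):
    """Stands in for a remote object store by keeping objects in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def open(self, path: str, mode: str = "r") -> IO[str]:
        if "w" in mode:
            return _MemoryWriter(self._objects, path)
        return io.StringIO(self._objects[path])

    def exists(self, path: str) -> bool:
        return path in self._objects

    def makedirs(self, path: str) -> None:
        pass  # object stores have no real directories to create


def save_text(store: ToyStore, directory: str, name: str, text: str) -> str:
    """Write `text` through the shared interface and return the object path."""
    store.makedirs(directory)
    path = f"{directory}/{name}"
    with store.open(path, "w") as f:
        f.write(text)
    return path


def load_text(store: ToyStore, path: str) -> str:
    """Read an object back through the shared interface."""
    with store.open(path, "r") as f:
        return f.read()


results = {}
with tempfile.TemporaryDirectory() as tmp_dir:
    backends = {
        "local": (LocalToyStore(), os.path.join(tmp_dir, "examples")),
        "memory": (InMemoryToyStore(), "mem://bucket/examples"),
    }
    for label, (store, directory) in backends.items():
        path = save_text(store, directory, "test.txt", "example artifact")
        results[label] = (store.exists(path), load_text(store, path))
```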

The following are some code examples showing how to use the Artifact Store API for various operations:

  • creating folders, writing and reading data directly to/from an artifact store object

import os

from zenml.client import Client
from zenml.io import fileio
from zenml.utils import io_utils

# Root URI of the Artifact Store in the active stack
root_path = Client().active_stack.artifact_store.path

artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")

# Create the folder and write a string to an object in the Artifact Store
artifact_contents = "example artifact"
fileio.makedirs(artifact_path)
io_utils.write_file_contents_as_string(artifact_uri, artifact_contents)

# Read the contents back from the same object
artifact_contents = io_utils.read_file_contents_as_string(artifact_uri)

  • using a temporary local file/folder to serialize and copy in-memory objects to/from the artifact store (heavily used in Materializers to transfer information between the Artifact Store and external libraries that don't support writing/reading directly to/from the artifact store backend):

import os
import tempfile

# Placeholder for any library that can only read/write local files
import external_lib

from zenml.client import Client
from zenml.io import fileio

# Root URI of the Artifact Store in the active stack
root_path = Client().active_stack.artifact_store.path

artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.json")
fileio.makedirs(artifact_path)

# Save: serialize to a local temporary file, then copy it into the Artifact Store
with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=True
) as f:
    external_lib.external_object.save_to_file(f.name)
    # Copy it into artifact store
    fileio.copy(f.name, artifact_uri)

# Load: copy the serialized object to a local temporary file, then deserialize it
with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=True
) as f:
    # Copy the serialized object from the artifact store; the temporary
    # file already exists, so the copy must be allowed to overwrite it
    fileio.copy(artifact_uri, f.name, overwrite=True)
    external_lib.external_object.load_from_file(f.name)
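
Because `external_lib` above is only a placeholder, the same round trip can be tried out with the self-contained sketch below. It makes several substitutions that are assumptions for illustration: the `json` module plays the role of the external library, `shutil.copy` stands in for `fileio.copy`, and a temporary folder stands in for the Artifact Store root. The helper names `save_via_tempfile` and `load_via_tempfile` are hypothetical.

```python
import json
import os
import shutil
import tempfile


def save_via_tempfile(obj: dict, artifact_uri: str) -> None:
    """Serialize `obj` to a local temporary file, then copy it into the store."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "object.json")
        with open(local_path, "w", encoding="utf-8") as f:
            json.dump(obj, f)  # stand-in for the external library's save call
        os.makedirs(os.path.dirname(artifact_uri), exist_ok=True)
        shutil.copy(local_path, artifact_uri)  # stand-in for fileio.copy


def load_via_tempfile(artifact_uri: str) -> dict:
    """Copy the stored object to a local temporary file, then deserialize it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "object.json")
        shutil.copy(artifact_uri, local_path)  # stand-in for fileio.copy
        with open(local_path, "r", encoding="utf-8") as f:
            return json.load(f)  # stand-in for the external library's load call


with tempfile.TemporaryDirectory() as store_root:
    uri = os.path.join(store_root, "artifacts", "examples", "test.json")
    save_via_tempfile({"weights": [0.1, 0.2]}, uri)
    round_tripped = load_via_tempfile(uri)
```

This sketch uses a temporary directory instead of a named temporary file, which avoids reopening a file that is still held open (a known limitation of `NamedTemporaryFile` on Windows).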
