0.23.0

Artifact Stores

How to set up the persistent storage for your artifacts
The Artifact Store is a central component in any MLOps stack. As the name suggests, it acts as a data persistence layer where artifacts (e.g. datasets, models) ingested or generated by the machine learning pipelines are stored.
ZenML automatically serializes and saves the data circulated through your pipelines in the Artifact Store: datasets, models, data profiles, data and model validation reports, and generally any object returned by a pipeline step. Combined with ZenML's tracking of pipeline runs, this enables features such as caching, provenance/lineage tracking, and pipeline reproducibility.
Not all objects returned by pipeline steps are physically stored in the Artifact Store, nor do they have to be. How artifacts are serialized and deserialized and where their contents are stored is determined by the particular implementation of the Materializer associated with the artifact data type. The majority of Materializers shipped with ZenML use the Artifact Store that is part of the active Stack as the location where artifacts are kept.
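The type-based lookup described above can be illustrated with a toy sketch. The registry, the materializer classes and `materializer_for` are hypothetical names, not ZenML's API. Real materializers write to the Artifact Store instead of returning bytes. The sketch only shows how the data type of a step output can decide which serialization logic applies.

```python
import json
import pickle
from typing import Any, Dict, Type


class JsonMaterializer:
    """Toy materializer (hypothetical): stores JSON-compatible objects as text."""

    def serialize(self, obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def deserialize(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


class PickleMaterializer:
    """Toy fallback materializer (hypothetical): stores any picklable object."""

    def serialize(self, obj: Any) -> bytes:
        return pickle.dumps(obj)

    def deserialize(self, data: bytes) -> Any:
        return pickle.loads(data)


# Hypothetical registry mapping data types to the materializer that handles them
REGISTRY: Dict[Type[Any], Any] = {
    dict: JsonMaterializer(),
    list: JsonMaterializer(),
    object: PickleMaterializer(),  # catch-all fallback
}


def materializer_for(obj: Any) -> Any:
    # Walk the method resolution order so subclasses reuse a parent's materializer
    for cls in type(obj).__mro__:
        if cls in REGISTRY:
            return REGISTRY[cls]
    raise TypeError(f"No materializer registered for {type(obj)!r}")
```

Walking the MRO means a subclass of `dict` is handled by the `dict` entry. Anything unregistered falls through to the `object` entry.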
If you need to store a particular type of pipeline artifact in a different medium (e.g. use an external model registry to store model artifacts, or an external data lake or data warehouse to store dataset artifacts), you can write your own Materializer that implements the required custom logic. In contrast, if you need an entirely different storage backend for artifacts, one that isn't already covered by one of the ZenML integrations, you can extend the Artifact Store abstraction to provide your own Artifact Store implementation.
In addition to pipeline artifacts, the Artifact Store may also be used as a storage backend by other specialized stack components that need to persist their data in some form of object storage. The Great Expectations Data Validator is one such example.
Related concepts:
  • the Artifact Store is a type of Stack Component that needs to be registered as part of your ZenML Stack.
  • the objects circulated through your pipelines are serialized and stored in the Artifact Store using Materializers. Materializers implement the logic required to serialize and deserialize the artifact contents, and to store them in and retrieve them from the Artifact Store.
  • you can access the artifacts produced by your pipeline runs from the Artifact Store using the post-execution workflow API.

When to use it

The Artifact Store is a mandatory component in the ZenML stack. It is used to store all artifacts produced by pipeline runs, and you are required to configure it in all of your stacks.

Artifact Store Flavors

Out of the box, ZenML comes with a local Artifact Store, already part of the default stack, that stores artifacts on your local filesystem. Additional Artifact Stores are provided by integrations:
| Artifact Store | Flavor | Integration | URI Schema(s) | Notes |
|---|---|---|---|---|
| Local | local | built-in | None | This is the default Artifact Store. It stores artifacts on your local filesystem and should only be used when running ZenML locally. |
| Amazon S3 | s3 | s3 | s3:// | Uses AWS S3 as an object store backend |
| Google Cloud Storage | gcp | gcp | gs:// | Uses Google Cloud Storage as an object store backend |
| Azure | azure | azure | abfs://, az:// | Uses Azure Blob Storage as an object store backend |
| Custom implementation | custom | | custom | Extend the Artifact Store abstraction and provide your own implementation |
If you would like to see the available flavors of Artifact Stores, you can use the command:
zenml artifact-store flavor list
Every Artifact Store has a path attribute that must be configured when it is registered with ZenML. This is a URI pointing to the root path where all objects are stored in the Artifact Store. It must use a URI schema that is supported by the Artifact Store flavor. For example, the S3 Artifact Store will need a URI that contains the s3:// schema:
zenml artifact-store register s3_store -f s3 --path s3://my_bucket
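The rule that the path must use a URI schema supported by the flavor can be expressed as a small check. This is a hypothetical sketch, not ZenML's validation code. The scheme sets are copied from the flavor table, and `check_store_path` is an illustrative name.

```python
from urllib.parse import urlparse

# Schemes per flavor, taken from the flavor table; "local" accepts plain paths.
# Note: Windows drive letters (C:\...) would parse as a scheme; ignored in this sketch.
SUPPORTED_SCHEMES = {
    "local": {""},
    "s3": {"s3"},
    "gcp": {"gs"},
    "azure": {"abfs", "az"},
}


def check_store_path(flavor: str, path: str) -> None:
    """Raise ValueError if `path` does not use a scheme supported by `flavor`."""
    if flavor not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unknown flavor: {flavor!r}")
    scheme = urlparse(path).scheme
    if scheme not in SUPPORTED_SCHEMES[flavor]:
        raise ValueError(
            f"Path {path!r} does not use a scheme supported by the {flavor!r} flavor"
        )
```

For example, `check_store_path("s3", "s3://my_bucket")` passes silently, while `check_store_path("s3", "gs://my_bucket")` raises `ValueError`.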

How to use it

The Artifact Store provides low-level object storage services for other ZenML mechanisms. When you develop ZenML pipelines, you normally don't even have to be aware of its existence or interact with it directly. ZenML provides higher-level APIs that can be used as an alternative to store and access artifacts:
  • return one or more objects from your pipeline steps to have them automatically saved in the active Artifact Store as pipeline artifacts.
  • use the post-execution workflow API to retrieve pipeline artifacts from the active Artifact Store after a pipeline run is complete.
You will probably need to interact with the low-level Artifact Store API directly:
  • if you implement custom Materializers for your artifact data types
  • if you want to store custom objects in the Artifact Store

The Artifact Store API

All ZenML Artifact Stores implement the same IO API that resembles a standard file system. This allows you to access and manipulate the objects stored in the Artifact Store in the same manner you would normally handle files on your computer and independently of the particular type of Artifact Store that is configured in your ZenML stack.
Accessing the low-level Artifact Store API can be done through the following Python modules:
  • zenml.io.fileio provides low-level utilities for manipulating Artifact Store objects (e.g. open, copy, rename, remove, mkdir). These functions work seamlessly across Artifact Store types. They have the same signature as the Artifact Store abstraction methods (in fact, they are one and the same under the hood).
  • zenml.utils.io_utils includes some higher-level helper utilities that make it easier to find and transfer objects between the Artifact Store and the local filesystem or memory.
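To illustrate what "an IO API that resembles a standard file system" means, here is a pure-Python sketch of such an interface with a local-disk backend. The class and method names are hypothetical and modeled loosely on the operations listed above. They are not ZenML's actual abstract Artifact Store class. A cloud backend would implement the same methods against its object store.

```python
import os
import shutil
from abc import ABC, abstractmethod
from typing import IO


class FileSystemLike(ABC):
    """Hypothetical minimal interface with file-system-style operations."""

    @abstractmethod
    def open(self, path: str, mode: str = "r") -> IO: ...

    @abstractmethod
    def makedirs(self, path: str) -> None: ...

    @abstractmethod
    def copy(self, src: str, dst: str, overwrite: bool = False) -> None: ...

    @abstractmethod
    def rename(self, src: str, dst: str, overwrite: bool = False) -> None: ...

    @abstractmethod
    def remove(self, path: str) -> None: ...

    @abstractmethod
    def exists(self, path: str) -> bool: ...


class LocalStore(FileSystemLike):
    """Local-disk backend; callers only depend on the FileSystemLike methods."""

    def open(self, path: str, mode: str = "r") -> IO:
        return open(path, mode)  # resolves to the builtin open

    def makedirs(self, path: str) -> None:
        os.makedirs(path, exist_ok=True)

    def copy(self, src: str, dst: str, overwrite: bool = False) -> None:
        # Refuse to clobber an existing object unless explicitly asked to
        if not overwrite and os.path.exists(dst):
            raise FileExistsError(dst)
        shutil.copyfile(src, dst)

    def rename(self, src: str, dst: str, overwrite: bool = False) -> None:
        if not overwrite and os.path.exists(dst):
            raise FileExistsError(dst)
        os.replace(src, dst)

    def remove(self, path: str) -> None:
        os.remove(path)

    def exists(self, path: str) -> bool:
        return os.path.exists(path)
```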
When calling the Artifact Store API, you should always use URIs located under the Artifact Store root path. Otherwise you risk using an unsupported protocol or storing objects outside the store. You can use the Client singleton to retrieve the root path of the active Artifact Store and then use it as a base path for artifact URIs, e.g.:
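import os
from zenml.client import Client
from zenml.io import fileio
# Root path of the Artifact Store in the active stack (e.g. s3://my_bucket)
root_path = Client().active_stack.artifact_store.path
artifact_contents = "example artifact"
# Build artifact URIs on top of the root path
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")
# Create the folder, then write the contents through the Artifact Store API
fileio.makedirs(artifact_path)
with fileio.open(artifact_uri, "w") as f:
    f.write(artifact_contents)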
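The advice to keep URIs under the root path can be enforced with a small helper. This is a hypothetical sketch that is not part of ZenML. It assumes "/"-separated URIs such as `s3://my_bucket/prefix`. It uses `posixpath` so the separator stays "/" regardless of the operating system running the code.

```python
import posixpath


def artifact_uri(root: str, *parts: str) -> str:
    """Join relative `parts` onto the store `root`, refusing paths that escape it.

    Hypothetical helper; assumes '/'-separated URIs.
    """
    for part in parts:
        # Absolute paths or full URIs would bypass the store root entirely
        if part.startswith("/") or "://" in part:
            raise ValueError(f"Expected a relative path, got {part!r}")
    relative = posixpath.normpath(posixpath.join(*parts)) if parts else "."
    # After normalization, a leading ".." means the path climbs above the root
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"Path {relative!r} escapes the artifact store root")
    if relative == ".":
        return root
    return root.rstrip("/") + "/" + relative
```

For instance, `artifact_uri("s3://my_bucket", "artifacts", "examples", "test.txt")` yields `s3://my_bucket/artifacts/examples/test.txt`, while `artifact_uri("s3://my_bucket", "a", "../../x")` raises `ValueError`.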
When you use the Artifact Store API to write custom Materializers, the base artifact URI path is already provided (see the documentation on Materializers for an example).
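The point that a Materializer receives its base URI instead of choosing one can be shown with a toy class. `JsonFileMaterializer`, `save` and `load` are hypothetical names, and ZenML's real Materializer base class and method names may differ. The sketch also uses the builtin `open` and `os` on the local disk for runnability. A real Materializer would use `zenml.io.fileio` so that it works with any Artifact Store.

```python
import json
import os
from typing import Any


class JsonFileMaterializer:
    """Toy materializer (hypothetical) that keeps its files under a given base URI."""

    FILENAME = "data.json"

    def __init__(self, uri: str) -> None:
        # The base artifact URI is supplied by the caller; the materializer
        # never decides where in the store the artifact lives.
        self.uri = uri

    def save(self, data: Any) -> None:
        os.makedirs(self.uri, exist_ok=True)
        with open(os.path.join(self.uri, self.FILENAME), "w") as f:
            json.dump(data, f)

    def load(self) -> Any:
        with open(os.path.join(self.uri, self.FILENAME)) as f:
            return json.load(f)
```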
The following are some code examples showing how to use the Artifact Store API for various operations:
  • creating folders, writing and reading data directly to/from an artifact store object
import os
from zenml.utils import io_utils
from zenml.io import fileio
from zenml.client import Client
root_path = Client().active_stack.artifact_store.path
artifact_contents = "example artifact"
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")
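# Write: create the folder if needed, then store the string as the object's contents
fileio.makedirs(artifact_path)
io_utils.write_file_contents_as_string(artifact_uri, artifact_contents)
# Read (e.g. in a separate script): fetch the same object's contents back as a string
import os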
from zenml.utils import io_utils
from zenml.client import Client
root_path = Client().active_stack.artifact_store.path
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.txt")
artifact_contents = io_utils.read_file_contents_as_string(artifact_uri)
  • using a temporary local file/folder to serialize and copy in-memory objects to/from the artifact store (heavily used in Materializers to transfer information between the Artifact Store and external libraries that don't support writing/reading directly to/from the artifact store backend):
import os
import tempfile
import external_lib  # placeholder for any library that can only save to a local file
from zenml.client import Client
from zenml.io import fileio
root_path = Client().active_stack.artifact_store.path
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.json")
fileio.makedirs(artifact_path)
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".json", delete=True
) as f:
    # Serialize the in-memory object to the local temporary file
    external_lib.external_object.save_to_file(f.name)
    # Copy it into the artifact store before the temporary file is deleted
    fileio.copy(f.name, artifact_uri)
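import os
import tempfile
import external_lib  # placeholder for any library that can only load from a local file
from zenml.client import Client
from zenml.io import fileio
root_path = Client().active_stack.artifact_store.path
artifact_path = os.path.join(root_path, "artifacts", "examples")
artifact_uri = os.path.join(artifact_path, "test.json")
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".json", delete=True
) as f:
    # Copy the serialized object from the artifact store; overwrite=True is
    # needed because the temporary file already exists
    fileio.copy(artifact_uri, f.name, overwrite=True)
    # Deserialize it back into memory
    external_lib.external_object.load_from_file(f.name)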
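The temporary-file round trip can be tried without a ZenML stack using a runnable stand-in. In this sketch `json` plays the role of the external library, a local directory plays the role of the Artifact Store, and `shutil.copyfile` stands in for `fileio.copy`. The function names are hypothetical. It stages files in a `tempfile.TemporaryDirectory` instead of a `NamedTemporaryFile`. The Python docs note that reopening a still-open named temporary file by name does not work on Windows.

```python
import json
import os
import shutil
import tempfile
from typing import Any


def save_via_temp_file(obj: Any, artifact_uri: str) -> None:
    # Serialize into a local temporary directory first, as an external
    # library that only understands local paths would.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "object.json")
        with open(local_path, "w") as f:
            json.dump(obj, f)  # stand-in for the external library's save call
        # Stand-in for fileio.copy(local_path, artifact_uri)
        shutil.copyfile(local_path, artifact_uri)


def load_via_temp_file(artifact_uri: str) -> Any:
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "object.json")
        # Stand-in for fileio.copy(artifact_uri, local_path)
        shutil.copyfile(artifact_uri, local_path)
        with open(local_path) as f:
            return json.load(f)  # stand-in for the external library's load call
```

Because the staging directory is removed when each `with` block exits, no local copies are left behind once the object has been transferred.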