Artifacts

Learn how ZenML manages data artifacts, tracks versioning and lineage, and enables effective data flow between steps.

Artifacts are a cornerstone of ZenML's ML pipeline management system. This guide explains what artifacts are, how they work, and how to use them effectively in your pipelines.

Artifacts in the Pipeline Workflow

Here's how artifacts fit into the ZenML pipeline workflow:

  1. A step produces data as output

  2. ZenML automatically stores this output as an artifact

  3. Other steps can use this artifact as input

  4. ZenML tracks the relationships between artifacts and steps

This system creates a complete data lineage for every artifact in your ML workflows, enabling reproducibility and traceability.

Basic Artifact Usage

Creating Artifacts (Step Outputs)

Any value returned from a step becomes an artifact:

from zenml import pipeline, step
import pandas as pd

@step
def create_data() -> pd.DataFrame:
    """Creates a dataframe that becomes an artifact."""
    return pd.DataFrame({
        "feature_1": [1, 2, 3],
        "feature_2": [4, 5, 6],
        "target": [10, 20, 30]
    })

@step
def create_prompt_template() -> str:
    """Creates a prompt template that becomes an artifact."""
    return """
    You are a helpful customer service agent. 
    
    Customer Query: {query}
    Previous Context: {context}
    
    Please provide a helpful response following our company guidelines.
    """

Consuming Artifacts (Step Inputs)

You can use artifacts by receiving them as inputs to other steps:
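
As a minimal sketch, a downstream step simply declares a typed parameter and receives the artifact produced by the create_data step above:

from zenml import pipeline, step
import pandas as pd

@step
def summarize_data(df: pd.DataFrame) -> str:
    """Receives the dataframe artifact produced by create_data."""
    return f"{len(df)} rows, target mean: {df['target'].mean():.1f}"

@pipeline
def data_pipeline():
    df = create_data()   # output artifact of create_data...
    summarize_data(df)   # ...becomes the input artifact of summarize_data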

Artifacts vs. Parameters

When calling a step, inputs can be either artifacts or parameters:

  • Artifacts are outputs from other steps in the pipeline. They are tracked, versioned, and stored in the artifact store.

  • Parameters are literal values provided directly to the step. They aren't stored as artifacts but are recorded with the pipeline run.

Parameters are limited to JSON-serializable values (numbers, strings, lists, dictionaries, etc.). More complex objects should be passed as artifacts.
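
For example, in the sketch below data is an artifact (the output of another step) while learning_rate is a parameter (a literal, JSON-serializable value); the step names are illustrative:

from zenml import pipeline, step
import pandas as pd

@step
def train_model(data: pd.DataFrame, learning_rate: float) -> float:
    """`data` arrives as an artifact, `learning_rate` as a parameter."""
    return learning_rate * len(data)

@pipeline
def training_pipeline():
    data = create_data()                         # artifact: tracked and versioned
    train_model(data=data, learning_rate=0.01)   # parameter: recorded with the run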

Accessing Artifacts After Pipeline Runs

You can access artifacts from completed runs using the ZenML Client:
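
A minimal sketch of the pattern (the exact accessors vary slightly between ZenML versions):

from zenml.client import Client

# fetch the most recent run of a pipeline
run = Client().get_pipeline("data_pipeline").last_run

# look up the create_data step and load its output artifact
step_run = run.steps["create_data"]
df = step_run.outputs["output"].load()
# note: on some ZenML versions `outputs` maps each name to a list of artifact versions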

Working with Artifact Types

Type Annotations

Type annotations are important when working with artifacts as they:

  1. Help ZenML select the appropriate materializer for storage

  2. Validate inputs and outputs at runtime

  3. Document the data flow of your pipeline

ZenML supports many common data types out of the box:

  • Primitive types (int, float, str, bool)

  • Container types (dict, list, tuple)

  • NumPy arrays

  • Pandas DataFrames

  • Many ML model formats (through integrations)

Returning Multiple Outputs

Steps can return multiple artifacts using tuples:
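
For example, a split step can be annotated with a Tuple so that each element becomes its own artifact (a sketch using an 80/20 split):

from typing import Tuple
import pandas as pd
from zenml import step

@step
def split_data(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Returns two artifacts: a train split and a test split."""
    train = df.sample(frac=0.8, random_state=42)
    test = df.drop(train.index)
    return train, test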

ZenML differentiates between:

  • A step with multiple outputs: return a, b or return (a, b)

  • A step with a single tuple output: return some_tuple

Naming Your Artifacts

By default, artifacts receive generic, position-based names:

  • Single outputs are named output

  • Multiple outputs are named output_0, output_1, etc.

You can give your artifacts more meaningful names using the Annotated type:
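
For example, the split step above can name its outputs explicitly (Annotated is imported from typing_extensions here for compatibility with older Python versions):

from typing import Tuple
from typing_extensions import Annotated
import pandas as pd
from zenml import step

@step
def split_data(df: pd.DataFrame) -> Tuple[
    Annotated[pd.DataFrame, "train_set"],
    Annotated[pd.DataFrame, "test_set"],
]:
    """The outputs are stored as artifacts named 'train_set' and 'test_set'."""
    train = df.sample(frac=0.8, random_state=42)
    test = df.drop(train.index)
    return train, test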

You can even use dynamic naming with placeholders:
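
A sketch of dynamic naming; the substitutions parameter used for the custom placeholder assumes a recent ZenML version:

from typing_extensions import Annotated
from zenml import step

@step(substitutions={"model_name": "support_agent"})
def export_prompt() -> Annotated[str, "{model_name}_prompt_{date}_{time}"]:
    """Resolves at runtime to e.g. 'support_agent_prompt_2023_06_15_14_30_45_123456'."""
    return "You are a helpful customer service agent."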

ZenML supports these placeholders:

  • {date}: Current date (e.g., "2023_06_15")

  • {time}: Current time (e.g., "14_30_45_123456")

  • Custom placeholders can be defined using substitutions

How Artifacts Work Under the Hood

Materializers: How Data Gets Stored

Materializers are a key concept in ZenML's artifact system. They handle:

  • Serializing data when saving artifacts to storage

  • Deserializing data when loading artifacts from storage

  • Generating visualizations for the dashboard

  • Extracting metadata for tracking and searching

When a step produces an output, ZenML automatically selects the appropriate materializer based on the data type (using type annotations). ZenML includes built-in materializers for common data types like:

  • Primitive types (int, float, str, bool)

  • Container types (dict, list, tuple)

  • NumPy arrays, Pandas DataFrames and many other ML-related formats (through integrations)

Here's how materializers work in practice:
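
The pipeline below is a minimal sketch of what happens automatically when a DataFrame passes between two steps:

import pandas as pd
from zenml import pipeline, step

@step
def produce() -> pd.DataFrame:         # annotation tells ZenML to use its pandas materializer
    return pd.DataFrame({"a": [1, 2, 3]})

@step
def consume(df: pd.DataFrame) -> int:  # the same materializer loads the stored artifact back
    return len(df)

@pipeline
def materializer_demo():
    consume(produce())

# On execution, the materializer serializes the DataFrame into the artifact store,
# records metadata about it, and deserializes it again before `consume` runs.
materializer_demo()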

For custom data types, you can create your own materializers. See the Materializers guide for details.

Lineage and Caching

ZenML automatically tracks the complete lineage of each artifact:

  • Which step produced it

  • Which pipeline run it belongs to

  • Which other artifacts it depends on

  • Which steps have consumed it

This lineage tracking enables powerful caching capabilities. When you run a pipeline, ZenML checks if any steps have been run before with the same inputs, code, and configuration. If so, it reuses the cached outputs instead of rerunning the step:
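
A sketch of caching in action: running the same pipeline twice without changes reuses the stored outputs, and enable_cache=False opts a step out:

import pandas as pd
from zenml import pipeline, step

@step
def load_data() -> pd.DataFrame:
    return pd.DataFrame({"x": [1, 2, 3]})

@step(enable_cache=False)              # always re-run, e.g. for non-deterministic steps
def score(df: pd.DataFrame) -> float:
    return float(df["x"].mean())

@pipeline
def scoring_pipeline():
    score(load_data())

scoring_pipeline()   # first run: both steps execute
scoring_pipeline()   # second run: load_data is cached; its stored artifact is reused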

Advanced Artifact Usage

Accessing Artifacts from Previous Runs

You can access artifacts from any previous run by name or ID:
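
For example, using the Client (the artifact name and ID below are placeholders):

from zenml.client import Client

client = Client()

# latest version of an artifact, looked up by name
train_set = client.get_artifact_version("train_set")
df = train_set.load()

# a specific artifact version, looked up by its ID
specific_version = client.get_artifact_version("<ARTIFACT_VERSION_ID>")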

You can also access artifacts within steps:
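
For instance, a step can fetch a reference artifact produced by an earlier run (a sketch, reusing the train_set name from above):

from zenml import step
from zenml.client import Client

@step
def evaluate_against_reference() -> int:
    """Loads an artifact produced by a previous run inside the step body."""
    reference = Client().get_artifact_version("train_set").load()
    return len(reference)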

Cross-Pipeline Artifact Usage

You can use artifacts produced by one pipeline in another pipeline:
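
A minimal sketch: one pipeline publishes a named dataset artifact, and a step in a second pipeline loads it by name:

import pandas as pd
from typing_extensions import Annotated
from zenml import pipeline, step
from zenml.client import Client

@step
def publish_dataset() -> Annotated[pd.DataFrame, "shared_dataset"]:
    return pd.DataFrame({"feature": [1, 2, 3], "target": [10, 20, 30]})

@step
def train_on_shared_dataset() -> float:
    df = Client().get_artifact_version("shared_dataset").load()
    return float(df["target"].mean())

@pipeline
def publishing_pipeline():
    publish_dataset()

@pipeline
def consuming_pipeline():
    train_on_shared_dataset()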

This allows you to build modular pipelines that can work together as part of a larger ML system.

Visualizing Artifacts

ZenML automatically generates visualizations for many types of artifacts, which you can view in the dashboard.

For detailed information on visualizations, see Visualizations.

Managing Artifacts

Individual artifacts cannot be deleted directly (to prevent broken references). However, you can clean up unused artifacts:
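
From the command line, this is typically done with the prune command:

zenml artifact prune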

This deletes artifacts that are no longer referenced by any pipeline run. You can control this behavior with flags:

  • --only-artifact: Only delete the physical files, keep database entries

  • --only-metadata: Only delete database entries, keep files

  • --ignore-errors: Continue pruning even if some artifacts can't be deleted

Registering Existing Data as Artifacts

Sometimes, you may have data created externally (outside of ZenML pipelines) that you want to use within your ZenML workflows. Instead of reading and materializing this data within a step, you can register existing files or folders as ZenML artifacts directly.

Register an Existing Folder

To register a folder as a ZenML artifact:
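
A minimal sketch, assuming the folder already lives inside the active artifact store and using the register_artifact helper:

import os
from zenml import register_artifact
from zenml.client import Client

# a folder of files produced outside of any pipeline,
# located inside the active artifact store
folder_uri = os.path.join(Client().active_stack.artifact_store.path, "raw_data")

register_artifact(folder_or_file_uri=folder_uri, name="raw_data_folder")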

Register an Existing File

Similarly, you can register individual files:
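
The same helper works for a single file (again assuming it sits inside the active artifact store):

import os
from zenml import register_artifact
from zenml.client import Client

file_uri = os.path.join(Client().active_stack.artifact_store.path, "model.pkl")

register_artifact(folder_or_file_uri=file_uri, name="pretrained_model_file")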

This approach is particularly useful for:

  • Integrating with external ML frameworks that save their own data

  • Working with pre-existing datasets

  • Registering model checkpoints created during training

When you load these artifacts, you'll receive a pathlib.Path pointing to a temporary location in your executing environment, ready for use as a normal local path.
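
For example, loading the folder registered above:

from zenml.client import Client

local_path = Client().get_artifact_version("raw_data_folder").load()
print(local_path)   # a pathlib.Path pointing to a local copy of the registered folder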

Register Framework Checkpoints

A common use case is registering model checkpoints from training frameworks like PyTorch Lightning:
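
A sketch of the idea, assuming a LightningModule called model (its definition is omitted) and reusing the register_artifact helper from above:

import os
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from zenml import register_artifact, step
from zenml.client import Client

@step
def train_model() -> None:
    # write checkpoints into a folder inside the active artifact store
    checkpoint_dir = os.path.join(
        Client().active_stack.artifact_store.path, "checkpoints"
    )
    trainer = Trainer(
        max_epochs=2,
        callbacks=[ModelCheckpoint(dirpath=checkpoint_dir, save_top_k=1)],
    )
    trainer.fit(model)  # `model` is your LightningModule (not defined here)

    # register the checkpoint folder as a single ZenML artifact
    register_artifact(folder_or_file_uri=checkpoint_dir, name="lightning_checkpoints")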

You can also extend the ModelCheckpoint callback to register each checkpoint as a separate artifact version during training. This approach enables better version control of intermediate checkpoints.

Conclusion

Artifacts are a central part of ZenML's approach to ML pipelines. They provide:

  • Automatic versioning and lineage tracking

  • Efficient storage and caching

  • Type-safe data handling

  • Visualization capabilities

  • Cross-pipeline data sharing

Whether you're working with traditional ML models, prompt templates, agent configurations, or evaluation datasets, ZenML's artifact system treats them all uniformly. This enables you to apply the same MLOps principles across your entire AI stack - from classical ML to complex multi-agent systems.

By understanding how artifacts work, you can build more effective, maintainable, and reproducible ML pipelines and AI workflows.

For more information on specific aspects of artifacts, see the Materializers and Visualizations guides referenced above.
