
Visualize Data Lineage

How to visualize ZenML pipeline runs
This is an older version of the ZenML documentation. To read and view the latest version please visit this up-to-date URL.
ZenML's Dash integration provides a PipelineRunLineageVisualizer that can be used to visualize pipeline runs in your local browser, as shown below:
Pipeline Run Visualization Example


Before you can use the Dash visualizer, you first need to install ZenML's Dash integration:
zenml integration install dash -y
See the Integrations page for more details on ZenML integrations and how to install and use them.
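To confirm the installation before running a visualizer, one option is to check whether the `dash` package can be imported. This is a minimal sketch, assuming the integration installs a top-level `dash` module; the `is_module_available` helper is a hypothetical name and not part of ZenML.

```python
from importlib.util import find_spec


def is_module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None when no top-level module with that name exists.
    return find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the Dash integration provides an importable `dash` module.
    if is_module_available("dash"):
        print("dash is importable; the Dash visualizer should be usable.")
    else:
        print("dash not found; run `zenml integration install dash -y` first.")
```

The check only looks the module up and does not import it, so it is cheap to run at the top of a script.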

Visualizing Pipelines

After a pipeline run has been started, we can access it using the Repository, as you learned in the last section on Inspecting Finished Pipeline Runs.
We can then visualize a run using the PipelineRunLineageVisualizer class:
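from zenml.integrations.dash.visualizers.pipeline_run_lineage_visualizer import (
    PipelineRunLineageVisualizer,
)
from zenml.repository import Repository

repo = Repository()
latest_run = repo.get_pipeline(<PIPELINE_NAME>).runs[-1]  # most recent run
PipelineRunLineageVisualizer().visualize(latest_run)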
This will open an interactive visualization in your local browser, where squares represent your artifacts and circles represent your pipeline steps.
The different nodes are color-coded in the visualization, so if your pipeline ever fails or runs for too long, you can find the responsible step at a glance, as it will be colored red or yellow respectively.
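The structure of such a lineage graph can be sketched in plain Python. This is an illustration only, not how the visualizer is implemented: the `RunSpec` type, the `build_lineage_graph` helper, and the artifact name `model` are hypothetical, while the step names mirror the code example later on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical run description: step name -> (input artifacts, output artifacts).
RunSpec = Dict[str, Tuple[List[str], List[str]]]


def build_lineage_graph(run: RunSpec) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Turn a run description into shaped nodes and directed edges."""
    nodes: Dict[str, str] = {}  # node id -> shape
    edges: List[Tuple[str, str]] = []  # (source, target)
    for step, (inputs, outputs) in run.items():
        nodes[step] = "circle"  # steps are drawn as circles
        for artifact in inputs:
            nodes.setdefault(artifact, "square")  # artifacts are squares
            edges.append((artifact, step))  # artifact is consumed by the step
        for artifact in outputs:
            nodes.setdefault(artifact, "square")
            edges.append((step, artifact))  # step produces the artifact
    return nodes, edges


run: RunSpec = {
    "digits_data_loader": ([], ["X_train", "X_test", "y_train", "y_test"]),
    "svc_trainer": (["X_train", "y_train"], ["model"]),
}
nodes, edges = build_lineage_graph(run)
```

Every edge connects a step to an artifact or an artifact to a step, never two nodes of the same kind, which is why the two shapes alternate along any path in the visualization.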

Visualizing Caching

In addition to Completed, Running, and Failed, there is also a separate Cached state. You already learned about caching in a previous section on Caching Pipeline Runs. Using the PipelineRunLineageVisualizer, you can see at a glance which steps were cached (green) and which were rerun (blue). See below for a detailed example.
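The color legend described above can be summarized as a small lookup table. This is a sketch under assumptions: `StepStatus` and `node_color` are hypothetical names rather than ZenML classes, and it assumes that yellow marks a step still in the Running state and that blue marks a step that was rerun and completed.

```python
from enum import Enum


class StepStatus(Enum):
    """The four states named in this section (hypothetical enum)."""

    COMPLETED = "completed"
    RUNNING = "running"
    FAILED = "failed"
    CACHED = "cached"


# Colors as described in the text; the RUNNING and COMPLETED rows are assumptions.
STATUS_COLORS = {
    StepStatus.COMPLETED: "blue",  # step was rerun
    StepStatus.RUNNING: "yellow",  # step is still running, possibly too long
    StepStatus.FAILED: "red",  # step failed
    StepStatus.CACHED: "green",  # step output came from the cache
}


def node_color(status: StepStatus) -> str:
    """Return the color a step node with this status would be drawn in."""
    return STATUS_COLORS[status]
```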

Code Example

In the following example we use the PipelineRunLineageVisualizer to visualize the three pipeline runs from the Caching Pipeline Runs Example:
import numpy as np
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

from zenml.integrations.dash.visualizers.pipeline_run_lineage_visualizer import (
    PipelineRunLineageVisualizer,
)
from zenml.pipelines import pipeline
from zenml.repository import Repository
from zenml.steps import BaseStepConfig, Output, step


@step
def digits_data_loader() -> Output(
    X_train=np.ndarray, X_test=np.ndarray, y_train=np.ndarray, y_test=np.ndarray
):
    """Loads the digits dataset as a tuple of flattened numpy arrays."""
    digits = load_digits()
    data = digits.images.reshape((len(digits.images), -1))
    X_train, X_test, y_train, y_test = train_test_split(
        data, digits.target, test_size=0.2, shuffle=False
    )
    return X_train, X_test, y_train, y_test


class SVCTrainerStepConfig(BaseStepConfig):
    """Trainer params"""

    gamma: float = 0.001


@step(enable_cache=False)  # never cache this step, always retrain
def svc_trainer(
    config: SVCTrainerStepConfig,
    X_train: np.ndarray,
    y_train: np.ndarray,
) -> ClassifierMixin:
    """Train a sklearn SVC classifier."""
    model = SVC(gamma=config.gamma)
    model.fit(X_train, y_train)
    return model


@pipeline
def first_pipeline(step_1, step_2):
    X_train, X_test, y_train, y_test = step_1()
    step_2(X_train, y_train)


first_pipeline_instance = first_pipeline(
    step_1=digits_data_loader(),
    step_2=svc_trainer(),
)

# The pipeline is executed for the first time, so all steps are run.
first_pipeline_instance.run()
latest_run = first_pipeline_instance.get_runs()[-1]
PipelineRunLineageVisualizer().visualize(latest_run)

# Step one will use cache, step two will rerun due to the decorator config
first_pipeline_instance.run()
latest_run = first_pipeline_instance.get_runs()[-1]
PipelineRunLineageVisualizer().visualize(latest_run)

# The complete pipeline will be rerun
# (assumes this ZenML version accepts enable_cache as a runtime override in run())
first_pipeline_instance.run(enable_cache=False)
latest_run = first_pipeline_instance.get_runs()[-1]
PipelineRunLineageVisualizer().visualize(latest_run)

Expected Visualizations

Run 1:
Visualization Run 1
Run 2:
Visualization Run 2
Run 3:
Visualization Run 3
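The step states to expect in the three visualizations can be reasoned through with a small simulation. This is a simplified sketch and not ZenML's caching logic: it ignores cache invalidation from changed code or inputs, and `STEP_CACHE_ENABLED` and `simulate_run` are hypothetical names. It assumes a step is served from the cache only when caching is enabled for both the step and the run, and the step has already run once.

```python
from typing import Dict, Set

# Per-step cache setting, mirroring the comments in the example above.
STEP_CACHE_ENABLED = {"digits_data_loader": True, "svc_trainer": False}


def simulate_run(seen: Set[str], enable_cache: bool = True) -> Dict[str, str]:
    """Return the state each step would show for one simulated run."""
    statuses: Dict[str, str] = {}
    for step, step_cache in STEP_CACHE_ENABLED.items():
        if enable_cache and step_cache and step in seen:
            statuses[step] = "cached"  # shown in green
        else:
            statuses[step] = "completed"  # shown in blue
        seen.add(step)  # remember that this step has produced output
    return statuses


seen: Set[str] = set()
run_1 = simulate_run(seen)  # first execution, nothing to reuse
run_2 = simulate_run(seen)  # loader can reuse its output
run_3 = simulate_run(seen, enable_cache=False)  # caching overridden for the run

for name, result in [("Run 1", run_1), ("Run 2", run_2), ("Run 3", run_3)]:
    print(name, result)
```

Under these assumptions, only the data loader in the second run ends up in the Cached state; every other step in all three runs is executed.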