Dynamically assign artifact names
How to dynamically assign artifact names in your pipelines.
In ZenML pipelines, you often need to reuse the same step multiple times with different inputs, resulting in multiple artifacts. However, the default naming convention for artifacts can make it challenging to track and differentiate between these outputs, especially when they need to be used in subsequent pipelines. This page explores how to generate steps and artifact names dynamically to improve pipeline flexibility and maintainability.
By default, ZenML uses type annotations in function definitions to determine artifact names. While this works well for steps used once in a pipeline, it becomes problematic when:
The same step is called multiple times with different inputs.
The resulting artifacts need to be used in different pipelines later.
Output artifacts are saved with the same name and incremented version numbers.
For example, when using a preprocessor step that needs to transform train, validation, and test data separately, you might end up with three versions of an artifact called transformed_data, making it difficult to track which is which.
ZenML offers two possible ways to address this problem:
Using factory functions to create dynamic steps with custom artifact names.
Using metadata to identify artifacts in a single step.
1. Using factory functions for dynamic artifact names
This approach allows you to create steps with custom artifact names dynamically:
This method generates unique artifact names for each step, making it easier to track and retrieve specific artifacts later in your workflow.
One caveat applies to this first method: at least one of the following two conditions must hold:
The factory must be in the same file as the step definitions, so that the logic with globals() works.
The user must use the same variable name for the step as the __name__ of the entrypoint function.
As you can see, this is not always possible or desirable, so you should use the second method if you can.
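The first condition comes down to a plain Python rule: globals() inside a function returns the namespace of the module where that function was defined, not the namespace of the caller. The sketch below demonstrates this with a throwaway module built through types.ModuleType; factory_module and register are hypothetical names used only for illustration, and nothing here describes ZenML's internals.

```python
import types

# Build a separate module that holds the "factory" helper.
factory_module = types.ModuleType("factory_module")
exec(
    "def register(name, value):\n"
    "    # globals() here is factory_module's namespace\n"
    "    globals()[name] = value\n",
    factory_module.__dict__,
)

# Call the helper from this module, as a pipeline file would.
factory_module.register("dynamic_step_train", object())

registered_in_factory = hasattr(factory_module, "dynamic_step_train")
registered_in_caller = "dynamic_step_train" in globals()

print(registered_in_factory)  # True
print(registered_in_caller)   # False
```

If the factory lived in a different file from the pipeline, the dynamically named step would therefore be registered in the factory's module rather than next to the code that uses it.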
2. Using metadata for custom artifact identification
If you prefer using a single step and differentiating artifacts through metadata, try this approach:
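A minimal sketch of the idea, again written so that it runs without ZenML: one step, one artifact name, and a metadata entry that records which split each version belongs to. ArtifactVersion, STORE, the artifact name dataset, and the custom_prefix key are hypothetical stand-ins for illustration; in a real step you would attach the metadata through ZenML's own metadata API rather than an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ArtifactVersion:
    """Hypothetical stand-in for one stored version of an artifact."""

    name: str
    version: int
    data: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical in-memory artifact store, standing in for the real one.
STORE: List[ArtifactVersion] = []


def generic_step(data: Any, prefix: str) -> Dict[str, Any]:
    """Single reusable step: same artifact name on every call,
    with metadata that says which input the version came from."""
    result = {"processed_data": data}
    STORE.append(
        ArtifactVersion(
            name="dataset",
            version=len(STORE) + 1,
            data=result,
            metadata={"custom_prefix": prefix},
        )
    )
    return result


generic_step(data=1, prefix="train")
generic_step(data=3, prefix="validation")
generic_step(data=5, prefix="test")

for v in STORE:
    print(v.name, v.version, v.metadata["custom_prefix"])
```

All three versions share the name dataset; only the metadata tells them apart.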
Once the pipeline has run, the metadata is visible in the ZenML dashboard.
This method uses a single generic_step but adds custom metadata to each artifact. You can later use this metadata to identify and differentiate between artifacts:
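A minimal sketch of the lookup, assuming the artifact versions have already been fetched into plain dictionaries. The record layout, the custom_prefix key, and latest_with_metadata are hypothetical; with ZenML you would obtain the versions from its client and read each version's metadata instead.

```python
from typing import Any, Dict, List, Optional

# Hypothetical records, one per stored version of the same artifact name.
versions: List[Dict[str, Any]] = [
    {"name": "dataset", "version": 1,
     "metadata": {"custom_prefix": "train"}, "data": {"processed_data": 1}},
    {"name": "dataset", "version": 2,
     "metadata": {"custom_prefix": "validation"}, "data": {"processed_data": 3}},
    {"name": "dataset", "version": 3,
     "metadata": {"custom_prefix": "test"}, "data": {"processed_data": 5}},
]


def latest_with_metadata(
    records: List[Dict[str, Any]], key: str, value: Any
) -> Optional[Dict[str, Any]]:
    """Return the highest-version record whose metadata[key] equals value,
    or None if no record matches."""
    matches = [r for r in records if r["metadata"].get(key) == value]
    return max(matches, key=lambda r: r["version"]) if matches else None


train = latest_with_metadata(versions, "custom_prefix", "train")
test = latest_with_metadata(versions, "custom_prefix", "test")

print(train["data"])  # {'processed_data': 1}
print(test["data"])   # {'processed_data': 5}
```

Picking the highest version among the matches is a design choice for this sketch; it keeps the lookup stable when the pipeline is rerun and newer versions with the same metadata appear.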
Both solutions let you identify your artifacts in a custom way without modifying ZenML's core functionality. The factory function approach offers more control over the artifact name itself, while the metadata approach keeps artifact names consistent and relies on custom metadata for identification.