Skipping materialization
Skip materialization of artifacts.
A ZenML pipeline is built in a data-centric way. The outputs and inputs of steps define how steps are connected and the order in which they are executed. Each step should be considered as its very own process that reads and writes its inputs and outputs from and to the artifact store. This is where materializers come into play.
A materializer dictates how a given artifact can be written to and retrieved from the artifact store and also contains all serialization and deserialization logic. Whenever you pass artifacts as outputs from one pipeline step to other steps as inputs, the corresponding materializer for the respective data type defines how this artifact is first serialized and written to the artifact store, and then deserialized and read in the next step. Read more about this here.
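To make the write-then-read cycle concrete, here is a minimal, library-free sketch of the idea. It is not ZenML's materializer API: the class JsonMaterializer, its save and load methods, and the round_trip helper are hypothetical names chosen for illustration, and a temporary directory stands in for the artifact store.

```python
import json
import os
import tempfile
from typing import Any


class JsonMaterializer:
    """Toy stand-in for a materializer (hypothetical, not the ZenML class)."""

    def __init__(self, uri: str) -> None:
        # Directory inside the "artifact store" reserved for one artifact.
        self.uri = uri

    def save(self, data: Any) -> None:
        # Serialize the step output and write it to the store.
        os.makedirs(self.uri, exist_ok=True)
        with open(os.path.join(self.uri, "data.json"), "w") as f:
            json.dump(data, f)

    def load(self) -> Any:
        # Read the stored file and deserialize it for the next step.
        with open(os.path.join(self.uri, "data.json")) as f:
            return json.load(f)


def round_trip(data: Any) -> Any:
    """Write `data` as a producing step would, then read it back as a consumer."""
    with tempfile.TemporaryDirectory() as store:
        materializer = JsonMaterializer(os.path.join(store, "producer", "output"))
        materializer.save(data)
        return materializer.load()


restored = round_trip({"some": "data"})
```

The consuming side only ever sees the deserialized value (restored here); the storage location stays hidden inside the materializer.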
However, there are instances where you might not want to materialize an artifact in a step, but rather use a reference to it instead. This is where skipping materialization comes in.
Skipping materialization might have unintended consequences for downstream tasks that rely on materialized artifacts. Only skip materialization if there is no other way to do what you want to do.
How to skip materialization
While materializers should in most cases be used to control how artifacts are returned and consumed from pipeline steps, you might sometimes need to have a completely unmaterialized artifact in a step, e.g., if you need to know the exact path to where your artifact is stored.
An unmaterialized artifact is a zenml.materializers.UnmaterializedArtifact. Among others, it has a property uri that points to the unique path in the artifact store where the artifact is persisted. One can use an unmaterialized artifact by specifying UnmaterializedArtifact as the type in the step.
Code Example
The following example shows how unmaterialized artifacts can be used in the steps of a pipeline. In the pipeline, s1 and s2 produce identical artifacts; however, s3 consumes materialized artifacts while s4 consumes unmaterialized artifacts. s4 can therefore use the dict_.uri and list_.uri paths directly rather than their materialized counterparts.
You can see another example of using an UnmaterializedArtifact when triggering a pipeline from another.