If you are using a custom data type, such as `torch.nn.Module`, then you need to define a custom materializer to tell ZenML how to handle this specific data type.

All materializers extend the `BaseMaterializer` class, which defines the interface of all materializers.
Each materializer has an `ASSOCIATED_TYPES` attribute that contains a list of data types that this materializer can handle. ZenML uses this information to call the right materializer at the right time. I.e., if a ZenML step returns a `pd.DataFrame`, ZenML will try to find any materializer that has `pd.DataFrame` in its `ASSOCIATED_TYPES`. List the data type of your custom object here to link the materializer to that data type.
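This type-based lookup can be pictured as a simple mapping from data types to materializer classes. The sketch below is a plain-Python illustration of that dispatch mechanism only; it is not ZenML's actual registry code, and all class names in it are invented for the example:

```python
# Toy illustration of type-based materializer dispatch.
# NOT ZenML's internal implementation.

class BaseMaterializerSketch:
    ASSOCIATED_TYPES = ()

class IntMaterializer(BaseMaterializerSketch):
    ASSOCIATED_TYPES = (int,)

class StrMaterializer(BaseMaterializerSketch):
    ASSOCIATED_TYPES = (str,)

REGISTRY = [IntMaterializer, StrMaterializer]

def find_materializer(data_type):
    """Return the first registered materializer that lists `data_type`."""
    for materializer in REGISTRY:
        if data_type in materializer.ASSOCIATED_TYPES:
            return materializer
    raise ValueError(f"No materializer found for {data_type}")
```

With this registry, `find_materializer(int)` resolves to `IntMaterializer`, which is the same kind of resolution ZenML performs when a step returns a value of a given type.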
Each materializer also has an `ASSOCIATED_ARTIFACT_TYPES` attribute, which defines what types of artifacts are being stored. In most cases, you should choose either `DataArtifact` or `ModelArtifact` here. If you are unsure, just use `DataArtifact`. The exact choice is not too important, as the artifact type is only used as a tag in the visualization tools of certain integrations like Facets.
Additionally, each materializer has an `artifact` object. The most important property of an `artifact` object is its `uri`. The `uri` is created automatically by ZenML whenever you run a pipeline and points to the directory of a file system where the artifact is stored (its location in the artifact store). It should not be modified.
The `handle_input()` and `handle_return()` methods define the serialization and deserialization of artifacts: `handle_input()` defines how data is read from the artifact store and deserialized, and `handle_return()` defines how data is serialized and saved to the artifact store.
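The division of labor between the two methods can be sketched without ZenML at all: one function serializes an object into a directory (the analogue of `artifact.uri`), the other reads it back. The snippet below is a generic illustration of that contract using `pickle` and a temporary directory; the function names are invented for the example and do not belong to the ZenML API:

```python
import os
import pickle
import tempfile

def handle_return_sketch(obj, artifact_uri):
    """Analogue of handle_return(): serialize obj into the artifact directory."""
    with open(os.path.join(artifact_uri, "data.pkl"), "wb") as f:
        pickle.dump(obj, f)

def handle_input_sketch(artifact_uri):
    """Analogue of handle_input(): read the file back and deserialize it."""
    with open(os.path.join(artifact_uri, "data.pkl"), "rb") as f:
        return pickle.load(f)

# Round trip: conceptually what happens between a producing step
# (handle_return) and a consuming step (handle_input).
uri = tempfile.mkdtemp()
handle_return_sketch({"accuracy": 0.95}, uri)
restored = handle_input_sketch(uri)
```

In a real materializer, the serialization format is up to you; the only contract is that `handle_input()` can reconstruct whatever `handle_return()` wrote.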
If you have, for example, `torch.nn.Module` in your `ASSOCIATED_TYPES`, then you might want to use `torch.save()` and `torch.load()` here. For a concrete example, have a look at the materializer in the Neural Prophet integration.
To configure a step to use a custom materializer, call the `with_return_materializers()` method of the step. If a step has multiple outputs, a dictionary of the form `{<OUTPUT_NAME>: <MATERIALIZER_CLASS>}` can be supplied to the `with_return_materializers()` method. Note that `with_return_materializers` only needs to be called for the output of the first step that produced an artifact of a given data type; all downstream steps will use the same materializer by default. For example, a step defined as `my_step() -> Output(a=int, b=float)` has `a` and `b` as available output names.
When registering a materializer, `name` refers to the class name of your materializer, and `file` should contain a path to the module where the materializer is defined.

Let's see a custom materializer in action with a simple example. Suppose we have a custom class `MyObj` that flows between two steps in a pipeline. Running such a pipeline without telling ZenML how to handle `MyObj` fails with:

```
zenml.exceptions.StepInterfaceError: Unable to find materializer for output 'output' of type <class '__main__.MyObj'> in step 'step1'. Please make sure to either explicitly set a materializer for step outputs using step.with_return_materializers(...) or registering a default materializer for specific types by subclassing BaseMaterializer and setting its ASSOCIATED_TYPES class variable. For more information, visit https://docs.zenml.io/developer-guide/advanced-usage/materializer
```
ZenML does not know how to materialize `MyObj` (how could it? We just created it!). Therefore, we have to create our own materializer. To do this, you can extend `BaseMaterializer` by subclassing it, listing `MyObj` in `ASSOCIATED_TYPES`, and overwriting `handle_input()` and `handle_return()`:
Use ZenML's `fileio` module to ensure your materialization logic works across artifact stores (local and remote, such as S3 buckets).

Since `MyObj` is listed in the `ASSOCIATED_TYPES` attribute of the materializer, you won't necessarily have to add `.with_return_materializers(MyMaterializer)` to the step; it should be detected automatically. It doesn't hurt to be explicit, though.
A non-materialized artifact is a `BaseArtifact` (or any of its subclasses) and has a `uri` property that points to the unique path in the artifact store where the artifact is stored. One can use a non-materialized artifact by specifying the artifact class as the type in the step. Note that the artifact class has to match: if the output of the previous step was a `ModelArtifact`, then you should specify `ModelArtifact` in the non-materialized step.
For instance, `keras.model` or `torch.nn.Module` are Pythonic types that are both linked to `ModelArtifact` implicitly via their materializers.
Here, `s1` and `s2` produce identical artifacts; however, `s3` consumes materialized artifacts while `s4` consumes non-materialized artifacts. `s4` can now use the `dict_.uri` and `list_.uri` paths directly rather than their materialized counterparts.