Step output typing and annotation
Step outputs are stored in your artifact store. Annotate and name them to make them more explicit.
Type annotations
Your functions will work as ZenML steps even if you don't provide any type annotations for their inputs and outputs. However, adding type annotations to your step functions gives you lots of additional benefits:
Type validation of your step inputs: ZenML makes sure that your step functions receive an object of the correct type from the upstream steps in your pipeline.
Better serialization: Without type annotations, ZenML uses Cloudpickle to serialize your step outputs. When provided with type annotations, ZenML can choose the materializer best suited to the output. If none of the built-in materializers work, you can write a custom materializer.
ZenML provides a built-in CloudpickleMaterializer that can handle any object by saving it with cloudpickle. However, this is not production-ready because the resulting artifacts cannot be loaded when running with a different Python version. In such cases, you should consider building a custom Materializer to save your objects in a more robust and efficient format.
Moreover, using the CloudpickleMaterializer could allow users to upload any kind of object. This could be exploited to upload a malicious file, which could execute arbitrary code on the vulnerable system.
If you want to make sure you get all the benefits of type annotating your steps, you can set the environment variable ZENML_ENFORCE_TYPE_ANNOTATIONS to True. ZenML will then raise an exception in case one of the steps you're trying to run is missing a type annotation.
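To get a feel for what "missing a type annotation" means in practice, the following is a minimal sketch that lists the unannotated parts of a function using only the standard library. It is not ZenML's implementation; the helper name missing_annotations and the two example functions are hypothetical and exist only for illustration.

```python
import inspect
from typing import Callable, List


def missing_annotations(func: Callable) -> List[str]:
    """Return the parameter names (plus "return") that have no type annotation."""
    signature = inspect.signature(func)
    missing = [
        name
        for name, param in signature.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if signature.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


# Hypothetical step function without annotations: every part is reported.
def untyped_step(data, factor):
    return [x * factor for x in data]


# The same function, fully annotated: nothing is reported.
def typed_step(data: List[int], factor: int) -> List[int]:
    return [x * factor for x in data]
```

Running a check like this over your step functions before enabling ZENML_ENFORCE_TYPE_ANNOTATIONS can show which steps would need updating.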
Tuple vs multiple outputs
It is impossible for ZenML to detect whether you want your step to have a single output artifact of type Tuple or multiple output artifacts just by looking at the type annotation.
We use the following convention to differentiate between the two: When the return statement is followed by a tuple literal (e.g. return 1, 2 or return (value_1, value_2)) we treat it as a step with multiple outputs. All other cases are treated as a step with a single output of type Tuple.
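The convention can be illustrated by inspecting the syntax tree of a step's source. The sketch below is an assumption-laden illustration, not ZenML's actual detection code: the helper has_multiple_outputs is hypothetical, it works on source strings rather than function objects, and it simply reports whether any return statement returns a tuple literal.

```python
import ast


def has_multiple_outputs(step_source: str) -> bool:
    """Return True if any `return` statement in the source returns a tuple literal."""
    tree = ast.parse(step_source)
    return any(
        isinstance(node, ast.Return) and isinstance(node.value, ast.Tuple)
        for node in ast.walk(tree)
    )


# `return` is followed by a tuple literal: treated as two separate outputs.
MULTI_OUTPUT = '''
def divide(a: int, b: int) -> Tuple[int, int]:
    return a // b, a % b
'''

# `return` is followed by a variable name: treated as one output of type Tuple.
SINGLE_TUPLE_OUTPUT = '''
def divide(a: int, b: int) -> Tuple[int, int]:
    result = (a // b, a % b)
    return result
'''
```

Both functions carry the same type annotation, which is why the annotation alone cannot settle the question and the form of the return statement is used instead.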
Step output names
By default, ZenML uses the output name output for single output steps and output_0, output_1, ... for steps with multiple outputs. These output names are used to display your outputs in the dashboard and fetch them after your pipeline is finished.
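If you want to use custom output names for your steps, use the Annotated type annotation: wrap the output type in Annotated and pass the name as the second argument, for example Annotated[float, "custom_output_name"].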
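As a minimal sketch of named outputs, the functions below annotate a single output and a pair of outputs with Annotated. In a ZenML project each function would also carry the @step decorator (from zenml import step); it is left out here so the sketch runs without ZenML installed. The helper output_names is hypothetical and only shows where the names are stored in the annotation.

```python
from typing import Annotated, List, Tuple, get_args, get_origin, get_type_hints


# Single output named "custom_output_name".
def square_root(number: int) -> Annotated[float, "custom_output_name"]:
    return number ** 0.5


# Two outputs named "quotient" and "remainder".
def divide(a: int, b: int) -> Tuple[
    Annotated[int, "quotient"],
    Annotated[int, "remainder"],
]:
    return a // b, a % b


def output_names(func) -> List[str]:
    """Collect the names attached via Annotated (hypothetical helper)."""
    hint = get_type_hints(func, include_extras=True)["return"]
    # A Tuple[...] annotation holds one Annotated entry per output.
    parts = get_args(hint) if get_origin(hint) is tuple else (hint,)
    return [part.__metadata__[0] for part in parts]
```

Note that divide returns a tuple literal, so under the convention described earlier it yields two separately named outputs rather than a single Tuple artifact.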
If you do not give your outputs custom names, the created artifacts will be named {pipeline_name}::{step_name}::output or {pipeline_name}::{step_name}::output_{i} in the dashboard. See the documentation on artifact versioning and configuration for more information.
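The default naming scheme can be written out as a small sketch. The helpers default_output_names and default_artifact_names, as well as the pipeline and step names used with them, are hypothetical; the code only reproduces the patterns described above and is not taken from ZenML.

```python
from typing import List


def default_output_names(num_outputs: int) -> List[str]:
    """Default output names: `output` for one output, `output_{i}` otherwise."""
    if num_outputs == 1:
        return ["output"]
    return [f"output_{i}" for i in range(num_outputs)]


def default_artifact_names(
    pipeline_name: str, step_name: str, num_outputs: int
) -> List[str]:
    """Default artifact names following {pipeline_name}::{step_name}::{output_name}."""
    return [
        f"{pipeline_name}::{step_name}::{name}"
        for name in default_output_names(num_outputs)
    ]
```

For a hypothetical pipeline training with a two-output step divide, this gives training::divide::output_0 and training::divide::output_1.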