Register Existing Data as a ZenML Artifact
Learn how to register external data as a ZenML artifact for future use.
Many modern machine learning frameworks create their own data as a byproduct of model training or other processes. In such cases there is no need to read and materialize those data assets just to pack them into a ZenML artifact. It is better to register them as-is in ZenML for future use.
Register Existing Folder as a ZenML Artifact
If the data created externally is a folder, you can register the whole folder as a ZenML artifact and later use it in subsequent steps or other pipelines.
Once loaded, or when passed as input to another step, the artifact produced from the preexisting data has the type pathlib.Path. The path points to a temporary location in the executing environment and can be used like any normal local Path, for example by passing it to from_pretrained or open.
Register Existing File as a ZenML Artifact
If the data created externally is a file, you can register it as a ZenML artifact and later use it in subsequent steps or other pipelines.
Register All Checkpoints of a PyTorch Lightning Training Run
A common case is PyTorch Lightning training: fit the model, let Lightning store its checkpoints in a remote location, and then register the whole checkpoint folder as a single ZenML artifact.
Even though such an artifact is created and stored externally, it can be treated like any other artifact produced by ZenML steps, with all the functionality described above!
Register Checkpoints of a PyTorch Lightning Training Run as Separate Artifact Versions
To version the linkage of checkpoints (or other intermediate artifacts) more precisely, you can extend the ModelCheckpoint callback to suit your needs. For example, a custom implementation can extend the on_train_epoch_end method to register each checkpoint created during training as a separate artifact version in ZenML.
To keep checkpoint files around, you need to set save_top_k=-1. Otherwise older checkpoints are deleted, which makes the registered artifact versions that point to them unusable.
The same approach carries over to a complete pipeline that runs PyTorch Lightning training inside a ZenML step, with the linkage for checkpoint artifacts implemented as an extended callback.