Automatically track ML Metadata
ZenML uses Google’s ML Metadata under the hood to automatically track all metadata produced by ZenML pipelines. ML Metadata standardizes metadata tracking and makes it easy to follow iterative experimentation as it happens. This helps in post-training workflows, where results are compared as experiments progress, and it also enables caching of pipeline steps.
How does it work?
All parameters of every ZenML step are persisted in the Metadata Store and also in the
declarative pipeline configs. In the config, they can be found under the
args key. Here is a sample stemming from this Python step:
training_pipeline.add_trainer(TrainerStep(
    batch_size=1,
    dropout_chance=0,
    epochs=1,
    hidden_activation="relu",
    hidden_layers=None,
    last_activation="sigmoid",
    loss="mse",
    lr=0.01,
    metrics=None,
    output_units=1,
))
That translates to the following config:
steps:
  ...
  trainer:
    args:
      batch_size: 1
      dropout_chance: 0
      epochs: 1
      hidden_activation: relu
      hidden_layers: null
      last_activation: sigmoid
      loss: mse
      lr: 0.01
      metrics: null
      output_units: 1
    source: ...
The args key holds all the parameters that were captured and persisted.
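To make the role of the args key concrete, here is a minimal sketch that reads the tracked parameters of one step from two configs and reports which ones differ. It assumes the configs have already been parsed into plain dictionaries (for example by a YAML loader); get_step_args and diff_step_args are hypothetical helper names for illustration, not part of ZenML.

```python
# Hypothetical sketch: the dicts below stand in for parsed pipeline configs.
# Only the steps -> <step name> -> args layout from the sample is assumed.


def get_step_args(config, step_name):
    """Return a copy of the parameters stored under steps.<step_name>.args."""
    return dict(config["steps"][step_name]["args"])


def diff_step_args(config_a, config_b, step_name):
    """Map every parameter that differs to its (value_a, value_b) pair."""
    args_a = get_step_args(config_a, step_name)
    args_b = get_step_args(config_b, step_name)
    return {
        key: (args_a.get(key), args_b.get(key))
        for key in sorted(set(args_a) | set(args_b))
        if args_a.get(key) != args_b.get(key)
    }


run_a = {"steps": {"trainer": {"args": {"batch_size": 1, "epochs": 1, "lr": 0.01}}}}
run_b = {"steps": {"trainer": {"args": {"batch_size": 1, "epochs": 5, "lr": 0.001}}}}

changed = diff_step_args(run_a, run_b, "trainer")
print(changed)  # {'epochs': (1, 5), 'lr': (0.01, 0.001)}
```

Because every tracked parameter lives under the same key, comparing two runs reduces to comparing two small dictionaries.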
For most use cases, ZenML exposes native interfaces to fetch these parameters after a pipeline has run successfully.
For example, the repo.compare_pipelines() method compares all pipelines in a Repository() and relies heavily on the ML Metadata store
to spin up a visualization that compares training pipeline results.
However, users who want direct access to the store can use the ML Metadata Python library to read their parameters. To understand more about how Execution Parameters and ML Metadata work, please refer to the TFX docs.
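As a rough sketch of such direct access, the snippet below opens a SQLite-backed store with the ml-metadata package and collects the properties recorded for each execution. It makes two assumptions: that the metadata store is a local SQLite file, and that step parameters appear among an execution's properties or custom properties. The helper names open_sqlite_store, value_to_python, and execution_parameters are hypothetical.

```python
def open_sqlite_store(db_path):
    """Connect to a SQLite-backed ML Metadata store (requires ml-metadata)."""
    # Imported lazily so the helpers below can be used without the package.
    from ml_metadata.metadata_store import metadata_store
    from ml_metadata.proto import metadata_store_pb2

    config = metadata_store_pb2.ConnectionConfig()
    config.sqlite.filename_uri = db_path
    config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
    return metadata_store.MetadataStore(config)


def value_to_python(value):
    """Unwrap an ML Metadata Value message into a plain Python value."""
    kind = value.WhichOneof("value")  # e.g. int_value, double_value, string_value
    return getattr(value, kind) if kind else None


def execution_parameters(store):
    """Map each execution id to its properties and custom properties."""
    result = {}
    for execution in store.get_executions():
        params = {}
        for props in (execution.properties, execution.custom_properties):
            for key, value in props.items():
                params[key] = value_to_python(value)
        result[execution.id] = params
    return result
```

Calling execution_parameters(open_sqlite_store(path)) with the path to your own metadata database would return one dictionary per execution. Which keys appear there depends on how the pipeline was recorded, so treat the output as something to inspect, not a fixed schema.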
How to specify what to track
As all steps are persisted in the same pattern shown above, it is very simple to track any metadata you desire.
Whenever you create a custom step, simply add the parameters you want to track as
kwargs (keyword arguments) in your Step's
__init__() method (i.e. the constructor).
ZenML ensures that all kwargs of all steps are tracked automatically.
As outlined in the Steps definition, for now, only primitive types are supported for tracking. You cannot track arbitrary Python objects. Please ensure that only primitive types (int, str, float, etc.) are used in your Step constructors.
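The pattern can be sketched as follows. SketchStep is a hypothetical stand-in written for this example, not ZenML's actual base class. It only illustrates the idea that keyword arguments passed up from the constructor end up under args, and that non-primitive values are rejected. None is allowed here because the sample config above contains null values.

```python
PRIMITIVE_TYPES = (bool, int, float, str, type(None))


class SketchStep:
    """Hypothetical base class that records constructor kwargs for tracking."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            if not isinstance(value, PRIMITIVE_TYPES):
                raise TypeError(
                    f"Parameter {name!r} has unsupported type "
                    f"{type(value).__name__}; only primitives can be tracked"
                )
        self._kwargs = kwargs

    def to_config(self):
        """Return the tracked parameters in the same shape as the config."""
        return {"args": dict(self._kwargs)}


class MyTrainerStep(SketchStep):
    """Example custom step with three tracked parameters."""

    def __init__(self, batch_size=1, lr=0.01, loss="mse", **kwargs):
        # Forward every parameter that should be tracked to the base class.
        super().__init__(batch_size=batch_size, lr=lr, loss=loss, **kwargs)
        self.batch_size = batch_size
        self.lr = lr
        self.loss = loss


step = MyTrainerStep(batch_size=8, lr=0.001)
print(step.to_config())  # {'args': {'batch_size': 8, 'lr': 0.001, 'loss': 'mse'}}
```

Passing a list or any other object, for instance MyTrainerStep(lr=[0.1, 0.01]), raises a TypeError in this sketch, which mirrors the primitive-only restriction described above.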