Configure Automated Caching
How automated caching works in ZenML
This is an older version of the ZenML documentation. To read the latest version, please visit this up-to-date URL.
Machine learning pipelines are rerun many times throughout their development lifecycle. Prototyping is often a fast, iterative process that benefits greatly from caching, which makes caching a very powerful tool. Check out this ZenML Blogpost on Caching for more context on the benefits of caching, and ZenBytes lesson 1.2 for a detailed example of how to configure and visualize caching.
Caching in ZenML
ZenML comes with caching enabled by default. Since ZenML automatically tracks and versions all inputs, outputs, and parameters of steps and pipelines, it will not re-execute a step in subsequent runs of the same pipeline as long as none of these three have changed.
Currently, caching does not automatically detect changes within the file system or on external APIs. Make sure to set caching to False on steps that depend on external inputs, or whenever a step should run regardless of caching.
Configuring caching behavior of your pipelines
Although caching is desirable in many circumstances, you might want to disable it in certain instances, for example when you are quickly prototyping with changing step definitions, or when a function depends on external API state that ZenML cannot detect.
There are multiple ways to take control of when and where caching is used:
Disabling caching for the entire pipeline: Do this if you want to turn off all caching (not recommended).
Disabling caching for individual steps: This is required for certain steps that depend on external input.
Dynamically disabling caching for a pipeline run: This is useful to force a complete rerun of a pipeline.
Disabling caching for the entire pipeline
On the pipeline level, the caching policy can be set as a parameter within the decorator. If caching is explicitly turned off at the pipeline level, all steps run without caching, even if caching is set to True for individual steps.
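This precedence rule can be sketched as a small function. The `enable_cache` naming mirrors the kind of decorator flag ZenML exposes, but treat the exact parameter name and signature in your ZenML version as an assumption to verify:

```python
def effective_caching(pipeline_enable_cache, step_enable_cache):
    """An explicit False at the pipeline level wins over any step-level
    setting; otherwise each step keeps its own setting (illustrative only)."""
    if pipeline_enable_cache is False:
        return False
    return step_enable_cache

# Caching turned off for the pipeline disables it for every step:
assert effective_caching(False, True) is False
# With pipeline caching on, individual steps decide for themselves:
assert effective_caching(True, False) is False
assert effective_caching(True, True) is True
```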
Disabling caching for individual steps
Caching can also be explicitly turned off at the step level. You might want to turn off caching for steps that take external input (like fetching data from an API or file I/O).
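As an illustration of why this matters, here is a stdlib sketch in which a step marked as non-cacheable re-executes on every run even though its inputs are identical. The `step` decorator and `enable_cache` flag below are toy versions modeled on ZenML's, not the real API:

```python
import hashlib

_cache = {}
calls = {"fetch_data": 0}

def step(enable_cache=True):
    """Toy step decorator that records a per-step caching switch."""
    def wrap(fn):
        fn.enable_cache = enable_cache
        return fn
    return wrap

def run(fn, *inputs):
    """Honor the step's caching switch before consulting the cache."""
    key = hashlib.sha256(repr((fn.__name__, inputs)).encode()).hexdigest()
    if fn.enable_cache and key in _cache:
        return _cache[key]
    result = fn(*inputs)
    _cache[key] = result
    return result

@step(enable_cache=False)  # e.g. fetching data from an external API
def fetch_data(url):
    calls["fetch_data"] += 1
    return f"payload from {url}"

run(fetch_data, "https://example.com/data")
run(fetch_data, "https://example.com/data")
# The step ran twice: identical inputs, but caching is disabled for it.
```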
You can get a graphical visualization of which steps were cached using ZenML's Pipeline Run Visualization Tool.
You can disable caching for individual steps via the config.yaml file by specifying parameters for a specific step (as described in the section on YAML config files). In this case, you would specify True or False in place of the <ENABLE_CACHE_VALUE> below.
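As a sketch, such a config.yaml entry might look like the fragment below; the step name and the exact key layout are assumptions, so check the YAML schema of your ZenML version:

```yaml
steps:
  trainer:                # hypothetical step name
    enable_cache: False   # goes where <ENABLE_CACHE_VALUE> appears above
```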
You can see an example of this in action in our PyTorch Example, where caching is disabled for the trainer step.
Dynamically disabling caching for a pipeline run
Sometimes you want control over caching at runtime instead of defaulting to the baked-in configuration of your pipeline and its steps. ZenML offers a way to override all caching settings of the pipeline at runtime.
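One way to picture the override order: a runtime flag, when given, beats whatever was baked into the pipeline and its steps. The resolver below is an illustrative sketch of that precedence, not ZenML's actual API:

```python
def resolve_enable_cache(step_setting, pipeline_setting, runtime_setting=None):
    """Runtime override wins; then an explicit pipeline-level False;
    otherwise the step-level setting applies (names are illustrative)."""
    if runtime_setting is not None:
        return runtime_setting
    if pipeline_setting is False:
        return False
    return step_setting

# Baked-in configuration applies when no runtime flag is passed:
assert resolve_enable_cache(True, True) is True
# Passing a runtime override forces a complete, cache-free rerun:
assert resolve_enable_cache(True, True, runtime_setting=False) is False
```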
Code Example
The following example shows caching in action with the code example from the previous section on Runtime Configuration.
For a more detailed example of how caching is used in ZenML and how it works under the hood, check out ZenBytes lesson 1.2!