Caching
Discover the power of caching with ZenML.
Machine learning pipelines are rerun many times during development. Prototyping is a fast, iterative process, and it benefits greatly from reusing results that have already been computed. This makes caching a very powerful tool. (Read our blog post for more context on the benefits of caching.)

Caching in ZenML

ZenML comes with caching enabled by default. As long as nothing changes within a step or upstream of it, the cached outputs of that step are reused in the next pipeline run. Conversely, any code or configuration change that affects a step causes that step to be rerun in the next pipeline execution. Caching currently does not detect changes in the file system or in external APIs. Set enable_cache to False on steps that depend on external input, or on any step that should run regardless of caching.
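To make the invalidation rule concrete, here is a minimal sketch of how a cache key could be derived from a step's code, its configuration, and the keys of its upstream steps. This is a toy model for illustration only, not ZenML's actual implementation; the helper cache_key and the step sources below are hypothetical.

```python
import hashlib
import json


def cache_key(step_source, params, upstream_keys):
    """Hash everything this toy cache can see for one step (hypothetical helper)."""
    payload = json.dumps(
        {"source": step_source, "params": params, "upstream": list(upstream_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# A loader step that reads a file. The file's contents are NOT part of the key.
loader_src = "def load(path): return open(path).read()"
loader_key = cache_key(loader_src, {"path": "data.csv"}, [])

# A trainer step downstream of the loader.
trainer_src = "def train(data, lr): ..."
trainer_key = cache_key(trainer_src, {"lr": 0.1}, [loader_key])

# Same code, same config, same upstream: the key is identical, so the cached
# output would be reused even if data.csv changed on disk in the meantime.
same_key = cache_key(loader_src, {"path": "data.csv"}, [])

# A config change on the loader changes its key, which in turn changes the
# trainer's key, so both steps would be rerun.
new_loader_key = cache_key(loader_src, {"path": "other.csv"}, [])
new_trainer_key = cache_key(trainer_src, {"lr": 0.1}, [new_loader_key])
```

In this model, a key built only from code, configuration, and upstream keys cannot see external state, which is why steps that read files or call APIs should have caching turned off.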
There are multiple ways to take control of when and where caching is used.

Caching on a Pipeline Level

At the pipeline level, the caching policy is set with the enable_cache parameter of the @pipeline decorator. If caching is explicitly turned off at the pipeline level, all steps run without caching, even if caching is set to True for individual steps.
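```python
from zenml.pipelines import pipeline


# Disabling the cache here applies to every step in the pipeline.
@pipeline(enable_cache=False)
def first_pipeline(...):
    """Pipeline with cache disabled"""
```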

Control Caching on a Step Level

Caching can also be explicitly turned off at the step level. You might want to turn off caching for steps that take external input, such as fetching data from an API or reading from a file.
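```python
from zenml.pipelines import pipeline
from zenml.steps import step


# This step always reruns, because the API response may have changed.
@step(enable_cache=False)
def import_data_from_api(...):
    """Import most up-to-date data from public api"""
    ...


# Caching stays enabled for every other step in the pipeline.
@pipeline(enable_cache=True)
def first_pipeline(...):
    """Pipeline with cache enabled"""
```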

Control Caching within the Runtime Configuration

Sometimes you want to control caching at runtime instead of relying on the baked-in configuration of your pipeline and its steps. ZenML lets you override all caching settings of the pipeline at runtime.
```python
first_pipeline(step_1=..., step_2=...).run(enable_cache=False)
```

Summary in Code

Code Example of this Section
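The three levels of control described in this section can be summarized as a small precedence rule. The sketch below is a plain-Python model of that rule as read from the text above, not ZenML code: effective_cache is a hypothetical helper, and it assumes that a runtime value, when given, overrides both the pipeline and the step settings.

```python
from typing import Optional


def effective_cache(
    step_flag: bool = True,
    pipeline_flag: bool = True,
    runtime_flag: Optional[bool] = None,
) -> bool:
    """Return whether a step would use its cached outputs (illustrative model)."""
    # A value passed at runtime overrides everything configured in code.
    if runtime_flag is not None:
        return runtime_flag
    # Turning caching off on the pipeline disables it for all of its steps.
    if not pipeline_flag:
        return False
    # Otherwise the step's own setting decides (caching is on by default).
    return step_flag


default_case = effective_cache()
pipeline_off = effective_cache(step_flag=True, pipeline_flag=False)
step_off = effective_cache(step_flag=False, pipeline_flag=True)
runtime_off = effective_cache(step_flag=True, pipeline_flag=True, runtime_flag=False)
```

Under these assumptions, only default_case evaluates to True; each of the other three calls corresponds to one of the ways of turning caching off described above.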