Understanding configuration
Understanding how to configure a ZenML pipeline
Configuring a step and a pipeline
The configuration of a step and/or a pipeline determines various details of how a run is executed. It is an important aspect of running workloads in production, and as such deserves a dedicated section in these docs.
How to apply configuration
Before we learn about all the different configuration options, let's briefly look at how configuration can be applied to a step or a pipeline. We start with the simplest configuration, a boolean flag called enable_cache that specifies whether caching should be enabled or disabled. There are essentially three ways to configure it:
Method 1: Directly on the decorator
The most basic way to configure a step or a pipeline is through the @step and @pipeline decorators:
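As a rough, stdlib-only illustration of what decorator-level configuration amounts to, the sketch below records the decorator's keyword arguments on the decorated object. ToyStep and toy_step are hypothetical stand-ins, not ZenML classes; with ZenML itself you would write @step(enable_cache=False) above a step function or @pipeline(enable_cache=False) above a pipeline function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToyStep:
    """Hypothetical stand-in for a step: a function plus its configuration."""
    func: Callable[..., Any]
    config: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


def toy_step(**config: Any) -> Callable[[Callable[..., Any]], ToyStep]:
    """Decorator factory: the keyword arguments become the step's configuration."""
    def decorator(func: Callable[..., Any]) -> ToyStep:
        return ToyStep(func=func, config=dict(config))
    return decorator


@toy_step(enable_cache=False)
def load_data() -> list:
    return [1, 2, 3]


print(load_data.config)  # {'enable_cache': False}
print(load_data())       # [1, 2, 3]
```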
Method 2: On the step/pipeline instance
This is exactly the same as passing it through the decorator, but if you prefer, you can also pass it to the configure methods of the pipeline and step instances:
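A similarly hypothetical sketch of instance-level configuration: ToyPipeline is not ZenML's pipeline class, but its configure method shows the idea of updating the configuration of an object that already exists. With ZenML, the corresponding calls would be my_pipeline.configure(enable_cache=False) or my_step.configure(enable_cache=False), where my_pipeline and my_step are placeholders for your own pipeline and step.

```python
from typing import Any, Dict


class ToyPipeline:
    """Hypothetical stand-in for a pipeline instance that can be configured after creation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.config: Dict[str, Any] = {}

    def configure(self, **config: Any) -> "ToyPipeline":
        # Later calls update the stored configuration key by key.
        self.config.update(config)
        return self  # returning self allows chained calls


training = ToyPipeline("training")
training.configure(enable_cache=False)
print(training.config)  # {'enable_cache': False}
```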
Method 3: Configuring with YAML
As all configuration can be passed as a dictionary, you can also supply it through a YAML file. This is useful when you want to change the configuration without changing the code.
To use a YAML file, pass its path to the with_options(...) method of a pipeline:
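Because a YAML config is just a serialized dictionary, its effect can be sketched in plain Python. The standard library has no YAML parser, so the parsed contents of a hypothetical config.yaml are written out as a dict below. ToyPipeline and its with_options(config) signature are hypothetical; ZenML's own method takes a file path instead, e.g. my_pipeline.with_options(config_path="configs/config.yaml"), and returns a configured copy of the pipeline, which this toy imitates.

```python
import copy
from typing import Any, Dict

# Parsed contents of a hypothetical config.yaml:
#
#   enable_cache: false
#   run_name: nightly_training
#
CONFIG_FROM_YAML: Dict[str, Any] = {"enable_cache": False, "run_name": "nightly_training"}


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not ZenML's class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.config: Dict[str, Any] = {}

    def with_options(self, config: Dict[str, Any]) -> "ToyPipeline":
        # Apply the options to a copy so the original object stays unconfigured.
        configured = copy.deepcopy(self)
        configured.config.update(config)
        return configured


base = ToyPipeline("training")
nightly = base.with_options(CONFIG_FROM_YAML)
print(base.config)     # {}
print(nightly.config)  # {'enable_cache': False, 'run_name': 'nightly_training'}
```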
The YAML method is the recommended method for applying configuration in production. It has the benefit of being declarative and decoupled from the codebase.
It is best practice to put all config files in a configs directory at the root of your repository and commit them to git. This way, the commit hash tracked for a pipeline run links back to the exact config YAML that was used.
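If you follow this convention, a small helper can locate a config file relative to the repository root, for example to pass its path to with_options. The sketch below assumes that the root is the nearest ancestor directory containing .git and that files are named configs/<environment>.yaml; both are illustrative conventions, not ZenML requirements, and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None


def config_path_for(env: str, start: Path) -> Path:
    """Return <repo root>/configs/<env>.yaml (hypothetical naming convention)."""
    root = find_repo_root(start)
    if root is None:
        raise FileNotFoundError(f"No git repository found above {start}")
    return root / "configs" / f"{env}.yaml"


# Demo with a throwaway directory standing in for a repository.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve()
    (repo / ".git").mkdir()
    nested = repo / "src" / "pipelines"
    nested.mkdir(parents=True)
    relative = config_path_for("prod", nested).relative_to(repo)

print(relative.as_posix())  # configs/prod.yaml
```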
Breaking configuration down
First, let's create a simple pipeline:
The method write_run_configuration_template generates a config template (at the path run.yaml in this case) that includes all configuration options for this specific pipeline and your active stack. Let's run the pipeline:
The generated config contains most of the available configuration options for this pipeline. Let's walk through it section by section:
enable_XXX parameters
These are boolean flags that switch individual features on or off, such as enable_cache, enable_artifact_metadata, enable_artifact_visualization and enable_step_logs.
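build ID
The build key takes the ID of an existing pipeline build. If it is set, the run reuses the Docker images from that build rather than building new ones.
extra dict
The extra key is a free-form dictionary for passing any additional configuration down to the pipeline, its steps, or the stack components you use.
Configuring the model
The model key specifies the ZenML Model that this pipeline run is associated with.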
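Pipeline and step parameters
The parameters key holds the values passed as arguments to the pipeline function; each entry under steps can also have its own parameters section for that step's function.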
Corresponds to:
Normally, parameters defined at the pipeline level are used by multiple steps, in which case no step-level parameter configuration is needed.
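The sketch below models that common case in plain Python, assuming a parsed config whose parameters section holds a single hypothetical gamma value. The pipeline function receives it as a keyword argument and forwards it to two hypothetical steps, so neither step needs its own parameters entry. These are ordinary Python functions standing in for ZenML steps and pipelines.

```python
from typing import Any, Dict, List

# Parsed contents of a hypothetical run config with a pipeline-level parameter.
run_config: Dict[str, Any] = {"parameters": {"gamma": 0.01}}


def svc_trainer(gamma: float) -> str:  # hypothetical step
    return f"trained with gamma={gamma}"


def evaluator(gamma: float) -> str:  # hypothetical step
    return f"evaluated with gamma={gamma}"


def training_pipeline(gamma: float = 0.001) -> List[str]:
    # The pipeline-level parameter is forwarded to every step that needs it,
    # so no per-step parameter configuration is required.
    return [svc_trainer(gamma=gamma), evaluator(gamma=gamma)]


results = training_pipeline(**run_config["parameters"])
print(results)  # ['trained with gamma=0.01', 'evaluated with gamma=0.01']
```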
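Setting the run_name
The run_name key sets the name of the pipeline run. Run names must be unique, so if you run a pipeline repeatedly (for example on a schedule), either compute the name dynamically or include the {date} or {time} placeholder, which ZenML replaces when the run starts.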
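To illustrate how placeholders keep run names unique, the sketch below substitutes {date} and {time} into a run_name template with str.format. The strftime formats are assumptions chosen for readability; the exact formats ZenML substitutes may differ, and resolve_run_name is a hypothetical helper.

```python
from datetime import datetime


def resolve_run_name(template: str, now: datetime) -> str:
    """Fill {date} and {time} placeholders (the format codes are assumptions)."""
    return template.format(
        date=now.strftime("%Y_%m_%d"),
        time=now.strftime("%H_%M_%S_%f"),
    )


print(resolve_run_name("training_{date}_{time}", datetime(2024, 1, 15, 9, 30, 0)))
# training_2024_01_15_09_30_00_000000
```

A template without placeholders is returned unchanged, which is exactly the case that produces a name clash on the second run.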
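Real-time settings
The settings key holds runtime settings, such as the general docker and resources settings, as well as settings for specific stack components.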
failure_hook_source and success_hook_source
These keys specify the source paths of the failure and success hooks, i.e. import paths pointing to the hook functions.
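A source path is a string naming an importable object rather than the object itself. As a rough sketch of what such a string means, the generic importlib technique below resolves a dotted "module.attribute" path; it is not ZenML's own source-resolution code, and a stdlib function stands in for a real hook.

```python
import importlib
from typing import Any


def load_source(source: str) -> Any:
    """Resolve a dotted 'module.attribute' string to the object it names."""
    module_name, _, attr_name = source.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.attribute', got {source!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A stdlib function stands in for a user-defined hook such as a
# hypothetical "my_project.hooks.notify_on_failure".
hook = load_source("os.path.basename")
print(hook("/tmp/run.log"))  # run.log
```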
Step-specific configuration
A lot of pipeline-level configuration can also be applied at the step level (as we have already seen with the enable_cache flag). However, some configuration is step-specific: it cannot be applied at the pipeline level, only at the step level.
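One practical consequence is that a config check could flag step-only keys placed at the pipeline level. The sketch assumes that experiment_tracker, step_operator and outputs are step-only keys (an illustrative set that may not be exhaustive); the helper and the component names are hypothetical.

```python
from typing import Any, Dict, List

# Assumed set of step-only keys; illustrative, not exhaustive.
STEP_ONLY_KEYS = {"experiment_tracker", "step_operator", "outputs"}


def misplaced_step_keys(pipeline_config: Dict[str, Any]) -> List[str]:
    """Return step-only keys that were placed at the pipeline level."""
    return sorted(STEP_ONLY_KEYS.intersection(pipeline_config))


config = {
    "enable_cache": True,          # allowed at the pipeline level
    "step_operator": "sagemaker",  # step-only: belongs under steps.<name>
    "steps": {"trainer": {"experiment_tracker": "mlflow"}},  # correct placement
}
print(misplaced_step_keys(config))  # ['step_operator']
```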
Hierarchy and precedence
Some options can be configured on both pipelines and steps, others only on one of the two. Pipeline-level settings are automatically applied to all steps, but if the same setting is also configured on a step, the step-level value takes precedence.
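The precedence rule can be modelled as a dictionary overlay: start from the pipeline-level values and let any step-level value replace them. This is a simplified, one-level sketch of the rule as stated, not ZenML's resolution code; it does not merge nested values such as settings, and effective_step_config is a hypothetical helper.

```python
from typing import Any, Dict


def effective_step_config(pipeline_cfg: Dict[str, Any], step_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Pipeline-level values apply to every step; step-level values win on conflicts."""
    return {**pipeline_cfg, **step_cfg}


pipeline_cfg = {"enable_cache": True, "enable_step_logs": True}
trainer_cfg = {"enable_cache": False}

print(effective_step_config(pipeline_cfg, trainer_cfg))
# {'enable_cache': False, 'enable_step_logs': True}
print(effective_step_config(pipeline_cfg, {}))
# {'enable_cache': True, 'enable_step_logs': True}
```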
When an object is configured, ZenML merges the values with previously configured keys. E.g.:
In the above example, the two settings configurations were automatically merged.
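To make this merging concrete, here is a stdlib sketch of a recursive dictionary merge applied to two successive settings configurations. It illustrates the behavior described above under the assumption that settings are nested dictionaries; it is not ZenML's merge code, and ZenML's exact rules (for example for lists) may differ.

```python
from typing import Any, Dict


def deep_merge(existing: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `update` into a copy of `existing`; leaves from `update` win."""
    merged = dict(existing)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


first = {"settings": {"resources": {"cpu_count": 2}}}
second = {"settings": {"docker": {"requirements": ["scikit-learn"]}}}

print(deep_merge(first, second))
# {'settings': {'resources': {'cpu_count': 2}, 'docker': {'requirements': ['scikit-learn']}}}
```

Neither input is mutated, and configuring the same leaf key twice simply keeps the later value.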
Fetching configuration
The configuration is also displayed in the dashboard on the pipeline run details page.