Understanding configuration
Understanding how to configure a ZenML pipeline
The configuration of a step and/or a pipeline determines various details of how a run is executed. It is an important aspect of running workloads in production, and as such deserves a dedicated section in these docs.
We have already learned about some basics of configuration in the production guide. Here we go into more depth.
How to apply configuration
Before we learn about all the different configuration options, let's briefly look at how configuration can be applied to a step or a pipeline. We start with the simplest configuration, a boolean flag called `enable_cache`, which specifies whether caching should be enabled or disabled. There are essentially three ways you could configure this:
Method 1: Directly on the decorator
The most basic way to configure a step or a pipeline is through the `@step` and `@pipeline` decorators:
Once you set configuration on a pipeline, it will be applied to all of its steps, with some exceptions. See the section on precedence for more details.
Method 2: On the step/pipeline instance
This is exactly the same as passing it through the decorator, but if you prefer you can also pass it in the `configure` methods of the pipeline and step instances:
Method 3: Configuring with YAML
As all configuration can be passed through as a dictionary, users have the option to send all configurations in via a YAML file. This is useful in situations where code changes are not desirable.
To use a YAML file, you must pass it to the `with_options(...)` method of a pipeline:
The format of a YAML config file is exactly the same as the configurations you would pass in Python in the above two sections. Step-specific configurations can be passed by using the step invocation ID inside the `steps` dictionary. All keys are optional. Here is an example:
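A minimal sketch (the invocation ID `load_data` and the parameter name are illustrative):

```yaml
enable_cache: False

# pipeline-level parameters
parameters:
  dataset_name: "best_dataset"

steps:
  load_data:  # use the step invocation ID as the key
    enable_cache: True  # same options are available per step
```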
The YAML method is the recommended method for applying configuration in production. It has the benefit of being declarative and decoupled from the codebase.
It is best practice to put all config files in a `configs` directory at the root of your repository and check them into git history. This way, the tracked commit hash of a pipeline run links back to your config YAML.
Breaking configuration down
Now that we understand how to apply configuration, let's see all the various ways we can configure a ZenML pipeline. We will use the YAML configuration for this, but as seen in the section above, you can use the information here to configure your steps and pipelines any way you choose.
First, let's create a simple pipeline:
The method `write_run_configuration_template` generates a config template (at path `run.yaml` in this case) that includes all configuration options for this specific pipeline and your active stack. Let's run the pipeline:
The generated config contains most of the available configuration options for this pipeline. Let's walk through it section by section:
enable_XXX parameters
These are boolean flags for various configurations:
- `enable_artifact_metadata`: Whether to associate metadata with artifacts or not.
- `enable_artifact_visualization`: Whether to attach visualizations of artifacts.
- `enable_cache`: Whether to utilize caching or not.
- `enable_step_logs`: Whether to enable tracking of step logs.
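In YAML, these flags could look like this (values are illustrative):

```yaml
enable_artifact_metadata: True
enable_artifact_visualization: False
enable_cache: True
enable_step_logs: True
```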
build ID
The UUID of the build to use for this pipeline. If specified, Docker image building is skipped for remote orchestrators, and the Docker image specified in this build is used.
extra dict
This is a dictionary called `extra` that can be passed to steps and pipelines. It is meant to pass arbitrary configuration down to the pipeline, step, or stack components that the user may make use of. See an example in this section.
Configuring the model
The `model` key specifies the ZenML Model to use for this pipeline.
Pipeline and step parameters
The `parameters` key is a dictionary of JSON-serializable parameters specified at the pipeline or step level. For example:
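A sketch of such a config (the parameter name `gamma` and the step invocation ID `trainer` are illustrative):

```yaml
parameters:
  gamma: 0.01

steps:
  trainer:
    parameters:
      gamma: 0.001
```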
Corresponds to:
Important note: in the above case, the value used by the step would be the one defined in the `steps` key (i.e. 0.001). The YAML config always takes precedence over pipeline parameters that are passed down to steps in code. Read this section for more details.
Normally, parameters defined at the pipeline level are used in multiple steps, and then no step-level configuration is defined.
Note that `parameters` are different from `artifacts`. Parameters are JSON-serializable values that are passed in the runtime configuration of a pipeline. Artifacts are inputs and outputs of a step, and need not always be JSON-serializable (materializers handle their persistence in the artifact store).
Setting the run_name
To change the name for a run, pass `run_name` as a parameter. This can be a dynamic value as well. Read here for details.
Real-time settings
Settings are special runtime configurations of a pipeline or a step that require a dedicated section. In short, they define execution configuration such as Docker build options and resource settings.
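A sketch of what a `settings` block might contain (the requirement and resource values are illustrative):

```yaml
settings:
  docker:
    requirements:
      - pandas
  resources:
    cpu_count: 2
    memory: "4GB"
```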
failure_hook_source and success_hook_source
The `source` of the failure and success hooks.
Step-specific configuration
A lot of pipeline-level configuration can also be applied at the step level (as we have already seen with the `enable_cache` flag). However, some configuration is step-specific, meaning it cannot be applied at the pipeline level, but only at the step level.
- `experiment_tracker`: Name of the experiment tracker to enable for this step. This experiment tracker should be defined in the active stack with the same name.
- `step_operator`: Name of the step operator to enable for this step. This step operator should be defined in the active stack with the same name.
- `outputs`: Configuration of the output artifacts of this step, keyed by output name (by default, step outputs are named `output`). The most interesting configuration here is the `materializer_source`, which is the UDF path of the materializer in code to use for this output (e.g. `materializers.some_data.materializer.materializer_class`). Read more about this source path here.
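A sketch of step-specific configuration in YAML (the invocation ID and stack component names are illustrative; the materializer source path is the example from the text above):

```yaml
steps:
  trainer:
    experiment_tracker: mlflow_tracker  # must match a component in the active stack
    step_operator: vertex_op            # must match a component in the active stack
    outputs:
      output:  # the default output name
        materializer_source: materializers.some_data.materializer.materializer_class
```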
Learn more about step configuration in the dedicated section on managing steps.
Hierarchy and precedence
Some options can be configured on both pipelines and steps, others on only one of the two. Pipeline-level settings are automatically applied to all steps, but if the same setting is also configured on a step, the step-level value takes precedence.
When an object is configured, ZenML merges the values with previously-configured keys. E.g.:
In the above example, the two settings configurations were automatically merged.
Fetching configuration
Any configuration can be fetched using the client from the Python SDK. For example, say we use the `extra` parameter to tag a pipeline:
This tag is now associated and tracked with all pipeline runs, and can be fetched later:
The configuration is also displayed in the dashboard in the pipeline run details page.