What can be configured
Here is a sample YAML file with the most important configuration highlighted. For brevity, it does not include every possible key. To view a sample file with all possible keys, refer to this page.
Deep-dive
enable_XXX parameters
These are boolean flags for various configurations:
enable_artifact_metadata: Whether to associate metadata with artifacts.
enable_artifact_visualization: Whether to attach visualizations of artifacts.
enable_cache: Whether to use caching.
enable_step_logs: Whether to track step logs.
build ID
The UUID of the build to use for this pipeline. If specified, Docker image building is skipped for remote orchestrators, and the Docker image specified in this build is used.
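The rule above can be expressed as a small decision function. This is a sketch under stated assumptions: `parse_build_id` and `skips_image_build` are hypothetical helpers, and the UUID below is a placeholder, not a real build.

```python
import uuid
from typing import Optional


def parse_build_id(raw: Optional[str]) -> Optional[uuid.UUID]:
    """Parse the optional build value from a config; None means no build was given."""
    if raw is None:
        return None
    return uuid.UUID(raw)  # raises ValueError if the string is not a valid UUID


def skips_image_build(build_id: Optional[uuid.UUID], orchestrator_is_remote: bool) -> bool:
    """Mirror the rule above: a given build skips image building on remote orchestrators."""
    return build_id is not None and orchestrator_is_remote


build_id = parse_build_id("12345678-1234-5678-1234-567812345678")  # placeholder UUID
print(skips_image_build(build_id, orchestrator_is_remote=True))  # True
```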
Configuring the model
model
Specifies the ZenML Model to use for this pipeline.
Pipeline and step parameters
parameters
A dictionary of JSON-serializable parameters specified at the pipeline or step level. For example, a parameter can be set at the top level of the YAML file and set again for an individual step under the steps key.
In that case, the step uses the value defined under the steps key (e.g. 0.001), not the pipeline-level value. In other words, the YAML config always takes precedence over pipeline parameters that are passed down to steps in code. Read this section for more details.
Typically, parameters defined at the pipeline level are used by multiple steps, and no step-level configuration is defined for them.
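The precedence rule can be illustrated with a plain dictionary merge. In this sketch the step names `trainer` and `evaluator`, the parameter name `gamma`, and the pipeline-level value 0.01 are all hypothetical; only the step-level value 0.001 comes from the text. The helper `resolve_step_parameters` is illustrative and is not how ZenML resolves parameters internally.

```python
from typing import Any, Dict, Optional


def resolve_step_parameters(
    passed_in_code: Dict[str, Any],
    yaml_step_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge parameters for one step; values from the YAML steps section win."""
    resolved = dict(passed_in_code)  # start from what the pipeline passes in code
    resolved.update(yaml_step_parameters or {})  # YAML step-level values override
    return resolved


pipeline_params = {"gamma": 0.01}  # hypothetical pipeline-level value, passed down in code
yaml_steps = {"trainer": {"gamma": 0.001}}  # step-level override under `steps`

print(resolve_step_parameters(pipeline_params, yaml_steps.get("trainer")))  # {'gamma': 0.001}
print(resolve_step_parameters(pipeline_params, yaml_steps.get("evaluator")))  # {'gamma': 0.01}
```

The second call shows the common case: a step with no step-level entry simply receives the pipeline-level value.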
Note that parameters are different from artifacts. Parameters are JSON-serializable values that are passed in the runtime configuration of a pipeline. Artifacts are inputs and outputs of a step, and need not always be JSON-serializable (materializers handle their persistence in the artifact store).
Setting the run_name
run_name
To change the name of a run, pass run_name as a parameter. This can also be a dynamic value.
Run names must be unique, so you cannot run with the same run_name twice. Do not set a static name when running on a schedule; instead, include an auto-incrementing counter or a timestamp in the name.
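One way to follow the timestamp advice is to build the name at launch time. The helper `make_run_name` and the prefix `training_run` are hypothetical; the resulting string would then be used as the run_name value.

```python
from datetime import datetime, timezone
from typing import Optional


def make_run_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp so that repeated (e.g. scheduled) runs get distinct names."""
    now = now or datetime.now(timezone.utc)
    # Second-level resolution; add %f if runs can start within the same second.
    return f"{prefix}_{now:%Y_%m_%d_%H_%M_%S}"


fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(make_run_name("training_run", fixed))  # training_run_2024_01_02_03_04_05
```

Using UTC avoids duplicate names when a daylight-saving change repeats a local hour.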
Stack Component Runtime settings
Settings are special runtime configurations of a pipeline or a step that require a dedicated section. In short, they define execution configuration such as Docker building and resource settings.
Docker Settings
Docker Settings can be passed in directly as objects or as a dictionary representation of the object. In configuration files, the Docker configuration is set using the dictionary representation.
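The "object or dictionary" equivalence can be sketched with a dataclass. `DockerSettingsSketch` is a hypothetical stand-in with only two illustrative fields (`parent_image`, `requirements`); it is not ZenML's settings class, whose full schema is in its API reference.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Union


@dataclass
class DockerSettingsSketch:
    """Hypothetical stand-in for a Docker settings object (two illustrative fields only)."""
    parent_image: Optional[str] = None
    requirements: List[str] = field(default_factory=list)


def to_settings(value: Union[DockerSettingsSketch, dict]) -> DockerSettingsSketch:
    """Accept either the settings object itself or its dictionary representation."""
    if isinstance(value, DockerSettingsSketch):
        return value
    return DockerSettingsSketch(**value)  # unknown keys raise TypeError


from_object = to_settings(DockerSettingsSketch(requirements=["pandas"]))
from_dict = to_settings({"requirements": ["pandas"]})  # the form a config file provides
assert from_object == from_dict
print(asdict(from_dict))  # {'parent_image': None, 'requirements': ['pandas']}
```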
Resource Settings
Some stacks allow resources such as CPU, GPU, and memory to be configured through these settings.
Note that this may not work for all types of stack components. To learn which components support it, refer to the specific orchestrator docs.
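A sketch of reading resource settings from a config, assuming field names `cpu_count`, `gpu_count`, and `memory`, and memory given as a number plus a unit such as GB or GiB. These names and the unit table are assumptions modelled on ZenML's ResourceSettings; check the API reference for the exact schema. `ResourceRequest` is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed unit table: decimal (KB, MB, GB) and binary (KiB, MiB, GiB) multiples.
_UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


@dataclass
class ResourceRequest:
    """Hypothetical container for resource settings read from a config file."""
    cpu_count: Optional[float] = None
    gpu_count: Optional[int] = None
    memory: Optional[str] = None

    def __post_init__(self) -> None:
        if self.cpu_count is not None and self.cpu_count <= 0:
            raise ValueError("cpu_count must be positive")
        if self.gpu_count is not None and self.gpu_count < 0:
            raise ValueError("gpu_count must not be negative")

    def memory_bytes(self) -> Optional[int]:
        """Convert a memory string such as '4GB' or '512MiB' into bytes."""
        if self.memory is None:
            return None
        match = re.fullmatch(r"\s*(\d+)\s*([KMG]i?B)\s*", self.memory, flags=re.IGNORECASE)
        if match is None:
            raise ValueError(f"Unrecognized memory value: {self.memory!r}")
        amount, unit = match.groups()
        return int(amount) * _UNITS[unit.upper()]


request = ResourceRequest(cpu_count=2, gpu_count=1, memory="4GB")
print(request.memory_bytes())  # 4000000000
```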
failure_hook_source and success_hook_source
The source of the failure and success hooks can be specified.
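A source is a dotted import path to an object in code. The sketch below, assuming sources of the form `package.module.attribute`, resolves such a string with the standard library's `importlib`. `load_source` is a hypothetical helper that illustrates the idea; ZenML's own source resolution handles more cases. A standard-library function stands in for a real hook.

```python
import importlib
from typing import Any


def load_source(source: str) -> Any:
    """Resolve a dotted path such as 'package.module.function' to the object it names."""
    module_path, _, attribute = source.rpartition(".")
    if not module_path:
        raise ValueError(f"Source must be a dotted path, got {source!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)  # AttributeError if the name does not exist


hook = load_source("json.dumps")  # stand-in for e.g. a hypothetical "my_hooks.on_failure"
print(hook({"ok": True}))  # {"ok": true}
```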
Step-specific configuration
Much of the pipeline-level configuration can also be applied at the step level (as we have already seen with the enable_cache flag). However, some configuration is step-specific: it can only be applied at the step level, not at the pipeline level.
experiment_tracker: Name of the experiment tracker to enable for this step. This experiment tracker should be defined in the active stack under the same name.
step_operator: Name of the step operator to enable for this step. This step operator should be defined in the active stack under the same name.
outputs: Configuration of the output artifacts of this step. This is further keyed by output name (by default, step outputs are named output). The most interesting setting here is materializer_source, which is the source path of the materializer in code to use for this output (e.g. materializers.some_data.materializer.materializer_class). Read more about this source path here.
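The step-specific keys can be checked against the active stack before a run. This sketch assumes the active stack is represented as a hypothetical mapping from component type to registered name (one component per type), and that a materializer source must be a dotted path. `check_step_config` and the name `mlflow_tracker` are hypothetical; the materializer path is the example from the text.

```python
from typing import Any, Dict, List

DEFAULT_OUTPUT_NAME = "output"  # default name of a step's output, per the text above


def check_step_config(step_config: Dict[str, Any], active_stack: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in a step's step-specific configuration.

    `active_stack` maps a component type to the component name registered in the
    active stack, e.g. {"experiment_tracker": "mlflow_tracker"} (hypothetical names).
    """
    problems = []
    for component_type in ("experiment_tracker", "step_operator"):
        wanted = step_config.get(component_type)
        if wanted is not None and active_stack.get(component_type) != wanted:
            problems.append(f"{component_type} {wanted!r} is not in the active stack")
    for output_name, output_config in step_config.get("outputs", {}).items():
        source = output_config.get("materializer_source")
        # A source path is expected to be a dotted path like "package.module.Class".
        if source is not None and "." not in source:
            problems.append(f"output {output_name!r}: {source!r} is not a dotted source path")
    return problems


trainer_config = {
    "experiment_tracker": "mlflow_tracker",
    "outputs": {
        DEFAULT_OUTPUT_NAME: {
            "materializer_source": "materializers.some_data.materializer.materializer_class",
        },
    },
}
print(check_step_config(trainer_config, {"experiment_tracker": "mlflow_tracker"}))  # []
```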