Configure Pipelines at Runtime
How to configure step and pipeline parameters for each run
This is an older version of the ZenML documentation. To view the latest version, please visit the up-to-date documentation.
Runtime Configuration
A ZenML pipeline clearly separates business logic from parameter configuration. Business logic is what defines a step or a pipeline. Parameter configurations are used to dynamically set parameters of your steps and pipelines at runtime.
You can configure your pipelines at runtime in the following ways:
Configuring from within code: Do this when you are quickly iterating on your code and don't want to change your actual step code. This is useful in the development phase.
Configuring with YAML config files: Do this when you want to launch pipeline runs without modifying the code at all. This is the recommended way for production scenarios.
Configuring from within code
You can add a configuration to a step by creating your configuration as a subclass of BaseStepConfig. When such a config object is passed to a step, it is not treated like other artifacts. Instead, it gets passed into the step when the pipeline is instantiated.
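As a sketch of what this looks like (the SVCTrainerStepConfig and svc_trainer names follow the example used on this page; the sklearn-based training logic is an illustrative assumption):

```python
from sklearn.base import ClassifierMixin
from sklearn.svm import SVC

from zenml.steps import step, BaseStepConfig


class SVCTrainerStepConfig(BaseStepConfig):
    """Configuration for the SVC trainer step."""
    gamma: float = 0.001  # default value, can be overridden at runtime


@step
def svc_trainer(
    config: SVCTrainerStepConfig,
    X_train,
    y_train,
) -> ClassifierMixin:
    """Train an SVC classifier using the gamma value from the config."""
    model = SVC(gamma=config.gamma)
    model.fit(X_train, y_train)
    return model
```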
The default value for the gamma parameter is set to 0.001. However, when the pipeline is instantiated, you can override the default like this:
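A minimal sketch of such an override, assuming a pipeline function first_pipeline with a trainer as its second step (the first_pipeline and digits_data_loader names are hypothetical):

```python
# Override the default gamma=0.001 by passing a config object
# when instantiating the pipeline:
first_pipeline_instance = first_pipeline(
    step_1=digits_data_loader(),  # hypothetical data-loading step
    step_2=svc_trainer(SVCTrainerStepConfig(gamma=0.01)),
)
first_pipeline_instance.run()
```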
Behind the scenes, BaseStepConfig is implemented as a Pydantic BaseModel. Therefore, any type that Pydantic supports is also supported as an attribute type in BaseStepConfig.
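For instance, you could use typed collections or optional fields in your config (a hedged sketch; the attribute names below are illustrative, not part of the example on this page):

```python
from typing import List, Optional

from zenml.steps import BaseStepConfig


class TrainerConfig(BaseStepConfig):
    gamma: float = 0.001
    layer_sizes: List[int] = [64, 32]     # Pydantic validates element types
    checkpoint_dir: Optional[str] = None  # optional attributes are supported
```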
Configuring with YAML config files
For production scenarios where you want to launch pipeline runs without modifying the code at all, you can also configure your pipeline runs using YAML config files.
There are two ways in which YAML config files can be used:

- Defining step parameters in YAML and setting the path to the config with pipeline.with_config() before calling pipeline.run(),
- Configuring the entire pipeline at runtime in YAML and executing it with zenml pipeline run.
Defining step parameters in YAML
If you only want to configure step parameters as above, you can do so with a minimalistic configuration YAML file, which you use to configure a pipeline before running it via the with_config() method, e.g.:
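A hedged sketch of this usage, reusing the hypothetical pipeline and step names from the example above:

```python
first_pipeline_instance = first_pipeline(
    step_1=digits_data_loader(),
    step_2=svc_trainer(),  # no config object passed in code
)
# Apply the YAML config, then run:
first_pipeline_instance.with_config("path_to_config.yaml").run()
```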
The path_to_config.yaml needs to have the following structure:
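A sketch of the expected structure, based on the fields described in this section (angle-bracket names are placeholders):

```yaml
steps:
  <STEP_NAME>:
    parameters:
      <PARAMETER_NAME>: <PARAMETER_VALUE>
```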
For our example from above, we could use the following configuration file:
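Assuming the trainer is wired in as step_2 of the pipeline (as in the sketches above), the config file could look like this:

```yaml
steps:
  step_2:
    parameters:
      gamma: 0.01
```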
Note that svc_trainer() still has to be defined with a config: SVCTrainerStepConfig argument. The only difference is that we now provide gamma via a config file before running the pipeline, instead of explicitly passing an SVCTrainerStepConfig object during step creation.
Configuring the entire pipeline at runtime in YAML
For production settings, you might want to use config files not only for your parameters, but even for choosing what code gets executed. This way, you can define entire pipeline runs without changing the code.
To run pipelines in this way, you can use the zenml pipeline run command with the -c argument:
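A sketch of the invocation (the placeholders stand for your own file paths):

```shell
zenml pipeline run <PATH_TO_PIPELINE_PYTHON_FILE> -c <PATH_TO_CONFIG_YAML>
```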
<PATH_TO_PIPELINE_PYTHON_FILE> should point to the Python file where your pipeline function or class is defined. Your steps can also be in that file, but they do not need to be. If your steps are defined in separate code files, you can instead specify that in the YAML, as we will see below.
Do not instantiate and run your pipeline within the Python file that you want to run using the CLI, or your pipeline will be run twice, possibly with different configurations.
If you want to dynamically configure the entire pipeline, your config file will need a bit more information:
- The name of the function or class of your pipeline in <PATH_TO_PIPELINE_PYTHON_FILE>,
- The name of the function or class of each step, optionally with the path to the code file where it is defined (if it is not in <PATH_TO_PIPELINE_PYTHON_FILE>),
- Optionally, the name of each materializer (about which you will learn later in the section on Materializers).
Overall, the required structure of such a YAML should look like this:
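A sketch of this structure, assembled from the fields listed above (all angle-bracket names are placeholders; the file and materializers entries are optional):

```yaml
name: <PIPELINE_FUNCTION_OR_CLASS_NAME>
steps:
  <STEP_NAME>:
    source:
      name: <STEP_FUNCTION_OR_CLASS_NAME>
      file: <PATH_TO_STEP_CODE_FILE>  # optional if defined in the pipeline file
    parameters:
      <PARAMETER_NAME>: <PARAMETER_VALUE>
    materializers:
      <OUTPUT_NAME>: <MATERIALIZER_NAME>  # optional
```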
This might seem daunting at first, so let us go over it one by one:
Defining which pipeline to use
The first line of the YAML defines which pipeline code to use. If you defined your pipeline as a Python function with the @pipeline decorator, this name is the name of the decorated function. If you used the Class Based API (which you will learn about in the next section), it is the name of the class.
For example, if you have defined a pipeline my_pipeline_a in pipelines/my_pipelines.py, then you would:

- Set name: my_pipeline_a in the YAML,
- Use pipelines/my_pipelines.py as <PATH_TO_PIPELINE_PYTHON_FILE>.
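Putting those two pieces together, the config file would start with:

```yaml
name: my_pipeline_a
```

and the run command would then be something like zenml pipeline run pipelines/my_pipelines.py -c <PATH_TO_CONFIG_YAML> (the config file path is yours to choose).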
Defining which steps to use
For each step, you can define which source code to use via the source field:
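A sketch of the source field, based on the description above (angle-bracket names are placeholders):

```yaml
steps:
  <STEP_NAME>:
    source:
      name: <STEP_FUNCTION_OR_CLASS_NAME>
      file: <PATH_TO_STEP_CODE_FILE>  # only needed if not in the pipeline file
```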
For example, if you have defined a step my_step_1 in steps/my_steps.py that you want to use as step_1 of your pipeline my_pipeline_a, then you would define that in your YAML like this:
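A sketch using the names from this paragraph:

```yaml
name: my_pipeline_a
steps:
  step_1:
    source:
      name: my_step_1
      file: steps/my_steps.py
```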
If your step is defined in the same file as your pipeline, you can omit the last file: ... line.
Defining materializer source codes
The materializers field of a step can be used to specify custom materializers for your step outputs and inputs.
Materializers are responsible for saving and loading artifacts within each step. For more details on materializers and how to configure them in YAML config files, see the Materializers section in the list of advanced concepts.
Code Summary
Putting it all together, we can configure our entire example pipeline run like this in the CLI:
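A hedged end-to-end sketch combining the pieces above (the pipeline, step, and file names follow this page's examples; run.py as the pipeline file and digits_data_loader as the first step are illustrative assumptions):

```yaml
# config.yaml
name: first_pipeline
steps:
  step_1:
    source:
      name: digits_data_loader
  step_2:
    source:
      name: svc_trainer
    parameters:
      gamma: 0.01
```

```shell
zenml pipeline run run.py -c config.yaml
```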
Pro-Tip: You can use this to configure and run your pipeline from within your GitHub Actions workflow (or comparable tools). This way you ensure that each run is directly associated with a code version.