ZenML provides several approaches to configure your pipelines and steps:
Understanding .configure() vs .with_options()
ZenML provides two primary methods to configure pipelines and steps: .configure() and .with_options(). While they accept the same parameters, they behave differently:
.configure(): Modifies the configuration in-place and returns the same object.
.with_options(): Creates a new copy with the applied configuration, leaving the original unchanged.
When to use each:
Use .with_options() in most cases, especially inside pipeline definitions:
@pipeline
def my_pipeline():
    # This creates a new configuration just for this instance
    my_step.with_options(parameters={"param": "value"})()
Use .configure() only when you intentionally want to modify a step globally, and are aware that the change will affect all subsequent uses of that step.
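The difference between the two methods can be illustrated with a toy model. The sketch below is not ZenML code: ToyStep is a hypothetical stand-in class that only mimics the in-place versus copy semantics described above.

```python
import copy


class ToyStep:
    """Hypothetical stand-in for a configurable step; not a ZenML class."""

    def __init__(self, name):
        self.name = name
        self.parameters = {}

    def configure(self, parameters=None):
        # Mutates this object and returns it, so every later use sees the change.
        self.parameters.update(parameters or {})
        return self

    def with_options(self, parameters=None):
        # Applies the configuration to a deep copy, leaving the original untouched.
        clone = copy.deepcopy(self)
        clone.configure(parameters=parameters)
        return clone


base = ToyStep("trainer")
variant = base.with_options(parameters={"param": "value"})
same = base.configure(parameters={"other": 1})
```

In this model, variant carries its own parameters while base only changes once configure is called on it, which is why the copy-based method is the safer default inside a pipeline definition.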
Approaches to Configuration
Pipeline Configuration with configure
You can configure various aspects of a pipeline using the configure method:
# Create a pipeline
my_pipeline = MyPipeline()
# Configure the pipeline
my_pipeline.configure(
    enable_cache=False,
    enable_artifact_metadata=True,
    settings={
        "docker": {
            "parent_image": "zenml-io/zenml-cuda:latest"
        }
    }
)
# Run the pipeline
my_pipeline()
Runtime Configuration with with_options
You can configure a pipeline at runtime using the with_options method:
# Configure specific step parameters
my_pipeline.with_options(steps={"trainer": {"parameters": {"learning_rate": 0.01}}})()
# Or using a YAML configuration file
my_pipeline.with_options(config_file="path_to_yaml_file")()
Step-Level Configuration
You can configure individual steps with the @step decorator.
You can directly specify which stack components a step should use. This feature is only available for experiment trackers and step operators:
@step(experiment_tracker="mlflow_tracker", step_operator="vertex_ai")
def train_model():
    # This step will use MLflow for tracking and run on Vertex AI
    ...
@step(experiment_tracker="wandb", step_operator="kubernetes")
def evaluate_model():
    # This step will use Weights & Biases for tracking and run on Kubernetes
    ...
This direct specification is a concise way to assign different stack components to different steps. You can combine this with settings to configure the specific behavior of those components:
@step(step_operator="nameofstepoperator", settings={"step_operator": {"estimator_args": {"instance_type": "m7g.medium"}}})
def my_step():
    # This step will use the specified step operator with custom instance type
    ...
# Alternatively, using the appropriate settings class
# (this import assumes the ZenML AWS integration is installed):
from zenml.integrations.aws.flavors.sagemaker_step_operator_flavor import SagemakerStepOperatorSettings
@step(step_operator="nameofstepoperator", settings={"step_operator": SagemakerStepOperatorSettings(instance_type="m7g.medium")})
def my_step():
    # Same configuration using the settings class
    ...
This approach allows you to use different components for different steps in your pipeline while also customizing their runtime behavior.
Types of Settings
Settings in ZenML are categorized into two main types:
General settings that can be used on all ZenML pipelines:
DockerSettings for container configuration
ResourceSettings for CPU, memory, and GPU allocation
Stack-component-specific settings for configuring behaviors of components in your stack:
These use the pattern <COMPONENT_CATEGORY> or <COMPONENT_CATEGORY>.<COMPONENT_FLAVOR> as keys
Examples include experiment_tracker.mlflow or just step_operator
Configuration Hierarchy
There are a few general rules when it comes to settings and configurations that are applied in multiple places. Generally the following is true:
Configurations in code override configurations made inside the YAML file
Configurations at the step level override those made at the pipeline level
For attributes that hold dictionaries, the dictionaries are merged rather than replaced
from zenml import pipeline, step
from zenml.config import ResourceSettings
@step
def load_data(parameter: int) -> dict:
    ...
@step(settings={"resources": ResourceSettings(gpu_count=1, memory="2GB")})
def train_model(data: dict) -> None:
    ...
@pipeline(settings={"resources": ResourceSettings(cpu_count=2, memory="1GB")})
def simple_ml_pipeline(parameter: int):
    ...
# ZenML merges the two configurations and uses the step configuration to override
# values defined on the pipeline level
train_model.configuration.settings["resources"]
# -> cpu_count=2, gpu_count=1, memory="2GB"
simple_ml_pipeline.configuration.settings["resources"]
# -> cpu_count=2, memory="1GB"
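The merge rule can be sketched in plain Python. This only illustrates the behavior described above and is not ZenML's implementation: merge_settings is a hypothetical helper, and plain dictionaries stand in for the ResourceSettings objects.

```python
def merge_settings(pipeline_level, step_level):
    """Recursively merge two dicts; step-level values win on conflicts."""
    merged = dict(pipeline_level)  # shallow copy so the input is not mutated
    for key, value in step_level.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a dict: merge them instead of replacing.
            merged[key] = merge_settings(merged[key], value)
        else:
            # Scalar or one-sided value: the step level takes precedence.
            merged[key] = value
    return merged


# Plain-dict stand-ins for the pipeline-level and step-level resource settings.
pipeline_resources = {"cpu_count": 2, "memory": "1GB"}
step_resources = {"gpu_count": 1, "memory": "2GB"}

effective = merge_settings(pipeline_resources, step_resources)
```

Here effective keeps cpu_count from the pipeline, gains gpu_count from the step, and takes memory from the step because both levels define it.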
Common Setting Types
Resource Settings
Resource settings allow you to specify the CPU, memory, and GPU requirements for your steps.
When both pipeline and step resource settings are specified, they are merged with step settings taking precedence:
# Result of merging the above configurations:
# train_model.configuration.settings["resources"]
# -> cpu_count=2, gpu_count=1, memory="2GB"
Note that not every orchestrator applies ResourceSettings. Whether resource constraints are enforced depends on the orchestrator in use: some, like Kubernetes, fully support these settings, while others may ignore them. To learn more, see the documentation page for the orchestrator you are using.
Docker Settings
Docker settings allow you to customize the containerization process.
For more detailed information on containerization options, see the containerization guide.
Stack Component Configuration
Registration-time vs Runtime Stack Component Settings
Stack components have two types of configuration:
Registration-time configuration: Static settings defined when registering a component
# Example: Setting a fixed tracking URI for MLflow
zenml experiment-tracker register mlflow_tracker --flavor=mlflow --tracking_uri=http://localhost:5000
Runtime settings: Dynamic settings that can change between pipeline runs
# Example: Setting experiment name that changes for each run
@step(settings={"experiment_tracker.mlflow": {"experiment_name": "custom_experiment"}})
def my_step():
    ...
Even for runtime settings, you can set default values during registration:
# Setting a default value for "nested" setting
zenml experiment-tracker register <n> --flavor=mlflow --nested=True
Using the Right Key for Stack Component Settings
When specifying stack-component-specific settings, the key follows this pattern:
# Using just the component category
@step(settings={"step_operator": {"estimator_args": {"instance_type": "m7g.medium"}}})
# Or using the component category and flavor
@step(settings={"experiment_tracker.mlflow": {"experiment_name": "custom_experiment"}})
If you specify just the category (e.g., step_operator), ZenML applies these settings to whatever flavor of component is in your stack. If the settings don't apply to that flavor, they are ignored.
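The key matching can be pictured with a small sketch. It is a toy model rather than ZenML's resolution logic: resolve_component_settings is a hypothetical helper, it assumes a flavor-qualified key only applies when the flavor matches the component in the stack, and it does not model the per-field validation that decides whether a setting is ignored.

```python
def resolve_component_settings(settings, category, flavor):
    """Collect the settings whose key targets the given stack component."""
    resolved = {}
    for key, value in settings.items():
        # "experiment_tracker.mlflow" -> ("experiment_tracker", ".", "mlflow")
        # "step_operator"             -> ("step_operator", "", "")
        key_category, _, key_flavor = key.partition(".")
        if key_category != category:
            continue
        if key_flavor and key_flavor != flavor:
            # Flavor-qualified key for a different flavor: skipped (assumption).
            continue
        resolved.update(value)
    return resolved


settings = {
    "step_operator": {"estimator_args": {"instance_type": "m7g.medium"}},
    "experiment_tracker.mlflow": {"experiment_name": "custom_experiment"},
}

mlflow_settings = resolve_component_settings(settings, "experiment_tracker", "mlflow")
wandb_settings = resolve_component_settings(settings, "experiment_tracker", "wandb")
operator_settings = resolve_component_settings(settings, "step_operator", "sagemaker")
```

In this model the category-only key reaches whichever step operator is in the stack, while the flavor-qualified key is picked up only when the stack's experiment tracker is MLflow.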
Making Configurations Flexible with Environment Variables
You can make your configurations more flexible by referencing environment variables using the placeholder syntax ${ENV_VARIABLE_NAME}.
This allows you to easily adapt your pipelines to different environments without changing code.
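To make the placeholder idea concrete, here is a minimal sketch of how ${NAME} references in a configuration could be expanded. It is an illustration only, not how ZenML performs the substitution: substitute_env and the PARENT_IMAGE variable are hypothetical, and a missing variable simply raises a KeyError in this version.

```python
import os
import re

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env=None):
    """Replace ${NAME} placeholders in strings, recursing into dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # env[...] raises KeyError when the variable is not defined.
        return _PLACEHOLDER.sub(lambda match: env[match.group(1)], value)
    if isinstance(value, dict):
        return {key: substitute_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_env(item, env) for item in value]
    return value  # numbers, booleans, None are passed through unchanged


config = {"settings": {"docker": {"parent_image": "${PARENT_IMAGE}"}}}
resolved = substitute_env(config, env={"PARENT_IMAGE": "zenml-io/zenml-cuda:latest"})
```

Passing an explicit env mapping keeps the example deterministic; leaving it out would read from the process environment instead.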
Advanced Pipeline Triggering
For triggering pipelines from a client or another pipeline, you can use a PipelineRunConfiguration object. This approach is covered in the advanced template usage documentation.
Autogenerate a template YAML file
If you want to generate a template YAML file for your specific pipeline, use the .write_run_configuration_template() method. It generates a YAML file with all options commented out, so you can pick and choose the settings that are relevant to you.
from zenml import pipeline
...
@pipeline(enable_cache=True) # set cache behavior at pipeline level
def simple_ml_pipeline(parameter: int):
    dataset = load_data(parameter=parameter)
    train_model(dataset)
simple_ml_pipeline.write_run_configuration_template(path="<Insert_path_here>")
An example of a generated YAML configuration template
When you want to configure your pipeline with a certain stack in mind, you can do so as well: ...write_run_configuration_template(stack=<Insert_stack_here>)