Real-time settings

Using settings to configure the runtime behavior of your pipelines and stack components.

Stack Component Config vs Settings in ZenML

As we saw before, one special type of configuration is called Settings. Settings configure the runtime behavior of stack components and pipelines. Concretely, they allow you to configure:

  • The resources required for a step

  • The containerization process of a pipeline (e.g., which requirements get installed in the Docker image)

  • Stack-component-specific configuration (e.g., if you have an experiment tracker, passing in the name of the experiment at runtime)

You will learn about all of the above in more detail later, but for now, let's try to understand that all of this configuration flows through one central concept called BaseSettings. (From here on, we use settings and BaseSettings interchangeably in this guide.)
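For instance, all of the above attach to a pipeline through the same settings dictionary. Here is a minimal sketch; the requirement and resource values are purely illustrative:

from zenml import pipeline
from zenml.config import DockerSettings, ResourceSettings


@pipeline(
    settings={
        # Containerization: install extra requirements into the Docker image
        "docker": DockerSettings(requirements=["scikit-learn"]),
        # Resources: request hardware for the steps of this pipeline
        "resources": ResourceSettings(cpu_count=2, memory="2GB"),
    }
)
def training_pipeline():
    ...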

Types of settings

Settings are categorized into two types:

  • General settings that can be used on all ZenML pipelines, e.g., DockerSettings for the containerization process or ResourceSettings for step resources.

  • Stack-component-specific settings that supply runtime configuration to a specific stack component, passed under a key that follows the pattern described below.

Difference between stack component settings at registration-time vs real-time

For stack-component-specific settings, you might be wondering how these differ from the configuration passed in when registering a component, e.g., zenml stack-component register <NAME> --config1=configvalue --config2=configvalue, etc. The answer is that the configuration passed in at registration time is static and fixed throughout all pipeline runs, while settings can change from run to run.

A good example of this is the MLflow Experiment Tracker: configuration that remains static, such as the tracking_uri, is passed in at registration time, while runtime configuration, such as the experiment_name (which might change with every pipeline run), is passed in as runtime settings.

Even though settings can be overridden at runtime, you can also specify default values for settings while configuring a stack component. For example, you could set a default value for the nested setting of your MLflow experiment tracker: zenml experiment-tracker register <NAME> --flavor=mlflow --nested=True

This means that all pipelines that run using this experiment tracker use nested MLflow runs unless overridden by specifying settings for the pipeline at runtime.
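To override such a default for a particular run, pass component-specific settings at runtime. A minimal sketch, assuming the MLflow integration is installed and <experiment_tracker_name> is the name of the component registered above:

from zenml import step
from zenml.integrations.mlflow.flavors.mlflow_experiment_tracker_flavor import (
    MLFlowExperimentTrackerSettings,
)

# Disable nested runs for this step, overriding the registration-time default
mlflow_settings = MLFlowExperimentTrackerSettings(nested=False)


@step(
    experiment_tracker="<experiment_tracker_name>",
    settings={"experiment_tracker.mlflow": mlflow_settings},
)
def train_model() -> None:
    ...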

Using objects or dicts

Settings can be passed in directly as BaseSettings-subclassed objects, or as a dictionary representation of the object. For example, a Docker configuration can be passed in as follows:

from zenml.config import DockerSettings

settings = {'docker': DockerSettings(requirements=['pandas'])}

Or like this:

settings = {'docker': {'requirements': ['pandas']}}

Or in a YAML like this:

settings:
  docker:
    requirements:
      - pandas
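Whichever representation you choose, the dictionary is attached in the same way, e.g., via the settings parameter of the @pipeline (or @step) decorator:

from zenml import pipeline


@pipeline(settings=settings)  # `settings` defined as in either snippet above
def my_pipeline():
    ...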

Using the right key for stack-component-specific settings

When specifying stack-component-specific settings, a key needs to be passed. This key should always correspond to the pattern: <COMPONENT_CATEGORY>.<COMPONENT_FLAVOR>

For example, the SagemakerStepOperator supports passing in estimator_args. The way to specify this would be to use the key step_operator.sagemaker:

from zenml import step
from zenml.integrations.aws.flavors import SagemakerStepOperatorSettings


# Using a dictionary
@step(step_operator="nameofstepoperator", settings={"step_operator.sagemaker": {"estimator_args": {"instance_type": "m7g.medium"}}})
def my_step():
  ...


# Using the settings class
@step(step_operator="nameofstepoperator", settings={"step_operator.sagemaker": SagemakerStepOperatorSettings(instance_type="m7g.medium")})
def my_step():
  ...

or in YAML:

steps:
  my_step:
    step_operator: "nameofstepoperator"
    settings:
      step_operator.sagemaker:
        estimator_args:
          instance_type: m7g.medium

Utilizing the settings

Settings can be configured in the same way as any other configuration. For example, you can supply all configuration via a YAML file, which is useful in situations where code changes are not desirable.

To use a YAML file, you must pass it to the with_options(...) method of a pipeline:

from zenml import pipeline, step


@step
def my_step() -> None:
    print("my step")


@pipeline
def my_pipeline():
    my_step()


# Pass in a config file
my_pipeline = my_pipeline.with_options(config_path='/local/path/to/config.yaml')

The format of a YAML config file is exactly the same as the configurations you would pass in Python in the above two sections. Step-specific configurations can be passed by using the step invocation ID inside the steps dictionary. Here is a rough skeleton of a valid YAML config. All keys are optional.

# Pipeline level settings
settings: 
  docker:
    build_context_root: .
    build_options: Mapping[str, Any]
    source_files: str
    copy_global_config: bool
    dockerfile: Optional[str]
    dockerignore: Optional[str]
    environment: Mapping[str, Any]
    install_stack_requirements: bool
    parent_image: Optional[str]
    replicate_local_python_environment: Optional
    required_integrations: List[str]
    requirements:
      - pandas
  resources:
    cpu_count: 1
    gpu_count: 1
    memory: "1GB"
    
steps:
  step_invocation_id:
    settings: { }  # overrides pipeline settings
  other_step_invocation_id:
    settings: { }
  ...

Hierarchy and precedence

Some settings can be configured on both pipelines and steps, some on only one of the two. Pipeline-level settings are automatically applied to all steps, but if the same setting is configured on a step as well, the step-level value takes precedence.
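A minimal sketch of this precedence; the resource values are illustrative:

from zenml import pipeline, step
from zenml.config import ResourceSettings


# The step-level memory value overrides the pipeline-level one;
# cpu_count is inherited from the pipeline settings.
@step(settings={"resources": ResourceSettings(memory="2GB")})
def trainer() -> None:
    ...


@pipeline(settings={"resources": ResourceSettings(cpu_count=2, memory="1GB")})
def my_pipeline():
    trainer()

# trainer effectively runs with cpu_count=2 and memory="2GB"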

When a settings object is configured, ZenML merges the values with previously configured keys. E.g.:

from zenml import step
from zenml.config import ResourceSettings


@step(settings={"resources": ResourceSettings(cpu_count=2, memory="1GB")})
def my_step() -> None:
    ...


my_step.configure(
    settings={"resources": ResourceSettings(gpu_count=1, memory="2GB")}
)

my_step.configuration.settings["resources"]
# cpu_count=2, gpu_count=1, memory="2GB"

In the above example, the two settings were automatically merged.
