YAML Configuration

Learn how to configure ZenML pipelines using YAML configuration files.

ZenML provides configuration capabilities through YAML files that allow you to customize pipeline and step behavior without changing your code. This is particularly useful for separating configuration from code, experimenting with different parameters, and ensuring reproducibility.

Basic Usage

You can apply a YAML configuration file when running a pipeline:

my_pipeline.with_options(config_path="config.yaml")()

This allows you to change pipeline behavior without modifying your code.

Sample Configuration File

Here's a simple example of a YAML configuration file:

# Enable/disable features
enable_cache: False
enable_step_logs: True

# Pipeline parameters
parameters: 
  dataset_name: "my_dataset"
  learning_rate: 0.01

# Step-specific configuration
steps:
  train_model:
    parameters:
      learning_rate: 0.001  # Override the pipeline parameter for this step
    enable_cache: True      # Override the pipeline cache setting
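
To see how these keys map onto code, here is a minimal, illustrative sketch of a matching pipeline; the step and pipeline names, parameter names, and defaults are assumptions chosen to line up with the sample file above:

from zenml import pipeline, step

@step
def train_model(learning_rate: float = 0.01) -> None:
    # steps.train_model.parameters in the YAML file overrides this default with 0.001
    ...

@pipeline
def my_pipeline(dataset_name: str = "my_dataset", learning_rate: float = 0.01) -> None:
    # The pipeline-level `parameters` section of the YAML file fills these arguments
    train_model()

my_pipeline.with_options(config_path="config.yaml")()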

Configuration Hierarchy

ZenML follows a specific hierarchy when resolving configuration:

  1. Runtime Python code - Highest precedence

  2. Step-level YAML configuration

    steps:
      train_model:
        parameters:
          learning_rate: 0.001  # Overrides pipeline-level setting
  3. Pipeline-level YAML configuration

    parameters:
      learning_rate: 0.01  # Lower precedence than step-level
  4. Default values in code - Lowest precedence

This hierarchy allows you to define base configurations at the pipeline level and override them for specific steps as needed.
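
For example, configuration passed at runtime in Python sits at the top of this hierarchy. A minimal sketch, assuming your ZenML version forwards extra keyword arguments of with_options to the pipeline configuration:

# Runtime configuration is applied on top of whatever config.yaml specifies
my_pipeline.with_options(
    config_path="config.yaml",
    enable_cache=False,  # overrides enable_cache from the YAML file for this run
)()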

Configuring Steps and Pipelines

Pipeline and Step Parameters

You can specify parameters for pipelines and steps, similar to how you'd define them in Python code:

# Pipeline parameters
parameters:
  dataset_name: "my_dataset"
  learning_rate: 0.01
  batch_size: 32
  epochs: 10

# Step parameters
steps:
  preprocessing:
    parameters:
      normalize: True
      fill_missing: "mean"
  
  train_model:
    parameters:
      learning_rate: 0.001  # Override the pipeline parameter
      optimizer: "adam"

These settings correspond directly to the parameters you'd normally pass to your pipeline and step functions.
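
The parameter names in the YAML file must match the argument names of the corresponding functions; the signatures below are illustrative only:

from zenml import step

@step
def preprocessing(normalize: bool = True, fill_missing: str = "mean") -> None:
    ...

@step
def train_model(learning_rate: float = 0.01, optimizer: str = "sgd") -> None:
    ...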

Enable Flags

These boolean flags control aspects of pipeline execution that were covered in the Advanced Features section:

# Pipeline-level flags
enable_artifact_metadata: True      # Whether to collect and store metadata for artifacts
enable_artifact_visualization: True  # Whether to generate visualizations for artifacts
enable_cache: True                  # Whether to use caching for steps
enable_step_logs: True              # Whether to capture and store step logs

# Step-specific flags
steps:
  preprocessing:
    enable_cache: False             # Disable caching for this step only
  train_model:
    enable_artifact_visualization: False  # Disable visualizations for this step
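
The same flags can also be set directly in code through the decorators; a brief sketch with illustrative step and pipeline names:

from zenml import pipeline, step

@step(enable_cache=False)  # equivalent to steps.preprocessing.enable_cache: False
def preprocessing() -> None:
    ...

@pipeline(enable_cache=True, enable_step_logs=True)
def my_pipeline() -> None:
    preprocessing()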

Run Name

Set a custom name for the pipeline run:

run_name: "training_run_cifar10_resnet50_lr0.001"
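
The run name can equally be passed from Python when launching the run; the name below is just an example:

my_pipeline.with_options(run_name="training_run_cifar10_resnet50_lr0.001")()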

Resource and Component Configuration

Docker Settings

Configure Docker container settings for pipeline execution:

settings:
  docker:
    # Packages to install via apt-get
    apt_packages: ["curl", "git", "libgomp1"]
    
    # Whether to copy files from current directory to the Docker image
    copy_files: True
    
    # Environment variables to set in the container
    environment:
      ZENML_LOGGING_VERBOSITY: DEBUG
      PYTHONUNBUFFERED: "1"
    
    # Parent image to use for building
    parent_image: "zenml-io/zenml-cuda:latest"
    
    # Additional Python packages to install
    requirements: ["torch==1.10.0", "transformers>=4.0.0", "pandas"]
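
The equivalent in code uses the DockerSettings class; a sketch, assuming a recent ZenML version and carrying over the values from the example above:

from zenml import pipeline
from zenml.config import DockerSettings

docker_settings = DockerSettings(
    apt_packages=["curl", "git", "libgomp1"],
    environment={"ZENML_LOGGING_VERBOSITY": "DEBUG", "PYTHONUNBUFFERED": "1"},
    parent_image="zenml-io/zenml-cuda:latest",
    requirements=["torch==1.10.0", "transformers>=4.0.0", "pandas"],
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline() -> None:
    ...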

Resource Settings

Configure compute resources for pipeline or step execution:

# Pipeline-level resource settings
settings:
  resources:
    cpu_count: 2
    gpu_count: 1
    memory: "4Gb"

# Step-specific resource settings
steps:
  train_model:
    settings:
      resources:
        cpu_count: 4
        gpu_count: 2
        memory: "16Gb"

Stack Component Settings

Configure specific stack components for steps:

steps:
  train_model:
    # Use specific named components
    experiment_tracker: "mlflow_tracker"
    step_operator: "vertex_gpu"
    
    # Component-specific settings
    settings:
      # MLflow specific configuration
      experiment_tracker.mlflow:
        experiment_name: "image_classification"
        nested: True
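
The same assignment can be made in the step decorator; a sketch, assuming the named components exist in your active stack:

from zenml import step

@step(
    experiment_tracker="mlflow_tracker",
    step_operator="vertex_gpu",
    settings={
        "experiment_tracker.mlflow": {
            "experiment_name": "image_classification",
            "nested": True,
        }
    },
)
def train_model() -> None:
    ...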

Working with Configuration Files

Autogenerating Template YAML Files

ZenML provides a command to generate a template configuration file:

zenml pipeline build-configuration my_pipeline > config.yaml

This generates a YAML file with all pipeline parameters, step parameters, and configuration options with their default values.
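
Depending on your ZenML version, the same template can also be produced from Python via the pipeline's write_run_configuration_template method; a brief sketch with a hypothetical import path:

from my_project.pipelines import my_pipeline  # hypothetical import

# Writes a YAML template listing all configurable options for this pipeline
my_pipeline.write_run_configuration_template(path="config.yaml")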

Environment Variables in Configuration

You can reference environment variables in your YAML configuration:

settings:
  docker:
    environment:
      # References an environment variable from the host system
      API_KEY: ${MY_API_KEY}
      DATABASE_URL: ${DB_CONNECTION_STRING}
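
The referenced variables must be set in the environment from which the pipeline is launched. One way to fail fast, sketched below with the variable names from the example:

import os

# Abort early if a variable referenced in the YAML file is missing
for var in ("MY_API_KEY", "DB_CONNECTION_STRING"):
    if var not in os.environ:
        raise RuntimeError(f"Environment variable {var} is not set")

my_pipeline.with_options(config_path="config.yaml")()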

Using Configuration Files for Different Environments

A common pattern is to maintain different configuration files for different environments:

├── configs/
│   ├── dev.yaml     # Development configuration
│   ├── staging.yaml # Staging configuration
│   └── prod.yaml    # Production configuration

Example development configuration:

# dev.yaml
enable_cache: False
enable_step_logs: True
parameters:
  dataset_size: "small"
settings:
  docker:
    parent_image: "zenml-io/zenml:latest"

Example production configuration:

# prod.yaml
enable_cache: True
enable_step_logs: False
parameters:
  dataset_size: "full"
settings:
  docker:
    parent_image: "zenml-io/zenml-cuda:latest"
  resources:
    cpu_count: 8
    memory: "16Gb"

You can then specify which configuration to use:

# For development
my_pipeline.with_options(config_path="configs/dev.yaml")()

# For production
my_pipeline.with_options(config_path="configs/prod.yaml")()

Advanced Configuration Options

Model Configuration

Link a pipeline to a ZenML Model:

model:
  name: "classification_model"
  description: "Image classifier trained on the CIFAR-10 dataset"
  tags: ["computer-vision", "classification", "pytorch"]
  
  # Specific model version
  version: "1.2.3"

Scheduling

Configure pipeline scheduling when using an orchestrator that supports it:

schedule:
  # Whether to run the pipeline for past dates if schedule is missed
  catchup: false
  
  # Cron expression for scheduling (daily at midnight)
  cron_expression: "0 0 * * *"
  
  # Time to start scheduling from
  start_time: "2023-06-01T00:00:00Z"
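
Equivalently, a schedule can be attached in code via ZenML's Schedule class; a sketch with the same values, assuming a recent ZenML version:

from datetime import datetime, timezone

from zenml.config.schedule import Schedule

schedule = Schedule(
    cron_expression="0 0 * * *",  # daily at midnight
    start_time=datetime(2023, 6, 1, tzinfo=timezone.utc),
    catchup=False,
)

my_pipeline.with_options(schedule=schedule)()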

Conclusion

YAML configuration in ZenML provides a powerful way to customize pipeline behavior without changing your code. By separating configuration from implementation, you can make your ML workflows more flexible, maintainable, and reproducible.

See also:

- Steps & Pipelines: core building blocks
- Advanced Features: advanced pipeline features