Advanced Features

Advanced features and capabilities of ZenML pipelines and steps

This guide covers the advanced features of ZenML pipelines and steps that let you build more sophisticated machine learning workflows.

Execution Control

Caching

Steps are automatically cached based on their code, inputs, and other factors. When a step runs, ZenML computes a cache key from these and checks whether a previous run with the same key exists. If one is found, ZenML reuses its outputs instead of re-executing the step.

You can control caching behavior at the step level:

@step(enable_cache=False)
def non_cached_step():
    pass

You can also configure caching at the pipeline level:

@pipeline(enable_cache=False)
def my_pipeline():
    ...

Or modify it after definition:

my_step.configure(enable_cache=False)
my_pipeline.configure(enable_cache=False)

For more information, see the ZenML documentation on caching behavior.

Running Individual Steps

You can run a single step directly, without defining a pipeline. This creates an unlisted pipeline run containing just that step. If you want to bypass ZenML completely, you can also call the underlying function directly.

You can make this the default behavior by setting the ZENML_RUN_SINGLE_STEPS_WITHOUT_STACK environment variable to True.

Asynchronous Pipeline Execution

By default, pipelines run synchronously, with terminal logs displaying as the pipeline builds and runs. You can change this behavior to run pipelines asynchronously (in the background):

Alternatively, you can configure this in a YAML config file:
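A sketch of the equivalent YAML (again, the settings key may need your orchestrator flavor as a suffix depending on version):

```yaml
settings:
  orchestrator:
    synchronous: false
```

You can pass such a file to your pipeline with `my_pipeline.with_options(config_path="config.yaml")`.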

You can also configure the orchestrator to always run asynchronously by setting synchronous=False in its configuration.

Step Execution Order

By default, ZenML determines step execution order based on data dependencies. When a step requires output from another step, it automatically creates a dependency.

You can explicitly control execution order with the after parameter:

This is particularly useful for steps with side effects (like data loading or model deployment) where the data dependency is not explicit.

Execution Modes

ZenML provides three execution modes that control how your orchestrator behaves when a step fails during pipeline execution. These modes are:

  • CONTINUE_ON_FAILURE: The orchestrator continues executing steps that don't depend on any of the failed steps.

  • STOP_ON_FAILURE: The orchestrator allows the running steps to complete, but prevents new steps from starting.

  • FAIL_FAST: The orchestrator stops the run and any running steps immediately when a failure occurs.

You can configure the execution mode of your pipeline in several ways:

As an example, consider a pipeline in which step 1 fans out into steps 2, 3, and 4 (which can run in parallel), steps 5, 6, and 7 depend on steps 2, 3, and 4 respectively, and step 8 depends on step 5.

If steps 2, 3, and 4 execute in parallel and step 2 fails:

  • With FAIL_FAST: Step 1 finishes → Steps 2, 3, and 4 start → Step 2 fails → Steps 3 and 4 are stopped → no further steps are launched

  • With STOP_ON_FAILURE: Step 1 finishes → Steps 2, 3, and 4 start → Step 2 fails, but Steps 3 and 4 run to completion → Steps 5, 6, and 7 are skipped

  • With CONTINUE_ON_FAILURE: Step 1 finishes → Steps 2, 3, and 4 start → Step 2 fails, Steps 3 and 4 complete → Step 5 is skipped (it depends on the failed Step 2), Steps 6 and 7 run normally → Step 8 is skipped as well

All three execution modes are currently only supported by the local, local_docker, and kubernetes orchestrator flavors. For any other orchestrator flavor, the default (and only available) behavior is CONTINUE_ON_FAILURE. If you would like to see any of the other orchestrators extended to support the other execution modes, reach out to us in Slack.

Data & Output Management

Type annotations

Your functions will work as ZenML steps even if you don't provide any type annotations for their inputs and outputs. However, adding type annotations to your step functions gives you lots of additional benefits:

  • Type validation of your step inputs: ZenML makes sure that your step functions receive an object of the correct type from the upstream steps in your pipeline.

  • Better serialization: Without type annotations, ZenML uses Cloudpickle to serialize your step outputs. With type annotations, ZenML can choose the materializer best suited for each output. If none of the built-in materializers fit, you can even write a custom materializer.

If you want to make sure you get all the benefits of type annotating your steps, you can set the environment variable ZENML_ENFORCE_TYPE_ANNOTATIONS to True. ZenML will then raise an exception in case one of the steps you're trying to run is missing a type annotation.

Tuple vs multiple outputs

It is impossible for ZenML to detect whether you want your step to have a single output artifact of type Tuple or multiple output artifacts just by looking at the type annotation.

We use the following convention to differentiate between the two: When the return statement is followed by a tuple literal (e.g. return 1, 2 or return (value_1, value_2)) we treat it as a step with multiple outputs. All other cases are treated as a step with a single output of type Tuple.

Step output names

By default, ZenML uses the output name output for single output steps and output_0, output_1, ... for steps with multiple outputs. These output names are used to display your outputs in the dashboard and fetch them after your pipeline is finished.

If you want to use custom output names for your steps, use the Annotated type annotation:

If you do not give your outputs custom names, the created artifacts will be named {pipeline_name}::{step_name}::output or {pipeline_name}::{step_name}::output_{i} in the dashboard. See the documentation on artifact versioning and configuration for more information.

Workflow Patterns

Pipeline Composition

You can compose pipelines from other pipelines to create modular, reusable workflows:

Pipeline composition allows you to build complex workflows from simpler, well-tested components.

Fan-out and Fan-in

The fan-out/fan-in pattern is a common pipeline architecture where a single step splits into multiple parallel operations (fan-out) and then consolidates the results back into a single step (fan-in). This pattern is particularly useful for parallel processing, distributed workloads, or when you need to process data through different transformations and then aggregate the results. For example, you might want to process different chunks of data in parallel and then aggregate the results:

The fan-out pattern allows for parallel processing and better resource utilization, while the fan-in pattern enables aggregation and consolidation of results. This is particularly useful for:

  • Parallel data processing

  • Distributed model training

  • Ensemble methods

  • Batch processing

  • Data validation across multiple sources

  • Hyperparameter tuning

Note that when implementing the fan-in step, you'll need to use the ZenML Client to query the results from previous parallel steps, as shown in the example above, and you can't pass in the result directly.

Dynamic Fan-out/Fan-in with Snapshots

For scenarios where you need to determine the number of parallel operations at runtime (e.g., based on database queries or dynamic data), you can use snapshots to create a more flexible fan-out/fan-in pattern. This approach allows you to trigger multiple pipeline runs dynamically and then aggregate their results.

This pattern enables dynamic scaling, true parallelism, and database-driven workflows, and each chunk gets its own fault tolerance and monitoring. When implementing it, pay attention to resource management and proper error handling.

Custom Step Invocation IDs

When you call a ZenML step as part of your pipeline, it gets assigned a unique invocation ID. You can use this ID to reference the invocation when defining the execution order of your pipeline steps, or to fetch information about it after the pipeline has finished running.

Named Pipeline Runs

The output logs of a pipeline run include the name of the run.

This name is automatically generated based on the current date and time. To change the name for a run, pass run_name as a parameter to the with_options() method:

Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the placeholders that ZenML will replace.

The substitutions for the custom placeholders like experiment_name can be set in:

  • @pipeline decorator, so they are effective for all steps in this pipeline

  • pipeline.with_options function, so they are effective for all steps in this pipeline run

Standard substitutions that are always available and consistent across all steps of the pipeline are:

  • {date}: current date, e.g. 2024_11_27

  • {time}: current time in UTC format, e.g. 11_07_09_326492

Error Handling & Reliability

Automatic Step Retries

For steps that may encounter transient failures (like network issues or resource limitations), you can configure automatic retries:

It's important to note that retries happen at the step level, not the pipeline level. This means that ZenML will only retry individual failed steps, not the entire pipeline.

With this configuration, if the step fails, ZenML will:

  1. Wait 10 seconds before the first retry

  2. Wait 20 seconds (10 × 2) before the second retry

  3. Wait 40 seconds (20 × 2) before the third retry

  4. Fail the pipeline if all retries are exhausted

This is particularly useful for steps that interact with external services or resources.

Monitoring & Notifications

Pipeline and Step Hooks

Hooks allow you to execute custom code at specific points in the pipeline or step lifecycle:

The following conventions apply to hooks:

  • the success hook takes no arguments

  • the failure hook optionally takes a single BaseException typed argument

You can also define hooks at the pipeline level to apply to all steps:

Step-level hooks take precedence over pipeline-level hooks. Hooks are particularly useful for:

  • Sending notifications when steps fail or succeed

  • Logging detailed information about runs

  • Triggering external workflows based on pipeline state

Accessing Step Context in Hooks

You can access detailed information about the current run using the step context:

Using Alerter in Hooks

You can use the Alerter stack component to send notifications when steps fail or succeed:

ZenML provides built-in alerter hooks for common scenarios:

Conclusion

These advanced features provide powerful capabilities for building sophisticated machine learning workflows in ZenML. By leveraging these features, you can create pipelines that are more robust, maintainable, and flexible.
