Automatically retry steps

Automatically configure your steps to retry if they fail.

ZenML provides a built-in retry mechanism that allows you to configure automatic retries for your steps in case of failures. This can be useful when dealing with intermittent issues or transient errors. A common pattern when trying to run a step on GPU-backed hardware is that the provider will not have enough resources available, so you can set ZenML to handle the retries until the resources free up. You can configure three parameters for step retries:

  • max_retries: The maximum number of times the step should be retried in case of failure.

  • delay: The initial delay in seconds before the first retry attempt.

  • backoff: The factor by which the delay should be multiplied after each retry attempt.

Using the @step decorator:

You can specify the retry configuration directly in the definition of your step as follows:

from zenml.config.retry_config import StepRetryConfig

@step(
    retry=StepRetryConfig(
        max_retries=3, 
        delay=10, 
        backoff=2
    )
)
def my_step() -> None:
    raise Exception("This is a test exception")
steps:
  my_step:
    retry:
      max_retries: 3
      delay: 10
      backoff: 2

See Also:

Last updated

Was this helpful?