Managing scheduled pipelines

A step-by-step tutorial on how to create, update, and delete scheduled pipelines in ZenML


This tutorial demonstrates how to work with scheduled pipelines in ZenML through a practical example. We'll create a simple data processing pipeline that runs on a schedule, update its configuration, and finally clean up by deleting the schedule.

How Scheduling Works in ZenML

ZenML doesn't implement its own scheduler but acts as a wrapper around the scheduling capabilities of supported orchestrators like Vertex AI, Airflow, Kubeflow, and others. When you create a schedule, ZenML:

  1. Translates your schedule definition to the orchestrator's native format

  2. Registers the schedule with the orchestrator's scheduling system

  3. Records the schedule in the ZenML metadata store

The orchestrator then takes over responsibility for executing the pipeline according to the schedule.

For our full reference documentation on schedules, see the Schedule a Pipeline page.

Prerequisites

Before starting this tutorial, make sure you have:

  1. ZenML installed and configured

  2. A supported orchestrator (we'll use Vertex AI in this example)

  3. Basic understanding of ZenML pipelines and steps

Step 1: Create a Simple Pipeline

First, let's create a basic pipeline that we'll schedule. This pipeline will simulate a daily data processing task.

Step 2: Create a Schedule

Now, let's create a schedule for our pipeline. We'll set it to run daily at 9 AM.
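The following sketch uses ZenML's `Schedule` class from `zenml.config.schedule` together with `with_options(schedule=...)`. It assumes the Step 1 pipeline is importable as `daily_data_pipeline` from a module named `daily_pipeline`; both names are hypothetical. The cron expression `0 9 * * *` means "minute 0 of hour 9, every day". The `daily_cron` helper is an illustrative addition, not a ZenML API. ZenML is imported inside the function so that the helper works on its own.

```python
def daily_cron(hour: int, minute: int = 0) -> str:
    """Build a cron expression that fires once a day at hour:minute (hypothetical helper)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time of day: {hour}:{minute}")
    # Cron field order: minute, hour, day-of-month, month, day-of-week.
    return f"{minute} {hour} * * *"


def create_daily_schedule(schedule_name: str, hour: int = 9) -> None:
    """Attach a daily cron schedule to the pipeline and run it to register the schedule."""
    from zenml.config.schedule import Schedule

    from daily_pipeline import daily_data_pipeline  # hypothetical module holding the Step 1 pipeline

    schedule = Schedule(name=schedule_name, cron_expression=daily_cron(hour))
    # Calling the configured pipeline hands the schedule to ZenML and the orchestrator.
    daily_data_pipeline.with_options(schedule=schedule)()
```

For example, `create_daily_schedule("daily-data-processing-prod-v1")` would register a schedule that fires daily at 09:00 in the orchestrator's timezone.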

Running the pipeline creates the schedule in the ZenML metadata store as well as the scheduled run in the orchestrator.

Best Practice: Use Descriptive Schedule Names

When creating schedules, follow a consistent naming pattern to better organize them:

Include the frequency, purpose, environment, and version in your schedule names.
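One way to apply this convention consistently is to generate names instead of typing them by hand. The helper below and its `frequency-purpose-environment-vN` layout are a hypothetical example, not something ZenML enforces.

```python
import re


def make_schedule_name(frequency: str, purpose: str, environment: str, version: int) -> str:
    """Compose a name like 'daily-data-processing-prod-v1' (hypothetical naming convention)."""
    parts = [frequency, purpose, environment]
    # Lowercase each part and collapse any run of non-alphanumeric characters into one hyphen.
    slug = "-".join(re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-") for part in parts)
    return f"{slug}-v{version}"
```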

Step 3: Verify the Schedule

After creating a schedule, it's important to verify that it exists in both ZenML and the orchestrator. This verification helps ensure your pipeline will run as expected.

Step 3.1: Verify the Schedule in ZenML

Let's check if our schedule was created successfully using both Python and the CLI:
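In Python, the check might look like the sketch below. It uses `Client().list_schedules(name=...)` and reads the `name`, `cron_expression`, and `active` fields of each result. The function names are hypothetical.

```python
from typing import Optional


def format_schedule(name: str, cron_expression: Optional[str], active: bool) -> str:
    """Render one schedule as a single readable line (hypothetical helper)."""
    state = "active" if active else "inactive"
    return f"{name} [{state}] cron={cron_expression or '-'}"


def verify_zenml_schedule(schedule_name: str) -> bool:
    """Print ZenML schedules with the given name and report whether any exist."""
    # Imported here so format_schedule can be used without ZenML installed.
    from zenml.client import Client

    page = Client().list_schedules(name=schedule_name)
    for schedule in page.items:
        print(format_schedule(schedule.name, schedule.cron_expression, schedule.active))
    return page.total > 0
```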

To verify using the CLI, run `zenml pipeline schedule list`.

Here's an example of what the CLI output might look like:

(Screenshot: output of the schedules list CLI command)

Step 3.2: Verify the Schedule in the Orchestrator

To ensure the schedule was properly created in Vertex AI, we can verify it using the Google Cloud SDK:
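The sketch below performs the same check with the Vertex AI SDK for Python (`google-cloud-aiplatform`). It assumes a recent SDK version that provides `aiplatform.PipelineJobSchedule.list()`. It also assumes that the Vertex AI schedule's display name contains a recognizable fragment, such as the pipeline name. How ZenML names the Vertex schedule may differ between versions, so check the listing before relying on a fragment. The resource-name parser is a hypothetical helper.

```python
def schedule_id_from_resource_name(resource_name: str) -> str:
    """Extract ID from 'projects/P/locations/L/schedules/ID' (hypothetical helper)."""
    parts = resource_name.split("/")
    if len(parts) != 6 or parts[4] != "schedules":
        raise ValueError(f"not a Vertex AI schedule resource name: {resource_name}")
    return parts[5]


def list_vertex_schedules(project: str, location: str, name_fragment: str) -> list:
    """Print and return Vertex AI pipeline schedules whose display name contains name_fragment."""
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    matches = [
        schedule
        for schedule in aiplatform.PipelineJobSchedule.list()
        if name_fragment in schedule.display_name
    ]
    for schedule in matches:
        print(f"{schedule.display_name}: id={schedule_id_from_resource_name(schedule.resource_name)}")
    return matches
```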

Step 4: Update the Schedule

Sometimes we need to modify an existing schedule. Since ZenML doesn't support direct schedule updates, we'll need to delete the old schedule and create a new one. This is a two-step process:

  1. Delete the existing schedule (from both ZenML and the orchestrator)

  2. Create a new schedule with the updated configuration

Step 4.1: Delete the Existing Schedule

First, delete the schedule from ZenML:
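A minimal sketch using `Client().get_schedule()` and `Client().delete_schedule()`. The function name is hypothetical. As this tutorial notes, this call removes only ZenML's record of the schedule.

```python
def delete_zenml_schedule(schedule_name: str) -> None:
    """Delete a schedule from ZenML's records.

    The schedule registered in the orchestrator is not touched; remove it separately.
    """
    from zenml.client import Client

    client = Client()
    schedule = client.get_schedule(schedule_name)
    print(f"Deleting ZenML schedule {schedule.name} (id={schedule.id})")
    client.delete_schedule(schedule.id)
```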

Using the CLI, run `zenml pipeline schedule delete <SCHEDULE_NAME_OR_ID>`.

For Vertex AI, you need to delete the orchestrator schedule:
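The following is a sketch using the Vertex AI SDK for Python. It assumes `aiplatform.PipelineJobSchedule.list()` and the `delete()` method on the returned schedule objects are available in your SDK version. It also assumes that matching on a display-name fragment is safe in your project. The default `dry_run=True` only reports matches, so you can review them before deleting anything.

```python
def delete_vertex_schedules(
    project: str, location: str, name_fragment: str, dry_run: bool = True
) -> list[str]:
    """Delete Vertex AI pipeline schedules whose display name contains name_fragment.

    With dry_run=True (the default) nothing is deleted; matching resource names are returned.
    """
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    matched = []
    for schedule in aiplatform.PipelineJobSchedule.list():
        if name_fragment not in schedule.display_name:
            continue
        matched.append(schedule.resource_name)
        if not dry_run:
            schedule.delete()
    return matched
```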

Step 4.2: Create the Updated Schedule

Now, create a new schedule with the updated parameters:
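The sketch below registers the replacement schedule under a new version suffix, so that old and new schedules stay distinguishable. It assumes the pipeline is importable as `daily_data_pipeline` from a hypothetical `daily_pipeline` module. The `-vN` suffix convention and both function names are also hypothetical.

```python
import re


def bump_version(schedule_name: str) -> str:
    """Increment a trailing '-vN' suffix; a name without one is treated as v1 (hypothetical)."""
    match = re.search(r"-v(\d+)$", schedule_name)
    if match is None:
        return f"{schedule_name}-v2"
    return f"{schedule_name[:match.start()]}-v{int(match.group(1)) + 1}"


def recreate_schedule(old_name: str, new_cron: str) -> str:
    """Register a replacement schedule under a bumped name with a new cron expression."""
    from zenml.config.schedule import Schedule

    from daily_pipeline import daily_data_pipeline  # hypothetical module holding the pipeline

    new_name = bump_version(old_name)
    daily_data_pipeline.with_options(
        schedule=Schedule(name=new_name, cron_expression=new_cron)
    )()
    return new_name
```

For example, `recreate_schedule("daily-data-processing-prod-v1", "0 10 * * *")` would register `daily-data-processing-prod-v2` to run daily at 10:00 in the orchestrator's timezone.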

Or using a script:

Step 5: Monitor Schedule Execution

Let's check the execution history of our scheduled pipeline:
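This sketch lists recent runs triggered by a given schedule using `Client().list_pipeline_runs()` with the `schedule_id` filter, newest first. It then counts runs per status. The function names are hypothetical.

```python
from collections import Counter


def status_counts(statuses: list[str]) -> dict[str, int]:
    """Count runs per status, e.g. {'completed': 6, 'failed': 1}."""
    return dict(Counter(statuses))


def show_run_history(schedule_name: str, limit: int = 10) -> dict[str, int]:
    """Print the most recent runs of a schedule and return a per-status count."""
    from zenml.client import Client

    client = Client()
    schedule = client.get_schedule(schedule_name)
    runs = client.list_pipeline_runs(
        schedule_id=schedule.id, sort_by="desc:created", size=limit
    )
    for run in runs.items:
        print(f"{run.created:%Y-%m-%d %H:%M}  {run.status.value:<12} {run.name}")
    return status_counts([run.status.value for run in runs.items])
```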

Monitoring with Alerters

For critical pipelines, add alerting to notify you of failures:
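Below is a sketch of a custom failure hook that posts through the active stack's alerter via `Client().active_stack.alerter.post()`. The hook name, the message format, and the hard-coded pipeline name `daily_data_pipeline` are hypothetical.

```python
def format_failure_message(pipeline_name: str, exception: BaseException) -> str:
    """Build the alert text for a failed scheduled run (hypothetical format)."""
    return f"Scheduled pipeline '{pipeline_name}' failed: {type(exception).__name__}: {exception}"


def notify_alerter_on_failure(exception: BaseException) -> None:
    """Failure hook: post a message through the active stack's alerter, if one is registered."""
    from zenml.client import Client

    alerter = Client().active_stack.alerter
    if alerter is None:
        print("No alerter in the active stack; skipping notification.")
        return
    # "daily_data_pipeline" is a hypothetical pipeline name used only for the message text.
    alerter.post(format_failure_message("daily_data_pipeline", exception))
```

Attach the hook with `@pipeline(on_failure=notify_alerter_on_failure)`. If you don't need a custom message, ZenML also provides a ready-made `alerter_failure_hook` in `zenml.hooks`.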

This assumes you've registered an alerter (like Slack or Discord) in your active stack.

Step 6: Clean Up

When you're done with a scheduled pipeline, proper cleanup is essential to prevent unexpected executions. You must perform two separate deletion operations:

  1. Delete the schedule from ZenML's database

  2. Delete the schedule from the underlying orchestrator (Vertex AI in this example)

Step 6.1: Delete the Schedule from ZenML

First, let's delete the schedule from ZenML:
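When cleaning up, you may want to remove every version of a schedule at once, for example all `...-v1`, `...-v2` variants. This sketch assumes ZenML's `startswith:` filter operator on the `name` field of `list_schedules`. The function name is hypothetical. Like the Vertex AI example, it defaults to a dry run.

```python
def delete_zenml_schedules_with_prefix(prefix: str, dry_run: bool = True) -> list[str]:
    """Delete every ZenML schedule whose name starts with prefix and report what matched.

    With dry_run=True (the default) nothing is deleted; matching names are only returned.
    """
    from zenml.client import Client

    client = Client()
    page = client.list_schedules(name=f"startswith:{prefix}", size=100)
    names = [schedule.name for schedule in page.items]
    if not dry_run:
        for schedule in page.items:
            client.delete_schedule(schedule.id)
        # Re-query to confirm that ZenML no longer lists any matching schedule.
        remaining = client.list_schedules(name=f"startswith:{prefix}").total
        if remaining:
            print(f"Warning: {remaining} schedule(s) starting with {prefix!r} still exist")
    return names
```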

Step 6.2: Delete the Schedule from the Orchestrator (Required)

Here's how to delete the schedule from Vertex AI:

The procedure for deleting schedules varies by orchestrator. Always check your orchestrator's documentation for the correct deletion method.

Troubleshooting: Quick Fixes for Common Issues

Here are some practical fixes for issues you might encounter with your scheduled pipelines:

Issue: Timezone Confusion with Scheduled Runs

A common issue with scheduled pipelines is timezone confusion. Here's how ZenML handles timezone information:

  1. If you provide a timezone-aware datetime, ZenML will use it as is

  2. If you provide a datetime without timezone information, ZenML assumes it's in your local timezone and converts it to UTC for storage and communication with orchestrators

For cloud orchestrators like Vertex AI, Kubeflow, and Airflow, schedules typically run in the orchestrator's timezone, which is usually UTC. This can lead to confusion if you expect a schedule to run at 9 AM in your local timezone but it runs at 9 AM UTC instead.

To ensure your schedule runs at the expected time:
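The sketch below uses only the standard library. It converts the local wall-clock time you want into UTC, then builds the cron expression from the UTC time. The function name is hypothetical. If you pass a `start_time` to ZenML's `Schedule`, make it timezone-aware (for example with `zoneinfo.ZoneInfo`) so ZenML uses it as is.

```python
from datetime import date, datetime, time, timezone, tzinfo


def local_daily_cron_in_utc(hour: int, minute: int, local_tz: tzinfo, on_day: date) -> str:
    """Translate a local wall-clock time into a daily cron expression in UTC (hypothetical helper).

    The UTC offset is taken for on_day, so in zones with daylight saving time
    the result can differ between summer and winter.
    """
    local_dt = datetime.combine(on_day, time(hour, minute), tzinfo=local_tz)
    utc_dt = local_dt.astimezone(timezone.utc)
    return f"{utc_dt.minute} {utc_dt.hour} * * *"
```

For example, `local_daily_cron_in_utc(9, 0, ZoneInfo("Europe/Berlin"), date(2025, 1, 15))` returns `"0 8 * * *"`, while the same call for a date in July returns `"0 7 * * *"`. A fixed UTC cron expression does not follow daylight saving changes.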

Remember that cron expressions themselves carry no timezone information. They are interpreted in the timezone of the system executing them, which for cloud orchestrators is usually UTC.

Issue: Schedule Doesn't Run at the Expected Time

If your pipeline doesn't run when scheduled:
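A possible first diagnostic pass is to confirm that ZenML still lists the schedule as active and that its cron expression is structurally sane. `cron_problems` is a deliberately shallow, hypothetical check of field count and minute/hour ranges, not a full cron parser.

```python
def cron_problems(cron_expression: str) -> list[str]:
    """Return basic structural problems in a five-field cron expression (shallow check)."""
    fields = cron_expression.split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]
    problems = []
    minute, hour = fields[0], fields[1]
    # Only plain numbers are range-checked; lists, ranges, and steps are not validated here.
    if minute.isdigit() and not 0 <= int(minute) <= 59:
        problems.append(f"minute out of range: {minute}")
    if hour.isdigit() and not 0 <= int(hour) <= 23:
        problems.append(f"hour out of range: {hour}")
    return problems


def diagnose_schedule(schedule_name: str) -> None:
    """Print the schedule's state in ZenML and any obvious cron problems."""
    from zenml.client import Client

    schedule = Client().get_schedule(schedule_name)
    print(f"active={schedule.active} cron={schedule.cron_expression!r} start={schedule.start_time}")
    if schedule.cron_expression:
        for problem in cron_problems(schedule.cron_expression):
            print(f"cron issue: {problem}")
```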

For Vertex AI specifically, verify that your service account has the required permissions:

Issue: Orphaned Schedules in the Orchestrator

To clean up orphaned Vertex AI schedules:
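An orphan is a schedule that still exists in Vertex AI after ZenML's record of it is gone. The sketch below finds them by comparing two lists of names. It hypothetically assumes that each Vertex AI display name contains the matching ZenML schedule name, so adjust the matching rule to how your setup names schedules. You could collect the inputs with `[s.name for s in Client().list_schedules().items]` (first page only) and `[s.display_name for s in aiplatform.PipelineJobSchedule.list()]`. Each orphan can then be removed with the SDK schedule object's `delete()` method.

```python
def find_orphaned_schedules(
    zenml_schedule_names: list[str], vertex_display_names: list[str]
) -> list[str]:
    """Return Vertex AI display names that match no schedule ZenML still knows about.

    Assumes (hypothetically) that a Vertex AI display name contains its ZenML schedule name.
    """
    return [
        display_name
        for display_name in vertex_display_names
        if not any(name in display_name for name in zenml_schedule_names)
    ]
```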

Issue: Finding Failing Scheduled Runs

When scheduled runs fail silently:
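This sketch surfaces failures for one schedule. It lists recent runs with `Client().list_pipeline_runs(schedule_id=...)`, prints the failed ones, and returns the failure rate. The function names are hypothetical. The comparison uses the string value `"failed"` of ZenML's run status.

```python
def failure_rate(statuses: list[str]) -> float:
    """Fraction of runs whose status is 'failed' (0.0 for an empty list)."""
    if not statuses:
        return 0.0
    return sum(1 for status in statuses if status == "failed") / len(statuses)


def report_failed_runs(schedule_name: str, limit: int = 50) -> float:
    """Print failed runs among the most recent runs of a schedule and return the failure rate."""
    from zenml.client import Client

    client = Client()
    schedule = client.get_schedule(schedule_name)
    runs = client.list_pipeline_runs(
        schedule_id=schedule.id, sort_by="desc:created", size=limit
    )
    statuses = [run.status.value for run in runs.items]
    for run in runs.items:
        if run.status.value == "failed":
            print(f"FAILED {run.created:%Y-%m-%d %H:%M} {run.name} (id={run.id})")
    return failure_rate(statuses)
```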

Next Steps

Now that you understand the basics of managing scheduled pipelines, you can:

  1. Create more complex schedules with various cron expressions for different business needs

  2. Set up monitoring and alerting to be notified when scheduled runs fail

  3. Optimize resource allocation for your scheduled pipelines

  4. Implement data-dependent scheduling where pipelines trigger based on data availability

For more advanced schedule management and monitoring techniques, check out the ZenML documentation.
