Trigger pipelines from external systems

A step-by-step tutorial on effectively triggering your ZenML pipelines from external systems

This tutorial demonstrates practical approaches to triggering ZenML pipelines from external systems. We'll explore two methods: ZenML Pro Snapshots and an open-source alternative built around a custom API, with notes on deploying the latter via containers or serverless platforms.

Introduction: The Pipeline Triggering Challenge

In development environments, you typically run your ZenML pipelines directly from Python code. However, in production, pipelines often need to be triggered by external systems:

  • Scheduled retraining of models based on a time interval

  • Batch inference when new data arrives

  • Event-driven ML workflows responding to data drift or performance degradation

  • Integration with CI/CD pipelines and other automation systems

  • Invocation from custom applications via API calls

Each scenario requires a reliable way to trigger the right version of your pipeline with the correct parameters, while maintaining security and operational standards.

For our full reference documentation on pipeline triggering, see the Snapshot docs page.

Prerequisites

Before starting this tutorial, make sure you have:

  1. ZenML installed and configured

  2. Basic understanding of ZenML pipelines and steps

  3. A simple pipeline to use for triggering examples

Creating a Sample Pipeline for External Triggering

First, let's create a basic pipeline that we'll use throughout this tutorial. This pipeline takes a dataset URL and model type as inputs, then performs a simple training operation:
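
Here is a minimal sketch of such a pipeline (the step names and bodies are illustrative, not a prescribed implementation):

```python
from zenml import pipeline, step


@step
def load_data(data_url: str) -> dict:
    """Fetch the training data from the given URL (stubbed out here)."""
    return {"source": data_url, "rows": 1000}


@step
def train_model(data: dict, model_type: str) -> str:
    """Train a model of the requested type (stubbed out here)."""
    return f"{model_type} trained on {data['rows']} rows"


@pipeline
def training_pipeline(data_url: str, model_type: str = "random_forest"):
    data = load_data(data_url=data_url)
    train_model(data=data, model_type=model_type)


if __name__ == "__main__":
    training_pipeline(
        data_url="https://example.com/data.csv", model_type="random_forest"
    )
```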

This pipeline is designed to be configurable with parameters that might change between runs:

  • data_url: Where to find the input data

  • model_type: Which algorithm to use

These parameters make it an ideal candidate for external triggering scenarios where we want to run the same pipeline with different configurations.

Method 1: Using Snapshots (ZenML Pro)

Important: Workspace API vs ZenML Pro API

Snapshots use your Workspace API (your individual workspace URL), not the ZenML Pro API (cloudapi.zenml.io). This distinction is crucial for authentication: you authenticate against the Workspace API with ZenML Pro credentials, not against the ZenML Pro management API. See ZenML Pro Personal Access Tokens and ZenML Pro Organization Service Accounts.

Snapshots are the most straightforward way to trigger pipelines externally in ZenML. They provide a pre-defined, parameterized configuration that can be executed via multiple interfaces.

Creating a Snapshot

First, we need to create a snapshot of our pipeline. This requires having a remote stack with at least a remote orchestrator, artifact store, and container registry.
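
A sketch of creating one from Python is shown below; the exact helper name varies across ZenML versions (older releases call the same concept a run template), so confirm against the Snapshot docs:

```python
from pipelines import training_pipeline

# Requires an active remote stack (remote orchestrator, artifact store,
# and container registry). The helper name is a best guess; older ZenML
# versions expose the equivalent `create_run_template(...)` instead.
snapshot = training_pipeline.create_snapshot(name="training-pipeline-snapshot")
```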

You can also pass a config file and specify a stack:
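
For example (a sketch; config.yaml and the stack name are placeholders):

```python
from pipelines import training_pipeline

# Select the target stack first, e.g. `zenml stack set my-remote-stack`,
# then attach a YAML run configuration when creating the snapshot.
snapshot = training_pipeline.with_options(
    config_path="config.yaml"
).create_snapshot(name="training-pipeline-snapshot")
```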

Running a snapshot

Once you have created a snapshot, there are multiple ways to run it: programmatically with the Python client, or via the REST API from external systems.

Using the Python Client:
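
A sketch of a programmatic trigger (argument names may differ slightly between ZenML versions; check the Snapshot docs for the exact signature):

```python
from zenml.client import Client
from zenml.config.pipeline_run_configuration import PipelineRunConfiguration

client = Client()

# Override step-level parameters for this particular run.
run_config = PipelineRunConfiguration(
    steps={"train_model": {"parameters": {"model_type": "xgboost"}}}
)

# Trigger the latest runnable snapshot of the pipeline by pipeline name.
client.trigger_pipeline("training_pipeline", run_configuration=run_config)
```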

Using the REST API:

For this you'll need the URL of your ZenML server. If you have a ZenML Pro account, you can find it in the dashboard:

(Screenshot: where to find the ZenML server URL)

You can also find the URL via the CLI by running:
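
For example, zenml status includes the URL of the server you are currently connected to:

```bash
zenml status
```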

The REST API is ideal for external system integration, allowing you to trigger pipelines from non-Python environments:
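
For example, a sketch of a trigger call with curl (the route follows the documented run-template trigger endpoint; verify the exact path and payload shape for your version):

```bash
curl -X POST "${ZENML_SERVER_URL}/api/v1/run_templates/${SNAPSHOT_ID}/runs" \
  -H "Authorization: Bearer ${ZENML_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "steps": {
          "train_model": {
            "parameters": {"model_type": "xgboost"}
          }
        }
      }'
```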

Note: When using the REST API, you need to specify parameters at the step level, not at the pipeline level. This matches how parameters are configured in the Python client.

Security Considerations for API Tokens

When using the REST API for external systems, proper token management is critical:
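
The recommended setup is a dedicated service account created via the ZenML CLI. The account name below is illustrative, and the token-exchange call should be verified against the service-account docs for your version:

```bash
# Create a dedicated service account; the API key is printed exactly once.
zenml service-account create external-trigger-bot

# Exchange the API key for a short-lived access token (verify the exact
# form fields against the service-account docs for your version).
curl -X POST "${ZENML_SERVER_URL}/api/v1/login" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "password=${ZENML_API_KEY}"
```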

Why service accounts are better for automation:

  • Long-lived: Service account API keys don't expire after an hour the way user-issued API tokens do

  • Dedicated: Not tied to individual team members who might leave

  • Secure: Can be granted minimal permissions needed for the task

  • Traceable: Clear audit trail of which system performed actions

Use this token in your API calls, and store it securely in your external system (e.g., as a GitHub Secret, AWS Secret, or environment variable). Read more about service accounts and tokens.

Method 2: Building a Custom Trigger API (Open Source)

If you're using the open-source version of ZenML or prefer a customized solution, you can create your own API wrapper around pipeline execution. This approach gives you full control over how pipelines are triggered and can be integrated into your existing infrastructure.

The custom trigger API solution consists of the following components:

  1. Pipeline Definition Module - Contains your pipeline code

  2. FastAPI Web Server - Provides HTTP endpoints for triggering pipelines

  3. Dynamic Pipeline Loading - Loads and executes pipelines on demand

  4. Authentication - Secures the API with API key authentication

  5. Containerization - Packages everything for deployment

Creating a Pipeline Module

First, create a module containing your pipeline definitions. This will be imported by the API service:
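
For this sketch, assume the sample pipeline from earlier lives in a pipelines.py module:

```python
# pipelines.py -- imported dynamically by the API service below
from zenml import pipeline, step


@step
def load_data(data_url: str) -> dict:
    return {"source": data_url, "rows": 1000}


@step
def train_model(data: dict, model_type: str) -> str:
    return f"{model_type} trained on {data['rows']} rows"


@pipeline
def training_pipeline(data_url: str, model_type: str = "random_forest"):
    data = load_data(data_url=data_url)
    train_model(data=data, model_type=model_type)
```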

Creating a Requirements File

Create a requirements.txt file with the necessary dependencies:
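
Something along these lines works for the sketch that follows; pin versions to match your ZenML deployment:

```txt
zenml
fastapi
uvicorn[standard]
```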

Creating a FastAPI Wrapper

Next, create the pipeline_api.py file with the FastAPI application:
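
The following is a minimal sketch, not a production-ready service: the endpoint path, the X-API-Key header, and the pipelines module name are illustrative choices, not ZenML conventions:

```python
# pipeline_api.py -- a minimal sketch of a trigger API
import importlib
import os
import threading

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import APIKeyHeader
from pydantic import BaseModel

# Simple shared-secret auth; load the key from a secrets manager in production.
API_KEY = os.environ.get("PIPELINE_API_KEY", "change-me")
api_key_header = APIKeyHeader(name="X-API-Key")

app = FastAPI(title="ZenML Pipeline Trigger API")


class TriggerRequest(BaseModel):
    pipeline_name: str
    parameters: dict = {}


def verify_api_key(api_key: str = Depends(api_key_header)) -> str:
    if api_key != API_KEY:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key"
        )
    return api_key


def run_pipeline(pipeline_name: str, parameters: dict) -> None:
    """Dynamically load a pipeline from the pipelines module and run it."""
    pipeline_fn = getattr(importlib.import_module("pipelines"), pipeline_name)
    pipeline_fn(**parameters)


@app.post("/trigger", status_code=status.HTTP_202_ACCEPTED)
def trigger(request: TriggerRequest, _: str = Depends(verify_api_key)) -> dict:
    if not hasattr(importlib.import_module("pipelines"), request.pipeline_name):
        raise HTTPException(status_code=404, detail="Unknown pipeline")
    # Fire-and-forget: run the pipeline in a background thread and return 202.
    threading.Thread(
        target=run_pipeline,
        args=(request.pipeline_name, request.parameters),
        daemon=True,
    ).start()
    return {"status": "accepted", "pipeline": request.pipeline_name}
```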

Containerizing Your API

Create a Dockerfile to containerize your API:
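
A sketch of such a Dockerfile; the build args are illustrative, while ZENML_STORE_URL and ZENML_STORE_API_KEY are the standard environment variables ZenML reads to connect to a server:

```dockerfile
FROM python:3.11-slim

# Use uv for faster dependency installation
RUN pip install --no-cache-dir uv

WORKDIR /app
COPY requirements.txt .
RUN uv pip install --system -r requirements.txt

# ZenML configuration via build arguments (illustrative; prefer injecting
# secrets at runtime with `docker run -e ...` over baking them into layers)
ARG ZENML_SERVER_URL=""
ARG ZENML_API_KEY=""
ENV ZENML_STORE_URL=${ZENML_SERVER_URL}
ENV ZENML_STORE_API_KEY=${ZENML_API_KEY}

# Add stack-specific requirements here if your stack needs extra integrations
COPY pipelines.py pipeline_api.py ./

EXPOSE 8000
CMD ["uvicorn", "pipeline_api:app", "--host", "0.0.0.0", "--port", "8000"]
```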

This Dockerfile includes several important features:

  1. Using the uv package installer for faster dependency installation

  2. Support for passing ZenML configuration via build arguments

  3. Automatic installation of stack-specific requirements

  4. Setting up environment variables for ZenML configuration

Running Your API Locally

To test the API server locally:
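
For example, assuming you are already connected to a ZenML server:

```bash
export PIPELINE_API_KEY="dev-only-key"
uvicorn pipeline_api:app --reload --port 8000
```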

Deploying Your API

Build and deploy your containerized API:
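
For example (the image name and placeholders are illustrative):

```bash
docker build -t zenml-pipeline-api .
docker run -d -p 8000:8000 \
  -e PIPELINE_API_KEY="${PIPELINE_API_KEY}" \
  -e ZENML_STORE_URL="<YOUR_ZENML_SERVER_URL>" \
  -e ZENML_STORE_API_KEY="<YOUR_SERVICE_ACCOUNT_API_KEY>" \
  zenml-pipeline-api
```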

For production deployment, you can:

  • Deploy to Kubernetes with a proper Ingress and TLS

  • Deploy to a cloud platform supporting Docker containers

  • Set up CI/CD for automated deployments

Triggering Pipelines via the API

You can trigger pipelines through the custom API with this endpoint:
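
Against the sketch above, a trigger call looks like this (the endpoint and field names belong to the illustrative FastAPI service, not to a ZenML-provided API):

```bash
curl -X POST "http://localhost:8000/trigger" \
  -H "X-API-Key: ${PIPELINE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
        "pipeline_name": "training_pipeline",
        "parameters": {
          "data_url": "https://example.com/data.csv",
          "model_type": "xgboost"
        }
      }'
```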

This method starts the pipeline in a background thread and returns immediately with a status code of 202 (Accepted), making it suitable for asynchronous execution from external systems.

Extending the API

You can extend this API to support additional features:

  1. Pipeline Discovery: Add endpoints to list available pipelines

  2. Run Status Tracking: Add endpoints to check the status of pipeline runs

  3. Webhook Notifications: Implement callbacks when pipelines complete

  4. Advanced Authentication: Implement JWT or OAuth2 for better security

  5. Pipeline Scheduling: Add endpoints to schedule pipeline runs

Handling Concurrent Pipeline Execution

The FastAPI example above uses threading, but the ZenML client keeps global state per process, so concurrent pipeline executions inside a single process can conflict and fail. For production environments that need to handle concurrent pipeline requests, run each trigger in its own isolated environment through a container orchestration platform.

For production deployments, consider using:

  1. Kubernetes Jobs: Deploy each pipeline execution as a separate Kubernetes Job for resource management and scaling

  2. Docker Containers: Use a container orchestration platform like Docker Swarm or ECS to run separate container instances

  3. Cloud Container Services: Leverage services like AWS ECS, Google Cloud Run, or Azure Container Instances

  4. Serverless Functions: Deploy pipeline triggers as serverless functions (AWS Lambda, Azure Functions, etc.)

These approaches ensure each pipeline runs in its own isolated environment, avoiding the concurrency limitations of ZenML's shared state architecture.

Security Considerations

When deploying this API in production:

  1. Use Strong API Keys: Generate secure, random API keys. The PIPELINE_API_KEY in the code example is a simple authentication token that protects your API endpoints. Do not use the default value in production.

  2. HTTPS/TLS: Always use HTTPS for production deployments

  3. Least Privilege: Use ZenML service accounts with minimal permissions

  4. Rate Limiting: Implement rate limiting to prevent abuse

  5. Secret Management: Use a secure secrets manager for API keys and credentials

  6. Logging & Monitoring: Implement proper logging for security audits

Best Practices & Troubleshooting

Tag Snapshots

You should tag your snapshots to make them easier to find and manage. This is currently only possible via the Python SDK:
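
A sketch of tagging at snapshot-creation time (the tags argument is an assumption; confirm it against the SDK reference for your version):

```python
from pipelines import training_pipeline

# Attach tags when creating the snapshot so it is easy to filter later.
snapshot = training_pipeline.create_snapshot(
    name="training-pipeline-snapshot",
    tags=["production", "weekly-retraining"],  # assumed argument name
)
```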

Parameter Stability Best Practices

When triggering pipelines externally, it's crucial to maintain parameter stability to prevent unexpected behavior:

  1. Document Parameter Changes: Keep a changelog of parameter modifications and their impact on pipeline behavior

  2. Version Control Parameters: Store parameter configurations in version-controlled files (e.g., YAML) alongside your pipeline code

  3. Validate Parameter Changes: Consider implementing validation checks to ensure new parameter values are compatible with existing pipeline steps

  4. Consider Upstream Impact: Before modifying step parameters, analyze how changes might affect:

    • Downstream steps that depend on the step's output

    • Cached artifacts that might become invalid

    • Other pipelines that might be using this step

  5. Use Parameter Templates: Create parameter templates for different scenarios (e.g., development, staging, production) to maintain consistency

Security Best Practices

  1. API Keys: Always use API keys or tokens for authentication

  2. Principle of Least Privilege: Grant only necessary permissions to service accounts

  3. Key Rotation: Rotate API keys regularly

  4. Secure Storage: Store credentials in secure locations (not in code)

  5. TLS: Use HTTPS for all API endpoints

Monitoring and Observability

Implement monitoring for your trigger mechanisms:
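
As a starting point, here is a minimal sketch using Python's standard logging around the trigger call; structured metrics and alerting would layer on top of this:

```python
import logging
import time

from zenml.client import Client

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline_triggers")


def trigger_with_monitoring(pipeline_name: str) -> None:
    """Trigger a pipeline and log outcome and latency for auditing."""
    start = time.monotonic()
    try:
        Client().trigger_pipeline(pipeline_name)
        logger.info(
            "Triggered %s in %.2fs", pipeline_name, time.monotonic() - start
        )
    except Exception:
        logger.exception("Failed to trigger %s", pipeline_name)
        raise  # let the caller's retry/alerting logic take over
```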

Conclusion: Choosing the Right Approach

The best approach for triggering pipelines depends on your specific needs:

  1. ZenML Pro Snapshots: Ideal for teams that need a complete, managed solution with UI support and centralized management

  2. Custom API: Best for teams that need full control over the triggering mechanism and want to embed it within their own infrastructure

Regardless of your approach, always prioritize:

  • Security (authentication and authorization)

  • Reliability (error handling and retries)

  • Observability (logging and monitoring)

Next Steps

Now that you understand how to trigger ZenML pipelines from external systems, consider exploring:

  1. Managing scheduled pipelines for time-based execution

  2. Implementing comprehensive CI/CD for your ML workflows

  3. Setting up monitoring and alerting for pipeline failures
