Pipeline Deployments
Deploy pipelines as HTTP services for real-time execution
Pipeline deployment allows you to run ZenML pipelines as long-running HTTP services for real-time execution, rather than traditional batch mode execution. This enables you to invoke pipelines through HTTP requests and receive immediate responses.
What is a Pipeline Deployment?
A pipeline deployment is a long-running HTTP server that wraps your pipeline for real-time, request-response interactions. While traditional (batch) pipeline execution (via orchestrators) is ideal for scheduled batch processing, data transformations, and offline training workflows, deployments are designed for scenarios where you need immediate responses - like serving predictions to a web app, processing user requests, or powering interactive AI agents. Deployments create persistent services that stay running and can handle multiple concurrent requests through HTTP endpoints.
When you deploy a pipeline, ZenML creates an HTTP server (called a Deployment) that can execute your pipeline multiple times in parallel by invoking HTTP endpoints.
Common Use Cases
Pipeline deployments are ideal for scenarios requiring real-time, on-demand execution of ML workflows:
Online ML Inference: Deploy trained models as HTTP services for real-time predictions, such as fraud detection in payment systems, recommendation engines for e-commerce, or image classification APIs. Pipeline deployments handle feature preprocessing, model loading, and prediction logic while managing concurrent requests efficiently.
LLM Agent Workflows: Build intelligent agents that combine multiple AI capabilities like intent analysis, retrieval-augmented generation (RAG), and response synthesis. These deployments can power chatbots, customer support systems, or document analysis services that require multi-step reasoning and context retrieval. See the Agent Outer Loop and Deploying Agents examples for practical implementations.
Real-time Data Processing: Process streaming events or user interactions that require immediate analysis and response, such as real-time analytics dashboards, anomaly detection systems, or personalization engines.
Multi-step Business Workflows: Orchestrate complex processes involving multiple AI/ML components, like document processing pipelines that combine OCR, entity extraction, sentiment analysis, and classification into a single deployable service.
Traditional Model Serving vs. Deployed Pipelines
If you're reaching for tools like Seldon or KServe, consider this: deployed pipelines give you all the core serving primitives, plus the power of a full application runtime.
Equivalent functionality: A pipeline handles the end-to-end inference path out of the box — request validation, feature pre-processing, model loading and inference, post-processing, and response shaping.
More flexible: Deployed pipelines are unopinionated, so you can layer in retrieval, guardrails, rules, A/B routing, canary logic, human-in-the-loop, or any custom orchestration. You're not constrained by a model-server template.
More customizable: The deployment is a real ASGI app. Tailor endpoints, authentication, authorization, rate limiting, structured logging, tracing, correlation IDs, or SSO/OIDC — all with first-class middleware and framework-level hooks.
More features: Serve single-page apps alongside the API. Ship admin/ops dashboards, experiment playgrounds, model cards, or customer-facing UIs from the very same deployment for tighter operational feedback loops.
This approach aligns better with production realities: inference is rarely "just call a model." There are policies, data dependencies, and integrations that need a programmable, evolvable surface. Deployed pipelines give you that without sacrificing the convenience of a managed deployer and a clean HTTP contract.
How Deployments Work
To deploy a pipeline or snapshot, a Deployer stack component needs to be in your active stack. You can use the default stack, which has a default local deployer that will deploy the pipeline directly on your local machine as a background process:
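```shell
zenml stack set default
```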
or set up a new stack with a deployer in it:
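A minimal sketch, assuming the Docker deployer flavor; the exact shorthand flag for the deployer component in `zenml stack register` is an assumption, so check `zenml stack register --help`:

```shell
# Register a deployer (the Docker flavor is just an example)
zenml deployer register my_deployer --flavor=docker

# Register and activate a stack that includes it (-D is assumed to be
# the shorthand flag for the deployer component)
zenml stack register my_stack -o default -a default -D my_deployer --set
```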
The Deployer stack component manages the deployment of pipelines as long-running HTTP servers. It integrates with a specific infrastructure back-end, such as Docker, AWS App Runner or GCP Cloud Run, to implement the following functionality:
Creating and managing persistent containerized services
Exposing HTTP endpoints for pipeline invocation
Managing the lifecycle of deployments (creation, updates, deletion)
Providing connection information and management commands
With a Deployer stack component in your active stack, a pipeline or snapshot can be deployed using the ZenML CLI:
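For example (a sketch; the exact command syntax may vary across ZenML versions):

```shell
zenml pipeline deploy my_module.my_pipeline --name my_deployment
```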
To deploy a pipeline using the ZenML SDK:
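A minimal sketch, assuming the pipeline object exposes a `deploy` method (the argument names are assumptions; consult the SDK reference for the exact signature):

```python
from zenml import pipeline, step

@step
def echo(message: str) -> str:
    return message.upper()

@pipeline
def my_pipeline(message: str = "hello") -> str:
    return echo(message)

# Deploy the pipeline as a long-running HTTP service using the deployer
# in the active stack. The method and argument names are assumptions.
my_pipeline.deploy(deployment_name="my_deployment")
```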
It is also possible to deploy snapshots programmatically:
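A loudly hypothetical sketch: the snapshot-related method names below are assumptions for illustration, not confirmed ZenML APIs; consult the SDK reference for the exact calls:

```python
# Create a snapshot of the pipeline and deploy it. Both method names
# below are assumptions made for illustration purposes only.
snapshot = my_pipeline.create_snapshot(name="my_snapshot")
snapshot.deploy(deployment_name="my_deployment")
```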
Once deployed, a pipeline can be invoked through the URL exposed by the deployment. Every invocation of the deployment will create a new pipeline run.
The ZenML CLI provides a convenient command to invoke a deployment:
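```shell
# The parameter-passing syntax is an assumption; see
# `zenml deployment invoke --help` for the exact flags
zenml deployment invoke my_deployment --message="hi there"
```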
which is the equivalent of the following HTTP request:
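(The `/invoke` endpoint and the `parameters` payload shape below are assumptions based on the default deployment app.)

```shell
curl -X POST http://<deployment-url>/invoke \
  -H "Content-Type: application/json" \
  -d '{"parameters": {"message": "hi there"}}'
```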
Deployment Lifecycle
Once a Deployment is created, it is tied to the specific Deployer stack component that was used to provision it and can be managed independently of the active stack as a standalone entity with its own lifecycle.
A Deployment contains the following key information:
name: Unique deployment name within the project
url: HTTP endpoint URL where the deployment can be accessed
status: Current deployment status. This can take one of the following DeploymentStatus enum values:
RUNNING: The deployment is running and accepting HTTP requests
ABSENT: The deployment is not currently provisioned
PENDING: The deployment is currently undergoing some operation (e.g. being created, updated or deleted)
ERROR: The deployment is in an error state. When in this state, more information about the error can be found in the ZenML logs, the Deployment metadata field or in the deployment logs.
UNKNOWN: The deployment is in an unknown state
metadata: Deployer-specific metadata describing the deployment's operational state
Managing Deployments
To list all the deployments managed in your project by all the available Deployers:
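```shell
# Command sketch; check `zenml deployment --help` for the exact syntax
zenml deployment list
```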
This shows a table with each deployment's details, such as its name, URL and status.
Detailed information about a specific deployment can be obtained with the following command:
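```shell
zenml deployment describe my_deployment
```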
This provides comprehensive deployment details, including its state and access information.
Deploying or redeploying a pipeline or snapshot on top of an existing deployment will update the deployment in place:
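For example, re-running the deploy command with the same deployment name updates the existing deployment (a sketch, with the same assumed flags as above):

```shell
zenml pipeline deploy my_module.my_pipeline --name my_deployment
```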
Deployment update checks and limitations
Updating a deployment owned by a different user requires additional confirmation. This is to avoid unintentionally updating someone else's deployment.
An existing deployment cannot be updated using a stack different from the one it was originally deployed with.
A pipeline snapshot can only have one deployment running at a time. You cannot deploy the same snapshot multiple times. You either have to delete the existing deployment and deploy the snapshot again or create a different snapshot.
Deprovisioning and deleting a deployment are two different operations. Deprovisioning a deployment keeps a record of it in the ZenML database so that it can be easily restored later if needed. Deleting a deployment completely removes it from the ZenML store:
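```shell
# Keep a record in the ZenML database so the deployment can be restored later
zenml deployment deprovision my_deployment

# Remove the deployment from the ZenML store entirely
zenml deployment delete my_deployment
```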
Deployer deletion
A Deployer stack component cannot be deleted as long as there is at least one deployment managed by it that is not in an ABSENT state. To delete a Deployer stack component, you need to first deprovision or delete all the deployments managed by it. If some deployments are stuck in an ERROR state, you can use the --force flag to delete them without the need to deprovision them first, but be aware that this may leave some infrastructure resources orphaned.
The server logs of a deployment can be accessed with the following command:
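```shell
zenml deployment logs my_deployment
```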
Deployable Pipeline Requirements
While any pipeline can technically be deployed, following these guidelines ensures practical usability:
Pipeline Input Parameters
Pipelines should accept explicit parameters to enable dynamic invocation:
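A minimal sketch of such a pipeline, consistent with the process_weather step referenced below (the weather logic itself is just a placeholder):

```python
from zenml import pipeline, step

@step
def process_weather(city: str) -> str:
    # Placeholder logic; a real step might call a weather API or a model.
    return f"It is sunny in {city}."

@pipeline
def weather_pipeline(city: str = "London") -> str:
    return process_weather(city)
```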
When deployed, the example pipeline above can be invoked with a CLI command like the following:
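```shell
# The deployment name and the parameter flag syntax are assumptions
zenml deployment invoke weather_pipeline --city="Berlin"
```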
or with an HTTP request like the following:
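(Again, the `/invoke` endpoint and the `parameters` payload key are the same assumptions as above.)

```shell
curl -X POST http://<deployment-url>/invoke \
  -H "Content-Type: application/json" \
  -d '{"parameters": {"city": "Berlin"}}'
```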
Pipeline input parameters behave differently when pipelines are deployed than when they are run as a batch job. When running a parameterized pipeline, its input parameters are evaluated before the pipeline run even starts and can be used to configure the structure of the pipeline DAG. When invoking a deployment, the input parameters do not have an effect on the pipeline DAG structure, so a pipeline like the following will not work as expected:
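A sketch of such a pipeline, reusing process_weather from the earlier example and adding a hypothetical summarize step; the if branch changes the DAG structure itself, so it is fixed when the deployment is created and cannot be toggled per invocation:

```python
from zenml import pipeline, step

@step
def summarize(report: str) -> str:
    # Hypothetical extra step
    return report[:50]

@pipeline
def dynamic_pipeline(city: str = "London", detailed: bool = False) -> str:
    report = process_weather(city)  # step from the earlier sketch
    # This branch shapes the DAG and is evaluated once, at deployment time.
    # Sending detailed=True in an HTTP request will NOT add the summarize
    # step to the already-deployed DAG.
    if detailed:
        report = summarize(report)
    return report
```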
Pipeline Outputs
Pipelines should return meaningful values for useful HTTP responses:
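For example, naming the step's output artifact with Annotated, which ZenML uses for artifact naming (the weather logic is still a placeholder):

```python
from typing import Annotated

from zenml import pipeline, step

@step
def process_weather(city: str) -> Annotated[str, "weather_report"]:
    return f"It is sunny in {city}."

@pipeline
def weather_pipeline(city: str = "London") -> str:
    return process_weather(city)
```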
Invoking a deployment of this pipeline will return the response below. Note how the outputs field contains the value returned by the process_weather step and the name of the output artifact is used as the key.
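(A minimal sketch of the relevant part of the response; any surrounding metadata fields are omitted.)

```json
{
  "outputs": {
    "weather_report": "It is sunny in Berlin."
  }
}
```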
Deployment Authentication
A rudimentary form of HTTP Basic authentication can be enabled for deployments by configuring one of two deployer configuration options:
generate_auth_key: set to True to automatically generate a shared secret key for the deployment. This is not set by default.
auth_key: configure the shared secret key manually.
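For example (a sketch: passing the option through the pipeline's settings under a "deployer" key is an assumption):

```python
from zenml import pipeline

# The "deployer" settings key is an assumption made for illustration.
@pipeline(settings={"deployer": {"generate_auth_key": True}})
def weather_pipeline(city: str = "London") -> str:
    return process_weather(city)

weather_pipeline.deploy(deployment_name="my_deployment")
```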
Deploying the above pipeline automatically generates and returns a key that will be required in the Authorization header of HTTP requests made to the deployment:
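(The exact authorization scheme below is an assumption; use whatever scheme the deploy command reports.)

```shell
curl -X POST http://<deployment-url>/invoke \
  -H "Authorization: Bearer <auth-key>" \
  -H "Content-Type: application/json" \
  -d '{"parameters": {"city": "Berlin"}}'
```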
Deployment Initialization, Cleanup and State
It often happens that the HTTP requests made to the same deployment share some type of initialization or cleanup, or need to share the same global state. For example:
a machine learning model needs to be loaded in memory, initialized and then shared between all the HTTP requests made to the deployment in order to be used by the deployed pipeline to make predictions
a database client must be initialized and shared across all the HTTP requests made to the deployment in order to read and write data
To achieve this, it is possible to configure custom initialization and cleanup hooks for the pipeline being deployed:
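A minimal sketch, assuming the hooks are passed to the pipeline decorator as on_init / on_cleanup callables taking no arguments (check the SDK reference for the exact signatures); MyModel is a hypothetical stand-in for a real model class:

```python
from zenml import get_step_context, pipeline, step

class MyModel:
    """Hypothetical stand-in for a real model class."""

    @classmethod
    def load(cls, path: str) -> "MyModel":
        return cls()

    def predict(self, features: list[float]) -> float:
        return sum(features)  # placeholder logic

def load_model() -> MyModel:
    # Runs once, when the deployment starts; the return value becomes
    # the shared pipeline state.
    return MyModel.load("model.pkl")

def release_resources() -> None:
    # Runs once, when the deployment stops.
    print("Releasing resources")

@step
def predict(features: list[float]) -> float:
    # The shared state is available to steps via the step context.
    model = get_step_context().pipeline_state
    return model.predict(features)

@pipeline(on_init=load_model, on_cleanup=release_resources)
def inference_pipeline(features: list[float]) -> float:
    return predict(features)
```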
The following happens when the pipeline is deployed and then later invoked:
The on_init hook is executed only once, when the deployment is started
The value returned by the on_init hook is stored in memory in the deployment and can be accessed by pipeline steps using the pipeline_state property of the step context
The on_cleanup hook is executed only once, when the deployment is stopped
This mechanism can be used to initialize and share global state between all the HTTP requests made to the deployment or to execute long-running initialization or cleanup operations when the deployment is started or stopped rather than on each HTTP request.
Deployment Configuration
The deployer settings cover the pipeline deployment process and the specific back-end infrastructure used to provision and manage the resources that run the deployment servers. Independently of that, DeploymentSettings can be used to fully customize the deployment ASGI application itself, including:
HTTP endpoints
middleware
secure headers
CORS settings
mounting and serving static files to support deploying single-page applications alongside the pipeline
for more advanced cases, even the ASGI framework (e.g. FastAPI, Django, Flask, Falcon, Quart, BlackSheep, etc.) and its configuration can be customized
Example:
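A sketch only: the import path and the fields shown below are assumptions made for illustration; the deployment settings guide referenced below documents the actual options:

```python
from zenml import pipeline
from zenml.config import DeploymentSettings  # import path is an assumption

# The CORS field name is an illustrative assumption, not confirmed API.
deployment_settings = DeploymentSettings(
    cors={"allow_origins": ["https://app.example.com"]},
)

# The "deployment" settings key is likewise an assumption.
@pipeline(settings={"deployment": deployment_settings})
def weather_pipeline(city: str = "London") -> str:
    return process_weather(city)
```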
For more detailed information on deployment options, see the deployment settings guide.
Best Practices
Design for Parameters: Structure your pipelines to accept meaningful parameters that control behavior
Provide Default Values: Ensure all parameters have sensible defaults
Return Useful Data: Design pipeline outputs to provide meaningful responses
Use Type Annotations: Leverage Pydantic models for complex parameter types
Use Global Initialization and State: Use the on_init and on_cleanup hooks along with the pipeline_state step context property to initialize and share global state between all the HTTP requests made to the deployment. Also use these hooks to execute long-running initialization or cleanup operations when the deployment is started or stopped rather than on each HTTP request.
Handle Errors Gracefully: Implement proper error handling in your steps
Test Locally First: Validate your deployable pipeline locally before deploying to production
Conclusion
Pipeline deployment transforms ZenML pipelines from batch processing workflows into real-time services. By following the guidelines for deployable pipelines and understanding the deployment lifecycle, you can create robust, scalable ML services that integrate seamlessly with web applications and real-time systems.
See also:
Steps & Pipelines - Core building blocks
Deployer Stack Component - The stack component that manages the deployment of pipelines as long-running HTTP servers