Deployment Settings
Customize the pipeline deployment ASGI application with DeploymentSettings.
Deployment servers and ASGI apps
ZenML pipeline deployments run an ASGI application under a production-grade uvicorn server. This makes your pipelines callable over HTTP for online workloads like real-time ML inference, LLM agents/workflows, and even full web apps co-located with pipelines.
At runtime, three core components work together:
the ASGI application: the HTTP surface that exposes endpoints (health, invoke, metrics, docs) and any custom routes or middleware you configure. This is powered by an ASGI framework such as FastAPI, Starlette, or Django.
the ASGI application factory (aka the Deployment App Runner): this component is responsible for constructing the ASGI application piece by piece based on the instructions provided by users via runtime configuration.
the Deployment Service: the component responsible for the business logic that backs the pipeline deployment and its invocation lifecycle.
Both the Deployment App Runner and the Deployment Service are customizable at runtime, through the DeploymentSettings configuration mechanism. They can also be extended via inheritance to support different ASGI frameworks or to tweak existing functionality.
The DeploymentSettings class lets you shape both server behavior and the ASGI app composition without changing framework code. Typical reasons to customize include:
Tight security posture: CORS controls, strict headers, authentication, API surface minimization.
Observability: request/response logging, tracing, metrics, correlation identifiers.
Enterprise integration: policy gateways, SSO/OIDC/OAuth, audit logging, routing and network architecture constraints.
Product UX: single-page application (SPA) static files served alongside deployment APIs or custom docs paths.
Performance/SRE: thread pool sizing, uvicorn worker settings, log levels, max request sizes and platform-specific fine-tuning.
All DeploymentSettings are pipeline-level settings. They apply to the deployment that serves the pipeline as a whole. They are not available at step-level.
Configuration overview
You can configure DeploymentSettings in Python or via YAML, the same way as other settings classes. The settings can be attached via the pipeline decorator or through with_options, and are only valid at the pipeline level.
Python configuration
Use the DeploymentSettings class to configure the deployment settings for your pipeline in code.
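A minimal sketch is shown below; the import path and the "deployment" settings key are assumptions that may differ across ZenML versions:

```python
from zenml import pipeline
from zenml.config import DeploymentSettings  # import path may differ across versions

deployment_settings = DeploymentSettings(
    app_title="Sentiment Inference API",
    app_version="1.0.0",
    thread_pool_size=16,
)

# Attach the settings at the pipeline level, either on the decorator ...
@pipeline(settings={"deployment": deployment_settings})
def inference_pipeline():
    ...

# ... or later via with_options
configured_pipeline = inference_pipeline.with_options(
    settings={"deployment": deployment_settings}
)
```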
YAML configuration
Define settings in a YAML configuration file for better separation of code and configuration:
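(Assuming the same "deployment" settings key and field names as in the Python example above.)

```yaml
settings:
  deployment:
    app_title: "Sentiment Inference API"
    app_version: "1.0.0"
    thread_pool_size: 16
```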
Check out this page for more information on the hierarchy and precedence of the various ways in which you can supply the settings.
Basic customization options
DeploymentSettings expose the following basic customization options. The sections below provide short examples and guidance.
application metadata and paths
built-in endpoints and middleware toggles
static files (SPAs) and dashboards
CORS
secure headers
startup and shutdown hooks
uvicorn server options, logging level, and thread pool size
Application metadata
You can set app_title, app_description, and app_version to be reflected in the ASGI application's metadata:
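(Field names as listed above; the import path may differ across ZenML versions.)

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    app_title="Churn Prediction Service",
    app_description="Real-time churn scoring for the customer success team.",
    app_version="2.3.1",
)
```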
Default URL paths, endpoints and middleware
The ASGI application exposes the following built-in endpoints by default:
documentation endpoints:
/docs - The OpenAPI documentation UI generated based on the endpoints and their signatures.
/redoc - The ReDoc documentation UI generated based on the endpoints and their signatures.
REST API endpoints:
/invoke - The main pipeline invocation endpoint for synchronous inference.
/health - The health check endpoint.
/info - The info endpoint providing extensive information about the deployment and its service.
/metrics - Simple metrics endpoint.
dashboard endpoints - present only if the accompanying UI is enabled:
/, /index.html, /static - Endpoints for serving the dashboard files from the dashboard_files_path directory.
The ASGI application includes the following built-in middleware by default:
secure headers middleware: for setting security headers.
CORS middleware: for handling CORS requests.
You can include or exclude these default endpoints and middleware either globally or individually by setting the include_default_endpoints and include_default_middleware settings. It is also possible to remap the built-in endpoint URL paths.
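A sketch of such a configuration is shown below; the per-endpoint selection values and the path-remapping field names are assumptions, so check the DeploymentSettings reference for the exact schema:

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    # keep only a subset of the built-in endpoints and middleware
    # (hypothetical selection syntax; the settings also accept a plain bool)
    include_default_endpoints=["docs", "invoke", "health"],
    include_default_middleware=["cors"],
    # remap the built-in URL paths (hypothetical field names)
    docs_url_path="/pipeline/documentation",
    invoke_url_path="/pipeline/api/invoke",
    health_url_path="/pipeline/api/healthz",
)
```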
With the above settings, the ASGI application will only expose the following endpoints and middleware:
/pipeline/documentation - The API documentation (OpenAPI schema)
/pipeline/api/invoke - The REST API pipeline invocation endpoint
/pipeline/api/healthz - The REST API health check endpoint
CORS middleware: for handling CORS requests
Static files (single-page applications)
Deployed pipelines can serve full single-page applications (React/Vue/Svelte) from the same origin as your inference API. This eliminates CORS/auth/routing friction and lets you ship user-facing UI components alongside your endpoints, such as:
operator dashboards
governance portals
experiment browsers
feature explorers
custom data labeling interfaces
model cards
observability dashboards
customer-facing playgrounds
Co-locating UI and API streamlines delivery (one image, one URL, one CI/CD), improves latency, and keeps telemetry and auth consistent.
To enable this, point dashboard_files_path to a directory containing an index.html and any static assets. The path must be relative to the source root:
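(The ui/dist directory below is illustrative; the import path may differ across ZenML versions.)

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    # resolved relative to the source root of your project
    dashboard_files_path="ui/dist",
)
```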
The ZenML Python package ships with a rudimentary playground dashboard: a simple UI for sending pipeline invocations and viewing the pipeline's responses.
Jinja2 templates
You can use a Jinja2 template to dynamically generate the index.html file that hosts the single-page application. This is useful if you want to generate the dashboard files based on the pipeline, step or stack configuration. A service_info variable containing the service information (such as the service name, version, and description) is passed to the template; it has the same structure as the zenml.deployers.server.models.ServiceInfo model.
Example:
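(A minimal index.html sketch; the service_info attributes used here, name, version and description, follow the ServiceInfo model.)

```html
<!DOCTYPE html>
<html>
  <head>
    <title>{{ service_info.name }} (v{{ service_info.version }})</title>
  </head>
  <body>
    <h1>{{ service_info.name }}</h1>
    <p>{{ service_info.description }}</p>
  </body>
</html>
```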
CORS
Fine-tune cross-origin access:
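(The CORS field names below are assumptions; check the DeploymentSettings reference for the exact names.)

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    cors_allow_origins=["https://app.example.com"],
    cors_allow_methods=["GET", "POST"],
    cors_allow_headers=["Authorization", "Content-Type"],
    cors_allow_credentials=True,
)
```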
Secure headers
Harden responses with strict headers. Each field accepts either a boolean or a string. Using True selects a safe default, False disables the header, and custom strings allow fully custom policies:
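(The per-header field names below are assumptions; check the DeploymentSettings reference for the exact names.)

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    secure_headers_hsts=True,                 # keep the safe default
    secure_headers_xfo="DENY",                # custom policy string
    secure_headers_csp="default-src 'self'",  # custom policy string
    secure_headers_server=False,              # omit the header entirely
)
```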
Set any field to False to omit that header. Set to a string for a custom value. The defaults are strong, production-safe policies.
Startup and shutdown hooks
Lifecycle startup and shutdown hooks are called as part of the ASGI application's lifespan. This is an alternative to the on_init and on_cleanup hooks that can be configured at pipeline level.
Common use-cases:
Model inference
load models/tokenizers and warm caches (JIT/ONNX/TensorRT, HF, sklearn)
hydrate feature stores, connect to vector DBs (FAISS, Milvus, PGVector)
initialize GPU memory pools and thread/process pools
set global config, download artifacts from registry or object store
prefetch embeddings, label maps, lookup tables
create connection pools for databases, Redis, Kafka, SQS, Pub/Sub
LLM agent workflows
initialize LLM client(s), tool registry, and router/policy engine
build or load RAG indexes; warm retrieval caches and prompts
configure rate limiting, concurrency guards, circuit breakers
load guardrails (PII filters, toxicity, jailbreak detection)
configure tracing/observability for token usage and tool calls
Shutdown
flush metrics/traces/logs, close pools/clients, persist state/caches
graceful draining: wait for in-flight requests before teardown
Hooks can be provided as:
A Python callable object
A source path string to be loaded dynamically (e.g. my_project.runtime.hooks.on_startup)
The callable must accept an app_runner argument of type BaseDeploymentAppRunner and any additional keyword arguments. The app_runner argument is the application factory that is responsible for building the ASGI application. You can use it to access information such as:
the ASGI application instance that is being built
the deployment service instance that is being deployed
the DeploymentResponse object itself, which also contains details about the snapshot, pipeline, etc.
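A sketch of a hook pair is shown below; the startup_hook/shutdown_hook field names are assumptions, and the hook bodies are illustrative:

```python
from typing import Any

from zenml.config import DeploymentSettings


def on_startup(app_runner, **kwargs: Any) -> None:
    # app_runner is a BaseDeploymentAppRunner; typical work here includes
    # loading models, warming caches and opening connection pools
    print("Deployment starting up")


def on_shutdown(app_runner, **kwargs: Any) -> None:
    # flush metrics/traces, close clients, persist caches
    print("Deployment shutting down")


settings = DeploymentSettings(
    # hypothetical field names; hooks can also be given as source path strings
    startup_hook=on_startup,
    shutdown_hook=on_shutdown,
)
```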
YAML using source strings:
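(Field names assumed as in the Python sketch above.)

```yaml
settings:
  deployment:
    startup_hook: my_project.runtime.hooks.on_startup
    shutdown_hook: my_project.runtime.hooks.on_shutdown
```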
Uvicorn and threading
Tune server runtime parameters for performance and topology:
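(A sketch using the fields listed below; the import path may differ across ZenML versions.)

```python
from zenml.config import DeploymentSettings

settings = DeploymentSettings(
    thread_pool_size=32,
    uvicorn_host="0.0.0.0",
    uvicorn_port=8000,
    uvicorn_workers=2,
    log_level="info",
    uvicorn_kwargs={"timeout_keep_alive": 30},
)
```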
The following settings are available for tuning the uvicorn server:
thread_pool_size: the size of the thread pool for CPU-bound work offload.
uvicorn_host: the host to bind the uvicorn server to.
uvicorn_port: the port to bind the uvicorn server to.
uvicorn_workers: the number of workers to use for the uvicorn server.
log_level: the log level to use for the uvicorn server.
uvicorn_reload: whether to enable auto-reload for the uvicorn server. This is useful when using the local Deployer stack component to speed up local development by automatically restarting the server when code changes are detected. NOTE: the uvicorn_reload setting has no effect on changes in the pipeline configuration, step configuration or stack configuration.
uvicorn_kwargs: a dictionary of keyword arguments to pass to the uvicorn server.
Advanced customization options
When the built-in ASGI application, endpoints and middleware are not enough, you can take your deployment customization further by providing your own implementations of endpoints, middleware and other ASGI application extensions. ZenML DeploymentSettings provides a flexible and extensible mechanism to inject your own custom code into the ASGI application at runtime:
custom endpoints - to expose your own HTTP endpoints.
custom middleware - to insert your own ASGI middleware.
free-form ASGI application building extensions - to take full control of the ASGI application and its lifecycle for truly advanced use-cases when endpoints and middleware are not enough.
Custom endpoints
In production, custom endpoints are often required alongside the main pipeline invoke route. Common use-cases include:
Online inference controls
model (re)load, warm-up, and cache priming
dynamic model/version switching and traffic shaping (A/B, canary)
async/batch prediction submission and job-status polling
feature store materialization/backfills and online/offline sync triggers
Enterprise integration
authentication bootstrap (API key issuance/rotation), JWKS rotation
OIDC/OAuth device-code flows and SSO callback handlers
external system webhooks (CRM, billing, ticketing, audit sink)
Observability and operations
detailed health/readiness endpoints (subsystems, dependencies)
metrics/traces/log shipping toggles; log level switch (INFO/DEBUG)
maintenance-mode enable/disable and graceful drain controls
LLM agent serving
tool registry CRUD, tool execution sandboxes, guardrail toggles
RAG index CRUD (upsert documents, rebuild embeddings, vacuum/compact)
prompt template catalogs and runtime overrides
session memory inspection/reset, conversation export/import
Governance and data management
payload redaction policy updates and capture sampling controls
schema/contract discovery (sample payloads, test vectors)
tenant provisioning, quotas/limits, and per-tenant configuration
You can configure custom_endpoints in DeploymentSettings to expose your own HTTP endpoints.
Endpoints support multiple definition modes (see code examples below):
1. Direct callable - a simple function that takes in request parameters and returns a response. Framework-specific arguments such as FastAPI's Request, Response and dependency injection patterns are supported.
2. Builder class - a callable class with a __call__ method that is the actual endpoint callable described at 1). The builder class constructor is called by the ASGI application factory and can be leveraged to execute any global initialization logic before the endpoint is called.
3. Builder function - a function that returns the actual endpoint callable described at 1). Similar to the builder class.
4. Native framework-specific object (native=True). This can vary from ASGI framework to framework.
Definitions can be provided as Python objects or as loadable source path strings.
The builder class and builder function must accept an app_runner argument of type BaseDeploymentAppRunner. This is the application factory that is responsible for building the ASGI application. You can use it to access information such as:
the ASGI application instance that is being built
the deployment service instance that is being deployed
the DeploymentResponse object itself, which also contains details about the snapshot, pipeline, etc.
The final endpoint callable can take any input arguments and return any outputs, as long as they are JSON-serializable or Pydantic models. The application factory will handle converting these into the appropriate schema for the ASGI application.
You can also use framework-specific request/response types (e.g. FastAPI Request, Response) or dependency injection patterns for your endpoint callable if needed. However, this will limit the portability of your endpoint to other frameworks.
The following code examples demonstrate the different definition modes for custom endpoints:
a custom detailed health check endpoint implemented as a direct callable
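For example, a minimal sketch in which the downstream checks are illustrative stubs:

```python
from typing import Any, Dict


def detailed_health() -> Dict[str, Any]:
    """Custom health endpoint that also reports on downstream dependencies."""
    checks = {
        "model_loaded": True,   # replace with a real check
        "vector_db": "ok",      # replace with a real check
    }
    return {"status": "ok", "checks": checks}


# Reference this callable (or its source path string) from
# DeploymentSettings(custom_endpoints=...); the exact registration schema
# is documented in the DeploymentSettings reference.
```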
a custom ML model inference endpoint, implemented as a builder function. Note how the builder function loads the model only once at runtime, and then reuses it for all subsequent requests.
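A sketch under those assumptions; the model path and joblib-based loading are illustrative:

```python
from typing import Any, Callable, Dict, List

import joblib  # illustrative; use whatever your model framework requires


def build_predict_endpoint(app_runner, **kwargs: Any) -> Callable:
    """Builder function: called once by the ASGI application factory
    (app_runner is a BaseDeploymentAppRunner), so the model is loaded once."""
    model = joblib.load("models/classifier.joblib")  # illustrative path

    def predict(features: List[List[float]]) -> Dict[str, Any]:
        """The actual endpoint callable, reused for every request."""
        return {"predictions": model.predict(features).tolist()}

    return predict
```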
NOTE: a similar way to do this is to implement a proper ZenML pipeline that loads the model in the on_init hook and then runs pre-processing and inference steps in the pipeline.
a custom deployment info endpoint implemented as a builder class
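A sketch of the builder-class shape; what you expose from the app runner is up to you and kept generic here:

```python
from typing import Any, Dict


class DeploymentInfoEndpoint:
    """Builder class: constructed once by the ASGI application factory,
    then its __call__ method serves every request."""

    def __init__(self, app_runner, **kwargs: Any) -> None:
        # app_runner is a BaseDeploymentAppRunner; keep it around to expose
        # details about the deployment, its service or its snapshot
        self.app_runner = app_runner

    def __call__(self) -> Dict[str, Any]:
        return {"info": "custom deployment info endpoint"}
```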
a custom model selection endpoint, implemented as a FastAPI router. This example is more involved and demonstrates how to coordinate multiple endpoints with the main pipeline invoke endpoint.
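A condensed sketch of such a router, registered as a native endpoint (native=True); the model paths and the module-level global are illustrative:

```python
from typing import Any, Dict, Optional

import joblib  # illustrative
from fastapi import APIRouter, HTTPException

# module-level global shared with the inference pipeline below (illustrative)
ACTIVE_MODEL: Optional[Any] = None

router = APIRouter(prefix="/model", tags=["model"])


@router.post("/load")
def load_model(model_path: str) -> Dict[str, str]:
    """Load (or switch) the model used by the pipeline for inference."""
    global ACTIVE_MODEL
    try:
        ACTIVE_MODEL = joblib.load(model_path)
    except Exception as exc:
        raise HTTPException(status_code=400, detail=str(exc))
    return {"status": "loaded", "model_path": model_path}


@router.get("/status")
def model_status() -> Dict[str, bool]:
    """Report whether a model is currently loaded."""
    return {"loaded": ACTIVE_MODEL is not None}


# Register the router as a native endpoint (native=True) via
# DeploymentSettings(custom_endpoints=...); the exact registration schema
# is documented in the DeploymentSettings reference.
```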
And here is a minimal ZenML inference pipeline that uses the globally loaded model. The prediction step reads the model from the global variable set by the FastAPI router above. You can invoke this pipeline via the built-in /invoke endpoint once a model has been loaded through /model/load.
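A minimal sketch of that pipeline, assuming the router above lives in a module importable as my_project.model_router (illustrative path):

```python
from typing import List

from zenml import pipeline, step


@step
def predict(features: List[List[float]]) -> List[float]:
    """Reads the model from the global set by the /model/load endpoint above."""
    from my_project.model_router import ACTIVE_MODEL  # illustrative module path

    if ACTIVE_MODEL is None:
        raise RuntimeError("No model loaded yet - call /model/load first.")
    return ACTIVE_MODEL.predict(features).tolist()


@pipeline
def inference_pipeline(features: List[List[float]]) -> List[float]:
    return predict(features)
```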
Custom middleware
Middleware is where you enforce cross-cutting concerns consistently across every endpoint. Common use-cases include:
Security and access control
API key/JWT verification, tenant extraction and context injection
IP allow/deny lists, basic WAF-style request filtering, mTLS header checks
Request body/schema validation and max body size enforcement
Governance and privacy
PII detection/redaction on inputs/outputs; payload sampling/scrubbing
Policy enforcement (data residency, retention, consent) at request time
Reliability and traffic shaping
Rate limiting, quotas, per-tenant concurrency limits
Idempotency keys, deduplication, retries with backoff, circuit breakers
Timeouts, slow-request detection, maintenance mode and graceful drain
Observability
Correlation/trace IDs, OpenTelemetry spans, structured logging
Metrics for latency, throughput, error rates, request/response sizes
Performance and caching
Response caching/ETags, compression (gzip/br), streaming/chunked responses
Adaptive content negotiation and serialization tuning
LLM/agent-specific controls
Token accounting/limits, cost guards per tenant/user
Guardrails (toxicity/PII/jailbreak) and output filtering
Tool execution sandboxing gates and allowlists
Data and feature enrichment
Feature store prefetch, user/tenant profile enrichment, AB bucketing tags
You can configure custom_middlewares in DeploymentSettings to insert your own ASGI middleware.
Middlewares support multiple definition modes (see code examples below):
1. Middleware class - a standard ASGI middleware class that implements the __call__ method that takes the traditional scope, receive and send arguments. The constructor must accept an app argument of type ASGIApplication and any additional keyword arguments.
2. Middleware callable - a callable that takes all arguments in one go: app, scope, receive and send.
3. Native framework-specific middleware (native=True) - this can vary from ASGI framework to framework.
Definitions can be provided as Python objects or as loadable source path strings. The order parameter controls the insertion order in the middleware chain. Lower order values insert the middleware earlier in the chain.
The following code examples demonstrate the different definition modes for custom middlewares:
a custom middleware that adds a processing time header to every response, implemented as a middleware class:
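(A sketch following the class signature described above; pure ASGI, framework-agnostic.)

```python
import time
from typing import Any


class ProcessingTimeMiddleware:
    """ASGI middleware that adds an X-Process-Time header to every response."""

    def __init__(self, app, **kwargs: Any) -> None:
        self.app = app  # the wrapped ASGI application

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()

        async def send_wrapper(message) -> None:
            if message["type"] == "http.response.start":
                elapsed = time.perf_counter() - start
                message.setdefault("headers", []).append(
                    (b"x-process-time", f"{elapsed:.4f}".encode())
                )
            await send(message)

        await self.app(scope, receive, send_wrapper)
```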
a custom middleware that injects a correlation ID into responses (and generates one if missing), implemented as a middleware callable:
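(A sketch of the middleware-callable form, taking app, scope, receive and send in one go.)

```python
import uuid


async def correlation_id_middleware(app, scope, receive, send) -> None:
    """Injects an X-Correlation-ID header, generating one if the request lacks it."""
    if scope["type"] != "http":
        await app(scope, receive, send)
        return

    incoming = dict(scope.get("headers") or []).get(b"x-correlation-id")
    correlation_id = incoming or str(uuid.uuid4()).encode()

    async def send_wrapper(message) -> None:
        if message["type"] == "http.response.start":
            message.setdefault("headers", []).append(
                (b"x-correlation-id", correlation_id)
            )
        await send(message)

    await app(scope, receive, send_wrapper)
```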
a FastAPI/Starlette-native middleware that adds GZIP support, implemented as a native middleware:
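(With native=True you can reuse a framework-provided middleware class directly, e.g. Starlette's GZipMiddleware; the registration schema is an assumption.)

```python
from starlette.middleware.gzip import GZipMiddleware

# Reference GZipMiddleware (plus any kwargs such as minimum_size=1000) as a
# native middleware via DeploymentSettings(custom_middlewares=...); the exact
# registration schema is documented in the DeploymentSettings reference.
```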
App extensions
App extensions are pluggable components that run as part of the ASGI application factory and can install complex, possibly framework-specific structures. The following are typical scenarios where a full-blown extension is a better fit than endpoints or middleware:
Advanced authentication and authorization
install org-wide dependencies (e.g., OAuth/OIDC auth, RBAC guards)
register custom exception handlers for uniform error envelopes
augment OpenAPI with security schemes and per-route security policies
Multi-tenant and routing topology
programmatically include routers per tenant/region/version
mount sub-apps for internal admin vs public APIs under different prefixes
dynamic route rewrites/switches for blue/green or canary rollouts
Observability and platform integration
wire OpenTelemetry instrumentation at the app level (tracer/meter providers)
register global request/response logging with redaction policies
expose or mount vendor-specific observability apps (e.g., Prometheus)
LLM agent control plane
attach a tool registry/router and lifecycle hooks for tools
register guardrail handlers and policy engines across routes
install runtime prompt/template catalogs and index management routers
API ergonomics and governance
reshape OpenAPI (tags, servers, components) and versioned docs
global response model wrapping, pagination conventions, error mappers
maintenance-mode switch and graceful-drain controls at the app level
App extensions support multiple definition modes (see code examples below):
1. Extension class - a class that implements the BaseAppExtension abstract class. The class constructor must accept any keyword arguments and the install method must accept an app_runner argument of type BaseDeploymentAppRunner.
2. Extension callable - a callable that takes the app_runner argument of type BaseDeploymentAppRunner.
Both classes and callables must take in an app_runner argument of type BaseDeploymentAppRunner. This is the application factory that is responsible for building the ASGI application. You can use it to access information such as:
the ASGI application instance that is being built
the deployment service instance that is being deployed
the DeploymentResponse object itself, which also contains details about the snapshot, pipeline, etc.
Definitions can be provided as Python objects or as loadable source path strings.
Extensions are invoked near the end of the ASGI application building process, after the app has been assembled according to the deployment configuration settings.
The example below installs API key authentication at the FastAPI application level, attaches the dependency to selected routes, registers an auth error handler, and augments the OpenAPI schema with the security scheme.
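A condensed sketch in the same spirit is shown below; here the key check is enforced with a small HTTP middleware rather than per-route dependencies, the API keys are illustrative, and the app attribute on the runner is an assumption:

```python
from typing import Any

from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import JSONResponse

API_KEYS = {"my-secret-key"}   # illustrative; load from a secret store in practice
PROTECTED_PATHS = {"/invoke"}


class ApiKeyAuthExtension:
    """App extension: installs API key auth on the FastAPI app built by the runner."""

    def __init__(self, **kwargs: Any) -> None:
        self.header_name = kwargs.get("header_name", "X-API-Key")

    def install(self, app_runner) -> None:
        # app_runner is a BaseDeploymentAppRunner; the attribute holding the
        # ASGI app is assumed to be `app` here - check the API reference
        app: FastAPI = app_runner.app

        @app.middleware("http")
        async def enforce_api_key(request: Request, call_next):
            if request.url.path in PROTECTED_PATHS:
                if request.headers.get(self.header_name) not in API_KEYS:
                    return JSONResponse(
                        status_code=401,
                        content={"error": "Invalid or missing API key"},
                    )
            return await call_next(request)

        # advertise the security scheme in the OpenAPI schema
        original_openapi = app.openapi

        def custom_openapi():
            schema = original_openapi()
            schema.setdefault("components", {}).setdefault("securitySchemes", {})[
                "ApiKeyAuth"
            ] = {"type": "apiKey", "in": "header", "name": self.header_name}
            return schema

        app.openapi = custom_openapi
```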
Implementation customizations for advanced use cases
For cases where you need deeper control over how the ASGI app is created or how the deployment logic is implemented, you can swap/extend the core components using the following DeploymentSettings fields:
deployment_app_runner_flavor and deployment_app_runner_kwargs let you choose or extend the app runner that constructs and runs the ASGI app. This needs to be set to a subclass of BaseDeploymentAppRunnerFlavor, which is basically a descriptor of an app runner implementation that itself is a subclass of BaseDeploymentAppRunner.
deployment_service_class and deployment_service_kwargs let you provide your own deployment service to customize the pipeline deployment logic. This needs to be set to a subclass of BasePipelineDeploymentService.
Both accept loadable sources or objects. We cover how to implement custom runner flavors and services in a dedicated guide.