How to deploy your models locally with MLflow

This is an older version of the ZenML documentation. To read the latest version, please visit the up-to-date ZenML documentation.

The MLflow Model Deployer is one of the available flavors of the Model Deployer stack component. Provided with the MLflow integration, it can be used to deploy and manage MLflow models on a locally running MLflow server.

The MLflow Model Deployer is not yet available for use in production. It is a work in progress and will be available soon; at the moment, it can only be used in a local development environment.

When to use it?

MLflow is a popular open-source platform for machine learning and a great tool for managing the entire machine learning lifecycle. One of its most important features is the ability to package a model and its dependencies into a single artifact that can be deployed to a variety of deployment targets.

You should use the MLflow Model Deployer:

  • if you want an easy way to deploy your models locally and perform real-time predictions using the running MLflow prediction server.

  • if you are looking to deploy your models in a simple way without the need for a dedicated deployment environment like Kubernetes or advanced infrastructure configuration.

If you are looking to deploy your models in a more complex way, you should use one of the other Model Deployer flavors available in ZenML (e.g. Seldon Core or KServe).

How do you deploy it?

The MLflow Model Deployer flavor is provided by the MLflow ZenML integration, so you need to install the integration on your local machine to be able to deploy your models. You can do this by running the following command:

zenml integration install mlflow -y
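As an optional sanity check, and assuming the integration install pulls the `mlflow` Python package into your active environment, you can confirm from Python that the package can be found. The helper name `is_installed` is hypothetical and not part of ZenML:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` is importable in this environment."""
    # find_spec returns None when no module with that name can be located
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # Hypothetical check: assumes the integration installs the `mlflow` package
    print("mlflow installed:", is_installed("mlflow"))
```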

To register the MLflow Model Deployer with ZenML, run the following command:

zenml model-deployer register mlflow_deployer --flavor=mlflow

The ZenML integration will provision a local MLflow deployment server as a daemon process that will continue to run in the background to serve the latest MLflow model.
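Once the server is running, clients talk to it over HTTP. The sketch below builds such a request with the Python standard library only. It assumes an MLflow 2.x-style scoring server that accepts a JSON body with an `inputs` key at its `/invocations` endpoint; the URL and the helper names `build_invocation_request` and `send_invocation` are placeholders, not values or functions provided by ZenML:

```python
import json
import urllib.request
from typing import Any, List


def build_invocation_request(
    prediction_url: str, rows: List[List[float]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying `rows` as model input.

    Assumes an MLflow 2.x-style scoring server that accepts
    {"inputs": [...]} as JSON at its /invocations endpoint.
    """
    body = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        prediction_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_invocation(
    prediction_url: str, rows: List[List[float]], timeout: float = 10.0
) -> Any:
    """Send the request to a running server and decode its JSON response."""
    request = build_invocation_request(prediction_url, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, pass the prediction URL of your own running server, for example `send_invocation("http://127.0.0.1:<port>/invocations", [[0.1, 0.2]])`; the port depends on how the local daemon was started.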

How do you use it?

The first step in deploying and using your MLflow model is to create a Service deployment from code. This is done by setting the parameters that the MLflow deployment step requires.

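from zenml.integrations.mlflow.steps import mlflow_deployer_step
from zenml.integrations.mlflow.steps import MLFlowDeployerParameters

# Create an instance of the MLflow deployer step
model_deployer = mlflow_deployer_step(name="model_deployer")

# Initialize a continuous deployment pipeline run
# (`continuous_deployment_pipeline` is your own pipeline definition)
deployment = continuous_deployment_pipeline(
    ...,  # the other steps of your pipeline
    # as a last step to our pipeline the model deployer step is run with its config in place
    model_deployer=model_deployer(params=MLFlowDeployerParameters(workers=3)),
)
deployment.run()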

You can run predictions on the deployed model with something like:

from import MLFlowDeploymentService
import numpy as np
from zenml.steps import BaseParameters, Output, StepContext, step
from import load_last_service_from_step


class MLFlowDeploymentLoaderStepParams(BaseParameters):
    """MLflow deployment getter configuration

    Attributes:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        step_name: the name of the step that deployed the MLflow prediction
            server
        running: when this flag is set, the step only returns a running service
    """

    pipeline_name: str
    step_name: str
    running: bool = True

# Step to retrieve the service associated with the last pipeline run
# (caching is disabled so the latest service is looked up on every run)
@step(enable_cache=False)
def prediction_service_loader(
    params: MLFlowDeploymentLoaderStepParams, context: StepContext
) -> MLFlowDeploymentService:
    """Get the prediction service started by the deployment pipeline"""

    service = load_last_service_from_step(
        pipeline_name=params.pipeline_name,
        step_name=params.step_name,
        step_context=context,
        running=params.running,
    )
    if not service:
        raise RuntimeError(
            f"No MLflow prediction service deployed by the "
            f"{params.step_name} step in the {params.pipeline_name} pipeline "
            f"is currently running."
        )

    return service

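# Use the service for inference
@step
def predictor(
    service: MLFlowDeploymentService,
    data: np.ndarray,
) -> Output(predictions=np.ndarray):
    """Run an inference request against a prediction service"""

    service.start(timeout=10)  # should be a NOP if already started
    prediction = service.predict(data)
    # reduce per-class scores to the index of the highest-scoring class
    prediction = prediction.argmax(axis=-1)

    return prediction

# Initialize an inference pipeline run
# (`inference_pipeline` is your own pipeline definition)
inference = inference_pipeline(
    ...,  # the other steps of your pipeline, e.g. the one providing `data`
    prediction_service_loader=prediction_service_loader(
        params=MLFlowDeploymentLoaderStepParams(
            pipeline_name="continuous_deployment_pipeline",
            step_name="model_deployer",
        )
    ),
    predictor=predictor(),
)
inference.run()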
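The `predictor` step turns the server's output into class labels with `argmax(axis=-1)`. For readers unfamiliar with that call, the same reduction can be sketched in plain Python, assuming each prediction row holds one score per class; `argmax_last_axis` is a hypothetical helper for illustration only:

```python
from typing import List, Sequence


def argmax_last_axis(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each row, the index of its largest score.

    Like NumPy's argmax, ties resolve to the first maximal index,
    because max() keeps the first maximal element it encounters.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


print(argmax_last_axis([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]]))  # [1, 0]
```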

You can check the MLflow deployment example for more details.

For more information and a full list of configurable attributes of the MLflow Model Deployer, check out the API Docs.
