Databricks

Deploying models to Databricks Inference Endpoints with Databricks Model Serving


Databricks Model Serving or Mosaic AI Model Serving provides a unified interface to deploy, govern, and query AI models. Each model you serve is available as a REST API that you can integrate into your web or client application.

This service provides dedicated and autoscaling infrastructure managed by Databricks, allowing you to deploy models without dealing with containers and GPUs.

The Databricks Model Deployer can be considered a managed service for deploying models with MLflow. This means you can switch between the MLflow and Databricks Model Deployers without changing your pipeline code, even for complex custom models.
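
For example, assuming you have one stack registered with an MLflow Model Deployer and another with a Databricks Model Deployer (the stack names below are only placeholders), switching where your models are deployed is just a matter of switching the active stack:

zenml stack set local_mlflow_stack   # the pipeline deploys models locally with MLflow
zenml stack set databricks_stack     # the same pipeline code deploys to Databricks Model Serving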

When to use it?

You should use the Databricks Model Deployer if:

  • You are already using Databricks for your data and ML workloads.

  • You want to deploy AI models without dealing with containers and GPUs; Databricks Model Serving provides a unified interface to deploy, govern, and query models.

  • You want dedicated and autoscaling infrastructure managed by Databricks, making it easier to deploy models at scale.

  • Enterprise security is a priority, and you need to deploy models into secure offline endpoints accessible only via a direct connection to your Virtual Private Cloud (VPC).

  • Your goal is to turn your models into production-ready APIs with minimal infrastructure or MLOps involvement.

If you are looking for an easier way to deploy your models locally, you can use the MLflow Model Deployer flavor.

How to deploy it?

The Databricks Model Deployer flavor is provided by the Databricks ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:

zenml integration install databricks -y

To register the Databricks model deployer with ZenML you need to run the following command:

zenml model-deployer register <MODEL_DEPLOYER_NAME> --flavor=databricks --host=<HOST> --client_id={{databricks.client_id}} --client_secret={{databricks.client_secret}}
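
The {{databricks.client_id}} and {{databricks.client_secret}} placeholders reference keys of a ZenML secret named databricks, which you should create beforehand, for example (replace the placeholder values with the credentials of your Databricks service account):

zenml secret create databricks --client_id=<YOUR_CLIENT_ID> --client_secret=<YOUR_CLIENT_SECRET>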

We can now use the model deployer in our stack.

zenml stack update <CUSTOM_STACK_NAME> --model-deployer=<MODEL_DEPLOYER_NAME>

Configuration

Within the DatabricksServiceConfig you can configure the following attributes (see the example sketch after this list):

  • model_name: The name of the model that will be served. This is used to identify the model in the Databricks Model Registry.

  • model_version: The version of the model that will be served. This is used to identify the model in the Databricks Model Registry.

  • workload_size: The size of the workload that the model will be serving. This can be Small, Medium, or Large.

  • scale_to_zero_enabled: A boolean flag to enable or disable the scale-to-zero feature.

  • env_vars: A dictionary of environment variables to be passed to the model serving container.

  • workload_type: The type of workload that the model will be serving. This can be CPU, GPU_LARGE, GPU_MEDIUM, GPU_SMALL, or MULTIGPU_MEDIUM.

  • endpoint_secret_name: The name of the secret that will be used to secure the endpoint and authenticate requests.
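
As an illustrative sketch, a service configuration built from these attributes might look like the following. The import path for DatabricksServiceConfig is an assumption here; check the SDK docs for the exact module, and adjust the values to your own model:

from zenml.integrations.databricks.services import DatabricksServiceConfig  # note: import path is an assumption

databricks_config = DatabricksServiceConfig(
    model_name="my-model",            # registered name in the Databricks Model Registry
    model_version="1",                # model version to serve
    workload_size="Small",            # Small, Medium, or Large
    workload_type="CPU",              # CPU, GPU_SMALL, GPU_MEDIUM, GPU_LARGE, or MULTIGPU_MEDIUM
    scale_to_zero_enabled=True,       # scale the endpoint down when idle
    env_vars={"LOG_LEVEL": "INFO"},   # passed to the model serving container
)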

Run inference on a provisioned inference endpoint

The following code example shows how to run inference against a provisioned inference endpoint:

from typing import Annotated
from zenml import step, pipeline
from zenml.integrations.databricks.model_deployers import DatabricksModelDeployer
from zenml.integrations.databricks.services import DatabricksDeploymentService


# Load a prediction service deployed in another pipeline
@step(enable_cache=False)
def prediction_service_loader(
    pipeline_name: str,
    pipeline_step_name: str,
    running: bool = True,
    model_name: str = "default",
) -> DatabricksDeploymentService:
    """Get the prediction service started by the deployment pipeline.

    Args:
        pipeline_name: name of the pipeline that deployed the MLflow prediction
            server
        step_name: the name of the step that deployed the MLflow prediction
            server
        running: when this flag is set, the step only returns a running service
        model_name: the name of the model that is deployed
    """
    # get the Databricks model deployer stack component
    model_deployer = DatabricksModelDeployer.get_active_model_deployer()

    # fetch existing services with same pipeline name, step name and model name
    existing_services = model_deployer.find_model_server(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
        model_name=model_name,
        running=running,
    )

    if not existing_services:
        raise RuntimeError(
            f"No Databricks inference endpoint deployed by step "
            f"'{pipeline_step_name}' in pipeline '{pipeline_name}' with name "
            f"'{model_name}' is currently running."
        )

    return existing_services[0]


# Use the service for inference
@step
def predictor(
    service: DatabricksDeploymentService,
    data: str
) -> Annotated[str, "predictions"]:
    """Run a inference request against a prediction service"""

    prediction = service.predict(data)
    return prediction


@pipeline
def databricks_deployment_inference_pipeline(
    pipeline_name: str, pipeline_step_name: str = "databricks_model_deployer_step",
):
    inference_data = ...
    model_deployment_service = prediction_service_loader(
        pipeline_name=pipeline_name,
        pipeline_step_name=pipeline_step_name,
    )
    predictions = predictor(model_deployment_service, inference_data)
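
To run the inference pipeline, call it with the name of the pipeline that deployed the model; the name used below is only a placeholder:

if __name__ == "__main__":
    databricks_deployment_inference_pipeline(
        pipeline_name="databricks_deployment_pipeline",  # placeholder: name of your deployment pipeline
    )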

We recommend creating a Databricks service account with the necessary permissions to create and run jobs. You can find more information on how to create a service account in the Databricks documentation. You can generate a client_id and client_secret for the service account and use them to authenticate with Databricks.

See the databricks_model_deployer_step for an example of using the Databricks Model Deployer to deploy a model inside a ZenML pipeline step.

For more information and a full list of configurable attributes of the Databricks Model Deployer, check out the SDK Docs and the Databricks endpoint code.
