Databricks
Deploying models to Databricks Inference Endpoints with Databricks
Databricks Model Serving (also known as Mosaic AI Model Serving) provides a unified interface to deploy, govern, and query AI models. Each model you serve is available as a REST API that you can integrate into your web or client application.
This service provides dedicated and autoscaling infrastructure managed by Databricks, allowing you to deploy models without dealing with containers and GPUs.
The Databricks Model Deployer can be thought of as a managed service for deploying models with MLflow. This means you can switch between the MLflow and Databricks Model Deployers without changing your pipeline code, even for complex custom models.
When to use it?
You should use the Databricks Model Deployer if:
You are already using Databricks for your data and ML workloads.
You want to deploy AI models without dealing with containers and GPUs, through a unified interface to deploy, govern, and query models.
You need dedicated, autoscaling infrastructure managed by Databricks to deploy models at scale.
Enterprise security is a priority, and you need to deploy models into secure offline endpoints accessible only via a direct connection to your Virtual Private Cloud (VPC).
Your goal is to turn your models into production-ready APIs with minimal infrastructure or MLOps involvement.
If you are looking for an easier way to deploy your models locally, you can use the MLflow Model Deployer flavor.
How to deploy it?
The Databricks Model Deployer flavor is provided by the Databricks ZenML integration, so you need to install it on your local machine to be able to deploy your models. You can do this by running the following command:
To register the Databricks model deployer with ZenML you need to run the following command:
We recommend creating a Databricks service account with the necessary permissions to create and run jobs; the Databricks documentation describes how to create one. You can then generate a client_id and client_secret for the service account and use them to authenticate with Databricks.
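As a rough sketch of the install and registration steps described above, the snippet below assembles the two ZenML CLI invocations as argument lists and prints them. The flag names (--host, --client_id, --client_secret), the deployer name, the host URL, and the secret references (which presume a ZenML secret named databricks holding the service account credentials) are assumptions for illustration; check them against the CLI help of your ZenML version before running anything.

```python
import shlex
from typing import List


def install_command() -> List[str]:
    """Argument list that installs the Databricks ZenML integration."""
    return ["zenml", "integration", "install", "databricks", "-y"]


def register_command(
    deployer_name: str,
    host: str,
    # Assumption: a ZenML secret named "databricks" stores both credentials,
    # referenced with ZenML's {{secret.key}} syntax.
    client_id: str = "{{databricks.client_id}}",
    client_secret: str = "{{databricks.client_secret}}",
) -> List[str]:
    """Argument list that registers a model deployer with the databricks flavor.

    The flag names below are assumed, not taken from the text above.
    """
    return [
        "zenml", "model-deployer", "register", deployer_name,
        "--flavor=databricks",
        f"--host={host}",
        f"--client_id={client_id}",
        f"--client_secret={client_secret}",
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    print(shlex.join(install_command()))
    # Hypothetical deployer name and workspace URL.
    print(shlex.join(register_command(
        "databricks_deployer", "https://example.cloud.databricks.com"
    )))
```

Building the commands as lists keeps the credentials out of shell history when they are later passed to subprocess.run, and referencing a secret avoids writing the client_secret in plain text.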
We can now use the model deployer in our stack.
See the databricks_model_deployer_step for an example of using the Databricks Model Deployer to deploy a model inside a ZenML pipeline step.
Configuration
Within the DatabricksServiceConfig you can configure:
model_name: The name of the model that will be served. It is used to identify the model in the Databricks Model Registry.
model_version: The version of the model that will be served. It is used to identify the model in the Databricks Model Registry.
workload_size: The size of the workload that the model will be serving. This can be Small, Medium, or Large.
scale_to_zero_enabled: A boolean flag to enable or disable the scale to zero feature.
env_vars: A dictionary of environment variables to be passed to the model serving container.
workload_type: The type of workload that the model will be serving. This can be CPU, GPU_LARGE, GPU_MEDIUM, GPU_SMALL, or MULTIGPU_MEDIUM.
endpoint_secret_name: The name of the secret that will be used to secure the endpoint and authenticate requests.
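To make the allowed values concrete, here is a minimal stand-in for the options listed above. ServiceConfigSketch is a hypothetical dataclass, not the real DatabricksServiceConfig; the field names follow the list, while the defaults and the validation logic are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Allowed values as listed in the configuration options.
WORKLOAD_SIZES = {"Small", "Medium", "Large"}
WORKLOAD_TYPES = {"CPU", "GPU_LARGE", "GPU_MEDIUM", "GPU_SMALL", "MULTIGPU_MEDIUM"}


@dataclass
class ServiceConfigSketch:
    """Hypothetical mirror of the documented options (defaults are assumed)."""

    model_name: str
    model_version: str
    workload_size: str = "Small"
    scale_to_zero_enabled: bool = True
    env_vars: Dict[str, str] = field(default_factory=dict)
    workload_type: str = "CPU"
    endpoint_secret_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented sets before anything is deployed.
        if self.workload_size not in WORKLOAD_SIZES:
            raise ValueError(f"workload_size must be one of {sorted(WORKLOAD_SIZES)}")
        if self.workload_type not in WORKLOAD_TYPES:
            raise ValueError(f"workload_type must be one of {sorted(WORKLOAD_TYPES)}")


# Example with placeholder model name, version, and secret name.
config = ServiceConfigSketch(
    model_name="churn_classifier",
    model_version="3",
    workload_size="Medium",
    workload_type="GPU_SMALL",
    endpoint_secret_name="databricks_endpoint_token",
)
```

Validating early like this surfaces a misspelled size or type locally rather than as a failed endpoint deployment.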
For more information and a full list of configurable attributes of the Databricks Model Deployer, check out the SDK Docs and Databricks endpoint code.
Run inference on a provisioned inference endpoint
The following code example shows how to run inference against a provisioned inference endpoint:
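As a minimal sketch that does not go through ZenML, and assuming the endpoint is reachable through the Databricks serving REST route /serving-endpoints/<endpoint-name>/invocations with a bearer token, a prediction request can be assembled with the standard library alone. The host, endpoint name, token, and feature columns are placeholders, and the dataframe_split payload shape is an assumption about what the served model accepts.

```python
import json
import urllib.request
from typing import Any, List


def build_invocation_request(
    host: str,
    endpoint_name: str,
    token: str,
    columns: List[str],
    rows: List[List[Any]],
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Tabular input expressed as column names plus rows of values.
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def predict(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values; replace them with your workspace URL, endpoint, and token.
request = build_invocation_request(
    host="https://example.cloud.databricks.com",
    endpoint_name="my-endpoint",
    token="<DATABRICKS_TOKEN>",
    columns=["feature_a", "feature_b"],
    rows=[[1.0, 2.0]],
)
```

Calling predict(request) sends the request; it is kept separate so the URL, headers, and payload can be inspected before any network call is made.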