KServe
How to deploy models to Kubernetes with KServe
When to use it?
KServe encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features such as GPU autoscaling, scale to zero, and canary rollouts to your ML deployments. It provides a simple, pluggable, and complete story for production ML serving, covering prediction, pre-processing, post-processing and explainability. KServe is used across various organizations.
You should use the KServe Model Deployer:
If you are looking to deploy your model with an advanced model inference platform on Kubernetes, built for highly scalable use cases.
If you want to handle the lifecycle of the deployed model with no downtime, with the possibility of scaling to zero on GPUs.
If you are looking for out-of-the-box model serving runtimes that are easy to use and can deploy models from most major frameworks.
If you want more advanced deployment strategies like A/B testing, canary deployments, ensembles and transformers.
How to deploy it?
ZenML provides a KServe flavor built on top of the KServe integration so that you can deploy and use your models in a production-grade environment. To use the integration, install it on your local machine (zenml integration install kserve); this lets you register the KServe Model Deployer with ZenML and add it to your stack.
To deploy and make use of the KServe integration, you need the following prerequisites:
Access to a Kubernetes cluster. The ZenML KServe example accepts a --kubernetes-context command line argument, which must point to the Kubernetes cluster where the KServe model servers will be deployed. If no context is supplied explicitly, the locally active context is used.
Since the KServe Model Deployer interacts with the KServe model serving platform deployed on a Kubernetes cluster, you need to provide a set of configuration parameters:
kubernetes_context: the Kubernetes context to use to contact the remote KServe installation. If not specified, the current configuration is used; which configuration that is depends on where the KServe model deployer is running.
kubernetes_namespace: the Kubernetes namespace where the KServe deployment servers are provisioned and managed by ZenML. If not specified, the namespace set in the current configuration is used.
base_url: the base URL of the Kubernetes ingress used to expose the KServe deployment servers.
secret: the name of a ZenML secret containing the credentials used by the KServe storage initializers to authenticate to the Artifact Store.
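To make the defaulting rules above concrete, here is a minimal sketch that mirrors the four parameters in a plain dataclass. KServeDeployerSettings and its resolve method are hypothetical (this is not ZenML's configuration class), and the sketch assumes the kubeconfig has already been parsed into a dict with the standard current-context and contexts keys.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KServeDeployerSettings:
    """Hypothetical mirror of the KServe Model Deployer parameters.

    Not ZenML's real class; it only illustrates the documented defaults.
    """

    base_url: str
    secret: Optional[str] = None
    kubernetes_context: Optional[str] = None
    kubernetes_namespace: Optional[str] = None

    def resolve(self, kubeconfig: dict) -> Tuple[str, str]:
        """Return the (context, namespace) pair that would be used.

        Unset values fall back to the kubeconfig's current context and to
        the namespace configured for that context ("default" if none).
        """
        context = self.kubernetes_context or kubeconfig["current-context"]
        namespace = self.kubernetes_namespace
        if namespace is None:
            for entry in kubeconfig.get("contexts", []):
                if entry["name"] == context:
                    namespace = entry["context"].get("namespace", "default")
                    break
            else:
                # Context not listed: Kubernetes' own default namespace.
                namespace = "default"
        return context, namespace
```

Passing only base_url reproduces the "use the current configuration" behaviour; setting kubernetes_context or kubernetes_namespace overrides the respective part.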
Managing KServe Credentials
The KServe model servers need to access the Artifact Store in the ZenML stack to retrieve the model artifacts. This usually involves passing the KServe model servers the credentials they need to authenticate with the Artifact Store. In ZenML, this is done by creating a ZenML secret with the proper credentials and configuring the KServe Model Deployer stack component to use it, by passing the --secret argument to the CLI command used to register the model deployer. The secret named there (for example, s3-store) must then be populated with the credentials KServe needs to access the Artifact Store.
The KServe integration provides built-in secret schemas that can be used to configure credentials for the three main types of Artifact Stores supported by ZenML: S3, GCS and Azure.
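Which of the three credential types you need follows from the Artifact Store's URI scheme. The sketch below assumes ZenML's usual prefixes (s3:// for S3, gs:// for GCS, az:// or abfs:// for Azure); the labels it returns are illustrative and are not the actual secret schema names, and credential_type_for is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumed mapping from Artifact Store URI scheme to credential type.
# The right-hand labels are illustrative, not real schema names.
SCHEME_TO_STORE = {
    "s3": "S3",
    "gs": "GCS",
    "az": "Azure",
    "abfs": "Azure",
}


def credential_type_for(artifact_store_uri: str) -> str:
    """Return which kind of credentials a KServe secret needs for this store."""
    scheme = urlparse(artifact_store_uri).scheme
    try:
        return SCHEME_TO_STORE[scheme]
    except KeyError:
        # Local paths have an empty scheme: remote model servers cannot read them.
        raise ValueError(f"Unsupported artifact store scheme: {scheme!r}") from None
```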
The recommended way to pass the credentials to the KServe model deployer is through a file that contains them: prefix the file path with @ and pass it to the --credentials argument (e.g. --credentials @/path/to/credentials.json).
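The @ convention is easy to script. This sketch writes a credentials dict to a JSON file with owner-only permissions and returns the two command-line tokens in the form shown above. credentials_argument is a hypothetical helper, and the dict contents are placeholders for whatever your Artifact Store actually requires.

```python
import json
import os
from typing import List


def credentials_argument(credentials: dict, directory: str) -> List[str]:
    """Write `credentials` to a JSON file and return the CLI tokens.

    The leading "@" makes the CLI read the value from the file instead of
    treating the path as the literal credential.
    """
    path = os.path.join(directory, "credentials.json")
    with open(path, "w") as f:
        json.dump(credentials, f)
    # The file holds secrets, so restrict it to the owner (POSIX semantics).
    os.chmod(path, 0o600)
    return ["--credentials", f"@{path}"]
```

Keeping the secret in a file also keeps it out of your shell history, which is one reason a file is preferable to pasting the values inline.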
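For example, with a Google Cloud Storage (GCS) Artifact Store, you register a GCS secret for the KServe model deployer and pass the service account key file through --credentials @/path/to/credentials.json.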
How do you use it?
To register the model deployer, we need the URL of the Istio Ingress Gateway deployed on the Kubernetes cluster. Assuming the service is named istio-ingressgateway and is deployed in the istio-system namespace, we can get its external address with kubectl, for example: kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
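Some load balancers (for example on AWS) publish a hostname rather than an IP, so the jsonpath query can come back empty. A sketch that handles both cases by parsing the full JSON output of kubectl get service ... -o json is shown here; ingress_base_url is a hypothetical helper, and defaulting to plain http is an assumption that depends on how your gateway is configured.

```python
import json


def ingress_base_url(service_json: str, scheme: str = "http") -> str:
    """Build a base URL from `kubectl get service ... -o json` output.

    Kubernetes reports the external address under
    status.loadBalancer.ingress, as either "ip" or "hostname".
    """
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        raise RuntimeError("The ingress gateway has no external address yet")
    address = ingress[0].get("ip") or ingress[0].get("hostname")
    if not address:
        raise RuntimeError("Ingress entry has neither an IP nor a hostname")
    return f"{scheme}://{address}"
```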
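Now register the model deployer, passing the ingress URL as base_url together with the other configuration parameters described earlier.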
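The registration command can be assembled from the four configuration parameters. The sketch only builds (and does not run) the argument list. It assumes the zenml model-deployer register NAME --flavor=kserve form with one --key=value option per parameter, so confirm the exact syntax with zenml model-deployer register --help. register_command, the component name and the values are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def register_command(
    name: str,
    base_url: str,
    secret: str,
    kubernetes_context: Optional[str] = None,
    kubernetes_namespace: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) an assumed model deployer registration command.

    Unset optional parameters are left out so that the defaults (current
    kubeconfig context and namespace) apply.
    """
    cmd = ["zenml", "model-deployer", "register", name, "--flavor=kserve"]
    options = {
        "kubernetes_context": kubernetes_context,
        "kubernetes_namespace": kubernetes_namespace,
        "base_url": base_url,
        "secret": secret,
    }
    for key, value in options.items():
        if value is not None:
            cmd.append(f"--{key}={value}")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell command built from placeholder values.
    print(shlex.join(register_command("kserve_gke", "http://35.1.2.3", "kserve_secret")))
```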
We can now use the model deployer in our stack.
Because packaging model artifacts in the right format can be challenging, ZenML's KServe integration comes with a built-in model deployment step that deploys your models with minimal effort.
This step will:
Verify if the model is already deployed in the KServe cluster. If not, it will deploy the model.
Convert the model artifacts to the format expected by the TensorFlow and MLServer runtime servers.
Package, verify and prepare the model artifact for the PyTorch runtime server, which requires additional files.
Upload the model artifacts to the Artifact Store.
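The control flow of those four steps can be summarised in a few lines of Python. This is a hypothetical sketch, not the step's real code: deploy_if_needed, the in-memory deployed mapping and the upload callback are stand-ins, and the extra PyTorch file name is an illustrative assumption about what the runtime needs.

```python
from typing import Callable, Dict, List


def deploy_if_needed(
    deployed: Dict[str, str],
    model_name: str,
    predictor: str,
    model_files: List[str],
    upload: Callable[[List[str]], str],
) -> str:
    """Hypothetical sketch of the built-in step's control flow.

    `deployed` maps model names to the artifact URI they are served from;
    `upload` stores files in the Artifact Store and returns their URI.
    """
    # 1. Reuse an existing deployment instead of redeploying.
    if model_name in deployed:
        return deployed[model_name]

    # 2./3. Prepare artifacts for the target runtime. PyTorch gets extra
    # packaging; the file name below is an assumption for illustration.
    files = list(model_files)
    if predictor == "pytorch":
        files.append("config/config.properties")

    # 4. Upload to the Artifact Store, then record the deployment.
    uri = upload(files)
    deployed[model_name] = uri
    return uri
```

Checking for an existing deployment first is what makes re-running the pipeline cheap: an unchanged model is neither re-uploaded nor redeployed.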
The step is configured through a KServeDeploymentConfig, in which you can set:
model_name: the name of the model in the KServe cluster and in ZenML.
replicas: the number of replicas with which to deploy the model.
predictor: the type of predictor to use for the model. It can be one of tensorflow, pytorch, sklearn, xgboost or custom.
resources: the compute resources to allocate to the model, given as a dictionary with requests and limits keys. Each of these values is itself a dictionary with cpu and memory keys, whose values are strings giving the amount of CPU and memory to allocate (for example "200m" and "512Mi").
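To show which values the fields accept, here is a hypothetical stand-in that mirrors them and validates their shape. DeploymentConfigSketch is not ZenML's KServeDeploymentConfig (use the real class in pipelines), and it deliberately accepts only the cpu and memory keys listed above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PREDICTORS = {"tensorflow", "pytorch", "sklearn", "xgboost", "custom"}


@dataclass
class DeploymentConfigSketch:
    """Hypothetical mirror of the KServeDeploymentConfig fields."""

    model_name: str
    predictor: str
    replicas: int = 1
    resources: Optional[Dict[str, Dict[str, str]]] = None

    def __post_init__(self) -> None:
        if self.predictor not in PREDICTORS:
            raise ValueError(f"unknown predictor {self.predictor!r}")
        for section, values in (self.resources or {}).items():
            if section not in ("requests", "limits"):
                raise ValueError(f"unexpected resources key {section!r}")
            unknown = set(values) - {"cpu", "memory"}
            if unknown:
                raise ValueError(f"unexpected keys under {section!r}: {sorted(unknown)}")
            if not all(isinstance(v, str) for v in values.values()):
                raise ValueError("resource amounts must be strings such as '200m'")


config = DeploymentConfigSketch(
    model_name="mnist-pytorch",
    predictor="pytorch",
    replicas=1,
    resources={
        "requests": {"cpu": "200m", "memory": "512Mi"},
        "limits": {"cpu": "1", "memory": "1Gi"},
    },
)
```

The amounts use Kubernetes quantity notation: "200m" is 0.2 of a CPU core and "512Mi" is 512 mebibytes.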
Custom Model Deployment
While KServe's built-in model servers are a good fit for most use cases, they are not always the best fit for a custom model deployment. For that reason, KServe allows you to create your own model server using the KServe ModelServer API, where you can customize the predict, pre-processing and post-processing functions. With ZenML's KServe integration, you can write your own custom model deployment code as a custom predict function that is passed to a custom deployment step, which is responsible for preparing a Docker image for the model server.
This custom_predict function should take the model and the input data as arguments and return the output data. ZenML takes care of loading the model into memory, starting the KServe ModelServer that serves the model, and running the predict function.
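A minimal sketch of such a function is shown here. It assumes KServe's v1 inference protocol, where requests carry an "instances" list and responses a "predictions" list, and a model object with a scikit-learn-style predict method. Both are assumptions to adapt to your model, and the signature is illustrative rather than the one the integration mandates.

```python
from typing import Any, Dict


def custom_predict(model: Any, request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical custom predict function.

    Assumes a v1-style payload {"instances": [...]} and a model exposing a
    scikit-learn-like `predict`; adapt both to your own model.
    """
    # Pre-processing: pull the raw inputs out of the request payload.
    instances = request["instances"]
    # Inference with the model that ZenML has already loaded into memory.
    predictions = model.predict(instances)
    # Post-processing: wrap the outputs in a v1-style response.
    return {"predictions": list(predictions)}
```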
The path to this custom predict function is then passed to the custom deployment step's parameters.
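A function path like this is typically a dotted Python import path that is resolved back into the function at serving time. The sketch below shows one way to do that with importlib. load_predict_function is a hypothetical helper, the "module.function" path format is an assumption, and the integration's own loading mechanism may differ.

```python
import importlib
from typing import Callable


def load_predict_function(path: str) -> Callable:
    """Resolve a dotted path such as "my_package.serving.custom_predict".

    Illustrative only; the integration may load the function differently.
    """
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError(f"{path!r} does not point to a callable")
    return func
```

Because the path is resolved by importing a module, the module holding your predict function must be importable inside the model server's Docker image, not just on your machine.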
Advanced Custom Code Deployment with KServe Integration
The built-in KServe custom deployment step is a good starting point for deploying your custom models. However, if you want to deploy more than the trained model, you can create your own Custom Model Class and a custom step to achieve this.
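KServe's Python SDK exposes a Model base class with load, preprocess, predict and postprocess hooks, and a custom model class overrides these. So that the sketch runs without KServe installed, it defines a stand-in base class with hooks of a similar shape. ModelBase and ThresholdClassifier are hypothetical, so check the KServe API reference for the exact signatures before porting. The example deploys more than a trained model by adding a decision rule in postprocess.

```python
from typing import Dict, List, Optional


class ModelBase:
    """Stand-in for kserve.Model so this sketch runs without KServe.

    Signatures are simplified; consult KServe's API reference when porting.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.ready = False

    def load(self) -> None:
        self.ready = True

    def preprocess(self, payload: Dict, headers: Optional[Dict] = None) -> Dict:
        return payload

    def predict(self, payload: Dict, headers: Optional[Dict] = None) -> Dict:
        raise NotImplementedError

    def postprocess(self, response: Dict, headers: Optional[Dict] = None) -> Dict:
        return response

    def __call__(self, payload: Dict) -> Dict:
        # Chain the hooks in the order a model server would run them.
        return self.postprocess(self.predict(self.preprocess(payload)))


class ThresholdClassifier(ModelBase):
    """Hypothetical custom model: a linear score plus a decision rule."""

    def __init__(self, name: str, weights: List[float], threshold: float = 0.5) -> None:
        super().__init__(name)
        self.weights = weights
        self.threshold = threshold
        self.load()

    def preprocess(self, payload, headers=None):
        # Illustrative scaling: assume raw features lie in 0..100.
        return {"instances": [[x / 100 for x in row] for row in payload["instances"]]}

    def predict(self, payload, headers=None):
        scores = [sum(w * x for w, x in zip(self.weights, row)) for row in payload["instances"]]
        return {"predictions": scores}

    def postprocess(self, response, headers=None):
        # Business rule applied on top of the raw model scores.
        labels = [s >= self.threshold for s in response["predictions"]]
        return {"predictions": response["predictions"], "labels": labels}
```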