Seldon
Deploying models to Kubernetes with Seldon Core.
This is an older version of the ZenML documentation. To read and view the latest version please visit this up-to-date URL.
The Seldon Core Model Deployer is one of the available flavors of the Model Deployer stack component. Provided with the Seldon Core integration, it can be used to deploy and manage models on an inference server running on top of a Kubernetes cluster.
When to use it?
Seldon Core is a production-grade open-source model serving platform. It packs a wide range of features built around deploying models as REST/gRPC microservices, including monitoring and logging, model explainers, outlier detectors, and various continuous deployment strategies such as A/B testing and canary deployments.
Seldon Core also comes equipped with a set of built-in model server implementations designed to work with standard formats for packaging ML models that greatly simplify the process of serving models for real-time inference.
You should use the Seldon Core Model Deployer:
If you are looking to deploy your model on a more advanced infrastructure like Kubernetes.
If you want to handle the lifecycle of the deployed model with no downtime, including updating the runtime graph, scaling, monitoring, and security.
If you are looking for more advanced API endpoints to interact with the deployed model, including REST and gRPC endpoints.
If you want more advanced deployment strategies like A/B testing, canary deployments, and more.
If you need a more complex deployment process that can be customized through an advanced inference graph that includes custom TRANSFORMER and ROUTER components.
If you are looking for an easier way to deploy your models locally, you can use the MLflow Model Deployer flavor.
How to deploy it?
ZenML provides a Seldon Core flavor built on top of the Seldon Core Integration to allow you to deploy and use your models in a production-grade environment. To register a Seldon Core Model Deployer with ZenML and add it to your stack, you first need to install the integration on your local machine, for example with zenml integration install seldon.
To deploy and make use of the Seldon Core integration we need to have the following prerequisites:
access to a Kubernetes cluster. The example accepts a --kubernetes-context command line argument. This Kubernetes context needs to point to the Kubernetes cluster where Seldon Core model servers will be deployed. If the context is not explicitly supplied to the example, it defaults to using the locally active context.
Seldon Core needs to be preinstalled and running in the target Kubernetes cluster. Check out the official Seldon Core installation instructions.
models deployed with Seldon Core need to be stored in some form of persistent shared storage that is accessible from the Kubernetes cluster where Seldon Core is installed (e.g. AWS S3, GCS, Azure Blob Storage, etc.). You can use one of the supported remote artifact store flavors to store your models as part of your stack. For a smoother experience running Seldon Core with a cloud artifact store, we also recommend configuring explicit credentials for the artifact store. The Seldon Core model deployer knows how to automatically convert those credentials into the format needed by Seldon Core model servers to authenticate to the storage back-end where models are stored.
Since the Seldon Model Deployer is interacting with the Seldon Core model server deployed on a Kubernetes cluster, you need to provide a set of configuration parameters. These parameters are:
kubernetes_context: the Kubernetes context to use to contact the remote Seldon Core installation. If not specified, the current configuration is used. Depending on where the Seldon model deployer is being used
kubernetes_namespace: the Kubernetes namespace where the Seldon Core deployment servers are provisioned and managed by ZenML. If not specified, the namespace set in the current configuration is used.
base_url: the base URL of the Kubernetes ingress used to expose the Seldon Core deployment servers.
In addition to these parameters, the Seldon Core Model Deployer may also require additional configuration to be set up to allow it to authenticate to the remote artifact store or persistent storage service where model artifacts are located. This is covered in the Managing Seldon Core Authentication section.
Configuring Seldon Core in a Kubernetes cluster can be a complex and error-prone process, so we have provided a set of Terraform-based recipes to quickly provision popular combinations of MLOps tools. More information about these recipes can be found in the Open Source MLOps Stack Recipes.
Infrastructure Deployment
The Seldon Model Deployer can be deployed directly from the ZenML CLI.
You can pass other configurations specific to the stack components as key-value arguments. If you don't provide a name, a random one is generated for you. For more information about how to use the CLI for this, please refer to the dedicated documentation section.
Managing Seldon Core Authentication
The Seldon Core Model Deployer requires access to the persistent storage where models are located. In most cases, you will use the Seldon Core model deployer to serve models that are trained through ZenML pipelines and stored in the ZenML Artifact Store, which implies that the Seldon Core model deployer needs to access the Artifact Store.
If Seldon Core is already running in the same cloud as the Artifact Store (e.g. S3 and an EKS cluster for AWS, or GCS and a GKE cluster for GCP), there are ways of configuring cloud workloads to have implicit access to other cloud resources like persistent storage without requiring explicit credentials. However, if Seldon Core is running in a different cloud, or on-prem, or if implicit in-cloud workload authentication is not enabled, then you need to configure explicit credentials for the Artifact Store to allow other components like the Seldon Core model deployer to authenticate to it. Every cloud Artifact Store flavor supports some way of configuring explicit credentials and this is documented for each individual flavor in the Artifact Store documentation.
When explicit credentials are configured in the Artifact Store, the Seldon Core Model Deployer doesn't need any additional configuration and will use those credentials automatically to authenticate to the same persistent storage service used by the Artifact Store. If the Artifact Store doesn't have explicit credentials configured, then Seldon Core will default to using whatever implicit authentication method is available in the Kubernetes cluster where it is running. For example, in AWS this means using the IAM role attached to the EC2 or EKS worker nodes, and in GCP this means using the service account attached to the GKE worker nodes.
If the Artifact Store used in combination with the Seldon Core Model Deployer in the same ZenML stack does not have explicit credentials configured, then the Seldon Core Model Deployer might not be able to authenticate to the Artifact Store, which will cause the deployed model servers to fail.
To avoid this, we recommend that you use Artifact Stores with explicit credentials in the same stack as the Seldon Core Model Deployer. Alternatively, if you're running Seldon Core in one of the cloud providers, you should configure implicit authentication for the Kubernetes nodes.
If you want to use a custom persistent storage with Seldon Core, or if you prefer to manually manage the authentication credentials attached to the Seldon Core model servers, you can use the approach described in the next section.
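The authentication options described in this section can be summarized as a small decision function. This is only a sketch of the reasoning, not ZenML code: the StackAuthSettings class and the returned labels are hypothetical, and the rule that a custom secret takes precedence over Artifact Store credentials is an assumption that this page does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackAuthSettings:
    """Hypothetical summary of the settings that influence authentication."""

    # Name of the ZenML secret set in the model deployer's `secret` attribute.
    custom_secret_name: Optional[str] = None
    # Whether the Artifact Store in the stack has explicit credentials.
    artifact_store_has_explicit_credentials: bool = False


def credential_source(settings: StackAuthSettings) -> str:
    """Return a label for where the model servers would get credentials."""
    if settings.custom_secret_name:
        # Assumption: a manually configured secret wins over everything else.
        return "custom-secret"
    if settings.artifact_store_has_explicit_credentials:
        # Explicit Artifact Store credentials are reused automatically.
        return "artifact-store-credentials"
    # Otherwise fall back to whatever the cluster provides implicitly,
    # e.g. an IAM role or service account attached to the worker nodes.
    return "implicit-cluster-auth"
```

Calling credential_source(StackAuthSettings()) yields "implicit-cluster-auth", which is the case where deployed model servers may fail if the cluster has no implicit access to the storage.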
Advanced: Configuring a Custom Seldon Core Secret
The Seldon Core model deployer stack component allows configuring an additional secret attribute that can be used to specify custom credentials that Seldon Core should use to authenticate to the persistent storage service where models are located. This is useful if you want to connect Seldon Core to a persistent storage service that is not supported as a ZenML Artifact Store, or if you don't want to configure or use the same credentials configured for your Artifact Store. The secret attribute must be set to the name of a ZenML secret containing credentials configured in the format supported by Seldon Core.
This method is not recommended, because it limits the Seldon Core model deployer to a single persistent storage service, whereas using the Artifact Store credentials gives you more flexibility in combining the Seldon Core model deployer with any Artifact Store in the same ZenML stack.
Seldon Core model servers use rclone to connect to persistent storage services and the credentials that can be configured in the ZenML secret must also be in the configuration format supported by rclone. This section covers a few common use cases and provides examples of how to configure the ZenML secret to support them, but for more information on supported configuration options, you can always refer to the rclone documentation for various providers.
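As a rough illustration of the rclone configuration format, the sketch below builds the key/value pairs for an S3 remote using rclone's environment-variable naming convention (RCLONE_CONFIG_<REMOTE>_<OPTION>). The helper name rclone_env_config and the placeholder credential values are hypothetical, and the exact key casing that a ZenML secret expects is an assumption that should be checked against the ZenML secrets documentation.

```python
from typing import Dict


def rclone_env_config(remote: str, options: Dict[str, str]) -> Dict[str, str]:
    """Map rclone backend options to RCLONE_CONFIG_<REMOTE>_<OPTION> keys."""
    prefix = f"RCLONE_CONFIG_{remote.upper()}_"
    return {prefix + key.upper(): value for key, value in options.items()}


# Options for rclone's S3 backend; the credential values are placeholders.
s3_secret_values = rclone_env_config(
    "s3",
    {
        "type": "s3",
        "provider": "AWS",
        "env_auth": "false",
        "access_key_id": "<AWS-ACCESS-KEY-ID>",
        "secret_access_key": "<AWS-SECRET-ACCESS-KEY>",
    },
)

for key in sorted(s3_secret_values):
    print(key)
```

Setting env_auth to "false" tells rclone to use the supplied key pair instead of looking for credentials in the runtime environment.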
How do you use it?
For registering the model deployer, we need the URL of the Istio Ingress Gateway deployed on the Kubernetes cluster. We can get this URL from the ingress gateway service (assuming that the service name is istio-ingressgateway, deployed in the istio-system namespace).
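One way to look up the ingress address is to query the service with kubectl. The sketch below wraps that call in Python; the helper names are hypothetical, and it assumes kubectl is installed and pointed at the right context. Whether the load balancer exposes a hostname or an ip field depends on the cloud provider, so the field is left as a parameter.

```python
import subprocess
from typing import List


def ingress_host_command(
    service: str = "istio-ingressgateway",
    namespace: str = "istio-system",
    field: str = "hostname",
) -> List[str]:
    """Build the kubectl command that prints the ingress load balancer address."""
    jsonpath = "{.status.loadBalancer.ingress[0]." + field + "}"
    return [
        "kubectl", "-n", namespace,
        "get", "service", service,
        "-o", f"jsonpath={jsonpath}",
    ]


def get_ingress_host(
    service: str = "istio-ingressgateway",
    namespace: str = "istio-system",
    field: str = "hostname",
) -> str:
    """Run kubectl and return the address (requires access to the cluster)."""
    result = subprocess.run(
        ingress_host_command(service, namespace, field),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With cluster access, get_ingress_host() returns the hostname, and get_ingress_host(field="ip") can be tried when the load balancer reports an IP address instead.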
Now register the model deployer:
Note: If you chose to configure your own custom credentials to authenticate to the persistent storage service where models are stored, as covered in the Advanced: Configuring a Custom Seldon Core Secret section, you will need to specify a ZenML secret reference when you configure the Seldon Core model deployer below:
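The registration step can be sketched as a helper that assembles the zenml model-deployer register command from the configuration parameters listed earlier. The helper name and the example values are hypothetical, and the flag spellings are assumed to mirror the attribute names (kubernetes_context, kubernetes_namespace, base_url, secret); check the ZenML CLI help for the version you are running.

```python
from typing import List, Optional


def build_register_command(
    name: str,
    base_url: str,
    kubernetes_context: Optional[str] = None,
    kubernetes_namespace: Optional[str] = None,
    secret: Optional[str] = None,
) -> List[str]:
    """Assemble the CLI arguments for registering a Seldon Core model deployer."""
    command = ["zenml", "model-deployer", "register", name, "--flavor=seldon"]
    if kubernetes_context:
        command.append(f"--kubernetes_context={kubernetes_context}")
    if kubernetes_namespace:
        command.append(f"--kubernetes_namespace={kubernetes_namespace}")
    command.append(f"--base_url={base_url}")
    if secret:
        # Only needed when using a custom Seldon Core secret.
        command.append(f"--secret={secret}")
    return command


# Example values are placeholders, not real cluster settings.
print(" ".join(build_register_command(
    "seldon_deployer",
    base_url="http://<INGRESS-HOST>",
    kubernetes_context="<CONTEXT>",
    kubernetes_namespace="<NAMESPACE>",
)))
```

Passing secret="<SECRET-NAME>" appends the secret reference described in the note above; leaving it out relies on the Artifact Store credentials or implicit cluster authentication.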
We can now use the model deployer in our stack.
The Seldon Core Model Deployer can then be used to deploy a model inside a ZenML pipeline step.
Within the SeldonDeploymentConfig you can configure:
model_name: the name of the model in the Seldon cluster and in ZenML.
replicas: the number of replicas with which to deploy the model.
implementation: the type of Seldon inference server to use for the model. The implementation type can be one of the following: TENSORFLOW_SERVER, SKLEARN_SERVER, XGBOOST_SERVER, custom.
parameters: an optional list of parameters (SeldonDeploymentPredictorParameter) to pass to the deployment predictor in the form of: name, type, value.
resources: the resources to be allocated to the model. This can be configured by passing a SeldonResourceRequirements object with the requests and limits properties. The values for these properties can be a dictionary with the cpu and memory keys. The values for these keys can be a string with the amount of CPU and memory to be allocated to the model.
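To make the shape of these settings concrete, the sketch below writes them out as a plain dictionary and checks them against the constraints listed above. It deliberately does not import ZenML: the dictionary layout, the check_deployment_settings helper, and all example values are illustrative stand-ins, not the actual SeldonDeploymentConfig class.

```python
from typing import Any, Dict

# Implementation types as listed in this section.
ALLOWED_IMPLEMENTATIONS = (
    "TENSORFLOW_SERVER", "SKLEARN_SERVER", "XGBOOST_SERVER", "custom",
)


def check_deployment_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a dictionary that mirrors the documented attributes."""
    if settings["implementation"] not in ALLOWED_IMPLEMENTATIONS:
        raise ValueError(f"unknown implementation: {settings['implementation']}")
    if settings.get("replicas", 1) < 1:
        raise ValueError("replicas must be at least 1")
    for parameter in settings.get("parameters", []):
        missing = {"name", "type", "value"} - set(parameter)
        if missing:
            raise ValueError(f"parameter is missing keys: {sorted(missing)}")
    return settings


# Hypothetical example values for a scikit-learn model.
example_settings = check_deployment_settings({
    "model_name": "my-model",
    "replicas": 1,
    "implementation": "SKLEARN_SERVER",
    "parameters": [
        {"name": "method", "type": "STRING", "value": "predict_proba"},
    ],
    "resources": {
        "requests": {"cpu": "100m", "memory": "250Mi"},
        "limits": {"cpu": "200m", "memory": "500Mi"},
    },
})
```

The cpu and memory strings follow the usual Kubernetes quantity notation, which is what the resources description above appears to expect.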
A concrete example of using the Seldon Core Model Deployer can be found here.
For more information and a full list of configurable attributes of the Seldon Core Model Deployer, check out the API Docs.
Custom Model Deployment
When you have a custom use-case where Seldon Core pre-packaged inference servers cannot cover your needs, you can leverage the language wrappers to containerize your machine learning model(s) and logic. With ZenML's Seldon Core Integration, you can create your own custom model deployment code by creating a custom predict function that will be passed to a custom deployment step responsible for preparing a Docker image for the model server.
This custom_predict function should take the model and the input data as arguments and return the output data. ZenML will take care of loading the model into memory and starting the seldon-core-microservice that will be responsible for serving the model and running the predict function.
The path to this custom_predict function can then be passed to the custom deployment parameters.
The full code example can be found here.
Advanced Custom Code Deployment with Seldon Core Integration
Before creating your custom model class, you should take a look at the custom Python model section of the Seldon Core documentation.
The built-in Seldon Core custom deployment step is a good starting point for deploying your custom models. However, if you want to deploy more than the trained model, you can create your own Custom Class and a custom step to achieve this.
Example of the custom class.
The built-in Seldon Core custom deployment step responsible for packaging, preparing, and deploying to Seldon Core can be found here.