How to configure MLOps tooling and infrastructure with stacks
Machine learning in production is not just about designing and training models. It is a fragmented space comprising a wide variety of tasks, ranging from experiment tracking to orchestration, from model deployment to monitoring, from drift detection to feature stores, and much more. Even though seemingly well-established solutions already exist for many of these individual tasks, it can become increasingly difficult to build a reliable, modular production system once all of these solutions are brought together.
This problem is especially critical when switching from a research setting to a production setting. Due to a lack of standards, the time and resources invested in proofs of concept frequently go completely to waste, because the initial system cannot easily be transferred to a production-grade setting.
At ZenML, we believe that this is one of the most important and challenging problems in the field of MLOps, and it can be solved with a set of standards and well-structured abstractions. Owing to the nature of MLOps, it is essential that these abstractions not only cover concepts such as pipelines and steps but also the infrastructure elements on which the pipelines run.
Taking this into consideration, ZenML provides additional abstractions that help you simplify infrastructure configuration and management: Stacks, Stack Components, and Flavors.
Let's discuss each in further detail:
In ZenML, a Stack represents a set of configurations for your MLOps tools and infrastructure. For instance, you might want to orchestrate your pipelines with Kubeflow while storing your artifacts in an Amazon S3 bucket.
In the illustration, you see one user register two stacks, a Local Stack and a Production Stack. These stacks can easily be shared with other people - something we'll dig into more later.
Running your pipeline in the cloud
Any such combination of tools and infrastructure can be registered as a separate stack in ZenML. Since ZenML code is tooling-independent, you can switch between stacks with a single command and then automatically execute your ML workflows on the desired stack without having to modify your code.
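For instance, switching the active stack is a single CLI call. The sketch below assumes two stacks named local_stack and production_stack have already been registered (the names are hypothetical; `zenml stack set` is the command that activates a stack):

```
# Run the pipeline on local infrastructure:
zenml stack set local_stack
python run_pipeline.py

# Run the exact same code on production infrastructure:
zenml stack set production_stack
python run_pipeline.py
```

Note that run_pipeline.py does not change between the two runs; only the active stack does.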
By default, every ZenML project that you create already comes with an initial active
default stack. If you followed the code examples in the Steps and Pipelines section, then you have already used this stack implicitly to run all of your pipelines.
This stack features two stack components: a local Orchestrator and a local Artifact Store.
Speaking of stack components...
In ZenML, each MLOps tool is associated with a specific Stack Component, which is responsible for one specific task of your ML workflow. All stack components are grouped into categories.
For instance, each ZenML stack (e.g., the default stack above) includes an Orchestrator, which is responsible for executing the steps within your pipeline, and an Artifact Store, which is responsible for storing the artifacts generated by your pipelines.
Check out the Categories of MLOps Tools page for a detailed overview of available stack components in ZenML.
The Orchestrator is the component that defines how and where each pipeline step is executed when calling
pipeline.run(). By default, all runs are executed locally, but by configuring a different orchestrator you can, e.g., automatically execute your ML workflows on Kubeflow instead.
Under the hood, all the artifacts in our ML pipeline are automatically stored in an Artifact Store. By default, this is simply a place in your local file system, but we could also configure ZenML to store this data in a cloud bucket like Amazon S3 or any other place instead.
Every stack usually contains exactly one stack component of each category, e.g., one
Artifact Store, one Orchestrator, etc. In some cases, however, a stack can contain multiple components of the same category (e.g., two
Step Operators in one stack). We will discuss this in later chapters.
The specific tool you are using is called a Flavor of the stack component. For example, Kubeflow is a flavor of the Orchestrator stack component category.
Out-of-the-box, ZenML already comes with a wide variety of flavors, which are either built-in or enabled through the installation of specific Integrations.
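To make the relationship between stacks, component categories, and flavors concrete, here is a deliberately simplified Python sketch. This is not ZenML's actual implementation - all class names, fields, and stack/component names below are hypothetical, chosen only to mirror the concepts described above:

```python
from dataclasses import dataclass

# Hypothetical model of the concepts -- NOT ZenML's real classes.
@dataclass(frozen=True)
class StackComponent:
    category: str  # the task it covers, e.g. "orchestrator", "artifact_store"
    flavor: str    # the concrete tool, e.g. "local", "kubeflow", "s3"
    name: str      # the user-chosen name of this registered component

@dataclass(frozen=True)
class Stack:
    name: str
    orchestrator: StackComponent    # one component per category...
    artifact_store: StackComponent  # ...in the usual case

# A default-style stack: pipelines run and store everything locally.
local_stack = Stack(
    name="default",
    orchestrator=StackComponent("orchestrator", "local", "default"),
    artifact_store=StackComponent("artifact_store", "local", "default"),
)

# A production-style stack: same pipeline code, different infrastructure.
prod_stack = Stack(
    name="production",
    orchestrator=StackComponent("orchestrator", "kubeflow", "kf_orchestrator"),
    artifact_store=StackComponent("artifact_store", "s3", "s3_store"),
)

print(local_stack.orchestrator.flavor)   # -> local
print(prod_stack.orchestrator.flavor)    # -> kubeflow
```

The point of the sketch: the pipeline code never mentions Kubeflow or S3 directly; swapping the active stack is what changes where steps run and where artifacts land.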
Our CLI features a wide variety of commands that let you manage and use your stacks. If you would like to learn more, please run
"zenml stack --help" or visit our CLI docs.
You can see a list of all your registered stacks with the following command:
zenml stack list
Similarly, you can see all registered stack components of a specific type using
zenml <STACK_COMPONENT_CATEGORY> list, e.g.:
zenml orchestrator list
In order to see all the available flavors for a specific stack component use
zenml <STACK_COMPONENT_CATEGORY> flavor list, e.g.:
zenml orchestrator flavor list
You can also see details of configuration parameters available for a flavor with
zenml <STACK_COMPONENT_CATEGORY> flavor describe <FLAVOR>, e.g.:
zenml orchestrator flavor describe kubeflow
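Putting these commands together, registering a custom stack might look like the following sketch. The component and stack names are hypothetical, and the exact registration flags for each flavor may differ by ZenML version - use the flavor describe command above to check what your flavor expects:

```
# Register the individual components (the flavors must be available,
# e.g. via the corresponding ZenML integrations):
zenml orchestrator register kf_orchestrator --flavor=kubeflow
zenml artifact-store register s3_store --flavor=s3 --path=s3://my-bucket

# Combine them into a stack and activate it:
zenml stack register production_stack -o kf_orchestrator -a s3_store
zenml stack set production_stack

# Verify that the new stack is active:
zenml stack list
```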