Discovering the core concepts behind ZenML.
ZenML is an extensible, open-source MLOps framework for creating portable, production-ready MLOps pipelines. It's built for data scientists, ML engineers, and MLOps developers to collaborate as they move from development to production. To achieve this goal, ZenML introduces concepts for different aspects of an ML workflow, and we can group these concepts under three threads: development, execution, and management.
First, let's look at the main concepts which play a role during the development stage of an ML workflow with ZenML.
At its core, ZenML follows a pipeline-based workflow for your projects. A pipeline consists of a series of steps, organized in any order that makes sense for your use case. Below, you can see four steps running one after another in a pipeline.
Representation of a pipeline DAG.
As seen in the image, a step might use the outputs from a previous step and thus must wait until the previous step completes before starting. This is something you can keep in mind when organizing your steps.
Pipelines and steps are defined in code using Python decorators or classes. This is where the core business logic and value of your work lives, and you will spend most of your time defining these two things.
Artifacts represent the data that goes through your steps as inputs and outputs and they are stored in the artifact store. The serialization and deserialization logic of artifacts is defined by Materializers.
Materializers define how Artifacts live in-between steps. More precisely, they define how data of a particular type can be serialized/deserialized, so that the steps are able to load the input data and store the output data.
All materializers use the base abstraction called the BaseMaterializer class. ZenML ships with built-in materializers for various data types, but if you are using a library or a tool that doesn't work with the built-in options, you can write your own custom materializer to ensure that your data can be passed from step to step.
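To make the save/load idea concrete, here is a minimal sketch in plain Python. It does not use ZenML's actual BaseMaterializer; the names `ToyMaterializer` and `JsonDictMaterializer` and the `data.json` file layout are hypothetical, chosen only to illustrate how a type-specific serializer lets one step write an output and the next step read it back.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class ToyMaterializer(ABC):
    """Hypothetical stand-in for a materializer base class (not ZenML's API)."""

    def __init__(self, uri: Path) -> None:
        self.uri = uri  # directory where this artifact is stored

    @abstractmethod
    def save(self, data: Any) -> None:
        """Serialize `data` into the artifact directory."""

    @abstractmethod
    def load(self) -> Any:
        """Deserialize the artifact from the artifact directory."""


class JsonDictMaterializer(ToyMaterializer):
    """Handles plain dicts by writing them to a JSON file."""

    def save(self, data: dict) -> None:
        self.uri.mkdir(parents=True, exist_ok=True)
        (self.uri / "data.json").write_text(json.dumps(data))

    def load(self) -> dict:
        return json.loads((self.uri / "data.json").read_text())


# One step stores its output; a downstream step loads it as input.
with tempfile.TemporaryDirectory() as tmp:
    materializer = JsonDictMaterializer(Path(tmp) / "metrics")
    materializer.save({"accuracy": 0.9})
    restored = materializer.load()
```

A custom materializer for an unsupported type would follow the same pattern: one method that writes the object to storage and one that reconstructs it.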
When we think about steps as functions, we know they receive input in the form of artifacts. We also know that they produce output (in the form of artifacts, stored in the artifact store). But steps also take parameters. The parameters that you pass into the steps are also (helpfully!) stored by ZenML. This helps freeze the iterations of your experimentation workflow in time, so you can return to them exactly as you ran them. On top of the parameters that you provide for your steps, you can also use different Settings to configure the runtime behavior of your infrastructure and pipelines.
Once you have implemented your workflow by using the concepts described above, you can focus your attention on the execution of the pipeline run.
When you want to execute a pipeline run with ZenML, Stacks come into play. A Stack is a collection of stack components, where each component represents the respective configuration regarding a particular function in your MLOps pipeline such as orchestration systems, artifact repositories, and model deployment platforms.
For instance, if you take a close look at the default local stack of ZenML, you will see two components that are required in every stack in ZenML, namely an orchestrator and an artifact store.
ZenML running code on the Local Stack.
Keep in mind that each of these components is built on top of a base abstraction and is completely extensible.
An Orchestrator is a workhorse that coordinates all the steps to run in a pipeline. Since pipelines can be set up with complex combinations of steps with various asynchronous dependencies between them, the orchestrator acts as the component that decides what steps to run and when to run them.
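The core job described above, deciding which step runs when, can be sketched with a topological sort. This is a toy illustration using Python's standard `graphlib` module, not ZenML's orchestrator; the step functions and the `PIPELINE` mapping are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple


def load_data() -> List[int]:
    return [1, 2, 3, 4]


def train(data: List[int]) -> float:
    return sum(data) / len(data)


def evaluate(data: List[int], model: float) -> str:
    return f"mean={model} over {len(data)} rows"


# Step name -> (function, names of upstream steps whose outputs it consumes).
PIPELINE: Dict[str, Tuple[Callable[..., Any], List[str]]] = {
    "evaluate": (evaluate, ["load_data", "train"]),
    "train": (train, ["load_data"]),
    "load_data": (load_data, []),
}


def run_pipeline(
    pipeline: Dict[str, Tuple[Callable[..., Any], List[str]]]
) -> Tuple[List[str], Dict[str, Any]]:
    """Run every step only after all of its upstream steps have finished."""
    graph = {name: set(upstream) for name, (_, upstream) in pipeline.items()}
    order = list(TopologicalSorter(graph).static_order())
    outputs: Dict[str, Any] = {}
    for name in order:
        func, upstream = pipeline[name]
        # Feed each upstream output in as a positional argument.
        outputs[name] = func(*(outputs[u] for u in upstream))
    return order, outputs


order, outputs = run_pipeline(PIPELINE)
```

Even though the steps are declared in reverse, the sorter recovers a valid execution order from the dependencies alone. A production orchestrator would additionally schedule independent steps in parallel and handle failures, which this sketch leaves out.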
ZenML comes with a default local orchestrator designed to run on your local machine. This is useful, especially during the exploration phase of your project. You don't have to rent a cloud instance just to try out basic things.
An Artifact Store is a component that houses all data that pass through the pipeline as inputs and outputs. Each artifact that gets stored in the artifact store is tracked and versioned and this allows for extremely useful features like data caching which speeds up your workflows.
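The link between tracking, versioning, and caching can be shown with a small in-memory sketch. This is not ZenML's artifact store or its caching logic; `ToyArtifactStore` and its cache key (step name, inputs, and parameters) are simplifying assumptions, and a real system would typically also account for changes to the step's code.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class ToyArtifactStore:
    """Hypothetical in-memory store that versions outputs and caches step runs."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Any]] = {}  # artifact name -> all versions
        self._cache: Dict[str, Any] = {}  # cache key -> stored output

    def put(self, name: str, value: Any) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(value)
        return len(versions)

    def versions(self, name: str) -> List[Any]:
        return list(self._versions.get(name, []))

    def run_cached(
        self, step: Callable[..., Any], inputs: Dict[str, Any], params: Dict[str, Any]
    ) -> Tuple[Any, bool]:
        """Return (output, cache_hit); re-run only if inputs or params changed."""
        key_source = json.dumps(
            {"step": step.__name__, "inputs": inputs, "params": params},
            sort_keys=True,
        )
        key = hashlib.sha256(key_source.encode()).hexdigest()
        if key in self._cache:
            return self._cache[key], True
        output = step(**inputs, **params)
        self._cache[key] = output
        self.put(step.__name__, output)
        return output, False


calls: List[int] = []  # records every real execution of the step


def scale(values: List[int], factor: int) -> List[int]:
    calls.append(factor)
    return [v * factor for v in values]


store = ToyArtifactStore()
first, first_hit = store.run_cached(scale, {"values": [1, 2]}, {"factor": 3})
second, second_hit = store.run_cached(scale, {"values": [1, 2]}, {"factor": 3})
third, third_hit = store.run_cached(scale, {"values": [1, 2]}, {"factor": 4})
```

The second call is served from the cache because nothing changed, while the third call re-executes the step and records a new version because the parameter differs.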
Similar to the orchestrator, ZenML comes with a default local artifact store designed to run on your local machine. This is useful, especially during the exploration phase of your project. You don't have to set up a cloud storage system to try out basic things.
ZenML provides a dedicated base abstraction for each stack component type. These abstractions are used to develop solutions, called Flavors, tailored to specific use cases/tools. With ZenML installed, you get access to a variety of built-in and integrated Flavors for each component type, but users can also leverage the base abstractions to create their own custom flavors.
When it comes to production-grade solutions, it is rarely enough to just run your workflow locally without including any cloud infrastructure.
Thanks to the separation between the pipeline code and the stack in ZenML, you can easily switch your stack independently from your code. For instance, all it would take you to switch from an experimental local stack running on your machine to a remote stack that employs a full-fledged cloud infrastructure is a single CLI command.
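The separation can be pictured as pipeline code that receives its infrastructure as a value. The sketch below is plain Python, not ZenML's stack API; `ToyStack`, the component names, and the bucket path are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToyStack:
    """Hypothetical stack: a named set of infrastructure choices."""

    name: str
    orchestrator: str
    artifact_store: str


def ingest() -> str:
    return "ingest"


def train() -> str:
    return "train"


# The pipeline definition never mentions any infrastructure.
PIPELINE: List[Callable[[], str]] = [ingest, train]


def run(steps: List[Callable[[], str]], stack: ToyStack) -> List[str]:
    """Execute the same steps against whichever stack is active."""
    return [
        f"[{stack.orchestrator}] {step()} -> {stack.artifact_store}"
        for step in steps
    ]


local_stack = ToyStack("local", orchestrator="local", artifact_store="./artifacts")
remote_stack = ToyStack(
    "remote",
    orchestrator="hypothetical-cloud-orchestrator",
    artifact_store="s3://hypothetical-bucket",
)

local_log = run(PIPELINE, local_stack)
remote_log = run(PIPELINE, remote_stack)
```

Only the stack object changes between the two runs; the step functions and the pipeline list stay identical, which is the property that makes switching environments cheap.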
Switching between stacks with ZenML.
In order to benefit from the aforementioned core concepts to their fullest extent, it is essential to deploy and manage a production-grade environment that interacts with your ZenML installation.
First, in order to utilize stack components that are running remotely on a cloud infrastructure, you need to deploy a ZenML Server, so that it can communicate with these stack components and run your pipelines.
Visualization of the relationship between code and infrastructure.
On top of the communication with the stack components, the ZenML Server also keeps track of all the bits of metadata around a pipeline run. With a ZenML server, you are able to access all of your previous experiments with the associated details. This is extremely helpful in troubleshooting.
The ZenML Server also acts as a centralized secrets store that safely and securely stores sensitive data such as credentials used to access the services that are part of your stack. It can be configured to use a variety of different backends for this purpose, such as the AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, and Hashicorp Vault.
Secrets are sensitive data that you don't want to store in your code or configure alongside your stacks and pipelines. ZenML includes a centralized secrets store that you can use to store and access your secrets securely.
Collaboration is a crucial aspect of any MLOps team as they often need to bring together individuals with diverse skills and expertise to create a cohesive and effective workflow for machine learning projects. A successful MLOps team requires seamless collaboration between data scientists, engineers, and DevOps professionals to develop, train, deploy, and maintain machine learning models.
With a deployed ZenML Server, users have the ability to create their own teams and project structures. They can easily share pipelines, runs, stacks, and other resources, streamlining the workflow and promoting teamwork.
The ZenML Dashboard also communicates with the ZenML Server to visualize your pipelines, stacks, and stack components. It serves as the visual interface for collaborating with ZenML: you can invite users and share your stacks with them.
When you start working with ZenML, you'll begin with a local ZenML setup, and when you want to transition to a remote setup, you will need to deploy ZenML. Don't worry though, there is a one-click way to do it, which we'll learn about later.
The ZenML Hub is a central platform that enables our users to search, share and discover community-contributed code, such as flavors, materializers, and steps, that can be used across organizations. The goal is to allow our users to extend their ZenML experience by leveraging the community's diverse range of implementations.
The ZenML Hub revolves around the concept of plugins, which can be made up of one or multiple ZenML entities, including flavors, materializers, and steps. Aside from the implementation of these entities, every plugin in the hub is also equipped with