Chapter 7
Deploy pipelines to production
If you want to see the code for this chapter of the guide, head over to the GitHub repository.


When developing ML models, your pipelines will most likely start out living on your machine with a local Stack. At a certain point, once you are happy with their design, you will want to transition to a more production-ready setting and deploy the pipelines to a more robust environment.

Install and configure Airflow

This part is optional, and how you do it will depend on your pre-existing production setup. In this guide, Airflow is set up from scratch and run locally; however, you might instead want to use a managed Airflow instance like Cloud Composer or Astronomer.
Either way, you'll want to install Airflow before continuing:
pip install apache_airflow==2.2.0
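If you want to double-check that Airflow landed in your currently active Python environment before moving on, a quick version check is enough (airflow version is a standard Airflow 2.x CLI command):

# Should report 2.2.0 if the install succeeded
airflow version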

Creating an Airflow Stack

A Stack is the configuration of the surrounding infrastructure where ZenML pipelines are run and managed. For now, a Stack consists of:
  • A metadata store: To store metadata like parameters and artifact URIs.
  • An artifact store: To store the interim data produced by step outputs.
  • An orchestrator: A service that actually kicks off and runs each step of the pipeline.
When you did zenml init at the start of this guide, a default local_stack was created with local versions of all of these. To see that stack, you can check it out on the command line:
zenml stack list
Output:
STACKS:
key          stack_type   metadata_store_name    artifact_store_name    orchestrator_name
-----------  -----------  ---------------------  ---------------------  -------------------
local_stack  base         local_metadata_store   local_artifact_store   local_orchestrato
Your local stack when you start
Let's stick with the local_metadata_store and local_artifact_store for now, and create an Airflow orchestrator and a corresponding stack.
zenml orchestrator register airflow_orchestrator airflow
zenml stack register airflow_stack \
    -m local_metadata_store \
    -a local_artifact_store \
    -o airflow_orchestrator
zenml stack set airflow_stack
Output:
Orchestrator `airflow_orchestrator` successfully registered!
Stack `airflow_stack` successfully registered!
Active stack: airflow_stack
Your stack with Airflow as orchestrator
In the real world, we would also switch to something like a MySQL-based metadata store and an Azure/GCP/S3-based artifact store. We have skipped that here to keep everything on one machine and make this guide a bit easier to run.
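As a rough sketch, registering and activating such a production stack would use the same commands shown above. The component names below are purely hypothetical and assume you have already registered a MySQL metadata store and a cloud artifact store as stack components:

# Hypothetical component names -- register the components themselves first
zenml stack register production_stack \
    -m mysql_metadata_store \
    -a gcp_artifact_store \
    -o airflow_orchestrator
zenml stack set production_stack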

Starting up Airflow

ZenML takes care of configuring Airflow; all we need to do is run:
zenml orchestrator up
This will bootstrap Airflow, start up all the necessary components, and run them in the background. When the setup is finished, it will print the username and password for the Airflow webserver to the console.
If you can't find the password on the console, you can navigate to the APP_DIR / airflow / airflow_root / STACK_UUID / standalone_admin_password.txt file. The username will always be admin.
  • APP_DIR will depend on your OS. See which path corresponds to your OS here.
  • STACK_UUID will be the unique id of the airflow_stack. There will be only one folder there, so you can just navigate to the one that is present.
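On a Unix-like shell, something along these lines should print the password once you substitute <APP_DIR> with the application directory for your OS (the wildcard stands in for the stack UUID folder):

# <APP_DIR> is a placeholder -- use the path that matches your OS
cat <APP_DIR>/airflow/airflow_root/*/standalone_admin_password.txt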

Run

The code for this chapter is the same as in the last chapter, so run:
python chapter_7.py
Even though the pipeline script is the same, the output will be quite different from last time. ZenML will detect that airflow_stack is the active stack, and do the following:
  • chapter_7.py will be copied to the Airflow dag_dir so Airflow can detect it as an Airflow DAG definition file.
  • The Airflow DAG will show up in the Airflow UI at http://0.0.0.0:8080. You will have to log in with the username and password generated above.
  • The DAG name will be the same as the pipeline name, so in this case mnist_pipeline.
  • The DAG will be scheduled to run every minute.
  • The DAG will be unpaused, so you'll probably see the first run as you click through.
And that's it: as long as you keep Airflow running, this pipeline will run every minute, pull the latest data, and train a new model!
We now have a continuously training ML pipeline that retrains on newly arriving data. All pipeline runs are tracked in your production Stack's metadata store, the interim artifacts are stored in the artifact store, and the scheduling and orchestration are handled by the orchestrator, in this case Airflow.
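If you'd rather confirm from the command line that the webserver is running, Airflow 2.x also exposes a health endpoint; assuming the default address used above, a quick check could look like this:

# Returns JSON describing the state of the metadatabase and scheduler
curl http://0.0.0.0:8080/health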

Shutting down Airflow

Once we are done experimenting, we need to shut down Airflow by running:
zenml orchestrator down
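If you are also done with the Airflow stack for now, you can switch back to the default local stack with the same command used earlier (assuming it is still registered as local_stack):

zenml stack set local_stack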

Conclusion

If you made it this far, congratulations! You're one step closer to being production-ready with your ML workflows! Here is what we achieved in this entire guide:
  • Experimented locally and built up an ML pipeline.
  • Transitioned to production by deploying a continuously training pipeline on newly arriving data.
  • Retained complete lineage and tracking of parameters, data, code, and metadata all the while.

Coming soon

There are lots more things you can do in production that you might consider adding to your workflows:
  • Adding a step to automatically deploy the models to a REST endpoint.
  • Setting up a drift detection and validation step to test models before deploying.
  • Creating a batch inference pipeline to get predictions.
ZenML will help with all of these and more. Watch out for future releases and the next extension of this guide, coming soon!