Set up CI/CD
Managing the lifecycle of a ZenML pipeline with Continuous Integration and Delivery
Until now, we have been executing ZenML pipelines locally. While this is a good way to operate pipelines during development, in production it is often desirable to mediate runs through a central workflow engine baked into your CI.
This allows data scientists to experiment with data processing and model training locally and then have code changes automatically tested and validated through the standard pull request/merge request peer review process. Changes that pass the CI and code review are then deployed automatically to production. Here is what this could look like:
To illustrate this, let's walk through how this process could be set up with a GitHub repository. We'll be using GitHub Actions to set up a proper CI/CD workflow.
To see this in action, check out the ZenML Gitflow Repository. This repository showcases how ZenML can be used for machine learning with a GitHub workflow that automates CI/CD with continuous model training and continuous model deployment to production. The repository is also meant to be used as a template: you can fork it and easily adapt it to your own MLOps stack, infrastructure, code and data.
To facilitate machine-to-machine connections, you need to create an API key within ZenML. Learn more about those here.
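A minimal sketch of creating a service account with an API key via the ZenML CLI; the service account name `github_action_api_key` is an arbitrary example:

```bash
# Create a service account; ZenML prints an API key for it on creation.
# The name "github_action_api_key" is just an example.
zenml service-account create github_action_api_key
```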
This will return the API key to you. It will not be shown to you again, so make sure to copy it for use in the next section.
For our GitHub Actions workflow, we will need to set up some secrets for our repository. Specifically, you should use GitHub secrets to store the ZENML_API_KEY that you created above.
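As a sketch, you could set the secrets with the GitHub CLI; storing the server URL as a `ZENML_HOST` secret is an assumption here, so adapt the names to whatever your workflow reads:

```bash
# Store the API key as a repository secret (you will be prompted for the value).
gh secret set ZENML_API_KEY
# Optionally store the server URL as a secret too, if you prefer not to hard-code it.
gh secret set ZENML_HOST
```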
The other values that are loaded from secrets into the environment in the workflow below can also be set explicitly or as repository variables.
You might not necessarily want to use the same stack, with the same resources, for staging and production.
This step is optional; all you'll need for certain is a stack that runs remotely (remote orchestration and artifact storage). The rest is up to you. You might, for example, want to parametrize your pipeline to use different data sources for the respective environments. You can also use different configuration files for the different environments to configure the Model, the DockerSettings, or the ResourceSettings (like accelerators) differently, as sketched below.
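A minimal sketch of what a per-environment run configuration file could look like; the file name, model name, and concrete values are assumptions you should adapt:

```yaml
# config_staging.yaml -- hypothetical staging configuration
model:
  name: my_model          # assumed model name
  version: staging
settings:
  docker:
    requirements: requirements.txt
  resources:
    cpu_count: 2
    gpu_count: 0          # production might request accelerators here instead
```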
To ensure that only fully working code makes it into production, you should use a staging environment to test all the changes made to your code base and verify they work as intended. To do this automatically, you should set up a GitHub Actions workflow that runs your pipeline for you when you make changes to it. Here is an example that you can use.
To only run the GitHub Action on a PR, you can configure the YAML like this:
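A minimal sketch of the trigger configuration; the event types shown are a common choice for re-running checks on every push to a PR:

```yaml
on:
  pull_request:
    types: [opened, synchronize]
```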
When the workflow starts, we want to set some important values. Here is a simplified version that you can use:
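A simplified sketch of the job environment; the job name, runner label, stack name, and the `ZENML_HOST` secret are assumptions you should adapt:

```yaml
jobs:
  run-staging-workflow:
    runs-on: ubuntu-latest                               # assumed runner
    env:
      ZENML_STORE_URL: ${{ secrets.ZENML_HOST }}         # URL of your ZenML server
      ZENML_STORE_API_KEY: ${{ secrets.ZENML_API_KEY }}  # the API key created earlier
      ZENML_STACK: stack_name                            # hypothetical name of your stack
      ZENML_GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
      ZENML_GITHUB_URL_PR: ${{ github.event.pull_request._links.html.href }}
```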
After configuring these values so they apply to your specific situation, the rest of the template should work as is for you. Specifically, you will need to install all requirements, connect to your ZenML server, set an active stack, and run a pipeline within your GitHub Action.
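A sketch of the corresponding steps, assuming your dependencies live in `requirements.txt` and a hypothetical `run.py` entry point triggers the pipeline:

```yaml
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"        # assumed Python version

      - name: Install requirements
        run: pip install -r requirements.txt

      - name: Confirm ZenML server connection
        run: zenml status

      - name: Set the active stack
        run: zenml stack set ${{ env.ZENML_STACK }}

      - name: Run the pipeline
        run: python run.py              # hypothetical entry point for your pipeline
```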
Now, when you push to a branch that is part of an open pull request, this action will run automatically.
Finally, you can configure your GitHub Actions workflow to leave a report based on the pipeline that was run. Check out the template for this [here](https://github.com/zenml-io/zenml-gitflow/blob/main/.github/workflows/pipeline_run.yaml#L87-L99).
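As a sketch of what such a step could look like, the following posts the contents of a report file as a sticky PR comment; the choice of action and the `report.md` file name are assumptions, not necessarily what the template uses:

```yaml
      - name: Comment on the PR with the pipeline report
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          path: report.md               # hypothetical report file written by the pipeline
```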