Configure a code repository
Connect a Git repository to ZenML to track code changes and collaborate on MLOps projects.
Throughout the lifecycle of an MLOps pipeline, waiting for a Docker build after every pipeline run gets tiresome, even when the local Docker cache is used. However, there is a way to build the pipeline image once and keep reusing it until the pipeline environment changes: connecting a code repository.
With ZenML, connecting a Git repository lets the client skip Docker builds that are not needed. As a bonus, it gives you a better way to manage repository changes and collaborate on code. Here is how the flow changes when running a pipeline:
You trigger a pipeline run on your local machine. ZenML parses the @pipeline function to determine the necessary steps.
The local client requests stack information from the ZenML server, which responds with the cloud stack configuration.
The local client detects that a code repository is in use and requests the necessary information from the Git repository.
Instead of building a new Docker image, the client checks if an existing image can be reused based on the current Git commit hash and other environment metadata.
The client initiates a run in the orchestrator, which sets up the execution environment in the cloud, such as a VM.
The orchestrator downloads the code directly from the Git repository and uses the existing Docker image to run the pipeline steps.
Pipeline steps execute, storing artifacts in the cloud-based artifact store.
Throughout the execution, the pipeline run status and metadata are reported back to the ZenML server.
By connecting a Git repository, you avoid redundant builds and make your MLOps processes more efficient. Your team can work on the codebase simultaneously, with ZenML handling the version tracking and ensuring that the correct code version is always used for each run.
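Because reuse is keyed on the current Git commit hash, it helps to know which commit your working directory is at and whether it contains uncommitted changes. The helper below is a hypothetical illustration: the name repo_state is ours, and it is not ZenML's internal check. It assumes that a clean working tree at a known commit is what makes an existing image a candidate for reuse.

```shell
# Hypothetical helper, not part of ZenML: report the current commit hash and
# whether the working tree has uncommitted or untracked changes.
repo_state() {
  local commit_hash
  # Returns non-zero outside a Git repository or before the first commit.
  commit_hash="$(git rev-parse HEAD)" || return 1
  if [ -n "$(git status --porcelain)" ]; then
    echo "dirty ${commit_hash}"
  else
    echo "clean ${commit_hash}"
  fi
}
```

Calling repo_state from the project root before triggering a run prints either "clean" or "dirty" followed by the commit hash. A "dirty" result means the code on disk does not correspond to any single commit.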
While ZenML supports many different flavors of Git repositories, this guide focuses on GitHub. To create a repository on GitHub:
Sign in to GitHub.
Click the "+" icon and select "New repository."
Name your repository, set its visibility, and add a README or .gitignore if needed.
Click "Create repository."
We can now push our local code (from the previous chapters) to GitHub, substituting your own GitHub information for YOUR_USERNAME and YOUR_REPOSITORY_NAME.
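A minimal sketch of the commands involved, assuming the project directory is not yet a Git repository, that you are pushing over HTTPS, and that the branch should be called main (adjust the branch name if yours differs):

```shell
# Run from the root of the project directory.
git init
git add .
git commit -m "Initial commit"

# Rename the current branch to main (an assumption; keep your own branch name if you prefer).
git branch -M main

# YOUR_USERNAME and YOUR_REPOSITORY_NAME are placeholders for your GitHub details.
git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPOSITORY_NAME.git
git push -u origin main
```

The -u flag sets origin/main as the upstream branch, so later pushes from this branch need only git push.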
To connect your GitHub repository to ZenML, you'll need a GitHub Personal Access Token (PAT).
Now we can install the GitHub integration and register the repository, filling in <REPO_NAME>, YOUR_USERNAME, YOUR_REPOSITORY_NAME, and YOUR_GITHUB_PERSONAL_ACCESS_TOKEN with your details.
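The two steps can be sketched as follows. The flag names reflect the ZenML CLI as we understand it and may differ between ZenML versions, so treat them as an assumption and confirm them with zenml code-repository register --help before running:

```shell
# Install the GitHub integration for ZenML.
zenml integration install github

# Register the repository with the ZenML server.
# <REPO_NAME> and every value in capitals are placeholders to replace.
zenml code-repository register <REPO_NAME> --type=github \
  --owner=YOUR_USERNAME \
  --repository=YOUR_REPOSITORY_NAME \
  --token=YOUR_GITHUB_PERSONAL_ACCESS_TOKEN
```

Passing the token on the command line can leave it in your shell history. Reading it from an environment variable is a safer variation if that is a concern.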
Your code is now connected to your ZenML server. ZenML will automatically detect whether your source files are tracked in the registered GitHub repository and store the commit hash for each subsequent pipeline run.
You can try this out by running our training pipeline again.
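The exact command depends on how your project from the previous chapters is laid out. As an illustration only, assuming a hypothetical run.py entry point that accepts a --training-pipeline flag, it might look like this:

```shell
# Hypothetical entry point and flag; use whichever command you ran the
# training pipeline with in the previous chapters.
python run.py --training-pipeline
```

Commit and push your changes first, so that the commit recorded for the run matches the code in the remote repository.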
You can read more about the ZenML Git integration in the ZenML documentation.