Setting up a Project Repository

Setting your team up for success with a well-architected ZenML project.


Welcome to the guide on setting up a well-architected ZenML project. This section will provide you with a comprehensive overview of best practices, strategies, and considerations for structuring your ZenML projects to ensure scalability, maintainability, and collaboration within your team.

The Importance of a Well-Architected Project

A well-architected ZenML project is crucial for the success of your machine learning operations (MLOps). It provides a solid foundation for your team to develop, deploy, and maintain ML models efficiently. By following best practices and leveraging ZenML's features, you can create a robust and flexible MLOps pipeline that scales with your needs.

Key Components of a Well-Architected ZenML Project

Repository Structure

A clean and organized repository structure is essential for any ZenML project. This includes:

  • Proper folder organization for pipelines, steps, and configurations

  • Clear separation of concerns between different components

  • Consistent naming conventions

Learn more about setting up your repository in the Set up your repository section below.

Version Control and Collaboration

Integrating your ZenML project with version control systems like Git is crucial for team collaboration and code management. This allows for:

  • Faster pipeline builds, as you can reuse the same Docker image and have ZenML download code from your repository

  • Easy tracking of changes

  • Collaboration among team members

Discover how to connect your Git repository in the Set up a repository guide.

Stacks, Pipelines, Models, and Artifacts

Understanding the relationship between stacks, pipelines, models, and artifacts is key to designing an efficient ZenML project:

  • Stacks: Define your infrastructure and tool configurations

  • Models: Represent your machine learning models and their metadata

  • Pipelines: Encapsulate your ML workflows

  • Artifacts: Track your data and model outputs

Learn about organizing these components in the Organizing Stacks, Pipelines, Models, and Artifacts guide.

Access Management and Roles

Proper access management ensures that team members have the right permissions and responsibilities:

  • Define roles such as data scientists, MLOps engineers, and infrastructure managers

  • Establish processes for pipeline maintenance and server upgrades

  • Set up service connectors and manage authorizations

  • Leverage Teams in ZenML Pro to assign roles and permissions to a group of users, to mimic your real-world team roles

Explore access management strategies in the Access Management and Roles guide.

Shared Components and Libraries

Leverage shared components and libraries to promote code reuse and standardization across your team:

  • Custom flavors, steps, and materializers

  • Shared private wheels for internal distribution

  • Handling authentication for specific libraries

Find out more about sharing code in the Shared Libraries and Logic for Teams guide.

Project Templates

Utilize project templates to kickstart your ZenML projects and ensure consistency:

  • Use pre-made templates for common use cases

  • Create custom templates tailored to your team's needs

Learn about using and creating project templates in the Project Templates guide.
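
As a sketch, pre-made templates can typically be pulled in when initializing a project; the flag and the placeholder template name below are assumptions to check against the Project Templates guide:

# Initialize a new project from a pre-made ZenML project template
# (the template name is a placeholder; see the Project Templates guide for available templates)
zenml init --template <template_name>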

Migration and Maintenance

As your project evolves, you may need to migrate existing codebases or upgrade your ZenML server:

  • Strategies for migrating legacy code to newer ZenML versions

  • Best practices for upgrading ZenML servers

Discover migration strategies and maintenance best practices in the Migration and Maintenance guide.

Set up your repository

While it doesn't matter how you structure your ZenML project, here is a recommended project structure the core team often uses:

.
├── .dockerignore
├── Dockerfile
├── steps
│   ├── loader_step
│   │   ├── .dockerignore (optional)
│   │   ├── Dockerfile (optional)
│   │   ├── loader_step.py
│   │   └── requirements.txt (optional)
│   └── training_step
│       └── ...
├── pipelines
│   ├── training_pipeline
│   │   ├── .dockerignore (optional)
│   │   ├── config.yaml (optional)
│   │   ├── Dockerfile (optional)
│   │   ├── training_pipeline.py
│   │   └── requirements.txt (optional)
│   └── deployment_pipeline
│       └── ...
├── notebooks
│   └── *.ipynb
├── requirements.txt
├── .zen
└── run.py

All ZenML project templates are modeled around this basic structure. The steps and pipelines folders contain the steps and pipelines defined in your project. If your project is simpler, you can also keep your steps at the top level of the steps folder without structuring them into subfolders.

It might also make sense to register your repository as a code repository. Code repositories enable ZenML to keep track of the code version that you use for your pipeline runs. Additionally, running a pipeline that is tracked in a registered code repository can speed up Docker image building for containerized stack components by eliminating the need to rebuild images each time you change one of your source code files. Learn more about these in the guide on connecting your Git repository.

Steps

Keep your steps in separate Python files. This allows you to optionally keep their utils, dependencies, and Dockerfiles separate.

Logging

ZenML records the root Python logging handler's output into the artifact store as a side effect of running a step. Therefore, when writing steps, use the logging module to record logs, so that they show up in the ZenML dashboard.

# Use ZenML handler
from zenml import step
from zenml.logger import get_logger

logger = get_logger(__name__)
...

@step
def training_data_loader():
    # This will show up in the dashboard
    logger.info("My logs")

Pipelines

Just like steps, keep your pipelines in separate Python files. This allows you to optionally keep their utils, dependencies, and Dockerfiles separate.

It is recommended that you separate the pipeline execution from the pipeline definition so that importing the pipeline does not immediately run it.
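
As a minimal sketch (the module path and step import follow the recommended layout above; all names are illustrative), a pipeline definition module could look like this:

# pipelines/training_pipeline/training_pipeline.py
from zenml import pipeline

from steps.loader_step.loader_step import training_data_loader


@pipeline
def training_pipeline():
    # Importing this module only defines the pipeline; nothing is executed yet
    training_data_loader()

The pipeline is then executed explicitly, for example from run.py as shown further below.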

Do not give pipelines or pipeline instances the name "pipeline". Doing so will shadow the imported pipeline decorator and lead to failures at later stages if more pipelines are decorated in that module.
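
For example, this deliberately wrong snippet shows the pitfall:

from zenml import pipeline

@pipeline
def pipeline():  # BAD: the name now shadows the imported `pipeline` decorator
    ...

@pipeline  # this no longer refers to the decorator but to the pipeline defined above
def my_other_pipeline():
    ...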

Pipeline names are their unique identifiers, so using the same name for different pipelines creates a mixed history in which two runs under the same name may belong to two very different pipelines.

.dockerignore

Containerized orchestrators and step operators load your complete project files into a Docker image for execution. To speed up the process and reduce Docker image sizes, exclude all unnecessary files (like data, virtual environments, git repos, etc.) within the .dockerignore.
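
As an illustrative starting point (these entries are assumptions; adjust them to your project):

# .dockerignore
.git/
.venv/
venv/
data/
notebooks/
.ipynb_checkpoints/
__pycache__/
*.pyc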

Dockerfile (optional)

By default, ZenML uses the official zenml Docker image as a base for all pipeline and step builds. You can use your own Dockerfile to overwrite this behavior.
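
A custom Dockerfile would usually extend the official image; the image name, tag, and extra package below are placeholders to verify against the ZenML documentation:

# Dockerfile
# Base image name and tag are placeholders - check the ZenML docs for the official image
FROM zenmldocker/zenml:<version>

# Install any additional dependencies your steps need
RUN pip install --no-cache-dir <some-extra-package>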

Notebooks

Collect all your notebooks in one place.

.zen

By running zenml init at the root of your project, you define the project scope for ZenML. In ZenML terms, this is called your "source root", which is used to resolve import paths and store configurations.

  • When running Jupyter notebooks, you must have a .zen directory initialized in one of the parent directories of your notebook.

  • When running regular Python scripts, it is still highly recommended that you have a .zen directory initialized in the root of your project. If there is none, ZenML looks for a .zen directory in the parent directories, which might cause issues if one is found (for example, import paths will no longer be relative to the source root). If no .zen directory is found, the parent directory of the Python file you are executing is used as the implicit source root.

All of your import paths should be relative to the source root.
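
For example, from the root of your project (the path is a placeholder):

cd /path/to/your/project  # the directory that should become the source root
zenml init                # creates the .zen directory and marks this directory as the source root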

run.py

Putting your pipeline runners in the root of the repository ensures that all imports defined relative to the project root resolve for the pipeline runner. If no .zen directory is defined, this also determines the implicit source root.
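
A minimal run.py could look like the following sketch, reusing the illustrative training_pipeline module from the Pipelines section above:

# run.py
from pipelines.training_pipeline.training_pipeline import training_pipeline

if __name__ == "__main__":
    # Trigger a pipeline run only when this file is executed directly
    training_pipeline()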

