What is a ZenML Repository?¶
Every ZenML project starts inside a ZenML repository. Think of it as a normal Git repository with some added bonuses on top. To create a ZenML repository, run the following after installing ZenML:
    zenml init
Please make sure you are inside a valid Git repository before executing the above!
The initialization does the following:
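Creates a .zenml folder at the root of your repository, which by default holds a local Artifact Store and a SQLite Metadata Store.
Creates an empty pipelines directory at the root as well, which is the path where all your pipeline configurations will be stored.
Creates a .zenml_config YAML configuration file inside the .zenml folder that tracks defaults.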
If you want to change your default Metadata Store, Artifact Store, or Pipelines Directory, please use the zenml config CLI group.
    # Display the current property
    zenml config PROPERTY get
    # Set the current property
    zenml config PROPERTY set [OPTIONS] ARGUMENTS
Local vs Global Config¶
Similar to other tools like Git, ZenML maintains both a per-repository configuration and a global configuration on your machine. As mentioned above, the local configuration is stored in a .zenml/ directory at the root of your repository. This configuration is written in YAML and may look like this:
    artifact_store: /path/to/zenml/repo/.zenml/local_store
    metadata:
      args:
        uri: /path/to/zenml/repo/.zenml/local_store/metadata.db
      type: sqlite
    pipelines_dir: /path/to/zenml/repo/pipelines
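To make the shape of this file concrete, here is a minimal Python sketch that rebuilds the same default values from a repository root. The helper name default_local_config is hypothetical and is not part of the ZenML API; it only mirrors the keys and paths shown in the example above.

```python
from pathlib import Path


def default_local_config(repo_root):
    """Return a dict mirroring the example local configuration.

    Hypothetical helper for illustration only; not part of ZenML.
    """
    root = Path(repo_root)
    # In the example, both default stores live under .zenml/local_store.
    local_store = root / ".zenml" / "local_store"
    return {
        "artifact_store": str(local_store),
        "metadata": {
            "args": {"uri": str(local_store / "metadata.db")},
            "type": "sqlite",
        },
        "pipelines_dir": str(root / "pipelines"),
    }


config = default_local_config("/path/to/zenml/repo")
print(config["metadata"]["type"], config["pipelines_dir"])
```

Every default path is derived from the repository root, which is why the local configuration lives inside the repository rather than in the global config.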
The global config, on the other hand, stores global information such as a unique anonymous UUID for your ZenML installation, as well as metadata regarding usage of your ZenML package. On most systems it can be found in the
.config directory at the path
Recommended Repository Structure¶
The only requirement for the structure of any ZenML repository is that it should be a Git-enabled repository. However, we recommend structuring a ZenML repo as follows. Not everything here is required; it is simply an organization that maximizes the effectiveness of ZenML.
    repository
    │   requirements.txt
    │   pipeline_run_script.py
    │   .gitignore
    │   Dockerfile
    │
    ├───.git
    │
    ├───.zenml
    │       .zenml_config
    │       local_store/
    │
    ├───notebooks
    │       pipeline_exploration.ipynb
    │       pipeline_evaluation.ipynb
    │
    ├───pipelines
    │       pipeline_1.yaml
    │       pipeline_2.yaml
    │       pipeline_n.yaml
    │
    ├───split
    │   │   __init__.py
    │   │
    │   └───my_split_module
    │       │   step.py
    │       │   __init__.py
    │
    ├───preprocessing
    │   │   __init__.py
    │   │
    │   └───my_preprocessing_module
    │       │   step.py
    │       │   __init__.py
    │
    ├───trainers
    │   │   __init__.py
    │   │
    │   └───my_trainer_module
    │       │   step.py
    │       │   __init__.py
    │
    └───other_step
        │   __init__.py
        │
        └───my_step_module
            │   step.py
            │   __init__.py
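If you want to set up the Step folders from this layout without creating them by hand, a small script can do it. The following is a minimal sketch; scaffold_steps and STEP_MODULES are hypothetical names, not ZenML functionality, and the script deliberately leaves .zenml and pipelines alone because the initialization already handles those.

```python
from pathlib import Path

# Step types and example module names taken from the layout above.
STEP_MODULES = {
    "split": "my_split_module",
    "preprocessing": "my_preprocessing_module",
    "trainers": "my_trainer_module",
    "other_step": "my_step_module",
}


def scaffold_steps(repo_root):
    """Create the Step folders and the notebooks directory (hypothetical helper)."""
    root = Path(repo_root)
    created = []
    for step_type, module in STEP_MODULES.items():
        module_dir = root / step_type / module
        module_dir.mkdir(parents=True, exist_ok=True)
        # Each Step type and each module is a Python package with a step.py.
        for path in (
            module_dir.parent / "__init__.py",
            module_dir / "__init__.py",
            module_dir / "step.py",
        ):
            path.touch(exist_ok=True)
            created.append(path)
    (root / "notebooks").mkdir(exist_ok=True)
    return created
```

Calling scaffold_steps(".") from the repository root creates empty placeholder files only; existing files are left untouched, so it is safe to re-run.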
Some things to note:
There can be many scripts of the type pipeline_run_script.py, and they can be placed in their own directory. These files are where the actual ZenML pipeline is constructed. When using ZenML in a CI/CD setting with automated runs, these files can be checked into source control as well.
You can put pipeline construction files anywhere within a ZenML repo, and not just the root. ZenML figures out automatically from which context you are executing and always finds a reference to the root of the repository!
The Dockerfile is necessary in case custom images are required for non-local pipeline runs. This too can be automated via a simple CI/CD scheme.
The notebooks directory is for analysis before and after pipeline runs, to understand what went right (or wrong) as the experiments develop. Once decisions are made, the code developed here should be refactored into the appropriate Step directories so that it is persisted and tracked by ZenML.
Notice that each type of Step has its own root folder, which contains individual modules for different implementations of it. This allows for flexible git pinning and easier development as this repository grows.
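The automatic root lookup mentioned above can be pictured as an upward directory search. Below is a minimal sketch, assuming the repository root is identified by its .zenml folder; find_repo_root is a hypothetical name, and ZenML's actual implementation may work differently.

```python
from pathlib import Path


def find_repo_root(start="."):
    """Walk upwards from `start` until a directory containing .zenml is found.

    Hypothetical sketch of the lookup; not ZenML's own code.
    """
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        if (candidate / ".zenml").is_dir():
            return candidate
    raise FileNotFoundError(f"No .zenml directory found at or above {current}")
```

With a lookup like this, a pipeline construction file inside trainers/my_trainer_module would resolve to the same root as one placed at the top level.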
Let us know your structuring via Slack so we can improve this recommendation!