Every ZenML project starts inside a ZenML repository. Think of it as a normal Git repository with some added bonuses on top. To create a ZenML repository, create a Git repository and run the following inside it:

    zenml init

The initialization will execute the following steps:
It will create a default local SQLite metadata store and artifact store inside a .zenml folder in the root of your repository.
It will create an empty pipelines directory at the root as well, which is the path where all your pipeline configurations will be stored by default.
It will create a .zenml_config YAML configuration file inside the .zenml folder that tracks these defaults.
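The layout described above can be checked with a few lines of Python. This is only a sketch: `describe_repo_layout` is a hypothetical helper, not part of ZenML, and it assumes the folder and file names are exactly the ones listed in the steps above.

```python
from pathlib import Path


def describe_repo_layout(repo_root: Path) -> dict:
    """Report which of the paths created by the initialization exist.

    Hypothetical helper for illustration; it only inspects the file system
    and does not call into ZenML.
    """
    zenml_dir = repo_root / ".zenml"
    return {
        # Folder holding the local metadata store and artifact store
        "zenml_dir": zenml_dir.is_dir(),
        # YAML file that tracks the repository defaults
        "config_file": (zenml_dir / ".zenml_config").is_file(),
        # Default location for pipeline configurations
        "pipelines_dir": (repo_root / "pipelines").is_dir(),
    }


if __name__ == "__main__":
    # Inspect the current working directory
    print(describe_repo_layout(Path.cwd()))
```

If every value is True, the directory looks like an initialized repository under these assumptions.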
If you want to change your metadata store, artifact store, or pipelines directory, please use the zenml config CLI group.
    # Display the current property
    zenml config PROPERTY get

    # Set the current property
    zenml config PROPERTY set [OPTIONS] ARGUMENTS
Similar to other tools like Git, ZenML maintains both a per-repository configuration and a global configuration on your machine. As mentioned above, the local configuration is stored in a .zenml/ directory at the root of your repository. This configuration is written in YAML and may look like this:
    artifact_store: /path/to/zenml/repo/.zenml/local_store
    metadata:
      args:
        uri: /path/to/zenml/repo/.zenml/local_store/metadata.db
      type: sqlite
    pipelines_dir: /path/to/zenml/repo/pipelines
As you can see, this file stores the default artifact store, metadata store, and pipelines directory that each of your pipelines will use when it is run, unless configured otherwise.
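To make the relationship between the repository root and these defaults concrete, here is a small sketch that derives the same values in Python. `default_local_config` is a hypothetical function, not a ZenML API, and the nesting of the `metadata` entry (`args` and `type` as siblings) is an assumption about the file layout.

```python
from pathlib import PurePosixPath


def default_local_config(repo_root: str) -> dict:
    """Build a dict mirroring the default local configuration.

    Hypothetical illustration only: the keys follow the example file,
    and the nesting under "metadata" is assumed.
    """
    root = PurePosixPath(repo_root)
    local_store = root / ".zenml" / "local_store"
    return {
        # Default artifact store lives inside the .zenml folder
        "artifact_store": str(local_store),
        # Default metadata store is a SQLite database in the same folder
        "metadata": {
            "args": {"uri": str(local_store / "metadata.db")},
            "type": "sqlite",
        },
        # Default pipelines directory sits at the repository root
        "pipelines_dir": str(root / "pipelines"),
    }


config = default_local_config("/path/to/zenml/repo")
print(config["pipelines_dir"])
```

All three defaults are derived from the repository root, which is why they live in the per-repository configuration rather than in the global one.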
The global config, on the other hand, stores global information such as a unique anonymous UUID for your ZenML installation, as well as metadata regarding usage of your ZenML package. On most systems, it can be found in the .config directory.
In order to access information about your ZenML repository in code, you need to access the ZenML repository instance. This object is a singleton and can be fetched any time from within your Python code simply by executing:
    from zenml.repo import Repository

    # We recommend adding the type hint for auto-completion in your IDE/Notebook
    repo: Repository = Repository.get_instance()
The repo object can be used to fetch all sorts of information regarding the repository. For example, one can do:
    # Get all datasources
    datasources = repo.get_datasources()

    # Get all pipelines
    pipelines = repo.get_pipelines()

    # List all registered steps in the repository
    steps = repo.get_step_versions()

    # Get a step by its version
    step_object = repo.get_step_by_version(step_type, version)

    # Compare all pipelines in the repository
    repo.compare_training_runs()
The full list of commands can be found within the Repository class definition. Using these commands, one can always look back at what actions have been performed in this repository.
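The singleton access described above means that every call to get_instance() returns the same object. The sketch below shows one common way to get that behaviour in Python. `ExampleRepository` is a hypothetical stand-in and makes no claim about how ZenML's Repository class is actually implemented.

```python
from typing import Optional


class ExampleRepository:
    """Hypothetical stand-in used to illustrate singleton access."""

    # Class-level slot holding the single shared instance
    _instance: Optional["ExampleRepository"] = None

    def __init__(self, path: str = ".") -> None:
        self.path = path

    @classmethod
    def get_instance(cls) -> "ExampleRepository":
        # Create the instance on first access, then reuse it
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


first = ExampleRepository.get_instance()
second = ExampleRepository.get_instance()
print(first is second)
```

Because both names refer to the same object, state read through one reference is visible through the other, which is what makes fetching the repository "any time" from anywhere in your code safe.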
Now that our ZenML repository is set up, we can go ahead and start developing our first pipeline.
If you want to learn more about how the git integration works under the hood or see our suggestions on how to organize your repository, you can check here.