Migration guide 0.13.2 → 0.20.0
How to migrate from ZenML <=0.13.2 to 0.20.0.
Last updated: 2023-07-24
The ZenML 0.20.0 release brings a number of big changes to its architecture and its features, some of which are not backwards compatible with previous versions. This guide walks you through these changes and offers instructions on how to migrate your existing ZenML stacks and pipelines to the new version with minimal effort and disruption to your existing workloads.
Updating to ZenML 0.20.0 must be followed by a migration of your existing ZenML Stacks, and you may also need to make changes to your current ZenML pipeline code. Please read this guide carefully and follow the migration instructions to ensure a smooth transition.
If you have updated to ZenML 0.20.0 by mistake or are experiencing issues with the new version, you can always go back to the previous version by using `pip install zenml==0.13.2` instead of `pip install zenml` when installing ZenML manually or in your scripts.
High-level overview of the changes:
ZenML takes over the Metadata Store role
The release introduces a series of commands to facilitate managing the lifecycle of the ZenML server and to access pipeline and pipeline run information: `zenml pipeline list / runs / delete` can be used to display information about and manage your pipelines and pipeline runs.
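For example, the new command group can be used like this (the pipeline name is a placeholder; subcommand shapes follow the `zenml pipeline list / runs / delete` group named above):

```shell
# List all registered pipelines
zenml pipeline list

# List pipeline runs
zenml pipeline runs list

# Delete a pipeline by name (placeholder name)
zenml pipeline delete my_pipeline
```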
In ZenML 0.13.2 and earlier versions, information about pipelines and pipeline runs used to be stored in a separate stack component called the Metadata Store. Starting with 0.20.0, the role of the Metadata Store is now taken over by ZenML itself. This means that the Metadata Store is no longer a separate component in the ZenML architecture, but rather a part of the ZenML core, located wherever ZenML is deployed: locally on your machine or running remotely as a server.
All metadata is now stored, tracked, and managed by ZenML itself. The Metadata Store stack component type and all its implementations have been deprecated and removed. It is no longer possible to register them or include them in ZenML stacks. This is a key architectural change in ZenML 0.20.0 that further improves usability, reproducibility and makes it possible to visualize and manage all your pipelines and pipeline runs in the new ZenML Dashboard.
The architecture changes for the local case are shown in the diagram below:
The architecture changes for the remote case are shown in the diagram below:
If you're already using ZenML, aside from the limitations described below, this change will impact you differently, depending on the flavor of Metadata Store you have in your stacks:
The ZenML Server inherits the same limitations that the Metadata Store had prior to ZenML 0.20.0:
it is not possible to use a local ZenML Server to track pipelines and pipeline runs that are running remotely in the cloud, unless the ZenML server is explicitly configured to be reachable from the cloud (e.g. by using a public IP address or a VPN connection).
using a remote ZenML Server to track pipelines and pipeline runs that are running locally is possible, but can have significant performance issues due to the network latency.
It is therefore recommended that you always use a ZenML deployment that is located as close as possible to and reachable from where your pipelines and step operators are running. This will ensure the best possible performance and usability.
👣 How to migrate pipeline runs from your old metadata stores
The `zenml pipeline runs migrate` CLI command is only available under ZenML versions [0.21.0, 0.21.1, 0.22.0]. If you want to migrate your existing ZenML runs from `zenml<0.20.0` to `zenml>0.22.0`, please first upgrade to `zenml==0.22.0` and migrate your runs as shown below, then upgrade to the newer version.
To migrate the pipeline run information already stored in an existing metadata store to the new ZenML paradigm, you can use the `zenml pipeline runs migrate` CLI command.
Before upgrading ZenML, make a backup of all metadata stores you want to migrate, then upgrade ZenML.
Use the `zenml pipeline runs migrate` CLI command to migrate your old pipeline runs:
If you want to migrate from a local SQLite metadata store, you only need to pass the path to the metadata store to the command, e.g.:
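A sketch of the invocation for a local SQLite store (the path below is an assumption; substitute the actual path of your old metadata store database file):

```shell
# Migrate runs from a local SQLite metadata store (path is a placeholder)
zenml pipeline runs migrate /path/to/local_stores/metadata.db
```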
If you would like to migrate any other store, you will need to set `--database_type=mysql` and provide the MySQL host, username, and password in addition to the database, e.g.:
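A sketch of the MySQL variant (all uppercase values are placeholders to replace with your own connection details):

```shell
# Migrate runs from a MySQL metadata store (all values are placeholders)
zenml pipeline runs migrate DATABASE_NAME \
  --database_type=mysql \
  --mysql_host=MYSQL_HOST \
  --mysql_username=MYSQL_USERNAME \
  --mysql_password=MYSQL_PASSWORD
```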
💾 The New Way (CLI Command Cheat Sheet)
Deploy the server: `zenml deploy --aws` (maybe don’t do this :) since it spins up infrastructure on AWS…)
Spin up a local ZenML Server: `zenml up`
Connect to a pre-existing server: `zenml connect` (pass in the server URL, or use `zenml connect --config` with a YAML file)
List your deployed server details: `zenml status`
The ZenML Dashboard is now available
To launch it locally, simply run `zenml up` on your machine and follow the instructions. The Dashboard will be available at `http://localhost:8237` by default.
Removal of Profiles and the local YAML database
Prior to 0.20.0, ZenML used a set of local YAML files to store information about the Stacks and Stack Components that were registered on your machine. In addition to that, these Stacks could be grouped together and organized under individual Profiles.
Profiles and the local YAML database have both been deprecated and removed in ZenML 0.20.0. Stacks and Stack Components, as well as all other information that ZenML tracks, such as Pipelines and Pipeline Runs, are now stored in a single SQL database. These entities are no longer organized into Profiles, but they can be scoped into different Projects instead.
👣 How to migrate your Profiles
If you're already using ZenML, you can migrate your existing Profiles to the new ZenML 0.20.0 paradigm by following these steps:
first, update ZenML to 0.20.0. This will automatically invalidate all your existing Profiles.
use the `zenml profile list` and `zenml profile migrate` CLI commands to import the Stacks and Stack Components from your Profiles into your new ZenML deployment. If you have multiple Profiles that you would like to migrate, you can either use a prefix for the names of your imported Stacks and Stack Components, or you can use a different ZenML Project for each Profile.
The ZenML Dashboard is currently limited to showing only information that is available in the `default` Project. If you wish to migrate your Profiles to a different Project, you will not be able to visualize the migrated Stacks and Stack Components in the Dashboard. This will be fixed in a future release.
Once you've migrated all your Profiles, you can delete the old YAML files.
Example of migrating a `default` profile into the `default` project:
Example of migrating a profile into the `default` project using a name prefix:
Example of migrating a profile into a new project:
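The three examples above might look like the following (the profile paths and the `--prefix` and `--project` flag names are assumptions for illustration; substitute your own paths and names):

```shell
# Migrate the default profile into the default project (path is a placeholder)
zenml profile migrate /path/to/.config/zenml/profiles/default

# Migrate a profile into the default project, prefixing imported names
# (flag name is an assumption)
zenml profile migrate /path/to/.config/zenml/profiles/zenprojects --prefix zenprojects_

# Migrate a profile into a new project (flag name is an assumption)
zenml profile migrate /path/to/.config/zenml/profiles/zenbytes --project zenbytes
```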
The `zenml profile migrate` CLI command also provides command-line flags for cases in which the user wants to overwrite existing components or stacks, or ignore errors.
Decoupling Stack Component configuration from implementation
Shared ZenML Stacks and Stack Components
With collaboration being the key part of ZenML, the 0.20.0 release puts the concepts of Users in the front and center and introduces the possibility to share stacks and stack components with other users by means of the ZenML server.
When your client is connected to a ZenML server, entities such as Stacks, Stack Components, Stack Component Flavors, Pipelines, Pipeline Runs, and artifacts are scoped to a Project and owned by the User that creates them. Only the objects that are owned by the user authenticated to the ZenML server and that are part of the current Project are available to the client.
Stacks and Stack Components can also be shared within the same project with other users. To share an object, either set it as shared during creation time (e.g. `zenml stack register mystack ... --share`) or afterwards (e.g. through `zenml stack share mystack`).
To differentiate between shared and private Stacks and Stack Components, these can now be addressed by name, ID, or the first few letters of the ID in the CLI. E.g. for a stack `default` with ID `179ebd25-4c5b-480f-a47c-d4f04e0b6185`, you can now run `zenml stack describe default`, `zenml stack describe 179`, or `zenml stack describe 179ebd25-4c5b-480f-a47c-d4f04e0b6185`.
We also introduce the notion of `local` vs. `non-local` stack components. Local stack components are stack components that are configured to run locally, while non-local stack components are configured to run remotely or in a cloud environment. Consequently:
stacks made up of local stack components should not be shared on a central ZenML Server, even though this is not enforced by the system.
stacks made up of non-local stack components are only functional if they are shared through a remotely deployed ZenML Server.
Other changes
The `Repository` class is now called `Client`
The `Repository` object has been renamed to `Client` to better capture its functionality. You can continue to use the `Repository` object for backwards compatibility, but it will be removed in a future release.
How to migrate: Rename all references to `Repository` in your code to `Client`.
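A minimal before/after sketch of the rename, assuming the 0.20.0 import layout:

```python
# Before (<=0.13.2)
# from zenml.repository import Repository
# repo = Repository()
# stack = repo.active_stack

# After (0.20.0): same functionality under the new name
from zenml.client import Client

client = Client()
stack = client.active_stack  # e.g. inspect the currently active stack
```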
The `BaseStepConfig` class is now called `BaseParameters`
The `BaseStepConfig` object has been renamed to `BaseParameters` to better capture its functionality. You can NOT continue to use `BaseStepConfig`.
This is part of a broader configuration rehaul which is discussed next.
How to migrate: Rename all references to `BaseStepConfig` in your code to `BaseParameters`.
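A minimal sketch of the rename (the parameter class and field are hypothetical examples):

```python
# Before (<=0.13.2)
# from zenml.steps import BaseStepConfig
# class TrainerConfig(BaseStepConfig):
#     lr: float = 0.001

# After (0.20.0)
from zenml.steps import BaseParameters, step

class TrainerParams(BaseParameters):
    lr: float = 0.001  # hypothetical hyperparameter

@step
def trainer(params: TrainerParams) -> None:
    print(f"Training with learning rate {params.lr}")
```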
Configuration Rework
Alongside the architectural shift, Pipeline configuration has been completely rethought. This video gives an overview of how configuration has changed with ZenML in the post ZenML 0.20.0 world.
What changed?
ZenML pipelines and steps could previously be configured in many different ways:
On the `@pipeline` and `@step` decorators (e.g. the `requirements` variable)
In the `__init__` method of the pipeline and step class
Using `@enable_xxx` decorators, e.g. `@enable_mlflow`
Using specialized methods like `pipeline.with_config(...)` or `step.with_return_materializer(...)`
Some of the configuration options were quite hidden, difficult to access and not tracked in any way by the ZenML metadata store.
With ZenML 0.20.0, we introduce the `BaseSettings` class, a broad class that serves as a central object to represent all runtime configuration of a pipeline run (apart from the `BaseParameters`).
Pipelines and steps now allow all configurations on their decorators as well as the `.configure(...)` method. This includes configurations for stack components that are not infrastructure-related (which was previously done using the `@enable_xxx` decorators). The same configurations can also be defined in a YAML file.
Deprecating the `enable_xxx` decorators
With the above changes, we are deprecating the much-loved `enable_xxx` decorators, like `@enable_mlflow` and `@enable_wandb`.
How to migrate: Simply remove the decorator and instead pass something like this to the step directly:
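A sketch of the replacement for `@enable_mlflow` (the experiment tracker component name and the settings values are assumptions; use the names from your own stack):

```python
from zenml.steps import step

@step(
    experiment_tracker="mlflow_tracker",  # name of the tracker in your stack (assumed)
    settings={
        "experiment_tracker.mlflow": {
            "experiment_name": "my_experiment",  # hypothetical value
            "nested": False,
        }
    },
)
def my_step() -> None:
    ...
```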
Deprecating `pipeline.with_config(...)`
How to migrate: Replace it with the new `pipeline.run(config_path=...)`.
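A minimal sketch of the change (`my_pipeline_instance` and the config file name are placeholders):

```python
# Before (<=0.13.2)
# my_pipeline_instance.with_config("config.yaml").run()

# After (0.20.0)
my_pipeline_instance.run(config_path="config.yaml")
```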
Deprecating `step.with_return_materializer(...)`
How to migrate: Simply remove the `with_return_materializer` method and pass something like this to the step directly:
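A sketch of the replacement (`MyMaterializer` stands in for your existing custom materializer class):

```python
from zenml.steps import step

# MyMaterializer is your existing custom materializer class
@step(output_materializers=MyMaterializer)
def my_step() -> int:
    return 42

# For steps with multiple outputs, a dict mapping output names to
# materializers can be passed instead, e.g.:
# @step(output_materializers={"out_1": Materializer1, "out_2": Materializer2})
```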
`DockerConfiguration` is now renamed to `DockerSettings`
How to migrate: Rename `DockerConfiguration` to `DockerSettings` and, instead of passing it in the decorator directly with `docker_configuration`, you can use:
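A sketch of the new settings-based form (the image name and pipeline shape are hypothetical):

```python
from zenml.config import DockerSettings
from zenml.pipelines import pipeline

# Hypothetical parent image; use your own registry/image
docker_settings = DockerSettings(parent_image="my-registry/my-image:tag")

@pipeline(settings={"docker": docker_settings})
def my_pipeline(trainer):
    trainer()
```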
With this change, all stack components (e.g. Orchestrators and Step Operators) that accepted a `docker_parent_image` as part of their Stack Configuration should now pass it through the `DockerSettings` object.
`ResourceConfiguration` is now renamed to `ResourceSettings`
How to migrate: Rename `ResourceConfiguration` to `ResourceSettings` and, instead of passing it in the decorator directly with `resource_configuration`, you can use:
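A sketch of the new settings-based form (the resource values are hypothetical):

```python
from zenml.config import ResourceSettings
from zenml.steps import step

# Hypothetical resource requirements
resource_settings = ResourceSettings(cpu_count=8, memory="16GB")

@step(settings={"resources": resource_settings})
def my_step() -> None:
    ...
```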
Deprecating the `requirements` and `required_integrations` parameters
Users used to be able to pass `requirements` and `required_integrations` directly in the `@pipeline` decorator, but now need to pass them through settings:
How to migrate: Simply remove the parameters and use `DockerSettings` instead:
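A sketch of the migration (the requirement and integration names are hypothetical examples):

```python
from zenml.config import DockerSettings
from zenml.pipelines import pipeline

docker_settings = DockerSettings(
    requirements=["torch==1.12.0"],     # was: @pipeline(requirements=[...])
    required_integrations=["sklearn"],  # was: @pipeline(required_integrations=[...])
)

@pipeline(settings={"docker": docker_settings})
def my_pipeline(trainer):
    trainer()
```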
A new pipeline intermediate representation
All the aforementioned configurations, as well as additional information required to run a ZenML pipeline, are now combined into an intermediate representation called `PipelineDeployment`. Instead of the user-facing `BaseStep` and `BasePipeline` classes, all the ZenML orchestrators and step operators now use this intermediate representation to run pipelines and steps.
`PipelineSpec` now uniquely defines pipelines
Once a pipeline has been executed, it is represented by a `PipelineSpec` that uniquely identifies it. Therefore, users are no longer able to edit a pipeline once it has been run. There are now two options to get around this:
Pipelines can be deleted and created again.
Pipelines can be given unique names each time they are run to uniquely identify them.
New post-execution workflow
The Post-execution workflow has changed as follows:
The `get_pipelines` and `get_pipeline` methods have been moved out of the `Repository` (i.e. the new `Client`) class and now live directly in the `post_execution` module. To use them, the user has to do:
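A short sketch of the new imports (the pipeline name is a placeholder):

```python
from zenml.post_execution import get_pipeline, get_pipelines

# Fetch all pipelines tracked in the current project
pipelines = get_pipelines()

# Fetch a single pipeline by name (name is a placeholder)
pipeline = get_pipeline("first_pipeline")
```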
New methods to directly get a run have been introduced: `get_run` and `get_unlisted_runs`; the latter can be used to get unlisted runs.
How to migrate: Replace all post-execution workflows from the paradigm of `Repository.get_pipelines` or `Repository.get_pipeline_run` to the corresponding `post_execution` methods.
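A before/after sketch of a post-execution workflow (pipeline and run names are placeholders):

```python
# Before (<=0.13.2)
# from zenml.repository import Repository
# pipeline = Repository().get_pipeline("first_pipeline")

# After (0.20.0)
from zenml.post_execution import get_pipeline, get_run

pipeline = get_pipeline("first_pipeline")  # pipeline name is a placeholder
run = get_run("my_run_name")               # run name is a placeholder
```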
📡 Future Changes
While this rehaul is big and will break previous releases, we do have some more work left to do. However, we also expect this to be the last big rehaul of ZenML before our 1.0.0 release, and no future release will be as hard-breaking as this one. Currently planned future breaking changes are:
Following the metadata store, the secrets manager stack component might move out of the stack.
The ZenML `StepContext` might be deprecated.
🐞 Reporting Bugs