Migration guide 0.13.2 → 0.20.0
How to migrate from ZenML <=0.13.2 to 0.20.0.
Last updated: 2023-07-24
The ZenML 0.20.0 release brings a number of big changes to its architecture and its features, some of which are not backwards compatible with previous versions. This guide walks you through these changes and offers instructions on how to migrate your existing ZenML stacks and pipelines to the new version with minimal effort and disruption to your existing workloads.
Updating to ZenML 0.20.0 needs to be followed by a migration of your existing ZenML Stacks and you may also need to make changes to your current ZenML pipeline code. Please read this guide carefully and follow the migration instructions to ensure a smooth transition.
If you have updated to ZenML 0.20.0 by mistake or are experiencing issues with the new version, you can always go back to the previous version by using pip install zenml==0.13.2 instead of pip install zenml when installing ZenML manually or in your scripts.
High-level overview of the changes:
ZenML takes over the Metadata Store role. All information about your ZenML Stacks, pipelines, and artifacts is tracked by ZenML itself directly. If you are currently using remote Metadata Stores (e.g. deployed in cloud) in your stacks, you will probably need to replace them with a ZenML server deployment.
the new ZenML Dashboard is now available with all ZenML deployments.
ZenML Profiles have been removed in favor of ZenML Projects. You need to manually migrate your existing ZenML Profiles after the update.
the configuration of Stack Components is now decoupled from their implementation. If you extended ZenML with custom stack component implementations, you may need to update the way they are registered in ZenML.
the updated ZenML server provides a new and improved collaborative experience. When connected to a ZenML server, you can now share your ZenML Stacks and Stack Components with other users. If you were previously using the ZenML Profiles or the ZenML server to share your ZenML Stacks, you should switch to the new ZenML server and Dashboard and update your existing workflows to reflect the new features.
ZenML takes over the Metadata Store role
ZenML can now run as a server that can be accessed via a REST API and also comes with a visual user interface (called the ZenML Dashboard). This server can be deployed in arbitrary environments (local, on-prem, via Docker, on AWS, GCP, Azure etc.) and supports user management, workspace scoping, and more.
The release introduces a series of commands to facilitate managing the lifecycle of the ZenML server and to access the pipeline and pipeline run information:
zenml connect / disconnect / down / up / logs / status can be used to configure your client to connect to a ZenML server, to start a local ZenML Dashboard or to deploy a ZenML server to a cloud environment. For more information on how to use these commands, see the ZenML deployment documentation.
zenml pipeline list / runs / delete can be used to display information about and manage your pipelines and pipeline runs.
In ZenML 0.13.2 and earlier versions, information about pipelines and pipeline runs used to be stored in a separate stack component called the Metadata Store. Starting with 0.20.0, the role of the Metadata Store is now taken over by ZenML itself. This means that the Metadata Store is no longer a separate component in the ZenML architecture, but rather a part of the ZenML core, located wherever ZenML is deployed: locally on your machine or running remotely as a server.
All metadata is now stored, tracked, and managed by ZenML itself. The Metadata Store stack component type and all its implementations have been deprecated and removed. It is no longer possible to register them or include them in ZenML stacks. This is a key architectural change in ZenML 0.20.0 that further improves usability, reproducibility and makes it possible to visualize and manage all your pipelines and pipeline runs in the new ZenML Dashboard.
[Diagram: architecture changes for the local case]
[Diagram: architecture changes for the remote case]
If you're already using ZenML, aside from the above limitation, this change will impact you differently, depending on the flavor of Metadata Stores you have in your stacks:
if you're using the default sqlite Metadata Store flavor in your stacks, you don't need to do anything. ZenML will automatically switch to using its local database instead of your sqlite Metadata Stores when you update to 0.20.0 (also see how to migrate your stacks).
if you're using the kubeflow Metadata Store flavor only as a way to connect to the local Kubeflow Metadata Service (i.e. the one installed by the kubeflow Orchestrator in a local k3d Kubernetes cluster), you also don't need to do anything explicitly. When you migrate your stacks to ZenML 0.20.0, ZenML will automatically switch to using its local database.
if you're using the kubeflow Metadata Store flavor to connect to a remote Kubeflow Metadata Service such as those provided by a Kubeflow installation running in AWS, Google or Azure, there is currently no equivalent in ZenML 0.20.0. You'll need to deploy a ZenML Server instance close to where your Kubeflow service is running (e.g. in the same cloud region).
if you're using the mysql Metadata Store flavor to connect to a remote MySQL database service (e.g. a managed AWS, GCP or Azure MySQL service), you'll have to deploy a ZenML Server instance connected to that same database.
if you deployed a kubernetes Metadata Store flavor (i.e. a MySQL database service deployed in Kubernetes), you can deploy a ZenML Server in the same Kubernetes cluster and connect it to that same database. However, ZenML will no longer provide the kubernetes Metadata Store flavor and you'll have to manage the Kubernetes MySQL database service deployment yourself going forward.
The ZenML Server inherits the same limitations that the Metadata Store had prior to ZenML 0.20.0:
it is not possible to use a local ZenML Server to track pipelines and pipeline runs that are running remotely in the cloud, unless the ZenML server is explicitly configured to be reachable from the cloud (e.g. by using a public IP address or a VPN connection).
using a remote ZenML Server to track pipelines and pipeline runs that are running locally is possible, but can have significant performance issues due to the network latency.
It is therefore recommended that you always use a ZenML deployment that is located as close as possible to and reachable from where your pipelines and step operators are running. This will ensure the best possible performance and usability.
How to migrate pipeline runs from your old metadata stores
The zenml pipeline runs migrate CLI command is only available under ZenML versions [0.21.0, 0.21.1, 0.22.0]. If you want to migrate your existing ZenML runs from zenml<0.20.0 to zenml>0.22.0, please first upgrade to zenml==0.22.0 and migrate your runs as shown below, then upgrade to the newer version.
To migrate the pipeline run information already stored in an existing metadata store to the new ZenML paradigm, you can use the zenml pipeline runs migrate CLI command.
Before upgrading ZenML, make a backup of all metadata stores you want to migrate, then upgrade ZenML.
Decide the ZenML deployment model that you want to follow for your projects. See the ZenML deployment documentation for available deployment scenarios. If you decide on using a local or remote ZenML server to manage your pipelines, make sure that you first connect your client to it by running zenml connect.
Use the zenml pipeline runs migrate CLI command to migrate your old pipeline runs:
If you want to migrate from a local SQLite metadata store, you only need to pass the path to the metadata store to the command.
If you would like to migrate any other store, you will need to set --database_type=mysql and provide the MySQL host, username, and password in addition to the database.
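When several metadata stores need to be migrated from a script, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of ZenML: the helper name build_migrate_command is hypothetical, and the flag names --mysql_host, --mysql_username and --mysql_password are assumptions modelled on --database_type, so check them against zenml pipeline runs migrate --help before relying on them.

```python
import shlex
from typing import List, Optional


def build_migrate_command(
    database: str,
    mysql_host: Optional[str] = None,
    mysql_username: Optional[str] = None,
    mysql_password: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `zenml pipeline runs migrate`.

    With only `database` given, it is the path to a local SQLite metadata
    store. With MySQL options given, `database` is the MySQL database name.
    """
    command = ["zenml", "pipeline", "runs", "migrate", database]
    mysql_options = (mysql_host, mysql_username, mysql_password)
    if any(option is not None for option in mysql_options):
        if not all(option is not None for option in mysql_options):
            raise ValueError("MySQL host, username and password are all required")
        command += [
            "--database_type=mysql",
            # Assumed flag names; confirm with `zenml pipeline runs migrate --help`.
            f"--mysql_host={mysql_host}",
            f"--mysql_username={mysql_username}",
            f"--mysql_password={mysql_password}",
        ]
    return command


# Placeholder path and credentials, for illustration only.
sqlite_command = build_migrate_command("/path/to/metadata.db")
mysql_command = build_migrate_command(
    "metadata_db",
    mysql_host="mysql.example.com",
    mysql_username="zenml",
    mysql_password="change-me",
)
print(shlex.join(sqlite_command))
print(shlex.join(mysql_command))
```

The resulting lists can be handed to subprocess.run(command, check=True) on a machine where a ZenML version that ships the migrate command is installed.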
The New Way (CLI Command Cheat Sheet)
Deploy the server: zenml deploy --aws (maybe don't do this :) since it spins up infrastructure on AWS...)
Spin up a local ZenML Server: zenml up
Connect to a pre-existing server: zenml connect (pass in URL / etc, or zenml connect --config + yaml file)
List your deployed server details: zenml status
The ZenML Dashboard is now available
The new ZenML Dashboard is now bundled into the ZenML Python package and can be launched directly from Python. The source code lives in the ZenML Dashboard repository.
To launch it locally, simply run zenml up on your machine and follow the instructions. The Dashboard will be available at http://localhost:8237 by default.
For more details on other possible deployment options, see the ZenML deployment documentation, and/or follow the starter guide to learn more.
Removal of Profiles and the local YAML database
Prior to 0.20.0, ZenML used a set of local YAML files to store information about the Stacks and Stack Components that were registered on your machine. In addition to that, these Stacks could be grouped together and organized under individual Profiles.
Profiles and the local YAML database have both been deprecated and removed in ZenML 0.20.0. Stack, Stack Components as well as all other information that ZenML tracks, such as Pipelines and Pipeline Runs, are now stored in a single SQL database. These entities are no longer organized into Profiles, but they can be scoped into different Projects instead.
Since the local YAML database is no longer used by ZenML 0.20.0, you will lose all the Stacks and Stack Components that you currently have configured when you update to ZenML 0.20.0. If you still want to use these Stacks, you will need to manually migrate them after the update.
How to migrate your Profiles
If you're already using ZenML, you can migrate your existing Profiles to the new ZenML 0.20.0 paradigm by following these steps:
first, update ZenML to 0.20.0. This will automatically invalidate all your existing Profiles.
decide the ZenML deployment model that you want to follow for your projects. See the ZenML deployment documentation for available deployment scenarios. If you decide on using a local or remote ZenML server to manage your pipelines, make sure that you first connect your client to it by running zenml connect.
use the zenml profile list and zenml profile migrate CLI commands to import the Stacks and Stack Components from your Profiles into your new ZenML deployment. If you have multiple Profiles that you would like to migrate, you can either use a prefix for the names of your imported Stacks and Stack Components, or you can use a different ZenML Project for each Profile.
The ZenML Dashboard is currently limited to showing only information that is available in the default Project. If you wish to migrate your Profiles to a different Project, you will not be able to visualize the migrated Stacks and Stack Components in the Dashboard. This will be fixed in a future release.
Once you've migrated all your Profiles, you can delete the old YAML files.
Typical migration scenarios are: migrating a default profile into the default project, migrating a profile into the default project using a name prefix, and migrating a profile into a new project.
The zenml profile migrate CLI command also provides command line flags for cases in which the user wants to overwrite existing components or stacks, or ignore errors.
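To see why a name prefix matters when several Profiles end up in the same Project, consider the sketch below. It is only an illustration of the naming problem under assumed data: import_profiles, the profile names and the stack names are all hypothetical, and this is not how ZenML implements the migration.

```python
from typing import Dict, List


def import_profiles(
    profiles: Dict[str, List[str]], use_prefix: bool
) -> Dict[str, str]:
    """Map each imported stack name to the profile it came from.

    `profiles` maps a profile name to the stack names it contains. With
    `use_prefix`, every stack name is prefixed with "<profile>_".
    Raises ValueError when two profiles would import the same stack name.
    """
    imported: Dict[str, str] = {}
    for profile, stack_names in profiles.items():
        for stack_name in stack_names:
            new_name = f"{profile}_{stack_name}" if use_prefix else stack_name
            if new_name in imported:
                raise ValueError(
                    f"stack {new_name!r} exists in both "
                    f"{imported[new_name]!r} and {profile!r}"
                )
            imported[new_name] = profile
    return imported


# Hypothetical profiles that both define a stack called "local".
profiles = {"default": ["local", "aws"], "zenbytes": ["local"]}

prefixed = import_profiles(profiles, use_prefix=True)
print(sorted(prefixed))
```

Without the prefix, the second "local" stack would clash with the first one. Importing each Profile into its own Project avoids the clash in a different way, by keeping the names in separate scopes.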
Decoupling Stack Component configuration from implementation
Stack components can now be registered without having the required integrations installed. As part of this change, we split all existing stack component definitions into three classes: an implementation class that defines the logic of the stack component, a config class that defines the attributes and performs input validations, and a flavor class that links implementation and config classes together. See component flavor models #895 for more details.
If you are only using stack component flavors that are shipped with the zenml Python distribution, this change has no impact on the configuration of your existing stacks. However, if you are currently using custom stack component implementations, you will need to update them to the new format. See the documentation on writing custom stack component flavors for updated information on how to do this.
Shared ZenML Stacks and Stack Components
With collaboration being a key part of ZenML, the 0.20.0 release puts the concept of Users front and center and introduces the possibility to share stacks and stack components with other users by means of the ZenML server.
When your client is connected to a ZenML server, entities such as Stacks, Stack Components, Stack Component Flavors, Pipelines, Pipeline Runs, and artifacts are scoped to a Project and owned by the User that creates them. Only objects that are owned by the user currently authenticated to the ZenML server and that are part of the current project are available to the client.
Stacks and Stack Components can also be shared within the same project with other users. To share an object, either set it as shared during creation time (e.g. zenml stack register mystack ... --share) or afterwards (e.g. through zenml stack share mystack).
To differentiate between shared and private Stacks and Stack Components, these can now be addressed by name, id or the first few letters of the id in the CLI. E.g. for a stack default with id 179ebd25-4c5b-480f-a47c-d4f04e0b6185 you can now run zenml stack describe default or zenml stack describe 179 or zenml stack describe 179ebd25-4c5b-480f-a47c-d4f04e0b6185.
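The lookup by name, id or id prefix can be pictured with a small sketch. It is only a model of the behaviour described above, not ZenML's actual resolution logic: resolve_stack is a hypothetical helper, and every id except the one quoted in the text is made up.

```python
from typing import Dict


def resolve_stack(name_id_or_prefix: str, stacks: Dict[str, str]) -> str:
    """Return the id of the single stack matching a name, full id or id prefix.

    `stacks` maps stack ids to stack names. Raises LookupError when nothing
    matches or when the reference is ambiguous.
    """
    matches = [
        stack_id
        for stack_id, name in stacks.items()
        if name == name_id_or_prefix or stack_id.startswith(name_id_or_prefix)
    ]
    if len(matches) != 1:
        raise LookupError(
            f"{name_id_or_prefix!r} matches {len(matches)} stacks, expected 1"
        )
    return matches[0]


stacks = {
    "179ebd25-4c5b-480f-a47c-d4f04e0b6185": "default",
    # Made-up ids: a shared and a private stack that use the same name.
    "5f0c2d7e-9a41-4e0b-8c3d-2b6a1f4e7c90": "production",
    "a3d91b64-7e2f-4c58-9b10-6d8e0c5f2a17": "production",
}

print(resolve_stack("default", stacks))
print(resolve_stack("179", stacks))
print(resolve_stack("5f0", stacks))
```

In this model the name production is ambiguous, which is exactly the case where addressing a stack by id or id prefix becomes necessary.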
We also introduce the notion of local vs non-local stack components. Local stack components are stack components that are configured to run locally while non-local stack components are configured to run remotely or in a cloud environment. Consequently:
stacks made up of local stack components should not be shared on a central ZenML Server, even though this is not enforced by the system.
stacks made up of non-local stack components are only functional if they are shared through a remotely deployed ZenML Server.
Read more about shared stacks in the new starter guide.
Other changes
The Repository class is now called Client
The Repository object has been renamed to Client to better capture its functionality. You can continue to use the Repository object for backwards compatibility, but it will be removed in a future release.
How to migrate: Rename all references to Repository in your code to Client.
The BaseStepConfig class is now called BaseParameters
The BaseStepConfig object has been renamed to BaseParameters to better capture its functionality. You can NOT continue to use BaseStepConfig.
This is part of a broader configuration overhaul which is discussed next.
How to migrate: Rename all references to BaseStepConfig in your code to BaseParameters.
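For larger code bases, the two renames (Repository to Client and BaseStepConfig to BaseParameters) can be applied mechanically. The sketch below is one possible approach, not an official tool: rename_zenml_references is a hypothetical helper, and the module rename from zenml.repository to zenml.client is an assumption about the import paths that you should verify against your installed version.

```python
import re
from typing import List, Pattern, Tuple

# (pattern, replacement) pairs, applied in order.
RENAMES: List[Tuple[Pattern[str], str]] = [
    # Assumed module move; verify the import path for your ZenML version.
    (re.compile(r"\bzenml\.repository\b"), "zenml.client"),
    (re.compile(r"\bRepository\b"), "Client"),
    (re.compile(r"\bBaseStepConfig\b"), "BaseParameters"),
]


def rename_zenml_references(source: str) -> str:
    """Rewrite whole-word references to the renamed classes in `source`."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


old_code = """\
from zenml.repository import Repository
from zenml.steps import BaseStepConfig


class TrainerConfig(BaseStepConfig):
    learning_rate: float = 0.001


repo = Repository()
"""

new_code = rename_zenml_references(old_code)
print(new_code)
```

Because the match is purely textual, it also rewrites unrelated classes that happen to be called Repository, as well as comments and strings, so review the resulting diff before committing it.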
Configuration Rework
Alongside the architectural shift, Pipeline configuration has been completely rethought. This video gives an overview of how configuration has changed with ZenML in the post ZenML 0.20.0 world.
What changed?
ZenML pipelines and steps could previously be configured in many different ways:
On the @pipeline and @step decorators (e.g. the requirements variable)
In the __init__ method of the pipeline and step class
Using @enable_xxx decorators, e.g. @enable_mlflow.
Using specialized methods like pipeline.with_config(...) or step.with_return_materializer(...)
Some of the configuration options were quite hidden, difficult to access and not tracked in any way by the ZenML metadata store.
With ZenML 0.20.0, we introduce the BaseSettings class, a broad class that serves as a central object to represent all runtime configuration of a pipeline run (apart from the BaseParameters).
Pipelines and steps now allow all configurations on their decorators as well as the .configure(...) method. This includes configurations for stack components that are not infrastructure-related (which was previously done using the @enable_xxx decorators). The same configurations can also be defined in a YAML file.
Read more about this paradigm in the new docs section about settings.
Here is a list of the most obvious consequences of the above changes. Please note that this list is not exhaustive, and if we have missed something let us know via Slack.
Deprecating the enable_xxx decorators
With the above changes, we are deprecating the much-loved enable_xxx decorators, like enable_mlflow and enable_wandb.
How to migrate: Simply remove the decorator and instead pass the name of the registered experiment tracker, together with its settings, directly to the step decorator.
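As a rough illustration of what replaces an @enable_mlflow decorator, the sketch below builds the keyword arguments as plain Python data so that it runs without ZenML installed. Several things here are assumptions to check against the settings documentation: that settings for a stack component are keyed as category.flavor, that the MLflow settings accept experiment_name and nested, and the component name mlflow_tracker, which is a placeholder for whatever name you registered.

```python
from typing import Any, Dict


def settings_key(category: str, flavor: str) -> str:
    """Build the assumed "<category>.<flavor>" key for component settings."""
    return f"{category}.{flavor}"


# Keyword arguments standing in for a former @enable_mlflow decorator.
step_kwargs: Dict[str, Any] = {
    # Placeholder: the name under which your experiment tracker is registered.
    "experiment_tracker": "mlflow_tracker",
    "settings": {
        settings_key("experiment_tracker", "mlflow"): {
            "experiment_name": "my_experiment",
            "nested": False,
        }
    },
}

# With ZenML installed, these would be applied roughly as:
#     @step(**step_kwargs)
#     def trainer(...): ...
print(step_kwargs)
```

The same key pattern would apply to other component types, for example a step operator flavor, which is why the key is built from its two parts here.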
Deprecating pipeline.with_config(...)
How to migrate: Replace it with the new pipeline.run(config_path=...).
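Deprecating step.with_return_materializer(...)
How to migrate: Simply remove the with_return_materializer method and instead pass the materializer, or a dictionary that maps output names to materializers, directly to the step decorator through its output_materializers argument.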
DockerConfiguration is now renamed to DockerSettings
How to migrate: Rename DockerConfiguration to DockerSettings and instead of passing it in the decorator directly with docker_configuration, pass it through the settings argument of the decorator under the docker key.
With this change, all stack components (e.g. Orchestrators and Step Operators) that accepted a docker_parent_image as part of their Stack Configuration should now pass it through the DockerSettings object.
Read more here.
ResourceConfiguration is now renamed to ResourceSettings
How to migrate: Rename ResourceConfiguration to ResourceSettings and instead of passing it in the decorator directly with resource_configuration, pass it through the settings argument of the decorator under the resources key.
Deprecating the requirements and required_integrations parameters
Users used to be able to pass requirements and required_integrations directly in the @pipeline decorator, but now need to pass them through settings.
How to migrate: Simply remove the parameters and use the DockerSettings instead.
Read more here.
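The move of requirements and required_integrations into the Docker settings can be sketched as a small transformation of decorator keyword arguments. This is an illustration only: move_requirements_to_settings is a hypothetical helper, and it uses a plain dictionary under the docker key where real code would typically build a DockerSettings object with the same two fields.

```python
from typing import Any, Dict

# Old decorator arguments that now belong to the Docker settings.
DOCKER_FIELDS = ("requirements", "required_integrations")


def move_requirements_to_settings(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return decorator kwargs with requirement fields under settings["docker"]."""
    new_kwargs = {k: v for k, v in old_kwargs.items() if k not in DOCKER_FIELDS}
    docker = {k: old_kwargs[k] for k in DOCKER_FIELDS if k in old_kwargs}
    if docker:
        # Copy existing settings so the caller's dictionaries stay untouched.
        settings = dict(new_kwargs.get("settings", {}))
        settings["docker"] = {**settings.get("docker", {}), **docker}
        new_kwargs["settings"] = settings
    return new_kwargs


# Arguments as they might have been passed to @pipeline before 0.20.0.
old_kwargs = {
    "enable_cache": False,
    "requirements": ["scikit-learn"],
    "required_integrations": ["mlflow"],
}

new_kwargs = move_requirements_to_settings(old_kwargs)
print(new_kwargs)
```

All other arguments pass through unchanged, so the helper only touches the two deprecated parameters.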
A new pipeline intermediate representation
All the aforementioned configurations as well as additional information required to run ZenML pipelines are now combined into an intermediate representation called PipelineDeployment. Instead of the user-facing BaseStep and BasePipeline classes, all the ZenML orchestrators and step operators now use this intermediate representation to run pipelines and steps.
How to migrate: If you have written a custom orchestrator or step operator, then you should see the new base abstractions (seen in the links). You can adjust your stack component implementations accordingly.
PipelineSpec now uniquely defines pipelines
Once a pipeline has been executed, it is represented by a PipelineSpec that uniquely identifies it. Therefore, users are no longer able to edit a pipeline once it has been run once. There are now three options to get around this:
Pipeline runs can be created without being associated with a pipeline explicitly: We call these unlisted runs. Read more about unlisted runs here.
Pipelines can be deleted and created again.
Pipelines can be given unique names each time they are run to uniquely identify them.
How to migrate: No code changes, but rather keep in mind the behavior (e.g. in a notebook setting) when quickly iterating over pipelines as experiments.
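For the third option, a unique name per run can be generated rather than typed by hand. The helper below is a hypothetical sketch, not part of ZenML; it assumes the generated string is then used as the pipeline name, for example through the name argument of the @pipeline decorator.

```python
from datetime import datetime, timezone
from uuid import uuid4


def unique_pipeline_name(base_name: str) -> str:
    """Append a UTC timestamp and a short random suffix to `base_name`."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{base_name}_{timestamp}_{uuid4().hex[:8]}"


first = unique_pipeline_name("training_pipeline")
second = unique_pipeline_name("training_pipeline")
print(first)
print(second)
```

The timestamp keeps the names sortable, while the random suffix keeps two names generated within the same second apart.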
New post-execution workflow
The Post-execution workflow has changed as follows:
The get_pipelines and get_pipeline methods have been moved out of the Repository (i.e. the new Client) class and lie directly in the post_execution module now. To use them, import them from the zenml.post_execution module.
New methods to directly get a run have been introduced: get_run, and get_unlisted_runs to get unlisted runs.
Usage remains largely similar. Please read the new docs for post-execution to inform yourself of what further has changed.
How to migrate: Replace all post-execution workflows from the paradigm of Repository.get_pipelines or Repository.get_pipeline_run to the corresponding post_execution methods.
Future Changes
While this overhaul is big and will break previous releases, we do have some more work left to do. However, we also expect this to be the last big overhaul of ZenML before our 1.0.0 release, and no other release will be as breaking as this one. Currently planned future breaking changes are:
Following the metadata store, the secrets manager stack component might move out of the stack.
ZenML StepContext might be deprecated.
Reporting Bugs
While we have tried our best to document everything that has changed, we realize that mistakes can be made and smaller changes overlooked. If this is the case, or you encounter a bug at any time, the ZenML core team and community are available around the clock on the growing Slack community.
For bug reports, please also consider submitting a GitHub Issue.
Lastly, if the new changes have left you desiring a feature, then consider adding it to our public feature voting board. Before doing so, do check what is already on there and consider upvoting the features you desire the most.