Pipelines
How to create ML pipelines in ZenML
Steps & Pipelines
ZenML helps you standardize your ML workflows as ML pipelines consisting of decoupled, modular steps. This lets you write portable code that can move from experimentation to production with minimal changes.
Step
Steps are the atomic components of a ZenML pipeline. Each step is defined by its inputs, the logic it applies and its outputs. Here is a very basic example of such a step, which uses a utility function to load the Digits dataset:
Let's write a second step that consumes the output of the first step and transforms it. In this case, it trains a support vector machine (SVM) classifier on the training data using scikit-learn:
Next, we will combine our two steps into our first ML pipeline.
To run a step function outside the context of a ZenML pipeline, call its .entrypoint() method with the same input signature. For example:
Artifacts
The inputs and outputs of a step are artifacts that ZenML automatically tracks and stores in the artifact store. Artifacts are produced by and passed between steps whenever a step returns an object or a value. If a step returns only a single value or object, there is no need to use the Output class shown above.
Pipeline
Let us now define our first ML pipeline. The pipeline definition is agnostic of the concrete step implementations: it only describes how outputs are routed from one step to the next. You can think of it as a recipe for how data flows through our steps.
Instantiate and run your Pipeline
With your pipeline recipe in hand, you can now specify which concrete step implementations to use when instantiating the pipeline:
You can then execute your pipeline instance with the .run() method:
You should see the following output in your terminal:
Inspect your pipeline in the dashboard
Give each pipeline run a name
When you run a pipeline by calling my_pipeline.run(), ZenML uses the current date and time as the name of the pipeline run. To change the name of a run, pass run_name as a parameter to the run() function:
Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the following placeholders that will be replaced by ZenML:
{{date}} will resolve to the current date, e.g. 2023_02_19
{{time}} will resolve to the current time, e.g. 11_07_09_326492
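To compute the run name dynamically instead, you could use a small helper like the hypothetical make_run_name below. It builds a unique name from a prefix and a timestamp, and formats the date and time to mirror the placeholder examples above. This is an illustration in plain Python, not ZenML's own placeholder substitution.

```python
from datetime import datetime
from typing import Optional


def make_run_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Build a run name such as '<prefix>_2023_02_19_11_07_09_326492'."""
    now = now or datetime.now()
    # %f is microseconds zero-padded to six digits, matching the {{time}} example
    return f"{prefix}_{now:%Y_%m_%d}_{now:%H_%M_%S_%f}"


print(make_run_name("training_run", datetime(2023, 2, 19, 11, 7, 9, 326492)))
# prints training_run_2023_02_19_11_07_09_326492
```

You can pass the result as run_name=make_run_name("training_run"). Alternatively, pass a template string such as "training_run_{{date}}_{{time}}" and let ZenML fill in the placeholders.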
Unlisted runs
Pipeline runs can be created without being explicitly associated with a pipeline. These are called unlisted runs, and you create them by passing the unlisted parameter when running a pipeline: pipeline_instance.run(unlisted=True).
Pipelines can be deleted and created again using zenml pipeline delete <PIPELINE_ID_OR_NAME>.