Hyper-parameter tuning
Running a hyperparameter tuning trial with ZenML.
Introduction
Hyper‑parameter tuning is the process of systematically searching for the best set of hyper‑parameters for your model. In ZenML, you can express these experiments declaratively inside a pipeline so that every trial is tracked, reproducible and shareable.
In this tutorial you will:
Build a simple training step that takes a hyper-parameter as input.
Create a fan-out / fan-in pipeline that trains multiple models in parallel – one for each hyper-parameter value.
Select the best performing model.
Run the pipeline and inspect the results in the ZenML dashboard or programmatically.
Prerequisites
ZenML installed and an active stack (the local default stack is fine)
scikit-learn installed (pip install scikit-learn)
Basic familiarity with ZenML pipelines and steps
Step 1 Define the training step
Create a training step that accepts the learning rate as an input parameter and returns the trained model:
from typing import Annotated

from sklearn.base import ClassifierMixin
from zenml import step

MODEL_OUTPUT = "model"


@step
def train_step(learning_rate: float) -> Annotated[ClassifierMixin, MODEL_OUTPUT]:
    """Train a model with the given learning rate."""
    # <your training code goes here>
    ...
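If you want something concrete to start from, here is a minimal sketch of such a step. The dataset (Iris), the model (SGDClassifier) and the way the learning rate is wired in are assumptions made for illustration, not part of this tutorial:

from typing import Annotated

from sklearn.base import ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from zenml import step

MODEL_OUTPUT = "model"


@step
def train_step(learning_rate: float) -> Annotated[ClassifierMixin, MODEL_OUTPUT]:
    """Illustrative only: train an SGDClassifier with a fixed learning rate."""
    X, y = load_iris(return_X_y=True)
    # Assumption: use the "constant" schedule so eta0 is the actual learning rate.
    # eta0 must be strictly positive, so clamp the value (the pipeline below
    # passes 0.0 for its first trial).
    model = SGDClassifier(
        learning_rate="constant", eta0=max(learning_rate, 1e-6), random_state=42
    )
    model.fit(X, y)
    return model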
Step 2 Create a fan‑out / fan‑in pipeline
Next, wire several instances of the same train_step into a pipeline, each with a different hyper-parameter. Afterwards, use a selection step that takes all models as input and decides which one is best.
from zenml import pipeline
from zenml import get_step_context, step
from zenml.client import Client


@step
def selection_step(step_prefix: str, output_name: str):
    """Pick the best model among all training steps."""
    run = Client().get_pipeline_run(get_step_context().pipeline_run.name)

    trained_models = {}
    for step_name, step_info in run.steps.items():
        if step_name.startswith(step_prefix):
            model = step_info.outputs[output_name][0].load()
            lr = step_info.config.parameters["learning_rate"]
            trained_models[lr] = model

    # <evaluate and select your favorite model here>


@pipeline
def hp_tuning_pipeline(step_count: int = 4):
    after = []
    for i in range(step_count):
        train_step(learning_rate=i * 0.0001, id=f"train_step_{i}")
        after.append(f"train_step_{i}")
    selection_step(step_prefix="train_step_", output_name=MODEL_OUTPUT, after=after)
Currently ZenML doesn't allow passing a variable number of inputs into a step. The workaround shown above queries the artifacts after the fact via the Client.
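One way to fill in the evaluation placeholder is to score every loaded model on a held-out set and return the winner as an output named best_model so it can be fetched later. This is only a sketch: the Iris evaluation data and the best_model output name are illustrative assumptions, not part of the original tutorial.

from typing import Annotated

from sklearn.base import ClassifierMixin
from sklearn.datasets import load_iris
from zenml import get_step_context, step
from zenml.client import Client


@step
def selection_step(
    step_prefix: str, output_name: str
) -> Annotated[ClassifierMixin, "best_model"]:
    """Illustrative only: pick the model with the highest accuracy on an eval set."""
    run = Client().get_pipeline_run(get_step_context().pipeline_run.name)

    trained_models = {}
    for step_name, step_info in run.steps.items():
        if step_name.startswith(step_prefix):
            model = step_info.outputs[output_name][0].load()
            lr = step_info.config.parameters["learning_rate"]
            trained_models[lr] = model

    # Assumption: evaluate on Iris; in practice you would pass in proper eval data.
    X_eval, y_eval = load_iris(return_X_y=True)
    scores = {lr: model.score(X_eval, y_eval) for lr, model in trained_models.items()}
    best_lr = max(scores, key=scores.get)
    print(f"Best learning rate: {best_lr} (accuracy={scores[best_lr]:.3f})")
    return trained_models[best_lr]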
Step 3 Run the pipeline
if __name__ == "__main__":
    hp_tuning_pipeline(step_count=4)
While the pipeline is running you can:
follow the logs in your terminal
open the ZenML dashboard and watch the DAG execute
Step 4 Inspect results
Once the run is finished you can programmatically analyze which hyper-parameter performed best or load the chosen model (this assumes your selection step returns the winning model as an output named best_model, as in the sketch above):
from zenml.client import Client

run = Client().get_pipeline("hp_tuning_pipeline").last_run
best_model = run.steps["selection_step"].outputs["best_model"][0].load()
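If you also want to compare the hyper-parameters themselves, the learning rate each training step was configured with can be read from the same run object, mirroring what the selection step does:

# Print the learning rate used by each training step of the run.
for step_name, step_info in run.steps.items():
    if step_name.startswith("train_step_"):
        print(step_name, "->", step_info.config.parameters["learning_rate"])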
For a deeper exploration of how to query past pipeline runs, see the Inspecting past pipeline runs tutorial.
Next steps
Replace the simple grid search with a more sophisticated tuner (e.g. sklearn.model_selection.GridSearchCV or Optuna); see the sketch after this list.
Serve the winning model right away via a Model Deployer.
Move the pipeline to a remote orchestrator to scale out the search.
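As a rough sketch of the Optuna route, one option is to run the whole study inside a single ZenML step and return the best value found. The search space, model and dataset below are assumptions for illustration only:

import optuna
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from zenml import step


@step
def optuna_search_step(n_trials: int = 20) -> float:
    """Illustrative only: run an Optuna study in one step and return the best learning rate."""
    X, y = load_iris(return_X_y=True)

    def objective(trial: optuna.Trial) -> float:
        # Assumption: tune only the learning rate on a log scale.
        lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
        model = SGDClassifier(learning_rate="constant", eta0=lr, random_state=42)
        return cross_val_score(model, X, y, cv=3).mean()

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params["learning_rate"]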