In this case, ZenML has an integration with sklearn, so you can use the ZenML CLI to install the right version directly (zenml integration install sklearn).
Steps with multiple outputs
Sometimes a step has multiple outputs. To give each output a unique name, use the Output() annotation. Here we load an open-source dataset and split it into a train and a test dataset.
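The following is a framework-free sketch of the data logic that such a step might contain. It shuffles a dataset and returns four separate values (X_train, X_test, y_train, y_test), which is the kind of multi-output result where each value gets its own name. The function name load_and_split and the toy data are hypothetical. The sketch uses neither ZenML nor sklearn. In a real step you would more likely call sklearn's train_test_split on the open-source dataset.

```python
import random
from typing import List, Sequence, Tuple

Row = List[float]


def load_and_split(
    X: Sequence[Row],
    y: Sequence[int],
    test_size: float = 0.2,
    seed: int = 42,
) -> Tuple[List[Row], List[Row], List[int], List[int]]:
    """Shuffle (X, y) pairs and split them into train and test portions.

    Hypothetical stand-in for a loader step: it returns four values,
    one per output (X_train, X_test, y_train, y_test).
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [list(X[i]) for i in train_idx]
    X_test = [list(X[i]) for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


if __name__ == "__main__":
    # Toy data: 10 rows with 2 features each (hypothetical, not the real dataset).
    X = [[float(i), float(i % 3)] for i in range(10)]
    y = [i % 2 for i in range(10)]
    X_train, X_test, y_train, y_test = load_and_split(X, y)
    print(len(X_train), len(X_test))  # 8 2
```

Because the four values are returned as a tuple, the caller unpacks them positionally. Naming each output is what lets a pipeline refer to them individually.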
Here we create a training step for a support vector machine (SVM) classifier using sklearn. Because we may want to tune the hyperparameter gamma later on, we also define it as an input parameter of the step.
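Some context on what gamma controls: sklearn's SVC uses the RBF kernel by default, K(x, x') = exp(-gamma * ||x - x'||^2). This means a larger gamma makes each training point's influence decay faster with distance. The pure-Python function rbf_kernel below is a hypothetical illustration of that formula. It is not sklearn's implementation, and its default gamma value is an arbitrary choice.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], x_prime: Sequence[float], gamma: float = 0.001) -> float:
    """Radial basis function kernel: exp(-gamma * ||x - x'||^2).

    Illustrative only; the default gamma here is arbitrary.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if len(x) != len(x_prime):
        raise ValueError("points must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


# The same pair of points compared with a small and a large gamma.
p, q = [0.0, 0.0], [3.0, 4.0]  # squared distance = 25
print(rbf_kernel(p, q, gamma=0.001))  # close to 1: wide influence
print(rbf_kernel(p, q, gamma=1.0))    # close to 0: narrow influence
```

This sensitivity is why it is useful to expose gamma as a step input rather than hard-coding it.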
If you want to run the step function outside the context of a ZenML pipeline, all you need to do is call the .entrypoint() method with the same input signature. For example:
svc_trainer.entrypoint(X_train=..., y_train=...)
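Conceptually, a decorated step is an object that wraps your original function, and .entrypoint() passes the call straight through to that function. The toy decorator below (toy_step and its ToyStep class) is entirely hypothetical and only mimics that idea. It is not how ZenML implements @step.

```python
import functools
from typing import Any, Callable


class ToyStep:
    """Hypothetical wrapper mimicking the idea of a step object."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self._func = func
        functools.update_wrapper(self, func)  # copy __name__, __doc__, etc.

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Pipeline-context behavior is out of scope for this toy.
        raise NotImplementedError("ToyStep does not model pipeline execution")

    def entrypoint(self, *args: Any, **kwargs: Any) -> Any:
        # Run the plain function directly, with the same signature.
        return self._func(*args, **kwargs)


def toy_step(func: Callable[..., Any]) -> ToyStep:
    return ToyStep(func)


@toy_step
def add_offset(value: float, offset: float = 1.0) -> float:
    return value + offset


print(add_offset.entrypoint(value=2.0))  # 3.0
```

Because entrypoint forwards arguments unchanged, it accepts exactly the same arguments as the undecorated function. This makes it convenient for unit tests.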
Next, we combine our two steps into a pipeline and run it. Note that gamma is exposed as a pipeline input, so it can be configured each time the pipeline is called.
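The following plain-Python analogue shows the wiring: the loader's outputs feed the trainer, and a pipeline-level gamma is forwarded to the trainer. All three functions are hypothetical stand-ins without ZenML decorators, and the default values are arbitrary.

```python
from typing import Dict, List, Tuple


def toy_loader() -> Tuple[List[float], List[float], List[int], List[int]]:
    # Hypothetical stand-in for the data-loading step (fixed toy data).
    X_train = [0.0, 1.0, 2.0, 3.0]
    X_test = [4.0]
    y_train = [0, 0, 1, 1]
    y_test = [1]
    return X_train, X_test, y_train, y_test


def toy_trainer(X_train: List[float], y_train: List[int], gamma: float = 0.001) -> Dict[str, float]:
    # Hypothetical stand-in for the training step: it only reports the
    # hyperparameter it received and the number of training samples.
    return {"gamma": gamma, "n_samples": float(len(X_train))}


def toy_pipeline(gamma: float = 0.002) -> Dict[str, float]:
    # The pipeline-level gamma overrides the trainer's own default,
    # so callers can tune it without editing the step definitions.
    X_train, X_test, y_train, y_test = toy_loader()
    return toy_trainer(X_train=X_train, y_train=y_train, gamma=gamma)


if __name__ == "__main__":
    print(toy_pipeline(gamma=0.01))
```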
Best Practice: Always nest the actual execution of the pipeline inside an if __name__ == "__main__" condition. This ensures that importing the pipeline from another module does not also run it.
if __name__ == "__main__":
    first_pipeline()
Running python main.py should look somewhat like this in the terminal:
The name of the pipeline run shown in this output is generated automatically from the current date and time. To change the name for a run, pass run_name as a parameter to the with_options() method:
Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the following placeholders that ZenML will replace:
{{date}} will resolve to the current date, e.g. 2023_02_19
{{time}} will resolve to the current time, e.g. 11_07_09_326492
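The hypothetical helper resolve_run_name below illustrates this substitution. It replaces the two placeholders in a run-name template using the formats from the examples above (2023_02_19 and 11_07_09_326492). It is a sketch and not ZenML's implementation. With ZenML, you would simply include the placeholders in the run_name you pass and let ZenML resolve them.

```python
from datetime import datetime
from typing import Optional


def resolve_run_name(template: str, now: Optional[datetime] = None) -> str:
    """Replace {{date}} and {{time}} placeholders in a run-name template.

    Hypothetical helper: the formats mirror the examples 2023_02_19 and
    11_07_09_326492 (year_month_day and hour_minute_second_microsecond).
    """
    now = now or datetime.now()
    return (
        template.replace("{{date}}", now.strftime("%Y_%m_%d"))
        .replace("{{time}}", now.strftime("%H_%M_%S_%f"))
    )


example = resolve_run_name(
    "training_run_{{date}}_{{time}}",
    now=datetime(2023, 2, 19, 11, 7, 9, 326492),
)
print(example)  # training_run_2023_02_19_11_07_09_326492
```

Because {{time}} includes microseconds, templates that use it will generally yield a distinct name on every run. This is what the uniqueness requirement calls for.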