Return multiple outputs from a step
Use Annotated to return multiple outputs from a step and name them for easy retrieval and dashboard display.
You can use the Annotated type to return multiple outputs from a step and give each output a name. Naming your step outputs helps you retrieve a specific artifact later and also makes your pipeline's dashboard easier to read.
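The naming works because Annotated attaches metadata to a type that stays readable at runtime. The following standard-library sketch does not use ZenML. It only shows how a name attached with Annotated can be read back from a function's return type. The function names split_values and output_names are hypothetical, and how ZenML itself parses a step signature is not shown here.

```python
from typing import Annotated, Tuple, get_args, get_type_hints


def split_values(
    values: list,
) -> Tuple[Annotated[list, "first_half"], Annotated[list, "second_half"]]:
    """Return the two halves of a list, each half tagged with a name."""
    mid = len(values) // 2
    return values[:mid], values[mid:]


def output_names(func) -> list:
    """Collect the name attached to each Annotated element of a tuple return type."""
    # include_extras=True keeps the Annotated wrappers instead of stripping them
    return_type = get_type_hints(func, include_extras=True)["return"]
    names = []
    for element in get_args(return_type):
        # get_args(Annotated[T, "name"]) returns (T, "name")
        _, name = get_args(element)
        names.append(name)
    return names


print(output_names(split_values))  # ['first_half', 'second_half']
```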
Consider a clean_data step that takes a pandas DataFrame as input and returns a tuple of four elements: x_train, x_test, y_train, and y_test. Each element of the tuple in the return type is annotated with its own name using the Annotated type.
Inside the step, we split the input data into features (x) and target (y), and then use train_test_split from scikit-learn to split them into training and testing sets. The resulting DataFrames and Series are returned as a tuple, and each element carries the name given to it in the return annotation.
By using Annotated, we can identify and retrieve specific artifacts by name later in the pipeline. The names are also displayed on the pipeline's dashboard, which makes the pipeline easier to read and understand.