Evidently is a useful open-source library to painlessly check for data drift (among other features). At its core, Evidently's drift detection takes in a reference dataset and compares it against a second, comparison dataset. Both are passed in as Pandas DataFrames, though CSV inputs are also possible. You can receive the results either as a standard dictionary object containing all the relevant information, or as a visualization. ZenML supports both outputs.

EvidentlyProfileConfig. Possible options supported by Evidently are:

drift_detection example here. The key part of the pipeline definition above is where we take the datasets derived from the data_splitter step (i.e. function) and pass them in as arguments to the drift_detector function as part of the pipeline.
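
To make that hand-off concrete, here is a minimal sketch in plain Python of how the outputs of data_splitter become the inputs of drift_detector. It rests on several assumptions: it uses neither ZenML nor Evidently, dictionaries of lists stand in for Pandas DataFrames, and the bodies of data_splitter, drift_detector and drift_pipeline are hypothetical. Drift is measured here with a hand-rolled two-sample Kolmogorov-Smirnov statistic and an arbitrary threshold of 0.5, chosen purely for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, comparison):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    ref_sorted = sorted(reference)
    cmp_sorted = sorted(comparison)
    max_gap = 0.0
    # The largest gap between two step functions is always reached at an observed value.
    for value in ref_sorted + cmp_sorted:
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        cmp_cdf = bisect_right(cmp_sorted, value) / len(cmp_sorted)
        max_gap = max(max_gap, abs(ref_cdf - cmp_cdf))
    return max_gap


def data_splitter(columns):
    """Hypothetical step: cut every column in half into a reference and a comparison set."""
    reference, comparison = {}, {}
    for name, values in columns.items():
        midpoint = len(values) // 2
        reference[name] = values[:midpoint]
        comparison[name] = values[midpoint:]
    return reference, comparison


def drift_detector(reference, comparison, threshold=0.5):
    """Hypothetical step: return a plain dictionary report with one entry per column."""
    report = {}
    for name in reference:
        statistic = ks_statistic(reference[name], comparison[name])
        # The threshold is an illustrative assumption, not a recommended value.
        report[name] = {"ks_statistic": statistic, "drift_detected": statistic > threshold}
    return report


def drift_pipeline(columns):
    """Wire the steps together: the splitter's outputs are the detector's arguments."""
    reference, comparison = data_splitter(columns)
    return drift_detector(reference, comparison)


data = {
    "stable": [1, 2, 3, 4, 1, 2, 3, 4],
    "shifted": [1, 2, 3, 4, 11, 12, 13, 14],
}
report = drift_pipeline(data)
print(report)
```

The report is an ordinary dictionary, echoing the dictionary-style output mentioned earlier. In this toy data, the second half of the "shifted" column does not overlap with the first half at all, so it is flagged, while "stable" is not.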