Evidently
How to keep your data quality in check and guard against data and model drift with Evidently profiling
When would you want to use it?
You should use the Evidently Data Validator when you need the data and/or model validation features that Evidently provides, such as data drift detection, target drift detection, and data quality and model performance reporting.
How do you deploy it?
The Evidently Data Validator flavor is included in the Evidently ZenML integration. You need to install this integration on your local machine to be able to register an Evidently Data Validator and add it to your stack:
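The integration can be installed through the ZenML CLI:

```shell
# Install the Evidently integration and its requirements
zenml integration install evidently -y
```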
The Data Validator stack component does not have any configuration parameters. Adding it to a stack is as simple as running e.g.:
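A sketch of the registration commands; the stack name and the other stack components shown here are placeholders for your own setup:

```shell
# Register the Evidently data validator
zenml data-validator register evidently_data_validator --flavor=evidently

# Register and activate a stack that includes it (orchestrator and
# artifact store names are placeholders)
zenml stack register evidently_stack \
    -o default \
    -a default \
    -dv evidently_data_validator \
    --set
```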
How do you use it?
Evidently's profiling functions take in a pandas.DataFrame dataset or a pair of datasets and generate results in the form of a Profile object containing all the relevant information, or as a Dashboard visualization.
There are three ways you can use Evidently in your ZenML pipelines that allow different levels of flexibility:
The Evidently standard step
ZenML wraps the Evidently functionality in the form of a standard EvidentlyProfileStep step. You select which reports to generate by passing a list of string identifiers into the EvidentlyProfileConfig:
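A minimal sketch of the step configuration; the exact parameter name (`profile_sections`) is an assumption based on the integration's API and should be checked against the EvidentlyProfileConfig reference:

```python
from zenml.integrations.evidently.steps import (
    EvidentlyProfileConfig,
    EvidentlyProfileStep,
)

# Configure the standard step to generate a data drift report
drift_detector = EvidentlyProfileStep(
    EvidentlyProfileConfig(
        profile_sections=["datadrift"],
    )
)
```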
The step can then be inserted into your pipeline where it can take in two datasets, e.g.:
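A hypothetical pipeline wiring the step to two datasets; the pipeline and step names are placeholders, and the input names (`reference_dataset`, `comparison_dataset`) are assumptions about the standard step's signature:

```python
from zenml.pipelines import pipeline


@pipeline
def drift_detection_pipeline(data_loader, drift_detector):
    """Load a reference and a comparison dataset and profile them."""
    reference_dataset, comparison_dataset = data_loader()
    drift_detector(
        reference_dataset=reference_dataset,
        comparison_dataset=comparison_dataset,
    )
```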
Possible report options supported by Evidently are:
"datadrift"
"categoricaltargetdrift"
"numericaltargetdrift"
"dataquality"
"classificationmodelperformance"
"regressionmodelperformance"
"probabilisticmodelperformance"
If needed, Evidently column mappings can be passed into the step configuration, but as zenml.integrations.evidently.steps.EvidentlyColumnMapping objects, which have the exact same structure as evidently.pipeline.column_mapping.ColumnMapping:
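A sketch of a configuration with a column mapping; the column names here are hypothetical, and the mapping fields mirror Evidently's own ColumnMapping attributes (target, prediction, numerical_features, categorical_features):

```python
from zenml.integrations.evidently.steps import (
    EvidentlyColumnMapping,
    EvidentlyProfileConfig,
)

# Map dataset columns to the roles Evidently expects
config = EvidentlyProfileConfig(
    column_mapping=EvidentlyColumnMapping(
        target="target",
        prediction="prediction",
        numerical_features=["age", "income"],   # hypothetical column names
        categorical_features=["country"],       # hypothetical column name
    ),
    profile_sections=["categoricaltargetdrift"],
)
```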
You can also check out our examples pages for working examples that use the Evidently standard step:
The Evidently Data Validator
The Evidently Data Validator implements the same interface as all other Data Validators, so this method forces you to maintain some level of compatibility with the overall Data Validator abstraction, which guarantees an easier migration if you later decide to switch to another Data Validator.
All you have to do is call the Evidently Data Validator methods when you need to interact with Evidently to generate data profiles, e.g.:
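A sketch of a custom step using the Data Validator interface; the method name and signature of `data_profiling` are assumptions from the integration's API and should be verified against the EvidentlyDataValidator reference:

```python
import pandas as pd
from evidently.dashboard import Dashboard
from evidently.model_profile import Profile
from zenml.integrations.evidently.data_validators import EvidentlyDataValidator
from zenml.steps import step, Output


@step
def data_profiler(
    reference_dataset: pd.DataFrame,
    comparison_dataset: pd.DataFrame,
) -> Output(profile=Profile, dashboard=Dashboard):
    """Profile two datasets through the active Evidently Data Validator."""
    data_validator = EvidentlyDataValidator.get_active_data_validator()
    profile, dashboard = data_validator.data_profiling(
        dataset=reference_dataset,
        comparison_dataset=comparison_dataset,
        profile_list=["datadrift"],
    )
    return profile, dashboard
```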
Call Evidently directly
You can use the Evidently library directly in your custom pipeline steps, and only leverage ZenML's capability of serializing, versioning and storing the Profile objects in its Artifact Store, e.g.:
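A sketch of a step that calls Evidently's (legacy) Profile API directly and simply returns the result for ZenML to store as an artifact:

```python
import pandas as pd
from evidently.model_profile import Profile
from evidently.model_profile.sections import DataDriftProfileSection
from zenml.steps import step


@step
def data_drift_profiler(
    reference_dataset: pd.DataFrame,
    comparison_dataset: pd.DataFrame,
) -> Profile:
    """Compute a data drift profile with Evidently directly."""
    profile = Profile(sections=[DataDriftProfileSection()])
    profile.calculate(reference_dataset, comparison_dataset)
    # ZenML serializes, versions and stores the returned Profile
    return profile
```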
The Evidently ZenML Visualizer
The Evidently dashboards will be opened as tabs in your browser, or displayed inline in your Jupyter notebook, depending on where you are running the code:
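A sketch of fetching a past run and visualizing its drift detection output; the pipeline and step names are placeholders matching whatever names you used, and the Repository-based lookup reflects the ZenML post-execution API:

```python
from zenml.integrations.evidently.visualizers import EvidentlyVisualizer
from zenml.repository import Repository

# Fetch the last run of the pipeline and the drift detection step's output
repo = Repository()
pipeline = repo.get_pipeline(pipeline_name="drift_detection_pipeline")
last_run = pipeline.runs[-1]
drift_detection_step = last_run.get_step(name="drift_detector")

# Render the Evidently dashboard (browser tab or inline in a notebook)
EvidentlyVisualizer().visualize(drift_detection_step)
```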