The Evaluator is executed right after Trainer step in the TrainingPipeline. The purpose of the evaluator step is to calculate relevant metrics to ascertain how well a trained model performed on un-seen data. It gives the user a chance to dig a bit deeper into the trained model and grasp a better understanding of it by conducting a post-training evaluation which features configurable slices and metrics.

Standard Evaluators

There are some built-in evaluators available for use with ZenML.


The TFMAEvaluator utilizes the Tensorflow Model Analysis (TFMA) to evaluate a model produced by the TrainerStep. Structurally, it features two main parameters, namely slices and metrics.

  • slices represent a list of string values, which hold the slicing columns. In more detail, the data will be sliced based on the columns given in this list and the post-training evaluation will be conducted on each slice.

  • metrics define which metrics will be computed during the evaluation.


Currently, the TFMAEvaluator only works with Trainers fully implementing the TFBaseTrainerStep interface. An example is the standard tf_ff_trainer.FeedForwardTrainer step.


Let’s start with a simple example. Imagine you are dealing with a simple regression task on a single label and you want to compute the mean_squared_error of your trained model on the eval dataset. Moreover, you want to do this not just on a slice of the entire eval dataset, but also on each category within the native_country feature within your eval dataset. In order to configure your pipeline accordingly, you can add the step:

                  metrics={'native_country': ['mean_squared_error']}))

Taking on a bit more complicated example, let us assume that the task handles more than one label, age and income_bracket and due to the nature of these labels, you want to apply different metrics to each label. Furthermore, considering the slices, you do not just want to slice on features, native_country and marital_status, but you also want to combine them to create a multi-feature slicing which features both the native_country and the marital_status:

            ['native_country', 'marital_status']
            'age': ['mean_squared_error'],
            'income_bracket': ['binary_crossentropy']

Create custom evaluator

The mechanism to create a custom Evaluator will be published in more detail soon in this space. However, the details of this are currently being worked out and will be made available in future releases.

If you need this functionality earlier, then ping us on our Slack or create an issue on GitHub so that we know about it!