The Evaluator TFX Pipeline Component

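The Evaluator TFX pipeline component performs deep analysis on the training results for your models, to help you understand how a model performs on subsets of your data. The Evaluator also helps you validate exported models, ensuring that they are "good enough" to be pushed to production.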

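When validation is enabled, the Evaluator compares a new model against a baseline (such as the currently serving model) to determine whether it is "good enough" relative to that baseline. It does so by evaluating both models on an eval dataset and computing their performance on metrics (e.g. AUC, loss). If the new model's metrics meet developer-specified criteria relative to the baseline model (e.g. AUC is not lower than the baseline), the model is "blessed" (marked as good), indicating to the Pusher that it is ok to push the model to production.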
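
The comparison described above can be sketched in plain Python. This is a minimal illustration rather than TFMA's actual implementation: the `Thresholds` class and `is_good_enough` function are hypothetical names, and the sketch assumes a single metric where higher is better.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    # Hypothetical container for illustration; these are not TFMA types.
    lower_bound: Optional[float] = None     # absolute floor on the candidate metric
    min_abs_change: Optional[float] = None  # minimum allowed (candidate - baseline)


def is_good_enough(candidate: float, baseline: Optional[float], t: Thresholds) -> bool:
    """Return True if the candidate metric meets the thresholds (higher is better)."""
    if t.lower_bound is not None and candidate < t.lower_bound:
        return False
    # The change check only applies when a baseline exists (it is skipped on a first run).
    if baseline is not None and t.min_abs_change is not None:
        if candidate - baseline < t.min_abs_change:
            return False
    return True


thresholds = Thresholds(lower_bound=0.5, min_abs_change=-1e-10)
print(is_good_enough(0.91, 0.90, thresholds))  # True: above the floor and not worse than baseline
print(is_good_enough(0.85, 0.90, thresholds))  # False: regressed against the baseline
print(is_good_enough(0.85, None, thresholds))  # True: no baseline, only the floor applies
```

A slightly negative `min_abs_change` such as `-1e-10` tolerates floating-point noise while still rejecting real regressions.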

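Consumes:
- An eval split from Examples
- A trained model from Trainer
- A previously blessed model (if validation is to be performed)
Emits:
- Analysis results to ML Metadata
- Validation results to ML Metadata (if validation is to be performed)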

Evaluator and TensorFlow Model Analysis

The Evaluator uses the TensorFlow Model Analysis (TFMA) library to perform the analysis, which in turn uses Apache Beam for scalable processing.

Using the Evaluator Component

An Evaluator pipeline component is typically very easy to deploy and requires little customization, since the component itself does most of the work.

To set up the Evaluator, the following information is needed:

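- Metrics to configure (only required if additional metrics are being added beyond those saved with the model). See TensorFlow Model Analysis Metrics for more information.
- Slices to configure (if no slices are given, an "overall" slice is added by default). See TensorFlow Model Analysis Setup for more information.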
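
To make the idea of slices concrete, here is a minimal pure-Python sketch of computing one metric per slice. It is not how TFMA computes metrics; the `accuracy_by_slice` helper and the sample records are hypothetical and exist only to show the difference between the overall slice and a per-feature slice.

```python
from collections import defaultdict


def accuracy_by_slice(examples, feature_key=None):
    """Compute accuracy per value of feature_key; None means the overall slice."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        slice_key = "Overall" if feature_key is None else (feature_key, ex[feature_key])
        total[slice_key] += 1
        correct[slice_key] += int(ex["label"] == ex["prediction"])
    return {key: correct[key] / total[key] for key in total}


# Hypothetical evaluation records, made up for illustration.
examples = [
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 1},
    {"trip_start_hour": 17, "label": 1, "prediction": 1},
    {"trip_start_hour": 17, "label": 0, "prediction": 0},
]

print(accuracy_by_slice(examples))
# {'Overall': 0.75}
print(accuracy_by_slice(examples, "trip_start_hour"))
# {('trip_start_hour', 8): 0.5, ('trip_start_hour', 17): 1.0}
```

The overall number hides the fact that one slice performs much worse than the other, which is the kind of gap slicing is meant to surface.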

If validation is to be included, the following additional information is needed:

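- Which model to compare against (latest blessed model, etc.).
- Model validations (thresholds) to verify. See TensorFlow Model Analysis Model Validations for more information.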

When enabled, validation is performed against all of the metrics and slices that were defined.
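
One way to picture validation across every metric and slice is the sketch below. It is an illustration under stated assumptions, not TFMA's validator: the `validate` function, the result layout, and the slice names are hypothetical, and only lower-bound thresholds are modeled.

```python
def validate(results, lower_bounds):
    """Check every (slice, metric) pair that has a threshold.

    results: {slice_name: {metric_name: value}}
    lower_bounds: {metric_name: minimum acceptable value}
    Returns (ok, failures), where failures lists each pair below its bound.
    """
    failures = [
        (slice_name, metric, value)
        for slice_name, metrics in results.items()
        for metric, value in metrics.items()
        if metric in lower_bounds and value < lower_bounds[metric]
    ]
    return len(failures) == 0, failures


# Hypothetical per-slice results, made up for illustration.
results = {
    "Overall": {"binary_accuracy": 0.82},
    "trip_start_hour=8": {"binary_accuracy": 0.47},
}

ok, failures = validate(results, {"binary_accuracy": 0.5})
print(ok)        # False
print(failures)  # [('trip_start_hour=8', 'binary_accuracy', 0.47)]
```

In this sketch a single failing slice is enough to withhold the blessing, even though the overall metric clears the bound.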

Typical code looks like this:

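import tensorflow_model_analysis as tfma
# Imports for the TFX names used below, assuming the TFX v1 public API.
from tfx.v1 import dsl
from tfx.v1.components import Evaluator
from tfx.v1.dsl import Channel, Resolver
from tfx.v1.types.standard_artifacts import Model, ModelBlessing
...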

# For TFMA evaluation

eval_config = tfma.EvalConfig(
    model_specs=[
        # This assumes a serving model with signature 'serving_default'. If
        # using estimator based EvalSavedModel, add signature_name='eval' and
        # remove the label_key. Note, if using a TFLite model, then you must set
        # model_type='tf_lite'.
        tfma.ModelSpec(label_key='<label_key>')
    ],
    metrics_specs=[
        tfma.MetricsSpec(
            # The metrics added here are in addition to those saved with the
            # model (assuming either a keras model or EvalSavedModel is used).
            # Any metrics added into the saved model (for example using
            # model.compile(..., metrics=[...]), etc) will be computed
            # automatically.
            metrics=[
                tfma.MetricConfig(class_name='ExampleCount'),
                tfma.MetricConfig(
                    class_name='BinaryAccuracy',
                    threshold=tfma.MetricThreshold(
                        value_threshold=tfma.GenericValueThreshold(
                            lower_bound={'value': 0.5}),
                        change_threshold=tfma.GenericChangeThreshold(
                            direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                            absolute={'value': -1e-10})))
            ]
        )
    ],
    slicing_specs=[
        # An empty slice spec means the overall slice, i.e. the whole dataset.
        tfma.SlicingSpec(),
        # Data can be sliced along a feature column. In this case, data is
        # sliced along feature column trip_start_hour.
        tfma.SlicingSpec(feature_keys=['trip_start_hour'])
    ])

# The following component is experimental and may change in the future. This is
# required to specify the latest blessed model will be used as the baseline.
model_resolver = Resolver(
      strategy_class=dsl.experimental.LatestBlessedModelStrategy,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing)
).with_id('latest_blessed_model_resolver')

model_analyzer = Evaluator(
      examples=examples_gen.outputs['examples'],
      model=trainer.outputs['model'],
      baseline_model=model_resolver.outputs['model'],
      # Change threshold will be ignored if there is no baseline (first run).
      eval_config=eval_config)

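The Evaluator produces an EvalResult (and optionally a ValidationResult if validation was used) that can be loaded using TFMA. The following is an example of how to load the results into a Jupyter notebook: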

import tensorflow_model_analysis as tfma

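# `evaluator` is the Evaluator component instance (named `model_analyzer` in
# the pipeline code above).
output_path = evaluator.outputs['evaluation'].get()[0].uri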

# Load the evaluation results.
eval_result = tfma.load_eval_result(output_path)

# Visualize the metrics and plots using tfma.view.render_slicing_metrics,
# tfma.view.render_plot, etc.
tfma.view.render_slicing_metrics(eval_result)
...

# Load the validation results
validation_result = tfma.load_validation_result(output_path)
if not validation_result.validation_ok:
  ...
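
Outside of the rendering widgets, the loaded metrics can also be inspected programmatically. The sketch below assumes that `eval_result.slicing_metrics` is a list of `(slice_key, metrics)` pairs whose metrics are nested as output name, then sub key, then metric name, then a dict holding a `doubleValue`; check that layout against your TFMA version. The `flatten_slicing_metrics` helper and the `sample` data are hypothetical, so the code runs without TFMA installed.

```python
def flatten_slicing_metrics(slicing_metrics):
    """Flatten nested per-slice metrics into (slice_name, metric_name, value) rows."""
    rows = []
    for slice_key, by_output in slicing_metrics:
        # An empty slice key stands for the overall slice.
        slice_name = ", ".join(f"{k}={v}" for k, v in slice_key) or "Overall"
        for by_sub_key in by_output.values():
            for metrics in by_sub_key.values():
                for name, value in metrics.items():
                    # Keep only scalar metrics; skip other value kinds.
                    if "doubleValue" in value:
                        rows.append((slice_name, name, value["doubleValue"]))
    return rows


# Hand-built stand-in for eval_result.slicing_metrics (assumed layout).
sample = [
    ((), {"": {"": {"binary_accuracy": {"doubleValue": 0.82}}}}),
    ((("trip_start_hour", 8),), {"": {"": {"binary_accuracy": {"doubleValue": 0.47}}}}),
]

for row in flatten_slicing_metrics(sample):
    print(row)
# ('Overall', 'binary_accuracy', 0.82)
# ('trip_start_hour=8', 'binary_accuracy', 0.47)
```

With real results, passing `eval_result.slicing_metrics` in place of `sample` should yield one row per slice and scalar metric, provided the assumed layout holds.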

For more details, see the Evaluator API reference.