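TensorFlow Model Analysis: Model Validation

Overview

TFMA supports validating a model by setting value thresholds and change thresholds on supported metrics.

Configuration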

GenericValueThreshold

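A value threshold gates the candidate model by checking whether the corresponding metric is larger than a lower bound and/or smaller than an upper bound. You can set the lower bound, the upper bound, or both. An unset lower bound defaults to negative infinity, and an unset upper bound defaults to infinity.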

import tensorflow_model_analysis as tfma

lower_bound = tfma.GenericValueThreshold(lower_bound={'value':0})
upper_bound = tfma.GenericValueThreshold(upper_bound={'value':1})
lower_upper_bound = tfma.GenericValueThreshold(lower_bound={'value':0},
                                               upper_bound={'value':1})
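
The bound semantics described above can be sketched in plain Python, without TFMA. This is only an illustration, not TFMA's implementation: the helper `passes_value_threshold` is hypothetical, and treating the bounds as inclusive is an assumption.

```python
import math
from typing import Optional


def passes_value_threshold(metric: float,
                           lower_bound: Optional[float] = None,
                           upper_bound: Optional[float] = None) -> bool:
    """Hypothetical helper: True if the metric lies within the bounds."""
    # Unset bounds fall back to -inf / +inf, as described in the text.
    lo = -math.inf if lower_bound is None else lower_bound
    hi = math.inf if upper_bound is None else upper_bound
    # Assumption: both bounds are inclusive.
    return lo <= metric <= hi


print(passes_value_threshold(0.5, lower_bound=0))
print(passes_value_threshold(1.5, lower_bound=0, upper_bound=1))
```

With only a lower bound set, any sufficiently large metric value passes, because the missing upper bound behaves like infinity.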

GenericChangeThreshold

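A change threshold gates the candidate model by checking whether the corresponding metric is larger or smaller than the same metric on a baseline model. Change can be measured in two ways: absolute change and relative change. The absolute change is the difference between the candidate and baseline metric values, namely v_c - v_b, where v_c denotes the candidate metric value and v_b denotes the baseline value. The relative change is the relative difference between the candidate metric and the baseline, namely v_c/v_b. Absolute and relative thresholds can coexist, in which case the model is gated on both criteria. Besides the threshold values, you also need to configure the MetricDirection. For metrics where a higher value is better (for example, AUC), set the direction to HIGHER_IS_BETTER; for metrics where a lower value is better (for example, loss), set the direction to LOWER_IS_BETTER. Change thresholds require a baseline model to be evaluated along with the candidate model. See the Getting Started guide for an example.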

import tensorflow_model_analysis as tfma

absolute_higher_is_better = tfma.GenericChangeThreshold(absolute={'value':1},
                                                        direction=tfma.MetricDirection.HIGHER_IS_BETTER)
absolute_lower_is_better = tfma.GenericChangeThreshold(absolute={'value':1},
                                                       direction=tfma.MetricDirection.LOWER_IS_BETTER)
relative_higher_is_better = tfma.GenericChangeThreshold(relative={'value':1},
                                                        direction=tfma.MetricDirection.HIGHER_IS_BETTER)
relative_lower_is_better = tfma.GenericChangeThreshold(relative={'value':1},
                                                       direction=tfma.MetricDirection.LOWER_IS_BETTER)
absolute_and_relative = tfma.GenericChangeThreshold(relative={'value':1},
                                                    absolute={'value':0.2},
                                                    direction=tfma.MetricDirection.LOWER_IS_BETTER)
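
The gating rule for change thresholds can be sketched in plain Python. This follows the definitions given in the text (v_c - v_b and v_c/v_b) and is not TFMA's implementation, which may compute these quantities differently. The `Direction` enum and `passes_change_threshold` helper are hypothetical, and the inclusive comparisons and the handling of a zero baseline are assumptions.

```python
import enum
import math
from typing import Optional


class Direction(enum.Enum):
    """Hypothetical stand-in for tfma.MetricDirection."""
    HIGHER_IS_BETTER = "higher"
    LOWER_IS_BETTER = "lower"


def passes_change_threshold(candidate: float,
                            baseline: float,
                            direction: Direction,
                            absolute: Optional[float] = None,
                            relative: Optional[float] = None) -> bool:
    """Hypothetical helper: gate on absolute and/or relative change."""
    higher = direction is Direction.HIGHER_IS_BETTER
    # Assumption: an unset threshold never blocks the model.
    default = -math.inf if higher else math.inf
    abs_limit = default if absolute is None else absolute
    rel_limit = default if relative is None else relative

    abs_change = candidate - baseline  # v_c - v_b
    if relative is None:
        rel_change = rel_limit  # nothing to check
    elif baseline == 0:
        return False  # assumption: an undefined ratio fails the check
    else:
        rel_change = candidate / baseline  # v_c / v_b, as stated in the text

    if higher:
        return abs_change >= abs_limit and rel_change >= rel_limit
    return abs_change <= abs_limit and rel_change <= rel_limit
```

For example, under these assumptions a candidate AUC of 0.80 against a baseline of 0.78 passes an absolute threshold of 0.01 with HIGHER_IS_BETTER, while a candidate loss of 0.30 against a baseline of 0.25 fails an absolute threshold of 0 with LOWER_IS_BETTER.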

Putting things together

The following example combines value and change thresholds:

import tensorflow_model_analysis as tfma

lower_bound = tfma.GenericValueThreshold(lower_bound={'value':0.7})
relative_higher_is_better = (
    tfma.GenericChangeThreshold(relative={'value':1.01},
                                direction=tfma.MetricDirection.HIGHER_IS_BETTER))
auc_threshold = tfma.MetricThreshold(value_threshold=lower_bound,
                                     change_threshold=relative_higher_is_better)
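
As a worked check of what this combined threshold asks for, the arithmetic can be written out in plain Python. The function `auc_gate` and the sample AUC values are hypothetical, the relative change is taken as v_c/v_b as defined earlier, and the sketch assumes that both the value threshold and the change threshold must pass.

```python
def auc_gate(candidate_auc: float, baseline_auc: float) -> bool:
    """Hypothetical gate mirroring the thresholds in the example above."""
    # Value threshold: candidate AUC must reach the lower bound of 0.7.
    value_ok = candidate_auc >= 0.7
    # Change threshold: v_c / v_b must reach 1.01 (HIGHER_IS_BETTER).
    change_ok = (candidate_auc / baseline_auc) >= 1.01
    # Assumption: the model passes only if both checks pass.
    return value_ok and change_ok


# Hypothetical metric values, for illustration only.
print(auc_gate(candidate_auc=0.75, baseline_auc=0.74))
print(auc_gate(candidate_auc=0.75, baseline_auc=0.749))
```

In the first call the ratio is roughly 1.0135, which clears 1.01; in the second it is roughly 1.0013, so the candidate is rejected even though its AUC is above 0.7.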

It can be more readable to write the config in proto format:

import tensorflow_model_analysis as tfma
from google.protobuf import text_format

auc_threshold = text_format.Parse("""
  value_threshold { lower_bound { value: 0.6 } }
  change_threshold { relative { value: 1.01 } }
""", tfma.MetricThreshold())

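A MetricThreshold can be set to gate on both model training-time metrics (from an EvalSavedModel or a Keras saved model) and post-training metrics (defined in the TFMA config). For training-time metrics, thresholds are specified in tfma.MetricsSpec:

metrics_spec = tfma.MetricsSpec(thresholds={'auc': auc_threshold})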

For post-training metrics, thresholds are defined directly in tfma.MetricConfig:

metric_config = tfma.MetricConfig(class_name='TotalWeightedExample',
                                  threshold=lower_bound)

Here is an example together with the other settings in the EvalConfig:

# Run in a Jupyter Notebook.
import tensorflow_model_analysis as tfma
from google.protobuf import text_format

eval_config = text_format.Parse("""
  model_specs {
    # This assumes a serving model with a "serving_default" signature.
    label_key: "label"
    example_weight_key: "weight"
  }
  metrics_spec {
    # Training Time metric thresholds
    thresholds {
      key: "auc"
      value: {
        value_threshold {
          lower_bound { value: 0.7 }
        }
        change_threshold {
          direction: HIGHER_IS_BETTER
          absolute { value: -1e-10 }
        }
      }
    }
    # Post Training metrics and their thresholds.
    metrics {
      # This assumes a binary classification model.
      class_name: "AUC"
      threshold {
        value_threshold {
          lower_bound { value: 0 }
        }
      }
    }
  }
  slicing_specs {}
  slicing_specs {
    feature_keys: ["age"]
  }
""", tfma.EvalConfig())

eval_shared_models = [
  tfma.default_eval_shared_model(
      model_name=tfma.CANDIDATE_KEY,
      eval_saved_model_path='/path/to/saved/candidate/model',
      eval_config=eval_config),
  tfma.default_eval_shared_model(
      model_name=tfma.BASELINE_KEY,
      eval_saved_model_path='/path/to/saved/baseline/model',
      eval_config=eval_config),
]

# Defined once so the same location is used to write and to load results.
output_path = "/path/for/output"

eval_result = tfma.run_model_analysis(
    eval_shared_models,
    eval_config=eval_config,
    # This assumes your data is a TFRecords file containing records in the
    # tf.train.Example format.
    data_location="/path/to/file/containing/tfrecords",
    output_path=output_path)

tfma.view.render_slicing_metrics(eval_result)
tfma.load_validation_result(output_path)

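Output

In addition to the metrics file output by the evaluator, an additional "validations" file is output when validation is used. The payload format is ValidationResult. When there are no failures, the output has "validation_ok" set to True. When there are failures, information is provided about the associated metrics, the thresholds, and the metric values that were observed. The following is an example where "weighted_example_count" fails a value threshold (1.5 is not smaller than 1.0, hence the failure):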

  validation_ok: False
  metric_validations_per_slice {
    failures {
      metric_key {
        name: "weighted_example_count"
        model_name: "candidate"
      }
      metric_threshold {
        value_threshold {
          upper_bound { value: 1.0 }
        }
      }
      metric_value {
        double_value { value: 1.5 }
      }
    }
  }
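
One way to turn such a result into a readable report is sketched below. To keep the example self-contained, it uses a plain dictionary that mirrors the output shown above instead of a real ValidationResult proto; the dictionary layout and the `summarize_failures` helper are hypothetical and are not part of TFMA.

```python
from typing import List

# Hypothetical dict stand-in mirroring the ValidationResult shown above.
validation_result = {
    "validation_ok": False,
    "metric_validations_per_slice": [
        {
            "failures": [
                {
                    "metric_key": {"name": "weighted_example_count",
                                   "model_name": "candidate"},
                    "metric_threshold": {
                        "value_threshold": {"upper_bound": {"value": 1.0}}},
                    "metric_value": {"double_value": {"value": 1.5}},
                },
            ],
        },
    ],
}


def summarize_failures(result: dict) -> List[str]:
    """Hypothetical helper: one line per failed metric validation."""
    if result.get("validation_ok"):
        return []
    lines = []
    for per_slice in result.get("metric_validations_per_slice", []):
        for failure in per_slice.get("failures", []):
            key = failure["metric_key"]
            observed = failure["metric_value"]["double_value"]["value"]
            lines.append(
                f"{key['model_name']}/{key['name']}: observed {observed}, "
                f"threshold {failure['metric_threshold']}")
    return lines


for line in summarize_failures(validation_result):
    print(line)
```

With a real result, the same traversal would go through the proto's fields rather than dictionary keys.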