TF.Text 指标

在 TensorFlow.org 上查看 在 Google Colab 中运行 在 GitHub 上查看 下载笔记本

概览

TensorFlow Text 提供了一组与文本指标相关的类和操作,可与 TensorFlow 2.0 配合使用。该库包含文本相似性指标(例如 ROUGE-L)的实现,这是自动评估文本生成模型所必需的。

在评估模型时使用这些操作的好处是,它们与 TPU 评估兼容,并且与 TF 流式指标 API 配合得很好。

设置

pip install -q "tensorflow-text==2.11.*"
import tensorflow as tf
import tensorflow_text as text
2023-11-16 14:15:17.221359: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-11-16 14:15:18.023785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-11-16 14:15:18.023888: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-11-16 14:15:18.023899: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

ROUGE-L

Rouge-L 指标是一个介于 0 到 1 之间的分数,表示两个序列的相似程度,基于最长公共子序列 (LCS) 的长度。具体来说,Rouge-L 是 LCS 精度(LCS 覆盖的假设序列的百分比)和 LCS 召回率(LCS 覆盖的参考序列的百分比)的加权调和平均值(或 f-measure)。

来源:https://www.microsoft.com/en-us/research/publication/rouge-a-package-for-automatic-evaluation-of-summaries/

TF.Text 实现为每个(假设、参考)对返回 F-measure、精度和召回率。

考虑以下假设/参考对

hypotheses = tf.ragged.constant([['captain', 'of', 'the', 'delta', 'flight'],
                                 ['the', '1990', 'transcript']])
references = tf.ragged.constant([['delta', 'air', 'lines', 'flight'],
                                 ['this', 'concludes', 'the', 'transcript']])
2023-11-16 14:15:19.800786: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-11-16 14:15:19.800884: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2023-11-16 14:15:19.800949: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2023-11-16 14:15:19.801005: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2023-11-16 14:15:19.858671: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2023-11-16 14:15:19.858893: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://tensorflowcn.cn/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

假设和参考应为令牌的 tf.RaggedTensors。需要令牌而不是原始句子,因为没有单一的标记化策略适合所有任务。

现在我们可以调用 text.metrics.rouge_l 并获得结果

result = text.metrics.rouge_l(hypotheses, references)
print('F-Measure: %s' % result.f_measure)
print('P-Measure: %s' % result.p_measure)
print('R-Measure: %s' % result.r_measure)
F-Measure: tf.Tensor([0.44444448 0.57142854], shape=(2,), dtype=float32)
P-Measure: tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
R-Measure: tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)

ROUGE-L 有一个附加超参数 alpha,它决定用于计算 F-Measure 的调和平均值的权重。接近 0 的值将召回率视为更重要,接近 1 的值将精确率视为更重要。alpha 的默认值为 .5,对应于精确率和召回率的权重相等。

# Compute ROUGE-L with alpha=0
result = text.metrics.rouge_l(hypotheses, references, alpha=0)
print('F-Measure (alpha=0): %s' % result.f_measure)
print('P-Measure (alpha=0): %s' % result.p_measure)
print('R-Measure (alpha=0): %s' % result.r_measure)
F-Measure (alpha=0): tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)
P-Measure (alpha=0): tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
R-Measure (alpha=0): tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)
# Compute ROUGE-L with alpha=1
result = text.metrics.rouge_l(hypotheses, references, alpha=1)
print('F-Measure (alpha=1): %s' % result.f_measure)
print('P-Measure (alpha=1): %s' % result.p_measure)
print('R-Measure (alpha=1): %s' % result.r_measure)
F-Measure (alpha=1): tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
P-Measure (alpha=1): tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
R-Measure (alpha=1): tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)