TF.Text 指标

在 TensorFlow.org 上查看 在 Google Colab 中运行 在 GitHub 上查看 下载笔记本

概览

TensorFlow Text 提供了一系列与文本指标相关的类和操作,可直接用于 TensorFlow 2.0。该库包含文本相似度指标(如 ROUGE-L)的实现,这些指标是自动评估文本生成模型所必需的。

在模型评估中使用这些操作的好处在于,它们与 TPU 评估兼容,并能与 TF 流式指标 API 良好配合。

设置

pip install -q "tensorflow-text==2.11.*"
import tensorflow as tf
import tensorflow_text as text
2024-07-19 12:39:45.738293: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-07-19 12:39:46.534784: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-07-19 12:39:46.534901: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-07-19 12:39:46.534912: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

ROUGE-L

Rouge-L 指标是一个 0 到 1 之间的分数,表示两个序列的相似程度,基于最长公共子序列(LCS)的长度。具体而言,Rouge-L 是结合了 LCS 精确率(LCS 覆盖假设序列的百分比)和 LCS 召回率(LCS 覆盖参考序列的百分比)的加权调和平均值(或 F-measure)。

来源:https://www.microsoft.com/en-us/research/publication/rouge-a-package-for-automatic-evaluation-of-summaries/

TF.Text 的实现会返回每对(假设,参考)的 F-measure、精确率和召回率。

考虑以下假设/参考对

hypotheses = tf.ragged.constant([['captain', 'of', 'the', 'delta', 'flight'],
                                 ['the', '1990', 'transcript']])
references = tf.ragged.constant([['delta', 'air', 'lines', 'flight'],
                                 ['this', 'concludes', 'the', 'transcript']])
2024-07-19 12:39:48.200948: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-07-19 12:39:48.201065: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2024-07-19 12:39:48.201127: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2024-07-19 12:39:48.201185: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2024-07-19 12:39:48.257557: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2024-07-19 12:39:48.257746: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://tensorflowcn.cn/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

假设和参考应为标记的 tf.RaggedTensors。需要使用标记(tokens)而不是原始句子,因为没有单一的标记化策略适用于所有任务。

现在我们可以调用 text.metrics.rouge_l 并获取结果

result = text.metrics.rouge_l(hypotheses, references)
print('F-Measure: %s' % result.f_measure)
print('P-Measure: %s' % result.p_measure)
print('R-Measure: %s' % result.r_measure)
F-Measure: tf.Tensor([0.44444448 0.57142854], shape=(2,), dtype=float32)
P-Measure: tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
R-Measure: tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)

ROUGE-L 还有一个额外的超参数 alpha,它决定了用于计算 F-measure 的调和平均值的权重。越接近 0 的值表示召回率越重要,越接近 1 的值表示精确率越重要。alpha 默认为 .5,对应于精确率和召回率具有相同的权重。

# Compute ROUGE-L with alpha=0
result = text.metrics.rouge_l(hypotheses, references, alpha=0)
print('F-Measure (alpha=0): %s' % result.f_measure)
print('P-Measure (alpha=0): %s' % result.p_measure)
print('R-Measure (alpha=0): %s' % result.r_measure)
F-Measure (alpha=0): tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)
P-Measure (alpha=0): tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
R-Measure (alpha=0): tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)
# Compute ROUGE-L with alpha=1
result = text.metrics.rouge_l(hypotheses, references, alpha=1)
print('F-Measure (alpha=1): %s' % result.f_measure)
print('P-Measure (alpha=1): %s' % result.p_measure)
print('R-Measure (alpha=1): %s' % result.r_measure)
F-Measure (alpha=1): tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
P-Measure (alpha=1): tf.Tensor([0.4       0.6666667], shape=(2,), dtype=float32)
R-Measure (alpha=1): tf.Tensor([0.5 0.5], shape=(2,), dtype=float32)