Multi-task recommenders


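In the basic retrieval tutorial we built a retrieval system using movie watches as positive interaction signals.

In many applications, however, there are multiple rich sources of feedback to draw upon. For example, an e-commerce site may record user visits to product pages (abundant, but a relatively weak signal), image clicks, additions to the cart and, finally, purchases. It may even record post-purchase signals such as reviews and returns.

Integrating all these different forms of feedback is critical to building systems that users love to use, and that do not optimize for any one metric at the expense of overall performance.

In addition, building a joint model for multiple tasks may produce better results than building a number of task-specific models. This is especially true where some data is abundant (for example, clicks) and some data is sparse (purchases, returns, manual reviews). In those scenarios, a joint model may be able to use representations learned from the abundant task to improve its predictions on the sparse task, a phenomenon known as transfer learning. For example, this paper shows that a model predicting explicit user ratings from sparse user surveys can be substantially improved by adding an auxiliary task that uses abundant click log data.

In this tutorial, we are going to build a multi-objective recommender for Movielens, using both implicit (movie watches) and explicit signals (ratings).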

Imports

Let's first get our imports out of the way.

pip install -q tensorflow-recommenders
pip install -q --upgrade tensorflow-datasets
import os
import pprint
import tempfile

from typing import Dict, Text

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

Preparing the dataset

We're going to use the Movielens 100K dataset.

ratings = tfds.load('movielens/100k-ratings', split="train")
movies = tfds.load('movielens/100k-movies', split="train")

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"],
})
movies = movies.map(lambda x: x["movie_title"])

We then repeat our preparations for building vocabularies and splitting the data into a training set and a test set:

# Randomly shuffle data and split between train and test.
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

movie_titles = movies.batch(1_000)
user_ids = ratings.batch(1_000_000).map(lambda x: x["user_id"])

unique_movie_titles = np.unique(np.concatenate(list(movie_titles)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))
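
For intuition, the same shuffle, split and vocabulary steps can be sketched in plain Python. This is only an illustration on hypothetical toy data (the `toy_interactions` rows and every name in the block are assumptions, not part of the tutorial), and it does not use `tf.data`.

```python
import random

# Hypothetical toy interactions standing in for the Movielens ratings.
toy_interactions = [
    {"user_id": str(u), "movie_title": f"Movie {m}", "user_rating": float(1 + (u + m) % 5)}
    for u in range(10)
    for m in range(10)
]

# Shuffle once with a fixed seed so that the split is reproducible.
rng = random.Random(42)
toy_shuffled = toy_interactions[:]
rng.shuffle(toy_shuffled)

# The first 80% of the shuffled rows go to training, the remaining 20% to testing.
split_point = int(0.8 * len(toy_shuffled))
train_rows = toy_shuffled[:split_point]
test_rows = toy_shuffled[split_point:]

# A vocabulary is the sorted set of unique values of an ID feature.
toy_user_vocab = sorted({row["user_id"] for row in toy_interactions})
toy_movie_vocab = sorted({row["movie_title"] for row in toy_interactions})

print(len(train_rows), len(test_rows), len(toy_user_vocab), len(toy_movie_vocab))
```

Shuffling once with a fixed seed, rather than on every pass, keeps the train and test rows disjoint across epochs.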

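A multi-task model

There are two critical parts to multi-task recommenders:

  1. They optimize for two or more objectives, and so have two or more losses.
  2. They share variables between the tasks, allowing for transfer learning.

In this tutorial, we will define our models as before, but instead of having a single task, we will have two tasks: one that predicts ratings, and one that predicts movie watches.

The user and movie models are as before: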

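# Defined here so the snippet runs on its own; the model class below uses the same value.
embedding_dimension = 32

user_model = tf.keras.Sequential([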
  tf.keras.layers.StringLookup(
      vocabulary=unique_user_ids, mask_token=None),
  # We add 1 to account for the unknown token.
  tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
])

movie_model = tf.keras.Sequential([
  tf.keras.layers.StringLookup(
      vocabulary=unique_movie_titles, mask_token=None),
  tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
])
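
The "+ 1" in the embedding sizes can be illustrated with a plain-Python sketch of a string lookup. It assumes the behaviour of `StringLookup` with `mask_token=None` and a single out-of-vocabulary index, namely that unknown strings map to index 0 and vocabulary entries start at index 1. The helper names and the toy vocabulary are hypothetical.

```python
from typing import Dict, List


def build_lookup(vocabulary: List[str]) -> Dict[str, int]:
    # Index 0 is reserved for out-of-vocabulary strings, so known tokens start at 1.
    return {token: index for index, token in enumerate(vocabulary, start=1)}


def lookup(table: Dict[str, int], token: str) -> int:
    # Any string that is not in the vocabulary falls back to the reserved index 0.
    return table.get(token, 0)


toy_vocab = ["138", "42", "92"]
toy_table = build_lookup(toy_vocab)

# The embedding table needs one row per known token plus one row for unknowns.
embedding_rows = len(toy_vocab) + 1

print(lookup(toy_table, "42"), lookup(toy_table, "never-seen"), embedding_rows)
```

Without the extra row, the largest vocabulary index would fall outside the embedding table.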

However, now we will have two tasks. The first is the rating task:

tfrs.tasks.Ranking(
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)

Its goal is to predict the ratings as accurately as possible.
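
As a rough illustration of what the rating task measures, the mean squared error (the loss) and the root mean squared error (the metric) can be computed by hand. The labels and predictions below are hypothetical values, and the helpers are a plain-Python sketch rather than the Keras implementations.

```python
import math
from typing import Sequence


def mean_squared_error(labels: Sequence[float], predictions: Sequence[float]) -> float:
    # Average of the squared differences between true and predicted ratings.
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return sum(squared_errors) / len(squared_errors)


def root_mean_squared_error(labels: Sequence[float], predictions: Sequence[float]) -> float:
    # The square root puts the error back on the same scale as the ratings.
    return math.sqrt(mean_squared_error(labels, predictions))


toy_labels = [4.0, 3.0, 5.0, 2.0]
toy_predictions = [3.5, 3.0, 4.0, 3.0]

toy_mse = mean_squared_error(toy_labels, toy_predictions)
toy_rmse = root_mean_squared_error(toy_labels, toy_predictions)
print(toy_mse, toy_rmse)
```

Because the RMSE is on the rating scale, a value of 0.75 here means predictions are off by roughly three quarters of a star on these toy values.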

The second is the retrieval task:

tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=movies.batch(128).map(movie_model)
    )
)

As before, this task's goal is to predict which movies the user will or will not watch.
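
The top-K accuracy reported for the retrieval task can be pictured as follows: score every candidate movie against the user embedding with a dot product, then check whether the movie that was actually watched lands among the K highest scores. The sketch below is a simplified plain-Python illustration with hypothetical two-dimensional embeddings; it is not the `FactorizedTopK` implementation.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_hit(user: Sequence[float], candidates: List[Sequence[float]],
              true_index: int, k: int) -> bool:
    # Rank all candidates by their affinity score with this user.
    scores = [dot(user, candidate) for candidate in candidates]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_index in ranked[:k]


def top_k_accuracy(users: List[Sequence[float]], candidates: List[Sequence[float]],
                   true_indices: List[int], k: int) -> float:
    # Fraction of interactions whose watched movie is ranked within the top k.
    hits = [top_k_hit(u, candidates, t, k) for u, t in zip(users, true_indices)]
    return sum(hits) / len(hits)


toy_candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_users = [[1.0, 0.1], [0.1, 1.0]]
toy_watched = [0, 2]  # Index of the movie each toy user actually watched.

top_1 = top_k_accuracy(toy_users, toy_candidates, toy_watched, k=1)
top_2 = top_k_accuracy(toy_users, toy_candidates, toy_watched, k=2)
print(top_1, top_2)
```

Accuracy can only stay the same or rise as K grows, which is why the top-100 figure is the largest of the reported metrics.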

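Putting it together

We put it all together in a model class.

The new component here is that, since we have two tasks and two losses, we need to decide how important each loss is. We can do this by giving each loss a weight and treating these weights as hyperparameters. If we assign a large loss weight to the rating task, our model will focus on predicting ratings (but will still use some information from the retrieval task); if we assign a large loss weight to the retrieval task, it will focus on retrieval instead.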

class MovielensModel(tfrs.models.Model):

  def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
    # We take the loss weights in the constructor: this allows us to instantiate
    # several model objects with different loss weights.

    super().__init__()

    embedding_dimension = 32

    # User and movie models.
    self.movie_model: tf.keras.layers.Layer = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_movie_titles, mask_token=None),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
    ])
    self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=unique_user_ids, mask_token=None),
      tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
    ])

    # A small model to take in user and movie embeddings and predict ratings.
    # We can make this as complicated as we want as long as we output a scalar
    # as our prediction.
    self.rating_model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

    # The tasks.
    self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
        loss=tf.keras.losses.MeanSquaredError(),
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.movie_model)
        )
    )

    # The loss weights.
    self.rating_weight = rating_weight
    self.retrieval_weight = retrieval_weight

  def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
    # We pick out the user features and pass them into the user model.
    user_embeddings = self.user_model(features["user_id"])
    # And pick out the movie features and pass them into the movie model.
    movie_embeddings = self.movie_model(features["movie_title"])

    return (
        user_embeddings,
        movie_embeddings,
        # We apply the multi-layered rating model to a concatenation of
        # user and movie embeddings.
        self.rating_model(
            tf.concat([user_embeddings, movie_embeddings], axis=1)
        ),
    )

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

    ratings = features.pop("user_rating")

    user_embeddings, movie_embeddings, rating_predictions = self(features)

    # We compute the loss for each task.
    rating_loss = self.rating_task(
        labels=ratings,
        predictions=rating_predictions,
    )
    retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)

    # And combine them using the loss weights.
    return (self.rating_weight * rating_loss
            + self.retrieval_weight * retrieval_loss)
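
The weighted sum at the end of `compute_loss` can be looked at in isolation. The sketch below uses hypothetical loss values of very different magnitude (an assumption chosen for illustration) to show how the weights select or blend the two objectives.

```python
def combine_losses(rating_loss: float, retrieval_loss: float,
                   rating_weight: float, retrieval_weight: float) -> float:
    # Same weighted sum as in compute_loss, written for plain floats.
    return rating_weight * rating_loss + retrieval_weight * retrieval_loss


# Hypothetical per-batch losses; the two tasks need not be on the same scale.
toy_rating_loss = 1.25
toy_retrieval_loss = 60000.0

rating_only = combine_losses(toy_rating_loss, toy_retrieval_loss, 1.0, 0.0)
retrieval_only = combine_losses(toy_rating_loss, toy_retrieval_loss, 0.0, 1.0)
joint = combine_losses(toy_rating_loss, toy_retrieval_loss, 1.0, 1.0)

# Share of the joint loss value contributed by the rating task.
rating_share = toy_rating_loss / joint
print(rating_only, retrieval_only, joint, rating_share)
```

A weight of zero removes a task from the objective entirely. When the raw losses differ in scale, equal weights do not imply equal contributions to the sum, which is one reason the weights are treated as hyperparameters to tune.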

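Rating-specialized model

Depending on the weights we assign, the model will encode a different balance of the tasks. Let's start with a model that only considers ratings.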

model = MovielensModel(rating_weight=1.0, retrieval_weight=0.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")
Epoch 1/3
10/10 [==============================] - 7s 319ms/step - root_mean_squared_error: 2.2354 - factorized_top_k/top_1_categorical_accuracy: 3.3750e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0026 - factorized_top_k/top_10_categorical_accuracy: 0.0060 - factorized_top_k/top_50_categorical_accuracy: 0.0305 - factorized_top_k/top_100_categorical_accuracy: 0.0599 - loss: 4.5809 - regularization_loss: 0.0000e+00 - total_loss: 4.5809
Epoch 2/3
10/10 [==============================] - 3s 319ms/step - root_mean_squared_error: 1.1220 - factorized_top_k/top_1_categorical_accuracy: 2.6250e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0025 - factorized_top_k/top_10_categorical_accuracy: 0.0056 - factorized_top_k/top_50_categorical_accuracy: 0.0304 - factorized_top_k/top_100_categorical_accuracy: 0.0601 - loss: 1.2614 - regularization_loss: 0.0000e+00 - total_loss: 1.2614
Epoch 3/3
10/10 [==============================] - 3s 315ms/step - root_mean_squared_error: 1.1170 - factorized_top_k/top_1_categorical_accuracy: 2.6250e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0024 - factorized_top_k/top_10_categorical_accuracy: 0.0057 - factorized_top_k/top_50_categorical_accuracy: 0.0304 - factorized_top_k/top_100_categorical_accuracy: 0.0605 - loss: 1.2500 - regularization_loss: 0.0000e+00 - total_loss: 1.2500
5/5 [==============================] - 3s 185ms/step - root_mean_squared_error: 1.1125 - factorized_top_k/top_1_categorical_accuracy: 5.0000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0034 - factorized_top_k/top_10_categorical_accuracy: 0.0065 - factorized_top_k/top_50_categorical_accuracy: 0.0309 - factorized_top_k/top_100_categorical_accuracy: 0.0599 - loss: 1.2326 - regularization_loss: 0.0000e+00 - total_loss: 1.2326
Retrieval top-100 accuracy: 0.060.
Ranking RMSE: 1.113.

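The model does reasonably well at predicting ratings (with an RMSE of around 1.11), but performs poorly at predicting which movies will be watched: its top-100 accuracy (about 0.060) is almost 4 times worse than that of a model trained solely to predict watches.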

Retrieval-specialized model

Let's now try a model that focuses on retrieval only.

model = MovielensModel(rating_weight=0.0, retrieval_weight=1.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")
Epoch 1/3
10/10 [==============================] - 4s 309ms/step - root_mean_squared_error: 3.6972 - factorized_top_k/top_1_categorical_accuracy: 5.8750e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0056 - factorized_top_k/top_10_categorical_accuracy: 0.0131 - factorized_top_k/top_50_categorical_accuracy: 0.0751 - factorized_top_k/top_100_categorical_accuracy: 0.1483 - loss: 69829.1612 - regularization_loss: 0.0000e+00 - total_loss: 69829.1612
Epoch 2/3
10/10 [==============================] - 3s 301ms/step - root_mean_squared_error: 3.6905 - factorized_top_k/top_1_categorical_accuracy: 0.0010 - factorized_top_k/top_5_categorical_accuracy: 0.0118 - factorized_top_k/top_10_categorical_accuracy: 0.0272 - factorized_top_k/top_50_categorical_accuracy: 0.1425 - factorized_top_k/top_100_categorical_accuracy: 0.2634 - loss: 67466.0661 - regularization_loss: 0.0000e+00 - total_loss: 67466.0661
Epoch 3/3
10/10 [==============================] - 3s 300ms/step - root_mean_squared_error: 3.6877 - factorized_top_k/top_1_categorical_accuracy: 0.0016 - factorized_top_k/top_5_categorical_accuracy: 0.0183 - factorized_top_k/top_10_categorical_accuracy: 0.0391 - factorized_top_k/top_50_categorical_accuracy: 0.1782 - factorized_top_k/top_100_categorical_accuracy: 0.3048 - loss: 66294.5128 - regularization_loss: 0.0000e+00 - total_loss: 66294.5128
5/5 [==============================] - 1s 188ms/step - root_mean_squared_error: 3.6884 - factorized_top_k/top_1_categorical_accuracy: 9.5000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0093 - factorized_top_k/top_10_categorical_accuracy: 0.0203 - factorized_top_k/top_50_categorical_accuracy: 0.1199 - factorized_top_k/top_100_categorical_accuracy: 0.2330 - loss: 31092.1455 - regularization_loss: 0.0000e+00 - total_loss: 31092.1455
Retrieval top-100 accuracy: 0.233.
Ranking RMSE: 3.688.

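We get the opposite result: a model that does well on retrieval (top-100 accuracy of about 0.233), but poorly on predicting ratings (RMSE of about 3.69).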

Joint model

Let's now train a model that assigns positive weights to both tasks.

model = MovielensModel(rating_weight=1.0, retrieval_weight=1.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")
Epoch 1/3
10/10 [==============================] - 4s 309ms/step - root_mean_squared_error: 2.0230 - factorized_top_k/top_1_categorical_accuracy: 5.8750e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0051 - factorized_top_k/top_10_categorical_accuracy: 0.0123 - factorized_top_k/top_50_categorical_accuracy: 0.0768 - factorized_top_k/top_100_categorical_accuracy: 0.1509 - loss: 69787.3722 - regularization_loss: 0.0000e+00 - total_loss: 69787.3722
Epoch 2/3
10/10 [==============================] - 3s 312ms/step - root_mean_squared_error: 1.3647 - factorized_top_k/top_1_categorical_accuracy: 0.0012 - factorized_top_k/top_5_categorical_accuracy: 0.0120 - factorized_top_k/top_10_categorical_accuracy: 0.0275 - factorized_top_k/top_50_categorical_accuracy: 0.1438 - factorized_top_k/top_100_categorical_accuracy: 0.2642 - loss: 67453.3125 - regularization_loss: 0.0000e+00 - total_loss: 67453.3125
Epoch 3/3
10/10 [==============================] - 3s 309ms/step - root_mean_squared_error: 1.1934 - factorized_top_k/top_1_categorical_accuracy: 0.0016 - factorized_top_k/top_5_categorical_accuracy: 0.0190 - factorized_top_k/top_10_categorical_accuracy: 0.0394 - factorized_top_k/top_50_categorical_accuracy: 0.1771 - factorized_top_k/top_100_categorical_accuracy: 0.3037 - loss: 66299.1676 - regularization_loss: 0.0000e+00 - total_loss: 66299.1676
5/5 [==============================] - 1s 190ms/step - root_mean_squared_error: 1.1100 - factorized_top_k/top_1_categorical_accuracy: 9.5000e-04 - factorized_top_k/top_5_categorical_accuracy: 0.0086 - factorized_top_k/top_10_categorical_accuracy: 0.0210 - factorized_top_k/top_50_categorical_accuracy: 0.1237 - factorized_top_k/top_100_categorical_accuracy: 0.2349 - loss: 31075.5518 - regularization_loss: 0.0000e+00 - total_loss: 31075.5518
Retrieval top-100 accuracy: 0.235.
Ranking RMSE: 1.110.

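The result is a model that performs roughly as well on both tasks as each specialized model does on its own task: a top-100 accuracy of 0.235 and an RMSE of 1.110.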
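
To compare the three runs side by side, the test metrics printed above can be collected in a small table. The numbers are copied from the evaluation output of this tutorial; the dictionary layout and variable names are just one hypothetical way of organising them.

```python
# Test-set metrics as printed by the three evaluation runs above.
results = {
    "rating-only": {"top_100": 0.060, "rmse": 1.113},
    "retrieval-only": {"top_100": 0.233, "rmse": 3.688},
    "joint": {"top_100": 0.235, "rmse": 1.110},
}

# How much better the retrieval-only model is at top-100 accuracy
# than the rating-only model.
retrieval_ratio = results["retrieval-only"]["top_100"] / results["rating-only"]["top_100"]

# How far the joint model is from each specialist on that specialist's own metric.
joint_vs_retrieval = results["joint"]["top_100"] - results["retrieval-only"]["top_100"]
joint_vs_rating = results["joint"]["rmse"] - results["rating-only"]["rmse"]

for name, metrics in results.items():
    print(f"{name:>15}: top-100 accuracy {metrics['top_100']:.3f}, RMSE {metrics['rmse']:.3f}")
print(f"retrieval-only vs rating-only top-100 ratio: {retrieval_ratio:.2f}")
```

The differences between the joint model and each specialist are small and come from single training runs, so they should not be read as a significant improvement.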

Making predictions

We can use the trained multi-task model to get trained user and movie embeddings, as well as the predicted rating:

# call() returns (user_embeddings, movie_embeddings, rating_predictions), in that order.
trained_user_embeddings, trained_movie_embeddings, predicted_rating = model({
      "user_id": np.array(["42"]),
      "movie_title": np.array(["Dances with Wolves (1990)"])
  })
print("Predicted rating:")
print(predicted_rating)
Predicted rating:
tf.Tensor([[4.604047]], shape=(1, 1), dtype=float32)

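While the results here do not show a clear accuracy benefit from a joint model in this case, multi-task learning is in general an extremely useful tool. We can expect better results when we can transfer knowledge from a data-abundant task (such as clicks) to a closely related data-sparse task (such as purchases).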