Working with RNNs

Authors: Scott Zhu, Francois Chollet

Introduction

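Recurrent neural networks (RNNs) are a class of neural networks that are powerful for modeling sequence data such as time series or natural language.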

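Schematically, an RNN layer uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.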
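As a rough illustration of that for loop, here is a minimal NumPy sketch of a simple (Elman-style) RNN forward pass, h_t = tanh(x_t W + h_{t-1} U + b). The weight names W, U, b and the helper simple_rnn_forward are assumptions made for illustration; this writes the recurrence out by hand and is not how Keras implements its layers internally.

```python
import numpy as np


def simple_rnn_forward(x, W, U, b):
    """Run h_t = tanh(x_t @ W + h_{t-1} @ U + b) over every timestep.

    x: (batch, timesteps, input_dim); W: (input_dim, units);
    U: (units, units); b: (units,).
    Returns the per-step outputs and the final state.
    """
    batch, timesteps, _ = x.shape
    units = W.shape[1]
    h = np.zeros((batch, units))  # internal state, initialized to zeros
    outputs = []
    for t in range(timesteps):  # the for loop over timesteps
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)
    return np.stack(outputs, axis=1), h


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 3))  # 2 samples, 5 timesteps, 3 features
W = rng.standard_normal((3, 4)) * 0.1
U = rng.standard_normal((4, 4)) * 0.1
b = np.zeros(4)

seq, last = simple_rnn_forward(x, W, U, b)
print(seq.shape, last.shape)  # (2, 5, 4) (2, 4)
```

The final state last is identical to the last slice of seq; the difference between those two is exactly what the return_sequences option of the Keras layers controls.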

The Keras RNN API is designed with a focus on:

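Ease of use: the built-in keras.layers.RNN, keras.layers.LSTM and keras.layers.GRU layers enable you to quickly build recurrent models without having to make difficult configuration choices.
Ease of customization: you can also define your own RNN cell layer (the inner part of the for loop) with custom behavior, and use it with the generic keras.layers.RNN layer (the for loop itself). This allows you to quickly prototype different research ideas in a flexible way with minimal code.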

Setup

import numpy as np
import tensorflow as tf
import keras
from keras import layers

Built-in RNN layers: a simple example

There are three built-in RNN layers in Keras:

keras.layers.SimpleRNN, a fully-connected RNN where the output from the previous timestep is fed to the next timestep.
keras.layers.GRU, first proposed in Cho et al., 2014.
keras.layers.LSTM, first proposed in Hochreiter & Schmidhuber, 1997.

In early 2015, Keras had the first reusable open-source Python implementations of LSTM and GRU.

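Here is a simple example of a Sequential model that processes sequences of integers, embeds each integer into a 64-dimensional vector, then processes the sequence of vectors using an LSTM layer.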

model = keras.Sequential()
# Add an Embedding layer expecting input vocab of size 1000, and
# output embedding dimension of size 64.
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# Add a LSTM layer with 128 internal units.
model.add(layers.LSTM(128))

# Add a Dense layer with 10 units.
model.add(layers.Dense(10))

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 64)          64000     
                                                                 
 lstm (LSTM)                 (None, 128)               98816     
                                                                 
 dense (Dense)               (None, 10)                1290      
                                                                 
=================================================================
Total params: 164106 (641.04 KB)
Trainable params: 164106 (641.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
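The parameter counts in this summary can be reproduced by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel and a bias, with the default use_bias=True); the helper functions are hypothetical and are not Keras APIs.

```python
def embedding_params(input_dim, output_dim):
    # One output_dim-sized vector per vocabulary entry.
    return input_dim * output_dim


def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell candidate, output), each with an
    # input kernel, a recurrent kernel and a bias vector.
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus bias.
    return input_dim * units + units


counts = [
    embedding_params(1000, 64),  # embedding
    lstm_params(64, 128),        # lstm
    dense_params(128, 10),       # dense
]
print(counts, sum(counts))  # [64000, 98816, 1290] 164106
```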

Built-in RNNs support a number of useful features:

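Recurrent dropout, via the dropout and recurrent_dropout arguments
Ability to process an input sequence in reverse, via the go_backwards argument
Loop unrolling (which can lead to a large speedup when processing short sequences on CPU), via the unroll argument
...and more.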

For more information, see the RNN API documentation.

Outputs and states

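By default, the output of an RNN layer contains a single vector per sample. This vector is the RNN cell output corresponding to the last timestep, and it contains information about the entire input sequence. The shape of this output is (batch_size, units), where units corresponds to the units argument passed to the layer's constructor.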

An RNN layer can also return the entire sequence of outputs for each sample (one vector per timestep per sample) if you set return_sequences=True. The shape of this output is (batch_size, timesteps, units).

model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# The output of GRU will be a 3D tensor of shape (batch_size, timesteps, 256)
model.add(layers.GRU(256, return_sequences=True))

# The output of SimpleRNN will be a 2D tensor of shape (batch_size, 128)
model.add(layers.SimpleRNN(128))

model.add(layers.Dense(10))

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, None, 64)          64000     
                                                                 
 gru (GRU)                   (None, None, 256)         247296    
                                                                 
 simple_rnn (SimpleRNN)      (None, 128)               49280     
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 361866 (1.38 MB)
Trainable params: 361866 (1.38 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
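The GRU count above is larger than the textbook 3 * (input_dim * units + units * units + units). A plausible explanation, assumed here, is reset_after=True (the default for keras.layers.GRU in TensorFlow 2.x), under which the layer keeps separate input and recurrent biases. The hypothetical helpers below reproduce both numbers from the summary under that assumption.

```python
def gru_params(input_dim, units, reset_after=True):
    # 3 gates (update, reset, candidate). With reset_after=True the layer
    # keeps separate input and recurrent biases, doubling the bias terms.
    bias = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + bias)


def simple_rnn_params(input_dim, units):
    # One input kernel, one recurrent kernel and one bias.
    return input_dim * units + units * units + units


print(gru_params(64, 256), simple_rnn_params(256, 128))  # 247296 49280
```

Note that the SimpleRNN input dimension is 256 here because it consumes the full GRU output sequence produced with return_sequences=True.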

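In addition, an RNN layer can return its final internal state(s). The returned states can be used to resume the RNN execution later, or to initialize another RNN. This setting is commonly used in the encoder-decoder sequence-to-sequence model, where the encoder's final state is used as the initial state of the decoder.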

To configure an RNN layer to return its internal state, set the return_state parameter to True when creating the layer. Note that LSTM has 2 state tensors, but GRU only has one.

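To configure the initial state of the layer, just call the layer with the additional keyword argument initial_state. Note that the shape of the state needs to match the unit size of the layer, as in the example below.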

encoder_vocab = 1000
decoder_vocab = 2000

encoder_input = layers.Input(shape=(None,))
encoder_embedded = layers.Embedding(input_dim=encoder_vocab, output_dim=64)(
    encoder_input
)

# Return states in addition to output
output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")(
    encoder_embedded
)
encoder_state = [state_h, state_c]

decoder_input = layers.Input(shape=(None,))
decoder_embedded = layers.Embedding(input_dim=decoder_vocab, output_dim=64)(
    decoder_input
)

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(64, name="decoder")(
    decoder_embedded, initial_state=encoder_state
)
output = layers.Dense(10)(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 input_1 (InputLayer)        [(None, None)]               0         []                            
                                                                                                  
 input_2 (InputLayer)        [(None, None)]               0         []                            
                                                                                                  
 embedding_2 (Embedding)     (None, None, 64)             64000     ['input_1[0][0]']             
                                                                                                  
 embedding_3 (Embedding)     (None, None, 64)             128000    ['input_2[0][0]']             
                                                                                                  
 encoder (LSTM)              [(None, 64),                 33024     ['embedding_2[0][0]']         
                              (None, 64),                                                         
                              (None, 64)]                                                         
                                                                                                  
 decoder (LSTM)              (None, 64)                   33024     ['embedding_3[0][0]',         
                                                                     'encoder[0][1]',             
                                                                     'encoder[0][2]']             
                                                                                                  
 dense_2 (Dense)             (None, 10)                   650       ['decoder[0][0]']             
                                                                                                  
==================================================================================================
Total params: 258698 (1010.54 KB)
Trainable params: 258698 (1010.54 KB)
Non-trainable params: 0 (0.00 Byte)
__________________________________________________________________________________________________
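To see why the encoder returns both state_h and state_c (while a GRU would return a single tensor), here is a NumPy sketch of an LSTM recurrence in which the encoder's final [state_h, state_c] is passed as the decoder's initial state. The gate ordering, the helper names (lstm_step, run_lstm, make_params) and the small sizes are assumptions; the sketch mirrors the data flow of the Keras example above, not Keras internals.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h, c, kernel, recurrent_kernel, bias):
    """One LSTM step. Gate order (input, forget, candidate, output) is assumed."""
    z = x_t @ kernel + h @ recurrent_kernel + bias
    i, f, g, o = np.split(z, 4, axis=-1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state: state_c
    h = sigmoid(o) * np.tanh(c)                   # hidden state / output: state_h
    return h, c


def run_lstm(x, params, initial_state=None):
    """Loop lstm_step over x (batch, timesteps, features); return final (h, c)."""
    units = params[1].shape[0]
    if initial_state is None:
        h = np.zeros((x.shape[0], units))
        c = np.zeros((x.shape[0], units))
    else:
        h, c = initial_state
    for t in range(x.shape[1]):
        h, c = lstm_step(x[:, t, :], h, c, *params)
    return h, c


rng = np.random.default_rng(0)
emb_dim, units = 6, 8


def make_params():
    # kernel, recurrent_kernel and bias for all four gates stacked side by side
    return (rng.standard_normal((emb_dim, 4 * units)) * 0.1,
            rng.standard_normal((units, 4 * units)) * 0.1,
            np.zeros(4 * units))


encoder_params, decoder_params = make_params(), make_params()
encoder_in = rng.standard_normal((2, 7, emb_dim))  # 2 samples, 7 source steps
decoder_in = rng.standard_normal((2, 4, emb_dim))  # 2 samples, 4 target steps

state_h, state_c = run_lstm(encoder_in, encoder_params)  # the encoder's 2 state tensors
dec_out, _ = run_lstm(decoder_in, decoder_params, initial_state=[state_h, state_c])
print(state_h.shape, state_c.shape, dec_out.shape)  # (2, 8) (2, 8) (2, 8)
```

Both state tensors have shape (batch, units), which is why the decoder's units must match the encoder's.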

RNN layers and RNN cells

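In addition to the built-in RNN layers, the RNN API also provides cell-level APIs. Unlike RNN layers, which process whole batches of input sequences, an RNN cell only processes a single timestep.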

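The cell is the inside of the for loop of an RNN layer. Wrapping a cell inside a keras.layers.RNN layer gives you a layer capable of processing batches of sequences, e.g. RNN(LSTMCell(10)).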

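Mathematically, RNN(LSTMCell(10)) produces the same result as LSTM(10). In fact, the implementation of this layer in TF v1.x was just creating the corresponding RNN cell and wrapping it in an RNN layer. However, using the built-in GRU and LSTM layers enables the use of CuDNN, and you may see better performance.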

There are three built-in RNN cells, each of them corresponding to the matching RNN layer.

keras.layers.SimpleRNNCell corresponds to the SimpleRNN layer.
keras.layers.GRUCell corresponds to the GRU layer.
keras.layers.LSTMCell corresponds to the LSTM layer.

The cell abstraction, together with the generic keras.layers.RNN class, makes it very easy to implement custom RNN architectures for your research.
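The division of labor between a cell (one timestep) and the RNN layer (the loop) can be sketched in plain NumPy. RunningSumCell and rnn below are hypothetical stand-ins that only mimic the shape of the Keras cell contract (a state_size attribute, and a call that takes (inputs, states) and returns (output, new_states)); they are not Keras classes.

```python
import numpy as np


class RunningSumCell:
    """Hypothetical cell: its state is the running sum of the inputs seen so far."""

    def __init__(self, units):
        self.state_size = units

    def __call__(self, inputs, states):
        new_state = states[0] + inputs  # the work for one timestep
        return new_state, [new_state]   # (output, new_states)


def rnn(cell, sequences):
    """Generic wrapper: owns the for loop and delegates each step to the cell."""
    batch = sequences.shape[0]
    states = [np.zeros((batch, cell.state_size))]
    output = None
    for t in range(sequences.shape[1]):
        output, states = cell(sequences[:, t, :], states)
    return output  # last output only, like return_sequences=False


x = np.arange(12, dtype=float).reshape(1, 4, 3)  # 1 sample, 4 steps, 3 features
print(rnn(RunningSumCell(3), x))  # [[18. 22. 26.]]
```

Swapping in a different cell changes the per-step math without touching the loop, which is the property the cell abstraction is meant to provide.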

Cross-batch statefulness

When processing very long sequences (possibly infinite), you may want to use the pattern of **cross-batch statefulness**.

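Normally, the internal state of an RNN layer is reset every time it sees a new batch (i.e. every sample seen by the layer is assumed to be independent of the past). The layer only maintains a state while processing a given sample.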

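If you have very long sequences, though, it is useful to break them into shorter sequences and to feed these shorter sequences sequentially into an RNN layer without resetting the layer's state. That way, the layer can retain information about the entirety of the sequence, even though it only sees one sub-sequence at a time.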

You can do this by setting stateful=True in the constructor.

If you have a sequence s = [t0, t1, ... t1546, t1547], you would split it into e.g.

s1 = [t0, t1, ... t100]
s2 = [t101, ... t201]
...
s16 = [t1501, ... t1547]

Then you would process it via:

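lstm_layer = layers.LSTM(64, stateful=True)
# sub_sequences holds the chunks s1, s2, ..., s16 above, each a batch of
# shape (batch_size, timesteps, features) with a fixed batch_size.
for s in sub_sequences:
    output = lstm_layer(s)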

When you want to clear the state, you can use layer.reset_states().

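Note: In this setup, sample i in a given batch is assumed to be the continuation of sample i in the previous batch. This means that all batches should contain the same number of samples (the batch size). E.g. if a batch contains [sequence_A_from_t0_to_t100, sequence_B_from_t0_to_t100], the next batch should contain [sequence_A_from_t101_to_t200, sequence_B_from_t101_to_t200].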
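Why carrying the state over is equivalent to seeing the whole sequence can be checked with a small NumPy sketch. It assumes a plain tanh RNN (run, W and U are hypothetical) rather than an LSTM, and splits two long sequences (rows A and B) into three chunks along the time axis, so that row i of each chunk continues row i of the previous chunk, as the note requires.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 4)) * 0.1
U = rng.standard_normal((4, 4)) * 0.1


def run(x, h):
    """Simple tanh RNN over x (batch, timesteps, 5), starting from state h."""
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U)
    return h


batch = 2
long_seq = rng.standard_normal((batch, 300, 5))  # two long sequences: A and B
h0 = np.zeros((batch, 4))

full = run(long_seq, h0)  # the whole sequence in one pass

h = h0  # "stateful" processing
for chunk in np.split(long_seq, 3, axis=1):  # 3 chunks of 100 steps each
    h = run(chunk, h)                        # state carried over, never reset
chunked = h

print(np.allclose(full, chunked))  # True
h = np.zeros_like(h)  # analogue of reset_states(): back to the zero state
```

If the rows were shuffled between chunks, sample i would inherit another sequence's state and the two results would no longer match.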

Here is a complete example:

paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)
output = lstm_layer(paragraph3)

# reset_states() will reset the cached state to the original initial_state.
# If no initial_state was provided, zero-states will be used by default.
lstm_layer.reset_states()

RNN state reuse

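The recorded states of the RNN layer are not included in layer.weights. If you would like to reuse the state from an RNN layer, you can retrieve the state values via layer.states and use them as the initial state of a new layer, either through the Keras functional API, as in new_layer(inputs, initial_state=layer.states), or through model subclassing.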

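Note also that a Sequential model may not work in this case: it only supports layers with a single input and output, and the extra initial-state input makes it unusable here.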

paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

lstm_layer = layers.LSTM(64, stateful=True)
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)

existing_state = lstm_layer.states

new_lstm_layer = layers.LSTM(64)
new_output = new_lstm_layer(paragraph3, initial_state=existing_state)

Bidirectional RNNs

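For sequences other than time series (e.g. text), an RNN model can often perform better if it processes the sequence not only from start to end, but also backwards. For example, to predict the next word in a sentence, it is often useful to have the context around the word, not only the words that come before it.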

Keras provides an easy API for building such bidirectional RNNs: the keras.layers.Bidirectional wrapper.

model = keras.Sequential()

model.add(
    layers.Bidirectional(layers.LSTM(64, return_sequences=True), input_shape=(5, 10))
)
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(10))

model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bidirectional (Bidirection  (None, 5, 128)            38400     
 al)                                                             
                                                                 
 bidirectional_1 (Bidirecti  (None, 64)                41216     
 onal)                                                           
                                                                 
 dense_3 (Dense)             (None, 10)                650       
                                                                 
=================================================================
Total params: 80266 (313.54 KB)
Trainable params: 80266 (313.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Under the hood, Bidirectional copies the RNN layer passed in and flips the go_backwards field of the newly copied layer, so that it processes the inputs in reverse order.

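By default, the output of a Bidirectional RNN is the concatenation of the forward layer output and the backward layer output. If you need a different merging behavior, e.g. summation, change the merge_mode parameter in the Bidirectional wrapper constructor. For more details about Bidirectional, please check the API docs.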
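The sketch below imitates what the wrapper does, assuming a plain tanh RNN (last_state is a hypothetical helper) instead of an LSTM: a forward copy reads the timesteps in order, a backward copy with its own weights reads them reversed (the effect of go_backwards), and the two final outputs are merged. The keys mirror the merge_mode values "concat", "sum", "mul" and "ave"; with 32 units per direction, concatenation gives the 64 features seen in bidirectional_1 above.

```python
import numpy as np


def last_state(x, W, U):
    """Final hidden state of a simple tanh RNN run over x (batch, timesteps, features)."""
    h = np.zeros((x.shape[0], W.shape[1]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U)
    return h


rng = np.random.default_rng(2)
x = rng.standard_normal((3, 5, 10))  # (batch, timesteps, features), like input_shape=(5, 10)
fw = (rng.standard_normal((10, 32)) * 0.1, rng.standard_normal((32, 32)) * 0.1)
bw = (rng.standard_normal((10, 32)) * 0.1, rng.standard_normal((32, 32)) * 0.1)

forward = last_state(x, *fw)
backward = last_state(x[:, ::-1, :], *bw)  # go_backwards: timesteps in reverse

merged = {
    "concat": np.concatenate([forward, backward], axis=-1),  # the default
    "sum": forward + backward,
    "mul": forward * backward,
    "ave": (forward + backward) / 2,
}
print({mode: out.shape for mode, out in merged.items()})
# {'concat': (3, 64), 'sum': (3, 32), 'mul': (3, 32), 'ave': (3, 32)}
```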

Performance optimization and CuDNN kernels

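In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage CuDNN kernels by default when a GPU is available. With this change, the prior keras.layers.CuDNNLSTM/CuDNNGRU layers have been deprecated, and you can build your model without worrying about the hardware it will run on.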

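Since the CuDNN kernel is built with certain assumptions, the layer **will not be able to use the CuDNN kernel** if you change the defaults of the built-in LSTM or GRU layers, for example by: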

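Changing the activation function from tanh to something else.
Changing the recurrent_activation function from sigmoid to something else.
Using recurrent_dropout > 0.
Setting unroll to True, which forces LSTM/GRU to decompose the inner tf.while_loop into an unrolled for loop.
Setting use_bias to False.
Using masking when the input data is not strictly right padded (if the mask corresponds to strictly right-padded data, CuDNN can still be used; this is the most common case).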
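The conditions above can be read as a checklist. The function below is a hypothetical helper, not a Keras API, that simply encodes that list; it is not exhaustive, and the LSTM and GRU documentation remains the authoritative source for the exact constraints.

```python
def cudnn_blockers(activation="tanh", recurrent_activation="sigmoid",
                   recurrent_dropout=0.0, unroll=False, use_bias=True,
                   mask_is_strictly_right_padded=True):
    """Hypothetical checklist (not a Keras API): return the conditions from
    the list above that would keep an LSTM/GRU layer off the CuDNN kernel."""
    reasons = []
    if activation != "tanh":
        reasons.append("activation is not tanh")
    if recurrent_activation != "sigmoid":
        reasons.append("recurrent_activation is not sigmoid")
    if recurrent_dropout > 0:
        reasons.append("recurrent_dropout > 0")
    if unroll:
        reasons.append("unroll=True")
    if not use_bias:
        reasons.append("use_bias=False")
    if not mask_is_strictly_right_padded:
        reasons.append("mask is not strictly right-padded")
    return reasons


print(cudnn_blockers())  # []
print(cudnn_blockers(activation="relu", unroll=True))
# ['activation is not tanh', 'unroll=True']
```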

For the detailed list of constraints, please see the documentation for the LSTM and GRU layers.

Using CuDNN kernels when available

Let's build a simple LSTM model to demonstrate the performance difference.

We'll use the rows of MNIST digits as input sequences (treating each row of pixels as a timestep), and we'll predict the digit's label.

batch_size = 64
# Each MNIST image batch is a tensor of shape (batch_size, 28, 28).
# Each input sequence will be of size (28, 28) (height is treated like time).
input_dim = 28

units = 64
output_size = 10  # labels are from 0 to 9


# Build the RNN model
def build_model(allow_cudnn_kernel=True):
    # CuDNN is only available at the layer level, and not at the cell level.
    # This means `LSTM(units)` will use the CuDNN kernel,
    # while RNN(LSTMCell(units)) will run on non-CuDNN kernel.
    if allow_cudnn_kernel:
        # The LSTM layer with default options uses CuDNN.
        lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    else:
        # Wrapping a LSTMCell in a RNN layer will not use CuDNN.
        lstm_layer = keras.layers.RNN(
            keras.layers.LSTMCell(units), input_shape=(None, input_dim)
        )
    model = keras.models.Sequential(
        [
            lstm_layer,
            keras.layers.BatchNormalization(),
            keras.layers.Dense(output_size),
        ]
    )
    return model

Let's load the MNIST dataset:

mnist = keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
sample, sample_label = x_train[0], y_train[0]

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
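To make the "rows as timesteps" view concrete, here is a small NumPy sketch. It assumes a random stand-in batch with the same (batch, 28, 28) shape and [0, 1] scaling as x_train, so it runs without downloading anything.

```python
import numpy as np

# Stand-in for a batch of MNIST images: (batch, 28, 28), scaled to [0, 1].
images = np.random.default_rng(0).integers(0, 256, size=(64, 28, 28)) / 255.0

timesteps, features = images.shape[1], images.shape[2]
print(timesteps, features)  # 28 28

# The LSTM reads one image row per timestep: step t sees row t of every image.
step_3 = images[:, 3, :]
print(step_3.shape)  # (64, 28)
```

This is why build_model uses input_shape=(None, input_dim) with input_dim = 28: the feature size is fixed by the row width, while the time axis is left unspecified.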

Let's create a model instance and train it.

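We choose sparse_categorical_crossentropy as the loss function for the model. The output of the model has shape [batch_size, 10]. The target for the model is an integer vector, where each integer is in the range 0 to 9.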

model = build_model(allow_cudnn_kernel=True)

model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)


model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

938/938 [==============================] - 7s 5ms/step - loss: 0.9965 - accuracy: 0.6845 - val_loss: 0.5699 - val_accuracy: 0.8181
<keras.src.callbacks.History at 0x7f71d8117c10>
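Because the final Dense layer has no softmax, the loss is built with from_logits=True. The NumPy sketch below shows the idea behind sparse categorical cross-entropy on logits: take the log-softmax of each row and average the negative log-probability of the integer target. It is an illustration of the formula, not the Keras implementation; sparse_ce_from_logits is a hypothetical name.

```python
import numpy as np


def sparse_ce_from_logits(labels, logits):
    """Mean of -log softmax(logits)[label] over the batch (illustrative sketch)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.zeros((2, 10))  # (batch_size, 10) raw scores, all equal here
labels = np.array([5, 3])   # integer targets in [0, 9]
print(sparse_ce_from_logits(labels, logits))  # approximately 2.302585, i.e. log(10)
```

With all-equal logits every class gets probability 1/10, so the loss is log(10); a confident, correct prediction drives it toward 0.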

Now, let's compare to a model that does not use the CuDNN kernel:

noncudnn_model = build_model(allow_cudnn_kernel=False)
noncudnn_model.set_weights(model.get_weights())
noncudnn_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="sgd",
    metrics=["accuracy"],
)
noncudnn_model.fit(
    x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1
)

938/938 [==============================] - 20s 20ms/step - loss: 0.4268 - accuracy: 0.8698 - val_loss: 0.3017 - val_accuracy: 0.9145
<keras.src.callbacks.History at 0x7f71d84e5520>

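When running on a machine with an NVIDIA GPU and CuDNN installed, the model built with CuDNN is much faster to train than the model that uses the regular TensorFlow kernel: in the run above, one epoch took about 7 s (5 ms/step) with CuDNN versus 20 s (20 ms/step) without it. The second model reports higher accuracy only because set_weights(model.get_weights()) starts it from the first model's already-trained weights.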

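The same CuDNN-enabled model can also be used to run inference in a CPU-only environment. The tf.device annotation below just forces the device placement. The model will run on CPU by default if no GPU is available.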

You simply don't have to worry about the hardware you're running on anymore. Isn't that pretty cool?

import matplotlib.pyplot as plt

with tf.device("CPU:0"):
    cpu_model = build_model(allow_cudnn_kernel=True)
    cpu_model.set_weights(model.get_weights())
    result = tf.argmax(cpu_model.predict_on_batch(tf.expand_dims(sample, 0)), axis=1)
    print(
        "Predicted result is: %s, target result is: %s" % (result.numpy(), sample_label)
    )
    plt.imshow(sample, cmap=plt.get_cmap("gray"))

Predicted result is: [3], target result is: 5

RNNs with list/dict inputs, or nested inputs

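Nested structures allow implementers to include more information within a single timestep. For example, a video frame could have audio and video input at the same time. The data shape in this case could be: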

[batch, timestep, {"video": [height, width, channel], "audio": [frequency]}]

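In another example, handwriting data could have both coordinates x and y for the current position of the pen, as well as pressure information. So the data representation could be: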

[batch, timestep, {"location": [x, y], "pressure": [force]}]
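One way to hold such data, assumed here for illustration, is one array per field, all sharing the leading (batch, timestep) axes. Slicing every field at the same timestep then yields the nested structure a cell would receive at that step. The field names follow the handwriting example; the sizes are arbitrary.

```python
import numpy as np

batch, timesteps = 4, 50
rng = np.random.default_rng(0)

# One array per field; all share the leading (batch, timestep) axes.
handwriting = {
    "location": rng.random((batch, timesteps, 2)),  # x, y of the pen
    "pressure": rng.random((batch, timesteps, 1)),  # force
}

# What a cell sees at a single timestep t: the same nested structure,
# with the timestep axis removed.
t = 10
step_input = {key: value[:, t] for key, value in handwriting.items()}
print({key: value.shape for key, value in step_input.items()})
# {'location': (4, 2), 'pressure': (4, 1)}
```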

The following code provides an example of how to build a custom RNN cell that accepts such structured inputs.

Define a custom cell that supports nested input/output

See Making new layers and models via subclassing for details on writing your own layers.

@keras.saving.register_keras_serializable()
class NestedCell(keras.layers.Layer):
    def __init__(self, unit_1, unit_2, unit_3, **kwargs):
        self.unit_1 = unit_1
        self.unit_2 = unit_2
        self.unit_3 = unit_3
        self.state_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        self.output_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        super().__init__(**kwargs)

    def build(self, input_shapes):
        # expect input_shape to contain 2 items, [(batch, i1), (batch, i2, i3)]
        i1 = input_shapes[0][1]
        i2 = input_shapes[1][1]
        i3 = input_shapes[1][2]

        self.kernel_1 = self.add_weight(
            shape=(i1, self.unit_1), initializer="uniform", name="kernel_1"
        )
        self.kernel_2_3 = self.add_weight(
            shape=(i2, i3, self.unit_2, self.unit_3),
            initializer="uniform",
            name="kernel_2_3",
        )

    def call(self, inputs, states):
        # inputs should be in [(batch, input_1), (batch, input_2, input_3)]
        # state should be in shape [(batch, unit_1), (batch, unit_2, unit_3)]
        input_1, input_2 = tf.nest.flatten(inputs)
        s1, s2 = states

        output_1 = tf.matmul(input_1, self.kernel_1)
        output_2_3 = tf.einsum("bij,ijkl->bkl", input_2, self.kernel_2_3)
        state_1 = s1 + output_1
        state_2_3 = s2 + output_2_3

        output = (output_1, output_2_3)
        new_states = (state_1, state_2_3)

        return output, new_states

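    def get_config(self):
        # Keep the base Layer config (name, dtype, trainable) alongside the cell's own units.
        config = super().get_config()
        config.update({"unit_1": self.unit_1, "unit_2": self.unit_2, "unit_3": self.unit_3})
        return config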
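The shape bookkeeping in NestedCell.call can be checked outside Keras. The NumPy sketch below replays one step with the sizes used in the next section (batch size reduced to 4 for speed); the einsum "bij,ijkl->bkl" contracts the i and j axes of input_2 against the first two axes of kernel_2_3, which is the same as a tensordot over those axes. The random values are placeholders, not trained weights.

```python
import numpy as np

batch, i1, i2, i3 = 4, 32, 64, 32
unit_1, unit_2, unit_3 = 10, 20, 30
rng = np.random.default_rng(0)

input_1 = rng.random((batch, i1))
input_2 = rng.random((batch, i2, i3))
kernel_1 = rng.random((i1, unit_1))
kernel_2_3 = rng.random((i2, i3, unit_2, unit_3))
s1 = np.zeros((batch, unit_1))
s2 = np.zeros((batch, unit_2, unit_3))

output_1 = input_1 @ kernel_1                                 # (batch, unit_1)
output_2_3 = np.einsum("bij,ijkl->bkl", input_2, kernel_2_3)  # (batch, unit_2, unit_3)
state_1 = s1 + output_1
state_2_3 = s2 + output_2_3

print(output_1.shape, output_2_3.shape)  # (4, 10) (4, 20, 30)
```

The outputs have the same shapes as the states, matching the state_size and output_size declared in the cell's constructor.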

Build an RNN model with nested input/output

Let's build a Keras model that uses a keras.layers.RNN layer and the custom cell we just defined.

unit_1 = 10
unit_2 = 20
unit_3 = 30

i1 = 32
i2 = 64
i3 = 32
batch_size = 64
num_batches = 10
timestep = 50

cell = NestedCell(unit_1, unit_2, unit_3)
rnn = keras.layers.RNN(cell)

input_1 = keras.Input((None, i1))
input_2 = keras.Input((None, i2, i3))

outputs = rnn((input_1, input_2))

model = keras.models.Model([input_1, input_2], outputs)

model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

Train the model with randomly generated data

Since there isn't a good candidate dataset for this model, we use random NumPy data for demonstration.

input_1_data = np.random.random((batch_size * num_batches, timestep, i1))
input_2_data = np.random.random((batch_size * num_batches, timestep, i2, i3))
target_1_data = np.random.random((batch_size * num_batches, unit_1))
target_2_data = np.random.random((batch_size * num_batches, unit_2, unit_3))
input_data = [input_1_data, input_2_data]
target_data = [target_1_data, target_2_data]

model.fit(input_data, target_data, batch_size=batch_size)

10/10 [==============================] - 1s 27ms/step - loss: 0.7623 - rnn_1_loss: 0.2873 - rnn_1_1_loss: 0.4750 - rnn_1_accuracy: 0.1016 - rnn_1_1_accuracy: 0.0350
<keras.src.callbacks.History at 0x7f734c8e2d30>

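With the keras.layers.RNN layer, you only need to define the math logic for an individual step within the sequence, and the keras.layers.RNN layer will handle the sequence iteration for you. It's an incredibly powerful way to quickly prototype new kinds of RNNs (e.g. an LSTM variant).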

For more details, please visit the API docs.