The Sequential model

Author: fchollet


Setup

import tensorflow as tf
import keras
from keras import layers

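When to use a Sequential model

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Schematically, the following Sequential model: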

# Define Sequential model with 3 layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
# Call model on a test input
x = tf.ones((3, 3))
y = model(x)

is equivalent to this function:

# Create 3 layers
layer1 = layers.Dense(2, activation="relu", name="layer1")
layer2 = layers.Dense(3, activation="relu", name="layer2")
layer3 = layers.Dense(4, name="layer3")

# Call layers on a test input
x = tf.ones((3, 3))
y = layer3(layer2(layer1(x)))
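
One way to check this equivalence is to build the Sequential model from the very same layer instances and compare the two outputs numerically. The sketch below assumes the TensorFlow backend; the names y_sequential, y_manual and outputs_match are illustrative and do not come from the guide.

```python
import numpy as np
import tensorflow as tf
import keras
from keras import layers

# Create the three layers once so both code paths share the same weights.
layer1 = layers.Dense(2, activation="relu", name="layer1")
layer2 = layers.Dense(3, activation="relu", name="layer2")
layer3 = layers.Dense(4, name="layer3")

# Wrap the shared layer instances in a Sequential model.
model = keras.Sequential([layer1, layer2, layer3])

x = tf.ones((3, 3))
y_sequential = model(x)               # stacked call through the model
y_manual = layer3(layer2(layer1(x)))  # explicit nested call

# Both paths apply the same weights in the same order.
outputs_match = np.allclose(np.asarray(y_sequential), np.asarray(y_manual))
print("Outputs match:", outputs_match)
```

Sharing the layer instances matters here: two separately created Dense layers would get different random initial weights, so their outputs would differ even though the architecture is identical.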

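A Sequential model is not appropriate when:

  • Your model has multiple inputs or multiple outputs
  • Any of your layers has multiple inputs or multiple outputs
  • You need to do layer sharing
  • You want a non-linear topology (e.g. a residual connection, a multi-branch model)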
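
For contrast, here is a minimal sketch of a topology that a Sequential model cannot express: a residual connection, written with the functional API. The layer sizes and the name residual_model are arbitrary choices for illustration, and the TensorFlow backend is assumed.

```python
import tensorflow as tf
import keras
from keras import layers

# A residual block: the input is added back to the output of a Dense layer.
inputs = keras.Input(shape=(8,))
hidden = layers.Dense(8, activation="relu")(inputs)
outputs = layers.Add()([inputs, hidden])  # this layer takes two input tensors
residual_model = keras.Model(inputs=inputs, outputs=outputs)

y = residual_model(tf.ones((2, 8)))
print(y.shape)
```

The Add layer receives two tensors, which is exactly the "layer with multiple inputs" case from the list above.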

Creating a Sequential model

You can create a Sequential model by passing a list of layers to the Sequential constructor:

model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

Its layers are accessible via the layers attribute:

model.layers
[<keras.src.layers.core.dense.Dense at 0x7fa3c8de0100>,
 <keras.src.layers.core.dense.Dense at 0x7fa3c8de09a0>,
 <keras.src.layers.core.dense.Dense at 0x7fa5181b5c10>]

You can also create a Sequential model incrementally via the add() method:

model = keras.Sequential()
model.add(layers.Dense(2, activation="relu"))
model.add(layers.Dense(3, activation="relu"))
model.add(layers.Dense(4))

Note that there is also a corresponding pop() method to remove layers: a Sequential model behaves very much like a list of layers.

model.pop()
print(len(model.layers))  # 2
2

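Also note that the Sequential constructor accepts a name argument, just like any layer or model in Keras. This is useful for annotating TensorBoard graphs with semantically meaningful names.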

model = keras.Sequential(name="my_sequential")
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))
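
The names given above can be read back from the model and used to look up a layer. The sketch below repeats the model definition so that it runs on its own; the variables layer_names and second_layer are illustrative.

```python
import keras
from keras import layers

model = keras.Sequential(name="my_sequential")
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))

# The names passed to the constructors are available as attributes.
layer_names = [layer.name for layer in model.layers]
print(model.name)
print(layer_names)

# A named layer can also be looked up directly.
second_layer = model.get_layer(name="layer2")
```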

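Specifying the input shape in advance

Generally, all layers in Keras need to know the shape of their inputs in order to create their weights. So when you create a layer like this, it initially has no weights: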

layer = layers.Dense(3)
layer.weights  # Empty
[]

It creates its weights the first time it is called on an input, since the shape of the weights depends on the shape of the inputs:

# Call layer on a test input
x = tf.ones((1, 4))
y = layer(x)
layer.weights  # Now it has weights, of shape (4, 3) and (3,)
[<tf.Variable 'dense_6/kernel:0' shape=(4, 3) dtype=float32, numpy=
 array([[ 0.1752373 ,  0.47623062,  0.24374962],
        [-0.0298934 ,  0.50255656,  0.78478384],
        [-0.58323103, -0.56861055, -0.7190975 ],
        [-0.3191281 , -0.23635858, -0.8841506 ]], dtype=float32)>,
 <tf.Variable 'dense_6/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

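Naturally, this also applies to Sequential models. When you instantiate a Sequential model without an input shape, it isn't "built": it has no weights (and calling model.weights results in an error stating just this). The weights are created when the model first sees some input data: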

model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)  # No weights at this stage!

# At this point, you can't do this:
# model.weights

# You also can't do this:
# model.summary()

# Call the model on a test input
x = tf.ones((1, 4))
y = model(x)
print("Number of weights after calling the model:", len(model.weights))  # 6
Number of weights after calling the model: 6
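
If no sample input is at hand, the weights can also be created from a shape alone. This is not covered in the text above; it is a sketch that relies on the build() method Keras models expose, with None standing in for the batch dimension. The variable num_weights is illustrative.

```python
import keras
from keras import layers

model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

# Create the weights from a shape alone: batches of 4-dimensional vectors.
model.build((None, 4))

num_weights = len(model.weights)  # one kernel and one bias per Dense layer
print("Number of weights after build():", num_weights)
```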

Once a model is "built", you can call its summary() method to display its contents:

model.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_7 (Dense)             (1, 2)                    10        
                                                                 
 dense_8 (Dense)             (1, 3)                    9         
                                                                 
 dense_9 (Dense)             (1, 4)                    16        
                                                                 
=================================================================
Total params: 35 (140.00 Byte)
Trainable params: 35 (140.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
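
The Param # column can be reproduced by hand: a Dense layer with n_in inputs and n_out units holds an n_in x n_out kernel plus n_out biases. The helper dense_param_count below is a hypothetical name used only for this sketch.

```python
def dense_param_count(n_in: int, n_out: int) -> int:
    """Parameters in a Dense layer: kernel (n_in * n_out) plus bias (n_out)."""
    return n_in * n_out + n_out


# Layer widths of the model above: 4 input features, then 2, 3 and 4 units.
widths = [4, 2, 3, 4]
per_layer = [dense_param_count(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)

print(per_layer)  # [10, 9, 16]
print(total)      # 35
```

At 4 bytes per float32 parameter, 35 parameters account for the 140 bytes reported in the summary.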

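However, when building a Sequential model incrementally, it can be very useful to display the summary of the model so far, including the current output shape. In this case, you should start your model by passing an Input object to it, so that it knows its input shape from the start: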

model = keras.Sequential()
model.add(keras.Input(shape=(4,)))
model.add(layers.Dense(2, activation="relu"))

model.summary()
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_10 (Dense)            (None, 2)                 10        
                                                                 
=================================================================
Total params: 10 (40.00 Byte)
Trainable params: 10 (40.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Note that the Input object is not displayed as part of model.layers, since it isn't a layer:

model.layers
[<keras.src.layers.core.dense.Dense at 0x7fa3bc0ba820>]

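A simple alternative is to pass an input_shape argument to your first layer. Note that newer Keras releases may emit a warning for this argument and recommend an Input object instead.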

model = keras.Sequential()
model.add(layers.Dense(2, activation="relu", input_shape=(4,)))

model.summary()
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_11 (Dense)            (None, 2)                 10        
                                                                 
=================================================================
Total params: 10 (40.00 Byte)
Trainable params: 10 (40.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

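Models built with a predefined input shape like this always have weights (even before seeing any data) and always have a defined output shape.

In general, it is a recommended best practice to always specify the input shape of a Sequential model in advance if you know what it is.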
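
As a quick check of the first claim, the sketch below declares the input shape up front and inspects the weights without passing any data through the model. The variables num_weights and kernel_shapes are illustrative.

```python
import keras
from keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=(4,)),
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

# No data has been passed through the model, yet the weights already exist.
num_weights = len(model.weights)
kernel_shapes = [tuple(layer.kernel.shape) for layer in model.layers]
print(num_weights)
print(kernel_shapes)
```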

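A common debugging workflow: add() + summary()

When building a new Sequential architecture, it's useful to stack layers incrementally with add() and print model summaries frequently. For instance, this lets you monitor how a stack of Conv2D and MaxPooling2D layers is downsampling image feature maps: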

model = keras.Sequential()
model.add(keras.Input(shape=(250, 250, 3)))  # 250x250 RGB images
model.add(layers.Conv2D(32, 5, strides=2, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))

# Can you guess what the current output shape is at this point? Probably not.
# Let's just print it:
model.summary()

# The answer was: (40, 40, 32), so we can keep downsampling...

model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(2))

# And now?
model.summary()

# Now that we have 4x4 feature maps, time to apply global max pooling.
model.add(layers.GlobalMaxPooling2D())

# Finally, we add a classification layer.
model.add(layers.Dense(10))
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 123, 123, 32)      2432      
                                                                 
 conv2d_1 (Conv2D)           (None, 121, 121, 32)      9248      
                                                                 
 max_pooling2d (MaxPooling2  (None, 40, 40, 32)        0         
 D)                                                              
                                                                 
=================================================================
Total params: 11680 (45.62 KB)
Trainable params: 11680 (45.62 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 123, 123, 32)      2432      
                                                                 
 conv2d_1 (Conv2D)           (None, 121, 121, 32)      9248      
                                                                 
 max_pooling2d (MaxPooling2  (None, 40, 40, 32)        0         
 D)                                                              
                                                                 
 conv2d_2 (Conv2D)           (None, 38, 38, 32)        9248      
                                                                 
 conv2d_3 (Conv2D)           (None, 36, 36, 32)        9248      
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 12, 12, 32)        0         
 g2D)                                                            
                                                                 
 conv2d_4 (Conv2D)           (None, 10, 10, 32)        9248      
                                                                 
 conv2d_5 (Conv2D)           (None, 8, 8, 32)          9248      
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 4, 4, 32)          0         
 g2D)                                                            
                                                                 
=================================================================
Total params: 48672 (190.12 KB)
Trainable params: 48672 (190.12 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Very practical, right?
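
The spatial sizes in these summaries follow from simple arithmetic. With the default "valid" padding, a window of size k moved with stride s over an axis of length n yields floor((n - k) / s) + 1 positions. The sketch below applies that rule to the stack above; valid_output_size is a hypothetical helper written for this illustration, not a Keras function.

```python
def valid_output_size(size: int, window: int, stride: int) -> int:
    """Spatial size after a conv or pooling window with "valid" padding."""
    return (size - window) // stride + 1


# (window, stride) for each layer in the stack above. Conv2D defaults to
# stride 1; MaxPooling2D defaults to a stride equal to its pool size.
stack = [
    (5, 2),  # Conv2D(32, 5, strides=2)
    (3, 1),  # Conv2D(32, 3)
    (3, 3),  # MaxPooling2D(3)
    (3, 1),  # Conv2D(32, 3)
    (3, 1),  # Conv2D(32, 3)
    (3, 3),  # MaxPooling2D(3)
    (3, 1),  # Conv2D(32, 3)
    (3, 1),  # Conv2D(32, 3)
    (2, 2),  # MaxPooling2D(2)
]

size = 250
sizes = []
for window, stride in stack:
    size = valid_output_size(size, window, stride)
    sizes.append(size)

print(sizes)  # [123, 121, 40, 38, 36, 12, 10, 8, 4]
```

Doing this by hand gets tedious quickly, which is why printing summary() after each add() is the more convenient habit.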

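What to do once you have a model

Once your model architecture is ready, you will want to:

Feature extraction with a Sequential model

Once a Sequential model has been built, it behaves like a Functional API model. This means that every layer has an input and an output attribute. These attributes can be used to do neat things, like quickly creating a model that extracts the outputs of all intermediate layers in a Sequential model: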

initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers],
)

# Call feature extractor on test input.
x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)

Here's a similar example that only extracts features from one layer:

initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(32, 3, activation="relu", name="my_intermediate_layer"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=initial_model.get_layer(name="my_intermediate_layer").output,
)
# Call feature extractor on test input.
x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)
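
To see what such an extractor returns, it helps to print the shape of each feature map. The sketch below rebuilds the all-layers extractor from the first example so that it runs on its own; it assumes the TensorFlow backend, and feature_shapes is an illustrative variable.

```python
import tensorflow as tf
import keras
from keras import layers

initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers],
)

# One feature map per layer, in the same order as initial_model.layers.
features = feature_extractor(tf.ones((1, 250, 250, 3)))
feature_shapes = [tuple(f.shape) for f in features]
print(feature_shapes)
```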

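Transfer learning with a Sequential model

Transfer learning consists of freezing the bottom layers in a model and only training the top layers. If you aren't familiar with it, make sure to read our guide to transfer learning.

Here are two common transfer learning blueprints involving Sequential models.

First, let's say that you have a Sequential model, and you want to freeze all layers except the last one. In this case, you would simply iterate over model.layers and set layer.trainable = False on each layer, except the last one. Like this: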

model = keras.Sequential([
    keras.Input(shape=(784,)),  # shape must be a tuple: (784,), not (784)
    layers.Dense(32, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(10),
])

# Presumably you would want to first load pre-trained weights.
model.load_weights(...)

# Freeze all layers except the last one.
for layer in model.layers[:-1]:
  layer.trainable = False

# Recompile and train (this will only update the weights of the last layer).
model.compile(...)
model.fit(...)
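
A runnable sketch of this blueprint makes the effect of freezing visible: after one training pass, the frozen weights are unchanged while the last layer has moved. The random data, the "sgd" optimizer and the "mse" loss are stand-in assumptions for illustration, and the pre-trained weight loading step is skipped.

```python
import numpy as np
import keras
from keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=(784,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(10),
    ]
)

# Freeze all layers except the last one.
for layer in model.layers[:-1]:
    layer.trainable = False

# Random stand-in data; a real workflow would use an actual dataset.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 784)).astype("float32")
y = rng.normal(size=(16, 10)).astype("float32")

# Snapshot one frozen kernel and the trainable bias before training.
frozen_before = model.layers[0].get_weights()[0].copy()
head_bias_before = model.layers[-1].get_weights()[1].copy()

model.compile(optimizer="sgd", loss="mse")
model.fit(x, y, epochs=1, verbose=0)

frozen_unchanged = np.array_equal(frozen_before, model.layers[0].get_weights()[0])
head_changed = not np.array_equal(head_bias_before, model.layers[-1].get_weights()[1])
print(frozen_unchanged, head_changed)
```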

Another common blueprint is to use a Sequential model to stack a pre-trained model and some freshly initialized classification layers. Like this:

# Load a convolutional base with pre-trained weights
base_model = keras.applications.Xception(
    weights='imagenet',
    include_top=False,
    pooling='avg')

# Freeze the base model
base_model.trainable = False

# Use a Sequential model to add a trainable classifier on top
model = keras.Sequential([
    base_model,
    layers.Dense(1000),
])

# Compile & train
model.compile(...)
model.fit(...)
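
The same pattern can be tried without downloading ImageNet weights. In the sketch below, a small Sequential model named stand_in_base is a hypothetical substitute for the pre-trained Xception base, and the layer sizes are arbitrary; the TensorFlow backend is assumed.

```python
import tensorflow as tf
import keras
from keras import layers

# Hypothetical stand-in for a pre-trained base such as Xception.
base_model = keras.Sequential(
    [
        keras.Input(shape=(8,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(16, activation="relu"),
    ],
    name="stand_in_base",
)
base_model.trainable = False  # freezes every layer inside the base

# Stack a freshly initialized, trainable head on top of the frozen base.
model = keras.Sequential([base_model, layers.Dense(3)])
_ = model(tf.ones((1, 8)))  # run once so the new Dense layer creates its weights

num_trainable = len(model.trainable_weights)   # kernel and bias of the head
num_frozen = len(model.non_trainable_weights)  # the base model's weights
print(num_trainable, num_frozen)
```

Setting trainable on the base model propagates to all of its layers, so only the new head contributes trainable weights.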

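If you do transfer learning, you will probably find yourself frequently using these two patterns.

That's about all you need to know about Sequential models!

To find out more about building models in Keras, see: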