Keras 中的剪枝示例

在 TensorFlow.org 上查看 在 Google Colab 中运行 在 GitHub 上查看源代码 下载笔记本

概述

欢迎来到基于幅度的权重剪枝的端到端示例。

其他页面

有关剪枝的介绍以及如何确定是否应该使用剪枝(包括支持的内容),请参阅概述页面

要快速找到适合您的用例的 API(超出使用 80% 稀疏度完全剪枝模型),请参阅综合指南

摘要

在本教程中,您将

  1. 从头开始训练一个用于 MNIST 的keras 模型。
  2. 通过应用剪枝 API 微调模型,并查看准确率。
  3. 从剪枝中创建 3 倍更小的 TF 和 TFLite 模型。
  4. 通过结合剪枝和训练后量化,创建 10 倍更小的 TFLite 模型。
  5. 查看从 TF 到 TFLite 的准确率保持情况。

设置

 pip install -q tensorflow-model-optimization
import tempfile
import os

import tensorflow as tf
import numpy as np

from tensorflow_model_optimization.python.core.keras.compat import keras

%load_ext tensorboard

训练一个用于 MNIST 的模型,不进行剪枝

# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

# Normalize the input image so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=4,
  validation_split=0.1,
)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
2024-03-09 12:16:05.087631: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Epoch 1/4
1688/1688 [==============================] - 21s 4ms/step - loss: 0.3272 - accuracy: 0.9066 - val_loss: 0.1451 - val_accuracy: 0.9602
Epoch 2/4
1688/1688 [==============================] - 7s 4ms/step - loss: 0.1396 - accuracy: 0.9602 - val_loss: 0.1017 - val_accuracy: 0.9698
Epoch 3/4
1688/1688 [==============================] - 7s 4ms/step - loss: 0.0925 - accuracy: 0.9730 - val_loss: 0.0776 - val_accuracy: 0.9780
Epoch 4/4
1688/1688 [==============================] - 7s 4ms/step - loss: 0.0726 - accuracy: 0.9787 - val_loss: 0.0653 - val_accuracy: 0.9822
<tf_keras.src.callbacks.History at 0x7f630863b790>

评估基线测试准确率,并保存模型以备后用。

_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
keras.models.save_model(model, keras_file, include_optimizer=False)
print('Saved baseline model to:', keras_file)
Baseline test accuracy: 0.9790999889373779
Saved baseline model to: /tmpfs/tmp/tmpq067am7o.h5
/tmpfs/tmp/ipykernel_10506/3790298460.py:7: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native TF-Keras format, e.g. `model.save('my_model.keras')`.
  keras.models.save_model(model, keras_file, include_optimizer=False)

使用剪枝微调预训练模型

定义模型

您将对整个模型应用剪枝,并在模型摘要中看到这一点。

在本示例中,您从 50% 稀疏度(权重中 50% 为零)开始,最终达到 80% 稀疏度。

综合指南中,您可以看到如何剪枝某些层以提高模型准确率。

import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Compute end step to finish pruning after 2 epochs.
batch_size = 128
epochs = 2
validation_split = 0.1 # 10% of training set will be used for validation set. 

num_images = train_images.shape[0] * (1 - validation_split)
end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

# Define model for pruning.
pruning_params = {
      'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
                                                               final_sparsity=0.80,
                                                               begin_step=0,
                                                               end_step=end_step)
}

model_for_pruning = prune_low_magnitude(model, **pruning_params)

# `prune_low_magnitude` requires a recompile.
model_for_pruning.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model_for_pruning.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_reshap  (None, 28, 28, 1)         1         
 e (PruneLowMagnitude)                                           
                                                                 
 prune_low_magnitude_conv2d  (None, 26, 26, 12)        230       
  (PruneLowMagnitude)                                            
                                                                 
 prune_low_magnitude_max_po  (None, 13, 13, 12)        1         
 oling2d (PruneLowMagnitude                                      
 )                                                               
                                                                 
 prune_low_magnitude_flatte  (None, 2028)              1         
 n (PruneLowMagnitude)                                           
                                                                 
 prune_low_magnitude_dense   (None, 10)                40572     
 (PruneLowMagnitude)                                             
                                                                 
=================================================================
Total params: 40805 (159.41 KB)
Trainable params: 20410 (79.73 KB)
Non-trainable params: 20395 (79.69 KB)
_________________________________________________________________

训练模型并根据基线进行评估

使用剪枝进行两个时期的微调。

tfmot.sparsity.keras.UpdatePruningStep 在训练期间是必需的,而 tfmot.sparsity.keras.PruningSummaries 提供日志以跟踪进度和调试。

logdir = tempfile.mkdtemp()

callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep(),
  tfmot.sparsity.keras.PruningSummaries(log_dir=logdir),
]

model_for_pruning.fit(train_images, train_labels,
                  batch_size=batch_size, epochs=epochs, validation_split=validation_split,
                  callbacks=callbacks)
Epoch 1/2
  1/422 [..............................] - ETA: 19:24 - loss: 0.0619 - accuracy: 0.9766WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0047s vs `on_train_batch_end` time: 0.0067s). Check your callbacks.
422/422 [==============================] - 5s 6ms/step - loss: 0.1062 - accuracy: 0.9710 - val_loss: 0.1166 - val_accuracy: 0.9718
Epoch 2/2
422/422 [==============================] - 2s 5ms/step - loss: 0.1099 - accuracy: 0.9697 - val_loss: 0.0930 - val_accuracy: 0.9747
<tf_keras.src.callbacks.History at 0x7f6278426cd0>

对于本示例,与基线相比,剪枝后测试准确率的损失很小。

_, model_for_pruning_accuracy = model_for_pruning.evaluate(
   test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy) 
print('Pruned test accuracy:', model_for_pruning_accuracy)
Baseline test accuracy: 0.9790999889373779
Pruned test accuracy: 0.9686999917030334

日志显示了每个层的稀疏度进展情况。

#docs_infra: no_execute
%tensorboard --logdir={logdir}

对于非 Colab 用户,您可以在TensorBoard.dev 上查看此代码块的先前运行结果

从剪枝中创建 3 倍更小的模型

既需要tfmot.sparsity.keras.strip_pruning,也需要应用标准压缩算法(例如,通过 gzip),才能看到剪枝的压缩优势。

  • strip_pruning 是必需的,因为它会删除剪枝仅在训练期间需要的每个 tf.Variable,否则这些变量会在推理期间增加模型大小。
  • 应用标准压缩算法是必需的,因为序列化权重矩阵的大小与剪枝之前相同。但是,剪枝会使大多数权重变为零,这会增加算法可以利用以进一步压缩模型的冗余。

首先,为 TensorFlow 创建一个可压缩模型。

model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

_, pruned_keras_file = tempfile.mkstemp('.h5')
keras.models.save_model(model_for_export, pruned_keras_file, include_optimizer=False)
print('Saved pruned Keras model to:', pruned_keras_file)
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
Saved pruned Keras model to: /tmpfs/tmp/tmphtdbakhm.h5
/tmpfs/tmp/ipykernel_10506/3267383138.py:4: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native TF-Keras format, e.g. `model.save('my_model.keras')`.
  keras.models.save_model(model_for_export, pruned_keras_file, include_optimizer=False)

然后,为 TFLite 创建一个可压缩模型。

converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
pruned_tflite_model = converter.convert()

_, pruned_tflite_file = tempfile.mkstemp('.tflite')

with open(pruned_tflite_file, 'wb') as f:
  f.write(pruned_tflite_model)

print('Saved pruned TFLite model to:', pruned_tflite_file)
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp0i4b2dha/assets
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp0i4b2dha/assets
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1709986617.944296   10506 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format.
W0000 00:00:1709986617.944339   10506 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency.
Saved pruned TFLite model to: /tmpfs/tmp/tmp1i3mzxcp.tflite

定义一个辅助函数,通过 gzip 实际压缩模型并测量压缩后的尺寸。

def get_gzipped_model_size(file):
  # Returns size of gzipped model, in bytes.
  import os
  import zipfile

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(file)

  return os.path.getsize(zipped_file)

比较并查看模型通过剪枝缩小了 3 倍。

print("Size of gzipped baseline Keras model: %.2f bytes" % (get_gzipped_model_size(keras_file)))
print("Size of gzipped pruned Keras model: %.2f bytes" % (get_gzipped_model_size(pruned_keras_file)))
print("Size of gzipped pruned TFlite model: %.2f bytes" % (get_gzipped_model_size(pruned_tflite_file)))
Size of gzipped baseline Keras model: 78239.00 bytes
Size of gzipped pruned Keras model: 25908.00 bytes
Size of gzipped pruned TFlite model: 24848.00 bytes

通过结合剪枝和量化,创建 10 倍更小的模型

您可以对剪枝后的模型应用训练后量化,以获得更多优势。

converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_and_pruned_tflite_model = converter.convert()

_, quantized_and_pruned_tflite_file = tempfile.mkstemp('.tflite')

with open(quantized_and_pruned_tflite_file, 'wb') as f:
  f.write(quantized_and_pruned_tflite_model)

print('Saved quantized and pruned TFLite model to:', quantized_and_pruned_tflite_file)

print("Size of gzipped baseline Keras model: %.2f bytes" % (get_gzipped_model_size(keras_file)))
print("Size of gzipped pruned and quantized TFlite model: %.2f bytes" % (get_gzipped_model_size(quantized_and_pruned_tflite_file)))
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp35zwmyql/assets
INFO:tensorflow:Assets written to: /tmpfs/tmp/tmp35zwmyql/assets
W0000 00:00:1709986618.942100   10506 tf_tfl_flatbuffer_helpers.cc:390] Ignored output_format.
W0000 00:00:1709986618.942130   10506 tf_tfl_flatbuffer_helpers.cc:393] Ignored drop_control_dependency.
Saved quantized and pruned TFLite model to: /tmpfs/tmp/tmp3v6lm0h4.tflite
Size of gzipped baseline Keras model: 78239.00 bytes
Size of gzipped pruned and quantized TFlite model: 8064.00 bytes

查看从 TF 到 TFLite 的准确率保持情况

定义一个辅助函数,在测试数据集上评估 TFLite 模型。

import numpy as np

def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on ever y image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print('Evaluated on {n} results so far.'.format(n=i))
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

您评估剪枝后的量化模型,并查看 TensorFlow 的准确率保持在 TFLite 后端。

interpreter = tf.lite.Interpreter(model_content=quantized_and_pruned_tflite_model)
interpreter.allocate_tensors()

test_accuracy = evaluate_model(interpreter)

print('Pruned and quantized TFLite test_accuracy:', test_accuracy)
print('Pruned TF test accuracy:', model_for_pruning_accuracy)
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
WARNING: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#13 is a dynamic-sized tensor).
Evaluated on 0 results so far.
Evaluated on 1000 results so far.
Evaluated on 2000 results so far.
Evaluated on 3000 results so far.
Evaluated on 4000 results so far.
Evaluated on 5000 results so far.
Evaluated on 6000 results so far.
Evaluated on 7000 results so far.
Evaluated on 8000 results so far.
Evaluated on 9000 results so far.


Pruned and quantized TFLite test_accuracy: 0.9691
Pruned TF test accuracy: 0.9686999917030334

结论

在本教程中,您了解了如何使用 TensorFlow 模型优化工具包 API 为 TensorFlow 和 TFLite 创建稀疏模型。然后,您将剪枝与训练后量化相结合,以获得更多优势。

您为 MNIST 创建了一个 10 倍更小的模型,准确率差异很小。

我们鼓励您尝试此新功能,它对于在资源受限的环境中部署尤其重要。