TensorFlow basics


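This guide provides a quick overview of TensorFlow basics. Each section of this document is an overview of a larger topic; you can find links to full guides at the end of each section.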

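TensorFlow is an end-to-end platform for machine learning. It supports the following:

Multidimensional-array based numeric computation (similar to NumPy)
GPU and distributed processing
Automatic differentiation
Model construction, training, and export
And more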

Tensors

TensorFlow operates on multidimensional arrays or tensors represented as tf.Tensor objects. Here is a two-dimensional tensor:

import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

print(x)
print(x.shape)
print(x.dtype)

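The most important attributes of a tf.Tensor are its shape and dtype:

Tensor.shape: tells you the size of the tensor along each of its axes.
Tensor.dtype: tells you the type of all the elements in the tensor.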

TensorFlow implements standard mathematical operations on tensors, as well as many operations specialized for machine learning.

For example:

x + x

5 * x

x @ tf.transpose(x)

tf.concat([x, x, x], axis=0)

tf.nn.softmax(x, axis=-1)

tf.reduce_sum(x)

tf.convert_to_tensor([1,2,3])

tf.reduce_sum([1,2,3])
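A small sketch that recomputes some of these operations on the same 2x3 tensor and checks the results. The expected values follow by hand from x: for example, the first entry of x @ tf.transpose(x) is 1*1 + 2*2 + 3*3 = 14.

```python
import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

# Element-wise ops keep the shape (2, 3).
doubled = x + x
print(doubled.shape)           # (2, 3)

# (2x3) @ (3x2) gives a 2x2 matrix of row dot products.
gram = x @ tf.transpose(x)
print(gram.numpy())            # [[14. 32.]
                               #  [32. 77.]]

# Concatenating three copies along axis 0 stacks the rows.
stacked = tf.concat([x, x, x], axis=0)
print(stacked.shape)           # (6, 3)

# softmax along the last axis turns each row into probabilities that sum to 1.
probs = tf.nn.softmax(x, axis=-1)
print(tf.reduce_sum(probs, axis=-1).numpy())  # approximately [1. 1.]

# reduce_sum with no axis adds up every element.
print(tf.reduce_sum(x).numpy())  # 21.0

# A Python list of ints is converted to an int32 tensor.
print(tf.convert_to_tensor([1, 2, 3]).dtype)  # <dtype: 'int32'>
```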

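Running large calculations on a CPU can be slow. When properly configured, TensorFlow can use accelerator hardware such as GPUs to execute operations very quickly.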

if tf.config.list_physical_devices('GPU'):
  print("TensorFlow **IS** using the GPU")
else:
  print("TensorFlow **IS NOT** using the GPU")
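When you don't specify a device, TensorFlow chooses one for each operation, generally preferring a GPU when one is available. The following minimal sketch lists the visible devices and pins a computation to the CPU with tf.device; it assumes only that the device 'CPU:0' exists, which is the case on any machine.

```python
import tensorflow as tf

# List every physical device TensorFlow can see (CPUs, GPUs, ...).
print(tf.config.list_physical_devices())

# Explicitly place this computation on the first CPU.
with tf.device('CPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = a @ a

# The resulting tensor records the device that produced it,
# e.g. '/job:localhost/replica:0/task:0/device:CPU:0'.
print(b.device)
```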

See the Tensor guide for details.

Variables

Normal tf.Tensor objects are immutable. To store model weights (or other mutable state) in TensorFlow, use a tf.Variable.

var = tf.Variable([0.0, 0.0, 0.0])

var.assign([1, 2, 3])

var.assign_add([1, 1, 1])
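The sketch below follows the values through these updates and also uses assign_sub. It additionally illustrates that a variable's shape is fixed when it is created, so assigning a value with a different shape raises an error; the code catches both ValueError and tf.errors.InvalidArgumentError to stay safe across versions.

```python
import tensorflow as tf

var = tf.Variable([0.0, 0.0, 0.0])  # dtype is inferred as float32

var.assign([1, 2, 3])        # overwrite the values in place
var.assign_add([1, 1, 1])    # add in place
print(var.numpy())           # [2. 3. 4.]

var.assign_sub([0.5, 0.5, 0.5])  # subtract in place
print(var.numpy())               # [1.5 2.5 3.5]

# The shape is fixed at creation, so a value of shape (2,) is rejected.
try:
    var.assign([1.0, 2.0])
    shape_error = False
except (ValueError, tf.errors.InvalidArgumentError):
    shape_error = True
print(shape_error)
```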

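See the Variables guide for details.

Automatic differentiation

Gradient descent and related algorithms are a cornerstone of modern machine learning.

To enable this, TensorFlow implements automatic differentiation (autodiff), which uses calculus to compute gradients. Typically you'll use this to calculate the gradient of a model's error or loss with respect to its weights.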

x = tf.Variable(1.0)

def f(x):
  y = x**2 + 2*x - 5
  return y

f(x)

At x = 1.0, y = f(x) = (1**2 + 2*1 - 5) = -2.

The derivative of y is y' = f'(x) = (2*x + 2), which equals 4 at x = 1.0. TensorFlow can calculate this automatically:

with tf.GradientTape() as tape:
  y = f(x)

g_x = tape.gradient(y, x)  # g(x) = dy/dx

g_x

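This simplified example only takes the derivative with respect to a single scalar (x), but TensorFlow can compute the gradient with respect to any number of non-scalar tensors simultaneously.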
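As a sketch of the non-scalar case, the following computes the gradient of a scalar loss with respect to a 2x2 matrix w and a length-2 vector b in a single tape.gradient call. The values are chosen so the result can be checked by hand: y = [4.5, 5.5], so dloss/dy = 2y = [9, 11], which is also db, and each row of dw is the corresponding entry of x_in times that vector.

```python
import tensorflow as tf

w = tf.Variable([[1.0, 2.0], [3.0, 4.0]])  # 2x2 weight matrix
b = tf.Variable([0.5, -0.5])               # bias vector of length 2
x_in = tf.constant([[1.0, 1.0]])           # a single input row, shape (1, 2)

with tf.GradientTape() as tape:
    y = x_in @ w + b              # [[4.5, 5.5]]
    loss = tf.reduce_sum(y ** 2)  # 4.5**2 + 5.5**2

# One call returns one gradient per source, each with that source's shape.
dw, db = tape.gradient(loss, [w, b])
print(dw.numpy())  # [[ 9. 11.]
                   #  [ 9. 11.]]
print(db.numpy())  # [ 9. 11.]
```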

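See the Autodiff guide for details.

Graphs and tf.function

While you can use TensorFlow interactively like any Python library, TensorFlow also provides tools for:

Performance optimization: to speed up training and inference.
Export: so you can save your model when it's done training.

These require that you use tf.function to separate your pure-TensorFlow code from Python.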

@tf.function
def my_func(x):
  print('Tracing.\n')
  return tf.reduce_sum(x)

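The first time you run the tf.function, although it executes in Python, it captures a complete, optimized graph representing the TensorFlow computations done within the function.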

x = tf.constant([1, 2, 3])
my_func(x)

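On subsequent calls, TensorFlow only executes the optimized graph, skipping any non-TensorFlow steps. Note that my_func below doesn't print 'Tracing.', since print is a Python function, not a TensorFlow function.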

x = tf.constant([10, 9, 8])
my_func(x)

A graph may not be reusable for inputs with a different signature (shape and dtype), so a new graph is generated instead:

x = tf.constant([10.0, 9.1, 8.2], dtype=tf.float32)
my_func(x)
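One way to observe when tracing happens is to record a Python side effect, which runs only while a graph is being traced, next to a tf.print, which runs on every call. In this sketch the function name traced_sum and the list traces are illustrative; the three calls mirror the inputs used above, so only the first and third calls should trigger a trace.

```python
import tensorflow as tf

traces = []  # filled by a Python side effect, i.e. only during tracing

@tf.function
def traced_sum(x):
    traces.append(x.dtype)    # Python code: runs only while tracing
    tf.print('Executing')     # TensorFlow op: runs on every call
    return tf.reduce_sum(x)

traced_sum(tf.constant([1, 2, 3]))         # new signature (int32, shape [3]): traces
traced_sum(tf.constant([10, 9, 8]))        # same signature: reuses the graph
traced_sum(tf.constant([10.0, 9.1, 8.2]))  # dtype is now float32: traces again
print(len(traces))  # 2
```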

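These captured graphs provide two benefits:

In many cases they provide a significant speedup in execution (though not in this trivial example).
You can export these graphs, using tf.saved_model, to run on other systems like a server or a mobile device, no Python installation required.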

See the Introduction to graphs for more details.

Modules, layers, and models

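tf.Module is a class for managing your tf.Variable objects and the tf.function objects that operate on them. The tf.Module class is necessary to support two significant features:

You can save and restore the values of your variables using tf.train.Checkpoint. This is useful during training, as it is quick to save and restore a model's state.
You can import and export the tf.Variable values and the tf.function graphs using tf.saved_model. This allows you to run your model independently of the Python program that created it.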
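A minimal sketch of the first feature: save a variable's value with tf.train.Checkpoint, overwrite it, then restore it. The Counter module is hypothetical, and the checkpoint is written to a temporary directory created with tempfile.

```python
import os
import tempfile

import tensorflow as tf

class Counter(tf.Module):  # hypothetical module used only for illustration
    def __init__(self):
        super().__init__()
        self.count = tf.Variable(0)

counter = Counter()
checkpoint = tf.train.Checkpoint(counter=counter)
ckpt_prefix = os.path.join(tempfile.mkdtemp(), 'ckpt')

counter.count.assign(5)
save_path = checkpoint.save(ckpt_prefix)  # returns the path of the new checkpoint

counter.count.assign(42)                  # change the state after saving
checkpoint.restore(save_path)             # roll back to the saved value
print(counter.count.numpy())              # 5
```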

Here is a complete example exporting a simple tf.Module object:

class MyModule(tf.Module):
  def __init__(self, value):
    self.weight = tf.Variable(value)

  @tf.function
  def multiply(self, x):
    return x * self.weight

mod = MyModule(3)
mod.multiply(tf.constant([1, 2, 3]))
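tf.Module also discovers variables held by nested modules, which is what lets a checkpoint or SavedModel capture a whole model at once. The sketch below uses two hypothetical classes, Dense (unrelated to tf.keras.layers.Dense) and TwoLayer, to show this tracking through the submodules and trainable_variables properties.

```python
import tensorflow as tf

class Dense(tf.Module):  # hypothetical layer, not tf.keras.layers.Dense
    def __init__(self, in_features, out_features, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.random.normal([in_features, out_features]), name='w')
        self.b = tf.Variable(tf.zeros([out_features]), name='b')

    def __call__(self, x):
        return tf.nn.relu(x @ self.w + self.b)

class TwoLayer(tf.Module):  # hypothetical container module
    def __init__(self, name=None):
        super().__init__(name=name)
        # Modules assigned as attributes are tracked automatically.
        self.hidden = Dense(3, 4)
        self.out = Dense(4, 2)

    def __call__(self, x):
        return self.out(self.hidden(x))

two_layer = TwoLayer()
print(two_layer(tf.ones([1, 3])).shape)    # (1, 2)
print(len(two_layer.submodules))           # 2 nested Dense modules
print(len(two_layer.trainable_variables))  # 4: w and b from each layer
```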

Save the Module:

save_path = './saved'
tf.saved_model.save(mod, save_path)

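The resulting SavedModel is independent of the code that created it. You can load a SavedModel from Python, other language bindings, or TensorFlow Serving. You can also convert it to run with TensorFlow Lite or TensorFlow JS.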

reloaded = tf.saved_model.load(save_path)
reloaded.multiply(tf.constant([1, 2, 3]))
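A SavedModel contains only the graphs traced before it was saved, so the reloaded multiply above is generally limited to inputs compatible with the signature it was traced with (an int32 tensor of shape [3]). A sketch of one way around this, assuming you want to accept int32 vectors of any length: declare an input_signature on the tf.function before exporting. FlexibleModule is a hypothetical variant of MyModule.

```python
import tempfile

import tensorflow as tf

class FlexibleModule(tf.Module):  # hypothetical variant of MyModule
    def __init__(self, value):
        super().__init__()
        self.weight = tf.Variable(value)

    # shape=[None] accepts an int32 vector of any length.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.int32)])
    def multiply(self, x):
        return x * self.weight

export_dir = tempfile.mkdtemp()
tf.saved_model.save(FlexibleModule(3), export_dir)

restored = tf.saved_model.load(export_dir)
print(restored.multiply(tf.constant([1, 2, 3, 4, 5])).numpy())  # [ 3  6  9 12 15]
```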

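The tf.keras.layers.Layer and tf.keras.Model classes build on tf.Module, providing additional functionality and convenience methods for building, training, and saving models. Some of these are demonstrated in the next section.

See the Introduction to modules for details.

Training loops

Now put this all together to build a basic model and train it from scratch.

First, create some example data. This generates a cloud of points that loosely follows a quadratic curve: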

import matplotlib
from matplotlib import pyplot as plt

matplotlib.rcParams['figure.figsize'] = [9, 6]

x = tf.linspace(-2, 2, 201)
x = tf.cast(x, tf.float32)

def f(x):
  y = x**2 + 2*x - 5
  return y

y = f(x) + tf.random.normal(shape=[201])

plt.plot(x.numpy(), y.numpy(), '.', label='Data')
plt.plot(x, f(x), label='Ground truth')
plt.legend();

Create a quadratic model with randomly initialized weights and a bias:

class Model(tf.Module):

  def __init__(self):
    # Randomly generate weight and bias terms
    rand_init = tf.random.uniform(shape=[3], minval=0., maxval=5., seed=22)
    # Initialize model parameters
    self.w_q = tf.Variable(rand_init[0])
    self.w_l = tf.Variable(rand_init[1])
    self.b = tf.Variable(rand_init[2])

  @tf.function
  def __call__(self, x):
    # Quadratic Model : quadratic_weight * x^2 + linear_weight * x + bias
    return self.w_q * (x**2) + self.w_l * x + self.b

First, observe your model's performance before training:

quad_model = Model()

def plot_preds(x, y, f, model, title):
  plt.figure()
  plt.plot(x, y, '.', label='Data')
  plt.plot(x, f(x), label='Ground truth')
  plt.plot(x, model(x), label='Predictions')
  plt.title(title)
  plt.legend()

plot_preds(x, y, f, quad_model, 'Before training')

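Now, define a loss for your model:

Given that this model is intended to predict continuous values, the mean squared error (MSE) is a good choice for the loss function. Given a vector of predictions, \(\hat{y}\), and a vector of true targets, \(y\), the MSE is defined as the mean of the squared differences between the predicted values and the ground truth: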

\(MSE = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i -y_i)^2\)

def mse_loss(y_pred, y):
  return tf.reduce_mean(tf.square(y_pred - y))
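A quick arithmetic check of mse_loss on hypothetical values: with predictions [1, 2, 3] and targets [1, 2, 5], the squared differences are [0, 0, 4], so the MSE is 4/3. The sketch compares the TensorFlow result with the same formula computed in NumPy.

```python
import numpy as np
import tensorflow as tf

def mse_loss(y_pred, y):
    return tf.reduce_mean(tf.square(y_pred - y))

y_pred = tf.constant([1.0, 2.0, 3.0])
y_true = tf.constant([1.0, 2.0, 5.0])

# Squared differences are [0, 0, 4]; their mean is 4 / 3.
loss = mse_loss(y_pred, y_true)
print(loss.numpy())  # about 1.3333

# Cross-check against the formula evaluated directly in NumPy.
reference = np.mean((y_pred.numpy() - y_true.numpy()) ** 2)
print(np.isclose(loss.numpy(), reference))  # True
```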

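Write a basic training loop for the model. The loop uses the MSE loss function and its gradients with respect to the model's parameters to iteratively update those parameters. Training on mini-batches is memory-efficient and can speed up convergence. The tf.data.Dataset API has useful functions for batching and shuffling.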
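Before writing the loop, it can help to see what shuffle and batch produce for data of this size. This sketch rebuilds 201 inputs like the ones above and records each batch's size: with a batch size of 32, the last batch holds only the 9 leftover points, unless drop_remainder=True is passed to batch.

```python
import tensorflow as tf

# Same sizes as the training data: 201 points, batches of 32.
features = tf.linspace(-2.0, 2.0, 201)
demo_ds = tf.data.Dataset.from_tensor_slices(features)
demo_ds = demo_ds.shuffle(buffer_size=201).batch(32)

# Shuffling changes the order of points, not the batch sizes.
batch_sizes = [int(batch.shape[0]) for batch in demo_ds]
print(batch_sizes)  # [32, 32, 32, 32, 32, 32, 9]
```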

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.shuffle(buffer_size=x.shape[0]).batch(batch_size)

# Set training parameters
epochs = 100
learning_rate = 0.01
losses = []

# Training loop
for epoch in range(epochs):
  for x_batch, y_batch in dataset:
    with tf.GradientTape() as tape:
      batch_loss = mse_loss(quad_model(x_batch), y_batch)
    # Update parameters with respect to the gradient calculations
    grads = tape.gradient(batch_loss, quad_model.variables)
    for g, v in zip(grads, quad_model.variables):
      v.assign_sub(learning_rate * g)
  # Keep track of model loss per epoch
  loss = mse_loss(quad_model(x), y)
  losses.append(loss)
  if epoch % 10 == 0:
    print(f'Mean squared error for epoch {epoch}: {loss.numpy():0.3f}')

# Plot model results
print("\n")
plt.plot(range(epochs), losses)
plt.xlabel("Epoch")
plt.ylabel("Mean Squared Error (MSE)")
plt.title('MSE loss vs training iterations');

Now, observe your model's performance after training:

plot_preds(x, y, f, quad_model, 'After training')

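This works, but remember that implementations of common training utilities are available in the tf.keras module, so consider using those before writing your own. To start with, the Model.compile and Model.fit methods implement a training loop for you.

Begin by creating a sequential model in Keras using tf.keras.Sequential. One of the simplest Keras layers is the dense layer, which can be instantiated with tf.keras.layers.Dense. The dense layer is able to learn multidimensional linear relationships of the form \(\mathrm{Y} = \mathrm{W}\mathrm{X} + \vec{b}\). In order to learn a nonlinear equation of the form \(w_1x^2 + w_2x + b\), the dense layer's input should be a data matrix with \(x^2\) and \(x\) as features. The lambda layer, tf.keras.layers.Lambda, can be used to perform this stacking transformation.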
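To see what the Lambda layer feeds to the dense layer, the following sketch applies the same tf.stack call to a few hypothetical input values. Each scalar x becomes the row [x, x**2], so the dense layer's first kernel weight multiplies x and its second multiplies x**2.

```python
import tensorflow as tf

x_demo = tf.constant([1.0, 2.0, 3.0])

# Same transformation as the Lambda layer: each x becomes the feature row [x, x**2].
features = tf.stack([x_demo, x_demo**2], axis=1)
print(features.numpy())
# [[1. 1.]
#  [2. 4.]
#  [3. 9.]]
print(features.shape)  # (3, 2)
```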

new_model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.stack([x, x**2], axis=1)),
    tf.keras.layers.Dense(units=1, kernel_initializer=tf.random.normal)])

new_model.compile(
    loss=tf.keras.losses.MSE,
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01))

history = new_model.fit(x, y,
                        epochs=100,
                        batch_size=32,
                        verbose=0)

# The .keras extension selects the native Keras format (required by Keras 3).
new_model.save('./my_new_model.keras')

Observe your Keras model's performance after training:

plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylim([0, max(plt.ylim())])
plt.ylabel('Loss [Mean Squared Error]')
plt.title('Keras training progress');

plot_preds(x, y, f, new_model, 'After Training: Keras')

For more details, see the Basic training loops and Keras guides.