Customization basics: tensors and operations

This is an introductory TensorFlow tutorial that shows how to:

Import the required package.
Create and use tensors.
Use GPU acceleration.
Build a data pipeline with tf.data.Dataset.

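Import TensorFlow

To get started, import the tensorflow module. As of TensorFlow 2, eager execution is turned on by default. Eager execution gives TensorFlow a more interactive frontend, which you will explore in more detail later.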

import tensorflow as tf
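
As a quick check of the statement above, the following sketch asks TensorFlow whether eager execution is active and reads back the value of an operation immediately. It assumes TensorFlow 2.x is installed; the variable names are illustrative only.

```python
import tensorflow as tf

# In TensorFlow 2 this reports True unless eager execution was disabled.
eager_enabled = tf.executing_eagerly()
print(eager_enabled)

# With eager execution, an operation runs right away and its value
# can be read back as a plain NumPy scalar.
total = tf.math.add(1, 2)
print(total.numpy())
```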

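Tensors

A tensor is a multi-dimensional array. Like NumPy ndarray objects, tf.Tensor objects have a data type and a shape. In addition, a tf.Tensor can reside in accelerator memory (such as a GPU). TensorFlow offers a rich library of operations (for example, tf.math.add, tf.linalg.matmul, and tf.linalg.inv) that consume and produce tf.Tensors. These operations automatically convert built-in Python types. For example: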

print(tf.math.add(1, 2))
print(tf.math.add([1, 2], [3, 4]))
print(tf.math.square(5))
print(tf.math.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.math.square(2) + tf.math.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
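
The automatic conversion can also be done by hand, which makes the inferred data types visible. The sketch below uses tf.convert_to_tensor; the variable names are illustrative, and the dtypes shown are the defaults TensorFlow picks for plain Python numbers.

```python
import tensorflow as tf

int_tensor = tf.convert_to_tensor(3)                    # Python int
float_tensor = tf.convert_to_tensor(3.0)                # Python float
list_tensor = tf.convert_to_tensor([[1, 2], [3, 4]])    # nested list

# Python ints default to int32 and Python floats to float32.
print(int_tensor.dtype, float_tensor.dtype, list_tensor.shape)

# An explicit dtype overrides the default inference.
as_float64 = tf.convert_to_tensor([1, 2], dtype=tf.float64)
print(as_float64.dtype)
```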

Each tf.Tensor has a shape and a data type:

x = tf.linalg.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>

The most obvious differences between NumPy arrays and tf.Tensors are:

Tensors can be backed by accelerator memory (such as a GPU or TPU).
Tensors are immutable.
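
To make the immutability point concrete, here is a minimal sketch. It shows that item assignment on a tf.Tensor fails, that an "update" instead produces a new tensor, and that tf.Variable is the usual choice when state must change in place. The names t, updated, and v are illustrative.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# Item assignment is not supported on a tf.Tensor.
try:
    t[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new tensor with one element replaced; `t` itself is unchanged.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])

# For state that must change in place, use a tf.Variable.
v = tf.Variable([1, 2, 3])
v.assign([10, 2, 3])

print(assignment_failed, t.numpy(), updated.numpy(), v.numpy())
```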

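NumPy compatibility

Converting between a TensorFlow tf.Tensor and a NumPy ndarray is easy:

TensorFlow operations automatically convert NumPy ndarrays to tensors.
NumPy operations automatically convert tensors to NumPy ndarrays.

Tensors are explicitly converted to NumPy ndarrays with their .numpy() method. These conversions are typically cheap because the array and the tf.Tensor share the underlying memory representation when possible. Sharing is not always possible, however: a tf.Tensor may be hosted in GPU memory, whereas NumPy arrays are always backed by host memory, so the conversion then involves a copy from GPU to host memory.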

import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.math.multiply(ndarray, 42)
print(tensor)


print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to NumPy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
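
Note that the tensor above is float64: the dtype of the NumPy array carries over, even though TensorFlow defaults to float32 for Python floats. The following sketch shows the carried-over dtype, an explicit cast, and two ways back to NumPy. Variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

ndarray = np.ones([2, 2])                  # NumPy defaults to float64
tensor = tf.convert_to_tensor(ndarray)     # dtype carried over: float64

# Cast explicitly when float32 is wanted.
tensor32 = tf.cast(tensor, tf.float32)

# Two ways back to NumPy: the .numpy() method and np.asarray.
round_trip = tensor32.numpy()
via_asarray = np.asarray(tensor)

print(tensor.dtype, tensor32.dtype, round_trip.dtype, via_asarray.dtype)
```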

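GPU acceleration

Many TensorFlow operations are accelerated by using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to run an operation on the GPU or the CPU, copying the tensor between CPU and GPU memory if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed. For example: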

x = tf.random.uniform([3, 3])

print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))

print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

Is there a GPU available: 
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
Is the Tensor on GPU #0:  
True

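Device names

The Tensor.device property provides the fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which the program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. The string ends with GPU:<N> if the tensor is placed on the N-th GPU on the host.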
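
The parts of such a name can be pulled apart with tf.DeviceSpec. The sketch below parses a hypothetical device string (made up for illustration, in the form that Tensor.device reports) and then applies the suffix check to a real tensor pinned to the CPU, so it runs without a GPU.

```python
import tensorflow as tf

# Hypothetical fully qualified name, in the form Tensor.device reports.
device_name = "/job:localhost/replica:0/task:0/device:GPU:1"

spec = tf.DeviceSpec.from_string(device_name)
print(spec.job, spec.device_type, spec.device_index)

# The same suffix check, applied to a real tensor pinned to the CPU.
with tf.device("CPU:0"):
    x = tf.zeros([2])
on_cpu = x.device.endswith("CPU:0")
print(x.device, on_cpu)
```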

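Explicit device placement

In TensorFlow, placement refers to how individual operations are assigned to (placed on) a device for execution. As mentioned, when no explicit guidance is provided, TensorFlow automatically decides which device executes an operation and copies tensors to that device if needed.

However, TensorFlow operations can be explicitly placed on specific devices using the tf.device context manager. For example: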

import time

def time_matmul(x):
  start = time.time()
  for loop in range(10):
    tf.linalg.matmul(x, x)

  result = time.time()-start

  print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
  print("On GPU:")
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)

On CPU:
10 loops: 42.76ms
On GPU:
10 loops: 300.72ms
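
In the output above the GPU run is slower than the CPU run. The source does not explain this. One plausible cause (an assumption here, not something the tutorial states) is one-time GPU initialization being counted in the first call. The sketch below is a variant of the timing helper that discards one warm-up run and reads the last result back to the host before stopping the clock. pick_device and time_matmul_ms are hypothetical helper names, and the matrix is kept small so the example runs quickly on a CPU-only machine.

```python
import time
import tensorflow as tf

def pick_device():
    """Return "GPU:0" when a GPU is visible, otherwise "CPU:0"."""
    return "GPU:0" if tf.config.list_physical_devices("GPU") else "CPU:0"

def time_matmul_ms(x, repeats=10):
    """Time `repeats` matrix multiplications, excluding one warm-up run."""
    tf.linalg.matmul(x, x).numpy()       # warm-up; result is discarded
    start = time.perf_counter()
    for _ in range(repeats):
        result = tf.linalg.matmul(x, x)
    result.numpy()                       # read back so pending work has finished
    return 1000 * (time.perf_counter() - start)

device = pick_device()
with tf.device(device):
    x = tf.random.uniform([200, 200])
    elapsed_ms = time_matmul_ms(x)

print("{}: 10 loops: {:0.2f}ms".format(device, elapsed_ms))
```

Timings still vary from run to run and between machines, so treat any single measurement as indicative only.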

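Datasets

This section uses the tf.data.Dataset API to build a pipeline for feeding data to your model. tf.data.Dataset is used to build performant, complex input pipelines from simple, reusable pieces that feed your model's training or evaluation loops. (Refer to the tf.data: Build TensorFlow input pipelines guide to learn more.)

Create a source `Dataset`

Create a source dataset using one of the factory functions, such as tf.data.Dataset.from_tensors or tf.data.Dataset.from_tensor_slices, or using objects that read from files, such as tf.data.TextLineDataset or tf.data.TFRecordDataset. Refer to the Reading input data section of the tf.data: Build TensorFlow input pipelines guide for more information.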

ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

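# Create a temporary text file (TextLineDataset reads one record per line)
import os
import tempfile
fd, filename = tempfile.mkstemp()
os.close(fd)  # close the descriptor from mkstemp; the file is reopened by name below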

with open(filename, 'w') as f:
  f.write("""Line 1
Line 2
Line 3
  """)

ds_file = tf.data.TextLineDataset(filename)
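
The two factory functions named above differ in how they split their input. As a small sketch with made-up data: from_tensors wraps the whole input as a single element, while from_tensor_slices slices along the first dimension and yields one element per row.

```python
import tensorflow as tf

data = [[1, 2], [3, 4], [5, 6]]

whole = tf.data.Dataset.from_tensors(data)        # one element of shape (3, 2)
rows = tf.data.Dataset.from_tensor_slices(data)   # three elements of shape (2,)

whole_shapes = [tuple(element.shape) for element in whole]
row_shapes = [tuple(element.shape) for element in rows]

print(whole_shapes)
print(row_shapes)
```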

Apply transformations

Use transformation functions such as tf.data.Dataset.map, tf.data.Dataset.batch, and tf.data.Dataset.shuffle to apply transformations to dataset records.

ds_tensors = ds_tensors.map(tf.math.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)
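
Because shuffle makes the result order-dependent, a deterministic sketch can help in seeing what each transformation does on its own. The example below uses tf.data.Dataset.range as an illustrative source. It also shows the drop_remainder option of batch, and checks that shuffling only reorders records. Note that a shuffle buffer smaller than the dataset, such as shuffle(2) above, mixes records only locally.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)              # 0, 1, 2, 3, 4
squared = ds.map(lambda x: x * x)          # 0, 1, 4, 9, 16

# batch groups consecutive records; the last batch may be smaller.
batched = [b.tolist() for b in squared.batch(2).as_numpy_iterator()]

# drop_remainder=True discards an incomplete final batch.
full_only = [
    b.tolist()
    for b in squared.batch(2, drop_remainder=True).as_numpy_iterator()
]

# shuffle changes the order of records, not their values.
shuffled = [int(v) for v in squared.shuffle(buffer_size=5, seed=0).as_numpy_iterator()]

print(batched)
print(full_only)
print(sorted(shuffled))
```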

Iterate

tf.data.Dataset objects support iteration to loop over records:

print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)

Elements of ds_tensors:
tf.Tensor([4 9], shape=(2,), dtype=int32)
tf.Tensor([ 1 25], shape=(2,), dtype=int32)
tf.Tensor([16 36], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'Line 3' b'  '], shape=(2,), dtype=string)
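
The last batch above contains a whitespace-only record, which comes from the trailing indented line written to the file. One way to drop such records, sketched here under the assumption that blank lines carry no data, is to filter before batching. To stay self-contained the example builds the same four records in memory instead of reading the file; is_not_blank is a hypothetical helper name.

```python
import tensorflow as tf

# The same four records the file-based dataset yields, built in memory.
lines = tf.data.Dataset.from_tensor_slices(["Line 1", "Line 2", "Line 3", "  "])

def is_not_blank(line):
    """Keep a record only if it has non-whitespace content."""
    return tf.strings.length(tf.strings.strip(line)) > 0

batches = [
    b.tolist()
    for b in lines.filter(is_not_blank).batch(2).as_numpy_iterator()
]
print(batches)
```

The same filter could be applied to ds_file before its batch(2) call.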