DeepDream

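This tutorial contains a minimal implementation of DeepDream, as described in this blog post by Alexander Mordvintsev.

DeepDream is an experiment that visualizes the patterns learned by a neural network. Much as a child watches clouds and tries to interpret random shapes, DeepDream over-interprets and enhances the patterns it sees in an image.

It does so by forwarding an image through the network, then calculating the gradient of a chosen layer's activations with respect to the image. The image is then modified to increase those activations, which enhances the patterns the network sees and produces a dream-like image. This process was dubbed "Inceptionism" (a reference to InceptionNet and the movie Inception).

Let's demonstrate how you can make a neural network "dream" and enhance the surreal patterns it sees in an image.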

Dogception

import tensorflow as tf

import numpy as np

import matplotlib as mpl

import IPython.display as display
import PIL.Image

Choose an image to dream-ify

For this tutorial, let's use an image of a Labrador.

url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg'

# Download an image and read it into a NumPy array.
def download(url, max_dim=None):
  name = url.split('/')[-1]
  image_path = tf.keras.utils.get_file(name, origin=url)
  img = PIL.Image.open(image_path)
  if max_dim:
    img.thumbnail((max_dim, max_dim))
  return np.array(img)

# Map an image from the model's [-1, 1] range back to uint8 [0, 255]
def deprocess(img):
  img = 255*(img + 1.0)/2.0
  return tf.cast(img, tf.uint8)

# Display an image
def show(img):
  display.display(PIL.Image.fromarray(np.array(img)))


# Downsizing the image makes it easier to work with.
original_img = download(url, max_dim=500)
show(original_img)
display.display(display.HTML('Image cc-by: <a href="https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg">Von.grzanka</a>'))
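
The helpers above move between two pixel ranges: uint8 values in [0, 255] for display and floats in [-1, 1] for the model. The following is a rough NumPy-only sketch of that mapping. The names `to_model_range` and `to_uint8` are hypothetical, and the scaling assumes that InceptionV3 preprocessing amounts to `x / 127.5 - 1`.

```python
import numpy as np

def to_model_range(img_uint8):
    # Assumed equivalent of InceptionV3 preprocessing: [0, 255] -> [-1, 1].
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def to_uint8(img_float):
    # Mirrors deprocess(): [-1, 1] -> [0, 255], truncating toward zero.
    return (255.0 * (img_float + 1.0) / 2.0).astype(np.uint8)

pixels = np.array([0, 255], dtype=np.uint8)
scaled = to_model_range(pixels)

# The midpoint 0.0 maps to 127.5 and is truncated when cast to uint8.
restored = to_uint8(np.array([-1.0, 0.0, 1.0], dtype=np.float32))
```

Because the cast truncates, a round trip through both functions is not guaranteed to reproduce every original pixel value exactly.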

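Prepare the feature extraction model

Download and prepare a pre-trained image classification model. You will use InceptionV3, which is similar to the model originally used in DeepDream. Any pre-trained model will work, but if you change the model you will have to adjust the layer names below.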

base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')

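The idea in DeepDream is to choose a layer (or layers) and maximize the "loss" in a way that makes the image increasingly "excite" those layers. The complexity of the features incorporated depends on the layers you choose: lower layers produce strokes or simple patterns, while deeper layers give sophisticated features in images, or even whole objects.

The InceptionV3 architecture is quite large (for a diagram of the model architecture, see TensorFlow's research repo). For DeepDream, the layers of interest are those where the convolutions are concatenated. There are 11 of these layers in InceptionV3, named "mixed0" through "mixed10". Using different layers will result in different dream-like images. Deeper layers respond to higher-level features (such as eyes and faces), while earlier layers respond to simpler features (such as edges, shapes, and textures). Feel free to experiment with the layers selected below, but keep in mind that deeper layers (those with a higher index) take longer to process, because the gradient computation runs through more of the network.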

# Maximize the activations of these layers
names = ['mixed3', 'mixed5']
layers = [base_model.get_layer(name).output for name in names]

# Create the feature extraction model
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)

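Calculate loss

The loss is the sum of the activations in the chosen layers. The loss is normalized at each layer so that the contribution from larger layers does not outweigh that of smaller layers. Normally, loss is a quantity you want to minimize via gradient descent. In DeepDream, you will maximize this loss via gradient ascent.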

def calc_loss(img, model):
  # Pass forward the image through the model to retrieve the activations.
  # Converts the image into a batch of size 1.
  img_batch = tf.expand_dims(img, axis=0)
  layer_activations = model(img_batch)
  if len(layer_activations) == 1:
    layer_activations = [layer_activations]

  losses = []
  for act in layer_activations:
    loss = tf.math.reduce_mean(act)
    losses.append(loss)

  return  tf.reduce_sum(losses)
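
To see why `calc_loss` takes a mean per layer before summing, consider two layers of very different sizes. This is a small NumPy sketch with made-up activation shapes. `mean_activation_loss` is a hypothetical stand-in for the TensorFlow function above, not a replacement for it.

```python
import numpy as np

def mean_activation_loss(activations):
    # One mean per layer, then a sum across layers.
    return float(sum(np.mean(act) for act in activations))

# Hypothetical activations: same value everywhere, very different sizes.
small_layer = np.full((1, 4, 4, 8), 2.0)      # 128 values
large_layer = np.full((1, 16, 16, 64), 2.0)   # 16384 values

loss = mean_activation_loss([small_layer, large_layer])

# With raw sums instead of means, the large layer would dominate.
raw_sum_ratio = large_layer.sum() / small_layer.sum()
```

Each layer contributes 2.0 to the mean-based loss, whereas the raw sums differ by a factor of 128.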

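Gradient ascent

Once you have calculated the loss for the chosen layers, all that is left is to calculate the gradients with respect to the image and add them to the original image.

Adding the gradients to the image enhances the patterns seen by the network. At each step, you will have created an image that increasingly excites the activations of certain layers in the network.

The method that does this, below, is wrapped in a tf.function for performance. It uses an input_signature to ensure that the function is not retraced for different image sizes or steps/step_size values. See the Concrete functions guide for details.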

class DeepDream(tf.Module):
  def __init__(self, model):
    self.model = model

  @tf.function(
      input_signature=(
        tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
        tf.TensorSpec(shape=[], dtype=tf.int32),
        tf.TensorSpec(shape=[], dtype=tf.float32),)
  )
  def __call__(self, img, steps, step_size):
      print("Tracing")
      loss = tf.constant(0.0)
      for n in tf.range(steps):
        with tf.GradientTape() as tape:
          # This needs gradients relative to `img`
          # `GradientTape` only watches `tf.Variable`s by default
          tape.watch(img)
          loss = calc_loss(img, self.model)

        # Calculate the gradient of the loss with respect to the pixels of the input image.
        gradients = tape.gradient(loss, img)

        # Normalize the gradients.
        gradients /= tf.math.reduce_std(gradients) + 1e-8 

        # In gradient ascent, the "loss" is maximized so that the input image increasingly "excites" the layers.
        # You can update the image by directly adding the gradients (because they're the same shape!)
        img = img + gradients*step_size
        img = tf.clip_by_value(img, -1, 1)

      return loss, img

deepdream = DeepDream(dream_model)
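
The update inside `__call__` (normalize the gradient, step uphill, clip) can be tried without a network. The sketch below uses NumPy and a toy loss whose gradient is known in closed form. The random `weights` array is a hypothetical stand-in for the model, so this illustrates only the update rule, not DeepDream itself.

```python
import numpy as np

def ascent_step(img, grad, step_size=0.01):
    # Normalize by the gradient's standard deviation, step uphill, then clip.
    grad = grad / (np.std(grad) + 1e-8)
    return np.clip(img + step_size * grad, -1.0, 1.0)

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8, 3))   # hypothetical stand-in for a network
img = np.zeros((8, 8, 3))

def toy_loss(x):
    return float(np.mean(weights * x))

loss_before = toy_loss(img)
for _ in range(10):
    # For this toy loss, the gradient with respect to x is weights / x.size.
    img = ascent_step(img, weights / img.size)
loss_after = toy_loss(img)
```

Dividing by the standard deviation makes the effective step size independent of the raw gradient scale, which is why a fixed `step_size` such as 0.01 can be reused across layers and image sizes.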

Main loop

def run_deep_dream_simple(img, steps=100, step_size=0.01):
  # Convert from uint8 to the range expected by the model.
  img = tf.keras.applications.inception_v3.preprocess_input(img)
  img = tf.convert_to_tensor(img)
  step_size = tf.convert_to_tensor(step_size)
  steps_remaining = steps
  step = 0
  while steps_remaining:
    if steps_remaining>100:
      run_steps = tf.constant(100)
    else:
      run_steps = tf.constant(steps_remaining)
    steps_remaining -= run_steps
    step += run_steps

    loss, img = deepdream(img, run_steps, tf.constant(step_size))

    display.clear_output(wait=True)
    show(deprocess(img))
    print ("Step {}, loss {}".format(step, loss))


  result = deprocess(img)
  display.clear_output(wait=True)
  show(result)

  return result

dream_img = run_deep_dream_simple(img=original_img, 
                                  steps=100, step_size=0.01)
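
The loop in `run_deep_dream_simple` hands the steps to `deepdream` in batches of at most 100, so that intermediate results can be displayed between batches. A plain-Python sketch of just that bookkeeping follows. `split_steps` is a hypothetical helper and not part of the tutorial code.

```python
def split_steps(total_steps, chunk=100):
    # Yield how many steps to run in each call, at most `chunk` at a time.
    remaining = total_steps
    while remaining > 0:
        run = min(chunk, remaining)
        remaining -= run
        yield run

chunks = list(split_steps(250))
no_chunks = list(split_steps(0))
```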

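Taking it up an octave

Pretty good, but there are a few issues with this first attempt:

The output is noisy (this could be addressed with a tf.image.total_variation loss).
The image is low resolution.
The patterns appear as if they are all happening at the same granularity.

One approach that addresses all these problems is applying gradient ascent at different scales. This allows patterns generated at smaller scales to be incorporated into patterns at larger scales and filled in with additional detail.

To do this, you can perform the previous gradient ascent approach, then increase the size of the image (each size is referred to as an octave), and repeat this process for multiple octaves.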

import time
start = time.time()

OCTAVE_SCALE = 1.30

img = tf.constant(np.array(original_img))
base_shape = tf.shape(img)[:-1]
float_base_shape = tf.cast(base_shape, tf.float32)

for n in range(-2, 3):
  new_shape = tf.cast(float_base_shape*(OCTAVE_SCALE**n), tf.int32)

  img = tf.image.resize(img, new_shape).numpy()

  img = run_deep_dream_simple(img=img, steps=50, step_size=0.01)

display.clear_output(wait=True)
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)
show(img)

end = time.time()
end-start

16.497364282608032
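
To get a feel for the sizes the octave loop visits, the following plain-Python sketch mirrors its shape computation. The base shape of 400 x 500 is a made-up example rather than the size of the tutorial image, and `octave_shapes` is a hypothetical helper.

```python
def octave_shapes(base_shape, octaves=range(-2, 3), scale=1.3):
    # Each octave scales the original (height, width); sizes are truncated to ints.
    height, width = base_shape
    return [(int(height * scale**n), int(width * scale**n)) for n in octaves]

shapes = octave_shapes((400, 500))
```

For this example the image grows from 236 x 295 at octave -2 to 676 x 845 at octave 2. Every size is computed from the base shape, not from the previous octave, so truncation errors do not accumulate.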

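Optional: Scaling up with tiles

One thing to consider is that as the image grows, so do the time and memory needed to perform the gradient calculation. The octave implementation above will not work on very large images or on many octaves.

To avoid this issue, you can split the image into tiles and compute the gradient for each tile.

Applying a random shift to the image before each tiled computation prevents tile seams from appearing.

Start by implementing the random shift: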

def random_roll(img, maxroll):
  # Randomly shift the image to avoid tiled boundaries.
  shift = tf.random.uniform(shape=[2], minval=-maxroll, maxval=maxroll, dtype=tf.int32)
  img_rolled = tf.roll(img, shift=shift, axis=[0,1])
  return shift, img_rolled

shift, img_rolled = random_roll(np.array(original_img), 512)
show(img_rolled)
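
The tiled implementation relies on the roll being exactly reversible: gradients computed on the shifted image are later rolled back by the negated shift. A NumPy sketch of that property follows, using a small random array and a fixed, hypothetical shift in place of the random one.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(6, 8, 3), dtype=np.uint8)

# Hypothetical fixed shift (rows, columns); the tutorial draws it at random.
shift = np.array([2, -3])
rolled = np.roll(img, shift=tuple(shift), axis=(0, 1))

# Rolling by the negated shift restores the original pixel layout.
restored = np.roll(rolled, shift=tuple(-shift), axis=(0, 1))
```

Pixels that move past one edge wrap around to the opposite edge, so no information is lost and the shape is unchanged.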

Here is a tiled equivalent of the deepdream function defined earlier:

class TiledGradients(tf.Module):
  def __init__(self, model):
    self.model = model

  @tf.function(
      input_signature=(
        tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
        tf.TensorSpec(shape=[2], dtype=tf.int32),
        tf.TensorSpec(shape=[], dtype=tf.int32),)
  )
  def __call__(self, img, img_size, tile_size=512):
    shift, img_rolled = random_roll(img, tile_size)

    # Initialize the image gradients to zero.
    gradients = tf.zeros_like(img_rolled)

    # Skip the last tile, unless there's only one tile.
    xs = tf.range(0, img_size[1], tile_size)[:-1]
    if not tf.cast(len(xs), bool):
      xs = tf.constant([0])
    ys = tf.range(0, img_size[0], tile_size)[:-1]
    if not tf.cast(len(ys), bool):
      ys = tf.constant([0])

    for x in xs:
      for y in ys:
        # Calculate the gradients for this tile.
        with tf.GradientTape() as tape:
          # This needs gradients relative to `img_rolled`.
          # `GradientTape` only watches `tf.Variable`s by default.
          tape.watch(img_rolled)

          # Extract a tile out of the image.
          img_tile = img_rolled[y:y+tile_size, x:x+tile_size]
          loss = calc_loss(img_tile, self.model)

        # Update the image gradients for this tile.
        gradients = gradients + tape.gradient(loss, img_rolled)

    # Undo the random shift applied to the image and its gradients.
    gradients = tf.roll(gradients, shift=-shift, axis=[0,1])

    # Normalize the gradients.
    gradients /= tf.math.reduce_std(gradients) + 1e-8 

    return gradients

get_tiled_gradients = TiledGradients(dream_model)
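
The tile start offsets in `TiledGradients` follow a "skip the last tile unless it is the only one" rule. The plain-Python sketch below reproduces that rule for a single axis. `tile_origins` is a hypothetical helper and the axis lengths are made-up examples.

```python
def tile_origins(size, tile_size=512):
    # Start offsets along one axis; the last one is dropped unless it is the only one.
    origins = list(range(0, size, tile_size))[:-1]
    return origins if origins else [0]

small = tile_origins(400)    # image smaller than one tile
large = tile_origins(1300)   # image spanning several tiles
```

With a 1300-pixel axis the sketch gives origins 0 and 512, so a single pass leaves the strip from 1024 onward without a tile. The random roll applied before each pass changes which pixels fall into that strip.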

Putting this together gives a scalable, octave-aware DeepDream implementation:

def run_deep_dream_with_octaves(img, steps_per_octave=100, step_size=0.01, 
                                octaves=range(-2,3), octave_scale=1.3):
  base_shape = tf.shape(img)
  img = tf.keras.utils.img_to_array(img)
  img = tf.keras.applications.inception_v3.preprocess_input(img)

  initial_shape = img.shape[:-1]
  img = tf.image.resize(img, initial_shape)
  for octave in octaves:
    # Scale the image based on the octave
    new_size = tf.cast(tf.convert_to_tensor(base_shape[:-1]), tf.float32)*(octave_scale**octave)
    new_size = tf.cast(new_size, tf.int32)
    img = tf.image.resize(img, new_size)

    for step in range(steps_per_octave):
      gradients = get_tiled_gradients(img, new_size)
      img = img + gradients*step_size
      img = tf.clip_by_value(img, -1, 1)

      if step % 10 == 0:
        display.clear_output(wait=True)
        show(deprocess(img))
        print ("Octave {}, Step {}".format(octave, step))

  result = deprocess(img)
  return result

img = run_deep_dream_with_octaves(img=original_img, step_size=0.01)

display.clear_output(wait=True)
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)
show(img)

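Much better! Play around with the number of octaves, the octave scale, and the activated layers to change how your DeepDream-ed image looks.

Readers might also be interested in TensorFlow Lucid, which expands on the ideas introduced in this tutorial to visualize and interpret neural networks.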