计算梯度

在 TensorFlow.org 上查看 在 Google Colab 中运行 在 GitHub 上查看源代码 下载笔记本

本教程探讨了量子电路期望值的梯度计算算法。

计算量子电路中某个可观测量的期望值的梯度是一个复杂的过程。可观测量的期望值没有解析梯度公式的优势,这些公式通常很容易写下来,这不同于传统的机器学习转换,例如矩阵乘法或向量加法,它们具有易于写下的解析梯度公式。因此,有不同的量子梯度计算方法适用于不同的场景。本教程比较和对比了两种不同的微分方案。

设置

pip install tensorflow==2.15.0

安装 TensorFlow Quantum

pip install tensorflow-quantum==0.7.3
# Update package resources to account for version changes.
import importlib, pkg_resources
importlib.reload(pkg_resources)
/tmpfs/tmp/ipykernel_13801/1875984233.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import importlib, pkg_resources
<module 'pkg_resources' from '/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/pkg_resources/__init__.py'>

现在导入 TensorFlow 和模块依赖项

import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np

# visualization tools
%matplotlib inline
import matplotlib.pyplot as plt
from cirq.contrib.svg import SVGCircuit
2024-05-18 11:25:27.526147: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-18 11:25:27.526190: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-18 11:25:27.527683: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-18 11:25:30.854843: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:274] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

1. 预备

让我们使量子电路的梯度计算概念更加具体。假设你有一个这样的参数化电路

qubit = cirq.GridQubit(0, 0)
my_circuit = cirq.Circuit(cirq.Y(qubit)**sympy.Symbol('alpha'))
SVGCircuit(my_circuit)
findfont: Font family 'Arial' not found.
findfont: Font family 'Arial' not found.

svg

以及一个可观测量

pauli_x = cirq.X(qubit)
pauli_x
cirq.X(cirq.GridQubit(0, 0))

观察此算子,你便知道 \(⟨Y(\alpha)| X | Y(\alpha)⟩ = \sin(\pi \alpha)\)

def my_expectation(op, alpha):
    """Compute ⟨Y(alpha)| `op` | Y(alpha)⟩"""
    params = {'alpha': alpha}
    sim = cirq.Simulator()
    final_state_vector = sim.simulate(my_circuit, params).final_state_vector
    return op.expectation_from_state_vector(final_state_vector, {qubit: 0}).real


my_alpha = 0.3
print("Expectation=", my_expectation(pauli_x, my_alpha))
print("Sin Formula=", np.sin(np.pi * my_alpha))
Expectation= 0.80901700258255
Sin Formula= 0.8090169943749475

如果你定义 \(f_{1}(\alpha) = ⟨Y(\alpha)| X | Y(\alpha)⟩\),则 \(f_{1}^{'}(\alpha) = \pi \cos(\pi \alpha)\)。让我们检查一下

def my_grad(obs, alpha, eps=0.01):
    grad = 0
    f_x = my_expectation(obs, alpha)
    f_x_prime = my_expectation(obs, alpha + eps)
    return ((f_x_prime - f_x) / eps).real


print('Finite difference:', my_grad(pauli_x, my_alpha))
print('Cosine formula:   ', np.pi * np.cos(np.pi * my_alpha))
Finite difference: 1.8063604831695557
Cosine formula:    1.8465818304904567

2. 需要一个微分器

对于较大的电路,你不会总是那么幸运,可以拥有一个精确计算给定量子电路的梯度的公式。如果一个简单的公式不足以计算梯度,则 tfq.differentiators.Differentiator 类允许你定义算法来计算电路的梯度。例如,你可以使用以下方法在 TensorFlow Quantum (TFQ) 中重新创建上述示例

expectation_calculation = tfq.layers.Expectation(
    differentiator=tfq.differentiators.ForwardDifference(grid_spacing=0.01))

expectation_calculation(my_circuit,
                        operators=pauli_x,
                        symbol_names=['alpha'],
                        symbol_values=[[my_alpha]])
<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.80901706]], dtype=float32)>

但是,如果你切换到基于采样的期望值估计(在真实设备上会发生什么),这些值可能会发生一些变化。这意味着你现在有了不完美的估计

sampled_expectation_calculation = tfq.layers.SampledExpectation(
    differentiator=tfq.differentiators.ForwardDifference(grid_spacing=0.01))

sampled_expectation_calculation(my_circuit,
                                operators=pauli_x,
                                repetitions=500,
                                symbol_names=['alpha'],
                                symbol_values=[[my_alpha]])
<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.832]], dtype=float32)>

当涉及到梯度时,这可能会迅速演变成一个严重的精度问题

# Make input_points = [batch_size, 1] array.
input_points = np.linspace(0, 5, 200)[:, np.newaxis].astype(np.float32)
exact_outputs = expectation_calculation(my_circuit,
                                        operators=pauli_x,
                                        symbol_names=['alpha'],
                                        symbol_values=input_points)
imperfect_outputs = sampled_expectation_calculation(my_circuit,
                                                    operators=pauli_x,
                                                    repetitions=500,
                                                    symbol_names=['alpha'],
                                                    symbol_values=input_points)
plt.title('Forward Pass Values')
plt.xlabel('$x$')
plt.ylabel('$f(x)$')
plt.plot(input_points, exact_outputs, label='Analytic')
plt.plot(input_points, imperfect_outputs, label='Sampled')
plt.legend()
<matplotlib.legend.Legend at 0x7f3c740e74c0>

png

# Gradients are a much different story.
values_tensor = tf.convert_to_tensor(input_points)

with tf.GradientTape() as g:
    g.watch(values_tensor)
    exact_outputs = expectation_calculation(my_circuit,
                                            operators=pauli_x,
                                            symbol_names=['alpha'],
                                            symbol_values=values_tensor)
analytic_finite_diff_gradients = g.gradient(exact_outputs, values_tensor)

with tf.GradientTape() as g:
    g.watch(values_tensor)
    imperfect_outputs = sampled_expectation_calculation(
        my_circuit,
        operators=pauli_x,
        repetitions=500,
        symbol_names=['alpha'],
        symbol_values=values_tensor)
sampled_finite_diff_gradients = g.gradient(imperfect_outputs, values_tensor)

plt.title('Gradient Values')
plt.xlabel('$x$')
plt.ylabel('$f^{\'}(x)$')
plt.plot(input_points, analytic_finite_diff_gradients, label='Analytic')
plt.plot(input_points, sampled_finite_diff_gradients, label='Sampled')
plt.legend()
<matplotlib.legend.Legend at 0x7f3b6c627fd0>

png

在这里你可以看到,尽管有限差分公式在分析情况下可以快速计算梯度本身,但当涉及到基于采样的方法时,它太嘈杂了。必须使用更谨慎的技术来确保可以计算出良好的梯度。接下来,你将看到一种更慢的技术,它不太适合分析期望梯度计算,但在基于真实世界样本的情况下表现得更好

# A smarter differentiation scheme.
gradient_safe_sampled_expectation = tfq.layers.SampledExpectation(
    differentiator=tfq.differentiators.ParameterShift())

with tf.GradientTape() as g:
    g.watch(values_tensor)
    imperfect_outputs = gradient_safe_sampled_expectation(
        my_circuit,
        operators=pauli_x,
        repetitions=500,
        symbol_names=['alpha'],
        symbol_values=values_tensor)

sampled_param_shift_gradients = g.gradient(imperfect_outputs, values_tensor)

plt.title('Gradient Values')
plt.xlabel('$x$')
plt.ylabel('$f^{\'}(x)$')
plt.plot(input_points, analytic_finite_diff_gradients, label='Analytic')
plt.plot(input_points, sampled_param_shift_gradients, label='Sampled')
plt.legend()
<matplotlib.legend.Legend at 0x7f3b6c557400>

png

从以上内容中,你可以看到某些微分器最适合用于特定的研究场景。一般来说,在更“真实世界”的设置中测试或实现算法时,对设备噪声等具有鲁棒性的较慢的基于样本的方法是优秀的微分器。像有限差分这样的更快的方法非常适合分析计算,并且你希望获得更高的吞吐量,但尚未关注算法的设备可行性。

3. 多个可观测量

让我们引入第二个可观测量,看看 TensorFlow Quantum 如何支持单个电路的多个可观测量。

pauli_z = cirq.Z(qubit)
pauli_z
cirq.Z(cirq.GridQubit(0, 0))

如果将此可观测量与之前的电路一起使用,则有 \(f_{2}(\alpha) = ⟨Y(\alpha)| Z | Y(\alpha)⟩ = \cos(\pi \alpha)\) 和 \(f_{2}^{'}(\alpha) = -\pi \sin(\pi \alpha)\)。执行快速检查

test_value = 0.

print('Finite difference:', my_grad(pauli_z, test_value))
print('Sin formula:      ', -np.pi * np.sin(np.pi * test_value))
Finite difference: -0.04934072494506836
Sin formula:       -0.0

这是一个匹配(足够接近)。

现在,如果你定义 \(g(\alpha) = f_{1}(\alpha) + f_{2}(\alpha)\),则 \(g'(\alpha) = f_{1}^{'}(\alpha) + f^{'}_{2}(\alpha)\)。在 TensorFlow Quantum 中定义多个可观测量以与电路一起使用等效于向 \(g\) 添加更多项。

这意味着电路中特定符号的梯度等于针对该电路应用于该符号的每个可观测量的梯度的和。这与 TensorFlow 梯度获取和反向传播兼容(其中你将所有可观测量上的梯度和作为特定符号的梯度)。

sum_of_outputs = tfq.layers.Expectation(
    differentiator=tfq.differentiators.ForwardDifference(grid_spacing=0.01))

sum_of_outputs(my_circuit,
               operators=[pauli_x, pauli_z],
               symbol_names=['alpha'],
               symbol_values=[[test_value]])
<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[1.9106855e-15, 1.0000000e+00]], dtype=float32)>

在这里,你看到第一个条目是相对于泡利 X 的期望,第二个条目是相对于泡利 Z 的期望。现在,当你获取梯度时

test_value_tensor = tf.convert_to_tensor([[test_value]])

with tf.GradientTape() as g:
    g.watch(test_value_tensor)
    outputs = sum_of_outputs(my_circuit,
                             operators=[pauli_x, pauli_z],
                             symbol_names=['alpha'],
                             symbol_values=test_value_tensor)

sum_of_gradients = g.gradient(outputs, test_value_tensor)

print(my_grad(pauli_x, test_value) + my_grad(pauli_z, test_value))
print(sum_of_gradients.numpy())
3.0917350202798843
[[3.0917213]]

在这里,你已经验证了每个可观测量的梯度之和确实是 \(\alpha\) 的梯度。所有 TensorFlow Quantum 微分器都支持此行为,并且在与 TensorFlow 其余部分的兼容性中发挥着至关重要的作用。

4. 高级用法

所有存在于 TensorFlow Quantum 子类 tfq.differentiators.Differentiator 中的微分器。要实现微分器,用户必须实现两个接口之一。标准是实现 get_gradient_circuits,它告诉基类测量哪些电路以获得梯度的估计值。或者,您可以重载 differentiate_analyticdifferentiate_sampled;类 tfq.differentiators.Adjoint 采用此途径。

以下使用 TensorFlow Quantum 来实现电路的梯度。您将使用参数偏移的一个小示例。

回想一下您上面定义的电路,\(|\alpha⟩ = Y^{\alpha}|0⟩\)。与之前一样,您可以将函数定义为该电路针对 \(X\) 可观测量的期望值,\(f(\alpha) = ⟨\alpha|X|\alpha⟩\)。使用 参数偏移规则,对于该电路,您可以发现导数为

\[\frac{\partial}{\partial \alpha} f(\alpha) = \frac{\pi}{2} f\left(\alpha + \frac{1}{2}\right) - \frac{ \pi}{2} f\left(\alpha - \frac{1}{2}\right)\]

get_gradient_circuits 函数返回此导数的各个分量。

class MyDifferentiator(tfq.differentiators.Differentiator):
    """A Toy differentiator for <Y^alpha | X |Y^alpha>."""

    def __init__(self):
        pass

    def get_gradient_circuits(self, programs, symbol_names, symbol_values):
        """Return circuits to compute gradients for given forward pass circuits.

        Every gradient on a quantum computer can be computed via measurements
        of transformed quantum circuits.  Here, you implement a custom gradient
        for a specific circuit.  For a real differentiator, you will need to
        implement this function in a more general way.  See the differentiator
        implementations in the TFQ library for examples.
        """

        # The two terms in the derivative are the same circuit...
        batch_programs = tf.stack([programs, programs], axis=1)

        # ... with shifted parameter values.
        shift = tf.constant(1/2)
        forward = symbol_values + shift
        backward = symbol_values - shift
        batch_symbol_values = tf.stack([forward, backward], axis=1)

        # Weights are the coefficients of the terms in the derivative.
        num_program_copies = tf.shape(batch_programs)[0]
        batch_weights = tf.tile(tf.constant([[[np.pi/2, -np.pi/2]]]),
                                [num_program_copies, 1, 1])

        # The index map simply says which weights go with which circuits.
        batch_mapper = tf.tile(
            tf.constant([[[0, 1]]]), [num_program_copies, 1, 1])

        return (batch_programs, symbol_names, batch_symbol_values,
                batch_weights, batch_mapper)

Differentiator 基类使用从 get_gradient_circuits 返回的分量来计算导数,如您在上面的参数偏移公式中看到的。此新微分器现在可以与现有的 tfq.layer 对象一起使用

custom_dif = MyDifferentiator()
custom_grad_expectation = tfq.layers.Expectation(differentiator=custom_dif)

# Now let's get the gradients with finite diff.
with tf.GradientTape() as g:
    g.watch(values_tensor)
    exact_outputs = expectation_calculation(my_circuit,
                                            operators=[pauli_x],
                                            symbol_names=['alpha'],
                                            symbol_values=values_tensor)

analytic_finite_diff_gradients = g.gradient(exact_outputs, values_tensor)

# Now let's get the gradients with custom diff.
with tf.GradientTape() as g:
    g.watch(values_tensor)
    my_outputs = custom_grad_expectation(my_circuit,
                                         operators=[pauli_x],
                                         symbol_names=['alpha'],
                                         symbol_values=values_tensor)

my_gradients = g.gradient(my_outputs, values_tensor)

plt.subplot(1, 2, 1)
plt.title('Exact Gradient')
plt.plot(input_points, analytic_finite_diff_gradients.numpy())
plt.xlabel('x')
plt.ylabel('f(x)')
plt.subplot(1, 2, 2)
plt.title('My Gradient')
plt.plot(input_points, my_gradients.numpy())
plt.xlabel('x')
Text(0.5, 0, 'x')

png

此新微分器现在可用于生成可微分运算。

# Create a noisy sample based expectation op.
expectation_sampled = tfq.get_sampled_expectation_op(
    cirq.DensityMatrixSimulator(noise=cirq.depolarize(0.01)))

# Make it differentiable with your differentiator:
# Remember to refresh the differentiator before attaching the new op
custom_dif.refresh()
differentiable_op = custom_dif.generate_differentiable_op(
    sampled_op=expectation_sampled)

# Prep op inputs.
circuit_tensor = tfq.convert_to_tensor([my_circuit])
op_tensor = tfq.convert_to_tensor([[pauli_x]])
single_value = tf.convert_to_tensor([[my_alpha]])
num_samples_tensor = tf.convert_to_tensor([[5000]])

with tf.GradientTape() as g:
    g.watch(single_value)
    forward_output = differentiable_op(circuit_tensor, ['alpha'], single_value,
                                       op_tensor, num_samples_tensor)

my_gradients = g.gradient(forward_output, single_value)

print('---TFQ---')
print('Foward:  ', forward_output.numpy())
print('Gradient:', my_gradients.numpy())
print('---Original---')
print('Forward: ', my_expectation(pauli_x, my_alpha))
print('Gradient:', my_grad(pauli_x, my_alpha))
---TFQ---
Foward:   [[0.7816]]
Gradient: [[1.7919645]]
---Original---
Forward:  0.80901700258255
Gradient: 1.8063604831695557