在 TensorFlow.org 上查看 | 在 Google Colab 中运行 | 在 GitHub 上查看源代码 | 下载笔记本 |
import collections
import tensorflow as tf
tf.compat.v2.enable_v2_behavior()
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
基础
与 TensorFlow 分布形状相关的三个重要概念
- 事件形状描述了从分布中抽取单个样本的形状;它可能在维度之间是相关的。对于标量分布,事件形状为
[]
。对于 5 维多元正态分布,事件形状为[5]
。 - 批次形状描述了独立的、非同分布的样本,也称为分布的“批次”。
- 样本形状描述了从分布族中抽取的批次的独立同分布样本。
事件形状和批次形状是 Distribution
对象的属性,而样本形状与对 sample
或 log_prob
的特定调用相关联。
本笔记本的目的是通过示例说明这些概念,因此如果这些概念不是立即显而易见,请不要担心!
有关这些概念的另一个概念性概述,请参阅 这篇博文。
关于 TensorFlow Eager 的说明。
整个笔记本都是使用 TensorFlow Eager 编写的。此处介绍的概念不依赖于 Eager,尽管在 Eager 中,分布批次和事件形状在 Python 中创建 Distribution
对象时进行评估(因此已知),而在图(非 Eager 模式)中,可以定义事件和批次形状在运行图之前未确定的分布。
标量分布
如上所述,Distribution
对象具有定义的事件和批次形状。我们将从一个用于描述分布的实用程序开始
def describe_distributions(distributions):
print('\n'.join([str(d) for d in distributions]))
在本节中,我们将探讨标量分布:事件形状为 []
的分布。一个典型的例子是泊松分布,由 rate
指定
poisson_distributions = [
tfd.Poisson(rate=1., name='One Poisson Scalar Batch'),
tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons'),
tfd.Poisson(rate=[[1., 10., 100.,], [2., 20., 200.]],
name='Two-by-Three Poissons'),
tfd.Poisson(rate=[1.], name='One Poisson Vector Batch'),
tfd.Poisson(rate=[[1.]], name='One Poisson Expanded Batch')
]
describe_distributions(poisson_distributions)
tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32) tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32) tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32) tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32)
泊松分布是标量分布,因此其事件形状始终为 []
。如果我们指定更多速率,这些速率将显示在批次形状中。最后两个示例很有趣:只有一个速率,但由于该速率嵌入在具有非空形状的 numpy 数组中,因此该形状将成为批次形状。
标准正态分布也是标量。它的事件形状为 []
,就像泊松分布一样,但我们将使用它来查看我们第一个广播示例。正态分布使用 loc
和 scale
参数指定
normal_distributions = [
tfd.Normal(loc=0., scale=1., name='Standard'),
tfd.Normal(loc=[0.], scale=1., name='Standard Vector Batch'),
tfd.Normal(loc=[0., 1., 2., 3.], scale=1., name='Different Locs'),
tfd.Normal(loc=[0., 1., 2., 3.], scale=[[1.], [5.]],
name='Broadcasting Scale')
]
describe_distributions(normal_distributions)
tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32) tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32) tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32)
上面有趣的示例是 Broadcasting Scale
分布。 loc
参数的形状为 [4]
,而 scale
参数的形状为 [2, 1]
。使用 Numpy 广播规则,批次形状为 [2, 4]
。定义 "Broadcasting Scale"
分布的等效(但不太优雅且不推荐)方法是
describe_distributions(
[tfd.Normal(loc=[[0., 1., 2., 3], [0., 1., 2., 3.]],
scale=[[1., 1., 1., 1.], [5., 5., 5., 5.]])])
tfp.distributions.Normal("Normal", batch_shape=[2, 4], event_shape=[], dtype=float32)
我们可以看到为什么广播表示法很有用,尽管它也是头痛和错误的来源。
采样标量分布
我们可以对分布进行两项主要操作:可以从分布中 sample
,也可以计算 log_prob
。让我们先来探讨采样。基本规则是,当我们从分布中采样时,生成的张量的形状为 [sample_shape, batch_shape, event_shape]
,其中 batch_shape
和 event_shape
由 Distribution
对象提供,而 sample_shape
由对 sample
的调用提供。对于标量分布,event_shape = []
,因此从 sample 返回的张量的形状将为 [sample_shape, batch_shape]
。让我们试一试
def describe_sample_tensor_shape(sample_shape, distribution):
print('Sample shape:', sample_shape)
print('Returned sample tensor shape:',
distribution.sample(sample_shape).shape)
def describe_sample_tensor_shapes(distributions, sample_shapes):
started = False
for distribution in distributions:
print(distribution)
for sample_shape in sample_shapes:
describe_sample_tensor_shape(sample_shape, distribution)
print()
sample_shapes = [1, 2, [1, 5], [3, 4, 5]]
describe_sample_tensor_shapes(poisson_distributions, sample_shapes)
tfp.distributions.Poisson("One_Poisson_Scalar_Batch", batch_shape=[], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1,) Sample shape: 2 Returned sample tensor shape: (2,) Sample shape: [1, 5] Returned sample tensor shape: (1, 5) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5) tfp.distributions.Poisson("Three_Poissons", batch_shape=[3], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 3) Sample shape: 2 Returned sample tensor shape: (2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 3) tfp.distributions.Poisson("Two_by_Three_Poissons", batch_shape=[2, 3], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3) tfp.distributions.Poisson("One_Poisson_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 1) Sample shape: 2 Returned sample tensor shape: (2, 1) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 1) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 1) tfp.distributions.Poisson("One_Poisson_Expanded_Batch", batch_shape=[1, 1], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 1, 1) Sample shape: 2 Returned sample tensor shape: (2, 1, 1) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 1, 1) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 1, 1)
describe_sample_tensor_shapes(normal_distributions, sample_shapes)
tfp.distributions.Normal("Standard", batch_shape=[], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1,) Sample shape: 2 Returned sample tensor shape: (2,) Sample shape: [1, 5] Returned sample tensor shape: (1, 5) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5) tfp.distributions.Normal("Standard_Vector_Batch", batch_shape=[1], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 1) Sample shape: 2 Returned sample tensor shape: (2, 1) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 1) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 1) tfp.distributions.Normal("Different_Locs", batch_shape=[4], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 4) Sample shape: 2 Returned sample tensor shape: (2, 4) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 4) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 4) tfp.distributions.Normal("Broadcasting_Scale", batch_shape=[2, 4], event_shape=[], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 4) Sample shape: 2 Returned sample tensor shape: (2, 2, 4) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 4) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 4)
关于 sample
就这么多:返回的样本张量的形状为 [sample_shape, batch_shape, event_shape]
。
计算标量分布的log_prob
现在让我们看一下log_prob
,它有点棘手。 log_prob
接受一个(非空)张量作为输入,该张量表示要为分布计算 log_prob
的位置。在最直接的情况下,此张量的形状将为 [sample_shape, batch_shape, event_shape]
,其中 batch_shape
和 event_shape
与分布的批次和事件形状匹配。再次回顾一下,对于标量分布,event_shape = []
,因此输入张量的形状为 [sample_shape, batch_shape]
。在这种情况下,我们将得到一个形状为 [sample_shape, batch_shape]
的张量。
three_poissons = tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons')
three_poissons
<tfp.distributions.Poisson 'Three_Poissons' batch_shape=[3] event_shape=[] dtype=float32>
three_poissons.log_prob([[1., 10., 100.], [100., 10., 1]]) # sample_shape is [2].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -2.0785608, -3.2223587], [-364.73938 , -2.0785608, -95.39484 ]], dtype=float32)>
three_poissons.log_prob([[[[1., 10., 100.], [100., 10., 1.]]]]) # sample_shape is [1, 1, 2].
<tf.Tensor: shape=(1, 1, 2, 3), dtype=float32, numpy= array([[[[ -1. , -2.0785608, -3.2223587], [-364.73938 , -2.0785608, -95.39484 ]]]], dtype=float32)>
请注意,在第一个示例中,输入和输出的形状为 [2, 3]
,而在第二个示例中,它们的形状为 [1, 1, 2, 3]
。
如果没有广播,那就没什么好说的了。一旦我们考虑了广播,这里就是规则。我们将以完全的通用性进行描述,并注意标量分布的简化。
- 定义
n = len(batch_shape) + len(event_shape)
。(对于标量分布,len(event_shape)=0
。) - 如果输入张量
t
的维度少于n
,则通过在左侧添加大小为1
的维度来填充其形状,直到它正好具有n
个维度。将得到的张量称为t'
。 - 将
t'
的最右边的n
个维度广播到您正在计算log_prob
的分布的[batch_shape, event_shape]
上。更详细地说:对于t'
已经与分布匹配的维度,不做任何操作,对于t'
具有单例的维度,将该单例复制适当的次数。任何其他情况都是错误。(对于标量分布,我们只针对batch_shape
进行广播,因为 event_shape =[]
。) - 现在我们终于可以计算
log_prob
了。得到的张量将具有形状[sample_shape, batch_shape]
,其中sample_shape
被定义为t
或t'
中最右边n
个维度左侧的任何维度:sample_shape = shape(t)[:-n]
。
如果您不知道这意味着什么,这可能是一团糟,所以让我们做一些例子。
three_poissons.log_prob([10.])
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-16.104412 , -2.0785608, -69.05272 ], dtype=float32)>
张量 [10.]
(形状为 [1]
)在 3 的 batch_shape
上广播,因此我们在值 10 处评估所有三个泊松的 log 概率。
three_poissons.log_prob([[[1.], [10.]], [[100.], [1000.]]])
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[-1.0000000e+00, -7.6974149e+00, -9.5394836e+01], [-1.6104412e+01, -2.0785608e+00, -6.9052719e+01]], [[-3.6473938e+02, -1.4348087e+02, -3.2223587e+00], [-5.9131279e+03, -3.6195427e+03, -1.4069575e+03]]], dtype=float32)>
在上面的示例中,输入张量的形状为 [2, 2, 1]
,而分布对象的批次形状为 3。因此,对于每个 [2, 2]
样本维度,提供的单个值都会广播到三个泊松中的每一个。
一种可能有用思考方式:因为 three_poissons
具有 batch_shape = [2, 3]
,所以对 log_prob
的调用必须接受一个张量,其最后一个维度为 1 或 3;任何其他情况都是错误。(numpy 广播规则将标量的特殊情况视为与形状为 [1]
的张量完全等效。)
让我们通过使用更复杂的泊松分布(batch_shape = [2, 3]
)来测试我们的技巧。
poisson_2_by_3 = tfd.Poisson(
rate=[[1., 10., 100.,], [2., 20., 200.]],
name='Two-by-Three Poissons')
poisson_2_by_3.log_prob(1.)
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], dtype=float32)>
poisson_2_by_3.log_prob([1.]) # Exactly equivalent to above, demonstrating the scalar special case.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 1., 1.], [1., 1., 1.]]) # Another way to write the same thing. No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 10., 100.]]) # Input is [1, 3] broadcast to [2, 3].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -2.0785608, -3.2223587], [ -1.3068528, -5.14709 , -33.90767 ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 10., 100.], [1., 10., 100.]]) # Equivalent to above. No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -2.0785608, -3.2223587], [ -1.3068528, -5.14709 , -33.90767 ]], dtype=float32)>
poisson_2_by_3.log_prob([[1., 1., 1.], [2., 2., 2.]]) # No broadcasting.
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -14.701683 , -190.09653 ]], dtype=float32)>
poisson_2_by_3.log_prob([[1.], [2.]]) # Equivalent to above. Input shape [2, 1] broadcast to [2, 3].
<tf.Tensor: shape=(2, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -14.701683 , -190.09653 ]], dtype=float32)>
上面的示例涉及在批次上广播,但样本形状为空。假设我们有一组值,我们想在批次中的每个点获取每个值的 log 概率。我们可以手动完成它。
poisson_2_by_3.log_prob([[[1., 1., 1.], [1., 1., 1.]], [[2., 2., 2.], [2., 2., 2.]]]) # Input shape [2, 2, 3].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>
或者我们可以让广播处理最后一个批次维度。
poisson_2_by_3.log_prob([[[1.], [1.]], [[2.], [2.]]]) # Input shape [2, 2, 1].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>
我们也可以(也许不太自然地)让广播只处理第一个批次维度。
poisson_2_by_3.log_prob([[[1., 1., 1.]], [[2., 2., 2.]]]) # Input shape [2, 1, 3].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>
或者我们可以让广播处理两个批次维度。
poisson_2_by_3.log_prob([[[1.]], [[2.]]]) # Input shape [2, 1, 1].
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>
当我们只有两个想要的值时,上面的方法运行良好,但假设我们有一个很长的值列表,我们想在每个批次点进行评估。为此,以下表示法非常有用,它在形状的右侧添加了额外的尺寸 1。
poisson_2_by_3.log_prob(tf.constant([1., 2.])[..., tf.newaxis, tf.newaxis])
<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy= array([[[ -1. , -7.697415 , -95.39484 ], [ -1.3068528, -17.004269 , -194.70169 ]], [[ -1.6931472, -6.087977 , -91.48282 ], [ -1.3068528, -14.701683 , -190.09653 ]]], dtype=float32)>
这是 步长切片表示法 的一个实例,值得了解。
为了完整起见,回到 three_poissons
,相同的示例看起来像这样。
three_poissons.log_prob([[1.], [10.], [50.], [100.]])
<tf.Tensor: shape=(4, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -16.104412 , -2.0785608, -69.05272 ], [-149.47777 , -43.34851 , -18.219261 ], [-364.73938 , -143.48087 , -3.2223587]], dtype=float32)>
three_poissons.log_prob(tf.constant([1., 10., 50., 100.])[..., tf.newaxis]) # Equivalent to above.
<tf.Tensor: shape=(4, 3), dtype=float32, numpy= array([[ -1. , -7.697415 , -95.39484 ], [ -16.104412 , -2.0785608, -69.05272 ], [-149.47777 , -43.34851 , -18.219261 ], [-364.73938 , -143.48087 , -3.2223587]], dtype=float32)>
多元分布
现在我们转向多元分布,它们具有非空的事件形状。让我们看一下多项式分布。
multinomial_distributions = [
# Multinomial is a vector-valued distribution: if we have k classes,
# an individual sample from the distribution has k values in it, so the
# event_shape is `[k]`.
tfd.Multinomial(total_count=100., probs=[.5, .4, .1],
name='One Multinomial'),
tfd.Multinomial(total_count=[100., 1000.], probs=[.5, .4, .1],
name='Two Multinomials Same Probs'),
tfd.Multinomial(total_count=100., probs=[[.5, .4, .1], [.1, .2, .7]],
name='Two Multinomials Same Counts'),
tfd.Multinomial(total_count=[100., 1000.],
probs=[[.5, .4, .1], [.1, .2, .7]],
name='Two Multinomials Different Everything')
]
describe_distributions(multinomial_distributions)
tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32) tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32) tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32) tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32)
请注意,在最后三个示例中,batch_shape 始终为 [2]
,但我们可以使用广播来拥有共享的 total_count
或共享的 probs
(或两者都没有),因为在幕后,它们被广播以具有相同的形状。
根据我们已经知道的知识,采样很简单。
describe_sample_tensor_shapes(multinomial_distributions, sample_shapes)
tfp.distributions.Multinomial("One_Multinomial", batch_shape=[], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 3) Sample shape: 2 Returned sample tensor shape: (2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 3) tfp.distributions.Multinomial("Two_Multinomials_Same_Probs", batch_shape=[2], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3) tfp.distributions.Multinomial("Two_Multinomials_Same_Counts", batch_shape=[2], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3) tfp.distributions.Multinomial("Two_Multinomials_Different_Everything", batch_shape=[2], event_shape=[3], dtype=float32) Sample shape: 1 Returned sample tensor shape: (1, 2, 3) Sample shape: 2 Returned sample tensor shape: (2, 2, 3) Sample shape: [1, 5] Returned sample tensor shape: (1, 5, 2, 3) Sample shape: [3, 4, 5] Returned sample tensor shape: (3, 4, 5, 2, 3)
计算 log 概率同样简单。让我们用对角多元正态分布来举一个例子。(多项式不太适合广播,因为对计数和概率的约束意味着广播通常会产生不可接受的值。)我们将使用一批 2 个 3 维分布,它们具有相同的均值,但不同的尺度(标准差)。
two_multivariate_normals = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_diag=tf.ones([2, 3]) * [[1.], [2.]])
two_multivariate_normals
<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2] event_shape=[3] dtype=float32>
现在让我们评估每个批次点在其均值和偏移均值处的 log 概率。
two_multivariate_normals.log_prob([[[1., 2., 3.]], [[3., 4., 5.]]]) # Input has shape [2,1,3].
<tf.Tensor: shape=(2, 2), dtype=float32, numpy= array([[-2.7568154, -4.836257 ], [-8.756816 , -6.336257 ]], dtype=float32)>
完全等效地,我们可以使用 https://tensorflowcn.cn/api_docs/cc/class/tensorflow/ops/strided-slice 在常量中间插入一个额外的形状为 1 的维度。
two_multivariate_normals.log_prob(
tf.constant([[1., 2., 3.], [3., 4., 5.]])[:, tf.newaxis, :]) # Equivalent to above.
<tf.Tensor: shape=(2, 2), dtype=float32, numpy= array([[-2.7568154, -4.836257 ], [-8.756816 , -6.336257 ]], dtype=float32)>
另一方面,如果我们不插入额外的维度,我们将 [1., 2., 3.]
传递给第一个批次点,并将 [3., 4., 5.]
传递给第二个批次点。
two_multivariate_normals.log_prob(tf.constant([[1., 2., 3.], [3., 4., 5.]]))
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2.7568154, -6.336257 ], dtype=float32)>
形状操作技术
重塑双射
Reshape
双射可用于重塑分布的event_shape。让我们看一个例子。
six_way_multinomial = tfd.Multinomial(total_count=1000., probs=[.3, .25, .2, .15, .08, .02])
six_way_multinomial
<tfp.distributions.Multinomial 'Multinomial' batch_shape=[] event_shape=[6] dtype=float32>
我们创建了一个事件形状为 [6]
的多项式。 Reshape
双射允许我们将此视为事件形状为 [2, 3]
的分布。
Bijector
表示 \({\mathbb R}^n\) 的开放子集上的可微分、一对一函数。 Bijectors
与 TransformedDistribution
结合使用,后者根据基本分布 \(p(x)\) 和表示 \(Y = g(X)\) 的 Bijector
对分布 \(p(y)\) 进行建模。让我们看看它的实际应用。
transformed_multinomial = tfd.TransformedDistribution(
distribution=six_way_multinomial,
bijector=tfb.Reshape(event_shape_out=[2, 3]))
transformed_multinomial
<tfp.distributions.TransformedDistribution 'reshapeMultinomial' batch_shape=[] event_shape=[2, 3] dtype=float32>
six_way_multinomial.log_prob([500., 100., 100., 150., 100., 50.])
<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>
transformed_multinomial.log_prob([[500., 100., 100.], [150., 100., 50.]])
<tf.Tensor: shape=(), dtype=float32, numpy=-178.21973>
这是 Reshape
双射可以做的唯一事情:它不能将事件维度转换为批次维度,反之亦然。
独立分布
Independent
分布用于将一组独立的、不一定是相同的(也称为一批)分布视为单个分布。更简洁地说, Independent
允许将 batch_shape
中的维度转换为 event_shape
中的维度。我们将通过示例进行说明。
two_by_five_bernoulli = tfd.Bernoulli(
probs=[[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]],
name="Two By Five Bernoulli")
two_by_five_bernoulli
<tfp.distributions.Bernoulli 'Two_By_Five_Bernoulli' batch_shape=[2, 5] event_shape=[] dtype=int32>
我们可以将其视为一个 2x5 的硬币数组,以及相关的正面概率。让我们评估一组特定的、任意的 1 和 0 的概率。
pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]
two_by_five_bernoulli.log_prob(pattern)
<tf.Tensor: shape=(2, 5), dtype=float32, numpy= array([[-2.9957323 , -0.10536051, -0.16251892, -1.609438 , -0.2876821 ], [-0.35667497, -0.4307829 , -0.91629076, -0.79850775, -0.6931472 ]], dtype=float32)>
我们可以使用 Independent
将其转换为两个不同的“五组伯努利”,如果我们想将“一行”硬币翻转以特定模式出现视为单个结果,这将很有用。
two_sets_of_five = tfd.Independent(
distribution=two_by_five_bernoulli,
reinterpreted_batch_ndims=1,
name="Two Sets Of Five")
two_sets_of_five
<tfp.distributions.Independent 'Two_Sets_Of_Five' batch_shape=[2] event_shape=[5] dtype=int32>
在数学上,我们通过对集合中五个“独立”硬币翻转的 log 概率求和来计算每个“集合”的五个 log 概率,这就是该分布名称的由来。
two_sets_of_five.log_prob(pattern)
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-5.160732 , -3.1954036], dtype=float32)>
我们可以更进一步,使用 Independent
创建一个分布,其中单个事件是一组 2x5 伯努利。
one_set_of_two_by_five = tfd.Independent(
distribution=two_by_five_bernoulli, reinterpreted_batch_ndims=2,
name="One Set Of Two By Five")
one_set_of_two_by_five.log_prob(pattern)
<tf.Tensor: shape=(), dtype=float32, numpy=-8.356134>
值得注意的是,从 sample
的角度来看,使用 Independent
不会改变任何东西。
describe_sample_tensor_shapes(
[two_by_five_bernoulli,
two_sets_of_five,
one_set_of_two_by_five],
[[3, 5]])
tfp.distributions.Bernoulli("Two_By_Five_Bernoulli", batch_shape=[2, 5], event_shape=[], dtype=int32) Sample shape: [3, 5] Returned sample tensor shape: (3, 5, 2, 5) tfp.distributions.Independent("Two_Sets_Of_Five", batch_shape=[2], event_shape=[5], dtype=int32) Sample shape: [3, 5] Returned sample tensor shape: (3, 5, 2, 5) tfp.distributions.Independent("One_Set_Of_Two_By_Five", batch_shape=[], event_shape=[2, 5], dtype=int32) Sample shape: [3, 5] Returned sample tensor shape: (3, 5, 2, 5)
作为读者的告别练习,我们建议从采样和 log 概率的角度考虑一批 Normal
分布的向量和 MultivariateNormalDiag
分布之间的差异和相似之处。我们如何使用 Independent
从一批 Normal
中构建一个 MultivariateNormalDiag
?(请注意,MultivariateNormalDiag
实际上并非以这种方式实现。)