{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FVKYfpQVYPaJ"
      },
      "source": [
        "##### Copyright 2018 The TensorFlow Probability Authors.\n",
        "\n",
        "Licensed under the Apache License, Version 2.0 (the \"License\");"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "htHLjlnLYSoB"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\"); { display-mode: \"form\" }\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DcriL2xPrG3_"
      },
      "source": [
        "# 了解 TensorFlow Distributions 形状\n",
        "\n",
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>     <a target=\"_blank\" href=\"https://tensorflow.google.cn/probability/examples/Understanding_TensorFlow_Distributions_Shapes\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\">在 TensorFlow.org 上查看</a>   </td>\n",
        "  <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/probability/examples/Understanding_TensorFlow_Distributions_Shapes.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\">在 Google Colab 中运行</a></td>\n",
        "  <td>     <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/probability/examples/Understanding_TensorFlow_Distributions_Shapes.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\">在 Github 上查看源代码</a>   </td>\n",
        "  <td>     <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/probability/examples/Understanding_TensorFlow_Distributions_Shapes.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\">下载笔记本</a>   </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "J6t0EUihrG4B"
      },
      "outputs": [],
      "source": [
        "import collections\n",
        "\n",
        "import tensorflow as tf\n",
        "tf.compat.v2.enable_v2_behavior()\n",
        "\n",
        "import tensorflow_probability as tfp\n",
        "tfd = tfp.distributions\n",
        "tfb = tfp.bijectors"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QD5lzFZerG4H"
      },
      "source": [
        "## 基础知识\n",
        "\n",
        "与 TensorFlow Distributions 形状相关的三个重要概念如下：\n",
        "\n",
        "- *事件形状*描述了从分布中单次抽样的形状；抽样可能依赖于不同维度。对于标量分布，事件形状为 `[]`。对于 5 维多元正态分布，事件形状为 `[5]`。\n",
        "- *批次形状*描述了独立但并非同分布的抽样，也称为分布的“批次”。\n",
        "- *样本形状*描述了来自分布系列的独立同分布批次抽样。\n",
        "\n",
        "事件形状和批次形状是 `Distribution` 对象的属性，而样本形状则与对 `sample` 或 `log_prob` 的特定调用相关联。\n",
        "\n",
        "此笔记本的目的是通过示例说明这些概念，因此，如果没有立即看到效果，请不要担心！\n",
        "\n",
        "有关这些概念的另一种概念性概述，请参阅[此博文](https://ericmjl.github.io/blog/2019/5/29/reasoning-about-shapes-and-probability-distributions/)。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yU34kIHDrG4I"
      },
      "source": [
        "### 关于 TensorFlow Eager 的说明\n",
        "\n",
        "整个笔记本使用 [TensorFlow Eager](https://research.googleblog.com/2017/10/eager-execution-imperative-define-by.html) 编写。在 Eager 模式下，尽管在 Python 中创建 `Distribution` 对象时会评估（并因此已知）分布批次和事件形状，提出的概念也完全不*依赖*于 Eager，而在计算图（非 Eager 模式）中，可以定义事件和批次的形状在运行计算图之后才会确定的分布。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MeirD-0JrG4K"
      },
      "source": [
        "## 标量分布\n",
        "\n",
        "如上所述，`Distribution` 对象已定义事件和批次形状。我们将从描述分布的实用工具开始："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bq8guNPtrG4M"
      },
      "outputs": [],
      "source": [
        "def describe_distributions(distributions):\n",
        "  print('\\n'.join([str(d) for d in distributions]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "06CafVXWrG4Q"
      },
      "source": [
        "在本部分中，我们将探讨*标量*分布：事件形状为 `[]` 的分布。一个典型示例是由 `rate` 指定的泊松分布："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "Sdz1OMg7rG4S"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Poisson(\"One_Poisson_Scalar_Batch\", batch_shape=[], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Poisson(\"Three_Poissons\", batch_shape=[3], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Poisson(\"Two_by_Three_Poissons\", batch_shape=[2, 3], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Poisson(\"One_Poisson_Vector_Batch\", batch_shape=[1], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Poisson(\"One_Poisson_Expanded_Batch\", batch_shape=[1, 1], event_shape=[], dtype=float32)\n"
          ]
        }
      ],
      "source": [
        "poisson_distributions = [\n",
        "    tfd.Poisson(rate=1., name='One Poisson Scalar Batch'),\n",
        "    tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons'),\n",
        "    tfd.Poisson(rate=[[1., 10., 100.,], [2., 20., 200.]],\n",
        "                name='Two-by-Three Poissons'),\n",
        "    tfd.Poisson(rate=[1.], name='One Poisson Vector Batch'),\n",
        "    tfd.Poisson(rate=[[1.]], name='One Poisson Expanded Batch')\n",
        "]\n",
        "\n",
        "describe_distributions(poisson_distributions)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lVPVIsC9rG4a"
      },
      "source": [
        "泊松分布是一种标量分布，因此其事件形状始终为 `[]`。如果我们指定更多比率，则这些比率将显示在批次形状中。最后一对示例十分有趣：只有一个比率，但由于该比率已嵌入具有非空形状的 NumPy 数组中，因此该形状成为批次形状。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cFlXG9O5rG4b"
      },
      "source": [
        "标准正态分布也是标量。与泊松分布一样，其事件的形状也是 `[]`，但我们将使用它来查看我们的第一个*广播*示例。使用 `loc` 和 `scale` 参数指定正态："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "e5PXRPM1rG4c"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Normal(\"Standard\", batch_shape=[], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Normal(\"Standard_Vector_Batch\", batch_shape=[1], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Normal(\"Different_Locs\", batch_shape=[4], event_shape=[], dtype=float32)\n",
            "tfp.distributions.Normal(\"Broadcasting_Scale\", batch_shape=[2, 4], event_shape=[], dtype=float32)\n"
          ]
        }
      ],
      "source": [
        "normal_distributions = [\n",
        "    tfd.Normal(loc=0., scale=1., name='Standard'),\n",
        "    tfd.Normal(loc=[0.], scale=1., name='Standard Vector Batch'),\n",
        "    tfd.Normal(loc=[0., 1., 2., 3.], scale=1., name='Different Locs'),\n",
        "    tfd.Normal(loc=[0., 1., 2., 3.], scale=[[1.], [5.]],\n",
        "               name='Broadcasting Scale')\n",
        "]\n",
        "\n",
        "describe_distributions(normal_distributions)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Dh70eNXHrG4i"
      },
      "source": [
        "上面有趣的示例是 `Broadcasting Scale` 分布。`loc` 参数的形状为 `[4]`，而 `scale` 参数的形状为 `[2, 1]`。使用 [Numpy 广播规则](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)时，批次形状为 `[2, 4]`。一种定义 `\"Broadcasting Scale\"` 分布的等效方式（但不太简洁，因此不推荐使用）为："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "9G5JNBzQrG4j"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Normal(\"Normal\", batch_shape=[2, 4], event_shape=[], dtype=float32)\n"
          ]
        }
      ],
      "source": [
        "describe_distributions(\n",
        "    [tfd.Normal(loc=[[0., 1., 2., 3], [0., 1., 2., 3.]],\n",
        "                scale=[[1., 1., 1., 1.], [5., 5., 5., 5.]])])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_hSBWsokrG4p"
      },
      "source": [
        "我们可以看到广播符号为什么有用，尽管它也是麻烦和错误的来源。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "trGxojHwrG4r"
      },
      "source": [
        "### 对标量分布抽样"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TDJqRz-qrG4t"
      },
      "source": [
        "我们可以对分布做的两件主要事情：从分布中 `sample`，以及计算 `log_prob`。我们先探讨抽样。基本规则是，当我们从分布中抽样时，所生成张量的形状为 `[sample_shape, batch_shape, event_shape]`，其中 `batch_shape` 和 `event_shape` 由 `Distribution` 对象提供，而 `sample_shape` 由对 `sample` 的调用提供。对于标量分布，`event_shape = []`，因此从样本返回的张量的形状为 `[sample_shape, batch_shape]`。我们来试一下："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "2TbeP0btrG4u"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Poisson(\"One_Poisson_Scalar_Batch\", batch_shape=[], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1,)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2,)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5)\n",
            "\n",
            "tfp.distributions.Poisson(\"Three_Poissons\", batch_shape=[3], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 3)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 3)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 3)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 3)\n",
            "\n",
            "tfp.distributions.Poisson(\"Two_by_Three_Poissons\", batch_shape=[2, 3], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 2, 3)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 2, 3)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 2, 3)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 2, 3)\n",
            "\n",
            "tfp.distributions.Poisson(\"One_Poisson_Vector_Batch\", batch_shape=[1], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 1)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 1)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 1)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 1)\n",
            "\n",
            "tfp.distributions.Poisson(\"One_Poisson_Expanded_Batch\", batch_shape=[1, 1], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 1, 1)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 1, 1)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 1, 1)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 1, 1)\n",
            "\n"
          ]
        }
      ],
      "source": [
        "def describe_sample_tensor_shape(sample_shape, distribution):\n",
        "    print('Sample shape:', sample_shape)\n",
        "    print('Returned sample tensor shape:',\n",
        "          distribution.sample(sample_shape).shape)\n",
        "\n",
        "def describe_sample_tensor_shapes(distributions, sample_shapes):\n",
        "    started = False\n",
        "    for distribution in distributions:\n",
        "      print(distribution)\n",
        "      for sample_shape in sample_shapes:\n",
        "        describe_sample_tensor_shape(sample_shape, distribution)\n",
        "      print()\n",
        "\n",
        "sample_shapes = [1, 2, [1, 5], [3, 4, 5]]\n",
        "describe_sample_tensor_shapes(poisson_distributions, sample_shapes)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "qiJK8UBorG40"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Normal(\"Standard\", batch_shape=[], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1,)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2,)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5)\n",
            "\n",
            "tfp.distributions.Normal(\"Standard_Vector_Batch\", batch_shape=[1], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 1)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 1)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 1)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 1)\n",
            "\n",
            "tfp.distributions.Normal(\"Different_Locs\", batch_shape=[4], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 4)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 4)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 4)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 4)\n",
            "\n",
            "tfp.distributions.Normal(\"Broadcasting_Scale\", batch_shape=[2, 4], event_shape=[], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 2, 4)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 2, 4)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 2, 4)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 2, 4)\n",
            "\n"
          ]
        }
      ],
      "source": [
        "describe_sample_tensor_shapes(normal_distributions, sample_shapes)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wDRB80oLrG48"
      },
      "source": [
        "这就是关于 `sample` 的所有内容：返回的样本张量的形状为 `[sample_shape, batch_shape, event_shape]`。\n",
        "\n",
        "### 计算标量分布的 `log_prob`\n",
        "\n",
        "现在，我们看一下有点棘手的 `log_prob`。`log_prob` 接受（非空）张量作为输入，此张量表示要计算分布的 `log_prob` 的位置。在最简单的情况下，此张量将具有 `[sample_shape, batch_shape, event_shape]` 形式的形状，其中 `batch_shape` 和 `event_shape` 与分布的批次和事件形状匹配。再次回想一下，对于标量分布，`event_shape = []`，因此输入张量的形状为 `[sample_shape, batch_shape]`。在本例中，我们会重新得到形状为 `[sample_shape, batch_shape]` 的张量："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "UgNIiFf9rG49"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tfp.distributions.Poisson 'Three_Poissons' batch_shape=[3] event_shape=[] dtype=float32>"
            ]
          },
          "execution_count": 10,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons = tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons')\n",
        "three_poissons"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "OpN5WGog0WwC"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -2.0785608,   -3.2223587],\n",
              "       [-364.73938  ,   -2.0785608,  -95.39484  ]], dtype=float32)>"
            ]
          },
          "execution_count": 11,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons.log_prob([[1., 10., 100.], [100., 10., 1]])  # sample_shape is [2]."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "4szFj9lkrG5F"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(1, 1, 2, 3), dtype=float32, numpy=\n",
              "array([[[[  -1.       ,   -2.0785608,   -3.2223587],\n",
              "         [-364.73938  ,   -2.0785608,  -95.39484  ]]]], dtype=float32)>"
            ]
          },
          "execution_count": 12,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons.log_prob([[[[1., 10., 100.], [100., 10., 1.]]]])  # sample_shape is [1, 1, 2]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VG_n9BHsrG5M"
      },
      "source": [
        "请注意，在第一个示例中，输入和输出的形状为 `[2, 3]`，而在第二个示例中，输入和输出的形状为 `[1, 1, 2, 3]`。\n",
        "\n",
        "如果不是为了广播，那么这就是要介绍的全部内容。下面是考虑广播时的规则。我们从完全通用的角度对广播进行描述，并说明标量分布的简化：\n",
        "\n",
        "1. 定义 `n = len(batch_shape) + len(event_shape)`。（对于标量分布，`len(event_shape)=0`。）\n",
        "2. 如果输入张量 `t` 的维数小于 `n`，则通过在左侧添加大小为 `1` 的维度来填充其形状，直到其维数恰好为 `n`。调用生成的张量 `t'`。\n",
        "3. 针对要为其计算 `log_prob` 的分布的 `[batch_shape, event_shape]` 广播 `t'` 的 `n` 个最右侧维度。更详细地说：对于 `t'` 已经与分布匹配的维度，不执行任何操作，而对于 `t'` 具有单例的维度，将该单例复制适当的次数。任何其他情况都是错误。（对于标量分布，由于 event_shape = `[]`，因此我们仅针对 `batch_shape` 进行广播。）\n",
        "4. 现在，我们终于可以计算 `log_prob` 了。所生成张量的形状为 `[sample_shape, batch_shape]`，其中 `sample_shape` 被定义为 `n` 个最右侧维度左侧的 `t` 或 `t'` 的任意维度：`sample_shape = shape(t)[:-n]`。\n",
        "\n",
        "如果您不知道它的含义是什么，那么可能有些混乱，因此我们来运行一些示例："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "YwDVaeRHrG5O"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-16.104412 ,  -2.0785608, -69.05272  ], dtype=float32)>"
            ]
          },
          "execution_count": 13,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons.log_prob([10.])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xAImEhtdrG5U"
      },
      "source": [
        "张量 `[10.]`（形状为 `[1]`）在 `batch_shape` 为 3 的范围内广播，因此我们在值 10 处评估全部三个泊松分布的对数概率。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "id": "daDAG6p2rG5V"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=\n",
              "array([[[-1.0000000e+00, -7.6974149e+00, -9.5394836e+01],\n",
              "        [-1.6104412e+01, -2.0785608e+00, -6.9052719e+01]],\n",
              "\n",
              "       [[-3.6473938e+02, -1.4348087e+02, -3.2223587e+00],\n",
              "        [-5.9131279e+03, -3.6195427e+03, -1.4069575e+03]]], dtype=float32)>"
            ]
          },
          "execution_count": 14,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons.log_prob([[[1.], [10.]], [[100.], [1000.]]])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "REEX-DgBrG5b"
      },
      "source": [
        "在上面的示例中，输入张量的形状为 `[2, 2, 1]`，而分布对象的批次形状为 3。因此，对于 `[2, 2]` 示例维度中的每个维度，提供的单个值会广播到这三个泊松分布。\n",
        "\n",
        "一种可能有用的思考方式：由于 `three_poissons` 具有 `batch_shape = [2, 3]`，因此，对 `log_prob` 的调用必须获取一个最后维度为 1 或 3 的张量；其他所有情况都是错误。（NumPy 广播规则将标量的特例视为完全等同于形状为 `[1]` 的张量。）\n",
        "\n",
        "我们通过使用更复杂的泊松分布 (`batch_shape = [2, 3]`) 来测试我们的想法："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MkSWkwYarG5d"
      },
      "outputs": [],
      "source": [
        "poisson_2_by_3 = tfd.Poisson(\n",
        "    rate=[[1., 10., 100.,], [2., 20., 200.]],\n",
        "    name='Two-by-Three Poissons')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "9YFRkkssrG5f"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>"
            ]
          },
          "execution_count": 16,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob(1.)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "CqQXvOexrG5i"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>"
            ]
          },
          "execution_count": 17,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([1.])  # Exactly equivalent to above, demonstrating the scalar special case."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "1nCuYQC5rG5m"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [  -1.3068528,  -17.004269 , -194.70169  ]], dtype=float32)>"
            ]
          },
          "execution_count": 18,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[1., 1., 1.], [1., 1., 1.]])  # Another way to write the same thing. No broadcasting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "id": "2PgG6udBrG5p"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[ -1.       ,  -2.0785608,  -3.2223587],\n",
              "       [ -1.3068528,  -5.14709  , -33.90767  ]], dtype=float32)>"
            ]
          },
          "execution_count": 19,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[1., 10., 100.]])  # Input is [1, 3] broadcast to [2, 3]."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "Gm7ejyoArG5s"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[ -1.       ,  -2.0785608,  -3.2223587],\n",
              "       [ -1.3068528,  -5.14709  , -33.90767  ]], dtype=float32)>"
            ]
          },
          "execution_count": 20,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[1., 10., 100.], [1., 10., 100.]])  # Equivalent to above. No broadcasting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "id": "mVMSGVvGrG5w"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [  -1.3068528,  -14.701683 , -190.09653  ]], dtype=float32)>"
            ]
          },
          "execution_count": 21,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[1., 1., 1.], [2., 2., 2.]])  # No broadcasting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "id": "OVEpi5QErG5z"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [  -1.3068528,  -14.701683 , -190.09653  ]], dtype=float32)>"
            ]
          },
          "execution_count": 22,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[1.], [2.]])  # Equivalent to above. Input shape [2, 1] broadcast to [2, 3]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZW2tApDGrG53"
      },
      "source": [
        "上面的示例涉及在批次上广播，但样本形状为空。假设我们有一个值的集合，并且我们想要获得批次中每个点处每个值的对数概率。我们可以手动实现："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "id": "03DvnmK2rG53"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=\n",
              "array([[[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "        [  -1.3068528,  -17.004269 , -194.70169  ]],\n",
              "\n",
              "       [[  -1.6931472,   -6.087977 ,  -91.48282  ],\n",
              "        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>"
            ]
          },
          "execution_count": 23,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[[1., 1., 1.], [1., 1., 1.]], [[2., 2., 2.], [2., 2., 2.]]])  # Input shape [2, 2, 3]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XkpJQ0dJrG56"
      },
      "source": [
        "或者，我们可以让广播处理最后一个批次维度："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "id": "KJ6OsodCrG57"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=\n",
              "array([[[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "        [  -1.3068528,  -17.004269 , -194.70169  ]],\n",
              "\n",
              "       [[  -1.6931472,   -6.087977 ,  -91.48282  ],\n",
              "        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>"
            ]
          },
          "execution_count": 24,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[[1.], [1.]], [[2.], [2.]]])  # Input shape [2, 2, 1]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eZFx8pThrG5-"
      },
      "source": [
        "我们还可以（或许不太自然）让广播只处理第一个批次维度："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "id": "UoGs7GBSrG5_"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=\n",
              "array([[[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "        [  -1.3068528,  -17.004269 , -194.70169  ]],\n",
              "\n",
              "       [[  -1.6931472,   -6.087977 ,  -91.48282  ],\n",
              "        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>"
            ]
          },
          "execution_count": 25,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[[1., 1., 1.]], [[2., 2., 2.]]])  # Input shape [2, 1, 3]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cOP4OhGDrG6C"
      },
      "source": [
        "或者，我们可以让广播同时处理*两个*批次维度："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "id": "tnG2f4tZrG6E"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=\n",
              "array([[[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "        [  -1.3068528,  -17.004269 , -194.70169  ]],\n",
              "\n",
              "       [[  -1.6931472,   -6.087977 ,  -91.48282  ],\n",
              "        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>"
            ]
          },
          "execution_count": 26,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob([[[1.]], [[2.]]])  # Input shape [2, 1, 1]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "I1s1drAwrG6K"
      },
      "source": [
        "当我们只有两个需要的值时，上面的方式十分有效，但是，设想我们有一长串的值要在每个批次点处进行评估。为此，下面的符号可以在形状的右侧额外添加一个大小为 1 的维度，这样做非常有用："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "id": "oUxbYZN_rG6K"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2, 3), dtype=float32, numpy=\n",
              "array([[[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "        [  -1.3068528,  -17.004269 , -194.70169  ]],\n",
              "\n",
              "       [[  -1.6931472,   -6.087977 ,  -91.48282  ],\n",
              "        [  -1.3068528,  -14.701683 , -190.09653  ]]], dtype=float32)>"
            ]
          },
          "execution_count": 27,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "poisson_2_by_3.log_prob(tf.constant([1., 2.])[..., tf.newaxis, tf.newaxis])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Se893aIurG6M"
      },
      "source": [
        "这是[跨步切片符号](https://tensorflow.google.cn/api_docs/cc/class/tensorflow/ops/strided-slice)的一个实例，非常值得了解。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XNDhHqJmrG6N"
      },
      "source": [
        "为了保持完整性，回到 `three_poissons`，相同的示例如下所示："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "id": "zKP7OmQsrG6N"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(4, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [ -16.104412 ,   -2.0785608,  -69.05272  ],\n",
              "       [-149.47777  ,  -43.34851  ,  -18.219261 ],\n",
              "       [-364.73938  , -143.48087  ,   -3.2223587]], dtype=float32)>"
            ]
          },
          "execution_count": 28,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons.log_prob([[1.], [10.], [50.], [100.]])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "id": "PK_9DwSdrG6R"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(4, 3), dtype=float32, numpy=\n",
              "array([[  -1.       ,   -7.697415 ,  -95.39484  ],\n",
              "       [ -16.104412 ,   -2.0785608,  -69.05272  ],\n",
              "       [-149.47777  ,  -43.34851  ,  -18.219261 ],\n",
              "       [-364.73938  , -143.48087  ,   -3.2223587]], dtype=float32)>"
            ]
          },
          "execution_count": 29,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "three_poissons.log_prob(tf.constant([1., 10., 50., 100.])[..., tf.newaxis])  # Equivalent to above."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lhL17DW5rG6T"
      },
      "source": [
        "## 多元分布\n",
        "\n",
        "现在，我们转向具有非空事件形状的多元分布。我们看一下多项式分布。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "id": "lOdGa5n9rG6T"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Multinomial(\"One_Multinomial\", batch_shape=[], event_shape=[3], dtype=float32)\n",
            "tfp.distributions.Multinomial(\"Two_Multinomials_Same_Probs\", batch_shape=[2], event_shape=[3], dtype=float32)\n",
            "tfp.distributions.Multinomial(\"Two_Multinomials_Same_Counts\", batch_shape=[2], event_shape=[3], dtype=float32)\n",
            "tfp.distributions.Multinomial(\"Two_Multinomials_Different_Everything\", batch_shape=[2], event_shape=[3], dtype=float32)\n"
          ]
        }
      ],
      "source": [
        "multinomial_distributions = [\n",
        "    # Multinomial is a vector-valued distribution: if we have k classes,\n",
        "    # an individual sample from the distribution has k values in it, so the\n",
        "    # event_shape is `[k]`.\n",
        "    tfd.Multinomial(total_count=100., probs=[.5, .4, .1],\n",
        "                    name='One Multinomial'),\n",
        "    tfd.Multinomial(total_count=[100., 1000.], probs=[.5, .4, .1],\n",
        "                    name='Two Multinomials Same Probs'),\n",
        "    tfd.Multinomial(total_count=100., probs=[[.5, .4, .1], [.1, .2, .7]],\n",
        "                    name='Two Multinomials Same Counts'),\n",
        "    tfd.Multinomial(total_count=[100., 1000.],\n",
        "                    probs=[[.5, .4, .1], [.1, .2, .7]],\n",
        "                    name='Two Multinomials Different Everything')\n",
        "\n",
        "]\n",
        "\n",
        "describe_distributions(multinomial_distributions)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-NQ8gK7irG6W"
      },
      "source": [
        "请注意，在最后三个示例中，batch_shape 始终为 `[2]`，但是我们可以使用广播来获得共享的 `total_count` 或共享的 `probs`（或者两者都没有），因为它们在后台被广播为具有相同的形状。\n",
        "\n",
        "根据我们已知的信息，抽样非常简单："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "id": "hSr362qjrG6W"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Multinomial(\"One_Multinomial\", batch_shape=[], event_shape=[3], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 3)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 3)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 3)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 3)\n",
            "\n",
            "tfp.distributions.Multinomial(\"Two_Multinomials_Same_Probs\", batch_shape=[2], event_shape=[3], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 2, 3)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 2, 3)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 2, 3)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 2, 3)\n",
            "\n",
            "tfp.distributions.Multinomial(\"Two_Multinomials_Same_Counts\", batch_shape=[2], event_shape=[3], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 2, 3)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 2, 3)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 2, 3)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 2, 3)\n",
            "\n",
            "tfp.distributions.Multinomial(\"Two_Multinomials_Different_Everything\", batch_shape=[2], event_shape=[3], dtype=float32)\n",
            "Sample shape: 1\n",
            "Returned sample tensor shape: (1, 2, 3)\n",
            "Sample shape: 2\n",
            "Returned sample tensor shape: (2, 2, 3)\n",
            "Sample shape: [1, 5]\n",
            "Returned sample tensor shape: (1, 5, 2, 3)\n",
            "Sample shape: [3, 4, 5]\n",
            "Returned sample tensor shape: (3, 4, 5, 2, 3)\n",
            "\n"
          ]
        }
      ],
      "source": [
        "describe_sample_tensor_shapes(multinomial_distributions, sample_shapes)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jjXgxPXCrG6Z"
      },
      "source": [
        "计算对数概率同样非常简单。我们以一个对角多元正态分布作为示例。（多项式分布对广播不是十分友好，因为对计数和概率的约束意味着广播通常会产生不可接受的值。）我们将使用 2 个 3 维分布组成的批次，它们的均值相同但标度不同（标准差）："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {
        "id": "cnywBQdZrG6Z"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2] event_shape=[3] dtype=float32>"
            ]
          },
          "execution_count": 32,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_multivariate_normals = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_identity_multiplier=[1., 2.])\n",
        "two_multivariate_normals"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S9xE21IirG6b"
      },
      "source": [
        "（请注意，尽管我们使用标度为恒等式倍数的分布，但这不是对它的限制；我们可以传递 `scale` 而不是 `scale_identity_multiplier`。）\n",
        "\n",
        "现在，我们来评估每个批次点的均值和漂移均值处的对数概率："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "id": "YBOLH33PrG6b"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2), dtype=float32, numpy=\n",
              "array([[-2.7568154, -4.836257 ],\n",
              "       [-8.756816 , -6.336257 ]], dtype=float32)>"
            ]
          },
          "execution_count": 33,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_multivariate_normals.log_prob([[[1., 2., 3.]], [[3., 4., 5.]]])  # Input has shape [2,1,3]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oPDC6y3qrG6e"
      },
      "source": [
        "完全等效，我们可以使用 [https://tensorflow.google.cn/api_docs/cc/class/tensorflow/ops/strided-slice](跨步切片符号)在常量中间插入一个额外的 shape=1 维度："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {
        "id": "M9m9GMezrG6f"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 2), dtype=float32, numpy=\n",
              "array([[-2.7568154, -4.836257 ],\n",
              "       [-8.756816 , -6.336257 ]], dtype=float32)>"
            ]
          },
          "execution_count": 34,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_multivariate_normals.log_prob(\n",
        "    tf.constant([[1., 2., 3.], [3., 4., 5.]])[:, tf.newaxis, :])  # Equivalent to above."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jJA07wN7rG6i"
      },
      "source": [
        "另一方面，如果不插入额外的维度，则将 `[1., 2., 3.]` 传递给第一个批次点，将 `[3., 4., 5.]` 传递给第二个批次点："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 35,
      "metadata": {
        "id": "xaX1unvPrG6i"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-2.7568154, -6.336257 ], dtype=float32)>"
            ]
          },
          "execution_count": 35,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_multivariate_normals.log_prob(tf.constant([[1., 2., 3.], [3., 4., 5.]]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JnhW86vcUfT8"
      },
      "source": [
        "## 形状操作技术"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6EYcFW7OrG6m"
      },
      "source": [
        "### Reshape 双射器\n",
        "\n",
        "`Reshape` 双射器可用于改变分布的 *event_shape* 的形状。我们来看一个示例："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {
        "id": "1YT_lQCarG6m"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tfp.distributions.Multinomial 'Multinomial' batch_shape=[] event_shape=[6] dtype=float32>"
            ]
          },
          "execution_count": 36,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "six_way_multinomial = tfd.Multinomial(total_count=1000., probs=[.3, .25, .2, .15, .08, .02])\n",
        "six_way_multinomial"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c5a5uXQsUpMs"
      },
      "source": [
        "我们创建了一个事件形状为 `[6]` 的多项式。借助 Reshape 双射器，我们可以将其视为事件形状为 `[2, 3]` 的分布。\n",
        "\n",
        "`Bijector` 表示 ${\\mathbb R}^n$ 的一个开放子集上的可微分一对一函数。`Bijectors` 适合与 `TransformedDistribution` 结合使用，后者会根据基础分布 $p(y)$ 和表示 $Y = g(X)$ 的 `Bijector` 对分布 $p(x)$ 建模。我们来看一下它的实际运行："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 37,
      "metadata": {
        "id": "Wttfn9Q-rG6o"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tfp.distributions.TransformedDistribution 'reshapeMultinomial' batch_shape=[] event_shape=[2, 3] dtype=float32>"
            ]
          },
          "execution_count": 37,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "transformed_multinomial = tfd.TransformedDistribution(\n",
        "    distribution=six_way_multinomial,\n",
        "    bijector=tfb.Reshape(event_shape_out=[2, 3]))\n",
        "transformed_multinomial"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 38,
      "metadata": {
        "id": "sh6l4XZdrG6p"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(), dtype=float32, numpy=-178.22021>"
            ]
          },
          "execution_count": 38,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "six_way_multinomial.log_prob([500., 100., 100., 150., 100., 50.])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 39,
      "metadata": {
        "id": "6nMHlVrArG6r"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(), dtype=float32, numpy=-178.22021>"
            ]
          },
          "execution_count": 39,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "transformed_multinomial.log_prob([[500., 100., 100.], [150., 100., 50.]])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dxZNZ02OrG6t"
      },
      "source": [
        "这是 `Reshape` 双射器模型*唯一*能做的事情：它不能将事件维度转换为批次维度，反之亦然。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "de7ek-FerG6t"
      },
      "source": [
        "### Independent 分布\n",
        "\n",
        "`Independent` 分布用于将不一定相同的（也称为一批）独立分布的集合视为单个分布。更简单地说，`Independent` 允许将 `batch_shape` 中的维度转换为 `event_shape` 中的维度。我们将举例说明："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 40,
      "metadata": {
        "id": "tLwioZPRrG6t"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tfp.distributions.Bernoulli 'Two_By_Five_Bernoulli' batch_shape=[2, 5] event_shape=[] dtype=int32>"
            ]
          },
          "execution_count": 40,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_by_five_bernoulli = tfd.Bernoulli(\n",
        "    probs=[[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]],\n",
        "    name=\"Two By Five Bernoulli\")\n",
        "two_by_five_bernoulli"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-okVviR3rG6v"
      },
      "source": [
        "我们可以认为这是一个具有相关正面概率的 2 × 5 硬币数组。我们评估特定的任意一组 1 和 0 的概率："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 41,
      "metadata": {
        "id": "9yq9jTGIrG6x"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2, 5), dtype=float32, numpy=\n",
              "array([[-2.9957323 , -0.10536052, -0.16251892, -1.609438  , -0.2876821 ],\n",
              "       [-0.35667497, -0.4307829 , -0.9162907 , -0.7985077 , -0.6931472 ]],\n",
              "      dtype=float32)>"
            ]
          },
          "execution_count": 41,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]\n",
        "two_by_five_bernoulli.log_prob(pattern)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C9CA19oPrG6y"
      },
      "source": [
        "我们可以使用 `Independent` 将其转换为两个不同的“5 次伯努利试验的集合”，如果我们想将给定模式中出现的“一行”抛硬币视为单个结果，这会十分有用："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 42,
      "metadata": {
        "id": "1iR23BMBrG6z"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tfp.distributions.Independent 'Two_Sets_Of_Five' batch_shape=[2] event_shape=[5] dtype=int32>"
            ]
          },
          "execution_count": 42,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_sets_of_five = tfd.Independent(\n",
        "    distribution=two_by_five_bernoulli,\n",
        "    reinterpreted_batch_ndims=1,\n",
        "    name=\"Two Sets Of Five\")\n",
        "two_sets_of_five"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mRrkesaPrG67"
      },
      "source": [
        "在数学上，我们通过将五次“独立”抛硬币的对数概率相加来计算每“组”五次抛硬币的对数概率，这就是分布名称的由来："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 43,
      "metadata": {
        "id": "LcM6OgKNrG66"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-5.160732 , -3.1954036], dtype=float32)>"
            ]
          },
          "execution_count": 43,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "two_sets_of_five.log_prob(pattern)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "krpnUVL9rG7A"
      },
      "source": [
        "我们可以更进一步，使用 `Independent` 来创建一个分布，其中各个事件是一组 2 × 5 伯努利试验："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 44,
      "metadata": {
        "id": "PXSsoidirG7A"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<tf.Tensor: shape=(), dtype=float32, numpy=-8.356134>"
            ]
          },
          "execution_count": 44,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "one_set_of_two_by_five = tfd.Independent(\n",
        "    distribution=two_by_five_bernoulli, reinterpreted_batch_ndims=2,\n",
        "    name=\"One Set Of Two By Five\")\n",
        "one_set_of_two_by_five.log_prob(pattern)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QfbfHA4hrG7F"
      },
      "source": [
        "值得注意的是，从 `sample` 的角度来看，使用 `Independent` 不会发生任何改变："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 45,
      "metadata": {
        "id": "uZ3NhQEZrG7F"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "tfp.distributions.Bernoulli(\"Two_By_Five_Bernoulli\", batch_shape=[2, 5], event_shape=[], dtype=int32)\n",
            "Sample shape: [3, 5]\n",
            "Returned sample tensor shape: (3, 5, 2, 5)\n",
            "\n",
            "tfp.distributions.Independent(\"Two_Sets_Of_Five\", batch_shape=[2], event_shape=[5], dtype=int32)\n",
            "Sample shape: [3, 5]\n",
            "Returned sample tensor shape: (3, 5, 2, 5)\n",
            "\n",
            "tfp.distributions.Independent(\"One_Set_Of_Two_By_Five\", batch_shape=[], event_shape=[2, 5], dtype=int32)\n",
            "Sample shape: [3, 5]\n",
            "Returned sample tensor shape: (3, 5, 2, 5)\n",
            "\n"
          ]
        }
      ],
      "source": [
        "describe_sample_tensor_shapes(\n",
        "    [two_by_five_bernoulli,\n",
        "     two_sets_of_five,\n",
        "     one_set_of_two_by_five],\n",
        "    [[3, 5]])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "usTT7v0trG7H"
      },
      "source": [
        "作为给读者的临别练习，我们建议从抽样和对数概率角度来考虑 `Normal` 分布的向量批次与 `MultivariateNormalDiag` 分布之间的差异和相似性。我们如何使用 `Independent` 从一个 `Normal` 批次构造一个 `MultivariateNormalDiag`？（请注意，`MultivariateNormalDiag` 实际上并未以这种方式实现。）"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "wDRB80oLrG48"
      ],
      "name": "Understanding_TensorFlow_Distributions_Shapes.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}