{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-12-14T22:53:26.896212Z", "iopub.status.busy": "2022-12-14T22:53:26.895771Z", "iopub.status.idle": "2022-12-14T22:53:26.899798Z", "shell.execute_reply": "2022-12-14T22:53:26.899147Z" }, "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Advanced automatic differentiation" ] }, { "cell_type": "markdown", "metadata": { "id": "8a859404ce7e" }, "source": [ "The [Introduction to gradients and automatic differentiation](autodiff.ipynb) guide includes everything required to calculate gradients in TensorFlow. This guide focuses on deeper, less common features of the `tf.GradientTape` API." ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:26.902953Z", "iopub.status.busy": "2022-12-14T22:53:26.902748Z", "iopub.status.idle": "2022-12-14T22:53:29.169891Z", "shell.execute_reply": "2022-12-14T22:53:29.169178Z" }, "id": "IqR2PQG4ZaZ0" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-12-14 22:53:27.845325: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", "2022-12-14 22:53:27.845442: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", "2022-12-14 22:53:27.845452: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" ] } ], "source": [ "import tensorflow as tf\n", "\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "\n", "mpl.rcParams['figure.figsize'] = (8, 6)" ] }, { "cell_type": "markdown", "metadata": { "id": "uGRJJRi8TCkJ" }, "source": [ "## Controlling gradient recording\n", "\n", "In the [automatic differentiation guide](autodiff.ipynb) you saw how to control which variables and tensors are watched by the tape while building the gradient calculation.\n", "\n", "The tape also has methods to manipulate the recording." ] }, { "cell_type": "markdown", "metadata": { "id": "gB_i0VnhQKt2" }, "source": [ "### Stop recording\n", "\n", "If you wish to stop recording gradients, you can use `tf.GradientTape.stop_recording` to temporarily suspend recording.\n", "\n", "This may be useful to reduce overhead if you do not wish to differentiate a complicated operation in the middle of your model. This could include calculating a metric or an intermediate result:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:29.174186Z", "iopub.status.busy": "2022-12-14T22:53:29.173436Z", "iopub.status.idle": "2022-12-14T22:53:32.516698Z", "shell.execute_reply": "2022-12-14T22:53:32.515799Z" }, "id": "mhFSYf7uQWxR" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dz/dx: tf.Tensor(4.0, shape=(), dtype=float32)\n", "dz/dy: None\n" ] } ], "source": [ "x = tf.Variable(2.0)\n", "y = tf.Variable(3.0)\n", "\n", "with tf.GradientTape() as t:\n", "  x_sq = x * x\n", "  with t.stop_recording():\n", "    y_sq = y * y\n", "  z = x_sq + y_sq\n", "\n", "grad = t.gradient(z, {'x': x, 'y': y})\n", "\n", "print('dz/dx:', grad['x'])  # 2*x => 4\n", "print('dz/dy:', grad['y'])" ] }, { "cell_type": "markdown", "metadata": { "id": "DEHbEZ1h4p8A" }, "source": [ "### Reset/start recording from scratch\n", "\n", "If you wish to start over entirely, use `tf.GradientTape.reset`. Simply exiting the gradient tape block and restarting is usually easier to read, but you can use the `reset` method when exiting the tape block is difficult or impossible." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.520345Z", "iopub.status.busy": "2022-12-14T22:53:32.519757Z", "iopub.status.idle": "2022-12-14T22:53:32.526863Z", "shell.execute_reply": "2022-12-14T22:53:32.526265Z" }, "id": 
"lsMHsmrh4pqM" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dz/dx: tf.Tensor(4.0, shape=(), dtype=float32)\n", "dz/dy: None\n" ] } ], "source": [ "x = tf.Variable(2.0)\n", "y = tf.Variable(3.0)\n", "reset = True\n", "\n", "with tf.GradientTape() as t:\n", " y_sq = y * y\n", " if reset:\n", " # Throw out all the tape recorded so far.\n", " t.reset()\n", " z = x * x + y_sq\n", "\n", "grad = t.gradient(z, {'x': x, 'y': y})\n", "\n", "print('dz/dx:', grad['x']) # 2*x => 4\n", "print('dz/dy:', grad['y'])" ] }, { "cell_type": "markdown", "metadata": { "id": "6zS7cLmS6zMf" }, "source": [ "## 精确停止梯度流\n", "\n", "与上面的全局条带控制相比,`tf.stop_gradient` 函数更加精确。它可以用来阻止梯度沿着特定路径流动,而不需要访问条带本身:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.530119Z", "iopub.status.busy": "2022-12-14T22:53:32.529526Z", "iopub.status.idle": "2022-12-14T22:53:32.539224Z", "shell.execute_reply": "2022-12-14T22:53:32.538586Z" }, "id": "30qnZMe48BkB" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dz/dx: tf.Tensor(4.0, shape=(), dtype=float32)\n", "dz/dy: None\n" ] } ], "source": [ "x = tf.Variable(2.0)\n", "y = tf.Variable(3.0)\n", "\n", "with tf.GradientTape() as t:\n", " y_sq = y**2\n", " z = x**2 + tf.stop_gradient(y_sq)\n", "\n", "grad = t.gradient(z, {'x': x, 'y': y})\n", "\n", "print('dz/dx:', grad['x']) # 2*x => 4\n", "print('dz/dy:', grad['y'])" ] }, { "cell_type": "markdown", "metadata": { "id": "mbb-9lnGVngH" }, "source": [ "## 自定义梯度\n", "\n", "在某些情况下,您可能需要精确控制梯度的计算方式,而不是使用默认值。这些情况包括:\n", "\n", "1. 正在编写的新运算没有定义的梯度。\n", "2. 默认计算在数值上不稳定。\n", "3. 您希望从前向传递缓存开销大的计算。\n", "4. 
You want to modify a value (for example, using `tf.clip_by_value` or `tf.math.round`) without modifying the gradient.\n", "\n", "For the first case, to write a new op you can use `tf.RegisterGradient` to set up your own (refer to the API docs for details). Note that the gradient registry is global, so change it with caution.\n", "\n", "For the latter three cases, you can use `tf.custom_gradient`." ] }, { "cell_type": "markdown", "metadata": { "id": "oHr31kc_irF_" }, "source": [ "Here is an example that applies `tf.clip_by_norm` to the intermediate gradient:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.542484Z", "iopub.status.busy": "2022-12-14T22:53:32.541928Z", "iopub.status.idle": "2022-12-14T22:53:32.560962Z", "shell.execute_reply": "2022-12-14T22:53:32.560361Z" }, "id": "Mjj01w4NYtwd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(2.0, shape=(), dtype=float32)\n" ] } ], "source": [ "# Establish an identity operation, but clip during the gradient pass.\n", "@tf.custom_gradient\n", "def clip_gradients(y):\n", "  def backward(dy):\n", "    return tf.clip_by_norm(dy, 0.5)\n", "  return y, backward\n", "\n", "v = tf.Variable(2.0)\n", "with tf.GradientTape() as t:\n", "  output = clip_gradients(v * v)\n", "print(t.gradient(output, v))  # calls \"backward\", which clips 4 to 2" ] }, { "cell_type": "markdown", "metadata": { "id": "n4t7S0scYrD3" }, "source": [ "Refer to the `tf.custom_gradient` decorator API docs for more details." ] }, { "cell_type": "markdown", "metadata": { "id": "v0ODp4Oi--I0" }, "source": [ "### Custom gradients in SavedModel\n", "\n", "Note: This feature is available from TensorFlow 2.6.\n", "\n", "Custom gradients can be saved to a SavedModel by using the option `tf.saved_model.SaveOptions(experimental_custom_gradients=True)`.\n", "\n", "To be saved into the SavedModel, the gradient function must be traceable (to learn more, check out the [Better performance with tf.function](function.ipynb) guide)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.564363Z", "iopub.status.busy": "2022-12-14T22:53:32.563821Z", "iopub.status.idle": "2022-12-14T22:53:32.567723Z", "shell.execute_reply": "2022-12-14T22:53:32.567122Z" }, "id": "Q5JBgIBYjN1I" }, "outputs": [], "source": [ "class 
MyModule(tf.Module):\n", "\n", "  @tf.function(input_signature=[tf.TensorSpec(None)])\n", "  def call_custom_grad(self, x):\n", "    return clip_gradients(x)\n", "\n", "model = MyModule()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.570540Z", "iopub.status.busy": "2022-12-14T22:53:32.570316Z", "iopub.status.idle": "2022-12-14T22:53:32.767847Z", "shell.execute_reply": "2022-12-14T22:53:32.767122Z" }, "id": "xZTrgy2q-9pq" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:tensorflow:Assets written to: saved_model/assets\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(2.0, shape=(), dtype=float32)\n" ] } ], "source": [ "tf.saved_model.save(\n", "    model,\n", "    'saved_model',\n", "    options=tf.saved_model.SaveOptions(experimental_custom_gradients=True))\n", "\n", "# The loaded gradients will be the same as the above example.\n", "v = tf.Variable(2.0)\n", "loaded = tf.saved_model.load('saved_model')\n", "with tf.GradientTape() as t:\n", "  output = loaded.call_custom_grad(v * v)\n", "print(t.gradient(output, v))" ] }, { "cell_type": "markdown", "metadata": { "id": "d-LfRs5FbJCk" }, "source": [ "A note about the above example: if you try replacing the above code with `tf.saved_model.SaveOptions(experimental_custom_gradients=False)`, the gradient will still produce the same result on loading. The reason is that the gradient registry still contains the custom gradient used in the function `call_custom_grad`. However, if you restart the runtime after saving without custom gradients, running the loaded model under the `tf.GradientTape` will throw the error: `LookupError: No gradient defined for operation 'IdentityN' (op type: IdentityN)`." ] }, { "cell_type": "markdown", "metadata": { "id": "8aENEt6Veryb" }, "source": [ "## Multiple tapes\n", "\n", "Multiple tapes interact seamlessly.\n", "\n", "For example, here each tape watches a different set of tensors:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.771447Z", "iopub.status.busy": "2022-12-14T22:53:32.770894Z", "iopub.status.idle": "2022-12-14T22:53:32.778938Z", "shell.execute_reply": "2022-12-14T22:53:32.778242Z" }, "id": "BJ0HdMvte0VZ" }, 
"outputs": [], "source": [ "x0 = tf.constant(0.0)\n", "x1 = tf.constant(0.0)\n", "\n", "with tf.GradientTape() as tape0, tf.GradientTape() as tape1:\n", " tape0.watch(x0)\n", " tape1.watch(x1)\n", "\n", " y0 = tf.math.sin(x0)\n", " y1 = tf.nn.sigmoid(x1)\n", "\n", " y = y0 + y1\n", "\n", " ys = tf.reduce_sum(y)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.782321Z", "iopub.status.busy": "2022-12-14T22:53:32.781826Z", "iopub.status.idle": "2022-12-14T22:53:32.791918Z", "shell.execute_reply": "2022-12-14T22:53:32.791320Z" }, "id": "6ApAoMNFfNz6" }, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tape0.gradient(ys, x0).numpy() # cos(x) => 1.0" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.795148Z", "iopub.status.busy": "2022-12-14T22:53:32.794645Z", "iopub.status.idle": "2022-12-14T22:53:32.800473Z", "shell.execute_reply": "2022-12-14T22:53:32.799873Z" }, "id": "rF1jrAJsfYW_" }, "outputs": [ { "data": { "text/plain": [ "0.25" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tape1.gradient(ys, x1).numpy() # sigmoid(x1)*(1-sigmoid(x1)) => 0.25" ] }, { "cell_type": "markdown", "metadata": { "id": "DK05KXrAAld3" }, "source": [ "### 高阶梯度\n", "\n", "`tf.GradientTape` 上下文管理器内的运算会被记录下来,以供自动微分。如果在该上下文中计算梯度,梯度计算也会被记录。因此,完全相同的 API 也适用于高阶梯度。\n", "\n", "例如:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.803497Z", "iopub.status.busy": "2022-12-14T22:53:32.803216Z", "iopub.status.idle": "2022-12-14T22:53:32.811810Z", "shell.execute_reply": "2022-12-14T22:53:32.811227Z" }, "id": "cPQgthZ7ugRJ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dy_dx: 3.0\n", "d2y_dx2: 6.0\n" ] } ], "source": [ "x 
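= tf.Variable(1.0)  # Create a TensorFlow variable initialized to 1.0\n", "\n", "with tf.GradientTape() as t2:\n", "  with tf.GradientTape() as t1:\n", "    y = x * x * x\n", "\n", "  # Compute the gradient inside the outer `t2` context manager\n", "  # which means the gradient computation is differentiable as well.\n", "  dy_dx = t1.gradient(y, x)\n", "d2y_dx2 = t2.gradient(dy_dx, x)\n", "\n", "print('dy_dx:', dy_dx.numpy())  # 3 * x**2 => 3.0\n", "print('d2y_dx2:', d2y_dx2.numpy())  # 6 * x => 6.0" ] }, { "cell_type": "markdown", "metadata": { "id": "k0HV-Ah4_76i" }, "source": [ "While that does give you the second derivative of a *scalar* function, this pattern does not generalize to produce a Hessian matrix, since `tf.GradientTape.gradient` only computes the gradient of a scalar. To construct a [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix), see the [Hessian example](#hessian) under the Jacobians section below.\n", "\n", "\"Nested calls to `tf.GradientTape.gradient`\" is a good pattern when you are calculating a scalar from a gradient, and then the resulting scalar acts as a source for a second gradient calculation, as in the following example.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "t7LRlcpVKHv1" }, "source": [ "#### Example: Input gradient regularization\n", "\n", "Many models are susceptible to \"adversarial examples\". This collection of techniques modifies the model's input to confuse the model's output. The simplest implementation (for example, the [Adversarial example using the Fast Gradient Signed Method attack](https://tensorflow.google.cn/tutorials/generative/adversarial_fgsm)) takes a single step along the gradient of the output with respect to the input, that is, the \"input gradient\".\n", "\n", "One technique to increase robustness to adversarial examples is [input gradient regularization](https://arxiv.org/abs/1905.11468) (Finlay & Oberman, 2019), which attempts to minimize the magnitude of the input gradient. If the input gradient is small, then the change in the output should be small too.\n", "\n", "Below is a naive implementation of input gradient regularization:\n", "\n", "1. Calculate the gradient of the output with respect to the input using an inner tape.\n", "2. Calculate the magnitude of that input gradient.\n", "3. 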
Calculate the gradient of that magnitude with respect to the model." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.815159Z", "iopub.status.busy": "2022-12-14T22:53:32.814659Z", "iopub.status.idle": "2022-12-14T22:53:32.823086Z", "shell.execute_reply": "2022-12-14T22:53:32.822530Z" }, "id": "tH3ZFuUfDLrR" }, "outputs": [], "source": [ "x = tf.random.normal([7, 5])\n", "\n", "layer = tf.keras.layers.Dense(10, activation=tf.nn.relu)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:32.826170Z", "iopub.status.busy": "2022-12-14T22:53:32.825717Z", "iopub.status.idle": "2022-12-14T22:53:33.181410Z", "shell.execute_reply": "2022-12-14T22:53:33.180473Z" }, "id": "E6yOFsjEDR9u" }, "outputs": [], "source": [ "with tf.GradientTape() as t2:\n", "  # The inner tape only takes the gradient with respect to the input,\n", "  # not the variables.\n", "  with tf.GradientTape(watch_accessed_variables=False) as t1:\n", "    t1.watch(x)\n", "    y = layer(x)\n", "    # Reuse `y` instead of calling the layer a second time.\n", "    out = tf.reduce_sum(y**2)\n", "  # 1. Calculate the input gradient.\n", "  g1 = t1.gradient(out, x)\n", "  # 2. Calculate the magnitude of the input gradient.\n", "  g1_mag = tf.norm(g1)\n", "\n", "# 3. 
Calculate the gradient of the magnitude with respect to the model.\n", "dg1_mag = t2.gradient(g1_mag, layer.trainable_variables)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.185576Z", "iopub.status.busy": "2022-12-14T22:53:33.185054Z", "iopub.status.idle": "2022-12-14T22:53:33.189613Z", "shell.execute_reply": "2022-12-14T22:53:33.188983Z" }, "id": "123QMq6PqK_d" }, "outputs": [ { "data": { "text/plain": [ "[TensorShape([5, 10]), TensorShape([10])]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[var.shape for var in dg1_mag]" ] }, { "cell_type": "markdown", "metadata": { "id": "E4xiYigexMtQ" }, "source": [ "## Jacobians\n" ] }, { "cell_type": "markdown", "metadata": { "id": "4-hVHVIeExkI" }, "source": [ "All the previous examples took the gradients of a scalar target with respect to some source tensor(s).\n", "\n", "The [Jacobian matrix](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant) represents the gradients of a vector-valued function. Each row contains the gradient of one of the vector's elements.\n", "\n", "The `tf.GradientTape.jacobian` method allows you to efficiently calculate a Jacobian matrix." ] }, { "cell_type": "markdown", "metadata": { "id": "KzNyIM0QBYIH" }, "source": [ "Note that:\n", "\n", "- Like `gradient`: the `sources` argument can be a tensor or a container of tensors.\n", "- Unlike `gradient`: the `target` tensor must be a single tensor." ] }, { "cell_type": "markdown", "metadata": { "id": "O74K3hlxBC8a" }, "source": [ "### Scalar source" ] }, { "cell_type": "markdown", "metadata": { "id": "B08OKn1Orkuc" }, "source": [ "As a first example, here is the Jacobian of a vector target with respect to a scalar source." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.193558Z", "iopub.status.busy": "2022-12-14T22:53:33.193031Z", "iopub.status.idle": "2022-12-14T22:53:33.361152Z", "shell.execute_reply": "2022-12-14T22:53:33.360438Z" }, "id": "bAFeIE8EuVIq" }, "outputs": [], "source": [ "x = tf.linspace(-10.0, 10.0, 200+1)\n", "delta = tf.Variable(0.0)\n", "\n", "with tf.GradientTape() as tape:\n", "  y = tf.nn.sigmoid(x+delta)\n", "\n", "dy_dx = tape.jacobian(y, delta)" ] }, { "cell_type": "markdown", "metadata": { "id": 
"BgHbUk3zr-WU" }, "source": [ "当您相对于标量取雅可比矩阵时,结果为**目标**的形状,并给出每个元素相对于源的梯度:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.365004Z", "iopub.status.busy": "2022-12-14T22:53:33.364752Z", "iopub.status.idle": "2022-12-14T22:53:33.368584Z", "shell.execute_reply": "2022-12-14T22:53:33.367924Z" }, "id": "iZ6awnDzr_BA" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(201,)\n", "(201,)\n" ] } ], "source": [ "print(y.shape)\n", "print(dy_dx.shape)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.372097Z", "iopub.status.busy": "2022-12-14T22:53:33.371423Z", "iopub.status.idle": "2022-12-14T22:53:33.548977Z", "shell.execute_reply": "2022-12-14T22:53:33.548264Z" }, "id": "siNZaklc0_-e" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(x.numpy(), y, label='y')\n", "plt.plot(x.numpy(), dy_dx, label='dy/dx')\n", "plt.legend()\n", "_ = plt.xlabel('x')" ] }, { "cell_type": "markdown", "metadata": { "id": "DsOMSD_1BGkD" }, "source": [ "### 张量源" ] }, { "cell_type": "markdown", "metadata": { "id": "g3iXKN7KF-st" }, "source": [ "无论输入是标量还是张量,`tf.GradientTape.jacobian` 都能有效计算源的每个元素相对于目标的每个元素的梯度。\n", "\n", "例如,此层的输出的形状为 `(10,7)`。" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.552940Z", "iopub.status.busy": "2022-12-14T22:53:33.552263Z", "iopub.status.idle": "2022-12-14T22:53:33.562815Z", "shell.execute_reply": "2022-12-14T22:53:33.562242Z" }, "id": "39YXItgLxMBk" }, "outputs": [ { "data": { "text/plain": [ "TensorShape([7, 10])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = tf.random.normal([7, 5])\n", "layer = tf.keras.layers.Dense(10, activation=tf.nn.relu)\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", " y = layer(x)\n", "\n", "y.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "tshNRtfKuVP_" }, "source": [ "层内核的形状是 `(5,10)`。" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.566325Z", "iopub.status.busy": "2022-12-14T22:53:33.565753Z", "iopub.status.idle": "2022-12-14T22:53:33.570051Z", "shell.execute_reply": "2022-12-14T22:53:33.569411Z" }, "id": "CigTWyfPvPuv" }, "outputs": [ { "data": { "text/plain": [ "TensorShape([5, 10])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "layer.kernel.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "mN96JRpnAjpx" }, "source": [ "将这两个形状连在一起就是输出相对于内核的雅可比矩阵的形状:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.573259Z", 
"iopub.status.busy": "2022-12-14T22:53:33.572715Z", "iopub.status.idle": "2022-12-14T22:53:33.688633Z", "shell.execute_reply": "2022-12-14T22:53:33.687891Z" }, "id": "pRLzTTbvEimH" }, "outputs": [ { "data": { "text/plain": [ "TensorShape([7, 10, 5, 10])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "j = tape.jacobian(y, layer.kernel)\n", "j.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "2Lrv7miMvTll" }, "source": [ "如果您在目标的维度上求和,会得到由 `tf.GradientTape.gradient` 计算的总和的梯度:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.692551Z", "iopub.status.busy": "2022-12-14T22:53:33.691997Z", "iopub.status.idle": "2022-12-14T22:53:33.701294Z", "shell.execute_reply": "2022-12-14T22:53:33.700672Z" }, "id": "FJjZpYRnDjVa" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "g.shape: (5, 10)\n", "delta: 4.7683716e-07\n" ] } ], "source": [ "g = tape.gradient(y, layer.kernel)\n", "print('g.shape:', g.shape)\n", "\n", "j_sum = tf.reduce_sum(j, axis=[0, 1])\n", "delta = tf.reduce_max(abs(g - j_sum)).numpy()\n", "assert delta < 1e-3\n", "print('delta:', delta)" ] }, { "cell_type": "markdown", "metadata": { "id": "ZKajuGlk_krs" }, "source": [ " \n", "\n", "#### 示例:黑塞矩阵" ] }, { "cell_type": "markdown", "metadata": { "id": "NYcsXeo8TDLi" }, "source": [ "虽然 `tf.GradientTape` 并没有给出构造[黑塞矩阵](https://en.wikipedia.org/wiki/Hessian_matrix)的显式方法,但可以使用 tf.GradientTape.jacobian 方法进行构建。\n", "\n", "注:黑塞矩阵包含 `N**2` 个参数。由于这个原因和其他原因,它对于大多数模型都不实际。此示例主要是为了演示如何使用 `tf.GradientTape.jacobian` 方法,并不是对直接黑塞矩阵优化的认可。黑塞矩阵向量积可以[通过嵌套条带有效计算](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/benchmarks/resnet50/hvp_test.py),这也是一种更有效的二阶优化方法。" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.704964Z", "iopub.status.busy": "2022-12-14T22:53:33.704370Z", 
"iopub.status.idle": "2022-12-14T22:53:33.982510Z", "shell.execute_reply": "2022-12-14T22:53:33.981813Z" }, "id": "ELGTaell_j81" }, "outputs": [], "source": [ "x = tf.random.normal([7, 5])\n", "layer1 = tf.keras.layers.Dense(8, activation=tf.nn.relu)\n", "layer2 = tf.keras.layers.Dense(6, activation=tf.nn.relu)\n", "\n", "with tf.GradientTape() as t2:\n", " with tf.GradientTape() as t1:\n", " x = layer1(x)\n", " x = layer2(x)\n", " loss = tf.reduce_mean(x**2)\n", "\n", " g = t1.gradient(loss, layer1.kernel)\n", "\n", "h = t2.jacobian(g, layer1.kernel)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.986560Z", "iopub.status.busy": "2022-12-14T22:53:33.986051Z", "iopub.status.idle": "2022-12-14T22:53:33.989710Z", "shell.execute_reply": "2022-12-14T22:53:33.989070Z" }, "id": "FVqQuZj4XGjm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "layer.kernel.shape: (5, 8)\n", "h.shape: (5, 8, 5, 8)\n" ] } ], "source": [ "print(f'layer.kernel.shape: {layer1.kernel.shape}')\n", "print(f'h.shape: {h.shape}')" ] }, { "cell_type": "markdown", "metadata": { "id": "_M7XElgaiMeP" }, "source": [ "要将此黑塞矩阵用于[牛顿方法](https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization)步骤,首先需要将其轴展平为矩阵,然后将梯度展平为向量:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:33.993118Z", "iopub.status.busy": "2022-12-14T22:53:33.992570Z", "iopub.status.idle": "2022-12-14T22:53:33.999939Z", "shell.execute_reply": "2022-12-14T22:53:33.999275Z" }, "id": "6te7N6wVXwXX" }, "outputs": [], "source": [ "n_params = tf.reduce_prod(layer1.kernel.shape)\n", "\n", "g_vec = tf.reshape(g, [n_params, 1])\n", "h_mat = tf.reshape(h, [n_params, n_params])" ] }, { "cell_type": "markdown", "metadata": { "id": "L9rO8b-0mgOH" }, "source": [ "黑塞矩阵应当对称:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": 
"2022-12-14T22:53:34.003205Z", "iopub.status.busy": "2022-12-14T22:53:34.002673Z", "iopub.status.idle": "2022-12-14T22:53:34.006052Z", "shell.execute_reply": "2022-12-14T22:53:34.005510Z" }, "id": "8TCHc7Vrf52S" }, "outputs": [], "source": [ "def imshow_zero_center(image, **kwargs):\n", " lim = tf.reduce_max(abs(image))\n", " plt.imshow(image, vmin=-lim, vmax=lim, cmap='seismic', **kwargs)\n", " plt.colorbar()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.009264Z", "iopub.status.busy": "2022-12-14T22:53:34.008630Z", "iopub.status.idle": "2022-12-14T22:53:34.222801Z", "shell.execute_reply": "2022-12-14T22:53:34.222222Z" }, "id": "DExOxd7Ok2H0" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "imshow_zero_center(h_mat)" ] }, { "cell_type": "markdown", "metadata": { "id": "13fBswmtQes4" }, "source": [ "牛顿方法更新步骤如下所示。" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.226346Z", "iopub.status.busy": "2022-12-14T22:53:34.225861Z", "iopub.status.idle": "2022-12-14T22:53:34.232357Z", "shell.execute_reply": "2022-12-14T22:53:34.231774Z" }, "id": "3DdnbynBdSor" }, "outputs": [], "source": [ "eps = 1e-3\n", "eye_eps = tf.eye(h_mat.shape[0])*eps" ] }, { "cell_type": "markdown", "metadata": { "id": "-zPdtyoWeUeV" }, "source": [ "注:[实际上不反转矩阵](https://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/)。" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.235864Z", "iopub.status.busy": "2022-12-14T22:53:34.235207Z", "iopub.status.idle": "2022-12-14T22:53:34.336240Z", "shell.execute_reply": "2022-12-14T22:53:34.335482Z" }, "id": "k1LYftgmswOO" }, "outputs": [], "source": [ "# X(k+1) = X(k) - (∇²f(X(k)))^-1 @ ∇f(X(k))\n", "# h_mat = ∇²f(X(k))\n", "# g_vec = ∇f(X(k))\n", "update = tf.linalg.solve(h_mat + eye_eps, g_vec)\n", "\n", "# Reshape the update and apply it to the variable.\n", "_ = layer1.kernel.assign_sub(tf.reshape(update, layer1.kernel.shape))" ] }, { "cell_type": "markdown", "metadata": { "id": "pF6qjlHKWxF4" }, "source": [ "虽然这对于单个 `tf.Variable` 来说相对简单,但将其应用于非平凡模型则需要仔细的级联和切片,以产生跨多个变量的完整黑塞矩阵。" ] }, { "cell_type": "markdown", "metadata": { "id": "PQWM0uN-GO5t" }, "source": [ "### 批量雅可比矩阵" ] }, { "cell_type": "markdown", "metadata": { "id": "hKtB3rY6EySJ" }, "source": [ "在某些情况下,您需要取各个目标堆栈相对于源堆栈的雅可比矩阵,其中每个目标-源对的雅可比矩阵都是独立的。\n", "\n", "例如,此处的输入 `x` 形状为 `(batch, ins)` ,输出 `y` 形状为 `(batch, outs)`:\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.340794Z", 
"iopub.status.busy": "2022-12-14T22:53:34.340282Z", "iopub.status.idle": "2022-12-14T22:53:34.356398Z", "shell.execute_reply": "2022-12-14T22:53:34.355804Z" }, "id": "tQMndhIUHMes" }, "outputs": [ { "data": { "text/plain": [ "TensorShape([7, 6])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = tf.random.normal([7, 5])\n", "\n", "layer1 = tf.keras.layers.Dense(8, activation=tf.nn.elu)\n", "layer2 = tf.keras.layers.Dense(6, activation=tf.nn.elu)\n", "\n", "with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape:\n", " tape.watch(x)\n", " y = layer1(x)\n", " y = layer2(y)\n", "\n", "y.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "Ff2spRHEJXBU" }, "source": [ "`y` 相对 `x` 的完整雅可比矩阵的形状为 `(batch, ins, batch, outs)`,即使您只想要 `(batch, ins, outs)`。" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.359583Z", "iopub.status.busy": "2022-12-14T22:53:34.359107Z", "iopub.status.idle": "2022-12-14T22:53:34.472362Z", "shell.execute_reply": "2022-12-14T22:53:34.471792Z" }, "id": "1zSl2A5-HhMH" }, "outputs": [ { "data": { "text/plain": [ "TensorShape([7, 6, 7, 5])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "j = tape.jacobian(y, x)\n", "j.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "UibJijPLJrpQ" }, "source": [ "如果堆栈中各项的梯度相互独立,那么此张量的每一个 `(batch, batch)` 切片都是对角矩阵:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.475794Z", "iopub.status.busy": "2022-12-14T22:53:34.475245Z", "iopub.status.idle": "2022-12-14T22:53:34.713223Z", "shell.execute_reply": "2022-12-14T22:53:34.712612Z" }, "id": "ZFl9uj3ueVSH" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "imshow_zero_center(j[:, 0, :, 0])\n", "_ = plt.title('A (batch, batch) slice')" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.716722Z", "iopub.status.busy": "2022-12-14T22:53:34.716084Z", "iopub.status.idle": "2022-12-14T22:53:34.938202Z", "shell.execute_reply": "2022-12-14T22:53:34.937510Z" }, "id": "g4ZoRJcJNmy5" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plot_as_patches(j):\n", " # Reorder axes so the diagonals will each form a contiguous patch.\n", " j = tf.transpose(j, [1, 0, 3, 2])\n", " # Pad in between each patch.\n", " lim = tf.reduce_max(abs(j))\n", " j = tf.pad(j, [[0, 0], [1, 1], [0, 0], [1, 1]],\n", " constant_values=-lim)\n", " # Reshape to form a single image.\n", " s = j.shape\n", " j = tf.reshape(j, [s[0]*s[1], s[2]*s[3]])\n", " imshow_zero_center(j, extent=[-0.5, s[2]-0.5, s[0]-0.5, -0.5])\n", "\n", "plot_as_patches(j)\n", "_ = plt.title('All (batch, batch) slices are diagonal')" ] }, { "cell_type": "markdown", "metadata": { "id": "OXpTBKyeK84z" }, "source": [ "要获取所需结果,您可以对重复的 `batch` 维度求和,或者使用 `tf.einsum` 选择对角线:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.942587Z", "iopub.status.busy": "2022-12-14T22:53:34.942002Z", "iopub.status.idle": "2022-12-14T22:53:34.948481Z", "shell.execute_reply": "2022-12-14T22:53:34.947786Z" }, "id": "v65OAjEgLQwl" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(7, 6, 5)\n", "(7, 6, 5)\n" ] } ], "source": [ "j_sum = tf.reduce_sum(j, axis=2)\n", "print(j_sum.shape)\n", "j_select = tf.einsum('bxby->bxy', j)\n", "print(j_select.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "zT_VfR6lcwxD" }, "source": [ "没有额外维度时,计算会更加高效。`tf.GradientTape.batch_jacobian` 方法就是如此运作的:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:34.952042Z", "iopub.status.busy": "2022-12-14T22:53:34.951485Z", "iopub.status.idle": "2022-12-14T22:53:35.102117Z", "shell.execute_reply": "2022-12-14T22:53:35.101534Z" }, "id": "YJLIl9WpHqYq" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:5 out of the last 5 calls to .f at 0x7f85281da0d0> triggered tf.function retracing. 
Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" ] }, { "data": { "text/plain": [ "TensorShape([7, 6, 5])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jb = tape.batch_jacobian(y, x)\n", "jb.shape" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:35.105361Z", "iopub.status.busy": "2022-12-14T22:53:35.104834Z", "iopub.status.idle": "2022-12-14T22:53:35.110521Z", "shell.execute_reply": "2022-12-14T22:53:35.109932Z" }, "id": "-5t_q5SfHw7T" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0\n" ] } ], "source": [ "error = tf.reduce_max(abs(jb - j_sum))\n", "assert error < 1e-3\n", "print(error.numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "IUeY2ZCiL31I" }, "source": [ "Caution: `tf.GradientTape.batch_jacobian` only verifies that the first dimension of the source and target match. It doesn't check that the gradients are actually independent. It's up to you to make sure you only use `batch_jacobian` where it makes sense. For example, adding a `tf.keras.layers.BatchNormalization` destroys the independence, since it normalizes across the `batch` dimension:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:35.113759Z", "iopub.status.busy": "2022-12-14T22:53:35.113287Z", "iopub.status.idle": "2022-12-14T22:53:35.456923Z", "shell.execute_reply": "2022-12-14T22:53:35.456301Z" }, "id": "tnDugVc-L4fj" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:6 out of the last 6 calls to .f at 0x7f85a003df70> triggered tf.function retracing. 
Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "j.shape: (7, 6, 7, 5)\n" ] } ], "source": [ "x = tf.random.normal([7, 5])\n", "\n", "layer1 = tf.keras.layers.Dense(8, activation=tf.nn.elu)\n", "bn = tf.keras.layers.BatchNormalization()\n", "layer2 = tf.keras.layers.Dense(6, activation=tf.nn.elu)\n", "\n", "with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape:\n", "  tape.watch(x)\n", "  y = layer1(x)\n", "  y = bn(y, training=True)\n", "  y = layer2(y)\n", "\n", "j = tape.jacobian(y, x)\n", "print(f'j.shape: {j.shape}')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:35.460383Z", "iopub.status.busy": "2022-12-14T22:53:35.459793Z", "iopub.status.idle": "2022-12-14T22:53:35.681924Z", "shell.execute_reply": "2022-12-14T22:53:35.681313Z" }, "id": "SNyZ1WhJMVLm" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_as_patches(j)\n", "\n", "_ = plt.title('These slices are not diagonal')\n", "_ = plt.xlabel(\"Don't use `batch_jacobian`\")" ] }, { "cell_type": "markdown", "metadata": { "id": "M_x7ih5sarvG" }, "source": [ "在此示例中,`batch_jacobian` 仍然可以运行并返回*某些信息*与预期形状,但其内容具有不明确的含义:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T22:53:35.685362Z", "iopub.status.busy": "2022-12-14T22:53:35.684863Z", "iopub.status.idle": "2022-12-14T22:53:36.035066Z", "shell.execute_reply": "2022-12-14T22:53:36.034368Z" }, "id": "k8_mICHoasCi" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "jb.shape: (7, 6, 5)\n" ] } ], "source": [ "jb = tape.batch_jacobian(y, x)\n", "print(f'jb.shape: {jb.shape}')" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "advanced_autodiff.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 0 }