{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2024-08-15T01:30:06.291838Z", "iopub.status.busy": "2024-08-15T01:30:06.291620Z", "iopub.status.idle": "2024-08-15T01:30:06.295536Z", "shell.execute_reply": "2024-08-15T01:30:06.294994Z" }, "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Introduction to gradients and automatic differentiation" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "r6P32iYYV27b" }, "source": [ "## Automatic Differentiation and Gradients\n", "\n", "[Automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)\n", "is useful for implementing machine learning algorithms such as\n", "[backpropagation](https://en.wikipedia.org/wiki/Backpropagation) for training\n", "neural networks.\n", "\n", "In this guide, you will explore ways to compute gradients with TensorFlow, especially in eager execution." ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:06.298872Z", "iopub.status.busy": "2024-08-15T01:30:06.298622Z", "iopub.status.idle": "2024-08-15T01:30:08.870485Z", "shell.execute_reply": "2024-08-15T01:30:08.869676Z" }, "id": "IqR2PQG4ZaZ0" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-08-15 01:30:07.003169: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-08-15 01:30:07.023862: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-08-15 01:30:07.029954: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n" ] } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "## Computing gradients\n", "\n", "To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the *forward* pass. 
Then, during the *backward pass*, TensorFlow traverses this list of operations in reverse order to compute gradients." ] }, { "cell_type": "markdown", "metadata": { "id": "1CLWJl0QliB0" }, "source": [ "## Gradient tapes\n", "\n", "TensorFlow provides the `tf.GradientTape` API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually `tf.Variable`s.\n", "TensorFlow \"records\" relevant operations executed inside the context of a `tf.GradientTape` onto a \"tape\". TensorFlow then uses that tape to compute the gradients of a \"recorded\" computation using [reverse mode differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation).\n", "\n", "Here is a simple example:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:08.874835Z", "iopub.status.busy": "2024-08-15T01:30:08.874386Z", "iopub.status.idle": "2024-08-15T01:30:11.490841Z", "shell.execute_reply": "2024-08-15T01:30:11.490090Z" }, "id": "Xq9GgTCP7a4A" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", "I0000 00:00:1723685409.408818 20970 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1723685409.412555 20970 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. 
See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n" ] } ], "source": [ "x = tf.Variable(3.0)\n", "\n", "with tf.GradientTape() as tape:\n", " y = x**2" ] }, { "cell_type": "markdown", "metadata": { "id": "CR9tFAP_7cra" }, "source": [ "Once you've recorded some operations, use `GradientTape.gradient(target, sources)` to calculate the gradient of some target (often a loss) relative to some source (often the model's variables):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:11.494831Z", "iopub.status.busy": "2024-08-15T01:30:11.494515Z", "iopub.status.idle": "2024-08-15T01:30:11.507497Z", "shell.execute_reply": "2024-08-15T01:30:11.506851Z" }, "id": "LsvrwF6bHroC" }, "outputs": [ { "data": { "text/plain": [ "6.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dy = 2x * dx\n", "dy_dx = tape.gradient(y, x)\n", "dy_dx.numpy()" ] }, { "cell_type": "markdown", "metadata": { "id": "Q2_aqsO25Vx1" }, "source": [ "The above example uses scalars, but `tf.GradientTape` works as easily on any tensor:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:11.510543Z", "iopub.status.busy": "2024-08-15T01:30:11.510294Z", "iopub.status.idle": "2024-08-15T01:30:11.575341Z", "shell.execute_reply": "2024-08-15T01:30:11.574704Z" }, "id": "vacZ3-Ws5VdV" }, "outputs": [], "source": [ "w = tf.Variable(tf.random.normal((3, 2)), name='w')\n", "b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')\n", "x = [[1., 2., 3.]]\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", " y = x @ w + b\n", " loss = tf.reduce_mean(y**2)" ] }, { "cell_type": "markdown", "metadata": { "id": "i4eXOkrQ-9Pb" }, "source": [ "To get the gradient of `loss` with respect to both variables, you can pass both as sources to the `gradient` method. 
The tape is flexible about how sources are passed: it accepts any nested combination of lists or dictionaries and returns the gradients structured the same way (see `tf.nest`)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:11.579188Z", "iopub.status.busy": "2024-08-15T01:30:11.578956Z", "iopub.status.idle": "2024-08-15T01:30:11.597447Z", "shell.execute_reply": "2024-08-15T01:30:11.596846Z" }, "id": "luOtK1Da_BR0" }, "outputs": [], "source": [ "[dl_dw, dl_db] = tape.gradient(loss, [w, b])" ] }, { "cell_type": "markdown", "metadata": { "id": "Ei4iVXi6qgM7" }, "source": [ "The gradient with respect to each source has the shape of the source:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:11.600712Z", "iopub.status.busy": "2024-08-15T01:30:11.600470Z", "iopub.status.idle": "2024-08-15T01:30:11.604206Z", "shell.execute_reply": "2024-08-15T01:30:11.603480Z" }, "id": "aYbWRFPZqk4U" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 2)\n", "(3, 2)\n" ] } ], "source": [ "print(w.shape)\n", "print(dl_dw.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "dI_SzxHsvao1" }, "source": [ "Here is the gradient calculation again, this time passing a dictionary of variables:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:11.607301Z", "iopub.status.busy": "2024-08-15T01:30:11.607081Z", "iopub.status.idle": "2024-08-15T01:30:11.614206Z", "shell.execute_reply": "2024-08-15T01:30:11.613564Z" }, "id": "d73cY6NOuaMd" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_vars = {\n", "    'w': w,\n", "    'b': b\n", "}\n", "\n", "grad = tape.gradient(loss, my_vars)\n", "grad['b']" ] }, { "cell_type": "markdown", "metadata": { "id": "HZ2LvHifEMgO" }, 
"source": [ "## Gradients with respect to a model\n", "\n", "It's common to collect `tf.Variable`s into a `tf.Module` or one of its subclasses (`layers.Layer`, `keras.Model`) for [checkpointing](checkpoint.ipynb) and [exporting](saved_model.ipynb).\n", "\n", "In most cases, you will want to calculate gradients with respect to a model's trainable variables. Since all subclasses of `tf.Module` aggregate their variables in the `Module.trainable_variables` property, you can calculate these gradients in a few lines of code:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:11.617798Z", "iopub.status.busy": "2024-08-15T01:30:11.617178Z", "iopub.status.idle": "2024-08-15T01:30:12.317074Z", "shell.execute_reply": "2024-08-15T01:30:12.316345Z" }, "id": "JvesHtbQESc-" }, "outputs": [], "source": [ "layer = tf.keras.layers.Dense(2, activation='relu')\n", "x = tf.constant([[1., 2., 3.]])\n", "\n", "with tf.GradientTape() as tape:\n", "  # Forward pass\n", "  y = layer(x)\n", "  loss = tf.reduce_mean(y**2)\n", "\n", "# Calculate gradients with respect to every trainable variable\n", "grad = tape.gradient(loss, layer.trainable_variables)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.321035Z", "iopub.status.busy": "2024-08-15T01:30:12.320769Z", "iopub.status.idle": "2024-08-15T01:30:12.324853Z", "shell.execute_reply": "2024-08-15T01:30:12.324246Z" }, "id": "PR_ezr6UFrpI" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "kernel, shape: (3, 2)\n", "bias, shape: (2,)\n" ] } ], "source": [ "for var, g in zip(layer.trainable_variables, grad):\n", "  print(f'{var.name}, shape: {g.shape}')" ] }, { "cell_type": "markdown", "metadata": { "id": "f6Gx6LS714zR" }, "source": [ "<a id=\"watches\"></a>\n", "\n", "## Controlling what the tape watches" ] }, { "cell_type": "markdown", "metadata": { "id": "N4VlqKFzzGaC" }, "source": [ "The default 
behavior is to record all operations after accessing a trainable `tf.Variable`. The reasons for this are:\n", "\n", "* The tape needs to know which operations to record in the forward pass to calculate the gradients in the backward pass.\n", "* The tape holds references to intermediate outputs, so you don't want to record unnecessary operations.\n", "* The most common use case involves calculating the gradient of a loss with respect to all of a model's trainable variables.\n", "\n", "For example, the following fails to calculate a gradient because the `tf.Tensor` is not \"watched\" by default, and the `tf.Variable` is not trainable:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.328318Z", "iopub.status.busy": "2024-08-15T01:30:12.328063Z", "iopub.status.idle": "2024-08-15T01:30:12.338607Z", "shell.execute_reply": "2024-08-15T01:30:12.338012Z" }, "id": "Kj9gPckdB37a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(6.0, shape=(), dtype=float32)\n", "None\n", "None\n", "None\n" ] } ], "source": [ "# A trainable variable\n", "x0 = tf.Variable(3.0, name='x0')\n", "# Not trainable\n", "x1 = tf.Variable(3.0, name='x1', trainable=False)\n", "# Not a Variable: A variable + tensor returns a tensor.\n", "x2 = tf.Variable(2.0, name='x2') + 1.0\n", "# Not a variable\n", "x3 = tf.constant(3.0, name='x3')\n", "\n", "with tf.GradientTape() as tape:\n", "  y = (x0**2) + (x1**2) + (x2**2)\n", "\n", "grad = tape.gradient(y, [x0, x1, x2, x3])\n", "\n", "for g in grad:\n", "  print(g)" ] }, { "cell_type": "markdown", "metadata": { "id": "RkcpQnLgNxgi" }, "source": [ "You can list the variables being watched by the tape using the `GradientTape.watched_variables` method:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.342261Z", "iopub.status.busy": "2024-08-15T01:30:12.341657Z", "iopub.status.idle": 
"2024-08-15T01:30:12.346467Z", "shell.execute_reply": "2024-08-15T01:30:12.345654Z" }, "id": "hwNwjW1eAkib" }, "outputs": [ { "data": { "text/plain": [ "['x0:0']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[var.name for var in tape.watched_variables()]" ] }, { "cell_type": "markdown", "metadata": { "id": "NB9I1uFvB4tf" }, "source": [ "`tf.GradientTape` provides hooks that give the user control over what is or is not watched.\n", "\n", "To record gradients with respect to a `tf.Tensor`, you need to call `GradientTape.watch(x)`:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.349661Z", "iopub.status.busy": "2024-08-15T01:30:12.349387Z", "iopub.status.idle": "2024-08-15T01:30:12.354618Z", "shell.execute_reply": "2024-08-15T01:30:12.353987Z" }, "id": "tVN1QqFRDHBK" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6.0\n" ] } ], "source": [ "x = tf.constant(3.0)\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = x**2\n", "\n", "# dy = 2x * dx\n", "dy_dx = tape.gradient(y, x)\n", "print(dy_dx.numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "qxsiYnf2DN8K" }, "source": [ "Conversely, to disable the default behavior of watching all `tf.Variables`, set `watch_accessed_variables=False` when creating the gradient tape. 
This calculation uses two variables, but only connects the gradient for one of the variables:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.358163Z", "iopub.status.busy": "2024-08-15T01:30:12.357581Z", "iopub.status.idle": "2024-08-15T01:30:12.451991Z", "shell.execute_reply": "2024-08-15T01:30:12.451324Z" }, "id": "7QPzwWvSEwIp" }, "outputs": [], "source": [ "x0 = tf.Variable(0.0)\n", "x1 = tf.Variable(10.0)\n", "\n", "with tf.GradientTape(watch_accessed_variables=False) as tape:\n", " tape.watch(x1)\n", " y0 = tf.math.sin(x0)\n", " y1 = tf.nn.softplus(x1)\n", " y = y0 + y1\n", " ys = tf.reduce_sum(y)" ] }, { "cell_type": "markdown", "metadata": { "id": "TRduLbE1H2IJ" }, "source": [ "Since `GradientTape.watch` was not called on `x0`, no gradient is computed with respect to it:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.456339Z", "iopub.status.busy": "2024-08-15T01:30:12.455695Z", "iopub.status.idle": "2024-08-15T01:30:12.521151Z", "shell.execute_reply": "2024-08-15T01:30:12.520521Z" }, "id": "e6GM-3evH1Sz" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dy/dx0: None\n", "dy/dx1: 0.9999546\n" ] } ], "source": [ "# dys/dx1 = exp(x1) / (1 + exp(x1)) = sigmoid(x1)\n", "grad = tape.gradient(ys, {'x0': x0, 'x1': x1})\n", "\n", "print('dy/dx0:', grad['x0'])\n", "print('dy/dx1:', grad['x1'].numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "2g1nKB6P-OnA" }, "source": [ "## Intermediate results\n", "\n", "You can also request gradients of the output with respect to intermediate values computed inside the `tf.GradientTape` context." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.524608Z", "iopub.status.busy": "2024-08-15T01:30:12.524366Z", "iopub.status.idle": "2024-08-15T01:30:12.530956Z", "shell.execute_reply": "2024-08-15T01:30:12.530308Z" }, "id": "7XaPRAwUyYms" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18.0\n" ] } ], "source": [ "x = tf.constant(3.0)\n", "\n", "with tf.GradientTape() as tape:\n", " tape.watch(x)\n", " y = x * x\n", " z = y * y\n", "\n", "# Use the tape to compute the gradient of z with respect to the\n", "# intermediate value y.\n", "# dz_dy = 2 * y and y = x ** 2 = 9\n", "print(tape.gradient(z, y).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "ISkXuY7YzIcS" }, "source": [ "By default, the resources held by a `GradientTape` are released as soon as the `GradientTape.gradient` method is called. To compute multiple gradients over the same computation, create a gradient tape with `persistent=True`. This allows multiple calls to the `gradient` method as resources are released when the tape object is garbage collected. For example:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.534481Z", "iopub.status.busy": "2024-08-15T01:30:12.533804Z", "iopub.status.idle": "2024-08-15T01:30:12.541218Z", "shell.execute_reply": "2024-08-15T01:30:12.540652Z" }, "id": "zZaCm3-9zVCi" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 4. 108.]\n", "[2. 
6.]\n" ] } ], "source": [ "x = tf.constant([1.0, 3.0])\n", "with tf.GradientTape(persistent=True) as tape:\n", "  tape.watch(x)\n", "  y = x * x\n", "  z = y * y\n", "\n", "print(tape.gradient(z, x).numpy())  # [4.0, 108.0] (4 * x**3 at x = [1.0, 3.0])\n", "print(tape.gradient(y, x).numpy())  # [2.0, 6.0] (2 * x at x = [1.0, 3.0])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.544392Z", "iopub.status.busy": "2024-08-15T01:30:12.543861Z", "iopub.status.idle": "2024-08-15T01:30:12.546772Z", "shell.execute_reply": "2024-08-15T01:30:12.546207Z" }, "id": "j8bv_jQFg6CN" }, "outputs": [], "source": [ "del tape  # Drop the reference to the tape" ] }, { "cell_type": "markdown", "metadata": { "id": "O_ZY-9BUB7vX" }, "source": [ "## Notes on performance\n", "\n", "* There is a tiny overhead associated with doing operations inside a gradient tape context. For most eager execution this cost is not noticeable, but you should still wrap only the code that needs it in a tape context.\n", "\n", "* Gradient tapes use memory to store intermediate results, including inputs and outputs, for use during the backward pass.\n", "\n", "  For efficiency, some ops (like `ReLU`) don't need to keep their intermediate results, which are pruned during the forward pass. However, if you use `persistent=True` on your tape, *nothing is discarded* and your peak memory usage will be higher." ] }, { "cell_type": "markdown", "metadata": { "id": "9dLBpZsJebFq" }, "source": [ "## Gradients of non-scalar targets" ] }, { "cell_type": "markdown", "metadata": { "id": "7pldU9F5duP2" }, "source": [ "A gradient is fundamentally an operation on a scalar."
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.550634Z", "iopub.status.busy": "2024-08-15T01:30:12.549991Z", "iopub.status.idle": "2024-08-15T01:30:12.619656Z", "shell.execute_reply": "2024-08-15T01:30:12.618932Z" }, "id": "qI0sDV_WeXBb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n", "-0.25\n" ] } ], "source": [ "x = tf.Variable(2.0)\n", "with tf.GradientTape(persistent=True) as tape:\n", "  y0 = x**2\n", "  y1 = 1 / x\n", "\n", "print(tape.gradient(y0, x).numpy())\n", "print(tape.gradient(y1, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "COEyYp34fxj4" }, "source": [ "Thus, if you ask for the gradient of multiple targets, the result for each source is:\n", "\n", "* The gradient of the sum of the targets, or equivalently\n", "* The sum of the gradients of each target." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.623092Z", "iopub.status.busy": "2024-08-15T01:30:12.622831Z", "iopub.status.idle": "2024-08-15T01:30:12.630160Z", "shell.execute_reply": "2024-08-15T01:30:12.629521Z" }, "id": "o4a6_YOcfWKS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.75\n" ] } ], "source": [ "x = tf.Variable(2.0)\n", "with tf.GradientTape() as tape:\n", "  y0 = x**2\n", "  y1 = 1 / x\n", "\n", "print(tape.gradient({'y0': y0, 'y1': y1}, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "uvP-mkBMgbym" }, "source": [ "Similarly, if the target(s) are not scalar, the gradient of the sum is calculated:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.633668Z", "iopub.status.busy": "2024-08-15T01:30:12.633412Z", "iopub.status.idle": "2024-08-15T01:30:12.639753Z", "shell.execute_reply": "2024-08-15T01:30:12.639157Z" }, "id": "DArPWqsSh5un" }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "7.0\n" ] } ], "source": [ "x = tf.Variable(2.)\n", "\n", "with tf.GradientTape() as tape:\n", "  y = x * [3., 4.]\n", "\n", "print(tape.gradient(y, x).numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "flDbx68Zh5Lb" }, "source": [ "This makes it simple to take the gradient of the sum of a collection of losses, or the gradient of the sum of an element-wise loss calculation.\n", "\n", "If you need a separate gradient for each item, refer to [Jacobians](advanced_autodiff.ipynb#jacobians)." ] }, { "cell_type": "markdown", "metadata": { "id": "iwFswok8RAly" }, "source": [ "In some cases you can skip the Jacobian. For an element-wise calculation, the gradient of the sum gives the derivative of each element with respect to its input element, since each element is independent:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:12.643198Z", "iopub.status.busy": "2024-08-15T01:30:12.642763Z", "iopub.status.idle": "2024-08-15T01:30:13.331633Z", "shell.execute_reply": "2024-08-15T01:30:13.330889Z" }, "id": "JQvk_jnMmTDS" }, "outputs": [], "source": [ "x = tf.linspace(-10.0, 10.0, 200+1)\n", "\n", "with tf.GradientTape() as tape:\n", "  tape.watch(x)\n", "  y = tf.nn.sigmoid(x)\n", "\n", "dy_dx = tape.gradient(y, x)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.335667Z", "iopub.status.busy": "2024-08-15T01:30:13.335420Z", "iopub.status.idle": "2024-08-15T01:30:13.499401Z", "shell.execute_reply": "2024-08-15T01:30:13.498786Z" }, "id": "e_f2QgDPmcPE" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(x, y, label='y')\n", "plt.plot(x, dy_dx, label='dy/dx')\n", "plt.legend()\n", "_ = plt.xlabel('x')" ] }, { "cell_type": "markdown", "metadata": { "id": "6kADybtQzYj4" }, "source": [ "## Control flow\n", "\n", "Because a gradient tape records operations as they are executed, Python control flow (for example, `if` and `while` statements) is handled naturally.\n", "\n", "Here a different variable is used on each branch of an `if`. The gradient only connects to the variable that was used:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.502631Z", "iopub.status.busy": "2024-08-15T01:30:13.502403Z", "iopub.status.idle": "2024-08-15T01:30:13.510472Z", "shell.execute_reply": "2024-08-15T01:30:13.509868Z" }, "id": "ciFLizhrrjy7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(1.0, shape=(), dtype=float32)\n", "None\n" ] } ], "source": [ "x = tf.constant(1.0)\n", "\n", "v0 = tf.Variable(2.0)\n", "v1 = tf.Variable(2.0)\n", "\n", "with tf.GradientTape(persistent=True) as tape:\n", "  tape.watch(x)\n", "  if x > 0.0:\n", "    result = v0\n", "  else:\n", "    result = v1**2\n", "\n", "dv0, dv1 = tape.gradient(result, [v0, v1])\n", "\n", "print(dv0)\n", "print(dv1)" ] }, { "cell_type": "markdown", "metadata": { "id": "HKnLaiapsjeP" }, "source": [ "Just remember that the control statements themselves are not differentiable, so they are invisible to gradient-based optimizers.\n", "\n", "Depending on the value of `x` in the above example, the tape records either `result = v0` or `result = v1**2`. The gradient with respect to `x` is always `None`."
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.513704Z", "iopub.status.busy": "2024-08-15T01:30:13.513457Z", "iopub.status.idle": "2024-08-15T01:30:13.517267Z", "shell.execute_reply": "2024-08-15T01:30:13.516678Z" }, "id": "8k05WmuAwPm7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "dx = tape.gradient(result, x)\n", "\n", "print(dx)" ] }, { "cell_type": "markdown", "metadata": { "id": "egypBxISAHhx" }, "source": [ "## Cases where `gradient` returns `None`\n", "\n", "When a target is not connected to a source, `gradient` will return `None`.\n" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.520596Z", "iopub.status.busy": "2024-08-15T01:30:13.520029Z", "iopub.status.idle": "2024-08-15T01:30:13.525557Z", "shell.execute_reply": "2024-08-15T01:30:13.524980Z" }, "id": "CU185WDM81Ut" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "x = tf.Variable(2.)\n", "y = tf.Variable(3.)\n", "\n", "with tf.GradientTape() as tape:\n", " z = y * y\n", "print(tape.gradient(z, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "sZbKpHfBRJym" }, "source": [ "Here `z` is obviously not connected to `x`, but there are several less-obvious ways that a gradient can be disconnected." ] }, { "cell_type": "markdown", "metadata": { "id": "eHDzDOiQ8xmw" }, "source": [ "### 1. Replaced a variable with a tensor\n", "\n", "In the section on [\"controlling what the tape watches\"](#watches) you saw that the tape will automatically watch a `tf.Variable` but not a `tf.Tensor`.\n", "\n", "One common error is to inadvertently replace a `tf.Variable` with a `tf.Tensor`, instead of using `Variable.assign` to update the `tf.Variable`. 
Here is an example:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.528706Z", "iopub.status.busy": "2024-08-15T01:30:13.528492Z", "iopub.status.idle": "2024-08-15T01:30:13.534284Z", "shell.execute_reply": "2024-08-15T01:30:13.533635Z" }, "id": "QPKY4Tn9zX7_" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ResourceVariable : tf.Tensor(1.0, shape=(), dtype=float32)\n", "EagerTensor : None\n" ] } ], "source": [ "x = tf.Variable(2.0)\n", "\n", "for epoch in range(2):\n", " with tf.GradientTape() as tape:\n", " y = x+1\n", "\n", " print(type(x).__name__, \":\", tape.gradient(y, x))\n", " x = x + 1 # This should be `x.assign_add(1)`" ] }, { "cell_type": "markdown", "metadata": { "id": "3gwZKxgA97an" }, "source": [ "### 2. Did calculations outside of TensorFlow\n", "\n", "The tape can't record the gradient path if the calculation exits TensorFlow.\n", "For example:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.537149Z", "iopub.status.busy": "2024-08-15T01:30:13.536925Z", "iopub.status.idle": "2024-08-15T01:30:13.544353Z", "shell.execute_reply": "2024-08-15T01:30:13.543704Z" }, "id": "jmoLCDJb_yw1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "x = tf.Variable([[1.0, 2.0],\n", " [3.0, 4.0]], dtype=tf.float32)\n", "\n", "with tf.GradientTape() as tape:\n", " x2 = x**2\n", "\n", " # This step is calculated with NumPy\n", " y = np.mean(x2, axis=0)\n", "\n", " # Like most ops, reduce_mean will cast the NumPy array to a constant tensor\n", " # using `tf.convert_to_tensor`.\n", " y = tf.reduce_mean(y, axis=0)\n", "\n", "print(tape.gradient(y, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "p3YVfP3R-tp7" }, "source": [ "### 3. Took gradients through an integer or string\n", "\n", "Integers and strings are not differentiable. 
If a calculation path uses these data types, there will be no gradient.\n", "\n", "Nobody expects strings to be differentiable, but it's easy to accidentally create an `int` constant or variable if you don't specify the `dtype`." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.547270Z", "iopub.status.busy": "2024-08-15T01:30:13.547047Z", "iopub.status.idle": "2024-08-15T01:30:13.552690Z", "shell.execute_reply": "2024-08-15T01:30:13.552120Z" }, "id": "9jlHXHqfASU3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:The dtype of the watched tensor must be floating (e.g. tf.float32), got tf.int32\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "x = tf.constant(10)\n", "\n", "with tf.GradientTape() as g:\n", " g.watch(x)\n", " y = x * x\n", "\n", "print(g.gradient(y, x))" ] }, { "cell_type": "markdown", "metadata": { "id": "RsdP_mTHX9L1" }, "source": [ "TensorFlow doesn't automatically cast between types, so, in practice, you'll often get a type error instead of a missing gradient." ] }, { "cell_type": "markdown", "metadata": { "id": "WyAZ7C8qCEs6" }, "source": [ "### 4. Took gradients through a stateful object\n", "\n", "State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that led to it.\n", "\n", "A `tf.Tensor` is immutable. You can't change a tensor once it's created. It has a _value_, but no _state_. All the operations discussed so far are also stateless: the output of a `tf.matmul` only depends on its inputs.\n", "\n", "A `tf.Variable` has internal state: its value. When you use the variable, the state is read. It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back. 
For example:\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.556090Z", "iopub.status.busy": "2024-08-15T01:30:13.555864Z", "iopub.status.idle": "2024-08-15T01:30:13.563022Z", "shell.execute_reply": "2024-08-15T01:30:13.562452Z" }, "id": "C1tLeeRFE479" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "x0 = tf.Variable(3.0)\n", "x1 = tf.Variable(0.0)\n", "\n", "with tf.GradientTape() as tape:\n", " # Update x1 = x1 + x0.\n", " x1.assign_add(x0)\n", " # The tape starts recording from x1.\n", " y = x1**2 # y = (x1 + x0)**2\n", "\n", "# This returns None: the variable's state blocks the path back to x0.\n", "print(tape.gradient(y, x0)) # dy/dx0 = 2*(x1 + x0)" ] }, { "cell_type": "markdown", "metadata": { "id": "xKA92-dqF2r-" }, "source": [ "Similarly, `tf.data.Dataset` iterators and `tf.queue`s are stateful, and will stop all gradients on tensors that pass through them." ] }, { "cell_type": "markdown", "metadata": { "id": "HHvcDGIbOj2I" }, "source": [ "## No gradient registered" ] }, { "cell_type": "markdown", "metadata": { "id": "aoc-A6AxVqry" }, "source": [ "Some `tf.Operation`s are **registered as being non-differentiable** and will return `None`. Others have **no gradient registered**.\n", "\n", "The `tf.raw_ops` page shows which low-level ops have gradients registered.\n", "\n", "If you attempt to take a gradient through a float op that has no gradient registered, the tape will raise an error instead of silently returning `None`. 
This way, you know something has gone wrong.\n", "\n", "For example, the `tf.image.adjust_contrast` function wraps `raw_ops.AdjustContrastv2`, which could have a gradient, but none is implemented:\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.566420Z", "iopub.status.busy": "2024-08-15T01:30:13.566184Z", "iopub.status.idle": "2024-08-15T01:30:13.578588Z", "shell.execute_reply": "2024-08-15T01:30:13.578030Z" }, "id": "HSb20FXc_V0U" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LookupError: gradient registry has no entry for: AdjustContrastv2\n" ] } ], "source": [ "image = tf.Variable([[[0.5, 0.0, 0.0]]])\n", "delta = tf.Variable(0.1)\n", "\n", "with tf.GradientTape() as tape:\n", " new_image = tf.image.adjust_contrast(image, delta)\n", "\n", "try:\n", " print(tape.gradient(new_image, [image, delta]))\n", " assert False # This should not happen.\n", "except LookupError as e:\n", " print(f'{type(e).__name__}: {e}')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "pDoutjzATiEm" }, "source": [ "If you need to differentiate through this op, you'll either need to implement the gradient and register it (using `tf.RegisterGradient`) or re-implement the function using other ops." ] }, { "cell_type": "markdown", "metadata": { "id": "GCTwc_dQXp2W" }, "source": [ "## Zeros instead of None" ] }, { "cell_type": "markdown", "metadata": { "id": "TYDrVogA89eA" }, "source": [ "In some cases it would be convenient to get 0 instead of `None` for unconnected gradients. 
You can decide what to return when you have unconnected gradients using the `unconnected_gradients` argument:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:30:13.581859Z", "iopub.status.busy": "2024-08-15T01:30:13.581644Z", "iopub.status.idle": "2024-08-15T01:30:13.642074Z", "shell.execute_reply": "2024-08-15T01:30:13.641433Z" }, "id": "U6zxk1sf9Ixx" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor([0. 0.], shape=(2,), dtype=float32)\n" ] } ], "source": [ "x = tf.Variable([2., 2.])\n", "y = tf.Variable(3.)\n", "\n", "with tf.GradientTape() as tape:\n", " z = y**2\n", "print(tape.gradient(z, x, unconnected_gradients=tf.UnconnectedGradients.ZERO))" ] } ], "metadata": { "colab": { "collapsed_sections": [ "Tce3stUlHN0L" ], "name": "autodiff.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 0 }