{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ZjN_IJ8mhJ-4" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2024-08-15T01:31:54.730267Z", "iopub.status.busy": "2024-08-15T01:31:54.730039Z", "iopub.status.idle": "2024-08-15T01:31:54.733750Z", "shell.execute_reply": "2024-08-15T01:31:54.733134Z" }, "id": "sY3Ffd83hK3b" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "03Pw58e6mTHI" }, "source": [ "# NumPy API on TensorFlow" ] }, { "cell_type": "markdown", "metadata": { "id": "7WpGysDJmZsg" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "s2enCDi_FvCR" }, "source": [ "## Overview\n", "\n", "TensorFlow implements a subset of the [NumPy API](https://numpy.org/doc/stable/index.html), available as `tf.experimental.numpy`. This allows running NumPy code, accelerated by TensorFlow, while also allowing access to all of TensorFlow's APIs." ] }, { "cell_type": "markdown", "metadata": { "id": "ob1HNwUmYR5b" }, "source": [ "## Setup\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:54.737436Z", "iopub.status.busy": "2024-08-15T01:31:54.737065Z", "iopub.status.idle": "2024-08-15T01:31:57.341967Z", "shell.execute_reply": "2024-08-15T01:31:57.341304Z" }, "id": "AJR558zjAZQu" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using TensorFlow version 2.17.0\n" ] } ], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import tensorflow as tf\n", "import tensorflow.experimental.numpy as tnp\n", "import timeit\n", "\n", "print(\"Using TensorFlow version %s\" % tf.__version__)" ] }, { "cell_type": "markdown", "metadata": { "id": "M6tacoy0DU6e" }, "source": [ "### Enabling NumPy behavior\n", "\n", "In order to use `tnp` as NumPy, enable NumPy behavior for 
TensorFlow:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:57.345801Z", "iopub.status.busy": "2024-08-15T01:31:57.345404Z", "iopub.status.idle": "2024-08-15T01:31:57.348756Z", "shell.execute_reply": "2024-08-15T01:31:57.348195Z" }, "id": "TfCyofpFDQxm" }, "outputs": [], "source": [ "tnp.experimental_enable_numpy_behavior()" ] }, { "cell_type": "markdown", "metadata": { "id": "et9D5wq0D1H2" }, "source": [ "This call enables type promotion in TensorFlow and makes type inference, when converting literals to tensors, follow the NumPy standard more strictly.\n", "\n", "Note: This call changes the behavior of all of TensorFlow, not just the `tf.experimental.numpy` module." ] }, { "cell_type": "markdown", "metadata": { "id": "yh2BwqUzH3C3" }, "source": [ "## TensorFlow NumPy ND array\n", "\n", "An instance of `tf.experimental.numpy.ndarray`, called an **ND array**, represents a multidimensional dense array of a given `dtype` placed on a certain device. It is an alias of `tf.Tensor`. Check out the ND array class for useful methods like `ndarray.T`, `ndarray.reshape`, `ndarray.ravel` and others.\n", "\n", "First, create an ND array object, then invoke different methods on it." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:57.351951Z", "iopub.status.busy": "2024-08-15T01:31:57.351715Z", "iopub.status.idle": "2024-08-15T01:31:59.592454Z", "shell.execute_reply": "2024-08-15T01:31:59.591757Z" }, "id": "-BHJjxigJ2H1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created ND array with shape = (5, 3), rank = 2, dtype = <dtype: 'float32'> on device = /job:localhost/replica:0/task:0/device:GPU:0\n", "\n", "Is `ones` an instance of tf.Tensor: True\n", "\n", "ndarray.T has shape (3, 5)\n", "ndarray.reshape(-1) has shape (15,)\n" ] } ], "source": [ "# Create an ND array and check out different attributes.\n", "ones = tnp.ones([5, 3], dtype=tnp.float32)\n", "print(\"Created ND array with shape = %s, rank = %s, \"\n", "      \"dtype = %s on device = %s\\n\" % (\n", "          ones.shape, ones.ndim, ones.dtype, ones.device))\n", "\n", "# `ndarray` is just an alias to `tf.Tensor`.\n", "print(\"Is `ones` an instance of tf.Tensor: %s\\n\" % isinstance(ones, tf.Tensor))\n", "\n", "# Try commonly used member functions.\n", "print(\"ndarray.T has shape %s\" % str(ones.T.shape))\n", "print(\"ndarray.reshape(-1) has shape %s\" % ones.reshape(-1).shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "-BOY8CGRKEhE" }, "source": [ "### Type promotion\n", "\n", "There are 4 options for type promotion in TensorFlow.\n", "\n", "- By default, TensorFlow raises errors instead of promoting types for mixed type operations.\n", "- Running `tnp.experimental_enable_numpy_behavior()` switches TensorFlow to use `NumPy` type promotion rules (described below).\n", "- After TensorFlow 2.15, there are two new options (refer to [TF NumPy Type Promotion](tf_numpy_type_promotion.ipynb) for details):\n", " - `tnp.experimental_enable_numpy_behavior(dtype_conversion_mode=\"all\")` uses JAX type promotion rules.\n", " - 
`tnp.experimental_enable_numpy_behavior(dtype_conversion_mode=\"safe\")` uses JAX type promotion rules, but disallows certain unsafe promotions." ] }, { "cell_type": "markdown", "metadata": { "id": "SXskSHrX5J45" }, "source": [ "#### NumPy Type Promotion\n", "\n", "TensorFlow NumPy APIs have well-defined semantics for converting literals to ND arrays, as well as for performing type promotion on ND array inputs. Please see [`np.result_type`](https://numpy.org/doc/1.16/reference/generated/numpy.result_type.html) for more details." ] }, { "cell_type": "markdown", "metadata": { "id": "vcRznNaMj27J" }, "source": [ "TensorFlow APIs leave `tf.Tensor` inputs unchanged and do not perform type promotion on them, while TensorFlow NumPy APIs promote all inputs according to NumPy type promotion rules. In the next example, you will perform type promotion. First, run addition on ND array inputs of different types and note the output types. None of these type promotions would be allowed by TensorFlow APIs." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:59.596322Z", "iopub.status.busy": "2024-08-15T01:31:59.596064Z", "iopub.status.idle": "2024-08-15T01:31:59.612266Z", "shell.execute_reply": "2024-08-15T01:31:59.611655Z" }, "id": "uHmBi4KZI2t1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type promotion for operations\n", "int32 + int64 => int64\n", "int32 + float32 => float64\n", "int32 + float64 => float64\n", "int64 + float32 => float64\n", "int64 + float64 => float64\n", "float32 + float64 => float64\n" ] } ], "source": [ "print(\"Type promotion for operations\")\n", "values = [tnp.asarray(1, dtype=d) for d in\n", "          (tnp.int32, tnp.int64, tnp.float32, tnp.float64)]\n", "for i, v1 in enumerate(values):\n", "  for v2 in values[i + 1:]:\n", "    print(\"%s + %s => %s\" %\n", "          (v1.dtype.name, v2.dtype.name, (v1 + v2).dtype.name))" ] }, { "cell_type": "markdown", "metadata": { "id": "CrpIoOc7oqox" }, "source": [ "Finally, convert literals to ND arrays using `tnp.asarray` and note the resulting type." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:59.615576Z", "iopub.status.busy": "2024-08-15T01:31:59.615344Z", "iopub.status.idle": "2024-08-15T01:31:59.621146Z", "shell.execute_reply": "2024-08-15T01:31:59.620550Z" }, "id": "1m1cp8_VooNk" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type inference during array creation\n", "tnp.asarray(1).dtype == tnp.int64\n", "tnp.asarray(1.).dtype == tnp.float64\n", "\n" ] } ], "source": [ "print(\"Type inference during array creation\")\n", "print(\"tnp.asarray(1).dtype == tnp.%s\" % tnp.asarray(1).dtype.name)\n", "print(\"tnp.asarray(1.).dtype == tnp.%s\\n\" % tnp.asarray(1.).dtype.name)" ] }, { "cell_type": "markdown", "metadata": { "id": "kd-_iccXoRL8" }, "source": [ "When converting literals to ND arrays, NumPy prefers wide types like `tnp.int64` and `tnp.float64`. In contrast, `tf.convert_to_tensor` prefers `tf.int32` and `tf.float32` types for converting constants to `tf.Tensor`. TensorFlow NumPy APIs adhere to the NumPy behavior for integers. As for floats, the `prefer_float32` argument of `experimental_enable_numpy_behavior` lets you control whether to prefer `tf.float32` over `tf.float64` (defaults to `False`). 
For example:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:59.624205Z", "iopub.status.busy": "2024-08-15T01:31:59.623943Z", "iopub.status.idle": "2024-08-15T01:31:59.630711Z", "shell.execute_reply": "2024-08-15T01:31:59.630084Z" }, "id": "4gKasnH0j84C" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "When prefer_float32 is True:\n", "tnp.asarray(1.).dtype == tnp.float32\n", "tnp.add(1., 2.).dtype == tnp.float32\n", "When prefer_float32 is False:\n", "tnp.asarray(1.).dtype == tnp.float64\n", "tnp.add(1., 2.).dtype == tnp.float64\n" ] } ], "source": [ "tnp.experimental_enable_numpy_behavior(prefer_float32=True)\n", "print(\"When prefer_float32 is True:\")\n", "print(\"tnp.asarray(1.).dtype == tnp.%s\" % tnp.asarray(1.).dtype.name)\n", "print(\"tnp.add(1., 2.).dtype == tnp.%s\" % tnp.add(1., 2.).dtype.name)\n", "\n", "tnp.experimental_enable_numpy_behavior(prefer_float32=False)\n", "print(\"When prefer_float32 is False:\")\n", "print(\"tnp.asarray(1.).dtype == tnp.%s\" % tnp.asarray(1.).dtype.name)\n", "print(\"tnp.add(1., 2.).dtype == tnp.%s\" % tnp.add(1., 2.).dtype.name)" ] }, { "cell_type": "markdown", "metadata": { "id": "MwCCDxSZOfA1" }, "source": [ "### Broadcasting\n", "\n", "Similar to TensorFlow, NumPy defines rich semantics for \"broadcasting\" values.\n", "You can check out the [NumPy broadcasting guide](https://numpy.org/doc/1.16/user/basics.broadcasting.html) for more information and compare this with [TensorFlow broadcasting semantics](https://www.tensorflow.org/guide/tensor#broadcasting)." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:59.633685Z", "iopub.status.busy": "2024-08-15T01:31:59.633439Z", "iopub.status.idle": "2024-08-15T01:31:59.639412Z", "shell.execute_reply": "2024-08-15T01:31:59.638849Z" }, "id": "qlyOShxIO0s2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Broadcasting shapes (2, 3), (3,) and (1, 2, 1) gives shape (1, 2, 3)\n" ] } ], "source": [ "x = tnp.ones([2, 3])\n", "y = tnp.ones([3])\n", "z = tnp.ones([1, 2, 1])\n", "print(\"Broadcasting shapes %s, %s and %s gives shape %s\" % (\n", "    x.shape, y.shape, z.shape, (x + y + z).shape))" ] }, { "cell_type": "markdown", "metadata": { "id": "LEVr4ctRPrqR" }, "source": [ "### Indexing\n", "\n", "NumPy defines very sophisticated indexing rules. See the [NumPy Indexing guide](https://numpy.org/doc/1.16/reference/arrays.indexing.html). Note the use of ND arrays as indices below." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:31:59.642439Z", "iopub.status.busy": "2024-08-15T01:31:59.642026Z", "iopub.status.idle": "2024-08-15T01:32:00.371954Z", "shell.execute_reply": "2024-08-15T01:32:00.371267Z" }, "id": "lRsrtnd3YyMj" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Basic indexing\n", "tf.Tensor(\n", "[[[16 17 18 19]\n", "  [20 21 22 23]]], shape=(1, 2, 4), dtype=int64) \n", "\n", "Boolean indexing\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(\n", "[[[ 0  1  2  3]\n", "  [ 8  9 10 11]]\n", "\n", " [[12 13 14 15]\n", "  [20 21 22 23]]], shape=(2, 2, 4), dtype=int64) \n", "\n", "Advanced indexing\n", "tf.Tensor([12 13 17], shape=(3,), dtype=int64)\n" ] } ], "source": [ "x = tnp.arange(24).reshape(2, 3, 4)\n", "\n", "print(\"Basic indexing\")\n", "print(x[1, tnp.newaxis, 1:3, ...], \"\\n\")\n", "\n", "print(\"Boolean indexing\")\n", "print(x[:, (True, False, True)], \"\\n\")\n", "\n", "print(\"Advanced indexing\")\n", "print(x[1, (0, 0, 1), tnp.asarray([0, 1, 1])])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.375254Z", "iopub.status.busy": "2024-08-15T01:32:00.374994Z", "iopub.status.idle": "2024-08-15T01:32:00.379580Z", "shell.execute_reply": "2024-08-15T01:32:00.378997Z" }, "id": "yRAaiGhlaNw7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Currently, TensorFlow NumPy does not support mutation.\n" ] } ], "source": [ "# Mutation is currently not supported\n", "try:\n", "  tnp.arange(6)[1] = -1\n", "except TypeError:\n", "  print(\"Currently, TensorFlow NumPy does not support mutation.\")" ] }, { "cell_type": "markdown", "metadata": { "id": "5XfJ602j-GVD" }, "source": [ "### Example Model\n", "\n", "Next, you can see how to create a model and run inference on it. This simple model applies a ReLU layer followed by a linear projection. Later sections will show how to compute gradients for this model using TensorFlow's `GradientTape`." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.382875Z", "iopub.status.busy": "2024-08-15T01:32:00.382414Z", "iopub.status.idle": "2024-08-15T01:32:00.593746Z", "shell.execute_reply": "2024-08-15T01:32:00.592952Z" }, "id": "kR_KCh4kYEhm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(\n", "[[-0.8292594 0.75780904]\n", " [-0.8292594 0.75780904]], shape=(2, 2), dtype=float32)\n" ] } ], "source": [ "class Model(object):\n", "  \"\"\"Model with a dense and a linear layer.\"\"\"\n", "\n", "  def __init__(self):\n", "    self.weights = None\n", "\n", "  def predict(self, inputs):\n", "    if self.weights is None:\n", "      size = inputs.shape[1]\n", "      # Note that type `tnp.float32` is used for performance.\n", "      stddev = tnp.sqrt(size).astype(tnp.float32)\n", "      w1 = tnp.random.randn(size, 64).astype(tnp.float32) / stddev\n", "      bias = tnp.random.randn(64).astype(tnp.float32)\n", "      w2 = tnp.random.randn(64, 2).astype(tnp.float32) / 8\n", "      self.weights = (w1, bias, w2)\n", "    else:\n", "      w1, bias, w2 = self.weights\n", "    y = tnp.matmul(inputs, w1) + bias\n", "    y = tnp.maximum(y, 0)  # Relu\n", "    return tnp.matmul(y, w2)  # Linear projection\n", "\n", "model = Model()\n", "# Create input data and compute predictions.\n", "print(model.predict(tnp.ones([2, 32], dtype=tnp.float32)))" ] }, { "cell_type": "markdown", "metadata": { "id": "kSR7Ou5YcS38" }, "source": [ "## TensorFlow NumPy and NumPy\n", "\n", "TensorFlow NumPy implements a subset of the full NumPy spec. While more symbols will be added over time, there are systematic features that will not be supported in the near future. These include NumPy C API support, SWIG integration, Fortran storage order, views and `stride_tricks`, and some `dtype`s (like `np.recarray` and `np.object`). 
For more details, please see the [TensorFlow NumPy API Documentation](https://www.tensorflow.org/api_docs/python/tf/experimental/numpy).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Jb1KXak2YlNN" }, "source": [ "### NumPy interoperability\n", "\n", "TensorFlow ND arrays can interoperate with NumPy functions. These objects implement the `__array__` interface. NumPy uses this interface to convert function arguments to `np.ndarray` values before processing them.\n", "\n", "Similarly, TensorFlow NumPy functions can accept inputs of different types including `np.ndarray`. These inputs are converted to an ND array by calling `tnp.asarray` on them.\n", "\n", "Conversion of the ND array to and from `np.ndarray` may trigger actual data copies. Please see the section on [buffer copies](#buffer-copies) for more details." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.597789Z", "iopub.status.busy": "2024-08-15T01:32:00.597282Z", "iopub.status.idle": "2024-08-15T01:32:00.608313Z", "shell.execute_reply": "2024-08-15T01:32:00.607647Z" }, "id": "cMOCgzQmeXRU" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sum = 6.0. Class: <class 'numpy.float64'>\n", "sum = 6.0. Class: <class 'tensorflow.python.framework.ops.EagerTensor'>\n" ] } ], "source": [ "# ND array passed into NumPy function.\n", "np_sum = np.sum(tnp.ones([2, 3]))\n", "print(\"sum = %s. Class: %s\" % (float(np_sum), np_sum.__class__))\n", "\n", "# `np.ndarray` passed into TensorFlow NumPy function.\n", "tnp_sum = tnp.sum(np.ones([2, 3]))\n", "print(\"sum = %s. Class: %s\" % (float(tnp_sum), tnp_sum.__class__))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.611395Z", "iopub.status.busy": "2024-08-15T01:32:00.611147Z", "iopub.status.idle": "2024-08-15T01:32:00.741887Z", "shell.execute_reply": "2024-08-15T01:32:00.741303Z" }, "id": "ZaLPjzxft780" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# It is easy to plot ND arrays, given the __array__ interface.\n", "labels = 15 + 2 * tnp.random.randn(1, 1000)\n", "_ = plt.hist(labels)" ] }, { "cell_type": "markdown", "metadata": { "id": "kF-Xyw3XWKqJ" }, "source": [ "### Buffer copies\n", "\n", "Intermixing TensorFlow NumPy with NumPy code may trigger data copies. This is because TensorFlow NumPy has stricter requirements on memory alignment than those of NumPy.\n", "\n", "When a `np.ndarray` is passed to TensorFlow NumPy, it will check for alignment requirements and trigger a copy if needed. When passing an ND array CPU buffer to NumPy, generally the buffer will satisfy alignment requirements and NumPy will not need to create a copy.\n", "\n", "ND arrays can refer to buffers placed on devices other than the local CPU memory. In such cases, invoking a NumPy function will trigger copies across the network or device as needed.\n", "\n", "Given this, intermixing with NumPy API calls should generally be done with caution and the user should watch out for overheads of copying data. Interleaving TensorFlow NumPy calls with TensorFlow calls is generally safe and avoids copying data. See the section on [TensorFlow interoperability](#tensorflow-interoperability) for more details." ] }, { "cell_type": "markdown", "metadata": { "id": "RwljbqkBc7Ro" }, "source": [ "### Operator precedence\n", "\n", "TensorFlow NumPy defines an `__array_priority__` higher than NumPy's. This means that for operators involving both ND array and `np.ndarray`, the former will take precedence, i.e., `np.ndarray` input will get converted to an ND array and the TensorFlow NumPy implementation of the operator will get invoked." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.745689Z", "iopub.status.busy": "2024-08-15T01:32:00.745068Z", "iopub.status.idle": "2024-08-15T01:32:00.750238Z", "shell.execute_reply": "2024-08-15T01:32:00.749602Z" }, "id": "Cbw8a3G_WUO7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x = tf.Tensor([2. 2.], shape=(2,), dtype=float64)\n", "class = <class 'tensorflow.python.framework.ops.EagerTensor'>\n" ] } ], "source": [ "x = tnp.ones([2]) + np.ones([2])\n", "print(\"x = %s\\nclass = %s\" % (x, x.__class__))" ] }, { "cell_type": "markdown", "metadata": { "id": "DNEab_Ctky83" }, "source": [ "## TF NumPy and TensorFlow\n", "\n", "TensorFlow NumPy is built on top of TensorFlow and hence interoperates seamlessly with TensorFlow." ] }, { "cell_type": "markdown", "metadata": { "id": "fCcfgrlOnAhQ" }, "source": [ "### `tf.Tensor` and ND array\n", "\n", "ND array is an alias of `tf.Tensor`, so the two can be intermixed without triggering actual data copies." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.753385Z", "iopub.status.busy": "2024-08-15T01:32:00.753128Z", "iopub.status.idle": "2024-08-15T01:32:00.757913Z", "shell.execute_reply": "2024-08-15T01:32:00.757313Z" }, "id": "BkHVauKwnky_" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor([1 2], shape=(2,), dtype=int32)\n", "tf.Tensor([1 2], shape=(2,), dtype=int32)\n", "tf.Tensor([1 2], shape=(2,), dtype=int32)\n", "[1 2] \n" ] } ], "source": [ "x = tf.constant([1, 2])\n", "print(x)\n", "\n", "# `asarray` and `convert_to_tensor` here are no-ops.\n", "tnp_x = tnp.asarray(x)\n", "print(tnp_x)\n", "print(tf.convert_to_tensor(tnp_x))\n", "\n", "# Note that tf.Tensor.numpy() will continue to return `np.ndarray`.\n", "print(x.numpy(), x.numpy().__class__)" ] }, { "cell_type": "markdown", "metadata": { "id": "_151HQVBooxG" }, "source": [ "### TensorFlow interoperability\n", "\n", "An ND array can be passed to TensorFlow APIs, since ND array is just an alias to `tf.Tensor`. As mentioned earlier, such interoperation does not do data copies, even for data placed on accelerators or remote devices.\n", "\n", "Conversely, `tf.Tensor` objects can be passed to `tf.experimental.numpy` APIs, without performing data copies." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.761161Z", "iopub.status.busy": "2024-08-15T01:32:00.760614Z", "iopub.status.idle": "2024-08-15T01:32:00.769073Z", "shell.execute_reply": "2024-08-15T01:32:00.768503Z" }, "id": "-QvxNhrFoz09" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output = tf.Tensor(6.0, shape=(), dtype=float32)\n", "Output = tf.Tensor(6.0, shape=(), dtype=float32)\n" ] } ], "source": [ "# ND array passed into TensorFlow function.\n", "tf_sum = tf.reduce_sum(tnp.ones([2, 3], tnp.float32))\n", "print(\"Output = %s\" % tf_sum)\n", "\n", "# `tf.Tensor` passed into TensorFlow NumPy function.\n", "tnp_sum = tnp.sum(tf.ones([2, 3]))\n", "print(\"Output = %s\" % tnp_sum)" ] }, { "cell_type": "markdown", "metadata": { "id": "1b4HeAkhprF_" }, "source": [ "### Gradients and Jacobians: tf.GradientTape\n", "\n", "TensorFlow's GradientTape can be used for backpropagation through TensorFlow and TensorFlow NumPy code.\n", "\n", "Use the model created in the [Example Model](#example-model) section to compute gradients and Jacobians." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.772128Z", "iopub.status.busy": "2024-08-15T01:32:00.771652Z", "iopub.status.idle": "2024-08-15T01:32:00.930610Z", "shell.execute_reply": "2024-08-15T01:32:00.929913Z" }, "id": "T47C9KS8pbsP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameter shapes: [TensorShape([32, 64]), TensorShape([64]), TensorShape([64, 2])]\n", "Gradient shapes: [TensorShape([32, 64]), TensorShape([64]), TensorShape([64, 2])]\n" ] } ], "source": [ "def create_batch(batch_size=32):\n", "  \"\"\"Creates a batch of input and labels.\"\"\"\n", "  return (tnp.random.randn(batch_size, 32).astype(tnp.float32),\n", "          tnp.random.randn(batch_size, 2).astype(tnp.float32))\n", "\n", "def compute_gradients(model, inputs, labels):\n", "  \"\"\"Computes gradients of squared loss between model prediction and labels.\"\"\"\n", "  with tf.GradientTape() as tape:\n", "    assert model.weights is not None\n", "    # Note that `model.weights` need to be explicitly watched since they\n", "    # are not tf.Variables.\n", "    tape.watch(model.weights)\n", "    # Compute prediction and loss\n", "    prediction = model.predict(inputs)\n", "    loss = tnp.sum(tnp.square(prediction - labels))\n", "  # This call computes the gradient through the computation above.\n", "  return tape.gradient(loss, model.weights)\n", "\n", "inputs, labels = create_batch()\n", "gradients = compute_gradients(model, inputs, labels)\n", "\n", "# Inspect the shapes of returned gradients to verify they match the\n", "# parameter shapes.\n", "print(\"Parameter shapes:\", [w.shape for w in model.weights])\n", "print(\"Gradient shapes:\", [g.shape for g in gradients])\n", "# Verify that gradients are of type ND array.\n", "assert isinstance(gradients[0], tnp.ndarray)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:00.934170Z", "iopub.status.busy": 
"2024-08-15T01:32:00.933878Z", "iopub.status.idle": "2024-08-15T01:32:01.116321Z", "shell.execute_reply": "2024-08-15T01:32:01.115632Z" }, "id": "TujVPDFwrdqp" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output shape: (16, 2), input shape: (16, 32)\n", "Batch jacobian shape: (16, 2, 32)\n" ] } ], "source": [ "# Computes a batch of jacobians. Each row is the jacobian of an element in the\n", "# batch of outputs w.r.t. the corresponding input batch element.\n", "def prediction_batch_jacobian(inputs):\n", "  with tf.GradientTape() as tape:\n", "    tape.watch(inputs)\n", "    prediction = model.predict(inputs)\n", "  return prediction, tape.batch_jacobian(prediction, inputs)\n", "\n", "inp_batch = tnp.ones([16, 32], tnp.float32)\n", "output, batch_jacobian = prediction_batch_jacobian(inp_batch)\n", "# Note how the batch jacobian shape relates to the input and output shapes.\n", "print(\"Output shape: %s, input shape: %s\" % (output.shape, inp_batch.shape))\n", "print(\"Batch jacobian shape:\", batch_jacobian.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "MYq9wxfc1Dv_" }, "source": [ "### Trace compilation: tf.function\n", "\n", "TensorFlow's `tf.function` works by \"trace compiling\" the code and then optimizing these traces for much faster performance. See the [Introduction to Graphs and Functions](./intro_to_graphs.ipynb).\n", "\n", "`tf.function` can be used to optimize TensorFlow NumPy code as well. Here is a simple example to demonstrate the speedups. 
Note that the body of `tf.function` code includes calls to TensorFlow NumPy APIs.\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:01.119908Z", "iopub.status.busy": "2024-08-15T01:32:01.119400Z", "iopub.status.idle": "2024-08-15T01:32:01.326909Z", "shell.execute_reply": "2024-08-15T01:32:01.326269Z" }, "id": "05SrUulm1OlL" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Eager performance\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "2.710705300000882 ms\n", "\n", "tf.function compiled performance\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "0.7041131000050882 ms\n" ] } ], "source": [ "inputs, labels = create_batch(512)\n", "print(\"Eager performance\")\n", "compute_gradients(model, inputs, labels)\n", "print(timeit.timeit(lambda: compute_gradients(model, inputs, labels),\n", "                    number=10) * 100, \"ms\")\n", "\n", "print(\"\\ntf.function compiled performance\")\n", "compiled_compute_gradients = tf.function(compute_gradients)\n", "compiled_compute_gradients(model, inputs, labels) # warmup\n", "print(timeit.timeit(lambda: compiled_compute_gradients(model, inputs, labels),\n", "                    number=10) * 100, \"ms\")" ] }, { "cell_type": "markdown", "metadata": { "id": "5w8YxR6ELmo1" }, "source": [ "### Vectorization: tf.vectorized_map\n", "\n", "TensorFlow has built-in support for vectorizing parallel loops, which allows speedups of one to two orders of magnitude. These speedups are accessible via the `tf.vectorized_map` API and apply to TensorFlow NumPy code as well.\n", "\n", "It is sometimes useful to compute the gradient of each output in a batch w.r.t. the corresponding input batch element. Such computation can be done efficiently using `tf.vectorized_map` as shown below." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:01.330518Z", "iopub.status.busy": "2024-08-15T01:32:01.329930Z", "iopub.status.idle": "2024-08-15T01:32:01.504340Z", "shell.execute_reply": "2024-08-15T01:32:01.503636Z" }, "id": "PemSIrs5L-VJ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Weight shape: (32, 64), batch size: 128, per example gradient shape: (128, 32, 64) \n", "Weight shape: (64,), batch size: 128, per example gradient shape: (128, 64) \n", "Weight shape: (64, 2), batch size: 128, per example gradient shape: (128, 64, 2) \n" ] } ], "source": [ "@tf.function\n", "def vectorized_per_example_gradients(inputs, labels):\n", "  def single_example_gradient(arg):\n", "    inp, label = arg\n", "    return compute_gradients(model,\n", "                             tnp.expand_dims(inp, 0),\n", "                             tnp.expand_dims(label, 0))\n", "  # Note that a call to `tf.vectorized_map` semantically maps\n", "  # `single_example_gradient` over each row of `inputs` and `labels`.\n", "  # The interface is similar to `tf.map_fn`.\n", "  # The underlying machinery vectorizes away this map loop which gives\n", "  # nice speedups.\n", "  return tf.vectorized_map(single_example_gradient, (inputs, labels))\n", "\n", "batch_size = 128\n", "inputs, labels = create_batch(batch_size)\n", "\n", "per_example_gradients = vectorized_per_example_gradients(inputs, labels)\n", "for w, p in zip(model.weights, per_example_gradients):\n", "  print(\"Weight shape: %s, batch size: %s, per example gradient shape: %s \" % (\n", "      w.shape, batch_size, p.shape))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:01.507462Z", "iopub.status.busy": "2024-08-15T01:32:01.507231Z", "iopub.status.idle": "2024-08-15T01:32:02.316964Z", "shell.execute_reply": "2024-08-15T01:32:02.315994Z" }, "id": "_QZ5BjJmRAlG" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running 
vectorized computation\n", "0.659675699989748 ms\n", "\n", "Running unvectorized computation\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "29.259711299982882 ms\n" ] } ], "source": [ "# Benchmark the vectorized computation above and compare with\n", "# unvectorized sequential computation using `tf.map_fn`.\n", "@tf.function\n", "def unvectorized_per_example_gradients(inputs, labels):\n", "  def single_example_gradient(arg):\n", "    inp, label = arg\n", "    return compute_gradients(model,\n", "                             tnp.expand_dims(inp, 0),\n", "                             tnp.expand_dims(label, 0))\n", "\n", "  return tf.map_fn(single_example_gradient, (inputs, labels),\n", "                   fn_output_signature=(tf.float32, tf.float32, tf.float32))\n", "\n", "print(\"Running vectorized computation\")\n", "print(timeit.timeit(lambda: vectorized_per_example_gradients(inputs, labels),\n", "                    number=10) * 100, \"ms\")\n", "\n", "print(\"\\nRunning unvectorized computation\")\n", "per_example_gradients = unvectorized_per_example_gradients(inputs, labels)\n", "print(timeit.timeit(lambda: unvectorized_per_example_gradients(inputs, labels),\n", "                    number=10) * 100, \"ms\")" ] }, { "cell_type": "markdown", "metadata": { "id": "UOTh-nkzaJd9" }, "source": [ "### Device placement\n", "\n", "TensorFlow NumPy can place operations on CPUs, GPUs, TPUs and remote devices. It uses standard TensorFlow mechanisms for device placement. The simple example below shows how to list all devices and then place a computation on a particular device.\n", "\n", "TensorFlow also has APIs for replicating computation across devices and performing collective reductions, which are not covered here." ] }, { "cell_type": "markdown", "metadata": { "id": "-0gHrwYYaTCE" }, "source": [ "#### List devices\n", "\n", "`tf.config.list_logical_devices` and `tf.config.list_physical_devices` can be used to find what devices to use." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:02.320733Z", "iopub.status.busy": "2024-08-15T01:32:02.320448Z", "iopub.status.idle": "2024-08-15T01:32:02.325633Z", "shell.execute_reply": "2024-08-15T01:32:02.324746Z" }, "id": "NDEAd9m9aemS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All logical devices: [LogicalDevice(name='/device:CPU:0', device_type='CPU'), LogicalDevice(name='/device:GPU:0', device_type='GPU'), LogicalDevice(name='/device:GPU:1', device_type='GPU'), LogicalDevice(name='/device:GPU:2', device_type='GPU'), LogicalDevice(name='/device:GPU:3', device_type='GPU')]\n", "All physical devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]\n" ] } ], "source": [ "print(\"All logical devices:\", tf.config.list_logical_devices())\n", "print(\"All physical devices:\", tf.config.list_physical_devices())\n", "\n", "# Try to get the GPU device. 
If unavailable, fall back to CPU.\n", "try:\n", "  device = tf.config.list_logical_devices(device_type=\"GPU\")[0]\n", "except IndexError:\n", "  device = \"/device:CPU:0\"" ] }, { "cell_type": "markdown", "metadata": { "id": "fihgfF_tahVx" }, "source": [ "#### Placing operations: **`tf.device`**\n", "\n", "Operations can be placed on a device by running them in a `tf.device` scope.\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:02.328976Z", "iopub.status.busy": "2024-08-15T01:32:02.328726Z", "iopub.status.idle": "2024-08-15T01:32:02.337014Z", "shell.execute_reply": "2024-08-15T01:32:02.336211Z" }, "id": "c7ELvLmnazfV" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using device: LogicalDevice(name='/device:GPU:0', device_type='GPU')\n", "prediction is placed on /job:localhost/replica:0/task:0/device:GPU:0\n" ] } ], "source": [ "print(\"Using device: %s\" % str(device))\n", "# Run operations in the `tf.device` scope.\n", "# If a GPU is available, these operations execute on the GPU and outputs are\n", "# placed in GPU memory.\n", "with tf.device(device):\n", "  prediction = model.predict(create_batch(5)[0])\n", "\n", "print(\"prediction is placed on %s\" % prediction.device)" ] }, { "cell_type": "markdown", "metadata": { "id": "e-LK6wsHbBiM" }, "source": [ "#### Copying ND arrays across devices: **`tnp.copy`**\n", "\n", "A call to `tnp.copy`, placed in a certain device scope, will copy the data to that device, unless the data is already on that device." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:02.340196Z", "iopub.status.busy": "2024-08-15T01:32:02.339941Z", "iopub.status.idle": "2024-08-15T01:32:02.345357Z", "shell.execute_reply": "2024-08-15T01:32:02.344550Z" }, "id": "CCesyidaa-UT" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/job:localhost/replica:0/task:0/device:GPU:0\n", "/job:localhost/replica:0/task:0/device:CPU:0\n" ] } ], "source": [ "with tf.device(\"/device:CPU:0\"):\n", "  prediction_cpu = tnp.copy(prediction)\n", "print(prediction.device)\n", "print(prediction_cpu.device)" ] }, { "cell_type": "markdown", "metadata": { "id": "AiYzRDOtKzAH" }, "source": [ "## Performance comparisons\n", "\n", "TensorFlow NumPy uses highly optimized TensorFlow kernels that can be dispatched on CPUs, GPUs and TPUs. TensorFlow also performs many compiler optimizations, like operation fusion, which translate to performance and memory improvements. See [TensorFlow graph optimization with Grappler](./graph_optimization.ipynb) to learn more.\n", "\n", "However, TensorFlow has higher overheads for dispatching operations compared to NumPy. For workloads composed of small operations (less than about 10 microseconds), these overheads can dominate the runtime and NumPy could provide better performance. For other cases, TensorFlow should generally provide better performance.\n", "\n", "Run the benchmark below to compare NumPy and TensorFlow NumPy performance for different input sizes." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "cellView": "code", "execution": { "iopub.execute_input": "2024-08-15T01:32:02.349247Z", "iopub.status.busy": "2024-08-15T01:32:02.348437Z", "iopub.status.idle": "2024-08-15T01:32:02.355122Z", "shell.execute_reply": "2024-08-15T01:32:02.354271Z" }, "id": "RExwjI9_pJG0" }, "outputs": [], "source": [ "def benchmark(f, inputs, number=30, force_gpu_sync=False):\n", "  \"\"\"Utility to benchmark `f` on each value in `inputs`.\"\"\"\n", "  times = []\n", "  for inp in inputs:\n", "    def _g():\n", "      if force_gpu_sync:\n", "        one = tnp.asarray(1)\n", "      f(inp)\n", "      if force_gpu_sync:\n", "        with tf.device(\"CPU:0\"):\n", "          tnp.copy(one) # Force a sync for GPU case\n", "\n", "    _g() # warmup\n", "    t = timeit.timeit(_g, number=number)\n", "    times.append(t * 1000. / number)\n", "  return times\n", "\n", "\n", "def plot(np_times, tnp_times, compiled_tnp_times, has_gpu, tnp_times_gpu):\n", "  \"\"\"Plot the different runtimes.\"\"\"\n", "  plt.xlabel(\"size\")\n", "  plt.ylabel(\"time (ms)\")\n", "  plt.title(\"Sigmoid benchmark: TF NumPy vs NumPy\")\n", "  plt.plot(sizes, np_times, label=\"NumPy\")\n", "  plt.plot(sizes, tnp_times, label=\"TF NumPy (CPU)\")\n", "  plt.plot(sizes, compiled_tnp_times, label=\"Compiled TF NumPy (CPU)\")\n", "  if has_gpu:\n", "    plt.plot(sizes, tnp_times_gpu, label=\"TF NumPy (GPU)\")\n", "  plt.legend()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-08-15T01:32:02.358643Z", "iopub.status.busy": "2024-08-15T01:32:02.358002Z", "iopub.status.idle": "2024-08-15T01:32:03.521684Z", "shell.execute_reply": "2024-08-15T01:32:03.521018Z" }, "id": "p-fs_H1lkLfV" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Define a simple implementation of `sigmoid`, and benchmark it using\n", "# NumPy and TensorFlow NumPy for different input sizes.\n", "\n", "def np_sigmoid(y):\n", "  return 1. / (1. + np.exp(-y))\n", "\n", "def tnp_sigmoid(y):\n", "  return 1. / (1. + tnp.exp(-y))\n", "\n", "@tf.function\n", "def compiled_tnp_sigmoid(y):\n", "  return tnp_sigmoid(y)\n", "\n", "sizes = (2 ** 0, 2 ** 5, 2 ** 10, 2 ** 15, 2 ** 20)\n", "np_inputs = [np.random.randn(size).astype(np.float32) for size in sizes]\n", "np_times = benchmark(np_sigmoid, np_inputs)\n", "\n", "with tf.device(\"/device:CPU:0\"):\n", "  tnp_inputs = [tnp.random.randn(size).astype(np.float32) for size in sizes]\n", "  tnp_times = benchmark(tnp_sigmoid, tnp_inputs)\n", "  compiled_tnp_times = benchmark(compiled_tnp_sigmoid, tnp_inputs)\n", "\n", "has_gpu = len(tf.config.list_logical_devices(\"GPU\"))\n", "if has_gpu:\n", "  with tf.device(\"/device:GPU:0\"):\n", "    tnp_inputs = [tnp.random.randn(size).astype(np.float32) for size in sizes]\n", "    tnp_times_gpu = benchmark(compiled_tnp_sigmoid, tnp_inputs, 100, True)\n", "else:\n", "  tnp_times_gpu = None\n", "plot(np_times, tnp_times, compiled_tnp_times, has_gpu, tnp_times_gpu)" ] }, { "cell_type": "markdown", "metadata": { "id": "ReK_9k5D8BZQ" }, "source": [ "## Further reading\n", "\n", "- [TensorFlow NumPy: Distributed Image Classification Tutorial](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/numpy_ops/g3doc/TensorFlow_Numpy_Distributed_Image_Classification.ipynb)\n", "- [TensorFlow NumPy: Keras and Distribution Strategy](\n", " https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/numpy_ops/g3doc/TensorFlow_NumPy_Keras_and_Distribution_Strategy.ipynb)\n", "- [Sentiment Analysis with Trax and TensorFlow NumPy](\n", " https://github.com/google/trax/blob/master/trax/tf_numpy_and_keras.ipynb)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "name": 
"tf_numpy.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 0 }