{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "iPpI7RaYoZuE" }, "source": [ "##### Copyright 2018 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "hro2InpHobKk" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "U9i2Dsh-ziXr" }, "source": [ "# 自定义基础知识:张量和运算" ] }, { "cell_type": "markdown", "metadata": { "id": "Hndw-YcxoOJK" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
在 TensorFlow.org 查看在 Google Colab 中运行 在 Github 上查看源代码下载笔记本
" ] }, { "cell_type": "markdown", "metadata": { "id": "6sILUVbHoSgH" }, "source": [ "本 TensorFlow 入门教程可帮助您了解如何:\n", "\n", "- 导入所需的软件包。\n", "- 创建和使用张量。\n", "- 使用 GPU 加速。\n", "- 使用 `tf.data.Dataset` 构建数据流水线。" ] }, { "cell_type": "markdown", "metadata": { "id": "z1JcS5iBXMRO" }, "source": [ "## 导入 TensorFlow\n", "\n", "首先,导入 `tensorflow` 模块。从 TensorFlow 2 开始,会默认打开 Eager Execution。Eager Execution 可以使 TensorFlow 的前端交互性更高,稍后将更详细地探索它。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "vjBPmYjLdFmk" }, "outputs": [], "source": [ "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": { "id": "H9UySOPLXdaw" }, "source": [ "## 张量\n", "\n", "张量是一个多维数组。与 NumPy `ndarray` 对象类似,`tf.Tensor` 对象也具有数据类型和形状。此外,`tf.Tensor` 可以驻留在加速器内存(例如 GPU)中。TensorFlow 提供了丰富的运算库(`tf.math.add`、`tf.linalg.matmul` 和 `tf.linalg.inv` 等),这些运算使用和生成 `tf.Tensor`。这些运算会自动转换原生 Python 类型,例如:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "code", "id": "ngUe237Wt48W" }, "outputs": [], "source": [ "print(tf.math.add(1, 2))\n", "print(tf.math.add([1, 2], [3, 4]))\n", "print(tf.math.square(5))\n", "print(tf.math.reduce_sum([1, 2, 3]))\n", "\n", "# Operator overloading is also supported\n", "print(tf.math.square(2) + tf.math.square(3))" ] }, { "cell_type": "markdown", "metadata": { "id": "IDY4WsYRhP81" }, "source": [ "每个 `tf.Tensor` 都具有形状和数据类型:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "srYWH1MdJNG7" }, "outputs": [], "source": [ "x = tf.linalg.matmul([[1]], [[2, 3]])\n", "print(x)\n", "print(x.shape)\n", "print(x.dtype)" ] }, { "cell_type": "markdown", "metadata": { "id": "eBPw8e8vrsom" }, "source": [ "NumPy 数组与 `tf.Tensor` 之间最明显的区别是:\n", "\n", "1. 张量可以驻留在加速器内存(例如 GPU、TPU)中。\n", "2. 张量不可变。" ] }, { "cell_type": "markdown", "metadata": { "id": "Dwi1tdW3JBw6" }, "source": [ "### NumPy 兼容性\n", "\n", "在 TensorFlow `tf.Tensor` 和 NumPy `ndarray` 之间进行转换非常容易:\n", "\n", "- TensorFlow 运算会自动将 NumPy ndarray 转换为张量。\n", "- NumPy 运算会自动将张量转换为 NumPy ndarray。\n", "\n", "张量可以使用 `.numpy()` 方法显式转换为 NumPy ndarray。这些转换的开销通常比较小,因为如果可行,数组和 `tf.Tensor` 将共享底层内存表示。但是,共享底层表示并非始终可行,因为 `tf.Tensor` 可能托管在 GPU 内存中,而 NumPy 数组则始终驻留在主机内存中,所以转换涉及从 GPU 向主机内存复制的过程。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lCUWzso6mbqR" }, "outputs": [], "source": [ "import numpy as np\n", "\n", "ndarray = np.ones([3, 3])\n", "\n", "print(\"TensorFlow operations convert numpy arrays to Tensors automatically\")\n", "tensor = tf.math.multiply(ndarray, 42)\n", "print(tensor)\n", "\n", "\n", "print(\"And NumPy operations convert Tensors to NumPy arrays automatically\")\n", "print(np.add(tensor, 1))\n", "\n", "print(\"The .numpy() method explicitly converts a Tensor to a numpy array\")\n", "print(tensor.numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "PBNP8yTRfu_X" }, "source": [ "## GPU 加速\n", "\n", "许多 TensorFlow 操作都可使用 GPU 来加速计算。无需任何注释,TensorFlow 会自动决定是使用 GPU 还是 CPU 进行操作 - 如有必要,会在 CPU 和 GPU 内存之间复制张量。 由操作生成的张量通常驻留在执行操作的设备的内存中,例如:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "code", "id": "3Twf_Rw-gQFM" }, "outputs": [], "source": [ "x = tf.random.uniform([3, 3])\n", "\n", "print(\"Is there a GPU available: \"),\n", "print(tf.config.list_physical_devices(\"GPU\"))\n", "\n", "print(\"Is the Tensor on GPU #0: \"),\n", "print(x.device.endswith('GPU:0'))" ] }, { "cell_type": "markdown", "metadata": { "id": "vpgYzgVXW2Ud" }, "source": [ "### 设备名称\n", "\n", "`Tensor.device` 属性提供存放张量内容的设备的完全限定字符串名称。此名称会编码许多详细信息,例如执行此程序的主机的网络地址标识符以及该主机内设备的标识符。这是分布式执行 TensorFlow 程序所必需的信息。如果将张量放置在主机的第 `N` 个 GPU 上,则该字符串将以 `GPU:` 结尾。" ] }, { "cell_type": "markdown", "metadata": { "id": "ZWZQCimzuqyP" }, "source": [ "### 显式设备放置\n", "\n", "在 TensorFlow 中,*放置*指的是各个运算在设备上如何分配(放置)以执行。如上所述,如果没有提供明确的指导,TensorFlow 将自动决定执行运算的设备,并在需要时将张量复制到该设备。\n", "\n", "不过,可以使用 `tf.device` 上下文管理器将 TensorFlow 运算显式放置在特定设备上。例如:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RjkNZTuauy-Q" }, "outputs": [], "source": [ "import time\n", "\n", "def time_matmul(x):\n", " start = time.time()\n", " for loop in range(10):\n", " tf.linalg.matmul(x, x)\n", "\n", " result = time.time()-start\n", "\n", " print(\"10 loops: {:0.2f}ms\".format(1000*result))\n", "\n", "# Force execution on CPU\n", "print(\"On CPU:\")\n", "with tf.device(\"CPU:0\"):\n", " x = tf.random.uniform([1000, 1000])\n", " assert x.device.endswith(\"CPU:0\")\n", " time_matmul(x)\n", "\n", "# Force execution on GPU #0 if available\n", "if tf.config.list_physical_devices(\"GPU\"):\n", " print(\"On GPU:\")\n", " with tf.device(\"GPU:0\"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.\n", " x = tf.random.uniform([1000, 1000])\n", " assert x.device.endswith(\"GPU:0\")\n", " time_matmul(x)" ] }, { "cell_type": "markdown", "metadata": { "id": "o1K4dlhhHtQj" }, "source": [ "## 数据集\n", "\n", "本部分使用 `tf.data.Dataset` API 构建为模型提供数据的流水线。`tf.data.Dataset` 用于从简单、可重用的数据流水线构建高性能、复杂的输入流水线,为模型的训练或评估循环馈送数据。(有关详情,请参阅 [tf.data:构建 TensorFlow 输入流水线](../../guide/data.ipynb)指南。)" ] }, { "cell_type": "markdown", "metadata": { "id": "zI0fmOynH-Ne" }, "source": [ "### 创建源 `Dataset`\n", "\n", "使用诸如 `tf.data.Dataset.from_tensors`、`tf.data.Dataset.from_tensor_slices` 等工厂函数之一,或使用从诸如 `tf.data.TextLineDataset` 或 `tf.data.TFRecordDataset` 等文件读取的对象创建*源*数据集。有关详情,请参阅 [tf.data:构建 TensorFlow 输入流水线](../../guide/data.ipynb)指南的*读取输入数据*部分。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "F04fVOHQIBiG" }, "outputs": [], "source": [ "ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])\n", "\n", "# Create a CSV file\n", "import tempfile\n", "_, filename = tempfile.mkstemp()\n", "\n", "with open(filename, 'w') as f:\n", " f.write(\"\"\"Line 1\n", "Line 2\n", "Line 3\n", " \"\"\")\n", "\n", "ds_file = tf.data.TextLineDataset(filename)" ] }, { "cell_type": "markdown", "metadata": { "id": "vbxIhC-5IPdf" }, "source": [ "### 应用转换\n", "\n", "使用 `tf.data.Dataset.map`、`tf.data.Dataset.batch` 和 `tf.data.Dataset.shuffle` 等转换函数对数据集记录应用转换。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "uXSDZWE-ISsd" }, "outputs": [], "source": [ "ds_tensors = ds_tensors.map(tf.math.square).shuffle(2).batch(2)\n", "\n", "ds_file = ds_file.batch(2)" ] }, { "cell_type": "markdown", "metadata": { "id": "A8X1GNfoIZKJ" }, "source": [ "### 迭代\n", "\n", "`tf.data.Dataset` 对象支持迭代以循环记录:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ws-WKRk5Ic6-" }, "outputs": [], "source": [ "print('Elements of ds_tensors:')\n", "for x in ds_tensors:\n", " print(x)\n", "\n", "print('\\nElements in ds_file:')\n", "for x in ds_file:\n", " print(x)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "name": "basics.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }