{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "1ljvLya59ep5" }, "source": [ "##### Copyright 2019 The TensorFlow Authors.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2023-11-07T19:13:09.853615Z", "iopub.status.busy": "2023-11-07T19:13:09.853168Z", "iopub.status.idle": "2023-11-07T19:13:09.857252Z", "shell.execute_reply": "2023-11-07T19:13:09.856656Z" }, "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "VcQIa1uG86Wh" }, "source": [ "# DTensor 概念" ] }, { "cell_type": "markdown", "metadata": { "id": "6dWNQEum9AfY" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
在 TensorFlow.org 上查看 | 在 Google Colab 中运行 | 在 GitHub 上查看源代码 | 下载笔记本
" ] }, { "cell_type": "markdown", "metadata": { "id": "MGZuakHVlVQf" }, "source": [ "## 概述\n", "\n", "此 Colab 将介绍 DTensor,它是用于同步分布式计算的 TensorFlow 扩展程序。\n", "\n", "DTensor 提供了一个全局编程模型,使开发者能够编写以全局方式在张量上进行运算,同时在内部管理跨设备分布的应用。DTensor 会通过称为*[单程序多数据 (SPMD)](https://en.wikipedia.org/wiki/SPMD) 扩展*的过程根据分片指令分布程序和张量。\n", "\n", "通过将应用与分片指令分离,DTensor 可以实现在单个设备、多个设备甚至多个客户端上运行相同的应用,同时保留其全局语义。\n", "\n", "本指南将介绍用于分布式计算的 DTensor 的概念,以及 DTensor 如何与 TensorFlow 集成。要查看在模型训练中使用 DTensor 的演示,请参阅[使用 DTensor 进行分布式训练](https://tensorflow.google.cn/tutorials/distribute/dtensor_ml_tutorial)教程。" ] }, { "cell_type": "markdown", "metadata": { "id": "h7ZTDq7KngwA" }, "source": [ "## 安装\n", "\n", "DTensor 是 TensorFlow 2.9.0 版本的一部分,并且包含在自 2022 年 4 月 9 日起的 TensorFlow Nightly 构建中。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:09.861219Z", "iopub.status.busy": "2023-11-07T19:13:09.860675Z", "iopub.status.idle": "2023-11-07T19:13:12.006908Z", "shell.execute_reply": "2023-11-07T19:13:12.005983Z" }, "id": "OKaPw8vwwZAC" }, "outputs": [], "source": [ "!pip install --quiet --upgrade --pre tensorflow" ] }, { "cell_type": "markdown", "metadata": { "id": "O3pG29uZIWYO" }, "source": [ "安装后,导入 `tensorflow` 和 `tf.experimental.dtensor`。然后,将 TensorFlow 配置为使用 6 个虚拟 CPU。\n", "\n", "本示例使用了 vCPU,但 DTensor 在 CPU、GPU 或 TPU 设备上的工作方式相同。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:12.011826Z", "iopub.status.busy": "2023-11-07T19:13:12.011109Z", "iopub.status.idle": "2023-11-07T19:13:16.313392Z", "shell.execute_reply": "2023-11-07T19:13:16.312690Z" }, "id": "Q92lo0zjwej8" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-11-07 19:13:12.446689: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2023-11-07 19:13:12.446732: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2023-11-07 19:13:12.448299: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow version: 2.15.0-rc1\n" ] }, { "data": { "text/plain": [ "[LogicalDevice(name='/device:CPU:0', device_type='CPU'),\n", " LogicalDevice(name='/device:CPU:1', device_type='CPU'),\n", " LogicalDevice(name='/device:CPU:2', device_type='CPU'),\n", " LogicalDevice(name='/device:CPU:3', device_type='CPU'),\n", " LogicalDevice(name='/device:CPU:4', device_type='CPU'),\n", " LogicalDevice(name='/device:CPU:5', device_type='CPU')]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import tensorflow as tf\n", "from tensorflow.experimental import dtensor\n", "\n", "print('TensorFlow version:', tf.__version__)\n", "\n", "def configure_virtual_cpus(ncpu):\n", " phy_devices = tf.config.list_physical_devices('CPU')\n", " tf.config.set_logical_device_configuration(phy_devices[0], [\n", " tf.config.LogicalDeviceConfiguration(),\n", " ] * ncpu)\n", "\n", "configure_virtual_cpus(6)\n", "DEVICES = [f'CPU:{i}' for i in range(6)]\n", "\n", "tf.config.list_logical_devices('CPU')" ] }, { "cell_type": "markdown", "metadata": { "id": 
"O-lsrxUnlsCC" }, "source": [ "## DTensor 的分布式张量模型\n", "\n", "DTensor 引入了两个概念:`dtensor.Mesh` 和 `dtensor.Layout`。它们是对跨拓扑相关设备的张量分片建模的抽象化。\n", "\n", "- `Mesh` 定义用于计算的设备列表。\n", "- `Layout` 定义如何在 `Mesh` 上执行张量维度分片。" ] }, { "cell_type": "markdown", "metadata": { "id": "JjiHaH0ql9yo" }, "source": [ "### 网格\n", "\n", "`Mesh` 表示一组设备的逻辑笛卡尔拓扑。笛卡尔网格的每个维度都被称为**网格维度**,以名称进行引用。同一 `Mesh` 内的网格维度名称必须唯一。\n", "\n", "`Layout` 会引用网格维度的名称来描述 `tf.Tensor` 沿其每个轴的分片行为。我们将在后面的 `Layout` 部分中进行更详细的说明。\n", "\n", "`Mesh` 可被视为设备的多维数组。" ] }, { "cell_type": "markdown", "metadata": { "id": "_J6cOieEbaUw" }, "source": [ "在一维 `Mesh` 中,所有设备会以单一网格维度构成列表。以下示例使用 `dtensor.create_mesh` 从 6 个 CPU 设备沿网格维度 `'x'` 创建了一个网格,大小为 6 个设备:\n", "\n", "\n", "\"具有 \n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.317343Z", "iopub.status.busy": "2023-11-07T19:13:16.316852Z", "iopub.status.idle": "2023-11-07T19:13:16.321685Z", "shell.execute_reply": "2023-11-07T19:13:16.320952Z" }, "id": "QLH5fgdBmA58" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mesh.from_string(|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5)\n" ] } ], "source": [ "mesh_1d = dtensor.create_mesh([('x', 6)], devices=DEVICES)\n", "print(mesh_1d)" ] }, { "cell_type": "markdown", "metadata": { "id": "hSZwaUwnEgXB" }, "source": [ "`Mesh` 也可以是多维的。在以下示例中,6 个 CPU 设备构成了一个 `3x2` 网格,其中网格维度 `'x'` 的大小为 3 个设备,网格维度 `'y'` 的大小为 2 个设备:\n", "\n", "\"具有" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.325452Z", "iopub.status.busy": "2023-11-07T19:13:16.324846Z", "iopub.status.idle": "2023-11-07T19:13:16.329194Z", "shell.execute_reply": "2023-11-07T19:13:16.328514Z" }, "id": "op6TmKUQE-sZ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mesh.from_string(|x=3,y=2|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5)\n" ] } ], "source": [ "mesh_2d = dtensor.create_mesh([('x', 3), ('y', 2)], devices=DEVICES)\n", "print(mesh_2d)" ] }, { "cell_type": "markdown", "metadata": { "id": "deAqdrDPFn2f" }, "source": [ "### 布局\n", "\n", "**`Layout`** 指定张量在 `Mesh` 上的分布或分片方式。\n", "\n", "注:为了避免在 `Mesh` 和 `Layout` 之间发生混淆,本指南中将*维度*这一术语限定为仅与 `Mesh` 相关,而*轴*这一术语则与 `Tensor` 和 `Layout` 相关。\n", "\n", "`Layout` 的秩应与应用该 `Layout` 的 `Tensor` 的秩相同。对于 `Tensor` 的每个轴,`Layout` 可能指定网格维度以对张量进行分片,或者将轴指定为“非分片”。张量会在任何未分片的网格维度间进行复制。\n", "\n", "`Layout` 的秩无需与 `Mesh` 的维数相符。`Layout` 的 `unsharded` 轴无需与网格维度相关,并且 `unsharded` 网格维度也无需与 `layout` 轴相关。\n", "\n", "\"dtensor" ] }, { "cell_type": "markdown", "metadata": { "id": "Px_bF1c-bQ7e" }, "source": [ "让我们分析在上一部分中创建的 `Mesh` 的几个 `Layout` 示例。" ] }, { "cell_type": "markdown", "metadata": { "id": "fqzCNlWAbm-c" }, "source": [ "在诸如 `[(\"x\", 6)]`之类的一维网格(上一部分中的 `mesh_1d`)上,`Layout([\"unsharded\", \"unsharded\"], mesh_1d)` 是在 6 个设备间复制的 2 秩张量的布局。\"在" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.333014Z", "iopub.status.busy": 
"2023-11-07T19:13:16.332436Z", "iopub.status.idle": "2023-11-07T19:13:16.335799Z", "shell.execute_reply": "2023-11-07T19:13:16.335192Z" }, "id": "-a3EnmZag6x1" }, "outputs": [], "source": [ "layout = dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], mesh_1d)" ] }, { "cell_type": "markdown", "metadata": { "id": "ywRJwuLDt2Qq" }, "source": [ "使用相同的张量和网格,布局 `Layout(['unsharded', 'x'])` 将在 6 个设备上对张量的第二个轴进行分片。\n", "\n", "\n", "\"A " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.339215Z", "iopub.status.busy": "2023-11-07T19:13:16.338706Z", "iopub.status.idle": "2023-11-07T19:13:16.342109Z", "shell.execute_reply": "2023-11-07T19:13:16.341508Z" }, "id": "7BgqL0jUvV5a" }, "outputs": [], "source": [ "layout = dtensor.Layout([dtensor.UNSHARDED, 'x'], mesh_1d)" ] }, { "cell_type": "markdown", "metadata": { "id": "DgciDNmK76l9" }, "source": [ "给定一个二维 3x2 网格,例如 `[(\"x\", 3), (\"y\", 2)]`(上一部分中的 `mesh_2d`),`Layout([\"y\", \"x\"], mesh_2d)` 是 2 秩 `Tensor` 的布局,其第一个轴在网格维度 `\"y\"` 上分片,第二个轴在网格维度 `\"x\"` 上分片。" ] }, { "cell_type": "markdown", "metadata": { "id": "Eyp_qOSyvieo" }, "source": [ "\"A \n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.345758Z", "iopub.status.busy": "2023-11-07T19:13:16.345197Z", "iopub.status.idle": "2023-11-07T19:13:16.348349Z", "shell.execute_reply": "2023-11-07T19:13:16.347721Z" }, "id": "p8OrehEuhPbS" }, "outputs": [], "source": [ "layout = dtensor.Layout(['y', 'x'], mesh_2d)" ] }, { "cell_type": "markdown", "metadata": { "id": "1Kyg0V3ehMNJ" }, "source": [ "对于同一 `mesh_2d`,布局 `Layout([\"x\", dtensor.UNSHARDED], mesh_2d)` 是跨 `\"y\"` 复制的 2 秩 `Tensor` 的布局,其第一个轴在网格维度 `x` 上分片。\n", "\n", "\n", "\"A \n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.352183Z", "iopub.status.busy": "2023-11-07T19:13:16.351509Z", "iopub.status.idle": "2023-11-07T19:13:16.354776Z", "shell.execute_reply": "2023-11-07T19:13:16.354193Z" }, "id": "IkWe6mVl7uRb" }, "outputs": [], "source": [ "layout = dtensor.Layout([\"x\", dtensor.UNSHARDED], mesh_2d)" ] }, { "cell_type": "markdown", "metadata": { "id": "TTalu6M-ISYb" }, "source": [ "### 单客户端和多客户端应用\n", "\n", "DTensor 支持单客户端和多客户端应用。Python 内核 Colab 就是单客户端 DTensor 应用的示例,其中包含一个 Python 进程。\n", "\n", "在多客户端 DTensor 应用中,多个 Python 进程会共同作为连贯应用执行。多客户端 DTensor 应用中 `Mesh` 的笛卡尔网格可跨设备,无论它们是本地连接到当前客户端还是远程连接到其他客户端。`Mesh` 使用的所有设备的集合称为*全局设备列表*。\n", "\n", "在多客户端 DTensor 应用中创建 `Mesh` 是一种集合运算,其中*全局设备列表*对于所有参与客户端都是相同的,并且 `Mesh` 的创建会起到全局屏障的作用。\n", "\n", "在 `Mesh` 创建期间,每个客户端都会提供其*局部设备列表*以及预期的*全局设备列表*。DTensor 会验证两个列表是否一致。有关多客户端网格创建和*全局设备列表*的更多信息,请参阅 `dtensor.create_mesh` 和 `dtensor.create_distributed_mesh` 的 API 文档。\n", "\n", "可以将单客户端视为一种仅含 1 个客户端的多客户端特例。在单客户端应用中,*全局设备列表*与*局部设备列表*相同。\n" ] }, { "cell_type": "markdown", "metadata": { "id": "P_F7DWkXkB4w" }, "source": [ "## DTensor 作为分片张量\n", "\n", "现在,让我们开始使用 `DTensor` 进行编码。辅助函数 `dtensor_from_array` 演示了如何从类似于 `tf.Tensor` 的对象创建 DTensor。该函数执行 2 个步骤:\n", "\n", "- 将张量复制到网格上的每个设备。\n", "- 根据参数中请求的布局对副本进行分片。" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.358123Z", "iopub.status.busy": "2023-11-07T19:13:16.357896Z", "iopub.status.idle": "2023-11-07T19:13:16.362446Z", "shell.execute_reply": "2023-11-07T19:13:16.361861Z" }, "id": "s6aws-b8dN9L" }, "outputs": [], "source": [ "def dtensor_from_array(arr, layout, shape=None, dtype=None):\n", 
" \"\"\"Convert a DTensor from something that looks like an array or Tensor.\n", "\n", " This function is convenient for quick doodling DTensors from a known,\n", " unsharded data object in a single-client environment. This is not the\n", " most efficient way of creating a DTensor, but it will do for this\n", " tutorial.\n", " \"\"\"\n", " if shape is not None or dtype is not None:\n", " arr = tf.constant(arr, shape=shape, dtype=dtype)\n", "\n", " # replicate the input to the mesh\n", " a = dtensor.copy_to_mesh(arr,\n", " layout=dtensor.Layout.replicated(layout.mesh, rank=layout.rank))\n", " # shard the copy to the desirable layout\n", " return dtensor.relayout(a, layout=layout)" ] }, { "cell_type": "markdown", "metadata": { "id": "r3o6IysrlGMu" }, "source": [ "### DTensor 剖析\n", "\n", "DTensor 是一个 `tf.Tensor` 对象,但增加了用于定义其分片行为的 `Layout` 注解。DTensor 包含以下内容:\n", "\n", "- 全局张量元数据,包括张量的全局形状和数据类型。\n", "- `Layout`,用于定义 `Tensor` 所属的 `Mesh`,以及 `Tensor` 如何分片至 `Mesh`。\n", "- **张量分量**列表,`Mesh` 中每个本地设备一个条目。\n", "\n", "您可以使用 `dtensor_from_array` 创建您的第一个 DTensor(即 `my_first_dtensor`),并检查其内容。" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.365779Z", "iopub.status.busy": "2023-11-07T19:13:16.365204Z", "iopub.status.idle": "2023-11-07T19:13:16.709388Z", "shell.execute_reply": "2023-11-07T19:13:16.708673Z" }, "id": "mQu_nScGUvYH" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor([0 1], layout=\"sharding_specs:unsharded, mesh:|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(2,), dtype=int32)\n", "global shape: (2,)\n", "dtype: \n" ] } ], "source": [ "mesh = dtensor.create_mesh([(\"x\", 6)], devices=DEVICES)\n", "layout = dtensor.Layout([dtensor.UNSHARDED], mesh)\n", "\n", "my_first_dtensor = dtensor_from_array([0, 1], layout)\n", "\n", "# Examine the dtensor content\n", "print(my_first_dtensor)\n", "print(\"global shape:\", my_first_dtensor.shape)\n", "print(\"dtype:\", my_first_dtensor.dtype)" ] }, { "cell_type": "markdown", "metadata": { "id": "r8LQy1nqmvFy" }, "source": [ "#### 布局和 `fetch_layout`\n", "\n", "DTensor 的布局不是 `tf.Tensor` 的常规特性。而 DTensor 提供了用于访问 DTensor 布局的函数 `dtensor.fetch_layout`。" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.713440Z", "iopub.status.busy": "2023-11-07T19:13:16.712775Z", "iopub.status.idle": "2023-11-07T19:13:16.717070Z", "shell.execute_reply": "2023-11-07T19:13:16.716446Z" }, "id": "dCSFyaAjmzGu" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Layout.from_string(sharding_specs:unsharded, mesh:|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5)\n" ] } ], "source": [ "print(dtensor.fetch_layout(my_first_dtensor))\n", "assert layout == dtensor.fetch_layout(my_first_dtensor)" ] }, { "cell_type": "markdown", "metadata": { "id": "ed7i3l2lmatm" }, "source": [ "#### 张量分量、`pack` 和 `unpack`\n", "\n", "DTensor 包含一个**张量分量**列表。`Mesh` 中设备的张量分量是 `Tensor` 对象,后者代表了存储在此设备上的全局 DTensor 
的片段。\n", "\n", "DTensor 可以通过 `dtensor.unpack` 解包为张量分量。您可以利用 `dtensor.unpack` 来检查 DTensor 的分量,并确认它们位于 `Mesh` 的所有设备上。\n", "\n", "请注意,全局视图中张量分量的位置可能相互重叠。例如,在完全复制布局的情况下,所有分量都是全局张量的相同副本。" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.720668Z", "iopub.status.busy": "2023-11-07T19:13:16.719953Z", "iopub.status.idle": "2023-11-07T19:13:16.724419Z", "shell.execute_reply": "2023-11-07T19:13:16.723814Z" }, "id": "BGbjqVAOnXMk" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Device: /job:localhost/replica:0/task:0/device:CPU:0 , tf.Tensor([0 1], shape=(2,), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:1 , tf.Tensor([0 1], shape=(2,), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:2 , tf.Tensor([0 1], shape=(2,), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:3 , tf.Tensor([0 1], shape=(2,), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:4 , tf.Tensor([0 1], shape=(2,), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:5 , tf.Tensor([0 1], shape=(2,), dtype=int32)\n" ] } ], "source": [ "for component_tensor in dtensor.unpack(my_first_dtensor):\n", " print(\"Device:\", component_tensor.device, \",\", component_tensor)" ] }, { "cell_type": "markdown", "metadata": { "id": "-tqIQM52k788" }, "source": [ "如上所示,`my_first_dtensor` 是复制到所有 6 个设备的 `[0, 1]` 的张量。" ] }, { "cell_type": "markdown", "metadata": { "id": "6By3k-CGn3yv" }, "source": [ "`dtensor.unpack` 的逆运算为 `dtensor.pack`。张量分量可以重新打包回 DTensor。\n", "\n", "分量必须具有相同的秩和数据类型,即要还原的 DTensor 的秩和数据类型。不过,作为 `dtensor.unpack` 的输入,张量分量没有严格的设备放置要求:该函数会自动将张量分量复制到其各自对应的设备。\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.728137Z", "iopub.status.busy": "2023-11-07T19:13:16.727568Z", "iopub.status.idle": "2023-11-07T19:13:16.732019Z", "shell.execute_reply": "2023-11-07T19:13:16.731419Z" }, "id": "9lT-6qQwxOgf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor([0 1], layout=\"sharding_specs:unsharded, mesh:|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(2,), dtype=int32)\n" ] } ], "source": [ "packed_dtensor = dtensor.pack(\n", " [[0, 1], [0, 1], [0, 1],\n", " [0, 1], [0, 1], [0, 1]],\n", " layout=layout\n", ")\n", "print(packed_dtensor)" ] }, { "cell_type": "markdown", "metadata": { "id": "zvS3autrpK2U" }, "source": [ "### 将 DTensor 分片到网格\n", "\n", "到目前为止,您已经使用过 `my_first_dtensor`,它是在一维 `Mesh` 中完全复制的 1 秩 DTensor。\n", "\n", "接下来,我们要创建并检查在二维 `Mesh` 中分片的 DTensor。下一个示例使用 3x2 `Mesh` 在 6 个 CPU 设备上执行此操作,其中网格维度 `'x'` 的大小为 3 个设备,网格维度 `'y'` 的大小为 2 个设备。" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.735221Z", "iopub.status.busy": "2023-11-07T19:13:16.734995Z", "iopub.status.idle": "2023-11-07T19:13:16.738541Z", "shell.execute_reply": "2023-11-07T19:13:16.737962Z" }, "id": "KWb9Ae0VJ-Rc" }, "outputs": [], "source": [ "mesh = dtensor.create_mesh([(\"x\", 3), (\"y\", 2)], devices=DEVICES)" ] }, { "cell_type": "markdown", "metadata": { "id": "ndSeQSFWKQk9" }, "source": [ "#### 二维网格上的完全分片 2 秩张量\n", "\n", "创建一个 3x2 的 2 秩 DTensor,将其第一个轴沿网格维度 
`'x'` 分片,将其第二个轴沿网格维度 `'y'` 分片。\n", "\n", "- 由于张量形状等于沿所有分片轴的网格维度,每个设备都会接收 DTensor 的一个元素。\n", "- 张量分量的秩与全局形状的秩始终相同。DTensor 利用这种惯例来方便地保存用于定位张量分量与全局 DTensor 之间关系的信息。" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.742275Z", "iopub.status.busy": "2023-11-07T19:13:16.741751Z", "iopub.status.idle": "2023-11-07T19:13:16.769170Z", "shell.execute_reply": "2023-11-07T19:13:16.768510Z" }, "id": "ax_ZHouJp1MX" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Device: /job:localhost/replica:0/task:0/device:CPU:0 , tf.Tensor([[0]], shape=(1, 1), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:1 , tf.Tensor([[1]], shape=(1, 1), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:2 , tf.Tensor([[2]], shape=(1, 1), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:3 , tf.Tensor([[3]], shape=(1, 1), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:4 , tf.Tensor([[4]], shape=(1, 1), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:5 , tf.Tensor([[5]], shape=(1, 1), dtype=int32)\n" ] } ], "source": [ "fully_sharded_dtensor = dtensor_from_array(\n", " tf.reshape(tf.range(6), (3, 2)),\n", " layout=dtensor.Layout([\"x\", \"y\"], mesh))\n", "\n", "for raw_component in dtensor.unpack(fully_sharded_dtensor):\n", " print(\"Device:\", raw_component.device, \",\", raw_component)" ] }, { "cell_type": "markdown", "metadata": { "id": "zhsLC-NgrC2p" }, "source": [ "#### 二维网格上的完全复制 2 秩张量\n", "\n", "为了对比,我们仍创建一个 3x2 的 2 秩 DTensor,完全复制到相同的二维网格。\n", "\n", "- 由于 DTensor 是完全复制的,每个设备都会接收 3x2 DTensor 的完整副本。\n", "- 张量分量的秩与全局形状的秩相同 – 这一事实不足为奇,因为在这种情况下,张量分量的形状无论如何都将与全局形状相同。" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.772852Z", "iopub.status.busy": "2023-11-07T19:13:16.772195Z", "iopub.status.idle": "2023-11-07T19:13:16.778890Z", "shell.execute_reply": "2023-11-07T19:13:16.778296Z" }, "id": "xmyC6H6Ec90P" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Device: /job:localhost/replica:0/task:0/device:CPU:0 , tf.Tensor(\n", "[[0 1]\n", " [2 3]\n", " [4 5]], shape=(3, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:1 , tf.Tensor(\n", "[[0 1]\n", " [2 3]\n", " [4 5]], shape=(3, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:2 , tf.Tensor(\n", "[[0 1]\n", " [2 3]\n", " [4 5]], shape=(3, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:3 , tf.Tensor(\n", "[[0 1]\n", " [2 3]\n", " [4 5]], shape=(3, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:4 , tf.Tensor(\n", "[[0 1]\n", " [2 3]\n", " [4 5]], shape=(3, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:5 , tf.Tensor(\n", "[[0 1]\n", " [2 3]\n", " [4 5]], shape=(3, 2), dtype=int32)\n" ] } ], "source": [ "fully_replicated_dtensor = dtensor_from_array(\n", " tf.reshape(tf.range(6), (3, 2)),\n", " layout=dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], mesh))\n", "# Or, layout=tensor.Layout.fully_replicated(mesh, rank=2)\n", "\n", "for component_tensor in dtensor.unpack(fully_replicated_dtensor):\n", " print(\"Device:\", component_tensor.device, \",\", component_tensor)" ] }, { "cell_type": "markdown", "metadata": { "id": "KWoyv_oHMzk1" }, "source": [ "#### 二维网格上的混合 2 秩张量\n", "\n", "介于完全分片与完全复制之间会如何?\n", "\n", "DTensor 允许采用混合 `Layout`:沿某些轴分片,但沿其他轴复制。\n", "\n", 
"例如,您可以通过以下方式对相同的 3x2 2 秩 DTensor 进行分片:\n", "\n", "- 第 1 个轴沿网格维度 `'x'` 分片。\n", "- 第 2 个轴沿网格维度 `'y'` 复制。\n", "\n", "要实现这种分片方案,您只需将第 2 个轴的分片规范从 `'y'` 更改为 `dtensor.UNSHARDED`,以指示您打算沿第 2 个轴进行复制。布局对象类似于 `Layout(['x', dtensor.UNSHARDED], mesh)`。" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.782453Z", "iopub.status.busy": "2023-11-07T19:13:16.781856Z", "iopub.status.idle": "2023-11-07T19:13:16.798086Z", "shell.execute_reply": "2023-11-07T19:13:16.797409Z" }, "id": "DygnbkQ1Lu42" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Device: /job:localhost/replica:0/task:0/device:CPU:0 , tf.Tensor([[0 1]], shape=(1, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:1 , tf.Tensor([[0 1]], shape=(1, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:2 , tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:3 , tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:4 , tf.Tensor([[4 5]], shape=(1, 2), dtype=int32)\n", "Device: /job:localhost/replica:0/task:0/device:CPU:5 , tf.Tensor([[4 5]], shape=(1, 2), dtype=int32)\n" ] } ], "source": [ "hybrid_sharded_dtensor = dtensor_from_array(\n", " tf.reshape(tf.range(6), (3, 2)),\n", " layout=dtensor.Layout(['x', dtensor.UNSHARDED], mesh))\n", "\n", "for component_tensor in dtensor.unpack(hybrid_sharded_dtensor):\n", " print(\"Device:\", component_tensor.device, \",\", component_tensor)" ] }, { "cell_type": "markdown", "metadata": { "id": "T7FtZ9kQRZgE" }, "source": [ "您可以检查创建的 DTensor 的张量分量并验证它们是否确实已根据您的方案进行了分片。使用图表说明情况可能会有所帮助:\n", "\n", "\"具有\n" ] }, { "cell_type": "markdown", "metadata": { "id": "auAkA38XjL-q" }, "source": [ "#### Tensor.numpy() 和分片 DTensor\n", "\n", "请注意,在分片 DTensor 上调用 `.numpy()` 方法会引发错误。错误的根本原理是为防止从多个计算设备向支持返回的 NumPy 数组的主机 CPU 设备意外收集数据。" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.802247Z", "iopub.status.busy": "2023-11-07T19:13:16.801645Z", "iopub.status.idle": "2023-11-07T19:13:16.806539Z", "shell.execute_reply": "2023-11-07T19:13:16.805875Z" }, "id": "hNdwmnL0jAXS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1]\n", " [2 3]\n", " [4 5]]\n", "got an error as expected for fully_sharded_dtensor\n", "got an error as expected for hybrid_sharded_dtensor\n" ] } ], "source": [ "print(fully_replicated_dtensor.numpy())\n", "\n", "try:\n", " fully_sharded_dtensor.numpy()\n", "except tf.errors.UnimplementedError:\n", " print(\"got an error as expected for fully_sharded_dtensor\")\n", "\n", "try:\n", " hybrid_sharded_dtensor.numpy()\n", "except tf.errors.UnimplementedError:\n", " print(\"got an error as expected for hybrid_sharded_dtensor\")" ] }, { "cell_type": "markdown", "metadata": { "id": "8WcMkiagPF_6" }, "source": [ "## DTensor 的相关 TensorFlow API\n", "\n", "DTensor 致力于以普适性的方式在您的程序中替代张量。使用 `tf.Tensor` 的 TensorFlow Python API(例如运算库函数、`tf.function`、`tf.GradientTape`)也可以与 DTensor 一起使用。\n", "\n", "为此,针对每个 [TensorFlow 计算图](https://tensorflow.google.cn/guide/intro_to_graphs),DTensor 都会在称为 *SPMD 扩展*的过程中生成并执行等效的 [SPMD](https://en.wikipedia.org/wiki/SPMD) 计算图。DTensor SPMD 扩展的几个关键步骤包括:\n", "\n", "- 在 TensorFlow 计算图中传播 DTensor 的分片 `Layout`\n", "- 使用张量分量上的等效 TensorFlow 运算重写全局 DTensor 上的 TensorFlow 运算,必要时插入集合和通信运算\n", "- 将后端中性 TensorFlow 运算降为后端特定 TensorFlow 运算。\n", "\n", "最后,**DTensor 成为张量的普适性替代**。\n", "\n", 
"注:DTensor 仍为实验性 API,这意味着您将探索和推动拓宽 DTensor 编程模型的边界和限制。\n", "\n", "可以通过两种方式触发 DTensor 执行:\n", "\n", "- DTensor 为 Python 函数的运算对象,例如,如果 `a`、`b` 或两者都是 DTensor,则 `tf.matmul(a, b)` 将通过 DTensor 运行。\n", "- 请求 Python 函数的结果为 DTensor,例如,`dtensor.call_with_layout(tf.ones, layout, shape=(3, 2))` 将通过 DTensor 运行,因为我们请求根据 `layout` 对 tf.ones 的输出进行分片。" ] }, { "cell_type": "markdown", "metadata": { "id": "urKzmqAoPssT" }, "source": [ "### DTensor 作为运算对象\n", "\n", "许多 TensorFlow API 函数都会将 `tf.Tensor` 作为其运算对象,并返回 `tf.Tensor` 作为其结果。对于这些函数,您可以通过传入 DTensor 作为运算对象来表明要通过 DTensor 运行函数的意图。本部分将以 `tf.matmul(a, b)` 为例。" ] }, { "cell_type": "markdown", "metadata": { "id": "7LO8ZT7iWVga" }, "source": [ "#### 完全复制输入和输出\n", "\n", "在这种情况下,DTensor 会被完全复制。在 `Mesh` 的每个设备上,\n", "\n", "- 运算对象 `a` 的张量分量为 `[[1, 2, 3], [4, 5, 6]]` (2x3)\n", "- 运算对象 `b` 的张量分量为 `[[6, 5], [4, 3], [2, 1]]` (3x2)\n", "- 计算由 `(2x3, 3x2) -> 2x2` 的单个 `MatMul` 组成\n", "- 结果 `c` 的张量分量为 `[[20, 14], [56,41]]` (2x2)\n", "\n", "浮点 mul 运算的总数为 `6 device * 4 result * 3 mul = 72`。" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.810166Z", "iopub.status.busy": "2023-11-07T19:13:16.809678Z", "iopub.status.idle": "2023-11-07T19:13:16.819158Z", "shell.execute_reply": "2023-11-07T19:13:16.818481Z" }, "id": "TiZf2J9JNd2D" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sharding spec: ['unsharded', 'unsharded']\n", "components:\n", "/job:localhost/replica:0/task:0/device:CPU:0 [[20 14]\n", " [56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:1 [[20 14]\n", " [56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:2 [[20 14]\n", " [56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:3 [[20 14]\n", " [56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:4 [[20 14]\n", " [56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:5 [[20 14]\n", " [56 41]]\n" ] } ], "source": [ "mesh = dtensor.create_mesh([(\"x\", 6)], devices=DEVICES)\n", "layout = dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], mesh)\n", "a = dtensor_from_array([[1, 2, 3], [4, 5, 6]], layout=layout)\n", "b = dtensor_from_array([[6, 5], [4, 3], [2, 1]], layout=layout)\n", "\n", "c = tf.matmul(a, b) # runs 6 identical matmuls in parallel on 6 devices\n", "\n", "# `c` is a DTensor replicated on all devices (same as `a` and `b`)\n", "print('Sharding spec:', dtensor.fetch_layout(c).sharding_specs)\n", "print(\"components:\")\n", "for component_tensor in dtensor.unpack(c):\n", " print(component_tensor.device, component_tensor.numpy())\n" ] }, { "cell_type": "markdown", "metadata": { "id": "QXtR9qgKWgWV" }, "source": [ "#### 沿收缩轴分片运算对象\n", "\n", "您可以通过对运算对象 `a` 和 `b` 进行分片来降低每个设备的计算量。`tf.matmul` 的一种热门分片方案为沿收缩轴对运算对象进行分片,这意味着沿第二个轴对 `a` 进行分片,沿第一个轴对 `b` 进行分片。\n", "\n", "在此方案下分片的全局矩阵乘积可以通过同时运行的局部 matmul 高效地执行,然后进行集合归约以聚合局部结果。这也是实现分布式矩阵点积的[规范方式](https://github.com/open-mpi/ompi/blob/ee87ec391f48512d3718fc7c8b13596403a09056/docs/man-openmpi/man3/MPI_Reduce.3.rst?plain=1#L265)。\n", "\n", "浮点 mul 运算的总数为 `6 devices * 4 result * 1 = 24`,仅为上述完全复制情况 (72) 的 1/3。系数 3 源自于沿网格维度 `x` 在 `3` 个设备间的共享。\n", "\n", "减少按顺序运行的运算数量是同步模型并行加速训练所采用的主要机制。" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.822819Z", "iopub.status.busy": "2023-11-07T19:13:16.822183Z", "iopub.status.idle": "2023-11-07T19:13:16.858253Z", "shell.execute_reply": "2023-11-07T19:13:16.857444Z" }, "id": "EyVAUvMePbms" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Sharding spec: ['unsharded', 'unsharded']\n" ] } ], "source": [ "mesh = dtensor.create_mesh([(\"x\", 3), (\"y\", 2)], devices=DEVICES)\n", "a_layout = dtensor.Layout([dtensor.UNSHARDED, 'x'], mesh)\n", "a = dtensor_from_array([[1, 2, 3], [4, 5, 6]], layout=a_layout)\n", "b_layout = dtensor.Layout(['x', dtensor.UNSHARDED], mesh)\n", "b = dtensor_from_array([[6, 5], [4, 3], [2, 1]], layout=b_layout)\n", "\n", "c = tf.matmul(a, b)\n", "# `c` is a DTensor replicated on all devices (same as `a` and `b`)\n", "print('Sharding spec:', dtensor.fetch_layout(c).sharding_specs)" ] }, { "cell_type": "markdown", "metadata": { "id": "IhD8yYgJiCEh" }, "source": [ "#### 额外分片\n", "\n", "您可以对输入执行额外的分片,它们会适当地转移到结果中。例如,您可以在网格维度 `'y'` 上对运算对象 `a` 沿其第一个轴应用额外分片。额外分片将被转移到结果 `c` 的第一个轴。\n", "\n", "浮点 mul 运算的总数为 `6 devices * 2 result * 1 = 12`,仅为上述上述情况 (24) 的 1/2。系数 2 源自于沿网格维度 `y` 在 `2` 个设备间的共享。" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:16.862042Z", "iopub.status.busy": "2023-11-07T19:13:16.861367Z", "iopub.status.idle": "2023-11-07T19:13:17.580982Z", "shell.execute_reply": "2023-11-07T19:13:17.580158Z" }, "id": "0PYqe0neiOpR" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sharding spec: ['y', 'unsharded']\n", "components:\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "/job:localhost/replica:0/task:0/device:CPU:0 [[20 14]]\n", "/job:localhost/replica:0/task:0/device:CPU:1 [[56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:2 [[20 14]]\n", "/job:localhost/replica:0/task:0/device:CPU:3 [[56 41]]\n", "/job:localhost/replica:0/task:0/device:CPU:4 [[20 14]]\n", "/job:localhost/replica:0/task:0/device:CPU:5 [[56 41]]\n" ] } ], "source": [ "mesh = dtensor.create_mesh([(\"x\", 3), (\"y\", 2)], devices=DEVICES)\n", "\n", "a_layout = dtensor.Layout(['y', 'x'], mesh)\n", "a = dtensor_from_array([[1, 2, 3], [4, 5, 6]], layout=a_layout)\n", "b_layout = dtensor.Layout(['x', dtensor.UNSHARDED], mesh)\n", "b = dtensor_from_array([[6, 5], [4, 3], [2, 1]], layout=b_layout)\n", "\n", "c = tf.matmul(a, b)\n", "# The sharding of `a` on the first axis is carried to `c'\n", "print('Sharding spec:', dtensor.fetch_layout(c).sharding_specs)\n", "print(\"components:\")\n", "for component_tensor in dtensor.unpack(c):\n", " print(component_tensor.device, component_tensor.numpy())" ] }, { "cell_type": "markdown", "metadata": { "id": "c-1NazCVmLWZ" }, "source": [ "### DTensor 作为输出\n", "\n", "不接受运算对象但会返回可分片的张量结果的 Python 函数如何?此类函数的示例为:\n", "\n", "- `tf.ones`、`tf.zeros`、`tf.random.stateless_normal`\n", "\n", "针对这些 Python 函数,DTensor 提供了 `dtensor.call_with_layout`,后者支持使用 DTensor 以 Eager 方式执行 Python 函数,并确保返回的张量是具有请求 `Layout` 的 DTensor。" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.585050Z", "iopub.status.busy": "2023-11-07T19:13:17.584388Z", "iopub.status.idle": "2023-11-07T19:13:17.588625Z", "shell.execute_reply": "2023-11-07T19:13:17.587984Z" }, "id": "J0jo_8NPtJiO" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function call_with_layout in module tensorflow.dtensor.python.api:\n", "\n", "call_with_layout(fn: Callable[..., Any], layout: Optional[tensorflow.dtensor.python.layout.Layout], *args, **kwargs) -> Any\n", " Calls a function in the DTensor device scope if `layout` is not None.\n", " \n", " If `layout` is not None, `fn` consumes DTensor(s) as input and produces a\n", " DTensor as output; a DTensor is a 
tf.Tensor with layout-related attributes.\n", " \n", " If `layout` is None, `fn` consumes and produces regular tf.Tensors.\n", " \n", " Args:\n", " fn: A supported TF API function such as tf.zeros.\n", " layout: Optional, the layout of the output DTensor.\n", " *args: Arguments given to `fn`.\n", " **kwargs: Keyword arguments given to `fn`.\n", " \n", " Returns:\n", " The return value of `fn` transformed to a DTensor if requested.\n", "\n" ] } ], "source": [ "help(dtensor.call_with_layout)" ] }, { "cell_type": "markdown", "metadata": { "id": "V-YdLvfytM7g" }, "source": [ "以 Eager 方式执行的 Python 函数通常只包含一个非常重要的 TensorFlow 运算。\n", "\n", "要使用通过 `dtensor.call_with_layout` 发出多个 TensorFlow 运算的 Python 函数,应将该函数转换为 `tf.function`。调用 `tf.function` 为单个 TensorFlow 运算。调用 `tf.function` 时,DTensor 可以在任何中间张量具体化之前在分析 `tf.function` 的计算图时执行布局传播。" ] }, { "cell_type": "markdown", "metadata": { "id": "DLrksgFjqRLS" }, "source": [ "#### 发出单个 TensorFlow 运算的 API\n", "\n", "如果函数发出单个 TensorFlow 运算,您可以直接将 `dtensor.call_with_layout` 应用于该函数。" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.592458Z", "iopub.status.busy": "2023-11-07T19:13:17.591857Z", "iopub.status.idle": "2023-11-07T19:13:17.595609Z", "shell.execute_reply": "2023-11-07T19:13:17.595005Z" }, "id": "G1CuKYSFtFeM" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function ones in module tensorflow.python.ops.array_ops:\n", "\n", "ones(shape, dtype=tf.float32, name=None, layout=None)\n", " Creates a tensor with all elements set to one (1).\n", " \n", " See also `tf.ones_like`, `tf.zeros`, `tf.fill`, `tf.eye`.\n", " \n", " This operation returns a tensor of type `dtype` with shape `shape` and\n", " all elements set to one.\n", " \n", " >>> tf.ones([3, 4], tf.int32)\n", " \n", " \n", " Args:\n", " shape: A `list` of integers, a `tuple` of integers, or a 1-D `Tensor` of\n", " type `int32`.\n", " dtype: Optional DType of an element in the resulting `Tensor`. Default is\n", " `tf.float32`.\n", " name: Optional string. A name for the operation.\n", " layout: Optional, `tf.experimental.dtensor.Layout`. 
If provided, the result\n", " is a [DTensor](https://www.tensorflow.org/guide/dtensor_overview) with the\n", " provided layout.\n", " \n", " Returns:\n", " A `Tensor` with all elements set to one (1).\n", "\n" ] } ], "source": [ "help(tf.ones)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.598859Z", "iopub.status.busy": "2023-11-07T19:13:17.598394Z", "iopub.status.idle": "2023-11-07T19:13:17.613818Z", "shell.execute_reply": "2023-11-07T19:13:17.613168Z" }, "id": "2m_EAwy-ozOh" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor({\"CPU:0\": [[1 1]\n", " [1 1]], \"CPU:1\": [[1 1]\n", " [1 1]], \"CPU:2\": [[1 1]\n", " [1 1]], \"CPU:3\": [[1 1]\n", " [1 1]], \"CPU:4\": [[1 1]\n", " [1 1]], \"CPU:5\": [[1 1]\n", " [1 1]]}, layout=\"sharding_specs:x,y, mesh:|x=3,y=2|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(6, 4), dtype=float32)\n" ] } ], "source": [ "mesh = dtensor.create_mesh([(\"x\", 3), (\"y\", 2)], devices=DEVICES)\n", "ones = dtensor.call_with_layout(tf.ones, dtensor.Layout(['x', 'y'], mesh), shape=(6, 4))\n", "print(ones)" ] }, { "cell_type": "markdown", "metadata": { "id": "bx-7Xo8Cpb8S" }, "source": [ "#### 发出多个 TensorFlow 运算的 API\n", "\n", "如果 API 发出多个 TensorFlow 运算,请通过 `tf.function` 将函数转换为单个运算。例如 `tf.random.stateleess_normal`" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.617222Z", "iopub.status.busy": "2023-11-07T19:13:17.616684Z", "iopub.status.idle": "2023-11-07T19:13:17.620493Z", "shell.execute_reply": "2023-11-07T19:13:17.619820Z" }, "id": "H8BQSTRFtCih" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function stateless_random_normal in module tensorflow.python.ops.stateless_random_ops:\n", "\n", "stateless_random_normal(shape, seed, mean=0.0, stddev=1.0, dtype=tf.float32, name=None, alg='auto_select')\n", " Outputs deterministic pseudorandom values from a normal distribution.\n", " \n", " This is a stateless version of `tf.random.normal`: if run twice with the\n", " same seeds and shapes, it will produce the same pseudorandom numbers. The\n", " output is consistent across multiple runs on the same hardware (and between\n", " CPU and GPU), but may change between versions of TensorFlow or on non-CPU/GPU\n", " hardware.\n", " \n", " Args:\n", " shape: A 1-D integer Tensor or Python array. The shape of the output tensor.\n", " seed: A shape [2] Tensor, the seed to the random number generator. Must have\n", " dtype `int32` or `int64`. (When using XLA, only `int32` is allowed.)\n", " mean: A 0-D Tensor or Python value of type `dtype`. The mean of the normal\n", " distribution.\n", " stddev: A 0-D Tensor or Python value of type `dtype`. The standard deviation\n", " of the normal distribution.\n", " dtype: The float type of the output: `float16`, `bfloat16`, `float32`,\n", " `float64`. Defaults to `float32`.\n", " name: A name for the operation (optional).\n", " alg: The RNG algorithm used to generate the random numbers. 
See\n", " `tf.random.stateless_uniform` for a detailed explanation.\n", " \n", " Returns:\n", " A tensor of the specified shape filled with random normal values.\n", "\n" ] } ], "source": [ "help(tf.random.stateless_normal)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.623568Z", "iopub.status.busy": "2023-11-07T19:13:17.623120Z", "iopub.status.idle": "2023-11-07T19:13:17.714128Z", "shell.execute_reply": "2023-11-07T19:13:17.713432Z" }, "id": "TvP81eYopSPm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor({\"CPU:0\": [[0.0368092842 1.76192284]\n", " [1.22868407 -0.731756687]], \"CPU:1\": [[0.255247623 -0.13820985]\n", " [-0.747412503 1.06443202]], \"CPU:2\": [[-0.395325899 -0.836183369]\n", " [0.581941128 -0.2587713]], \"CPU:3\": [[0.476060659 0.406645179]\n", " [-0.110623844 -1.49052978]], \"CPU:4\": [[0.645035267 1.36384416]\n", " [2.18210244 -0.965060234]], \"CPU:5\": [[-1.70534277 1.32558191]\n", " [0.972473264 0.972343624]]}, layout=\"sharding_specs:x,y, mesh:|x=3,y=2|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(6, 4), dtype=float32)\n" ] } ], "source": [ "ones = dtensor.call_with_layout(\n", " tf.function(tf.random.stateless_normal),\n", " dtensor.Layout(['x', 'y'], mesh),\n", " shape=(6, 4),\n", " seed=(1, 1))\n", "print(ones)" ] }, { "cell_type": "markdown", "metadata": { "id": "qKoojp9ZyWzW" }, "source": [ "允许使用 `tf.function` 来包装发出单个 TensorFlow 运算的 Python 函数。唯一需要注意的是,必须承担从 Python 函数创建 `tf.function` 的相关成本和复杂性。" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.717672Z", "iopub.status.busy": "2023-11-07T19:13:17.717062Z", "iopub.status.idle": "2023-11-07T19:13:17.787380Z", "shell.execute_reply": "2023-11-07T19:13:17.786656Z" }, "id": "LbAtKrSkpOaq" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor({\"CPU:0\": [[1 1]\n", " [1 1]], \"CPU:1\": [[1 1]\n", " [1 1]], \"CPU:2\": [[1 1]\n", " [1 1]], \"CPU:3\": [[1 1]\n", " [1 1]], \"CPU:4\": [[1 1]\n", " [1 1]], \"CPU:5\": [[1 1]\n", " [1 1]]}, layout=\"sharding_specs:x,y, mesh:|x=3,y=2|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(6, 4), dtype=float32)\n" ] } ], "source": [ "ones = dtensor.call_with_layout(\n", " tf.function(tf.ones),\n", " dtensor.Layout(['x', 'y'], mesh),\n", " shape=(6, 4))\n", "print(ones)" ] }, { "cell_type": "markdown", "metadata": { "id": "D-m1816JP3CE" }, "source": [ "### 从 `tf.Variable` 到 `dtensor.DVariable`\n", "\n", "在 Tensorflow 中,`tf.Variable` 是可变 `Tensor` 值的持有者。使用 DTensor 时,`dtensor.DVariable` 可以提供相应的变量语义。\n", "\n", "为 DTensor 变量引入新类型 `DVariable` 的原因是 DVariable 有一个额外的要求,即布局的初始值不能改变。" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.791087Z", "iopub.status.busy": "2023-11-07T19:13:17.790547Z", "iopub.status.idle": "2023-11-07T19:13:17.896160Z", "shell.execute_reply": "2023-11-07T19:13:17.895410Z" }, "id": 
"awRPuR26P0Sc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(, layout=\"sharding_specs:unsharded,unsharded, mesh:|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(), dtype=resource)\n" ] } ], "source": [ "mesh = dtensor.create_mesh([(\"x\", 6)], devices=DEVICES)\n", "layout = dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], mesh)\n", "\n", "v = dtensor.DVariable(\n", " initial_value=dtensor.call_with_layout(\n", " tf.function(tf.random.stateless_normal),\n", " layout=layout,\n", " shape=tf.TensorShape([64, 32]),\n", " seed=[1, 1],\n", " dtype=tf.float32))\n", "\n", "print(v.handle)\n", "assert layout == dtensor.fetch_layout(v)" ] }, { "cell_type": "markdown", "metadata": { "id": "Pb9jn473prC_" }, "source": [ "除了匹配 `layout` 的要求外,`DVariable` 的行为与 `tf.Variable` 相同。例如,您可以将 DVariable 添加到 DTensor。\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.899698Z", "iopub.status.busy": "2023-11-07T19:13:17.899228Z", "iopub.status.idle": "2023-11-07T19:13:17.906600Z", "shell.execute_reply": "2023-11-07T19:13:17.905968Z" }, "id": "adxFw9wJpqQQ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor([[2.66521645 2.36637592 1.77863169 ... -1.18624139 2.26035929 0.664066315]\n", " [0.511952519 0.655031443 0.122243524 ... 0.0424078107 1.67057109 0.912334144]\n", " [0.769825 1.42743981 3.13473773 ... 1.16159868 0.628931046 0.733521938]\n", " ...\n", " [0.388001859 2.72882509 2.92771554 ... 1.17472672 1.72462416 1.5047121]\n", " [-0.252545118 0.761886716 1.72119033 ... 0.775034547 2.8065362 1.00457215]\n", " [1.23498726 0.584536672 1.15659761 ... 0.955793858 1.11440909 0.18848455]], layout=\"sharding_specs:unsharded,unsharded, mesh:|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(64, 32), dtype=float32)\n" ] } ], "source": [ "a = dtensor.call_with_layout(tf.ones, layout=layout, shape=(64, 32))\n", "b = v + a # add DVariable and DTensor\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "QxBdNHWSu-kV" }, "source": [ "您还可以将 DTensor 指定给 DVariable。\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.909812Z", "iopub.status.busy": "2023-11-07T19:13:17.909406Z", "iopub.status.idle": "2023-11-07T19:13:17.919037Z", "shell.execute_reply": "2023-11-07T19:13:17.918396Z" }, "id": "oYwfiyw5P94U" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor([[1 1 1 ... 1 1 1]\n", " [1 1 1 ... 1 1 1]\n", " [1 1 1 ... 1 1 1]\n", " ...\n", " [1 1 1 ... 1 1 1]\n", " [1 1 1 ... 1 1 1]\n", " [1 1 1 ... 
1 1 1]], layout=\"sharding_specs:unsharded,unsharded, mesh:|x=6|0,1,2,3,4,5|0,1,2,3,4,5|/job:localhost/replica:0/task:0/device:CPU:0,/job:localhost/replica:0/task:0/device:CPU:1,/job:localhost/replica:0/task:0/device:CPU:2,/job:localhost/replica:0/task:0/device:CPU:3,/job:localhost/replica:0/task:0/device:CPU:4,/job:localhost/replica:0/task:0/device:CPU:5\", shape=(64, 32), dtype=float32)\n" ] } ], "source": [ "v.assign(a) # assign a DTensor to a DVariable\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "4fvSk_VUvGnj" }, "source": [ "通过指定具有不兼容布局的 DTensor 来尝试改变 `DVariable` 的布局会产生错误。" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-11-07T19:13:17.922382Z", "iopub.status.busy": "2023-11-07T19:13:17.921841Z", "iopub.status.idle": "2023-11-07T19:13:17.929081Z", "shell.execute_reply": "2023-11-07T19:13:17.928452Z" }, "id": "3pckUugYP_r-" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "exception raised\n" ] } ], "source": [ "# variable's layout is immutable.\n", "another_mesh = dtensor.create_mesh([(\"x\", 3), (\"y\", 2)], devices=DEVICES)\n", "b = dtensor.call_with_layout(tf.ones,\n", " layout=dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], another_mesh),\n", " shape=(64, 32))\n", "try:\n", " v.assign(b)\n", "except:\n", " print(\"exception raised\")" ] }, { "cell_type": "markdown", "metadata": { "id": "3LadIcwRvR6f" }, "source": [ "## 后续步骤\n", "\n", "在此 Colab 中,您了解了 DTensor,它是用于分布式计算的 TensorFlow 扩展程序。要在教程中尝试运用这些概念,请参阅[使用 DTensor 进行分布式训练](https://tensorflow.google.cn/tutorials/distribute/dtensor_ml_tutorial)。" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "dtensor_overview.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 0 }