{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MhoQ0WE77laV"
   },
   "source": [
    "##### Copyright 2019 The TensorFlow Authors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "cellView": "form",
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:24.874443Z",
     "iopub.status.busy": "2024-01-11T18:11:24.874166Z",
     "iopub.status.idle": "2024-01-11T18:11:24.878249Z",
     "shell.execute_reply": "2024-01-11T18:11:24.877656Z"
    },
    "id": "_ckMIh7O7s6D"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jYysdyb-CaWM"
   },
   "source": [
    "# tf.distribute.Strategy を使用したカスタムトレーニング"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "S5Uhzt6vVIB2"
   },
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td><a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/distribute/custom_training\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\">TensorFlow.org で表示</a></td>\n",
    "  <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/ja/tutorials/distribute/custom_training.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\">Google Colab で実行</a></td>\n",
    "  <td><a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/ja/tutorials/distribute/custom_training.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\">GitHub でソースを表示</a></td>\n",
    "  <td><a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/ja/tutorials/distribute/custom_training.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\">ノートブックをダウンロード</a></td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FbVhjPpzn6BM"
   },
   "source": [
    "このチュートリアルでは、複数の処理ユニット（GPU、複数のマシン、または TPU）に[トレーニングを分散](../../guide/distributed_training.ipynb)するための抽象化を提供する `tf.distribute.Strategy` という TensorFlow API をカスタムトレーニングループで使用する方法を説明します。この例では、70,000 個の 28 x 28 のサイズの画像を含む [Fashion MNIST データセット](https://github.com/zalandoresearch/fashion-mnist)で、単純な畳み込みニューラルネットワークをトレーニングします。\n",
    "\n",
    "[カスタムトレーニングループ](../customization/custom_training_walkthrough.ipynb)を使用すると、より優れた制御によってトレーニングを柔軟に実行できます。また、モデルとトレーニングループのデバックもより簡単に行えるようになります。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:24.881351Z",
     "iopub.status.busy": "2024-01-11T18:11:24.881083Z",
     "iopub.status.idle": "2024-01-11T18:11:27.243419Z",
     "shell.execute_reply": "2024-01-11T18:11:27.242715Z"
    },
    "id": "dzLKpmZICaWN"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-01-11 18:11:25.309748: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-01-11 18:11:25.309791: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-01-11 18:11:25.311321: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.15.0\n"
     ]
    }
   ],
   "source": [
    "# Import TensorFlow\n",
    "import tensorflow as tf\n",
    "\n",
    "# Helper libraries\n",
    "import numpy as np\n",
    "import os\n",
    "\n",
    "print(tf.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MM6W__qraV55"
   },
   "source": [
    "## Fashion MNIST データセットをダウンロードする"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:27.247393Z",
     "iopub.status.busy": "2024-01-11T18:11:27.247009Z",
     "iopub.status.idle": "2024-01-11T18:11:28.266632Z",
     "shell.execute_reply": "2024-01-11T18:11:28.265861Z"
    },
    "id": "7MqDQO0KCaWS"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      " 8192/29515 [=======>......................] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "29515/29515 [==============================] - 0s 0us/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "    8192/26421880 [..............................] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 4202496/26421880 [===>..........................] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "12484608/26421880 [=============>................] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "18333696/26421880 [===================>..........] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "26421880/26421880 [==============================] - 0s 0us/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "5148/5148 [==============================] - 0s 0us/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "   8192/4422102 [..............................] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "4422102/4422102 [==============================] - 0s 0us/step\n"
     ]
    }
   ],
   "source": [
    "fashion_mnist = tf.keras.datasets.fashion_mnist\n",
    "\n",
    "(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()\n",
    "\n",
    "# Add a dimension to the array -> new shape == (28, 28, 1)\n",
    "# This is done because the first layer in our model is a convolutional\n",
    "# layer and it requires a 4D input (batch_size, height, width, channels).\n",
    "# batch_size dimension will be added later on.\n",
    "train_images = train_images[..., None]\n",
    "test_images = test_images[..., None]\n",
    "\n",
    "# Scale the images to the [0, 1] range.\n",
    "train_images = train_images / np.float32(255)\n",
    "test_images = test_images / np.float32(255)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4AXoHhrsbdF3"
   },
   "source": [
    "## 変数とグラフを分散させるストラテジーを作成する"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5mVuLZhbem8d"
   },
   "source": [
    "`tf.distribute.MirroredStrategy`ストラテジーはどのように機能するのでしょう？\n",
    "\n",
    "- すべての変数とモデルグラフはレプリカ上に複製されます。\n",
    "- 入力はレプリカ全体に均等に分散されます。\n",
    "- 各レプリカは受け取った入力の損失と勾配を計算します。\n",
    "- 勾配は加算して全てのレプリカ間で同期されます。\n",
    "- 同期後、各レプリカ上の変数のコピーにも同じ更新が行われます。\n",
    "\n",
    "注意: 下のコードはすべて 1 つのスコープ内に入れることができます。説明しやすいように、この例では複数のコードセルに分割しています。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:28.270824Z",
     "iopub.status.busy": "2024-01-11T18:11:28.270565Z",
     "iopub.status.idle": "2024-01-11T18:11:31.449695Z",
     "shell.execute_reply": "2024-01-11T18:11:31.448920Z"
    },
    "id": "F2VeZUWUj5S4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')\n"
     ]
    }
   ],
   "source": [
    "# If the list of devices is not specified in\n",
    "# `tf.distribute.MirroredStrategy` constructor, they will be auto-detected.\n",
    "strategy = tf.distribute.MirroredStrategy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:31.453746Z",
     "iopub.status.busy": "2024-01-11T18:11:31.453501Z",
     "iopub.status.idle": "2024-01-11T18:11:31.457177Z",
     "shell.execute_reply": "2024-01-11T18:11:31.456576Z"
    },
    "id": "ZngeM_2o0_JO"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of devices: 4\n"
     ]
    }
   ],
   "source": [
    "print('Number of devices: {}'.format(strategy.num_replicas_in_sync))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "k53F5I_IiGyI"
   },
   "source": [
    "## 入力パイプラインをセットアップする"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:31.460147Z",
     "iopub.status.busy": "2024-01-11T18:11:31.459925Z",
     "iopub.status.idle": "2024-01-11T18:11:31.463213Z",
     "shell.execute_reply": "2024-01-11T18:11:31.462597Z"
    },
    "id": "jwJtsCQhHK-E"
   },
   "outputs": [],
   "source": [
    "BUFFER_SIZE = len(train_images)\n",
    "\n",
    "BATCH_SIZE_PER_REPLICA = 64\n",
    "GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync\n",
    "\n",
    "EPOCHS = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "J7fj3GskHC8g"
   },
   "source": [
    "データセットを作成して、それを分散します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:31.465981Z",
     "iopub.status.busy": "2024-01-11T18:11:31.465763Z",
     "iopub.status.idle": "2024-01-11T18:11:32.207447Z",
     "shell.execute_reply": "2024-01-11T18:11:32.206720Z"
    },
    "id": "WYrMNNDhAvVl"
   },
   "outputs": [],
   "source": [
    "train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(BUFFER_SIZE).batch(GLOBAL_BATCH_SIZE)\n",
    "test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)\n",
    "\n",
    "train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)\n",
    "test_dist_dataset = strategy.experimental_distribute_dataset(test_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bAXAo_wWbWSb"
   },
   "source": [
    "## モデルを作成する\n",
    "\n",
    "`tf.keras.Sequential` を使用してモデルを作成します。これには、[Model Subclassing API](https://www.tensorflow.org/guide/keras/custom_layers_and_models) や [functional API](https://www.tensorflow.org/guide/keras/functional) も使用できます。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.211502Z",
     "iopub.status.busy": "2024-01-11T18:11:32.211204Z",
     "iopub.status.idle": "2024-01-11T18:11:32.216533Z",
     "shell.execute_reply": "2024-01-11T18:11:32.215903Z"
    },
    "id": "9ODch-OFCaW4"
   },
   "outputs": [],
   "source": [
    "def create_model():\n",
    "  regularizer = tf.keras.regularizers.L2(1e-5)\n",
    "  model = tf.keras.Sequential([\n",
    "      tf.keras.layers.Conv2D(32, 3,\n",
    "                             activation='relu',\n",
    "                             kernel_regularizer=regularizer),\n",
    "      tf.keras.layers.MaxPooling2D(),\n",
    "      tf.keras.layers.Conv2D(64, 3,\n",
    "                             activation='relu',\n",
    "                             kernel_regularizer=regularizer),\n",
    "      tf.keras.layers.MaxPooling2D(),\n",
    "      tf.keras.layers.Flatten(),\n",
    "      tf.keras.layers.Dense(64,\n",
    "                            activation='relu',\n",
    "                            kernel_regularizer=regularizer),\n",
    "      tf.keras.layers.Dense(10, kernel_regularizer=regularizer)\n",
    "    ])\n",
    "\n",
    "  return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.219248Z",
     "iopub.status.busy": "2024-01-11T18:11:32.219029Z",
     "iopub.status.idle": "2024-01-11T18:11:32.222236Z",
     "shell.execute_reply": "2024-01-11T18:11:32.221556Z"
    },
    "id": "9iagoTBfijUz"
   },
   "outputs": [],
   "source": [
    "# Create a checkpoint directory to store the checkpoints.\n",
    "checkpoint_dir = './training_checkpoints'\n",
    "checkpoint_prefix = os.path.join(checkpoint_dir, \"ckpt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0-VVTqDEICrl"
   },
   "source": [
    "## 損失関数を定義する\n",
    "\n",
    "損失関数は以下の 2 つの部分で構成されていることを思い出しましょう。\n",
    "\n",
    "- **予測損失**は、モデルの予測が、トレーニングサンプルのバッチに対するトレーニングラベルからどれくらい外れているかを測定します。ラベル付きのサンプルごとに計算されてから、平均値を計算してバッチ全体で縮小されます。\n",
    "- オプションの **正則化損失**の項を予測損失に追加して、モデルがトレーニングデータを過学習しないように誘導します。一般的には L2 正則化が使用されます。これは、サンプルの数に関係なく、すべてのモデルの重みの二乗和の小さな固定倍数を追加します。上記のモデルは L2 正則化を使用して、以下のトレーニングループでの処理を示しています。\n",
    "\n",
    "単一の GPU/CPU を使った単一のマシンでのトレーニングでは、次のように動作します。\n",
    "\n",
    "- 予測損失は、バッチのサンプルごとに計算され、バッチ全体で加算され、バッチサイズで除算されます。\n",
    "- 正則化損失は、予測損失に追加されます。\n",
    "- 合計損失の勾配は各モデルの重みに関して計算され、オプティマイザが、対応する勾配から各モデルの重みを更新します。\n",
    "\n",
    "`tf.distribute.Strategy` では、入力バッチはレプリカ間で分割されます。たとえば、GPU が 4 つあり、それぞれにモデルのレプリカが 1 つあるとします。1 つのバッチの 256 の入力サンプルは 4 つのレプリカで均等に分散されるため、各レプリカのバッチサイズは 64 となります。したがって、`256 = 4*64`、または一般に `GLOBAL_BATCH_SIZE = num_replicas_in_sync * BATCH_SIZE_PER_REPLICA` があることになります。\n",
    "\n",
    "各レプリカは、それが得るトレーニングサンプルから損失を計算し、各モデルの重みに関する損失の勾配を計算します。オプティマイザは、これらの**勾配をレプリカ全体で加算**してから、レプリカごとにモデルの重みのコピーを更新します。\n",
    "\n",
    "*では、`tf.distribute.Strategy` を使用する場合、どのように損失を計算すればよいのでしょうか。*\n",
    "\n",
    "- 各レプリカは、それに分散されたすべてのサンプルの予測損失を計算し、結果を加算して、`num_replicas_in_sync * BATCH_SIZE_PER_REPLICA` または `GLOBAL_BATCH_SIZE` で除算します。\n",
    "- 各レプリカは正則化損失を計算し、それを `num_replicas_in_sync` で除算します。\n",
    "\n",
    "非分散型トレーニングに桑ベルト、すべてのレプリカ単位の損失項は `1/num_replicas_in_sync` の計数でスケールダウンされます。一方、すべての損失項、または勾配は、オプティマイザが適用する前にレプリカの数で加算されます。実際、各レプリカのオプティマイザは、`GLOBAL_BATCH_SIZE` による非分散型計算が行われなかったかのようにして、同じ勾配を使用します。これは、分散型と非分散型の Keras `Model.fit` の動作と同じです。より大きなグローバルバッチサイズによって学習率のスケールアップが可能になるかについて、[Keras による分散型トレーニング](./keras.ipynb)をご覧ください。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e-wlFFZbP33n"
   },
   "source": [
    "*TensorFlow では次のようにします。*\n",
    "\n",
    "- この縮小とスケーリングは、Keras `Model.compile` と `Model.fit` で自動的に行われます。\n",
    "\n",
    "- If you're writing a custom training loop, as in this tutorial, you should sum the per example losses and divide the sum by the `GLOBAL_BATCH_SIZE`: `scale_loss = tf.reduce_sum(loss) * (1. / GLOBAL_BATCH_SIZE)` or you can use `tf.nn.compute_average_loss` which takes the per example loss, optional sample weights, and `GLOBAL_BATCH_SIZE` as arguments and returns the scaled loss.\n",
    "\n",
    "- `tf.keras.losses` クラスを使用すると（以下の例）、損失の縮小を `NONE` または `SUM` のいずれかになるように明示的に指定する必要があります。デフォルトの `AUTO` と `SUM_OVER_BATCH_SIZE` は `Model.fit` の外では使用できません。\n",
    "\n",
    "    - `AUTO` は、分散型のケースで正しくなるようにどの縮小を使用するかを明示的に考える必要があるため、使用できません。\n",
    "    - `SUM_OVER_BATCH_SIZE` は、現在、レプリカごとのバッチサイズでのみ除算し、レプリカ数による除算をユーザーが処理しなければならないようになっていますが、見逃す可能性があるため、使用できなくなっています。したがって、ユーザー自身が縮小を明示的に行う必要があります。\n",
    "\n",
    "- 空でない `Model.losses` リストのカスタムトレーニングループを書いている場合は（重みレギュラライザなど）、加算して、レプリカ数で除算する必要があります。これは、`tf.nn.scale_regularization_loss` 関数を使って行えます。モデルコード自体は、レプリカの数を認識していません。\n",
    "\n",
    "ただし、モデルは、`Layer.add_loss(...)` や `Layer(activity_regularizer=...)` などの Keras API によって入力に依存する正則化損失を定義できます。`Layer.add_loss(...)` の場合、モデリングコードが加算されたサンプルごとの項を `tf.math.reduce_mean()` などを使ってレプリカ単位(!) のバッチサイズで除算します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.225802Z",
     "iopub.status.busy": "2024-01-11T18:11:32.225192Z",
     "iopub.status.idle": "2024-01-11T18:11:32.229617Z",
     "shell.execute_reply": "2024-01-11T18:11:32.229016Z"
    },
    "id": "R144Wci782ix"
   },
   "outputs": [],
   "source": [
    "with strategy.scope():\n",
    "  # Set reduction to `NONE` so you can do the reduction yourself.\n",
    "  loss_object = tf.keras.losses.SparseCategoricalCrossentropy(\n",
    "      from_logits=True,\n",
    "      reduction=tf.keras.losses.Reduction.NONE)\n",
    "  def compute_loss(labels, predictions, model_losses):\n",
    "    per_example_loss = loss_object(labels, predictions)\n",
    "    loss = tf.nn.compute_average_loss(per_example_loss)\n",
    "    if model_losses:\n",
    "      loss += tf.nn.scale_regularization_loss(tf.add_n(model_losses))\n",
    "    return loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6pM96bqQY52D"
   },
   "source": [
    "### 特殊ケース\n",
    "\n",
    "高度なユーザーは、以下の特殊ケースについても考慮することをお勧めします。\n",
    "\n",
    "- `GLOBAL_BATCH_SIZE` よりも短い入力バッチが原因で、いくつかの場所で好ましくない例外が発生します。実際には、`Dataset.repeat().batch()` を使用してエポックの境界をまたぐバッチを許可し、データセットの終了ではなくステップ数でおおよそのエポック数を定義することで、例外を回避することがよくあります。または、`Dataset.batch(drop_remainder=True)` は、エポックの表記を維持しながら、最後の数個のサンプルを除外します。\n",
    "\n",
    "説明のために、この例ではより困難なルートを選択し、短いバッチを許可するため、トレーニングエポックごとに各トレーニング サンプルが 1 回だけ含まれます。\n",
    "\n",
    "どのデノミネーターを `tf.nn.compute_average_loss()` で使用すればよいでしょうか。\n",
    "\n",
    "```\n",
    "* By default, in the example code above and equivalently in `Keras.fit()`, the sum of prediction losses is divided by `num_replicas_in_sync` times the actual batch size seen on the replica (with empty batches silently ignored). This preserves the balance between the prediction loss on the one hand and the regularization losses on the other hand. It is particularly appropriate for models that use input-dependent regularization losses. Plain L2 regularization just superimposes weight decay onto the gradients of the prediction loss and is less in need of such a balance.\n",
    "* In practice, many custom training loops pass as a constant Python value into `tf.nn.compute_average_loss(..., global_batch_size=GLOBAL_BATCH_SIZE)` to use it as the denominator. This preserves the relative weighting of training examples between batches. Without it, the smaller denominator in short batches effectively upweights the examples in those. (Before TensorFlow 2.13, this was also needed to avoid NaNs in case some replica received an actual batch size of zero.)\n",
    "```\n",
    "\n",
    "上記で説明するように、いずれのオプションも、短いバッチが回避されるのであれば同等です。\n",
    "\n",
    "- 多次元 `labels` では、各サンプルの予測数全体で `per_example_loss` を平均化する必要があります。形状が `(batch_size, H, W, n_classes)` の `predictions` と形状が `(batch_size, H, W)` の `labels` を持つ入力画像のすべてのピクセルに対する分類タスクがあるとした場合、`per_example_loss` は次のようにして更新する必要があります: `per_example_loss /= tf.cast(tf.reduce_prod(tf.shape(labels)[1:]), tf.float32)`\n",
    "\n",
    "注意：**損失の形状を確認してください**。 `tf.losses`/`tf.keras.losses`の損失関数は、通常、入力の最後の次元の平均を返します。損失クラスはこれらの関数をラップします。 損失クラスのインスタンスを作成するときに`reduction=Reduction.NONE`を渡すことは、「**追加の**縮小がない」ことを意味します。`[batch, W, H, n_classes]`の入力形状の例を使用したカテゴリ損失の場合、`n_classes`次元が縮小されます。`losses.mean_squared_error`または`losses.binary_crossentropy`のような点ごとの損失の場合、ダミー軸を用いて、`[batch, W, H, 1]`を`[batch, W, H]`に縮小します。ダミー軸がないと、`[batch, W, H]`は誤って`[batch, W]`に縮小されます。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "w8y54-o9T2Ni"
   },
   "source": [
    "## 損失と精度を追跡するメトリクスを定義する\n",
    "\n",
    "これらのメトリクスは、テストの損失、トレーニング、テストの精度を追跡します。`.result()`を使用して、いつでも累積統計を取得できます。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.233189Z",
     "iopub.status.busy": "2024-01-11T18:11:32.232942Z",
     "iopub.status.idle": "2024-01-11T18:11:32.272799Z",
     "shell.execute_reply": "2024-01-11T18:11:32.272123Z"
    },
    "id": "zt3AHb46Tr3w"
   },
   "outputs": [],
   "source": [
    "with strategy.scope():\n",
    "  test_loss = tf.keras.metrics.Mean(name='test_loss')\n",
    "\n",
    "  train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(\n",
    "      name='train_accuracy')\n",
    "  test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(\n",
    "      name='test_accuracy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iuKuNXPORfqJ"
   },
   "source": [
    "## トレーニングループ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.275962Z",
     "iopub.status.busy": "2024-01-11T18:11:32.275730Z",
     "iopub.status.idle": "2024-01-11T18:11:32.316456Z",
     "shell.execute_reply": "2024-01-11T18:11:32.315871Z"
    },
    "id": "OrMmakq5EqeQ"
   },
   "outputs": [],
   "source": [
    "# A model, an optimizer, and a checkpoint must be created under `strategy.scope`.\n",
    "with strategy.scope():\n",
    "  model = create_model()\n",
    "\n",
    "  optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n",
    "\n",
    "  checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.319621Z",
     "iopub.status.busy": "2024-01-11T18:11:32.319398Z",
     "iopub.status.idle": "2024-01-11T18:11:32.324416Z",
     "shell.execute_reply": "2024-01-11T18:11:32.323727Z"
    },
    "id": "3UX43wUu04EL"
   },
   "outputs": [],
   "source": [
    "def train_step(inputs):\n",
    "  images, labels = inputs\n",
    "\n",
    "  with tf.GradientTape() as tape:\n",
    "    predictions = model(images, training=True)\n",
    "    loss = compute_loss(labels, predictions, model.losses)\n",
    "\n",
    "  gradients = tape.gradient(loss, model.trainable_variables)\n",
    "  optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n",
    "\n",
    "  train_accuracy.update_state(labels, predictions)\n",
    "  return loss\n",
    "\n",
    "def test_step(inputs):\n",
    "  images, labels = inputs\n",
    "\n",
    "  predictions = model(images, training=False)\n",
    "  t_loss = loss_object(labels, predictions)\n",
    "\n",
    "  test_loss.update_state(t_loss)\n",
    "  test_accuracy.update_state(labels, predictions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:11:32.327333Z",
     "iopub.status.busy": "2024-01-11T18:11:32.327095Z",
     "iopub.status.idle": "2024-01-11T18:12:20.886447Z",
     "shell.execute_reply": "2024-01-11T18:12:20.885639Z"
    },
    "id": "gX975dMSNw0e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Collective all_reduce tensors: 8 all_reduces, num_devices = 4, group_size = 4, implementation = CommunicationImplementation.NCCL, num_packs = 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Collective all_reduce tensors: 8 all_reduces, num_devices = 4, group_size = 4, implementation = CommunicationImplementation.NCCL, num_packs = 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
      "I0000 00:00:1704996698.665771   44862 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Collective all_reduce tensors: 8 all_reduces, num_devices = 4, group_size = 4, implementation = CommunicationImplementation.NCCL, num_packs = 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1, Loss: 0.6599556803703308, Accuracy: 77.09000396728516, Test Loss: 0.4591159522533417, Test Accuracy: 83.42000579833984\n",
      "INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 2, Loss: 0.4002712368965149, Accuracy: 85.9866714477539, Test Loss: 0.4018000364303589, Test Accuracy: 85.7699966430664\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 3, Loss: 0.34986022114753723, Accuracy: 87.6066665649414, Test Loss: 0.3515324294567108, Test Accuracy: 87.4800033569336\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 4, Loss: 0.32334816455841064, Accuracy: 88.48666381835938, Test Loss: 0.32882973551750183, Test Accuracy: 88.13999938964844\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 5, Loss: 0.30147460103034973, Accuracy: 89.28166961669922, Test Loss: 0.31937628984451294, Test Accuracy: 88.80000305175781\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 6, Loss: 0.28456664085388184, Accuracy: 89.92833709716797, Test Loss: 0.30238109827041626, Test Accuracy: 89.41000366210938\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 7, Loss: 0.268964558839798, Accuracy: 90.45833587646484, Test Loss: 0.30147919058799744, Test Accuracy: 89.45999908447266\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 8, Loss: 0.2548156976699829, Accuracy: 90.91500091552734, Test Loss: 0.29818081855773926, Test Accuracy: 89.13999938964844\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 9, Loss: 0.24469958245754242, Accuracy: 91.3550033569336, Test Loss: 0.28745248913764954, Test Accuracy: 89.8800048828125\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.23541080951690674, Accuracy: 91.68000030517578, Test Loss: 0.2748521864414215, Test Accuracy: 89.94000244140625\n"
     ]
    }
   ],
   "source": [
    "# `run` replicates the provided computation and runs it\n",
    "# with the distributed input.\n",
    "@tf.function\n",
    "def distributed_train_step(dataset_inputs):\n",
    "  per_replica_losses = strategy.run(train_step, args=(dataset_inputs,))\n",
    "  return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,\n",
    "                         axis=None)\n",
    "\n",
    "@tf.function\n",
    "def distributed_test_step(dataset_inputs):\n",
    "  return strategy.run(test_step, args=(dataset_inputs,))\n",
    "\n",
    "for epoch in range(EPOCHS):\n",
    "  # TRAIN LOOP\n",
    "  total_loss = 0.0\n",
    "  num_batches = 0\n",
    "  for x in train_dist_dataset:\n",
    "    total_loss += distributed_train_step(x)\n",
    "    num_batches += 1\n",
    "  train_loss = total_loss / num_batches\n",
    "\n",
    "  # TEST LOOP\n",
    "  for x in test_dist_dataset:\n",
    "    distributed_test_step(x)\n",
    "\n",
    "  if epoch % 2 == 0:\n",
    "    checkpoint.save(checkpoint_prefix)\n",
    "\n",
    "  template = (\"Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, \"\n",
    "              \"Test Accuracy: {}\")\n",
    "  print(template.format(epoch + 1, train_loss,\n",
    "                         train_accuracy.result() * 100, test_loss.result(),\n",
    "                         test_accuracy.result() * 100))\n",
    "\n",
    "  test_loss.reset_states()\n",
    "  train_accuracy.reset_states()\n",
    "  test_accuracy.reset_states()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Z1YvXqOpwy08"
   },
   "source": [
    "### 上記の例における注意点\n",
    "\n",
    "- `for x in ...` コンストラクトを使用して、`train_dist_dataset` と `test_dist_dataset` をイテレーションします。\n",
    "- スケーリングされた損失は `distributed_train_step` の戻り値です。この値は `tf.distribute.Strategy.reduce` 呼び出しを使用してレプリカ間で集約され、次に `tf.distribute.Strategy.reduce` 呼び出しの戻り値を加算してバッチ間で集約されます。\n",
    "- `tf.keras.Metrics` は、`tf.distribute.Strategy.run` によって実行される `train_step` および `test_step` 内で更新する必要があります。\n",
    "- `tf.distribute.Strategy.run` はストラテジー内の各ローカルレプリカの結果を返し、この結果の使用方法は多様です。<code>reduce</code> で、集約された値を取得することができます。また、<code>tf.distribute.Strategy.experimental_local_results</code> を実行して、ローカルレプリカごとに 1 つ、結果に含まれる値のリストを取得することもできます。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-q5qp31IQD8t"
   },
   "source": [
    "## 最新のチェックポイントを復元してテストする"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WNW2P00bkMGJ"
   },
   "source": [
    "`tf.distribute.Strategy`でチェックポイントされたモデルは、ストラテジーの有無に関わらず復元することができます。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:12:20.890492Z",
     "iopub.status.busy": "2024-01-11T18:12:20.889866Z",
     "iopub.status.idle": "2024-01-11T18:12:20.955159Z",
     "shell.execute_reply": "2024-01-11T18:12:20.954559Z"
    },
    "id": "pg3B-Cw_cn3a"
   },
   "outputs": [],
   "source": [
    "eval_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(\n",
    "      name='eval_accuracy')\n",
    "\n",
    "new_model = create_model()\n",
    "new_optimizer = tf.keras.optimizers.Adam()\n",
    "\n",
    "test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:12:20.958616Z",
     "iopub.status.busy": "2024-01-11T18:12:20.958143Z",
     "iopub.status.idle": "2024-01-11T18:12:20.961672Z",
     "shell.execute_reply": "2024-01-11T18:12:20.961113Z"
    },
    "id": "7qYii7KUYiSM"
   },
   "outputs": [],
   "source": [
    "@tf.function\n",
    "def eval_step(images, labels):\n",
    "  predictions = new_model(images, training=False)\n",
    "  eval_accuracy(labels, predictions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:12:20.964537Z",
     "iopub.status.busy": "2024-01-11T18:12:20.964012Z",
     "iopub.status.idle": "2024-01-11T18:12:21.593962Z",
     "shell.execute_reply": "2024-01-11T18:12:21.593146Z"
    },
    "id": "LeZ6eeWRoUNq"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy after restoring the saved model without strategy: 89.8800048828125\n"
     ]
    }
   ],
   "source": [
    "checkpoint = tf.train.Checkpoint(optimizer=new_optimizer, model=new_model)\n",
    "checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))\n",
    "\n",
    "for images, labels in test_dataset:\n",
    "  eval_step(images, labels)\n",
    "\n",
    "print('Accuracy after restoring the saved model without strategy: {}'.format(\n",
    "    eval_accuracy.result() * 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EbcI87EEzhzg"
   },
   "source": [
    "## データセットのイテレーションの代替方法\n",
    "\n",
    "### イテレータを使用する\n",
    "\n",
    "データセット全体ではなく、任意のステップ数のイテレーションを行う場合は、`iter` 呼び出しを使用してイテレータを作成し、そのイテレータ上で `next` を明示的に呼び出すことができます。`tf.function` の内側と外側の両方でデータセットのイテレーションを選択することができます。ここでは、イテレータを使用し `tf.function` の外側のデータセットのイテレーションを実行する小さなスニペットを示します。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:12:21.597478Z",
     "iopub.status.busy": "2024-01-11T18:12:21.596924Z",
     "iopub.status.idle": "2024-01-11T18:12:27.963415Z",
     "shell.execute_reply": "2024-01-11T18:12:27.962710Z"
    },
    "id": "7c73wGC00CzN"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.22530433535575867, Accuracy: 92.0703125\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.21155035495758057, Accuracy: 92.421875\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.23270802199840546, Accuracy: 91.09375\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.2111983597278595, Accuracy: 92.421875\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.2315395325422287, Accuracy: 91.7578125\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.22891399264335632, Accuracy: 91.640625\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.23187729716300964, Accuracy: 91.4453125\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.23954670131206512, Accuracy: 91.4453125\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.21727390587329865, Accuracy: 92.5390625\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.2208312749862671, Accuracy: 92.3046875\n"
     ]
    }
   ],
   "source": [
    "for _ in range(EPOCHS):\n",
    "  total_loss = 0.0\n",
    "  num_batches = 0\n",
    "  train_iter = iter(train_dist_dataset)\n",
    "\n",
    "  for _ in range(10):\n",
    "    total_loss += distributed_train_step(next(train_iter))\n",
    "    num_batches += 1\n",
    "  average_train_loss = total_loss / num_batches\n",
    "\n",
    "  template = (\"Epoch {}, Loss: {}, Accuracy: {}\")\n",
    "  print(template.format(epoch + 1, average_train_loss, train_accuracy.result() * 100))\n",
    "  train_accuracy.reset_states()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GxVp48Oy0m6y"
   },
   "source": [
    "### tf.function 内でイテレーションする\n",
    "\n",
    "`for x in ...` コンストラクトを使用して、または上記で行ったようにイテレータを作成して、`tf.function` 内で `train_dist_dataset` の入力全体をイテレートすることもできます。以下の例では、1 エポックのトレーニングを `@tf.function` デコレータでラップし、関数内で `train_dist_dataset` をイテレーションする方法を示します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:12:27.966722Z",
     "iopub.status.busy": "2024-01-11T18:12:27.966466Z",
     "iopub.status.idle": "2024-01-11T18:12:41.531191Z",
     "shell.execute_reply": "2024-01-11T18:12:41.530407Z"
    },
    "id": "-REzmcXv00qm"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py:462: UserWarning: To make it possible to preserve tf.data options across serialization boundaries, their implementation has moved to be part of the TensorFlow graph. As a consequence, the options value is in general no longer known at graph construction time. Invoking this method in graph mode retains the legacy behavior of the original implementation, but note that the returned value might not reflect the actual value of the options.\n",
      "  warnings.warn(\"To make it possible to preserve tf.data options across \"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Collective all_reduce tensors: 8 all_reduces, num_devices = 4, group_size = 4, implementation = CommunicationImplementation.NCCL, num_packs = 1\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1, Loss: 0.22321248054504395, Accuracy: 92.12667083740234\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 2, Loss: 0.21352693438529968, Accuracy: 92.40499877929688\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 3, Loss: 0.2031208723783493, Accuracy: 92.74666595458984\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 4, Loss: 0.19914129376411438, Accuracy: 92.96666717529297\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 5, Loss: 0.18742477893829346, Accuracy: 93.40333557128906\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 6, Loss: 0.18182916939258575, Accuracy: 93.59166717529297\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 7, Loss: 0.17676156759262085, Accuracy: 93.77666473388672\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 8, Loss: 0.16894836723804474, Accuracy: 94.06333923339844\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 9, Loss: 0.1639356017112732, Accuracy: 94.2683334350586\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 10, Loss: 0.1561119258403778, Accuracy: 94.56999969482422\n"
     ]
    }
   ],
   "source": [
    "@tf.function\n",
    "def distributed_train_epoch(dataset):\n",
    "  total_loss = 0.0\n",
    "  num_batches = 0\n",
    "  for x in dataset:\n",
    "    per_replica_losses = strategy.run(train_step, args=(x,))\n",
    "    total_loss += strategy.reduce(\n",
    "      tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)\n",
    "    num_batches += 1\n",
    "  return total_loss / tf.cast(num_batches, dtype=tf.float32)\n",
    "\n",
    "for epoch in range(EPOCHS):\n",
    "  train_loss = distributed_train_epoch(train_dist_dataset)\n",
    "\n",
    "  template = (\"Epoch {}, Loss: {}, Accuracy: {}\")\n",
    "  print(template.format(epoch + 1, train_loss, train_accuracy.result() * 100))\n",
    "\n",
    "  train_accuracy.reset_states()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MuZGXiyC7ABR"
   },
   "source": [
    "### レプリカ間でトレーニング損失を追跡する\n",
    "\n",
    "注意: 一般的なルールとして、サンプルごとの値の追跡には`tf.keras.Metrics`を使用し、レプリカ内で集約された値を避ける必要があります。\n",
    "\n",
    "損失スケーリングの計算が実行されるため、レプリカ間でトレーニング損失を追跡するために `tf.keras.metrics.Mean` を使用することは推奨されません。\n",
    "\n",
    "例えば、次のような特徴を持つトレーニングジョブを実行するとします。\n",
    "\n",
    "- レプリカ 2 つ\n",
    "- 各レプリカで 2 つのサンプルを処理\n",
    "- 結果の損失値 : 各レプリカで [2,  3] および [4,  5]\n",
    "- グローバルバッチサイズ = 4\n",
    "\n",
    "損失スケーリングで損失値を加算して各レプリカのサンプルごとの損失の値を計算し、さらにグローバルバッチサイズで除算します。この場合は、`(2 + 3) / 4 = 1.25`および`(4 + 5) / 4 = 2.25`となります。\n",
    "\n",
    "`tf.keras.metrics.Mean` を使用して 2 つのレプリカ間の損失を追跡すると、異なる結果が得られます。この例では、`total` は 3.50、`count` は 2 となるため、メトリックで `result()` が呼び出されると、`total`/`count` = 1.75 となります。`tf.keras.Metrics` で計算された損失は、同期するレプリカの数に等しい追加の係数によってスケーリングされます。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xisYJaV9KZTN"
   },
   "source": [
    "### ガイドと例\n",
    "\n",
    "カスタムトレーニングループを用いた分散ストラテジーの使用例をここに幾つか示します。\n",
    "\n",
    "1. 分散型トレーニングガイド\n",
    "2. `MirroredStrategy`を使用した [DenseNet](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/densenet/distributed_train.py) の例。\n",
    "3. <code>MirroredStrategy</code>と`TPUStrategy`を使用してトレーニングされた <a>BERT</a> の例。この例は、分散トレーニングなどの間にチェックポイントから読み込む方法と、定期的にチェックポイントを生成する方法を理解するのに特に有用です。\n",
    "4. `MirroredStrategy` を使用してトレーニングされ、`keras_use_ctl` フラグを使用した有効化が可能な、[NCF](https://github.com/tensorflow/models/blob/master/official/recommendation/ncf_keras_main.py) の例。\n",
    "5. `MirroredStrategy`を使用してトレーニングされた、[NMT](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/nmt_with_attention/distributed_train.py) の例。\n",
    "\n",
    "その他の例は、[分散型ストラテジーガイド](../../guide/distributed_training.ipynb)の「*例とチュートリアル*」に記載されています。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6hEJNsokjOKs"
   },
   "source": [
    "## 次のステップ\n",
    "\n",
    "- 新しい`tf.distribute.Strategy` API を独自のモデルで試してみましょう。\n",
    "- TensorFlow モデルのパフォーマンスを最適化する方法についてのその他の詳細は、[`tf.function` によるパフォーマンスの改善](../../guide/function.ipynb)と [TensorFlow Profiler](../../guide/profiler.md) をご覧ください。\n",
    "- [TensorFlow での分散型トレーニング](../../guide/distributed_training.ipynb)ガイドでは、利用可能な分散ストラテジーの概要が説明されています。"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "custom_training.ipynb",
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}