{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rA5Mubike7OJ"
   },
   "source": [
    "##### Copyright 2020 The TensorFlow Authors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "cellView": "form",
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:23.208425Z",
     "iopub.status.busy": "2024-01-11T18:21:23.208189Z",
     "iopub.status.idle": "2024-01-11T18:21:23.211989Z",
     "shell.execute_reply": "2024-01-11T18:21:23.211409Z"
    },
    "id": "fY0a3LRYfHUl"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iNz7xXMSsAQa"
   },
   "source": [
    "# Parameter server training with ParameterServerStrategy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jHyqRIqxsJuc"
   },
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td>     <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/distribute/parameter_server_training\">     <img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\">     View on TensorFlow.org</a> </td>\n",
    "  <td>     <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/ja/tutorials/distribute/parameter_server_training.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\">Run in Google Colab</a> </td>\n",
    "  <td><a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/ja/tutorials/distribute/parameter_server_training.ipynb\">     <img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\">     View source on GitHub</a></td>\n",
    "  <td><a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/ja/tutorials/distribute/parameter_server_training.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\"> Download notebook</a></td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6v4D6QfcfTrm"
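   },
   "source": [
    "## Overview\n",
    "\n",
    "[Parameter server training](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf) is a common data-parallel method for scaling up model training on multiple machines.\n",
    "\n",
    "A parameter server training cluster consists of *workers* and *parameter servers*. Variables are created on the parameter servers, and they are read and updated by the workers in each step. By default, workers read and update these variables independently, without synchronizing with each other. This is why parameter server-style training is sometimes called *asynchronous training*.\n",
    "\n",
    "In TensorFlow 2, parameter server training is powered by the `tf.distribute.ParameterServerStrategy` class, which distributes the training steps to a cluster that scales up to thousands of workers (accompanied by parameter servers)."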
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "W1LGfTdgOF-J"
   },
   "source": [
    "### Supported training methods\n",
    "\n",
    "There are two main supported training methods:\n",
    "\n",
    "- The Keras `Model.fit` API: use it if you prefer a high-level abstraction that handles the training for you. This is generally recommended if you are training a `tf.keras.Model`.\n",
    "- A custom training loop: use it if you prefer to define the details of your training loop yourself (for more details, refer to the guides on [Custom training](../customization/custom_training_walkthrough.ipynb), [Writing a training loop from scratch](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch), and [Custom training loop with Keras and MultiWorkerMirroredStrategy](multi_worker_with_ctl.ipynb))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FjbULGvV7NRz"
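   },
   "source": [
    "### A cluster with jobs and tasks\n",
    "\n",
    "Regardless of the API of choice (`Model.fit` or a custom training loop), distributed training in TensorFlow 2 involves a cluster with several `'jobs'`, and each job may have one or more `'tasks'`.\n",
    "\n",
    "When using parameter server training, it is recommended to have:\n",
    "\n",
    "- One *coordinator* job (with the job name `chief`)\n",
    "- Multiple *worker* jobs (with the job name `worker`)\n",
    "- Multiple *parameter server* jobs (with the job name `ps`)\n",
    "\n",
    "The *coordinator* creates resources, dispatches training tasks, writes checkpoints, and deals with task failures. The *workers* and *parameter servers* run `tf.distribute.Server` instances that listen for requests from the coordinator."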
   ]
  },
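  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of how jobs and tasks relate, the sketch below describes a cluster as a plain Python dictionary that maps each job name to the addresses of its tasks, which is the same shape of dictionary that `tf.train.ClusterSpec` accepts. The host names, ports, and the `list_tasks` helper are hypothetical and exist only for this example; the code uses the standard library only.\n",
    "\n",
    "```python\n",
    "# Hypothetical host names and ports, used only for illustration.\n",
    "cluster = {\n",
    "    'chief': ['chief0.example.com:2222'],\n",
    "    'worker': ['worker0.example.com:2222', 'worker1.example.com:2222'],\n",
    "    'ps': ['ps0.example.com:2222'],\n",
    "}\n",
    "\n",
    "def list_tasks(cluster_dict):\n",
    "  '''Returns (job_name, task_index, address) for every task in the cluster.'''\n",
    "  tasks = []\n",
    "  for job_name, addresses in cluster_dict.items():\n",
    "    # The position of an address in its job's list is the task index.\n",
    "    for task_index, address in enumerate(addresses):\n",
    "      tasks.append((job_name, task_index, address))\n",
    "  return tasks\n",
    "\n",
    "for job_name, task_index, address in list_tasks(cluster):\n",
    "  print(f'/job:{job_name}/task:{task_index} -> {address}')\n",
    "```\n",
    "\n",
    "Each job is a named group of tasks, and a task is identified by its job name plus its index within that job."
   ]
  },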
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oLV1FbpLtqtB"
   },
   "source": [
    "### Parameter server training with the `Model.fit` API\n",
    "\n",
    "Parameter server training with the `Model.fit` API requires the coordinator to use a `tf.distribute.ParameterServerStrategy` object. As with `Model.fit` used with no strategy or with other strategies, the workflow involves creating and compiling the model, preparing the callbacks, and calling `Model.fit`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yJ5AosxFyfzk"
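   },
   "source": [
    "### Parameter server training with a custom training loop\n",
    "\n",
    "With custom training loops, the `tf.distribute.coordinator.ClusterCoordinator` class is the key component used for the coordinator.\n",
    "\n",
    "- The `ClusterCoordinator` class needs to work in conjunction with a `tf.distribute.ParameterServerStrategy` object.\n",
    "- This `tf.distribute.Strategy` object is needed to provide the information about the cluster and is used to define a training step, as demonstrated in [Custom training with tf.distribute.Strategy](custom_training.ipynb).\n",
    "- The `ClusterCoordinator` object then dispatches the execution of these training steps to remote workers.\n",
    "\n",
    "The most important API provided by the `ClusterCoordinator` object is `schedule`:\n",
    "\n",
    "- The `schedule` API enqueues a `tf.function` and returns a future-like `RemoteValue` immediately.\n",
    "- The queued functions are dispatched to remote workers in background threads, and their `RemoteValue`s are filled asynchronously.\n",
    "- Since `schedule` doesn't require worker assignment, the `tf.function` passed in can be executed on any available worker.\n",
    "- If the worker it is executed on becomes unavailable before the function completes, the function is retried on another available worker.\n",
    "- Because of this retry behavior, and because function execution is not atomic, a single function call may be executed more than once.\n",
    "\n",
    "In addition to dispatching remote functions, the `ClusterCoordinator` also helps to create datasets on all the workers and to rebuild these datasets when a worker recovers from failure."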
   ]
  },
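  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scheduling semantics described above can be mimicked with the Python standard library alone. The sketch below is an analogy, not TensorFlow code: a thread pool stands in for the remote workers, a `concurrent.futures.Future` stands in for `RemoteValue`, and `schedule_with_retry` and `flaky_step` are hypothetical names introduced for this example. It is meant to show why a scheduled function may end up running more than once.\n",
    "\n",
    "```python\n",
    "import concurrent.futures\n",
    "\n",
    "def schedule_with_retry(executor, fn, max_attempts=3):\n",
    "  '''Queues `fn` and returns a future immediately; retries `fn` on failure.'''\n",
    "  def run():\n",
    "    last_error = None\n",
    "    for _ in range(max_attempts):\n",
    "      try:\n",
    "        return fn()\n",
    "      except RuntimeError as error:\n",
    "        # Stands in for a worker becoming unavailable mid-execution.\n",
    "        last_error = error\n",
    "    raise last_error\n",
    "  return executor.submit(run)\n",
    "\n",
    "calls = []\n",
    "\n",
    "def flaky_step():\n",
    "  '''Fails on its first call to mimic a worker lost before completion.'''\n",
    "  calls.append(1)\n",
    "  if len(calls) == 1:\n",
    "    raise RuntimeError('worker unavailable')\n",
    "  return 0.5  # A made-up loss value.\n",
    "\n",
    "with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:\n",
    "  future = schedule_with_retry(executor, flaky_step)\n",
    "  # Blocks until the value is ready, similar in spirit to fetching a remote value.\n",
    "  result = future.result()\n",
    "\n",
    "print(result, len(calls))\n",
    "```\n",
    "\n",
    "Here the single scheduled call runs `flaky_step` twice, so any side effect inside the function (appending to `calls` in this sketch) is applied more than once. Functions passed to `schedule` should tolerate this kind of repeated execution."
   ]
  },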
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MyDnWjmOje5-"
   },
   "source": [
    "## Tutorial setup\n",
    "\n",
    "The tutorial branches into separate sections for `Model.fit` and for custom training loops. Sections other than \"Training with X\" apply to both paths."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:23.216227Z",
     "iopub.status.busy": "2024-01-11T18:21:23.215695Z",
     "iopub.status.idle": "2024-01-11T18:21:25.349496Z",
     "shell.execute_reply": "2024-01-11T18:21:25.348590Z"
    },
    "id": "0-V3LUcIs4a-"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting portpicker\r\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  Downloading portpicker-1.6.0-py3-none-any.whl.metadata (1.5 kB)\r\n",
      "Requirement already satisfied: psutil in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from portpicker) (5.9.7)\r\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading portpicker-1.6.0-py3-none-any.whl (16 kB)\r\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Installing collected packages: portpicker\r\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Successfully installed portpicker-1.6.0\r\n"
     ]
    }
   ],
   "source": [
    "!pip install portpicker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:25.353748Z",
     "iopub.status.busy": "2024-01-11T18:21:25.353449Z",
     "iopub.status.idle": "2024-01-11T18:21:27.741891Z",
     "shell.execute_reply": "2024-01-11T18:21:27.741163Z"
    },
    "id": "GlI_NAVFae3J"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-01-11 18:21:25.783779: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-01-11 18:21:25.783829: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-01-11 18:21:25.785347: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n"
     ]
    }
   ],
   "source": [
    "#@title\n",
    "import multiprocessing\n",
    "import os\n",
    "import random\n",
    "import portpicker\n",
    "import tensorflow as tf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uvwgM2rzgzIC"
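   },
   "source": [
    "## Cluster setup\n",
    "\n",
    "As mentioned above, a parameter server training cluster requires a coordinator task that runs your training program, one or several workers and parameter server tasks that run TensorFlow servers (`tf.distribute.Server`), and possibly an additional evaluation task that runs sidecar evaluation (refer to the [sidecar evaluation section](#sidecar_evaluation) below). The requirements to set them up are:\n",
    "\n",
    "- The coordinator task needs to know the addresses and ports of all other TensorFlow servers, except the evaluator.\n",
    "- The workers and parameter servers need to know which port they should listen on. Usually, you pass in the complete cluster information when creating the TensorFlow servers on these tasks.\n",
    "- The evaluator task doesn't have to know the setup of the training cluster. If it does, it should not attempt to connect to the training cluster.\n",
    "- Workers and parameter servers should have the task types `\"worker\"` and `\"ps\"`, respectively. The coordinator should use `\"chief\"` as its task type for legacy reasons.\n",
    "\n",
    "In this tutorial, you will create an in-process cluster so that the whole parameter server training can be run in Colab. You will learn how to set up [real clusters](#real_clusters) in a later section."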
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7UNs7Lm2g19n"
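   },
   "source": [
    "### In-process cluster\n",
    "\n",
    "You will start by creating several TensorFlow servers in advance, and you will connect to them later. Note that this is only for the purpose of this tutorial's demonstration; in real training, the servers are started on the `\"worker\"` and `\"ps\"` machines."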
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:27.746509Z",
     "iopub.status.busy": "2024-01-11T18:21:27.745805Z",
     "iopub.status.idle": "2024-01-11T18:21:29.832478Z",
     "shell.execute_reply": "2024-01-11T18:21:29.831596Z"
    },
    "id": "FbrP5pXuaoVH"
   },
   "outputs": [],
   "source": [
    "def create_in_process_cluster(num_workers, num_ps):\n",
    "  \"\"\"Creates and starts local servers and returns the cluster_resolver.\"\"\"\n",
    "  worker_ports = [portpicker.pick_unused_port() for _ in range(num_workers)]\n",
    "  ps_ports = [portpicker.pick_unused_port() for _ in range(num_ps)]\n",
    "\n",
    "  cluster_dict = {}\n",
    "  cluster_dict[\"worker\"] = [\"localhost:%s\" % port for port in worker_ports]\n",
    "  if num_ps > 0:\n",
    "    cluster_dict[\"ps\"] = [\"localhost:%s\" % port for port in ps_ports]\n",
    "\n",
    "  cluster_spec = tf.train.ClusterSpec(cluster_dict)\n",
    "\n",
    "  # Workers need some inter_ops threads to work properly.\n",
    "  worker_config = tf.compat.v1.ConfigProto()\n",
    "  if multiprocessing.cpu_count() < num_workers + 1:\n",
    "    worker_config.inter_op_parallelism_threads = num_workers + 1\n",
    "\n",
    "  for i in range(num_workers):\n",
    "    tf.distribute.Server(\n",
    "        cluster_spec,\n",
    "        job_name=\"worker\",\n",
    "        task_index=i,\n",
    "        config=worker_config,\n",
    "        protocol=\"grpc\")\n",
    "\n",
    "  for i in range(num_ps):\n",
    "    tf.distribute.Server(\n",
    "        cluster_spec,\n",
    "        job_name=\"ps\",\n",
    "        task_index=i,\n",
    "        protocol=\"grpc\")\n",
    "\n",
    "  cluster_resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(\n",
    "      cluster_spec, rpc_layer=\"grpc\")\n",
    "  return cluster_resolver\n",
    "\n",
    "# Set the environment variable to allow reporting worker and ps failure to the\n",
    "# coordinator. This is a workaround and won't be necessary in the future.\n",
    "os.environ[\"GRPC_FAIL_FAST\"] = \"use_caller\"\n",
    "\n",
    "NUM_WORKERS = 3\n",
    "NUM_PS = 2\n",
    "cluster_resolver = create_in_process_cluster(NUM_WORKERS, NUM_PS)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pX_91OByt0J2"
   },
   "source": [
    "The in-process cluster setup is frequently used in unit testing (see [this example](https://github.com/tensorflow/tensorflow/blob/eb4c40fc91da260199fa2aed6fe67d36ad49fafd/tensorflow/python/distribute/coordinator/cluster_coordinator_test.py#L447)).\n",
    "\n",
    "Another option for local testing is to launch processes on the local machine. Check out [Multi-worker training with Keras](multi_worker_with_keras.ipynb) for an example of this approach."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zyby6M2Jqg6J"
   },
   "source": [
    "## Instantiate a ParameterServerStrategy\n",
    "\n",
    "Before you dive into the training code, instantiate a `tf.distribute.ParameterServerStrategy` object. Note that this is needed regardless of whether you are proceeding with `Model.fit` or a custom training loop. The `variable_partitioner` argument is explained in the [Variable sharding section](#variable_sharding)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:29.838263Z",
     "iopub.status.busy": "2024-01-11T18:21:29.837554Z",
     "iopub.status.idle": "2024-01-11T18:21:29.979415Z",
     "shell.execute_reply": "2024-01-11T18:21:29.978639Z"
    },
    "id": "_YyEPgisrC35"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:`tf.distribute.experimental.ParameterServerStrategy` is initialized with cluster_spec: ClusterSpec({'ps': ['localhost:44073', 'localhost:33727'], 'worker': ['localhost:33687', 'localhost:35065', 'localhost:39575']})\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:ParameterServerStrategyV2 is now connecting to cluster with cluster_spec: ClusterSpec({'ps': ['localhost:44073', 'localhost:33727'], 'worker': ['localhost:33687', 'localhost:35065', 'localhost:39575']})\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:ParameterServerStrategy (CentralStorageStrategy if you are using a single machine) with compute_devices = ['/job:chief/replica:0/task:0/device:GPU:0', '/job:chief/replica:0/task:0/device:GPU:1', '/job:chief/replica:0/task:0/device:GPU:2', '/job:chief/replica:0/task:0/device:GPU:3'], variable_device = '/device:CPU:0'\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:ParameterServerStrategy (CentralStorageStrategy if you are using a single machine) with compute_devices = ['/job:chief/replica:0/task:0/device:GPU:0', '/job:chief/replica:0/task:0/device:GPU:1', '/job:chief/replica:0/task:0/device:GPU:2', '/job:chief/replica:0/task:0/device:GPU:3'], variable_device = '/device:CPU:0'\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Number of GPUs on workers: 4\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Number of GPUs on workers: 4\n"
     ]
    }
   ],
   "source": [
    "variable_partitioner = (\n",
    "    tf.distribute.experimental.partitioners.MinSizePartitioner(\n",
    "        min_shard_bytes=(256 << 10),\n",
    "        max_shards=NUM_PS))\n",
    "\n",
    "strategy = tf.distribute.ParameterServerStrategy(\n",
    "    cluster_resolver,\n",
    "    variable_partitioner=variable_partitioner)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WlAQxuMDJ3k9"
   },
   "source": [
    "To use GPUs for training, allocate GPUs that are visible to each worker. `ParameterServerStrategy` uses all the available GPUs on each worker, with the restriction that all workers must have the same number of GPUs available."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QMmBLsf6sEXh"
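   },
   "source": [
    "### Variable sharding\n",
    "\n",
    "Variable sharding refers to splitting a variable into multiple smaller variables, which are called *shards*. Variable sharding may be useful to distribute the network load when accessing these shards. It is also useful to distribute the computation and storage of a normal variable across multiple parameter servers, for example, when using very large embeddings that may not fit in a single machine's memory.\n",
    "\n",
    "To enable variable sharding, pass in a `variable_partitioner` when constructing a `ParameterServerStrategy` object. The `variable_partitioner` is invoked every time a variable is created, and it is expected to return the number of shards along each dimension of the variable. Some out-of-the-box `variable_partitioner`s are provided, such as `tf.distribute.experimental.partitioners.MinSizePartitioner`. It is recommended to use size-based partitioners like `tf.distribute.experimental.partitioners.MinSizePartitioner` to avoid partitioning small variables, which could have a negative impact on model training speed."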
   ]
  },
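  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea of a size-based partitioner more concrete, here is a simplified, standard-library sketch of the arithmetic such a partitioner might perform. It is an approximation written for this tutorial, not the implementation of `MinSizePartitioner`: the function name `min_size_num_shards` is hypothetical, the exact rounding used by TensorFlow may differ, and the sketch assumes a variable of rank 1 or higher that is split along its first axis only.\n",
    "\n",
    "```python\n",
    "def min_size_num_shards(shape, bytes_per_element, min_shard_bytes, max_shards):\n",
    "  '''Approximates the per-dimension shard counts of a size-based partitioner.\n",
    "\n",
    "  Only the first axis is split. Every shard keeps at least `min_shard_bytes`,\n",
    "  and the shard count never exceeds `max_shards` or the first dimension.\n",
    "  '''\n",
    "  total_bytes = bytes_per_element\n",
    "  for dim in shape:\n",
    "    total_bytes *= dim\n",
    "  num_shards = max(1, min(max_shards, shape[0], total_bytes // min_shard_bytes))\n",
    "  return [num_shards] + [1] * (len(shape) - 1)\n",
    "\n",
    "# A 1024 x 1024 float32 variable (4 MiB), a 256 KiB minimum, and 2 servers.\n",
    "print(min_size_num_shards((1024, 1024), 4, 256 << 10, 2))  # [2, 1]\n",
    "# A 10 x 10 float32 variable (400 bytes) is too small to be worth splitting.\n",
    "print(min_size_num_shards((10, 10), 4, 256 << 10, 2))  # [1, 1]\n",
    "```\n",
    "\n",
    "With the values used in this tutorial (`min_shard_bytes=(256 << 10)` and `max_shards=NUM_PS`), small variables stay in one piece under this approximation, while large ones are split across at most `NUM_PS` parameter servers."
   ]
  },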
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1--SxlxtsOb7"
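   },
   "source": [
    "When a `variable_partitioner` is passed in and you create a variable directly under `Strategy.scope`, the variable becomes a container type with a `variables` property, which provides access to the list of shards. In most cases, this container is automatically converted to a tensor by concatenating all the shards, so it can be used as a normal variable. On the other hand, some TensorFlow methods, such as `tf.nn.embedding_lookup`, provide an efficient implementation for this container type, and automatic concatenation is avoided in these methods.\n",
    "\n",
    "Refer to the API docs of `tf.distribute.ParameterServerStrategy` for more details."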
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jlOq-O-26O1d"
   },
   "source": [
    "## Training with `Model.fit`\n",
    "\n",
    "<a id=\"training_with_modelfit\"></a>\n",
    "\n",
    "Keras provides an easy-to-use training API via `Model.fit`. It handles the training loop under the hood, with the flexibility of an overridable `train_step` and of callbacks that provide functionality such as checkpoint saving or summary saving for TensorBoard. With `Model.fit`, the same training code can be used with other strategies by simply swapping the strategy object."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oMZ9Cu5J6ZGi"
   },
   "source": [
    "### Input data\n",
    "\n",
    "Keras `Model.fit` with `tf.distribute.ParameterServerStrategy` can take input data in the form of a `tf.data.Dataset`, a `tf.distribute.DistributedDataset`, or a `tf.keras.utils.experimental.DatasetCreator`, with `Dataset` being the recommended option for ease of use. If you encounter memory issues when using `Dataset`, however, you may need to use `DatasetCreator` with a callable `dataset_fn` argument (refer to the `tf.keras.utils.experimental.DatasetCreator` API documentation for details).\n",
    "\n",
    "If you transform your dataset into a `tf.data.Dataset`, you should use `Dataset.shuffle` and `Dataset.repeat`, as demonstrated in the code example below.\n",
    "\n",
    "- Keras `Model.fit` with parameter server training assumes that each worker receives the same dataset, except when it is shuffled differently. Therefore, calling `Dataset.shuffle` ensures more even iterations over the data.\n",
    "- Because workers do not synchronize, they may finish processing their datasets at different times. Therefore, the easiest way to define epochs with parameter server training is to use `Dataset.repeat`, which repeats a dataset indefinitely when called without an argument, and to specify the `steps_per_epoch` argument in the `Model.fit` call.\n",
    "\n",
    "Refer to the \"Training workflows\" section of the [tf.data guide](../../guide/data.ipynb) for more details on `shuffle` and `repeat`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:29.985505Z",
     "iopub.status.busy": "2024-01-11T18:21:29.985233Z",
     "iopub.status.idle": "2024-01-11T18:21:30.389883Z",
     "shell.execute_reply": "2024-01-11T18:21:30.388976Z"
    },
    "id": "shAo1CCS7wU1"
   },
   "outputs": [],
   "source": [
    "global_batch_size = 64\n",
    "\n",
    "x = tf.random.uniform((10, 10))\n",
    "y = tf.random.uniform((10,))\n",
    "\n",
    "dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(10).repeat()\n",
    "dataset = dataset.batch(global_batch_size)\n",
    "dataset = dataset.prefetch(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "v_jhF70K7zON"
   },
   "source": [
    "If you instead create your dataset with `tf.keras.utils.experimental.DatasetCreator`, the code in `dataset_fn` is invoked on the input device, which is usually the CPU, on each of the worker machines.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "w60PuWrWwBD4"
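   },
   "source": [
    "### Model construction and compiling\n",
    "\n",
    "First, create a `tf.keras.Model` (a trivial `tf.keras.models.Sequential` model for demonstration purposes). Then call `Model.compile` to incorporate components such as an optimizer and metrics, along with other parameters such as `steps_per_execution`."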
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:30.393522Z",
     "iopub.status.busy": "2024-01-11T18:21:30.393280Z",
     "iopub.status.idle": "2024-01-11T18:21:30.436118Z",
     "shell.execute_reply": "2024-01-11T18:21:30.435441Z"
    },
    "id": "PhTHUYaD74vT"
   },
   "outputs": [],
   "source": [
    "with strategy.scope():\n",
    "  model = tf.keras.models.Sequential([tf.keras.layers.Dense(10)])\n",
    "\n",
    "  model.compile(tf.keras.optimizers.legacy.SGD(), loss=\"mse\", steps_per_execution=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nWb_Ekm377YX"
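   },
   "source": [
    "### Callbacks and training\n",
    "\n",
    "<a id=\"callbacks-and-training\"> </a>\n",
    "\n",
    "Before you call Keras `Model.fit` for the actual training, prepare any needed [callbacks](https://www.tensorflow.org/guide/keras/train_and_evaluate) for common tasks, such as:\n",
    "\n",
    "- `tf.keras.callbacks.ModelCheckpoint`: saves the model at a certain frequency, such as after every epoch.\n",
    "- `tf.keras.callbacks.BackupAndRestore`: provides fault tolerance by backing up the model and the current epoch number in case the cluster becomes unavailable (for example, due to an abort or preemption). You can then restore the training state when restarting after a job failure and continue training from the beginning of the interrupted epoch.\n",
    "- `tf.keras.callbacks.TensorBoard`: periodically writes model logs to summary files that can be visualized in the TensorBoard tool.\n",
    "\n",
    "Note: For performance reasons, custom callbacks cannot override batch-level callbacks when used with `ParameterServerStrategy`. Modify your custom callbacks to make them epoch-level calls, and adjust `steps_per_epoch` to a suitable value. In addition, `steps_per_epoch` is a required argument for `Model.fit` when used with `ParameterServerStrategy`."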
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:30.439343Z",
     "iopub.status.busy": "2024-01-11T18:21:30.439118Z",
     "iopub.status.idle": "2024-01-11T18:21:37.933103Z",
     "shell.execute_reply": "2024-01-11T18:21:37.932233Z"
    },
    "id": "3ddUvUZk7_wm"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py:462: UserWarning: To make it possible to preserve tf.data options across serialization boundaries, their implementation has moved to be part of the TensorFlow graph. As a consequence, the options value is in general no longer known at graph construction time. Invoking this method in graph mode retains the legacy behavior of the original implementation, but note that the returned value might not reflect the actual value of the options.\n",
      "  warnings.warn(\"To make it possible to preserve tf.data options across \"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/5\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20/20 - 4s - loss: 0.9102 - 4s/epoch - 203ms/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 2/5\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20/20 - 1s - loss: 0.7032 - 1s/epoch - 60ms/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 3/5\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:5 out of the last 5 calls to <function MultiDeviceSaver.save.<locals>.tf_function_save at 0x7f6f08236790> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:5 out of the last 5 calls to <function MultiDeviceSaver.save.<locals>.tf_function_save at 0x7f6f08236790> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:6 out of the last 6 calls to <function MultiDeviceSaver.save.<locals>.tf_function_save at 0x7f6f0845a280> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:6 out of the last 6 calls to <function MultiDeviceSaver.save.<locals>.tf_function_save at 0x7f6f0845a280> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20/20 - 1s - loss: 0.5557 - 554ms/epoch - 28ms/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 4/5\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20/20 - 1s - loss: 0.4427 - 554ms/epoch - 28ms/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 5/5\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Assets written to: /tmp/my_working_dir/ckpt/assets\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20/20 - 1s - loss: 0.3580 - 553ms/epoch - 28ms/step\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.src.callbacks.History at 0x7f6f146b7fa0>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "working_dir = \"/tmp/my_working_dir\"\n",
    "log_dir = os.path.join(working_dir, \"log\")\n",
    "ckpt_filepath = os.path.join(working_dir, \"ckpt\")\n",
    "backup_dir = os.path.join(working_dir, \"backup\")\n",
    "\n",
    "callbacks = [\n",
    "    tf.keras.callbacks.TensorBoard(log_dir=log_dir),\n",
    "    tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_filepath),\n",
    "    tf.keras.callbacks.BackupAndRestore(backup_dir=backup_dir),\n",
    "]\n",
    "\n",
    "model.fit(dataset, epochs=5, steps_per_epoch=20, callbacks=callbacks)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uWgP1h2z8B3j"
   },
   "source": [
    "### Use directly with `ClusterCoordinator` (optional)\n",
    "\n",
    "Even if you choose the `Model.fit` training path, you can optionally instantiate a `tf.distribute.coordinator.ClusterCoordinator` object to schedule other functions that you would like to run on the workers. Refer to the [Training with a custom training loop](#training_with_custom_training_loop) section for details and examples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GxypEyIthR0z"
   },
   "source": [
    "## Training with a custom training loop\n",
    "\n",
    "<a id=\"training_with_custom_training_loop\"> </a>\n",
    "\n",
    "Using custom training loops with `tf.distribute.Strategy` gives you great flexibility in defining training loops. With the `ParameterServerStrategy` defined above (as `strategy`), you use a `tf.distribute.coordinator.ClusterCoordinator` to dispatch the execution of training steps to remote workers.\n",
    "\n",
    "Then you create a model, define a dataset, and define a step function, as you would in a training loop with any other `tf.distribute.Strategy`. For more details, refer to the [Custom training with tf.distribute.Strategy](custom_training.ipynb) tutorial.\n",
    "\n",
    "To prefetch the dataset efficiently, use the recommended distributed dataset creation APIs described in the [Dispatch training steps to remote workers](#dispatch_training_steps_to_remote_workers) section below. Also, call `Strategy.run` inside `worker_fn` (the function scheduled on each worker, `step_fn` in this tutorial) to take full advantage of the GPUs allocated to workers. The remaining steps are the same whether or not you train with GPUs.\n",
    "\n",
    "Create these components in the following steps.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4QNkCtV8VivM"
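   },
   "source": [
    "### Set up the data\n",
    "\n",
    "First, write a function that creates a dataset.\n",
    "\n",
    "If you would like to preprocess the data with [Keras preprocessing layers](https://www.tensorflow.org/guide/keras/preprocessing_layers) or [TensorFlow Transform layers](https://www.tensorflow.org/tfx/tutorials/transform/simple), create these layers **outside the `dataset_fn`** and **under `Strategy.scope`**, as you would for any other Keras layer. This is because the `dataset_fn` is wrapped into a `tf.function` and then executed on each worker to generate the data pipeline.\n",
    "\n",
    "If you don't follow this procedure, creating the layers may create TensorFlow state that is lifted out of the `tf.function` onto the coordinator. Accessing that state on workers then incurs repeated RPC calls between the coordinator and the workers, which can cause a significant slowdown.\n",
    "\n",
    "Placing the layers under `Strategy.scope` instead creates them on all workers. You then apply the transformation inside the `dataset_fn` via `tf.data.Dataset.map`. Refer to *Data preprocessing* in the [Distributed input](input.ipynb) tutorial for more information on data preprocessing with distributed input."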
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:37.936784Z",
     "iopub.status.busy": "2024-01-11T18:21:37.936509Z",
     "iopub.status.idle": "2024-01-11T18:21:38.001527Z",
     "shell.execute_reply": "2024-01-11T18:21:38.000782Z"
    },
    "id": "2GUwATssauus"
   },
   "outputs": [],
   "source": [
    "feature_vocab = [\n",
    "    \"avenger\", \"ironman\", \"batman\", \"hulk\", \"spiderman\", \"kingkong\", \"wonder_woman\"\n",
    "]\n",
    "label_vocab = [\"yes\", \"no\"]\n",
    "\n",
    "with strategy.scope():\n",
    "  feature_lookup_layer = tf.keras.layers.StringLookup(\n",
    "      vocabulary=feature_vocab,\n",
    "      mask_token=None)\n",
    "  label_lookup_layer = tf.keras.layers.StringLookup(\n",
    "      vocabulary=label_vocab,\n",
    "      num_oov_indices=0,\n",
    "      mask_token=None)\n",
    "\n",
    "  raw_feature_input = tf.keras.layers.Input(\n",
    "      shape=(3,),\n",
    "      dtype=tf.string,\n",
    "      name=\"feature\")\n",
    "  feature_id_input = feature_lookup_layer(raw_feature_input)\n",
    "  feature_preprocess_stage = tf.keras.Model(\n",
    "      {\"features\": raw_feature_input},\n",
    "      feature_id_input)\n",
    "\n",
    "  raw_label_input = tf.keras.layers.Input(\n",
    "      shape=(1,),\n",
    "      dtype=tf.string,\n",
    "      name=\"label\")\n",
    "  label_id_input = label_lookup_layer(raw_label_input)\n",
    "\n",
    "  label_preprocess_stage = tf.keras.Model(\n",
    "      {\"label\": raw_label_input},\n",
    "      label_id_input)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Jgp8MX_7OR_A"
   },
   "source": [
    "Generate toy examples for the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.004775Z",
     "iopub.status.busy": "2024-01-11T18:21:38.004522Z",
     "iopub.status.idle": "2024-01-11T18:21:38.009772Z",
     "shell.execute_reply": "2024-01-11T18:21:38.009085Z"
    },
    "id": "chIY4fFANaFH"
   },
   "outputs": [],
   "source": [
    "def feature_and_label_gen(num_examples=200):\n",
    "  examples = {\"features\": [], \"label\": []}\n",
    "  for _ in range(num_examples):\n",
    "    features = random.sample(feature_vocab, 3)\n",
    "    label = [\"yes\"] if \"avenger\" in features else [\"no\"]\n",
    "    examples[\"features\"].append(features)\n",
    "    examples[\"label\"].append(label)\n",
    "  return examples\n",
    "\n",
    "examples = feature_and_label_gen()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2AtZBya7OeyZ"
   },
   "source": [
    "次に、`dataset_fn` にラップされたトレーニングデータセットを作成します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.012679Z",
     "iopub.status.busy": "2024-01-11T18:21:38.012445Z",
     "iopub.status.idle": "2024-01-11T18:21:38.016494Z",
     "shell.execute_reply": "2024-01-11T18:21:38.015861Z"
    },
    "id": "Gs0QYRZoNbvw"
   },
   "outputs": [],
   "source": [
    "def dataset_fn(_):\n",
    "  raw_dataset = tf.data.Dataset.from_tensor_slices(examples)\n",
    "\n",
    "  train_dataset = raw_dataset.map(\n",
    "      lambda x: (\n",
    "          {\"features\": feature_preprocess_stage(x[\"features\"])},\n",
    "          label_preprocess_stage(x[\"label\"])\n",
    "      )).shuffle(200).batch(32).repeat()\n",
    "  return train_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IT9PQexJiFtB"
   },
   "source": [
    "### Build the model\n",
    "\n",
    "Next, create the model and other objects. Make sure to create all variables under `Strategy.scope`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.019501Z",
     "iopub.status.busy": "2024-01-11T18:21:38.019259Z",
     "iopub.status.idle": "2024-01-11T18:21:38.102884Z",
     "shell.execute_reply": "2024-01-11T18:21:38.102204Z"
    },
    "id": "Quxud1uEazeo"
   },
   "outputs": [],
   "source": [
    "# These variables created under the `Strategy.scope` will be placed on parameter\n",
    "# servers in a round-robin fashion.\n",
    "with strategy.scope():\n",
    "  # Create the model. The input needs to be compatible with Keras processing layers.\n",
    "  model_input = tf.keras.layers.Input(\n",
    "      shape=(3,), dtype=tf.int64, name=\"model_input\")\n",
    "\n",
    "  emb_layer = tf.keras.layers.Embedding(\n",
    "      input_dim=len(feature_lookup_layer.get_vocabulary()), output_dim=16384)\n",
    "  emb_output = tf.reduce_mean(emb_layer(model_input), axis=1)\n",
    "  dense_output = tf.keras.layers.Dense(\n",
    "      units=1, activation=\"sigmoid\",\n",
    "      kernel_regularizer=tf.keras.regularizers.L2(1e-4),\n",
    "  )(emb_output)\n",
    "  model = tf.keras.Model({\"features\": model_input}, dense_output)\n",
    "\n",
    "  optimizer = tf.keras.optimizers.legacy.RMSprop(learning_rate=0.1)\n",
    "  accuracy = tf.keras.metrics.Accuracy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iyuxiqCQU50m"
   },
   "source": [
    "Confirm that using `FixedShardsPartitioner` split all variables into two shards and that each shard was assigned to a different parameter server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.106114Z",
     "iopub.status.busy": "2024-01-11T18:21:38.105880Z",
     "iopub.status.idle": "2024-01-11T18:21:38.110274Z",
     "shell.execute_reply": "2024-01-11T18:21:38.109646Z"
    },
    "id": "04r1nO4WVDO1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/job:ps/replica:0/task:1/device:CPU:0\n",
      "/job:ps/replica:0/task:0/device:CPU:0\n"
     ]
    }
   ],
   "source": [
    "assert len(emb_layer.weights) == 2\n",
    "assert emb_layer.weights[0].shape == (4, 16384)\n",
    "assert emb_layer.weights[1].shape == (4, 16384)\n",
    "\n",
    "print(emb_layer.weights[0].device)\n",
    "print(emb_layer.weights[1].device)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lWhfXZLRiHyM"
   },
   "source": [
    "### Define the training step\n",
    "\n",
    "Third, create the training step wrapped into a `tf.function`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.113670Z",
     "iopub.status.busy": "2024-01-11T18:21:38.113114Z",
     "iopub.status.idle": "2024-01-11T18:21:38.121528Z",
     "shell.execute_reply": "2024-01-11T18:21:38.120891Z"
    },
    "id": "aNNVo0bFa1K9"
   },
   "outputs": [],
   "source": [
    "@tf.function\n",
    "def step_fn(iterator):\n",
    "\n",
    "  def replica_fn(batch_data, labels):\n",
    "    with tf.GradientTape() as tape:\n",
    "      pred = model(batch_data, training=True)\n",
    "      per_example_loss = tf.keras.losses.BinaryCrossentropy(\n",
    "          reduction=tf.keras.losses.Reduction.NONE)(labels, pred)\n",
    "      loss = tf.nn.compute_average_loss(per_example_loss)\n",
    "      model_losses = model.losses\n",
    "      if model_losses:\n",
    "        loss += tf.nn.scale_regularization_loss(tf.add_n(model_losses))\n",
    "    gradients = tape.gradient(loss, model.trainable_variables)\n",
    "\n",
    "    optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n",
    "\n",
    "    actual_pred = tf.cast(tf.greater(pred, 0.5), tf.int64)\n",
    "    accuracy.update_state(labels, actual_pred)\n",
    "    return loss\n",
    "\n",
    "  batch_data, labels = next(iterator)\n",
    "  losses = strategy.run(replica_fn, args=(batch_data, labels))\n",
    "  return strategy.reduce(tf.distribute.ReduceOp.SUM, losses, axis=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rvrYQUeYiLNy"
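   },
   "source": [
    "In the training step function above, calling `Strategy.run` and `Strategy.reduce` in `step_fn` supports multiple GPUs per worker. If the workers have GPUs allocated, `Strategy.run` distributes the dataset across multiple replicas (GPUs). Their parallel calls to `tf.nn.compute_average_loss()` compute the average of the loss across the replicas (GPUs) of one worker, independent of the total number of workers."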
   ]
  },
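  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small standalone illustration of the loss scaling described above, the following sketch calls `tf.nn.compute_average_loss` outside any strategy. The loss values and the explicit `global_batch_size=8` are made up for illustration; inside `Strategy.run` the global batch size is normally inferred from the per-replica batch size and the number of replicas.\n",
    "\n",
    "```python\n",
    "import tensorflow as tf\n",
    "\n",
    "# Hypothetical per-example losses seen by one replica (4 of 8 examples).\n",
    "per_example_loss = tf.constant([1.0, 2.0, 3.0, 4.0])\n",
    "\n",
    "# The summed loss (10.0) is divided by the global batch size (8),\n",
    "# not by the 4 examples this replica processed.\n",
    "replica_loss = tf.nn.compute_average_loss(\n",
    "    per_example_loss, global_batch_size=8)\n",
    "print(float(replica_loss))\n",
    "```\n",
    "\n",
    "Because each replica scales its loss this way, summing the per-replica results with `Strategy.reduce` and `tf.distribute.ReduceOp.SUM`, as `step_fn` does, gives the mean loss over the worker's whole batch."
   ]
  },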
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GPJ3PV_L2zAY"
   },
   "source": [
    "### Dispatch training steps to remote workers\n",
    "\n",
    "<a id=\"dispatch_training_steps_to_remote_workers\"> </a>\n",
    "\n",
    "After all the computations are defined under `ParameterServerStrategy`, you use the `tf.distribute.coordinator.ClusterCoordinator` class to create resources and distribute the training steps to remote workers.\n",
    "\n",
    "First, create a `ClusterCoordinator` object and pass in the strategy object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.124510Z",
     "iopub.status.busy": "2024-01-11T18:21:38.124281Z",
     "iopub.status.idle": "2024-01-11T18:21:38.127388Z",
     "shell.execute_reply": "2024-01-11T18:21:38.126686Z"
    },
    "id": "DpcMlH7Pa3DB"
   },
   "outputs": [],
   "source": [
    "coordinator = tf.distribute.coordinator.ClusterCoordinator(strategy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-xRIgKxciOSe"
   },
   "source": [
    "Then, create a per-worker dataset and an iterator using the `ClusterCoordinator.create_per_worker_dataset` API, which replicates the dataset to all workers. In the `per_worker_dataset_fn` below, wrapping the `dataset_fn` into `strategy.distribute_datasets_from_function` is recommended so that data can be prefetched to GPUs efficiently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.130777Z",
     "iopub.status.busy": "2024-01-11T18:21:38.130231Z",
     "iopub.status.idle": "2024-01-11T18:21:38.211699Z",
     "shell.execute_reply": "2024-01-11T18:21:38.210874Z"
    },
    "id": "h9DCvTJTa4Q2"
   },
   "outputs": [],
   "source": [
    "@tf.function\n",
    "def per_worker_dataset_fn():\n",
    "  return strategy.distribute_datasets_from_function(dataset_fn)\n",
    "\n",
    "per_worker_dataset = coordinator.create_per_worker_dataset(per_worker_dataset_fn)\n",
    "per_worker_iterator = iter(per_worker_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "i2pnOx78iRwW"
   },
   "source": [
    "The final step is to distribute the computation to remote workers using `ClusterCoordinator.schedule`:\n",
    "\n",
    "- The `schedule` method enqueues a `tf.function` and immediately returns a future-like `RemoteValue`. The queued functions are dispatched to remote workers in background threads, and the `RemoteValue` is filled in asynchronously.\n",
    "- The `join` method (`ClusterCoordinator.join`) waits until all scheduled functions have been executed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:38.215092Z",
     "iopub.status.busy": "2024-01-11T18:21:38.214853Z",
     "iopub.status.idle": "2024-01-11T18:21:44.030396Z",
     "shell.execute_reply": "2024-01-11T18:21:44.029454Z"
    },
    "id": "gmPvactfa6Eh"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Reduce to /device:CPU:0 then broadcast to ('/replica:0/device:CPU:0',).\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finished epoch 0, accuracy is 0.636824.\n",
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finished epoch 1, accuracy is 0.427365.\n",
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finished epoch 2, accuracy is 1.000000.\n",
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finished epoch 3, accuracy is 1.000000.\n"
     ]
    }
   ],
   "source": [
    "num_epochs = 4\n",
    "steps_per_epoch = 5\n",
    "for i in range(num_epochs):\n",
    "  accuracy.reset_states()\n",
    "  for _ in range(steps_per_epoch):\n",
    "    coordinator.schedule(step_fn, args=(per_worker_iterator,))\n",
    "  # Wait at epoch boundaries.\n",
    "  coordinator.join()\n",
    "  print(\"Finished epoch %d, accuracy is %f.\" % (i, accuracy.result().numpy()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WBn-gn-OP3DR"
   },
   "source": [
    "Here is how you can fetch the result of a `RemoteValue`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:44.034154Z",
     "iopub.status.busy": "2024-01-11T18:21:44.033896Z",
     "iopub.status.idle": "2024-01-11T18:21:44.067226Z",
     "shell.execute_reply": "2024-01-11T18:21:44.066340Z"
    },
    "id": "-15a2I_lQDO1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Final loss is 0.184862\n"
     ]
    }
   ],
   "source": [
    "loss = coordinator.schedule(step_fn, args=(per_worker_iterator,))\n",
    "print(\"Final loss is %f\" % loss.fetch())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "htY4QKc9iXg9"
   },
   "source": [
    "Alternatively, you can launch all steps and do something else while waiting for them to complete:\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "for _ in range(total_steps):\n",
    "  coordinator.schedule(step_fn, args=(per_worker_iterator,))\n",
    "while not coordinator.done():\n",
    "  time.sleep(10)\n",
    "  # Do something like logging metrics or writing checkpoints.\n",
    "```\n",
    "\n",
    "For the complete training and serving workflow for this particular example, refer to this [test](https://github.com/keras-team/keras/blob/master/keras/integration_test/parameter_server_keras_preprocessing_test.py).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kzNsj2GR3BGs"
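   },
   "source": [
    "### More about dataset creation\n",
    "\n",
    "The dataset in the above code is created using the `ClusterCoordinator.create_per_worker_dataset` API. It creates one dataset per worker and returns a container object. You can call the `iter` method on it to create a per-worker iterator. The per-worker iterator contains one iterator per worker, and the slice belonging to a given worker is substituted into the input argument of the function passed to `ClusterCoordinator.schedule` before that function is executed on the worker.\n",
    "\n",
    "The `ClusterCoordinator.schedule` method assumes that workers are equivalent and therefore that the datasets on different workers are the same (except that they may be shuffled differently). Because of this, it is also recommended to repeat datasets and schedule a finite number of steps, instead of relying on receiving an `OutOfRangeError` from a dataset.\n",
    "\n",
    "Another important note is that `tf.data` datasets don't support implicit serialization and deserialization across task boundaries. So it is important to create the whole dataset inside the function passed to `ClusterCoordinator.create_per_worker_dataset`. The `create_per_worker_dataset` API can also directly take a `tf.data.Dataset` or `tf.distribute.DistributedDataset` as input."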
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LcfdI_M83lAM"
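   },
   "source": [
    "## Evaluation\n",
    "\n",
    "The two main approaches to performing evaluation with `tf.distribute.ParameterServerStrategy` training are inline evaluation and sidecar evaluation. Each has its own pros and cons, as described below. The inline evaluation method is recommended if you don't have a particular preference. For users of `Model.fit`, `Model.evaluate` uses inline (distributed) evaluation under the hood."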
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oiG8EhcY3gA1"
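   },
   "source": [
    "### Inline evaluation\n",
    "\n",
    "In *inline evaluation*, the coordinator alternates between training and evaluation.\n",
    "\n",
    "Inline evaluation has several benefits:\n",
    "\n",
    "- It can support large evaluation models and evaluation datasets that a single task cannot hold.\n",
    "- The evaluation results can be used to make decisions about training the next epoch, for example, whether to stop training early.\n",
    "\n",
    "There are two ways to implement inline evaluation: direct evaluation and distributed evaluation.\n",
    "\n",
    "- **Direct evaluation**: For small models and evaluation datasets, the coordinator can run evaluation directly on the distributed model with the evaluation dataset on the coordinator:"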
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:44.071382Z",
     "iopub.status.busy": "2024-01-11T18:21:44.071067Z",
     "iopub.status.idle": "2024-01-11T18:21:44.368436Z",
     "shell.execute_reply": "2024-01-11T18:21:44.367492Z"
    },
    "id": "WakiAakoaHVn"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluation accuracy: 1.000000\n"
     ]
    }
   ],
   "source": [
    "eval_dataset = tf.data.Dataset.from_tensor_slices(\n",
    "    feature_and_label_gen(num_examples=16)).map(\n",
    "          lambda x: (\n",
    "              {\"features\": feature_preprocess_stage(x[\"features\"])},\n",
    "              label_preprocess_stage(x[\"label\"])\n",
    "          )).batch(8)\n",
    "\n",
    "eval_accuracy = tf.keras.metrics.Accuracy()\n",
    "\n",
    "for batch_data, labels in eval_dataset:\n",
    "  pred = model(batch_data, training=False)\n",
    "  actual_pred = tf.cast(tf.greater(pred, 0.5), tf.int64)\n",
    "  eval_accuracy.update_state(labels, actual_pred)\n",
    "\n",
    "print(\"Evaluation accuracy: %f\" % eval_accuracy.result())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MKGHbdI7aGoJ"
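   },
   "source": [
    "- **Distributed evaluation**: For large models or datasets that are infeasible to run directly on the coordinator, the coordinator task can distribute evaluation tasks to the workers via the `ClusterCoordinator.schedule`/`ClusterCoordinator.join` methods:"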
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-01-11T18:21:44.372795Z",
     "iopub.status.busy": "2024-01-11T18:21:44.372090Z",
     "iopub.status.idle": "2024-01-11T18:21:45.478847Z",
     "shell.execute_reply": "2024-01-11T18:21:45.477849Z"
    },
    "id": "XcHNHJpDgEvK"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:4 GPUs are allocated per worker. Please use DistributedDataset by calling strategy.experimental_distribute_dataset or strategy.distribute_datasets_from_function to make best use of GPU resources\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:4 GPUs are allocated per worker. Please use DistributedDataset by calling strategy.experimental_distribute_dataset or strategy.distribute_datasets_from_function to make best use of GPU resources\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:4 GPUs are allocated per worker. Please use DistributedDataset by calling strategy.experimental_distribute_dataset or strategy.distribute_datasets_from_function to make best use of GPU resources\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:4 GPUs are allocated per worker. Please use DistributedDataset by calling strategy.experimental_distribute_dataset or strategy.distribute_datasets_from_function to make best use of GPU resources\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:4 GPUs are allocated per worker. Please use DistributedDataset by calling strategy.experimental_distribute_dataset or strategy.distribute_datasets_from_function to make best use of GPU resources\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:4 GPUs are allocated per worker. Please use DistributedDataset by calling strategy.experimental_distribute_dataset or strategy.distribute_datasets_from_function to make best use of GPU resources\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Waiting for all global closures to be finished.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluation accuracy: 1.000000\n"
     ]
    }
   ],
   "source": [
    "with strategy.scope():\n",
    "  # Define the eval metric on parameter servers.\n",
    "  eval_accuracy = tf.keras.metrics.Accuracy()\n",
    "\n",
    "@tf.function\n",
    "def eval_step(iterator):\n",
    "  def replica_fn(batch_data, labels):\n",
    "    pred = model(batch_data, training=False)\n",
    "    actual_pred = tf.cast(tf.greater(pred, 0.5), tf.int64)\n",
    "    eval_accuracy.update_state(labels, actual_pred)\n",
    "  batch_data, labels = next(iterator)\n",
    "  strategy.run(replica_fn, args=(batch_data, labels))\n",
    "\n",
    "def eval_dataset_fn():\n",
    "  return tf.data.Dataset.from_tensor_slices(\n",
    "      feature_and_label_gen(num_examples=16)).map(\n",
    "          lambda x: (\n",
    "              {\"features\": feature_preprocess_stage(x[\"features\"])},\n",
    "              label_preprocess_stage(x[\"label\"])\n",
    "          )).shuffle(16).repeat().batch(8)\n",
    "\n",
    "per_worker_eval_dataset = coordinator.create_per_worker_dataset(eval_dataset_fn)\n",
    "per_worker_eval_iterator = iter(per_worker_eval_dataset)\n",
    "\n",
    "eval_steps_per_epoch = 2\n",
    "for _ in range(eval_steps_per_epoch):\n",
    "  coordinator.schedule(eval_step, args=(per_worker_eval_iterator,))\n",
    "coordinator.join()\n",
    "print(\"Evaluation accuracy: %f\" % eval_accuracy.result())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cKrQktZX5z7a"
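   },
   "source": [
    "#### Enabling exactly-once evaluation\n",
    "\n",
    "<a id=\"exactly_once_evaluation\"></a>\n",
    "\n",
    "The `schedule` and `join` methods of `tf.distribute.coordinator.ClusterCoordinator` don't support visitation guarantees or exactly-once semantics by default. In other words, in the above example there is no guarantee that every evaluation example in the dataset is evaluated exactly once; some may not be visited and some may be evaluated multiple times.\n",
    "\n",
    "Exactly-once evaluation may be preferred to reduce the variance of evaluation across epochs and to improve model selection done via early stopping, hyperparameter tuning, or other methods. There are different ways to enable exactly-once evaluation:\n",
    "\n",
    "- With a `Model.fit/.evaluate` workflow, it can be enabled by adding an argument to `Model.compile`. Refer to the docs for the `pss_evaluation_shards` argument.\n",
    "- The `tf.data` service API can be used to provide exactly-once visitation for evaluation when using `ParameterServerStrategy` (refer to the *Dynamic Sharding* section of the `tf.data.experimental.service` API documentation).\n",
    "- [Sidecar evaluation](#sidecar_evaluation) provides exactly-once evaluation by default, since the evaluation happens on a single machine. However, this can be much slower than performing evaluation distributed across many workers.\n",
    "\n",
    "The first option, using `Model.compile`, is the suggested solution for most users.\n",
    "\n",
    "Exactly-once evaluation has some limitations:\n",
    "\n",
    "- Writing a custom distributed evaluation loop with an exactly-once visitation guarantee is not supported. File a GitHub issue if you need support for this.\n",
    "- It cannot automatically handle computation of metrics that use the `Layer.add_metric` API. These should be excluded from evaluation, or reworked into `Metric` objects."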
   ]
  },
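  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the first option, assuming a Keras version whose `Model.compile` accepts `pss_evaluation_shards` and reusing the `strategy` object defined earlier in this tutorial. The tiny model, the optimizer, the loss, and the `\"auto\"` value are illustrative choices, not settings taken from this tutorial:\n",
    "\n",
    "```python\n",
    "with strategy.scope():\n",
    "  # Hypothetical small model, used here only to show the compile argument.\n",
    "  eval_demo_model = tf.keras.Sequential([tf.keras.layers.Dense(1)])\n",
    "  eval_demo_model.compile(\n",
    "      optimizer=\"sgd\",\n",
    "      loss=\"mse\",\n",
    "      # Request sharded evaluation with an exact visitation guarantee.\n",
    "      # Check the `Model.compile` docs for the values your version accepts.\n",
    "      pss_evaluation_shards=\"auto\")\n",
    "```\n",
    "\n",
    "A later `Model.evaluate` call on a model compiled this way is then expected to split the evaluation dataset into shards and visit each example once."
   ]
  },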
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "H40X-9Gs3i7_"
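   },
   "source": [
    "### Sidecar evaluation\n",
    "\n",
    "<a id=\"sidecar_evaluation\"></a>\n",
    "\n",
    "*Sidecar evaluation* is another way to define and run an evaluation loop in `tf.distribute.ParameterServerStrategy` training: you create a dedicated evaluator task that repeatedly reads checkpoints and runs evaluation on the latest checkpoint (refer to [this guide](../../guide/checkpoint.ipynb) for more details on checkpointing). The coordinator and worker tasks do not spend any time on evaluation, so for a fixed number of iterations the overall training time should be shorter than with other evaluation methods. However, it requires an additional evaluator task and periodic checkpointing to trigger evaluation."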
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HonyjnXK9-ys"
   },
   "source": [
    "To write an evaluation loop for sidecar evaluation, you have two options:\n",
    "\n",
    "1. Use the `tf.keras.utils.SidecarEvaluator` API.\n",
    "2. Create a custom evaluation loop.\n",
    "\n",
    "Refer to the `tf.keras.utils.SidecarEvaluator` API documentation for more details on option 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "U_c0EiwB88OG"
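   },
   "source": [
    "Sidecar evaluation is supported only with a single task. This means:\n",
    "\n",
    "- Each example is guaranteed to be evaluated once. If the evaluator is preempted or restarted, it restarts the evaluation loop from the latest checkpoint, and the partial evaluation progress made before the restart is discarded.\n",
    "\n",
    "- However, running evaluation on a single task means that a full evaluation can take a long time.\n",
    "\n",
    "- If the model is too large to fit into the evaluator's memory, single sidecar evaluation is not applicable."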
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VNJoWVc797B1"
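   },
   "source": [
    "Another caveat is that the `tf.keras.utils.SidecarEvaluator` implementation and the custom evaluation loop below may skip some checkpoints, because they always pick up the latest available checkpoint and the training cluster can produce several checkpoints during one evaluation epoch. You can write a custom evaluation loop that evaluates every checkpoint, but that is not covered in this tutorial. Conversely, the evaluator may sit idle if checkpoints are produced less often than one evaluation run takes."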
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "G5jopxBd85Ji"
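   },
   "source": [
    "A custom evaluation loop gives you more control over the details, such as choosing which checkpoint to evaluate or adding logic to run alongside evaluation. The following is one possible custom sidecar evaluation loop:\n",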
    "\n",
    "```python\n",
    "checkpoint_dir = ...\n",
    "eval_model = ...\n",
    "eval_data = ...\n",
    "checkpoint = tf.train.Checkpoint(model=eval_model)\n",
    "\n",
    "for latest_checkpoint in tf.train.checkpoints_iterator(\n",
    "    checkpoint_dir):\n",
    "  try:\n",
    "    checkpoint.restore(latest_checkpoint).expect_partial()\n",
    "  except (tf.errors.OpError,) as e:\n",
    "    # checkpoint may be deleted by training when it is about to read it.\n",
    "    continue\n",
    "\n",
    "  # Optionally add callbacks to write summaries.\n",
    "  eval_model.evaluate(eval_data)\n",
    "\n",
    "  # Evaluation finishes when it has evaluated the last epoch.\n",
    "  if latest_checkpoint.endswith('-{}'.format(train_epochs)):\n",
    "    break\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9TkNbtpPhFRQ"
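   },
   "source": [
    "## Clusters in the real world\n",
    "\n",
    "<a id=\"real_clusters\"></a>\n",
    "\n",
    "Note: This section is not necessary for running the tutorial code on this page.\n",
    "\n",
    "In a real production environment, you run all tasks in different processes on different machines. The simplest way to configure cluster information on each task is to set the `\"TF_CONFIG\"` environment variable and use a `tf.distribute.cluster_resolver.TFConfigClusterResolver` to parse `\"TF_CONFIG\"`.\n",
    "\n",
    "For a general description of the `\"TF_CONFIG\"` environment variable, refer to \"Setting up the `TF_CONFIG` environment variable\" in the [Distributed training](../../guide/distributed_training.ipynb) guide.\n",
    "\n",
    "If you start your training tasks using Kubernetes or other configuration templates, those templates have likely already set `\"TF_CONFIG\"` for you."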
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n7AK9SJGt3tQ"
   },
   "source": [
    "### Set the `\"TF_CONFIG\"` environment variable\n",
    "\n",
    "Suppose you have 3 workers and 2 parameter servers. Then the `\"TF_CONFIG\"` of worker 1 can be:\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "\n",
    "os.environ[\"TF_CONFIG\"] = json.dumps({\n",
    "    \"cluster\": {\n",
    "        \"worker\": [\"host1:port\", \"host2:port\", \"host3:port\"],\n",
    "        \"ps\": [\"host4:port\", \"host5:port\"],\n",
    "        \"chief\": [\"host6:port\"]\n",
    "    },\n",
    "    \"task\": {\"type\": \"worker\", \"index\": 1}\n",
    "})\n",
    "```\n",
    "\n",
    "The `\"TF_CONFIG\"` of the evaluator can be:\n",
    "\n",
    "```python\n",
    "os.environ[\"TF_CONFIG\"] = json.dumps({\n",
    "    \"cluster\": {\n",
    "        \"evaluator\": [\"host7:port\"]\n",
    "    },\n",
    "    \"task\": {\"type\": \"evaluator\", \"index\": 0}\n",
    "})\n",
    "```\n",
    "\n",
    "The `\"cluster\"` part of the evaluator's `\"TF_CONFIG\"` string above is optional."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fZRjMS0pt1LM"
   },
   "source": [
    "### If you use the same binary for all tasks\n",
    "\n",
    "If you prefer to run all these tasks using a single binary, your program needs to branch into the different roles at the very beginning:\n",
    "\n",
    "```python\n",
    "cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()\n",
    "if cluster_resolver.task_type in (\"worker\", \"ps\"):\n",
    "  # Start a TensorFlow server and wait.\n",
    "  ...\n",
    "elif cluster_resolver.task_type == \"evaluator\":\n",
    "  # Run sidecar evaluation.\n",
    "  ...\n",
    "else:\n",
    "  # Run the coordinator.\n",
    "  ...\n",
    "```\n",
    "\n",
    "The following code starts a TensorFlow server and waits, which is useful for the `\"worker\"` and `\"ps\"` roles:\n",
    "\n",
    "```python\n",
    "# Set the environment variable to allow reporting worker and ps failure to the\n",
    "# coordinator. This is a workaround and won't be necessary in the future.\n",
    "os.environ[\"GRPC_FAIL_FAST\"] = \"use_caller\"\n",
    "\n",
    "server = tf.distribute.Server(\n",
    "    cluster_resolver.cluster_spec(),\n",
    "    job_name=cluster_resolver.task_type,\n",
    "    task_index=cluster_resolver.task_id,\n",
    "    protocol=cluster_resolver.rpc_layer or \"grpc\",\n",
    "    start=True)\n",
    "server.join()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZWdYfK593eOL"
   },
   "source": [
    "## Handling task failure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bl9eK5r13cOv"
   },
   "source": [
    "### Worker failure\n",
    "\n",
    "Both the `tf.distribute.coordinator.ClusterCoordinator` custom training loop and the `Model.fit` approach provide built-in fault tolerance for worker failure. When a worker recovers, the `ClusterCoordinator` invokes dataset re-creation on that worker."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "aP0OHZ1-Ne-B"
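   },
   "source": [
    "### Parameter server or coordinator failure\n",
    "\n",
    "When the coordinator detects a parameter server error, it immediately raises an `UnavailableError` or `AbortedError`. You can restart the coordinator in this case. The coordinator itself can also become unavailable, so the following tooling is recommended in order not to lose training progress:"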
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f7m7Itoz8lsI"
   },
   "source": [
    "- For `Model.fit`, use a `BackupAndRestore` callback, which handles saving and restoring progress automatically. See the [Callbacks and training](#callbacks-and-training) section above for an example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-XlLyJp53Z8A"
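   },
   "source": [
    "- For a custom training loop, checkpoint the model variables periodically and, if a checkpoint exists, load the model variables from it before training starts. If the optimizer is checkpointed, the training progress can be inferred approximately from `optimizer.iterations`:\n",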
    "\n",
    "```python\n",
    "checkpoint_manager = tf.train.CheckpointManager(\n",
    "    tf.train.Checkpoint(model=model, optimizer=optimizer),\n",
    "    checkpoint_dir,\n",
    "    max_to_keep=3)\n",
    "if checkpoint_manager.latest_checkpoint:\n",
    "  checkpoint = checkpoint_manager.checkpoint\n",
    "  checkpoint.restore(\n",
    "      checkpoint_manager.latest_checkpoint).assert_existing_objects_matched()\n",
    "\n",
    "global_steps = int(optimizer.iterations.numpy())\n",
    "starting_epoch = global_steps // steps_per_epoch\n",
    "\n",
    "for _ in range(starting_epoch, num_epochs):\n",
    "  for _ in range(steps_per_epoch):\n",
    "    coordinator.schedule(step_fn, args=(per_worker_iterator,))\n",
    "  coordinator.join()\n",
    "  checkpoint_manager.save()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "PlN1P7C53XK9"
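   },
   "source": [
    "### Fetching a `RemoteValue`\n",
    "\n",
    "Fetching a `RemoteValue` is guaranteed to succeed if the function executed successfully. This is because, currently, the return value is copied to the coordinator immediately after the function is executed. If a worker fails during the copy, the function is retried on another available worker. Therefore, if you want to optimize for performance, you can schedule functions without a return value."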
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iZcR_xNZ3UdU"
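   },
   "source": [
    "## Error reporting\n",
    "\n",
    "Once the coordinator sees an error, such as an `UnavailableError` from a parameter server or another application error such as an `InvalidArgument` from `tf.debugging.check_numerics`, it cancels all pending and queued functions before raising the error. Fetching their corresponding `RemoteValue`s raises a `CancelledError`.\n",
    "\n",
    "After an error is raised, the coordinator does not raise the same error again, nor any error from the cancelled functions."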
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QfhbXH-j3NVw"
   },
   "source": [
    "## Performance improvement\n",
    "\n",
    "You may experience performance issues when you train with `tf.distribute.ParameterServerStrategy` and `tf.distribute.coordinator.ClusterCoordinator`.\n",
    "\n",
    "One common reason is that the parameter servers have unbalanced load and some heavily loaded parameter servers have reached capacity. There can also be multiple root causes. Some simple ways to mitigate this issue are:\n",
    "\n",
    "1. Shard your large model variables by specifying a `variable_partitioner` when constructing a `ParameterServerStrategy`.\n",
    "2. Avoid creating a hotspot variable that is required by all parameter servers in a single step, by doing both of the following:\n",
    "   1. Use a constant learning rate or subclass `tf.keras.optimizers.schedules.LearningRateSchedule` in optimizers. By default, the learning rate becomes a variable placed on a particular parameter server and is requested by all other parameter servers in each step.\n",
    "   2. Use a `tf.keras.optimizers.legacy.Optimizer` (the standard `tf.keras.optimizers.Optimizer`s could still lead to hotspot variables).\n",
    "3. Shuffle your large vocabularies before passing them to Keras preprocessing layers.\n",
    "\n",
    "Another possible source of performance issues is the coordinator. The implementation of `schedule`/`join` is Python-based and may therefore have threading overhead. The latency between the coordinator and the workers can also be large. If this is the case:\n",
    "\n",
    "- For `Model.fit`, set the `steps_per_execution` argument provided at `Model.compile` to a value larger than 1.\n",
    "\n",
    "- For a custom training loop, you can pack multiple steps into a single `tf.function`:\n",
    "\n",
    "```python\n",
    "steps_per_invocation = 10\n",
    "\n",
    "@tf.function\n",
    "def step_fn(iterator):\n",
    "  for _ in range(steps_per_invocation):\n",
    "    features, labels = next(iterator)\n",
    "    def replica_fn(features, labels):\n",
    "      ...\n",
    "\n",
    "    strategy.run(replica_fn, args=(features, labels))\n",
    "```\n",
    "\n",
    "As the library is optimized further, most users will hopefully no longer have to pack steps manually.\n",
    "\n",
    "In addition, as explained in the [Handling task failure section](#handling_task_failure) above, you can schedule functions without a return value to improve performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "chu5F7M_JmVk"
   },
   "source": [
    "## Known limitations\n",
    "\n",
    "<a id=\"known_limitations\"> </a>\n",
    "\n",
    "Most of the known limitations are already covered in the sections above. This section provides a summary.\n",
    "\n",
    "### `ParameterServerStrategy` general\n",
    "\n",
    "- `os.environ[\"GRPC_FAIL_FAST\"] = \"use_caller\"` is needed on every task, including the coordinator, for fault tolerance to work properly.\n",
    "- Synchronous parameter server training is not supported.\n",
    "- It is usually necessary to pack multiple steps into a single function to achieve optimal performance.\n",
    "- Loading a saved_model that contains sharded variables via `tf.saved_model.load` is not supported. Note: Loading such a saved_model using TensorFlow Serving is expected to work (refer to the [serving tutorial](https://www.tensorflow.org/tfx/tutorials/serving/rest_simple) for details).\n",
    "- Recovering from a parameter server failure without restarting the coordinator task is not supported.\n",
    "- The `tf.lookup.StaticHashTable`, which is commonly used by some Keras preprocessing layers such as `tf.keras.layers.IntegerLookup`, `tf.keras.layers.StringLookup`, and `tf.keras.layers.TextVectorization`, should be placed under `Strategy.scope`. Otherwise, its resources are placed on the coordinator, and lookup RPCs from the workers to the coordinator hurt performance.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2MKBF0RPSvzB"
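   },
   "source": [
    "### `Model.fit` specifics\n",
    "\n",
    "- The `steps_per_epoch` argument is required in `Model.fit`. Select a value that provides appropriate intervals within an epoch.\n",
    "- For performance reasons, `ParameterServerStrategy` does not support custom callbacks that make batch-level calls. Convert those calls into epoch-level calls with a suitably chosen `steps_per_epoch`, so that they are called every `steps_per_epoch` steps. Built-in callbacks are not affected, because their batch-level calls have been modified to be performant. Support for batch-level calls with `ParameterServerStrategy` is planned.\n",
    "- For the same reason, unlike with other strategies, progress bars and metrics are logged only at epoch boundaries.\n",
    "- `run_eagerly` is not supported.\n"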
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wvY-mg35Sx5L"
   },
   "source": [
    "### Custom training loop specifics\n",
    "\n",
    "- `ClusterCoordinator.schedule` does not support visitation guarantees for a dataset in general, although a visitation guarantee for evaluation is possible through `Model.fit/.evaluate`. See [Enabling exactly-once evaluation](#exactly_once_evaluation).\n",
    "- When `ClusterCoordinator.create_per_worker_dataset` is used with a callable as input, the whole dataset must be created inside the function passed to it.\n",
    "- `tf.data.Options` is ignored in a dataset created by `ClusterCoordinator.create_per_worker_dataset`."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "name": "parameter_server_training.ipynb",
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}