##### Copyright 2018 The TensorFlow Authors.


In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use a GPU

<table class="tfo-notebook-buttons" align="left">
  <td><a target="_blank" href="https://www.tensorflow.org/guide/gpu"><img src="https://www.tensorflow.org/images/tf_logo_32px.png">View on TensorFlow.org</a></td>
  <td><a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/ja/guide/gpu.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png">Run in Google Colab</a></td>
  <td><a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/ja/guide/gpu.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png">View source on GitHub</a></td>
  <td><a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/ja/guide/gpu.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png">Download notebook</a></td>
</table>

TensorFlow code and `tf.keras` models run transparently on a single GPU with no code changes required.

Note: Use `tf.config.list_physical_devices('GPU')` to confirm that TensorFlow is using the GPU.

The simplest way to run on multiple GPUs, on one or many machines, is to use [Distribution Strategies](distributed_training.ipynb).

This guide is for users who have tried these approaches and found that they need fine-grained control over how TensorFlow uses the GPU. To learn how to debug performance in single-GPU and multi-GPU scenarios, see the [Optimize TensorFlow GPU Performance](gpu_performance_analysis.md) guide.

## Setup

Ensure you have the latest TensorFlow GPU release installed.

In [2]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

2022-12-14 22:56:35.420719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-12-14 22:56:35.420819: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory


Num GPUs Available:  4


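## Overview


TensorFlow supports running computations on a variety of device types, including CPU and GPU. Devices are represented by string identifiers, for example:

- `"/device:CPU:0"`: The CPU of your machine.
- `"/GPU:0"`: Short-hand notation for the first GPU of your machine that is visible to TensorFlow.
- `"/job:localhost/replica:0/task:0/device:GPU:1"`: Fully qualified name of the second GPU of your machine that is visible to TensorFlow.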
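The three forms above share one structure: optional scope fields (`job`, `replica`, `task`) followed by a device type and index. The following is a minimal sketch of that structure in plain Python. `parse_device_name` is a hypothetical helper written for illustration only; it is not TensorFlow's own parser and only handles the CPU and GPU forms listed above.

```python
def parse_device_name(name):
    """Split a TensorFlow-style device string into its labelled parts.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    parts = {}
    for field in name.strip("/").split("/"):
        tokens = field.split(":")
        if tokens[0] == "device" and len(tokens) == 3:
            # Fully labelled form, e.g. "device:GPU:1".
            parts["device_type"] = tokens[1]
            parts["device_index"] = int(tokens[2])
        elif tokens[0].upper() in ("CPU", "GPU") and len(tokens) == 2:
            # Short-hand form, e.g. "GPU:0".
            parts["device_type"] = tokens[0].upper()
            parts["device_index"] = int(tokens[1])
        elif len(tokens) == 2:
            # Scope fields such as "job:localhost" or "task:0".
            key, value = tokens
            parts[key] = int(value) if value.isdigit() else value
        else:
            raise ValueError(f"Unrecognized device field: {field!r}")
    return parts


for name in ["/device:CPU:0", "/GPU:0",
             "/job:localhost/replica:0/task:0/device:GPU:1"]:
    print(name, "->", parse_device_name(name))
```

The short-hand and fully qualified names resolve to the same kind of `(device_type, device_index)` pair; the fully qualified form additionally records which job, replica and task the device belongs to.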

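If a TensorFlow operation has both CPU and GPU implementations, the GPU device is prioritized by default when the operation is assigned to a device. For example, `tf.matmul` has both CPU and GPU kernels. On a system with devices `CPU:0` and `GPU:0`, the `GPU:0` device is selected to run `tf.matmul` unless you explicitly request that it run on another device.

If a TensorFlow operation has no corresponding GPU implementation, the operation falls back to the CPU device. For example, since `tf.cast` only has a CPU kernel, on a system with devices `CPU:0` and `GPU:0`, the `CPU:0` device is selected to run `tf.cast`, even if it is requested to run on the `GPU:0` device.

## Logging device placement

To find out which devices your operations and tensors are assigned to, put `tf.debugging.set_log_device_placement(True)` as the first statement of your program. Enabling device placement logging causes tensor allocations and operations to be printed.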

In [3]:
tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0


tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)


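The output of the code above shows that the `MatMul` op was executed on `GPU:0`.

## Manual device placement

If you would like a particular operation to run on a device of your choice instead of the one selected automatically, use `with tf.device` to create a device context. All operations within that context run on the same designated device.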

In [4]:
tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

c = tf.matmul(a, b)
print(c)

Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0


tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)


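You can see that `a` and `b` are now assigned to `CPU:0`. Since a device was not explicitly specified for the `MatMul` operation, the TensorFlow runtime chooses one based on the operation and the available devices (`GPU:0` in this example) and automatically copies tensors between devices if required.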
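Besides reading the placement log, the placement of an eager tensor can be checked directly through its `device` attribute. The sketch below repeats the same computation and prints where each tensor ended up; on a machine without a GPU, `c` would be expected to land on the CPU as well.

```python
import tensorflow as tf

# Pin the inputs to the CPU, as in the cell above.
with tf.device('/CPU:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# No device scope here: the runtime picks the device for MatMul.
c = tf.matmul(a, b)

# In eager mode, `tensor.device` holds the fully qualified device name.
for name, tensor in [("a", a), ("b", b), ("c", c)]:
    print(name, "->", tensor.device)
```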

## Limiting GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars)). This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the `tf.config.set_visible_devices` method.

In [5]:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)

Visible devices cannot be modified after being initialized


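In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two methods to control this.

The first option is to turn on memory growth by calling `tf.config.experimental.set_memory_growth`, which attempts to allocate only as much GPU memory as the runtime allocations need. It starts out allocating very little memory, and as the program runs and more GPU memory is needed, the GPU memory region allocated to the TensorFlow process is extended. Memory is not released, since releasing it can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code before allocating any tensors or executing any ops.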

In [6]:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

Physical devices cannot be modified after being initialized


Another way to enable this option is to set the environment variable `TF_FORCE_GPU_ALLOW_GROWTH` to `true`. This configuration is platform specific.
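The variable can be exported in the shell that launches the program, or set from Python. The following is a minimal sketch of the second approach. It assumes that nothing in the process has imported TensorFlow yet; setting the variable before the import is simply the easiest way to make sure it is in place before the GPUs are initialized.

```python
import os

# Assumption: TensorFlow has not been imported yet in this process.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # imported only after the variable is set

gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", len(gpus))
```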

The second method is to configure a virtual GPU device with `tf.config.set_logical_device_configuration` and set a hard limit on the total memory to allocate on the GPU.

In [7]:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Virtual devices cannot be modified after being initialized


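This is useful if you want to strictly bound the amount of GPU memory available to the TensorFlow process. It is common practice for local development when the GPU is shared with other applications such as a workstation GUI.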
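To see how much of such a limit is actually in use, TensorFlow's current and peak allocations can be queried with `tf.config.experimental.get_memory_info`. The sketch below wraps that call in a hypothetical helper, `gpu_memory_report`, which is not part of TensorFlow. It assumes a TensorFlow version that provides `get_memory_info`, and it returns an empty dictionary on a machine without GPUs.

```python
import tensorflow as tf


def gpu_memory_report():
    """Return {device_name: (current_bytes, peak_bytes)} per logical GPU.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    report = {}
    # Note: listing logical devices initializes the runtime, so call this
    # only after any memory-limit configuration has been applied.
    for device in tf.config.list_logical_devices('GPU'):
        # get_memory_info is called with a short name such as "GPU:0".
        short_name = device.name.replace('/device:', '')
        info = tf.config.experimental.get_memory_info(short_name)
        report[device.name] = (info['current'], info['peak'])
    return report


print(gpu_memory_report())
```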

## Using a single GPU on a multi-GPU system

If your system has more than one GPU, the GPU with the lowest ID is selected by default. To run on a different GPU, you need to specify the preference explicitly.

In [8]:
tf.debugging.set_log_device_placement(True)

try:
  # Specify an invalid GPU device
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:2


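If the device you have specified does not exist, you will get a `RuntimeError`: `.../device:GPU:2 unknown device`.

If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can call `tf.config.set_soft_device_placement(True)`.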

In [9]:
tf.config.set_soft_device_placement(True)
tf.debugging.set_log_device_placement(True)

# Creates some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0


tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)


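## Using multiple GPUs

Developing for multiple GPUs allows a model to scale with the additional resources. If you are developing on a system with a single GPU, you can simulate multiple GPUs with virtual devices. This enables easy testing of multi-GPU setups without requiring additional resources.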

In [10]:
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Virtual devices cannot be modified after being initialized
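The same idea can be tried on a machine with no GPU at all by splitting the CPU into several logical devices. This is a sketch of a CPU variant of the cell above, not something the cell itself does; as with GPUs, it assumes the configuration is applied before the runtime has been initialized.

```python
import tensorflow as tf

cpus = tf.config.list_physical_devices('CPU')
try:
    # Split the physical CPU into two logical CPU devices.
    tf.config.set_logical_device_configuration(
        cpus[0],
        [tf.config.LogicalDeviceConfiguration(),
         tf.config.LogicalDeviceConfiguration()])
except RuntimeError as e:
    # Logical devices must be configured before the runtime is initialized.
    print(e)

logical_cpus = tf.config.list_logical_devices('CPU')
print(len(cpus), "Physical CPU,", len(logical_cpus), "Logical CPUs")
```

The resulting logical CPUs can be used in place of logical GPUs when experimenting with device placement logic, although they say nothing about GPU performance.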


Once multiple logical GPUs are available to the runtime, you can use them with `tf.distribute.Strategy` or with manual placement.

#### With `tf.distribute.Strategy`

The best practice for using multiple GPUs is to use `tf.distribute.Strategy`. Here is a simple example:

In [11]:
tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op StatelessRandomGetKeyCounter in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op StatelessRandomUniformV2 in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Sub in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AddV2 in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0


INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0


INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:CPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AssignVariableOp in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op NoOp in device /job:localhost/replica:0/task:0/device:GPU:0


This program runs a copy of your model on each GPU and splits the input data between them, an approach also known as "[data parallelism](https://en.wikipedia.org/wiki/Data_parallelism)".
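The arithmetic behind data parallelism can be illustrated without any GPU. The sketch below is a conceptual example in plain Python, not a description of how `tf.distribute.MirroredStrategy` is implemented. It assumes a one-parameter linear model with a mean squared error loss and a batch that divides evenly across replicas, and shows that averaging the per-replica gradients reproduces the full-batch gradient.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(w, xs, ys, num_replicas):
    """Split the batch evenly across replicas and average their gradients.

    Assumes len(xs) is divisible by num_replicas.
    """
    shard = len(xs) // num_replicas
    grads = [
        mse_gradient(w,
                     xs[i * shard:(i + 1) * shard],
                     ys[i * shard:(i + 1) * shard])
        for i in range(num_replicas)
    ]
    return sum(grads) / num_replicas


xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], 0.5
full = mse_gradient(w, xs, ys)
sharded = data_parallel_gradient(w, xs, ys, num_replicas=2)
print(full, sharded)  # prints -22.5 -22.5
```

Each replica only sees its own shard of the batch, yet the combined update matches what a single device would compute on the whole batch.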

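For more information about distribution strategies, see the guide [here](./distributed_training.ipynb).

#### Manual placement

Under the hood, `tf.distribute.Strategy` works by replicating computation across devices. You can implement replication manually by constructing your model on each GPU. For example: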

In [12]:
tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)

Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:1


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:2


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op _EagerConst in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:3


Executing op AddN in device /job:localhost/replica:0/task:0/device:CPU:0


tf.Tensor(
[[ 88. 112.]
 [196. 256.]], shape=(2, 2), dtype=float32)
