{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "b518b04cbfe0" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-12-14T21:28:55.097751Z", "iopub.status.busy": "2022-12-14T21:28:55.097333Z", "iopub.status.idle": "2022-12-14T21:28:55.101145Z", "shell.execute_reply": "2022-12-14T21:28:55.100607Z" }, "id": "906e07f6e562" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "6e083398b477" }, "source": [ "# Working with preprocessing layers" ] }, { "cell_type": "markdown", "metadata": { "id": "64010bd23c2e" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
Run on TensorFlow.org Run in Google Colab View source on GitHub Download notebook
" ] }, { "cell_type": "markdown", "metadata": { "id": "b1d403f04693" }, "source": [ "## Keras preprocessing layers\n", "\n", "The Keras preprocessing layers API allows developers to build Keras-native input processing pipelines. These input processing pipelines can be used as independent preprocessing code in non-Keras workflows, combined directly with Keras models, and exported as part of a Keras SavedModel.\n", "\n", "With Keras preprocessing layers, you can build and export models that are truly end-to-end: models that accept raw images or raw structured data as input, and models that handle feature normalization or feature value indexing on their own." ] }, { "cell_type": "markdown", "metadata": { "id": "313360fa9024" }, "source": [ "## Available preprocessing\n", "\n", "### Text preprocessing\n", "\n", "- `tf.keras.layers.TextVectorization`: turns raw strings into an encoded representation that can be read by an `Embedding` layer or a `Dense` layer.\n", "\n", "### Numerical features preprocessing\n", "\n", "- `tf.keras.layers.Normalization`: performs feature-wise normalization of input features.\n", "- `tf.keras.layers.Discretization`: turns continuous numerical features into integer categorical features.\n", "\n", "### Categorical features preprocessing\n", "\n", "- `tf.keras.layers.CategoryEncoding`: turns integer categorical features into one-hot, multi-hot, or count dense representations.\n", "- `tf.keras.layers.Hashing`: performs categorical feature hashing, also known as the \"hashing trick\".\n", "- `tf.keras.layers.StringLookup`: turns string categorical values into an encoded representation that can be read by an `Embedding` layer or a `Dense` layer.\n", "- `tf.keras.layers.IntegerLookup`: turns integer categorical values into an encoded representation that can be read by an `Embedding` layer or a `Dense` layer.\n", "\n", "### Image preprocessing\n", "\n", "These layers are for standardizing the inputs of an image model.\n", "\n", "- `tf.keras.layers.Resizing`: resizes a batch of images to a target size.\n", "- `tf.keras.layers.Rescaling`: rescales and offsets the values of a batch of images (for example, going from inputs in the `[0, 255]` range to inputs in the `[0, 1]` range).\n", "- `tf.keras.layers.CenterCrop`: returns a center crop of a batch of images.\n", "\n", "### Image data augmentation (on-device)\n", "\n", "These layers apply random augmentation transforms to a batch of images. They are only active during training.\n", "\n", "- `tf.keras.layers.RandomCrop`\n", "- `tf.keras.layers.RandomFlip`\n", "- `tf.keras.layers.RandomTranslation`\n", "- `tf.keras.layers.RandomRotation`\n", "- `tf.keras.layers.RandomZoom`\n", "- `tf.keras.layers.RandomHeight`\n", "- `tf.keras.layers.RandomWidth`\n", "- `tf.keras.layers.RandomContrast`" ] }, { "cell_type": "markdown", "metadata": { "id": "c923e41fb1b4" }, 
"source": [ "## The `adapt()` method\n", "\n", "Some preprocessing layers have an internal state that can be computed based on a sample of the training data. The list of stateful preprocessing layers is:\n", "\n", "- `TextVectorization`: holds a mapping between string tokens and integer indices.\n", "- `StringLookup` and `IntegerLookup`: hold a mapping between input values and integer indices.\n", "- `Normalization`: holds the mean and standard deviation of the features.\n", "- `Discretization`: holds information about value bucket boundaries.\n", "\n", "Crucially, these layers are **non-trainable**. Their state is not set during training; it must be set **before training**, either by initializing them from a precomputed constant or by \"adapting\" them on data.\n", "\n", "You set the state of a preprocessing layer by exposing it to training data via the `adapt()` method:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:28:55.104913Z", "iopub.status.busy": "2022-12-14T21:28:55.104269Z", "iopub.status.idle": "2022-12-14T21:29:00.567918Z", "shell.execute_reply": "2022-12-14T21:29:00.567139Z" }, "id": "4cac6bd80812" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2022-12-14 21:28:56.053993: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", "2022-12-14 21:28:56.054084: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", "2022-12-14 21:28:56.054094: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. 
If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Features mean: -0.00\n", "Features std: 1.00\n" ] } ], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "from tensorflow.keras import layers\n", "\n", "data = np.array([[0.1, 0.2, 0.3], [0.8, 0.9, 1.0], [1.5, 1.6, 1.7],])\n", "layer = layers.Normalization()\n", "layer.adapt(data)\n", "normalized_data = layer(data)\n", "\n", "print(\"Features mean: %.2f\" % (normalized_data.numpy().mean()))\n", "print(\"Features std: %.2f\" % (normalized_data.numpy().std()))" ] }, { "cell_type": "markdown", "metadata": { "id": "d43b8246b8a3" }, "source": [ "The `adapt()` method takes either a Numpy array or a `tf.data.Dataset` object. In the case of `StringLookup` and `TextVectorization`, you can also pass a list of strings:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:00.572129Z", "iopub.status.busy": "2022-12-14T21:29:00.571366Z", "iopub.status.idle": "2022-12-14T21:29:00.742561Z", "shell.execute_reply": "2022-12-14T21:29:00.741786Z" }, "id": "48d95713348a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(\n", "[[37 12 25 5 9 20 21 0 0]\n", " [51 34 27 33 29 18 0 0 0]\n", " [49 52 30 31 19 46 10 0 0]\n", " [ 7 5 50 43 28 7 47 17 0]\n", " [24 35 39 40 3 6 32 16 0]\n", " [ 4 2 15 14 22 23 0 0 0]\n", " [36 48 6 38 42 3 45 0 0]\n", " [ 4 2 13 41 53 8 44 26 11]], shape=(8, 9), dtype=int64)\n" ] } ], "source": [ "data = [\n", " \"ξεῖν᾽, ἦ τοι μὲν ὄνειροι ἀμήχανοι ἀκριτόμυθοι\",\n", " \"γίγνοντ᾽, οὐδέ τι πάντα τελείεται ἀνθρώποισι.\",\n", " \"δοιαὶ γάρ τε πύλαι ἀμενηνῶν εἰσὶν ὀνείρων:\",\n", " \"αἱ μὲν γὰρ κεράεσσι τετεύχαται, αἱ δ᾽ ἐλέφαντι:\",\n", " \"τῶν οἳ μέν κ᾽ ἔλθωσι διὰ πριστοῦ ἐλέφαντος,\",\n", " \"οἵ ῥ᾽ ἐλεφαίρονται, ἔπε᾽ ἀκράαντα φέροντες:\",\n", " \"οἱ δὲ διὰ ξεστῶν κεράων ἔλθωσι θύραζε,\",\n", " \"οἵ ῥ᾽ ἔτυμα 
κραίνουσι, βροτῶν ὅτε κέν τις ἴδηται.\",\n", "]\n", "layer = layers.TextVectorization()\n", "layer.adapt(data)\n", "vectorized_text = layer(data)\n", "print(vectorized_text)" ] }, { "cell_type": "markdown", "metadata": { "id": "7619914dfb40" }, "source": [ "In addition, adaptable layers always expose an option to set state directly via constructor arguments or weight assignment. If the intended state values are known at layer construction time, or are calculated outside of the `adapt()` call, they can be set without relying on the layer's internal computation. For instance, if external vocabulary files for the `TextVectorization`, `StringLookup`, or `IntegerLookup` layers already exist, those can be loaded directly into the lookup tables by passing a path to the vocabulary file in the layer's constructor arguments.\n", "\n", "Here's an example that instantiates a `StringLookup` layer with a precomputed vocabulary:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:00.746260Z", "iopub.status.busy": "2022-12-14T21:29:00.745680Z", "iopub.status.idle": "2022-12-14T21:29:00.755049Z", "shell.execute_reply": "2022-12-14T21:29:00.754375Z" }, "id": "9df56efc7f3b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(\n", "[[1 3 4]\n", " [4 0 2]], shape=(2, 3), dtype=int64)\n" ] } ], "source": [ "vocab = [\"a\", \"b\", \"c\", \"d\"]\n", "data = tf.constant([[\"a\", \"c\", \"d\"], [\"d\", \"z\", \"b\"]])\n", "layer = layers.StringLookup(vocabulary=vocab)\n", "vectorized_data = layer(data)\n", "print(vectorized_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "49cbfe135b00" }, "source": [ "## Preprocessing data before the model or inside the model\n", "\n", "There are two ways you could be using preprocessing layers:\n", "\n", "**Option 1:** Make them part of the model, like this:\n", "\n", "```python\n", "inputs = keras.Input(shape=input_shape)\n", "x = preprocessing_layer(inputs)\n", "outputs = rest_of_the_model(x)\n", "model = keras.Model(inputs, outputs)\n", "```\n", "\n", "With this option, preprocessing happens on device, synchronously with the rest of the model execution, meaning that it benefits from GPU acceleration. If you're training on a GPU, this is the best option for the `Normalization` layer, and for all image preprocessing and data augmentation layers.\n", "\n", "**Option 2:** Apply it to your `tf.data.Dataset`, so as to obtain a dataset that yields batches of preprocessed data, like this:\n", "\n", "```python\n", "dataset = dataset.map(lambda x, y: 
(preprocessing_layer(x), y))\n", "```\n", "\n", "With this option, your preprocessing happens on the CPU, asynchronously, and is buffered before going into the model. In addition, if you call `prefetch` on your dataset, the preprocessing can run efficiently in parallel with training:\n", "\n", "```python\n", "dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))\n", "dataset = dataset.prefetch(tf.data.AUTOTUNE)\n", "model.fit(dataset, ...)\n", "```\n", "\n", "This is the best option for `TextVectorization`, and for all structured data preprocessing layers. It can also be a good option if you're training on CPU and you use image preprocessing layers.\n", "\n", "**When running on a TPU, you should always place preprocessing layers in the `tf.data` pipeline** (with the exception of `Normalization` and `Rescaling`, which run fine on TPU and are commonly used as the first layer of an image model)." ] }, { "cell_type": "markdown", "metadata": { "id": "32f6d2a104b7" }, "source": [ "## Benefits of doing preprocessing inside the model at inference time\n", "\n", "Even if you go with option 2, you may later want to export an inference-only end-to-end model that includes the preprocessing layers. The key benefits of doing this are that **it makes your model portable** and it **helps reduce [training/serving skew](https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew)**.\n", "\n", "When all data preprocessing is part of the model, other people can load and use your model without having to know how each feature is expected to be encoded and normalized. Your inference model will be able to process raw images or raw structured data, and its users will not need to be aware of details such as the tokenization scheme used for text, the indexing scheme used for categorical features, or whether image pixel values are normalized to `[-1, +1]` or to `[0, 1]`. This is especially powerful if you're exporting your model to another runtime, such as TensorFlow.js: you won't have to reimplement your preprocessing pipeline in JavaScript.\n", "\n", "If you initially put your preprocessing layers in your `tf.data` pipeline, you can export an inference model that packages the preprocessing. Simply instantiate a new model that chains your preprocessing layers and your training model:\n", "\n", "```python\n", "inputs = keras.Input(shape=input_shape)\n", "x = preprocessing_layer(inputs)\n", "outputs = training_model(x)\n", "inference_model = keras.Model(inputs, outputs)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "b41b381d48d4" }, "source": [ "## Quick recipes\n", "\n", "### Image data augmentation (on-device)\n", "\n", "Note that image data augmentation layers are only active during training (similarly to the `Dropout` layer)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:00.758385Z", "iopub.status.busy": "2022-12-14T21:29:00.758132Z", "iopub.status.idle": "2022-12-14T21:29:33.940288Z", "shell.execute_reply": 
"2022-12-14T21:29:33.939355Z" }, "id": "a3793692e983" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 8192/170498071 [..............................] - ETA: 0s" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 18268160/170498071 [==>...........................] 
- ETA: 2s" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "170498071/170498071 [==============================] - 2s 0us/step\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.\n", "Instructions for updating:\n", "Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "1/5 [=====>........................] 
- ETA: 1:32 - loss: 4.1194" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "5/5 [==============================] - 23s 38ms/step - loss: 10.3676\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from tensorflow import keras\n", "from tensorflow.keras import layers\n", "\n", "# Create a data augmentation stage with horizontal flipping, rotations, zooms\n", "data_augmentation = keras.Sequential(\n", " [\n", " layers.RandomFlip(\"horizontal\"),\n", " layers.RandomRotation(0.1),\n", " layers.RandomZoom(0.1),\n", " ]\n", ")\n", "\n", "# Load some data\n", "(x_train, y_train), _ = keras.datasets.cifar10.load_data()\n", "input_shape = x_train.shape[1:]\n", "classes = 10\n", "\n", "# Create a tf.data pipeline of augmented images (and their labels)\n", "train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n", "train_dataset = train_dataset.batch(16).map(lambda x, y: (data_augmentation(x), y))\n", "\n", "\n", "# Create a model and train it on the augmented image data\n", "inputs = keras.Input(shape=input_shape)\n", "x = layers.Rescaling(1.0 / 255)(inputs) # Rescale inputs\n", "outputs = keras.applications.ResNet50( # Add the rest of the model\n", " weights=None, input_shape=input_shape, classes=classes\n", ")(x)\n", "model = 
keras.Model(inputs, outputs)\n", "model.compile(optimizer=\"rmsprop\", loss=\"sparse_categorical_crossentropy\")\n", "model.fit(train_dataset, steps_per_epoch=5)" ] }, { "cell_type": "markdown", "metadata": { "id": "51d369f0310f" }, "source": [ "You can see a similar setup in action in the example [image classification from scratch](https://keras.io/examples/vision/image_classification_from_scratch/)." ] }, { "cell_type": "markdown", "metadata": { "id": "a79a1c48b2b7" }, "source": [ "### Normalizing numerical features" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:33.944138Z", "iopub.status.busy": "2022-12-14T21:29:33.943634Z", "iopub.status.idle": "2022-12-14T21:29:41.298460Z", "shell.execute_reply": "2022-12-14T21:29:41.297652Z" }, "id": "9cc2607a45c8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r", " 1/1563 [..............................] - ETA: 12:22 - loss: 3.2272" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 106/1563 [=>............................] 
- ETA: 2s - loss: 2.3482" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", " 653/1563 [===========>..................] 
- ETA: 1s - loss: 2.1761" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 679/1563 [============>.................] - ETA: 1s - loss: 2.1688" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 705/1563 [============>.................] - ETA: 1s - loss: 2.1678" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 731/1563 [=============>................] - ETA: 1s - loss: 2.1672" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 756/1563 [=============>................] - ETA: 1s - loss: 2.1634" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 781/1563 [=============>................] - ETA: 1s - loss: 2.1614" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 807/1563 [==============>...............] - ETA: 1s - loss: 2.1582" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 833/1563 [==============>...............] 
- ETA: 1s - loss: 2.1544" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 859/1563 [===============>..............] - ETA: 1s - loss: 2.1520" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 886/1563 [================>.............] - ETA: 1s - loss: 2.1517" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 913/1563 [================>.............] - ETA: 1s - loss: 2.1496" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 939/1563 [=================>............] - ETA: 1s - loss: 2.1448" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 965/1563 [=================>............] - ETA: 1s - loss: 2.1444" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", " 992/1563 [==================>...........] - ETA: 1s - loss: 2.1419" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1019/1563 [==================>...........] 
- ETA: 1s - loss: 2.1421" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1045/1563 [===================>..........] - ETA: 1s - loss: 2.1394" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1071/1563 [===================>..........] - ETA: 0s - loss: 2.1413" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1097/1563 [====================>.........] - ETA: 0s - loss: 2.1395" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1123/1563 [====================>.........] - ETA: 0s - loss: 2.1387" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1149/1563 [=====================>........] - ETA: 0s - loss: 2.1376" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1175/1563 [=====================>........] - ETA: 0s - loss: 2.1359" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1201/1563 [======================>.......] 
- ETA: 0s - loss: 2.1360" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1227/1563 [======================>.......] - ETA: 0s - loss: 2.1340" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1253/1563 [=======================>......] - ETA: 0s - loss: 2.1333" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1278/1563 [=======================>......] - ETA: 0s - loss: 2.1338" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1304/1563 [========================>.....] - ETA: 0s - loss: 2.1331" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1330/1563 [========================>.....] - ETA: 0s - loss: 2.1323" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1356/1563 [=========================>....] - ETA: 0s - loss: 2.1307" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1382/1563 [=========================>....] 
- ETA: 0s - loss: 2.1284" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1408/1563 [==========================>...] - ETA: 0s - loss: 2.1264" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1434/1563 [==========================>...] - ETA: 0s - loss: 2.1279" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1460/1563 [===========================>..] - ETA: 0s - loss: 2.1268" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1486/1563 [===========================>..] - ETA: 0s - loss: 2.1280" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1512/1563 [============================>.] - ETA: 0s - loss: 2.1273" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1538/1563 [============================>.] 
- ETA: 0s - loss: 2.1268" ] }, { "name": "stdout", "output_type": "stream", "text": [ "1563/1563 [==============================] - 4s 2ms/step - loss: 2.1255\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load some data\n", "(x_train, y_train), _ = keras.datasets.cifar10.load_data()\n", "x_train = x_train.reshape((len(x_train), -1))\n", "input_shape = x_train.shape[1:]\n", "classes = 10\n", "\n", "# Create a Normalization layer and set its internal state using the training data\n", "normalizer = layers.Normalization()\n", "normalizer.adapt(x_train)\n", "\n", "# Create a model that includes the normalization layer\n", "inputs = keras.Input(shape=input_shape)\n", "x = normalizer(inputs)\n", "outputs = layers.Dense(classes, activation=\"softmax\")(x)\n", "model = keras.Model(inputs, outputs)\n", "\n", "# Train the model\n", "model.compile(optimizer=\"adam\", loss=\"sparse_categorical_crossentropy\")\n", "model.fit(x_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "62685d477010" }, "source": [ "### Encoding string categorical features via one-hot encoding" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:41.302317Z", "iopub.status.busy": "2022-12-14T21:29:41.301647Z", "iopub.status.idle": "2022-12-14T21:29:41.400371Z", "shell.execute_reply": "2022-12-14T21:29:41.399724Z" }, "id": "ae0d2b0405f1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(\n", "[[0. 0. 0. 1.]\n", " [0. 0. 1. 0.]\n", " [0. 1. 0. 0.]\n", " [1. 0. 0. 0.]\n", " [1. 0. 0. 0.]\n", " [1. 0. 0. 
0.]], shape=(6, 4), dtype=float32)\n" ] } ], "source": [ "# Define some toy data\n", "data = tf.constant([[\"a\"], [\"b\"], [\"c\"], [\"b\"], [\"c\"], [\"a\"]])\n", "\n", "# Use StringLookup to build an index of the feature values and encode output.\n", "lookup = layers.StringLookup(output_mode=\"one_hot\")\n", "lookup.adapt(data)\n", "\n", "# Convert new test data (which includes unknown feature values)\n", "test_data = tf.constant([[\"a\"], [\"b\"], [\"c\"], [\"d\"], [\"e\"], [\"\"]])\n", "encoded_data = lookup(test_data)\n", "print(encoded_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "686aeda532f5" }, "source": [ "Note that, here, index 0 is reserved for out-of-vocabulary values (values that were not seen during `adapt()`).\n", "\n", "You can see the `StringLookup` in action in the [Structured data classification from scratch](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/) example." ] }, { "cell_type": "markdown", "metadata": { "id": "dc8af3e290df" }, "source": [ "### Encoding integer categorical features via one-hot encoding" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:41.403711Z", "iopub.status.busy": "2022-12-14T21:29:41.403464Z", "iopub.status.idle": "2022-12-14T21:29:41.495197Z", "shell.execute_reply": "2022-12-14T21:29:41.494277Z" }, "id": "75f3d6af4522" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tf.Tensor(\n", "[[0. 0. 1. 0. 0.]\n", " [0. 0. 1. 0. 0.]\n", " [0. 1. 0. 0. 0.]\n", " [1. 0. 0. 0. 0.]\n", " [1. 0. 0. 0. 0.]\n", " [0. 0. 0. 0. 
1.]], shape=(6, 5), dtype=float32)\n" ] } ], "source": [ "# Define some toy data\n", "data = tf.constant([[10], [20], [20], [10], [30], [0]])\n", "\n", "# Use IntegerLookup to build an index of the feature values and encode output.\n", "lookup = layers.IntegerLookup(output_mode=\"one_hot\")\n", "lookup.adapt(data)\n", "\n", "# Convert new test data (which includes unknown feature values)\n", "test_data = tf.constant([[10], [10], [20], [50], [60], [0]])\n", "encoded_data = lookup(test_data)\n", "print(encoded_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "da5a6be487be" }, "source": [ "Note that index 0 is reserved for missing values (which you should specify as the value 0), and index 1 is reserved for out-of-vocabulary values (values that were not seen during `adapt()`). You can configure this via the `mask_token` and `oov_token` constructor arguments of `IntegerLookup`.\n", "\n", "You can see the `IntegerLookup` in action in the [Structured data classification from scratch](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/) example." ] }, { "cell_type": "markdown", "metadata": { "id": "8fbfaa6ab3e2" }, "source": [ "### Applying the hashing trick to an integer categorical feature\n", "\n", "If you have a categorical feature that can take many different values (on the order of 10e3 or higher), where each value only appears a few times in the data, it becomes impractical and ineffective to index and one-hot encode the feature values. Instead, it can be a good idea to apply the \"hashing trick\": hash the values to a vector of fixed size. This keeps the size of the feature space manageable and removes the need for explicit indexing." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:41.498783Z", "iopub.status.busy": "2022-12-14T21:29:41.498218Z", "iopub.status.idle": "2022-12-14T21:29:41.516941Z", "shell.execute_reply": "2022-12-14T21:29:41.516272Z" }, "id": "8f6c1f84c43c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(10000, 64)\n" ] } ], "source": [ "# Sample data: 10,000 random integers with values between 0 and 100,000\n", "data = np.random.randint(0, 100000, size=(10000, 1))\n", "\n", "# Use the Hashing layer to hash the values to the range [0, 64)\n", "hasher = layers.Hashing(num_bins=64, salt=1337)\n", "\n", "# Use the CategoryEncoding 
layer to multi-hot encode the hashed values\n", "encoder = layers.CategoryEncoding(num_tokens=64, output_mode=\"multi_hot\")\n", "encoded_data = encoder(hasher(data))\n", "print(encoded_data.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "df69b434d327" }, "source": [ "### Encoding text as a sequence of token indices\n", "\n", "This is how you should preprocess text to be passed to an `Embedding` layer." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:41.520074Z", "iopub.status.busy": "2022-12-14T21:29:41.519685Z", "iopub.status.idle": "2022-12-14T21:29:44.427817Z", "shell.execute_reply": "2022-12-14T21:29:44.427172Z" }, "id": "361b561bc88b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Encoded text:\n", " [[ 2 19 14 1 9 2 1]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Training model...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "1/1 [==============================] - 2s 2s/step - loss: 0.4559\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Calling end-to-end model on test string...\n", "Model output: tf.Tensor([[0.02034554]], shape=(1, 1), dtype=float32)\n" ] } ], "source": [ "# Define some text data to adapt the layer\n", "adapt_data = tf.constant(\n", " [\n", " \"The Brain is wider than the Sky\",\n", " \"For put them side by side\",\n", " \"The one the other will contain\",\n", " \"With ease and You beside\",\n", " ]\n", ")\n", "\n", "# Create a TextVectorization layer\n", "text_vectorizer = layers.TextVectorization(output_mode=\"int\")\n", "# Index the vocabulary via `adapt()`\n", "text_vectorizer.adapt(adapt_data)\n", "\n", "# Try out the layer\n", "print(\n", " \"Encoded 
text:\\n\", text_vectorizer([\"The Brain is deeper than the sea\"]).numpy(),\n", ")\n", "\n", "# Create a simple model\n", "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", "x = layers.Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=16)(inputs)\n", "x = layers.GRU(8)(x)\n", "outputs = layers.Dense(1)(x)\n", "model = keras.Model(inputs, outputs)\n", "\n", "# Create a labeled dataset (which includes unknown tokens)\n", "train_dataset = tf.data.Dataset.from_tensor_slices(\n", " ([\"The Brain is deeper than the sea\", \"for if they are held Blue to Blue\"], [1, 0])\n", ")\n", "\n", "# Preprocess the string inputs, turning them into int sequences\n", "train_dataset = train_dataset.batch(2).map(lambda x, y: (text_vectorizer(x), y))\n", "# Train the model on the int sequences\n", "print(\"\\nTraining model...\")\n", "model.compile(optimizer=\"rmsprop\", loss=\"mse\")\n", "model.fit(train_dataset)\n", "\n", "# For inference, you can export a model that accepts strings as input\n", "inputs = keras.Input(shape=(1,), dtype=\"string\")\n", "x = text_vectorizer(inputs)\n", "outputs = model(x)\n", "end_to_end_model = keras.Model(inputs, outputs)\n", "\n", "# Call the end-to-end model on test data (which includes unknown tokens)\n", "print(\"\\nCalling end-to-end model on test string...\")\n", "test_data = tf.constant([\"The one the other will absorb\"])\n", "test_output = end_to_end_model(test_data)\n", "print(\"Model output:\", test_output)" ] }, { "cell_type": "markdown", "metadata": { "id": "e725dbcae3e4" }, "source": [ "テキスト分類を最初から行うの例では、`Embedded`モードと組み合わされてTextVectorizationレイヤーが動作する方法を確認できます。\n", "\n", "このようなモデルをトレーニングする場合、最高のパフォーマンスを得るには、入力パイプラインの一部として`TextVectorization`レイヤーを使用する必要があることに注意してください(上記のテキスト分類の例で示すように)。" ] }, { "cell_type": "markdown", "metadata": { "id": "28c2f2ff61fb" }, "source": [ "### マルチホットエンコーディングを使用した ngram の密な行列としてのテキストのエンコーディング\n", "\n", "これは、`Dense`レイヤーに渡されるテキストを前処理する方法です。" ] }, { "cell_type": "code", "execution_count": 11, 
"metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:44.431336Z", "iopub.status.busy": "2022-12-14T21:29:44.431092Z", "iopub.status.idle": "2022-12-14T21:29:45.105159Z", "shell.execute_reply": "2022-12-14T21:29:45.104474Z" }, "id": "7bae1c223cd8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:5 out of the last 1567 calls to .adapt_step at 0x7f0fb46204c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Encoded text:\n", " [[1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 
0.]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Training model...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "1/1 [==============================] - ETA: 0s - loss: 1.1152" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1/1 [==============================] - 0s 375ms/step - loss: 1.1152\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Calling end-to-end model on test string...\n", "Model output: tf.Tensor([[-0.26651537]], shape=(1, 1), dtype=float32)\n" ] } ], "source": [ "# Define some text data to adapt the layer\n", "adapt_data = tf.constant(\n", " [\n", " \"The Brain is wider than the Sky\",\n", " \"For put them side by side\",\n", " \"The one the other will contain\",\n", " \"With ease and You beside\",\n", " ]\n", ")\n", "# Instantiate TextVectorization with \"multi_hot\" output_mode\n", "# and ngrams=2 (index all bigrams)\n", "text_vectorizer = layers.TextVectorization(output_mode=\"multi_hot\", ngrams=2)\n", "# Index the bigrams via `adapt()`\n", "text_vectorizer.adapt(adapt_data)\n", "\n", "# Try out the layer\n", "print(\n", " \"Encoded text:\\n\", text_vectorizer([\"The Brain is deeper than the sea\"]).numpy(),\n", ")\n", "\n", "# Create a simple model\n", "inputs = keras.Input(shape=(text_vectorizer.vocabulary_size(),))\n", "outputs = layers.Dense(1)(inputs)\n", "model = keras.Model(inputs, outputs)\n", "\n", "# Create a labeled dataset (which includes unknown tokens)\n", "train_dataset = tf.data.Dataset.from_tensor_slices(\n", " ([\"The Brain is deeper than the sea\", \"for if they are held Blue to Blue\"], [1, 0])\n", ")\n", "\n", "# Preprocess the string inputs, turning them into int sequences\n", "train_dataset = train_dataset.batch(2).map(lambda x, y: (text_vectorizer(x), y))\n", "# Train the model on the int sequences\n", 
"print(\"\\nTraining model...\")\n", "model.compile(optimizer=\"rmsprop\", loss=\"mse\")\n", "model.fit(train_dataset)\n", "\n", "# For inference, you can export a model that accepts strings as input\n", "inputs = keras.Input(shape=(1,), dtype=\"string\")\n", "x = text_vectorizer(inputs)\n", "outputs = model(x)\n", "end_to_end_model = keras.Model(inputs, outputs)\n", "\n", "# Call the end-to-end model on test data (which includes unknown tokens)\n", "print(\"\\nCalling end-to-end model on test string...\")\n", "test_data = tf.constant([\"The one the other will absorb\"])\n", "test_output = end_to_end_model(test_data)\n", "print(\"Model output:\", test_output)" ] }, { "cell_type": "markdown", "metadata": { "id": "336a4d3426ed" }, "source": [ "### Encoding text as a dense matrix of ngrams with TF-IDF weighting\n", "\n", "This is an alternative way of preprocessing text before passing it to a `Dense` layer." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2022-12-14T21:29:45.108807Z", "iopub.status.busy": "2022-12-14T21:29:45.108210Z", "iopub.status.idle": "2022-12-14T21:29:45.826561Z", "shell.execute_reply": "2022-12-14T21:29:45.825825Z" }, "id": "5b6c0fec928e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:6 out of the last 1568 calls to .adapt_step at 0x7f11fc0f68b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Encoded text:\n", " [[5.461647 1.6945957 0. 0. 0. 
0. 0.\n", " 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 1.0986123 1.0986123 1.0986123 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0.\n", " 1.0986123 0. 0. 0. 0. 0. 0.\n", " 0. 1.0986123 1.0986123 0. 0. 0. ]]\n", "\n", "Training model...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\r", "1/1 [==============================] - ETA: 0s - loss: 2.9832" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r", "1/1 [==============================] - 0s 348ms/step - loss: 2.9832\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Calling end-to-end model on test string...\n", "Model output: tf.Tensor([[1.6719354]], shape=(1, 1), dtype=float32)\n" ] } ], "source": [ "# Define some text data to adapt the layer\n", "adapt_data = tf.constant(\n", " [\n", " \"The Brain is wider than the Sky\",\n", " \"For put them side by side\",\n", " \"The one the other will contain\",\n", " \"With ease and You beside\",\n", " ]\n", ")\n", "# Instantiate TextVectorization with \"tf-idf\" output_mode\n", "# (multi-hot with TF-IDF weighting) and ngrams=2 (index all bigrams)\n", "text_vectorizer = layers.TextVectorization(output_mode=\"tf-idf\", ngrams=2)\n", "# Index the bigrams and learn the TF-IDF weights via `adapt()`\n", "\n", "with tf.device(\"CPU\"):\n", " # A bug that prevents this from running on GPU for now.\n", " text_vectorizer.adapt(adapt_data)\n", "\n", "# Try out the layer\n", "print(\n", " \"Encoded text:\\n\", text_vectorizer([\"The Brain is deeper than the sea\"]).numpy(),\n", ")\n", "\n", "# Create a simple model\n", "inputs = keras.Input(shape=(text_vectorizer.vocabulary_size(),))\n", "outputs = layers.Dense(1)(inputs)\n", "model = keras.Model(inputs, outputs)\n", "\n", "# Create a labeled dataset (which includes unknown tokens)\n", "train_dataset = tf.data.Dataset.from_tensor_slices(\n", " ([\"The Brain is deeper than the sea\", 
\"for if they are held Blue to Blue\"], [1, 0])\n", ")\n", "\n", "# Preprocess the string inputs, turning them into int sequences\n", "train_dataset = train_dataset.batch(2).map(lambda x, y: (text_vectorizer(x), y))\n", "# Train the model on the int sequences\n", "print(\"\\nTraining model...\")\n", "model.compile(optimizer=\"rmsprop\", loss=\"mse\")\n", "model.fit(train_dataset)\n", "\n", "# For inference, you can export a model that accepts strings as input\n", "inputs = keras.Input(shape=(1,), dtype=\"string\")\n", "x = text_vectorizer(inputs)\n", "outputs = model(x)\n", "end_to_end_model = keras.Model(inputs, outputs)\n", "\n", "# Call the end-to-end model on test data (which includes unknown tokens)\n", "print(\"\\nCalling end-to-end model on test string...\")\n", "test_data = tf.constant([\"The one the other will absorb\"])\n", "test_output = end_to_end_model(test_data)\n", "print(\"Model output:\", test_output)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "143ce01c5558" }, "source": [ "## Important gotchas\n", "\n", "### Working with lookup layers with very large vocabularies\n", "\n", "You may find yourself working with a very large vocabulary in a `TextVectorization`, a `StringLookup` layer, or an `IntegerLookup` layer. Typically, a vocabulary larger than 500MB would be considered \"very large\".\n", "\n", "In such cases, for best performance, you should avoid using `adapt()`. Instead, pre-compute your vocabulary in advance (you could use Apache Beam or TF Transform for this) and store it in a file. Then load the vocabulary into the layer at construction time by passing the file path as the `vocabulary` argument.\n", "\n", "### Using lookup layers on a TPU pod or with `ParameterServerStrategy`\n", "\n", "There is an outstanding issue that causes performance to degrade when using a `TextVectorization`, `StringLookup`, or `IntegerLookup` layer while training on a TPU pod or on multiple machines via `ParameterServerStrategy`. This is slated to be fixed in TensorFlow 2.7." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "preprocessing_layers.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 0 }