{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b518b04cbfe0"
   },
   "source": [
    "##### Copyright 2020 The TensorFlow Authors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "cellView": "form",
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:02.707589Z",
     "iopub.status.busy": "2022-12-14T22:21:02.707132Z",
     "iopub.status.idle": "2022-12-14T22:21:02.710907Z",
     "shell.execute_reply": "2022-12-14T22:21:02.710387Z"
    },
    "id": "906e07f6e562"
   },
   "outputs": [],
   "source": [
    "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6e083398b477"
   },
   "source": [
    "# 전처리 레이어 처리"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "64010bd23c2e"
   },
   "source": [
    "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
    "  <td><a target=\"_blank\" href=\"https://www.tensorflow.org/guide/keras/preprocessing_layers\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\">TensorFlow.org에서 보기</a></td>\n",
    "  <td><a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/ko/guide/keras/preprocessing_layers.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\">Google Colab에서 실행</a></td>\n",
    "  <td><a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/ko/guide/keras/preprocessing_layers.ipynb\">     <img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\">    GitHub에서 소스 보기</a></td>\n",
    "  <td><a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/ko/guide/keras/preprocessing_layers.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\">노트북 다운로드</a></td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b1d403f04693"
   },
   "source": [
    "## Keras 전처리\n",
    "\n",
    "개발자는 Keras 전처리 레이어 API를 사용하여 Keras 네이티브 입력 처리 파이프라인을 구축할 수 있습니다. 이러한 입력 처리 파이프라인은 Keras가 아닌 워크플로에서 독립적인 전처리 코드로 사용할 수 있고, Keras 모델과 직접 결합하고, Keras SavedModel의 일부로 내보낼 수 있습니다.\n",
    "\n",
    "Keras 전처리 레이어를 사용하여 진정한 엔드 투 엔드 모델을 구축하고 내보낼 수 있으며, 이러한 모델에는 원시 이미지 또는 구조화된 원시 데이터를 입력으로 받아들이는 모델, 기능 정규화 또는 기능 값 인덱싱을 자체적으로 처리하는 모델이 있습니다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "313360fa9024"
   },
   "source": [
    "## 사용할 수 있는 전처리\n",
    "\n",
    "### 텍스트 전처리\n",
    "\n",
    "- `tf.keras.layers.TextVectorization`: 원시 문자열을 `Embedding` 레이어 또는 `Dense` 레이어에서 읽을 수 있는 인코딩 표현으로 바꿉니다.\n",
    "\n",
    "### 숫자 기능 전처리\n",
    "\n",
    "- `tf.keras.layers.Normalization`: 입력 기능의 기능별 정규화를 수행합니다.\n",
    "- `tf.keras.layers.Discretization`: 연속 숫자 기능을 정수 범주형 기능으로 바꿉니다.\n",
    "\n",
    "### 범주형 기능 전처리\n",
    "\n",
    "- `tf.keras.layers.CategoryEncoding`: 정수 범주형 기능을 원-핫(one-hot), 멀티-핫(multi-hot) 또는 카운트 밀집 표현(count dense representations)으로 바꿉니다.\n",
    "- `tf.keras.layers.Hashing`: \"해싱 트릭(hashing trick)\"이라 불리우는 범주형 기능 해싱을 수행합니다.\n",
    "- `tf.keras.layers.StringLookup`: 문자열 범주를 `Embedding` 레이어 또는 `Dense` 레이어에서 읽을 수 있는 인코딩 표현으로 바꿉니다.\n",
    "- `tf.keras.layers.IntegerLookup`: 정수 범주를 `Embedding` 레이어 또는 `Dense` 레이어에서 읽을 수 있는 인코딩 표현으로 바꿉니다.\n",
    "\n",
    "### 이미지 전처리\n",
    "\n",
    "이미지 모델 입력을 표준화하기 위한 레이어입니다.\n",
    "\n",
    "- `tf.keras.layers.Resizing`: 이미지 배치의 크기를 대상 크기로 조정합니다.\n",
    "- `tf.keras.layers.Rescaling`: 이미지 배치의 값을 재조정하고 오프셋합니다(예: `[0, 255]` 범위의 입력에서 `[0, 1]` 범위의 입력으로 이동).\n",
    "- `tf.keras.layers.CenterCrop`: 이미지 배치의 가운데 크롭을 반환합니다.\n",
    "\n",
    "### 이미지 데이터 증강\n",
    "\n",
    "이 레이어는 이미지 배치에 무작위 증강 변환을 적용합니다. 이러한 작업은 훈련 중에만 활성화됩니다.\n",
    "\n",
    "- `tf.keras.layers.RandomCrop`\n",
    "- `tf.keras.layers.RandomFlip`\n",
    "- `tf.keras.layers.RandomTranslation`\n",
    "- `tf.keras.layers.RandomRotation`\n",
    "- `tf.keras.layers.RandomZoom`\n",
    "- `tf.keras.layers.RandomHeight`\n",
    "- `tf.keras.layers.RandomWidth`\n",
    "- `tf.keras.layers.RandomContrast`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c923e41fb1b4"
   },
   "source": [
    "## `adapt()` 메서드\n",
    "\n",
    "일부 전처리 레이어에는 훈련 데이터의 샘플을 기반으로 계산할 수 있는 내부 상태가 있습니다. 상태 저장 전처리 레이어 목록은 다음과 같습니다.\n",
    "\n",
    "- `TextVectorization`: 문자열 토큰과 정수 인덱스 간의 매핑을 유지합니다.\n",
    "- `StringLookup` 및 `IntegerLookup`: 입력 값과 정수 인덱스 간의 매핑을 유지합니다.\n",
    "- `Normalization`: 기능의 평균과 표준편차를 유지합니다.\n",
    "- `Discretization`: 값 버킷 경계에 대한 정보를 유지합니다.\n",
    "\n",
    "결정적으로 이러한 레이어는 **훈련할 수 없습니다**. 이들의 상태는 훈련 중에 설정되지 않습니다. 미리 계산된 상수로부터 초기화하거나 데이터에 \"적용\"하는 방식을 사용하여 **훈련하기 전**에 설정해야 합니다.\n",
    "\n",
    "다음과 같이 `adapt()` 메서드를 통해 학습 데이터에 노출하여 전처리 레이어의 상태를 설정할 수 있습니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:02.714547Z",
     "iopub.status.busy": "2022-12-14T22:21:02.713990Z",
     "iopub.status.idle": "2022-12-14T22:21:08.228489Z",
     "shell.execute_reply": "2022-12-14T22:21:08.227607Z"
    },
    "id": "4cac6bd80812"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-12-14 22:21:03.662581: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n",
      "2022-12-14 22:21:03.662680: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n",
      "2022-12-14 22:21:03.662690: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Features mean: -0.00\n",
      "Features std: 1.00\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras import layers\n",
    "\n",
    "data = np.array([[0.1, 0.2, 0.3], [0.8, 0.9, 1.0], [1.5, 1.6, 1.7],])\n",
    "layer = layers.Normalization()\n",
    "layer.adapt(data)\n",
    "normalized_data = layer(data)\n",
    "\n",
    "print(\"Features mean: %.2f\" % (normalized_data.numpy().mean()))\n",
    "print(\"Features std: %.2f\" % (normalized_data.numpy().std()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d43b8246b8a3"
   },
   "source": [
    "`adapt()` 메소드는 Numpy 배열 또는 `tf.data.Dataset` 객체를 사용합니다. `StringLookup` 및 `TextVectorization` 의 경우, 문자열의 목록을 전달할 수도 있습니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:08.232709Z",
     "iopub.status.busy": "2022-12-14T22:21:08.231926Z",
     "iopub.status.idle": "2022-12-14T22:21:08.398603Z",
     "shell.execute_reply": "2022-12-14T22:21:08.397912Z"
    },
    "id": "48d95713348a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(\n",
      "[[37 12 25  5  9 20 21  0  0]\n",
      " [51 34 27 33 29 18  0  0  0]\n",
      " [49 52 30 31 19 46 10  0  0]\n",
      " [ 7  5 50 43 28  7 47 17  0]\n",
      " [24 35 39 40  3  6 32 16  0]\n",
      " [ 4  2 15 14 22 23  0  0  0]\n",
      " [36 48  6 38 42  3 45  0  0]\n",
      " [ 4  2 13 41 53  8 44 26 11]], shape=(8, 9), dtype=int64)\n"
     ]
    }
   ],
   "source": [
    "data = [\n",
    "    \"ξεῖν᾽, ἦ τοι μὲν ὄνειροι ἀμήχανοι ἀκριτόμυθοι\",\n",
    "    \"γίγνοντ᾽, οὐδέ τι πάντα τελείεται ἀνθρώποισι.\",\n",
    "    \"δοιαὶ γάρ τε πύλαι ἀμενηνῶν εἰσὶν ὀνείρων:\",\n",
    "    \"αἱ μὲν γὰρ κεράεσσι τετεύχαται, αἱ δ᾽ ἐλέφαντι:\",\n",
    "    \"τῶν οἳ μέν κ᾽ ἔλθωσι διὰ πριστοῦ ἐλέφαντος,\",\n",
    "    \"οἵ ῥ᾽ ἐλεφαίρονται, ἔπε᾽ ἀκράαντα φέροντες:\",\n",
    "    \"οἱ δὲ διὰ ξεστῶν κεράων ἔλθωσι θύραζε,\",\n",
    "    \"οἵ ῥ᾽ ἔτυμα κραίνουσι, βροτῶν ὅτε κέν τις ἴδηται.\",\n",
    "]\n",
    "layer = layers.TextVectorization()\n",
    "layer.adapt(data)\n",
    "vectorized_text = layer(data)\n",
    "print(vectorized_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7619914dfb40"
   },
   "source": [
    "또한, 적응 가능한 레이어는 항상 생성자 인수 또는 가중치 할당을 통해 상태를 직접 설정하는 옵션을 제공합니다. 의도한 상태 값이 레이어 생성 시 알려지거나 `adapt()` 호출 외부에서 계산되는 경우, 레이어의 내부 계산에 의존하지 않고 설정할 수 있습니다. 예를 들어, `TextVectorization` , `StringLookup` 또는 `IntegerLookup` 레이어에 대한 외부 어휘 파일이 이미 있는 경우, 레이어의 생성자 인수에 있는 어휘 파일에 대한 경로를 전달하여 조회 테이블에 직접 로드할 수 있습니다.\n",
    "\n",
    "다음은 사전 계산된 어휘로 `StringLookup` 레이어를 인스턴스화하는 예입니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:08.402344Z",
     "iopub.status.busy": "2022-12-14T22:21:08.401588Z",
     "iopub.status.idle": "2022-12-14T22:21:08.411026Z",
     "shell.execute_reply": "2022-12-14T22:21:08.410412Z"
    },
    "id": "9df56efc7f3b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(\n",
      "[[1 3 4]\n",
      " [4 0 2]], shape=(2, 3), dtype=int64)\n"
     ]
    }
   ],
   "source": [
    "vocab = [\"a\", \"b\", \"c\", \"d\"]\n",
    "data = tf.constant([[\"a\", \"c\", \"d\"], [\"d\", \"z\", \"b\"]])\n",
    "layer = layers.StringLookup(vocabulary=vocab)\n",
    "vectorized_data = layer(data)\n",
    "print(vectorized_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "49cbfe135b00"
   },
   "source": [
    "## 모델 이전 또는 모델 내부에서의 데이터 전처리\n",
    "\n",
    "전처리 레이어를 사용할 수 있는 두 가지 방법이 있습니다.\n",
    "\n",
    "**옵션 1:** 다음과 같이 모델의 일부로 만듭니다.\n",
    "\n",
    "```python\n",
    "inputs = keras.Input(shape=input_shape)\n",
    "x = preprocessing_layer(inputs)\n",
    "outputs = rest_of_the_model(x)\n",
    "model = keras.Model(inputs, outputs)\n",
    "```\n",
    "\n",
    "이 옵션을 사용하면 나머지 모델 실행과 동시에 장치에서 전처리가 발생하므로 GPU 가속의 이점을 얻을 수 있습니다. GPU로 훈련을 진행하는 경우 `Normalization` 레이어와 모든 이미지 전처리 및 데이터 강화 레이어에 이 옵션이 가장 적합합니다.\n",
    "\n",
    "**옵션 2:** `tf.data.Dataset`에 이 옵션을 적용하면 다음과 같이 전처리된 데이터 배치를 생성하는 데이터세트를 얻습니다.\n",
    "\n",
    "```python\n",
    "dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))\n",
    "```\n",
    "\n",
    "이 옵션을 사용하면 전처리가 CPU에서 비동기적으로 발생하고 모델에 들어가기 전에 전처리가 버퍼링됩니다. 추가적으로 데이터세트에서 `dataset.prefetch(tf.data.AUTOTUNE)`를 호출하면 사전처리가 훈련과 병행하게 효율적으로 이루어집니다.\n",
    "\n",
    "```python\n",
    "dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))\n",
    "dataset = dataset.prefetch(tf.data.AUTOTUNE)\n",
    "model.fit(dataset, ...)\n",
    "```\n",
    "\n",
    "이는 `TextVectorization` 및 모든 구조화된 데이터 전처리 레이어에 가장 적합한 옵션입니다. CPU로 훈련하고 이미지 전처리 레이어를 사용하는 경우에도 좋은 옵션이 될 수 있습니다.\n",
    "\n",
    "**TPU에서 실행할 경우 항상 `tf.data` 파이프라인**에 전처리 레이어를 배치해야 합니다(TPU에서 잘 실행되고 일반적으로 첫 번째 레이어를 이미지 모델로 사용하는 `Normalization` 및 `Rescaling`은 제외)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "32f6d2a104b7"
   },
   "source": [
    "## 추론 시 모델 내에서 전처리를 수행할 때의 이점\n",
    "\n",
    "옵션 2를 사용하더라도 나중에 전처리 레이어를 포함하는 추론 전용 엔드 투 엔드 모델을 내보낼 수 있습니다. 이 작업의 주요 이점은 **모델을 이식 가능하게 만들고** **[훈련/적용 편향](https://developers.google.com/machine-learning/guides/rules-of-ml#training-serving_skew)**을 줄이는 데 도움이 된다는 것입니다.\n",
    "\n",
    "모든 데이터 전처리가 모델의 일부인 경우, 다른 사람들은 각 특성이 어떻게 인코딩되고 정규화될 것으로 예상되는지 알 필요 없이 모델을 로드하고 사용할 수 있습니다. 추론 모델은 원시 이미지 또는 원시 구조적 데이터를 처리할 수 있으며, 모델 사용자가 예를 들어, 텍스트에 사용되는 토큰화 체계, 범주형 특성에 사용되는 인덱싱 체계, 이미지 픽셀 값이 `[-1, +1]` 또는 `[0, 1]`로 정규화되었는지 여부와 같은 세부 정보를 알 필요가 없습니다. 이는 모델을 TensorFlow.js와 같은 다른 런타임으로 내보내는 경우 특히 강력합니다. JavaScript에서 전처리 파이프라인을 다시 구현할 필요가 없습니다.\n",
    "\n",
    "처음에 전처리 레이어를 `tf.data` 파이프라인에 배치한 경우, 전처리를 패키징하는 추론 모델을 내보낼 수 있습니다. 전처리 레이어와 훈련 모델을 연결하는 새 모델을 인스턴스화하기만 하면 됩니다.\n",
    "\n",
    "```python\n",
    "inputs = keras.Input(shape=input_shape)\n",
    "x = preprocessing_layer(inputs)\n",
    "outputs = training_model(x)\n",
    "inference_model = keras.Model(inputs, outputs)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b41b381d48d4"
   },
   "source": [
    "## 퀵 레시피\n",
    "\n",
    "### 이미지 데이터 증강\n",
    "\n",
    "이미지 데이터 증강 레이어는 훈련 중에만 활성화됩니다(`Dropout` 레이어와 유사)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:08.414273Z",
     "iopub.status.busy": "2022-12-14T22:21:08.413818Z",
     "iopub.status.idle": "2022-12-14T22:21:41.705344Z",
     "shell.execute_reply": "2022-12-14T22:21:41.704664Z"
    },
    "id": "a3793692e983"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "     8192/170498071 [..............................] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "   204800/170498071 [..............................] - ETA: 56s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "  1302528/170498071 [..............................] - ETA: 15s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "  4440064/170498071 [..............................] - ETA: 6s "
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "  9232384/170498071 [>.............................] - ETA: 3s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 13934592/170498071 [=>............................] - ETA: 3s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 18595840/170498071 [==>...........................] - ETA: 2s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 23273472/170498071 [===>..........................] - ETA: 2s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 28000256/170498071 [===>..........................] - ETA: 2s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 32636928/170498071 [====>.........................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 37199872/170498071 [=====>........................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 42049536/170498071 [======>.......................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 46702592/170498071 [=======>......................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 51396608/170498071 [========>.....................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 56131584/170498071 [========>.....................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 60866560/170498071 [=========>....................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 65576960/170498071 [==========>...................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 70279168/170498071 [===========>..................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 74694656/170498071 [============>.................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 79429632/170498071 [============>.................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 84074496/170498071 [=============>................] - ETA: 1s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 88735744/170498071 [==============>...............] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 93331456/170498071 [===============>..............] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 97959936/170498071 [================>.............] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "102580224/170498071 [=================>............] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "107192320/170498071 [=================>............] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "111788032/170498071 [==================>...........] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "116391936/170498071 [===================>..........] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "121020416/170498071 [====================>.........] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "125591552/170498071 [=====================>........] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "130228224/170498071 [=====================>........] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "134922240/170498071 [======================>.......] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "139608064/170498071 [=======================>......] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "144326656/170498071 [========================>.....] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "148987904/170498071 [=========================>....] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "153681920/170498071 [==========================>...] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "158433280/170498071 [==========================>...] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "163201024/170498071 [===========================>..] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "168001536/170498071 [============================>.] - ETA: 0s"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "170498071/170498071 [==============================] - 2s 0us/step\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/tensorflow/python/autograph/pyct/static_analysis/liveness.py:83: Analyzer.lamba_check (from tensorflow.python.autograph.pyct.static_analysis.liveness) is deprecated and will be removed after 2023-09-23.\n",
      "Instructions for updating:\n",
      "Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting RngReadAndSkip cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting Bitcast cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting StatelessRandomUniformV2 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:Using a while_loop for converting ImageProjectiveTransformV3 cause there is no registered converter for this op.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "1/5 [=====>........................] - ETA: 1:32 - loss: 4.7259"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "3/5 [=================>............] - ETA: 0s - loss: 8.3104  "
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "5/5 [==============================] - ETA: 0s - loss: 8.0458"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "5/5 [==============================] - 23s 38ms/step - loss: 8.0458\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.History at 0x7faddc27ea30>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers\n",
    "\n",
    "# Create a data augmentation stage with horizontal flipping, rotations, zooms\n",
    "data_augmentation = keras.Sequential(\n",
    "    [\n",
    "        layers.RandomFlip(\"horizontal\"),\n",
    "        layers.RandomRotation(0.1),\n",
    "        layers.RandomZoom(0.1),\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Load some data\n",
    "(x_train, y_train), _ = keras.datasets.cifar10.load_data()\n",
    "input_shape = x_train.shape[1:]\n",
    "classes = 10\n",
    "\n",
    "# Create a tf.data pipeline of augmented images (and their labels)\n",
    "train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n",
    "train_dataset = train_dataset.batch(16).map(lambda x, y: (data_augmentation(x), y))\n",
    "\n",
    "\n",
    "# Create a model and train it on the augmented image data\n",
    "inputs = keras.Input(shape=input_shape)\n",
    "x = layers.Rescaling(1.0 / 255)(inputs)  # Rescale inputs\n",
    "outputs = keras.applications.ResNet50(  # Add the rest of the model\n",
    "    weights=None, input_shape=input_shape, classes=classes\n",
    ")(x)\n",
    "model = keras.Model(inputs, outputs)\n",
    "model.compile(optimizer=\"rmsprop\", loss=\"sparse_categorical_crossentropy\")\n",
    "model.fit(train_dataset, steps_per_epoch=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "51d369f0310f"
   },
   "source": [
    "[처음부터 이미지 분류하기](https://keras.io/examples/vision/image_classification_from_scratch/) 예제에서 유사한 설정이 동작하는 것을 볼 수 있습니다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "a79a1c48b2b7"
   },
   "source": [
    "### 수치 특성 정규화하기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:41.709193Z",
     "iopub.status.busy": "2022-12-14T22:21:41.708628Z",
     "iopub.status.idle": "2022-12-14T22:21:48.924299Z",
     "shell.execute_reply": "2022-12-14T22:21:48.923565Z"
    },
    "id": "9cc2607a45c8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "   1/1563 [..............................] - ETA: 12:33 - loss: 2.6955"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "  28/1563 [..............................] - ETA: 2s - loss: 2.4880   "
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "  56/1563 [>.............................] - ETA: 2s - loss: 2.3822"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "  84/1563 [>.............................] - ETA: 2s - loss: 2.3226"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 112/1563 [=>............................] - ETA: 2s - loss: 2.3049"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 139/1563 [=>............................] - ETA: 2s - loss: 2.2888"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 167/1563 [==>...........................] - ETA: 2s - loss: 2.2643"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 195/1563 [==>...........................] - ETA: 2s - loss: 2.2574"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 221/1563 [===>..........................] - ETA: 2s - loss: 2.2470"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 248/1563 [===>..........................] - ETA: 2s - loss: 2.2426"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 276/1563 [====>.........................] - ETA: 2s - loss: 2.2230"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 304/1563 [====>.........................] - ETA: 2s - loss: 2.2168"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 331/1563 [=====>........................] - ETA: 2s - loss: 2.2009"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 357/1563 [=====>........................] - ETA: 2s - loss: 2.1881"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 385/1563 [======>.......................] - ETA: 2s - loss: 2.1818"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 412/1563 [======>.......................] - ETA: 2s - loss: 2.1791"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 439/1563 [=======>......................] - ETA: 2s - loss: 2.1710"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 467/1563 [=======>......................] - ETA: 2s - loss: 2.1668"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 495/1563 [========>.....................] - ETA: 1s - loss: 2.1610"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 522/1563 [=========>....................] - ETA: 1s - loss: 2.1606"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 549/1563 [=========>....................] - ETA: 1s - loss: 2.1573"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 576/1563 [==========>...................] - ETA: 1s - loss: 2.1473"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 604/1563 [==========>...................] - ETA: 1s - loss: 2.1415"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 631/1563 [===========>..................] - ETA: 1s - loss: 2.1438"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 657/1563 [===========>..................] - ETA: 1s - loss: 2.1429"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 683/1563 [============>.................] - ETA: 1s - loss: 2.1400"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 709/1563 [============>.................] - ETA: 1s - loss: 2.1382"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 735/1563 [=============>................] - ETA: 1s - loss: 2.1363"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 761/1563 [=============>................] - ETA: 1s - loss: 2.1343"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 787/1563 [==============>...............] - ETA: 1s - loss: 2.1340"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 814/1563 [==============>...............] - ETA: 1s - loss: 2.1340"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 841/1563 [===============>..............] - ETA: 1s - loss: 2.1347"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 867/1563 [===============>..............] - ETA: 1s - loss: 2.1342"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 893/1563 [================>.............] - ETA: 1s - loss: 2.1345"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 919/1563 [================>.............] - ETA: 1s - loss: 2.1357"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 945/1563 [=================>............] - ETA: 1s - loss: 2.1330"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 971/1563 [=================>............] - ETA: 1s - loss: 2.1307"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      " 997/1563 [==================>...........] - ETA: 1s - loss: 2.1321"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1023/1563 [==================>...........] - ETA: 1s - loss: 2.1320"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1049/1563 [===================>..........] - ETA: 0s - loss: 2.1326"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1075/1563 [===================>..........] - ETA: 0s - loss: 2.1295"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1101/1563 [====================>.........] - ETA: 0s - loss: 2.1281"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1129/1563 [====================>.........] - ETA: 0s - loss: 2.1256"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1157/1563 [=====================>........] - ETA: 0s - loss: 2.1279"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1184/1563 [=====================>........] - ETA: 0s - loss: 2.1263"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1212/1563 [======================>.......] - ETA: 0s - loss: 2.1287"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1240/1563 [======================>.......] - ETA: 0s - loss: 2.1278"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1267/1563 [=======================>......] - ETA: 0s - loss: 2.1271"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1293/1563 [=======================>......] - ETA: 0s - loss: 2.1264"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1319/1563 [========================>.....] - ETA: 0s - loss: 2.1253"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1346/1563 [========================>.....] - ETA: 0s - loss: 2.1255"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1374/1563 [=========================>....] - ETA: 0s - loss: 2.1257"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1402/1563 [=========================>....] - ETA: 0s - loss: 2.1267"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1430/1563 [==========================>...] - ETA: 0s - loss: 2.1259"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1458/1563 [==========================>...] - ETA: 0s - loss: 2.1277"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1485/1563 [===========================>..] - ETA: 0s - loss: 2.1265"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1513/1563 [============================>.] - ETA: 0s - loss: 2.1288"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1541/1563 [============================>.] - ETA: 0s - loss: 2.1281"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1563/1563 [==============================] - 3s 2ms/step - loss: 2.1283\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.History at 0x7fad544b6700>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load some data\n",
    "(x_train, y_train), _ = keras.datasets.cifar10.load_data()\n",
    "x_train = x_train.reshape((len(x_train), -1))\n",
    "input_shape = x_train.shape[1:]\n",
    "classes = 10\n",
    "\n",
    "# Create a Normalization layer and set its internal state using the training data\n",
    "normalizer = layers.Normalization()\n",
    "normalizer.adapt(x_train)\n",
    "\n",
    "# Create a model that include the normalization layer\n",
    "inputs = keras.Input(shape=input_shape)\n",
    "x = normalizer(inputs)\n",
    "outputs = layers.Dense(classes, activation=\"softmax\")(x)\n",
    "model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Train the model\n",
    "model.compile(optimizer=\"adam\", loss=\"sparse_categorical_crossentropy\")\n",
    "model.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "62685d477010"
   },
   "source": [
    "### 원-핫 인코딩을 통해 문자열 범주형 특성 인코딩하기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:48.928050Z",
     "iopub.status.busy": "2022-12-14T22:21:48.927567Z",
     "iopub.status.idle": "2022-12-14T22:21:49.025348Z",
     "shell.execute_reply": "2022-12-14T22:21:49.024579Z"
    },
    "id": "ae0d2b0405f1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(\n",
      "[[0. 0. 0. 1.]\n",
      " [0. 0. 1. 0.]\n",
      " [0. 1. 0. 0.]\n",
      " [1. 0. 0. 0.]\n",
      " [1. 0. 0. 0.]\n",
      " [1. 0. 0. 0.]], shape=(6, 4), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "# Define some toy data\n",
    "data = tf.constant([[\"a\"], [\"b\"], [\"c\"], [\"b\"], [\"c\"], [\"a\"]])\n",
    "\n",
    "# Use StringLookup to build an index of the feature values and encode output.\n",
    "lookup = layers.StringLookup(output_mode=\"one_hot\")\n",
    "lookup.adapt(data)\n",
    "\n",
    "# Convert new test data (which includes unknown feature values)\n",
    "test_data = tf.constant([[\"a\"], [\"b\"], [\"c\"], [\"d\"], [\"e\"], [\"\"]])\n",
    "encoded_data = lookup(test_data)\n",
    "print(encoded_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "686aeda532f5"
   },
   "source": [
    "여기에서 인덱스 0은 어휘 이외의 값(`adapt()` 동안 표시되지 않은 값)을 위해 예약되어 있습니다\n",
    "\n",
    "<a>구조화된 데이터 분류 처음부터 하기</a> 예시에서 작동 중인 `StringLookup`을 볼 수 있습니다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dc8af3e290df"
   },
   "source": [
    "### 원-핫 인코딩을 통해 정수 범주형 특성 인코딩하기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:49.029353Z",
     "iopub.status.busy": "2022-12-14T22:21:49.028814Z",
     "iopub.status.idle": "2022-12-14T22:21:49.117590Z",
     "shell.execute_reply": "2022-12-14T22:21:49.116931Z"
    },
    "id": "75f3d6af4522"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf.Tensor(\n",
      "[[0. 0. 1. 0. 0.]\n",
      " [0. 0. 1. 0. 0.]\n",
      " [0. 1. 0. 0. 0.]\n",
      " [1. 0. 0. 0. 0.]\n",
      " [1. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 1.]], shape=(6, 5), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "# Define some toy data\n",
    "data = tf.constant([[10], [20], [20], [10], [30], [0]])\n",
    "\n",
    "# Use IntegerLookup to build an index of the feature values and encode output.\n",
    "lookup = layers.IntegerLookup(output_mode=\"one_hot\")\n",
    "lookup.adapt(data)\n",
    "\n",
    "# Convert new test data (which includes unknown feature values)\n",
    "test_data = tf.constant([[10], [10], [20], [50], [60], [0]])\n",
    "encoded_data = lookup(test_data)\n",
    "print(encoded_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "da5a6be487be"
   },
   "source": [
    "인덱스 0은 누락된 값(값을 0으로 지정해야 함)용으로 예약되어 있고 인덱스 1은 어휘 외 의값(`adapt()` 동안 표시되지 않은 값)용으로 예약되어 있습니다. `IntegerLookup`의 `mask_token` 및 `oov_token` 생성자 인수를 사용하여 이를 구성할 수 있습니다.\n",
    "\n",
    "[구조화된 데이터 분류 처음부터 하기](https://keras.io/examples/structured_data/structured_data_classification_from_scratch/) 예시에서 작동 중인 `IntegerLookup`을 볼 수 있습니다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8fbfaa6ab3e2"
   },
   "source": [
    "### 정수 범주형 특성에 해싱 트릭 적용하기\n",
    "\n",
    "여러 다른 값(10e3 이상)을 사용할 수 있는 범주형 특성의 각 값이 데이터에서 몇 번만 나타나는 경우, 특성 값을 인덱싱하고 원-핫 인코딩하는 것은 비실용적이고 비효율적입니다. 대신, \"해싱 트릭\"을 적용하는 것이 좋습니다. 값을 고정된 크기의 벡터로 해싱합니다. 이는 특성 공간의 크기를 관리 가능한 상태로 유지하고 명시적 인덱싱의 필요성을 제거합니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:49.121128Z",
     "iopub.status.busy": "2022-12-14T22:21:49.120658Z",
     "iopub.status.idle": "2022-12-14T22:21:49.138503Z",
     "shell.execute_reply": "2022-12-14T22:21:49.137809Z"
    },
    "id": "8f6c1f84c43c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(10000, 64)\n"
     ]
    }
   ],
   "source": [
    "# Sample data: 10,000 random integers with values between 0 and 100,000\n",
    "data = np.random.randint(0, 100000, size=(10000, 1))\n",
    "\n",
    "# Use the Hashing layer to hash the values to the range [0, 64]\n",
    "hasher = layers.Hashing(num_bins=64, salt=1337)\n",
    "\n",
    "# Use the CategoryEncoding layer to multi-hot encode the hashed values\n",
    "encoder = layers.CategoryEncoding(num_tokens=64, output_mode=\"multi_hot\")\n",
    "encoded_data = encoder(hasher(data))\n",
    "print(encoded_data.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "df69b434d327"
   },
   "source": [
    "### 일련의 토큰 인덱스로 텍스트 인코딩하기\n",
    "\n",
    "`Embedding` 레이어에 전달될 텍스트를 전처리하는 방법입니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:49.141730Z",
     "iopub.status.busy": "2022-12-14T22:21:49.141262Z",
     "iopub.status.idle": "2022-12-14T22:21:51.664114Z",
     "shell.execute_reply": "2022-12-14T22:21:51.663371Z"
    },
    "id": "361b561bc88b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Encoded text:\n",
      " [[ 2 19 14  1  9  2  1]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Training model...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "1/1 [==============================] - ETA: 0s - loss: 0.5044"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1/1 [==============================] - 2s 2s/step - loss: 0.5044\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Calling end-to-end model on test string...\n",
      "Model output: tf.Tensor([[0.02392788]], shape=(1, 1), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "# Define some text data to adapt the layer\n",
    "adapt_data = tf.constant(\n",
    "    [\n",
    "        \"The Brain is wider than the Sky\",\n",
    "        \"For put them side by side\",\n",
    "        \"The one the other will contain\",\n",
    "        \"With ease and You beside\",\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Create a TextVectorization layer\n",
    "text_vectorizer = layers.TextVectorization(output_mode=\"int\")\n",
    "# Index the vocabulary via `adapt()`\n",
    "text_vectorizer.adapt(adapt_data)\n",
    "\n",
    "# Try out the layer\n",
    "print(\n",
    "    \"Encoded text:\\n\", text_vectorizer([\"The Brain is deeper than the sea\"]).numpy(),\n",
    ")\n",
    "\n",
    "# Create a simple model\n",
    "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n",
    "x = layers.Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=16)(inputs)\n",
    "x = layers.GRU(8)(x)\n",
    "outputs = layers.Dense(1)(x)\n",
    "model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Create a labeled dataset (which includes unknown tokens)\n",
    "train_dataset = tf.data.Dataset.from_tensor_slices(\n",
    "    ([\"The Brain is deeper than the sea\", \"for if they are held Blue to Blue\"], [1, 0])\n",
    ")\n",
    "\n",
    "# Preprocess the string inputs, turning them into int sequences\n",
    "train_dataset = train_dataset.batch(2).map(lambda x, y: (text_vectorizer(x), y))\n",
    "# Train the model on the int sequences\n",
    "print(\"\\nTraining model...\")\n",
    "model.compile(optimizer=\"rmsprop\", loss=\"mse\")\n",
    "model.fit(train_dataset)\n",
    "\n",
    "# For inference, you can export a model that accepts strings as input\n",
    "inputs = keras.Input(shape=(1,), dtype=\"string\")\n",
    "x = text_vectorizer(inputs)\n",
    "outputs = model(x)\n",
    "end_to_end_model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Call the end-to-end model on test data (which includes unknown tokens)\n",
    "print(\"\\nCalling end-to-end model on test string...\")\n",
    "test_data = tf.constant([\"The one the other will absorb\"])\n",
    "test_output = end_to_end_model(test_data)\n",
    "print(\"Model output:\", test_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e725dbcae3e4"
   },
   "source": [
    "<a>처음부터 텍스트 분류</a> 예에서 `Embedding` 모드와 결합된 <code>TextVectorization</code> 레이어가 동작하는 것을 볼 수 있습니다.\n",
    "\n",
    "이러한 모델을 훈련할 때에는 최상의 성능을 위해 `TextVectorization` 레이어를 입력 파이프라인의 일부로 사용해야 합니다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "28c2f2ff61fb"
   },
   "source": [
    "### 멀티-핫 인코딩을 사용하여 텍스트를 ngram의 밀집 행렬로 인코딩하기\n",
    "\n",
    "다음은 `Dense` 레이어로 전달될 텍스트를 전처리하는 방법입니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:51.667922Z",
     "iopub.status.busy": "2022-12-14T22:21:51.667427Z",
     "iopub.status.idle": "2022-12-14T22:21:52.359161Z",
     "shell.execute_reply": "2022-12-14T22:21:52.358463Z"
    },
    "id": "7bae1c223cd8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:5 out of the last 1567 calls to <function PreprocessingLayer.make_adapt_function.<locals>.adapt_step at 0x7fad547cd040> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Encoded text:\n",
      " [[1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0.\n",
      "  0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Training model...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "1/1 [==============================] - ETA: 0s - loss: 1.6243"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1/1 [==============================] - 0s 388ms/step - loss: 1.6243\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Calling end-to-end model on test string...\n",
      "Model output: tf.Tensor([[-0.38353857]], shape=(1, 1), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "# Define some text data to adapt the layer\n",
    "adapt_data = tf.constant(\n",
    "    [\n",
    "        \"The Brain is wider than the Sky\",\n",
    "        \"For put them side by side\",\n",
    "        \"The one the other will contain\",\n",
    "        \"With ease and You beside\",\n",
    "    ]\n",
    ")\n",
    "# Instantiate TextVectorization with \"multi_hot\" output_mode\n",
    "# and ngrams=2 (index all bigrams)\n",
    "text_vectorizer = layers.TextVectorization(output_mode=\"multi_hot\", ngrams=2)\n",
    "# Index the bigrams via `adapt()`\n",
    "text_vectorizer.adapt(adapt_data)\n",
    "\n",
    "# Try out the layer\n",
    "print(\n",
    "    \"Encoded text:\\n\", text_vectorizer([\"The Brain is deeper than the sea\"]).numpy(),\n",
    ")\n",
    "\n",
    "# Create a simple model\n",
    "inputs = keras.Input(shape=(text_vectorizer.vocabulary_size(),))\n",
    "outputs = layers.Dense(1)(inputs)\n",
    "model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Create a labeled dataset (which includes unknown tokens)\n",
    "train_dataset = tf.data.Dataset.from_tensor_slices(\n",
    "    ([\"The Brain is deeper than the sea\", \"for if they are held Blue to Blue\"], [1, 0])\n",
    ")\n",
    "\n",
    "# Preprocess the string inputs, turning them into int sequences\n",
    "train_dataset = train_dataset.batch(2).map(lambda x, y: (text_vectorizer(x), y))\n",
    "# Train the model on the int sequences\n",
    "print(\"\\nTraining model...\")\n",
    "model.compile(optimizer=\"rmsprop\", loss=\"mse\")\n",
    "model.fit(train_dataset)\n",
    "\n",
    "# For inference, you can export a model that accepts strings as input\n",
    "inputs = keras.Input(shape=(1,), dtype=\"string\")\n",
    "x = text_vectorizer(inputs)\n",
    "outputs = model(x)\n",
    "end_to_end_model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Call the end-to-end model on test data (which includes unknown tokens)\n",
    "print(\"\\nCalling end-to-end model on test string...\")\n",
    "test_data = tf.constant([\"The one the other will absorb\"])\n",
    "test_output = end_to_end_model(test_data)\n",
    "print(\"Model output:\", test_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "336a4d3426ed"
   },
   "source": [
    "### TF-IDF 가중치를 사용하여 텍스트를 ngram의 밀집 행렬로 인코딩하기\n",
    "\n",
    "다음은 `Dense` 레이어로 전달하기 전에 텍스트를 전처리하는 또 다른 방법입니다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-12-14T22:21:52.362738Z",
     "iopub.status.busy": "2022-12-14T22:21:52.362067Z",
     "iopub.status.idle": "2022-12-14T22:21:53.087679Z",
     "shell.execute_reply": "2022-12-14T22:21:53.086832Z"
    },
    "id": "5b6c0fec928e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:6 out of the last 1568 calls to <function PreprocessingLayer.make_adapt_function.<locals>.adapt_step at 0x7fad4467b940> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Encoded text:\n",
      " [[5.461647  1.6945957 0.        0.        0.        0.        0.\n",
      "  0.        0.        0.        0.        0.        0.        0.\n",
      "  0.        0.        1.0986123 1.0986123 1.0986123 0.        0.\n",
      "  0.        0.        0.        0.        0.        0.        0.\n",
      "  1.0986123 0.        0.        0.        0.        0.        0.\n",
      "  0.        1.0986123 1.0986123 0.        0.        0.       ]]\n",
      "\n",
      "Training model...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      "1/1 [==============================] - ETA: 0s - loss: 8.5937"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\r",
      "1/1 [==============================] - 0s 349ms/step - loss: 8.5937\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Calling end-to-end model on test string...\n",
      "Model output: tf.Tensor([[-0.04213065]], shape=(1, 1), dtype=float32)\n"
     ]
    }
   ],
   "source": [
    "# Define some text data to adapt the layer\n",
    "adapt_data = tf.constant(\n",
    "    [\n",
    "        \"The Brain is wider than the Sky\",\n",
    "        \"For put them side by side\",\n",
    "        \"The one the other will contain\",\n",
    "        \"With ease and You beside\",\n",
    "    ]\n",
    ")\n",
    "# Instantiate TextVectorization with \"tf-idf\" output_mode\n",
    "# (multi-hot with TF-IDF weighting) and ngrams=2 (index all bigrams)\n",
    "text_vectorizer = layers.TextVectorization(output_mode=\"tf-idf\", ngrams=2)\n",
    "# Index the bigrams and learn the TF-IDF weights via `adapt()`\n",
    "\n",
    "with tf.device(\"CPU\"):\n",
    "    # A bug that prevents this from running on GPU for now.\n",
    "    text_vectorizer.adapt(adapt_data)\n",
    "\n",
    "# Try out the layer\n",
    "print(\n",
    "    \"Encoded text:\\n\", text_vectorizer([\"The Brain is deeper than the sea\"]).numpy(),\n",
    ")\n",
    "\n",
    "# Create a simple model\n",
    "inputs = keras.Input(shape=(text_vectorizer.vocabulary_size(),))\n",
    "outputs = layers.Dense(1)(inputs)\n",
    "model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Create a labeled dataset (which includes unknown tokens)\n",
    "train_dataset = tf.data.Dataset.from_tensor_slices(\n",
    "    ([\"The Brain is deeper than the sea\", \"for if they are held Blue to Blue\"], [1, 0])\n",
    ")\n",
    "\n",
    "# Preprocess the string inputs, turning them into int sequences\n",
    "train_dataset = train_dataset.batch(2).map(lambda x, y: (text_vectorizer(x), y))\n",
    "# Train the model on the int sequences\n",
    "print(\"\\nTraining model...\")\n",
    "model.compile(optimizer=\"rmsprop\", loss=\"mse\")\n",
    "model.fit(train_dataset)\n",
    "\n",
    "# For inference, you can export a model that accepts strings as input\n",
    "inputs = keras.Input(shape=(1,), dtype=\"string\")\n",
    "x = text_vectorizer(inputs)\n",
    "outputs = model(x)\n",
    "end_to_end_model = keras.Model(inputs, outputs)\n",
    "\n",
    "# Call the end-to-end model on test data (which includes unknown tokens)\n",
    "print(\"\\nCalling end-to-end model on test string...\")\n",
    "test_data = tf.constant([\"The one the other will absorb\"])\n",
    "test_output = end_to_end_model(test_data)\n",
    "print(\"Model output:\", test_output)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "143ce01c5558"
   },
   "source": [
    "## 중요 정보\n",
    "\n",
    "### 매우 방대한 어휘를 보유한 룩업(lookup) 레이어로 작업 수행하기\n",
    "\n",
    "`TextVectorization`, `StringLookup` 레이어 또는 `IntegerLookup` 레이어에서 매우 방대한 어휘로 작업해야 하는 경우가 있을 수 있습니다. 일반적으로 500MB보다 큰 어휘는 \"매우 큰\" 것으로 간주합니다.\n",
    "\n",
    "이러한 경우 최상의 성능을 위해 `adapt()` 사용을 피해야 합니다. 대신 사전에 어휘를 미리 계산하고(이를 위해 Apache Beam 또는 TF Transform을 사용할 수 있음) 이를 파일에 저장합니다. 그런 다음 파일 경로를 `vocabulary` 인수로 전달하여 구성하는 시점에 어휘를 레이어에 로드합니다.\n",
    "\n",
    "### TPU 포드에서 또는 `ParameterServerStrategy`로 룩업(lookup) 레이어 사용하기\n",
    "\n",
    "TPU 포드 또는 여러 기기에서 `ParameterServerStrategy`를 사용하여 훈련을 진행하는 동안 `TextVectorization`, `StringLookup` 또는 `IntegerLookup`을 사용할 경우 성능을 저하시키는 미해결 문제가 있습니다. 이는 TensorFlow 2.7에서 수정될 예정입니다."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "preprocessing_layers.ipynb",
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}