{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow IO Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Prometheus サーバーからメトリックを読み込む" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
TensorFlow.orgで表示 Google Colab で実行GitHub でソースを表示{ノートブックをダウンロード/a0}
" ] }, { "cell_type": "markdown", "metadata": { "id": "9wRVaOQZWgRc" }, "source": [ "注: このノートブックは python パッケージの他、`sudo apt-get install`を使用してサードパーティのパッケージをインストールします。" ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "## 概要\n", "\n", "このチュートリアルでは、CoreDNS メトリクスを [Prometheus](https://prometheus.io) サーバーから`tf.data.Dataset`に読み込み、`tf.keras`をトレーニングと推論に使用します。\n", "\n", "[CoreDNS](https://github.com/coredns/coredns) は、サービスディスカバリに重点を置いた DNS サーバーであり、[Kubernetes](https://kubernetes.io) クラスタの一部として広くデプロイされています。そのため、通常、開発オペレーションによって綿密に監視されています。\n", "\n", "このチュートリアルでは、開発者向けに機械学習による運用の自動化の例を紹介します。" ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## セットアップと使用法" ] }, { "cell_type": "markdown", "metadata": { "id": "upgCc3gXybsA" }, "source": [ "### 必要な tensorflow-io パッケージをインストールし、ランタイムを再起動する" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "48B9eAMMhAgw" }, "outputs": [], "source": [ "import os" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "uC6nYgKdWtOc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow 2.x selected.\n" ] } ], "source": [ "try:\n", " %tensorflow_version 2.x\n", "except Exception:\n", " pass" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "uUDYyMZRfkX4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: tensorflow-io in /usr/local/lib/python3.6/dist-packages (0.12.0)\n", "Requirement already satisfied: tensorflow<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow-io) (2.1.0)\n", "Requirement already satisfied: opt-einsum>=2.3.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.0)\n", "Requirement already satisfied: google-pasta>=0.1.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.1.8)\n", "Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)\n", "Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)\n", "Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.34.2)\n", "Requirement already satisfied: grpcio>=1.8.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.27.2)\n", "Requirement already satisfied: astor>=0.6.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.8.1)\n", "Requirement already satisfied: absl-py>=0.7.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.9.0)\n", "Requirement already satisfied: termcolor>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)\n", "Requirement already satisfied: numpy<2.0,>=1.16.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.18.1)\n", "Requirement already satisfied: keras-applications>=1.0.8 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.8)\n", "Requirement already satisfied: protobuf>=3.8.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.11.3)\n", "Requirement already satisfied: keras-preprocessing>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)\n", "Requirement already satisfied: wrapt>=1.11.1 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.12.0)\n", "Requirement already satisfied: gast==0.2.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.2)\n", "Requirement already satisfied: scipy==1.4.1; python_version >= \"3\" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.4.1)\n", "Requirement already satisfied: six>=1.12.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.14.0)\n", "Requirement already satisfied: markdown>=2.6.8 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.1)\n", "Requirement already satisfied: setuptools>=41.0.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (45.2.0)\n", "Requirement already satisfied: werkzeug>=0.11.15 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.0)\n", "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.1)\n", "Requirement already satisfied: google-auth<2,>=1.6.3 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.11.2)\n", "Requirement already satisfied: requests<3,>=2.21.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.23.0)\n", "Requirement already satisfied: h5py in /tensorflow-2.1.0/python3.6 (from keras-applications>=1.0.8->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.10.0)\n", "Requirement already satisfied: requests-oauthlib>=0.7.0 in /tensorflow-2.1.0/python3.6 (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.3.0)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.8)\n", "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0.0)\n", "Requirement already satisfied: rsa<4.1,>=3.1.4 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0)\n", "Requirement already satisfied: certifi>=2017.4.17 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2019.11.28)\n", "Requirement already satisfied: idna<3,>=2.5 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.9)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.25.8)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.0.4)\n", "Requirement already satisfied: oauthlib>=3.0.0 in /tensorflow-2.1.0/python3.6 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)\n", "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /tensorflow-2.1.0/python3.6 (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.8)\n" ] } ], "source": [ "!pip install tensorflow-io" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "m6KXZuTBWgRm" }, "outputs": [], "source": [ "from datetime import datetime\n", "\n", "import tensorflow as tf\n", "import tensorflow_io as tfio" ] }, { "cell_type": "markdown", "metadata": { "id": "yZmI7l_GykcW" }, "source": [ "### CoreDNS と Prometheus のインストールとセットアップ\n", "\n", "デモ用に、`9053`番ポートをローカルで開放し、DNS クエリを受信するための CoreDNS サーバーとスクレイピングのメトリックを公開するために`9153`番ポート (デフォルト) を開放します。以下は CoreDNS の基本的な Corefile 構成であり、[ダウンロード](https://github.com/tensorflow/io/blob/master/docs/tutorials/prometheus/Corefile)できます。\n", "\n", "```\n", ".:9053 { prometheus whoami }\n", "```\n", "\n", "インストールの詳細については、CoreDNS の[ドキュメント](https://coredns.io)を参照してください。\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "YUj0878jPyz7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".:9053 {\n", " prometheus\n", " whoami\n", "}\n" ] } ], "source": [ "!curl -s -OL https://github.com/coredns/coredns/releases/download/v1.6.7/coredns_1.6.7_linux_amd64.tgz\n", "!tar -xzf coredns_1.6.7_linux_amd64.tgz\n", "\n", "!curl -s -OL https://raw.githubusercontent.com/tensorflow/io/master/docs/tutorials/prometheus/Corefile\n", "\n", "!cat Corefile" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "n9ujlunrWgRx" }, "outputs": [], "source": [ "# Run `./coredns` as a background process.\n", "# IPython doesn't recognize `&` in inline bash cells.\n", "get_ipython().system_raw('./coredns &')" ] }, { "cell_type": "markdown", "metadata": { "id": "5ZWe5DwcWgR1" }, "source": [ "次に、Prometheus サーバーをセットアップし、Prometheus を使用して、上記の`9153`番ポートで公開されている CoreDNS メトリックを取得します。また、構成用の`prometheus.yml`ファイルは[ダウンロード](https://github.com/tensorflow/io/blob/master/docs/tutorials/prometheus/prometheus.yml)できます。\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "2HFfTfHkWgR3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "global:\n", " scrape_interval: 1s\n", " evaluation_interval: 1s\n", "alerting:\n", " alertmanagers:\n", " - static_configs:\n", " - targets:\n", "rule_files:\n", "scrape_configs:\n", "- job_name: 'prometheus'\n", " static_configs:\n", " - targets: ['localhost:9090']\n", "- job_name: \"coredns\"\n", " static_configs:\n", " - targets: ['localhost:9153']\n" ] } ], "source": [ "!curl -s -OL https://github.com/prometheus/prometheus/releases/download/v2.15.2/prometheus-2.15.2.linux-amd64.tar.gz\n", "!tar -xzf prometheus-2.15.2.linux-amd64.tar.gz --strip-components=1\n", "\n", "!curl -s -OL https://raw.githubusercontent.com/tensorflow/io/master/docs/tutorials/prometheus/prometheus.yml\n", "\n", "!cat prometheus.yml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VSJGsQtoWgR7" }, "outputs": [], "source": [ "# Run `./prometheus` as a background process.\n", "# IPython doesn't recognize `&` in inline bash cells.\n", "get_ipython().system_raw('./prometheus &')" ] }, { "cell_type": "markdown", "metadata": { "id": "rLxPgbI1WgR_" }, "source": [ "アクティビティを表示するためには、`dig`コマンドを使用して、セットアップされている CoreDNS サーバーに対していくつかの DNS クエリを生成できます。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FN0YNdstBl8M" }, "outputs": [], "source": [ "!sudo apt-get install -y -qq dnsutils" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "mrYsnIrVWgSE" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @127.0.0.1 -p 9053 demo1.example.org\n", "; (1 server found)\n", ";; global options: +cmd\n", ";; Got answer:\n", ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53868\n", ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 3\n", ";; WARNING: recursion requested but not available\n", "\n", ";; OPT PSEUDOSECTION:\n", "; EDNS: version: 0, flags:; udp: 4096\n", "; COOKIE: 855234f1adcb7a28 (echoed)\n", ";; QUESTION SECTION:\n", ";demo1.example.org.\t\tIN\tA\n", "\n", ";; ADDITIONAL SECTION:\n", "demo1.example.org.\t0\tIN\tA\t127.0.0.1\n", "_udp.demo1.example.org.\t0\tIN\tSRV\t0 0 45361 .\n", "\n", ";; Query time: 0 msec\n", ";; SERVER: 127.0.0.1#9053(127.0.0.1)\n", ";; WHEN: Tue Mar 03 22:35:20 UTC 2020\n", ";; MSG SIZE rcvd: 132\n", "\n" ] } ], "source": [ "!dig @127.0.0.1 -p 9053 demo1.example.org" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "5APx3wD6WgSH" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @127.0.0.1 -p 9053 demo2.example.org\n", "; (1 server found)\n", ";; global options: +cmd\n", ";; Got answer:\n", ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53163\n", ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 3\n", ";; WARNING: recursion requested but not available\n", "\n", ";; OPT PSEUDOSECTION:\n", "; EDNS: version: 0, flags:; udp: 4096\n", "; COOKIE: f18b2ba23e13446d (echoed)\n", ";; QUESTION SECTION:\n", ";demo2.example.org.\t\tIN\tA\n", "\n", ";; ADDITIONAL SECTION:\n", "demo2.example.org.\t0\tIN\tA\t127.0.0.1\n", "_udp.demo2.example.org.\t0\tIN\tSRV\t0 0 42194 .\n", "\n", ";; Query time: 0 msec\n", ";; SERVER: 127.0.0.1#9053(127.0.0.1)\n", ";; WHEN: Tue Mar 03 22:35:21 UTC 2020\n", ";; MSG SIZE rcvd: 132\n", "\n" ] } ], "source": [ "!dig @127.0.0.1 -p 9053 demo2.example.org" ] }, { "cell_type": "markdown", "metadata": { "id": "f61fK3bXQH4N" }, "source": [ "CoreDNS サーバーのメトリックが Prometheus サーバーによりスクレイピングされ、TensorFlow で使用する準備ができました。" ] }, { "cell_type": "markdown", "metadata": { "id": "acEST3amdyDI" }, "source": [ "### CoreDNS メトリックのデータセットを作成し、TensorFlow で使用する\n", "\n", "PostgreSQL サーバーから利用可能な CoreDNS メトリックのデータセットを作成します。これは、`tfio.experimental.IODataset.from_prometheus`を使用して実行できます。少なくとも次の 2 つの引数が必要です。 queryはメトリックを選択するため Prometheus サーバーに渡され、lengthは Dataset に読み込む期間です。\n", "\n", "`\"coredns_dns_request_count_total\"`と`\"5\"`(秒)から始めて、以下のデータセットを作成します。チュートリアルの前半で 2 つの DNS クエリが送信されたため、`\"coredns_dns_request_count_total\"`のメトリックは時系列の終わりに`\"2.0\"`になると予想されます。" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "h21RdP7meGzP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset Spec:\n", "(TensorSpec(shape=(), dtype=tf.int64, name=None), {'coredns': {'localhost:9153': {'coredns_dns_request_count_total': TensorSpec(shape=(), dtype=tf.float64, name=None)}}})\n", "\n", "CoreDNS Time Series:\n", "2020-03-03 22:35:17: 2.0\n", "2020-03-03 22:35:18: 2.0\n", "2020-03-03 22:35:19: 2.0\n", "2020-03-03 22:35:20: 2.0\n", "2020-03-03 22:35:21: 2.0\n" ] } ], "source": [ "dataset = tfio.experimental.IODataset.from_prometheus(\n", " \"coredns_dns_request_count_total\", 5, endpoint=\"http://localhost:9090\")\n", "\n", "\n", "print(\"Dataset Spec:\\n{}\\n\".format(dataset.element_spec))\n", "\n", "print(\"CoreDNS Time Series:\")\n", "for (time, value) in dataset:\n", " # time is milli second, convert to data time:\n", " time = datetime.fromtimestamp(time // 1000)\n", " print(\"{}: {}\".format(time, value['coredns']['localhost:9153']['coredns_dns_request_count_total']))" ] }, { "cell_type": "markdown", "metadata": { "id": "8y-VpwcWNYTF" }, "source": [ "データセットの仕様をさらに見てみましょう。\n", "\n", "```\n", "( TensorSpec(shape=(), dtype=tf.int64, name=None), { 'coredns': { 'localhost:9153': { 'coredns_dns_request_count_total': TensorSpec(shape=(), dtype=tf.float64, name=None) } } } )\n", "```\n", "\n", "データセットが`(time, values)`タプルで構成されていることは明らかです。`values`フィールドは、次のように展開された python dict です。\n", "\n", "```\n", "\"job_name\": { \"instance_name\": { \"metric_name\": value, }, }\n", "```\n", "\n", "上記の例では、`'coredns'`はジョブ名、`'localhost:9153'`はインスタンス名、`'coredns_dns_request_count_total'`はメトリック名です。使用する Prometheus クエリによっては、複数のジョブ/インスタンス/メトリックが返される可能性があることに注意してください。これは、データセットの構造で python dict が使用されている理由でもあります。\n", "\n", "別のクエリ`\"go_memstats_gc_sys_bytes\"`を例として見てみましょう。CoreDNS と Prometheus はどちらも Golang で記述されているため、`\"go_memstats_gc_sys_bytes\"`メトリックは、`\"coredns\"`ジョブと`\"prometheus\"`ジョブの両方で使用できます。" ] }, { "cell_type": "markdown", "metadata": { "id": "5CA3JUIkduY5" }, "source": [ "注: このセルは、最初に実行したときにエラーになる場合があります。もう一度実行すると、パスします。" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "qCoueXYZOvqZ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time Series CoreDNS/Prometheus Comparision:\n", "2020-03-03 22:35:17: 2385920.0/2775040.0\n", "2020-03-03 22:35:18: 2385920.0/2775040.0\n", "2020-03-03 22:35:19: 2385920.0/2775040.0\n", "2020-03-03 22:35:20: 2385920.0/2775040.0\n", "2020-03-03 22:35:21: 2385920.0/2775040.0\n" ] } ], "source": [ "dataset = tfio.experimental.IODataset.from_prometheus(\n", " \"go_memstats_gc_sys_bytes\", 5, endpoint=\"http://localhost:9090\")\n", "\n", "print(\"Time Series CoreDNS/Prometheus Comparision:\")\n", "for (time, value) in dataset:\n", " # time is milli second, convert to data time:\n", " time = datetime.fromtimestamp(time // 1000)\n", " print(\"{}: {}/{}\".format(\n", " time,\n", " value['coredns']['localhost:9153']['go_memstats_gc_sys_bytes'],\n", " value['prometheus']['localhost:9090']['go_memstats_gc_sys_bytes']))" ] }, { "cell_type": "markdown", "metadata": { "id": "xO2pheWEPQSU" }, "source": [ "作成された`Dataset`は、トレーニングまたは推論のために直接`tf.keras`に渡す準備ができています。" ] }, { "cell_type": "markdown", "metadata": { "id": "DhVm2fGaoyuA" }, "source": [ "## モデルトレーニングにデータセットを使用する\n", "\n", "メトリクスのデータセットを作成すると、モデルのトレーニングや推論のためにデータセットを`tf.keras`に直接渡すことができます。\n", "\n", "デモのために、このチュートリアルでは、1 つの特徴と 2 つのステップを入力とする非常に単純な LSTM モデルを使用します。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fxObBtlvr6n_" }, "outputs": [], "source": [ "n_steps, n_features = 2, 1\n", "simple_lstm_model = tf.keras.models.Sequential([\n", " tf.keras.layers.LSTM(8, input_shape=(n_steps, n_features)),\n", " tf.keras.layers.Dense(1)\n", "])\n", "\n", "simple_lstm_model.compile(optimizer='adam', loss='mae')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Moh_tEGZu-3_" }, "source": [ "使用するデータセットは、10 サンプルの CoreDNS の「go_memstats_sys_bytes」の値です。ただし、`window = n_steps`および`shift = 1`のスライディングウィンドウが形成されるため、追加のサンプルが必要です (2 つの連続する要素の場合、最初のサンプルは`x`で、2 番目は`y`と見なされます) 。合計は`10 + n_steps - 1 + 1 = 12` 秒です。\n", "\n", "また、データ値は`[0、1]`にスケーリングされています。" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "CZmStrvFvJLN" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train for 10 steps\n", "Epoch 1/5\n", "10/10 [==============================] - 2s 150ms/step - loss: 0.8484\n", "Epoch 2/5\n", "10/10 [==============================] - 0s 10ms/step - loss: 0.7808\n", "Epoch 3/5\n", "10/10 [==============================] - 0s 10ms/step - loss: 0.7102\n", "Epoch 4/5\n", "10/10 [==============================] - 0s 11ms/step - loss: 0.6359\n", "Epoch 5/5\n", "10/10 [==============================] - 0s 11ms/step - loss: 0.5572\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "n_samples = 10\n", "\n", "dataset = tfio.experimental.IODataset.from_prometheus(\n", " \"go_memstats_sys_bytes\", n_samples + n_steps - 1 + 1, endpoint=\"http://localhost:9090\")\n", "\n", "# take go_memstats_gc_sys_bytes from coredns job \n", "dataset = dataset.map(lambda _, v: v['coredns']['localhost:9153']['go_memstats_sys_bytes'])\n", "\n", "# find the max value and scale the value to [0, 1]\n", "v_max = dataset.reduce(tf.constant(0.0, tf.float64), tf.math.maximum)\n", "dataset = dataset.map(lambda v: (v / v_max))\n", "\n", "# expand the dimension by 1 to fit n_features=1\n", "dataset = dataset.map(lambda v: tf.expand_dims(v, -1))\n", "\n", "# take a sliding window\n", "dataset = dataset.window(n_steps, shift=1, drop_remainder=True)\n", "dataset = dataset.flat_map(lambda d: d.batch(n_steps))\n", "\n", "\n", "# the first value is x and the next value is y, only take 10 samples\n", "x = dataset.take(n_samples)\n", "y = dataset.skip(1).take(n_samples)\n", "\n", "dataset = tf.data.Dataset.zip((x, y))\n", "\n", "# pass the final dataset to model.fit for training\n", "simple_lstm_model.fit(dataset.batch(1).repeat(10), epochs=5, steps_per_epoch=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "Df7wrNx2BTWW" }, "source": [ "このチュートリアルでセットアップされた CoreDNS サーバーにはワークロードがないため、上記のトレーニング済みモデルは実際にはあまり役に立ちませんが、これは実際の運用サーバーからメトリックを読み込むために使用できる作業パイプラインです。その後、モデルを改善して、開発の自動化という実際の問題を解決できます。" ] } ], "metadata": { "colab": { "collapsed_sections": [ "Tce3stUlHN0L" ], "name": "prometheus.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }