{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow IO Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Load metrics from Prometheus server" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " View on TensorFlow.org\n", " \n", " Run in Google Colab\n", " \n", " View source on GitHub\n", " \n", " Download notebook\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "9wRVaOQZWgRc" }, "source": [ "Caution: In addition to python packages this notebook uses `sudo apt-get install` to install third party packages." ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "## Overview\n", "\n", "This tutorial loads CoreDNS metrics from a [Prometheus](https://prometheus.io) server into a `tf.data.Dataset`, then uses `tf.keras` for training and inference.\n", "\n", "[CoreDNS](https://github.com/coredns/coredns) is a DNS server with a focus on service discovery, and is widely deployed as a part of the [Kubernetes](https://kubernetes.io) cluster. For that reason it is often closely monitoring by devops operations.\n", "\n", "This tutorial is an example that could be used by devops looking for automation in their operations through machine learning." ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup and usage" ] }, { "cell_type": "markdown", "metadata": { "id": "upgCc3gXybsA" }, "source": [ "### Install required tensorflow-io package, and restart runtime" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "48B9eAMMhAgw" }, "outputs": [], "source": [ "import os" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "uC6nYgKdWtOc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow 2.x selected.\n" ] } ], "source": [ "try:\n", " %tensorflow_version 2.x\n", "except Exception:\n", " pass" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "uUDYyMZRfkX4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: tensorflow-io in /usr/local/lib/python3.6/dist-packages (0.12.0)\n", "Requirement already satisfied: tensorflow<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow-io) (2.1.0)\n", "Requirement already satisfied: opt-einsum>=2.3.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.0)\n", "Requirement already satisfied: google-pasta>=0.1.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.1.8)\n", "Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)\n", "Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)\n", "Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.34.2)\n", "Requirement already satisfied: grpcio>=1.8.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.27.2)\n", "Requirement already satisfied: astor>=0.6.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.8.1)\n", "Requirement already satisfied: absl-py>=0.7.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.9.0)\n", "Requirement already satisfied: termcolor>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)\n", "Requirement already satisfied: numpy<2.0,>=1.16.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.18.1)\n", "Requirement already satisfied: keras-applications>=1.0.8 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.8)\n", "Requirement already satisfied: protobuf>=3.8.0 in 
/tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.11.3)\n", "Requirement already satisfied: keras-preprocessing>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)\n", "Requirement already satisfied: wrapt>=1.11.1 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.12.0)\n", "Requirement already satisfied: gast==0.2.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.2)\n", "Requirement already satisfied: scipy==1.4.1; python_version >= \"3\" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.4.1)\n", "Requirement already satisfied: six>=1.12.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.14.0)\n", "Requirement already satisfied: markdown>=2.6.8 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.1)\n", "Requirement already satisfied: setuptools>=41.0.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (45.2.0)\n", "Requirement already satisfied: werkzeug>=0.11.15 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.0)\n", "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.1)\n", "Requirement already satisfied: google-auth<2,>=1.6.3 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.11.2)\n", "Requirement already satisfied: requests<3,>=2.21.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.23.0)\n", "Requirement already satisfied: h5py in /tensorflow-2.1.0/python3.6 (from keras-applications>=1.0.8->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.10.0)\n", "Requirement already satisfied: requests-oauthlib>=0.7.0 in /tensorflow-2.1.0/python3.6 (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.3.0)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.8)\n", "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0.0)\n", "Requirement already satisfied: rsa<4.1,>=3.1.4 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0)\n", "Requirement already satisfied: certifi>=2017.4.17 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2019.11.28)\n", "Requirement already satisfied: idna<3,>=2.5 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.9)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.25.8)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.0.4)\n", "Requirement 
already satisfied: oauthlib>=3.0.0 in /tensorflow-2.1.0/python3.6 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)\n", "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /tensorflow-2.1.0/python3.6 (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.8)\n" ] } ], "source": [ "!pip install tensorflow-io" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "m6KXZuTBWgRm" }, "outputs": [], "source": [ "from datetime import datetime\n", "\n", "import tensorflow as tf\n", "import tensorflow_io as tfio" ] }, { "cell_type": "markdown", "metadata": { "id": "yZmI7l_GykcW" }, "source": [ "### Install and set up CoreDNS and Prometheus\n", "\n", "For demo purposes, a CoreDNS server is set up locally with port `9053` open to receive DNS queries and port `9153` (default) open to expose metrics for scraping. The following is a basic Corefile configuration for CoreDNS, available for [download](https://github.com/tensorflow/io/blob/master/docs/tutorials/prometheus/Corefile):\n", "```\n", ".:9053 {\n", "  prometheus\n", "  whoami\n", "}\n", "```\n", "\n", "More details about installation can be found in CoreDNS's [documentation](https://coredns.io).\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "YUj0878jPyz7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".:9053 {\n", "  prometheus\n", "  whoami\n", "}\n" ] } ], "source": [ "!curl -s -OL https://github.com/coredns/coredns/releases/download/v1.6.7/coredns_1.6.7_linux_amd64.tgz\n", "!tar -xzf coredns_1.6.7_linux_amd64.tgz\n", "\n", "!curl -s -OL https://raw.githubusercontent.com/tensorflow/io/master/docs/tutorials/prometheus/Corefile\n", "\n", "!cat Corefile" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "n9ujlunrWgRx" }, "outputs": [], "source": [ "# Run `./coredns` as a background process.\n", "# IPython doesn't recognize `&` in inline bash cells.\n", "get_ipython().system_raw('./coredns &')" ] }
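, { "cell_type": "markdown", "metadata": { "id": "corednsMetricsCheckMd" }, "source": [ "Optionally, the raw metrics that CoreDNS exposes for scraping can be inspected directly. This is a minimal sanity check, assuming the `prometheus` plugin serves the standard exposition format at the default `/metrics` path on port `9153`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "corednsMetricsCheck" }, "outputs": [], "source": [ "# Optional check: peek at the raw metrics CoreDNS exposes for scraping.\n", "# Assumes the default metrics path `/metrics` on port 9153.\n", "!curl -s http://localhost:9153/metrics | head -n 10" ] }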
, { "cell_type": "markdown", "metadata": { "id": "5ZWe5DwcWgR1" }, "source": [ "The next step is to set up a Prometheus server and use it to scrape the CoreDNS metrics that are exposed on port `9153` above. The `prometheus.yml` configuration file is also available for [download](https://github.com/tensorflow/io/blob/master/docs/tutorials/prometheus/prometheus.yml):\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "2HFfTfHkWgR3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "global:\n", "  scrape_interval: 1s\n", "  evaluation_interval: 1s\n", "alerting:\n", "  alertmanagers:\n", "  - static_configs:\n", "    - targets:\n", "rule_files:\n", "scrape_configs:\n", "- job_name: 'prometheus'\n", "  static_configs:\n", "  - targets: ['localhost:9090']\n", "- job_name: \"coredns\"\n", "  static_configs:\n", "  - targets: ['localhost:9153']\n" ] } ], "source": [ "!curl -s -OL https://github.com/prometheus/prometheus/releases/download/v2.15.2/prometheus-2.15.2.linux-amd64.tar.gz\n", "!tar -xzf prometheus-2.15.2.linux-amd64.tar.gz --strip-components=1\n", "\n", "!curl -s -OL https://raw.githubusercontent.com/tensorflow/io/master/docs/tutorials/prometheus/prometheus.yml\n", "\n", "!cat prometheus.yml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VSJGsQtoWgR7" }, "outputs": [], "source": [ "# Run `./prometheus` as a background process.\n", "# IPython doesn't recognize `&` in inline bash cells.\n", "get_ipython().system_raw('./prometheus &')" ] }, { "cell_type": "markdown", "metadata": { "id": "rLxPgbI1WgR_" }, "source": [ "To show some activity, the `dig` command can be used to generate a few DNS queries against the CoreDNS server that has been set up:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FN0YNdstBl8M" }, "outputs": [], "source": [ "!sudo apt-get install -y -qq dnsutils" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "mrYsnIrVWgSE" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @127.0.0.1 -p 9053 demo1.example.org\n", "; (1 server found)\n", ";; global options: +cmd\n", ";; Got answer:\n", ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53868\n", ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 3\n", ";; WARNING: recursion requested but not available\n", "\n", ";; OPT PSEUDOSECTION:\n", "; EDNS: version: 0, flags:; udp: 4096\n", "; COOKIE: 855234f1adcb7a28 (echoed)\n", ";; QUESTION SECTION:\n", ";demo1.example.org.\t\tIN\tA\n", "\n", ";; ADDITIONAL SECTION:\n", "demo1.example.org.\t0\tIN\tA\t127.0.0.1\n", "_udp.demo1.example.org.\t0\tIN\tSRV\t0 0 45361 .\n", "\n", ";; Query time: 0 msec\n", ";; SERVER: 127.0.0.1#9053(127.0.0.1)\n", ";; WHEN: Tue Mar 03 22:35:20 UTC 2020\n", ";; MSG SIZE rcvd: 132\n", "\n" ] } ], "source": [ "!dig @127.0.0.1 -p 9053 demo1.example.org" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "5APx3wD6WgSH" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @127.0.0.1 -p 9053 demo2.example.org\n", "; (1 server found)\n", ";; global options: +cmd\n", ";; Got answer:\n", ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53163\n", ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 3\n", ";; WARNING: recursion requested but not available\n", "\n", ";; OPT PSEUDOSECTION:\n", "; EDNS: version: 0, flags:; udp: 4096\n", "; COOKIE: f18b2ba23e13446d (echoed)\n", ";; QUESTION SECTION:\n", ";demo2.example.org.\t\tIN\tA\n", "\n", ";; ADDITIONAL SECTION:\n", "demo2.example.org.\t0\tIN\tA\t127.0.0.1\n", "_udp.demo2.example.org.\t0\tIN\tSRV\t0 0 42194 .\n", "\n", ";; Query time: 0 msec\n", ";; SERVER: 127.0.0.1#9053(127.0.0.1)\n", ";; WHEN: Tue Mar 03 22:35:21 UTC 2020\n", ";; MSG SIZE rcvd: 132\n", "\n" ] } ], "source": [ "!dig @127.0.0.1 -p 9053 demo2.example.org" ] }
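, { "cell_type": "markdown", "metadata": { "id": "prometheusApiCheckMd" }, "source": [ "As an optional check that scraping is working, Prometheus's standard HTTP API can be queried directly. The minimal sketch below issues an instant query against the `/api/v1/query` endpoint for the counter used later in this tutorial:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "prometheusApiCheck" }, "outputs": [], "source": [ "# Optional check: ask Prometheus directly for the scraped CoreDNS counter.\n", "# `/api/v1/query` is the standard Prometheus instant-query HTTP endpoint.\n", "!curl -s 'http://localhost:9090/api/v1/query?query=coredns_dns_request_count_total'" ] }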
"\n", ";; Query time: 0 msec\n", ";; SERVER: 127.0.0.1#9053(127.0.0.1)\n", ";; WHEN: Tue Mar 03 22:35:21 UTC 2020\n", ";; MSG SIZE rcvd: 132\n", "\n" ] } ], "source": [ "!dig @127.0.0.1 -p 9053 demo2.example.org" ] }, { "cell_type": "markdown", "metadata": { "id": "f61fK3bXQH4N" }, "source": [ "Now a CoreDNS server whose metrics are scraped by a Prometheus server and ready to be consumed by TensorFlow." ] }, { "cell_type": "markdown", "metadata": { "id": "acEST3amdyDI" }, "source": [ "### Create Dataset for CoreDNS metrics and use it in TensorFlow\n", "\n", "Create a Dataset for CoreDNS metrics that is available from PostgreSQL server, could be done with `tfio.experimental.IODataset.from_prometheus`. At the minimium two arguments are needed. `query` is passed to Prometheus server to select the metrics and `length` is the period you want to load into Dataset.\n", "\n", "You can start with `\"coredns_dns_request_count_total\"` and `\"5\"` (secs) to create the Dataset below. Since earlier in the tutorial two DNS queries were sent, it is expected that the metrics for `\"coredns_dns_request_count_total\"` will be `\"2.0\"` at the end of the time series:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "h21RdP7meGzP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset Spec:\n", "(TensorSpec(shape=(), dtype=tf.int64, name=None), {'coredns': {'localhost:9153': {'coredns_dns_request_count_total': TensorSpec(shape=(), dtype=tf.float64, name=None)}}})\n", "\n", "CoreDNS Time Series:\n", "2020-03-03 22:35:17: 2.0\n", "2020-03-03 22:35:18: 2.0\n", "2020-03-03 22:35:19: 2.0\n", "2020-03-03 22:35:20: 2.0\n", "2020-03-03 22:35:21: 2.0\n" ] } ], "source": [ "dataset = tfio.experimental.IODataset.from_prometheus(\n", " \"coredns_dns_request_count_total\", 5, endpoint=\"http://localhost:9090\")\n", "\n", "\n", "print(\"Dataset Spec:\\n{}\\n\".format(dataset.element_spec))\n", "\n", "print(\"CoreDNS Time Series:\")\n", "for (time, value) in dataset:\n", " # time is milli second, convert to data time:\n", " time = datetime.fromtimestamp(time // 1000)\n", " print(\"{}: {}\".format(time, value['coredns']['localhost:9153']['coredns_dns_request_count_total']))" ] }, { "cell_type": "markdown", "metadata": { "id": "8y-VpwcWNYTF" }, "source": [ "Further looking into the spec of the Dataset:\n", "```\n", "(\n", " TensorSpec(shape=(), dtype=tf.int64, name=None),\n", " {\n", " 'coredns': {\n", " 'localhost:9153': {\n", " 'coredns_dns_request_count_total': TensorSpec(shape=(), dtype=tf.float64, name=None)\n", " }\n", " }\n", " }\n", ")\n", "\n", "```\n", "\n", "It is obvious that the dataset consists of a `(time, values)` tuple where the `values` field is a python dict expanded into:\n", "```\n", "\"job_name\": {\n", " \"instance_name\": {\n", " \"metric_name\": value,\n", " },\n", "}\n", "```\n", "\n", "In the above example, `'coredns'` is the job name, `'localhost:9153'` is the instance name, and `'coredns_dns_request_count_total'` is the metric name. Note that depending on the Prometheus query used, it is possible that multiple jobs/instances/metrics could be returned. This is also the reason why python dict has been used in the structure of the Dataset.\n", "\n", "Take another query `\"go_memstats_gc_sys_bytes\"` as an example. 
Since both CoreDNS and Prometheus are written in Golang, the `\"go_memstats_gc_sys_bytes\"` metric is available for both the `\"coredns\"` job and the `\"prometheus\"` job:" ] }, { "cell_type": "markdown", "metadata": { "id": "5CA3JUIkduY5" }, "source": [ "Note: This cell may error out the first time you run it. Run it again and it will pass." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "qCoueXYZOvqZ" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time Series CoreDNS/Prometheus Comparison:\n", "2020-03-03 22:35:17: 2385920.0/2775040.0\n", "2020-03-03 22:35:18: 2385920.0/2775040.0\n", "2020-03-03 22:35:19: 2385920.0/2775040.0\n", "2020-03-03 22:35:20: 2385920.0/2775040.0\n", "2020-03-03 22:35:21: 2385920.0/2775040.0\n" ] } ], "source": [ "dataset = tfio.experimental.IODataset.from_prometheus(\n", "    \"go_memstats_gc_sys_bytes\", 5, endpoint=\"http://localhost:9090\")\n", "\n", "print(\"Time Series CoreDNS/Prometheus Comparison:\")\n", "for (time, value) in dataset:\n", "  # time is in milliseconds, convert to a datetime:\n", "  time = datetime.fromtimestamp(time // 1000)\n", "  print(\"{}: {}/{}\".format(\n", "      time,\n", "      value['coredns']['localhost:9153']['go_memstats_gc_sys_bytes'],\n", "      value['prometheus']['localhost:9090']['go_memstats_gc_sys_bytes']))" ] }, { "cell_type": "markdown", "metadata": { "id": "xO2pheWEPQSU" }, "source": [ "The created `Dataset` is now ready to be passed directly to `tf.keras` for either training or inference." ] }, { "cell_type": "markdown", "metadata": { "id": "DhVm2fGaoyuA" }, "source": [ "## Use Dataset for model training\n", "\n", "With the metrics Dataset created, it is possible to pass the Dataset directly to `tf.keras` for model training or inference.\n", "\n", "For demo purposes, this tutorial will just use a very simple LSTM model with 1 feature and 2 steps as input:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fxObBtlvr6n_" }, "outputs": [], "source": [ "n_steps, n_features = 2, 1\n", "simple_lstm_model = tf.keras.models.Sequential([\n", "  tf.keras.layers.LSTM(8, input_shape=(n_steps, n_features)),\n", "  tf.keras.layers.Dense(1)\n", "])\n", "\n", "simple_lstm_model.compile(optimizer='adam', loss='mae')\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Moh_tEGZu-3_" }, "source": [ "The dataset to be used is the value of `go_memstats_sys_bytes` for CoreDNS with 10 samples. However, since a sliding window with `window=n_steps` and `shift=1` is formed, additional samples are needed (for any two consecutive elements, the first is taken as `x` and the second is taken as `y` for training). The total is `10 + n_steps - 1 + 1 = 12` seconds.\n", "\n", "The data value is also scaled to `[0, 1]`."
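, "\n", "As a minimal sketch of this windowing scheme (on a toy dataset of the integers `0..11` rather than the real metrics), the same `window`/`flat_map`/`zip` combination pairs each window with the window one step ahead:\n", "```python\n", "toy = tf.data.Dataset.range(12)                    # 12 'seconds' of data\n", "toy = toy.window(2, shift=1, drop_remainder=True)  # 11 windows of size 2\n", "toy = toy.flat_map(lambda w: w.batch(2))           # each window -> a tensor of shape (2,)\n", "x = toy.take(10)                                   # windows 0..9\n", "y = toy.skip(1).take(10)                           # windows 1..10\n", "for a, b in tf.data.Dataset.zip((x, y)).take(2):\n", "  print(a.numpy(), '->', b.numpy())                # [0 1] -> [1 2], then [1 2] -> [2 3]\n", "```"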
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "CZmStrvFvJLN" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train for 10 steps\n", "Epoch 1/5\n", "10/10 [==============================] - 2s 150ms/step - loss: 0.8484\n", "Epoch 2/5\n", "10/10 [==============================] - 0s 10ms/step - loss: 0.7808\n", "Epoch 3/5\n", "10/10 [==============================] - 0s 10ms/step - loss: 0.7102\n", "Epoch 4/5\n", "10/10 [==============================] - 0s 11ms/step - loss: 0.6359\n", "Epoch 5/5\n", "10/10 [==============================] - 0s 11ms/step - loss: 0.5572\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "n_samples = 10\n", "\n", "dataset = tfio.experimental.IODataset.from_prometheus(\n", " \"go_memstats_sys_bytes\", n_samples + n_steps - 1 + 1, endpoint=\"http://localhost:9090\")\n", "\n", "# take go_memstats_gc_sys_bytes from coredns job \n", "dataset = dataset.map(lambda _, v: v['coredns']['localhost:9153']['go_memstats_sys_bytes'])\n", "\n", "# find the max value and scale the value to [0, 1]\n", "v_max = dataset.reduce(tf.constant(0.0, tf.float64), tf.math.maximum)\n", "dataset = dataset.map(lambda v: (v / v_max))\n", "\n", "# expand the dimension by 1 to fit n_features=1\n", "dataset = dataset.map(lambda v: tf.expand_dims(v, -1))\n", "\n", "# take a sliding window\n", "dataset = dataset.window(n_steps, shift=1, drop_remainder=True)\n", "dataset = dataset.flat_map(lambda d: d.batch(n_steps))\n", "\n", "\n", "# the first value is x and the next value is y, only take 10 samples\n", "x = dataset.take(n_samples)\n", "y = dataset.skip(1).take(n_samples)\n", "\n", "dataset = tf.data.Dataset.zip((x, y))\n", "\n", "# pass the final dataset to model.fit for training\n", "simple_lstm_model.fit(dataset.batch(1).repeat(10), epochs=5, steps_per_epoch=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "Df7wrNx2BTWW" }, "source": [ "The trained model above is not very useful in reality, as the CoreDNS server that has been setup in this tutorial does not have any workload. However, this is a working pipeline that could be used to load metrics from true production servers. The model could then be improved to solve the real-world problem of devops automation." ] } ], "metadata": { "colab": { "collapsed_sections": [ "Tce3stUlHN0L" ], "name": "prometheus.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }