{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Tce3stUlHN0L"
},
"source": [
"##### Copyright 2023 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2024-04-20T11:21:57.299827Z",
"iopub.status.busy": "2024-04-20T11:21:57.299373Z",
"iopub.status.idle": "2024-04-20T11:21:57.303487Z",
"shell.execute_reply": "2024-04-20T11:21:57.302899Z"
},
"id": "tuOe1ymfHZPu"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "36EdAGhThQov"
},
"source": [
"# Uplifting with Decision Forests\n",
"\n",
"
\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2j8GzKvfVvF8"
},
"source": [
"Welcome to the *Uplifting* Tutorial for TensorFlow Decision Forests (TF-DF). In this tutorial, you will learn what uplifting is, why it is so important, and how to do it in TF-DF.\n",
"\n",
"This tutorial assumes you are familiar with the fundaments of TF-DF, in particular the installation procedure. The [beginner tutorial](https://www.tensorflow.org/decision_forests/tutorials/beginner_colab) is a great place to start learning about TF-DF.\n",
"\n",
"In this colab, you will:\n",
"\n",
"- Learn what an uplift modeling is.\n",
"- Train a Uplift Random Forest model on the **Hillstrom Email Marketing** dataset.\n",
"- Evaluate the quality of this model.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MQIPhTQVW19g"
},
"source": [
"## Installing TensorFlow Decision Forests\n",
"\n",
"Install TF-DF by running the following cell.\n",
"\n",
"[Wurlitzer](https://pypi.org/project/wurlitzer/) is needed to display the detailed training logs in Colabs (when using `verbose=2` in the model constructor)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:21:57.307220Z",
"iopub.status.busy": "2024-04-20T11:21:57.306592Z",
"iopub.status.idle": "2024-04-20T11:22:00.150145Z",
"shell.execute_reply": "2024-04-20T11:22:00.149111Z"
},
"id": "oiz5HmMyWxgd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting tensorflow_decision_forests\r\n",
" Using cached tensorflow_decision_forests-1.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.0 kB)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting wurlitzer\r\n",
" Using cached wurlitzer-3.0.3-py3-none-any.whl.metadata (1.9 kB)\r\n",
"Requirement already satisfied: numpy in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.26.4)\r\n",
"Requirement already satisfied: pandas in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.2.2)\r\n",
"Requirement already satisfied: tensorflow~=2.16.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.16.1)\r\n",
"Requirement already satisfied: six in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.16.0)\r\n",
"Requirement already satisfied: absl-py in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (1.4.0)\r\n",
"Requirement already satisfied: wheel in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (0.41.2)\r\n",
"Requirement already satisfied: tf-keras~=2.16 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow_decision_forests) (2.16.0)\r\n",
"Requirement already satisfied: astunparse>=1.6.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (1.6.3)\r\n",
"Requirement already satisfied: flatbuffers>=23.5.26 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (24.3.25)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (0.5.4)\r\n",
"Requirement already satisfied: google-pasta>=0.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (0.2.0)\r\n",
"Requirement already satisfied: h5py>=3.10.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (3.11.0)\r\n",
"Requirement already satisfied: libclang>=13.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (18.1.1)\r\n",
"Requirement already satisfied: ml-dtypes~=0.3.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (0.3.2)\r\n",
"Requirement already satisfied: opt-einsum>=2.3.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (3.3.0)\r\n",
"Requirement already satisfied: packaging in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (24.0)\r\n",
"Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (3.20.3)\r\n",
"Requirement already satisfied: requests<3,>=2.21.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (2.31.0)\r\n",
"Requirement already satisfied: setuptools in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (69.5.1)\r\n",
"Requirement already satisfied: termcolor>=1.1.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (2.4.0)\r\n",
"Requirement already satisfied: typing-extensions>=3.6.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (4.11.0)\r\n",
"Requirement already satisfied: wrapt>=1.11.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (1.16.0)\r\n",
"Requirement already satisfied: grpcio<2.0,>=1.24.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (1.63.0rc2)\r\n",
"Requirement already satisfied: tensorboard<2.17,>=2.16 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (2.16.2)\r\n",
"Requirement already satisfied: keras>=3.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (3.2.1)\r\n",
"Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow~=2.16.1->tensorflow_decision_forests) (0.36.0)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: python-dateutil>=2.8.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2.9.0.post0)\r\n",
"Requirement already satisfied: pytz>=2020.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2024.1)\r\n",
"Requirement already satisfied: tzdata>=2022.7 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow_decision_forests) (2024.1)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: rich in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from keras>=3.0.0->tensorflow~=2.16.1->tensorflow_decision_forests) (13.7.1)\r\n",
"Requirement already satisfied: namex in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from keras>=3.0.0->tensorflow~=2.16.1->tensorflow_decision_forests) (0.0.8)\r\n",
"Requirement already satisfied: optree in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from keras>=3.0.0->tensorflow~=2.16.1->tensorflow_decision_forests) (0.11.0)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: charset-normalizer<4,>=2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow~=2.16.1->tensorflow_decision_forests) (3.3.2)\r\n",
"Requirement already satisfied: idna<4,>=2.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow~=2.16.1->tensorflow_decision_forests) (3.7)\r\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow~=2.16.1->tensorflow_decision_forests) (2.2.1)\r\n",
"Requirement already satisfied: certifi>=2017.4.17 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow~=2.16.1->tensorflow_decision_forests) (2024.2.2)\r\n",
"Requirement already satisfied: markdown>=2.6.8 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.17,>=2.16->tensorflow~=2.16.1->tensorflow_decision_forests) (3.6)\r\n",
"Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.17,>=2.16->tensorflow~=2.16.1->tensorflow_decision_forests) (0.7.2)\r\n",
"Requirement already satisfied: werkzeug>=1.0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.17,>=2.16->tensorflow~=2.16.1->tensorflow_decision_forests) (3.0.2)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: importlib-metadata>=4.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard<2.17,>=2.16->tensorflow~=2.16.1->tensorflow_decision_forests) (7.1.0)\r\n",
"Requirement already satisfied: MarkupSafe>=2.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard<2.17,>=2.16->tensorflow~=2.16.1->tensorflow_decision_forests) (2.1.5)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: markdown-it-py>=2.2.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from rich->keras>=3.0.0->tensorflow~=2.16.1->tensorflow_decision_forests) (3.0.0)\r\n",
"Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from rich->keras>=3.0.0->tensorflow~=2.16.1->tensorflow_decision_forests) (2.17.2)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: zipp>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.17,>=2.16->tensorflow~=2.16.1->tensorflow_decision_forests) (3.18.1)\r\n",
"Requirement already satisfied: mdurl~=0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich->keras>=3.0.0->tensorflow~=2.16.1->tensorflow_decision_forests) (0.1.2)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using cached tensorflow_decision_forests-1.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.5 MB)\r\n",
"Using cached wurlitzer-3.0.3-py3-none-any.whl (7.3 kB)\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Installing collected packages: wurlitzer, tensorflow_decision_forests\r\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully installed tensorflow_decision_forests-1.9.0 wurlitzer-3.0.3\r\n"
]
}
],
"source": [
"!pip install tensorflow_decision_forests wurlitzer"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2LIE3UDMXeB4"
},
"source": [
"## Importing libraries"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:00.154628Z",
"iopub.status.busy": "2024-04-20T11:22:00.154336Z",
"iopub.status.idle": "2024-04-20T11:22:02.986156Z",
"shell.execute_reply": "2024-04-20T11:22:02.985360Z"
},
"id": "ue7Q-ysiPOmG"
},
"outputs": [],
"source": [
"import tensorflow_decision_forests as tfdf\n",
"\n",
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"import tensorflow as tf\n",
"import math\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bN7quUfTXjaA"
},
"source": [
"The hidden code cell limits the output height in colab.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2024-04-20T11:22:02.990709Z",
"iopub.status.busy": "2024-04-20T11:22:02.990282Z",
"iopub.status.idle": "2024-04-20T11:22:02.994892Z",
"shell.execute_reply": "2024-04-20T11:22:02.994252Z"
},
"id": "nFP4KJ79Xl3J"
},
"outputs": [],
"source": [
"#@title\n",
"\n",
"from IPython.core.magic import register_line_magic\n",
"from IPython.display import Javascript\n",
"from IPython.display import display as ipy_display\n",
"\n",
"# Some of the model training logs can cover the full\n",
"# screen if not compressed to a smaller viewport.\n",
"# This magic allows setting a max height for a cell.\n",
"@register_line_magic\n",
"def set_cell_height(size):\n",
" ipy_display(\n",
" Javascript(\"google.colab.output.setIframeHeight(0, true, {maxHeight: \" +\n",
" str(size) + \"})\"))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:02.998078Z",
"iopub.status.busy": "2024-04-20T11:22:02.997778Z",
"iopub.status.idle": "2024-04-20T11:22:03.001400Z",
"shell.execute_reply": "2024-04-20T11:22:03.000741Z"
},
"id": "jnpiCdRKXvir"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found TensorFlow Decision Forests v1.9.0\n"
]
}
],
"source": [
"# Check the version of TensorFlow Decision Forests\n",
"print(\"Found TensorFlow Decision Forests v\" + tfdf.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9SqXMEGLX0ry"
},
"source": [
"## What is uplift modeling?\n",
"\n",
"[Uplift modeling](https://en.wikipedia.org/wiki/Uplift_modelling) is a statistical modeling technique to predict the **incremental impact of an action** on a subject. The action is often referred to as a **treatment** that may or may not be applied.\n",
"\n",
"Uplift modeling is often used in targeted marketing campaigns to predict the increase in the likelihood of a person making a purchase (or any other desired action) based on the marketing exposition they receive.\n",
"\n",
"For example, uplift modeling can predict the **effect** of an email. The effect is defined as the **conditional probability**\n",
"\\begin{align}\n",
"\\text{effect}(\\text{email}) = &\\Pr(\\text{outcome}=\\text{purchase}\\ \\vert\\ \\text{treatment}=\\text{with email})\\\\ &- \\Pr(\\text{outcome}=\\text{purchase} \\ \\vert\\ \\text{treatment}=\\text{no email}),\n",
"\\end{align}\n",
"where $\\Pr(\\text{outcome}=\\text{purchase}\\ \\vert\\ ...)$\n",
"is the probability of purchase depending on the receiving or not an email.\n",
"\n",
"Compare this to a classification model: With a classification model, one can predict the probability of a purchase. However, customers with a high probability are likely to spend money in the store regardless of whether or not they received an email.\n",
"\n",
"Similarly, one can use **numerical uplifting** to predict the numerical **increase in spend** when receiving an email. In comparison, a regression model can only increase the expected spend, which is a less useful metric in many cases.\n",
"\n",
"### Defining uplift models in TF-DF\n",
"\n",
"TF-DF expects uplifting datasets to be presented in a \"flat\" format.\n",
"A dataset of customers might look like this\n",
"\n",
"treatment | outcome | feature_1 | feature_2\n",
"--------- | ------- | --------- | ---------\n",
"0 | 1 | 0.1 | blue \n",
"0 | 0 | 0.2 | blue \n",
"1 | 1 | 0.3 | blue \n",
"1 | 1 | 0.4 | blue \n",
"\n",
"\n",
"The **treatment** is a binary variable indicating whether or not the example has received treatment. In the above example, the treatment indicates if the customer has received an email or not. The **outcome** (label) indicates the status of the example after receiving the treatment (or not). TF-DF supports categorical outcomes for categorical uplifting and numerical outcomes for numerical uplifting.\n",
"\n",
"**Note**: Uplifting is also frequently used in medical contexts. Here the *treatment* can be a medical treatment (e.g. administering a vaccine), the label can be an indicator of quality of life (e.g. whether the patient got sick). This also explains the nomenclature of uplift modeling."
]
},
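{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the effect formula, the sketch below estimates the average effect on the four-row toy table above: it subtracts the purchase rate of the control group from the purchase rate of the treated group. The names `rows`, `purchase_rate` and `average_effect` are made up for this example and are not part of TF-DF. Note that this is a single dataset-wide average, whereas an uplift model predicts an effect per example.\n",
"\n",
"```python\n",
"# Illustrative only: (treatment, outcome) pairs copied from the toy table above.\n",
"rows = [(0, 1), (0, 0), (1, 1), (1, 1)]\n",
"\n",
"\n",
"def purchase_rate(rows, treatment):\n",
"    # Estimate Pr(outcome=1 | treatment) as the mean outcome within that group.\n",
"    outcomes = [outcome for t, outcome in rows if t == treatment]\n",
"    return sum(outcomes) / len(outcomes)\n",
"\n",
"\n",
"control_rate = purchase_rate(rows, treatment=0)\n",
"treated_rate = purchase_rate(rows, treatment=1)\n",
"\n",
"# Average effect: treated purchase rate minus control purchase rate.\n",
"average_effect = treated_rate - control_rate\n",
"print(f'control={control_rate:.2f} treated={treated_rate:.2f} effect={average_effect:.2f}')\n",
"```\n",
"\n",
"With only four rows the estimate is very noisy; it is meant to show the arithmetic, not a realistic measurement."
]
},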
{
"cell_type": "markdown",
"metadata": {
"id": "kVaDog4ldPEY"
},
"source": [
"## Training an uplifting model\n",
"\n",
"In this example, we will use the *Hillstrom Email Marketing dataset*.\n",
"\n",
"This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test:\n",
"\n",
"- 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.\n",
"- 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.\n",
"- 1/3 were randomly chosen to not receive an e-mail campaign.\n",
"\n",
"During a period of two weeks following the e-mail campaign, results were tracked. The task is to tell if the Mens or Womens e-mail campaign was successful.\n",
"\n",
"Read more about dataset [in its documentation]( https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html). This tutorial uses the dataset as curated by [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/hillstrom)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:03.005071Z",
"iopub.status.busy": "2024-04-20T11:22:03.004780Z",
"iopub.status.idle": "2024-04-20T11:22:05.130141Z",
"shell.execute_reply": "2024-04-20T11:22:05.128804Z"
},
"id": "IKkNy8xedOb7"
},
"outputs": [],
"source": [
"# Install the TensorFlow Datasets package\n",
"!pip install tensorflow-datasets -U --quiet"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:05.134319Z",
"iopub.status.busy": "2024-04-20T11:22:05.134024Z",
"iopub.status.idle": "2024-04-20T11:22:09.092330Z",
"shell.execute_reply": "2024-04-20T11:22:09.091494Z"
},
"id": "1veZ9nJZPGsv"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-04-20 11:22:09.063782: W tensorflow/core/kernels/data/cache_dataset_ops.cc:858] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.\n",
"2024-04-20 11:22:09.069098: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
channel
\n",
"
conversion
\n",
"
history
\n",
"
history_segment
\n",
"
mens
\n",
"
newbie
\n",
"
recency
\n",
"
segment
\n",
"
spend
\n",
"
visit
\n",
"
womens
\n",
"
zip_code
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
b'Web'
\n",
"
0
\n",
"
29.990000
\n",
"
b'1) $0 - $100'
\n",
"
1
\n",
"
0
\n",
"
6
\n",
"
b'Womens E-Mail'
\n",
"
0.0
\n",
"
0
\n",
"
0
\n",
"
b'Surburban'
\n",
"
\n",
"
\n",
"
1
\n",
"
b'Web'
\n",
"
0
\n",
"
150.380005
\n",
"
b'2) $100 - $200'
\n",
"
0
\n",
"
1
\n",
"
9
\n",
"
b'Womens E-Mail'
\n",
"
0.0
\n",
"
0
\n",
"
1
\n",
"
b'Surburban'
\n",
"
\n",
"
\n",
"
2
\n",
"
b'Phone'
\n",
"
0
\n",
"
602.960022
\n",
"
b'5) $500 - $750'
\n",
"
1
\n",
"
1
\n",
"
4
\n",
"
b'Womens E-Mail'
\n",
"
0.0
\n",
"
0
\n",
"
0
\n",
"
b'Surburban'
\n",
"
\n",
"
\n",
"
3
\n",
"
b'Multichannel'
\n",
"
0
\n",
"
341.010010
\n",
"
b'3) $200 - $350'
\n",
"
0
\n",
"
0
\n",
"
9
\n",
"
b'Womens E-Mail'
\n",
"
0.0
\n",
"
1
\n",
"
1
\n",
"
b'Urban'
\n",
"
\n",
"
\n",
"
4
\n",
"
b'Phone'
\n",
"
0
\n",
"
97.180000
\n",
"
b'1) $0 - $100'
\n",
"
0
\n",
"
1
\n",
"
3
\n",
"
b'Womens E-Mail'
\n",
"
0.0
\n",
"
1
\n",
"
1
\n",
"
b'Surburban'
\n",
"
\n",
"
\n",
"
5
\n",
"
b'Web'
\n",
"
0
\n",
"
83.269997
\n",
"
b'1) $0 - $100'
\n",
"
1
\n",
"
0
\n",
"
5
\n",
"
b'Mens E-Mail'
\n",
"
0.0
\n",
"
0
\n",
"
0
\n",
"
b'Urban'
\n",
"
\n",
"
\n",
"
6
\n",
"
b'Web'
\n",
"
0
\n",
"
331.170013
\n",
"
b'3) $200 - $350'
\n",
"
1
\n",
"
0
\n",
"
8
\n",
"
b'Womens E-Mail'
\n",
"
0.0
\n",
"
0
\n",
"
0
\n",
"
b'Surburban'
\n",
"
\n",
"
\n",
"
7
\n",
"
b'Multichannel'
\n",
"
0
\n",
"
628.400024
\n",
"
b'5) $500 - $750'
\n",
"
1
\n",
"
1
\n",
"
9
\n",
"
b'No E-Mail'
\n",
"
0.0
\n",
"
1
\n",
"
0
\n",
"
b'Surburban'
\n",
"
\n",
"
\n",
"
8
\n",
"
b'Phone'
\n",
"
0
\n",
"
134.610001
\n",
"
b'2) $100 - $200'
\n",
"
1
\n",
"
0
\n",
"
6
\n",
"
b'No E-Mail'
\n",
"
0.0
\n",
"
1
\n",
"
0
\n",
"
b'Rural'
\n",
"
\n",
"
\n",
"
9
\n",
"
b'Web'
\n",
"
0
\n",
"
141.210007
\n",
"
b'2) $100 - $200'
\n",
"
0
\n",
"
1
\n",
"
9
\n",
"
b'Mens E-Mail'
\n",
"
0.0
\n",
"
1
\n",
"
1
\n",
"
b'Surburban'
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" channel conversion history history_segment mens newbie \\\n",
"0 b'Web' 0 29.990000 b'1) $0 - $100' 1 0 \n",
"1 b'Web' 0 150.380005 b'2) $100 - $200' 0 1 \n",
"2 b'Phone' 0 602.960022 b'5) $500 - $750' 1 1 \n",
"3 b'Multichannel' 0 341.010010 b'3) $200 - $350' 0 0 \n",
"4 b'Phone' 0 97.180000 b'1) $0 - $100' 0 1 \n",
"5 b'Web' 0 83.269997 b'1) $0 - $100' 1 0 \n",
"6 b'Web' 0 331.170013 b'3) $200 - $350' 1 0 \n",
"7 b'Multichannel' 0 628.400024 b'5) $500 - $750' 1 1 \n",
"8 b'Phone' 0 134.610001 b'2) $100 - $200' 1 0 \n",
"9 b'Web' 0 141.210007 b'2) $100 - $200' 0 1 \n",
"\n",
" recency segment spend visit womens zip_code \n",
"0 6 b'Womens E-Mail' 0.0 0 0 b'Surburban' \n",
"1 9 b'Womens E-Mail' 0.0 0 1 b'Surburban' \n",
"2 4 b'Womens E-Mail' 0.0 0 0 b'Surburban' \n",
"3 9 b'Womens E-Mail' 0.0 1 1 b'Urban' \n",
"4 3 b'Womens E-Mail' 0.0 1 1 b'Surburban' \n",
"5 5 b'Mens E-Mail' 0.0 0 0 b'Urban' \n",
"6 8 b'Womens E-Mail' 0.0 0 0 b'Surburban' \n",
"7 9 b'No E-Mail' 0.0 1 0 b'Surburban' \n",
"8 6 b'No E-Mail' 0.0 1 0 b'Rural' \n",
"9 9 b'Mens E-Mail' 0.0 1 1 b'Surburban' "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the dataset\n",
"import tensorflow_datasets as tfds\n",
"raw_train, raw_test = tfds.load('hillstrom', split=['train[:80%]', 'train[20%:]'])\n",
"\n",
"# Display the first 10 examples of the test fold.\n",
"pd.DataFrame(list(raw_test.batch(10).take(1))[0])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5stnFbyKaIgn"
},
"source": [
"### Dataset preprocessing\n",
"\n",
"Since TF-DF currently only supports binary treatments, combine the \"Men's Email\" and the \"Women's Email\" campaign. This tutorial uses the binary variable `conversion` as outcome. This means that the problem is a **Categorical Uplifting** problem. If we were using the numerical variable `spend`, the problem would be a **Numerical Uplifting** problem."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:09.097644Z",
"iopub.status.busy": "2024-04-20T11:22:09.096961Z",
"iopub.status.idle": "2024-04-20T11:22:09.202788Z",
"shell.execute_reply": "2024-04-20T11:22:09.202170Z"
},
"id": "dLpAw7jibIrh"
},
"outputs": [],
"source": [
"def prepare_dataset(example):\n",
" # Use a binary treatment class.\n",
" example['treatment'] = 1 if example['segment'] == b'Mens E-Mail' or example['segment'] == b'Womens E-Mail' else 0\n",
" outcome = example['conversion']\n",
" # Restrict the dataset to the input features.\n",
" input_features = ['channel', 'history', 'mens', 'womens', 'newbie', 'recency', 'zip_code', 'treatment']\n",
" example = {feature: example[feature] for feature in input_features}\n",
" return example, outcome\n",
"\n",
"train_ds = raw_train.map(prepare_dataset).batch(100)\n",
"test_ds = raw_test.map(prepare_dataset).batch(100)"
]
},
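{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the treatment encoding concrete, here is a plain-Python restatement of the same rule, applied to a handful of `segment` values outside of `tf.data`. The names `EMAIL_SEGMENTS`, `to_binary_treatment` and `sample` are hypothetical and only serve this illustration; the tutorial itself relies on `prepare_dataset` above.\n",
"\n",
"```python\n",
"# Illustrative only: segment values are byte strings, as in the raw dataset.\n",
"EMAIL_SEGMENTS = (b'Mens E-Mail', b'Womens E-Mail')\n",
"\n",
"\n",
"def to_binary_treatment(segment):\n",
"    # 1 if the customer received either e-mail campaign, 0 for the control group.\n",
"    return 1 if segment in EMAIL_SEGMENTS else 0\n",
"\n",
"\n",
"sample = [b'Womens E-Mail', b'Mens E-Mail', b'No E-Mail']\n",
"treatments = [to_binary_treatment(segment) for segment in sample]\n",
"print(treatments)\n",
"```\n",
"\n",
"Both campaigns map to the treated group, so roughly two thirds of the customers should end up with `treatment == 1`, given the three equally sized segments described earlier."
]
},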
{
"cell_type": "markdown",
"metadata": {
"id": "Z-mtKmd-RoOu"
},
"source": [
"### Model training\n",
"\n",
"Finally, train and evaluate the model as usual. Note that TF-DF only supports Random Forest models for uplifting."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:09.206353Z",
"iopub.status.busy": "2024-04-20T11:22:09.205894Z",
"iopub.status.idle": "2024-04-20T11:22:20.891717Z",
"shell.execute_reply": "2024-04-20T11:22:20.890807Z"
},
"id": "-OZN8t8LRn38"
},
"outputs": [
{
"data": {
"application/javascript": [
"google.colab.output.setIframeHeight(0, true, {maxHeight: 300})"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Warning: The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:absl:The `num_threads` constructor argument is not set and the number of CPU is os.cpu_count()=32 > 32. Setting num_threads to 32. Set num_threads manually to use more than 32 cpus.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Use /tmpfs/tmp/tmppyeh4gae as temporary training directory\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reading training dataset...\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training tensor examples:\n",
"Features: {'channel': , 'history': , 'mens': , 'womens': , 'newbie': , 'recency': , 'zip_code': , 'treatment': }\n",
"Label: Tensor(\"data_8:0\", shape=(None,), dtype=int64)\n",
"Weights: None\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Normalized tensor features:\n",
" {'channel': SemanticTensor(semantic=, tensor=), 'history': SemanticTensor(semantic=, tensor=), 'mens': SemanticTensor(semantic=, tensor=), 'womens': SemanticTensor(semantic=, tensor=), 'newbie': SemanticTensor(semantic=, tensor=), 'recency': SemanticTensor(semantic=, tensor=), 'zip_code': SemanticTensor(semantic=, tensor=)}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training dataset read in 0:00:04.974222. Found 51200 examples.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training model...\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Standard output detected as not visible to the user e.g. running in a notebook. Creating a training log redirection. If training gets stuck, try calling tfdf.keras.set_training_logs_redirection(False).\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.2334 UTC kernel.cc:771] Start Yggdrasil model training\n",
"[INFO 24-04-20 11:22:14.2335 UTC kernel.cc:772] Collect training examples\n",
"[INFO 24-04-20 11:22:14.2335 UTC kernel.cc:785] Dataspec guide:\n",
"column_guides {\n",
" column_name_pattern: \"^__LABEL$\"\n",
" type: CATEGORICAL\n",
"}\n",
"default_column_guide {\n",
" categorial {\n",
" max_vocab_count: 2000\n",
" }\n",
" discretized_numerical {\n",
" maximum_num_bins: 255\n",
" }\n",
"}\n",
"ignore_columns_without_guides: false\n",
"detect_numerical_as_discretized_numerical: false\n",
"\n",
"[INFO 24-04-20 11:22:14.2339 UTC kernel.cc:391] Number of batches: 512\n",
"[INFO 24-04-20 11:22:14.2339 UTC kernel.cc:392] Number of examples: 51200\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.2463 UTC kernel.cc:792] Training dataset:\n",
"Number of records: 51200\n",
"Number of columns: 9\n",
"\n",
"Number of columns by type:\n",
"\tNUMERICAL: 5 (55.5556%)\n",
"\tCATEGORICAL: 4 (44.4444%)\n",
"\n",
"Columns:\n",
"\n",
"NUMERICAL: 5 (55.5556%)\n",
"\t2: \"history\" NUMERICAL mean:241.833 min:29.99 max:3345.93 sd:255.292\n",
"\t3: \"mens\" NUMERICAL mean:0.550391 min:0 max:1 sd:0.497454\n",
"\t4: \"newbie\" NUMERICAL mean:0.503086 min:0 max:1 sd:0.49999\n",
"\t5: \"recency\" NUMERICAL mean:5.75514 min:1 max:12 sd:3.50281\n",
"\t7: \"womens\" NUMERICAL mean:0.549687 min:0 max:1 sd:0.497525\n",
"\n",
"CATEGORICAL: 4 (44.4444%)\n",
"\t0: \"__LABEL\" CATEGORICAL integerized vocab-size:3 no-ood-item\n",
"\t1: \"channel\" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:\"Web\" 22576 (44.0938%)\n",
"\t6: \"treatment\" CATEGORICAL integerized vocab-size:3 no-ood-item\n",
"\t8: \"zip_code\" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:\"Surburban\" 22966 (44.8555%)\n",
"\n",
"Terminology:\n",
"\tnas: Number of non-available (i.e. missing) values.\n",
"\tood: Out of dictionary.\n",
"\tmanually-defined: Attribute whose type is manually defined by the user, i.e., the type was not automatically inferred.\n",
"\ttokenized: The attribute value is obtained through tokenization.\n",
"\thas-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.\n",
"\tvocab-size: Number of unique values.\n",
"\n",
"[INFO 24-04-20 11:22:14.2464 UTC kernel.cc:808] Configure learner\n",
"[INFO 24-04-20 11:22:14.2466 UTC kernel.cc:822] Training config:\n",
"learner: \"RANDOM_FOREST\"\n",
"features: \"^channel$\"\n",
"features: \"^history$\"\n",
"features: \"^mens$\"\n",
"features: \"^newbie$\"\n",
"features: \"^recency$\"\n",
"features: \"^womens$\"\n",
"features: \"^zip_code$\"\n",
"label: \"^__LABEL$\"\n",
"task: CATEGORICAL_UPLIFT\n",
"random_seed: 123456\n",
"uplift_treatment: \"treatment\"\n",
"metadata {\n",
" framework: \"TF Keras\"\n",
"}\n",
"pure_serving_model: false\n",
"[yggdrasil_decision_forests.model.random_forest.proto.random_forest_config] {\n",
" num_trees: 300\n",
" decision_tree {\n",
" max_depth: 16\n",
" min_examples: 5\n",
" in_split_min_examples_check: true\n",
" keep_non_leaf_label_distribution: true\n",
" num_candidate_attributes: 0\n",
" missing_value_policy: GLOBAL_IMPUTATION\n",
" allow_na_conditions: false\n",
" categorical_set_greedy_forward {\n",
" sampling: 0.1\n",
" max_num_items: -1\n",
" min_item_frequency: 1\n",
" }\n",
" growing_strategy_local {\n",
" }\n",
" categorical {\n",
" cart {\n",
" }\n",
" }\n",
" axis_aligned_split {\n",
" }\n",
" internal {\n",
" sorting_strategy: PRESORTED\n",
" }\n",
" uplift {\n",
" min_examples_in_treatment: 5\n",
" split_score: KULLBACK_LEIBLER\n",
" }\n",
" }\n",
" winner_take_all_inference: true\n",
" compute_oob_performances: true\n",
" compute_oob_variable_importances: false\n",
" num_oob_variable_importances_permutations: 1\n",
" bootstrap_training_dataset: true\n",
" bootstrap_size_ratio: 1\n",
" adapt_bootstrap_size_ratio_for_maximum_training_duration: false\n",
" sampling_with_replacement: true\n",
"}\n",
"\n",
"[INFO 24-04-20 11:22:14.2469 UTC kernel.cc:825] Deployment config:\n",
"cache_path: \"/tmpfs/tmp/tmppyeh4gae/working_cache\"\n",
"num_threads: 32\n",
"try_resume_training: true\n",
"\n",
"[INFO 24-04-20 11:22:14.2472 UTC kernel.cc:887] Train model\n",
"[INFO 24-04-20 11:22:14.2473 UTC random_forest.cc:416] Training random forest on 51200 example(s) and 7 feature(s).\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.3731 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n",
"[INFO 24-04-20 11:22:14.3741 UTC random_forest.cc:802] Training of tree 1/300 (tree index:2) done qini:0.000172044 auuc:0.0025137\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.4012 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.4027 UTC random_forest.cc:802] Training of tree 15/300 (tree index:31) done qini:1.41341e-05 auuc:0.0023575\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.5302 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.5327 UTC random_forest.cc:802] Training of tree 25/300 (tree index:23) done qini:-2.19346e-05 auuc:0.00235455\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.6034 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.6058 UTC random_forest.cc:802] Training of tree 35/300 (tree index:33) done qini:0.00013211 auuc:0.0025086\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.6887 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.6910 UTC random_forest.cc:802] Training of tree 45/300 (tree index:45) done qini:-2.28572e-05 auuc:0.00235363\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.7656 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.7680 UTC random_forest.cc:802] Training of tree 55/300 (tree index:55) done qini:-8.67727e-05 auuc:0.00228972\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.8354 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.8379 UTC random_forest.cc:802] Training of tree 65/300 (tree index:56) done qini:-0.000112323 auuc:0.00226417\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.9052 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.9077 UTC random_forest.cc:802] Training of tree 75/300 (tree index:74) done qini:-0.000109942 auuc:0.00226655\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:14.9680 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:14.9704 UTC random_forest.cc:802] Training of tree 101/300 (tree index:101) done qini:-0.000112409 auuc:0.00226408\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.1148 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.1196 UTC random_forest.cc:802] Training of tree 121/300 (tree index:118) done qini:-0.000299795 auuc:0.00207669\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.2280 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.2305 UTC random_forest.cc:802] Training of tree 131/300 (tree index:138) done qini:-0.000153133 auuc:0.00222336\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.3108 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.3155 UTC random_forest.cc:802] Training of tree 141/300 (tree index:139) done qini:-0.000173194 auuc:0.0022033\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.3853 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.3877 UTC random_forest.cc:802] Training of tree 168/300 (tree index:162) done qini:-0.000130945 auuc:0.00224554\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.5471 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.5519 UTC random_forest.cc:802] Training of tree 178/300 (tree index:178) done qini:-0.000145457 auuc:0.00223103\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.6367 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.6414 UTC random_forest.cc:802] Training of tree 188/300 (tree index:189) done qini:-0.000124566 auuc:0.00225192\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.6876 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.6901 UTC random_forest.cc:802] Training of tree 217/300 (tree index:213) done qini:-0.000161956 auuc:0.00221453\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.8731 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.8795 UTC random_forest.cc:802] Training of tree 227/300 (tree index:229) done qini:-0.000133605 auuc:0.00224288\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:15.9403 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:15.9428 UTC random_forest.cc:802] Training of tree 237/300 (tree index:239) done qini:-0.000101549 auuc:0.00227494\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.0044 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.0068 UTC random_forest.cc:802] Training of tree 247/300 (tree index:253) done qini:-0.000141334 auuc:0.00223516\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.0749 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.0773 UTC random_forest.cc:802] Training of tree 257/300 (tree index:257) done qini:-0.000135416 auuc:0.00224107\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.1446 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.1471 UTC random_forest.cc:802] Training of tree 267/300 (tree index:261) done qini:-0.000131112 auuc:0.00224538\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.2109 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.2132 UTC random_forest.cc:802] Training of tree 277/300 (tree index:275) done qini:-0.000149751 auuc:0.00222674\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.2724 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.2746 UTC random_forest.cc:802] Training of tree 287/300 (tree index:283) done qini:-0.000168736 auuc:0.00220775\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.3282 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.3306 UTC random_forest.cc:802] Training of tree 297/300 (tree index:299) done qini:-0.000181665 auuc:0.00219482\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[WARNING 24-04-20 11:22:16.3623 UTC random_forest.cc:1105] Internal error: Non empty oob evaluation\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.3646 UTC random_forest.cc:802] Training of tree 300/300 (tree index:298) done qini:-0.000173258 auuc:0.00220323\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.3680 UTC random_forest.cc:882] Final OOB metrics: qini:-0.000173258 auuc:0.00220323\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.3843 UTC kernel.cc:919] Export model in log directory: /tmpfs/tmp/tmppyeh4gae with prefix 568256236db544eb\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.4274 UTC kernel.cc:937] Save model in resources\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.4309 UTC abstract_model.cc:881] Model self evaluation:\n",
"Number of predictions (without weights): 51200\n",
"Number of predictions (with weights): 51200\n",
"Task: CATEGORICAL_UPLIFT\n",
"Label: __LABEL\n",
"\n",
"Number of treatments: 2\n",
"AUUC: 0.00220323\n",
"Qini: -0.000173258\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.4580 UTC kernel.cc:1233] Loading model from path /tmpfs/tmp/tmppyeh4gae/model/ with prefix 568256236db544eb\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO 24-04-20 11:22:16.6557 UTC decision_forest.cc:734] Model loaded with 300 root(s), 60190 node(s), and 7 input feature(s).\n",
"[INFO 24-04-20 11:22:16.6557 UTC abstract_model.cc:1344] Engine \"RandomForestGeneric\" built\n",
"[INFO 24-04-20 11:22:16.6557 UTC kernel.cc:1061] Use fast generic engine\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model trained in 0:00:02.442514\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Compiling model...\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model compiled.\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%set_cell_height 300\n",
"\n",
"# Configure the model and its hyper-parameters.\n",
"model = tfdf.keras.RandomForestModel(\n",
" verbose=2,\n",
" task=tfdf.keras.Task.CATEGORICAL_UPLIFT,\n",
" uplift_treatment='treatment'\n",
")\n",
"\n",
"# Train the model.\n",
"model.fit(train_ds)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XKhtZuLhGtv_"
},
"source": [
"# Evaluating Uplift models.\n",
"\n",
"## Metrics for Uplift models\n",
"\n",
"The two most important metrics for evaluating uplift models are the **AUUC** (Area Under the Uplift Curve) metric and the **Qini** (Area Under the Qini Curve) metric. This is similar to the use of AUC and accuracy for classification problems. For both metrics, larger values are better.\n",
"\n",
"Neither AUUC nor Qini is a normalized metric. This means that the best possible value of the metric can vary from dataset to dataset. This is different from, for example, the AUC metric, which always varies between 0 and 1.\n",
"\n",
"A formal definition of AUUC is below. For more information about these metrics, see [Guelman](https://diposit.ub.edu/dspace/bitstream/2445/65123/1/Leo%20Guelman_PhD_THESIS.pdf) and [Betlei et al.](https://arxiv.org/pdf/2012.09897.pdf)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AMSpNTTZmuzv"
},
"source": [
"## Model Self-Evaluation\n",
"\n",
"TF-DF Random Forest models perform self-evaluation on the out-of-bag examples of the training dataset. For uplift models, they expose the AUUC and the Qini metrics. You can directly retrieve the two metrics on the training dataset through the inspector.\n",
"\n",
"Later, we are going to recompute the AUUC metric \"manually\" on the test dataset. Note that the two values are not expected to be exactly equal (out-of-bag on train vs. test) since the AUUC is not a normalized metric."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:20.895624Z",
"iopub.status.busy": "2024-04-20T11:22:20.895358Z",
"iopub.status.idle": "2024-04-20T11:22:20.901523Z",
"shell.execute_reply": "2024-04-20T11:22:20.900814Z"
},
"id": "OsN1R9mT_8T6"
},
"outputs": [
{
"data": {
"text/plain": [
"Evaluation(num_examples=51200, accuracy=None, loss=None, rmse=None, ndcg=None, aucs=None, auuc=0.0022032308892709586, qini=-0.00017325819500263418)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The self-evaluation is available through the model inspector\n",
"insp = model.make_inspector()\n",
"insp.evaluation()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WErGZZ27HWJN"
},
"source": [
"## Manually computing the AUUC\n",
"\n",
"In this section, we manually compute the AUUC and plot the uplift curves.\n",
"\n",
"The next few paragraphs explain the AUUC metric in more detail and may be skipped.\n",
"\n",
"### Computing the AUUC\n",
"\n",
"Suppose you have a labeled dataset with $|T|$ examples with treatment and $|C|$ examples without treatment, called *control* examples. For each example, the uplift model $f$ produces the conditional probability that a treatment on the example will yield a positive outcome.\n",
"\n",
"Suppose a decision-maker needs to decide which clients to send an email to, using an uplift model $f$. The model produces a (conditional) probability that the email will result in a conversion. The decision-maker might therefore just pick the number $k$ of emails to send and send those $k$ emails to the clients with the highest probability.\n",
"\n",
"Using a labeled test dataset, it is possible to study the impact of $k$ on the success of the campaign. First, we are interested in the ratio $\\frac{|C \\cap T|}{|T|}$ of clients that received an email and converted to the total number of clients that received an email. Here, $T$ is the set of all clients that received an email and $C$ is the set of clients among the top $k$ that converted (so $C$ no longer denotes the control group in this ratio). We plot this ratio against $k$.\n",
"\n",
"Ideally, we would like this curve to increase steeply. This would mean that the model prioritizes sending emails to those clients that will generate a conversion when receiving an email."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:20.905039Z",
"iopub.status.busy": "2024-04-20T11:22:20.904791Z",
"iopub.status.idle": "2024-04-20T11:22:26.270536Z",
"shell.execute_reply": "2024-04-20T11:22:26.269802Z"
},
"id": "xUGNWKkkkl-s"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"512/512 [==============================] - 3s 5ms/step\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-04-20 11:22:25.165808: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-04-20 11:22:25.975008: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Compute all predictions on the test dataset\n",
"predictions = model.predict(test_ds).flatten()\n",
"# Extract outcomes and treatments\n",
"outcomes = np.concatenate([outcome.numpy() for _, outcome in test_ds])\n",
"treatment = np.concatenate([example['treatment'].numpy() for example,_ in test_ds])\n",
"control = 1 - treatment\n",
"\n",
"num_treatments = np.sum(treatment)\n",
"# Clients without treatment are called 'control' group\n",
"num_control = np.sum(control)\n",
"num_examples = len(predictions)\n",
"\n",
"# Sort labels and treatments according to predictions in descending order\n",
"prediction_order = predictions.argsort()[::-1]\n",
"outcomes_sorted = outcomes[prediction_order]\n",
"treatment_sorted = treatment[prediction_order]\n",
"control_sorted = control[prediction_order]\n",
"ratio_treatment = np.cumsum(np.multiply(outcomes_sorted, treatment_sorted), axis=0)/num_treatments\n",
"\n",
"fig, ax = plt.subplots()\n",
"ax.plot(ratio_treatment, label='Conversion ratio of treatment')\n",
"ax.set_xlabel('k')\n",
"ax.set_ylabel('Ratio of conversion')\n",
"ax.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "97IFpq5epHsx"
},
"source": [
"Similarly, we can also compute and plot the conversion ratio of those not receiving an email, called the *control group*. Ideally, this curve is initially flat: this would mean that the model does not prioritize sending emails to clients that will generate a conversion despite **not** receiving an email."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:26.274521Z",
"iopub.status.busy": "2024-04-20T11:22:26.273808Z",
"iopub.status.idle": "2024-04-20T11:22:26.456834Z",
"shell.execute_reply": "2024-04-20T11:22:26.456209Z"
},
"id": "bIY-oA9alwzY"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratio_control = np.cumsum(np.multiply(outcomes_sorted, control_sorted), axis=0)/num_control\n",
"ax.plot(ratio_control, label='Conversion ratio of control')\n",
"ax.legend()\n",
"fig"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "q9MopM5MnCK0"
},
"source": [
"The AUUC metric measures the area between these two curves, with the x-axis normalized to lie between 0 and 1."
]
},
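{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what the next cell approximates (the symbols below are introduced here for illustration and assume a binary outcome), rank the $N$ test examples by decreasing prediction and let $y_i$ and $t_i$ be the outcome and the treatment indicator (1 for treatment, 0 for control) of the $i$-th ranked example, with $N_T = \\sum_i t_i$ and $N_C = N - N_T$. The two curves are\n",
"\n",
"$$R_T(k) = \\frac{1}{N_T} \\sum_{i=1}^{k} y_i t_i, \\qquad R_C(k) = \\frac{1}{N_C} \\sum_{i=1}^{k} y_i (1 - t_i),$$\n",
"\n",
"and the area between them, with $k$ rescaled by $1/N$, is approximately\n",
"\n",
"$$\\mathrm{AUUC} \\approx \\frac{1}{N} \\sum_{k=1}^{N} \\left( R_T(k) - R_C(k) \\right).$$\n",
"\n",
"The trapezoidal rule used in the code differs from this plain sum only in how it weights the two endpoints."
]
},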
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2024-04-20T11:22:26.460538Z",
"iopub.status.busy": "2024-04-20T11:22:26.460120Z",
"iopub.status.idle": "2024-04-20T11:22:26.806300Z",
"shell.execute_reply": "2024-04-20T11:22:26.805590Z"
},
"id": "99XXGsq7nQgN"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The AUUC on the test dataset is 0.007513928513572819\n"
]
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x = np.linspace(0, 1, num_examples)\n",
"plt.plot(x,ratio_treatment, label='Conversion ratio of treatment')\n",
"plt.plot(x,ratio_control, label='Conversion ratio of control')\n",
"plt.fill_between(x, ratio_treatment, ratio_control, where=(ratio_treatment > ratio_control), color='C0', alpha=0.3)\n",
"plt.fill_between(x, ratio_treatment, ratio_control, where=(ratio_treatment < ratio_control), color='C1', alpha=0.3)\n",
"plt.xlabel('k')\n",
"plt.ylabel('Ratio of conversion')\n",
"plt.legend()\n",
"\n",
"# Approximate the integral of the difference between the two curves.\n",
"auuc = np.trapz(ratio_treatment-ratio_control, dx=1/num_examples)\n",
"print(f'The AUUC on the test dataset is {auuc}')"
]
}
],
"metadata": {
"colab": {
"name": "uplift_colab.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 0
}