{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "PLwGNovEanAB" }, "source": [ "##### Copyright 2022 The TensorFlow Authors." ] }, { "cell_type": "markdown", "metadata": { "id": "fePXTHt_Izkk" }, "source": [] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2024-07-19T09:53:39.602990Z", "iopub.status.busy": "2024-07-19T09:53:39.602707Z", "iopub.status.idle": "2024-07-19T09:53:39.607283Z", "shell.execute_reply": "2024-07-19T09:53:39.606575Z" }, "id": "jUK4QgfmbGPS" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "lKy2xzTnbsaJ" }, "source": [ "# Creating a custom Counterfactual Logit Pairing Dataset\n", "\n", "
\n", " \n", "\n", "\n", "\n", "
\n", " View on TensorFlow.org\n", "\n", " \n", " Run in Google Colab\n", "\n", " \n", " View source on GitHub\n", "\n", " Download notebook\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "N4wgS7SGbzhL" }, "source": [ "Applying Counterfactual Logit Pairing (CLP) to evaluate and improve the fairness of your model requires a counterfactual dataset. You create a counterfactual dataset by duplicating your existing dataset and changing the new dataset to add, remove, or modify identity terminology. This tutorial explains the approach and techniques for creating a counterfactual dataset for your existing text dataset.\n", "\n", "You use your counterfactual dataset with the CLP technique by creating a new data object, `CounterfactualPackedInputs`, that contains the `original_input` and `counterfactual_data`, and looks like the following:\n", "\n", "`CounterfactualPackedInputs` looks like the following:\n", "\n", "```python\n", "CounterfactualPackedInputs(\n", " original_input=(x, y, sample_weight),\n", " counterfactual_data=(original_x, counterfactual_x,\n", " counterfactual_sample_weight)\n", ")\n", "```\n", "\n", "The `original_input` should be the original dataset that is used to train your Keras model. `counterfactual_data` should be a `tf.data.Dataset` with the original `x` value, the corresponding `counterfactual_x` value, and the `counterfactual_sample_weight`. The `counterfactual_x` value is nearly identical to the original value but with one or more of the attributes removed or replaced. This dataset is used to pair the loss function between the original value and the counterfactual value with the goal of assuring that the model’s prediction doesn’t change when the sensitive attribute is different. `original_input` and `counterfactual_data` need to be the same shape. You can duplicate values from `counterfactual_data` so that it’s the same number of elements as `original_input`. \n", "\n", "Properties of `counterfactual_data`:\n", "* All `original_x` values need to have references to an identity group \n", "* Each `counterfactual_x` value is identical to the original value, but with one or more of the attributes removed or replaced\n", "* Have the same shape as original input (you can duplicate values so that they’re the same shape) \n", "\n", "`counterfactual_data` does not need to:\n", "* Have overlap with data within original input \n", "* Have ground truth labels \n", "\n", "Here’s an example of what a `counterfactual_data` would look like if you remove the term \"gay\".\n", "```python\n", "original_x: “I am a gay man”\n", "counterfactual_x: “I am a man” \n", "counterfactual_sample_weight”: 1\n", "```\n", "If you have a text classifier, you can use [`build_counterfactual_data`](https://www.tensorflow.org/responsible_ai/model_remediation/api_docs/python/model_remediation/counterfactual/keras/utils/build_counterfactual_data) to help create a counterfactual dataset. For all other data types, you need to provide a counterfactual dataset directly. 
\n" ] }, { "cell_type": "markdown", "metadata": { "id": "npFvpoI2cG9-" }, "source": [ "## Setup\n", "\n", "You'll begin by installing TensorFlow Model Remediation.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-07-19T09:53:39.611264Z", "iopub.status.busy": "2024-07-19T09:53:39.610668Z", "iopub.status.idle": "2024-07-19T09:53:41.082284Z", "shell.execute_reply": "2024-07-19T09:53:41.081468Z" }, "id": "8ou41oj9cSJd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting tensorflow-model-remediation\r\n", " Using cached tensorflow_model_remediation-0.1.7.1-py3-none-any.whl.metadata (4.8 kB)\r\n", "Collecting dill (from tensorflow-model-remediation)\r\n", " Using cached dill-0.3.8-py3-none-any.whl.metadata (10 kB)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Collecting mock (from tensorflow-model-remediation)\r\n", " Using cached mock-5.1.0-py3-none-any.whl.metadata (3.0 kB)\r\n", "Requirement already satisfied: pandas in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-model-remediation) (2.2.2)\r\n", "Requirement already satisfied: tensorflow-hub in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-model-remediation) (0.16.1)\r\n", "Requirement already satisfied: tensorflow>=2.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-model-remediation) (2.17.0)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: absl-py>=1.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (2.1.0)\r\n", "Requirement already satisfied: astunparse>=1.6.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (1.6.3)\r\n", "Requirement already satisfied: flatbuffers>=24.3.25 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (24.3.25)\r\n", "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (0.6.0)\r\n", "Requirement already satisfied: google-pasta>=0.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (0.2.0)\r\n", "Requirement already satisfied: h5py>=3.10.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (3.11.0)\r\n", "Requirement already satisfied: libclang>=13.0.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (18.1.1)\r\n", "Requirement already satisfied: ml-dtypes<0.5.0,>=0.3.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (0.4.0)\r\n", "Requirement already satisfied: opt-einsum>=2.3.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (3.3.0)\r\n", "Requirement already satisfied: packaging in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (24.1)\r\n", "Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (3.20.3)\r\n", "Requirement already satisfied: requests<3,>=2.21.0 in 
/tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (2.32.3)\r\n", "Requirement already satisfied: setuptools in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (71.0.3)\r\n", "Requirement already satisfied: six>=1.12.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (1.16.0)\r\n", "Requirement already satisfied: termcolor>=1.1.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (2.4.0)\r\n", "Requirement already satisfied: typing-extensions>=3.6.6 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (4.12.2)\r\n", "Requirement already satisfied: wrapt>=1.11.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (1.16.0)\r\n", "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (1.65.1)\r\n", "Requirement already satisfied: tensorboard<2.18,>=2.17 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (2.17.0)\r\n", "Requirement already satisfied: keras>=3.2.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (3.4.1)\r\n", "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (0.37.1)\r\n", "Requirement already satisfied: numpy<2.0.0,>=1.23.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow>=2.0.0->tensorflow-model-remediation) (1.26.4)\r\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow-model-remediation) (2.9.0.post0)\r\n", "Requirement already satisfied: pytz>=2020.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow-model-remediation) (2024.1)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: tzdata>=2022.7 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from pandas->tensorflow-model-remediation) (2024.1)\r\n", "Requirement already satisfied: tf-keras>=2.14.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorflow-hub->tensorflow-model-remediation) (2.17.0)\r\n", "Requirement already satisfied: wheel<1.0,>=0.23.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from astunparse>=1.6.0->tensorflow>=2.0.0->tensorflow-model-remediation) (0.43.0)\r\n", "Requirement already satisfied: rich in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from keras>=3.2.0->tensorflow>=2.0.0->tensorflow-model-remediation) (13.7.1)\r\n", "Requirement already satisfied: namex in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from keras>=3.2.0->tensorflow>=2.0.0->tensorflow-model-remediation) (0.0.8)\r\n", "Requirement already satisfied: optree in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from keras>=3.2.0->tensorflow>=2.0.0->tensorflow-model-remediation) (0.12.1)\r\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow>=2.0.0->tensorflow-model-remediation) (3.3.2)\r\n", "Requirement already satisfied: 
idna<4,>=2.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow>=2.0.0->tensorflow-model-remediation) (3.7)\r\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow>=2.0.0->tensorflow-model-remediation) (2.2.2)\r\n", "Requirement already satisfied: certifi>=2017.4.17 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from requests<3,>=2.21.0->tensorflow>=2.0.0->tensorflow-model-remediation) (2024.7.4)\r\n", "Requirement already satisfied: markdown>=2.6.8 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.18,>=2.17->tensorflow>=2.0.0->tensorflow-model-remediation) (3.6)\r\n", "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.18,>=2.17->tensorflow>=2.0.0->tensorflow-model-remediation) (0.7.2)\r\n", "Requirement already satisfied: werkzeug>=1.0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from tensorboard<2.18,>=2.17->tensorflow>=2.0.0->tensorflow-model-remediation) (3.0.3)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: importlib-metadata>=4.4 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown>=2.6.8->tensorboard<2.18,>=2.17->tensorflow>=2.0.0->tensorflow-model-remediation) (8.0.0)\r\n", "Requirement already satisfied: MarkupSafe>=2.1.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from werkzeug>=1.0.1->tensorboard<2.18,>=2.17->tensorflow>=2.0.0->tensorflow-model-remediation) (2.1.5)\r\n", "Requirement already satisfied: markdown-it-py>=2.2.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from rich->keras>=3.2.0->tensorflow>=2.0.0->tensorflow-model-remediation) (3.0.0)\r\n", "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from rich->keras>=3.2.0->tensorflow>=2.0.0->tensorflow-model-remediation) (2.18.0)\r\n", "Requirement already satisfied: zipp>=0.5 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.18,>=2.17->tensorflow>=2.0.0->tensorflow-model-remediation) (3.19.2)\r\n", "Requirement already satisfied: mdurl~=0.1 in /tmpfs/src/tf_docs_env/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich->keras>=3.2.0->tensorflow>=2.0.0->tensorflow-model-remediation) (0.1.2)\r\n", "Using cached tensorflow_model_remediation-0.1.7.1-py3-none-any.whl (142 kB)\r\n", "Using cached dill-0.3.8-py3-none-any.whl (116 kB)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Using cached mock-5.1.0-py3-none-any.whl (30 kB)\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Installing collected packages: mock, dill, tensorflow-model-remediation\r\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully installed dill-0.3.8 mock-5.1.0 tensorflow-model-remediation-0.1.7.1\r\n" ] } ], "source": [ "!pip install --upgrade tensorflow-model-remediation" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-07-19T09:53:41.086614Z", "iopub.status.busy": "2024-07-19T09:53:41.086340Z", "iopub.status.idle": "2024-07-19T09:53:43.673216Z", "shell.execute_reply": "2024-07-19T09:53:43.672442Z" }, "id": "w42tJVqpcTal" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-07-19 09:53:41.340953: E 
external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-07-19 09:53:41.361880: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-07-19 09:53:41.368395: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n" ] } ], "source": [ "import tensorflow as tf\n", "from tensorflow_model_remediation import counterfactual" ] }, { "cell_type": "markdown", "metadata": { "id": "2tpF3OEleEDr" }, "source": [ "## Create a Simple Dataset\n", "\n", "For demonstration purposes, we’ll create counterfactual data from the original input using `build_counterfactual_data`. Note that you can also construct counterfactual data from unlabeled data (as opposed to constructing it from the original input). You will create a simple dataset with one sentence, “I am a gay man”, which will serve as the `original_input`.\n", "\n", "Note: The dataset created in this tutorial is a simple list of repeated text for demonstration purposes only. Further, this tutorial only demonstrates the steps for creating a counterfactual dataset and does not represent a real-world use case.\n", "\n", "## Build a Counterfactual Dataset\n", "\n", "As this is a text classifier, you can create the counterfactual dataset with `build_counterfactual_data` in two ways:\n", "1. Remove terms: Pass `build_counterfactual_data` a list of words that will be removed from the dataset via `tf.strings.regex_replace`.\n", "2. Replace terms: Pass a custom function to `build_counterfactual_data`. This might include using more specific regex functions to replace words within your original dataset or to support non-text features.\n", "\n", "`build_counterfactual_data` takes in `original_input` and either removes or replaces terms depending on which optional parameters you pass. In most cases, removing terms (option 1) is sufficient to run CLP; however, passing a custom function (option 2) gives you more precise control over the counterfactual values.\n", "\n", "### Option 1: List of Words to Remove\n", "Pass `build_counterfactual_data` a list of identity terms to remove.\n", "\n", "When using simple regex to create the counterfactual dataset, keep in mind that this may alter words that shouldn’t be changed. It is good practice to check that the changes made to the `counterfactual_x` value make sense in the context of the `original_x` value. Additionally, `build_counterfactual_data` returns only the values that include a counterfactual instance. This could result in a dataset with a different shape from `original_input`, but it will be resized when passed to `pack_counterfactual_data`."
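, "\n", "\n", "For reference, once you have built `counterfactual_data` (as in the next cell), pairing it with your original dataset to produce the `CounterfactualPackedInputs` described above is roughly a single call. The snippet below is only a sketch: `original_input` stands in for your original `tf.data.Dataset`, and the exact signature is described in the `pack_counterfactual_data` API documentation.\n", "\n", "```python\n", "# Not run here: pair the original dataset with the counterfactual dataset.\n", "packed_data = counterfactual.keras.utils.pack_counterfactual_data(\n", "    original_input, counterfactual_data)\n", "```"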
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-07-19T09:53:43.677700Z", "iopub.status.busy": "2024-07-19T09:53:43.677252Z", "iopub.status.idle": "2024-07-19T09:53:46.121127Z", "shell.execute_reply": "2024-07-19T09:53:46.120206Z" }, "id": "xAPFGLy_fKSm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Length of starting values: 20\n", "original: tf.Tensor(b'I am a gay man0', shape=(), dtype=string)\n", "counterfactual: tf.Tensor(b'I am a man0', shape=(), dtype=string)\n", "Length of dataset after build_counterfactual_data: 10\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", "I0000 00:00:1721382824.212840 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.216631 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.220305 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.225695 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.237536 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.240949 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.244491 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.247862 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.251348 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. 
See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.254836 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.258211 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382824.261555 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.510650 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.512802 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.514940 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.517172 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.519339 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.521281 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.523290 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.525275 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. 
See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.527328 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.529283 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.531324 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.533303 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.572780 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.574793 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.576887 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.578906 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.581059 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.582998 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.584991 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. 
See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.586968 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.588999 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.591422 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.593838 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n", "I0000 00:00:1721382825.596178 23039 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n" ] } ], "source": [ "simple_dataset_x = tf.constant(\n", "    [\"I am a gay man\" + str(i) for i in range(10)] +\n", "    [\"I am a man\" + str(i) for i in range(10)])\n", "print(\"Length of starting values: \" + str(len(simple_dataset_x)))\n", "\n", "simple_dataset = tf.data.Dataset.from_tensor_slices(\n", "    (simple_dataset_x, None, None))\n", "\n", "counterfactual_data = counterfactual.keras.utils.build_counterfactual_data(\n", "    original_input=simple_dataset,\n", "    sensitive_terms_to_remove=['gay'])\n", "\n", "# Inspect the content of the TF Counterfactual Dataset\n", "for original_value, counterfactual_value, _ in counterfactual_data.take(1):\n", "  print(\"original: \", original_value)\n", "  print(\"counterfactual: \", counterfactual_value)\n", "print(\"Length of dataset after build_counterfactual_data: \" +\n", "      str(len(list(counterfactual_data))))" ] }, { "cell_type": "markdown", "metadata": { "id": "_ueC9K5qsvXH" }, "source": [ "### Option 2: Custom Function\n", "\n", "For more flexibility in how your original dataset is modified, you can instead pass a custom function to `build_counterfactual_data`.\n", "\n", "In this example, consider replacing identity terms that reference men with terms that reference women. You can do this by writing a function that replaces words according to a dictionary mapping, as shown in the code cell below.\n", "\n", "Note that the only limitation on the custom function is that it must be a callable that accepts and returns a tuple in the format used in [`Model.fit`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit), and that values without any changes should still be removed, which you can do by passing the relevant terms to `sensitive_terms_to_remove`."
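, "\n", "\n", "A custom function can also support non-text features. The following is a rough, hypothetical sketch of how such a callable might be structured; the dict-valued `x` and the `sensitive_attr` feature name are assumptions for illustration and are not part of this tutorial’s dataset. The actual replacement function for this tutorial’s text data is in the next cell.\n", "\n", "```python\n", "def neutralize_sensitive_feature(original_batch):\n", "  # Unpack (x, y, sample_weight); here x is assumed to be a dict of features.\n", "  original_x, _, original_sample_weight = (\n", "      tf.keras.utils.unpack_x_y_sample_weight(original_batch))\n", "  counterfactual_x = dict(original_x)\n", "  # Swap the hypothetical sensitive feature for a neutral default value.\n", "  counterfactual_x['sensitive_attr'] = tf.zeros_like(\n", "      original_x['sensitive_attr'])\n", "  return tf.keras.utils.pack_x_y_sample_weight(\n", "      original_x, counterfactual_x, sample_weight=original_sample_weight)\n", "```"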
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-07-19T09:53:46.125875Z", "iopub.status.busy": "2024-07-19T09:53:46.125600Z", "iopub.status.idle": "2024-07-19T09:53:46.261597Z", "shell.execute_reply": "2024-07-19T09:53:46.260773Z" }, "id": "L57yticErNJG" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Length of starting values: 20\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "original: tf.Tensor(b'I am a gay man0', shape=(), dtype=string)\n", "counterfactual: tf.Tensor(b'I am a gay man0', shape=(), dtype=string)\n", "Length of dataset after build_counterfactual_data: 10\n" ] } ], "source": [ "words_to_replace = {\"man\": \"woman\"}\n", "print(\"Length of starting values: \" + str(len(simple_dataset_x)))\n", "\n", "def replace_words(original_batch):\n", " original_x, _, original_sample_weight = (\n", " tf.keras.utils.unpack_x_y_sample_weight(original_batch))\n", " for word in words_to_replace:\n", " counterfactual_x = tf.strings.regex_replace(\n", " original_x, f'\b{word}\b', words_to_replace[word])\n", " return tf.keras.utils.pack_x_y_sample_weight(\n", " original_x, counterfactual_x, sample_weight=original_sample_weight)\n", "\n", "counterfactual_data = counterfactual.keras.utils.build_counterfactual_data(\n", " original_input=simple_dataset,\n", " sensitive_terms_to_remove=['gay'],\n", " custom_counterfactual_function=replace_words)\n", "\n", "# Inspect the content of the TF Counterfactual Dataset\n", "for original_value, counterfactual_value in counterfactual_data.take(1):\n", " print(\"original: \", original_value)\n", " print(\"counterfactual: \", counterfactual_value)\n", "print(\"Length of dataset after build_counterfactual_data: \" +\n", " str(len(list(counterfactual_data))))" ] }, { "cell_type": "markdown", "metadata": { "id": "GKOUgoE4Og76" }, "source": [ "To learn more, please see the API documents for [`build_counterfactual_data`](https://www.tensorflow.org/responsible_ai/model_remediation/api_docs/python/model_remediation/counterfactual/keras/utils/build_counterfactual_data)." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "creating_a_custom_counterfactual_dataset.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 0 }