##### Copyright 2019 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Classify structured data

<table class="tfo-notebook-buttons" align="left">
  <td>     <a target="_blank" href="https://tensorflow.google.cn/tutorials/structured_data/feature_columns"><img src="https://tensorflow.google.cn/images/tf_logo_32px.png">View on TensorFlow.org</a>
</td>
  <td>     <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/structured_data/feature_columns.ipynb"><img src="https://tensorflow.google.cn/images/colab_logo_32px.png">Run in Google Colab</a>
</td>
  <td>     <a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/tutorials/structured_data/feature_columns.ipynb">     <img src="https://tensorflow.google.cn/images/GitHub-Mark-32px.png">     View source on GitHub</a>   </td>
  <td>     <a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/tutorials/structured_data/feature_columns.ipynb"><img src="https://tensorflow.google.cn/images/download_logo_32px.png">Download notebook</a>
</td>
</table>

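> Warning: The `tf.feature_column` module described in this tutorial is not recommended for new code. [Keras preprocessing layers](https://tensorflow.google.cn/tutorials/structured_data/preprocessing_layers) cover this functionality; for migration instructions, see the [Migrating feature columns](../../guide/migrate/migrating_feature_columns.ipynb) guide. The `tf.feature_column` module was designed for use with TF1 `Estimators`. It does fall under our [compatibility guarantees](https://tensorflow.org/guide/versions), but will receive no fixes other than security vulnerabilities.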

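This tutorial uses a simplified version of the PetFinder dataset (the `petfinder-mini` CSV downloaded below). The CSV has roughly 11,500 rows. Each row describes a pet, and each column describes an attribute. We will use this information to predict whether a pet will be adopted, which is a binary classification task on this dataset. The tutorial contains complete code to:

- Load a CSV file using [Pandas](https://pandas.pydata.org/).
- Build an input pipeline to batch and shuffle the rows using [tf.data](https://tensorflow.google.cn/guide/datasets).
- Map from columns in the CSV to features used to train the model using feature columns.
- Build, train, and evaluate a model using Keras.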

## The dataset

Following is a description of this dataset. Notice there are both numeric and categorical columns. There is a free text column which we will not use in this tutorial.

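Column | Description | Feature Type | Data Type
--- | --- | --- | ---
Type | Type of animal (Dog, Cat) | Categorical | string
Age | Age of the pet | Numerical | integer
Breed1 | Primary breed of the pet | Categorical | string
Gender | Gender of the pet | Categorical | string
Color1 | Color 1 of the pet | Categorical | string
Color2 | Color 2 of the pet | Categorical | string
MaturitySize | Size at maturity | Categorical | string
FurLength | Fur length | Categorical | string
Vaccinated | Pet has been vaccinated | Categorical | string
Sterilized | Pet has been sterilized | Categorical | string
Health | Health condition | Categorical | string
Fee | Adoption fee | Numerical | integer
Description | Profile write-up for this pet | Text | string
PhotoAmt | Total uploaded photos for this pet | Numerical | integer
AdoptionSpeed | Speed of adoption | Classification | integer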

## Import TensorFlow and other libraries

In [2]:
!pip install scikit-learn


In [3]:
import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split



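## Use Pandas to create a dataframe

[Pandas](https://pandas.pydata.org/) is a Python library with many helpful utilities for loading and working with structured data. We will download the dataset from a URL with `tf.keras.utils.get_file` and load it into a dataframe with Pandas.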

In [4]:
import pathlib

dataset_url = 'http://storage.googleapis.com/download.tensorflow.org/data/petfinder-mini.zip'
csv_file = 'datasets/petfinder-mini/petfinder-mini.csv'

tf.keras.utils.get_file('petfinder_mini.zip', dataset_url,
                        extract=True, cache_dir='.')
dataframe = pd.read_csv(csv_file)




In [5]:
dataframe.head()

|   | Type | Age | Breed1 | Gender | Color1 | Color2 | MaturitySize | FurLength | Vaccinated | Sterilized | Health | Fee | Description | PhotoAmt | AdoptionSpeed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Cat | 3 | Tabby | Male | Black | White | Small | Short | No | No | Healthy | 100 | Nibble is a 3+ month old ball of cuteness. He ... | 1 | 2 |
| 1 | Cat | 1 | Domestic Medium Hair | Male | Black | Brown | Medium | Medium | Not Sure | Not Sure | Healthy | 0 | I just found it alone yesterday near my apartm... | 2 | 0 |
| 2 | Dog | 1 | Mixed Breed | Male | Brown | White | Medium | Medium | Yes | No | Healthy | 0 | Their pregnant mother was dumped by her irresp... | 7 | 3 |
| 3 | Dog | 4 | Mixed Breed | Female | Black | Brown | Medium | Short | Yes | No | Healthy | 150 | Good guard dog, very alert, active, obedience ... | 8 | 2 |
| 4 | Dog | 1 | Mixed Breed | Male | Black | No Color | Medium | Short | No | No | Healthy | 0 | This handsome yet cute boy is up for adoption.... | 3 | 2 |


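## Create target variable

The task in the original dataset is to predict how quickly a pet will be adopted (e.g., in the first week, the first month, the first three months, and so on). Let's simplify this for the tutorial by turning it into a binary classification problem that simply predicts whether the pet was adopted or not.

After modifying the label column, 0 indicates the pet was not adopted, and 1 indicates it was.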

In [6]:
# In the original dataset "4" indicates the pet was not adopted.
dataframe['target'] = np.where(dataframe['AdoptionSpeed']==4, 0, 1)

# Drop un-used columns.
dataframe = dataframe.drop(columns=['AdoptionSpeed', 'Description'])
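
As a quick illustration of the relabeling rule, the sketch below applies the same mapping to a handful of made-up `AdoptionSpeed` values in plain Python. The sample values and the helper name `to_target` are assumptions for illustration only; the tutorial itself uses `np.where` on the full column.

```python
# Hypothetical sample of AdoptionSpeed values (not taken from the dataset).
adoption_speed = [2, 0, 4, 3, 4]

def to_target(speed):
    """Return 0 when the pet was not adopted (speed 4), otherwise 1."""
    return 0 if speed == 4 else 1

target = [to_target(s) for s in adoption_speed]
print(target)  # [1, 1, 0, 1, 0]
```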

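## Split the dataframe into train, validation, and test sets

The dataset we downloaded was a single CSV file. We will split it into train, validation, and test sets.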

In [7]:
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

7383 train examples
1846 validation examples
2308 test examples
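
The two chained 20% splits leave roughly 64% of the rows for training, 16% for validation, and 20% for testing. The sketch below reproduces the printed counts with plain arithmetic. It assumes the held-out count is rounded up, which matches the numbers above; the helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math

def split_sizes(n_rows, test_fraction):
    """Return (n_kept, n_held_out), rounding the held-out count up."""
    n_held_out = math.ceil(n_rows * test_fraction)
    return n_rows - n_held_out, n_held_out

total = 7383 + 1846 + 2308              # rows in the dataframe, from the counts above
n_rest, n_test = split_sizes(total, 0.2)
n_train, n_val = split_sizes(n_rest, 0.2)
print(n_train, n_val, n_test)           # 7383 1846 2308

# Share of the full dataset that ends up in each split.
print(round(n_train / total, 2), round(n_val / total, 2), round(n_test / total, 2))
```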


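## Create an input pipeline using tf.data

Next, we will wrap the dataframes with [tf.data](https://tensorflow.google.cn/guide/datasets). This lets us use feature columns as a bridge to map from the columns in the Pandas dataframe to the features used to train the model. If we were working with a very large CSV file (so large that it does not fit into memory), we would use tf.data to read it from disk directly. That is not covered in this tutorial.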

In [8]:
# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()
  labels = dataframe.pop('target')
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds

In [9]:
batch_size = 5 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)

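## Understand the input pipeline

Now that we have created the input pipeline, let's call it to see the format of the data it returns. We have used a small batch size to keep the output readable.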

In [10]:
for feature_batch, label_batch in train_ds.take(1):
  print('Every feature:', list(feature_batch.keys()))
  print('A batch of ages:', feature_batch['Age'])
  print('A batch of targets:', label_batch)

Every feature: ['Type', 'Age', 'Breed1', 'Gender', 'Color1', 'Color2', 'MaturitySize', 'FurLength', 'Vaccinated', 'Sterilized', 'Health', 'Fee', 'PhotoAmt']
A batch of ages: tf.Tensor([ 1  1  2  4 36], shape=(5,), dtype=int64)
A batch of targets: tf.Tensor([1 1 1 0 1], shape=(5,), dtype=int64)


We can see that the dataset returns a dictionary of column names (from the dataframe) that map to column values from rows in the dataframe.
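
To make the shape of a batch concrete without TensorFlow, here is a minimal plain-Python imitation of the shuffle-then-batch behavior. The miniature columns, the labels, and the `batches` helper are all assumptions for illustration; `tf.data` does the real work in this tutorial.

```python
import random

# Hypothetical miniature dataframe, stored column-wise like dict(dataframe).
columns = {
    'Type': ['Cat', 'Dog', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog'],
    'Age': [3, 1, 4, 2, 12, 1, 6],
}
labels = [1, 1, 0, 1, 0, 1, 1]

def batches(columns, labels, batch_size, shuffle=True, seed=0):
    """Yield (feature_dict, label_list) pairs, mimicking shuffle + batch."""
    order = list(range(len(labels)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        rows = order[start:start + batch_size]
        # Each batch maps a column name to that column's values for the rows.
        features = {name: [values[i] for i in rows]
                    for name, values in columns.items()}
        yield features, [labels[i] for i in rows]

for feature_batch, label_batch in batches(columns, labels, batch_size=3):
    print(feature_batch, label_batch)
```

Each yielded pair has the same layout as the output above: a dictionary keyed by column name, plus the matching labels. The last batch is smaller because 7 rows do not divide evenly by 3.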

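## Demonstrate several types of feature columns

TensorFlow provides many types of feature columns. In this section, we will create several types of feature columns and demonstrate how they transform a column from the dataframe.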

In [11]:
# We will use this batch to demonstrate several types of feature columns
example_batch = next(iter(train_ds))[0]

In [12]:
# A utility method to create a feature column
# and to transform a batch of data
def demo(feature_column):
  feature_layer = layers.DenseFeatures(feature_column)
  print(feature_layer(example_batch).numpy())

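### Numeric columns

The output of a feature column becomes the input to the model (using the `demo` function defined above, we can see exactly how each column from the dataframe is transformed). A [numeric column](https://tensorflow.google.cn/api_docs/python/tf/feature_column/numeric_column) is the simplest type of column. It is used to represent real-valued features. When using this column, the model receives the column value from the dataframe unchanged.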

In [13]:
photo_count = feature_column.numeric_column('PhotoAmt')
demo(photo_count)

Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.


[[6.]
 [5.]
 [4.]
 [1.]
 [1.]]


In the PetFinder dataset, most columns from the dataframe are categorical.

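### Bucketized columns

Often, you don't want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a pet's age. Instead of representing age as a numeric column, we could split the age into several buckets using a [bucketized column](https://tensorflow.google.cn/api_docs/python/tf/feature_column/bucketized_column). Notice that the one-hot values below describe which age range each row matches.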

In [14]:
age = feature_column.numeric_column('Age')
age_buckets = feature_column.bucketized_column(age, boundaries=[1, 3, 5])
demo(age_buckets)


[[0. 0. 0. 1.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]]
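
The bucketing rule can be sketched in plain Python with the standard `bisect` module. This is only an illustration of the idea, not the feature column's implementation; the helper name `bucket_one_hot` and the sample ages are assumptions.

```python
from bisect import bisect_right

boundaries = [1, 3, 5]   # same boundaries as the cell above

def bucket_one_hot(value, boundaries):
    """One-hot encode the bucket that `value` falls into.

    With boundaries [1, 3, 5] the buckets are (-inf, 1), [1, 3), [3, 5)
    and [5, +inf): a value equal to a boundary goes to the bucket on its right.
    """
    index = bisect_right(boundaries, value)
    return [1.0 if i == index else 0.0 for i in range(len(boundaries) + 1)]

for age in [0, 1, 2, 4, 36]:   # hypothetical ages
    print(age, bucket_one_hot(age, boundaries))
```

Three boundaries produce four buckets, which is why each row of the output above has four entries.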


### Categorical columns

In this dataset, `Type` is represented as a string (e.g. 'Dog' or 'Cat'). We cannot feed strings directly to a model. Instead, we must first map them to numeric values. Categorical vocabulary columns provide a way to represent strings as a one-hot vector (much like the age buckets you saw above). The vocabulary can be passed as a list using [categorical_column_with_vocabulary_list](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list), or loaded from a file using [categorical_column_with_vocabulary_file](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_file).

In [15]:
animal_type = feature_column.categorical_column_with_vocabulary_list(
      'Type', ['Cat', 'Dog'])

animal_type_one_hot = feature_column.indicator_column(animal_type)
demo(animal_type_one_hot)


[[0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]]
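
A vocabulary-based one-hot encoding can be sketched in a few lines of plain Python. The helper `one_hot` is hypothetical, and returning all zeros for unknown values is a choice made for this sketch.

```python
vocabulary = ['Cat', 'Dog']   # same vocabulary list as the cell above

def one_hot(value, vocabulary):
    """Return a one-hot list for `value`; all zeros if it is not in the vocabulary."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]

for animal in ['Dog', 'Cat', 'Cat']:
    print(animal, one_hot(animal, vocabulary))
```

The position of the 1 follows the order of the vocabulary list, so `'Dog'` maps to `[0.0, 1.0]` and `'Cat'` to `[1.0, 0.0]`.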


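### Embedding columns

Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grows large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an [embedding column](https://tensorflow.google.cn/api_docs/python/tf/feature_column/embedding_column) represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1. The size of the embedding (8, in the example below) is a parameter that must be tuned.

Key point: Using an embedding column is best when a categorical column has many possible values. We are using one here for demonstration purposes, so you have a complete example you can modify for a different dataset in the future.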

In [16]:
# Notice the input to the embedding column is the categorical column
# we previously created
breed1 = feature_column.categorical_column_with_vocabulary_list(
      'Breed1', dataframe.Breed1.unique())
breed1_embedding = feature_column.embedding_column(breed1, dimension=8)
demo(breed1_embedding)


[[-0.07659347  0.10356487  0.01368913  0.02459786 -0.3640056   0.18093517
  -0.4344755  -0.31928632]
 [ 0.14273317 -0.27333534  0.31785467  0.50661033  0.16866212 -0.28168607
  -0.49861428 -0.09915882]
 [-0.24061684 -0.12300641 -0.5030474   0.15006594 -0.12581691 -0.58737135
  -0.16661155  0.35474306]
 [ 0.14273317 -0.27333534  0.31785467  0.50661033  0.16866212 -0.28168607
  -0.49861428 -0.09915882]
 [-0.36425355  0.32278156 -0.13717666  0.47004294 -0.01533458 -0.45511642
   0.3003537   0.5170619 ]]
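
Conceptually, an embedding is a lookup table with one dense row per vocabulary entry. The sketch below imitates that lookup in plain Python under stated assumptions: the three-breed vocabulary is a small sample, the random values stand in for weights that a real model would learn, and the helper name `embed` is hypothetical.

```python
import random

vocabulary = ['Tabby', 'Mixed Breed', 'Domestic Medium Hair']  # sample breeds
dimension = 8

# One row of `dimension` floats per vocabulary entry. Random values stand in
# for the initial weights; in a real model they are learned during training.
rng = random.Random(0)
embedding_table = [[rng.uniform(-0.5, 0.5) for _ in range(dimension)]
                   for _ in vocabulary]

def embed(value):
    """Look up the dense vector for a category by its vocabulary index."""
    return embedding_table[vocabulary.index(value)]

batch = ['Mixed Breed', 'Tabby', 'Mixed Breed']
vectors = [embed(breed) for breed in batch]
print(len(vectors[0]))            # 8
print(vectors[0] == vectors[2])   # True
```

Equal categories always map to the same vector, which is presumably why the second and fourth rows of the output above are identical.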


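### Hashed feature columns

Another way to represent a categorical column with a large number of values is to use a [categorical_column_with_hash_bucket](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket). This feature column calculates a hash value of the input, then selects one of the `hash_bucket_size` buckets to encode a string. When using this column, you do not need to provide the vocabulary, and you can choose to make the number of hash buckets significantly smaller than the number of actual categories to save space.

Key point: An important downside of this technique is that there may be collisions in which different strings are mapped to the same bucket. In practice, this can work well for some datasets regardless.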

In [17]:
breed1_hashed = feature_column.categorical_column_with_hash_bucket(
      'Breed1', hash_bucket_size=10)
demo(feature_column.indicator_column(breed1_hashed))


[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
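
The hashing trick and its collisions can be sketched with the standard library. CRC32 is used here only as a stand-in: TensorFlow uses its own hash function, so the bucket indices below will not match the feature column's output. The helper `hash_bucket` and the `breed_N` strings are hypothetical.

```python
import zlib

def hash_bucket(value, hash_bucket_size):
    """Map a string to a bucket index in [0, hash_bucket_size)."""
    return zlib.crc32(value.encode('utf-8')) % hash_bucket_size

# Hypothetical category values: 11 distinct strings cannot fit into
# 10 buckets without at least two of them sharing a bucket.
values = ['breed_%d' % i for i in range(11)]
buckets = [hash_bucket(v, 10) for v in values]
print(buckets)
print(len(set(buckets)) < len(values))   # True: at least one collision
```

No vocabulary is needed, and the memory cost is fixed by `hash_bucket_size`; the price is that colliding categories become indistinguishable to the model.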


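### Crossed feature columns

Combining features into a single feature, better known as [feature crosses](https://developers.google.com/machine-learning/glossary/#feature_cross), enables a model to learn separate weights for each combination of features. Here, we will create a new feature that is the cross of `Age` and `Type`. Note that `crossed_column` does not build the full table of all possible combinations (which could be very large). Instead, it hashes each combination into one of `hash_bucket_size` buckets, so you can choose how large the table is.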

In [18]:
crossed_feature = feature_column.crossed_column([age_buckets, animal_type], hash_bucket_size=10)
demo(feature_column.indicator_column(crossed_feature))

Instructions for updating:
Use `tf.keras.layers.experimental.preprocessing.HashedCrossing` instead for feature crossing when preprocessing data to train a Keras model.


[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
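
A feature cross can be imitated by joining the two component values into one key and hashing that key. This is a sketch of the idea only: the key format, the CRC32 hash, and the helper name `crossed_bucket` are assumptions and differ from what `crossed_column` does internally.

```python
import zlib
from bisect import bisect_right

boundaries = [1, 3, 5]    # age bucket boundaries from the bucketized column
hash_bucket_size = 10

def crossed_bucket(age, animal_type):
    """Hash the (age bucket, animal type) pair into one of the buckets."""
    age_bucket = bisect_right(boundaries, age)
    key = '%d_X_%s' % (age_bucket, animal_type)
    return zlib.crc32(key.encode('utf-8')) % hash_bucket_size

for age, animal in [(2, 'Cat'), (2, 'Dog'), (1, 'Cat')]:
    print(age, animal, crossed_bucket(age, animal))
```

Ages 1 and 2 fall into the same age bucket, so `(1, 'Cat')` and `(2, 'Cat')` always produce the same crossed index, while the combination with `'Dog'` is hashed separately.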


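## Choose which columns to use

We have seen how to use several types of feature columns. Now we will use them to train a model. The goal of this tutorial is to show you the complete code (i.e., the mechanics) needed to work with feature columns. We have selected a few columns to train our model below arbitrarily.

Key point: If your aim is to build an accurate model, try a larger dataset of your own, and think carefully about which features are the most meaningful to include, and how they should be represented.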

In [19]:
feature_columns = []

# numeric cols
for header in ['PhotoAmt', 'Fee', 'Age']:
  feature_columns.append(feature_column.numeric_column(header))

In [20]:
# bucketized cols
age = feature_column.numeric_column('Age')
age_buckets = feature_column.bucketized_column(age, boundaries=[1, 2, 3, 4, 5])
feature_columns.append(age_buckets)

In [21]:
# indicator_columns
indicator_column_names = ['Type', 'Color1', 'Color2', 'Gender', 'MaturitySize',
                          'FurLength', 'Vaccinated', 'Sterilized', 'Health']
for col_name in indicator_column_names:
  categorical_column = feature_column.categorical_column_with_vocabulary_list(
      col_name, dataframe[col_name].unique())
  indicator_column = feature_column.indicator_column(categorical_column)
  feature_columns.append(indicator_column)

In [22]:
# embedding columns
breed1 = feature_column.categorical_column_with_vocabulary_list(
      'Breed1', dataframe.Breed1.unique())
breed1_embedding = feature_column.embedding_column(breed1, dimension=8)
feature_columns.append(breed1_embedding)

In [23]:
# crossed columns
age_type_feature = feature_column.crossed_column([age_buckets, animal_type], hash_bucket_size=100)
feature_columns.append(feature_column.indicator_column(age_type_feature))

### Create a feature layer

Now that we have defined our feature columns, we will use a [DenseFeatures](https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/keras/layers/DenseFeatures) layer to input them to our Keras model.

In [24]:
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)

Earlier, we used a small batch size to demonstrate how feature columns worked. We now create a new input pipeline with a larger batch size.

In [25]:
batch_size = 32
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
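
The batch size determines how many steps Keras runs per epoch. As a rough check, the sketch below computes the number of batches for the split sizes printed earlier, assuming the final partial batch is kept (the default behavior of `batch`). The helper name `steps_per_epoch` is hypothetical.

```python
import math

def steps_per_epoch(n_examples, batch_size):
    """Number of batches needed to cover every example once."""
    return math.ceil(n_examples / batch_size)

print(steps_per_epoch(7383, 32))   # 231 training batches
print(steps_per_epoch(2308, 32))   # 73 test batches
print(7383 % 32)                   # 23 examples in the final training batch
```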

## Create, compile, and train the model

In [26]:
model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dropout(.1),
  layers.Dense(1)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_ds,
          validation_data=val_ds,
          epochs=10)

Epoch 1/10
 47/231 [=====>........................] - ETA: 6s - loss: 0.6991 - accuracy: 0.6529
Epoch 2/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.5751 - accuracy: 0.7073
Epoch 3/10
 47/231 [=====>........................] - ETA: 1s - loss: 0.5188 - accuracy: 0.7307
Epoch 4/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.5124 - accuracy: 0.7423
Epoch 5/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.4899 - accuracy: 0.7411
Epoch 6/10
 50/231 [=====>........................] - ETA: 1s - loss: 0.4998 - accuracy: 0.7425
Epoch 7/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.4684 - accuracy: 0.7551
Epoch 8/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.4792 - accuracy: 0.7564
Epoch 9/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.4669 - accuracy: 0.7366
Epoch 10/10
 49/231 [=====>........................] - ETA: 1s - loss: 0.4576 - accuracy: 0.7589
<keras.src.callbacks.History at 0x7f40d7036e80>

In [27]:
loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

Accuracy 0.7352686524391174
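
The final `Dense(1)` layer has no activation and the loss is configured with `from_logits=True`, so the model outputs raw logits, not probabilities. A minimal sketch of how a logit relates to a probability and to the per-example binary cross-entropy follows. The helper names and the sample logits are assumptions for illustration; Keras computes the loss in a numerically more careful way.

```python
import math

def sigmoid(logit):
    """Convert a raw model output (logit) into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def binary_cross_entropy_from_logit(label, logit):
    """Per-example loss for a 0/1 label, computed from the logit."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

for logit in [-2.0, 0.0, 2.0]:   # hypothetical model outputs
    print(logit, round(sigmoid(logit), 3),
          round(binary_cross_entropy_from_logit(1, logit), 3))
```

A logit of 0 corresponds to a probability of 0.5; for a positive label the loss shrinks as the logit grows.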


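Key point: You will typically see the best results with deep learning on much larger and more complex datasets. When working with a small dataset like this one, we recommend using a decision tree or random forest as a strong baseline. The goal of this tutorial is not to train an accurate model, but to demonstrate the mechanics of working with structured data, so you have code to use as a starting point when working with your own datasets in the future.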

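## Next steps

The best way to learn more about classifying structured data is to try it yourself. We suggest finding another dataset to work with, and training a model to classify it using code similar to the above. To improve accuracy, think carefully about which features to include in your model, and how they should be represented.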