##### Copyright 2019 The TensorFlow Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Load a pandas.DataFrame

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/tutorials/load_data/pandas_dataframe"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/pt-br/tutorials/load_data/pandas_dataframe.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs-l10n/blob/master/site/pt-br/tutorials/load_data/pandas_dataframe.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/pt-br/tutorials/load_data/pandas_dataframe.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

This tutorial provides an example of how to load a pandas DataFrame into a `tf.data.Dataset`.

This tutorial uses a small [dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease) provided by the Cleveland Clinic Foundation for Heart Disease. There are several hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. We will use this information to predict whether a patient has heart disease, which in this dataset is a binary classification task.

## Read the data using pandas

In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import pandas as pd
import tensorflow as tf

Download the CSV file containing the heart dataset.

In [3]:
csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/applied-dl/heart.csv')

Downloading data from https://storage.googleapis.com/applied-dl/heart.csv



Read the CSV file using pandas.

In [4]:
df = pd.read_csv(csv_file)

In [5]:
df.head()

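|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0 |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1 |
| 2 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2 | reversible | 0 |
| 3 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0 | normal | 0 |
| 4 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0 | normal | 0 |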


In [6]:
df.dtypes

age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object

Convert the `thal` column, which is an `object` in the dataframe, to a discrete numerical value.

In [7]:
df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes

In [8]:
df.head()

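|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | 2 | 0 |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | 3 | 1 |
| 2 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2 | 4 | 0 |
| 3 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0 | 3 | 0 |
| 4 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0 | 3 | 0 |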
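The integer codes produced for `thal` are positions in the column's list of categories. The following is a minimal sketch on a made-up four-row frame (the `toy` data and the `code_to_label` name are illustrative assumptions, not part of the tutorial). It shows how to inspect that mapping so the codes can be turned back into labels later.

```python
import pandas as pd

# Toy column standing in for `thal`; the values are illustrative only.
toy = pd.DataFrame({'thal': ['fixed', 'normal', 'reversible', 'normal']})

categorical = pd.Categorical(toy['thal'])

# Inferred string categories are sorted, and each code is an index into them.
categories = categorical.categories.tolist()
codes = categorical.codes.tolist()

# Keep this mapping if you need to translate codes back into labels.
code_to_label = dict(enumerate(categories))

print(categories)
print(codes)
print(code_to_label)
```

Because the codes depend on which categories are present, the same label can receive a different code in a different dataset. In the real CSV the codes start at 2 for `fixed`, which suggests the column holds other values that sort before it.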


## Load data using `tf.data.Dataset`

Use `tf.data.Dataset.from_tensor_slices` to read the values from a pandas dataframe.

One of the advantages of using `tf.data.Dataset` is that it lets you write simple, highly efficient data pipelines. Read the [loading data guide](https://www.tensorflow.org/guide/data) to learn more.

In [9]:
target = df.pop('target')

In [10]:
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))

In [11]:
for feat, targ in dataset.take(5):
  print('Features: {}, Target: {}'.format(feat, targ))

Features: [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   2. ], Target: 0
Features: [ 67.    1.    4.  160.  286.    0.    2.  108.    1.    1.5   2.    3.
   3. ], Target: 1
Features: [ 67.    1.    4.  120.  229.    0.    2.  129.    1.    2.6   2.    2.
   4. ], Target: 0
Features: [ 37.    1.    3.  130.  250.    0.    0.  187.    0.    3.5   3.    0.
   3. ], Target: 0
Features: [ 41.    0.    2.  130.  204.    0.    2.  172.    0.    1.4   1.    0.
   3. ], Target: 0
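The features above print as floats even though most columns are integers. A likely reason is that `DataFrame.values` returns a single NumPy array, so mixed integer and float columns are upcast to one common dtype. The sketch below shows this on a hypothetical two-column frame (the `toy` data is made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column.
toy = pd.DataFrame({
    'age': pd.Series([63, 67], dtype='int64'),
    'oldpeak': pd.Series([2.3, 1.5], dtype='float64'),
})

# Each column keeps its own dtype inside the DataFrame...
per_column = {name: str(dtype) for name, dtype in toy.dtypes.items()}

# ...but .values must fit everything into one array, so ints become floats.
combined = toy.values

print(per_column)
print(combined.dtype, combined.shape)
```

If the per-column dtypes matter, slicing a dictionary of columns instead of one array is one way to keep them.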


Because a `pd.Series` implements the `__array__` protocol, it can be used transparently almost anywhere you would use a `np.array` or a `tf.Tensor`.

In [12]:
tf.constant(df['thal'])

<tf.Tensor: shape=(303,), dtype=int8, numpy=
array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3,
       3, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 4, 2, 4, 3, 4, 3, 4, 4,
       2, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4,
       4, 2, 3, 3, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 4, 4, 4,
       3, 3, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4,
       3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 2, 4, 4, 2, 3, 3, 4, 4, 3, 4,
       3, 3, 4, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
       4, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 2,
       4, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 2, 2, 4, 3, 4, 2, 4, 3,
       3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 2, 2, 4, 3, 4, 3, 2, 4, 3, 3, 2,
       4, 4, 4, 4, 3, 0, 3, 3, 3, 3, 1, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4,
       3, 3, 4, 4, 4, 4, 3, 3, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3,
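As a small illustration of the `__array__` protocol, the sketch below converts a `pd.Series` with `np.asarray` and then does the same for a tiny hand-written class. The `Celsius` class is hypothetical and exists only to show that any object exposing `__array__` can be converted this way.

```python
import numpy as np
import pandas as pd

series = pd.Series([2, 3, 4, 3], name='thal', dtype='int8')

# np.asarray asks the object for its array form via __array__.
as_array = np.asarray(series)
total = np.sum(series)


class Celsius:
    """Hypothetical container, used only to demonstrate the protocol."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Return a fresh ndarray; `copy` is accepted for newer NumPy versions.
        return np.array(self._values, dtype=dtype)


custom = np.asarray(Celsius([20.5, 21.0]))

print(type(as_array).__name__, as_array.dtype, int(total))
print(custom)
```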
      

Shuffle and batch the dataset.

In [13]:
train_dataset = dataset.shuffle(len(df)).batch(1)
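To see what `shuffle` and `batch` do to the elements, here is a sketch on a hypothetical 10-row dataset. The arrays are made up, and the batch size of 4 is chosen only so that the shorter final batch is visible; the tutorial itself uses a batch size of 1.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 10 rows with 2 features each, plus one label per row.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.arange(10, dtype=np.int64)

dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# A shuffle buffer as large as the dataset mixes all rows before batching.
batched = dataset.shuffle(buffer_size=len(labels)).batch(4)

# The last batch is smaller because 10 is not a multiple of 4.
batch_shapes = [x.numpy().shape for x, _ in batched]

# Every label still appears exactly once per pass, only the order changes.
seen_labels = sorted(int(v) for _, y in batched for v in y.numpy())

print(batch_shapes)
print(seen_labels)
```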

## Create and train a model

In [14]:
def get_compiled_model():
  model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
  ])

  model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model

In [15]:
model = get_compiled_model()
model.fit(train_dataset, epochs=15)

Epoch 1/15

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

  1/303 [..............................] - ETA: 0s - loss: 1.5467e-06 - accuracy: 1.0000
 28/303 [=>............................] - ETA: 0s - loss: 4.3005 - accuracy: 0.6786
 56/303 [====>.........................] - ETA: 0s - loss: 3.9851 - accuracy: 0.6964
Epoch 2/15
  1/303 [..............................] - ETA: 0s - loss: 1.9297e-07 - accuracy: 1.0000
 31/303 [==>...........................] - ETA: 0s - loss: 1.1653 - accuracy: 0.8387
 59/303 [====>.........................] - ETA: 0s - loss: 1.1033 - accuracy: 0.8475
Epoch 3/15
  1/303 [..............................] - ETA: 0s - loss: 2.9271e-05 - accuracy: 1.0000
 29/303 [=>............................] - ETA: 0s - loss: 1.5278 - accuracy: 0.7931
 57/303 [====>.........................] - ETA: 0s - loss: 1.3368 - accuracy: 0.7544
Epoch 4/15
  1/303 [..............................] - ETA: 0s - loss: 2.9582e-04 - accuracy: 1.0000
 29/303 [=>............................] - ETA: 0s - loss: 1.4422 - accuracy: 0.7586
 58/303 [====>.........................] - ETA: 0s - loss: 1.2606 - accuracy: 0.7241
Epoch 5/15
  1/303 [..............................] - ETA: 0s - loss: 0.0429 - accuracy: 1.0000
 30/303 [=>............................] - ETA: 0s - loss: 0.7618 - accuracy: 0.7667
 59/303 [====>.........................] - ETA: 0s - loss: 1.1391 - accuracy: 0.7288
Epoch 6/15
  1/303 [..............................] - ETA: 0s - loss: 1.7463 - accuracy: 0.0000e+00
 29/303 [=>............................] - ETA: 0s - loss: 0.3252 - accuracy: 0.8966
 58/303 [====>.........................] - ETA: 0s - loss: 0.4544 - accuracy: 0.8621
Epoch 7/15
  1/303 [..............................] - ETA: 0s - loss: 0.0242 - accuracy: 1.0000
 30/303 [=>............................] - ETA: 0s - loss: 0.3259 - accuracy: 0.8000
 59/303 [====>.........................] - ETA: 0s - loss: 0.5161 - accuracy: 0.7966
Epoch 8/15
  1/303 [..............................] - ETA: 0s - loss: 0.0503 - accuracy: 1.0000
 30/303 [=>............................] - ETA: 0s - loss: 0.6562 - accuracy: 0.8000
 59/303 [====>.........................] - ETA: 0s - loss: 0.7946 - accuracy: 0.7627
Epoch 9/15
  1/303 [..............................] - ETA: 0s - loss: 0.0027 - accuracy: 1.0000
 31/303 [==>...........................] - ETA: 0s - loss: 0.4270 - accuracy: 0.8065
 60/303 [====>.........................] - ETA: 0s - loss: 0.5072 - accuracy: 0.8000
Epoch 10/15
  1/303 [..............................] - ETA: 0s - loss: 1.3689 - accuracy: 0.0000e+00
 30/303 [=>............................] - ETA: 0s - loss: 0.3818 - accuracy: 0.8333
 58/303 [====>.........................] - ETA: 0s - loss: 0.3681 - accuracy: 0.8621
Epoch 11/15
  1/303 [..............................] - ETA: 0s - loss: 4.1361e-04 - accuracy: 1.0000
 29/303 [=>............................] - ETA: 0s - loss: 0.5946 - accuracy: 0.7586
 57/303 [====>.........................] - ETA: 0s - loss: 0.7557 - accuracy: 0.7368
Epoch 12/15
  1/303 [..............................] - ETA: 0s - loss: 0.4278 - accuracy: 1.0000
 29/303 [=>............................] - ETA: 0s - loss: 0.3258 - accuracy: 0.8621
 57/303 [====>.........................] - ETA: 0s - loss: 0.3266 - accuracy: 0.8772
Epoch 13/15
  1/303 [..............................] - ETA: 0s - loss: 9.9377e-05 - accuracy: 1.0000
 30/303 [=>............................] - ETA: 0s - loss: 0.4766 - accuracy: 0.9000
 58/303 [====>.........................] - ETA: 0s - loss: 0.9561 - accuracy: 0.7931
Epoch 14/15
  1/303 [..............................] - ETA: 0s - loss: 0.3864 - accuracy: 1.0000
 29/303 [=>............................] - ETA: 0s - loss: 0.4538 - accuracy: 0.8621
 57/303 [====>.........................] - ETA: 0s - loss: 0.4616 - accuracy: 0.8421
Epoch 15/15
  1/303 [..............................] - ETA: 0s - loss: 2.9692 - accuracy: 0.0000e+00
 30/303 [=>............................] - ETA: 0s - loss: 0.6236 - accuracy: 0.7667
 58/303 [====>.........................] - ETA: 0s - loss: 0.6608 - accuracy: 0.7586

<tensorflow.python.keras.callbacks.History at 0x7f3f5f32c710>
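The final `Dense(1)` layer has no activation and the loss is built with `from_logits=True`, so the model outputs raw logits rather than probabilities. The NumPy sketch below shows, under that reading, how logits map to probabilities and what a binary cross-entropy on logits looks like. The helper names and example values are assumptions for illustration, and this is not Keras's own implementation.

```python
import numpy as np


def sigmoid(logits):
    """Map raw logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logits))


def binary_crossentropy_from_logits(labels, logits):
    """Mean binary cross-entropy computed directly from logits.

    Uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)).
    """
    per_example = (np.maximum(logits, 0.0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))
    return per_example.mean()


# Hypothetical model outputs and true labels.
logits = np.array([-2.0, 0.0, 3.0])
labels = np.array([0.0, 1.0, 1.0])

probabilities = sigmoid(logits)
predicted = (probabilities > 0.5).astype(int)
loss = binary_crossentropy_from_logits(labels, logits)

print(probabilities)
print(predicted)
print(loss)
```

To read the output of `model.predict` as probabilities, apply a sigmoid to it first (for example with `tf.sigmoid`).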

## Alternative to feature columns

Passing a dictionary as input to a model is as easy as creating a matching dictionary of `tf.keras.layers.Input` layers, applying any preprocessing, and stacking them using the [functional API](../../guide/keras/functional.ipynb). You can use this as an alternative to [feature columns](../keras/feature_columns.ipynb).

In [16]:
inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)

x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1)(x)

model_func = tf.keras.Model(inputs=inputs, outputs=output)

model_func.compile(optimizer='adam',
                   loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                   metrics=['accuracy'])

The easiest way to preserve the column structure of a `pd.DataFrame` when used with `tf.data` is to convert the `pd.DataFrame` to a `dict` and slice that dictionary.

In [17]:
dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)
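To make the dictionary structure concrete, here is a pandas-only sketch on a made-up three-row frame (the `toy` data is an assumption for illustration). `to_dict('list')` yields one list per column, and a single "slice" is the i-th entry of every column, keyed by column name.

```python
import pandas as pd

# Hypothetical frame standing in for the heart-disease features.
toy = pd.DataFrame({'age': [63, 67, 41], 'sex': [1, 1, 0]})

# Column name -> list of that column's values.
as_lists = toy.to_dict('list')

# Slicing along the first axis picks the same position from every column.
first_row = {name: values[0] for name, values in as_lists.items()}

print(as_lists)
print(first_row)
```

Because the keys are the column names, they line up with the `name` given to each `tf.keras.layers.Input` in the model above.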

In [18]:
for dict_slice in dict_slices.take(1):
  print(dict_slice)

({'age': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([63, 67, 67, 37, 41, 56, 62, 57, 63, 53, 57, 56, 56, 44, 52, 57],
      dtype=int32)>, 'sex': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)>, 'cp': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 4, 4, 3, 2, 2, 4, 4, 4, 4, 4, 2, 3, 2, 3, 3], dtype=int32)>, 'trestbps': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([145, 160, 120, 130, 130, 120, 140, 120, 130, 140, 140, 140, 130,
       120, 172, 150], dtype=int32)>, 'chol': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([233, 286, 229, 250, 204, 236, 268, 354, 254, 203, 192, 294, 256,
       263, 199, 168], dtype=int32)>, 'fbs': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=int32)>, 'restecg': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0], dtype=int32)>, 'thalach': <tf.Tensor: shape=(16

In [19]:
model_func.fit(dict_slices, epochs=15)

Epoch 1/15
 1/19 [>.............................] - ETA: 0s - loss: 4.6000 - accuracy: 0.7500
Epoch 2/15
 1/19 [>.............................] - ETA: 0s - loss: 1.7118 - accuracy: 0.5625
Epoch 3/15
 1/19 [>.............................] - ETA: 0s - loss: 0.7722 - accuracy: 0.6250
Epoch 4/15
 1/19 [>.............................] - ETA: 0s - loss: 0.6410 - accuracy: 0.6875
Epoch 5/15
 1/19 [>.............................] - ETA: 0s - loss: 0.6018 - accuracy: 0.6875
Epoch 6/15
 1/19 [>.............................] - ETA: 0s - loss: 0.6144 - accuracy: 0.6250
Epoch 7/15
 1/19 [>.............................] - ETA: 0s - loss: 0.6143 - accuracy: 0.6250
Epoch 8/15
 1/19 [>.............................] - ETA: 0s - loss: 0.6072 - accuracy: 0.6250
Epoch 9/15
 1/19 [>.............................] - ETA: 0s - loss: 0.6011 - accuracy: 0.6250
Epoch 10/15
 1/19 [>.............................] - ETA: 0s - loss: 0.5951 - accuracy: 0.6250
Epoch 11/15
 1/19 [>.............................] - ETA: 0s - loss: 0.5914 - accuracy: 0.6250
Epoch 12/15
 1/19 [>.............................] - ETA: 0s - loss: 0.5882 - accuracy: 0.6875
Epoch 13/15
 1/19 [>.............................] - ETA: 0s - loss: 0.5809 - accuracy: 0.6875
Epoch 14/15
 1/19 [>.............................] - ETA: 0s - loss: 0.5760 - accuracy: 0.6875
Epoch 15/15
 1/19 [>.............................] - ETA: 0s - loss: 0.5698 - accuracy: 0.6875

<tensorflow.python.keras.callbacks.History at 0x7f3f8d4789b0>