# TFDS CLI

TFDS CLI is a command-line tool for working with TensorFlow Datasets. It provides commands to create a new dataset from a template (`tfds new`) and to download and prepare datasets (`tfds build`).

Copyright 2020 The TensorFlow Datasets Authors, Licensed under the Apache License, Version 2.0

##### Disable TF logs on import


In [1]:
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import

## Installation

The CLI tool is installed with `tensorflow-datasets` (or `tfds-nightly`).

In [2]:
!pip install -q tfds-nightly apache-beam
!tfds --version

TensorFlow Datasets: 4.9.3+nightly


For the list of all CLI commands:

In [3]:
!tfds --help

usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.


## `tfds new`: Implementing a new Dataset

This command will help you kickstart writing your new Python dataset by creating
a `<dataset_name>/` directory containing default implementation files.

Usage:

In [4]:
!tfds new my_dataset

Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.


`tfds new my_dataset` will create:

In [5]:
ls -1 my_dataset/

CITATIONS.bib
README.md
TAGS.txt
__init__.py
checksums.tsv
[0m[01;34mdummy_data[0m/
my_dataset_dataset_builder.py
my_dataset_dataset_builder_test.py
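
To confirm that `tfds new` produced the full template, the listing above can be checked programmatically. The sketch below is only an illustration: the helper names `expected_template_entries` and `missing_template_entries` are hypothetical, and the file list is copied from the listing above, so it may differ in other TFDS versions.

```python
from pathlib import Path


def expected_template_entries(dataset_name: str) -> list[str]:
    """Entries shown in the `ls -1 my_dataset/` listing (assumed, version-dependent)."""
    return [
        "CITATIONS.bib",
        "README.md",
        "TAGS.txt",
        "__init__.py",
        "checksums.tsv",
        "dummy_data",
        f"{dataset_name}_dataset_builder.py",
        f"{dataset_name}_dataset_builder_test.py",
    ]


def missing_template_entries(dataset_dir) -> list[str]:
    """Return the expected entries that do not exist under `dataset_dir`."""
    root = Path(dataset_dir).resolve()
    # The directory name doubles as the dataset name, as in `tfds new my_dataset`.
    return [
        name
        for name in expected_template_entries(root.name)
        if not (root / name).exists()
    ]


if __name__ == "__main__":
    # An empty list means every expected file and folder is present.
    print(missing_template_entries("my_dataset"))
```

An empty result means the directory matches the listing; any returned names point to files that were not generated or were removed.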


The optional `--data_format` flag generates a format-specific dataset builder (e.g., `conll`). If no data format is given, the command generates a template for a standard `tfds.core.GeneratorBasedBuilder`.
Refer to the [documentation](https://www.tensorflow.org/datasets/format_specific_dataset_builders) for details on the available format-specific dataset builders.

See the [guide to writing datasets](https://www.tensorflow.org/datasets/add_dataset)
for more information.

Available options:

In [6]:
!tfds new --help

usage: tfds new [-h] [--helpfull] [--data_format {standard,conll,conllu}]
                [--dir DIR]
                dataset_name

positional arguments:
  dataset_name          Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --data_format {standard,conll,conllu}
                        Optional format of the input data, which is used to
                        generate a format-specific template.
  --dir DIR             Path where the dataset directory will be created.
                        Defaults to current directory.
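
When `tfds new` is driven from a script, the options above can be assembled into an argument list before handing it to `subprocess.run`. The following is a minimal sketch under stated assumptions: `tfds_new_command` is a hypothetical helper (not part of TFDS), and the snake_case regular expression is only an approximation of what the CLI accepts. The flag names and `--data_format` choices are taken from the help output above.

```python
import re

# Choices listed by `tfds new --help` above.
DATA_FORMATS = ("standard", "conll", "conllu")

# Assumed approximation of snake_case; the CLI's own validation may differ.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def tfds_new_command(dataset_name, data_format=None, out_dir=None):
    """Build the argv list for `tfds new` without running it."""
    if not _SNAKE_CASE.fullmatch(dataset_name):
        raise ValueError(f"dataset name should be snake_case: {dataset_name!r}")
    if data_format is not None and data_format not in DATA_FORMATS:
        raise ValueError(f"data_format must be one of {DATA_FORMATS}")
    cmd = ["tfds", "new", dataset_name]
    if data_format is not None:
        cmd.append(f"--data_format={data_format}")
    if out_dir is not None:
        cmd.append(f"--dir={out_dir}")
    return cmd


if __name__ == "__main__":
    cmd = tfds_new_command("my_conll_dataset", data_format="conll", out_dir="/tmp/datasets")
    print(cmd)
    # To actually create the dataset directory (requires TFDS to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the list separately from executing it keeps the validation testable on machines where `tfds` is not installed.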


## `tfds build`: Download and prepare a dataset

Use `tfds build <my_dataset>` to generate a new dataset. `<my_dataset>` can be:

* A path to a `dataset/` folder or a `dataset.py` file (omit the argument to use the current directory):
  * `tfds build datasets/my_dataset/`
  * `cd datasets/my_dataset/ && tfds build`
  * `cd datasets/my_dataset/ && tfds build my_dataset`
  * `cd datasets/my_dataset/ && tfds build my_dataset.py`

* A registered dataset:

  * `tfds build mnist`
  * `tfds build my_dataset --imports my_project.datasets`

Note: `tfds build` has useful flags to help with prototyping and debugging. See the `Debug & tests:` section below.
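
The invocation forms listed above can also be composed from Python, for example in a small build script. This is a hedged sketch: `tfds_build_command` is a hypothetical helper, not a TFDS API. The flags it emits (`--imports`, `--data_dir`, `--overwrite`, `--max_examples_per_split`) appear in the `tfds build --help` usage line, but their exact behavior should be checked against the full help text.

```python
def tfds_build_command(
    target=None,
    *,
    imports=None,
    data_dir=None,
    overwrite=False,
    max_examples_per_split=None,
):
    """Build the argv list for `tfds build` without running it.

    `target` is a dataset path or a registered name; None means
    "build the dataset in the current directory".
    """
    cmd = ["tfds", "build"]
    if target is not None:
        cmd.append(target)
    if imports is not None:
        cmd.append(f"--imports={imports}")
    if data_dir is not None:
        cmd.append(f"--data_dir={data_dir}")
    if overwrite:
        cmd.append("--overwrite")
    if max_examples_per_split is not None:
        cmd.append(f"--max_examples_per_split={max_examples_per_split}")
    return cmd


if __name__ == "__main__":
    # Registered dataset defined in a custom module, as in the list above.
    print(tfds_build_command("my_dataset", imports="my_project.datasets"))
    # Small build for prototyping; the limit of 10 is an arbitrary example value.
    print(tfds_build_command("datasets/my_dataset/", overwrite=True, max_examples_per_split=10))
    # To run a command (requires TFDS to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

The `--flag=value` form is used so that a value is never mistaken for a positional dataset argument.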

Available options:

In [7]:
!tfds build --help

usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite] [--fail_if_exists]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--download_only]
                  [--config CONFIG] [--config_idx CONFIG_IDX]
                  [--update_metadata_only] [--download_config DOWNLOAD_CONFIG]
                  [--imports IMPORTS] [--register_checksums]
                  [--force_checksums_validation]
                  [--noforce_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--max_shard_size_mb MAX_SHARD_SIZE_MB]
                  [--num-processes NUM_PROCESSES] [--publish_dir PUBLISH_DIR]
 