Accessing the Taskmaster-1 dataset

The full Taskmaster-1 dialog dataset contains 13,215 English dialogs: 7,708 written and 5,507 spoken. A full description of the data is provided in readme.txt. For a basic idea of the dialog content, see sample.json. The annotation schema is given in ontology.json.

The data can be downloaded from the command line with gsutil. Installation instructions for gsutil are available in the Google Cloud documentation. Note that gsutil can be installed as a standalone tool if you don't want to download the Google Cloud SDK. Once it is installed, replace <gsutil_path> in the command below with the path of the data you want (listed in the table) and <path_to_destination> with a local destination.

gsutil cp <gsutil_path> <path_to_destination>

| Data | Description | Path |
|---|---|---|
| self-dialogs.json | Dialogs written by one person. | gs://dialog-data-corpus/TASKMASTER-1-2019/self-dialogs.json |
| woz-dialogs.json | Dialogs collected using a Wizard of Oz approach. | gs://dialog-data-corpus/TASKMASTER-1-2019/woz-dialogs.json |
| instructions/ | Instructions used to create dialogs. | gs://dialog-data-corpus/TASKMASTER-1-2019/instructions/* |
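
If you would rather script the per-file download than type each command, the following is a minimal Python sketch. It only wraps the same `gsutil cp` command shown above. The helper names `build_cp_command` and `download` are hypothetical and not part of the dataset or of gsutil, and the sketch assumes gsutil is already installed and on your PATH.

```python
import shutil
import subprocess

# Bucket prefix copied from the paths listed in the table.
BUCKET = "gs://dialog-data-corpus/TASKMASTER-1-2019"


def build_cp_command(name, destination):
    """Return the argument list for `gsutil cp` on one file (hypothetical helper)."""
    return ["gsutil", "cp", f"{BUCKET}/{name}", destination]


def download(name, destination="."):
    """Copy one file from the bucket to `destination` (hypothetical helper)."""
    # Assumption: gsutil is installed and reachable on PATH.
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil not found on PATH; install it first")
    # check=True raises CalledProcessError if gsutil exits with an error.
    subprocess.run(build_cp_command(name, destination), check=True)
```

Calling `download("self-dialogs.json")` and `download("woz-dialogs.json")` would then fetch the two dialog files into the current directory.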

To download the entire dataset and related documentation (~80MB), you can run:

gsutil -m cp -R gs://dialog-data-corpus/TASKMASTER-1-2019 <path_to_destination>
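
After downloading, a quick sanity check is to load a file and count its top-level entries. The sketch below is illustrative only: it assumes each dialog file is a single JSON document whose top level is a list or object, which this page does not state (readme.txt is the authoritative description). The function name `summarize` is hypothetical.

```python
import json


def summarize(path):
    """Return (top-level JSON type name, number of top-level entries)."""
    # Assumption: the file holds one JSON document with a sized top level
    # (list or dict); see readme.txt for the actual layout.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return type(data).__name__, len(data)
```

For example, `summarize("TASKMASTER-1-2019/self-dialogs.json")` would report how many entries that file holds. If each file turns out to be a list of dialogs, the counts for the two dialog files should add up to the 13,215 total stated above.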

Comments or questions? Join taskmaster-datasets@googlegroups.com to discuss.