Accessing the Taskmaster-1 dataset
The full Taskmaster-1 dataset contains 13,215 English dialogs: 7,708 written and 5,507 spoken. A full description of the data is provided in readme.txt. For a basic idea of the dialog content, see sample.json. The annotation schema is in ontology.json.
The data can be downloaded from the command line with gsutil. Instructions to install gsutil can be found here. Note that gsutil can be installed as a standalone tool if you don't want to download the full Google Cloud SDK. Once it is installed, replace <gsutil_path> in the command below with the data path of the file you want (listed in the table) and <path_to_destination> with a local directory.
gsutil cp <gsutil_path> <path_to_destination>
Dialogs written by one person.
Dialogs collected using a Wizard of Oz approach.
Instructions used to create dialogs.
To download the entire dataset and related documentation (~80MB), you can run:
gsutil -m cp -R gs://dialog-data-corpus/TASKMASTER-1-2019 <path_to_destination>
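The full download can also be wrapped in a small shell function that checks for gsutil first. This is only a minimal sketch: the function name download_taskmaster and the default destination ./taskmaster-1 are assumptions made for illustration and are not part of the dataset or of gsutil.

```shell
# Hypothetical helper (not part of the dataset tooling): download the full
# Taskmaster-1 corpus into a destination directory.
download_taskmaster() {
  dest="${1:-./taskmaster-1}"  # assumed default destination

  # Fail early with a clear message if gsutil is not on PATH.
  if ! command -v gsutil >/dev/null 2>&1; then
    echo "gsutil not found; install it before downloading" >&2
    return 1
  fi

  mkdir -p "$dest" || return 1
  # Same command as above: -m parallelizes the copy, -R recurses.
  gsutil -m cp -R gs://dialog-data-corpus/TASKMASTER-1-2019 "$dest"
}
```

After sourcing the function, running download_taskmaster with a directory of your choice as its only argument fetches everything into that directory. Checking for gsutil up front avoids creating an empty destination directory when the tool is missing.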
Comments or questions? Join email@example.com to discuss.