Accessing the CCPE-M dataset

The CCPE-M dialog dataset consists of 502 English dialogs containing 11,972 annotated utterances, exchanged between a user and a Wizard-of-Oz assistant who discuss movie preferences in natural language. A full description of the data is provided in readme.txt and in the research paper cited below.

The data can be downloaded in JSON format directly from here.

The data can also be downloaded from the command line with gsutil. Instructions to install gsutil can be found here. Note that gsutil can be installed as a standalone tool if you do not want to install the full Google Cloud SDK. Once gsutil is installed, the entire dataset and related documentation (about 5 MB) can be downloaded with:

gsutil -m cp -R gs://dialog-data-corpus/CCPE-M-2019 <path_to_destination>
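
Once the files are downloaded, the JSON data can be inspected with a few lines of Python. The sketch below is illustrative only: it assumes the file is a JSON array of dialog objects, each with an "utterances" list whose items carry a "speaker" field. Those field names, the helper names, and the file name data.json are assumptions rather than something stated on this page, so check readme.txt for the authoritative layout.

```python
import json
from collections import Counter


def load_dialogs(path):
    """Read the dataset file and return the parsed JSON content."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_dialogs(dialogs):
    """Return (number of dialogs, utterance count per speaker).

    Assumes each dialog is a dict with an "utterances" list and each
    utterance is a dict with a "speaker" key (assumed field names).
    """
    speaker_counts = Counter()
    for dialog in dialogs:
        for utterance in dialog.get("utterances", []):
            speaker_counts[utterance.get("speaker", "UNKNOWN")] += 1
    return len(dialogs), speaker_counts


# Example usage (data.json is a hypothetical file name):
#   dialogs = load_dialogs("data.json")
#   n_dialogs, per_speaker = summarize_dialogs(dialogs)
#   print(n_dialogs, sum(per_speaker.values()), dict(per_speaker))
```

If the assumed layout holds, the dialog count and the total utterance count should match the figures quoted above (502 and 11,972), which makes this a quick sanity check on the download.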

Please cite the data as:

F. Radlinski, K. Balog, B. Byrne and K. Krishnamoorthi (2019). Coached Conversational Preference Elicitation: A Case Study in Understanding Movie Preferences. In Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL).