The Lothian Diary Project

The ongoing Lothian Diary Project consists of 125+ audio/video recordings collected since May 2020 from residents of Edinburgh and the Lothian counties in Scotland. The diaries comprise self-recorded monologues or semi-structured interviews in which participants discuss their experiences during different stages of the COVID-19 pandemic. Recordings were uploaded to an online survey that also collected consent, demographic information, and opinions on COVID-19-related policies. All data marked for reuse are, and will continue to be, housed in the University of Edinburgh’s DataShare and DataVault repositories. A partial deposit is available now and another will be made available upon completion of data collection. Data from consenting participants will form an oral history archive with Museums and Galleries, Edinburgh.


CONTEXT
The data for the Lothian Diary Project are part of Edinburgh Speaks, a project of the Language Variation and Change Research Group in Linguistics and English Language at the University of Edinburgh. The dataset comprises 125+ individual contributions with varying levels of consent for sharing. Each contribution consists of a video or audio recording of 1-22 minutes, and most also include answers to a 20-minute survey on participant demographics and COVID-19 experience. The video/audio recordings are mostly vlogs of individual adults or children responding to prompts from the project website (e.g. "How has your life changed during lockdown?"), but some are interviews in which a parent, volunteer, or social worker behind the camera asks similar questions. These data are useful for both qualitative and quantitative social science research on any area of COVID-19 experience at the individual level. The focus is on Edinburgh and the Lothian counties because of the project's original motivation to conduct dialectological and sociolinguistic research on the local community. A subset of the data has been used for an MSc dissertation on how different notions of audience influence messaging and presentation style (Lee, 2020).

STEPS
(1) Participants self-record a digital audio/video diary.
(2) Participants give consent for data usage via the opening page of the survey on Qualtrics (level 1, other researchers; level 2, oral history archive; level 3, radio/television/online) and are given the option to waive anonymity.
(3) Participants upload the diary file to a secure, temporary repository on Box.com that is accessible only by the research team. A Research Assistant then manually transfers the file to a protected server space for processing ('DataStore').
(4) Participants answer questions about their COVID-19 experience through a Qualtrics survey, adapted from questions created by the University of Edinburgh's CovidLife project.
(5) Participants choose an option for a £15 payment: bank transfer, local business voucher, or charity donation.
All diaries are backed up in their original formats and as WAV files (mono, 16 kHz and mono/stereo, 44 kHz).
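The consent options in step (2) can be pictured as a small data model. The following is a minimal sketch only, not the project's actual tooling: the class names, the `contribution_id` field, and the treatment of the three levels as independent choices (rather than cumulative tiers) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List


class ConsentLevel(Enum):
    OTHER_RESEARCHERS = 1     # level 1: sharing with other researchers
    ORAL_HISTORY_ARCHIVE = 2  # level 2: oral history archive
    BROADCAST_ONLINE = 3      # level 3: radio/television/online


@dataclass(frozen=True)
class Contribution:
    contribution_id: str                # hypothetical identifier
    consents: FrozenSet[ConsentLevel]   # levels the participant agreed to
    waived_anonymity: bool = False      # optional waiver from the consent page


def may_share(contribution: Contribution, level: ConsentLevel) -> bool:
    """Return True if the participant consented to this level of sharing."""
    # Assumption: levels are independent, so each one is checked on its own.
    return level in contribution.consents


def shareable(contributions: Iterable[Contribution],
              level: ConsentLevel) -> List[str]:
    """List the identifiers of contributions that may be shared at `level`."""
    return [c.contribution_id for c in contributions if may_share(c, level)]
```

For example, calling `shareable(all_contributions, ConsentLevel.ORAL_HISTORY_ARCHIVE)` would return only the diaries whose participants agreed to level 2.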
Data processing steps: Transcripts of English-language recordings are generated using custom automatic speech recognition models implemented in Kaldi (Povey et al., 2011) and then hand-corrected in ELAN (ELAN, 2020). Non-English speech will be hand-transcribed at a future date. For contributions marked for sharing with other researchers, mono 16 kHz WAV files, MP4 files, and accompanying transcripts and surveys are manually deposited into the Lothian Diary Project repository on DataVault; one deposit is available now and another will be made available upon completion of data collection. Data collection began on 25 May 2020 and will continue until 31 July 2021.
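Anyone reusing the audio may want to confirm that a file matches the mono 16 kHz WAV format described above before feeding it to a speech pipeline. The sketch below uses only Python's standard `wave` module; the function names and the idea of running such a check are illustrative assumptions, not part of the project's documented workflow.

```python
import wave
from typing import Tuple

# Format described in the text for the deposited audio: mono, 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE_HZ = 16_000


def wav_format(path: str) -> Tuple[int, int]:
    """Read the WAV header and return (channel count, sample rate in Hz)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnchannels(), wav_file.getframerate()


def is_mono_16khz(path: str) -> bool:
    """Return True if the file at `path` is a mono, 16 kHz WAV file."""
    return wav_format(path) == (EXPECTED_CHANNELS, EXPECTED_SAMPLE_RATE_HZ)
```

A file that fails the check (for instance a 44 kHz stereo backup) would need resampling and downmixing with a separate audio tool before being used alongside the deposited files.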

SAMPLING STRATEGY
Stage 1 of data collection followed convenience sampling. Any resident of Edinburgh and the Lothian counties was welcome to participate, regardless of vulnerability (e.g., children, vulnerable adults). Participants were recruited via social media platforms, adverts in local newspapers and radio, press releases, and word-of-mouth. Stage 2 of data collection was introduced to recruit underrepresented participants. The sampling strategy at this stage was to contact and partner with charities representing homeless, disabled, or other vulnerable individuals, as well as caregivers of any group. We established charity partnerships by running a targeted social media advertising campaign and an online interactive workshop for the Economic and Social Research Council's 'Festival of Social Science'. We also rented a local community space for three days to allow digitally excluded members of the public (e.g., those without reliable Wi-Fi access) to participate in person. At the time of writing,

REPOSITORY NAME
All data marked for reuse by other researchers (as per participant consent) are placed in the Lothian Diary Project collection, housed in either the DataShare repository or the DataVault repository at the University of Edinburgh. The DataShare repository is Open Access and includes the audio or video recordings and their corresponding transcripts. The DataVault repository contains the same materials as well as participants' survey answers, including demographic information. These more sensitive data are accessible by contacting the data manager, Catherine Lai, at C.Lai@ed.ac.uk. For data collected at the time of writing, the DOI for DataShare is 10.7488/ds/3009 and the DOI for DataVault is 10.7488/7a22cc4b-87ec-4df3-a549-3b347fd4bca5.
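The split between the two repositories can be summarised in a small lookup table. This is a sketch for orientation only: the dictionary layout, the item labels, and the helper names are assumptions, while the DOIs and the contents of each repository are taken from the description above. The `https://doi.org/` prefix is the standard DOI resolver.

```python
from typing import Dict, List

DOI_RESOLVER = "https://doi.org/"

# Repository contents and DOIs as described in the text (at the time of writing).
REPOSITORIES: Dict[str, dict] = {
    "DataShare": {
        "doi": "10.7488/ds/3009",
        "open_access": True,
        "contents": ("recordings", "transcripts"),
    },
    "DataVault": {
        "doi": "10.7488/7a22cc4b-87ec-4df3-a549-3b347fd4bca5",
        "open_access": False,  # access is requested via the data manager
        "contents": ("recordings", "transcripts", "survey answers"),
    },
}


def doi_url(repository: str) -> str:
    """Build the resolver URL for a repository's DOI."""
    return DOI_RESOLVER + REPOSITORIES[repository]["doi"]


def repositories_holding(item: str) -> List[str]:
    """Return the names of the repositories that contain the given item type."""
    return [name for name, info in REPOSITORIES.items()
            if item in info["contents"]]
```

Under this layout, `repositories_holding("survey answers")` returns only `["DataVault"]`, reflecting that the survey data sit behind the access request rather than in the Open Access collection.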

PUBLICATION DATE
Data collection for the Lothian Diary Project will continue until approximately 31 July 2021, with publication to follow soon thereafter.

REUSE POTENTIAL
Data that have been marked for reuse, as per the participant's consent, will be housed as an oral history archive with Museums and Galleries, Edinburgh, where researchers, policymakers, civil organisations, and community members can access them. Those marked for wider reuse are available for other purposes (e.g., aggregation, reference, validation, teaching) to any interested individual or organisation, with the permission of the Lothian Diary Project research team. Beyond the scope of our project, the data can be reused by other researchers, as well as by non-researchers, to identify common themes, topics, and issues discussed during the pandemic. For example, the diaries capture facial, gestural, and speech components of affectively charged topics, enabling those interested in multimodal expression to explore this area further. Policymakers and civil organisations may also use our data for further research aimed at improving future responses to health emergencies and at supporting vulnerable groups. We also welcome collaborative opportunities, especially creative ways of sharing and reusing our data beyond the current scope of our project.