On the Iatrogenic Effect of Sensitive-Topics Survey Research: A Scoping Review

Self-report surveys are popular among researchers due to their convenience and, recently, the need for social distancing. However, there is concern that surveys investigating sensitive topics, such as suicide, may cause distress or harm to participants. Iatrogenic harm is any negative effect caused by an instrument, intervention, or treatment. I conducted a scoping review of the studies investigating the potential for iatrogenic harm from survey research. Thirty-seven studies were included. Results indicate that for a small subset of participants, sensitive-topics research can cause some distress. Usually, these participants also report that their participation was important and display a willingness to participate further. However, more randomised controlled trials based on power analyses and using validated psychometric outcome measures are needed.

It has long been known that all treatments and research programmes involving human participants, whether medical or psychological, carry the inherent risk of iatrogenic effects. These are negative effects induced by an intervention such as research or treatment (Krishnan & Kasthuri, 2005). The need to protect participants from iatrogenic harm and to maximise benefit has been enshrined in most ethical codes of conduct for research involving human participants (American Psychological Association, 2002; National Institutes of Health, 1979; World Medical Association, 2008). Consequently, Institutional Review Boards (IRBs) have become wary of approving research that involves sensitive topics such as suicide, abuse, sex, and trauma, for fear that the research will cause distress, worsen symptoms, or otherwise harm participants. The potential for harm from paper-based or online surveys on sensitive topics is therefore an important factor that an ethical social scientist must consider before conducting research. At the same time, sensitive-topics survey research is vital to many services and bodies of literature that seek to reduce suicide or youth violence and abuse (Ybarra et al., 2009). Some claim that IRBs often judge the potential harm of research inaccurately, based on assumptions or socio-political concerns rather than empirical evidence about the level of risk (Kuyper et al., 2012). Consequently, many researchers are attempting to empirically investigate and quantify the harm and benefit experienced by participants in sensitive-topics survey research, to provide a reliable evidence base for decisions about the appropriateness of future research.

The Present Study
The coronavirus pandemic limited the opportunities for in-person research; therefore, researchers had to turn to remote methods for conducting research. Indeed, even before the heightened need for social distancing measures, paper-based surveys were one of the main tools used by social scientists. Such methods are efficient and cheap, and survey research on sensitive topics is often best self-reported rather than researcher-administered, providing participants with a greater sense of anonymity and helping to avoid demand characteristics, reactivity, and deception (Krumpal, 2013). Additionally, online participant pools provide researchers with an efficient means of recruiting participants and give access to samples and data that are as good as those historically used by social scientists (Behrend et al., 2011).
Given that much of the research performed in psychology, both before the 2019 coronavirus pandemic and after, is performed remotely, it is relevant to focus the study of iatrogenic harm in sensitive-topics research on pen-and-paper and online surveys. Before conducting such research, however, it may be valuable for researchers to have "a lay of the land": an understanding of what research has already taken place, what common methods and measures are used, what types of samples were used, what types of sensitive-topics surveys were used, and the results found so far. The aim of this study is to perform a full scoping review of the literature investigating the potential iatrogenic effects of pen-and-paper and online sensitive-topics survey research. A scoping review methodology was chosen to provide an overview of the design, methods, and results of studies investigating iatrogenic harm from sensitive-topics research, in order to identify gaps in the literature and inform future studies.

Method
This scoping review is performed according to the five-step methodology laid out by Arksey and O'Malley (2005). The intent of the present study is based on two of the four reasons that Arksey and O'Malley identify for performing a scoping review: to examine the range and nature of research activity and to identify gaps in the existing literature.

Step 1: Identifying the Research Question(s)
This project aims to answer two primary research questions: 1) What have previous studies discovered regarding the effect of answering survey questions about mental health on distress and other emotional states? 2) What methodologies are used to study this topic throughout the literature?
Step 2: Identifying Relevant Studies
The search for relevant studies began with an informal search of the literature, including database searches and hand searching, to identify the common keywords used in relevant studies. The most common keyword found throughout the search was the technical term "iatrogenic", meaning negative effects induced by an intervention, treatment, or study (Krishnan & Kasthuri, 2005). In the case of the present study, an "iatrogenic" effect refers to any harm caused to a participant in the process of taking part in mental health survey research.
The term iatrogenic is used frequently throughout the literature of clinical psychology and the medical and biological sciences. Likewise, terms such as "mental health", "distress", "negative effects", and "survey" appear throughout psychological research. This presented a problem: any keyword combination involving "iatrogenic" paired with "harm", "distress", and "research", even in the presence of keywords more relevant to this review such as "mental health" or "survey research", inevitably returned tens of thousands of results, often more than 30,000. A very large proportion of the results were studies from unrelated fields or subdisciplines that use the same terms. In the interest of feasibility and efficiency, I decided that for all keyword combinations, I would comb the first 20 pages of results for relevant studies. Once a body of relevant studies had been found from keyword searches, I would then continue the search by hand-searching the reference lists. Given that the body of literature was quite small and that many of the studies identified by keyword were reviews and meta-analyses, I concluded that a hand search following a partial keyword search would cover the available relevant literature well.
Google Scholar was used as the primary database for the search, as it searches the widest array of databases and journals and covers the major journal repositories. The author's university database search engine was also used; it searches the university's library as well as multiple online journal databases.
Because I was unable to comb through all returned studies from the keyword search, and was instead collecting some of the relevant studies from the first 20 pages of results for a later hand search, I was very broad and generous with the selection criteria at this stage, selecting studies that appeared even slightly relevant. Studies that appeared to be in the field of psychology and even nominally related to research, mental health, and harm of some kind were selected and added to the list for later inspection and hand-searching of reference lists. Google Scholar returns approximately 10 results per page, so for each keyword combination I combed approximately 200 results (approximately 1,000 in total). Table 1 shows the search strategy: the keyword combinations, the number of results selected for later inspection and reference-list searching, and the subsequent studies found via hand-searching.

Step 3: Study Selection
Step 2 resulted in a list of 105 articles selected under the broad criteria for that stage: any study that appeared to investigate the potential effects of taking part in psychological research. The criteria were necessarily broad for step 2 because most of the search was to be done by hand; even slightly relevant studies were included because their reference lists might contain articles important to this review. The resulting list of 105 articles contained varied research investigating the iatrogenic effects of clinical screenings using psychological measures, interview research, pen-and-paper surveys, therapeutic interventions, and school interventions, as well as meta-analyses and literature reviews.
In step 3, I established a set of inclusion/exclusion criteria. Given the small number of studies, these were similarly simple and broad, as I did not need to cull many studies to make the review more feasible. To be included in this review, studies were required to be based on pen-and-paper or online survey research. Given the small number of studies in the list, no limit was placed on the year of publication. Similarly, no criteria were placed on research design (e.g., experimental vs. observational), outcome measures (e.g., a focus on studies that measured distress specifically), or independent variables (e.g., I did not limit studies to those using surveys on a particular sensitive topic such as depression).
Therefore, I selected studies that attempted to investigate the iatrogenic effects of pen-and-paper and online sensitive-topics surveys on participants. Figure 1 shows the PRISMA diagram of the selection process. In total, 68 studies were excluded. Of those excluded, 41 used interviews as the survey method, six investigated the iatrogenic effects of a type of therapy or therapeutic approach, five investigated the effect of an education or training programme, five used other types of stimuli, such as exposure to potentially distressing videos, and 11 were entirely unrelated.

Table 1.
The search strategy: keyword combinations and the number of results selected for later inspection and reference-list searching (duplicates omitted).

1. "Mental Health Survey" and "Iatrogenic" and "Distress": 29 results
2. "Online Survey" and "Mental Health" and "Suicidality or Distress": 1 result
3. "Negative Effects" and "Mental Health Survey": 4 results
4. "Mental Health Survey" and "Effect" and "Distress": 2 results
5. Hand search of reference lists of previously selected studies: 69 studies
Total: 105

Results
Thirty-seven studies were included in the final sample for this review. Of these, 10 were experimental studies and 27 were observational. No meta-analyses were included in the sample because they all subsumed both interview and pen-and-paper/online surveys under the same meta-analyses, whereas this review is concerned with remote methods such as pen-and-paper/online surveys. I was particularly interested in the number and results of experimental studies, as they provide a stronger basis for causal inferences about the iatrogenic effects of the surveys administered. Therefore, I have reviewed the experimental and observational studies separately.

Experimental Studies

Independent Variables
These 10 RCTs investigated the iatrogenic effects of a range of different sensitive-topics surveys on subjects such as suicide, non-suicidal self-injury (NSSI), sexual history, violence, substance abuse, and PTSD (see Table 2). These studies almost always used batteries of measures: an amalgam of multiple surveys pertaining to different sensitive topics, often combined with non-validated items. For this reason, Table 2 lists the subject matter of the independent measures used in the surveys rather than the specific measures, as listing all the individual instruments in the table was impractical. For example, Cook et al. (2015) used a survey composed of 300 questions pulled from 10 different instruments pertaining to many different sensitive topics. All studies used well-validated measures that are often used in psychological research, such as the Sexual Experiences Survey (Cook et al., 2015; Pedersen et al., 2014), the PTSD Checklist (Ferrier-Auerbach et al., 2009; Rinehart et al., 2017; Yeater et al., 2012), the Rape Myth Acceptance Scale (Rinehart et al., 2017; Yeater et al., 2012), and many others.
The use of batteries of multiple different sensitive-topics surveys in these studies makes it difficult to ascertain which of the instruments or sensitive topics influenced the mood or distress level of participants. Only two studies used a single independent measure: Muehlenkamp et al. (2014), which investigated the effect of the Inventory of Statements About Self-Injury (ISAS) on participants, and de Beurs et al. (2016), which used the Beck Suicidal Ideation scale.

Outcome Measures
There was less variability and complexity among the outcome instruments used to measure the effect of the sensitive-topics surveys. The most common outcome measure was the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988), used in four of the 10 studies, followed by the Profile of Mood States (POMS; Pollock et al., 1979), used in two of the studies (see Table 2). Not all the studies focused only on the surveys' effect on mood or affect. Others investigated whether suicide research increased the rate of suicidal ideation on the Suicidal Ideation Questionnaire (SIQ) (Gould et al., 2005) or whether a trauma, sexual history, and PTSD survey battery affected PTSD symptom severity (Pedersen et al., 2014).

Samples and Power Analyses
As seen in Table 2, six of the studies used samples of undergraduate students (de Beurs et al., 2016; Cook et al., 2015; Muehlenkamp et al., 2014; Pedersen et al., 2014; Rinehart et al., 2017; Yeater et al., 2012), and two of the studies focused on a sample of adolescents (Gould et al., 2005; Robinson et al., 2011). Furthermore, one study focused on a female-only sample (Pedersen et al., 2014), and another focused on adolescent boys (Robinson et al., 2011).
Only three of the 10 studies provided any indication of awareness or discussion of statistical power for the analyses performed (Cook et al., 2015; Gould et al., 2005; Rinehart et al., 2017). None of these studies describe an actual a priori power analysis. Still, they do show some awareness of the statistical power of their tests, or at least describe an effort to maximise statistical power.

Methods
The most common design used in these experiments was a pretest-posttest design (de Beurs et al., 2016; Ferrier-Auerbach et al., 2009; Harris & Goh, 2017; Muehlenkamp et al., 2014). Rinehart et al. (2017) and Yeater et al. (2012) used a post-test-only design. Three studies used variations of longitudinal designs: Gould et al. (2005) measured participants after an initial survey and then again before a second survey two days later; Cook et al. (2015) measured participants at four different time points; and Pedersen et al. (2014) used a longitudinal design composed of a pre-assessment, a 30-day monitoring period of daily surveys, and a post-assessment. Robinson et al. (2011) used a counterbalanced design in which the control and experimental conditions completed their respective measures on day one, and then the conditions were reversed on day two.
In addition to their main respective research designs, many of the studies used a post-test measure of reactions to research participation, asking participants about their experience with the research (Ferrier-Auerbach et al., 2009;Muehlenkamp et al., 2014;Pedersen et al., 2014;Yeater et al., 2012).
The length of some of the studies using multiple sensitive-topics measures may be considered problematic. Yeater et al. (2012) subjected participants to a formidable battery of measures that took two hours or more to complete and covered a range of different topics. It is possible that engaging in sensitive-topics research of such duration and complexity might cause distress through the sheer intensity and length of the survey. However, this effect of survey length has not been investigated. Alternatively, upon completing such a long survey, participants may have returned to baseline or even been desensitised to the initial distress caused by the sensitive survey. Until the iatrogenic effects of validated measures of singular sensitive topics have been properly investigated, it does not seem appropriate to administer multiple surveys of different measures simultaneously: such batteries are unlikely to be used often in practice, and identifying which measure or what aspect of the study caused any harm or distress is extremely difficult.

Findings
While some of the studies did find a statistically significant increase in distress or reduction in positive emotion after completing sensitive-topics surveys, the effect sizes were small. None of the studies concluded that sensitive-topics research poses a significant risk to participants or that the risks outweigh the benefits. Gould et al. (2005) found no difference in distress levels between experimental and control groups at the time of the survey or two days later. The suicide screening survey did not increase suicidal ideation or distress in high-risk students compared to high-risk students in the control condition. There was some evidence that students with depression and those with histories of suicide attempts were less distressed than others. Likewise, Harris and Goh (2017) found no significant differences in affect between study conditions and no pre- to post-test affect changes by condition or for suicidal participants. Participants with depressive symptoms showed a decrease in positive affect in both the sensitive-topics and control conditions, indicating that the sensitive-topics research was no more distressing than participating in the control survey. Depressive symptoms and family support predicted changes in negative affect. Robinson et al. (2011) found that exposure to the sensitive-topics screening questions did not increase distress, even among students previously identified as high-risk. Only 8.9% of students reported finding the questions about self-harm either moderately or very distressing. Over 70% of participants found the research moderately or very worthwhile; however, those identified as at-risk found the research less worthwhile than those who were not.
Perhaps somewhat counter-intuitively, Yeater et al. (2012) found that participants who completed trauma and sex surveys had higher positive affect and perceived the study as having greater benefits than those in the control condition. All the participants rated normal life stressors as more distressing than participating in the study. Similarly, Muehlenkamp et al. (2014) found that asking about NSSI did not produce iatrogenic effects and, in fact, may have lowered distress and produced a small decrease in the urge to self-injure. Participants also reported a range of positive emotions toward the research and a willingness to participate again.
In contrast, de Beurs et al. (2016) found that answering surveys about suicide does produce distress in a minority of participants; of those who did experience distress, 80% were in the experimental group. Similarly, Ferrier-Auerbach et al. (2009) found that, after controlling for baseline affect, participants in the experimental group reported significantly higher sadness and tension than those in the control group. However, there was no difference between conditions in willingness to complete further research or perceived gain from participating. Rinehart et al. (2017) found that all participants had low distress after participation, though those exposed to sensitive surveys had significantly higher distress; all participants perceived some benefit from participation. Pedersen et al. (2014) found that daily trauma surveys produced a small increase in distress and PTSD symptoms, but participants' responses indicate that these changes were well-tolerated. Many participants reported no harm from participation, and few reported no benefit. The authors suggest, based on the data and participant-reported benefits, that for some participants the short-term distress had a cathartic effect. Cook et al. (2015), controlling for baseline PTSD symptoms and levels of distress, found a small decrease in positive affect immediately after responding to questions about sexual violation. Still, this effect had diminished by two weeks post-participation. These participants, and others who responded to questions about stressful events, reported greater perceptions of the benefits of research participation.
These studies suggest that while some sensitive-topics research may cause, for a minority of participants, a small increase in distress, a reduction in positive affect, and/or an increase in negative affect, such research does not appear to increase suicidal ideation or intent to self-harm. Likewise, even in the presence of increased distress, studies found that participants reported perceiving benefits from research participation, that the research was not more distressing than general life stress, and that willingness to participate remained generally high.

Observational Studies
Of the 37 articles selected for this review, 27 were observational studies. That is, while some utilised statistical controls in their analyses, there were no control conditions and, therefore, no random assignment to control/experimental conditions. The majority of these studies were performed in the USA (see Table 3).

Independent Variables
Like the experimental studies reviewed above, the sensitive-topics surveys used in the observational studies were varied and often combined surveys of varied focus into batteries. Therefore, Table 4 shows the focus of the independent variables used rather than the actual measures.

Table 4.
A simplified data chart of the 27 observational studies selected for review. The focus of the batteries and surveys used is shown in Table 5.

Dependent Variables
As seen in Table 4, many of the studies used non-validated or open-ended items, rather than validated measures of distress, to measure the effect that the survey had on participants. Approximately 45% of studies used non-validated items to measure distress or reactions to research participation (see Table 4). Such items have not been assessed for validity or reliability as measures of the outcomes of research participation; however, they have strong face validity for their intended use. The majority of studies measured distress, level of upset, or change in mood from survey participation. Four studies investigated whether suicide/self-harm surveys increased or exacerbated suicide/self-harm symptoms or intentions (Coppersmith et al., 2021; Gibson et al., 2014; Hom et al., 2018; Mathias et al., 2012). Six studies used the Reactions to Research Participation Questionnaire (RRPQ) as the basis of their outcome measures (Edwards et al., 2012; Edwards et al., 2013; Edwards et al., 2014; Johnson & Benight, 2003; Shorey, Febres, et al., 2013; Shorey, Zucosky, et al., 2013), and others used well-known and well-validated measures such as the State-Trait Personality Inventory (Rojas & Kinder, 2007; Savell et al., 2006), the PTSD Checklist (Newman et al., 1999), and the Lazarus Stress Questionnaire (Daugherty & Lawrence, 1996; see Table 4).

Samples and Power Analyses
None of the studies described any sort of a priori power analysis, and only two acknowledged a potential lack of statistical power, in passing. Hom et al. (2018) acknowledged a potential lack of statistical power for their tests, and Carter et al. (2020) appear to have conducted a post hoc power analysis and likewise acknowledged a potential lack of power. Most of these studies used reasonably large samples; however, without a power analysis and some idea of the expected effect sizes, one cannot know how well the statistical tests performed.
As seen in Table 4, 10 of the studies used a sample of undergraduates, eight focused on adolescents and/or children, eight of the samples focused only on female participants and two focused on men.

Methods
Fourteen of the studies used a simple post-test survey design, while eight used a longitudinal/repeated-measures design, and five used a pretest-posttest survey design (see Table 6).

Discussion
This review focused on studies investigating the iatrogenic effects of pen-and-paper and online sensitive-topics survey research. Thirty-seven studies were included for review. There was a clear lack of randomised controlled trials in the literature, with almost 75% of the studies being of observational design. Many of the studies, including the RCTs, used batteries of tests pertaining to a suite of sensitive topics. This makes it difficult for researchers to ascertain which sensitive-topics surveys were causing the changes in affect and/or causing distress. More investigation into the effect of survey length is needed to clarify whether it was the sheer length of such sensitive-topics batteries that caused distress.
There was a clear lack of a priori power analyses in the literature, even among the RCTs. While many of the studies had quite large samples, it is still necessary to understand the expected effect sizes and one's power to detect them.
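To illustrate what such an a priori power analysis involves, the short sketch below estimates the per-group sample size needed to detect a small between-group effect in an RCT. The effect size, alpha, and power values are hypothetical placeholders, chosen only to reflect the small effects reported in this literature, and the statsmodels library is assumed to be available.

```python
# A priori power analysis for an independent-samples t-test,
# as might precede an RCT comparing a sensitive-topics survey
# condition with a control condition (hypothetical inputs).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Small effect (Cohen's d = 0.2), alpha = .05, power = .80, two-sided test.
n_per_group = analysis.solve_power(effect_size=0.2, alpha=0.05,
                                   power=0.8, alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.0f}")
```

A small anticipated effect drives the required sample into the hundreds per group, which is precisely why stating the expected effect size in advance matters: a "reasonably large" sample chosen without this calculation may still be underpowered.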
The lack of experimental studies and the relative abundance of non-experimental studies is a major concern for researchers seeking to evaluate potential risk to participants in a course of study focused on a sensitive topic. Observational studies cannot support causal claims about the surveys' effect on participants and, therefore, cannot make any strong claim as to the relative safety or danger of the surveys. In addition, the frequent use of non-validated outcome measures is a threat to the construct validity of many of these studies.

Implications
With the limitations of the research in this review acknowledged, there was little evidence to suggest that online and pen-and-paper survey research on sensitive topics presents a threat of harm to participants. While some sensitive-topics surveys may cause minor distress, this distress was fleeting, and most participants felt that their participation was worthwhile. There is therefore little evidence in the existing literature to suggest that researchers or ethical review boards should be particularly hesitant to engage in sensitive-topics research.

Future Directions
The lack of experimental studies on this topic presents a major gap in the literature and a clear direction for future researchers. More RCTs are needed, based on full a priori power analyses and using validated measures of distress. These studies should, at least initially, use single sensitive-topics surveys rather than entire batteries of multiple sensitive-topics surveys. For experimental studies that find no evidence of an iatrogenic effect, it will be useful to perform analyses suitable for supporting a conclusion of "no effect", such as Bayesian methods (Dienes, 2014).
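To make this suggestion concrete, a Dienes-style Bayes factor can quantify support for "no effect" rather than merely failing to reject it. The sketch below compares H0 (effect exactly zero) against H1 (effect drawn from a normal prior centred on zero); all numbers are hypothetical, and in a real analysis the prior width would need to be justified from theory or prior data.

```python
# Simple Bayes factor for H0 (no effect) vs H1 (effect ~ N(0, prior_sd^2)).
# With a normal prior on the effect, the marginal likelihood under H1 is a
# normal density with variance se^2 + prior_sd^2, evaluated at the observed
# difference, so the Bayes factor is a ratio of two normal densities.
import numpy as np
from scipy.stats import norm

def bayes_factor_01(mean_diff, se, prior_sd):
    like_h0 = norm.pdf(mean_diff, loc=0, scale=se)
    like_h1 = norm.pdf(mean_diff, loc=0, scale=np.sqrt(se**2 + prior_sd**2))
    return like_h0 / like_h1

# Hypothetical example: an observed change in distress of 0.05 scale points
# (standard error 0.10), with a prior allowing plausible effects up to ~0.5.
bf01 = bayes_factor_01(mean_diff=0.05, se=0.10, prior_sd=0.5)
print(f"BF01 = {bf01:.2f}")  # BF01 > 3 is conventionally read as support for H0
```

Unlike a non-significant p-value, a BF01 above the conventional threshold of 3 provides positive evidence that the survey produced no meaningful change, which is the claim future studies in this area would want to support.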
Despite the conclusion that, in general, sensitive-topics survey research is likely safe, a minority of participants did experience mild increases in distress or symptomology. These results may represent a subset of participants who are particularly vulnerable to iatrogenic effects, and future research should focus on developing methods to identify and protect these participants.

Limitations
The main limitation of the present study is that the search was confined to the first 20 pages of results for each keyword combination. Because all keywords relevant to the present study were highly generic and common across psychological research, the searches returned unwieldy numbers of results, the majority of which were irrelevant. Therefore, I found it appropriate to search the first 20 pages of results (a total of approximately 1,000) for studies meeting the inclusion criteria and, beyond that, to continue by hand-searching the reference lists of selected studies for relevant cited literature. While I am confident that the present review provided adequate coverage of the relevant literature, this search method could fairly be perceived as a limitation of the study.

Table 2.
Data table of the 10 experimental studies selected for review.

Table 3.
The countries in which the 27 observational studies included in this review were performed.

Table 5.
The focus of the batteries and surveys used as independent measures in the 27 observational studies.

Table 6.
Survey designs used in the 27 observational studies.