Quality Assessment of Studies Evaluating Cognitive Behavioural Therapy for Insomnia in Populations with PTSD: A Systematic Review

Objective: This systematic review aims to evaluate the quality of intervention studies assessing CBT-I in adult populations with PTSD, specifically examining overall quality, specific design elements, and trends over time. Method: Studies with n > 1 assessing CBT-I in adult populations with PTSD were included. The Randomized Controlled Trial - Psychotherapy Quality Rating Scale (RCT-PQRS) was used to evaluate study quality. Correlations between overall study quality and publication year were calculated. Results: Nine studies were included. Overall, study quality was rated as moderate. Six design elements were not addressed adequately in most studies. No significant correlation between study quality and publication year was found. Conclusions: The evidence base is of moderate quality. However, several specific design elements limit the conclusions that can be drawn from the studies. Strategies to minimize these limitations are discussed.

Posttraumatic stress disorder (PTSD) is a psychiatric diagnosis that an individual can develop after exposure to traumatic events. The DSM-5 (American Psychiatric Association, 2013) outlines that, to meet the criteria for PTSD, a person has to have been exposed to a serious traumatic experience, after which several symptoms have developed. The traumatic event is persistently reexperienced, the person engages in avoidance of trauma-related stimuli, and negative thoughts or feelings begin or worsen after the trauma. Furthermore, the person experiences trauma-related arousal or reactivity that began or worsened after the trauma. These symptoms have been present for more than one month, create distress or functional impairment, and are not due to medication, substance use, or other illness. PTSD is associated with serious health conditions and worsening of psychiatric ailments including alcohol and substance use disorder, depression, and suicide (American Psychiatric Association, 2013). PTSD is rarely the only psychiatric disorder experienced by the patient, seeing that patients diagnosed with PTSD are 80% more likely to have symptoms that meet diagnostic criteria for at least one other mental disorder, such as depression, anxiety, or insomnia (American Psychiatric Association, 2013; Breslau et al., 1991; Creamer et al., 2001; Kessler et al., 1995). Previous studies have shown that 41% to 91% of patients with PTSD have insomnia (Neylan et al., 1998; Ohayon & Shapiro, 2000), and even when PTSD symptoms remit post-treatment, insomnia symptoms usually persist (Zayfert & DeViva, 2004).
Insomnia impacts multiple domains of the patient's life, such as interpersonal and cognitive functioning, for example in the form of decreased energy and inability to concentrate (Benca, 2001; Goff et al., 2007). It also has a significant financial impact on society by leading to lower work productivity and increased absenteeism (Fullerton, 2006). Like PTSD, insomnia is associated with serious health conditions and worsening of psychiatric ailments including alcohol and substance abuse, depression, and suicidal behaviour (Applewhite et al., 2012; Hoge et al., 2007; McLay et al., 2010; Wright et al., 2012).
Many treatments target insomnia, one of which is cognitive behavioural therapy for insomnia (CBT-I), which the American Academy of Sleep Medicine (AASM) recommends (Schutte-Rodin et al., 2008). The AASM defines CBT-I as a treatment combining different cognitive and behavioural monotherapies indicated for chronic insomnia (Schutte-Rodin et al., 2008). Clinicians' usage of CBT-I for patients with PTSD has become more widespread, and studies in populations with PTSD are becoming more common (Pruiksma et al., 2018). An RCT evaluating CBT-I in an adult sample with PTSD found that 41% of participants in the intervention group achieved full remission on subjective sleep measures, compared to 0% in a waitlist control group, and that these results were maintained at 6-month follow-up (Talbot et al., 2014). In another RCT, a large significant pre- to post-intervention change was observed for sleep efficiency measured with sleep diaries (Ustinov, 2014). In that same study, no significant change was found for the waitlist control group. Although the increase in studies assessing CBT-I for patients with PTSD is useful for determining the value of this approach, it is important to assess the quality of these studies. Study quality is often categorized into different domains, with one such categorization being: the description of subjects, definition and delivery of treatment, outcome measurements, data analysis, and treatment assignment (Kocsis et al., 2010).
Proper conduct and reporting of the procedures and strategies in these domains ensure quality (Kocsis et al., 2010).
A meta-analysis of a variety of psychotherapies for depression found that study quality moderated treatment effect sizes (Barth et al., 2013).
Another meta-analysis of psychotherapeutic interventions for depression examined the differences between high-quality and low-quality studies. Of the 115 randomized controlled trials (RCTs) located, only 11 met the requirements of the quality criteria. The Cohen's d effect sizes, when the interventions were compared to control conditions, were 0.74 for all 115 studies and only 0.22 for the high-quality studies (Cuijpers et al., 2010). To further underline the importance of controlling for quality, one meta-analysis of CBT for depression found that, across time, the quality of intervention studies increased and the effect sizes decreased (Johnsen & Friborg, 2015). This may imply an inverse relationship between study quality and the reported effect sizes.
To date, no systematic reviews have been conducted that specifically assess the quality of studies that evaluate CBT-I in adult populations with PTSD. As this particular research base is still young and quite small, it could be beneficial to gain an understanding of the quality of the evidence base for using these interventions in adult populations with PTSD. Furthermore, this review can help by elucidating aspects of studies that could be improved upon.
The aim of this study is to conduct a systematic review that seeks to answer the following question: What is the quality of the evidence base for CBT-I used in adult populations with PTSD? To critically evaluate the quality of the studies, we aim to answer three separate questions: 1) What is the overall quality of studies? 2) What is the quality of specific study design elements? 3) Are there any trends in the quality of studies over time?
The findings of this review could provide an evaluation of the quality of the current literature and indicate areas to be improved in the design of future intervention studies.

Eligibility Criteria
The eligibility criteria of this review are presented according to the PICOS (population, intervention[s], control[s], outcome[s], studies) framework, which is fully described in Table 1 (Cherry & Dickson, 2017). The control(s) section of PICOS was not applicable here, as all studies with n > 1 evaluating CBT-I in adult populations with PTSD were included with no specifications for control groups.

Information Sources
The literature search was conducted in PsycInfo and PubMed. These two databases were originally chosen as they are some of the largest databases of psychological research.

Search Strategy
PsycInfo and PubMed were searched with two similar search strings containing three search blocks (see Supplemental Materials). We applied date, language, and population limits. We excluded studies published before 1980, since the PTSD diagnosis was introduced with the DSM-III in that year. We also excluded studies with non-adult or non-human populations. Finally, we only included studies published in English or Danish.

Selection Process
Following the removal of duplicates, the two first authors performed title and abstract screening of all records. Studies were thereafter screened at the full-text level if they met eligibility criteria or if the full text had to be assessed to properly screen the study.
When the decisions of the two first authors were inconsistent, disagreements were resolved through discussion. After eligible studies were identified through full-text screening, the reference lists of these studies were checked for any additional eligible studies. Lastly, the first authors of all eligible studies were contacted and asked whether they knew of any other eligible studies that were missed in our search.

Data Collection Process
Note (Table 1). a = Studies were excluded if they examined any kind of treatment that combined conventional CBT-I with elements from other treatments, such as CBT-I + image rehearsal therapy (IRT) for nightmares or CBT-I + prolonged exposure therapy. This was done to focus the review on studies that were solely assessing the isolated effect of CBT-I. b = All intervention formats, such as individual, group, and telemedicine, were included, as one of the purposes of this review was to be able to broadly inform future intervention studies of CBT-I in the PTSD population. c = Not applicable. For a more detailed explanation of the four applicable areas see the methods section.
The two first authors independently extracted relevant data for study, treatment, and participant characteristics. Inconsistencies in extracted data were resolved by revisiting study inclusion criteria together and reaching a consensus. Data extraction was purposefully conducted prior to the quality assessment, as the process would increase familiarity with the studies and therefore ease the quality assessment.
The RCT-PQRS was used to assess the quality of the included studies. Additional data items extracted were study type, control group, sample size, treatment name, treatment dosage, format (group/individual), mean age with standard deviations (SD), civilian or veteran population, percentage of sample with a clinician-administered PTSD diagnosis or self-report scores meeting cut-off for PTSD, percentage of women in sample, and concurrent treatment reported as yes/no.

Quality Assessment
Since the commonly used CONSORT tools are designed as checklists and not as tools with continuous rating scales, the RCT-PQRS was used for the purposes of this review (Kocsis et al., 2010). The RCT-PQRS was created by a subcommittee of the American Psychiatric Association for the systematic evaluation of the quality of RCTs assessing psychotherapeutic interventions (Kocsis et al., 2010). The tool has been used to evaluate the quality of clinical studies in a range of different reviews, both with controlled and uncontrolled trials (Gerber et al., 2011; Grenon et al., 2018, 2019; Harb et al., 2013; Johnsen & Thimm, 2018; Keefe et al., 2014, 2020; Kishita & Laidlaw, 2017; Koelen et al., 2014; Lilliengren et al., 2016; Steinert et al., 2017; Suh et al., 2019; Thoma et al., 2012; Zakhour et al., 2020).
The tool contains 25 items, with 24 in the following categories: description of subjects (four items), definition and delivery of treatment (five items), outcome measures (five items), data analysis (five items), treatment assignment (three items), and overall quality of the study (two items). The total score of the first 24 items is typically reported as an overall quality measurement, and an omnibus rating (item 25) for the overall quality of the study is also given (Kocsis et al., 2010). Every item is scored as 0 (inadequate or limited reporting), 1 (adequate or brief reporting), or 2 (excellent or full reporting). The omnibus rating is scored on a scale of 1 (exceptionally poor) to 7 (exceptionally good).
The two first authors used the RCT-PQRS independently to evaluate the studies, and disagreements were resolved by discussion. When lasting disagreements about items or definitions were present, they were resolved through consultation with a senior researcher. In one study using the RCT-PQRS, a cut-off score of 50% of the maximum total score of the first 24 items was used to indicate when a study met adequate quality (Gerber et al., 2011). That means that a study receiving a score of 1 on each of the first 24 items, or equal numbers of scores of 0 and 2, would exactly meet that cut-off score. The same cut-off score of 50% was used in this review.

Statistical Analysis and Synthesis Methods
SPSS (Version 26) was used to calculate interrater reliability and descriptive statistics, and to perform correlational analyses.

Study, Treatment and Participant Characteristics
Study, treatment, and participant characteristics were synthesized narratively and with the use of descriptive statistics.

Aim 1: Overall Quality of Studies
To synthesize the results of the overall quality assessment, the mean percentage of maximum scores and mean omnibus score across studies were calculated. The maximum score of the RCT-PQRS is normally 48 (Kocsis et al., 2010). However, as several uncontrolled intervention studies are included in this review, several items relating to randomization procedures and control groups are not applicable (items 15, 20, 21, 22, 23). As in previous similar reviews using the RCT-PQRS for both controlled and uncontrolled trials, these items are omitted in the rating of the uncontrolled studies, which limits the overall maximum score to 38 for the uncontrolled intervention studies (Harb et al., 2013). To accommodate the different study types, the percentage of maximum score was used as the metric of overall quality instead of the raw total score. This metric was calculated by dividing the sum of all ratings for a particular study by the maximum possible sum of ratings for that study type and multiplying the result by 100.
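The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run by the authors (who used SPSS); the function names and the dictionary-based input format are hypothetical, and treating the cut-off as "50% or more" is an assumption.

```python
# Illustrative sketch only; names and input format are hypothetical.
ITEMS_SCORED = range(1, 25)                         # items 1-24; item 25 is the omnibus rating
NOT_APPLICABLE_UNCONTROLLED = {15, 20, 21, 22, 23}  # omitted for uncontrolled studies
MAX_ITEM_SCORE = 2                                  # each item is rated 0, 1 or 2


def applicable_items(is_rct):
    """Return the item numbers that count towards the total for this study type."""
    return [i for i in ITEMS_SCORED
            if is_rct or i not in NOT_APPLICABLE_UNCONTROLLED]


def max_possible_score(is_rct):
    """48 for RCTs, 38 for uncontrolled studies."""
    return len(applicable_items(is_rct)) * MAX_ITEM_SCORE


def percentage_of_max(item_ratings, is_rct):
    """item_ratings maps item number -> rating (0, 1 or 2)."""
    total = sum(item_ratings[i] for i in applicable_items(is_rct))
    return 100 * total / max_possible_score(is_rct)


def meets_cutoff(percentage, cutoff=50.0):
    """Assumes the cut-off is met at 50% or more of the maximum score."""
    return percentage >= cutoff
```

For example, a study rated 1 on every applicable item scores 24 of 48 as an RCT and 19 of 38 as an uncontrolled study, which is 50% in both cases. This is why the percentage, rather than the raw total, allows the two study types to be compared.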

Aim 2: Quality of Specific Study Design Elements
The results of specific study design elements as per individual items categorized in the subcategories of the RCT-PQRS were synthesized narratively and tabulated.

Aim 3: Quality Over Time
To assess quality trends over time, correlations between the year of publication and the percentage of maximum score, as well as between the year of publication and the omnibus rating, were calculated across studies.
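As a minimal sketch of this analysis, the function below computes a Pearson product-moment correlation in plain Python, assuming that this is the coefficient intended (the analyses themselves were run in SPSS). The function name is hypothetical, and any data passed to it here would be placeholders rather than the ratings of the included studies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)
```

In use, `x` would hold the publication years and `y` either the percentages of maximum score or the omnibus ratings, with one value per study.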

Certainty Assessment
To statistically evaluate the interrater reliability of the RCT-PQRS, Kendall's tau-b correlation was calculated for the two raters' individual percentage of maximum score ratings, as well as for omnibus ratings (Akoglu, 2018). Kendall's tau-b for the percentage of maximum scores was 0.58 (95% CI [0.51, 0.63], p < 0.001). Kendall's tau-b for omnibus ratings was 0.62 (95% CI [0.15, 0.86], p = 0.039). Typically, Kendall's tau-b scores < 0.39, 0.40 to 0.69, 0.7 to 0.99, and 1 are considered estimates of weak, moderate, strong, and perfect reliability, respectively (Akoglu, 2018). Therefore, using Kendall's tau-b, the interrater reliability in the current study was moderate for the percentages of maximum total scores and for the omnibus rating.
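For readers unfamiliar with the statistic, the following sketch shows how Kendall's tau-b can be computed for two raters' scores by counting concordant, discordant, and tied pairs. It is a plain-Python illustration with a hypothetical function name, not the SPSS routine used in this review, and it does not compute confidence intervals or p-values.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (e.g., two raters' scores)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                  # tied on both variables: ignored by tau-b
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1           # both raters order the pair the same way
        else:
            discordant += 1           # the raters order the pair in opposite ways
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator
```

The tie correction in the denominator matters here because omnibus ratings take only a few integer values, so tied pairs between studies are likely.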

Search Results
After applying in-database restrictions and removing 71 duplicates, 649 studies were screened. In the title and/or abstract screening process, 598 studies were ineligible and therefore excluded. The percentage of agreement between authors for title and abstract screening, prior to the discussion of disagreements, was 90%. Forty-two studies were excluded during full-text screening, leaving eight studies. No additional eligible studies were found after handsearching the reference lists of the eight studies. After contact with the first authors of the eight studies, one additional relevant study was identified and included (Gehrman et al., 2020). In summary, nine studies were included in the review (DeViva et al., 2018; El-Solh et al., 2019; Gehrman et al., 2020; Gellis & Gehrman, 2011; Harb et al., 2019; Laurel Franklin et al., 2018; Owen, 2002; Talbot et al., 2014; Ustinov, 2014). See Figure 1 for a full overview of the screening process.

Treatment Type
All studies examined CBT-I in a face-to-face delivery. Six out of nine studies had a control group in the form of telemedicine (Gehrman et al., 2020; Laurel Franklin et al., 2018), CBT-I combined with IRT (Harb et al., 2019), waitlist (Talbot et al., 2014; Ustinov, 2014), or treatment as usual (Owen, 2002). The number of sessions ranged between four and eight. Three studies (DeViva et al., 2018; Gellis & Gehrman, 2011; Talbot et al., 2014) did not specify the length of sessions. Where these data were available, the mean (SD) session length was 59.58 (15.84) minutes, with a range between 37.50 and 90 minutes. See Table 2 for the full information.

Participant Characteristics
Most studies evaluated samples consisting entirely of veterans, with only one study involving both civilians and veterans (Talbot et al., 2014). For full information on participant characteristics see Table 3. All except two studies (Laurel Franklin et al., 2018; Ustinov, 2014) included a sample where 100% of participants were diagnosed with PTSD. For the two remaining studies, the percentage of participants diagnosed with PTSD was 67% (Laurel Franklin et al., 2018) and 87% (Ustinov, 2014). In all except one of the studies (Talbot et al., 2014), the sample was predominantly male, with some being completely male (Gellis & Gehrman, 2011; Laurel Franklin et al., 2018; Owen, 2002). Four of the nine studies reported concurrent trauma-focused treatment, meaning that participants were engaged in other trauma-focused treatments at the same time as the CBT-I intervention (DeViva et al., 2018; El-Solh et al., 2019; Gellis & Gehrman, 2011; Owen, 2002).

Aim 1: Overall Quality of Studies
The mean (SD) percentage of maximum total scores for the studies was 57.19 (12.82), with a range between 37.50 and 79.17. Therefore, the mean was above the predefined cut-off score for adequate quality, defined as 50% of the maximum total score. Seven of the nine studies were above the 50% cut-off score. The mean (SD) omnibus rating for the studies was 4.44 (1.01), with a range between 3 and 6. An omnibus rating of 4 is indicative of average quality. Eight of the nine studies had an omnibus rating of 4 or above. Thus, most of the studies were rated as having adequate/average quality on both the percentage of maximum total scores and the omnibus rating.

Aim 2: Quality of Specific Study Design Elements
Table 4 provides a breakdown of ratings on each item of the RCT-PQRS for included studies.

Description of Participants (Items 1-4)
Most of the studies fully and appropriately described their diagnostic measurements, eligibility criteria, and number of participants included/excluded (items one and four). All the studies briefly documented the reliability of diagnostic measurements but did not conduct any, or enough, within-study reliability checks of the diagnostic measurements to achieve excellence (item two). Nearly all the studies either briefly or fully described relevant comorbidities (item three).

Definition and Delivery of Treatment (Items 5-9)
Most studies fully described the treatment being evaluated or cited references that could fully describe it (item five). However, most studies had poor or no adherence reporting with regard to demonstrating the correct delivery of treatment (item six). A small majority of studies fully described therapist training levels and used well-qualified therapists (item seven). A small majority of the studies had limited or excellent descriptions of therapist supervision (item eight). Lastly, all studies had either brief or full descriptions of concurrent treatments (item nine).

Outcome Measures (Items 10-14)
Most studies showed or cited full validations of outcome measures (item ten) and specified the primary outcome measures in advance, though not explicitly (item 11). Most studies had poor or no blinding of outcome raters with regard to the treatment group (item 12) and had poor or no discussions of potential adverse events during treatment (item 13). Most studies had medium-term (2-12 months) follow-ups (item 14), indicating adequacy but not excellence.

Data Analysis (Items 15-19)
The subcategory of data analysis differs for uncontrolled studies and RCTs since intent-to-treat

Treatment assignment (Items 20-22)
The subcategory of treatment assignment is only applicable to the RCTs, as it considers randomization processes among other elements. A majority of RCTs had full justifications of comparison groups (item 20) and had comparison groups from the same population and time frame as the treatment group (item 21). Half of the RCTs utilized poor randomization procedures, such as sequential assignment or simple randomization with small samples (Beller et al., 2002). The other half had full and appropriate randomization procedures, such as performing randomization after screening and baseline assessment (item 22).

Overall quality of study (Items 23-24)
The item considering allegiance balancing with regard to treatment by therapists is only applicable for RCTs (item 23). Four of the six RCTs had either poor or no information with regard to allegiance balancing (item 23). The conclusions of most studies were, however, either partially or fully justified (item 24).

Aim 3: Quality Over Time
The Pearson correlation between publication year and percentage of maximum total scores was nonsignificant, r = 0.56, p = 0.116. In addition, the Pearson correlation between publication year and omnibus rating was nonsignificant, r = 0.52, p = 0.154. This indicates that there was no significant association between the year of publication and quality ratings.

Discussion of Main Findings
Aim 1
As reviews utilizing the RCT-PQRS are not consistent in their metric of quality measurement, it is difficult to compare our results with other reviews. Therefore, the number of studies above the cut-off score, as well as the omnibus ratings, will be used as a comparison.
The results of this review indicated that 78% of the studies were above the cut-off score for adequacy. These results are contrary to some research showing that only 57% of 93 RCTs evaluating psychodynamic therapy were rated as of adequate quality (Gerber et al., 2011). However, the findings were more consistent with more recent reviews reporting rates of studies above the cut-off score of 75%, 77%, and in one case 91% (Grenon et al., 2018; Keefe et al., 2020; Steinert et al., 2017).
In our review, the mean omnibus rating of studies fell between average and moderately good. This finding was comparable to the findings of the reviews previously mentioned (Grenon et al., 2018; Keefe et al., 2020). Interestingly, the omnibus ratings in this review were higher than those of a similar review assessing the quality of studies of IRT for the other sleep-related symptom in PTSD, namely PTSD-related nightmares (Harb et al., 2013). In that review, the authors reported the mean omnibus rating to be moderately poor (Harb et al., 2013).
Even though the findings of this review were to some degree comparable to those of other reviews using the RCT-PQRS, the results still showed that most of the studies only just surpassed the cut-off score. This indicates that certain measures could be taken to increase the quality of any future studies evaluating CBT-I in adult populations with PTSD. Additionally, it needs to be noted that there is a substantial difference in the scores between some of the studies. The mean percentage of maximum total scores should, therefore, be interpreted with caution.

Aim 2
As stated previously, low study quality can inflate effect sizes. Therefore, even though most studies were above the cut-off score for adequate study quality, measures should still be taken to ensure optimal quality. Certain aspects of the study design were consistently implemented or reported in ways that resulted in limited scoring for those specific items.
Treatment fidelity checks. Most of the studies failed to conduct proper treatment fidelity checks. Testing treatment fidelity is important since it can help explain why prior results were not replicated or why a particular evaluation found higher effects than prior studies (Temple et al., 2018). Without assessing the treatment fidelity of the study, therapy adherence and integrity cannot be controlled for, which limits the conclusions that can be drawn. To ensure that the treatment being studied is also the treatment being provided to study participants, systematic treatment fidelity checks should be administered (Temple et al., 2018).
Outcome assessments by blinded raters. Most studies were assessed to have limited outcome assessments in relation to blinding of raters and/or reliability checks. The reason for this in most of the studies was that self-reports were used as outcome measures. When outcome raters are not blinded, the raters, whether clinicians, patients, or unblinded assessors, could be biased by expectancy effects or knowledge of sham treatments that influence their ratings (Marcus et al., 2006).
Discussion of adverse events during treatment. All except one study failed to report on adverse events during treatment, and the one study that did only mentioned that there were none (Harb et al., 2019). Unfortunately, one systematic review has shown that adverse events are reported less often in psychotherapy trials than in trials assessing pharmacological treatments (Duggan et al., 2014). However, Spielman et al. (2011) argue that sleep restriction therapy, one of the interventions used in most CBT-I packages, can result in increased daytime sleepiness in the beginning phase of treatment. They underline that patients working in industries where vehicles or heavy machinery are used have an increased risk of accidents due to the treatment (Spielman et al., 2011). They also mention that patients with conditions like parasomnia (e.g., sleepwalking or sleep terrors), epilepsy, and sleep-disordered breathing should avoid participating in sleep restriction therapy due to the acute potential side effects of increased symptoms (Spielman et al., 2011).
Consideration of therapist/site effects. Most studies failed to adequately control for any potential therapist/site effects. Therapist/site effects are defined as differing levels of effectiveness between the individual study therapists or study sites. The implication of not properly considering these effects is that confounding variables are not controlled for, and effects that are not necessarily due to the specific treatment are attributed to it (Wampold & Imel, 2015). This can be avoided by conducting statistical analyses that control for the differing effectiveness of the treatment providers included in the study or by comparing the effectiveness of different sites in a multisite design (Wampold & Imel, 2015). One such statistical approach is the multilevel model (Kahn, 2011), as used in three of the articles included in the review (Gehrman et al., 2020; Harb et al., 2019; Laurel Franklin et al., 2018).
Randomization procedures. Half of the RCTs had limited randomization procedures. Proper randomization procedures are important to ensure that differences between groups at posttreatment or follow-up do not occur due to systematic differences that existed at baseline (Beller et al., 2002). This means that poor randomization can weaken the ability to conclude that the intervention caused the treatment effect. Adequate or excellent randomization can be ensured using stratified approaches, with randomization being conducted after baseline assessments.
Balancing of allegiance bias. Finally, most RCTs did not discuss or evaluate the balance of allegiance to types of treatment. Allegiance in psychotherapy outcome studies refers to the finding that the personal allegiance of researchers or study therapists to one of the treatments being evaluated is correlated with the outcomes of that particular treatment (Wampold & Imel, 2015). Multiple explanations, such as implicit bias in study design decisions or expectancy effects on the part of study therapists, have been offered for this finding (Wampold & Imel, 2015). Even though this mostly applies to studies using an active control condition, it has still been recognised by certain researchers that there should be an explicit consideration of allegiance balancing in psychotherapy outcome studies (Wampold & Imel, 2015). This can be implemented by ensuring that study therapists in the different treatment groups are equally motivated towards the respective treatments and that comparative studies use teams with mixed allegiances (Munder et al., 2013).

Aim 3
The finding that the correlation between publication year and percentage of maximum total or omnibus scores was not significant is contrary to that of previous studies. Previous reviews have found that quality changed over time for psychodynamic therapy trials (Kocsis et al., 2010), interventions for emotional distress in breast cancer patients (Temple et al., 2018), and interventions for eating disorders (Grenon et al., 2018). It has been argued that intervention studies, and RCTs in particular, have improved in quality over time (Kocsis et al., 2010). There are two possible explanations for this association not being found in our review. Firstly, this review only found nine relevant studies, whereas previous reviews assessing study quality of intervention studies found 69 relevant studies (Grenon et al., 2018; Kocsis et al., 2010; Temple et al., 2018). Our review was therefore based on a much smaller number of studies than previous reviews. Another explanation is that the included studies span a shorter time period than previous reviews, with a time period of 11 years in our included studies, whereas the time period in one of the earlier mentioned reviews was 30 years (Grenon et al., 2018; Kocsis et al., 2010; Temple et al., 2018).

Limitations
Only two databases, PsycInfo and PubMed, were used for the search. Future reviews could possibly benefit from searching additional databases. One such database could be the Published International Literature on Traumatic Stress (PILOTS), which aims to include all articles relevant to PTSD. Furthermore, no search of grey literature, such as unpublished manuscripts or reports not published in scientific journals, was conducted, limiting the scope of our review. Additionally, the RCT-PQRS is used to assess study quality specifically with regard to internal validity and focuses less on external validity in the form of generalizability (Gerber et al., 2011). Future reviews should increase the efforts to evaluate study design elements that are relevant to generalizability. Lastly, one very important limitation of our review is that even though we sought to evaluate the quality of studies evaluating CBT-I in general adult populations with PTSD, nearly all of the included studies' samples consisted entirely of veterans. Even though our search seems to indicate that no other studies with adult non-veteran samples exist, this is still important to note.

Strengths
The systematic literature searches were thorough, with preliminary work to identify relevant databases and search terms, as well as searching reference lists and contacting the first authors of the included studies. The RCT-PQRS has demonstrated good reliability and validity and is a useful tool for evaluating the quality of clinical studies (Gerber et al., 2011; Kocsis et al., 2010). Furthermore, it was useful to use the same measure across different study types, as has been done in the past, to enable comparison of scores. Lastly, the interrater reliability, as assessed with Kendall's tau-b, was moderate and deemed to be adequate.

Implications and Future Suggestions
Most of the studies were quantitatively rated to have adequate study quality through their omnibus rating, suggesting that the validity of these studies with regards to the design and conduct could be considered adequate.This would mean that the results of this review support the internal validity of the included studies and their results.However, as our discussions of our findings showcased, there were several important aspects of study quality that were inadequate.The limitations that were discussed, such as lack of treatment fidelity checks, blinded raters, reporting of adverse events, consideration of therapist/site effects, proper randomization procedures, and balancing of allegiance bias indicate impaired internal validity of the studies.This limits the certainty of the findings of the studies and practicing clinicians therefore need to consider the findings with regards to the efficacy of CBT-I in these studies tentative.
We offer six suggestions for future research based on the six items that were found to generally be of inadequate quality in our review:
• We suggest a greater focus on ensuring both the implementation and reporting of treatment fidelity checks during the study.
• The outcome assessments could to a greater extent be conducted by blinded raters.
• The reporting of any adverse events during the interventions could benefit from becoming usual practice.
• Therapist/site effects could be accounted for statistically in future studies to ensure that any potentially confounding variables are controlled for.
• Randomization procedures could be optimized by either using stratified approaches or using third parties to ensure proper randomization.
• Lastly, allegiance bias is a potential threat that can introduce bias to the designing and reporting of the study and therefore needs to be considered. This could be achieved by using mixed teams of researchers or by making sure that the therapists in the different treatment groups have allegiance to the respective techniques of that treatment (Munder et al., 2013).
By implementing these recommendations, the evidence base for using CBT-I in adult populations with PTSD could be considerably strengthened. Even though it may be too early to examine whether the quality of the studies evaluating CBT-I in PTSD populations changes over time, implementing the proposed changes could hopefully ensure a positive trend of improved study quality with time. As stated earlier, adequate study quality is important if the reported treatment effects and their conclusions are to inform policy-making and evidence-based practice. Finally, future reviews examining the quality of mixed CBT-I interventions are needed.

Conclusion
Intervention studies of cognitive behavioural therapy for insomnia in adult populations with PTSD are generally of adequate quality. This review supports the internal validity of the included studies and their results. However, several aspects of study design can be improved. The findings indicated that most intervention studies were generally above the cut-off score for adequate quality on the RCT-PQRS, but only 70% of the studies had a maximum total score of 50% or more.
Although most intervention studies were generally of adequate quality, some design elements were lacking, including the use and reporting of treatment fidelity checks, outcome assessments using blinded raters, reporting of adverse events during the treatment, controlling for confounding variables with regard to therapist/site effects, and considering allegiance bias in the designing and reporting of the study. This review found a non-significant correlation between publication year and study quality, which may be due to the relatively recent publication years of the included studies and the small sample size. Future studies could address the aforementioned aspects to improve the quality of intervention studies and enhance the ability to draw conclusions from treatment effects.

Table 2
Study and Treatment Characteristics of CBT-I Studies Included in this Review.
a = Randomized Controlled Trial. b = Image Rehearsal Therapy. c = Treatment as usual.

Table 3
Participant Characteristics of CBT-I Studies Included in this Review.
a = Standard deviation (SD) not reported. b = Information not reported.