From January 2014,

Recent years have seen a

Questionable research practices can be broadly defined as a set of research practices that are typically employed with the purpose of presenting biased evidence in favor of an assertion [

Under the

The response to this crisis of confidence has not been uniform and different guidelines have been introduced in response to these questionable research practices by different journals. These guidelines may also refer to a third party’s guidelines as a baseline, such as the American Psychological Association (APA)’s publication manual [

The Editor-in-Chief of ^{st}, 2014 [

Statistics section. Effective January 2014, Psychological Science recommends the use of the “new statistics”—effect sizes, confidence intervals, and meta-analysis—to avoid problems associated with null-hypothesis significance testing (NHST). Authors are encouraged to consult this Psychological Science tutorial by Geoff Cumming, which shows why estimation and meta-analysis are more informative than NHST and how they foster development of a cumulative, quantitative discipline.

Research Disclosure Statements section. For each study reported in your manuscript, check the boxes below to: (i) confirm that (a) the total number of excluded observations and (b) the reasons for making these exclusions have been reported in the Method section(s). [] If no observations were excluded, check here. []; (ii) confirm that all independent variables or conditions in all studies reported in the paper, whether they ended up being included in the analyses or not, have been reported in the Method section(s). [] If there were no independent variables or manipulations, as in the case of correlational research, check here. []; (iii) confirm that all dependent variables or measures that were analyzed for this article’s target research questions have been reported in the Methods section(s), (iv) specify how sample size in each study was determined and (b) your data-collection stopping rule have been reported in the Method section(s). []; (v) if sample size was based on past research, include the relevant reference information in your manuscript; and (vi) if sample size was based on power analysis, include in your manuscript the type of test (independent t-test, logistical regression, etc.) and the pertinent parameters: significance level (alpha), effect size (d), and power (1 –beta); all tests should be two tailed.
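As an illustration of the power-analysis parameters listed in item (vi), the following minimal sketch shows how alpha, effect size (d), and power jointly determine a per-group sample size for a two-tailed comparison of two independent means. It uses the standard normal approximation rather than the exact t-based calculation, so exact software may return a slightly larger n. The function name `n_per_group` and the example values are assumptions chosen for illustration and do not come from the journal's guidelines.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-tailed comparison of
    two independent means (normal approximation, equal group sizes)."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Illustrative values only: a medium effect (d = 0.5), alpha = .05, power = .80
print(n_per_group(0.5))
```

Reporting all three parameters, as the checklist requests, lets a reader redo this calculation and check that the stated sample size follows from them.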

Open Practice section. Three open practices (or badges) were included in this section:

(i) open data badge, which is earned for making publicly available the digitally-shareable data necessary to reproduce the reported result []; (ii) open materials badge, which is earned for making publicly available the digitally shareable materials/methods necessary to reproduce the reported results []; and (iii) preregistered badge, which is earned for having a preregistered design and analysis plan for the reported research and reporting results according to that plan. An analysis plan includes specification of the variables and the analyses that will be conducted.

Recent research indicates that the implementation of these new guidelines was found to be very effective in promoting the use of Open Practices from January 2012 to May 2015 [

The main aim of this preregistered study (^{st} 2013 and December 31^{st} 2015, by comparing papers accepted under both the old, simply referring to APA rules, and the new guidelines. In

To contextualize the extent to which practices changed across different psychology journals during this period, we also examined the

We hypothesized a positive change in the proportion of

All papers published in ^{st}, 2013 to December 31^{st}, 2015 were considered. Only the final published version was considered, not any earlier version released online. The sample included 305

Inclusion and exclusion criteria. Only primary empirical papers reporting data from one or more empirical studies were included. Papers reporting only meta-analyses, narrative reviews, simulations, comments, or theoretical studies were excluded. In particular, we excluded 6% of

Scoring procedure and method (see also

NHST. A

CI. A confidence interval was reported. CI counted all cases with any confidence interval. We reported the overall proportion of papers with at least one confidence interval for either standardized or unstandardized measures.

MA. Meta-analysis of multiple related results included in the paper was reported. We only included papers with more than one result related to the same empirical question.

CI_interp. A confidence interval was referred to in the discussion or interpretation of results, and the interpretation of the data was explicitly based on it. For example, this would include a paper explicitly mentioning the width or precision of the CI, comparing two or more CIs, or noting the overlap between two intervals.

ES_interp. An effect size, either standardized or unstandardized, was referred to in the discussion or interpretation of results. We considered ‘effect size’ in the broad sense (17), including means, differences between means, percentages, and correlations, as well as Cohen’s ^{2}, and ^{2}. Papers were counted if they went beyond a dichotomous difference vs. no-difference approach and referred to the magnitude of the effect (e.g., small, large, strong) or to the amount of explained variance. Effect size could be expressed in original units, or in some standardized or units-free form.

Sample_size. The authors described how sample size(s) were determined. For example, a power analysis, based on previous research or on an estimated effect size, had been conducted. We used a very lenient approach, including all papers that mentioned, even vaguely, how the sample size was determined (e.g., “the sample size was determined based on previous research”).

Data_excl. The authors reported the criteria for data inclusion or exclusion—for example, the criteria for the exclusion of outliers.

Data. The paper carried the Open Data badge (see below), or stated where the data were available or how they could be obtained. We used a very lenient approach, including all the papers mentioning that data were available (e.g., data are available upon request).

Materials. The paper carried the Open Materials badge, or stated where details of the experimental materials and procedure could be obtained. We used a very lenient approach, including all the papers mentioning that materials were available (e.g., materials are available upon request).

Preregistered. The paper carried the Preregistered badge, or stated where a preregistered plan had been lodged in advance of data collection. Papers in this category typically included information about the number of the preregistration or where the preregistration is available.

Paper ID:

Practice | Value | Label | Criteria for ‘Yes’ response |
---|---|---|---|
1-Null hypothesis significance testing | PE (p exact) | NHST | At least one |
2-Confidence intervals | Y | CI | At least one is reported; specify where: text, tables, figures |
3-Meta-analysis of reported data | Y/NA | MA | Authors meta-analyze results obtained in more than one reported experiment |
4-Confidence intervals interpretation | Y | CI_Interpr | Authors explicitly refer to CIs in the comments and/or discussion of the results, e.g. |
5-Standardized or unstandardized effect size interpretation | Y | ES_Interpr | Authors explicitly refer to |
6-Sample size determination | Y | Sample_size | Authors explicitly clarify how they determined the sample size(s), e.g. power estimate, previous studies, etc. |
7-Sample size stopping rule | Y | Data_excl | Authors explicitly declare whether and which stopping rule was adopted, or the criteria used to exclude data and/or manage outliers |
8-Data availability | Y | Data | Authors explicitly give information on how the data may be obtained, e.g. posted in a repository, author email, etc. |
9-Materials availability | Y | Mater | Authors explicitly give information on how to obtain the materials, equipment and/or software used in the study |
10-Preregistered design & analysis plan | Y | Prereg | Authors explicitly declare where the study was preregistered |

The three badges (i.e., Data, Materials and Preregistration) are described in detail by the Center for Open Science (tiny.cc/badges; accessed June 2016). For

Papers were examined for the presence of each of the ten practices. For each of them, the score could be “Y” (yes) if present.

The papers were divided between the authors, who scored them independently. The authors are experienced researchers with good knowledge of the statistics examined. In addition, a random sample comprising ten percent of the papers was scored independently by both raters to test inter-rater reliability. Mean inter-rater reliability was 90% across all ten variables, ranging from 99% for type of
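The reliability check described above can be sketched as follows, assuming that reliability was computed as simple percent agreement per practice and then averaged across practices. The text does not name the index used, so this is an assumption. The function `percent_agreement`, the practice labels chosen, and all scores below are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Share of papers on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of papers")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical scores for four double-scored papers on two practices
rater_a = {"CI": ["Y", "N", "Y", "Y"], "Data": ["N", "N", "Y", "N"]}
rater_b = {"CI": ["Y", "N", "N", "Y"], "Data": ["N", "N", "Y", "N"]}

# Agreement per practice, then the mean across practices
per_practice = {p: percent_agreement(rater_a[p], rater_b[p]) for p in rater_a}
mean_agreement = sum(per_practice.values()) / len(per_practice)
print(per_practice, mean_agreement)
```

Percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa would be a stricter alternative.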

Only descriptive statistics are reported, given that they refer to the whole population of studies. For each of the ten practices analyzed, the number of papers including a practice was expressed as a proportion of the total number of papers. Only for the meta-analysis (MA) did we exclude from the total those papers for which the meta-analysis criterion was not applicable (NA), namely papers with a single study. The checklist for study examination is presented in
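The proportion rule described above, including the removal of not-applicable (NA) papers from the denominator for MA, can be sketched as below. The score codes "N" and "NA" as strings, the function name `proportion_yes`, and the example scores are assumptions for illustration.

```python
def proportion_yes(scores):
    """Proportion of 'Y' scores among applicable papers.

    Papers scored 'NA' (e.g., single-study papers for the MA practice)
    are dropped from the denominator, as the text describes for MA.
    """
    applicable = [s for s in scores if s != "NA"]
    if not applicable:
        return None  # the practice could not apply to any paper
    return sum(s == "Y" for s in applicable) / len(applicable)


# Hypothetical MA scores for five papers, two of them single-study papers
ma_scores = ["Y", "N", "NA", "N", "NA"]
print(proportion_yes(ma_scores))  # one 'Y' among three applicable papers

# For the other nine practices no paper is 'NA', so the denominator is all papers
ci_scores = ["Y", "N", "N", "Y"]
print(proportion_yes(ci_scores))
```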

We considered all of the papers that appeared in

NHST = null hypothesis significance testing; CI = confidence intervals; MA = meta-analysis; CI_interp = confidence intervals interpretation; ES_interp = effect size interpretation; Data_excl = exclusion criteria reported; Material = additional materials availability; Prereg = preregistered study. For the meta-analysis (MA), only the papers with more than one related study, i.e., those with potential for MA, were considered.

Practice | PS 2013 | PS 2014 | PS 2015 | JEP: General 2013 | JEP: General 2014 | JEP: General 2015 |
---|---|---|---|---|---|---|
1. NHST | 0.99 | 1.00 | 0.96 | 1.00 | 0.99 | 1.00 |
2. CI | 0.28 | 0.37 | 0.70 | 0.26 | 0.24 | 0.52 |
3. MA | 0.04 | 0.02 | 0.03 | 0.00 | 0.02 | 0.10 |
4. CI_interp | 0.02 | 0.02 | 0.06 | 0.09 | 0.00 | 0.03 |
5. ES_interp | 0.09 | 0.07 | 0.17 | 0.25 | 0.08 | 0.17 |
6. Sample_size | 0.05 | 0.17 | 0.43 | 0.07 | 0.12 | 0.48 |
7. Data_excl | 0.12 | 0.26 | 0.65 | 0.32 | 0.27 | 0.46 |
8. Data | 0.03 | 0.17 | 0.39 | 0.01 | 0.01 | 0.02 |
9. Mater | 0.07 | 0.14 | 0.31 | 0.11 | 0.01 | 0.05 |
10. Prereg | 0.00 | 0.01 | 0.02 | 0.00 | 0.00 | 0.04 |

NHST = null hypothesis significance testing; CI = confidence intervals; MA = meta-analysis; CI_interp = confidence intervals interpretation; ES_interp = effect size interpretation; Data_excl = exclusion criteria reported; Mater = additional materials availability; Prereg = preregistered study. For the meta-analysis (MA), only the papers with more than one related study, i.e., those with potential for MA, were considered.

In both journals over all three years, virtually all empirical papers used NHST, with about 80% reporting the

Overall, in

As for the three open practices, the availability of data increased from 3% to 39% (an increase of 36 percentage points) and the availability of other materials from 7% to 31% (an increase of 24 percentage points). The use of meta-analysis (MA) showed no evident increase: fewer than 5% of the papers that reported more than one related study included one. The increase in preregistration was also very small, reaching only about 2% in 2015.
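The changes reported above are differences between two proportions, so they are most precisely expressed in percentage points. A minimal sketch, using the 2013 and 2015 PS proportions for Data and Mater from the table above, shows the calculation; the function name `percentage_point_change` is an assumption for illustration.

```python
def percentage_point_change(before, after):
    """Absolute change between two proportions, in percentage points."""
    return round((after - before) * 100, 1)


# PS proportions for 2013 and 2015, taken from the results table
ps = {"Data": (0.03, 0.39), "Mater": (0.07, 0.31)}

for practice, (p2013, p2015) in ps.items():
    points = percentage_point_change(p2013, p2015)
    print(f"{practice}: {p2013:.0%} -> {p2015:.0%} ({points:+.0f} percentage points)")
```

Expressed in relative terms the same changes look much larger (data availability in 2015 is 13 times its 2013 level), which is why the unit of change should be stated.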

In general, it seems that CI use and sample size justification are increasingly being adopted in

Comparing

For

Moreover, there were larger and broader changes in

It is uncertain whether the improvement in the use of some statistical practices represents a substantial change. On the one hand, the reporting of confidence intervals, sample size determination and data exclusion criteria has improved considerably. On the other hand, we believe that there is still an overreliance on interpreting results based solely on NHST and on

However, shifting the incentives that underlie long-entrenched scientific practices, so that new practices become easy to adopt, may make substantial change possible. Indeed, initiatives to promote this shift are already under way in the community (e.g., preregistration and the Preregistration Challenge, the TOP Guidelines, and the Peer Reviewers’ Openness Initiative).

Improvements in open practices in

In

Our findings provide convergent support for the initiatives that emphasize the critical role of journal editors and reviewers in promoting reforms in scientific practices. Journal editors and reviewers are crucial in verifying that authors adopt the proposed practices. Among the ongoing initiatives, the Transparency and Openness Promotion (TOP) guidelines (available here

Another recent initiative is the Peer Reviewers' Openness Initiative (

Choice of comparison journal. It is difficult to make a direct comparison between the submission guidelines of two journals, as there are numerous factors to consider. Journals may be influenced by the publisher or society with which they are associated, the subject matter, the technical aspects of submission, or the age and pedigree of the journal.

Time-frame. Our analysis was limited to three years, from 2013 to 2015. In fact,

Confidence intervals and effect size interpretation. We believe that it is difficult to establish whether an author has interpreted an ES or a CI. We only coded CI_interp or ES_interp if the authors explicitly interpreted the CI or the ES. However, we cannot exclude the possibility that the proportion of papers falling within these two categories would have increased using a more lenient approach.

Sample size. We used a very lenient approach for sample size determination. It can be argued that authors can be rather vague about how sample size was determined (e.g., “sample size was determined based on previous research”), and many authors overlook the importance of performing a power analysis before collecting data [

Data. Our results on open data sharing rates in

Materials. We used a very lenient approach for the materials availability. However, we recognize that the majority of researchers fail to share data after publication [

To sum up, we cannot assess the extent to which the observed changes were caused by the guideline changes, but it seems that changes in guidelines may be useful, although not sufficient. Changing guidelines may be effective for some practices but rather less so for others. Substantial innovation in scientific practice seems likely to require multiple strategies for change, including, in particular, the coordinated efforts of journal editors, reviewers and authors.

Broadly speaking, it could be suggested that many authors take a “bare-minimum” approach; journal-specific submission guidelines may therefore have a greater impact than a reference to an external source, such as the APA manual. Consequently, it may be both best practice and in a journal’s best interest to give authors specific directions on the reporting of statistics and the use of open practices, even when these are nearly identical to, and readily available from, other sources.

We are grateful to the Editor, Jelte M. Wicherts, and to the reviewer, Mallory Kidwell, for their comments and suggestions.