
The authors have declared that no competing interests exist.

Conceived and designed the experiments: MvA RvA MN. Performed the experiments: MvA RvA MN. Analyzed the data: RvA MN. Wrote the paper: MvA RvA MN JW.

De Winter and Happee

Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing.

Publishing everything is more effective than only reporting significant outcomes.

Scientific publications typically report positive rather than negative empirical results

In their simulation study, W&H compared two approaches to scientific publishing in a scenario involving a hypothetical line of research on a given topic. All of W&H's simulations assumed that one underlying population effect gave rise to the study outcomes (i.e., a fixed effect in the parlance of meta-analysis). In W&H's first approach (“publishing everything”) all results were published regardless of their outcome. In their second approach (“selective publishing”), results were published if they deviated significantly from the earlier evidence as presented in the literature. We remark that the notion that all results are published does not match well with the overwhelming evidence of the file-drawer problem in many academic fields

The selective publishing approach of W&H assumes that each subsequent study in a line of research (i.e., a study on a given topic) involves a test of the null hypothesis that the population effect equals the summary effect obtained by a meta-analysis on the previously published findings (a so-called cumulative meta-analysis). Findings are only published when they are significantly different from the summary effect on the basis of earlier published results (

W&H evaluated the performance of the selective publishing and publishing everything approaches using a straightforward simulation study that involved one scenario in which the population was normally distributed with mean and standard deviation equal to 0.3 and 1, respectively. All primary studies in the scenario had a sample size of 50. W&H repeated the simulation 5,000 times, and each simulation was stopped when 40 studies were published in the selective publishing approach. W&H found that the standard deviation of the cumulative meta-analytic effect across all 5,000 replications, as a function of publication number, was smaller in the selective publishing than in the publishing everything approach (W&H's Figure 3). More specifically, the standard deviation after 40 publications was .0170 for selective publishing compared to the higher .0222 for publishing everything, and 68 studies were needed in publishing everything to reach the same accuracy obtained after 40 publications in selective publishing (
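The scenario just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not W&H's or our actual code: the first study is assumed to be published unconditionally, the publication rule is assumed to be a two-sided test at the .05 level using a normal approximation (`z_crit=1.96`), and the function names and the default number of replications are hypothetical.

```python
import math
import random
import statistics


def fixed_effect_summary(means, variances):
    """Inverse-variance weighted mean of the studies published so far."""
    weights = [1.0 / v for v in variances]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


def run_study(rng, mu, sigma, n):
    """Draw one primary study; return its mean and squared standard error."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var / n


def line_of_research(rng, selective, mu=0.3, sigma=1.0, n=50,
                     n_published=40, z_crit=1.96):
    """Conduct studies until n_published of them are published.

    Returns (summary effect after the last publication, studies conducted).
    """
    means, variances, conducted = [], [], 0
    while len(means) < n_published:
        mean, se2 = run_study(rng, mu, sigma, n)
        conducted += 1
        if selective and means:
            # Assumed rule: publish only if the study mean differs
            # significantly from the current cumulative summary effect.
            summary = fixed_effect_summary(means, variances)
            if abs(mean - summary) / math.sqrt(se2) <= z_crit:
                continue  # the result stays in the file drawer
        means.append(mean)
        variances.append(se2)
    return fixed_effect_summary(means, variances), conducted


def run(n_rep=100, seed=1):
    """Replicate both approaches; n_rep is kept small for speed (W&H used 5,000)."""
    rng = random.Random(seed)
    results = {}
    for label, selective in (("everything", False), ("selective", True)):
        reps = [line_of_research(rng, selective) for _ in range(n_rep)]
        results[label] = {
            "sd_of_summary": statistics.stdev([r[0] for r in reps]),
            "mean_conducted": statistics.fmean([r[1] for r in reps]),
        }
    return results
```

Calling `run()` returns, for each approach, the standard deviation of the summary effect across replications (the quantity W&H compared) and the average number of studies that had to be conducted to obtain 40 publications.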

Here we show that W&H's conclusion is false, i.e., that publishing everything is

Using exactly the same simulation results and scenario as W&H, we present two reasons why publishing everything is more effective than selective publishing. The first reason is that the meta-analytic effect is estimated more precisely (i.e., with a lower standard error) in the publishing everything than in the selective publishing approach. Second, publishing everything is also “cheaper” than selective publishing in terms of cost-benefit and time considerations.

A major problem in the evaluation of the two approaches by W&H is that it is based on a number that is not available to the scientist. W&H compared the two approaches using the

To illustrate the higher precision of the meta-analytic effect in publishing everything, we ran exactly the same simulation as W&H but now recorded the sampling error of the estimate after 40 publications for each of the 5,000 simulations. We applied random-effects meta-analysis, because random-effects meta-analysis is generally recommended when the underlying population effect may be heterogeneous
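To make concrete what a researcher actually has available, namely the standard error from a single meta-analysis, the sketch below computes a random-effects summary. It assumes the DerSimonian-Laird estimator of the between-study variance, which is one common choice; the text above does not say which estimator was used, and the function name is hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects summary with the DerSimonian-Laird tau^2 estimator.

    effects:   study effect estimates (here, study means)
    variances: their squared standard errors
    Returns (summary estimate, its standard error, estimated tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # The estimate is truncated at zero, so homogeneous data give tau^2 = 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return estimate, se, tau2
```

When the estimated between-study variance is zero, the weights reduce to the usual inverse-variance weights and the result coincides with a fixed-effect analysis; any positive estimate widens the standard error.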

The second reason why publishing everything is more effective concerns the neglect of time and money in the analysis of W&H: “… the factor time is not included in our simulation models. That is, results are assessed per publication without taking into account study completion and the time between study completion and publication.” (p. 2). They compared both approaches conditional on the number of publications, but the 40^{th} published study in publishing everything corresponds to on average the 704^{th} conducted study (p. 3) in the selective publishing approach. Hence, on average 17.6 times as much time and money (704/40) is spent on data collection and doing research in the selective publishing approach to obtain the same number of 40 published studies as in the publishing everything approach. To conclude, publishing everything is also more effective than selective publishing in terms of cost-benefit and time considerations.

In this section, we expand on W&H's results by considering a scenario where the effect size matters. We slightly changed the scenario of W&H, but still compared the same two approaches. Again, the population was normally distributed, standard deviation equaled 1, and each primary study had a sample size of 50. However, the population mean equaled 0 (no effect), rather than 0.3 as in W&H's scenario. Moreover, we introduced a different stopping rule: it was assumed scientists stopped investigating the effect when they rejected the null hypothesis that the population effect is at least small (H_{0}:

| | Publishing Everything | Selective Publishing |
| | 0.001 (0.083) | 0.000 (0.106) |
| # studies published (sd) | 3.84 (1.73) | 8.36 (1.64) |
| # studies conducted (sd) | 3.84 (1.73) | 113.95 (49.30) |
| | 0.005 (0.009) | 0.072 (0.014) |
| % | 59.4 | 0 |
| | 0.023 (0.082) | 0.000 (0.106) |
| | −0.022 (0.083) | 0.000 (0.106) |

The final rows of

On the basis of the results of their simulation, W&H concluded that, after a fixed number of publications, selective publishing yields a more accurate estimate of the true effect than publishing everything. We argue that their analysis of their results has two weaknesses. First, they use information that a researcher does not have, i.e., the standard deviation of the meta-analytic effect after many simulations; a researcher only possesses the standard error of one meta-analysis on the basis of all available results. Second, W&H analyzed their data disregarding the possible heterogeneity of effect sizes in a given meta-analysis. After re-analyzing their simulation results using random-effects meta-analysis we conclude, contrary to W&H, that publishing everything yields a more accurate estimate than selective publishing. We also compared the two approaches in another scenario where the population effect is zero and scientists stop investigating the effect when they reject the null hypothesis that the population effect is at least small. The results in the latter scenario favored publishing everything even more; on average fewer than 4 studies were needed in the publishing everything approach, whereas more than 8 publications (114 studies) were needed in the selective publishing approach. On top of statistical arguments, we also demonstrate that publishing everything is more efficient than selective publishing with respect to cost-benefit and time considerations.

Here we argued that publishing everything is superior to selective publishing. More generally, in line with what is considered as best practice in meta-analysis, we recommend researchers to include both published

Another relevant question concerns the validity of the selective publishing scenario. It surely has some merit, since it explains the Proteus phenomenon that has occurred in some research areas

In defense of W&H, one may argue that we did not do justice to their fixed effect approach by using random-effects meta-analyses in our simulations. Random-effects meta-analysis assumes a heterogeneous population effect size, but the data in all scenarios are generated from a population with a fixed effect size. However, a researcher does not know the population effect size and cannot determine a priori that there is just one underlying effect in a set of studies. It is standard practice to use random-effects meta-analysis when there is evidence of effect size heterogeneity

Neither our nor W&H's simulations take into account what we consider major sources of bias in science, namely researchers' own aspirations and expectations in conducting studies, analyzing the data, and reporting the results. The colloquial notion of “disappointing results” renders actual science notably more complex. Researchers conduct studies with a clear expectation about the results and are not immune to confirmation bias. Moreover, most high-impact journals specifically select for novel results and are commonly believed to select studies predominantly on the basis of statistical significance

The bias introduced by the contemporary scientific publishing system is quite severe
