
The authors have declared that no competing interests exist.

Efficient medical progress requires that we know when a treatment effect is absent. We considered all 207 Original Articles published in the 2015 volume of the

Across the medical sciences, null results are of central importance. For both patients and doctors, it is crucial to know that a new treatment does not outperform the current gold standard; that a generic drug is just as effective as an expensive brand-name drug; and that a new surgical procedure does not improve survival rate. Knowing that effects are absent allows the profession to retain existing medical procedures and reallocate its limited resources to the exploration of novel treatments that are potentially effective.

In this manuscript we summarize and reanalyze the null results for primary outcome measures reported in the 2015 volume of the

The statistical evaluation of medical hypotheses currently proceeds almost exclusively through the framework of

In order to quantify the evidence in favor of the absence of a treatment effect, we adopt the framework of Bayesian statistics and compute the predictive performance of two competing hypotheses: the null hypothesis that states the effect to be absent and the alternative hypothesis that states the effect to be present (e.g., [

The

Computation of the Bayes factor requires the analyst to quantify the expectation about effect size under the alternative hypothesis. In contrast to a classical power analysis, this expectation encompasses a range of different effect sizes, weighted by their prior plausibility. Here we adopt an “objective Bayesian approach” [

The conceptual advantage of a Bayesian analysis can be underscored with a simple example. Consider the study by Jolly et al. ([

Both studies fail to reject the null hypothesis and obtain a

In sum, the assessment of medical null effects may benefit from an additional Bayesian analysis. Below we explore the extent to which medical null results reported in the 2015 volume of NEJM actually yield compelling evidence in favor of the null hypothesis when assessed by a default Bayes factor hypothesis test.

We considered all 207 Original Articles published in the 2015 volume of NEJM. This journal was chosen because of its prominence in the field of medicine and because it publishes papers on a wide range of medical issues. An initial screening identified 45 articles (21.7%) whose abstract contained at least one claim about the absence or non-significance of an effect for a primary outcome measure. To facilitate the analysis and the interpretation of the results, we selected a further subset of 37 articles that allowed a simple comparison between proportions, that is,

Our Bayesian reanalysis was facilitated by the fact that knowledge of the individual cell counts suffices to compute Bayes factors for contingency tables. Bayes factors were calculated using the default test for a comparison of proportions (e.g., [

The Bayes factors reported here compare the predictive performance of the null hypothesis (which assumes the absence of an association between rows and columns) with the predictive performance of the alternative hypothesis (which assumes the presence of an association). The default Bayes factor specifies that under the alternative hypothesis, every combination of values for the proportions is equally likely a priori. For example, in the case of the 2 by 2 table, the alternative hypothesis specifies two independent uniform distributions for the two rate parameters. In specific applications, such a default, reference-style analysis can be supplemented by substantive knowledge based on earlier experience. With a more informative prior distribution, the alternative hypothesis will make different predictions, and a comparison with the null hypothesis will therefore yield a different Bayes factor. The more informed the prior distribution, the more specific the model predictions, and the more risk the analyst is willing to take. Highly informed prior distributions need to be used with care, as they may exert a dominant effect on the posterior distribution, making it difficult to “recover” once the data suggest that the prior was ill-conceived. With informed prior distributions, it is wise to perform a robustness analysis to examine the extent to which different modeling choices lead to qualitatively different outcomes.
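For the 2 by 2 case, the default test described above has a closed form when the two group sizes are treated as fixed. The sketch below is a minimal illustration under that assumption (independent binomial sampling, a uniform prior on the common rate under the null hypothesis, and two independent uniform priors on the rates under the alternative); it is not the authors' code, and the function names `log_beta` and `bf01_two_proportions` are hypothetical. Both marginal likelihoods contain the same binomial coefficients, so these cancel and only Beta functions of the cell counts remain, which is why the cell counts suffice.

```python
from math import lgamma, exp


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01_two_proportions(y1: int, n1: int, y2: int, n2: int) -> float:
    """Sketch of BF01 for H0: theta1 == theta2 versus H1: theta1, theta2 free.

    Assumptions (for illustration only): fixed group sizes n1 and n2,
    a Uniform(0, 1) prior on the common rate under H0, and independent
    Uniform(0, 1) priors on theta1 and theta2 under H1.
    """
    # Log marginal likelihood under H0 (binomial coefficients omitted):
    # integral of theta^(y1+y2) * (1-theta)^(n1+n2-y1-y2) over [0, 1].
    log_m0 = log_beta(y1 + y2 + 1, n1 + n2 - y1 - y2 + 1)
    # Log marginal likelihood under H1: product of two independent integrals.
    log_m1 = log_beta(y1 + 1, n1 - y1 + 1) + log_beta(y2 + 1, n2 - y2 + 1)
    # BF01 > 1 means the data are better predicted by the null hypothesis.
    return exp(log_m0 - log_m1)
```

As a hand check, `bf01_two_proportions(0, 1, 0, 1)` evaluates B(1, 3) / [B(1, 2) B(1, 2)] = (1/3) / (1/4), which is approximately 4/3. Working on the log scale keeps the computation stable for the large trials typical of NEJM.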

In this article, we prefer the default prior, as it is the most common choice and an informed specification would require an elaborate elicitation process from many different experts. We do not, therefore, view the outcomes of this analysis as definitive, although we believe that the qualitative results (i.e., strong but highly variable evidence in favor of the null) hold across a broad range of prior distributions.
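A robustness check of the kind recommended above can be sketched by replacing the uniform priors with a family of priors and recomputing the Bayes factor. The example below assumes, purely for illustration, symmetric Beta(a, a) priors on the common rate under the null hypothesis and on each rate under the alternative; a = 1 recovers the uniform default, and larger values of a concentrate prior mass near 0.5. The function name `bf01_beta_prior` and the counts are hypothetical and are not taken from any NEJM trial.

```python
from math import lgamma, exp


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01_beta_prior(y1: int, n1: int, y2: int, n2: int, a: float) -> float:
    """Sketch of BF01 (equal vs. unequal rates) under symmetric Beta(a, a) priors.

    Hypothetical prior family: under H0 the common rate ~ Beta(a, a); under H1
    the two rates are independent Beta(a, a). Binomial coefficients cancel.
    """
    y, n = y1 + y2, n1 + n2
    # log m0 = log B(y + a, n - y + a) - log B(a, a)
    log_m0 = log_beta(y + a, n - y + a) - log_beta(a, a)
    # log m1 = sum over both groups of log B(y_i + a, n_i - y_i + a) - log B(a, a)
    log_m1 = (log_beta(y1 + a, n1 - y1 + a) - log_beta(a, a)
              + log_beta(y2 + a, n2 - y2 + a) - log_beta(a, a))
    return exp(log_m0 - log_m1)


# Hypothetical counts: 30/200 events in one arm versus 28/200 in the other.
for a in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"a = {a:>4}: BF01 = {bf01_beta_prior(30, 200, 28, 200, a):.2f}")
```

The printed values change with a; the qualitative question a robustness analysis asks is whether they stay on the same side of 1 and in a comparable evidence category across reasonable prior choices.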

Before proceeding, it is important to point out that the Bayes factor quantifies the support provided by the data. For any two models, the posterior odds are obtained by multiplying the Bayes factor by the prior odds. In other words, the Bayes factor allows an assessment of the strength of evidence that is independent of the relative prior plausibility of the models.
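The relation between prior odds, Bayes factor, and posterior odds can be made concrete in a few lines. The sketch below assumes only two hypotheses, so that the posterior probability of the null hypothesis follows directly from its posterior odds; the function name `posterior_prob_h0` is hypothetical.

```python
def posterior_prob_h0(bf01: float, prior_odds_01: float = 1.0) -> float:
    """Posterior probability of H0, given BF01 and prior odds p(H0) / p(H1).

    Assumes H0 and H1 are the only two hypotheses under consideration.
    """
    # Posterior odds = Bayes factor x prior odds.
    posterior_odds = bf01 * prior_odds_01
    # Convert odds to a probability: odds / (1 + odds).
    return posterior_odds / (1.0 + posterior_odds)
```

For example, with equal prior odds, a Bayes factor of 2.42 (the smallest value reported in this article) gives a posterior probability for the null hypothesis of 2.42 / 3.42, or about 0.71. The same Bayes factor combined with different prior odds yields different posterior odds, while the Bayes factor itself does not change.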

The main result concerns the default Bayes factors for the 43 null effects reported in the 2015 NEJM volume.

One possible determinant of the strength of the Bayes factor is sample size.

Among the 43 null results from the 2015 volume of NEJM, large samples are more likely to yield compelling evidence in favor of the null hypothesis than small samples (r = 0.72).
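Why sample size matters can be illustrated with the uniform-prior test for two proportions. The sketch below assumes, for illustration only, that both groups show exactly the same observed event rate (20%) and varies only the per-group sample size; the data and the function name `bf01_two_proportions` are hypothetical, and the sketch does not reproduce the correlation reported above, which was computed from the actual NEJM results.

```python
from math import lgamma, exp


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01_two_proportions(y1: int, n1: int, y2: int, n2: int) -> float:
    """BF01 with uniform priors on the rates (assumed fixed group sizes)."""
    log_m0 = log_beta(y1 + y2 + 1, n1 + n2 - y1 - y2 + 1)
    log_m1 = log_beta(y1 + 1, n1 - y1 + 1) + log_beta(y2 + 1, n2 - y2 + 1)
    return exp(log_m0 - log_m1)


# Hypothetical data: identical 20% event rates in both groups; only n changes.
sizes = [50, 100, 200, 400, 800]
bfs = [bf01_two_proportions(n // 5, n, n // 5, n) for n in sizes]
for n, bf in zip(sizes, bfs):
    print(f"n per group = {n:>3}: BF01 = {bf:.2f}")
```

With the observed difference held at zero, the evidence for the null hypothesis keeps growing as the groups get larger (roughly with the square root of n), whereas a p value for the same data stays at 1 and cannot express this growth.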

We applied a default Bayes factor reanalysis to 43 null results published in the 2015 volume of NEJM. Reassuringly, this reanalysis revealed that all null results supported the null hypothesis over the alternative hypothesis, and the overall degree of support was strong. Nevertheless, from experiment to experiment the degree of evidence varied considerably—the smallest Bayes factor was 2.42 (“not worth more than a bare mention”, [
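The verbal label quoted above comes from Jeffreys' classification of Bayes factors. The sketch below assumes Jeffreys' original cut-offs at half-powers of ten (about 3.16, 10, 31.6, and 100); later authors often use rounded variants such as 3, 10, 30, and 100, and the exact treatment of boundary values here is an illustrative choice. The names `JEFFREYS_GRADES` and `jeffreys_label` are hypothetical.

```python
# Assumed grades after Jeffreys: (exclusive upper bound, label) for bf >= 1.
JEFFREYS_GRADES = [
    (10 ** 0.5, "not worth more than a bare mention"),
    (10 ** 1.0, "substantial"),
    (10 ** 1.5, "strong"),
    (10 ** 2.0, "very strong"),
]


def jeffreys_label(bf: float) -> str:
    """Label a Bayes factor bf >= 1 (evidence for the numerator hypothesis)."""
    if bf < 1:
        raise ValueError("invert the Bayes factor so that bf >= 1")
    for upper, label in JEFFREYS_GRADES:
        if bf < upper:
            return label
    # Anything at or above 100 falls in the top grade.
    return "decisive"
```

Under these cut-offs, `jeffreys_label(2.42)` returns "not worth more than a bare mention", matching the label used for the smallest Bayes factor in the reanalysis.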

Several remarks are in order. First, we were pleasantly surprised that as many as 21.7% of the studied papers reported a null result for at least one of the primary outcome measures, considering the strong confirmation bias that is present in the biomedical literature (e.g., [

Fourth, it may be possible to combine the different ingredients of classical statistics and NHST (e.g., power, confidence intervals,

Fifth, one might argue that the point null hypothesis is never exactly true, and as a result its examination is pointless (i.e., a foregone conclusion [_{0} or H_{δ} [with δ representing a very small effect—HMRW] will be nearly equal, and concern about the high probability of rejecting one is equivalent to concern about rejecting the other” (p. 582).

A final point concerns the difference between testing and estimation. Several researchers and institutions (e.g., [

In general, null results in medicine can have serious practical ramifications. The importance of medical null results is evident from the fact that in 2015, about 1 in every 5 papers we studied reported a null result for one of its primary outcome measures. For such null results, medical professionals need to be able to gauge the evidence in favor of the absence of an effect. Here we showed how this goal can be accomplished by the application of a default Bayes factor test. For many standard analyses, such default Bayes factor tests are now easy to apply ([

In sum, an assessment of all 207 Original Articles in the 2015 volume of NEJM revealed that 21.7% reported a null result for one or more of their primary outcome measures. A default Bayesian reanalysis of 43 null results revealed that the evidence in favor of the null hypothesis was strong on average, but highly variable. Larger samples generally produced stronger evidence. We suggest that by adopting a statistically inclusive approach, medical researchers confronted with a null result can issue a report that is more informative and more appropriate than the one that is currently the norm.

This is a supplement to “Bayesian Reanalysis of Null Results Reported in the New England Journal of Medicine: Strong yet Variable Evidence for the Absence of Treatment Effects” by Hoekstra R, Monden R, van Ravenzwaaij D, and Wagenmakers EJ. This document was written by Rei Monden (November 2016). The plots were generated from the 43 test statistics reported in the 2015 volume of the New England Journal of Medicine.
