
Benjamin Djulbegovic is in the Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, University of South Florida, Tampa, Florida, United States of America. Iztok Hozo is in the Department of Mathematics, Indiana University Northwest, Gary, Indiana, United States of America.

The authors have declared that no competing interests exist.

Ioannidis estimated that most published research findings are false.

The authors calculate the probability above which potentially false research findings may become acceptable to society.

As in most investment strategies, our willingness to accept particular research findings will depend on the expected payback (the benefits) and the inadvertent consequences (the harms) of the research. We begin by defining a “positive” finding in research in the same way that Ioannidis defined it, with the probability that a positive research finding is true given by its positive predictive value (PPV).

However, the calculation of PPV tells us nothing about whether a particular research result is acceptable to researchers or not. Nevertheless, it can be shown that there is some probability (the “threshold probability,” p_{t}) above which the results of a study will be sufficient for researchers to accept them as “true.” The relationship between p_{t} and the net benefits/harms ratio (B/H) can be expressed as (see Appendix, Equation A1): p_{t} = 1/(1 + B/H).
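For readers who prefer a computational statement, the threshold can be sketched numerically. This is a minimal illustration assuming the Appendix relationship p_{t} = 1/(1 + B/H); the helper name threshold_probability is ours, not part of the original analysis:

```python
def threshold_probability(bh_ratio):
    """Threshold probability p_t above which a positive research finding
    should be accepted, given the net benefits/harms ratio B/H:
    p_t = 1 / (1 + B/H)."""
    if bh_ratio <= 0:
        raise ValueError("the B/H ratio must be positive")
    return 1.0 / (1.0 + bh_ratio)

# A finding whose net benefits are 1.5 times its net harms:
print(threshold_probability(1.5))  # 0.4
```

Note how a large B/H drives p_{t} toward zero: highly beneficial, low-harm findings require little certainty before being acted upon.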

We define net benefit as the difference between the values of the outcomes of the action taken under the research hypothesis and the null hypothesis, respectively (when in fact the research hypothesis is true). Net harms are defined as the difference between the values of the outcomes of the action taken under the null and the research hypotheses, respectively (when in fact the null hypothesis is true). If the PPV is above p_{t}, we can rationally accept the results of the research findings. Similarly, if the PPV is below p_{t}, we should accept the null hypothesis. Note that the research payoffs (the benefits) and the inadvertent consequences (the harms) jointly determine this threshold.

We can now frame the crucial question of interest as: What is the minimum B/H ratio for the given PPV for which the research hypothesis has a greater value than the null hypothesis? Mathematically, this will occur when (see Appendix, Equations A1 and A2): B/H > (1 − PPV)/PPV.
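The condition can be checked directly. A minimal sketch, assuming the acceptance condition B/H > (1 − PPV)/PPV; the helper name minimum_bh_ratio is hypothetical:

```python
def minimum_bh_ratio(ppv):
    """Smallest net benefits/harms ratio at which a finding with the
    given PPV is worth accepting: B/H > (1 - PPV) / PPV."""
    if not 0.0 < ppv < 1.0:
        raise ValueError("PPV must lie strictly between 0 and 1")
    return (1.0 - ppv) / ppv

# A finding that is true with probability 40% needs B/H above 1.5:
print(round(minimum_bh_ratio(0.4), 2))  # 1.5
```

As expected, the more probable the finding, the smaller the benefit margin it needs to justify acting on it.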

The horizontal yellow line indicates the actual conditional probability that the research hypothesis is true in the case of positive findings. This means that for benefit/harm ratios above the threshold (1.5 in this example), the research hypothesis can be accepted.

Note that we are following the classic decision theory approach to the results of clinical trials, which states that a rational decision maker should select the research versus the null hypothesis depending on which one maximizes the value of consequences.

Interim analyses of clinical trials are challenging exercises in which researchers and/or data safety monitoring committees have to make a judgment as to whether to accept early promising results and terminate a trial or whether the trial should continue.

We now illustrate these issues by considering a clinical research hypothesis: is radiotherapy plus chemotherapy (combined R_{x}) superior to radiotherapy alone (RT) in the management of cancer of the esophagus?

The Radiation Oncology Cooperative Group conducted a randomized controlled trial to evaluate the effects of combined chemotherapy and radiotherapy versus radiotherapy alone in patients with cancer of the esophagus.

A sample size of 150 patients was planned to detect an improvement in the two-year survival rate from 10% to 30% in favor of combined R_{x} (at α = 0.05 and β = 0.10). At the interim analysis, 88% of patients in the control group (RT) had died, while only 59% in the experimental arm (combined R_{x}) had died, resulting in a survival advantage of 29% in favor of combined R_{x}.

For this reason, the trial was terminated prematurely after enrolling 121 patients. Two percent of patients died as a result of treatment in the combined R_{x} group, versus 0% in the RT arm. Thus, the observed net benefits/harms ratio in this trial was (88 − 59 − 2)/2 = 13.5.

For our worst-case scenario, we assume that as many as 12% of patients in the combined R_{x} arm will have died as a result of treatment. This results in a worst-case net benefits/harms ratio of (88 − 59 − 12)/12 ≈ 1.4.
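Both ratios can be reproduced from the trial figures. The function below is our restatement of the arithmetic in the text (all inputs in percentage points):

```python
def net_bh_ratio(control_deaths, experimental_deaths, treatment_deaths):
    """Net benefits/harms ratio: the survival advantage net of
    treatment-related deaths, divided by treatment-related deaths."""
    net_benefit = control_deaths - experimental_deaths - treatment_deaths
    return net_benefit / treatment_deaths

print(net_bh_ratio(88, 59, 2))             # observed: 13.5
print(round(net_bh_ratio(88, 59, 12), 1))  # worst case: 1.4
```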

The trial was stopped using classic inferential statistics, which indicated that the probability of the observed results, assuming the null hypothesis that combined R_{x} is equivalent to RT, was extremely small. But what is the probability that combined R_{x} is better than RT? The probability that the research finding is true (i.e., that combined R_{x} is truly a better treatment than RT) under the best-case scenario is 95% [95% CI, 89%–99.9%]. Under the worst-case scenario, the probability that combined R_{x} is better than RT is 80% [95% CI, 61%–99%].

The results indicate that in the best-case scenario, the probability that the research findings are true far exceeds the threshold above which the results should be accepted (i.e., PPV is greater than p_{t}). Therefore, rationally, in this case we should not hesitate to accept the findings from this study as truthful. However, in the worst-case scenario, the lower limit of the PPV's 95% confidence interval intersects with the upper limit of the threshold's 95% confidence interval, indicating that under these circumstances the research hypothesis may not be acceptable (since PPV is possibly less than p_{t}). Had the investigators made a mistake when they terminated the trial early?

Mistakes are an integral part of research. Positive research findings may subsequently be shown to be false.

We now apply the concept of acceptable regret to address the question of whether potentially false research findings should be tolerated. In other words: which decision (regarding a research hypothesis) should we make if we want to ensure that the regret is less than a predetermined (minimal acceptable) regret, R_{0}? We express R_{0} as the fraction (r) of the net benefits B that we would be willing to forgo if we wrongly accepted a false research finding.

It can easily be shown that we should be willing to accept the results of potentially false research findings as long as the posterior probability of their being true is above the acceptable regret threshold probability, p_{r}, where p_{r} = 1 − r · (B/H) (and p_{r} = 0 whenever this expression is negative).

This equation describes the effect of acceptable regret on the threshold probability: the more benefit we are prepared to forgo when wrong, the lower the probability at which research findings become acceptable.
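The effect can be sketched numerically. This is a minimal illustration assuming the threshold takes the form p_{r} = 1 − r · (B/H), floored at zero; regret_threshold is our name for the helper:

```python
def regret_threshold(r, bh_ratio):
    """Acceptable regret threshold p_r: accepting a finding that is true
    with probability p carries expected regret (1 - p) * H; tolerating at
    most R_0 = r * B requires p >= 1 - r * (B/H), floored at zero."""
    return max(0.0, 1.0 - r * bh_ratio)

# Worst-case trial scenario (B/H = 1.4), forgoing 30% of benefits:
print(round(regret_threshold(0.30, 1.4), 2))  # 0.58
# Best-case scenario (B/H = 13.5): forgoing even 10% drives p_r to zero.
print(regret_threshold(0.10, 13.5))           # 0.0
```

The two printed values match the thresholds reported later in the text for the esophageal cancer trial.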

Note that actions under expected utility theory (EUT) and acceptable regret may not necessarily be identical, but arguably the most rational course of action is to select those research findings with the highest expected utility while keeping regret below the acceptable levels. The supplementary material (a longer version of the paper and Appendix) shows that the maximum possible fraction of benefits that we can forgo (and still be wrong) while at the same time adhering to the precepts of EUT is given by r ≤ 1/(1 + B/H) (see Appendix, Equations A3–A6).

A practical interpretation of this inequality is that some research findings may never become acceptable unless we are ready to violate the axioms of EUT, i.e., to accept a value of r larger than this maximum.

We return now to the “real life” scenario above, i.e., the dilemma of whether to stop a clinical trial early. In our worst-case analysis, the probability that combined R_{x} is better than radiotherapy alone could potentially be as low as 80% [95% CI, 61%–99%]. This figure overlaps with the threshold probability of 41% [95% CI, 11%–72%] above which research findings are acceptable under the worst-case scenario (i.e., the confidence intervals of the PPV and of p_{t} intersect).

One way to handle situations in which evidence is not solidly established is to explicitly take into account the possibility that one can make a mistake and wrongly accept the results of a research hypothesis. Accepting this possibility can, in turn, help us determine “decision thresholds” that will take into account the amount of error which may or may not be particularly troublesome to us if we wrongly accept research findings.

Let us assume that the investigators in the esophageal cancer trial are prepared to accept that they may be wrong and that they were willing to forgo 10%, 30%, or 67% of benefits. Using these values, we find that in the best-case scenario the probability that combined R_{x} is superior to RT is above all decision thresholds (since p_{r} = 0 in the best-case scenario).

The calculated (acceptable regret) threshold above which we should accept research findings is shown for the worst-case scenario (B/H = 1.4; see text for details) with a (hypothetical) assumption that we are willing to forgo 30% of the benefits (slanted line). The calculated threshold probability (acceptable regret threshold) has a value of 58% when B/H = 1.4 (the horizontal line). This means that as long as the probability that research findings are true is above this acceptable regret threshold, these research findings could be accepted with a tolerable amount of regret in case the research hypothesis proves to be wrong (for didactic purposes, only one acceptable regret threshold is shown).

You will recall that the trial was designed to detect an improvement in the two-year survival rate from 10% to 30% in favor of combined R_{x}. By finding that combined R_{x} improved survival by 29%, the investigators appeared to have realized their most optimistic expectations.

Therefore, we assume that the investigators in the esophageal cancer trial are prepared to accept that they may be wrong and that they were willing to forgo 10%, 30%, or 67% of benefits.

We applied these assumptions to calculate the corresponding acceptable regret threshold probabilities (p_{r}).

In the best-case scenario, all acceptable regret thresholds (p_{r} = 0) are well below the calculated probability that the research hypothesis is true [PPV = 95% (88%–99.9%)] (i.e., PPV > p_{r} = 0 for all acceptable regret assumptions).

In the worst-case scenario, the probability that combined R_{x} is superior to RT [80% (61%–99%)] is above all other decision thresholds, and its “truthfulness” can be accepted (because PPV [= 80% (61%–99%)] > acceptable regret threshold [= 58% (52%–64%)] and PPV > acceptable regret threshold [= 6% (0%–19%)]). Note that if we are willing to tolerate a loss of 30% of benefits for being wrong, the upper limit of the acceptable regret CI (64%) still overlaps with the lower limit of the PPV's CI (61%), but that is not the case if we are willing to forgo 67% of treatment benefits.
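These comparisons can be reproduced end to end with a short sketch. The B/H ratios and PPV point estimates are those reported in the text; the acceptable regret threshold is assumed to take the form p_{r} = 1 − r · (B/H), floored at zero (confidence intervals are omitted here for simplicity):

```python
# (B/H ratio, PPV point estimate) for each scenario from the trial.
scenarios = {"best case": (13.5, 0.95), "worst case": (1.4, 0.80)}

for name, (bh, ppv) in scenarios.items():
    for r in (0.10, 0.30, 0.67):  # fractions of benefit forgone
        p_r = max(0.0, 1.0 - r * bh)  # acceptable regret threshold
        verdict = "accept" if ppv > p_r else "withhold"
        print(f"{name}: r={r:.0%}, p_r={p_r:.0%}, PPV={ppv:.0%} -> {verdict}")
```

Only the worst-case combination with r = 10% fails the check (p_{r} = 86% exceeds the 80% PPV), which is why the finding is said to be above all *other* decision thresholds.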

In the final analysis, the answer to the question posed in the title of this paper, “When should potentially false research findings be considered acceptable?” has much to do with our beliefs about what constitutes knowledge itself.

However, because a typical clinical research hypothesis is formulated to test for benefits, we have here postulated a relationship between R_{0} and benefits. If we were instead to relate R_{0} to harms, or to both benefits and harms, we would have to acknowledge that there is much more uncertainty, often total ignorance, about harms (since data on harms are often limited). As a consequence, under these circumstances research may become acceptable only if we relax our criteria for acceptable regret, i.e., accept a larger value of r.

We conclude that since obtaining the absolute “truth” in research is impossible, society has to decide when less-than-perfect results may become acceptable. The approach presented here, advocating that the research hypothesis should be accepted when it is coherent with beliefs “upon which a man is prepared to act,” offers one rational way to make that decision.


We thank Drs. Stela Pudar-Hozo, Heloisa Soares, Ambuj Kumar, and Madhu Behrera for critical reading of the paper and their useful comments. We also wish to thank Dr. John Ioannidis for important and constructive insights particularly related to the issue of quality of data on harms and overall context of this work.

Abbreviations: B/H, net benefits/harms; CI, confidence interval; combined R_{x}, radiotherapy plus chemotherapy; EUT, expected utility theory; PPV, posterior probability; p_{r}, acceptable regret threshold probability; p_{t}, threshold probability; R_{0}, acceptable regret; RT, radiotherapy alone.