<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id><journal-title-group>
<journal-title>PLoS ONE</journal-title></journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PONE-D-14-17566</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0114255</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biology and life sciences</subject><subj-group><subject>Neuroscience</subject><subj-group><subject>Cognitive science</subject><subj-group><subject>Cognitive psychology</subject><subj-group><subject>Priming (psychology)</subject></subj-group></subj-group></subj-group></subj-group><subj-group><subject>Psychology</subject><subj-group><subject>Developmental psychology</subject><subject>Experimental psychology</subject><subject>Social psychology</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Science policy</subject><subj-group><subject>Research integrity</subject><subj-group><subject>Publication ethics</subject><subject>Scientific misconduct</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Social sciences</subject></subj-group></article-categories>
<title-group>
<article-title>Excess Success for Psychology Articles in the Journal <italic>Science</italic></article-title>
<alt-title alt-title-type="running-head">Excess Success for Psychology in <italic>Science</italic></alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>Gregory</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Tanzman</surname><given-names>Jay</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Matthews</surname><given-names>William J.</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><label>1</label><addr-line>Department of Psychological Sciences, Purdue University, West Lafayette, Indiana, United States of America and Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland</addr-line></aff>
<aff id="aff2"><label>2</label><addr-line>Tanzman Statistical Consulting, Los Angeles, California, United States of America</addr-line></aff>
<aff id="aff3"><label>3</label><addr-line>Department of Psychology, University of Cambridge, Cambridge, United Kingdom</addr-line></aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Lu</surname><given-names>Zhong-Lin</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/></contrib>
</contrib-group>
<aff id="edit1"><addr-line>The Ohio State University, Center for Cognitive and Brain Sciences, Center for Cognitive and Behavioral Brain Imaging, United States of America</addr-line></aff>
<author-notes>
<corresp id="cor1">* E-mail: <email xlink:type="simple">gfrancis@purdue.edu</email></corresp>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
<fn fn-type="con"><p>Analyzed the data: GF JT WJM. Contributed reagents/materials/analysis tools: GF JT WJM. Wrote the paper: GF JT WJM.</p></fn>
</author-notes>
<pub-date pub-type="collection"><year>2014</year></pub-date>
<pub-date pub-type="epub"><day>4</day><month>12</month><year>2014</year></pub-date>
<volume>9</volume>
<issue>12</issue>
<elocation-id>e114255</elocation-id>
<history>
<date date-type="received"><day>18</day><month>4</month><year>2014</year></date>
<date date-type="accepted"><day>5</day><month>11</month><year>2014</year></date>
</history>
<permissions>
<copyright-year>2014</copyright-year>
<copyright-holder>Francis et al</copyright-holder><license xlink:type="simple"><license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p></license></permissions>
<abstract>
<p>This article describes a systematic analysis of the relationship between empirical data and theoretical conclusions for a set of experimental psychology articles published in the journal <italic>Science</italic> between 2005 and 2012. When the success rate of a set of empirical studies is much higher than would be expected relative to the experiments' reported effects and sample sizes, it suggests that null findings have been suppressed, that the experiments or analyses were inappropriate, or that the theory does not properly follow from the data. The analyses herein indicate such excess success for 83% (15 out of 18) of the articles in <italic>Science</italic> that report four or more studies and contain sufficient information for the analysis. This result suggests a systematic pattern of excess success among psychology articles in the journal <italic>Science</italic>.</p>
</abstract>
<funding-group><funding-statement>The authors have no funding or support to report.</funding-statement></funding-group><counts><page-count count="15"/></counts><custom-meta-group><custom-meta id="data-availability" xlink:type="simple"><meta-name>Data Availability</meta-name><meta-value>The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information files.</meta-value></custom-meta></custom-meta-group></article-meta>
</front>
<body><sec id="s1">
<title>Introduction</title>
<p>Unbelievable discoveries <xref ref-type="bibr" rid="pone.0114255-Bem1">[1]</xref>, important experimental findings that fail to replicate <xref ref-type="bibr" rid="pone.0114255-Doyen1">[2]</xref>, <xref ref-type="bibr" rid="pone.0114255-Shanks1">[3]</xref>, fraudulent data <xref ref-type="bibr" rid="pone.0114255-Stapel1">[4]</xref>, <xref ref-type="bibr" rid="pone.0114255-Johnson1">[5]</xref>, and awareness that researchers might use questionable research practices to produce significant findings <xref ref-type="bibr" rid="pone.0114255-Simmons1">[6]</xref>, <xref ref-type="bibr" rid="pone.0114255-John1">[7]</xref> have contributed to concerns that psychology cannot be trusted to produce valid scientific work <xref ref-type="bibr" rid="pone.0114255-Yong1">[8]</xref>–<xref ref-type="bibr" rid="pone.0114255-Levelt1">[10]</xref>. Even though fraud may be rare, researchers may re-run experiments, drop subjects, selectively merge data from different experiments, suppress null findings, drop experimental conditions, or round down <italic>p</italic> values in order to report statistical significance. Such practices bias the experimental outcomes, undermine the credibility of the theoretical conclusions that are derived from the published experimental data, and leave a statistical trace that can be identified across multiple experiments with the Test for Excess Significance (TES) <xref ref-type="bibr" rid="pone.0114255-Ioannidis1">[11]</xref>. Broadly speaking, the TES estimates the probability that a set of experiments with proper sampling, appropriate analyses, and full reporting will produce at least as many “successful” outcomes as have actually been observed. If this probability is small, it suggests that researchers should doubt the assumptions of appropriate sampling, proper analysis, and complete reporting. We describe the details of the TES below.</p>
<p>Recent investigations using the TES <xref ref-type="bibr" rid="pone.0114255-Francis1">[12]</xref>–<xref ref-type="bibr" rid="pone.0114255-Schimmack1">[21]</xref> have indicated that some articles and meta-analyses in the field of psychological science appear to be biased, which suggests that scientists should be skeptical about the claims in those articles and meta-analyses. Since those TES analyses focused on specific articles rather than a representative sample of articles, they do not allow for generalization to the broader field. It is difficult to create a random sample from published articles, and such a sample may not be especially meaningful because a seminal paper may motivate many investigations while a randomly selected paper may have little impact. Francis <xref ref-type="bibr" rid="pone.0114255-Francis9">[22]</xref> partly addressed this issue by investigating all possible articles in the prominent journal <italic>Psychological Science</italic> over a four-year span. In that analysis, 82% of investigated articles (36 out of 44) failed the TES analysis. Here we apply the same systematic analysis to articles published in the highly influential journal <italic>Science</italic> over an eight-year span. We investigated the journal <italic>Science</italic> because it is widely recognized as being one of the most important scientific journals. We restricted our analysis to papers related to psychology and education because, as described below, the TES analysis requires some subject-matter expertise to be able to interpret the presented statistical findings. The current authors have such expertise for psychology and education but will have to leave a similar analysis for fields such as biology or medicine to other subject-matter experts.</p>
</sec><sec id="s2">
<title>The Test for Excess Success</title>
<p>Across a series of hypothesis tests some unknown subset of the tests will result in errors, either by rejecting the null hypothesis when it is true (a Type I error) or by failing to reject the null when it is false (a Type II error). Random sampling ensures that such errors will sometimes occur even under ideal experimental conditions. Null hypothesis significance testing provides a way to control the Type I and Type II error rates; for example, setting the Type I error rate at 0.05 implies that one will mistakenly reject the null hypothesis for 5% of those studies where the null hypothesis is true. Under non-ideal conditions, such as when model assumptions required by the statistical tests are not met, the Type I error rate can be much larger than the intended 5%. Moreover, this nominal error rate depends upon the data having been properly sampled, analyzed, and reported.</p>
<p>Improper sampling, analysis, or reporting is difficult to identify in any single study, but such behaviors leave a detectable pattern across a set of experiments. Ioannidis and Trikalinos <xref ref-type="bibr" rid="pone.0114255-Ioannidis1">[11]</xref> proposed a “test for excess significance” (TES) that compares estimates of experimental power with the reported frequency of significant outcomes. For our analyses, we slightly extend the TES to encompass “excess success” rather than only excess significance.</p>
<p>The definition of “success” differs across experiments. In many experiments a successful outcome is to reject the null hypothesis (typically defined as <italic>p</italic>≤0.05), in which case the probability of success corresponds to a calculation of experimental power. In other experiments a successful outcome is to not reject the null hypothesis, in which case the success probability is the complement of power. Finally, an experiment's success is sometimes based on a pattern of significant and non-significant hypothesis tests that contrast different aspects of the data. Regardless of the definition of success, a set of experiments with properly gathered data, and results that are appropriately analyzed and fully reported, should produce successful outcomes at a rate consistent with the experiments' estimated success probabilities <xref ref-type="bibr" rid="pone.0114255-Ioannidis1">[11]</xref>, <xref ref-type="bibr" rid="pone.0114255-Francis4">[15]</xref>, <xref ref-type="bibr" rid="pone.0114255-Francis7">[18]</xref>, <xref ref-type="bibr" rid="pone.0114255-Schimmack1">[21]</xref>. Too much success suggests that the reported results are biased in favor of the theoretical claims.</p>
<p>The logic of the TES is similar to standard approaches in hypothesis testing. We start by supposing proper data collection and analysis for each experiment along with full reporting of all experimental outcomes related to the theoretical ideas. Such suppositions are similar to the null hypothesis in standard hypothesis testing. We then identify the magnitude of the reported effects and estimate the probability of success for experiments like those reported. Finally, we compute a joint success probability, <italic>P<sub>TES</sub></italic>, across the full set of experiments, which estimates the probability that experiments like the ones reported would produce outcomes at least as successful as those actually reported. When the reported experiments are uniformly successful, <italic>P<sub>TES</sub></italic> estimates the probability that direct replications of the experiments will all be successful. The <italic>P<sub>TES</sub></italic> value plays a role similar to the <italic>p</italic> value in standard hypothesis testing, with a small <italic>P<sub>TES</sub></italic> suggesting that the starting suppositions are not entirely correct and that, instead, there appears to be a problem with data collection, analysis, or publication of relevant findings. In essence, if <italic>P<sub>TES</sub></italic> is small, then the published findings in an article appear to be “too good to be true” relative to the theoretical claims. A common criterion for <italic>P<sub>TES</sub></italic> being small is 0.1 <xref ref-type="bibr" rid="pone.0114255-Ioannidis1">[11]</xref>, <xref ref-type="bibr" rid="pone.0114255-Francis1">[12]</xref>, <xref ref-type="bibr" rid="pone.0114255-Begg1">[23]</xref>. Given scientific interest in reproducibility, a probability of 0.1 seems like a very modest criterion for success across experiments that support theoretical claims; most scientists would probably want their theoretical claims to be based on experiments with much more reliable outcomes. 
While it is not the case that the theoretical claims derived from experiments with excess success are necessarily (or entirely) wrong, the evidence for these claims is, at best, weaker than it first appears.</p>
<p>To explain the TES analysis without emphasizing any particular article, <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref> describes the relevant statistics and hypotheses for five reported experiments that are taken from five different articles in <italic>Science</italic> and artificially brought together. The TES conclusion here will not be meaningful because these experiments do not promote a common theoretical claim, but the discussion helps to demonstrate the types of issues that appear when doing a TES analysis. The original authors reported the hypotheses listed in <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref> as supporting their theoretical claims. Because “success” sometimes includes a complex set of both significant and non-significant outcomes, the estimation of success probability is often more complicated than a standard power analysis. Moreover, sometimes an article does not report sufficient statistical detail to fully estimate the success probabilities. In such cases, we always estimate an upper limit on the probabilities, which favors an interpretation that the articles are valid.</p>
<table-wrap id="pone-0114255-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0114255.t001</object-id><label>Table 1</label><caption>
<title>Statistical properties, hypotheses, and estimated probabilities of success for a set of five experiments.</title>
</caption><alternatives><graphic id="pone-0114255-t001-1" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0114255.t001" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">Statistics</td>
<td align="left" rowspan="1" colspan="1">Hypotheses</td>
<td align="left" rowspan="1" colspan="1">Probability of success</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Exp. 1</td>
<td align="left" rowspan="1" colspan="1"><italic>n</italic> = 179</td>
<td align="left" rowspan="1" colspan="1">ρ<sub>1</sub> ≠ 0</td>
<td align="left" rowspan="1" colspan="1">0.844</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>r<sub>1</sub></italic> = −0.22</td>
<td align="left" rowspan="1" colspan="1">ρ<sub>2</sub> ≠ 0</td>
<td align="left" rowspan="1" colspan="1">0.518</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>r<sub>2</sub></italic> = −0.15</td>
<td align="left" rowspan="1" colspan="1">ρ<sub>3</sub> ≠ 0</td>
<td align="left" rowspan="1" colspan="1">0.675</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>r<sub>3</sub></italic> = −0.18</td>
<td align="left" rowspan="1" colspan="1">Joint</td>
<td align="left" rowspan="1" colspan="1"><italic>0.518</italic></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Exp. 2</td>
<td align="left" rowspan="1" colspan="1"><italic>n<sub>1</sub></italic> = 17, <italic>n<sub>2</sub></italic> = 17, <italic>n<sub>3</sub></italic> = 18</td>
<td align="left" rowspan="1" colspan="1">ANOVA</td>
<td align="left" rowspan="1" colspan="1">0.684</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e001" xlink:type="simple"/></inline-formula> = 316, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e002" xlink:type="simple"/></inline-formula> = 305, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e003" xlink:type="simple"/></inline-formula> = 186</td>
<td align="left" rowspan="1" colspan="1">µ<sub>1</sub> ≠ µ<sub>3</sub></td>
<td align="left" rowspan="1" colspan="1">0.696</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>s</italic> = 152</td>
<td align="left" rowspan="1" colspan="1">µ<sub>2</sub> ≠ µ<sub>3</sub></td>
<td align="left" rowspan="1" colspan="1">0.620</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">µ<sub>1</sub>  =  µ<sub>2</sub></td>
<td align="left" rowspan="1" colspan="1">0.946</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">Joint</td>
<td align="left" rowspan="1" colspan="1"><italic>0.482</italic></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Exp. 3</td>
<td align="left" rowspan="1" colspan="1"><italic>n<sub>1</sub></italic> = 18, <italic>n<sub>2</sub></italic> = 18</td>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e004" xlink:type="simple"/></inline-formula> = 5.16, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e005" xlink:type="simple"/></inline-formula> = 3.47</td>
<td align="left" rowspan="1" colspan="1">µ<sub>1</sub> ≠ µ<sub>2</sub></td>
<td align="left" rowspan="1" colspan="1"><italic>0.517</italic></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>s</italic> = 2.85</td>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Exp. 4</td>
<td align="left" rowspan="1" colspan="1"><italic>n<sub>1</sub></italic> = 28, <italic>n<sub>2</sub></italic> = 26</td>
<td align="left" rowspan="1" colspan="1">µ<sub>X1</sub> ≠ µ<sub>X2</sub></td>
<td align="left" rowspan="1" colspan="1">0.495</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>F<sub>X</sub>  = 4.08, F<sub>Y</sub>  = 4.40</italic></td>
<td align="left" rowspan="1" colspan="1">µ<sub>Y1</sub> ≠ µ<sub>Y2</sub></td>
<td align="left" rowspan="1" colspan="1">0.528</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>r<sub>XY</sub></italic>  = 0.36</td>
<td align="left" rowspan="1" colspan="1">Joint</td>
<td align="left" rowspan="1" colspan="1"><italic>0.318</italic></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Exp. 5</td>
<td align="left" rowspan="1" colspan="1"><italic>n<sub>1A</sub></italic> = 17, <italic>n<sub>2A</sub></italic> = 17, <italic>n<sub>1B</sub></italic> = 17, <italic>n<sub>2B</sub></italic> = 17</td>
<td align="left" rowspan="1" colspan="1">Interaction</td>
<td align="left" rowspan="1" colspan="1">0.916</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e006" xlink:type="simple"/></inline-formula> = 3.87, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e007" xlink:type="simple"/></inline-formula> = 7.00</td>
<td align="left" rowspan="1" colspan="1">µ<sub>1A</sub> ≠ µ<sub>1B</sub></td>
<td align="left" rowspan="1" colspan="1">0.681</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e008" xlink:type="simple"/></inline-formula> = 7.59, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pone.0114255.e009" xlink:type="simple"/></inline-formula> = 4.28</td>
<td align="left" rowspan="1" colspan="1">µ<sub>2A</sub> ≠ µ<sub>2B</sub></td>
<td align="left" rowspan="1" colspan="1">0.635</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"><italic>s</italic> = 3.91</td>
<td align="left" rowspan="1" colspan="1">Joint</td>
<td align="left" rowspan="1" colspan="1"><italic>0.438</italic></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><italic>P<sub>TES</sub></italic></td>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">0.018</td>
</tr>
</tbody>
</table>
</alternatives></table-wrap>
<p>Experiment 1 measured the correlation between a dependent variable and each of three related measures of another variable. The probability of a significant outcome for each test by itself is estimated with post hoc power calculations <xref ref-type="bibr" rid="pone.0114255-Champely1">[24]</xref>, which suppose that the population correlation is the value observed in the sample. (Post hoc power is sometimes inappropriately used as an estimated probability of an already observed experimental outcome, but we use it here as an estimate of success probabilities for the outcome of replication studies.) Because these outcomes are correlated, estimating the joint probability for all three observed significant outcomes would require access to the raw data. Since the probability of all three outcomes being significant cannot exceed the probability of any one of the outcomes, the test with the smallest success probability provides an estimated upper bound for the set. This estimated probability is a bit over one-half, and <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref> lists it as the joint probability for Experiment 1.</p>
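<p>The post hoc power calculation for Experiment 1 can be illustrated with a minimal sketch. It assumes a two-sided test of a zero population correlation and uses the Fisher <italic>z</italic> approximation, which is our choice for illustration and not necessarily the routine used in the original analysis. The function name <monospace>correlation_power</monospace> is hypothetical. Under these assumptions the sketch gives values close to the individual success probabilities in <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref>, and the smallest of them serves as the upper bound for the joint probability.</p>
<preformat><![CDATA[
```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power for testing rho = 0.

    Assumption: Fisher z approximation, with the population correlation
    set equal to the observed sample correlation r (post hoc power).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative.
    shift = math.atanh(abs(r)) * math.sqrt(n - 3)
    # Probability of landing in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Sample size and correlations for Experiment 1, as listed in Table 1.
N = 179
SAMPLE_R = [-0.22, -0.15, -0.18]

powers = [correlation_power(r, N) for r in SAMPLE_R]
# The joint probability cannot exceed the smallest individual probability.
upper_bound = min(powers)

for r, power in zip(SAMPLE_R, powers):
    print(f"r = {r:+.2f}: estimated power = {power:.3f}")
print(f"upper bound on joint probability = {upper_bound:.3f}")
```
]]></preformat>
<p>Taking the minimum is deliberately generous to the original article: any dependence structure among the three tests can only make the true joint probability equal to or smaller than this bound.</p>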
<p>Experiment 2 used a between-subjects design with three conditions. The authors supported their theoretical claims with a statistical analysis that included a significant omnibus ANOVA and significant contrasts between a control and each of the two experimental conditions. Any difference between the two experimental conditions was predicted to be non-significant. These tests are not independent, so we estimated the probability of success by the Monte Carlo method. We simulated 100,000 experiments that drew samples from normally distributed populations using the means and standard deviations derived from the sample statistics. Our simulations found that only the predicted non-significant outcome has a high probability of success. Moreover, the joint probability of all tests producing a successful outcome is less than one-half because it is uncommon for a set of random samples to satisfy so many constraints on the outcomes.</p>
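<p>A Monte Carlo estimate of this kind can be sketched as follows. This is an illustration under stated assumptions, not the simulation code from the supporting information: it assumes that the contrasts use the pooled error term from the ANOVA, it hard-codes the two-sided critical <italic>t</italic> value for 49 degrees of freedom, and it runs 20,000 rather than 100,000 simulated experiments to keep the run time short. The names <monospace>simulate_once</monospace> and <monospace>estimate</monospace> are hypothetical.</p>
<preformat><![CDATA[
```python
import math
import random
from statistics import fmean

# Summary statistics for Experiment 2, as listed in Table 1.
N_PER_GROUP = (17, 17, 18)
MEANS = (316.0, 305.0, 186.0)
SD = 152.0
ALPHA = 0.05
DF_ERROR = sum(N_PER_GROUP) - len(N_PER_GROUP)  # 49

# Assumption: two-sided critical t for alpha = 0.05 and 49 df, hard-coded
# so that the sketch needs only the standard library.
T_CRIT = 2.0096


def simulate_once(rng):
    """Simulate one experiment and return the four success indicators."""
    groups = [[rng.gauss(m, SD) for _ in range(n)]
              for m, n in zip(MEANS, N_PER_GROUP)]
    means = [fmean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / sum(N_PER_GROUP)
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(N_PER_GROUP, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / DF_ERROR
    f_stat = (ss_between / 2) / ms_within
    # With 2 numerator df, the upper tail of F has a closed form.
    p_anova = (1 + 2 * f_stat / DF_ERROR) ** (-DF_ERROR / 2)

    def contrast_significant(i, j):
        # Assumption: pairwise contrast tested against the pooled error term.
        se = math.sqrt(ms_within * (1 / N_PER_GROUP[i] + 1 / N_PER_GROUP[j]))
        return abs(means[i] - means[j]) / se > T_CRIT

    return (
        p_anova < ALPHA,                 # significant omnibus ANOVA
        contrast_significant(0, 2),      # mu1 != mu3
        contrast_significant(1, 2),      # mu2 != mu3
        not contrast_significant(0, 1),  # mu1 = mu2 (predicted non-significant)
    )


def estimate(n_sims=20000, seed=1):
    """Estimate individual and joint success probabilities by simulation."""
    rng = random.Random(seed)
    counts = [0, 0, 0, 0]
    joint_count = 0
    for _ in range(n_sims):
        outcome = simulate_once(rng)
        for k, ok in enumerate(outcome):
            counts[k] += ok
        joint_count += all(outcome)
    return [c / n_sims for c in counts], joint_count / n_sims


individual, joint = estimate()
labels = ["ANOVA", "mu1 != mu3", "mu2 != mu3", "mu1 = mu2"]
for label, prob in zip(labels, individual):
    print(f"{label}: {prob:.3f}")
print(f"Joint: {joint:.3f}")
```
]]></preformat>
<p>Counting a simulated experiment as a joint success only when all four indicators hold is what captures the dependence among the tests; multiplying the four individual probabilities would wrongly treat them as independent.</p>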
<p>Experiment 3 compared ratings across two priming conditions with a two-sample <italic>t</italic>-test. The test produced only a “marginally significant” result (<italic>p</italic> = 0.09), but this was judged by the original authors as sufficient evidence to support their theoretical claim. To be consistent with the original authors, the success probability was based on a significance criterion of 0.1.</p>
<p>Experiment 4 reported two behavioral measures from two samples of participants that were exposed to different conditions. A successful outcome relative to the theory required both measures to show a significant difference. The summary statistics did not fully report the means and standard deviations of the measures, so <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref> lists the relevant <italic>F</italic> values for the tests. The article reported the correlation between the measures, so we estimated the probability of success by the Monte Carlo method with 100,000 simulated experiments that took samples from populations using the correlation and standardized means, which we derived from the <italic>F</italic> values. The success probability for each individual test is close to one-half, but the probability of both outcomes being significant is closer to one-third. Similar to Experiment 2, the multiple constraints on the definition of success reduce the joint probability.</p>
<p>Experiment 5 used a two-by-two between-subjects design and predicted a significant interaction and significant contrasts across each of two pairs of conditions. We estimated the probabilities of these outcomes with simulated experiments that used the reported means and estimated standard deviations. We derived the latter from the reported <italic>F</italic> values because the reported standard deviations were inconsistent with the reported <italic>F</italic> values. Although the interaction has a high success probability, the joint success probability is low because the particular type of interaction required by the theoretical claims is fairly uncommon.</p>
<p>The italicized success probability for each experiment in <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref> indicates the joint probability for all of the required outcomes for that experiment. The estimated probability that five experiments like these would all produce successful outcomes is the product of the five joint probabilities, <italic>P<sub>TES</sub></italic> = 0.018. This probability indicates that entirely successful outcomes across all of these experiments should be very rare for unbiased experiments. Had these experiments been reported together, in a single paper, to support a set of related theoretical claims, then such a low probability would indicate that readers should be skeptical that the data were gathered properly, that the analyses were appropriate, or that all relevant experimental findings were fully reported. Note that this skepticism would not mean that the theoretical claims were wrong, only that such claims were unsubstantiated by the analyses of the current set of experiments. This skepticism also would not necessarily indicate that any of the experimental results were invalid, because the TES analysis can only identify a problem across the set. It could be that all of the reported experiments were valid but that unsuccessful valid experiments were not reported. Such unsuccessful experiments might undermine the authors' theoretical claims. Alternatively, the reported experiments might be invalid by themselves because of inappropriate sampling or analysis.</p>
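<p>The final step of the calculation can be reproduced in a few lines. The sketch below takes the five italicized joint probabilities from <xref ref-type="table" rid="pone-0114255-t001">Table 1</xref>, multiplies them on the assumption that the five experiments are independent of one another, and compares the product with the 0.1 criterion. The function name <monospace>p_tes</monospace> is hypothetical.</p>
<preformat><![CDATA[
```python
import math

# Joint success probabilities for Experiments 1-5 (italicized in Table 1).
JOINT_PROBABILITIES = [0.518, 0.482, 0.517, 0.318, 0.438]
CRITERION = 0.1  # criterion for excess success used in the text


def p_tes(probabilities):
    """Probability that every experiment succeeds, assuming independence."""
    return math.prod(probabilities)


value = p_tes(JOINT_PROBABILITIES)
print(f"P_TES = {value:.3f}")  # prints "P_TES = 0.018"
print("excess success" if value < CRITERION else "no excess success indicated")
```
]]></preformat>
<p>Because every factor is at most one, each additional required success can only lower the product, which is why a long run of uniformly successful experiments with moderate power yields a small <italic>P<sub>TES</sub></italic>.</p>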
</sec><sec id="s3">
<title>Applying the TES Analysis to Articles in <italic>Science</italic></title>
<p>From the <italic>Science</italic> journal's on-line collection, we downloaded all 133 original research articles (and their supplementary material) that were classified as Psychology or Education for the years 2005–2012. We then checked each article and its supplementary material to determine whether the article contained four or more studies, a condition required to provide sufficient power to conduct a TES analysis <xref ref-type="bibr" rid="pone.0114255-Francis7">[18]</xref>. We identified 25 such articles classified as Psychology and one such article classified as Education.</p>
<p>We further examined each of these 26 articles to see if the article and its supplementary material provided sufficient detail to perform a TES analysis. Eight articles did not include sufficient detail to compute success probabilities for at least four studies and were thus excluded from the TES analysis. <xref ref-type="supplementary-material" rid="pone.0114255.s001">Information S1</xref> lists the eight excluded articles and the reasons for their exclusion. The supporting information also provides a full description of the TES analysis for each of the included articles <xref ref-type="bibr" rid="pone.0114255-Dijksterhuis1">[25]</xref>–<xref ref-type="bibr" rid="pone.0114255-Shah1">[42]</xref> (<xref ref-type="supplementary-material" rid="pone.0114255.s001">Information S1</xref>) and provides <italic>R</italic> source code (<xref ref-type="supplementary-material" rid="pone.0114255.s002">Information S2</xref>) for estimating success probabilities with Monte Carlo simulations for complicated experimental designs.</p>
</sec><sec id="s4">
<title>Results and Discussion</title>
<p><xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> lists the <italic>P<sub>TES</sub></italic> value for each of the 18 analyzed articles. For 15 of these articles, the probability of the observed experimental outcome is below the 0.1 criterion for excess success. Thus, 83% of the articles in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> make theoretical claims based on experimental results that appear to be too good to be true given a weak requirement for estimated reproducibility. This 83% apparent bias rate is especially troubling because the journal <italic>Science</italic> publishes studies that are widely considered to be among the best and most influential in the field. If the work published in <italic>Science</italic> is flawed, then either the entire field is suspect or the journal <italic>Science</italic> does not actually reflect the field's best work and its influence is unwarranted.</p>
<table-wrap id="pone-0114255-t002" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0114255.t002</object-id><label>Table 2</label><caption>
<title>Results of the TES analysis for each of eighteen articles in <italic>Science</italic>.</title>
</caption><alternatives><graphic id="pone-0114255-t002-2" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0114255.t002" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Year</td>
<td align="left" rowspan="1" colspan="1">Authors</td>
<td align="left" rowspan="1" colspan="1">Short title</td>
<td align="left" rowspan="1" colspan="1"><italic>P<sub>TES</sub></italic></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">2006</td>
<td align="left" rowspan="1" colspan="1">Dijksterhuis et al. <xref ref-type="bibr" rid="pone.0114255-Dijksterhuis1">[25]</xref></td>
<td align="left" rowspan="1" colspan="1">Deliberation-Without-Attention Effect</td>
<td align="left" rowspan="1" colspan="1">0.051</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2006</td>
<td align="left" rowspan="1" colspan="1">Vohs et al. <xref ref-type="bibr" rid="pone.0114255-Vohs1">[26]</xref></td>
<td align="left" rowspan="1" colspan="1">Psychological Consequences of Money</td>
<td align="left" rowspan="1" colspan="1">0.002</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2006</td>
<td align="left" rowspan="1" colspan="1">Zhong &amp; Liljenquist <xref ref-type="bibr" rid="pone.0114255-Zhong1">[27]</xref></td>
<td align="left" rowspan="1" colspan="1">Washing Away Your Sins</td>
<td align="left" rowspan="1" colspan="1">0.095</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2007</td>
<td align="left" rowspan="1" colspan="1">Wood et al. <xref ref-type="bibr" rid="pone.0114255-Wood1">[28]</xref></td>
<td align="left" rowspan="1" colspan="1">Perception of Goal-Directed Action in Primates</td>
<td align="left" rowspan="1" colspan="1">0.031</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2008</td>
<td align="left" rowspan="1" colspan="1">Whitson &amp; Galinsky <xref ref-type="bibr" rid="pone.0114255-Whitson1">[29]</xref></td>
<td align="left" rowspan="1" colspan="1">Lacking Control Increases Illusory Pattern Perception</td>
<td align="left" rowspan="1" colspan="1">0.008</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2009</td>
<td align="left" rowspan="1" colspan="1">Mehta &amp; Zhu <xref ref-type="bibr" rid="pone.0114255-Mehta1">[30]</xref></td>
<td align="left" rowspan="1" colspan="1">Effect of Color on Cognitive Performance</td>
<td align="left" rowspan="1" colspan="1">0.002</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2009</td>
<td align="left" rowspan="1" colspan="1">Paukner et al. <xref ref-type="bibr" rid="pone.0114255-Paukner1">[31]</xref></td>
<td align="left" rowspan="1" colspan="1">Monkeys Display Affiliation Toward Imitators</td>
<td align="left" rowspan="1" colspan="1">0.037</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2009</td>
<td align="left" rowspan="1" colspan="1">Weisbuch et al. <xref ref-type="bibr" rid="pone.0114255-Weisbuch1">[32]</xref></td>
<td align="left" rowspan="1" colspan="1">Race Bias via Televised Nonverbal Behavior</td>
<td align="left" rowspan="1" colspan="1">0.027</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2010</td>
<td align="left" rowspan="1" colspan="1">Ackerman et al. <xref ref-type="bibr" rid="pone.0114255-Ackerman1">[33]</xref></td>
<td align="left" rowspan="1" colspan="1">Incidental Haptic Sensations Influence Decisions</td>
<td align="left" rowspan="1" colspan="1">0.017</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2010</td>
<td align="left" rowspan="1" colspan="1">Bahrami et al. <xref ref-type="bibr" rid="pone.0114255-Bahrami1">[34]</xref></td>
<td align="left" rowspan="1" colspan="1">Optimally Interacting Minds</td>
<td align="left" rowspan="1" colspan="1">0.332</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2010</td>
<td align="left" rowspan="1" colspan="1">Kovács et al. <xref ref-type="bibr" rid="pone.0114255-Kovcs1">[35]</xref></td>
<td align="left" rowspan="1" colspan="1">Susceptibility to Others' Beliefs in Infants and Adults</td>
<td align="left" rowspan="1" colspan="1">0.021</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2010</td>
<td align="left" rowspan="1" colspan="1">Morewedge et al. <xref ref-type="bibr" rid="pone.0114255-Morewedge1">[36]</xref></td>
<td align="left" rowspan="1" colspan="1">Imagined Consumption Reduces Actual Consumption</td>
<td align="left" rowspan="1" colspan="1">0.012</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2011</td>
<td align="left" rowspan="1" colspan="1">Halperin et al. <xref ref-type="bibr" rid="pone.0114255-Halperin1">[37]</xref></td>
<td align="left" rowspan="1" colspan="1">Promoting the Middle East Peace Process</td>
<td align="left" rowspan="1" colspan="1">0.210</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2011</td>
<td align="left" rowspan="1" colspan="1">Ramirez &amp; Beilock <xref ref-type="bibr" rid="pone.0114255-Ramirez1">[38]</xref></td>
<td align="left" rowspan="1" colspan="1">Writing About Worries Boosts Exam Performance</td>
<td align="left" rowspan="1" colspan="1">0.059</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2011</td>
<td align="left" rowspan="1" colspan="1">Stapel &amp; Lindenberg <xref ref-type="bibr" rid="pone.0114255-Stapel2">[39]</xref></td>
<td align="left" rowspan="1" colspan="1">Disordered Contexts Promote Stereotyping</td>
<td align="left" rowspan="1" colspan="1">0.075</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2012</td>
<td align="left" rowspan="1" colspan="1">Gervais &amp; Norenzayan <xref ref-type="bibr" rid="pone.0114255-Gervais1">[40]</xref></td>
<td align="left" rowspan="1" colspan="1">Analytic Thinking Promotes Religious Disbelief</td>
<td align="left" rowspan="1" colspan="1">0.051</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2012</td>
<td align="left" rowspan="1" colspan="1">Seeley et al. <xref ref-type="bibr" rid="pone.0114255-Seeley1">[41]</xref></td>
<td align="left" rowspan="1" colspan="1">Stop Signals Provide Inhibition in Honeybee Swarms</td>
<td align="left" rowspan="1" colspan="1">0.957</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2012</td>
<td align="left" rowspan="1" colspan="1">Shah et al. <xref ref-type="bibr" rid="pone.0114255-Shah1">[42]</xref></td>
<td align="left" rowspan="1" colspan="1">Some Consequences of Having Too Little</td>
<td align="left" rowspan="1" colspan="1">0.091</td>
</tr>
</tbody>
</table>
</alternatives></table-wrap>
<p>It remains an open question whether the 83% excess success rate generalizes to articles in <italic>Science</italic> with fewer than four studies. On the one hand, papers with fewer studies may be based on more convincing experimental results (e.g., larger sample sizes), which might indicate that the 83% rate should not apply to such papers. On the other hand, it seems unfair to suppose that scientists would lower their standards for papers with four or more experiments, which suggests that the same problems that produce the 83% rate would also apply to other papers. Another open question, to which similar considerations apply, is whether the excess success rate for <italic>Science</italic> generalizes to other journals. It could be that journals that impose different publication criteria than <italic>Science</italic> end up publishing more convincing experimental results, but it would be ironic if scientists' most valid work were published in “secondary” outlets. Furthermore, the excess success rate in <italic>Science</italic> is similar to the rate reported for <italic>Psychological Science</italic> (82%), the only other journal whose psychology articles have been subjected to a systematic TES analysis <xref ref-type="bibr" rid="pone.0114255-Francis9">[22]</xref>.</p>
<p>Two of the articles listed in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> deserve special discussion. One of the articles has been retracted due to Stapel's fraudulent research practices <xref ref-type="bibr" rid="pone.0114255-Stapel1">[4]</xref>, <xref ref-type="bibr" rid="pone.0114255-Stapel2">[39]</xref>. The TES analysis is not designed to detect fraudulent data, because a knowledgeable fraudster can always craft data that pass the test and produce a seemingly convincing scientific argument. Although Stapel remains responsible for his fraud, knowledgeable researchers in the field could have identified that his reported findings were too good to be true (<italic>P<sub>TES</sub></italic> = 0.075). Likewise, when evidence of fraud was leveled against Marc Hauser in other publications, his article in <italic>Science</italic> <xref ref-type="bibr" rid="pone.0114255-Wood1">[28]</xref> was suspected of improper data collection. The article was subsequently “cleared” by a replication experiment, but the originally published data seem too good to be true (<italic>P<sub>TES</sub></italic> = 0.051), and the subsequent successful replication made the full set of findings even less believable (<italic>P<sub>TES</sub></italic> = 0.031).</p>
<p>Although many of the concerns about research practices have focused on articles from social psychology, <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> demonstrates that some studies from educational psychology <xref ref-type="bibr" rid="pone.0114255-Ramirez1">[38]</xref>, developmental psychology <xref ref-type="bibr" rid="pone.0114255-Kovcs1">[35]</xref>, and primate behavior <xref ref-type="bibr" rid="pone.0114255-Wood1">[28]</xref>, <xref ref-type="bibr" rid="pone.0114255-Paukner1">[31]</xref> have similar problems.</p>
<p>The bias across the studies in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> must be severe because simulation studies of the performance of the TES analysis <xref ref-type="bibr" rid="pone.0114255-Francis4">[15]</xref>, <xref ref-type="bibr" rid="pone.0114255-Francis7">[18]</xref> demonstrate that the test is conservative, in the sense that truly proper experiment sets rarely produce <italic>P<sub>TES</sub></italic> values below the 0.1 criterion. When a set of unbiased experiments all happen to produce a significant effect, such experiments also tend to give large estimated power values and thereby produce a large <italic>P<sub>TES</sub></italic> value. If the true power is small, it is unusual for all of the experiments to produce a significant outcome, but it is even more unusual for such experiments to have small estimated power values. For unbiased experiment sets, the true Type I error rate for concluding bias (reporting bias that does not exist) is often close to 0.01 even when using the nominal 0.1 criterion. Furthermore, when only a file-drawer bias exists (running proper experiments but suppressing unsuccessful findings), the test often fails to detect the bias because inflated effect size estimates from the biased set of published experiments lead to overestimated success probabilities. Thus, the high rate of bias detected in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> is unlikely to be produced only by the suppression of null findings. Instead, it seems that multiple forms of bias were applied to the articles in <italic>Science</italic>. It seems plausible that researchers often tweak their datasets or analyses to produce <italic>p</italic> values just below 0.05 and also reinterpret or suppress findings when they produce <italic>p</italic> values that cannot be forced below the significance criterion <xref ref-type="bibr" rid="pone.0114255-Masicampo1">[43]</xref>.</p>
</sec><sec id="s5">
<title>Problems with Scientific Practice in Psychology</title>
<p>The TES analyses in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> paint a worrying picture of the psychology research that is published in the journal <italic>Science</italic>. However, as noted previously, this does not necessarily imply that the authors of apparently biased articles intentionally misled readers. This section discusses how four scientific principles can be easily misapplied and how those misapplications tend to produce experimental results with excess success. This discussion is not intended to be exhaustive or entirely novel, but it tries to focus on issues that might explain the findings in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref>.</p>
<sec id="s5a">
<title>Replication Does Not Necessarily Establish Scientific Truth</title>
<p>Successful replication is widely considered to be the gold standard of empirical scientific investigations. However, the role of replication in a field such as psychology is complicated because successful outcomes are based on statistics. A successful experiment in psychology is generally one that rejects the null hypothesis, but a key lesson from the TES analysis is that the rate of successful replication must reflect the power of the experiments. Having experiments with moderate or low power consistently reject the null hypothesis is cause for concern rather than celebration. For many of the articles in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref>, reporting one or two unsuccessful but theoretically relevant experimental results would have blunted the TES analysis. Of course, even though reporting unsuccessful experimental outcomes may remove the appearance of bias, it may not strengthen the argument for the theoretical conclusion because the unsuccessful outcomes may contradict the theory.</p>
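<p>The point about power can be made concrete with a minimal sketch. Assuming the experiments in a set are independent, the probability that every one of them rejects the null hypothesis is the product of their individual success probabilities. The power values and the function names <monospace>prob_all_succeed</monospace> and <monospace>appears_too_successful</monospace> below are hypothetical and chosen only for illustration; the 0.1 criterion is the one used in the text.</p>
<preformat><![CDATA[
```python
import math

CRITERION = 0.1  # excess-success criterion quoted in the text


def prob_all_succeed(powers):
    """Probability that every experiment rejects the null, assuming independence."""
    return math.prod(powers)


def appears_too_successful(powers, criterion=CRITERION):
    """True when uniform success would be rarer than the criterion."""
    return prob_all_succeed(powers) < criterion


if __name__ == "__main__":
    moderate = [0.6] * 5  # hypothetical power values, not taken from any article
    high = [0.9] * 5      # hypothetical power values, not taken from any article
    print(prob_all_succeed(moderate), appears_too_successful(moderate))
    print(prob_all_succeed(high), appears_too_successful(high))
```
]]></preformat>
<p>With five experiments at a power of 0.6 each, the product is about 0.078, which falls below the 0.1 criterion, whereas five experiments at a power of 0.9 each give about 0.59.</p>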
<p>Not every experiment is methodologically sound, and some experiments (even if methodologically sound) do not clarify the status of a theoretical idea. There is little reason to publish such experimental results, whether they are statistically significant or not. Unfortunately, in day-to-day scientific practice it is quite easy to interpret an unsuccessful outcome as being irrelevant to the theory or as being methodologically flawed and therefore not worth reporting. Such an attitude may reflect the conventional wisdom that non-significant results do not provide useful information because it is not possible to prove the null. Although there is truth to that conventional wisdom, reporting only significant outcomes misrepresents the magnitude of effects and can make even true null effects appear to be non-zero. In a variation of this approach, researchers may abort experiments that appear to not be working and instead focus resources elsewhere. A researcher who suppresses such an incomplete experimental result can honestly say that they do not know what would happen for a completed experiment; but if only seemingly successful experiments are run to completion, then the resulting findings are almost surely biased. A set of such experiments will tend to have an excess of success.</p>
</sec><sec id="s5b">
<title>Gathering More Data is Not Always Better</title>
<p>Statistical inference almost always improves with larger samples, which suggests that researchers using hypothesis testing will improve their conclusions by increasing sample sizes. Although unclear results should motivate scientists to gather more data and thereby reveal the truth, this idea does not work well in psychological science because the standard logic of frequentist hypothesis testing is valid only when the sample size is fixed prior to analyzing data.</p>
<p>Despite this limitation in hypothesis testing, a common request by reviewers and editors is for authors to add more participants and see if a weak (say, <italic>p</italic> = 0.07) result might become significant. Reviewers almost never request that authors add more participants to see if data with a moderate (say, <italic>p</italic> = 0.03) result might become non-significant. Over multiple experiments, these requests, or similar “optional stopping” by authors, produce a bias that exaggerates effect sizes and replication rates of measured effects <xref ref-type="bibr" rid="pone.0114255-Anscombe1">[44]</xref>–<xref ref-type="bibr" rid="pone.0114255-Wagenmakers1">[46]</xref>. Gathering additional data can lead to misleading <italic>p</italic> values, because the Type I error rate increases rapidly with additional tests.</p>
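<p>The inflation of the Type I error rate under optional stopping can be illustrated with a small Monte Carlo sketch. It is only an illustration under simplifying assumptions that are not taken from any of the analyzed articles: a one-sample <italic>z</italic> test with known unit variance, a true effect of zero, a first look at 10 observations, and a maximum of 50 observations. The function name and all parameter values are hypothetical.</p>
<preformat><![CDATA[
```python
import math
import random

Z_CRIT = 1.959964  # two-sided 0.05 criterion for a z test


def false_positive_rate(n_start, n_max, n_sims, optional_stopping, seed=1):
    """Estimate how often a true null effect is declared significant.

    With optional_stopping=True the data are tested after every new
    observation from n_start onward and sampling stops at the first
    significant result. With optional_stopping=False the data are
    tested once, at n_max.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        n = 0
        significant = False
        while n < n_max:
            total += rng.gauss(0.0, 1.0)  # the null hypothesis is true
            n += 1
            look = (n >= n_start) if optional_stopping else (n == n_max)
            # z = mean * sqrt(n) = total / sqrt(n) when the variance is 1
            if look and abs(total / math.sqrt(n)) > Z_CRIT:
                significant = True
                break
        hits += significant
    return hits / n_sims


if __name__ == "__main__":
    print("fixed n:", false_positive_rate(10, 50, 2000, False))
    print("optional stopping:", false_positive_rate(10, 50, 2000, True))
```
]]></preformat>
<p>Under these assumptions the fixed-sample estimate should sit near the nominal 0.05, while the optional-stopping estimate should be markedly higher, because every additional look gives a true null effect another chance to cross the criterion.</p>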
<p>An experiment that stops as soon as it finds a significant result tends to produce <italic>p</italic> values that are just below the significance criterion. Indeed, for many of the studies in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref>, relevant significant outcomes have <italic>p</italic> values just below 0.05 <xref ref-type="bibr" rid="pone.0114255-Masicampo1">[43]</xref>, which tends to produce a set of experiments with excess success.</p>
</sec><sec id="s5c">
<title>The Data Should Not Always Define the Theory</title>
<p>A principal tenet of science is that a theory must change (or be rejected) to reflect new data. The principle is true, but if the precision of empirical measurements is poor, then theories defined by the data statistics largely reflect noise.</p>
<p>Consider the precision of the measured effects reported by Gervais and Norenzayan <xref ref-type="bibr" rid="pone.0114255-Gervais1">[40]</xref>, which is representative of many of the articles in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref>. <xref ref-type="fig" rid="pone-0114255-g001">Figure 1</xref> characterizes the 95% confidence interval <xref ref-type="bibr" rid="pone.0114255-Kelley1">[47]</xref> for each experiment's standardized effect size (Hedges' <italic>g</italic>). The confidence intervals around these effect size estimates stretch from almost zero to above 1.2. The breadth of these confidence intervals indicates that most of the experiments give little clarity about the true size of the measured effects. A theory that perfectly matches the mean data may be the best fit by conventional statistical criteria (e.g., maximum likelihood); but if the data are noisy then a best-fitting theory is not necessarily a good-fitting theory <xref ref-type="bibr" rid="pone.0114255-Pitt1">[48]</xref>, <xref ref-type="bibr" rid="pone.0114255-Roberts1">[49]</xref>. When imprecise experimental results are pooled together through meta-analysis or as converging evidence, they can constrain theoretical ideas in important ways, but the validity of such pooling requires the experiment set to be unbiased.</p>
<fig id="pone-0114255-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pone.0114255.g001</object-id><label>Figure 1</label><caption>
<title>The circles mark the standardized effect size for the key findings in five experiments <xref ref-type="bibr" rid="pone.0114255-Gervais1">[40]</xref>.</title>
<p>Each horizontal line indicates the range of a 95% confidence interval for the effect size. The diameter of a circle indicates the relative sample size of the experiment, with the largest sample size being 179.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pone.0114255.g001" position="float" xlink:type="simple"/></fig>
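<p>How quickly precision degrades with small samples can be sketched with a rough interval calculation. The sketch below uses a large-sample normal approximation to the sampling variance of the standardized effect size, which is a simplification and not the method of <xref ref-type="bibr" rid="pone.0114255-Kelley1">[47]</xref>. The effect size and sample sizes are hypothetical and are not taken from <xref ref-type="bibr" rid="pone.0114255-Gervais1">[40]</xref>; the function name is likewise illustrative.</p>
<preformat><![CDATA[
```python
import math
from statistics import NormalDist


def hedges_g_interval(d, n1, n2, level=0.95):
    """Return (g, lower, upper) for a two-group standardized effect size.

    d is the uncorrected standardized mean difference and n1, n2 are the
    group sizes. The interval is a normal approximation, not an exact
    noncentral-t interval.
    """
    df = n1 + n2 - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample bias correction
    g = correction * d
    # Large-sample variance of d; the variance of g scales by correction**2.
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))
    se_g = correction * math.sqrt(var_d)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return g, g - z * se_g, g + z * se_g


if __name__ == "__main__":
    # Hypothetical inputs: d = 0.6 with 25 and then 250 participants per group.
    print(hedges_g_interval(0.6, 25, 25))
    print(hedges_g_interval(0.6, 250, 250))
```
]]></preformat>
<p>For the hypothetical case of 25 participants per group, the approximate interval runs from just above zero to above 1.1, so the data are compatible with anything from a negligible effect to a very large one. Increasing the sample size tenfold narrows the interval to less than half that width.</p>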
<p>Confusion about the relationship between data and theory is also reflected in what Kerr <xref ref-type="bibr" rid="pone.0114255-Kerr1">[50]</xref> called hypothesizing after the results are known (HARKing). Some researchers are so fixated on the 0.05 criterion that they take any significant result as something that should be included in a theory but judge any non-significant finding as irrelevant to the theory or as identification of a boundary condition (and thus part of the theory). Sometimes a theory that emerged from a dataset is presented as if it predicted the dataset, which is clearly a misrepresentation of the scientific process <xref ref-type="bibr" rid="pone.0114255-Bones1">[51]</xref>.</p>
<p>If many <italic>p</italic> values are close to the significance criterion, then HARKing will produce theoretical claims with a high risk of being based on noise. A TES analysis of such findings will often reveal excess success, reflecting the fact that an exact replication would be unlikely to produce the same pattern of significant and non-significant outcomes.</p>
</sec><sec id="s5d">
<title>Not All Confirmed Predictions Support a Theory</title>
<p>Perhaps no outcome of science is more convincing than when a theory makes a novel prediction that is verified by a new experiment. A common phrase in the articles contributing to <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> goes something like, “as predicted by the theory,” which is then followed by the report of a successful hypothesis test. Such verification of a prediction seems like strong validation of the theoretical ideas, but the belief in this validation is sometimes unjustified. The outcome of a statistical hypothesis test varies across random samples, which implies that the best any theoretical predictor can do is to estimate the probability of an experimental outcome. If the predicted outcome is to reject the null, this probability is power; and its estimation requires knowing an effect size, sample size, and analysis design. None of the articles in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> described a theoretically motivated effect size, which means that the theories in those articles do not actually predict the outcome of a hypothesis test (even probabilistically). Thus, the consistently reported validation of theory predictions is actually a cause for concern.</p>
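<p>The dependence of power on effect size, sample size, and design can be sketched for one simple case. The example assumes a two-sided comparison of two independent groups of equal size and replaces the <italic>t</italic> distribution with a normal approximation, so the values are only approximate. The effect sizes, sample sizes, and the function name <monospace>approx_power</monospace> are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test.

    d is the assumed standardized effect size. A normal approximation
    stands in for the t distribution, so small samples are treated
    somewhat optimistically.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = d * math.sqrt(n_per_group / 2.0)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (20, 64):  # hypothetical sample sizes per group
        print(n, approx_power(0.5, n))
```
]]></preformat>
<p>With a hypothetical effect size of 0.5, 64 participants per group give a power of about 0.8 under this approximation, while 20 per group give about 0.35. Without a stated effect size, neither number can be computed, which is why a purely qualitative theory does not predict the outcome of a hypothesis test.</p>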
<p>What may have happened for some of the articles listed in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref> is that the authors had valid reasons to search for the existence of certain effects, but they did not convert such ideas into quantitative predictions about the outcome of a hypothesis test. Without a quantitative prediction, it is difficult to design an experiment that can convincingly demonstrate a prediction failure <xref ref-type="bibr" rid="pone.0114255-Morey1">[52]</xref>. Given pressure to reduce costs, such experiments tend to be underpowered and thereby produce outcomes that are difficult for researchers to interpret. In such situations, experiments that happen to qualitatively match a predicted outcome may be given undue credence, since the observed outcome is partially due to chance, even while experiments that do not match a prediction are dismissed as being underpowered. With inadequate tests of a prediction, a scientist may strongly feel that he or she is following good scientific practice even while producing dubious support for their theory. Mistakes in data collection, statistical analysis, or reporting of relevant findings further magnify the discrepancy between a researcher's belief in the validity of a theoretical interpretation of the reported findings and the biased properties of those findings.</p>
</sec></sec><sec id="s6">
<title>Conclusions</title>
<p>Eighty-three percent of the Psychology/Education articles with four or more experiments that were published in <italic>Science</italic> (2005–2012) have an excess of success, which suggests that their results are too good to be true. Since scientists should be skeptical about those articles' theoretical claims, this high rate of bias is disturbing. It is unlikely that psychology uniquely faces these problems, as analyses suggest that at least some findings in neuroscience and medicine have similar problems <xref ref-type="bibr" rid="pone.0114255-Button1">[53]</xref>–<xref ref-type="bibr" rid="pone.0114255-Fanelli1">[55]</xref>, and these concerns may also apply to other fields that use statistics <xref ref-type="bibr" rid="pone.0114255-Francis10">[56]</xref>.</p>
<p>We would like to emphasize that the appearance of excess success does not establish intentional misconduct by the authors of an article. Given the problems identified above and in other discussions about statistics and publication practices in psychology <xref ref-type="bibr" rid="pone.0114255-Simmons1">[6]</xref>, <xref ref-type="bibr" rid="pone.0114255-John1">[7]</xref>, <xref ref-type="bibr" rid="pone.0114255-Levelt1">[10]</xref>, <xref ref-type="bibr" rid="pone.0114255-Wagenmakers1">[46]</xref>, <xref ref-type="bibr" rid="pone.0114255-Kerr1">[50]</xref>, <xref ref-type="bibr" rid="pone.0114255-Bones1">[51]</xref>, <xref ref-type="bibr" rid="pone.0114255-Ferguson1">[57]</xref>–<xref ref-type="bibr" rid="pone.0114255-Matthews1">[59]</xref>, we believe that the appearance of excess success is often an honest mistake by authors who did not appreciate the inherent variability that should appear in their hypothesis test results. Such misunderstandings may lead to misinterpretations (e.g., concluding that a non-significant outcome is irrelevant and need not be reported) or over-interpretations (e.g., deriving a theory to match all reported outcomes) of experimental findings; and these errors lead to poor theories and excess success.</p>
<p>In terms of reform, we see promise in an approach that advises exploratory empirical work to focus on principles of estimation (e.g., <xref ref-type="bibr" rid="pone.0114255-Cumming1">[60]</xref>–<xref ref-type="bibr" rid="pone.0114255-Thompson1">[62]</xref>) and in a complementary approach that advises formal methods for model development and theory testing <xref ref-type="bibr" rid="pone.0114255-Matthews1">[59]</xref>, <xref ref-type="bibr" rid="pone.0114255-Myung1">[63]</xref>, <xref ref-type="bibr" rid="pone.0114255-Wagenmakers2">[64]</xref>. Discussions about empirical findings and theories in the published literature are an integral part of the scientific process, so we also see benefit to systems such as PubMed Commons and PubPeer that encourage such discussions and to systems such as the Open Science Framework <xref ref-type="bibr" rid="pone.0114255-Nosek1">[65]</xref> that improve access to empirical data.</p>
<p>Overall, we believe that many of the current problems in psychology reflect misunderstandings about how to draw theoretical conclusions from statistical data. Moreover, we believe that these problems can be fixed and that psychological scientists will be receptive to the solutions.</p>
</sec><sec id="s7">
<title>Supporting Information</title>
<supplementary-material id="pone.0114255.s001" mimetype="application/pdf" xlink:href="info:doi/10.1371/journal.pone.0114255.s001" position="float" xlink:type="simple"><label>Information S1</label><caption>
<p><bold>TES analyses for individual articles.</bold> This document provides a full analysis of each article listed in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref>. It also explains why some articles could not be analyzed.</p>
<p>(PDF)</p>
</caption></supplementary-material><supplementary-material id="pone.0114255.s002" mimetype="application/zip" xlink:href="info:doi/10.1371/journal.pone.0114255.s002" position="float" xlink:type="simple"><label>Information S2</label><caption>
<p><bold>TES analysis calculations.</bold> This compressed file contains a directory for every article in <xref ref-type="table" rid="pone-0114255-t002">Table 2</xref>. Each directory includes a text file describing the location of the statistics taken from the article that were used for the TES analysis. It also includes a spreadsheet that summarizes the statistics, computes effect sizes (where appropriate), and lists the estimated success probability for each experiment. The directory also includes any <italic>R</italic> source code that was used to estimate success probability.</p>
<p>(ZIP)</p>
</caption></supplementary-material></sec></body>
<back><ref-list>
<title>References</title>
<ref id="pone.0114255-Bem1"><label>1</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Bem</surname><given-names>DJ</given-names></name> (<year>2011</year>) <article-title>Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect</article-title>. <source>Journal of Personality and Social Psychology</source> <volume>100</volume>:<fpage>407</fpage>–<lpage>425</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Doyen1"><label>2</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Doyen</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Klein</surname><given-names>O</given-names></name>, <name name-style="western"><surname>Pichon</surname><given-names>C-L</given-names></name>, <name name-style="western"><surname>Cleeremans</surname><given-names>A</given-names></name> (<year>2012</year>) <article-title>Behavioral priming: It's all in the mind, but whose mind?</article-title> <source>PLoS ONE</source> <volume>7(1)</volume>:<fpage>e29081</fpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Shanks1"><label>3</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Shanks</surname><given-names>DR</given-names></name>, <name name-style="western"><surname>Newell</surname><given-names>BR</given-names></name>, <name name-style="western"><surname>Lee</surname><given-names>EH</given-names></name>, <name name-style="western"><surname>Balakrishnan</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Ekelund</surname><given-names>L</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>Priming intelligent behavior: An elusive phenomenon</article-title>. <source>PLoS ONE</source> <volume>8(4)</volume>:<fpage>e56515</fpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Stapel1"><label>4</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Stapel</surname><given-names>DA</given-names></name>, <name name-style="western"><surname>Lindenberg</surname><given-names>S</given-names></name> (<year>2011</year>) <article-title>Retraction of Stapel and Lindenberg, Science, 332 (6026) 251–253</article-title>. <source>Science</source> <volume>334</volume>:<fpage>1202</fpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Johnson1"><label>5</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Johnson</surname><given-names>CS</given-names></name>, <name name-style="western"><surname>Smeesters</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Wheeler</surname><given-names>SC</given-names></name> (<year>2012</year>) <article-title>Retraction of Johnson, Smeesters, and Wheeler (2012)</article-title>. <source>Journal of Personality and Social Psychology</source> <volume>103</volume>:<fpage>605</fpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Simmons1"><label>6</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Simmons</surname><given-names>JP</given-names></name>, <name name-style="western"><surname>Nelson</surname><given-names>LD</given-names></name>, <name name-style="western"><surname>Simonsohn</surname><given-names>U</given-names></name> (<year>2011</year>) <article-title>False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant</article-title>. <source>Psychological Science</source> <volume>22</volume>:<fpage>1359</fpage>–<lpage>1366</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-John1"><label>7</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>John</surname><given-names>LK</given-names></name>, <name name-style="western"><surname>Loewenstein</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Prelec</surname><given-names>D</given-names></name> (<year>2012</year>) <article-title>Measuring the prevalence of questionable research practices with incentives for truth-telling</article-title>. <source>Psychological Science</source> <volume>23</volume>:<fpage>524</fpage>–<lpage>532</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Yong1"><label>8</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Yong</surname><given-names>E</given-names></name> (<year>2012</year>) <article-title>Bad copy</article-title>. <source>Nature</source> <volume>485</volume>:<fpage>298</fpage>–<lpage>300</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Yong2"><label>9</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Yong</surname><given-names>E</given-names></name> (<year>2012</year>) <article-title>Nobel laureate challenges psychologists to clean up their act</article-title>. <source>Nature</source> <comment>doi:<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature.2012.11535" xlink:type="simple">10.1038/nature.2012.11535</ext-link></comment></mixed-citation>
</ref>
<ref id="pone.0114255-Levelt1"><label>10</label>
<mixed-citation publication-type="other" xlink:type="simple">Levelt, Noort, Drenth Committees (2012) <italic>Flawed science: The fraudulent research practices of social psychologist Diederik Stapel</italic>. Downloaded from <ext-link ext-link-type="uri" xlink:href="https://www.commissielevelt.nl" xlink:type="simple">https://www.commissielevelt.nl</ext-link>.</mixed-citation>
</ref>
<ref id="pone.0114255-Ioannidis1"><label>11</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ioannidis</surname><given-names>JPA</given-names></name>, <name name-style="western"><surname>Trikalinos</surname><given-names>TA</given-names></name> (<year>2007</year>) <article-title>An exploratory test for an excess of significant findings</article-title>. <source>Clinical Trials</source> <volume>4</volume>:<fpage>245</fpage>–<lpage>253</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis1"><label>12</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2012</year>) <article-title>Too good to be true: Publication bias in two prominent studies from experimental psychology</article-title>. <source>Psychonomic Bulletin &amp; Review</source> <volume>19</volume>:<fpage>151</fpage>–<lpage>156</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis2"><label>13</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2012</year>) <article-title>The same old New Look: Publication bias in a study of wishful seeing</article-title>. <source>i-Perception</source> <volume>3(3)</volume>:<fpage>176</fpage>–<lpage>178</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis3"><label>14</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2012</year>) <article-title>Evidence that publication bias contaminated studies relating social class and unethical behavior</article-title>. <source>Proceedings of the National Academy of Sciences</source> <volume>109</volume>:<fpage>E1587</fpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis4"><label>15</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2012</year>) <article-title>Publication bias and the failure of replication in experimental psychology</article-title>. <source>Psychonomic Bulletin &amp; Review</source> <volume>19(6)</volume>:<fpage>975</fpage>–<lpage>991</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis5"><label>16</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2012</year>) <article-title>The psychology of replication and replication in psychology</article-title>. <source>Perspectives on Psychological Science</source> <volume>7(6)</volume>:<fpage>580</fpage>–<lpage>589</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis6"><label>17</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2013</year>) <article-title>Publication bias in “Red, Rank, and Romance in Women Viewing Men” by Elliot, et al. (2010)</article-title>. <source>Journal of Experimental Psychology: General</source> <volume>142</volume>:<fpage>292</fpage>–<lpage>296</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis7"><label>18</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2013</year>) <article-title>Replication, statistical consistency, and publication bias</article-title>. <source>Journal of Mathematical Psychology</source> <volume>57</volume>:<fpage>153</fpage>–<lpage>169</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis8"><label>19</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2013</year>) <article-title>We should focus on the biases that matter: A reply to commentaries</article-title>. <source>Journal of Mathematical Psychology</source> <volume>57</volume>:<fpage>190</fpage>–<lpage>195</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Renkewitz1"><label>20</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Renkewitz</surname><given-names>F</given-names></name>, <name name-style="western"><surname>Fuchs</surname><given-names>HM</given-names></name>, <name name-style="western"><surname>Fiedler</surname><given-names>S</given-names></name> (<year>2011</year>) <article-title>Is there evidence of publication biases in JDM research?</article-title> <source>Judgment and Decision Making</source> <volume>6</volume>:<fpage>870</fpage>–<lpage>881</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Schimmack1"><label>21</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Schimmack</surname><given-names>U</given-names></name> (<year>2012</year>) <article-title>The ironic effect of significant results on the credibility of multiple study articles</article-title>. <source>Psychological Methods</source> <volume>17(4)</volume>:<fpage>551</fpage>–<lpage>566</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis9"><label>22</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2014</year>) <article-title>The frequency of excess success for articles in Psychological Science</article-title>. <source>Psychonomic Bulletin &amp; Review</source> <volume>21(5)</volume>:<fpage>1180</fpage>–<lpage>1187</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Begg1"><label>23</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Begg</surname><given-names>CB</given-names></name>, <name name-style="western"><surname>Mazumdar</surname><given-names>M</given-names></name> (<year>1994</year>) <article-title>Operating characteristics of a rank correlation test for publication bias</article-title>. <source>Biometrics</source> <volume>50</volume>:<fpage>1088</fpage>–<lpage>1101</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Champely1"><label>24</label>
<mixed-citation publication-type="other" xlink:type="simple">Champely S (2009) pwr: Basic functions for power analysis. R package version 1.1.1. <ext-link ext-link-type="uri" xlink:href="http://CRAN.R-project.org/package=pwr" xlink:type="simple">http://CRAN.R-project.org/package=pwr</ext-link>.</mixed-citation>
</ref>
<ref id="pone.0114255-Dijksterhuis1"><label>25</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Dijksterhuis</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Bos</surname><given-names>MW</given-names></name>, <name name-style="western"><surname>Nordgren</surname><given-names>LF</given-names></name>, <name name-style="western"><surname>van Baaren</surname><given-names>RB</given-names></name> (<year>2006</year>) <article-title>On making the right choice: The deliberation-without-attention effect</article-title>. <source>Science</source> <volume>311</volume>:<fpage>1005</fpage>–<lpage>1007</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Vohs1"><label>26</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Vohs</surname><given-names>KD</given-names></name>, <name name-style="western"><surname>Mead</surname><given-names>NL</given-names></name>, <name name-style="western"><surname>Goode</surname><given-names>MR</given-names></name> (<year>2006</year>) <article-title>The psychological consequences of money</article-title>. <source>Science</source> <volume>314</volume>:<fpage>1154</fpage>–<lpage>1156</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Zhong1"><label>27</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhong</surname><given-names>C-B</given-names></name>, <name name-style="western"><surname>Liljenquist</surname><given-names>K</given-names></name> (<year>2006</year>) <article-title>Washing away your sins: Threatened morality and physical cleansing</article-title>. <source>Science</source> <volume>313</volume>:<fpage>1451</fpage>–<lpage>1452</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Wood1"><label>28</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wood</surname><given-names>JN</given-names></name>, <name name-style="western"><surname>Glynn</surname><given-names>DD</given-names></name>, <name name-style="western"><surname>Phillips</surname><given-names>BC</given-names></name>, <name name-style="western"><surname>Hauser</surname><given-names>MD</given-names></name> (<year>2007</year>) <article-title>The perception of rational, goal-directed action in nonhuman primates</article-title>. <source>Science</source> <volume>317</volume>:<fpage>1402</fpage>–<lpage>1405</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Whitson1"><label>29</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Whitson</surname><given-names>JA</given-names></name>, <name name-style="western"><surname>Galinsky</surname><given-names>AD</given-names></name> (<year>2008</year>) <article-title>Lacking control increases illusory pattern perception</article-title>. <source>Science</source> <volume>322</volume>:<fpage>115</fpage>–<lpage>117</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Mehta1"><label>30</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Mehta</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Zhu</surname><given-names>R</given-names></name> (<year>2009</year>) <article-title>Blue or red? Exploring the effect of color on cognitive task performances</article-title>. <source>Science</source> <volume>323</volume>:<fpage>1226</fpage>–<lpage>1229</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Paukner1"><label>31</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Paukner</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Suomi</surname><given-names>SJ</given-names></name>, <name name-style="western"><surname>Visalberghi</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Ferrari</surname><given-names>PF</given-names></name> (<year>2009</year>) <article-title>Capuchin monkeys display affiliation toward humans who imitate them</article-title>. <source>Science</source> <volume>325</volume>:<fpage>880</fpage>–<lpage>883</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Weisbuch1"><label>32</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Weisbuch</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Pauker</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Ambady</surname><given-names>N</given-names></name> (<year>2009</year>) <article-title>The subtle transmission of race bias via televised nonverbal behavior</article-title>. <source>Science</source> <volume>326</volume>:<fpage>1711</fpage>–<lpage>1714</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Ackerman1"><label>33</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ackerman</surname><given-names>JM</given-names></name>, <name name-style="western"><surname>Nocera</surname><given-names>CC</given-names></name>, <name name-style="western"><surname>Bargh</surname><given-names>JA</given-names></name> (<year>2010</year>) <article-title>Incidental haptic sensations influence social judgments and decisions</article-title>. <source>Science</source> <volume>328</volume>:<fpage>1712</fpage>–<lpage>1715</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Bahrami1"><label>34</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Bahrami</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Olsen</surname><given-names>K</given-names></name>, <name name-style="western"><surname>Latham</surname><given-names>PE</given-names></name>, <name name-style="western"><surname>Roepstorff</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Rees</surname><given-names>G</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>Optimally interacting minds</article-title>. <source>Science</source> <volume>329</volume>:<fpage>1081</fpage>–<lpage>1085</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Kovcs1"><label>35</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kovács</surname><given-names>ÁM</given-names></name>, <name name-style="western"><surname>Téglás</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Endress</surname><given-names>AD</given-names></name> (<year>2010</year>) <article-title>The social sense: Susceptibility to others' beliefs in human infants and adults</article-title>. <source>Science</source> <volume>330</volume>:<fpage>1830</fpage>–<lpage>1834</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Morewedge1"><label>36</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Morewedge</surname><given-names>CK</given-names></name>, <name name-style="western"><surname>Huh</surname><given-names>YE</given-names></name>, <name name-style="western"><surname>Vosgerau</surname><given-names>J</given-names></name> (<year>2010</year>) <article-title>Thought for food: Imagined consumption reduces actual consumption</article-title>. <source>Science</source> <volume>330</volume>:<fpage>1530</fpage>–<lpage>1533</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Halperin1"><label>37</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Halperin</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Russell</surname><given-names>AG</given-names></name>, <name name-style="western"><surname>Trzesniewski</surname><given-names>KH</given-names></name>, <name name-style="western"><surname>Gross</surname><given-names>JJ</given-names></name>, <name name-style="western"><surname>Dweck</surname><given-names>CS</given-names></name> (<year>2011</year>) <article-title>Promoting the Middle East peace process by changing beliefs about group malleability</article-title>. <source>Science</source> <volume>333</volume>:<fpage>1767</fpage>–<lpage>1769</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Ramirez1"><label>38</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ramirez</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Beilock</surname><given-names>SL</given-names></name> (<year>2011</year>) <article-title>Writing about testing worries boosts exam performance in the classroom</article-title>. <source>Science</source> <volume>331</volume>:<fpage>211</fpage>–<lpage>213</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Stapel2"><label>39</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Stapel</surname><given-names>DA</given-names></name>, <name name-style="western"><surname>Lindenberg</surname><given-names>S</given-names></name> (<year>2011</year>) <article-title>Coping with chaos: How disordered contexts promote stereotyping and discrimination</article-title>. <source>Science</source> <volume>332</volume>:<fpage>251</fpage>–<lpage>253</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Gervais1"><label>40</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gervais</surname><given-names>WM</given-names></name>, <name name-style="western"><surname>Norenzayan</surname><given-names>A</given-names></name> (<year>2012</year>) <article-title>Analytic thinking promotes religious disbelief</article-title>. <source>Science</source> <volume>336</volume>:<fpage>493</fpage>–<lpage>496</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Seeley1"><label>41</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Seeley</surname><given-names>TD</given-names></name>, <name name-style="western"><surname>Visscher</surname><given-names>PK</given-names></name>, <name name-style="western"><surname>Schlegel</surname><given-names>T</given-names></name>, <name name-style="western"><surname>Hogan</surname><given-names>PM</given-names></name>, <name name-style="western"><surname>Franks</surname><given-names>NR</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>Stop signals provide cross inhibition in collective decision-making by honeybee swarms</article-title>. <source>Science</source> <volume>335</volume>:<fpage>108</fpage>–<lpage>111</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Shah1"><label>42</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Shah</surname><given-names>AK</given-names></name>, <name name-style="western"><surname>Mullainathan</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Shafir</surname><given-names>E</given-names></name> (<year>2012</year>) <article-title>Some consequences of having too little</article-title>. <source>Science</source> <volume>338</volume>:<fpage>682</fpage>–<lpage>685</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Masicampo1"><label>43</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Masicampo</surname><given-names>EJ</given-names></name>, <name name-style="western"><surname>Lalande</surname><given-names>DR</given-names></name> (<year>2012</year>) <article-title>A peculiar prevalence of p values just below .05</article-title>. <source>Quarterly Journal of Experimental Psychology</source> <volume>65(11)</volume>:<fpage>2271</fpage>–<lpage>2279</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Anscombe1"><label>44</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Anscombe</surname><given-names>FJ</given-names></name> (<year>1954</year>) <article-title>Fixed-sample-size analysis of sequential observations</article-title>. <source>Biometrics</source> <volume>10(1)</volume>:<fpage>89</fpage>–<lpage>100</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Strube1"><label>45</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Strube</surname><given-names>MJ</given-names></name> (<year>2006</year>) <article-title>SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing</article-title>. <source>Behavior Research Methods</source> <volume>38</volume>:<fpage>24</fpage>–<lpage>27</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Wagenmakers1"><label>46</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wagenmakers</surname><given-names>E-J</given-names></name> (<year>2007</year>) <article-title>A practical solution to the pervasive problems of p-values</article-title>. <source>Psychonomic Bulletin &amp; Review</source> <volume>14</volume>:<fpage>779</fpage>–<lpage>804</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Kelley1"><label>47</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kelley</surname><given-names>K</given-names></name> (<year>2007</year>) <article-title>Confidence intervals for standardized effect sizes: Theory, application, and implementation</article-title>. <source>Journal of Statistical Software</source> <volume>20</volume>. <ext-link ext-link-type="uri" xlink:href="http://www.jstatsoft.org/v20/a08/" xlink:type="simple">http://www.jstatsoft.org/v20/a08/</ext-link></mixed-citation>
</ref>
<ref id="pone.0114255-Pitt1"><label>48</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pitt</surname><given-names>MA</given-names></name>, <name name-style="western"><surname>Myung</surname><given-names>IJ</given-names></name> (<year>2002</year>) <article-title>When a good fit can be bad</article-title>. <source>Trends in Cognitive Sciences</source> <volume>6(10)</volume>:<fpage>421</fpage>–<lpage>425</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Roberts1"><label>49</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Roberts</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Pashler</surname><given-names>H</given-names></name> (<year>2000</year>) <article-title>How persuasive is a good fit? A comment on theory testing</article-title>. <source>Psychological Review</source> <volume>107</volume>:<fpage>358</fpage>–<lpage>367</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Kerr1"><label>50</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kerr</surname><given-names>NL</given-names></name> (<year>1998</year>) <article-title>HARKing: Hypothesizing after the results are known</article-title>. <source>Personality and Social Psychology Review</source> <volume>2</volume>:<fpage>196</fpage>–<lpage>217</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Bones1"><label>51</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Bones</surname><given-names>AK</given-names></name> (<year>2012</year>) <article-title>We knew the future all along</article-title>. <source>Perspectives on Psychological Science</source> <volume>7(3)</volume>:<fpage>307</fpage>–<lpage>309</lpage> <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1177/1745691612441216" xlink:type="simple">10.1177/1745691612441216</ext-link></comment></mixed-citation>
</ref>
<ref id="pone.0114255-Morey1"><label>52</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Morey</surname><given-names>RD</given-names></name>, <name name-style="western"><surname>Rouder</surname><given-names>JN</given-names></name>, <name name-style="western"><surname>Verhagen</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Wagenmakers</surname><given-names>E-J</given-names></name> (<year>2014</year>) <article-title>Why hypothesis tests are essential for psychological science: A comment on Cumming</article-title>. <source>Psychological Science</source> <volume>25</volume>:<fpage>1289</fpage>–<lpage>1290</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Button1"><label>53</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Button</surname><given-names>KS</given-names></name>, <name name-style="western"><surname>Ioannidis</surname><given-names>JPA</given-names></name>, <name name-style="western"><surname>Mokrysz</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Nosek</surname><given-names>BA</given-names></name>, <name name-style="western"><surname>Flint</surname><given-names>J</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>Power failure: Why small sample size undermines the reliability of neuroscience</article-title>. <source>Nature Reviews Neuroscience</source> <volume>14</volume>:<fpage>365</fpage>–<lpage>376</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Kicinski1"><label>54</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kicinski</surname><given-names>M</given-names></name> (<year>2013</year>) <article-title>Publication bias in recent meta-analyses</article-title>. <source>PLoS ONE</source> <comment>doi:<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0081823" xlink:type="simple">10.1371/journal.pone.0081823</ext-link></comment></mixed-citation>
</ref>
<ref id="pone.0114255-Fanelli1"><label>55</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fanelli</surname><given-names>D</given-names></name> (<year>2012</year>) <article-title>Negative results are disappearing from most disciplines and countries</article-title>. <source>Scientometrics</source> <volume>90</volume>:<fpage>891</fpage>–<lpage>904</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Francis10"><label>56</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Francis</surname><given-names>G</given-names></name> (<year>2014</year>) <article-title>Too much success for recent groundbreaking epigenetic experiments</article-title>. <source>Genetics</source> <volume>198</volume>:<fpage>449</fpage>–<lpage>451</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Ferguson1"><label>57</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ferguson</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Heene</surname><given-names>M</given-names></name> (<year>2012</year>) <article-title>A vast graveyard of undead theories: Publication bias and psychological science's aversion to the null</article-title>. <source>Perspectives on Psychological Science</source> <volume>7(6)</volume>:<fpage>555</fpage>–<lpage>561</lpage> <comment>doi:<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1177/1745691612459059" xlink:type="simple">10.1177/1745691612459059</ext-link></comment></mixed-citation>
</ref>
<ref id="pone.0114255-Pashler1"><label>58</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pashler</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Wagenmakers</surname><given-names>E-J</given-names></name> (<year>2012</year>) <article-title>Editors' introduction to the special section on replicability in psychological science: A crisis of confidence?</article-title> <source>Perspectives on Psychological Science</source> <volume>7</volume>:<fpage>528</fpage>–<lpage>530</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Matthews1"><label>59</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Matthews</surname><given-names>WJ</given-names></name> (<year>2011</year>) <article-title>What might judgment and decision making research be like if we took a Bayesian approach to hypothesis testing?</article-title> <source>Judgment and Decision Making</source> <volume>6(8)</volume>:<fpage>843</fpage>–<lpage>856</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Cumming1"><label>60</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Cumming</surname><given-names>G</given-names></name> (<year>2014</year>) <article-title>The new statistics: Why and how</article-title>. <source>Psychological Science</source> <volume>25(1)</volume>:<fpage>7</fpage>–<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Kruschke1"><label>61</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kruschke</surname><given-names>JK</given-names></name> (<year>2010</year>) <article-title>Bayesian data analysis</article-title>. <source>Wiley Interdisciplinary Reviews: Cognitive Science</source> <volume>1(5)</volume>:<fpage>658</fpage>–<lpage>676</lpage> <comment>doi:<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1002/wcs.72" xlink:type="simple">10.1002/wcs.72</ext-link></comment></mixed-citation>
</ref>
<ref id="pone.0114255-Thompson1"><label>62</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Thompson</surname><given-names>B</given-names></name> (<year>2002</year>) <article-title>What future quantitative social science research could look like: Confidence intervals for effect sizes</article-title>. <source>Educational Researcher</source> <volume>31</volume>:<fpage>25</fpage>–<lpage>32</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Myung1"><label>63</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Myung</surname><given-names>IJ</given-names></name>, <name name-style="western"><surname>Pitt</surname><given-names>MA</given-names></name> (<year>2009</year>) <article-title>Optimal experimental design for model discrimination</article-title>. <source>Psychological Review</source> <volume>116(3)</volume>:<fpage>499</fpage>–<lpage>518</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Wagenmakers2"><label>64</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wagenmakers</surname><given-names>E-J</given-names></name>, <name name-style="western"><surname>Wetzels</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Borsboom</surname><given-names>D</given-names></name>, <name name-style="western"><surname>van der Maas</surname><given-names>HLJ</given-names></name>, <name name-style="western"><surname>Kievit</surname><given-names>RA</given-names></name> (<year>2012</year>) <article-title>An agenda for purely confirmatory research</article-title>. <source>Perspectives on Psychological Science</source> <volume>7(6)</volume>:<fpage>632</fpage>–<lpage>638</lpage>.</mixed-citation>
</ref>
<ref id="pone.0114255-Nosek1"><label>65</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Nosek</surname><given-names>BA</given-names></name>, <name name-style="western"><surname>Spies</surname><given-names>JR</given-names></name>, <name name-style="western"><surname>Motyl</surname><given-names>M</given-names></name> (<year>2012</year>) <article-title>Scientific utopia II: Restructuring incentives and practices to promote truth over publishability</article-title>. <source>Perspectives on Psychological Science</source> <volume>7(6)</volume>:<fpage>615</fpage>–<lpage>631</lpage>.</mixed-citation>
</ref>
</ref-list></back>
</article>