The authors B. Gy. and O. M. declare no potential conflicts of interest. The author B. W. received a salary from the commercial company A5 Genetics Ltd, Hungary. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Scientists from nearly all disciplines face the problem of simultaneously evaluating many hypotheses. Conducting multiple comparisons increases the likelihood that a non-negligible proportion of associations will be false positives, clouding real discoveries. Drawing valid conclusions requires taking into account the number of performed statistical tests and adjusting the statistical confidence measures. Several strategies exist to overcome the problem of multiple hypothesis testing. We aim to summarize critical statistical concepts and widely used correction approaches while also drawing attention to frequently misinterpreted notions of statistical inference. We provide a step-by-step description of each multiple-testing correction method with clear examples and present an easy-to-follow guide for selecting the most suitable correction technique. To facilitate multiple-testing corrections, we developed a fully automated solution not requiring programming skills or the use of a command line. Our registration-free online tool is available at MultipleTesting.com.
Technological innovations of the past decades enabled the concurrent investigation of complex issues in biomedical sciences and increased their reliance on mathematics. For example, high-throughput omics-based technologies (e.g., genomics, transcriptomics, proteomics, metabolomics) involving hundreds or thousands of markers offer tremendous opportunities to find associations with the phenotype. Analyzing a massive amount of data by simultaneous statistical tests, as in genomic studies, is a double-edged sword. Conducting multiple comparisons increases the likelihood that a non-negligible proportion of associations will be false positives while also increasing the number of missed associations (false negatives) [
Several strategies exist to overcome difficulties when evaluating multiple hypotheses. Here we review the most frequently used correction approaches, illustrate the selected methods by examples, and provide an easy-to-follow guide to facilitate the selection of proper strategies. We also summarize the basic concepts of statistical tests, clarify conceptual differences between exploratory and confirmatory analyses, and discuss problems associated with the categorical interpretation of p-values.
To facilitate the interpretation of multiple hypothesis tests, we established a quick and user-friendly solution for automated multiple testing correction that does not require programming skills or the use of a command line. Our tool is available at MultipleTesting.com.
In a formal scientific method, the null hypothesis (H0) is the one we are seeking to disprove, representing no differences in measurements between two groups (e.g., there is no effect of a given gene on a trait of interest). A statistical test compares the null hypothesis to the alternative hypothesis (H1), the antithesis of the null, which assumes differences between groups (e.g., there is an association between a gene and a phenotypic trait). The procedure results in a statistical confidence measure, called a p-value, which is compared to the level of significance, α. Thus the p-value represents how extreme the data are, while α determines how extreme the data must be before the null hypothesis is rejected. When p is smaller than the confidence threshold α, the null hypothesis is rejected with a certain confidence, but the rejection does not "prove" the alternative hypothesis. If the p-value is higher than α, the null hypothesis is not rejected, although this does not mean the null is "true", only that there is not enough evidence against it.
Four outcomes are possible for a statistical test: the test rejects a false null hypothesis (true positive), the test rejects a true null (type I error or false positive), the test does not reject a true null (true negative), or the test does not reject a false null (type II error or false negative). The level of significance, α, controls the level of false positives. Historically, α values have been set at 0.05 [
Instead of representing a metric of "truth" or "significance", a p-value of 0.05 in reality means that there is a 5% chance of obtaining the observed results when the null hypothesis is true; in such cases, statistical results may not translate into biologically relevant conclusions. For example, if we measure 20 different health parameters at α = 0.05 in a patient for whom all the nulls are true, on average one out of 20 will statistically deviate from the normal range without any biological relevance (a false positive). Following the same logic, when 20,000 genes are compared between two samples, the expected number of false positives increases to a substantial 1,000. The elevated number of simultaneous statistical tests increases the danger that irrelevant false positives outnumber true discoveries; therefore, multiple-testing correction methods are required.
A great concern is that the p-value is frequently treated as a categorical statistical measure, which is also reflected in how data are reported: instead of being disclosed with precision (e.g., p = 0.081 or 0.8), p-values are described as categorical inequalities around an arbitrary cut-off (p > 0.05 or p < 0.05). The greatest concern is that results below the statistical threshold are frequently portrayed as "real" effects, while statistically non-significant estimates are treated as evidence for the absence of effects [
One must always consider the test statistics when interpreting p-values. If the sample size is too large, small and irrelevant effects might produce statistically significant results. Small sample sizes or large variances may, on the contrary, render a remarkable effect statistically non-significant.
Another principle is to differentiate between an exploratory and confirmatory hypothesis test and the resulting p-values. As their name suggests, exploratory analyses explore novel information within a data set to establish new hypotheses and novel research directions. To fulfill this purpose, all comparisons should be tested, followed by an appropriate adjustment of p-values. Consequently, exploratory analyses are suitable to generate hypotheses but do not "prove" them.
Confirmatory analyses, on the contrary, are testing "a priori" identified, specific hypotheses, intending to confirm or reject a limited number of clearly articulated assumptions, where significance levels are also established beforehand [
To choose a suitable correction method, one must consider the exploratory or confirmatory nature of the conducted statistical tests. A decision tree depicted in Fig 1 facilitates this choice.
The initial decision relies upon the statistical analysis’s exploratory or confirmatory nature, while subsequent steps narrow down the list of appropriate methods. In some cases, though, studies are based on a mixture of exploratory and confirmatory analyses.
The topic is too far-reaching and rapidly proliferating to be covered in its entirety, and comparing the technical details of individual statistical methods is beyond our intended scope. Here we aim to introduce the most extensively utilized methods.
The common denominator across the presented methods dealing with multiplicity is that all of them reject the null hypotheses with the smallest p-values; the difference lies in how many hypotheses are rejected.
In single-step corrections, equivalent adjustments are made to each p-value. The simplest and most widely used correction method is the Bonferroni-procedure, which accounts for the number of statistical tests and does not make assumptions about relations between the tests [
Statistical concept | Formula | Explanation |
---|---|---|
Familywise error rate | α_FW = 1 - (1 - α_PC)^C | where C refers to the number of comparisons performed, and α_PC refers to the per-contrast error rate, usually 0.05 |
Bonferroni correction | p’_i = n × p_i ≤ α | the p-value of each test (p_i) is multiplied by the number of performed statistical tests (n); if the corrected p-value (p’_i) is lower than the significance level α (usually 0.05), the null hypothesis will be rejected and the result will be significant |
Sidak correction | p’_i = 1 - (1 - p_i)^n ≤ α | where p_i refers to the p-value of each test, and n refers to the number of performed statistical tests |
False discovery rate (FDR) | FDR = E[V / R] | where V is the number of falsely rejected null hypotheses, R is the number of all rejected hypotheses, and E denotes expectation; the FDR is thus the expected proportion of incorrect discoveries among all discoveries |
FDR at a p-value threshold t | FDR(t) = (π0 × m × t) / S(t) | where t represents a threshold between 0 and 1 under which p-values are considered significant, m is the total number of p-values (p_1, p_2, …, p_m), π0 is the estimated proportion of true nulls (π0 = m0 / m), and S(t) is the number of all rejected hypotheses at t |
FDR-adjusted p-value (Benjamini-Hochberg) | FDR_i ≤ (n × p_i) / (R_i × c(n)) | where R_i is the rank of p_i and c(n) is a function of the number of tests that depends on the correlation between the tests; if the tests are positively correlated, c(n) = 1 |
Proportion of false positives (PFP) | PFP = E(V) / E(R) | where E(V) is the expected number of falsely rejected null hypotheses and E(R) is the expected number of all rejected hypotheses; V and R are both individually estimated |
q-value | q(p_i) = min FDR(t) for t ≥ p_i | the q-value is defined as the minimum FDR that can be achieved when calling that "feature" significant |
There are two approaches to calculating the adjusted significance with the Bonferroni-procedure. According to the first method, one may divide the per analysis error rate by the number of comparisons (α/n). Only p-values smaller than this adjusted threshold would be declared statistically significant. For example, if we have five measurements and α = 0.05, only p-values < 0.01 (0.05 divided by five) would be reported significant.
According to the second method, the p-value of each test (p_{i}) is multiplied by the number of performed statistical tests (n): np_{i}. If the adjusted p-value is lower than the significance level, α (usually 0.05), the null hypothesis will be rejected, and the result will be significant (
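To make the two equivalent approaches concrete, here is a minimal Python sketch; the five p-values are hypothetical and the snippet is illustrative only, not code taken from our online tool.

```python
# Minimal sketch of the two equivalent Bonferroni approaches (illustrative only).
alpha = 0.05
p_values = [0.002, 0.03, 0.011, 0.04, 0.2]    # five hypothetical measurements
n = len(p_values)

# Approach 1: compare each raw p-value to the adjusted threshold alpha / n.
threshold = alpha / n                         # 0.05 / 5 = 0.01
significant_1 = [p < threshold for p in p_values]

# Approach 2: multiply each p-value by n (capped at 1) and compare it to alpha.
adjusted = [min(p * n, 1.0) for p in p_values]
significant_2 = [p_adj < alpha for p_adj in adjusted]

print(significant_1)                          # [True, False, False, False, False]
print(significant_2)                          # identical decisions
```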
Another well-known one-step correction method is the Sidak-correction [
The Bonferroni-adjustment works well in settings where the number of statistical tests does not exceed a couple of dozen to a couple of hundred, as in candidate gene studies or genome-wide microsatellite scans, respectively. Nevertheless, the Bonferroni-correction is the most stringent method, with the major disadvantage of over-adjusting p-values, erroneously increasing the probability of false negatives, and overlooking positive signals when evaluating a large number of tests. For example, in genomic studies testing 40,000 genes, the adjusted significance threshold would decrease from α = 0.05 to the impossibly low 0.00000125. Novel statistical approaches are available to avoid over-adjustment.
The Holm-correction is a step-down extension of the Bonferroni method: the p-values are ranked in ascending order, the smallest is multiplied by the total number of tests (n), the second smallest by n - 1, the third by n - 2, and so on; testing stops at the first non-significant result, after which all remaining null hypotheses are retained.
For example, if we conducted n = 500 statistical tests with the three smallest p-values being 0.00001, 0.00008, and 0.00012, and α = 0.05, the adjustments proceed as follows:
Rank#1: 0.00001 * 500 = 0.005, 0.005 < 0.05, the test is significant, reject the null hypothesis
Rank#2: 0.00008 * 499 = 0.0399, 0.0399 < 0.05, the test is significant, reject the null hypothesis
Rank#3: 0.00012 * 498 = 0.0598, 0.0598 > 0.05, the test is not significant; testing stops, and this and all remaining null hypotheses are retained
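The same step-down logic can be sketched in a few lines of Python; the setup reproduces the example above and is illustrative only, not code taken from our online tool.

```python
# Sketch of the Holm step-down procedure on the example above (illustrative only).
alpha, n = 0.05, 500
smallest_p = [0.00001, 0.00008, 0.00012]      # the three smallest of the 500 p-values

for rank, p in enumerate(smallest_p, start=1):
    adjusted = p * (n - rank + 1)             # multiply by n, n-1, n-2, ...
    if adjusted < alpha:
        print(f"Rank#{rank}: {adjusted:.4f} < {alpha} -> reject the null hypothesis")
    else:
        print(f"Rank#{rank}: {adjusted:.4f} >= {alpha} -> stop; retain this and "
              "all remaining null hypotheses")
        break
```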
The Hochberg-correction is a step-up procedure: the p-values are ranked from the largest to the smallest and multiplied by n, n - 1, n - 2, and so on; evaluation proceeds from the largest p-value, and once a test is found significant, that test and all tests with smaller p-values are also declared significant. For example, if we conducted n = 500 statistical tests with the three relevant p-values being 0.0015, 0.00013, and 0.0001, and α = 0.05, the adjustments proceed as follows:
Rank#1: 0.0015 * 500 = 0.75, 0.75 > 0.05, the test is not significant
Rank#2: 0.00013 * 499 = 0.0649, 0.0649 > 0.05, the test is not significant
Rank#3: 0.0001 * 498 = 0.0498, 0.0498 < 0.05, the test is significant; this hypothesis and all hypotheses with smaller p-values are rejected
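A short Python sketch mirroring the step-up evaluation above; the p-values are those of the example and the snippet is illustrative only, not code taken from our online tool.

```python
# Sketch mirroring the Hochberg example above (illustrative only).
alpha, n = 0.05, 500
p_descending = [0.0015, 0.00013, 0.0001]      # evaluated from the largest p-value downwards

for rank, p in enumerate(p_descending, start=1):
    adjusted = p * (n - rank + 1)             # multipliers 500, 499, 498, as in the example
    if adjusted < alpha:
        print(f"Rank#{rank}: {adjusted:.4f} < {alpha} -> significant; this and all "
              "hypotheses with smaller p-values are rejected")
        break
    print(f"Rank#{rank}: {adjusted:.4f} >= {alpha} -> not significant, continue")
```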
Generally, the Hochberg-correction retains a larger number of significant results compared to the one-step and Holm-corrections. The Holm- and Hochberg-corrections are beneficial when the number of comparisons is relatively low while the effect rate is high, but are not appropriate for the correction of thousands of comparisons.
The widespread application of RNA-seq and microarray-based gene expression studies has greatly stimulated research on the problem of massive hypothesis testing. Controlling the false discovery rate (FDR) offers a less stringent alternative to familywise error rate control and has become the standard approach in these settings.
The FDR is calculated as the expected proportion of null hypotheses falsely rejected among all tests rejected, thus calculating the probability of an incorrect discovery. To clarify the distinction between the error rate and FDR, the error rate of 0.05 means that 5% of truly null hypotheses will be called significant on average. In contrast, FDR controlled at 5% means that out of 100 genes considered statistically significant, five genes will be truly null on average.
In practice, the procedure is based upon the ranking of p-values in ascending order, after which each individual p-value’s Benjamini-Hochberg critical value is calculated by dividing the p-value’s individual rank by the number of tests, multiplied by the False Discovery Rate (a percentage chosen by the researcher) (
For the second smallest p-value, the critical value would be calculated as 2/100 * 0.05 = 0.001. With the Benjamini-Hochberg procedure, we are searching for the highest p-value that is smaller than its critical value. That p-value and all smaller p-values would be considered significant.
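The search for the highest p-value below its critical value can be sketched as follows; the ten p-values are hypothetical and the snippet is illustrative only, not code taken from our online tool.

```python
# Sketch of the Benjamini-Hochberg procedure (illustrative only; hypothetical p-values).
fdr = 0.05                                    # the false discovery rate chosen by the researcher
p_values = sorted([0.0001, 0.0004, 0.009, 0.012, 0.03, 0.041, 0.06, 0.2, 0.45, 0.8])
m = len(p_values)

# Critical value for the p-value of rank i: (i / m) * FDR.
# Find the highest-ranked p-value that falls at or below its critical value.
last_significant = 0
for i, p in enumerate(p_values, start=1):
    if p <= (i / m) * fdr:
        last_significant = i

# That p-value and all smaller p-values are declared significant.
significant = p_values[:last_significant]
print(significant)                            # [0.0001, 0.0004, 0.009, 0.012]
```

Note that the fifth-ranked p-value (0.03) is not significant even though it is below 0.05, because it exceeds its own critical value of 0.025.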
Various alternative methods have been developed to provide a more precise estimation of the FDR [
The Benjamini-Hochberg method is sufficient for most cases, especially when tests are independent and p-values are uniformly distributed. Simulations suggest that multiple testing correction methods perform reasonably well even in the presence of weak positive correlations, which are common in genetic studies [
When the assumption of independence among p-values is not fulfilled, another method is available to control the proportion of false positives (PFP) among all positive test results [
The FDR introduced by Benjamini and Hochberg (1995) is a global measure and cannot assess the reliability of a specific genetic marker. In contrast, the local FDR quantifies the probability that a given null hypothesis is true by taking into account the p-value of individual genetic markers, thus assessing each marker’s significance. The method is particularly suitable if the intention is to follow up on a single gene. However, the method requires an estimation of the proportion of true nulls (π0) and of the distribution under the alternative hypothesis [
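As an illustration of how π0 may be estimated, the sketch below applies Storey's λ-based estimator to simulated p-values; the choice of λ = 0.5 and the simulated mixture are assumptions made for demonstration and do not reflect any particular local FDR implementation.

```python
# Sketch of a simple pi0 estimate (Storey's lambda-based estimator); the mixture below is
# simulated for demonstration and is not tied to any specific local FDR implementation.
import numpy as np

def estimate_pi0(p_values, lam=0.5):
    """Estimate the proportion of true null hypotheses (pi0).

    Null p-values are uniform on [0, 1], so the fraction of p-values above `lam`,
    rescaled by 1 / (1 - lam), approximates pi0."""
    p = np.asarray(p_values)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)                      # pi0 is a proportion, so cap it at 1

# Hypothetical mixture: 900 null (uniform) p-values and 100 small "signal" p-values.
rng = np.random.default_rng(0)
p_mixed = np.concatenate([rng.uniform(size=900), rng.beta(1, 50, size=100)])
print(estimate_pi0(p_mixed))                  # expected to be close to 0.9
```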
The q-value is the FDR analog of the p-value in multiple hypothesis testing; adjusted p-values after FDR corrections are actually q-values. The q-value offers a measure of the strength of the observed statistic concerning the FDR; it is defined as the minimum FDR at which the test would be declared significant; in other words, it provides the proportion of significant features that turn out to be false leads (
Calculating false positives according to p-values considers all statistical tests, while q-values take into account only tests with q-values below the chosen threshold. The concept is illustrated with the following scenario: in a genomic study with 5000 statistical tests, geneX has a p-value of 0.015 and a q-value of 0.017. In the dataset, there are 500 genes with p-values of 0.015 or less. According to the 1.5% false-positive rate, 0.015 * 5000 = 75 genes would be expected to be false positives. At q = 0.017, 1.7% of the genes with p-values as small as or smaller than geneX’s will be categorized as false positives; thus, the expected number of false positives is 0.017 * 500 = 8.5, which is much lower than the predicted 75.
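The q-value computation itself can be sketched as follows, assuming π0 = 1 for simplicity; the input p-values are hypothetical and the snippet is illustrative only, not code taken from our online tool.

```python
# Sketch of q-value calculation from p-values (illustrative only; pi0 is assumed to be 1).
# The q-value of a test is the smallest FDR at which that test would be called significant:
# q_(i) = min over j >= i of pi0 * m * p_(j) / j, taken over the ascending ranks j.
import numpy as np

def q_values(p_values, pi0=1.0):
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    raw = pi0 * m * ranked / np.arange(1, m + 1)
    # Running minimum from the largest rank downwards enforces the "min over j >= i".
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

print(q_values([0.0001, 0.01, 0.02, 0.03, 0.5]))   # approx. [0.0005 0.025 0.0333 0.0375 0.5]
```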
Methods involving the Benjamini-Hochberg FDR control and Storey’s
Modern FDR-controlling methods incorporate an informative covariate, encoding contextual information about the hypotheses, to weigh and prioritize statistical tests, with the ultimate goal of increasing the overall power of the experiment. Various covariate-aware FDR methods have been developed to provide a general solution for a wide range of statistical problems. A recent systematic evaluation summarized the merits of two classical and six modern FDR-controlling methods on real case studies using publicly available datasets [
Besides the introduced tools, additional complex strategies are available that require extensive skills in both statistics and programming. Excellent comprehensive reviews describe additional concepts, such as guidelines for large-scale genomic studies [
While most research has been dedicated to continuous data, high dimensional count and binary data are also common in genomics, machine learning, and imaging, such as medical scans or satellite images. An excellent summary discusses the extension of concepts in multiple testing for high dimensional discrete data, such as false discovery rate estimation to the discrete setting [
Modern biopharmaceutical applications also utilize multiplicity control, such as clinical trials involving interim and subgroup analyses with multiple treatment arms and primary and secondary endpoints. For example, when aiming to find dose effects, hypotheses are
For multiple testing procedures, R packages are available via CRAN (36) or Bioconductor [
Multiple hypothesis testing corrections help to avoid unjustified "significant" discoveries in many fields of life sciences. The question remains: which method should be used for a particular analysis? It depends on the trade-off between our tolerance for false positives and the benefit of discovery. The exploratory or confirmatory nature of the planned research is also indicative. Asking the right questions before the analysis will narrow down the number of possibilities, as illustrated with the decision tree in Fig 1.
Controlling for false positives is particularly relevant in biomedical research. Applications include the selection of differentially expressed genes in RNA-seq or microarray experiments, where expression measures are associated with particular covariates or treatment responses; genetic mapping of complex traits based on single nucleotide polymorphisms when evaluating the results of genome-wide association studies (GWAS); scanning the genome for the identification of transcription factor binding sites or searching a protein database for homologs, etc. [
Choosing and successfully conducting appropriate multiple testing corrections may require extensive literature and background investigation with a steep learning curve. The use of adjustment methods improves the soundness of conclusions, although they are still underutilized by many life science researchers. We hope the current summary provides a much-needed practical synthesis of basic statistical concepts regarding multiple hypothesis testing in comprehensible language with well-illustrated examples. The frequently used adjustment tools, including the Bonferroni-, the Holm-, and the Hochberg-corrections, FDR, and q-value calculations, are implemented in our online calculator accessible at MultipleTesting.com.
PONE-D-21-00649
MultipleTesting.com: a tool for life science researchers for multiple hypothesis testing correction
PLOS ONE
Dear Dr. Menyhart,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
In the revised version of the paper please address the reviewers' comments listed at the end of this email. Additionally, please provide a comprehensive literature review in which you point out similar works in terms of tools development and works that use the type of analysis for which the current tool has been developed. Please carefully check the components presented in Figure 1. Also, please provide a real life science example in which the use of this tool will ease the work of the researchers. Please provide the limitations of the study and point out the possible directions of extending the work. Finally, please compare the facilities offered by your tool with other similar tools available online.
Please submit your revised manuscript by Apr 16 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see:
We look forward to receiving your revised manuscript.
Kind regards,
Camelia Delcea
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at
2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.
3. Thank you for stating the following in the Financial Disclosure section:
'The research was financed by the 2018-2.1.17-TET-KR-00001 and 2018-1.3.1-VKE-2018-00032 grants and by the Higher Education Institutional Excellence Programme (2020-4.1.1.-TKP2020) awarded to B. Gy, of the Ministry for Innovation and Technology in Hungary within the framework of the Bionic thematic program of the Semmelweis University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors wish to acknowledge the support of ELIXIR Hungary (www.
We note that one or more of the authors are employed by a commercial company: A5 Genetics Ltd, Hungary
a. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.
Please also include the following statement within your amended Funding Statement.
“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”
If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.
b. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.
Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors
c. Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.
Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests:
4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information:
Reviewers' comments:
Reviewer's Responses to Questions
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: N/A
Reviewer #2: Yes
Reviewer #3: N/A
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The
Reviewer #1: Yes
Reviewer #2: No
Reviewer #3: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The authors provide a review of the most frequently used approaches for correction methods in multiple hypothesis testing.
The simultaneous testing problem, introduced as early as the mid-twentieth century, is interesting and has been recently revived due to advances in technology. However, I think the authors fail to make a contribution to the topic. There are several reviews on the subject which I believe are complete. [1], [2] and [3] are examples of extensive surveys of correction methods.
1. Stefanie R Austin, Isaac Dialsingh, and Naomi Altman. Multiple hypothesis testing: A review. J. Indian Soc. Of Agricultural Stat, 68:303–314, 2014.
2. Farcomeni A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Statistical Methods in Medical Research, 17(4):347-388, 2008.
3. Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46(1), 561–584.
Reviewer #2: The study “MultipleTesting.com: a tool for life science researchers for multiple hypothesis testing correction” is interesting. To facilitate multiple-testing corrections, the authors developed a fully automated solution not requiring programming skills or the use of a command line. The current research provides a much needed practical synthesis of basic statistical concepts regarding multiple hypothesis testing in a comprehensible language with well-illustrated examples. The web tool will fill the gap for life science researchers by providing a user-friendly substitute for command-line alternatives. The paper is well organized, and the highlighted problem is addressed properly. However, attention should be given to the following highlighted points before resubmitting.
1. Page 10 of 22, “The procedure results in a statistical confidence measure, called a p-value, compared to the level of significance, α. When p is smaller than the confidence threshold α, the null hypothesis is rejected with a certain confidence, but the rejection does not "prove" the alternative hypothesis.” The P-value is called the observed significance level, so why are we comparing it with α? Why not take the level of significance directly from the P-value? For example, P = 0.11 would mean the significance level is 11%.
2. In the last, more recent references should be added to broaden the view of readers and enhance the new contribution of this paper for comparison.
3. The authors needed to add some more explanation in the conclusion section.
Reviewer #3: The authors developed an online tool for life science researchers for multiple hypothesis testing corrections. The online tool compiles the five most frequently used adjustment tools which can enable researchers to calculate False Discovery Rates (FDR) and q-values for multiple-testing corrections.
I suggest the author check the citation on page 9, paragraph 3, line 2 for proper citation.
I suggest moving the supplemental Table 1 to the main work due to its relevance.
Provide a friendly user-guide in the online tool to make it easier and more useful to targeted research communities.
The paper presented original research, reported in standard English, with relevant literature discussed. With the level of contribution presented by the authors, I, therefore, recommend the manuscript for consideration by PLOS ONE after minor revision.
**********
6. PLOS authors have the option to publish the peer review history of their article (
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Reviewer #1:
Reviewer #2: No
Reviewer #3:
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool,
Dear Editor-in-Chief,
Thank you for the opportunity to submit a revised version of our manuscript.
Please find below our point-by-point responses to the issues raised by the reviewers (the implemented changes are also highlighted in the manuscript):
Comments from the Editor:
In the revised version of the paper please address the reviewers' comments listed at the end of this email. Additionally, please provide a comprehensive literature review in which you point out similar works in terms of tools development and works that use the type of analysis for which the current tool has been developed. Please carefully check the components presented in Figure 1. Also, please provide a real life science example in which the use of this tool will ease the work of the researchers. Please provide the limitations of the study and point out the possible directions of extending the work. Finally, please compare the facilities offered by your tool with other similar tools available online.
Thank you for your valuable input. We have addressed every comment raised by the reviewers and have extended our literature review with recent articles and tools developed for controlling multiplicity in high-throughput data, a particularly rapidly growing field of science. We have substantially expanded the number of cited articles. We have also addressed the strategy illustrated in Figure 1. We discuss examples of real-life applications where controlling for multiplicity is a common issue.
The topic of multiple hypothesis testing is far-reaching and rapidly proliferating, and our manuscript is limited to the introduction of the most extensively utilized methods. Another limitation is the lack of detailed description of technical parts of the presented methods, although the provided examples clarify the concepts. We articulate our goals more clearly in the revised version of the manuscript.
We have conducted an extensive literature search but could not locate a similar user-friendly, web-based tool for conducting multiple hypothesis testing without the need for programming knowledge. We now include a section about tools implemented in R, and emphasize the facilities enabled by our platform. Our goal for the future is to extend the repertoire of available methods further, which we also highlight in the manuscript.
Reviewer comments:
Reviewer #1: The authors provide a review of the most frequently used approaches for correction methods in multiple hypothesis testing.
The simultaneous testing problem, introduced as early as the mid-twentieth century, is interesting and has been recently revived due to advances in technology. However, I think the authors fail to make a contribution to the topic. There are several reviews on the subject which I believe are complete. [1], [2] and [3] are examples of extensive surveys of correction methods.
1. Stefanie R Austin, Isaac Dialsingh, and Naomi Altman. Multiple hypothesis testing: A review. J. Indian Soc. Of Agricultural Stat, 68:303–314, 2014.
2. Farcomeni A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Statistical Methods in Medical Research, 17(4):347-388, 2008.
3. Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46(1), 561–584.
Thank you very much for the valuable suggestions. Following the reviewer's advice, we incorporated a section about additional, more recent methods, focusing mainly on novel FDR-controlling strategies. We have also included some of the suggested publications, complemented by additional comprehensive reviews, to enhance our contribution to the topic; please see pages 10-12.
Also, after reviewing the scientific literature for procedures and tools available for multiple testing, we could only identify tools requiring programming skills and the use of a command line. Although programming is increasingly important, this knowledge is not yet a default state among scientists. We feel that our online calculator accessible at
Reviewer #2: The study "MultipleTesting.com: a tool for life science researchers for multiple hypothesis testing correction" is interesting. To facilitate multiple-testing corrections, the authors developed a fully automated solution not requiring programming skills or the use of a command line. The current research provides a much needed practical synthesis of basic statistical concepts regarding multiple hypothesis testing in a comprehensible language with well-illustrated examples. The web tool will fill the gap for life science researchers by providing a user-friendly substitute for command-line alternatives. The paper is well organized, and the highlighted problem is addressed properly. However, attention should be given to the following highlighted points before resubmitting.
Thank you very much for the positive remarks regarding our manuscript.
1. Page 10 of 22, "The procedure results in a statistical confidence measure, called a p-value, compared to the level of significance, α. When p is smaller than the confidence threshold α, the null hypothesis is rejected with a certain confidence, but the rejection does not "prove" the alternative hypothesis." The P-value is called the observed significance level, so why are we comparing it with α? Why not take the level of significance directly from the P-value? For example, P = 0.11 would mean the significance level is 11%.
Thank you for your suggestion. P-values indicate how extreme the data are, while alpha values determine how extreme the data must be before the null hypothesis is rejected. We have extended our description and clarified the concept further; please see page 4. We feel that our description helps the readers by showing the difference between these two concepts.
2. In the last, more recent references should be added to broaden the view of readers and enhance the new contribution of this paper for comparison.
Thank you for the suggestion. According to the reviewer's request, we have incorporated additional work to enhance our contribution to the topic and to broaden the view of our readers, focusing mainly on recent FDR-controlling methods; please see pages 10-12.
3. The authors needed to add some more explanation in the conclusion section.
Thank you for your observation. We have incorporated additional examples, mainly aimed at life scientists, and expanded our Conclusions section; please see pages 12-13.
Reviewer #3: The authors developed an online tool for life science researchers for multiple hypothesis testing corrections. The online tool compiles the five most frequently used adjustment tools which can enable researchers to calculate False Discovery Rates (FDR) and q-values for multiple-testing corrections.
I suggest the author check the citation on page 9, paragraph 3, line 2 for proper citation.
Thank you for your remark, we have corrected the citation; please see page 9.
I suggest moving the supplemental Table 1 to the main work due to its relevance.
Thank you for your suggestion; we now moved Table 1 to the main body of the manuscript.
Provide a friendly user-guide in the online tool to make it easier and more useful to targeted research communities.
Thank you for your recommendation. We have incorporated a short description into the online platform to enhance its practicality.
The paper presented original research, reported in standard English, with relevant literature discussed. With the level of contribution presented by the authors, I, therefore, recommend the manuscript for consideration by PLOS ONE after minor revision.
We appreciate the positive evaluation and have edited the manuscript to increase its standards further.
We hope that the issues raised about the manuscript have been sufficiently addressed in this improved version. On this occasion, we would also like to thank the Editor and the three anonymous reviewers for their expert and helpful comments.
With best regards:
Balázs Győrffy MD PhD
Submitted filename:
MultipleTesting.com: a tool for life science researchers for multiple hypothesis testing correction
PONE-D-21-00649R1
Dear Dr. Menyhart,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact
Kind regards,
Camelia Delcea
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Reviewers' comments:
Reviewer's Responses to Questions
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #2: Yes
Reviewer #3: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #2: Yes
Reviewer #3: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The
Reviewer #2: Yes
Reviewer #3: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #2: Yes
Reviewer #3: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #2: (No Response)
Reviewer #3: The authors have addressed all the reviewer's comments and notable improvement is observed in the revised version.
**********
7. PLOS authors have the option to publish the peer review history of their article (
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Reviewer #2: No
Reviewer #3: No
PONE-D-21-00649R1
MultipleTesting.com: a tool for life science researchers for multiple hypothesis testing correction
Dear Dr. Menyhart:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact
If we can help with anything else, please email us at
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Camelia Delcea
Academic Editor
PLOS ONE