The authors have declared that no competing interests exist.

Conceived and designed the experiments: TH SG HK. Performed the experiments: TH SG. Analyzed the data: TH SG. Contributed reagents/materials/analysis tools: TH SG RY HK. Wrote the paper: TH SG RY HK.

Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Numerous techniques have recently been developed; in particular, consensus-driven approaches that combine different algorithms have emerged as a promising strategy for inferring accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet

Elucidating gene regulatory networks is crucial to understanding disease mechanisms at the system level. A large number of algorithms have been developed to infer gene regulatory networks from gene-expression datasets. Recall the success of IBM's Watson in the "Jeopardy!" quiz show: the critical features of Watson were its use of a very large number of heterogeneous algorithms to generate various hypotheses and its mechanism for selecting one of them as the answer. We took a similar approach, "TopkNet"

Most genes do not exert their functions in isolation

A plethora of algorithms have been developed to infer GRNs from gene expression data,

Several studies have compared the performances of network-inference algorithms

The above observations suggest that different network-inference algorithms have different strengths and weaknesses

Analysis of DREAM5 results

Analysis of small in silico datasets of the DREAM3 challenge demonstrated that integration of the best five algorithms outperforms integration of all algorithms submitted to the challenge

A measure to quantify similarity among gene-expression datasets could provide a clue for determining the optimal algorithms for reconstruction of unknown regulatory networks: if an expression dataset associated with a known regulatory network (e.g., the DREAM5 datasets) is similar to one associated with an unknown regulatory network, algorithms that are optimal for the former could also be optimal for the latter.

Motivated by the above observations and issues, this paper focuses on four strategies towards building a comprehensive network-reconstruction platform:

Develop a computational framework to integrate diverse inference algorithms.

Systematically assess the performance of the framework against the DREAM5 datasets, which comprise genome-wide transcriptional regulatory networks and their corresponding expression data from actual microarray experiments as well as in silico simulations.

Develop a measure to quantify diversity among inference techniques, towards identifying optimal combinations of algorithms that elucidate accurate GRNs.

Develop a measure to quantify similarity among expression datasets towards selecting optimal algorithms for reconstruction of unknown regulatory networks.

To investigate these possible strategies, we first developed a novel network-inference algorithm that can combine multiple network-inference algorithms. Second, to evaluate the inference performances of the algorithms precisely, we used the DREAM5 datasets composed of

We developed a computational workflow for combining network-inference algorithms and systematically assessing their performance. The workflow is composed of three steps (see Supplementary

(

| | In silico | E. coli | S. cerevisiae |
| --- | --- | --- | --- |
| Number of genes | 1,643 | 4,511 | 5,950 |
| Number of samples | 805 | 805 | 536 |

In silico DREAM5 dataset.

DREAM5 dataset from E. coli.

DREAM5 dataset from S. cerevisiae.

Based on the observation that network-inference algorithms tend to assign high confidence levels to true-positive links
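The idea behind a Top-k style consensus can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each algorithm outputs a confidence score per candidate link and that the consensus assigns each link the k-th largest score across algorithms (k = 1 corresponding to Top1Net, k = 2 to Top2Net); the name `topk_net` is ours.

```python
def topk_net(score_lists, k):
    """Consensus scores: for each gene pair, take the k-th largest
    confidence score across all algorithms (k=1 -> maximum).
    score_lists: list of dicts mapping gene-pair -> confidence score."""
    consensus = {}
    pairs = set().union(*(s.keys() for s in score_lists))
    for pair in pairs:
        scores = sorted((s.get(pair, 0.0) for s in score_lists), reverse=True)
        consensus[pair] = scores[k - 1]
    return consensus

# Three toy "algorithms" scoring two candidate regulatory links
alg1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2}
alg2 = {("g1", "g2"): 0.4, ("g1", "g3"): 0.8}
alg3 = {("g1", "g2"): 0.7, ("g1", "g3"): 0.1}

top1 = topk_net([alg1, alg2, alg3], k=1)  # Top1Net-style: maximum score
top2 = topk_net([alg1, alg2, alg3], k=2)  # Top2Net-style: second-largest score
print(top1[("g1", "g2")], top2[("g1", "g2")])  # 0.9 0.7
```

A link scored highly by at least k algorithms keeps a high consensus score, which matches the observation that true-positive links tend to receive high confidence levels from several algorithms at once.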

The number of network-inference algorithms has grown in a manner reminiscent of Moore's law (doubling every two years)

In this section, we first outline the components employed in this study, on the basis of which the performance of TopkNet

The performance of gene-network reconstruction algorithms requires benchmarking against various datasets representing network dynamics (for example, gene-expression profiles) for which the underlying network is known. However, the ability to generate biologically plausible networks and validate them against experimental data remains a fundamental challenge in network reconstruction. In this respect, the DREAM initiative provides a community platform for the objective assessment of inference methods. The DREAM challenges provide a common framework on which to evaluate inference techniques against well-characterized datasets. In this study, we used large-scale experimental data from the DREAM5 network-inference challenge.

True-positive rate, false-positive rate, recall, and precision are representative metrics to evaluate performances of network inference algorithms (see

Further, we calculated three representative metrics,

To evaluate how TopkNet

Black squares and lines show performances of TopkNet

These results indicate that, while TopkNet

Thus, by integrating only high-performance algorithms that tend to assign high confidence scores to true-positive links, TopkNet

Black squares and lines show performances of TopkNet

As demonstrated in this section, selecting optimal algorithms for a given expression dataset and applying Top1Net, Top2Net, or community prediction to the selected algorithms could be a powerful approach to reconstruct high-quality GRNs. However, to our knowledge, there is currently no method to determine beforehand the optimal algorithms for expression data associated with an unknown regulatory network. Developing such a method is key to reconstructing unknown regulatory networks (we investigate this issue in the next section).

Different network-inference algorithms employ different and often complementary techniques to infer gene regulatory interactions from an expression dataset. Therefore, a consensus-driven approach, which leverages

It is pertinent to analyze the anatomy of diversity between different algorithms in a theoretical framework, to answer the following questions:

To what extent, then, are the algorithms different from each other?

Does bringing diversity of the algorithms into community prediction improve the quality of inferred networks?

For these purposes, Marbach et al. conducted principal component analysis (PCA) on confidence scores from 35 network-inference algorithms, projected the algorithms onto the 2^{nd} and 3^{rd} principal components, and grouped the algorithms into four clusters by visual inspection. The analysis demonstrated that integrating three algorithms from different clusters yields higher performance than integrating three from the same cluster. This indicates that the diversity signature of the selected algorithms, and not just their number, plays an important role in the performance of network reconstruction techniques.

However, their characterization of algorithm diversity is qualitative and, to our knowledge, there is no quantitative measure of algorithm diversity. In order to quantify diversity among the individual algorithms employed in this study, we developed two quantitative measures of diversity, which calculate the distance between algorithm pairs on the basis of the confidence scores of the regulatory interactions inferred by the algorithms. One is based on simple Euclidean distance (EUC distance) and the other on the Euclidean distance between the 2^{nd} and 3^{rd} components from a PCA analysis (PCA distance) (see
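The two measures can be sketched as follows. This is an illustrative reconstruction, assuming each algorithm is represented by its vector of confidence scores over all gene pairs, with PCA computed via SVD on the mean-centered algorithm-by-pair matrix; the function names `euc_distance` and `pca_distance` are ours.

```python
import numpy as np

def euc_distance(x, y):
    """EUC distance: Euclidean distance between the confidence-score
    vectors of two algorithms."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

def pca_distance(scores, i, j):
    """PCA distance: Euclidean distance between algorithms i and j
    projected onto the 2nd and 3rd principal components of the
    algorithm-by-pair score matrix.
    scores: (n_algorithms, n_gene_pairs) array of confidence scores."""
    X = np.asarray(scores, dtype=float)
    Xc = X - X.mean(axis=0)               # center each gene-pair column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[1:3].T                 # components 2 and 3 (rows 1, 2)
    return float(np.linalg.norm(proj[i] - proj[j]))

rng = np.random.default_rng(0)
scores = rng.random((5, 100))             # 5 toy algorithms, 100 gene pairs
print(euc_distance(scores[0], scores[1]))
print(pca_distance(scores, 0, 1))
```

Both distances are symmetric; the PCA variant discards the 1st component, which (in Marbach et al.'s analysis) mostly reflects overall performance rather than methodological diversity.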

(

Using the diversity measures, we calculated the distances among the 10 optimal algorithms for each of the DREAM5 datasets, to examine whether bringing quantified algorithm diversity into Top1Net (and community prediction) improves the performance of network reconstruction. Based on the calculated distances, we defined high-diversity pairs as the top 10% of algorithm pairs with the highest distances and low-diversity pairs as the bottom 10% with the lowest distances. In this study, there are 45 algorithm pairs among the 10 optimal algorithms; thus, the 5 pairs with the highest distances are the high-diversity pairs, and the 5 pairs with the lowest distances are the low-diversity pairs.
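The pair-selection step can be sketched as follows; the toy distances and the function name `diversity_pairs` are ours, used only to illustrate taking the top and bottom 10% of pairs by distance.

```python
import math
from itertools import combinations

def diversity_pairs(distances, frac=0.10):
    """Split algorithm pairs into high- and low-diversity sets:
    the top and bottom `frac` of pairs ranked by distance.
    distances: dict mapping (alg_i, alg_j) -> distance."""
    ranked = sorted(distances, key=distances.get, reverse=True)
    n = max(1, math.ceil(len(ranked) * frac))
    return ranked[:n], ranked[-n:]

# Toy distances among 10 algorithms -> 45 pairs, so 10% gives 5 pairs each
algs = [f"alg{i}" for i in range(10)]
pairs = list(combinations(algs, 2))
dist = {p: (i % 7 + 1) / 10 for i, p in enumerate(pairs)}
high, low = diversity_pairs(dist)
print(len(pairs), len(high), len(low))  # 45 5 5
```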

Next, we evaluated the performances of Top1Net (or community prediction) based on integration of high-diversity pairs and those based on integration of low-diversity pairs. As seen in

H and L represent high-diversity and low-diversity algorithm pairs, respectively. (

Even for the same number of algorithms (in this case, two algorithms are integrated), bringing quantitative diversity into the selection of pairs can improve the performance of the consensus methods (Top

Quantitative diversity-guided consensus can reduce the cost of consensus (integrating only 2 algorithms instead of 38 in this case) without compromising the quality of the inferred network: in this study, the inference performance of a high-diversity pair is much higher than that of the 38-algorithm combination.

Top1Net or Top2Net based on integration of the highest-performance algorithms consistently reconstructs the most accurate GRNs, as demonstrated in the previous section (see

A measure to quantify similarity among expression datasets can be key to selecting optimal network-inference algorithms for each dataset because, if the similarity between expression data associated with a known regulatory network (

First, we briefly outline the procedure for calculating similarity among expression datasets (see

(

Next, to evaluate whether dataset similarity can guide the optimal selection of inference algorithms, we calculated the similarity among the DREAM5 gene-expression datasets and compared the performance of the algorithms across the datasets. As seen in

The scatter plots show the correlation of algorithm distances between two gene-expression datasets. Each point represents an algorithm pair; because there are 703 algorithm pairs among the 38 algorithms, each figure contains 703 points. The vertical axis represents the (EUC or PCA) distance between two algorithms for one gene-expression dataset, and the horizontal axis represents that for the other. (

| Dataset 1 | Dataset 2 | EUC distance | PCA distance |
| --- | --- | --- | --- |
| | | 0.87 | 0.81 |
| | | 0.83 | 0.83 |
| | | 0.99 | 0.99 |

Spearman's correlation coefficient of algorithm distance (EUC distance) between Dataset 1 and Dataset 2.

Spearman's correlation coefficient of algorithm distance (PCA distance) between Dataset 1 and Dataset 2.

In silico DREAM5 dataset.

DREAM5 dataset from E. coli.

DREAM5 dataset from S. cerevisiae.

Furthermore, the correlation of algorithm performances between dataset pairs with high similarity (

From the above observations (observations in

As seen in

Red lines show the performance of TopkNet

Further, as shown in

With an increasing corpus of inference algorithms, leveraging their diverse and sometimes complementary approaches in a community consensus can be a promising strategy for reconstructing gene regulatory networks from large-scale experimental data. A computational platform to systematically analyze, assess, and leverage these diverse techniques is essential for the successful application of reverse engineering in biomedical research.

This study presents a reverse engineering framework which can flexibly integrate multiple inference algorithms, based on

Comparative evaluation on the DREAM5 datasets showed that, although TopkNet

Why does the Top1Net algorithm, integrating the 10 optimal algorithms, perform so well and outperform the best individual method? This is because the 10 optimal algorithms tend to assign high confidence scores to true-positive links, and Top1Net can recover many true-positive links that receive the highest confidence scores from those algorithms. Furthermore, the 10 optimal algorithms are based on different techniques (

Why, then, does Top1Net outperform community prediction and TopkNet with higher

A key to reconstructing accurate GRNs is the development of a method to determine optimal algorithms for a given expression dataset associated with an unknown regulatory network. As mentioned in the Results, if the similarity between expression data associated with a known regulatory network (

Based on this observation, we developed a measure to quantify similarity among expression datasets based on algorithm diversity and demonstrated that, if the similarity between two expression datasets is high, integration of algorithms that are optimal for one dataset could perform well on the other. Thus, the similarity measure proposed in this study can be a good clue for identifying optimal algorithms for reliable reconstruction of an unknown regulatory network.

The consensus framework outlined in this paper, TopkNet

We used the DREAM5 datasets (

To evaluate the performance of inference algorithms on the DREAM5 datasets, the DREAM organizers provide MATLAB software that computes an overall score, (S_{1}+S_{2}), where S_{1} and S_{2} are the mean of the log-transformed AUC-PR p-values and that of the log-transformed AUC-ROC p-values, taken over the three networks of the DREAM5 challenge, respectively.

We obtained the confidence scores between gene pairs produced by 35 algorithms (29 from DREAM5 participants and 6 commonly used "off-the-shelf" algorithms) from the supplementary file of Marbach et al.

For a given threshold value of confidence level, network-inference algorithms predict whether a pair of genes has a regulatory link or not. A pair of genes with a predicted link is considered a true positive (TP) if the link is present in the underlying synthetic network, and a false positive (FP) if the synthetic network does not have the link. Similarly, a pair of genes without a predicted link is considered a true negative (TN) or false negative (FN) depending on whether the link is absent or present in the underlying synthetic network, respectively. Using the values of TP, FP, TN, and FN, we can calculate several metrics to evaluate the performances of network-inference algorithms.

One representative metric is the precision/recall curve, where the precision (
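As a concrete illustration of these definitions, the following sketch computes the confusion counts and precision/recall at a given confidence threshold; the gene pairs, scores, and the function name `pr_metrics` are toy examples of ours, not from the DREAM evaluation software.

```python
def pr_metrics(scores, gold, threshold):
    """Confusion counts and precision/recall at a confidence threshold.
    scores: dict gene-pair -> confidence; gold: set of true gene pairs."""
    tp = fp = tn = fn = 0
    for pair, s in scores.items():
        predicted = s >= threshold      # link predicted at this threshold?
        actual = pair in gold           # link present in the gold standard?
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "precision": precision, "recall": recall}

scores = {("g1", "g2"): 0.9, ("g1", "g3"): 0.6, ("g2", "g3"): 0.2}
gold = {("g1", "g2"), ("g2", "g3")}
m = pr_metrics(scores, gold, threshold=0.5)
print(m)
```

Sweeping the threshold over all observed confidence values and recording (recall, precision) at each step traces out the precision/recall curve.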

By using the confidence scores among genes produced by the network-inference algorithms, we calculated d_{EUC}(X,Y), the simple Euclidean distance (EUC distance) between two network-inference algorithms X and Y, for expression datasets with a given number of genes and a given sample size. Before giving a definition for d_{EUC}(X,Y), let us first define some notations. Let

Further, we calculated d_{PCA}(X,Y), the distance between two network-inference algorithms X and Y on the 2^{nd} and 3^{rd} principal components (PCA distance) from a PCA analysis on the confidence scores of the 38 algorithms. Let c_{2}(X) and c_{3}(X) be the 2^{nd} and 3^{rd} components of X, respectively. We defined the PCA distance between two algorithms as d_{PCA}(X,Y) = sqrt[(c_{2}(X) - c_{2}(Y))^{2} + (c_{3}(X) - c_{3}(Y))^{2}].

By using the distances among algorithms, we calculated the similarity between two expression datasets, da1 and da2, as follows. Let a_{1}, a_{2}, …, a_{k} be the network-inference algorithms and let d(a_{i}, a_{j}, da1) denote the (EUC or PCA) distance between algorithms a_{i} and a_{j} for dataset da1; for example, d(a_{1}, a_{2}, da1) is the distance between a_{1} and a_{2} for da1. We define V_{da1} = {d(a_{1},a_{2},da1), d(a_{1},a_{3},da1), …, d(a_{k-1},a_{k},da1)}, the vector of distances over all algorithm pairs, and V_{da2} analogously for da2. The similarity between da1 and da2 is then quantified as Spearman's correlation coefficient between V_{da1} and V_{da2}.
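Under that definition, the similarity computation can be sketched as follows; the Spearman helper is a dependency-free stand-in (equivalent in spirit to `scipy.stats.spearmanr`), and all names are illustrative.

```python
def _ranks(v):
    """Average ranks (ties get the mean rank), 1-based."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1                      # extend over the tied block
        avg = (i + j) / 2 + 1           # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def dataset_similarity(dist_da1, dist_da2, pairs):
    """Similarity of two expression datasets: Spearman correlation of
    their pairwise algorithm-distance vectors over the same pairs."""
    return spearman([dist_da1[p] for p in pairs], [dist_da2[p] for p in pairs])

pairs = [("a1", "a2"), ("a1", "a3"), ("a2", "a3")]
d1 = {("a1", "a2"): 0.1, ("a1", "a3"): 0.5, ("a2", "a3"): 0.9}
d2 = {("a1", "a2"): 0.2, ("a1", "a3"): 0.6, ("a2", "a3"): 0.8}
print(dataset_similarity(d1, d2, pairs))  # rank-concordant -> ~1.0
```

Using rank correlation rather than raw values makes the similarity insensitive to the absolute scale of the distances, which differs between the EUC and PCA variants.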

To infer GRNs from the large-scale expression data of DREAM5 (expression data of


The authors thank Matsuoka Y, Kang H, Fujita K, Lopes T, and Shoemaker J for their useful comments and discussion.