Conceived and designed the experiments: XQ BJY. Performed the experiments: XQ. Analyzed the data: XQ BJY. Wrote the paper: XQ BJY.

The authors have declared that no competing interests exist.

The advent of various high-throughput experimental techniques for measuring molecular interactions has enabled the systematic study of biological interactions on a global scale. Since biological processes are carried out by elaborate collaborations of numerous molecules that give rise to a complex network of molecular interactions, comparative analysis of these biological networks can bring important insights into the functional organization and regulatory mechanisms of biological systems.

In this paper, we present an effective framework for identifying common interaction patterns in the biological networks of different organisms based on hidden Markov models (HMMs). Given two or more networks, our method efficiently finds the top

Using several protein-protein interaction (PPI) networks obtained from the Database of Interacting Proteins (DIP) and other public databases, we demonstrate that our method can detect biologically significant pathways that are conserved across different organisms. Our algorithm has polynomial complexity that grows linearly with the length of the aligned paths, which enables the search for long paths with more than 10 nodes within a few minutes on a desktop computer. The software program that implements this algorithm is available upon request from the authors.

Recent advances in high-throughput experimental techniques for measuring molecular interactions

Network alignment can be broadly divided into two categories, namely,

There are also many local network alignment algorithms, where examples include PathBLAST

In this paper, we extend the HMM-based framework proposed in

In this section, we present an algorithm for solving the local network alignment problem based on HMMs. For simplicity, we first focus on the problem of aligning two networks, which can be formally defined as follows: Given two biological networks

Let

Our goal is to find the best matching pair of paths

(A) Example of two undirected biological networks

To define the alignment score

(A) Ungapped hidden Markov models (HMMs) for finding the best matching pair of paths. The dots next to the hidden states represent all possible symbols corresponding to virtual nodes in

The described HMM-based network representation allows us to naturally integrate the interaction reliability scores and the node similarity scores into an effective probabilistic framework. We first define two mappings
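As a rough illustration of what such mappings could look like, the Python sketch below row-normalizes interaction reliability scores into transition probabilities and node similarity scores into emission probabilities. The proportional normalization, the function names `transition_probs` and `emission_probs`, and the input formats are assumptions made for illustration only; they are not necessarily the mappings used in this paper.

```python
from collections import defaultdict


def transition_probs(edges):
    """Row-normalise interaction reliability scores into transition probabilities.

    ASSUMPTION (illustrative only): P(u -> v) is proportional to the
    reliability score w(u, v) among all neighbours of u.
    edges: iterable of (u, v, w) undirected interactions with w > 0.
    """
    weights = defaultdict(dict)
    for u, v, w in edges:
        weights[u][v] = w
        weights[v][u] = w  # undirected: the interaction can be traversed both ways
    probs = {}
    for u, nbrs in weights.items():
        total = sum(nbrs.values())
        probs[u] = {v: w / total for v, w in nbrs.items()}
    return probs


def emission_probs(similarity):
    """Normalise node similarities into emission probabilities (ASSUMPTION).

    similarity: dict mapping (a, b) -> nonnegative score, where a is a node of
    network A (a hidden state) and b is a node of network B (an emitted symbol).
    Returns emit[a][b] = similarity(a, b) / sum over b' of similarity(a, b').
    """
    rows = defaultdict(dict)
    for (a, b), s in similarity.items():
        rows[a][b] = s
    emit = {}
    for a, row in rows.items():
        total = sum(row.values())
        emit[a] = {b: (s / total if total > 0 else 0.0) for b, s in row.items()}
    return emit
```

Normalizing per source node keeps each row a valid probability distribution, so the resulting scores can be combined additively in log space.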

Based on the HMM framework, the problem of finding the best matching pair of paths is transformed into the problem of finding the optimal pair of state sequences in the two HMMs that jointly maximize the observation probability of the virtual path

We repeat the above iterations until

The computational complexity of the above algorithm is

The log-probability
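For a fixed pair of equal-length paths, a log-probability of this kind can be sketched as a sum of log emission terms (one per aligned node pair) and log transition terms (one per edge in each path). The sketch below assumes that additive form, ignores initial-state probabilities, and uses hypothetical names (`pair_log_score`, `trans_a`, `trans_b`, `sim`); it only makes the additive structure concrete and does not reproduce the paper's exact expression.

```python
import math


def _log(p):
    """Natural log that maps zero (a missing edge or node pair) to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def pair_log_score(path_a, path_b, trans_a, trans_b, sim):
    """Log-score of an ungapped pair of paths (ASSUMED additive form).

    trans_a[u][v] / trans_b[u][v]: transition probabilities in networks A / B
    sim[(a, b)]: emission (node similarity) probability of an aligned pair
    """
    if len(path_a) != len(path_b):
        raise ValueError("an ungapped alignment needs paths of equal length")
    # Emission terms: one similarity score per aligned node pair.
    score = sum(_log(sim.get((a, b), 0.0)) for a, b in zip(path_a, path_b))
    # Transition terms: consecutive nodes in each path must be connected.
    steps = zip(zip(path_a, path_a[1:]), zip(path_b, path_b[1:]))
    for (a, a2), (b, b2) in steps:
        score += _log(trans_a.get(a, {}).get(a2, 0.0))
        score += _log(trans_b.get(b, {}).get(b2, 0.0))
    return score
```

A pair of paths that uses a missing interaction or an unscored node pair receives a score of minus infinity, which excludes it from any maximization.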

As we can see, the concept of the “virtual” path provides an intuitive way of coupling states in two different HMMs. In fact, as the recursive equation (4) shows, the proposed alignment algorithm can also be viewed as a Markovian walk on a product graph, whose nodes are all possible pairs of hidden states in the respective HMMs and whose edges are determined by the connectivity (or transition probabilities) between the corresponding states in the HMMs. The algorithm searches for the optimal path (or the top-
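The product-graph view can be sketched as a Viterbi-style dynamic program whose states are node pairs (a, b). The Python sketch below assumes an additive log-score (log node similarity plus the log transition probabilities in both networks), uses hypothetical names, and, unlike a complete method, does not prevent a path from revisiting a node. Each step enumerates pairs of outgoing edges, so the work per step is proportional to the product of the edge counts, and the total grows linearly with the path length.

```python
import math


def _log(p):
    """Natural log that maps a zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def best_path_pair(trans_a, trans_b, sim, length):
    """Viterbi-style search for the best pair of `length`-node paths (sketch).

    trans_a[u][v] / trans_b[u][v]: transition probabilities in networks A / B
    sim[(a, b)]: node similarity used as emission score (ASSUMPTION: in (0, 1])
    Returns (log_score, path_a, path_b), or None if no pair of paths exists.
    NOTE: nodes may repeat within a path; a full method would need to handle this.
    """
    # Layer 1: every similar node pair (a product-graph node) can start a path.
    delta = {pair: _log(s) for pair, s in sim.items() if s > 0}
    back = []  # one back-pointer table per extension step
    for _ in range(length - 1):
        new, ptr = {}, {}
        for (a, b), score in delta.items():
            # Product-graph edges: one step in A paired with one step in B.
            for a2, pa in trans_a.get(a, {}).items():
                for b2, pb in trans_b.get(b, {}).items():
                    cand = score + _log(pa) + _log(pb) + _log(sim.get((a2, b2), 0.0))
                    if cand > new.get((a2, b2), float("-inf")):
                        new[(a2, b2)] = cand
                        ptr[(a2, b2)] = (a, b)
        delta = new
        back.append(ptr)
        if not delta:
            return None
    if not delta:
        return None
    end = max(delta, key=delta.get)
    # Trace back from the best final pair to recover both paths.
    pairs = [end]
    for ptr in reversed(back):
        pairs.append(ptr[pairs[-1]])
    pairs.reverse()
    return delta[end], [a for a, _ in pairs], [b for _, b in pairs]
```

Keeping one back-pointer table per step makes the traceback simple; its memory grows with the path length times the number of reachable node pairs.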

To accommodate gaps in the aligned paths

In order to find the optimal pair of paths (and their alignment) that maximize the pathway alignment score, we can apply the same dynamic programming algorithm described in the previous section. The retrieved paths can contain any of the hidden states
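To make gap-related scoring options concrete, the sketch below scores an already computed gapped alignment, written as a list of columns in which either node may be `None`. The affine scheme (separate opening and extension penalties), the default penalty values, and the function name are illustrative assumptions, and transition terms along each path are omitted for brevity.

```python
import math


def gapped_alignment_score(columns, sim, gap_open=-3.0, gap_extend=-1.0):
    """Score a gapped pathway alignment with an ASSUMED affine gap scheme.

    columns: list of (a, b); a or b is None when that path has a gap there.
    sim: positive similarity for every matched (a, b) pair.
    gap_open / gap_extend: hypothetical penalty values, not from the paper.
    """
    score = 0.0
    prev_gap = None  # which path had a gap in the previous column ("a", "b" or None)
    for a, b in columns:
        if a is not None and b is not None:
            score += math.log(sim[(a, b)])  # matched (aligned) node pair
            prev_gap = None
            continue
        if a is None and b is None:
            raise ValueError("a column must contain at least one node")
        side = "a" if a is None else "b"
        # The first gap column of a run costs gap_open; later ones cost gap_extend.
        score += gap_extend if prev_gap == side else gap_open
        prev_gap = side
    return score
```

Charging less for extending a gap than for opening one favors a few long gaps over many short ones, which can help when the compared networks are only remotely related.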

It is straightforward to extend the described pairwise network alignment algorithm for aligning multiple networks. Without loss of generality, we only consider the extension to the alignment of three networks. Given three network graphs
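For k networks, the product graph has one node per k-tuple of nodes, and a single step moves along one edge in every network simultaneously. The hypothetical helper below enumerates the successors of a product-graph node together with a joint transition probability, under the assumption that this joint probability is the product of the individual ones. Because the number of tuples grows as the product of the network sizes, the cost rises quickly with each additional network.

```python
from itertools import product


def successors(state, transs):
    """Yield the successors of a k-network product-graph node (sketch).

    state:  tuple (v_1, ..., v_k) with one node from each network
    transs: list of k dicts, transs[i][u][v] = transition probability u -> v
    ASSUMPTION: the joint transition probability is the product of the
    individual ones; `successors` is a hypothetical helper name.
    """
    # For each network, the outgoing edges of its current node.
    choices = [transs[i].get(v, {}).items() for i, v in enumerate(state)]
    for combo in product(*choices):  # one outgoing edge chosen per network
        nxt = tuple(v for v, _ in combo)
        joint = 1.0
        for _, p in combo:
            joint *= p
        yield nxt, joint
```

If any network has no outgoing edge from its current node, the Cartesian product is empty and the tuple has no successors.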

It should be noted that although we fix the length of the virtual path to

In order to obtain more general subnetwork alignments, not just alignments of linear paths, we can combine the overlapping paths among the top
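One simple way to merge overlapping path alignments, assumed here purely for illustration, is to treat two alignments as overlapping when they share an aligned node pair and to take connected components with a union-find structure. Both the overlap criterion and the function name `merge_overlapping` are hypothetical; the merging rule actually used may differ.

```python
def merge_overlapping(alignments):
    """Merge path alignments that share an aligned node pair (sketch).

    alignments: list of path alignments, each a list of (a, b) node pairs.
    ASSUMPTION: "overlap" means sharing at least one (a, b) pair.
    Returns a list of sets of (a, b) pairs, one per merged subnetwork alignment.
    """
    parent = list(range(len(alignments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # first alignment index seen for each node pair
    for idx, aln in enumerate(alignments):
        for pair in aln:
            if pair in owner:
                ri, rj = find(idx), find(owner[pair])
                if ri != rj:
                    parent[ri] = rj  # union the two groups
            else:
                owner[pair] = idx

    groups = {}
    for idx, aln in enumerate(alignments):
        groups.setdefault(find(idx), set()).update(aln)
    return list(groups.values())
```

Overlap is transitive under this rule: two alignments that share nothing directly still end up in the same subnetwork if a third alignment overlaps both.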

The memory complexity of the proposed algorithm is

To demonstrate the effectiveness of the HMM-based network alignment algorithm, we carried out the following experiments. First, we used our algorithm to align two pairs of small synthetic networks that were used to validate the network alignment algorithm proposed in

To illustrate the potential capability of aligning different types of molecular networks, we first tested our algorithm using two small synthetic examples: a pair of undirected networks and a pair of directed networks. These examples were obtained from the tutorial files in the PathBLAST plugin of the software Cytoscape (version 1.1,

For aligning the synthetic networks, we parameterized the HMMs as follows. We set the transition scores

We first used our algorithm for aligning a pair of undirected networks. To compare the alignment results with the results obtained by MNAligner

(A) Undirected networks; (B) Directed networks.

Without any modification, our algorithm can also be used for aligning directed networks. We demonstrate this by using the second example that contains a pair of small directed networks. In this experiment, we set the length of the virtual path to

The proposed algorithm can also be used to identify putative pathways in a new biological network that are similar to known pathways. To demonstrate this, we used our algorithm to search for human signaling pathways in the fruit fly PPI network. In order to compare the search results with those of the network querying algorithm in

We first obtained the PPI network of

In order to validate the accuracy of our algorithm for predicting functional modules that are conserved in different organisms, we performed additional experiments using three microbial PPI networks obtained from

For this experiment, the parameters of the HMMs were chosen as follows. First, the transition scores

Based on the constructed HMMs, we used our algorithm to find top-scoring gapped pathway alignments iteratively. At each iteration, we identified the top-scoring aligned pair of paths, stored the alignment, and removed the interactions it contained from the respective networks before the next iteration. Repeating this procedure yielded 200 high-scoring path alignments. The experiment was repeated with varying virtual path length:
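The iterative search described above can be sketched as a loop that alternates between finding the best alignment and deleting its interactions. The alignment routine is passed in as a hypothetical callable `find_best`, and interactions are represented as `frozenset` pairs; both choices are assumptions made to keep the sketch self-contained.

```python
def iterative_alignments(edges_a, edges_b, find_best, n_iter=200):
    """Repeatedly find the best path alignment and remove its interactions.

    edges_a, edges_b: sets of frozenset({u, v}) undirected interactions
    find_best(edges_a, edges_b) -> (path_a, path_b) or None   [HYPOTHETICAL]
    Returns the list of alignments found, best first.
    """
    edges_a, edges_b = set(edges_a), set(edges_b)  # work on copies of the inputs
    found = []
    for _ in range(n_iter):
        result = find_best(edges_a, edges_b)
        if result is None:  # no alignment left in the reduced networks
            break
        path_a, path_b = result
        found.append(result)
        # Remove the interactions used by this alignment before the next round.
        edges_a -= {frozenset(e) for e in zip(path_a, path_a[1:])}
        edges_b -= {frozenset(e) for e in zip(path_b, path_b[1:])}
    return found
```

Deleting the used interactions forces each subsequent iteration to report a different alignment rather than a slight variation of the previous one.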

The cumulative specificity of the top
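If each ranked alignment is labeled correct or incorrect under some validation criterion, a cumulative specificity curve can be computed as the fraction of correct alignments among the top k, for every k. The sketch below assumes exactly that definition and a precomputed boolean label per alignment; how an alignment is labeled is outside the sketch, and the function name is hypothetical.

```python
def cumulative_specificity(is_correct):
    """Fraction of correct alignments among the top k, for k = 1..n.

    is_correct: booleans for the alignments in rank order (best first).
    ASSUMPTION: the labels come from an external validation criterion.
    """
    curve, hits = [], 0
    for k, ok in enumerate(is_correct, start=1):
        hits += bool(ok)
        curve.append(hits / k)
    return curve
```

For example, labels `[True, True, False, True]` give the curve `[1.0, 1.0, 2/3, 0.75]`.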

Further analysis of the predicted alignments led to a number of interesting observations. For example, the alignment of

In this paper, we proposed an HMM-based network alignment algorithm that can be used for finding conserved pathways in two or more biological networks. The HMM framework and the proposed alignment algorithm have a number of important advantages over other existing local network alignment algorithms. First of all, despite its generality, the proposed algorithm is very simple and efficient. In fact, the alignment algorithm based on the proposed HMM framework is a variant of the Viterbi algorithm. As a result, it has a low polynomial computational complexity, which grows only linearly with respect to the length of the identified pathways and the number of edges in each network. This makes it possible to find conserved pathways with more than 10 nodes in networks with thousands of nodes and tens of thousands of interactions within a few minutes on a personal computer. Furthermore, the HMM-based framework can handle a large class of path isomorphisms, which allows us to find pathway alignments with any number of gaps (node insertions and deletions) at arbitrary locations. In addition, the proposed framework is very flexible in choosing the scoring scheme for pathway alignments, where different penalties can be used for mismatches, insertions and deletions. We can also assign different penalties for gap opening and gap extension, which can be convenient when comparing networks that are remotely related to each other. Another important advantage of the proposed framework is that it allows us to use an efficient dynamic programming algorithm for finding the mathematically optimal alignment. Considering that many available algorithms rely on heuristics that cannot guarantee the optimality of the obtained solutions, this is a significant merit of the HMM-based approach.
Although mathematical optimality does not guarantee the biological significance of the obtained solution, it can lead to more accurate predictions when combined with a realistic scoring scheme for assessing pathway homology. As demonstrated in our experiments, the proposed algorithm yields accurate and biologically meaningful results both for querying known pathways in the network of another organism and for finding conserved functional modules in the networks of different organisms. Finally, the HMM-based framework presented in this paper can be extended for aligning multiple networks. While many current multiple network alignment algorithms adopt a progressive approach for comparing multiple networks

For future research, we plan to evaluate the performance of our HMM-based algorithm more extensively by investigating the consistency of the predicted alignments based on other available functional annotations, including the gene ontology (GO) annotations

The authors would also like to thank Maxim Kalaev, Wenhong Tian, as well as Jason Flannick for sharing the datasets and for the helpful communication.