The authors have declared that no competing interests exist.

Conceived and designed the experiments: TC. Performed the experiments: MO TC. Analyzed the data: TC MO HS. Wrote the paper: TC.

Since metabolome data are derived from the underlying metabolic network, reverse engineering of such data to recover the network topology is of wide interest. The Lyapunov equation constrains the link between data and network by coupling the covariance of the data with the strength of interactions (the Jacobian matrix). This equation, when expressed as a linear set of equations at steady state, constitutes a basis for inferring the network structure from the covariance matrix of the data. The sparse structure of metabolic networks points to reactions that are active based on minimal enzyme production, hinting at sparsity as a cellular objective. Therefore, for a given covariance matrix, we solved the Lyapunov equation for the Jacobian matrix using two simultaneous objectives, minimization of the Euclidean norm of the residuals and maximization of sparsity (the number of zeros in the Jacobian matrix), to infer directed small-scale networks from three kingdoms of life (bacteria, fungi, mammals). The inference performance of the approach was promising, with a False Positive Rate of zero and a True Positive Rate of almost one. The effect of missing data on the results was also analyzed, revealing superiority over similarity-based approaches, which infer undirected networks. Our findings suggest that the covariance of metabolome data implies an underlying network with the sparsest pattern. The theoretical analysis forms a framework for further investigation of sparsity-based inference of metabolic networks from real metabolome data.

While the majority of computational systems biology approaches use cellular networks as scaffolds to analyze omics data, some focus on investigation of the information content of omics data to recover the underlying biological network. These approaches, termed top-down systems biology

Network inference approaches can be divided into two groups according to the directionality of the inferred network. A large group of approaches, including similarity-based approaches such as partial Pearson correlation and mutual information, infers undirected networks

Steady-state data have also been a focus of reverse engineering approaches, but almost exclusively to infer undirected networks. Few examples use steady-state data only to infer partially directed networks

Cellular networks have been shown to exhibit sparse structures

In this work, we perform a theoretical study based on constraining observational steady state metabolome data with the sparsity information, and show the potential of such data to discover underlying metabolic networks with directionality information as well as the interaction strength of metabolite pairs. The results are demonstrated

A metabolic reaction network can be described by a set of nonlinear differential equations around its metabolites,

For systems around steady state, a linear approximation can be made to express the equation system in terms of Jacobian matrix, _{s}, and _{s}. Jacobian matrix holds very detailed information on the underlying network structure including (i) direction of interaction, (ii) nature of interaction (positive or negative), and (iii) strength of interaction. The (^{th} entry of a Jacobian matrix quantifies the influence of

_{i} shows the extent of fluctuations, and _{i}

As demonstrated by

Here, ^{2}^{2}^{2}^{2}

Lyapunov equation was already shown to hold for metabolic networks
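As a minimal illustration of how the Lyapunov equation couples the Jacobian matrix to the covariance matrix, the sketch below assumes the commonly used steady-state form J C + C J^T = -2 D, where C denotes the covariance matrix and D a fluctuation matrix. The symbols C and D, the hypothetical 3-node chain, the function name `covariance_from_jacobian`, and the use of NumPy are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def covariance_from_jacobian(J, D):
    """Solve J C + C J^T = -2 D for C (assumed form of the Lyapunov equation)."""
    n = J.shape[0]
    I = np.eye(n)
    # With column-major vec(): vec(J C) = (I kron J) vec(C)
    # and vec(C J^T) = (J kron I) vec(C).
    A = np.kron(I, J) + np.kron(J, I)
    b = (-2.0 * D).flatten(order="F")
    c = np.linalg.solve(A, b)
    return c.reshape((n, n), order="F")

# Hypothetical stable 3-node chain: node 0 -> node 1 -> node 2.
# J[i, j] is the influence of node j on node i.
J = np.array([[-1.0,  0.0,  0.0],
              [ 0.8, -1.5,  0.0],
              [ 0.0,  0.6, -2.0]])
D = np.diag([0.01, 0.01, 0.01])  # assumed fluctuation matrix

C = covariance_from_jacobian(J, D)
residual = J @ C + C @ J.T + 2.0 * D  # should vanish up to rounding
print(np.round(C, 5))
```

The forward direction shown here (Jacobian to covariance) is uniquely solvable for a stable system. The inverse direction, recovering J from C, is the inference problem addressed in this work.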

Indeed, when we calculated Spearman correlation between the exact covariance matrix of yeast that we obtained from

The use of similarity-based network inference approaches (e.g., correlation) to infer undirected metabolic networks from metabolome data showed that full-order partial Pearson correlation, also known as the Graphical Gaussian Model (GGM), is the best performer among those studied. Metabolite pairs with |R_{GGM}| < 0.001 do not have an edge between them in the real network, and pairs with |R_{GGM}| > 0.60 are linked in reality. We used this purely data-based information on very weakly correlated and very highly correlated metabolite pairs in order to further constrain
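The GGM thresholds described above can be sketched as follows. Full-order partial correlations are computed from the inverse covariance (precision) matrix P using the standard relation R_ij = -P_ij / sqrt(P_ii P_jj), and the cut-offs 0.001 and 0.60 are taken from the text. The function names and the 3-node example are hypothetical, and how such pairs enter the optimization as constraints is not shown here.

```python
import numpy as np

def ggm_partial_correlations(C):
    """Full-order partial correlations from a covariance matrix C."""
    P = np.linalg.inv(C)          # precision matrix
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

def edge_priors(R, low=0.001, high=0.60):
    """Return (pairs assumed unlinked, pairs assumed linked)."""
    n = R.shape[0]
    absent, present = [], []
    for i in range(n):
        for j in range(i + 1, n):
            r = abs(R[i, j])
            if r < low:
                absent.append((i, j))
            elif r > high:
                present.append((i, j))
    return absent, present

# Hypothetical example, built from a known precision matrix so that
# nodes 0 and 2 are conditionally independent given node 1.
P_true = np.array([[ 1.0, -0.9,  0.0],
                   [-0.9,  2.0, -0.1],
                   [ 0.0, -0.1,  1.0]])
C = np.linalg.inv(P_true)

R = ggm_partial_correlations(C)
absent, present = edge_priors(R)
print(absent, present)
```

In this example the pair (0, 2) falls below the lower threshold and the pair (0, 1) exceeds the upper threshold, while the pair (1, 2) lies in between and is left unconstrained.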

Note that the two terms in the objective (fitness) function are indeed summed, since the logarithm of the term in parentheses is negative for values smaller than 1. λ was set to 0.05 in all simulations. To guarantee the search of a reasonable solution space, first term of ^{2}−n)×0.9, since the diagonals of a Jacobian matrix cannot be zero, and it is not feasible for more than 90% of the remaining entries of a Jacobian matrix to be zero. Similarly, the second term was replaced by (10–[10+log10(||^{−10}. Thereby, we reduced the contribution of the second term to the objective function for such small residual norms, shifting the emphasis to the number of zeros. These constraints prevented the genetic algorithm from getting stuck in local minima. For the noise and missing-data analyses, we reduced the contribution of residual norms smaller than 1×10^{−5} to the fitness function and used (5–[5+log10(||

The Jacobian vector corresponding to the best individual obtained from the genetic algorithm was compared with the real Jacobian vector, and prediction was quantified statistically by using true positive rate (TPR) and false positive rate (FPR). When necessary, g-score

Directionality was taken into account when calculating these metrics. That is, a true positive meant that both the presence of an interaction and its direction were correctly inferred. Similarly, false negatives were edges that were either predicted as no-edge or predicted in the wrong direction. Entries of the calculated Jacobian vector smaller than 1×10^{−8} were assumed to be zero.
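The direction-aware scoring described above can be sketched as an entry-wise comparison of the off-diagonal Jacobian entries, so that an edge predicted in the wrong direction counts as a false negative at its true position. The zero threshold of 1×10^{−8} comes from the text. Excluding the diagonal, applying the same threshold to the true matrix, and the function name `directed_rates` are assumptions of this sketch.

```python
ZERO_TOL = 1e-8  # entries with smaller magnitude are treated as zero

def directed_rates(true_jac, pred_jac, tol=ZERO_TOL):
    """Return (TPR, FPR) over off-diagonal entries of two Jacobians.

    Entry [i][j] is read as the influence of node j on node i, so each
    off-diagonal entry is one potential directed edge.
    """
    n = len(true_jac)
    tp = fp = fn = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonals are never zero; not scored here
            real = abs(true_jac[i][j]) >= tol
            pred = abs(pred_jac[i][j]) >= tol
            if real and pred:
                tp += 1
            elif real:
                fn += 1
            elif pred:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical 3-node chain 0 -> 1 -> 2.
true_jac = [[-1.0, 0.0, 0.0],
            [0.8, -1.5, 0.0],
            [0.0, 0.6, -2.0]]
# Prediction: edge 0 -> 1 correct, edge 1 -> 2 inferred as 2 -> 1,
# plus one numerically negligible entry that is thresholded to zero.
pred_jac = [[-1.0, 0.0, 1e-12],
            [0.7, -1.4, 0.5],
            [0.0, 0.0, -2.1]]

print(directed_rates(true_jac, pred_jac))  # (0.5, 0.25)
```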

We start with a demonstration of our approach on a smaller (6-node) system. We deliberately chose a non-metabolic system for the demonstration to draw attention to the fact that our approach can also be applied to the inference of other biological networks, such as gene-regulatory or signaling networks. The system has 6 nodes, and 5 interactions in between

The gene network is from

We first calculated covariance matrices of the three metabolic networks ^{−4}), making these inferred interactions practically irreversible, and in the direction of true edges. This corresponds to a practical TPR of 0.97. The remaining false negative was a very weak regulatory interaction between fructose-1,6-bisphosphate and pyruvate (on the order of 10^{−5}). Our approach could not capture this interaction because of its almost-zero strength. This means that our approach performs almost flawlessly in inferring the stronger interactions in these three systems from mammalian, eukaryotic and prokaryotic organisms. One should note that the predicted networks are condition-specific. The approach infers the active links of the metabolic networks for the condition of interest rather than the general metabolic network with all possible reactions, which also has a sparse structure.

| | System Characteristics | | Inference-Quality Metrics | | |
| | Number of Nodes | Number of Interactions | True Positive Rate | False Positive Rate | Spearman Correlation of Strengths |
| Brain | 12 | 17 | 1.00 | 0 | 1.00 |
| | 13 | 21 | 1.00 | 0 | 1.00 |
| | 18 | 39 | 0.85 | 0 | 1.00 |

To allow a clearer demonstration of the positive effect of the sparsity objective on the results, we repeated calculations with an alternative double-objective function which simultaneously minimizes (i) sum of the absolute values of the elements of Jacobian matrix and (ii) Euclidean norm of the residuals, by using a similar framework as in

A previous study used two types of ^{th} order partial correlation) as most powerful similarity-based approach based on their analysis. We compared our results with the networks inferred in that study based on GGM. Additionally, we generated

| | Lyapunov-based Approach | | Similarity-based GGM Approach (Enzymatic) | | Similarity-based GGM Approach (Intrinsic) | |
| | TPR (directed) | FPR | TPR (directed) | FPR | TPR (directed) | FPR |
| Brain | 1 | 0 | 0.64 | 0.34 | 0.44 | 0.03 |
| | 1 | 0 | 0.69 | 0.19 | 0.77 | 0.13 |
| | 0.85 | 0 | 0.66 | 0.16 | 0.61 | 0.08 |

Next, the sensitivity of our approach to noise in data was analyzed. To do so, we focused on one of the networks:

| Lyapunov-based Approach (0.5% standard dev.) | | | Similarity-based GGM Approach (Enzymatic Variation) | | | Similarity-based GGM Approach (Intrinsic Variation) | | |
| TPR | FPR | R_{Sp} | TPR | FPR | R_{Sp} | TPR | FPR | R_{Sp} |
| 0.73 | 0.11 | 0.51 | 0.60 | 0.15 | 0.39 | 0.71 | 0.21 | 0.47 |

As expected, and observed before

One important and relatively unexplored issue in the literature is how network inference approaches behave when data are missing. It may not be possible to obtain metabolomic measurements for every node in a metabolic network. We investigated the effect of missing data on the prediction capacity of our approach in comparison with similarity-based approaches.
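One simple way to emulate missing measurements in a simulation is to delete the rows and columns of the unmeasured metabolites from the covariance matrix before inference. The sketch below shows this step only. It is an assumption for illustration rather than a description of the exact procedure used in this study, and the metabolite names and the helper `drop_unmeasured` are hypothetical.

```python
def drop_unmeasured(cov, names, unmeasured):
    """Return the covariance submatrix and names of the measured metabolites."""
    keep = [k for k, name in enumerate(names) if name not in unmeasured]
    sub = [[cov[i][j] for j in keep] for i in keep]
    return sub, [names[k] for k in keep]

# Hypothetical 3-metabolite covariance matrix; metabolite "B" is not measured.
names = ["A", "B", "C"]
cov = [[1.0, 0.4, 0.2],
       [0.4, 2.0, 0.5],
       [0.2, 0.5, 3.0]]

sub_cov, sub_names = drop_unmeasured(cov, names, {"B"})
print(sub_names)  # ['A', 'C']
print(sub_cov)    # [[1.0, 0.2], [0.2, 3.0]]
```

Inference on the reduced matrix can at best recover links among the measured nodes, so neighbors of a removed node are expected to appear as directly connected.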

We have focused on

The inferred network included a link between 3-phosphoglycerate and phosphoenolpyruvate, and between Fructose-6-phosphate and Triose-phosphate as expected. The expected connection between triose-phosphate and phosphate was not recovered. This is probably due to the strength of the interaction between F16bP and phosphate in the original network: it was a relatively weak interaction. The inferred directed network has a TPR of 0.67 and an FPR of 0.08. When the prediction of directionality is not taken into account, TPR and FPR are 0.74 and 0.08 respectively. This corresponds to a g-score of 0.82. For comparison, ^{−11}) when interaction directions were considered, and 0.70 when interaction directions were not considered (p-value: 2×10^{−9}) by our approach. Spearman values of 0.32 and 0.44 by the similarity-based approaches were identified between undirected Jacobian strengths and GGM values.
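As a quick check of the reported numbers, the sketch below assumes that the g-score is the geometric mean of sensitivity and specificity, sqrt(TPR × (1 − FPR)). This definition is an assumption here, although it approximately reproduces the value quoted above for the undirected case.

```python
import math

def g_score(tpr, fpr):
    """Geometric mean of sensitivity (TPR) and specificity (1 - FPR)."""
    return math.sqrt(tpr * (1.0 - fpr))

undirected = g_score(0.74, 0.08)  # rates reported without directionality
directed = g_score(0.67, 0.08)    # rates reported with directionality

print(round(undirected, 3))  # 0.825
print(round(directed, 3))    # 0.785
```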

We have presented a theoretical analysis that justifies the use of sparsity as a cellular objective from the perspective of network inference. Additionally, the results imply that our approach is superior to the similarity-based approaches reported so far for metabolic network inference. The approach has three strengths: (i) high-quality inference of directed networks, (ii) no requirement for advanced and complicated experimental designs such as knock-outs, since data around steady state are sufficient, and (iii) recovery of interaction strengths between metabolite pairs. Moreover, the approach can readily be applied to the inference of other types of biological networks, such as gene-regulatory networks.

One should note that our Lyapunov-based approach has an equation system with n^{2} unknowns for an n-metabolite system. This may seem to be an obstacle to applying the method to metabolomic datasets with larger coverage. However, given the computational capacity now available through approaches such as cloud computing, this may not be a primary issue. Besides, we have shown that our approach can retain a high TPR and a low FPR when information is missing for some nodes. In this sense, our study has attempted to address the largely unexplored issue of missing data in metabolic network inference, also covering the performance of similarity-based approaches on this issue. Our next focus will be to improve the algorithm and show its applicability to larger metabolic networks. A further challenge will be to test the approach in terms of the required data characteristics, such as the number of replicates, for inferring structures from real metabolome data.
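The size of this equation system can be made concrete with a small sketch. Assuming the steady-state form J C + C J^T = -2 D (the symbols C for the covariance matrix and D for the fluctuation matrix are assumptions of this sketch), both sides are symmetric, so the system contributes at most n(n+1)/2 independent linear equations for the n^{2} unknown Jacobian entries. The covariance matrix and the helper `lyapunov_system` below are hypothetical.

```python
import numpy as np

def lyapunov_system(C):
    """Coefficient matrix A of the linear system A vec(J) = vec(-2 D).

    Unknowns are the Jacobian entries in row-major order; row (i, j)
    encodes (J C)[i, j] + (C J^T)[i, j].
    """
    n = C.shape[0]
    A = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            row = i * n + j
            for k in range(n):
                A[row, i * n + k] += C[k, j]  # term from (J C)[i, j]
                A[row, j * n + k] += C[i, k]  # term from (C J^T)[i, j]
    return A

# Hypothetical positive-definite covariance matrix for n = 3.
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])

n = C.shape[0]
unknowns = n * n
independent = int(np.linalg.matrix_rank(lyapunov_system(C)))
print(unknowns, independent)  # 9 unknowns, 6 independent equations
```

The gap between the two counts is what leaves room for an additional criterion such as sparsity to select among the Jacobian matrices that are consistent with the covariance data.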

Prof. Age Smilde (University of Amsterdam) is gratefully acknowledged for his invaluable contributions to the initial phase of the study, and for reading the manuscript.