
The authors have declared that no competing interests exist.

Conceived and designed the experiments: EN HSH RM RMTF. Performed the experiments: EN HSH. Analyzed the data: EN HSH. Contributed reagents/materials/analysis tools: EN HSH RM RMTF. Wrote the paper: EN HSH RM RMTF.

Standard Gibbs energies of reactions are increasingly being used in metabolic modeling for applying thermodynamic constraints on reaction rates, metabolite concentrations and kinetic parameters. The increasing scope and diversity of metabolic models has led scientists to look for genome-scale solutions that can estimate the standard Gibbs energy of all the reactions in metabolism. Group contribution methods greatly increase coverage, albeit at the price of decreased precision. We present here a way to combine the estimations of group contribution with the more accurate reactant contributions by decomposing each reaction into two parts and applying one of the methods on each of them. This method gives priority to the reactant contributions over group contributions while guaranteeing that all estimations will be consistent, i.e. will not violate the first law of thermodynamics. We show that there is a significant increase in the accuracy of our estimations compared to standard group contribution. Specifically, our cross-validation results show an 80% reduction in the median absolute residual for reactions that can be derived by reactant contributions only. We provide the full framework and source code for deriving estimates of standard reaction Gibbs energy, as well as confidence intervals, and believe this will facilitate the wide use of thermodynamic data for a better understanding of metabolism.

The metabolism of living organisms is a complex system with a large number of parameters and interactions. Nevertheless, it is governed by a strict set of rules that make it somewhat predictable and amenable to modeling. The laws of thermodynamics play a pivotal role by determining reaction feasibility and by governing the kinetics of enzymes. Here we introduce estimations for the standard Gibbs energy of reactions, with the best combination of accuracy and coverage to date. The estimations are derived using a new method which we denote

A living system, like any other physical system, obeys the laws of thermodynamics. In the context of metabolism, the laws of thermodynamics have been successfully applied in several modeling schemes to improve accuracy in predictions and eliminate infeasible functional states. For instance, several methodologies that reflect the constraints imposed by the second law of thermodynamics have been developed

The nearly ubiquitous method for experimentally obtaining thermodynamic parameters for biochemical reactions, specifically their standard transformed Gibbs energies

In 1957

In order to facilitate these

In the 50 years following Burton's work, several such tables of formation Gibbs energies have been published. Some of the most noteworthy are the table by R. Thauer

Quite coincidentally, a year after Burton published his thermodynamic tables, S. Benson and J. Buss

Group contribution methods were relatively successful in estimating the thermodynamic parameters of ideal gases

The coverage is calculated as the percentage of the relevant reactions in the KEGG database (i.e. reactions that have full chemical descriptions and are chemically balanced). The median absolute residual is calculated using leave-one-out cross-validation over the set of reactions that are within the scope of each method. Note that component contribution has a higher median absolute residual than RC only because it covers more reactions (for reactions covered by RC, the component contribution method gives exactly the same predictions). *The residual value for Alberty's method is not based on cross-validation, since it is the result of manual curation of multiple data sources, a process that we cannot readily repeat.

In this paper, we aim to unify GC and RC into a more general framework we call the Component Contribution method. We demonstrate that component contribution combines the accuracy of RC with the coverage of GC in a fully consistent manner. A plot comparing the component contribution method to other known methods is given in

The extensive use of formation Gibbs energies for calculating

For instance, Alberty's formation energy table ^{−1}, NAD(ox)^{−1}, FAD(ox)^{−2}, FMN(ox)^{−2} and seven other redox carriers). In most reactions which use these co-factors as substrates, the “zeros” will cancel out since one of the products will match it with a formation energy which is defined according to the same reference point (e.g. FAD(ox)^{−2} will be matched with FAD(red)^{−2} whose formation energy is

One way to deal with the problem of reference-point conflicts is to use either RC or GC exclusively for every reaction

The combined stoichiometry of (1) threonine aldolase, (2) acetaldehyde dehydrogenase (acetylating), (3) glycine C-acetyltransferase, and (4) threonine:NAD oxidoreductase creates a futile cycle where all the inputs and outputs are balanced. Using RC we are able to derive the

Reference-point conflicts and first-law violations can both be avoided by adjusting baseline formation energies of compounds with non-elemental reference points to match group contribution estimates. This approach was taken in ^{−2} and all other reference points in Alberty's table were set equal to their group contribution estimates. All formation energies that were determined relative to each reference point were then adjusted according to Alberty's table to maintain the same relative formation energies. The main disadvantage of this approach is that the set of reference points is fixed and limited to a few common cofactors. The coverage of reactant contributions could be increased by also defining less common metabolites as reference points, but listing them all in a static table would be impractical and inefficient.
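The first-law requirement discussed here can be illustrated with the four-reaction futile cycle named above. The sketch below is only an illustration under stated assumptions: the stoichiometries are written in their usual textbook form (protons omitted), the compound abbreviations are our own, and every formation energy is a hypothetical number chosen for the example rather than a measured or estimated value.

```python
from collections import defaultdict

# Reactions as {compound: coefficient}; negative = substrate, positive = product.
# Textbook stoichiometries with protons omitted (an assumption of this sketch).
REACTIONS = {
    "threonine aldolase": {"thr": -1, "gly": 1, "acald": 1},
    "acetaldehyde dehydrogenase (acetylating)": {
        "acald": -1, "coa": -1, "nad": -1, "accoa": 1, "nadh": 1},
    "glycine C-acetyltransferase": {"accoa": -1, "gly": -1, "coa": 1, "aob": 1},
    "threonine:NAD oxidoreductase": {"thr": -1, "nad": -1, "aob": 1, "nadh": 1},
}

# Reactions 1-3 run forward and reaction 4 in reverse to close the cycle.
CYCLE = {
    "threonine aldolase": 1,
    "acetaldehyde dehydrogenase (acetylating)": 1,
    "glycine C-acetyltransferase": 1,
    "threonine:NAD oxidoreductase": -1,
}

# Hypothetical formation energies in kJ/mol, for illustration only.
FORMATION = {"thr": -340.0, "gly": -180.0, "acald": -140.0, "coa": 0.0,
             "nad": 0.0, "accoa": -190.0, "nadh": 20.0, "aob": -300.0}


def net_stoichiometry(reactions, multipliers):
    """Sum the reactions with the given multipliers and drop zero entries."""
    net = defaultdict(float)
    for name, multiplier in multipliers.items():
        for compound, coeff in reactions[name].items():
            net[compound] += multiplier * coeff
    return {c: v for c, v in net.items() if abs(v) > 1e-12}


def reaction_energy(reaction, formation):
    """Reaction energy as the stoichiometry-weighted sum of formation energies."""
    return sum(coeff * formation[c] for c, coeff in reaction.items())


def cycle_energy(estimates, multipliers):
    """Total energy change around the cycle for per-reaction estimates."""
    return sum(m * estimates[name] for name, m in multipliers.items())


# Estimates derived from one shared set of formation energies are consistent.
consistent = {name: reaction_energy(r, FORMATION) for name, r in REACTIONS.items()}

# Pretend one reaction was estimated by a different method that is off by 5 kJ/mol.
mixed = dict(consistent)
mixed["threonine:NAD oxidoreductase"] += 5.0

print(net_stoichiometry(REACTIONS, CYCLE))  # empty: the cycle has no net conversion
print(cycle_energy(consistent, CYCLE))      # zero, as the first law requires
print(cycle_energy(mixed, CYCLE))           # nonzero: a first-law violation
```

Any set of estimates derived from a single table of formation energies sums to zero around the cycle, whatever the numbers in that table are. The violation appears only when estimates with different reference points are mixed.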

The component contribution method, which is described in detail in the following sections of this paper, manages to combine the estimates of RC and GC while avoiding any reference-point conflicts or first-law violations. In the component contribution framework, the maximal set of reference points given a set of measured reactions is automatically determined. We maintain the notion of prioritizing RC over GC, but rather than applying only one method exclusively per reaction, we split every reaction into two independent reactions. One of these sub-reactions can be evaluated using RC, while the other cannot – and thus its
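The split into two sub-reactions can be sketched as an orthogonal projection. The example below assumes that the measured reactions are given as columns of a stoichiometric matrix (called `s_columns` here, a name of our own) and that the RC-evaluable part of a reaction vector is its projection onto the span of those columns. It uses a plain Gram-Schmidt procedure for readability and is not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(columns, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis for the span of the columns."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip columns that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


def decompose(x, s_columns):
    """Split x into x_r (in the span of the measured reactions) and x_n (orthogonal to it)."""
    x_r = [0.0] * len(x)
    for q in orthonormal_basis(s_columns):
        c = dot(x, q)
        x_r = [xr + c * qi for xr, qi in zip(x_r, q)]
    x_n = [xi - xr for xi, xr in zip(x, x_r)]
    return x_r, x_n


# Toy system with compounds ordered (A, B, C) and one measured reaction A -> B.
S_COLUMNS = [[-1.0, 1.0, 0.0]]
# New reaction A -> C, which is not covered by the measurement alone.
X = [-1.0, 0.0, 1.0]

X_R, X_N = decompose(X, S_COLUMNS)
print(X_R)  # approximately [-0.5, 0.5, 0.0], i.e. half of the measured reaction
print(X_N)  # approximately [-0.5, -0.5, 1.0], the part left for group contribution
```

In this toy case half of the measured reaction A -> B is recovered as the RC part, and the remainder is orthogonal to every measured reaction, so only that remainder would need a group contribution estimate.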

The component contribution method integrates reactant contributions and group contributions in a single, unified framework using a layered linear regression technique. This technique enables maximum usage of the more accurate reactant contributions, and fills in missing information using group contributions in a fully consistent manner. The inputs to the component contribution method are the stoichiometric matrix of measured reactions, denoted

The regression model used in the reactant contribution method is based on the first law of thermodynamics (conservation of energy). The first law dictates that the overall standard Gibbs energy of a reaction that takes place in more than one step is the sum of the standard Gibbs energies of all the intermediate steps at the same conditions
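As a worked statement of this additivity, the relation can be written as follows. The symbols are our own shorthand and an assumption of this sketch: $x$ is an overall reaction, $s_i$ are its intermediate steps with multipliers $c_i$, and $x_j$ is the stoichiometric coefficient of compound $j$ in $x$.

```latex
x = \sum_i c_i \, s_i
\quad\Longrightarrow\quad
\Delta_r G^{\circ}(x) = \sum_i c_i \, \Delta_r G^{\circ}(s_i),
\qquad
\Delta_r G^{\circ}(x) = \sum_j x_j \, \Delta_f G^{\circ}_j
```

The second identity is the special case in which the steps are formation reactions. Because it is linear in the formation energies, it lends itself to a linear regression model.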

From

Least-squares linear regression on the system in

The standard Gibbs energy

The reactant contribution method can be used to evaluate standard Gibbs energies for

Increased reaction coverage can be achieved using the group contribution method, where each compound in

The reaction coverage of the group contribution method is much greater than that of the reactant contribution method in
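The idea behind group contribution can be illustrated with a toy isomerization. In the sketch below the group decompositions and all group energies are hypothetical values chosen only for the example; they are not taken from any published group table.

```python
# Hypothetical group decomposition: {compound: {group: count}}.
GROUPS = {
    "1-propanol": {"CH3": 1, "CH2": 2, "OH_primary": 1},
    "2-propanol": {"CH3": 2, "CH": 1, "OH_secondary": 1},
}

# Hypothetical group energies in kJ/mol, for illustration only.
GROUP_ENERGY = {"CH3": -15.0, "CH2": 8.0, "CH": 20.0,
                "OH_primary": -160.0, "OH_secondary": -165.0}


def formation_energy(compound):
    """Formation energy as the sum of the contributions of the compound's groups."""
    return sum(n * GROUP_ENERGY[g] for g, n in GROUPS[compound].items())


def reaction_group_vector(reaction):
    """Net change in the count of each group over the reaction."""
    net = {}
    for compound, coeff in reaction.items():
        for group, n in GROUPS[compound].items():
            net[group] = net.get(group, 0) + coeff * n
    return net


def reaction_energy_from_groups(reaction):
    """Reaction energy computed directly from the net group changes."""
    return sum(n * GROUP_ENERGY[g] for g, n in reaction_group_vector(reaction).items())


# Isomerization 1-propanol -> 2-propanol (negative = substrate, positive = product).
REACTION = {"1-propanol": -1, "2-propanol": 1}

via_formation = formation_energy("2-propanol") - formation_energy("1-propanol")
via_groups = reaction_energy_from_groups(REACTION)
print(via_formation, via_groups)  # both routes give the same value
```

Both routes agree because the model is linear. Groups that appear unchanged on both sides of a reaction cancel, so only the groups that are created or destroyed affect the estimate.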

The reactant contribution method covers any vector in the range of

(A) The reaction vector

The component

A common example where

In order to evaluate the improvement in estimations derived using component contribution compared to an implementation of group contribution

The CDF of the absolute-value residuals for both group contribution (

Our results show a significant improvement for component contribution compared to group contribution when focusing on reactions in the range of

In each iteration of the cross-validation, one reaction was excluded from the training set. To further validate the component contribution method, we used the results of each iteration to predict independent observations of the reaction that was excluded. All available observations of that reaction were then compared against the prediction intervals for its standard Gibbs energy (see section

A major application of the component contribution method is estimation of standard Gibbs energies for reactions in genome-scale reconstructions. Such large reaction networks require consistent and reliable estimates with high coverage. If estimates are not consistent, the risk of reference point violations increases with network size. As discussed in section

Here, we apply von Bertalanffy 2.0 to two reconstructions: the

| Compartment | pH | Electrical potential (mV) |
| --- | --- | --- |
| Cytosol | 7.70 | 0 |
| Periplasm | 7.70 | 90 |
| Extracellular fluid | 7.70 | 90 |

| Compartment | pH | Electrical potential (mV) |
| --- | --- | --- |
| Cytosol | 7.20 | 0 |
| Extracellular fluid | 7.40 | 30 |
| Golgi apparatus | 6.35 | 0 |
| Lysosomes | 5.50 | 19 |
| Mitochondria | 8.00 | −155 |
| Nucleus | 7.20 | 0 |
| Endoplasmic reticulum | 7.20 | 0 |
| Peroxisomes | 7.00 | 12 |

von Bertalanffy 2.0 relies on component contribution estimated standard reaction Gibbs energies, whereas older versions relied on a combination of experimental data and group contribution estimates.

| | iAF1260, Fleming et al. | iAF1260, current study | Recon 1, Haraldsdóttir et al. | Recon 1, current study |
| --- | --- | --- | --- | --- |
| Coverage | 85% | 90% | 63% | 72% |
| RMSE (kJ/mol) | 9.9 | 2.7 | 11.6 | 3.1 |
| Mean | 20.3 | 2.3 | 3.4 | 2.2 |

Another improvement achieved with the component contribution method was the lower standard error,

The lower RMSE achieved with component contribution stems primarily from two factors. The first is the normalization of the training data by the inverse Legendre transform, which in

in iAF1260 (

The component contribution method presented in this paper merges two established methods for calculating standard Gibbs energies of reactions while maintaining the advantages of each: the accuracy of reactant contribution (RC) and the wide coverage of group contribution (GC). By representing every reaction as a sum of two complementary component reactions, one in the subspace that is completely covered by RC and the other in the complementary space, we maximize the usage of information that can be obtained with the more accurate RC method. Overall, we find that there is a 50% reduction in the median absolute residual compared to standard GC methods, while providing the same wide coverage and ensuring that there are no reference-point inconsistencies that otherwise lead to large errors. Furthermore, since our method is based on least-squares linear regression, we use standard practices for calculating confidence intervals for standard Gibbs energies (see section

Since the empirical data used in our method are measured under various conditions (temperature, pH, ionic strength, metal ion concentrations, etc.), it is important to “standardize” the input data before applying any linear regression model

The precision of the component contribution method is limited by the accuracy of the measured reaction equilibrium constants used in the regression model. In cases of isolated reactions, where the empirical data cannot be corroborated by overlapping measurements, large errors will be directly propagated to our estimate of those reactions' standard Gibbs energies. As the number of measurements underlying an estimate is reflected in its standard error, however, confidence intervals for such reactions will be large. It is therefore recommended to use confidence intervals, and not point estimates, for simulations and predictions based on standard Gibbs energy estimates. In the future, it might be worthwhile to integrate several promising computational prediction approaches
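The recommendation to work with intervals rather than point estimates can be sketched as follows. This example assumes a normal approximation for the interval, which is our simplification and not necessarily the procedure used in the supporting text. The two estimates and their standard errors are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Two-sided interval under a normal approximation (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical estimates in kJ/mol: same point estimate, different support.
well_supported = confidence_interval(-20.0, 2.0)   # many overlapping measurements
isolated = confidence_interval(-20.0, 15.0)        # a single uncorroborated measurement

print(well_supported)  # narrow interval, entirely below zero
print(isolated)        # wide interval that includes zero
```

Both reactions share the same point estimate. Only the interval shows that the sign of the second one is not determined by the data, which is exactly the information a point estimate hides.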

The use of thermodynamic parameters in modeling living systems has been hindered by the fact that they are mostly inaccessible or require a high level of expertise to use correctly, especially in genome-scale models. In order to alleviate this limitation, we created a framework that facilitates the integration of standard reaction Gibbs energies into existing models and also embedded our code into the openCOBRA toolbox. The entire framework (including the source code and training data) is freely available. We envisage a collaborative community effort that will result in a simple and streamlined process where these important thermodynamic data are widely used and where future improvements in estimation methods will seamlessly propagate to modelers.

The component contribution estimated standard Gibbs energy

The covariance matrix ^{2}. The covariance matrix ^{2}.

For a reaction

In calculating

The covariance matrices can likewise be used to propagate lack of knowledge to
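One textbook way to carry a covariance matrix of compound-level estimates over to a reaction is the quadratic form of the reaction vector with that matrix. The sketch below assumes this standard rule and uses hypothetical numbers. It is not confirmed to be the exact formula used in this work.

```python
import math


def reaction_standard_error(x, cov):
    """Standard error of a linear combination x of estimates with covariance cov."""
    n = len(x)
    variance = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical covariance (kJ/mol)^2 for two compounds with strongly correlated errors.
COV = [[4.0, 3.5],
       [3.5, 4.0]]
# Reaction converting compound 0 into compound 1.
X = [-1.0, 1.0]

with_covariance = reaction_standard_error(X, COV)

# Ignoring the off-diagonal terms treats the two errors as independent.
DIAGONAL_ONLY = [[4.0, 0.0],
                 [0.0, 4.0]]
without_covariance = reaction_standard_error(X, DIAGONAL_ONLY)

print(with_covariance)     # correlated errors largely cancel in the difference
print(without_covariance)  # larger when the correlation is ignored
```

When the errors of a substrate and its product are positively correlated, much of the uncertainty cancels in the reaction. Keeping the full covariance matrix, rather than only per-compound variances, therefore gives tighter and more realistic reaction-level errors in this example.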

Both group contribution and component contribution are parametric methods that use a set of training data in order to evaluate a long list of parameters. In order to validate these models, we need additional empirical data that were not used in the training phase. Since data regarding reaction Gibbs energies are scarce, we apply the leave-one-out method in order to maximize the amount of data left for training in each cross-validation iteration. As a measure for the quality of the standard Gibbs energy estimations from each method we use the median absolute residual of the cross-validation results compared to the observations.
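The validation procedure described above can be sketched on a toy problem. The example assumes a one-parameter linear model fitted by least squares in place of the real regression, and the data points are invented for illustration.

```python
from statistics import median


def fit_slope(xs, ys):
    """Least-squares slope for the one-parameter model y = slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def leave_one_out_residuals(xs, ys):
    """Absolute residual of each point when predicted from all the other points."""
    residuals = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope = fit_slope(train_x, train_y)
        residuals.append(abs(ys[i] - slope * xs[i]))
    return residuals


# Invented observations, for illustration only.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [2.0, 4.0, 6.0, 9.0]

RESIDUALS = leave_one_out_residuals(XS, YS)
MEDIAN_ABS_RESIDUAL = median(RESIDUALS)
print(RESIDUALS)
print(MEDIAN_ABS_RESIDUAL)
```

Each point is predicted by a model that never saw it, so the residuals measure predictive rather than fitted accuracy. The median is less sensitive than the mean to a few badly predicted points, such as the last observation here.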

Our entire training set consists of 4146 distinct reaction measurements. However, since many of them are experimental replicates – measurements of the same chemical reaction in different conditions or by different researchers – we can only use each distinct reaction once. We thus take the median

The

For an input reaction

We estimated

We estimated

The component contribution method has been implemented in both Matlab and Python. The Matlab implementation is tailored towards application to genome-scale metabolic reconstructions. It is fully compatible with the COBRA toolbox

Supporting text with sections on 1) the inverse Legendre transform of the training data, 2) group decomposition, 3) the full mathematical derivation of the component contribution method, 4) estimation of error in the group model, 5) reaction type statistics, 6) prediction of flux distributions, 7) the theory underlying calculation of confidence and prediction intervals, and 8) mathematical symbols used throughout the manuscript.

(PDF)

We thank Arren Bar-Even, Wolfram Liebermeister, Naama Tepper, Tomer Shlomi, Bastian Niebel, Steinn Gudmundsson, Adrian Jinich, Dmitrij Rappoport, and William R. Cannon for helpful discussions.