
The authors have declared that no competing interests exist.

Analyzed the data: AVK SK. Wrote the paper: AVK SK. Designed the study: SK. Implementation: AVK.

One ultimate goal of metabolic network modeling is the rational redesign of biochemical networks to optimize the production of certain compounds by cellular systems. Although several constraint-based optimization techniques have been developed for this purpose, methods for systematic enumeration of intervention strategies in genome-scale metabolic networks are still lacking. In principle, Minimal Cut Sets (MCSs; inclusion-minimal combinations of reaction or gene deletions that lead to the fulfilment of a given intervention goal) provide an exhaustive enumeration approach. However, their disadvantages are the combinatorial explosion in larger networks and the requirement to first compute the elementary modes (EMs), which is itself impractical in genome-scale networks.

We present MCSEnumerator, a new method for effective enumeration of the smallest MCSs (with fewest interventions) in genome-scale metabolic network models. For this we combine two approaches, namely (i) the mapping of MCSs to EMs in a dual network, and (ii) a modified algorithm by which shortest EMs can be effectively determined in large networks. In this way, we can identify the smallest MCSs by calculating the shortest EMs in the dual network. Realistic application examples demonstrate that our algorithm is able to list thousands of the most efficient intervention strategies in genome-scale networks for various intervention problems. For instance, for the first time we could enumerate all synthetic lethals in

Mathematical modeling has become an essential tool for investigating metabolic networks. One ultimate goal of metabolic network modeling is the rational redesign of biochemical networks to optimize the production of certain compounds by cellular systems. Accordingly, several optimization techniques have been proposed for this purpose. However, for large-scale networks, an effective method for systematic enumeration of the most efficient intervention strategies is still lacking. Herein we present MCSEnumerator, a new mathematical approach by which thousands of the smallest intervention strategies (with fewest targets) can be readily computed in large-scale metabolic models. Our approach is built upon an extended concept of Minimal Cut Sets, the latter being minimal (irreducible) combinations of reaction (or gene) deletions that will lead to the fulfilment of a given intervention goal. The strength of the presented approach is that smallest intervention strategies can be quickly calculated with neither network size nor the number of required interventions posing major challenges. Realistic application examples with

Stoichiometric and constraint-based modeling techniques such as flux balance analysis or elementary modes analysis have become standard tools for the mathematical and computational investigation of metabolic networks

Metabolic networks consisting of _{i}_{i}

One ultimate goal of metabolic network modeling is the targeted manipulation of the network behavior. A typical application is metabolic engineering where one is interested in the optimization of the production of a certain compound by a given host organism. A number of constraint-based optimization techniques have been proposed for this purpose

The method of Minimal Cut Sets (MCSs) directly addresses the enumeration of metabolic intervention strategies

Another approach to compute MCSs, which exploits the inherent dual relationship between EMs and MCSs, was recently presented by Ballerstein et al.

However, there are two potential problems related to MCSs. First, when the reactions contained in an MCS are removed, we are sure that the targeted network functions are disabled but other (desired) functions might be blocked as well. For instance, it can occur that an MCS which disables low-yield pathways synthesizing a desired product also blocks growth of the organism making this MCS impractical. To prevent such side effects, the concept of

The second and more serious problem of (c)MCSs is that their full enumeration in large/genome-scale networks becomes prohibitive. The algorithms requiring as inputs the target (and possibly desired) EMs are usually not applicable: despite large progress in algorithmic design

On the other hand, for the purpose of applying MCSs in real networks, those with the smallest number of elements are usually the most relevant. Thus, it is worthwhile to consider computing only the (c)MCSs with low cardinality. The effective enumeration of the smallest cut sets is therefore the key goal of the present work.

Usually, the unwanted/desired functionalities to be disabled/kept in a metabolic network can be described by sets of linear equalities and inequalities over the fluxes. For the purpose of computing MCSs, we could therefore use an exhaustive FBA-based scheme that tests all single, double, triple and higher-order knockout sets for whether they are suitable cut sets. Formulating FBA problems would avoid having to enumerate the EMs first. However, as discussed above, this approach becomes problematic if larger knockout sets are required to solve an intervention problem, as the number of candidate sets to be tested increases rapidly with MCS size (the number of candidates grows with
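To give a feeling for this growth, the following minimal sketch counts the candidate knockout sets that a brute-force scheme would have to test for each set size. The function name is hypothetical; the value 845 is the number of knockable reaction subsets reported later in this paper for the compressed iAF1260 network, and one LP per candidate is assumed.

```python
from math import comb


def candidate_knockout_sets(n_reactions: int, max_size: int) -> dict[int, int]:
    """Number of candidate knockout sets of each size 1..max_size.

    A brute-force scheme would have to solve one LP per candidate set.
    """
    return {k: comb(n_reactions, k) for k in range(1, max_size + 1)}


# 845 knockable reaction subsets (compressed iAF1260 model, see below).
counts = candidate_knockout_sets(845, 7)
for size, n_sets in counts.items():
    print(f"size {size}: {n_sets:.3e} candidate sets")
```

Already for seven simultaneous knockouts the number of candidates exceeds 10^16, which makes exhaustive testing infeasible regardless of how fast a single LP can be solved.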

Whereas the direct calculation of smallest MCSs in large-scale networks cannot be properly addressed yet by current methods, a method for computing the smallest (or shortest) EMs in genome-scale networks was recently presented by de Figueiredo et al.

The goal of the present work is to realize a similar approach for computing the

The paper is organized as follows: we will first briefly review the approach of de Figueiredo et al. for computing

Thereafter we will describe how the network constraints (including inhomogeneous constraints) and the intervention goal have to be translated into their dual description in which we can then enumerate the shortest EMs to obtain the smallest MCSs in the primal network. We shall also explain how

For the sake of simplicity, throughout the manuscript we will deal with reaction cut (or knockout) sets, which must in practice be translated to gene knockout sets to construct the corresponding mutants. This transformation can be easily achieved if the corresponding gene-enzyme-reaction associations are available. The latter could also directly be included in the problem formulations given below to compute gene (instead of reaction) cut sets.

Both elementary modes and minimal cut sets can be represented as sets of reactions (sets of active reactions in case of EMs and sets of deleted reactions in case of MCSs). Since we are mainly interested in the composition and size (cardinality) of EMs and MCSs it is important to represent them efficiently in the MILP problem to be formulated. Here we will make use of _{i}_{i}_{i}
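As an illustration of this set representation, the sketch below stores EMs as sets of active reactions and enumerates MCSs as inclusion-minimal hitting sets of the target EMs, in order of increasing cardinality. This is the classical EM-based route and not the MILP approach developed in this paper; the toy modes and all names are hypothetical.

```python
from itertools import combinations


def minimal_cut_sets(target_modes, max_size):
    """Enumerate inclusion-minimal hitting sets of the target modes.

    Candidates are visited by increasing size, so every hitting set that
    is not a superset of an earlier result is guaranteed to be minimal.
    """
    reactions = sorted(set().union(*target_modes))
    found = []
    for size in range(1, max_size + 1):
        for candidate in combinations(reactions, size):
            cut = frozenset(candidate)
            if any(known <= cut for known in found):
                continue  # superset of a known MCS, hence not minimal
            if all(cut & mode for mode in target_modes):
                found.append(cut)  # hits every target mode
    return found


# Hypothetical toy example: three target EMs over reactions R1..R4.
toy_modes = [
    frozenset({"R1", "R2"}),
    frozenset({"R1", "R3"}),
    frozenset({"R2", "R3", "R4"}),
]
mcs = minimal_cut_sets(toy_modes, max_size=3)
```

The brute-force loop is only meant to make the definitions concrete; it scales as poorly as the exhaustive knockout tests discussed above.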

The functionality of indicator variables is often implemented by a “big M” formulation with integer variables (cf. _{i}
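For orientation, one common textbook form of such a big-M coupling is sketched here for an irreversible reaction; it is not necessarily the exact formulation referred to above. The symbols $r_i$ (reaction rate), $z_i$ (binary variable) and $M$ (a sufficiently large constant) are assumptions made for this sketch.

```latex
\begin{align}
  z_i \in \{0,1\}, \qquad z_i \;\le\; r_i \;\le\; M\, z_i .
\end{align}
```

With $z_i = 0$ the rate is forced to $r_i = 0$, whereas $z_i = 1$ enforces $1 \le r_i \le M$. In a homogeneous system the lower bound of 1 does not restrict generality because flux vectors can be scaled, but $M$ must be chosen large enough not to exclude valid solutions.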

We now rephrase the MILP problem presented in _{i}

The actual enumeration of the _{i}

The last step in the loop remains to be explained: the addition of exclusion constraints to the MILP, which make sure that duplicates or supersets of already identified EMs will not be returned as solutions by subsequent _{i}

In this subsection, we propose a modified scheme for enumeration of shortest EMs. We first introduce an additional size control constraint

We restate that the exclusion constraints (11) are needed to prevent supersets of known EMs from being erroneously identified as EMs. If all EMs up to size _{i}
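The effect of such exclusion constraints can be sketched as follows, assuming the usual integer-cut form in which the binary variables on the support of a known EM must sum to at most the support size minus one. Constraint (11) itself is not reproduced here, and the function names are hypothetical.

```python
def exclusion_constraint(support, n_vars):
    """Return (coeffs, rhs) encoding sum_{i in support} z_i <= |support| - 1."""
    coeffs = [1 if i in support else 0 for i in range(n_vars)]
    return coeffs, len(support) - 1


def satisfies(z, constraints):
    """Check a binary vector z against all exclusion constraints."""
    return all(
        sum(c * z_i for c, z_i in zip(coeffs, z)) <= rhs
        for coeffs, rhs in constraints
    )


# A known EM using reactions 0 and 2 in a network with four reactions.
constraints = [exclusion_constraint({0, 2}, n_vars=4)]

duplicate = (1, 0, 1, 0)   # the same support again
superset = (1, 1, 1, 0)    # a strict superset of the known support
other = (1, 1, 0, 0)       # shares only one reaction with the known EM
```

Both the duplicate and the superset violate the constraint, while a support that merely overlaps with the known EM remains feasible.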

Importantly, when all EMs of size


The search that is conducted during a

We present now the key methodological development of this work showing that the basic algorithm for enumerating shortest EMs introduced in the previous section can also be used to compute smallest MCSs. The procedure is based on the duality properties of EMs and MCSs presented by Ballerstein et al.

Constraint (13) specifying the target flux vectors can be generalized to:

In addition to (14) and to the standard network constraints (1) and (2), Ballerstein et al. augmented the system by equality constraints setting all reaction rates to zero

We thus need to transform the primal system defined by (1), (2), (14), (15) into its dual which can be written as follows (cf. _{dual}_{dual}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}

After dualization, we can now compute the smallest MCSs of the primal system by applying algorithm ALGO2 in the dual system. As constraints we need to consider (17) (replacing (1) and (2) from the primal system) as well as (18) and as objective function we exchange (10) with_{i}_{i}_{i}

The MCSs of the primal network are eventually obtained by taking the _{i}_{i}_{i}_{i}_{i}

In the previous subsection we dealt with the enumeration of smallest MCSs; however, we have not yet clarified how

The MCSEnumerator method has been integrated as a new functionality in the

We analyze basic properties of the runtime behavior of our algorithm by means of three realistic benchmark problems with different complexities. All computations were performed with the CPLEX 12.4 MILP solver. When multiple threads were used, CPLEX was run in deterministic parallel mode to obtain repeatable behavior. The search tree that CPLEX dynamically constructs took up less than 3 GB of RAM for all the systems used here.

In order to compare our MILP-based MCS enumeration scheme to other approaches the same benchmark problems as in

Problem: substrate | Problem: threads | Problem: size limit | Problem: MCS number | MCSs via EMs from original network: compute first EFMs, then minimal hitting sets | MCSs as EMs of the dual system: method of Ballerstein et al. | MCSs as EMs of the dual system: ALGO1 (iterative MILP) | MCSs as EMs of the dual system: ALGO2 (MILP populate with fixed EM sizes) |
acetate | 1 | - | 309 | 0.6 s | 0.4 s | 19 s | 3 s |
succinate | 1 | - | 1623 | 6.4 s | 7.0 s | 499 s | 32 s |
glycerol | 1 | - | 3733 | 36.8 s | 37.4 s | 61.4 min | 3.5 min |
glucose | 1 | - | 4960 | 181.3 s | 188.7 s | 356.6 min | 21.2 min |
glucose | 1 | 4 | 423 | 43.9 s | not possible | 92.5 s | 4.2 s |
glucose | 4 | - | 4960 | unsupported | unsupported | 228 min (698.0 min) | 18.5 min (56.8 min) |
glucose | 12 | - | 4960 | unsupported | unsupported | 62.9 min (633.4 min) | 5.6 min (58.3 min) |

The number of calculated MCSs and computation times are shown in

Although the MILP algorithm presented herein was actually developed to compute the

The main advantage of our new approach can be seen in the case where only the MCSs up to size 4 have to be calculated (fifth row in

As described in the

To demonstrate this, we use the ^{min}^{max}^{−1}

MCS size | number of MCSs | physical time with MCSEnumerator | physical time with SL Finder |

1 | 277 | 11.1 s | [included] |

2 | 96 | 39.1 s | 91 min |

3 | 247 | 16.8 min | >75.5 h |

4 | 402 | 18.5 h | n/a |

5 | 1464 | 410.4 h | n/a |

With SL Finder, only 226 synthetic triple lethals (all of which are contained in the MCSs found by MCSEnumerator) could be calculated, after which the optimization could not be continued due to numerical problems reported by the solver.

We also tested the homogeneous version of the above intervention problem, that is, we calculated the MCSs blocking growth without the additional constraints for ATP maintenance (

The following third example relates to a typical problem of finding rational intervention strategies for metabolic engineering purposes. We here focus on a biotechnologically relevant application, namely to let

We used again the iAF1260 genome-scale network model of

With these values in mind, we formulated the following intervention goal: the task is to identify cMCSs that guarantee a minimal ethanol yield of ^{min}^{−1}

With these inhomogeneous constraints we can now specify the target flux polyhedron containing all undesired network behaviors to be eliminated by the cMCSs:_{Eth/Glc}

With these values, several linear programs were run in a preprocessing step to explore network capabilities. For ^{−1}^{−1}^{−1}^{−1}

We then computed the cMCSs. As described in the _{i}_{i}

After network compression, the (primal) network could be reduced to 562 metabolites and 958 (lumped) reaction subsets of which 845 can be knocked out.

Note also that disrupting glucose uptake or ATP maintenance yields valid MCSs that delete all undesired behaviors, but these violate the desired functionality for trivial reasons (growth is not possible) and thus cannot be contained in any valid cMCS. Such reactions, being essential for the desired flux space, could also be identified at an early stage and then be excluded from the search space.

Scen. | | min. ethanol yield | # MCSs | # cMCSs | cMCSs of size 3 | cMCSs of size 4 | cMCSs of size 5 | cMCSs of size 6 | cMCSs of size 7 | runtime [h] phys./CPU |
1 | 10 | 1.4 | 185302 | 8342 | 8 | 46 | 283 | 1309 | 6696 | 20.5/207.6 |
2 | 10 | 1.8 | 153338 | 1987 | 0 | 0 | 77 | 317 | 1593 | 13.8/136.8 |
3 | 18.5 | 1.4 | 156477 | 8819 | 2 | 98 | 533 | 1737 | 6449 | 16.6/166.4 |
4 | 18.5 | 1.8 | 138675 | 4618 | 2 | 70 | 509 | 917 | 3120 | 20.9/212.2 |

As can be seen in

We then analyzed the cMCSs in more detail. A first observation in

The situation is different in the case of increasing

The graphic indicates the found cMCSs requiring only three knockouts (

The cMCSs for scenario 1 (the red cut, one of the two blue cuts and one of the four dark green cuts in

The fact that three reaction or gene knockouts may suffice to induce a high ethanol yield of more than 1.8 (scenario 4) is surprising in its own right. Previous work on computing intervention strategies for ethanol overproduction in a smaller (core) network of

We mention here that two other intervention strategies with three knockouts for production of ethanol by

Having exhaustively enumerated the cMCSs up to a given size enables one to analyze essential features and performance measures of all found intervention strategies by which eventually the optimal knockout strategy can be selected. ^{−1}^{−1}

A: Minimal (guaranteed) ethanol yield (under maximal substrate uptake rate) vs. maximal possible growth rate for each cMCS of scenario 3 in

Other performance measures of designed mutant strains can be studied as well. One such proposed measure is substrate-specific productivity (SSP) which is the product of the growth-rate and the product yield
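As a minimal sketch of how such a measure could be used to rank candidate designs, the snippet below computes SSP as the product of growth rate and product yield. The design names and numerical values are purely hypothetical and are not results of this study.

```python
def substrate_specific_productivity(growth_rate: float, product_yield: float) -> float:
    """SSP as the product of growth rate and product yield."""
    return growth_rate * product_yield


# Hypothetical designs: name -> (max. growth rate, guaranteed product yield).
designs = {
    "cMCS_A": (0.10, 1.85),
    "cMCS_B": (0.25, 1.45),
}

# Rank the designs by decreasing SSP.
ranked = sorted(
    designs,
    key=lambda name: substrate_specific_productivity(*designs[name]),
    reverse=True,
)
```

In this toy case the design with the lower yield but higher growth rate ranks first, which illustrates why yield alone may not be sufficient for selecting a knockout strategy.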

As a technical note, it is not strictly necessary to have all MCSs (up to a maximal size) enumerated before running the LP checks that test the “survival” of some desired flux vectors: these checks could be performed independently as soon as an MCS has been found by the MILP solver. In fact, it is in principle possible to integrate the LP into the MILP so that the cMCSs are computed directly, which offers the advantage that far fewer exclusion constraints need to be integrated while the enumeration proceeds. In practice, however, this approach showed a markedly inferior performance for the system studied here. One reason is that the LP adds further degrees of freedom to the solution space and leads to redundant solutions for the cMCSs, which requires a more intricate control of the

To summarize the results of this sub-problem, our algorithm enabled the enumeration of all reaction knock-out sets up to size 7 that lead to coupled ethanol and biomass synthesis in

If more computational capacity is available, one might try to find even larger cMCSs. However, the best knockout strategy to be implemented is likely to be contained among the up to 8819 smallest cMCSs found, as the number of required interventions will be one (though not the only) key criterion when deciding on a concrete strain design.

One large-scale study to evaluate the growth-coupled production potential in ^{−1}

Here we wanted to test the potential of our method for some of the intervention problems. We focused on the aerobic production of either fumarate or serine from glucose, both of which have a potential for high yield as calculated by FBA. However, growth-coupled strains for the production of fumarate only achieved 20% (5 knockouts, OptKnock) and 23% (7 knockouts, OptGene) of the theoretical maximum, respectively, while for serine no growth-coupled strains could be identified in

To demonstrate the power of our method in dealing with large-scale systems, we increased the search space drastically compared to

For fumarate production, the MCSs up to size 7 were calculated (taking 13.6 h) from which 30 cMCSs (all of size 7) could be extracted. Applying those cMCSs would result in production strains exhibiting – at maximal substrate uptake rates – a guaranteed (minimal) fumarate yield between 0.71 and 0.89 corresponding to minimal production rates between 40.9% and 51.3% of the theoretical maximum of 34.68

In this work we presented MCSEnumerator, a new algorithmic approach to enumerate the smallest (c)MCSs up to a given size in genome-scale networks. This approach is based on a MILP problem calculating the shortest EMs in the dual representation of the metabolic network eventually yielding the smallest cMCSs. The whole procedure can be summarized by five steps:

1. Build the metabolic network as usual by specifying the stoichiometric matrix and the irreversibility constraints (

2. Define the space of undesired (target) flux vectors and (optionally) the space of desired flux vectors by means of the linear inequalities (14) and (21), respectively. The (c)MCSs to be computed will ensure that no target flux vector can operate whereas the operation of at least one desired flux vector will be feasible.

3. Build the dual system which is immediately given by (17). Introduce indicator (or binary) variables (

4. Enumerate the

5. Translate the EMs found in the dual to MCSs in the primal. If desired behaviors were specified in step 2, run one LP for each MCS to check whether it is a constrained MCS, i.e., whether some desired flux distributions remain feasible after cutting the reactions contained in the MCS.
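The filtering logic of the last step can be sketched on a toy example. Note that this sketch replaces the LP over the desired flux space by a check against an explicit list of desired modes, which is only practical in small networks; all names and the toy data are hypothetical.

```python
def surviving_modes(cut, desired_modes):
    """Desired modes that use none of the reactions removed by the cut."""
    return [mode for mode in desired_modes if not (cut & mode)]


def constrained_mcs(mcs_list, desired_modes, min_surviving=1):
    """Keep only the MCSs that leave enough desired modes intact."""
    return [
        cut
        for cut in mcs_list
        if len(surviving_modes(cut, desired_modes)) >= min_surviving
    ]


# Hypothetical MCSs (e.g. obtained from the dual) and desired modes.
mcs_list = [
    frozenset({"R1", "R2"}),
    frozenset({"R1", "R3"}),
    frozenset({"R1", "R4"}),
    frozenset({"R2", "R3"}),
]
desired = [frozenset({"R2", "R5"}), frozenset({"R3", "R5"})]

cmcs = constrained_mcs(mcs_list, desired)
```

Here the cut {R2, R3} is discarded because it destroys both desired modes, while the other three MCSs each leave at least one desired mode operational. Each MCS is checked independently of the others.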

With these five steps, MCSEnumerator provides a generic approach for enumerating smallest intervention strategies; one just has to plug in the corresponding matrices in

Apart from the combination of dualization and shortest EM calculation in step 3, another key development made herein is the improvement of the required sub-routine for computing shortest EMs (ALGO2) which is now based on a more efficient enumeration of feasible EMs with fixed size and which consequently makes use of available enumeration features of modern MILP solvers. Appropriate integration of such functionalities could also be useful to effectively solve other enumeration problems in the field.

Despite the fact that calculation of

The main drawback of using a MILP stems from the fact that constraints have to be continuously added to remove already found MCSs and their supersets from the solution space. Hence, this method is bound to slow down with an increasing number of constraints, which explains the inferior performance when computing

The algorithmic advantage of the presented approach lies thus in the possibility to quickly (compared to other approaches) calculate the smallest (c)MCSs with neither network size nor the number of elements in the (c)MCSs posing major challenges. With these results and due to the fact that the approach of (c)MCSs allows the setup of complex intervention problems in a flexible and convenient way, we expect that a large number of metabolic network studies can benefit from our conceived framework.

An interesting aspect for future work will be to investigate how far ALGO2 (the sub-routine used for shortest EM calculation) can be generalized to enumerate also other elementary sets arising in different contexts of computational biology (e.g., for calculating minimal intervention sets in signaling or regulatory networks

We are grateful to Oliver Hädicke for valuable hints and discussions.