Conceived and designed the experiments: JK JLR CTM. Performed the experiments: JK. Analyzed the data: JK JLR CTM. Wrote the paper: JK JLR CTM.
The authors have declared that no competing interests exist.
The use of computational models in metabolic engineering has been increasing as more genome-scale metabolic models and computational approaches become available. Various computational approaches have been developed to predict how genetic perturbations affect metabolic behavior at a systems level, and have been successfully used to engineer microbial strains with improved primary or secondary metabolite production. However, identification of metabolic engineering strategies involving a large number of perturbations is currently limited by computational resources due to the size of genome-scale models and the combinatorial nature of the problem. In this study, we present (i) two new bilevel strain design approaches using mixed-integer programming (MIP), and (ii) general solution techniques that improve the performance of MIP-based bilevel approaches. The first approach (SimOptStrain) simultaneously considers gene deletion and non-native reaction addition, while the second approach (BiMOMA) uses minimization of metabolic adjustment (MOMA) to predict knockout behavior in a MIP-based bilevel problem for the first time. Our general MIP solution techniques significantly reduced the CPU times needed to find optimal strategies when applied to an existing strain design approach (OptORF) (e.g., from ∼10 days to ∼5 minutes for metabolic engineering strategies with 4 gene deletions), and identified strategies for producing compounds where previous studies could not (e.g., malate and serine). Additionally, we found novel strategies using SimOptStrain with higher predicted production levels (for succinate and glycerol) than could have been found using an existing approach that considers network additions and deletions in sequential steps rather than simultaneously. Finally, using BiMOMA we found novel strategies involving large numbers of modifications (for pyruvate and glutamate), which sequential search and genetic algorithms were unable to find.
The approaches and solution techniques developed here will facilitate the strain design process and extend the scope of its application to metabolic engineering.
Metabolic engineering of microbial strains has been of great interest for producing a wide variety of chemicals including biofuels, polymer precursors, and drugs. While conventional metabolic engineering approaches often focus on modifications to the desired and neighboring pathways, recent developments in computational analysis of metabolic models allow identification of genetic modifications needed to improve production of biochemicals
To avoid this computational challenge, approaches like flux balance analysis (FBA)
A number of bilevel strain design approaches use mixed-integer programming (MIP) to efficiently identify the mutations needed to achieve the highest production rates, including OptKnock, OptStrain, OptReg, OptForce, and OptORF. These bilevel MIP approaches consist of an ‘outer’ problem and an ‘inner’ problem, where the outer problem optimizes an engineering objective function and the inner problem optimizes a cellular objective function. Frequently, the inner problem is FBA, which is a linear programming (LP) problem. In MIP-based approaches, the inner FBA problem is converted into optimality constraints by formulating a dual LP of FBA and enforcing strong duality
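As a sketch of this reformulation, consider a simplified FBA problem with stoichiometric matrix S, flux vector v, objective weights c, and upper bounds u, with all lower bounds set to zero. This notation is an illustrative assumption and is not necessarily the formulation used in the appendix. The inner LP, its dual, and the strong duality condition can then be written as:

```latex
\begin{align}
\text{(primal)}\quad & \max_{v}\; c^{\top} v
  && \text{s.t. } S v = 0,\;\; 0 \le v \le u \\
\text{(dual)}\quad & \min_{\lambda,\mu}\; u^{\top}\mu
  && \text{s.t. } S^{\top}\lambda + \mu \ge c,\;\; \mu \ge 0 \\
\text{(strong duality)}\quad & c^{\top} v = u^{\top}\mu
\end{align}
```

Any triple (v, λ, μ) that satisfies the primal constraints, the dual constraints, and the equality is optimal for the inner problem, so the outer objective can be optimized over these constraints in a single level. When binary knockout variables scale the flux bounds, the dual objective contains products of binaries and dual variables; linearizing such products generally requires bounds on the dual variables.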
In this study, we report two new MIP-based bilevel strain design approaches and solution techniques to improve their runtime performance. First, we present SimOptStrain, which simultaneously considers gene deletions in a host organism and reaction additions from a universal database such as KEGG
(A) Simplified representation of the existing OptStrain procedure and an illustrative example. Step 1 adds a minimum number of reactions from a universal database that yields the maximal increase in theoretical maximum production (TMP). Step 2 identifies reaction deletions in the augmented network identified in Step 1 that couple biomass and biochemical production. (B) SimOptStrain with simultaneous gene deletion and non-native reaction addition, and illustrative examples. Solution s1 shows an example of reaction additions, which do not increase the TMP, that improve biochemical production at the maximum growth rate when combined with gene deletions. Solution s2 is an example of reaction additions that yield a suboptimal increase in the TMP, while solution s3 is a case where the number of added reactions is not necessarily the minimum. Solutions s1, s2, and s3 could only be found using SimOptStrain.
Second, we present a new quadratic bilevel MIP approach, BiMOMA, to identify gene deletions for improving biochemical production when MOMA is used as an inner problem (see
The MOMA inner problem, a convex quadratic program (QP), is converted to its optimality conditions using strong duality. The resulting BiMOMA problem is a single-level mixed-integer quadratically constrained program (MIQCP).
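For reference, the MOMA inner problem can be sketched as follows, assuming w denotes a wild-type reference flux distribution, S the stoichiometric matrix, l and u the flux bounds, and y_j a binary variable that equals 0 when reaction j is removed. These symbols are illustrative and may differ from the appendix notation.

```latex
\begin{align}
\min_{v}\quad & \sum_{j} (v_j - w_j)^2 \\
\text{s.t.}\quad & S v = 0, \\
& l_j\, y_j \le v_j \le u_j\, y_j \quad \forall j
\end{align}
```

For fixed y the objective is convex and the constraints are linear, so strong duality holds between this QP and its dual whenever the QP is feasible. This is the property that allows the inner problem to be replaced by optimality constraints.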
These bilevel computational approaches lead to MIP formulations that become intractable when the number of allowed modifications is large. Preprocessing and heuristic algorithms have been used to improve tractability
We recently developed a bilevel optimization approach (OptORF) which uses metabolic and transcriptional regulatory models to find metabolic and/or regulatory gene perturbation strategies
First, we tightened the bounds on a subset of variables in the dual LP of FBA by examining its feasible region. Similar to FBA, the dual LP often has alternate optimal solutions due to the redundancy in metabolic networks. In a bilevel problem, any optimal solution of the dual LP will provide a feasible solution to the bilevel problem without affecting solutions of the primal LP since the primal-dual LP pair is only connected via strong duality. Therefore, we can obtain a valid solution of the dual LP among alternate optimal solutions by minimizing the norm of the dual variables subject to the dual LP constraints and optimal objective function value. We focused on dual variables corresponding to the reaction removals, and sampled their values using 1,000,000 samples of 10 random gene knockouts. We initially tested different sample sizes and numbers of gene knockouts and found the results were consistent above ∼100,000 samples and ∼5 gene knockouts. Therefore, we collected 1,000,000 samples, which was computationally tractable, and 10 gene knockouts, which was the maximum number allowed for the case studies in this work.
For each sample, we randomly choose 10 genes and solve FBA where the reactions corresponding to the 10 genes are removed via gene to protein to reaction (GPR) associations. If the FBA problem is feasible and biomass production is positive, we then minimize the Euclidean norm of the dual variables for reaction removals in the dual LP while the objective function is constrained to be equal to the optimal biomass production value. This process is repeated 1,000,000 times to sample the values of dual variables for removed reactions in different modified network structures.
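The first part of each sample, mapping a random set of deleted genes to removed reactions through GPR associations, can be sketched as below. The GPR rules, gene labels, and function names are hypothetical toy values, and the FBA solve and the dual-norm minimization are left to an LP/QP solver and are not shown.

```python
import random

# Hypothetical toy GPR rules: each reaction maps to a list of alternative
# enzyme complexes (isozymes); each complex is the set of genes it requires.
GPR = {
    "R1": [{"g1"}],                 # single gene
    "R2": [{"g2"}, {"g3"}],         # isozymes: g2 OR g3
    "R3": [{"g4", "g5"}],           # complex: g4 AND g5
    "R4": [{"g1", "g2"}, {"g6"}],   # (g1 AND g2) OR g6
}

def removed_reactions(gpr, deleted_genes):
    """Return the reactions left with no functional enzyme after the deletions."""
    deleted = set(deleted_genes)
    removed = set()
    for rxn, complexes in gpr.items():
        # A complex is broken if any of its genes is deleted; the reaction is
        # removed only when every alternative complex is broken.
        if all(complex_genes & deleted for complex_genes in complexes):
            removed.add(rxn)
    return removed

def sample_knockouts(gpr, n_genes, n_samples, seed=0):
    """Yield (deleted genes, removed reactions) for random gene knockout sets."""
    rng = random.Random(seed)
    genes = sorted(set().union(*(c for cs in gpr.values() for c in cs)))
    for _ in range(n_samples):
        deleted = rng.sample(genes, n_genes)
        # In the full procedure, FBA would be solved on the reduced network
        # here, followed by the dual-norm minimization if growth is positive.
        yield deleted, removed_reactions(gpr, deleted)
```

For example, `removed_reactions(GPR, ["g1"])` removes only R1, because R4 can still be catalyzed by the alternative enzyme encoded by g6.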
Maximum (downward triangle) and minimum (upward triangle) of observed dual variable values for each reaction sorted by the standard deviation. The values of dual variables were obtained from 1,000,000 samples of 10 gene knockouts in glucose anaerobic condition using the
Second, we applied a penalty (α) for each additional gene deletion in the outer objective function to create a trade-off between biochemical production and the required number of genetic modifications (Equation A.1). This penalty results in selection of strategies with fewer modifications among solutions with equal production and reduces the solution time.
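The tie-breaking effect of the penalty can be illustrated with a minimal sketch. The candidate strategies and production values below are hypothetical, and the objective is a simplified stand-in for Equation A.1 rather than its exact form.

```python
ALPHA = 1e-6  # penalty per gene deletion; illustrative value for this sketch

# Hypothetical candidate strategies: (deleted genes, predicted production)
candidates = [
    (("g1", "g2", "g3"), 5.0),
    (("g1", "g2"), 5.0),        # same production with fewer deletions
    (("g4",), 3.0),
]

def outer_objective(genes, production, alpha=ALPHA):
    """Production minus a small penalty for each gene deletion."""
    return production - alpha * len(genes)

# The penalized objective prefers the two-gene strategy over the three-gene
# strategy even though both have the same predicted production.
best = max(candidates, key=lambda c: outer_objective(*c))
```

With a small α the penalty only separates strategies with equal or nearly equal production; a larger α would trade production against the number of modifications more aggressively.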
Third, as other studies have done
Fourth, we used an iterative algorithm by solving successive problems to optimality with increasing numbers of allowed gene deletions (k) where the solution from the previous problem (p^{k}) is used as a starting point for the next problem (p^{k+1}). Unlike a local search where the next solution is constrained to keep parts of the previous solution, here the next solution is not at all constrained by this starting point, but it facilitates the search by providing a good feasible solution that can be used to prune large numbers of suboptimal solutions. The successive runs improved solver stability for some difficult cases.
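The structure of this iteration can be sketched as follows. Exhaustive enumeration stands in for the MIP solver, and the yields are hypothetical toy values; in a branch-and-bound solver the starting solution would additionally serve as an incumbent for pruning, which this sketch does not model.

```python
from itertools import combinations

# Hypothetical toy yields for deletion sets over genes a, b, c (not model output).
TOY_YIELD = {
    frozenset(): 0.0,
    frozenset("a"): 3.0, frozenset("b"): 1.0, frozenset("c"): 1.0,
    frozenset("ab"): 3.5, frozenset("ac"): 3.5, frozenset("bc"): 5.0,
    frozenset("abc"): 5.5,
}

def solve_to_optimality(genes, k, start):
    """Exhaustive stand-in for the MIP solve with at most k deletions.

    `start` only seeds the incumbent; the search space is not restricted to it.
    """
    best_set, best_val = start, TOY_YIELD[start]
    for size in range(k + 1):
        for cand in map(frozenset, combinations(genes, size)):
            if TOY_YIELD[cand] > best_val:
                best_set, best_val = cand, TOY_YIELD[cand]
    return best_set, best_val

solutions = {}
incumbent = frozenset()          # the empty deletion set is always feasible
for k in range(1, 4):
    # The previous optimum is feasible for the next problem (it uses fewer
    # than k deletions), so it is a valid starting point.
    incumbent, value = solve_to_optimality("abc", k, incumbent)
    solutions[k] = (incumbent, value)
```

In this toy case the best single deletion is gene a, but the best pair is {b, c}, which does not contain a. The starting point therefore supplies a lower bound without forcing the next solution to retain any part of the previous one.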
While all four steps were taken, we found that major runtime performance improvements were made when the bounds on dual variables and the penalty for gene deletions were applied simultaneously. We found that placing [−1, 1] bounds on the dual variables for reaction removals was very effective for the OptORF cases examined here, but these values may need to be adjusted for other models or conditions. The optimization problems were solved using CPLEX 11.2 accessed via GAMS on a Linux machine with Intel Xeon 2.66 GHz processors.
SimOptStrain was developed to simultaneously consider gene deletions in a host organism and reaction additions from a universal database (see
First, GPR associations and gene deletion constraints (Equations B.16–B.20 in
The curated KEGG
We also developed a bilevel MIQCP approach that, for the first time, uses MOMA as an inner objective problem (see
First, the MOMA inner problem is converted into a standard QP form (Equations C.2–C.4, and the left hand side of Equation C.9 in
While a global optimum can be obtained since the inner MOMA problem is convex, the BiMOMA problem for a genome-scale model can be very difficult to solve due to its size and nonlinearity. Therefore, we investigated the dual QP of MOMA using a sampling procedure similar to that described in the first subsection of
We first tested the performance of the developed MIP techniques to identify gene deletion strains that are predicted to have high acetate production (
k | Identified Genes | Changes | Yield (% of TMP)
1 |  |  | 38.67
2 |  | 1 | 38.86
3 |  | 5 | 53.97
4 |  | 1 | 54.93
5 |  | 5 | 59.16
6 |  | 11 | 68.00
7 |  | 1 | 68.56
8 |  | 11 | 75.42
9 |  | 3 | 77.15
10 |  | 3 | 77.25
‘Changes’ refer to the number of genes which are newly introduced in the solution with k deletions or removed from the solution with k–1 deletions.
Yield is reported as % of the TMP (2.56 mol acetate produced/mol glucose consumed) for the wild-type strain with a maximum glucose uptake rate of 10 mmol gDW^{−1} h^{−1}. gDW stands for gram dry weight.
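The ‘Changes’ value defined above is the size of the symmetric difference between consecutive solutions. A minimal sketch, using hypothetical gene labels:

```python
def count_changes(previous, current):
    """Genes newly introduced plus genes dropped between consecutive solutions."""
    previous, current = set(previous), set(current)
    return len(current - previous) + len(previous - current)

# Hypothetical gene labels, for illustration only
sol_k2 = {"g1", "g2"}
sol_k3_nested = {"g1", "g2", "g3"}      # keeps both earlier genes
sol_k3_disjoint = {"g3", "g4", "g5"}    # replaces both earlier genes

nested_changes = count_changes(sol_k2, sol_k3_nested)
disjoint_changes = count_changes(sol_k2, sol_k3_disjoint)
```

A value of 1 means the solution with k deletions simply extends the solution with k–1 deletions by one gene. A value of 2k–1 means the two solutions share no genes, as is the case for the k = 6 entry with 11 changes.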
To compare the proposed MIP techniques to local search methods, we implemented and modified the Genetic Design through Local Search algorithm (GDLS
We additionally used OptORF to find high production strategies for metabolites that were previously found to be difficult to couple to biomass production under glucose and/or xylose aerobic conditions (
Different colors indicate OptKnock (light grey), OptGene (grey), and OptORF (dark grey). The numbers on the x-axis correspond to the number of reaction deletions identified (or maximum allowed if no strategy was found) by OptKnock and OptGene
To demonstrate the benefit of considering gene deletions and non-native reaction additions simultaneously, we applied the SimOptStrain approach to succinate and glycerol production under glucose aerobic conditions with a minimum growth of 0.1 h^{−1}. We used a metabolic model of
For succinate production, we found that there are no non-native reactions which improve the TMP when added to the wild-type
The top numbers are for wild-type and the bottom numbers are for a predicted succinate-producing strain (
k | Deleted Genes | k′ | Added Reactions (EC No.) | Growth Rate (h^{−1}) | TMP | Yield
Wild-type |  |  |  | 0.88 | 1.5 | 0.0
3 |  | 0 | None | 0.83 | 1.5 | 8.8
3 |  | 1 | 1.2.1.52 | 0.62 | 1.5 | 32.5
3 |  | 2 | 1.2.1.52, 2.1.3.1 | 0.62 | 1.5 | 37.3
4 |  | 0 | None | 0.59 | 1.5 | 38.2
4 |  | 1 | 1.2.1.51 | 0.17 | 1.5 | 60.4
5 |  | 0 | None | 0.11 | 1.5 | 54.4
5 |  | 1 | 1.4.1.20 | 0.12 | 1.5 | 67.5
5 |  | 1 | 1.4.1.9 | 0.12 | 1.5 | 67.5
Enzyme Commission number.
Theoretical maximum production (TMP) is reported as mol succinate produced/mol glucose consumed for each strain with non-native reaction additions, but without gene deletions. A maximum glucose uptake rate of 10 mmol gDW^{−1} h^{−1} and a maximum oxygen uptake rate of 18.5 mmol gDW^{−1} h^{−1} were used.
Yield is reported as % of the TMP for each strain after reactions are added.
Best strategies without addition of non-native reactions.
For glycerol production, the addition of non-native reactions from the KEGG database could improve the TMP up to ∼220% of the TMP for the wild-type
k | Deleted Genes | k′ | Added Reactions (EC No.) | Growth Rate (h^{−1}) | TMP | Yield
Wild-type |  |  |  | 0.88 | 0.91 | 0.0
3 |  | 0 | None | 0.88 | 0.91 | 0.06
5 |  | 1 | 3.1.3.21 | 0.21 | 1.51 | 47.1
5 |  | 1 | 3.1.3.21 | 0.21 | 1.51 | 55.1
5 |  | 1 | 2.7.1.142 | 0.35 | 1.58 | 59.3
5 |  | 1 | 3.1.3.21 | 0.30 | 1.51 | 62.6
6 |  | 0 | None | 0.64 | 0.91 | 6.8
6 |  | 2 | 3.1.3.21, 2.1.3.1 | 0.17 | 1.51 | 64.1
Enzyme Commission number.
Theoretical maximum production is reported as mol glycerol produced/mol glucose consumed for each strain with non-native reaction additions, but without gene deletions. A maximum glucose uptake rate of 10 mmol gDW^{−1} h^{−1} and a maximum oxygen uptake rate of 18.5 mmol gDW^{−1} h^{−1} were used.
Yield is reported as % of the TMP for each strain after reactions are added.
Best strategies without addition of non-native reactions (no strategy was found for k = 5 and k′ = 0 because any small production increases (over the k = 3 strategy) were negated by the penalty α = 10^{−6}).
To find ‘unevolved’
(A) Pyruvate and (B) Glutamate. The best BiMOMA strategies (○) were identified for k = 1 to 5 using a penalty of 0.5% TMP, and were combined with a local search (□) with search sizes of 2 or 3. BiMOMA+local search size of 2 starts from the best BiMOMA solutions for k = 2 and 3; and BiMOMA+local search size of 3 starts from the best BiMOMA solutions for k = 3, 4, and 5. A sequential search was also performed, which is a local search with search size of 1 starting from the best k = 1 solution.
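The effect of the search size can be illustrated with a minimal sketch of one local search step. It assumes that "search size" means the maximum number of genes that may be added to or removed from the current solution in one step, and it uses hypothetical toy yields rather than MOMA predictions.

```python
from itertools import combinations

# Hypothetical toy yields for deletion sets over genes a, b, c (not model output).
TOY_YIELD = {
    frozenset(): 0.0,
    frozenset("a"): 3.0, frozenset("b"): 1.0, frozenset("c"): 1.0,
    frozenset("ab"): 3.5, frozenset("ac"): 3.5, frozenset("bc"): 5.0,
    frozenset("abc"): 5.5,
}

def neighbors(current, genes, search_size, max_deletions):
    """Deletion sets that differ from `current` by at most `search_size` genes."""
    for n_flip in range(1, search_size + 1):
        for flipped in combinations(genes, n_flip):
            cand = current ^ frozenset(flipped)   # toggle each chosen gene
            if len(cand) <= max_deletions:
                yield cand

def local_search_step(current, genes, search_size, max_deletions):
    """Return the best neighbor, or `current` if no neighbor improves on it."""
    best = current
    for cand in neighbors(current, genes, search_size, max_deletions):
        if TOY_YIELD[cand] > TOY_YIELD[best]:
            best = cand
    return best

start = frozenset("a")   # best single deletion in the toy data
sequential = local_search_step(start, "abc", search_size=1, max_deletions=2)
wider = local_search_step(start, "abc", search_size=3, max_deletions=2)
```

With a search size of 1 the step can only extend {a} and reaches a yield of 3.5, whereas a search size of 3 can replace a with the pair {b, c} and reaches 5.0. This mirrors, in a toy setting, how a sequential search can miss strategies that a larger search size finds.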
In the pyruvate case, the differences in yields between the sequential search and BiMOMA search were moderate for k = 1 to 5, but the sequential search missed higher production strategies as the number of allowed gene deletions increased (
k | Identified Genes | Changes | Yield (% of TMP)
1 |  |  | 6.59
2 |  | 3 | 11.78
3 |  | 1 | 15.69
4 |  | 1 | 19.16
5 |  | 3 | 22.39
6 |  | 5 | 25.87
7 |  | 5 | 31.86
8 |  | 3 | 37.21
9 |  | 7 | 38.46
10 |  | 1 | 41.03
‘Changes’ refer to the number of genes which are newly introduced in the solution with k deletions or removed from the solution with k–1 deletions.
Yield is reported as % of the TMP (2 mol pyruvate produced/mol glucose consumed) for the wild-type strain with a maximum glucose uptake rate of 10 mmol gDW^{−1} h^{−1} and a maximum oxygen uptake rate of 18.5 mmol gDW^{−1} h^{−1}.
The benefit of a bigger search size is more evident in the glutamate case (
k | Identified Genes | Changes | Yield (% of TMP)
1 |  |  | 3.01
2 |  | 1 | 5.41
3 |  | 3 | 7.90
4 |  | 1 | 16.74
5 |  | 1 | 28.29
6 |  | 1 | 31.16
7 |  | 3 | 35.54
8 |  | 1 | 38.56
9 |  | 3 | 39.65
10 |  | 5 | 43.70
‘Changes’ refer to the number of genes which are newly introduced in the solution with k deletions or removed from the solution with k–1 deletions.
Yield is reported as % of the TMP (1.15 mol glutamate produced/mol glucose consumed) for the wild-type strain with a maximum glucose uptake rate of 10 mmol gDW^{−1} h^{−1} and a maximum oxygen uptake rate of 18.5 mmol gDW^{−1} h^{−1}.
Using the BiMOMA approach, we were able to efficiently identify production strategies with up to 40–45% theoretical maximum yields for glutamate and pyruvate (
The use of computational approaches in metabolic engineering has grown rapidly, alongside an increasing number of genome-scale metabolic models
The MIP techniques developed in this study can be applied to most existing bilevel approaches for strain design, synthetic lethal identification, or network identification
With these runtime performance improvements, the SimOptStrain approach can now be used to simultaneously consider the deletion of genes in a host organism and addition of non-native reactions. The simultaneous search broadens the scope of strain designs and identifies novel combinations of modifications, which could not have been found previously using a multi-step procedure. In addition to strain design, SimOptStrain could also be used to refine models (by adding and removing reactions) for cases when FBA does not correctly predict byproduct secretion. Improvements in the universal reaction database are still needed, particularly with respect to reaction reversibility, which affects constraint-based model predictions
We further expanded the application of the solution techniques to BiMOMA, the first mixed-integer programming approach that uses MOMA
In summary, we developed two new bilevel strain design approaches using mixed-integer programming. The developed approaches could be useful particularly for identifying novel metabolic engineering strategies to improve production of non-native secondary metabolites. We also presented mixed-integer programming solution techniques based on concepts from duality to effectively identify genetic perturbation strategies, within a reasonable amount of time even for a large number of perturbations. The MIP techniques were successfully applied to existing strain design approaches as well as new approaches developed in this study. They will likely improve the efficiency of other bilevel problems as well, including model identification, synthetic lethal identification, and objective function prediction
The authors are grateful to the anonymous reviewers for their comments and suggestions to improve the paper.