The authors have declared that no competing interests exist.

Practical identifiability of Systems Biology models has received a lot of attention in recent scientific research. It addresses a question crucial for a model’s predictive power: how accurately can the model’s parameters be recovered from the available experimental data? Methods based on profile likelihood are among the most reliable for practical identifiability analysis. However, these methods are often computationally demanding or lead to inaccurate estimates of parameters’ confidence intervals. Developing methods that can accurately produce parameters’ confidence intervals in reasonable computational time is of utmost importance for Systems Biology and QSP modeling.

We propose an algorithm, Confidence Intervals by Constrained Optimization (CICO), based on profile likelihood and designed to speed up confidence interval estimation and reduce computational cost. The numerical implementation of the algorithm includes settings to control the accuracy of the confidence interval estimates. The algorithm was tested on a number of Systems Biology models, including the Taxol treatment model and the STAT5 dimerization model discussed in the current article.

The CICO algorithm is implemented in a software package freely available in Julia (

Differential equation-based models are widely used in Systems Biology and Quantitative Systems Pharmacology and play a significant role in the discovery of new disease-directed drugs. The complexity of these models is the price of applying them to crucial fields of biology and medicine, which require large non-linear models with many unknown parameters. How accurately can the parameters of a model be recovered from experimental data? What is the identifiable subset of parameters? Can the model be reduced or reparameterized to become identifiable? All these questions of identifiability analysis are essential for a model’s predictability and reliability, which explains why the identifiability of Systems Biology models has received a lot of attention in recent scientific research. However, existing numerical methods of identifiability analysis are computationally demanding or often lead to inaccurate estimates. Developing methods that can accurately produce parameters’ confidence intervals in reasonable computational time is of utmost importance for Systems Biology and QSP modeling. We propose an algorithm and a software package to test the identifiability of Systems Biology models, designed to speed up confidence interval estimation and reduce computational cost. The software package was tested on a number of Systems Biology models, including the Taxol treatment model and the STAT5 dimerization model discussed in the current article.


Reliability and predictability of a kinetic systems biology model depend on how precisely the parameters of the model can be recovered from the given experimental data. Fitting a model to experimental data is not enough to estimate all the parameters unambiguously. Noisy or incomplete experimental data as well as the model’s structure often result in uncertainty in parameter estimates.

Identifiability analysis is crucial for model verification. It addresses the question of to what extent and with what level of certainty the parameters of a model can be recovered from the available experimental data. Two branches of identifiability analysis are distinguished [

The goal of

Even if a model includes only structurally identifiable parameters it doesn’t imply their practical identifiability. While structural non-identifiability implies practical non-identifiability, structurally identifiable models often appear to be practically non-identifiable [

Profile likelihood is a reliable though computationally demanding approach to test parameters’ identifiability in Systems Biology (SB). It helps us understand how the data can be mapped to parameters’ values and how accurate the model predictions are.

Following the definitions of [

A kinetic systems biology model can be expressed as an ODE system:

The state vector _{end}) given nominal initial values _{0} =

The subset of unknown parameters can be estimated using the experimental dataset by solving the inverse problem. Typically, not all the variables _{i} are the measurement errors and _{i} are observation functions,

The unknown parameters _{0}} can be estimated by fitting simulated values _{i}(_{i}(_{0}),

The exact choice of the likelihood function

Here the double summation is performed over _{i}(_{j})–simulated values and

MLE provides the point estimates

Confidence interval (CI) is an estimate of the unknown parameter which characterizes it by the range of values for particular confidence level

A confidence interval with confidence level _{i} is an interval defined by probability

Different methods of CI estimation may lead to different definitions of parameters’ identifiability. Profile likelihood is one of the most common and robust ways to construct CIs and state practical identifiability of the estimated parameters [_{i} [

Corresponding confidence interval for an estimate _{α} is ^{2} distribution if the likelihood ratio test is used,
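As a quick numerical illustration of the likelihood-ratio threshold mentioned above, the sketch below computes χ² quantiles using only the Python standard library. The function name `chi2_threshold` and the confidence level 0.95 are assumptions made for this example, not values taken from the article. Closed forms are used for 1 and 2 degrees of freedom, the two cases that need no special-function library.

```python
import math
from statistics import NormalDist


def chi2_threshold(alpha: float, df: int = 1) -> float:
    """Return the alpha-quantile of the chi-square distribution (df = 1 or 2)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if df == 1:
        # If Z ~ N(0, 1) then Z**2 ~ chi2(1), so the quantile is a squared normal quantile.
        z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
        return z * z
    if df == 2:
        # chi2(2) is exponential with mean 2: CDF(x) = 1 - exp(-x / 2).
        return -2.0 * math.log(1.0 - alpha)
    raise NotImplementedError("only df = 1 and df = 2 have closed forms here")


one_parameter = chi2_threshold(0.95, df=1)   # threshold for a single parameter
two_parameters = chi2_threshold(0.95, df=2)  # threshold for a joint two-parameter region
print(f"df=1: {one_parameter:.4f}, df=2: {two_parameters:.4f}")
```

The threshold grows with the number of degrees of freedom, so an interval built with a joint threshold is never narrower than one built with the single-parameter threshold.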

Confidence intervals estimation is the major goal of practical identifiability analysis. According to [_{i},

Two general numerical approaches to construct parameters profiles and PL-based CIs are currently developed and implemented in software packages [_{PL}(_{i}) until the profile function reaches the threshold

_{PL}(_{i}). They imply exploring the shape of _{PL}(_{i}) by making small steps from the minima _{j≠i} at each step of _{i}. The smaller _{i} steps the numerical algorithm takes while exploring _{PL}(_{i}) the more accurate the profiles are. At the same time, re-optimizing _{i} step may require thousands of likelihood function calls, which can be unacceptable for high-dimensional ODE models. Progressive derivative-based [

_{i} profile as a solution of the ODE system. The ODE system itself is derived from optimal conditions for constrained optimization of

Various numerical implementations of

The current study presents a new approach for confidence intervals estimation and profile likelihood-based analysis of identifiability: Confidence Intervals estimated by Constrained Optimization (CICO). It addresses the above-mentioned difficulties of stepwise optimization-based and integration-based PL implementations, namely computational effort, accuracy of CI endpoints estimation and algorithm termination criteria. The key idea of the method is to obtain CI endpoints and avoid the calculation of profiles as the most computationally expensive part of the analysis.

According to [

A modified version of the Newton-Raphson algorithm is proposed in [

Assuming there exists a solution of (

In case

Note, that ^{T}

The system (^{T}_{i} with inequality constraint _{i} value is the lower CI endpoint

Likewise, in case _{i} with inequality constraint _{i} value is the upper CI endpoint
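The two cases above can be summarized as a pair of optimization problems. The notation is assumed here, because the original symbols did not survive extraction: $\theta$ is the parameter vector, $\hat\theta$ the maximum likelihood estimate, $\Lambda(\theta)$ the objective being profiled (for example $-2\ln L$), and $\Delta_\alpha$ the threshold.

```latex
\theta_i^{\mathrm{lower}} = \min_{\theta}\ \theta_i
\quad \text{subject to} \quad
\Lambda(\theta) \le \Lambda(\hat\theta) + \Delta_\alpha ,
\qquad
\theta_i^{\mathrm{upper}} = \max_{\theta}\ \theta_i
\quad \text{subject to} \quad
\Lambda(\theta) \le \Lambda(\hat\theta) + \Delta_\alpha .
```

If the feasible set is unbounded in the direction of decreasing (increasing) $\theta_i$, the corresponding problem has no finite solution and the lower (upper) endpoint does not exist.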

In the previous section we reformulated the problem of confidence interval estimation in terms of constrained optimization. This approach has a clear geometrical interpretation. We are looking for tangent hyperplanes to the confidence region _{i}. For ^{2} the approach can be illustrated by _{1} is infinite and confidence interval for _{2} has no finite upper endpoint. CI endpoints were calculated using the CICO method.
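The geometric picture can be checked by brute force on the identifiable quadratic example from the figure, read here as f(θ1, θ2) = (θ1 + 2θ2 − 7)² + (2θ1 + θ2 − 5)² with minimum at (1, 3). The sketch below is not the CICO implementation: it simply walks around one contour of the function and records the extreme value of each coordinate, which is where the tangent lines touch the region. The contour level 3.84 and all function names are assumptions made for illustration.

```python
import math


def f_a(t1: float, t2: float) -> float:
    """Quadratic example as read from the figure caption; minimum f_a(1, 3) = 0."""
    return (t1 + 2 * t2 - 7) ** 2 + (2 * t1 + t2 - 5) ** 2


def region_extent(f, t_hat, level, n_angles=3600):
    """Scan the contour f = level and return per-coordinate (min, max).

    Assumes f is a homogeneous quadratic around t_hat with f(t_hat) = 0, so
    along each ray f(t_hat + r*u) = r**2 * f(t_hat + u) and the contour
    radius in direction u is r = sqrt(level / f(t_hat + u)).
    """
    lo = [math.inf, math.inf]
    hi = [-math.inf, -math.inf]
    for k in range(n_angles):
        phi = 2.0 * math.pi * k / n_angles
        u = (math.cos(phi), math.sin(phi))
        r = math.sqrt(level / f(t_hat[0] + u[0], t_hat[1] + u[1]))
        point = (t_hat[0] + r * u[0], t_hat[1] + r * u[1])
        for i in range(2):
            lo[i] = min(lo[i], point[i])
            hi[i] = max(hi[i], point[i])
    return list(zip(lo, hi))


LEVEL = 3.84  # hypothetical threshold; the level of the bold contour is not recoverable
intervals = region_extent(f_a, (1.0, 3.0), LEVEL)
for name, (low, high) in zip(("theta1", "theta2"), intervals):
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

For this quadratic the scan can be compared with the closed form: writing f = dᵀMd with d = θ − θ̂ and M = [[5, 4], [4, 5]], the half-width of each interval is sqrt(level · (M⁻¹)ᵢᵢ) = sqrt(5 · level / 9).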

Plots show the contour lines of two functions, chosen to illustrate identifiable and non-identifiable cases. Plot (_{A}(_{1}+2_{2}−7)^{2}+(2_{1}+_{2}−5)^{2}, which has known minimum _{A}(1,3) = 0. Plot (_{B}(1,1) = 0. The star-shaped points mark the minima of the above functions. The bold contour represents the _{1}, _{2}) Red circles mark the points where tangent hyperplanes correspond to parameters’ minimal or maximal values in _{α}. Red circles are CI endpoints. The contours were calculated using marching squares algorithm implemented in Contour.jl package (

All PL-based approaches: stepwise optimization, integration-based algorithm and CICO imply exploring _{i} no a-priori information about its identifiability is usually available. In case _{i} is identifiable we can expect that the profile will intersect with the threshold. In contrast, to state parameter’s non-identifiability we have to check all _{i} feasible values, which can be the whole R space. The definition of practical non-identifiability [_{i} domain but in practice it is never performed. Due to the limitations of computational resources a limited region of _{i} is often utilized in practice for general identifiability analysis.
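The consequence of scanning only a finite region can be sketched with a one-dimensional stand-in for a profile. In the example below the profile function, the threshold value and the scan bounds are all hypothetical, and plain bisection is used only to keep the example short; it is not how LikelihoodProfiler locates endpoints. The point is the outcome: an endpoint is either found inside the scanned region or reported as absent within it.

```python
import math


def endpoint_within_bounds(profile, start, bound, threshold, tol=1e-6):
    """Look for the threshold crossing of a 1-D profile between start and bound.

    Returns the crossing location, or None when the profile is still below
    the threshold at the scan bound (no endpoint inside the scanned region).
    Assumes profile(start) < threshold and a single crossing on the segment.
    """
    if profile(bound) < threshold:
        return None
    inside, outside = start, bound
    while abs(outside - inside) > tol:
        mid = 0.5 * (inside + outside)
        if profile(mid) < threshold:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)


def toy_profile(theta):
    """Hypothetical profile: steep left of the optimum at 1, saturating at 3 on the right."""
    return 3.0 * (1.0 - math.exp(-(theta - 1.0))) ** 2


THRESHOLD = 3.84            # hypothetical threshold value
SCAN_BOUNDS = (-5.0, 50.0)  # hypothetical scan bounds

lower = endpoint_within_bounds(toy_profile, 1.0, SCAN_BOUNDS[0], THRESHOLD)
upper = endpoint_within_bounds(toy_profile, 1.0, SCAN_BOUNDS[1], THRESHOLD)
print("lower endpoint:", lower)
print("upper endpoint:", "not found within scan bounds" if upper is None else upper)
```

Here the toy profile never exceeds 3 on the right-hand side, so no upper endpoint exists for any scan bound. In general, a `None` result only says that no crossing was found inside the chosen region.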

To address the discrepancy between identifiability definition and its practical application the numerical implementation of CICO proposes the notion of scan bounds

The proposed scan bounds naturally suggest the notion of

It is necessary to note that the PL-based confidence intervals may be asymmetric relative to

The definition of identifiability within the bounds is utilized in the CICO implementation. If lower or upper CI endpoint is present within the scan bounds

We provide an implementation of CICO algorithm in an open source free package LikelihoodProfiler _{i}. Currently the CICO implementation depends on NLopt package [

To test parameters’ identifiability the user should provide _{i} range

The implementation utilizes two termination criteria, which address two possible situations. In case there is a confidence interval endpoint within the

The algorithm can also work in transformed space (

Internally LikelihoodProfiler uses Augmented Lagrangian algorithm [

Here we provide identifiability analysis of the cancer taxol treatment model [

The taxol treatment model is defined by the set of ODEs with three state variables, five unknown parameters (a0, ka, r0, d0, kd), dosage regime and experimental data. The unknown parameters have been fitted to experimental data and their estimated values were taken from original Matlab implementation. Even though the model is structurally identifiable, practically available experimental data, as it was shown [

The same authors provide an open GitHub repository with Matlab implementation of the taxol treatment model (

To run identifiability analysis with LikelihoodProfiler package the taxol treatment model was rewritten in Julia language. To make the numerical simulations comparable with original Matlab implementation Julia’s analogue of Matlab ode23s solver Rosenbrock23 from DifferentialEquations.jl package [

CI endpoints estimated with CICO (

Columns 2-6: LikelihoodProfiler (CICO). Columns 7-11: Original Matlab (Stepwise PL).

Parameter | CICO Lower Endpoint | CICO Upper Endpoint | CICO LF Calls (Lower) | CICO LF Calls (Upper) | CICO Time (sec) | Matlab Lower Endpoint | Matlab Upper Endpoint | Matlab LF Calls (Lower) | Matlab LF Calls (Upper) | Matlab Time (sec) |
---|---|---|---|---|---|---|---|---|---|---|
a0 | 6.76 | 17.3 | 285 | 601 | 2.79 | (7.9, 8.32) | (17.05, 17.46) | 285 | 1715 | 97.74 |
ka | 4.99 | 10.73 | 522 | 349 | 3.26 | (4.86, 5.26) | (10.52, 10.93) | 682 | 670 | 75.16 |
r0 | NI | 0.4 | 49 | 796 | 2.85 | NI | (0.36, 0.37) | 1510 | 7475 | 531.96 |
d0 | 0.19 | NI | 601 | 170 | 2.81 | (0.13, 0.2) | NI | 1605 | >20000 | >1000 |
kd | 50.51 | NI | 796 | 223 | 3.74 | (47.65, 53.61) | NI | 930 | 12260 | 722.52 |

CI endpoints estimated with CICO and CI estimates obtained with the original Matlab stepwise optimization-based implementation. The CI endpoints for the original Matlab implementation are given as intervals (*) because the stepwise PL approach doesn’t estimate endpoints with any preset tolerance but marks two points before and after the parameter’s profile intersects the threshold. NI stands for non-identifiable parameter. Elapsed time is measured by

As most of computational efforts in “profiling” approach are focused on solving ODEs with different parameters’ sets, the performance of the algorithms was measured by the number of likelihood function calls (

In general, CICO needs fewer likelihood function evaluations than stepwise optimization-based profiling to converge to an endpoint value. The efficiency of CICO is especially evident in non-identifiable cases. This is because the constraint is incorporated in the objective function as a penalty term, which starts to penalize the algorithm only when the optimizer gets near the threshold; this doesn’t happen in many non-identifiable cases, where profiles are flat.
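The size of the difference can be read directly off the Taxol comparison table. The short calculation below only rearranges the numbers reported there; the variable names are ours, and the ">20000" entry for d0 is treated as a lower bound of 20000, so the d0 ratio is itself a lower bound.

```python
# Likelihood function calls (lower endpoint, upper endpoint) copied from the Taxol table.
cico_calls = {
    "a0": (285, 601), "ka": (522, 349), "r0": (49, 796),
    "d0": (601, 170), "kd": (796, 223),
}
# For d0 the table reports ">20000" calls for the upper endpoint; 20000 is a lower bound.
stepwise_calls = {
    "a0": (285, 1715), "ka": (682, 670), "r0": (1510, 7475),
    "d0": (1605, 20000), "kd": (930, 12260),
}

call_ratio = {}
for name, (lower_calls, upper_calls) in cico_calls.items():
    cico_total = lower_calls + upper_calls
    stepwise_total = sum(stepwise_calls[name])
    call_ratio[name] = stepwise_total / cico_total
    print(f"{name}: CICO {cico_total}, stepwise {stepwise_total}, ratio {call_ratio[name]:.1f}")
```

With these numbers the ratio ranges from about 1.6 (ka) to at least 28 (d0), and the three largest ratios belong to the parameters with a non-identifiable side (r0, d0, kd).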

The path of CI search for stepwise optimization-based algorithm (

STAT5 Dimerization Model [

dMod implements

Columns 2-6: LikelihoodProfiler (CICO). Columns 7-10: dMod (optimize).

Parameter | CICO Lower Endpoint | CICO Upper Endpoint | CICO LF Calls (Lower) | CICO LF Calls (Upper) | CICO Time (sec) | dMod Lower Endpoint | dMod Upper Endpoint | dMod LF Calls (Total) | dMod Time (sec) |
---|---|---|---|---|---|---|---|---|---|
Epo_degradation_BaF3 | -1.71 | -1.42 | 523 | 494 | 0.75 | (-1.74, -1.72) | (-1.42, -1.39) | 1716 | 42.15 |
k_exp_hetero | NI | -3.15 | 4 | 1036 | 0.72 | NI | (-3.1, -3.01) | 533 | 13.53 |
k_exp_homo | -2.48 | -1.98 | 237 | 289 | 0.4 | (-2.56, -2.52) | (-1.95, -1.93) | 1931 | 47.89 |
k_imp_hetero | -1.86 | -1.69 | 171 | 179 | 0.32 | (-1.91, -1.9) | (-1.67, -1.66) | 1435 | 37.58 |
k_imp_homo | 0.19 | NI | 1287 | 7 | 1.04 | (0.11, 0.18) | NI | 2675 | 66.35 |
k_phos | 4.16 | 4.27 | 143 | 168 | 0.21 | (4.1, 4.12) | (4.29, 4.3) | 1959 | 50.75 |
sd_pSTAT5A_rel | 0.44 | 0.77 | 172 | 243 | 0.34 | (0.42, 0.44) | (0.78, 0.8) | 2165 | 55.58 |
sd_pSTAT5B_rel | 0.72 | 0.99 | 231 | 186 | 0.34 | (0.66, 0.68) | (0.99, 1.01) | 2062 | 53.50 |
sd_rSTAT5A_rel | 0.4 | 0.67 | 204 | 929 | 0.83 | (0.35, 0.36) | (0.67, 0.67) | 2062 | 53.49 |

CI endpoints estimated with LikelihoodProfiler (CICO) and CI estimates obtained in dMod. Lower and upper CI endpoints for dMod are given as intervals (*) marking two points before and after the parameter’s profile intersects the threshold. NI stands for non-identifiable parameter. Elapsed time is measured by

This allowed us to set tolerance of endpoint estimation in LikelihoodProfiler

Taking into account the difference between the underlying optimizers, the endpoints reported by LikelihoodProfiler correspond to the values obtained in dMod. The performance of each package was measured by the number of likelihood function evaluations and the time required to compute CI endpoints. The results indicate the efficiency of CICO, which on average outperforms the integration-based approach implemented in dMod even though dMod relies on the model’s functions compiled to C. Only for

The detailed identifiability analysis of the Taxol treatment model and STAT5 dimerization model, the source code as well as other use-case models’ identifiability analyses are published on our GitHub repository (

A number of recent studies have demonstrated that profile likelihood-based methods are efficient for analyzing the identifiability of parameters reconstructed from experimental data. In the absence of identifiability analysis one can never be certain how reliable the parameter estimates are and how accurate the model predictions are. However, practical use of profile likelihood-based methods has not yet become standard routine due to a number of challenges.

Indeed, profile likelihood-based methods are computationally demanding. Progressive stepping and other optimizations of the basic profile likelihood approach impose restrictions on the likelihood function (such as the need to calculate gradients) and limit the set of applicable optimization methods. The CICO algorithm attempts to solve this problem by replacing the repeated re-optimization along a profile with constrained optimization. For each individual parameter only two optimization runs are required, one for the lower and one for the upper CI endpoint. CICO doesn’t require the gradient of the likelihood function and allows the user to choose a derivative-free or gradient-based optimization algorithm.

Other challenges originate from the uncertainty in the definition of practical non-identifiability. It is implied that researchers have to scan sufficiently wide but finite intervals to declare a parameter non-identifiable. In practice this is usually done by visualizing the profiles on a chosen interval and extrapolating the profiles’ behavior to the parameters’ global feasible region. In the current study we have proposed formal criteria for algorithm termination, based on the notion of scan bounds, which can automate the analysis and reduce subjectivity.

The numerical experiments have demonstrated that confidence intervals obtained with the CICO algorithm agree with the results reported in the publications. As shown, on average the algorithm outperforms the optimization-based and integration-based PL implementations considered above. This comparison was performed with the default solver settings, which can possibly be tuned for greater efficiency. Moreover, the optimization-based PL approach doesn’t converge to the endpoint, while the CICO algorithm was developed to accurately estimate CI endpoints. Hence a more thorough comparison of the algorithms is difficult, since the termination criteria of the optimization-based PL don’t take into account the accuracy of CI endpoint estimation.

To compare the methods we measured efficiency in terms of the elapsed time and the number of likelihood function calls required to obtain CI endpoints. In general, the CICO implementation in LikelihoodProfiler is about 100 times faster than the dMod integration-based approach (R) and the optimization-based method (Matlab). However, it is important to note that timings depend strongly on the programming language, optimization method and ODE solver used, while the number of likelihood function evaluations is a language-independent measure, though it is also affected by the efficiency of the optimization algorithm and ODE solver.
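The order of magnitude quoted above can be reproduced from the elapsed times in the two comparison tables. The sketch below only divides the reported reference time by the reported CICO time for each parameter; the variable and function names are ours, and the ">1000" entry for d0 is treated as a lower bound of 1000 seconds.

```python
from statistics import median

# Elapsed times in seconds copied from the comparison tables: (CICO, reference).
# The Matlab time for d0 is reported as ">1000"; 1000 is used as a lower bound.
taxol_times = {
    "a0": (2.79, 97.74), "ka": (3.26, 75.16), "r0": (2.85, 531.96),
    "d0": (2.81, 1000.0), "kd": (3.74, 722.52),
}
stat5_times = {
    "Epo_degradation_BaF3": (0.75, 42.15), "k_exp_hetero": (0.72, 13.53),
    "k_exp_homo": (0.4, 47.89), "k_imp_hetero": (0.32, 37.58),
    "k_imp_homo": (1.04, 66.35), "k_phos": (0.21, 50.75),
    "sd_pSTAT5A_rel": (0.34, 55.58), "sd_pSTAT5B_rel": (0.34, 53.50),
    "sd_rSTAT5A_rel": (0.83, 53.49),
}


def speedups(times):
    """Return the ratio reference_time / cico_time for every parameter."""
    return {name: reference / cico for name, (cico, reference) in times.items()}


for label, times in (("Taxol (Matlab)", taxol_times), ("STAT5 (dMod)", stat5_times)):
    ratios = speedups(times)
    print(f"{label}: min {min(ratios.values()):.0f}, "
          f"median {median(ratios.values()):.0f}, max {max(ratios.values()):.0f}")
```

The medians are consistent with a speedup of roughly two orders of magnitude, while individual parameters range from about 20 to more than 350.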

In addition to confidence intervals, other interval estimates may also be of interest: n-dimensional confidence regions for parameters, prediction bands, etc. The CICO algorithm can potentially be extended to calculate these generalizations of confidence intervals, and we plan to test its use for these classes of tasks in our future studies.

Dear Mr. Borisov,

Thank you very much for submitting your manuscript "Confidence intervals by constrained optimization – an algorithm and software package for practical identifiability analysis in Systems Biology" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Daniel A Beard

Deputy Editor

PLOS Computational Biology


***********************


Reviewer's Responses to Questions

Reviewer #1: The article is very clearly written. The authors lay out a timeline of the developments of the field that to me seems fairly complete and leads directly to the new method as a substantial increase in the field. The motivation for focusing on the structural identifiability is made fairly clear. The method is well-described and after reading it, it's clear that it should work. The evidence then demonstrates that it does work. I can easily see this method and this package being used by many researchers in practice.

That said, there are some improvements that should probably be made to the paper before publication. For one, I think that section 3.4 is unnecessary. I think it's fairly clear that this kind of numerical method needs to be computed on some finite support so practically all determinations are going to be made in some box. I don't think that more than a sentence or a paragraph is really required to get that point across. Secondly, the paper itself doesn't seem to have a lot of the validation. One example is used as validation, but the paper needs more. When I look at the package they discuss, I can see 5 clear examples with Binder links that demonstrate the method on more systems: some of this should be in the paper instead of 3.4 in order to more broadly demonstrate the validity of this method. Next, what they established was "structural efficiency", i.e. efficiency in terms of likelihood function evaluations. But it would've been nice to also see "practical efficiency", i.e. raw timings for the MATLAB method and Julia and Python implementation of the new methods, and use this to demonstrate a clear orders of magnitude actual performance improvement. Overall I think it's a really good paper, a good idea, and a strong result with just some touch-ups requires to really hammer home the advance in a more clear way.

Reviewer #2: This article presents a novel method to study practical identifiability of parameters of ODE-based models. The method is innovative and seems to overcome existing methods in terms of computational cost, at least in the presented example. It can definitely be useful for the Research community, especially since the authors have made it freely available either in Julia or in Python. The article is very clear and well written. It cites all relevant literature. I have three minor comments:

-Equation 7 : precise the values for j, to make clearer the fact that this is a system of more than 2 equations.

-Equation 8: it is not obvious how the authors transformed system (

-The authors claim in the Abstract and Introduction that their method provides more accurate estimation of confidence Interval bounds. However, this is not demonstrated in the article, neither theoretically, nor computationally (On the opposite, they do provide some evidence of the lower computational cost of their algorithm compared to existing ones). Please either add the corresponding evidence or modify the text.

**********

Large-scale datasets should be made available via a public repository as described in the

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1:

Reviewer #2: No

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool,

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see


Dear Mr. Borisov,

We are pleased to inform you that your manuscript 'Confidence Intervals by Constrained Optimization – an Algorithm and Software Package for Practical Identifiability Analysis in Systems Biology' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Daniel A Beard

Deputy Editor

PLOS Computational Biology


***********************************************************

Reviewer's Responses to Questions

Reviewer #1: The authors have addressed my previous concerns and demonstrate a significant improvement to the practical application of practical identifiability analysis with these new results. In addition, I can confirm that their code, timing, and results on the Julia side are easily reproducible.

Reviewer #2: The authors have answered all my comments.

**********

Large-scale datasets should be made available via a public repository as described in the

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1:

Reviewer #2: No

PCOMPBIOL-D-20-01281R1

Confidence Intervals by Constrained Optimization – an Algorithm and Software Package for Practical Identifiability Analysis in Systems Biology

Dear Dr Borisov,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Nicola Davies

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom