The authors have declared that no competing interests exist.

Conceived and designed the experiments: AS SS JK WZ MN IS. Performed the experiments: AS SS. Analyzed the data: AS SS JK HAK MN IS. Wrote the paper: AS SS JK MN IS.

To facilitate analysis and understanding of biological systems, large-scale data are often integrated into models using a variety of mathematical and computational approaches. Such models describe the dynamics of the biological system and can be used to study the changes in the state of the system over time. For many model classes, such as discrete or continuous dynamical systems, there exist appropriate frameworks and tools for analyzing system dynamics. However, the heterogeneous information that encodes and bridges molecular and cellular dynamics, inherent to fine-grained molecular simulation models, presents significant challenges to the study of system dynamics. In this paper, we present an approach based on algorithmic information theory for the analysis and interpretation of the dynamics of such executable models of biological systems. We apply a normalized compression distance (NCD) analysis to the state representations of a model that simulates immune decision making and immune cell behavior. We show that this analysis successfully captures the essential information in the dynamics of the system, which results from a variety of events including proliferation, differentiation, or perturbations such as gene knock-outs. We demonstrate that this approach can be used for the analysis of executable models, regardless of the modeling framework, and for making experimentally quantifiable predictions.

Biological systems are remarkable examples of complex dynamical systems. The dynamics of these systems involve extreme concurrency and interactions across multiple scales of biological organization. For example, the states of individual cells are determined by internal molecular processes governed by molecular interaction systems. These cellular states influence cell-cell interactions, which collectively give rise to macroscopic behavior, but the ensuing macroscopic state of the system, such as establishment of cellular structures (e.g., blood vessels) or nonhomogeneous distributions of diffusible molecules, also feeds back on the “lower” intracellular molecular systems and their states. The complexity of biological systems is further compounded by the fact that they are open and react to time-varying input received from their environment. A reactive system must respond to each stimulus as it occurs, often needing to respond to many stimuli concurrently.

Over the last decade, systems biology research has yielded a plethora of data pertaining to biological systems. To facilitate further analysis and systems-level understanding, this information is often integrated into large-scale models using a variety of mathematical and computational approaches. A general class of models are so-called executable models.

Any given model class, be it a Boolean network or a system of stochastic differential equations, comes furnished with the appropriate framework and tools for analyzing system dynamics. For instance, to assess the sensitivity of a continuous dynamical system to small perturbations, we may compute the Lyapunov exponent based on Euclidean distance as a measure of state similarity.

The problem is greatly compounded once we move away from abstract models that entail significant assumptions and reductions of scale. For instance, a system of ordinary differential equations that models a genetic network assumes a well-mixed, homogeneous system of cells (all assumed to be in the same state) so that molecular concentrations can be used in the model; at the single-cell level, such a model is not appropriate because of low molecular counts. Fine-grained models that incorporate and bridge molecular (both intracellular and diffusible extracellular) and cellular information present a significant challenge to the study of system dynamics. Although a state (“snapshot”) of the system can be defined, encompassing information such as spatially varying molecular concentrations, cell types and positions of cells, or activation status and other functional states of individual cells, it is far less clear how to study the dynamics of a system that incorporates all this information. Loosely speaking, if we consider the collective information embodied in a state of the system, then system dynamics amounts to information flow; the information in a given state is related to the information in a predecessor state.

Classical Shannon information theory is based on modeling the distribution of symbols that need to be fixed in advance.

As a proof of principle, we apply the NCD-based analysis to a statecharts-based executable model of immune decision making and immune cell behavior.

A) A cartoon diagram of the simplified biological system underlying the modeling. HSP60 and aCD3 serve as activators of the Treg and nTh populations by binding TLR2 and CD3, respectively. The activated Tregs feature two effector molecules that communicate inhibitory signals to other effector T cells, thereby suppressing proliferation and inflammatory cytokine secretion (IFN-γ and TNF-α). CTLA-4 is a membrane-bound inhibitory signal that binds B7 molecules on effector T cells, thus requiring direct cell contact between Tregs and effector T cells. IL-10 is a secreted molecule that binds the IL-10 receptor on effector T cells and thus has spatial influence on the population. IFN-γ acts as an activator of the effector T cells.

We present a proof-of-concept methodology for analyzing the dynamics of complex biological models by building on and generalizing a previously published methodology demonstrated on Boolean network models and gene expression data.

To show the feasibility of our methodology, we applied it to the output of an executable model that simulates the effect of heat shock protein (HSP60) on the interactions between two populations of T cells (Tregs and nTh cells) and the results of these interactions.

The database that contains the data for the model holds several types of information: the number and types of cells and molecules in the system; discrete numerical parameters, such as concentrations and affinity levels; and general rules, such as which cells express which receptors, which molecules can potentially bind and interact, and the emergent cellular behavior of these interactions. The database is constructed so that its data can be easily changed, and the user can create different sets of initial parameters for the execution of the model, or different sets of interaction rules.

The spatial conformation of the model is a 20×20 2D grid. There are no limits on the number of cells that can be present on each grid location at any given time point, and there can also be several types of molecules with different concentrations on the same grid location. We started each execution with a total of 100 cells dispersed randomly across the grid.

The model is synchronous and simulates the changes in the dynamics of the system over 30 hours with time resolution of one hour. Thus, in our analysis

We executed the model under several different initial conditions: wild type, knock-outs of three major molecular components (IL-10, IFN-γ and CTLA-4), and different ratios of cell quantities of the two initial cell populations. We also generated a random model, in which the numerical parameters were random but the interaction rules remained the same as in the wild type model. For each initial condition we generated 50 runs.

We used algorithmic information theory
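As an illustration of the normalized compression distance, the following is a minimal sketch using the standard formula NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C denotes compressed length. The choice of `zlib` as the compressor and the toy byte strings standing in for serialized states are assumptions made for this example; the text does not state which compressor or state file format was used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both states
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical serialized states; a real state would be a simulation output file.
state_a = b"Treg:12;nTh:88;IL10:0.4;" * 40
state_b = b"Treg:12;nTh:88;IL10:0.4;" * 40
state_c = bytes(range(256)) * 4

print(ncd(state_a, state_b))  # small: the two states have identical content
print(ncd(state_a, state_c))  # larger: the two states share no structure
```

With a real compressor the distance between identical inputs is small but generally not exactly zero, because compressing the concatenation still costs a few extra bytes.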

We used non-metric multidimensional scaling (MDS) to visualize the state trajectories in a low dimensional space. The non-metric MDS algorithm was initialized with the solution of the classical multidimensional scaling algorithm. We used Kruskal’s stress criterion as an objective function.
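For reference, Kruskal's stress criterion is commonly written in the following "stress-1" form. The symbols are introduced here for illustration: $d_{ij}$ is the Euclidean distance between the embedded points $i$ and $j$, and $\hat{d}_{ij}$ is the corresponding disparity, that is, a monotone transformation of the original dissimilarity (here, the NCD). Which exact variant of the criterion was minimized is not stated in the text, so this should be read as the usual textbook form rather than the authors' exact objective.

```latex
S = \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

A value of $S$ close to zero means that the rank order of the low-dimensional distances closely matches the rank order of the NCD dissimilarities.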

For MDS, we created a global NCD distance matrix between all the states from each of the four “setups”: the wild type and the IL-10, IFN-γ and CTLA-4 knock-outs. For any two setups, the NCD distance matrix has a size of (50×31, 50×31) = (1550, 1550). For all four setups, the NCD distance matrix has a size of (4×1550, 4×1550) = (6200, 6200). The overall computational cost of building the NCD distance matrix is high, but it can be mitigated because the computation can be done in parallel (see
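The matrix sizes above, and one way to parallelize the pairwise computation, can be sketched as follows. This is an illustrative sketch only: the compressor (`zlib`), the use of a thread pool, the function names, and the toy states are all assumptions, and the sketch computes only the upper triangle and mirrors it, which treats the NCD as symmetric.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def csize(data: bytes) -> int:
    """Compressed length in bytes (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd_matrix(states, workers=4):
    """Pairwise NCD matrix; every pair is an independent task."""
    n = len(states)
    pairs = list(combinations(range(n), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        single = list(pool.map(csize, states))  # C(x), once per state
        joint = list(pool.map(lambda p: csize(states[p[0]] + states[p[1]]), pairs))
    dist = [[0.0] * n for _ in range(n)]  # diagonal set to 0 by convention
    for (i, j), cxy in zip(pairs, joint):
        d = (cxy - min(single[i], single[j])) / max(single[i], single[j])
        dist[i][j] = dist[j][i] = d  # mirror: symmetry is assumed here
    return dist


# Sizes taken from the text: 50 runs, 31 states per run, 4 setups.
runs, states_per_run, setups = 50, 31, 4
n_one = runs * states_per_run       # 1550 states per setup
n_all = setups * n_one              # 6200 states in total
n_pairs = n_all * (n_all - 1) // 2  # 19,216,900 distinct pairs to compress

# Hypothetical toy states standing in for simulation output files.
toy_states = [b"Treg;" * 200, b"Treg;" * 200, b"nTh;IFNg;" * 150]
dist = ncd_matrix(toy_states, workers=2)
```

Because each pair requires compressing a concatenation of two states, the cost grows quadratically with the number of states, which is why distributing the pairs across workers helps.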

To quantify the information flow within the system, we defined the divergence or convergence of states based on whether the distance between them increases or decreases, respectively, over time. We measured the distance between two states with the NCD. First, we computed the NCD between two states at times t_{1} = t_{2}. This was repeated for all possible pairs.
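One possible reading of this procedure is sketched below: for two runs, the NCD between their states at the same time point is paired with the NCD between their successor states, and the pair is labeled as converging or diverging. The pairing with successor states, the function names, the `zlib` compressor, and the toy runs are assumptions made for illustration; the description above is incomplete, so this may differ from the authors' exact procedure.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (zlib is an assumed compressor)."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def flow_points(run_a, run_b):
    """Pairs (NCD at time t, NCD at time t + 1) for two runs on a shared time grid."""
    steps = min(len(run_a), len(run_b)) - 1
    return [
        (ncd(run_a[t], run_b[t]), ncd(run_a[t + 1], run_b[t + 1]))
        for t in range(steps)
    ]


def classify(before: float, after: float, tol: float = 0.0) -> str:
    """Label a pair of states by how their distance changes in one step."""
    if after > before + tol:
        return "diverging"
    if after < before - tol:
        return "converging"
    return "neutral"


# Hypothetical toy runs: each run is a list of serialized states over time.
run_a = [f"Treg:{10 + t};nTh:{90 - t};".encode() * 30 for t in range(4)]
run_b = [f"Treg:{10 + 2 * t};nTh:{90 - t};".encode() * 30 for t in range(4)]

points = flow_points(run_a, run_b)
labels = [classify(before, after) for before, after in points]
```

Plotting the collected points for all pairs of runs, with the distance at time t on one axis and the distance at time t + 1 on the other, gives a density whose position relative to the diagonal indicates whether states tend to converge or diverge.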

For visualization, we chose contour levels for the estimated density such that we had high resolution in the regions of small probability values and low resolution in the regions of large probability values (see

The behavior of the simulated system under wild type conditions, in terms of the different cells and molecules present in the system, is shown in

a) Changes in the number of cells in each cell population over time, averaged over 50 simulations. b) Changes in the concentration levels of the secreted molecules over time, also averaged over 50 simulations. c) NCD over time between two consecutive states of one simulation. d) Changes over time in the NCD between the same state taken from two distinct simulations. The average of 50 simulations is shown with the 5th percentile confidence interval. The major changes in NCD in c) co-occur with the events in a) and b). There is a clear trend of divergence in d), as the distances between trajectories from different runs increase over time.
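The two NCD curves described in panels c) and d) can be sketched as follows. The function names, the `zlib` compressor, the averaging over all pairs of runs, and the toy runs are assumptions for illustration; the caption does not specify how pairs of simulations were selected for averaging.

```python
import zlib
from itertools import combinations
from statistics import mean


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (zlib is an assumed compressor)."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def consecutive_ncd(run):
    """Panel c): NCD between each state and its successor within one run."""
    return [ncd(run[t], run[t + 1]) for t in range(len(run) - 1)]


def between_run_ncd(runs):
    """Panel d): mean NCD between different runs at the same time point."""
    n_steps = min(len(run) for run in runs)
    return [
        mean(ncd(a[t], b[t]) for a, b in combinations(runs, 2))
        for t in range(n_steps)
    ]


# Hypothetical toy data: 3 runs of 5 serialized states each.
runs = [
    [f"run{k};hour{t};".encode() * (20 + 5 * t) for t in range(5)]
    for k in range(3)
]

within = consecutive_ncd(runs[0])  # one value per transition
across = between_run_ncd(runs)     # one value per time point
```

Peaks in the within-run curve would mark time points where the state changes substantially, while a rising across-run curve would indicate that independent runs drift apart.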

To see if the NCD is able to detect these events from the state data, we first compared two consecutive states (

To study the behavior of the system in more detail, we computed a distance matrix to measure the similarity of all the pairs of states (see

a) 3D state trajectories of wild type (WT), IL-10, IFN-γ and CTLA-4 knock-outs (KO); b) the Euclidean distance between the trajectories in relation to wild type, denoted as (

While the MDS analysis provides a clear insight into the system’s dynamics, it is based on a projection into a low dimensional Euclidean space. This representation contains only the most relevant information in terms of the minimized stress criterion. To obtain an alternative, more quantitative view into the dynamics, we study the information flow in the system over time (see

Information flow in a) random model, b) wild type (WT), c) IL-10, d) IFN-γ and e) CTLA-4 knock-outs (KO). While the information flow in the random model is focused in a narrow area, the wild type and knock-out simulations are more diverse, with each system exhibiting characteristic dynamical behaviors.

In addition to the wild type and knock-out conditions, we also analyzed the behavior of a random model (see

To assess the degree to which such global information-theoretic analysis can reflect experimentally quantifiable phenomena, we performed an additional simulation. So far, we have considered perfect knock-outs of secreted molecules. In real biological systems, a knock-out is hardly ever perfect; it only lowers the expression level of the target molecule to a certain degree. Such a partial knock-out can be quantified by measuring the expression of the molecule before and after the knock-out. Analogously, the effect of the knock-out could be observed from the representation of

Wild type IL-10 (WT) is compared with knock-outs at 25%, 50%, 75% and 100% efficiency. a) 3D representation of state trajectories. b) The Euclidean distance between wild type and knock-out trajectories is denoted as

Common practice in complex systems analysis is to analyze dynamical systems with model-class-specific tools. These are highly informative and effective in revealing the specific properties of the systems. However, because these methods are specially designed for each model class, comparing and generalizing results between classes is challenging. In our previous work, we have successfully applied algorithmic information theory to Boolean and ternary network models.

We used different data representation techniques that allowed us to observe the informational dynamics of a model of immune decision making and immune cell behavior. Multidimensional scaling based analysis allowed the representation of state trajectories in a low dimensional space. Such a representation makes it possible, for instance, to study the relationships and transitions between attractors or steady states of the underlying system. With partial knock-outs, we presented an example of how this approach can be used to produce experimentally quantifiable biological predictions in addition to general theoretical insights on the basis of system-level information dynamics.

We also studied the information flow in the systems. For this type of system, sampling the whole state space is not feasible, and thus we cannot build a complete map of the dynamical behavior. However, by using a randomized model as a background and analyzing the range of observed dynamics in different knock-outs, we were able to establish clear differences in dynamical behavior. We observed multiple domains where the information flow was constrained in comparison to random or wild type models. While we have shown that these observations can be informative about system dynamics, further development of systems theory is needed before the characteristics of such a limited sampling of state space can be tied to the global dynamics of the system.

Any computational model is a significant simplification of the real system. In this study, our model describes in a simplified fashion the capacity of regulatory T cells (Tregs) to suppress inflammatory T cell effector functions, and it characterizes the dependence of this suppression on Cytotoxic T-Lymphocyte Antigen 4 (CTLA-4), Interleukin-10 (IL-10) and interferon-gamma (IFN-γ) levels. The goal of the model is to capture the interplay between these molecules, which subsequently leads to the control of cell proliferation and differentiation, analogously to the events in real biological systems. Our analysis is able to identify proliferation and differentiation events from the time series of output files that contain the state of the entire system. The timing of these events in the NCD analysis is consistent with what we expect from the simulation of the model. The further analysis, in which we compare the trajectories of the system in response to different knock-outs, provides more in-depth knowledge of how the global behavior of the system is altered. Such system-level properties cannot be quantified directly from the model output or from observing the biological system by any single experiment. Predictions derived from the system-level analysis, such as the expression level needed for an effective knock-out of the molecules, can be tested directly.

In recent years, demand for multiscale models of biological systems has grown, along with their development. These models capture multiple abstraction levels such as genes, proteins, cells, tissues, organs and even whole organisms. The dynamics of the biological systems captured by these models are inherently highly complex. Studying the global dynamics of biological systems has the potential to generate insights regarding the general state of the system. For example, some aspect of the global dynamics of a certain system may correspond to a shift that the system goes through from a healthy state towards a disease state. Predictive models could be used to explore possible parameters that cause the simulated system to shift towards such a dynamical state of disease. These parameters could then potentially be experimentally measured or even manipulated in the lab. Similarly, parameter changes could be identified that trigger the system to return to its normal healthy state. Even though the model we used as a case study in this paper is a somewhat simplistic example of a fine-grained simulation model, the information theoretic approach we present offers a method of understanding system dynamics and using it for optimization or control.

In conclusion, we have shown that algorithmic information theory provides a suitable framework for the analysis of fine-grained molecular simulation models and arbitrary model classes, making model comparison more straightforward. The methodology based on this framework is applied to a state description of the system, which is an output of a simulation of the model; it is therefore independent of the modeling framework and can be applied to models at various abstraction levels. The analysis of the dynamics of an immune system model also demonstrated the potential of this approach to make experimentally quantifiable predictions. This approach can be easily applied as-is to more detailed and highly complex multiscale models. We believe that the proposed methodology is a step forward in understanding the dynamics of complex multiscale biological models that represent the behavior of the whole cell or even organism.
