The authors have declared that no competing interests exist.

Provided initial direction and on-going feedback on analyses: CZ RS. Analyzed the data: CP. Contributed reagents/materials/analysis tools: CP. Wrote the paper: CP.

Networks are often used to understand a whole system by modeling the interactions among its pieces. Examples include biomolecules in a cell interacting to provide some primary function, or species in an environment forming a stable community. However, these interactions are often unknown; instead, the pieces' dynamic states are known, and network structure must be inferred. Because observed function may be explained by many different networks (

Researchers across several disciplines have increasingly focused on network-based descriptions of complex behavior, with notable success in systems biology. That focus is essential to connecting knowledge across various scales – from cell chemistry to individual organisms, individuals to populations, and populations to ecosystems – and is driven by the increasing availability of high-fidelity data, the computational processing power to digest those data and test hypotheses against them, and high-profile applications from designer pharmaceuticals to environmental policy

In molecular biology, Kauffman provided one of the earliest applications of network-based thinking several decades ago

Work in the network perspective does not usually focus on the exact details of the dynamics associated with a particular network. Consider the work on Boolean activation-inhibition network models, of the type initially introduced for the fruit fly (

What if, instead of assuming the network, one assumes that primary function? In molecular biology, this roughly means starting from a gene expression time series (microarray time course data) and asking the question: what can we say about the possible interactions that would exhibit these dynamics? That is the question we have been asking in our recent work, and we use it to frame a complementary

First, we use

Second, we show how to approximate the Derrida plot, a popular measure of ordered versus chaotic behavior. We also apply this to the putative yeast cycle network to compare it with the

Then, we quantify which experiments will likely best identify a unique network structure that supports some observed phenomena. In molecular biology, even with the downward trends in the cost of experiments and data analysis, the demand for data across a plethora of systems remains voracious enough that optimizing the choice of tests seems eminently practical; at other scales (for example, ecological), extensive testing remains implausible. Other groups have used a similar Shannon-entropy-based approach

Finally, we propose how

We close by reviewing open questions for

For (1), we are not aware of a physical system that switches between networks stochastically, but that idea shares some parallels to models of protein folding, specifically stochasticity in intermediate conformations leading to well-defined outcomes

For (2), we calculated the Shannon entropy from

For (3), we outline how

The Boolean Network Model is

• a system of parts

• at time

• an

For our case, we use two types of interactions:

We use a single rule to update all parts, typically referred to as

The

A final note on the update rule and interactions: many Boolean network models do not use a system-wide update rule paired with interactions, instead encoding individual rules for each part without reference to interactions. Of course, an overall rule plus varying interactions encodes “different” rules for each part; in some sense, the system-wide rule is a reductionist explanation of the individual rules. A single system-wide rule may prove too optimistic a reduction for many interesting systems; fortunately,

The Strong Inhibition rule is deterministic for

As typical inputs, we have

We denote collections like

For a Boolean system with

We define the inverse of a dynamic,

or,

For our results, we use an exact set counting function

Finally, we must note: while

In Boolean dynamics, a point attractor is a system state which is static.
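As a minimal sketch of this definition, the following enumerates every state of a small Boolean network and keeps those that map to themselves under one synchronous update. The update function is an assumption: it uses a commonly cited form of the Strong Inhibition rule (any active inhibitor switches the target off; otherwise any active activator switches it on; otherwise the state persists), which may differ in detail from eq. (1). The names `strong_inhibition_step`, `point_attractors`, and `interactions` are hypothetical and are not taken from our code base.

```python
from itertools import product


def strong_inhibition_step(state, interactions):
    """Apply one synchronous update to every part.

    interactions[i][j] is -1 (j inhibits i), +1 (j activates i), or 0 (none).
    ASSUMED rule: inhibition dominates, then activation, else no change.
    """
    next_state = []
    for i, row in enumerate(interactions):
        inhibited = any(w < 0 and s for w, s in zip(row, state))
        activated = any(w > 0 and s for w, s in zip(row, state))
        if inhibited:
            next_state.append(0)
        elif activated:
            next_state.append(1)
        else:
            next_state.append(state[i])  # no active input: state persists
    return tuple(next_state)


def point_attractors(interactions):
    """Return all states s with step(s) == s, i.e. the static states."""
    n_parts = len(interactions)
    return [
        s
        for s in product((0, 1), repeat=n_parts)
        if strong_inhibition_step(s, interactions) == s
    ]


# Toy two-part network: part 0 activates part 1, part 1 inhibits part 0.
toy = [[0, -1],
       [1, 0]]
attractors = point_attractors(toy)
```

Exhaustive enumeration scales as 2^N in the number of parts, so this sketch is only suitable for small systems.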

For each of the dynamics in our

This plot compares the distribution of point attractor number as calculated based on

There are some caveats – as should be obvious from the figure (and to be expected, per

For stochastic systems, these distributions of attractor sizes would be analytically exact. Thus, the

We conclude overall that

Since a Derrida Plot conventionally measures deterministic Hamming distance, we need to develop a stochastic alternative. We propose that the classic expected value calculation of Hamming distance is a suitable stochastic substitute:

We define the matrix

For

This plot compares the putative yeast cell cycle network (grey) and

This is, of course, a single-point comparison. The result does not invalidate the idea of a function-based Derrida plot, but there is work to be done before considering it useful. In our concluding remarks, we discuss that work and propose a preliminary interpretation of this single-point result.
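The expected-value form of the Hamming distance described in this section can be sketched as follows. This is an illustration under stated assumptions, not the calculation used for the figure: states are encoded as integers whose bits are the part states, `T` is a row-stochastic matrix with `T[a, u]` the probability of moving from state `a` to state `u`, and the successors of the two compared states are assumed to be drawn independently. The function names are hypothetical.

```python
import numpy as np


def hamming_matrix(n_parts):
    """H[u, v] = number of parts whose states differ between u and v."""
    idx = np.arange(2 ** n_parts)
    xor = idx[:, None] ^ idx[None, :]
    # Count the set bits of each XOR, one bit position at a time.
    return sum((xor >> k) & 1 for k in range(n_parts))


def derrida_curve(T, n_parts):
    """Mean expected successor distance for each initial Hamming distance.

    ASSUMPTION: the two successors are drawn independently, so
    after[a, b] = sum_{u, v} T[a, u] * T[b, v] * H[u, v].
    """
    H = hamming_matrix(n_parts)
    after = T @ H @ T.T
    return [float(after[H == d].mean()) for d in range(n_parts + 1)]


# Deterministic check: the identity map preserves every distance.
curve_identity = derrida_curve(np.eye(4), 2)
# Fully random dynamics: every row is uniform over the four states.
curve_uniform = derrida_curve(np.full((4, 4), 0.25), 2)
```

Under the independence assumption, two identical starting states can have a nonzero expected distance after one step (as in `curve_uniform`), which cannot occur for a deterministic map.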

When

Each row

However, our results indicate that such dependence may be irrelevant by comparing to a “naïve” entropy. This naive entropy is the uncertainty expected based on the Strong Inhibition rule and equiprobable interactions among parts (calculating this value is discussed in the

We also tested scalar measures that can be derived from

To compare experimental selection based on Shannon entropy, we simulated initial condition experiments on a network selected randomly from a class by calculating the transition from that initial state based on that particular network. We performed these simulations for 10,000 network samples from each of the dynamics in our
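One way to order candidate experiments by Shannon entropy is sketched below. It assumes a row-stochastic matrix `T` whose row for a given initial condition holds the probabilities of each possible next state, and it assumes that higher-entropy rows are tried first; both the ordering direction and the names `row_entropies` and `rank_initial_conditions` are illustrative assumptions rather than a description of our code.

```python
import numpy as np


def row_entropies(T):
    """Shannon entropy (in bits) of each row of a row-stochastic matrix."""
    T = np.asarray(T, dtype=float)
    # Replace zeros by 1 inside the log so that 0 * log2(0) contributes 0.
    safe = np.where(T > 0, T, 1.0)
    return -(T * np.log2(safe)).sum(axis=1)


def rank_initial_conditions(T):
    """Indices of initial conditions, most uncertain outcome first.

    ASSUMPTION: a higher-entropy row marks a more informative experiment.
    Ties keep their original order (stable sort).
    """
    return np.argsort(-row_entropies(T), kind="stable")


# Toy matrix: rows 0 and 3 are fully determined, row 2 is maximally uncertain.
T_toy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.0, 0.0, 0.0, 1.0],
])
entropies = row_entropies(T_toy)
order = rank_initial_conditions(T_toy)
```

A row with zero entropy corresponds to an initial condition whose outcome is the same for every network in the class, so an experiment from that state cannot distinguish among them.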

We selected initial conditions based on three orders: (1) Shannon entropy from

This plot is a histogram comparing number of experiments to determine an underlying network given partial initial information. The categories are the number of simulated experiments using

As a complementary application to recommending which tests to perform, we also considered treating

It is not obvious how to determine ROC for

So we instead opted to tackle a more straightforward question about the ROC for identifying interactions. Instead of using

The ROC for identifying the flexible and free inhibition interactions in the yeast cell network.

The ROC for identifying the flexible and free activation interactions in the yeast cell network.

The ROC for identifying non-interactions between components in the yeast cell network.

To determine if a class-defining model does capture population diversity, we can make predictions based on that model and experimentally test them. In broad strokes, using a particular model –

compute

conduct an experiment, translated to model states, that forces part(s) to be (in)active, precludes a fixed interaction,

measure the population proportions of different responses

compute the proportion based on

As a concrete example, posit that the putative yeast cell cycle process adequately models wild-type yeast for the purposes of predicting their diversity. This could be tested by gathering or growing yeast under conditions that maintain diversity, then exposing them to an environmental change that affects the cell cycle and is included in the model. Continue to measure the growth rate of the yeast under this condition, and calculate the defect in that growth rate compared to typical conditions. From that, calculate the proportion of yeast suffering some inhibitory (or lethal) effect from that environment. With the data obtained, identify which rows in

The researchers might discern order of magnitude effects (

As to the question of a more general measure of phenotypic diversity surrounding a function, we have not yet settled on a specific and useful calculation, but we suspect there may be a scalarization of

A complementary application of

This framework also obviously allows comparison of

We have shown that, for the Strongly Inhibited Boolean Network model, it is

• practical to compute the superposition stochastic matrix

• accurate to use

• useful to select experiments based on Shannon entropy from

We have also shown how to calculate a Derrida plot and provide phenotype diversity predictions based on

Our own investigations of open questions associated with

We also proposed open questions specifically for the Derrida plots and phenotype diversity. Relative to the Derrida plot, there is an obvious general question about what exactly is being measured. Practically, we think comparing the plots across various functional inputs would provide a useful starting point. As part of that survey, we think that the small Hamming distance region deserves special attention; recall that, for the yeast cell cycle, this region presented a result that is impossible for a deterministic system. We posit that there may be a quantitative meaning to this result in light of the three interpretations we offered for

Relative to phenotypic diversity, interpretation of

Other publications invoking the Strong Inhibition rule present different formulations; eq. (1) is equivalent to those after adding interaction constraints dependent on the particular formulation. For example, most formulations do not allow self-inhibition: adding the

Though these specific cases require an extra constraint, overall eq. (1) simplifies representation, reducing our algorithm's lines-of-code complexity without degrading performance and generalizing it to cover more phenomena within the same framework. This generalization comes from including self-directed interactions: if the part is already active, it can send a signal to stay active or deactivate. Allowing these self-interactions and having

The analysis sections are independent of this formulation (though the Shannon entropy calculation would require some trivial modifications), but we include it to limit any future confusion comparing our code base to this publication, and because we feel it is enough of an improvement to warrant general community adoption. Finally, this formulation is particularly conducive to representing variables–system states

First, the number of transitions to be considered as perturbations grows rapidly with the system size

However, we made the naive calculation of eq. (2) more practical with analytically equivalent modifications:

Transitions can be calculated independently for each element, then combined into overall transitions. That is, for an initial

The definitions in eq. (12) are complementary, as we implied by their notation:

Finally, the additional transition constraints can be considered against the known results of

Taken together, these modifications substantially reduce the computation time. Anecdotally, on a two-core, 2.4 GHz system the yeast-sized systems (

We calculated these distributions by repeatedly multiplying the previous distribution of attractor counts by

For cases where several of the

We obtained

The free interactions are chosen uniformly from the available options:

We then calculate all of the pertinent dynamic transitions for each sample;

In the Boolean network model using the Strong Inhibition rule, an initial state only has effects through its active parts. In the naive case, we have no knowledge about the interactions among these parts, so they are indistinguishable. A target part's probability of being active at

so given the equiprobability:

Of course, the entropy for a single element is

We are grateful to Burton Singer and Juliet Pulliam for feedback on the draft of this work.