The authors have declared that no competing interests exist.

A major goal in computational biology is to develop models that accurately predict a gene's expression from its surrounding regulatory DNA. Here we present one class of such models, thermodynamic state ensemble models. We describe the biochemical derivation of the thermodynamic framework in simple terms, and lay out the mathematical components that make up each model. These components include (1) the possible states of a promoter, where a state is defined as a particular arrangement of transcription factors bound to a promoter, (2) the binding constants that describe the affinity of the protein–protein and protein–DNA interactions that occur in each state, and (3) whether each state is transcriptionally active. Using these components, we demonstrate how to compute a

Modern molecular biology and genomics methods allow investigators to readily assay protein and mRNA expression levels and identify interactions between proteins, RNA, and other cellular components. Leveraging these data to understand the functional significance of interactions on gene expression is a key challenge in computational biology. The recent application of thermodynamic models to gene regulation is an exciting development, as each model reflects a specific, testable hypothesis regarding the physical architecture of the underlying molecular system

Though a gene is regulated at every step of transcription and translation, a large component of regulation operates at the level of the promoter

Here we show the biochemical derivation of the thermodynamic framework used to model promoter activity. The derivation is presented in a form that can be readily coded in any programming language, allowing readers to develop

A model of

Importantly, there is no hypothesis-independent form of the

Two approaches have been used to formulate

The “thermodynamic model” is a framework for constructing a set of states that collectively encode the rules of transcription for a particular promoter. Each state represents a particular number and arrangement of transcription factors bound to a DNA template. Some states are transcriptionally active while others remain transcriptionally dormant. All states occur at some point, but their contributions to transcription are weighted by their relative stabilities. In this formulation,

Generating a model requires writing down all possible states a promoter may adopt in the form of a binding polynomial,

A basal promoter is composed of two states, one where DNA is bound with RNAP and is transcriptionally active, and another where DNA is free and inactive.

This DNA-centric binding polynomial enumerates the two mutually exclusive states of a basal promoter; either DNA is free or bound by RNAP. From

Equation 3 is a basic

We can reformulate Equation 3 in terms of its component free species and their association constants. The apparent association constant for the binding of RNAP to DNA is

This simplification presumes that the concentrations of all cofactors required to form the RNAP complex are invariant. Solving for

The denominator of the right-hand side of Equation 5 is called the biochemical partition function (
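The two-state basal promoter described above can be sketched in a few lines of code. This is a minimal illustration, assuming that free DNA carries a statistical weight of 1 and RNAP-bound DNA a weight equal to the apparent association constant times the free RNAP concentration; the function and parameter names (`basal_probability`, `k_rnap`, `rnap`) are hypothetical and the numerical values are arbitrary.

```python
def basal_probability(k_rnap: float, rnap: float) -> float:
    """Probability that a basal promoter is in the RNAP-bound (active) state.

    Assumed state weights (an illustration, not values from the text):
      free DNA        -> 1
      RNAP-bound DNA  -> k_rnap * rnap
    """
    weight_free = 1.0
    weight_bound = k_rnap * rnap
    partition_function = weight_free + weight_bound  # sum over all states
    return weight_bound / partition_function


if __name__ == "__main__":
    # Arbitrary concentrations in units where k_rnap = 1.
    for rnap in (0.0, 0.5, 1.0, 10.0):
        print(rnap, basal_probability(k_rnap=1.0, rnap=rnap))
```

The denominator computed inside the function plays the role of the partition function: it is the sum of the weights of every state the promoter is allowed to adopt.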

Several other manipulations of these equations are employed in the literature. In addition to writing states in terms of free species concentrations, Shea and Ackers substitute association constants with Boltzmann weights
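The substitution of association constants with Boltzmann weights can be illustrated as follows. This is a sketch that assumes the standard relation K = exp(-ΔG/RT) with free energies in kcal/mol and concentrations expressed relative to a 1 M standard state; the helper names and the default temperature are illustrative choices, not part of the original derivation.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_weight(delta_g: float, temperature: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) into an association constant."""
    return math.exp(-delta_g / (R_KCAL * temperature))


def basal_probability_from_energy(delta_g: float, rnap: float,
                                  temperature: float = 298.15) -> float:
    """Basal promoter probability with the RNAP state weighted by exp(-dG/RT)*[RNAP]."""
    weight_bound = boltzmann_weight(delta_g, temperature) * rnap
    return weight_bound / (1.0 + weight_bound)


if __name__ == "__main__":
    # More favorable (more negative) binding energies give larger weights.
    for dg in (-2.0, -1.0, 0.0, 1.0):
        print(dg, boltzmann_weight(dg))
```

Working with free energies rather than association constants changes only the parameterization; the state weights and the resulting probabilities are identical.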

The framework suggested by Shea and Ackers allows great flexibility for assembling models to reflect a wide variety of mechanisms and behavior.

For any particular system, construction of a thermodynamic

(A) Four states are allowed in this example, two where transcription is inactive (states 1 and 3) and two states where transcription is active (states 2 and 4). (B) The

Proteins/complexes are represented as ovals, binding sites as rectangles. (A) Repressor-RNAP competition with activator release model, see

Vector

Lastly, we define vector

Generally, for any architecture

For our example,

The

Completely independent binding of transcription factor and RNAP implies that the presence of TF has no bearing on the probability of RNAP being bound, a scenario reflected in the equation by factoring and canceling out the TF terms, revealing our basal promoter function:
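The factoring step can be written out explicitly. The sketch below assumes four states (free DNA, RNAP only, TF only, and TF with RNAP), with the doubly bound state weighted by the simple product of the two single-site weights. The symbols $K_{\mathrm{RNAP}}$, $K_{\mathrm{TF}}$, $[\mathrm{RNAP}]$, and $[\mathrm{TF}]$ are assumed notation and may differ from the symbols used in the original equations.

```latex
\begin{align}
P_{\text{active}}
  &= \frac{K_{\mathrm{RNAP}}[\mathrm{RNAP}]
           + K_{\mathrm{TF}}[\mathrm{TF}]\,K_{\mathrm{RNAP}}[\mathrm{RNAP}]}
          {1 + K_{\mathrm{RNAP}}[\mathrm{RNAP}] + K_{\mathrm{TF}}[\mathrm{TF}]
           + K_{\mathrm{TF}}[\mathrm{TF}]\,K_{\mathrm{RNAP}}[\mathrm{RNAP}]} \\
  &= \frac{K_{\mathrm{RNAP}}[\mathrm{RNAP}]\,\bigl(1 + K_{\mathrm{TF}}[\mathrm{TF}]\bigr)}
          {\bigl(1 + K_{\mathrm{RNAP}}[\mathrm{RNAP}]\bigr)\bigl(1 + K_{\mathrm{TF}}[\mathrm{TF}]\bigr)} \\
  &= \frac{K_{\mathrm{RNAP}}[\mathrm{RNAP}]}{1 + K_{\mathrm{RNAP}}[\mathrm{RNAP}]}
\end{align}
```

The TF factor appears in both numerator and denominator and cancels, so the TF has no influence on the probability of RNAP being bound.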

In order for the TF to affect binding of the polymerase we must introduce a cooperative binding term

The cooperative term

When constructing a thermodynamic model, an investigator explicitly selects the number of binding sites, decides which proteins bind to each site, determines whether a state is transcriptionally active, and assigns cooperative interactions between binding partners. The resulting
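These modeling choices map naturally onto a small data structure. The following is a minimal sketch, not a definitive implementation: each state carries a name, a statistical weight, and a flag for whether it is transcriptionally active. The `State` class, the function names, and the cooperativity parameter `omega` are hypothetical names introduced for illustration. The four states of the one-TF example are listed in the order free DNA, RNAP, TF, TF with RNAP, which is an assumed ordering.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class State:
    name: str
    weight: float  # product of K*[X] terms and any cooperativity factors
    active: bool   # True if this state produces transcripts


def transcription_probability(states: Sequence[State]) -> float:
    """Sum of active-state weights divided by the partition function."""
    partition = sum(s.weight for s in states)
    active = sum(s.weight for s in states if s.active)
    return active / partition


def tf_rnap_states(k_rnap: float, rnap: float, k_tf: float, tf: float,
                   omega: float = 1.0) -> List[State]:
    """One TF site plus RNAP; omega is the TF-RNAP cooperativity (1 = independent)."""
    p = k_rnap * rnap
    t = k_tf * tf
    return [
        State("free DNA", 1.0, False),
        State("RNAP", p, True),
        State("TF", t, False),
        State("TF + RNAP", omega * t * p, True),
    ]


if __name__ == "__main__":
    for omega in (0.1, 1.0, 10.0):
        states = tf_rnap_states(1.0, 1.0, 2.0, 3.0, omega=omega)
        print(omega, transcription_probability(states))
```

With `omega = 1` the TF has no effect and the result equals the basal probability; values above 1 make the TF act as an activator and values below 1 make it act as a repressor. Changing the architecture amounts to changing the list of states, not the probability calculation.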

Modular Michaelis-like functions have also been used to model

One subtlety of the Michaelis-like models is that there is no uniform definition of the basal rate. To illustrate this, consider two promoters, one with a single binding site for an activator and the other with a single site for a repressor. The corresponding models are given in Equations 14 and 15:

One might expect that removing the effect of the TF in either the single activator or single repressor model would cause reversion to the same basal rate. This is not the case. In the single activator model setting,

What is the physical interpretation of the Michaelis function architecture? By converting the Michaelis model formulations above (Equation 13) into thermodynamic functions we will reveal assumptions underlying Michaelis-like models that are not obvious in their original formulation. The steps involved in converting one model to the other also highlight the similarity between these models, and demonstrate that the Michaelis formulation is simply a thermodynamic model with specific

We can reconcile the thermodynamic model with the Michaelis framework by treating polymerase as an activator. Since polymerase is required for transcription, we incorporate the basal thermodynamic function (Equation 5) into the Michaelis-like formulation, Equation 13, as an activator function (Equations 16 and 17).

Comparing Equations 16 and 17 with Equation 13 illustrates that the original Michaelis-like function requires the assumption that

The asymmetry in the way Michaelis functions treat RNAP becomes clear when they are recast in the thermodynamic framework. Consider the following Michaelis-like models: activator only (Equation 18), repressor only (Equation 19), and one activator and one repressor (Equation 20).

Adding in the polymerase function as in Equation 16 and multiplying out the terms, we generate the following expressions.

Comparing the resulting models shows that the Michaelis-like activator and repressor functions treat the state in which only RNAP is bound very differently. A one-activator promoter (Equation 21) transcribes only when both RNAP and activator are present, as represented by the sole numerator term. The presence of the

Recasting the original Michaelis-like functions as thermodynamic ensemble models also highlights their implicit AND-circuitry. Including both an activator and a repressor in the Michaelis-like formulation yields a model with only a single term in the numerator (Equation 23), meaning that transcripts are generated only when the activator is bound and the repressor is not. Larger numbers of transcription factors continue this pattern. For example, a model with two or more activators requires that all activators be bound for transcription, and a model with two or more repressors requires that none of the repressors be bound. In a mixed system with multiple activators and repressors, the trend set by the one-activator, one-repressor model (Equation 23) prevails: transcripts are produced only when all activators accompany polymerase and no repressors are bound. Investigators must decide whether this constraint is valid for their system when employing Michaelis-like functions.
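The AND-circuitry can be checked numerically. The sketch below assumes the common Michaelis-like forms K[A]/(1 + K[A]) for an activator (and for polymerase treated as an activator) and 1/(1 + K[R]) for a repressor; these forms and all function names are assumptions for illustration. It compares the product of the three factors with an explicit eight-state ensemble in which the only active state has RNAP and activator bound and repressor absent.

```python
from itertools import product


def michaelis_product(kp_p: float, ka_a: float, kr_r: float) -> float:
    """Product of assumed Michaelis-like terms; arguments are K*[X] products."""
    polymerase = kp_p / (1.0 + kp_p)
    activator = ka_a / (1.0 + ka_a)
    repressor = 1.0 / (1.0 + kr_r)
    return polymerase * activator * repressor


def ensemble_equivalent(kp_p: float, ka_a: float, kr_r: float) -> float:
    """Explicit ensemble over all 2**3 occupancy states with independent binding."""
    site_weights = (kp_p, ka_a, kr_r)  # order: RNAP, activator, repressor
    partition = 0.0
    active = 0.0
    for occupancy in product((0, 1), repeat=3):
        weight = 1.0
        for bound, w in zip(occupancy, site_weights):
            if bound:
                weight *= w
        partition += weight
        # Only one state transcribes: RNAP bound, activator bound, repressor absent.
        if occupancy == (1, 1, 0):
            active += weight
    return active / partition


if __name__ == "__main__":
    print(michaelis_product(2.0, 3.0, 4.0), ensemble_equivalent(2.0, 3.0, 4.0))
```

The two functions agree because multiplying out the three denominators enumerates every occupancy state, while the numerator retains only the single active state.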

The implicit AND logic associated with Michaelis-like functions leads to a seeming paradox: the more activators a promoter contains, the lower its expression. This is because the probability of having all activators bound at the same time decreases as the number of activator binding sites in a promoter increases. This seeming paradox, together with the general AND-circuitry associated with this formalism, has led some groups to produce an OR-logic function for activators (Equation 24) and repressors (Equation 25).
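A short calculation makes the paradox concrete. This sketch assumes each activator contributes a Michaelis-like factor w/(1 + w), where w stands for the product of affinity and concentration, and that the factors are multiplied together; the function name and the choice of w = 1 are illustrative.

```python
from typing import Iterable


def and_logic_activation(occupancy_weights: Iterable[float]) -> float:
    """Product of assumed per-activator terms w / (1 + w)."""
    result = 1.0
    for w in occupancy_weights:
        result *= w / (1.0 + w)
    return result


if __name__ == "__main__":
    # Identical activators, each bound half of the time (w = 1).
    for n_sites in range(1, 5):
        print(n_sites, and_logic_activation([1.0] * n_sites))
```

Because every factor is less than 1, each additional activator site can only reduce the predicted expression under AND logic.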

The activator function involves addition rather than multiplication of individual transcription factor effects. Following the same steps outlined above, one can show that the OR-logic model here no longer produces zero expression when any single activator concentration (or affinity) goes to zero. However, if all activator concentrations are zero, transcription is abolished, implying that some activator (of either type) is required to produce transcripts.

To allow basal expression even in the absence of transcription factors, some groups

These are reasonable models provided that the mechanisms described appropriately reflect the logic of the system being modeled. Michaelis-like functions can be a simple and powerful framework for modeling many types of regulatory logic. The purpose of reformulating these models in the thermodynamic framework was to demonstrate that Michaelis-like functions are simply one type of thermodynamic model. The assumptions that underlie these particular models, which are easy to see in the thermodynamic framework, are likely to be valid for many, but not all types of

Some regulatory mechanisms require the use of the more general thermodynamic framework. For example, a repressor might function by directly blocking polymerase binding, so that simultaneous binding of polymerase and repressor does not occur
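The difference between direct competition and a Michaelis-like repressor can be sketched by listing the allowed states. In the competitive model below, the state with both RNAP and repressor bound is simply left out of the partition function. In the Michaelis-like comparison, which assumes the repressor form 1/(1 + K[R]), that state is allowed but inactive. Function names are hypothetical, and the arguments are K·[X] products.

```python
def competitive_repression(kp_p: float, kr_r: float) -> float:
    """Repressor and RNAP compete for DNA.

    Allowed states: free DNA (1), RNAP bound (kp_p), repressor bound (kr_r).
    The RNAP + repressor state is disallowed, so it has no term.
    """
    return kp_p / (1.0 + kp_p + kr_r)


def independent_repression(kp_p: float, kr_r: float) -> float:
    """Assumed Michaelis-like repressor recast as an ensemble.

    All four states are allowed; only the RNAP-alone state is active.
    """
    return kp_p / ((1.0 + kp_p) * (1.0 + kr_r))


if __name__ == "__main__":
    for kp_p in (1.0, 10.0, 1000.0):
        print(kp_p, competitive_repression(kp_p, 2.0), independent_repression(kp_p, 2.0))
```

Under competition, a sufficiently high polymerase occupancy weight overcomes the repressor, whereas in the Michaelis-like form the output can never exceed 1/(1 + K[R]).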

Cooperativity is a repulsion or attraction between proteins on the surface of DNA such that the sum of the free energies of proteins binding independently differs from the energy of the proteins binding together. We discussed cooperativity in the thermodynamic framework using Equation 10. Another commonly used method to capture cooperativity is the addition of Hill coefficients (

These functions are known as Hill functions

The assumption of extreme cooperativity must be made in order to convert the thermodynamic model into a Hill function. Consider a promoter with two binding sites for an activator, A. The two A proteins exhibit positive cooperative binding with constant

This model is not directly comparable to the Hill function in Equation 26. In order to reduce this model to a form that is comparable to the Hill model, we must further assume that the TF affinity for DNA is small and the cooperative binding constant large (

The polymerase binding term can now be factored out.
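The limit of weak individual binding and strong cooperativity can be checked numerically. This sketch considers only the activator-occupancy part of the model, with polymerase left out, and assumes two identical sites with association constant `k_a` and cooperativity `omega`. It compares the probability that both sites are occupied with a Hill function of coefficient 2 whose effective constant is `k_a * sqrt(omega)`. All names and parameter values are illustrative assumptions.

```python
import math


def two_site_doubly_bound(k_a: float, a: float, omega: float) -> float:
    """Probability that both activator sites are occupied.

    States: empty (1), either single site bound (2 * k_a * a),
    both bound with cooperativity (omega * (k_a * a) ** 2).
    """
    x = k_a * a
    return omega * x * x / (1.0 + 2.0 * x + omega * x * x)


def hill(k_eff: float, a: float, n: int = 2) -> float:
    """Hill function with effective association constant k_eff."""
    y = (k_eff * a) ** n
    return y / (1.0 + y)


if __name__ == "__main__":
    a = 1.0
    for k_a, omega in ((1.0, 1.0), (1e-1, 1e2), (1e-3, 1e6)):
        k_eff = k_a * math.sqrt(omega)  # equals 1.0 for every pair listed here
        print(k_a, omega, two_site_doubly_bound(k_a, a, omega), hill(k_eff, a))
```

As affinity falls and cooperativity rises with the effective constant held fixed, the singly bound states become negligible and the two expressions converge. Without strong cooperativity they differ substantially.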

The right-hand term in Equation 30 is the basal promoter function and the left-hand term is the new activator function, which is now directly comparable to Equation 26. The key point is that in order to convert the thermodynamic framework into the Hill framework we must assume that

A practical realization of extreme cooperativity is the oligomerization of TFs prior to binding. While the model above implies that TFs are monomeric in solution and

Using expression-profiling methods, investigators routinely collect large quantities of gene expression data. A mature and robust quantitative framework would draw meaningful conclusions from these rich but complex datasets. Here we derived a thermodynamic state ensemble framework for capturing

The flexibility of the thermodynamic formalism makes it simple to model different promoter architectures and regulatory mechanisms. Discrete promoter states determine the overall architecture of the model, with individual states constructed from the product of activities of DNA-bound molecules. The balance between productive and silent states determines the probability of transcription (

Michaelis-like models are simplified forms of the thermodynamic framework. Each type of Michaelis-like

In some cases, investigators must employ the more general form of the thermodynamic framework. For example, repressors might inhibit transcription by binding directly to the RNAP binding site, a mode of repression that cannot be specifically represented using the Michaelis formulation. Such a mechanism can be captured by a thermodynamic state ensemble model in which one disallows the state in which both RNAP and repressor are simultaneously bound (for examples, see

With the exception of a few well-characterized systems like

The authors thank members of the Cohen lab for critical reading of the manuscript.