The author has declared that no competing interests exist.

Protein structure can be predicted in silico given sufficiently good templates, as demonstrated in successive installments of the biannual Critical Assessment of protein Structure Prediction (CASP) competition […].

The central ingredient in DCA is to learn generative probabilistic models from a set of homologous protein sequences. These models are chosen from an exponential family with linear and quadratic interactions, commonly referred to as Potts models.

Maximum entropy was introduced by E. T. Jaynes in two papers published in 1957 […].

For a positive evaluation of maximum entropy in science and an entry point to the more recent literature, I refer to the companion paper by Erik van Nimwegen […].

The maximum entropy argument for DCA starts from a multiple sequence alignment of homologous protein sequences and posits that each sequence is an independent draw from a probability distribution maximizing entropy while constrained by the single-site amino acid frequencies f_i(σ_i) and the pair frequencies f_ij(σ_i, σ_j). The solution is a Potts model,

P(σ_1, …, σ_N) = (1/Z) exp( Σ_i h_i(σ_i) + Σ_{i<j} J_ij(σ_i, σ_j) ),

with linear (h_i(σ_i)) and quadratic (J_ij(σ_i, σ_j)) interactions and a normalization constant Z.
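The constraints above are just counting statistics of the alignment. As a minimal sketch (a toy alignment and alphabet, not a real DCA pipeline), the single-site and pair frequencies can be tabulated as follows:

```python
import numpy as np

# Toy alignment: M sequences of length N over a small alphabet
# (real DCA uses q = 21 states: 20 amino acids plus the gap symbol).
msa = np.array([
    [0, 1, 2, 1],
    [0, 1, 2, 2],
    [1, 1, 0, 1],
    [0, 2, 2, 1],
])
M, N = msa.shape
q = 3  # alphabet size in this toy example

# Single-site frequencies f_i(a): fraction of sequences with state a at column i.
f_i = np.zeros((N, q))
for i in range(N):
    for a in range(q):
        f_i[i, a] = np.mean(msa[:, i] == a)

# Pair frequencies f_ij(a, b): fraction with state a at column i and b at column j.
f_ij = np.zeros((N, N, q, q))
for i in range(N):
    for j in range(N):
        for a in range(q):
            for b in range(q):
                f_ij[i, j, a, b] = np.mean((msa[:, i] == a) & (msa[:, j] == b))

# By construction, summing f_ij over the second state recovers f_i.
```

These two arrays are the entire input to the maximum entropy step; everything else about the alignment is discarded.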

The core of DCA is to score amino acid pairs by the strength of the interaction matrices J_ij in the inferred model.

Panel (A) depicts a fragment of the multiple sequence alignment used in the shown predictions (residues 60–120, with some very similar sequences removed for clarity). Panel (B) plots the top L/2 contacts predicted by gplmDCA (upper left corner) and by a correlation-based mutual information method, with the alignment filtered for columns and rows containing too many gaps and corrected for phylogenetic bias (Dunn et al., 2008). Panels (C) and (D) depict the predicted contacts plotted against the experimentally determined protein structure, color-coded for distance (green: contacting in the real structure; red: noncontacting).

The strength of such J_ij's is a much better predictor of spatial proximity than measures of correlation, such as the mutual information MI_ij, or than simpler modeling approaches based, e.g., on sequence profiles.
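A common way to turn the inferred couplings into a single score per pair is the Frobenius norm of each J_ij, followed by the average-product correction (APC) used in the mutual information literature. The sketch below assumes the couplings have already been inferred (here they are random stand-ins, not output of a real fit):

```python
import numpy as np

rng = np.random.default_rng(0)
N, q = 6, 3  # toy: 6 positions, 3-state alphabet

# Stand-in for inferred coupling matrices J_ij (a q x q matrix per pair);
# in a real DCA run these come from fitting the Potts model to an alignment.
J = rng.normal(size=(N, N, q, q))
J = (J + J.transpose(1, 0, 3, 2)) / 2  # enforce symmetry J_ij(a,b) = J_ji(b,a)

# Score each position pair by the Frobenius norm of its coupling matrix.
F = np.linalg.norm(J, axis=(2, 3))
np.fill_diagonal(F, 0.0)

# Average-product correction: subtract the score expected from the
# row/column averages, which suppresses phylogenetic and sampling bias.
row_mean = F.sum(axis=1) / (N - 1)
total_mean = F.sum() / (N * (N - 1))
score = F - np.outer(row_mean, row_mean) / total_mean
np.fill_diagonal(score, 0.0)

# Rank pairs i < j by corrected score; the top L/2 are the predicted contacts.
pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
ranked = sorted(pairs, key=lambda p: score[p], reverse=True)
```

The choice of norm and correction varies between DCA implementations; Frobenius norm plus APC is one widely used combination, not the only one.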

By the logic of the DCA procedure itself, maximum entropy provides no grounds to believe in the Potts model as a generative model: the input to the procedure is not the data but the data summarized by the frequencies f_i(σ_i) and f_ij(σ_i, σ_j), and the output is a representation of those frequencies through the parameters h_i(σ_i) and J_ij(σ_i, σ_j). Maximum entropy is, in this sense, a form of data compression.

Whether or not the data compression described above entails a loss of information depends on the data and how they were generated. First, consider the case favorable to maximum entropy, when the data actually were generated by a Potts model with some parameters h*_i(σ_i) and J*_ij(σ_i, σ_j). Given enough data, the sample frequencies f_i(σ_i) and f_ij(σ_i, σ_j) converge to the corresponding marginals of that model, the inferred parameters converge to h*_i and J*_ij, and no information is lost.

Now, consider the more natural case that the data in fact were generated by another probabilistic model, such as an exponential model including both second- and third-order interactions, or a mixture model. Given enough data, the sample frequencies f_i(σ_i) and f_ij(σ_i, σ_j) still converge, but the pairwise maximum entropy model built from them is not the model that generated the data, and whatever structure the data carry beyond pairwise statistics is lost in the compression.
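A minimal worked example of this loss is the parity (XOR) distribution on three binary variables: it is generated by a pure three-body interaction, yet all its single-site and pairwise frequencies are exactly uniform, so the pairwise maximum entropy model matching them is the uniform distribution and retains none of the structure:

```python
import itertools

# True model: uniform on the four configurations of (s1, s2, s3) in {0,1}^3
# with even parity -- a pure third-order (XOR) constraint.
states = [s for s in itertools.product([0, 1], repeat=3) if sum(s) % 2 == 0]
p = {s: 0.25 for s in states}

# All single-site marginals are uniform ...
for i in range(3):
    f_i = sum(prob for s, prob in p.items() if s[i] == 1)
    assert abs(f_i - 0.5) < 1e-12

# ... and so are all pairwise marginals.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in itertools.product([0, 1], repeat=2):
        f_ij = sum(prob for s, prob in p.items() if s[i] == a and s[j] == b)
        assert abs(f_ij - 0.25) < 1e-12

# Hence the pairwise maximum entropy model is uniform on all 8 states: it
# assigns probability 1/8 to odd-parity configurations the true model
# never generates, and all coupling information is lost in the compression.
```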

An example of this effect has appeared in DCA. Standard multiple sequence alignments cannot exactly be generated by a Potts model: they contain gaps, inserted by the alignment procedure, which typically appear in long consecutive stretches and hence induce correlations that are not of the pairwise-interaction form.

Furthermore, a multiple sequence alignment manifestly contains information on secondary structure and solvent accessibility, which cannot so far be reliably deduced from inferred Potts models.

The conceptual appeal of the maximum entropy argument is that it immediately leads to the Boltzmann distribution of equilibrium statistical physics. However, unless it is assumed that the effects of mutation, selection, and genetic drift in a sufficiently large domain of life are well described by a process obeying detailed balance, the proper analogue must be to nonequilibrium statistical physics. For a review providing a dictionary between models in statistical physics and models in population genetics, see […].

When Jaynes introduced the maximum entropy approach, comparatively little was known in nonequilibrium statistical physics, and maximum entropy could then have been envisaged as a viable approach to it. The situation has changed in the almost 60 years that have since passed, and it is now settled that this is not the case. The problem stems from the fact that nonequilibrium systems carrying a flux exhibit long-range correlations […].

The success of DCAs, which typically try to infer models with hundreds of thousands of parameters from thousands to tens of thousands of examples, can be phrased as the maxim "it is useful to learn exponential models of Big Data." Why is this so? Let us emphasize that in DCA the validation is not of the inferred model itself but of the contact predictions derived from it (the strength of the J_ij), and the evaluation criterion is hence indirect, which in principle makes the success of DCAs even more surprising.

Apologetics for maximum entropy is not infrequently based on the subjective view of probability, indeed also used by Jaynes […].

A second and more intriguing possibility is that naturally occurring probabilities actually do tend to take the exponential form.

I thank Marcin J. Skwark for discussions and the participants of “Regulation and inference in biological networks,” Bardonecchia (Italy), February 2–6, 2015, and “Models of Life,” Krogerup (Denmark), August 2–8, 2015, for lively and fruitful exchanges of views.