The authors have declared that no competing interests exist.

Conceived and designed the experiments: KC AP BA MAN. Analyzed the data: KC AP BA MAN. Wrote the paper: KC AP BA MAN.

A fundamental question in biology is the following: what is the time scale that is needed for evolutionary innovations? There are many results that characterize single steps in terms of the fixation time of new mutants arising in populations of certain size and structure. But here we ask a different question, which is concerned with the much longer time scale of evolutionary trajectories: how long does it take for a population exploring a fitness landscape to find target sequences that encode new biological functions? Our key variable is the length,

Evolutionary adaptation can be described as a biased, stochastic walk of a population of sequences in a high-dimensional sequence space. The population explores a fitness landscape, and the mutation-selection process biases it towards regions of higher fitness. In this paper we estimate the time scale that is needed for evolutionary innovation. Our key parameter is the length of the genetic sequence that needs to be adapted. We show that a variety of evolutionary processes take exponential time in sequence length. We propose a specific process, which we call the ‘regeneration process’, and show that it allows evolution to work on polynomial time scales. In this view, evolution can solve a problem efficiently if it has solved a similar problem already.

Our planet came into existence 4.6 billion years ago. There is clear chemical evidence for life on earth 3.5 billion years ago

Evolutionary dynamics operates in sequence space, which can be imagined as a discrete multi-dimensional lattice that arises when all sequences of a given length are arranged such that nearest neighbors differ by one point mutation

A question that has been extensively studied is how long it takes for existing biological functions to improve under natural selection. This problem leads to the study of adaptive walks on fitness landscapes

We consider an alphabet of size four, as is the case for DNA and RNA, and a nucleotide sequence of length

Consider a high-dimensional sequence space. A particular biological function can be instantiated by some of the sequences. Each sequence

For the purpose of estimating the expected discovery time we can approximate the fitness landscape with a binary step function over the sequence space. We discuss two different approximations (

The figures depict examples of highly rugged fitness landscapes where the sequence space has been projected in one dimension. (A) Sequences with fitness below some level

The second approximation works as follows. Consider the evolutionary process exploring a rugged fitness landscape where the goal is to attain a fitness level

Our key results for estimating the discovery time can now be formulated for binary fitness landscapes, but they apply to any type of rugged landscape using one of the two approximations. We note that our methods can also be applied for certain non-binary fitness landscapes, and an example of a fitness landscape with a large gradient arising from multiplicative fitness effects is discussed in Sections 6 and 7 of

We now present our main results in the following order. We first estimate the discovery time of a single search aiming to find a single broad peak. Then we study multiple simultaneous searches for a single broad peak. Finally, we consider multiple broad peaks that are uniformly randomly distributed in sequence space.

We first study a broad peak of target sequences described as follows: consider a specific sequence; any sequence within a certain Hamming distance of that sequence belongs to the target set. Specifically, we consider that the evolutionary process has succeeded, if the population discovers a sequence that differs from the specific sequence in no more than a fraction

Our result can be interpreted as follows (see Theorem S2 and Corollary S2 in

For the four letter alphabet most random sequences have Hamming distance

For the broad peak there is a specific sequence, and all sequences that are within Hamming distance

Let us now give a numerical example to demonstrate that exponential time is intractable. Bacterial life on earth has been around for at least 3.5 billion years, which corresponds to

If individual evolutionary processes cannot find targets in polynomial time, then perhaps the success of evolution is based on the fact that many populations are searching independently and in parallel for a particular adaptation. We prove that multiple, independent parallel searches do not solve the problem if the starting sequence is far away from the target center. Formally we show the following result.

If an evolutionary process takes exponential time, then polynomially many independent searches do not find the target in polynomial time with reasonable probability (for details see Theorem S5 in the

In our basic model, individual mutants are evaluated one at a time. The situation of many mutant lineages evolving in parallel is similar to the multiple searches described above. Since we show that multiple independent searches do not lead to polynomial time solutions whenever a single search takes exponential time, our results imply intractability for this case as well.
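The effect of running many independent searches can be illustrated with a short calculation. The sketch below is not taken from the paper's proofs: the function name `prob_any_search_succeeds` and the numerical value chosen for the single-search success probability are hypothetical, picked only to show that multiplying an exponentially small probability by a polynomial number of searches still leaves a small probability.

```python
import math


def prob_any_search_succeeds(p_single, num_searches):
    """Probability that at least one of `num_searches` independent searches
    succeeds, when each succeeds with probability `p_single` within the
    available time budget."""
    if p_single >= 1.0:
        return 1.0
    # 1 - (1 - p)^M, computed via log1p/expm1 so that a tiny p is not
    # rounded away in floating point.
    return -math.expm1(num_searches * math.log1p(-p_single))


if __name__ == "__main__":
    # Hypothetical, exponentially small single-search success probability.
    p = 4.0 ** -50
    for m in (10**3, 10**6, 10**9):
        # The union bound m * p is an upper limit on the combined probability.
        print(m, prob_any_search_succeeds(p, m), "<=", m * p)
```

Because the searches are independent, the probability that all of them fail is the product of the individual failure probabilities, so the combined success probability is at most the number of searches times the single-search probability.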

We now explore the case of multiple broad peaks that are uniformly and randomly distributed. Consider that there are

Whether or not the function

(A) The target set consists of

It is known that recombination may accelerate evolution on certain fitness landscapes

Which adaptive problems, then, can be solved by evolution in polynomial time? We propose a “regeneration process”. The basic idea is that evolution can solve a new problem efficiently if it has solved a similar problem already. Suppose gene duplication or genome rearrangement can give rise to starting sequences that are at most

Gene duplication (or possibly some other process) generates a steady stream of starting sequences that are a constant number

The regeneration process formalizes the role of several existing ideas. First, it ties in with the proposal that gene duplications and genome rearrangements are major events leading to the emergence of new genes

There is one other scenario that must be mentioned. It is possible that certain biological functions are hyper-abundant in sequence space

Our theory has clear empirical implications. The regeneration process can be tested in systems of in vitro evolution

In summary, we have developed a theory that allows us to estimate time scales of evolutionary trajectories. We have shown that various natural processes of evolution take exponential time as a function of the sequence length,

Our results are based on a mathematical analysis of the underlying stochastic processes. For Markov chains on the one-dimensional grid, we describe recurrence relations for the expected hitting time and present lower and upper bounds on the expected hitting time using combinatorial analysis (see

For a single broad peak, due to symmetry we can interpret the evolutionary random walk as a Markov chain on the one-dimensional grid. A sequence of type

Consider a Markov chain on the one-dimensional grid, and let

Theorem 1 is derived by obtaining precise bounds for the recurrence relation of the hitting time (
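The kind of hitting-time recurrence involved can be sketched for a simple chain on the one-dimensional grid. This is a minimal illustration, not the paper's derivation: the transition probabilities below (one uniformly random point mutation per step, a four-letter alphabet, no selection) and the names `expected_hitting_time`, `L`, `k` and `start` are assumptions made for the example. State `i` counts the positions at which the current sequence differs from the target center, and the target set is every state `i <= k`.

```python
from fractions import Fraction


def expected_hitting_time(L, k, start):
    """Expected number of steps for a walk on {0, ..., L} to first reach a
    state <= k, starting from `start`.

    ASSUMED transition probabilities (illustrative only):
      i -> i - 1 with probability i / (3L)    (a mismatch mutates to the correct letter)
      i -> i + 1 with probability (L - i) / L (a matching position mutates)
      otherwise the walk stays at i.
    """
    if start <= k:
        return Fraction(0)
    # d[i] is the expected time to move from i to i - 1 for the first time.
    # Conditioning on the first step gives: down(i) * d[i] = 1 + up(i) * d[i + 1].
    d = {}
    d_next = Fraction(0)  # placeholder for d[L + 1]; unused because up(L) = 0
    for i in range(L, k, -1):
        down = Fraction(i, 3 * L)
        up = Fraction(L - i, L)
        d[i] = (1 + up * d_next) / down
        d_next = d[i]
    # Reaching the target from `start` means stepping down through start, ..., k + 1.
    return sum(d[i] for i in range(k + 1, start + 1))


if __name__ == "__main__":
    # Target: differ from the center in at most half of the positions.
    for length in (10, 20, 40, 80):
        t = expected_hitting_time(length, length // 2, length)
        print(length, float(t))
```

Exact rational arithmetic is used so that the recurrence can be checked by hand on tiny cases; for example, with `L = 2` and `k = 0` the recurrence gives `d[2] = 3` and `d[1] = 15`.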

The basic intuition obtained from

The basic intuition for the result is as follows: consider a single search for which the expected hitting time is exponential. Then for the single search the probability to succeed in polynomially many steps is negligible (as otherwise the expectation would not have been exponential). In case of independent searches, the independence ensures that the probability that all searches fail is the product of the probabilities that every single search fails. Using the above arguments we establish Theorem 2 (for details see Section 8 in

For this result, it is first convenient to view the evolutionary walk taking place in the sequence space of all sequences of length

An important aspect of our work is that we establish our results using elementary techniques for analysis of Markov chains. The use of more advanced mathematical machinery, such as martingales

Detailed proofs for “The Time Scale of Evolutionary Innovation.”

(PDF)

We thank Nick Barton and Daniel Weissman for helpful discussions and pointing us to relevant literature.