The authors have declared that no competing interests exist.

Recent work of Sottoriva, Graham, and collaborators has led to the controversial claim that exponentially growing tumors have a site frequency spectrum that follows the 1/f law. Here we study the two-type model of clonal evolution: if the growth rates of the two types satisfy λ_{0} < λ_{1}, then the site frequency spectrum has the shape 1/f^{α} where α = λ_{0}/λ_{1}. This is due to the advantageous mutations that produce the founders of the type 1 population. Mutations within the growing type 0 and type 1 populations follow the 1/f law.

For many years, the dominant paradigm was that cancers evolve by a succession of selective sweeps in which new fitter mutants take over the system. About five years ago, Sottoriva et al introduced the Big Bang model of cancer initiation, which postulated that all the mutations needed were present when the tumor started growing. A consequence of this viewpoint is that mutations in the growing tumor are neutral. Many researchers have objected to this conclusion for a wide variety of reasons. Here, we use mathematical analysis to show that with enough sequence data the site frequency spectrum can be used to distinguish neutral evolution from the two-phase model of clonal evolution. This conclusion differs from previously published simulation results.

Following up on the introduction of the Big Bang model by Sottoriva et al [

Inferring the allele frequency

Failure to reject the null model is not the same as proving it is true. To quote McDonald, Chakrabarti, and Michor [

Tarabichi et al [

Tarabichi et al [^{2} > 0.98 is neither necessary nor sufficient for neutral evolution.”

To try to shed some light on the controversy, we will do a mathematically rigorous computation of the site frequency spectrum produced by the two-type model of clonal evolution. We will describe the model in Results. The two-type model and its

McDonald, Chakrabarti, and Michor [

In their first model, clonal expansion begins with a single cell of the original tumor-initiating type (type 0). To make it easier to connect with previous mathematical work, we will describe their model using the notation used in [_{0} and die at rate b_{0}, so the exponential growth rate is λ_{0} = a_{0} − b_{0}. For simplicity, we will suppose that neutral mutations accumulate during the individual’s lifetime at rate

Type 0 individuals mutate to type 1 at rate u_{1}. Type 1 individuals give birth at rate a_{1} and die at rate b_{1}. Their exponential growth rate is λ_{1} = a_{1} − b_{1} where λ_{1} > λ_{0}. In [

The reader will see many complicated formulas in this paper, so it will be useful to have a concrete set of parameters to plug into these formulas. Borrowing an example from [

As in [_{0} = 0 with probability b_{0}/a_{0} and has a rate λ_{0}/a_{0} exponential distribution with probability λ_{0}/a_{0}.
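The mixed distribution described above can be sampled directly. The following is a minimal sketch, not the authors' code. It assumes the birth and death rates are written `a0` and `b0` (names chosen here for illustration), so that λ_{0} = a0 − b0, and it calls the limiting variable `V0` (also an assumed name). Because the variable is 0 with probability b0/a0 and otherwise exponential with rate λ_{0}/a0, its mean is (λ_{0}/a0)(a0/λ_{0}) = 1, which the sample mean should approximate.

```python
import random


def survival_probability(a0, b0):
    """Probability that the limit is positive: lam0 / a0 with lam0 = a0 - b0."""
    return (a0 - b0) / a0


def sample_v0(a0, b0, rng):
    """Draw one value: 0 with probability b0/a0, else Exponential(rate=lam0/a0)."""
    lam0 = a0 - b0
    if rng.random() < b0 / a0:
        return 0.0
    return rng.expovariate(lam0 / a0)


if __name__ == "__main__":
    rng = random.Random(0)
    a0, b0 = 1.0, 0.98  # illustrative values giving lam0 = 0.02
    draws = [sample_v0(a0, b0, rng) for _ in range(200_000)]
    print("fraction equal to zero:", sum(d == 0.0 for d in draws) / len(draws))
    print("sample mean (theory: 1):", sum(draws) / len(draws))
```

The sample mean converges slowly because the positive part has a large variance when λ_{0}/a0 is small, so a large number of draws is used.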

The study of the second wave is simpler if we suppose that _{0} has the same distribution as (_{0}|_{0} > 0), that is, exponential with rate λ_{0}/a_{0}. Mutations from type 0 to 1 occur at rate u_{1}. Let _{1} be the time of the first successful type 1 mutation, i.e., one whose branching process does not die out. Durrett and Moseley [_{1} has median

Durrett and Moseley were the first to rigorously prove results about the asymptotic behavior of the size of the type 1 population _{0} = _{1} = 1 to simplify the constants.

There are three classes of mutations in the two-phase model:

type 0: Neutral mutations that occur to type 0 individuals.

type 1A: Advantageous mutations that turn type 0 individuals into type 1.

type 1: Neutral mutations that occur to type 1 individuals.

By the argument in Methods, the type 0 mutations will have a 1/

_{1}

The points in the Poisson process in Theorem 1 indicate the contributions of the various type one families to the limit _{1} > _{2} > _{3}… be the points, then the ^{−α} where _{0}/λ_{1}. However, the fact that the sum of the points in the Poisson process is random makes this difficult to study. Fortunately for us, the work has already been done in 1997 by Pitman and Yor [

Including type 0 passenger mutations in type 1^{−α} shape in

To illustrate the results proved above, we turn to simulations seen in Figs

The figure shows the contribution of the different mutation types to the site frequency spectrum. The simulation was performed with parameters u_{1} = 2 × 10^{−4}, λ_{0} = 0.02, λ_{1} = 0.04, and a_{0} = a_{1} = 1; the curve shown is the average site frequency spectrum over 1000 runs. We simulated the 1

To better understand the distribution of 1

McDonald, Chakrabarti, and Michor [

If the fitness distribution was bounded then, as

If the distribution was unbounded, then the population could grow faster than exponentially.

In this section, we will modify our example from

To find the distribution of the growth rates of the mutations with the largest family sizes, we note that a mutant that occurs at time _{i} and has growth rate λ_{1,i} will grow to size _{1}exp(λ_{1,i}(1000 − _{i})) at time 1000. The number of _{1,i}(1000 − _{i}) > ^{2} for the inner integral.

The graph indicates the expected number of 1_{1,i}(1000 − _{i}) > _{1} for all type 1 families, we have a different λ_{1,i} for each type 1A family. Each λ_{1,i} is normally distributed with mean 0.04 and standard deviation 0.005. 500 runs were done up until time ^{x} > 10^{10}. If the λ_{1,i} of the largest family is within 2 standard deviations, then multiplying ^{x} by 1/λ_{1,i} implies a family of magnitude around 2 × 10^{11} or greater.
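The arithmetic in this caption can be checked with a short sketch. The helper names are hypothetical, and the symbols `v` (the random prefactor) and `s` (the time at which the mutant arises) are assumed names for quantities whose symbols are lost in the text. The first function evaluates the size v·exp(λ(1000 − s)) of a family at time 1000. The second gives the range of e^{x}/λ when λ lies within two standard deviations of the mean 0.04, with standard deviation 0.005.

```python
import math


def family_size(v, lam, s, t=1000.0):
    """Size at time t of a family started at time s with growth rate lam."""
    return v * math.exp(lam * (t - s))


def family_size_bounds(exp_term, mean=0.04, sd=0.005, k=2.0):
    """Range of exp_term / lam when lam lies within k standard deviations of mean."""
    lam_lo, lam_hi = mean - k * sd, mean + k * sd
    return exp_term / lam_hi, exp_term / lam_lo


if __name__ == "__main__":
    lo, hi = family_size_bounds(1e10)
    print(f"family size between {lo:.3g} and {hi:.3g}")
```

With e^{x} = 10^{10} the lower end of the range is 10^{10}/0.05 = 2 × 10^{11}, which matches the magnitude quoted in the caption.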

The random fitnesses cause the relative sizes of the contributions of mutations to the final population to change, but as ^{β}, where

(A) shows the site frequency spectrum for multiple values of ^{β}, with

The authors of [^{9} cells, and neutral mutations occur slowly, leading to genealogical relationships that are more like those found in growing cancer tumors.

Bozic, Paterson, and Waclaw [

To argue for this viewpoint, they use the two-type model but with different notation
_{1}/_{0} be the population of type 0’s when the mutation occurs. Since _{0} is large, _{t} ≈ _{0} ^{rt}. The type 1 population at time _{t} ≈ _{1} ^{rct}, where _{1} is an exponentially distributed random variable with rate _{1}. Note that as in Bozic et al [

This graph gives the probability of having a driver with frequency greater than ^{9}. The parameters used are a_{0} = a_{1} = 1, λ_{0} = 0.02, λ_{1} = 0.035, and u_{1} = 10^{−5}, and the data were generated from 1000 runs. “Single 1A” refers to the approach taken by Bozic et al., in which there is only one selective mutation. “Multiple 1A” is our approach. The theory curve comes from a Riemann sum with interval size 500 used to evaluate the integral in

Writing _{sub} = _{t}/(_{t} + _{t}) they prove that when the total tumor size is _{t} + _{t} the subclonal mutation frequency has
_{sub} ≤ 0.8).

To see what this complicated formula implies, the authors turn to simulation. The mutation rate to produce an additional driver is ^{−5}. Their Fig 2A shows a moderately growing tumor ^{9} cells and remain below 1/3 for ^{11}. For other cases considered there (_{d} = 10^{7}, _{e} = 5 ⋅ 10^{10} and _{f} = 2 ⋅ 10^{8}. In the three cases the frequency is near 0, near 1, and almost uniformly distributed on [0, 1].

Rather than study the tumor when it reaches a fixed size, we will derive results at a fixed time by using Theorem 1. Recall that we have set _{i}

Work of Sottoriva and Graham [

We have studied the two-type model of cancer evolution in which the exponentially growing population of type 0 cells can mutate to a fitter type 1, and all cells can experience neutral mutations. In this model there are three types of mutations that we call 0, 1A, and 1. If the growth rates satisfy λ_{0} < λ_{1} and α = λ_{0}/λ_{1}, then the site frequency spectrum has the shape 1/f^{α} due to 1A mutations and the type 0 neutral mutations present in the founders of the type 1 population. These mutation types are more numerous than the others.
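To make the difference between the two shapes concrete, the sketch below compares idealized, noise-free curves 1/f and 1/f^{α} by their slope on a log-log scale. It is an illustration only, not the statistical procedure of any of the cited papers: the function name `loglog_slope`, the frequency grid, and the example growth rates λ_{0} = 0.02 and λ_{1} = 0.04 (giving α = 0.5) are assumptions made here. Real sequence data are noisy, so an actual test would need far more care.

```python
import math


def loglog_slope(freqs, counts):
    """Least-squares slope of log(counts) against log(freqs)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    lam0, lam1 = 0.02, 0.04           # assumed example growth rates
    alpha = lam0 / lam1               # exponent of the two-type shape
    freqs = [10 ** (-3 + 2 * i / 49) for i in range(50)]  # 0.001 to 0.1
    neutral = [1.0 / f for f in freqs]
    two_type = [1.0 / f ** alpha for f in freqs]
    print("neutral slope: ", loglog_slope(freqs, neutral))
    print("two-type slope:", loglog_slope(freqs, two_type))
```

On these idealized curves the fitted slopes are −1 and −α, so the two models differ in the exponent rather than only in a constant factor.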

McDonald, Chakrabarti, and Michor [

Bozic, Paterson, and Waclaw [

Sottoriva and Graham say in their original paper [^{λs} (we have set ^{−λs} in the population. Evaluating the integral in the previous formula, we have
_{f} = −(1/λ)log _{f}) = 1/_{f} will have frequency ≥

From the derivation given above, we see that the 1/

_{f}) gives the desired result.
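The deterministic 1/f calculation can be checked numerically. The sketch below assumes a population of size e^{λs} at time s, a neutral mutation rate written `nu` per cell per unit time (an assumed name), and that a mutation arising at time s has frequency e^{−λs}. Mutations with frequency at least f then arise before t_f = −(1/λ) log f, and integrating nu·e^{λs} from 0 to t_f gives (nu/λ)(1/f − 1). The function names are hypothetical.

```python
import math


def t_f(f, lam):
    """Latest time at which a new mutation can still reach frequency f."""
    return -math.log(f) / lam


def expected_count_closed_form(f, lam, nu):
    """(nu / lam) * (1/f - 1), the integral of nu * exp(lam * s) on [0, t_f]."""
    return (nu / lam) * (1.0 / f - 1.0)


def expected_count_numeric(f, lam, nu, steps=100_000):
    """Midpoint Riemann sum for the same integral."""
    h = t_f(f, lam) / steps
    return h * sum(nu * math.exp(lam * (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    f, lam, nu = 0.01, 0.02, 1.0  # illustrative values
    print(expected_count_closed_form(f, lam, nu))
    print(expected_count_numeric(f, lam, nu))
```

For small f the count is dominated by the 1/f term, which is the origin of the 1/f law in this simplified setting.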

To show that the important 1A mutations happen soon after the first, and that therefore all important 1A mutations have roughly the same number of passengers, consider two successful mutations at times s_{0} and s_{1} which have sizes V_{0}e^{λ_{1}(t−s_{0})} and V_{1}e^{λ_{1}(t−s_{1})}. For the second mutation to be larger, we would need V_{0}/V_{1} ≤ e^{λ_{1}(s_{0}−s_{1})}. Since the cdf of the quotient of two exponentials with the same rate is P(V_{0}/V_{1} ≤ x) = x/(1 + x), if s_{1} = s_{0} + 4/λ_{1} = s_{0} + 200, then the probability that the second mutation is larger is (1 + e^{4})^{−1} = 0.018. Thus, in our concrete example the most significant mutants occur within 200 time units of the first successful mutation. The mean number of mutations in 200 units of time is 200
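The probability computed above can be reproduced in a few lines. This sketch assumes the two random prefactors are independent exponentials with a common rate, whose quotient has cdf x/(1 + x), and that the second mutation arrives a delay of 4/λ_{1} after the first. The function names are hypothetical, and a small Monte Carlo check is included as a sanity test rather than as part of the argument.

```python
import math
import random


def quotient_cdf(x):
    """P(V0/V1 <= x) for independent exponentials V0, V1 with the same rate."""
    return x / (1.0 + x)


def prob_second_larger(lam1, delay):
    """Probability that a family founded `delay` later ends up the larger one."""
    return quotient_cdf(math.exp(-lam1 * delay))


def monte_carlo_second_larger(lam1, delay, n, rng):
    """Empirical frequency of V0/V1 <= exp(-lam1 * delay) over n draws."""
    threshold = math.exp(-lam1 * delay)
    hits = sum(
        rng.expovariate(1.0) / rng.expovariate(1.0) <= threshold for _ in range(n)
    )
    return hits / n


if __name__ == "__main__":
    lam1 = 0.04  # illustrative growth rate; the result depends only on lam1 * delay
    print(round(prob_second_larger(lam1, 4.0 / lam1), 3))  # 0.018
    print(monte_carlo_second_larger(lam1, 4.0 / lam1, 200_000, random.Random(0)))
```

Because the answer depends only on the product of λ_{1} and the delay, any pair with product 4 gives the same value (1 + e^{4})^{−1}.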

Both authors would like to thank Jason Schweinsberg, Ivana Bozic, and Einar Bjarki Gunnarsson for helpful comments on a previous version.