The author has declared that no competing interests exist.
Chromosomal crossover is a biological mechanism to combine parental traits. It is perhaps the first mechanism ever taught in any introductory biology class. The formulation of crossover, and resulting recombination, came about 100 years after Mendel's famous experiments. To a great extent, this formulation is consistent with the basic genetic findings of Mendel. More importantly, it provides a mathematical insight for his two laws (and corrects them). From a mathematical perspective, and while it retains similarities, genetic recombination guarantees diversity so that we do not rapidly converge to the same being. It is this diversity that made the study of biology possible. In particular, the problem of genetic mapping and linkage—one of the first efforts towards a computational approach to biology—relies heavily on the mathematical foundation of crossover and recombination. Nevertheless, as students we often overlook the mathematics of these phenomena. Emphasizing the mathematical aspect of Mendel's laws through crossover and recombination will prepare the students to make an
Sexually reproducing organisms generally combine heritable traits from two parents. The biological process that combines those traits is called meiosis. While mutations could occur during meiosis, most of the variation arises from the combinations of parental traits. How do these parental traits combine? The dominant theory was that some sort of blending or averaging took place. However, such a mode of inheritance would result in an average of all ancestors after only a modest number of generations (imagine repeatedly mixing colors). Instead, by performing experiments on plants, Mendel pointed out the existence of discrete elements that combine but do not mix.
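The color-mixing argument can be checked numerically. The following is a minimal sketch, assuming that a trait is a single number, that mating is random, and that blending means taking the exact average of the two parents; all three are simplifications introduced here for illustration.

```python
import random
import statistics

def blend_generations(population, generations, rng):
    """Repeatedly replace the population by offspring that average two random parents."""
    spreads = [statistics.pstdev(population)]
    for _ in range(generations):
        population = [
            (rng.choice(population) + rng.choice(population)) / 2
            for _ in range(len(population))
        ]
        spreads.append(statistics.pstdev(population))
    return spreads

rng = random.Random(0)
founders = [rng.uniform(0.0, 1.0) for _ in range(2000)]  # hypothetical trait values
spreads = blend_generations(founders, 10, rng)
for generation, spread in enumerate(spreads):
    print(f"generation {generation}: spread = {spread:.4f}")
```

Under these assumptions the variance is roughly halved in every generation, so the spread of the trait collapses after only a handful of generations.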
Mendel formulated the concept of a
The state of a gene, the
In a dominant/recessive mode where
Students often overlook that these ratios are not simply based on counting the entries, but the result of the segregation law: each allele is contributed with equal probability, i.e.,
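To see that the ratios come from probabilities rather than from counting table entries, the cross can be computed with an explicit contribution probability. This is a minimal sketch; the allele letters `A`/`a`, the function name, and the parameter `q` (the probability that a parent passes on its first allele) are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(father, mother, q=Fraction(1, 2)):
    """Genotype probabilities when each parent passes its first allele with probability q."""
    dist = {}
    for (fa, fp), (ma, mp) in product(
        [(father[0], q), (father[1], 1 - q)],
        [(mother[0], q), (mother[1], 1 - q)],
    ):
        genotype = "".join(sorted(fa + ma))  # "aA" and "Aa" are the same genotype
        dist[genotype] = dist.get(genotype, 0) + fp * mp
    return dist

fair = offspring_distribution("Aa", "Aa")                       # segregation law, q = 1/2
biased = offspring_distribution("Aa", "Aa", q=Fraction(3, 4))   # hypothetical violation
print("fair:  ", fair)
print("biased:", biased)
```

With `q = 1/2` the genotypes appear with probabilities 1/4, 1/2, 1/4. The biased cross has the same four table entries but different ratios, which is the point of the remark above.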
About 100 years later, it was established that the physical structure underlying Mendel's laws is the chromosome (for simplicity, a long molecule of DNA). This discovery matched Mendel's experiments remarkably well: in diploid organisms like us, chromosomes come in pairs (hence the name diploid), one from each parent! With few exceptions, each chromosome of the pair has copies of the same genes (special stretches of DNA) arranged in the same order: the alleles! In an attempt to explain experimental results and confirm Mendel's laws, chromosomal
When two alleles come from different chromosomes of the pair, their corresponding genes are said to recombine (can you identify the recombinations in
Mendel's laws (segregation and independent assortment) dictate that genetic recombination occurs with a probability of
However, it has been observed that some pairs of genes show a correlation in their alleles, e.g., their probability of recombination is less than
Let
I will assume some familiarity with matrices. If, however, this notion is unfamiliar, the parts of the exposition that use matrices may be skipped. Only
One of the series that is almost invariably covered in basic calculus is the geometric series.
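For reference, the standard finite and infinite forms of the geometric series are given below; the symbol $x$ for the common ratio is chosen here for illustration.

```latex
\sum_{k=0}^{n-1} x^{k} = \frac{1 - x^{n}}{1 - x} \quad (x \neq 1),
\qquad
\sum_{k=0}^{\infty} x^{k} = \frac{1}{1 - x} \quad (|x| < 1).
```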
This is one of the basic expressions covered when studying limits.
Here's the definition of natural logarithm and some of its properties:
Another famous encounter is the harmonic series and its approximation.
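For reference, one standard way to state the harmonic series and its logarithmic approximation is the following, where $H_n$ is notation chosen here and $\gamma$ is the Euler-Mascheroni constant.

```latex
H_n = \sum_{k=1}^{n} \frac{1}{k} = \ln n + \gamma + O\!\left(\frac{1}{n}\right),
\qquad \gamma \approx 0.5772,
\qquad \ln(n+1) < H_n \le 1 + \ln n .
```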
A function
Motivated by
Based on the above setting,
To find the probability that a given allele of gene
Genetic mapping is the problem of placing the genes along the chromosome in their correct relative order. The bad news: It is hard! The good news: Genetic linkage can be used to infer genetic mapping. Though obsolete (it has been done), genetic mapping can be considered to be the first effort towards a computational approach to biology. How does it work?
In the uniform 1-crossover model, genetic linkage tells us that the probability of recombination of two genes is proportional to the distance between these genes. Now consider the genotyping depicted in
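The proportionality between recombination probability and distance can be observed in a small simulation. This is a sketch under stated assumptions: genes are numbered from 0, the starting chromosome is chosen with probability 1/2, and the single crossover falls with equal probability in any of the gaps between consecutive genes. The function names and the choice of 11 genes are hypothetical.

```python
import random

def one_crossover_gamete(n_genes, rng):
    """For each gene, record which parental chromosome (0 or 1) supplied the allele."""
    start = rng.randrange(2)          # starting chromosome, probability 1/2 each
    cut = rng.randrange(1, n_genes)   # the crossover falls in one of n_genes - 1 gaps
    return [start if gene < cut else 1 - start for gene in range(n_genes)]

def recombination_frequency(i, j, n_genes, trials, rng):
    """Fraction of simulated gametes in which genes i and j come from different chromosomes."""
    hits = 0
    for _ in range(trials):
        origin = one_crossover_gamete(n_genes, rng)
        hits += origin[i] != origin[j]
    return hits / trials

rng = random.Random(1)
n = 11
for j in (1, 5, 10):
    estimate = recombination_frequency(0, j, n, 20000, rng)
    print(f"genes 0 and {j}: estimated {estimate:.3f}, distance/(n-1) = {j / (n - 1):.3f}")
```

Under these assumptions the estimated frequency grows linearly with the number of gaps separating the two genes.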
The reader may choose to skip this section and proceed to the next. The uniform 1-crossover model is very insightful in explaining Mendel's law of segregation, with independent assortment corrected to reflect genetic linkage. However, it suffers from a few deficiencies.
Nothing is seriously wrong about this aspect. By assigning lower probabilities of recombination for smaller distances, the distance between two genes justifies their linkage when they do not assort independently. However, the actual probability of recombination may not necessarily be
The probability of inheriting a given allele is contingent on the probability that the recombination process starts on the given chromosome of the pair, previously called
Despite genetic linkage, one should still expect that genes which are far from each other on the chromosome will assort independently. Because each chromosome can be treated separately, this independence is certainly true for genes that are on different chromosomes altogether. But on the same chromosome, the probability of recombination
In retrospect, two genes
Now, why do we insist that the model must satisfy, among other properties, the law of independent assortment? Well, first because it is a correct law for distant genes. And second, since the probability of recombination increases with distance due to genetic linkage, the law of independent assortment tells us that
One might consider extending the uniform 1-crossover model in an attempt to generalize it and mimic the actual biological process. However, I will show that extending this model in the most natural way (mathematically, that is) will break the linkage property. For this purpose, consider a uniform 2-crossover model. Let
Now, why even bother to show that this model, which is more difficult to analyze than its predecessor, does not work? Well, my experience in teaching has been the following: While it is important to show students what works, it is equally important to show them what does not work.
With this in mind, all we need is a counterexample, so consider gene 1 and gene
Probability of recombination of the first gene and a gene at a distance given as a percentage of the chromosome length. A maximum probability of
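The shape of this curve can be explored with a simulation. The sketch below is a continuous simplification introduced here, not necessarily the exact discrete model of the text: the chromosome is the interval from 0 to 1, the first gene sits at position 0, and two crossover points are drawn independently and uniformly. The two genes then recombine exactly when one, and only one, crossover falls between them.

```python
import random

def two_crossover_recombines(x, rng):
    """True if exactly one of two uniform crossover points falls before position x."""
    first, second = rng.random(), rng.random()
    return (first < x) != (second < x)

def estimate(x, trials, rng):
    """Estimated recombination probability between position 0 and position x."""
    return sum(two_crossover_recombines(x, rng) for _ in range(trials)) / trials

rng = random.Random(2)
for percent in (10, 25, 50, 75, 100):
    x = percent / 100
    simulated = estimate(x, 20000, rng)
    print(f"{percent:3d}%: simulated {simulated:.3f}, 2x(1-x) = {2 * x * (1 - x):.3f}")
```

Under these assumptions the probability is 2x(1-x): it peaks halfway along the chromosome and then decreases, so a more distant gene can appear more tightly linked than a closer one.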
While the uniform 1-crossover model captures the essentials of segregation and linkage, it is lacking in some important aspects. First, the probability that a given allele is inherited (should be
A better mathematical model is needed to rectify the above deficiencies. In principle, the model should satisfy the following three laws with multiple crossovers:
Being a computer scientist by training and not a biologist, when I first suggested to my students a model based on a Markov chain, I called it the
The jumping model is based on a Markov chain. A Markov chain consists of a set of states with probabilities of transition between them (thus the jumping term). For computer scientists, this is often illustrated as a directed weighted graph with vertices representing the states and directed edges representing the transitions between states. The weight of an edge is the probability of the corresponding transition. This is shown in
Arguably the simplest Markov chain with two states, where each state represents one chromosome of the pair. Transitions between the two states (chromosomal crossovers) occur with probability
What is the biological significance of the Markov chain in
Dashed lines represent transitions (crossovers) with probability
A useful representation of a Markov chain is by a matrix
Because
Following the logic of previous sections, the probability that a given allele of gene
Computation is performed with a rounding error
Because
In addition, since both
The previous sections show that
While it is easy to verify the solution, obtaining it should not remain a wild guess. By working out a few iterations for
The mathematically savvy could verify that
When
When
When
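The behavior of the two-state chain in its different regimes can be tabulated directly. This is a minimal sketch, assuming a symmetric chain in which `p` is the crossover probability between consecutive genes and `d` is the number of steps separating two genes; both names are chosen here for illustration. The iteration is compared against the closed form (1 - (1 - 2p)^d) / 2, which holds for such a symmetric chain.

```python
def recombination_by_iteration(p, d):
    """Probability of being on the other chromosome after d steps of the two-state chain."""
    stay, switch = 1.0, 0.0  # after 0 steps we are certainly on the starting chromosome
    for _ in range(d):
        stay, switch = stay * (1 - p) + switch * p, stay * p + switch * (1 - p)
    return switch

def recombination_closed_form(p, d):
    """Closed form for the symmetric two-state chain."""
    return (1 - (1 - 2 * p) ** d) / 2

for p in (0.1, 0.5, 0.9):
    row = [round(recombination_by_iteration(p, d), 3) for d in range(1, 7)]
    print(f"p = {p}: {row}")
```

For `p` below 1/2 the values increase toward 1/2, for `p` equal to 1/2 they are exactly 1/2 at every distance, and for `p` above 1/2 they alternate above and below 1/2.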
The jumping model captures the essential biology of crossover and recombination through the laws of segregation, linkage, and independent assortment. In addition, it accounts for the atypically high recombination probabilities of hotspots. Hotspots are regions on the chromosome that experience a high probability of recombination even at small distances. Therefore, depending on the parameter
While a hotspot does not present a difficult concept, it is usually misinterpreted by students as a
Morgan established that the probability of recombination as a function of distance is the following:
I could have simply argued that the probability of recombination
Rapid prototyping with a simple uniform 1-crossover model reflects the essential biological properties of crossover and recombination (though not perfectly). This allows the student to quickly make a connection between the biology and the mathematics.
There is no need for advanced calculus or probability (e.g., no mention of Poisson processes or probability distributions other than uniform).
To achieve a better understanding of the biological properties, the exposition proceeds by pointing out the deficiencies of the simple model.
The simple model itself is a useful tool that is actually used for simulation, e.g., in genetic algorithms.
Having a model (whether mathematical or not) provides some operational sense, so the biology is made more concrete.
Moving progressively through the models illustrates what it takes to make attempts, including wrong ones, in the modeling of biological systems.
Multiple models reinforce the ideas by exposing them in different settings.
Markov chains are useful as a tool for modern biological sciences and, therefore, introducing them in this context gives the student an early preparation.
The jumping model captures two modes of recombination, normal and hotspots, and puts them in their biological context by means of the parameter
The jumping model also provides the insight that the probability of crossover must be less than
The alternating behavior of the jumping model corrects one major misunderstanding of hotspots.
Morgan's first result can be derived as a special case.
The jumping model can be described (not necessarily analyzed) very easily and satisfies all the required biological properties of crossover and recombination. Therefore, a student can effectively retain and communicate the recombination process.
Consider the hypothetical family in
Genes
Father  0,1  0,1  0,1
Mother  0,1  0,1  0,1
Offspring 1  0,0  1,0  0,1
Offspring  0,0  1,0  0,1
Offspring  0,0  0,1
Offspring  0,0  1,1
Offspring  0,0  1,1  1,1
To map the genes (genetic mapping), we count the number of recombinations, both paternal and maternal, for each pair of genes,
There are
Since
Genes
Father  0,1  0,1
Mother  0,1  0,1
Offspring 1  0,0  1,0  0,1
Offspring  0,0  1,0  0,1
Offspring  0,0  0,1
Offspring  0,0  1,1
Offspring  0,0  1,1  1,1
This will make
This solution puts
If we believe that our knowledge of the alleles in
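Once recombinations have been counted for each pair of genes, ordering the genes is a small search problem. The sketch below uses one simple criterion, chosen here for illustration: pick the order that minimizes the sum of recombination frequencies between neighboring genes. The gene names and the frequencies are hypothetical and are not taken from the family above.

```python
from itertools import permutations

def best_order(genes, recomb):
    """Order genes so that the sum of recombination frequencies between neighbors is smallest."""
    def freq(a, b):
        return recomb[frozenset((a, b))]

    def cost(order):
        return sum(freq(a, b) for a, b in zip(order, order[1:]))

    # Brute force over all orders; an order and its mirror image describe the same map.
    return min(permutations(genes), key=cost)

# Hypothetical pairwise recombination frequencies for three genes.
recomb = {
    frozenset(("x", "y")): 0.10,
    frozenset(("y", "z")): 0.25,
    frozenset(("x", "z")): 0.33,
}
print(best_order(["x", "y", "z"], recomb))
```

In this example the pair with the largest frequency, `x` and `z`, ends up at the two ends of the map, with `y` between them. Brute force is only practical for a handful of genes.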
Here's a possible method for delivering the content of this exposition to students:
Describe the recombination process and genetic linkage with the uniform 1-crossover model as a hypothetical prototype, and explain how genetic mapping can be done based on observed probabilities. Introduce hotspots as an exception to the normal behavior of recombination.
As part of a homework assignment, ask which biological properties are satisfied by the uniform 1-crossover model and which are not. Assume that
(optional) As an advanced question, ask students to prove that a uniform 2-crossover model breaks the linkage property.
Provide solutions and briefly go over them in class. Introduce Markov chains and the jumping model.
As a programming assignment, ask students to simulate the jumping model with various values of the parameter
Provide solutions and wrap up by explaining some of the properties of a Markov chain through the jumping model, including the ability to model hotspots.
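One possible shape for the programming assignment is sketched below. It assumes a walk along the genes that starts on either chromosome with probability 1/2 and jumps to the other chromosome with probability `p` between consecutive genes. The function names, the number of genes, and the chosen values of `p` are illustrative.

```python
import random

def jumping_gamete(n_genes, p, rng):
    """Walk along the genes, jumping to the other chromosome with probability p at each step."""
    current = rng.randrange(2)  # start on either chromosome with probability 1/2
    origin = [current]
    for _ in range(n_genes - 1):
        if rng.random() < p:
            current = 1 - current
        origin.append(current)
    return origin

def recombination_profile(n_genes, p, trials, rng):
    """Estimated probability that gene 0 and gene d come from different chromosomes."""
    counts = [0] * n_genes
    for _ in range(trials):
        origin = jumping_gamete(n_genes, p, rng)
        for d in range(n_genes):
            counts[d] += origin[d] != origin[0]
    return [count / trials for count in counts]

rng = random.Random(3)
for p in (0.05, 0.5, 0.95):
    profile = recombination_profile(8, p, 20000, rng)
    print(f"p = {p}:", " ".join(f"{r:.2f}" for r in profile))
```

Students can compare the three printed rows: a small `p` gives probabilities that rise with distance, `p = 0.5` gives values near 1/2 everywhere, and a large `p` gives the alternating pattern associated with hotspots.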
The derivation of the result is as follows:
The proof is by induction where
Knowing that
I am not aware of any other exposition of chromosomal crossover, recombination, genetic linkage, hotspots, and genetic mapping that takes the approach outlined herein. The approach represents a simple and modern treatment of an ancient subject, without a compromise of its scientific and mathematical integrity.
The reader should find an insightful explanation with a focus on reinforcing the ideas by exposing them in different settings. In addition, there is an attempt to introduce the reader to the process of modeling by showing what works and what doesn't. Most importantly, this should provide an early chance to convey to our students that biology is a computational science.
I ignored some of the biological detail in favor of simplicity and consistency. Keep in mind, however, that in biology there is always an exception to the rule!
There is no explicit referencing in the text. This is intentional. I used what everyone would now consider folklore from biology, probability, and calculus. All can be found in textbooks, even elementary ones. For the interested reader, however, and in addition to any introductory texts on probability and calculus, here is a list (in alphabetical order by author) of book chapters that will provide enough background for further endeavors.
Gallager RG (1996) Finite State Markov Chains. In: Discrete Stochastic Processes (pp. 103–112). Norwell, MA: Kluwer Academic Publishers.
Hunter LE (2009) Evolution. In: The Process of Life: An Introduction to Molecular Biology (pp. 19–47). Cambridge, MA: The MIT Press.
Lovász L, Pelikán J, Vesztergombi K (2003) Combinatorial Probability. In: Discrete Mathematics: Elementary and Beyond (pp. 77–80, Uniform Probability). New York, NY: Springer.
Pevzner PA (2001) Computational Gene Hunting. In: Computational Molecular Biology: An Algorithmic Approach (pp. 1–18). Cambridge, MA: The MIT Press.
Stein C, Drysdale RL, Bogart K (2011) Probability. In: Discrete Mathematics for Computer Scientists (pp. 276–279, Conditional Probability). Boston, MA: Pearson Education Inc. (Addison-Wesley).
I would like to thank the QuBi (QUantitative BIology) program committee at Hunter College for their encouragement to write this article, and the reviewers for their valuable suggestions to improve it.