<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id>
<journal-title-group>
<journal-title>PLOS Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-15-00134</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1004394</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes</article-title>
<alt-title alt-title-type="running-head">Identification of Vertebrate Ohnologs Using Multiple Genome Comparison</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" xlink:type="simple">
<name name-style="western">
<surname>Singh</surname> <given-names>Param Priya</given-names></name>
<xref ref-type="aff" rid="aff001"/>
<xref ref-type="fn" rid="currentaff001"><sup>¤a</sup></xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple">
<name name-style="western">
<surname>Arora</surname> <given-names>Jatin</given-names></name>
<xref ref-type="aff" rid="aff001"/>
</contrib>
<contrib contrib-type="author" corresp="yes" xlink:type="simple">
<name name-style="western">
<surname>Isambert</surname> <given-names>Hervé</given-names></name>
<xref ref-type="aff" rid="aff001"/>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<addr-line>CNRS UMR168, UPMC, Institut Curie, Research Center, Paris, France</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple">
<name name-style="western">
<surname>Ouzounis</surname> <given-names>Christos A.</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Hellas, GREECE</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: PPS HI. Performed the experiments: PPS JA HI. Analyzed the data: PPS HI. Wrote the paper: PPS HI.</p>
</fn>
<fn fn-type="current-aff" id="currentaff001">
<label>¤a</label>
<p>Department of Genetics, Stanford, California, United States of America</p>
</fn>
<corresp id="cor001">* E-mail: <email xlink:type="simple">param@stanford.edu</email> (PPS); <email xlink:type="simple">herve.isambert@curie.fr</email> (HI)</corresp>
</author-notes>
<pub-date pub-type="collection">
<month>7</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>16</day>
<month>7</month>
<year>2015</year>
</pub-date>
<volume>11</volume>
<issue>7</issue>
<elocation-id>e1004394</elocation-id>
<history>
<date date-type="received">
<day>29</day>
<month>1</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>9</day>
<month>6</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-year>2015</copyright-year>
<copyright-holder>Singh et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="info:doi/10.1371/journal.pcbi.1004394" xlink:type="simple"/>
<abstract>
<p>Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined ‘ohnologs’ after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understanding the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://ohnologs.curie.fr/">http://ohnologs.curie.fr/</ext-link>. In light of the importance of WGD for the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author Summary</title>
<p>Duplication of existing genes with subsequent divergence of duplicated copies has long been recognized as the primary source of genomic innovation. Gene duplication is thus at the root of the evolution and complexification of living organisms. However, gene duplicates have been retained differently depending on the genomic scale of their duplication and their implication in genetic diseases. The scale of genomic duplication spans from small scale segmental duplication to whole genome duplication (WGD), which corresponds to a dramatic doubling of a species' genome. In particular, all vertebrates, including human, descend from two rounds of WGD, which occurred in their jawless ancestor some 500 MY ago. Interestingly, WGD gene duplicates, also called ‘ohnologs’, have been shown to be more frequently implicated in genetic diseases in human. Hence, identifying ohnologs appears central to better understanding the evolution of vertebrates and their susceptibility to genetic diseases. In this study, we present a computational approach to predict ohnologs in six vertebrate genomes, including human, based on the comparison of their local gene content (<italic>i.e.</italic> synteny) with the genomes of six invertebrate outgroups. We show that such synteny comparisons across multiple genomes enhance the statistical power of ohnolog identification compared to earlier approaches.</p>
</abstract>
<funding-group>
<funding-statement>PPS acknowledges a PhD fellowship from Erasmus Mundus (UPMC) and La Ligue Contre le Cancer. HI acknowledges funding from Foundation Pierre-Gilles de Gennes, grant FPGG025. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="5"/>
<table-count count="1"/>
<page-count count="16"/>
</counts>
<custom-meta-group>
<custom-meta id="data-availability" xlink:type="simple">
<meta-name>Data Availability</meta-name>
<meta-value>All relevant data are within the paper and its Supporting Information files, as well as accessible from the open access server at <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://ohnologs.curie.fr">http://ohnologs.curie.fr</ext-link></meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="sec001" sec-type="intro">
<title>Introduction</title>
<p>Gene duplication followed by divergence of the duplicated copies is the primary source of new genes in eukaryotes. The importance of evolution by gene duplication is exemplified by the large number of paralogous genes in most eukaryotic genomes. In addition to duplication of single genes or genomic segments, duplications of the entire genome have now been firmly established in all major eukaryotic kingdoms. Multiple lineages including unicellular yeast and paramecium, as well as many plants and animals, are known to descend from polyploid ancestors, often through multiple rounds of genome duplications [<xref ref-type="bibr" rid="pcbi.1004394.ref001">1</xref>]. In vertebrates, whole genome duplications (WGD) were first hypothesized by Susumu Ohno [<xref ref-type="bibr" rid="pcbi.1004394.ref002">2</xref>] (the 2R-hypothesis), after whom WGD duplicated genes are now referred to as <italic>“ohnologs”</italic>.</p>
<p>Interestingly, duplicated genes originating from whole genome duplication have been preferentially retained in different functional categories as compared to duplicated genes originating from small scale duplication [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref006">6</xref>]. In particular, many ohnologs have been retained in gene families involved in development, signaling and gene regulation [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref007">7</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref010">10</xref>], and have led to the emergence of novel cell types in vertebrates, such as the neural crest, the midbrain/hindbrain organizer and neurogenic placodes [<xref ref-type="bibr" rid="pcbi.1004394.ref011">11</xref>]. In addition, ohnologs are frequently associated with diseases such as cancer [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref006">6</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref012">12</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref014">14</xref>], and are particularly prone to dominant deleterious mutations [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref006">6</xref>] as rationalized from a population genetics perspective [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref015">15</xref>]. These observations suggest that identifying ohnologs with high statistical confidence has important implications for understanding the developmental complexity of vertebrates as well as their enhanced susceptibility to dominant deleterious mutations and associated diseases.</p>
<p>However, the identification of ohnologs in vertebrate genomes is not straightforward [<xref ref-type="bibr" rid="pcbi.1004394.ref016">16</xref>]. During the millions of years of evolution following WGD, sister regions created by WGD are redistributed across the paleopolyploid genome by chromosomal rearrangements and degenerate by the loss of the majority of ohnologs (<xref ref-type="fig" rid="pcbi.1004394.g001">Fig 1</xref>). In principle, these degenerated WGD duplicated regions sharing a few ohnolog pairs can be identified in the paleopolyploid genome by comparing its genome-wide synteny either with itself (<xref ref-type="fig" rid="pcbi.1004394.g001">Fig 1I</xref>) or with outgroup genomes diverged before the WGD event (<xref ref-type="fig" rid="pcbi.1004394.g001">Fig 1J and 1K</xref>). Yet, the two rounds of WGD at the onset of vertebrates are among the oldest known genome duplications and the conservation of gene order (or micro-synteny) between extant vertebrate and invertebrate outgroup genomes is limited [<xref ref-type="bibr" rid="pcbi.1004394.ref017">17</xref>]. This makes WGD detection methods based on micro-synteny conservation [<xref ref-type="bibr" rid="pcbi.1004394.ref018">18</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref023">23</xref>] difficult to apply to WGD from early vertebrates. 
Other methods not based on synteny, such as Ks-based methods [<xref ref-type="bibr" rid="pcbi.1004394.ref024">24</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref025">25</xref>] and more recent phylogenetic methods [<xref ref-type="bibr" rid="pcbi.1004394.ref026">26</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref027">27</xref>], cannot be easily applied to the 500 MY-old WGD in vertebrates either, due to the saturation of the synonymous substitution rate Ks [<xref ref-type="bibr" rid="pcbi.1004394.ref028">28</xref>] and the difficulty in distinguishing between the two rounds of WGD in the phylogeny of early vertebrates [<xref ref-type="bibr" rid="pcbi.1004394.ref017">17</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref029">29</xref>].</p>
<fig id="pcbi.1004394.g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004394.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Evolution after WGD and identification of ohnologs.</title>
<p>Evolution after WGD and identification of ohnologs using content-based synteny comparison. The genomes of three lineages sharing a common ancestor are shown. Orthologs and paralogs are depicted by the same color. The WGD lineage (A) underwent whole genome duplication (B) followed by non-functionalization (C) and genome rearrangements (D) leading to the current intragenomic content-based synteny (I). By contrast, the two outgroup genomes without WGD (E, G) experienced lineage-specific genome rearrangements (F, H) leading to a 1-to-2 content-based synteny pattern with the WGD lineage (J, K). Note that some ohnolog pairs (D) are only identified by one of the two outgroups (J or K) due to lineage-specific rearrangements.</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.g001"/>
</fig>
<p>As an alternative, a number of studies have proposed to identify ohnologs in the human genome by relaxing strict gene-order criteria and searching, instead, for content-based synteny [<xref ref-type="bibr" rid="pcbi.1004394.ref030">30</xref>] between the human genome and a single invertebrate outgroup genome [<xref ref-type="bibr" rid="pcbi.1004394.ref017">17</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref031">31</xref>] or within the human genome itself [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref004">4</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref032">32</xref>]. Using content-based synteny criteria, however, increases the odds of old duplicates being incorrectly identified as ohnologs, if no quantitative assessment of the statistical confidence of ohnolog pair candidates is performed. In addition, performing synteny comparison with a single outgroup may lead to omission of many ‘true’ ohnolog pairs, whose orthologs have moved to different non-syntenic regions in the extant outgroup genome (<xref ref-type="fig" rid="pcbi.1004394.g001">Fig 1</xref>).</p>
<p>In this study, we have extended these latter approaches to six amniote vertebrates (human, mouse, rat, pig, dog and chicken) by investigating the conservation of content-based gene synteny relative to six invertebrate outgroup genomes (lancelet, two sea squirts, sea urchin, fly and worm, <xref ref-type="supplementary-material" rid="pcbi.1004394.s002">S1 Fig</xref>). We also analyzed the synteny conservation between the regions created by 2R-WGD within each of the vertebrates, and then integrated the synteny information from both self and outgroup comparisons. The integration of synteny information across multiple genomes makes it possible to identify ohnologs that are no longer in significant synteny in a particular vertebrate genome, as long as their ortholog status can be unequivocally established with proper ohnologs in other vertebrates. We present below the general principles of our multiple genome comparison approach to identify 2R ohnologs and provide a quantitative assessment of the statistical confidence of each ohnolog pair by comparison with the expected spurious synteny obtained with shuffled genomes. We show that the synteny comparison across multiple genomes enhances the statistical power of ohnolog identification in vertebrates compared to earlier approaches. The resulting ohnolog pairs and families are accessible at <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://ohnologs.curie.fr/">http://ohnologs.curie.fr/</ext-link> for three statistical confidence levels and can also be recompiled for specific, user-defined, significance criteria.</p>
</sec>
<sec id="sec002" sec-type="materials|methods">
<title>Methods</title>
<sec id="sec003">
<title>Overview of the approach</title>
<p>We implemented content-based synteny comparisons between each amniote vertebrate and multiple invertebrate outgroup genomes. Initial ohnolog candidates were identified, in each vertebrate genome, using a window-based approach to detect putative synteny blocks between each vertebrate and the six outgroup genomes (outgroup comparison, <xref ref-type="fig" rid="pcbi.1004394.g001">Fig 1J</xref>), extending earlier similar approaches [<xref ref-type="bibr" rid="pcbi.1004394.ref017">17</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref030">30</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref031">31</xref>]. Additional synteny block candidates were also identified by comparing each vertebrate genome to itself (self comparison, <xref ref-type="fig" rid="pcbi.1004394.g001">Fig 1I</xref>) [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref032">32</xref>] and ohnolog pair candidates were further restricted to paralogous pairs duplicated at the base of vertebrates according to Ensembl compara [<xref ref-type="bibr" rid="pcbi.1004394.ref033">33</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref035">35</xref>] (see <xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>, Supplementary Materials and Methods). <xref ref-type="supplementary-material" rid="pcbi.1004394.s002">S1 Fig</xref> lists the numbers of human ohnolog pair candidates identified by each invertebrate outgroup and human-human synteny comparison, before applying any filtering on the statistical support of candidate synteny blocks. We identified a total of 15,107 such ohnolog pair candidates, including 11,428 identified with at least one outgroup and 15,054 identified by self comparison.</p>
<p>To narrow down this initial list of ohnolog candidates, we developed a quantitative approach to assess the statistical confidence of each ohnolog pair candidate. This quantitative approach and corresponding ‘q-score’, ranging from 0 to 1, estimates the probability that each ohnolog pair is simply identified by chance. Hence, lower q-scores imply more statistically significant ohnolog pairs (see <xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>). Finally, we integrated q-scores for outgroup-comparison and self-comparison from all vertebrates, and filtered the ohnolog pairs based on the resulting combined q-scores. A flowchart summarizing our algorithmic approach is depicted in <xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2</xref>. The pipeline of the approach is outlined below with methodological details described in Supplementary Materials and Methods (<xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>).</p>
<fig id="pcbi.1004394.g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004394.g002</object-id>
<label>Fig 2</label>
<caption>
<title>Flowchart of the algorithm to identify ohnologs.</title>
<p>Flowchart of the algorithm to identify ohnolog pairs and construct ohnolog families for a single vertebrate genome using content-based synteny comparison with multiple outgroup genomes (left panel) and self-comparison (right panel), see main text and <xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref> for details.</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.g002"/>
</fig>
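<p>As a minimal sketch of how q-scores might be integrated (the actual procedure is given in S1 Text), the following Python fragment combines illustrative q-scores along the lines of the pipeline outline: a geometric average over window sizes, a product over outgroups, and an average over amniote genomes. The function names and the input layout are hypothetical, the arithmetic mean used for the amniote average is an assumption, and outgroups lending no support to a pair are assumed to be simply omitted from the product.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def geometric_mean(q_scores):
    """Geometric mean of q-scores in [0, 1], computed in log space."""
    q_scores = list(q_scores)
    if not q_scores:
        raise ValueError("at least one q-score is required")
    if any(q < 0.0 or q > 1.0 for q in q_scores):
        raise ValueError("q-scores must lie between 0 and 1")
    if min(q_scores) == 0.0:
        return 0.0  # a single zero drives the geometric mean to zero
    return math.exp(sum(math.log(q) for q in q_scores) / len(q_scores))


def global_outgroup_q(q_by_outgroup):
    """Multiply per-outgroup q-scores, each averaged over window sizes.

    q_by_outgroup maps a (hypothetical) outgroup label to the list of
    q-scores obtained for the different window sizes. Multiplying the
    per-outgroup values assumes independent outgroup lineages.
    """
    q = 1.0
    for per_window in q_by_outgroup.values():
        q *= geometric_mean(per_window)
    return q


def consensus_q(q_per_amniote):
    """Average a pair's q-scores over amniotes (arithmetic mean assumed)."""
    q_per_amniote = list(q_per_amniote)
    if not q_per_amniote:
        raise ValueError("at least one amniote q-score is required")
    return sum(q_per_amniote) / len(q_per_amniote)


# Illustrative numbers only, not taken from the study.
human = {"lancelet": [0.1, 0.1, 0.1], "sea_urchin": [0.5, 0.5, 0.5]}
mouse = {"lancelet": [0.2, 0.2, 0.2]}
q_outgr = consensus_q([global_outgroup_q(human), global_outgroup_q(mouse)])
print(f"consensus outgroup q-score: {q_outgr:.3f}")
```
]]></preformat>
<p>In this sketch, each additional supporting outgroup can only lower the global q-score, which is how the multiplication step would increase significance under the independence assumption.</p>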
</sec>
<sec id="sec004">
<title>Outline of the computational pipeline</title>
<list list-type="order">
<list-item><p><bold>Initial ohnolog candidates from comparison with six outgroup genomes.</bold> Initial ohnolog candidates in each amniote genome were identified using a window-based approach to detect putative synteny blocks between each vertebrate genome and the six outgroup genomes (<xref ref-type="supplementary-material" rid="pcbi.1004394.s005">S4 Fig</xref>). We used the orthologs between each vertebrate and outgroup genomes to identify conserved synteny blocks for a given window size <italic>W</italic> ranging from 100 to 500 genes (<xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2A and 2B</xref>, left panel). Vertebrate genes that lie on such synteny blocks and share the same outgroup ortholog (1-to-2 synteny conservation pattern) are ohnolog candidates from the outgroup comparison (<xref ref-type="supplementary-material" rid="pcbi.1004394.s006">S5A Fig</xref>, <xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2D</xref>).</p></list-item>
<list-item><p><bold>Initial ohnolog candidates from self-comparison in each amniote genome.</bold> Additional ohnolog candidates were also identified through self-comparison in each amniote genome using the same window size <italic>W</italic> (<xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2A and 2B</xref>, right panel). We identified regions in each vertebrate genome with multiple paralogs duplicated at the base of vertebrates (<xref ref-type="supplementary-material" rid="pcbi.1004394.s006">S5B Fig</xref>).</p></list-item>
<list-item><p><bold>Filtering ohnolog candidate pairs by duplication time.</bold> Ohnolog pair candidates from both outgroup and self-comparison are further restricted to paralogous gene pairs duplicated at the base of vertebrates according to Ensembl compara (see <xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>).</p></list-item>
<list-item><p><bold>Calculating P-value and q-score for synteny blocks.</bold> A P-value for each synteny block candidate for outgroup and self comparisons is derived based on the observed number of homologous gene pairs in the defined window. This P-value estimates the probability that the observed numbers of orthologous or paralogous gene pairs result simply by chance, given the average and variance of gene pairs across synteny windows (<xref ref-type="supplementary-material" rid="pcbi.1004394.s007">S6 Fig</xref>, <xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2C</xref>). We then combine P-values to define quantitative scores or ‘q-scores’ for outgroup and self comparisons to assess the statistical significance of each ohnolog pair (<xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>, <xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2E</xref>).</p></list-item>
<list-item><p><bold>Averaging across different window sizes.</bold> The ohnolog identification and statistical significance analysis are subsequently performed for five different window sizes ranging from 100 to 500 genes, and a global q-score for outgroup and self comparison is obtained for each ohnolog pair as the geometric average over the different window sizes (<xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2F and 2G</xref>).</p></list-item>
<list-item><p><bold>Leveraging statistical power of multiple outgroup comparison.</bold> To take advantage of the statistical power of multiple outgroup comparison, q-scores computed from the different outgroup comparisons are simply multiplied to yield a unique, more significant global q-score taking into account all outgroups. This amounts to assuming independent rearrangements in each outgroup lineage, all of which diverged more than 500 MY ago. Comparisons with randomized genomes confirmed limited spurious identification of false positive ohnologs due to outgroup genome correlations (<xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>, <xref ref-type="supplementary-material" rid="pcbi.1004394.s008">S7 Fig</xref> and <xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2H</xref>).</p></list-item>
<list-item><p><bold>Computing consensus amniote ohnologs.</bold> The statistical power of multiple genome comparison is further exploited to obtain a consensus set of amniote ohnologs. To this end, outgroup and self-synteny q-scores of ohnolog pairs from different amniotes are averaged over all genomes with corresponding ortholog pairs in Ensembl (<xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>). Using averaged q-scores makes it possible to circumvent some recent lineage-specific rearrangements in amniote genomes, while taking into account their long common evolutionary history since divergence from invertebrate outgroups (<xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2I</xref>).</p></list-item>
<list-item><p><bold>Defining statistical confidence criteria.</bold> We then construct three sets of ohnologs by combining averaged q-scores from both outgroup (<inline-formula id="pcbi.1004394.e001"><alternatives><graphic id="pcbi.1004394.e001g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e001"/><mml:math id="M1" display="inline" overflow="scroll"><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">o</mml:mi> <mml:mi mathvariant="normal">u</mml:mi> <mml:mi mathvariant="normal">t</mml:mi> <mml:mi mathvariant="normal">g</mml:mi> <mml:mi mathvariant="normal">r</mml:mi></mml:mrow></mml:msub></mml:math></alternatives></inline-formula>) and self (<inline-formula id="pcbi.1004394.e002"><alternatives><graphic id="pcbi.1004394.e002g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e002"/><mml:math id="M2" display="inline" overflow="scroll"><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">s</mml:mi> <mml:mi mathvariant="normal">e</mml:mi> <mml:mi mathvariant="normal">l</mml:mi> <mml:mi mathvariant="normal">f</mml:mi></mml:mrow></mml:msub></mml:math></alternatives></inline-formula>) comparisons to define three significance criteria (<xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2J</xref>),
<list list-type="alpha-lower">
<list-item><p><bold>Strict</bold>:    <inline-formula id="pcbi.1004394.e003"><alternatives><graphic id="pcbi.1004394.e003g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e003"/><mml:math id="M3" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">o</mml:mi> <mml:mi mathvariant="normal">u</mml:mi> <mml:mi mathvariant="normal">t</mml:mi> <mml:mi mathvariant="normal">g</mml:mi> <mml:mi mathvariant="normal">r</mml:mi></mml:mrow></mml:msub> <mml:mo>&lt;</mml:mo> <mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>01</mml:mn></mml:mrow></mml:math></alternatives></inline-formula> <sc>AND</sc> <inline-formula id="pcbi.1004394.e004"><alternatives><graphic id="pcbi.1004394.e004g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e004"/><mml:math id="M4" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">s</mml:mi> <mml:mi mathvariant="normal">e</mml:mi> <mml:mi mathvariant="normal">l</mml:mi> <mml:mi mathvariant="normal">f</mml:mi></mml:mrow></mml:msub> <mml:mo>&lt;</mml:mo> <mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>01</mml:mn></mml:mrow></mml:math></alternatives></inline-formula></p></list-item>
<list-item><p><bold>Intermediate</bold>: <inline-formula id="pcbi.1004394.e005"><alternatives><graphic id="pcbi.1004394.e005g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e005"/><mml:math id="M5" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">o</mml:mi> <mml:mi mathvariant="normal">u</mml:mi> <mml:mi mathvariant="normal">t</mml:mi> <mml:mi mathvariant="normal">g</mml:mi> <mml:mi mathvariant="normal">r</mml:mi></mml:mrow></mml:msub> <mml:mo>&lt;</mml:mo> <mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>05</mml:mn></mml:mrow></mml:math></alternatives></inline-formula> <sc>AND</sc> <inline-formula id="pcbi.1004394.e006"><alternatives><graphic id="pcbi.1004394.e006g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e006"/><mml:math id="M6" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">s</mml:mi> <mml:mi mathvariant="normal">e</mml:mi> <mml:mi mathvariant="normal">l</mml:mi> <mml:mi mathvariant="normal">f</mml:mi></mml:mrow></mml:msub> <mml:mo>&lt;</mml:mo> <mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>3</mml:mn></mml:mrow></mml:math></alternatives></inline-formula></p></list-item>
<list-item><p><bold>Relaxed</bold>:   <inline-formula id="pcbi.1004394.e007"><alternatives><graphic id="pcbi.1004394.e007g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e007"/><mml:math id="M7" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">o</mml:mi> <mml:mi mathvariant="normal">u</mml:mi> <mml:mi mathvariant="normal">t</mml:mi> <mml:mi mathvariant="normal">g</mml:mi> <mml:mi mathvariant="normal">r</mml:mi></mml:mrow></mml:msub> <mml:mo>&lt;</mml:mo> <mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>05</mml:mn></mml:mrow></mml:math></alternatives></inline-formula> <sc>OR</sc> (<inline-formula id="pcbi.1004394.e008"><alternatives><graphic id="pcbi.1004394.e008g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e008"/><mml:math id="M8" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">o</mml:mi> <mml:mi mathvariant="normal">u</mml:mi> <mml:mi mathvariant="normal">t</mml:mi> <mml:mi mathvariant="normal">g</mml:mi> <mml:mi mathvariant="normal">r</mml:mi></mml:mrow></mml:msub> <mml:mspace width="-0.166667em"/><mml:mo>&lt;</mml:mo> <mml:mspace width="-0.166667em"/><mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>5</mml:mn></mml:mrow></mml:math></alternatives></inline-formula> <sc>AND</sc> <inline-formula id="pcbi.1004394.e009"><alternatives><graphic id="pcbi.1004394.e009g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e009"/><mml:math id="M9" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover><mml:mi>Q</mml:mi> <mml:mo>¯</mml:mo></mml:mover> <mml:mrow><mml:mi mathvariant="normal">s</mml:mi> <mml:mi mathvariant="normal">e</mml:mi> <mml:mi mathvariant="normal">l</mml:mi> <mml:mi 
mathvariant="normal">f</mml:mi></mml:mrow></mml:msub> <mml:mspace width="-0.166667em"/><mml:mo>&lt;</mml:mo> <mml:mspace width="-0.166667em"/><mml:mn>0</mml:mn> <mml:mo>.</mml:mo> <mml:mn>01</mml:mn></mml:mrow></mml:math></alternatives></inline-formula>)</p></list-item>
</list>
Note that the relaxed criteria may also include a number of paralogs from large scale segmental duplications from the origin of vertebrates.</p></list-item>
<list-item><p><bold>Generating ohnolog gene families.</bold> Finally, we construct ohnolog gene families using a depth-first search algorithm [<xref ref-type="bibr" rid="pcbi.1004394.ref036">36</xref>] in the space of ohnolog pairs (<xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>, <xref ref-type="fig" rid="pcbi.1004394.g002">Fig 2K</xref>).</p></list-item>
</list>
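<p>As an illustration, the intermediate and relaxed filters listed above can be written as simple predicates on the two averaged q-scores. This is a minimal Python sketch rather than the authors' implementation: the function and argument names are hypothetical, and <monospace>passes_custom</monospace> assumes that user-defined thresholds are combined with a logical AND, which the text does not specify.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def passes_intermediate(q_outgr: float, q_self: float) -> bool:
    """Intermediate criteria: Q_outgr < 0.05 AND Q_self < 0.3."""
    return q_outgr < 0.05 and q_self < 0.3


def passes_relaxed(q_outgr: float, q_self: float) -> bool:
    """Relaxed criteria: Q_outgr < 0.05 OR (Q_outgr < 0.5 AND Q_self < 0.01)."""
    return q_outgr < 0.05 or (q_outgr < 0.5 and q_self < 0.01)


def passes_custom(q_outgr: float, q_self: float,
                  max_outgr: float, max_self: float) -> bool:
    """Hypothetical user-defined filter (assumption: both cut-offs must hold)."""
    return q_outgr < max_outgr and q_self < max_self


# Every pair accepted by the intermediate filter is also accepted by the
# relaxed filter, because Q_outgr < 0.05 alone satisfies the relaxed rule.
grid = [i / 200 for i in range(200)]
nested = all(passes_relaxed(qo, qs)
             for qo in grid for qs in grid
             if passes_intermediate(qo, qs))
print(nested)
```
]]></preformat>
<p>The final check only illustrates that the two filters are nested as written; it does not say anything about how the q-scores themselves are computed.</p>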
</sec>
</sec>
<sec id="sec005" sec-type="conclusions">
<title>Results/Discussion</title>
<sec id="sec006">
<title>Human ohnologs</title>
<p>The strict, intermediate and relaxed criteria lead to three sets of ohnolog pairs in the human genome with decreasing statistical confidence levels: 2,695 ohnolog pairs with very high confidence, 4,827 with high confidence and 8,178 with medium confidence, respectively (<xref ref-type="table" rid="pcbi.1004394.t001">Table 1</xref>). These predicted ohnolog pairs also differ substantially from the ohnolog pairs reported in earlier studies [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref004">4</xref>] (<xref ref-type="table" rid="pcbi.1004394.t001">Table 1</xref>). In particular, 617 (23%) of the 2,695 strict ohnolog pairs from our analysis are not identified in [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>]. For example, the strict ohnolog pairs between the transcription factors <italic>SOX11</italic> and <italic>SOX12</italic> or between the microtubule-associated proteins <italic>MAP2</italic>, <italic>MAP4</italic> and <italic>MAPT</italic> are missing in [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>]. Conversely, 3,695 (44%) of the 8,383 ohnolog pairs reported in [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>] are excluded by the present analysis. More precisely, we found that 1,853 (50%) of these 3,695 ohnolog pairs ruled out by our analysis have not been duplicated at the base of vertebrates according to Ensembl Compara, while 813 (22%) of the discarded ohnolog pairs are not supported by our quantitative multi-genome synteny comparison and the remaining 1,029 (28%) are excluded by both duplication timing and quantitative multi-genome synteny assessment. For example, the 3-oxoacid CoA-transferase genes <italic>OXCT1</italic> and <italic>OXCT2</italic>, previously reported as ohnologs [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>], have in fact been duplicated more recently than the 2R-WGD (<italic>i.e.</italic> in mammals according to Ensembl Compara). 
By contrast, the signaling genes <italic>WNT1</italic> and <italic>WNT3</italic>, also reported as an ohnolog pair [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>], are not supported by our quantitative multi-genome synteny criteria and have also been duplicated earlier than the 2R-WGD (<italic>i.e.</italic> in bilateria or coelomata according to Ensembl Compara).</p>
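<p>The percentages quoted in this comparison follow directly from the pair counts. The short Python sketch below, in which the variable names and the short labels for the three exclusion categories are ours, simply recomputes them from the numbers given in the text.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
# Counts quoted in the text; variable names and labels are illustrative.
strict_pairs = 2695          # strict ohnolog pairs (this study)
strict_not_in_ref3 = 617     # strict pairs not identified in [3]
ref3_pairs = 8383            # ohnolog pairs reported in [3]
ref3_excluded = 3695         # pairs from [3] excluded by this analysis
excluded_breakdown = {
    "duplication timing only": 1853,
    "synteny support only": 813,
    "both": 1029,
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print(percent(strict_not_in_ref3, strict_pairs))
print(percent(ref3_excluded, ref3_pairs))
for reason, count in excluded_breakdown.items():
    print(reason, count, percent(count, ref3_excluded))

# The three categories partition the excluded pairs.
breakdown_total = sum(excluded_breakdown.values())
```
]]></preformat>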
<table-wrap id="pcbi.1004394.t001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004394.t001</object-id>
<label>Table 1</label>
<caption>
<title>Individual ohnologs, pairs and families for different quantitative criteria in the human genome (see text).</title>
</caption>
<alternatives>
<graphic id="pcbi.1004394.t001g" mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.t001"/>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
<col align="left" valign="top" span="1"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="2" colspan="1">Confidence criteria (this study) <italic>vs</italic> earlier studies</th>
<th align="left" rowspan="2" colspan="1">Ohno Pairs</th>
<th align="left" rowspan="2" colspan="1">Individual Ohnologs</th>
<th align="left" rowspan="2" colspan="1">Ohnolog Families</th>
<th align="center" colspan="4" rowspan="1">Family Sizes</th>
<th align="left" rowspan="2" colspan="1">% of families with size ≤ 4</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">2</th>
<th align="left" rowspan="1" colspan="1">3</th>
<th align="left" rowspan="1" colspan="1">4</th>
<th align="left" rowspan="1" colspan="1">≥ 5</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Strict criteria</td>
<td align="left" rowspan="1" colspan="1">2695</td>
<td align="left" rowspan="1" colspan="1">3544</td>
<td align="left" rowspan="1" colspan="1">1381</td>
<td align="left" rowspan="1" colspan="1">970</td>
<td align="left" rowspan="1" colspan="1">321</td>
<td align="left" rowspan="1" colspan="1">83</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="char" char="." rowspan="1" colspan="1">99.5%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Intermediate criteria</td>
<td align="left" rowspan="1" colspan="1">4827</td>
<td align="left" rowspan="1" colspan="1">5504</td>
<td align="left" rowspan="1" colspan="1">2024</td>
<td align="left" rowspan="1" colspan="1">1337</td>
<td align="left" rowspan="1" colspan="1">481</td>
<td align="left" rowspan="1" colspan="1">175</td>
<td align="left" rowspan="1" colspan="1">31</td>
<td align="char" char="." rowspan="1" colspan="1">98.5%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Relaxed criteria</td>
<td align="left" rowspan="1" colspan="1">8178</td>
<td align="left" rowspan="1" colspan="1">7831</td>
<td align="left" rowspan="1" colspan="1">2642</td>
<td align="left" rowspan="1" colspan="1">1676</td>
<td align="left" rowspan="1" colspan="1">633</td>
<td align="left" rowspan="1" colspan="1">245</td>
<td align="left" rowspan="1" colspan="1">88</td>
<td align="char" char="." rowspan="1" colspan="1">96.7%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Makino &amp; McLysaght 2010</td>
<td align="left" rowspan="1" colspan="1">8383</td>
<td align="left" rowspan="1" colspan="1">6993</td>
<td align="left" rowspan="1" colspan="1">2351</td>
<td align="left" rowspan="1" colspan="1">1475</td>
<td align="left" rowspan="1" colspan="1">547</td>
<td align="left" rowspan="1" colspan="1">214</td>
<td align="left" rowspan="1" colspan="1">115</td>
<td align="char" char="." rowspan="1" colspan="1">95.1%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Huminiecki &amp; Heldin 2010</td>
<td align="left" rowspan="1" colspan="1">29344</td>
<td align="left" rowspan="1" colspan="1">9557</td>
<td align="left" rowspan="1" colspan="1">2543</td>
<td align="left" rowspan="1" colspan="1">1222</td>
<td align="left" rowspan="1" colspan="1">618</td>
<td align="left" rowspan="1" colspan="1">332</td>
<td align="left" rowspan="1" colspan="1">371</td>
<td align="char" char="." rowspan="1" colspan="1">85.4%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>The distribution of our ohnolog pairs with respect to all six outgroups is depicted in a six-way Venn diagram in <xref ref-type="fig" rid="pcbi.1004394.g003">Fig 3</xref> (percentages) and <xref ref-type="supplementary-material" rid="pcbi.1004394.s009">S8 Fig</xref> (numbers). The number of ohnolog pairs ranges from 1,416 using sea urchin to a maximum of 5,994 using <italic>Drosophila melanogaster</italic> as outgroup. Only 3.8% (293) of the ohnolog pairs are identified by all outgroups, while each outgroup combination shaded in green in <xref ref-type="fig" rid="pcbi.1004394.g003">Fig 3</xref> contributes more than 2% of the total number of ohnolog pairs. This illustrates that many ohnologs would not be identified using just a single outgroup genome, owing to lineage specific rearrangements in the outgroup genomes or to limitations of genome assembly/annotation or homology criteria. In particular, while 90% (6,943) of the ohnolog pairs in human are identified by at least one chordate outgroup genome, 10% (772) are identified only by synteny comparison with non-chordate genomes. For example, the homeobox protein ohnolog pair <italic>VAX1</italic>/<italic>VAX2</italic> and the nuclear receptor co-repressor ohnolog pair <italic>LCOR</italic>/<italic>LCORL</italic> are only identified by synteny comparison with <italic>D. melanogaster</italic> and <italic>C. elegans</italic>.</p>
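<p>The sectors of such a Venn diagram can be obtained by recording, for each ohnolog pair, the exact combination of outgroups that supports it. The Python sketch below illustrates this bookkeeping on toy input; the function name, the dictionary layout and the placeholder genes <monospace>G1</monospace> to <monospace>G4</monospace> are hypothetical, and only the <italic>VAX1</italic>/<italic>VAX2</italic> and <italic>LCOR</italic>/<italic>LCORL</italic> assignments and the final counts are taken from the text.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from collections import Counter


def venn_sectors(pairs_by_outgroup):
    """Count ohnolog pairs per exact combination of supporting outgroups."""
    support = {}
    for outgroup, pairs in pairs_by_outgroup.items():
        for gene_a, gene_b in pairs:
            pair = tuple(sorted((gene_a, gene_b)))  # treat pairs as unordered
            support.setdefault(pair, set()).add(outgroup)
    return Counter(frozenset(outgroups) for outgroups in support.values())


# Toy input with three outgroups (not the data set of the paper).
toy = {
    "fly": {("VAX1", "VAX2"), ("LCOR", "LCORL"), ("G1", "G2")},
    "worm": {("VAX2", "VAX1"), ("LCOR", "LCORL")},
    "sea_urchin": {("G1", "G2"), ("G3", "G4")},
}
sectors = venn_sectors(toy)
total = sum(sectors.values())
for combo, count in sectors.items():
    print(sorted(combo), count, f"{100 * count / total:.1f}%")

# Consistency check on the counts quoted in the text.
identified = 7715  # pairs identified by at least one outgroup
share_all_outgroups = round(100 * 293 / identified, 1)
share_chordate = round(100 * 6943 / identified)
share_non_chordate_only = round(100 * 772 / identified)
```
]]></preformat>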
<fig id="pcbi.1004394.g003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004394.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Venn diagram of distribution of human ohnologs with respect to outgroups.</title>
<p>A six-way Venn diagram showing the distribution, in percentages, of the 7,715 human ohnolog pairs (out of the 8,178 predicted from the relaxed criteria) that are identified by at least one outgroup. Only 3.8% of human ohnolog pairs are identified by all outgroups. Each of the sectors shaded in green contributes more than 2% of all ohnolog pairs (numbers of ohnolog pairs are given in <xref ref-type="supplementary-material" rid="pcbi.1004394.s009">S8 Fig</xref>).</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.g003"/>
</fig>
<p>The final human ohnolog counts are 3,544, 5,504 and 7,831 ohnologs for the strict, intermediate and relaxed criteria, respectively (<xref ref-type="table" rid="pcbi.1004394.t001">Table 1</xref>). This is also to be contrasted with the results of previous studies that used either content-based synteny comparison with a single outgroup [<xref ref-type="bibr" rid="pcbi.1004394.ref017">17</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref031">31</xref>] or only self comparison [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref004">4</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref032">32</xref>] without statistical significance criteria to filter out spurious synteny block conservation. We found that the available sets of human ohnologs from these early studies also differ substantially from our results. For instance, the set of 7,075 ohnolog genes from [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>] differs markedly from ours (<xref ref-type="supplementary-material" rid="pcbi.1004394.s010">S9 Fig</xref>), as 14%, 18% and 23% of our human ohnologs for strict, intermediate and relaxed criteria, respectively, have not been identified in [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>]. Conversely, 57%, 33% and 15% of this early ohnolog data set are excluded from our strict, intermediate and relaxed human ohnolog sets, respectively (<xref ref-type="supplementary-material" rid="pcbi.1004394.s010">S9 Fig</xref>). As discussed above, this is due to inconsistent duplication times, according to Ensembl Compara, and/or limited statistical support under each confidence criterion.</p>
<p>We then reconstructed ohnolog families from ohnolog pairs using a depth-first search algorithm [<xref ref-type="bibr" rid="pcbi.1004394.ref036">36</xref>] (<xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>). The resulting ohnolog families also contain paralogs which are small scale duplicates with respect to each other but form ohnolog pairs with a third gene of the family. Accounting for such small scale duplicates eventually leads to ohnolog families with an expected maximum of four ohnologs retained from the two rounds of WGD in early vertebrates. However, as most genes lose their duplicates after WGD, most ohnolog families are expected to be of size two or three.</p>
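<p>A depth-first search over ohnolog pairs can be sketched as follows. This Python example assumes that a family corresponds to a connected component of the graph whose edges are ohnolog pairs; the actual procedure, including the treatment of small scale duplicates, is described in S1 Text and is not reproduced here. The function name is hypothetical, and the choice of which <italic>MAP2</italic>/<italic>MAP4</italic>/<italic>MAPT</italic> pairs to list in the toy input is an assumption made for illustration.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def ohnolog_families(pairs):
    """Group genes into families: connected components found by iterative DFS."""
    graph = {}
    for gene_a, gene_b in pairs:
        graph.setdefault(gene_a, set()).add(gene_b)
        graph.setdefault(gene_b, set()).add(gene_a)

    seen = set()
    families = []
    for start in sorted(graph):          # sorted for a reproducible order
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        family = []
        while stack:
            gene = stack.pop()           # depth-first: take the newest gene
            family.append(gene)
            for neighbour in graph[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        families.append(sorted(family))
    return families


# Toy input: illustrative pairs only.
toy_pairs = [("MAP2", "MAP4"), ("MAP4", "MAPT"), ("SOX11", "SOX12")]
families = ohnolog_families(toy_pairs)
print(families)
```
]]></preformat>
<p>In this sketch two genes end up in the same family as soon as a chain of ohnolog pairs connects them, which is also how spurious pairs could merge distinct families into one oversized family.</p>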
<p>We obtained 1,381, 2,024 and 2,642 ohnolog families using strict, intermediate and relaxed criteria, respectively, for the human genome. Most remarkably, in almost all of these families, the size does not exceed four ohnologs, as expected for two rounds of WGD. As shown in <xref ref-type="table" rid="pcbi.1004394.t001">Table 1</xref>, all but 7 ohnolog families (99.5%) have a size smaller than or equal to four for the strict criteria. Even with the most relaxed criteria, 96.7% of ohnolog families are consistent with a maximum family size of four ohnologs. Furthermore, a sharp decline in the number of families was observed beyond size four, suggesting a limited number of false positive ohnologs incompatible with two rounds of genome duplications. Interestingly, however, many three- or four-ohnolog families could not be identified independently in individual amniote genomes, but only by integrating synteny information from different amniote genomes, such as the four-ohnolog family <italic>ERAS</italic>/<italic>HRAS</italic>/<italic>KRAS</italic>/<italic>NRAS</italic> (relaxed criteria).</p>
<p>We also applied the same approach to generate ohnolog families from the ohnolog pairs provided by [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>] and [<xref ref-type="bibr" rid="pcbi.1004394.ref004">4</xref>]. 95.1% of ohnolog families from [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>] are consistent with two rounds of WGD and only 85.4% of ohnolog families from [<xref ref-type="bibr" rid="pcbi.1004394.ref004">4</xref>] have sizes up to four ohnologs. Clearly, families exceeding four ohnologs must either result from the erroneous concatenation of distinct ohnolog families or include non-ohnolog genes. For instance, the ohnolog status of <italic>TRPV5</italic> and <italic>TRPV6</italic> [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>] from the large family of six ion channels (<italic>TRPV1-6</italic>) is not supported by our quantitative assessment of self- and outgroup synteny. Conversely, we could also identify previously overlooked ohnologs through high confidence assessment of self- and outgroup synteny. For instance, the guanine exchange factor <italic>RGL2</italic> was found to be part of a four-ohnolog family with strict criteria, <italic>RGL1</italic>/<italic>RGL2</italic>/<italic>RGL3</italic>/<italic>RALGDS, RGL4</italic> (with <italic>RGL4</italic> a small scale duplicate of <italic>RALGDS</italic>).</p>
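<p>The last column of <xref ref-type="table" rid="pcbi.1004394.t001">Table 1</xref> can be recomputed from the family size counts in the same table. The Python sketch below does this for the five rows; the dictionary layout and the function name are ours, while the counts are copied from the table.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
# Family counts by size (2, 3, 4, >= 5), copied from Table 1.
table1 = {
    "Strict criteria": (970, 321, 83, 7),
    "Intermediate criteria": (1337, 481, 175, 31),
    "Relaxed criteria": (1676, 633, 245, 88),
    "Makino & McLysaght 2010": (1475, 547, 214, 115),
    "Huminiecki & Heldin 2010": (1222, 618, 332, 371),
}


def families_up_to_four(sizes):
    """Return (total families, % of families with at most four ohnologs)."""
    total = sum(sizes)
    oversized = sizes[3]                 # families with five or more members
    return total, round(100 * (total - oversized) / total, 1)


summary = {name: families_up_to_four(sizes) for name, sizes in table1.items()}
for name, (total, pct) in summary.items():
    print(f"{name}: {total} families, {pct}% with size up to four")
```
]]></preformat>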
</sec>
<sec id="sec007">
<title>Ohnologs in other amniote vertebrates</title>
<p>In addition to the human genome, our synteny comparison approach across multiple genomes also identified ohnologs in five other amniote genomes: four mammals (mouse, rat, pig and dog) and one bird (chicken). Starting from ohnolog pairs in each species, the same approach was used to generate ohnolog families. A summary of individual ohnologs, ohnolog pairs and ohnolog families for these genomes is given in <xref ref-type="supplementary-material" rid="pcbi.1004394.s003">S2 Fig</xref> for strict, intermediate and relaxed quantitative criteria.</p>
<p>The level of annotation of these genomes is variable and the number of annotated protein coding genes ranges from 15,310 for chicken to 22,865 for the rat genome (<xref ref-type="supplementary-material" rid="pcbi.1004394.s004">S3 Fig</xref>). Using the relaxed criteria, a minimum of 4,282 to a maximum of 9,708 ohnolog pairs could be identified for chicken and rat, respectively. The six-way Venn diagram in <xref ref-type="fig" rid="pcbi.1004394.g004">Fig 4</xref> summarizes the fractions of retention <italic>versus</italic> lineage specific loss of ohnologs in the analyzed amniote genomes for the relaxed criteria (see <xref ref-type="supplementary-material" rid="pcbi.1004394.s011">S10 Fig</xref> for ohnolog numbers). Statistics for the strict criteria are given in <xref ref-type="supplementary-material" rid="pcbi.1004394.s012">S11 Fig</xref>. The identification of consensus ohnologs in this context implies that we are able to detect their ohnolog status through self- and outgroup synteny comparison or, alternatively, through orthology with <italic>bona fide</italic> ohnologs in other amniotes (see <xref ref-type="supplementary-material" rid="pcbi.1004394.s001">S1 Text</xref>). Indeed, ohnologs that are no longer in significant synteny in a particular vertebrate genome can still be identified, as long as their ortholog status can be unequivocally established with proper ohnologs in other vertebrates. This makes it possible to circumvent strict synteny conditions in a specific genome.</p>
<fig id="pcbi.1004394.g004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004394.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Venn diagram of the distribution of amniote ohnologs.</title>
<p>A six-way Venn diagram showing the distribution in percentages of the ohnologs identified in at least one amniote and predicted from the relaxed criteria. 36.6% of ohnologs are found in all six amniotes. Each of the sectors shaded in red contributes more than 2% of all consensus ohnologs in amniotes (numbers of ohnologs are given in <xref ref-type="supplementary-material" rid="pcbi.1004394.s011">S10 Fig</xref>).</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.g004"/>
</fig>
<p>In contrast to the small fraction of ohnolog pairs identified by all six outgroups (<italic>i.e.</italic> 3.8%, <xref ref-type="fig" rid="pcbi.1004394.g003">Fig 3</xref>), 36.6% of predicted ohnologs are shared by all six amniotes, 53.9% by the five mammals and 74.3% by human, mouse and rat, while only a few other combinations of specific amniotes contribute more than 2% of all ohnologs (see sectors shaded in red in <xref ref-type="fig" rid="pcbi.1004394.g004">Fig 4</xref>). This illustrates that the ohnologs have been largely conserved in mammals and to a lesser extent across amniotes. Likewise, ohnolog family sizes in each amniote genome consistently follow distributions similar to those observed in human (<xref ref-type="table" rid="pcbi.1004394.t001">Table 1</xref>) with a sharp decline in the number of families beyond the maximum size of four ohnologs (<xref ref-type="supplementary-material" rid="pcbi.1004394.s003">S2 Fig</xref>). In fact, the numbers of ohnologs in each family are most often the same in human and other mammals (in particular mouse) with occasional differences, typically missing ohnologs, in chicken, which has significantly fewer genes (including ohnologs) than other amniotes considered in this study. For example, chicken has lost a number of adipokine genes [<xref ref-type="bibr" rid="pcbi.1004394.ref037">37</xref>] such as <italic>SERPINE1</italic>, which is part of a four-ohnolog family in mammals, <italic>SERPINE1</italic>/<italic>SERPINE2</italic>/<italic>SERPINE3</italic>/<italic>SERPINI1</italic>|<italic>SERPINI2</italic> (where <italic>SERPINI1</italic> and <italic>SERPINI2</italic> are small scale duplicates). Similarly, all three ohnolog genes in the family of DNA binding Forkhead box protein A, <italic>i.e.</italic> <italic>FOXA1</italic>/<italic>FOXA2</italic>/<italic>FOXA3</italic>, are missing in the annotated chicken genome. 
Hence, differences in the shared ohnologs in <xref ref-type="fig" rid="pcbi.1004394.g004">Fig 4</xref> arise due to lineage specific ohnolog loss or, possibly, due to missing annotations of genes and/or orthologs in these genomes.</p>
<p>We have so far restricted our synteny conservation analysis across multiple genomes to selected amniote genomes. In particular, amphibians and fishes have not been included in the analysis. This is because assembled chromosomal scaffolds of available amphibians (<italic>e.g.</italic> Xenopus) and non-teleost fishes (<italic>e.g.</italic> elephant shark and coelacanth) do not contain enough genes to be included in a content-based synteny conservation analysis (<italic>e.g.</italic> 81% of <italic>X. tropicalis</italic> genes are on chromosomal scaffolds with fewer than 50 genes). As for teleost fish genomes, they experienced a third more recent (3R) WGD, about 300 MY ago [<xref ref-type="bibr" rid="pcbi.1004394.ref038">38</xref>] in addition to the two rounds of (2R) WGD common to all vertebrates. This additional 3R WGD implies methodological issues specific to teleost fish genomes, which will be addressed in a forthcoming extension of our computational approach to identify ohnologs through multiple genome synteny comparison.</p>
</sec>
<sec id="sec008">
<title>Ohnologs association with functional categories and diseases</title>
<p>As outlined in the introduction, ohnologs have been reported to be preferentially retained in functional categories associated with development, signaling and gene regulation in the human genome [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref007">7</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref010">10</xref>]. We performed a Gene Ontology (GO) enrichment analysis on four amniote vertebrates using DAVID [<xref ref-type="bibr" rid="pcbi.1004394.ref039">39</xref>] and observed the same general trend across these amniote genomes (<xref ref-type="fig" rid="pcbi.1004394.g005">Fig 5A</xref>). This confirms that ohnologs are associated with similar functional categories in different vertebrates.</p>
<fig id="pcbi.1004394.g005" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004394.g005</object-id>
<label>Fig 5</label>
<caption>
<title>Ohnolog association to cancer and diseases in human.</title>
<p>(A) Gene Ontology enrichment for four amniote ohnolog datasets from the relaxed criteria. From top to bottom, the top 25 enriched GO terms, sorted on the basis of average rank across the four genomes. Bubble sizes are proportional to the rank (p-value) of the term for each genome. (B) Ohnolog association to cancer and genetic diseases in human. Ohnolog enrichment is especially significant for core cancer genes, autosomal dominant disease genes and genes with autoinhibitory protein folds, see text, in agreement with earlier reports [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref006">6</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref015">15</xref>].</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.g005"/>
</fig>
<p>In addition, ohnologs have also been associated with disease mutations [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref012">12</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref014">14</xref>], in particular with dominant deleterious mutations frequently implicated in cancers and dominant genetic diseases [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref006">6</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref015">15</xref>]. <xref ref-type="fig" rid="pcbi.1004394.g005">Fig 5B</xref> confirms such cancer and genetic disease associations for all three ohnolog confidence criteria adopted in this study. This is particularly significant for core cancer genes [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref040">40</xref>] (accounting for just 8.3% of non-ohnologs but up to 21.6–26% of ohnologs, <italic>i.e.</italic> a 2.6–3.1 fold increase, <italic>p</italic> = 3.4 × 10<sup>−153</sup> Fisher Exact Test) and autosomal dominant diseases (accounting for just 2.1% of non-ohnologs but up to 5.4–5.9% of ohnologs, <italic>i.e.</italic> a 2.6–2.8 fold increase, <italic>p</italic> = 3.4 × 10<sup>−27</sup> Fisher Exact Test) in agreement with earlier reports [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>, <xref ref-type="bibr" rid="pcbi.1004394.ref006">6</xref>] and evolutionary models [<xref ref-type="bibr" rid="pcbi.1004394.ref015">15</xref>]. We also analyzed the enrichment of ohnologs in genes with autoinhibitory protein folds, which are prone to dominant deleterious mutations. 
To this end, we collected genes with autoinhibitory protein folds either from careful literature curation [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>] or based on the annotation of structural domains frequently associated with autoinhibition (<italic>i.e.</italic> SH3, DH, PH, CH, Drf and Eth domains), identified using Hidden Markov Model (HMM) search [<xref ref-type="bibr" rid="pcbi.1004394.ref041">41</xref>] against the PFAM database [<xref ref-type="bibr" rid="pcbi.1004394.ref042">42</xref>] (see Supplementary Methods). We observed that the ohnologs are particularly enriched in genes with autoinhibitory protein folds (accounting for just 1.4% of non-ohnologs but up to 9–12% of ohnologs, <italic>i.e.</italic> a 6.4–8.6 fold increase, <italic>p</italic> = 4.4 × 10<sup>−150</sup> Fisher Exact Test) [<xref ref-type="bibr" rid="pcbi.1004394.ref005">5</xref>].</p>
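<p>The fold increases quoted above are ratios of the fractions in ohnologs and non-ohnologs, and an enrichment p-value of this kind can be obtained from a 2 × 2 contingency table. The Python sketch below recomputes the fold increases from the percentages in the text and implements a one-sided Fisher exact test from the hypergeometric distribution. The function names are ours, the small table in the example call is purely hypothetical, and whether the reported p-values are one-sided or two-sided is not stated in the text.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from math import comb


def fold_increase(pct_ohnologs: float, pct_non_ohnologs: float) -> float:
    """Ratio of the fraction among ohnologs to the fraction among non-ohnologs."""
    return round(pct_ohnologs / pct_non_ohnologs, 1)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins.

    a: ohnologs in the category      b: ohnologs outside the category
    c: non-ohnologs in the category  d: non-ohnologs outside the category
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Fold increases recomputed from the percentages quoted in the text.
core_cancer = (fold_increase(21.6, 8.3), fold_increase(26, 8.3))
autosomal_dominant = (fold_increase(5.4, 2.1), fold_increase(5.9, 2.1))
autoinhibitory = (fold_increase(9, 1.4), fold_increase(12, 1.4))
print(core_cancer, autosomal_dominant, autoinhibitory)

# Hypothetical toy table, only to show the call.
p_toy = fisher_one_sided(3, 1, 1, 3)
print(p_toy)
```
]]></preformat>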
</sec>
<sec id="sec009">
<title>The ‘Ohnologs’ server</title>
<p>The data of all the ohnolog pairs and families for the six vertebrate genomes is accessible through the ‘Ohnologs’ server at <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://ohnologs.curie.fr/">http://ohnologs.curie.fr/</ext-link>. There, users can <bold>i)</bold> search for a particular gene, <bold>ii)</bold> browse pre-compiled ohnolog families and ohnolog pairs or <bold>iii)</bold> generate ohnolog families based on their own, user-defined, quantitative filters. The server is implemented in Perl-CGI and is hosted on a virtual machine at Institut Curie.</p>
<p>On the <italic>Search</italic> page (<xref ref-type="supplementary-material" rid="pcbi.1004394.s013">S12 Fig</xref>), the user can search for a gene of interest in any of the six available vertebrates using either Ensembl Id, gene symbol or any desired keywords. Search by functional categories is also possible using Gene Ontology Id or term. If a keyword search does not match any gene directly, we display all the genes matching that keyword in gene symbol, text description or GO term. A hyperlink from this page leads to the details of the gene's ohnolog families, and its possible association with human diseases is linked to the G<sc>enecards</sc> [<xref ref-type="bibr" rid="pcbi.1004394.ref043">43</xref>] and C<sc>osmic</sc> [<xref ref-type="bibr" rid="pcbi.1004394.ref044">44</xref>] databases. This page also contains links to details in the UniProt and Entrez databases, if available. If the gene exists in our analysis and is an ohnolog, users are directed to the details of its ohnolog families for each statistical confidence level (<italic>i.e.</italic>, strict, intermediate and relaxed criteria; <xref ref-type="supplementary-material" rid="pcbi.1004394.s014">S13 Fig</xref>).</p>
<p>Alternatively, users can also generate ohnolog families using our multi-genome comparison analysis for any of the six available vertebrate genomes, using arbitrary, user-defined quantitative criteria for the outgroup and self comparisons. The default values correspond to the strict criteria. The result pages display all the pre-calculated or custom generated families, which can also be downloaded.</p>
<p>In light of the importance of ohnologs in the evolution of vertebrates and their enhanced association with diseases, our analysis provides a useful resource to gain further insights into the impact of WGD in extant vertebrates.</p>
</sec>
</sec>
<sec id="sec010">
<title>Supporting Information</title>
<supplementary-material id="pcbi.1004394.s001" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s001" mimetype="application/pdf" xlink:type="simple">
<label>S1 Text</label>
<caption>
<title>Supplementary materials and methods including details on ohnolog identification and analysis.</title>
<p>(PDF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s002" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s002" mimetype="image/tiff" xlink:type="simple">
<label>S1 Fig</label>
<caption>
<title>Number of human ohnolog candidates.</title>
<p>Number of human ohnologs identified by outgroup and self comparison before applying any quantitative filter for content-based synteny.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s003" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s003" mimetype="image/tiff" xlink:type="simple">
<label>S2 Fig</label>
<caption>
<title>Ohnologs in the five non-human amniote genomes analyzed.</title>
<p>Individual ohnologs, pairs and families for the three quantitative criteria in the five non-human amniote genomes analyzed.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s004" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s004" mimetype="image/tiff" xlink:type="simple">
<label>S3 Fig</label>
<caption>
<title>Numbers of protein coding orthologs and paralogs.</title>
<p>Number of protein coding genes, orthologs and paralogs for the analyzed vertebrate (A) and invertebrate (B) genomes.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s005" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s005" mimetype="image/tiff" xlink:type="simple">
<label>S4 Fig</label>
<caption>
<title>Schematic tree for the organisms analyzed in this study.</title>
<p>Schematic tree for the paleopolyploid and outgroup organisms with duplication nodes taken from Ensembl Compara [<xref ref-type="bibr" rid="pcbi.1004394.ref033">33</xref>–<xref ref-type="bibr" rid="pcbi.1004394.ref035">35</xref>]. Gray nodes are not part of Ensembl. Paleopolyploid vertebrate genomes included in this study are highlighted with a red box and invertebrate outgroups (for the 2R-WGD) are highlighted by a green box.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s006" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s006" mimetype="image/tiff" xlink:type="simple">
<label>S5 Fig</label>
<caption>
<title>Identification of content-based synteny.</title>
<p>Comparison of genomic regions to identify anchor pairs (in red) and ohnolog candidate pairs (dashed red). Each block represents a gene labeled by <italic>O</italic><sub><italic>i</italic></sub> on the outgroup genome and <italic>V</italic><sub><italic>i</italic></sub> on the vertebrate genome. Duplicated regions in the vertebrate genome are marked by <inline-formula id="pcbi.1004394.e010"><alternatives><graphic id="pcbi.1004394.e010g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e010"/><mml:math id="M10" display="inline" overflow="scroll"><mml:mrow><mml:msubsup><mml:mi>V</mml:mi> <mml:mn>1</mml:mn> <mml:mo>′</mml:mo></mml:msubsup> <mml:mo>−</mml:mo> <mml:msubsup><mml:mi>V</mml:mi> <mml:mi>n</mml:mi> <mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></alternatives></inline-formula>. Other orthologous (A) and paralogous (B) relations are depicted by green lines.</p>
<p><bold>(A)</bold> Identification of synteny <italic>anchors</italic> between an outgroup window and two windows in the vertebrate genome. Using a window of size 8(+1) centered around the <italic>O</italic><sub>7</sub>−<italic>V</italic><sub>7</sub> and <inline-formula id="pcbi.1004394.e011"><alternatives><graphic id="pcbi.1004394.e011g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e011"/><mml:math id="M11" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mi>O</mml:mi> <mml:mn>7</mml:mn></mml:msub> <mml:mo>−</mml:mo> <mml:msubsup><mml:mi>V</mml:mi> <mml:mn>7</mml:mn> <mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></alternatives></inline-formula> orthologous pairs, we observe 4 and 3 additional gene pairs between the outgroup and the vertebrate regions 1 and 2, respectively. Hence, <italic>O</italic><sub>7</sub>−<italic>V</italic><sub>7</sub> and <inline-formula id="pcbi.1004394.e012"><alternatives><graphic id="pcbi.1004394.e012g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e012"/><mml:math id="M12" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mi>O</mml:mi> <mml:mn>7</mml:mn></mml:msub> <mml:mo>−</mml:mo> <mml:msubsup><mml:mi>V</mml:mi> <mml:mn>7</mml:mn> <mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></alternatives></inline-formula> are two <italic>anchors</italic> sharing the same outgroup ortholog <italic>O</italic><sub>7</sub>. 
Consequently, <inline-formula id="pcbi.1004394.e013"><alternatives><graphic id="pcbi.1004394.e013g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e013"/><mml:math id="M13" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mi>V</mml:mi> <mml:mn>7</mml:mn></mml:msub> <mml:mo>−</mml:mo> <mml:msubsup><mml:mi>V</mml:mi> <mml:mn>7</mml:mn> <mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></alternatives></inline-formula> is inferred to be an ohnolog pair candidate, which is then filtered using a quantitative statistical significance criterion, the q-score <inline-formula id="pcbi.1004394.e014"><alternatives><graphic id="pcbi.1004394.e014g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e014"/><mml:math id="M14" display="inline" overflow="scroll"><mml:msub><mml:mi>Q</mml:mi> <mml:mrow><mml:mi mathvariant="normal">o</mml:mi> <mml:mi mathvariant="normal">u</mml:mi> <mml:mi mathvariant="normal">t</mml:mi> <mml:mi mathvariant="normal">g</mml:mi> <mml:mi mathvariant="normal">r</mml:mi></mml:mrow></mml:msub></mml:math></alternatives></inline-formula> (see text).</p>
<p><bold>(B)</bold> Identification of ohnologs between two regions in the same vertebrate genome. Because the anchor <inline-formula id="pcbi.1004394.e015"><alternatives><graphic id="pcbi.1004394.e015g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e015"/><mml:math id="M15" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mi>V</mml:mi> <mml:mn>7</mml:mn></mml:msub> <mml:mo>−</mml:mo> <mml:msubsup><mml:mi>V</mml:mi> <mml:mn>7</mml:mn> <mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></alternatives></inline-formula> has four additional paralog pairs between the windows, it is directly taken as an ohnolog pair candidate, which is then filtered using a quantitative statistical significance criterion, the q-score <inline-formula id="pcbi.1004394.e016"><alternatives><graphic id="pcbi.1004394.e016g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004394.e016"/><mml:math id="M16" display="inline" overflow="scroll"><mml:msub><mml:mi>Q</mml:mi> <mml:mrow><mml:mi mathvariant="normal">s</mml:mi> <mml:mi mathvariant="normal">e</mml:mi> <mml:mi mathvariant="normal">l</mml:mi> <mml:mi mathvariant="normal">f</mml:mi></mml:mrow></mml:msub></mml:math></alternatives></inline-formula> (see text).</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s007" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s007" mimetype="image/tiff" xlink:type="simple">
<label>S6 Fig</label>
<caption>
<title>Principle of P-value calculation between putative synteny blocks.</title>
<p>Illustration of the calculation of the likelihood <italic>P</italic><sub><italic>i</italic></sub> for an outgroup gene <italic>O</italic><sub><italic>i</italic></sub>, here <italic>O</italic><sub>8</sub>, to have an ortholog in the vertebrate window (<italic>V</italic><sub>16</sub>−<italic>V</italic><sub>20</sub>) defined by the anchor pair (<italic>O</italic><sub>7</sub>−<italic>V</italic><sub>18</sub>). <italic>O</italic><sub>8</sub> has 5 orthologs in the vertebrate genome: <italic>V</italic><sub>1</sub>, <italic>V</italic><sub>8</sub>, <italic>V</italic><sub>19</sub>, <italic>V</italic><sub>23</sub> and <italic>V</italic><sub>32</sub>. There are 12 possible window locations (highlighted in blue) that contain none of these orthologs. <italic>P</italic><sub><italic>i</italic></sub> for this anchor is therefore 1 − 12/31 ≈ 0.61, where 31 is the total number of possible windows on this schematic vertebrate genome (<italic>N</italic>−<italic>W</italic>).</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s008" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s008" mimetype="image/tiff" xlink:type="simple">
<label>S7 Fig</label>
<caption>
<title>Comparisons of q-score distribution from original and randomized genomes.</title>
<p>Comparisons of the global q-score distributions from the original (blue) and randomized (red) genomes; (A) without worm and fly outgroups; (B) with all six outgroup genomes.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s009" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s009" mimetype="image/tiff" xlink:type="simple">
<label>S8 Fig</label>
<caption>
<title>Venn diagram of outgroup identification of ohnolog pairs in human.</title>
<p>A six-way Venn diagram showing the distribution in numbers of the 7,715 human ohnolog pairs identified by at least one outgroup and predicted from the relaxed criteria.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s010" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s010" mimetype="image/tiff" xlink:type="simple">
<label>S9 Fig</label>
<caption>
<title>Comparison of human ohnologs with the Makino-McLysaght dataset [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>].</title>
<p>Comparison of our human ohnolog prediction for the three quantitative criteria (strict, intermediate and relaxed, see main text) and the ohnolog dataset from [<xref ref-type="bibr" rid="pcbi.1004394.ref003">3</xref>].</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s011" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s011" mimetype="image/tiff" xlink:type="simple">
<label>S10 Fig</label>
<caption>
<title>Venn diagram of distribution of amniote ohnologs for the relaxed criteria.</title>
<p>A six-way Venn diagram showing the distribution in numbers of the ohnologs identified in at least one amniote and predicted from the relaxed criteria.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s012" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s012" mimetype="image/tiff" xlink:type="simple">
<label>S11 Fig</label>
<caption>
<title>Venn diagram of distribution of amniote ohnologs for the strict criteria.</title>
<p>A six-way Venn diagram showing the distribution in numbers (A) and percentages (B) of the ohnologs identified in at least one amniote and predicted from the strict criteria.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s013" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s013" mimetype="image/tiff" xlink:type="simple">
<label>S12 Fig</label>
<caption>
<title>Search page on the ‘Ohnologs’ server.</title>
<p>(TIF)</p>
</caption>
</supplementary-material>
<supplementary-material id="pcbi.1004394.s014" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004394.s014" mimetype="image/tiff" xlink:type="simple">
<label>S13 Fig</label>
<caption>
<title>Ohnolog family page on the ‘Ohnologs’ server.</title>
<p>The result page of the ohnolog family search for the human <italic>EMR3</italic> gene is depicted. Families from all three quantitative criteria are displayed (see text). Using the strict criterion, a family of size 2 is generated in which <italic>ELTD1 &amp; LPHN2</italic> are ohnologs of <italic>EMR2, EMR3 &amp; LPHN1</italic>. Relaxing the q-score to the intermediate criterion adds one ohnolog to this family, <italic>EMR1</italic>, and relaxing it further to the relaxed criterion results in a family of size 4. Ohnolog partners for the families are displayed in different columns. Genes within the same cell are small scale duplicates (SSDs), <italic>e.g.</italic> <italic>ELTD1</italic> and <italic>LPHN2</italic>. We use two different separators for SSDs: a comma (,) for a recent SSD (after the 2R-WGD), and a pipe (|) for an ancient SSD (before or around the same time as the 2R-WGD). Hence, <italic>ELTD1 | LPHN2</italic> have been duplicated by an old SSD, while <italic>EMR1, EMR2</italic> and <italic>LPHN1, EMR3</italic> have been duplicated by recent SSDs. This implies that the entire region containing the <italic>ELTD1 | LPHN2</italic> genes was duplicated by the genome duplications. Duplication times are taken from Ensembl Compara. A link to the corresponding ohnolog family in other vertebrates is also provided for each queried gene, along with associations with human diseases from the GeneCards [<xref ref-type="bibr" rid="pcbi.1004394.ref043">43</xref>] and COSMIC [<xref ref-type="bibr" rid="pcbi.1004394.ref044">44</xref>] databases.</p>
<p>(TIF)</p>
</caption>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>We thank Hugues Roest-Crollius and Pierre Pontarotti for discussion and acknowledge technical support from the service informatique of Institut Curie.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1004394.ref001">
<label>1</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Van de Peer</surname> <given-names>Y</given-names></name>, <name name-style="western"><surname>Maere</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Meyer</surname> <given-names>A</given-names></name> (<year>2009</year>) <article-title>The evolutionary significance of ancient genome duplications</article-title>. <source>Nat Rev Genet</source> <volume>10</volume>: <fpage>725</fpage>–<lpage>732</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrg2600" xlink:type="simple">10.1038/nrg2600</ext-link></comment> <object-id pub-id-type="pmid">19652647</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref002">
<label>2</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Ohno</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Wolf</surname> <given-names>U</given-names></name>, <name name-style="western"><surname>Atkin</surname> <given-names>N</given-names></name> (<year>1968</year>) <article-title>Evolution from fish to mammals by gene duplication</article-title>. <source>Hereditas</source> <volume>59</volume>: <fpage>169</fpage>–<lpage>187</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1111/j.1601-5223.1968.tb02169.x" xlink:type="simple">10.1111/j.1601-5223.1968.tb02169.x</ext-link></comment> <object-id pub-id-type="pmid">5662632</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref003">
<label>3</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Makino</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>McLysaght</surname> <given-names>A</given-names></name> (<year>2010</year>) <article-title>Ohnologs in the human genome are dosage balanced and frequently associated with disease</article-title>. <source>Proc Natl Acad Sci USA</source> <volume>107</volume>: <fpage>9270</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.0914697107" xlink:type="simple">10.1073/pnas.0914697107</ext-link></comment> <object-id pub-id-type="pmid">20439718</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref004">
<label>4</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Huminiecki</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Heldin</surname> <given-names>C</given-names></name> (<year>2010</year>) <article-title>2R and remodeling of vertebrate signal transduction engine</article-title>. <source>BMC Biol</source> <volume>8</volume>: <fpage>146</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1741-7007-8-146" xlink:type="simple">10.1186/1741-7007-8-146</ext-link></comment> <object-id pub-id-type="pmid">21144020</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref005">
<label>5</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Singh</surname> <given-names>PP</given-names></name>, <name name-style="western"><surname>Affeldt</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Cascone</surname> <given-names>I</given-names></name>, <name name-style="western"><surname>Selimoglu</surname> <given-names>R</given-names></name>, <name name-style="western"><surname>Camonis</surname> <given-names>J</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>On the expansion of “dangerous” gene repertoires by whole-genome duplications in early vertebrates</article-title>. <source>Cell Rep</source> <volume>2</volume>: <fpage>1387</fpage>–<lpage>1398</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.celrep.2012.09.034" xlink:type="simple">10.1016/j.celrep.2012.09.034</ext-link></comment> <object-id pub-id-type="pmid">23168259</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref006">
<label>6</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Singh</surname> <given-names>PP</given-names></name>, <name name-style="western"><surname>Affeldt</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Malaguti</surname> <given-names>G</given-names></name>, <name name-style="western"><surname>Isambert</surname> <given-names>H</given-names></name> (<year>2014</year>) <article-title>Human dominant disease genes are enriched in paralogs originating from whole genome duplication</article-title>. <source>PLoS Comput Biol</source> <volume>10</volume>: <fpage>e1003754</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.1003754" xlink:type="simple">10.1371/journal.pcbi.1003754</ext-link></comment> <object-id pub-id-type="pmid">25080083</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref007">
<label>7</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Maere</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>De Bodt</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Raes</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Casneuf</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Van Montagu</surname> <given-names>M</given-names></name>, <etal>et al</etal>. (<year>2005</year>) <article-title>Modeling gene and genome duplications in eukaryotes</article-title>. <source>Proc Natl Acad Sci USA</source> <volume>102</volume>: <fpage>5454</fpage>–<lpage>5459</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.0501102102" xlink:type="simple">10.1073/pnas.0501102102</ext-link></comment> <object-id pub-id-type="pmid">15800040</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref008">
<label>8</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Blomme</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Vandepoele</surname> <given-names>K</given-names></name>, <name name-style="western"><surname>De Bodt</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Simillion</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Maere</surname> <given-names>S</given-names></name>, <etal>et al</etal>. (<year>2006</year>) <article-title>The gain and loss of genes during 600 million years of vertebrate evolution</article-title>. <source>Genome Biol</source> <volume>7</volume>: <fpage>R43</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/gb-2006-7-5-r43" xlink:type="simple">10.1186/gb-2006-7-5-r43</ext-link></comment> <object-id pub-id-type="pmid">16723033</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref009">
<label>9</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Freeling</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Thomas</surname> <given-names>BC</given-names></name> (<year>2006</year>) <article-title>Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity</article-title>. <source>Genome Res</source> <volume>16</volume>: <fpage>805</fpage>–<lpage>814</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.3681406" xlink:type="simple">10.1101/gr.3681406</ext-link></comment> <object-id pub-id-type="pmid">16818725</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref010">
<label>10</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Semon</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Wolfe</surname> <given-names>KH</given-names></name> (<year>2007</year>) <article-title>Consequences of genome duplication</article-title>. <source>Curr Opin Genet Dev</source> <volume>17</volume>: <fpage>505</fpage>–<lpage>512</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.gde.2007.09.007" xlink:type="simple">10.1016/j.gde.2007.09.007</ext-link></comment> <object-id pub-id-type="pmid">18006297</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref011">
<label>11</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Holland</surname> <given-names>LZ</given-names></name> (<year>2013</year>) <article-title>Evolution of new characters after whole genome duplications: insights from amphioxus</article-title>. <source>Semin Cell Dev Biol</source> <volume>24</volume>: <fpage>101</fpage>–<lpage>109</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.semcdb.2012.12.007" xlink:type="simple">10.1016/j.semcdb.2012.12.007</ext-link></comment> <object-id pub-id-type="pmid">23291260</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref012">
<label>12</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Dickerson</surname> <given-names>JE</given-names></name>, <name name-style="western"><surname>Robertson</surname> <given-names>DL</given-names></name> (<year>2012</year>) <article-title>On the origins of Mendelian disease genes in man: the impact of gene duplication</article-title>. <source>Mol Biol Evol</source> <volume>29</volume>: <fpage>61</fpage>–<lpage>69</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/molbev/msr111" xlink:type="simple">10.1093/molbev/msr111</ext-link></comment> <object-id pub-id-type="pmid">21705381</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref013">
<label>13</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Tinti</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Johnson</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Toth</surname> <given-names>R</given-names></name>, <name name-style="western"><surname>Ferrier</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>MacKintosh</surname> <given-names>C</given-names></name> (<year>2012</year>) <article-title>Evolution of signal multiplexing by 14-3-3-binding 2R-ohnologue protein families in the vertebrates</article-title>. <source>Open Biol</source> <volume>2</volume>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1098/rsob.120103" xlink:type="simple">10.1098/rsob.120103</ext-link></comment> <object-id pub-id-type="pmid">22870394</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref014">
<label>14</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Tinti</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Dissanayake</surname> <given-names>K</given-names></name>, <name name-style="western"><surname>Synowsky</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Albergante</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>MacKintosh</surname> <given-names>C</given-names></name> (<year>2014</year>) <article-title>Identification of 2R-ohnologue gene families displaying the same mutation-load skew in multiple cancers</article-title>. <source>Open Biol</source> <volume>4</volume>: <fpage>140029</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1098/rsob.140029" xlink:type="simple">10.1098/rsob.140029</ext-link></comment> <object-id pub-id-type="pmid">24806839</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref015">
<label>15</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Malaguti</surname> <given-names>G</given-names></name>, <name name-style="western"><surname>Singh</surname> <given-names>PP</given-names></name>, <name name-style="western"><surname>Isambert</surname> <given-names>H</given-names></name> (<year>2014</year>) <article-title>On the retention of gene duplicates prone to dominant deleterious mutations</article-title>. <source>Theor Popul Biol</source> <volume>93</volume>: <fpage>38</fpage>–<lpage>51</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.tpb.2014.01.004" xlink:type="simple">10.1016/j.tpb.2014.01.004</ext-link></comment> <object-id pub-id-type="pmid">24530892</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref016">
<label>16</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Van de Peer</surname> <given-names>Y</given-names></name> (<year>2004</year>) <article-title>Computational approaches to unveiling ancient genome duplications</article-title>. <source>Nat Rev Genet</source> <volume>5</volume>: <fpage>752</fpage>–<lpage>763</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrg1449" xlink:type="simple">10.1038/nrg1449</ext-link></comment> <object-id pub-id-type="pmid">15510166</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref017">
<label>17</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Putnam</surname> <given-names>NH</given-names></name>, <name name-style="western"><surname>Butts</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Ferrier</surname> <given-names>DE</given-names></name>, <name name-style="western"><surname>Furlong</surname> <given-names>RF</given-names></name>, <name name-style="western"><surname>Hellsten</surname> <given-names>U</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>The amphioxus genome and the evolution of the chordate karyotype</article-title>. <source>Nature</source> <volume>453</volume>: <fpage>1064</fpage>–<lpage>1071</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature06967" xlink:type="simple">10.1038/nature06967</ext-link></comment> <object-id pub-id-type="pmid">18563158</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref018">
<label>18</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Vandepoele</surname> <given-names>K</given-names></name>, <name name-style="western"><surname>Saeys</surname> <given-names>Y</given-names></name>, <name name-style="western"><surname>Simillion</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Raes</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Van de Peer</surname> <given-names>Y</given-names></name> (<year>2002</year>) <article-title>The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice</article-title>. <source>Genome Res</source> <volume>12</volume>: <fpage>1792</fpage>–<lpage>1801</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.400202" xlink:type="simple">10.1101/gr.400202</ext-link></comment> <object-id pub-id-type="pmid">12421767</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref019">
<label>19</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Hampson</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>McLysaght</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Gaut</surname> <given-names>B</given-names></name>, <name name-style="western"><surname>Baldi</surname> <given-names>P</given-names></name> (<year>2003</year>) <article-title>LineUp: statistical detection of chromosomal homology with application to plant comparative genomics</article-title>. <source>Genome Res</source> <volume>13</volume>: <fpage>999</fpage>–<lpage>1010</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.814403" xlink:type="simple">10.1101/gr.814403</ext-link></comment> <object-id pub-id-type="pmid">12695327</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref020">
<label>20</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Wang</surname> <given-names>X</given-names></name>, <name name-style="western"><surname>Shi</surname> <given-names>X</given-names></name>, <name name-style="western"><surname>Li</surname> <given-names>Z</given-names></name>, <name name-style="western"><surname>Zhu</surname> <given-names>Q</given-names></name>, <name name-style="western"><surname>Kong</surname> <given-names>L</given-names></name>, <etal>et al</etal>. (<year>2006</year>) <article-title>Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice</article-title>. <source>BMC Bioinformatics</source> <volume>7</volume>: <fpage>447</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-7-447" xlink:type="simple">10.1186/1471-2105-7-447</ext-link></comment> <object-id pub-id-type="pmid">17038171</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref021">
<label>21</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Kellis</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Birren</surname> <given-names>B</given-names></name>, <name name-style="western"><surname>Lander</surname> <given-names>E</given-names></name> (<year>2004</year>) <article-title>Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae</article-title>. <source>Nature</source> <volume>428</volume>: <fpage>617</fpage>–<lpage>624</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature02424" xlink:type="simple">10.1038/nature02424</ext-link></comment> <object-id pub-id-type="pmid">15004568</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref022">
<label>22</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Tang</surname> <given-names>H</given-names></name>, <name name-style="western"><surname>Wang</surname> <given-names>X</given-names></name>, <name name-style="western"><surname>Bowers</surname> <given-names>JE</given-names></name>, <name name-style="western"><surname>Ming</surname> <given-names>R</given-names></name>, <name name-style="western"><surname>Alam</surname> <given-names>M</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps</article-title>. <source>Genome Res</source> <volume>18</volume>: <fpage>1944</fpage>–<lpage>1954</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.080978.108" xlink:type="simple">10.1101/gr.080978.108</ext-link></comment> <object-id pub-id-type="pmid">18832442</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref023">
<label>23</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Simillion</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Janssens</surname> <given-names>K</given-names></name>, <name name-style="western"><surname>Sterck</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Van de Peer</surname> <given-names>Y</given-names></name> (<year>2008</year>) <article-title>i-ADHoRe 2.0: an improved tool to detect degenerated genomic homology using genomic profiles</article-title>. <source>Bioinformatics</source> <volume>24</volume>: <fpage>127</fpage>–<lpage>128</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btm449" xlink:type="simple">10.1093/bioinformatics/btm449</ext-link></comment> <object-id pub-id-type="pmid">17947255</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref024">
<label>24</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Lynch</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Conery</surname> <given-names>JS</given-names></name> (<year>2000</year>) <article-title>The evolutionary fate and consequences of duplicate genes</article-title>. <source>Science</source> <volume>290</volume>: <fpage>1151</fpage>–<lpage>1155</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1126/science.290.5494.1151" xlink:type="simple">10.1126/science.290.5494.1151</ext-link></comment> <object-id pub-id-type="pmid">11073452</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref025">
<label>25</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Blanc</surname> <given-names>G</given-names></name>, <name name-style="western"><surname>Wolfe</surname> <given-names>KH</given-names></name> (<year>2004</year>) <article-title>Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes</article-title>. <source>Plant Cell</source> <volume>16</volume>: <fpage>1667</fpage>–<lpage>1678</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1105/tpc.021345" xlink:type="simple">10.1105/tpc.021345</ext-link></comment> <object-id pub-id-type="pmid">15208399</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref026">
<label>26</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Jiao</surname> <given-names>Y</given-names></name>, <name name-style="western"><surname>Wickett</surname> <given-names>NJ</given-names></name>, <name name-style="western"><surname>Ayyampalayam</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Chanderbali</surname> <given-names>AS</given-names></name>, <name name-style="western"><surname>Landherr</surname> <given-names>L</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Ancestral polyploidy in seed plants and angiosperms</article-title>. <source>Nature</source> <volume>473</volume>: <fpage>97</fpage>–<lpage>100</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature09916" xlink:type="simple">10.1038/nature09916</ext-link></comment> <object-id pub-id-type="pmid">21478875</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref027">
<label>27</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Rabier</surname> <given-names>CE</given-names></name>, <name name-style="western"><surname>Ta</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Ane</surname> <given-names>C</given-names></name> (<year>2014</year>) <article-title>Detecting and Locating Whole Genome Duplications on a Phylogeny: A Probabilistic Approach</article-title>. <source>Mol Biol Evol</source>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/molbev/mst263" xlink:type="simple">10.1093/molbev/mst263</ext-link></comment> <object-id pub-id-type="pmid">24361993</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref028">
<label>28</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Vanneste</surname> <given-names>K</given-names></name>, <name name-style="western"><surname>Van de Peer</surname> <given-names>Y</given-names></name>, <name name-style="western"><surname>Maere</surname> <given-names>S</given-names></name> (<year>2013</year>) <article-title>Inference of genome duplications from age distributions revisited</article-title>. <source>Mol Biol Evol</source> <volume>30</volume>: <fpage>177</fpage>–<lpage>190</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/molbev/mss214" xlink:type="simple">10.1093/molbev/mss214</ext-link></comment> <object-id pub-id-type="pmid">22936721</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref029">
<label>29</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Smith</surname> <given-names>JJ</given-names></name>, <name name-style="western"><surname>Kuraku</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Holt</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Sauka-Spengler</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Jiang</surname> <given-names>N</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution</article-title>. <source>Nat Genet</source> <volume>45</volume>: <fpage>415</fpage>–<lpage>421</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/ng.2568" xlink:type="simple">10.1038/ng.2568</ext-link></comment> <object-id pub-id-type="pmid">23435085</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref030">
<label>30</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Hampson</surname> <given-names>SE</given-names></name>, <name name-style="western"><surname>Gaut</surname> <given-names>BS</given-names></name>, <name name-style="western"><surname>Baldi</surname> <given-names>P</given-names></name> (<year>2005</year>) <article-title>Statistical detection of chromosomal homology using shared-gene density alone</article-title>. <source>Bioinformatics</source> <volume>21</volume>: <fpage>1339</fpage>–<lpage>1348</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/bti168" xlink:type="simple">10.1093/bioinformatics/bti168</ext-link></comment> <object-id pub-id-type="pmid">15585535</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref031">
<label>31</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Abi-Rached</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Gilles</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Shiina</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Pontarotti</surname> <given-names>P</given-names></name>, <name name-style="western"><surname>Inoko</surname> <given-names>H</given-names></name> (<year>2002</year>) <article-title>Evidence of en bloc duplication in vertebrate genomes</article-title>. <source>Nat Genet</source> <volume>31</volume>: <fpage>100</fpage>–<lpage>105</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/ng855" xlink:type="simple">10.1038/ng855</ext-link></comment> <object-id pub-id-type="pmid">11967531</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref032">
<label>32</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Dehal</surname> <given-names>P</given-names></name>, <name name-style="western"><surname>Boore</surname> <given-names>J</given-names></name> (<year>2005</year>) <article-title>Two rounds of whole genome duplication in the ancestral vertebrate</article-title>. <source>PLoS Biol</source> <volume>3</volume>: <fpage>e314</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pbio.0030314" xlink:type="simple">10.1371/journal.pbio.0030314</ext-link></comment> <object-id pub-id-type="pmid">16128622</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref033">
<label>33</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Flicek</surname> <given-names>P</given-names></name>, <name name-style="western"><surname>Ahmed</surname> <given-names>I</given-names></name>, <name name-style="western"><surname>Amode</surname> <given-names>MR</given-names></name>, <name name-style="western"><surname>Barrell</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Beal</surname> <given-names>K</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>Ensembl 2013</article-title>. <source>Nucleic Acids Res</source> <volume>41</volume>: <fpage>D48</fpage>–<lpage>D55</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gks1236" xlink:type="simple">10.1093/nar/gks1236</ext-link></comment> <object-id pub-id-type="pmid">23203987</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref034">
<label>34</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Kersey</surname> <given-names>PJ</given-names></name>, <name name-style="western"><surname>Staines</surname> <given-names>DM</given-names></name>, <name name-style="western"><surname>Lawson</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Kulesha</surname> <given-names>E</given-names></name>, <name name-style="western"><surname>Derwent</surname> <given-names>P</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>Ensembl genomes: an integrative resource for genome-scale data from non-vertebrate species</article-title>. <source>Nucleic Acids Res</source> <volume>40</volume>: <fpage>D91</fpage>–<lpage>D97</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gkr895" xlink:type="simple">10.1093/nar/gkr895</ext-link></comment> <object-id pub-id-type="pmid">22067447</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref035">
<label>35</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Vilella</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Severin</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Ureta-Vidal</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Heng</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Durbin</surname> <given-names>R</given-names></name>, <etal>et al</etal>. (<year>2009</year>) <article-title>EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates</article-title>. <source>Genome Res</source> <volume>19</volume>: <fpage>327</fpage>–<lpage>335</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.073585.107" xlink:type="simple">10.1101/gr.073585.107</ext-link></comment> <object-id pub-id-type="pmid">19029536</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref036">
<label>36</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Tarjan</surname> <given-names>R</given-names></name> (<year>1972</year>) <article-title>Depth-first search and linear graph algorithms</article-title>. <source>SIAM J Comput</source> <volume>1</volume>: <fpage>146</fpage>–<lpage>160</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1137/0201010" xlink:type="simple">10.1137/0201010</ext-link></comment></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref037">
<label>37</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Dakovic</surname> <given-names>N</given-names></name>, <name name-style="western"><surname>Terezol</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Pitel</surname> <given-names>F</given-names></name>, <name name-style="western"><surname>Maillard</surname> <given-names>V</given-names></name>, <name name-style="western"><surname>Elis</surname> <given-names>S</given-names></name>, <etal>et al</etal>. (<year>2014</year>) <article-title>The loss of adipokine genes in the chicken genome and implications for insulin metabolism</article-title>. <source>Mol Biol Evol</source> <volume>31</volume>: <fpage>2637</fpage>–<lpage>2646</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/molbev/msu208" xlink:type="simple">10.1093/molbev/msu208</ext-link></comment> <object-id pub-id-type="pmid">25015647</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref038">
<label>38</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Jaillon</surname> <given-names>O</given-names></name>, <name name-style="western"><surname>Aury</surname> <given-names>JM</given-names></name>, <name name-style="western"><surname>Brunet</surname> <given-names>F</given-names></name>, <name name-style="western"><surname>Petit</surname> <given-names>JL</given-names></name>, <name name-style="western"><surname>Stange-Thomann</surname> <given-names>N</given-names></name>, <etal>et al</etal>. (<year>2004</year>) <article-title>Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype</article-title>. <source>Nature</source> <volume>431</volume>: <fpage>946</fpage>–<lpage>957</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature03025" xlink:type="simple">10.1038/nature03025</ext-link></comment> <object-id pub-id-type="pmid">15496914</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref039">
<label>39</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Huang</surname> <given-names>DW</given-names></name>, <name name-style="western"><surname>Sherman</surname> <given-names>BT</given-names></name>, <name name-style="western"><surname>Lempicki</surname> <given-names>RA</given-names></name> (<year>2009</year>) <article-title>Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources</article-title>. <source>Nat Protoc</source> <volume>4</volume>: <fpage>44</fpage>–<lpage>57</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nprot.2008.211" xlink:type="simple">10.1038/nprot.2008.211</ext-link></comment> <object-id pub-id-type="pmid">19131956</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref040">
<label>40</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Forbes</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Bhamra</surname> <given-names>G</given-names></name>, <name name-style="western"><surname>Bamford</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Dawson</surname> <given-names>E</given-names></name>, <name name-style="western"><surname>Kok</surname> <given-names>C</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>The catalogue of somatic mutations in cancer (COSMIC)</article-title>. <source>Curr Protoc Hum Genet</source> <volume>Chapter 10</volume>: <issue>Unit 10.11</issue>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1002/0471142905.hg1011s57" xlink:type="simple">10.1002/0471142905.hg1011s57</ext-link></comment> <object-id pub-id-type="pmid">18428421</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref041">
<label>41</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Finn</surname> <given-names>RD</given-names></name>, <name name-style="western"><surname>Clements</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Eddy</surname> <given-names>SR</given-names></name> (<year>2011</year>) <article-title>HMMER web server: interactive sequence similarity searching</article-title>. <source>Nucleic Acids Res</source> <volume>39</volume>: <fpage>W29</fpage>–<lpage>W37</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gkr367" xlink:type="simple">10.1093/nar/gkr367</ext-link></comment> <object-id pub-id-type="pmid">21593126</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref042">
<label>42</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Punta</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Coggill</surname> <given-names>PC</given-names></name>, <name name-style="western"><surname>Eberhardt</surname> <given-names>RY</given-names></name>, <name name-style="western"><surname>Mistry</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Tate</surname> <given-names>J</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>The Pfam protein families database</article-title>. <source>Nucleic Acids Res</source> <volume>40</volume>: <fpage>D290</fpage>–<lpage>D301</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gkr1065" xlink:type="simple">10.1093/nar/gkr1065</ext-link></comment> <object-id pub-id-type="pmid">22127870</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004394.ref043">
<label>43</label>
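<mixed-citation xlink:type="simple" publication-type="journal"><name name-style="western"><surname>Safran</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Dalah</surname> <given-names>I</given-names></name>, <name name-style="western"><surname>Alexander</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Rosen</surname> <given-names>N</given-names></name>, <name name-style="western"><surname>Iny Stein</surname> <given-names>T</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>GeneCards Version 3: the human gene integrator</article-title>. <source>Database (Oxford)</source> <volume>2010</volume>: <fpage>baq020</fpage>. <object-id pub-id-type="pmid">20689021</object-id></mixed-citation>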
</ref>
<ref id="pcbi.1004394.ref044">
<label>44</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Forbes</surname> <given-names>SA</given-names></name>, <name name-style="western"><surname>Bindal</surname> <given-names>N</given-names></name>, <name name-style="western"><surname>Bamford</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Cole</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Kok</surname> <given-names>CY</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer</article-title>. <source>Nucleic Acids Res</source> <volume>39</volume>: <fpage>D945</fpage>–<lpage>D950</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gkq929" xlink:type="simple">10.1093/nar/gkq929</ext-link></comment> <object-id pub-id-type="pmid">20952405</object-id></mixed-citation>
</ref>
</ref-list>
</back>
</article>