<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id><journal-title-group>
<journal-title>PLoS Computational Biology</journal-title></journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-13-01887</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1003610</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biology and life sciences</subject><subj-group><subject>Computational biology</subject></subj-group><subj-group><subject>Evolutionary biology</subject><subj-group><subject>Population genetics</subject></subj-group></subj-group><subj-group><subject>Genetics</subject><subj-group><subject>Human genetics</subject></subj-group></subj-group></subj-group></article-categories>
<title-group>
<article-title>Historical Pedigree Reconstruction from Extant Populations Using PArtitioning of RElatives (PREPARE)</article-title>
<alt-title alt-title-type="running-head">Pedigree Reconstruction via Relative Partitioning</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Shem-Tov</surname><given-names>Doron</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Halperin</surname><given-names>Eran</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="aff" rid="aff2"><sup>2</sup></xref><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><label>1</label><addr-line>The Balvatnic School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel</addr-line></aff>
<aff id="aff2"><label>2</label><addr-line>International Computer Science Institute, Berkeley, California, United States of America</addr-line></aff>
<aff id="aff3"><label>3</label><addr-line>Molecular Microbiology and Biotechnology Department, Tel-Aviv University, Tel-Aviv, Israel</addr-line></aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Keinan</surname><given-names>Alon</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/></contrib>
</contrib-group>
<aff id="edit1"><addr-line>Cornell University, United States of America</addr-line></aff>
<author-notes>
<corresp id="cor1">* E-mail: <email xlink:type="simple">doronshe@tau.ac.il</email></corresp>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
<fn fn-type="con"><p>Conceived and designed the experiments: DST EH. Performed the experiments: DST. Analyzed the data: DST. Contributed reagents/materials/analysis tools: DST EH. Wrote the paper: DST EH. Software design, development and testing: DST.</p></fn>
</author-notes>
<pub-date pub-type="collection"><month>6</month><year>2014</year></pub-date>
<pub-date pub-type="epub"><day>19</day><month>6</month><year>2014</year></pub-date>
<volume>10</volume>
<issue>6</issue>
<elocation-id>e1003610</elocation-id>
<history>
<date date-type="received"><day>30</day><month>10</month><year>2013</year></date>
<date date-type="accepted"><day>13</day><month>3</month><year>2014</year></date>
</history>
<permissions>
<copyright-year>2014</copyright-year>
<copyright-holder>Shem-Tov, Halperin</copyright-holder><license xlink:type="simple"><license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p></license></permissions><related-article id="RA1" related-article-type="companion" ext-link-type="uri" page="e1002972" xlink:type="simple" xlink:href="info:doi/10.1371/journal.pcbi.1002972"> <article-title>New Methods Section in PLOS Computational Biology</article-title></related-article>
<abstract>
<p>Recent technological improvements in the field of genetic data extraction give rise to the possibility of reconstructing the historical pedigrees of entire populations from the genotypes of individuals living today. Current methods are still not practical for real data scenarios as they have limited accuracy and make unrealistic assumptions of monogamy and synchronized generations. In order to address these issues, we develop a new method for pedigree reconstruction, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e001" xlink:type="simple"/></inline-formula>, which is based on formulations of the pedigree reconstruction problem as variants of graph coloring. The new formulation allows us to consider features that were overlooked by previous methods, resulting in a reconstruction of up to 5 generations back in time, with an order of magnitude improvement in false-negative rates over the state of the art, while keeping a lower level of false-positive rates. We demonstrate the accuracy of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e002" xlink:type="simple"/></inline-formula> compared to previous approaches using simulation studies over a range of population sizes, including inbred and outbred populations, monogamous and polygamous mating patterns, as well as synchronous and asynchronous mating.</p>
</abstract>
<abstract abstract-type="summary"><title>Author Summary</title>
<p>Learning the correct relationships between individuals from genetic data is a basic theoretical problem in the field of genetics, and has many practical consequences. A wide variety of statistical methods for genetic analysis assume the relationships between individuals are known, and can exploit relatedness information to improve inference. The current state-of-the-art methods for relationship inference consider pair-wise genetic similarity, and use it to infer the relationship between each pair of individuals. Reconstructing the pedigrees of an entire population directly has the potential to use more elaborate relationship information, and thus to obtain a better prediction of the familial relationships in the population. In contrast to the full set of pair-wise relationships in a population, genetic pedigrees provide a lossless and conflict-free structure for depicting the relationships between individuals. In an effort to make pedigree reconstruction practical we developed a new method, which is an order of magnitude more accurate than previous methods, and is the first method that has the ability to reconstruct polygamous pedigrees.</p>
</abstract>
<funding-group><funding-statement>This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University. EH is a Faculty Fellow of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. EH and DST were supported by the German-Israeli Foundation (grant 1094-33.2/2010). URL: <ext-link ext-link-type="uri" xlink:href="http://www.gif.org.il" xlink:type="simple">http://www.gif.org.il</ext-link> EH was also partially supported by National Science Foundation grant III-1217615. DST was also partially supported by the Israel Science Foundation grant no. 1425/13. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement></funding-group><counts><page-count count="13"/></counts></article-meta>
</front>
<body><sec id="s1">
<title/>
<disp-quote>
<p>This is a <italic>PLOS Computational Biology</italic> Methods article.</p>
</disp-quote></sec><sec id="s2">
<title>Introduction</title>
<p>Pedigree reconstruction is an important problem in the field of computational genetics, with many potential applications such as genealogy inference, heritability estimation, and victim identification <xref ref-type="bibr" rid="pcbi.1003610-Blouin1">[1]</xref>–<xref ref-type="bibr" rid="pcbi.1003610-Thomas1">[4]</xref>. Additionally, it has the potential to improve the accuracy of current state-of-the-art relationship inference methods as it uses family structure in a broader sense than just using pairwise genetic similarity information <xref ref-type="bibr" rid="pcbi.1003610-Kyriazopouloupanagiotopoulou1">[5]</xref>, <xref ref-type="bibr" rid="pcbi.1003610-Huff1">[6]</xref>. There are two main variants of the problem, which require different algorithmic approaches. In the first variant, considered by many classical and contemporary papers, the genotypes of several generations are given, and an attempt is made to estimate the pedigree which best explains the observed individuals, as might be the case in wild animal populations <xref ref-type="bibr" rid="pcbi.1003610-Thompson1">[7]</xref>–<xref ref-type="bibr" rid="pcbi.1003610-Cussens1">[10]</xref>. In this paper we consider a more difficult variation of the problem, where we are given the genotypes of the currently living population only, and try to reconstruct the historical pedigree of unobserved ancestors. This variant suits well the scenario of reconstructing the pedigrees of living human populations <xref ref-type="bibr" rid="pcbi.1003610-Kirkpatrick1">[11]</xref>. This variant of pedigree reconstruction was previously studied in several theoretical works <xref ref-type="bibr" rid="pcbi.1003610-Thatte1">[12]</xref>, <xref ref-type="bibr" rid="pcbi.1003610-Steel1">[13]</xref>. 
These papers focus on presenting theoretical bounds on the length of sequence required for reconstructing pedigrees under various combinatorial and stochastic heritability models, but in contrast to our work, do not aim to provide practical solutions for the problem.</p>
<p>The level of difficulty of the problem is highly dependent on the pedigree in consideration. Particularly, small inbred populations pose a considerable challenge since the probability for multiple mating events within any two families is high, and therefore individual pairs usually have more than two last common ancestors (LCAs). Moreover, in small inbred populations there is a complex relationship pedigree graph due to mating within the family.</p>
<p>Recently, three methods tackling pedigree reconstruction from the genotypes of extant individuals were proposed <xref ref-type="bibr" rid="pcbi.1003610-Kirkpatrick1">[11]</xref>, <xref ref-type="bibr" rid="pcbi.1003610-He1">[14]</xref>; these methods assume monogamy and synchronized generations. Although unrealistic, these assumptions provide a starting point for developing tools that offer useful methodology. The original paper addressing pedigree reconstruction from the genotypes of extant individuals presented the methods <italic>COP</italic>/<italic>CIP</italic> <xref ref-type="bibr" rid="pcbi.1003610-Kirkpatrick1">[11]</xref>. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e003" xlink:type="simple"/></inline-formula> assumes infinite population size, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e004" xlink:type="simple"/></inline-formula> tries to reconstruct the pedigree of small inbred populations. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e005" xlink:type="simple"/></inline-formula> is a follow-up method, similar in principle to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e006" xlink:type="simple"/></inline-formula>, but with improved efficiency <xref ref-type="bibr" rid="pcbi.1003610-He1">[14]</xref>. The main idea behind these methods is to construct the pedigree one generation at a time, starting with the given population. In each generation they identify sibling groups using genetic similarity measures, and assign two common parents to each sibling group.</p>
<p>In this work, we point out an important and naturally arising issue of pedigree reconstruction from extant populations, overlooked by all previous methods. We observe that the mother and father of a sibling-group have exactly the same descendants (as must be the case for monogamous couples). Since the genotypes of the parents are unobserved, a pairwise relationship analysis relying on the extant descendants will result in maternal relatives having the same likelihood of being related to the mother and to the father, and vice versa (see <xref ref-type="fig" rid="pcbi-1003610-g001">Fig. 1</xref>). Thus, partitioning the relatives into maternal and paternal relatives is required. Ignoring this issue can strongly affect the quality of inferred pedigrees. We discuss a new framework to help understand and correctly deal with this issue, and present a highly efficient algorithm under this framework - <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e007" xlink:type="simple"/></inline-formula> (Pedigree Reconstruction of Extant populations using PArtitioning of RElatives). We extend our method to the case of polygamous pedigrees, and show that our approach results in a considerable improvement in accuracy compared to existing tools, both on monogamous and polygamous pedigrees. Thus, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e008" xlink:type="simple"/></inline-formula> is capable of dealing with more realistic pedigree reconstruction problems than previous methods.</p>
<fig id="pcbi-1003610-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g001</object-id><label>Figure 1</label><caption>
<title>Attempting to reconstruct the simple pedigree on the left, from the genotypes of extant generation (bright blue).</title>
<p>Considering observed genetic similarity of extant descendants only, we cannot distinguish which of the four parents in the second generation are siblings (correctly inferred sibling relationships are colored blue, and wrong potential sibling relationships are shown in dashed red).</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g001" position="float" xlink:type="simple"/></fig></sec><sec id="s3" sec-type="methods">
<title>Methods</title>
<p>Similarly to previous methods, we reconstruct the pedigree generation by generation, starting with the last generation, and assuming all of the genotypes of the population come from the same generation. In iteration <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e009" xlink:type="simple"/></inline-formula>, we take the partial <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e010" xlink:type="simple"/></inline-formula>-generation pedigree, which we call <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e011" xlink:type="simple"/></inline-formula>, and build <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e012" xlink:type="simple"/></inline-formula> by adding parents to all of the founder individuals in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e013" xlink:type="simple"/></inline-formula>. In order to construct the correct pedigree, full-siblings should have two common parents in the pedigree, and half-siblings should have a single common parent. First, we attempt to detect all founder-individual pairs in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e014" xlink:type="simple"/></inline-formula> which are most likely to be full-siblings, leaving the detection of half-siblings to a later stage. 
In previous methods, a sibling graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e015" xlink:type="simple"/></inline-formula> is constructed, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e016" xlink:type="simple"/></inline-formula> includes the set of all founders in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e017" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e018" xlink:type="simple"/></inline-formula> corresponds to the set of pairs of individuals that are likely to be full siblings. Pairs of individuals are considered as potential siblings based on the genetic similarity of the pair's extant descendants. Sibling groups are then detected by finding maximum cliques or proper vertex coloring of the graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e019" xlink:type="simple"/></inline-formula>. This approach is problematic: individuals with equivalent descendant sets, such as parent couples, are completely indistinguishable in the graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e020" xlink:type="simple"/></inline-formula> because they have exactly the same set of neighbors. As a result, the siblings graph includes many redundant edges, and fails to represent the true relationship structure.</p>
<p>In contrast with previous methods, we present an alternative graph representation that accounts for the above-mentioned ambiguity, and uses the transitive property of the full-sibling relationship to correctly find the full-sibling groups. We begin each iteration by constructing a contracted siblings graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e021" xlink:type="simple"/></inline-formula>. The set of vertices <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e022" xlink:type="simple"/></inline-formula> is composed of disjoint subsets of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e023" xlink:type="simple"/></inline-formula>. Particularly, each <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e024" xlink:type="simple"/></inline-formula> corresponds to a subset of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e025" xlink:type="simple"/></inline-formula>, so that for each <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e026" xlink:type="simple"/></inline-formula> we have <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e027" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e028" xlink:type="simple"/></inline-formula> represents the set of extant descendants of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e029" xlink:type="simple"/></inline-formula> (see <xref ref-type="fig" rid="pcbi-1003610-g002">Fig. 2</xref>). 
Since vertices of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e030" xlink:type="simple"/></inline-formula> correspond to subsets of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e031" xlink:type="simple"/></inline-formula>, we refer to vertices in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e032" xlink:type="simple"/></inline-formula> as super-vertices. The set of edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e033" xlink:type="simple"/></inline-formula> corresponds to potential sibling relationships between the corresponding super-vertices, i.e., <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e034" xlink:type="simple"/></inline-formula> if there are <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e035" xlink:type="simple"/></inline-formula> such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e036" xlink:type="simple"/></inline-formula>. Note that in such a case, for every <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e037" xlink:type="simple"/></inline-formula>, we will have <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e038" xlink:type="simple"/></inline-formula>. Edges have weights <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e039" xlink:type="simple"/></inline-formula> representing the confidence of the relationship. 
For a vertex <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e040" xlink:type="simple"/></inline-formula>, we define <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e041" xlink:type="simple"/></inline-formula> for every <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e042" xlink:type="simple"/></inline-formula>. We provide the details for the construction of the set <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e043" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e044" xlink:type="simple"/></inline-formula> in section 2.1.</p>
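<p>As an illustration of the contraction step described above, the following minimal Python sketch groups founders that share an identical set of extant descendants into super-vertices. It is not the authors' implementation: the function names <monospace>extant_descendants</monospace> and <monospace>contract_founders</monospace>, the child-list encoding of the pedigree, and the choice to count an extant founder as its own descendant are all assumptions made for this example.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def extant_descendants(children, extant, v):
    """Return the set of extant individuals reachable from v.

    children: dict mapping an individual to a list of its children (hypothetical encoding).
    extant:   set of individuals with observed genotypes.
    Assumption: an extant individual counts as its own extant descendant.
    """
    seen, stack, found = set(), [v], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        if u in extant:
            found.add(u)
        stack.extend(children.get(u, ()))
    return frozenset(found)


def contract_founders(children, extant, founders):
    """Group founders with identical extant-descendant sets into super-vertices."""
    groups = defaultdict(list)
    for f in founders:
        groups[extant_descendants(children, extant, f)].append(f)
    return [sorted(members) for members in groups.values()]


# Toy pedigree: two monogamous couples, each indistinguishable by descendants.
toy_children = {"m1": ["a", "b"], "f1": ["a", "b"], "m2": ["c"], "f2": ["c"]}
toy_super_vertices = contract_founders(
    toy_children, {"a", "b", "c"}, ["m1", "f1", "m2", "f2"]
)
```
]]></preformat>
<p>In this toy pedigree each parent couple collapses into a single super-vertex, which reflects the observation that a mother and father with the same descendants cannot be told apart from the extant genotypes alone.</p>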
<fig id="pcbi-1003610-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g002</object-id><label>Figure 2</label><caption>
<title>Four examples of vertex contractions, typical for first, second, and third generations.</title>
<p>Founders are filled with grey. Extant individuals are outlined in blue. Green arrows stand for the contraction action.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g002" position="float" xlink:type="simple"/></fig>
<p>The key idea of our method lies in a procedure for the assignment of the edges in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e045" xlink:type="simple"/></inline-formula> to edges in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e046" xlink:type="simple"/></inline-formula> in a consistent way. In principle, we are interested in assigning every super-edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e047" xlink:type="simple"/></inline-formula> to an edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e048" xlink:type="simple"/></inline-formula> that corresponds to the true sibling pair among all pairs in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e049" xlink:type="simple"/></inline-formula>. In doing so, we need to take into consideration a set of constraints on the assignments of neighboring super-edges. Ideally, we would like to find the assignment of super-edges to the edges of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e050" xlink:type="simple"/></inline-formula> which maximizes the likelihood of the observed population genotypes. In section 2.2, we formulate this problem as an optimization problem using graph terminology, and propose a greedy algorithm which solves it in practice. The assignment algorithm generates an expanded siblings graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e051" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e052" xlink:type="simple"/></inline-formula> denotes the proposed full-sibling pairs, and forms a disjoint clique-cover of the graph.</p>
<p>Under the monogamy assumption, we finish reconstructing the current generation by adding two common parents to each sibling clique in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e053" xlink:type="simple"/></inline-formula>. In order to account for potential polygamy we add another step that identifies half-siblings and incorporates these into a second graph formulation. Our approach for the reconstruction of polygamous pedigrees relies on two key observations. First, we note that we can treat the full-sibling relation as an equivalence relation, and the half-sibling relation as a relation between equivalence classes. This is true, since if <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e054" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e055" xlink:type="simple"/></inline-formula> are full siblings, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e056" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e057" xlink:type="simple"/></inline-formula> are half-siblings, then <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e058" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e059" xlink:type="simple"/></inline-formula> are also half-siblings. 
According to this observation, we construct a half-sibling graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e060" xlink:type="simple"/></inline-formula> where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e061" xlink:type="simple"/></inline-formula> corresponds to the equivalence classes defined by the full-sibling relation, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e062" xlink:type="simple"/></inline-formula> corresponds to the half-sibling relation. Second, we observe that the children of every parent in the founder group of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e063" xlink:type="simple"/></inline-formula> correspond to a clique in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e064" xlink:type="simple"/></inline-formula>. We formulate the half-sibling detection problem as a second graph optimization problem. To solve it, we develop a heuristic algorithm which attempts to find the maximal-weighted set of edges in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e065" xlink:type="simple"/></inline-formula>. The edge set has to satisfy a set of constraints, which represent natural constraints that govern half-sibling relationships (see section 2.3).</p>
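<p>The first observation above can be sketched in a few lines of Python: full-sibling pairs are merged into equivalence classes, and each half-sibling pair is then lifted to an edge between two classes. This is only an illustrative sketch and not the heuristic of section 2.3; the function names and the use of a union-find structure are assumptions of this example, and edge weights and constraints are omitted.</p>
<preformat preformat-type="code"><![CDATA[
```python
def sibling_classes(individuals, full_sib_pairs):
    """Map each individual to a representative of its full-sibling class."""
    parent = {v: v for v in individuals}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            p = parent[v]
            parent[v] = parent[p]
            v = parent[v]
        return v

    for a, b in full_sib_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return {v: find(v) for v in individuals}


def half_sibling_class_edges(class_of, half_sib_pairs):
    """Lift half-sibling pairs of individuals to edges between sibling classes."""
    edges = set()
    for a, b in half_sib_pairs:
        class_a, class_b = class_of[a], class_of[b]
        if class_a != class_b:
            edges.add(frozenset((class_a, class_b)))
    return edges


# Toy input: a and b are full siblings, b and c are half-siblings.
toy_classes = sibling_classes(["a", "b", "c"], [("a", "b")])
toy_edges = half_sibling_class_edges(toy_classes, [("b", "c")])
```
]]></preformat>
<p>Because the edge is stored between classes rather than individuals, the half-sibling relation between b and c automatically extends to a and c, as the observation requires.</p>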
<sec id="s3a">
<title>2.1 Constructing the Contracted Sibling Graph</title>
<p>We now describe the construction of the graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e066" xlink:type="simple"/></inline-formula>. Recall that the set of super-vertices <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e067" xlink:type="simple"/></inline-formula> consists of subsets of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e068" xlink:type="simple"/></inline-formula> that share the same set of extant descendants. For every pair <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e069" xlink:type="simple"/></inline-formula> we have to decide whether <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e070" xlink:type="simple"/></inline-formula>. In order to do so, we pick a representative pair <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e071" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e072" xlink:type="simple"/></inline-formula>, and calculate three scores, corresponding to three putative relations of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e073" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e074" xlink:type="simple"/></inline-formula>: unrelated, siblings, and cousins. For each such relationship <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e075" xlink:type="simple"/></inline-formula>, we construct a pedigree <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e076" xlink:type="simple"/></inline-formula> by adding the relevant ancestry structure. 
For example, when considering the siblings relationship we construct <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e077" xlink:type="simple"/></inline-formula> by adding two common parents for <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e078" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e079" xlink:type="simple"/></inline-formula>. For unrelated pairs we construct <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e080" xlink:type="simple"/></inline-formula> by adding a different pair of parents to each node (see <xref ref-type="fig" rid="pcbi-1003610-g003">Fig. 3</xref>).</p>
<fig id="pcbi-1003610-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g003</object-id><label>Figure 3</label><caption>
<title>Examples for possible ancestry structures created for individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e081" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e082" xlink:type="simple"/></inline-formula> in order to test the relationship between them.</title>
<p>The triangles under <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e083" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e084" xlink:type="simple"/></inline-formula> represent their existing descendants; edges represent parent-offspring relationships.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g003" position="float" xlink:type="simple"/></fig>
<p>We proceed by simulating inheritance on <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e085" xlink:type="simple"/></inline-formula>; that is, the founders in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e086" xlink:type="simple"/></inline-formula> are assigned unique haplotypes and we simulate the recombination process from top to bottom, with a recombination rate of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e087" xlink:type="simple"/></inline-formula>. We then calculate IBD segments between each pair of extant descendants in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e088" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e089" xlink:type="simple"/></inline-formula> and calculate two <italic>IBD features</italic>: the number of IBD segments, and the total length of IBD sharing (we note that these features of IBD sharing were also considered by <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e090" xlink:type="simple"/></inline-formula> <xref ref-type="bibr" rid="pcbi.1003610-Witherspoon1">[15]</xref>, a method for the inference of pair-wise family relationships). We repeat these simulations <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e091" xlink:type="simple"/></inline-formula> times for a specified parameter <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e092" xlink:type="simple"/></inline-formula>, thus obtaining an empirical estimate for the distribution of the IBD features. 
Using the above empirical distributions, we estimate the probability of observing the IBD features for each pair in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e093" xlink:type="simple"/></inline-formula> under the relationship <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e094" xlink:type="simple"/></inline-formula>. Since the observed IBD features are typically not observed in any of the <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e095" xlink:type="simple"/></inline-formula> simulations, we use a smoothed form of the distribution using Gaussian kernel smoothing. Formally, let <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e096" xlink:type="simple"/></inline-formula> <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e097" xlink:type="simple"/></inline-formula> be the simulated IBD features in the <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e098" xlink:type="simple"/></inline-formula> simulations for a hypothesized relationship r. The density <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e099" xlink:type="simple"/></inline-formula> at point <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e100" xlink:type="simple"/></inline-formula> is calculated as:<disp-formula id="pcbi.1003610.e101"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003610.e101" xlink:type="simple"/></disp-formula></p>
<p>Empirical tests showed that scaling the features to have equal variance and using a diagonal bandwidth matrix <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e102" xlink:type="simple"/></inline-formula> with a parameter <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e103" xlink:type="simple"/></inline-formula> in the range 1 to 8 gives the best results. The parameter <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e104" xlink:type="simple"/></inline-formula> trades off running time against accuracy. Accuracy stops improving near <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e105" xlink:type="simple"/></inline-formula> = 50, which makes the analysis very efficient (see section 2.4 for more details).</p>
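<p>The smoothing step can be illustrated with a minimal Python sketch. It is not the PREPARE implementation: the function names <monospace>scale_to_unit_variance</monospace> and <monospace>gaussian_kde</monospace> are hypothetical, and the sketch assumes that the diagonal bandwidth matrix applies the same kernel standard deviation <monospace>h</monospace> to every (already scaled) feature, which is one possible reading of the bandwidth parameter described above.</p>
<preformat><![CDATA[
```python
import math


def scale_to_unit_variance(points):
    """Divide each feature by its standard deviation.

    Returns the scaled points and the per-feature standard deviations,
    so that observed features can be scaled the same way.
    """
    n = len(points)
    dims = len(points[0])
    stds = []
    for d in range(dims):
        column = [p[d] for p in points]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        # Guard against a constant feature (assumption: leave it unscaled).
        stds.append(math.sqrt(variance) if variance > 0 else 1.0)
    scaled = [tuple(p[d] / stds[d] for d in range(dims)) for p in points]
    return scaled, stds


def gaussian_kde(x, samples, h):
    """Kernel density estimate at x from simulated feature vectors.

    Uses a product Gaussian kernel whose standard deviation is h along
    every axis (assumed form of the diagonal bandwidth matrix).
    """
    dims = len(x)
    normalizer = (2.0 * math.pi) ** (dims / 2.0) * h ** dims
    total = 0.0
    for s in samples:
        squared = sum(((x[d] - s[d]) / h) ** 2 for d in range(dims))
        total += math.exp(-0.5 * squared)
    return total / (len(samples) * normalizer)


# Hypothetical simulated features: (number of IBD segments, total IBD length).
simulated = [(3.0, 40.0), (5.0, 65.0), (4.0, 52.0), (6.0, 80.0)]
scaled, stds = scale_to_unit_variance(simulated)
observed = (4.5 / stds[0], 60.0 / stds[1])
density = gaussian_kde(observed, scaled, h=1.0)
```
]]></preformat>
<p>Because the kernel has full support, the estimated density is positive even when the observed features match none of the simulated ones, which is the reason for smoothing given above.</p>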
<p>Let <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e106" xlink:type="simple"/></inline-formula> be the observed IBD features between extant individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e107" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e108" xlink:type="simple"/></inline-formula>. The above procedure results in a probability <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e109" xlink:type="simple"/></inline-formula>, for every <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e110" xlink:type="simple"/></inline-formula> and every relationship <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e111" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e112" xlink:type="simple"/></inline-formula>.</p>
<p>For each relationship <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e113" xlink:type="simple"/></inline-formula>, we define<disp-formula id="pcbi.1003610.e114"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003610.e114" xlink:type="simple"/></disp-formula></p>
<p>We note that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e115" xlink:type="simple"/></inline-formula> can be intuitively interpreted as a composite likelihood of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e116" xlink:type="simple"/></inline-formula>. If <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e117" xlink:type="simple"/></inline-formula> is larger than <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e118" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e119" xlink:type="simple"/></inline-formula> we add <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e120" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e121" xlink:type="simple"/></inline-formula> with the weight<disp-formula id="pcbi.1003610.e122"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003610.e122" xlink:type="simple"/></disp-formula></p>
<p><xref ref-type="fig" rid="pcbi-1003610-g004">Fig. 4</xref> shows the distribution of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e123" xlink:type="simple"/></inline-formula> under different true relationships. Notice that cases where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e124" xlink:type="simple"/></inline-formula> are distantly related (cousins, 2nd-cousins etc.) will tend to have a maximal score under <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e125" xlink:type="simple"/></inline-formula>. This is desirable, since we only seek to distinguish siblings from non-siblings at this point.</p>
<fig id="pcbi-1003610-g004" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g004</object-id><label>Figure 4</label><caption>
<title>Distribution of relationship scores under specific true relationships.</title>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g004" position="float" xlink:type="simple"/></fig></sec><sec id="s3b">
<title>2.2 The Assignment Algorithm</title>
<p>In the assignment stage, we are given the contracted siblings graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e126" xlink:type="simple"/></inline-formula>, and we search for an assignment of a sibling relation between super-vertices, depicted by an edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e127" xlink:type="simple"/></inline-formula> to a single sibling-relation between two individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e128" xlink:type="simple"/></inline-formula>. Our assignment needs to obey the transitivity constraint of the full sibling relation. Recall that the weight of an edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e129" xlink:type="simple"/></inline-formula> corresponds to the strength of evidence for the existence of a sibling pair <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e130" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e131" xlink:type="simple"/></inline-formula>. We therefore formulate the edge assignment problem as follows:</p>
<sec id="s3b1">
<title><italic>Problem 1.</italic> Maximum weight disjoint clique cover edge assignment</title>
<p>Given the contracted graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e132" xlink:type="simple"/></inline-formula>, find the maximal-weight set of edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e133" xlink:type="simple"/></inline-formula>, such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e134" xlink:type="simple"/></inline-formula> is a legal assignment of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e135" xlink:type="simple"/></inline-formula>, under the constraint that the set of assigned edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e136" xlink:type="simple"/></inline-formula> forms a clique cover of the graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e137" xlink:type="simple"/></inline-formula>, i.e., <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e138" xlink:type="simple"/></inline-formula> is composed of an edge-disjoint set of cliques.</p>
<p>We first show that the above problem is NP-hard:</p>
</sec><sec id="s3b2">
<title>Theorem 1</title>
<p><italic>The maximum weight disjoint clique cover edge assignment is NP-hard.</italic></p>
<p><italic>Proof.</italic> We will show a reduction from maximum clique. In <xref ref-type="bibr" rid="pcbi.1003610-Hstad1">[16]</xref> it is shown that it is NP-hard to decide whether a graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e139" xlink:type="simple"/></inline-formula> has a clique of size <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e140" xlink:type="simple"/></inline-formula> or if its largest clique is smaller than <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e141" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e142" xlink:type="simple"/></inline-formula>. Consider an instance <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e143" xlink:type="simple"/></inline-formula> to the clique problem, and let <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e144" xlink:type="simple"/></inline-formula> be its largest clique. We define <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e145" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e146" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e147" xlink:type="simple"/></inline-formula>. Thus, any clique cover of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e148" xlink:type="simple"/></inline-formula> is a legal assignment of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e149" xlink:type="simple"/></inline-formula>. 
Note that if <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e150" xlink:type="simple"/></inline-formula> then the optimal clique cover is necessarily of size at least <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e151" xlink:type="simple"/></inline-formula>. On the other hand, if <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e152" xlink:type="simple"/></inline-formula> then it is easy to see that the optimal clique cover is obtained when all cliques in the cover are of size <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e153" xlink:type="simple"/></inline-formula>, and thus the clique cover is of size at most <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e154" xlink:type="simple"/></inline-formula>. Thus, if the Maximum Weight Disjoint Clique Cover Edge Assignment problem were solvable in polynomial time, we could decide in polynomial time between the case where the maximum clique is of size <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e155" xlink:type="simple"/></inline-formula> and the case where the maximum clique is of size <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e156" xlink:type="simple"/></inline-formula>, which is an NP-hard problem.</p>
<p>We therefore apply the following greedy algorithm. We will need to introduce a few notations. First, we treat vertices <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e157" xlink:type="simple"/></inline-formula> as vertices in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e158" xlink:type="simple"/></inline-formula>, as well as subsets of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e159" xlink:type="simple"/></inline-formula>, depending on the context. For each <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e160" xlink:type="simple"/></inline-formula>, we denote by <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e161" xlink:type="simple"/></inline-formula> the set of neighbors of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e162" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e163" xlink:type="simple"/></inline-formula>. Moreover, we define <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e164" xlink:type="simple"/></inline-formula>, i.e., the set of super-vertices corresponding to the neighbors of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e165" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e166" xlink:type="simple"/></inline-formula>. Finally, let <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e167" xlink:type="simple"/></inline-formula>.</p>
<p>We start by setting <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e168" xlink:type="simple"/></inline-formula>. The algorithm proceeds by traversing all super-edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e169" xlink:type="simple"/></inline-formula> in decreasing weight order. In each iteration the set <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e170" xlink:type="simple"/></inline-formula> consists of a set of disjoint cliques of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e171" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e172" xlink:type="simple"/></inline-formula> consists of a set of yet to be assigned edges. For each <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e173" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e174" xlink:type="simple"/></inline-formula> we say that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e175" xlink:type="simple"/></inline-formula> can be added to the clique of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e176" xlink:type="simple"/></inline-formula> if for every <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e177" xlink:type="simple"/></inline-formula> we have that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e178" xlink:type="simple"/></inline-formula>. 
Similarly, we say that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e179" xlink:type="simple"/></inline-formula> can be added to the clique of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e180" xlink:type="simple"/></inline-formula> if for every <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e181" xlink:type="simple"/></inline-formula> we have <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e182" xlink:type="simple"/></inline-formula>. When traversing an edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e183" xlink:type="simple"/></inline-formula> we search for a pair <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e184" xlink:type="simple"/></inline-formula> where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e185" xlink:type="simple"/></inline-formula> has the maximal clique size, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e186" xlink:type="simple"/></inline-formula>, from within <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e187" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e188" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e189" xlink:type="simple"/></inline-formula> can be added to the clique of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e190" xlink:type="simple"/></inline-formula> (or in a symmetric manner that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e191" xlink:type="simple"/></inline-formula> can be added to the clique of <inline-formula><inline-graphic 
xlink:href="info:doi/10.1371/journal.pcbi.1003610.e192" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e193" xlink:type="simple"/></inline-formula> is maximized). We then assign <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e194" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e195" xlink:type="simple"/></inline-formula> by adding <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e196" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e197" xlink:type="simple"/></inline-formula>, and removing <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e198" xlink:type="simple"/></inline-formula> from <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e199" xlink:type="simple"/></inline-formula>. We also assign <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e200" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e201" xlink:type="simple"/></inline-formula> for every <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e202" xlink:type="simple"/></inline-formula>.</p>
<p><xref ref-type="fig" rid="pcbi-1003610-g005">Fig. 5</xref> summarizes the contraction and assignment stages with an example. Note that cases such as 3-cliques in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e203" xlink:type="simple"/></inline-formula> (<xref ref-type="fig" rid="pcbi-1003610-g005">Fig. 5</xref>-B) can have multiple assignments with the same score (3 siblings from one parent couple, or 3 pairs of siblings from 3 different parent couples). In such cases our algorithm chooses the more parsimonious solution in which there is a smaller number of parents.</p>
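<p>The greedy idea of traversing edges in decreasing weight order and growing disjoint cliques can be sketched in Python as follows. This is a deliberately simplified illustration and not the PREPARE algorithm: it works on an ordinary weighted graph of individuals rather than on the contracted graph, so the choice of an individual within each super-vertex and the preference for the largest clique are omitted. The function name <monospace>greedy_disjoint_cliques</monospace> and the example weights are hypothetical.</p>
<preformat><![CDATA[
```python
def greedy_disjoint_cliques(weighted_edges):
    """Grow vertex-disjoint cliques by visiting edges from heaviest to lightest.

    weighted_edges maps frozenset({u, v}) to a positive weight.
    Returns the list of cliques (sets of vertices) that were formed.
    """
    adjacency = {}
    for edge in weighted_edges:
        u, v = tuple(edge)
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    clique_of = {}  # vertex -> the clique (a shared set object) containing it

    def can_join(vertex, clique):
        # A vertex may join a clique only if it is adjacent to every member.
        return all(member in adjacency[vertex] for member in clique)

    ordered = sorted(weighted_edges.items(), key=lambda item: -item[1])
    for edge, _weight in ordered:
        u, v = sorted(edge)
        clique_u, clique_v = clique_of.get(u), clique_of.get(v)
        if clique_u is None and clique_v is None:
            new_clique = {u, v}
            clique_of[u] = new_clique
            clique_of[v] = new_clique
        elif clique_u is None and can_join(u, clique_v):
            clique_v.add(u)
            clique_of[u] = clique_v
        elif clique_v is None and can_join(v, clique_u):
            clique_u.add(v)
            clique_of[v] = clique_u
        # Otherwise the edge would break transitivity and is skipped.

    cliques, seen = [], set()
    for clique in clique_of.values():
        if id(clique) not in seen:
            seen.add(id(clique))
            cliques.append(clique)
    return cliques


# Hypothetical example: a triangle a-b-c plus a weak edge c-d.
edges = {
    frozenset({"a", "b"}): 3.0,
    frozenset({"b", "c"}): 2.0,
    frozenset({"a", "c"}): 1.0,
    frozenset({"c", "d"}): 0.5,
}
sibling_groups = greedy_disjoint_cliques(edges)
```
]]></preformat>
<p>In this example the triangle is merged into a single group of three, in line with the parsimonious choice described above, while the edge to the fourth vertex is skipped because that vertex is not adjacent to every member of the group.</p>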
<fig id="pcbi-1003610-g005" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g005</object-id><label>Figure 5</label><caption>
<title>Intuition for sibling assignment, depicting the potential-siblings graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e204" xlink:type="simple"/></inline-formula>, the contracted graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e205" xlink:type="simple"/></inline-formula>, and assigned graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e206" xlink:type="simple"/></inline-formula>.</title>
<p>In both examples <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e207" xlink:type="simple"/></inline-formula>,<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e208" xlink:type="simple"/></inline-formula>,<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e209" xlink:type="simple"/></inline-formula> are parent couples with extant descendants in the observed population. A. For the case where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e210" xlink:type="simple"/></inline-formula>,<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e211" xlink:type="simple"/></inline-formula> are full-siblings, the contraction will end in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e212" xlink:type="simple"/></inline-formula> composed of three super-vertices, connected by two edges; the assignment algorithm will assign each edge to a disjoint clique. B. If <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e213" xlink:type="simple"/></inline-formula> are also full-siblings, a 3-clique is formed in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e214" xlink:type="simple"/></inline-formula>; the assignment algorithm assigns all edges to a corresponding 3-clique of siblings.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g005" position="float" xlink:type="simple"/></fig></sec></sec><sec id="s3c">
<title>2.3 Half-sibling Detection</title>
<p>In this stage we define the half-sibling detection problem, in which we attempt to detect groups of individuals with a single common parent. First, we define the full-sibling relation on individuals: <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e215" xlink:type="simple"/></inline-formula>. Notice that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e216" xlink:type="simple"/></inline-formula> is defined to be reflexive, and thus it is an equivalence relation on <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e217" xlink:type="simple"/></inline-formula>. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e218" xlink:type="simple"/></inline-formula> is the quotient set of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e219" xlink:type="simple"/></inline-formula> on <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e220" xlink:type="simple"/></inline-formula>, which in this case is simply the set of disjoint groups of full-siblings. We obtain <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e221" xlink:type="simple"/></inline-formula> from the edges in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e222" xlink:type="simple"/></inline-formula> computed in section 2.2. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e223" xlink:type="simple"/></inline-formula> is a clique cover, and so naturally describes an equivalence relation.</p>
<p>We define <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e224" xlink:type="simple"/></inline-formula>, which is the half-sibling relation, as a relation between equivalence classes in V, with respect to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e225" xlink:type="simple"/></inline-formula>. Assuming the pedigree is known, HS is well defined, since if <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e226" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e227" xlink:type="simple"/></inline-formula> are full siblings, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e228" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e229" xlink:type="simple"/></inline-formula> are half-siblings, then <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e230" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e231" xlink:type="simple"/></inline-formula> are half-siblings. 
This allows us to simplify the half-sib detection problem by constructing the polygamy graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e232" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e233" xlink:type="simple"/></inline-formula> such that each vertex <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e234" xlink:type="simple"/></inline-formula> represents a group of full-siblings, and each edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e235" xlink:type="simple"/></inline-formula> represents a half-sibling relation between <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e236" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e237" xlink:type="simple"/></inline-formula> (see <xref ref-type="fig" rid="pcbi-1003610-g006">Fig. 6</xref>). Edges are added to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e238" xlink:type="simple"/></inline-formula> by a procedure similar to that of section 2.1, except that the hypotheses are now tested for sibling groups <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e239" xlink:type="simple"/></inline-formula> and are those relevant to the half-sibling case (half-siblings, cousins, unrelated).</p>
<fig id="pcbi-1003610-g006" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g006</object-id><label>Figure 6</label><caption>
<title>An example for the construction of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e240" xlink:type="simple"/></inline-formula> in the first generation.</title>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g006" position="float" xlink:type="simple"/></fig>
<p>The graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e241" xlink:type="simple"/></inline-formula> has the convenient property that if a group of individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e242" xlink:type="simple"/></inline-formula> has a single common parent then <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e243" xlink:type="simple"/></inline-formula> form a clique in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e244" xlink:type="simple"/></inline-formula>. We thus assume, by parsimony, that each clique <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e245" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e246" xlink:type="simple"/></inline-formula> connects all of the children of a single parent <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e247" xlink:type="simple"/></inline-formula>, such that each <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e248" xlink:type="simple"/></inline-formula> is a full-sibling group that contains the children of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e249" xlink:type="simple"/></inline-formula> and a single mate. We therefore formulate the half-sib detection problem as follows:</p>
<sec id="s3c1">
<title><italic>Problem 2.</italic> Maximum weight, two-color clique cover</title>
<p>Given the graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e250" xlink:type="simple"/></inline-formula>, find sets of edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e251" xlink:type="simple"/></inline-formula>, such that both <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e252" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e253" xlink:type="simple"/></inline-formula> consist of an edge-disjoint set of cliques, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e254" xlink:type="simple"/></inline-formula>, and the total weight of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e255" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e256" xlink:type="simple"/></inline-formula> is maximized.</p>
</sec><sec id="s3c2">
<title>Theorem 2</title>
<p><italic>The Maximum Weight Two Color Clique Cover is NP-hard.</italic></p>
<p><italic>Proof.</italic> We will show a reduction from maximum clique. Consider an instance <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e257" xlink:type="simple"/></inline-formula> to the clique problem, and let <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e258" xlink:type="simple"/></inline-formula> be its largest clique. If <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e259" xlink:type="simple"/></inline-formula> we can set <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e260" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e261" xlink:type="simple"/></inline-formula>, and therefore the optimal solution to the coloring problem has at least <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e262" xlink:type="simple"/></inline-formula> edges. On the other hand, if <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e263" xlink:type="simple"/></inline-formula> then the size of each of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e264" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e265" xlink:type="simple"/></inline-formula> is at most <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e266" xlink:type="simple"/></inline-formula>, and thus the total size of both of them is bounded by <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e267" xlink:type="simple"/></inline-formula>. 
Thus, by solving the Maximum Weight Two Color Clique Cover in polynomial time we can decide between graphs with clique size at most <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e268" xlink:type="simple"/></inline-formula> and graphs with clique size at least <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e269" xlink:type="simple"/></inline-formula>, hence the problem is NP-hard.</p>
<p>Informally, we try to color all edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e270" xlink:type="simple"/></inline-formula> in two colors, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e271" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e272" xlink:type="simple"/></inline-formula>, such that each color forms a set of disjoint cliques. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e273" xlink:type="simple"/></inline-formula>-colored cliques represent full-sibling-group cliques with a single common father, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e274" xlink:type="simple"/></inline-formula>-colored cliques represent full-sibling-group cliques with a single common mother.</p>
<p>This problem is also NP-hard, and we therefore use the following greedy approach. For simplicity, we assume <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e275" xlink:type="simple"/></inline-formula> is connected. The algorithm begins by setting <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e276" xlink:type="simple"/></inline-formula>. We denote by <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e277" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e278" xlink:type="simple"/></inline-formula> the sets of vertices induced by <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e279" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e280" xlink:type="simple"/></inline-formula>, respectively. The algorithm proceeds in iterations. In each iteration we search for the heaviest clique <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e281" xlink:type="simple"/></inline-formula> such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e282" xlink:type="simple"/></inline-formula>, and the heaviest clique <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e283" xlink:type="simple"/></inline-formula> such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e284" xlink:type="simple"/></inline-formula>. Without loss of generality, assume that the heaviest among those is a clique <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e285" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e286" xlink:type="simple"/></inline-formula>. 
If <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e287" xlink:type="simple"/></inline-formula> contains only one vertex, we search instead for the heaviest clique <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e288" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e289" xlink:type="simple"/></inline-formula>. We add the edges of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e290" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e291" xlink:type="simple"/></inline-formula> and remove these edges from the graph. Clearly, both <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e292" xlink:type="simple"/></inline-formula> consist of a set of disjoint cliques of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e293" xlink:type="simple"/></inline-formula>.</p>
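<p>The effect of the coloring order can be illustrated with a minimal sketch. This is not the PREPARE implementation: it assumes that candidate cliques and their weights are already given (the method described above instead searches for the heaviest clique in the remaining graph in every iteration), and the function name <monospace>color_cliques</monospace>, the color labels, and the toy chain of four cliques are all hypothetical. A clique receives a color only if none of its vertices is already covered by a clique of that color, so that each color forms a set of disjoint cliques.</p>
<preformat preformat-type="code">
```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Clique = Tuple[float, FrozenSet[int]]  # (weight, vertices); hypothetical representation
COLORS = ("blue", "orange")


def color_cliques(cliques: List[Clique], prefer_adjacent: bool) -> Dict[int, Optional[str]]:
    """Greedily color candidate cliques so that same-colored cliques are vertex-disjoint."""
    used = {color: set() for color in COLORS}  # vertices already covered, per color
    result: Dict[int, Optional[str]] = {}
    remaining = set(range(len(cliques)))
    while remaining:
        touched = used["blue"] | used["orange"]
        pool: List[int] = []
        if prefer_adjacent:
            # Prefer cliques that share a vertex with an already colored clique.
            pool = [i for i in remaining if not cliques[i][1].isdisjoint(touched)]
        if not pool:
            # Nothing adjacent (or naive mode): consider every remaining clique.
            pool = list(remaining)
        pick = max(pool, key=lambda i: cliques[i][0])  # heaviest candidate
        remaining.remove(pick)
        result[pick] = None  # stays None if neither color fits
        for color in COLORS:
            if cliques[pick][1].isdisjoint(used[color]):
                used[color] |= cliques[pick][1]
                result[pick] = color
                break
    return result


# Toy chain of four cliques A-B-C-D; by weight the order is A, D, C, B.
chain = [(4.0, frozenset({1, 2})), (1.0, frozenset({2, 3})),
         (2.0, frozenset({3, 4})), (3.0, frozenset({4, 5}))]
naive = color_cliques(chain, prefer_adjacent=False)
adjacent = color_cliques(chain, prefer_adjacent=True)
print(sum(c is not None for c in naive.values()))     # 3
print(sum(c is not None for c in adjacent.values()))  # 4
```
</preformat>
<p>In this toy chain, coloring strictly from heaviest to lightest leaves the lightest clique uncolored because it touches one clique of each color, whereas preferring cliques adjacent to already colored ones colors all four.</p>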
<p>Notice that we try to minimize the number of arbitrarily colored cliques by choosing cliques adjacent to cliques that are already colored. Simulation studies show that choosing this coloring order increases the half-sibling sensitivity from 85% to 97% on average (see <xref ref-type="table" rid="pcbi-1003610-t001">Table 1</xref>). It is easy to see that sub-graphs that are composed of a connected list of cliques will be colored optimally by our coloring scheme. An example of such a graph is depicted in <xref ref-type="fig" rid="pcbi-1003610-g007">Fig. 7</xref>.</p>
<fig id="pcbi-1003610-g007" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g007</object-id><label>Figure 7</label><caption>
<title>An example of a case where the coloring order we propose enables coloring more cliques with two colors than coloring the same graph in an arbitrary order.</title>
<p>The coloring order is depicted near the cliques. In the left graph we follow the depicted order and color the clique blue if possible, else we color it dashed-orange. The fourth clique cannot be colored since it touches a blue and a dashed-orange clique. In the right graph we use our coloring scheme, which prefers coloring cliques touching cliques that are already colored. Using this order we are able to color all four cliques.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g007" position="float" xlink:type="simple"/></fig><table-wrap id="pcbi-1003610-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.t001</object-id><label>Table 1</label><caption>
<title>Sensitivity and PPV scores (as defined in the results section) of half-siblings using two coloring order schemes.</title>
</caption><alternatives><graphic id="pcbi-1003610-t001-1" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.t001" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td colspan="2" align="left" rowspan="1">PREPARE</td>
<td colspan="2" align="left" rowspan="1">naive</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Population size</td>
<td align="left" rowspan="1" colspan="1">Sensitivity</td>
<td align="left" rowspan="1" colspan="1">PPV</td>
<td align="left" rowspan="1" colspan="1">Sensitivity</td>
<td align="left" rowspan="1" colspan="1">PPV</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">200</td>
<td align="left" rowspan="1" colspan="1">1.0</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">300</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
<td align="left" rowspan="1" colspan="1">0.88</td>
<td align="left" rowspan="1" colspan="1">0.79</td>
<td align="left" rowspan="1" colspan="1">0.85</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">400</td>
<td align="left" rowspan="1" colspan="1">0.97</td>
<td align="left" rowspan="1" colspan="1">0.88</td>
<td align="left" rowspan="1" colspan="1">0.85</td>
<td align="left" rowspan="1" colspan="1">0.88</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">500</td>
<td align="left" rowspan="1" colspan="1">1.0</td>
<td align="left" rowspan="1" colspan="1">0.88</td>
<td align="left" rowspan="1" colspan="1">0.88</td>
<td align="left" rowspan="1" colspan="1">0.91</td>
</tr>
</tbody>
</table>
</alternatives><table-wrap-foot><fn id="nt101"><label/><p>(1) PREPARE's greedy coloring scheme as described in section 2.3. (2) Coloring cliques from the heaviest to lightest; if possible color with <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e294" xlink:type="simple"/></inline-formula>, else if possible color with <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e295" xlink:type="simple"/></inline-formula>.</p></fn></table-wrap-foot></table-wrap>
<p>The graph formulation of the half-sibling detection assumes that each edge in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e296" xlink:type="simple"/></inline-formula> represents a unique half-sibling relationship. We note that in some cases <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e297" xlink:type="simple"/></inline-formula> might contain redundant edges. In order to simplify the explanation, we extend the definition of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e298" xlink:type="simple"/></inline-formula> to nodes in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e299" xlink:type="simple"/></inline-formula>: <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e300" xlink:type="simple"/></inline-formula>. The problem arises when there exists a pair of nodes <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e301" xlink:type="simple"/></inline-formula> from the same generation, such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e302" xlink:type="simple"/></inline-formula>. In such a case, an edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e303" xlink:type="simple"/></inline-formula> may be added to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e304" xlink:type="simple"/></inline-formula> as a result of a relationship <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e305" xlink:type="simple"/></inline-formula>. 
Contracting <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e306" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e307" xlink:type="simple"/></inline-formula> is not sound, since different relationships can be detected for <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e308" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e309" xlink:type="simple"/></inline-formula> to a third vertex <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e310" xlink:type="simple"/></inline-formula> by testing them separately. Instead, we apply a preprocessing step to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e311" xlink:type="simple"/></inline-formula>, in the form of a set of parsimonious rules. The rules aim to filter out all edges except those that explain the observed features in the simplest way.</p>
<p>The first rule we apply concerns the case depicted in <xref ref-type="fig" rid="pcbi-1003610-g008">Fig. 8</xref>-A. In this case, an individual <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e312" xlink:type="simple"/></inline-formula>, with a half-sibling <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e313" xlink:type="simple"/></inline-formula>, has children with two mates <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e314" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e315" xlink:type="simple"/></inline-formula>. Since <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e316" xlink:type="simple"/></inline-formula> do not have full siblings, each of them is represented in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e317" xlink:type="simple"/></inline-formula> as a sibling-group of one individual. Since <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e318" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e319" xlink:type="simple"/></inline-formula> have children only with a, their descendant sets are contained in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e320" xlink:type="simple"/></inline-formula>'s descendant set. 
As a result, half-sibling edges are expected to form between <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e321" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e322" xlink:type="simple"/></inline-formula>, in addition to the correct edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e323" xlink:type="simple"/></inline-formula>. To deal with this case, if we find a node a in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e324" xlink:type="simple"/></inline-formula> that has two mates, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e325" xlink:type="simple"/></inline-formula>, and the following holds: <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e326" xlink:type="simple"/></inline-formula>, we remove <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e327" xlink:type="simple"/></inline-formula> (we do the same for <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e328" xlink:type="simple"/></inline-formula>). A similar rule is applied to the contracted graph <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e329" xlink:type="simple"/></inline-formula>, where redundant full-sibling edges result from a case equivalent to the one just mentioned, and are removed in the same manner (see <xref ref-type="fig" rid="pcbi-1003610-g008">Fig. 8</xref>-B). 
A third rule is applied to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e330" xlink:type="simple"/></inline-formula> to deal with a case similar to the one in rule 1, only the mates <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e331" xlink:type="simple"/></inline-formula> are not the mates of a single individual <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e332" xlink:type="simple"/></inline-formula>, but instead <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e333" xlink:type="simple"/></inline-formula> is the mate of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e334" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e335" xlink:type="simple"/></inline-formula> is the mate of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e336" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e337" xlink:type="simple"/></inline-formula> are full-siblings (see <xref ref-type="fig" rid="pcbi-1003610-g008">Fig. 8</xref>-C). In such a case, a true relation <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e338" xlink:type="simple"/></inline-formula> may cause redundant half-sibling edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e339" xlink:type="simple"/></inline-formula>. These cases are characterized by mates <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e340" xlink:type="simple"/></inline-formula> that have few or no full-siblings. 
Thus, we look for edges <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e341" xlink:type="simple"/></inline-formula> where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e342" xlink:type="simple"/></inline-formula>, such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e343" xlink:type="simple"/></inline-formula> is the mate of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e344" xlink:type="simple"/></inline-formula>, and remove <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e345" xlink:type="simple"/></inline-formula> from <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e346" xlink:type="simple"/></inline-formula>. Finally, we observed half-sibling edges forming between two mates <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e347" xlink:type="simple"/></inline-formula> of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e348" xlink:type="simple"/></inline-formula> such that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e349" xlink:type="simple"/></inline-formula> are full-siblings. This results from the fact that most of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e350" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e351" xlink:type="simple"/></inline-formula>'s descendant similarity was already explained by the formation of the full-sibling relationship <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e352" xlink:type="simple"/></inline-formula>. 
The difference between the half-sibling hypothesis and the null hypothesis for <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e353" xlink:type="simple"/></inline-formula> therefore becomes small, and as a result noisy decisions are made. To handle this final case, we remove half-sibling edges between mates of full siblings <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e354" xlink:type="simple"/></inline-formula> if they have a half-sibling edge <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e355" xlink:type="simple"/></inline-formula> in <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e356" xlink:type="simple"/></inline-formula> (see <xref ref-type="fig" rid="pcbi-1003610-g008">Fig. 8</xref>-D).</p>
<fig id="pcbi-1003610-g008" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g008</object-id><label>Figure 8</label><caption>
<title>Depicting cases where edge removal rules are required in polygamous pedigree reconstruction.</title>
<p>Redundant graph edges are shown in dashed red, correct edges in solid black.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g008" position="float" xlink:type="simple"/></fig></sec></sec><sec id="s3d">
<title>2.4 Efficiency Considerations</title>
<p>Simulating inheritance for the descendants of every two individuals during the graph construction is very time consuming, and is the reason <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e357" xlink:type="simple"/></inline-formula> is impractical for large populations or for pedigrees deeper than 4 generations. Notice that if two pairs of extant descendants have exactly the same ancestor structure in the pedigree, then their simulated IBD features are sampled from the same distribution. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e358" xlink:type="simple"/></inline-formula> proposes caching individual pairs with identical inheritance paths, and introduces an accompanying dynamic programming algorithm for minimizing the number of operations.</p>
<p>In <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e359" xlink:type="simple"/></inline-formula>, we use a simplified version of this idea. For every pair <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e360" xlink:type="simple"/></inline-formula> of extant descendants, we calculate a least-common-ancestors (LCAs) vector <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e361" xlink:type="simple"/></inline-formula>, which is a list of the meiosis distances between <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e362" xlink:type="simple"/></inline-formula> and their least common ancestors. For example, all full-siblings will have <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e363" xlink:type="simple"/></inline-formula> = [1,1], since full-siblings always have two common ancestors, with one separating meiosis. We hash the simulated distribution for this LCA vector, where the key represents the vector itself, and the value is the distribution. We simulate inheritance only when needed, i.e. when <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e364" xlink:type="simple"/></inline-formula> have at least one descendant pair without a hashed distribution, thus saving most of the redundant computation. In practice, the running time of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e365" xlink:type="simple"/></inline-formula> is comparable to the running time of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e366" xlink:type="simple"/></inline-formula>, and is even slightly faster (see <xref ref-type="table" rid="pcbi-1003610-t002">Table 2</xref>). 
Although <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e367" xlink:type="simple"/></inline-formula> does not capture completely the ancestry structure for <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e368" xlink:type="simple"/></inline-formula>, we observed empirically (data not shown) that running simulations for each ancestry structure does not improve the reconstruction accuracy. Apparently, pairs of individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e369" xlink:type="simple"/></inline-formula> with the same LCAs vector have similar IBD distributions. The similarity is large enough to make the repetition of inheritance simulation for two such pairs redundant.</p>
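<p>The hashing of simulated distributions by LCA vector can be sketched as a simple memoization table. The sketch below is illustrative only and is not taken from the PREPARE code: the class name <monospace>LcaDistributionCache</monospace>, the placeholder <monospace>toy_simulation</monospace> function (which returns made-up samples instead of simulating inheritance), and the choice to sort the vector before using it as a key are all assumptions.</p>
<preformat preformat-type="code">
```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

LcaKey = Tuple[int, ...]


class LcaDistributionCache:
    """Memoize simulated IBD distributions, keyed by the LCA vector."""

    def __init__(self, simulate: Callable[[LcaKey], List[float]]) -> None:
        self._simulate = simulate
        self._store: Dict[LcaKey, List[float]] = {}
        self.simulations_run = 0

    def get(self, lca_vector: Sequence[int]) -> List[float]:
        # Assumption: the order of entries in the vector is irrelevant, so sort it.
        key = tuple(sorted(lca_vector))
        if key not in self._store:
            # Simulate only when no distribution is hashed for this key yet.
            self._store[key] = self._simulate(key)
            self.simulations_run += 1
        return self._store[key]


def toy_simulation(key: LcaKey, n_samples: int = 100) -> List[float]:
    """Placeholder for an inheritance simulation; returns made-up samples."""
    rng = random.Random(0)
    center = sum(0.5 ** d for d in key)
    return [rng.gauss(center, 0.05) for _ in range(n_samples)]


cache = LcaDistributionCache(toy_simulation)
siblings_1 = cache.get([1, 1])  # simulated on first request
siblings_2 = cache.get([1, 1])  # served from the cache
other = cache.get([2, 2])       # a different key, simulated once
print(cache.simulations_run)    # 2
```
</preformat>
<p>Three lookups trigger only two simulations here, since the second request for the vector [1,1] reuses the stored distribution.</p>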
<table-wrap id="pcbi-1003610-t002" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.t002</object-id><label>Table 2</label><caption>
<title>Running times of PREPARE on a 1.6 GHz Intel Core i5-2467M machine with 4 GB RAM using a single thread.</title>
</caption><alternatives><graphic id="pcbi-1003610-t002-2" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.t002" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Population Size</td>
<td align="left" rowspan="1" colspan="1">monogamous</td>
<td align="left" rowspan="1" colspan="1">polygamous</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">31s</td>
<td align="left" rowspan="1" colspan="1">4m 18s</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">200</td>
<td align="left" rowspan="1" colspan="1">53s</td>
<td align="left" rowspan="1" colspan="1">9m 21s</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">500</td>
<td align="left" rowspan="1" colspan="1">4m 55s</td>
<td align="left" rowspan="1" colspan="1">56m 40s</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1000</td>
<td align="left" rowspan="1" colspan="1">10m 27s</td>
<td align="left" rowspan="1" colspan="1">93m 41s</td>
</tr>
</tbody>
</table>
</alternatives><table-wrap-foot><fn id="nt102"><label/><p>The two parameters affecting the running time of PREPARE are the population size and whether PREPARE is run in monogamous or polygamous mode. Most of the running time is spent on reconstructing the fifth generation.</p></fn></table-wrap-foot></table-wrap></sec><sec id="s3e">
<title>2.5 Availability</title>
<p>The <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e370" xlink:type="simple"/></inline-formula> method, inheritance simulators, and quality evaluation tools are available at <ext-link ext-link-type="uri" xlink:href="http://www.cs.tau.ac.il/" xlink:type="simple">http://www.cs.tau.ac.il/</ext-link><inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e371" xlink:type="simple"/></inline-formula>heran/cozygene/software.shtml</p>
</sec></sec><sec id="s4">
<title>Results</title>
<p>We compare the accuracy of our method to previous pedigree reconstruction methods on numerous simulations. Different simulations include combinations of population size and inheritance modes (monogamous and polygamous). Smaller population sizes correspond to inbred populations with multiple relationships between families. Larger populations correspond to outbred populations, with simpler pedigree structures. We also study the effect of population bottlenecks on the reconstruction quality. To test <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e372" xlink:type="simple"/></inline-formula> in a more realistic scenario, we run it on a simulation that uses the HapMap phase III <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e373" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e374" xlink:type="simple"/></inline-formula> populations as founders. The simulation models polygamous random mating in this population for 200 years, reaching a final population size of 1000. Finally, we apply PREPARE to the HapMap <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e375" xlink:type="simple"/></inline-formula> population as a feasibility test for applying our method to real populations.</p>
<sec id="s4a">
<title>3.1 Simulations</title>
<p>Similarly to previous methods, we use a Wright-Fisher (WF) simulator that includes recombination and genders. We add several new features, which make this simulator more flexible. First, we add the ability to control polygamy through a polygamy probability parameter <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e376" xlink:type="simple"/></inline-formula>, which controls the probability for an individual to have a child with more than one mate. Second, we add an option to simulate dynamic population sizes by specifying an initial population size and a final population size. The simulator calculates the required population change per generation and modifies the population size by that ratio in every generation.</p>
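<p>As a worked illustration of the dynamic population size option, the following sketch computes a per-generation size schedule. It assumes that the "ratio" is a constant multiplicative factor, which the text does not state explicitly, and the function name <monospace>population_schedule</monospace> is hypothetical rather than part of the simulator.</p>
<preformat preformat-type="code">
```python
from typing import List


def population_schedule(initial_size: int, final_size: int, generations: int) -> List[int]:
    """Population size per generation under a constant growth ratio (assumed)."""
    # Ratio r chosen so that initial_size * r ** generations == final_size.
    ratio = (final_size / initial_size) ** (1.0 / generations)
    return [round(initial_size * ratio ** g) for g in range(generations + 1)]


print(population_schedule(100, 1600, 4))  # [100, 200, 400, 800, 1600]
```
</preformat>
<p>A final size smaller than the initial size gives a ratio below one, i.e. a shrinking population.</p>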
<p>Additionally, we experiment with a more realistic forward simulator that does not assume synchronized generations and allows polygamy. We simulate inheritance as a function of time, where individuals can have children after the age of 20, and die at an age drawn from a capped exponential distribution with mean 50. The birthrate is changed according to the current population size, and is tuned to reach a predefined target population size. This simulator produces actual recombined haplotypes from the haplotypes of 160 <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e377" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e378" xlink:type="simple"/></inline-formula> HapMap representatives. More specifically, the simulation runs in 5-year iterations, and a pool of unmated mature individuals is maintained at all times. In every iteration, individuals from the pool are matched to uniformly drawn mates. A matching succeeds with probability <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e379" xlink:type="simple"/></inline-formula>. Every mated pair has a probability <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e380" xlink:type="simple"/></inline-formula> of having a child, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e381" xlink:type="simple"/></inline-formula> is initialized to 1, and is modified in every iteration by +0.2 or -0.2 depending on whether the current population size is smaller or larger than the target population size. Polygamy is achieved through second marriage, which can occur because once an individual's mate dies, the individual is returned to the unmated pool. 
Finally, in order to include possible IBD detection errors, we detect IBD segments from simulated genotypes using <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e382" xlink:type="simple"/></inline-formula> <xref ref-type="bibr" rid="pcbi.1003610-Gusev1">[17]</xref>, and extract the IBD-features information from its output. This simulator also supports a dynamic population size: the population grows or shrinks depending on the initial and target population sizes.</p>
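<p>Two ingredients of the forward simulator, the birth-probability update and the capped exponential age at death, can be sketched as follows. This is a hedged illustration rather than the simulator's code: the function names are hypothetical, the clamping of the birth probability to the interval [0, 1] is an assumption, and the cap of 100 years is an arbitrary placeholder, since the text does not give the cap value.</p>
<preformat preformat-type="code">
```python
import random


def update_birth_probability(p_birth: float, current_size: int, target_size: int,
                             step: float = 0.2) -> float:
    """Raise or lower the birth probability by a fixed step toward the target size."""
    if target_size > current_size:
        p_birth += step
    elif current_size > target_size:
        p_birth -= step
    # Assumption: the value is kept a valid probability by clamping to [0, 1].
    return min(1.0, max(0.0, p_birth))


def draw_age_at_death(rng: random.Random, mean: float = 50.0, cap: float = 100.0) -> float:
    """Exponential age at death with the given mean, truncated at a hypothetical cap."""
    # Note: capping lowers the effective mean slightly below `mean`.
    return min(rng.expovariate(1.0 / mean), cap)


rng = random.Random(1)
p_birth = 1.0
p_birth = update_birth_probability(p_birth, current_size=1200, target_size=1000)
ages = [draw_age_at_death(rng) for _ in range(1000)]
```
</preformat>
<p>With these assumptions, a population above its target lowers the birth probability from 1 to 0.8 in one iteration, and a population below its target cannot push the probability above 1.</p>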
</sec><sec id="s4b">
<title>3.2 Quality Evaluation</title>
<p>Many different measures can be considered when evaluating the quality of reconstructed pedigrees. We first use a previously defined score to compare <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e383" xlink:type="simple"/></inline-formula> to previous methods. For most of the presentation, we define and use other natural evaluation scores, which we deem more relevant and interpretable. Previous methods used a consensus-accuracy score, which counts the number of extant individual-pairs with the same minimal meiosis-distance as in the true pedigree <xref ref-type="bibr" rid="pcbi.1003610-He1">[14]</xref>. This score treats correct detection of unrelated pairs and related pairs identically. This is problematic since the number of unrelated pairs dominates the score. For example, a trivial algorithm that outputs a pedigree where all individuals are unrelated receives a high consensus-accuracy score (see <xref ref-type="fig" rid="pcbi-1003610-g009">Fig. 9</xref>). As a new standard for pedigree-reconstruction evaluation, we suggest three types of scores: sensitivity, positive-predictive-value (PPV), and IBD-length prediction error.</p>
<fig id="pcbi-1003610-g009" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g009</object-id><label>Figure 9</label><caption>
<title>Example for the problematic nature of the consensus-accuracy score, in contrast with the sensitivity score we propose.</title>
<p>Notice how the unrelated pedigree structure receives similar consensus-accuracy scores to <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e384" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e385" xlink:type="simple"/></inline-formula> reconstructions. Still, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e386" xlink:type="simple"/></inline-formula> scores are significantly higher. Shown are average scores over 5 simulations, and standard deviation bars. (Some error bars are too small to be visible).</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g009" position="float" xlink:type="simple"/></fig>
<p>We define sensitivity as the fraction of correctly reconstructed (distance-wise) related pairs out of the total number of related pairs in the original pedigree. PPV is defined as the fraction of correctly reconstructed related pairs out of the total number of related pairs in the reconstructed pedigree. More formally, define <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e387" xlink:type="simple"/></inline-formula> as the reconstructed pedigree, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e388" xlink:type="simple"/></inline-formula> as the original pedigree, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e389" xlink:type="simple"/></inline-formula> as the minimal number of meioses separating <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e390" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e391" xlink:type="simple"/></inline-formula> in pedigree <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e392" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e393" xlink:type="simple"/></inline-formula> as the set of extant-individual pairs that are related according to pedigree <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e394" xlink:type="simple"/></inline-formula>. Let <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e395" xlink:type="simple"/></inline-formula>. Then,<disp-formula id="pcbi.1003610.e396"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003610.e396" xlink:type="simple"/></disp-formula></p>
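<p>The verbal definitions of sensitivity and PPV can be made concrete with a small sketch. It is written from the prose definitions only, so it should be read as an illustration under stated assumptions: each pedigree is represented by a hypothetical dictionary that maps every related pair of extant individuals to its minimal meiosis distance, unrelated pairs are simply absent, and a score with an empty denominator is reported as 0 by convention.</p>
<preformat preformat-type="code">
```python
from typing import Dict, FrozenSet, Tuple

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered pair of extant individuals."""
    return frozenset((a, b))


def sensitivity_and_ppv(original: Dict[Pair, int],
                        reconstructed: Dict[Pair, int]) -> Tuple[float, float]:
    """Both arguments map related pairs to their minimal meiosis distance."""
    # A pair is correct if it is related in both pedigrees at the same distance.
    correct = sum(1 for p, d in reconstructed.items() if original.get(p) == d)
    sensitivity = correct / len(original) if original else 0.0
    ppv = correct / len(reconstructed) if reconstructed else 0.0
    return sensitivity, ppv


original = {pair("a", "b"): 2, pair("a", "c"): 4, pair("b", "c"): 4, pair("d", "e"): 2}
reconstructed = {pair("a", "b"): 2, pair("a", "c"): 4, pair("b", "c"): 2, pair("d", "f"): 2}
print(sensitivity_and_ppv(original, reconstructed))  # (0.5, 0.5)
print(sensitivity_and_ppv(original, {}))             # (0.0, 0.0)
```
</preformat>
<p>In the toy data, two of the four truly related pairs are recovered at the right distance, one is recovered at the wrong distance, and one reconstructed pair is spurious. The second call shows that a reconstruction in which all individuals are unrelated scores zero on both measures.</p>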
<p>We run <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e397" xlink:type="simple"/></inline-formula> for <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e398" xlink:type="simple"/></inline-formula> generations, and compare the scores of the reconstructed pedigree at every generation <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e399" xlink:type="simple"/></inline-formula> against the first <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e400" xlink:type="simple"/></inline-formula> generations of the original pedigree. This way we can assess the accuracy for different degrees of relatedness (<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e401" xlink:type="simple"/></inline-formula> = 2 corresponds to siblings, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e402" xlink:type="simple"/></inline-formula> = 3 to siblings and first cousins, etc.).</p>
<p>Scores such as sensitivity and PPV have the disadvantage of not weighing mistakes according to their magnitude. A second disadvantage is that the minimal meiotic distance does not capture the full complexity of a real pedigree (for example, double cousins detected as cousins receive a full score). For these reasons, we suggest alternatively measuring pedigree quality by calculating the root mean square IBD-length error (<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e403" xlink:type="simple"/></inline-formula>):<disp-formula id="pcbi.1003610.e404"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003610.e404" xlink:type="simple"/></disp-formula>where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e405" xlink:type="simple"/></inline-formula> is the set of extant individuals in the population, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e406" xlink:type="simple"/></inline-formula> is the observed total length of IBD segments between individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e407" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e408" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e409" xlink:type="simple"/></inline-formula> is the total length of IBD segments between individuals <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e410" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e411" xlink:type="simple"/></inline-formula>, as obtained by simulating inheritance on the reconstructed pedigree <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e412" xlink:type="simple"/></inline-formula>. 
Since this score depends on the randomized scoring simulation, we average the score over 5 runs. The <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e413" xlink:type="simple"/></inline-formula> can be interpreted as the expected prediction error (in Mbp) of the typical pair-wise total IBD length, given the reconstructed pedigree.</p>
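<p>The computation of this score can be sketched in a few lines of Python. The sketch below assumes that the mean is taken over all unordered pairs of distinct extant individuals and that a pair with no recorded IBD sharing has total length 0; the published formula may normalize differently. The function names, the dictionary layout, and the toy values are hypothetical and serve only to illustrate the idea.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
from itertools import combinations
from typing import Iterable, Mapping, Sequence, Tuple

PairKey = Tuple[str, str]  # IDs in sorted order, e.g. ("a", "b")


def ibd_rmse(
    individuals: Iterable[str],
    observed: Mapping[PairKey, float],
    simulated: Mapping[PairKey, float],
) -> float:
    """Root mean square error of total pair-wise IBD length (in Mbp).

    Pairs missing from a mapping are treated as sharing 0 Mbp (assumption).
    """
    pairs = list(combinations(sorted(individuals), 2))
    if not pairs:
        raise ValueError("need at least two extant individuals")
    squared_error = sum(
        (observed.get(p, 0.0) - simulated.get(p, 0.0)) ** 2 for p in pairs
    )
    return math.sqrt(squared_error / len(pairs))


def mean_ibd_rmse(
    individuals: Iterable[str],
    observed: Mapping[PairKey, float],
    simulated_runs: Sequence[Mapping[PairKey, float]],
) -> float:
    """Average the score over several randomized scoring simulations."""
    people = list(individuals)
    scores = [ibd_rmse(people, observed, run) for run in simulated_runs]
    return sum(scores) / len(scores)


# Toy data (hypothetical): observed IBD lengths and two simulated runs on
# the same reconstructed pedigree.
individuals = ["a", "b", "c"]
observed = {("a", "b"): 100.0, ("b", "c"): 50.0}
runs = [
    {("a", "b"): 97.0, ("b", "c"): 54.0},
    {("a", "b"): 100.0, ("b", "c"): 50.0, ("a", "c"): 0.0},
]
print(mean_ibd_rmse(individuals, observed, runs))
```
]]></preformat>
<p>Because every pair contributes its squared length difference, a large mistake (for example, predicting siblings where the individuals are unrelated) is penalized far more heavily than a small one, which is the property that sensitivity and PPV lack.</p>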
</sec><sec id="s4c">
<title>3.3 Comparing <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e414" xlink:type="simple"/></inline-formula> and Competing Methods on Monogamous Simulations</title>
<p>We tested the competing methods on monogamous Wright-Fisher simulated populations of constant size: 100, 200, 500, and 1000 individuals. When it was possible, we ran <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e415" xlink:type="simple"/></inline-formula> (up to 4 generations due to its high runtime complexity), and for larger populations we ran <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e416" xlink:type="simple"/></inline-formula>. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e417" xlink:type="simple"/></inline-formula> was run in monogamous mode. Results for 100 and 200 individuals were similar, as were results for 500 and 1000 individuals. In <xref ref-type="fig" rid="pcbi-1003610-g010">Fig. 10</xref>, we compare the three methods for small populations (200) and larger populations (1000). In all the scenarios we tested, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e418" xlink:type="simple"/></inline-formula> was the most sensitive method, for pedigrees of up to 5 generations (corresponding to 3rd cousins) and for populations as small as 100 individuals. The improvement in sensitivity is highest for the larger populations, where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e419" xlink:type="simple"/></inline-formula> is able to build a pedigree which correctly predicts the minimal meiotic distance of more than 95% of 1st and 2nd degree relatives and more than 60% of relatives up to 3rd degree. At the same time, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e420" xlink:type="simple"/></inline-formula> has a higher PPV for pedigrees of up to 4 generations. In the 5th generation it gets a lower PPV than the other methods, but this disadvantage is not meaningful, since the sensitivity of these methods in the 5th generation is very low. 
<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e421" xlink:type="simple"/></inline-formula> gives better-quality results for larger populations, which is natural, since they tend to form simpler pedigrees with fewer multi-relationships between families and fewer inbred families.</p>
<fig id="pcbi-1003610-g010" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g010</object-id><label>Figure 10</label><caption>
<title>Comparison of pedigree reconstruction methods for monogamous populations, using Sensitivity, PPV, and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e422" xlink:type="simple"/></inline-formula>.</title>
<p>Populations were simulated with Wright-Fisher simulations of 5 generations. Shown are average scores over 5 simulations, with standard deviation bars. The optimal <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e423" xlink:type="simple"/></inline-formula> score is calculated by scoring the true k-generation pedigree. The first-generation pedigree in the <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e424" xlink:type="simple"/></inline-formula> figures is the score of the pedigree where all individuals are unrelated, and is shown as a reference. (Some error bars are too small to be visible.)</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g010" position="float" xlink:type="simple"/></fig>
<p>Considering <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e425" xlink:type="simple"/></inline-formula> scores, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e426" xlink:type="simple"/></inline-formula> gets much better scores than the second-best method, and is close to the optimal score, especially for larger populations. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e427" xlink:type="simple"/></inline-formula> gets worse <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e428" xlink:type="simple"/></inline-formula> scores than <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e429" xlink:type="simple"/></inline-formula>/<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e430" xlink:type="simple"/></inline-formula> as a result of its tendency to over-predict inbreeding, which we observed during our experiments. An important feature of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e431" xlink:type="simple"/></inline-formula>'s score is that it is non-increasing in the number of generations, like the optimal score. In contrast, we do not see this behavior in the other methods. Interestingly, the optimal scores decrease as the population size increases. We attribute this mainly to the increasing proportion of unrelated pairs in larger populations, which are easier to predict.</p>
</sec><sec id="s4d">
<title>3.4 The Effect of Population Expansion on the Success of Pedigree Reconstruction</title>
<p>The simplified Wright-Fisher model that has been used by pedigree reconstruction methods to date assumes a constant population size. Real population sizes are obviously not constant, and it is known that population bottlenecks and expansions affect the IBD distribution in the population. We conducted an experiment to test the effect of population size shifts on the distribution of the chosen IBD features and, as a consequence, on the quality of the resulting pedigree. We ran the Wright-Fisher simulation with initial population sizes of 100, 200, 300, 400, and 500, and fixed the final population size at 500. Looking at the distribution of IBD features between all pairs of individuals, it is clear that the number of IBD segments and the mean IBD segment length have an inverse relationship with the initial population size. This corresponds to a higher proportion of relatives in the populations with smaller initial size. We found that populations that grow from 100 to 500 individuals in five generations have IBD feature distributions similar to those of populations with a constant size of 200. Interestingly, the quality of the resulting pedigree remains unchanged when the initial population size is gradually decreased from 500 to 200. Only at an initial size of 100 does the quality decrease. Sensitivity levels for an initial population size of 100 are 0.96, 0.75, and 0.54 for 2, 3, and 4 generations, respectively. The largest decrease is for 3-generation pedigrees, where the sensitivity decreases by 10% on average. The PPV remains above 0.95 for generations 2 and 3, but decreases from 0.85 to 0.71 in generation 4.</p>
</sec><sec id="s4e">
<title>3.5 Comparing <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e432" xlink:type="simple"/></inline-formula> and Competing Methods on Polygamous Simulations</title>
<p>To assess the quality of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e433" xlink:type="simple"/></inline-formula> on polygamous populations, we simulated polygamous populations of sizes 200 and 1000 with the Wright-Fisher model. In the simulated populations, 33% of the siblings are half-siblings on average. Details regarding the execution of previous methods are the same as in section 3.3. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e434" xlink:type="simple"/></inline-formula> was run in polygamous mode. The results are summarized in <xref ref-type="fig" rid="pcbi-1003610-g011">Fig. 11</xref>. Once again, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e435" xlink:type="simple"/></inline-formula> is generally superior in terms of sensitivity, PPV and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e436" xlink:type="simple"/></inline-formula>. A notable exception is <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e437" xlink:type="simple"/></inline-formula>'s relatively high sensitivity in generations 4 and 5 in the smaller population (200). Note, however, that this sensitivity comes at the cost of very low PPV and very high <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e438" xlink:type="simple"/></inline-formula> in these generations. The <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e439" xlink:type="simple"/></inline-formula> of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e440" xlink:type="simple"/></inline-formula> is not shown in the graph since it is off the chart, reaching values as high as 1500 Mbp. 
This result suggests that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e441" xlink:type="simple"/></inline-formula> has a strong tendency to over-predict relationships in small polygamous populations.</p>
<fig id="pcbi-1003610-g011" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g011</object-id><label>Figure 11</label><caption>
<title>Comparison of pedigree reconstruction methods for polygamous populations.</title>
<p>Populations were simulated with polygamous Wright-Fisher simulations of 5 generations. Shown are average scores over 5 simulations, with standard deviation bars. (Some error bars are too small to be visible.)</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g011" position="float" xlink:type="simple"/></fig>
<p>As in the monogamous case, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e442" xlink:type="simple"/></inline-formula> achieves higher performance on larger and, as a result, more simply related populations. For a population size of 1000, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e443" xlink:type="simple"/></inline-formula> is able to build a polygamous pedigree which correctly predicts the minimal meiotic distance of more than 97% of 1st degree relatives and more than 80% of 2nd degree relatives while maintaining a PPV greater than 80%. Polygamous populations pose a much greater challenge for pedigree reconstruction, and performance decreases in comparison to monogamous populations. According to our analysis, the difficulty in reconstructing polygamous pedigrees stems from the fact that the IBD feature distributions for the range of possible polygamous relationships overlap more than those of monogamous relationships (see <xref ref-type="fig" rid="pcbi-1003610-g012">Fig. 12</xref>).</p>
<fig id="pcbi-1003610-g012" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g012</object-id><label>Figure 12</label><caption>
<title>Simulated IBD feature distribution in monogamous and polygamous populations.</title>
<p>The overlap in polygamous distributions is the main challenge in reconstructing pedigrees of real populations.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g012" position="float" xlink:type="simple"/></fig></sec><sec id="s4f">
<title>3.6 Reconstructing Realistically Simulated HapMap Descending Population</title>
<p>We test the performance of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e444" xlink:type="simple"/></inline-formula> on populations produced by the polygamous, asynchronous forward simulator. We run the simulator for hundreds of simulation years, resulting in the mixing of the different generations, and reconstruct the last five generations. We use unphased IBD segments, to account for the fact that our input is genotypes, and not haplotypes. As a necessary step, we aim to filter out cross-generation relationships, which are not currently modeled, by taking the genotypes from the youngest age stratum (ages 0-20). We used the <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e445" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e446" xlink:type="simple"/></inline-formula> HapMap genotypes as the founder population for our simulation. All accuracy measures show a decrease compared to the Wright-Fisher simulation results. This is expected due to the addition of several factors (as discussed above), which add to the complexity of the analysis (see <xref ref-type="fig" rid="pcbi-1003610-g013">Fig. 13</xref>). Nevertheless, the results remain comparable to those of the Wright-Fisher simulation, increasing our confidence that <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e447" xlink:type="simple"/></inline-formula> can be run on real populations.</p>
<fig id="pcbi-1003610-g013" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g013</object-id><label>Figure 13</label><caption>
<title>The performance of PREPARE on realistic simulation is comparable to polygamous Wright-Fisher simulations.</title>
<p>The simulated population grew from 160 individuals of the <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e448" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e449" xlink:type="simple"/></inline-formula> HapMap populations to 846 individuals in 200 years. This simulation accounts for IBD detection errors, asynchronous mating and dynamic population size.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g013" position="float" xlink:type="simple"/></fig></sec><sec id="s4g">
<title>3.7 Application for the HapMap MEX Population</title>
<p>We next use <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e450" xlink:type="simple"/></inline-formula> to reconstruct the historical pedigree for the HapMap MEX population. This population is of interest to us since it is known to contain several relatives, including a single 4-generation pedigree <xref ref-type="bibr" rid="pcbi.1003610-Kyriazopouloupanagiotopoulou1">[5]</xref>. Age information is not publicly available for this dataset. Instead, we use known parent-offspring relationships to separate the population into three generations. The correct pedigree is not known, so we use previous relationship inference results by Stevens et al. to validate our results <xref ref-type="bibr" rid="pcbi.1003610-Stevens1">[18]</xref>.</p>
<p>Running <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e451" xlink:type="simple"/></inline-formula> on the parent generation of HapMap phase II+III <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e452" xlink:type="simple"/></inline-formula> genotypes, we are able to detect a single sibling relationship (NA19662, NA19685), three first-cousin relationships (NA19662, NA19664), (NA19664, NA19685), (NA19657, NA19786), and two second-cousin relationships (NA19657, NA19785), (NA19785, NA19786). We are able to correctly reconstruct the pedigree found by Kyriazopoulou et al. We do this fully automatically and without using the genotypes of the two known grandparents (NA19662, NA19685), which makes the reconstruction a significantly harder task (see <xref ref-type="fig" rid="pcbi-1003610-g014">Fig. 14</xref>). Furthermore, all of the relationships inferred by <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e453" xlink:type="simple"/></inline-formula> except (NA19785, NA19786) are confirmed by Stevens et al. <xref ref-type="bibr" rid="pcbi.1003610-Stevens1">[18]</xref>. There, (NA19657, NA19786) are inferred as Third degree instead of first cousins, and (NA19657, NA19785) as Unknown degree instead of second cousins.</p>
<fig id="pcbi-1003610-g014" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003610.g014</object-id><label>Figure 14</label><caption>
<title>PREPARE successfully isolates the 4-generation pedigree found by CARROT.</title>
<p>Nodes correspond to individuals, and edges to parent-offspring relationships. The last-generation individuals are real HapMap individuals, and the other nodes are ancestors predicted by PREPARE.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003610.g014" position="float" xlink:type="simple"/></fig></sec></sec><sec id="s5">
<title>Discussion</title>
<p>In this paper, we take a step towards making pedigree reconstruction from present-day populations a realistic objective. By developing better quality assessment tools, we were able to conclude that our method reconstructs pedigrees of significantly higher quality than previous methods, in comparable running times. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e454" xlink:type="simple"/></inline-formula> is, to our knowledge, the first method to address polygamy and paternal/maternal relative partitioning. Although we succeed in partitioning the relatives, there is no way to know which relatives are related to the father and which to the mother by considering autosomal data alone. We are not concerned by this lack of specificity, as we do not strive to learn the ancestral genders. Instead, we are interested in inferring the pedigree structure, which provides the relatedness structure. Our graph framework brings to the surface several ambiguous cases that cannot be solved without utilizing additional subtle information. For example, the assignment of a 3-clique (see <xref ref-type="fig" rid="pcbi-1003610-g005">Fig. 5</xref>-B) might be decided better by considering three-way IBD sharing. The chance of triple IBD sharing diminishes much faster than the chance of pair-wise IBD sharing, which limits the theoretical possibility of correctly reconstructing these cases in advanced generations. Reconstructing inbred relationships correctly remains an unmet challenge for all current methods. It seems that an approach to deal with inbreeding will need to utilize additional imprints of inbreeding on the data, such as homozygosity levels and other IBD features not used today. Additionally, current methods do not include inbreeding options in the hypothesis testing stage, which might lead to wrong conclusions when inbreeding exists. 
Despite the above, our method is able to reconstruct high-quality pedigrees by dealing correctly with the most frequently arising cases in randomly mating populations. We believe that improving the performance on the rarer cases described above will probably have a small impact on the pedigree quality. More importantly, in order to further improve the reconstruction quality of polygamous populations, it seems that a better set of IBD features needs to be found, with higher separating power between different relationship types. Theoretically, the size of a family can influence the scores of its founders, since larger families contribute more extant individuals to the score computation. In practice, simulating populations with differing typical family sizes shows little effect on the quality of reconstruction. The current <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003610.e455" xlink:type="simple"/></inline-formula> method is applicable to real populations, with the limitation that only a specific age range can be taken as input, such that most inter-generation relationships are excluded.</p>
</sec></body>
<back>
<ack>
<p>We would like to thank Bonnie Kirkpatrick and Dan He for their aid in successfully compiling and running the <italic>CIP/COP</italic> and <italic>IPED</italic> tools. Additionally, we would like to thank Moshe Einhorn and Roni Vilenchick, who worked with us on a pedigree-reconstruction project that was the starting point for this research.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1003610-Blouin1"><label>1</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Blouin</surname><given-names>MS</given-names></name> (<year>2003</year>) <article-title>DNA-based methods for pedigree reconstruction and kinship analysis in natural populations</article-title>. <source>Trends in Ecology &amp; Evolution</source> <volume>18</volume>: <fpage>503</fpage>–<lpage>511</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Lin1"><label>2</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lin</surname><given-names>TH</given-names></name>, <name name-style="western"><surname>Myers</surname><given-names>EW</given-names></name>, <name name-style="western"><surname>Xing</surname><given-names>EP</given-names></name> (<year>2006</year>) <article-title>Interpreting anonymous DNA samples from mass disasters–probabilistic forensic inference using genetic markers</article-title>. <source>Bioinformatics (Oxford, England)</source> <volume>22</volume>: <fpage>e298</fpage>–<lpage>306</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Vouillamoz1"><label>3</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Vouillamoz</surname><given-names>JF</given-names></name>, <name name-style="western"><surname>Grando</surname><given-names>MS</given-names></name> (<year>2006</year>) <article-title>Genealogy of wine grape cultivars: “Pinot” is related to “Syrah”</article-title>. <source>Heredity</source> <volume>97</volume>: <fpage>102</fpage>–<lpage>10</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Thomas1"><label>4</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Thomas</surname><given-names>SC</given-names></name>, <name name-style="western"><surname>Hill</surname><given-names>WG</given-names></name> (<year>2000</year>) <article-title>Estimating quantitative genetic parameters using sibships reconstructed from marker data</article-title>. <source>Genetics</source> <volume>155</volume>: <fpage>1961</fpage>–<lpage>72</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Kyriazopouloupanagiotopoulou1"><label>5</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kyriazopoulou-Panagiotopoulou</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Haghighi</surname><given-names>DK</given-names></name>, <name name-style="western"><surname>Aerni</surname><given-names>SJ</given-names></name>, <name name-style="western"><surname>Sundquist</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Bercovici</surname><given-names>S</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Reconstruction of genealogical relationships with applications to Phase III of HapMap</article-title>. <source>Bioinformatics</source> <volume>27</volume>: <fpage>333</fpage>–<lpage>341</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Huff1"><label>6</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Huff</surname><given-names>CD</given-names></name>, <name name-style="western"><surname>Witherspoon</surname><given-names>DJ</given-names></name>, <name name-style="western"><surname>Simonson</surname><given-names>TS</given-names></name>, <name name-style="western"><surname>Xing</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Watkins</surname><given-names>WS</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Maximum-likelihood estimation of recent shared ancestry (ERSA)</article-title>. <source>Genome research</source> <volume>21</volume>: <fpage>768</fpage>–<lpage>74</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Thompson1"><label>7</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Thompson</surname><given-names>EA</given-names></name> (<year>1976</year>) <article-title>Inference of genealogical structure</article-title>. <source>Social Science Information</source> <volume>15</volume>: <fpage>477</fpage>–<lpage>526</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Almudevar1"><label>8</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Almudevar</surname><given-names>A</given-names></name> (<year>2003</year>) <article-title>A simulated annealing algorithm for maximum likelihood pedigree reconstruction</article-title>. <source>Theoretical Population Biology</source> <volume>63</volume>: <fpage>63</fpage>–<lpage>75</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-McPeek1"><label>9</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>McPeek</surname><given-names>MS</given-names></name>, <name name-style="western"><surname>Sun</surname><given-names>L</given-names></name> (<year>2000</year>) <article-title>Statistical tests for detection of misspecified relationships by use of genome-screen data</article-title>. <source>American journal of human genetics</source> <volume>66</volume>: <fpage>1076</fpage>–<lpage>94</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Cussens1"><label>10</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Cussens</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Bartlett</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Jones</surname><given-names>EM</given-names></name>, <name name-style="western"><surname>Sheehan</surname><given-names>NA</given-names></name> (<year>2013</year>) <article-title>Maximum likelihood pedigree reconstruction using integer linear programming</article-title>. <source>Genetic epidemiology</source> <volume>37</volume>: <fpage>69</fpage>–<lpage>83</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Kirkpatrick1"><label>11</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Kirkpatrick</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Li</surname><given-names>SC</given-names></name>, <name name-style="western"><surname>Karp</surname><given-names>RM</given-names></name>, <name name-style="western"><surname>Halperin</surname><given-names>E</given-names></name> (<year>2011</year>) <article-title>Pedigree reconstruction using identity by descent</article-title>. <source>Journal of computational biology</source> <volume>18</volume>: <fpage>1481</fpage>–<lpage>93</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Thatte1"><label>12</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Thatte</surname><given-names>BD</given-names></name>, <name name-style="western"><surname>Steel</surname><given-names>M</given-names></name> (<year>2008</year>) <article-title>Reconstructing pedigrees: a stochastic perspective</article-title>. <source>Journal of theoretical biology</source> <volume>251</volume>: <fpage>440</fpage>–<lpage>9</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Steel1"><label>13</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Steel</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Hein</surname><given-names>J</given-names></name> (<year>2006</year>) <article-title>Reconstructing pedigrees: a combinatorial perspective</article-title>. <source>Journal of theoretical biology</source> <volume>240</volume>: <fpage>360</fpage>–<lpage>7</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-He1"><label>14</label>
<mixed-citation publication-type="other" xlink:type="simple">He D, Wang Z, Han B (2013) IPED: Inheritance Path Based Pedigree Reconstruction Algorithm Using Genotype Data. RECOMB: 75–87.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Witherspoon1"><label>15</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Witherspoon</surname><given-names>DJ</given-names></name>, <name name-style="western"><surname>Huff</surname><given-names>CD</given-names></name>, <name name-style="western"><surname>Zhang</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Watkins</surname><given-names>WS</given-names></name>, <name name-style="western"><surname>Simonson</surname><given-names>TS</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>ERSA: Estimation of Recent Shared Ancestry by maximum likelihood modeling of pairwise Applications of relationship estimation</article-title>. <source>Genome research</source> <volume>21</volume>: <fpage>768</fpage>–<lpage>74</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Hstad1"><label>16</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Håstad</surname><given-names>J</given-names></name> (<year>1996</year>) <article-title>Clique is hard to approximate within n<sup>1-ε</sup></article-title>. <fpage>627</fpage>–<lpage>636</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Gusev1"><label>17</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gusev</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Lowe</surname><given-names>JK</given-names></name>, <name name-style="western"><surname>Stoffel</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Daly</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Altshuler</surname><given-names>D</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>Whole Population, Genomewide Mapping of Hidden Relatedness</article-title>. <source>Genome research</source> <volume>19</volume>: <fpage>1</fpage>–<lpage>39</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003610-Stevens1"><label>18</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Stevens</surname><given-names>EL</given-names></name>, <name name-style="western"><surname>Heckenberg</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Roberson</surname><given-names>EDO</given-names></name>, <name name-style="western"><surname>Baugher</surname><given-names>JD</given-names></name>, <name name-style="western"><surname>Downey</surname><given-names>TJ</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Inference of relationships in population data using identity-by-descent and identity-by-state</article-title>. <source>PLoS genetics</source> <volume>7</volume>: <fpage>e1002287</fpage>.</mixed-citation>
</ref>
</ref-list></back>
</article>