<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id><journal-title-group>
<journal-title>PLoS Computational Biology</journal-title></journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-14-00732</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1003880</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biology and life sciences</subject><subj-group><subject>Genetics</subject><subj-group><subject>Heredity</subject><subj-group><subject>Genetic linkage</subject><subj-group><subject>Sex linkage</subject><subj-group><subject>X-linked traits</subject></subj-group></subj-group><subj-group><subject>Autosomal linkage</subject></subj-group></subj-group></subj-group><subj-group><subject>Mutation</subject><subj-group><subject>Point mutation</subject></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Research and analysis methods</subject><subj-group><subject>Mathematical and statistical techniques</subject><subj-group><subject>Bayesian method</subject><subject>Probability estimation</subject></subj-group></subj-group></subj-group></article-categories>
<title-group>
<article-title>FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units</article-title>
<alt-title alt-title-type="running-head">FamSeq</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Peng</surname><given-names>Gang</given-names></name><xref ref-type="aff" rid="aff1"/></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Fan</surname><given-names>Yu</given-names></name><xref ref-type="aff" rid="aff1"/></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wang</surname><given-names>Wenyi</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><addr-line>Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America</addr-line></aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Gardner</surname><given-names>Paul P.</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/></contrib>
</contrib-group>
<aff id="edit1"><addr-line>University of Canterbury, New Zealand</addr-line></aff>
<author-notes>
<corresp id="cor1">* E-mail: <email xlink:type="simple">wwang7@mdanderson.org</email></corresp>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
<fn fn-type="con"><p>Conceived and designed the experiments: WW. Performed the experiments: GP YF WW. Analyzed the data: GP WW. Wrote the paper: GP YF WW.</p></fn>
</author-notes>
<pub-date pub-type="collection"><month>10</month><year>2014</year></pub-date>
<pub-date pub-type="epub"><day>30</day><month>10</month><year>2014</year></pub-date>
<volume>10</volume>
<issue>10</issue>
<elocation-id>e1003880</elocation-id>
<history>
<date date-type="received"><day>25</day><month>4</month><year>2014</year></date>
<date date-type="accepted"><day>20</day><month>8</month><year>2014</year></date>
</history>
<permissions>
<copyright-year>2014</copyright-year>
<copyright-holder>Peng et al</copyright-holder><license xlink:type="simple"><license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p></license></permissions>
<abstract>
<p>Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.</p>
</abstract>
<funding-group><funding-statement>WW and GP are supported in part by the Cancer Prevention Research Institute of Texas through grant number RP130090. WW is supported in part by the National Cancer Institute through grant numbers 1R01CA174206-01, 5U24 CA143883-05 and P30 CA016672. YF is supported in part by the National Cancer Institute through grant number 5U24 CA143883-05, and by a training fellowship from the Keck Center Computational Cancer Biology Training Program of the Gulf Coast Consortia (CPRIT Grant No. RP140113). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement></funding-group><counts><page-count count="6"/></counts><custom-meta-group><custom-meta id="data-availability" xlink:type="simple"><meta-name>Data Availability</meta-name><meta-value>The authors confirm that all data underlying the findings are fully available without restriction. Software is available from: <ext-link ext-link-type="uri" xlink:href="http://bioinformatics.mdanderson.org/main/FamSeq/" xlink:type="simple">http://bioinformatics.mdanderson.org/main/FamSeq/</ext-link> and <ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/famseq/" xlink:type="simple">http://sourceforge.net/projects/famseq/</ext-link></meta-value></custom-meta></custom-meta-group></article-meta>
</front>
<body><sec id="s1">
<title/>
<disp-quote>
<p>This is a <italic>PLOS Computational Biology</italic> Software Article</p>
</disp-quote></sec><sec id="s2">
<title>Introduction</title>
<p>Next-generation sequencing technologies have been employed routinely in detecting DNA variants and unveiling the cause of genetic diseases <xref ref-type="bibr" rid="pcbi.1003880-VanTassell1">[1]</xref>. The broad application of next-generation sequencing technologies has led to an accompanying rapid development in variant calling algorithms and related software <xref ref-type="bibr" rid="pcbi.1003880-Li1">[2]</xref>–<xref ref-type="bibr" rid="pcbi.1003880-Li2">[5]</xref>. However, the variant calling error rate remains relatively high for rare variants <xref ref-type="bibr" rid="pcbi.1003880-Li3">[6]</xref>, even though many new methods have been employed to improve variant calling, such as calling multiple samples together and borrowing information from the dbSNP database <xref ref-type="bibr" rid="pcbi.1003880-Nielsen1">[7]</xref>.</p>
<p>Roach et al. suggested using pedigree information to reduce the false positive rate of variant calling by removing all variants that do not conform to Mendelian transmission <xref ref-type="bibr" rid="pcbi.1003880-Roach1">[8]</xref>. However, this method cannot control the false negative rate and cannot find any de novo mutations. Pedigree information has also been used to improve the accuracy for haplotype phasing in small families <xref ref-type="bibr" rid="pcbi.1003880-Zhou1">[9]</xref>, <xref ref-type="bibr" rid="pcbi.1003880-Roach2">[10]</xref>. Recent studies have shown that incorporating pedigree information into the variant calling reduces both false positive and false negative rates for family trios and extended families <xref ref-type="bibr" rid="pcbi.1003880-Li4">[11]</xref>–<xref ref-type="bibr" rid="pcbi.1003880-Ramu1">[14]</xref>. Peng et al. showed that in some HapMap families, incorporating pedigree information can reduce the false positive rates by 14–33% <xref ref-type="bibr" rid="pcbi.1003880-Peng1">[12]</xref>.</p>
<p>Several software packages have been implemented to incorporate pedigree information for variant calling. SAMtools <xref ref-type="bibr" rid="pcbi.1003880-Li5">[13]</xref> and DeNovoGear <xref ref-type="bibr" rid="pcbi.1003880-Ramu1">[14]</xref> can process family trios together. The Elston-Stewart algorithm was used in PolyMutt <xref ref-type="bibr" rid="pcbi.1003880-Li4">[11]</xref> to incorporate extended families. However, the Elston-Stewart algorithm requires either loop-cutting techniques, which will substantially increase the computing time and give approximate answers that are not always close to the exact results <xref ref-type="bibr" rid="pcbi.1003880-Stricker1">[15]</xref>, or the use of the method proposed by Cannings et al. <xref ref-type="bibr" rid="pcbi.1003880-Cannings1">[16]</xref> that is hard to implement and has large memory requirements. Peng et al. proposed additional computational solutions for implementing the Mendelian genetic model in sequence variant calling <xref ref-type="bibr" rid="pcbi.1003880-Peng1">[12]</xref>. The Bayesian network algorithm, in particular, provides exact results for a family pedigree with inbreeding loops. In order to allow for uncertainty in the minor allele frequency estimation, we also implemented a Markov chain Monte Carlo algorithm <xref ref-type="bibr" rid="pcbi.1003880-Biswas1">[17]</xref> to perform the family-based variant calling. To incorporate pedigree information into variant calling, we provide a program, FamSeq, that allows users to choose among the four following approaches, the Elston-Stewart algorithm, the Bayesian network algorithm, the graphics processing unit (GPU) version of the Bayesian network algorithm and the Markov chain Monte Carlo algorithm. FamSeq further improves the computational efficiency by using the GPU.</p>
<p>Whole genome sequencing covers billions of loci, of which millions are candidate variant positions, so computing time is a major constraint. We therefore parallelized the Bayesian network algorithm to make its computing time feasible for large sets of whole genome sequencing data. GPUs were originally designed to accelerate graphics processing. As they have become more programmable and more powerful at parallel computing, they have been widely adopted for general-purpose applications, including bioinformatics <xref ref-type="bibr" rid="pcbi.1003880-Buckner1">[18]</xref>–<xref ref-type="bibr" rid="pcbi.1003880-Zandevakili1">[20]</xref>. The Bayesian network algorithm consists of many homogeneous tasks that are well suited to GPU parallel computing. We therefore implemented a parallel version of the Bayesian network algorithm using the CUDA parallel computing platform on an NVIDIA GPU, which substantially reduced the running time of that algorithm.</p>
</sec><sec id="s3" sec-type="methods">
<title>Design and Implementation</title>
<sec id="s3a">
<title>Design Overview</title>
<p>FamSeq is a software package that calls variants from family-based sequencing data. It provides four alternative implementations of Mendelian transmission, described under Method Implementation.</p>
<p>As outlined in the workflow of FamSeq (<xref ref-type="fig" rid="pcbi-1003880-g001">Fig. 1</xref>), two files are required as data input: a pedigree structure file and a file containing the genotype likelihood <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e001" xlink:type="simple"/></inline-formula>, where <italic>D</italic> denotes the raw sequencing measurements, i.e., read counts, read quality and mapping quality, and <italic>G</italic> denotes the genotype of the individual. The pedigree file stores the individual identification (ID), parents' IDs, and gender and sample name, as is used to denote samples in the likelihood data file (<xref ref-type="fig" rid="pcbi-1003880-g002">Fig. 2</xref>). FamSeq accepts likelihood data files in two formats: a variant call format (VCF) <xref ref-type="bibr" rid="pcbi.1003880-Danecek1">[21]</xref> and a likelihood-only format (see description in our software manual). We introduced the likelihood-only format to allow for data generated from other sequencing platforms, with the requirement that the likelihood for each genotype is available.</p>
<fig id="pcbi-1003880-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003880.g001</object-id><label>Figure 1</label><caption>
<title>Workflow of FamSeq.</title>
<p>We use a pedigree file and a file that includes the likelihood (<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e002" xlink:type="simple"/></inline-formula>) as the input to estimate the posterior probability (<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e003" xlink:type="simple"/></inline-formula>) for each variant genotype. (E-S: Elston-Stewart algorithm; BN: Bayesian network method; BN-GPU: The computer needs a GPU card installed to run the GPU version of the Bayesian network method; MCMC: Markov chain Monte Carlo method; VCF: variant call format.)</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003880.g001" position="float" xlink:type="simple"/></fig><fig id="pcbi-1003880-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003880.g002</object-id><label>Figure 2</label><caption>
<title>Illustration of input files.</title>
<p>A.) Pedigree structure. B.) Pedigree structure file encoding the pedigree shown in <xref ref-type="fig" rid="pcbi-1003880-g002">Fig. 2A</xref>. From left to right, the columns are ID, mID (mother ID), fID (father ID), gender and sample name. C.) Part of a VCF file. The VCF file shows that the genome of the grandfather (G-Father) was not sequenced. We nevertheless include him in the pedigree structure file to avoid ambiguity: if only one parent of two siblings were listed, it would be unclear whether they are full or half siblings. The sample name in the pedigree structure file must match the sample name in the VCF file. When an individual's genome was not sequenced, the corresponding sample name is set to NA in the pedigree structure file.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003880.g002" position="float" xlink:type="simple"/></fig>
<p>FamSeq takes as input the two data files and the parameter settings (the allele frequency and de novo mutation rate are described below). A data preprocessing step checks both input files for errors. FamSeq then runs the method chosen by the user to call the variants.</p>
<p>FamSeq creates a new file as the output file, which follows the format of the input file but adds additional columns, with results on the posterior probability and the genotype, calculated using both the individual-based method and the family-based method.</p>
</sec><sec id="s3b">
<title>Data Preprocessing</title>
<p>FamSeq first checks the pedigree file. FamSeq requires the input pedigree to be complete, which means that everyone listed in the pedigree should have both a father and a mother represented in the pedigree file, with the exception of the founders of the family (<xref ref-type="fig" rid="pcbi-1003880-g002">Fig. 2</xref>). Otherwise, if two siblings have only one parent's information in the pedigree, FamSeq cannot determine whether they are full siblings or half siblings. FamSeq also checks for any inconsistency in the pedigree file, such as the father being erroneously listed as female. FamSeq extracts likelihood information from the Phred-scaled likelihood (PL) section in a VCF file or directly from a likelihood-only file.</p>
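<p>As a minimal sketch of the likelihood extraction step, the snippet below converts one Phred-scaled likelihood (PL) triplet into genotype likelihoods using the standard Phred relation (likelihood = 10 to the power of -PL/10). The function name pl_to_likelihoods and the normalization to a sum of one are illustrative assumptions and are not taken from the FamSeq source code.</p>
<preformat><![CDATA[
```python
def pl_to_likelihoods(pl_values):
    """Convert Phred-scaled likelihoods (PL) to normalized genotype likelihoods.

    pl_values is ordered as (RR, RA, AA), the bi-allelic order used in VCF.
    The normalization is an illustrative choice, not FamSeq's documented behavior.
    """
    raw = [10.0 ** (-pl / 10.0) for pl in pl_values]
    total = sum(raw)
    return [value / total for value in raw]


# Example PL triplet: the homozygous reference genotype is the best supported.
likelihoods = pl_to_likelihoods([0, 30, 60])
```
]]></preformat>
<p>Because only likelihood ratios matter when the posterior probability is normalized, rescaling the three values does not change the resulting genotype call.</p>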
</sec><sec id="s3c">
<title>Input of Allele and Genotype Frequency in the Population</title>
<p>We require <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e004" xlink:type="simple"/></inline-formula>, which is the probability of the genotype in a population. In FamSeq, we consider a bi-allelic model with reference (R) and alternative (A) alleles. Consequently, there are three possible genotypes in a diploid genome: RR, RA and AA. Without compromising the detection of true variants, we set the default frequencies of the three genotypes in the population at 0.9985, 0.001 and 0.0005, respectively, if the variant is not represented in dbSNP. The dbSNP information should be provided in the input VCF file. For variants represented in dbSNP, the default values are 0.45, 0.1 and 0.45. Users can choose to set other values. When only the allele frequency is known, users can set genotype frequencies based on the Hardy-Weinberg equilibrium <xref ref-type="bibr" rid="pcbi.1003880-Hardy1">[22]</xref>. Based on findings from Peng et al., changes in the values of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e005" xlink:type="simple"/></inline-formula> can affect the variant calling results of the founders, while their influence on the offspring in the family is small <xref ref-type="bibr" rid="pcbi.1003880-Peng1">[12]</xref>.</p>
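<p>When only the alternative allele frequency is known, the Hardy-Weinberg genotype frequencies can be computed as in the short sketch below. The function name hardy_weinberg and the example frequency of 0.01 are hypothetical and serve only as an illustration.</p>
<preformat><![CDATA[
```python
def hardy_weinberg(alt_allele_freq):
    """Return (RR, RA, AA) genotype frequencies under Hardy-Weinberg equilibrium."""
    if not 0.0 <= alt_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = alt_allele_freq   # frequency of the alternative allele A
    p = 1.0 - q           # frequency of the reference allele R
    return (p * p, 2.0 * p * q, q * q)


# Illustrative alternative allele frequency of 1%.
rr, ra, aa = hardy_weinberg(0.01)
```
]]></preformat>
<p>For an alternative allele frequency of 0.01, this gives genotype frequencies of 0.9801, 0.0198 and 0.0001 for RR, RA and AA.</p>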
</sec><sec id="s3d">
<title>Rate of De Novo Mutation</title>
<p>FamSeq takes the de novo mutation rate as an input: each parental allele is assigned a probability <italic>m</italic> of mutating into the other allele in the germline <xref ref-type="bibr" rid="pcbi.1003880-Peng1">[12]</xref>. In other words, even when both parents have homozygous reference genotypes, there is a nonzero probability that their child carries an alternative allele. The de novo mutation rate enters the calculation of the transmission probabilities (described under Method Implementation).</p>
<p>The de novo mutation rate has been estimated to be around 1e−8 per base per generation <xref ref-type="bibr" rid="pcbi.1003880-Conrad1">[23]</xref>. When we analyzed real data, we found that the rates of false positives and false negatives were better controlled when the de novo mutation rate was set at 1e−7. Thus, we set 1e−7 as the default de novo mutation rate in FamSeq. Users can set the rate according to their requirements. In general, when the de novo mutation rate is set to a large value, the pedigree has less influence on variant calling and more de novo mutations can be identified.</p>
<p>Even though we allow for de novo mutations in our model, we still may over-correct the variant calling at some loci by following Mendelian inheritance principles when there are true de novo mutations. Therefore, we provide the following option to alleviate the over-correction: when the likelihood ratio for all individuals in the pedigree is larger than a user-specified cutoff and the genotypes do not follow Mendel's law, FamSeq will call variants using the individual-based method instead of the family-based method.</p>
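<p>One way to encode a transmission probability that includes the de novo mutation rate is sketched below. Genotypes are coded as the number of alternative alleles (0 for RR, 1 for RA, 2 for AA). The sketch assumes that each parent passes on one of its two alleles with equal probability and that the transmitted allele then mutates into the other allele with probability m. The function names are hypothetical, and whether FamSeq uses exactly this parameterization is an assumption.</p>
<preformat><![CDATA[
```python
def allele_transmit(parent_genotype, m):
    """Probability that a parent transmits allele A, allowing mutation at rate m.

    parent_genotype counts alternative alleles: 0 (RR), 1 (RA) or 2 (AA).
    """
    half = parent_genotype / 2.0            # chance of picking A before mutation
    return half * (1.0 - m) + (1.0 - half) * m


def transmission_probabilities(father, mother, m=1e-7):
    """Return (P(RR), P(RA), P(AA)) for a child given both parental genotypes."""
    tf = allele_transmit(father, m)
    tm = allele_transmit(mother, m)
    return ((1.0 - tf) * (1.0 - tm),
            tf * (1.0 - tm) + (1.0 - tf) * tm,
            tf * tm)


# Two homozygous reference parents can still have a child carrying A.
probs = transmission_probabilities(0, 0, m=1e-7)
```
]]></preformat>
<p>With m set to zero the function reduces to plain Mendelian segregation, for example probabilities of 0.25, 0.5 and 0.25 for two heterozygous parents.</p>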
</sec><sec id="s3e">
<title>Method Implementation</title>
<sec id="s3e1">
<title>Markov chain Monte Carlo (MCMC) algorithm</title>
<p>We use the Gibbs sampler to derive the posterior probabilities for each genotype <xref ref-type="bibr" rid="pcbi.1003880-Biswas1">[17]</xref>, <xref ref-type="bibr" rid="pcbi.1003880-Lin1">[24]</xref>. During Gibbs sampling, the genotype of each individual in the family is updated, one at a time, conditional on all other family members' genotypes, the family configuration and the raw sequencing measurements. According to Mendelian segregation principles, given the genotypes of the individual's parents, spouse and children, the genotype of the individual does not depend on those of the remaining family members. We can write the full conditionals as follows:<disp-formula id="pcbi.1003880.e006"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003880.e006" xlink:type="simple"/><label>(1)</label></disp-formula>where <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e007" xlink:type="simple"/></inline-formula> denotes the genotype for individual <italic>i</italic>, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e008" xlink:type="simple"/></inline-formula> denotes the genotype for all family members, except individual <italic>i</italic>, <bold>D</bold> denotes the raw sequencing measurements, and <bold>P</bold> denotes the pedigree configuration. <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e009" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e010" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e011" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e012" xlink:type="simple"/></inline-formula> indicate the genotypes of individual <italic>i</italic>'s father, mother, child and spouse, respectively. 
<inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e013" xlink:type="simple"/></inline-formula> is the transmission probability, which shows how the parents' genotypes influence the child's genotype.</p>
<p>To avoid a local maximization problem in the Gibbs sampler, we also implemented a heated-Metropolis algorithm in MCMC, as proposed by Lin et al. <xref ref-type="bibr" rid="pcbi.1003880-Lin1">[24]</xref>. In the heated-Metropolis algorithm, <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e014" xlink:type="simple"/></inline-formula> is sampled from a distribution of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e015" xlink:type="simple"/></inline-formula> instead of <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e016" xlink:type="simple"/></inline-formula>. The sampled <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e017" xlink:type="simple"/></inline-formula> is accepted with the probability <inline-formula><inline-graphic xlink:href="info:doi/10.1371/journal.pcbi.1003880.e018" xlink:type="simple"/></inline-formula>.</p>
<p>The accuracy of the MCMC algorithm depends on the number of iterations. As is shown in Biswas et al. <xref ref-type="bibr" rid="pcbi.1003880-Biswas1">[17]</xref>, the MCMC approach requires tens of thousands of iterations to converge for a large pedigree; therefore, the computing time will also increase. By default, we set the number of iterations at 20,000<italic>n</italic>, where <italic>n</italic> is the pedigree size. Users can specify the number of iterations according to their needs.</p>
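<p>The Gibbs update described above can be illustrated on a single trio. The sketch below is not the FamSeq implementation: the function names, the burn-in length, the fixed random seed and the example likelihoods are all assumptions, and the heated-Metropolis step is omitted. It uses the default of 20,000<italic>n</italic> iterations for a pedigree of size <italic>n</italic> = 3.</p>
<preformat><![CDATA[
```python
import random


def allele_transmit(g, m):
    """Probability that a parent with genotype g (0, 1 or 2) transmits allele A."""
    half = g / 2.0
    return half * (1.0 - m) + (1.0 - half) * m


def transmission(child, father, mother, m):
    """P(child genotype | parental genotypes) with mutation rate m (assumed model)."""
    tf, tm = allele_transmit(father, m), allele_transmit(mother, m)
    return ((1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm)[child]


def gibbs_trio(lik, prior, m=1e-7, iterations=60000, burn_in=1000, seed=0):
    """Estimate marginal genotype posteriors for (father, mother, child)."""
    rng = random.Random(seed)
    g = [0, 0, 0]                                  # current genotypes
    counts = [[0, 0, 0] for _ in range(3)]
    for it in range(iterations):
        for i in range(3):                         # update one member at a time
            weights = []
            for cand in range(3):
                trial = list(g)
                trial[i] = cand
                w = lik[i][cand] * transmission(trial[2], trial[0], trial[1], m)
                if i < 2:                          # founders also carry the prior
                    w *= prior[cand]
                weights.append(w)
            g[i] = rng.choices(range(3), weights=weights)[0]
        if it >= burn_in:
            for i in range(3):
                counts[i][g[i]] += 1
    kept = iterations - burn_in
    return [[c / kept for c in row] for row in counts]


# Illustrative likelihoods P(D | G): both parents look RR, the child looks RA.
lik = [[1.0, 1e-3, 1e-6], [1.0, 1e-3, 1e-6], [0.4, 0.6, 1e-6]]
posteriors = gibbs_trio(lik, prior=(0.9985, 0.001, 0.0005))
```
]]></preformat>
<p>With a small mutation rate the chain moves between genotype configurations only rarely, which is one reason a heated variant of the sampler can be useful.</p>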
</sec><sec id="s3e2">
<title>Elston-Stewart algorithm</title>
<p>This algorithm splits the whole pedigree into anterior and posterior parts relative to the individual of interest <xref ref-type="bibr" rid="pcbi.1003880-Elston1">[25]</xref>. The anterior part relates to the parents of the individual, and the posterior part relates to the child/children of the individual. The probabilities of the anterior and posterior parts can be computed recursively, and the posterior genotype probability is then calculated from these two probabilities. The Elston-Stewart algorithm becomes considerably more complex when there are inbreeding loops in the pedigree, because the pedigree can then no longer be split directly into anterior and posterior parts. There are two ways to address this problem. First, the loops can be cut according to complex criteria, which yields an approximate result <xref ref-type="bibr" rid="pcbi.1003880-Stricker1">[15]</xref>, <xref ref-type="bibr" rid="pcbi.1003880-Totir1">[26]</xref>. Second, Cannings et al. proposed a method that yields the analytical result <xref ref-type="bibr" rid="pcbi.1003880-Cannings1">[16]</xref>; however, their method has large memory requirements.</p>
</sec><sec id="s3e3">
<title>Bayesian network algorithm</title>
<p>When the entire pedigree is treated as a directed acyclic graph (a Bayesian network), the genotype of individual <italic>i</italic> depends directly only on the genotypes of his/her parents <xref ref-type="bibr" rid="pcbi.1003880-Fishelson1">[27]</xref>. We can write the posterior probability as<disp-formula id="pcbi.1003880.e019"><graphic position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1003880.e019" xlink:type="simple"/><label>(2)</label></disp-formula>The Bayesian network approach directly calculates the joint probabilities for all combinations of genotypes across the whole family, which allows exact calculations for pedigrees with inbreeding loops. The approach is straightforward and easy to implement; however, the computing time increases exponentially with the pedigree size, so a supplementary approach is needed for larger pedigrees.</p>
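<p>The exhaustive calculation over all joint genotype configurations can be sketched as follows. This is an illustration under stated assumptions rather than the FamSeq code: the function names, the pedigree encoding (a list in which founders are marked with None and other members with the list positions of their father and mother) and the example likelihoods are hypothetical, and the transmission model is the simple per-allele mutation model assumed here for illustration.</p>
<preformat><![CDATA[
```python
from itertools import product


def allele_transmit(g, m):
    """Probability that a parent with genotype g (0, 1 or 2) transmits allele A."""
    half = g / 2.0
    return half * (1.0 - m) + (1.0 - half) * m


def transmission(child, father, mother, m):
    """P(child genotype | parental genotypes) with mutation rate m (assumed model)."""
    tf, tm = allele_transmit(father, m), allele_transmit(mother, m)
    return ((1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm)[child]


def exact_posteriors(parents, likelihoods, prior, m=1e-7):
    """Marginal genotype posteriors by enumerating all 3**n configurations.

    parents[i] is None for a founder, else (father_index, mother_index).
    """
    n = len(parents)
    marginals = [[0.0, 0.0, 0.0] for _ in range(n)]
    for config in product(range(3), repeat=n):
        joint = 1.0
        for i, g in enumerate(config):
            joint *= likelihoods[i][g]              # P(D_i | G_i)
            if parents[i] is None:
                joint *= prior[g]                   # founder genotype frequency
            else:
                f, mo = parents[i]
                joint *= transmission(g, config[f], config[mo], m)
        for i, g in enumerate(config):
            marginals[i][g] += joint
    total = sum(marginals[0])                       # sum over all configurations
    return [[value / total for value in row] for row in marginals]


# Trio: father (0) and mother (1) are founders, the child (2) has both parents.
parents = [None, None, (0, 1)]
lik = [[1.0, 1e-3, 1e-6], [1.0, 1e-3, 1e-6], [0.4, 0.6, 1e-6]]
posteriors = exact_posteriors(parents, lik, prior=(0.9985, 0.001, 0.0005))
```
]]></preformat>
<p>The outer loop visits 3<sup>n</sup> configurations, which is the source of the exponential growth in computing time. An individual whose genome was not sequenced can be represented by equal likelihoods for the three genotypes.</p>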
</sec><sec id="s3e4">
<title>Bayesian network parallelization</title>
<p>For variant calling using whole genome sequencing data, there are billions of loci. After filtering by FamSeq, millions of candidate variant positions still remain; we therefore parallelized the Bayesian network algorithm to reduce the computing time and make this approach feasible for DNA sequencing data analysis. In the Bayesian network method, we need to calculate the posterior probability for each of the 3<sup>n</sup> possible joint genotype configurations of the family, where n is the pedigree size. This amounts to a large volume of homogeneous computing tasks that are well suited to parallel computing on GPUs.</p>
<p>Compared to central processing units (CPUs), GPUs have several advantages in parallel computing. A GPU usually has hundreds or thousands of cores, whereas a CPU has only a few. Although each GPU core (about 1 GHz) is slower than a CPU core (about 3 GHz), the aggregate throughput of a GPU on highly parallel workloads is higher than that of a CPU. For a large number of homogeneous computing tasks, we can assign one task to each GPU core and run the tasks in parallel.</p>
<p>In FamSeq, we use CUDA (version 5.0 or later) to parallelize the Bayesian network algorithm on a GPU. CUDA is a parallel computing platform and programming model developed by NVIDIA. It runs on many CUDA-enabled GPUs (<ext-link ext-link-type="uri" xlink:href="https://developer.nvidia.com/cuda-gpus" xlink:type="simple">https://developer.nvidia.com/cuda-gpus</ext-link>). CUDA provides application programming interfaces that can be easily called from C++ code. A brief illustration of GPU programming in FamSeq is shown in <xref ref-type="fig" rid="pcbi-1003880-g003">Fig. 3</xref>. For more details on GPU programming in C/C++, see the NVIDIA CUDA C Programming Guide (<ext-link ext-link-type="uri" xlink:href="http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html" xlink:type="simple">http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html</ext-link>).</p>
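<p>To make the notion of homogeneous tasks concrete, the sketch below emulates in plain Python how a job index between 0 and 3<sup>n</sup> - 1 could be decoded into one joint genotype configuration, so that each parallel worker handles exactly one configuration. The base-3 decoding and the function name decode_configuration are assumptions for illustration; the mapping used by the actual FamSeq GPU kernel is not described here.</p>
<preformat><![CDATA[
```python
def decode_configuration(job_index, n):
    """Decode a job index into n genotypes (0=RR, 1=RA, 2=AA) by base-3 digits.

    On a GPU, job_index would typically be derived from the thread and block
    indices; here it is an ordinary integer.
    """
    config = []
    for _ in range(n):
        job_index, genotype = divmod(job_index, 3)
        config.append(genotype)
    return config


# Emulate the 3**n independent jobs for a pedigree of size n = 3.
n = 3
all_configs = [decode_configuration(job, n) for job in range(3 ** n)]
```
]]></preformat>
<p>Every job performs the same arithmetic on a different configuration, so no job depends on the result of another and the per-configuration probabilities only need to be combined at the end.</p>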
<fig id="pcbi-1003880-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003880.g003</object-id><label>Figure 3</label><caption>
<title>Illustration of GPU parallel computing in FamSeq.</title>
<p>The program is divided into a serial part, which runs on the CPU, and a parallel part, which runs on the GPU. The program proceeds as follows: 1. Prepare the data for parallel computing on the CPU; 2. Copy the data from CPU memory to GPU memory; 3. Run the 3<sup>n</sup> jobs in parallel on the GPU, where n is the pedigree size; 4. Copy the results from GPU memory to CPU memory; and 5. Summarize the results on the CPU.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003880.g003" position="float" xlink:type="simple"/></fig></sec></sec></sec><sec id="s4">
<title>Results</title>
<p>We compared the computing time of the four methods using real sequencing data with one million (1M) variants and pedigree sizes that varied from 7 to 12. A position with no alternative allele is one at which all individuals in the family have a homozygous reference genotype. If such positions are present in the input VCF file, FamSeq skips them and runs joint calling only on the remaining potential variant positions. To estimate the actual computing time of FamSeq, we therefore prepared a VCF file with 1M candidate variant positions as the input file. We tested the non-GPU versions on a Linux server with 3.07 GHz Intel Xeon CPUs, using only a single core of one CPU. The GPU version was run on an NVIDIA Tesla M2090 GPU (512 cores, 1.3 GHz) on a Linux server at the Texas Advanced Computing Center (TACC). We used only one GPU during the comparison.</p>
<p><xref ref-type="table" rid="pcbi-1003880-t001">Table 1</xref> shows the computing time of FamSeq on the CPU versus the GPU. The Elston-Stewart algorithm was the fastest of the four methods and was the best choice when there were no inbreeding loops in the pedigree. When inbreeding loops are present, loop-cutting techniques must be applied before the probabilities of the anterior and posterior parts can be calculated, which makes the algorithm more complex, increases the computing time, and yields only approximate results. A major advantage of the Elston-Stewart algorithm is that its computing time increases almost linearly with the pedigree size. When the pedigree is large (more than 12 individuals), the Elston-Stewart algorithm is almost the only computationally feasible method, especially for analyzing whole genome sequencing data.</p>
<table-wrap id="pcbi-1003880-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003880.t001</object-id><label>Table 1</label><caption>
<title>The total time (in seconds) needed for computation using FamSeq at one million positions.</title>
</caption><alternatives><graphic id="pcbi-1003880-t001-1" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003880.t001" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Method</td>
<td align="left" rowspan="1" colspan="1">Loops</td>
<td align="left" rowspan="1" colspan="1">PU</td>
<td colspan="6" align="left" rowspan="1">Pedigree Size</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">12</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">E-S</td>
<td align="left" rowspan="1" colspan="1">N</td>
<td align="left" rowspan="1" colspan="1">CPU</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">15</td>
<td align="left" rowspan="1" colspan="1">16</td>
<td align="left" rowspan="1" colspan="1">22</td>
<td align="left" rowspan="1" colspan="1">34</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">MCMC<xref ref-type="table-fn" rid="nt102">a</xref></td>
<td align="left" rowspan="1" colspan="1">N</td>
<td align="left" rowspan="1" colspan="1">CPU</td>
<td align="left" rowspan="1" colspan="1">100,920</td>
<td align="left" rowspan="1" colspan="1">129,030</td>
<td align="left" rowspan="1" colspan="1">160,170</td>
<td align="left" rowspan="1" colspan="1">177,740</td>
<td align="left" rowspan="1" colspan="1">240,650</td>
<td align="left" rowspan="1" colspan="1">296,600</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">Y</td>
<td align="left" rowspan="1" colspan="1">CPU</td>
<td align="left" rowspan="1" colspan="1">117,460</td>
<td align="left" rowspan="1" colspan="1">233,490</td>
<td align="left" rowspan="1" colspan="1">289,720</td>
<td align="left" rowspan="1" colspan="1">362,630</td>
<td align="left" rowspan="1" colspan="1">432,760</td>
<td align="left" rowspan="1" colspan="1">496,750</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">BN</td>
<td align="left" rowspan="1" colspan="1">N</td>
<td align="left" rowspan="1" colspan="1">CPU</td>
<td align="left" rowspan="1" colspan="1">242</td>
<td align="left" rowspan="1" colspan="1">605</td>
<td align="left" rowspan="1" colspan="1">2,003</td>
<td align="left" rowspan="1" colspan="1">6,483</td>
<td align="left" rowspan="1" colspan="1">23,404</td>
<td align="left" rowspan="1" colspan="1">73,485</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">N</td>
<td align="left" rowspan="1" colspan="1">GPU<xref ref-type="table-fn" rid="nt103">b</xref></td>
<td align="left" rowspan="1" colspan="1">2,472 (150)</td>
<td align="left" rowspan="1" colspan="1">2,907 (169)</td>
<td align="left" rowspan="1" colspan="1">3,312 (239)</td>
<td align="left" rowspan="1" colspan="1">3,856 (397)</td>
<td align="left" rowspan="1" colspan="1">4,519 (946)</td>
<td align="left" rowspan="1" colspan="1">6,452 (2,717)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">Y</td>
<td align="left" rowspan="1" colspan="1">CPU</td>
<td align="left" rowspan="1" colspan="1">250</td>
<td align="left" rowspan="1" colspan="1">902</td>
<td align="left" rowspan="1" colspan="1">2,013</td>
<td align="left" rowspan="1" colspan="1">6,731</td>
<td align="left" rowspan="1" colspan="1">22,078</td>
<td align="left" rowspan="1" colspan="1">70,417</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"/>
<td align="left" rowspan="1" colspan="1">Y</td>
<td align="left" rowspan="1" colspan="1">GPU<xref ref-type="table-fn" rid="nt103">b</xref></td>
<td align="left" rowspan="1" colspan="1">2,548 (150)</td>
<td align="left" rowspan="1" colspan="1">2,986 (170)</td>
<td align="left" rowspan="1" colspan="1">3,123 (239)</td>
<td align="left" rowspan="1" colspan="1">3,602 (399)</td>
<td align="left" rowspan="1" colspan="1">4,396 (954)</td>
<td align="left" rowspan="1" colspan="1">6,605 (2,726)</td>
</tr>
</tbody>
</table>
</alternatives><table-wrap-foot><fn id="nt101"><label/><p>PU: processing unit; E-S: Elston-Stewart algorithm; MCMC: Markov chain Monte Carlo algorithm; BN: Bayesian network algorithm; N: No, inbreeding loops are not considered; Y: Yes, inbreeding loops are considered.</p></fn><fn id="nt102"><label>a</label><p>We called only 100,000 variants due to excessive running time for the MCMC algorithm. The time shown here is 10× the time required to call 100,000 variants.</p></fn><fn id="nt103"><label>b</label><p>The time in parentheses is the GPU computing time.</p></fn></table-wrap-foot></table-wrap>
<p>Although the computing time for the Bayesian network algorithm increases exponentially with family size, variant calling with this method can be completed in several hours for a pedigree with fewer than 10 individuals, based on whole genome sequencing data with about 20 million candidate variant positions. When the pedigree is small, the difference in computing time between the Bayesian network algorithm and the Elston-Stewart algorithm is small. An advantage of the Bayesian network algorithm is that it can directly calculate posterior probabilities in pedigrees that have inbreeding loops. <xref ref-type="table" rid="pcbi-1003880-t001">Table 1</xref> shows that the computing time for the Bayesian network algorithm is largely unaffected by whether the pedigree has inbreeding loops.</p>
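<p>The whole-genome estimate can be reproduced from the CPU timings in Table 1 with a short Python sketch. It assumes that running time scales linearly with the number of candidate positions, so that 20 million positions take 20 times as long as 1M; the variable and function names are illustrative only.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Bayesian network, CPU, no inbreeding loops: seconds per 1M positions (Table 1).
BN_CPU_SECONDS_PER_MILLION = {
    7: 242, 8: 605, 9: 2003, 10: 6483, 11: 23404, 12: 73485,
}
WGS_POSITIONS_MILLIONS = 20  # about 20 million candidate positions


def wgs_hours(seconds_per_million, millions=WGS_POSITIONS_MILLIONS):
    """Extrapolate a 1M-position timing to a whole genome, in hours.

    Assumption: running time is proportional to the number of positions.
    """
    return seconds_per_million * millions / 3600.0


# Estimated whole-genome running time (hours) per pedigree size.
wgs_estimates = {n: wgs_hours(t) for n, t in BN_CPU_SECONDS_PER_MILLION.items()}

# Factor by which the timing grows when one family member is added.
growth = {
    n: BN_CPU_SECONDS_PER_MILLION[n] / BN_CPU_SECONDS_PER_MILLION[n - 1]
    for n in range(8, 13)
}

for n in sorted(wgs_estimates):
    print(f"pedigree size {n}: {wgs_estimates[n]:.1f} h")
```
]]></preformat>
<p>Under this assumption the estimate for a pedigree of 9 individuals is roughly 11 hours, and each additional family member multiplies the running time by a factor between about 2.5 and 3.6, consistent with exponential growth.</p>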
<p>We implemented the Bayesian network algorithm for both the CPU and the GPU. Although we tried to increase the computing speed through parallelization on the GPU, the GPU version was slower than the CPU version when the pedigree size was less than 10. We found that transferring data between the CPU and the GPU (steps 2 and 4 in <xref ref-type="fig" rid="pcbi-1003880-g003">Fig. 3</xref>) takes considerable time and becomes the bottleneck for speed improvement with GPU parallelization. The number in parentheses in <xref ref-type="table" rid="pcbi-1003880-t001">Table 1</xref> is the actual GPU computing time, which is only about one tenth of the total computing time, or less, for pedigree sizes up to 10. Because the time required to copy the data increases linearly whereas the computing time increases exponentially, the speed advantage of GPU parallelization becomes evident when the pedigree size is larger than 10. When the pedigree size was 12, the GPU version was about 10 times faster than the CPU version, which made it feasible to call variants for whole genome sequencing data in ∼36 hours, compared with more than 16 days for the CPU version. The actual improvement achieved from GPU computing will depend on the capacity of the GPU, such as the total number of cores available, which varies from hundreds to thousands.</p>
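<p>The trade-off between transfer overhead and GPU computing time can be made concrete with the no-loop Bayesian network timings from Table 1. The Python sketch below is illustrative only: it treats the difference between the total GPU time and the GPU computing time as overhead, which includes data transfer but may also include other host-side work.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Pedigree size -> (CPU total, GPU total, GPU computing time), in seconds per
# 1M positions, for the Bayesian network algorithm without loops (Table 1).
BN_TIMES = {
    7: (242, 2472, 150),
    8: (605, 2907, 169),
    9: (2003, 3312, 239),
    10: (6483, 3856, 397),
    11: (23404, 4519, 946),
    12: (73485, 6452, 2717),
}


def summarize(cpu_total, gpu_total, gpu_compute):
    """Derive speedup, non-compute overhead, and compute fraction."""
    return {
        "speedup": cpu_total / gpu_total,             # >1 means the GPU wins
        "overhead_s": gpu_total - gpu_compute,        # transfer and host work
        "compute_fraction": gpu_compute / gpu_total,  # share spent on the GPU
    }


summary = {n: summarize(*times) for n, times in BN_TIMES.items()}

# Smallest pedigree size at which the GPU version beats the CPU version.
crossover = min(n for n, s in summary.items() if s["speedup"] > 1.0)

for n in sorted(summary):
    s = summary[n]
    print(f"size {n}: speedup {s['speedup']:.2f}x, "
          f"overhead {s['overhead_s']} s, "
          f"GPU compute {100 * s['compute_fraction']:.0f}% of total")
```
]]></preformat>
<p>With these numbers the GPU version first overtakes the CPU version at a pedigree size of 10 and is about 11 times faster at a pedigree size of 12, while the overhead grows only from about 2,300 s to about 3,700 s.</p>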
<p>We also ran FamSeq-GPU on a personal computer, a MacBook Pro running OS X 10.8.5 with an NVIDIA GeForce GT 650M GPU containing 384 CUDA cores clocked at up to 900 MHz. When the family size was 7, the GPU computing time was 360 s, more than twice the time needed by the TACC GPU server (<xref ref-type="table" rid="pcbi-1003880-t001">Table 1</xref>), and the total computing time, including reading and writing between the CPU and GPU, was 3,060 s. For a family size of 11, the GPU computing time increased to 1,970 s and the overall time to 7,300 s. These results show that users without access to a professional computer server still have the option of running FamSeq with parallel computing on a personal computer.</p>
<p>We also tested the computing time for our MCMC algorithm under the same settings (<xref ref-type="table" rid="pcbi-1003880-t001">Table 1</xref>). Here, we set the total number of iterations to 20,000<italic>n</italic>, where <italic>n</italic> is the pedigree size. This option was the most time-consuming and provided only approximate results. However, it can be used to analyze pedigrees with inbreeding loops and to incorporate uncertainty in the estimated alternative allele frequency, which is often given not as a fixed value but as a value that follows a Beta distribution <xref ref-type="bibr" rid="pcbi.1003880-Ramu1">[14]</xref>.</p>
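<p>As a rough illustration of the two quantities mentioned above, the following Python sketch computes the iteration budget of 20,000<italic>n</italic> and draws an alternative allele frequency from a Beta distribution. It is not FamSeq's sampler: the Beta parameters are hypothetical, and converting the sampled frequency to founder genotype priors through Hardy-Weinberg proportions is an assumption made here for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random

ITERATIONS_PER_INDIVIDUAL = 20_000  # the text sets 20,000 * n iterations


def total_iterations(pedigree_size):
    """Total number of MCMC iterations for a pedigree of the given size."""
    return ITERATIONS_PER_INDIVIDUAL * pedigree_size


def hardy_weinberg_prior(alt_freq):
    """Genotype priors (hom-ref, het, hom-alt) under Hardy-Weinberg (assumed)."""
    ref_freq = 1.0 - alt_freq
    return (ref_freq * ref_freq, 2.0 * ref_freq * alt_freq, alt_freq * alt_freq)


def sample_founder_prior(alpha, beta, rng):
    """Draw an allele frequency from Beta(alpha, beta) and derive the priors."""
    freq = rng.betavariate(alpha, beta)
    return freq, hardy_weinberg_prior(freq)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Beta(1, 99) is a hypothetical choice with mean allele frequency 0.01.
freq, prior = sample_founder_prior(1.0, 99.0, rng)
print(total_iterations(7), freq, prior)
```
]]></preformat>
<p>Redrawing the frequency in this way lets the uncertainty in the allele frequency carry through to the genotype priors instead of fixing it at a single value.</p>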
</sec><sec id="s5">
<title>Availability and Future Directions</title>
<p>FamSeq is free software released under the GNU General Public License (GPL v3) and can be downloaded from our website: <ext-link ext-link-type="uri" xlink:href="http://bioinformatics.mdanderson.org/main/FamSeq" xlink:type="simple">http://bioinformatics.mdanderson.org/main/FamSeq</ext-link>, or from SourceForge: <ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/famseq/" xlink:type="simple">http://sourceforge.net/projects/famseq/</ext-link>. In response to feedback from current users, we plan to add an annotation of de novo mutations to the output files.</p>
<p>The current version of FamSeq provides the option of harnessing the power of GPUs manufactured by NVIDIA (CUDA parallel computing architecture). We plan to re-implement FamSeq using the Open Computing Language (OpenCL) so that it can be executed across heterogeneous platforms such as CPUs, GPUs, digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors.</p>
</sec><sec id="s6">
<title>Supporting Information</title>
<supplementary-material id="pcbi.1003880.s001" mimetype="application/x-gzip" xlink:href="info:doi/10.1371/journal.pcbi.1003880.s001" position="float" xlink:type="simple"><label>Software S1</label><caption>
<p>FamSeq software package.</p>
<p>(GZ)</p>
</caption></supplementary-material></sec></body>
<back>
<ack>
<p>The authors thank Dr. Jeffrey Morris for providing the GPU computer to test our software and Zeya Wang for helping us prepare the test data. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (<ext-link ext-link-type="uri" xlink:href="http://www.tacc.utexas.edu" xlink:type="simple">http://www.tacc.utexas.edu</ext-link>) for providing GPU resources that have contributed to the research results reported within this paper.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1003880-VanTassell1"><label>1</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Van Tassell</surname><given-names>CP</given-names></name>, <name name-style="western"><surname>Smith</surname><given-names>TP</given-names></name>, <name name-style="western"><surname>Matukumalli</surname><given-names>LK</given-names></name>, <name name-style="western"><surname>Taylor</surname><given-names>JF</given-names></name>, <name name-style="western"><surname>Schnabel</surname><given-names>RD</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries</article-title>. <source>Nat Methods</source> <volume>5</volume>: <fpage>247</fpage>–<lpage>252</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Li1"><label>2</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Handsaker</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Wysoker</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Fennell</surname><given-names>T</given-names></name>, <name name-style="western"><surname>Ruan</surname><given-names>J</given-names></name>, <etal>et al</etal>. (<year>2009</year>) <article-title>The Sequence Alignment/Map format and SAMtools</article-title>. <source>Bioinformatics</source> <volume>25</volume>: <fpage>2078</fpage>–<lpage>2079</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-McKenna1"><label>3</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>McKenna</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Hanna</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Banks</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Sivachenko</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Cibulskis</surname><given-names>K</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data</article-title>. <source>Genome Res</source> <volume>20</volume>: <fpage>1297</fpage>–<lpage>1303</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-DePristo1"><label>4</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>DePristo</surname><given-names>MA</given-names></name>, <name name-style="western"><surname>Banks</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Poplin</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Garimella</surname><given-names>KV</given-names></name>, <name name-style="western"><surname>Maguire</surname><given-names>JR</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>A framework for variation discovery and genotyping using next-generation DNA sequencing data</article-title>. <source>Nat Genet</source> <volume>43</volume>: <fpage>491</fpage>–<lpage>498</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Li2"><label>5</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Yu</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Li</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Lam</surname><given-names>TW</given-names></name>, <name name-style="western"><surname>Yiu</surname><given-names>SM</given-names></name>, <etal>et al</etal>. (<year>2009</year>) <article-title>SOAP2: an improved ultrafast tool for short read alignment</article-title>. <source>Bioinformatics</source> <volume>25</volume>: <fpage>1966</fpage>–<lpage>1967</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Li3"><label>6</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Sidore</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Kang</surname><given-names>HM</given-names></name>, <name name-style="western"><surname>Boehnke</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Abecasis</surname><given-names>GR</given-names></name> (<year>2011</year>) <article-title>Low-coverage sequencing: implications for design of complex trait association studies</article-title>. <source>Genome Res</source> <volume>21</volume>: <fpage>940</fpage>–<lpage>951</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Nielsen1"><label>7</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Nielsen</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Paul</surname><given-names>JS</given-names></name>, <name name-style="western"><surname>Albrechtsen</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Song</surname><given-names>YS</given-names></name> (<year>2011</year>) <article-title>Genotype and SNP calling from next-generation sequencing data</article-title>. <source>Nat Rev Genet</source> <volume>12</volume>: <fpage>443</fpage>–<lpage>451</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Roach1"><label>8</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Roach</surname><given-names>JC</given-names></name>, <name name-style="western"><surname>Glusman</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Smit</surname><given-names>AF</given-names></name>, <name name-style="western"><surname>Huff</surname><given-names>CD</given-names></name>, <name name-style="western"><surname>Hubley</surname><given-names>R</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>Analysis of genetic inheritance in a family quartet by whole-genome sequencing</article-title>. <source>Science</source> <volume>328</volume>: <fpage>636</fpage>–<lpage>639</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Zhou1"><label>9</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhou</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Whittemore</surname><given-names>AS</given-names></name> (<year>2012</year>) <article-title>Improving sequence-based genotype calls with linkage disequilibrium and pedigree information</article-title>. <source>Ann Appl Stat</source> <volume>6</volume>: <fpage>457</fpage>–<lpage>475</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Roach2"><label>10</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Roach</surname><given-names>JC</given-names></name>, <name name-style="western"><surname>Glusman</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Hubley</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Montsaroff</surname><given-names>SZ</given-names></name>, <name name-style="western"><surname>Holloway</surname><given-names>AK</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Chromosomal haplotypes by genetic phasing of human families</article-title>. <source>Am J Hum Genet</source> <volume>89</volume>: <fpage>382</fpage>–<lpage>397</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Li4"><label>11</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Chen</surname><given-names>W</given-names></name>, <name name-style="western"><surname>Zhan</surname><given-names>X</given-names></name>, <name name-style="western"><surname>Busonero</surname><given-names>F</given-names></name>, <name name-style="western"><surname>Sanna</surname><given-names>S</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>A likelihood-based framework for variant calling and de novo mutation detection in families</article-title>. <source>PLoS Genet</source> <volume>8</volume>: <fpage>e1002944</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Peng1"><label>12</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Peng</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Fan</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Palculict</surname><given-names>TB</given-names></name>, <name name-style="western"><surname>Shen</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Ruteshouser</surname><given-names>EC</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>Rare variant detection using family-based sequencing analysis</article-title>. <source>Proc Natl Acad Sci U S A</source> <volume>110</volume>: <fpage>3985</fpage>–<lpage>3990</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Li5"><label>13</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names>H</given-names></name> (<year>2011</year>) <article-title>A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data</article-title>. <source>Bioinformatics</source> <volume>27</volume>: <fpage>2987</fpage>–<lpage>2993</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Ramu1"><label>14</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ramu</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Noordam</surname><given-names>MJ</given-names></name>, <name name-style="western"><surname>Schwartz</surname><given-names>RS</given-names></name>, <name name-style="western"><surname>Wuster</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Hurles</surname><given-names>ME</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>DeNovoGear: de novo indel and point mutation discovery and phasing</article-title>. <source>Nat Methods</source> <volume>10</volume>: <fpage>985</fpage>–<lpage>987</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Stricker1"><label>15</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Stricker</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Fernando</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Elston</surname><given-names>R</given-names></name> (<year>1995</year>) <article-title>An algorithm to approximate the likelihood for pedigree data with loops by cutting</article-title>. <source>Theor Appl Genet</source> <volume>91</volume>: <fpage>1054</fpage>–<lpage>1063</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Cannings1"><label>16</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Cannings</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Thompson</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Skolnick</surname><given-names>M</given-names></name> (<year>1978</year>) <article-title>Probability functions on complex pedigrees [domesticated mammals, laboratory animals]</article-title>. <source>Advan Appl Probab</source> <volume>10</volume>: <fpage>26</fpage>–<lpage>61</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Biswas1"><label>17</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Biswas</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Berry</surname><given-names>DA</given-names></name> (<year>2005</year>) <article-title>Determining joint carrier probabilities of cancer-causing genes using Markov chain Monte Carlo methods</article-title>. <source>Genet Epidemiol</source> <volume>29</volume>: <fpage>141</fpage>–<lpage>154</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Buckner1"><label>18</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Buckner</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Wilson</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Seligman</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Athey</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Watson</surname><given-names>S</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>The gputools package enables GPU computing in R</article-title>. <source>Bioinformatics</source> <volume>26</volume>: <fpage>134</fpage>–<lpage>135</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Schatz1"><label>19</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Schatz</surname><given-names>MC</given-names></name>, <name name-style="western"><surname>Trapnell</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Delcher</surname><given-names>AL</given-names></name>, <name name-style="western"><surname>Varshney</surname><given-names>A</given-names></name> (<year>2007</year>) <article-title>High-throughput sequence alignment using Graphics Processing Units</article-title>. <source>BMC Bioinformatics</source> <volume>8</volume>: <fpage>474</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Zandevakili1"><label>20</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zandevakili</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Hu</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Qin</surname><given-names>Z</given-names></name> (<year>2012</year>) <article-title>GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units</article-title>. <source>PLoS One</source> <volume>7</volume>: <fpage>e36865</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Danecek1"><label>21</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Danecek</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Auton</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Abecasis</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Albers</surname><given-names>CA</given-names></name>, <name name-style="western"><surname>Banks</surname><given-names>E</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>The variant call format and VCFtools</article-title>. <source>Bioinformatics</source> <volume>27</volume>: <fpage>2156</fpage>–<lpage>2158</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Hardy1"><label>22</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hardy</surname><given-names>GH</given-names></name> (<year>1908</year>) <article-title>Mendelian proportions in a mixed population</article-title>. <source>Science</source> <volume>28</volume>: <fpage>49</fpage>–<lpage>50</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Conrad1"><label>23</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Conrad</surname><given-names>DF</given-names></name>, <name name-style="western"><surname>Keebler</surname><given-names>JE</given-names></name>, <name name-style="western"><surname>DePristo</surname><given-names>MA</given-names></name>, <name name-style="western"><surname>Lindsay</surname><given-names>SJ</given-names></name>, <name name-style="western"><surname>Zhang</surname><given-names>Y</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Variation in genome-wide mutation rates within and between human families</article-title>. <source>Nat Genet</source> <volume>43</volume>: <fpage>712</fpage>–<lpage>714</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Lin1"><label>24</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lin</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Thompson</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Wijsman</surname><given-names>E</given-names></name> (<year>1994</year>) <article-title>An algorithm for Monte Carlo estimation of genotype probabilities on complex pedigrees</article-title>. <source>Ann Hum Genet</source> <volume>58</volume>: <fpage>343</fpage>–<lpage>357</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Elston1"><label>25</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Elston</surname><given-names>RC</given-names></name>, <name name-style="western"><surname>Stewart</surname><given-names>J</given-names></name> (<year>1971</year>) <article-title>A general model for the genetic analysis of pedigree data</article-title>. <source>Hum Hered</source> <volume>21</volume>: <fpage>523</fpage>–<lpage>542</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Totir1"><label>26</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Totir</surname><given-names>LR</given-names></name>, <name name-style="western"><surname>Fernando</surname><given-names>RL</given-names></name>, <name name-style="western"><surname>Abraham</surname><given-names>J</given-names></name> (<year>2009</year>) <article-title>An efficient algorithm to compute marginal posterior genotype probabilities for every member of a pedigree with loops</article-title>. <source>Genet Sel Evol</source> <volume>41</volume>: <fpage>52</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003880-Fishelson1"><label>27</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fishelson</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Geiger</surname><given-names>D</given-names></name> (<year>2002</year>) <article-title>Exact genetic linkage computations for general pedigrees</article-title>. <source>Bioinformatics</source> <volume>18 Suppl 1</volume>: <fpage>S189</fpage>–<lpage>198</lpage>.</mixed-citation>
</ref>
</ref-list></back>
</article>