<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id>
<journal-title-group>
<journal-title>PLOS Computational Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-15-00677</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1004401</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>TRANSIT - A Software Tool for Himar1 TnSeq Analysis</article-title>
<alt-title alt-title-type="running-head">TRANSIT</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" xlink:type="simple">
<name name-style="western">
<surname>DeJesus</surname> <given-names>Michael A.</given-names></name>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple">
<name name-style="western">
<surname>Ambadipudi</surname> <given-names>Chaitra</given-names></name>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple">
<name name-style="western">
<surname>Baker</surname> <given-names>Richard</given-names></name>
<xref ref-type="aff" rid="aff002"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple">
<name name-style="western">
<surname>Sassetti</surname> <given-names>Christopher</given-names></name>
<xref ref-type="aff" rid="aff002"><sup>2</sup></xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple">
<name name-style="western">
<surname>Ioerger</surname> <given-names>Thomas R.</given-names></name>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Department of Computer Science, Texas A&amp;M University, College Station, Texas, United States of America</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple">
<name name-style="western">
<surname>Gardner</surname> <given-names>Paul P</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>University of Canterbury, NEW ZEALAND</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: MAD CA RB CS TRI. Performed the experiments: MAD CA RB CS TRI. Analyzed the data: MAD CA RB CS TRI. Contributed reagents/materials/analysis tools: MAD CA RB CS TRI. Wrote the paper: MAD CA RB CS TRI.</p>
</fn>
<corresp id="cor001">* E-mail: <email xlink:type="simple">mad@cs.tamu.edu</email></corresp>
</author-notes>
<pub-date pub-type="collection">
<month>10</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>8</day>
<month>10</month>
<year>2015</year>
</pub-date>
<volume>11</volume>
<issue>10</issue>
<elocation-id>e1004401</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>4</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>6</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-year>2015</copyright-year>
<copyright-holder>DeJesus et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="info:doi/10.1371/journal.pcbi.1004401" xlink:type="simple"/>
<abstract>
<p>TnSeq has become a popular technique for determining the essentiality of genomic regions in bacterial organisms. Several methods have been developed to analyze the wealth of data that has been obtained through TnSeq experiments. We developed a tool for analyzing Himar1 TnSeq data called TRANSIT. TRANSIT provides a graphical interface to three different statistical methods for analyzing TnSeq data. These methods cover a variety of approaches capable of identifying essential genes in individual datasets as well as comparative analysis between conditions. We demonstrate the utility of this software by analyzing TnSeq datasets of <italic>M. tuberculosis</italic> grown on glycerol and cholesterol. We show that TRANSIT can be used to discover genes which have been previously implicated for growth on these carbon sources. TRANSIT is written in Python, and thus can be run on Windows, OSX and Linux platforms. The source code is distributed under the GNU GPL v3 license and can be obtained from the following GitHub repository: <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://github.com/mad-lab/transit">https://github.com/mad-lab/transit</ext-link></p>
</abstract>
<funding-group>
<funding-statement>This work was supported by the National Institutes of Health (<ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://www.nih.gov/">www.nih.gov/</ext-link>) grant U19 AI107774. TRI and CS received funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="7"/>
<table-count count="3"/>
<page-count count="17"/>
</counts>
<custom-meta-group>
<custom-meta id="data-availability" xlink:type="simple">
<meta-name>Data Availability</meta-name>
<meta-value>Data and source code are available at the following GitHub repository: <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://github.com/mad-lab/transit">https://github.com/mad-lab/transit</ext-link></meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<disp-quote>
<p>This is a <italic>PLOS Computational Biology</italic> Software paper</p>
</disp-quote>
<sec id="sec001" sec-type="intro">
<title>Introduction</title>
<p>Transposon insertion sequencing (TnSeq for short) is a popular experimental methodology for determining essential (and conditionally essential) regions in bacterial genomes [<xref ref-type="bibr" rid="pcbi.1004401.ref001">1</xref>]. TnSeq (in the broad sense used in this paper) refers to a family of related methods that use deep sequencing to survey a transposon insertion library and quantify the abundance of insertions at different sites in the genome [<xref ref-type="bibr" rid="pcbi.1004401.ref002">2</xref>–<xref ref-type="bibr" rid="pcbi.1004401.ref005">5</xref>]. The specific methodologies differ in the details of library preparation (such as use of shearing versus digestion, method of enrichment, or the choice of transposable element) [<xref ref-type="bibr" rid="pcbi.1004401.ref006">6</xref>]. While there are several transposons that can be used to construct Tn insertion libraries, one of the most commonly used is the Himar1 transposon, which is used in several specific protocols including HITS [<xref ref-type="bibr" rid="pcbi.1004401.ref003">3</xref>], Tn-seq [<xref ref-type="bibr" rid="pcbi.1004401.ref004">4</xref>], and INSeq [<xref ref-type="bibr" rid="pcbi.1004401.ref002">2</xref>]. The Himar1 transposon inserts at random TA dinucleotide sites during the library generation process [<xref ref-type="bibr" rid="pcbi.1004401.ref007">7</xref>]. Depending on gene size and GC content, there are typically between 5 and 50 TA sites per gene. Essential regions are inferred from the lack of insertions observed in a region (presumably because insertion of the transposon (Tn) disrupts the protein product, making it non-functional). Conditionally essential regions have insertions in one condition but not in another (see <xref ref-type="fig" rid="pcbi.1004401.g001">Fig 1</xref>).
Knowledge of (conditionally) essential genes plays an important role in drug discovery, as these could be drug targets, and the ability to detect conditionally essential genes is helpful in working out pathways (e.g. comparative analysis between samples with and without supplementation of a critical metabolite).</p>
<fig id="pcbi.1004401.g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Track View of read counts for datasets grown in glycerol and cholesterol.</title>
<p>This region spans approximately 12 kb, and includes 5 genes. TA dinucleotides, which are candidate insertion sites, are indicated in the middle track. The vertical height of each bar reflects the number of reads or Tn insertions at each TA site. Some sites with no insertions are probably missing from the library, while others may reflect essential regions. Note that GlpK lacks insertions in the glycerol condition, indicating that it is essential when grown on glycerol.</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g001"/>
</fig>
<p>The preparation of TnSeq samples for sequencing involves fragmenting genomic DNA, attaching sequencing adapters, and amplifying with PCR primers to enrich the sample for fragments carrying Tn:genomic junctions. Illumina next-generation sequencers are the most frequently used platform to sequence TnSeq libraries. The datasets generated from an Illumina sequencer contain short reads (∼ 100 bp) that have the terminus of the Tn as a prefix and a genomic suffix that can be mapped (aligned) to the genome to identify which TA site they represent.</p>
<p>While the relative abundance of insertion mutants can be estimated based on the frequency of read counts or template counts corresponding to an insertion site, stochastic effects during amplification and library generation can also influence these measurements. Despite these fluctuations, some regions show systematically suppressed (or inflated) counts, which could reflect a gene whose disruption causes a growth defect (or growth advantage). In addition, there can also be missing sites not represented in the library. Analysis of TnSeq data is challenging, especially with low-density libraries, where a large number of TA sites are not represented in the library. Several methods have been proposed for rigorously quantifying the statistical significance of essential regions, including models using the Negative Binomial distribution [<xref ref-type="bibr" rid="pcbi.1004401.ref008">8</xref>] and the Poisson distribution [<xref ref-type="bibr" rid="pcbi.1004401.ref009">9</xref>], a non-parametric test based on re-sampling counts within a sliding window [<xref ref-type="bibr" rid="pcbi.1004401.ref010">10</xref>], Bayesian methods [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>] and Hidden Markov Models [<xref ref-type="bibr" rid="pcbi.1004401.ref012">12</xref>, <xref ref-type="bibr" rid="pcbi.1004401.ref013">13</xref>].</p>
<p>TRANSIT is a new software tool that automates the analysis of Himar1 TnSeq datasets. It has a graphical interface that allows a user to load TnSeq datasets and apply several built-in analyses to identify essential (and conditionally essential) regions, calculate statistical significance, and visualize the results in different ways. For essentiality analysis, TRANSIT provides two alternative methods: a Bayesian method that quantifies the significance of “gaps” (or long consecutive sequences of TA sites lacking insertions) [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>], complemented by a Hidden Markov Model (HMM) that also incorporates local differences in read counts [<xref ref-type="bibr" rid="pcbi.1004401.ref013">13</xref>]. For comparative analysis, TRANSIT utilizes a permutation test that compares the difference of the counts in a genomic region between two different conditions to determine if there is a statistically significant difference (e.g. putatively reflecting selection for or against disruption in one of the two conditions). A pre-processor called TPP (for TRANSIT Pre-Processor) is provided which extracts read counts from raw sequence data files (in .fasta, .fastq or .fastq.gz format), maps them to the reference genome, optionally reduces them to unique template counts, and outputs them in .wig format for loading into TRANSIT. Finally, numerous statistics are generated for analyzing the quality of TnSeq datasets and diagnosing any potential problems (i.e. with the library or sample preparation).</p>
<p>TRANSIT was initially designed to analyze TnSeq libraries prepared by the protocol in [<xref ref-type="bibr" rid="pcbi.1004401.ref014">14</xref>], which uses a custom barcoding scheme (unique nucleotides which occur in read 2). TPP has a default mode to recognize these barcodes and use them to reduce read counts to template counts, as described below. However, TRANSIT was intentionally designed in a modular way to decouple the preprocessing from the computational analysis to allow the statistical analysis tools of TRANSIT to be applied to datasets obtained from other (Himar1) TnSeq protocols [<xref ref-type="bibr" rid="pcbi.1004401.ref002">2</xref>–<xref ref-type="bibr" rid="pcbi.1004401.ref004">4</xref>]. For example, if a dataset is collected with just single-ended reads, or a protocol without barcoding was used for sample preparation, TPP can be configured to simply process read 1 without read 2. If an alternative barcoding scheme were used, TPP might have to be modified, or users can implement their own processing pipeline for mapping reads and quantifying insertions at genomic locations. As long as these counts are written out as intermediate files in .wig format, they can be input to TRANSIT for subsequent statistical analysis. TRANSIT could in principle be modified to analyze other TnSeq libraries such as those generated with the Tn5 transposon [<xref ref-type="bibr" rid="pcbi.1004401.ref005">5</xref>].</p>
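<p>To illustrate the kind of intermediate file a custom pipeline would need to produce, the following Python sketch parses a variableStep-style .wig stream into (coordinate, count) pairs. It is not code from TPP or TRANSIT: the function name <monospace>read_wig</monospace>, the example header, and the assumption that each data line holds one TA-site coordinate followed by its count are introduced here for illustration only.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import io


def read_wig(handle):
    """Parse a variableStep-style .wig stream into (position, count) pairs.

    Assumption (hypothetical layout): one TA site per data line, written as
    "coordinate count". Header and comment lines are skipped.
    """
    sites = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "variableStep")):
            continue
        position, count = line.split()[:2]
        sites.append((int(position), float(count)))
    return sites


# A tiny in-memory example; coordinates and counts are made up.
example = io.StringIO(
    "variableStep chrom=example_genome\n"
    "60 0\n"
    "72 15\n"
    "102 3\n"
)
sites = read_wig(example)
print(sites)
```
]]></preformat>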
<p>Several other software tools have been developed for analysis of TnSeq data. Some are purely computational [<xref ref-type="bibr" rid="pcbi.1004401.ref012">12</xref>, <xref ref-type="bibr" rid="pcbi.1004401.ref015">15</xref>] and do not have the convenient graphical features of TRANSIT, such as TrackView (to display insertion patterns at various loci) or Volcano plots (to visualize the distribution of hits in comparative analysis). The most similar alternatives are ESSENTIALS [<xref ref-type="bibr" rid="pcbi.1004401.ref008">8</xref>] and Tn-seq Explorer [<xref ref-type="bibr" rid="pcbi.1004401.ref016">16</xref>]. ESSENTIALS uses the Negative Binomial distribution to identify essential genes and quantify their statistical significance. However, it has been observed to output an excessive number of essential genes when utilizing its reported p-values for classification [<xref ref-type="bibr" rid="pcbi.1004401.ref016">16</xref>] and can be susceptible to insertions in the N- or C-termini, causing essential genes to appear to be non-essential [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>]. Tn-seq Explorer uses a sliding window approach to identify regions where there is a deficit of reads relative to the rest of the genome. However, there is no calculation of statistical significance for the Essentiality Index (EI) computed for each gene. The permutation test in TRANSIT also provides a more statistically rigorous way to identify conditionally essential genes (comparative analysis between conditions), in contrast to the simple comparison of EI values in Tn-seq Explorer.</p>
</sec>
<sec id="sec002" sec-type="materials|methods">
<title>Design and Implementation</title>
<sec id="sec003">
<title>Analysis Methods</title>
<p>TRANSIT provides several statistical methods capable of accomplishing two common types of tasks:
<list list-type="order"><list-item><p>Identifying essential genes in a single growth condition
<list list-type="alpha-lower"><list-item><p>Bayesian/Gumbel Method</p></list-item> <list-item><p>Hidden Markov Model</p></list-item></list></p></list-item> <list-item><p>Identifying conditionally essential genes between conditions (comparative analysis)
<list list-type="alpha-lower"><list-item><p>Resampling (permutation test)</p></list-item></list></p></list-item></list></p>
<p>All TnSeq analysis methods are sensitive to insertion density or library saturation (to different degrees). It is important to have sufficient diversity (or saturation) of the Tn mutant library, so that as many of the TA sites in non-essential regions are represented as possible. While saturation rarely achieves 100%, good libraries often have density greater than 50%, whereas libraries with lower density in the 20-30% range are more challenging to analyze and give less confident predictions (because a sequence of TA sites could be missing insertions due to chance).</p>
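<p>The effect of saturation can be made concrete with a small Python sketch. It computes the insertion density of a list of read counts and, under the simplifying assumption (ours, for illustration) that non-essential TA sites are hit independently with probability equal to the density, the chance that a run of consecutive sites is empty. The function names are hypothetical and are not part of TRANSIT.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def insertion_density(counts):
    """Fraction of TA sites with at least one insertion (non-zero count)."""
    if not counts:
        raise ValueError("no TA sites given")
    return sum(1 for c in counts if c > 0) / len(counts)


def prob_empty_run(density, run_length):
    """Chance that `run_length` consecutive non-essential sites are all empty.

    Assumes (for illustration) that sites are hit independently, each with
    probability `density`.
    """
    return (1.0 - density) ** run_length


counts = [0, 12, 0, 0, 5, 33, 0, 8, 0, 1]  # made-up read counts at 10 TA sites
density = insertion_density(counts)
print(density)

# A run of 10 empty sites is far more likely to arise by chance in a sparse library.
print(prob_empty_run(0.50, 10))
print(prob_empty_run(0.25, 10))
```
]]></preformat>
<p>Under this assumption, a run of 10 empty sites arises by chance much more often at 25% density than at 50% density, which is why calls made from sparse libraries are less confident.</p>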
<sec id="sec004">
<title>Bayesian/Gumbel Method</title>
<p>For analyzing essentiality in single conditions, TRANSIT incorporates a Bayesian method that identifies the longest consecutive sequence of TA sites lacking insertions in a gene (or “gap”), and calculates the probability of this using the Gumbel or Extreme Value distribution [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>]. The basis of this approach is that, in non-essential regions, gaps will occur by chance (depending on degree of saturation), and the probability of a long gap decreases geometrically with its length. Thus essential genes can be recognized by unusually long gaps, and the posterior probability of the longest gap can be calculated using a Bayesian formula. The Bayesian formula is a joint probability density function (<xref ref-type="disp-formula" rid="pcbi.1004401.e002">Eq (1)</xref>) with unobservable parameters that must be estimated using the Metropolis-Hastings sampling algorithm [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>]. In the end, the Bayesian method calculates a posterior probability of essentiality (called <inline-formula id="pcbi.1004401.e001"><alternatives><graphic id="pcbi.1004401.e001g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004401.e001"/><mml:math id="M1" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Z</mml:mi> <mml:mo>‾</mml:mo></mml:mover> <mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></alternatives></inline-formula>) for each gene. While p-values are not traditionally used in a Bayesian framework, a technique for controlling the false-discovery rate is used to select a threshold of posterior probabilities that can be used to determine a set of confident essentials and non-essentials that is adjusted for multiple comparisons [<xref ref-type="bibr" rid="pcbi.1004401.ref017">17</xref>]. Some genes might be labeled as Uncertain, which is important if the data are too sparse or the gene is too short to make a confident call.
While the Gumbel method does not take into consideration the magnitudes of the read counts, an important advantage of this analysis is that it is not sensitive to a few insertions at the N- and C-termini of essential genes (since there is often still a large gap in the middle of an essential ORF), or insertions in non-essential linker regions between domains, and it can identify “domain-essentials”, which are genes containing both an essential and non-essential domain.
<disp-formula id="pcbi.1004401.e002"><alternatives><graphic id="pcbi.1004401.e002g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004401.e002"/><mml:math id="M2" display="block" overflow="scroll"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd/><mml:mtd/><mml:mtd columnalign="left"><mml:mrow><mml:mi>p</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:mi>Y</mml:mi> <mml:mo>,</mml:mo> <mml:mi>Z</mml:mi> <mml:mo>,</mml:mo> <mml:msub><mml:mi>ϕ</mml:mi> <mml:mn>0</mml:mn></mml:msub> <mml:mo>,</mml:mo> <mml:msub><mml:mi>ω</mml:mi> <mml:mn>1</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>=</mml:mo> <mml:mi>p</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:mi>Y</mml:mi> <mml:mo>∣</mml:mo> <mml:mi>Z</mml:mi> <mml:mo>,</mml:mo> <mml:msub><mml:mi>ϕ</mml:mi> <mml:mn>0</mml:mn></mml:msub> <mml:mo>,</mml:mo> <mml:msub><mml:mi>ω</mml:mi> <mml:mn>1</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>×</mml:mo> <mml:mi>π</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>ϕ</mml:mi> <mml:mn>0</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>×</mml:mo> <mml:mi>π</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:mi>Z</mml:mi> <mml:mo>∣</mml:mo> <mml:msub><mml:mi>ω</mml:mi> <mml:mn>1</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>×</mml:mo> <mml:mi>π</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>ω</mml:mi> <mml:mn>1</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr> <mml:mtr><mml:mtd/><mml:mtd/><mml:mtd columnalign="left"><mml:mrow><mml:mspace width="1.em"/><mml:mo>=</mml:mo> <mml:mo>[</mml:mo> <mml:munderover><mml:mo>∏</mml:mo> <mml:mrow><mml:mi>i</mml:mi> <mml:mo>=</mml:mo> <mml:mn>1</mml:mn></mml:mrow> <mml:mrow><mml:mi>n</mml:mi> <mml:mi>o</mml:mi> <mml:mi>n</mml:mi></mml:mrow></mml:munderover> <mml:mi>G</mml:mi> <mml:mi>u</mml:mi> <mml:mi>m</mml:mi> <mml:mi>b</mml:mi> <mml:mi>e</mml:mi> <mml:mi>l</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>r</mml:mi> 
<mml:mi>i</mml:mi></mml:msub> <mml:mo>|</mml:mo> <mml:mi>μ</mml:mi> <mml:mo>,</mml:mo> <mml:mi>σ</mml:mi> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>×</mml:mo> <mml:mi>N</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>s</mml:mi> <mml:mi>i</mml:mi></mml:msub> <mml:mo>-</mml:mo> <mml:msub><mml:mi>λ</mml:mi> <mml:mi>r</mml:mi></mml:msub> <mml:msub><mml:mi>r</mml:mi> <mml:mi>i</mml:mi></mml:msub> <mml:mo>,</mml:mo> <mml:msubsup><mml:mi>σ</mml:mi> <mml:mi>r</mml:mi> <mml:mn>2</mml:mn></mml:msubsup> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr> <mml:mtr><mml:mtd/><mml:mtd/><mml:mtd columnalign="left"><mml:mrow><mml:mspace width="1.em"/><mml:mo>×</mml:mo> <mml:mo>[</mml:mo> <mml:munderover><mml:mo>∏</mml:mo> <mml:mrow><mml:mi>i</mml:mi> <mml:mo>=</mml:mo> <mml:mn>1</mml:mn></mml:mrow> <mml:mrow><mml:mi>e</mml:mi> <mml:mi>s</mml:mi> <mml:mi>s</mml:mi></mml:mrow></mml:munderover> <mml:mo>Ω</mml:mo> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>s</mml:mi> <mml:mi>i</mml:mi></mml:msub> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>×</mml:mo> <mml:mi>N</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>r</mml:mi> <mml:mi>i</mml:mi></mml:msub> <mml:mo>-</mml:mo> <mml:msub><mml:mi>λ</mml:mi> <mml:mi>s</mml:mi></mml:msub> <mml:msub><mml:mi>s</mml:mi> <mml:mi>i</mml:mi></mml:msub> <mml:mo>,</mml:mo> <mml:msubsup><mml:mi>σ</mml:mi> <mml:mi>s</mml:mi> <mml:mn>2</mml:mn></mml:msubsup> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>]</mml:mo> <mml:mo>×</mml:mo> <mml:mi>B</mml:mi> <mml:mi>e</mml:mi> <mml:mi>t</mml:mi> <mml:mi>a</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>ϕ</mml:mi> <mml:mn>0</mml:mn></mml:msub> <mml:mo>;</mml:mo> <mml:msub><mml:mi>α</mml:mi> <mml:mn>0</mml:mn></mml:msub> <mml:mo>,</mml:mo> <mml:msub><mml:mi>β</mml:mi> <mml:mn>0</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr> <mml:mtr><mml:mtd/><mml:mtd/><mml:mtd columnalign="left"><mml:mrow><mml:mspace width="1.em"/><mml:mo>×</mml:mo> 
<mml:mi>B</mml:mi> <mml:mi>i</mml:mi> <mml:mi>n</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>K</mml:mi> <mml:mi>z</mml:mi></mml:msub> <mml:mo>;</mml:mo> <mml:mi>G</mml:mi> <mml:mo>,</mml:mo> <mml:msub><mml:mi>ω</mml:mi> <mml:mn>1</mml:mn></mml:msub> <mml:mo>)</mml:mo></mml:mrow> <mml:mo>×</mml:mo> <mml:mi>B</mml:mi> <mml:mi>e</mml:mi> <mml:mi>t</mml:mi> <mml:mi>a</mml:mi> <mml:mrow><mml:mo>(</mml:mo> <mml:msub><mml:mi>ω</mml:mi> <mml:mn>1</mml:mn></mml:msub> <mml:mo>;</mml:mo> <mml:msub><mml:mi>α</mml:mi> <mml:mi>w</mml:mi></mml:msub> <mml:mo>,</mml:mo> <mml:msub><mml:mi>β</mml:mi> <mml:mi>w</mml:mi></mml:msub> <mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></alternatives> <label>(1)</label></disp-formula></p>
<p>In this formula (from [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>]), <italic>Y</italic><sub><italic>i</italic></sub> represents the observed insertion patterns in each gene, <italic>Z</italic><sub><italic>i</italic></sub> is a binary variable that indicates whether each gene is essential, and the other variables are internal parameters of the model that get estimated through the sampling process.</p>
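<p>The gap statistic at the heart of this method can be illustrated with a short Python sketch. It finds the longest run of TA sites lacking insertions and evaluates a Gumbel (extreme value) tail probability for a run of that length in a non-essential region. This is a deliberately simplified illustration and not the full Bayesian model of Eq (1): it assumes independent sites that are each empty with a known probability, uses the standard extreme-value approximation for the longest run in a sequence of independent trials, and the function names are hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def longest_gap(counts):
    """Length of the longest run of consecutive TA sites with zero insertions."""
    best = current = 0
    for c in counts:
        current = current + 1 if c == 0 else 0
        best = max(best, current)
    return best


def gumbel_tail(run_length, n_sites, p_empty):
    """Approximate chance that the longest empty run reaches `run_length`.

    Assumes n_sites independent sites, each empty with probability p_empty
    (0 < p_empty < 1). The longest run is approximated by a Gumbel
    distribution with location mu and scale sigma as computed below.
    """
    q = 1.0 - p_empty
    mu = math.log(n_sites * q) / math.log(1.0 / p_empty)
    sigma = 1.0 / math.log(1.0 / p_empty)
    cdf = math.exp(-math.exp(-(run_length - mu) / sigma))
    return 1.0 - cdf


counts = [5, 0, 0, 0, 0, 0, 0, 0, 0, 2, 7, 0, 1]  # made-up counts for one gene
gap = longest_gap(counts)
tail = gumbel_tail(gap, n_sites=len(counts), p_empty=0.5)
print(gap, tail)
```
]]></preformat>
<p>A small tail probability indicates a gap that is unlikely to arise by chance at the given saturation. The method described above goes further by estimating the unobserved parameters jointly and reporting a posterior probability of essentiality for each gene.</p>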
</sec>
<sec id="sec005">
<title>Hidden Markov Model</title>
<p>To complement the Bayesian method, TRANSIT also offers an HMM to identify essential regions in a <italic>non-gene-centric</italic> way (not limited by ORF boundaries) [<xref ref-type="bibr" rid="pcbi.1004401.ref013">13</xref>]. Thus the HMM can be used to identify essential loci larger than one gene, e.g. an operon, or smaller, e.g. an essential protein domain. Also it can potentially identify essential intergenic regions [<xref ref-type="bibr" rid="pcbi.1004401.ref010">10</xref>].</p>
<p>An HMM is a popular choice for analyzing sequential data. In this context, the HMM is applied to the sequence of TA sites to obtain the most probable state (essentiality) assignment based on the read count at the site and the distribution over the surrounding sites. In this manner, the HMM enforces a local consistency among the state assignments, despite not explicitly using a sliding window. The HMM in TRANSIT is a 4-state model (see <xref ref-type="fig" rid="pcbi.1004401.g002">Fig 2</xref>) with states for: a) essential regions (ES), b) non-essential regions (NE), c) growth-defect regions (GD, with suppressed read counts), and d) growth-advantaged regions (GA; with excess read counts, inflated above the global mean). The likelihood of read counts in each state is determined by a geometric distribution (based on the observation that low read counts are more frequent and sites with higher counts are rarer), where the mean is near-0 for ES, near the global mean for NE, intermediate for GD, and high for GA.</p>
<fig id="pcbi.1004401.g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g002</object-id>
<label>Fig 2</label>
<caption>
<title>Hidden Markov Model Diagram.</title>
<p>The HMM is fully connected, allowing transitions between each of the states. Transition probabilities and parameters are estimated in such a way that the HMM will remain in the state which best represents the read-counts observed. (a) Essential regions (“ES”) are mostly devoid of insertions, (c) while non-essential regions (“NE”) contain read-counts around the global mean. (b) Growth-defect regions (“GD”), and (d) growth-advantage regions (“GA”) represent those areas with significantly suppressed or inflated read-counts.</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g002"/>
</fig>
<p>The parameters of the HMM (transition probabilities, etc.) are dynamically adjusted to the attributes of the dataset (such as insertion density and mean read count) in such a way that the model tends to remain within a state (despite a few sites that may not fit) until enough evidence accumulates to justify a transition to another state (see <xref ref-type="fig" rid="pcbi.1004401.g002">Fig 2</xref>). Given the transition probabilities and other parameters of the model, the most probable state assignment for each TA site is estimated from the observed counts using the Viterbi algorithm [<xref ref-type="bibr" rid="pcbi.1004401.ref018">18</xref>]. The HMM has been shown to perform well and make reasonable essentiality calls even in datasets with density as low as 20% [<xref ref-type="bibr" rid="pcbi.1004401.ref013">13</xref>]. At the end of the analysis, the proportions of sites labeled by each of the 4 states are reported. Typically around 15% of the genome would be expected to be essential in most bacteria [<xref ref-type="bibr" rid="pcbi.1004401.ref019">19</xref>], and most of the rest of the genome would be non-essential, while only a small fraction (on the order of 5-10%) might be labeled as GD or GA. An example of a putative GA region would be one containing virulence genes, which are required <italic>in vivo</italic> for infection but are often lost <italic>in vitro</italic> because of the taxing energy requirements on the organism. As a post-processing step, the essentiality state of each gene is called based on the labeling of the majority of TA sites within the ORF.</p>
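<p>The decoding step can be sketched in Python as a 4-state Viterbi pass with geometric emission distributions, followed by a majority vote over the sites of a gene. The sketch is illustrative only: the state means, the self-transition probability <monospace>stay</monospace>, the uniform start distribution and the function names are assumptions made here, whereas TRANSIT derives its parameters from the dataset as described above.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
from collections import Counter

STATES = ["ES", "GD", "NE", "GA"]


def geometric_logpmf(count, mean):
    """log P(count) for a geometric distribution on 0, 1, 2, ... with this mean."""
    p = 1.0 / (1.0 + mean)
    return math.log(p) + count * math.log(1.0 - p)


def viterbi(counts, means, stay=0.9):
    """Most probable state path for the read counts at consecutive TA sites.

    `means` maps each state to the mean of its geometric emission. `stay` is
    a hypothetical self-transition probability; the remaining mass is shared
    equally among the other states. The start distribution is uniform.
    """
    n = len(STATES)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))
    score = [math.log(1.0 / n) + geometric_logpmf(counts[0], means[s])
             for s in STATES]
    back = []
    for c in counts[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(STATES):
            cands = [score[i] + (log_stay if i == j else log_move)
                     for i in range(n)]
            best_i = max(range(n), key=cands.__getitem__)
            pointers.append(best_i)
            new_score.append(cands[best_i] + geometric_logpmf(c, means[s]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [STATES[i] for i in reversed(path)]


def call_gene(site_states):
    """Call a gene by the most common state among its TA sites."""
    return Counter(site_states).most_common(1)[0][0]


# Illustrative means: near 0 for ES, intermediate for GD, global mean for NE, high for GA.
means = {"ES": 0.01, "GD": 10.0, "NE": 50.0, "GA": 250.0}
counts = [40, 55, 0, 62, 38] + [0] * 12 + [47, 51, 0, 66]  # made-up counts
path = viterbi(counts, means)
print(path)
print(call_gene(path[5:17]))
```
]]></preformat>
<p>In this toy example the isolated empty sites inside the flanking regions are not enough to leave the NE state, while the run of twelve empty sites is labeled ES, which is the local consistency described above.</p>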
</sec>
<sec id="sec006">
<title>Resampling</title>
<p>For comparative analysis, TRANSIT uses a variation of the classical permutation test in statistics [<xref ref-type="bibr" rid="pcbi.1004401.ref020">20</xref>]. For each gene, the read counts at all the TA sites and all replicates in each condition are summed, treating replicates within a condition as independent and identically distributed. The difference between the summed read counts of the two conditions is then calculated. The significance of this difference is evaluated by comparing it to a resampling distribution generated by randomly reshuffling the observed counts at TA sites in the region among all the datasets. This creates a distribution of read count differences that might be observed by chance, assuming a null hypothesis that the two conditions are not in fact different. A p-value is then derived from the proportion of reshuffled samples that have a difference more extreme than that observed in the actual experimental data.</p>
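<p>The resampling procedure for a single gene can be sketched in Python as follows. This is a minimal illustration, not TRANSIT's implementation: the function name, the fixed random seed, the two-sided comparison, and the choice to count reshuffled differences that are at least as extreme as the observed one are assumptions made here. The counts are assumed to have been normalized beforehand.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import random


def resampling_test(counts_a, counts_b, n_perm=10000, seed=0):
    """Permutation test on the difference of summed read counts for one gene.

    counts_a / counts_b: read counts at the gene's TA sites, pooled over all
    replicates of condition A and condition B. Returns (observed difference,
    p-value), where the p-value is the fraction of reshuffles whose absolute
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(counts_b) - sum(counts_a)
    pooled = list(counts_a) + list(counts_b)
    n_a = len(counts_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign counts to conditions at random
        diff = sum(pooled[n_a:]) - sum(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm


# Made-up counts: insertions are well tolerated in A but nearly absent in B.
counts_a = [60, 75, 48, 90, 66, 81]
counts_b = [0, 3, 0, 5, 1, 0]
observed, p_value = resampling_test(counts_a, counts_b)
print(observed, p_value)

# Identical counts in both conditions give no evidence of a difference.
null_observed, null_p = resampling_test([10, 20, 30], [10, 20, 30])
print(null_observed, null_p)
```
]]></preformat>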
<p>Due to the stochastic nature of read counts, there will almost always be some measurable difference between these sums. If this difference in sums of read counts falls within the bounds of the resampling distribution, this is interpreted as being due to chance. On the other hand, true conditionally essential genes will show a highly significant difference as insertions in the locus will be observed in one condition but not the other, resulting in a difference which is typically much larger than any of the differences observed by randomly re-shuffling. Furthermore, this method can detect genes whose disruption leads to a reduction in fitness; that is, genes which are not absolutely essential in one of the conditions, but instead have lower read-counts in one of the conditions compared to the other. The permutation test distinguishes which of these differences is statistically significant. p-values are derived from the fraction of samples that exceed the observed difference (See <xref ref-type="fig" rid="pcbi.1004401.g003">Fig 3</xref>), and this is adjusted for multiple comparisons by the Benjamini-Hochberg procedure.</p>
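<p>The multiple-comparison adjustment can be illustrated with a compact Python version of the Benjamini-Hochberg procedure, which converts the per-gene p-values into adjusted values. The function name and the example p-values are ours; this is a generic textbook formulation rather than code taken from TRANSIT.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.003, 0.20, 0.04, 0.0001, 0.65]  # made-up per-gene p-values
q_values = benjamini_hochberg(p_values)
print(q_values)
```
]]></preformat>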
<fig id="pcbi.1004401.g003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Resampling histogram for gene Rv0017c.</title>
<p>Rv0017c has 23 TA sites, and the sum of the observed counts at the TA sites in this gene <italic>in vitro</italic> was 1,318 and <italic>in vivo</italic> was 399, therefore the observed difference in counts is -918. To determine the significance of this difference, 10,000 permutations of the counts at the TA sites among the datasets were generated and the resulting differences plotted as a histogram, showing that a difference as extreme as -918 almost never occurs by chance. The p-value is determined by the tail of this distribution to be 0.003 (30 out of 10,000).</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g003"/>
</fig>
<p>The permutation test requires that the datasets be comparably normalized. TRANSIT provides several alternative ways to normalize TnSeq data, each with different strengths and weaknesses in dealing with various sources of noise in real datasets. The default normalization procedure is the non-zero mean (NZmean) method, which normalizes datasets to have the same mean over non-zero sites. In our experience, this is better than normalizing by the total read-counts, which is sensitive to the degree of saturation of a library. The normalization is achieved by dividing the total number of reads in the dataset by the total number of sites with at least one insertion to obtain the non-zero mean, and then dividing the desired mean by this value to obtain a scaling factor:
<disp-formula id="pcbi.1004401.e003"><alternatives><graphic id="pcbi.1004401.e003g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004401.e003"/><mml:math id="M3" display="block" overflow="scroll"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd columnalign="right"><mml:mrow><mml:msub><mml:mi>σ</mml:mi> <mml:mi>j</mml:mi></mml:msub> <mml:mo>=</mml:mo> <mml:msub><mml:mi>μ</mml:mi> <mml:mi>g</mml:mi></mml:msub> <mml:mo>×</mml:mo> <mml:mfrac><mml:mrow><mml:mtext>Number</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>of</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>sites</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>with</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>an</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>insertion</mml:mtext></mml:mrow> <mml:mrow><mml:mtext>Total</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>number</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>of</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>reads</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>in</mml:mtext> <mml:mspace width="4.pt"/><mml:mtext>dataset</mml:mtext> <mml:mspace width="0.277778em"/><mml:mi>j</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></alternatives></disp-formula>
where <italic>μ</italic><sub><italic>g</italic></sub> is the global mean read-count across all datasets. The normalized counts at each site <italic>i</italic> in dataset <italic>j</italic> are the raw counts times the scaling factor (<inline-formula id="pcbi.1004401.e004"><alternatives><graphic id="pcbi.1004401.e004g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004401.e004"/><mml:math id="M4" display="inline" overflow="scroll"><mml:mrow><mml:msubsup><mml:mi>c</mml:mi> <mml:mrow><mml:mi>i</mml:mi> <mml:mi>j</mml:mi></mml:mrow> <mml:mo>′</mml:mo></mml:msubsup> <mml:mo>=</mml:mo> <mml:msub><mml:mi>c</mml:mi> <mml:mrow><mml:mi>i</mml:mi> <mml:mi>j</mml:mi></mml:mrow></mml:msub> <mml:mo>×</mml:mo> <mml:msub><mml:mi>σ</mml:mi> <mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></alternatives></inline-formula>).</p>
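<p>The scaling factor above can be computed as in the following sketch. The function names are hypothetical, and the global mean is assumed here to be the average of the per-dataset non-zero means; other definitions of the desired mean are possible.</p>
<preformat preformat-type="code">
```python
def nzmean_factors(datasets):
    """Return one NZmean scaling factor per dataset.

    datasets: list of lists, each holding the raw counts at every TA site.
    """
    nz_means = []
    for counts in datasets:
        n_hit = sum(1 for c in counts if c > 0)
        if n_hit == 0:
            raise ValueError("dataset has no insertions")
        # Total reads divided by the number of sites with an insertion.
        nz_means.append(sum(counts) / n_hit)
    # Assumed definition of the global mean: average of non-zero means.
    mu_g = sum(nz_means) / len(nz_means)
    return [mu_g / nz for nz in nz_means]


def normalize_nzmean(datasets):
    """Scale each dataset so that all share the same non-zero mean."""
    factors = nzmean_factors(datasets)
    return [[c * f for c in counts] for counts, f in zip(datasets, factors)]
```
</preformat>
<p>After this scaling, every dataset has the same mean count over its non-zero sites, while sites without insertions remain at zero.</p>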
</sec>
</sec>
<sec id="sec007">
<title>Pre-Processing</title>
<p>TRANSIT takes .wig files as input, which contain counts of reads (or unique templates) observed at each TA site. In this way, TRANSIT accepts datasets prepared with any pre-processing or custom protocol. An optional pre-processor called TPP is provided with the software distribution for extracting these counts from raw sequencing files (typically in .fastq format, paired-end reads in two files called “read1” and “read2”). Note that this pre-processing procedure is designed for libraries prepared in adherence with the protocol described in [<xref ref-type="bibr" rid="pcbi.1004401.ref014">14</xref>]. However, other labs might want to apply their own custom pre-processing procedure, particularly if they use an alternative protocol for preparing TnSeq samples for sequencing or if they use a different transposon.</p>
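<p>For readers preparing their own input, the following sketch reads per-site counts from wiggle-formatted text. It assumes the two-column variableStep layout (a coordinate followed by a count on each data line, with optional comment and declaration lines); the function name is hypothetical and the sketch is not taken from the TRANSIT source.</p>
<preformat preformat-type="code">
```python
def read_wig_counts(lines):
    """Return (positions, counts) parsed from variableStep wiggle lines."""
    positions, counts = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/step declaration lines.
        if not line or line.startswith(("#", "track", "variableStep")):
            continue
        pos, value = line.split()[:2]
        positions.append(int(pos))
        counts.append(float(value))
    return positions, counts
```
</preformat>
<p>The function accepts any iterable of text lines, so an open file handle can be passed directly.</p>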
<p>TPP uses BWA (Burrows-Wheeler Aligner; [<xref ref-type="bibr" rid="pcbi.1004401.ref021">21</xref>]) to map reads to the genome. The workflow performed by TPP (<xref ref-type="fig" rid="pcbi.1004401.g004">Fig 4</xref>) can be briefly summarized as follows. First, read 1 is analyzed to identify the subset of reads that have a prefix matching the terminus of the Himar1 transposon (ACTTATCAGCCAACCTGTTA). The transposon prefix is stripped off, and the genomic suffix is mapped onto the genome to identify the TA site (and strand) represented by each read. Then, read 2 is analyzed to extract both a random nucleic-acid barcode (see <xref ref-type="fig" rid="pcbi.1004401.g004">Fig 4</xref>) and a genomic suffix. This genomic suffix is also mapped onto the genome and represents the end-point of the original DNA fragment (typically a few hundred bp away). All the reads mapping to a given TA site are reduced to unique “template” counts by discarding duplicates that have the same barcode and end-point, and these template counts are written out in .wig format (for input to TRANSIT). If barcodes were not applied during sample preparation or only single-ended data were collected, TPP can optionally be run without read 2, in which case raw read counts are output without reduction to unique template counts. In our experience, this barcode-based data reduction has proved useful in reducing PCR effects, which can artificially bias the read counts depending on which fragments amplify more efficiently. We have observed that this is especially true for noisier datasets, where read counts are highly variable (though recent optimizations in the PCR protocol have mitigated this problem somewhat [<xref ref-type="bibr" rid="pcbi.1004401.ref010">10</xref>]); the resulting template counts better reflect the true abundance of distinct mutants in the population.
Processing each dataset can take on the order of an hour (depending on the size of the dataset and the speed of the computer); this time is dominated by aligning the genomic parts of the reads to the genome using BWA. Ideally, datasets should contain 5-10 million pairs of reads, which are often reduced to ∼ 2 million unique templates, in order to have sufficient dynamic range of counts for analysis (for example, aiming for a nominal mean of ∼ 50 templates per TA site, estimated based on ∼ 75,000 TA sites in the <italic>M. tuberculosis</italic> genome, with library saturation of around 50%).</p>
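<p>The two central steps of this workflow, prefix stripping and reduction of reads to unique templates, can be illustrated with the sketch below. It is a simplification under stated assumptions: the prefix is matched exactly (no mismatches are tolerated), alignment is assumed to have been done already, and the function names and the tuple layout of the mapped reads are hypothetical.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict

# Terminus of the Himar1 transposon, as given in the text.
HIMAR1_PREFIX = "ACTTATCAGCCAACCTGTTA"


def strip_transposon_prefix(read1):
    """Return the genomic suffix of read 1, or None if the prefix is absent."""
    idx = read1.find(HIMAR1_PREFIX)
    if idx == -1:
        return None
    return read1[idx + len(HIMAR1_PREFIX):]


def template_counts(mapped_reads):
    """Reduce mapped reads to (read count, template count) per TA site.

    mapped_reads: iterable of (ta_site, barcode, fragment_end) tuples,
    one per read pair, produced after alignment.
    """
    reads = defaultdict(int)
    templates = defaultdict(set)
    for ta_site, barcode, fragment_end in mapped_reads:
        reads[ta_site] += 1
        # Reads sharing both barcode and end-point count as one template.
        templates[ta_site].add((barcode, fragment_end))
    return {site: (reads[site], len(templates[site])) for site in reads}
```
</preformat>
<p>Reads that share a TA site but differ in either barcode or end-point are kept as distinct templates, so only true duplicates are collapsed.</p>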
<fig id="pcbi.1004401.g004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g004</object-id>
<label>Fig 4</label>
<caption>
<title>TPP flowchart.</title>
<p>Reads in .fasta, .fastq or .fastq.gz format are taken as input and mapped to the genome to get read-counts at individual TA sites. A .wig formatted file is returned as output, containing the coordinates and the read-counts at all TA sites in the genome.</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g004"/>
</fig>
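<p>The sequencing-depth guideline given above follows from simple arithmetic, sketched here as a small helper. The function name is hypothetical, and the calculation assumes templates are spread only over the TA sites that carry an insertion.</p>
<preformat preformat-type="code">
```python
def expected_mean_templates(n_templates, n_ta_sites, saturation):
    """Mean templates per TA site that carries at least one insertion."""
    occupied_sites = n_ta_sites * saturation
    return n_templates / occupied_sites
```
</preformat>
<p>With 2 million templates, 75,000 TA sites and 50% saturation, <monospace>expected_mean_templates(2_000_000, 75_000, 0.5)</monospace> gives roughly 53 templates per occupied site, in line with the nominal target of about 50.</p>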
<p>Multiple statistics are calculated by TPP for diagnostic purposes. The primary metrics used to assess the quality of a TnSeq dataset are the insertion density (<italic>P</italic><sub><italic>ins</italic></sub>) and the mean read count over non-zero sites (NZmean), which should satisfy <italic>P</italic><sub><italic>ins</italic></sub> &gt; 30% and NZmean &gt; 10. Additional statistics are reported, such as the number of reads with valid Tn prefixes, the number of reads mapping to the genome (broken down into read 1, read 2, and both), the correlation of reads at each TA site on the forward versus reverse strand, the ratio of reads to templates, etc. In addition, specific nucleotide sequences representing the vector or primer are counted. If the number of mapped reads is low, the user can check whether a large fraction of reads match these sequences, which could indicate phage contamination in the library (left over from the original transfection in constructing the library) or excessive primer-dimers lacking genomic inserts generated during sample preparation.</p>
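<p>The two primary quality metrics can be computed directly from the per-site counts, as in the following sketch. The function name and the returned dictionary layout are hypothetical; the default thresholds are the guideline values quoted above (density above 30% and NZmean above 10), and the input is assumed to be a non-empty list of counts.</p>
<preformat preformat-type="code">
```python
def dataset_quality(counts, min_density=0.30, min_nzmean=10.0):
    """Compute insertion density and NZmean for one dataset.

    counts: non-empty list of read (or template) counts, one per TA site.
    """
    n_sites = len(counts)
    n_hit = sum(1 for c in counts if c > 0)
    # Fraction of TA sites with at least one insertion.
    density = n_hit / n_sites
    # Mean count over sites with at least one insertion.
    nzmean = sum(counts) / n_hit if n_hit else 0.0
    passes = density > min_density and nzmean > min_nzmean
    return {"density": density, "nzmean": nzmean, "passes": passes}
```
</preformat>
<p>A dataset that fails either threshold is flagged, which mirrors how the guideline values would be used to screen libraries before analysis.</p>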
</sec>
<sec id="sec008">
<title>Interface</title>
<p>The main TRANSIT interface (<xref ref-type="fig" rid="pcbi.1004401.g005">Fig 5</xref>) allows the user to select an annotation file (in a tab-separated format called “.prot_table”), which contains the definitions of genes and their coordinates. It must match the reference genome to which reads were mapped by TPP. Several .prot_tables are provided online for commonly used genomes, and a script for converting annotations for other organisms from Genbank to .prot_table format is also available.</p>
<fig id="pcbi.1004401.g005" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g005</object-id>
<label>Fig 5</label>
<caption>
<title>Picture of the main TRANSIT interface.</title>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g005"/>
</fig>
<p>From the main interface the user can also load datasets in .wig format (e.g. output by TPP), which contain the coordinates and template counts at TA coordinates throughout the genome. Once a dataset is loaded, the corresponding table will be populated with diagnostic information about the dataset, like density and mean read count. This information can be used to compare datasets and identify potential problems. TRANSIT also provides a way to create a scatter plot of the read counts in two datasets from the menu-bar at the top of the interface. In addition, the user can visualize read-counts throughout the genome using TRANSIT’s Track View (also found in the menu-bar; see <xref ref-type="fig" rid="pcbi.1004401.g001">Fig 1</xref>).</p>
<p>Once the user has picked the desired datasets for comparison, they can choose which analysis they wish to perform from the drop-down menu to the right. As soon as a method is selected, the right-hand panel of TRANSIT’s interface is automatically populated with the appropriate parameters for the method selected. The user may use the default parameters (which are intended to work well on most datasets) or change individual parameters as needed. Parameter definitions are provided in the documentation included with TRANSIT.</p>
<p>After TRANSIT completes an analysis, it will create output files in the specified location and automatically add them to a list in the results window to keep track of the results files created in a session. Output files are tab-separated so they can be opened in the user’s preferred spreadsheet software (e.g. Excel). TRANSIT can also open results files in a new window when the user selects them from the list of results files and clicks on the “Display Table” button. This list also allows the user to generate custom graphs of the results, such as volcano plots (which plot the log-fold change in read counts against adjusted p-values; see <xref ref-type="fig" rid="pcbi.1004401.g006">Fig 6</xref>).</p>
<fig id="pcbi.1004401.g006" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Volcano plot of resampling results comparing replicates grown <italic>in vitro</italic> versus <italic>in vivo</italic>.</title>
<p>Significant hits have <italic>q</italic> &lt; 0.05, or equivalently −<italic>log</italic><sub>10</sub> <italic>q</italic> &gt; 1.3. Note that some genes have increased essentiality (fewer insertions; left side) and some decreased essentiality (right side).</p>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g006"/>
</fig>
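<p>The coordinates of a point in such a volcano plot can be derived from the summed counts and the adjusted p-value of a gene, as sketched below. The pseudocount, the floor applied to q-values of zero, and the function name are assumptions made for this illustration and are not taken from TRANSIT.</p>
<preformat preformat-type="code">
```python
import math


def volcano_point(count_ctrl, count_exp, q_value,
                  pseudocount=1.0, q_floor=1e-4, alpha=0.05):
    """Return (log2 fold change, -log10 q, is_significant) for one gene."""
    # The pseudocount avoids division by zero for genes without reads.
    log2_fc = math.log2((count_exp + pseudocount) / (count_ctrl + pseudocount))
    # A q-value of exactly zero is floored so the logarithm stays finite.
    neg_log_q = -math.log10(max(q_value, q_floor))
    return log2_fc, neg_log_q, alpha > q_value
```
</preformat>
<p>Genes with a positive log-fold change have more reads in the experimental condition than in the control, and genes whose q-value is below the chosen cutoff lie above the corresponding horizontal line in the plot.</p>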
<p>The results window (<xref ref-type="fig" rid="pcbi.1004401.g007">Fig 7</xref>) allows the user to sort on the desired column (e.g. p-values) thus facilitating the identification of genes of interest. The user can right-click on a gene to display the gene in Track View to examine the insertion patterns (<xref ref-type="fig" rid="pcbi.1004401.g001">Fig 1</xref>), or get other method-specific options (like histograms of the permutations obtained with the resampling method; See <xref ref-type="fig" rid="pcbi.1004401.g003">Fig 3</xref>).</p>
<fig id="pcbi.1004401.g007" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.g007</object-id>
<label>Fig 7</label>
<caption>
<title>Table of results obtained from resampling, comparing replicates grown in glycerol versus cholesterol.</title>
</caption>
<graphic mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.g007"/>
</fig>
</sec>
</sec>
<sec id="sec009" sec-type="results">
<title>Results</title>
<p>We illustrate the utility of TRANSIT by analyzing several published TnSeq datasets of <italic>M. tuberculosis</italic> H37Rv as well as <italic>H. influenzae</italic>. The H37Rv strain has a total of 74,605 TA sites distributed randomly throughout the genome. It has 3989 genes with an average of 14 TA sites per gene, with almost all genes containing at least one TA site. The TnSeq datasets analyzed came from libraries grown on glycerol, a common <italic>in vitro</italic> carbon source, and cholesterol, a carbon source required for infection [<xref ref-type="bibr" rid="pcbi.1004401.ref022">22</xref>]. Datasets were obtained in multiple replicates, with two replicates grown on glycerol and three replicates grown on cholesterol. The insertion density of the replicates was in the range of 40% to 60%, with mean template-counts ranging from 50 to 90 per TA site.</p>
<sec id="sec010">
<title>Bayesian/Gumbel Method</title>
<p>To identify essential genes we analyzed the datasets grown on glycerol using the Bayesian/Gumbel Method, which performs an analysis on an individual condition. After loading the glycerol replicates and the annotation file into TRANSIT, and running the Gumbel method with default parameters, we obtained an output file with results.</p>
<p>A total of 674 genes were found to be essential (<inline-formula id="pcbi.1004401.e005"><alternatives><graphic id="pcbi.1004401.e005g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004401.e005"/><mml:math id="M5" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Z</mml:mi> <mml:mo>‾</mml:mo></mml:mover> <mml:mi>i</mml:mi></mml:msub> <mml:mo>&gt;</mml:mo> <mml:msub><mml:mi>θ</mml:mi> <mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:math></alternatives></inline-formula>, where <italic>θ</italic><sub><italic>e</italic></sub> is a threshold determined by the method used to control the FDR) by the Gumbel method (16.9% of the 3989 genes), consistent with expectations that typically 15% of bacterial genomes are necessary for growth [<xref ref-type="bibr" rid="pcbi.1004401.ref019">19</xref>, <xref ref-type="bibr" rid="pcbi.1004401.ref023">23</xref>]. The Gumbel method also identified 2670 non-essential genes (<inline-formula id="pcbi.1004401.e006"><alternatives><graphic id="pcbi.1004401.e006g" mimetype="image" xlink:type="simple" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.1004401.e006"/><mml:math id="M6" display="inline" overflow="scroll"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Z</mml:mi> <mml:mo>‾</mml:mo></mml:mover> <mml:mi>i</mml:mi></mml:msub> <mml:mo>&lt;</mml:mo> <mml:msub><mml:mi>θ</mml:mi> <mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></alternatives></inline-formula>), with the remainder either labeled as Uncertain (because the posterior probability did not exceed the significance thresholds) or too short for reliable analysis. <xref ref-type="table" rid="pcbi.1004401.t001">Table 1</xref> contains a summary of the classifications obtained by the Gumbel method.</p>
<table-wrap id="pcbi.1004401.t001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.t001</object-id>
<label>Table 1</label>
<caption>
<title>Table of Bayesian/Gumbel Results for H37Rv grown in glycerol.</title>
</caption>
<alternatives>
<graphic id="pcbi.1004401.t001g" mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.t001"/>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"/>
<col align="center" valign="middle" span="1"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Type of Gene</th>
<th align="center" rowspan="1" colspan="1"># of Genes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Essential</td>
<td align="center" rowspan="1" colspan="1">674</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Uncertain</td>
<td align="center" rowspan="1" colspan="1">307</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Non-Essential</td>
<td align="center" rowspan="1" colspan="1">2670</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Too Small</td>
<td align="center" rowspan="1" colspan="1">338</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Total</td>
<td align="center" rowspan="1" colspan="1">3989</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t001fn001">
<p>Breakdown of essentiality calls for the glycerol datasets obtained by the Bayesian/Gumbel method. Essential and Non-Essential genes are those genes whose posterior probability of essentiality exceeds the dynamic thresholds of essentiality. Uncertain genes are those that do not exceed these thresholds, and “Too Small” represents those genes that are too small for reliable analysis.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Well-known essential genes like GyrA (DNA gyrase A) and RpoB (DNA-directed RNA-polymerase) were identified as essential by the Gumbel method, both achieving a posterior probability of essentiality of 1.0. Those genes are completely devoid of insertions (aside from a few insertions at the N/C termini). However, one of the strengths of the Gumbel method is that it can also identify genes which contain both essential and non-essential regions, indicative of essential domains. An example of such a gene is Rv3910, which codes for an essential MviN domain [<xref ref-type="bibr" rid="pcbi.1004401.ref024">24</xref>]. TRANSIT identifies this gene as essential, as it contains a large gap of 32 TA sites in a row without insertions, despite the fact that it has insertions at 10 out of the remaining 17 sites. DeJesus et al. [<xref ref-type="bibr" rid="pcbi.1004401.ref011">11</xref>] discuss the concordance of these results with previous essentiality analysis using the hybridization-based TraSH method [<xref ref-type="bibr" rid="pcbi.1004401.ref025">25</xref>].</p>
</sec>
<sec id="sec011">
<title>Hidden Markov Model</title>
<p>Another approach to analyzing an individual condition is the Hidden Markov Model (HMM). As before, the glycerol replicates were loaded into TRANSIT and the HMM method was run using default parameters.</p>
<p>The HMM analysis classified 16.3% of the sites in the genome as belonging to the Essential state, 5.4% to the Growth-Defect state, 77.1% to the Non-Essential state, and 1.2% to the Growth-Advantage state (see <xref ref-type="table" rid="pcbi.1004401.t002">Table 2</xref>). One advantage of this method is that it is not limited to gene boundaries but instead can assess essentiality of entire regions. For example, the PDIM locus (<italic>fadD26</italic>, <italic>ppsABCDE</italic>, <italic>mas</italic>; which spans ∼ 38 kb, 594 TA sites, and 10 genes) is required for virulence <italic>in vivo</italic> but has high metabolic costs for the organism <italic>in vitro</italic>, and its disruption therefore results in a growth advantage. Indeed, sites in this region (e.g. Rv2930-Rv2939) are labeled “GA” (the mean read count in this region is 502.0, a 1.9-fold increase from the global mean), indicating that disruption of the PDIM locus under standard <italic>in vitro</italic> conditions affords an advantage to the organism.</p>
<table-wrap id="pcbi.1004401.t002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.t002</object-id>
<label>Table 2</label>
<caption>
<title>Table of HMM Results for H37Rv grown in glycerol.</title>
</caption>
<alternatives>
<graphic id="pcbi.1004401.t002g" mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.t002"/>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"/>
<col align="center" valign="middle" span="1"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Type of Region</th>
<th align="center" rowspan="1" colspan="1">% of TA Sites (out of 74,605)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Essential</td>
<td align="char" char="." rowspan="1" colspan="1">16.3%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Growth Defect</td>
<td align="char" char="." rowspan="1" colspan="1">5.4%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Non-Essential</td>
<td align="char" char="." rowspan="1" colspan="1">77.1%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Growth Advantage</td>
<td align="char" char="." rowspan="1" colspan="1">1.2%</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t002fn001">
<p>Distribution of state calls for the glycerol datasets obtained by the HMM method. Essential states represent those regions which are mostly devoid of insertions. Non-Essential regions contain read-counts that are close to the mean read-count in the dataset. Growth-Defect regions and Growth-Advantage regions represent those regions which have significantly suppressed or increased read-counts.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="sec012">
<title>Resampling</title>
<p>The resampling method can be used for comparative analysis of different growth conditions. After adding glycerol replicates as Control datasets and cholesterol replicates as Experimental datasets, the resampling method was run with default parameters.</p>
<p>A total of 28 genes were identified as differentially essential (adjusted p-value &lt; 0.05; see <xref ref-type="table" rid="pcbi.1004401.t003">Table 3</xref> for the number of conditionally essential genes identified). Several of these genes are known to be uniquely required for growth in glycerol or cholesterol. For example, glycerol kinase (GlpK) is necessary for growth on glycerol but should not be necessary for growth on other carbon sources like cholesterol. Indeed, GlpK had a total of 1968 reads in the cholesterol condition, and only 22 total reads in glycerol, achieving an adjusted p-value (or q-value) of 0.0 with the resampling method.</p>
<table-wrap id="pcbi.1004401.t003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcbi.1004401.t003</object-id>
<label>Table 3</label>
<caption>
<title>Table of results for comparative analysis between glycerol and cholesterol.</title>
</caption>
<alternatives>
<graphic id="pcbi.1004401.t003g" mimetype="image" xlink:type="simple" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.t003"/>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="middle" span="1"/>
<col align="center" valign="middle" span="1"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Type</th>
<th align="center" rowspan="1" colspan="1">Count</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1"># of genes with <italic>q</italic> &lt; 0.05 in resampling</td>
<td align="center" rowspan="1" colspan="1">28</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"># of genes essential in glycerol but not cholesterol</td>
<td align="center" rowspan="1" colspan="1">8</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"># of genes essential in cholesterol but not glycerol</td>
<td align="center" rowspan="1" colspan="1">20</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="t003fn001">
<p>Breakdown of the number of differentially essential genes identified by the resampling method, in each condition (glycerol and cholesterol). Differentially essential genes are those with an adjusted p-value <italic>q</italic> &lt; 0.05.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>Among those genes identified as necessary for growth in cholesterol were several members of the Mce family of proteins, which are believed to be involved in lipid catabolism [<xref ref-type="bibr" rid="pcbi.1004401.ref026">26</xref>]. These included Mce4A, Mce4C, Mce4D, and Mce4F, which had 3453, 1817, 4896, and 5168 reads in glycerol, respectively, and 302, 61, 180, and 32 reads in cholesterol.</p>
<p>Several of the genes identified as differentially essential actually contain read-counts that are significantly suppressed in one condition compared to the other, indicating a selection against insertions, hence suggesting a fitness cost to the organism. An example of such a gene is Rv3200c, which contains 167 reads in glycerol and 1,755 in cholesterol, despite having a substantial number of TA sites with insertions in both conditions (7 out of 13 in glycerol and 9 out of 13 in cholesterol), thus showing the gene can tolerate insertions in both conditions. The relative suppression in read-counts alone is enough to achieve a q-value of 0, suggesting that it is conditionally essential.</p>
<p>To illustrate the comparative analysis on datasets from a different organism, TRANSIT was used to perform a comparative analysis of TnSeq datasets of <italic>H. influenzae</italic>. Gawronski et al. [<xref ref-type="bibr" rid="pcbi.1004401.ref003">3</xref>] compared two datasets of <italic>H. influenzae</italic> grown <italic>in vitro</italic> and one dataset derived from lung samples after passaging through mice. The libraries were relatively sparse, with 39% mean density <italic>in vitro</italic> and 26% for the lung dataset. Gawronski et al. identified a total of 136 genes necessary for growth in lung using a combination of the log ratio of read-counts between conditions and the insertion density <italic>in vitro</italic>. Using TRANSIT’s comparative analysis, 342 genes are obtained using a <italic>q</italic> &lt; 0.05 cutoff. Out of the 136 genes identified by Gawronski et al., 133 (98%) are identified by TRANSIT as differentially essential.</p>
</sec>
</sec>
<sec id="sec013">
<title>Availability and Future Directions</title>
<p>TRANSIT standardizes many of the complex steps in the workflow for processing and analyzing TnSeq datasets, and provides a user-friendly interface that facilitates analysis of TnSeq data, particularly for libraries generated using the Himar1 transposon. The current version provides three different analysis methods, including a method for comparative analysis of conditional essentiality between conditions.</p>
<p>TRANSIT is written in the Python programming language, and can run on Linux, Macs, or Windows PCs. TRANSIT requires several Python modules (like SciPy for scientific computation, and wxPython for the user interface), and these dependencies must also be installed. Installation instructions are provided in the manual.</p>
<p>TRANSIT is an open-source software platform that can be extended in future releases to include other analysis methods as they are developed. Source code for TRANSIT is distributed under the GNU GPL v3 license and is available at the following GitHub repository: <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://github.com/mad-lab/transit">https://github.com/mad-lab/transit</ext-link>. The package includes the Python implementation of TRANSIT and TPP, the <italic>M. tuberculosis</italic> TnSeq data used in this article, and documentation.</p>
</sec>
<sec id="sec014">
<title>Supporting Information</title>
<supplementary-material id="pcbi.1004401.s001" position="float" xlink:href="info:doi/10.1371/journal.pcbi.1004401.s001" mimetype="application/x-compressed" xlink:type="simple">
<label>S1 Data</label>
<caption>
<title>Source code and datasets.</title>
<p>Source Code for TRANSIT and TPP, and datasets used to obtain results. Please see the GitHub Repository <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://github.com/mad-lab/transit">https://github.com/mad-lab/transit</ext-link> to obtain the latest version of the software.</p>
<p>(GZ)</p>
</caption>
</supplementary-material>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="pcbi.1004401.ref001">
<label>1</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Barquist</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Boinett</surname> <given-names>CJ</given-names></name>, <name name-style="western"><surname>Cain</surname> <given-names>AK</given-names></name>. <article-title>Approaches to querying bacterial genomes with transposon-insertion sequencing</article-title>. <source>RNA Biol</source>. <year>2013</year> <month>Jul</month>;<volume>10</volume>(<issue>7</issue>):<fpage>1161</fpage>–<lpage>1169</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.4161/rna.24765" xlink:type="simple">10.4161/rna.24765</ext-link></comment> <object-id pub-id-type="pmid">23635712</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref002">
<label>2</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Goodman</surname> <given-names>AL</given-names></name>, <name name-style="western"><surname>McNulty</surname> <given-names>NP</given-names></name>, <name name-style="western"><surname>Zhao</surname> <given-names>Y</given-names></name>, <name name-style="western"><surname>Leip</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Mitra</surname> <given-names>RD</given-names></name>, <name name-style="western"><surname>Lozupone</surname> <given-names>CA</given-names></name>, <etal>et al</etal>. <article-title>Identifying genetic determinants needed to establish a human gut symbiont in its habitat</article-title>. <source>Cell Host Microbe</source>. <year>2009</year> <month>Sep</month>;<volume>6</volume>(<issue>3</issue>):<fpage>279</fpage>–<lpage>289</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.chom.2009.08.003" xlink:type="simple">10.1016/j.chom.2009.08.003</ext-link></comment> <object-id pub-id-type="pmid">19748469</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref003">
<label>3</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Gawronski</surname> <given-names>JD</given-names></name>, <name name-style="western"><surname>Wong</surname> <given-names>SMS</given-names></name>, <name name-style="western"><surname>Giannoukos</surname> <given-names>G</given-names></name>, <name name-style="western"><surname>Ward</surname> <given-names>DV</given-names></name>, <name name-style="western"><surname>Akerley</surname> <given-names>BJ</given-names></name>. <article-title>Tracking insertion mutants within libraries by deep sequencing and a genome-wide screen for Haemophilus genes required in the lung</article-title>. <source>PNAS</source>. <year>2009</year>;<volume>106</volume>(<issue>38</issue>):<fpage>16422</fpage>–<lpage>16427</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.0906627106" xlink:type="simple">10.1073/pnas.0906627106</ext-link></comment> <object-id pub-id-type="pmid">19805314</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref004">
<label>4</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>van Opijnen</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Bodi</surname> <given-names>KL</given-names></name>, <name name-style="western"><surname>Camilli</surname> <given-names>A</given-names></name>. <article-title>Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms</article-title>. <source>Nat Methods</source>. <year>2009</year> <month>Oct</month>;<volume>6</volume>(<issue>10</issue>):<fpage>767</fpage>–<lpage>772</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nmeth.1377" xlink:type="simple">10.1038/nmeth.1377</ext-link></comment> <object-id pub-id-type="pmid">19767758</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref005">
<label>5</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Langridge</surname> <given-names>GC</given-names></name>, <name name-style="western"><surname>Phan</surname> <given-names>MD</given-names></name>, <name name-style="western"><surname>Turner</surname> <given-names>DJ</given-names></name>, <name name-style="western"><surname>Perkins</surname> <given-names>TT</given-names></name>, <name name-style="western"><surname>Parts</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Haase</surname> <given-names>J</given-names></name>, <etal>et al</etal>. <article-title>Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants</article-title>. <source>Genome Research</source>. <year>2009</year>;<volume>19</volume>(<issue>12</issue>):<fpage>2308</fpage>–<lpage>2316</lpage>. <comment>Available from: <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/19826075">http://www.ncbi.nlm.nih.gov/pubmed/19826075</ext-link></comment> <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.097097.109" xlink:type="simple">10.1101/gr.097097.109</ext-link></comment> <object-id pub-id-type="pmid">19826075</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref006">
<label>6</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>van Opijnen</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Camilli</surname> <given-names>A</given-names></name>. <article-title>Transposon insertion sequencing: a new tool for systems-level analysis of microorganisms</article-title>. <source>Nat Rev Microbiol</source>. <year>2013</year> <month>Jul</month>;<volume>11</volume>(<issue>7</issue>):<fpage>435</fpage>–<lpage>442</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrmicro3033" xlink:type="simple">10.1038/nrmicro3033</ext-link></comment> <object-id pub-id-type="pmid">23712350</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref007">
<label>7</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Lampe</surname> <given-names>DJ</given-names></name>, <name name-style="western"><surname>Churchill</surname> <given-names>ME</given-names></name>, <name name-style="western"><surname>Robertson</surname> <given-names>HM</given-names></name>. <article-title>A purified mariner transposase is sufficient to mediate transposition in vitro</article-title>. <source>The European Molecular Biology Organization Journal</source>. <year>1996</year>;<volume>15</volume>(<issue>19</issue>):<fpage>5470</fpage>–<lpage>5479</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1004401.ref008">
<label>8</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Zomer</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Burghout</surname> <given-names>P</given-names></name>, <name name-style="western"><surname>Bootsma</surname> <given-names>HJ</given-names></name>, <name name-style="western"><surname>Hermans</surname> <given-names>PW</given-names></name>, <name name-style="western"><surname>van Hijum</surname> <given-names>SA</given-names></name>. <article-title>ESSENTIALS: software for rapid analysis of high throughput transposon insertion sequencing data</article-title>. <source>PLoS ONE</source>. <year>2012</year>;<volume>7</volume>(<issue>8</issue>):<fpage>e43012</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0043012" xlink:type="simple">10.1371/journal.pone.0043012</ext-link></comment> <object-id pub-id-type="pmid">22900082</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref009">
<label>9</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Deng</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Su</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Lin</surname> <given-names>X</given-names></name>, <name name-style="western"><surname>Hassett</surname> <given-names>DJ</given-names></name>, <name name-style="western"><surname>Lu</surname> <given-names>LJ</given-names></name>. <article-title>A statistical framework for improving genomic annotations of prokaryotic essential genes</article-title>. <source>PLoS ONE</source>. <year>2013</year>;<volume>8</volume>(<issue>3</issue>):<fpage>e58178</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0058178" xlink:type="simple">10.1371/journal.pone.0058178</ext-link></comment> <object-id pub-id-type="pmid">23520492</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref010">
<label>10</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Zhang</surname> <given-names>YJ</given-names></name>, <name name-style="western"><surname>Ioerger</surname> <given-names>TR</given-names></name>, <name name-style="western"><surname>Huttenhower</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Long</surname> <given-names>JE</given-names></name>, <name name-style="western"><surname>Sassetti</surname> <given-names>CM</given-names></name>, <name name-style="western"><surname>Sacchettini</surname> <given-names>JC</given-names></name>, <etal>et al</etal>. <article-title>Global assessment of genomic regions required for growth in Mycobacterium tuberculosis</article-title>. <source>PLoS Pathog</source>. <year>2012</year> <month>Sep</month>;<volume>8</volume>(<issue>9</issue>):<fpage>e1002946</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.ppat.1002946" xlink:type="simple">10.1371/journal.ppat.1002946</ext-link></comment> <object-id pub-id-type="pmid">23028335</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref011">
<label>11</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>DeJesus</surname> <given-names>MA</given-names></name>, <name name-style="western"><surname>Zhang</surname> <given-names>YJ</given-names></name>, <name name-style="western"><surname>Sassetti</surname> <given-names>CM</given-names></name>, <name name-style="western"><surname>Rubin</surname> <given-names>EJ</given-names></name>, <name name-style="western"><surname>Sacchettini</surname> <given-names>JC</given-names></name>, <name name-style="western"><surname>Ioerger</surname> <given-names>TR</given-names></name>. <article-title>Bayesian analysis of gene essentiality based on sequencing of transposon insertion libraries</article-title>. <source>Bioinformatics</source>. <year>2013</year> <month>Mar</month>;<volume>29</volume>(<issue>6</issue>):<fpage>695</fpage>–<lpage>703</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btt043" xlink:type="simple">10.1093/bioinformatics/btt043</ext-link></comment> <object-id pub-id-type="pmid">23361328</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref012">
<label>12</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Pritchard</surname> <given-names>JR</given-names></name>, <name name-style="western"><surname>Chao</surname> <given-names>MC</given-names></name>, <name name-style="western"><surname>Abel</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Davis</surname> <given-names>BM</given-names></name>, <name name-style="western"><surname>Baranowski</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Zhang</surname> <given-names>YJ</given-names></name>, <etal>et al</etal>. <article-title>ARTIST: high-resolution genome-wide assessment of fitness using transposon-insertion sequencing</article-title>. <source>PLoS Genet</source>. <year>2014</year> <month>Nov</month>;<volume>10</volume>(<issue>11</issue>):<fpage>e1004782</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pgen.1004782" xlink:type="simple">10.1371/journal.pgen.1004782</ext-link></comment> <object-id pub-id-type="pmid">25375795</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref013">
<label>13</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>DeJesus</surname> <given-names>MA</given-names></name>, <name name-style="western"><surname>Ioerger</surname> <given-names>TR</given-names></name>. <article-title>A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data</article-title>. <source>BMC Bioinformatics</source>. <year>2013</year>;<volume>14</volume>:<fpage>303</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-14-303" xlink:type="simple">10.1186/1471-2105-14-303</ext-link></comment> <object-id pub-id-type="pmid">24103077</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref014">
<label>14</label>
<mixed-citation xlink:type="simple" publication-type="book">
<name name-style="western"><surname>Long</surname> <given-names>JE</given-names></name>, <name name-style="western"><surname>DeJesus</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Ward</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Baker</surname> <given-names>RE</given-names></name>, <name name-style="western"><surname>Ioerger</surname> <given-names>TR</given-names></name>, <name name-style="western"><surname>Sassetti</surname> <given-names>CM</given-names></name>. <chapter-title>Identifying essential genes in <italic>Mycobacterium tuberculosis</italic> by global phenotypic profiling</chapter-title>. In: <name name-style="western"><surname>Lu</surname> <given-names>LJ</given-names></name>, editor. <source>Methods in Molecular Biology: Gene Essentiality</source>. vol. <volume>1279</volume>. <publisher-name>Springer</publisher-name>; <year>2015</year>.</mixed-citation>
</ref>
<ref id="pcbi.1004401.ref015">
<label>15</label>
<mixed-citation xlink:type="simple" publication-type="other">Blades NJ, Broman KW. Estimating the Number of Essential Genes in a Genome by Random Transposon Mutagenesis. Dept. of Biostatistics Working Papers, Johns Hopkins University; 2002. MSU-CSE-00-2. <comment>Available from: <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://biostats.bepress.com/jhubiostat/paper15">http://biostats.bepress.com/jhubiostat/paper15</ext-link></comment></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref016">
<label>16</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Solaimanpour</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Sarmiento</surname> <given-names>F</given-names></name>, <name name-style="western"><surname>Mrazek</surname> <given-names>J</given-names></name>. <article-title>Tn-seq explorer: a tool for analysis of high-throughput sequencing data of transposon mutant libraries</article-title>. <source>PLoS ONE</source>. <year>2015</year>;<volume>10</volume>(<issue>5</issue>):<fpage>e0126070</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0126070" xlink:type="simple">10.1371/journal.pone.0126070</ext-link></comment> <object-id pub-id-type="pmid">25938432</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref017">
<label>17</label>
<mixed-citation xlink:type="simple" publication-type="other">Muller P, Parmigiani G, Rice K. FDR and Bayesian Multiple Comparisons Rules. In: Proceedings of the ISBA 8th World Meeting on Bayesian Statistics. Benidorm, Spain; 2006.</mixed-citation>
</ref>
<ref id="pcbi.1004401.ref018">
<label>18</label>
<mixed-citation xlink:type="simple" publication-type="other">Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the IEEE; 1989. p. 257–286.</mixed-citation>
</ref>
<ref id="pcbi.1004401.ref019">
<label>19</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Koonin</surname> <given-names>EV</given-names></name>. <article-title>Comparative genomics, minimal gene-sets and the last universal common ancestor</article-title>. <source>Nature Reviews Microbiology</source>. <year>2003</year>;<volume>1</volume>(<issue>2</issue>):<fpage>127</fpage>–<lpage>136</lpage>. <comment>Available from: <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/15035042">http://www.ncbi.nlm.nih.gov/pubmed/15035042</ext-link></comment> <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrmicro751" xlink:type="simple">10.1038/nrmicro751</ext-link></comment> <object-id pub-id-type="pmid">15035042</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref020">
<label>20</label>
<mixed-citation xlink:type="simple" publication-type="other">DeJesus MA, Ioerger TR. Reducing type I errors in Tn-Seq experiments by correcting the skew in read count distributions. In: 7th International Conference on Bioinformatics and Computational Biology (BICoB 2015); 2015.</mixed-citation>
</ref>
<ref id="pcbi.1004401.ref021">
<label>21</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Li</surname> <given-names>H</given-names></name>, <name name-style="western"><surname>Durbin</surname> <given-names>R</given-names></name>. <article-title>Fast and accurate short read alignment with Burrows-Wheeler transform</article-title>. <source>Bioinformatics</source>. <year>2009</year> <month>Jul</month>;<volume>25</volume>(<issue>14</issue>):<fpage>1754</fpage>–<lpage>1760</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btp324" xlink:type="simple">10.1093/bioinformatics/btp324</ext-link></comment> <object-id pub-id-type="pmid">19451168</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref022">
<label>22</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Griffin</surname> <given-names>JE</given-names></name>, <name name-style="western"><surname>Gawronski</surname> <given-names>JD</given-names></name>, <name name-style="western"><surname>DeJesus</surname> <given-names>MA</given-names></name>, <name name-style="western"><surname>Ioerger</surname> <given-names>TR</given-names></name>, <name name-style="western"><surname>Akerley</surname> <given-names>BJ</given-names></name>, <name name-style="western"><surname>Sassetti</surname> <given-names>CM</given-names></name>. <article-title>High-Resolution Phenotypic Profiling Defines Genes Essential for Mycobacterial Growth and Cholesterol Catabolism</article-title>. <source>PLoS Pathog</source>. <year>2011</year> <month>Sep</month>;<volume>7</volume>(<issue>9</issue>):<fpage>e1002251</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.ppat.1002251" xlink:type="simple">10.1371/journal.ppat.1002251</ext-link></comment> <object-id pub-id-type="pmid">21980284</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref023">
<label>23</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Glass</surname> <given-names>JI</given-names></name>, <name name-style="western"><surname>Assad-Garcia</surname> <given-names>N</given-names></name>, <name name-style="western"><surname>Alperovich</surname> <given-names>N</given-names></name>, <name name-style="western"><surname>Yooseph</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Lewis</surname> <given-names>MR</given-names></name>, <name name-style="western"><surname>Maruf</surname> <given-names>M</given-names></name>, <etal>et al</etal>. <source>PNAS</source>. <year>2006</year>;<volume>103</volume>(<issue>2</issue>):<fpage>425</fpage>–<lpage>430</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1004401.ref024">
<label>24</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Gee</surname> <given-names>CL</given-names></name>, <name name-style="western"><surname>Papavinasasundaram</surname> <given-names>KG</given-names></name>, <name name-style="western"><surname>Blair</surname> <given-names>SR</given-names></name>, <name name-style="western"><surname>Baer</surname> <given-names>CE</given-names></name>, <name name-style="western"><surname>Falick</surname> <given-names>AM</given-names></name>, <name name-style="western"><surname>King</surname> <given-names>DS</given-names></name>, <etal>et al</etal>. <article-title>A phosphorylated pseudokinase complex controls cell wall synthesis in mycobacteria</article-title>. <source>Sci Signal</source>. <year>2012</year>;<volume>5</volume>:<fpage>ra7</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1126/scisignal.2002525" xlink:type="simple">10.1126/scisignal.2002525</ext-link></comment> <object-id pub-id-type="pmid">22275220</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref025">
<label>25</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Sassetti</surname> <given-names>CM</given-names></name>, <name name-style="western"><surname>Boyd</surname> <given-names>DH</given-names></name>, <name name-style="western"><surname>Rubin</surname> <given-names>EJ</given-names></name>. <article-title>Genes required for mycobacterial growth defined by high density mutagenesis</article-title>. <source>Molecular Microbiology</source>. <year>2003</year>;<volume>48</volume>(<issue>1</issue>):<fpage>77</fpage>–<lpage>84</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1046/j.1365-2958.2003.03425.x" xlink:type="simple">10.1046/j.1365-2958.2003.03425.x</ext-link></comment> <object-id pub-id-type="pmid">12657046</object-id></mixed-citation>
</ref>
<ref id="pcbi.1004401.ref026">
<label>26</label>
<mixed-citation xlink:type="simple" publication-type="journal">
<name name-style="western"><surname>Kendall</surname> <given-names>SL</given-names></name>, <name name-style="western"><surname>Withers</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Soffair</surname> <given-names>CN</given-names></name>, <name name-style="western"><surname>Moreland</surname> <given-names>NJ</given-names></name>, <name name-style="western"><surname>Gurcha</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Sidders</surname> <given-names>B</given-names></name>, <etal>et al</etal>. <article-title>A highly conserved transcriptional repressor controls a large regulon involved in lipid degradation in Mycobacterium smegmatis and Mycobacterium tuberculosis</article-title>. <source>Mol Microbiol</source>. <year>2007</year> <month>Aug</month>;<volume>65</volume>(<issue>3</issue>):<fpage>684</fpage>–<lpage>699</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1111/j.1365-2958.2007.05827.x" xlink:type="simple">10.1111/j.1365-2958.2007.05827.x</ext-link></comment> <object-id pub-id-type="pmid">17635188</object-id></mixed-citation>
</ref>
</ref-list>
</back>
</article>