<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="EN"><front><journal-meta><journal-id journal-id-type="publisher-id">plos</journal-id><journal-id journal-id-type="publisher">pcbi</journal-id><journal-id journal-id-type="allenpress-id">plcb</journal-id><journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id><journal-id journal-id-type="pmc">ploscomp</journal-id><!--===== Grouping journal title elements =====--><journal-title-group><journal-title>PLoS Computational Biology</journal-title></journal-title-group><issn pub-type="ppub">1553-734X</issn><issn pub-type="epub">1553-7358</issn><publisher><publisher-name>Public Library of Science</publisher-name><publisher-loc>San Francisco, USA</publisher-loc></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.1371/journal.pcbi.0030222</article-id><article-id pub-id-type="publisher-id">07-PLCB-RA-0312R2</article-id><article-id pub-id-type="sici">plcb-03-11-09</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline"><subject>Computational Biology</subject><subject>Genetics and Genomics</subject></subj-group><subj-group subj-group-type="System Taxonomy"><subject>Mus (mouse)</subject><subject>Homo (human)</subject></subj-group></article-categories><title-group><article-title>Computational Analysis of Mouse piRNA Sequence and Biogenesis</article-title><alt-title alt-title-type="running-head">piRNA Sequence Analysis</alt-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Betel</surname><given-names>Doron</given-names></name><xref ref-type="aff" rid="aff1">
            <sup>1</sup>
          </xref><xref ref-type="corresp" rid="cor1">
            <sup>*</sup>
          </xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sheridan</surname><given-names>Robert</given-names></name><xref ref-type="aff" rid="aff1">
            <sup>1</sup>
          </xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Marks</surname><given-names>Debora S</given-names></name><xref ref-type="aff" rid="aff2">
            <sup>2</sup>
          </xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sander</surname><given-names>Chris</given-names></name><xref ref-type="aff" rid="aff1">
            <sup>1</sup>
          </xref></contrib></contrib-group><aff id="aff1">
        <label>1</label>
        <addr-line>
				 Computational and Systems Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
			</addr-line>
      </aff><aff id="aff2">
        <label>2</label>
        <addr-line>
				 Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
			</addr-line>
      </aff><contrib-group><contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Kim</surname><given-names>Narry</given-names></name><role>Editor</role><xref ref-type="aff" rid="edit1"/></contrib></contrib-group><aff id="edit1">Seoul National University, Republic of Korea</aff><author-notes><corresp id="cor1">* To whom correspondence should be addressed. E-mail: <email xlink:type="simple">betel@cbio.mskcc.org</email></corresp><fn fn-type="con" id="ack1"><p> DB and CS conceived and designed the experiments and wrote the paper. DB performed the experiments. DB and RS analyzed the data. DB, RS, and DSM contributed reagents/materials/analysis tools.</p></fn><fn fn-type="conflict" id="ack3"><p> The authors have declared that no competing interests exist.</p></fn></author-notes><pub-date pub-type="ppub"><month>11</month><year>2007</year></pub-date><pub-date pub-type="epub"><day>9</day><month>11</month><year>2007</year></pub-date><pub-date pub-type="epreprint"><day>26</day><month>9</month><year>2007</year></pub-date><volume>3</volume><issue>11</issue><elocation-id>e222</elocation-id><history><date date-type="received"><day>6</day><month>6</month><year>2007</year></date><date date-type="accepted"><day>27</day><month>9</month><year>2007</year></date></history><!--===== Grouping copyright info into permissions =====--><permissions><copyright-year>2007</copyright-year><copyright-holder> Betel et al</copyright-holder><license><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p></license></permissions><abstract><p>The recent discovery of a new class of 30-nucleotide long RNAs in mammalian testes, called PIWI-interacting RNA (piRNA), with similarities to microRNAs and repeat-associated small interfering RNAs (rasiRNAs), has raised puzzling 
questions regarding their biogenesis and function. We report a comparative analysis of currently available piRNA sequence data from the pachytene stage of mouse spermatogenesis that sheds light on their sequence diversity and mechanism of biogenesis. We conclude that (i) there are at least four times as many piRNAs in mouse testes as currently known; (ii) piRNAs, which originate from long precursor transcripts, are generated by quasi-random enzymatic processing that is guided by a weak sequence signature at the piRNA 5′ ends, resulting in a large number of distinct sequences; and (iii) many of the piRNA clusters contain inverted repeat segments capable of forming double-strand RNA fold-back segments that may initiate piRNA processing analogous to transposon silencing.</p></abstract><abstract abstract-type="summary"><title>Author Summary</title><sec id="st1"><title/><p>The discovery of a new class of mammalian small regulatory RNAs termed PIWI-interacting RNA (piRNA) has extended the diverse family of small regulatory RNAs. PIWI proteins are a subclass of the larger Argonaute protein family, of which the Ago members bind microRNAs and play a critical role in gene silencing. Despite the homology between PIWI and Ago proteins, piRNAs are strikingly different from microRNAs in their length, expression pattern, and genomic organization. In contrast, piRNAs are similar to repeat-associated small interfering RNAs (rasiRNAs), a class of small RNAs that are responsible for transposon silencing in the <italic>Drosophila</italic> germline, although it is unclear if piRNAs function in a similar way. This paper describes a computational comparison and analysis of the existing comprehensive piRNA datasets identified independently by three groups at the pachytene stage in mouse spermatogenesis. We find that the studies have identified similar genomic piRNA clusters, but differ substantially in the piRNAs that were cloned from those clusters. 
Based on these results we quantify the expected number of piRNAs and suggest that the processing of piRNAs from genomic transcripts is quasi-random. We find that a weak sequence signature may guide piRNA 5′ end processing, which would account for the departure from fully random processing. We further show partial evidence that piRNA biogenesis may be initiated by neighboring transposable elements.</p></sec></abstract><funding-group><funding-statement> The work was supported by the Bressler Scholar Fund and the NIGMS (RU-MSKCC collaborative P01).</funding-statement></funding-group><counts><page-count count="9"/></counts><!--===== Restructure custom-meta-wrap to custom-meta-group =====--><custom-meta-group><custom-meta><meta-name>citation</meta-name><meta-value>Betel D, Sheridan R, Marks DS, Sander C (2007) Computational analysis of mouse piRNA sequence and biogenesis. PLoS Comput Biol 3(11): e222. doi:<ext-link ext-link-type="doi" xlink:href="http://dx.doi.org/10.1371/journal.pcbi.0030222" xlink:type="simple">10.1371/journal.pcbi.0030222</ext-link></meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1"><title>Introduction</title><p>A recent landmark discovery has identified a novel class of small RNAs in mammalian testes that is expressed during spermatogenesis [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b006">6</xref>]. PIWI-interacting RNAs (piRNAs) are typically ∼30 bases long, associate with PIWI proteins, and are organized into distinct genomic clusters (reviewed in [<xref ref-type="bibr" rid="pcbi-0030222-b007">7</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b012">12</xref>]). 
The function of piRNAs is currently unknown, but the homology of PIWI proteins to Argonaute proteins, key components of the small interfering RNA pathway, and the similarities of piRNAs to microRNAs and short-interfering RNAs (siRNAs), known as negative regulators of gene expression, suggest a role in RNA-dependent regulatory processes during meiosis. Furthermore, piRNAs are similar to repeat-associated small interfering RNA (rasiRNA), a class of small RNAs that are responsible for transposon silencing in the <italic>Drosophila</italic> germline [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>] (and recently identified in <italic>Zebrafish</italic> [<xref ref-type="bibr" rid="pcbi-0030222-b021">21</xref>]), suggesting analogies between rasiRNAs and mammalian piRNAs in terms of biogenesis and function. Note that the terms rasiRNA and piRNA are often used interchangeably. Here we refer to the PIWI-interacting small RNAs from <italic>Drosophila</italic> and <italic>Zebrafish</italic> as rasiRNAs and the mammalian counterparts as piRNAs without discounting functional similarity.</p><p>To better understand the origin of piRNAs, we compared the three largest available mouse piRNA datasets (identified at the pachytene stage of spermatogenesis) in terms of sequence similarities and cluster organization. Given the comprehensive nature of these efforts and the focus on a common specific stage in mouse spermatogenesis, we expected close agreement between the datasets. Indeed, the three groups report <italic>similar</italic> location, size, and strand organization of the piRNA genomic clusters (<xref ref-type="fig" rid="pcbi-0030222-g001">Figure 1</xref>A). However, the three sets of sequences are surprisingly <italic>dissimilar</italic>, suggesting a much larger underlying pool of potential piRNAs from which each group has independently sampled. 
We estimate the size of the pool to be ∼2 × 10<sup>5</sup> potential piRNAs, based on the number of sequences in each dataset and their overlaps.</p><fig id="pcbi-0030222-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0030222.g001</object-id><label>Figure 1</label><caption><title>Sequence and Cluster Overlaps between Datasets A, B, and C</title><p>Although the three studies identified the same piRNA clusters, they are distinct at the level of piRNA sequences.</p><p>(A) View of Chromosome 5 piRNA sequences and clusters from datasets A, B, and C. Top panel (1) is the karyotype view with cluster positions of the datasets: A (green lines), B (top purple triangles), and C (bottom yellow triangles). Lower panels (2–5) are magnified views of the sequences and cluster locations from the three datasets. Top three tracks in each panel are the sequence locations from datasets C (yellow), B (purple), and A (green), and lower three tracks are the cluster positions in the same color scheme. The Venn diagram of the cluster overlaps (B) shows good agreement between the datasets while sequence overlaps, using a 95% identity measure, are small (C). 
Note that the number of piRNAs used in this comparison is different from the number of sequences reported in the original studies (see <xref ref-type="sec" rid="s3">Methods</xref> and <xref ref-type="supplementary-material" rid="pcbi-0030222-st001">Table S1</xref>).</p></caption><graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.g001" xlink:type="simple"/></fig><p>We further show that 25% of piRNA clusters are bracketed by inverted repeats of varying length, suggesting that some of the long single-stranded piRNA precursors [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b006">6</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>] can form a double-strand RNA (dsRNA) intermediate from inverted repeats that may trigger piRNA biogenesis. Taking into account positional nucleotide frequencies and copy numbers of experimentally determined piRNAs, we conclude that piRNA precursors are processed by a quasi-random mechanism that generates large numbers of distinct piRNA sequences.</p><sec id="s1a"><title>Discovery of piRNAs</title><p>Five groups reported the discovery of small RNAs expressed exclusively in mammalian testes (mouse, rat, and human) that bind MIWI (murine PIWI) or MILI proteins [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b005">5</xref>]. Here, we focus on the three largest datasets (A–C, listed in order of decreasing number of piRNA sequences identified in [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>]), each with thousands of distinct piRNA sequences (a recent fourth comprehensive dataset of MILI-bound piRNAs identified in the pre-pachytene stage of spermatogenesis [<xref ref-type="bibr" rid="pcbi-0030222-b006">6</xref>] is not included in this analysis). 
The number of unique piRNA sequences ranges from 3,482 to 40,102 (<xref ref-type="supplementary-material" rid="pcbi-0030222-st001">Table S1</xref>), as a result of the different methods used to identify the sequences. Overall, the length distributions of piRNAs peak at 29–31 nucleotides. However, the MILI-bound piRNAs (dataset C) [<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>] are generally shorter (26–28 nt) than the MIWI-bound piRNAs (29–31 nt) [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b002">2</xref>], possibly due to differences in binding modes of the two proteins.</p><p>The short length of piRNAs and the structural homology between PIWI and Argonaute proteins are suggestive of functional similarities between piRNAs and microRNAs. However, the combined evidence indicates that both the biogenesis and function of these two classes of RNA are distinct (<xref ref-type="table" rid="pcbi-0030222-t001">Table 1</xref>). Primary differences are in genomic organization, sequence conservation, and in the number of unique sequences—among which are hundreds of microRNAs and tens of thousands of piRNAs. The majority of the identified piRNAs have a preference for a uridine base at the first position (78%–94%). Similar 5′ bias was observed in other types of small RNAs such as microRNAs and siRNAs, although to a lesser extent. The 5′ U is reminiscent of processing by RNase III enzymes [<xref ref-type="bibr" rid="pcbi-0030222-b017">17</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b018">18</xref>] but may also reflect preferential binding to the Argonaute-like proteins. 
Although microRNAs and piRNAs share similar 5′ termini, other aspects of their biogenesis pathways are noticeably distinct: (i) piRNAs undergo 2′-O-methylation at their 3′ end [<xref ref-type="bibr" rid="pcbi-0030222-b022">22</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b026">26</xref>], which animal microRNAs do not; (ii) microRNA precursors are characterized by a distinct hairpin structure whereas piRNA precursors have no apparent secondary structure; and (iii) in contrast to microRNAs, piRNA maturation is independent of Dicer enzymes [<xref ref-type="bibr" rid="pcbi-0030222-b016">16</xref>].</p><table-wrap content-type="1col" id="pcbi-0030222-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0030222.t001</object-id><label>Table 1</label><caption><p>Comparison of microRNAs and piRNAs</p></caption><graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.t001" xlink:type="simple"/><!-- <table frame="hsides" rules="none"><colgroup><col id="tb1col1" align="left" charoff="0" char=""/><col id="tb1col2" align="left" charoff="0" char=""/><col id="tb1col3" align="left" charoff="0" char=""/></colgroup><thead><tr><td align="left" valign="middle"><hr/>Property</td><td valign="middle"><hr/>microRNAs</td><td valign="middle"><hr/>piRNAs</td></tr></thead><tbody><tr><td valign="middle">Length</td><td valign="middle">20&ndash;21 nt</td><td valign="middle">28&ndash;33 nt</td></tr><tr><td valign="middle">Binding protein in ribonuclear complex</td><td valign="top">Argonaute subfamily</td><td valign="top">PIWI subfamily</td></tr><tr><td valign="middle">Number of distinct sequences (mouse)</td><td valign="top">&sim;420 currently known</td><td valign="top">&sim;50,000 currently known, &sim;200,000 estimated total</td></tr><tr><td valign="middle">Expression patterns</td><td valign="middle">Subsets of microRNAs are expressed in most cell types and developmental stages.</td><td valign="top">Found only in spermatocytes and 
spermatids in testes.</td></tr><tr><td valign="top">Genomic organization</td><td valign="middle">Some microRNA genes are in polycistronic transcriptional units that generate a few mature microRNAs from a single transcript. Others are individually transcribed. Some are in introns within host genes.</td><td valign="middle">Organized in large genomic clusters of &sim;25&ndash;35 kb containing hundreds of piRNAs with preferential strand organization. Some clusters are bidirectional such that piRNAs originate from two non-overlapping regions from opposing strands.</td></tr><tr><td valign="top">Biogenesis</td><td valign="top">A primary RNA pol-II transcript is initially processed in a position-specific manner by Drosha protein resulting in one or a few &sim;80&ndash;70 nt precursors with a characteristic hairpin structure. Each precursor is further processed by Dicer protein to a mature &sim;21 nt single-stranded mature microRNA.</td><td valign="top">Many piRNAs are generated from long transcript without repeating secondary structure. The precursor transcript is processed by an unknown nuclease complex, apparently by a positionally quasi-random mechanism.</td></tr><tr><td valign="top">Conservation</td><td valign="top">Conserved in metazoans with strong sequence conservation for most microRNAs.</td><td valign="middle">Found in mammals with similarities to <italic>Drosophila</italic> rasiRNAs. No conservation between species and limited syntenic conservation of clusters.</td></tr><tr><td valign="top">5&prime;, 3&prime; Termini modifications</td><td valign="middle">Animal microRNAs contain 5&prime;-phosphate group and 2&prime;,3&prime;-hydroxyl moieties. 
Plant microRNAs are 2'-O-methylated at the 3' end.</td><td valign="top">Contains 5&prime;-phosphate and are 2&prime;-O-methylated at their 3&prime; terminus</td></tr><tr><td valign="middle">Function</td><td valign="middle">Post-transcriptional regulation of gene expression by hybridizing to complementary regions of target mRNAs.</td><td valign="top">Unknown, possibly involved in transposon silencing</td></tr></tbody></table> --><!-- <table-wrap-foot><fn id="nt101"><p>doi:10.1371/journal/pcbi.0030222.t001</p></fn></table-wrap-foot> --></table-wrap><p>The majority of piRNAs (81%–96%) are organized in clusters (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg001">Figure S1</xref>) with distinct strand preference; the clusters range from 1 to 127 kb in size and are found predominantly on autosomes. Some of the clusters are organized in a bipartite arrangement with a stretch of piRNAs on one strand adjacent to a second stretch of piRNAs on the opposing strand. For a minority of the clusters, this organization is consistent with bi-directional transcription from a common origin that generates two RNA precursors. The organization of piRNAs into clusters is common to mouse, human, and rat, with significant conservation of the cluster genomic locations (synteny) [<xref ref-type="bibr" rid="pcbi-0030222-b002">2</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>]. In contrast, there is very little conservation at the level of individual piRNA sequences (unpublished data and previously reported by [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b006">6</xref>]). 
Most reported piRNAs are in un-annotated intergenic regions and only a small fraction appears to be derived from mRNAs (5.7%–12%) or is coincident with other classes of RNAs such as snoRNAs, tRNAs, rRNAs, or miRNAs (0.2%–3.5%) [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>].</p><p>piRNAs bind MILI and MIWI proteins, which are members of the PIWI protein family, a subclass of the Argonaute family. In eukaryotes, Argonaute proteins are key components of the interfering RNA pathway in which they bind mature microRNAs or siRNAs to form the RNA-induced silencing complex (RISC) [<xref ref-type="bibr" rid="pcbi-0030222-b027">27</xref>]. All three murine PIWI members (MIWI, MILI, and MIWI2) are required for spermatogenesis as determined by knockout experiments and are predominantly expressed in testes in partially overlapping time intervals [<xref ref-type="bibr" rid="pcbi-0030222-b028">28</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b031">31</xref>]. Recent reports link mammalian MIWI protein to chromatoid bodies (also known as nuages in <italic>Drosophila</italic>) [<xref ref-type="bibr" rid="pcbi-0030222-b032">32</xref>]. These are cytoplasmic structures found in all mammalian spermatogenic cells that physically associate with the nuclear membrane during spermatogenesis and contain an RNA helicase protein (VASA). The function of chromatoid bodies is unknown but they are presumed to be the site of post-transcriptional processing and storage of mRNAs analogous to processing bodies in somatic cells (P-bodies) [<xref ref-type="bibr" rid="pcbi-0030222-b033">33</xref>]. 
It is unknown if the co-localization of MIWI proteins to chromatoid bodies is linked in any way to their function with piRNAs.</p></sec><sec id="s1b"><title>Similarities between rasiRNAs and Mammalian piRNAs</title><p>rasiRNAs are a class of interfering RNA with a size distribution of 23–28 nucleotides that were identified in a number of organisms [<xref ref-type="bibr" rid="pcbi-0030222-b017">17</xref>]. They originate from repeat sequences related to transposable elements and heterochromatic regions [<xref ref-type="bibr" rid="pcbi-0030222-b015">15</xref>], and evidence supports their involvement in transposon silencing [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b021">21</xref>]. rasiRNAs are found in both female and male germline where they bind members of the PIWI family (Piwi, Aub, and Ago3 in <italic>Drosophila</italic>) [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b034">34</xref>]. There are two distinct types of <italic>Drosophila</italic> rasiRNAs (there is evidence that similar classes exist in <italic>Zebrafish</italic> [<xref ref-type="bibr" rid="pcbi-0030222-b021">21</xref>]); the first type bind Piwi or Aub proteins, are mostly antisense to transposable elements, and enriched for 5′ uridine. The second type bind Ago3 proteins, are mostly sense to the transposable elements, and enriched in adenosine at position 10. The different strand-specificity and the U and A enrichments led to the hypothesis that the biogenesis of the two types of rasiRNAs is coupled [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>]. In this model the Piwi/Aub-associated rasiRNAs guide the 5′ cleavage of the Ago3-associated rasiRNAs by hybridization to the sense transcript. 
Similarly, the Ago3-bound rasiRNAs direct the 5′ cleavage of the Piwi/Aub-bound rasiRNAs by hybridization to the anti-sense transcripts. Thus, the two rasiRNA types are engaged in a mutual amplification loop that facilitates the silencing of multiple transposon copies.</p><p>The length characteristics, testis-specific expression, PIWI interaction, genomic organization, and 5′ uridine enrichment suggest that piRNAs may be the mammalian equivalent of rasiRNAs. This would support the idea that mammalian piRNAs might be involved in silencing transposable elements. However, at present, there are a number of differences that cast doubt on this functional analogy. First, genomic annotation of piRNAs indicates that only 12%–20% are repeat derived [<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>–<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b005">5</xref>], which is smaller than the frequency of repeat sequences in the mouse genome (37.5%) [<xref ref-type="bibr" rid="pcbi-0030222-b035">35</xref>], while <italic>Drosophila</italic> rasiRNAs originate preferentially from repeat regions. Second, mammalian piRNAs originate from one strand or the other forming clusters with continuous strand bias whereas rasiRNAs originate from both strands of the clusters with positional enrichment for “U” and “A.” We explored the analogy between rasiRNAs and piRNAs, but did not find significant 5′ partial complementarity between piRNA sequences as found in rasiRNAs [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>]. However, at present, sequences associated with the third mouse testes–specific MIWI protein (MIWI2), also essential for spermatogenesis and linked to transposon silencing [<xref ref-type="bibr" rid="pcbi-0030222-b031">31</xref>], have not yet been identified. 
Future identification of MIWI2-bound piRNAs that, in analogy to Ago3-bound <italic>Drosophila</italic> rasiRNAs, are enriched for adenosine at position 10 and show partial complementarity to other piRNAs would be strongly suggestive of functional similarity between rasiRNAs and piRNAs.</p></sec><sec id="s1c"><title>Open Questions</title><p>The discovery of large sets of piRNAs raises a number of important biological questions. In particular, what are the biochemical role and cellular function of PIWI-bound piRNAs during spermatogenesis? Are they involved in transposon silencing, chromosome rearrangements (as are 30-nt PIWI-bound RNAs in <italic>Tetrahymena</italic> [<xref ref-type="bibr" rid="pcbi-0030222-b036">36</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b037">37</xref>]), or chromosome pairing? What are the evolutionary constraints on piRNA sequences? Answers to these questions will primarily emerge from further experiments. Here, we focused on the basic questions of <italic>how many piRNA sequences</italic> there are and <italic>how they are produced</italic>. We reasoned that a detailed computational comparison of the three major datasets, representing independent discoveries of piRNAs, provides insight into the organization of genomic clusters, the number and distribution of sequences within the clusters, and, by implication, their biogenesis.</p></sec></sec><sec id="s2"><title>Results/Discussion</title><sec id="s2a"><title>Comparing piRNA Clusters</title><p>We first compared the cluster locations in the mouse genome from datasets A–C and found extensive agreement between the datasets. The majority of clusters overlap by more than 75% of the length of the shorter cluster. All 42 genomic clusters from dataset C, the smallest of the three, matched clusters of datasets A and B (<xref ref-type="fig" rid="pcbi-0030222-g001">Figure 1</xref>B). 
Given the different definitions of clusters in the three datasets, we conclude that the three sets of experiments have determined essentially the same clusters of piRNAs expressed in the pachytene stages of spermatogenesis (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg001">Figure S1</xref>). Other stages of development may yield additional and possibly distinct sets of piRNAs, such as the MILI-bound set of piRNAs (not analyzed here) recently identified in the pre-pachytene phase of spermatogenesis [<xref ref-type="bibr" rid="pcbi-0030222-b006">6</xref>].</p></sec><sec id="s2b"><title>Comparing piRNA Sequences</title><p>We compared the sets of individual sequences from the three groups (A–C). Contrary to the agreement between clusters, we found surprisingly small overlaps between the sets of unique sequences, irrespective of the criteria used for sequence comparison (100%, 95%, or 90% sequence identity, <xref ref-type="supplementary-material" rid="pcbi-0030222-st001">Table S1</xref>). For example, at a 95% sequence identity cutoff only 45% of the sequences from dataset C overlap with dataset A (the largest fractional overlap among all pairs of datasets), although all the piRNA clusters from the smallest dataset C are included in dataset A. Furthermore, only 587 sequences were common to all three datasets representing 20%, 3.7%, and 2.7% of the datasets C, B, and A, respectively (<xref ref-type="fig" rid="pcbi-0030222-g001">Figure 1</xref>C). Similarly low overlap was observed when comparing human piRNA datasets, but as the sequencing coverage is lower than in mouse, this result is not as conclusive.</p></sec><sec id="s2c"><title>Estimating the Complete piRNA Pool</title><p>This small overlap between the piRNA datasets points to an apparent contradiction—how can different sets of piRNA sequences originate from a common set of genomic clusters? 
The simplest explanation is that each experiment identified only a subset of sequences from a larger pool of unique piRNA sequences. To quantify this effect, we first asked whether the observed overlaps are within the expected range assuming that the complete piRNA pool is simply the union of the three datasets. To facilitate the comparison we restricted this analysis to the intersection of clusters from the three datasets, termed “intersection clusters” (<xref ref-type="supplementary-material" rid="pcbi-0030222-st002">Table S2</xref>). By numerical simulation and direct calculation we find that the observed sequence overlaps between all datasets are significantly lower than expected (unpublished data), indicating that the total pool of piRNA sequences is indeed larger than the simple union of current datasets. Using a straightforward statistical calculation, we then estimated the total number of piRNAs from the observed overlaps in the intersection clusters by considering the three studies as independent sampling experiments from a common pool of all piRNAs (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg002">Figure S2</xref>).</p><p>From this estimate we conclude that the datasets analyzed here have so far identified only 25%–30% of all potential piRNA sequences from the pachytene stage of mouse spermatogenesis. This implies that in the complete set ∼20%–25% of all “U” positions in the clusters are potential start sites for piRNA sequences when taking into account the pronounced preference for 5′ uridine. Extrapolating to saturation in all clusters reported by any of the three groups, we arrive at the overall conservative estimate of <italic>N<sub>total</sub></italic> ≈ 2 × 10<sup>5</sup> potential piRNA sequences in mouse testes (<xref ref-type="fig" rid="pcbi-0030222-g002">Figure 2</xref>). 
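As an illustration only of how overlaps between independent samples constrain the size of a common pool (a textbook two-sample capture-recapture sketch, not necessarily the calculation used here, which draws on three datasets; the symbols <italic>n<sub>A</sub></italic>, <italic>n<sub>B</sub></italic>, and <italic>m<sub>AB</sub></italic> are introduced only for this example): if two studies independently and uniformly sample <italic>n<sub>A</sub></italic> and <italic>n<sub>B</sub></italic> distinct sequences from the same pool and share <italic>m<sub>AB</sub></italic> of them, the pool size can be approximated as <disp-formula><tex-math>\hat{N} \approx \frac{n_A \, n_B}{m_{AB}}</tex-math></disp-formula> Under these assumptions, a small overlap relative to the sample sizes points to a large pool, and unequal cloning frequencies among sequences would tend to make such an estimate too low rather than too high. 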
This does not imply that all sequences are necessarily present in any given cell.</p><fig id="pcbi-0030222-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0030222.g002</object-id><label>Figure 2</label><caption><title>Currently Known and Estimated Total Number of piRNAs</title><p>We estimate that the total number of piRNAs in mouse testes is ∼2 × 10<sup>5</sup> (red), roughly four times the number of currently known piRNAs (blue). The estimated number of piRNAs corresponds to ∼23% of all “U” positions (green) or 5%–6% of all nucleotides (yellow) in piRNA clusters.</p></caption><graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.g002" xlink:type="simple"/></fig></sec><sec id="s2d"><title>Quasi-Random Processing</title><p>The details of piRNA biogenesis are not yet known. In particular, what is the precursor form of piRNAs? Is it single-strand or double-strand? What are the components of the nuclease-processing complex? By which mechanism, in which order, and under which regulatory control do thousands of different ∼30 nt transcripts originate from a limited number of genomic regions? The large differences in piRNA datasets and the relatively weak evolutionary conservation of piRNA sequences suggest that the processing of piRNAs from a primary precursor is not a precise step, in contrast to microRNA maturation. Instead, it appears, to a first approximation, that piRNAs are generated by a random mechanism in which any U position is a potential 5′ piRNA start. This notion is supported by the fact that sequence overlap between the datasets remains low even when we compare only the more abundant sequences (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg003">Figure S3</xref>), and that there is no evidence for repetitive spacing between consecutive sequences (unpublished data). 
However, there appears to be some non-randomness in that some positions are preferentially processed into piRNAs (see patterns in <xref ref-type="fig" rid="pcbi-0030222-g001">Figure 1</xref>A, panels 4 and 5). In particular, a sizable fraction (∼20%) of all piRNA sequences were cloned three or more times, and we find that many piRNA sequences from the same strand are partially overlapping (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg004">Figure S4</xref>). This suggests some, albeit weak, sequence effects within a genomic cluster, either at the level of nuclease processing or at the level of loading into a PIWI complex. We use the term “quasi-random” to reflect this weak departure from random processing.</p></sec><sec id="s2e"><title>Weak Discriminating Sequence Motif</title><p>We therefore attempted to identify a distinguishing sequence signal that predicts which U bases are 5′ piRNA cleavage sites. Using a sequence classification algorithm, we distinguished the correct 5′ U piRNA sites from all other U positions with 61% accuracy, both in 10-fold cross-validation on the training set and when testing on a randomly withheld test set excluded from training (see <xref ref-type="sec" rid="s3">Methods</xref>). Although the classification accuracy is low, it is significantly better than random prediction (classification on randomized data did not exceed 50%). Furthermore, the classification accuracy improved to 72% when the algorithm was trained and tested on the abundant piRNA sequences (clone counts &gt;2). The differentiating signal is a weak preference for a G or A in the +1 position (relative to the 5′ U), an A in the +4 position, and a slight under-representation of G at the −1 position (<xref ref-type="fig" rid="pcbi-0030222-g003">Figure 3</xref>). 
These results suggest that the processing of the precursor is quasi-random in that there is a weak yet significant non-random sequence preference at the 5′ cleavage site.</p><fig id="pcbi-0030222-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.0030222.g003</object-id><label>Figure 3</label><caption><title>A Model of piRNA Biogenesis</title><p>piRNAs originate from long RNA precursors transcribed from a small number of genomic regions (A). Some clusters contain inverted repeats that can potentially form dsRNA fold-back structures. In this genomic view of a cluster chr2: 150870000–150910000 (B), the inverted repeats are represented as linked colored bars. These inverted repeats originate from inverted LINE transposable elements that flank the piRNA cluster (red and blue bars in the LINE track). A long transcript containing the pair of inverted LINE elements can potentially form a precursor with a dsRNA segment (C). piRNAs are processed by a quasi-random mechanism with a weak sequence preference near the 5′ U that is most pronounced in frequent clones (D).</p></caption><graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.g003" xlink:type="simple"/></fig></sec><sec id="s2f"><title>Model of piRNA Biogenesis</title><p>The form of the primary piRNA precursor transcript (single- or double-stranded) is currently unknown. 
However, the strong 5′ uridine bias and the presence of the 5′ phosphate group [<xref ref-type="bibr" rid="pcbi-0030222-b004">4</xref>] in mature piRNAs are indicative of a dsRNA precursor that is processed by an RNase III type enzyme [<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>], although no such nuclease has so far been implicated in piRNA processing, and piRNA processing is independent of Dicer [<xref ref-type="bibr" rid="pcbi-0030222-b009">9</xref>].</p><p>In <named-content content-type="genus-species" xlink:type="simple">Caenorhabditis elegans</named-content>, germline silencing of transposable elements by the RNAi pathway is initiated by a dsRNA structure formed by base pairing of the terminal inverted repeats of the transposon in a fold-back structure [<xref ref-type="bibr" rid="pcbi-0030222-b038">38</xref>]. To investigate whether a similar mechanism may be involved in piRNA biogenesis, we searched for inverted repeats in or near piRNA clusters. Such inverted repeats may form precursors containing dsRNA that initiate enzymatic processing. Overall, we found that 63% of all clusters have inverted repeats of length 100 bases or longer (see <xref ref-type="sec" rid="s3">Methods</xref>) and that 25% of all clusters are bracketed by inverted repeats, i.e., the complementary segments are at the ends of the clusters (<xref ref-type="fig" rid="pcbi-0030222-g003">Figure 3</xref>B). 
Surprisingly, some of the flanking inverted repeats coincide with inverted transposable elements such as SINEs, LINEs, and LTRs that are on opposite strands, one on each side of the cluster (<xref ref-type="fig" rid="pcbi-0030222-g003">Figures 3</xref>B and <xref ref-type="supplementary-material" rid="pcbi-0030222-sg005">S5</xref>), suggesting a link between transposable elements and piRNA biogenesis.</p><p>Recent studies propose that mammalian piRNAs may be involved in transposon silencing analogous to <italic>Drosophila</italic> rasiRNAs, although the mechanistic details remain to be determined [<xref ref-type="bibr" rid="pcbi-0030222-b006">6</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>]. The model of transposon silencing by rasiRNAs put forward by [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>] explains the feed-forward amplification of the silencing process but not its initiation. They propose that the induction requires a pool of initiating rasiRNAs that triggers a mutual amplification loop between the Ago3-bound and the Piwi/Aub-bound rasiRNAs. The source of the initiating rasiRNAs is unknown, and they may be maternally inherited by the developing oocytes.</p><p>We hypothesize that one plausible model of piRNA biogenesis involves long transcripts that contain flanking inverted transposable elements, one at each end of the cluster (<xref ref-type="fig" rid="pcbi-0030222-g003">Figure 3</xref>B). Such precursors can arise, for example, by continuous transcription of one of the repeats past its termination site. If the transcript reaches the other end of the cluster and includes the sequence complementary to the repeat element on the opposing strand, the transcript can potentially form a dsRNA segment. piRNA biogenesis is then triggered by processing of the dsRNA segments which generate the initiating pool of piRNAs. 
Similar to the <italic>Drosophila</italic> model of rasiRNA generation [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>], these initial sequences may act on transcripts derived from other locations (in trans) containing at least one copy of the initiating repeat element and resulting in the production of a much larger pool of piRNAs.</p><p>We cannot exclude the possibility that the bracketing inverted transposable elements are not part of the primary transcript but simply the result of statistical coincidence. In fact, similar numbers of such repeats are found in randomly chosen genomic regions (unpublished data), as remnants of transposable elements account for over a third of the mouse genome [<xref ref-type="bibr" rid="pcbi-0030222-b035">35</xref>], but most of these may not be expressed. In contrast, the bracketing inverted repeat structures must be transcriptionally active, and we do find that a number of the transposable elements near piRNA clusters are indeed expressed in testes (as indicated by ESTs recorded in genome databases). As an alternative to an initiating dsRNA structure, a single-strand RNA precursor may be a direct substrate of a nuclease, yet to be discovered, that generates approximately 30-residue long 5′ P products.</p></sec><sec id="s2g"><title>Conclusions</title><p>The discovery of piRNAs has extended the multifaceted family of small interfering RNAs that includes microRNAs, siRNAs, and rasiRNAs. The tens of thousands of distinct mouse piRNAs observed so far map to ∼117 distinct genomic locations. The details of piRNA transcriptional control, such as promoter sites and transcription factors, remain to be determined. Our analysis has revealed low sequence overlap between the currently known pachytene-stage mouse piRNA datasets, although the sequences originate from a common set of genomic clusters. 
This apparent contradiction is resolved by noting the lack of saturation in each individual experiment. We interpret the low sequence overlap as suggestive of quasi-random sub-saturation processing from common precursors, such that different experiments yield different and only partially overlapping sets of piRNAs. In addition, based on the observation of repeat structures bracketing some of the clusters, we propose that one plausible mechanism for initiation of piRNA biogenesis involves long transcripts with terminal inverted repeats, possibly derived from (remnants of) transposable elements. Such transcripts may form partial dsRNA intermediates initiating enzymatic degradation. Subsequent stages of piRNA biogenesis may then follow the ping-pong model proposed by [<xref ref-type="bibr" rid="pcbi-0030222-b013">13</xref>,<xref ref-type="bibr" rid="pcbi-0030222-b020">20</xref>].</p><p>The notion that piRNAs both direct the degradation and are the degradation products of their own precursors suggests that piRNA transcripts are under strict regulation at a crucial stage of meiosis. What is their function? The PIWI proteins are highly expressed in the pre-pachytene and pachytene stages of meiosis when chromosome pairing is completed (zygotene) and synapsis peaks. This raises the intriguing possibility that the transcripts from which the piRNAs derive, and/or the piRNAs themselves, are involved in one of the crucial processes of meiosis, correct chromosome pairing, for which the molecular mechanism remains a mystery. The connection between this and the proposed piRNA function of transposon silencing remains to be elucidated. We look forward to directed biochemical and genomic experiments that will invalidate or confirm the models proposed here and explain the function of piRNAs.</p></sec></sec><sec id="s3"><title>Methods</title><sec id="s3a"><title>Datasets.</title><p>Mouse piRNA sequences were collected from the following sources: Dataset A from Lau et al. 
[<xref ref-type="bibr" rid="pcbi-0030222-b001">1</xref>]; Table S4 contains 65,535 unique small RNA sequences. After removing known small RNA sequences, the remaining 40,102 were considered as piRNA sequences for this dataset. Dataset B from Girard et al. [<xref ref-type="bibr" rid="pcbi-0030222-b002">2</xref>] (personal communication) includes 51,331 reads representing 28,956 unique sequences. Dataset C from Aravin et al. [<xref ref-type="bibr" rid="pcbi-0030222-b003">3</xref>] (Table 4 therein) contains 5,444 small RNA sequences of which 3,482 are unique sequences annotated as piRNAs. Dataset D from Watanabe et al. [<xref ref-type="bibr" rid="pcbi-0030222-b004">4</xref>] (Table S7 therein) contains 357 unique small RNA sequences. Dataset E from Grivna et al. [<xref ref-type="bibr" rid="pcbi-0030222-b005">5</xref>] (<xref ref-type="supplementary-material" rid="pcbi-0030222-st001">Table S1</xref> therein) contains 40 unique sequences.</p><p>Duplicates and subsequences were removed from each dataset at 100% nucleotide identity (<xref ref-type="table" rid="pcbi-0030222-t001">Table 1</xref>). In cases where genomic annotation was provided we removed known small RNAs (miRNAs, tRNAs, and snoRNAs) as well as apparent rRNA and mRNA fragments from the dataset. All sequences and clusters were mapped to mouse genome build mm7 (August 2005) taking the best genomic match up to a maximum of two mismatches or gaps. Sequence matching to the genome was performed using a combination of WU-BLAST (<ext-link ext-link-type="uri" xlink:href="http://blast.wustl.edu/" xlink:type="simple">http://blast.wustl.edu/</ext-link>) and our in-house alignment software developed jointly with M. Zavolan. The following BLAST arguments were used for short sequence alignments:</p><p>-W=6 -X=50 -gapX=50 -S2=50 -gapS2=50 -hspmax=1000 -gspmax=1000 -E=1000 -filter=none</p><p>Over 90% of the sequences mapped to unique genomic locations. 
In the remaining cases where there was more than one match to the genome, all positions were considered as a possible origin of the piRNA.</p><p>Coordinates of piRNA clusters from dataset C were translated from mm6 to mm7, in some cases resulting in a change in cluster length due to partial mapping:</p><def-list><def-item><term>cluster3 </term><def><p>mm6|chr9|+|67822641|67883254</p></def><def><p>mm7|chr9|+|67751406|67785923</p></def></def-item><def-item><term>cluster8 </term><def><p>mm6|chr14|+|22446408|22484616</p></def><def><p>mm7|chr14|+|21745838|21783387</p></def></def-item><def-item><term>cluster15 </term><def><p>mm6|chr9|+|54305216|54360650</p></def><def><p>mm7|chr9|+|54231430|54253257</p></def></def-item><def-item><term>cluster19 </term><def><p>mm6|chr17|+|63838569|63952874</p></def><def><p>mm7|chr17|+|64406371|64449447</p></def></def-item></def-list><p>The datasets were not significantly biased to specific sequences or nucleotide composition by experimental protocol. The two larger datasets (A and B) were produced using similar ligation adaptors and sequencing methods excluding the possibility of sequence bias due to different methodologies. Indeed, we found no differences in mononucleotide or dinucleotide frequencies between the datasets.</p></sec><sec id="s3b"><title>Comparison of genomic clusters and definition of intersection clusters.</title><p>Overlaps between genomic clusters from different datasets were determined by intersection of their genomic locations. The length of the overlaps ranged from 19% to 100% of the shorter cluster. In the majority (70%) of the overlapping clusters, the extent of the overlap covered &gt;75% of the length of the shorter cluster. Instances where two clusters from one dataset overlapped a single cluster from another dataset were counted as one overlap. 
Intersection clusters were defined as the genomic regions where clusters from all three datasets overlapped (see <xref ref-type="supplementary-material" rid="pcbi-0030222-st002">Table S2</xref>).</p></sec><sec id="s3c"><title>Sequence comparison.</title><p>Sequence comparison was performed as follows: All sequences (after initial processing) from all datasets were combined and compared all-against-all using WU-BLAST and in-house software. Sequences were grouped into similarity sets by hierarchical clustering and a defined identity measure. To explore sensitivity of the analysis to variation in parameters, we performed three clustering procedures using these identity measures: (i) 100% sequence identity over the entire length of the shortest sequence; (ii) 95% sequence identity over 95% of the length of the shortest sequence; and (iii) 90% sequence identity over 90% of the length of the shortest sequence. Considering all sequences in a similarity cluster to be essentially identical, the degree of overlap between two datasets was determined by counting the number of similarity clusters that contain sequences from both datasets (<xref ref-type="supplementary-material" rid="pcbi-0030222-st001">Table S1</xref>). Similarly, the three-way overlap between datasets A, B, and C was determined by counting the number of similarity clusters that contained sequences from all three groups (<xref ref-type="fig" rid="pcbi-0030222-g001">Figure 1</xref>C). The comparison of the abundant piRNA sequences (higher clone counts) was performed in the same way using only sequences that were cloned &gt;2 times (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg003">Figure S3</xref>).</p><p>Human piRNA sequences were retrieved from the Girard et al. (dataset B) and Aravin et al. (dataset C) studies. As for mouse piRNAs, sequences that matched known small RNAs and mRNAs were removed, resulting in 9,600 unique piRNA sequences from dataset B and 120 sequences from dataset C. 
Sequence comparison was performed as outlined above. Under the 95% sequence identity measure, 29 sequences were shared between the two datasets, corresponding to ∼24% of dataset C sequences.</p></sec><sec id="s3d"><title>Estimate of the total number of piRNAs.</title><p>The degree of overlap between two independent datasets, say <italic>X</italic> and <italic>Y</italic>, in a genomic intersection cluster is modeled by a hypergeometric distribution with a mean <inline-formula id="pcbi-0030222-ex001"><inline-graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.0030222.ex001" xlink:type="simple"/></inline-formula>
					 where <italic>n<sub>x</sub></italic> and <italic>n<sub>y</sub></italic> are the number of piRNA sequences in the cluster in datasets X and Y, respectively, and <italic>N</italic> is the total number of piRNAs in the cluster, which is unknown. This corresponds to random selection of <italic>n<sub>x</sub></italic> and <italic>n<sub>y</sub></italic> piRNA sequences from a total pool of <italic>N</italic> unique sequences, i.e., ignoring varying clone counts. Under the maximum likelihood assumption, the observed overlap between the two datasets is the most likely value. That is, <inline-formula id="pcbi-0030222-ex002"><inline-graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.0030222.ex002" xlink:type="simple"/></inline-formula>
					 where <italic>n<sub>x</sub></italic><sub> ∩ <italic>y</italic></sub> is the size of the observed overlap between datasets <italic>X</italic> and <italic>Y</italic>. For the purpose of this approximation, the size of the overlap <italic>n<sub>x</sub></italic><sub> ∩ <italic>y</italic></sub> was determined by a 95% sequence identity criterion (see above).
				</p><p>The total number of piRNAs <italic>N</italic> can then be computed directly as:
					<disp-formula id="pcbi-0030222-e001"><graphic mimetype="image" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.0030222.e001" xlink:type="simple"/><!-- <mml:math display='block'><mml:mrow><mml:mi>N</mml:mi><mml:mo>&equals;</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:msub><mml:mi>n</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mspace width='1pt'/><mml:mo>&cap;</mml:mo><mml:mspace width='1pt'/><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math> --></disp-formula>
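				</p><p>As a minimal sketch of this estimator (an illustration, not the code used in this study), the calculation can be written in Python as follows. The function name and the counts in the example are hypothetical and serve only to show the arithmetic.</p><p><preformat>
```python
def estimate_pool_size(n_x, n_y, n_overlap):
    """Estimate the total pool size N from two independent samples.

    Equating the expected overlap n_x * n_y / N with the observed
    overlap and solving for N gives N = n_x * n_y / n_overlap.
    """
    if n_overlap == 0:
        raise ValueError("no shared sequences: N cannot be estimated")
    return n_x * n_y / n_overlap


# Hypothetical counts, for illustration only.
print(estimate_pool_size(400, 300, 60))  # 2000.0
```
</preformat>In this hypothetical case, two samples of 400 and 300 sequences that share 60 sequences give an estimated pool of 2,000 sequences for the cluster.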
				</p><p>For each intersection cluster we computed three estimates of <italic>N</italic>: <italic>N<sub>AB</sub></italic>,<italic>N<sub>AC</sub></italic>,<italic>N<sub>BC</sub></italic> based on the observed overlaps between datasets AB, AC, and BC (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg002">Figure S2</xref>).</p><p>The total number of piRNAs was computed as the average of the three approximations summed over all clusters:
					<disp-formula id="pcbi-0030222-e002"><graphic mimetype="image" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.0030222.e002" xlink:type="simple"/><!-- <mml:math display='block'><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&equals;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>3</mml:mn></mml:mfrac><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&sum;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&isin;</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>B</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup><mml:mo>&plus;</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:mstyle><mml:mo>&plus;</mml:mo><mml:msubsup><mml:mi>N</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:math> --></disp-formula>where <italic>i</italic> is an intersection cluster, <italic>c</italic> is the set of all intersection clusters, <inline-formula id="pcbi-0030222-ex003"><inline-graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.0030222.ex003" xlink:type="simple"/></inline-formula>
					 is the computed total number of piRNAs in cluster <italic>i</italic> based on the overlap between datasets A and B, and similarly for <inline-formula id="pcbi-0030222-ex004"><inline-graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.0030222.ex004" xlink:type="simple"/></inline-formula>
					 and <inline-formula id="pcbi-0030222-ex005"><inline-graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.0030222.ex005" xlink:type="simple"/></inline-formula>
					.
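				</p><p>The averaging and summation described above can be sketched as follows. This is an illustrative Python fragment under the assumption that the three pairwise estimates are already available for each intersection cluster; the function name, dictionary keys, and values are hypothetical.</p><p><preformat>
```python
def total_pool_size(clusters):
    """Sum, over clusters, the mean of the three pairwise estimates.

    `clusters` is a list of dicts holding, per intersection cluster,
    the pairwise estimates N_AB, N_AC and N_BC.
    """
    return sum((c["AB"] + c["AC"] + c["BC"]) / 3 for c in clusters)


# Hypothetical per-cluster estimates, for illustration only.
clusters = [
    {"AB": 2000.0, "AC": 2300.0, "BC": 1700.0},
    {"AB": 900.0, "AC": 1200.0, "BC": 900.0},
]
print(total_pool_size(clusters))  # 3000.0
```
</preformat>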
				</p><p>To approximate the total number of piRNAs in the mouse genome we extrapolated the total number in all intersection clusters to the union of all clusters from datasets A, B, and C (<xref ref-type="supplementary-material" rid="pcbi-0030222-st003">Table S3</xref>), by multiplying <italic>N<sub>Total</sub></italic> by the ratio of the combined length of the union of all clusters to the combined length of all intersection clusters (<xref ref-type="supplementary-material" rid="pcbi-0030222-sg002">Figure S2</xref>).</p></sec><sec id="s3e"><title>piRNA distance distribution.</title><p>Sequences assigned to genomic positions were sorted by chromosomal position. The distance between two adjacent sequences <italic>i,j</italic> mapped to the <italic>same</italic> strand is determined by:
					<disp-formula id="pcbi-0030222-e003"><graphic mimetype="image" position="anchor" xlink:href="info:doi/10.1371/journal.pcbi.0030222.e003" xlink:type="simple"/><!-- <mml:math display='block'><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&equals;</mml:mo><mml:msub><mml:mi>j</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&minus;</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mspace width='1pt'/><mml:mtext>where</mml:mtext><mml:mspace width='3pt'/><mml:msub><mml:mi>j</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&lt;</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math> --></disp-formula>When <italic>i</italic> and <italic>j</italic> overlap, <italic>d<sub>i</sub></italic><sub>,<italic>j</italic></sub> ≤ 0.
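				</p><p>A minimal Python sketch of this distance calculation is given below. It assumes that each sequence is represented by a (start, end) pair on a single strand and that, after sorting, the first sequence of each adjacent pair is the upstream one; the function name and coordinates are hypothetical.</p><p><preformat>
```python
def adjacent_distances(intervals):
    """Return d = j_start - i_end for consecutive same-strand intervals.

    `intervals` is a list of (start, end) tuples on one strand.
    Values of zero or below indicate overlapping sequences.
    """
    ordered = sorted(intervals)
    return [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]


# Hypothetical coordinates, for illustration only.
print(adjacent_distances([(100, 129), (120, 150), (200, 230)]))  # [-9, 50]
```
</preformat>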
				</p></sec><sec id="s3f"><title>Support vector machine classification of 5′ U piRNA sites.</title><p>To identify a distinguishing signal for 5′ piRNA processing in cluster regions, we trained a support vector machine classifier to discriminate between 5′ piRNA and all other uridine positions.</p><p>The positive set included all piRNA 5′ uridine positions, each extended ten bases upstream and downstream, for a total of 24,604 sequences. Similarly, the negative set was constructed by selecting random non-piRNA uridine positions in the intersection clusters and ten nucleotides upstream and downstream. Both sets were split into two, one part used for training and the other for testing. Feature vectors were constructed by converting the 21-base sequences into 84-bit vectors (21 nt × 4 bases), i.e., each nucleotide position is converted to a 4-bit vector representing the RNA base.</p><p>Support vector machine training and classification were performed using an R interface of “libsvm” (<ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/src/contrib/Descriptions/e1071.html" xlink:type="simple">http://cran.r-project.org/src/contrib/Descriptions/e1071.html</ext-link>) using a polynomial kernel of degree 3. Classification accuracy in a 10-fold cross-validation on the training set and in testing on an independent test set was ∼61%, whereas classification using a randomized training set did not exceed 50% accuracy. Using the high frequency piRNAs (cloned &gt;2 times) as the positive training set, the prediction accuracy in 10-fold cross-validation and with the test set improves to 72%. In a feature selection process we found that positions −1, +1, and +4 (relative to the starting uridine position 0) were the largest contributors to the classification (<xref ref-type="fig" rid="pcbi-0030222-g003">Figure 3</xref>D). 
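</p><p>The 84-bit feature encoding described above can be illustrated with a short Python sketch. The bit order (A, C, G, U), the function name, and the example window are assumptions made for illustration; the study itself used the R interface to libsvm, and this fragment covers only the encoding step.</p><p><preformat>
```python
BASES = "ACGU"  # assumed bit order; the source does not specify one


def one_hot(window):
    """Encode an RNA window as 4 bits per nucleotide (A, C, G, U)."""
    bits = []
    for base in window.upper().replace("T", "U"):
        bits.extend(1 if base == b else 0 for b in BASES)
    return bits


# Hypothetical 21-nt window with the candidate 5' U at the centre (index 10).
window = "GACUAGCAUGUGAUACGAUCA"
vector = one_hot(window)
print(len(vector))    # 84
print(vector[40:44])  # [0, 0, 0, 1]
```
</preformat></p><p>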
Information content analysis revealed a preference for G or A at position +1, for an A at position +4, and under-representation of G at position −1.</p></sec><sec id="s3g"><title>Inverted repeats analysis.</title><p>For detection of inverted repeats in the vicinity of clusters, sequences were collected from the union clusters (<xref ref-type="supplementary-material" rid="pcbi-0030222-st003">Table S3</xref>) and extended by 10 kb in both 5′ and 3′ directions. The sequences were aligned to their complements by “bl2seq” (a BLAST implementation for aligning two sequences) in gapless mode (using the -g F flag). Alignments longer than 100 bases with &gt;90% identity were mapped to the mouse genome and used in subsequent analysis.</p></sec></sec><sec id="s4"><title>Supporting Information</title><supplementary-material id="pcbi-0030222-sg001" mimetype="application/pdf" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.sg001" xlink:type="simple"><label>Figure S1</label><caption><title>piRNA Clusters in the Mouse Genome</title><p>(100 KB PDF)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-sg002" mimetype="application/pdf" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.sg002" xlink:type="simple"><label>Figure S2</label><caption><title>Estimation of the Number of piRNAs in Intersection Clusters</title><p>(1.2 MB PDF)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-sg003" mimetype="application/pdf" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.sg003" xlink:type="simple"><label>Figure S3</label><caption><title>Sequence Overlap of Abundant piRNAs</title><p>(18 MB PDF)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-sg004" mimetype="application/pdf" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.sg004" xlink:type="simple"><label>Figure S4</label><caption><title>Distribution of Spacing between Consecutive piRNA 
Sequences</title><p>(66 KB PDF)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-sg005" mimetype="application/pdf" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.sg005" xlink:type="simple"><label>Figure S5</label><caption><title>Inverted Repeats Derived from Transposable Elements Bracketing piRNA Clusters</title><p>(60 KB PDF)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-st001" mimetype="application/msexcel" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.st001" xlink:type="simple"><label>Table S1</label><caption><title>Sequence Overlap between piRNA Datasets</title><p>(19 KB XLS)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-st002" mimetype="application/msexcel" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.st002" xlink:type="simple"><label>Table S2</label><caption><title>Genomic Positions of Intersection Clusters</title><p>(22 KB XLS)</p></caption></supplementary-material><supplementary-material id="pcbi-0030222-st003" mimetype="application/msexcel" position="float" xlink:href="info:doi/10.1371/journal.pcbi.0030222.st003" xlink:type="simple"><label>Table S3</label><caption><title>Genomic Positions of Union Clusters</title><p>(31 KB XLS)</p></caption></supplementary-material></sec></body><back><ack><p>We are grateful to Boris Reva, Sven Nelander, Nikolaus Schultz, and Tom Tuschl for comments, and to Greg Hannon for early access to piRNA sequences.</p></ack><glossary><title>Abbreviations</title><def-list><def-item><term>dsRNA</term><def><p>double-strand RNA</p></def></def-item><def-item><term>MIWI</term><def><p>murine PIWI</p></def></def-item><def-item><term>piRNA</term><def><p>PIWI-interacting RNA</p></def></def-item><def-item><term>rasiRNA</term><def><p>repeat-associated small interfering RNA</p></def></def-item><def-item><term>siRNA</term><def><p>small interfering 
RNA</p></def></def-item></def-list></glossary><ref-list><title>References</title><ref id="pcbi-0030222-b001"><label>1</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Lau</surname><given-names>NC</given-names></name><name name-style="western"><surname>Seto</surname><given-names>AG</given-names></name><name name-style="western"><surname>Kim</surname><given-names>J</given-names></name><name name-style="western"><surname>Kuramochi-Miyagawa</surname><given-names>S</given-names></name><name name-style="western"><surname>Nakano</surname><given-names>T</given-names></name><etal/></person-group>
					<year>2006</year>
					<article-title>Characterization of the piRNA complex from rat testes.</article-title>
					<source>Science</source>
					<volume>313</volume>
					<fpage>363</fpage>
					<lpage>367</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b002"><label>2</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Girard</surname><given-names>A</given-names></name><name name-style="western"><surname>Sachidanandam</surname><given-names>R</given-names></name><name name-style="western"><surname>Hannon</surname><given-names>GJ</given-names></name><name name-style="western"><surname>Carmell</surname><given-names>MA</given-names></name></person-group>
					<year>2006</year>
					<article-title>A germline-specific class of small RNAs binds mammalian Piwi proteins.</article-title>
					<source>Nature</source>
					<volume>442</volume>
					<fpage>199</fpage>
					<lpage>202</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b003"><label>3</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Aravin</surname><given-names>A</given-names></name><name name-style="western"><surname>Gaidatzis</surname><given-names>D</given-names></name><name name-style="western"><surname>Pfeffer</surname><given-names>S</given-names></name><name name-style="western"><surname>Lagos-Quintana</surname><given-names>M</given-names></name><name name-style="western"><surname>Landgraf</surname><given-names>P</given-names></name><etal/></person-group>
					<year>2006</year>
					<article-title>A novel class of small RNAs bind to MILI protein in mouse testes.</article-title>
					<source>Nature</source>
					<volume>442</volume>
					<fpage>203</fpage>
					<lpage>207</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b004"><label>4</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Watanabe</surname><given-names>T</given-names></name><name name-style="western"><surname>Takeda</surname><given-names>A</given-names></name><name name-style="western"><surname>Tsukiyama</surname><given-names>T</given-names></name><name name-style="western"><surname>Mise</surname><given-names>K</given-names></name><name name-style="western"><surname>Okuno</surname><given-names>T</given-names></name><etal/></person-group>
					<year>2006</year>
					<article-title>Identification and characterization of two novel classes of small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes and germline small RNAs in testes.</article-title>
					<source>Genes Dev</source>
					<volume>20</volume>
					<fpage>1732</fpage>
					<lpage>1743</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b005"><label>5</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Grivna</surname><given-names>ST</given-names></name><name name-style="western"><surname>Beyret</surname><given-names>E</given-names></name><name name-style="western"><surname>Wang</surname><given-names>Z</given-names></name><name name-style="western"><surname>Lin</surname><given-names>H</given-names></name></person-group>
					<year>2006</year>
					<article-title>A novel class of small RNAs in mouse spermatogenic cells.</article-title>
					<source>Genes Dev</source>
					<volume>20</volume>
					<fpage>1709</fpage>
					<lpage>1714</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b006"><label>6</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Aravin</surname><given-names>AA</given-names></name><name name-style="western"><surname>Sachidanandam</surname><given-names>R</given-names></name><name name-style="western"><surname>Girard</surname><given-names>A</given-names></name><name name-style="western"><surname>Fejes-Toth</surname><given-names>K</given-names></name><name name-style="western"><surname>Hannon</surname><given-names>GJ</given-names></name></person-group>
					<year>2007</year>
					<article-title>Developmentally regulated piRNA clusters implicate MILI in transposon control.</article-title>
					<source>Science</source>
					<volume>316</volume>
					<fpage>744</fpage>
					<lpage>747</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b007"><label>7</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Kim</surname><given-names>VN</given-names></name></person-group>
					<year>2006</year>
					<article-title>Small RNAs just got bigger: Piwi-interacting RNAs (piRNAs) in mammalian testes.</article-title>
					<source>Genes Dev</source>
					<volume>20</volume>
					<fpage>1993</fpage>
					<lpage>1997</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b008"><label>8</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>O'Donnell</surname><given-names>KA</given-names></name><name name-style="western"><surname>Boeke</surname><given-names>JD</given-names></name></person-group>
					<year>2007</year>
					<article-title>Mighty Piwis defend the germline against genome intruders.</article-title>
					<source>Cell</source>
					<volume>129</volume>
					<fpage>37</fpage>
					<lpage>44</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b009"><label>9</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Zamore</surname><given-names>PD</given-names></name></person-group>
					<year>2007</year>
					<article-title>RNA silencing: genomic defense with a slice of pi.</article-title>
					<source>Nature</source>
					<volume>446</volume>
					<fpage>864</fpage>
					<lpage>865</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b010"><label>10</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Lin</surname><given-names>H</given-names></name></person-group>
					<year>2007</year>
					<article-title>piRNAs in the germline.</article-title>
					<source>Science</source>
					<volume>316</volume>
					<fpage>397</fpage>
				</element-citation></ref><ref id="pcbi-0030222-b011"><label>11</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Seto</surname><given-names>AG</given-names></name><name name-style="western"><surname>Kingston</surname><given-names>RE</given-names></name><name name-style="western"><surname>Lau</surname><given-names>NC</given-names></name></person-group>
					<year>2007</year>
					<article-title>The coming of age for Piwi proteins.</article-title>
					<source>Mol Cell</source>
					<volume>26</volume>
					<fpage>603</fpage>
					<lpage>609</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b012"><label>12</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Hartig</surname><given-names>JV</given-names></name><name name-style="western"><surname>Tomari</surname><given-names>Y</given-names></name><name name-style="western"><surname>Forstemann</surname><given-names>K</given-names></name></person-group>
					<year>2007</year>
					<article-title>piRNAs: The ancient hunters of genome invaders.</article-title>
					<source>Genes Dev</source>
					<volume>21</volume>
					<fpage>1707</fpage>
					<lpage>1713</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b013"><label>13</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Brennecke</surname><given-names>J</given-names></name><name name-style="western"><surname>Aravin</surname><given-names>AA</given-names></name><name name-style="western"><surname>Stark</surname><given-names>A</given-names></name><name name-style="western"><surname>Dus</surname><given-names>M</given-names></name><name name-style="western"><surname>Kellis</surname><given-names>M</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila.</article-title>
					<source>Cell</source>
					<volume>128</volume>
					<fpage>1089</fpage>
					<lpage>1103</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b014"><label>14</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Pelisson</surname><given-names>A</given-names></name><name name-style="western"><surname>Sarot</surname><given-names>E</given-names></name><name name-style="western"><surname>Payen-Groschene</surname><given-names>G</given-names></name><name name-style="western"><surname>Bucheton</surname><given-names>A</given-names></name></person-group>
					<year>2007</year>
					<article-title>A novel repeat-associated small interfering RNA-mediated silencing pathway downregulates complementary sense gypsy transcripts in somatic cells of the Drosophila ovary.</article-title>
					<source>J Virol</source>
					<volume>81</volume>
					<fpage>1951</fpage>
					<lpage>1960</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b015"><label>15</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Saito</surname><given-names>K</given-names></name><name name-style="western"><surname>Nishida</surname><given-names>KM</given-names></name><name name-style="western"><surname>Mori</surname><given-names>T</given-names></name><name name-style="western"><surname>Kawamura</surname><given-names>Y</given-names></name><name name-style="western"><surname>Miyoshi</surname><given-names>K</given-names></name><etal/></person-group>
					<year>2006</year>
					<article-title>Specific association of Piwi with rasiRNAs derived from retrotransposon and heterochromatic regions in the Drosophila genome.</article-title>
					<source>Genes Dev</source>
					<volume>20</volume>
					<fpage>2214</fpage>
					<lpage>2222</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b016"><label>16</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Vagin</surname><given-names>VV</given-names></name><name name-style="western"><surname>Sigova</surname><given-names>A</given-names></name><name name-style="western"><surname>Li</surname><given-names>C</given-names></name><name name-style="western"><surname>Seitz</surname><given-names>H</given-names></name><name name-style="western"><surname>Gvozdev</surname><given-names>V</given-names></name><etal/></person-group>
					<year>2006</year>
					<article-title>A distinct small RNA pathway silences selfish genetic elements in the germline.</article-title>
					<source>Science</source>
					<volume>313</volume>
					<fpage>320</fpage>
					<lpage>324</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b017"><label>17</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Aravin</surname><given-names>AA</given-names></name><name name-style="western"><surname>Lagos-Quintana</surname><given-names>M</given-names></name><name name-style="western"><surname>Yalcin</surname><given-names>A</given-names></name><name name-style="western"><surname>Zavolan</surname><given-names>M</given-names></name><name name-style="western"><surname>Marks</surname><given-names>D</given-names></name><etal/></person-group>
					<year>2003</year>
					<article-title>The small RNA profile during Drosophila melanogaster development.</article-title>
					<source>Dev Cell</source>
					<volume>5</volume>
					<fpage>337</fpage>
					<lpage>350</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b018"><label>18</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Chen</surname><given-names>PY</given-names></name><name name-style="western"><surname>Manninga</surname><given-names>H</given-names></name><name name-style="western"><surname>Slanchev</surname><given-names>K</given-names></name><name name-style="western"><surname>Chien</surname><given-names>M</given-names></name><name name-style="western"><surname>Russo</surname><given-names>JJ</given-names></name><etal/></person-group>
					<year>2005</year>
					<article-title>The developmental miRNA profiles of zebrafish as determined by small RNA cloning.</article-title>
					<source>Genes Dev</source>
					<volume>19</volume>
					<fpage>1288</fpage>
					<lpage>1293</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b019"><label>19</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Megosh</surname><given-names>HB</given-names></name><name name-style="western"><surname>Cox</surname><given-names>DN</given-names></name><name name-style="western"><surname>Campbell</surname><given-names>C</given-names></name><name name-style="western"><surname>Lin</surname><given-names>H</given-names></name></person-group>
					<year>2006</year>
					<article-title>The role of PIWI and the miRNA machinery in Drosophila germline determination.</article-title>
					<source>Curr Biol</source>
					<volume>16</volume>
					<fpage>1884</fpage>
					<lpage>1894</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b020"><label>20</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Gunawardane</surname><given-names>LS</given-names></name><name name-style="western"><surname>Saito</surname><given-names>K</given-names></name><name name-style="western"><surname>Nishida</surname><given-names>KM</given-names></name><name name-style="western"><surname>Miyoshi</surname><given-names>K</given-names></name><name name-style="western"><surname>Kawamura</surname><given-names>Y</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>A slicer-mediated mechanism for repeat-associated siRNA 5′ end formation in Drosophila.</article-title>
					<source>Science</source>
					<volume>315</volume>
					<fpage>1587</fpage>
					<lpage>1590</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b021"><label>21</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Houwing</surname><given-names>S</given-names></name><name name-style="western"><surname>Kamminga</surname><given-names>LM</given-names></name><name name-style="western"><surname>Berezikov</surname><given-names>E</given-names></name><name name-style="western"><surname>Cronembold</surname><given-names>D</given-names></name><name name-style="western"><surname>Girard</surname><given-names>A</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in Zebrafish.</article-title>
					<source>Cell</source>
					<volume>129</volume>
					<fpage>69</fpage>
					<lpage>82</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b022"><label>22</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Saito</surname><given-names>K</given-names></name><name name-style="western"><surname>Sakaguchi</surname><given-names>Y</given-names></name><name name-style="western"><surname>Suzuki</surname><given-names>T</given-names></name><name name-style="western"><surname>Suzuki</surname><given-names>T</given-names></name><name name-style="western"><surname>Siomi</surname><given-names>H</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>Pimet, the Drosophila homolog of HEN1, mediates 2′-O-methylation of Piwi-interacting RNAs at their 3′ ends.</article-title>
					<source>Genes Dev</source>
					<volume>21</volume>
					<fpage>1603</fpage>
					<lpage>1608</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b023"><label>23</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Horwich</surname><given-names>MD</given-names></name><name name-style="western"><surname>Li</surname><given-names>C</given-names></name><name name-style="western"><surname>Matranga</surname><given-names>C</given-names></name><name name-style="western"><surname>Vagin</surname><given-names>V</given-names></name><name name-style="western"><surname>Farley</surname><given-names>G</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>The Drosophila RNA methyltransferase, DmHen1, modifies germline piRNAs and single-stranded siRNAs in RISC.</article-title>
					<source>Curr Biol</source>
					<volume>17</volume>
					<fpage>1265</fpage>
					<lpage>1272</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b024"><label>24</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Kirino</surname><given-names>Y</given-names></name><name name-style="western"><surname>Mourelatos</surname><given-names>Z</given-names></name></person-group>
					<year>2007</year>
					<article-title>The mouse homolog of HEN1 is a potential methylase for Piwi-interacting RNAs.</article-title>
					<source>RNA</source>
					<volume>13</volume>
					<fpage>1397</fpage>
					<lpage>1401</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b025"><label>25</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Ohara</surname><given-names>T</given-names></name><name name-style="western"><surname>Sakaguchi</surname><given-names>Y</given-names></name><name name-style="western"><surname>Suzuki</surname><given-names>T</given-names></name><name name-style="western"><surname>Ueda</surname><given-names>H</given-names></name><name name-style="western"><surname>Miyauchi</surname><given-names>K</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>The 3′ termini of mouse Piwi-interacting RNAs are 2′-O-methylated.</article-title>
					<source>Nat Struct Mol Biol</source>
					<volume>14</volume>
					<fpage>349</fpage>
					<lpage>350</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b026"><label>26</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Kirino</surname><given-names>Y</given-names></name><name name-style="western"><surname>Mourelatos</surname><given-names>Z</given-names></name></person-group>
					<year>2007</year>
					<article-title>Mouse Piwi-interacting RNAs are 2′-O-methylated at their 3′ termini.</article-title>
					<source>Nat Struct Mol Biol</source>
					<volume>14</volume>
					<fpage>347</fpage>
					<lpage>348</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b027"><label>27</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Carmell</surname><given-names>MA</given-names></name><name name-style="western"><surname>Xuan</surname><given-names>Z</given-names></name><name name-style="western"><surname>Zhang</surname><given-names>MQ</given-names></name><name name-style="western"><surname>Hannon</surname><given-names>GJ</given-names></name></person-group>
					<year>2002</year>
					<article-title>The Argonaute family: Tentacles that reach into RNAi, developmental control, stem cell maintenance, and tumorigenesis.</article-title>
					<source>Genes Dev</source>
					<volume>16</volume>
					<fpage>2733</fpage>
					<lpage>2742</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b028"><label>28</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Deng</surname><given-names>W</given-names></name><name name-style="western"><surname>Lin</surname><given-names>H</given-names></name></person-group>
					<year>2002</year>
					<article-title>Miwi, a murine homolog of piwi, encodes a cytoplasmic protein essential for spermatogenesis.</article-title>
					<source>Dev Cell</source>
					<volume>2</volume>
					<fpage>819</fpage>
					<lpage>830</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b029"><label>29</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Kuramochi-Miyagawa</surname><given-names>S</given-names></name><name name-style="western"><surname>Kimura</surname><given-names>T</given-names></name><name name-style="western"><surname>Ijiri</surname><given-names>TW</given-names></name><name name-style="western"><surname>Isobe</surname><given-names>T</given-names></name><name name-style="western"><surname>Asada</surname><given-names>N</given-names></name><etal/></person-group>
					<year>2004</year>
					<article-title>Mili, a mammalian member of piwi family gene, is essential for spermatogenesis.</article-title>
					<source>Development</source>
					<volume>131</volume>
					<fpage>839</fpage>
					<lpage>849</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b030"><label>30</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Sasaki</surname><given-names>T</given-names></name><name name-style="western"><surname>Shiohama</surname><given-names>A</given-names></name><name name-style="western"><surname>Minoshima</surname><given-names>S</given-names></name><name name-style="western"><surname>Shimizu</surname><given-names>N</given-names></name></person-group>
					<year>2003</year>
					<article-title>Identification of eight members of the Argonaute family in the human genome.</article-title>
					<source>Genomics</source>
					<volume>82</volume>
					<fpage>323</fpage>
					<lpage>330</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b031"><label>31</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Carmell</surname><given-names>MA</given-names></name><name name-style="western"><surname>Girard</surname><given-names>A</given-names></name><name name-style="western"><surname>van de Kant</surname><given-names>HJ</given-names></name><name name-style="western"><surname>Bourc'his</surname><given-names>D</given-names></name><name name-style="western"><surname>Bestor</surname><given-names>TH</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>MIWI2 is essential for spermatogenesis and repression of transposons in the mouse male germ line.</article-title>
					<source>Dev Cell</source>
					<volume>12</volume>
					<fpage>503</fpage>
					<lpage>514</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b032"><label>32</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Kotaja</surname><given-names>N</given-names></name><name name-style="western"><surname>Bhattacharyya</surname><given-names>SN</given-names></name><name name-style="western"><surname>Jaskiewicz</surname><given-names>L</given-names></name><name name-style="western"><surname>Kimmins</surname><given-names>S</given-names></name><name name-style="western"><surname>Parvinen</surname><given-names>M</given-names></name><etal/></person-group>
					<year>2006</year>
					<article-title>The chromatoid body of male germ cells: Similarity with processing bodies and presence of Dicer and microRNA pathway components.</article-title>
					<source>Proc Natl Acad Sci U S A</source>
					<volume>103</volume>
					<fpage>2647</fpage>
					<lpage>2652</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b033"><label>33</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Parvinen</surname><given-names>M</given-names></name></person-group>
					<year>2005</year>
					<article-title>The chromatoid body in spermatogenesis.</article-title>
					<source>Int J Androl</source>
					<volume>28</volume>
					<fpage>189</fpage>
					<lpage>201</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b034"><label>34</label><element-citation publication-type="other" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Nishida</surname><given-names>KM</given-names></name><name name-style="western"><surname>Saito</surname><given-names>K</given-names></name><name name-style="western"><surname>Mori</surname><given-names>T</given-names></name><name name-style="western"><surname>Kawamura</surname><given-names>Y</given-names></name><name name-style="western"><surname>Nagami-Okada</surname><given-names>T</given-names></name><etal/></person-group>
					<year>2007</year>
					<article-title>Gene silencing mechanisms mediated by Aubergine piRNA complexes in Drosophila male gonad.</article-title>
					<source>RNA</source>
					<comment>In press. doi:<ext-link ext-link-type="doi" xlink:href="http://dx.doi.org/10.1261/rna.744307" xlink:type="simple">10.1261/rna.744307</ext-link></comment>
				</element-citation></ref><ref id="pcbi-0030222-b035"><label>35</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Waterston</surname><given-names>RH</given-names></name><name name-style="western"><surname>Lindblad-Toh</surname><given-names>K</given-names></name><name name-style="western"><surname>Birney</surname><given-names>E</given-names></name><name name-style="western"><surname>Rogers</surname><given-names>J</given-names></name><name name-style="western"><surname>Abril</surname><given-names>JF</given-names></name><etal/></person-group>
					<year>2002</year>
					<article-title>Initial sequencing and comparative analysis of the mouse genome.</article-title>
					<source>Nature</source>
					<volume>420</volume>
					<fpage>520</fpage>
					<lpage>562</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b036"><label>36</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Mochizuki</surname><given-names>K</given-names></name><name name-style="western"><surname>Fine</surname><given-names>NA</given-names></name><name name-style="western"><surname>Fujisawa</surname><given-names>T</given-names></name><name name-style="western"><surname>Gorovsky</surname><given-names>MA</given-names></name></person-group>
					<year>2002</year>
					<article-title>Analysis of a piwi-related gene implicates small RNAs in genome rearrangement in Tetrahymena.</article-title>
					<source>Cell</source>
					<volume>110</volume>
					<fpage>689</fpage>
					<lpage>699</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b037"><label>37</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Lee</surname><given-names>SR</given-names></name><name name-style="western"><surname>Collins</surname><given-names>K</given-names></name></person-group>
					<year>2006</year>
					<article-title>Two classes of endogenous small RNAs in Tetrahymena thermophila.</article-title>
					<source>Genes Dev</source>
					<volume>20</volume>
					<fpage>28</fpage>
					<lpage>33</lpage>
				</element-citation></ref><ref id="pcbi-0030222-b038"><label>38</label><element-citation publication-type="journal" xlink:type="simple">
					<person-group person-group-type="author"><name name-style="western"><surname>Sijen</surname><given-names>T</given-names></name><name name-style="western"><surname>Plasterk</surname><given-names>RH</given-names></name></person-group>
					<year>2003</year>
					<article-title>Transposon silencing in the Caenorhabditis elegans germ line by natural RNAi.</article-title>
					<source>Nature</source>
					<volume>426</volume>
					<fpage>310</fpage>
					<lpage>314</lpage>
				</element-citation></ref></ref-list></back></article>