<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="nlm-ta">PLoS Comput Biol</journal-id>
<journal-id journal-id-type="pmc">ploscomp</journal-id><journal-title-group>
<journal-title>PLoS Computational Biology</journal-title></journal-title-group>
<issn pub-type="ppub">1553-734X</issn>
<issn pub-type="epub">1553-7358</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">PCOMPBIOL-D-14-01502</article-id>
<article-id pub-id-type="doi">10.1371/journal.pcbi.1003929</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biology and life sciences</subject><subj-group><subject>Biochemistry</subject><subj-group><subject>Proteins</subject><subj-group><subject>Protein domains</subject><subject>Protein structure</subject></subj-group></subj-group></subj-group><subj-group><subject>Computational biology</subject><subj-group><subject>Genome analysis</subject><subj-group><subject>Gene ontologies</subject></subj-group></subj-group></subj-group><subj-group><subject>Molecular biology</subject><subj-group><subject>Macromolecular structure analysis</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer and information sciences</subject><subj-group><subject>Computer software</subject><subj-group><subject>Open source software</subject></subj-group></subj-group></subj-group></article-categories>
<title-group>
<article-title>dcGOR: An R Package for Analysing Ontologies and Protein Domain Annotations</article-title>
<alt-title alt-title-type="running-head">dcGOR: Software for Ontologies and Domain Annotations</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>Hai</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib>
</contrib-group>
<aff id="aff1"><addr-line>Computational Genomics Group, Department of Computer Science, University of Bristol, Bristol, United Kingdom</addr-line></aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple"><name name-style="western"><surname>Gardner</surname><given-names>Paul P.</given-names></name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/></contrib>
</contrib-group>
<aff id="edit1"><addr-line>University of Canterbury, New Zealand</addr-line></aff>
<author-notes>
<corresp id="cor1">* E-mail: <email xlink:type="simple">hfang@cs.bris.ac.uk</email></corresp>
<fn fn-type="conflict"><p>The author has declared that no competing interests exist.</p></fn>
<fn fn-type="con"><p>Conceived and designed the experiments: HF. Performed the experiments: HF. Analyzed the data: HF. Contributed reagents/materials/analysis tools: HF. Wrote the paper: HF.</p></fn>
</author-notes>
<pub-date pub-type="collection"><month>10</month><year>2014</year></pub-date>
<pub-date pub-type="epub"><day>30</day><month>10</month><year>2014</year></pub-date>
<volume>10</volume>
<issue>10</issue>
<elocation-id>e1003929</elocation-id>
<history>
<date date-type="received"><day>15</day><month>8</month><year>2014</year></date>
<date date-type="accepted"><day>21</day><month>9</month><year>2014</year></date>
</history>
<permissions>
<copyright-year>2014</copyright-year>
<copyright-holder>Hai Fang</copyright-holder><license xlink:type="simple"><license-p>This is an open-access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p></license></permissions>
<abstract>
<p>I introduce an open-source R package ‘dcGOR’ to make it easy for the bioinformatics community to analyse ontologies and protein domain annotations, particularly those in the dcGO database. dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including the Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to meet its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structures to represent domains, ontologies, annotations and all analytical outputs. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Given a list of protein domains of interest, the package can carry out in-depth analyses of their functional, phenotypic and disease relevance, and provide network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) as well as on RNAs (from Rfam). The package is freely available from CRAN for easy installation, and also from GitHub for version control. The dedicated website with reproducible demos can be found at <ext-link ext-link-type="uri" xlink:href="http://supfam.org/dcGOR" xlink:type="simple">http://supfam.org/dcGOR</ext-link>.</p>
</abstract>
<funding-group><funding-statement>This project is funded by the Biotechnology and Biological Sciences Research Council (<ext-link ext-link-type="uri" xlink:href="http://www.bbsrc.ac.uk" xlink:type="simple">http://www.bbsrc.ac.uk</ext-link>) [BB/L018543/1]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement></funding-group><counts><page-count count="7"/></counts><custom-meta-group><custom-meta id="data-availability" xlink:type="simple"><meta-name>Data Availability</meta-name><meta-value>The author confirms that all data underlying the findings are fully available without restriction. All relevant data are within the paper and its Supporting Information file.</meta-value></custom-meta></custom-meta-group></article-meta>
</front>
<body><sec id="s1">
<title/>
<disp-quote>
<p>This is a <italic>PLOS Computational Biology</italic> Software Article</p>
</disp-quote></sec><sec id="s2">
<title>Introduction</title>
<p>Proteins are modular, built from structural units called domains <xref ref-type="bibr" rid="pcbi.1003929-Murzin1">[1]</xref>. Domains often act as the operational units responsible for many aspects of protein function, and some of them are linked to phenotypic traits and disease states. Despite their importance in biology, domains have been studied less than proteins/genes in terms of ontology annotation. This much-needed resource has only recently been provided by the dcGO database <xref ref-type="bibr" rid="pcbi.1003929-Fang1">[2]</xref>. This database provides a systematic annotation of domains using a panel of ontologies. An ontology such as the Gene Ontology (GO) <xref ref-type="bibr" rid="pcbi.1003929-Ashburner1">[3]</xref> is a controlled vocabulary organised in a hierarchy to categorise a particular sphere of knowledge. The dcGO algorithm was initially published as an improvement to the SUPERFAMILY database <xref ref-type="bibr" rid="pcbi.1003929-DeLimaMorais1">[4]</xref>. The quality and utility of this resource were evaluated in the Critical Assessment of Function Annotation (CAFA) competition <xref ref-type="bibr" rid="pcbi.1003929-Radivojac1">[5]</xref>, <xref ref-type="bibr" rid="pcbi.1003929-Fang2">[6]</xref>. The dcGO webserver provides several mining facilities; however, web-based facilities are limited in analytical flexibility and scalability, so a standalone tool that overcomes these limitations is needed. Currently, there are no bioinformatics tools specifically designed for analysing ontologies and annotations at the domain level. Most, if not all, open-source tools (such as ‘topGO’ <xref ref-type="bibr" rid="pcbi.1003929-Alexa1">[7]</xref>, ‘GOSemSim’ <xref ref-type="bibr" rid="pcbi.1003929-Yu1">[8]</xref> and ‘ontologizer’ <xref ref-type="bibr" rid="pcbi.1003929-Grossmann1">[9]</xref>) are gene-centric and deal with only a small number of ontologies, usually GO. 
To the best of my knowledge, these tools do not support customised analysis using users' own ontologies and annotations. To meet these needs, I have developed ‘dcGOR’, a flexible R package that provides a basic infrastructure for representing ontologies and annotations. More importantly, it supports various analytical utilities tailored to the dcGO resource. As demonstrated below, dcGOR can carry out in-depth analyses of input domains. The structural bioinformatics/genomics community increasingly needs this type of analysis. With this package, users can interpret their domains of interest in terms of their relevance to functions, phenotypes and diseases, and also at the network level. Users can also perform customised analyses using their own ontologies and annotations.</p>
</sec><sec id="s3">
<title>Design and Implementation</title>
<p>The dcGOR package is designed in a general way that allows for representing and analysing three kinds of information: domains, ontologies and annotations. For domain-centric annotation, its backend is built-in data pre-compiled from the latest version of the dcGO database <xref ref-type="bibr" rid="pcbi.1003929-Fang1">[2]</xref>. There are a dozen or so ontologies, such as GO, Disease Ontology (DO) <xref ref-type="bibr" rid="pcbi.1003929-Schriml1">[10]</xref> and Human Phenotype (HP) <xref ref-type="bibr" rid="pcbi.1003929-Khler1">[11]</xref>, all of which are used to annotate both SCOP domain superfamilies and families <xref ref-type="bibr" rid="pcbi.1003929-Andreeva1">[12]</xref>. Also supported are GO annotations for domains taken from Pfam <xref ref-type="bibr" rid="pcbi.1003929-Punta1">[13]</xref> and InterPro <xref ref-type="bibr" rid="pcbi.1003929-Hunter1">[14]</xref>, and for non-coding RNAs from Rfam <xref ref-type="bibr" rid="pcbi.1003929-Gardner1">[15]</xref>. <xref ref-type="table" rid="pcbi-1003929-t001"><bold>Table 1</bold></xref> lists the ontologies and annotations supported in the package.</p>
<table-wrap id="pcbi-1003929-t001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003929.t001</object-id><label>Table 1</label><caption>
<title>A summary of ontologies, infrastructures and functions included in dcGOR.</title>
</caption><alternatives><graphic id="pcbi-1003929-t001-1" position="float" mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003929.t001" xlink:type="simple"/>
<table><colgroup span="1"><col align="left" span="1"/><col align="center" span="1"/></colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Name</td>
<td align="left" rowspan="1" colspan="1">Description</td>
</tr>
</thead>
<tbody>
<tr>
<td colspan="2" align="left" rowspan="1"><bold>Ontologies</bold></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Gene Ontology</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on functions; annotate domains from SCOP, Pfam, InterPro and RNA families from Rfam</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Disease Ontology</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on human diseases; annotate SCOP domains only</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Human Phenotype</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on human phenotypes; annotate SCOP domains only</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Mammalian Phenotype</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on mouse phenotypes; annotate SCOP domains only</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Enzyme Commission</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on enzyme activities; annotate SCOP domains only</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>UniProtKB KeyWords</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on functions and others; annotate SCOP domains only</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>UniProtKB UniPathway</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Knowledge on pathways; annotate SCOP domains only</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1"><bold>Infrastructures</bold></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>InfoDataFrame</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for representing data information (e.g. domains)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Onto</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for representing ontologies</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Anno</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for representing domain-centric annotations</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Eoutput</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for storing enrichment outputs</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Dnetwork</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for storing domain networks</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Coutput</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for storing RWR-based contact outputs</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>Cnetwork</italic></bold></td>
<td align="left" rowspan="1" colspan="1">S4 class for storing contact networks</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1"><bold>Functions for customised data building</bold></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcBuildInfoDataFrame</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Create an object of S4 class ‘InfoDataFrame’ from an input file</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcBuildOnto</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Create an object of S4 class ‘Onto’ from input files</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcBuildAnno</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Create an object of S4 class ‘Anno’ from input files</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1"><bold>Functions for analysis and visualisation</bold></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcEnrichment</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Enrichment analysis; return an object of S4 class ‘Eoutput’</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>visEnrichment</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Enrichment output visualisation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcDAGdomainSim</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Semantic similarity calculation; return an object of S4 class ‘Dnetwork’</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcRWRpipeline</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Random walk with restart; return an object of S4 class ‘Coutput’</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcDAGannotate</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Annotation propagation according to the true-path rule</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcConverter</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Conversion between different graph classes</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"><bold><italic>dcRDataLoader</italic></bold></td>
<td align="left" rowspan="1" colspan="1">Loading RData into the current environment</td>
</tr>
</tbody>
</table>
</alternatives></table-wrap>
<p>dcGOR is implemented entirely in the R software environment. Three S4 classes are defined: ‘InfoDataFrame’ for domains, ‘Onto’ for ontologies and ‘Anno’ for annotations. The class ‘InfoDataFrame’ stores domain information. An ontology is organised as a directed acyclic graph (DAG; a directed graph without cycles). The class ‘Onto’ therefore represents the ontology as a directed graph, storing both its adjacency matrix and node (term) information. The class ‘Anno’ holds a sparse annotation matrix together with additional metadata on domains and terms. Each class has its own class-specific S4 methods. This design of data representations simplifies domain ontology analyses. <xref ref-type="table" rid="pcbi-1003929-t001"><bold>Table 1</bold></xref> outlines the supported analyses: domain-based enrichment analysis, semantic similarity between pairs of annotated domains, and significance analysis for estimating a contact network.</p>
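<p>The Python sketch below is a rough illustration of this class design, not the dcGOR R API. It mirrors the idea behind ‘Onto’ and ‘Anno’ with two hypothetical classes. <italic>SimpleOnto</italic> stores the ontology as a sparse parent-to-child adjacency and uses Kahn's topological sort to check that it is acyclic. <italic>SimpleAnno</italic> stores the sparse domain-by-term annotation matrix as term-to-domain sets. All class and method names are assumptions made for illustration.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from collections import defaultdict, deque


class SimpleOnto:
    """Hypothetical stand-in for an ontology: terms plus parent-to-child edges."""

    def __init__(self, terms, edges):
        self.terms = list(terms)
        # Sparse adjacency: parent term -> set of child terms.
        self.children = defaultdict(set)
        for parent, child in edges:
            self.children[parent].add(child)

    def is_dag(self):
        """Kahn's algorithm: acyclic if and only if every term can be ordered."""
        indegree = {t: 0 for t in self.terms}
        for kids in self.children.values():
            for child in kids:
                indegree[child] += 1
        queue = deque(t for t, d in indegree.items() if d == 0)
        ordered = 0
        while queue:
            term = queue.popleft()
            ordered += 1
            for child in self.children.get(term, ()):
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
        return ordered == len(self.terms)


class SimpleAnno:
    """Hypothetical sparse annotation matrix, stored as term -> set of domains."""

    def __init__(self, pairs):
        self.by_term = defaultdict(set)
        for domain, term in pairs:
            self.by_term[term].add(domain)

    def domains_of(self, term):
        # .get avoids inserting empty entries into the defaultdict.
        return self.by_term.get(term, set())
```
</preformat>
<p>For example, calling is_dag() on an ontology with the single edge ("root", "a") returns True. Adding the reverse edge ("a", "root") creates a cycle, so it returns False.</p>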
<p>The function <italic>dcEnrichment</italic> conducts enrichment analysis based on the hypergeometric/binomial distribution or Fisher's exact test <xref ref-type="bibr" rid="pcbi.1003929-Rivals1">[16]</xref>. It tests the statistical significance of the observed overlap between an input group of domains and the domains annotated by an ontology term. By default, all annotatable domains are used as the test background, but users can specify their own background. Given a group of domains, <italic>dcEnrichment</italic> reports the ontology terms that are enriched in this group. To account for the ontology DAG, it also implements several algorithms originally applied to GO <xref ref-type="bibr" rid="pcbi.1003929-Alexa1">[7]</xref>, <xref ref-type="bibr" rid="pcbi.1003929-Grossmann1">[9]</xref>. The basic idea is to estimate the significance of a term after adjusting for (e.g. removing) the annotations that its child terms also have. Enrichment outputs are stored as an object of S4 class ‘Eoutput’, on which methods are defined for easy viewing and saving. Operating directly on this object, the function <italic>visEnrichment</italic> visualises the most significant terms in the context of the ontology DAG to aid intuitive interpretation.</p>
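<p>The core test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code behind <italic>dcEnrichment</italic>. It uses the one-sided hypergeometric tail P(X ≥ k), with the following quantities:</p>
<list list-type="bullet">
<list-item><p>N is the background size.</p></list-item>
<list-item><p>K is the number of background domains annotated to a term.</p></list-item>
<list-item><p>n is the number of input domains in the background.</p></list-item>
<list-item><p>k is the overlap between the input domains and the annotated domains.</p></list-item>
</list>
<p>Benjamini-Hochberg correction is assumed for the adjusted p-values. The DAG-aware algorithms mentioned above are not included. The helper names (hypergeom_upper_tail, bh_adjust, enrich) are hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background domains, K annotated, n drawn)."""
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order of p-value
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def enrich(input_domains, term_to_domains, background=None):
    """Return {term: (overlap, p-value, adjusted p-value)} for each term.

    By default the background is every annotatable domain.
    """
    annotatable = set().union(*term_to_domains.values())
    bg = set(background) if background is not None else annotatable
    query = set(input_domains).intersection(bg)
    terms, rows = [], []
    for term, domains in term_to_domains.items():
        annotated = set(domains).intersection(bg)
        overlap = len(annotated.intersection(query))
        p = hypergeom_upper_tail(overlap, len(query), len(annotated), len(bg))
        terms.append(term)
        rows.append((overlap, p))
    adj = bh_adjust([p for _, p in rows])
    return {t: (o, p, a) for t, (o, p), a in zip(terms, rows, adj)}
```
</preformat>
<p>Passing a custom background restricts both the annotated domains and the input domains to that background. This is analogous to the user-specified background described above.</p>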
<p>Semantic similarity measures the degree of relatedness between two entities (here, domains) based on the meaning of their annotations <xref ref-type="bibr" rid="pcbi.1003929-Pesquita1">[17]</xref>. Semantic similarity between domains is calculated from their annotations by ontology terms. To do so, the information content (IC) of a term is defined as the negative base-10 logarithm of the frequency of domains annotated to that term. This definition uses the actual usage of a term (the frequency of domains annotated to it) to measure how specific and informative the term is. The function <italic>dcDAGdomainSim</italic> first calculates semantic similarity between terms, which is then used to derive similarity between domains. The widely used IC-based semantic similarity measures <xref ref-type="bibr" rid="pcbi.1003929-Yu1">[8]</xref>, <xref ref-type="bibr" rid="pcbi.1003929-Pesquita1">[17]</xref> are supported. From pairwise term similarity, <italic>dcDAGdomainSim</italic> offers several methods to calculate similarity between pairs of domains, including three best-matching (BM) based methods: average, maximum and complete. For each term annotating either domain, all of these BM-based methods first calculate its maximum similarity to any term annotating the other domain. The methods differ in how these best-match scores are then combined. For more detail, the reader is referred to the review in <xref ref-type="bibr" rid="pcbi.1003929-Pesquita1">[17]</xref>. The resulting domain (semantic similarity) network is stored as an object of S4 class ‘Dnetwork’. This is a weighted undirected graph in which domains are nodes and their semantic similarity scores are edge weights. The higher the edge weight, the more similar the domain pair. There is no hard threshold for semantic similarity scores, but it is advisable to focus on the edges with the highest weights (e.g. the top 50% of all edges).</p>
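<p>A minimal Python sketch of these steps follows, with two labelled assumptions. First, Resnik similarity (the IC of the most informative common ancestor) stands in for whichever IC-based measure is chosen. Second, the best-match average is shown as the way of combining term similarities into a domain similarity. Annotations are assumed to have been propagated already, so each term's domain set includes the domains of its descendants. Function names are hypothetical and do not correspond to <italic>dcDAGdomainSim</italic> arguments.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import math


def ancestors(term, parents):
    """All ancestors of a term, including the term itself (child -> parents map)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term_to_domains, n_domains):
    """IC(t) = -log10(fraction of domains annotated to t).

    Assumes annotations were already propagated up the DAG (true-path rule).
    """
    return {t: -math.log10(len(ds) / n_domains)
            for t, ds in term_to_domains.items() if ds}


def resnik(t1, t2, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents).intersection(ancestors(t2, parents))
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def bm_average(terms1, terms2, parents, ic):
    """Best-match average over the terms annotating two domains."""
    best1 = [max(resnik(a, b, parents, ic) for b in terms2) for a in terms1]
    best2 = [max(resnik(a, b, parents, ic) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))
```
</preformat>
<p>Consider a toy ontology in which term c is a child of a, and a is a child of the root. Term c annotates 1 of 4 domains and a annotates 2. A domain annotated only to c and a domain annotated only to a then have a BM-average similarity of log10(2), the IC of their shared ancestor a.</p>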
<p>Given a domain network (e.g. the one resulting from <italic>dcDAGdomainSim</italic>), the function <italic>dcRWRpipeline</italic> performs random walk with restart (RWR) to estimate the contact strength and significance between two input groups of domains (used as seeds). It builds on earlier work <xref ref-type="bibr" rid="pcbi.1003929-Fang3">[18]</xref> but has been generalised to allow weighted domain seeds and to carry out the whole analysis in a single step. RWR-based contact outputs are stored as an object of S4 class ‘Coutput’, which includes a contact (statistical significance) network. This network is also a weighted undirected graph (an object of S4 class ‘Cnetwork’).</p>
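<p>The following Python sketch illustrates the idea rather than reproducing <italic>dcRWRpipeline</italic>. It runs RWR by power iteration on a weighted, undirected graph with weighted seeds. Three choices in it are assumptions of this sketch:</p>
<list list-type="bullet">
<list-item><p>the restart probability of 0.75;</p></list-item>
<list-item><p>the symmetrised definition of contact strength (the probability mass each seed set sends to the other);</p></list-item>
<list-item><p>the permutation-based z-score against random node sets of the same size.</p></list-item>
</list>
<preformat preformat-type="code" xml:space="preserve">
```python
import random
import statistics


def rwr(adjacency, seeds, restart=0.75, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted undirected graph.

    adjacency: {node: {neighbour: weight}}, assumed symmetric.
    seeds: {node: seed weight}; weights are normalised to sum to 1.
    Returns the steady-state visiting probability of every node.
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}
    strength = {v: sum(adjacency[v].values()) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability restart, jump back to the seeds...
        nxt = {v: restart * p0[v] for v in nodes}
        # ...otherwise move to a neighbour in proportion to edge weight.
        for v in nodes:
            if strength[v] > 0:
                share = (1 - restart) * p[v] / strength[v]
                for u, w in adjacency[v].items():
                    nxt[u] += share * w
        diff = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if tol >= diff:
            break
    return p


def contact_strength(adjacency, set_a, set_b, restart=0.75):
    """Symmetric contact: probability mass each seed set sends to the other."""
    pa = rwr(adjacency, {v: 1.0 for v in set_a}, restart)
    pb = rwr(adjacency, {v: 1.0 for v in set_b}, restart)
    return 0.5 * (sum(pa[v] for v in set_b) + sum(pb[v] for v in set_a))


def contact_zscore(adjacency, set_a, set_b, n_perm=100, seed=0):
    """z-score of the observed contact against random node sets of equal size."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    observed = contact_strength(adjacency, set_a, set_b)
    null = [contact_strength(adjacency,
                             rng.sample(nodes, len(set_a)),
                             rng.sample(nodes, len(set_b)))
            for _ in range(n_perm)]
    sd = statistics.stdev(null)
    return (observed - statistics.mean(null)) / sd if sd > 0 else 0.0
```
</preformat>
<p>Nodes without edges cannot pass probability on, so in this sketch their share of the mass leaks away. A real pipeline would need to handle such nodes explicitly.</p>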
<p>In addition to the analyses above, dcGOR has several auxiliary functions for data loading, annotation propagation, graph class conversion and fast computation. The function <italic>dcRDataLoader</italic> is the hub for loading all built-in data in the package. This simplifies data use and leaves room for future data expansion. The function <italic>dcDAGannotate</italic> propagates annotations according to the true-path rule: a domain annotated to a term is also annotated to all of that term's ancestors (up to the root). This keeps annotations consistent with the ontology hierarchy, so that only the part of the ontology that carries domain annotations is used in analyses. The function <italic>dcConverter</italic> converts objects between the graph classes defined in dcGOR and those used by the packages ‘igraph’ <xref ref-type="bibr" rid="pcbi.1003929-Csardi1">[19]</xref> and ‘dnet’ <xref ref-type="bibr" rid="pcbi.1003929-Fang4">[20]</xref>, which enables network visualisation. Pairwise semantic similarity matrices are visualised using the package ‘supraHex’ <xref ref-type="bibr" rid="pcbi.1003929-Fang5">[21]</xref>. To reduce the computational burden, dcGOR uses vectorised and parallelised operations. Parallel computing is achieved by executing loops in parallel with the packages ‘doMC’ and ‘foreach’.</p>
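<p>The true-path rule itself is simple to express. The Python sketch below is a hypothetical illustration, not <italic>dcDAGannotate</italic>. Given direct annotations and a child-to-parents map, it adds each domain to every ancestor of the terms it annotates. It memoises ancestor sets so that shared parts of the DAG are visited only once.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from collections import defaultdict


def propagate(direct, parents):
    """True-path rule: a domain annotated to a term is also annotated to all
    of that term's ancestors, up to the root.

    direct:  {term: set of directly annotated domains}
    parents: {child term: list of parent terms}
    """
    cache = {}

    def ancestors(term):
        # Memoised so shared ancestors in the DAG are computed only once.
        if term not in cache:
            found = {term}
            for parent in parents.get(term, ()):
                found.update(ancestors(parent))
            cache[term] = found
        return cache[term]

    propagated = defaultdict(set)
    for term, domains in direct.items():
        for anc in ancestors(term):
            propagated[anc].update(domains)
    return dict(propagated)
```
</preformat>
<p>Consider a toy hierarchy in which c is a child of a, and both a and b are children of the root. A domain annotated only to c is also propagated to a and to the root.</p>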
</sec><sec id="s4">
<title>Results</title>
<p>The most common use case is to analyse a list of protein domains of interest. As a proof of principle, I use two lists of domains (one from SCOP, the other from Pfam) to demonstrate the functionalities of the dcGOR package, particularly enrichment analysis and network analysis. I also show how users can analyse their own domains, ontologies and annotations with this package. All these examples can be reproduced by following the step-by-step demos on the package website, where the results can also be found.</p>
<sec id="s4a">
<title>Analysing SCOP domains gained in human compared to Metazoa</title>
<p>First, I analyse a list of SCOP domain superfamilies that the human genome has gained since the ancient ‘Metazoa’ (animal) ancestor. According to a previous report <xref ref-type="bibr" rid="pcbi.1003929-Fang6">[22]</xref>, a total of 1,112 SCOP domain superfamilies are present in human, of which 58 were absent in the ancient Metazoan ancestor. Thus, these 58 domains were <italic>de novo</italic> gained during the evolution of the human lineage. To gain insight into the relevance of these domains to functions, phenotypes and diseases, I use <italic>dcEnrichment</italic> to perform enrichment analysis, using all domains in Metazoa as the background. GO Biological Process (GOBP) enrichments suggest that they are functionally relevant to ‘multicellular organismal development’ and ‘toll-like receptor signalling pathway’. <xref ref-type="fig" rid="pcbi-1003929-g001"><bold>Figure 1</bold></xref> illustrates these top enriched terms in the context of the GO hierarchy. This is consistent with the evolution of more complex functions along the human lineage. Enrichment analysis using DO also reveals a significant link with ‘disease of cellular proliferation’.</p>
<fig id="pcbi-1003929-g001" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003929.g001</object-id><label>Figure 1</label><caption>
<title>Domain-based enrichment analysis using GOBP terms.</title>
<p>Only the 5 most significant terms/nodes (outlined in black; explained in the bottom-right panel) are visualised along with their ancestral terms. Nodes are coloured according to adjusted p-values.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003929.g001" position="float" xlink:type="simple"/></fig>
<p>To further understand the relevance of these 58 domains to diseases, I use <italic>dcDAGdomainSim</italic> to construct a domain network according to domain-centric annotations by DO. This is done by calculating the semantic similarity between pairs of domains (<xref ref-type="fig" rid="pcbi-1003929-g002"><bold>Figure 2A</bold></xref>). The resulting domain (semantic similarity) network contains 11 disease domains. These domains are similar to each other, but to varying degrees (<xref ref-type="fig" rid="pcbi-1003929-g002"><bold>Figure 2B</bold></xref>). Finally, on this domain network, I run <italic>dcRWRpipeline</italic> to estimate the contact strength and significance between sets of domains. Here each domain set consists of the domains annotated by a GO Molecular Function (GOMF) term (see <xref ref-type="fig" rid="pcbi-1003929-g002"><bold>Figure 2C</bold></xref>). The statistically significant contacts between terms are visualised in <xref ref-type="fig" rid="pcbi-1003929-g002"><bold>Figure 2D</bold></xref>. These results suggest two things. First, domains <italic>de novo</italic> gained during the evolution of the human lineage tend to form a disease similarity domain network. Second, this network has a functional preference. Taken together, this example supports domain-centric approaches to studying genome evolution, function and phenotype/disease.</p>
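<p>The seed sets used for RWR can be assembled with a simple filter. According to the Figure 2 caption, only GOMF terms with at least 3 annotated domains in the domain network are used. The hypothetical Python helper below follows that rule. For each term, it keeps only the annotated domains that are network nodes, and it discards terms with fewer than min_size such domains. It is a sketch, not part of the dcGOR API.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
def build_seed_sets(term_to_domains, network_nodes, min_size=3):
    """Map each term to its annotated domains that lie in the network,
    keeping only terms with at least min_size such domains."""
    in_network = set(network_nodes)
    seed_sets = {}
    for term, domains in term_to_domains.items():
        members = sorted(set(domains).intersection(in_network))
        if len(members) >= min_size:
            seed_sets[term] = members
    return seed_sets
```
</preformat>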
<fig id="pcbi-1003929-g002" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003929.g002</object-id><label>Figure 2</label><caption>
<title>In-depth analysis for network-level understanding.</title>
<p>(<bold>A</bold>) Heatmap visualisation of the semantic similarity between pairs of domains according to their annotations by the Disease Ontology (DO). (<bold>B</bold>) Network representation of the pairwise domain semantic similarity. It is a weighted, undirected network in which edge thickness indicates the semantic similarity between a pair of domains/nodes. Nodes are labelled by both numeric id and textual description. (<bold>C</bold>) A table listing GOMF terms and their annotated domains (used as domain seeds for random walk with restart, RWR). Only terms with at least 3 annotated domains that are also in the domain network (see <xref ref-type="fig" rid="pcbi-1003929-g002">Figure 2B</xref>) are used. (<bold>D</bold>) Contact (statistical significance) network between GOMF terms in <xref ref-type="fig" rid="pcbi-1003929-g002">Figure 2C</xref>, as estimated by RWR on the domain network in <xref ref-type="fig" rid="pcbi-1003929-g002">Figure 2B</xref>. Only significant contacts/edges (adjusted p-values&lt;0.1) are shown, with edge thickness indicating the contact strength (z-score).</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003929.g002" position="float" xlink:type="simple"/></fig></sec><sec id="s4b">
<title>Analysing promiscuous Pfam domains</title>
<p>Next, I extend the analysis to a list of Pfam domains that tend to occur in diverse domain architectures, a tendency known as promiscuity. In a previous study <xref ref-type="bibr" rid="pcbi.1003929-Basu1">[23]</xref>, a total of 215 domains were identified as strongly promiscuous, 76 of which were taken from Pfam. Enrichment analysis of these 76 Pfam domains identifies ‘mismatch repair’ as the most significant GOBP term and ‘ATPase activity’ as the most significant GOMF term (<xref ref-type="fig" rid="pcbi-1003929-g003"><bold>Figure 3</bold></xref>). These two functional categories are consistent with the previous report. However, there is a lack of statistical support for the relevance to ‘signal transduction’ claimed previously <xref ref-type="bibr" rid="pcbi.1003929-Basu1">[23]</xref>. Unlike DO, GO contains three sub-ontologies: GOBP, GOMF and GO Cellular Component (GOCC). Therefore, the semantic similarity between pairs of these 76 domains was first calculated separately for each GO sub-ontology. These scores were then summed to obtain the overall GO semantic similarity (<xref ref-type="fig" rid="pcbi-1003929-g004"><bold>Figure 4</bold></xref>).</p>
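<p>The combination step can be sketched in Python as below. The sketch holds per-sub-ontology similarities as dictionaries keyed by domain pair. It treats pairs as unordered, and a pair missing from one sub-ontology contributes zero. These representation choices are illustrative assumptions, not how dcGOR stores its ‘Dnetwork’ objects.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
def combine_similarities(per_ontology):
    """Sum domain-pair similarity scores across sub-ontologies.

    per_ontology: {sub-ontology name: {(domain1, domain2): score}}.
    Pairs are treated as unordered, and a pair absent from a sub-ontology
    contributes 0 (an assumption of this sketch).
    """
    combined = {}
    for scores in per_ontology.values():
        for pair, score in scores.items():
            key = tuple(sorted(pair))
            combined[key] = combined.get(key, 0.0) + score
    return combined
```
</preformat>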
<fig id="pcbi-1003929-g003" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003929.g003</object-id><label>Figure 3</label><caption>
<title>Enrichment analysis of promiscuous Pfam domains using GOBP terms (left) and GOMF terms (right).</title>
<p>Only the most significant terms/nodes (adjusted p-values&lt;0.05; outlined in black) are visualised along with their ancestral terms. Nodes are coloured according to adjusted p-values.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003929.g003" position="float" xlink:type="simple"/></fig><fig id="pcbi-1003929-g004" position="float"><object-id pub-id-type="doi">10.1371/journal.pcbi.1003929.g004</object-id><label>Figure 4</label><caption>
<title>Heatmap visualisation of the GO overall semantic similarity between pairs of promiscuous Pfam domains.</title>
<p>Domains are ordered according to hierarchical clustering by the package ‘supraHex’.</p>
</caption><graphic mimetype="image" xlink:href="info:doi/10.1371/journal.pcbi.1003929.g004" position="float" xlink:type="simple"/></fig></sec><sec id="s4c">
<title>Analysing users' own domains, ontologies and annotations</title>
<p>Unique to this package, dcGOR supports customised analysis using data files provided by users. Three functions (<italic>dcBuildInfoDataFrame</italic>, <italic>dcBuildOnto</italic> and <italic>dcBuildAnno</italic>) read input files containing relevant information on domains, ontologies and annotations. From these files, they create objects of the classes defined in the package (<xref ref-type="table" rid="pcbi-1003929-t001"><bold>Table 1</bold></xref>). Like the built-in data, these customised objects can then be used in all analyses supported by the package. The online demo (<ext-link ext-link-type="uri" xlink:href="http://supfam.org/dcGOR/demo-Customisation.html" xlink:type="simple">http://supfam.org/dcGOR/demo-Customisation.html</ext-link>) provides detailed instructions on how to analyse the InterPro2GO mapping <xref ref-type="bibr" rid="pcbi.1003929-Burge1">[24]</xref>, starting from the input files.</p>
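<p>The following Python sketch illustrates what building customised data involves. It parses two hypothetical tab-separated inputs: one lists parent and child terms, and the other lists domain and term pairs. From these, it builds an ontology and a term-to-domain annotation map. The input layout is an assumption. The actual formats expected by <italic>dcBuildOnto</italic> and <italic>dcBuildAnno</italic> are described in the online demo and may differ.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import csv


def read_pairs(lines):
    """Parse two-column, tab-separated lines; skip blank lines and '#' comments."""
    pairs = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def build_custom(relations_text, annotations_text):
    """Build an ontology and annotations from hypothetical text inputs.

    relations_text:   one 'parent TAB child' pair per line
    annotations_text: one 'domain TAB term' pair per line
    """
    edges = read_pairs(relations_text.splitlines())
    terms = sorted({t for edge in edges for t in edge})
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    term_to_domains = {}
    for domain, term in read_pairs(annotations_text.splitlines()):
        term_to_domains.setdefault(term, set()).add(domain)
    return terms, parents, term_to_domains
```
</preformat>
<p>The returned child-to-parents map and term-to-domain sets are in the shape that a propagation step (true-path rule) would take as input before any enrichment or similarity analysis.</p>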
</sec></sec><sec id="s5">
<title>Availability and Future Directions</title>
<p>The dcGOR package is open-source software, freely available under the GPL-2 license (see <bold><xref ref-type="supplementary-material" rid="pcbi.1003929.s001">Software S1</xref></bold>). To simplify installation (including resolution of R package dependencies), it is distributed through CRAN, <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/package=dcGOR" xlink:type="simple">http://cran.r-project.org/package=dcGOR</ext-link>. To support version control, it is also hosted on GitHub, <ext-link ext-link-type="uri" xlink:href="https://github.com/hfang-bristol/dcGOR" xlink:type="simple">https://github.com/hfang-bristol/dcGOR</ext-link>. Documentation and demos are available at <ext-link ext-link-type="uri" xlink:href="http://supfam.org/dcGOR" xlink:type="simple">http://supfam.org/dcGOR</ext-link>. Unlike those of most R packages, the online documentation and demos are designed to be user-friendly, showing both the illustrative code and its executed output. Because users can reproduce exactly what they see, this should considerably reduce the learning curve and encourage wider adoption.</p>
<p>dcGOR is a general, open-source tool for ontology and annotation analysis, providing a relatively complete framework. As demonstrated, it can analyse the three most popular domain types (SCOP, Pfam and InterPro) as well as Rfam RNA families, and it supports customised analysis. For example, users can analyse domains with different definitions, such as those from the partner member databases of the InterPro consortium <xref ref-type="bibr" rid="pcbi.1003929-Hunter1">[14]</xref>. The package is designed to be generic to all ontologies: not merely GO (the focus of most existing tools) but also organism-specific ontologies. The future intention is to include the ontologies of the Open Biomedical Ontologies consortium <xref ref-type="bibr" rid="pcbi.1003929-Smith1">[25]</xref>. Here I describe only a handful of the analyses routinely required in ontology analysis, but the package is designed to be extended. Beyond the data expansion mentioned above, future development will focus on utilities for predicting genome function and phenotypes. Because dcGOR establishes a standard framework, ontology users and developers should find it much easier to extend the software to meet their needs. The same design principles could equally be applied to ontology analysis at the gene level.</p>
</sec><sec id="s6">
<title>Supporting Information</title>
<supplementary-material id="pcbi.1003929.s001" mimetype="application/x-gzip" xlink:href="info:doi/10.1371/journal.pcbi.1003929.s001" position="float" xlink:type="simple"><label>Software S1</label><caption>
<p>Package ‘dcGOR’ (version 1.0.3) including source code, documentation and data.</p>
<p>(GZ)</p>
</caption></supplementary-material></sec></body>
<back>
<ack>
<p>I am grateful to Prof. Julian Gough for support of this project.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcbi.1003929-Murzin1"><label>1</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Murzin</surname><given-names>AG</given-names></name>, <name name-style="western"><surname>Brenner</surname><given-names>SE</given-names></name>, <name name-style="western"><surname>Hubbard</surname><given-names>T</given-names></name>, <name name-style="western"><surname>Chothia</surname><given-names>C</given-names></name> (<year>1995</year>) <article-title>SCOP: a structural classification of proteins database for the investigation of sequences and structures</article-title>. <source>J Mol Biol</source> <volume>247</volume>: <fpage>536</fpage>–<lpage>540</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Fang1"><label>2</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Gough</surname><given-names>J</given-names></name> (<year>2013</year>) <article-title>dcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more</article-title>. <source>Nucleic Acids Res</source> <volume>41</volume>: <fpage>D536</fpage>–<lpage>D544</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Ashburner1"><label>3</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ashburner</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Ball</surname><given-names>CA</given-names></name>, <name name-style="western"><surname>Blake</surname><given-names>JA</given-names></name>, <name name-style="western"><surname>Botstein</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Butler</surname><given-names>H</given-names></name>, <etal>et al</etal>. (<year>2000</year>) <article-title>Gene Ontology: tool for the unification of biology</article-title>. <source>Nat Genet</source> <volume>25</volume>: <fpage>25</fpage>–<lpage>29</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-DeLimaMorais1"><label>4</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>De Lima Morais</surname><given-names>DA</given-names></name>, <name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Rackham</surname><given-names>OJ</given-names></name>, <name name-style="western"><surname>Wilson</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Pethica</surname><given-names>R</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>SUPERFAMILY 1.75 including a domain-centric gene ontology method</article-title>. <source>Nucleic Acids Res</source> <volume>39</volume>: <fpage>D427</fpage>–<lpage>D434</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Radivojac1"><label>5</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Radivojac</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Clark</surname><given-names>WT</given-names></name>, <name name-style="western"><surname>Oron</surname><given-names>TR</given-names></name>, <name name-style="western"><surname>Schnoes</surname><given-names>AM</given-names></name>, <name name-style="western"><surname>Wittkop</surname><given-names>T</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>A large-scale evaluation of computational protein function prediction</article-title>. <source>Nat Methods</source> <volume>10</volume>: <fpage>221</fpage>–<lpage>227</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Fang2"><label>6</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Gough</surname><given-names>J</given-names></name> (<year>2013</year>) <article-title>A domain-centric solution to functional genomics via dcGO Predictor</article-title>. <source>BMC Bioinformatics</source> <volume>14</volume> <supplement>Suppl 3</supplement>: <fpage>S9</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Alexa1"><label>7</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Alexa</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Rahnenführer</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Lengauer</surname><given-names>T</given-names></name> (<year>2006</year>) <article-title>Improved scoring of functional groups from gene expression data by decorrelating GO graph structure</article-title>. <source>Bioinformatics</source> <volume>22</volume>: <fpage>1600</fpage>–<lpage>1607</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Yu1"><label>8</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Yu</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Li</surname><given-names>F</given-names></name>, <name name-style="western"><surname>Qin</surname><given-names>Y</given-names></name>, <name name-style="western"><surname>Bo</surname><given-names>X</given-names></name>, <name name-style="western"><surname>Wu</surname><given-names>Y</given-names></name>, <etal>et al</etal>. (<year>2010</year>) <article-title>GOSemSim: an R package for measuring semantic similarity among GO terms and gene products</article-title>. <source>Bioinformatics</source> <volume>26</volume>: <fpage>976</fpage>–<lpage>978</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Grossmann1"><label>9</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Grossmann</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Bauer</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Robinson</surname><given-names>PN</given-names></name>, <name name-style="western"><surname>Vingron</surname><given-names>M</given-names></name> (<year>2007</year>) <article-title>Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis</article-title>. <source>Bioinformatics</source> <volume>23</volume>: <fpage>3024</fpage>–<lpage>3031</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Schriml1"><label>10</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Schriml</surname><given-names>LM</given-names></name>, <name name-style="western"><surname>Arze</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Nadendla</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Chang</surname><given-names>Y-WW</given-names></name>, <name name-style="western"><surname>Mazaitis</surname><given-names>M</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>Disease Ontology: a backbone for disease semantic integration</article-title>. <source>Nucleic Acids Res</source> <volume>40</volume>: <fpage>D940</fpage>–<lpage>D946</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Khler1"><label>11</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Köhler</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Doelken</surname><given-names>SC</given-names></name>, <name name-style="western"><surname>Mungall</surname><given-names>CJ</given-names></name>, <name name-style="western"><surname>Bauer</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Firth</surname><given-names>HV</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data</article-title>. <source>Nucleic Acids Res</source> <volume>42</volume>: <fpage>1</fpage>–<lpage>9</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Andreeva1"><label>12</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Andreeva</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Howorth</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Chandonia</surname><given-names>JM</given-names></name>, <name name-style="western"><surname>Brenner</surname><given-names>SE</given-names></name>, <name name-style="western"><surname>Hubbard</surname><given-names>TJ</given-names></name>, <etal>et al</etal>. (<year>2008</year>) <article-title>Data growth and its impact on the SCOP database: new developments</article-title>. <source>Nucleic Acids Res</source> <volume>36</volume>: <fpage>D419</fpage>–<lpage>D425</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Punta1"><label>13</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Punta</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Coggill</surname><given-names>PC</given-names></name>, <name name-style="western"><surname>Eberhardt</surname><given-names>RY</given-names></name>, <name name-style="western"><surname>Mistry</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Tate</surname><given-names>J</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>The Pfam protein families database</article-title>. <source>Nucleic Acids Res</source> <volume>40</volume>: <fpage>D290</fpage>–<lpage>D301</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Hunter1"><label>14</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Hunter</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Jones</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Mitchell</surname><given-names>A</given-names></name>, <name name-style="western"><surname>Apweiler</surname><given-names>R</given-names></name>, <name name-style="western"><surname>Attwood</surname><given-names>TK</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>InterPro in 2011: new developments in the family and domain prediction database</article-title>. <source>Nucleic Acids Res</source> <volume>40</volume>: <fpage>D306</fpage>–<lpage>D312</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Gardner1"><label>15</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gardner</surname><given-names>PP</given-names></name>, <name name-style="western"><surname>Daub</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Tate</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Moore</surname><given-names>BL</given-names></name>, <name name-style="western"><surname>Osuch</surname><given-names>IH</given-names></name>, <etal>et al</etal>. (<year>2011</year>) <article-title>Rfam: Wikipedia, clans and the “decimal” release</article-title>. <source>Nucleic Acids Res</source> <volume>39</volume>: <fpage>D141</fpage>–<lpage>D145</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Rivals1"><label>16</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Rivals</surname><given-names>I</given-names></name>, <name name-style="western"><surname>Personnaz</surname><given-names>L</given-names></name>, <name name-style="western"><surname>Taing</surname><given-names>L</given-names></name>, <name name-style="western"><surname>Potier</surname><given-names>M-C</given-names></name> (<year>2007</year>) <article-title>Enrichment or depletion of a GO category within a class of genes: which test?</article-title> <source>Bioinformatics</source> <volume>23</volume>: <fpage>401</fpage>–<lpage>407</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Pesquita1"><label>17</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Pesquita</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Faria</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Falcão</surname><given-names>AO</given-names></name>, <name name-style="western"><surname>Lord</surname><given-names>P</given-names></name>, <name name-style="western"><surname>Couto</surname><given-names>FM</given-names></name> (<year>2009</year>) <article-title>Semantic similarity in biomedical ontologies</article-title>. <source>PLoS Comput Biol</source> <volume>5</volume>: <fpage>e1000443</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Fang3"><label>18</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Gough</surname><given-names>J</given-names></name> (<year>2013</year>) <article-title>A disease-drug-phenotype matrix inferred by walking on a functional domain network</article-title>. <source>Mol Biosyst</source> <volume>9</volume>: <fpage>1686</fpage>–<lpage>1696</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Csardi1"><label>19</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Csardi</surname><given-names>G</given-names></name>, <name name-style="western"><surname>Nepusz</surname><given-names>T</given-names></name> (<year>2006</year>) <article-title>The igraph software package for complex network research</article-title>. <source>InterJournal Complex Syst</source> <volume>1695</volume>: <fpage>1695</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Fang4"><label>20</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Gough</surname><given-names>J</given-names></name> (<year>2014</year>) <article-title>The “dnet” approach promotes emerging research on cancer patient survival</article-title>. <source>Genome Med</source> <volume>6</volume>: <fpage>64</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Fang5"><label>21</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Gough</surname><given-names>J</given-names></name> (<year>2014</year>) <article-title>supraHex: An R/Bioconductor package for tabular omics data analysis using a supra-hexagonal map</article-title>. <source>Biochem Biophys Res Commun</source> <volume>443</volume>: <fpage>285</fpage>–<lpage>289</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Fang6"><label>22</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fang</surname><given-names>H</given-names></name>, <name name-style="western"><surname>Oates</surname><given-names>ME</given-names></name>, <name name-style="western"><surname>Pethica</surname><given-names>RB</given-names></name>, <name name-style="western"><surname>Greenwood</surname><given-names>JM</given-names></name>, <name name-style="western"><surname>Sardar</surname><given-names>AJ</given-names></name>, <etal>et al</etal>. (<year>2013</year>) <article-title>A daily-updated tree of (sequenced) life as a reference for genome research</article-title>. <source>Sci Rep</source> <volume>3</volume>: <fpage>2015</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Basu1"><label>23</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Basu</surname><given-names>MK</given-names></name>, <name name-style="western"><surname>Carmel</surname><given-names>L</given-names></name>, <name name-style="western"><surname>Rogozin</surname><given-names>IB</given-names></name>, <name name-style="western"><surname>Koonin</surname><given-names>EV</given-names></name> (<year>2008</year>) <article-title>Evolution of protein domain promiscuity in eukaryotes</article-title>. <source>Genome Res</source> <volume>18</volume>: <fpage>449</fpage>–<lpage>461</lpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Burge1"><label>24</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Burge</surname><given-names>S</given-names></name>, <name name-style="western"><surname>Kelly</surname><given-names>E</given-names></name>, <name name-style="western"><surname>Lonsdale</surname><given-names>D</given-names></name>, <name name-style="western"><surname>Mutowo-Muellenet</surname><given-names>P</given-names></name>, <name name-style="western"><surname>McAnulla</surname><given-names>C</given-names></name>, <etal>et al</etal>. (<year>2012</year>) <article-title>Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation</article-title>. <source>Database (Oxford)</source> <volume>2012</volume>: <fpage>bar068</fpage>.</mixed-citation>
</ref>
<ref id="pcbi.1003929-Smith1"><label>25</label>
<mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Smith</surname><given-names>B</given-names></name>, <name name-style="western"><surname>Ashburner</surname><given-names>M</given-names></name>, <name name-style="western"><surname>Rosse</surname><given-names>C</given-names></name>, <name name-style="western"><surname>Bard</surname><given-names>J</given-names></name>, <name name-style="western"><surname>Bug</surname><given-names>W</given-names></name>, <etal>et al</etal>. (<year>2007</year>) <article-title>The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration</article-title>. <source>Nat Biotechnol</source> <volume>25</volume>: <fpage>1251</fpage>–<lpage>1255</lpage>.</mixed-citation>
</ref>
</ref-list></back>
</article>