AN INTEGRATED RECEPTOR DATABASE (IRDB)

Various receptor data were collected, edited and integrated into an Integrated Receptor Database (IRDB). The stored data include structural data (amino acid sequences and their secondary and three-dimensional structures), functional data, binding affinities, cell signaling data, etc. The purpose of this database is to allow structural biologists, drug designers and toxicologists to analyse and elucidate receptor-ligand docking and the resulting post-binding signal transduction pathways. IRDB is available online (http://impact.nihs.go.jp/RDB.html).


INTRODUCTION
Receptor-ligand binding triggers a series of reactions in a living system. Detailed knowledge and data about receptors and their ligands are therefore an important basis for understanding living systems and diseases, and for designing new drugs. Because of advances in molecular biology, which were accelerated by the Human Genome Project, a huge amount of DNA sequence and protein structural data has accumulated and is available for public use. We had already developed a Receptor Database (RDB) (Nakata, Takai & Kaminuma, 1999); we have now revised it, integrating new functions and updating the data. The present version, which we call the Integrated Receptor Database (IRDB), is the updated version of our old Receptor Database (Nakata, Takai-Igarashi, Nakano & Kaminuma, 2001a).
Internet/World Wide Web (WWW) technology has allowed us to use powerful viewers for representing retrieved data and knowledge graphically. This technology has also allowed us to link RDB dynamically to other related WWW sites. In the previous version of RDB, the goal of our system was to provide one-stop shopping for receptor data. The system uses a graphical viewer to represent information useful for endocrine disruptor research and drug design, such as structural data, binding sites, and the cell signaling pathway that is triggered by ligand binding. IRDB includes more structural data, binding affinities, transcription factors and regions, and single nucleotide polymorphisms (SNPs).

Purpose of the database system
The purpose of RDB is to store data and knowledge on receptor proteins and their properties. These data and knowledge include protein structures and their functions, ligands and binding affinity data, cell signaling information, drugs and SNPs. The intended users are those who study biology and the mechanisms of disease, and those who are developing drugs based on the structure-based drug design (SBDD) approach and personalized medicine.

Hardware and software
RDB was implemented on a UNIX workstation (e.g. Silicon Graphics OCTANE). We used the object-oriented database management software ACEDB (A Caenorhabditis elegans DataBase, 2001; Dunham, Durbin, Thierry-Mieg & Bentley, 1994; Stein & Thierry-Mieg, 1998) as the base system. To modify data and insert new functions, PERL and/or C programs were integrated into ACEDB. To calculate the three-dimensional (3D) structures of ligands, we used Molecular Mechanics 2 (MM2; Allinger, 1977). The 3D structural image is provided by computer commands that call visualization tools, such as Chime (Martz, 2002) or RasMol (Sayle & Milner-White, 1995).
In accordance with the architecture of ACEDB, all the information is stored as structured objects in tree form. A tree can be arbitrarily extended in any direction as more information is gathered about a particular aspect of an object. Similar objects are grouped together within a "class". The class governs what can be stored in an object and how it is displayed and used. To make all the information about an object available, objects often contain labels (pointers) to other objects. They can also contain letters, numbers, objects and computer commands. The tree class structure in RDB is shown in Table 1. All information was integrated in the file for ACEDB.
The overall system configuration is shown in Figure 1. In the off-line system, we used a BLAST search (Altschul, Madden, Schaffer, Zhang, Zhang, Miller et al., 1997) and the MView program (Brown, Leroy & Sander, 1998) for studying sequence similarity. Sequences were passed to the BLAST search program, and the result was reformatted by the MView program and stored in the RDB.
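The storage model described above (tree-shaped objects, grouped by class, holding values and pointers to other objects) can be illustrated with a minimal Python sketch. This is not ACEDB code and does not reproduce the RDB schema of Table 1; the names TreeObject, Node, Pointer and Store, and all tags and object names in the example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Pointer:
    """A label that refers to another object by its class and name."""
    class_name: str
    object_name: str


Value = Union[str, int, float, Pointer]


@dataclass
class Node:
    """One tag in an object's tree: it holds values and child tags."""
    values: List[Value] = field(default_factory=list)
    children: Dict[str, "Node"] = field(default_factory=dict)


@dataclass
class TreeObject:
    """An object of a given class whose content is an extensible tree."""
    class_name: str
    name: str
    root: Node = field(default_factory=Node)

    def add(self, path: Sequence[str], value: Value) -> None:
        """Store a value under a tag path, creating branches as needed."""
        node = self.root
        for tag in path:
            node = node.children.setdefault(tag, Node())
        node.values.append(value)

    def get(self, path: Sequence[str]) -> List[Value]:
        """Return the values under a tag path, or [] if the path is absent."""
        node = self.root
        for tag in path:
            if tag not in node.children:
                return []
            node = node.children[tag]
        return list(node.values)


class Store:
    """Holds objects keyed by (class, name) so that pointers can be followed."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], TreeObject] = {}

    def put(self, obj: TreeObject) -> None:
        self._objects[(obj.class_name, obj.name)] = obj

    def resolve(self, pointer: Pointer) -> TreeObject:
        return self._objects[(pointer.class_name, pointer.object_name)]


# Hypothetical example: a receptor object that points to a ligand object.
store = Store()
ligand = TreeObject("Ligand", "demo_ligand")
ligand.add(["Structure", "Format"], "3D coordinates")
receptor = TreeObject("Receptor", "demo_receptor")
receptor.add(["Binding", "Ligand"], Pointer("Ligand", "demo_ligand"))
store.put(ligand)
store.put(receptor)

linked = store.resolve(receptor.get(["Binding", "Ligand"])[0])
```

In this sketch a new aspect of an object is recorded simply by adding a value under a new tag path, which mirrors how a tree can be extended without changing the objects already stored.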

Database sources
Receptor proteins were retrieved from the Swiss Prot and PIR databases on the Internet. The data in the RDB are classified into three categories by source: (1) data collected from various references (basic data), (2) data retrieved from external on-line databases at the user's request (protein structure data, SNP collection data, etc.), and (3) data generated by theoretical calculations (sequence similarity data, 3D structures of ligands, etc.).
For protein secondary-structure prediction, the corresponding amino acid sequence data are edited into a suitable form and passed to another analytical system, for instance the BCM Search Launcher (Smith, Wiese, Wojzynski, Davison & Worley, 1996). The relevant drug data, which include Japanese and US drugs with CAS registry numbers, chemical structures and 3D structures, are compiled in a Drug Database. Japanese drugs were retrieved from JAN (Japanese Accepted Names for Pharmaceuticals, n.d.), and US drugs from USP-NF (United States Pharmacopoeia - National Formulary, n.d.). The 3D structures of drugs were calculated using MM2.
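The step of editing a sequence into a form that an external analysis service accepts can be sketched as follows. The source does not state which format RDB uses; the sketch assumes the widely used FASTA format, and the function name to_fasta, the identifier and the example sequence are hypothetical.

```python
def to_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line followed by wrapped sequence lines.

    Whitespace inside the sequence is removed and letters are upper-cased.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return f">{identifier}\n" + "\n".join(lines) + "\n"


# Hypothetical example with a made-up eight-residue sequence.
record = to_fasta("demo_receptor", "mkta\nyiak", width=4)
```

The resulting text can then be submitted to whichever analytical system is chosen; how the submission is made depends on that system and is not shown here.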

Database contents
The numbers of LG (Large Group) entries in the membrane receptor and nuclear receptor categories are 36 and 5, respectively. At present, the total number of receptor proteins in the RDB is 1780. Each receptor protein has labels for the PIR / Swiss Prot entry, the functional region and the secondary-structure prediction. The numbers of DNA binding sites and ligand binding sites are 250 and 170, respectively. An aligned-sequence chart across different species is stored for the main receptors. There are 410 entries for 3D structure data in the RDB.
The DNA sequences that encode the receptor proteins are available for each receptor protein. Gene data and SNP information are included for most human receptor proteins. Data about drugs that bind to the receptors are included as an example.

Automatic genetic variation data collection
An agent system for collecting single nucleotide polymorphism (SNP) data on the Internet was developed to search for and retrieve SNP data related to the genes and proteins pre-registered in the system (Nakata, Takai-Igarashi, Nakano & Kaminuma, 2001b). The related gene names were entered into the agent system in advance and linked to IRDB. The SNP information shows the position of any allelic frame-shift in the DNA sequence, the corresponding amino acid offset, and the converted amino acids.
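The relation between a nucleotide position, the amino acid offset and the converted amino acid can be illustrated with a short Python sketch. It is not the agent system's code: it assumes a coding sequence that starts in frame, 1-based positions, the standard genetic code, and a single-base substitution only (insertions, deletions and frame-shifts are not handled). The function name describe_substitution and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def describe_substitution(cds: str, position: int, new_base: str) -> dict:
    """Map a single-base substitution in a coding sequence to its protein effect.

    cds      -- coding DNA sequence, assumed to start at the first codon
    position -- 1-based nucleotide position of the substituted base
    new_base -- the variant base (one of T, C, A, G)
    """
    cds = cds.upper()
    new_base = new_base.upper()
    if not 1 <= position <= len(cds):
        raise ValueError("position lies outside the coding sequence")
    if len(new_base) != 1 or new_base not in BASES:
        raise ValueError("new_base must be one of T, C, A, G")

    codon_index = (position - 1) // 3      # 0-based index of the affected codon
    offset_in_codon = (position - 1) % 3   # 0, 1 or 2
    ref_codon = cds[codon_index * 3:codon_index * 3 + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    alt_codon = (
        ref_codon[:offset_in_codon] + new_base + ref_codon[offset_in_codon + 1:]
    )
    return {
        "amino_acid_offset": codon_index + 1,  # 1-based residue number
        "ref_codon": ref_codon,
        "alt_codon": alt_codon,
        "ref_amino_acid": CODON_TABLE[ref_codon],
        "alt_amino_acid": CODON_TABLE[alt_codon],
    }


# Hypothetical example: ATG GCC GAA encodes M-A-E; base 5 changes C to T.
effect = describe_substitution("ATGGCCGAA", 5, "T")
```

In this made-up example the substitution turns the second codon GCC (alanine, A) into GTC (valine, V), so the amino acid offset is 2.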

DISCUSSION
IRDB was designed to be one part of the pharmaco-informatics infrastructure for genome-based personalized medicine (Kaminuma, Nakata, Nakano & Takai-Igarashi, 2001). The binding of a drug or its metabolite to target biomolecules, such as membrane receptors, cytoplasmic enzymes and nuclear receptors, triggers a series of reactions. Although these target molecules are not yet fully identified, it has been estimated that nearly half of them are receptors (Drews, 1998).
For structure-based drug design, exact 3D structures of receptors and ligands are essential. Although the 3D structures of only a few receptors have been determined, theoretically predicted secondary structures and aligned-sequence charts for different species are available in RDB. At present only endocrine disruptor related data are included in BADB; much more experimental binding-affinity data are still required, and theoretically calculated binding-affinity values could be included in the database in the future.
The signal pathways, which are the post-binding effects of the receptor and ligand, can be retrieved via CSNDB. The signal transduction and transcription information may help in understanding the effects of various chemicals, such as drugs or environmental chemicals, on the living system via gene expression. SNP data for receptors are essential for personalized medicine, in areas such as drug response and predisposition to common diseases. We intend to include a link to OMIM in the near future, relating the receptor and ligand docking to the signal pathway flow. We expect this information to be useful in basic research on drug design and for understanding living systems.
Our Receptor Database is open to the public. Access is not restricted by any firewall. By installing a free visualization tool, such as Chime (Martz, 2002), the user can look at a three-dimensional image of the protein.
No other tools are needed to look at the information in IRDB. The waiting time may be long for some sites, and connections to the external analytical sites may sometimes be poor. To improve this, we intend to host the analytical system on our own site. Because of the huge number of amino acid sequences (in PIR and Swiss Prot), DNA sequences (in GenBank) and protein structural data (in PDB), we did not store them on our own disk, but provided links to the original Web sites. The whole IRDB system consists of many in-house sub-systems and independent systems. We do not intend to distribute the software itself, because maintaining the whole system would be very complicated and difficult.