EXPERIENCE AND STRATEGY OF BIODIVERSITY DATA INTEGRATION IN TAIWAN

The integration of Taiwan's biodiversity databases started in 2001, the same year that Taiwan joined GBIF as an associate participant, and Taiwan thus embarked on a decade of integrating biodiversity data. With the support of NSC and COA, the databases and websites of TaiBIF, TaiBNET (TaiCOL), TaiBOL, and TaiEOL have been established separately and collaborate with GBIF, COL, BOL, and EOL, respectively. A cross-agency committee was established in Academia Sinica in 2008 to formulate policies on data collection and integration as well as the mechanism for making data available to the public. Any commissioned project will hereafter be asked to include these policy requirements in its contract. TaiBIF has gained recognition in Taiwan and abroad for its efforts over the past several years, and its experience and insights can serve as a reference for others to consult or replicate.


2001: THE YEAR DATA INTEGRATION BEGAN
Before 2001, the biodiversity databases in Taiwan were scattered across various government agencies, private organizations, and academic institutions. There was no real horizontal integration; these databases, at most, provided on their websites links to other sites or to the home pages of relevant databases. The agencies and institutions may have departments or research units under them, each of which may in turn have its own websites and databases. For example, under the Council of Agriculture (COA), there are the Fisheries Agency, Forestry Bureau, and Taiwan Endemic Species Research Institute; under the Construction and Planning Agency, there are many National Parks. As for the biodiversity-related private organizations, more than 30 of them have established databases and websites. The large-scale or integrated research projects promoted by the government also have their own websites, such as the Forestry Bureau's National Survey and Mapping of Floral Diversity Project, the Bureau of Animal and Plant Health Inspection and Quarantine's invasive species project, the Council for Economic Planning and Development's National Geographic Information Systems, and the National Science Council's Long-Term Ecological Research Network. However, these sites usually cover project introductions, research reports, literature, news articles, and policy and regulation guidance, but lack the metadata, raw data, or primary data of the research projects. Moreover, the reports are often in paper form and kept at the funding agencies, making it difficult to achieve the goal of sharing and integrating research data across agencies.
Taiwan began to integrate biodiversity data in 2001. The new National Digital Archives Program aimed to archive not only data in the humanities and social sciences, but also data in the biological and natural sciences such as specimens and species information. The Executive Yuan approved the Biodiversity Promotion Plan in the same year. One of the projects under the Promotion Plan called for the National Science Council (NSC), leading nine co-organizers, to start collecting and integrating biodiversity data and exchanging them with global organizations. The data collected cover expert lists, species checklists, specimen information, geographical distribution, spatial and temporal distribution, invasive species, species descriptions, literature, and biological resources. Also in 2001, the Global Biodiversity Information Facility (GBIF) was formally established and Taiwan joined it as an Associate Participant. As a result, Taiwan can apply the technologies and standards of GBIF's metadata and exchange platform to promote the integration of its biodiversity information and its exchange with GBIF.
In 2007, the second phase (2007-2012) of the National Digital Archives Program was transformed into the Taiwan e-Learning and Digital Archives Program (TELDAP), and an International Collaboration and Promotion Division was added. One of the tasks of this Division is to promote international cooperation on biodiversity information. TaiBIF, GBIF's Taiwan portal built in 2004, was then incorporated into TELDAP and its tasks were redefined (Shao et al., 2008a). TaiBIF collects the ecological distribution data and the data in Biota Taiwanica (in English), which is commissioned by NSC. It is also in charge of integrating biodiversity data across agencies and institutions. TELDAP, on the other hand, integrates data from its Institutional Projects, Request-for-Proposals Project, and Biosphere and Nature Thematic Group. It concentrates on collecting species descriptions (in Chinese), specimens, and literature information. Its task of collaborating with international organizations is assisted by TaiBIF.
As to the issue of intellectual property rights (IPR), TELDAP originally employed the "Creative Commons (CC)" licenses approach. Under a combination of Attribution, Non-Commercial, No Derivative Works, and Share-Alike requirements, data are put on the Internet to be browsed by the public. However, for a great deal of cultural content, the IPR issues had not been clarified. In addition, the nature of some information makes it unsuitable for CC. For example, the IPR of Taiwanese aborigines is covered by separate laws. Therefore, in 2008 TELDAP had to conduct an inventory of IPR for every item and require project directors to re-sign license agreements. This solved the problem of TaiBIF not being able to receive data from the Union Catalog and then submit them to GBIF.

INTEGRATE BIODIVERSITY DATA IN TAIWAN AND EXCHANGE THEM INTERNATIONALLY
Currently TaiBIF/TELDAP integrates cross-agency, cross-institution, cross-project, NGO, and private biodiversity databases through the use of species names and GPS coordinates (latitude and longitude). The most critical element is to build a species checklist, since the scientific name is the keyword that links biodiversity data together. Via the scientific name of a species, its specimen information, DNA barcode (Barcode of Life, BOL), phylogeny (Tree of Life, TOL), ecological distribution data (in EML format), and other information (Encyclopedia of Life, EOL) can all be accessed.
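The idea of using the scientific name as the linking key can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the species names, field names, and sample values are invented, and the function `integrate` is hypothetical rather than part of any TaiBIF software.

```python
from collections import defaultdict

# Hypothetical sample records; names, fields, and values are invented.
checklist = [
    {"scientific_name": "Examplea prima", "family": "Exampleidae"},
    {"scientific_name": "Examplea secunda", "family": "Exampleidae"},
]
sources = {
    "specimens": [{"scientific_name": "Examplea prima", "catalog_no": "SPEC-001"}],
    "occurrences": [
        {"scientific_name": "Examplea prima", "lat": 23.9, "lon": 121.0},
        {"scientific_name": "Unlisted nomen", "lat": 24.1, "lon": 120.7},
    ],
}

def integrate(checklist, sources):
    """Group records from every source under the checklist name they cite."""
    merged = {row["scientific_name"]: defaultdict(list) for row in checklist}
    unmatched = []  # records whose name is not on the checklist
    for source_name, records in sources.items():
        for record in records:
            name = record["scientific_name"]
            if name in merged:
                merged[name][source_name].append(record)
            else:
                unmatched.append((source_name, record))
    return merged, unmatched

merged, unmatched = integrate(checklist, sources)
print(dict(merged["Examplea prima"]))  # specimen and occurrence records together
print(unmatched)                       # records that need a name check first
```

Records whose names are not on the checklist are set aside rather than dropped, which is one reason an accurate checklist has to come first.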
There are three different ways to exchange and share Taiwan biodiversity data internationally (Shao, 2006; Shao et al., 2007b). The first path is to link to GBIF directly through TaiBIF. The second path is to go through regional networks which are GBIF's Associate Participants themselves. The third path is for the local organism databases to send Taiwanese native or endemic species information to global species databases (GSDs) such as FishBase and AntBase. They in turn can link to GBIF through large-scale international cooperation projects such as COL, BOL, EOL, BHL, OBIS, BioNET-International, etc. (Fig. 1).

Figure 1. Integrate biodiversity databases in Taiwan and link them globally
In order to employ the integration framework and advancement of global biodiversity information more effectively and broadly, Taiwan will (1) cooperate with the Catalogue of Life (COL) of Species 2000 to continue maintaining and operating Taiwan's COL, i.e., the database and website of TaiCOL (renamed from TaiBNET); (2) cooperate with CBOL and iBOL to continue maintaining and operating Taiwan's BOL, and change the name of the present "Cryobanking Program for Wildlife Genetic Material in Taiwan" (http://cryobank.sinica.edu.tw) to TaiBOL; (3) cooperate with EOL and build Taiwan's EOL, TaiEOL.
All the databases mentioned above will be integrated into TaiBIF (http://taibif.org.tw), which is linked to global databases such as GBIF. The old URLs will remain accessible. The new URLs and their website contents are listed below:
(1) TaiCOL (http://col.taibif.tw) = TaiBNET: 54,000 native species, 1,351 alien species, along with conservation species and fossil species.
(3) TaiEOL (http://eol.taibif.tw): established in 2011 and will have information in Chinese on 16,000 species by the end of 2013.
(4) TaiBIF (http://taibif.tw): In addition to the information on the three websites mentioned above, there is Biota Taiwanica with English information on 11,000 species, as well as videos, literature, specimens, etc.
Other than database integration work, TaiBIF provides biodiversity communities in Taiwan with information technology assistance to accelerate the integration and sharing of data.

STARTING WITH SPECIES CHECKLIST-TaiCOL (TaiBNET)
The scientific name of a species is the most important keyword in linking biological information; hence, the most basic task of integrating biodiversity data is to first establish an accurate, authoritative, and complete species checklist. TaiBNET was established in 2002 to integrate and update the checklist of valid species in Taiwan. So far a total of 54,417 native species in eight kingdoms (Virus, Bacteria, Archaea, Protozoa, Chromista, Fungi, Plantae, Animalia), 59 phyla, 143 classes, 662 orders, 3,128 families and 17,706 genera have been compiled, including more than a thousand cultivars, alien species, and fossil species. TaiBNET offers users handy search functions such as search by partial scientific name, Chinese name, common name, keyword, or string. Users can also browse by the taxonomy tree. Each classification level of the tree displays the number of recorded organisms in its sub-levels. When a species is chosen, a description page with its classification, synonyms, and literature is shown. For some organisms, links to other databases such as Fish Database of Taiwan, FishBase, Discovery Life, and EOL are given. A mechanism is provided that lets users submit related information and images to the site. A "2008 Workshop: Research and Status of Taiwan Species Diversity" was held at the National Museum of Natural Science in Taichung on August 15-16, 2008. Half a year later, the books "2008 Taiwan Species Diversity I. Research and Status" and "2008 Taiwan Species Diversity II. Species Checklist" were published along with a DVD. The book and DVD of "Taiwan Species Checklist 2010" were published in 2010 and can be freely downloaded as well (Shao et al., 2008b, 2008c, 2010).
The database of TaiBNET is currently maintained by a full-time assistant. Authorized taxonomists and their doctoral students can update data online. Afterward, the data are reviewed by top-ranking scholars in each taxon.
Another way to check the validity of data is to compare the checklist with existing databases such as Sp2000 or the electronic version of Biota Taiwanica. When discrepancies are found, experts in that taxon are invited to double check. Additionally, fungi can be matched against CABI's Index of Fungi and marine organisms can be matched against WoRMS.
One advantage of international cooperation is access to a wealth of other information. For example, Taiwan has 3,086 valid fish species as of 2011; from FishBase, 14,000 synonyms of its native fishes were obtained. The collection of synonyms is important because scientific names and classification systems are constantly revised. For fish, about one tenth of the names change approximately every 10 years (Pauly & Froese, 2000; Shao et al., 2007a). Even taxonomists cannot track and remember all the changes and can only rely on automatic database comparison to detect them. When a user enters an invalid name of a species, the system automatically finds the valid name and links to the right webpages. Otherwise, much of the valuable specimen information in herbaria and museums would be lost to users who know only a synonym of a species.
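The lookup from an invalid name to the valid one can be pictured with a small sketch. It assumes a plain in-memory synonym table; the names in `SYNONYMS` and `VALID_NAMES` are invented placeholders, and `resolve_name` is a hypothetical helper, not the TaiBNET implementation.

```python
# Hypothetical tables: outdated or invalid name -> currently valid name.
SYNONYMS = {
    "Oldgenus alpha": "Newgenus alpha",
    "Newgenus alfa": "Newgenus alpha",
}
VALID_NAMES = {"Newgenus alpha", "Newgenus beta"}

def resolve_name(name):
    """Return (valid_name, status) for a user-supplied scientific name."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned in VALID_NAMES:
        return cleaned, "valid"
    if cleaned in SYNONYMS:
        return SYNONYMS[cleaned], "synonym"
    return None, "unknown"

for query in ["Oldgenus  alpha", "Newgenus beta", "Nonexistent name"]:
    print(query, "->", resolve_name(query))
```

A specimen labelled with a synonym can then still be found, because the search is redirected to the valid name before any records are retrieved.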

ESTABLISHING TAIWAN ENCYCLOPEDIA OF LIFE WITH COMMUNITY ENGAGEMENT
Advocated by biologist E. O. Wilson, the Encyclopedia of Life project began in 2007. Its goal is, through the joint efforts of scientists around the world, to gather and share scientific knowledge about the 1.9 million known organisms in a single online resource. The TaiEOL team started to communicate and exchange information technology with the EOL team in 2009. In 2011, the team began to work on the Taiwan Encyclopedia of Life (TaiEOL; in Chinese) and hopes to complete within three years the online information content of 20,000 species, including 8,000+ endemic species.

Status and analysis of species data in traditional Chinese characters
TaiEOL has from the start utilized its own Biodiversity Bibliography System to record the species with Chinese descriptions from the biological publications in Taiwan. There are now more than 400 books written in traditional Chinese and a cumulative total of 45,000 scientific names in the system. Using Insecta as an example, there are 21,000 recorded insect species in Taiwan; yet only 4,700 species have Chinese descriptions. Furthermore, only 1,600 out of 6,000 endemic insects have information written in Chinese. These statistics help with the decision on which taxon to invite researchers to fill in with information. They also provide users with the bibliographies of a species when they retrieve information on that species.
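The Insecta figures above translate into coverage rates with simple arithmetic. The short sketch below uses only the numbers quoted in the text; the function name `coverage` is an illustrative choice.

```python
# Figures quoted in the text for Insecta.
recorded, described = 21_000, 4_700
endemic, endemic_described = 6_000, 1_600

def coverage(described_count, total_count):
    """Percentage of species that already have a Chinese description."""
    return 100.0 * described_count / total_count

print(f"all insects:     {coverage(described, recorded):.1f}%")         # 22.4%
print(f"endemic insects: {coverage(endemic_described, endemic):.1f}%")  # 26.7%
```

Roughly three quarters of the recorded insects therefore still lack a Chinese description, which is the kind of gap these statistics are meant to expose.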

Open source software platform of TaiEOL portal
The information collaboration and user-participation concept of Web 2.0 has become an international trend in sharing and integrating biodiversity data. LifeDesks, the modules developed by the EOL project, and Scratchpads, provided by the European Distributed Institute of Taxonomy (EDIT), are two of the most popular participatory platforms. Based on these two types of platforms and using open source software, TaiEOL began to develop its Chinese version with species information as its core.
The TaiEOL portal will build on the existing species checklist of Taiwan. For the roughly 54,000 species on that checklist, it will, in staggered phases, invite taxonomists and citizen scientists to join the effort and contribute to its content. Taiwan's endemic species especially need descriptions, images, and popular science material so that the public can easily learn about them and become knowledgeable about Taiwan's biodiversity. Moreover, TaiEOL can provide the information on endemic species to EOL to be shared with the global community. Through the portal's biological content management system and user interaction mechanisms, the biodiversity information of Taiwan can be directly and effectively shared and exchanged.

INTEGRATING DATABASES AND LINKING GLOBALLY THROUGH TaiBIF
TaiBIF, the Taiwan node of GBIF, uses the metadata format and tools recommended by GBIF to integrate Taiwan's biodiversity information and exchange it with the global community. The homepage of TaiBIF is shown in Figure 1. TaiBIF consolidates various databases in Taiwan, including species checklists, Biota Taiwanica, species occurrence data, Taiwan Encyclopedia of Life, etc. Using biodiversity informatics and tools, TaiBIF enables the general public, academic scholars, and government agencies to gain access to the data on its platform. In addition, TaiBIF introduces international data standards and protocols so that it can meet the two objectives of "deepening Taiwan people's biodiversity knowledge" and "helping to make the world's biodiversity picture more complete." The Catalogue of Life in Taiwan project (TaiBNET) offers species names (scientific names and Chinese names) and the classification hierarchy, and keeps track of taxonomic work. As the backbone of the project, these data provide the means to access all the biodiversity information.

Primary species occurrence data (specimen and observational data):
There are specimen data from TELDAP and observational data from various collaborating domestic databases. A total of 1.59 million digital specimen and observational records are now available. The data come from institutions such as Academia Sinica, National Museum of Natural Science, National Taiwan University, National Taiwan Museum, Taiwan Forestry Research Institute, Fisheries Research Institute, Agricultural Research Institute, Taiwan Endemic Species Research Institute, National Museum of Marine Biology and Aquarium, National Taiwan Ocean University, National Tsing Hua University, National Sun Yat-sen University, etc.

Biota Taiwanica:
NSC's Division of Life Sciences, in order to fulfill the requirement of the Biodiversity Promotion Plan, assists domestic taxonomists in composing contents for Biota Taiwanica. So far, information on more than 10,000 species has been uploaded online and released to the public.

Literature:
Relevant research papers on biodiversity are provided by the Science and Technology Policy Research and Information Center of the non-profit National Applied Research Laboratories. Currently, there are research projects (8,079 articles), research reports (3,987 articles), periodical papers (10,611 articles), conference papers (4,767 articles), Master's theses and Doctoral dissertations (4,171 articles), English journal papers (2,270 articles), and publications (499 articles).

Eco-Photo
Using Cooliris software, TaiBIF presents the ecological photos and videos it has collected. It will include the images gathered by TaiEOL in the future.

Species description
The text and images of native Taiwan species as well as popular science material from TaiEOL will be incorporated into TaiBIF in the future.

Web service and tool development
After years of research and development effort, the TaiBIF team has made available many biodiversity information tools to accelerate the integration and sharing of biodiversity data. For example, there are tools to check scientific names and geographic coordinates:
i. The scientific names are matched against TaiBNET and Sp2000, and any spelling errors are highlighted and reported back to the users for reference. Additional information on the taxon level, author, and publication year of the species is provided in the correct sequence so that users can easily standardize the format and increase the accuracy of data.
ii. TaiBIF converts geographic coordinates to the format commonly used in Taiwan so that users can update old ecological distribution data and conduct further analysis and research work. TaiBIF also verifies the coordinates. Its tools check spatial data for errors; e.g., based on the description of a location and its latitude and longitude, the tools can determine whether the coordinates fall outside the expected range. TaiBIF passes the information back to the users (data providers) so that they can make the necessary changes. In doing so, TaiBIF increases the quality of the species occurrence data and accomplishes the goal of integrating and accessing biodiversity data through geographic information.
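Both checks can be approximated with the Python standard library. The sketch below is not the TaiBIF tooling: the checklist entries are invented, the similarity cutoff is an arbitrary choice, and the bounding box is a rough assumption for Taiwan and its outlying islands used only for illustration.

```python
import difflib

# Hypothetical reference checklist (invented names).
CHECKLIST = ["Newgenus alpha", "Newgenus beta", "Othergenus gamma"]

# Rough bounding box; the limits are an assumption, not an official extent.
BBOX = {"min_lat": 21.0, "max_lat": 26.5, "min_lon": 118.0, "max_lon": 123.0}

def check_name(name, checklist=CHECKLIST, cutoff=0.85):
    """Flag a probable misspelling and suggest the closest checklist name."""
    if name in checklist:
        return {"input": name, "status": "ok", "suggestion": name}
    close = difflib.get_close_matches(name, checklist, n=1, cutoff=cutoff)
    if close:
        return {"input": name, "status": "possible misspelling", "suggestion": close[0]}
    return {"input": name, "status": "not found", "suggestion": None}

def check_coordinates(lat, lon, bbox=BBOX):
    """Report whether a point lies inside the expected range."""
    def inside(la, lo):
        return (bbox["min_lat"] <= la <= bbox["max_lat"]
                and bbox["min_lon"] <= lo <= bbox["max_lon"])
    if inside(lat, lon):
        return "ok"
    if inside(lon, lat):
        return "latitude and longitude may be swapped"
    return "outside expected range"

print(check_name("Newgenus alpa"))
print(check_coordinates(121.0, 23.5))
```

In both functions the result is a report for the data provider rather than a silent correction, matching the workflow in which the providers make the changes themselves.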

INFORMATION TECHNOLOGY FOLLOWING INTERNATIONAL TREND
In terms of information technology, GBIF uses DiGIR, BioCASE, and TAPIR with Darwin Core as its metadata standard to integrate global species occurrence data (Hill et al., 2009; Chapman, 2005). TAPIR was chosen by the TaiBIF team as its data exchange protocol, integrating data (in English) into the original information architecture. However, the majority of Taiwan's species occurrence data also contain Chinese characters. TaiBIF therefore made use of TAPIR's customization feature and created a Chinese-language XML extension. Consequently, not only can the English data be shared with GBIF, but the Chinese data can also be shared within the original Darwin Core framework. Furthermore, a retrieval platform was established at TaiBIF (Fig. 3). This framework continues to assist with system deployment at the biodiversity-related institutions in Taiwan. Various institutions and museums have now accumulated over 1.59 million species occurrence records.
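One way to picture a record that carries Darwin Core terms alongside Chinese-language fields is sketched below. It is an illustration only: the extension namespace URI, the `zh` element names, and the sample values are hypothetical, and the code builds a single XML record rather than implementing the TAPIR protocol or TaiBIF's actual schema.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"           # Darwin Core terms namespace
ZH = "http://example.org/taibif/zh-extension/"  # hypothetical extension namespace

ET.register_namespace("dwc", DWC)
ET.register_namespace("zh", ZH)

def build_record(scientific_name, lat, lon, chinese_name, chinese_locality):
    """Build one occurrence record with Darwin Core and Chinese extension fields."""
    rec = ET.Element("record")
    ET.SubElement(rec, f"{{{DWC}}}scientificName").text = scientific_name
    ET.SubElement(rec, f"{{{DWC}}}decimalLatitude").text = str(lat)
    ET.SubElement(rec, f"{{{DWC}}}decimalLongitude").text = str(lon)
    # Extension elements (hypothetical names) hold the Chinese-language content.
    ET.SubElement(rec, f"{{{ZH}}}vernacularName").text = chinese_name
    ET.SubElement(rec, f"{{{ZH}}}locality").text = chinese_locality
    return rec

def english_view(rec):
    """Keep only the Darwin Core elements, e.g. for sharing with a global portal."""
    out = ET.Element("record")
    for child in rec:
        if child.tag.startswith(f"{{{DWC}}}"):
            out.append(child)
    return out

record = build_record("Newgenus alpha", 23.5, 121.0, "範例種", "臺灣南投縣")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(ET.tostring(english_view(record), encoding="unicode"))
```

Keeping the Chinese fields in their own namespace means the same record can serve a local Chinese portal in full and a global portal through the Darwin Core subset.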

DIFFICULTIES AND SOLUTIONS OF BIODIVERSITY DATA INTEGRATION
Compared with specimens, literature, and species checklists, observational raw data are much more difficult to collect and integrate. This is because, once a species is identified and its distribution is recorded by taxonomists or ecologists, the information can be analyzed and written into papers by anyone if it is posted online. By contrast, specimens can only be borrowed with the agreement of their original collectors or managers. As a result, most researchers may be willing to provide analyzed charts or tables, but hesitate to disclose and share data before they have a chance to publish their papers. It is fairly common, and regrettable, for research data to get lost, destroyed, or buried somewhere (Shao et al., 2007c).
The reluctance to submit raw data is only one of the reasons why, in the past 20 years, there has not been much progress in information integration among or across government agencies in Taiwan. Other reasons are the complicated issues of IPR, the lack of information collection and management units in the agencies, the absence of clear, executable information policies, etc. Hence, entrusted by the National Science Council, Academia Sinica in 2008 set up the National Committee for GBIF (GBIF-ROC), with committee members being either representatives from biodiversity-related agencies or researchers who are in charge of databases. The Committee began to establish data exchange standards, study and develop viable information policies, and request all agencies to include a term in their contracts requiring all commissioned projects to submit raw data when the projects end. After a certain period of time, agreed upon beforehand, the data will be open to the public.

Principles for submitting ecological distribution data
After several discussions, GBIF-ROC reached a conclusion on July 20, 2009 entitled "The principles of government-funded ecological distribution data collection and archiving, promoted by GBIF-ROC Committee" to be considered by the Sustainable Development Research Committee and the Biodiversity Promotion Plan.
i. Type and scope of the submitted data: In the first stage, only the digital raw data of the ecological distribution (species occurrence data) need to be submitted. Other related data will be added later.
ii. Collection and submission of survey and monitoring data: All government agencies are requested to include a term in their contracts requiring all commissioned projects to submit raw data when the projects end.
iii. Format, methodology, and evaluation of data submitted: (i) Submitted information should include survey or monitoring methodology, definitions, and data. The format should use customary international standards such as Darwin Core, Ecological Metadata Language (EML) (Jones et al., 2006), and ISO19115. (ii) Each agency should designate its data collection (storage) unit based on its technological capability. (iii) Each agency should define an evaluation system; e.g., further funding is provided only after data are submitted.
iv. When the data are made public (IPR issue): Each agency should determine when to make the data accessible to the public.

Format of ecological distribution data submitted
Consensus has been reached about the format of the information which the commissioned projects are required to submit. There are 12 items of metadata and one item of raw data.
i. Raw data of items ix-xii mentioned above.
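The submission format asks that each measured attribute come with a detailed definition of its measurement category: nominal or ordinal attributes need a description and a definition of their values, interval or ratio attributes need a unit, precision, and number type, and date-time attributes need a format and precision. A minimal sketch of checking that rule follows; the dictionary keys and the function `missing_fields` are hypothetical and are not part of Darwin Core, EML, or any GBIF-ROC specification.

```python
# Required descriptors per measurement category (key names are hypothetical).
REQUIRED_BY_SCALE = {
    "nominal": {"description", "value_definitions"},
    "ordinal": {"description", "value_definitions"},
    "interval": {"unit", "precision", "number_type"},
    "ratio": {"unit", "precision", "number_type"},
    "datetime": {"format", "precision"},
}

def missing_fields(attribute):
    """List the descriptors an attribute definition still lacks."""
    scale = str(attribute.get("scale", "")).lower()
    if scale not in REQUIRED_BY_SCALE:
        return ["scale"]  # unknown or absent measurement category
    return sorted(
        field for field in REQUIRED_BY_SCALE[scale]
        if attribute.get(field) in (None, "")
    )

# Invented example: a ratio attribute that forgot its number type.
body_length = {"name": "body_length", "scale": "ratio", "unit": "mm", "precision": 0.1}
print(missing_fields(body_length))  # ['number_type']
```

A check of this kind could run when a project submits its data, so that incomplete metadata are returned to the provider before archiving.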

Future challenges and strategies
In order to achieve its goals of collecting and integrating biodiversity data, TaiBIF has officially proposed to higher-level governmental decision makers, such as the National Science and Technology Conference and the National Council for Sustainable Development, that a top-down approach be adopted. However, the database integration work in many agencies still faces some common problems and challenges, which are listed below. The difficulties can only be overcome with the support and determination of the managers in each agency. The managers need to commit more manpower and material resources to the work, and also bestow recognition and encouragement on their employees.
i. Without professional IT personnel on staff, researchers are forced to outsource system development, since they usually don't have the time to manage or maintain the databases on their own. When open source software is not used, it is difficult to maintain a database sustainably. The database typically ceases to exist when a project ends.
ii. Database management is generally regarded as routine work. Consequently, its funding is often cut back year after year.
iii. The accomplishment of the researchers is not recognized. The evaluation of a researcher's performance is based exclusively on published papers (SCI scores). Hence, most researchers are reluctant to spend their time on databases or share their data, making it difficult to conduct further work on the data. Recently GBIF has started to promote the publication of "data papers" in scientific or SCI journals, a good incentive for scientists to publish their data and give the public open access to them.
Other methods and strategies promoted by TaiBIF are as follows:
i. Cooperating with other websites to increase page views through mutual links.
ii. Sharing budget and accomplishment. Data integrators need to establish credibility by distributing funds equitably, making their own data public, and attributing achievement to team members and data providers.
iii. Meeting the needs of users. The convenience of users and data providers should take precedence over that of information technology or data integration.
iv. In addition to the number of SCI (Science Citation Index) papers and their impact factors, digitized material, such as the number of records (including specimens collected, ecological distribution data, and DNA sequences) archived and uploaded to the Internet, the so-called "Repository Impact Factor," should be taken into account when assessing research performance. Data compilation and submission in the form of "data papers" to academic journals should also be encouraged. This mechanism facilitates the publishing of biodiversity data resources (Chavan et al., 2011).
v. Implementing performance evaluation effectively in order to receive data effortlessly.
vi. Nurturing talent. Provide opportunities for young scholars to organize conferences or attend international meetings so that they can learn and exchange ideas.
vii. Advocating the benefits of data integration:
• Offer offsite backup (data stored in different places)
• Help with data validation to improve data quality
• Contribute to society, academic services, and the taxpayers' rights
• Increase page views of the data
• Assist the government in making sure it receives concrete results from the huge investment of funds it allocates to scientific research and surveys. Raw data are archived and preserved so that when a project ends, these data, rather than summarized reports, are available to the public.
• Allow models to be reanalyzed and re-simulated with the data, using newer statistical methods such as cluster analysis and ecological models.
• Serve as an important tool, e.g., a quantitative indicator of biodiversity, for resource conservation, sustainable use, and management policies.

CONCLUSION
In order to meet the tasks stipulated by the Convention on Biological Diversity, it is essential to integrate biodiversity data and make them accessible to the public. Only through sufficient information sharing can the multi-dimensional objectives of biodiversity conservation, research, education, and resource sustainability be achieved. The promotion of data integration involves establishing a workflow and overcoming the difficulties that occur in the process. The difficulties cover many aspects such as intellectual property rights, data policy, data standards, the concept of data quality, standard operating procedures for collecting data, and software/hardware development technologies. In this paper, we present the results of the ten-year experience of the Biodiversity Research Center, Academia Sinica in promoting biodiversity data integration. Future work will consider adding environmental and hydrological conditions to the analysis in order to support the sustainable development of biodiversity and policymaking.

Figure 3. Flow of data to TaiBIF (Chinese) and GBIF (English) using TAPIR
In 2004, GBIF's Taiwan Portal, TaiBIF (Taiwan Biodiversity Information Facility), was built. Its tasks are to consolidate Taiwan's biodiversity data, develop software tools, and hold educational training workshops in related metadata standards and technologies. In 2007, the second phase (2007-2012) of the National Digital Archives Program was transformed to Taiwan e-Learning and Digital Archives Program
In 2002, NSC began to provide funding for Biodiversity Research Center, Academia Sinica to create the website of Taiwan Species Checklist "TaiBNET" (http://taibnet.sinica.edu.tw). More than a hundred taxonomists are involved in the work.
, and has accumulated 312 million occurrence data to date. TaiBIF in 2004 started to use DiGIR (Distributed Generic Information Retrieval) as the protocol to exchange data with GBIF Secretariat. Due to the limitations of certain fields, CRIA (Centro de Referência em Informação Ambiental) developed TAPIR (TDWG Access Protocol for Information Retrieval) to try to solve the problem. TAPIR is a REST style distributed data exchange protocol transmitted by HTTP. It combines and reinforces the strengths of BioCASE and DiGIR and allows users to communicate with data providers in a simple and direct way. It also provides more choices in data distribution standard, including different versions of Darwin Core and the capacity to define data by custom XML, making it more flexible to share data. TAPIR was officially adopted by GBIF and became one of the official TDWG standards since October 2009.
, including detailed definition of measurement categories. If it is Nominal or Ordinal, it should include description and definition of the values. If it is Interval or Ratio, it should include Unit, Precision, and Number Type. If it is Date-Time, it should include Format and Precision. xiii.