A Study on the Organizational Architecture and Standard System of the Data Sharing Network of Earth System Science in China

The aim of this paper is to discuss the organizational architecture and standard system for sharing research data at the national level. The Data Sharing Network of Earth System Science (DSNESS), one of the nine pilot projects of the Scientific Data Sharing Program in China, has become a long-term operational research data-sharing platform in the National Science and Technology Infrastructure (NSTI) of China. First, a data sharing union mechanism was designed around the core principle "data come from research and will be reused in research". Second, a data sharing organizational architecture was constructed that consists of three parts: data resource architecture, data management architecture, and data services architecture. Based on this architecture, a physical data sharing network was built that includes one general center and 15 distributed subcenters. Third, a series of data sharing standards and specifications were designed and implemented in the DSNESS. The reference model of the DSNESS standard system includes three levels of standards: directive standards, general standards, and application standards. In total, 21 high-level standards and specifications were developed and implemented in the DSNESS. Several core standards and specifications, such as the extensible metadata standard and the data quality control specifications, were analyzed in detail. Finally, the impact of the data services was summarized in three aspects: dataset services, standard and specification services, and international cooperation services. This research shows that the organizational architecture and standard system are a very important soft environment for research data sharing. The practices of the DSNESS provide useful experience for multidisciplinary data sharing in Earth science and will help to eliminate the data gap between rich and poor at the national level.


INTRODUCTION
To recognize, discover, and understand the Earth system, scientists and the public need a large amount of historical and current data. Meanwhile, more and more new research data, information, and knowledge are being created in this process, such as research data reflecting ancient evidence of climate change, the current polar environment, and the evolution of the Qinghai-Tibet Plateau. These abundant data, information, and knowledge not only advance Earth System Science (Liu Dongsheng, 2002) but also educate and inform the public, promoting the scientific understanding needed to protect and sustainably develop our common and unique home, Earth.
While barriers still exist, research data are increasingly being accessed and used by the public (Li et al., 2011). Research data are the products of one or more focused research projects and typically contain data that have undergone limited processing or curation (NSF, 2005). Because these data originate in different research groups or from individual scientists and do not always conform to community standards and content access policies, they are difficult to integrate and share without a top-level organizational architecture and standard system. The next generation of Digital Earth (Goodchild et al., 2012) faces the same issues.
The importance of research data sharing has been recognized in the scientific community since the end of World War II. Many organizations, agencies, and pioneers have advocated research data sharing and promoted the development of international and regional data sharing facilities and infrastructure, e.g., the World Data System (WDS) of the International Council for Science (ICSU) (http://www.icsu-wds.org/), the Committee on Data for Science and Technology (CODATA) of ICSU (http://www.codata.org/), the Global Change Master Directory (GCMD) of the National Aeronautics and Space Administration (NASA) (http://gcmd.nasa.gov/), GeoNET (http://www.geonet.org.nz/), Geospatial One-Stop (http://catalog.data.gov), GeoNetwork (http://geonetwork-opensource.org/), DataONE (http://www.dataone.org/), and Data Archiving and Networked Services (http://www.dans.knaw.nl/en). However, the development of data sharing infrastructure and facilities has not been uniform worldwide. In general, data sharing is weak in developing countries because of a poor data sharing culture and environment, including policies, rules and specifications, technology, funding, and talent.
Since the 1990s, scientific data sharing has been on the agenda of the Chinese government, largely as a result of the efforts of many eminent Earth scientists. In 2002, the Scientific Data Sharing Program (SDSP) was launched by the Ministry of Science and Technology (MOST), and nine data sharing pilot projects were initiated in the fields of meteorology, seismology, Earth system science, agriculture, forestry, water resources, mapping and surveying, sustainable development, and rural technology (Xu, 2003, 2007). As one of the nine pilot projects in the SDSP, the Data Sharing Network of Earth System Science (DSNESS) aims to integrate and share research data in the basic and frontier fields of ESS (Sun et al., 2003). It has been one of the platforms in the National Science and Technology Infrastructure (NSTI) of China since 2005, and in 2011 it was authorized by the MOST and the Ministry of Finance (MOF) as a unique national data sharing platform for ESS. The aim of this paper is to share the experience and lessons gained in designing and constructing the organizational architecture and standard system of the DSNESS.
The paper is organized as follows. In Section 2, we analyze the data sharing mechanism of the DSNESS. In Section 3, we address the organizational architecture of the DSNESS. In Section 4, we study the data sharing standards and specifications of the system. In Section 5, we summarize the impact of the DSNESS services over the past 10 years. A summary is presented in Section 6.

DATA SHARING MECHANISM
The core data sharing philosophy of the DSNESS is "data come from research and will be reused in research". The data sharing union for Earth system science is the implementation of this philosophy. Figure 1 shows a schematic diagram of the union mechanism.

The union is composed of core union members and universal union members. Core union members are stakeholders who hold significant research datasets and are willing to distribute them within the DSNESS; many of them host the regional or disciplinary subcenters of the DSNESS. Universal union members are the data users of the DSNESS. When universal union members use data from the DSNESS, they are encouraged to submit their final results to the DSNESS, and most users accept this requirement and submit their results voluntarily. To make the union run smoothly and sustainably, the rights and obligations of union members are set out in its constitution.
Union member rights include: (1) access to and use of all available data in the union free of charge; (2) publication of their own data resources in the union; (3) cooperative study of data analysis and application methods or technologies and production of new data products; and (4) acquisition of funding from the union for editing, managing, and updating datasets.

ORGANIZATIONAL ARCHITECTURE
Through contributions from the DSNESS data sharing union, more than 28 TB of data had been accumulated and archived before 2011. Because these data are multidisciplinary, multiscale, and multi-type in nature, organizing them requires a suitable data resource architecture, data management architecture, and data services architecture.

Data resources architecture
Data resources in the DSNESS are divided into three levels: basic data, thematic data, and integrated data products. Basic data are usually raw data that reflect the structure, components, spatio-temporal changes, and energy transformations of the spheres of the Earth system, as well as their interrelationships. Observation data, monitoring data, test data, experimental data, and field survey data fall into this category. Thematic data focus on specific problems and are based on the analysis and processing of a large number of basic datasets. Integrated data products are derived from multiple thematic and basic datasets using data processing models, such as data fusion models and remote sensing inversion algorithms.
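To make the three-level hierarchy concrete, the sketch below models datasets and their derivation links in Python. The class names, the `derived_from` and `model` fields, and the lineage rule in `check_lineage` are illustrative assumptions drawn from the definitions above, not part of the DSNESS implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """The three data resource levels (hypothetical encoding)."""
    BASIC = 1       # raw observation, monitoring, test, experimental, survey data
    THEMATIC = 2    # problem-focused data built from many basic datasets
    INTEGRATED = 3  # products derived from thematic and basic data via models


@dataclass
class Dataset:
    name: str
    level: Level
    derived_from: list[Dataset] = field(default_factory=list)
    model: str | None = None  # e.g. "data fusion" or "remote sensing inversion"


def check_lineage(ds: Dataset) -> bool:
    """Return True if the sources of `ds` fit its level under the assumed rules."""
    if ds.level is Level.BASIC:
        # Raw data have no upstream datasets.
        return not ds.derived_from
    if ds.level is Level.THEMATIC:
        # Thematic data come from analysis and processing of basic datasets.
        return bool(ds.derived_from) and all(
            s.level is Level.BASIC for s in ds.derived_from
        )
    # Integrated products combine thematic and/or basic data through a model.
    return (
        ds.model is not None
        and bool(ds.derived_from)
        and all(s.level in (Level.BASIC, Level.THEMATIC) for s in ds.derived_from)
    )


# Usage with hypothetical datasets:
obs = Dataset("station temperature records", Level.BASIC)
grid = Dataset("1 km climate raster", Level.THEMATIC, [obs])
fused = Dataset("fused land cover product", Level.INTEGRATED, [grid, obs], model="data fusion")
print([check_lineage(d) for d in (obs, grid, fused)])  # [True, True, True]
```

Requiring a processing model only at the integrated level reflects the text's statement that integrated products are derived "using data processing models"; a real catalogue might record lineage far more richly.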
Thematic datasets are currently the primary content of the DSNESS, and their data catalogue is as follows:
1. Resource and environmental data at multiple spatial scales
• WMO climate observations data, and the Global soil database (HWSD v1.1) (1 km resolution, 2009), among others
• Regional cooperation data: 1:500,000 terrain maps for Mongolia, 1:1,000,000 basic geographic datasets for Mongolia (1997), 1:1,000,000 and 1:4,000,000 basic geographic datasets from Russia; social and economic statistics from Russia; and data from the Hindu Kush Himalaya region

Data management architecture
The data management architecture of DSNESS, which is organized in three levels: general center, subcenter, and data resources node, is shown in Figure 2.
The general center serves as the data portal of the DSNESS as well as a backup warehouse, and it also provides data stewardship services for the public. The general center stores the metadata harvested from all of the subcenters, and its data repository backs up the whole network. A subcenter manages disciplinary or regional datasets, provides data search and access functions to the public, and controls the development of thematic data and data products under the guidance of the general center. A data node collects, integrates, manages, and updates datasets, and provides them either to a subcenter or directly to the general center. These datasets consist not only of data entities but also of complete descriptive information, including metadata and data processing documents. Three organizational schemas are used:
• Schema A: general center → subcenter → data node. When users search for data at the data portal, the metadata of the queried datasets are dynamically retrieved from the uniform metadata database in the general center, which harvests, through web services, the metadata of all datasets stored in the subcenters. The data entities themselves are then accessed at the subcenters, which are supplied by the data nodes.
• Schema B: general center → subcenter. Subcenters provide not only the data discovery function but also data storage and access functions. This schema suits subcenters that hold a significant amount of data and do not need support from data nodes.
• Schema C: general center → data node. The general center provides the data discovery function directly, so the data nodes do not need to construct their own web platforms. Instead, they can concentrate on collecting and managing data and producing high-quality thematic data and data products.
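The discovery path of the three schemas can be sketched as a small in-memory model: the general center harvests metadata from subcenters (Schemas A and B) and from directly attached data nodes (Schema C), then answers searches from its uniform catalogue. All class and field names are hypothetical, and the real DSNESS harvests over web services rather than through method calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    dataset_id: str
    title: str
    access_point: str  # where the data entity itself is served (hypothetical field)


@dataclass
class DataNode:
    name: str
    records: list[Record] = field(default_factory=list)


@dataclass
class Subcenter:
    name: str
    own_records: list[Record] = field(default_factory=list)  # Schema B: stored locally
    nodes: list[DataNode] = field(default_factory=list)      # Schema A: supplied by nodes

    def exposed_metadata(self) -> list[Record]:
        """Metadata the subcenter exposes for harvesting: its own and its nodes'."""
        records = list(self.own_records)
        for node in self.nodes:
            records.extend(node.records)
        return records


@dataclass
class GeneralCenter:
    subcenters: list[Subcenter] = field(default_factory=list)
    direct_nodes: list[DataNode] = field(default_factory=list)  # Schema C
    catalogue: dict[str, Record] = field(default_factory=dict)

    def harvest(self) -> int:
        """Rebuild the uniform metadata catalogue and return its size."""
        self.catalogue.clear()
        sources = [sub.exposed_metadata() for sub in self.subcenters]
        sources += [node.records for node in self.direct_nodes]
        for records in sources:
            for rec in records:
                self.catalogue[rec.dataset_id] = rec
        return len(self.catalogue)

    def search(self, keyword: str) -> list[Record]:
        """Discovery happens centrally; access is then directed to rec.access_point."""
        kw = keyword.lower()
        return [r for r in self.catalogue.values() if kw in r.title.lower()]


# Usage with hypothetical names:
node = DataNode("glacier node", [Record("g-1", "Glacier inventory", "glacier subcenter")])
schema_a = Subcenter("glacier subcenter", nodes=[node])
schema_b = Subcenter("polar subcenter", own_records=[Record("p-1", "Polar ice thickness", "polar subcenter")])
schema_c = DataNode("lake node", [Record("l-1", "Lake survey", "general center")])
center = GeneralCenter([schema_a, schema_b], [schema_c])
print(center.harvest())  # 3
print([r.access_point for r in center.search("ice")])  # ['polar subcenter']
```

Separating the harvested catalogue (discovery) from `access_point` (where the entity is served) mirrors the split the text draws between metadata held at the general center and data entities held at subcenters or nodes.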

Data service architecture
Data service architecture refers to the physical architecture of the DSNESS. It consists of one general center and 15 regional or disciplinary subcenters, as shown in Figure 3. Supported by network technologies (Zhu et al., 2009), this architecture provides easy-to-use interactive interfaces for users. Its portal is http://www.geodata.cn.

[Figure 3 labels: General Center; regional subcenters: Polar; Qinghai-Tibet Plateau; Loess Plateau; Yangtze River middle and lower area; Lower Yellow River area; Northeast black soil; Xinjiang and Middle Asia; South China Sea and its abutting area]

STANDARD AND SPECIFICATION SYSTEM
Appropriate standards and specifications are essential for high-quality data sharing. The DSNESS therefore designed a reference model for its standard and specification system.

Standard and specification reference model
The DSNESS standard and specification reference model is shown in Figure 4. It includes three levels: directive standards, general standards, and application standards, which are divided into four classes: data management, data description, platform development, and data services. General standards and specifications cover the common standardization requirements at different stages of the data life cycle, such as data planning, data collection and organization, quality assessment and quality control, metadata creation, data backup and preservation, the data search interface, and data analysis and visualization. Application standards are special standards and specifications established for particular fields; they can adopt industry standards already in use or define new application standards and specifications as a field requires.
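One way to picture the reference model is as a grid of three levels by four classes into which each standard is placed. The sketch below encodes that grid; the example placements of specifications into cells are assumptions made for illustration, since the text does not give them, and the field-specific entry is hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Level(Enum):
    DIRECTIVE = "directive"
    GENERAL = "general"
    APPLICATION = "application"


class StdClass(Enum):
    DATA_MANAGEMENT = "data management"
    DATA_DESCRIPTION = "data description"
    PLATFORM_DEVELOPMENT = "platform development"
    DATA_SERVICES = "data services"


def build_grid(standards):
    """Place (name, Level, StdClass) triples into a level-by-class grid."""
    grid = defaultdict(list)
    for name, level, cls in standards:
        # Reject anything that is not a cell of the reference model.
        if not isinstance(level, Level) or not isinstance(cls, StdClass):
            raise TypeError(f"{name!r} must use a level and class from the model")
        grid[(level, cls)].append(name)
    return grid


# Hypothetical placements, for illustration only.
standards = [
    ("core metadata content specification", Level.GENERAL, StdClass.DATA_DESCRIPTION),
    ("metadata editing specification", Level.GENERAL, StdClass.DATA_DESCRIPTION),
    ("data quality control specification", Level.GENERAL, StdClass.DATA_MANAGEMENT),
    ("hypothetical field-specific specification", Level.APPLICATION, StdClass.DATA_DESCRIPTION),
]
grid = build_grid(standards)
for (level, cls), names in sorted(grid.items(), key=lambda kv: (kv[0][0].value, kv[0][1].value)):
    print(f"{level.value:<12} {cls.value:<20} {len(names)}")
```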
Over the past 10 years of research and application, 21 general standards and specifications and 35 special application specifications in China have been implemented successfully within the DSNESS (Wang et al., 2009). The extensible metadata standard and the data quality control specifications are the core components of the system and are analyzed in detail below.

Extensible metadata standard
Metadata play a very important role in sharing dispersed, distributed research data. To describe data resources from different disciplines under uniform standards and specifications, the DSNESS designed an extensible metadata model and methods for extending it. The extensible metadata standard model is shown in Figure 5.
The metadata system comprises a core metadata element set and a universal metadata element set. Core metadata are the minimum set of elements necessary to describe data; the universal set is the largest collection of metadata elements available for describing any data. An application profile is created according to the specific data description requirements of an application field. Usually, an application profile includes the core metadata elements plus related elements selected from the universal set, using the methods and rules of serial element extraction, which include inheriting, cutting out, extending, and expanding, among others. The core metadata elements are divided into eight groups of information: identification, content, distribution, data quality, portrayal catalogue, data schema, metadata maintenance, and metadata reference. There are 22 elements in the core metadata set and 188 elements in the universal set. These metadata standards are applied within the DSNESS platform. Metadata descriptions and their dynamic extensions are realized with eXtensible Markup Language (XML), supported by XML Schema extension technology, with Java as the development language. Under this model, each regional and disciplinary data center manages its own metadata records, which are then collected and integrated at the general center. Because all records comply with the uniform metadata extension system, they can all be accessed through the uniform metadata database harvested from the subcenters within the DSNESS.
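The relationship between the core set, the universal set, and an application profile can be sketched with Python sets and the standard `xml.etree.ElementTree` module. The DSNESS itself uses XML Schema and Java; this Python version only mirrors the logic, and the element names are hypothetical stand-ins because the 22 core and 188 universal elements are not listed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the real core and universal element sets.
CORE = {"title", "abstract", "keywords", "spatialExtent", "contact"}
UNIVERSAL = CORE | {"temporalExtent", "lineage", "resolution", "projection", "instrument"}


def make_profile(selected, extensions=()):
    """Application profile = all core elements + elements selected from the
    universal set + optional field-specific extension elements."""
    selected = set(selected)
    unknown = selected - UNIVERSAL
    if unknown:
        raise ValueError(f"not in the universal set: {sorted(unknown)}")
    return CORE | selected | set(extensions)


def to_xml(record, profile):
    """Validate a metadata record against a profile and serialize it as XML."""
    fields = set(record)
    missing = CORE - fields
    if missing:
        raise ValueError(f"missing core elements: {sorted(missing)}")
    outside = fields - profile
    if outside:
        raise ValueError(f"elements outside the profile: {sorted(outside)}")
    root = ET.Element("metadata")
    for name in sorted(fields):
        ET.SubElement(root, name).text = str(record[name])
    return ET.tostring(root, encoding="unicode")


# Usage: a hypothetical glacier-data profile with one extension element.
profile = make_profile({"temporalExtent", "lineage"}, extensions={"glacierType"})
record = {
    "title": "Glacier inventory", "abstract": "Example record", "keywords": "glacier",
    "spatialExtent": "Western China", "contact": "subcenter", "glacierType": "valley",
}
print(to_xml(record, profile))
```

Every profile contains the full core set by construction, which is what lets a central catalogue search records from any subcenter on the same minimum fields while each field keeps its own extensions.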

Data quality control standards and specifications
Data quality control runs through the entire data sharing framework. Different data quality control standards and specifications apply to the different data management stages of the DSNESS data life cycle.
In the data collection stage, data quality is ensured by specifications that define data entry permissions. In the data integration stage, all databases are built according to quality control specifications that depend on the type of data, namely the quality control specifications for vector database construction, raster database construction, and attribute database construction. In the data management stage, all data must have complete descriptive information, including a data entity dictionary, metadata, and data documents, which help users understand the contents of a dataset and the structure of its records. Corresponding standards and specifications, such as the data entity dictionary specification, the core metadata content specification, the metadata editing specification, and the data document editing specification, were also created. In the data distribution stage, all data are checked against the data distribution quality control specifications, and only datasets that comply with these specifications can be published in the data sharing environment.
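The stage-by-stage gating described above can be sketched as an ordered list of checks, each of which must pass before a dataset is published. The dictionary keys (`submitter_authorized`, `integration_qc_passed`, and so on) are hypothetical flags standing in for the outcomes of the actual DSNESS specifications.

```python
from __future__ import annotations

from typing import Callable, Optional


def collection_check(ds: dict) -> bool:
    # Data entry permissions: only authorized submitters may contribute.
    return bool(ds.get("submitter_authorized", False))


def integration_check(ds: dict) -> bool:
    # Type-specific database construction QC: vector, raster, or attribute.
    return ds.get("type") in {"vector", "raster", "attribute"} and bool(
        ds.get("integration_qc_passed", False)
    )


def management_check(ds: dict) -> bool:
    # Complete description: entity dictionary, metadata, and data documents.
    return all(ds.get(k) for k in ("entity_dictionary", "metadata", "documents"))


def distribution_check(ds: dict) -> bool:
    # Final distribution QC before publication.
    return bool(ds.get("distribution_qc_passed", False))


# Life-cycle stages in order; a dataset must clear every gate.
STAGES: list[tuple[str, Callable[[dict], bool]]] = [
    ("collection", collection_check),
    ("integration", integration_check),
    ("management", management_check),
    ("distribution", distribution_check),
]


def quality_gate(ds: dict) -> tuple[bool, Optional[str]]:
    """Return (publishable, first failed stage or None)."""
    for stage, check in STAGES:
        if not check(ds):
            return False, stage
    return True, None


# Usage: a hypothetical raster dataset that still lacks its data documents.
ds = {"submitter_authorized": True, "type": "raster", "integration_qc_passed": True,
      "entity_dictionary": "yes", "metadata": "yes", "documents": None,
      "distribution_qc_passed": True}
print(quality_gate(ds))  # (False, 'management')
```

Reporting the first failed stage, rather than a bare yes or no, tells a data node which specification to revisit.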

SERVICE APPLICATION
Led by the data sharing union mechanism and supported by the data resource, data management, and data services architectures, the DSNESS provides extensive data services to the public. Its portal offers a series of data service functions, including data catalogue search, online data booking, and offline data application. Any user in China or abroad can access or apply for the datasets in the DSNESS free of charge.
Besides dataset access services, the DSNESS also provides standard and specification services and international data sharing cooperation services.

Datasets services
From 2003 to 2011, the numbers of registered users and web visitors reached 61,047 and 10,754,318, respectively. Over the same period, 40.03 TB of data were downloaded by the scientific community and the public free of charge. The DSNESS provided more than 640,000 custom processing services for 925 scientific research projects, 20 large-scale science engineering projects, and 22 civil engineering projects. For example, its datasets have been used in the construction of the Qinghai-Tibet railway, in earthquake relief and manned spaceflight engineering in China, and in the international Monsoon Asia Integrated Regional Study (MAIRS).
The DSNESS also provides data services for education: more than 800 graduate and undergraduate students have benefited from them. The five most downloaded datasets are listed in Table 1.
Table 1. The five most downloaded datasets
Qinghai-Tibet Plateau soil database: 3934

Standard and specification services
National research data archiving is an emerging trend. The US, Sweden, Ireland, Australia, and many other countries have begun to experiment with and implement research data archiving (NSF, 2011; NIH, 2003).
Although the archiving of research data in China was proposed at the beginning of the SDSP, it was difficult to implement because of the lack of data archiving standards and specifications.
Encouraged by the success of research data integration and sharing in the DSNESS, the MOST adopted its standards and specifications for research data archiving in 2008. The resources and environment field of the National Key Basic Research Program of China (973 Program) was selected as the first pilot project for national research data archiving. Seven DSNESS standards and specifications, including the data archiving administration specifications, data plan specifications, metadata specifications, data document specifications, and data quality control specifications, were used to archive data for the 973 Program (Wang et al., 2011b).
By the end of October 2011, 49 projects in the resources and environment field of the 973 Program had submitted their research data to the data archiving center. The accumulated data amount to approximately 2.26 TB in more than 1,000 datasets, which can be accessed free of charge through the research data archiving portal (http://www.973geodata.cn).

International cooperation services
The DSNESS actively contributes to international data sharing: five percent of its registered users are from other countries, such as the US, Korea, Germany, Japan, Canada, Côte d'Ivoire, and the UK. The DSNESS cooperates closely with many international data organizations, such as the WDS-ICSU, ICIMOD, GLCF, the Russian Academy of Sciences, and the Mongolian Academy of Sciences.
A typical example of data exchange and cooperation by the DSNESS is the liaison with the WDS-ICSU.
In 2007, the WDS-ICSU invited the DSNESS to participate in the global metadata interoperability experiment for World Data Centers distributed throughout America, Europe, and Asia. All of the metadata records in the DSNESS were first translated into English. The DSNESS then developed a metadata access interface based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (Wang et al., 2011a). This enabled the WDS Portal (http://www.icsu-wds.org/) in Germany to access all 3,922 metadata records of the DSNESS in Beijing. Through both the WDS and DSNESS portals, these data have been disseminated to the WDS, IGBP, IPY, MAIRS, and many other scientific communities and projects.
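For readers unfamiliar with OAI-PMH, the sketch below shows the harvester side of the protocol: a `ListRecords` request with the `oai_dc` metadata format, continued with `resumptionToken` until the repository returns an empty token. The base URL and the injected `fetch` function are placeholders; this is not the DSNESS or WDS code, and a real harvester would also handle OAI-PMH error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_data):
    """Return ([(identifier, title), ...], next token or None) for one page."""
    root = ET.fromstring(xml_data)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((ident, title))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    token = token.strip() if token else ""
    return records, token or None  # an empty token marks the last page


def harvest(base_url, fetch):
    """Follow resumption tokens until exhausted; `fetch` maps a URL to a response body."""
    records, token = [], None
    while True:
        page, token = parse_list_records(fetch(list_records_url(base_url, token)))
        records.extend(page)
        if token is None:
            return records


# Live use against a hypothetical endpoint:
# import urllib.request
# harvest("https://example.org/oai", lambda u: urllib.request.urlopen(u).read())
```

Injecting `fetch` keeps the paging logic testable offline and leaves timeouts and retries to the caller.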

Figure 1. Schematic diagram of the data sharing union in the DSNESS

Union member rights also include (5) the right to use the reputation of the DSNESS. Union member obligations include: (1) providing datasets with clear intellectual property rights; (2) publishing novel research data when these data are derived from data provided by the union; (3) providing high-quality data (only data that pass the quality check can be published); and (4) offering suggestions and advice to the union. This protocol was applied successfully in the DSNESS through a three-stage development process from 2003 to 2012. Currently, there are more than 40 core union members in the DSNESS, including 14 institutes of the Chinese Academy of Sciences and 10 universities in China. Many other agencies in China and abroad, such as the WDS members of the ICSU in China, the International Centre for Integrated Mountain Development (ICIMOD), the Global Land Cover Facility (GLCF) of the University of Maryland in the USA, the Russian Academy of Sciences, and the Mongolian Academy of Sciences, are also part of the DSNESS.

Figure 2. Multi-layered data management architecture of the DSNESS

Figure 4. The DSNESS standard and specification system reference model

Figure 5. Extensible metadata standard model of the DSNESS

Thematic data catalogue (continued):
• Land cover/land use data of China at map scales of 1:100,000 and 1:250,000, covering four periods: the late 1980s, the 1990s, 2000, and 2005 (Zhang et al., 2009)
• Thematic element geospatial data of China at a map scale of 1:1,000, including geomorphology, vegetation, grassland, land resources, land use, lakes, wetland, and desert distribution data
• Raster datasets of China at 1 km grid resolution, including climate, DEM, land use, soil chemistry, population, and GDP, among others
2. Resource and environmental data at multiple temporal scales
• Paleoclimate and paleoenvironmental data (e.g., isotope, sporopollen, coral, and tree-ring data) from archaeological field work and model simulations
• Historical population (200 BC to present), precipitation (1840 to present), and natural disaster (1949 to present) data for China
3. Resource and environmental data for typical climate-change-sensitive regions
• Scientific expedition data and observations from the South Pole (1984 to present) and the North Pole (1999 to present)
• Basic geographic data, scientific expedition data, and long-term observational data from the Qinghai-Tibet Plateau
• Paleoenvironmental and paleoclimate data, and soil erosion and conservation data from the Loess Plateau
• Glacier, permafrost, and desert data from Northwest China
• Mountain disasters and their distribution data from Southwest China
• Agricultural, ecological, and environmental data for the black soils of Northeast China
• Data for coastal regions, including the Yangtze River Delta, the Yellow River Delta, and the coastal area of Southeast China
4. Integrated global change data
• Data retrieved from satellite observations, including NDVI, LAI, surface reflectance, soil temperature, evapotranspiration, vegetation phenology, and ice/snow thickness, among others
• Atmosphere, environment, and greenhouse gas data or products, including AERONET monitoring data; CO, CO2, CH4, and O3 distribution data at different spatial resolutions; NO2, SO2, and other radiation data; and precipitation, temperature, and relative humidity data for China under the HadCM3 scenario
• Simulation data, including forest carbon cycle and evapotranspiration, soil moisture data for China, GCM-simulated climate change data for China, and IBIS-simulated water and carbon change data for China
5. International data resources
• Global data products: global land cover fusion datasets (GLCC, GLC2000, UMD, MODIS, and GlobCover) and 1 km resolution global climate data (WORLDCLIM)
• Satellite observations: Landsat MSS/TM/ETM+, MODIS, MISR, Aura, and GOME data
Figure 3. Present service architecture of the DSNESS

The general center is hosted by the Institute of Geographic Sciences and Natural Resources Research (IGSNRR), CAS, in Beijing. The polar regional subcenter is hosted by the Polar Research Institute of China in Shanghai. The Qinghai-Tibet Plateau regional subcenter is hosted by the Institute of Tibetan Plateau Research, CAS, in Beijing. The Loess Plateau regional subcenter is hosted by Northwest Agriculture and Forestry University in Shaanxi Province. The Yangtze River Delta regional subcenter is hosted by Nanjing Normal University in Jiangsu Province. The Lower Yellow River regional subcenter is hosted by Henan University in Henan Province. The Northeast black soil regional subcenter is hosted by the Northeast Institute of Geography and Agroecology, CAS, in Heilongjiang Province. The Xinjiang and Middle Asia regional subcenter is hosted by the Xinjiang Institute of Ecology and Geography, CAS, in Urumqi. The South China Sea and its abutting area regional subcenter is hosted by the South China Sea Institute of Oceanology, CAS, in Guangdong Province. The glacier and frozen soil disciplinary subcenter is hosted by the Cold and Arid Regions Environmental and Engineering Research Institute, CAS, in Lanzhou, Gansu Province. The geophysical disciplinary subcenter is hosted by the Geology and Geography research institute, CAS, in Beijing. The space science disciplinary subcenter is hosted by the space science center of the Chinese Academy of Sciences in Beijing. The astronomy disciplinary subcenter is hosted by the national astronomy station of CAS in Beijing. The lake and watershed disciplinary subcenter is hosted by the Nanjing Institute of Geography and Limnology, CAS, in Jiangsu Province. The renewable resources and environment subcenter is hosted by the Institute of Geographic Sciences and Natural Resources Research, CAS, in Beijing. The global change subcenter is hosted by Nanjing University in Jiangsu Province.