THE DEVELOPMENT AND USAGE OF THE OVERSEAS SINOLOGY DATABASE

The Overseas Sinology Database is composed of three databases: scholar, organization, and journal. The thesis database is regarded as separate and is attached to the scholar database. The database information comes from major areas of the world, especially the countries adjacent to China, and updates are done continuously. The Sinology Database is in several different languages and should satisfy the differing needs of data collection and database application. The data quality is strictly controlled during the whole data life cycle, which includes data collection, processing, storage, and access. In addition, metadata are created to accompany the data according to metadata standards and specifications, which supports cooperation among different databases. Finally, besides searching, statistical calculation, and sorting, the database is also used for data mining and knowledge discovery. Through these methods, conclusions about changes in Sinology can be drawn, which will aid us in understanding the world and China in particular.


THE FEASIBILITY OF BUILDING AN OVERSEAS SINOLOGY DATABASE
We considered three principles when building the overseas Sinology database: first, whether this database would have a high academic value and hold an advanced place for a long period; second, whether the database would provide a strong document guarantee for the study of overseas Sinology and fill a gap in China; and third, whether the database would improve social and economic development. Judged against these principles, the holdings of the CASS institute meet the requirements of a Characteristic Library Collection, and the human and material resources that the database required were available. After the database has been built, it can be used by academia, especially for value-added services. In a word, building the database was feasible.

Composition of the database
The overseas Sinology database is composed of three sub-databases: scholars, organizations, and journals of overseas Sinology. Besides these three sub-databases, the scholar database describes the works and theses of the scholars whose information has been collected in the scholar sub-database.

Faculty and administration
Requirements for the faculty who would build the database were research ability, wide knowledge, and a specific specialty. The people who collect and process the data should have the following abilities: knowledge of the information, for example, a familiarity with all kinds of information resources, which they are able to search and access; the ability to judge the authenticity, dependability, and authority of the information resources; and the ability to translate material in foreign languages into Chinese. The ability to use computers to process the data and edit text is also necessary. A professional team, selected according to these requirements, was built.
The system administration was based on target-orientation. Material in different languages and from different countries was assigned to individual administrators. They have their own duties and their own levels in the organizational pyramid. In this way, initiative, activism, and creativity were inspired so that the goal of quality control could be achieved [4].

Software
The first step in building a database is to consider the management software. One way to do this is to develop a new system, and the other is to select one that has already been developed. Developing a new system requires technicians who are versed in computer programming and a long period of time in which to complete the task.
However, both technicians and time are currently in short supply. Thus the feasible method is to select existing software. A library automation system is not suitable for complicated data structures such as an overseas Sinology

Metadata
Building the metadata is a necessity for the overseas Sinology database. Metadata are data about data. Their main functions are as follows:
• Discovery and identification: Identifying and discovering the digital information unit and collection. Dublin Core is a typical example.
• Cataloguing: Cataloguing the data unit in detail.
• Resource Administration: Supporting resource administration and access management including rights/privacy management, digital signature, seal of approval/rating, access management, payment, and accounting.
• Preservation and archiving: Describing the format of information, protection conditions, and migration methods, and supporting digital preservation [5].
Metadata not only can describe the various kinds of information resources and the character and properties of the data themselves but also can organize a mass of digital information into an organic structure so that efficient and exact searches, materials-sharing, and data management can be achieved. Therefore, metadata are necessary in developing the overseas Sinology database.
Metadata creation should conform to metadata standards as this is a precondition for search, exchange, and usage among different databases. A database should not become an isolated information island. The overseas Sinology database was designed to realize search and exchange among databases with related subjects. Metadata standards or specifications have been established for archives, geography, art, and museums, such as CDWA, GILS, FGDC, EAD, DC, and others [6].
However, developing metadata standards and specifications are different processes, especially for metadata which are suitable for specific subjects and formats. CALIS has set up a series of metadata specifications and a standard bibliographic description involving ancient books, stemma, rubbings, chorography, theses, and e-books [7].
Overseas Sinology is a database that includes information about people, organizations, and journals. There are no metadata standards in existence that describe the object data in the overseas Sinology database. Therefore, we must expand existing metadata standards to suit this database and create new metadata to describe and manage the objects. Thus, a new metadata system has been set up. The metadata for the overseas Sinology scholar database are based on the character metadata in the Chinese metadata standard framework, which has been applied to the Beijing University famous master collection. The works database references journal papers, e-books, and the specifications and standards of the metadata in the digital library. The journal database metadata were newly created because there are no metadata standards to describe a whole journal, only a certain paper. The organization metadata were also newly created. In the process of creating a new metadata system, we considered not only the convenience of management, search, and exchange but also user demand. The bibliography items were established according to the metadata system. The following tables illustrate the information in detail.

Large time span:
The earliest publication can be traced back to 1984, while the latest works are also embodied in the database. Information after 1980 accounts for a major part of the database [9].
2. Internet information resources: the websites of organizations, institutions, and individuals contain a variety of information. Internet information is continuously updated so that it is more comprehensive and newer than the information in print. However, its authenticity should be judged according to the creator. Therefore, the information collectors do painstaking work to analyze, select, and process data from the Internet, which may be missing, contradictory, and confusing. Email, fax, and telephone are efficient means for authenticating information.

Information resources
3. Indirect methods: information is obtained from universities, scientific research institutions, and guilds from local regions, nearby countries, and even the world. Information collectors attend seminars, dissertation defenses, and lectures, talk with scholars, and obtain first-hand information [10].

Data input
Manual data input of documents is better than scanning and using optical character recognition (OCR). The scope of the content is broader and more comprehensive than the existing materials. Information in the existing materials is old and insufficient for research. Scanning and OCR technology are not perfect, and a great effort must be made to correct erroneous data. Therefore, we prefer manual input, which allows us to collect the newest information and compile it in a new system, ensuring the veracity of the data.

Data verification
Verification of database quality must occur in every step of data processing. Because the users of the database include everyone, especially scholars, and the data will be published in print, the quality of the data needs to be strictly controlled to assure its authenticity and reliability.
During manual inputting, errors are inevitable. Incorrect keying and other kinds of mistakes must be corrected.

Methods of data verification include: repeated data verification, which deletes duplicate data in the database; the visual method, which is an efficient way to find a large share (75%-85%) of mistakes; logic judgment, in which logical inconsistencies are checked; data typing, which will discover inconsistent data; and format verification.
Table 4 shows a verification entry [11]. Authenticity and veracity of the data are very important because they affect the search results, research, and analysis. When verification is completed, the data will finally be compiled and published. A good example of data verification is that translated names of foreign scholars must be inspected by experts.
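The automated checks named above (duplicate detection, data typing, value range, format verification, and the accordance between the scholar and works databases) can be sketched as follows. This is a minimal illustration only: the record layout, field names, year limits, and email pattern are assumptions made for the example, not the database's actual schema or rules.

```python
import re

# Hypothetical records; field names and values are invented for illustration.
SCHOLARS = [
    {"id": "S001", "name": "A. Example", "birth_year": 1950, "email": "a@example.org"},
    {"id": "S002", "name": "B. Sample", "birth_year": 2950, "email": "not-an-email"},
    {"id": "S001", "name": "A. Example", "birth_year": 1950, "email": "a@example.org"},
]
WORKS = [
    {"title": "Study One", "scholar_id": "S001"},
    {"title": "Study Two", "scholar_id": "S999"},
]

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_duplicates(records, key="id"):
    """Repeated data verification: return key values that occur more than once."""
    seen, duplicates = set(), []
    for record in records:
        if record[key] in seen:
            duplicates.append(record[key])
        seen.add(record[key])
    return duplicates


def check_record(record, min_year=1800, max_year=2007):
    """Data typing, value range, and format checks for one scholar record."""
    problems = []
    if not isinstance(record["birth_year"], int):
        problems.append("birth_year: wrong data type")
    elif not min_year <= record["birth_year"] <= max_year:
        problems.append("birth_year: out of range")
    if not EMAIL_PATTERN.match(record["email"]):
        problems.append("email: bad format")
    return problems


def orphan_works(works, scholars):
    """Accordance check: works whose scholar_id has no scholar record."""
    known_ids = {scholar["id"] for scholar in scholars}
    return [work["title"] for work in works if work["scholar_id"] not in known_ids]
```

Checks of this kind only flag candidates; the visual inspection and expert review described above would still decide how each flagged record is corrected.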

Data transmission
The final step is data transmission. The database administrator is in charge of uploading the data, and the database log will show the state of the transmission. Erroneous data will be picked out automatically. Successful transmission over the Internet is a good test of data transmission. After the transmission, the system will feed back information.

Background administration of the database system
1. Data management: including data upload, processing in batches, checking repeated data, backup, etc.
2. Customization of metadata: setup through the metadata template.
3. Data security: users cannot casually alter, delete, or destroy the data. This can be ensured by individual view distribution, in which different users have different rights to operate the database (for example, the right to delete, read, and modify the data), and by user validation, including Windows validation and SQL Server validation [12].
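The idea that different users hold different rights to read, modify, and delete data can be sketched as a simple role table. The role names and the `delete_record` helper below are hypothetical; the actual system relies on Windows and SQL Server validation, which this sketch does not reproduce.

```python
# Hypothetical roles and the operations each one may perform.
ROLE_RIGHTS = {
    "reader": {"read"},
    "editor": {"read", "modify"},
    "administrator": {"read", "modify", "delete"},
}


def is_allowed(role, operation):
    """Return True if the role holds the right to perform the operation."""
    return operation in ROLE_RIGHTS.get(role, set())


def delete_record(records, record_id, role):
    """Return a copy of records without record_id, if the role may delete."""
    if not is_allowed(role, "delete"):
        raise PermissionError(f"role {role!r} may not delete records")
    return [record for record in records if record["id"] != record_id]
```

Unknown roles receive no rights at all, so a misconfigured account fails closed rather than open.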

Special service applications
1. Relation link: after a search, related content is used as a new search point, so the scope of information gets broader. With the relation link, a URL need not be typed into the address bar; the link can be opened to log into the database and run the search directly. Thus the users' information scope can be widened, and the limitations of the database alleviated.
2. Encryption: sensitive and critical data should be encrypted. Users who do not have the right to read the encrypted data just see the message "This record is an encrypted one."
3. Sorting: any data segment can be sorted in ascending or descending order.
4. Data mining: there are four groups of methods: statistical, machine learning, database, and neural network. After analyzing the character of the overseas Sinology database, the methods normally used include statistical methods such as linear regression, multivariate analysis, clustering analysis, discriminant analysis, and correlation analysis, and database methods such as multidimensional analysis and OLAP.
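As an illustration of the simplest statistical method in the list, the sketch below fits a least-squares line to a short series. The numbers are invented toy values, not figures from the database, and the function is a generic textbook formula rather than the SPSS procedure.

```python
def linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented toy series (NOT data from the database): period index against
# a count of newly founded journals in that period.
periods = [1, 2, 3, 4]
new_journals = [3, 5, 7, 9]
slope, intercept = linear_regression(periods, new_journals)
```

A positive slope in such a fit would indicate growth over the periods considered; with real data, the fit quality would also need to be examined before drawing conclusions.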

Analysis is usually done with SPSS software. Data can be downloaded from the overseas Sinology database and loaded into SPSS. Using this software, we can mine a great deal of information and discover trends and relationships. The scope of the information exceeds the original areas and disciplines. Information comes from Canada, Australia, the UK, Germany, India, New Zealand, Singapore, Malaysia, and Korea along with the United States, Japan, and Russia, the pioneering countries in Sinology research. All these countries, not only adjacent countries in Asia but also countries in Europe and on other continents, have an interest in Sinology research. The scope of the disciplines is expanding, and the number of research organizations and institutions is increasing. Table 6 shows the distribution of organizations and institutions doing Sinology research. Figure 1 shows the distribution of the disciplines. From Figure 1, we can conclude that the hot topics of Sinology include cultural research, politics, regional research, economic research, etc. Many institutions and organizations also undertake Chinese language learning activities.

THE PRODUCTION OF THE OVERSEAS SINOLOGY DATABASE
In the Overseas Sinology Journal Database, journals containing overseas Sinology articles come from different countries, including Canada, the United States, Australia, Singapore, New Zealand, Holland, Germany, India, Korea, Japan, etc. The languages in the database are English, French, German, Korean, and Japanese. Some journals are published in both Chinese and English, some are published in several languages, and some are published as e-journals. All this information is in the database. Every record contains a title, ISSN, subject, publishing period, editorial office address, telephone number, fax, email, and website. This detailed information is a convenience to users, who can use it to get more information, contact an editorial office directly, and easily access a website. The earliest publication start date in the journal database is 1984, and the latest is 2002.
Figure 2 shows the magnitude of change in the number of journals from the mid-19th century to 2002.
high praise to this database saying it is very practicable in the area of Sinology [14].

MAINTENANCE AND CONTINUOUS CONSTRUCTION
Maintenance of the database is a long-term and arduous mission. The work that should be done and the problems that need attention are as follows:
1. Update the data and compile new content. With the development of overseas Sinology, the database should reflect changes in development. New scholar information, new works, and new Sinology journals should be tracked and put into the database.
2. Enhance the application. Making the database known in governments and academic communities is further work that should be done. Applications can make the database play a realistic role and show its true value.
3. Integrate the information of this database with other information. Make the database easier and more convenient to use so that users can get more information from just one search or one access.
There is much other work we could do. Building the database is just a start.

1. Log in: IP login and user/password login.
2. Search: basic search and advanced search.
a. Basic search includes key words, title, full-text, and subject.
b. Advanced search includes every segment, Boolean search, and search on results.
c. Search can be done in several languages.
3. Browse: the database has a map of the logical structure of the sub-databases. The database supports Unicode and multi-language display.
4. Customization: users can control the configuration, color, font, and icons.
5. Http link: by clicking on the link, the user can log into the website and email function.
6. Search history: users can review what they have searched.
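The difference between basic search, Boolean search, and search on results can be sketched with a small in-memory example. The journal records, field names, and function names are hypothetical and stand in for whatever the management software actually provides.

```python
# Hypothetical journal records, invented for illustration only.
RECORDS = [
    {"title": "Example Review of Chinese Philosophy", "subject": "philosophy", "language": "English"},
    {"title": "Sample Journal of Chinese History", "subject": "history", "language": "German"},
    {"title": "Example Studies in Chinese Economics", "subject": "economics", "language": "English"},
]


def basic_search(records, keyword, fields=("title", "subject")):
    """Basic search: case-insensitive keyword match in any of the given fields."""
    needle = keyword.casefold()
    return [r for r in records if any(needle in r[f].casefold() for f in fields)]


def boolean_search(records, all_of=(), any_of=(), none_of=(),
                   fields=("title", "subject", "language")):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    results = []
    for record in records:
        text = " ".join(record[f] for f in fields).casefold()
        if not all(term.casefold() in text for term in all_of):
            continue
        if any_of and not any(term.casefold() in text for term in any_of):
            continue
        if any(term.casefold() in text for term in none_of):
            continue
        results.append(record)
    return results


# "Search on results" is simply a second search over an earlier result list.
first_pass = basic_search(RECORDS, "example")
second_pass = basic_search(first_pass, "economics")
```

Because every function takes and returns a plain list of records, searches can be chained in any order, which is one simple way to model searching within results.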
At present, there are about 2,000 records of scholars in the Overseas Sinology Scholar Database. Aside from information about individual scholars, their work has also been collected and stored in the database. More than 500 records of organizations and institutes have been entered into the Overseas Sinology Research Organization Database. More than 200 journal records about overseas Sinology have been deposited in the Overseas Sinology Journal Database (as of 11 August 2006). All the information is periodically updated and supplemented.

Figure 1. Overseas Sinology discipline distribution (category labels: religion, sinology, politics, philosophy, language, arts, medicine, information, psychology, literature, culture studies, information and library, sociology, regional studies, history, archaeology, nationalism, economics, education, international relations)

Figure 2. Magnitude of change in the number of overseas Sinology journals

Table 1. Descriptive metadata of a paper
Core elements: title, author, subject, secondary authors, description, date, type, format, identifier, language, related resources, rights
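A record built from the Table 1 core elements could be represented roughly as below. The class name, the Python field names, and especially the choice of which elements are mandatory are assumptions for illustration; the source lists the elements but does not say which are required.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class PaperMetadata:
    """One paper record; each attribute mirrors a core element from Table 1."""
    title: str
    author: str
    subject: str = ""
    secondary_authors: list = field(default_factory=list)
    description: str = ""
    date: str = ""
    resource_type: str = ""    # the "type" element
    resource_format: str = ""  # the "format" element
    identifier: str = ""
    language: str = ""
    related_resources: list = field(default_factory=list)
    rights: str = ""


# Assumption: these three elements are treated as mandatory in this sketch.
REQUIRED_ELEMENTS = ("title", "author", "identifier")


def missing_elements(record):
    """Return the required elements that are empty in the record."""
    values = asdict(record)
    return [name for name in REQUIRED_ELEMENTS if not values[name]]


# Hypothetical record with no identifier yet.
draft = PaperMetadata(title="Hypothetical Paper", author="A. Example")
```

A metadata template of this kind makes it straightforward to flag incomplete records before they are uploaded.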

Table 2. Descriptive metadata of an e-book
Elements: title, author, subject, description, publisher, secondary author, date, type, format, identifier, source, language, related resource, time-space scope, rights management

Table 3. Overseas Sinology metadata and cataloging items

Table 4. Data verification items
Item: accordance
Check: the accordance between the Scholar database and the Works & Paper database; the accordance of data from different information sources
Item: value range and code
Check: the value of data in the range; the character and code

Data Science Journal, Volume 6, Supplement, 23 December 2007
4. Encryption techniques: information can be read only after the rights are granted. The encryption techniques play an important role.
5. The release of the database and flexible setting.
6. Statistics: the database can provide log statistics functions. The statistical items include session time, frequency, IP, search, user information, etc.
Table 5. Log statistics
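The log statistics mentioned in item 6 (session time, frequency, IP, search, and user information) could be summarized along the following lines. The log entry layout and the sample values are invented for the sketch; the IP addresses come from the range reserved for documentation.

```python
from collections import Counter

# Hypothetical log entries; field names and values are assumptions.
LOG = [
    {"ip": "192.0.2.1", "user": "alice", "query": "philosophy", "session_seconds": 120},
    {"ip": "192.0.2.1", "user": "alice", "query": "history", "session_seconds": 60},
    {"ip": "192.0.2.7", "user": "bob", "query": "philosophy", "session_seconds": 30},
]


def log_statistics(entries):
    """Summarize search frequency per IP, per user, and per query, plus session time."""
    total_seconds = sum(entry["session_seconds"] for entry in entries)
    return {
        "searches_per_ip": Counter(entry["ip"] for entry in entries),
        "searches_per_user": Counter(entry["user"] for entry in entries),
        "query_frequency": Counter(entry["query"] for entry in entries),
        "total_session_seconds": total_seconds,
        "mean_session_seconds": total_seconds / len(entries) if entries else 0.0,
    }


stats = log_statistics(LOG)
```

Counters keep the summary compact and make it easy to ask follow-up questions such as which query terms are used most often.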