A CHARACTER RECOGNITION SCHEME BASED ON OBJECT ORIENTED DESIGN FOR TIBETAN BUDDHIST TEXTS

The purpose of this study is to develop a feasible method for coding and compiling Buddhist texts from original Tibetan script into Romanized form. Using a GUI (Graphical User Interface) based on Object Oriented Design, Buddhist literature researchers can easily build a dictionary of Tibetan characters. It is hoped that a computer system capable of highly accurate character recognition will be actively used by all scholars engaged in Buddhist literature research. In the present study, an efficient automatic recognition method for Tibetan characters is established. In the experiments, a recognition rate of 99.4% was achieved for 28,954 characters.


INTRODUCTION
Buddhism is a religion that has been studied by Buddhist literature researchers all over the world since ancient times. Much of the Buddhist literature was written in Tibetan and printed from wooden blocks (Kojima et al., 1997). Some of the most important works have already been reprinted in modern printed form.
As an example, we used "rGyal rabs gsal ba'i me long," published in 1993 as a volume of 250 pages. Computer recognition of these printed Tibetan texts would be eagerly welcomed by all scholars engaged in Buddhist literature studies, because much Buddhist literature has recently been converted to this printed form. In this paper, we design a character recognition system for Tibetan characters using UML (Unified Modeling Language) (Kojima et al., 2006), a recently developed modeling language for OOD (Object Oriented Design) (Fujita et al., 2005; Kojima et al., 1995; Moriwaki et al., 1994). Using the resulting OOD-based GUI, Buddhist literature researchers can easily build a Tibetan character dictionary (Choi et al., 2005; Mack et al., 2005).

EXPERIMENTS
A sample copy of the original Tibetan text is shown in Figure 1. The experimental system we used is shown schematically in Figure 2.   Next, character segmentation is performed by touching the character segmentation button, located at the bottom of the diagram. An example of character segmentation is shown in Figure 5. During character segmentation, each syllable is separated by extracting the "tseg," the dot that marks syllable boundaries in Tibetan script. An arrow in Figure 5 points to a "tseg."

Touching the button for collecting dictionary characters in Figure 5 opens the dictionary-character collection window shown in Figure 6. When Tibetan researchers touch the start button in the upper part of the right-hand insert of Figure 6, dictionary characters are collected automatically. Next, the dictionary characters are made by touching the "making dictionary characters" button, using character names defined by the Tibetan researchers. This operation is very easy for Tibetan researchers. Finally, characters are recognized automatically by touching the character recognition button. With the GUI, these procedures are almost entirely automatic.

A segmentation rate of 99.9% was achieved for 141,988 characters in the 250 pages of "rGyal rabs gsal ba'i me long." From the character recognition results for 28,954 characters in 30 pages of "rGyal rabs gsal ba'i me long," we found that errors occur mainly among similar characters. One such group, "ba," "pa," and "pha," is shown in Figure 7 (Kojima et al., 1997; Kojima et al., 1995). For these characters, an OOD model is created that combines the similar-character category with its individual candidate characters. With this additional procedure, a recognition rate of 99.4% was achieved.
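The GUI workflow described above (segment at the tseg, collect samples, make the dictionary, recognize) can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper does not specify its features or matching rule, so the `Component` representation, the size-and-position heuristic in `is_tseg`, the averaged templates in `make_dictionary`, and the Euclidean nearest-template rule in `recognize` are all hypothetical choices.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical connected component (glyph or mark) in one text line."""
    x: int
    y: int
    w: int
    h: int
    features: tuple[float, ...] = ()


def is_tseg(c: Component, line_top: int, line_height: int,
            max_size_ratio: float = 0.2, upper_band: float = 0.5) -> bool:
    """Assumed heuristic: a tseg is a small mark centred in the upper part of the line.
    It does not separate the tseg from small vowel signs; a real system would need to."""
    small = max(c.w, c.h) <= max_size_ratio * line_height
    centre_y = c.y + c.h / 2
    in_upper = centre_y - line_top <= upper_band * line_height
    return small and in_upper


def split_syllables(components: list[Component], line_top: int,
                    line_height: int) -> list[list[Component]]:
    """Scan left to right and close the current syllable at each tseg."""
    syllables: list[list[Component]] = []
    current: list[Component] = []
    for c in sorted(components, key=lambda comp: comp.x):
        if is_tseg(c, line_top, line_height):
            if current:
                syllables.append(current)
            current = []
        else:
            current.append(c)
    if current:  # a line may end without a trailing tseg
        syllables.append(current)
    return syllables


def make_dictionary(samples: dict[str, list[tuple[float, ...]]]) -> dict[str, tuple[float, ...]]:
    """Average the collected feature vectors for each researcher-defined character name."""
    dictionary = {}
    for name, vectors in samples.items():
        n = len(vectors)
        dim = len(vectors[0])
        dictionary[name] = tuple(sum(v[i] for v in vectors) / n for i in range(dim))
    return dictionary


def recognize(features: tuple[float, ...], dictionary: dict[str, tuple[float, ...]]) -> str:
    """Return the dictionary name whose template is nearest in Euclidean distance."""
    return min(dictionary, key=lambda name: math.dist(features, dictionary[name]))
```

The thresholds in `is_tseg` (20% of the line height, upper half of the line) are placeholders chosen only to make the example concrete; in practice they would be tuned on segmented pages such as the one in Figure 5.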
The relationship between the class for similar characters and the class for candidate characters is shown in Figure 8. Tibetan researchers performed all of these operations without the aid of computers.
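The class relationship in Figure 8 suggests a two-stage scheme: a similar-character class first absorbs any input that resembles its group, and the candidate characters it owns are then discriminated in a second step. The sketch below is one hypothetical way to express that in Python; the class names, the coarse and fine feature vectors, and the nearest-template rule in both stages are assumptions, since the paper does not describe the features used to separate "ba," "pa," and "pha."

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class CandidateCharacter:
    """One member of a similar-character group, with a hypothetical fine-detail template."""
    name: str
    fine_template: tuple[float, ...]


@dataclass
class SimilarCharacterClass:
    """A group of easily confused characters (e.g. "ba", "pa", "pha") that owns its candidates."""
    name: str
    coarse_template: tuple[float, ...]
    candidates: list[CandidateCharacter] = field(default_factory=list)

    def discriminate(self, fine_features: tuple[float, ...]) -> str:
        # Second stage: compare only against the members of this group.
        best = min(self.candidates,
                   key=lambda c: math.dist(fine_features, c.fine_template))
        return best.name


class TwoStageRecognizer:
    def __init__(self, plain_templates: dict[str, tuple[float, ...]],
                 groups: list[SimilarCharacterClass]):
        self.plain = plain_templates  # name -> coarse template for ordinary characters
        self.groups = {g.name: g for g in groups}

    def recognize(self, coarse_features: tuple[float, ...],
                  fine_features: tuple[float, ...]) -> str:
        # First stage: nearest coarse template among ordinary characters and group prototypes.
        options = dict(self.plain)
        options.update({name: g.coarse_template for name, g in self.groups.items()})
        label = min(options, key=lambda n: math.dist(coarse_features, options[n]))
        # If the input fell into a similar-character group, resolve it among the candidates.
        if label in self.groups:
            return self.groups[label].discriminate(fine_features)
        return label
```

Keeping the confusable characters inside one group object means the finer, more expensive comparison runs only over a handful of candidates, while ordinary characters are still decided in the first stage.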

CONCLUSION
In the present study, an efficient recognition method for Tibetan characters has been established. We achieved a recognition rate of 99.4% for 28,954 characters in a test case. The GUI-based Tibetan character recognition system is easy for Tibetan researchers to use, and its procedure has been systematized. Next, we will try to recognize woodblock-printed Tibetan manuscripts.
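To give a sense of scale, the reported rates can be converted into approximate error counts. Because the percentages are rounded, the counts below are estimates rather than figures from the experiments:

```latex
\text{misrecognized} \approx (1 - 0.994) \times 28\,954 \approx 174,
\qquad
\text{missegmented} \approx (1 - 0.999) \times 141\,988 \approx 142
```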

Figure 1. Sample copy of the original Tibetan text

Figure 8. Relationship between the class for similar characters and the class for candidate characters