Chinese Transcription of Buddhist Terms in the Late Hàn Dynasty

This dataset is a compilation of Chinese transcriptions of Buddhist terms produced by translators of the late Hàn period. For the Chinese transcriptions, it brings together the earlier work of Coblin (1983), Karashima (2010), Vetter (2012), and Hill, Nattier, Granger, and Kollmeier (2020). To these we added phonological reconstructions of the Chinese terms in Late Hàn Chinese from Schuessler (2009) and in Middle Chinese from Baxter and Sagart (2014a), as well as the Gandhari equivalents of the Sanskrit and Pali terms from Baums and Glass (2002). This dataset, shared on Zenodo, aims to be the new state-of-the-art dataset of Buddhist transcription material. It can be used by anyone working on Hàn Chinese phonology and will help us better understand both the possible source languages of the Chinese transcriptions and the phonology of the target Chinese dialects.

• Ān Shìgāo 安世高 (fl. 148-170), a Central Asian translator active in the Chinese imperial capital of Luòyáng 洛陽, was the first translator of Buddhist texts into Chinese whose name we know (Zacchetti 2019: 630).
• Kāng Mèngxiáng 康孟詳, of whom little is known but who is generally considered to have been born in China to Sogdian parents (Nattier 2008: 102).
These three figures are of particular interest here because of their use of transcription in their translations. For instance, while a concept such as dharma ended up being translated into Chinese as *puɑp 法 ('rule, way, doctrine'), it can also be found in the translations of Lokakṣema as *dəm-mɑ 曇摩, a phonetic transcription of a Prakrit word comparable to Pali dhamma or Gandhari dhaṃma.
The most extensive discussion of what such transcriptions imply for the phonology of Late Hàn Chinese is Coblin (1983). Since that publication, however, much has changed. First, new manuscripts have been discovered and attributed to Ān Shìgāo (Zacchetti 2010: 264), providing new transcriptional data, while some other texts traditionally attributed to him have been reclassified as later commentaries (Zacchetti 2010: 259-262). Second, our understanding of Old Chinese phonology has changed dramatically; in particular, it is now accepted that Old Chinese had a complex syllable structure, with consonant clusters in syllable-initial and syllable-final position as well as prefixes and suffixes (cf. Baxter 1992; Baxter and Sagart 2014b). Finally, our knowledge of languages that may have been close to the source languages of the texts translated by Ān Shìgāo, Lokakṣema, and Kāng Mèngxiáng, in particular Gandhari (Baums 2009), has progressed.
These developments make it necessary to revisit Coblin's conclusions about what the Buddhist transcriptional data contribute to our understanding of Hàn Chinese. The dataset presented here is an attempt to lay out all of the available Buddhist transcriptional data from the Late Hàn period and to annotate it with state-of-the-art linguistic knowledge: Sanskrit, Pali, and Gandhari equivalents serve as points of comparison for how the words might have been pronounced in the unknown source language, and Late Hàn Chinese and Middle Chinese reconstructions illustrate the transcriptions' target language.

(2) METHOD

BASE CORPUS
The basis of the dataset is Coblin (1983), whose Buddhist transcriptional data includes the following texts from the Taishō Tripiṭaka:

ADDITIONS AND REMOVALS
Over the years, scholars have expressed doubts about the inclusion of particular texts in the corpora of these translators,⁴ while other texts have been proposed for inclusion. For the Ān Shìgāo corpus, a consensus gradually emerged; it is described in detail in Zacchetti (2019), itself based on the work of Zürcher (1977) and Zürcher (1992). Some of the texts in Zacchetti's list had long been considered part of Ān Shìgāo's work but were not studied by Coblin. As a result, we added the following texts to Coblin's Ān Shìgāo corpus:

For a detailed discussion of Lokakṣema's extant corpus, cf. Harrison (1993), and Nattier (2008: 77-85) for a segmentation of the texts into three tiers, each tier representing a degree of proximity to Lokakṣema's own style, with the more distant tiers posited to reflect later revisions of the text.

2
It should be noted that the expected Mandarin reflex for 般若, as a transcription of a word in a Prakrit akin to Pali paññā or Gandhari praṃña ([pɾəɲːə]), would be bānrě. Indeed, bān 般 had two MC pronunciations, pan and pran, pointing respectively to Eastern Hàn *pa:n and *pra:n, both good fits for the first syllable of Pali paññā or Gandhari praṃña, while rě 若 corresponds to MC nyaX, pointing to Eastern Hàn *njaʔ, a good match for ña (ruò 若 points to *njak). The standard reading of 般若 as bōrě might suggest a later (hypercorrective) learned reading of 般 as transcribing the first syllable of Sanskrit prajñā: bō points to Eastern Hàn *p(r)aj. Nevertheless, as bōrě is the de facto standard pronunciation of the word, we use it in the title of T224.
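The argument above reads tone directly off Middle Chinese forms written in Baxter's (1992) notation: a final X marks the rising tone (nyaX, matching Eastern Hàn *njaʔ), a final H marks the departing tone, a stop coda -p, -t or -k marks the entering tone (nyak, behind *njak), and an unmarked syllable is in the level tone. A minimal sketch of that rule, assuming well-formed Baxter-style syllables; the function name is hypothetical and not part of the dataset:

```python
def baxter_tone(syllable: str) -> str:
    """Return the tone category of a Middle Chinese syllable in Baxter (1992) notation."""
    s = syllable.strip()
    if s.endswith("X"):
        return "rising"      # shǎngshēng, written with a final X
    if s.endswith("H"):
        return "departing"   # qùshēng, written with a final H
    if s.endswith(("p", "t", "k")):
        return "entering"    # rùshēng: the syllable is closed by a stop
    return "level"           # píngshēng: no tone mark

for mc in ["pan", "pran", "nyaX", "nyak"]:
    print(mc, baxter_tone(mc))
# prints: pan level / pran level / nyaX rising / nyak entering (one per line)
```

Because Baxter's notation writes tone marks as uppercase letters and codas as lowercase letters, a simple suffix check is enough here; it would not handle other romanizations.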

3
While there is a consensus that T184 belongs to Kāng Mèngxiáng's corpus, one should note that the extant text appears to have undergone revisions as late as the Eastern Jìn 東晉 dynasty (317-420). See Nattier (2008: 104-109) for a discussion of the external and internal evidence, itself based on Kawano Satoshi 河野訓 (1991).

4
An in-depth discussion of each of the three translators can be found in Nattier (2008).

For Lokakṣema and Kāng Mèngxiáng, no new texts were added, but for Lokakṣema more transcription words were added from T224 Dàoxíng bōrě jīng 道行般若經, on the basis of Karashima (2010).⁶ All the transcription material mentioned so far for the three translators can be found in Hill et al. (2020).
In addition, two manuscripts⁷ discovered in 1999 in the Kongo-ji 金剛寺 temple were ascribed to Ān Shìgāo by Zacchetti (2010: 264). Vetter (2012), in his study of Ān Shìgāo's lexicon, includes material from the Kongo-ji manuscripts as well as from T101,⁸ and we have retrieved the transcription material from there. The final Ān Shìgāo corpus, starting from Coblin (1983) and applying all the additions and removals, comprises the following texts:⁹

Five texts attributed to Lokakṣema in the studies mentioned above are missing from our dataset: T282 (Zhū púsà qiú fó běn yè jīng 諸菩薩求佛本業經), T283 (Púsà shí zhù xíng dào pǐn 菩薩十住行道品), T362 (Āmítuó sānyésān fó sàlóufó tán guòdù réndào jīng 阿彌陀三耶三佛薩樓佛檀過度人道經), T624 (Dùnzhēntuóluó suǒ wèn rúlái sānmèi jīng 伅真陀羅所問如來三昧經), and T807 (Nèi cáng bǎi bǎo jīng 內藏百寶經), since, to the best of our knowledge, no collection of their transcriptions of Indic terms exists. We aim to address this gap in a future publication.

8
T101's status is still a matter of controversy, and Hill et al. (2020) chose not to include it. Following Harrison (2002), we have chosen to include it; it contributes 12 new entries to Ān Shìgāo's corpus, and our dataset is structured so that these entries can easily be filtered out if T101 is eventually deemed not to be by Ān Shìgāo.

9
Vetter (2012) includes T397(13) Shí fāng púsà pǐn 十方菩薩品 in Ān Shìgāo's corpus; we follow Nattier (2008: 55-59), who convincingly argues that the text cannot be by Ān Shìgāo, and exclude it.

Baley et al. Journal of Open Humanities Data DOI: 10.5334/johd.110

Altogether, this forms the Chinese basis of our dataset, along with the identification of the corresponding Sanskrit and/or Pali equivalents. For these, we have relied on the identifications made in Vetter (2012) for the Kongo-ji texts and in Hill et al. (2020) for the rest.
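Note 8 states that the T101 material can easily be filtered out. The sketch below shows one way to do so, assuming (hypothetically) that each entry lists the Taishō texts in which it is attested; the real spreadsheet's column names may differ. Entries attested only in T101 are dropped, while entries also attested elsewhere are kept with the T101 location removed.

```python
from typing import Dict, List

# Hypothetical row layout: "term" plus the list of texts the term is attested in.
# The actual spreadsheet columns may be named and organized differently.
Entry = Dict[str, object]

def drop_text(entries: List[Entry], text_id: str = "T101") -> List[Entry]:
    """Remove one text's attestations and drop entries attested only in that text."""
    kept: List[Entry] = []
    for entry in entries:
        locations = [loc for loc in entry["locations"] if loc != text_id]
        if locations:  # still attested elsewhere: keep it, minus the removed text
            kept.append({**entry, "locations": locations})
    return kept

# Toy rows with placeholder terms and texts; these are not actual dataset entries.
toy_entries = [
    {"term": "term-a", "locations": ["T101"]},
    {"term": "term-b", "locations": ["T101", "T-other"]},
    {"term": "term-c", "locations": ["T-other"]},
]
for entry in drop_text(toy_entries):
    print(entry["term"], entry["locations"])
```

The function builds new dictionaries instead of editing rows in place, so the full dataset stays available for comparison with and without T101.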

SOURCE SUMMARY
In summary, the transcriptions listed in the dataset come directly from the following sources: for Ān Shìgāo, we collate Hill et al. (2020), which expands Coblin's work with more texts and more entries for the existing texts, and Baley (2023), which collects transliteration terms from Vetter (2012) for the Kongo-ji 金剛寺 manuscripts. For Lokakṣema and Kāng Mèngxiáng, we use Hill et al.

INDIC TRANSCRIPTIONS
As the Sanskrit/Pali information in Hill et al. (2020) was incomplete (for some entries only one of the two languages was provided), we have aimed to complete it where possible; in addition, we have used Baums and Glass (2002) to provide Gandhari equivalents of the Sanskrit/Pali forms wherever we were able to identify them.¹² This will help explore the question of the translations' source language(s)¹³ from a quantitative as well as a qualitative point of view. We think it would be desirable to extend this process to other languages of Central Asia as scholarship on them improves; in particular, we aim to explore Tocharian equivalents in a later project.
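The look-up procedure described in note 12 (try the whole Sanskrit form, and if it has no entry, combine the entries for its parts with "+") can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: the lexicon is assumed to be a plain dictionary from Sanskrit forms to Gandhari forms, only two-part splits are tried, and sandhi at the juncture is ignored. The toy lexicon contains only the two entries quoted in note 12.

```python
from typing import Dict, Optional

def gandhari_equivalent(word: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Look up a Sanskrit form; if absent, fall back to a two-part split joined by '+'."""
    if word in lexicon:
        return lexicon[word]
    # Try every split point, preferring the longest first part found in the lexicon.
    for i in range(len(word) - 1, 0, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return f"{lexicon[head]}+{lexicon[tail]}"
    return None  # no equivalent identified

# Toy lexicon with the two entries mentioned in note 12 (not the full dictionary).
toy_lexicon = {"indra": "iṃdra", "datta": "data"}
print(gandhari_equivalent("indradatta", toy_lexicon))  # iṃdra+data
```

Returning the "+"-joined form, rather than a silently merged word, keeps the two-look-up origin of such equivalents visible in the data, as note 12 does.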

CHINESE RECONSTRUCTIONS
We have added columns to provide reconstructions of various stages of Chinese phonology:
• Late Hàn: Schuessler (2007) and Schuessler (2009)
• Middle Chinese: we use the Middle Chinese transcription system (based on the rime books and rime tables) described in Baxter (1992)

(3) DATASET DESCRIPTION

OBJECT NAME
Chinese Transcription of Buddhist Terms in the Late Hàn Dynasty.

FORMAT NAMES AND VERSIONS
OpenDocument Spreadsheet

10 In the table, words occurring in multiple places in a translator's corpus are counted as a single entry (with multiple locations).
11 Of all the transcription words in the seven additional works included in Hill et al. (2020) compared to Coblin (1983), all were already present in other Ān Shìgāo texts, except for Āpítán 阿毘曇 in the title of T1557, which should not be treated as coming from Ān Shìgāo because, as an anonymous reviewer suggested, the title is a later addition and the word itself cannot be found within the text.
12 For Gandhari, we simply looked up Baums and Glass (2002); where a Sanskrit or Pali form had no entry of its own, we relied on other entries in the database that contained the same word parts. For instance, while Baums and Glass (2002) does not contain an entry for Sanskrit indradatta, it does contain entries for indra and datta, so we have marked the Gandhari equivalent as iṃdra+data to indicate that it is the result of two look-ups.
13 Cf. Boucher (1998), who discusses why it might not be possible to prove that the source language is Gandhari.
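The counting rule in note 10 (one entry per word per translator, however many places it occurs in) amounts to grouping occurrences. A minimal sketch, assuming occurrences are available as hypothetical (translator, word, location) triples; the placeholder values below are not dataset content:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Occurrence = Tuple[str, str, str]  # (translator, word, location)

def group_entries(occurrences: Iterable[Occurrence]) -> Dict[Tuple[str, str], List[str]]:
    """Collapse repeated occurrences into one entry per (translator, word), keeping all locations."""
    entries: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for translator, word, location in occurrences:
        locations = entries[(translator, word)]
        if location not in locations:
            locations.append(location)
    return dict(entries)

def count_entries(entries: Dict[Tuple[str, str], List[str]]) -> Dict[str, int]:
    """Count distinct entries per translator."""
    counts: Dict[str, int] = defaultdict(int)
    for translator, _word in entries:
        counts[translator] += 1
    return dict(counts)

# Placeholder occurrences for illustration only.
toy = [
    ("translator-1", "word-a", "text-1"),
    ("translator-1", "word-a", "text-2"),
    ("translator-1", "word-b", "text-1"),
    ("translator-2", "word-a", "text-3"),
]
entries = group_entries(toy)
print(count_entries(entries))  # {'translator-1': 2, 'translator-2': 1}
```

The same word used by two translators counts once for each of them, since the rule applies within a single translator's corpus.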

TRANSLATOR | COBLIN (1983) | HILL ET AL. (2020) | NEW DATASET

CONTRIBUTING
If you find errors in the dataset, please email the corresponding author.

(4) RE-USE POTENTIAL
By bringing together the work of many scholars, this dataset can serve as the basis for further analysis of the transcription practices of the Chinese Buddhist translators of the late Hàn dynasty. For instance, the attribution of translated works is a recurring question, and in the case of translators such as Ān Shìgāo and Lokakṣema, as we have seen, the debate about the authorship of individual texts can span many centuries. Our dataset provides a quick reference that can help argue, on internal grounds, whether the transcriptional vocabulary used in a text is typical of a certain translation team; it can therefore contribute to discussions of text attribution, including discussions of layering in the translation process.
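As a rough illustration of such an internal-evidence check, one could compare the set of transcription words in a disputed text with each translator's transcription vocabulary. The sketch below uses Jaccard overlap; this is our assumption for illustration, not a method the dataset prescribes. It ignores word frequency and shared conventional terms, and the disputed text should be left out of the reference corpora to avoid circularity. The vocabularies are placeholders.

```python
from typing import Dict, List, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Proportion of transcription words shared by two vocabularies (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_translators(text_vocab: Set[str], corpora: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank translator corpora by Jaccard overlap with one text's transcription vocabulary."""
    scores = [(name, jaccard(text_vocab, vocab)) for name, vocab in corpora.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Placeholder vocabularies; the disputed text is not part of either corpus.
corpora = {"translator-1": {"w1", "w2", "w3"}, "translator-2": {"w3", "w4"}}
print(rank_translators({"w1", "w2"}, corpora))
```

A high score would at most suggest affinity with a translation team's habits; it cannot by itself settle an attribution.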
Another potential use of our dataset is to help interpret Gandhari texts. Many of the texts included in the present dataset are translations of originals that are no longer extant. With newly excavated manuscripts in Gandhari and other languages, as well as the gradual cataloguing of existing ones, our dataset of Chinese-Gandhari equivalences may in the future help identify the source texts of such translations or, since the editorial history of such texts is generally more complicated, at least identify passages that resemble our known Chinese texts. This can in turn help interpret the Gandhari manuscripts and improve our understanding of the doctrinal developments underlying the diffusion of such texts.
Finally, as the dataset contains Chinese transcriptions of Buddhist concepts and their equivalents in several languages, this information can be used to try to characterize the source language of those transcriptions. For example, does a given Chinese transcription of a Buddhist term show greater similarity to its equivalent in Sanskrit, Pali, Gandhari, or yet another language, and what does this tell us about the likely phonetic characteristics of the translation's source language?
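One crude way to put a number on "greater similarity" is a length-normalized edit distance between a simplified reconstruction and each candidate form. The sketch below is an assumption-laden illustration, not an established method. Diacritics are stripped, ɑ and ə are both treated as a, glottal stops and separators are ignored, and every edit costs the same. A serious comparison would need phonetically weighted costs and would have to account for the way Chinese syllable structure reshapes the source word.

```python
import unicodedata
from typing import Dict, List, Tuple

def normalize(form: str) -> str:
    """Crudely map reconstructions and Indic romanizations onto one plain alphabet."""
    decomposed = unicodedata.normalize("NFD", form)
    # Drop combining diacritics, e.g. ṃ -> m, ā -> a, ś -> s.
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Simplifying assumptions: ɑ and ə count as 'a'; ʔ, '*', '-' and spaces are ignored.
    for old, new in {"ɑ": "a", "ə": "a", "ʔ": "", "*": "", "-": "", " ": ""}.items():
        plain = plain.replace(old, new)
    return plain.lower()

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def rank_candidates(transcription: str, candidates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank candidate source forms by length-normalized edit distance to a reconstruction."""
    t = normalize(transcription)
    scored = []
    for language, form in candidates.items():
        f = normalize(form)
        scored.append((language, levenshtein(t, f) / max(len(t), len(f), 1)))
    return sorted(scored, key=lambda pair: pair[1])

print(rank_candidates("*dəm-mɑ", {"Sanskrit": "dharma", "Pali": "dhamma", "Gandhari": "dhaṃma"}))
```

For the dharma example, this ranks Sanskrit last, while Pali and Gandhari tie. The tie shows the limits of surface string distance, which cannot separate the two Prakrit forms once diacritics are stripped.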
In the earlier example of dharma, transcribed by Lokakṣema as *dəm-mɑ 曇摩, the reconstruction of a final *-m for *dəm 曇 is certain. This seems to exclude a transcription from Sanskrit dharma; instead, the choice of two syllables, the first ending in *-m and the second starting with *m-, would indicate a geminate in the source language, as found for instance in Prakrits such as Pali dhamma and Gandhari dhaṃma.
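The reasoning above relies on a specific pattern: a syllable coda repeated as the onset of the next syllable. Candidates for this pattern can be flagged automatically across the dataset's reconstructions. A minimal sketch follows, assuming reconstructions are written like the ones in this paper, with syllables separated by hyphens or spaces. The function names are hypothetical, and each hit is only a candidate for gemination that still needs philological judgement.

```python
import re
from typing import List, Tuple

VOWELS = set("aeiouɑəɛɔɨʉ")

def split_syllables(reconstruction: str) -> List[str]:
    """Split a reconstruction such as '*dəm-mɑ' into its syllables."""
    return [s for s in re.split(r"[-\s]+", reconstruction.replace("*", "")) if s]

def geminate_junctures(reconstruction: str) -> List[Tuple[int, str]]:
    """Return (index, consonant) for junctures where a coda repeats as the next onset."""
    syllables = split_syllables(reconstruction)
    hits = []
    for i in range(len(syllables) - 1):
        coda, onset = syllables[i][-1], syllables[i + 1][0]
        if coda == onset and coda not in VOWELS:
            hits.append((i, coda))
    return hits

print(geminate_junctures("*dəm-mɑ"))     # [(0, 'm')]
print(geminate_junctures("*tṣan diei"))  # []
```

Comparing only the last and first characters keeps the sketch simple. Multi-letter codas or onsets would need a proper segmentation of the reconstruction system.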

Table 2
his transcriptions: some words match Pali models more closely, as in his use of *ʔɑ tśan dai 阿旃陀, which better matches Pali accanta than Skt. atyanta or Gdh. acada,¹⁴ while others show a Gandhari slant, such as *tṣan diei 羼提, which is closer to Gandhari kṣaṃti¹⁵ than to Pali khanti.¹⁶

Conversely, the parallel question can also be investigated: given the Chinese transcriptions, what can one learn about the dialect of Chinese spoken by the translation team? What phonological features of that dialect can be discovered from the choice of Chinese characters used to transcribe certain syllables of the original Buddhist term? Such questions are of great importance for reconstructing the historical development of Chinese phonology during the late Hàn period.

Sibilants in Ān Shìgāo's transcriptions closely match Gandhari.