Professor Jeffrey THARSEN: Chinese Phonological Databases and Intertextual Networks for Comparative Philology and AI (2026/1/28)

The Jao Tsung-I Academy of Sinology held the eleventh lecture in its "Encounters in the Old World, East and West: From a Transdisciplinary Perspective" series on 28 January, with support from the Eurasia Foundation (from Asia). The speaker, Professor Jeffrey Tharsen, Associate Director of Technology in the Division of the Arts & Humanities at The University of Chicago, delivered a lecture titled “Chinese Phonological Databases and Intertextual Networks for Comparative Philology and AI”. The lecture illustrated how innovative technologies can be integrated with the humanities and showcased applications of Old Chinese phonology and intertextual comparison in AI model development. More than 100 faculty members and students attended in person, most of them students enrolled in HKBU's Bachelor of Arts and Sciences programme in Digital Futures and the Humanities.

Professor Tharsen’s lecture focused on four main areas: computational methods for premodern Chinese phonology, methods for determining intertextuality in Chinese texts, intertextuality in the Zhou bronze inscriptions, and the future prospects of AI in academic research. By constructing The Digital Etymological Dictionary of Old Chinese (EDOC), he has provided phonological annotations for traditional classics such as the Book of Songs. He has also used the intertextual comparison tool TextPAIR to systematically analyze the Twenty-Four Histories and sixty-three works spanning the classics, philosophy, and literary collections, establishing an intertextual network of Chinese classics and opening up a new paradigm in Sinological research.

Particularly remarkable was Professor Tharsen’s use of Zhou dynasty bronze inscriptions as an initial corpus to successfully decipher the rhyming structures in inscriptions such as that of the Da Yu Ding, revealing the phonological and rhetorical features of early Chinese. Additionally, through the analysis of more than 2,700 long Zhou inscriptions, Professor Tharsen identified about 295,000 instances of intertextual expression within the corpus, as well as multiple cases of linguistic affinity between these inscriptions and transmitted texts such as the Book of Songs, the Book of Documents, and the Zuo Zhuan, thereby providing empirical data for Zhou dynasty historical studies.

At the end of his lecture, Professor Tharsen pointed out that current general-purpose large language models, such as GPT, cannot yet consistently detect intertextuality and textual reuse in classical corpora. He advocated further exploration of artificial intelligence training frameworks in order to develop customized analytic models. In the future, he plans to extend his research to Warring States manuscripts and bamboo slips, with the aim of building a comprehensive, large-scale database of intertextuality covering both excavated documents and transmitted literature. As digital humanities applications in Sinology mature, this research is expected to drive technological innovation in classical philology, advancing from micro-level phonological annotation to macro-level network analysis.

 

Lecture Review:

HKBUtube: https://hkbutube.lib.hkbu.edu.hk/inner.php?id=BTS-101279

Bilibili: https://www.bilibili.com/video/BV1m1cvzaEM4/?share_source=copy_web&vd_source=5078ed71343fd0bb61bd7b3508c26aa3
