Guan Haisu: Using AI to popularize the “niche” field of oracle bone script deciphering

Jun 3, 2024

How can AI assist in deciphering texts from over 3,000 years ago? How does the integration of oracle bone script with modern technology make this “niche” field more accessible? Guan Haisu, an undergraduate student from the 2021 cohort at HUST’s School of Future Technology, has proposed a new method for AI-assisted oracle bone script deciphering. In May, his independent first-author paper was accepted by the main conference of ACL 2024, a top-tier international conference in the field of natural language processing. What kind of sparks will fly when oracle bone script meets artificial intelligence? Let’s explore his story.

A Perfect Match: Beginning an Adventure Together

Since the discovery of oracle bone script in 1899, approximately 4,500 unique characters have been identified, with around 1,600 of them deciphered. This leaves a significant number of characters still awaiting interpretation. A scholar once remarked that deciphering a single character is akin to discovering a new planet, highlighting both the importance and the difficulty of studying ancient scripts.

In his sophomore year, Guan Haisu joined an undergraduate innovation team led by Professor Bai Xiang from the School of Software Engineering, embarking on his initial foray into scientific research. Within the team, he met Professor Liu Yuliang, who specializes in artificial intelligence, particularly in the fields of computer vision and natural language processing. Due to his keen interest in ancient script research, Guan immediately decided to join Liu’s research group, beginning his journey of using AI to assist in deciphering oracle bone script.

Under the guidance of his professors, Guan spent two weeks collecting and studying research papers from both domestic and international sources in this field. He discovered that most existing AI research focuses on recognizing and detecting already deciphered oracle bone characters, while using AI to assist in deciphering unknown characters remains largely unexplored.

Guan proposed the idea of using generative models to simulate the evolution of oracle bone script into modern Chinese characters to aid in deciphering. After receiving support from Professors Bai and Liu, he collaborated with four other students in the research group, each exploring different methods to assist in the deciphering of unknown oracle bone characters.

For artificial intelligence models, datasets are crucial. However, existing open-source oracle bone script datasets often suffer from missing categories or sparse samples. After deliberation, the research group decided to build its own comprehensive oracle bone script dataset.

During the summer break of 2023, the group traveled to Anyang, Henan, known as the “hometown of oracle bone script,” to conduct research at the Yinxu Ruins and the National Museum of Chinese Writing. They explored the origins and development of oracle bone script. During their visit, they also engaged with oracle bone script experts from Anyang Normal University and received recommendations for relevant books and data websites from the institution.

Upon returning to campus, the research group members collaborated to build the dataset. They meticulously organized and input information from electronic books such as “Compilation of Oracle Bone Script,” “Compilation of Western Zhou Bronze Inscriptions,” “Table of Characters from the Spring and Autumn Period,” and “Table of Characters from the Warring States Period,” as well as data from websites like jgw.aynu.edu.cn and guoxuedashi.net.

“I primarily organized three books, totaling nearly 2,500 pages. I needed to extract the ancient characters from the books individually to use them as training samples for the model. This process required me to simultaneously verify the accuracy of the text input and refine the database algorithm,” said Guan Haisu. During the input process, he sometimes speculated on the meanings of the oracle bone characters, and to date, he has recognized nearly a hundred of them.

By the end of the summer break, the research group successfully constructed the open-source oracle bone script datasets HUST-OBC and EVOBC. These datasets include over 1,600 categories of oracle bone characters and more than 13,000 categories of characters from various stages of their evolution, providing essential samples for subsequent research.

Revitalizing Niche and Esoteric Disciplines

After the school year began in September, Guan Haisu started training the OBSD model, which uses diffusion models to assist in deciphering oracle bone script, focusing on finding the “optimal solution” for this task.

The process involves inputting images of oracle bone script characters into the first model, which gradually transforms them through various stages of character evolution into images resembling modern Chinese characters. Another diffusion model then corrects these generated images to better align with the logical structure and writing norms of modern Chinese characters. Over several months, Guan Haisu trained and optimized the model to predict the modern forms of oracle bone script characters based on ancient character evolution patterns, providing clues for deciphering the script.
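
The two-stage flow described above can be pictured with a small Python sketch. This only illustrates the control flow (one stage that gradually transforms the image, followed by a second stage that corrects it) and is not the OBSD model itself. The names `decipher_pipeline`, `toy_evolve`, and `toy_refine` are hypothetical, and the two toy functions are simple pixel operations standing in for trained diffusion models.

```python
from typing import Callable, List

# A grayscale glyph image as rows of pixel intensities in [0, 1] (assumed format).
Image = List[List[float]]


def decipher_pipeline(
    glyph: Image,
    evolve: Callable[[Image], Image],
    refine: Callable[[Image], Image],
) -> Image:
    """Chain the two stages: evolve the ancient glyph, then correct the draft."""
    draft = evolve(glyph)
    return refine(draft)


def toy_evolve(glyph: Image, steps: int = 4) -> Image:
    """Stand-in for the first model: change the image a little at each step.

    Each step moves every pixel halfway toward its nearest extreme (0 or 1),
    mimicking a gradual transformation rather than a single jump.
    """
    img = [row[:] for row in glyph]
    for _ in range(steps):
        img = [
            [p + 0.5 * ((1.0 if p >= 0.5 else 0.0) - p) for p in row]
            for row in img
        ]
    return img


def toy_refine(img: Image) -> Image:
    """Stand-in for the second model: snap the draft to clean strokes."""
    return [[1.0 if p >= 0.5 else 0.0 for p in row] for row in img]


# Hypothetical 2x2 "glyph" used only to exercise the pipeline.
result = decipher_pipeline([[0.2, 0.7], [0.9, 0.4]], toy_evolve, toy_refine)
print(result)
```

In a real system, each stand-in would be replaced by a trained model, while the chaining in `decipher_pipeline` would stay the same.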

During the research, Guan Haisu collaborated with other members of the research group, collectively securing four national patents related to methods and systems for assisting in the deciphering of oracle bone script using conditional diffusion models and classification models. They also co-designed an AI-based multi-path auxiliary deciphering system called Open-Oracle, which offers services to enthusiasts and researchers of ancient scripts.

Currently, the website features five main functions, including radical decomposition, character evolution, and natural language processing. In the future, the website will also focus on popularizing oracle bone script, aiming to produce digital animations, emoji packs, chat applications, and metaverse projects related to oracle bone script.

“Every member of the research group is driven by a strong interest and proactively completes each task, often exceeding my expectations,” said Liu Yuliang.

Achieving Success by Pushing Boundaries

During the research process, two events left a deep impression on Guan Haisu.

When he was preparing to submit his paper, his advisor suggested taking the time to refine it before submission. After countless revisions, the paper was eventually submitted to the top-tier international conference in the field of natural language processing, ACL. “The ACL conference has much higher standards than the conference we initially planned to submit to, and it only accepts papers once a year. Making comprehensive adjustments to the paper in a short time was very challenging,” said Guan Haisu. Encouraged by his advisor, he decided to take on the challenge.

During the winter break, Guan Haisu devoted himself entirely to model testing and paper optimization. “On the first day of the Chinese New Year, the test results came out, proving that our model achieved relatively advanced deciphering accuracy compared to other models. It felt like all the hard work paid off.”

In April this year, Guan Haisu received an email from the ACL conference. The three reviewers raised nearly 60 questions, including doubts about the model itself, the background of oracle bone script, and the evolution of modern Chinese characters. With only four days to respond, he meticulously organized relevant materials, polished his English responses to the reviewers, and repeatedly discussed and revised them with his advisor. Ultimately, he passed the reviewers’ scrutiny.

“His thinking is rigorous and objective, and he can integrate knowledge and action, which is a very rare quality for a ‘research newcomer,’” commented Liu Yuliang.

On May 16, Guan Haisu’s independent first-author paper was successfully accepted by the main conference of ACL 2024. “It has been exactly one year since I first started researching oracle bone script. I am honored to have achieved something meaningful in such a short time.”

Contributing Youthful Energy to Cultural Heritage

Over the past three years, Guan Haisu has received numerous awards, including the First Prize of the Chinese Mathematics Competition for College Students, the First Prize of the Contemporary Undergraduate Mathematical Contest in Modeling, the National Scholarship, the University’s Triple-A Scholarship, and two provincial-level innovation and entrepreneurship awards. These honors are a testament to his solid professional foundation.

In high school, Guan Haisu excelled in mathematics, winning the first prize in the Chongqing division of the National High School Mathematics League and scoring 148 points (out of 150) in the mathematics section of the national college entrance examination. After entering HUST, he passed the assessment and joined the experimental class of the School of Future Technology for integrated bachelor’s, master’s, and doctoral studies.

Here, he continued to consolidate his foundational knowledge, broaden his professional learning, and connect theory with his research, steadily strengthening his ability to think independently. “Scientific research practice helps me gradually understand the true applications of mathematical theories that I initially didn’t comprehend, and these theories, in turn, provide innovative guidance for my research work,” said Guan Haisu. He believes that taking the time to think deeply is the key to his moments of inspiration.

“We are as mayflies in our passage on the earth, as insignificant as grains of corn floating in an ocean.” This is Guan Haisu’s favorite line of poetry. In his view, revitalizing ancient texts that have existed for over three thousand years may seem like a modest endeavor given the limited power of an individual, but he still wants to continue his research and exploration, striving to provide better solutions for AI-assisted oracle bone script deciphering and to contribute the wisdom of HUSTers to the protection and inheritance of oracle bone script.

Written and Edited by: Chang Wen, Peng Yumeng
