
Word Walkers persist in their quest to advance domestic large language model development

Nov 20, 2024

A monkey with three transformations embarks on a seemingly endless journey towards developing domestic large models. This is the work of the Word Walkers team from the School of Artificial Intelligence and Automation at Huazhong University of Science and Technology (HUST). And this monkey is Monkey, a large multi-modal model.

At the finals of the China International College Students' Innovation Competition 2024 (CICSIC 2024), held in mid-October, the Word Walkers team stood out by winning a gold award in the industry topic track with an intelligent document processing system built on a large multi-modal model.

A large multi-modal model is a type of artificial intelligence architecture capable of processing and integrating multiple types of data, such as text, images and audio. Drawing on extensive knowledge and strong conversational abilities, it can perceive and interpret the world in a way that more closely resembles how humans do.


Since ChatGPT emerged in 2022, a wave of large multi-modal models has swept the whole country. Internet technology companies have poured into the field, releasing a variety of large models that demonstrate impressive abilities across many scenarios.


In fact, Word Walkers had already been conducting research for a decade before large multi-modal models became the industry's focal point. From 2011 to 2020, the team focused on traditional computer vision tasks. In 2021, it moved into the large model field, and its sustained hard work has since produced several technological breakthroughs.


Fourteen years of sustained dedication laid the foundation for the team's rapid progress in domestic large model development.


In January 2024, under the guidance of Prof. Bai Xiang and Prof. Liu Yuliang, the team officially released the large multi-modal model Monkey. Owing to its improvements in the accuracy and richness of image descriptions, the Monkey paper was selected as a highlight paper at CVPR 2024 and ranked among the Top 20 most influential papers.


In the following months, Monkey evolved into three distinct products: TextMonkey, PdfMonkey and MathMonkey. Marking another breakthrough, these products focus on the intelligent processing of official information, academic papers and K12 education, respectively. Because a single model handles multiple scenarios and tasks, the products have the potential to lead the market.


In terms of quantity, China now has more institutes dedicated to developing foundation large models than all other countries in the world combined. In terms of quality, however, the text processing ability of current large models is still held back by four main weaknesses: limited learning, lack of clarity, short-sightedness, and superficial understanding.

The Word Walkers team has proposed a solution that tackles these four weaknesses with four key technologies, improving the overall text processing ability of its system.


The team prepared thoroughly at every step, from entering CICSIC 2024 and taking on Baidu's industry challenge to creating three sub-models of Monkey and reaching the competition finals. Throughout, the Word Walkers team worked in a continuous cycle of identifying problems, researching, consulting experts, trying solutions and refining its approach. At the same time, the School of Artificial Intelligence and Automation and Qiming College provided consistent support and guidance, addressing every request, from the wording of project presentations and logistical support to laboratory resources and industry sponsorship.


The Word Walkers team reflects the School of Artificial Intelligence and Automation's efforts to cultivate top innovative and entrepreneurial talent. In recent years, the school has remained true to its original aspiration of fostering innovative thinking, entrepreneurial spirit and practical ability. Going forward, the school will continue to improve its management of innovation and entrepreneurship programs, make full use of advanced resources and provide better platforms for students.


Just as Sun Wukong achieved enlightenment after enduring eighty-one trials, the Word Walkers team and their Monkey continue on their journey towards success. As competition in the large model field intensifies, Word Walkers aims to release more high-quality open-source projects and break through technological barriers, paving a broader and longer path to success.




Written by: Zhou Ziyue

Edited by: Yang Kunjie, Chang Wen, Peng Yumeng
