Speaker: Shi Jianjun
Time: October 29, 13:30
Venue: Room 5103, Teaching Building No. 5, Songjiang Campus
Do word embeddings drive the success of large language models?
Are word embeddings a crucial means of capturing semantics?
Do word embeddings present new research questions for linguistics?
Abstract
Large language models represent a major scientific achievement in the study of language, and word embeddings are key to their success: they are mathematical representations of human language extracted from massive corpora. Word embeddings constitute a third mode of language representation, alongside speech and text. Applying them to linguistic research not only enables the description of morphological phenomena in language but also provides powerful tools for capturing semantics. Word embeddings lay a material foundation for introducing mathematical methods into the study and description of linguistic rules, and they fundamentally enable the computational treatment of linguistic phenomena. At the same time, their inherent lack of interpretability raises new research questions for linguistics: what linguistic information is encoded in high-dimensional word embeddings, what paradigms suit their application in linguistic research, what scale and content of corpora their training requires, how they can serve cross-linguistic studies, and how language-specific embedding models for particular periods can be trained for diachronic research.
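As an illustrative sketch of how embeddings capture semantic closeness, the standard measure is cosine similarity between word vectors. The vectors below are toy, hand-picked values (not from any trained model) chosen so that semantically related words point in similar directions:

```python
import math

# Toy 4-dimensional embeddings (hypothetical values, not from a trained model).
embeddings = {
    "king":  [0.80, 0.65, 0.10, 0.05],
    "queen": [0.75, 0.70, 0.15, 0.10],
    "apple": [0.05, 0.10, 0.90, 0.80],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words end up closer in the vector space:
# similarity(king, queen) is high, similarity(king, apple) is low.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

In a real setting the vectors would be learned from a large corpus (e.g. by a word2vec-style model) and have hundreds of dimensions, but the similarity computation is the same.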
Join us on October 29 at 13:30 in Room 5103, Teaching Building No. 5!