Lexical Semantic Change through Large Language Models Generation

Dnr (registration number):

NAISS 2024/22-705

Type:

NAISS Small Compute

Principal Investigator:

Pierluigi Cassotti

Affiliation:

Göteborgs universitet

Start Date:

2024-05-16

End Date:

2025-06-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Allocation

Alvis at C3SE: 1000 GPU-h/month

Abstract

This research plan aims to investigate semantic change, consolidating current research in this area and proposing new, innovative research directions.

Consolidating current research

A key aspect of this research is the development of annotation strategies for detecting Lexical Semantic Change (LSC) over multiple time points. Several multilingual datasets for the LSC detection (LSCD) task have been designed for two or three time periods, but longitudinal studies remain a gap in the literature. Longitudinal studies play a crucial role in LSCD: they can reveal patterns of semantic change, and they lend themselves to large-scale retrieval-based approaches and hypothesis-driven studies. The annotation strategies will combine heuristic methods with more advanced algorithms, for example active learning techniques that minimise the annotation effort. In particular, we will employ strategies specifically designed for time series or graphs. In addition, we will develop models for classifying types of semantic change. These models will categorise changes into distinct types, such as generalisation, specialisation and metaphorical shift, providing a more nuanced understanding of how word meanings change. The classification systems will build on state-of-the-art LSCD methods, exploring both traditional word-level encodings and definition-based models, and will use synchronic resources such as WordNet or metaphor datasets to model hierarchical and metaphorical sense relations.

Innovation

We will focus on the synthetic generation of linguistic phenomena, exploiting the extensive capabilities of Large Language Models (LLMs). This will enable us to induce LSC in a controlled setting, facilitating the generation of time-sensitive, context-specific word meanings. In addition, we will pioneer a new line of research on computational approaches to language contact.
This will involve using computational tools and methods to explore how contact between languages influences semantic change. This novel strand of the research has the potential to provide insights into the dynamic interaction between languages and into how this interaction drives changes in word meaning.

As a leading researcher in the field of LSC, I have advanced it through several key contributions. I developed XL-LEXEME, a pioneering model for cross-lingual lexical semantic change detection that has excelled on multilingual benchmarks. I also presented the DIACR-Ita task at EVALITA 2020, the first shared task on semantic change detection for Italian. My methodological work includes the use of Gaussian Mixtures for unsupervised semantic change detection and the creation of the Diachronic Engine, which extends NoSketchEngine with diachronic information. I have also contributed linguistic resources, including a diachronic Italian corpus and tools for visualising gender-specific job titles. Beyond LSC, my research in computational linguistics covers relation extraction, the integration of linguistic resources into graph databases, automatic sequence labelling, and the adaptation of LLMs to specific languages, with a particular focus on Italian.