SUPR
Developing efficient and capable language models
Dnr: NAISS 2024/22-632
Type: NAISS Small Compute
Principal Investigator: Oskar Holmström
Affiliation: Linköpings universitet
Start Date: 2024-05-01
End Date: 2025-05-01
Primary Classification: 10208: Language Technology (Computational Linguistics)
Webpage:

Allocation

Abstract

Recent advances in natural language processing (NLP) have led to the emergence of large language models with unprecedented abilities. However, the underlying mechanisms that enable these abilities remain poorly understood. Additionally, the limited availability of data and resources for low-resource languages hinders the development of similarly capable models for those languages, resulting in an accessibility gap. The aim of this research project is to explore the development of language models that are both efficient and capable, especially for low-resource languages. The main focus is on studying under which scenarios language adapters are useful for transferring skills across languages. We will also develop new methods related to adapters, modular architectures, and sparse architectures. In a closely related project with the National Library, we will study how to automatically curate high-quality corpora for pre-training small and efficient language models.

The potential impacts of this research are twofold. Firstly, efficient methods for cross-lingual adaptation and adaptation to unseen tasks could bridge the performance gap between high-resource and low-resource languages. This could lead to improved machine translation, information retrieval, and other NLP applications for low-resource languages, making advanced language models more accessible to speakers of these languages. Secondly, the findings of this research could also have implications for more efficient resource utilization in English language models. By uncovering insights into the mechanisms of transfer and adaptation, we may be able to develop more efficient methods for adapting large language models to different tasks and domains, reducing the need for extensive data and resources for fine-tuning and making these models more practical and cost-effective for real-world applications.