Exploring efficient adaption of large language models to unseen tasks and low resource languages

NAISS 2023/22-455


NAISS Small Compute

Principal Investigator:

Oskar Holmström


Linköpings universitet

Start Date:


End Date:


Primary Classification:

10208: Language Technology (Computational Linguistics)




Recent advances in natural language processing (NLP) have led to the emergence of large language models with unprecedented abilities. However, the underlying mechanisms that enable these abilities remain poorly understood. Additionally, the limited availability of data and resources for low-resource languages hinders the development of similarly capable models for these languages, resulting in an accessibility gap.

The aim of this research project is to explore efficient methods for adapting large language models to unseen tasks and low-resource languages. Drawing on previous research on instruction tuning for Swedish, we will investigate and develop methods, such as reinforcement learning from human feedback (RLHF), for cross-lingual adaptation and adaptation to unseen tasks. Furthermore, we will study the underlying mechanisms of transfer learning in language models to gain insights into how these models adapt to different tasks and languages.

The potential impacts of this research are twofold. Firstly, efficient methods for cross-lingual adaptation and adaptation to unseen tasks could narrow the performance gap between high-resource and low-resource languages. This could lead to improved machine translation systems, information retrieval systems, and other NLP applications for low-resource languages, making advanced language models more accessible to speakers of these languages. Secondly, the findings could inform more efficient use of resources for English-language models. By uncovering insights into the mechanisms of transfer and adaptation, we may be able to develop more efficient methods for adapting large language models to different tasks and domains, reducing the data and compute needed for fine-tuning and making these models more practical and cost-effective for real-world applications.