Mechanistic Interpretability for Retrieval-augmented and Multimodal LLMs

Dnr: NAISS 2025/6-58
Type: NAISS Medium Storage
Principal Investigator: Richard Johansson
Affiliation: Chalmers University of Technology and University of Gothenburg
Start Date: 2025-02-25
End Date: 2025-09-01
Primary Classification: 10208: Natural Language Processing
Secondary Classification: 10210: Artificial Intelligence

Abstract

This project will carry out mechanistic analyses of large language models (LLMs). It extends our previous work, which was carried out in a series of small NAISS projects. The overall goal is to develop new methods for discovering the computational mechanisms inside pre-trained LLMs. The main goal of the current project is to extend our previous results to new types of retrieval-augmented LLMs and to multimodal LLMs (mainly speech/text models at this point).

The main subprojects are the following:

1) Extending our previous work on the analysis of retrieval-augmented LLMs. This involves investigating a variety of new LLMs in a retrieval-augmented generation (RAG) setup and running a set of new experiments to gain an improved understanding of RAG mechanisms.
2) Applying our methods to multimodal text/speech-based LLMs. Our previous work considered only purely text-based LLMs.

The experiments will mainly use public LLM benchmarks and open LLMs. We are not planning to work with any text containing sensitive information.

The project will be carried out by members of the NLP@DSAI group at Chalmers, led by Professor Richard Johansson (see the group’s page https://dsai-nlp.github.io/), together with collaborators from KTH and Linköping University. The practical work will mainly be done by PhD students in the NLP@DSAI group.

The resulting research will be presented in a number of planned articles to be submitted to the EMNLP conference, one of the two top conferences in the NLP field.