Fine-tuning large language models (LLMs) poses a growing computational and storage challenge, as these models now contain billions of parameters. Low-rank adaptation (LoRA) is a recent framework for parameter-efficient fine-tuning that addresses this bottleneck by constraining task-specific weight updates to be low rank. Concretely, for a given weight matrix W ∈ ℝ^{d×k}, the update is parameterized as ΔW = BA, where B ∈ ℝ^{d×r} and A ∈ ℝ^{r×k} with r ≪ min(d, k); only A and B are trained, while the pretrained weights remain frozen. This reparameterization substantially reduces memory and compute requirements while maintaining performance competitive with full fine-tuning.
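To make the parameterization concrete, the following is a minimal sketch of a LoRA-adapted linear layer in PyTorch. The rank r, the scaling factor alpha, and the zero/Gaussian initialization of B and A follow the common LoRA convention; they are illustrative choices, not prescribed by this project description.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update ΔW = BA."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only A and B are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        # Conventional LoRA init: A gets a small random init, B starts at
        # zero so that ΔW = BA is zero at the start of fine-tuning.
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update: x Wᵀ + scale · x (BA)ᵀ.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

With r ≪ min(d_in, d_out), the trainable parameter count drops from d_out · d_in to r · (d_out + d_in), which is the source of LoRA's memory and compute savings.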
Despite strong empirical results, the theoretical foundations of LoRA remain limited. In particular, a key hyperparameter is the rank r, which directly controls the number of trainable parameters and is often selected using ad hoc rules of thumb. The aim of this project is to derive a theoretically grounded, data-driven rule for choosing the rank, together with a principled initialization strategy for the adaptation matrices A and B. To this end, we will investigate the properties of the neural tangent kernel (NTK) induced by the gradients of the pretrained model. We expect this procedure to reduce the performance gap between full fine-tuning and LoRA while further improving efficiency.
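To illustrate one possible instantiation of this direction, the sketch below estimates the empirical NTK of a pretrained model on a small sample of task data and reads a candidate rank off its eigenvalue spectrum. The per-example gradient construction, the scalar surrogate output, and the 90% spectral-energy threshold are all illustrative assumptions, not the rule this project aims to derive.

```python
import torch

def empirical_ntk(model: torch.nn.Module, xs: torch.Tensor) -> torch.Tensor:
    """Gram matrix K[i, j] = <∇_θ f(x_i), ∇_θ f(x_j)> over the sample xs."""
    grads = []
    for x in xs:
        model.zero_grad()
        # Scalar surrogate output per example (simplification for the sketch).
        out = model(x.unsqueeze(0)).sum()
        out.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g)
    G = torch.stack(grads)   # shape (n, num_params)
    return G @ G.T           # (n, n) empirical NTK

def rank_from_spectrum(K: torch.Tensor, energy: float = 0.9) -> int:
    """Smallest r whose top-r eigenvalues capture `energy` of the trace."""
    eigvals = torch.linalg.eigvalsh(K).flip(0)          # descending order
    cumulative = torch.cumsum(eigvals, dim=0) / eigvals.sum()
    return int(torch.searchsorted(cumulative, torch.tensor(energy))) + 1
```

The underlying intuition is that a rapidly decaying NTK spectrum suggests the fine-tuning dynamics are effectively low-dimensional, so a small rank r should suffice; making this link rigorous, and deriving the corresponding initialization for A and B, is the subject of the project.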
Main supervisor: Alexandre Proutiere, KTH.