SUPR
Federated learning for multilingual Named Entity Recognition
Dnr: NAISS 2024/22-1455
Type: NAISS Small Compute
Principal Investigator: Kätriin Kukk
Affiliation: Linköpings universitet
Start Date: 2024-11-05
End Date: 2025-02-01
Primary Classification: 10201: Computer Sciences

Abstract

In this project we will implement a federated learning setup to train a language model for named entity recognition (NER) on several languages, such as Swedish, Norwegian, Danish, and Icelandic. The dataset for each language will be located on a separate machine, and only the model weights will be shared between machines during training. This setup lets us investigate the benefits of federated learning for a shared task where the datasets cannot be shared between organisations. The goal of the project is to provide insights into, and guidelines for, implementing federated learning for scalable training of ML-based language models when data cannot be shared between agents. In addition, the project will strengthen the federated learning competence of four PhD students in Swedish academia who do research in machine learning, since it is part of the WASP Scalable Data Science and Distributed Machine Learning course that the students are currently taking. The expected outcome is a set of language models fine-tuned for named entity recognition and trained in a federated learning setting. Model performance will be analysed and presented in a project report, and potentially published at a Nordic natural language processing conference.
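
The weight-sharing step described in the abstract can be illustrated with a minimal sketch. The abstract does not name an aggregation rule, so this assumes plain federated averaging (FedAvg), in which a coordinator averages the weights received from each language site in proportion to that site's local dataset size. The function name, parameter names, and all numbers are hypothetical and only serve the illustration; no training data leaves a site in this scheme, only the weight values.

```python
from typing import Dict, List

# Parameter name -> flat list of values (a stand-in for real model tensors).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client weights, weighting each client by its local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: two language sites report locally updated weights
# for a toy one-layer model. The values and dataset sizes are made up.
swedish = {"classifier.weight": [1.0, 2.0], "classifier.bias": [0.0]}
danish = {"classifier.weight": [3.0, 6.0], "classifier.bias": [4.0]}
global_weights = federated_average([swedish, danish], client_sizes=[300, 100])
print(global_weights)
```

In a full training run, the averaged weights would be sent back to every site as the starting point for the next round of local fine-tuning, and the cycle would repeat for a chosen number of rounds.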