New Gold Standard Dataset resource for Swedish NER

Dnr:

NAISS 2023/5-424

Type:

NAISS Medium Compute

Principal Investigator:

Dana Dannélls

Affiliation:

Göteborgs universitet

Start Date:

2023-10-30

End Date:

2024-11-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Webpage:

Allocation

Alvis at C3SE: 2000 GPU-h/month
Mimer at C3SE: 500 GiB

Abstract

We aim to study how various data augmentation methods perform on Swedish, both individually and in combination. The experiments will be run on the Swe-NERC Version 1 dataset. If the results are successful, we will create an updated gold standard dataset resource for Språkbanken Text, Swe-NERC Version 2. The primary use of the dataset is to train machine learning models for named entity recognition (NER) tasks.