SUPR
NLP for studying data annotation
Dnr:

NAISS 2024/22-264

Type:

NAISS Small Compute

Principal Investigator:

Denitsa Saynova

Affiliation:

Chalmers tekniska högskola

Start Date:

2024-02-28

End Date:

2025-03-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Abstract

The project will explore how NLP approaches can be used to study and assess the quality of data annotations. We want to investigate potential bias in human annotations stemming from factors such as annotators' domain knowledge, the annotation task setup, and the annotation codes. We plan two main approaches: using external text embedding models to discover discrepancies, and modelling different strata of the data to assess the homogeneity and transferability of human-assigned labels.
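
As an illustration of the embedding-based approach, the sketch below flags items whose human-assigned label disagrees with the majority label of their nearest neighbours in embedding space. It is a minimal example under stated assumptions, not the project's actual method: the function name `flag_label_discrepancies`, the neighbour count `k`, the cosine-similarity criterion, and the hand-written 2-d vectors (standing in for the output of an external text embedding model) are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flag_label_discrepancies(embeddings, labels, k=3):
    """Return indices of items whose label differs from the majority
    label among their k most similar items (hypothetical criterion)."""
    flagged = []
    for i, (vec, label) in enumerate(zip(embeddings, labels)):
        # Rank all other items by similarity to item i and keep the top k.
        neighbours = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(vec, embeddings[j]),
            reverse=True,
        )[:k]
        majority, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        if majority != label:
            flagged.append(i)
    return flagged


# Toy stand-ins for text embeddings (assumption: real vectors would come
# from an external embedding model and have many more dimensions).
embeddings = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.95, 0.15],
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0], [0.15, 0.95],
]
# Item 3 sits in the first cluster but carries the second cluster's label.
labels = ["A", "A", "A", "B", "B", "B", "B", "B"]

print(flag_label_discrepancies(embeddings, labels, k=3))  # [3]
```

Flagged items are only candidates for review: a disagreement with neighbouring labels may reflect a genuine annotation discrepancy, or simply a limitation of the embedding model.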