Countering bias in AI methods in the social sciences
Dnr: NAISS 2025/22-180
Type: NAISS Small Compute
Principal Investigator: Nicolas Pietro Marie Audinet de Pieuchon
Affiliation: Chalmers tekniska högskola
Start Date: 2025-02-06
End Date: 2026-03-01
Primary Classification: 10208: Natural Language Processing

Abstract

One of the principal challenges when estimating the effects of interventions from observational data (e.g. historical data) is controlling for confounding factors that may bias the results. The causal inference literature is rich with methods to remove confounding bias in settings where the confounders have been identified and measured, but such settings are rare in the social sciences. Information about confounders tends to be captured in less precise data formats such as text, if it is captured at all. The goal of this project, and of my PhD, is to develop and apply causal inference methods that can make use of text data for more precise and robust estimates in the humanities and social sciences. We hope that better causal estimation methods will enable researchers to answer more questions about the society we live in, and in turn lead to more informed policy and institutional decisions. These methods will build on recent techniques from natural language processing (NLP) for automatically capturing information from text. This project would contribute to my current main research focus: testing the effectiveness of the Design-based Supervised Learning (DSL) framework for debiasing statistics computed from Large Language Model (LLM) annotations. Depending on the results of these experiments, I also want to test how effective DSL is at producing unbiased estimates of the Average Treatment Effect when text documents are used as proxies for an unobserved confounder. Additionally, together with master's students, I will continue my previous project "Can Large Language Models (or Humans) Disentangle Text?", in which we are developing methods that use LLMs to remove information from text in an interpretable way.
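To make the debiasing step concrete: DSL-style estimators combine LLM annotations for all documents with gold-standard labels for a small, randomly chosen subsample, and apply a design-based correction so that the resulting estimate is unbiased even when the LLM annotations are systematically wrong. The sketch below illustrates this correction for the simplest case of estimating the mean of a label over a corpus; it is an illustration of the general idea rather than the project's actual implementation, and the function name, the uniform sampling probability, and the toy data are assumptions made for the example.

import numpy as np

def dsl_mean_estimate(llm_labels, expert_labels, expert_mask, sampling_prob):
    # Design-based bias correction in the spirit of DSL, for the simple case
    # of estimating the mean of a label over a corpus:
    #   pseudo_outcome_i = llm_i + (R_i / pi) * (expert_i - llm_i)
    # where R_i marks the randomly chosen expert-coded documents and pi is
    # the known probability of being expert-coded. The correction term has
    # expectation E[expert - llm], so the mean of the pseudo-outcomes is an
    # unbiased estimate of the true mean regardless of LLM annotation quality.
    llm_labels = np.asarray(llm_labels, dtype=float)
    expert_labels = np.asarray(expert_labels, dtype=float)
    correction = np.zeros_like(llm_labels)
    correction[expert_mask] = (
        expert_labels[expert_mask] - llm_labels[expert_mask]
    ) / sampling_prob
    return float(np.mean(llm_labels + correction))

# Toy check: the LLM systematically over-predicts the positive class, yet a
# 10% random expert-coded subsample is enough to recover the true rate (~0.3).
rng = np.random.default_rng(0)
true_labels = rng.binomial(1, 0.3, size=2000)
llm_labels = np.clip(true_labels + rng.binomial(1, 0.2, size=2000), 0, 1)
expert_mask = rng.random(2000) < 0.1
print(dsl_mean_estimate(llm_labels, true_labels, expert_mask, sampling_prob=0.1))

In this toy setup the naive mean of the LLM annotations is biased upwards by roughly 0.14 (it estimates about 0.44 instead of 0.3), while the corrected estimate centres on the true rate.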