SUPR
Countering bias in AI methods in the social sciences
Dnr:

NAISS 2024/22-58

Type:

NAISS Small Compute

Principal Investigator:

Nicolas Pietro Marie Audinet de Pieuchon

Affiliation:

Chalmers tekniska högskola

Start Date:

2024-01-12

End Date:

2025-02-01

Primary Classification:

10208: Language Technology (Computational Linguistics)


Abstract

One of the principal challenges when estimating the effects of interventions from observational data (e.g. historical data) is controlling for confounding factors that may bias the results. The causal inference literature is rich with methods to remove confounding bias in settings where the confounders have been identified and measured. However, such settings are rare in the social sciences: information about confounders tends to be captured in less precise data formats such as text, if it is captured at all. The goal of the project, and of my PhD, is to develop and apply causal inference methods that can make use of text data to obtain more precise and robust estimates in the humanities and social sciences. We hope that better causal estimation methods will enable researchers to answer more questions about the society we live in and, in turn, lead to more informed policy and institutional decisions. These methods will make use of recent techniques from the field of NLP to automatically capture information from text.

The starting point for my research is a paper by Richard Johansson and Adel Daoud titled "Conceptualizing Treatment Leakage in Text-based Causal Inference". In this paper, they lay out the problem of treatment leakage: if the text used to control for confounders is predictive of both treatment assignment and the outcome, then using the text could introduce bias and harm the quality of the causal estimate. To counter this, they show (in a simplified experiment) that removing the influence of treatment assignment from the text removes the bias. The first goal of my research is to replicate and extend these experiments, in order to continue studying how using text data might introduce bias into the causal estimate.

Another research direction is to apply these methods to investigate the effect of International Monetary Fund (IMF) programs on countries' economies. The IMF produces yearly country reports for several countries around the world. The question is: can we use these reports alongside existing tabular data to improve causal estimates of the efficacy of IMF programs? More concretely, the project involves gathering and preprocessing all the relevant PDFs from the IMF archives, testing the predictive power of the country reports to see whether they contain relevant information, and then designing and applying methods to the data to see whether they improve the estimates.
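
The treatment leakage problem described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under assumptions of my own, not the experiment from the cited paper: the "text" is reduced to a single numeric feature `w` that measures an unobserved confounder `u` and, when `leak > 0`, also shifts with the treatment `t`. The function name `adjusted_effect`, the linear data-generating process, and all parameter values are hypothetical choices made for illustration.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def adjusted_effect(leak, n=20000, tau=1.0, beta=2.0, seed=0):
    """Estimate the treatment effect by regressing y on t and w (with intercept).

    leak: how strongly the treatment shifts the text feature w (0 = no leakage).
    tau:  true treatment effect (assumed value).
    beta: effect of the confounder u on the outcome (assumed value).
    """
    rng = random.Random(seed)
    t, w, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                            # unobserved confounder
        ti = 1.0 if u + rng.gauss(0, 1) > 0 else 0.0   # treatment depends on u
        wi = u + leak * ti + rng.gauss(0, 0.1)         # text feature: u plus leakage
        yi = tau * ti + beta * u + rng.gauss(0, 1)     # outcome
        t.append(ti)
        w.append(wi)
        y.append(yi)

    # Closed-form OLS coefficient on t when regressing y on (1, t, w).
    s_tt, s_ww, s_tw = cov(t, t), cov(w, w), cov(t, w)
    s_ty, s_wy = cov(t, y), cov(w, y)
    return (s_ww * s_ty - s_tw * s_wy) / (s_tt * s_ww - s_tw ** 2)


clean = adjusted_effect(leak=0.0)   # text reflects only the confounder
leaky = adjusted_effect(leak=1.0)   # text also reflects the treatment
print(f"true effect: 1.0, no leakage: {clean:.2f}, with leakage: {leaky:.2f}")
```

In this linear toy setting, adjusting for the leak-free feature recovers an estimate close to the true effect, while adjusting for the leaky feature pulls the coefficient on `t` towards roughly `tau - beta * leak`, which is far from the truth. Setting `leak=0.0` plays the role of a text representation from which the influence of treatment assignment has been removed.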