NAISS
SUPR
NAISS Projects
Development of LLM-based digital twins of European citizens
Dnr: NAISS 2026/4-732

Type: NAISS Small

Principal Investigator: Guillaume Chapron

Affiliation: Sveriges lantbruksuniversitet

Start Date: 2026-04-23

End Date: 2026-08-01

Primary Classification: 50907: Statistics in Social Sciences

Abstract

Large carnivore (e.g. wolves, bears, lynx) policy in the European Union is shifting rapidly: legal protections are being relaxed and hunting quotas expanded. To support evidence-based policy, we need to understand how European citizens would react to newly proposed management actions, but commissioning a fresh pan-European survey is prohibitively expensive. This project develops and rigorously validates a methodology for using open-weights Large Language Models (LLMs) as *synthetic survey respondents*, sometimes called "silicon sampling" or digital twins of real citizens.

We have collected a cross-sectional survey of 10,000 fully anonymized EU respondents (https://doi.org/10.1038/s41559-025-02914-1) covering socio-demographics, economic situation, political affiliation, environmental attitudes, relationship with nature and wildlife, and attitudes toward large carnivore conservation. For each respondent we will construct a structured persona prompt and use open-weights LLMs to generate plausible responses to new policy questions that were never asked in the original survey.

We will implement and compare several methodological variants: direct persona prompting; a two-stage generate-then-classify pipeline that separates persona simulation from response-format mapping; and a multi-agent architecture that decomposes attitude formation into its cognitive processes. Each variant will be evaluated using cross-validated hold-out splits over survey questions, stratified by question domain, with individual-level, distributional, conditional and correlation-structure metrics. Sensitivity analyses will quantify dependence on prompt formulation, sampling temperature and model choice. We may additionally fine-tune smaller open-weights models on a training split of the survey to compare with purely prompt-based approaches.

A core methodological requirement is to compare results across the full available spectrum of open-weights LLMs, from small efficient models (e.g. gpt-oss 20B, Llama 3 8B, Mistral 7B, Qwen 7B) through mid-sized models (Llama 3.1 70B, Qwen 72B, Mixtral 8x22B) to the largest currently available open-weights models (DeepSeek family, ~685B parameters). This spans multiple orders of magnitude in model scale and allows us to ask how synthetic-respondent accuracy depends on model size, architecture and training corpus. Using open weights rather than proprietary APIs is essential for (i) reproducibility, since weights are frozen and auditable, and (ii) scale, as the validation pipeline requires millions of generations that would be cost-prohibitive on commercial APIs.

The outputs will include a publicly released validation benchmark for synthetic survey respondents in the environmental-policy domain, an open-source pipeline, and a peer-reviewed methodological publication. Substantive policy findings on large carnivore management will be reported, with predictions explicitly labelled as model-generated and accompanied by their empirically estimated uncertainty.
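The direct persona-prompting variant and one of the distributional evaluation metrics can be sketched as follows. This is a minimal illustration only: the persona field names, question wording and answer options are hypothetical placeholders (not the actual survey items), and the LLM generation call itself is omitted.

```python
from collections import Counter

def build_persona_prompt(respondent: dict, question: str, options: list[str]) -> str:
    """Render one anonymized survey respondent as a structured persona prompt.
    The respondent's key/value fields are hypothetical placeholders."""
    persona = "\n".join(f"- {k.replace('_', ' ')}: {v}" for k, v in respondent.items())
    opts = "\n".join(f"{i + 1}. {o}" for i, o in enumerate(options))
    return (
        "You are answering a survey as the following person:\n"
        f"{persona}\n\n"
        f"Question: {question}\n"
        f"Answer with the number of exactly one option:\n{opts}"
    )

def total_variation(observed: list[str], simulated: list[str]) -> float:
    """One distributional metric: total variation distance between the
    observed and simulated answer shares (0 = identical, 1 = disjoint)."""
    n_o, n_s = len(observed), len(simulated)
    c_o, c_s = Counter(observed), Counter(simulated)
    categories = set(c_o) | set(c_s)
    return 0.5 * sum(abs(c_o[c] / n_o - c_s[c] / n_s) for c in categories)

# Example with a hypothetical respondent and question.
prompt = build_persona_prompt(
    {"age": 46, "country": "Sweden", "political_affiliation": "centre"},
    "Do you support increasing wolf hunting quotas in your region?",
    ["Yes", "No", "Unsure"],
)
```

In the two-stage generate-then-classify variant, the model would instead answer in free text and a second pass would map that text onto the closed answer options; the same distributional metric applies to either variant's output.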