AI Agents for clinical healthcare dataset creation


Dnr:

NAISS 2026/4-932

Type:

NAISS Small

Principal Investigator:

Fredrik Carlsson

Affiliation:

Karolinska Institutet

Start Date:

2026-06-01

End Date:

2027-06-01

Primary Classification:

30299: Other Clinical Medicine


Allocation

Arrhenius Disk at NAISS: 2000 GiB
Arrhenius GPU at NAISS: 150 GPU-h/month

Abstract

The development and validation of AI models in healthcare are significantly constrained by the scarcity of high-quality, diverse, and standardised datasets. Manual data curation by healthcare professionals is time-consuming, costly, and difficult to scale, which limits the broader adoption and robust evaluation of AI solutions in clinical settings. This PhD project will investigate AI agents as an alternative means of generating benchmarking datasets, with a focus on medical imaging and clinical documentation. We hypothesise that AI agents can produce datasets of quality comparable to those curated by trained clinicians, enabling scalable, efficient, and privacy-conscious data collection.

The project will develop and validate a reusable computational workflow for agentic dataset generation in medicine. The workflow will combine local inference with open-weight models, tool-using AI agents, structured task templates, audit logging, and reproducibility controls. The technical objective is to determine how AI agents can be deployed efficiently to generate, annotate, and refine benchmarking datasets while preserving transparency, traceability, and quality control. The project will benchmark alternative models, prompts, inference settings, agent configurations, and validation procedures across representative tasks in medical imaging and clinical documentation. Outputs will be assessed both against predefined prompt requirements and through quality review procedures designed to estimate clinical plausibility, completeness, consistency, and usefulness for downstream AI evaluation.

Main supervisor: Daniel Lundqvist, Karolinska Institutet