The development and validation of AI models in healthcare are significantly
constrained by the scarcity of high-quality, diverse, and standardised
datasets. Manual data curation by healthcare professionals is time-consuming,
costly, and difficult to scale, limiting the broader adoption and robust
evaluation of AI solutions in clinical settings.
This PhD project will investigate the use of AI agents as an alternative
mechanism for generating benchmarking datasets, with a focus on medical
imaging and clinical documentation. We hypothesise that AI agents can produce
datasets of comparable quality to those curated by trained clinicians, enabling
scalable, efficient, and privacy-conscious data collection.
The project will develop and validate a reusable computational workflow for
agentic dataset generation in medicine. The workflow will combine local
open-weight model inference, tool-using AI agents, structured task templates,
audit logging, and reproducibility controls. The technical objective is to
determine how AI agents can be deployed efficiently to generate, annotate, and
refine benchmarking datasets while preserving transparency, traceability, and
quality control.
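To make the traceability and reproducibility goals more concrete, the following is a minimal sketch of one possible way to log a single agent generation step. A structured task template is filled in, the pinned run configuration (model identifier, temperature, seed) is hashed together with the template and inputs into a `config_id`, and the resulting record is appended to a JSON-lines audit log. All names here (`TaskTemplate`, `RunConfig`, `run_task`, `stub_generate`, the model identifier) are hypothetical, and `stub_generate` is only a placeholder for local open-weight model inference, not any specific library's API. The demo writes to an in-memory buffer; a real workflow would presumably append to a file.

```python
import hashlib
import io
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, TextIO, Tuple


@dataclass(frozen=True)
class TaskTemplate:
    """Hypothetical structured template for one dataset-generation task."""
    task_id: str
    instruction: str                  # prompt text containing {placeholders}
    required_fields: Tuple[str, ...]  # fields the generated record must contain


@dataclass(frozen=True)
class RunConfig:
    """Inference settings pinned so that a run can be repeated."""
    model: str          # placeholder identifier for a local open-weight model
    temperature: float
    seed: int


def config_id(template: TaskTemplate, config: RunConfig, inputs: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys): identical template, settings, and inputs
    # always yield the same SHA-256 identifier.
    payload = json.dumps(
        {"template": asdict(template), "config": asdict(config), "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(
    template: TaskTemplate,
    config: RunConfig,
    inputs: Dict[str, str],
    generate: Callable[[str, RunConfig], Dict[str, str]],
    log: TextIO,
) -> dict:
    prompt = template.instruction.format(**inputs)
    output = generate(prompt, config)
    record = {
        "config_id": config_id(template, config, inputs),
        "task_id": template.task_id,
        "config": asdict(config),
        "prompt": prompt,
        "output": output,
    }
    # One JSON object per line, appended to the audit log.
    log.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def stub_generate(prompt: str, config: RunConfig) -> Dict[str, str]:
    # Placeholder standing in for local model inference; returns a fixed record.
    return {"findings": "No acute abnormality.", "impression": "Normal study."}


if __name__ == "__main__":
    template = TaskTemplate(
        task_id="cxr-report-001",
        instruction="Write a chest X-ray report for a {age}-year-old patient.",
        required_fields=("findings", "impression"),
    )
    config = RunConfig(model="local-open-weight-model", temperature=0.0, seed=42)
    audit_log = io.StringIO()  # in practice, a file opened in append mode
    record = run_task(template, config, {"age": "54"}, stub_generate, audit_log)
    print(record["config_id"][:12], record["output"]["impression"])
```

Hashing the pinned configuration means two log records with the same `config_id` were produced from the same template, settings, and inputs, which supports traceability. It does not by itself guarantee identical outputs; that would also depend on the inference backend behaving deterministically for a given seed.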
The project will benchmark alternative models, prompts, inference settings,
agent configurations, and validation procedures across representative tasks in
medical imaging and clinical documentation. Outputs will be assessed both
against predefined prompt requirements and through quality review procedures
designed to estimate their clinical plausibility, completeness, consistency,
and usefulness for downstream AI evaluation.
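As one illustration of how alternative configurations might be compared, the sketch below enumerates a small grid of models, prompt variants, and temperatures, and scores each configuration by the fraction of outputs that satisfy a predefined prompt requirement, here simply that every required field is present and non-empty. This covers only an automated requirement check; clinical plausibility and usefulness would still need the review procedures described above. The model names, prompt variants, `REQUIRED_FIELDS`, and `fake_generate` are all hypothetical placeholders.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

REQUIRED_FIELDS: Tuple[str, ...] = ("findings", "impression")  # hypothetical requirement


def meets_requirements(output: Dict[str, str], required: Tuple[str, ...] = REQUIRED_FIELDS) -> bool:
    # An output passes only if every required field is present and non-empty.
    return all(output.get(field, "").strip() for field in required)


def benchmark(
    models: List[str],
    prompts: List[str],
    temperatures: List[float],
    cases: List[str],
    generate: Callable[[str, str, float, str], Dict[str, str]],
) -> List[Tuple[Tuple[str, str, float], float]]:
    if not cases:
        raise ValueError("at least one case is needed to compute a pass rate")
    results = []
    # Full factorial grid over the configuration dimensions.
    for model, prompt, temp in product(models, prompts, temperatures):
        outputs = [generate(model, prompt, temp, case) for case in cases]
        pass_rate = sum(meets_requirements(o) for o in outputs) / len(outputs)
        results.append(((model, prompt, temp), pass_rate))
    # Highest pass rate first; sorting is stable, so ties keep grid order.
    return sorted(results, key=lambda r: r[1], reverse=True)


def fake_generate(model: str, prompt: str, temperature: float, case: str) -> Dict[str, str]:
    # Placeholder for real inference: the "terse" prompt omits the impression.
    out = {"findings": f"Findings for {case}."}
    if prompt != "terse":
        out["impression"] = "Normal study."
    return out


if __name__ == "__main__":
    ranking = benchmark(
        models=["model-a", "model-b"],
        prompts=["structured", "terse"],
        temperatures=[0.0],
        cases=["case-1", "case-2", "case-3"],
        generate=fake_generate,
    )
    for config, rate in ranking:
        print(config, f"{rate:.2f}")
```

Ranking by a single pass rate is deliberately simple; a fuller benchmark would likely report several automated metrics alongside reviewer scores for each configuration.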
Main supervisor: Daniel Lundqvist, Karolinska Institutet