The development and validation of AI models in healthcare are significantly constrained by the scarcity of high-quality, diverse, and standardized datasets. Manual data curation by healthcare professionals is time-consuming, costly, and difficult to scale, limiting the broader adoption and robust evaluation of AI solutions in clinical settings.
This study will investigate the use of AI agents as an alternative mechanism for generating benchmarking datasets, with a focus on medical imaging and clinical documentation. We hypothesize that AI agents can produce datasets of comparable quality to those curated by trained clinicians, enabling scalable, efficient, and privacy-conscious data collection.
The datasets generated by AI agents will be evaluated using two complementary methods:
- Prompt compliance
- Quality-based assessments
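
As one illustration of what a prompt-compliance check could look like, the sketch below scores a generated clinical note against requirements taken from its generation prompt. The `PromptSpec` class, the `compliance_score` function, the section names, and the word limit are all hypothetical and are not taken from the study protocol. The sample record is made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical requirements extracted from a generation prompt."""
    required_fields: tuple[str, ...]  # sections the prompt asked for
    max_words: int                    # length limit stated in the prompt


def compliance_score(record: dict[str, str], spec: PromptSpec) -> float:
    """Return the fraction of checks the record passes (0.0 to 1.0)."""
    # One check per required section: it must be present and non-empty.
    checks = [bool(record.get(name, "").strip()) for name in spec.required_fields]
    # One further check: total word count stays within the prompt's limit.
    total_words = sum(len(text.split()) for text in record.values())
    checks.append(total_words <= spec.max_words)
    return sum(checks) / len(checks)


# Illustrative, made-up example: the "impression" section is left empty.
spec = PromptSpec(required_fields=("history", "findings", "impression"), max_words=50)
record = {
    "history": "Cough for two weeks.",
    "findings": "No focal consolidation.",
    "impression": "",
}
score = compliance_score(record, spec)
```

A rule-based score of this kind would cover only the compliance side. The quality-based assessment would need a separate measure, for example ratings by clinicians, which this sketch does not attempt to model.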