2SPARE-LLM

Dnr:

sens2025680

Type:

NAISS SENS

Principal Investigator:

Pontus Nauclér

Affiliation:

Karolinska Institutet

Start Date:

2025-10-30

End Date:

2026-11-01

Primary Classification:

30116: Epidemiology

Webpage:

Allocation

Cygnus /proj at UPPMAX: 1000 GiB
Cygnus /proj/nobackup at UPPMAX: 1000 GiB
Bianca at UPPMAX: 2 x 1000 core-h/month

Abstract

The project applies large language models (LLMs) to a large pseudonymized corpus of electronic health record data from Karolinska University Hospital admissions between 2010 and 2024, which we have named 2SPARE (2020 started Stockholm/Sweden Proactive Adverse Events REsearch). We will perform multiple studies with the overarching goal of improving the management of adverse events, infection control, and antibiotic stewardship through data-driven precision medicine. Using advanced data science and machine learning on medical data routinely collected during hospital stays, we will build individualized risk profiles to help prevent adverse events and healthcare-associated infections (HAIs). The aim is to develop automated algorithms for adverse event and HAI surveillance, and predictive models to improve antibiotic use.

Study A: We will investigate the performance of an LLM, fine-tuned and domain-adapted using local clinical data, in providing advice on antimicrobial therapy. The desired outcomes are the ability to flag patients in the emergency department who should not receive antibiotics, and a comparison of the LLM-suggested antibiotics against verified bacterial infections.

Study B: We aim to identify risk factors for HAIs and adverse events using natural language processing methods to automatically annotate previously unstructured data in the medical records. We will explore the in-context learning abilities of LLMs and prompt engineering for automatic annotation of risk factors, comparing (1) generic vs. domain-adapted LLMs based on GPT-SW3 and other open-source models, and (2) zero-shot vs. few-shot learning scenarios.

Study C: We aim to identify patients at low risk of infection who receive unnecessary antibiotics and to develop prediction algorithms for antibiotic stewardship. We will investigate how structured and unstructured data can be combined to predict unnecessary antibiotic use. A pretrained clinical language model, a BERT model trained on Swedish clinical text, is used to represent the clinical text, and we will use multimodal fine-tuning to train multimodal prediction models based on deep neural networks.

Study D: We will investigate how an LLM can be used for early prediction of adverse events and HAIs. The LLM will make daily predictions of prelabeled outcomes using both unstructured and structured data.
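
The zero-shot vs. few-shot comparison described in Study B can be illustrated with a minimal sketch of how the two prompt variants might be built. This is not the project's actual pipeline: the instruction wording, the `Example` type, the `build_prompt` helper, and the example notes are all hypothetical and fully synthetic, and the step that sends the prompt to a model is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """One annotated demonstration (synthetic, for illustration only)."""
    note: str
    label: str


# Hypothetical instruction text; the real annotation guideline is not given.
INSTRUCTION = "Name the risk factor mentioned in the clinical text, or answer 'none'."

# Synthetic demonstrations, not taken from any medical record.
EXAMPLES = (
    Example("Patient has an indwelling urinary catheter.", "urinary catheter"),
    Example("No invasive devices in place.", "none"),
)


def build_prompt(note: str, examples: Sequence[Example] = ()) -> str:
    """Build an annotation prompt.

    With no examples this is a zero-shot prompt; with one or more
    examples the demonstrations are prepended (few-shot).
    """
    parts = [INSTRUCTION]
    for ex in examples:
        parts.append(f"Note: {ex.note}\nRisk factor: {ex.label}")
    # The query note is left unanswered so the model completes the label.
    parts.append(f"Note: {note}\nRisk factor:")
    return "\n\n".join(parts)


query = "Central venous catheter inserted on day 2."
zero_shot_prompt = build_prompt(query)
few_shot_prompt = build_prompt(query, EXAMPLES)
```

Both prompts end with the same unanswered query, so any difference in the model's annotation can be attributed to the presence of the demonstrations rather than to the wording of the task.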