SUPR
LLM with tool calling for multimodal medical data
Dnr:

NAISS 2025/22-941

Type:

NAISS Small Compute

Principal Investigator:

Erik Aerts

Affiliation:

Chalmers tekniska högskola

Start Date:

2025-07-14

End Date:

2026-08-01

Primary Classification:

10201: Computer Sciences

Webpage:


Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language understanding and generation across diverse domains. The large quantity of data needed to train such models yields a broad competence capable of solving tasks in a multitude of domains. However, when LLMs are adapted to real-life subdomains that demand more detailed knowledge to reach a required level of precision, they often struggle: the necessary knowledge is either underrepresented in the training data or lies beyond the model's static knowledge boundaries. To overcome this hurdle, previous research has explored equipping LLMs with external tools to handle domain-specific data. One such domain is medicine, where real-life decisions often depend on interpreting specific, interdependent measurements such as lab results, physiological signals, or medication dosages. Integrating a tool-based approach that processes this data effectively and reliably can benefit both LLM capability and medical AI. Our research aims to investigate and advance the integration of tool calling in LLMs for domain-specific tasks, with a particular focus on the medical domain. We seek to develop models that can effectively identify when and how to invoke external tools, process their outputs, and integrate the resulting information into coherent and accurate predictions. To evaluate the impact of tool integration, we will conduct extensive experiments using publicly available medical datasets and established benchmarks. This includes assessing how tool-based LLMs perform compared to baseline models, as well as exploring strategies for optimizing the interaction between the LLM and the external tools.
Our goals are: (1) to deepen the understanding of tool-use dynamics within LLM architectures, and (2) to derive practical techniques that improve model performance, interpretability, and robustness in high-stakes, data-intensive domains such as healthcare.
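The tool-use loop described above — the model emits a structured call, an external tool executes it, and the serialized result is fed back for the next model turn — can be sketched minimally in Python. Everything here is a hypothetical illustration: the tool name `lookup_reference_range`, the JSON call schema, and the numeric ranges are assumptions for demonstration only, not clinical reference values and not any specific LLM provider's API.

```python
import json

def lookup_reference_range(analyte: str) -> dict:
    """Toy lookup of lab reference ranges (illustrative numbers only)."""
    ranges = {
        "creatinine_umol_l": (60.0, 110.0),
        "potassium_mmol_l": (3.5, 5.0),
    }
    low, high = ranges[analyte]
    return {"analyte": analyte, "low": low, "high": high}

# Registry mapping tool names (as the model would emit them) to functions.
TOOLS = {"lookup_reference_range": lookup_reference_range}

def run_tool_call(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model, dispatch it to the
    registered function, and serialize the result for the next turn."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# Simulated model turn requesting a tool invocation:
model_turn = (
    '{"name": "lookup_reference_range",'
    ' "arguments": {"analyte": "potassium_mmol_l"}}'
)
reply = run_tool_call(model_turn)
```

In a full system the `reply` string would be appended to the conversation so the model can integrate the tool output into its final answer; the research question is precisely how well the model decides when to emit such calls and how faithfully it uses the returned values.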