Multimodal and Generative AI with application in biomedicine

NAISS 2023/5-493


NAISS Medium Compute

Principal Investigator:

Paolo Soda


Umeå universitet

Start Date:


End Date:


Primary Classification:

20603: Medical Image Processing

Secondary Classification:

10201: Computer Sciences




This application is a continuation of the current project NAISS 2023/5-274, for which we need more resources. Project NAISS 2023/5-274 focuses on multimodal deep learning: because interpreting medical findings is multimodal by its very nature, AI must be able to interpret different modalities together to progress toward more informative clinical decision-making. In this respect we are investigating multimodal deep learning and its explanations, an area of great interest that studies how deep neural networks can learn shared representations across modalities by investigating when to fuse the different modalities, how to embed in the training any process able to learn more powerful data representations, and how to explain such models. Applications of these methodologies have been directed toward oncology, digital twins, and physiological time series mining. In recent months we have also started investigating the use of generative AI in different contexts, such as image denoising, image-to-image translation, and image generation to address data scarcity; for these reasons we are exceeding the allotted GPU hours and are asking for more resources. A key emerging area in our research is emphysema detection, which requires several directions of investigation to be properly addressed. First, we are working with images from the SCAPIS study, which contains images collected using low-dose CT (LDCT): in LDCT the overall image quality decreases, compromising disease assessment and increasing uncertainty in the diagnosis, while it reduces the risk of ionizing radiation for patients. We hypothesize that AI can help overcome these limitations by focusing on three main aspects: enhancing the image quality of LDCT images, developing an explainable decision support system for emphysema quantification, and integrating both aspects into an end-to-end trainable framework.
In detail, our three goals are as follows. First, to enhance image quality, a task usually referred to as image denoising, in order to reduce uncertainties in the images and make the quality of low-dose CT comparable with standard CT scans. Second, to develop an explainable AI-based decision support system that not only detects emphysema and localizes the affected regions of the lungs, but also provides explanations of how much each input feature contributes to the final predictions. This also calls for generative AI to cope with data scarcity: we plan to study if and how synthetic images can be used to train decision models together with, or instead of, real images. Third, although both AI-based denoising and computer-aided detection/classification have been extensively studied, they are often treated separately and sequentially according to their respective goals, neglecting the interplay between noise reduction and detection/classification. For this reason, our largest efforts will be directed toward the investigation of unified approaches for jointly optimizing the denoising and classification problems in an end-to-end fashion. To this end, a well-designed deep learning framework will be developed by leveraging techniques such as latent space manipulation and custom loss function optimization.
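To make the joint-optimization idea concrete, a minimal sketch of the kind of composite objective such a framework could optimize is shown below. The function name, the plain-list inputs, and the fixed weighting factor are illustrative assumptions for exposition, not the final design; in the actual framework both terms would be computed on network outputs and back-propagated through a shared latent representation.

```python
import math

def joint_loss(denoised, clean, class_probs, label, lam=0.5):
    """Illustrative composite objective for joint denoising + classification.

    denoised, clean : flat lists of pixel intensities (same length)
    class_probs     : predicted class probabilities (summing to 1)
    label           : index of the true class
    lam             : weight balancing the two terms (hypothetical choice)
    """
    # Denoising term: mean squared error between denoised and clean pixels
    mse = sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)
    # Classification term: cross-entropy on the true-class probability
    ce = -math.log(class_probs[label])
    # The weighted sum lets gradients from both tasks shape shared parameters
    return mse + lam * ce
```

In an end-to-end setting the relative weight `lam` would be tuned or scheduled during training so that neither the reconstruction nor the classification objective dominates the shared representation.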