Multimodal deep learning and its application in biomedicine

NAISS 2023/6-190


NAISS Medium Storage

Principal Investigator:

Paolo Soda


UmeƄ universitet

Start Date:


End Date:


Primary Classification:

20603: Medical Image Processing

Secondary Classification:

10201: Computer Sciences




In the fourth industrial revolution we are witnessing a fast and widespread adoption of artificial intelligence (AI) in our lives, healthcare included. Advances in deep learning (DL) are expected to make significant contributions in this area, supporting diagnosis, prognosis and treatment decisions. Most DL models consider only unimodal data, neglecting the information available in the other modalities of a patient's digital phenotype, e.g. clinical data, CT and MRI images, etc. Since interpreting medical findings is multimodal by its very nature, AI needs to interpret different modalities together to progress towards more informative clinical decision making. In this respect we plan to investigate multimodal deep learning (MDL), an area of great interest that is still in its infancy. It studies how deep neural networks (DNNs) can learn shared representations across different modalities, investigating when to fuse the modalities and how to embed in the training any process able to learn more powerful data representations. We also plan to search for an optimal MDL fusion architecture that is robust to missing modalities or missing data, studying multimodal regularization to improve stability, speed up the algorithms and reduce overfitting. We would like to consider approaches that mitigate training from scratch even when only datasets of reduced size are available, as often happens in healthcare, such as generative approaches, GANs included. Furthermore, we are aware that a key impediment to the use of DL-based systems in practice is their black-box nature, which does not allow the decisions taken to be explained directly. Explainable AI (XAI) is now attempting to improve trust and transparency, but its investigation in multimodal deep learning in general, and in healthcare in particular, is at an early stage.
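To make the fusion question concrete, the sketch below illustrates one of the design points mentioned above: intermediate (feature-level) fusion that stays robust to a missing modality. All names, dimensions and the zero-embedding imputation strategy are illustrative assumptions, not the project's actual architecture; the toy "encoders" are random linear layers standing in for trained DNN branches.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Toy per-modality encoder: one linear layer + ReLU.
    In the real project this would be a trained DNN branch."""
    return np.maximum(x @ w, 0.0)

# Hypothetical feature sizes: imaging features (e.g. from CT) and clinical data.
d_img, d_clin, d_emb = 16, 8, 4
w_img = rng.normal(size=(d_img, d_emb))
w_clin = rng.normal(size=(d_clin, d_emb))
w_head = rng.normal(size=(2 * d_emb,))

def intermediate_fusion(x_img, x_clin):
    """Fuse learned embeddings rather than raw inputs or final scores.
    A missing modality is imputed with a zero embedding so the fused
    representation keeps a fixed size (one simple robustness strategy)."""
    z_img = encode(x_img, w_img) if x_img is not None else np.zeros(d_emb)
    z_clin = encode(x_clin, w_clin) if x_clin is not None else np.zeros(d_emb)
    z = np.concatenate([z_img, z_clin])
    return 1.0 / (1.0 + np.exp(-(z @ w_head)))  # sigmoid risk score

score_full = intermediate_fusion(rng.normal(size=d_img), rng.normal(size=d_clin))
score_missing = intermediate_fusion(rng.normal(size=d_img), None)  # clinical data absent
```

The same interface also covers the other fusion points under study: early fusion would concatenate `x_img` and `x_clin` before any encoder, while late fusion would combine two separately trained risk scores.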
This project aims to study XAI methods for MDL using attention mechanisms, generating textual and fine-grained visual explanations, creating and testing human-understandable concepts, providing counterfactual examples and enabling interaction between the algorithm and the final user. This can have a disruptive impact, since model opacity makes it difficult for doctors, patients and regulators to trust such systems. Beyond developing a framework for MDL that yields a richer and more trustworthy data representation, with much improved performance compared to using a single modality, this project aims to deploy these methodologies in different areas of healthcare, to prove how general they are and to provide domain experts with useful insights into the data. In this respect, specific fields of experimentation concern: i) cancer research, where we are looking for a quantitative signature from multi-omics data able to predict the prognosis and to select the right personalized therapy in non-small cell lung cancer; ii) COVID-19 research, where we look for predictive risk factors computed from images and clinical data collected at the time the patients presented to the emergency department, allowing us to distinguish between mild and severe outcomes; iii) atherosclerosis research, where, using data available within the VIPVIZA study, we look for a multimodal signature predicting the evolution of the plaque in the carotid arteries.
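As a minimal illustration of how attention can double as an explanation in MDL, the sketch below weights per-modality embeddings with a softmax over learned scores: the resulting weights sum to one and can be read as a per-patient indication of which modality drove the prediction. All names and dimensions are hypothetical; real attention mechanisms in the project would be learned end-to-end inside the network.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax."""
    e = np.exp(v - v.max())
    return e / e.sum()

def attended_fusion(embeddings, w_attn):
    """Score each modality embedding, normalise the scores with softmax,
    and return both the fused vector and the attention weights; the
    weights serve as a modality-importance explanation."""
    scores = np.array([z @ w_attn for z in embeddings])
    weights = softmax(scores)
    fused = sum(w * z for w, z in zip(weights, embeddings))
    return fused, weights

rng = np.random.default_rng(1)
d = 4  # toy embedding size
z_image, z_clinical, z_omics = (rng.normal(size=d) for _ in range(3))
w_attn = rng.normal(size=d)
fused, weights = attended_fusion([z_image, z_clinical, z_omics], w_attn)
# weights sum to 1; the largest entry flags the most influential modality
```

This is only the simplest member of the family of XAI techniques named above; concept-based and counterfactual explanations would require additional machinery on top of such a fused representation.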