In recent years, large deep neural networks (DNNs) have achieved impressive improvements on many tasks across a variety of domains. For natural language and image inputs in particular, remarkable advances have been achieved by models such as Generative Pre-trained Transformer 3 (GPT-3) and GPT-4. Increasingly, these architectures can take multiple modalities as input to solve a variety of tasks. Large DNNs are often pre-trained on massive datasets to learn general representations of the input data, which in turn improves performance on the target tasks. However, many of the insights gained from this research on large DNNs are based on text and natural images as input. Additionally, pre-training is usually done with millions of samples collected from the internet.
To translate these findings to domains such as medicine, it must be investigated whether they also hold in different contexts. The biomedical domain differs substantially: for instance, the datasets available for pre-training are several orders of magnitude smaller, and the data modalities differ inherently in their structure and in how they relate to each other. Therefore, the aim of this project is to investigate transfer learning for multimodal DNNs and to gain insights into how these approaches can improve downstream biomedical prediction tasks. We focus primarily on molecular and medical imaging modalities as inputs, since these have become increasingly available and reflect the state of the biological system at high granularity.
To do so, we will investigate several research questions about transfer learning for multimodal DNNs, with survival prediction as the target task. Relatively large multimodal datasets for this target are publicly available, containing paired modalities for patients with different types of cancer. These datasets lend themselves to ablation studies that can yield insights into the different pre-training approaches under investigation. In particular, the availability of high-resolution whole slide images (WSIs) of tumor biopsies provides detailed information about the state of the patient's disease. How to fuse WSIs with molecular data such as gene expression profiles remains an open challenge. Insights gained from this project will enable more informed decisions when designing training schemes that improve predictions by fusing these heterogeneous modalities. As a final step, we will validate these insights on biomedical datasets of non-cancer patient cohorts, for instance by predicting endpoints for patients with cardiovascular disease from multimodal data.