Developing Multimodal Brain-Vision-Language Foundation Models
Dnr: NAISS 2025/22-1026
Type: NAISS Small Compute
Principal Investigator: Nona Rajabi
Affiliation: Kungliga Tekniska högskolan
Start Date: 2025-07-31
End Date: 2026-02-01
Primary Classification: 10210: Artificial Intelligence

Abstract

Foundation models have recently demonstrated remarkable performance across various domains, including computer vision and natural language processing. Motivated by their success, researchers in neuroscience have begun exploring the application of these models to brain-related data. In our previous project, we developed a multimodal framework to retrieve visual stimuli based on participants’ brain activity. Specifically, we investigated the role of image encoders and compared the performance of human-aligned versus unaligned image representations in constructing joint embedding spaces between brain signals and images. This work was published at ICML 2025, one of the premier conferences in artificial intelligence and machine learning. As a natural extension of this work, we now aim to explore alternative approaches for aligning representations across modalities. While our prior work employed contrastive learning to align brain and image embeddings, the proposed project will examine more recent and potentially more effective techniques in multimodal representation learning. Our goal is to assess their ability to integrate brain signals with various sensory modalities and to improve downstream decoding performance.
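The contrastive alignment mentioned above can be illustrated with a minimal sketch of a CLIP-style symmetric InfoNCE loss between paired brain and image embeddings. This is not the project's actual implementation; the function name, batch shapes, and temperature value are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def symmetric_infonce(brain_emb, image_emb, temperature=0.07):
    """CLIP-style contrastive loss: matched (brain, image) pairs sit on the
    diagonal of the similarity matrix and are pulled together, while all
    off-diagonal pairs within the batch act as negatives.

    brain_emb, image_emb: arrays of shape (batch, dim), row i of each is a pair.
    """
    z_b = l2_normalize(brain_emb)
    z_i = l2_normalize(image_emb)
    # Scaled cosine-similarity logits; temperature of 0.07 is an assumed default.
    logits = z_b @ z_i.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(lg):
        # Numerically stable log-softmax over each row, then pick the diagonal
        # (the correct match for each anchor).
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the brain-to-image and image-to-brain directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Techniques the project proposes to compare against this baseline would replace or augment this objective while keeping the same paired-batch interface, so a low loss on matched pairs (and a high loss on shuffled pairs) remains the basic sanity check.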