NAISS
SUPR
From Points to Distributions: Joint Probabilistic Modeling of Vision-Language Embeddings via Riemannian Flow Matching
Dnr:

NAISS 2026/4-691

Type:

NAISS Small

Principal Investigator:

Mayank Nautiyal

Affiliation:

Uppsala universitet

Start Date:

2026-04-13

End Date:

2026-08-01

Primary Classification:

10210: Artificial Intelligence


Abstract

Modern vision-language models map images and text into a shared embedding space, but standard approaches rely on deterministic point embeddings and cannot represent uncertainty or the one-to-many nature of cross-modal correspondence. This project develops GeoFlowVLM, a geometry-aware probabilistic framework that models the joint distribution of image and text embeddings directly on the product hypersphere using Riemannian Flow Matching. The method enables unified joint, conditional, and marginal sampling without retraining the underlying vision-language model. By explicitly modeling cross-modal uncertainty on the manifold, the project aims to improve uncertainty quantification, retrieval robustness, and probabilistic reasoning in vision-language systems.
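The abstract does not spell out GeoFlowVLM's training objective or sampler. As an illustration only, the NumPy sketch below shows the generic Riemannian Flow Matching recipe on a product of two hyperspheres. It builds geodesic conditional paths x_t = exp_{x0}(t log_{x0}(x1)) with target velocity u_t = log_{x_t}(x1) / (1 - t) on each factor, regresses a velocity field onto those targets, and integrates it with exponential-map Euler steps to draw joint (image, text) samples. It assumes unit-normalized embeddings and noise samples already placed on each sphere. `velocity_field` stands for a hypothetical learned model, and every function name here is illustrative rather than taken from GeoFlowVLM.

```python
import numpy as np


def normalize(x, eps=1e-12):
    """Project row vectors onto the unit hypersphere S^{d-1}."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def tangent_proj(x, v):
    """Project v onto the tangent space of the sphere at x."""
    return v - np.sum(x * v, axis=-1, keepdims=True) * x


def sphere_log(x, y, eps=1e-9):
    """Log map at x: tangent vector toward y whose norm is the geodesic distance."""
    cos = np.clip(np.sum(x * y, axis=-1, keepdims=True), -1.0, 1.0)
    v = y - cos * x                      # component of y orthogonal to x
    v_norm = np.linalg.norm(v, axis=-1, keepdims=True)
    theta = np.arccos(cos)               # geodesic (great-circle) distance
    # Returns 0 when x is (nearly) equal or antipodal to y; the antipodal
    # case is genuinely ill-defined on the sphere.
    return np.where(v_norm > eps, theta * v / np.maximum(v_norm, eps), 0.0)


def sphere_exp(x, v, eps=1e-9):
    """Exp map at x: follow the great circle from x along tangent vector v."""
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    out = np.cos(n) * x + np.sin(n) * v / np.maximum(n, eps)
    return normalize(np.where(n > eps, out, x))  # renormalize against drift


def fm_target(x0, x1, t):
    """Geodesic conditional path point x_t and target velocity u_t, t in [0, 1)."""
    xt = sphere_exp(x0, t * sphere_log(x0, x1))
    ut = sphere_log(xt, x1) / (1.0 - t)
    return xt, ut


def product_fm_loss(velocity_field, img0, txt0, img1, txt1, t):
    """Flow-matching regression loss on S^{d_img-1} x S^{d_txt-1}.

    `velocity_field(x_img, x_txt, t) -> (v_img, v_txt)` is a hypothetical model.
    """
    xi, ui = fm_target(img0, img1, t)
    xs, us = fm_target(txt0, txt1, t)
    vi, vs = velocity_field(xi, xs, t)
    return np.mean(np.sum((vi - ui) ** 2, -1) + np.sum((vs - us) ** 2, -1))


def sample_joint(velocity_field, img0, txt0, n_steps=50):
    """Joint sampling: Euler steps taken with exp maps on each sphere factor."""
    xi, xs = img0, txt0
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((xi.shape[0], 1), k * h)
        vi, vs = velocity_field(xi, xs, t)
        # Project predictions onto the tangent spaces before stepping.
        xi = sphere_exp(xi, h * tangent_proj(xi, vi))
        xs = sphere_exp(xs, h * tangent_proj(xs, vs))
    return xi, xs
```

Because the product manifold's geometry factorizes, the log and exp maps are applied per factor while a single velocity field sees both factors, which is what lets it represent cross-modal dependence. Renormalizing after each exp step keeps samples on the spheres despite floating-point drift.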