In this project, we aim to develop a large-scale pre-trained geospatial model by fine-tuning vision-language models such as CLIP with geospatial data and theories. Specifically, we will use both geospatial visual data (e.g., remote sensing images and street view images) and textual data (e.g., points of interest and social media posts) to adapt general-purpose vision-language models to geospatial and urban contexts. We expect the resulting model to produce highly effective urban representations (embeddings) for a variety of urban analytical tasks, such as urban land use inference and population density estimation.
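To make the fine-tuning objective concrete, the sketch below implements the symmetric contrastive (InfoNCE) loss used in CLIP-style training, here stated in NumPy for illustration. In this setting, a matched pair (e.g., a street view image and the text describing nearby points of interest) is the positive, and all other pairings in the batch are negatives. The function names and the toy embedding shapes are assumptions for the sketch, not part of the project description; an actual implementation would operate on encoder outputs in a deep learning framework.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize embeddings to unit length so the dot product is cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss in the style of CLIP.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair
    (e.g., a remote sensing image and its associated POI/social media text).
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    # (batch, batch) similarity matrix; diagonal entries are the positives.
    logits = img @ txt.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(lg):
        # Numerically stable log-softmax over each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average of image-to-text and text-to-image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Minimizing this loss pulls the embeddings of matched image-text pairs together and pushes unmatched pairs apart, which is what lets the fine-tuned model serve as a general embedding backbone for downstream urban tasks.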