Combining DINOv3 and Mask2Former for Urban Streetscapes Semantic and Instance Segmentation
Dnr: NAISS 2026/4-506
Type: NAISS Small
Principal Investigator: Yinghao Chen
Affiliation: Chalmers tekniska högskola
Start Date: 2026-03-12
End Date: 2027-04-01
Primary Classification: 20105: Transport Systems and Logistics

Abstract

Built environment characteristics such as sidewalk availability, vegetation coverage, and traffic object density are important inputs for transportation analysis. However, automated extraction of these indicators from street-level imagery often requires a separate computer vision model for each task. This project investigates a unified framework for extracting transportation-relevant built environment features from street-level imagery. The proposed approach combines task-oriented supervision with a shared foundation-model architecture based on DINOv3 and Mask2Former. A whitelist strategy focuses training on transportation-relevant classes while preserving the broader prediction space of the original dataset. A single model jointly produces pixel-level semantic segmentation and instance-level object masks. Experiments will be conducted on the Mapillary Vistas dataset, which contains high-resolution street-scene images with detailed annotations. Training and evaluating transformer-based segmentation models on this dataset requires substantial GPU resources because of the large dataset size and high image resolution. The project aims to improve the extraction of transportation-relevant built environment indicators and to provide reliable inputs for downstream transportation analysis and planning.
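
The abstract does not say how the whitelist strategy is implemented. One possible reading, sketched below purely as an assumption, is a per-class loss weighting: the model keeps an output for every class in the dataset, but errors on non-whitelisted classes contribute less to the loss. The class names, the `off_weight` value, and both helper functions are hypothetical and are not taken from the project.

```python
import math

# Hypothetical label space: a small stand-in for the full dataset taxonomy.
ALL_CLASSES = ["road", "sidewalk", "vegetation", "car", "sky", "billboard"]
# Hypothetical whitelist of transportation-relevant classes.
WHITELIST = {"road", "sidewalk", "vegetation", "car"}


def class_weights(all_classes, whitelist, off_weight=0.1):
    """Return one weight per class: 1.0 if whitelisted, off_weight otherwise.

    The vector has the same length as the full label space, so the
    prediction space is unchanged; only the training emphasis shifts.
    """
    return [1.0 if name in whitelist else off_weight for name in all_classes]


def weighted_cross_entropy(logits, target, weights):
    """Weighted cross-entropy for a single pixel over the full class set."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return weights[target] * (log_norm - logits[target])


weights = class_weights(ALL_CLASSES, WHITELIST)

# With uniform logits, a whitelisted target contributes the full loss,
# while a non-whitelisted target is scaled down by off_weight.
uniform_logits = [0.0] * len(ALL_CLASSES)
loss_sidewalk = weighted_cross_entropy(
    uniform_logits, ALL_CLASSES.index("sidewalk"), weights
)
loss_sky = weighted_cross_entropy(
    uniform_logits, ALL_CLASSES.index("sky"), weights
)
```

In this sketch, setting `off_weight` to 0 would ignore non-whitelisted classes during training entirely, while a small positive value keeps some supervision for them. Which choice, if either, matches the project's strategy is not stated in the abstract.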