Efficient ViT
Dnr:

NAISS 2026/4-576

Type:

NAISS Small

Principal Investigator:

Mehdi Babaeivavdare

Affiliation:

Chalmers tekniska högskola

Start Date:

2026-03-23

End Date:

2027-04-01

Primary Classification:

20208: Computer Vision and Learning Systems (Computer Science aspects in 10207)

Abstract

I am interested in conducting a research project focused on Vision Transformers (ViTs) within the field of image analysis and computer vision. The primary goal of this project is to explore efficient architectures and optimization techniques that improve the performance and scalability of ViT models, particularly in resource-constrained environments. While Vision Transformers have demonstrated strong performance compared to traditional convolutional neural networks, they often require significant computational power and large datasets. In this project, I aim to investigate methods for enhancing efficiency, such as model compression, knowledge distillation, token reduction strategies, and hybrid architectures that combine convolutional and transformer-based approaches. Additionally, I plan to use the available computational resources effectively, including GPUs and optimized deep learning frameworks, to ensure a practical and scalable implementation. The project will involve experimenting with benchmark image datasets and evaluating performance in terms of accuracy, computational cost, and memory usage. By focusing on efficiency without significantly compromising performance, this work seeks to make Vision Transformers more accessible for real-world applications, especially on edge devices and in distributed systems. Ultimately, this research aligns with the broader goal of developing scalable and resource-aware intelligent vision systems.
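To make the token reduction direction mentioned above concrete, the following is a minimal NumPy sketch of score-based token pruning, one common family of such strategies: tokens with low importance scores (e.g. average attention received) are dropped before later transformer layers, cutting the quadratic attention cost. The function name, the scoring heuristic, and the keep ratio are illustrative assumptions, not the project's actual method.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep only the highest-scoring tokens (illustrative token reduction).

    tokens: (N, D) array of token embeddings
    scores: (N,) importance scores, e.g. mean attention each token receives
    keep_ratio: fraction of tokens to retain (assumed hyperparameter)
    """
    n_keep = max(1, int(round(tokens.shape[0] * keep_ratio)))
    idx = np.argsort(scores)[::-1][:n_keep]  # indices of top-scoring tokens
    idx = np.sort(idx)                       # preserve original token order
    return tokens[idx], idx

# Toy example: 8 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))
scores = rng.random(8)
kept, idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept.shape)  # (4, 4): half the tokens remain
```

Halving the token count roughly quarters the cost of subsequent self-attention layers, which is the source of the efficiency gains such methods target; the accuracy/cost trade-off would then be measured on benchmark image datasets as the abstract describes.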