Vision-language-action (VLA) models are currently among the most active research topics in robotics.
Previous VLA models have mainly been designed for simple tasks such as pick-and-place. In this project, we will extend the VLA model from an agent perspective: building an agent system that can think, act, and perceive.
Motivated by the recent success of coding agents, we will combine reasoning abilities with VLA capabilities to build a full agent system.
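To make the think-act-perceive idea concrete, here is a minimal sketch of such an agent loop. All names (`Planner`, `VLAPolicy`, the toy task decomposition, and the stubbed environment feedback) are hypothetical placeholders for illustration, not components of any specific VLA framework.

```python
from dataclasses import dataclass


@dataclass
class Planner:
    """Stands in for a reasoning model (the 'think' step):
    decomposes a high-level task into subgoals."""

    def think(self, task: str, observation: dict) -> list[str]:
        # Toy decomposition: one subgoal per comma-separated object in the task.
        return [f"move to {obj}" for obj in task.split(", ")]


@dataclass
class VLAPolicy:
    """Stands in for a VLA model (the 'act' step):
    maps a subgoal plus the current observation to a low-level action."""

    def act(self, subgoal: str, observation: dict) -> dict:
        return {"command": subgoal, "gripper": "closed" if "grasp" in subgoal else "open"}


def run_agent(task: str, initial_obs: dict, max_steps: int = 10) -> list:
    """Think once, then alternate act and perceive for each subgoal."""
    planner, policy = Planner(), VLAPolicy()
    obs, trace = initial_obs, []
    for subgoal in planner.think(task, obs)[:max_steps]:
        action = policy.act(subgoal, obs)  # act
        obs = {"last_action": action}      # perceive (stubbed environment feedback)
        trace.append((subgoal, action))
    return trace


trace = run_agent("red block, blue bowl", {"camera": "frame0"})
```

In a real system, `Planner.think` would be an LLM call that can also replan from new observations, and `VLAPolicy.act` would be a learned visuomotor policy; the loop structure is what the project would build on.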
Main supervisor: Martin Magnusson (Örebro University)