Vision-language-action (VLA) models are currently among the most active research topics in robotics.
Previous VLA models have mainly been designed for simple tasks such as pick-and-place. In this project, we will extend the VLA model from an agent perspective: building an agent system that can think, act, and perceive.
Motivated by the recent success of coding agents, we will combine reasoning abilities with VLA capabilities to build a full agent system.
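To make the think-act-perceive idea concrete, here is a minimal sketch of such an agent loop. All names (`Planner`, `VLAPolicy`, the toy task decomposition, and the stubbed environment feedback) are hypothetical placeholders for illustration, not components of any specific VLA framework.

```python
from dataclasses import dataclass


@dataclass
class Planner:
    """Stands in for a reasoning model (the 'think' step):
    decomposes a high-level task into subgoals."""

    def think(self, task: str, observation: dict) -> list[str]:
        # Toy decomposition: one subgoal per comma-separated object in the task.
        return [f"move to {obj}" for obj in task.split(", ")]


@dataclass
class VLAPolicy:
    """Stands in for a VLA model (the 'act' step):
    maps a subgoal plus the current observation to a low-level action."""

    def act(self, subgoal: str, observation: dict) -> dict:
        return {"command": subgoal, "gripper": "closed" if "grasp" in subgoal else "open"}


def run_agent(task: str, initial_obs: dict, max_steps: int = 10) -> list:
    """Think once, then alternate act and perceive for each subgoal."""
    planner, policy = Planner(), VLAPolicy()
    obs, trace = initial_obs, []
    for subgoal in planner.think(task, obs)[:max_steps]:
        action = policy.act(subgoal, obs)  # act
        obs = {"last_action": action}      # perceive (stubbed environment feedback)
        trace.append((subgoal, action))
    return trace


trace = run_agent("red block, blue bowl", {"camera": "frame0"})
```

In a real system, `Planner.think` would be an LLM call that can also replan from new observations, and `VLAPolicy.act` would be a learned visuomotor policy; the loop structure is what the project would build on.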
Main supervisor: Martin Magnusson (Örebro University)