Autonomous driving technology has advanced significantly in recent years, yet generating accurate, diverse, and safe motion trajectories remains a key challenge, especially in complex and dynamic urban traffic. This project, entitled “Motion Generation via Next Action Prediction for Autonomous Driving”, aims to develop a novel approach that bridges action-level decision making and trajectory generation, enabling more realistic and controllable autonomous driving behavior.
The core objective of this project is to design an end-to-end framework that predicts a sequence of future driving actions (such as steering angle, acceleration, and braking) from real-time perception of the environment, and then generates feasible motion trajectories for the autonomous vehicle. Unlike traditional trajectory prediction methods that directly output a set of future positions, our approach learns the mapping from observations to actions, which are then used to roll out trajectories in a closed-loop fashion. This yields better interpretability, easier integration with downstream planning and control modules, and improved robustness in highly interactive traffic scenes.
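To make the closed-loop rollout concrete, the following is a minimal sketch of how a predicted action sequence could be integrated into a trajectory, assuming a kinematic bicycle model with a fixed time step; the function name, parameter values, and the (steering, acceleration) action format are illustrative assumptions, not a specification of the final framework.

```python
import numpy as np

def rollout_trajectory(x, y, heading, speed, actions, wheelbase=2.8, dt=0.1):
    """Roll out a trajectory from a sequence of (steering, acceleration)
    actions using a kinematic bicycle model.

    actions: iterable of (steering_angle_rad, acceleration_mps2) pairs.
    Returns an (T, 2) array of future (x, y) positions.
    """
    positions = []
    for steer, accel in actions:
        # Advance position along the current heading.
        x += speed * np.cos(heading) * dt
        y += speed * np.sin(heading) * dt
        # Kinematic bicycle: yaw rate depends on speed and steering angle.
        heading += speed * np.tan(steer) / wheelbase * dt
        # Update speed; this sketch disallows reversing for simplicity.
        speed = max(0.0, speed + accel * dt)
        positions.append((x, y))
    return np.array(positions)
```

In a closed-loop setting, each action would instead come from the policy conditioned on the state just produced, rather than from a precomputed list; the integration step itself is the same.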
The proposed methodology leverages recent advances in deep learning, particularly sequence modeling and graph neural networks (GNNs), to capture both the temporal dynamics and the heterogeneous interactions among traffic participants (vehicles, pedestrians, cyclists) and road infrastructure. The model takes as input a bird’s-eye-view (BEV) representation of the driving scene, historical state information, and high-definition (HD) map features, and learns to predict the most plausible next actions for the ego vehicle and, optionally, surrounding agents. The predicted actions are then applied recursively to generate multimodal future trajectories that respect road geometry, traffic rules, and interactive behavior patterns.
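As a rough illustration of this design, the PyTorch sketch below encodes per-agent history with a GRU and fuses agent and map tokens with a single attention layer that stands in for a full GNN; all module names, feature dimensions, and the convention that agent 0 is the ego vehicle are assumptions made for the example, not the final architecture.

```python
import torch
import torch.nn as nn

class NextActionPredictor(nn.Module):
    """Sketch: GRU over each agent's state history, one attention layer for
    agent-agent and agent-map interaction, and a head that decodes the ego
    vehicle's next action as (steering, acceleration)."""

    def __init__(self, state_dim=6, map_dim=8, hidden=128, num_heads=4):
        super().__init__()
        self.history_enc = nn.GRU(state_dim, hidden, batch_first=True)
        self.map_enc = nn.Linear(map_dim, hidden)
        # A single round of cross-entity attention stands in for a full GNN.
        self.interact = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.action_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, agent_hist, map_feats):
        # agent_hist: (B, N, T, state_dim); map_feats: (B, M, map_dim)
        B, N, T, D = agent_hist.shape
        _, h = self.history_enc(agent_hist.reshape(B * N, T, D))
        agent_tokens = h[-1].reshape(B, N, -1)             # (B, N, hidden)
        tokens = torch.cat([agent_tokens, self.map_enc(map_feats)], dim=1)
        fused, _ = self.interact(tokens, tokens, tokens)   # agents attend to agents + map
        ego = fused[:, 0]                                  # convention: agent 0 is ego
        return self.action_head(ego)                       # (B, 2): steering, acceleration

# Example usage with random inputs: 2 scenes, 5 agents, 10 history steps, 20 map nodes.
model = NextActionPredictor()
action = model(torch.randn(2, 5, 10, 6), torch.randn(2, 20, 8))
```

Multimodality could be obtained, for instance, by predicting several action hypotheses per step or by sampling from a learned action distribution, with each hypothesis rolled out recursively as described above.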
The approach will be validated on large-scale public autonomous driving datasets such as the Waymo Open Motion Dataset and Argoverse, and evaluated with metrics covering trajectory accuracy (minADE, minFDE), safety, and diversity. The anticipated outcome is a flexible, robust, and interpretable motion generation framework that strengthens the motion planning stack of autonomous driving systems, enabling safer and more human-like driving in complex environments. The resulting models and tools are expected to benefit both academic research and the practical deployment of intelligent vehicles.
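For reference, minADE and minFDE follow their standard definitions over K predicted modes: minADE is the lowest per-mode average displacement error, and minFDE is the lowest final-step displacement error. The short sketch below computes both; the function name and array layout are illustrative.

```python
import numpy as np

def min_ade_fde(pred_modes, gt):
    """pred_modes: (K, T, 2) multimodal predicted positions; gt: (T, 2) ground truth.
    Returns (minADE, minFDE) over the K modes."""
    errors = np.linalg.norm(pred_modes - gt[None], axis=-1)  # (K, T) L2 errors
    ade = errors.mean(axis=1)   # average displacement per mode
    fde = errors[:, -1]         # final-step displacement per mode
    return ade.min(), fde.min()
```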