Autonomous driving technology has advanced significantly in recent years, yet generating accurate, diverse, and safe motion trajectories remains a key challenge, especially in complex and dynamic urban traffic. This project, entitled “Motion Generation via Next Action Prediction for Autonomous Driving”, aims to develop a novel approach that bridges action-level decision making and trajectory generation, enabling more realistic and controllable autonomous driving behavior.
The core objective of this project is to design an end-to-end framework that predicts a sequence of future driving actions (such as steering angle, acceleration, and braking) from real-time perception of the environment, and then generates feasible motion trajectories for the autonomous vehicle. Unlike traditional trajectory prediction methods that directly output a set of future positions, our approach learns the mapping from observations to actions, which are then used to roll out trajectories in a closed-loop fashion. This yields better interpretability, easier integration with downstream planning and control modules, and improved robustness in highly interactive traffic scenes.
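To make the closed-loop rollout concrete, the following is a minimal sketch of how a predicted action sequence could be integrated into a trajectory, assuming a kinematic bicycle model with a fixed time step; the function name, parameter values, and the (steering, acceleration) action format are illustrative assumptions, not a specification of the final framework.

```python
import numpy as np

def rollout_trajectory(x, y, heading, speed, actions, wheelbase=2.8, dt=0.1):
    """Roll out a trajectory from a sequence of (steering, acceleration)
    actions using a kinematic bicycle model.

    actions: iterable of (steering_angle_rad, acceleration_mps2) pairs.
    Returns an (T, 2) array of future (x, y) positions.
    """
    positions = []
    for steer, accel in actions:
        # Advance position along the current heading.
        x += speed * np.cos(heading) * dt
        y += speed * np.sin(heading) * dt
        # Kinematic bicycle: yaw rate depends on speed and steering angle.
        heading += speed * np.tan(steer) / wheelbase * dt
        # Update speed; this sketch disallows reversing for simplicity.
        speed = max(0.0, speed + accel * dt)
        positions.append((x, y))
    return np.array(positions)
```

In a closed-loop setting, each action would instead come from the policy conditioned on the state just produced, rather than from a precomputed list; the integration step itself is the same.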
The proposed methodology leverages recent advances in deep learning, particularly sequence modeling and graph neural networks (GNNs), to capture both the temporal dynamics and the heterogeneous interactions among traffic participants (vehicles, pedestrians, cyclists) and road infrastructure. The model takes as input a bird’s-eye-view (BEV) representation of the driving scene, historical state information, and high-definition (HD) map features, and learns to predict the most plausible next actions for the ego vehicle and, optionally, surrounding agents. The predicted actions are then applied recursively to generate multimodal future trajectories that respect road geometry, traffic rules, and interactive behavior patterns.
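As a rough illustration of this design, the PyTorch sketch below encodes per-agent history with a GRU and fuses agent and map tokens with a single attention layer that stands in for a full GNN; all module names, feature dimensions, and the convention that agent 0 is the ego vehicle are assumptions made for the example, not the final architecture.

```python
import torch
import torch.nn as nn

class NextActionPredictor(nn.Module):
    """Sketch: GRU over each agent's state history, one attention layer for
    agent-agent and agent-map interaction, and a head that decodes the ego
    vehicle's next action as (steering, acceleration)."""

    def __init__(self, state_dim=6, map_dim=8, hidden=128, num_heads=4):
        super().__init__()
        self.history_enc = nn.GRU(state_dim, hidden, batch_first=True)
        self.map_enc = nn.Linear(map_dim, hidden)
        # A single round of cross-entity attention stands in for a full GNN.
        self.interact = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.action_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, agent_hist, map_feats):
        # agent_hist: (B, N, T, state_dim); map_feats: (B, M, map_dim)
        B, N, T, D = agent_hist.shape
        _, h = self.history_enc(agent_hist.reshape(B * N, T, D))
        agent_tokens = h[-1].reshape(B, N, -1)             # (B, N, hidden)
        tokens = torch.cat([agent_tokens, self.map_enc(map_feats)], dim=1)
        fused, _ = self.interact(tokens, tokens, tokens)   # agents attend to agents + map
        ego = fused[:, 0]                                  # convention: agent 0 is ego
        return self.action_head(ego)                       # (B, 2): steering, acceleration

# Example usage with random inputs: 2 scenes, 5 agents, 10 history steps, 20 map nodes.
model = NextActionPredictor()
action = model(torch.randn(2, 5, 10, 6), torch.randn(2, 20, 8))
```

Multimodality could be obtained, for instance, by predicting several action hypotheses per step or by sampling from a learned action distribution, with each hypothesis rolled out recursively as described above.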
The approach will be validated on large-scale public autonomous driving datasets such as the Waymo Open Motion Dataset and Argoverse, and evaluated with metrics covering trajectory accuracy (minADE, minFDE), safety, and diversity. The anticipated outcome is a flexible, robust, and interpretable motion generation framework that strengthens the motion planning stack of autonomous driving systems, enabling safer and more human-like driving in complex environments. The resulting models and tools are expected to benefit both academic research and the practical deployment of intelligent vehicles.
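For reference, minADE and minFDE follow their standard definitions over K predicted modes: minADE is the lowest per-mode average displacement error, and minFDE is the lowest final-step displacement error. The short sketch below computes both; the function name and array layout are illustrative.

```python
import numpy as np

def min_ade_fde(pred_modes, gt):
    """pred_modes: (K, T, 2) multimodal predicted positions; gt: (T, 2) ground truth.
    Returns (minADE, minFDE) over the K modes."""
    errors = np.linalg.norm(pred_modes - gt[None], axis=-1)  # (K, T) L2 errors
    ade = errors.mean(axis=1)   # average displacement per mode
    fde = errors[:, -1]         # final-step displacement per mode
    return ade.min(), fde.min()
```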