Self-supervised learning has achieved remarkable results in the past five years and has become the de facto method for training state-of-the-art machine learning systems. While it has seen considerable application in domains such as natural language processing and computer vision, in this project we are interested in applying it to the less-studied field of team sports. More specifically, we wish to understand the game of soccer, where research has so far been narrow and specific, tackling problems such as action classification and trajectory prediction on a case-by-case basis. Using data from our industrial partner TRACAB, which maintains one of the largest soccer datasets in the world, we aim to gain insights into the sport by training a self-supervised foundation model for soccer comprehension.
Framing the problem as one of representation learning, we model the center-of-mass positions of the ball and players as a dynamic complete graph. By employing methods such as Joint-Embedding Predictive Architectures (JEPAs), which have recently been shown to learn representations efficiently and effectively, we aim to learn a “master representation” of player behavior across varying timescales and levels of abstraction. With learned representations in hand, we believe downstream tasks such as action recognition and behavioral understanding, for which labeled data is limited, will become significantly easier.
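To make the proposed setup concrete, the following is a minimal sketch of a JEPA-style objective over such a dynamic complete graph, written in PyTorch. All names (FrameEncoder, JEPA), the 23-node layout (22 players plus the ball with (x, y) coordinates), the GRU predictor, and the exponential-moving-average target encoder are illustrative assumptions, not TRACAB's data schema or the project's settled architecture.

import copy
import torch
import torch.nn as nn

N_NODES, POS_DIM, EMB_DIM = 23, 2, 64  # 22 players + ball, (x, y) positions

class FrameEncoder(nn.Module):
    """Encodes one frame of the complete graph into a single embedding.

    On a complete graph, one round of message passing reduces to mixing
    each node's features with the mean over all nodes.
    """
    def __init__(self):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(POS_DIM, EMB_DIM), nn.ReLU())
        self.mix = nn.Linear(2 * EMB_DIM, EMB_DIM)

    def forward(self, pos):                  # pos: (B, T, N_NODES, POS_DIM)
        h = self.node_mlp(pos)               # per-node features
        ctx = h.mean(dim=2, keepdim=True).expand_as(h)  # mean over neighbors
        h = torch.relu(self.mix(torch.cat([h, ctx], dim=-1)))
        return h.mean(dim=2)                 # pool nodes -> (B, T, EMB_DIM)

class JEPA(nn.Module):
    """Context encoder predicts the target encoder's embedding of held-out
    future frames; the target is an EMA copy and receives no gradients."""
    def __init__(self):
        super().__init__()
        self.context = FrameEncoder()
        self.target = copy.deepcopy(self.context)
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.predictor = nn.GRU(EMB_DIM, EMB_DIM, batch_first=True)

    @torch.no_grad()
    def ema_update(self, tau=0.996):
        for pc, pt in zip(self.context.parameters(), self.target.parameters()):
            pt.mul_(tau).add_((1 - tau) * pc)

    def loss(self, frames, t_split):
        # frames: (B, T, N_NODES, POS_DIM); predict the part after t_split
        z_ctx = self.context(frames[:, :t_split])   # visible context frames
        _, h = self.predictor(z_ctx)                # summarize the context
        pred = h.squeeze(0)                         # (B, EMB_DIM)
        with torch.no_grad():
            z_tgt = self.target(frames[:, t_split:]).mean(dim=1)
        return nn.functional.mse_loss(pred, z_tgt)  # loss in embedding space

model = JEPA()
opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
frames = torch.rand(8, 50, N_NODES, POS_DIM)  # toy batch: 8 clips, 50 frames
loss = model.loss(frames, t_split=40)
loss.backward(); opt.step(); model.ema_update()

The EMA target encoder is a common device in joint-embedding methods for avoiding representation collapse; the encoder and predictor capacities, masking strategy, and timescales are precisely the design choices the project will investigate.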
A preliminary study conducted on a subset of the available data shows emergent phenomena such as the identification of player roles solely from temporal positional information and the prediction of which player is likely to take or receive the ball next. To deepen the understanding of our proposed method, we aim to perform numerous ablation studies and derive insights into how to optimally train such a self-supervised sports-understanding system. To measure the performance of the trained models, we will use quantitative metrics on downstream tasks in addition to qualitative analysis. Furthermore, we later aim to incorporate recently introduced skeleton data for more fine-grained input.
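One common quantitative protocol for such downstream metrics, continuing the sketch above, is a linear probe on frozen embeddings. The task below (predicting which of the 22 players receives the ball next from a pooled clip embedding) and its toy labels are hypothetical placeholders; the proposal does not fix a specific evaluation protocol.

# Hypothetical linear probe on the frozen pretrained encoder.
probe = nn.Linear(EMB_DIM, 22)              # 22 candidate receivers (assumed)
probe_opt = torch.optim.AdamW(probe.parameters(), lr=1e-3)
with torch.no_grad():                       # backbone stays frozen
    z = model.context(frames).mean(dim=1)   # (B, EMB_DIM) clip embeddings
labels = torch.randint(0, 22, (8,))         # toy next-receiver labels
probe_loss = nn.functional.cross_entropy(probe(z), labels)
probe_loss.backward(); probe_opt.step()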
We request 100 GB of storage: each individual data point is small, but the dataset contains a very large number of them.