Tracab operates 14 precisely calibrated 4K camera systems in football arenas worldwide, producing large amounts of multi-view human motion data. Although the data is of high quality, it is not perfect: it contains artifacts caused by occlusion, crowding, and intermittent tracking failures. This project will develop deep generative models to denoise and infill human motion sequences.
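Short, moderately noisy clips can often be cleaned with regression. However, longer sequences with severe artifacts, such as several seconds of missing data, are inherently multimodal: many different motions are consistent with the same observed context. A regression model trained to minimize squared error averages over these possibilities and tends to produce implausible motion, so such cases require generative modeling. We will employ diffusion-based models to learn continuous distributions over human motion.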
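The averaging effect can be seen in a deliberately simplified toy case. The sketch below assumes a hypothetical situation in which the missing lateral position of a player is equally likely to be -1.0 (passing left of an opponent) or +1.0 (passing right). The squared-error-optimal point prediction is the mean of the two, 0.0, which matches neither plausible outcome, whereas a generative model returns one of the modes per sample. The names `MODES`, `mse`, `best_regression_prediction`, and `sample_generative` are illustrative only, and the exact sampler merely stands in for a trained diffusion model; it is not the project's method.

```python
import random

# Hypothetical toy distribution: for one fixed observed context, the missing
# lateral position is either -1.0 or +1.0, each with probability 0.5.
MODES = [-1.0, 1.0]


def mse(prediction, targets):
    """Mean squared error of a single point prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


def best_regression_prediction(targets):
    """The constant prediction minimizing squared error is the target mean."""
    return sum(targets) / len(targets)


def sample_generative(rng):
    """Stand-in for a generative model: an exact sampler of the toy
    distribution, which returns one plausible mode per draw."""
    return rng.choice(MODES)


regression_pred = best_regression_prediction(MODES)
rng = random.Random(0)
samples = [sample_generative(rng) for _ in range(5)]

print("regression prediction:", regression_pred)  # → 0.0, not a plausible position
print("MSE of regression prediction:", mse(regression_pred, MODES))  # → 1.0
print("MSE of predicting one mode:", mse(1.0, MODES))  # → 2.0
print("generative samples:", samples)  # each sample is -1.0 or 1.0
```

The regression prediction achieves the lower error yet lies between the modes, which is why minimizing error alone is a poor objective when long gaps admit several distinct, realistic completions.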
Historically, human motion research has been less compute-intensive than image-based vision because of smaller datasets and lower per-sample dimensionality. The scale of Tracab's data closes this gap. Moreover, diffusion models are more expensive to train than regression models, even on public datasets.
The expected outcome is state-of-the-art infilling and denoising of human motion at an unprecedented scale, together with new insights into generative modeling of human motion. We also expect the same models to support controllable generation, conditioned on user-specified trajectories, keyframes, and other signals.