SUPR
Video models for behavioural analysis of existing video recordings of laboratory rats
Dnr:

NAISS 2025/22-301

Type:

NAISS Small Compute

Principal Investigator:

Bernhard Mehlig

Affiliation:

Göteborgs universitet

Start Date:

2025-03-07

End Date:

2025-06-01

Primary Classification:

20208: Computer Vision and Learning Systems (Computer Sciences aspects in 10207)

Webpage:

Abstract

Here we apply for the computational resources needed for the master's thesis project of Peiyu Lu, who is studying Image Analysis & Machine Learning at Uppsala University. The thesis is supervised by Bernhard Mehlig, Gothenburg University. We apply for a small grant to launch a pilot study that will estimate the amount of compute required for the full project; we will follow up with an application for a medium grant.

The purpose of this project is to investigate how to adapt self-supervised training techniques to efficiently learn to extract informative features from a dataset with limited variability. We rely exclusively on an existing dataset, which comprises thousands of rat-behavior experiments recorded after administering different drugs; no new recordings are planned. These grayscale videos feature a single rat moving against a fixed background, offering a controlled setting for capturing subtle behavioral changes.

In computer vision, a common way to achieve high performance on a novel task is to adapt a so-called foundation model to the task at hand, e.g. by supervised fine-tuning. A foundation model is a model that generalizes well to a large number of different tasks and inputs. Such a model is usually obtained by extensive self-supervised training on a large and diverse dataset. However, currently available foundation models do not perform well on this particular dataset, presumably because its videos are too different from those the foundation models were trained on. At the same time, the limited variability of the dataset may make it possible to train such a model at a fraction of the cost of training a standard foundation model.

As a starting point, we will adapt a self-supervised training method known as Masked Autoencoder (MAE) training. In this method, parts of an image or video clip are masked out, and a model is trained to reconstruct the masked parts.
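The core of Masked Autoencoder training can be illustrated with a minimal sketch: split a frame into patches, hide a random subset, let a model predict from the visible patches, and score the reconstruction only on the hidden ones. The sketch below (NumPy only; the toy "model" and all function names are illustrative assumptions, not the project's implementation) shows this masking-and-loss logic:

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(frame, p):
    """Split a square grayscale frame into non-overlapping p x p patches."""
    h, w = frame.shape
    patches = frame.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)  # shape: (num_patches, p*p)

def masked_reconstruction_loss(frame, reconstruct, mask_ratio=0.75, p=16):
    """Mask a random subset of patches; score reconstruction on them only."""
    patches = patchify(frame, p)
    n = patches.shape[0]
    masked = rng.choice(n, size=int(mask_ratio * n), replace=False)
    visible = patches.copy()
    visible[masked] = 0.0                 # hide the masked patches
    pred = reconstruct(visible)           # model sees only visible content
    # MAE computes the reconstruction loss on the masked patches only
    return np.mean((pred[masked] - patches[masked]) ** 2)

# Toy "model": predict the mean grey level of the visible patches everywhere.
frame = rng.random((64, 64)).astype(np.float32)
loss = masked_reconstruction_loss(frame, lambda v: np.full_like(v, v.mean()))
```

In practice the reconstruct step would be a trained encoder-decoder network (e.g. a vision transformer), but the principle is the same: a high masking ratio forces the model to infer hidden content from semantic context rather than local texture.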
In order to perform well at this task, the model must learn to identify semantically meaningful features, such as the relation between body parts and (in the case of video) how to distinguish between behaviors that unfold differently over time. A key challenge is to ensure that the model learns to distinguish the different rat behaviors contained in the dataset, rather than the fixed background that is common to all videos. We will address this through tailored masking and augmentation techniques that focus on the relevant parts of the video.

This master's thesis project is co-supervised by Erik Werner and Sebastian Oleszko at IRLAB, a small biomedical company based in Gothenburg that works on discovering drugs to help alleviate the symptoms of Parkinson's disease and other neurodegenerative diseases. If successful, the project will both further the understanding of how to adapt self-supervised training techniques to specific datasets and contribute to ongoing research on how to treat debilitating neurodegenerative disorders.
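One simple way such tailored masking could work, given the fixed background, is to bias the choice of masked patches toward regions that differ from a reference background frame, i.e. toward the rat. The sketch below is a minimal NumPy illustration of that idea under stated assumptions (the background-subtraction heuristic, patch size, and function name are all hypothetical, not the project's final method):

```python
import numpy as np

rng = np.random.default_rng(1)

def foreground_mask_indices(frame, background, p=16, mask_ratio=0.75,
                            eps=1e-6):
    """Sample patch indices to mask, weighted by how much each patch
    differs from the fixed background (i.e. where the rat likely is)."""
    diff = np.abs(frame - background)
    h, w = diff.shape
    per_patch = diff.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    score = per_patch.reshape(-1, p * p).mean(axis=1) + eps
    probs = score / score.sum()           # foreground patches dominate
    k = int(mask_ratio * score.size)
    return rng.choice(score.size, size=k, replace=False, p=probs)

# Toy frame: a bright square "rat" on an otherwise empty background,
# occupying exactly patch (row 1, col 1) of the 4 x 4 patch grid.
background = np.zeros((64, 64))
frame = background.copy()
frame[16:32, 16:32] = 1.0
idx = foreground_mask_indices(frame, background, p=16, mask_ratio=0.25)
```

With this weighting, the patches covering the animal are almost always masked, so the reconstruction loss concentrates on the rat's posture and motion rather than on the static background shared by all videos.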