After conducting a pilot study (NAISS 2025/22-301) on a specially prepared, low-resolution subset of our data, we achieved successful convergence and promising results. However, our full high-resolution dataset is approximately 200 times larger and will require an estimated 15–20 times more computational resources. We therefore request that the small pilot-project grant NAISS 2025/22-301 be upgraded to a medium allocation.
This work is the research project of Peiyu Lu's MSc thesis in Image Analysis & Machine Learning at Uppsala University, carried out under the supervision of Bernhard Mehlig, University of Gothenburg. The project focuses on using unsupervised methods to model and analyze rat behavior. We rely exclusively on an existing dataset comprising thousands of rat-behavior experiments recorded after administering different drugs; no new recordings are planned. These grayscale videos each feature a single rat moving against a fixed background, offering a controlled setting in which to capture subtle behavioral changes.
In computer vision, a common way to achieve high performance on a novel task is to adapt a so-called foundation model to the task at hand, e.g. by supervised fine-tuning. A foundation model is a model that generalizes well to a large number of different tasks and inputs; such a model is usually obtained through extensive self-supervised training on a large and diverse dataset. However, currently available foundation models do not perform well on this particular dataset, presumably because its videos differ too much from those the foundation models were trained on.
The limited variability in the dataset may make it possible to train such a model at a fraction of the cost of training a standard foundation model. As a starting point, we will adapt a self-supervised training method known as Masked Autoencoder (MAE) training. In this method, parts of an image or video clip are masked out, and a model is trained to reconstruct the masked parts. To perform well at this task, the model must learn semantically meaningful features, such as the relations between body parts and (in the case of video) how behaviors that unfold differently over time can be distinguished. A key challenge is to ensure that the model learns to distinguish the different rat behaviors contained in the dataset, rather than the fixed background common to all videos. We will address this through tailored masking and augmentation techniques that focus on the relevant parts of the video.
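To make the masking step concrete, the following is a minimal sketch of the core MAE data operation: splitting a grayscale frame into non-overlapping patches, hiding a random subset, and scoring reconstruction only on the hidden patches. This is an illustrative NumPy sketch, not our actual training code; the function names, the patch size of 8, and the 75% mask ratio are assumptions chosen for the example (the tailored, rat-focused masking described above would replace the uniform random choice of patches).

```python
import numpy as np

def random_patch_mask(frame, patch=8, mask_ratio=0.75, rng=None):
    """Split a grayscale frame into non-overlapping patches and zero out a
    random subset. Returns the masked frame and a boolean grid marking
    which patches were hidden (True = masked)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = frame.shape
    gh, gw = h // patch, w // patch          # patch-grid dimensions
    n_mask = int(round(mask_ratio * gh * gw))
    mask = np.zeros(gh * gw, dtype=bool)
    mask[rng.permutation(gh * gw)[:n_mask]] = True
    mask = mask.reshape(gh, gw)
    masked = frame.copy()
    for i in range(gh):
        for j in range(gw):
            if mask[i, j]:
                masked[i * patch:(i + 1) * patch,
                       j * patch:(j + 1) * patch] = 0.0
    return masked, mask

def masked_reconstruction_loss(pred, target, mask, patch=8):
    """Mean squared error computed only over the masked patches, as in
    MAE training (visible patches do not contribute to the loss)."""
    # Expand the patch-level mask to pixel level.
    pixel_mask = np.kron(mask.astype(np.uint8),
                         np.ones((patch, patch), np.uint8)).astype(bool)
    return float(np.mean((pred[pixel_mask] - target[pixel_mask]) ** 2))
```

In actual MAE training, an encoder sees only the visible patches and a lightweight decoder predicts the hidden ones; the sketch above captures the masking and loss bookkeeping that make the task self-supervised.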
This master's thesis project is co-supervised by Erik Werner and Sebastian Oleszko at IRLAB, a small biomedical company based in Gothenburg that works on discovering drugs to alleviate the symptoms of Parkinson's disease and other neurodegenerative diseases. If successful, the project will both further the understanding of how to adapt self-supervised training techniques to specific datasets and contribute to ongoing research on treating debilitating neurodegenerative disorders.