Causal Egocentric Video Action Recognition
Dnr:

NAISS 2025/22-833

Type:

NAISS Small Compute

Principal Investigator:

Lei Shi

Affiliation:

Örebro universitet

Start Date:

2025-06-03

End Date:

2026-07-01

Primary Classification:

10207: Computer graphics and computer vision (System engineering aspects at 20208)

Webpage:

Allocation

Abstract

In this project, we will work on egocentric video action recognition. Given an egocentric video, the task is to predict the action label of the video. We will develop a causality-based deep learning method to tackle this task. Our method consists of vision-language models (VLMs), video transformers and causal variational autoencoders (VAEs). We use VLMs to extract language descriptions of the videos and train transformers to obtain features representing the verb, noun and action. The causal VAEs are used to learn the causal relationships among the language description, verb, noun and action.
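To illustrate the dataflow of the proposed pipeline, the following is a minimal sketch with stand-in stubs; it is not the project's implementation. All function names and the toy caption parsing are hypothetical placeholders for the VLM, the video transformers and the causal VAE.

```python
# Hypothetical sketch of the pipeline described above (stubs only):
# 1) a VLM produces a language description of an egocentric clip,
# 2) transformers would extract verb/noun features from video + language,
# 3) a causal VAE would relate language, verb, noun and the action label.
from dataclasses import dataclass


@dataclass
class Clip:
    frames: list  # placeholder for decoded video frames


def vlm_describe(clip: Clip) -> str:
    # Stand-in for a vision-language model caption of the clip.
    return "a person cuts a tomato on a board"


def extract_verb_noun(description: str) -> tuple:
    # Stand-in for transformer features; here a naive parse of the caption.
    words = description.split()
    return words[2], words[4]  # "cuts", "tomato"


def causal_vae_predict(verb: str, noun: str) -> str:
    # Stand-in for the causal VAE: compose verb and noun into an action label.
    base = verb[:-1] if verb.endswith("s") else verb
    return f"{base}_{noun}"


clip = Clip(frames=[])
description = vlm_describe(clip)
verb, noun = extract_verb_noun(description)
action = causal_vae_predict(verb, noun)
print(action)  # cut_tomato
```

In the actual method, each stub would be replaced by a trained model; the sketch only fixes the interfaces between the three stages.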