Generative Foundation Models: 4D Human Animation and Video Generation
Dnr: NAISS 2026/3-475
Type: NAISS Medium
Principal Investigator: Christopher Peters
Affiliation: Kungliga Tekniska högskolan
Start Date: 2026-07-01
End Date: 2027-07-01
Primary Classification: 10201: Computer Sciences
Secondary Classification: 10207: Computer graphics and computer vision (System engineering aspects at 20208)
Tertiary Classification: 10210: Artificial Intelligence

Allocation

Abstract

This proposal requests GPU resources for research on controllable generative AI for human motion, 4D animation, and video generation at KTH’s Embodied Social Agents Lab. The work focuses on two tightly linked directions: (1) large-scale, physics-aware synthetic data generation and (2) post-training of foundation models for controllable multimodal generation, editing, and grounding. The project builds on recent work in emotional 3D animation, multimodal reasoning, 3D human understanding, and controllable video generation. During the allocation period, the main workloads will be targeted synthetic data refresh and curation; LoRA, supervised fine-tuning, and post-training of video-language and omni/foundation models; and extensive ablations, benchmarking, and evaluation. The estimated compute requirement exceeds 20,000 GPU-h/month, so this proposal requests the round maximum as a practical cap. A100-class GPUs are essential because the target workloads are memory-bound on long visual sequences and multimodal contexts. Storage is needed for rendered videos, fitted motion parameters, derived features, checkpoints, and experiment artifacts. The project is carried out entirely within KTH and is intended to support open scientific publication, reusable research infrastructure, and continued development of controllable multimodal generation methods.