Reinforcement Learning via Latent-Space Diffusion Models

Dnr:

NAISS 2026/4-580

Type:

NAISS Small

Principal Investigator:

Antonin Roy

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2026-06-01

End Date:

2026-10-01

Primary Classification:

10210: Artificial Intelligence

Webpage:

Allocation

Arrhenius GPU at NAISS: 500 GPU-h/month
Arrhenius Disk at NAISS: 250 GiB

Abstract

This project investigates whether diffusion-based world models for reinforcement learning can be made more computationally efficient by operating in a learned latent space. Diffusion models have already been used as world models, but their iterative sampling procedure makes long-horizon trajectory generation expensive, because every generated step requires several denoising passes. The main idea of the project is to reduce this cost by applying diffusion to compact latent representations rather than directly in the original observation or state space. The goal is to test whether latent-space diffusion world models can generate longer imagined trajectories, or more trajectory samples, under the same compute budget. This would make them more practical for model-based reinforcement learning, where policy learning depends heavily on efficient model rollouts. The work will be empirical. I will implement or adapt existing code for diffusion-based world models and latent model-based RL, train agents in simulated environments, and compare standard diffusion world models with latent-space diffusion variants. The evaluation will focus on reinforcement learning performance, rollout efficiency, computational cost, and the number of useful imagined trajectory steps that can be generated within a given budget.
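The compute argument in the abstract can be made concrete with a rough cost model. The following is a minimal sketch and not the project's evaluation code: the names `RolloutCost` and `imagined_steps` are hypothetical, the cost is counted in abstract operation units, and every number in the usage example is an arbitrary placeholder rather than a measurement. It assumes that the cost of one imagined step is the number of denoising passes times the cost of one denoiser evaluation, plus a one-off cost for encoding the starting observation in the latent variant.

```python
from dataclasses import dataclass


@dataclass
class RolloutCost:
    """Hypothetical per-rollout cost model, in abstract operation units."""
    denoising_steps: int   # denoiser evaluations needed per imagined step
    ops_per_eval: int      # cost of a single denoiser forward pass
    encode_ops: int = 0    # one-off cost of encoding the start observation


def imagined_steps(budget_ops: int, cost: RolloutCost) -> int:
    """Number of imagined steps that fit into a fixed compute budget."""
    per_step = cost.denoising_steps * cost.ops_per_eval
    usable = budget_ops - cost.encode_ops
    if usable <= 0:
        return 0
    return usable // per_step


# Placeholder numbers, chosen only to illustrate the comparison.
observation_space = RolloutCost(denoising_steps=10, ops_per_eval=10**9)
latent_space = RolloutCost(denoising_steps=10, ops_per_eval=10**8,
                           encode_ops=5 * 10**8)

budget = 10**12
print(imagined_steps(budget, observation_space))
print(imagined_steps(budget, latent_space))
```

Under this simplified model, the rollout length at a fixed budget scales inversely with the per-evaluation cost of the denoiser, which is the quantity a compact latent representation is meant to reduce. Whether the generated steps remain useful for policy learning is a separate question that the model does not capture and that the planned experiments are intended to answer.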