This project investigates whether diffusion-based world models for reinforcement learning can be made more computationally efficient by operating in a learned latent space. Diffusion models have already been used as world models, but their iterative sampling procedure requires multiple denoising passes for every generated step, which can make long-horizon trajectory generation expensive. The main idea is to reduce this cost by applying diffusion to compact latent representations rather than directly in the original observation or state space.
The goal is to test whether latent-space diffusion world models can generate longer imagined trajectories, or more trajectory samples, under the same compute budget. This would make them more practical for model-based reinforcement learning, where policy learning depends heavily on efficient model rollouts.
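To make the "same compute budget" comparison concrete, the sketch below counts how many imagined steps fit into a fixed budget under a simple cost model: the cost of one imagined step is the number of denoising passes times the cost of one denoiser call, plus any per-step overhead such as decoding latents. The class `RolloutCost`, the function `imagined_steps_within_budget`, and every number are hypothetical placeholders chosen for illustration, not measurements from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutCost:
    """Hypothetical cost model for generating one imagined step."""

    denoising_steps: int               # denoiser calls per imagined step
    flops_per_denoiser_call: int       # cost of a single denoiser call
    overhead_flops_per_step: int = 0   # e.g. decoding latents, if required

    def flops_per_step(self) -> int:
        return (
            self.denoising_steps * self.flops_per_denoiser_call
            + self.overhead_flops_per_step
        )


def imagined_steps_within_budget(cost: RolloutCost, budget_flops: int) -> int:
    """Number of whole imagined steps that fit into the given budget."""
    return budget_flops // cost.flops_per_step()


# Illustrative numbers only (assumptions, not results).
budget = 10**12
observation_space = RolloutCost(denoising_steps=10, flops_per_denoiser_call=10**9)
latent_space = RolloutCost(
    denoising_steps=10,
    flops_per_denoiser_call=10**8,
    overhead_flops_per_step=5 * 10**8,
)

observation_steps = imagined_steps_within_budget(observation_space, budget)
latent_steps = imagined_steps_within_budget(latent_space, budget)
```

Under this toy model, the latent variant only comes out ahead if the saving per denoiser call outweighs the per-step overhead, which is the trade-off the experiments are meant to measure.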
The work will be empirical. I will implement new code or adapt existing code for diffusion-based world models and latent model-based RL, train agents in simulated environments, and compare standard diffusion world models with latent-space diffusion variants. The evaluation will focus on reinforcement learning performance, rollout efficiency, computational cost, and the number of useful imagined trajectory steps that can be generated.
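One possible way to measure rollout efficiency is wall-clock throughput, in imagined steps per second. The sketch below is a minimal timing harness under stated assumptions: `measure_rollout_throughput` is a hypothetical helper, and `step_fn` stands in for whatever function advances a world model by one imagined step. It is not tied to any particular codebase.

```python
import time
from typing import Callable, TypeVar

State = TypeVar("State")


def measure_rollout_throughput(
    step_fn: Callable[[State], State],
    initial_state: State,
    horizon: int,
    repeats: int = 3,
) -> float:
    """Return imagined steps per second, using the fastest of `repeats` rollouts."""
    best_elapsed = float("inf")
    for _ in range(repeats):
        state = initial_state
        start = time.perf_counter()
        for _ in range(horizon):
            state = step_fn(state)
        elapsed = time.perf_counter() - start
        best_elapsed = min(best_elapsed, elapsed)
    # Guard against a zero reading from the timer on extremely fast rollouts.
    return horizon / max(best_elapsed, 1e-12)


# Placeholder "world model" that just increments a counter (assumption).
throughput = measure_rollout_throughput(lambda s: s + 1, 0, horizon=1000)
```

For models that run on a GPU, the device would need to be synchronized before reading the timer, since kernels are launched asynchronously and the measured time would otherwise be too small.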