Efficient exploration remains a major challenge for Reinforcement Learning (RL) algorithms. Over the last two decades, several exploration strategies have been proposed in the literature, often designed with the aim of minimizing regret. With this project, we want to investigate a novel model-based algorithm that employs guided diffusion methods to encourage exploration. The first step will be to implement a naive solution based on the advantage function, which steers the diffusion-generated samples towards trajectories that are better than average. While this approach is appealing for its simplicity and proven effectiveness, it does not provide any guarantee on the number of samples required to learn an optimal policy. Therefore, we subsequently want to focus on a more important objective: finding an exploration strategy that learns the optimal policy using the fewest samples. This problem is known as Best Policy Identification, which has been extensively studied in the Multi-Armed Bandits setting, and only recently for Markov Decision Processes.
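
As a rough illustration of the first step, the sketch below shows one way advantage-based guidance could be wired into a reverse diffusion sampler, in the spirit of classifier guidance: at every denoising step the posterior mean is nudged along the gradient of a learned advantage estimate, biasing generated trajectories towards better-than-average behaviour. This is only an assumed minimal implementation, not the project's final design; the names `denoiser`, `advantage_fn`, `guidance_scale`, and the DDPM noise schedule are illustrative placeholders.

```python
# Minimal sketch of advantage-guided reverse diffusion (assumed design, not the
# proposal's actual implementation). `denoiser(x, t)` predicts the noise added at
# step t and `advantage_fn(x)` scores a batch of trajectories; both are hypothetical.
import torch

def guided_sample(denoiser, advantage_fn, shape, betas, guidance_scale=1.0):
    """Sample a trajectory tensor of `shape` with advantage-guided DDPM steps."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t)  # predicted noise at step t
        # Standard DDPM posterior mean for x_{t-1}
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        # Guidance: shift the mean along the gradient of the advantage estimate,
        # so sampling favours trajectories that score above average
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(advantage_fn(x_in).sum(), x_in)[0]
        mean = mean + guidance_scale * betas[t] * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```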