Efficient exploration remains a major challenge for Reinforcement Learning (RL) algorithms. Over the last two decades, several exploration strategies have been proposed in the literature, often designed with the aim of minimizing regret. With this project, we want to investigate a novel model-based algorithm that employs guided diffusion methods to encourage exploration. The first step will be to implement a naive solution based on the advantage function, which steers the diffusion-generated samples towards trajectories that are better than average. While this approach is appealing for its simplicity and proven effectiveness, it does not provide any guarantee on the number of samples required to learn an optimal policy. Therefore, we subsequently want to focus on a more important objective: finding an exploration strategy that learns the optimal policy using the fewest samples. This problem is known as Best Policy Identification, which has been extensively studied in the Multi-Armed Bandits setting, and only recently for Markov Decision Processes.
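
As a rough illustration of the first step, the sketch below shows one way advantage-based guidance could be wired into a reverse diffusion sampler, in the spirit of classifier guidance: at every denoising step the posterior mean is nudged along the gradient of a learned advantage estimate, biasing generated trajectories towards better-than-average behaviour. This is only an assumed minimal implementation, not the project's final design; the names `denoiser`, `advantage_fn`, `guidance_scale`, and the DDPM noise schedule are illustrative placeholders.

```python
# Minimal sketch of advantage-guided reverse diffusion (assumed design, not the
# proposal's actual implementation). `denoiser(x, t)` predicts the noise added at
# step t and `advantage_fn(x)` scores a batch of trajectories; both are hypothetical.
import torch

def guided_sample(denoiser, advantage_fn, shape, betas, guidance_scale=1.0):
    """Sample a trajectory tensor of `shape` with advantage-guided DDPM steps."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = denoiser(x, t)  # predicted noise at step t
        # Standard DDPM posterior mean for x_{t-1}
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        # Guidance: shift the mean along the gradient of the advantage estimate,
        # so sampling favours trajectories that score above average
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(advantage_fn(x_in).sum(), x_in)[0]
        mean = mean + guidance_scale * betas[t] * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```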