Progressive Distillation of Diffusion Policy in O2O RL
Dnr:

NAISS 2024/22-1307

Type:

NAISS Small Compute

Principal Investigator:

Ruoqi Zhang

Affiliation:

Uppsala universitet

Start Date:

2024-10-16

End Date:

2025-11-01

Primary Classification:

10207: Computer Vision and Robotics (Autonomous Systems)

Webpage:

Allocation

Abstract

This proposal aims to develop an efficient offline-to-online reinforcement learning (O2O-RL) framework that combines the expressiveness of diffusion-based policies with the computational efficiency of progressive distillation. In offline RL, diffusion models are effective because they capture complex, multimodal action distributions, which makes them well suited to learning from diverse offline datasets. However, diffusion models typically require many sampling steps, making them computationally expensive and impractical for real-time decision-making in the online phase. To address this, we propose using progressive distillation to reduce the number of sampling steps, so that the agent can act quickly during online fine-tuning without sacrificing the policy's robustness. By combining diffusion policies for robust offline training with progressive distillation for fast, real-time action generation, the proposed O2O-RL framework balances expressiveness and efficiency. This will enable RL agents to adapt efficiently to new environments and improve performance through real-time interaction, which is essential for real-world applications where online adaptation is critical.
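
To make the progressive-distillation idea concrete, the sketch below shows how a student policy can be trained so that one of its denoising steps matches two consecutive steps of its teacher, halving the sampling budget each round. This is an illustrative sketch only, not the proposal's implementation: the network architecture, the linear noise schedule, the DDIM-style update, and all names (DenoiserPolicy, ddim_step, distill_round) and dimensions are assumptions, and it is trained here on synthetic placeholder data rather than real offline datasets.

import copy
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 8, 2, 128  # assumed toy dimensions

class DenoiserPolicy(nn.Module):
    """Predicts the clean action x0 from (state, noisy action, timestep)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + 1, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM),
        )

    def forward(self, state, noisy_action, t):
        return self.net(torch.cat([state, noisy_action, t], dim=-1))

def ddim_step(model, state, x_t, t_now, t_next):
    """One deterministic (DDIM-style) denoising step from t_now to t_next,
    assuming a linear schedule x_t = (1 - t) * x0 + t * noise."""
    x0_pred = model(state, x_t, t_now)
    noise_est = (x_t - (1 - t_now) * x0_pred) / t_now.clamp(min=1e-4)
    return (1 - t_next) * x0_pred + t_next * noise_est

def distill_round(teacher, num_teacher_steps, iters=1000, batch=256, lr=1e-3):
    """Train a student so that one student step matches two teacher steps,
    halving the number of sampling steps for the next round."""
    student = copy.deepcopy(teacher)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    dt = 1.0 / num_teacher_steps
    for _ in range(iters):
        state = torch.randn(batch, STATE_DIM)   # placeholder states
        # Pick a student timestep aligned with a pair of teacher steps.
        k = torch.randint(1, num_teacher_steps // 2 + 1, (batch, 1)).float()
        t = 2 * k * dt
        x_t = torch.randn(batch, ACTION_DIM)    # noisy action at level t
        with torch.no_grad():
            x_mid = ddim_step(teacher, state, x_t, t, t - dt)
            x_target = ddim_step(teacher, state, x_mid, t - dt, t - 2 * dt)
        # The student's single step from t to t - 2*dt should land on x_target.
        x_student = ddim_step(student, state, x_t, t, t - 2 * dt)
        loss = ((x_student - x_target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

if __name__ == "__main__":
    teacher = DenoiserPolicy()
    steps = 16
    while steps > 1:                 # 16 -> 8 -> 4 -> 2 -> 1 sampling steps
        teacher = distill_round(teacher, steps, iters=50)
        steps //= 2
    print("Distilled down to", steps, "sampling step(s)")

In an actual O2O-RL pipeline, the initial teacher would be the diffusion policy obtained from offline training, and the fully distilled few-step student would then be used for fast action generation during online fine-tuning.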