SUPR
Manipulator Codesign and Diffusion-based Planning
Dnr:

NAISS 2025/5-43

Type:

NAISS Medium Compute

Principal Investigator:

Florian Pokorny

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2025-03-01

End Date:

2026-03-01

Primary Classification:

10207: Computer graphics and computer vision (System engineering aspects at 20208)

Secondary Classification:

10201: Computer Sciences

Tertiary Classification:

10299: Other Computer and Information Science

Abstract

The European Commission project SoftEnable extends the concept of rigid-body caging to soft, deformable objects, integrating both extrinsic and intrinsic constraints to create robust manipulation primitives that can handle perturbations effectively. This allocation covers two parts.

The first part, carried out within SoftEnable, focuses on iterative bilevel optimization for designing tools or robot end-effectors. A Bayesian optimization framework proposes candidate designs in the outer loop, while Berzelius computing resources are used in the inner loop to train an image-based reinforcement learning (RL) agent that evaluates the manipulation policy corresponding to each candidate design. The RL agent takes RGB images of manipulated rigid and deformable objects as observations and outputs the corresponding tool/gripper actions. Our approach centers on robust manipulation (e.g., pushing, scooping) of deformable objects, using manipulation robustness metrics as rewards during training; a sketch of this loop is given below. This ongoing deformable object manipulation project is currently allocated 2000 GPU hours/month, which is insufficient.

The second part is a new project on diffusion models for imitation learning and skill chaining. It uses diffusion models to:
1. compute escape energy for robust long-horizon manipulation planning, leveraging DDPMs (Ho et al., 2020); and
2. learn manipulation primitive skills such as pushing, picking, and placing from expert demonstrations, and chain these skills using task and motion planning algorithms such as PDDLStream (Garrett et al., 2020).

We curate a dataset of escape paths and train DDPMs to imitate these behaviors, enabling fast, multi-modal inference of manipulation plans that outperform traditional sampling-based methods; a training sketch follows below. Diffusion model training is computationally intensive and requires high-performance GPUs.
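
The following is a minimal sketch of the bilevel loop described above: a Bayesian-optimization outer loop proposes tool/end-effector design parameters, and an inner-loop evaluation (a placeholder standing in for training the image-based RL agent on Berzelius) returns a manipulation-robustness score used as the objective. All names and parameters (design_dim, design_bounds, evaluate_design) are illustrative assumptions, not the project's actual code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
design_dim = 4                                        # e.g. scoop width, curvature, ... (assumed)
design_bounds = np.array([[0.0, 1.0]] * design_dim)   # normalized design space (assumed)


def evaluate_design(theta):
    """Placeholder for the expensive inner loop: train an RL policy mapping RGB
    observations to tool/gripper actions for this design, then return its
    manipulation-robustness reward. Here a cheap synthetic stand-in."""
    return -np.sum((theta - 0.6) ** 2) + 0.01 * rng.normal()


def propose_next(gp, n_candidates=2000, kappa=2.0):
    """Pick the candidate design maximizing an upper-confidence-bound acquisition."""
    cand = rng.uniform(design_bounds[:, 0], design_bounds[:, 1],
                       size=(n_candidates, design_dim))
    mu, sigma = gp.predict(cand, return_std=True)
    return cand[np.argmax(mu + kappa * sigma)]


# Outer Bayesian-optimization loop over candidate designs.
X = rng.uniform(design_bounds[:, 0], design_bounds[:, 1], size=(5, design_dim))
y = np.array([evaluate_design(x) for x in X])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for it in range(20):
    gp.fit(X, y)
    theta_next = propose_next(gp)
    X = np.vstack([X, theta_next])
    y = np.append(y, evaluate_design(theta_next))

print("best design:", X[np.argmax(y)], "robustness score:", y.max())
```

In the real setting each evaluate_design call would launch a full RL training run, which is what motivates the GPU allocation.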
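
Below is a minimal DDPM training sketch in the spirit of Ho et al. (2020), applied to flattened escape-path trajectories as described above. The dataset interface, the trajectory dimensionality, and the noise-prediction network are assumptions for illustration only; the project's models would additionally condition on scene observations.

```python
import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

path_dim = 64                              # flattened escape-path length (assumed)
model = nn.Sequential(                     # small noise-prediction network (assumed)
    nn.Linear(path_dim + 1, 256), nn.SiLU(),
    nn.Linear(256, 256), nn.SiLU(),
    nn.Linear(256, path_dim),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)


def sample_batch(batch_size=128):
    """Stand-in for the curated escape-path dataset (batch of flattened paths)."""
    return torch.randn(batch_size, path_dim)


for step in range(1000):
    x0 = sample_batch()
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alpha_bar[t].unsqueeze(1)
    # Forward (noising) process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    # The network predicts the added noise, conditioned on the scaled timestep.
    t_embed = (t.float() / T).unsqueeze(1)
    eps_pred = model(torch.cat([xt, t_embed], dim=1))
    loss = nn.functional.mse_loss(eps_pred, eps)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Training many such models over large escape-path datasets, with image conditioning and longer trajectories, is the computationally intensive workload for which the allocation is requested.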