Classical planning systems typically rely on symbolic search and handcrafted heuristics, which can be computationally expensive and require the same amount of compute even if the same or very similar problems have been solved many times before (i.e., no learning takes place). This project aims to learn general policies for classical planning domains using deep reinforcement learning (DRL), enabling decision-making that transfers across instances within a domain.
The central goal is to train neural policies that map symbolic planning states to actions, reducing or eliminating the need for search at inference time. The approach integrates classical planning formalisms (e.g., PDDL-defined domains) with DRL, learning directly from structured state representations and transition dynamics. Policies are trained on smaller, simpler problem instances and evaluated on larger, more difficult ones to encourage generalization rather than memorization.
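To make the state-to-action mapping concrete, the sketch below shows one possible way a grounded PDDL-style state (a set of ground atoms over domain objects) could be turned into a relational graph suitable as policy input: objects become nodes and ground atoms become predicate-labelled (hyper)edges. The `StateGraph` container and `encode_state` helper are hypothetical names introduced purely for illustration, not part of the project's codebase.

```python
# Illustrative sketch: encoding a grounded planning state as a relational graph.
# Objects -> nodes; ground atoms -> predicate-labelled (hyper)edges.
from dataclasses import dataclass, field

@dataclass
class StateGraph:
    objects: list                                # node labels, e.g. ["a", "b", "table"]
    node_index: dict                             # object name -> node id
    edges: list = field(default_factory=list)    # (predicate, (node ids, ...))

def encode_state(objects, atoms):
    """Map ground atoms such as ("on", "a", "b") to a graph over the objects."""
    node_index = {obj: i for i, obj in enumerate(objects)}
    graph = StateGraph(objects=list(objects), node_index=node_index)
    for predicate, *args in atoms:
        graph.edges.append((predicate, tuple(node_index[a] for a in args)))
    return graph

# Example: a small Blocksworld state.
state = encode_state(
    objects=["a", "b", "table"],
    atoms=[("on", "a", "b"), ("on", "b", "table"), ("clear", "a")],
)
print(state.edges)   # [('on', (0, 1)), ('on', (1, 2)), ('clear', (0,))]
```

Because the encoding is defined over objects and predicates rather than a fixed state vector, the same policy network can be applied to instances with different numbers of objects, which is what enables transfer within a domain.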
Methodologically, the project focuses on relational and graph-based state encodings combined with modern reinforcement learning algorithms. Key challenges include generalization across MDPs of varying size and structure, sparse rewards, and long planning horizons.
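As one possible instantiation of such an encoding, the sketch below (assuming PyTorch; the layer sizes, message-passing scheme, and action-scoring head are illustrative assumptions rather than the project's finalized architecture) shows a small relational message-passing policy: nodes exchange predicate-conditioned messages for a few rounds, and each applicable ground action is scored by pooling the embeddings of the objects it mentions.

```python
# Minimal sketch of a graph-based policy over the StateGraph encoding above.
# Binary atoms only, for brevity; unary or higher-arity atoms need extra handling.
import torch
import torch.nn as nn

class RelationalPolicy(nn.Module):
    def __init__(self, num_predicates: int, hidden: int = 64, rounds: int = 3):
        super().__init__()
        self.node_init = nn.Parameter(torch.randn(hidden))    # shared initial node embedding
        self.pred_emb = nn.Embedding(num_predicates, hidden)  # one vector per predicate symbol
        self.msg = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.update = nn.GRUCell(hidden, hidden)
        self.action_score = nn.Linear(hidden, 1)               # scores a pooled action embedding
        self.rounds = rounds

    def forward(self, num_nodes, edges, actions):
        """edges: [(predicate_id, (src, dst)), ...]; actions: list of tuples of
        node ids mentioned by each applicable ground action. Returns one logit per action."""
        h = self.node_init.unsqueeze(0).repeat(num_nodes, 1)
        for _ in range(self.rounds):
            incoming = [torch.zeros(h.size(1)) for _ in range(num_nodes)]
            for pred, (src, dst) in edges:
                pred_vec = self.pred_emb(torch.tensor(pred))
                incoming[dst] = incoming[dst] + self.msg(torch.cat([h[src], pred_vec]))
            h = self.update(torch.stack(incoming), h)
        logits = [self.action_score(h[list(objs)].mean(dim=0)) for objs in actions]
        return torch.cat(logits)

# Example: two objects linked by predicate 0; two candidate actions, one per object.
policy = RelationalPolicy(num_predicates=4)
logits = policy(num_nodes=2, edges=[(0, (0, 1))], actions=[(0,), (1,)])
action = torch.distributions.Categorical(logits=logits).sample()
```

In training, such per-action logits would feed a standard policy-gradient objective (e.g., a PPO-style loss), which is exactly where the sparse-reward and long-horizon difficulties noted above arise.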
Significant GPU resources are required to support large-scale training across multiple planning domains and problem sizes. Experiments involve millions of environment interactions, deep neural architectures such as graph neural networks, and extensive hyperparameter and ablation studies.
The expected outcome is a scalable learning-based planning framework that complements classical methods, providing general, transferable policies for structured decision-making problems relevant to robotics, automation, and AI planning.