In many reinforcement learning (RL) problems, the objective is to maximize a goal-related reward while satisfying constraints, often expressed as penalties or negative rewards. For instance, a robot might need to learn to reach a target while adhering to constraints, such as avoiding excessive speed. Overweighting these constraints can impede the agent's ability to learn the goal-oriented behavior. Prior research suggests that temporarily disabling constraints and enabling them only after the agent has achieved the goal a few times can accelerate learning of the optimal policy compared to enforcing constraints from the beginning.
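As an illustration of what "temporarily disabling constraints" could look like in practice, below is a minimal sketch of a toggleable constraint penalty wrapped around an environment. It assumes a gymnasium-style interface; the class name, the `info["speed"]` field, and the penalty form are illustrative assumptions, not part of any existing library.

```python
import gymnasium as gym


class ToggleableSpeedPenalty(gym.Wrapper):
    """Subtracts a penalty for exceeding a speed limit, but only while enabled.

    Hypothetical sketch: assumes the wrapped environment reports the agent's
    current speed in info["speed"]; names and penalty form are illustrative.
    """

    def __init__(self, env, speed_limit=1.0, penalty=1.0, enabled=False):
        super().__init__(env)
        self.speed_limit = speed_limit
        self.penalty = penalty
        self.enabled = enabled

    def set_constraint(self, enabled: bool) -> None:
        # Called externally (e.g., by a fixed curriculum or a meta-agent)
        # to activate or deactivate the constraint partway through training.
        self.enabled = enabled

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Apply the constraint penalty only while the constraint is active.
        if self.enabled and info.get("speed", 0.0) > self.speed_limit:
            reward -= self.penalty
        return obs, reward, terminated, truncated, info
```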
This project explores whether the problem of scheduling when such constraints are activated, so as to optimize the agent's final performance, can itself be framed as an RL problem for a meta-agent.
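One way this meta-level framing might look in code is sketched below, under the assumption that the meta-agent's action at each phase is a binary "constraint on/off" decision and its reward is the inner agent's final performance under the full constrained objective. The callables `make_inner_env`, `train_phase`, and `evaluate` are hypothetical placeholders for the inner RL loop, not references to an existing API.

```python
import gymnasium as gym
import numpy as np


class ConstraintSchedulingEnv(gym.Env):
    """Meta-environment: each step decides whether the constraint is active
    for the next inner-training phase; the episode ends after a fixed number
    of phases with a reward equal to the inner agent's final performance.

    Hypothetical sketch: the injected callables stand in for the actual
    inner training and evaluation routines.
    """

    def __init__(self, make_inner_env, train_phase, evaluate, num_phases=10):
        self.make_inner_env = make_inner_env
        self.train_phase = train_phase   # (env, constraint_on) -> training stats
        self.evaluate = evaluate         # env -> final constrained return
        self.num_phases = num_phases
        self.action_space = gym.spaces.Discrete(2)  # 0: constraint off, 1: on
        # Observation: e.g. [phase progress, recent goal-success rate].
        self.observation_space = gym.spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.inner_env = self.make_inner_env()
        self.phase = 0
        self.success_rate = 0.0
        return np.array([0.0, 0.0], dtype=np.float32), {}

    def step(self, action):
        # Run one inner-training phase with the constraint on or off.
        stats = self.train_phase(self.inner_env, constraint_on=bool(action))
        self.success_rate = stats.get("goal_success_rate", 0.0)
        self.phase += 1
        done = self.phase >= self.num_phases
        # Sparse meta-reward: the inner agent's final constrained performance.
        reward = self.evaluate(self.inner_env) if done else 0.0
        obs = np.array([self.phase / self.num_phases, self.success_rate],
                       dtype=np.float32)
        return obs, reward, done, False, {}
```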