With this project, I aim to investigate new methods for making Reinforcement Learning (RL) more robust. More precisely, I will study how to apply Conformal Prediction to Off-Policy Evaluation (OPE) algorithms, where the reward of a given policy (the target policy) must be estimated from historical data gathered under a different policy (the behavior policy). Most OPE methods come without accuracy or certainty guarantees. The aim is to present a novel OPE method, based on Conformal Prediction, that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift caused by the discrepancy between the target and behavior policies. Experiments carried out on this cluster will validate the theoretical guarantees empirically.
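To illustrate the kind of estimator involved, the following is a minimal sketch (not the proposed method itself) of a weighted conformal-style interval for the target policy's reward. It assumes a calibration set of rewards logged under the behavior policy, together with hypothetical importance ratios w_i = pi_target(a_i | s_i) / pi_behavior(a_i | s_i) that account for the distribution shift; all data below are synthetic and purely illustrative.

```python
import numpy as np

def weighted_conformal_interval(rewards, weights, alpha=0.1):
    """Sketch of a (1 - alpha) interval for the target policy's reward.

    `rewards` are logged under the behavior policy; `weights` are
    hypothetical importance ratios pi_target / pi_behavior used to
    reweight the empirical distribution toward the target policy.
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Self-normalize the importance weights into a probability vector.
    p = w / w.sum()
    # Sort rewards and accumulate the reweighted probability mass.
    order = np.argsort(r)
    r_sorted, p_sorted = r[order], p[order]
    cum = np.cumsum(p_sorted)
    # Weighted empirical quantiles at alpha/2 and 1 - alpha/2.
    lo = r_sorted[np.searchsorted(cum, alpha / 2)]
    hi = r_sorted[np.searchsorted(cum, 1 - alpha / 2)]
    return lo, hi

# Toy usage with synthetic behavior-policy data (illustrative only).
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=0.5, size=1000)
weights = rng.uniform(0.5, 2.0, size=1000)  # stand-in importance ratios
lo, hi = weighted_conformal_interval(rewards, weights, alpha=0.1)
print(f"90% interval for the target policy reward: [{lo:.3f}, {hi:.3f}]")
```

A full method would additionally need the finite-sample coverage argument under covariate shift; this sketch only shows how importance weights reshape the empirical reward distribution before quantiles are taken.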