Reinforcement learning has achieved strong results in simulated control, robotics, and sequential decision-making. However, policies trained with current methods are often sensitive to changes in environment dynamics, task parameters, reward structure, and observation distribution. This lack of generalization limits the use of reinforcement learning in real-world control and robotic systems, where operating conditions rarely match those seen during training.
This project studies generalization in reinforcement learning under environment variation and distribution shift. The central goal is to train, adapt, and evaluate neural policies that transfer across related control and robotic tasks, rather than policies that solve only a single fixed environment. The project will investigate standard reinforcement learning algorithms in combination with several policy architectures: multilayer perceptron, convolutional, and transformer-based policies, as well as pretrained neural policies where appropriate.
We will use simulated environments with randomized dynamics, task parameters, initial conditions, and observation settings. Policies will be trained on families of environments and evaluated on held-out variations not seen during training, in order to measure robustness, sample efficiency, and transfer performance. The experiments will include repeated runs over several random seeds, comparisons between policy architectures, and limited adaptation of pretrained models using parameter-efficient fine-tuning methods.
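To make the train/evaluation split concrete, the sketch below shows one possible way to define an environment family by sampling physical parameters from a training range and a disjoint held-out range, seeded per run so that repeated experiments are reproducible. It is an illustrative sketch under stated assumptions, not the project's implementation: the parameter names (mass_scale, friction), their ranges, and the helpers ParamRange, sample_env_params, ranges_disjoint, and make_splits are all hypothetical, and no particular simulator or RL library is assumed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    """Closed interval from which one environment parameter is sampled."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


# Hypothetical parameters and ranges, chosen only for illustration.
TRAIN_RANGES = {
    "mass_scale": ParamRange(0.8, 1.2),
    "friction": ParamRange(0.5, 1.0),
}
# Held-out ranges lie outside the training ranges, so evaluation
# probes extrapolation to unseen dynamics rather than interpolation.
TEST_RANGES = {
    "mass_scale": ParamRange(1.3, 1.6),
    "friction": ParamRange(0.2, 0.4),
}


def sample_env_params(ranges: dict, rng: random.Random) -> dict:
    """Draw one environment configuration from the given ranges."""
    return {name: r.sample(rng) for name, r in ranges.items()}


def ranges_disjoint(a: dict, b: dict) -> bool:
    """True if, for every parameter, the two intervals do not overlap."""
    return all(a[k].high < b[k].low or b[k].high < a[k].low for k in a)


def make_splits(seed: int, n_train: int, n_test: int):
    """Sample training and held-out configurations for one random seed."""
    rng = random.Random(seed)  # a separate generator per run keeps seeds independent
    train = [sample_env_params(TRAIN_RANGES, rng) for _ in range(n_train)]
    test = [sample_env_params(TEST_RANGES, rng) for _ in range(n_test)]
    return train, test


if __name__ == "__main__":
    assert ranges_disjoint(TRAIN_RANGES, TEST_RANGES)
    for seed in (0, 1, 2):
        train, test = make_splits(seed, n_train=4, n_test=2)
        print(seed, train[0], test[0])
```

In a full experiment, each sampled configuration would be passed to the simulator when constructing or resetting an environment, and per-seed evaluation returns on the held-out set would then be aggregated across seeds. Keeping the held-out ranges strictly outside the training ranges is one design choice; overlapping or interleaved ranges would instead test interpolation.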