This project uses large-scale compute to automatically discover new reinforcement-learning (RL) paradigms that can outperform today's hand-designed algorithms. Instead of tuning the parameters of existing approaches, we evolve the underlying learning rules themselves, with a large language model proposing new algorithmic updates and GPU-accelerated simulations providing objective performance scores. The search requires running many RL training runs in parallel, each spanning millions of interaction steps, making HPC resources essential. The goal is to build a scalable system that can explore a vast algorithmic space, identify promising learning strategies, and deepen our understanding of what makes RL methods stable and effective across diverse tasks.
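The propose-evaluate-select loop described above can be sketched in miniature. This is a hypothetical illustration, not the project's actual system: a random parameter mutator stands in for the LLM proposer, and a toy two-armed bandit stands in for the GPU-accelerated RL trainings; all function names (`make_update_rule`, `evaluate`, `propose_variant`, `search`) are invented for this sketch.

```python
import random

def make_update_rule(step_size, optimism):
    """A candidate 'learning rule': incremental value update with an
    optimism bonus. In the real system, candidates would be full
    algorithmic updates proposed by an LLM."""
    def update(value, reward):
        return value + step_size * (reward + optimism - value)
    return update

def evaluate(rule, episodes=200, seed=0):
    """Score a candidate rule on a 2-armed bandit (a stand-in for a
    full RL training run). Returns average reward in [0, 1]."""
    rng = random.Random(seed)
    arm_means = [0.2, 0.8]
    values = [0.0, 0.0]
    total = 0.0
    for _ in range(episodes):
        arm = max(range(2), key=lambda a: values[a])  # greedy action choice
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        values[arm] = rule(values[arm], reward)
        total += reward
    return total / episodes

def propose_variant(params, rng):
    """Stand-in for the LLM proposer: perturb the rule's parameters."""
    step_size, optimism = params
    return (min(1.0, max(0.01, step_size + rng.uniform(-0.05, 0.05))),
            max(0.0, optimism + rng.uniform(-0.02, 0.02)))

def search(generations=20, seed=1):
    """Evolutionary outer loop: keep whichever candidate scores best.
    The real system would evaluate many candidates in parallel."""
    rng = random.Random(seed)
    best_params = (0.1, 0.0)
    best_score = evaluate(make_update_rule(*best_params))
    for _ in range(generations):
        cand = propose_variant(best_params, rng)
        score = evaluate(make_update_rule(*cand))
        if score > best_score:  # select: keep only improvements
            best_params, best_score = cand, score
    return best_params, best_score

if __name__ == "__main__":
    params, score = search()
    print(params, score)
```

Because each candidate evaluation is independent, the loop parallelizes naturally across nodes, which is why the scale of the search is bounded mainly by available compute.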