Improving Sample Efficiency in Evolutionary RL using Off-policy Ranking

Date: 19th Apr 2022

Time: 04:00 PM

Venue: CRC 302

PAST EVENT

Details

Evolution Strategy (ES) is a powerful optimization technique based on the idea of natural evolution. In each iteration, a key step entails ranking candidate solutions by some fitness score. When ES is used in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies, which is presently done via on-policy approaches and thus demands a large number of environment interactions. To improve sample efficiency, we propose a novel off-policy alternative for ranking. We demonstrate our idea in the context of a state-of-the-art ES method called Augmented Random Search (ARS). Simulations on MuJoCo tasks show that, compared to the original ARS, our off-policy variant reaches reward thresholds in similar running times while needing only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should extend to other ES methods as well. This is joint work with my PhD student Eshwar and Prof. Shishir Kolathaya.
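
To make the ranking step concrete, below is a minimal sketch of one ARS-style iteration. This is an illustrative reconstruction, not the speakers' implementation: the toy environment dynamics, the linear policy, and all hyperparameters (dimensions, step size, noise scale, horizon) are assumptions made only to keep the example self-contained and runnable. It highlights why ranking is expensive: each perturbed policy is scored with its own fresh rollouts, which is the on-policy evaluation cost that the proposed off-policy ranking aims to reduce.

```python
# Minimal ARS-style sketch (illustrative assumptions throughout; not the
# speakers' code). Each candidate direction is ranked via on-policy rollouts.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 3, 1        # assumed toy dimensions
N_DIRECTIONS, TOP_K = 8, 4          # perturbations per iteration, elites kept
STEP_SIZE, NOISE_STD = 0.02, 0.05   # assumed hyperparameters
HORIZON = 50

def rollout(theta):
    """On-policy fitness: run the linear policy a = theta @ s in a toy
    environment and return the episodic reward. Every call consumes fresh
    environment interactions -- the source of sample inefficiency."""
    s = rng.standard_normal(STATE_DIM)
    total = 0.0
    for _ in range(HORIZON):
        a = theta @ s
        # Toy stand-in dynamics/reward for a MuJoCo-like control task.
        s = 0.9 * s + 0.1 * np.tanh(a)
        total += -float(s @ s)
    return total

theta = np.zeros((ACTION_DIM, STATE_DIM))
for it in range(10):
    deltas = rng.standard_normal((N_DIRECTIONS, ACTION_DIM, STATE_DIM))
    # Ranking step: score each perturbed policy with its own rollouts.
    r_plus = np.array([rollout(theta + NOISE_STD * d) for d in deltas])
    r_minus = np.array([rollout(theta - NOISE_STD * d) for d in deltas])
    # Keep only the top-k directions by max(r+, r-), as in ARS.
    order = np.argsort(np.maximum(r_plus, r_minus))[::-1][:TOP_K]
    sigma = np.concatenate([r_plus[order], r_minus[order]]).std() + 1e-8
    grad = sum((r_plus[k] - r_minus[k]) * deltas[k] for k in order)
    theta += STEP_SIZE / (TOP_K * sigma) * grad
    print(f"iter {it}: best reward {max(r_plus.max(), r_minus.max()):.2f}")
```

In this sketch, one iteration costs 2 x N_DIRECTIONS full rollouts just to rank the candidates; the off-policy idea in the talk replaces these rollout-based fitness scores with estimates computed from previously collected data.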

Speakers

Dr Gugan Thoppe

Computer Science and Engineering