Improving Sample Efficiency in Evolutionary RL using Off-policy Ranking
Date: 19th Apr 2022
Time: 04:00 PM
Venue: Hybrid (CRC 302 and virtual). The meeting link for the first 900 registered participants will be sent before the event.
Details
Evolution Strategy (ES) is a powerful technique for optimization based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. When used in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches, leading to increased environmental interactions. To improve sample efficiency, we propose a novel off-policy alternative for ranking. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well. This is joint work with my PhD student Eshwar and Prof. Shishir Kolathaya.
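To make the ranking step concrete, below is a minimal sketch of one basic ARS-style iteration, in which perturbation directions are scored and ranked by on-policy rollout returns. This is an illustration, not the speaker's implementation: the classic Gym-style env.reset()/env.step() interface, the linear policy, and the names rollout, ars_iteration, horizon, n_dirs, noise, and lr are all assumptions made for the example.

import numpy as np

def rollout(env, policy_params, horizon=1000):
    # Run one on-policy episode with a linear policy and return the total
    # reward. Each call costs fresh environment interactions -- this is
    # the expense the talk's off-policy ranking aims to reduce.
    # (Assumes a classic Gym-style interface: reset() -> obs,
    # step(a) -> (obs, reward, done, info).)
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        action = policy_params @ obs  # linear policy: a = M s
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return total

def ars_iteration(env, params, n_dirs=8, noise=0.03, lr=0.02):
    # One basic ARS step: sample random directions, score the +/-
    # perturbations of the policy via on-policy rollouts, rank the
    # directions by fitness, and update along the top half.
    deltas = [np.random.randn(*params.shape) for _ in range(n_dirs)]
    scores = [(rollout(env, params + noise * d),   # fitness of +delta
               rollout(env, params - noise * d),   # fitness of -delta
               d)
              for d in deltas]
    # Ranking step: keep the directions whose best-case reward is highest.
    scores.sort(key=lambda t: max(t[0], t[1]), reverse=True)
    top = scores[: n_dirs // 2]
    sigma = np.std([r for rp, rm, _ in top for r in (rp, rm)]) + 1e-8
    step = sum((rp - rm) * d for rp, rm, d in top) / (len(top) * sigma)
    return params + lr * step

The off-policy variant described in the talk would replace the fresh rollouts used for ranking with evaluations computed from previously collected data; the specific off-policy estimator is the work's contribution and is not reproduced in this sketch.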
Speakers
Gugan Thoppe
Robert Bosch Center for Data Science and Artificial Intelligence