Continuous Tactical Optimism and Pessimism
Date: 12th Sep 2023
Time: 12:00 PM
Venue: SSB 233 (MR-1)
Details
In reinforcement learning for continuous control, deep off-policy
actor-critic algorithms have become a popular approach because they
mitigate function approximation errors through pessimistic value
updates. However, this pessimism can suppress exploration, which is
typically beneficial for learning in uncertain environments. Tactical
Optimism and Pessimism (TOP) is an actor-critic framework that
dynamically adjusts the degree of optimism used in value learning
based on the task and the learning stage. However, TOP selects the
degree of optimism with a fixed multi-armed bandit, whose
configuration effectively becomes a per-task hyperparameter: both the
number of arms and the arm values must be chosen in advance. To
simplify this problem, we consider learning the degree of optimism
directly as a continuous quantity.
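To make the bandit mechanism concrete, below is a minimal sketch of
how a TOP-style agent might pick its optimism coefficient with an
Exp3-style bandit and form optimistic or pessimistic value targets
from two critics. The arm set, learning rate, feedback signal, and
target formula here are illustrative assumptions, not the exact
settings from the talk or the TOP paper.

    import numpy as np

    class OptimismBandit:
        """Exp3-style bandit over a fixed set of optimism coefficients.

        A sketch only: the arm set {-1.0, 0.0} and the learning rate
        are illustrative, and the usual Exp3 exploration mixing is
        omitted for brevity.
        """

        def __init__(self, arms=(-1.0, 0.0), lr=0.1):
            self.arms = np.asarray(arms)            # candidate beta values
            self.log_weights = np.zeros(len(arms))  # one weight per arm
            self.lr = lr
            self.last_arm = 0

        def _probs(self):
            w = np.exp(self.log_weights - self.log_weights.max())
            return w / w.sum()

        def sample(self):
            """Draw an optimism coefficient for the next episode."""
            p = self._probs()
            self.last_arm = np.random.choice(len(self.arms), p=p)
            return self.arms[self.last_arm]

        def update(self, feedback):
            """Reward the chosen arm, e.g. with the improvement in return."""
            p = self._probs()
            # Importance-weighted reward estimate, as in Exp3.
            self.log_weights[self.last_arm] += self.lr * feedback / p[self.last_arm]

    def td_target(reward, q1, q2, beta, discount=0.99):
        """Optimistic (beta > 0) or pessimistic (beta < 0) TD target
        formed from two critic estimates of the next state-action value."""
        mean = 0.5 * (q1 + q2)
        spread = 0.5 * abs(q1 - q2)  # crude proxy for epistemic uncertainty
        return reward + discount * (mean + beta * spread)

In this sketch the discrete arm set is exactly the hyperparameter the
talk targets: learning the optimism degree continuously, as proposed
here, would replace sample() over fixed arms with a mechanism that
adapts beta directly.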
Speakers
Kartik Bharadwaj (CS20S020)
Computer Science and Engineering