Continuous Tactical Optimism and Pessimism

Date: 12th Sep 2023

Time: 12:00 PM

Venue: SSB 233 (MR-1)

PAST EVENT

Details

In reinforcement learning for continuous control, deep off-policy actor-critic algorithms have become a popular approach because they counter function approximation errors with pessimistic value updates. However, this pessimism can discourage exploration, which is typically beneficial for learning in uncertain environments. Tactical Optimism and Pessimism (TOP) proposed an actor-critic framework that dynamically adjusts the degree of optimism used in value learning based on the task and the stage of learning. However, TOP's fixed bandit framework itself acts as a hyper-parameter for each task: one must choose both the number of arms and the arm values. To simplify this problem, we consider learning the degree of optimism as a continuous quantity.
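To make the idea concrete, here is a minimal sketch of the bandit-over-optimism mechanism described above. The arm values, the EXP3-style update, and the twin-critic value blend are illustrative assumptions, not the exact settings from the TOP paper: a discrete bandit picks a degree of optimism `beta`, which tilts the value estimate between an optimistic and a pessimistic combination of two critics.

```python
import math
import random


class OptimismBandit:
    """EXP3-style bandit over a discrete set of optimism degrees.

    Hypothetical sketch: the arm set (betas), learning rate, and
    feedback signal are assumptions for illustration only.
    """

    def __init__(self, betas=(-1.0, 0.0), lr=0.1):
        self.betas = list(betas)
        self.lr = lr
        self.log_weights = [0.0] * len(self.betas)
        self.last = 0  # index of the most recently pulled arm

    def probs(self):
        # softmax over log-weights, shifted for numerical stability
        m = max(self.log_weights)
        ws = [math.exp(w - m) for w in self.log_weights]
        s = sum(ws)
        return [w / s for w in ws]

    def select(self):
        # sample an arm (a degree of optimism) from the current distribution
        r, acc = random.random(), 0.0
        for i, p in enumerate(self.probs()):
            acc += p
            if r <= acc:
                self.last = i
                return self.betas[i]
        self.last = len(self.betas) - 1
        return self.betas[-1]

    def update(self, reward):
        # importance-weighted update; reward assumed scaled to [0, 1]
        p = self.probs()[self.last]
        self.log_weights[self.last] += self.lr * reward / p


def tactical_value(q1, q2, beta):
    """Blend twin-critic estimates: mean plus beta times their spread.

    beta < 0 gives a pessimistic (lower-bound) estimate,
    beta > 0 an optimistic one, beta = 0 the plain mean.
    """
    mean = 0.5 * (q1 + q2)
    spread = 0.5 * abs(q1 - q2)
    return mean + beta * spread
```

During training, `select()` would choose the optimism degree for an episode, `tactical_value` would replace the usual pessimistic min-of-critics target, and `update()` would be fed a performance signal (e.g. normalized episode return). Learning the degree of optimism continuously, as proposed in the talk, would replace the discrete arm set with a directly optimized scalar.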

Speakers

Kartik Bharadwaj (CS20S020)

Computer Science and Engineering