Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality
Date9th Mar 2022
Time03:00 PM
Venue Google Meet
PAST EVENT
Details
We consider a multi-hypothesis testing problem involving a K-armed bandit. Each arm's signal follows a distribution from a vector exponential family. The actual parameters of the arms are unknown to the decision maker. He incurs a delay cost for delay until a decision and a switching cost whenever he switches from one arm to another. We model the problem as a sequential decision-making problem, where the decision maker gets only a limited view of the true state of nature at each stage, but can control his view by choosing the arm to observe at each stage. The goal is to minimize the overall cost until a decision is reached on the true hypothesis.
An information-theoretic lower bound on the total cost (expected time for a reliable decision plus total switching cost) is first identified, and a variation on a sequential policy based on the generalised likelihood ratio statistic is then studied. The proposed policy, with a suitable threshold for stopping, is shown to satisfy the given constraint on the probability of false detection. Under a continuous selection assumption, we show that the policy is asymptotically optimal in terms of the total cost among all policies that satisfy the constraint on the probability of false detection.
Speakers
Gayathri R Prabhu (EE15D035)
Electrical Engineering