Collaborative Best Arm Identification in Multi-Armed Bandits
Date 24th Mar 2022
Time 02:00 PM
Venue Google Meet
PAST EVENT
Details
We consider the best arm identification problem in stochastic multi-armed bandits (MAB) under a fixed-budget setting, where the agents learning the MAB instance are connected through a network. In our collaborative model, each agent’s action and its associated reward are instantaneously observed by the agent as well as its one-hop neighbours. At the end of the budget, an ensemble learner (EL) aggregates samples from the agents and decides the best arm. The objective is to design arm selection policies for each of the agents (and a suitable aggregation policy for the EL) that minimize the probability of incorrect arm identification. We propose two collaborative learning policies that first partition the network of agents into dominating sets. We then employ (i) a highly exploring UCB-E variant or (ii) a ‘Follow Your Leader’ policy within each star partition, and at the end of the budget, the EL aggregates the samples from all the partitions to select the best arm. We show that our policies enjoy a probability of error that decays exponentially in both the number of agents and the budget. Notably, our policies do not require a minimum dominating set partition of the graph (a hard problem), and the probability-of-error bounds remain independent of the size of the dominating set partition.
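The talk does not include pseudocode; the following is a minimal Python sketch of the overall flow under simplifying assumptions: Gaussian rewards, star partitions simulated independently, and a generic UCB-E-style exploration bonus with parameter `a`. The function names, arm means, partition sizes, and budget below are illustrative placeholders, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb_e_star(mu, budget, a, n_agents):
    """Exploration within one star partition (sketch, not the authors' exact policy).
    Each round, the chosen arm is pulled by the leader and its one-hop
    neighbours, so the partition collects n_agents samples of that arm."""
    K = len(mu)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for k in range(K):  # pull every arm once to initialise
        r = rng.normal(mu[k], 1.0, size=n_agents)
        counts[k] += n_agents
        sums[k] += r.sum()
    for _ in range(budget - K):
        means = sums / counts
        index = means + np.sqrt(a / counts)  # UCB-E-style exploration bonus
        k = int(np.argmax(index))
        r = rng.normal(mu[k], 1.0, size=n_agents)
        counts[k] += n_agents
        sums[k] += r.sum()
    return counts, sums

def ensemble_learner(partition_stats):
    """EL aggregation (sketch): pool counts and reward sums from all
    partitions and recommend the arm with the highest pooled empirical mean."""
    counts = sum(c for c, _ in partition_stats)
    sums = sum(s for _, s in partition_stats)
    return int(np.argmax(sums / counts))

if __name__ == "__main__":
    mu = [0.1, 0.3, 0.5, 0.45]   # hypothetical Gaussian arm means
    partitions = [3, 4, 2]        # hypothetical star sizes from a dominating-set partition
    budget = 500                  # hypothetical per-partition round budget
    stats = [ucb_e_star(mu, budget, a=2.0 * budget, n_agents=n) for n in partitions]
    print("EL recommends arm", ensemble_learner(stats))
```

The sketch only illustrates the pipeline (explore within each star, then aggregate at the EL); the choice of exploration parameter and the exact aggregation rule in the paper may differ.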
Speakers
Amit Anand Jha (EE19S002)
Electrical Engineering