Appears in collection : Schlumberger workshop - Computational and statistical trade-offs in learning
This talk proposes a complete characterization of the complexity of best-arm identification in one-parameter bandit models. We first give a new, tight lower bound on the sample complexity, that is the total number of draws of the arms needed in order to identify the arm with highest mean with a prescribed accuracy. This lower bound does not take an explicit form, but reveals the existence of a vector of optimal proportions of draws of the arms, that can be computed efficiently. We then propose a 'Track-and-Stop' strategy, whose sample complexity is proved to asymptotically match the lower bound. It consists in a new sampling rule, which tracks the optimal proportions of arm draws, and a stopping rule for which we propose several interpretations and that can be traced back to Chernoff (1959).