Multi-armed bandits and beyond
In this tutorial I will discuss recent advances in the theory of multi-armed bandits and reinforcement learning, focusing in particular on the upper confidence bound (UCB) and Thompson Sampling (TS) techniques for algorithm design and analysis.
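To make the two techniques concrete before we begin, here is a minimal Python sketch of UCB1 and Thompson Sampling on a toy Bernoulli bandit. The arm probabilities, function names, and horizon are illustrative assumptions, not part of the tutorial itself; the UCB1 index and Beta-posterior sampling rule are the standard textbook forms.

```python
import math
import random

def ucb1(probs, horizon):
    """UCB1 on Bernoulli arms; `probs` maps arm -> success probability (assumed toy setup)."""
    arms = list(probs)
    counts = {a: 0 for a in arms}     # pulls per arm
    sums = {a: 0.0 for a in arms}     # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = arms[t - 1]         # initialize: pull each arm once
        else:
            # pick the arm maximizing empirical mean + exploration bonus
            arm = max(arms, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = 1.0 if random.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def thompson(probs, horizon):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms."""
    arms = list(probs)
    alpha = {a: 1.0 for a in arms}    # 1 + observed successes
    beta = {a: 1.0 for a in arms}     # 1 + observed failures
    total = 0.0
    for _ in range(horizon):
        # sample a mean from each arm's posterior, play the argmax
        arm = max(arms, key=lambda a: random.betavariate(alpha[a], beta[a]))
        r = 1.0 if random.random() < probs[arm] else 0.0
        alpha[arm] += r
        beta[arm] += 1.0 - r
        total += r
    return total

if __name__ == "__main__":
    probs = {"A": 0.3, "B": 0.5, "C": 0.7}  # hypothetical arm means
    T = 10_000
    print("UCB1 total reward:    ", ucb1(probs, T))
    print("Thompson total reward:", thompson(probs, T))
```

Both algorithms spend roughly O(log T) pulls on suboptimal arms over a horizon of T rounds; the tutorial will develop the analysis behind such guarantees.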