EE 290: Theory of Multi-armed Bandits and Reinforcement Learning


  • Welcome to the EE 290 course!


  • This course covers the cutting edge of multi-armed bandit and reinforcement learning theory. Topics include: stochastic and adversarial bandits and the best-of-both-worlds phenomenon; the UCB and EXP3 algorithms and their variants; lower bound techniques; exploration bonus design for contextual bandits and RL; connections between Thompson sampling and the frequentist framework; non-asymptotic analysis of TD learning, policy gradient, and Q-learning; fundamental limits of online and offline RL; policy evaluation vs. policy optimization; imitation learning and inverse RL; and theory building for practical algorithms to bridge the gap between theory and practice.
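As a small taste of one topic above, here is a minimal sketch of the UCB1 algorithm on a simulated Bernoulli bandit. The arm means, horizon, and exploration bonus constant below are illustrative assumptions for this sketch, not course material:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given (assumed) arm means."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of pulls of each arm
    totals = [0.0] * k  # cumulative reward from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # pick the arm maximizing empirical mean + confidence bonus
            arm = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts

counts = ucb1([0.3, 0.5, 0.7], horizon=5000)
```

The bonus term shrinks as an arm is pulled more often, so exploration is automatically concentrated on arms whose estimates are still uncertain; over a long horizon the best arm (index 2 here) receives the large majority of pulls.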