Lecture 1 (Introduction to Bandits and Reinforcement Learning)
Lecture 2 (Analysis of Finite-Arm i.i.d.-Reward Bandits)
Lecture 3 (Explore-Then-Commit and Successive Elimination)
Lecture 4 (Analysis of Successive Elimination and UCB Algorithm)
Lecture 5 (Minimax Lower Bound for Finite-Arm Bandit Algorithms)
Lecture 6 (Minimax Lower Bound and Thompson Sampling)
Lecture 7 (Frequentist Regret Bound for Thompson Sampling)
Lecture 8 (Wrapping Up Thompson Sampling and Majority Vote Algorithm)
Lecture 9 (Weighted Majority and Randomized Weighted Majority)
Lecture 10 (Hedge and Exp3 Algorithms)
Lecture 11 (Adversarial Bandits and Exp3)
Lecture 12 (Exp3 with Implicit Exploration)
Lecture 13 (Online Mirror Descent)
Lecture 14 (Introduction to Reinforcement Learning)
Lecture 15 (Interpretations of the Bellman Equation and Optimality Conditions)
Lecture 16 (Planning Algorithms for Markov Decision Processes)
Lecture 17 (Bellman Operators, Policy Iteration, and Value Iteration)
Lecture 18 (R-max Exploration)
Lecture 19 (Offline Learning with Uniform Coverage)
Lecture 20 (Value Iteration and Q-learning)
Lecture 21 (TD Learning with Linear Function Approximation)
Lecture 22 (UCB-VI Algorithm)
Lectures 23 and 24 (Importance Sampling, Doubly Robust Estimator)
Lecture 25 (Imitation Learning)