Lecture | Topic | Notes | Readings | Optional/Additional Readings | Videos | |
Th Aug 27 | Course introduction. | 2pp 6pp | ||||
Tu Sep 1 | Feedforward, Feedback, PID, Control of fully actuated systems | 2pp 6pp code | AM 10.3, 10.4; T 1.2 | |||
Th Sep 3 | Lyapunov, (Energy pumping) | 2pp 6pp | T Ch. 3; SL, Example 3.21 | |||
Tu Sep 8 | Optimal Control, HJB, Discretization | 2pp 6pp | T Ch. 6; SB Ch. 4; Munos and Moore, MLJ 2001 pp.1-9 pdf; Chow and Tsitsiklis, 1991 pdf | Kushner and Dupuis, 1992/2001 | ||
Th Sep 10 | Dynamic programming with function approximation | 2pp 6pp | Gordon, 1995 pdf; Tsitsiklis and Van Roy, 1996 pdf | |||
Tu Sep 15 | No lecture. | |||||
Th Sep 17 | Dynamic programming with function approximation; speed-ups/tweaks | 2pp 6pp | Gordon, 1995 pdf; Tsitsiklis and Van Roy, 1996 pdf | BT 6.5, 6.1; Moore and Atkeson, Prioritized sweeping pdf | ||
Tu Sep 22 | LQR + variations | 2pp 6pp | Tedrake, LQR trees pdf, Atkeson and Stephens pdf, Todorov, 2005 pdf, Anderson and Moore, Optimal Control: Linear Quadratic Methods | |||
Th Sep 24 | MPC, feedback linearization | draft slides: 2pp 6pp | Tedrake Ch. 9 and App. A | Diehl +al., MPC overview pdf; John T. Betts, "Practical Methods for Optimal Control Using Nonlinear Programming," 2001; Slotine and Li Chapter 6; Isidori, "Nonlinear control systems," 1989. | ||
Tu Sep 29 | Bandits | draft notes: pdf, draft slides: 2pp 6pp | Regret-based approaches: Lai and Robbins, 1985; Auer +al, UCB algorithm 1998 pdf; papers on Bayesian exploration in MDPs: Poupart+al, Asmuth+al, Kolter+Ng | |||
Thu Oct 1 | Policy iteration | If you want to read ahead for this and the next few lectures: see chapters 4, 6, 7, 8, 11.1 of SB | SB Chapters 4, 5, 6, 7, 8, 11.1 html | |||
Tue Oct 6 | Example MDPs, Recap some exact methods for MDPs | 2pp 6pp | SB Chapter 4 | |||
Thu Oct 8 | Linear programming; Model-free methods (TD) | 2pp 6pp LP notes | SB Chapter 6 | |||
Tu Oct 13 | TD, Sarsa, Q-learning, TD(λ) | 2pp 6pp | SB Chapters 6, 7 | |||
Thu Oct 15 | TD with function approximation, TD Gammon, other examples | 2pp 6pp | SB Chapters 8, 11.1 | Tsitsiklis and Van Roy, 1997 pdf | ||
Tu Oct 20 | LSTD, LSPI, RLSTD, behavioral cloning | 2pp 6pp | LSPI pdf, Bradtke and Barto, 1996, LSTD pdf, Kolter and Ng, Feature selection in LSTD pdf | |||
Thu Oct 22 | Behavioral cloning, Inverse RL | 2pp 6pp | More complete slides on Inverse RL from Robot Learning Summer School, 2009 pdf | |||
Tu Oct 27 | Inverse RL wrap-up, Policy search | 2pp 6pp | ||||
Thu Oct 29 | Policy search & Actor-Critic | 2pp 6pp | Peters and Schaal, IROS 2006, Policy gradient methods for robotics; Ng and Jordan UAI 2000, an analysis of fixing the random seed in policy search | |||
Tu Nov 3 | An application of likelihood ratio methods: Learning to walk | 2pp 6pp | Lecture by Russ Tedrake on learning to walk; Tedrake, Zhang and Seung, Learning to walk in 20 minutes | toddler, tinker-toy, Cornell kneed-walker | ||
Thu Nov 5 | Natural gradient; briefs on various topics incl. approximate LP, POMDPs, reward shaping, exploration vs. exploitation, hierarchical methods | 2pp 6pp | Kakade, NIPS 2002 Natural policy gradient; Peters and Schaal natural actor critic; Calafiore and Campi constraint sampling; de Farias and Van Roy constraint sampling ALP; Kearns and Singh E3; Ng, Harada and Russell reward shaping; Wiewiora reward shaping equivalence with V, Q initialization; Marthi, Russell and Andre hierarchical Q decomposition; Kober, Peters ball in a cup (NIPS 2008) | ball-in-a-cup-video | ||
Tue Nov 10 | State estimation: HMM, KF | 2pp 6pp | Probabilistic Robotics Chapters 1, 2, 3 | From Gauss to Kalman | ||
Thu Nov 12 | State estimation: KF, EKF, UKF | 2pp 6pp | Probabilistic Robotics Chapter 3 (Gaussian filters), 10 (EKF SLAM), Julier and Uhlmann, the UKF | |||
Tu Nov 17 | State estimation: UKF, particle filter, mapping, localization, SLAM | 2pp 6pp | Probabilistic Robotics Chapter 4 (particle filters), 6 (robot perception), 8 (localization); particle filter tutorial | |||
Thu Nov 19 | State estimation: mapping, SLAM | 2pp 6pp | Probabilistic Robotics Chapters 9 (occupancy grid mapping), 13 (fastSLAM), 11 (graphSLAM) | Bailey and Durrant-Whyte SLAM tutorial: part 1; part 2 | ||
Tue Nov 24 | Quadruped locomotion --- Guest Lecturer: J. Zico Kolter | speaker bio, 2009 slides, 2008 slides | ||||
Thu Nov 26 | Happy Thanksgiving! | |||||
Tue Dec 1 | Project presentations. | Barron, Smith, Liu, Kolev, Lin, Strausser, Chang-Siu, Moldovan, Soerensen | ||||
Thu Dec 3 | Project presentations. | Hoburg, Weekly, Song, Swift, Hunter, Javdani, Berg Kirkpatrick, Singh+Tang, Maitin-Shepard |