CS294 - The Mathematics of Information and Data
Instructor: Ben Recht
Time: Fri 12:00-3:00 PM
Location: 373 Soda Hall
Description:
This course will explore the foundations of an emerging discipline: the mathematics of information and data. Through recent and classic texts in mathematical statistics, optimization, and computer science, we will find unifying themes in these three disciplinary approaches. We will draw connections between how we analyze running time, statistical accuracy, and implementation of data-driven computations. We will focus in particular on large deviation inequalities, convex analysis and their applications in minimax statistics; sparse and stochastic optimization; and discrete and convex geometry. This course is ideal for advanced graduate students who would like to apply these theoretical and algorithmic developments to their own research.
The current list of topics (which will change depending on the course we chart) is:
- Stochastic Optimization
  - stochastic gradients, online learning, and the Kaczmarz algorithm
  - core sets and importance sampling
  - randomized algorithms for linear systems
- Random Matrices
  - elementary analysis of random matrices
  - graph sparsification, frames, and matrix approximation
  - noncommutative Chernoff bounds
- Average-Case Analysis of Optimization Problems
  - covering numbers, VC dimension, Rademacher complexity
  - metric embedding and restricted isometries
  - compressed sensing and all that it has wrought
Grading: Each student will be required to attend class regularly and to either lead the discussion or scribe notes for at least one class.
Prerequisites: Consent of the instructor is required. Graduate-level courses in probability and optimization will be necessary.
Lecture notes template
Session 1 (08/30): Introduction.
Session 2 (09/13): Stochastic gradients, online learning, and the Kaczmarz algorithm.
Discussion Leader: Ben Recht
Scribe: Jonathan Terhorst [notes]
Readings:
- [pdf] Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A. ``Robust stochastic approximation approach to stochastic programming.'' SIAM Journal on Optimization. 19(4), pp. 1574--1609. 2009. Only read pp. 1574--1581.
- [pdf] Strohmer, T. and Vershynin, R. ``A randomized Kaczmarz algorithm with exponential convergence.'' Journal of Fourier Analysis and Applications. 15(2), pp. 262--278. 2009.
- [pdf] Zinkevich, M. ``Online Convex Programming and Generalized Infinitesimal Gradient Ascent.'' In Proceedings of the Twentieth International Conference on Machine Learning, pp. 928--936. 2003.
- [pdf] Bucklew, J. A., Kurtz, T. G., and Sethares, W. A. ``Weak Convergence and Local Stability Properties of Fixed Step Size Recursive Algorithms.'' IEEE Transactions on Information Theory. 39(3), pp. 966--978. 1993.
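For readers new to the Strohmer and Vershynin reading, the update it analyzes fits in a few lines. The following is a minimal pure-Python sketch, not code from the paper: the function name `randomized_kaczmarz`, the starting point at the origin, the fixed iteration count, and the seed are illustrative assumptions. Each step samples row i with probability proportional to its squared norm and projects the current iterate onto the hyperplane defined by that equation.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Hypothetical sketch of randomized Kaczmarz for a consistent system A x = b.

    A is a list of rows (lists of floats) and b is a list of floats.
    Row i is sampled with probability ||a_i||^2 / ||A||_F^2, and the
    iterate is projected onto the hyperplane {x : <a_i, x> = b_i}.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    n = len(A[0])
    x = [0.0] * n  # start from the origin (an assumption)
    row_norms_sq = [sum(v * v for v in row) for row in A]
    rows = range(len(A))
    for _ in range(iters):
        # Sample a row index with probability proportional to its squared norm.
        i = rng.choices(rows, weights=row_norms_sq, k=1)[0]
        a = A[i]
        # Signed residual of the sampled equation at the current iterate.
        residual = b[i] - sum(aj * xj for aj, xj in zip(a, x))
        step = residual / row_norms_sq[i]
        # Orthogonal projection onto the i-th hyperplane.
        x = [xj + step * aj for xj, aj in zip(x, a)]
    return x
```

For example, the consistent system with rows (3, 1), (1, 2), (1, -1) and right-hand side (5, 5, -1) has solution (1, 2), and `randomized_kaczmarz` drives its iterates toward that point. Viewed another way, each step is a stochastic gradient step on a least-squares objective with a step size matched to the sampled row.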
Session 3 (09/20): Core-sets and importance sampling.
Discussion Leader: Nick Alteri and Nick Boyd
Scribe: Lisa Anne Hendricks [notes]
Readings:
- [pdf] Arthur, D. and Vassilvitskii, S. ``k-means++: The advantages of careful seeding.'' In Proceedings of SODA. 2007.
- [pdf] Langberg, M. and Schulman, L. J. ``Universal epsilon-approximators for integrals.'' In Proceedings of SODA. 2010.
- [pdf] Bădoiu, M., Har-Peled, S., and Indyk, P. ``Approximate clustering via core-sets.'' In Proceedings of STOC. 2002.
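The D² sampling rule behind k-means++ seeding from the Arthur and Vassilvitskii reading is short enough to sketch. This is a minimal pure-Python illustration under stated assumptions: `kmeans_pp_seed` is a hypothetical name, points are tuples of floats, and the input is assumed to contain at least k distinct points. It covers only the seeding step, not the Lloyd iterations that follow it.

```python
import random


def kmeans_pp_seed(points, k, seed=0):
    """Hypothetical sketch of k-means++ (D^2) seeding.

    The first center is drawn uniformly at random; each later center is
    drawn with probability proportional to D(x)^2, the squared distance
    from x to the nearest center chosen so far.
    Assumes points contains at least k distinct tuples.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)

    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    centers = [rng.choice(points)]
    # d2[j] is the squared distance from points[j] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Points already chosen have weight 0, so they are never drawn again.
        c = rng.choices(points, weights=d2, k=1)[0]
        centers.append(c)
        # Update nearest-center distances with the newly added center.
        d2 = [min(dj, sq_dist(p, c)) for dj, p in zip(d2, points)]
    return centers
```

On well-separated clusters, the D² weighting makes it very likely (though not certain) that the seeds land in different clusters, which is the intuition behind the paper's approximation guarantee.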
Session 4 (09/27): Noncommutative Chernoff bounds.
Discussion Leader: Ashia Wilson and John Duchi
Scribe: Miles Lopes
Readings:
- [pdf] Ahlswede, R. and Winter, A. ``Strong converse for identification via quantum channels.'' IEEE Transactions on Information Theory. 48, pp. 568--579. 2002. Only read the appendix! Also read Vershynin's notes for more inequalities, context, and commentary.
- [pdf] Oliveira, R. I. ``Sums of random Hermitian matrices and an inequality by Rudelson.'' Electronic Communications in Probability. 15, pp. 203--212. 2010.
Session 5 (10/04): Graph sparsification.
Discussion Leader: TBA
Scribe: TBA
Readings:
- [pdf] Spielman, D. A. and Srivastava, N. ``Graph sparsification by effective resistances.'' SIAM Journal on Computing. 40(6), pp. 1913--1926. 2011.
- [pdf] Batson, J., Spielman, D. A., and Srivastava, N. ``Twice-Ramanujan sparsifiers.'' SIAM Journal on Computing. 41(6), pp. 1704--1721. 2012.
Session 6 (10/11): Diagonally dominant systems.
Discussion Leader: TBA
Scribe: TBA
Readings:
- [pdf] Kelner, J. A., Orecchia, L., Sidford, A., and Zhu, Z. A. ``A simple, combinatorial algorithm for solving SDD systems in nearly-linear time.'' In Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC). 2013.
Session 7 (10/18): Statistical complexity.
Discussion Leader: TBA
Scribe: TBA
Readings:
- [pdf] Bartlett, P. L. and Mendelson, S. ``Rademacher and Gaussian complexities: Risk bounds and structural results.'' Journal of Machine Learning Research. 3, pp. 463--482. 2002.
- [pdf] Bousquet, O. and Elisseeff, A. ``Stability and generalization.'' Journal of Machine Learning Research. 2, pp. 499--526. 2002.
Session 8 (10/25): Inverse Problems 1.
Discussion Leader: TBA
Scribe: TBA
Readings:
- [pdf] Candès, E. J. and Recht, B. ``Simple bounds for recovering low-complexity models.'' Mathematical Programming. 141(1), pp. 577--589. 2013.
- [pdf] Recht, B. ``A simpler approach to matrix completion.'' Journal of Machine Learning Research. 12, pp. 3413--3430. 2011.
Session 9 (11/01): Inverse Problems 2.
Discussion Leader: TBA
Scribe: TBA
Readings:
- [pdf] Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. ``The Convex Geometry of Linear Inverse Problems.'' Foundations of Computational Mathematics. 12(6), pp. 805--849. 2012.
Session 10 (11/08): Learning representations.
Discussion Leader: TBA
Scribe: TBA
Readings:
- [pdf] Arora, S., Ge, R., and Moitra, A. ``New Algorithms for Learning Incoherent and Overcomplete Dictionaries.'' arXiv:1308.6273v1. 2013.
- [pdf] Agarwal, A., Anandkumar, A., and Netrapalli, P. ``Exact Recovery of Sparsely Used Overcomplete Dictionaries.'' arXiv:1309.1952v1. 2013.