Optimization for Machine Learning

The core of contemporary machine learning consists of solving large-scale optimization problems that arise from statistical models. Optimization for Machine Learning is an umbrella project exploring the many facets of optimization that are applicable to machine learning and statistical data analysis. The driving principle is that scaling machine learning may not require new algorithms per se, but rather a better understanding of how to robustly scale existing algorithms to large data sets.
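
Many of the problems studied here fit the regularized empirical risk minimization template (a generic formulation used only for illustration; the particular losses and regularizers vary from paper to paper):

    \min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ell(w;\, x_i, y_i) \;+\; \lambda\, r(w),

where the sum runs over the n training examples, \ell measures how well the parameters w fit a single example, and r is a regularizer that encodes structural assumptions about w.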

Part of this effort aims to understand the theory of modularity and robustness for machine learning pipelines: how can we stitch together many simple, off-the-shelf techniques for data analysis with guarantees on performance and stability? We are building systems that make it easier to assemble such modular pipelines and to prototype the effectiveness of different configurations. And we are working on scaling and parallelizing the backbone algorithms, such as stochastic gradient descent, mapping them onto the architectures commonly used for data analysis.
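
As a point of reference for the papers below, here is a minimal sketch of serial stochastic gradient descent on a least-squares objective; the function name and hyperparameters are illustrative rather than taken from any of these papers. Hogwild!-style methods run the same inner loop on many workers that update a shared weight vector without locking.

    import numpy as np

    def sgd_least_squares(X, y, lr=0.01, epochs=10, seed=0):
        # Minimize (1/2n) * ||X w - y||^2 one randomly ordered example at a time.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):           # visit examples in a random order
                grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of the i-th squared-error term
                w -= lr * grad_i                   # small step against that gradient
        return w

    # Example: recover a planted weight vector from noiseless linear measurements.
    X = np.random.default_rng(1).standard_normal((200, 5))
    w_true = np.arange(5.0)
    w_hat = sgd_least_squares(X, X @ w_true)       # w_hat approaches w_true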

Relevant papers and reports

Asynchronous algorithms

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. In Advances in Neural Information Processing Systems, 2011.

A Perturbed Iterate Framework for Asynchronous Stochastic Optimization Algorithms. Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Kannan Ramchandran, Michael I. Jordan, and Benjamin Recht. Preprint, 2015.

Parallel Correlation Clustering on Big Graphs. Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan. Preprint, 2015.


Applications and theory of the stochastic gradient method

Parallel Stochastic Gradient Algorithms for Large-Scale Matrix Completion. Benjamin Recht and Christopher Re. Mathematical Programming Computation. 5(2):201–226, 2013.

Factoring Nonnegative Matrices with Linear Programs. Victor Bittorf, Benjamin Recht, Christopher Re, and Joel A. Tropp. In Advances in Neural Information Processing Systems, 2012.

Beneath the Valley of the Noncommutative Arithmetic-Geometric Mean Inequality: conjectures, case-studies, and consequences. Benjamin Recht and Christopher Re. In Conference on Learning Theory (COLT), 2012.

Practical Large-Scale Optimization for Max-Norm Regularization. Jason Lee, Benjamin Recht, Ruslan Salakhutdinov, Nathan Srebro, and Joel A. Tropp. In Advances in Neural Information Processing Systems, 2010.

Online Identification and Tracking of Subspaces from Highly Incomplete Information. Laura Balzano, Robert Nowak, and Benjamin Recht. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, 2010.


Automating the analysis of optimization algorithms

A General Analysis of the Convergence of ADMM. Robert Nishihara, Laurent Lessard, Benjamin Recht, Andrew Packard, and Michael I. Jordan. In International Conference on Machine Learning, 2015.

Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints. Laurent Lessard, Andrew Packard, and Benjamin Recht. To appear in SIAM Journal on Optimization, 2015.


Machine Learning Systems

Towards a Unified Architecture for in-RDBMS Analytics. Xixuan Feng, Arun Kumar, Benjamin Recht, and Christopher Re. In Proceedings of the ACM SIGMOD Conference, 2012.


KeystoneML. Evan Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael Franklin, and Benjamin Recht.


Other techniques

The Alternating Descent Conditional Gradient Method for Sparse Inverse Problems. Nicholas Boyd, Geoffrey Schiebinger, and Benjamin Recht. Preprint, 2015.

Query Complexity of Derivative-free Optimization. Kevin Jamieson, Robert Nowak, and Benjamin Recht. In Advances in Neural Information Processing Systems, 2012.

Sharp Time–Data Tradeoffs for Linear Inverse Problems. Samet Oymak, Mahdi Soltanolkotabi, and Benjamin Recht. Preprint, 2015.