Sparse Machine Learning for Large-Scale Text Analytics and
Applications
Consider a large data set of documents (emails, Q&As, news, etc.). What
are the most important keywords? What are examples of questions (or
answers) that illustrate how these keywords are used? What are the
important clusters of documents? Can we automatically name or tag
them? Can we summarize the differences between two sets of
documents? Can we even compare them quickly if they are written in
different languages, without having to translate everything? How do
these summarization techniques allow user-friendly visualizations?
In this lecture, I will describe a set of recently developed machine
learning techniques for large-scale text analytics, which are based on
the idea of sparsity. These highly scalable techniques make it possible
to address some of the challenges raised above. I will provide examples
from specific data sets (flight reports from commercial pilots in the
US, and news data), and hope to encourage a discussion on the
relevance of large-scale text analytics for online learning.
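
To make the idea of sparsity concrete, here is a minimal sketch of one
standard sparse technique, L1-penalized least squares (the lasso), used to
pick the few words that separate two sets of documents. It is an
illustration only, not necessarily the method presented in the lecture. The
function names (soft_threshold, lasso_keywords), the penalty value lam, and
the six toy documents are all assumptions made up for this example.

```python
from collections import Counter


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_keywords(docs, labels, lam, sweeps=50):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X holds raw word counts (one column per word, no intercept).
    Returns {word: weight} for the words whose weight is nonzero.
    """
    vocab = sorted({word for doc in docs for word in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    n = len(docs)
    columns = {word: [c[word] for c in counts] for word in vocab}
    weights = {word: 0.0 for word in vocab}
    residual = [float(y) for y in labels]  # y - Xw, with w = 0 initially
    for _ in range(sweeps):
        for word in vocab:
            col = columns[word]
            old = weights[word]
            # Correlation of this word with the residual, ignoring its own effect.
            rho = sum(x * (r + x * old) for x, r in zip(col, residual)) / n
            scale = sum(x * x for x in col) / n
            new = soft_threshold(rho, lam) / scale
            if new != old:
                residual = [r - x * (new - old) for x, r in zip(col, residual)]
                weights[word] = new
    return {word: w for word, w in weights.items() if w != 0.0}


# Hypothetical toy corpus: three flight-report-like and three news-like snippets.
flight_docs = [
    "engine failure during takeoff",
    "engine warning light during climb",
    "engine vibration reported after takeoff",
]
news_docs = [
    "election results reported today",
    "markets fell after election results",
    "election coverage continues today",
]
docs = flight_docs + news_docs
labels = [1, 1, 1, -1, -1, -1]  # +1 = flight reports, -1 = news

keywords = lasso_keywords(docs, labels, lam=0.4)
for word, weight in sorted(keywords.items(), key=lambda kv: kv[1]):
    print(f"{word}: {weight:+.2f}")
```

With this penalty, only "engine" (positive weight) and "election" (negative
weight) keep a nonzero weight, so the L1 penalty yields a short, readable
summary of the difference between the two sets. Lowering lam lets more
words in, and raising it far enough removes all of them.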