Lecturer: Percy Liang

Date: Sep 24

[Lecture slides]

[printer-friendly version with notes]

- Shawe-Taylor and Cristianini. Kernel Methods for Pattern Analysis. 2004. A generally good introduction to kernel methods. Contains a part on PCA and CCA and their kernelized versions.
- de Bie, et al. Eigenproblems in Pattern Recognition. 2004. Gives some general theory on generalized eigenvalue problems, which encompasses PCA, CCA, FDA, PLS, and MLR. Also gives an iterative algorithm to solve these problems.
- Borga, et al. A unified approach to PCA, PLS, MLR and CCA. Another paper that uses generalized eigenproblems to unify a bunch of dimensionality reduction techniques.
- Derivation of Maximum Likelihood Factor Analysis using EM.
- Chen. Principal Component Analysis With Missing Data and Outliers. Summarizes some techniques to make PCA robust to outliers and handle missing data with a probabilistic model.

- Lee, Seung. Learning the parts of objects with nonnegative matrix factorization.
- Hoyer. Non-negative Matrix Factorization with Sparseness Constraints.
- Lin. Projected Gradient Methods for Non-negative Matrix Factorization.
- Srebro. Learning with Matrix Factorizations.
- Buntine, Jakulin: Discrete Principal Component Analysis.
- Matlab code for NMF.

- Hyvärinen, Karhunen, Oja. Independent Component Analysis (introduction chapter).
- Hyvärinen, Oja. Independent Component Analysis: Algorithms and Applications. 2000.
- Bach, Jordan. Kernel Independent Component Analysis. Some ICA software. FastICA.

- Turk and Pentland. Face recognition using eigenfaces. 1991. Used PCA on images of faces (eigenfaces) followed by a distance-based classifier for face recognition. This work is considered the first successful face recognition system. See also a summary of the techniques written 11 years later.
- Tutorial on component analysis for vision. 2006. Describes applications of PCA and CCA in computer vision.
- Hardoon, et al. Canonical correlation analysis; An overview with application to learning methods. 2003. Applies kernel CCA to two views (one of images and one of text) for image retrieval.
- Yamanishi, et al. Heterogeneous data comparison and gene selection with kernel canonical correlation analysis.
- Matlab code for CCA.
- Schuetze. Distributional POS tagging. 1995. SVD applied in natural language processing. Each data point is the context in which a word token appears in a sentence. A lower-dimensional representation is useful because these context vectors are very high dimensional. K-means clustering is used afterwards.
- Ando, Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. 2005. Interesting use of SVD where we find a low-dimensional representation, not of data points, but of classifiers. The situation here is multi-task learning, where we have many (linear) classifiers and would like to learn some common structure between them.
- Lakhina, et al. Diagnosing Network-Wide Traffic Anomalies. 2004. Uses PCA as a model of "normal network traffic" on links. Traffic that deviates from the subspace is deemed anomalous.
- Larsen, et al. Sensitivity of PCA for Traffic Anomaly Detection. 2007. Studies the limitations of using PCA for detecting network anomalies.
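
The subspace idea described in the Lakhina et al. entry above can be illustrated with a minimal sketch. This is not the method from that paper: the synthetic data, the helper names `fit_pca_subspace` and `residual_norm`, the choice of `k = 2`, and the threshold rule (largest residual seen on the normal data) are all assumptions made for illustration. It fits a PCA subspace to "normal" points and flags a point whose distance from that subspace is unusually large.

```python
import numpy as np

def fit_pca_subspace(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def residual_norm(X, mean, components):
    """Euclidean distance of each row of X from the PCA subspace."""
    centered = X - mean
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)

rng = np.random.default_rng(0)

# Synthetic "normal" data (an assumption): 200 points lying close to a
# 2-dimensional subspace of R^10, plus a little noise.
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))

mean, components = fit_pca_subspace(normal, k=2)

# Hypothetical threshold rule: the largest residual observed on normal data.
threshold = residual_norm(normal, mean, components).max()

# A point pushed well off the subspace by a large random perturbation.
anomaly = normal[:1] + 5.0 * rng.normal(size=(1, 10))
score = residual_norm(anomaly, mean, components)

print("threshold:", threshold)
print("anomaly residual:", score[0], "flagged:", bool(score[0] > threshold))
```

In practice the number of components and the threshold both have to be chosen with care; the Larsen et al. entry above studies the limitations of this kind of PCA-based detector.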