
Principal Component Analysis

From Mathematics Is A Science

[Figure: GaussianScatterPCA.png]

Principal component analysis (PCA) is a mathematical procedure for finding a change of coordinates (a shift of the origin followed by an orthogonal linear transformation) of the $n$-dimensional Euclidean space containing a "point cloud" (a collection of sample points), chosen so that the principal directions of the cloud's spread are aligned with the new coordinate axes.

The concept follows the geometric idea of finding the principal axes of an ellipsoid.
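As a worked illustration, assume $A$ is a symmetric positive definite $n \times n$ matrix with orthonormal eigenvectors $u_m$ and eigenvalues $\lambda_m > 0$ (notation introduced only for this example). In the rotated coordinates $y_m = u_m^{T} x$, the ellipsoid $x^{T} A x = 1$ becomes $$\sum_{m=1}^{n} \lambda_m y_m^2 = 1,$$ so its principal axes point along the eigenvectors $u_m$ and have semi-axis lengths $1/\sqrt{\lambda_m}$. For a roughly Gaussian point cloud with an invertible dispersion matrix $D$ (defined below), the density contours are ellipsoids of this kind with $A$ proportional to $D^{-1}$, so their principal axes are the eigenvectors of $D$.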

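Probabilistically, a random vector is decomposed into uncorrelated linear combinations of its components whose variances are extremal: the first principal component has the largest possible variance, each subsequent one has the largest variance among directions orthogonal to the previous ones, and the last has the smallest.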
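A short derivation of why eigenvectors give uncorrelated components with extremal variances, as a sketch: here $X$ denotes the random vector, $D$ its dispersion (covariance) matrix, and $u_1, \dots, u_n$ orthonormal eigenvectors of $D$ with eigenvalues $\lambda_1 \ge \dots \ge \lambda_n$; this notation is introduced only for the illustration. For a unit vector $u = \sum_m c_m u_m$ with $\sum_m c_m^2 = 1$, $$\operatorname{Var}(u^{T} X) = u^{T} D u = \sum_{m=1}^{n} \lambda_m c_m^2, \qquad \lambda_n \le u^{T} D u \le \lambda_1,$$ because the middle expression is a weighted average of the eigenvalues. The maximum is attained at $u = u_1$ and the minimum at $u = u_n$. Moreover, for $l \ne m$, $$\operatorname{Cov}(u_l^{T} X, u_m^{T} X) = u_l^{T} D u_m = \lambda_m u_l^{T} u_m = 0,$$ so the projections onto different eigenvectors are uncorrelated.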

Such an analysis reveals linear interdependence in the data. If the spread (variance) along an axis is below some threshold, that axis may be ignored, which leads to dimensionality reduction.
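A minimal NumPy sketch of this thresholding step, assuming the points are stored as rows of an $(N, n)$ array. The function name `pca_reduce` and the `threshold` parameter are hypothetical choices for illustration. It centers the cloud, forms the dispersion matrix normalized by $N$ (as in the formula for $D$ below), keeps the principal axes whose variance exceeds the threshold, and returns the coordinates of the points along those axes.

```python
import numpy as np


def pca_reduce(points, threshold):
    """Keep only principal axes whose variance exceeds `threshold` (hypothetical cutoff).

    points: array-like of shape (N, n), one sample point per row.
    Returns (reduced coordinates, kept variances, kept axes as columns).
    """
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)        # origin at the center of mass
    D = centered.T @ centered / len(points)        # dispersion matrix, divided by N
    variances, axes = np.linalg.eigh(D)            # eigenvalues ascending, eigenvectors in columns
    order = np.argsort(variances)[::-1]            # reorder: largest variance first
    variances, axes = variances[order], axes[:, order]
    keep = variances > threshold                   # drop axes with small spread
    return centered @ axes[:, keep], variances[keep], axes[:, keep]


# Usage: a cloud that is almost flat in 3D (tiny spread along the third coordinate).
rng = np.random.default_rng(0)
flat = rng.normal(size=(500, 2))
points = np.column_stack([flat, 0.01 * rng.normal(size=500)])
reduced, kept_variances, kept_axes = pca_reduce(points, threshold=0.1)
print(reduced.shape)  # (500, 2)
```

The threshold here is an assumption the user must choose; a common alternative is to keep just enough axes to account for a fixed fraction of the total variance, which equals the sum of the eigenvalues.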

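Given a point cloud $\{ (x_1^{(k)}, \dots, x_n^{(k)}) : k = 1, \dots, N \}$ with $N$ elements, one starts by placing the new origin at the center of mass of the point cloud and then forms the dispersion (covariance) matrix: $$D_{ij} = \frac{1}{N}\sum_{k=1}^{N} \left(x_i^{(k)}-\langle x_i\rangle\right)\left(x_j^{(k)}-\langle x_j\rangle\right),$$ where $\langle x_i\rangle = \frac{1}{N}\sum_{k=1}^{N} x_i^{(k)}$ is the mean of the $i$-th coordinate. The matrix $D$ is symmetric and positive semidefinite, so it has an orthonormal basis of eigenvectors with nonnegative eigenvalues. The eigenvectors give the principal axes, and the corresponding eigenvalues give the variances of the point cloud along those axes.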
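A direct transcription of the dispersion-matrix formula into Python, written as an explicit sum over the $N$ points for readability. This is a sketch; the helper names `dispersion_matrix` and `principal_axes` are hypothetical. It relies on NumPy's `np.linalg.eigh`, which is meant for symmetric matrices such as $D$.

```python
import numpy as np


def dispersion_matrix(points):
    """D_ij = (1/N) * sum_k (x_i^(k) - <x_i>) (x_j^(k) - <x_j>) for an (N, n) array."""
    points = np.asarray(points, dtype=float)
    N, n = points.shape
    mean = points.mean(axis=0)                 # <x_i> for each coordinate i
    D = np.zeros((n, n))
    for k in range(N):                         # explicit sum over the N sample points
        d = points[k] - mean
        D += np.outer(d, d)
    return D / N


def principal_axes(points):
    """Eigen-decomposition of the dispersion matrix, largest variance first."""
    D = dispersion_matrix(points)
    variances, axes = np.linalg.eigh(D)        # D is symmetric, so eigh applies
    order = np.argsort(variances)[::-1]        # sort by decreasing variance
    return variances[order], axes[:, order]    # axes[:, m] is the m-th principal axis


# Usage: six points spread mostly along the diagonal x = y.
pts = np.array([[-2, -2], [-1, -1], [1, 1], [2, 2], [1, -1], [-1, 1]], dtype=float)
variances, axes = principal_axes(pts)
print(variances)
print(axes)
```

For these six points the mean is zero and $D = \begin{pmatrix} 2 & 4/3 \\ 4/3 & 2 \end{pmatrix}$, so the variances are $10/3$ and $2/3$, along the axes $(1,1)/\sqrt{2}$ and $(1,-1)/\sqrt{2}$ (each determined only up to sign).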

Related: Topological data analysis.