
Geometry of data

From Mathematics Is A Science

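A standard approach to dimensionality reduction is Principal Component Analysis (PCA), which finds a "basis" for the dataset: a few orthogonal directions that span a subspace of the ambient space, chosen to capture as much of the data's variance as possible and thereby reveal its structure.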
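As an illustration, here is a minimal sketch of PCA computed via the singular value decomposition of the centered data. The function name `pca` and the toy dataset (points scattered near the line y = 2x) are assumptions made for this example only; NumPy is assumed to be available.

```python
import numpy as np

def pca(points, n_components):
    """Return (mean, components, explained_variance) for the rows of `points`."""
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # The rows of vt are orthonormal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Sample variance of the data along each kept direction.
    explained_variance = (s[:n_components] ** 2) / (len(X) - 1)
    return mean, components, explained_variance

# Hypothetical toy data: 200 points in the plane, close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])

mean, components, variance = pca(data, n_components=1)
direction = components[0]  # unit vector, parallel to (1, 2) up to sign and noise
```

The single direction recovered here is the one-dimensional "basis" of the dataset; projecting the centered data onto it gives the reduced coordinates.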

Another standard approach, used to assign weights to the edges in the graph of the data, is to find the k nearest neighbors of each node and then use the Euclidean distances to those neighbors as the weights.
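The construction just described can be sketched as follows. This is a brute-force version for small datasets; the function name `knn_graph` and the four sample points are assumptions for the example.

```python
import math

def knn_graph(points, k):
    """Return {i: [(j, distance), ...]} linking each point to its k nearest neighbors."""
    graph = {}
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point.
        distances = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        distances.sort()
        # Keep the k closest; the distance becomes the edge weight.
        graph[i] = [(j, d) for d, j in distances[:k]]
    return graph

# Hypothetical sample points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(points, k=2)
```

Note that the resulting graph is directed in general: in this sample, point 2 is among the nearest neighbors of point 3, but point 3 is not among the nearest neighbors of point 2.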

So, the metric is defined locally and then extended globally. But why does the metric have to be Euclidean? For data, the Euclidean distance is meaningless (why squares and square roots?). Next, why does the dataset have to be a metric space in the first place? The coordinate system is made up, too. And there are non-Euclidean topologies on the Euclidean space, even with the Euclidean topology on each of the coordinates.
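One common way to extend a local metric globally, used for example in Isomap-style methods, is to define the distance between two nodes as the length of the shortest path between them in the weighted graph. The sketch below assumes that reading; the function name `graph_distances` and the sample edge weights are hypothetical.

```python
import heapq

def graph_distances(edges, source):
    """Dijkstra's algorithm: shortest-path lengths from `source` over weighted edges."""
    # Build an undirected adjacency list from (u, v, weight) triples.
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > best.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in adjacency.get(u, []):
            candidate = d + w
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                heapq.heappush(queue, (candidate, v))
    return best

# Hypothetical local distances: a chain 0 - 1 - 2 - 3 with no long-range edges.
edges = [(0, 1, 1.0), (1, 2, 1.5), (2, 3, 2.0)]
distances = graph_distances(edges, source=0)
```

Only neighboring nodes are assigned a distance directly; the distance from node 0 to node 3 is obtained by adding up the local distances along the path.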

See also Topology of data.