This site is devoted to mathematics and its applications. Created and run by Peter Saveliev.

Persistent homology

From Mathematics Is A Science
Jump to navigationJump to search
See also Seurat’s painting and persistent homology

To evaluate topology in the presence of noise one has to take into account its robustness. The measurement for this robustness is called persistence.

Filtration is a sequence of "nested" cell complexes, where the arrows are inclusions:


Then we have a sequence of homology groups connected by homomorphisms:

$$H_{∗}(K¹)→H_{∗}(K²)→… →H_{∗}(K^{s})→0.$$

The persistence a homology class can be evaluated as its life-span: how long does it take for these homomorphisms to map this class to 0.

Why is this such a good idea? Because this is the only way to deal with noise in a purely topological way.

The standard definition of persistent homology

Persistent homology is commonly defined on the chain level: the p-persistence kth homology group of \(K^{i}\) is $$H_{k}^{i,p}(K^{i})=Z_{k}^{i}/(B_{k}^{i+p}∩Z_{k}^{i}),$$ where

  • \(Z_{k}^{i}\) is the group of k-cycles in \(K^{i}\) and
  • \(B_{k}^{i+p}\) is the group of k-boundaries in \(K^{i+p}\).

Persistent Betti numbers are the ranks of the persistent homology groups.

This is the standard approach to persistence.

Background in homology

Suppose we are given a point cloud $K$ in a euclidean space of dimension $d$. Suppose also that we are given a threshold $r$ so that any two points within $r$ from each other are to be considered "close". Then each pair of points like that is connected by an edge. Further, if three points are connected to each other, pairwise, by edges, we add a face spanned by these edges. If there are four, we add a tetrahedron, etc. From vertices to tetrahedra and beyond, we call them $0$-, $1$-, $2$-, ..., $d$-cell. The result is a cell complex made of $k$-cells that are attached to each other along $(k-1)$-cells. (Note that there are many ways to build a cell complex from a point cloud.)

What do we want to learn about $K$? We want to quantify its topology by means of the so-called Betti numbers: $B_{0}$ is the number of connected components in $K$; $B_{1}$ is the number of holes or tunnels (1 for letter O or the donut; 2 for letter B and the torus); $B_{2}$ is the number of voids or cavities (1 for both the sphere and the torus), etc.

How does one compute Betti numbers? The methods come from homology theory. One starts by considering the collection $C_{k}(K)$ of all combinations of cells of the same dimension $k,$ called chains. Together they form a chain complex $C_{\ast}(K).$ A $k$-chain can be recorded as an $N$-vector, where $N$ is the total number of $k$-cells in $K$. As an illustration, a component of this vector is $1$ if the corresponding cell is present, $-1$ if it present with the opposite orientation, and $0$ if it is absent. The boundary of a $k$-chain is the chain comprised of all $(k-1)$-faces of its cells. Then the boundary operator $\partial :C_{k}(K)\rightarrow C_{k-1}(K)$ acts on the chain complex and is represented by a matrix.

From the chain complex $C_{\ast}(K)$ the homology group is built by means of the standard tools of linear algebra. To capture the topological features one concentrates on cycles, i.e., chains with zero boundary, $\partial A=0$. Further, one can verify whether two given $k$-cycles $A$ and $B$ are homologous: the difference between them is the boundary of a $(k+1)$-chain $T:A-B=\partial T$. In this case, $A$ and $B$ belong to the same homology class $H=[A]=[B]$. Examples of homologous $A$ and $B$ are: two vertices in the same component of any complex ($k=0$); two longitudes of the torus, but not a longitude and a latitude ($k=1$); the inner surface and the outer surface of a "thick" sphere ($k=2$). The totality of these equivalence classes in each dimension $k$ form the $k$-th homology groups $H_{k}(K)$ of $K$, collectively $H_{\ast}(K)$. Commonly, $H_{k}(K)$ is simply a vector space and its dimension is equal to the corresponding Betti number $B_{k}$.

In addition to computing the Betti numbers as the global topological characteristics of the complex, homology theory can also provide local information. For every small patch $P$ of $K$ we compute the relative homology $H_{\ast}(K,K\backslash P)$ by, essentially, collapsing the complement of $P$ to a single point. Then, if $K$ is a manifold, its dimension is equal to $n$ provided: $H_{n}(K,K\backslash P)\neq0$ and $H_{k}(K,K\backslash P)=0$ for $k\neq n.$

"Noise-less" homology

The methods for computing homology groups are well developed. In real life however the point clouds are noisy and one needs to evaluate the significance of the topological features in this uncertain environment. We measure the "robustness" of the Betti numbers as follows. Instead of using a single threshold and studying a single cell complex, we consider all thresholds and all possible cell complexes. The idea is to analyze all of them, then pool the topological features together in a single structure, and finally pick the features that lie within the user's choice of an acceptable level of noise. Homology theory provides a solution for this problem.

First we combine the homology groups of all the cell complexes into one structure. For example, we may have a sequence of complexes: \[ K^{1}\hookrightarrow K^{2}\hookrightarrow K^{3}\hookrightarrow K^{4}% \hookrightarrow\ldots\ \hookrightarrow K^{s}, \] where the arrows are inclusions: $i^{n,n+1}:K^{n}\hookrightarrow K^{n+1},$ and let $i^{n,m}:K^{n}\hookrightarrow K^{m}$ be defined as the compositions. This structure $\{K^{n}\}$ is called a filtration. Further, each of these inclusions generates a homology homomorphism: $i_{\ast}% ^{n,m}:H_{\ast}(K^{n})\hookrightarrow H_{\ast}(K^{m}).$ Commonly, these are simply linear operators represented by matrices. Then we have a sequence of homology groups connected by homomorphisms: \[ H_{\ast}(K^{1})\rightarrow H_{\ast}(K^{2})\rightarrow\ldots\ \rightarrow H_{\ast}(K^{s})\longrightarrow0. \] The homology group of filtration $\{K^{n}\}$ captures all homology classes of all the complexes in a compact way: \[ H_{\ast}(\{K^{n}\})=\ker i_{\ast}^{1,2}\oplus\ker\,i_{\ast}^{2,3}\oplus\ker i_{\ast}^{3,4}\oplus\ldots\oplus\ker i_{\ast}^{s,s+1}. \] Indeed, from the homology group of each complex we take only the elements that are about to die (go to $0$). Since each dies only once, there is no double-counting. Since the sequence ends with $0,$ we know that everyone will die eventually. Thus every homology class appears once and only once.

"Denoised" homology

Next, the significance of a homology class in the sequence is measured by how long it lives before it ends up at $0$. This number $p$ is called the persistence of the homology class $x: i_{\ast}^{n,n+p}(x)=0$ and $i_{\ast}^{n,n+p-1}(x)\neq0.$ Given a positive integer $p,$ the $p$-noise group $N_{\ast}^{p}(\{K^{n}\})$ is comprised of the homology classes with the persistence less than $p:$ \[ N_{\ast}^{p}(\{K^{n}\})=\ker i_{\ast}^{1,1+p}\oplus\ker\,i_{\ast}% ^{2,2+p}\oplus\ker i_{\ast}^{3,3+p}\oplus\ldots\oplus\ker i_{\ast}^{s,s+p}, \] and the $p$-persistent homology group is \[ H_{\ast}^{p}(\{K^{n}\})=H_{\ast}(\{K^{n}\})/N_{\ast}^{p}(\{K^{n}\}). \] In other words: if the difference between two homology classes is deemed noise, they are equivalent.

A two-parameter "filtration" $\{K^{n,m}\}$ can be built from a point cloud: $n=n(r),$ where $r$ is the radius of the ball, and $m=m(d),$ where $d$ is the density of the cloud. Homology theory allows us to handle this situation as well. Suppose we have a partially ordered set $I$ and a collection of complexes indexed by $I$, $\{K^{n}\}_{n\in I},$ with the inclusions $i^{n,m}:K^{n}\rightarrow K^{m}$ for all $n\leq m.$ Then the homology of this "poset filtration" is \[ H_{\ast}(\{K^{n}\})=% %TCIMACRO{\tbigoplus \nolimits_{n}}% %BeginExpansion {\textstyle\bigoplus\nolimits_{n}} %EndExpansion% %TCIMACRO{\tbigcap \nolimits_{m\geq n}}% %BeginExpansion {\textstyle\bigcap\nolimits_{m\geq n}} %EndExpansion \ker\,i_{\ast}^{n,m}. \] The noise groups and the persistence groups are defined as before. Color images will produce 3-parameter filtrations.

More details in a draft paper of mine Homology groups of filtrations.