Our image analysis algorithm, and our image analysis software, were created initially to illustrate and test how an elementary tool of algebraic topology, homology theory, can be used in computer vision. The idea is that, just as Mathematics rests on Topology (and Algebra), Computer Vision should stand on a simple topological substructure (a little history here). Using Pixcavator as such a platform, one can build more advanced image analysis applications.
The scope of the project grew as soon as I realized that computer vision lacks a universally accepted foundation. I think there are many reasons, but the biggest one is the emphasis on competing "solutions", "methods", and "algorithms". Suppose you have many solutions for a computer vision problem. That may sound good to the developer, but is it good for the user? As a user, I don't mind having different ways to solve my problem, but only if they all give the same answer! So, here is the first principle we will follow in this project.
Of course, for each WHAT there may be several HOWs and we need to find the best one. But for now just one will do. So, here is the best part:
But what is this WHAT? It is mathematics. We build our algorithms based entirely on the mathematical understanding of the problems of computer vision (which incidentally forces us to stay away for now from things like AI, machine learning, pattern recognition, fuzzy logic, etc). We develop our methods from scratch starting from the most elementary (or most fundamental if you prefer - I do) and try to do it in such a way that you wouldn’t have to redo it.
The algorithm detects and captures objects in images. But what is an object? The answer is quite obvious in the case of a binary image: it is either a connected cluster of black pixels on a white background or a connected cluster of white pixels on a black background. What about a gray scale image? Our approach follows human perception: a dark region on a light background may be an object, and so may a light region on a darker background. For more, see Objects in gray scale images (this approach is also valid for color images). Numerous real life applications of this approach are given here.
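To make the binary case concrete, here is a minimal sketch of finding "objects" as connected clusters of pixels. This is an illustration of the definition above, not Pixcavator's actual implementation; the function name and the 4-connectivity choice are my own assumptions.

```python
from collections import deque

def connected_components(image, value=1):
    """Collect 4-connected clusters of `value` pixels in a binary image.

    `image` is a list of rows of 0/1 ints. Returns a list of components,
    each a set of (row, col) pixel coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != value or (r, c) in seen:
                continue
            # breadth-first flood fill from this seed pixel
            comp, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                comp.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == value
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(comp)
    return components

# Two black clusters (1s) on a white background (0s):
img = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
print(len(connected_components(img)))  # -> 2
```

Running it with `value=0` would instead capture the white clusters, the other half of the binary definition.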
It is clear that these "objects" aren't real objects. For example, black pants and a white shirt will be two separate objects. Then what's the point? This is the point:
In other words, we are taking care of the very first step in image analysis (see also Fields related to computer vision). The flip side of this is the following principle:
The point is that we need to analyze the image in such a way that a single-pixel variation of the image is negligible. Another way to look at this is that, as the resolution increases, the analysis results should "converge" to the analysis results of the real scene depicted in the image.
The existing methods of algebraic topology apply only to objects and images that have no attributes such as gray level, color, or time, i.e., still binary images. Instead of trying to generalize these methods for gray scale, then for color images, then for videos, etc, we adopt the following general approach to "parametric images".
These images are acquired from the original via thresholding. This approach leads to another general principle:
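The thresholding idea can be sketched as follows: for each gray level t, take the binary image of all pixels at least as dark as t, and watch how its objects appear and merge as t grows. This is only an illustrative sketch under my own naming; the actual gray values and the component-counting helper are made up for the example.

```python
def threshold(image, t):
    """Binary 'lower-level' image: pixels with gray value <= t become 1."""
    return [[1 if p <= t else 0 for p in row] for row in image]

def count_components(binary):
    """Count 4-connected clusters of 1-pixels via iterative flood fill."""
    rows, cols = len(binary), len(binary[0])
    seen, count = set(), 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or (r, c) in seen:
                continue
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count

# A tiny gray scale image: two dark spots (10 and 20) on a light background (200).
gray = [
    [200, 10, 200, 200],
    [200, 200, 200, 20],
    [200, 200, 200, 20],
]
for t in (0, 15, 50, 255):
    print(t, count_components(threshold(gray, t)))
```

As t sweeps from 0 to 255 the counts go 0, 1, 2, 1: the first dark spot appears, then the second, and finally everything merges into one region, which is exactly the "parametric" family of binary images the text describes.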
The mathematical tools that we use make the following goal possible.
This term is justified by the fact that nothing is removed from or ignored in the image unless specifically requested by the user. The user retains complete control of what happens! (See also Computation error.)
What the user chooses to keep is preserved without deformation, smoothing, blurring, etc. There is also no iteration, no approximation (almost), no interpolation, and no floating point arithmetic! This is one of many things that differentiate our approach from those common in the computer vision and image processing industry. Patent pending...
The algorithm detects objects in the image and finds their locations and measurements. Its first version is for Binary Images and its second for Grayscale Images. Extensions of the algorithm are in progress for the following:
As a mathematician I am always skeptical about the applicability of a particular piece of mathematics to a particular real life problem. So, a couple of disclaimers.
First, the losslessness of the topological analysis has a flip side. The algorithm will never treat two objects as one, no matter how close they are (that is rarely an issue in gray scale images, though). For example, a thin scratch cuts an object in two, and a person's shirt and pants are always treated separately. So, the first limitation of our approach is the following.
The simplest way to group two adjacent objects into one is via dilation/erosion. At this time, this part of our approach has not been sufficiently developed. In the near future we will address how morphological operations can help evaluate the Robustness of topology.
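As a sketch of that grouping idea, here is one step of morphological dilation on a binary image, enough to close a one-pixel gap such as the "thin scratch" mentioned above. This is a generic textbook operation written for illustration, not Pixcavator's code; the 3x3 structuring element is my own choice.

```python
def dilate(binary):
    """One step of dilation with a 3x3 square structuring element:
    every pixel that touches a 1-pixel (including diagonally) becomes 1."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # scan the 3x3 neighborhood, clipped at the image border
            if any(binary[y][x]
                   for y in range(max(0, r - 1), min(rows, r + 2))
                   for x in range(max(0, c - 1), min(cols, c + 2))):
                out[r][c] = 1
    return out

# Two 1-pixel objects separated by a one-pixel gap (a "thin scratch"):
img = [[1, 0, 1]]
print(dilate(img))  # the gap closes: [[1, 1, 1]]
```

After the dilation the two clusters form a single connected component; an erosion step would then shrink the merged object back toward its original size.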
The second limitation is more profound.
This means that the analysis is meaningful only when the image can be interpreted as 2-dimensional. (Of course, we do extract 3D information from multiple 2D images of the same scene - via stereo vision.) For examples, see Images appropriate for analysis.
Finally, why would this "theory" ever be of any use? For the answer I'll refer you to this old essay The Unreasonable Effectiveness of Mathematics in the Natural Sciences (or consider Mathematics: Queen and Servant of Science).