Gestalt and computer vision

These notes follow the book From Gestalt Theory to Image Analysis: A Probabilistic Approach by Desolneux, Moisan, and Morel.

Gestalt is a psychological theory of the mind. It also has an image analysis angle, since Gestalt is the German word for "form" or "shape". In the introduction the book presents a few Gestalt principles and gives them a mathematical interpretation. I found one principle especially relevant.

Wertheimer’s contrast invariance principle: Image interpretation does not depend on actual values of the gray levels, but only their relative values.

As the book further explains, the principle comes from the fact that one shouldn’t expect or rely on precise measurements of intensity. Here's my example:

The second part of the principle suggests that one should look at the level sets of the gray scale function, as well as its upper and lower level sets. In the blurred image above, the circle is still recognizable despite the low contrast. However, it is ambiguous which level set should be chosen to evaluate the size of the circle. (See examples of image analysis that have to deal with this ambiguity.)
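Here is a minimal sketch of why level sets fit the principle. The toy image, the contrast map, and the function names are my own assumptions, not taken from the book. The family of upper level sets is unchanged by any strictly increasing change of gray values, while the measured size of the object still depends on the level chosen.

```python
def upper_level_set(image, level):
    """Pixels (row, col) whose gray value is at least `level`."""
    return frozenset(
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= level
    )


def level_set_family(image):
    """All distinct upper level sets, one per gray value present in the image."""
    levels = sorted({v for row in image for v in row})
    return {upper_level_set(image, lam) for lam in levels}


# Toy 5x5 "blurred disk": bright center fading outward (made-up values).
image = [
    [10, 20, 30, 20, 10],
    [20, 60, 90, 60, 20],
    [30, 90, 120, 90, 30],
    [20, 60, 90, 60, 20],
    [10, 20, 30, 20, 10],
]


def change_contrast(value):
    # Any strictly increasing map would do; this one is an arbitrary choice.
    return 0.5 * value + 0.002 * value ** 2


remapped = [[change_contrast(v) for v in row] for row in image]

# The same shapes appear in both images, only at different gray levels.
same_family = level_set_family(image) == level_set_family(remapped)
print(same_family)

# The ambiguity: the area of the "disk" depends on the chosen level.
areas = {lam: len(upper_level_set(image, lam)) for lam in (30, 60, 90)}
print(areas)
```

The comparison ignores the levels themselves and looks only at the sets, which is the sense in which only relative gray values matter.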

So far, so good. Unfortunately, the authors next concentrate exclusively on upper level sets. This is a common approach. The result is that you recognize only light objects on a dark background. Seeing dark on light takes an extra step (inverting the colors). Meanwhile, dealing with images that have objects with holes (or dark spots on light objects) becomes really messy. Pixcavator's algorithm builds the hierarchy of dark and light objects in one sweep (see Topology graph). To experiment with the concepts, download the free Pixcavator Student Edition.
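To illustrate what upper level sets alone miss, here is a small sketch. It is not Pixcavator's algorithm and it does not build a hierarchy. It only counts connected components at a single threshold, on a made-up image of a light square with a dark hole. The function name, the threshold, and the use of 8-connectivity for light pixels and 4-connectivity for dark pixels are my assumptions.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(mask, neighbors):
    """Count connected components of True cells in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# 7x7 image: dark background (0), light 5x5 square (200), dark hole in its center.
image = [[0] * 7 for _ in range(7)]
for r in range(1, 6):
    for c in range(1, 6):
        image[r][c] = 200
image[3][3] = 0

threshold = 100
upper = [[v >= threshold for v in row] for row in image]  # light pixels
lower = [[v < threshold for v in row] for row in image]   # dark pixels

light_objects = count_components(upper, N8)
dark_objects = count_components(lower, N4)
print(light_objects, dark_objects)
```

The upper level set sees one light object and says nothing about the hole. The hole shows up only as a second dark component in the lower level set.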

The book isn’t really about Wertheimer’s principle but about another one (more of a definition).

Helmholtz principle: Gestalts are sets of points whose (geometric regular) spatial arrangement could not occur in noise.
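As I understand it, the book makes this quantitative by counting false alarms: an arrangement is meaningful when the expected number of times it would occur in pure noise is small. Below is a rough sketch of that idea using a binomial noise model. The function names and the numbers in the example (10,000 tests, 20 points, probability 1/16) are illustrative assumptions, not values I am quoting from the book.

```python
from math import comb


def binomial_tail(n, k, p):
    """P[at least k successes among n independent trials of probability p]."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def number_of_false_alarms(n_tests, n, k, p):
    """Expected number of candidates, among n_tests, that would show at least
    k agreeing points out of n if the image were pure noise."""
    return n_tests * binomial_tail(n, k, p)


def is_meaningful(n_tests, n, k, p, epsilon=1.0):
    """Flag the event when it is expected fewer than epsilon times in noise."""
    return number_of_false_alarms(n_tests, n, k, p) < epsilon


# Hypothetical setting: 10,000 candidate arrangements, each with 20 points,
# each point agreeing with the arrangement by chance with probability 1/16.
weak = is_meaningful(10_000, 20, 2, 1 / 16)     # 2 of 20 points agree
strong = is_meaningful(10_000, 20, 12, 1 / 16)  # 12 of 20 points agree
print(weak, strong)
```

Two agreeing points out of twenty are easily explained by noise, while twelve are not, so only the second arrangement would count as a Gestalt under this reading.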

Two Gestalt laws can be used to explain some optical illusions.

The amodal completion law: “[W]hen a curve stops another curve, thus creating a “T-junction”… our perception tends to interpret the interrupted curve as the boundary of some object undergoing occlusion.” This law is also related to the good continuation law.

The Penrose triangle and the fork are illusions (confusions?) caused by the perceived depth in the image, locally:

Clearly, this should be interpreted as the assumed differentiability (smoothness) of the first curve...

The perspective law: “Whenever several concurring lines appear in an image, the meeting point is perceived as a vanishing point (point of infinity) in a 3-D scene. The concurring lines are then perceived as parallel lines in space.” (Sounds reasonable, but how come all parallel lines are man-made?)
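The geometry behind the perspective law can be checked with a pinhole camera model, which is my assumption here, not something stated in the quote. Points moving along 3D lines that share a direction (dx, dy, dz) project to image points that approach (f·dx/dz, f·dy/dz), whatever the starting point. The names and numbers below are made up for the illustration.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z), z > 0, onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def point_on_line(start, direction, t):
    """The point start + t * direction."""
    return tuple(s + t * d for s, d in zip(start, direction))


direction = (1.0, 0.0, 2.0)  # shared direction of two parallel 3D lines
starts = [(-1.0, -1.0, 1.0), (2.0, 3.0, 5.0)]

# Predicted vanishing point depends only on the direction.
vanishing_point = (direction[0] / direction[2], direction[1] / direction[2])

# Follow each line far away from the camera and project.
far_images = [project(point_on_line(s, direction, 1e6)) for s in starts]
print(vanishing_point, far_images)
```

Both far projections land next to the same image point, which is the meeting point the law asks us to read as a vanishing point.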

The Sander illusion (the left diagonal appears longer than the right one) and the Müller-Lyer illusion (the middle arrow appears longer) are caused by the perceived depth in the image:

I’d also add the Ponzo illusion (the "farther" bar appears longer than the "closer" one):

Also, remember Willy Wonka’s door?..

To summarize, both laws state that a person always sees 3D in a 2D image. But the fact is, one 2D image may correspond to many different 3D situations – including the drawing itself! That’s what causes the illusions.
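The ambiguity is easy to reproduce with a pinhole projection (again my own toy model, with made-up coordinates): a short bar nearby, a long bar far away, and a flat drawing of a bar all produce the same image.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z), z > 0, onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def project_bar(endpoints):
    """Project both endpoints of a bar."""
    return [project(p) for p in endpoints]


near_bar = [(-1.0, 0.0, 2.0), (1.0, 0.0, 2.0)]  # length 2 at depth 2
far_bar = [(-3.0, 0.0, 6.0), (3.0, 0.0, 6.0)]   # length 6 at depth 6
drawing = [(-0.5, 0.0, 1.0), (0.5, 0.0, 1.0)]   # length 1, flat on the image plane

images = [project_bar(bar) for bar in (near_bar, far_bar, drawing)]
all_identical = images[0] == images[1] == images[2]
print(images[0], all_identical)
```

Nothing in the image alone tells the three scenes apart, so any depth the viewer perceives is an added assumption.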

So, these are interesting ideas that provide excellent explanations for the illusions. However, is it a good idea to try to design a computer vision system based on these laws? You don’t want to rely on a system that is so easy to fool…

For more illusions, see also Human vision vs. computer vision.
