
Object recognition

From Mathematics Is A Science
Revision as of 22:59, 6 September 2010 by imported>WikiSysop

Under construction...

Object recognition is related to, but very different from, "image recognition". The latter is essentially image-to-image search.

So far there are just a few examples: Numenta (below) and others that attempt object recognition for image search.

Numenta

The demo program is called Vision4 and was created by Numenta. This is its main point:

This program demonstrates some capabilities of Numenta's Hierarchical Temporal Memory (HTM) technology applied to visual object recognition. ... The HTM network contained in this demo has been trained to recognize four types of objects: cell phones, sailboats, cows, and rubber ducks.

Every image is given four ratings. Each represents how much the image resembles one of the four types.

As you can see, the goal is modest, and there are no unsubstantiated claims that this is ready to be applied in real life (and don’t get me started on academic publications!). This is refreshing. The program is also fun to play with. You can load your own images, add noise, blur, etc., and see the effect on the recognition. The recognition results are often good, and when they aren’t, it’s still interesting.

It is unclear, though, where this is going for serious purposes.

It’s fine with me that there are only four categories; just one would be enough to test the concept. It does not bother me when a face is rated high in the cow category and another face high in the duck category. My main complaint is the instability of recognition under image transformations. For example, after I rotated “sailboat” by a few degrees, it became “cell phone”. A few degrees more and it became mixed, half “cow” (first image below). Adding noise, occlusion, etc. has a similar effect (second image).
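One way to put a number on this kind of instability is to count how often the top-rated category survives a transformation. The sketch below is only an illustration: `classify` stands in for any recognizer that returns one rating per category (it is a hypothetical callable, not Numenta's API), and the transformations are whatever functions the tester supplies.

```python
from typing import Callable, Dict, Iterable, TypeVar

Image = TypeVar("Image")
Ratings = Dict[str, float]  # category name -> rating


def top_label(ratings: Ratings) -> str:
    """Return the category with the highest rating."""
    return max(ratings, key=ratings.get)


def stability(
    image: Image,
    classify: Callable[[Image], Ratings],  # hypothetical recognizer
    transforms: Iterable[Callable[[Image], Image]],
) -> float:
    """Fraction of transforms that leave the top-rated category unchanged."""
    baseline = top_label(classify(image))
    results = [top_label(classify(t(image))) == baseline for t in transforms]
    if not results:
        raise ValueError("at least one transform is required")
    return sum(results) / len(results)
```

A score of 1.0 would mean that every supplied rotation, blur, or noise step kept the original category; the behavior described above would score well below that.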

[Image: Numenta screenshot 1.jpg] [Image: Numenta screenshot 2.jpg]

Certainly, one does not expect rotations to affect image recognition. Meanwhile, a mixed recognition is a failed recognition and should be presented as such.
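Presenting a mixed result as a failure could be as simple as requiring a clear margin between the best and second-best rating. This is a minimal sketch under assumptions: the ratings are taken to be numbers on a common scale such as 0 to 1, and the function name `decide` and the threshold `min_margin` are made up for illustration, not taken from the demo.

```python
from typing import Dict, Optional


def decide(ratings: Dict[str, float], min_margin: float = 0.3) -> Optional[str]:
    """Return the winning category, or None when the result is mixed.

    min_margin is an assumed threshold, not a value from the Vision4 demo.
    """
    if not ratings:
        return None
    # Sort categories from highest to lowest rating.
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score - runner_up < min_margin:
        return None  # mixed recognition: report it as a failure
    return best_label
```

With this rule, an image rated about equally “cow” and “cell phone” is reported as unrecognized instead of being assigned to whichever category happens to be slightly ahead.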

I am certainly biased here. I don’t believe in “build[ing] machines that work on principles used by the brain”. I don’t believe in trying to imitate the brain and have written about that a few times. Traditionally, a scientist tries to understand nature by observing it, analyzing it, etc. Instead, it is suggested that we try to understand nature by first understanding how the brain understands it? Seems like a roundabout route to me, bordering on a vicious circle. I also have serious reservations about the use of machine learning in computer vision.

Annoying bug: every time I start the program, it turns on my webcam and keeps it on even after I shut the program down.