
Stereo vision

From Mathematics Is A Science
Stereo vision is one of the ways of extracting 3D information from 2D images. A single image has only two dimensions; the second image supplies an extra parameter, which you can think of as the third dimension.


Left.gif Right.gif

Taking two images of the same scene from two (slightly) different locations and then finding the same item in both of them gives you the distance to the item.

The image matching part is crucial and is the more challenging of the two. For example, in the image on the right the corner of the cube may be a good pixel to choose, but how would the computer know? The rest of the image is mostly featureless. The geometry, by contrast, is simple; it is worked out below.

Suppose we established a match between a pixel P in image I and pixel Q in image J. Let's find the distance to whatever these pixels depict.

Two images with a red pixel in each image representing the same thing:

Stereo vision 1.jpg

We need to consider only the horizontal line through P and Q to find the distance to the object with the red dot:

Stereo vision 2.jpg
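The article takes the match between P and Q as given. As a rough illustration of how such a match might be searched for along a single horizontal line, here is a minimal sketch that compares small windows of gray levels by their sum of squared differences (SSD). This is a common textbook technique, not necessarily the one used for the images above; the function name `match_along_row`, the window size, and the sample gray levels are all hypothetical.

```python
def match_along_row(row_i, row_j, p, half_window=2):
    """Find the column q in row_j whose neighborhood best matches column p of row_i.

    The comparison is the sum of squared differences (SSD) over a window of
    2 * half_window + 1 pixels. Returns (q, cost); ties go to the smallest q.
    """
    w = half_window
    if not (w <= p < len(row_i) - w):
        raise ValueError("p is too close to the edge for this window")
    patch = row_i[p - w : p + w + 1]
    best_q, best_cost = None, float("inf")
    for q in range(w, len(row_j) - w):
        candidate = row_j[q - w : q + w + 1]
        cost = sum((u - v) ** 2 for u, v in zip(patch, candidate))
        if cost < best_cost:
            best_q, best_cost = q, cost
    return best_q, best_cost


# Hypothetical gray levels along the same horizontal line of images I and J.
row_i = [10, 10, 10, 10, 80, 200, 90, 10, 10, 10, 10, 10]
row_j = [10, 10, 80, 200, 90, 10, 10, 10, 10, 10, 10, 10]

q, cost = match_along_row(row_i, row_j, p=5)
shift = 5 - q  # how far the feature moved between the two images, in pixels
```

On a featureless row every window has the same cost, so the function simply returns the first candidate. That is exactly the ambiguity described above: without a distinctive feature such as the corner of the cube, the match is arbitrary.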

View from above; the eyes are the foci of the cameras. Black lines are the images:

Stereo vision 3.jpg

"Triangulation" (the word means something entirely different in topology): the object lies on the line through the focus of each camera and the object's mark on that camera's image, so it sits where the two lines intersect. Here the big red dot is the actual location of the object:

Stereo vision 4.jpg

The geometry:

Stereo vision 5.jpg

D is what we are looking for.

The pink triangles are similar, and so are the blue ones. So,

 f / x = D / a
 f / y = D / b.

Then,

 a = xD / f
 b = yD / f.

Then,

 L = x +  a   +  b   + y
   = x + xD / f + yD / f + y
   = x + (D / f)(x + y) + y
   = (x + y)(D / f + 1)

Then,

 D = f(L / (x + y) - 1)

Now, d = x + y is simply the distance the pixel moves as we switch from one image to the other. It is called the disparity. Then

 D = f(L / d - 1)
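As a quick numerical check of the formula, here is a minimal sketch in Python. It assumes that L is the distance between the two foci (as the sum x + a + b + y suggests), that both cameras have the same f, and that d has been converted to the same unit of length as f and L. The function name `depth_from_disparity` and the sample numbers are hypothetical.

```python
def depth_from_disparity(f, L, d):
    """Return D = f * (L / d - 1).

    f: distance from each focus to its image line (assumed equal for both cameras)
    L: distance between the two foci (assumed meaning of L)
    d: disparity x + y, in the same unit of length as f and L
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * (L / d - 1)


# Hypothetical numbers, all in the same unit of length.
f, L = 2.0, 10.0
x, y = 0.5, 0.75
D = depth_from_disparity(f, L, x + y)

# Consistency check against the similar-triangle relations a = xD / f, b = yD / f:
# the four horizontal pieces must add up to L again.
a, b = x * D / f, y * D / f
assert abs((x + a + b + y) - L) < 1e-9
```

Note that D grows without bound as d approaches zero, so a small error in a small disparity produces a large error in the distance to a far-away object.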

The lack of structure in the image often makes finding good reference points very hard. One trick that may (partially) overcome this problem is to project a structure (such as the grid lines below) onto the scene.

Cup-left.jpg Cup-right.jpg


See also Stereo vision with Pixcavator (the project is based on image matching and on analysis of Image sequences).

For a video, see Stereo vision with hacked Kinect.