We used the OpenCV computer vision library to achieve this effect. First, background subtraction gives us a foreground mask of the person standing in front of the camera. On this masked image we then filter out everything that is not skin colored, so we are left with just the hands and head. Finally, blob detection, combined with the relative positions of the blobs, determines where the hands and the head are. Tracking these blobs tells us which gestures are being made, and each recognized gesture makes the computer simulate a button press to control the photo browser.
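The last two steps (telling the blobs apart by their relative position and turning tracked movement into a command) can be sketched in plain Python. This is only an illustration, not the project's actual code: the function names `label_blobs` and `detect_swipe`, the rule that the topmost blob is the head, the 120-pixel swipe threshold, and the key names are all assumptions made for the example. The blob centroids are assumed to come from the earlier OpenCV steps.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates; y grows downward


def label_blobs(centroids: List[Point]) -> Optional[Dict[str, Point]]:
    """Assign head / left / right labels to three skin-colored blob centroids.

    Assumption: the topmost blob (smallest y) is the head, and the other two
    are the hands, ordered by x. "left" and "right" refer to the image, not
    to the person. Returns None unless exactly three blobs are given.
    """
    if len(centroids) != 3:
        return None
    by_height = sorted(centroids, key=lambda p: p[1])
    head = by_height[0]
    left, right = sorted(by_height[1:], key=lambda p: p[0])
    return {"head": head, "left": left, "right": right}


def detect_swipe(xs: List[float], min_distance: float = 120.0) -> Optional[str]:
    """Classify a tracked hand's x positions as a swipe 'left' or 'right'.

    Assumption: a swipe is any net horizontal movement of at least
    min_distance pixels between the first and last tracked position.
    """
    if len(xs) < 2:
        return None
    dx = xs[-1] - xs[0]
    if dx >= min_distance:
        return "right"
    if dx <= -min_distance:
        return "left"
    return None


# Hypothetical mapping from gesture to the simulated button press.
KEY_FOR_SWIPE = {"left": "LEFT_ARROW", "right": "RIGHT_ARROW"}

if __name__ == "__main__":
    blobs = [(100.0, 300.0), (320.0, 80.0), (540.0, 310.0)]
    print(label_blobs(blobs))
    swipe = detect_swipe([500.0, 450.0, 380.0, 300.0])
    print(swipe, KEY_FOR_SWIPE.get(swipe))
```

The topmost-blob rule is the simplest possible choice and breaks down when a hand is raised above the head; a real system would likely also use blob size or frame-to-frame continuity to keep the labels stable.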
Programming: Geert Beuneker.