UCSD CSE190a Haili Wang

Tuesday, January 19, 2010

from learning to finding

This is a brief description of the project workflow:

The computer learns the objects of interest, in our case bees, from human input. This "knowledge" of bee recognition is stored as descriptors in a data structure. For an input image, the computer calculates descriptors the same way it does in the learning phase, and then finds correlations between the input image and the "knowledge". A highly correlated input is likely to be the object of interest (a bee). As the computer learns more about what is and what is not a bee, it gets better at distinguishing bees in the input image.

learning.m is an interactive script through which the program learns from a human "what is a bee and what isn't". In each run, the user double-clicks on 30 bees, and the program stores the histogram of color (HOC) and the histogram of gradient (HOG) of each one in a [99 30] matrix called positiveResult. Only the A* and B* channels and the gradient magnitude are used: the L channel is ignored, and the RGB image is converted to grayscale before the gradient magnitude is calculated. Each of these three histograms has 33 bins (3 x 33 = 99 rows), so positiveResult(:, i) is the descriptor of the ith bee.

Instead of looping over every pixel to accumulate a histogram, his_fast.m uses the find function to loop over the bins. Each histogram has only 33 bins, while each image has thousands of pixels, so shortening the for-loop this way improves performance significantly.
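The descriptor built during learning can be sketched in Python (standard library only). This is a minimal sketch under stated assumptions, not the code of learning.m or his_fast.m: the bin ranges, the central-difference gradient, the rounded rgb2gray-style gray weights, and all function names are hypothetical, and the RGB to L*a*b* conversion is left out, so a* and b* values are taken as inputs. histogram_by_bin keeps the outer loop over the 33 bins, as his_fast.m does; the speedup described above comes from MATLAB's vectorized find, whereas in pure Python each bin still scans the values, so the sketch only shows the structure.

```python
import math

NUM_BINS = 33                # bins per histogram, from the post
AB_RANGE = (-128.0, 128.0)   # assumed range of a* and b* values
GRAD_RANGE = (0.0, 181.0)    # assumed: just above the largest central-difference
                             # magnitude for 8-bit gray (about 180.3)


def bin_edges(lo, hi, num_bins=NUM_BINS):
    """num_bins + 1 equally spaced edges; the last edge is pinned to hi."""
    step = (hi - lo) / num_bins
    edges = [lo + k * step for k in range(num_bins)]
    edges.append(hi)
    return edges


def histogram_by_bin(values, lo, hi, num_bins=NUM_BINS):
    """Count values per bin by looping over the bins, not over the pixels.

    Bins are half-open [left, right); the last bin also includes hi.
    Values outside [lo, hi] are ignored.
    """
    edges = bin_edges(lo, hi, num_bins)
    counts = []
    for k in range(num_bins):
        left, right = edges[k], edges[k + 1]
        last = k == num_bins - 1
        # Select the members of this one bin (the role find plays in MATLAB).
        members = [v for v in values if left <= v < right or (last and v == right)]
        counts.append(len(members))
    return counts


def rgb_to_gray(pixel):
    """Gray level from an (R, G, B) triple using Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def gradient_magnitude(gray):
    """Central-difference gradient magnitude at interior pixels of a 2-D list."""
    h, w = len(gray), len(gray[0])
    mags = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (gray[r][c + 1] - gray[r][c - 1]) / 2.0
            gy = (gray[r + 1][c] - gray[r - 1][c]) / 2.0
            mags.append(math.hypot(gx, gy))
    return mags


def descriptor(a_values, b_values, grad_values):
    """Concatenate the a*, b*, and gradient histograms: 3 x 33 = 99 values."""
    return (histogram_by_bin(a_values, *AB_RANGE)
            + histogram_by_bin(b_values, *AB_RANGE)
            + histogram_by_bin(grad_values, *GRAD_RANGE))
```

Under these assumptions, descriptor(a, b, gradient_magnitude(gray)) for one clicked bee yields 99 values, the same length as one column of positiveResult.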


finding.m detects bees in a given image. The program divides the input image into small 30-by-30 windows that overlap by 15 pixels. For each window it calculates the HOC and HOG information (his_image is [4620 3] for a 273*397 input image), and then compares it with positiveResult via corr2. his_image(((i-1)*33 + 1):33*i, :) returns the [33 3] descriptor of the ith window.
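The window scan and the comparison can be sketched as follows. Assumptions: only full 30x30 windows are kept (finding.m's border handling may differ), corr2 is reproduced as the Pearson correlation of the flattened values, which is what MATLAB's corr2 computes, a constant input returns 0.0 here where MATLAB would give NaN, and the column-major flattening assumes the columns of a [33 3] block are ordered the same way as the stacked rows of positiveResult. All function names are hypothetical.

```python
import math

WIN = 30     # window side in pixels, from the post
STRIDE = 15  # step between windows, so neighbors overlap by 15 pixels
BINS = 33    # histogram bins per channel, from the post


def window_origins(height, width, win=WIN, stride=STRIDE):
    """Top-left (row, col) of every full window, scanned row by row.
    Assumption: partial windows at the right/bottom border are skipped."""
    return [(r, c)
            for r in range(0, height - win + 1, stride)
            for c in range(0, width - win + 1, stride)]


def window_block(his_image, i, bins=BINS):
    """Rows of his_image belonging to window i (1-based, matching
    his_image(((i-1)*33 + 1):33*i, :) in MATLAB)."""
    return his_image[(i - 1) * bins: i * bins]


def flatten_column_major(block):
    """Stack the columns of a list-of-rows matrix, like MATLAB's block(:)."""
    return [row[j] for j in range(len(block[0])) for row in block]


def corr2(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0
```

Comparing window i with the kth learned bee would then be corr2(flatten_column_major(window_block(his_image, i)), positive_k), where positive_k is a hypothetical name for column k of positiveResult.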

Somehow, descriptors in positiveResult have high correlations with unrelated windows. For instance, on "cropped-2-20081212-091900.jpg", positiveResult(:, 1) and the flattened image(1:33, :) have a correlation of 0.86. This problem remains to be resolved.

I threshold the corr2 results at 0.9: a window is picked only when 3 or more positiveResult descriptors have a correlation greater than 0.9 with it.
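The selection rule can be sketched as a vote over precomputed corr2 scores. The function name and the layout of scores (one row per window, one column per positiveResult descriptor) are assumptions; the 0.9 threshold and the minimum of 3 votes come from the rule above.

```python
THRESHOLD = 0.9  # a corr2 value above this counts as agreement
MIN_VOTES = 3    # positiveResult descriptors that must agree


def select_windows(scores, threshold=THRESHOLD, min_votes=MIN_VOTES):
    """Return 0-based indices of windows picked by the voting rule.

    scores[w][k] is the corr2 value between window w and the kth
    positiveResult descriptor.
    """
    picked = []
    for w, row in enumerate(scores):
        # Count descriptors whose correlation is strictly above the threshold.
        agree = sum(1 for s in row if s > threshold)
        if agree >= min_votes:
            picked.append(w)
    return picked
```

Requiring several descriptors to agree means a single spuriously high correlation, like the 0.86 case above, is not enough on its own to select a window.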

This is a plot of the 75th window compared with the 2nd, 3rd, 9th, and 10th positiveResult descriptors.


What's next:

A negativeResult can be built in a similar fashion to eliminate false positives.

Window indexing should be built.

A larger positiveResult (more learned bees) should be built.


