CN101383008A - Image classification method based on visual attention model - Google Patents


Info

Publication number
CN101383008A
Authority
CN
China
Prior art keywords
image
prime
attention model
visual attention
passage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2008102016259A
Other languages
Chinese (zh)
Inventor
张瑞
杨小康
宋雁斓
陈尔康
支琤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CNA2008102016259A priority Critical patent/CN101383008A/en
Publication of CN101383008A publication Critical patent/CN101383008A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method based on a visual attention model, in the technical field of image classification. The method comprises the following steps: step 1, a set number of images are randomly selected from a chosen image database as training samples; step 2, a feature vector based on the visual attention model and global rarity is extracted from each training image; step 3, the same feature vector is computed for each image to be classified; step 4, the feature vectors extracted in steps 2 and 3 are fed to a classifier, which finally yields the classification result for the image to be classified. By extracting high-level visual features from low-level ones, the method produces more accurate classification results.

Description

Image classification method based on visual attention model
Technical field
The present invention relates to a method in the technical field of image classification, specifically an image classification method based on a visual attention model.
Background technology
The visual selective attention mechanism of the human eye allows us to locate interesting targets quickly in a complex visual environment. Attention acts as an information-processing bottleneck: it admits only a small fraction of incoming sensory information into short-term memory and the region of visual attention. If a visual stimulus (object) is sufficiently salient, it stands out from the picture; this saliency is independent of the observation task and operates in a fast, bottom-up manner.
Image classification, i.e., image category recognition, is a discipline that arose with the development of computers and has now penetrated many fields: chromosome analysis in biology; telescope image analysis in astronomy; electrocardiogram, electroencephalogram, and medical image analysis in medicine; aerial-photograph analysis, radar and sonar signal classification, and automatic target recognition in the military field; and so on. In recent years, with the spread of digital images, designing image classification methods that can manage large image databases automatically has acquired great practical significance.
Currently, the common approach to image classification is to extract image features first and then pass the feature values to a classifier. Image features include geometric features, shape features, color features, texture features, and so on. The classifier matches the received features against samples in a database, judges whether the current image belongs to a certain category, and measures the degree of similarity between the image and that category. Commonly used classifiers are based on, for example, the Bayesian decision criterion, minimum-distance classification, support vector machines (SVM), Boosting, and neural networks.
A search of the prior art literature finds that Li Fei-Fei et al., in "A Bayesian Hierarchical Model for Learning Natural Scene Categories" (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 524-531, 20-25 June 2005), propose a method of extracting feature vectors that decomposes the entire image into many sub-blocks (patches), extracts a feature vector from each sub-block, and combines them into the feature vector required for classification. This method has an obvious defect: features are extracted from every sub-block of the image without any screening, so the extracted features necessarily contain information from every component of the image, target and background alike, and therefore carry considerable redundancy and interference.
Summary of the invention
Aiming at the above deficiencies of the prior art, the present invention proposes an image classification method based on a visual attention model. The method measures the saliency of each pixel in the image, strengthens the description of regions the human eye is interested in, and suppresses regions it is not, thereby reducing interference from cluttered background and other distracting targets. Content of interest, i.e., the focus of attention, is found rapidly in the image, and the image classification task is completed efficiently under the guidance of that focus.
The present invention is achieved through the following technical scheme; the concrete steps are as follows:
Step 1: randomly select a set number of images from the chosen image database as training samples;
Step 2: extract, for each image in the training set, the feature vector based on the visual attention model and global rarity, as follows. First, obtain the luminance channel, color channel, orientation channel, and rarity channel of the image. Then, using a multi-resolution processing mechanism, build a Gaussian pyramid for each channel. Next, generate a series of feature maps from each pyramid via the center-surround operation; for the luminance channel, compute both the center-surround difference and the surround-center difference, so that the luminance channel is decomposed into two sub-channels. Finally, within each channel, sum the generated feature maps across scales so that each channel yields one overall feature map, and extract a feature vector from each feature map using the grid-averaging method;
Step 3: for the images to be classified, extract the feature vector of each image based on the visual attention model and global rarity using the same method as in step 2;
Step 4: send the feature vectors extracted in steps 2 and 3 to a classifier and finally obtain the classification result of the images to be classified.
Obtaining the luminance channel: extract the r, g, b component values of each input image; the luminance channel of the image is expressed as I = (r + g + b) / 3.
Obtaining the color channel comprises the following steps:
First, normalize r, g, b by the luminance I to obtain r', g', b', removing the coupling between the color components and luminance, and set the color of all pixels whose luminance is below a set threshold to zero, because low-luminance locations are unlikely to attract human visual attention;
Then, convert r', g', b' into four broadly tuned primary colors red, green, blue, and yellow, denoted R, G, B, Y respectively. The conversion relations are:
R = r' - (g' + b')/2
G = g' - (r' + b')/2
B = b' - (r' + g')/2
Y = (r' + g')/2 - |r' - g'|/2 - b'
Finally, combine the above four primaries pairwise into two "color opponent" sub-channels, the R-G sub-channel and the B-Y sub-channel, to obtain the color channel:
RG = R - G
BY = B - Y.
Normalizing r, g, b by the luminance I to obtain r', g', b' proceeds as follows:
1. find the maximum luminance I_max of the image;
2. set the luminance threshold to I_max/10 and select all pixels whose luminance is below the threshold;
3. set r', g', b' for each pixel: pixels selected in step 2 get r' = g' = b' = 0; all other pixels get r' = r/I, g' = g/I, b' = b/I.
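The luminance and color-opponent channels above can be sketched in a few lines of NumPy. This is a minimal illustrative reconstruction: the broadly tuned R, G, B, Y formulas follow the standard Itti-Koch saliency model, which the patent's relations correspond to; the function name and the assumption of r, g, b values in [0, 1] are our own.

```python
import numpy as np

def color_channels(img):
    """Sketch of the luminance and color-opponent channels.

    img: H x W x 3 float array with r, g, b in [0, 1] (an assumption).
    Returns (I, RG, BY) as described in the method.
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    I = (r + g + b) / 3.0                      # luminance channel

    # Normalize by luminance; zero out pixels dimmer than I_max / 10.
    mask = I >= I.max() / 10.0
    Isafe = np.where(I > 0, I, 1.0)            # avoid division by zero
    rp = np.where(mask, r / Isafe, 0.0)
    gp = np.where(mask, g / Isafe, 0.0)
    bp = np.where(mask, b / Isafe, 0.0)

    # Broadly tuned primaries (Itti-Koch form).
    R = rp - (gp + bp) / 2.0
    G = gp - (rp + bp) / 2.0
    B = bp - (rp + gp) / 2.0
    Y = (rp + gp) / 2.0 - np.abs(rp - gp) / 2.0 - bp

    return I, R - G, B - Y                     # color-opponent sub-channels
```

For a pure red pixel (r=1, g=b=0), normalization gives r'=3, so R=3, G=B=-1.5, Y=0, i.e. RG = 4.5 and BY = -1.5.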
Obtaining the orientation channel means applying Gabor filtering to the luminance channel. According to the orientation parameter of the Gabor filter, the orientation channel is divided into four sub-channels, corresponding to 0°, 45°, 90°, and 135° respectively. Applying a Gabor filter of a specific orientation to the luminance channel yields the corresponding orientation sub-channel: O_θ = I * Gabor(θ), θ ∈ {0°, 45°, 90°, 135°}.
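A minimal sketch of the four orientation sub-channels. The Gabor kernel parameters (kernel size, sigma, wavelength) are illustrative assumptions, since the patent specifies only the four orientations:

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(theta_deg, ksize=9, sigma=2.0, lam=4.0):
    """Real-valued Gabor kernel at orientation theta (degrees).
    The parameter values are illustrative, not taken from the patent."""
    theta = np.deg2rad(theta_deg)
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def orientation_channels(I):
    """O_theta = I * Gabor(theta), theta in {0, 45, 90, 135} degrees."""
    return {t: convolve(I, gabor_kernel(t), mode='nearest')
            for t in (0, 45, 90, 135)}
```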
Obtaining the rarity channel comprises the following steps:
1. regard the image as a combination of a series of messages m_i and obtain the frequency with which each message occurs:
frequency(m_i) = hist(m_i) / card(M)
where hist(m_i) is the count of the pixel value m_i in the image histogram and card(M) is the number of pixels in the image M;
2. if two messages m_i and m_j occur with the same frequency, but m_i differs more from the other messages than m_j does, then m_i is rarer than m_j and should clearly receive a larger attention value. A global distinctiveness measure difference(m_i) is therefore introduced to describe how much m_i differs from the other messages in the image:
difference(m_i) = 1 - Σ_{j=1}^{card(M)} |m_i - m_j| / (card(M) · Max(M))
3. from the occurrence frequency and the global distinctiveness of each message, obtain the self-information of m_i; rare messages in the image carry higher self-information:
p(m_i) = frequency(m_i) · difference(m_i)
I(m_i) = -log(p(m_i)).
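Because frequency and difference depend only on the pixel value, the rarity formulas can be evaluated once per grey level via the histogram instead of once per pixel. A sketch for an 8-bit single-channel image (the 256-level quantization is an assumption; the patent does not fix the message alphabet):

```python
import numpy as np

def self_information_map(img):
    """Rarity channel: per-pixel self-information
    I(m) = -log(frequency(m) * difference(m)).  img: 2-D uint8 array."""
    M = img.size
    levels = np.arange(256)
    hist = np.bincount(img.ravel(), minlength=256)
    frequency = hist / M                                 # frequency(m_i)

    # difference(m_i) = 1 - sum_j |m_i - m_j| / (card(M) * Max(M)),
    # accumulated per grey level through the histogram.
    maxM = max(int(img.max()), 1)
    absdiff = np.abs(levels[:, None] - levels[None, :])  # 256 x 256
    sumdiff = absdiff @ hist                             # sum over pixels j
    difference = 1.0 - sumdiff / (M * maxM)

    p = frequency * difference
    info = np.where(p > 0, -np.log(np.clip(p, 1e-12, None)), 0.0)
    return info[img]                                     # back to pixel grid
```

A perfectly uniform image carries zero self-information everywhere; a lone deviating pixel receives a much larger value than the common background, which is exactly the behavior the rarity channel is meant to capture.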
Extracting a feature vector from each generated feature map with the grid-averaging method means: divide each feature map into a fixed grid of 4 × 4 blocks and compute the mean value of each block, so that each generated color, luminance, and orientation feature map, as well as the feature map based on global rarity, is characterized by a 16-dimensional feature vector; finally, all feature vectors are concatenated into the feature vector used for the subsequent image classification. The grid-averaging method in the present invention can also be understood as a kind of "blurring", a method consistent with the principles of biological vision.
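The grid-averaging step can be sketched directly. How block edges fall when a map's size is not divisible by 4 is unspecified in the patent, so the even splitting via np.array_split is an assumption:

```python
import numpy as np

def grid_average(feature_map, grid=4):
    """Divide a feature map into a fixed grid x grid of blocks and return
    the per-block means as a flat vector (16-D for grid=4)."""
    rows = np.array_split(feature_map, grid, axis=0)
    return np.array([block.mean()
                     for row in rows
                     for block in np.array_split(row, grid, axis=1)])
```

On a 4 × 4 map each block is a single pixel, so the output equals the map itself flattened in row-major order.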
Building a Gaussian pyramid for each channel proceeds as follows:
1. take the finest-scale image obtained for each channel as the bottom of the pyramid (level 0);
2. convolve each level image with a 5 × 5 Gaussian kernel;
3. downsample by a factor of 2 in both width and height to obtain the next pyramid level;
4. return to step 2 until the top level of the pyramid is reached.
Each level of a typical image pyramid is half the width and height of the level before it; the bottom of the pyramid is a high-resolution representation of the image to be processed, and the top is a low-resolution representation. Moving up the pyramid, both size and resolution decrease. If the image resolution is a power of 2, or a multiple of a power of 2, such pyramids are very convenient to build. Between levels, a kernel function or filter is usually used for smoothing and filtering; common kernels include the Gaussian kernel, the Laplacian kernel, the Gabor kernel, etc., giving rise to the Gaussian pyramid, the Laplacian pyramid, and the Gabor pyramid respectively.
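The pyramid construction above can be sketched as follows. The patent fixes only the 5 × 5 kernel size; the binomial approximation of a Gaussian used here is a common choice and an assumption of ours:

```python
import numpy as np
from scipy.ndimage import convolve

# 5 x 5 binomial approximation of a Gaussian kernel (sums to 1).
_g = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
GAUSS5 = np.outer(_g, _g) / 256.0

def gaussian_pyramid(channel, levels=5):
    """Levels 0..4: smooth with the 5x5 kernel, then downsample by 2
    in both width and height."""
    pyr = [channel.astype(float)]
    for _ in range(levels - 1):
        smoothed = convolve(pyr[-1], GAUSS5, mode='nearest')
        pyr.append(smoothed[::2, ::2])
    return pyr
```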
The center-surround operation compares the "center" region of the image with its "surround" region; in the visual attention model, the difference between the "center" region and the "surround" region gives each pixel value of the feature map. In the visual attention model, the center is each pixel at a given pyramid level, and the surround is obtained by computing the mean value within a neighborhood of each center pixel.
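A sketch of the center-surround operation applied at a single pyramid level. The neighborhood size is an illustrative assumption (the text says only "a certain neighborhood"), and clipping into on/off maps mirrors the I_on / I_off decomposition described for the luminance channel:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def center_surround(level_img, surround_size=5):
    """Each pixel (center) minus the mean of its neighborhood (surround),
    and the reverse, clipped at zero."""
    surround = uniform_filter(level_img.astype(float), size=surround_size,
                              mode='nearest')
    on = np.clip(level_img - surround, 0, None)    # center-surround (I_on)
    off = np.clip(surround - level_img, 0, None)   # surround-center (I_off)
    return on, off
```

On a flat image both maps are zero; an isolated bright pixel produces a positive on-response at its location, which is the intended contrast behavior.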
Compared with the prior art, the present invention has the following beneficial effects: combining the visual characteristics of the human eye, the invention proposes a visual attention computation model that unites visual selective attention with global rarity; feature vectors are extracted from the feature maps produced by this model using the grid-averaging method, so that high-level features with visual significance can be obtained from low-level features for image classification, making the classification results more accurate.
Description of drawings
Fig. 1 is the flowchart of the present invention;
Fig. 2 is the flowchart of feature-vector extraction based on the visual attention model in the method of the invention.
Embodiment
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. This embodiment is implemented on the premise of the technical scheme of the present invention, and detailed implementation modes and concrete operating procedures are given, but the protection scope of the present invention is not limited to the following embodiment.
The image database adopted in this embodiment is the Caltech image library, comprising 6 classes: faces, airplanes, car rears, car sides, motorcycles, and background; each class has large intra-class variability and cluttered backgrounds.
As shown in Fig. 1, this example comprises the following steps:
Step 1: in each of the 6 classes of the Caltech image library, randomly select 50% of the images as training samples and the remaining 50% as test samples.
Step 2: extract the feature vector of each training sample based on the visual attention model and global rarity according to the following steps, as shown in Fig. 2:
First step, obtaining the luminance channel: extract the r, g, b component values of the input image; the luminance of the image is given by I = (r + g + b) / 3;
Second step, obtaining the color channel:
To remove the coupling between the color components and luminance, first normalize r, g, b by the luminance I. At the same time, considering that locations of very low luminance (e.g., below 10% of the maximum luminance) are unlikely to attract human visual attention, the color of all pixels whose luminance is below the threshold is set to zero.
The detailed normalization procedure is as follows:
(1) find the maximum luminance of the image;
(2) find all pixels whose luminance is below the threshold (usually set to I_max/10);
(3) at those positions set r' = g' = b' = 0; at all other positions set r' = r/I, g' = g/I, b' = b/I.
Convert r', g', b' into four broadly tuned primary colors red, green, blue, and yellow, denoted R, G, B, Y respectively. The conversion is given by:
R = r' - (g' + b')/2
G = g' - (r' + b')/2
B = b' - (r' + g')/2
Y = (r' + g')/2 - |r' - g'|/2 - b'
To obtain the color channel, the four primaries are combined pairwise into two "color opponent" sub-channels, the R-G sub-channel and the B-Y sub-channel, given by:
RG = R - G
BY = B - Y
Third step, obtaining the orientation channel:
The orientation channel is obtained by Gabor-filtering the luminance channel. According to the orientation parameter of the Gabor filter, the orientation channel is divided into four sub-channels, corresponding to 0°, 45°, 90°, and 135° respectively. Applying a Gabor filter of a specific orientation to the luminance channel yields the corresponding orientation sub-channel:
O_θ = I * Gabor(θ), θ ∈ {0°, 45°, 90°, 135°}.
Fourth step, obtaining the rarity channel:
The visual selective attention model measures the saliency of each pixel of the current image through the luminance, color, and orientation maps, thereby guiding early visual attention. However, in some cases a salient attribute may not be the focus of our visual attention; the human eye is then more likely to be attracted by features that occur infrequently in the image. The rarity channel is obtained as follows:
(1) regard the image as a combination of a series of messages m_i;
(2) compute the occurrence frequency of each message:
frequency(m_i) = hist(m_i) / card(M)
where hist(m_i) is the count of the pixel value m_i in the image histogram and card(M) is the number of pixels in the image M;
(3) compute the global distinctiveness of each message:
difference(m_i) = 1 - Σ_{j=1}^{card(M)} |m_i - m_j| / (card(M) · Max(M))
(4) compute the self-information of each message:
p(m_i) = frequency(m_i) · difference(m_i)
I(m_i) = -log(p(m_i))
Fifth step, pyramid decomposition of each channel:
For the luminance, color, orientation, and rarity channels obtained in the first through fourth steps, a Gaussian pyramid is built for each channel using a multi-resolution processing mechanism.
The Gaussian pyramid is built as follows:
(1) take the finest-scale image obtained for each channel as the bottom of the pyramid (level 0);
(2) convolve each level image with a 5 × 5 Gaussian kernel;
(3) downsample by a factor of 2 in both width and height to obtain the next pyramid level;
(4) return to step (2) until the top level of the pyramid (level 4) is reached.
Sixth step, the center-surround operation:
From each channel's image pyramid built in the fifth step, a series of feature maps is generated by the center-surround operation (Center-Surround Operation), which compares the "center" region of the image with its "surround" region. In the visual attention model, the center is each pixel of levels 2 to 4 of the pyramid, and the surround is obtained by computing the mean value within a neighborhood of each pixel.
In addition, to simulate the response of cells in the human receptive field, for the luminance channel not only the center-surround difference but also the surround-center difference is computed, so that the luminance channel is decomposed into two sub-channels, I_on and I_off.
Seventh step, generation of the feature vector:
First, within each channel, the feature maps generated in the sixth step are summed across scales, so that each channel yields one overall feature map. A feature vector is then extracted from each resulting feature map using the grid-averaging method: each feature map is divided into a fixed grid of 4 × 4 blocks and the mean value of each block is computed. In this way, from the 2 color, 2 luminance, and 4 orientation feature maps and the 1 saliency map based on global rarity, a 9 × 16 = 144-dimensional feature vector is extracted for the subsequent image classification. The grid-averaging method here can also be understood as a kind of "blurring", a method consistent with the principles of biological vision.
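The assembly of the 144-dimensional vector from the nine overall feature maps can be sketched as follows (the function name and the row-major concatenation order are illustrative assumptions):

```python
import numpy as np

def feature_vector(feature_maps):
    """Concatenate 4x4 grid means of the 9 overall feature maps
    (2 color + 2 luminance + 4 orientation + 1 rarity) into the
    144-dimensional classification vector."""
    vecs = []
    for fm in feature_maps:
        blocks = [b for row in np.array_split(fm, 4, axis=0)
                    for b in np.array_split(row, 4, axis=1)]
        vecs.append([b.mean() for b in blocks])
    return np.concatenate(vecs)
```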
Step 3: for the images to be classified, extract the feature vector of each image based on the visual attention model and global rarity by the same method as in step 2.
Step 4: send the feature vectors extracted in steps 2 and 3 to the classifier and finally obtain the classification results of the images to be classified.
A support vector machine (SVM) classifier is adopted in this embodiment; the confusion matrix of the overall classification results is shown in Table 1:
Table 1: confusion matrix for the 6 classes
(The confusion matrix appears as an image, Figure A200810201625D00121, in the original publication.)
From Table 1, the total accuracy of the 6-class image classification is calculated to be 97.74%.
As Table 1 shows, this embodiment achieves good classification performance on all classes of images. Combining the visual characteristics of the human eye, this embodiment proposes a visual attention computation model that unites visual selective attention with global rarity; feature vectors are extracted from the feature maps produced by this model using the grid-averaging method, so that high-level features with visual significance can be obtained from low-level features for image classification. The classification results show that the method of this embodiment is very effective at both raising classification accuracy and lowering the classification error rate.
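The final classification step can be sketched end to end. The embodiment uses an SVM; the nearest-centroid classifier below is a deliberately simple, dependency-free stand-in (not the patent's classifier) that only illustrates feeding the 144-D feature vectors to a classifier and tabulating a confusion matrix as in Table 1:

```python
import numpy as np

def nearest_centroid_classify(train_X, train_y, test_X):
    """Assign each test vector to the class with the nearest
    training-set centroid (a stand-in for the SVM of the embodiment)."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class; columns: predicted class (as in Table 1)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

With well-separated feature clusters this toy classifier produces a purely diagonal confusion matrix; the accuracy reported in Table 1 comes from the SVM on the real 144-D features, not from this sketch.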

Claims (9)

1. An image classification method based on a visual attention model, characterized by comprising the following steps:
Step 1: randomly select a set number of images from the chosen image database as training samples;
Step 2: extract, for each image in the training set, the feature vector based on the visual attention model and global rarity, as follows: first, obtain the luminance channel, color channel, orientation channel, and rarity channel of the image; then, using a multi-resolution processing mechanism, build a Gaussian pyramid for each channel; next, generate a series of feature maps from each Gaussian pyramid via the center-surround operation, wherein for the luminance channel both the center-surround difference and the surround-center difference are computed, so that the luminance channel is decomposed into two sub-channels; finally, within each channel, sum the feature maps generated by the center-surround operation across scales so that each channel yields one overall feature map, and extract a feature vector from each generated feature map using the grid-averaging method;
Step 3: for the images to be classified, extract the feature vector of each image based on the visual attention model and global rarity using the same method as in step 2;
Step 4: send the feature vectors extracted in steps 2 and 3 to a classifier and finally obtain the classification result of the images to be classified.
2. The image classification method based on a visual attention model according to claim 1, characterized in that obtaining the luminance channel means extracting the r, g, b component values of each input image, the luminance channel of the image being expressed as I = (r + g + b) / 3.
3. The image classification method based on a visual attention model according to claim 1, characterized in that obtaining the color channel comprises the following steps:
First, normalize r, g, b by the luminance I to obtain r', g', b', removing the coupling between the color components and luminance, and set the color of all pixels whose luminance is below a set threshold to zero, because low-luminance locations are unlikely to attract human visual attention;
Then, convert r', g', b' into four primary colors: red, green, blue, and yellow, denoted R, G, B, Y respectively, with the conversion relations:
R = r' - (g' + b')/2
G = g' - (r' + b')/2
B = b' - (r' + g')/2
Y = (r' + g')/2 - |r' - g'|/2 - b'
Finally, combine the above four primaries pairwise into two "color opponent" sub-channels, the R-G sub-channel and the B-Y sub-channel, to obtain the color channel:
RG = R - G
BY = B - Y.
4. The image classification method based on a visual attention model according to claim 3, characterized in that normalizing r, g, b by the luminance I to obtain r', g', b' proceeds as follows:
1. find the maximum luminance I_max of the image;
2. set the luminance threshold to I_max/10 and select all pixels whose luminance is below the threshold;
3. set r', g', b' for each pixel: pixels selected in step 2 get r' = g' = b' = 0; all other pixels get r' = r/I, g' = g/I, b' = b/I.
5. The image classification method based on a visual attention model according to claim 1, characterized in that obtaining the orientation channel means applying Gabor filtering to the luminance channel; according to the orientation parameter of the Gabor filter, the orientation channel is divided into four sub-channels, corresponding to 0°, 45°, 90°, and 135° respectively, and applying a Gabor filter of a specific orientation to the luminance channel yields the corresponding orientation sub-channel: O_θ = I * Gabor(θ), θ ∈ {0°, 45°, 90°, 135°}.
6. The image classification method based on a visual attention model according to claim 1, characterized in that obtaining the rarity channel comprises the following steps:
1. regard the image as a combination of a series of messages m_i and obtain the frequency with which each message occurs:
frequency(m_i) = hist(m_i) / card(M)
where hist(m_i) is the count of the pixel value m_i in the image histogram and card(M) is the number of pixels in the image M;
2. if two messages m_i and m_j occur with the same frequency, but m_i differs more from the other messages than m_j does, then m_i is rarer than m_j and should clearly receive a larger attention value; a global distinctiveness measure difference(m_i) is therefore introduced to describe how much m_i differs from the other messages in the image:
difference(m_i) = 1 - Σ_{j=1}^{card(M)} |m_i - m_j| / (card(M) · Max(M))
3. from the occurrence frequency and the global distinctiveness of each message, obtain the self-information of m_i, rare messages in the image carrying higher self-information:
p(m_i) = frequency(m_i) · difference(m_i)
I(m_i) = -log(p(m_i)).
7. The image classification method based on a visual attention model according to claim 1, characterized in that extracting a feature vector from each generated feature map with the grid-averaging method means: dividing each feature map into a fixed grid of 4 × 4 blocks and computing the mean value of each block, so that each generated color, luminance, and orientation feature map, as well as the feature map based on global rarity, is characterized by a 16-dimensional feature vector; finally, all feature vectors are concatenated into the feature vector used for the subsequent image classification.
8. The image classification method based on a visual attention model according to claim 1, characterized in that building a Gaussian pyramid for each channel proceeds as follows:
1. take the finest-scale image obtained for each channel as the bottom of the pyramid;
2. convolve each level image with a 5 × 5 Gaussian kernel;
3. downsample by a factor of 2 in both width and height to obtain the next pyramid level;
4. return to step 2 until the top level of the pyramid is reached.
9. The image classification method based on a visual attention model according to claim 1, characterized in that the center-surround operation compares the "center" region of the image with its "surround" region; in the visual attention model, the difference between the "center" region and the "surround" region gives each pixel value of the feature map; the center is each pixel at a given pyramid level, and the surround is obtained by computing the mean value within a neighborhood of each center pixel.
CNA2008102016259A 2008-10-23 2008-10-23 Image classification method based on visual attention model Pending CN101383008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008102016259A CN101383008A (en) 2008-10-23 2008-10-23 Image classification method based on visual attention model


Publications (1)

Publication Number Publication Date
CN101383008A true CN101383008A (en) 2009-03-11

Family

ID=40462842

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008102016259A Pending CN101383008A (en) 2008-10-23 2008-10-23 Image classification method based on visual attention model

Country Status (1)

Country Link
CN (1) CN101383008A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853517A (en) * 2010-05-26 2010-10-06 Xi'an Jiaotong University Automatic oil-painting generation method for real images based on stroke constraints and texture
CN101894371A (en) * 2010-07-19 2010-11-24 Huazhong University of Science and Technology Bio-inspired top-down visual attention method
CN101923653A (en) * 2010-08-17 2010-12-22 Peking University Image classification method based on multilevel content description
CN102034116A (en) * 2010-05-07 2011-04-27 Dalian Jiaotong University Commodity image classification method based on complementary features and class description
CN102663451A (en) * 2012-03-29 2012-09-12 Tianjin University of Science and Technology Graph image classification method based on color space features
CN103020942A (en) * 2012-12-31 2013-04-03 Tsinghua University Mixed-domain method for adjusting image global contrast and local detail levels
CN103049764A (en) * 2012-12-13 2013-04-17 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Low-altitude aircraft target identification method
CN103325114A (en) * 2013-06-13 2013-09-25 Tongji University Target vehicle matching method based on an improved visual attention model
CN103996185A (en) * 2014-04-29 2014-08-20 Chongqing University Image segmentation method based on a top-down/bottom-up (TD-BU) attention mechanism
CN104463191A (en) * 2014-10-30 2015-03-25 South China University of Technology Robot visual processing method based on an attention mechanism
CN104866524A (en) * 2015-04-10 2015-08-26 Dalian Jiaotong University Fine-grained classification method for commodity images
CN105069475A (en) * 2015-08-06 2015-11-18 University of Electronic Science and Technology of China Image processing method based on a visual attention mechanism model
CN105405132A (en) * 2015-11-04 2016-03-16 Hohai University SAR image man-made target detection method based on visual contrast and information entropy
CN105893999A (en) * 2016-03-31 2016-08-24 Beijing QIYI Century Science & Technology Co., Ltd. Method and device for extracting a region of interest
CN107924471A (en) * 2015-08-21 2018-04-17 Sony Corporation Defocus estimation from a single image based on the Laplacian of Gaussian approximation
CN108780519A (en) * 2016-03-11 2018-11-09 Magic Leap, Inc. Structure learning in convolutional neural networks
CN108960265A (en) * 2017-05-22 2018-12-07 Alibaba Group Holding Ltd. Optimization method for an image classification process, image classification method, apparatus, and system
CN110210543A (en) * 2019-05-24 2019-09-06 Shanghai United Imaging Intelligence Co., Ltd. Image classification system, method, device, and storage medium
US10671918B2 (en) 2017-10-24 2020-06-02 International Business Machines Corporation Attention based sequential image processing

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034116B (en) * 2010-05-07 2013-05-01 Dalian Jiaotong University Commodity image classification method based on complementary features and class description
CN102034116A (en) * 2010-05-07 2011-04-27 Dalian Jiaotong University Commodity image classification method based on complementary features and class description
CN101853517B (en) * 2010-05-26 2011-11-16 Xi'an Jiaotong University Automatic oil-painting generation method for real images based on stroke constraints and texture
CN101853517A (en) * 2010-05-26 2010-10-06 Xi'an Jiaotong University Automatic oil-painting generation method for real images based on stroke constraints and texture
CN101894371A (en) * 2010-07-19 2010-11-24 Huazhong University of Science and Technology Bio-inspired top-down visual attention method
CN101894371B (en) * 2010-07-19 2011-11-30 Huazhong University of Science and Technology Bio-inspired top-down visual attention method
CN101923653A (en) * 2010-08-17 2010-12-22 Peking University Image classification method based on multilevel content description
CN101923653B (en) * 2010-08-17 2013-03-06 Peking University Image classification method based on multilevel content description
CN102663451A (en) * 2012-03-29 2012-09-12 Tianjin University of Science and Technology Graph image classification method based on color space features
CN103049764A (en) * 2012-12-13 2013-04-17 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Low-altitude aircraft target identification method
CN103020942B (en) * 2012-12-31 2015-07-15 Tsinghua University Mixed-domain method for adjusting image global contrast and local detail levels
CN103020942A (en) * 2012-12-31 2013-04-03 Tsinghua University Mixed-domain method for adjusting image global contrast and local detail levels
CN103325114A (en) * 2013-06-13 2013-09-25 Tongji University Target vehicle matching method based on an improved visual attention model
CN103996185A (en) * 2014-04-29 2014-08-20 Chongqing University Image segmentation method based on a top-down/bottom-up (TD-BU) attention mechanism
CN104463191A (en) * 2014-10-30 2015-03-25 South China University of Technology Robot visual processing method based on an attention mechanism
CN104866524A (en) * 2015-04-10 2015-08-26 Dalian Jiaotong University Fine-grained classification method for commodity images
CN105069475A (en) * 2015-08-06 2015-11-18 University of Electronic Science and Technology of China Image processing method based on a visual attention mechanism model
CN105069475B (en) * 2015-08-06 2018-12-18 University of Electronic Science and Technology of China Image processing method based on a visual attention mechanism model
CN107924471A (en) * 2015-08-21 2018-04-17 Sony Corporation Defocus estimation from a single image based on the Laplacian of Gaussian approximation
CN107924471B (en) * 2015-08-21 2022-05-10 Sony Corporation Defocus estimation from a single image based on the Laplacian of Gaussian approximation
CN105405132A (en) * 2015-11-04 2016-03-16 Hohai University SAR image man-made target detection method based on visual contrast and information entropy
CN108780519A (en) * 2016-03-11 2018-11-09 Magic Leap, Inc. Structure learning in convolutional neural networks
CN108780519B (en) * 2016-03-11 2022-09-02 Magic Leap, Inc. Structural learning of convolutional neural networks
CN105893999A (en) * 2016-03-31 2016-08-24 Beijing QIYI Century Science & Technology Co., Ltd. Method and device for extracting a region of interest
CN108960265A (en) * 2017-05-22 2018-12-07 Alibaba Group Holding Ltd. Optimization method for an image classification process, image classification method, apparatus, and system
CN108960265B (en) * 2017-05-22 2022-06-17 Alibaba Group Holding Ltd. Optimization method for an image classification process, image classification method, apparatus, and system
US10671918B2 (en) 2017-10-24 2020-06-02 International Business Machines Corporation Attention based sequential image processing
US11205123B2 (en) 2017-10-24 2021-12-21 International Business Machines Corporation Attention based sequential image processing
CN110210543A (en) * 2019-05-24 2019-09-06 Shanghai United Imaging Intelligence Co., Ltd. Image classification system, method, device, and storage medium
CN110210543B (en) * 2019-05-24 2021-08-17 Shanghai United Imaging Intelligence Co., Ltd. Image classification system, method, device, and storage medium

Similar Documents

Publication Publication Date Title
CN101383008A (en) Image classification method based on visual attention model
CN107609601B (en) Ship target identification method based on multilayer convolutional neural network
CN109952614A (en) Classification system and method for biological particles
CN107330892A (en) Sunflower disease recognition method based on the random forest method
CN106126585B (en) UAV image retrieval method combining quality grading with perceptual hash features
Fang et al. A visual attention model combining top-down and bottom-up mechanisms for salient object detection
US8503768B2 (en) Shape description and modeling for image subscene recognition
Lu et al. Saliency modeling from image histograms
CN102855640A (en) Fruit grading system based on neural network
CN103077399B (en) Based on the biological micro-image sorting technique of integrated cascade
CN106778786A (en) Apple disease recognition method based on stacked histograms of oriented gradients in the log-spectral domain
CN105893971A (en) Traffic signal lamp recognition method based on Gabor and sparse representation
CN107341505A (en) Scene classification method based on image saliency and Object Bank
CN103049767A (en) Aurora image classification method based on biologically inspired features and manifold learning
CN105426924A (en) Scene classification method based on middle level features of images
CN106203448A (en) Scene classification method based on nonlinear scale space theory
Mojsilovic et al. Semantic metric for image library exploration
Zhou et al. A novel bag generator for image database retrieval with multi-instance learning techniques
Sulistyaningrum et al. Vehicle detection using histogram of oriented gradients and real adaboost
Wei-ning et al. Image emotional classification: static vs. dynamic
CN107341456B (en) Sunny/cloudy weather classification method based on a single outdoor color image
CN101515329A (en) Image matching method based on various features
CN113903004A (en) Scene recognition method based on middle-layer convolutional neural network multi-dimensional features
Lu et al. Predicting the memorability of natural-scene images
Zhu et al. Detecting text in natural scene images with conditional clustering and convolution neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20090311