CN103034871B - An image classification method based on spatial aggregation - Google Patents

An image classification method based on spatial aggregation

Info

Publication number
CN103034871B
CN103034871B (application CN201210560743.5A)
Authority
CN
China
Prior art keywords
feature
image
local feature
pooling
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210560743.5A
Other languages
Chinese (zh)
Other versions
CN103034871A (en)
Inventor
王亮 (Wang Liang)
黄永祯 (Huang Yongzhen)
刘锋 (Liu Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN201210560743.5A
Publication of CN103034871A
Application granted
Publication of CN103034871B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image classification method based on spatial aggregation. In the method, local features are first extracted and a visual dictionary is learned with a clustering algorithm; the features are then encoded with a coding algorithm, and the features of different spatial regions are pooled and concatenated; finally, features are selected with a general feature selection method, and the selected features serve as the image representation used to train a classifier for classifying images. The method selects the most discriminative and most robust features from combinations of different spatial regions as the image representation, and can thus reflect the spatial distribution and co-occurrence of features in the images of a given class. The method achieves classification accuracy superior to conventional algorithms with a small number of features.

Description

An image classification method based on spatial aggregation
Technical field
The present invention relates to the field of pattern recognition, and in particular to an image classification method based on spatial aggregation.
Background Art
At present, traditional image classification methods lack the ability to express the spatial information of images effectively. This is one of the main reasons why computer vision systems still lag far behind the human visual system in recognition accuracy. Conventional image classification methods often fail to exploit spatial information well. The spatial pyramid matching algorithm, for example, merely concatenates the representations of a small number of spatial regions; although it offers some robustness, the spatial information it reflects is weak in both efficiency and discriminative power. Some methods use the absolute spatial positions of features directly, but because feature positions shift easily, such methods often perform well on aligned databases and very poorly on unaligned ones.
Therefore, since previous methods can hardly meet the needs of image classification, the present invention proposes an image classification method based on spatial aggregation to express the spatial information of features in an image; the method is insensitive to offsets of individual features and can flexibly describe their spatial distribution. Because the dimensionality of the resulting image representation is very high, a conventional feature selection method can be used to select features, and the selected features serve as the final image representation.
Summary of the invention
In order to solve the problems existing in the prior art, the object of the present invention is to provide an image classification method based on spatial aggregation, the method comprising the following steps:
Step S1, collecting multiple images, building an image classification database, and dividing the database into a training set and a test set;
Step S2, extracting the local features of all images in the database;
Step S3, randomly sampling a number of local features from the local features of the training-set images, and learning a visual dictionary D = [d_1, d_2, ..., d_K] with a clustering algorithm, wherein K denotes the size of the visual dictionary, i.e. the number of cluster centers, and each d_i is a column vector representing a visual word, i.e. a cluster center;
Step S4, encoding the local features of all images extracted in step S2;
Step S5, spatially dividing each image in the database into multiple rectangular blocks, and pooling the local features within each rectangular block as the feature representation of that block;
Step S6, merging spatially adjacent rectangular blocks into a region by pooling, and taking the pooled result of the merged blocks as the feature representation of the resulting region;
Step S7, pooling all pairs of the equal-size, non-overlapping regions obtained in step S6, and concatenating the pooled results as the feature representation of the image;
Step S8, following steps S5-S7, obtaining the feature representations of all images in the training set, and selecting the most discriminative features as the final representation of the images in the training and test sets;
Step S9, training a support vector machine on the most discriminative features selected in step S8 to obtain an image classifier;
Step S10, extracting the most discriminative features of each image in the test set and feeding them into the classifier for classification, thereby obtaining the classification result for that image.
According to the method of the present invention, both the spatial distribution of a single feature and the co-occurrence of multiple features can be described. Taking blocks as the primitive makes the description of feature spatial positions more robust, and considering the spatial combinations of various regions helps mine more spatial information.
Brief Description of the Drawings
Fig. 1 is a flowchart of the image classification method based on spatial aggregation of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawing.
Traditional image classification methods can be divided into five parts: extracting local features, training a visual dictionary, representing images, training a classifier, and classifying new images. On this basis, the present invention first represents images by spatial aggregation, then performs feature selection on the representations of all images, and takes only the selected features as the final image representation.
Fig. 1 is a flowchart of the image classification method based on spatial aggregation of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S1, collecting multiple images, building an image classification database, and dividing the database into a training set and a test set;
Step S2, extracting the local features of all images in the database;
In this step, the local features of an image can be obtained by applying a local feature descriptor or a local feature detector in a dense sampling manner, e.g. the scale-invariant feature transform (SIFT) or speeded-up robust features (SURF).
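As an illustration only, the dense SIFT extraction of this step might be sketched in Python with OpenCV as follows; the grid step and patch size are assumed values not specified in the description.

```python
# A minimal sketch of dense SIFT extraction, assuming OpenCV's SIFT
# implementation; step and patch_size are illustrative parameters.
import cv2
import numpy as np

def dense_sift(image_path, step=8, patch_size=16):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    # Place keypoints on a regular grid instead of running a detector,
    # which realizes the "dense sampling" of this step.
    kps = [cv2.KeyPoint(float(x), float(y), patch_size)
           for y in range(step, gray.shape[0] - step, step)
           for x in range(step, gray.shape[1] - step, step)]
    kps, desc = sift.compute(gray, kps)
    # Keypoint coordinates are kept for the spatial blocks of step S5.
    coords = np.array([kp.pt for kp in kps])
    return desc.astype(np.float64), coords
```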
Step S3, randomly sampling a number of local features from the local features of the training-set images, and learning a visual dictionary with a clustering algorithm;
In this step, after the local features have been randomly sampled, a clustering algorithm of the prior art (such as the K-means clustering algorithm) is used to train a visual dictionary D = [d_1, d_2, ..., d_K], wherein K denotes the size of the visual dictionary, i.e. the number of cluster centers, and each d_i is a column vector representing a visual word, i.e. a cluster center.
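A minimal sketch of this dictionary-learning step with scikit-learn is given below; MiniBatchKMeans is an assumed stand-in for plain K-means, chosen here only because the embodiment samples one million features.

```python
# Learn the visual dictionary D by clustering randomly sampled local
# features; each cluster center is one visual word d_i.
from sklearn.cluster import MiniBatchKMeans

def learn_dictionary(sampled_features, K=1024, seed=0):
    # sampled_features: (n, d) array of local features from the training set.
    km = MiniBatchKMeans(n_clusters=K, random_state=seed, n_init=3)
    km.fit(sampled_features)
    # Store one visual word per column, matching the column-vector
    # convention d_i used in the text.
    return km.cluster_centers_.T  # shape (d, K)
```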
Step S4, encoding the local features of all images extracted in step S2;
The encoding may apply any of the coding schemes commonly used in the prior art to a local feature f_i; local linear coding is taken as an example below.
Applying local linear coding to the i-th local feature f_i comprises the following steps:
Step S41, computing the intermediate variable α_i* with the formula
α_i* = (Δ_i^T Δ_i + βI)^(-1) · 1,
wherein Δ_i = [f_i - c_1, f_i - c_2, ..., f_i - c_M]; 1 ∈ R^(M×1) is a column vector whose elements are all 1, R^(M×1) being the space of M × 1 vectors; c_k is the k-th of the M visual words nearest to feature f_i; Δ_i^T denotes the transpose of Δ_i; β is a constant, usually chosen as 10^(-4); I ∈ R^(M×M) is an identity matrix, R^(M×M) being the space of M × M matrices; (·)^(-1) denotes the matrix inverse;
Step S42, since α_i must satisfy 1^T α_i = 1, normalizing α_i* to obtain the intermediate variable α_i;
Step S43, obtaining the final code v_i of local feature f_i from the value of α_i, wherein the entries of v_i at the several visual words nearest to f_i take the corresponding values in α_i, and the entries at all other visual words are zero.
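The three sub-steps can be sketched in NumPy as follows; the number of nearest visual words M is an assumed parameter, since the description does not fix its value.

```python
# A sketch of local linear coding (steps S41-S43) for one feature f_i.
import numpy as np

def llc_encode(f_i, D, M=5, beta=1e-4):
    # f_i: (d,) local feature; D: (d, K) dictionary, one word per column.
    d, K = D.shape
    # Find the M visual words nearest to f_i.
    dists = np.linalg.norm(D - f_i[:, None], axis=0)
    nearest = np.argsort(dists)[:M]
    # Step S41: Delta_i = [f_i - c_1, ..., f_i - c_M] and
    # alpha* = (Delta^T Delta + beta*I)^{-1} 1.
    Delta = f_i[:, None] - D[:, nearest]            # (d, M)
    G = Delta.T @ Delta + beta * np.eye(M)          # (M, M)
    alpha_star = np.linalg.solve(G, np.ones(M))
    # Step S42: normalize so that 1^T alpha = 1.
    alpha = alpha_star / alpha_star.sum()
    # Step S43: scatter the M coefficients into a K-dimensional code v_i.
    v = np.zeros(K)
    v[nearest] = alpha
    return v
```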
Step S5, spatially dividing each image in the database into multiple rectangular blocks, and pooling the local features within each rectangular block as the feature representation of that block;
In this step, an image may for example be divided into regular rectangular blocks along its height and width (e.g. 4 × 4), and the local features inside each block are then max-pooled, yielding the visual-word responses b_1, b_2, ..., b_16 on the 16 rectangular blocks, i.e. the feature representations of the blocks.
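This block-wise max pooling might be sketched as follows, assuming the codes and feature coordinates produced by the earlier sketches.

```python
# Max-pool the codes of the features falling in each block of a
# regular grid (the 4 x 4 grid of the example), giving b_1 ... b_16.
import numpy as np

def pool_blocks(codes, coords, img_w, img_h, grid=4):
    # codes: (n, K) feature codes; coords: (n, 2) feature positions (x, y).
    K = codes.shape[1]
    blocks = np.zeros((grid * grid, K))
    bx = np.minimum((coords[:, 0] / img_w * grid).astype(int), grid - 1)
    by = np.minimum((coords[:, 1] / img_h * grid).astype(int), grid - 1)
    for idx in range(grid * grid):
        mask = (by * grid + bx) == idx
        if mask.any():
            blocks[idx] = codes[mask].max(axis=0)   # max pooling
    return blocks  # one row per rectangular block
```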
Step S6, merging spatially adjacent rectangular blocks into a region by pooling, and taking the pooled result of the merged blocks as the feature representation of the resulting region;
The pooling may use any pooling method commonly used in the prior art; max pooling is taken as an example below.
The merging step is specifically: spatially adjacent rectangular blocks are merged into a region by max pooling, and the region may be of arbitrary size. For a 1 × 2 region composed of b_1 and b_2, for instance, the feature representation of the region is obtained by
r_1 = max(b_1, b_2),
where max takes the element-wise maximum of the two vectors.
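Continuing the sketch, adjacent blocks can be merged into regions of every rectangular size by element-wise max pooling; the recorded size and top-left position are bookkeeping details added here for the pairing of step S7.

```python
# Merge spatially adjacent blocks of the grid into rectangular regions
# of all sizes; each region representation is the element-wise maximum
# of its blocks, r = max(b_i, b_j, ...).
import numpy as np

def merge_regions(blocks, grid=4):
    B = blocks.reshape(grid, grid, -1)      # (row, col, K)
    regions = []
    for h in range(1, grid + 1):            # region height in blocks
        for w in range(1, grid + 1):        # region width in blocks
            for top in range(grid - h + 1):
                for left in range(grid - w + 1):
                    patch = B[top:top + h, left:left + w]
                    regions.append(((h, w), (top, left),
                                    patch.max(axis=(0, 1))))
    return regions  # list of (size, position, representation)
```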
Step S7, pooling all pairs of the equal-size, non-overlapping regions obtained in step S6, and concatenating the pooled results as the feature representation of the image;
The pooling may use any pooling method commonly used in the prior art; min pooling is taken as an example below.
In this step, min pooling is applied to every combination of two equal-size, non-overlapping regions, i.e. p_k = min(r_i, r_j), where k is the index of the combination and r_i, r_j are the feature representations of the two regions. The final feature representation x_i of the image is then the concatenation over all region combinations, that is, x_i = (p_1; p_2; ...; p_N), where N is the number of region combinations.
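A sketch of this pairwise min pooling, consuming the output of the merge_regions sketch above; the overlap test uses the block positions recorded there.

```python
# Min-pool every pair of equal-size, non-overlapping regions and
# concatenate the results into the final image representation.
import numpy as np
from itertools import combinations

def pair_pool(regions):
    # regions: list of ((h, w), (top, left), vector) from merge_regions.
    parts = []
    for (s1, p1, r1), (s2, p2, r2) in combinations(regions, 2):
        if s1 == s2 and not _overlap(p1, p2, s1):
            parts.append(np.minimum(r1, r2))   # p_k = min(r_i, r_j)
    return np.concatenate(parts)               # x = (p_1; p_2; ...; p_N)

def _overlap(p1, p2, size):
    (t1, l1), (t2, l2), (h, w) = p1, p2, size
    return not (l1 + w <= l2 or l2 + w <= l1 or
                t1 + h <= t2 or t2 + h <= t1)
```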
Each pooling operation above may be replaced by another pooling scheme, such as sum pooling or weighted-sum pooling, to reflect other spatial relationships; sketches of these variants follow.
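The per-vector weights of weighted-sum pooling are assumed here to be supplied by the caller, since the description does not specify them.

```python
# Alternative pooling operators that may replace max or min pooling.
import numpy as np

def sum_pool(vectors):
    # vectors: (n, K) stack of codes or region representations.
    return np.sum(vectors, axis=0)

def weighted_sum_pool(vectors, weights):
    # weights: (n,) weights, one per vector; assumed given by the caller.
    return weights @ vectors
```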
Step S8, following steps S5-S7, obtaining the feature representations of all images in the training set, and selecting the most discriminative features as the final representation of the images in the training and test sets;
Here, the method for selecting the most discriminative features may be any conventional feature selection method, such as grafting, an incremental algorithm of the prior art that handles large-scale data conveniently and yields the selected most discriminative features.
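The grafting algorithm itself is not available in common Python libraries; purely as an illustrative stand-in, the sketch below ranks feature dimensions by the weights of an L1-regularized linear SVM and keeps the strongest ones. This substitutes a different selection technique for the grafting named in the text.

```python
# A stand-in for grafting: rank feature dimensions by the weights of
# an L1-regularized linear SVM and keep the most discriminative ones.
import numpy as np
from sklearn.svm import LinearSVC

def select_features(X_train, y_train, n_keep=5000):
    svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
    svm.fit(X_train, y_train)
    scores = np.abs(svm.coef_).sum(axis=0)        # aggregate over classes
    selected = np.argsort(scores)[::-1][:n_keep]  # strongest dimensions
    return selected  # index with X[:, selected] for train and test
```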
Step S9, training a support vector machine on the most discriminative features selected in step S8 to obtain an image classifier;
Step S10, extracting the most discriminative features of each image in the test set and feeding them into the classifier for classification, thereby obtaining the classification result for that image.
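Steps S9 and S10 might be sketched with scikit-learn as follows; the linear kernel is an assumption, as the description only specifies a support vector machine.

```python
# Train an SVM on the selected features (step S9) and classify the
# test images with it (step S10).
from sklearn.svm import LinearSVC

def train_and_classify(X_train, y_train, X_test, selected):
    clf = LinearSVC(max_iter=5000)
    clf.fit(X_train[:, selected], y_train)
    return clf.predict(X_test[:, selected])
```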
To verify the effect of the present invention, a scene classification database is taken as an example below. The database contains more than 4000 images showing 15 different scene categories. According to the content of these images, the present invention assigns to each image the class label of the scene it shows. The specific steps are as follows:
Step S1, randomly selecting 100 images from each scene class to form the training image set, with all remaining images forming the test set;
Step S2, extracting densely sampled SIFT local features from all images;
Step S3, randomly sampling one million local features from the training set, and learning a visual dictionary of 1024 visual words with the K-means algorithm;
Step S4, extracting the local features of all images and encoding the extracted features by local linear coding;
Step S5, spatially dividing each image into 4 × 4 rectangular blocks, and max-pooling the features inside each block as the representation of that block;
Step S6, merging spatially adjacent blocks by max pooling into regions of size 1 × 1, 1 × 2, 1 × 3, 1 × 4, 2 × 1, 2 × 2, 2 × 3, 2 × 4, 3 × 1, 3 × 2, 4 × 1 and 4 × 2, obtaining the representations of the different regions;
Step S7, min-pooling all pairs of equal-size, non-overlapping regions, and concatenating the pooled results of every region combination as the image representation on which feature selection is performed;
Step S8, applying the grafting algorithm to the representations of the training-set images for feature selection, and keeping the most discriminative features as the final representation of the training and test images;
Step S9, feeding the final representations (the selected features) of the training images into a support vector machine to train a classifier;
Step S10, feeding the final representations of the test images into the classifier obtained in step S9 for classification.
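Under the assumptions stated for the sketches above, the embodiment might be strung together as follows; the dataset lists, labels and image sizes are placeholders, and the helper functions are the ones sketched earlier in this description.

```python
# An end-to-end sketch composing the earlier illustrative functions
# (dense_sift, llc_encode, pool_blocks, merge_regions, pair_pool,
# select_features, train_and_classify); paths and labels are placeholders.
import numpy as np

def represent(image_path, img_w, img_h, D):
    desc, coords = dense_sift(image_path)
    codes = np.stack([llc_encode(f, D) for f in desc])
    blocks = pool_blocks(codes, coords, img_w, img_h, grid=4)
    return pair_pool(merge_regions(blocks, grid=4))

# D = learn_dictionary(sampled_features, K=1024)
# X_train = np.stack([represent(p, w, h, D) for (p, w, h) in train_items])
# X_test  = np.stack([represent(p, w, h, D) for (p, w, h) in test_items])
# selected = select_features(X_train, y_train)
# predictions = train_and_classify(X_train, y_train, X_test, selected)
```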
The specific embodiments described above further illustrate the objects, technical solutions and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and do not limit the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. An image classification method based on spatial aggregation, characterized in that the method comprises the following steps:
Step S1, collecting multiple images, building an image classification database, and dividing the database into a training set and a test set;
Step S2, extracting the local features of all images in the database;
Step S3, randomly sampling a number of local features from the local features of the training-set images, and learning a visual dictionary D = [d_1, d_2, ..., d_K] with a clustering algorithm, wherein K denotes the size of the visual dictionary, i.e. the number of cluster centers, and each d_i is a column vector representing a visual word, i.e. a cluster center;
Step S4, encoding the local features of all images extracted in step S2;
Step S5, spatially dividing each image in the database into multiple rectangular blocks, and pooling the local features within each rectangular block as the feature representation of that block;
Step S6, merging spatially adjacent rectangular blocks into a region by pooling, and taking the pooled result of the merged blocks as the feature representation of the resulting region;
Step S7, pooling all pairs of the equal-size, non-overlapping regions obtained in step S6, and concatenating the pooled results as the feature representation of the image;
Step S8, following steps S5-S7, obtaining the feature representations of all images in the training set, and selecting the most discriminative features as the final representation of the images in the training and test sets;
Step S9, training a support vector machine on the most discriminative features selected in step S8 to obtain an image classifier;
Step S10, extracting the most discriminative features of each image in the test set and feeding them into the classifier for classification, thereby obtaining the classification result for that image;
wherein in step S4, local linear coding is used to encode the local features of all images, and applying local linear coding to the i-th local feature f_i comprises:
Step S41, computing the intermediate variable α_i* with the formula
α_i* = (Δ_i^T Δ_i + βI)^(-1) · 1,
wherein Δ_i = [f_i - c_1, f_i - c_2, ..., f_i - c_M]; 1 ∈ R^(M×1) is a column vector whose elements are all 1, R^(M×1) being the space of M × 1 vectors; c_k is the k-th of the M visual words nearest to local feature f_i; Δ_i^T denotes the transpose of Δ_i; β is a constant; I ∈ R^(M×M) is an identity matrix, R^(M×M) being the space of M × M matrices; (·)^(-1) denotes the matrix inverse;
Step S42, normalizing α_i* to obtain the intermediate variable α_i;
Step S43, obtaining the final code v_i of local feature f_i from the value of α_i, wherein the entries of v_i at the several visual words nearest to f_i take the corresponding values in α_i, and the entries at all other visual words are zero.
2. The method according to claim 1, characterized in that the local features of all images in the database are extracted by dense sampling.
3. The method according to claim 1, characterized in that in step S2, the local features are scale-invariant feature transform (SIFT) features or speeded-up robust (SURF) features.
4. The method according to claim 1, characterized in that in step S3, the clustering algorithm is the K-means clustering algorithm.
5. The method according to claim 1, characterized in that in step S5, max pooling, sum pooling or weighted-sum pooling is used to pool the local features within each rectangular block.
6. The method according to claim 1, characterized in that in step S6, max pooling, sum pooling or weighted-sum pooling is used to merge spatially adjacent rectangular blocks into a region.
7. The method according to claim 1, characterized in that in step S7, min pooling, sum pooling or weighted-sum pooling is used to pool the equal-size, non-overlapping pairs of regions obtained in step S6.
8. The method according to claim 1, characterized in that in step S8, the grafting algorithm is used to select the most discriminative features.
CN201210560743.5A 2012-12-20 2012-12-20 An image classification method based on spatial aggregation Active CN103034871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210560743.5A CN103034871B (en) 2012-12-20 2012-12-20 An image classification method based on spatial aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210560743.5A CN103034871B (en) 2012-12-20 2012-12-20 An image classification method based on spatial aggregation

Publications (2)

Publication Number Publication Date
CN103034871A CN103034871A (en) 2013-04-10
CN103034871B (en) 2015-09-23

Family

ID=48021749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210560743.5A Active CN103034871B (en) 2012-12-20 2012-12-20 An image classification method based on spatial aggregation

Country Status (1)

Country Link
CN (1) CN103034871B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325290B (en) * 2020-03-20 2023-06-06 西安邮电大学 Traditional Chinese painting image classification method based on multi-view fusion multi-example learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254194A (en) * 2011-07-19 2011-11-23 清华大学 Supervised manifold learning-based scene classifying method and device
CN102609718A (en) * 2012-01-15 2012-07-25 江西理工大学 Method for generating vision dictionary set by combining different clustering algorithms

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Timo Dickscheid et al., "Coding Images with Local Features", Int J Comput Vis, 2011, pp. 154-174 *
Liu Feng et al., "A multi-feature-based classification method for hyperspectral remote sensing images", Geography and Geo-Information Science, May 2009, vol. 25, no. 3, pp. 19-22, 41 *
Liu Pingping et al., "A fast local feature description algorithm", Acta Automatica Sinica, Jan. 2010, vol. 36, no. 1, pp. 40-45 *
Zhang Wenjun et al., "A mean/standard-deviation-based algorithm for selecting initial cluster centers for K-means", Journal of Remote Sensing, Sept. 2006, vol. 10, no. 5, pp. 715-721 *
Li Xiaoli et al., "A Chinese web page classifier combining support vector machines with unsupervised clustering", Chinese Journal of Computers, Jan. 2001, vol. 24, no. 1, pp. 62-68 *

Also Published As

Publication number Publication date
CN103034871A (en) 2013-04-10

Similar Documents

Publication Publication Date Title
CN110533045B Luggage X-ray contraband image semantic segmentation method combining an attention mechanism
CN105574550B A vehicle identification method and device
CN102662949B Method and system for retrieving a specified object based on multi-feature fusion
CN107679250A A multi-task hierarchical image retrieval method based on deep auto-encoding convolutional neural networks
CN104408483B SAR texture image classification method based on a deep neural network
CN109063649B Pedestrian re-identification method based on a Siamese pedestrian-alignment residual network
CN104268593A Multiple-sparse-representation face recognition method for solving the small-sample-size problem
CN110210534B Multi-bag fusion-based high-resolution remote sensing image scene multi-label classification method
CN104167013B Volume rendering method for highlighting a target area in volume data
CN107292336A A polarimetric SAR image classification method based on DCGAN
CN104751175B SAR image multi-class labeled scene classification method based on an incremental support vector machine
CN103177265B High-definition image classification method based on kernel functions and sparse coding
CN107958067A A large-scale e-commerce image retrieval system based on label-free automatic feature extraction
CN111652273B Deep learning-based RGB-D image classification method
Dewi et al. Taiwan stop sign recognition with customize anchor
CN105740790A Color face recognition method based on multi-kernel dictionary learning
CN107767416A A method for recognizing pedestrian orientation in low-resolution images
CN107085731A An image classification method based on RGB-D fusion features and sparse coding
CN103440508A Remote sensing image target recognition method based on the visual bag-of-words model
CN105740917B Semi-supervised multi-view feature selection method for remote sensing images with label learning
CN114332544A Fine-grained image classification method and device based on image block scoring
CN108229505A Image classification method based on Fisher multi-level dictionary learning
CN103246895B Image classification method based on depth information
Dong et al. New quantitative approach for the morphological similarity analysis of urban fabrics based on a convolutional autoencoder
CN103839074A Image classification method based on matching sketch line segment information with a spatial pyramid

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant