CN103399870A - Classification-driven visual bag-of-words feature weighting method and system - Google Patents


Info

Publication number
CN103399870A
CN103399870A · CN2013102858915A · CN201310285891A
Authority
CN
China
Prior art keywords
image
visual word
visual
weight
class
Prior art date
Legal status
Pending
Application number
CN2013102858915A
Other languages
Chinese (zh)
Inventor
金海
郑然
朱磊
冯晓文
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN2013102858915A
Publication of CN103399870A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a classification-driven visual bag-of-words feature weighting method and system. The method comprises the steps of: downloading images from the Internet and building an image database; extracting the visual bag-of-words features of all N images in the database; building an inverted index over the bag-of-words features of the N images; randomly drawing N1 images and their bag-of-words features from the database; grouping the N1 images into C visual classes with a clustering algorithm; randomly selecting images from each visual class to form that class's learning sample set; building, for each visual class, a weight-learning sample set of bag-of-words features on top of its learning sample set; and training on the weight-learning sample sets to form a support vector machine discrimination model for each visual class. The method and system solve the technical problems of existing methods, namely low retrieval precision and the inability to exploit the differences between different query images.

Description

A classification-driven visual bag-of-words feature weighting method and system
Technical field
The invention belongs to the field of content-based image retrieval (CBIR), and more particularly relates to a classification-driven visual bag-of-words feature weighting method and system.
Background technology
Content-based image retrieval extracts low-level feature vectors from image content and represents the similarity between images by the similarity between their feature vectors. Mainstream image retrieval systems describe image content with local features. Local features have good invariance, carry rich information, and can handle complicated situations such as object occlusion and illumination change. However, in local-feature-based retrieval, computing the similarity between two local feature sets requires matching local features against each other, which is too expensive to meet the needs of real-time retrieval. To reduce the matching cost, CBIR borrows the "bag of words" concept from text retrieval and builds a visual bag-of-words feature on top of the local features. First, a clustering algorithm is applied to a set of local features to obtain cluster centers, also called visual words; then each local feature is assigned to its nearest cluster center; finally, the frequency with which each visual word occurs in an image is counted, and the resulting frequency histogram serves as the image's bag-of-words feature vector.
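The pipeline just described (cluster local features into visual words, quantize each descriptor to its nearest word, count frequencies) can be sketched as follows; this is a minimal illustration with made-up toy descriptors and centers, not the patent's implementation:

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Quantize each local descriptor to its nearest cluster center
    (visual word) and return the normalized word-frequency histogram."""
    # Pairwise squared Euclidean distances: shape (n_descriptors, n_words)
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)  # nearest-center index per descriptor
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()  # frequency histogram

# Toy example: 4 two-dimensional descriptors, vocabulary of 2 visual words
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.8, 9.9], [0.3, 0.1], [10.2, 9.7]])
print(bow_histogram(desc, centers))  # → [0.5 0.5]
```

Two descriptors fall near each center, so each visual word accounts for half of the histogram mass.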
Existing visual bag-of-words weighting methods simply reuse the bag-of-words weighting schemes of text retrieval, applying different weights to visual words of different importance. These methods have the following shortcomings:
(1) They weight visual words with the unsupervised learning schemes of text retrieval and therefore cannot fully exploit the discriminative information between different images. As a result, during similarity computation, the existing bag-of-words features fail to effectively retrieve images semantically relevant to the query image, which lowers retrieval precision.
(2) They assign the same visual word weights to every query image and therefore cannot capture the differences between different query images.
Summary of the invention
In view of the above defects of and improvement demands on the prior art, the invention provides a classification-driven visual bag-of-words feature weighting method and system, whose purpose is to solve the technical problems of existing methods, namely low retrieval precision and the inability to exploit the differences between query images.
To achieve the above object, according to one aspect of the invention, a classification-driven visual bag-of-words feature weighting method is provided, comprising the following steps:
(1) downloading images from a network and building an image database;
(2) extracting the visual bag-of-words features of all N images in the image database, where N is any positive integer;
(3) building an inverted index over the bag-of-words features of the N images;
(4) randomly drawing N1 images and their bag-of-words features from the database, grouping the N1 images into C visual classes with a clustering algorithm, and randomly selecting images from each visual class to form that class's learning sample set, where C is a positive integer;
(5) for each visual class, building on that class's learning sample set the weight-learning sample set of bag-of-words features, and training on it to form the class's support vector machine discrimination model;
(6) using the maximum classifier score principle and the trained discrimination models to predict the visual class of every image in the database (this class is the image's latent category label), and building a mapping table from each image's index to its latent category label;
(7) computing, from the weight-learning sample sets of bag-of-words features, the visual word weights of each of the C visual classes, and building a mapping table from each visual class to its visual word weights;
(8) extracting the bag-of-words feature of a query image and feeding it to the support vector machine discrimination models to obtain the query image's visual class;
(9) looking up the query image's visual class in the class-to-weights mapping table to obtain the weights of the visual words in the query's bag-of-words feature, feeding them to the inverted index file, and computing the distances between the query image and all images in the database;
(10) sorting the distances obtained in step (9) in ascending order and returning to the user the database images corresponding to each distance, thereby obtaining the retrieval result;
(11) judging whether the retrieval result satisfies the user: if the user is not satisfied, selecting from the result L images perceptually similar to the query image as positive feedback, adjusting the visual word weights, and re-querying the inverted index to obtain a re-ranked result; if the user is satisfied, returning the retrieval result directly.
Preferably, step (2) comprises the following sub-steps:
(2-1) normalizing all images in the database to the same size;
(2-2) detecting interest points in all images by dense sampling;
(2-3) taking an image patch around each interest point of every image and extracting a local descriptor on the patch, thereby forming the image's set of local descriptors;
(2-4) randomly selecting R local descriptors from the descriptor sets of the N1 images and clustering them to form M cluster centers, where N1 is a positive integer no larger than N, M is the number of words in the bag-of-words vocabulary, and R is a positive integer multiple of M;
(2-5) computing the distance from each local descriptor of every image to each cluster center, and assigning the descriptor to its nearest cluster center;
(2-6) counting the frequency with which each cluster center occurs in every image, and using the resulting frequency histogram as the image's visual bag-of-words feature.
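Sub-steps (2-2) and (2-3) rely on dense sampling of interest points on a regular grid. A minimal sketch (the grid step and the 16-pixel patch size are illustrative defaults, not values mandated by the method):

```python
def dense_sample(height, width, step=8, patch=16):
    """Return (row, col) interest points on a regular grid, keeping only
    points whose surrounding patch x patch block lies inside the image."""
    half = patch // 2
    points = []
    for r in range(half, height - half + 1, step):
        for c in range(half, width - half + 1, step):
            points.append((r, c))
    return points

# On a 64 x 64 image, valid patch centers form a 7 x 7 grid
pts = dense_sample(64, 64)
print(len(pts))  # → 49
```

Each returned point would then seed a 16 × 16 patch from which a local descriptor is extracted.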
Preferably, the clustering algorithm is a K-means or mean-shift algorithm.
Preferably, step (5) comprises the following sub-steps:
(5-1) for each visual class, building on its learning sample set the weight-learning sample set of bag-of-words features {(f_n, y_n) | n = 1, …, N2}, y_n ∈ {1, 2, …, C}, where f_n is the bag-of-words feature of the n-th image, y_n is the label of that image's visual class, and N2 is a positive integer no larger than N1;
(5-2) explicitly mapping each bag-of-words feature f_n into a higher-dimensional space through an explicit mapping function φ, so that the features become linearly separable;
(5-3) learning the parameters w_c and b_c of C linear support vector machines, where c = 1, 2, …, C;
(5-4) building from the parameters w_c and b_c the C support vector machine discrimination models Classifier_c:
Classifier_c(x) = w_c · φ(x) + b_c
where x is the bag-of-words feature of an arbitrary image.
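Given learned parameters w_c and b_c, classification by the maximum classifier score principle of step (6) reduces to an argmax over the C linear scores. A minimal sketch, taking the explicit mapping φ as the identity for brevity (the patent uses an explicit lifting map):

```python
import numpy as np

def classify(x, W, b):
    """Score x against C linear models Classifier_c(x) = w_c . x + b_c
    and return the index of the highest-scoring visual class."""
    scores = W @ x + b  # shape (C,), one score per class
    return int(np.argmax(scores))

# Toy model: C = 2 classes over 3-dimensional bag-of-words features
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([0.0, 0.0])
print(classify(np.array([0.2, 0.7, 0.1]), W, b))  # → 1
```

The returned index is the image's latent category label used to build the mapping table of step (6).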
Preferably, step (7) comprises the following sub-steps:
(7-1) classifying the images in the weight-learning sample set of bag-of-words features with the support vector machine discrimination models to obtain each image's visual category label, using:
Category(x) = argmax_c Classifier_c(x),
where Category(x) denotes the visual category label of image x;
(7-2) computing, over the weight-learning sample set, the classification confusion matrix P from each image's predicted visual category label and its true visual class label; the element in row d, column c of P is the probability that an image whose visual class label is d is misassigned label c, where d and c are any positive integers no larger than C. The formula is:
P(d, c) = Σ_{n=1}^{N2} δ(d, y_n)·δ(c, Category(f_n)) / Σ_{n=1}^{N2} δ(d, y_n)
where δ(a, b) equals 1 if a equals b and 0 otherwise;
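The confusion matrix of sub-step (7-2) can be computed by counting, for each true class d, the fraction of its samples assigned label c. A small sketch with hypothetical labels:

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, C):
    """P[d, c] = fraction of class-d samples that the classifier assigns
    label c (each non-empty row sums to 1)."""
    P = np.zeros((C, C))
    for y, p in zip(true_labels, predicted_labels):
        P[y, p] += 1.0
    counts = P.sum(axis=1, keepdims=True)  # samples per true class
    return P / np.maximum(counts, 1.0)     # guard against empty classes

# Toy example: class 0 is always correct; class 1 is confused half the time
P = confusion_matrix([0, 0, 1, 1], [0, 0, 1, 0], C=2)
print(P)  # rows: [1, 0] and [0.5, 0.5]
```

Off-diagonal mass in a row flags the confusable class pairs sought in sub-step (7-3).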
(7-3) determining which pairs of the C visual classes are mutually confusable;
(7-4) computing the weight of each visual word of visual class c from the weight-learning sample set of bag-of-words features by solving the optimization problem:
min_{w_{y_i}} (λ/2)·||w_{y_i}||² + Σ_{ijk} ξ_{ijk}
subject to  w_{y_i}ᵀ D_ik − w_{y_i}ᵀ D_ij ≥ 1 − ξ_{ijk},  w_{y_i} ≥ 0,  ξ_{ijk} ≥ 0,  ∀(i, j, k) ∈ T
where T = {(i, j, k) | y_i = y_j, y_i ≠ y_k, i, j, k ∈ {1, 2, …, N2}} is the set of triplets over the weight-learning sample set; y_i, y_j, y_k are the category labels of images i, j, k in the set, with y_k a confusable class of y_i; D_ij(m), m = 1, 2, …, M, is the distance between images i and j on the m-th visual word; D_ij = {D_ij(1), D_ij(2), …, D_ij(M)} is the vector of the M visual-word distances between images i and j; and D_w(I_i, I_j) = wᵀ · D_ij denotes the weighted bag-of-words feature distance between images i and j.
(7-5) solving the above optimization problem by stochastic gradient descent;
(7-6) building the mapping table from visual classes to visual word weights.
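The optimization of sub-step (7-4) is a hinge (margin) objective over triplets: the weighted distance to an image of a confusable class should exceed the within-class distance by a margin of 1. A stochastic-subgradient sketch under illustrative assumptions (λ, the learning rate, the iteration count, and the toy distance vectors are all made up here, not values from the patent):

```python
import numpy as np

def learn_weights(triplet_dists, M, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Minimize (lam/2)||w||^2 + sum_t max(0, 1 - w.(D_ik - D_ij)) over
    triplets t = (D_ij, D_ik) by stochastic subgradient descent,
    projecting w onto the non-negative orthant after each step."""
    rng = np.random.default_rng(seed)
    w = np.ones(M)
    for _ in range(epochs):
        D_ij, D_ik = triplet_dists[rng.integers(len(triplet_dists))]
        grad = lam * w                        # regularizer subgradient
        if w @ (D_ik - D_ij) < 1.0:           # margin violated: hinge active
            grad -= (D_ik - D_ij)
        w = np.maximum(w - lr * grad, 0.0)    # keep weights non-negative
    return w

# One toy triplet: word 0 separates the confusable class, word 1 does not
trip = [(np.array([0.1, 0.5]), np.array([0.9, 0.5]))]
w = learn_weights(trip, M=2)
print(w[0] > w[1])  # the discriminative word receives the larger weight
```

The projection step enforces the constraint w ≥ 0 from the optimization problem.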
Preferably, the distance formula in step (9) is:
D(I_q, I_i) = w_{category(f_q)} · D_qi = Σ_{m=1}^{M} w_{category(f_q)}(m) · D_qi(m)
where D(I_q, I_i) is the bag-of-words feature distance between query image q and the i-th database image, category(f_q) is the visual category of the query image, and w_{category(f_q)} is the query image's visual word weight vector.
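The class-conditioned distance of step (9) is an inner product between the class's weight vector and the per-word distance vector. A minimal sketch with hypothetical values:

```python
import numpy as np

def weighted_distance(w_class, D_qi):
    """D(I_q, I_i) = sum_m w(m) * D_qi(m): per-visual-word distances
    weighted by the query's class-specific word weights."""
    return float(w_class @ D_qi)

w_class = np.array([2.0, 0.5, 1.0])  # weights of the query's visual class
D_qi = np.array([0.1, 0.4, 0.2])     # per-word distances, query vs. image i
print(weighted_distance(w_class, D_qi))  # 2*0.1 + 0.5*0.4 + 1*0.2 = 0.6 (up to float rounding)
```

Words with larger class weights contribute more to the final ranking distance.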
Preferably, step (7-3) is specifically: for any two visual classes c and d among the C visual classes, if the confusion value between c and d exceeds a threshold ζ_c, then d is a confusable class of c and c is a confusable class of d, where ζ_c ranges from 0.5 to 1.
Preferably, step (11) comprises the following sub-steps:
(11-1) the user selects from the retrieval result L feedback images I_1, I_2, …, I_L that are perceptually similar to the query image;
(11-2) looking up each of the L images in the mapping table from image index to latent category label to obtain its visual category label;
(11-3) computing from those labels the probability with which each category occurs among the L images, and finding the category label of highest probability;
(11-4) recomputing, under each visual category's word weights, the distances between the query image and all database images:
D_c(I_q, I_i) = w_c · D_qi = Σ_{m=1}^{M} w_c(m) · D_qi(m)
(11-5) fusing the per-class distances between the query image and all database images as:
D(I_q, I_i) = Σ_{c=1}^{C} P(c) · D_c(I_q, I_i)
(11-6) re-ranking all database images by the fused distance in ascending order and returning the retrieved images.
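Sub-steps (11-4) and (11-5) recompute distances under each candidate class's weights and fuse them with the class probabilities estimated from the feedback images. A sketch with hypothetical weights and probabilities:

```python
import numpy as np

def fused_distance(W, P, D_qi):
    """D(I_q, I_i) = sum_c P(c) * (w_c . D_qi): per-class weighted
    distances blended by class probabilities P estimated from the
    user's feedback images."""
    per_class = W @ D_qi  # shape (C,): distance under each class's weights
    return float(P @ per_class)

W = np.array([[1.0, 0.0],   # word weights of class 0
              [0.0, 1.0]])  # word weights of class 1
P = np.array([0.75, 0.25])  # 3 of 4 feedback images fell in class 0
D_qi = np.array([0.4, 0.8])
print(fused_distance(W, P, D_qi))  # 0.75*0.4 + 0.25*0.8 = 0.5 (up to float rounding)
```

Classes that dominate the feedback thus dominate the re-ranking distance.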
According to another aspect of the invention, a classification-driven visual bag-of-words feature weighting system is provided, comprising:
a first module for downloading images from a network and building an image database;
a second module for extracting the visual bag-of-words features of all N images in the database, where N is any positive integer;
a third module for building an inverted index over the bag-of-words features of the N images;
a fourth module for randomly drawing N1 images and their bag-of-words features from the database, grouping the N1 images into C visual classes with a clustering algorithm, and randomly selecting images from each class to form that class's learning sample set, where C is a positive integer;
a fifth module for building, for each visual class, a weight-learning sample set of bag-of-words features on that class's learning sample set, and training on it to form the class's support vector machine discrimination model;
a sixth module for using the maximum classifier score principle and the trained discrimination models to predict each database image's visual class (its latent category label) and building the mapping table from image index to latent category label;
a seventh module for computing, from the weight-learning sample sets, the visual word weights of each of the C visual classes and building the mapping table from visual class to visual word weights;
an eighth module for extracting the bag-of-words feature of a query image and feeding it to the discrimination models to obtain the query's visual class;
a ninth module for looking up the query's visual class in the class-to-weights mapping table to obtain the weights of the visual words in the query's bag-of-words feature, feeding them to the inverted index file, and computing the distances between the query and all database images;
a tenth module for sorting the distances produced by the ninth module in ascending order and returning to the user the corresponding database images as the retrieval result;
an eleventh module for judging whether the retrieval result satisfies the user: if not, selecting from the result L images perceptually similar to the query as positive feedback, adjusting the visual word weights, and re-querying the inverted index to obtain a re-ranked result; otherwise returning the retrieval result directly.
In general, compared with the prior art, the above technical scheme conceived by the invention achieves the following beneficial effects:
(1) By organizing the weight-learning sample sets of bag-of-words features offline, the invention fully mines the discriminative information between different images and incorporates it into the visual word weights, improving the precision of CBIR.
(2) Through linear classification, the invention maps the query image to a predefined visual category and, from the offline-learned class-to-weights mapping table, obtains the weight of each visual word in the query's bag-of-words feature; different query images thus receive different weights, reflecting their differences.
(3) By learning each visual class's word weights from offline-organized samples, the invention computes the weights of each visual class efficiently; since the processing is offline, it does not affect the response speed of CBIR.
(4) By computing the confusion matrix between visual classes and learning word weights only for the easily confused classes, the invention reduces the computation of visual word weights and speeds up offline processing.
(5) Through the feedback method over latent visual categories, the invention introduces the high-level semantics of the user's query and, by effectively combining support vector machine classification with the visual word weights, reflects the user's personalized query demand.
Brief description of the drawings
Fig. 1 is a flowchart of the classification-driven visual bag-of-words feature weighting method of the invention.
Fig. 2 is a detailed flowchart of step (2) of the method.
Fig. 3 is a detailed flowchart of step (5) of the method.
Fig. 4 is a detailed flowchart of step (7) of the method.
Fig. 5 is a detailed flowchart of step (11) of the method.
Detailed description of the embodiments
To make the objects, technical schemes, and advantages of the invention clearer, the invention is further elaborated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.
First, the technical terms used in the invention are explained:
Dense sampling: taking one pixel as an interest point every few pixels across the image.
Learning sample set: mainly used for training the support vector machine models and computing the visual word weights.
Linear support vector machine: a support vector machine that separates positive and negative samples with a linear partition surface.
Visual category space: formed by the main visual classes in the image database.
Visual bag-of-words feature: the occurrence frequencies of the visual words, obtained by quantizing local features to cluster centers.
Inverted index: the inverted index of text retrieval applied to bag-of-words features to speed up retrieval.
Explicit mapping function: a function that simulates, in explicit form, the implicit mapping from low-dimensional to high-dimensional features inside a support vector machine model.
Maximum score principle: an image's category label is the label of the classifier with the largest score.
Confusion matrix: the matrix of misclassification probabilities between visual classes, from which mutually indistinguishable visual classes are identified.
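The inverted index defined above maps each visual word to the list of images containing it, so a query only touches images that share at least one word with it. A minimal sketch over a toy three-word vocabulary:

```python
from collections import defaultdict

def build_inverted_index(image_histograms):
    """Map each visual word id to the (image id, frequency) postings of
    images whose bag-of-words histogram contains that word."""
    index = defaultdict(list)
    for image_id, hist in enumerate(image_histograms):
        for word, freq in enumerate(hist):
            if freq > 0:
                index[word].append((image_id, freq))
    return index

# Toy database: 3 images over a 3-word vocabulary
hists = [[2, 0, 1],
         [0, 3, 0],
         [1, 1, 0]]
index = build_inverted_index(hists)
print(index[0])  # → [(0, 2), (2, 1)]: images 0 and 2 contain word 0
```

At query time, scanning only the postings of the query's non-zero words avoids touching unrelated images.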
The overall idea of the classification-driven visual bag-of-words feature weighting method of the invention is to learn the visual word weights offline and build the mapping table from visual classes to word weights. At retrieval time, the query image is first classified by the support vector machine discrimination models to obtain its visual category; then, from the class-to-weights mapping table, the weights of the visual words in its bag-of-words feature are obtained; finally, bag-of-words similarity matching and relevance feedback on the retrieval result are performed. The method is of great significance to CBIR applications such as shopping image search and the retrieval of scenic spot images, flower images, and design patent images.
As shown in Fig. 1, the classification-driven visual bag-of-words feature weighting method of the invention comprises the following steps:
(1) downloading images from a network and building an image database;
(2) extracting the visual bag-of-words features of all N images in the image database (N is any positive integer); as shown in Fig. 2, this step comprises the following sub-steps:
(2-1) normalizing all images in the database to the same size;
(2-2) detecting interest points in all images by dense sampling;
(2-3) taking an image patch around each interest point of every image and extracting a local descriptor on the patch, thereby forming the image's set of local descriptors; in this embodiment, the patch size is 16 × 16 pixels;
(2-4) randomly selecting R local descriptors from the descriptor sets of the N1 images and clustering them to form M cluster centers, where N1 is a positive integer no larger than N, M is the number of words in the bag-of-words vocabulary, and R is a positive integer multiple of M;
(2-5) computing the distance from each local descriptor of every image to each cluster center, and assigning the descriptor to its nearest cluster center;
(2-6) counting the frequency with which each cluster center occurs in every image, and using the resulting frequency histogram as the image's visual bag-of-words feature;
(3) building an inverted index over the bag-of-words features of the N images;
(4) randomly drawing N1 images and their bag-of-words features from the database, grouping the N1 images into C visual classes with a clustering algorithm, and randomly selecting images from each visual class to form that class's learning sample set, where C is a positive integer; particularly, the clustering algorithm may be K-means, mean shift, or the like;
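Step (4) can be sketched with a bare-bones K-means (Lloyd's algorithm) over bag-of-words feature vectors; a real system would use a library implementation, and the two-dimensional toy data here stand in for the actual features:

```python
import numpy as np

def kmeans(X, C, iters=20, seed=0):
    """Lloyd's K-means: alternate nearest-center assignment and center
    recomputation; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), C, replace=False)]  # random init
    for _ in range(iters):
        # Squared distances from every point to every center: (n, C)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(C):
            if np.any(labels == c):  # skip empty clusters
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

# Two well-separated clumps of 2-D "feature vectors"
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
centers, labels = kmeans(X, C=2)
print(labels[0] != labels[2])  # the two clumps land in different classes
```

Each resulting cluster of images corresponds to one visual class, from which the learning sample set is drawn.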
(5) for each visual class, building on that class's learning sample set the weight-learning sample set of bag-of-words features, and training on it to form the class's support vector machine discrimination model; as shown in Fig. 3, this step comprises the following sub-steps:
(5-1) for each visual class, building on its learning sample set the weight-learning sample set of bag-of-words features {(f_n, y_n) | n = 1, …, N2}, y_n ∈ {1, 2, …, C}, where f_n is the bag-of-words feature of the n-th image, y_n is the label of that image's visual class, and N2 is a positive integer no larger than N1;
(5-2) explicitly mapping each bag-of-words feature f_n into a higher-dimensional space through an explicit mapping function φ, so that the features become linearly separable;
The advantage of this sub-step is that bag-of-words features that are not linearly separable in the original space become linearly separable after the explicit mapping, which avoids the more complex nonlinear support vector machine models and improves the efficiency of model training and classification.
(5-3) learning the parameters w_c and b_c of C linear support vector machines, where c = 1, 2, …, C;
(5-4) building from the parameters w_c and b_c the C support vector machine discrimination models Classifier_c:
Classifier_c(x) = w_c · φ(x) + b_c
where x is the bag-of-words feature of an arbitrary image.
The advantage of this step is that each visual class's word weights are learned from offline-organized samples and thus computed efficiently; since the processing is offline, it does not affect the response speed of CBIR.
(6) using the maximum classifier score principle and the trained discrimination models to predict the visual class of every image in the database (this class is the image's latent category label), and building the mapping table from each image's index to its latent category label;
(7) computing, from the weight-learning sample sets of bag-of-words features, the visual word weights of each of the C visual classes, and building the mapping table from each visual class to its visual word weights; as shown in Fig. 4, this step comprises the following sub-steps:
(7-1) classifying the images in the weight-learning sample set with the support vector machine discrimination models to obtain each image's visual category label, using:
Category(x) = argmax_c Classifier_c(x),
where Category(x) denotes the visual category label of image x;
(7-2) computing, over the weight-learning sample set, the classification confusion matrix P from each image's predicted visual category label and its true visual class label; the element in row d, column c of P is the probability that an image whose visual class label is d is misassigned label c, where d and c are any positive integers no larger than C. The formula is:
P(d, c) = Σ_{n=1}^{N2} δ(d, y_n)·δ(c, Category(f_n)) / Σ_{n=1}^{N2} δ(d, y_n)
where δ(a, b) equals 1 if a equals b and 0 otherwise;
(7-3) for any two visual classes c and d among the C visual classes, if the confusion value between c and d exceeds a threshold ζ_c, then d is a confusable class of c and c is a confusable class of d, where ζ_c ranges from 0.5 to 1;
The advantage of this sub-step is that, by computing the confusion matrix between visual classes and learning word weights only for the easily confused classes, the computation of visual word weights is reduced and offline processing is sped up.
(7-4) computing the weight of each visual word of visual class c from the weight-learning sample set of bag-of-words features by solving the optimization problem:
min_{w_{y_i}} (λ/2)·||w_{y_i}||² + Σ_{ijk} ξ_{ijk}
subject to  w_{y_i}ᵀ D_ik − w_{y_i}ᵀ D_ij ≥ 1 − ξ_{ijk},  w_{y_i} ≥ 0,  ξ_{ijk} ≥ 0,  ∀(i, j, k) ∈ T
where T = {(i, j, k) | y_i = y_j, y_i ≠ y_k, i, j, k ∈ {1, 2, …, N2}} is the set of triplets over the weight-learning sample set; y_i, y_j, y_k are the category labels of images i, j, k in the set, with y_k a confusable class of y_i; D_ij(m), m = 1, 2, …, M, is the distance between images i and j on the m-th visual word; D_ij = {D_ij(1), D_ij(2), …, D_ij(M)} is the vector of the M visual-word distances between images i and j; and D_w(I_i, I_j) = wᵀ · D_ij denotes the weighted bag-of-words feature distance between images i and j;
(7-5) solving the above optimization problem by stochastic gradient descent;
(7-6) building the mapping table from visual classes to visual word weights;
(8) extracting the bag-of-words feature of the query image and feeding it to the support vector machine discrimination models to obtain the query image's visual class;
(9) looking up the query image's visual class in the class-to-weights mapping table to obtain the weights of the visual words in the query's bag-of-words feature, feeding them to the inverted index file, and computing the distances between the query image and all images in the database by the formula:
D(I_q, I_i) = w_{category(f_q)} · D_qi = Σ_{m=1}^{M} w_{category(f_q)}(m) · D_qi(m)
where D(I_q, I_i) is the bag-of-words feature distance between query image q and the i-th database image, category(f_q) is the visual category of the query image, and w_{category(f_q)} is the query image's visual word weight vector.
(10) sort the distances obtained in step (9) in ascending order, and return to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
(11) determine whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, select from the result L images that are visually identical to the query image as positive feedback, adjust the visual word weights, and feed them into the inverted index to obtain a re-ranked result; if the user is satisfied, return the result directly, where L is a positive integer. As shown in Figure 4, this step specifically comprises the following sub-steps:
(11-1) from the retrieval result, the user selects L feedback images I_1, I_2, …, I_L that are visually similar to the query image;
(11-2) from the mapping table between each database image's index and its latent category label, obtain the visual category labels of the L images;
(11-3) from the visual category labels, compute the occurrence probability of each category among the L images, and find the category label with the maximum probability;
(11-4) recompute, under each visual category's bag-of-words feature weights, the distance between the query image and every image in the image database:
$$D_c(I_q, I_i) = w_c \cdot D_{qi} = \sum_{m=1}^{M} w_c(m) \times D_{qi}(m)$$
(11-5) fuse the distances between the query image and all images in the image database as follows:
$$D(I_q, I_d) = \sum_{c=1}^{C} P(c) \times D_c(I_q, I_d)$$
(11-6) re-rank all images in the image database by the fused distance in ascending order, and return the retrieved images.
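Sub-steps (11-3) to (11-6) can be sketched as below — an illustrative reading in which P(c) is the occurrence frequency of class c among the L feedback images; the function name and the per-word L1 distance are assumptions:

```python
from collections import Counter
import numpy as np

def rerank_with_feedback(feedback_labels, query_hist, db_hists, class_weights):
    """Estimate P(c) from the L positive-feedback images and fuse the
    per-class weighted distances: D(I_q, I_d) = sum_c P(c) * D_c(I_q, I_d)."""
    L = len(feedback_labels)
    fused = np.zeros(len(db_hists))
    for c, count in Counter(feedback_labels).items():
        p_c = count / L                                          # P(c)
        d_c = np.abs(db_hists - query_hist) @ class_weights[c]   # D_c(I_q, I_i)
        fused += p_c * d_c
    return np.argsort(fused), fused      # ascending order = re-ranked result
```

Because the fusion is a convex combination over the feedback classes, a single mislabeled feedback image only shifts the ranking in proportion to 1/L.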
The advantage of this step is that it proposes a feedback method based on latent visual classes, which introduces the high-level semantics of the user's query; by effectively combining support vector machine classification with the visual word weights, it reflects the user's personalized query demand.
The classification-driven bag-of-visual-words feature weighting system of the present invention comprises:
A first module for downloading images from the network and building an image database;
A second module for extracting the bag-of-visual-words features of all N images in the image database, where N is any positive integer;
A third module for building an inverted index of the bag-of-visual-words features of the N images;
A fourth module for randomly sampling N_1 images and their corresponding bag-of-visual-words features from the image database, grouping the N_1 images into C visual classes by a clustering algorithm, and randomly selecting images from each visual class to form the learning sample set of that visual class, where C is a positive integer;
A fifth module for building, for each visual class and on the learning sample set of that visual class, a weight learning sample set of bag-of-visual-words features, and using this weight learning sample set to train each visual class, to form the support vector machine discrimination model of that visual class;
A sixth module for applying the maximum-classifier-score principle, using the trained support vector machine discrimination models to obtain the visual class of every image in the image database, this visual class being the latent category label of the image, and building the mapping table between each image's index and its latent category label;
A seventh module for computing, from the weight learning sample set of bag-of-visual-words features, the visual word weights of each of the C visual classes, and building the mapping table between each visual class and its visual word weights;
An eighth module for extracting the bag-of-visual-words feature of a query image and feeding it into the support vector machine discrimination models, to obtain the visual class of the query image;
A ninth module for obtaining, from the mapping table between the visual class of the query image and the visual word weights, the weights of the visual words in the query image's bag-of-visual-words feature, feeding them into the inverted index file, and computing the distances between the query image and all images in the image database;
A tenth module for sorting the distances obtained by the ninth module in ascending order, and returning to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
An eleventh module for determining whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, L images visually identical to the query image are selected from the result as positive feedback, the visual word weights are adjusted and fed into the inverted index, and a re-ranked result is obtained; if the user is satisfied, the result is returned directly.

Claims (9)

1. A classification-driven bag-of-visual-words feature weighting method, characterized by comprising the following steps:
(1) downloading images from the network, and building an image database;
(2) extracting the bag-of-visual-words features of all N images in the image database, where N is any positive integer;
(3) building an inverted index of the bag-of-visual-words features of the N images;
(4) randomly sampling N_1 images and their corresponding bag-of-visual-words features from the image database, grouping the N_1 images into C visual classes by a clustering algorithm, and randomly selecting images from each visual class to form the learning sample set of that visual class, C being a positive integer;
(5) for each visual class, building on the learning sample set of that visual class a weight learning sample set of bag-of-visual-words features, and using this weight learning sample set to train each visual class, to form the support vector machine discrimination model of that visual class;
(6) applying the maximum-classifier-score principle, using the trained support vector machine discrimination models to obtain the visual class of every image in the image database, this visual class being the latent category label of the image, and building the mapping table between each image's index and its latent category label;
(7) computing, from the weight learning sample set of bag-of-visual-words features, the visual word weights of each of the C visual classes, and building the mapping table between each visual class and its visual word weights;
(8) extracting the bag-of-visual-words feature of a query image and feeding it into the support vector machine discrimination models, to obtain the visual class of the query image;
(9) obtaining, from the mapping table between the visual class of the query image and the visual word weights, the weights of the visual words in the query image's bag-of-visual-words feature, feeding them into the inverted index file, and computing the distances between the query image and all images in the image database;
(10) sorting the distances obtained in step (9) in ascending order, and returning to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
(11) determining whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, selecting from the result L images visually identical to the query image as positive feedback, adjusting the visual word weights, and feeding them into the inverted index to obtain a re-ranked result; if the user is satisfied, returning the result directly, L being a positive integer.
2. The bag-of-visual-words feature weighting method according to claim 1, characterized in that step (2) comprises the following sub-steps:
(2-1) normalizing all images in the image database, so that all images are unified to the same size;
(2-2) detecting the feature points of interest of all images by dense sampling;
(2-3) taking an image patch around each feature point of interest of every image, and extracting a local descriptor on the patch, to form the set of local descriptors of this image;
(2-4) randomly selecting R local descriptors from the local descriptor sets of N_1 images and clustering them, to form M cluster centers, where N_1 is a positive integer not larger than N, M is the number of words in the bag-of-visual-words feature, and R is a positive integer multiple of M;
(2-5) computing the distance of every local descriptor of each image to each cluster center, and assigning the nearest cluster center to that local descriptor of the image;
(2-6) counting the frequency with which each cluster center occurs in every image, and taking the resulting frequency histogram as this image's bag-of-visual-words feature.
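Sub-steps (2-4) to (2-6) form the standard visual-vocabulary pipeline. Below is a minimal numpy sketch assuming K-means clustering (one of the options named in claim 3) and Euclidean descriptor distances; function names and defaults are hypothetical:

```python
import numpy as np

def kmeans_centers(descriptors, M, iters=20, seed=0):
    """Step (2-4): cluster R local descriptors into M visual words (tiny K-means)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), M, replace=False)]
    for _ in range(iters):
        # nearest center for every descriptor
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for m in range(M):
            if np.any(assign == m):          # keep the old center if it lost all points
                centers[m] = descriptors[assign == m].mean(axis=0)
    return centers

def bovw_histogram(image_descriptors, centers):
    """Steps (2-5)/(2-6): assign each descriptor of an image to its nearest
    visual word and count word frequencies to form the BoVW feature."""
    d = np.linalg.norm(image_descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()                 # normalized frequency histogram
```

Normalizing the histogram makes images with different numbers of interest points comparable in the distance computations of steps (9) and (11).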
3. The bag-of-visual-words feature weighting method according to claim 2, characterized in that the clustering algorithm comprises the K-means algorithm and the mean-shift algorithm.
4. The bag-of-visual-words feature weighting method according to claim 2, characterized in that step (5) comprises the following sub-steps:
(5-1) for each visual class, on the learning sample set of this visual class, building the weight learning sample set of bag-of-visual-words features {(f_n, y_n)}, n = 1, 2, …, N_2, y_n ∈ {1, 2, …, C}, where f_n denotes the bag-of-visual-words feature of the n-th image, y_n is the label of the visual class of the n-th image, and N_2 is a positive integer not larger than N_1;
(5-2) explicitly mapping the bag-of-visual-words feature f_n to a high-dimensional space with an explicit mapping function φ, so that the features become linearly separable;
(5-3) learning the parameters w_c and b_c of C linear support vector machines, where c = 1, 2, …, C;
(5-4) building C support vector machine discrimination models Classifier_c from the parameters w_c and b_c:

Classifier_c(x) = w_c × φ(x) + b_c

where x is the bag-of-visual-words feature of an arbitrary image.
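Sub-steps (5-3)/(5-4), together with the maximum-classifier-score rule of step (6), can be sketched as one-vs-rest linear SVMs trained by hinge-loss subgradient descent. This is an illustrative stand-in, not the patented training procedure: φ is taken as the identity map, and all hyperparameters are assumptions.

```python
import numpy as np

def train_ovr_svm(X, y, num_classes, lam=0.01, lr=0.1, epochs=300):
    """Step (5-3): learn (w_c, b_c) of C linear SVMs one-vs-rest, by full-batch
    subgradient descent on the regularized hinge loss (phi = identity here)."""
    n, d = X.shape
    W = np.zeros((num_classes, d))
    b = np.zeros(num_classes)
    for c in range(num_classes):
        t = np.where(y == c, 1.0, -1.0)          # one-vs-rest targets
        for _ in range(epochs):
            margins = t * (X @ W[c] + b[c])
            active = margins < 1.0               # points inside the margin
            gw = lam * W[c]
            gb = 0.0
            if active.any():
                gw = gw - (t[active, None] * X[active]).mean(axis=0)
                gb = -t[active].mean()
            W[c] -= lr * gw
            b[c] -= lr * gb
    return W, b

def predict_class(x, W, b):
    """Steps (5-4)/(6): Classifier_c(x) = w_c . phi(x) + b_c, and the
    maximum-classifier-score principle Category(x) = argmax_c Classifier_c(x)."""
    return int(np.argmax(W @ x + b))
```

The argmax over the C classifier scores is exactly the maximum-classifier-score principle used in step (6) to assign each database image its latent category label.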
5. The bag-of-visual-words feature weighting method according to claim 4, characterized in that step (7) comprises the following sub-steps:
(7-1) classifying the images in the weight learning sample set of the bag-of-visual-words features with the support vector machine discrimination models, to obtain the visual classification indicator of each image; the discrimination rule is:

$$Category(x) = \arg\max_{c \in \{1,\dots,C\}} Classifier_c(x)$$

where Category(x) denotes the visual classification indicator of image x;
(7-2) on the weight learning sample set of the bag-of-visual-words features, computing the classification confusion matrix P from the visual classification indicator of each image and the label of that image's visual class, where the element in row d and column c of this matrix is the probability that an image whose visual class label is d is misclassified with visual class label c, d and c being any positive integers not larger than C; the specific formula is:

$$P(d, c) = \frac{\sum_{n=1}^{N_2} \delta(y_n, d)\,\delta(Category(f_n), c)}{\sum_{n=1}^{N_2} \delta(y_n, d)}$$

where δ(a, b) equals 1 if a equals b, and 0 if a is not equal to b;
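The confusion matrix of sub-step (7-2) reduces to row-normalized label counts; a minimal sketch (the function name is assumed):

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, C):
    """Step (7-2): P[d, c] is the probability that an image whose visual class
    label is d is classified as c (row-normalized counts)."""
    P = np.zeros((C, C))
    for y, p in zip(true_labels, predicted_labels):
        P[y, p] += 1.0
    row = P.sum(axis=1, keepdims=True)
    # rows with no samples stay all-zero instead of dividing by zero
    return np.divide(P, row, out=np.zeros_like(P), where=row > 0)
```

Row d of P sums to 1 (when class d has samples), so off-diagonal entries directly give the misclassification probabilities used in sub-step (7-3).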
(7-3) determining which pairs among the C visual classes are confusable classes of each other;
(7-4) using the weight learning sample set of the bag-of-visual-words features, computing the weight of each visual word in visual class c, where the optimization problem for the visual word weights is:

$$\min_{w_{y_i}} \frac{\lambda}{2}\lVert w_{y_i}\rVert^{2} + \sum_{ijk}\xi_{ijk}$$

$$\text{s.t.}\quad w_{y_i}^{T} D_{ik} - w_{y_i}^{T} D_{ij} \ge 1 - \xi_{ijk}$$

$$w_{y_i} \ge 0,\quad \xi_{ijk} \ge 0,\quad \forall (i,j,k) \in T$$

where T = {(i, j, k) | y_i = y_j, y_i ≠ y_k, i, j, k ∈ {1, 2, …, N_1}} denotes the set of triples drawn from the weight learning sample set of the bag-of-visual-words features; y_i, y_j and y_k are the category labels of images i, j and k in that set, and the visual class denoted by y_k is a confusable class of the visual class denoted by y_i; D_ij(m) (m = 1, 2, …, M) denotes the distance between the m-th visual word of image i and that of image j, and D_ij = {D_ij(1), D_ij(2), …, D_ij(M)} denotes the vector of the M visual-word distances between images i and j; D_w(I_i, I_j) = w^T · D_ij denotes the weighted bag-of-words feature distance between images i and j;
(7-5) solving the visual word weight optimization problem by stochastic gradient descent;
(7-6) building the mapping table between each visual class and its visual word weights.
6. The bag-of-visual-words feature weighting method according to claim 5, characterized in that the distance in step (9) is computed as follows:

$$D(I_q, I_i) = w_{category(f_q)} \cdot D_{qi} = \sum_{m=1}^{M} w_{category(f_q)}(m) \times D_{qi}(m)$$

where D(I_q, I_i) is the bag-of-visual-words feature distance between query image q and the i-th image in the image database, category(f_q) is the visual category of the query image, and w_{category(f_q)} is the visual word weight vector of the query image.
7. The bag-of-visual-words feature weighting method according to claim 5, characterized in that step (7-3) is specifically: for any two visual classes c and d among the C visual classes, if the confusion value between visual classes c and d is greater than a threshold ζ_c, then d is a confusable class of c and c is a confusable class of d, the threshold ζ_c taking values from 0.5 to 1.
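The thresholding of claim 7 can be sketched as below. The claim does not spell out how the "confusion value" is computed from the matrix P of sub-step (7-2); taking it as the symmetric sum P[c, d] + P[d, c] is an assumption of this sketch, as is the function name.

```python
import numpy as np

def confusable_classes(P, zeta=0.5):
    """Step (7-3): mark classes c and d as mutually confusable when their
    confusion value exceeds the threshold zeta. The confusion value is taken
    here as P[c, d] + P[d, c] -- an assumption, the text leaves it implicit."""
    C = P.shape[0]
    conf = {c: set() for c in range(C)}
    for c in range(C):
        for d in range(c + 1, C):
            if P[c, d] + P[d, c] > zeta:
                conf[c].add(d)   # d is a confusable class of c
                conf[d].add(c)   # and c is a confusable class of d
    return conf
```

The resulting mapping is exactly the `confusable` structure consumed by the triple construction in step (7-4).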
8. The bag-of-visual-words feature weighting method according to claim 1, characterized in that step (11) comprises the following sub-steps:
(11-1) from the retrieval result, the user selecting L feedback images I_1, I_2, …, I_L that are visually similar to the query image;
(11-2) from the mapping table between each database image's index and its latent category label, obtaining the visual category labels of the L images;
(11-3) from the visual category labels, computing the occurrence probability of each category among the L images, and finding the category label with the maximum probability;
(11-4) recomputing, under each visual category's bag-of-words feature weights, the distance between the query image and every image in the image database:

$$D_c(I_q, I_i) = w_c \cdot D_{qi} = \sum_{m=1}^{M} w_c(m) \times D_{qi}(m)$$

(11-5) fusing the distances between the query image and all images in the image database as follows:

$$D(I_q, I_d) = \sum_{c=1}^{C} P(c) \times D_c(I_q, I_d)$$

(11-6) re-ranking all images in the image database by the fused distance in ascending order, and returning the retrieved images.
9. A classification-driven bag-of-visual-words feature weighting system, characterized by comprising:
a first module for downloading images from the network and building an image database;
a second module for extracting the bag-of-visual-words features of all N images in the image database, where N is any positive integer;
a third module for building an inverted index of the bag-of-visual-words features of the N images;
a fourth module for randomly sampling N_1 images and their corresponding bag-of-visual-words features from the image database, grouping the N_1 images into C visual classes by a clustering algorithm, and randomly selecting images from each visual class to form the learning sample set of that visual class, C being a positive integer;
a fifth module for building, for each visual class and on the learning sample set of that visual class, a weight learning sample set of bag-of-visual-words features, and using this weight learning sample set to train each visual class, to form the support vector machine discrimination model of that visual class;
a sixth module for applying the maximum-classifier-score principle, using the trained support vector machine discrimination models to obtain the visual class of every image in the image database, this visual class being the latent category label of the image, and building the mapping table between each image's index and its latent category label;
a seventh module for computing, from the weight learning sample set of bag-of-visual-words features, the visual word weights of each of the C visual classes, and building the mapping table between each visual class and its visual word weights;
an eighth module for extracting the bag-of-visual-words feature of a query image and feeding it into the support vector machine discrimination models, to obtain the visual class of the query image;
a ninth module for obtaining, from the mapping table between the visual class of the query image and the visual word weights, the weights of the visual words in the query image's bag-of-visual-words feature, feeding them into the inverted index file, and computing the distances between the query image and all images in the image database;
a tenth module for sorting the distances obtained by the ninth module in ascending order, and returning to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
an eleventh module for determining whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, L images visually identical to the query image are selected from the result as positive feedback, the visual word weights are adjusted and fed into the inverted index, and a re-ranked result is obtained; if the user is satisfied, the result is returned directly.
CN2013102858915A 2013-07-08 2013-07-08 Visual word bag feature weighting method and system based on classification drive Pending CN103399870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102858915A CN103399870A (en) 2013-07-08 2013-07-08 Visual word bag feature weighting method and system based on classification drive


Publications (1)

Publication Number Publication Date
CN103399870A true CN103399870A (en) 2013-11-20

Family

ID=49563500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102858915A Pending CN103399870A (en) 2013-07-08 2013-07-08 Visual word bag feature weighting method and system based on classification drive

Country Status (1)

Country Link
CN (1) CN103399870A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765764A (en) * 2015-02-06 2015-07-08 南京理工大学 Indexing method based on large-scale image
CN106055580A (en) * 2016-05-23 2016-10-26 中南大学 Radviz-based fuzzy clustering result visualization method
CN107862342A (en) * 2017-11-27 2018-03-30 清华大学 Lift the visual analysis system and method for tree-model
CN109489663A (en) * 2018-10-19 2019-03-19 北京三快在线科技有限公司 A kind of localization method and device, mobile device and computer readable storage medium
CN109740671A (en) * 2019-01-03 2019-05-10 北京妙医佳信息技术有限公司 A kind of image-recognizing method and device
CN109992676A (en) * 2019-04-01 2019-07-09 中国传媒大学 Across the media resource search method of one kind and searching system
CN110175249A (en) * 2019-05-31 2019-08-27 中科软科技股份有限公司 A kind of search method and system of similar pictures
CN110968721A (en) * 2019-11-28 2020-04-07 上海冠勇信息科技有限公司 Method and system for searching infringement of mass images and computer readable storage medium thereof
CN111310712A (en) * 2020-03-04 2020-06-19 杭州晟元数据安全技术股份有限公司 Fast searching method based on fingerprint bag-of-words features
WO2021087770A1 (en) * 2019-11-05 2021-05-14 深圳市欢太科技有限公司 Picture classification method and apparatus, and storage medium and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521336A (en) * 2011-12-08 2012-06-27 中国信息安全测评中心 Vulnerability information three-dimensional presentation method based on dynamic relationship
CN102663015A (en) * 2012-03-21 2012-09-12 上海大学 Video semantic labeling method based on characteristics bag models and supervised learning
CN102663446A (en) * 2012-04-24 2012-09-12 南方医科大学 Building method of bag-of-word model of medical focus image




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131120