CN103399870A - Classification-driven visual bag-of-words feature weighting method and system - Google Patents


Info

Publication number
CN103399870A
CN103399870A · CN2013102858915A · CN201310285891A
Authority
CN
China
Prior art keywords
image
visual word
visual
weight
class
Prior art date
Legal status
Pending
Application number
CN2013102858915A
Other languages
Chinese (zh)
Inventor
金海
郑然
朱磊
冯晓文
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN2013102858915A
Publication of CN103399870A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a classification-driven visual bag-of-words feature weighting method and system. The method comprises the steps of: downloading images from the Internet and building an image database; extracting the visual bag-of-words features of all N images in the database; building an inverted index over the bag-of-words features of the N images; randomly drawing N1 images and their bag-of-words features from the database; grouping the N1 images into C visual classes with a clustering algorithm; randomly selecting images from each visual class to form that class's learning sample set; building, for each visual class, a weight-learning sample set of bag-of-words features on top of its learning sample set; and training on the weight-learning sample sets to form a support vector machine discrimination model for each visual class. The method and system solve the technical problems of existing methods, namely low retrieval precision and the inability to exploit the differences between different query images.

Description

A classification-driven visual bag-of-words feature weighting method and system
Technical field
The invention belongs to the field of content-based image retrieval (CBIR), and more particularly relates to a classification-driven visual bag-of-words feature weighting method and system.
Background technology
Content-based image retrieval extracts low-level feature vectors from image content and represents the similarity between images by the similarity between their feature vectors. Mainstream image retrieval systems describe image content with local features. Local features have good invariance, carry rich information, and can handle complicated situations such as object occlusion and illumination change. However, in local-feature-based retrieval, computing the similarity between two local feature sets requires matching local features against each other, which is too expensive to meet the needs of real-time retrieval. To reduce the matching cost, CBIR borrows the "bag of words" concept from text retrieval and builds a visual bag-of-words feature on top of the local features. First, a clustering algorithm is applied to a set of local features to obtain cluster centers, also called visual words; then each local feature is assigned to its nearest cluster center; finally, the frequency with which each visual word occurs in an image is counted, and the resulting frequency histogram serves as the image's bag-of-words feature vector.
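The pipeline just described (cluster local features into visual words, quantize each descriptor to its nearest word, count frequencies) can be sketched as follows; this is a minimal illustration with made-up toy descriptors and centers, not the patent's implementation:

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Quantize each local descriptor to its nearest cluster center
    (visual word) and return the normalized word-frequency histogram."""
    # Pairwise squared Euclidean distances: shape (n_descriptors, n_words)
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)  # nearest-center index per descriptor
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()  # frequency histogram

# Toy example: 4 two-dimensional descriptors, vocabulary of 2 visual words
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.8, 9.9], [0.3, 0.1], [10.2, 9.7]])
print(bow_histogram(desc, centers))  # → [0.5 0.5]
```

Two descriptors fall near each center, so each visual word accounts for half of the histogram mass.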
Existing visual bag-of-words weighting methods simply reuse the bag-of-words weighting schemes of text retrieval, applying different weights to visual words of different importance. These methods have the following shortcomings:
(1) They weight visual words with the unsupervised learning schemes of text retrieval and therefore cannot fully exploit the discriminative information between different images. As a result, during similarity computation, the existing bag-of-words features fail to effectively retrieve images semantically relevant to the query image, which lowers retrieval precision.
(2) They assign the same visual word weights to every query image and therefore cannot capture the differences between different query images.
Summary of the invention
In view of the above defects of and improvement demands on the prior art, the invention provides a classification-driven visual bag-of-words feature weighting method and system, whose purpose is to solve the technical problems of existing methods, namely low retrieval precision and the inability to exploit the differences between query images.
To achieve the above object, according to one aspect of the invention, a classification-driven visual bag-of-words feature weighting method is provided, comprising the following steps:
(1) downloading images from a network and building an image database;
(2) extracting the visual bag-of-words features of all N images in the image database, where N is any positive integer;
(3) building an inverted index over the bag-of-words features of the N images;
(4) randomly drawing N1 images and their bag-of-words features from the database, grouping the N1 images into C visual classes with a clustering algorithm, and randomly selecting images from each visual class to form that class's learning sample set, where C is a positive integer;
(5) for each visual class, building on that class's learning sample set the weight-learning sample set of bag-of-words features, and training on it to form the class's support vector machine discrimination model;
(6) using the maximum classifier score principle and the trained discrimination models to predict the visual class of every image in the database (this class is the image's latent category label), and building a mapping table from each image's index to its latent category label;
(7) computing, from the weight-learning sample sets of bag-of-words features, the visual word weights of each of the C visual classes, and building a mapping table from each visual class to its visual word weights;
(8) extracting the bag-of-words feature of a query image and feeding it to the support vector machine discrimination models to obtain the query image's visual class;
(9) looking up the query image's visual class in the class-to-weights mapping table to obtain the weights of the visual words in the query's bag-of-words feature, feeding them to the inverted index file, and computing the distances between the query image and all images in the database;
(10) sorting the distances obtained in step (9) in ascending order and returning to the user the database images corresponding to each distance, thereby obtaining the retrieval result;
(11) judging whether the retrieval result satisfies the user: if the user is not satisfied, selecting from the result L images perceptually similar to the query image as positive feedback, adjusting the visual word weights, and re-querying the inverted index to obtain a re-ranked result; if the user is satisfied, returning the retrieval result directly.
Preferably, step (2) comprises the following sub-steps:
(2-1) normalizing all images in the database to the same size;
(2-2) detecting interest points in all images by dense sampling;
(2-3) taking an image patch around each interest point of every image and extracting a local descriptor on the patch, thereby forming the image's set of local descriptors;
(2-4) randomly selecting R local descriptors from the descriptor sets of the N1 images and clustering them to form M cluster centers, where N1 is a positive integer no larger than N, M is the number of words in the bag-of-words vocabulary, and R is a positive integer multiple of M;
(2-5) computing the distance from each local descriptor of every image to each cluster center, and assigning the descriptor to its nearest cluster center;
(2-6) counting the frequency with which each cluster center occurs in every image, and using the resulting frequency histogram as the image's visual bag-of-words feature.
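Sub-steps (2-2) and (2-3) rely on dense sampling of interest points on a regular grid. A minimal sketch (the grid step and the 16-pixel patch size are illustrative defaults, not values mandated by the method):

```python
def dense_sample(height, width, step=8, patch=16):
    """Return (row, col) interest points on a regular grid, keeping only
    points whose surrounding patch x patch block lies inside the image."""
    half = patch // 2
    points = []
    for r in range(half, height - half + 1, step):
        for c in range(half, width - half + 1, step):
            points.append((r, c))
    return points

# On a 64 x 64 image, valid patch centers form a 7 x 7 grid
pts = dense_sample(64, 64)
print(len(pts))  # → 49
```

Each returned point would then seed a 16 × 16 patch from which a local descriptor is extracted.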
Preferably, the clustering algorithm is a K-means or mean-shift algorithm.
Preferably, step (5) comprises the following sub-steps:
(5-1) for each visual class, building on its learning sample set the weight-learning sample set of bag-of-words features {(f_n, y_n) | n = 1, …, N2}, y_n ∈ {1, 2, …, C}, where f_n is the bag-of-words feature of the n-th image, y_n is the label of that image's visual class, and N2 is a positive integer no larger than N1;
(5-2) explicitly mapping each bag-of-words feature f_n into a higher-dimensional space through an explicit mapping function φ, so that the features become linearly separable;
(5-3) learning the parameters w_c and b_c of C linear support vector machines, where c = 1, 2, …, C;
(5-4) building from the parameters w_c and b_c the C support vector machine discrimination models Classifier_c:
Classifier_c(x) = w_c · φ(x) + b_c
where x is the bag-of-words feature of an arbitrary image.
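Given learned parameters w_c and b_c, classification by the maximum classifier score principle of step (6) reduces to an argmax over the C linear scores. A minimal sketch, taking the explicit mapping φ as the identity for brevity (the patent uses an explicit lifting map):

```python
import numpy as np

def classify(x, W, b):
    """Score x against C linear models Classifier_c(x) = w_c . x + b_c
    and return the index of the highest-scoring visual class."""
    scores = W @ x + b  # shape (C,), one score per class
    return int(np.argmax(scores))

# Toy model: C = 2 classes over 3-dimensional bag-of-words features
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([0.0, 0.0])
print(classify(np.array([0.2, 0.7, 0.1]), W, b))  # → 1
```

The returned index is the image's latent category label used to build the mapping table of step (6).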
Preferably, step (7) comprises the following sub-steps:
(7-1) classifying the images in the weight-learning sample set of bag-of-words features with the support vector machine discrimination models to obtain each image's visual category label, using:
Category(x) = argmax_c Classifier_c(x),
where Category(x) denotes the visual category label of image x;
(7-2) computing, over the weight-learning sample set, the classification confusion matrix P from each image's predicted visual category label and its true visual class label; the element in row d, column c of P is the probability that an image whose visual class label is d is misassigned label c, where d and c are any positive integers no larger than C. The formula is:
P(d, c) = Σ_{n=1}^{N2} δ(d, y_n)·δ(c, Category(f_n)) / Σ_{n=1}^{N2} δ(d, y_n)
where δ(a, b) equals 1 if a equals b and 0 otherwise;
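The confusion matrix of sub-step (7-2) can be computed by counting, for each true class d, the fraction of its samples assigned label c. A small sketch with hypothetical labels:

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, C):
    """P[d, c] = fraction of class-d samples that the classifier assigns
    label c (each non-empty row sums to 1)."""
    P = np.zeros((C, C))
    for y, p in zip(true_labels, predicted_labels):
        P[y, p] += 1.0
    counts = P.sum(axis=1, keepdims=True)  # samples per true class
    return P / np.maximum(counts, 1.0)     # guard against empty classes

# Toy example: class 0 is always correct; class 1 is confused half the time
P = confusion_matrix([0, 0, 1, 1], [0, 0, 1, 0], C=2)
print(P)  # rows: [1, 0] and [0.5, 0.5]
```

Off-diagonal mass in a row flags the confusable class pairs sought in sub-step (7-3).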
(7-3) determining which pairs of the C visual classes are mutually confusable;
(7-4) computing the weight of each visual word of visual class c from the weight-learning sample set of bag-of-words features by solving the optimization problem:
min_{w_{y_i}} (λ/2)·||w_{y_i}||² + Σ_{ijk} ξ_{ijk}
subject to  w_{y_i}ᵀ D_ik − w_{y_i}ᵀ D_ij ≥ 1 − ξ_{ijk},  w_{y_i} ≥ 0,  ξ_{ijk} ≥ 0,  ∀(i, j, k) ∈ T
where T = {(i, j, k) | y_i = y_j, y_i ≠ y_k, i, j, k ∈ {1, 2, …, N2}} is the set of triplets over the weight-learning sample set; y_i, y_j, y_k are the category labels of images i, j, k in the set, with y_k a confusable class of y_i; D_ij(m), m = 1, 2, …, M, is the distance between images i and j on the m-th visual word; D_ij = {D_ij(1), D_ij(2), …, D_ij(M)} is the vector of the M visual-word distances between images i and j; and D_w(I_i, I_j) = wᵀ · D_ij denotes the weighted bag-of-words feature distance between images i and j.
(7-5) solving the above optimization problem by stochastic gradient descent;
(7-6) building the mapping table from visual classes to visual word weights.
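The optimization of sub-step (7-4) is a hinge (margin) objective over triplets: the weighted distance to an image of a confusable class should exceed the within-class distance by a margin of 1. A stochastic-subgradient sketch under illustrative assumptions (λ, the learning rate, the iteration count, and the toy distance vectors are all made up here, not values from the patent):

```python
import numpy as np

def learn_weights(triplet_dists, M, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Minimize (lam/2)||w||^2 + sum_t max(0, 1 - w.(D_ik - D_ij)) over
    triplets t = (D_ij, D_ik) by stochastic subgradient descent,
    projecting w onto the non-negative orthant after each step."""
    rng = np.random.default_rng(seed)
    w = np.ones(M)
    for _ in range(epochs):
        D_ij, D_ik = triplet_dists[rng.integers(len(triplet_dists))]
        grad = lam * w                        # regularizer subgradient
        if w @ (D_ik - D_ij) < 1.0:           # margin violated: hinge active
            grad -= (D_ik - D_ij)
        w = np.maximum(w - lr * grad, 0.0)    # keep weights non-negative
    return w

# One toy triplet: word 0 separates the confusable class, word 1 does not
trip = [(np.array([0.1, 0.5]), np.array([0.9, 0.5]))]
w = learn_weights(trip, M=2)
print(w[0] > w[1])  # the discriminative word receives the larger weight
```

The projection step enforces the constraint w ≥ 0 from the optimization problem.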
Preferably, the distance formula in step (9) is:
D(I_q, I_i) = w_{category(f_q)} · D_qi = Σ_{m=1}^{M} w_{category(f_q)}(m) · D_qi(m)
where D(I_q, I_i) is the bag-of-words feature distance between query image q and the i-th database image, category(f_q) is the visual category of the query image, and w_{category(f_q)} is the query image's visual word weight vector.
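The class-conditioned distance of step (9) is an inner product between the class's weight vector and the per-word distance vector. A minimal sketch with hypothetical values:

```python
import numpy as np

def weighted_distance(w_class, D_qi):
    """D(I_q, I_i) = sum_m w(m) * D_qi(m): per-visual-word distances
    weighted by the query's class-specific word weights."""
    return float(w_class @ D_qi)

w_class = np.array([2.0, 0.5, 1.0])  # weights of the query's visual class
D_qi = np.array([0.1, 0.4, 0.2])     # per-word distances, query vs. image i
print(weighted_distance(w_class, D_qi))  # 2*0.1 + 0.5*0.4 + 1*0.2 = 0.6 (up to float rounding)
```

Words with larger class weights contribute more to the final ranking distance.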
Preferably, step (7-3) is specifically: for any two visual classes c and d among the C visual classes, if the confusion value between c and d exceeds a threshold ζ_c, then d is a confusable class of c and c is a confusable class of d, where ζ_c ranges from 0.5 to 1.
Preferably, step (11) comprises the following sub-steps:
(11-1) the user selects from the retrieval result L feedback images I_1, I_2, …, I_L that are perceptually similar to the query image;
(11-2) looking up each of the L images in the mapping table from image index to latent category label to obtain its visual category label;
(11-3) computing from those labels the probability with which each category occurs among the L images, and finding the category label of highest probability;
(11-4) recomputing, under each visual category's word weights, the distances between the query image and all database images:
D_c(I_q, I_i) = w_c · D_qi = Σ_{m=1}^{M} w_c(m) · D_qi(m)
(11-5) fusing the per-class distances between the query image and all database images as:
D(I_q, I_i) = Σ_{c=1}^{C} P(c) · D_c(I_q, I_i)
(11-6) re-ranking all database images by the fused distance in ascending order and returning the retrieved images.
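Sub-steps (11-4) and (11-5) recompute distances under each candidate class's weights and fuse them with the class probabilities estimated from the feedback images. A sketch with hypothetical weights and probabilities:

```python
import numpy as np

def fused_distance(W, P, D_qi):
    """D(I_q, I_i) = sum_c P(c) * (w_c . D_qi): per-class weighted
    distances blended by class probabilities P estimated from the
    user's feedback images."""
    per_class = W @ D_qi  # shape (C,): distance under each class's weights
    return float(P @ per_class)

W = np.array([[1.0, 0.0],   # word weights of class 0
              [0.0, 1.0]])  # word weights of class 1
P = np.array([0.75, 0.25])  # 3 of 4 feedback images fell in class 0
D_qi = np.array([0.4, 0.8])
print(fused_distance(W, P, D_qi))  # 0.75*0.4 + 0.25*0.8 = 0.5 (up to float rounding)
```

Classes that dominate the feedback thus dominate the re-ranking distance.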
According to another aspect of the invention, a classification-driven visual bag-of-words feature weighting system is provided, comprising:
a first module for downloading images from a network and building an image database;
a second module for extracting the visual bag-of-words features of all N images in the database, where N is any positive integer;
a third module for building an inverted index over the bag-of-words features of the N images;
a fourth module for randomly drawing N1 images and their bag-of-words features from the database, grouping the N1 images into C visual classes with a clustering algorithm, and randomly selecting images from each class to form that class's learning sample set, where C is a positive integer;
a fifth module for building, for each visual class, a weight-learning sample set of bag-of-words features on that class's learning sample set, and training on it to form the class's support vector machine discrimination model;
a sixth module for using the maximum classifier score principle and the trained discrimination models to predict each database image's visual class (its latent category label) and building the mapping table from image index to latent category label;
a seventh module for computing, from the weight-learning sample sets, the visual word weights of each of the C visual classes and building the mapping table from visual class to visual word weights;
an eighth module for extracting the bag-of-words feature of a query image and feeding it to the discrimination models to obtain the query's visual class;
a ninth module for looking up the query's visual class in the class-to-weights mapping table to obtain the weights of the visual words in the query's bag-of-words feature, feeding them to the inverted index file, and computing the distances between the query and all database images;
a tenth module for sorting the distances produced by the ninth module in ascending order and returning to the user the corresponding database images as the retrieval result;
an eleventh module for judging whether the retrieval result satisfies the user: if not, selecting from the result L images perceptually similar to the query as positive feedback, adjusting the visual word weights, and re-querying the inverted index to obtain a re-ranked result; otherwise returning the retrieval result directly.
In general, compared with the prior art, the above technical scheme conceived by the invention achieves the following beneficial effects:
(1) By organizing the weight-learning sample sets of bag-of-words features offline, the invention fully mines the discriminative information between different images and incorporates it into the visual word weights, improving the precision of CBIR.
(2) Through linear classification, the invention maps the query image to a predefined visual category and, from the offline-learned class-to-weights mapping table, obtains the weight of each visual word in the query's bag-of-words feature; different query images thus receive different weights, reflecting their differences.
(3) By learning each visual class's word weights from offline-organized samples, the invention computes the weights of each visual class efficiently; since the processing is offline, it does not affect the response speed of CBIR.
(4) By computing the confusion matrix between visual classes and learning word weights only for the easily confused classes, the invention reduces the computation of visual word weights and speeds up offline processing.
(5) Through the feedback method over latent visual categories, the invention introduces the high-level semantics of the user's query and, by effectively combining support vector machine classification with the visual word weights, reflects the user's personalized query demand.
Brief description of the drawings
Fig. 1 is a flowchart of the classification-driven visual bag-of-words feature weighting method of the invention.
Fig. 2 is a detailed flowchart of step (2) of the method.
Fig. 3 is a detailed flowchart of step (5) of the method.
Fig. 4 is a detailed flowchart of step (7) of the method.
Fig. 5 is a detailed flowchart of step (11) of the method.
Detailed description of the embodiments
To make the objects, technical schemes, and advantages of the invention clearer, the invention is further elaborated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.
First, the technical terms used in the invention are explained:
Dense sampling: taking one pixel as an interest point every few pixels across the image.
Learning sample set: mainly used for training the support vector machine models and computing the visual word weights.
Linear support vector machine: a support vector machine that separates positive and negative samples with a linear partition surface.
Visual category space: formed by the main visual classes in the image database.
Visual bag-of-words feature: the occurrence frequencies of the visual words, obtained by quantizing local features to cluster centers.
Inverted index: the inverted index of text retrieval applied to bag-of-words features to speed up retrieval.
Explicit mapping function: a function that simulates, in explicit form, the implicit mapping from low-dimensional to high-dimensional features inside a support vector machine model.
Maximum score principle: an image's category label is the label of the classifier with the largest score.
Confusion matrix: the matrix of misclassification probabilities between visual classes, from which mutually indistinguishable visual classes are identified.
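The inverted index defined above maps each visual word to the list of images containing it, so a query only touches images that share at least one word with it. A minimal sketch over a toy three-word vocabulary:

```python
from collections import defaultdict

def build_inverted_index(image_histograms):
    """Map each visual word id to the (image id, frequency) postings of
    images whose bag-of-words histogram contains that word."""
    index = defaultdict(list)
    for image_id, hist in enumerate(image_histograms):
        for word, freq in enumerate(hist):
            if freq > 0:
                index[word].append((image_id, freq))
    return index

# Toy database: 3 images over a 3-word vocabulary
hists = [[2, 0, 1],
         [0, 3, 0],
         [1, 1, 0]]
index = build_inverted_index(hists)
print(index[0])  # → [(0, 2), (2, 1)]: images 0 and 2 contain word 0
```

At query time, scanning only the postings of the query's non-zero words avoids touching unrelated images.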
The overall idea of the classification-driven visual bag-of-words feature weighting method of the invention is to learn the visual word weights offline and build the mapping table from visual classes to word weights. At retrieval time, the query image is first classified by the support vector machine discrimination models to obtain its visual category; then, from the class-to-weights mapping table, the weights of the visual words in its bag-of-words feature are obtained; finally, bag-of-words similarity matching and relevance feedback on the retrieval result are performed. The method is of great significance to CBIR applications such as shopping image search and the retrieval of scenic spot images, flower images, and design patent images.
As shown in Fig. 1, the classification-driven visual bag-of-words feature weighting method of the invention comprises the following steps:
(1) downloading images from a network and building an image database;
(2) extracting the visual bag-of-words features of all N images in the image database (N is any positive integer); as shown in Fig. 2, this step comprises the following sub-steps:
(2-1) normalizing all images in the database to the same size;
(2-2) detecting interest points in all images by dense sampling;
(2-3) taking an image patch around each interest point of every image and extracting a local descriptor on the patch, thereby forming the image's set of local descriptors; in this embodiment, the patch size is 16 × 16 pixels;
(2-4) randomly selecting R local descriptors from the descriptor sets of the N1 images and clustering them to form M cluster centers, where N1 is a positive integer no larger than N, M is the number of words in the bag-of-words vocabulary, and R is a positive integer multiple of M;
(2-5) computing the distance from each local descriptor of every image to each cluster center, and assigning the descriptor to its nearest cluster center;
(2-6) counting the frequency with which each cluster center occurs in every image, and using the resulting frequency histogram as the image's visual bag-of-words feature;
(3) building an inverted index over the bag-of-words features of the N images;
(4) randomly drawing N1 images and their bag-of-words features from the database, grouping the N1 images into C visual classes with a clustering algorithm, and randomly selecting images from each visual class to form that class's learning sample set, where C is a positive integer; particularly, the clustering algorithm may be K-means, mean shift, or the like;
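Step (4) can be sketched with a bare-bones K-means (Lloyd's algorithm) over bag-of-words feature vectors; a real system would use a library implementation, and the two-dimensional toy data here stand in for the actual features:

```python
import numpy as np

def kmeans(X, C, iters=20, seed=0):
    """Lloyd's K-means: alternate nearest-center assignment and center
    recomputation; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), C, replace=False)]  # random init
    for _ in range(iters):
        # Squared distances from every point to every center: (n, C)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(C):
            if np.any(labels == c):  # skip empty clusters
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

# Two well-separated clumps of 2-D "feature vectors"
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
centers, labels = kmeans(X, C=2)
print(labels[0] != labels[2])  # the two clumps land in different classes
```

Each resulting cluster of images corresponds to one visual class, from which the learning sample set is drawn.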
(5) for each visual class, building on that class's learning sample set the weight-learning sample set of bag-of-words features, and training on it to form the class's support vector machine discrimination model; as shown in Fig. 3, this step comprises the following sub-steps:
(5-1) for each visual class, building on its learning sample set the weight-learning sample set of bag-of-words features {(f_n, y_n) | n = 1, …, N2}, y_n ∈ {1, 2, …, C}, where f_n is the bag-of-words feature of the n-th image, y_n is the label of that image's visual class, and N2 is a positive integer no larger than N1;
(5-2) explicitly mapping each bag-of-words feature f_n into a higher-dimensional space through an explicit mapping function φ, so that the features become linearly separable;
The advantage of this sub-step is that bag-of-words features that are not linearly separable in the original space become linearly separable after the explicit mapping, which avoids the more complex nonlinear support vector machine models and improves the efficiency of model training and classification.
(5-3) learning the parameters w_c and b_c of C linear support vector machines, where c = 1, 2, …, C;
(5-4) building from the parameters w_c and b_c the C support vector machine discrimination models Classifier_c:
Classifier_c(x) = w_c · φ(x) + b_c
where x is the bag-of-words feature of an arbitrary image.
The advantage of this step is that each visual class's word weights are learned from offline-organized samples and thus computed efficiently; since the processing is offline, it does not affect the response speed of CBIR.
(6) using the maximum classifier score principle and the trained discrimination models to predict the visual class of every image in the database (this class is the image's latent category label), and building the mapping table from each image's index to its latent category label;
(7) computing, from the weight-learning sample sets of bag-of-words features, the visual word weights of each of the C visual classes, and building the mapping table from each visual class to its visual word weights; as shown in Fig. 4, this step comprises the following sub-steps:
(7-1) classifying the images in the weight-learning sample set with the support vector machine discrimination models to obtain each image's visual category label, using:
Category(x) = argmax_c Classifier_c(x),
where Category(x) denotes the visual category label of image x;
(7-2) computing, over the weight-learning sample set, the classification confusion matrix P from each image's predicted visual category label and its true visual class label; the element in row d, column c of P is the probability that an image whose visual class label is d is misassigned label c, where d and c are any positive integers no larger than C. The formula is:
P(d, c) = Σ_{n=1}^{N2} δ(d, y_n)·δ(c, Category(f_n)) / Σ_{n=1}^{N2} δ(d, y_n)
where δ(a, b) equals 1 if a equals b and 0 otherwise;
(7-3) for any two visual classes c and d among the C visual classes, if the confusion value between c and d exceeds a threshold ζ_c, then d is a confusable class of c and c is a confusable class of d, where ζ_c ranges from 0.5 to 1;
The advantage of this sub-step is that, by computing the confusion matrix between visual classes and learning word weights only for the easily confused classes, the computation of visual word weights is reduced and offline processing is sped up.
(7-4) computing the weight of each visual word of visual class c from the weight-learning sample set of bag-of-words features by solving the optimization problem:
min_{w_{y_i}} (λ/2)·||w_{y_i}||² + Σ_{ijk} ξ_{ijk}
subject to  w_{y_i}ᵀ D_ik − w_{y_i}ᵀ D_ij ≥ 1 − ξ_{ijk},  w_{y_i} ≥ 0,  ξ_{ijk} ≥ 0,  ∀(i, j, k) ∈ T
where T = {(i, j, k) | y_i = y_j, y_i ≠ y_k, i, j, k ∈ {1, 2, …, N2}} is the set of triplets over the weight-learning sample set; y_i, y_j, y_k are the category labels of images i, j, k in the set, with y_k a confusable class of y_i; D_ij(m), m = 1, 2, …, M, is the distance between images i and j on the m-th visual word; D_ij = {D_ij(1), D_ij(2), …, D_ij(M)} is the vector of the M visual-word distances between images i and j; and D_w(I_i, I_j) = wᵀ · D_ij denotes the weighted bag-of-words feature distance between images i and j;
(7-5) solving the above optimization problem by stochastic gradient descent;
(7-6) building the mapping table from visual classes to visual word weights;
(8) extracting the bag-of-words feature of the query image and feeding it to the support vector machine discrimination models to obtain the query image's visual class;
(9) looking up the query image's visual class in the class-to-weights mapping table to obtain the weights of the visual words in the query's bag-of-words feature, feeding them to the inverted index file, and computing the distances between the query image and all images in the database by the formula:
D(I_q, I_i) = w_{category(f_q)} · D_qi = Σ_{m=1}^{M} w_{category(f_q)}(m) · D_qi(m)
where D(I_q, I_i) is the bag-of-words feature distance between query image q and the i-th database image, category(f_q) is the visual category of the query image, and w_{category(f_q)} is the query image's visual word weight vector.
(10) sort the distances obtained in step (9) in ascending order, and return to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
(11) determine whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, select from the result L images that are visually identical to the query image as positive feedback, adjust the visual word weights, and feed them into the inverted index to obtain a re-ranked result; if the user is satisfied, return the result directly, where L is a positive integer. As shown in Figure 4, this step specifically comprises the following sub-steps:
(11-1) from the retrieval result, the user selects L feedback images I_1, I_2, …, I_L that are visually similar to the query image;
(11-2) from the mapping table between each database image's index and its latent category label, obtain the visual category labels of the L images;
(11-3) from the visual category labels, compute the occurrence probability of each category among the L images, and find the category label with the maximum probability;
(11-4) recompute, under each visual category's bag-of-words feature weights, the distance between the query image and every image in the image database:
$$D_c(I_q, I_i) = w_c \cdot D_{qi} = \sum_{m=1}^{M} w_c(m) \times D_{qi}(m)$$
(11-5) fuse the distances between the query image and all images in the image database as follows:
$$D(I_q, I_d) = \sum_{c=1}^{C} P(c) \times D_c(I_q, I_d)$$
(11-6) re-rank all images in the image database by the fused distance in ascending order, and return the retrieved images.
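Sub-steps (11-3) to (11-6) can be sketched as below — an illustrative reading in which P(c) is the occurrence frequency of class c among the L feedback images; the function name and the per-word L1 distance are assumptions:

```python
from collections import Counter
import numpy as np

def rerank_with_feedback(feedback_labels, query_hist, db_hists, class_weights):
    """Estimate P(c) from the L positive-feedback images and fuse the
    per-class weighted distances: D(I_q, I_d) = sum_c P(c) * D_c(I_q, I_d)."""
    L = len(feedback_labels)
    fused = np.zeros(len(db_hists))
    for c, count in Counter(feedback_labels).items():
        p_c = count / L                                          # P(c)
        d_c = np.abs(db_hists - query_hist) @ class_weights[c]   # D_c(I_q, I_i)
        fused += p_c * d_c
    return np.argsort(fused), fused      # ascending order = re-ranked result
```

Because the fusion is a convex combination over the feedback classes, a single mislabeled feedback image only shifts the ranking in proportion to 1/L.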
The advantage of this step is that it proposes a feedback method based on latent visual classes, which introduces the high-level semantics of the user's query; by effectively combining support vector machine classification with the visual word weights, it reflects the user's personalized query demand.
The classification-driven bag-of-visual-words feature weighting system of the present invention comprises:
A first module for downloading images from the network and building an image database;
A second module for extracting the bag-of-visual-words features of all N images in the image database, where N is any positive integer;
A third module for building an inverted index of the bag-of-visual-words features of the N images;
A fourth module for randomly sampling N_1 images and their corresponding bag-of-visual-words features from the image database, grouping the N_1 images into C visual classes by a clustering algorithm, and randomly selecting images from each visual class to form the learning sample set of that visual class, where C is a positive integer;
A fifth module for building, for each visual class and on the learning sample set of that visual class, a weight learning sample set of bag-of-visual-words features, and using this weight learning sample set to train each visual class, to form the support vector machine discrimination model of that visual class;
A sixth module for applying the maximum-classifier-score principle, using the trained support vector machine discrimination models to obtain the visual class of every image in the image database, this visual class being the latent category label of the image, and building the mapping table between each image's index and its latent category label;
A seventh module for computing, from the weight learning sample set of bag-of-visual-words features, the visual word weights of each of the C visual classes, and building the mapping table between each visual class and its visual word weights;
An eighth module for extracting the bag-of-visual-words feature of a query image and feeding it into the support vector machine discrimination models, to obtain the visual class of the query image;
A ninth module for obtaining, from the mapping table between the visual class of the query image and the visual word weights, the weights of the visual words in the query image's bag-of-visual-words feature, feeding them into the inverted index file, and computing the distances between the query image and all images in the image database;
A tenth module for sorting the distances obtained by the ninth module in ascending order, and returning to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
An eleventh module for determining whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, L images visually identical to the query image are selected from the result as positive feedback, the visual word weights are adjusted and fed into the inverted index, and a re-ranked result is obtained; if the user is satisfied, the result is returned directly.

Claims (9)

1. A classification-driven bag-of-visual-words feature weighting method, characterized by comprising the following steps:
(1) downloading images from the network, and building an image database;
(2) extracting the bag-of-visual-words features of all N images in the image database, where N is any positive integer;
(3) building an inverted index of the bag-of-visual-words features of the N images;
(4) randomly sampling N_1 images and their corresponding bag-of-visual-words features from the image database, grouping the N_1 images into C visual classes by a clustering algorithm, and randomly selecting images from each visual class to form the learning sample set of that visual class, C being a positive integer;
(5) for each visual class, building on the learning sample set of that visual class a weight learning sample set of bag-of-visual-words features, and using this weight learning sample set to train each visual class, to form the support vector machine discrimination model of that visual class;
(6) applying the maximum-classifier-score principle, using the trained support vector machine discrimination models to obtain the visual class of every image in the image database, this visual class being the latent category label of the image, and building the mapping table between each image's index and its latent category label;
(7) computing, from the weight learning sample set of bag-of-visual-words features, the visual word weights of each of the C visual classes, and building the mapping table between each visual class and its visual word weights;
(8) extracting the bag-of-visual-words feature of a query image and feeding it into the support vector machine discrimination models, to obtain the visual class of the query image;
(9) obtaining, from the mapping table between the visual class of the query image and the visual word weights, the weights of the visual words in the query image's bag-of-visual-words feature, feeding them into the inverted index file, and computing the distances between the query image and all images in the image database;
(10) sorting the distances obtained in step (9) in ascending order, and returning to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
(11) determining whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, selecting from the result L images visually identical to the query image as positive feedback, adjusting the visual word weights, and feeding them into the inverted index to obtain a re-ranked result; if the user is satisfied, returning the result directly, L being a positive integer.
2. The bag-of-visual-words feature weighting method according to claim 1, characterized in that step (2) comprises the following sub-steps:
(2-1) normalizing all images in the image database, so that all images are unified to the same size;
(2-2) detecting the feature points of interest of all images by dense sampling;
(2-3) taking an image patch around each feature point of interest of every image, and extracting a local descriptor on the patch, to form the set of local descriptors of this image;
(2-4) randomly selecting R local descriptors from the local descriptor sets of N_1 images and clustering them, to form M cluster centers, where N_1 is a positive integer not larger than N, M is the number of words in the bag-of-visual-words feature, and R is a positive integer multiple of M;
(2-5) computing the distance of every local descriptor of each image to each cluster center, and assigning the nearest cluster center to that local descriptor of the image;
(2-6) counting the frequency with which each cluster center occurs in every image, and taking the resulting frequency histogram as this image's bag-of-visual-words feature.
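Sub-steps (2-4) to (2-6) form the standard visual-vocabulary pipeline. Below is a minimal numpy sketch assuming K-means clustering (one of the options named in claim 3) and Euclidean descriptor distances; function names and defaults are hypothetical:

```python
import numpy as np

def kmeans_centers(descriptors, M, iters=20, seed=0):
    """Step (2-4): cluster R local descriptors into M visual words (tiny K-means)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), M, replace=False)]
    for _ in range(iters):
        # nearest center for every descriptor
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for m in range(M):
            if np.any(assign == m):          # keep the old center if it lost all points
                centers[m] = descriptors[assign == m].mean(axis=0)
    return centers

def bovw_histogram(image_descriptors, centers):
    """Steps (2-5)/(2-6): assign each descriptor of an image to its nearest
    visual word and count word frequencies to form the BoVW feature."""
    d = np.linalg.norm(image_descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()                 # normalized frequency histogram
```

Normalizing the histogram makes images with different numbers of interest points comparable in the distance computations of steps (9) and (11).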
3. The bag-of-visual-words feature weighting method according to claim 2, characterized in that the clustering algorithm comprises the K-means algorithm and the mean-shift algorithm.
4. The bag-of-visual-words feature weighting method according to claim 2, characterized in that step (5) comprises the following sub-steps:
(5-1) for each visual class, on the learning sample set of this visual class, building the weight learning sample set of bag-of-visual-words features {(f_n, y_n)}, n = 1, 2, …, N_2, y_n ∈ {1, 2, …, C}, where f_n denotes the bag-of-visual-words feature of the n-th image, y_n is the label of the visual class of the n-th image, and N_2 is a positive integer not larger than N_1;
(5-2) explicitly mapping the bag-of-visual-words feature f_n to a high-dimensional space with an explicit mapping function φ, so that the features become linearly separable;
(5-3) learning the parameters w_c and b_c of C linear support vector machines, where c = 1, 2, …, C;
(5-4) building C support vector machine discrimination models Classifier_c from the parameters w_c and b_c:

Classifier_c(x) = w_c × φ(x) + b_c

where x is the bag-of-visual-words feature of an arbitrary image.
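Sub-steps (5-3)/(5-4), together with the maximum-classifier-score rule of step (6), can be sketched as one-vs-rest linear SVMs trained by hinge-loss subgradient descent. This is an illustrative stand-in, not the patented training procedure: φ is taken as the identity map, and all hyperparameters are assumptions.

```python
import numpy as np

def train_ovr_svm(X, y, num_classes, lam=0.01, lr=0.1, epochs=300):
    """Step (5-3): learn (w_c, b_c) of C linear SVMs one-vs-rest, by full-batch
    subgradient descent on the regularized hinge loss (phi = identity here)."""
    n, d = X.shape
    W = np.zeros((num_classes, d))
    b = np.zeros(num_classes)
    for c in range(num_classes):
        t = np.where(y == c, 1.0, -1.0)          # one-vs-rest targets
        for _ in range(epochs):
            margins = t * (X @ W[c] + b[c])
            active = margins < 1.0               # points inside the margin
            gw = lam * W[c]
            gb = 0.0
            if active.any():
                gw = gw - (t[active, None] * X[active]).mean(axis=0)
                gb = -t[active].mean()
            W[c] -= lr * gw
            b[c] -= lr * gb
    return W, b

def predict_class(x, W, b):
    """Steps (5-4)/(6): Classifier_c(x) = w_c . phi(x) + b_c, and the
    maximum-classifier-score principle Category(x) = argmax_c Classifier_c(x)."""
    return int(np.argmax(W @ x + b))
```

The argmax over the C classifier scores is exactly the maximum-classifier-score principle used in step (6) to assign each database image its latent category label.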
5. The bag-of-visual-words feature weighting method according to claim 4, characterized in that step (7) comprises the following sub-steps:
(7-1) classifying the images in the weight learning sample set of the bag-of-visual-words features with the support vector machine discrimination models, to obtain the visual classification indicator of each image; the discrimination rule is:

$$Category(x) = \arg\max_{c \in \{1,\dots,C\}} Classifier_c(x)$$

where Category(x) denotes the visual classification indicator of image x;
(7-2) on the weight learning sample set of the bag-of-visual-words features, computing the classification confusion matrix P from the visual classification indicator of each image and the label of that image's visual class, where the element in row d and column c of this matrix is the probability that an image whose visual class label is d is misclassified with visual class label c, d and c being any positive integers not larger than C; the specific formula is:

$$P(d, c) = \frac{\sum_{n=1}^{N_2} \delta(y_n, d)\,\delta(Category(f_n), c)}{\sum_{n=1}^{N_2} \delta(y_n, d)}$$

where δ(a, b) equals 1 if a equals b, and 0 if a is not equal to b;
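The confusion matrix of sub-step (7-2) reduces to row-normalized label counts; a minimal sketch (the function name is assumed):

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, C):
    """Step (7-2): P[d, c] is the probability that an image whose visual class
    label is d is classified as c (row-normalized counts)."""
    P = np.zeros((C, C))
    for y, p in zip(true_labels, predicted_labels):
        P[y, p] += 1.0
    row = P.sum(axis=1, keepdims=True)
    # rows with no samples stay all-zero instead of dividing by zero
    return np.divide(P, row, out=np.zeros_like(P), where=row > 0)
```

Row d of P sums to 1 (when class d has samples), so off-diagonal entries directly give the misclassification probabilities used in sub-step (7-3).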
(7-3) determining which pairs among the C visual classes are confusable classes of each other;
(7-4) using the weight learning sample set of the bag-of-visual-words features, computing the weight of each visual word in visual class c, where the optimization problem for the visual word weights is:

$$\min_{w_{y_i}} \frac{\lambda}{2}\lVert w_{y_i}\rVert^{2} + \sum_{ijk}\xi_{ijk}$$

$$\text{s.t.}\quad w_{y_i}^{T} D_{ik} - w_{y_i}^{T} D_{ij} \ge 1 - \xi_{ijk}$$

$$w_{y_i} \ge 0,\quad \xi_{ijk} \ge 0,\quad \forall (i,j,k) \in T$$

where T = {(i, j, k) | y_i = y_j, y_i ≠ y_k, i, j, k ∈ {1, 2, …, N_1}} denotes the set of triples drawn from the weight learning sample set of the bag-of-visual-words features; y_i, y_j and y_k are the category labels of images i, j and k in that set, and the visual class denoted by y_k is a confusable class of the visual class denoted by y_i; D_ij(m) (m = 1, 2, …, M) denotes the distance between the m-th visual word of image i and that of image j, and D_ij = {D_ij(1), D_ij(2), …, D_ij(M)} denotes the vector of the M visual-word distances between images i and j; D_w(I_i, I_j) = w^T · D_ij denotes the weighted bag-of-words feature distance between images i and j;
(7-5) solving the visual word weight optimization problem by stochastic gradient descent;
(7-6) building the mapping table between each visual class and its visual word weights.
6. The bag-of-visual-words feature weighting method according to claim 5, characterized in that the distance in step (9) is computed as follows:

$$D(I_q, I_i) = w_{category(f_q)} \cdot D_{qi} = \sum_{m=1}^{M} w_{category(f_q)}(m) \times D_{qi}(m)$$

where D(I_q, I_i) is the bag-of-visual-words feature distance between query image q and the i-th image in the image database, category(f_q) is the visual category of the query image, and w_{category(f_q)} is the visual word weight vector of the query image.
7. The bag-of-visual-words feature weighting method according to claim 5, characterized in that step (7-3) is specifically: for any two visual classes c and d among the C visual classes, if the confusion value between visual classes c and d is greater than a threshold ζ_c, then d is a confusable class of c and c is a confusable class of d, the threshold ζ_c taking values from 0.5 to 1.
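The thresholding of claim 7 can be sketched as below. The claim does not spell out how the "confusion value" is computed from the matrix P of sub-step (7-2); taking it as the symmetric sum P[c, d] + P[d, c] is an assumption of this sketch, as is the function name.

```python
import numpy as np

def confusable_classes(P, zeta=0.5):
    """Step (7-3): mark classes c and d as mutually confusable when their
    confusion value exceeds the threshold zeta. The confusion value is taken
    here as P[c, d] + P[d, c] -- an assumption, the text leaves it implicit."""
    C = P.shape[0]
    conf = {c: set() for c in range(C)}
    for c in range(C):
        for d in range(c + 1, C):
            if P[c, d] + P[d, c] > zeta:
                conf[c].add(d)   # d is a confusable class of c
                conf[d].add(c)   # and c is a confusable class of d
    return conf
```

The resulting mapping is exactly the `confusable` structure consumed by the triple construction in step (7-4).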
8. The bag-of-visual-words feature weighting method according to claim 1, characterized in that step (11) comprises the following sub-steps:
(11-1) from the retrieval result, the user selecting L feedback images I_1, I_2, …, I_L that are visually similar to the query image;
(11-2) from the mapping table between each database image's index and its latent category label, obtaining the visual category labels of the L images;
(11-3) from the visual category labels, computing the occurrence probability of each category among the L images, and finding the category label with the maximum probability;
(11-4) recomputing, under each visual category's bag-of-words feature weights, the distance between the query image and every image in the image database:

$$D_c(I_q, I_i) = w_c \cdot D_{qi} = \sum_{m=1}^{M} w_c(m) \times D_{qi}(m)$$

(11-5) fusing the distances between the query image and all images in the image database as follows:

$$D(I_q, I_d) = \sum_{c=1}^{C} P(c) \times D_c(I_q, I_d)$$

(11-6) re-ranking all images in the image database by the fused distance in ascending order, and returning the retrieved images.
9. A classification-driven bag-of-visual-words feature weighting system, characterized by comprising:
a first module for downloading images from the network and building an image database;
a second module for extracting the bag-of-visual-words features of all N images in the image database, where N is any positive integer;
a third module for building an inverted index of the bag-of-visual-words features of the N images;
a fourth module for randomly sampling N_1 images and their corresponding bag-of-visual-words features from the image database, grouping the N_1 images into C visual classes by a clustering algorithm, and randomly selecting images from each visual class to form the learning sample set of that visual class, C being a positive integer;
a fifth module for building, for each visual class and on the learning sample set of that visual class, a weight learning sample set of bag-of-visual-words features, and using this weight learning sample set to train each visual class, to form the support vector machine discrimination model of that visual class;
a sixth module for applying the maximum-classifier-score principle, using the trained support vector machine discrimination models to obtain the visual class of every image in the image database, this visual class being the latent category label of the image, and building the mapping table between each image's index and its latent category label;
a seventh module for computing, from the weight learning sample set of bag-of-visual-words features, the visual word weights of each of the C visual classes, and building the mapping table between each visual class and its visual word weights;
an eighth module for extracting the bag-of-visual-words feature of a query image and feeding it into the support vector machine discrimination models, to obtain the visual class of the query image;
a ninth module for obtaining, from the mapping table between the visual class of the query image and the visual word weights, the weights of the visual words in the query image's bag-of-visual-words feature, feeding them into the inverted index file, and computing the distances between the query image and all images in the image database;
a tenth module for sorting the distances obtained by the ninth module in ascending order, and returning to the retrieval user the database image corresponding to each distance, to obtain the retrieval result;
an eleventh module for determining whether the retrieval result meets the retrieval user's demand: if the user is not satisfied with the result, L images visually identical to the query image are selected from the result as positive feedback, the visual word weights are adjusted and fed into the inverted index, and a re-ranked result is obtained; if the user is satisfied, the result is returned directly.
CN2013102858915A 2013-07-08 2013-07-08 Visual word bag feature weighting method and system based on classification drive Pending CN103399870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102858915A CN103399870A (en) 2013-07-08 2013-07-08 Visual word bag feature weighting method and system based on classification drive


Publications (1)

Publication Number Publication Date
CN103399870A true CN103399870A (en) 2013-11-20

Family

ID=49563500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102858915A Pending CN103399870A (en) 2013-07-08 2013-07-08 Visual word bag feature weighting method and system based on classification drive

Country Status (1)

Country Link
CN (1) CN103399870A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765764A (en) * 2015-02-06 2015-07-08 南京理工大学 Indexing method based on large-scale image
CN106055580A (en) * 2016-05-23 2016-10-26 中南大学 Radviz-based fuzzy clustering result visualization method
CN107862342A (en) * 2017-11-27 2018-03-30 清华大学 Lift the visual analysis system and method for tree-model
CN109489663A (en) * 2018-10-19 2019-03-19 北京三快在线科技有限公司 A kind of localization method and device, mobile device and computer readable storage medium
CN109740671A (en) * 2019-01-03 2019-05-10 北京妙医佳信息技术有限公司 A kind of image-recognizing method and device
CN109992676A (en) * 2019-04-01 2019-07-09 中国传媒大学 Across the media resource search method of one kind and searching system
CN110175249A (en) * 2019-05-31 2019-08-27 中科软科技股份有限公司 A kind of search method and system of similar pictures
CN110968721A (en) * 2019-11-28 2020-04-07 上海冠勇信息科技有限公司 Method and system for searching infringement of mass images and computer readable storage medium thereof
CN111310712A (en) * 2020-03-04 2020-06-19 杭州晟元数据安全技术股份有限公司 Fast searching method based on fingerprint bag-of-words features
WO2021087770A1 (en) * 2019-11-05 2021-05-14 深圳市欢太科技有限公司 Picture classification method and apparatus, and storage medium and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521336A (en) * 2011-12-08 2012-06-27 中国信息安全测评中心 Vulnerability information three-dimensional presentation method based on dynamic relationship
CN102663015A (en) * 2012-03-21 2012-09-12 上海大学 Video semantic labeling method based on characteristics bag models and supervised learning
CN102663446A (en) * 2012-04-24 2012-09-12 南方医科大学 Building method of bag-of-word model of medical focus image




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131120