CN102385592B - Image concept detection method and device - Google Patents

Publication number: CN102385592B (granted); application CN201010271693.XA; earlier publication CN102385592A
Authority: CN (China)
Original language: Chinese (zh)
Key terms: concept, word list, sub-word list, local feature, training data
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Inventors: 冯明, 梁笃国, 张艳霞, 曹宁, 邓涛
Assignee (original and current): China Telecom Corp Ltd (the listed assignees may be inaccurate)
Application filed by China Telecom Corp Ltd; priority to CN201010271693.XA; publication of CN102385592A, followed by grant and publication of CN102385592B
Classification: Image Analysis
Abstract

The invention discloses an image concept detection method and device. The method comprises: acquiring the local features of the data to be detected and of the training data of multiple concepts; clustering word lists of different lengths according to different quantization strategies and computing histograms of the local features of the data to be detected and of each concept's training data; training binary support vector machine classifiers, computing the average detection precision of the local features of each concept's training data, and training a classification model for each concept; selecting each concept's best sub-word list by cross-validation and taking the trained classification model corresponding to that best sub-word list as the concept's final concept detection classifier; and inputting the histograms of the local features of the data to be detected, counted on each concept's best sub-word list, into each concept's final concept detection classifier to determine the probability that each concept appears in the data to be detected.

Description

Image concept detection method and device
Technical field
The present invention relates to the field of multimedia information detection, and more particularly to an image concept detection method and device.
Background technology
In recent years, the rapid growth of video and image resources on the network has produced massive digital image collections, and helping users quickly locate useful resources among them has become a hot topic for many research units. One of the key technologies for solving this problem is an effective image search method. Since the early 1990s, content-based image retrieval (CBIR) has gradually attracted attention. CBIR describes an image by low-level features such as color, shape, texture and region, uses these descriptions as the image index, computes the similarity distance between the query image and target images, retrieves by similarity matching, and returns the group of images in the library whose content descriptions best satisfy the query.
However, because the similarity of low-level visual features is not fully equivalent to the similarity a person judges subjectively, users usually formulate their requests at the concept level when retrieving images, and judge subjectively whether the returned images meet their needs. Therefore, to realize a natural-language retrieval mode closer to the user's understanding, semantics-based image retrieval has become the development direction of the image search field. Concept detection is the key link of semantics-based image retrieval, and progress in concept detection can to a great extent improve semantics-based retrieval results.
Concept detection is a typical pattern recognition technique, and feature extraction is a very important link within it. Because high-level semantic information cannot be obtained directly from the visual features of an image, the effectiveness of the features extracted in this step directly affects the classifier and thus the performance of the whole recognition process. The features most worth extracting are those that are clearly discriminative, easy to extract, and insensitive to noise.
In recent years, many research units at home and abroad have studied feature extraction extensively; image features can roughly be divided into global features and local features. Global features are mostly color, texture, shape and region features extracted from raw pixel values. They can express the most essential characteristics of an image, but they also have significant limitations. For example, color features are strongly affected by image brightness and chromaticity, so images of the same content but different chromaticity or brightness differ greatly in color features; texture and shape features perform poorly when recognizing images with translation, rotation, and scale changes. These problems all reflect the limitations of global features.
To address these problems, David G. Lowe in 2004 summarized existing invariant-based feature detection methods and formally proposed the Scale-Invariant Feature Transform (SIFT), a local feature operator based on scale space that remains invariant to image scaling, rotation, and even affine transformation. In recent years, many institutions have studied how to use the SIFT operator for concept detection; the bag-of-words model proposed by Li Fei-Fei for processing SIFT features has shown good concept recognition performance and has been very widely applied.
However, the above methods are too rigid in their choice of the bag-of-words vocabulary: all concepts use word lists of the same length, which makes concept detection inefficient and places very high demands on computing power.
Summary of the invention
The technical problem to be solved by the present invention is to provide an image concept detection method that improves detection efficiency while guaranteeing the detection effect.
The invention provides an image concept detection method, comprising: using the SIFT algorithm to obtain the local features of the data to be detected and of the training data of multiple concepts; according to different quantization strategies, using K-means clustering on the local features of each concept's training data to cluster word lists of different lengths for each concept, merging the word lists of the multiple concepts into a bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, and computing histograms of the local features of the data to be detected and of each concept's training data, where a histogram counts how many times the local features occur in every sub-word list of each word list B_i of the bag-of-words model, B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists, one per concept, each sub-word list being a concept's word list under the i-th quantization strategy, the length of each sub-word list is determined by the quantization strategy, the quantization strategy is determined by the value of K, 1 ≤ i ≤ N, N ≥ 2, K > 1; dividing the local features of each concept's training data into a training set and a validation set, training binary support vector machine classifiers using the training set, the validation set, the concept annotation of the training data, and the histograms of the local features of each concept's training data, and using the validation set to compute on the classifiers the average detection precision of the training-data local features of the concept corresponding to every sub-word list of every word list B_i, and to train the classification model of the concept corresponding to every sub-word list of every B_i; cross-validating the computed average detection precisions so as to select, within {B_1, B_2, ..., B_i, ..., B_N}, the sub-word list with the maximum average detection precision as each concept's best sub-word list, taking the trained classification model corresponding to each concept's best sub-word list as that concept's final concept detection classifier, and merging the final classifiers of the multiple concepts into the best concept detection classifier; and inputting the histograms of the local features of the data to be detected, counted on each concept's best sub-word list, into the best concept detection classifier to determine the probability of each concept appearing in the data to be detected.
According to one embodiment of the method, before the step of clustering word lists of different lengths for each concept by K-means according to different quantization strategies, the method further comprises adding a background class to the multiple concepts.
According to another embodiment of the method, 20 ≤ K ≤ 200.
According to a further embodiment of the method, the method further comprises selecting the training data of each concept, including annotation information, according to a sampling strategy.
According to yet another embodiment of the method, the sampling strategy is

    n_i = N_i,                if N_i ≤ 100
    n_i = a_i × (N_i − 100),  if N_i > 100

where N_i is the positive-sample count of the i-th concept before sampling, n_i is the amount of training data of the i-th concept after sampling, and a_i is a sampling-strategy parameter between 0 and 1.
The image concept detection method of the present invention adaptively selects the best sub-word list for each concept through cross-validation: a concept with fewer local features uses a shorter word list, which both achieves a good detection effect and improves detection efficiency, while a concept rich in local features is given a word list long enough to guarantee the detection effect. By selecting word lists of different lengths for different concepts, the invention improves detection efficiency while guaranteeing the detection effect.
Another technical problem to be solved by the present invention is to provide an image concept detection device that improves detection efficiency while guaranteeing the detection effect.
The invention provides an image concept detection device, comprising: a local feature extraction module for using the SIFT algorithm to obtain the local features of the data to be detected and of the training data of multiple concepts; a clustering module, connected to the local feature extraction module, for clustering, according to different quantization strategies, word lists of different lengths for each concept by applying K-means to the local features of each concept's training data, merging the word lists of the multiple concepts into a bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, and computing histograms of the local features of the data to be detected and of each concept's training data, where a histogram counts how many times the local features occur in every sub-word list of each word list B_i, B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists, one per concept, the length of each sub-word list is determined by the quantization strategy, the quantization strategy is determined by the value of K, 1 ≤ i ≤ N, N ≥ 2, K > 1; a classification model training module, connected to the local feature extraction module and the clustering module, for dividing the local features of each concept's training data into a training set and a validation set, training binary support vector machine classifiers using the training set, the validation set, the concept annotation of the training data, and the histograms of the local features of each concept's training data, and using the validation set to compute on the classifiers the average detection precision of, and to train the classification model of, the concept corresponding to every sub-word list of every word list B_i; a cross-validation module, connected to the clustering module and the classification model training module, for cross-validating the computed average detection precisions so as to select, within {B_1, B_2, ..., B_i, ..., B_N}, the sub-word list with the maximum average detection precision as each concept's best sub-word list, taking the trained classification model corresponding to each concept's best sub-word list as that concept's final concept detection classifier, and merging the final classifiers of the multiple concepts into the best concept detection classifier; and a concept detection module, connected to the clustering module and the cross-validation module, for inputting the histograms of the local features of the data to be detected, counted on each concept's best sub-word list, into the best concept detection classifier to determine the probability of each concept appearing in the data to be detected.
According to one embodiment of the device, the device further comprises a class addition module, connected to the clustering module, for adding a background class to the multiple concepts.
According to another embodiment of the device, 20 ≤ K ≤ 200.
According to a further embodiment of the device, the device further comprises a sampling module, connected to the local feature extraction module, for selecting the training data of each concept, including annotation information, according to a sampling strategy.
According to yet another embodiment of the device, the sampling strategy is

    n_i = N_i,                if N_i ≤ 100
    n_i = a_i × (N_i − 100),  if N_i > 100

where N_i is the positive-sample count of the i-th concept before sampling, n_i is the amount of training data of the i-th concept after sampling, and a_i is a sampling-strategy parameter between 0 and 1.
The image concept detection device of the present invention adaptively selects the best sub-word list for each concept through cross-validation: a concept with fewer local features uses a shorter word list, which both achieves a good detection effect and improves detection efficiency, while a concept rich in local features is given a word list long enough to guarantee the detection effect. By selecting word lists of different lengths for different concepts, the invention improves detection efficiency while guaranteeing the detection effect.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and form part of this application. In the drawings:
Fig. 1 is a flowchart of the first embodiment of the method of the present invention.
Fig. 2 is a flowchart of the second embodiment of the method of the present invention.
Fig. 3 is a flowchart of the third embodiment of the method of the present invention.
Fig. 4 is a structural diagram of the first embodiment of the device of the present invention.
Fig. 5 is a structural diagram of the second embodiment of the device of the present invention.
Fig. 6 is a structural diagram of the third embodiment of the device of the present invention.
Fig. 7 is a structural diagram of the fourth embodiment of the device of the present invention.
Detailed description
The present invention is described more fully below with reference to the drawings, in which exemplary embodiments are shown. The exemplary embodiments and their description are used to explain the invention and do not unduly limit it.
The object of the invention is to propose a concept detection method and device based on a bag-of-words model with best sub-word lists, which overcomes the low concept detection efficiency of the prior art. For each concept, a word list of appropriate length (i.e., the best sub-word list) is selected through cross-validation; a classification model is learned from each concept's best sub-word list by a binary support vector machine; the classification models of the multiple concepts are merged into the best concept detection classifier, which is then used to detect concepts in the images to be detected. The invention outperforms the traditional bag-of-words model and has obtained good experimental results.
Fig. 1 is a flowchart of the first embodiment of the method of the present invention.
As shown in Fig. 1, this embodiment may comprise the following steps:
S102: use the SIFT algorithm to obtain the local features of the data to be detected and of the training data of multiple concepts.
For example, for each image (i.e., data to be detected or training data): first find local extrema among three adjacent scales of the difference-of-Gaussians scale space, where a point is an extremum among the 26 neighboring points in its own scale and the two adjacent scales; then fit a three-dimensional quadratic function to determine the position and scale of each local extremum accurately; next, use the gradient orientation distribution of the neighborhood of each local extremum to assign it an orientation parameter, so that the operator is rotation invariant — a local extremum representing specific information in the image can be called an interest point; finally, extract a 128-dimensional feature vector from an 8 × 8 window centered on each interest point. The interest points of an image are a subset of its local extrema that carry orientation information.
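As a hedged illustration of one part of this step, the 26-neighbour extremum test in the difference-of-Gaussians scale space could be sketched in NumPy as follows (the function name and the array layout `dog[scale, y, x]` are assumptions made for this sketch, not from the patent):

```python
import numpy as np

def is_scale_space_extremum(dog: np.ndarray, s: int, y: int, x: int) -> bool:
    """Check whether dog[s, y, x] is an extremum among its 26 neighbours
    in its own scale and the two adjacent difference-of-Gaussians scales."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]  # 3x3x3 neighbourhood, 27 values
    centre = dog[s, y, x]
    others = np.delete(cube.ravel(), 13)  # drop the centre, keep the 26 neighbours
    return bool(np.all(centre > others) or np.all(centre < others))
```

A full SIFT pipeline would then refine each surviving extremum by quadratic fitting, assign its orientation, and extract the 128-dimensional descriptor, as the step above describes.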
S104: according to different quantization strategies, use K-means clustering on the local features of each concept's training data to cluster word lists of different lengths for each concept, merge the word lists of the multiple concepts into a bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, and compute histograms of the local features of the data to be detected and of each concept's training data, where a histogram counts how many times the local features occur in every sub-word list of each word list B_i; B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists, one per concept; the length of each sub-word list is determined by the quantization strategy, the quantization strategy is determined by the value of K, 1 ≤ i ≤ N, N ≥ 2, K > 1.
To illustrate: the local features of all training images of each concept (i.e., the interest-point feature vectors obtained by the SIFT algorithm) are clustered by K-means. Different quantization strategies (i.e., different values of K during clustering) yield different numbers of clusters; each cluster can be regarded as one word of a word list, so different numbers of clusters mean word lists of different lengths. The word lists of different lengths of all concepts are then merged together to form the bag-of-words model, i.e., multiple word lists of different lengths are built and denoted {B_1, B_2, ..., B_i, ..., B_N}. Here a different quantization strategy means a different value of K; K is an integer, usually K > 1, and preferably 20 ≤ K ≤ 200. For example, when K = 20, K-means yields word list B_1, containing M sub-word lists of length 20, one per concept; when K = 30, it yields word list B_2 with sub-word lists of length 30; and so on, until K = 200 yields word list B_N with sub-word lists of length 200. The word lists of different lengths of the M concepts are merged into the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}. Then, for the SIFT local features extracted from every image in the image set (both test and training images), a histogram is computed over every sub-word list of every word list, i.e., for each image, how many times its local features occur in every sub-word list of every B_i. In other words, the histogram records the frequency with which the image's interest points fall on each sub-word list, and it serves as the image's feature vector input to the concept detection classifier.
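The clustering and histogram computation above can be sketched minimally in NumPy; this is an illustrative implementation under stated assumptions (a plain Lloyd-style k-means and Euclidean nearest-word assignment; the function names and the normalisation of the histogram are choices made here, not specified by the patent):

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means: returns a (k, d) array of cluster centres (the 'words')."""
    rng = np.random.default_rng(seed)
    centres = features[rng.choice(len(features), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every local feature to its nearest centre
        d = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centres[j] = features[labels == j].mean(axis=0)
    return centres

def bow_histogram(features: np.ndarray, vocab: np.ndarray) -> np.ndarray:
    """Count how many of an image's local features fall on each word of a
    (sub-)word list, normalised to occurrence frequencies."""
    d = np.linalg.norm(features[:, None, :] - vocab[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(vocab)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Running `kmeans` once per concept and per K value, and concatenating the resulting vocabularies, would correspond to building one word list B_i per quantization strategy.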
S106: divide the local features of each concept's training data into a training set and a validation set, train binary support vector machine classifiers using the training set, the validation set, the concept annotation of the training data, and the histograms of the local features of each concept's training data, and use the validation set to compute on the classifiers the average detection precision of, and to train the classification model of, the concept corresponding to every sub-word list of every word list B_i.
For example, a binary support vector machine can be selected as the basic classifier, and the concept detection classifiers are trained on the training image library by machine learning and pattern recognition techniques. A binary support vector machine is an algorithm that optimizes the margin between training samples on a decision hyperplane (i.e., it maps the vectors into a higher-dimensional space and builds a maximum-margin hyperplane there, with two parallel hyperplanes on either side of the separating hyperplane, thereby realizing a learning algorithm for data classification).
Specifically, the interest-point set of the training image library (i.e., the training data) after SIFT processing can be divided into a training set and a validation set. Choose word list B_1 (suppose B_1 comprises M sub-word lists B_11, B_12, ..., B_1M). Using all images annotated with concept C_1 and the histograms of their SIFT local features counted on each sub-word list of B_1, train a binary SVM classifier on the training set, adjust its parameters, and test it on the validation set. The kernel parameters are tuned (a radial basis function (RBF) kernel is generally adopted, whose parameters C and δ are selected by cross-validation on the validation data) to determine the parameters for which the classifier for the sub-word list of B_1 corresponding to concept C_1 is optimal, i.e., performs best on the validation set with the highest average detection precision computed there. This yields the classification model of concept C_1 corresponding to sub-word list B_11 of B_1. In the same way, the classification models of the other concepts (C_2, ..., C_M) corresponding to sub-word lists B_12, ..., B_1M of B_1 can be trained. Then change the word list and repeat the above steps, training with the histograms counted under the new word list B_i as feature vectors, to obtain the classification models of concepts C_1, C_2, ..., C_M corresponding to sub-word lists B_i1, B_i2, ..., B_iM of B_i, and the average detection precision of each concept's training-data local features on those sub-word lists.
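As a hedged sketch of the per-(concept, sub-word-list) training loop, the RBF-kernel SVM and the parameter search on the validation set could look as follows using scikit-learn (the library choice, the parameter grids, and the use of average precision as the validation score are assumptions of this sketch; the patent only specifies an RBF kernel with parameters C and δ tuned on the validation set):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import average_precision_score

def train_concept_svm(train_X, train_y, val_X, val_y):
    """Fit a binary RBF-kernel SVM for one (concept, sub-word-list) pair,
    picking C and gamma by average precision on the validation set."""
    best_clf, best_ap = None, -1.0
    for C in (0.1, 1.0, 10.0):
        for gamma in (0.01, 0.1, 1.0):
            clf = SVC(C=C, gamma=gamma, kernel="rbf",
                      probability=True, random_state=0)
            clf.fit(train_X, train_y)
            ap = average_precision_score(val_y, clf.predict_proba(val_X)[:, 1])
            if ap > best_ap:
                best_clf, best_ap = clf, ap
    return best_clf, best_ap  # model and its validation average precision
```

Repeating this for every sub-word list of every word list B_i fills the average-precision table that step S108 cross-validates.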
S108: cross-validate the computed average detection precisions of the training-data local features of the concept corresponding to every sub-word list of every word list B_i, so as to select, within the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, the sub-word list with the maximum average detection precision as each concept's best sub-word list; take the trained classification model corresponding to each concept's best sub-word list as that concept's final concept detection classifier, and merge the final classifiers of the multiple concepts into the best concept detection classifier. Optionally, the best sub-word lists of the multiple concepts can also be merged into a best bag-of-words model.
For example, step S106 yields a performance table of the different concepts under the different sub-word lists (e.g., a table of average detection precisions). By cross-validation (i.e., comparing the multiple average detection precisions of the same concept under different sub-word lists), the best-performing sub-word list in this table is chosen as each concept's best sub-word list, and the classification model learned by the binary SVM from each concept's best sub-word list becomes the concept's final concept detection classifier. The M final concept detection classifiers are merged into the best concept detection classifier, which is used to detect which concepts a test image contains.
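The selection over the performance table reduces to an argmax per concept; the following sketch illustrates it with a made-up table (the concept names and all average-precision values are invented for illustration only):

```python
# Hypothetical average-precision table from step S106:
# ap_table[concept][K] = validation average precision under sub-word-list length K.
ap_table = {
    "night": {20: 0.41, 50: 0.47, 100: 0.44, 200: 0.40},
    "beach": {20: 0.35, 50: 0.52, 100: 0.58, 200: 0.57},
}

def select_best_sub_word_lists(ap_table):
    """For each concept, keep the sub-word-list length with the maximal AP."""
    return {c: max(scores, key=scores.get) for c, scores in ap_table.items()}

best = select_best_sub_word_lists(ap_table)
# With the numbers above, "night" needs only a short word list while "beach"
# benefits from a longer one — the adaptive behaviour the patent describes.
```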
S110: input the histograms of the local features of the data to be detected, counted on each concept's best sub-word list, into the best concept detection classifier to determine the probability of each concept appearing in the data to be detected.
For example, the histograms of the SIFT local features of the image set to be detected, counted on the best sub-word lists selected in S108, are input into the best concept detection classifier, which outputs detection results for all images to be detected with respect to the M concepts {C_1, C_2, ..., C_M}. The classifier's verdict can be expressed as a probability judgment, i.e., it outputs a decimal between 0 and 1 expressing the confidence that the concept is present.
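The merged detection step amounts to routing each concept's histogram to its own final classifier and reading off the positive-class probability. A minimal sketch, using a stub in place of the trained SVMs (the stub class and all names here are illustrative, not part of the patent):

```python
import numpy as np

class StubClassifier:
    """Stands in for one concept's trained final SVM; predict_proba
    returns [[P(absent), P(present)]] regardless of the input."""
    def __init__(self, p: float):
        self.p = p
    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        return np.array([[1.0 - self.p, self.p]])

def detect_concepts(histograms: dict, classifiers: dict) -> dict:
    """Feed each concept's best-sub-word-list histogram to that concept's
    final classifier; return per-concept confidences in [0, 1]."""
    return {c: float(classifiers[c].predict_proba(h[None, :])[0, 1])
            for c, h in histograms.items()}
```

With real models, each value in the returned dictionary is the "decimal between 0 and 1" the step describes.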
This embodiment combines image processing and pattern recognition techniques to detect the semantic concepts of an image. It can choose the most suitable word-list length for each concept, forming a bag-of-words model with adaptive word-list lengths. At the same time, because each concept is clustered separately and the word lists are obtained by merging, the computation can be parallelized to improve detection efficiency. Moreover, in semantics-based image retrieval, its performance exceeds that of a bag-of-words model with a single original vocabulary and can significantly improve retrieval performance.
Fig. 2 is the schematic flow sheet of second embodiment of the inventive method.
As shown in Figure 2, this embodiment can comprise the following steps:
S202, utilizes SIFT algorithm to obtain respectively the local feature of the local feature of testing data and the training data of multiple concepts;
S204, for multiple concepts are added a background classes, , for all need detect concepts (for example, M concept) an additional background classes, adding of background classes can provide a lot of background informations for word bag model, the background information of detected data can be proposed on the one hand, to detect more accurately the concept in data to be tested, the pure background information that does not comprise any concept can also be grouped in background classes on the other hand, pure background information is grouped in the corresponding class of certain concept mistakenly preventing, thereby can improve significantly the detection Average Accuracy of concept to be detected,
S206, according to different quantization strategies, utilize the local feature of the training data of K means Method and each concept to assemble the word list about the different length of each concept (concept herein not only comprises concept to be detected and also comprises added background classes), the word list of multiple concepts different length is separately merged into word bag model { B 1, B 2..., B i..., B n, and add up respectively the histogram of the histogram of local feature of testing data and the local feature of the training data of multiple concepts, wherein, histogram is that local feature is at word bag model { B 1, B 2..., B i..., B neach word list B ievery sub-word list in the number of times that occurs, word list B ifor the word list of multiple concepts under i quantization strategy, word list B icomprise the multiple sub-word list corresponding with multiple concepts, every sub-word list is the word list of each concept under i quantization strategy, and the length of every sub-word list is determined by quantization strategy, quantization strategy is determined by K value, 1≤i≤N, N>=2, K > 1;
S208, the local feature of the training data of each concept is divided into training set and checksum set, utilize the histogram training binary support vector machine classifier of the concept markup information of training data of training set, checksum set, each concept and the local feature of the training data of each concept, and utilize checksum set on binary support vector machine classifier, to calculate and word bag model { B 1, B 2..., B i..., B neach word list B iin the corresponding each concept of every sub-word list training data local feature detection Average Accuracy and train and each word list B iin the disaggregated model of the corresponding each concept of every sub-word list;
S210, performing cross validation on the calculated average detection precisions of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, …, B_N}, selecting from the bag-of-words model the sub-word list corresponding to the maximum average detection precision of each concept as the best sub-word list of that concept, taking the trained classification model corresponding to the best sub-word list of each concept as the final concept detection classifier of that concept, and merging the final concept detection classifiers of the multiple concepts into a best concept detection classifier;
S212, inputting the histograms of the local features of the data to be detected counted on the best sub-word list of each concept into the best concept detection classifier, to determine the probability that each concept occurs in the data to be detected.
Fig. 3 is a schematic flowchart of the third embodiment of the method of the invention.
As shown in Fig. 3, this embodiment can comprise the following steps:
S302, choosing the training data of each concept, which comprises annotation information, according to a sampling strategy, wherein the sampling strategy can be n_i = a_i × (N_i − 100), N_i being the number of positive samples of the i-th concept before sampling, n_i being the quantity of training data (that is, positive samples) of the i-th concept after sampling, and a_i being a sampling strategy parameter between 0 and 1;
To illustrate, the sampling strategy is used to choose the training data of each of the M concepts that have been manually annotated (that is, for every picture or video frame it has been marked whether it contains certain concepts); the M selected concepts (which can be expressed as {C_1, C_2, …, C_M}) have training data expressed as {T_1, T_2, …, T_M}, where T_1, T_2, …, T_M are chosen with the sampling strategy n_i = a_i × (N_i − 100);
If the number of positive samples of a concept (that is, samples that contain the concept) is less than or equal to 100, since the positive samples are few, all positive samples are used for training so that the training data contains sufficient information; if the number of positive samples of a concept exceeds 100, a sampling strategy parameter a_i is selected (usually, a_i is between 0 and 1), and n_i = a_i × (N_i − 100) positive samples are drawn for training. For example, if the concept "night" has 252 positive samples and a sampling strategy parameter a_i = 0.5 is adopted, 76 positive samples participate in training the concept "night";
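The positive-sample arithmetic above can be checked in a few lines; `positive_sample_count` is a hypothetical helper name, and rounding a_i × (N_i − 100) down to an integer is an assumption the text does not state.

```python
def positive_sample_count(n_positive, a=0.5, floor=100):
    """Sampling strategy sketched in the text: keep every positive
    sample when there are at most `floor` of them, otherwise draw
    a * (n_positive - floor) of them (assumed rounded down)."""
    if n_positive <= floor:
        return n_positive
    return int(a * (n_positive - floor))

# The "night" example from the text: 252 positives, a = 0.5
print(positive_sample_count(252, a=0.5))  # → 76
print(positive_sample_count(80))          # → 80 (all kept)
```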
S304, using the SIFT algorithm to obtain the local features of the data to be detected and the local features of the training data of the multiple concepts, respectively;
S306, according to different quantization strategies, clustering the local features of the training data of each concept with K-means to obtain word lists of different lengths for each concept, merging the word lists of the different lengths of the multiple concepts into a bag-of-words model {B_1, B_2, …, B_i, …, B_N}, and separately computing the histogram of the local features of the data to be detected and the histograms of the local features of the training data of the multiple concepts, wherein a histogram records the number of times the local features occur in every sub-word list of each word list B_i of the bag-of-words model; the word list B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists, one per concept; every sub-word list is the word list of one concept under the i-th quantization strategy, and its length is determined by the quantization strategy, which is determined by the value K, with 1≤i≤N, N≥2, K > 1;
S308, dividing the local features of the training data of each concept into a training set and a validation set, training a binary support vector machine classifier with the training set, the validation set, the concept annotation information of the training data of each concept and the histograms of the local features of the training data of each concept, and using the validation set on the binary support vector machine classifier to calculate the average detection precision of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, …, B_N}, and to train the classification model of each concept corresponding to every sub-word list in each word list B_i;
S310, performing cross validation on the calculated average detection precisions of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, …, B_N}, selecting from the bag-of-words model the sub-word list corresponding to the maximum average detection precision of each concept as the best sub-word list of that concept, taking the trained classification model corresponding to the best sub-word list of each concept as the final concept detection classifier of that concept, and merging the final concept detection classifiers of the multiple concepts into a best concept detection classifier;
S312, inputting the histograms of the local features of the data to be detected counted on the best sub-word list of each concept into the best concept detection classifier, to determine the probability that each concept occurs in the data to be detected.
The positive-sample sampling strategy of this embodiment simplifies the training data while ensuring that the positive samples contain sufficient information, which improves the efficiency of concept detection.
In the above embodiments, preferably, 20≤K≤200. When the value of K is less than 20, the constructed word list generally cannot fully express the characteristic information of the concept, so the detection effect drops sharply; when the value of K is greater than 200, the information in the constructed word list is relatively redundant, which greatly increases the computational burden of the computer without a significant gain in effect.
The fourth embodiment of the method of the invention can comprise the following steps:
Step 1: for the training data that has been manually annotated with semantics, a sampling strategy is applied, where N_i denotes the total number of positive samples of the i-th concept before sampling, n_i denotes the number of positive samples drawn for training, and the sampling strategy parameter a_i is usually between 0 and 1, the sampling strategy being n_i = a_i × (N_i − 100).
Step 2: the SIFT algorithm is used to obtain the local features of all images to be detected and all training images. For example, the SIFT algorithm can build a scale space with the two-dimensional Gaussian kernel G(x, y, σ) = (1/(2πσ²))·exp(−(x² + y²)/(2σ²)) (where σ is the standard deviation of the Gaussian distribution). For a grayscale two-dimensional image, the scale-space representation at different scales is obtained by convolving the image with the Gaussian kernel: L(x, y, σ) = G(x, y, σ) * I(x, y), where (x, y) is a pixel position of the image, I(x, y) is the gray value of that pixel, σ is called the scale-space factor, and L represents the scale space of the image;
After the scale space is built, in order to find stable extreme points, a difference of Gaussians can be used to detect extreme points at local positions, that is, the images at two adjacent scales are subtracted: D(x, y, σ) = L(x, y, kσ) − L(x, y, σ) (for every image, local extreme points can be found between three adjacent scales of the difference-of-Gaussians scale space). A three-dimensional quadratic function is then fitted to accurately determine the position and scale of each local extreme point, and next the gradient direction distribution of the neighbourhood of each determined local extreme point is used to assign a direction parameter to the point, so that the operator possesses rotational invariance;
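The extremum test across three adjacent difference-of-Gaussians scales compares a value with its 26 neighbours in a 3 × 3 × 3 block; a minimal sketch, where `is_local_extremum` is an illustrative helper operating on an assumed 3-dimensional array indexed as [scale][row][column]:

```python
def is_local_extremum(dog, s, y, x):
    """Check whether dog[s][y][x] is an extremum among its 26
    neighbours in a 3x3x3 block spanning three adjacent scales
    of the difference-of-Gaussians pyramid."""
    v = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return v > max(neighbours) or v < min(neighbours)

# A lone peak in a flat 3x3x3 block is an extremum.
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 10.0
print(is_local_extremum(dog, 1, 1, 1))  # → True
```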
Finally, a neighbourhood window is taken centred at each interest point and divided into a 4 × 4 grid of blocks; in each block a gradient histogram over 8 directions is computed and the accumulated value in each direction is calculated, so each interest point is described by a 4 × 4 × 8 = 128-dimensional feature vector;
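The 4 × 4 × 8 binning that yields the 128-dimensional vector can be illustrated on a toy patch. This sketch omits the Gaussian weighting, trilinear interpolation, rotation to the dominant orientation, and normalisation of a real SIFT descriptor; the function name and the assumed 16 × 16 patch are illustrative only.

```python
import math

def sift_like_descriptor(patch):
    """Toy illustration of 4x4x8 binning: `patch` is a 16x16
    grayscale array; gradient magnitudes are accumulated into a
    4x4 grid of cells, each with an 8-direction orientation
    histogram, giving 4*4*8 = 128 numbers."""
    hist = [0.0] * 128
    for y in range(1, 15):            # skip the border pixels
        for x in range(1, 15):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            o = min(int(angle / (2 * math.pi) * 8), 7)   # orientation bin
            cell = (y // 4) * 4 + (x // 4)               # 4x4 grid cell
            hist[cell * 8 + o] += mag
    return hist

# A horizontal intensity ramp has all gradients pointing one way.
patch = [[float(x) for x in range(16)] for _ in range(16)]
print(len(sift_like_descriptor(patch)))  # → 128
```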
Step 3: based on extensive experiments, appending a background class to all concepts that need to be detected effectively improves the effect of concept detection. Let M denote the total number of concepts to be detected; with one additional background class there are M+1 concepts in total. For each concept, K-means clustering is applied with different quantization strategies (that is, different K values are chosen; usually K > 1, preferably 20≤K≤200), gathering classes of varying numbers {c_1, c_2, …, c_N} (that is, the K values are respectively c_1, c_2, …, c_N). Each class can be regarded as a word in a word list; the word lists of the multiple different lengths of the M+1 concepts are then merged into the bag-of-words model, that is, word lists of lengths {(M+1)c_1, (M+1)c_2, …, (M+1)c_N} are built, denoted {B_1, B_2, …, B_i, …, B_N}. Then, according to the bag-of-words model {B_1, B_2, …, B_N}, the SIFT local features extracted from all images in the image set are counted, obtaining the histogram of every image with respect to every sub-word list in each word list B_i; this histogram can be input into a concept detection classifier as the feature vector of the image;
Step 4: a binary support vector machine can be selected as the basic classifier, and the concept detection classifiers are trained on the training image library with machine learning and pattern recognition techniques, where the binary support vector machine is an algorithm that optimizes, on a decision hyperplane, the margin between the samples of the training set. Specifically, the set of interest points of the training image library obtained by the SIFT algorithm can be divided into a training set and a validation set; every image in each part carries the annotation information for concept C_i and the histogram of the image's SIFT local features counted on each sub-word list of word list B_i. Based on this information a binary support vector machine classifier is trained on the training set, its parameters are adjusted, and it is tested on the validation set, so as to determine the parameters with which the binary support vector machine classifier corresponding to the sub-word list for concept C_i in word list B_i is in the optimal state, that is, the classifier performs best on the validation set, in other words its average detection precision is highest. This yields the classification model of concept C_i corresponding to the sub-word list in word list B_i, together with the average detection precision of the local features of the training data of concept C_i corresponding to that sub-word list; in the same way, the classification models of the other concepts corresponding to the other sub-word lists in word list B_i, and the corresponding average detection precisions, are obtained;
Step 5: using different word lists, step 4 is repeated, training with the histograms counted under the new word list as feature vectors, to obtain the average detection precision of each concept on the new validation set and the classification models of all concepts corresponding to every sub-word list in the new word list. In the same way, a table P_mn of the detection effect of the word lists of all the different lengths for the different concepts is obtained (m being the total number of word lists and n the total number of concepts), where an element p_ij of table P_mn (i being the index of the word list of a given length and j the index of a concept) denotes the average detection precision of the j-th concept under the word list B_i of length (M+1)c_i. The table can then be split into column vectors (α_1, α_2, …, α_n), and the maximum of each column vector is taken, that is, the best sub-word list with the highest average detection precision is picked out for concept C_j, and the classification model learned by the binary support vector machine with this best sub-word list is taken as the final concept detection classifier of concept C_j; the final concept detection classifiers of the M concepts are merged into the best concept detection classifier;
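The per-concept selection in step 5 reduces to an argmax over the columns of the precision table. The sketch below assumes the table is a plain list of rows, one per word list, with made-up average precision values; `best_vocab_per_concept` is an illustrative name.

```python
def best_vocab_per_concept(ap_table):
    """ap_table[i][j] = average precision of concept j under word
    list i (the table P of step 5).  For every concept, return the
    index of the word list with the highest average precision."""
    n_vocab = len(ap_table)
    n_concepts = len(ap_table[0])
    return [max(range(n_vocab), key=lambda i: ap_table[i][j])
            for j in range(n_concepts)]

# Toy table: 3 word lists (rows) x 2 concepts (columns).
ap = [[0.40, 0.70],
      [0.55, 0.65],
      [0.52, 0.72]]
print(best_vocab_per_concept(ap))  # → [1, 2]
```

Concept 0 is best served by word list 1 and concept 1 by word list 2, which is precisely the adaptive per-concept word list selection the patent describes.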
Step 6: the histograms of the SIFT local features of the image set to be detected, counted on the best sub-word lists selected in step 5, are input into the best concept detection classifier, which outputs the detection results of all images to be detected for the M concepts (that is, {C_1, C_2, …, C_M}). The verdict of the best concept detection classifier can be presented as a probability judgment: a decimal between 0 and 1 is output, representing the confidence that the concept "exists"; if the confidence exceeds 0.5, the image to be detected is judged to contain the concept.
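The probability judgment of step 6 amounts to thresholding each per-concept confidence at 0.5; a minimal sketch, with a hypothetical dictionary of confidences:

```python
def detect_concepts(confidences, threshold=0.5):
    """Turn the classifier's per-concept confidences (decimals
    between 0 and 1) into presence/absence verdicts, using the
    0.5 cut-off stated in step 6."""
    return {concept: p > threshold for concept, p in confidences.items()}

print(detect_concepts({"night": 0.81, "bus": 0.23}))
# → {'night': True, 'bus': False}
```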
In the above embodiments, the image database used consists of key frames of the video data of TRECVID 2008. TRECVID is an authoritative competition in the video retrieval field held by the National Institute of Standards and Technology (NIST). For example, three semantic concepts, "airplane flying", "bus" and "night", are chosen for detection. The whole image database is divided into two parts, a training image set and an image set to be detected, and every image in the database has been manually annotated. For "airplane flying", 42 positive samples and 200 negative samples (that is, samples not containing the concept) are chosen; for "bus", 46 positive samples and 200 negative samples; for "night", 242 positive samples and 500 negative samples; all concepts together comprise 10680 images. The experiment uses the average precision (Average Precision) to assess the overall performance of concept detection with the bag-of-words model that adopts the best sub-word lists. Average precision is an evaluation index that accurately reflects retrieval performance and is widely used in the field of information retrieval.
SIFT descriptors based on interest points found in the difference-of-Gaussians scale space were used to extract local features for all training data and test data of these three concepts. From the classification results of the support vector machine, the word list length suitable for each of the three concepts was then selected: a word list of 50 words for the concept "airplane flying"; a word list of 100 words for the concept "bus"; and a word list of 20 words for the concept "night".
Table 1 below shows, on the image set to be detected, the comparison on these three concepts between the method of the invention and Columbia University, which achieved the highest accuracy in TRECVID 2008, as well as the contrast with the concept detection results obtained using only global features.
Table 1
As can be seen from Table 1, adopting local features greatly improves concept detection compared with adopting global features such as color and texture features. Meanwhile, adopting a word list of suitable length and a suitable positive-sample sampling strategy can effectively improve the effect of concept detection with local features.
The above embodiments improve the traditional bag-of-words model. The traditional bag-of-words model uses a word list of a single fixed length for image concept detection, but for different semantic information, that is, different concepts, the best word list length can differ. A word in a word list is a set of similar local features obtained by K-means clustering. For some concepts (which can also be understood as semantic concepts), a few dozen words suffice to express the features of the concept completely; choosing an overly long word list not only increases the burden on the computer and reduces detection efficiency, but also mixes in much information that interferes with the concept, which instead reduces the detection effect. For example, the scene concept "night" contains few local features, so counting local feature information with a shorter word list both improves detection efficiency and strengthens the detection effect; the object concept "bus" contains rich local feature information, so a short word list cannot cover all the information in the concept, and a longer word list, that is, a word list rich in local feature information, is used to detect this concept effectively.
Compared with the prior art, the present invention adaptively selects the best word list length for each concept through cross validation: for concepts with few local features, a shorter word list is adopted, which achieves a good detection effect and improves detection efficiency; for concepts with rich local features, a sufficiently long word list length is still selected to guarantee the detection effect.
Fig. 4 is a structural schematic diagram of the first embodiment of the apparatus of the present invention.
As shown in Fig. 4, the apparatus of this embodiment comprises:
Local feature extraction module 11, for using the SIFT algorithm to obtain the local features of the data to be detected and the local features of the training data of the multiple concepts, respectively;
For example, for every image (that is, data to be detected and training data), the local feature extraction module 11 first finds local extreme points between three adjacent scales of the difference-of-Gaussians scale space; such a point is an extremum among the 26 neighbouring points at its own scale and the adjacent scales. A three-dimensional quadratic function is then fitted to accurately determine the position and scale of each local extreme point, and next the gradient direction distribution of the neighbourhood of each determined local extreme point is used to assign a direction parameter to the point, so that the operator possesses rotational invariance. The local extreme points that represent specific information in the image can be called interest points. Finally, a window centred at each interest point is used to extract a 128-dimensional feature vector; the interest points in every image are a subset of the local extreme points that carry direction information;
Cluster module 12, connected with the local feature extraction module 11, for clustering, according to different quantization strategies, the local features of the training data of each concept with K-means to obtain word lists of different lengths for each concept, merging the word lists of the different lengths of the multiple concepts into a bag-of-words model {B_1, B_2, …, B_i, …, B_N}, and separately computing the histogram of the local features of the data to be detected and the histograms of the local features of the training data of the multiple concepts, wherein a histogram records the number of times the local features occur in every sub-word list of each word list B_i of the bag-of-words model; the word list B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists corresponding to the concepts; every sub-word list is the word list of one concept under the i-th quantization strategy, and its length is determined by the quantization strategy, which is determined by the value K, with 1≤i≤N, N≥2, K > 1;
To illustrate, the local features (that is, feature vectors based on interest points) obtained by the SIFT algorithm from all training images of each concept (that is, the training data of each concept) are clustered with K-means; according to the different quantization strategies (that is, different K values at clustering time), classes of different numbers are gathered, and each class can be regarded as a word in a word list, so different numbers of classes mean different word list lengths. The word lists of the different lengths of all concepts are then merged together to form the bag-of-words model, building word lists of multiple different lengths, denoted {B_1, B_2, …, B_i, …, B_N}. Here, the difference of quantization strategies refers to the difference of the value K, where K is an integer, usually K > 1, preferably 20≤K≤200. For example, when K equals 20, K-means clustering can compute the word list B_1 of the M concepts whose sub-word lists have length 20; when K equals 30, K-means clustering can compute the word list B_2 of the M concepts whose sub-word lists have length 30; and so on, until, when K equals 200, K-means clustering computes the word list B_N of the M concepts whose sub-word lists have length 200. The word lists of the different lengths of the M concepts are merged into the bag-of-words model {B_1, B_2, …, B_N}, and then, according to the bag-of-words model, the SIFT local features extracted from all images in the image set (including images to be detected and training images) are counted, obtaining the histogram of every image with respect to every sub-word list in each word list (that is, the number of times the local features of every image occur in each sub-word list of each word list B_i of the bag-of-words model {B_1, B_2, …, B_N}); in other words, the histogram represents the frequency with which the feature points of each concept occur in every sub-word list when counted over the image, and this histogram can be input into a concept detection classifier as the feature vector of the image;
Classification model training module 13, connected with the local feature extraction module 11 and the cluster module 12 respectively, for dividing the local features of the training data of each concept into a training set and a validation set, training a binary support vector machine classifier with the training set, the validation set, the concept annotation information of the training data of each concept and the histograms of the local features of the training data of each concept, and using the validation set on the binary support vector machine classifier to calculate the average detection precision of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, …, B_N}, and to train the classification model of each concept corresponding to every sub-word list in each word list B_i;
For example, a binary support vector machine can be selected as the basic classifier, and the concept detection classifiers are trained on the training image library with machine learning and pattern recognition techniques, where the binary support vector machine is an algorithm that optimizes, on a decision hyperplane, the margin between the samples of the training set (that is, the support vector machine maps the vectors into a higher-dimensional space and builds a maximum-margin hyperplane in that space; on the two sides of the hyperplane that separates the data there are two parallel hyperplanes, thereby realizing a learning algorithm for data classification);
Specifically, the set of interest points of the training image library (that is, the training data) obtained by the SIFT algorithm can be divided into a training set and a validation set. Word list B_1 is chosen (suppose word list B_1 comprises M sub-word lists B_11, B_12, …, B_1M); based on all images comprising the annotation information of concept C_1 and the histograms of the images' SIFT local features counted on each sub-word list of word list B_1, a binary support vector machine classifier is trained on the training set, its parameters are adjusted, and it is tested on the validation set. The relevant parameters of the kernel function are adjusted (generally a radial basis function (Radial Basis Function, RBF) kernel is adopted, with the parameters C and δ selected by cross validation on the validation set data to obtain the best parameters), so as to determine the parameters with which the binary support vector machine classifier corresponding to the sub-word list for concept C_1 in word list B_1 is in the optimal state, that is, the binary support vector machine classifier performs best on the validation set, in other words the average detection precision calculated on the validation set is highest; training thus yields the classification model of concept C_1 corresponding to sub-word list B_11 in word list B_1. In the same way, the classification models of the other concepts (C_2, …, C_M) corresponding to sub-word lists B_12, …, B_1M of word list B_1 can be trained. The word list is then changed and the above steps are repeated, training with the histograms counted under the new word list B_i as feature vectors, to obtain the classification models of the concepts (C_1, C_2, …, C_M) corresponding to sub-word lists B_i1, B_i2, …, B_iM of word list B_i, and the average detection precisions of the local features of the training data of each concept corresponding to sub-word lists B_i1, B_i2, …, B_iM in word list B_i;
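The RBF kernel mentioned above can be written down directly; only the kernel itself is sketched here (the cross-validated search over C and the kernel width on the validation set is not shown), and the parameter name `gamma` for the kernel width is an assumption.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2), the kernel
    generally adopted for the binary SVM; gamma is the width
    parameter tuned together with C on the validation set."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * d2)

# Identical histograms give similarity 1; distant ones decay toward 0.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # → 1.0
```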
Cross validation module 14, connected with the cluster module 12 and the classification model training module 13, for performing cross validation on the calculated average detection precisions of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, …, B_N}, selecting from the bag-of-words model the sub-word list corresponding to the maximum average detection precision of each concept as the best sub-word list of that concept, taking the trained classification model corresponding to the best sub-word list of each concept as the final concept detection classifier of that concept, and merging the final concept detection classifiers of the multiple concepts into the best concept detection classifier; optionally, the best sub-word lists of the multiple concepts can also be merged into a best bag-of-words model;
For example, a performance table of the different concepts under the different sub-word lists (for example, a table of average detection precisions) can be obtained through the classification model training module 13 described above. Through cross validation (that is, mutually comparing the multiple average detection precisions corresponding to the different sub-word lists of the same concept), the sub-word list with the best performance is chosen from this table as the best sub-word list of each concept, and the classification model learned by the binary support vector machine with the best sub-word list of each concept serves as the final concept detection classifier of that concept; the M final concept detection classifiers are merged into the best concept detection classifier, which is used to detect the concepts contained in the images to be detected;
Concept detection module 15, connected with the cluster module 12 and the cross validation module 14, for inputting the histograms of the local features of the data to be detected counted on the best sub-word list of each concept into the best concept detection classifier, to determine the probability that each concept occurs in the data to be detected;
For example, the histograms of the SIFT local features of the image set to be detected, counted on the best sub-word lists selected in S108, are input into the best concept detection classifier, which outputs the detection results of all images to be detected for the M concepts (that is, {C_1, C_2, …, C_M}); the verdict of the best concept detection classifier can be presented as a probability judgment, that is, a decimal between 0 and 1 is output, representing the confidence that the concept "exists".
This embodiment combines image processing and pattern recognition techniques to detect the semantic concepts of images; it can choose the most suitable word list length for different concepts, forming a bag-of-words model with adaptive word list length. Meanwhile, this embodiment clusters each concept separately and then obtains the word lists by merging, which allows the computer to compute in parallel and improves detection efficiency. In addition, when performing semantics-based image retrieval, its performance is better than that of a bag-of-words model with the original vocabulary, and it can significantly improve image retrieval performance.
Fig. 5 is a structural schematic diagram of the second embodiment of the apparatus of the present invention.
As shown in Fig. 5, compared with the embodiment in Fig. 4, the apparatus of this embodiment can also comprise:
Class adding module 21, connected with the cluster module 12, for adding a background class for the multiple concepts, that is, appending one background class to all concepts to be detected (for example, M concepts); adding the background class provides the bag-of-words model with abundant background information: on the one hand, the background information of the data to be detected can be separated out, so that the concepts in the data to be detected are detected more accurately; on the other hand, pure background information that contains no concept can be grouped into the background class, preventing it from being mistakenly grouped into the class of some concept, thereby significantly improving the average detection precision of the concepts to be detected.
Fig. 6 is a structural schematic diagram of the third embodiment of the apparatus of the present invention.
As shown in Fig. 6, compared with the embodiment in Fig. 4, the apparatus of this embodiment can also comprise:
Sampling module 31, connected with the local feature extraction module 11, for choosing the training data of each concept, which comprises annotation information, according to a sampling strategy, wherein the sampling strategy is n_i = a_i × (N_i − 100), N_i being the number of positive samples of the i-th concept before sampling, n_i being the quantity of training data of the i-th concept after sampling, and a_i being a sampling strategy parameter between 0 and 1;
To illustrate, the sampling strategy is used to choose the training data of each of the M concepts that have been manually annotated (that is, for every picture or video frame it has been marked whether it contains certain concepts); the M selected concepts (which can be expressed as {C_1, C_2, …, C_M}) have training data expressed as {T_1, T_2, …, T_M}, where T_1, T_2, …, T_M are chosen with the sampling strategy n_i = a_i × (N_i − 100);
If the number of positive samples of a concept (that is, samples that contain the concept) is less than or equal to 100, since the positive samples are few, all positive samples are used for training so that the training data contains sufficient information; if the number of positive samples of a concept exceeds 100, a sampling strategy parameter a_i is selected (usually, a_i is between 0 and 1), and n_i = a_i × (N_i − 100) positive samples are drawn for training. For example, if the concept "night" has 252 positive samples and a sampling strategy parameter a_i = 0.5 is adopted, 76 positive samples participate in training the concept "night".
The positive-sample sampling strategy of this embodiment simplifies the training data while ensuring that the positive samples contain sufficient information, which improves the efficiency of concept detection.
Fig. 7 is a structural schematic diagram of the fourth embodiment of the apparatus of the present invention.
As shown in Fig. 7, compared with the embodiment in Fig. 6, the apparatus of this embodiment also comprises:
Class adding module 21, connected with the cluster module 12, for adding a background class for the multiple concepts, that is, appending one background class to all concepts to be detected (for example, M concepts); adding the background class provides the bag-of-words model with abundant background information: on the one hand, the background information of the data to be detected can be separated out, so that the concepts in the data to be detected are detected more accurately; on the other hand, pure background information that contains no concept can be grouped into the background class, preventing it from being mistakenly grouped into the class of some concept, thereby significantly improving the average detection precision of the concepts to be detected.
In the above embodiments, 20 ≤ K ≤ 200. When the value of K is less than 20, the constructed word list generally cannot fully express the characteristic information of the concept, so the detection performance drops sharply; when the value of K is greater than 200, the constructed word list is relatively redundant, which greatly increases the computational burden without bringing a significant improvement in performance.
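As a concrete illustration of the word-list construction, the sketch below clusters local descriptors into K visual words with a plain Lloyd's K-means and counts a bag-of-words histogram. It is an illustration under stated assumptions: the toy 8-dimensional vectors stand in for 128-dimensional SIFT descriptors, and the function names are not from the patent:

```python
import numpy as np

def build_vocabulary(features, k, iters=10, seed=0):
    """Cluster local descriptors into k visual words (Lloyd's k-means)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center, then re-estimate centers
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = features[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_histogram(features, centers):
    """Count how often each visual word occurs among an image's descriptors."""
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(centers))

# Toy 8-dim data standing in for 128-dim SIFT descriptors.
descs = np.random.default_rng(1).normal(size=(300, 8))
vocab = build_vocabulary(descs, k=20)
hist = bow_histogram(descs, vocab)
print(hist.sum())  # 300: every descriptor maps to exactly one visual word
```

Running the clustering once per quantization strategy (one K value each) yields the sub-word lists of different lengths described above.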
The description of the invention is provided for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles of the invention and its practical application, and to enable those of ordinary skill in the art to understand that the invention is suitable for various embodiments with various modifications for particular uses.

Claims (8)

1. A method for detecting image concepts, characterized in that the method comprises:
selecting, according to a sampling strategy, training data of each concept that contains annotation information;
obtaining, using the SIFT algorithm, the local features of the data to be detected and the local features of the training data of multiple concepts respectively;
clustering, according to different quantization strategies, the local features of the training data of each concept into word lists of different lengths for each concept using the K-means clustering algorithm; merging the word lists of the different lengths of the multiple concepts into a bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}; and separately counting the histogram of the local features of the data to be detected and the histograms of the local features of the training data of the multiple concepts, wherein a histogram records the number of times the local features occur in every sub-word list of each word list B_i of the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}; the word list B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists, one corresponding to each concept; every sub-word list is the word list of one concept under the i-th quantization strategy; the length of every sub-word list is determined by the quantization strategy, and the quantization strategy is determined by the value of K; 1 ≤ i ≤ N, N ≥ 2, K > 1;
dividing the local features of the training data of each concept into a training set and a validation set; training a binary support vector machine classifier using the training set, the validation set, the concept annotation information of the training data of each concept and the histograms of the local features of the training data of each concept; and, using the validation set on the binary support vector machine classifier, calculating the average detection accuracy of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N} and training the classification model of each concept corresponding to every sub-word list in each word list B_i;
performing cross-validation on the calculated average detection accuracies of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, so as to select from the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N} the sub-word list corresponding to the maximum average detection accuracy of each concept as the best sub-word list of that concept; taking the trained classification model corresponding to the best sub-word list of each concept as the final concept detection classifier of that concept; and merging the final concept detection classifiers of the multiple concepts into a best concept detection classifier;
inputting the histogram of the local features of the data to be detected counted on the best sub-word list of each concept into the best concept detection classifier to determine the probability of each concept occurring in the data to be detected.
2. The method according to claim 1, characterized in that, before the step of clustering, according to different quantization strategies, the local features of the training data of each concept into word lists of different lengths for each concept using the K-means clustering algorithm, the method further comprises:
adding a background class for the multiple concepts.
3. The method according to claim 1, characterized in that 20 ≤ K ≤ 200.
4. The method according to claim 1, characterized in that the sampling strategy is n_i = N_i when N_i ≤ 100, and n_i = a_i × (N_i - 100) when N_i > 100, where N_i is the number of positive samples of the i-th concept before sampling, n_i is the amount of training data of the i-th concept after sampling, and a_i is a sampling strategy parameter between 0 and 1.
5. A device for detecting image concepts, characterized in that the device comprises:
a sampling module, for selecting, according to a sampling strategy, training data of each concept that contains annotation information;
a local feature extraction module, connected with the sampling module, for obtaining, using the SIFT algorithm, the local features of the data to be detected and the local features of the training data of multiple concepts respectively;
a clustering module, connected with the local feature extraction module, for clustering, according to different quantization strategies, the local features of the training data of each concept into word lists of different lengths for each concept using the K-means clustering algorithm, merging the word lists of the different lengths of the multiple concepts into a bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, and separately counting the histogram of the local features of the data to be detected and the histograms of the local features of the training data of the multiple concepts, wherein a histogram records the number of times the local features occur in every sub-word list of each word list B_i of the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}; the word list B_i is the word list of the multiple concepts under the i-th quantization strategy and comprises multiple sub-word lists, one corresponding to each concept; every sub-word list is the word list of one concept under the i-th quantization strategy; the length of every sub-word list is determined by the quantization strategy, and the quantization strategy is determined by the value of K; 1 ≤ i ≤ N, N ≥ 2, K > 1;
a classification model training module, connected with the local feature extraction module and the clustering module respectively, for dividing the local features of the training data of each concept into a training set and a validation set, training a binary support vector machine classifier using the training set, the validation set, the concept annotation information of the training data of each concept and the histograms of the local features of the training data of each concept, and, using the validation set on the binary support vector machine classifier, calculating the average detection accuracy of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N} and training the classification model of each concept corresponding to every sub-word list in each word list B_i;
a cross-validation module, connected with the clustering module and the classification model training module, for performing cross-validation on the calculated average detection accuracies of the local features of the training data of each concept corresponding to every sub-word list in each word list B_i of the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N}, so as to select from the bag-of-words model {B_1, B_2, ..., B_i, ..., B_N} the sub-word list corresponding to the maximum average detection accuracy of each concept as the best sub-word list of that concept, taking the trained classification model corresponding to the best sub-word list of each concept as the final concept detection classifier of that concept, and merging the final concept detection classifiers of the multiple concepts into a best concept detection classifier;
a concept detection module, connected with the clustering module and the cross-validation module, for inputting the histogram of the local features of the data to be detected counted on the best sub-word list of each concept into the best concept detection classifier to determine the probability of each concept occurring in the data to be detected.
6. The device according to claim 5, characterized in that the device further comprises:
a class adding module, connected with the clustering module, for adding a background class for the multiple concepts.
7. The device according to claim 5, characterized in that 20 ≤ K ≤ 200.
8. The device according to claim 5, characterized in that the sampling strategy is n_i = N_i when N_i ≤ 100, and n_i = a_i × (N_i - 100) when N_i > 100, where N_i is the number of positive samples of the i-th concept before sampling, n_i is the amount of training data of the i-th concept after sampling, and a_i is a sampling strategy parameter between 0 and 1.
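The selection step recited in claims 1 and 5 (train one classification model per sub-word list, score each on a validation split, keep the sub-word list with the highest accuracy) can be sketched for a single concept as follows. This is an illustrative sketch only: a simple nearest-centroid classifier stands in for the binary support vector machine, and the function names and toy data are assumptions, not part of the patent:

```python
import numpy as np

def centroid_classifier(train_x, train_y):
    """Stand-in for the binary SVM: score = distance to the negative centroid
    minus distance to the positive centroid (positive score means 'concept')."""
    pos = train_x[train_y == 1].mean(axis=0)
    neg = train_x[train_y == 0].mean(axis=0)
    return lambda x: np.linalg.norm(x - neg, axis=1) - np.linalg.norm(x - pos, axis=1)

def select_best_sub_word_list(hists_per_vocab, labels):
    """For one concept, train a model per candidate sub-word list and keep
    the one with the highest validation accuracy (the 'best sub-word list')."""
    best_i, best_acc, best_model = -1, -1.0, None
    for i, hists in enumerate(hists_per_vocab):
        n = len(hists) // 2                     # first half trains, second half validates
        model = centroid_classifier(hists[:n], labels[:n])
        pred = (model(hists[n:]) > 0).astype(int)
        acc = float((pred == labels[n:]).mean())
        if acc > best_acc:
            best_i, best_acc, best_model = i, acc, model
    return best_i, best_model

labels = np.array([0, 1] * 20)                  # alternating, so both splits see both classes
rng = np.random.default_rng(0)
h_flat = np.zeros((40, 5))                      # an uninformative sub-word list
h_good = labels[:, None] * 3.0 + rng.normal(scale=0.1, size=(40, 5))
best_i, _ = select_best_sub_word_list([h_flat, h_good], labels)
print(best_i)  # 1: the more discriminative sub-word list is selected
```

In the patent's pipeline this selection runs once per concept, and the model retained for the winning sub-word list becomes that concept's final detection classifier.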
CN201010271693.XA 2010-09-03 2010-09-03 Image concept detection method and device Active CN102385592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010271693.XA CN102385592B (en) 2010-09-03 2010-09-03 Image concept detection method and device


Publications (2)

Publication Number Publication Date
CN102385592A CN102385592A (en) 2012-03-21
CN102385592B true CN102385592B (en) 2014-07-09

Family

ID=45825012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010271693.XA Active CN102385592B (en) 2010-09-03 2010-09-03 Image concept detection method and device

Country Status (1)

Country Link
CN (1) CN102385592B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104299010B (en) * 2014-09-23 2017-11-10 深圳大学 A kind of Image Description Methods and system based on bag of words
CN104657427A (en) * 2015-01-23 2015-05-27 华东师范大学 Bag-of-visual-words information amount weight optimization-based image concept detection method
CN104657742A (en) * 2015-01-23 2015-05-27 华东师范大学 Image concept detection method based on Hamming embedding kernel, and Hamming embedding kernel thereof
CN105825178A (en) * 2016-03-14 2016-08-03 民政部国家减灾中心 Functional region dividing method and device based on remote-sensing image
CN106650778B (en) * 2016-10-14 2019-08-06 北京邮电大学 A kind of method and device of bag of words optimization and image recognition
CN109726726B (en) * 2017-10-27 2023-06-20 北京邮电大学 Event detection method and device in video
CN110516737B (en) * 2019-08-26 2023-05-26 南京人工智能高等研究院有限公司 Method and device for generating image recognition model
CN111460971B (en) * 2020-03-27 2023-09-12 北京百度网讯科技有限公司 Video concept detection method and device and electronic equipment
CN111753881B (en) * 2020-05-28 2024-03-29 浙江工业大学 Concept sensitivity-based quantitative recognition defending method against attacks
CN113222018B (en) * 2021-05-13 2022-06-28 郑州大学 Image classification method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398846A (en) * 2008-10-23 2009-04-01 上海交通大学 Image, semantic and concept detection method based on partial color space characteristic

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ana P. B. Lopes et al., "A bag-of-features approach based on Hue-SIFT descriptor for nude detection," 17th European Signal Processing Conference (EUSIPCO 2009), Aug. 28, 2009, full text. *
Tian Tian et al., "A new image classification method based on PLSA and the bag-of-words model," Journal of Xianyang Normal University, vol. 25, no. 4, Jul. 2010, full text. *
Huang Jianxin et al., "A comparative study of classifiers in bag-of-words-based image classification," Chinese Conference on Pattern Recognition (CCPR 2009), Dec. 2009, full text. *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant