CN103020231B

CN103020231B - Method and apparatus for quantizing local features of a picture into visual words

Info

Publication number: CN103020231B
Application number: CN201210543868.7A
Authority: CN
Inventors: 李�浩
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2012-12-14
Filing date: 2012-12-14
Publication date: 2018-06-08
Anticipated expiration: 2032-12-14
Also published as: CN103020231A

Abstract

The present invention provides a method and apparatus for quantizing a local feature of a picture into visual words. The method includes: S1, determining candidate words from the first layer of a visual vocabulary tree; S2, calculating the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies; S3, selecting the candidate words of the current level whose path confidence is greater than or equal to a preset confidence threshold, and judging whether the current level is the last layer; if so, determining the words selected in the current level as the visual words of the local feature; otherwise, entering the next level from the selected words, determining the child nodes of the selected words as the candidate words of the next level, and returning to step S2. The present invention can reduce the computing overhead of quantization while improving robustness to quantization error.

Description

Method and apparatus for quantizing local features of a picture into visual words

【Technical field】

The present invention relates to the field of computer application technology, and in particular to a method and apparatus for quantizing local features of a picture into visual words.

【Background technology】

With the development of multimedia technologies, the volume of digital pictures has grown rapidly and their applications have become increasingly widespread. How to retrieve the required pictures effectively and quickly from large-scale picture data has therefore become a research hotspot. Traditional text-based picture retrieval suffers from the subjectivity and uncertainty introduced by manually annotating pictures and cannot meet users' query needs, so content-based picture retrieval has gradually emerged and been widely adopted.

Building an inverted index over pictures according to visual words is a common content-based picture retrieval method. The method first determines the local features of a picture, quantizes the different local features into visual words, and then represents the picture as a combination of visual words, so that picture retrieval can be carried out in a manner similar to text retrieval. How to quantize a local feature into visual words is the basis of this kind of retrieval. Two approaches currently dominate: nearest path mapping (Best Bin First) and greedy N-nearest path mapping (Greedy N-best Paths). Both are based on a visual vocabulary tree. Suppose a visual vocabulary tree has L layers and each parent node has K child nodes; then a tree with L = 6 and K = 10 can represent 1,000,000 visual words, as shown in Fig. 1.

In nearest path mapping, a local feature is first compared with the K words of the first layer and the node of the nearest word is selected; the feature is then compared with the K child words of that node in the second layer and the nearest child node is selected; and so on, until the feature is mapped to its nearest word in layer L. The words selected during the whole tree query form the visual word list of the local feature.
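The layer-by-layer descent of nearest path mapping can be sketched as follows. This is a minimal Python illustration rather than code from the source: the `Node` class, the `nearest_path` function, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: one word (a cluster centre) plus its child words."""
    word_id: int
    center: list[float]
    children: list["Node"] = field(default_factory=list)


def nearest_path(first_layer: list[Node], feature: list[float]) -> list[int]:
    """Keep only the single nearest word in each layer; return the chosen word ids."""
    path = []
    candidates = first_layer
    while candidates:
        # Compare the feature with every candidate word of the current layer.
        best = min(candidates, key=lambda node: math.dist(feature, node.center))
        path.append(best.word_id)
        # Only the children of the nearest word are examined in the next layer.
        candidates = best.children
    return path
```

Because a single word survives per layer, two features that differ only slightly can end up on different branches, which is the robustness problem discussed in the text.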

In greedy N-nearest path mapping, a visual word expansion factor N is introduced. A local feature is first compared with the K words of the first layer and the nodes of the N nearest words are selected; the feature is then compared with the N × K child words of those nodes in the second layer and the nodes of the N nearest words are again selected; and so on, until the feature is mapped to its N nearest words in layer L. The words selected during the whole tree query form the visual word list of the local feature.
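A comparable sketch of the greedy N-nearest variant is given below. As before, the `Node` class, the function name `greedy_n_best`, and the Euclidean distance are assumptions for illustration only; the source describes the procedure in prose and gives no code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: one word (a cluster centre) plus its child words."""
    word_id: int
    center: list[float]
    children: list["Node"] = field(default_factory=list)


def greedy_n_best(first_layer: list[Node], feature: list[float], n: int) -> list[list[int]]:
    """Keep the n nearest words in every layer; return the kept word ids per layer."""
    kept_per_layer = []
    candidates = list(first_layer)
    while candidates:
        ranked = sorted(candidates, key=lambda node: math.dist(feature, node.center))
        kept = ranked[:n]
        kept_per_layer.append([node.word_id for node in kept])
        # Up to n * K child words have to be compared in the next layer.
        candidates = [child for node in kept for child in node.children]
    return kept_per_layer
```

The fixed `n` is what makes the cost grow: every layer expands `n` paths whether or not the extra words are actually close to the feature.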

Because nearest path mapping selects only one nearest word in each layer, it is prone to quantization error: a slight change in a local feature of a picture can easily be quantized to a different visual word, which causes mismatches and poor robustness. Greedy N-nearest path mapping is more robust to quantization error, but it selects N words in every layer, that is, it must follow N paths at every layer, which brings a larger computing overhead.
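The difference in overhead can be made concrete by counting distance computations per local feature. The counts below follow from the two procedures as described (K comparisons in the first layer, then K comparisons per followed path in each later layer) and use L = 6 and K = 10 from the example tree; the value N = 3 is only an illustrative assumption.

```latex
\begin{aligned}
C_{\text{nearest}} &= L \cdot K = 6 \cdot 10 = 60, \\
C_{\text{greedy}}  &= K + (L-1)\,N K = 10 + 5 \cdot 3 \cdot 10 = 160 .
\end{aligned}
```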

【Summary of the invention】

In view of this, the present invention provides a method and apparatus for quantizing a local feature of a picture into visual words, so as to reduce the computing overhead of quantization while improving robustness to quantization error.

The specific technical solution is as follows:

A method for quantizing a local feature of a picture into visual words, wherein the following steps are performed while querying a visual vocabulary tree for the local feature of the picture:

S1, determining candidate words from the first layer of the visual vocabulary tree, and performing step S2 with the first layer as the current level;

S2, calculating the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies, wherein the confidence of the path on which the parent node of each first-layer candidate word lies is a preset initial value;

S3, selecting the candidate words of the current level whose path confidence is greater than or equal to a preset confidence threshold, and judging whether the current level is the last layer; if so, determining the words selected in the current level as the visual words of the local feature; otherwise, entering the next level from the selected words, determining the child nodes of the selected words as the candidate words of the next level, and returning to step S2 with the next level as the current level.

According to a preferred embodiment of the present invention, step S1 is specifically: determining all words in the first layer of the visual vocabulary tree as candidate words.

According to a preferred embodiment of the present invention, in step S2 the confidence γ_i of the path on which the i-th candidate word of the current level lies is calculated according to the following formula:

where γ_c is the confidence of the path on which the parent node of the i-th candidate word lies, Dist_i is the distance between the local feature and the i-th candidate word, and Dist_min is the minimum of the distances between the local feature and the candidate words of the current level.

According to a preferred embodiment of the present invention, after quantization into visual words has been completed for all local features of the picture, the visual words of all local features are sorted by the confidence of their paths, and the top N visual words are selected as the visual words of the picture, N being a preset positive integer.

According to a preferred embodiment of the present invention, the larger the number of local features of the picture, the smaller N is set; the smaller the number of local features of the picture, the larger N is set.

An apparatus for quantizing a local feature of a picture into visual words, the apparatus comprising:

an initial query unit, configured to determine candidate words from the first layer of a visual vocabulary tree while the visual vocabulary tree is queried for the local feature of the picture, and to trigger the confidence calculation unit with the first layer as the current level;

a confidence calculation unit, configured to, after being triggered, calculate the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies, wherein the confidence of the path on which the parent node of each first-layer candidate word lies is a preset initial value;

a selection and judgment unit, configured to select the candidate words of the current level whose path confidence is greater than or equal to a preset confidence threshold, and to judge whether the current level is the last layer; if so, to provide the words selected in the current level to the visual word determination unit; otherwise, to enter the next level from the selected words, determine the child nodes of the selected words as the candidate words of the next level, and trigger the confidence calculation unit with the next level as the current level;

a visual word determination unit, configured to determine the words provided by the selection and judgment unit as the visual words of the local feature.

According to a preferred embodiment of the present invention, the initial query unit specifically determines all words in the first layer of the visual vocabulary tree as candidate words.

According to a preferred embodiment of the present invention, the confidence calculation unit specifically calculates the confidence γ_i of the path on which the i-th candidate word of the current level lies according to the following formula:

According to a preferred embodiment of the present invention, the apparatus further comprises: a word control unit, configured to, after quantization into visual words has been completed for all local features of the picture, sort the visual words of all local features by the confidence of their paths and select the top N visual words as the visual words of the picture, N being a preset positive integer.

As can be seen from the above technical solution, in quantizing visual words the present invention neither selects a single nearest word in each layer nor selects a fixed N words. Instead, it adaptively selects a suitable number of words to enter the next layer according to how close the nodes of each layer of the visual vocabulary tree are to the local feature, this closeness being measured by the confidence of the path on which a word lies. Compared with selecting one nearest word in each layer, this reduces quantization error and improves robustness to quantization error. Compared with selecting a fixed number of words in each layer, the selection is more reasonable: words that are not close to the local feature are not selected merely to satisfy a fixed count and do not take part in the quantization computation, which reduces computing overhead and improves quantization efficiency.

【Description of the drawings】

Fig. 1 is a schematic structural diagram of a visual vocabulary tree;

Fig. 2 is a flowchart of the method provided by Embodiment 1 of the present invention;

Fig. 3 is an example of a visual vocabulary tree provided by Embodiment 1 of the present invention;

Fig. 4 is a structural diagram of the apparatus provided by Embodiment 2 of the present invention.

【Detailed description of the embodiments】

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.

Embodiment 1

In this embodiment of the present invention, a fixed number of words is no longer chosen in each layer. Instead, the concept of path confidence is introduced, and the path confidence is used to decide whether to choose the word on a path and enter its next-layer nodes. Specifically, as shown in Fig. 2, while the visual vocabulary tree is queried separately for each local feature of a picture, the following steps are performed starting from the first layer of the visual vocabulary tree:

Step 201: Determine candidate words from the first layer of the visual vocabulary tree.

In this embodiment, all words of the first layer may be taken as candidate words. Alternatively, other selection schemes may be used in which the words of one or several first-layer nodes are taken as candidate words.

For ease of understanding, the visual vocabulary tree is briefly introduced first. The visual vocabulary tree is built in advance from a large collection of pictures: visual words are extracted from the local features of the pictures, and the resulting large set of visual words is clustered hierarchically to form the visual vocabulary tree. Each child node is obtained by further clustering the words of its parent node. Each node is a set made up of one or more terms, which in the art is commonly called a word.
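One way to picture the hierarchical clustering is the sketch below. The source does not name a clustering algorithm, so the use of k-means, the deterministic initialisation from the first k points, the dictionary node layout, and the names `kmeans` and `build_tree` are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; centres start at the first k points (an assumed, deterministic choice)."""
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest centre.
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:
                # Move the centre to the mean of its members.
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, clusters


def build_tree(points, k, depth):
    """Cluster the points into k words, then cluster each word's members again."""
    if depth == 0 or len(points) < k:
        return []
    centers, clusters = kmeans(points, k)
    return [
        {"center": center, "children": build_tree(members, k, depth - 1)}
        for center, members in zip(centers, clusters)
        if members
    ]
```

With `k` children per node and `depth` layers, a fully populated tree built this way has `k ** depth` leaf words, which matches the K and L parameters used in the background section.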

Step 202: Calculate the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies.

The confidence of the path on which each candidate word of the current level lies may be calculated with the following formula:

where γ_c is the confidence of the path on which the parent node of the i-th candidate word lies; for the candidate words of the first layer, this parent path confidence is a preset initial value, for example 1. Dist_i is the distance between the local feature and the i-th candidate word, and Dist_min is the minimum of the distances between the local feature and the candidate words of the current level.
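Formula (1) itself is not reproduced in this text. The sketch below therefore assumes one form that is consistent with the symbol definitions, γ_i = γ_c · Dist_min / Dist_i, under which the nearest candidate inherits its parent's confidence unchanged and farther candidates receive proportionally less. This assumed form may differ from the patent's actual formula, and the function name `path_confidences` is likewise hypothetical.

```python
def path_confidences(parent_confidences, distances):
    """Assumed form of the path confidence: gamma_i = gamma_c * Dist_min / Dist_i.

    parent_confidences[i] is gamma_c for candidate i (the preset initial value,
    e.g. 1, for first-layer candidates); distances[i] is Dist_i.
    """
    dist_min = min(distances)
    confidences = []
    for gamma_c, dist_i in zip(parent_confidences, distances):
        # A candidate at distance zero coincides with the feature; use a ratio of 1.
        ratio = 1.0 if dist_i == 0 else dist_min / dist_i
        confidences.append(gamma_c * ratio)
    return confidences
```

Under this assumption a threshold close to 1, such as the 0.97 mentioned later, keeps only candidates whose accumulated distance ratios stay within a few percent of the best candidate at every layer.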

Since a local feature is an n-dimensional vector and the word of each node in the visual vocabulary tree is also an n-dimensional vector (for example, n is 128 when scale-invariant feature transform (SIFT) features are used), the distance between the local feature and a candidate word may be computed with any specific vector distance, including but not limited to Euclidean distance and cosine distance.
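The two distances named above can be written in a few lines of plain Python. The function names are illustrative, and defining cosine distance as one minus cosine similarity is a common convention assumed here rather than something the source specifies.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Either function accepts 128-dimensional SIFT descriptors as ordinary lists of floats; `cosine_distance` assumes neither vector is all zeros.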

In fact, the confidence of the path on which a word lies is a cumulative value: the path is an extension of the path of its parent node, so the confidence accumulates, along the path, how far the distance between each word and the local feature deviates from the minimum distance among the candidate words of its level.

Step 203: Select the candidate words of the current level whose path confidence is greater than or equal to a preset confidence threshold.

After the path confidence of each candidate word of the current level has been calculated in step 202, visual words continue to be selected according to that confidence. A preset confidence threshold serves as the selection criterion here; the threshold is usually an empirical value, for example 0.97.

Because paths are selected by judging the confidence of the path on which a word lies, the number of words selected in each layer is not fixed but depends on how well the words reflect the local feature.

Step 204: Judge whether the current level is the last layer; if so, perform step 205; otherwise, perform step 206.

Step 205: Determine the words selected in the current level as the visual words of the local feature, and end the visual word quantization for this local feature.

Step 206: Enter the next level from the words selected in the current level, determine the child nodes of the selected words as the candidate words of the next level, and return to step 202 with the next level as the current level.

In this way, the local feature is mapped to the last-layer words on all paths whose accumulated confidence is greater than or equal to the confidence threshold. Note that the last layer here may be the last layer of the visual vocabulary tree, that is, its leaf nodes. If the best effect is not required, some other layer of the visual vocabulary tree may be set as the last layer for quantization; for example, if the second-to-last layer of the tree is set as the last layer for quantization, the local feature is mapped to the second-to-last-layer words whose path confidence is greater than or equal to the confidence threshold.
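Steps 201 to 206 can be sketched as a single loop. This is an illustration under stated assumptions, not the patent's implementation: the `Node` class and `quantize` function are hypothetical, Euclidean distance is used, all leaves are assumed to sit at the same depth, and the confidence update uses the assumed form γ_i = γ_c · Dist_min / Dist_i because formula (1) is not reproduced in this text.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: one word (a cluster centre) plus its child words."""
    word_id: int
    center: list[float]
    children: list["Node"] = field(default_factory=list)


def quantize(first_layer, feature, threshold=0.97, initial_confidence=1.0):
    """Return (word_id, path_confidence) pairs for the last-layer words that survive."""
    # Step 201: all first-layer words are candidates; pair each with its parent confidence.
    candidates = [(node, initial_confidence) for node in first_layer]
    selected = []
    while candidates:
        # Step 202: path confidence of every candidate (assumed formula, see lead-in).
        distances = [math.dist(feature, node.center) for node, _ in candidates]
        dist_min = min(distances)
        selected = []
        for (node, gamma_c), dist_i in zip(candidates, distances):
            ratio = 1.0 if dist_i == 0 else dist_min / dist_i
            gamma_i = gamma_c * ratio
            # Step 203: keep candidates whose path confidence reaches the threshold.
            if gamma_i >= threshold:
                selected.append((node, gamma_i))
        # Steps 204-206: children of the selected words become the next candidates;
        # when the selected words are leaves the list is empty and the loop ends.
        candidates = [(child, gamma) for node, gamma in selected for child in node.children]
    return [(node.word_id, gamma) for node, gamma in selected]
```

The number of paths followed at each layer is decided by the data: a feature that sits clearly inside one cluster follows a single path, while a feature near a cluster boundary follows several.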

A simple example illustrates the method flow shown in Fig. 2. Take the visual vocabulary tree shown in Fig. 3 and suppose it has three layers. For a given local feature, starting from the first layer, all words in the first layer, namely word 1 and word 2, are first determined as candidate words. The confidence γ₁ of the path on which word 1 lies and the confidence γ₂ of the path on which word 2 lies are calculated according to formula (1), with γ_c taking the preset initial value 1. Suppose the calculated γ₁ is greater than the preset confidence threshold θ and γ₂ is less than θ; then word 1 is selected to enter the second layer, and its child nodes, word 3 and word 4, are determined as candidate words.

The confidence γ₃ of the path on which word 3 lies and the confidence γ₄ of the path on which word 4 lies are calculated according to formula (1), with γ_c taking the confidence of the path on which word 1 lies, namely γ₁. Suppose the calculated γ₃ and γ₄ are both greater than the preset confidence threshold θ; then word 3 and word 4 are selected to enter the third layer, and the child nodes of word 3 (word 7 and word 8) and the child nodes of word 4 (word 9 and word 10) are determined as candidate words.

The confidences γ₇, γ₈, γ₉ and γ₁₀ of the paths on which word 7, word 8, word 9 and word 10 lie are calculated according to formula (1), where γ_c for word 7 and word 8 is γ₃, the confidence of the path on which word 3 lies, and γ_c for word 9 and word 10 is γ₄, the confidence of the path on which word 4 lies. Suppose the path confidences of word 7 and word 9 are both greater than θ; since the third layer is the last layer, word 7 and word 9 are selected as the visual words of the local feature.
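The example can be traced with numbers. Everything numeric below is hypothetical: the distances are invented for illustration, θ = 0.97 is the empirical threshold mentioned earlier, and the update γ_i = γ_c · Dist_min / Dist_i is an assumed form, since formula (1) is not reproduced in this text.

```latex
\begin{aligned}
\text{Layer 1: } & Dist_1 = 1.00,\ Dist_2 = 1.50,\ Dist_{min} = 1.00 \\
 & \gamma_1 = 1 \cdot \tfrac{1.00}{1.00} = 1 \ge \theta, \qquad
   \gamma_2 = 1 \cdot \tfrac{1.00}{1.50} \approx 0.667 < \theta \\
\text{Layer 2: } & Dist_3 = 0.80,\ Dist_4 = 0.81,\ Dist_{min} = 0.80 \\
 & \gamma_3 = \gamma_1 \cdot \tfrac{0.80}{0.80} = 1 \ge \theta, \qquad
   \gamma_4 = \gamma_1 \cdot \tfrac{0.80}{0.81} \approx 0.988 \ge \theta \\
\text{Layer 3: } & Dist_7 = 0.500,\ Dist_8 = 0.700,\ Dist_9 = 0.505,\ Dist_{10} = 0.900,\ Dist_{min} = 0.500 \\
 & \gamma_7 = \gamma_3 \cdot \tfrac{0.500}{0.500} = 1 \ge \theta, \qquad
   \gamma_8 = \gamma_3 \cdot \tfrac{0.500}{0.700} \approx 0.714 < \theta \\
 & \gamma_9 = \gamma_4 \cdot \tfrac{0.500}{0.505} \approx 0.978 \ge \theta, \qquad
   \gamma_{10} = \gamma_4 \cdot \tfrac{0.500}{0.900} \approx 0.549 < \theta
\end{aligned}
```

With these values word 7 and word 9 are kept, matching the outcome of the example, and γ₉ shows the cumulative effect: it carries the small deficit of γ₄ from the second layer into the third.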

The example given here is deliberately simple. In actual queries, the visual vocabulary tree usually has more levels and more child nodes per parent node, and the number of visual words it represents is very large, so the saving in computing overhead achieved by the method of this embodiment is quite significant.

In addition, for a picture with many local features, even after the effect of quantization error a large number of local features can still be quantized to the same visual words, so the picture can still be retrieved effectively. Therefore, to further reduce unnecessary computing overhead (here meaning the cost of looking up the inverted index during picture retrieval, after the inverted index has been built from visual words), an upper limit may be placed on the number of visual words of a picture. After all local features of the picture have been quantized into visual words, the visual words of all local features are sorted by the confidence of their paths, and the top N visual words are selected as the visual words of the picture. N is the configured upper limit on the number of visual words; it may be a fixed positive integer or may be set according to the number of local features.

When N is set according to the number of local features: if a picture has many local features, it is highly distinctive in retrieval, and relatively few visual words suffice for high retrieval accuracy; conversely, if a picture has few local features, it is less distinctive in retrieval, and relatively more visual words are needed for high retrieval accuracy. Therefore, the larger the number of local features of a picture, the smaller N may be set, and the smaller the number of local features, the larger N may be set.
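The cap on the number of visual words per picture can be sketched as a sort followed by a slice. The function names are hypothetical, and the cut-off and N values in `choose_n` are invented placeholders that only demonstrate the stated direction (more local features, smaller N); the source gives no concrete numbers.

```python
def select_top_n(quantized, n):
    """Keep the n visual words whose paths have the highest confidence.

    quantized is a list of (word_id, path_confidence) pairs gathered from
    all local features of one picture.
    """
    ranked = sorted(quantized, key=lambda item: item[1], reverse=True)
    return [word_id for word_id, _ in ranked[:n]]


def choose_n(num_features, few_features=100, n_for_few=50, n_for_many=20):
    """Hypothetical rule: pictures with more local features get a smaller N."""
    return n_for_few if num_features <= few_features else n_for_many
```

For example, `select_top_n(pairs, choose_n(len(features)))` would apply the limit to one picture, where `pairs` and `features` stand for that picture's quantization results and local features.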

The above describes the method provided by the present invention. The apparatus provided by the present invention is described in detail below with reference to Embodiment 2.

Embodiment 2

Fig. 4 is a structural diagram of the apparatus provided by Embodiment 2 of the present invention. As shown in Fig. 4, the apparatus may include: an initial query unit 401, a confidence calculation unit 402, a selection and judgment unit 403, and a visual word determination unit 404.

Quantizing the local features of a picture into visual words is in fact a process of querying the visual vocabulary tree separately for each local feature of the picture. While the visual vocabulary tree is queried for a local feature of the picture, the initial query unit 401 determines candidate words from the first layer of the visual vocabulary tree and triggers the confidence calculation unit 402 with the first layer as the current level.

Specifically, the initial query unit 401 may take all words of the first layer as candidate words. Alternatively, other selection schemes may be used in which the words of one or several first-layer nodes are taken as candidate words.

After being triggered (initially by the initial query unit 401 and subsequently by the selection and judgment unit 403), the confidence calculation unit 402 calculates the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies, wherein the confidence of the path on which the parent node of each first-layer candidate word lies is a preset initial value.

The confidence γ_i of the path on which the i-th candidate word of the current level lies may be calculated according to the following formula:

where γ_c is the confidence of the path on which the parent node of the i-th candidate word lies, Dist_i is the distance between the local feature and the i-th candidate word, and Dist_min is the minimum of the distances between the local feature and the candidate words of the current level.

When calculating the distance between the local feature and a candidate word, any method of calculating the distance between vectors may be used, such as Euclidean distance or cosine distance.

According to the calculation result of the confidence calculation unit 402, the selection and judgment unit 403 selects the candidate words of the current level whose path confidence is greater than or equal to the preset confidence threshold and judges whether the current level is the last layer. If so, it provides the words selected in the current level to the visual word determination unit 404; otherwise, it enters the next level from the selected words, determines the child nodes of the selected words as the candidate words of the next level, and triggers the confidence calculation unit 402 with the next level as the current level.

The above confidence threshold is usually an empirical value, for example 0.97.

The visual word determination unit 404 is configured to determine the words provided by the selection and judgment unit 403 as the visual words of the local feature.

In the end, the local feature is mapped to the last-layer words on all paths whose accumulated confidence is greater than or equal to the confidence threshold. The last layer here may be the last layer of the visual vocabulary tree, that is, its leaf nodes. If the best effect is not required, some other layer of the visual vocabulary tree may be set as the last layer for quantization; for example, if the second-to-last layer of the tree is set as the last layer for quantization, the local feature is mapped to the second-to-last-layer words whose path confidence is greater than or equal to the confidence threshold.

In addition, for a picture with many local features, even after the effect of quantization error a large number of local features can still be quantized to the same visual words, so the picture can still be retrieved effectively. Therefore, to further reduce unnecessary computing overhead (here meaning the cost of looking up the inverted index during picture retrieval, after the inverted index has been built from visual words), the apparatus may further include: a word control unit 405, configured to, after quantization into visual words has been completed for all local features of the picture, sort the visual words of all local features by the confidence of their paths and select the top N visual words as the visual words of the picture, N being a preset positive integer.

N is the configured upper limit on the number of visual words; it may be a fixed positive integer or may be set according to the number of local features.

After the visual word quantization of the pictures in a picture library has been completed with the above method and apparatus provided by the present invention, an inverted index can be built from the visual words, providing a basis for picture retrieval that can be applied to a large number of picture retrieval products. In addition, during picture retrieval, the above method and apparatus can be used to quantize the local features of a query picture into visual words, and the obtained visual words are then used to search the inverted index of the picture database to determine the hit pictures as the retrieval result.
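A minimal sketch of the inverted index and its lookup is given below. The function names, the in-memory dictionary layout, and the ranking of hits by the number of shared visual words are assumptions for illustration; the source only states that the hit pictures are returned as the retrieval result.

```python
from collections import Counter, defaultdict


def build_inverted_index(picture_words):
    """Map each visual word id to the set of pictures that contain it.

    picture_words maps a picture id to the visual word ids of that picture.
    """
    index = defaultdict(set)
    for picture_id, words in picture_words.items():
        for word_id in words:
            index[word_id].add(picture_id)
    return index


def search(index, query_words):
    """Return hit pictures, those sharing more visual words with the query first."""
    hits = Counter()
    for word_id in set(query_words):
        for picture_id in index.get(word_id, ()):
            hits[picture_id] += 1
    return [picture_id for picture_id, _ in hits.most_common()]
```

Each lookup touches only the posting sets of the query's visual words, which is why limiting the number of visual words per picture also limits the retrieval cost.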

As can be seen from the above description, in quantizing visual words the present invention neither selects a single nearest word in each layer nor selects a fixed N words. Instead, it adaptively selects a suitable number of words to enter the next layer according to how close the nodes of each layer of the visual vocabulary tree are to the local feature, this closeness being measured by the confidence of the path on which a word lies. Therefore, the method and apparatus provided by the present invention have the following advantages:

1) Compared with selecting one nearest word in each layer, quantization error is reduced and robustness to quantization error is improved.

2) Because the influence of quantization error is reduced, the recall rate of picture retrieval is improved. Experiments show that the number of correctly recalled results per picture retrieval increases by 37% on average.

3) Compared with selecting a fixed number of words in each layer, the selection is more reasonable: words that are not close to the local feature are not selected merely to satisfy a fixed count and do not take part in the quantization computation, which avoids unnecessary expansion, thereby reducing computing overhead and improving quantization efficiency.

The foregoing is merely the preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for quantizing a local feature of a picture into visual words, characterized in that the following steps are performed while querying a visual vocabulary tree for the local feature of the picture:

S1, determining candidate words from the first layer of the visual vocabulary tree, and performing step S2 with the first layer as the current level;

S2, calculating the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies, wherein the confidence of the path on which the parent node of each first-layer candidate word lies is a preset initial value;

S3, selecting the candidate words of the current level whose path confidence is greater than or equal to a preset confidence threshold, and judging whether the current level is the last layer; if so, determining the words selected in the current level as the visual words of the local feature; otherwise, entering the next level from the selected words, determining the child nodes of the selected words as the candidate words of the next level, and returning to step S2 with the next level as the current level.
2. The method according to claim 1, characterized in that step S1 is specifically: determining all words in the first layer of the visual vocabulary tree as candidate words.
3. The method according to claim 1, characterized in that in step S2 the confidence γ_i of the path on which the i-th candidate word of the current level lies is calculated according to the following formula:

where γ_c is the confidence of the path on which the parent node of the i-th candidate word lies, Dist_i is the distance between the local feature and the i-th candidate word, and Dist_min is the minimum of the distances between the local feature and the candidate words of the current level.
4. The method according to claim 1, characterized in that after quantization into visual words has been completed for all local features of the picture, the visual words of all local features are sorted by the confidence of their paths, and the top N visual words are selected as the visual words of the picture, N being a preset positive integer.
5. The method according to claim 4, characterized in that the larger the number of local features of the picture, the smaller N is set; the smaller the number of local features of the picture, the larger N is set.
6. An apparatus for quantizing a local feature of a picture into visual words, characterized in that the apparatus comprises:

an initial query unit, configured to determine candidate words from the first layer of a visual vocabulary tree while the visual vocabulary tree is queried for the local feature of the picture, and to trigger the confidence calculation unit with the first layer as the current level;

a confidence calculation unit, configured to, after being triggered, calculate the confidence of the path on which each candidate word of the current level lies, using the distance between the local feature and each candidate word of the current level together with the confidence of the path on which that candidate word's parent node lies, wherein the confidence of the path on which the parent node of each first-layer candidate word lies is a preset initial value;

a selection and judgment unit, configured to select the candidate words of the current level whose path confidence is greater than or equal to a preset confidence threshold, and to judge whether the current level is the last layer; if so, to provide the words selected in the current level to the visual word determination unit; otherwise, to enter the next level from the selected words, determine the child nodes of the selected words as the candidate words of the next level, and trigger the confidence calculation unit with the next level as the current level;

a visual word determination unit, configured to determine the words provided by the selection and judgment unit as the visual words of the local feature.
7. The apparatus according to claim 6, characterized in that the initial query unit specifically determines all words in the first layer of the visual vocabulary tree as candidate words.
8. The apparatus according to claim 6, characterized in that the confidence calculation unit specifically calculates the confidence γ_i of the path on which the i-th candidate word of the current level lies according to the following formula:

where γ_c is the confidence of the path on which the parent node of the i-th candidate word lies, Dist_i is the distance between the local feature and the i-th candidate word, and Dist_min is the minimum of the distances between the local feature and the candidate words of the current level.
9. The apparatus according to claim 6, characterized in that the apparatus further comprises: a word control unit, configured to, after quantization into visual words has been completed for all local features of the picture, sort the visual words of all local features by the confidence of their paths and select the top N visual words as the visual words of the picture, N being a preset positive integer.
10. The apparatus according to claim 9, characterized in that the larger the number of local features of the picture, the smaller N is set; the smaller the number of local features of the picture, the larger N is set.