CN110647644A - Feature vector quantization method, feature vector search method, feature vector quantization device, feature vector search device, and storage medium


Publication number: CN110647644A
Authority: CN (China)
Prior art keywords: quantization, feature vector, points, quantized, point
Legal status: Pending
Application number: CN201910126323.8A
Other languages: Chinese (zh)
Inventors: 黄耀海, 谭诚, 邓远达
Current assignee: Canon Inc
Original assignee: Canon Inc
Application filed by Canon Inc
Priority to US16/431,520 (US11308152B2)
Publication of CN110647644A

Abstract

The invention provides a feature vector quantization method, a retrieval method, corresponding devices, and a storage medium. The quantization method comprises: setting quantization points; selecting, from the set quantization points, at least one quantization point whose distance from the original feature vector is smaller than a first predetermined distance to form a quantization point subset; and determining a quantized feature vector corresponding to the original feature vector using at least two quantization points, at least one of which comes from the quantization point subset. The retrieval method searches among feature vectors quantized by the quantization method.

Description

Feature vector quantization method, feature vector search method, feature vector quantization device, feature vector search device, and storage medium
Technical Field
The present application relates to a method for increasing the quantization speed of a feature vector, and a retrieval method, apparatus, and storage medium for a quantized feature vector.
Background
Human body image retrieval is a retrieval technology based on matching of human body features. Feature vectors representing key human body features must be indexed in a constructed database, and at retrieval time the results are obtained based on the similarity between the feature vector to be queried and the feature vectors stored in the database. In general, the number of feature vectors stored in the database is very large. To reduce the storage space they occupy, a feature quantization technique can be used to quantize the original feature vectors. Compared with the original feature vector before quantization, the quantized feature vector is shorter and occupies less storage space, which effectively saves storage in a large-scale retrieval system. In addition, because the length of the quantized feature vector is reduced, the retrieval speed of human body image retrieval is also improved. It is therefore important to adopt a high-performance feature quantization technique to improve the performance of a large-scale retrieval system.
One known feature quantization technique is the Line Quantization (LQ) technique, described in the paper "Efficient Large-Scale Approximate Nearest Neighbor Search on the GPU" published at the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). The LQ technique quantizes an original feature vector by setting quantization points (quantization codes) in a feature space and using the connecting lines between those quantization points. Fig. 1 shows a flowchart of the LQ technique. As can be seen from Fig. 1, its main steps are: first, as shown in Fig. 2, a plurality of quantization points are set in the feature space according to a preset codebook; then, the set quantization points are connected pairwise, the perpendicular projection distance from the original feature vector to each connecting line is calculated, and the connecting line for which this projection distance meets the requirement (i.e., is the shortest) is determined, as shown by the thick black line in Fig. 2; then, the projection position of the original feature vector on that connecting line is taken as the position, in the feature space, of the quantized feature vector corresponding to the original feature vector; finally, the quantized feature vector is represented by the quantization points at the two ends of the shortest connecting line and other parameters, completing the quantization of the original feature vector.
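For concreteness, the following minimal Python sketch illustrates this exhaustive pairwise projection search; it is an illustrative reconstruction rather than code from the cited paper, and names such as line_quantize and quant_points are assumptions:

import numpy as np
from itertools import combinations

def line_quantize(x, quant_points):
    # Try every pair of quantization points, project x perpendicularly onto the line
    # through the pair, and keep the pair whose projection distance to x is shortest.
    best = None
    for i, j in combinations(range(len(quant_points)), 2):
        p, q = quant_points[i], quant_points[j]
        d = q - p
        lam = np.dot(x - p, d) / np.dot(d, d)    # position of the projection along the line
        err = np.linalg.norm(x - (p + lam * d))  # perpendicular (projection) distance
        if best is None or err < best[0]:
            best = (err, i, j, lam)
    return best  # (quantization error, point index i, point index j, lambda)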
However, in the LQ technique, in order to find the connecting line with the shortest projection distance to the original feature vector, all quantization points must be traversed, and the distance between the original feature vector and the connecting line between every pair of quantization points must be calculated. This causes a very large amount of computation and a long quantization time. In particular, when a larger number of quantization points is set to improve quantization accuracy, the quantization process occupies more computational resources and takes longer to run.
Disclosure of Invention
The present application aims to provide a quantization technique for feature vectors and a retrieval technique based on quantized feature vectors, so as to reduce the operation resources occupied by the quantization process and shorten the quantization time.
According to an aspect of the present application, there is provided a quantization method including: setting quantization points; selecting, from the set quantization points, at least one quantization point whose distance from the original feature vector is smaller than a first predetermined distance as a quantization point subset; and determining a quantized feature vector corresponding to the original feature vector using at least two quantization points, at least one of which comes from the quantization point subset.
According to another aspect of the present application, there is provided a retrieval method including: calculating the distance between the feature vector to be queried and quantized feature vectors in a database that have been quantized by the above quantization method; and determining at least one quantized feature vector whose distance satisfies a condition as the retrieval result.
According to another aspect of the present application, there is provided a quantization apparatus including: a setting unit configured to set quantization points; a selection unit configured to select, from the set quantization points, at least one quantization point whose distance from the original feature vector is smaller than a first predetermined distance as a quantization point subset; and a quantization unit configured to determine a quantized feature vector corresponding to the original feature vector using at least two quantization points, at least one of which comes from the quantization point subset.
According to another aspect of the present application, there is provided a retrieval apparatus including: a calculation unit configured to calculate the distance between the feature vector to be queried and quantized feature vectors that are stored in a database and have been quantized by the quantization apparatus; and a determination unit configured to determine at least one quantized feature vector whose distance satisfies a condition as the retrieval result.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a quantization method, the quantization method including: setting quantization points; selecting, from the set quantization points, at least one quantization point whose distance from the original feature vector is smaller than a first predetermined distance as a quantization point subset; and determining a quantized feature vector corresponding to the original feature vector using at least two quantization points, at least one of which comes from the quantization point subset.
According to another aspect of the present application, there is provided a non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a retrieval method, the retrieval method including: calculating the distance between the feature vector to be queried and quantized feature vectors in a database that have been quantized by the above quantization method; and determining at least one quantized feature vector whose distance satisfies a condition as the retrieval result.
Other features of the present application will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flow diagram of the LQ technique.
Fig. 2 is a schematic diagram of the determination of quantized feature vectors in the LQ technique.
Fig. 3(a) is a schematic diagram of setting 3 quantization points in the feature space.
Fig. 3(b) is a schematic diagram of setting 6 quantization points in the feature space.
Fig. 4 is a schematic diagram of quantization accuracy when 32 quantization points are set and 2080 quantization points are set.
Fig. 5 shows the quantization error obtained based on the LQ technique and the method of the present application.
Fig. 6 is a flowchart illustrating a quantization method according to a first embodiment of the present application.
Fig. 7 is a schematic diagram of a subset of quantization points.
Fig. 8 is a schematic diagram of a subset of quantization points.
Fig. 9 is a schematic diagram of a subset of quantization points.
Fig. 10 is a flow chart illustrating selection of a subset of quantization points.
Fig. 11 is a schematic flow chart of determining a quantized feature vector.
Fig. 12 is a schematic diagram of the quantization process.
Fig. 13(a) is a schematic diagram of determining a quantized feature vector using two quantization points.
Fig. 13(b) is a schematic diagram of determining a quantized feature vector using three quantization points.
Fig. 14 is a schematic diagram of quantization processing of sub-feature vectors.
Fig. 15(a) and 15(b) illustrate two cases where the calculated value of λ is positive and negative, respectively.
Fig. 16(a) and 16(b) are schematic diagrams of values of λ in the feature space.
Fig. 17 is a schematic diagram of a storage structure of quantized feature vectors.
Fig. 18 is a flowchart illustrating a retrieval method according to a second embodiment of the present application.
Fig. 19 is a diagram illustrating the relationship between the feature vector to be queried and the quantization point during retrieval.
Fig. 20 is a geometric view of calculating distances between quantized feature vectors and feature vectors to be queried.
Fig. 21 is a schematic diagram of an image retrieval system.
Fig. 22 is a schematic diagram of a quantization apparatus according to a third embodiment of the present application.
Fig. 23 is a schematic diagram of a search device according to a fourth embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an embodiment have been described in the specification. It should be appreciated, however, that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with device-related and business-related constraints, which may vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
Here, it should also be noted that, in order to avoid obscuring the subject matter of the present application with unnecessary details, only process steps and/or device structures that are closely related to at least the scheme according to the present application are shown in the drawings, and other details that are not so related to the present application are omitted.
In the LQ technique, the greater the number of quantization points set in the feature space, the higher the quantization accuracy. For example, referring to Fig. 3(a), if 3 quantization points are set in the feature space, traversing the 3 quantization points yields C(3,2) = 3 connecting lines (3 lines formed by any two quantization points); the projection distance from the original feature vector to each connecting line is calculated separately, so the connecting line with the shortest distance to the original feature vector is determined after 3 calculations. When the number of quantization points set in the feature space increases to 6, as shown in Fig. 3(b), traversing the 6 quantization points yields C(6,2) = 15 connecting lines (15 lines formed by any two quantization points), and the connecting line with the shortest distance to the original feature vector is determined after 15 calculations. It follows that as the number of quantization points set in the feature space increases, the number of distance calculations grows much faster. In practice, in order to ensure quantization accuracy, the number of quantization points set in the feature space is far more than 6. As shown in Fig. 4, when 32 quantization points are set, C(32,2) = 496 calculations are required; experiments show that the quantization accuracy is 0.91-0.92 in this case. When the number of quantization points is increased to 2080, experiments show that the quantization accuracy is significantly improved to 0.96-0.97, but the number of calculations rises sharply to C(2080,2) = 2,162,160, and the computation time also increases greatly.
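As a quick arithmetic check of this growth (an illustrative snippet, not part of the original disclosure), the number of connecting lines for n quantization points is simply the number of unordered pairs n(n-1)/2:

from math import comb

for n in (3, 6, 32, 2080):
    print(n, comb(n, 2))  # 3, 15, 496 and 2162160 connecting lines respectively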
In order to solve the above problems, the inventors observed the following: although the connecting line formed by quantization points that are closer to the original feature vector is not necessarily the line with the shortest distance to the original feature vector, and the resulting quantized feature vector does not necessarily have the smallest quantization error, its distance to the original feature vector is nevertheless small and its quantization error is close to the smallest achievable quantization error. Based on this observation, the present application proposes a new feature vector quantization method in which the connecting lines between quantization points in the neighborhood of the original feature vector (i.e., quantization points close to the original feature vector) are used to determine the quantized feature vector corresponding to the original feature vector. With this quantization method, it is not necessary to traverse the connecting lines between all quantization points in the feature space as in the LQ technique; only the distances between the original feature vector and the connecting lines among a small number of quantization points need to be calculated, which effectively reduces the amount of computation and shortens the time required for quantization.
The left column of Fig. 5 shows the quantization errors obtained by traversing all quantization points with the LQ technique (20,000 results); here the quantization error is the projection distance from the original feature vector to the connecting line formed by two quantization points. When the quantization errors are sorted from small to large, the errors of the top-ranked connecting lines turn out to be very similar. The right column of Fig. 5 shows the quantization errors of the quantized feature vectors determined using only the connecting lines between quantization points in the neighborhood of the original feature vector (2,000 results). Comparing the two columns shows that in the first few rows the difference is very small: the smallest quantization error of the right column equals the second smallest of the left column, and the second smallest of the right column equals the third smallest of the left column. Because the quantization errors of the two columns are very close, the negative effect on accuracy is negligible, while the amount of computation and the computation time of the quantization process are greatly reduced, which is very helpful for improving the performance of a large-scale retrieval system.
Reference will now be made in detail to the present embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. In addition, as a solution to the problem of the present invention, it is not necessary to include all combinations of the features described in the exemplary embodiments.
First Embodiment
The first embodiment describes a method for performing quantization processing on an original feature vector. Fig. 6 is a flowchart of the quantization method of the first embodiment, which is described below with reference to Fig. 6.
In step S10, quantization points are set in the feature space based on a previously set codebook.
In the first embodiment, the quantization points may be arranged in a hierarchical structure or in a non-hierarchical (e.g., flat) structure. To reduce the amount of computation, it is preferable to set the quantization points in a hierarchical structure. An optional two-layer hierarchy consists of top-level quantization points and the sub-level quantization points corresponding to each top-level quantization point. The coordinates of each quantization point and the correspondence between top-level and sub-level quantization points are recorded in the codebook in advance, so in step S10 the position of each quantization point in the feature space can be determined directly from the codebook. Preferably, the top-level quantization points are uniformly distributed in the feature space.
The benefit of setting the quantization points in a hierarchy is as follows: assuming that 32 top-level quantization points are preset in the codebook and each top-level quantization point has 64 corresponding sub-level quantization points, then after 32 + 64 = 96 distance computations the top-level quantization point closest to the original feature vector, and the sub-level quantization point of that top-level point closest to the original feature vector, can be found among all quantization points. If a common flat structure is used instead, 32 × 64 = 2048 computations are required to find the quantization point closest to the original feature vector. A sketch of this two-stage search is given below.
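The two-stage search can be sketched as follows (illustrative only; the function name nearest_hierarchical and the data layout are assumptions, not part of the disclosure):

import numpy as np

def nearest_hierarchical(x, top_points, sub_points_by_top):
    # Stage 1: among the 32 top-level quantization points, find the one closest to x.
    t = int(np.argmin(np.linalg.norm(top_points - x, axis=1)))
    # Stage 2: among only the 64 sub-level quantization points of that top-level point,
    # find the one closest to x (32 + 64 = 96 distance computations in total).
    subs = sub_points_by_top[t]
    s = int(np.argmin(np.linalg.norm(subs - x, axis=1)))
    return t, s  # indices of the nearest top-level and sub-level quantization points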
The following description of the present application is provided by taking the two-layer hierarchical structure of the top-layer quantization point and the corresponding sub-layer quantization point as an example, but the present application is not limited to the two-layer hierarchical structure, and a quantization point hierarchical structure having three-layer association relationship may also be applied in the present application. In addition, the codebook referred to in the present application is not different from the codebook known in the art, and the present application does not limit the preset manner of the codebook. The dimension of the feature space referred to in the present application may be any dimension, and the present application does not limit the feature space.
In step S20, a quantization point whose distance from the original feature vector is less than a first predetermined distance is selected from the set quantization points as a quantization point subset.
In this step S20, the quantization points in the quantization point subset are quantization points in the neighborhood range of the original feature vector in the feature space, in other words, the quantization points in the quantization point subset are closer to the original feature vector. The first predetermined distance described herein may be a value set in advance empirically or experimentally, or a value set according to a precision requirement or an amount of computation. For example, when the accuracy requirement is high, a first predetermined distance with a small numerical value may be set, and when the accuracy requirement is low, a first predetermined distance with a large numerical value may be set; for another example, the first predetermined distance with a smaller numerical value may be set when the tolerable computation amount is relatively low, and the first predetermined distance with a larger numerical value may be set when the tolerable computation amount is relatively high.
The following will describe the specific implementation process of step S20 in detail, and will not be described herein again.
In step S30, a quantized feature vector corresponding to the original feature vector is determined using at least two quantization points, at least one of which comes from the quantization point subset.
In step S30, two or more quantization points may be used to determine the quantized feature vector corresponding to the original feature vector. At least one of these quantization points is a quantization point whose distance from the original feature vector is smaller than the first predetermined distance; preferably, the other quantization points are also quantization points whose distance from the original feature vector is smaller than the first predetermined distance, although, regardless of how many quantization points are actually used, some of them may also be quantization points whose distance from the original feature vector is not smaller than the first predetermined distance. In this way, the projection distance of the original feature vector onto a connecting line constructed from two or more quantization points can be kept small, ensuring a small quantization error between the obtained quantized feature vector and the original feature vector. In addition, when the quantized feature vector is determined by perpendicular projection onto connecting lines, if more than two quantization points are used, more than one connecting line is available for projection, and a candidate quantized feature vector is obtained by projecting onto each connecting line. All of these candidates can be kept as the quantization result of the original feature vector, or an optimal one can be selected from them, for example the quantized feature vector on the connecting line with the shortest perpendicular projection distance.
It should be noted that in the embodiment of the present application, the quantized feature vector is determined by a vertical projection algorithm of the original feature vector on the connecting line formed by two or more quantized points, but the present application does not limit a specific quantization algorithm, as long as an algorithm for determining the quantized feature vector by using the quantized points in the quantized point subset determined by the scheme of the present application can be implemented in the present application.
The following will describe the specific implementation process of step S30 in detail, and will not be described herein again.
In step S40, the determined quantized feature vector is stored.
Through the preceding steps S10 to S30, the quantization of the original feature vector is completed and the quantized feature vector corresponding to the original feature vector is obtained. In step S40, the quantized feature vector may be stored according to a certain structure so that retrieval can subsequently be performed using the stored quantized feature vectors. The specific implementation of step S40 is described in detail below and is not repeated here.
As can be seen from the above description of the quantization processing in the first embodiment of the present application, the method in the first embodiment can greatly reduce the amount of computation, and shorten the time duration of the quantization processing.
Next, each step shown in fig. 6 will be described in detail. It should be noted that the following detailed description of each step is only an example for implementing each step, and the present application is not limited to the following detailed description.
1. A detailed description of the selection of the quantization point subset of step S20.
The quantization points set in step S10 include top-level quantization points and their corresponding sub-level quantization points. Fig. 7 shows, in the feature space, the positions of the original feature vector and of the quantization points set according to the codebook: a triangle represents the original feature vector, and among the 12 quantization points set there are 3 top-level quantization points (larger squares), each of which corresponds to 3 sub-level quantization points (smaller squares). Note that only 12 quantization points are shown in Fig. 7 for ease of understanding; the number of quantization points actually set in the feature space is likely to be far more than 12.
Step S20 may select, from the quantization points set in step S10, quantization points whose distance from the original feature vector is smaller than the first predetermined distance. One usable approach is to select the quantization points based on the distances between the top-level quantization points and the original feature vector; the specific implementation is described below with reference to Fig. 10.
S20-a: and selecting a top quantization point which is not processed yet, and calculating the distance between the original feature vector and the top quantization point.
Because the distance between the top-layer quantization point and the corresponding sub-layer quantization point is shorter, the top-layer quantization point which is closer to the original characteristic vector is found by calculating the distance between the original characteristic vector and the top-layer quantization point, and the sub-layer quantization point corresponding to the top-layer quantization point can be regarded as the quantization point which is also closer to the original characteristic vector, and the distance between the original characteristic vector and each quantization point does not need to be calculated.
S20-b: and based on the calculated distance, sequencing the top quantization points according to the distance from the top quantization point to the original characteristic vector from near to far.
S20-c: and judging whether the top-level quantization point which is not executed in the step S20-a exists, if so, returning to the step S20-a, and otherwise, executing the step S20-d.
Step S20-d: and determining at least one top-layer quantization point with the distance from the original characteristic vector meeting the set distance requirement, and taking the top-layer quantization point and the corresponding sub-layer quantization point as a quantization point subset.
In this step S20-d, a top-level quantization point with the shortest distance to the original eigenvector and a sub-level quantization point corresponding to the top-level quantization point may be used as the quantization point subset, see fig. 7, where the quantization points enclosed by the dotted line are the quantization point subset; a plurality of top-layer quantization points having the shortest distance to the original eigenvector and a sub-layer quantization point corresponding to each top-layer quantization point may also be used as the quantization point subset, and fig. 8 illustrates a case where two top-layer quantization points having the shortest distance and a corresponding sub-layer quantization point are used as the quantization point subset. In this step, one or more top-level quantization points and corresponding sub-level quantization points may be selected as a subset of quantization points according to actual requirements. Further, the selected top-layer quantization points and the corresponding sub-layer quantization points thereof may be further refined, that is, the selected quantization points (the selected top-layer quantization points and the corresponding sub-layer quantization points thereof) are divided into one or more sets, and some or all of the sets are used as the quantization point subsets. Fig. 9 is based on fig. 7, and divides the top-layer quantization points at the bottom right and the corresponding sub-layer quantization points into two sets, one or both of which can be used as the quantization point subset.
The process of step S20 is described below with a specific example. Assume the original feature vector, with ID Feature400, is an n-dimensional feature vector, and that 32 top-level quantization points ID1 to ID32 are recorded in the predetermined codebook. Based on the distances from Feature400 to each top-level quantization point, assume that ID2 is the top-level quantization point whose distance from Feature400 satisfies the predetermined distance requirement (e.g., is the shortest). Thus, top-level quantization point ID2 and the sub-level quantization points corresponding to ID2 may be selected as the quantization point subset.
The method shown in steps S20-a to S20-d is an implementation method for selecting the subset of quantization points, but the method for selecting the subset of quantization points in the first embodiment is not limited thereto. For example, distances between the original feature vector and all quantization points may be calculated separately, and a number of quantization points having a distance from the original feature vector smaller than a first predetermined distance may be selected as the subset of quantization points.
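A minimal sketch of the subset selection of steps S20-a to S20-d (illustrative only; select_subset, num_top and the data layout are assumptions):

import numpy as np

def select_subset(x, top_points, sub_points_by_top, num_top=1):
    # Sort the top-level quantization points by distance to the original feature vector x
    # and keep the num_top closest ones together with their sub-level quantization points.
    order = np.argsort(np.linalg.norm(top_points - x, axis=1))
    subset = []
    for t in order[:num_top]:
        subset.append(top_points[t])
        subset.extend(sub_points_by_top[t])
    return np.array(subset)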
2. A detailed description of the determination of the quantized feature vector of step S30.
The quantization points used to determine the quantized feature vector may come partially or entirely from the quantization point subset. Assuming that the quantization points used for quantization all come from the quantization point subset, a greedy algorithm may be used to speed up the operation. Fig. 11 illustrates the flow of determining the quantized feature vector based on the greedy algorithm, which is described in detail below.
Step S30-a: and sequencing the distances between each quantization point in the quantization point subset and the original characteristic vector, and selecting at least one quantization point with the distance meeting the condition.
Here, at least one closest quantization point may be selected, and in the case of a higher tolerance to computational resource occupation, a larger number of quantization points, or even all quantization points in the subset of quantization points, may be selected.
Step S30-b: still further quantization points that have not been processed are selected from the at least one quantization point selected in step S30-a.
This step S30-b is to select a quantization point for the subsequent step, and an alternative way is to randomly select a further quantization point that has not been processed from the at least one quantization point selected in step S30-a; preferably, the distances between at least one quantization point selected in step S30-a and the original feature vector may be sorted in advance, and the quantization points are sequentially selected in the order of decreasing distance in step S30-b.
Step S30-c: the quantization feature vector corresponding to the original feature vector is determined using the quantization point selected in step S30-b and other quantization points in the subset of quantization points.
The greedy algorithm is described here by taking an example in which the original feature vector is quantized using two quantization points, but when quantization is performed using another algorithm, the quantization is not limited to being performed using two or more quantization points.
Step S30-d: and judging whether unprocessed quantization points exist in the quantization points selected in the step S30-a, if so, jumping to the step S30-b, and if not, executing the step S30-e.
Taking fig. 12 as an example, assuming that there are 10 quantization points in the quantization point subset, 5 quantization points closest to the original feature vector are selected in step S30-a. When quantization point 1 is selected in step S30-b, in step S30-c, original eigenvectors are quantized by using quantization point 1 and quantization point 2, respectively, to obtain corresponding quantized eigenvectors1-2Quantizing the original characteristic vector by using the quantization points 1 and 3 to obtain the corresponding quantized characteristic vector1-3Quantizing the original characteristic vector by using the quantization points 1 and 4 to obtain the corresponding quantized characteristic vector1-4And by analogy, finally, quantizing the original feature vector by using the quantization point 1 and the quantization point 10Processing to obtain corresponding quantized feature vector1-10At this time, the quantization process using the quantization point 1 is completed. Then, quantization point 2 is selected in step S30-b, and then quantization processing is performed on the original eigenvector using quantization point 2 and quantization point 3 to obtain the corresponding quantized eigenvector in step S30-c2-3And by analogy, finally, quantizing the original feature vector by using the quantization point 2 and the quantization point 10 to obtain the corresponding quantized feature vector2-10At this time, the quantization process using the quantization point 2 is completed. The above process is repeated, and after traversing the quantization points 1 to 5 selected in the step S30-a, the steps S30-b to S30d are ended.
Preferably, in order to further reduce the amount of computation, a maximum value of the number of computations may be set in advance, requiring the processes of steps S30-b to S30-d to be performed without exceeding the maximum value. An optional way is: in step S30-b, quantization points are sequentially selected in the order of being from the near to the far from the original eigenvector, then in step S30-c, the quantization operation is performed once for each quantization point used for performing the quantization operation on the original eigenvector, when the count value does not exceed the maximum value of the operation times, the steps S30-c and S30-d are normally performed, and when the count value exceeds the maximum value of the operation times, the steps S30-c and S30-d are stopped, and the process directly proceeds to step S30-e.
Step S30-e: and taking the determined quantized feature vector with the minimum quantization error as a quantization result of the original feature vector to finish the determination process of the quantized feature vector.
The greedy algorithm of steps S30-a to S30-e is one optional implementation of determining the quantized feature vector in the first embodiment, but it should be understood that the embodiment is not limited to it. For example, any at least two quantization points may be selected from the quantization point subset to determine the quantized feature vector corresponding to the original feature vector; or the at least two quantization points closest to the original feature vector may be selected from the quantization point subset; or at least one quantization point may be selected from the quantization point subset and combined with at least one quantization point from outside the subset to determine the quantized feature vector corresponding to the original feature vector.
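The greedy pairing of steps S30-a to S30-e can be sketched as follows (illustrative only; project_error, greedy_quantize, num_seeds and max_ops are assumed names, and max_ops stands in for the optional maximum number of computations):

import numpy as np

def project_error(x, p, q):
    # Perpendicular projection of x onto the line through quantization points p and q;
    # returns the projection (quantization) error and the lambda parameter on that line.
    d = q - p
    lam = np.dot(x - p, d) / np.dot(d, d)
    return np.linalg.norm(x - (p + lam * d)), lam

def greedy_quantize(x, subset, num_seeds=5, max_ops=None):
    # Step S30-a: take the num_seeds points of the subset closest to x.
    order = np.argsort(np.linalg.norm(subset - x, axis=1))
    best, ops = None, 0
    for i in order[:num_seeds]:                 # steps S30-b and S30-d
        for j in range(len(subset)):            # step S30-c: pair point i with every other point
            if j == i:
                continue
            err, lam = project_error(x, subset[i], subset[j])
            if best is None or err < best[0]:
                best = (err, i, j, lam)
            ops += 1
            if max_ops is not None and ops >= max_ops:
                return best                     # stop early once the operation budget is reached
    return best                                 # step S30-e: keep the smallest quantization error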
The above describes how to select a quantization point for quantizing an original feature vector, and next, how to calculate a quantized feature vector corresponding to the original feature vector using the quantization point after determining the quantization point for performing quantization processing on the original feature vector will be described. In the following, a method of calculating a quantized feature vector is described by taking two quantization points and three quantization points as an example, respectively.
Still taking the original feature vector Feature400 assumed in the description of step S20 as an example, Fig. 13(a) illustrates an example of calculating the quantized feature vector corresponding to the original feature vector using the top-level quantization point ID2 and the sub-level quantization point ID2-1. In Fig. 13(a), the triangle indicates the position of the original feature vector in the feature space, the two squares indicate the positions of quantization point ID2 and quantization point ID2-1, the projection position of the original feature vector on the connecting line between ID2 and ID2-1 is the quantized feature vector corresponding to the original feature vector, and the position of the quantized feature vector in the feature space is indicated by a circle. The coordinates of ID2, ID2-1, and the original feature vector in the feature space are known; to characterize the quantized feature vector from these known coordinates by geometric operations, the distance from the quantized feature vector to the nearer of ID2 and ID2-1 can first be calculated by the following equation (1):
L = (b² + c² − a²) / (2c)        (1)
where L denotes the distance from the quantized feature vector to ID2 (the one of ID2 and ID2-1 that is closer to the quantized feature vector); a denotes the distance from ID2-1 to the original feature vector; b denotes the distance from ID2 to the original feature vector; and c denotes the distance between ID2 and ID2-1. By setting a parameter, for example the scaling factor λ = L / c, the quantized feature vector can be characterized by geometric operations in terms of the parameter (e.g., λ), the coordinates of ID2, and the coordinates of ID2-1.
Fig. 13(b) illustrates an example of calculating the quantized feature vector corresponding to the original feature vector using quantization point ID2, quantization point ID2-1, and quantization point ID2-2 (assuming ID2-2 is the sub-level quantization point with the second-shortest distance to the original feature vector among the sub-level quantization points corresponding to ID2). Similarly to the case where two quantization points are used, the original feature vector is projected onto the connecting line between ID2 and ID2-2 and onto the connecting line between ID2-1 and ID2-2, respectively (the projection onto the connecting line between ID2 and ID2-1 is omitted here). Two scaling factors are set, λ1 = L1 / c and λ2 = L2 / d, where L1 denotes the distance from the quantized feature vector obtained by projecting the original feature vector onto the connecting line between ID2 and ID2-2 to ID2-2, c denotes the distance from ID2 to ID2-2, L2 denotes the distance from the quantized feature vector obtained by projecting the original feature vector onto the connecting line between ID2-1 and ID2-2 to ID2-1, d denotes the distance from ID2-2 to ID2-1, and e denotes the distance from the original feature vector to ID2-2. According to Fig. 13(b), the quantized feature vector corresponding to the original feature vector can be determined from ID2, ID2-1, ID2-2, λ1, and λ2 by geometric operations. Note that λ1 and λ2 here are only used to distinguish the λ values obtained by projecting the original feature vector onto different connecting lines; there is no direct relationship between them.
As can be seen from the above description of Figs. 13(a) and 13(b), when the original feature vector is quantized with two or more quantization points, the quantized feature vector corresponding to the original feature vector can be represented by information of each quantization point (e.g., the coordinates of the quantization point) and at least one parameter. The parameter may be λ, which represents the ratio between the distance L, from the quantized feature vector obtained by perpendicularly projecting the original feature vector onto the connecting line formed by two quantization points to the closer of those two quantization points, and the distance c between the two quantization points; the number of parameters equals the number of connecting lines formed between the quantization points used for quantization. It should be noted that the parameter used to characterize the quantized feature vector may be the above λ, but may also be a parameter obtained through other geometric operations; the parameter is not limited here, as long as the quantized feature vector can be characterized by the information of the quantization points and the parameter in combination with geometric operations. The following description takes the above λ as an example.
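As an illustrative cross-check of equation (1) (lambda_from_points is an assumed name, not part of the disclosure), L and λ can be computed from the three pairwise distances alone, and the result agrees with the direct perpendicular projection:

import numpy as np

def lambda_from_points(x, id2, id2_1):
    a = np.linalg.norm(x - id2_1)       # distance from ID2-1 to the original feature vector
    b = np.linalg.norm(x - id2)         # distance from ID2 to the original feature vector
    c = np.linalg.norm(id2 - id2_1)     # distance between ID2 and ID2-1
    L = (b**2 + c**2 - a**2) / (2 * c)  # equation (1): distance from the projection to ID2
    return L / c                        # scaling factor lambda

# The same value follows from the direct vector projection:
# lam = np.dot(x - id2, id2_1 - id2) / np.dot(id2_1 - id2, id2_1 - id2)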
When the original feature vector has a high dimension, for example when Feature400 is a 480-dimensional feature vector, it may be treated, to simplify the computation, as 60 sub-feature vectors of 8 dimensions each, and a quantization operation is performed for each 8-dimensional sub-feature vector. Still taking ID2 and its sub-level quantization points ID2-1 to ID2-m as the quantization point subset, the first 8-dimensional sub-feature vector of Feature400 is quantized as shown in Fig. 14: first, two quantization points (assumed to be ID2-1 and ID2-5) are determined from ID2 and ID2-1 to ID2-m such that the perpendicular projection distance of the first 8-dimensional sub-feature vector of Feature400 onto the connecting line formed by the first 8 dimensions of those two quantization points is the shortest; then, following the method shown in Fig. 13(a), ID2-1, ID2-5, and λ1 are recorded to represent the quantized feature vector corresponding to the first 8-dimensional sub-feature vector of Feature400. Next, the second 8-dimensional sub-feature vector of Feature400 is quantized: first, two quantization points (assumed to be ID2-2 and ID2-3) are determined from ID2 and ID2-1 to ID2-m such that the connecting line formed by the second 8 dimensions of those two quantization points has the shortest distance to the second 8-dimensional sub-feature vector of Feature400; then, again following the method shown in Fig. 13(a), ID2-2, ID2-3, and λ2 are recorded to represent the quantized feature vector corresponding to the second 8-dimensional sub-feature vector of Feature400. By analogy, the remaining 58 sub-feature vectors of Feature400 are quantized, and a corresponding quantized feature vector is obtained for each 8-dimensional sub-feature vector. Note that λ1 and λ2 here are only used to distinguish the λ values obtained when quantizing different sub-feature vectors; there is no direct relationship between them.
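A minimal sketch of this per-sub-vector quantization (illustrative; quantize_subvectors is an assumed name and project_error is the helper from the earlier greedy sketch):

def quantize_subvectors(x, subset, dim=8):
    # Split the original feature vector into consecutive 8-dimensional sub-feature vectors
    # and quantize each against the matching 8 dimensions of the subset quantization points.
    codes = []
    for start in range(0, len(x), dim):
        xs = x[start:start + dim]
        pts = subset[:, start:start + dim]
        best = None
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                err, lam = project_error(xs, pts[i], pts[j])
                if best is None or err < best[0]:
                    best = (err, i, j, lam)
        codes.append((best[1], best[2], best[3]))  # (point index i, point index j, lambda)
    return codes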
As can be seen from the description of fig. 13(a) and 13(b), λ may be used to represent a quantized feature vector corresponding to an original feature vector, and as can be seen from the description of fig. 14, λ may also be used to represent a quantized feature vector corresponding to a sub-feature vector in the original feature vector. In consideration of the directivity of λ, in the case shown in fig. 15(a), λ is greater than 0 and equal to or less than 0.5; in the case shown in fig. 15(b), λ is smaller than 0.
In actual operation, the calculated value of λ ranges over (−∞, 0.5]. If the calculated value of λ were stored directly in memory after every operation, the storage cost would be hard to limit, so the industry generally restricts the number of values λ may take. This works as follows: a fixed number of bits, e.g., 16 bits, is allocated to λ, so that λ may take 65536 different values, and 65536 candidate values are set for λ in advance. In practice, the calculated values of λ are far more numerous than 65536; when a calculated value of λ is not among the preset 65536 values, the preset value closest to the calculated value is found, taken as the λ obtained in this quantization operation, and stored. The value finally stored is always one of the preset 65536 values, so the memory resources occupied by λ can be limited.
Fig. 16(a) illustrates an example of the values of λ in the feature space (for example, a two-dimensional space). When the original feature vector is perpendicularly projected onto the connecting line between ID1 and ID2 (or its extension), if the 65536 preset values of λ do not include the actual projection position of the original feature vector on the line, the preset value closest to the actual projection position is selected as λ for the current quantization processing. As can be seen from Fig. 16(a), this nearest-value selection makes the actual quantization error larger than the theoretical quantization error, especially in regions where the preset λ values are sparsely distributed.
In the first embodiment, on the basis of the method of using the preset value of λ to replace the calculated value of λ, an optimization method is proposed to reduce the memory space occupied by λ as much as possible while maintaining the quantization accuracy, and the optimization method in the first embodiment will be described in detail below.
The inventors of the present application found that if the 65536 values of λ (in the 16-bit case) are set not over the entire feature space but within a certain specific region of the feature space, the distribution of the preset λ values within that region becomes relatively dense. If the calculated value of λ falls within the specific region, the quantization error will not be too large even when the calculated value is replaced by the closest preset value; and if the specific region is chosen as a region into which the calculated value of λ falls with high probability, high quantization accuracy can be ensured overall. For example, referring to Fig. 16(b), when the feature space is a 2-dimensional space, the values of λ are limited to the specific region [-4, 0.5], in which the preset λ values are densely distributed. When the calculated value of λ falls within this region, the actual quantization error is very close to the theoretical quantization error and the quantization accuracy is high; when the calculated value of λ is less than -4, the value -4 is taken as the value of λ.
By setting the values of λ within a specific region of the feature space as in the first embodiment, the quantization accuracy can be effectively improved compared with the methods known in the art. Even if the number of bits allocated to λ is reduced from 16 bits to 8 bits, so that only 256 values of λ can be preset, the optimization method of the first embodiment sets these 256 values within the specific region, where they are densely distributed, thereby ensuring the quantization accuracy while reducing the number of bits occupied by λ.
The region [-4, 0.5] is an example of a specific region for a 2-dimensional feature space; specific regions may also be set for higher-dimensional feature spaces according to actual needs. Of course, the λ values set within the specific region may also be predetermined scaling coefficients chosen according to accuracy requirements and the like.
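A minimal sketch of this bounded λ encoding (illustrative only; the uniform grid over [-4, 0.5] and the names encode_lambda/decode_lambda are assumptions):

import numpy as np

# 256 preset lambda values (8 bits), placed only inside the specific region [-4, 0.5];
# a uniform grid is assumed here purely for illustration.
LAMBDA_GRID = np.linspace(-4.0, 0.5, 256)

def encode_lambda(lam):
    # Clamp the calculated lambda to the specific region, then store the index
    # of the closest preset value.
    lam = min(max(lam, -4.0), 0.5)
    return int(np.argmin(np.abs(LAMBDA_GRID - lam)))

def decode_lambda(code):
    return float(LAMBDA_GRID[code])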
3. A detailed description of the storage of the quantized feature vector of step S40.
After determining the quantized feature vector corresponding to the original feature vector in step S30, the quantized feature vector may be stored according to a certain structure, which is described below in the description of the storage structure shown in fig. 17. It should be noted that the storage structure shown in fig. 17 is an example of a structure that can be adopted in the first embodiment, and the present embodiment is not limited to this storage structure at all as long as it is possible to store parameters such as a codebook, parameters for characterizing a quantized feature vector (information of a quantization point and λ), and the like completely, and to perform subsequent retrieval processing using the stored contents.
The storage structure shown in Fig. 17 stores a codebook, which includes information on the top-level quantization points and on the sub-level quantization points corresponding to each top-level quantization point (e.g., the coordinates of the quantization points and the correspondence between quantization points), together with the information used to characterize the quantized feature vectors, namely quantization point information and λ. Specifically, the storage structure shown in Fig. 17 stores the following: the codebook, the ID of the original feature vector, the first part of information, and the second part of information.
The first part of information treats the original feature vector as a whole and records the two or more quantization points used to calculate the quantized feature vector corresponding to the original feature vector. Taking quantization with two quantization points as an example, the information of the two quantization points forming the connecting line on which the quantized feature vector corresponding to the original feature vector Feature400 lies may serve as the first part of information. As another example, if a top-level quantization point is among the quantization points used in the quantization process, the information of that top-level quantization point and of the sub-level quantization point closest to Feature400 among its corresponding sub-level quantization points may serve as the first part of information. In general, the top-level quantization point closest to Feature400 is used as a quantization point during quantization, so the information of the top-level quantization point closest to Feature400 and of the closest sub-level quantization point among its sub-level quantization points may be used as the first part of information. Assuming the top-level quantization point closest to Feature400 is ID2, with corresponding sub-level quantization points ID2-1 to ID2-m, and assuming ID2-1 is the sub-level quantization point closest to the original feature vector, ID2 and ID2-1 can be used as the first part of information.
The second part of information applies when the original feature vector is treated as a plurality of sub-feature vectors (e.g., the 480-dimensional Feature400 treated as 60 sub-feature vectors of 8 dimensions): it records, for each sub-feature vector, the two quantization points used for its quantization and the corresponding λ. In the quantization of sub-feature vectors shown in Fig. 14, the information of the two quantization points (assumed to be ID2-1 and ID2-5) used to quantize the first 8-dimensional sub-feature vector of Feature400 and the corresponding λ1 belong to the second part of information, as do the corresponding pairs of quantization points and λ values used to quantize the other 59 8-dimensional sub-feature vectors.
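One possible in-memory layout following this description (illustrative only; the field names and the use of a Python dataclass are assumptions, not the actual structure of Fig. 17):

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class QuantizedRecord:
    feature_id: str                          # e.g. "Feature400"
    first_part: Tuple[int, int]              # two quantization point IDs for the whole vector (e.g. ID2, ID2-1)
    second_part: List[Tuple[int, int, int]]  # per 8-d sub-feature vector: (point i, point j, encoded lambda)

# The codebook (coordinates of top-level and sub-level quantization points and their
# correspondence) is stored once and shared by all QuantizedRecord entries.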
Through the quantization processing on the original feature vector in the first embodiment, compared with the LQ technique, the amount of computation can be effectively reduced and the time required for quantization can be shortened while ensuring higher quantization accuracy.
Second Embodiment
The second embodiment of the present application describes a retrieval method. After receiving a feature vector to be queried input by the user, feature matching is performed in a database, and results that match (are highly similar to) the feature vector to be queried are retrieved. The indexed feature vectors stored in the database are quantized feature vectors produced by the quantization processing of the first embodiment; by calculating the distance between the feature vector to be queried and the indexed quantized feature vectors in the database, one or more indexed quantized feature vectors with the shortest distance (highest similarity) are output to the user as the retrieval result. Considering that the quantized feature vectors stored in the database for indexing are massive in number, calculating the distance between the feature vector to be queried and every stored quantized feature vector separately would clearly involve a huge amount of computation. In view of this, besides the conventional retrieval method, the second embodiment also provides an optimized retrieval method whose main idea is as follows: using the top-level quantization points and their corresponding sub-level quantization points as an intermediary, the quantized feature vectors that were generated from the top-level quantization points close to the feature vector to be queried and from their sub-level quantization points are taken as the indexed quantized feature vectors, and one or more retrieval results closest to the feature vector to be queried are retrieved from these indexed quantized feature vectors. With the optimization method of the second embodiment, the amount of computation can be greatly reduced while the retrieval accuracy is ensured.
Fig. 18 shows a flowchart of the retrieval method of the second embodiment, which is described in detail as follows.
Step S100: and receiving the feature vector to be queried input by the user.
Step S200: and determining top-layer quantization points with the distance to the feature vector to be queried being smaller than a second preset distance according to the distance between each top-layer quantization point and the feature vector to be queried.
In step S200, based on the position of the feature vector to be queried in the feature space, the top quantization points are sorted according to the sequence from near to far from the feature vector to be queried by using the coordinate information of the top quantization points in the codebook, and at least one top quantization point whose distance from the feature vector to be queried is smaller than a second predetermined distance (i.e., closer to the feature vector to be queried) is determined. Preferably, at least one top-level quantization point closest to the distance of the feature vector to be queried may be used as the determination result of the step S200. Assuming that the top quantization points 2, 5, and 9 are determined to be top quantization points having a distance to the feature vector to be queried smaller than a second predetermined distance after the processing in step S200, the determined 3 top quantization points are sequentially arranged in order of distance from the feature vector to be queried from near to far.
Step S300: from the sub-level quantization points corresponding to the top-level quantization points determined in step S200, determine the sub-level quantization points whose distance to the feature vector to be queried is smaller than a third predetermined distance.
Taking the top-level quantization point 2 in fig. 19 as an example, first, all the sub-level quantization points corresponding to the top-level quantization point 2 are determined according to the codebook, and then, from among them, the sub-level quantization points whose distance to the feature vector to be queried is smaller than a third predetermined distance are determined. Assuming these are the sub-level quantization points 2-1, 2-3, and 2-7, the 3 sub-level quantization points are recorded, in order of distance to the feature vector to be queried from near to far, in a tree structure together with the top-level quantization point. Taking the top-level quantization point 9 as another example, all the sub-level quantization points corresponding to the top-level quantization point 9 are first determined, and then the sub-level quantization points whose distance to the feature vector to be queried is smaller than the third predetermined distance are determined. Assuming these are the sub-level quantization points 9-8, 9-12, and 9-30, the sub-level quantization points 9-12, 9-30, and 9-8 are recorded, in order of distance to the feature vector to be queried from near to far, in the tree structure together with the top-level quantization point.
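Purely for illustration (not part of the original disclosure; the codebook layout, function name, and variable names are all assumptions), steps S200-S300 could be sketched in Python as follows:

```python
import numpy as np

def select_candidate_points(query, codebook, second_dist, third_dist):
    """Steps S200-S300 sketch: keep the top-level quantization points whose
    distance to the query is below the second predetermined distance, then,
    for each of them, keep its sub-level quantization points whose distance
    is below the third predetermined distance, recorded near-to-far.
    `codebook` is assumed to map top_id -> (top_coords, {sub_id: sub_coords})."""
    tree = []
    tops = sorted(codebook.items(),
                  key=lambda kv: np.linalg.norm(query - kv[1][0]))  # near to far
    for top_id, (top_coords, subs) in tops:
        if np.linalg.norm(query - top_coords) >= second_dist:
            break                                 # remaining points are farther
        near_subs = sorted(
            (sub_id for sub_id, sub_coords in subs.items()
             if np.linalg.norm(query - sub_coords) < third_dist),
            key=lambda sub_id: np.linalg.norm(query - subs[sub_id]))
        tree.append((top_id, near_subs))          # tree structure as in fig. 19
    return tree
```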
Note that step S20 of the first embodiment mentions a "first predetermined distance" for defining the distance between a quantization point and the original feature vector, step S200 mentions a "second predetermined distance" for defining the distance between a top-level quantization point and the feature vector to be queried, and step S300 mentions a "third predetermined distance" for defining the distance between a sub-level quantization point and the feature vector to be queried. There is not necessarily any relation among the "first predetermined distance", "second predetermined distance" and "third predetermined distance"; "first", "second" and "third" are merely used to distinguish the predetermined distances in different steps.
Step S400: based on the top-layer quantization points determined in step S200 and the sub-layer quantization points determined in step S300, the quantized feature vectors generated using these quantization points are determined.
In the example shown in fig. 19, for each quantization point (top-level quantization point or sub-level quantization point), a quantized feature vector generated with the quantization point is determined. For example, the quantized feature vector generated using the sub-layer quantization point 2-1 is the quantized feature vector 400.
Since the top-level quantization points determined in step S200 and the sub-level quantization points determined in step S300 are all quantization points close to the feature vector to be queried, the distance between the feature vector to be queried and the quantized feature vectors that were generated with these quantization points and stored in the database is generally not too large. Therefore, the quantized feature vectors determined in step S400 can be used as the indexed feature vectors, and the amount of computation during retrieval can be greatly reduced while ensuring retrieval accuracy.
Step S500: index the quantized feature vectors determined in step S400 to obtain the retrieval result.
In step S500, the distance between each quantized feature vector determined in fig. 19 and the feature vector to be queried may be calculated in turn, so as to retrieve at least one quantized feature vector closest to the feature vector to be queried as the retrieval result. As a preferred scheme, in order to further reduce the amount of computation and improve retrieval efficiency, the number of indexing operations may be preset: after the distance between a quantized feature vector and the feature vector to be queried is calculated, a counter is incremented by 1; once the number of calculations reaches the preset number of indexing operations, the calculation stops, and at least one of the quantized feature vectors already examined that is closest to the feature vector to be queried is taken as the retrieval result.
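The following short Python sketch (not part of the original disclosure; all names are illustrative, and the distance function is supplied by the caller) shows one way the preset number of indexing operations described above might bound the distance calculations of step S500:

```python
def bounded_retrieval(query, candidates, distance_fn, max_indexing):
    """Step S500 sketch: compute distances one candidate at a time and stop
    as soon as the preset number of indexing operations is reached, returning
    the closest quantized feature vector examined so far."""
    best_id, best_dist = None, float("inf")
    count = 0
    for vec_id, quantized_vec in candidates:
        dist = distance_fn(query, quantized_vec)
        count += 1                        # the counter is incremented by 1
        if dist < best_dist:
            best_id, best_dist = vec_id, dist
        if count >= max_indexing:         # preset number of indexing operations reached
            break
    return best_id, best_dist
```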
Next, the retrieval method of the second embodiment is described based on the storage structure shown in fig. 17. As shown in fig. 17, assume that there are two original Feature vectors, Feature400 and Feature 500, under the quantization points ID2 and ID2-1, i.e., ID2 is the top-level quantization point closest to these two original Feature vectors, and ID2-1 is the sub-level quantization point closest to them. Feature400 and Feature 500 are 480-dimensional Feature vectors, each of which can be regarded as 60 sub-Feature vectors of 8 dimensions, and the quantized Feature vector corresponding to each sub-Feature vector is characterized by quantization points and λ. Upon receiving the Feature vector Feature600 to be queried input by the user, assume that the top-level quantization point whose distance to Feature600 is smaller than the second predetermined distance, determined in step S200, is ID2, and that the sub-level quantization point whose distance to Feature600 is smaller than the third predetermined distance, determined in step S300, is ID2-1. By querying the storage structure shown in fig. 17, it can be seen that there are two Feature vectors, Feature400 and Feature 500, under the quantization points ID2 and ID2-1, and these two Feature vectors can be directly output to the user as the retrieval result. Of course, in order to further improve retrieval accuracy, the two Feature vectors may be indexed to determine the optimal retrieval result.
In order to determine the optimal retrieval result between Feature400 and Feature 500, the distances (the shorter the distance, the higher the similarity) between Feature400 and the Feature vector Feature600 to be queried, and between Feature 500 and Feature600, can be calculated respectively. An alternative calculation method is: regard the Feature vector Feature600 to be queried as 60 sub-Feature vectors of 8 dimensions, sum the distances between each sub-Feature vector of Feature600 and the corresponding sub-Feature vector of Feature400, and likewise sum the distances between each sub-Feature vector of Feature600 and the corresponding sub-Feature vector of Feature 500; the distances between Feature600 and Feature400 and between Feature600 and Feature 500 are thereby determined, and the original Feature vector with the shortest distance is taken as the retrieval result.
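As a purely illustrative sketch (assuming squared Euclidean distances and NumPy arrays; the function and parameter names are not from the original disclosure), the alternative calculation above could be organized as follows:

```python
import numpy as np

def subvector_distance(query_480d, stored_480d, sub_dim=8):
    """Regard each 480-dimensional vector as 60 sub-vectors of 8 dimensions
    and sum the squared distances between corresponding sub-vectors."""
    q = np.asarray(query_480d, dtype=float).reshape(-1, sub_dim)   # 60 x 8
    s = np.asarray(stored_480d, dtype=float).reshape(-1, sub_dim)  # 60 x 8
    return float(np.sum(np.linalg.norm(q - s, axis=1) ** 2))
```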
Fig. 20 shows a geometric view of the distance between one sub-Feature vector of the quantized Feature vector corresponding to Feature400 and the corresponding sub-Feature vector of the Feature vector Feature600 to be queried. Assuming that the distance |δp| between a sub-Feature vector of the quantized Feature vector corresponding to Feature400 and the corresponding sub-Feature vector of Feature400 itself is negligibly small, the sum of |δpy|² (the squared distance between a sub-Feature vector of Feature600 and the corresponding sub-Feature vector of the quantized Feature vector of Feature400) and |δp|² is approximately equal to the squared distance between that sub-Feature vector of Feature600 and the corresponding sub-Feature vector of Feature400, which corresponds to the following formula (2):

Σᵢ |Feature600ᵢ − Feature400ᵢ|² ≈ Σᵢ (|δpyᵢ|² + |δpᵢ|²) ≈ Σᵢ |δpyᵢ|²   (2)

where the subscript i runs over the sub-Feature vectors; the left-hand side is the sum of the squared distances between each sub-Feature vector of the Feature vector Feature600 to be queried and the corresponding sub-Feature vector of Feature400; the middle term is the sum over the sub-Feature vectors of |δpyᵢ|² + |δpᵢ|²; and the right-hand side is the sum of the squared distances between each sub-Feature vector of Feature600 and the corresponding sub-Feature vector of the quantized Feature vector of Feature400.
Then, the distance between a sub-Feature vector of the Feature vector Feature600 to be queried and the corresponding sub-Feature vector of the quantized Feature vector of Feature400 is calculated according to the following formula (3).
|δpy|² = |yb|² + λ²·|c|² + λ·(|ya|² − |yb|² − |c|²)   (3)
Here, assuming that one sub-Feature vector of the quantized Feature vector corresponding to Feature400 is characterized by ID2-1, ID2-5 and λ, ya and yb respectively represent the distances from the corresponding sub-Feature vector of the Feature vector Feature600 to be queried to ID2-1 and ID2-5, and c represents the distance between ID2-1 and ID2-5. The distances between the Feature vector Feature600 to be queried and Feature400 and Feature 500 can be calculated through the above formula (2) and formula (3), and the original Feature vector with the shortest distance is output to the user as the retrieval result.
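For illustration, the following Python sketch (not part of the original disclosure) evaluates formula (3) for one sub-Feature vector and sums the per-sub-vector results as in formula (2). It assumes, consistently with the form of formula (3), that the quantized sub-vector lies at point_b + λ·(point_a − point_b), where point_a and point_b stand for quantization points such as ID2-1 and ID2-5; all names are illustrative.

```python
import numpy as np

def sub_distance_sq(query_sub, point_a, point_b, lam):
    """Formula (3): |dpy|^2 = |yb|^2 + lam^2*|c|^2 + lam*(|ya|^2 - |yb|^2 - |c|^2),
    where ya, yb are the distances from the query sub-vector to point_a and
    point_b, and c is the distance between point_a and point_b."""
    ya = np.linalg.norm(query_sub - point_a)
    yb = np.linalg.norm(query_sub - point_b)
    c = np.linalg.norm(point_a - point_b)
    return yb**2 + lam**2 * c**2 + lam * (ya**2 - yb**2 - c**2)

def query_to_quantized_distance_sq(query_subs, quantized_subs):
    """Formula (2): approximate the distance between the query vector and a
    stored vector by the sum of |dpy|^2 over its sub-vectors (|dp|^2 assumed
    negligible).  quantized_subs is a list of (point_a, point_b, lam) triples."""
    return sum(sub_distance_sq(q, a, b, lam)
               for q, (a, b, lam) in zip(query_subs, quantized_subs))
```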
With the scheme of the second embodiment, during retrieval it is not necessary to traverse all the indexed quantized feature vectors in the database. Instead, the quantization points (top-level and sub-level quantization points) that are close both to the feature vector to be queried and to the quantized feature vectors are used as an intermediary to find the set of quantized feature vectors likely to be close to the feature vector to be queried, and the retrieval is completed within a limited number of indexing operations, so that the amount of computation is effectively reduced and retrieval efficiency is improved while retrieval accuracy is ensured.
The image retrieval system of the present application is described below with reference to fig. 21. The image retrieval system includes an image analysis device 1001, a quantization device 1002, a memory 1003, and a retrieval device 1004. The image analysis device 1001 is configured to receive a human body image directly or via a network from an external device (such as a camera) and then extract the original feature vector of the human body image. The quantization device 1002 is configured to perform quantization processing on the original feature vector extracted by the image analysis device 1001, for example the quantization processing described in the first embodiment, and to store the quantized feature vector in the memory 1003. The retrieval device 1004 is configured to, upon receiving the feature vector to be queried, retrieve (for example, using the retrieval described in the second embodiment) the quantized feature vectors whose similarity satisfies the requirement from the memory 1003 and present them to the user as the retrieval result, thereby completing the image retrieval process.
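As a minimal, purely illustrative sketch of how the four components of fig. 21 might be wired together (class and method names are assumptions, not part of the original disclosure):

```python
class ImageRetrievalSystem:
    """Ties together the image analysis device 1001, quantization device 1002,
    memory 1003 and retrieval device 1004; each collaborator is a placeholder
    for the processing of the first and second embodiments."""
    def __init__(self, analyzer, quantizer, retriever):
        self.analyzer = analyzer     # extracts original feature vectors
        self.quantizer = quantizer   # quantization processing (first embodiment)
        self.retriever = retriever   # retrieval processing (second embodiment)
        self.memory = []             # memory 1003: quantized feature vectors

    def index_image(self, image):
        feature = self.analyzer.extract(image)
        self.memory.append(self.quantizer.quantize(feature))

    def search(self, query_feature):
        return self.retriever.retrieve(query_feature, self.memory)
```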
The image retrieval system of the present application includes, in addition to the above components, components required for system operation, such as a power supply unit, a processor (CPU), a network interface, an I/O interface, a bus, and the like, which are not described herein again.
The quantizing means and the retrieving means are described in detail below, respectively.
Embodiment Three
Fig. 22 is a schematic diagram of a quantization apparatus according to a third embodiment of the present application, where the quantization apparatus includes: a setting unit 2001, a selecting unit 2002, and a quantizing unit 2003, wherein the setting unit 2001 is configured to set a quantization point in a feature space according to a codebook set in advance; the selecting unit 2002 is configured to select a part of the quantization points from the set quantization points as a subset of quantization points, wherein a distance between a quantization point in the subset of quantization points and the original feature vector is smaller than a first predetermined distance; the quantization unit 2003 is configured to determine a quantized feature vector corresponding to the original feature vector using at least two quantization points, of which there are quantization points from the subset of quantization points.
The setting unit 2001 may perform the process of setting quantization points in step S10 in embodiment one, the selection unit 2002 may perform the process of selecting a subset of quantization points in step S20 in embodiment one, and the quantization unit 2003 may perform the quantization process in step S30 in embodiment one.
Further, the quantization apparatus may further include a storage processing unit 2004 configured to store the quantized feature vectors obtained after quantization into the memory 1003 in a certain structure. The storage processing unit 2004 may perform the storage processing of step S40 in the first embodiment, storing the quantized feature vector into the memory 1003 in a structure such as that shown in fig. 17.
Embodiment Four
Fig. 23 is a schematic structural diagram of a retrieval apparatus according to a fourth embodiment of the present application, where the retrieval apparatus includes: a calculation unit 3001 and a determination unit 3002, wherein the calculation unit 3001 is configured to calculate the distance between the feature vector to be queried and the quantized feature vectors stored in the database; and the determination unit 3002 is configured to determine at least one quantized feature vector closest to the feature vector to be queried as the retrieval result.
Further, the calculating unit 3001 specifically includes: a quantization point determination subunit 3001-a configured to determine a top-level quantization point whose distance from the feature vector to be queried is less than a second predetermined distance and determine a sub-level quantization point whose distance from the feature vector to be queried is less than a third predetermined distance among sub-level quantization points corresponding to the top-level quantization point; a quantization feature vector determination subunit 3001-b configured to determine a quantization feature vector resulting from the top-layer quantization point or the sub-layer quantization point determined by the quantization point determination subunit 3001-a; a computing subunit 3001-c configured to compute a distance between the feature vector to be queried and the quantized feature vector determined by the quantized feature vector determining subunit 3001-b.
Other embodiments
Embodiments of the invention may also be implemented by a computer of a system or apparatus that reads and executes computer-executable instructions (e.g., one or more programs) recorded on a storage medium (also referred to more fully as a "non-transitory computer-readable storage medium") to perform the functions of one or more of the above-described embodiments and/or that includes one or more circuits (e.g., an application-specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiments, and by a method performed by a computer of a system or apparatus by, for example, reading and executing the computer-executable instructions from the storage medium to perform the functions of one or more of the above-described embodiments and/or controlling one or more circuits to perform the functions of one or more of the above-described embodiments. The computer may include one or more processors (e.g., a Central Processing Unit (CPU), Micro Processing Unit (MPU)) and may include a separate computer or a network of separate processors to read out and execute computer-executable instructions. The computer-executable instructions may be provided to the computer from, for example, a network or a storage medium. The storage medium may include, for example, one or more of a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), storage of a distributed computing system, an optical disk such as a Compact Disk (CD), a Digital Versatile Disk (DVD), or a blu-ray disk (BD) (registered trademark), a flash memory device, a memory card, and the like.
The embodiments of the present invention can also be realized by a method in which software (programs) that performs the functions of the above-described embodiments is supplied to a system or an apparatus through a network or various storage media, and a computer, a Central Processing Unit (CPU), or a Micro Processing Unit (MPU) of the system or the apparatus reads out and executes the programs.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Claims (20)

1. A quantization method, the quantization method comprising:
setting a quantization point;
selecting at least one quantization point with a distance from the original feature vector smaller than a first preset distance from the set quantization points as a quantization point subset;
determining a quantized feature vector corresponding to the original feature vector using at least two quantization points, wherein there are quantization points from the subset of quantization points in the at least two quantization points.
2. The quantization method according to claim 1, wherein selecting the subset of quantization points from the set quantization points comprises:
a subset of quantization points is selected from the quantization points arranged in a hierarchical structure.
3. The quantization method according to claim 2, wherein the quantization points arranged in a hierarchical structure include a top-level quantization point and a sub-level quantization point corresponding to the top-level quantization point;
selecting the subset of quantization points from the set quantization points, specifically comprising:
calculating the distance between the original characteristic vector and each top-layer quantization point;
and selecting at least one top-layer quantization point with the distance from the original characteristic vector meeting the set distance requirement and a sub-layer quantization point corresponding to the top-layer quantization point as the quantization point subset.
4. The quantization method according to claim 3, wherein at least one top-level quantization point whose distance from the original feature vector satisfies a set distance requirement and a partial set of the sub-level quantization points corresponding to the top-level quantization point are selected as the quantization point subset.
5. The quantization method according to claim 1, wherein, when there are more than two quantization points for determining the quantized feature vector corresponding to the original feature vector, the quantized feature vectors obtained based on any two quantization points are determined, and one quantized feature vector is selected from the determined quantized feature vectors as the quantized feature vector corresponding to the original feature vector.
6. The quantization method according to claim 1, wherein determining the quantized feature vector corresponding to the original feature vector using at least two quantization points specifically includes:
connecting any two quantization points in the at least two quantization points pairwise, and respectively calculating the vertical projection of the original characteristic vector on each connecting line;
and determining a connecting line with the projection distance meeting the requirement, and determining the projection of the original characteristic vector on the connecting line with the projection distance meeting the requirement as a quantized characteristic vector corresponding to the original characteristic vector.
7. The quantization method according to claim 1, wherein when at least two quantization points for determining a quantized feature vector corresponding to an original feature vector are both from the subset of quantization points, determining the quantized feature vector corresponding to the original feature vector comprises:
selecting at least one quantization point from the subset of quantization points, the distance between which and the original feature vector meets the condition;
connecting each quantization point selected from the quantization point subset with other quantization points in the quantization point subset respectively, and calculating the vertical projection of the original characteristic vector on each connecting line;
and determining a connecting line with the projection distance meeting the requirement, and determining the projection of the original characteristic vector on the connecting line with the projection distance meeting the requirement as a quantized characteristic vector corresponding to the original characteristic vector.
8. The quantization method of claim 7, wherein connecting each quantization point selected from the subset of quantization points to other quantization points in the subset of quantization points, respectively, and calculating a vertical projection of the original feature vector on each connection line comprises:
sequentially connecting each quantization point selected from the quantization point subset with other quantization points in the quantization point subset according to the sequence of the distances from the quantization points to the original characteristic vector from near to far;
and when the calculation times of the vertical projection of the original feature vector on the connecting line exceed the set maximum value, finishing the calculation of the vertical projection of the original feature vector.
9. The quantization method of claim 6, wherein the determined quantized feature vector is characterized by information of the at least two quantization points used for determining the quantized feature vector corresponding to the original feature vector, and by at least one parameter.
10. The quantization method according to claim 9, wherein the parameter is a ratio, taking direction into account, of the distance between the quantized feature vector obtained by projecting the original feature vector onto a connecting line formed by two quantization points and the closer of the two quantization points, to the distance between the two quantization points, and the number of parameters is equal to the number of connecting lines formed by the at least two quantization points.
11. The quantization method according to claim 10, wherein values of the preset parameters are distributed in a specific region of the feature space, and if a calculated value of the parameter determined during the vertical projection calculation is in the specific region, the value of the preset parameter closest to the calculated value of the parameter is used as a parameter for representing the quantized feature vector;
and if the calculated value of the parameter determined during the vertical projection calculation is not in the specific area, taking the value which is closer to the calculated value of the parameter in the upper limit value and the lower limit value of the specific area as the parameter for representing the quantized feature vector.
12. The quantization method according to claim 3, wherein determining the quantized feature vector corresponding to the original feature vector using at least two quantization points specifically includes:
when a top-level quantization point from the quantization point subset exists in quantization points used for determining a quantization feature vector corresponding to an original feature vector, and when the original feature vector is divided into a plurality of sub-feature vectors, each sub-feature vector divided by the original feature vector is subjected to quantization processing by using the top-level quantization point and a corresponding sub-level quantization point.
13. The quantization method of claim 12, wherein the method further comprises:
storing the determined quantized feature vector by:
storing information of the two quantization points used for determining the quantized feature vector corresponding to the original feature vector, and the information of the two quantization points and the parameter used when performing the quantization processing on each sub-feature vector.
14. The quantization method according to claim 13, wherein the parameter is a ratio, taking direction into account, of the distance between the sub-feature vector and the closer of the two quantization points used when performing the quantization processing, to the distance between the two quantization points.
15. A retrieval method, the retrieval method comprising:
calculating the distance between the feature vector to be queried and a quantized feature vector in a database quantized by the quantization method of any one of claims 1 to 14;
and determining at least one quantized feature vector with the distance meeting the condition as a retrieval result.
16. The retrieval method according to claim 15, wherein calculating the distance between the feature vector to be queried and the quantized feature vector specifically comprises:
determining a top-layer quantization point which is arranged in the feature space and has a distance with the feature vector to be queried smaller than a second preset distance, and determining a sub-layer quantization point which is arranged in the sub-layer quantization point corresponding to the top-layer quantization point and has a distance with the feature vector to be queried smaller than a third preset distance;
determining a quantization feature vector obtained by using the determined top layer quantization point and/or the determined secondary layer quantization point;
and calculating the distance between the feature vector to be queried and the determined quantized feature vector.
17. A quantization apparatus, the quantization apparatus comprising:
a setting unit configured to set a quantization point;
a selection unit configured to select, from the set quantization points, at least one quantization point whose distance from the original feature vector is smaller than a first predetermined distance as a quantization point subset;
a quantization unit configured to determine a quantized feature vector corresponding to the original feature vector using at least two quantization points, wherein there are quantization points from the subset of quantization points in the at least two quantization points.
18. A retrieval device, the retrieval device comprising:
a calculation unit configured to calculate a distance between the feature vector to be queried and a quantized feature vector stored in a database and quantized using the quantization apparatus of claim 17;
a determination unit configured to determine at least one quantized feature vector for which the distance satisfies a condition as a retrieval result.
19. A non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a quantization method, the quantization method comprising:
setting a quantization point;
selecting at least one quantization point with a distance from the original feature vector smaller than a first preset distance from the set quantization points as a quantization point subset;
determining a quantized feature vector corresponding to the original feature vector using at least two quantization points, wherein there are quantization points from the subset of quantization points in the at least two quantization points.
20. A non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform a retrieval method, the retrieval method comprising:
calculating the distance between the feature vector to be queried and a quantized feature vector in a database quantized by the quantization method of any one of claims 1 to 14;
and determining at least one quantized feature vector with the distance meeting the condition as a retrieval result.
CN201910126323.8A 2018-06-07 2019-02-20 Feature vector quantization method, feature vector search method, feature vector quantization device, feature vector search device, and storage medium Pending CN110647644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/431,520 US11308152B2 (en) 2018-06-07 2019-06-04 Quantization method for feature vector, search method, apparatus and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018105819192 2018-06-07
CN201810581919 2018-06-07

Publications (1)

Publication Number Publication Date
CN110647644A true CN110647644A (en) 2020-01-03

Family

ID=69009281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910126323.8A Pending CN110647644A (en) 2018-06-07 2019-02-20 Feature vector quantization method, feature vector search method, feature vector quantization device, feature vector search device, and storage medium

Country Status (1)

Country Link
CN (1) CN110647644A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668632A (en) * 2020-12-25 2021-04-16 浙江大华技术股份有限公司 Data processing method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090034805A1 (en) * 2006-05-10 2009-02-05 Aol Llc Using Relevance Feedback In Face Recognition
US20140270541A1 (en) * 2013-03-12 2014-09-18 Electronics And Telecommunications Research Institute Apparatus and method for processing image based on feature point
US20150280800A1 (en) * 2012-10-22 2015-10-01 Zte Corporation Method and device for performing codebook processing on channel information

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090034805A1 (en) * 2006-05-10 2009-02-05 Aol Llc Using Relevance Feedback In Face Recognition
US20150280800A1 (en) * 2012-10-22 2015-10-01 Zte Corporation Method and device for performing codebook processing on channel information
US20140270541A1 (en) * 2013-03-12 2014-09-18 Electronics And Telecommunications Research Institute Apparatus and method for processing image based on feature point

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chang, Chin-Chen, and Wen-Chuan Wu, "Fast planar-oriented ripple search algorithm for hyperspace VQ codebook," IEEE Transactions on Image Processing *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668632A (en) * 2020-12-25 2021-04-16 浙江大华技术股份有限公司 Data processing method and device, computer equipment and storage medium
CN112668632B (en) * 2020-12-25 2022-04-08 浙江大华技术股份有限公司 Data processing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US20240152729A1 (en) Convolutional neural network (cnn) processing method and apparatus performing high-speed and precision convolution operations
Garcia et al. K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching
US9864928B2 (en) Compact and robust signature for large scale visual search, retrieval and classification
JP6721681B2 (en) Method and apparatus for performing parallel search operations
US11755880B2 (en) Method and apparatus for optimizing and applying multilayer neural network model, and storage medium
US9563822B2 (en) Learning apparatus, density measuring apparatus, learning method, computer program product, and density measuring system
US20160259815A1 (en) Large scale image recognition using global signatures and local feature information
WO2013129580A1 (en) Approximate nearest neighbor search device, approximate nearest neighbor search method, and program
CN108629345B (en) High-dimensional image feature matching method and device
CN110825894A (en) Data index establishing method, data index retrieving method, data index establishing device, data index retrieving device, data index establishing equipment and storage medium
CN111008620A (en) Target user identification method and device, storage medium and electronic equipment
CN111177438A (en) Image characteristic value searching method and device, electronic equipment and storage medium
CN112825199B (en) Collision detection method, device, equipment and storage medium
CN110647644A (en) Feature vector quantization method, feature vector search method, feature vector quantization device, feature vector search device, and storage medium
US11308152B2 (en) Quantization method for feature vector, search method, apparatus and storage medium
CN110442749B (en) Video frame processing method and device
CN111611228B (en) Load balancing adjustment method and device based on distributed database
US11361003B2 (en) Data clustering and visualization with determined group number
CN113065036A (en) Method and device for measuring performance of space supporting point and related components
CN110209895B (en) Vector retrieval method, device and equipment
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
CN113159211B (en) Method, computing device and computer storage medium for similar image retrieval
KR102433384B1 (en) Apparatus and method for processing texture image
Mulzer et al. Approximate k-flat nearest neighbor search
JP6988991B2 (en) Semantic estimation system, method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200103