CN110659375A - Hash model training method, similar object retrieval method and device

Hash model training method, similar object retrieval method and device

Info

Publication number
CN110659375A
Authority
CN
China
Prior art keywords
neighborhood
hamming
layer
data point
pyramid
Prior art date
Legal status
Pending
Application number
CN201910892285.7A
Other languages
Chinese (zh)
Inventor
Wengang Zhou (周文罡)
Min Wang (王敏)
Houqiang Li (李厚强)
Qi Tian (田奇)
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201910892285.7A
Publication of CN110659375A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a Hash model training method, a similar object retrieval method and a similar object retrieval device. A multilayer neighborhood pyramid is constructed for each data point by gradually increasing the number of neighbor points. In each layer of the neighborhood pyramid, the average distance from the neighbor points in that layer to the reference point is taken as the Euclidean neighborhood measure of that layer. The data points in the original space are mapped to the Hamming space by the Hash model, and the Hamming neighborhood measure of each layer of the pyramid is calculated. The optimization objective of the Hash model is to preserve the neighborhood measures of the original space in the Hamming space. This objective preserves both the distance distribution of the true neighbor points and the ranking of the neighbors, yielding better distance preservation and thereby improving the accuracy of approximate nearest neighbor retrieval. Because the Hamming-space feature vector of an object obtained by the Hash model better preserves the object's features in the Euclidean space, the method also improves the accuracy of similarity retrieval.

Description

Hash model training method, similar object retrieval method and device
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a Hash model training method, an image retrieval method and an image retrieval device.
Background
With the rapid growth of multimedia data size, the approximate nearest neighbor search technology is widely applied to computer vision and image processing. Given a query sample, the approximate nearest neighbor search technique can find the true nearest neighbor of the query sample from a large-scale dataset with high probability, and the retrieval time complexity is linear or even constant.
A binary hash method in approximate nearest neighbor search maps high-dimensional data points in the original space to low-dimensional binary codes in the Hamming space through distance-preserving or semantic-similarity constraints. Binary codes greatly reduce storage overhead, and Hamming distances between binary codes can be computed efficiently on modern CPU architectures, so binary hashing offers low storage consumption and fast distance computation. For these reasons, binary hash methods have been widely studied and applied.
The unsupervised hash method performs hash function learning using training data that does not contain any label information. Although existing unsupervised hashing methods exhibit potential retrieval performance on public data sets, how to obtain higher approximate nearest neighbor retrieval accuracy on large-scale data sets remains an issue to be solved.
Disclosure of Invention
In view of the above, the present invention provides a hash model training method, an object retrieval method and an object retrieval device, so as to improve the approximate nearest neighbor retrieval accuracy of the hash method, and the specific technical solution is as follows:
In a first aspect, the present invention provides a hash model training method, including:
for each data point in the training data set, constructing a neighborhood pyramid, wherein each layer in the neighborhood pyramid is a neighbor point of the data point, and the number of the neighbor points is increased from top to bottom layer by layer;
calculating the distance between the neighbor point of each layer in the neighborhood pyramid and the data point to obtain the Euclidean neighborhood measure of the neighborhood of the layer;
mapping the data points in the data set to be trained to a Hamming space by using a current Hash model to obtain Hamming space vectors corresponding to the data points;
calculating hamming neighborhood measures corresponding to each layer of neighborhood in the neighborhood pyramid by using hamming space vectors corresponding to the data points;
and optimizing the model parameters in the current Hash model according to the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached, to obtain a target Hash model.
In one possible implementation, for each data point in the training dataset, constructing a neighborhood pyramid includes:
for each data point, calculating a distance between the data point and all other data points in the training data set;
sequencing all data points in the training data set according to the sequence of the distances from small to large;
sequentially dividing the sorted data points into N data point groups, wherein each data point group comprises a preset number of data points;
and sequentially determining the data point groups contained in each layer of the neighborhood pyramid according to the sequence of the arrangement indexes of the data point groups from small to large, wherein each layer of neighborhood comprises a preset number of data point groups from the first group, and the preset number is equal to the number of layers of the neighborhood in the neighborhood pyramid.
In a possible implementation manner, calculating a distance between a neighboring point of each layer in the neighborhood pyramid and the data point to obtain a euclidean neighborhood measure of the neighborhood of the layer includes:
and calculating the average Euclidean distance between the neighbor point contained in each layer in the neighborhood pyramid and the data point, and taking the average Euclidean distance as the Euclidean neighborhood measure of the neighborhood of the layer.
In one possible implementation, calculating a hamming neighborhood measure corresponding to each layer in the neighborhood pyramid by using the hamming space vector corresponding to the data point includes:
and calculating the average Hamming distance between the Hamming space vector corresponding to the data point and the Hamming space vector corresponding to the neighbor point contained in each layer of neighborhood to obtain the Hamming neighborhood measure corresponding to the layer of neighborhood of the data point.
In a possible implementation manner, optimizing the model parameters in the current hash model according to the euclidean neighborhood measure and the hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached to obtain a target hash model, including:
performing linear fitting on the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point in the training data set to obtain linear transformation parameters;
and alternately updating the model parameters of the current hash model and the linear transformation parameters to minimize the fitting error and obtain the target hash model.
In a second aspect, the present invention further provides a similar object retrieving method, including:
acquiring a Euclidean feature vector of a target object in a Euclidean space;
converting the Euclidean feature vector of the target object into a Hamming feature vector of the Hamming space according to a Hash model obtained by pre-training;
obtaining a Hamming distance between the target object and each object in the object set to be retrieved based on the Hamming feature vector of the target object and the Hamming feature vector of each object in the object set to be retrieved, wherein the Hamming feature vector of each object in the object set to be retrieved is obtained by converting the Euclidean feature vector of each object according to the Hash model;
and determining objects similar to the target object from the object set to be retrieved based on the Hamming distance between the target object and each object in the object set to be retrieved.
In a possible implementation manner, the determining, from the set of objects to be retrieved, an object similar to the target object based on a hamming distance between the target object and each object in the set of objects to be retrieved includes:
and determining a pre-specified number of objects as objects similar to the target object, in ascending order of the Hamming distance between the target object and each object in the object set to be retrieved.
In a third aspect, the present invention further provides a hash model training apparatus, including:
the neighborhood pyramid building module is used for building a neighborhood pyramid for each data point in the training data set, wherein each layer in the neighborhood pyramid is a neighbor point of the data point, and the number of the neighbor points is increased layer by layer from top to bottom;
the Euclidean neighborhood measure calculating module is used for calculating the distance between the neighbor point of each layer in the neighborhood pyramid and the data point to obtain the Euclidean neighborhood measure of the neighborhood of the layer;
the vector mapping module is used for mapping the data points in the data set to be trained to a Hamming space by using a current Hash model to obtain Hamming space vectors corresponding to the data points;
the Hamming neighborhood measure calculating module is used for calculating the Hamming neighborhood measure corresponding to each layer of neighborhood in the neighborhood pyramid by utilizing the Hamming space vector corresponding to the data point;
and the model adjusting module is used for optimizing the model parameters in the current Hash model according to the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached, to obtain the target Hash model.
In a possible implementation manner, the neighborhood pyramid constructing module is specifically configured to:
for each data point, calculating a distance between the data point and all other data points in the training data set;
sequencing all data points in the training data set according to the sequence of the distances from small to large;
sequentially dividing the sorted data points into N data point groups, wherein each data point group comprises a preset number of data points;
and sequentially determining the data point groups contained in each layer of the neighborhood pyramid according to the sequence of the arrangement indexes of the data point groups from small to large, wherein each layer of neighborhood comprises a preset number of data point groups from the first group, and the preset number is equal to the number of layers of the neighborhood in the neighborhood pyramid.
In a fourth aspect, the present invention further provides a similar object retrieving device, including:
the Euclidean feature vector acquisition module is used for acquiring the Euclidean feature vector of the target object in the Euclidean space;
the vector mapping module is used for converting the Euclidean feature vector of the target object into a Hamming feature vector of the Hamming space according to a Hash model obtained by pre-training;
a hamming distance calculation module, configured to obtain hamming distances between the target object and each object in the object set to be retrieved based on the hamming feature vector of the target object and the hamming feature vector of each object in the object set to be retrieved, where the hamming feature vector of each object in the object set to be retrieved is obtained by converting the euclidean feature vector of each object according to the hash model;
and the determining module is used for determining the objects similar to the target object from the object set to be retrieved based on the Hamming distance between the target object and each object in the object set to be retrieved.
According to the Hash model training method provided by the invention, the multilayer neighborhood pyramid is constructed by gradually increasing the number of the neighbor points. In each layer of the neighborhood pyramid, the average distance from the neighbor point in the layer to the reference point is used as the Euclidean neighborhood measure of the neighborhood of the layer. And mapping the data points in the original space to a Hamming space by using a Hash model, and calculating the Hamming neighborhood measure of each layer of the pyramid. The optimization target of the Hash model is to keep neighborhood measure in an original space in a Hamming space, the optimization target can not only keep the distance distribution of real neighbor points, but also keep the sequencing of the neighbor points, and finally obtain better distance keeping, thereby improving the accuracy of approximate nearest neighbor retrieval.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a Hash model training method provided by the present invention;
FIG. 2 is a schematic diagram of a neighborhood of data points according to the present invention;
FIG. 3 is a flow chart of a similar object retrieval method provided by the present invention;
FIG. 4 is a block diagram of a hash model training apparatus provided in the present invention;
fig. 5 is a block diagram of a similar object searching apparatus according to the present invention.
Detailed Description
The goal of conventional unsupervised hashing methods is to achieve distance preservation between the original feature space (usually the euclidean space) and the hamming space. In order to enable an unsupervised hash method to obtain higher approximate nearest neighbor retrieval accuracy on a large-scale data set, the invention provides a hash model training method, which solves the problem of accuracy from the perspective of neighborhood preservation. The method constructs a multilayer neighborhood pyramid by gradually increasing the number of neighbor points. In each layer of the neighborhood pyramid, the average distance from the neighbor point in the layer to the reference point is used as the Euclidean neighborhood measure of the neighborhood of the layer. And mapping the data points in the original space to a Hamming space by using a Hash model, and calculating the Hamming neighborhood measure of each layer of the pyramid. The optimization target of the Hash model is to keep neighborhood measure in an original space in a Hamming space, the optimization target can not only keep the distance distribution of real neighbor points, but also keep the sequencing of neighbors, and finally obtain better distance keeping, thereby obtaining higher approximate nearest neighbor retrieval performance.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a hash model training method provided by the present invention is shown, where the method is applied to a computing device, and the computing device may be a server, a PC, a PDA, a mobile phone, or the like.
As shown in fig. 1, the method may include the steps of:
s110, constructing a neighborhood pyramid for each data point in the training data set.
Each layer in the neighborhood pyramid is a neighbor point of the data point, and the number of the neighbor points is increased from top to bottom layer by layer.
The process of constructing the neighborhood pyramid may include the following steps:
(1) for each data point, calculating a distance between the data point and all other data points in the training data set;
(2) sequencing all data points in the training data set according to the sequence of the distances from small to large;
(3) sequentially dividing the sorted data points into N data point groups, wherein each data point group comprises a preset number of data points;
(4) and sequentially determining the data point groups contained in each layer of the neighborhood pyramid according to the sequence of the arrangement indexes of the data point groups from small to large, wherein each layer of neighborhood comprises a preset number of data point groups from the first group, and the preset number is equal to the number of layers of the neighborhood in the neighborhood pyramid. For example, a layer 1 neighborhood includes a first set of data points, a layer 2 neighborhood includes a first set and a second set of data points, i.e., the first two sets of data points, a layer 3 neighborhood includes a first, second, and third set of data points, i.e., the first three sets of data points, and so on, and an nth layer includes the first n sets of data points.
The reference data point X_i is taken below as an example to illustrate the process of constructing the neighborhood pyramid:
First, the Euclidean distances between the reference data point X_i and all the data points in the training data set are calculated, and the training data points are sorted by these distances to obtain a sorted data point index list, recorded as L = {i_1, i_2, …, i_n}, where i denotes the index corresponding to a data point and n is the total number of data points contained in the training data set.
Then, the index list is divided into N segments of equal length, each containing the same number K of data point indexes, recorded respectively as:
L_1 = {i_1, i_2, …, i_K}
L_2 = {i_(K+1), i_(K+2), …, i_(2K)}
……
L_N = {i_((N−1)K+1), i_((N−1)K+2), …, i_(NK)}
The constructed neighborhood pyramid is P = {P_1, P_2, …, P_N}, where the l-th layer neighborhood P_l = L_1 ∪ L_2 ∪ … ∪ L_l consists of the first l index segments.
in the same way, the neighborhood pyramid of other data points in the training dataset is constructed, which is not described herein again.
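The construction steps above can be sketched as follows. This is a minimal illustration under NumPy, not the patent's own code; the function name and array layout are assumptions of this sketch.

```python
import numpy as np

def build_neighborhood_pyramid(X, i, K, N):
    """Construct the neighborhood pyramid of reference point X[i]:
    layer l (l = 1..N) contains the l*K nearest neighbors, i.e. the
    first l index segments of the sorted index list L."""
    d = np.linalg.norm(X - X[i], axis=1)   # Euclidean distances to X[i]
    d[i] = np.inf                          # exclude the reference point itself
    order = np.argsort(d)                  # sorted index list L = {i_1, ..., i_n}
    return [order[: l * K] for l in range(1, N + 1)]
```

Each layer is a superset of the layer above it, matching the nested structure P_l = L_1 ∪ … ∪ L_l described above.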
S120, calculating the distance between the neighbor point of each layer in the neighborhood pyramid and the data point to obtain the Euclidean neighborhood measure of the neighborhood of the layer.
The euclidean neighborhood measure is the average euclidean distance between the neighbor points and the reference data points contained in each layer of the neighborhood pyramid.
Taking the reference point i shown in fig. 2 as an example, the process of calculating the euclidean neighborhood measure of the neighborhood of each layer of the neighborhood pyramid is described as follows:
as shown in fig. 2, a two-layer neighborhood pyramid is constructed with reference point i as a reference point; wherein, the first layer neighborhood is data points in the neighborhood range with the radius r1, namely data points 1-3; the second layer neighborhood is the data points within the neighborhood of radius r2, data points 1-6.
Firstly, calculating the average Euclidean distance between the reference point i and three nearest neighbor data points as the Euclidean neighborhood measure of the first layer of the neighborhood pyramid; then, the average euclidean distance between the reference point i and its nearest six neighboring data points is calculated as the euclidean neighborhood measure of the second layer.
For the l-th layer neighborhood, the Euclidean neighborhood measure of data point X_i is denoted r_E(i, lK), where l = 1, 2, 3, ……, N. The Euclidean neighborhood measure of the pyramid of data point X_i, i.e., the vector of neighborhood measures of all layers of its neighborhood pyramid, is recorded as r_E(i) = [r_E(i, K), r_E(i, 2K), r_E(i, 3K), ……, r_E(i, NK)].
Each data point in the training data set is taken in turn as a reference point, its neighborhood pyramid is constructed, and the Euclidean neighborhood measures of the pyramid are obtained. The neighborhood measure matrix of the whole training data set is then constructed, recorded as R_E, whose i-th row is r_E(i).
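Under the same assumptions (NumPy, illustrative names), the Euclidean neighborhood measure matrix R_E can be computed as a sketch like this:

```python
import numpy as np

def euclidean_neighborhood_measures(X, K, N):
    """R_E: row i holds r_E(i, lK) for l = 1..N, the average Euclidean
    distance from X[i] to its lK nearest neighbors."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # exclude self-distances
    D_sorted = np.sort(D, axis=1)          # each row in ascending order
    R_E = np.empty((n, N))
    for l in range(1, N + 1):
        R_E[:, l - 1] = D_sorted[:, : l * K].mean(axis=1)
    return R_E
```

The pairwise distance matrix is O(n²) in memory; for a large training set one would compute it block-wise, but the measure itself is unchanged.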
S130, mapping the data points in the data set to be trained to a Hamming space by using the current Hash model to obtain Hamming space vectors of the data points.
For a data point in the original space (i.e., the Euclidean space), the hash model, which is a binary hash model, is used to map it to the Hamming space: the data point X_i in the original space generates the corresponding binary code b_i through the hash model.
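A sketch of this binary mapping with the linear hash function described later, b_i = (sgn(W·x_i) + 1)/2; NumPy and the function name are assumptions, and sgn(0) is mapped to bit 1 via a >= 0 test to keep the codes strictly binary:

```python
import numpy as np

def hash_codes(X, W):
    """Map Euclidean points to {0,1} binary codes with a linear hash
    function: b_i = (sgn(W x_i) + 1) / 2. W has shape (c, d) for c bits."""
    proj = X @ W.T                         # (n, c) projections W x_i
    return (proj >= 0).astype(np.uint8)    # sign >= 0 -> bit 1, else 0
```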
S140, calculating Hamming neighborhood measure corresponding to each layer of neighborhood in the neighborhood pyramid by using the Hamming space vector corresponding to the data point.
In the Hamming space, the average Hamming distance between the data point and the nearest k neighbor points is calculated to obtain the Hamming neighborhood measure of the k neighborhood of the data point.
Taking data point X_i as an example, the process of calculating its Hamming neighborhood measures is illustrated as follows:
Let the Hamming space vectors corresponding to data points X_i and X_j be b_i and b_j respectively; the Hamming distance between b_i and b_j is
d_H(b_i, b_j) = Σ_{m=1}^{c} |b_i(m) − b_j(m)|,
where c is the length of the binary code and b_i(m) denotes its m-th bit.
It should be noted that the hamming space vector may be a binary vector composed of a series of binary numbers, and in other embodiments, the hamming space vector may be other non-binary vectors.
The average Hamming distance between data point X_i and its k nearest neighbor data points in the Euclidean space is denoted r_H(i, k), and the calculation formula is:
r_H(i, k) = (1/k) Σ_{j=1}^{k} d_H(b_i, b_{i_j}),
where i_j is the index of the j-th nearest Euclidean neighbor of X_i.
Further, the Hamming neighborhood measure of the neighborhood pyramid of data point X_i is recorded as r_H(i) = [r_H(i, K), r_H(i, 2K), r_H(i, 3K), ……, r_H(i, NK)].
The Hamming neighborhood measures of the neighborhood pyramids of the entire training data set are recorded as the matrix R_H, whose i-th row is r_H(i).
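The Hamming neighborhood measures can be computed per point from the binary codes and the Euclidean neighbor ordering; a sketch under the same NumPy assumptions:

```python
import numpy as np

def hamming_neighborhood_measures(B, neighbor_order, K, N):
    """R_H: row i holds r_H(i, lK), the average Hamming distance from code
    B[i] to the codes of its lK nearest Euclidean neighbors, whose indices
    (nearest first) are given in neighbor_order[i]."""
    n = B.shape[0]
    R_H = np.empty((n, N))
    for i in range(n):
        # Hamming distances from B[i] to the codes of its sorted neighbors
        d = np.count_nonzero(B[neighbor_order[i]] != B[i], axis=1)
        for l in range(1, N + 1):
            R_H[i, l - 1] = d[: l * K].mean()
    return R_H
```

Note that the neighbor ordering comes from the Euclidean space, so R_H measures how well the codes reproduce the original neighborhood structure.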
S150, optimizing the model parameters in the current model according to the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached, and obtaining a target Hash model.
The learning objective of the Hash model provided by the invention is to preserve the neighborhood measures of the Euclidean space in the Hamming space, i.e., the neighborhood-measure-preserving objective function is min_{W,a,b} ||a·R_E + b − R_H||².
Here a and b denote the linear transformation coefficients between the Euclidean neighborhood measure matrix and the Hamming neighborhood measure matrix. The hash model of the invention adopts a linear hash function with projection matrix W, i.e., b_i = (sgn(W·x_i) + 1)/2, where sgn(·) denotes the sign function.
The neighborhood-measure-preserving objective function of the invention automatically combines a linear distance-preserving objective with a neighbor-structure-preserving objective. The objective function can be expanded into the form
min_{W,a,b} Σ_i Σ_{k∈S_k} (a·r_E(i, k) + b − r_H(i, k))²,
where S_k denotes the set of values of the neighbor count k, i.e., S_k = {K, 2K, 3K, …, NK}. According to the definition of the neighborhood measure, the objective function can be further expanded into a sum over individual neighbor points with weights w_k, where i_k denotes the k-th nearest neighbor point of the i-th reference data point X_i. Analyzing the structural definition of the neighborhood pyramid shows that w_1 > w_2 > …… > w_{NK} > 0, i.e., the neighborhood-measure-preserving objective automatically assigns larger weights to closer neighbor data points in the Euclidean space, which helps improve the precision of approximate nearest neighbor retrieval.
Since the average Hamming distance r_H(i, k) in the Hamming space is calculated from binary codes, the objective function contains discontinuous binary constraint terms and is difficult to optimize directly. To solve this problem, the embodiment of the invention removes the binary constraint terms and embeds the quantization error of binarization into the final objective function, which is shown in the following formula:
min_{W,a,b} ||a·R_E + b − R_H||² − α·||U − 0.5||² + β·||WᵀW − I||²   (Equation 5)
where 0.5 denotes a matrix whose entries are the constant 0.5, and U = sigmoid(W·X) is the projection of the training data feature matrix by the linear hash function W, without binarization, sigmoid(·) denoting the activation function. R_H here denotes the neighborhood measures of the neighborhood pyramid in the Hamming space computed using the floating-point vectors in U in place of the original binary codes. The second term minimizes the quantization error of binarization; the third term constrains the projection matrix W of the linear hash function to be an orthogonal matrix, maximizing the amount of information that different bits of the binary code can represent; I is the identity matrix. α and β are hyper-parameters that control the relative importance of the three parts of the objective function.
Even after removing the discontinuous binary constraint terms introduced by the binary codes, it is still difficult to optimize the parameters W, a and b simultaneously. This embodiment therefore optimizes them by alternating iteration: first the projection matrix W of the linear hash function is fixed and the linear transformation parameters a and b are optimized; then a and b are fixed and the projection matrix W is optimized. The two steps are iterated alternately until the set number of iterations is reached or the objective function value converges. The objective function optimization algorithm is detailed below:
(1) after the projection matrix W is fixed, the objective function shown in equation 5 can be organized as follows:
min_{a,b} ||a·R_E + b − R_H||²   (Equation 6)
The objective function is a linear regression function, and a and b can be solved directly by the least square method.
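The least-squares step of Equation 6 has a closed form: treating all entries of R_E and R_H as paired observations, a and b are obtained by ordinary linear regression. A sketch (NumPy, illustrative names):

```python
import numpy as np

def fit_linear_transform(R_E, R_H):
    """Solve min_{a,b} ||a * R_E + b - R_H||^2 by least squares."""
    x, y = R_E.ravel(), R_H.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b
```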
(2) When a and b are fixed, the objective function shown in equation 5 can be rearranged as follows:
min_W ||a·R_E + b − R_H||² − α·||U − 0.5||² + β·||WᵀW − I||²   (Equation 7)
The objective function shown in Equation 7 can be directly optimized by stochastic gradient descent. According to the chain rule, its gradient (Equation 8) is calculated through the intermediate variables u_i and u_j, the row vectors of the matrix U corresponding to data points X_i and X_j. Because the terms of Equation 8 involve the binary codes b_i and b_j, they cannot be differentiated directly. Since the embodiment of the invention applies the sigmoid(·) function in the process of generating the binary codes and uses the constraint of minimizing the quantization error, the sign function is simply discarded in the gradient calculation and u_i and u_j are used as approximate substitutes for b_i and b_j, which introduces only a minor error. The gradient ∂J/∂W of the objective function J with respect to the projection matrix W is thus obtained.
The projection matrix W may then be updated as
W ← W − δ·∂J/∂W,
where J denotes the objective function of Equation 7 and δ denotes the learning rate.
before the learning algorithm begins, W is randomly initialized to a random orthogonal matrix. The random gradient descent algorithm of the above formula is then repeated to update until the objective function converges or a specified number of iterations is reached.
The above two alternating iterative processes are repeated until convergence or a set number of iterations is reached.
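The alternating scheme above can be sketched end-to-end. The analytic chain-rule gradient of the patent is replaced here by a slow finite-difference gradient purely for illustration; the function names, hyper-parameter values, and the relaxed L1 distance |u_i − u_j| standing in for the Hamming distance are all assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relaxed_R_H(U, neighbor_order, K, N):
    """Hamming neighborhood measures computed from the relaxed codes U
    (L1 distances between sigmoid outputs replace true Hamming distances)."""
    n = U.shape[0]
    R_H = np.empty((n, N))
    for i in range(n):
        d = np.abs(U[neighbor_order[i]] - U[i]).sum(axis=1)
        for l in range(1, N + 1):
            R_H[i, l - 1] = d[: l * K].mean()
    return R_H

def objective(W, X, R_E, neighbor_order, K, N, a, b, alpha, beta):
    """Relaxed objective of Equation 5."""
    U = sigmoid(X @ W.T)
    R_H = relaxed_R_H(U, neighbor_order, K, N)
    return (np.sum((a * R_E + b - R_H) ** 2)
            - alpha * np.sum((U - 0.5) ** 2)                        # quantization term
            + beta * np.sum((W.T @ W - np.eye(W.shape[1])) ** 2))   # orthogonality term

def train(X, R_E, neighbor_order, K, N, c, iters=3, lr=1e-2,
          alpha=0.1, beta=0.1, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # random orthogonal initialization of the (c, d) projection matrix W (d >= c)
    W = np.linalg.qr(rng.standard_normal((X.shape[1], c)))[0].T
    a, b = 1.0, 0.0
    for _ in range(iters):
        # Step 1: fix W, solve (a, b) by least squares (Equation 6)
        R_H = relaxed_R_H(sigmoid(X @ W.T), neighbor_order, K, N)
        A = np.stack([R_E.ravel(), np.ones(R_E.size)], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, R_H.ravel(), rcond=None)
        # Step 2: fix (a, b), one finite-difference gradient step on W
        # (a stand-in for the analytic chain-rule gradient of Equation 8)
        G = np.zeros_like(W)
        f0 = objective(W, X, R_E, neighbor_order, K, N, a, b, alpha, beta)
        for idx in np.ndindex(*W.shape):
            Wp = W.copy()
            Wp[idx] += eps
            G[idx] = (objective(Wp, X, R_E, neighbor_order, K, N,
                                a, b, alpha, beta) - f0) / eps
        W = W - lr * G
    return W, a, b
```

In practice the analytic gradient and mini-batch sampling would replace the finite-difference loop, which is only tractable for toy-sized W.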
According to the Hash model training method provided by the invention, the multilayer neighborhood pyramid is constructed by gradually increasing the number of the neighbor points. In each layer of the neighborhood pyramid, the average distance from the neighbor point in the layer to the reference point is used as the Euclidean neighborhood measure of the neighborhood of the layer. And mapping the data points in the original space to a Hamming space by using a Hash model, and calculating the Hamming neighborhood measure of each layer of the pyramid. The optimization target of the Hash model is to keep neighborhood measure in an original space in a Hamming space, the optimization target can not only keep the distance distribution of real neighbor points, but also keep the sequencing of neighbors, and finally obtain better distance keeping, thereby improving the accuracy of approximate nearest neighbor retrieval.
On the other hand, the invention also provides an embodiment of a similar object retrieval method. Fig. 3 is a flowchart of the similar object retrieval method provided by the present invention, which is applied in a computer device to retrieve objects similar to a given target object from a set of objects to be retrieved. The method may comprise the following steps:
s210, obtaining the Euclidean feature vector of the target object in the Euclidean space.
The target object is the object given at retrieval time; it may be an image, a video, text, or the like. Any object that can be converted into a vector in Euclidean space can serve as the given object.
In a possible implementation manner, an existing feature extraction network can be selected for the type of the object to obtain a feature vector of the object in the euclidean space.
And S220, converting the Euclidean feature vector of the target object to obtain a Hamming feature vector of a Hamming space according to the Hash model obtained by pre-training.
The trained hash model herein refers to a hash model obtained by the above-mentioned hash model training method, and the hash model can be understood as a hash function. The hash function converts the Euclidean feature vector of an object into a Hamming feature vector; for example, the data point x_i in the object feature vector described above is converted into a binary vector.
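For illustration, assuming the hash model takes the common form of a linear projection followed by the sign function, h(x) = sgn(Wx) — consistent with the projection matrix W used during training, though the specific values below are made up — the conversion to a binary vector might be sketched as:

```python
import numpy as np

def hash_encode(W, x):
    """Map a Euclidean feature vector x to a binary Hamming-space vector.

    Implements h(x) = sgn(Wx), reporting each bit as 0/1."""
    return (W @ x > 0).astype(np.uint8)

# Illustrative projection matrix and feature vector.
W = np.array([[1.0, -1.0], [0.5, 2.0]])
x = np.array([0.3, -0.4])
code = hash_encode(W, x)  # Wx = [0.7, -0.65] -> bits [1, 0]
```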
S230, obtaining the Hamming distance between the target object and each object in the object set to be retrieved based on the Hamming feature vector of the target object and the Hamming feature vector of each object in the object set to be retrieved.
The Hamming feature vector of each object in the object set to be retrieved is obtained by converting the Euclidean feature vector of each object according to a Hash model in advance.
Then, the distance between the hamming feature vector of the target object and the hamming feature vector of each object in the object set to be retrieved, namely the hamming distance, is calculated.
It should be noted that the Hamming feature vectors of the objects in the object set to be retrieved may be converted in advance, before retrieval is performed; that is, this is a preprocessing step. The Hamming feature vector of each object in the set needs to be computed only once and can then be used directly whenever similar objects of a target object are subsequently retrieved.
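This preprocessing idea can be sketched as follows: binary codes are packed into integers once, and each query then costs only an XOR and a popcount per database object (the codes shown are illustrative):

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two binary codes packed into integers:
    XOR the codes, then count the differing bits."""
    return bin(a ^ b).count("1")

# Database codes are converted once, in preprocessing, and reused for every query.
database_codes = [0b1010, 0b1111, 0b0001]
query_code = 0b1011
dists = [hamming_distance(query_code, c) for c in database_codes]  # [1, 1, 2]
```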
When the object in the object set to be retrieved is updated, the hamming feature vector corresponding to the object needs to be updated.
S240, determining objects similar to the target object from the object set to be retrieved based on the Hamming distance between the target object and each object in the object set to be retrieved.
And finally, determining objects similar to the target object according to the Hamming distance between the target object and each object to be retrieved.
In a possible implementation manner, the objects to be retrieved may be sorted in ascending order of Hamming distance, and the first specified number of them determined as objects similar to the target object. One or more objects may be selected according to actual requirements.
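A minimal sketch of this selection step (the helper name `top_k_similar` is ours, not from the invention):

```python
def top_k_similar(dists, k):
    """Indices of the k database objects with the smallest Hamming
    distance to the query, in ascending order of distance."""
    return sorted(range(len(dists)), key=lambda i: dists[i])[:k]

# Distances of four database objects to the query; objects 1 and 2 are closest.
result = top_k_similar([3, 0, 2, 5], k=2)  # [1, 2]
```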
It should be noted that a smaller hamming distance indicates that the two objects are more similar.
In the similar object retrieval method provided by this embodiment, the Euclidean feature vector of an object is mapped by the trained hash function to obtain its Hamming feature vector; the Hamming distances between objects are then calculated from these vectors, and objects similar to the given object are determined according to the Hamming distance. Computing a Hamming distance is much cheaper than computing a Euclidean distance, so the retrieval speed can be improved by using the method. Meanwhile, the trained hash model preserves both the distance distribution and the ordering of the neighbor points in Euclidean space, so the Hamming-space feature vector obtained by the hash model better retains the object's features in Euclidean space, and the accuracy of similarity retrieval can therefore also be improved by using the method.
In another aspect, the present invention further provides an embodiment of a hash model training apparatus.
Referring to fig. 4, a block diagram of a hash model training apparatus provided in the present invention is shown, and as shown in fig. 4, the apparatus includes: the system comprises a neighborhood pyramid construction module 110, a Euclidean neighborhood measure calculation module 120, a vector mapping module 130, a Hamming neighborhood measure calculation module 140 and a model adjustment module 150.
A neighborhood pyramid construction module 110, configured to construct a neighborhood pyramid for each data point in the training data set.
Each layer in the neighborhood pyramid contains neighbor points of the data point, and the number of neighbor points increases layer by layer from top to bottom.
In a possible implementation manner, the neighborhood pyramid constructing module is specifically configured to:
for each data point, calculating a distance between the data point and all other data points in the training data set;
sequencing all data points in the training data set according to the sequence of the distances from small to large;
sequentially dividing the sorted data points into N data point groups, wherein each data point group comprises a preset number of data points;
and sequentially determining the data point groups contained in each layer of the neighborhood pyramid according to the sequence of the arrangement indexes of the data point groups from small to large, wherein each layer of neighborhood comprises a preset number of data point groups from the first group, and the preset number is equal to the number of layers of the neighborhood in the neighborhood pyramid.
And the euclidean neighborhood measure calculating module 120 is configured to calculate a distance between a neighboring point of each layer in the neighborhood pyramid and the data point, so as to obtain a euclidean neighborhood measure of the layer neighborhood.
In one possible implementation, an average euclidean distance between a neighboring point included in each layer of the neighborhood pyramid and the data point is calculated as a euclidean neighborhood measure of the neighborhood of the layer.
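Putting the pyramid construction and this per-layer average-distance measure together, a sketch for a single reference point could look like the following (the `group_size` and `n_layers` values, the data, and all names are illustrative assumptions):

```python
import numpy as np

def neighborhood_pyramid_measures(X, i, group_size, n_layers):
    """Build the neighborhood pyramid of data point i and return the
    Euclidean neighborhood measure (mean distance to the reference point)
    of every layer. Layer l contains the l nearest groups of neighbors,
    so the neighborhoods grow from top to bottom."""
    dists = np.linalg.norm(X - X[i], axis=1)
    order = np.argsort(dists)
    order = order[order != i]                    # exclude the point itself
    measures = []
    for layer in range(1, n_layers + 1):
        layer_idx = order[: layer * group_size]  # groups 1..layer
        measures.append(dists[layer_idx].mean()) # Euclidean neighborhood measure
    return measures

# Reference point X[0]; the other points lie at distances 1..6 from it.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0],
              [0.0, 4.0], [5.0, 0.0], [0.0, 6.0]])
m = neighborhood_pyramid_measures(X, 0, group_size=2, n_layers=3)  # [1.5, 2.5, 3.5]
```

Note that the per-layer measures are non-decreasing from top to bottom, which is what lets the objective preserve both distance distribution and neighbor ordering.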
And the vector mapping module 130 is configured to map the data points in the data set to be trained to a hamming space by using the current hash model, so as to obtain hamming space vectors corresponding to the data points.
And a hamming neighborhood measure calculating module 140, configured to calculate hamming neighborhood measures corresponding to neighborhoods of each layer in the neighborhood pyramid by using hamming space vectors corresponding to the data points.
In a possible implementation manner, an average hamming distance between the hamming space vector corresponding to the data point and the hamming space vectors corresponding to the neighbor points included in each layer of neighborhood is calculated to obtain the hamming neighborhood measure corresponding to that layer of neighborhood of the data point.
And the model adjusting module 150 is configured to optimize model parameters in the current model according to the euclidean neighborhood measure and the hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached, so as to obtain a target hash model.
In one possible implementation, linear fitting is performed on the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point in the training data set to obtain linear transformation parameters; the model parameters of the current hash model and the linear transformation parameters are then alternately updated to minimize the fitting error, yielding the target hash model.
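The linear-fitting half of this alternating scheme can be sketched with an ordinary least-squares fit; the measure values below are made up for illustration:

```python
import numpy as np

def fit_linear_transform(euclid_measures, hamming_measures):
    """With the hash model held fixed, fit linear parameters (a, b) so that
    a * euclidean_measure + b approximates the Hamming neighborhood measure,
    in the least-squares sense."""
    a, b = np.polyfit(euclid_measures, hamming_measures, deg=1)
    return a, b

# Illustrative per-layer measures for some data point.
e = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([2.1, 4.0, 5.9, 8.0])
a, b = fit_linear_transform(e, h)  # a = 1.96, b = 0.10 (exact least-squares fit)
```

The other half of the alternation — updating the hash model parameters with (a, b) held fixed — would then use the gradient update described earlier.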
According to the hash model training device provided by the invention, a multilayer neighborhood pyramid is constructed by gradually increasing the number of neighbor points. In each layer of the neighborhood pyramid, the average distance from the neighbor points in that layer to the reference point is used as the Euclidean neighborhood measure of that layer's neighborhood. The data points in the original space are mapped to a Hamming space by the hash model, and the Hamming neighborhood measure of each layer of the pyramid is calculated. The optimization objective of the hash model is to preserve the neighborhood measures of the original space in the Hamming space. This objective preserves both the distance distribution of the true neighbor points and the ordering of the neighbors, finally achieving better distance preservation and thereby improving the accuracy of approximate nearest neighbor retrieval.
In another aspect, the present invention further provides an embodiment of a similar object retrieving apparatus.
Referring to fig. 5, a block diagram of a similar object retrieving apparatus provided in the present invention is shown, where the apparatus may include: the euclidean feature vector obtaining module 210, the vector mapping module 220, the hamming distance calculating module 230, and the determining module 240.
The euclidean feature vector obtaining module 210 is configured to obtain a euclidean feature vector of the target object in a euclidean space.
And the vector mapping module 220 is configured to convert the euclidean feature vector of the target object to obtain a hamming feature vector of a hamming space according to a hash model obtained through pre-training.
A hamming distance calculating module 230, configured to obtain hamming distances between the target object and each object in the object set to be retrieved based on the hamming feature vector of the target object and the hamming feature vectors of each object in the object set to be retrieved.
And the Hamming feature vector of each object in the object set to be retrieved is obtained by converting the Euclidean feature vector of each object according to the Hash model.
A determining module 240, configured to determine, based on hamming distances between the target object and each object in the set of objects to be retrieved, an object similar to the target object from the set of objects to be retrieved.
In a possible implementation manner, the objects to be retrieved may be sorted in ascending order of Hamming distance, and the first specified number of them determined as objects similar to the target object. One or more objects may be selected according to actual requirements.
It should be noted that a smaller hamming distance indicates that the two objects are more similar.
In the similar object retrieval device provided in this embodiment, the Euclidean feature vector of an object is mapped by the trained hash function to obtain its Hamming feature vector; the Hamming distances between objects are then calculated from these vectors, and objects similar to the given object are determined according to the Hamming distance. Computing a Hamming distance is much cheaper than computing a Euclidean distance, so the retrieval speed can be improved. Meanwhile, the trained hash model preserves both the distance distribution and the ordering of the neighbor points in Euclidean space, so the Hamming-space feature vector obtained by the hash model better retains the object's features in Euclidean space, and the accuracy of similarity retrieval can therefore also be improved by using the device.
In another aspect, an embodiment of the present invention provides an apparatus, where the apparatus includes a processor, a memory, and a program stored in the memory and executable on the processor, and when the processor executes the program, the hash model training method and the similar object retrieval method are implemented. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The processor herein may be the CPU of the terminal, an MCU integrated within the terminal, or a combination of the two. The processor includes one or more kernels, which fetch the corresponding program from the memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The application also provides a storage medium executable by the computing device, wherein the storage medium stores a program, and the program realizes the Hash model training method and the similar object retrieval method when being executed by the computing device.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The steps in the method of the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
The device and the modules and sub-modules in the terminal in the embodiments of the present application can be combined, divided and deleted according to actual needs.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of a module or a sub-module is only one logical division, and there may be other divisions when the terminal is actually implemented, for example, a plurality of sub-modules or modules may be combined or integrated into another module, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules or sub-modules described as separate parts may or may not be physically separate, and parts that are modules or sub-modules may or may not be physical modules or sub-modules, may be located in one place, or may be distributed over a plurality of network modules or sub-modules. Some or all of the modules or sub-modules can be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated into one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated into one module. The integrated modules or sub-modules may be implemented in the form of hardware, or may be implemented in the form of software functional modules or sub-modules.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method for training a hash model, comprising:
for each data point in the training data set, constructing a neighborhood pyramid, wherein each layer in the neighborhood pyramid is a neighbor point of the data point, and the number of the neighbor points is increased from top to bottom layer by layer;
calculating the distance between the neighbor point of each layer in the neighborhood pyramid and the data point to obtain the Euclidean neighborhood measure of the neighborhood of the layer;
mapping the data points in the data set to be trained to a Hamming space by using a current Hash model to obtain Hamming space vectors corresponding to the data points;
calculating hamming neighborhood measures corresponding to each layer of neighborhood in the neighborhood pyramid by using hamming space vectors corresponding to the data points;
and optimizing the model parameters in the current model according to the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached to obtain a target Hash model.
2. The method of claim 1, wherein constructing a neighborhood pyramid for each data point in the training dataset comprises:
for each data point, calculating a distance between the data point and all other data points in the training data set;
sequencing all data points in the training data set according to the sequence of the distances from small to large;
sequentially dividing the sorted data points into N data point groups, wherein each data point group comprises a preset number of data points;
and sequentially determining the data point groups contained in each layer of the neighborhood pyramid according to the sequence of the arrangement indexes of the data point groups from small to large, wherein each layer of neighborhood comprises a preset number of data point groups from the first group, and the preset number is equal to the number of layers of the neighborhood in the neighborhood pyramid.
3. The method of claim 1, wherein calculating a distance between a neighbor point of each layer in the neighborhood pyramid and the data point to obtain a Euclidean neighborhood measure of the neighborhood of the layer comprises:
and calculating the average Euclidean distance between the neighbor point contained in each layer in the neighborhood pyramid and the data point, and taking the average Euclidean distance as the Euclidean neighborhood measure of the neighborhood of the layer.
4. The method of claim 1, wherein computing the hamming neighborhood measure for each layer in the neighborhood pyramid using the hamming space vector corresponding to the data point comprises:
and calculating the average Hamming distance between the Hamming space vector corresponding to the data point and the Hamming space vector corresponding to the neighbor point contained in each layer of neighborhood to obtain the Hamming neighborhood measure corresponding to the layer of neighborhood of the data point.
5. The method of claim 1, wherein optimizing the model parameters in the current hash model according to the euclidean neighborhood measure and the hamming neighborhood measure corresponding to the same data point until a predetermined convergence condition is reached to obtain a target hash model comprises:
performing linear fitting on the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point in the training data set to obtain linear transformation parameters;
and alternately updating the model parameters of the current hash model and the linear transformation parameters to minimize the fitting error and obtain the target hash model.
6. A method for retrieving similar objects, comprising:
acquiring a Euclidean feature vector of a target object in a Euclidean space;
converting the European feature vector of the target object to obtain a Hamming feature vector of a Hamming space according to a Hash model obtained by pre-training;
obtaining a Hamming distance between the target object and each object in the object set to be retrieved based on the Hamming feature vector of the target object and the Hamming feature vector of each object in the object set to be retrieved, wherein the Hamming feature vector of each object in the object set to be retrieved is obtained by converting the Euclidean feature vector of each object according to the Hash model;
and determining objects similar to the target object from the object set to be retrieved based on the Hamming distance between the target object and each object in the object set to be retrieved.
7. The method of claim 6, wherein the determining the object similar to the target object from the set of objects to be retrieved based on the hamming distance between the target object and each object in the set of objects to be retrieved comprises:
and determining the previously specified number of objects as objects similar to the target object according to the sequence of the Hamming distance between the target object and each object in the object set to be retrieved from small to large.
8. A hash model training apparatus, comprising:
the neighborhood pyramid building module is used for building a neighborhood pyramid for each data point in the training data set, wherein each layer in the neighborhood pyramid is a neighbor point of the data point, and the number of the neighbor points is increased layer by layer from top to bottom;
the Euclidean neighborhood measure calculating module is used for calculating the distance between the neighbor point of each layer in the neighborhood pyramid and the data point to obtain the Euclidean neighborhood measure of the neighborhood of the layer;
the vector mapping module is used for mapping the data points in the data set to be trained to a Hamming space by using a current Hash model to obtain Hamming space vectors corresponding to the data points;
the Hamming neighborhood measure calculating module is used for calculating the Hamming neighborhood measure corresponding to each layer of neighborhood in the neighborhood pyramid by utilizing the Hamming space vector corresponding to the data point;
and the model adjusting module is used for optimizing the model parameters in the current model according to the Euclidean neighborhood measure and the Hamming neighborhood measure corresponding to the same data point until a preset convergence condition is reached to obtain the target Hash model.
9. The apparatus of claim 8, wherein the neighborhood pyramid construction module is specifically configured to:
for each data point, calculating a distance between the data point and all other data points in the training data set;
sequencing all data points in the training data set according to the sequence of the distances from small to large;
sequentially dividing the sorted data points into N data point groups, wherein each data point group comprises a preset number of data points;
and sequentially determining the data point groups contained in each layer of the neighborhood pyramid according to the sequence of the arrangement indexes of the data point groups from small to large, wherein each layer of neighborhood comprises a preset number of data point groups from the first group, and the preset number is equal to the number of layers of the neighborhood in the neighborhood pyramid.
10. A similar object retrieval apparatus, comprising:
the Euclidean feature vector acquisition module is used for acquiring the Euclidean feature vector of the target object in the Euclidean space;
the vector mapping module is used for converting the European feature vector of the target object to obtain a Hamming feature vector of a Hamming space according to a Hash model obtained by pre-training;
a hamming distance calculation module, configured to obtain hamming distances between the target object and each object in the object set to be retrieved based on the hamming feature vector of the target object and the hamming feature vector of each object in the object set to be retrieved, where the hamming feature vector of each object in the object set to be retrieved is obtained by converting the euclidean feature vector of each object according to the hash model;
and the determining module is used for determining the objects similar to the target object from the object set to be retrieved based on the Hamming distance between the target object and each object in the object set to be retrieved.
CN201910892285.7A 2019-09-20 2019-09-20 Hash model training method, similar object retrieval method and device Pending CN110659375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910892285.7A CN110659375A (en) 2019-09-20 2019-09-20 Hash model training method, similar object retrieval method and device

Publications (1)

Publication Number Publication Date
CN110659375A true CN110659375A (en) 2020-01-07


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400314A (en) * 2020-03-02 2020-07-10 支付宝(杭州)信息技术有限公司 Method and device for searching node vector from database by using vector graph index
CN111400314B (en) * 2020-03-02 2023-10-27 支付宝(杭州)信息技术有限公司 Method and device for retrieving node vector from database by using vector diagram index
CN115495546A (en) * 2022-11-21 2022-12-20 中国科学技术大学 Similar text retrieval method, system, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200107