CN111078952B

CN111078952B - Cross-modal variable-length hash retrieval method based on hierarchical structure

Info

Publication number: CN111078952B
Application number: CN201911141734.0A
Authority: CN
Inventors: 祁晓君
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2019-11-20
Filing date: 2019-11-20
Publication date: 2023-07-21
Anticipated expiration: 2039-11-20
Also published as: CN111078952A

Abstract

The invention discloses a cross-modal variable-length hash retrieval method based on a hierarchical structure. Hash learning is used to retrieve cross-modal data, and a variable-length hashing algorithm is proposed in which the code length adapts to the data of each modality, which reduces memory usage; this is the first application of variable-length hashing in the cross-modal field. A hierarchical structure model is also introduced: a hierarchy is built over the data according to similarity, and representative image-text pairs are selected as training data, which markedly reduces the training time of the subsequent cross-modal variable-length hash retrieval algorithm while effectively improving retrieval accuracy. The method can be applied to cross-modal retrieval between natural image-text pairs.

Description

Cross-modal variable-length hash retrieval method based on hierarchical structure

Technical Field

The invention relates to multi-modal data retrieval. It combines cross-modal retrieval, hash learning, manifold hierarchical structures, and algebraic multigrid ideas to improve cross-modal hash retrieval algorithms.

Background

Before the big-data era, information was usually retrieved by tags. With the explosive growth of large-scale data in recent years, cross-modal retrieval methods have emerged that allow media data of any modality to be queried flexibly. Although existing cross-modal hashing algorithms based on unified subspace learning have made some progress, they still suffer from heavy computation, high storage cost, and insufficient retrieval precision on large-scale multi-modal data. Cross-modal retrieval based on hash coding is widely used in machine learning and information retrieval because it is efficient and saves both time and space. Hash learning maps the original high-dimensional media information to a low-dimensional space and generates compact hash codes for fast retrieval. However, most hashing models map all modalities to hash codes of the same fixed length, which yields a suboptimal representation of multi-modal data.

Disclosure of Invention

The present invention aims to solve the above problems of the prior art. It provides a cross-modal variable-length hash retrieval method based on a hierarchical structure, which combines a variable-length cross-modal hash retrieval algorithm with an algebraic multigrid hierarchy to improve retrieval accuracy. The technical scheme of the invention is as follows:

A cross-modal variable-length hash retrieval method based on a hierarchical structure comprises the following steps:

Step 1: construct a neighbor graph for the multi-modal dataset, in which each image corresponds to exactly one text; for brevity, one such pair of image and text data is called an image-text pair. Extract features of the images and of the texts separately, and combine the image features and the text features to construct a similarity matrix between the image-text pair training set and the image-text pair database.

Step 2: select representative image-text pairs through the similarity matrix to build a bottom-up hierarchy of image-text pairs, in which every unselected pair is strongly connected to a selected pair. The pairs selected in each layer serve as the initial pairs of the next layer, and selection is repeated until the representatives are few enough yet still represent the whole dataset. The top-layer pairs then each represent the similar image-text pairs of a different local region.

Step 3: construct a similarity matrix between the top-layer representative image-text pairs. Building on single-modal supervised discrete hashing, assume that the image data and the text data share a common latent abstract semantic space in which queries and retrieval can be performed directly. Project the hash codes of the image data and of the text data into this latent space, and compute the similarity matrix between the image data and the text data from inner products.

Step 4: project the image data and the text data each into a hash coding space of its own optimal length, and solve for the projection matrices, the similarity association matrix, and the compact optimal-length hash codes of each modality by iterative optimization.

Step 5: interpolate from top to bottom with the similarity transfer matrices to return to the complete bottom-layer data, obtaining the similarity ranking between the hash code of the query data and the hash codes of all data in the database.

Step 6: according to the similarity ranking, obtain the data similar to the query, return them to the user, and evaluate their accuracy.

Further, in step 1, a neighbor graph is established for the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed, specifically as follows:

Let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data respectively, $n$ is the number of image-text sample pairs, and $\mathbb{R}$ is the set of real numbers.

A k-nearest-neighbor graph $G^{[0]}(V^{[0]}, W^{[0]})$ is established for the image-text pairs in the database, where $V^{[0]}$ denotes all image-text pairs in the database and $W^{[0]}$ denotes the similarity matrix between image-text pairs.
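The construction of $W^{[0]}$ can be sketched as follows. The text does not specify how image and text features are combined or which similarity is used, so the column normalisation, the concatenation of the two modalities, the Gaussian kernel, and the function name `knn_similarity` are all assumptions made for illustration.

```python
import numpy as np

def knn_similarity(X, Y, k=3):
    """Symmetric k-NN similarity matrix over n image-text pairs.

    X: (d1, n) image features, Y: (d2, n) text features (columns are samples).
    Concatenating unit-norm features and using a Gaussian kernel is an
    assumption of this sketch, not something stated in the text.
    """
    def unit_columns(M):
        return M / (np.linalg.norm(M, axis=0, keepdims=True) + 1e-12)

    Z = np.vstack([unit_columns(X), unit_columns(Y)])    # joint feature, (d1 + d2, n)
    sq = (Z ** 2).sum(axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z, 0.0)
    S = np.exp(-D2 / D2.mean())                          # positive similarities
    np.fill_diagonal(S, -np.inf)                         # a pair is not its own neighbour

    n = S.shape[0]
    nearest = np.argsort(-S, axis=1)[:, :k]              # k most similar pairs per row
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, nearest.ravel()] = S[rows, nearest.ravel()]
    return np.maximum(W, W.T)                            # symmetrise the graph

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 20))   # toy image features, d1 = 8
Y = rng.normal(size=(5, 20))   # toy text features, d2 = 5
W0 = knn_similarity(X, Y, k=3)
```

Symmetrising with an element-wise maximum keeps an edge whenever either endpoint lists the other among its k nearest neighbours, so every row has at least k non-zero entries.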

Further, in step 2, the core of building the hierarchy is to construct a strong-connection graph between image-text pairs. For the s-th layer of the bottom-up hierarchy of the image-text pair database, the representative image-text pairs are selected from the pairs of the (s-1)-th layer; every pair that is in the (s-1)-th layer but not in the s-th layer, i.e. every unselected pair, must be strongly connected to a representative pair of the s-th layer. Image-text pairs in different layers are related by a similarity transfer matrix $F^{[n]}$, where $n$ is the layer index. The similarity matrix of each layer can be propagated from the similarity matrix of the bottom layer:

$$W^{[i]} = F^{[i-1]T} \cdots F^{[1]T} F^{[0]T} \, W^{[0]} \, F^{[0]} F^{[1]} \cdots F^{[i-1]}, \quad i = 1, 2, \ldots, s.$$
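The propagation formula amounts to applying one transfer matrix per layer on both sides of the previous similarity matrix. A minimal sketch follows; the layer sizes and the random transfer matrices are toy values chosen only to show the shapes, and `propagate_similarity` is a hypothetical helper name.

```python
import numpy as np

def propagate_similarity(W0, transfers):
    """Return [W^[0], W^[1], ..., W^[s]] using W^[i+1] = F^[i].T @ W^[i] @ F^[i]."""
    levels = [W0]
    for F in transfers:                  # F^[i] has shape (n_i, n_{i+1})
        levels.append(F.T @ levels[-1] @ F)
    return levels

rng = np.random.default_rng(0)
A = rng.random((12, 12))
W0 = (A + A.T) / 2                       # toy symmetric bottom-layer similarity
sizes = [12, 6, 3]                       # hypothetical layer sizes n_0 > n_1 > n_2
transfers = [rng.random((sizes[i], sizes[i + 1])) for i in range(len(sizes) - 1)]
levels = propagate_similarity(W0, transfers)
```

Applying the matrices layer by layer gives the same result as multiplying out the full product in the formula, but never forms a matrix larger than the current layer requires.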

Further, in step 3, a similarity matrix of the top-layer representative image-text pair data is first constructed. Assuming that the multi-modal dataset has a common latent abstract semantic space $V$ in which multi-modal data can be queried and retrieved directly, the hash code $B_X$ of the image and the hash code $B_Y$ of the text are projected into the latent abstract semantic space in the following form:

The similarity between the data in the space $V$ is expressed as follows:

Let $W = W_1^T W_2$, where $W$ is the similarity between the image and the text. The specific objective function is:

where $P_X$ and $P_Y$ denote the projection matrices of the image data and the text data, respectively.
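The projection and similarity formulas are not reproduced in this text. One reading consistent with the definition $W = W_1^T W_2$ is that the codes are projected as $W_1 B_X$ and $W_2 B_Y$, so that the inner-product similarity in $V$ equals $B_X^T W B_Y$. The sketch below checks that identity under this assumed reading; the sizes, the latent dimension `r`, and the random matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h_x, h_y, r = 6, 16, 8, 4           # hypothetical sizes; h_x != h_y is the point
B_X = np.where(rng.random((h_x, n)) < 0.5, -1.0, 1.0)   # image codes, (h_x, n)
B_Y = np.where(rng.random((h_y, n)) < 0.5, -1.0, 1.0)   # text codes,  (h_y, n)
W1 = rng.normal(size=(r, h_x))         # assumed projection of image codes into V
W2 = rng.normal(size=(r, h_y))         # assumed projection of text codes into V

V_X, V_Y = W1 @ B_X, W2 @ B_Y          # latent representations, both (r, n)
S_latent = V_X.T @ V_Y                 # inner-product similarity in V, (n, n)

W = W1.T @ W2                          # association matrix, (h_x, h_y)
S_direct = B_X.T @ W @ B_Y             # same similarity computed from the codes
```

Because $W$ has shape $(h_x, h_y)$, the two modalities can be compared even though their codes have different lengths, which is what makes a variable-length scheme workable.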

Further, in step 4, mapping functions from the original data of each modality to its hash codes are learned, and compact hash codes of the optimal length for each modality are obtained through these mapping functions. The specific solving steps are as follows:

(1) Fix the other variables and solve for $P_X$ and $P_Y$. The objective function reduces to the following form:

Thus $P_X$ and $P_Y$ have closed-form solutions given by the regression formula:

$$P_X = B_X X^T (X X^T)^{-1}, \quad P_Y = B_Y Y^T (Y Y^T)^{-1}.$$

(2) Fix the other variables and solve for $W$. The objective function reduces to the following form:

This is a bilinear regression model, whose closed-form solution is:

(3) Fix the other variables and solve for $B_X$. The objective function reduces to the following form:

$B_X$ is solved row by row: while one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other row vectors are then solved in turn. Expanding the formula and rearranging gives the following form:

s.t. $B_X \in \{-1,+1\}^{h \times n}$

where $H = W B_Y S^T + P_X X$ and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. The solution proceeds as follows. When solving for the i-th row vector $b^T$ of $B_X$, let $B_X'$ be $B_X$ with row $b^T$ deleted; let $g^T$ be the i-th row vector of $G$ and $G'$ be $G$ with row $g^T$ deleted; let $h^T$ be the i-th row vector of $H$ and $H'$ be $H$ with row $h^T$ deleted. The result is:

$$b = \mathrm{sgn}(h - B_X' G'^T g)$$

The i-th row of $B_X$ is obtained from this formula, and the remaining rows are obtained by the same steps.

(4) Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see step (3).
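Steps (1) and (3) can be sketched together. The reduced objective is not reproduced in this text, so the sketch assumes the $B_X$ sub-problem $\|S - B_X^T G\|_F^2 + \|B_X - P_X X\|_F^2$ with $G = W B_Y$, which is consistent with the stated $H = W B_Y S^T + P_X X$. The transposes in the row update are placed so that the shapes agree for codes of shape $(h, n)$; that placement, the toy data, and the helper names are assumptions of this sketch.

```python
import numpy as np

def solve_projection(B, X):
    """Closed-form regression P = B X^T (X X^T)^{-1}, via a linear solve."""
    # (X X^T) P^T = X B^T is the same system, because X X^T is symmetric.
    return np.linalg.solve(X @ X.T, X @ B.T).T

def objective(S, B_X, G, P_X, X):
    """Assumed B_X sub-problem: ||S - B_X^T G||_F^2 + ||B_X - P_X X||_F^2."""
    return np.sum((S - B_X.T @ G) ** 2) + np.sum((B_X - P_X @ X) ** 2)

def update_rows(S, B_X, G, P_X, X):
    """One sweep of row-by-row sign updates of B_X, everything else fixed."""
    B = B_X.copy()
    H = G @ S.T + P_X @ X                       # H = W B_Y S^T + P_X X when G = W B_Y
    for i in range(B.shape[0]):
        keep = np.arange(B.shape[0]) != i
        B_rest, G_rest = B[keep], G[keep]       # B_X' and G': row i deleted
        g, h = G[i], H[i]
        score = h - B_rest.T @ (G_rest @ g)     # length-n vector
        B[i] = np.where(score >= 0, 1.0, -1.0)  # sgn, with ties sent to +1
    return B

rng = np.random.default_rng(0)
n, d, h_x, h_y = 30, 6, 8, 4                    # toy sizes
X = rng.normal(size=(d, n))
S = np.where(rng.random((n, n)) < 0.3, 1.0, -1.0)   # toy pairwise similarity
B_X = np.where(rng.random((h_x, n)) < 0.5, -1.0, 1.0)
B_Y = np.where(rng.random((h_y, n)) < 0.5, -1.0, 1.0)
G = rng.normal(size=(h_x, h_y)) @ B_Y           # G = W B_Y with a random W

P_X = solve_projection(B_X, X)
before = objective(S, B_X, G, P_X, X)
B_new = update_rows(S, B_X, G, P_X, X)
after = objective(S, B_new, G, P_X, X)
```

Each row update minimises the assumed objective exactly over that row with the others fixed, so a sweep cannot increase the objective. The regression step requires $X X^T$ to be invertible, which needs at least as many samples as feature dimensions.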

Further, in step 5, the sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used to interpolate from the top layer back down to the bottom layer, which yields the similarity ranking between the hash code of the query data and the hash codes of all data in the database.
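The top-down interpolation can be sketched as repeated multiplication by the transfer matrices, starting from the top layer. What exactly is interpolated is not spelled out in this text; the sketch assumes it is a vector of similarity scores between the query and the top-layer representatives. The layer sizes, scores, and function name are illustrative.

```python
import numpy as np

def interpolate_to_bottom(top_scores, transfers):
    """Map scores on top-layer representatives to all bottom-layer pairs.

    transfers = [F^[0], ..., F^[s-1]], with F^[i] of shape (n_i, n_{i+1}).
    """
    scores = top_scores
    for F in reversed(transfers):         # apply F^[s-1] first, F^[0] last
        scores = F @ scores
    return scores

rng = np.random.default_rng(0)
sizes = [12, 6, 3]                         # hypothetical layer sizes
transfers = [rng.random((sizes[i], sizes[i + 1])) for i in range(2)]
top_scores = np.array([0.2, 0.9, 0.5])     # toy query-to-representative similarities
bottom_scores = interpolate_to_bottom(top_scores, transfers)
ranking = np.argsort(-bottom_scores)       # database indices, most similar first
```

The loop computes $F^{[0]} F^{[1]} \cdots F^{[s-1]}$ applied to the top-layer vector without forming the product matrix.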

The advantages and beneficial effects of the invention are as follows:

1) A variable-length cross-modal hash retrieval method is provided. Existing cross-modal hash retrieval methods all retrieve with fixed-length hash codes, but data of different modalities are best represented by codes of different lengths, so fixed-length methods cannot accurately express the characteristics of each modality. The method uses variable-length hash codes whose length follows the data of each modality, so the resulting codes represent data of different modalities more effectively and accurately.

2) Memory usage is effectively reduced. With variable-length hash codes, the generated codes can express the data of each modality accurately while occupying less memory.

3) A hierarchical model is incorporated. Using algebraic multigrid ideas, the training data are processed by selecting representative data pairs, and the subsequent retrieval algorithm uses only the top-layer representative pairs, which is a new application of these ideas.

4) Training time is greatly reduced. With the hierarchical structure model, the training data change from the entire dataset to top-layer representatives with distinct characteristics, and this large reduction in data volume greatly shortens training time.

5) By combining cross-modal hash learning with the hierarchical structure model, and because the selected representatives are high-quality data, retrieval accuracy improves even as time and memory usage fall.

Table 1 reports the MAP values of the hierarchical-structure-based cross-modal variable-length hash retrieval method (CVHH) for image-to-text retrieval on the Wiki, NUS-WIDE, and MIRFlickr datasets.

Table 2 reports the MAP values of CVHH for text-to-image retrieval on the Wiki, NUS-WIDE, and MIRFlickr datasets.

Table 1 MAP values for image-to-text retrieval

Table 2 MAP values for text-to-image retrieval

As Tables 1 and 2 show, the hierarchical-structure-based cross-modal variable-length hash retrieval method achieves better MAP values than the comparison methods at each hash code length, and the MAP of CVHH improves as the hash code length increases, i.e. retrieval performance improves. These experimental results verify the effectiveness of the method.

Drawings

FIG. 1 is a flow chart of the overall algorithm of a preferred embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention.

The technical scheme for solving the above technical problems is as follows:

To reduce the training time of the algorithm, a neighbor graph is established for the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed. Representative image-text pairs are selected in each layer such that every unselected pair is strongly connected to a selected pair. Repeating this builds a bottom-up hierarchy of image-text pairs, and the most representative pairs at the top layer are used as training data. The selected top-layer pairs each represent the similar image-text pairs of a different local region, and only they take part in the subsequent cross-modal variable-length hash retrieval algorithm, which reduces the amount of training data and shortens training time. The hash codes of the image and text modalities are each projected into a latent abstract semantic space; the corresponding projection matrices are obtained by iterative optimization, the association matrix between the query data and the top-layer data is learned, and the variable-length hash codes of the top-layer image and text data are updated in turn until the objective function converges. Top-down interpolation with the similarity transfer matrices then returns the result to the bottom layer, giving the similarity ranking between the hash code of the query data and the hash codes of all data in the database, and the data required by the user are returned according to this ranking.

The technical scheme of the invention is described in detail as follows:

First, a neighbor graph is constructed for the paired image and text data in the dataset. Features of the images and of the texts are extracted separately, and the image features and text features are combined to construct a similarity matrix between the image-text pair training set and the image-text pair database.

Representative image-text pairs are then selected through the similarity matrix to build a bottom-up hierarchy of image-text pairs, in which every unselected pair is strongly connected to a selected pair. The pairs selected in each layer serve as the initial pairs of the next layer, and selection is repeated until the representatives are few enough yet still represent the whole dataset. The top-layer pairs then each represent the similar image-text pairs of a different local region, so the amount of training data, and with it the training time of the algorithm, is effectively reduced.

A similarity matrix between the top-layer representative image-text pairs is constructed. Building on single-modal supervised discrete hashing, the image data and the text data are assumed to share a common latent abstract semantic space in which queries and retrieval can be performed directly. The hash codes of the image data and of the text data are each projected into this latent space, and the similarity matrix between the image data and the text data is computed from inner products.

The image data and the text data are each projected into a hash coding space of its own optimal length, and the projection matrices, the similarity association matrix, and the compact optimal-length hash codes of each modality are solved by iterative optimization.

Top-down interpolation with the similarity transfer matrices returns the result to the complete bottom-layer data, giving the similarity ranking between the hash code of the query data and the hash codes of all data in the database.

Finally, the data similar to the query are obtained according to the similarity ranking and returned to the user.

Further, the neighbor graph is constructed on the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed at the same time. Let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data respectively, $n$ is the number of image-text sample pairs, and $\mathbb{R}$ is the set of real numbers.

Further, a bottom-up hierarchy of representative image-text pairs is built by selecting representatives among similar image-text pairs, and the most representative pairs at the top layer are used as training data. For the s-th layer of the bottom-up hierarchy of the image-text pair database, the representative image-text pairs are selected from the pairs of the (s-1)-th layer; every pair that is in the (s-1)-th layer but not in the s-th layer, i.e. every unselected pair, must be strongly connected to a representative pair of the s-th layer. Image-text pairs in different layers are related by a similarity transfer matrix $F^{[n]}$, where $n$ is the layer index. The similarity matrix of each layer can be propagated from the similarity matrix of the bottom layer:

$$W^{[i]} = F^{[i-1]T} \cdots F^{[1]T} F^{[0]T} \, W^{[0]} \, F^{[0]} F^{[1]} \cdots F^{[i-1]}, \quad i = 1, 2, \ldots, s.$$

This formula gives the hierarchy of all image-text pairs in the database and the similarity transfer matrix between the representative pairs of each layer and the adjacent layer. Only the top-layer representative pairs take part in the subsequent cross-modal variable-length hash retrieval algorithm.
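The selection of representatives and the transfer matrix between two adjacent layers can be illustrated with a greedy coarsening in the style of algebraic multigrid. The selection rule is not given in this text, so the strength threshold `theta`, the greedy ordering, and the row-normalised interpolation weights are all assumptions of this sketch rather than the patented procedure.

```python
import numpy as np

def coarsen(W, theta=0.25):
    """Pick representatives so that every other node is strongly connected to one.

    Node j is a strong neighbour of i when W[i, j] >= theta * max_k W[i, k].
    Returns the selected indices and a transfer matrix F of shape (n, n_selected).
    """
    n = W.shape[0]
    row_max = W.max(axis=1, keepdims=True)
    strong = (W >= theta * row_max) & (W > 0)
    np.fill_diagonal(strong, False)

    selected, covered = [], np.zeros(n, dtype=bool)
    for i in np.argsort(-strong.sum(axis=0)):    # most influential nodes first
        if not covered[i]:
            selected.append(int(i))
            covered[i] = True
            covered |= strong[:, i]              # nodes strongly connected to i
    selected = np.array(sorted(selected))

    F = np.zeros((n, len(selected)))
    for col, j in enumerate(selected):
        F[j, col] = 1.0                          # a representative maps to itself
    for i in np.flatnonzero(~np.isin(np.arange(n), selected)):
        w = np.where(strong[i, selected], W[i, selected], 0.0)
        F[i] = w / w.sum()                       # interpolate from strong representatives
    return selected, F

rng = np.random.default_rng(0)
A = rng.random((10, 10))
W0 = (A + A.T) / 2                               # toy symmetric similarity
np.fill_diagonal(W0, 0.0)
selected, F0 = coarsen(W0)
W1 = F0.T @ W0 @ F0                              # similarity matrix of the next layer
```

Calling `coarsen` again on `W1` would produce the next layer, and repeating until few representatives remain gives the full hierarchy together with its transfer matrices.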

Further, assuming that the multi-modal dataset has a common latent abstract semantic space $V$ in which multi-modal data can be queried and retrieved directly, the hash codes of the images and of the texts are projected into the latent abstract semantic space in the following forms, respectively:

The similarity between the data in the space $V$ is expressed as follows:

Let $W = W_1^T W_2$, where $W$ is the similarity between the data of the two modalities. The specific objective function is:

Further, the projection matrices, the similarity association matrix, and the compact optimal-length hash codes of each modality are solved by iterative optimization. The specific solving steps are as follows:

Fix the other variables and solve for $P_X$ and $P_Y$. The objective function reduces to a regression problem with the closed-form solution:

$$P_X = B_X X^T (X X^T)^{-1}, \quad P_Y = B_Y Y^T (Y Y^T)^{-1}$$

Fix the other variables and solve for $W$. The objective function reduces to the following form:

Fix the other variables and solve for $B_X$. The objective function reduces to the following form:

Because of the binary constraint, solving for $B_X$ directly is very difficult, so $B_X$ is solved row by row: while one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other row vectors are then solved in turn. Expanding the formula and rearranging gives the following form:

s.t. $B_X \in \{-1,+1\}^{h \times n}$

where $H = W B_Y S^T + P_X X$ and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. The solution proceeds as follows. When solving for the i-th row vector $b^T$ of $B_X$, let $B_X'$ be $B_X$ with row $b^T$ deleted; let $g^T$ be the i-th row vector of $G$ and $G'$ be $G$ with row $g^T$ deleted; let $h^T$ be the i-th row vector of $H$ and $H'$ be $H$ with row $h^T$ deleted. The result is:

$$b = \mathrm{sgn}(h - B_X' G'^T g)$$

Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see the solution of $B_X$.

Further, top-down interpolation with the similarity transfer matrices returns the result to the bottom layer, giving the similarity ranking between the hash code of the query data and all data in the database. Specifically:

The sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used to interpolate from the top layer back down to the bottom layer, which yields the similarity ranking between the hash code of the query data and the hash codes of all data in the database.

Step one: construction of neighbor graphs and similarity matrices

Let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data respectively and $n$ is the number of image-text sample pairs.

Step two: building a bottom-up hierarchy

A strong-connection graph is constructed between image-text pairs. For the s-th layer of the bottom-up hierarchy of the image-text pair database, the representative image-text pairs are selected from the pairs of the (s-1)-th layer; every pair that is in the (s-1)-th layer but not in the s-th layer, i.e. every unselected pair, must be strongly connected to a representative pair of the s-th layer. Image-text pairs in different layers are related by a similarity transfer matrix $F^{[n]}$, where $n$ is the layer index. The similarity matrix of each layer can be propagated from the similarity matrix of the bottom layer:

$$W^{[i]} = F^{[i-1]T} \cdots F^{[1]T} F^{[0]T} \, W^{[0]} \, F^{[0]} F^{[1]} \cdots F^{[i-1]}, \quad i = 1, 2, \ldots, s.$$

Step three: First, a similarity matrix of the top-layer representative image-text pair data is constructed. Assuming that the multi-modal dataset has a common latent abstract semantic space $V$ in which multi-modal data can be queried and retrieved directly, the hash codes of the images and of the texts are projected into the latent abstract semantic space in the following forms, respectively:

The similarity between the data in the space $V$ is expressed as follows:

Step four: Learning the projection matrices, the association matrix, and the variable-length hash codes of each modality

$$P_X = B_X X^T (X X^T)^{-1}, \quad P_Y = B_Y Y^T (Y Y^T)^{-1}$$

Because of the binary constraint, solving for $B_X$ directly is very difficult, so $B_X$ is solved row by row: while one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other row vectors are then solved in turn. Expanding the formula and rearranging gives the following form:

s.t. $B_X \in \{-1,+1\}^{h \times n}$

$$b = \mathrm{sgn}(h - B_X' G'^T g)$$

Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see the solution of $B_X$.

Step five: obtaining a similarity ranking

The sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used to interpolate from the top layer back down to the bottom layer, which yields the similarity ranking between the hash code of the query data and the hash codes of all data in the database.

Step six: returning demand data

According to the similarity ranking, the data related to the query are obtained and their accuracy is evaluated. Specifically, the data similar to the query are obtained according to the similarity matrix, and the data required by the user are returned.
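Retrieval accuracy is reported in this document as MAP (mean average precision). The evaluation protocol is not described here, so the following is a sketch of the standard MAP definition computed over the full ranked list; the scores and relevance labels are made-up toy values.

```python
import numpy as np

def average_precision(relevant_in_rank_order):
    """AP for one query, given a 0/1 relevance flag per returned item in rank order."""
    rel = np.asarray(relevant_in_rank_order, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, rel.size + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(scores, relevance):
    """scores, relevance: (n_queries, n_database); a higher score means more similar."""
    order = np.argsort(-scores, axis=1)
    ranked = np.take_along_axis(relevance, order, axis=1)
    return float(np.mean([average_precision(r) for r in ranked]))

scores = np.array([[0.9, 0.1, 0.8, 0.3],
                   [0.2, 0.7, 0.1, 0.6]])
relevance = np.array([[1, 0, 0, 1],
                      [0, 1, 0, 1]])
map_value = mean_average_precision(scores, relevance)
```

In a cross-modal setting the queries come from one modality and the database from the other, and an item is usually counted as relevant when it shares a class label with the query.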

In summary, the innovations and advantages of the invention are:

The hierarchical-structure-based cross-modal variable-length hash retrieval method enables efficient retrieval between image and text data while greatly reducing training time and memory usage.

The method uses variable-length hash codes whose length follows the data of each modality, so the resulting codes represent data of different modalities more effectively and accurately.

Because the codes are variable-length, they can express the data of each modality accurately while effectively reducing memory usage.

The method applies algebraic multigrid ideas: training data are processed by selecting representative data pairs, and only the top-layer representative pairs are used in the subsequent retrieval algorithm, which is a new application of these ideas.

By adding the hierarchical structure model, the training data change from the entire dataset to top-layer representatives with distinct characteristics, and this large reduction in data volume greatly shortens training time.

By combining cross-modal hash learning with the hierarchical structure model, and because the selected representatives are high-quality data, retrieval accuracy improves even as time and memory usage fall.

The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims

1. A cross-modal variable-length hash retrieval method based on a hierarchical structure, characterized by comprising the following steps:

step 1, constructing a neighbor graph for a multi-modal dataset, wherein the image data in the dataset correspond one-to-one to the text data and, for brevity, one pair of image and text data in the multi-modal dataset is called an image-text pair; extracting features of the images and of the texts separately, and combining the image features and the text features to construct a similarity matrix between the image-text pair training set and the image-text pair database;

step 3, constructing a similarity matrix between the top-layer representative image-text pairs and, building on single-modal supervised discrete hashing, assuming that the image data and the text data share a common latent abstract semantic space in which queries and retrieval can be performed directly; projecting the hash codes of the image data and of the text data into the latent abstract semantic space, and computing the similarity matrix between the image data and the text data from inner products;

step 6, obtaining the data similar to the query data according to the similarity ranking, returning them to the user, and evaluating their accuracy;

wherein in step 1, a neighbor graph is established for the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed, specifically comprising:

establishing a k-nearest-neighbor graph $G^{[0]}(V^{[0]}, W^{[0]})$ for the image-text pairs in the database, where $V^{[0]}$ denotes all image-text pairs in the database and $W^{[0]}$ denotes the similarity matrix between image-text pairs;

in step 3, a similarity matrix of the top-layer representative image-text pair data is first constructed and, assuming that the multi-modal dataset has a common latent abstract semantic space $V$ in which multi-modal data can be queried and retrieved directly, the hash code $B_X$ of the image and the hash code $B_Y$ of the text are projected into the latent abstract semantic space in the following form:

the similarity between the data in the space $V$ is expressed as follows:

2. The hierarchical-structure-based cross-modal variable-length hash retrieval method according to claim 1, wherein in step 2, the core of building the hierarchy is to construct a strong-connection graph between image-text pairs; for the s-th layer of the bottom-up hierarchy of the image-text pair database, the representative image-text pairs are selected from the pairs of the (s-1)-th layer, and every pair that is in the (s-1)-th layer but not in the s-th layer, i.e. every unselected pair, must be strongly connected to a representative pair of the s-th layer; image-text pairs in different layers are related by a similarity transfer matrix $F^{[n]}$, where $n$ is the layer index; the similarity matrix of each layer can be propagated from the similarity matrix of the bottom layer:

$$W^{[i]} = F^{[i-1]T} \cdots F^{[1]T} F^{[0]T} \, W^{[0]} \, F^{[0]} F^{[1]} \cdots F^{[i-1]}, \quad i = 1, 2, \ldots, s.$$

3. The hierarchical-structure-based cross-modal variable-length hash retrieval method according to claim 1, wherein in step 4, mapping functions from the original data of each modality to its hash codes are learned, and compact hash codes of the optimal length for each modality are obtained through these mapping functions, the specific solving steps being as follows:

$$P_X = B_X X^T (X X^T)^{-1}, \quad P_Y = B_Y Y^T (Y Y^T)^{-1};$$

$$b = \mathrm{sgn}(h - B_X' G'^T g)$$

4. The hierarchical-structure-based cross-modal variable-length hash retrieval method according to claim 1, wherein in step 5, top-down interpolation with the similarity transfer matrices returns the result to the bottom layer to obtain the similarity ranking between the hash code of the query data and all data in the database, specifically: the sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used to interpolate from the top layer back down to the bottom layer, which yields the similarity ranking between the hash code of the query data and the hash codes of all data in the database.