CN106777388B - Double-compensation multi-table Hash image retrieval method


Info

Publication number: CN106777388B (application CN201710088703.8A)
Authority: CN (China)
Prior art keywords: image, hash, matrix, images, query
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN106777388A
Inventors: 吴永贤, 周先成, 田星
Assignee: South China University of Technology (SCUT)
Priority/filing date: 2017-02-20
Publication of CN106777388A: 2017-05-31
Application granted; publication of CN106777388B: 2020-11-24


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 — Information retrieval of still image data


Abstract

The invention discloses a double-compensation multi-table hash image retrieval method, comprising the following steps: 1) extracting image features and processing category information; 2) training the hash tables; 3) mapping the image features into Hamming space according to the hash tables and calculating category weights; 4) calculating Hamming distances for a query and returning preliminary query results; 5) reordering the results. The method achieves fast query response, small memory overhead and high query performance in image retrieval, brings a clear improvement to multi-table hash image retrieval, and overcomes the drawback that multi-table hashing usually requires extra overhead.

Description

Double-compensation multi-table Hash image retrieval method
Technical Field
The invention relates to the technical field of image retrieval, in particular to a double-compensation multi-table Hash image retrieval method.
Background
With the development of the internet, the number of multimedia files is growing rapidly, and the images uploaded by users have reached a very large scale. This poses a significant challenge for image retrieval. Traditional retrieval methods based on tree structures generally need a large amount of additional auxiliary space, which may even exceed the size of the original image data; moreover, once the feature dimension of the images is high, the performance of tree-based methods degrades, in the worst case to the complexity of linear search. In contrast, hash-based image retrieval methods have sub-linear time complexity, and the auxiliary space they require is quite modest.
For a hash-based image retrieval method with F hash bits, images are first mapped into a low-dimensional Hamming space. Each image is represented by F hash bits, and the Hamming distance between these binary codes measures the similarity between images. A good method should ensure that similar images have small Hamming distances and dissimilar images have large Hamming distances. The query process is as follows: for a query image, the method first maps it into the Hamming space, computes the Hamming distance between the query and every image in the database, and returns the images whose Hamming distance is small (as determined by a user-set threshold) as the query result. Because the hash bits can be stored as binary bits and the Hamming distance can be computed with the machine's built-in bit operations, hash-based image retrieval can achieve fast retrieval speed with little auxiliary memory, while retrieval accuracy is ensured by the design of the method.
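To make the bit-level computation above concrete, the following Python sketch packs F = 32 hash bits into an integer and measures image similarity by a popcount of the XOR; this is a generic illustration of Hamming-distance computation, not the patented method, and the function and variable names are assumptions.

```python
import numpy as np

def pack_bits(bits):
    """Pack a length-F array of 0/1 hash bits into a single Python int."""
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code

def hamming_distance(code_a, code_b):
    """Hamming distance between two packed hash codes via XOR + popcount."""
    return bin(code_a ^ code_b).count("1")

# Example: two images whose 32-bit codes differ in 3 positions.
rng = np.random.default_rng(0)
bits_a = rng.integers(0, 2, size=32)
bits_b = bits_a.copy()
bits_b[[3, 17, 29]] ^= 1                  # flip three bits
a, b = pack_bits(bits_a), pack_bits(bits_b)
print(hamming_distance(a, b))             # -> 3
```

Packing the bits into machine words is what allows the distance to be evaluated with a single XOR plus popcount per table.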
In the field of hash-based image retrieval, methods can be divided into three types according to whether semantic labels are used: supervised hashing, semi-supervised hashing and unsupervised hashing. Supervised hashing uses label information to train the hash functions and requires the database to provide complete label information; semi-supervised hashing also uses label information in the database but tolerates some unlabeled data; unsupervised hashing refers to methods that do not use label information at all. Images in an image database often carry some semantic labels, and this information can improve performance considerably. Unsupervised hashing ignores this information and may lose retrieval performance, while supervised hashing requires all images to carry semantic labels, which is unrealistic. The present method is a semi-supervised hash image retrieval method: it makes good use of the label information while better matching the actual situation.
Disclosure of Invention
The invention aims to overcome the defects of the traditional hash image retrieval method in multi-table retrieval, and provides a double-compensation multi-table hash image retrieval method.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a double-compensation multi-table Hash image retrieval method comprises the following steps:
1) extracting image characteristics and processing category information;
2) carrying out Hash table training;
3) mapping the image features into Hamming space according to the hash tables and calculating category weights;
4) calculating the Hamming distance according to the query, and returning a query result;
5) reordering the results.
In step 1), image features are extracted and label information is processed; the specific steps are as follows:
1.1) extract image features with the gist algorithm to obtain an image feature matrix X of size d × n, where n is the number of images in the dataset and d is the feature dimension, and center the X matrix;
1.2) divide the images into two subsets: data with semantic labels form the labeled image subset with feature matrix X_l, and data without semantic labels form the unlabeled image subset with feature matrix X_u;
1.3) for the unlabeled data, compute pseudo labels: for each unlabeled image, compute the Euclidean distance of its gist features to all labeled images, let the nearest 20% of labeled images vote on its possible labels, take the label with the most votes as the pseudo label of the image, and break ties arbitrarily; a sketch of this pseudo-label step is given after this list.
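The pseudo-label voting step can be sketched as follows in Python, assuming gist features are stored column-wise; the function name pseudo_labels and the voting-ratio parameter are illustrative, not taken from the patent's implementation.

```python
import numpy as np

def pseudo_labels(X_l, labels_l, X_u, vote_ratio=0.2):
    """Assign pseudo labels to unlabeled gist features X_u (d x n_u)
    by majority vote of the nearest 20% labeled images (columns of X_l)."""
    n_l = X_l.shape[1]
    k = max(1, int(round(vote_ratio * n_l)))            # size of the voting set
    pseudo = np.empty(X_u.shape[1], dtype=labels_l.dtype)
    for i in range(X_u.shape[1]):
        d = np.linalg.norm(X_l - X_u[:, [i]], axis=0)    # Euclidean distances
        voters = labels_l[np.argsort(d)[:k]]             # nearest 20% labeled images
        vals, counts = np.unique(voters, return_counts=True)
        pseudo[i] = vals[np.argmax(counts)]              # ties resolved arbitrarily
    return pseudo
```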
In step 2), performing hash table training, specifically as follows:
obtaining m hash tables, where m is the number of hash tables; defining an S matrix of size n' × n', where n' is the number of labeled images, initialized as follows: S_ij = 1 indicates that the image pair (x_i, x_j) has the same label, and S_ij = -1 indicates that the image pair (x_i, x_j) does not have the same label; defining F as the number of hash functions in a single hash table; the training comprises an inner loop and an outer loop:
2.1) the mechanism of the outer loop is as follows: for the current t-th hash table, t = 1, …, m:
2.1.1) if t = 1, enter the inner loop directly; otherwise, compute the errors of the (t-1)-th hash table and update the S matrix, where an error means that, for labeled images, the Hamming distance between two images with the same label is larger than the threshold, or the Hamming distance between two images with different labels is smaller than the threshold; the S matrix is updated as follows:
S_{t+1} = S_t + c × ΔS_t
where S_t is the weight matrix used in training the t-th hash table, initialized with S_1 = S; c is a parameter controlling the rate of change of S; and ΔS_t is the weight-adjustment matrix, defined as follows:
[equation image: element-wise definition of the weight-adjustment matrix ΔS_t]
where d_H(x_i, x_j) is the Hamming distance of the image pair (x_i, x_j), computed with the (t-1)-th hash table;
2.1.2) enter the inner loop of the t-th layer and compute the t-th hash table;
2.1.3) if t = m, the m hash tables of size F have been trained and the procedure terminates; otherwise set t = t + 1 and return to step 2.1.1);
2.2) the inner loop within the t-th table trains the k-th hash function, k = 1, …, F, after first initializing X_tr = X, as follows:
2.2.1) calculating the M matrix,
[equation image: definition of the M matrix computed from the labeled features, S_{t,k} and the regularization parameter λ]
where λ is a parameter used to prevent overfitting, and S_{t,k} is the S matrix used by the current hash function, with S_{t,1} = S_t;
2.2.2) perform an eigendecomposition of the M matrix and extract the eigenvector w_{t,k} with the largest eigenvalue to form the k-th hash function of the t-th table:
h_{t,k}(x) = sign(w_{t,k}^T x)
where sign is the sign function, returning 1 for positive numbers, -1 for negative numbers, and 0 for 0;
2.2.3) remove the redundancy already captured in X_tr:
[equation image: update of X_tr that removes the redundancy captured by the k-th hash function]
2.2.4) update S_{t,k+1} on the basis of S_{t,k}: S_{t,k+1} = S_{t,k} + ΔS_{t,k},
[equation image: element-wise definition of ΔS_{t,k} in terms of A and B]
where A = (αk − D_k)/2k and B = (βk − D_k)/2k,
[equation image: definition of D_k]
α and β are two parameters, the thresholds controlling similarity and dissimilarity, respectively;
2.2.5) if k = F, the current inner loop ends; otherwise set k = k + 1 and return to step 2.2.1). A sketch of the overall training loop is given below.
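The nested training loops of step 2) can be sketched as follows. Because several of the patent's equations survive only as image placeholders, the sketch substitutes SPLH-style choices (an objective matrix M = X_l S_{t,k} X_l^T + λ X_tr X_tr^T and removal of the learned direction from X_tr) and omits the ΔS updates; these substitutions are assumptions for illustration, not the patented formulas.

```python
import numpy as np

def train_tables(X, X_l, S, m=5, F=32, lam=1.0):
    """Train m hash tables of F projections each (skeleton only).
    X: d x n centered features, X_l: d x n' labeled features,
    S: n' x n' pairwise label-agreement matrix (+1 / -1)."""
    tables = []
    S_t = S.copy()
    for t in range(m):                            # outer loop over tables
        X_tr = X.copy()
        S_tk = S_t.copy()
        W = np.zeros((X.shape[0], F))
        for k in range(F):                        # inner loop over hash functions
            # Assumed SPLH-style objective matrix (the patent's equation is an image).
            M = X_l @ S_tk @ X_l.T + lam * (X_tr @ X_tr.T)
            vals, vecs = np.linalg.eigh(M)
            w = vecs[:, -1]                        # eigenvector of the largest eigenvalue
            W[:, k] = w
            # Assumed redundancy removal: project out the learned direction.
            X_tr = X_tr - np.outer(w, w) @ X_tr
            # S_tk would be updated here via the patent's Delta-S_{t,k} rule (omitted).
        tables.append(W)
        # S_t would be updated here from the errors of table t (Delta-S_t, omitted).
    return tables

def hash_codes(W, X):
    """Map features X (d x n) to F x n binary codes with table W (d x F)."""
    return (np.sign(W.T @ X) > 0).astype(np.uint8)
```

The skeleton only marks where the inter-table (ΔS_t) and intra-table (ΔS_{t,k}) compensations of the method would be applied.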
In step 3), the feature matrix X of the images is mapped into the Hamming space with the hash functions, yielding m binary matrices H of size F × n; the category weights are computed as follows:
compute m class weight matrices V, where each V matrix has size n_c × F and n_c is the number of label classes in the dataset; V_t is the class weight matrix of the t-th hash table, t = 1, 2, …, m, and the weight of the k-th hash function for class C (including pseudo labels) corresponds to element V_{C,k} of the V_t matrix, k = 1, 2, …, F:
V_{C,k} = max(c^-, c^+)/(c^- + c^+)
where c^- and c^+ are, respectively, the numbers of images of class C (including pseudo labels) mapped to 0 and to 1 by the k-th hash function of the t-th hash table (a sketch of this class-weight computation is given below).
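The class-weight computation can be illustrated with the following Python sketch, which counts, for each class and each hash bit of one table, how many images (true plus pseudo labels) map to 0 and to 1; the function name and argument layout are assumptions.

```python
import numpy as np

def class_weights(H_t, labels, n_classes):
    """H_t: F x n binary codes of one hash table; labels: length-n array of
    class indices (true and pseudo labels). Returns V_t of shape n_classes x F.
    Assumes every class contains at least one image."""
    F, n = H_t.shape
    V_t = np.zeros((n_classes, F))
    for c in range(n_classes):
        Hc = H_t[:, labels == c]                  # codes of images in class c
        c_plus = Hc.sum(axis=1)                   # count of bits mapped to 1
        c_minus = Hc.shape[1] - c_plus            # count of bits mapped to 0
        V_t[c] = np.maximum(c_minus, c_plus) / (c_minus + c_plus)
    return V_t
```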
In step 4), performing preliminary query result calculation, specifically including the following steps:
4.1) mapping the query to a Hamming space to obtain m Hamming codes with the size of F;
4.2) for each hash table, compute the Hamming distance between the query and all images and return the images whose Hamming distance is smaller than a threshold threo, yielding m image sets; threo is a set threshold that controls the amount of preliminarily returned data; a sketch of this preliminary lookup follows.
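A minimal sketch of this preliminary per-table lookup, reusing packed integer codes as in the earlier sketch (variable names assumed):

```python
def preliminary_query(query_codes, db_codes, threo):
    """query_codes: list of m packed codes for the query;
    db_codes: list of m lists of packed codes for the database images.
    Returns m lists of database indices whose Hamming distance < threo."""
    results = []
    for t, q in enumerate(query_codes):
        hits = [i for i, c in enumerate(db_codes[t])
                if bin(q ^ c).count("1") < threo]
        results.append(hits)
    return results
```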
In step 5), performing a re-ordering operation according to the result returned in step 4), specifically comprising the following steps:
5.1) compute m query-sensitive weight vectors Z_t, t = 1, 2, …, m:
[equation image: definition of the query-sensitive weight vector Z_t from the return set R_t]
where R_t is the return set of the t-th hash table from step 4), l_i is the label of image x_i in R_t, and n_t is the number of images in R_t;
5.2) compute the weighted Hamming distance between the query and all data in the dataset and return the final query result; for query q and image x_i the weighted Hamming distance is computed as follows:
[equation image: definition of the weighted Hamming distance d_w(x_i, q)]
where ⊕ denotes the XOR operation;
return all images with d_w(x_i, q) < threo′ as the query result, where threo′ is a set threshold that determines the final returned results; a sketch of this reranking step follows.
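The reranking step can be sketched as follows. Since the patent's formula for Z_t survives only as an image placeholder, the sketch takes the weight vectors as given inputs and illustrates the weighted Hamming distance as a per-bit weighted XOR sum over the m tables; this form is an assumption consistent with the surrounding description, not the patented formula.

```python
import numpy as np

def weighted_hamming(query_bits, image_bits, Z):
    """query_bits, image_bits: m x F arrays of 0/1 bits; Z: m x F weight vectors.
    Assumed form: sum over tables t and bits k of Z[t, k] * (bit_q XOR bit_x)."""
    return float(np.sum(Z * np.bitwise_xor(query_bits, image_bits)))

def rerank(query_bits, db_bits, Z, thre_final):
    """db_bits: n x m x F array of database codes. Returns indices whose
    weighted Hamming distance is below the final threshold, sorted ascending."""
    dists = np.array([weighted_hamming(query_bits, db_bits[i], Z)
                      for i in range(db_bits.shape[0])])
    keep = np.where(dists < thre_final)[0]
    return keep[np.argsort(dists[keep])]
```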
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Small additional auxiliary memory: the method represents each image with a short sequence of binary bits, so the memory required to represent the image set is very small.
2. Fast query speed: the method can fully exploit the machine's low-level bit operations to respond quickly to a query request; since each image is represented by a binary bit sequence, the Hamming distance can be computed with low-level bit operations, which greatly accelerates the query.
3. Suitability for distributed deployment: the query can be deployed in a distributed system; because the query process compares the query's Hamming code with the Hamming codes in the database one by one and these comparisons are independent of each other, the database's image Hamming codes can be placed on several machines to increase speed.
4. The method overcomes the drawback that multi-table hashing usually requires more overhead than single-table hashing at the same performance, and can provide better query results at the same overhead.
5. The method introduces a pseudo-label calculation method and, on this basis, provides a query-result reordering method, which can further improve the query effect.
Drawings
FIG. 1 is a flow chart of the operation of the method of the present invention (labeled BDCH-R).
FIG. 2 is a graph comparing the performance of the inventive method (without reordering, labeled BDCH) with prior-art image retrieval methods.
FIG. 3 is a graph comparing the performance of the inventive method (without reordering, labeled BDCH) with single-table hashing at equal overhead.
FIG. 4 is a graph comparing the performance of the method with and without reordering (the complete method with reordering is labeled BDCH-R).
Detailed Description
The present invention will be further described with reference to the following specific examples.
Fig. 1 shows the workflow of the double-compensation multi-table hash image retrieval method of the present invention, which is divided into an offline training part and an online query part; feature extraction and hash table training take a considerable amount of time, but they are performed offline and do not affect query performance, so this cost is within an acceptable range. The evaluation of query performance is illustrated in FIG. 2 to FIG. 4, which are described in detail below.
In this example, the method is run on the CIFAR10 dataset to build the retrieval index and evaluate performance. The hardware is a server with 128 GB of memory; the software is MATLAB. The CIFAR10 dataset contains 60,000 images divided into 10 categories. 1,000 images are randomly drawn as queries, and the remaining 59,000 images are used as the image dataset. Since the method targets the case where only part of the images in the dataset are labeled, 1,000 of the 59,000 images are randomly selected as labeled images and the remaining 58,000 images are treated as unlabeled.
Recall and precision are defined as follows: for the returned result R of query q, let u be the number of images in R that are similar to q, U the number of all images in the CIFAR10 dataset that are similar to q, and v the number of images in R; then
Recall=u/U;
Precision=u/v;
By setting different thresholds, different returned results and thus a series of (recall, precision) pairs can be obtained; FIG. 2 to FIG. 4 are drawn in this way.
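The evaluation protocol can be sketched as follows (illustrative only; the ground-truth set and distance array are assumed inputs):

```python
import numpy as np

def recall_precision(returned, relevant):
    """returned, relevant: sets of database indices. Returns (recall, precision)."""
    u = len(returned & relevant)
    U = len(relevant)
    v = len(returned)
    return (u / U if U else 0.0, u / v if v else 0.0)

def pr_curve(dists, relevant, thresholds):
    """dists: distance of the query to every database image; sweep the thresholds
    to obtain one (recall, precision) pair per threshold."""
    pairs = []
    for thr in thresholds:
        returned = set(np.where(dists < thr)[0])
        pairs.append(recall_precision(returned, relevant))
    return pairs
```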
This example includes the following steps:
1) Image feature extraction and category information processing: the gist algorithm is used to extract image features; gist features are macroscopic image descriptors based on five properties of an image: naturalness, openness, roughness, expansion and ruggedness. This step represents each image by a d-dimensional feature vector and yields a feature matrix X of size d × n, where n = 59,000 is the number of dataset images and d = 512 is the feature extraction parameter. The feature matrix X is not centered, so X is centered for convenience of processing. The feature matrix X_l of the labeled images is extracted from X, with size 512 × 1,000; the feature matrix X_u of the unlabeled images has size 512 × 58,000. Pseudo labels of the unlabeled data are then computed: for each unlabeled image, Euclidean distances to all labeled images are computed and sorted, the nearest 20% of labeled images vote on the class label of the image, and the label with the most votes is taken as the pseudo label of the image.
2) Using the dataset X, the labeled dataset X_l, the label information and the pseudo-label information as the basic data, m hash tables of size F are trained, with the number of hash tables m set to 5 and the number of hash bits F set to 32. The time required for training is far longer than the time required for a query; however, training is offline and does not affect query performance, and the dataset does not change on a large scale after the model is trained, so the training overhead can be regarded as a one-off offline cost within an acceptable range.
3) After the dataset has been processed, the training data are used, following the idea of the method, to train 5 hash tables of 32-bit hash functions, and all images are mapped into Hamming codes. For each image, the Hamming code is represented with 0s and 1s; each image has 5 Hamming codes of 32 bits each, i.e. 160 binary bits in total. The Hamming distance is used to measure the similarity between images.
The class weight matrices V_t (t = 1, 2, 3, 4, 5) are then computed.
4) In this step the sum of the per-table Hamming distances between the query and all images is computed directly, and the images with smaller Hamming distance are returned; the cut-off is, of course, a threshold chosen by the user of the method. For convenience of performance evaluation, several thresholds are used, a series of (recall, precision) values is obtained, and a recall-precision curve is drawn for comparison with the performance of other methods. These results are not reordered and are labeled BDCH in FIG. 2 to FIG. 4, in order to further illustrate the performance of the method. The specific steps are as follows:
4.1) map the query image into Hamming space to obtain 5 Hamming codes of 32 bits.
4.2) compute the multi-table Hamming distance between the query and each image, where the multi-table Hamming distance is the sum over the 5 hash tables.
4.3) set several Hamming-code thresholds (from 1 up to the maximum multi-table Hamming distance over all images) and obtain a series of (recall, precision) pairs; to avoid distortion from individual cases, 1,000 queries are used and, for each threshold, the recall and precision values are averaged over the queries.
4.4) draw the recall-precision curve (a sketch of the multi-table distance used here follows).
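A minimal sketch of the multi-table Hamming distance used in this step, with codes stored as packed integers as before (names illustrative):

```python
def multi_table_distance(query_codes, image_codes):
    """Sum of per-table Hamming distances over the m (here 5) hash tables."""
    return sum(bin(q ^ c).count("1")
               for q, c in zip(query_codes, image_codes))
```

Sweeping a threshold over these summed distances and averaging over the 1,000 queries yields the (recall, precision) pairs for the BDCH curves.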
5) This step directly uses the hash tables obtained in step 3) and adds the reordering step; this is the complete method and is labeled BDCH-R. The specific steps are as follows:
5.1) map the query image into Hamming space to obtain 5 Hamming codes of 32 bits.
5.2) for each hash table, compute the Hamming distance between the query and the images and return all images with Hamming distance smaller than 8, yielding 5 return sets.
5.3) compute the query-sensitive weight vectors from the return sets of step 5.2), obtaining 5 weight vectors of size 32.
5.4) compute the weighted Hamming distances, set several thresholds, return the results, obtain several (recall, precision) values, and take the average of the recall and precision values over the queries for each threshold.
5.5) draw the recall-precision curve.
The λ parameter adjusts the weight between the labeled images and the unlabeled images; here λ = 1. In practice the proportion of labeled images in the whole image dataset differs between datasets, so the value of λ needs to be adjusted to the actual situation.
The parameters m and F are, respectively, the number of tables used by the hashing method and the number of hash bits per table. These two parameters directly affect the performance of the method: the larger their values, the better the performance, but retrieval time and memory requirements also increase, so a trade-off must be made according to the actual situation; m = 5 and F = 32 already achieve good performance.
The parameters α and β are the thresholds used to measure similarity and dissimilarity during the training of a single hash table. For the k hash functions already trained, a pair is considered similar in Hamming space when sim < αk, and dissimilar when sim > βk, where sim refers to the similarity. This is not always consistent with the similarity information obtained from the labels; the discrepancies are the errors produced during training. When the (k+1)-th function is trained, the S matrix is adjusted according to this information and the weights of the erroneous image pairs are increased; here α and β take -0.6 and 0, respectively.
The parameter c appears in the updates between tables. In single-table training, the M matrix is repeatedly eigendecomposed and the eigenvector with the largest eigenvalue is extracted to build a hash function; this makes the later hash functions differ more from the first ones and introduce larger errors. Therefore, each new table is trained from scratch and the weights of the images used for the current table's training are adjusted according to the errors of the previous table. The parameter c is the step size for updating S; in this method c is taken as 8. The threshold for measuring the errors of the previous table is round(F/4) = 8, where round means rounding: for the previous table, a pair with d_H smaller than this threshold is considered similar in Hamming space, and a pair with d_H not smaller than it is considered dissimilar; comparing these judgments with the relations recorded in the S matrix reveals the previous table's errors on the labeled data, and the weights of these image pairs are therefore increased in the current table's training.
To demonstrate the performance of the method, several other methods in the field of hash-based image retrieval are introduced below:
BSPLH [1]. BSPLH is a semi-supervised single-table hashing method and an optimization of SPLH; its authors call it a bootstrap-based sequential learning method, abbreviated BSPLH. Its main idea is that, during sequential learning, the weights of the current image pairs are adjusted according to the errors of all previous hash functions before the current hash function is learned.
LSH [2]. LSH is a typical unsupervised single-table hashing method; it randomly generates several hyperplanes and uses them to generate the hash functions. The method is highly adaptable and shows reasonable performance on various datasets, but the generated hash functions are completely unrelated to the dataset, so enough hash bits must be used to ensure performance.
SPLH [3]. SPLH is a semi-supervised single-table hashing method based on boosting; specifically, it adopts sequential learning, adjusting the weights of the current image pairs according to the errors of the previous hash function and then learning the current hash function.
CH [4]. CH is an unsupervised multi-table hashing method; it learns several tables sequentially with a boosting idea, adjusting the weights for the current table according to the errors of the previous table and then learning the next table.
DCH [5]. DCH is a semi-supervised multi-table hashing method and an improvement of CH; it applies boosting-style sequential learning not only between table trainings but also within each table.
BIQH [6]. BIQH is a supervised multi-table hashing method; each single table is obtained by minimizing the quantization error, and a reordering operation on the returned results is then applied.
FIG. 2 compares the performance of the present method (without reordering, labeled BDCH) with the several prior-art image retrieval methods listed above. At the same recall, higher precision means better performance; in general, the area enclosed by the curve and the coordinate axes can serve as the evaluation criterion. Compared with the other methods, the present method shows a clear advantage.
In addition, the present method is a multi-table hashing method: using multiple tables improves performance but introduces more overhead, so another evaluation is introduced in which all hashing methods use the same overhead. FIG. 3 compares the methods under equal overhead (the present method with 4 tables of 8 bits, i.e. 32 bits in total, without reordering, labeled BDCH; LSH with 32 bits; SPLH with 32 bits; BSPLH with 32 bits). It can be seen that even at the same overhead the method still performs very well.
FIG. 4 compares the performance of the method with and without reordering; it can be seen that reordering improves performance, but it also introduces additional overhead, so whether to use it depends on the situation.
In summary, the method of the present invention achieves fast query response, small memory overhead and high query performance in image retrieval, brings a clear improvement to multi-table hash image retrieval, overcomes the drawback that multi-table hashing requires extra overhead, and is worth popularizing.
[1]C.Wu,J.Zhu,D.Cai,C.Chen,and J.Bu,“Semi-supervised nonlinear hashing using bootstrap sequential projection learning,”IEEE Transactions on Knowledge and Data Engineering,vol.25,no.6,pp.1380–1393,2013.
[2]M.Datar,N.Immorlica,P.Indyk,and V.S.Mirrokni,“Locality-sensitive hashing scheme based on p-stable distributions,”in Proceedings of the twentieth annual symposium on Computational geometry,pp.253–262,2004.
[3]J.Wang,S.Kumar,and S.-F.Chang,“Semi-supervised hashing for large-scale search,”IEEE Transactions on Pattern Analysis and Machine Intelligence,vol.34,no.12,pp.2393–2406,2012.
[4]Xu,H.,Wang,J.,Li,Z.,Zeng,G.,Li,S.,&Yu,N.“Complementary hashing for approximate nearest neighbor search”.IEEE International Conference on Computer Vision(ICCV),pp.1631-1638,2011.
[5]Li,P.,Cheng,J.,and Lu,H."Hashing with dual complementary projection learning for fast image retrieval."Neurocomputing,vol.120,pp.83-89,2013.
[6]Fu,H.,Kong X.,and Lu,J."Large-scale image retrieval based on boosting iterative quantization hashing with query-adaptive reranking."Neurocomputing,vol.122,pp.480-489,2013.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A double-compensation multi-table Hash image retrieval method is characterized by comprising the following steps:
1) extracting image characteristics and processing category information;
2) the hash table training specifically comprises the following steps:
obtaining m hash tables, where m is the number of hash tables; defining an S matrix of size n' × n', where n' is the number of labeled images, initialized as follows: S_ij = 1 indicates that the image pair (x_i, x_j) has the same label, and S_ij = -1 indicates that the image pair (x_i, x_j) does not have the same label; defining F as the number of hash functions in a single hash table; the training comprises an inner loop and an outer loop:
2.1) the mechanism of the outer loop is as follows: for the current t-th hash table, t = 1, …, m:
2.1.1) if t = 1, enter the inner loop directly; otherwise, compute the errors of the (t-1)-th hash table and update the S matrix, where an error means that, for labeled images, the Hamming distance between two images with the same label is larger than the threshold, or the Hamming distance between two images with different labels is smaller than the threshold; the S matrix is updated as follows:
S_{t+1} = S_t + c × ΔS_t
where S_t is the weight matrix used in training the t-th hash table, initialized with S_1 = S; c is a parameter controlling the rate of change of S; and ΔS_t is the weight-adjustment matrix, defined as follows:
[equation image: element-wise definition of the weight-adjustment matrix ΔS_t]
where d_H(x_i, x_j) is the Hamming distance of the image pair (x_i, x_j), computed with the (t-1)-th hash table;
2.1.2) enter the inner loop of the t-th layer and compute the t-th hash table;
2.1.3) if t = m, the m hash tables of size F have been trained and the procedure terminates; otherwise set t = t + 1 and return to step 2.1.1);
2.2) the inner loop within the t-th table trains the k-th hash function, k = 1, …, F, after first initializing X_tr = X, as follows:
2.2.1) calculating the M matrix,
[equation image: definition of the M matrix computed from the labeled features, S_{t,k} and the regularization parameter λ]
where λ is a parameter used to prevent overfitting, and S_{t,k} is the S matrix used by the current hash function, with S_{t,1} = S_t;
2.2.2) perform an eigendecomposition of the M matrix and extract the eigenvector w_{t,k} with the largest eigenvalue to form the k-th hash function of the t-th table:
h_{t,k}(x) = sign(w_{t,k}^T x)
2.2.3) remove the redundancy already captured in X_tr:
[equation image: update of X_tr that removes the redundancy captured by the k-th hash function]
2.2.4) update S_{t,k+1} on the basis of S_{t,k}: S_{t,k+1} = S_{t,k} + ΔS_{t,k},
[equation image: element-wise definition of ΔS_{t,k} in terms of A and B]
where A = (αk − D_k)/2k and B = (βk − D_k)/2k,
[equation image: definition of D_k]
α and β are two parameters, the thresholds controlling similarity and dissimilarity, respectively;
2.2.5) if k = F, the current inner loop ends; otherwise set k = k + 1 and return to step 2.2.1);
in step 3), the feature matrix X of the images is mapped into the Hamming space with the hash functions, yielding m binary matrices H of size F × n; the category weights are computed as follows:
compute m class weight matrices V, where each V matrix has size n_c × F and n_c is the number of label classes in the dataset; V_t is the class weight matrix of the t-th hash table, t = 1, 2, …, m, and the weight of the k-th hash function for class C (including pseudo labels) corresponds to element V_{C,k} of the V_t matrix, k = 1, 2, …, F:
V_{C,k} = max(c^-, c^+)/(c^- + c^+)
where c^- and c^+ are, respectively, the numbers of images of class C (including pseudo labels) mapped to 0 and to 1 by the k-th hash function of the t-th hash table;
3) mapping the image features into Hamming space according to the hash tables and calculating category weights;
4) calculating the Hamming distance according to the query, and returning a query result;
5) reordering the results.
2. The double-compensation multi-table hash image retrieval method according to claim 1, wherein in step 1), image features are extracted and label information is processed as follows:
1.1) extract image features with the gist algorithm to obtain an image feature matrix X of size d × n, where n is the number of images in the dataset and d is the feature dimension, and center the X matrix;
1.2) divide the images into two subsets: data with semantic labels form the labeled image subset with feature matrix X_l, and data without semantic labels form the unlabeled image subset with feature matrix X_u;
1.3) for the unlabeled data, compute pseudo labels: for each unlabeled image, compute the Euclidean distance of its gist features to all labeled images, let the nearest 20% of labeled images vote on its possible labels, take the label with the most votes as the pseudo label of the image, and break ties arbitrarily.
3. The double-compensation multi-table hash image retrieval method according to claim 1, wherein in step 4), preliminary query result calculation is performed, and the specific steps are as follows:
4.1) mapping the query to a Hamming space to obtain m Hamming codes with the size of F;
4.2) for each hash table, compute the Hamming distance between the query and all images and return the images whose Hamming distance is smaller than a threshold threo, yielding m image sets; threo is a set threshold that controls the amount of preliminarily returned data.
4. The double-compensation multi-table hash image retrieval method according to claim 1, wherein in step 5), a reordering operation is performed according to the results returned in step 4), with the following specific steps:
5.1) compute m query-sensitive weight vectors Z_t, t = 1, 2, …, m:
[equation image: definition of the query-sensitive weight vector Z_t from the return set R_t]
where R_t is the return set of the t-th hash table from step 4), l_i is the label of image x_i in R_t, and n_t is the number of images in R_t;
5.2) compute the weighted Hamming distance between the query and all data in the dataset and return the final query result; for query q and image x_i the weighted Hamming distance is computed as follows:
[equation image: definition of the weighted Hamming distance d_w(x_i, q)]
where ⊕ denotes the XOR operation;
return all images with d_w(x_i, q) < threo′ as the query result, where threo′ is a set threshold that determines the final returned results.
CN201710088703.8A (priority date 2017-02-20, filing date 2017-02-20) — Double-compensation multi-table Hash image retrieval method — Active — CN106777388B (en)


Publications (2)

Publication Number — Publication Date
CN106777388A (en) — 2017-05-31
CN106777388B (en) — 2020-11-24







Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant