CN113326393B - Image retrieval method based on deep hash feature and heterogeneous parallel processing - Google Patents


Info

Publication number: CN113326393B (application CN202110600390.6A)
Authority: CN (China)
Prior art keywords: hash, image, binary, deep, network model
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other versions: CN113326393A (Chinese)
Inventors: 廖开阳, 陈星�, 曹从军, 章明珠, 王睿天, 罗晓洁
Assignees: Shenzhen Foresight Information Co., Ltd.; Xi'an Huaqi Zhongxin Technology Development Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Shenzhen Foresight Information Co., Ltd.

Classifications

    • G06F16/583: Retrieval of still image data using metadata automatically derived from the content
    • G06F16/51: Indexing; data structures therefor; storage structures for still image data
    • G06F16/55: Clustering; classification of still image data
    • G06F18/2415: Classification techniques based on parametric or probabilistic models
    • G06N3/048: Neural network activation functions
    • Y02D10/00: Energy efficient computing


Abstract

The invention discloses an image retrieval method based on deep hash features and heterogeneous parallel processing, implemented according to the following steps: step 1, train a deep hash network model; step 2, feed the test set and the query image into the trained network model to obtain their deep hash features, i.e., binary hash codes; step 3, compute the Hamming distances between the binary hash codes of the test set and of the query image obtained in step 2, and sort them in ascending order to obtain the initial ranking; step 4, select the binary hash codes of the first p images in the initial ranking, compute their Hamming distances to the query image's binary code again, and sort in ascending order to obtain the reranked result, i.e., the q retrieval results most similar to the query image. The method solves the problem of low image retrieval precision in the prior art.

Description

Image retrieval method based on deep hash feature and heterogeneous parallel processing
Technical Field
The invention belongs to the technical field of computer image retrieval, and relates to an image retrieval method based on deep hash features and heterogeneous parallel processing.
Background
With the rapid development of storage devices, computer networks, and multimedia technologies, the amount of image data people produce and encounter keeps growing. Quickly and accurately finding the image a user wants in a massive database has become a hot spot of current research, so image retrieval technology has attracted attention and developed rapidly. Such applications face two important challenges: (1) image features are usually high-dimensional, so storage requirements are high and computation is inefficient; (2) retrieval over large-scale data places high demands on speed and time.
In the prior art, two approaches to image retrieval dominate. One retrieves based on a global description of the image; because the feature dimension is high, storage, computation, and retrieval are all slow. The other retrieves based on local image features; although these describe local regions accurately, the description of the whole image is lost, so retrieval precision is low.
Therefore, how to provide an image retrieval method to improve the retrieval accuracy and speed is an urgent problem to be solved in the field of computer vision.
Disclosure of Invention
The invention aims to provide an image retrieval method based on deep hash features and heterogeneous parallel processing, solving the problem of low image retrieval precision in the prior art.
The technical scheme adopted by the invention is that the image retrieval method based on the deep hash feature and heterogeneous parallel processing is implemented according to the following steps:
step 1, off-line training network model
A GoogLeNet network model is adopted as the initialization network structure, and its last classification layer is replaced with a hash layer whose number of units equals the number of bits of the image code, yielding the GoogLeNet-1 network model. The image dataset CIFAR-10 is divided into a training set of 10 classes with 5,000 images each and a test set of 10 classes with 1,000 images each.
The training set is input into the GoogLeNet-1 network model; image depth features are extracted by the convolutional layers while the hash function is learned; the final depth features are mapped through the hash layer to obtain the corresponding binary hash codes; the loss function is then iteratively optimized and updated to obtain the optimal network parameters and the final GoogLeNet-hash model;
step 2, feed the test set and the query image into the trained GoogLeNet-hash network model to obtain their deep hash features, i.e., binary hash codes;
step 3, compute the Hamming distances between the binary hash codes of the test set and of the query image obtained in step 2, and sort them in ascending order to obtain the initial ranking;
step 4, select the binary hash codes of the first p images in the initial ranking, compute their Hamming distances to the query image's binary code again, and sort in ascending order to obtain the reranked result, i.e., the q retrieval results most similar to the query image (q < p).
The present invention is also characterized in that,
the process of generating the binary hash code in the hash layer in the step 1 and the step 2 specifically comprises the following steps:
after an m-dimensional image depth feature x is obtained from a full-connection layer of a GoogleLeNet-hash network model, the x is transmitted to a hash layer, q hash functions are provided on the assumption that the number of nodes of the hash layer is q, q bit hash codes are generated, and the hash codes generated by the q hash functions are shown in the following formula:
(h 1 ,h 2 ,...,h q ) T =(sigmoid(W 1 x),sigmoid(W 2 x)...,sigmoid(W q x)) T (1)
wherein h is 1 -h q For hash coding of bits 1 to q, sigmoid (W) 1 x)-sigmoid(W q x) is the 1 st to q th Hash codes relaxed by sigmoid function, W 1 -W q To construct q m-dimensional random vector matrices, W 1 -W q ∈R q *m ,W 1 -W q Is generated from a gaussian distribution;
quantizing the relaxed Hash code to obtain a final binary Hash code H, namely H = { H = 1 ,h 2 ,...,h q } T Thresholding is performed, and the final binary hash code is obtained by the following formula:
Figure BDA0003092503750000031
that is, the binary hash code H is a code consisting of 0 and 1.
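The relax-and-quantize procedure above can be sketched as follows (a minimal NumPy sketch; the dimensions m and q, the 0.5 threshold, and all variable names are illustrative assumptions, not the patent's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

m, q = 1024, 48                      # feature dimension m and code length q (example values)
W = rng.normal(size=(q, m))          # q m-dimensional random vectors from a Gaussian, W in R^{q x m}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_hash(x, W):
    """Relax each hash bit with the sigmoid, then quantize to {0, 1}."""
    relaxed = sigmoid(W @ x)                  # (h_1, ..., h_q), each in (0, 1)
    return (relaxed >= 0.5).astype(np.uint8)  # thresholding yields the binary code H

x = rng.normal(size=m)               # stand-in for an m-dimensional depth feature
H = binary_hash(x, W)                # q-bit binary hash code of 0s and 1s
```

Because the sigmoid output lies in (0, 1), thresholding at 0.5 is the natural quantization choice, but the patent's exact threshold is hidden behind the garbled formula image.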
In step 1, the loss function is iteratively optimized and updated to obtain the optimal network parameters and the final deep hash network model GoogLeNet-hash, specifically:
Step 1.1, compute the probability that each image in the training set belongs to each category:
f(Z_k) = e^{Z_k} / Σ_{i=1}^{n} e^{Z_i}   (3)
where Z_k denotes the image features after hash-layer weighting, n denotes the number of image classes, f(Z_k) denotes the probability that the image belongs to each class, Z_i denotes the i-th class, 1 <= i <= n, and k is the image's true class;
Step 1.2, from f(Z_k) compute the value of the loss function Loss:
Loss = -log f(Z_k)   (4)
Step 1.3, solve for the optimal value of Loss, updating the weight coefficient θ by gradient descent:
∂Loss/∂Z_k = f(Z_k) - 1   (5)
∂Loss/∂θ = f(Z_k) - 1 + γθ   (6)
θ = θ - η(f(Z_k) - 1 + γθ)   (7)
where γ is the attenuation (weight-decay) factor and η is the learning rate; this completes the correction of the Softmax classifier and the updating of the network parameters, yielding the final deep hash network model GoogLeNet-hash.
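The softmax probability, loss, and update rule can be checked with a small numeric sketch (NumPy; the logit values, η, and γ are arbitrary example numbers, and the update is written for the scalar weight form given above rather than full backpropagation through the network):

```python
import numpy as np

def f(Z, k):
    """Softmax probability that the image belongs to class k."""
    e = np.exp(Z - Z.max())          # subtract the max for numerical stability
    return e[k] / e.sum()

def loss(Z, k):
    """Cross-entropy loss -log f(Z_k) for the true class k."""
    return -np.log(f(Z, k))

def update(theta, Z, k, eta=0.1, gamma=1e-4):
    """One gradient-descent step: theta <- theta - eta * (f(Z_k) - 1 + gamma * theta)."""
    return theta - eta * (f(Z, k) - 1.0 + gamma * theta)

Z = np.array([2.0, 1.0, 0.1])        # hash-layer-weighted features (logits), n = 3 classes
k = 0                                # true class of the image
p = f(Z, k)                          # probability of the true class
L = loss(Z, k)                       # positive while p < 1
theta = update(1.0, Z, k)            # updated weight coefficient
```

Since f(Z_k) < 1, the gradient f(Z_k) - 1 + γθ is negative for small γθ, so the update pushes the weight toward higher confidence on the true class.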
The feature extraction in step 2 inputs the images into the deep hash network GoogLeNet-hash to extract their binary hash features and thresholds them, finally obtaining a feature set. The specific steps are as follows:
Given the test set ψ = {I_1, I_2, ..., I_g}, where I_g denotes the g-th image in the test set, the test-set images are input into the deep hash network model GoogLeNet-hash to extract and threshold their hash features, obtaining the final feature set ψ_H = {H_1, H_2, ..., H_g}, where H_g ∈ {0, 1}^q.
Given a query image I_k, it is input into the deep hash network model GoogLeNet-hash to extract and threshold its hash features, obtaining the image's binary hash code H_k.
Here H_g and H_k are obtained as H = {h_1, h_2, ..., h_q}^T thresholded by the quantization formula above.
Step 3 is specifically as follows:
Compute the Hamming distance between the query image I_k's binary hash code H_k and each binary hash code H_g in the test-set code set ψ_H = {H_1, H_2, ..., H_g}, and sort in ascending order of distance to obtain the initial retrieval ranking.
When computing the Hamming distance, the binary hash codes H_k and H_g are compared bit by bit; for each position where the bits differ, the Hamming distance is increased by 1, giving the corresponding distance.
The central processing unit (CPU) obtains the query image I_k's binary hash code H_k and the test set's binary hash code set ψ_H = {H_1, H_2, ..., H_g}, and transfers H_k and ψ_H to the graphics processing unit (GPU), which computes the Hamming distances; after computation the results are sorted by Hamming distance from small to large to obtain the initial ranking, which is transferred back to the CPU.
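The distance computation and initial ranking of step 3 can be sketched as below (plain NumPy standing in for the GPU kernel; in the actual CPU+GPU scheme each GPU thread would handle one test-set code, and the array contents here are made-up examples):

```python
import numpy as np

def hamming_distances(H_k, psi_H):
    """Count differing bits between the query code H_k and every row of psi_H."""
    return np.count_nonzero(psi_H != H_k, axis=1)

def initial_ranking(H_k, psi_H):
    """Sort test-set images by ascending Hamming distance to the query."""
    d = hamming_distances(H_k, psi_H)
    order = np.argsort(d, kind="stable")
    return order, d[order]

H_k = np.array([1, 0, 0, 0, 1, 0, 0, 1], dtype=np.uint8)   # query code 10001001
psi_H = np.array([
    [1, 0, 0, 0, 1, 0, 0, 1],   # identical code, distance 0
    [1, 0, 1, 1, 0, 0, 0, 1],   # 10110001, differs in 3 bits
    [0, 1, 1, 1, 0, 1, 1, 0],   # complement, distance 8
], dtype=np.uint8)

order, dists = initial_ranking(H_k, psi_H)   # order = [0, 1, 2], dists = [0, 3, 8]
```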
Step 4 is specifically as follows: the CPU computes the Hamming distances between the binary hash codes of these images and of the query image again to obtain the reranked result, i.e., the q images (q < p) most similar to the query image, giving the final retrieval result.
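A sketch of the step-4 reranking (NumPy; p, q, and all codes are toy values for illustration only):

```python
import numpy as np

def rerank(H_k, psi_H, initial_order, p, q):
    """Recompute Hamming distances for the top-p initial results and keep the q best."""
    assert q < p
    top_p = initial_order[:p]                          # indices of the first p images
    d = np.count_nonzero(psi_H[top_p] != H_k, axis=1)  # second Hamming-distance pass
    return top_p[np.argsort(d, kind="stable")][:q]     # q most similar images

H_k = np.array([1, 1, 0, 0], dtype=np.uint8)
psi_H = np.array([[1, 1, 0, 0],    # distance 0
                  [1, 1, 0, 1],    # distance 1
                  [1, 0, 0, 1],    # distance 2
                  [0, 0, 1, 1],    # distance 4
                  [1, 1, 1, 0]],   # distance 1
                 dtype=np.uint8)
initial_order = np.array([0, 1, 4, 2, 3])  # ascending-distance order from step 3
result = rerank(H_k, psi_H, initial_order, p=3, q=2)
```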
The CIFAR-10 dataset contains 60,000 images in total: the training set has 10 classes of 5,000 images each, and the test set has 10 classes of 1,000 images each.
The invention has the beneficial effects that:
the invention combines a deep learning network and a Hash algorithm to form an end-to-end deep Hash network model, then extracts binary Hash codes of CIFAR-10 images as feature indexes, accelerates the retrieval speed by introducing GPU parallel retrieval to carry out feature matching and distance measurement, and finally improves the precision of the final retrieval result by utilizing result rearrangement.
Drawings
FIG. 1 is a flow chart of an image retrieval method based on deep hash feature and heterogeneous parallel processing according to the present invention;
fig. 2 is a schematic diagram of a CPU + GPU heterogeneous parallel processing structure in the image retrieval method based on the deep hash feature and heterogeneous parallel processing of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an image retrieval method based on deep hash features and heterogeneous parallel processing; its flow is shown in FIG. 1, and it is specifically implemented according to the following steps:
step 1, off-line training network model
A GoogLeNet network model is adopted as the initialization network structure, and its last classification layer is replaced with a hash layer whose number of units equals the number of bits of the image code, yielding the GoogLeNet-1 network model. The image dataset CIFAR-10 (60,000 images in total) is divided into a training set of 10 classes with 5,000 images each and a test set of 10 classes with 1,000 images each. The training set is input into the GoogLeNet-1 network model; image depth features are extracted by the convolutional layers while the hash function is learned; the final depth features are mapped through the hash layer to obtain the corresponding binary hash codes; the loss function is then iteratively optimized and updated to obtain the optimal network parameters and the final GoogLeNet-hash model;
step 2, feed the test set and the query image into the trained GoogLeNet-hash network model to obtain their deep hash features, i.e., binary hash codes;
In the method, a hash layer is designed, and the parameter values of the hash function are learned from the training data to generate more compact hash features. After the image depth features are obtained from the fully connected layer of the GoogLeNet-hash network model, they are passed into the hash layer to generate the binary hash codes;
the process of generating the binary hash code in the hash layer in the step 1 and the step 2 specifically comprises the following steps:
supposing that m-dimensional image depth features x are obtained from a full connection layer of a GoogLeNet-hash network model, then transmitting x into a hash layer, supposing that the number of nodes of the hash layer is q, namely q hash functions exist, generating q-bit hash codes, wherein the hash codes generated by the q hash functions are shown in the following formula:
(h 1 ,h 2 ,...,h q ) T =(sgn(W 1 x),sgn(W 2 x)...,sgn(W q x)) T (1)
since the sgn function is not a convex function and the objective function cannot be optimized and solved by using a gradient-based method, the sigmoid function is selected for relaxation, the coding range is constrained to the (0, 1) interval, and the final Hash codes generated by q Hash functions are obtained as shown in the following formula:
(h 1 ,h 2 ,...,h q ) T =(sigmoid(W 1 x),sigmoid(W 2 x)...,sigmoid(W q x)) T (2)
wherein h is 1 -h q For hash coding of bits 1 to q, sigmoid (W) 1 x)-sigmoid(W q x) is the 1 st to q th Hash codes relaxed by sigmoid function, W 1 -W q To construct q m-dimensional random vector matrices, W 1 -W q ∈R q *m ,W 1 -W q Is generated from a gaussian distribution;
quantizing the relaxed Hash code to obtain a final binary Hash code H, namely H = { H = 1 ,h 2 ,...,h q } T Thresholding is performed, and the final binary hash code is obtained by the following formula:
Figure BDA0003092503750000071
that is, the binary hash code H is a code consisting of 0 and 1;
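The motivation for the sigmoid relaxation, namely that sgn provides no usable gradient while sigmoid does, can be verified numerically (a small illustrative check, not part of the patent):

```python
import numpy as np

def num_grad(f, z, eps=1e-6):
    """Central-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

sgn = np.sign
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

z = 0.7
g_sgn = num_grad(sgn, z)        # 0: sgn is flat almost everywhere, so no training signal
g_sig = num_grad(sigmoid, z)    # positive, matching sigmoid(z) * (1 - sigmoid(z))
```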
the method comprises the following steps of obtaining optimal network parameters and a final GoogLeNet-hash through iterative optimization and updating of a loss function, wherein the step of obtaining the optimal network parameters and the final GoogLeNet-hash is specifically as follows:
step 1.1, calculating the probability of each image in the training set belonging to each category;
Figure BDA0003092503750000072
wherein Z is k Representing the image features after weighting by the hash layer, n representing the number of image classes, f (Z) k ) Representing the probability of an image belonging to each class, Z i Representing the ith class, where 1 < = i < = n, k is the class of the image true;
step 1.2, according to f (Z) k ) Calculating the value of the Loss function Loss:
Loss=-logf(Z k ) (5)
step 1.3, solving the optimal value of Loss, and updating a weight coefficient theta by adopting a gradient descent method:
Figure BDA0003092503750000073
Figure BDA0003092503750000074
θ=θ-η(f(Z k )-1+γθ) (8)
wherein gamma is an attenuation factor, eta is a learning rate, so that correction of the Softmax classifier and updating of network parameters are completed, and a final deep hash network model GoogLeNet-hash is obtained;
the hash layer also belongs to a hidden layer of a neural network, the number of neurons of the hidden layer is not specifically determined, and the number of nodes of the hash layer designed in the invention determines the length of the binary coding features of the image, so that the number of nodes of the hash layer can be finally determined by comparing the training speed of different node numbers with the precision of the binary coding during retrieval through experiments.
The feature extraction in step 2 inputs the images into the deep hash network GoogLeNet-hash to extract their binary hash features and thresholds them, finally obtaining a feature set. The specific steps are as follows:
Given the test set ψ = {I_1, I_2, ..., I_g}, where I_g denotes the g-th image in the test set, the test-set images are input into the deep hash network model GoogLeNet-hash to extract and threshold their hash features, obtaining the final feature set ψ_H = {H_1, H_2, ..., H_g}, where H_g ∈ {0, 1}^q.
Given a query image I_k, it is input into the deep hash network model GoogLeNet-hash to extract and threshold its hash features, obtaining the image's binary hash code H_k.
Here H_g and H_k are obtained as H = {h_1, h_2, ..., h_q}^T thresholded according to formula (3).
Step 3, compute the Hamming distances between the binary hash codes of the test set and of the query image obtained in step 2, and sort them in ascending order to obtain the initial ranking; specifically:
Compute the Hamming distance between the query image I_k's binary hash code H_k and each binary hash code H_g in the test-set code set ψ_H = {H_1, H_2, ..., H_g}, and sort in ascending order of distance to obtain the initial retrieval ranking.
When computing the Hamming distance, the binary hash codes H_k and H_g are compared bit by bit; for each position where the bits differ, the Hamming distance is increased by 1. For example, 10001001 and 10110001 differ in 3 bits, so their Hamming distance is 3. The larger the Hamming distance, the greater the difference between the query image and the test-set image, i.e., the lower the similarity; sorting by Hamming distance from small to large therefore ranks similar images first.
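In practice the q-bit codes can be packed into integers so that each comparison is a single XOR followed by a popcount; a standard-library sketch of the example above (the packing scheme is an implementation choice, not specified by the patent):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit-packed hash codes: XOR, then count the 1 bits."""
    return bin(a ^ b).count("1")

# The example from the text: 10001001 and 10110001 differ in 3 bits.
d = hamming(0b10001001, 0b10110001)   # d == 3
```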
As shown in FIG. 2, the central processing unit (CPU) obtains the query image I_k's binary hash code H_k and the test set's binary hash code set ψ_H = {H_1, H_2, ..., H_g}, and transfers them to the graphics processing unit (GPU), which computes the Hamming distances; the computed distances are sorted from small to large to obtain the initial ranking, which is transferred back to the CPU;
Step 4, select the binary hash codes of the first p images in the initial ranking; the CPU computes their Hamming distances to the query image's binary hash code again, and sorts in ascending order to obtain the reranked result, i.e., the q retrieval results most similar to the query image (q < p).
The invention uses a deep neural network to extract image features, and the network structure has an important influence on training: a network that is too complex is hard to train and prone to overfitting, while one that is too simple cannot exercise the network's learning capacity. The GoogLeNet network is chosen; it increases the number of network layers, adds losses at different depths to avoid the vanishing-gradient problem, and concatenates convolution kernels of different sizes to fuse features at different scales.
As shown in FIG. 1, large-scale image retrieval based on deep hash features and heterogeneous parallel processing can be divided into four parts: network model training, image feature extraction, parallel processing and computation, and retrieval-result reranking. The network model training part replaces the last fully connected layer of GoogLeNet with a hash layer to form the GoogLeNet-1 network model, then obtains the final deep hash network model GoogLeNet-hash through hash learning and parameter optimization. The feature extraction part uses the pre-trained network model to extract depth features of the test-set images and the query image. The parallel processing part exploits the GPU's strong data-processing capacity, assigning threads to compute the Hamming distances between the binary hash codes of the query image and of the test-set images, and ranks by similarity according to distance, the smaller the distance, the more similar. The reranking part is a method for improving retrieval precision: by computing Hamming distances twice, it obtains the final reranked result, i.e., the q images most similar to the query.
In terms of function execution, the large-scale image retrieval method based on deep hash features and heterogeneous parallel processing first obtains the deep hash network model GoogLeNet-hash from the training set; second, it extracts binary hash-code features of the images with the pre-trained deep hash network model; then it extracts and matches features of the query image, executing CPU+GPU heterogeneous parallel processing in which threads compute the Hamming distances between the binary hash codes of the query image and of the test-set images, producing an initial ranking based on Hamming distance; finally, it reranks the results, improving retrieval precision through a second Hamming-distance computation to obtain the q images most similar to the query image. The method fully exploits the depth features of the image and the simplicity of binary hash codes, and combines the GPU's strong data-processing capacity to realize fast and accurate large-scale image retrieval.

Claims (5)

1. An image retrieval method based on deep hash characteristics and heterogeneous parallel processing is characterized by comprising the following steps:
step 1, off-line training network model
A GoogLeNet network model is adopted as the initialization network structure, and its last classification layer is replaced with a hash layer whose number of units equals the number of bits of the image code, yielding the GoogLeNet-1 network model; the image dataset CIFAR-10 is divided into a training set and a test set, each containing several classes of images; the training set is input into the GoogLeNet-1 network model, image depth features are extracted by the convolutional layers while the hash function is learned, the final depth features are mapped through the hash layer to obtain the corresponding binary hash codes, and the loss function is then iteratively optimized and updated to obtain the optimal network parameters and the final deep hash network model GoogLeNet-hash;
step 2, send the test set and the query image into the trained GoogLeNet-hash network model to obtain their deep hash features, i.e., binary hash codes; the feature extraction in step 2 inputs the images into the deep hash network GoogLeNet-hash to extract their binary hash features and thresholds them, finally obtaining a feature set, specifically:
given the test set ψ = {I_1, I_2, ..., I_g}, where I_g denotes the g-th image in the test set, the test-set images are input into the deep hash network model GoogLeNet-hash to extract and threshold their hash features, obtaining the final feature set ψ_H = {H_1, H_2, ..., H_g}, where H_g ∈ {0, 1}^q;
given a query image I_k, it is input into the deep hash network model GoogLeNet-hash to extract and threshold its hash features, obtaining the image's binary hash code H_k;
where H_g and H_k are obtained as H = {h_1, h_2, ..., h_q}^T thresholded according to formula (3);
step 3, compute the Hamming distances between the binary hash codes of the test set and of the query image obtained in step 2, and sort them in ascending order to obtain the initial ranking; specifically: compute the Hamming distance between the query image I_k's binary hash code H_k and each binary hash code H_g in the test-set code set ψ_H = {H_1, H_2, ..., H_g}, and sort in ascending order of distance to obtain the initial retrieval ranking;
the central processing unit (CPU) obtains the query image I_k's binary hash code H_k and the test set's binary hash code set ψ_H = {H_1, H_2, ..., H_g}, and transfers them to the graphics processing unit (GPU), which computes the Hamming distances; the computed distances are sorted from small to large to obtain the initial ranking, which is transferred back to the CPU;
step 4, select the binary hash codes of the first p images in the initial ranking, compute their Hamming distances to the query image's binary code again, and sort in ascending order to obtain the reranked result, i.e., the q retrieval results most similar to the query image, where q < p;
the process of generating the binary hash code in the hash layer in step 1 and step 2 is specifically:
after an m-dimensional image depth feature x is obtained from the fully connected layer of the GoogLeNet-hash network model, x is passed to the hash layer; assuming the hash layer has q nodes, there are q hash functions generating a q-bit hash code, and the hash codes generated by the q hash functions are given by the following formula:
(h_1, h_2, ..., h_q)^T = (sigmoid(W_1 x), sigmoid(W_2 x), ..., sigmoid(W_q x))^T  (2)
wherein h_1 to h_q are the 1st to q-th bits of the hash code, sigmoid(W_1 x) to sigmoid(W_q x) are the 1st to q-th hash codes relaxed by the sigmoid function, and W_1 to W_q are q constructed m-dimensional random vector matrices, W_1 to W_q ∈ R^{q×m}, generated from a Gaussian distribution;
quantizing the relaxed hash code yields the final binary hash code H, i.e. H = {h_1, h_2, ..., h_q}^T is thresholded, and the final binary hash code is obtained by the following formula:
h_i = 1 if h_i ≥ 0.5, h_i = 0 otherwise, for i = 1, ..., q  (3)
that is, the binary hash code H is a code consisting of 0s and 1s.
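A small numerical sketch of formulas (2) and (3): q Gaussian random projections applied to an m-dimensional feature x, relaxed by the sigmoid function, then thresholded to {0,1}. The 0.5 threshold, the dimensions, and all names below are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def hash_layer(x, W):
    """Formula (2): relaxed codes h_i = sigmoid(W_i . x), then
    formula (3): threshold each relaxed code to a binary bit."""
    relaxed = 1.0 / (1.0 + np.exp(-(W @ x)))   # sigmoid, shape (q,)
    return (relaxed >= 0.5).astype(np.uint8)   # assumed threshold at 0.5

rng = np.random.default_rng(0)
m, q = 8, 4                      # feature dimension m, hash length q (toy sizes)
W = rng.standard_normal((q, m))  # q Gaussian random projection rows
x = rng.standard_normal(m)       # stands in for an m-dimensional deep feature
H = hash_layer(x, W)
print(H)                         # length-q vector of 0s and 1s
```

Because sigmoid maps R into (0, 1) monotonically, thresholding its output at 0.5 is equivalent to taking the sign of W_i x, which is why the relaxation does not change which side of the hyperplane each feature falls on.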
2. The image retrieval method based on the deep hash feature and heterogeneous parallel processing as claimed in claim 1, wherein in step 1 the loss function is further iteratively optimized and updated to obtain the optimal network parameters and the final deep hash network model GoogLeNet-hash, specifically:
step 1.1, calculating the probability of each image in the training set belonging to each category;
f(Z_k) = e^{Z_k} / Σ_{i=1}^{n} e^{Z_i}  (4)
wherein Z_k represents the image features after weighting by the hash layer, n represents the number of image classes, f(Z_k) represents the probability of the image belonging to each class, Z_i represents the i-th class, where 1 <= i <= n, and k is the ground-truth class of the image;
step 1.2, calculating the value of the loss function Loss according to f(Z_k):
Loss = -log f(Z_k)  (5)
step 1.3, solving for the optimal value of Loss, and updating the weight coefficient θ by gradient descent:
∂Loss/∂Z_k = f(Z_k) - 1  (6)
∂Loss/∂θ = f(Z_k) - 1 + γθ  (7)
θ = θ - η(f(Z_k) - 1 + γθ)  (8)
wherein γ is an attenuation factor and η is the learning rate; this completes the correction of the Softmax classifier and the updating of the network parameters, yielding the final deep hash network model GoogLeNet-hash.
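To illustrate formulas (4) through (8) numerically: the softmax probability of the ground-truth class, the cross-entropy loss, and one weight-decayed gradient step. This is a scalar sketch with hypothetical logits and hyperparameter values, not the patent's training code:

```python
import numpy as np

def softmax_loss_and_grad(Z, k):
    """Formulas (4)-(6): f(Z_k) = exp(Z_k) / sum_i exp(Z_i),
    Loss = -log f(Z_k), and dLoss/dZ_k = f(Z_k) - 1."""
    f = np.exp(Z - Z.max())        # shift by max(Z) for numerical stability
    f = f / f.sum()
    loss = -np.log(f[k])
    grad_k = f[k] - 1.0            # formula (6)
    return f[k], loss, grad_k

Z = np.array([2.0, 1.0, 0.1])      # logits after the hash layer (illustrative)
k = 0                              # ground-truth class index
fk, loss, grad_k = softmax_loss_and_grad(Z, k)

# Formula (8): theta = theta - eta * (f(Z_k) - 1 + gamma * theta),
# i.e. gradient descent with weight decay gamma and learning rate eta.
theta, eta, gamma = 0.5, 0.1, 0.01  # hypothetical weight and hyperparameters
theta = theta - eta * (grad_k + gamma * theta)
```

Since f(Z_k) < 1 whenever the classifier is not perfectly confident, grad_k is negative and the update increases θ toward a better score for the true class, while the γθ term shrinks the weights slightly at every step.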
3. The image retrieval method based on the deep hash feature and heterogeneous parallel processing as claimed in claim 1, wherein when computing the Hamming distance, the binary hash code H_k and the binary hash code H_n are compared bit by bit: for each bit position at which the two hash codes differ, 1 is added to the Hamming distance, yielding the corresponding Hamming distance.
4. The image retrieval method based on the deep hash feature and heterogeneous parallel processing according to claim 1, wherein step 4 specifically comprises: the CPU recomputes the Hamming distances between these images and the binary hash code of the query image to obtain a re-ranking result, i.e. the q images (q < p) most similar to the query image, giving the final retrieval result.
5. The image retrieval method based on the deep hash feature and heterogeneous parallel processing as claimed in claim 1, wherein the CIFAR-10 data set contains 60000 images in total; the training set is divided into 10 categories with 5000 images each, and the test set is divided into 10 categories with 1000 images each.
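The bit-by-bit comparison of claim 3 above is equivalent to XOR-ing the two codes and counting the set bits (popcount), which is how packed hash codes are typically compared in practice. A minimal sketch of both views (the function names are illustrative):

```python
def hamming_distance(hk, hn):
    """Claim 3, literally: compare two binary hash codes bit by bit;
    each position where the bits differ adds 1 to the distance."""
    assert len(hk) == len(hn)
    return sum(a != b for a, b in zip(hk, hn))

def hamming_distance_packed(hk_int, hn_int):
    """Equivalent for codes packed into integers: XOR marks the
    differing bits, and counting the 1s gives the distance."""
    return bin(hk_int ^ hn_int).count("1")

print(hamming_distance([0, 1, 1, 0], [1, 1, 0, 0]))  # 2
print(hamming_distance_packed(0b0110, 0b1100))       # 2
```

The packed form is why Hamming ranking is fast on both CPU and GPU: one XOR plus one popcount instruction per machine word, instead of q separate comparisons.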
CN202110600390.6A 2021-05-31 2021-05-31 Image retrieval method based on deep hash feature and heterogeneous parallel processing Expired - Fee Related CN113326393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110600390.6A CN113326393B (en) 2021-05-31 2021-05-31 Image retrieval method based on deep hash feature and heterogeneous parallel processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110600390.6A CN113326393B (en) 2021-05-31 2021-05-31 Image retrieval method based on deep hash feature and heterogeneous parallel processing

Publications (2)

Publication Number Publication Date
CN113326393A CN113326393A (en) 2021-08-31
CN113326393B true CN113326393B (en) 2023-04-07

Family

ID=77422601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110600390.6A Expired - Fee Related CN113326393B (en) 2021-05-31 2021-05-31 Image retrieval method based on deep hash feature and heterogeneous parallel processing

Country Status (1)

Country Link
CN (1) CN113326393B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407352A (en) * 2016-09-06 2017-02-15 广东顺德中山大学卡内基梅隆大学国际联合研究院 Traffic image retrieval method based on depth learning
CN109918532A (en) * 2019-03-08 2019-06-21 苏州大学 Image search method, device, equipment and computer readable storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512273A (en) * 2015-12-03 2016-04-20 中山大学 Image retrieval method based on variable-length depth hash learning
CN106503106B (en) * 2016-10-17 2019-10-18 北京工业大学 A kind of image hash index construction method based on deep learning
CN107016708B (en) * 2017-03-24 2020-06-05 杭州电子科技大学 Image hash coding method based on deep learning
CN107423376B (en) * 2017-07-10 2019-12-27 上海媒智科技有限公司 Supervised deep hash rapid picture retrieval method and system
CN108920720B (en) * 2018-07-30 2021-09-07 电子科技大学 Large-scale image retrieval method based on depth hash and GPU acceleration
CN109241313B (en) * 2018-08-14 2021-11-02 大连大学 Image retrieval method based on high-order deep hash learning
US11556581B2 (en) * 2018-09-04 2023-01-17 Inception Institute of Artificial Intelligence, Ltd. Sketch-based image retrieval techniques using generative domain migration hashing
CN109241317B (en) * 2018-09-13 2022-01-11 北京工商大学 Pedestrian Hash retrieval method based on measurement loss in deep learning network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407352A (en) * 2016-09-06 2017-02-15 广东顺德中山大学卡内基梅隆大学国际联合研究院 Traffic image retrieval method based on depth learning
CN109918532A (en) * 2019-03-08 2019-06-21 苏州大学 Image search method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113326393A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
Zhang et al. Improved deep hashing with soft pairwise similarity for multi-label image retrieval
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN111753189B (en) Few-sample cross-modal hash retrieval common characterization learning method
CN109783682B (en) Point-to-point similarity-based depth non-relaxed Hash image retrieval method
CN113177132B (en) Image retrieval method based on depth cross-modal hash of joint semantic matrix
CN108038122B (en) Trademark image retrieval method
CN109815801A (en) Face identification method and device based on deep learning
CN110222218B (en) Image retrieval method based on multi-scale NetVLAD and depth hash
CN109766469B (en) Image retrieval method based on deep hash learning optimization
CN111125411B (en) Large-scale image retrieval method for deep strong correlation hash learning
CN104199923B (en) Large-scale image library searching method based on optimal K averages hash algorithm
CN104112018B (en) A kind of large-scale image search method
CN112732864B (en) Document retrieval method based on dense pseudo query vector representation
CN108304573A (en) Target retrieval method based on convolutional neural networks and supervision core Hash
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN111026887B (en) Cross-media retrieval method and system
CN114118369B (en) Image classification convolutional neural network design method based on group intelligent optimization
CN111008224A (en) Time sequence classification and retrieval method based on deep multitask representation learning
CN113806580B (en) Cross-modal hash retrieval method based on hierarchical semantic structure
CN112860930A (en) Text-to-commodity image retrieval method based on hierarchical similarity learning
CN112163114B (en) Image retrieval method based on feature fusion
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN114926742B (en) Loop detection and optimization method based on second-order attention mechanism
CN115795065A (en) Multimedia data cross-modal retrieval method and system based on weighted hash code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230303

Address after: 518000 301, Feiyada Science and Technology Building, No. 002, Gaoxin South 1st Road, High-tech Zone Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong Province

Applicant after: Shenzhen foresight Information Co.,Ltd.

Address before: 710000 No. B49, Xinda Zhongchuang space, 26th Street, block C, No. 2 Trading Plaza, South China City, international port district, Xi'an, Shaanxi Province

Applicant before: Xi'an Huaqi Zhongxin Technology Development Co.,Ltd.

Effective date of registration: 20230303

Address after: 710000 No. B49, Xinda Zhongchuang space, 26th Street, block C, No. 2 Trading Plaza, South China City, international port district, Xi'an, Shaanxi Province

Applicant after: Xi'an Huaqi Zhongxin Technology Development Co.,Ltd.

Address before: 710048 Shaanxi province Xi'an Beilin District Jinhua Road No. 5

Applicant before: XI'AN University OF TECHNOLOGY

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230407