Image retrieval method based on deep hash feature and heterogeneous parallel processing
Technical Field
The invention belongs to the technical field of computer image retrieval, and relates to an image retrieval method based on deep hash features and heterogeneous parallel processing.
Background
With the rapid development of storage devices, computer networks, and multimedia technologies, the volume of image data that people access and produce has grown sharply. Quickly and accurately finding the images a user wants in a massive database has become a focus of current research, so image retrieval technology has attracted wide attention and developed rapidly. Such applications currently face two major challenges: (1) image features are typically high-dimensional, so storage requirements are high and computation is inefficient; (2) retrieval over large-scale data places stringent demands on speed and response time.
In the prior art, two approaches to image retrieval are mainly adopted. One retrieves images based on a global feature description; because the feature dimensionality is high, storage, computation, and retrieval are all slowed. The other retrieves images based on local features; although local features can be described accurately, the description of the image as a whole is lost, so retrieval precision is low.
Therefore, how to provide an image retrieval method to improve the retrieval accuracy and speed is an urgent problem to be solved in the field of computer vision.
Disclosure of Invention
The invention aims to provide an image retrieval method based on deep hash features and heterogeneous parallel processing, solving the problem of low image retrieval precision in the prior art.
The technical scheme adopted by the invention is that the image retrieval method based on the deep hash feature and heterogeneous parallel processing is implemented according to the following steps:
step 1, off-line training network model
A GoogLeNet network model is adopted as the initial network structure, and the last classification layer is replaced with a hash layer, where the number of units in the hash layer equals the number of bits of the code to be generated for each image, yielding the GoogLeNet-1 network model. The image dataset CIFAR-10 is divided into a training set and a test set: the training set has 10 classes with 5,000 images per class, and the test set has 10 classes with 1,000 images per class.
The training set is input into the GoogLeNet-1 network model; image depth features are extracted through the convolutional layers while hash-function learning is performed; the depth features are mapped through the hash layer to obtain the corresponding binary hash codes; and the loss function is iteratively optimized and updated to obtain the optimal network parameters and the final deep hash network model GoogLeNet-hash;
step 2, sending the test set and the query image into the trained GoogLeNet-hash network model to obtain their deep hash features, namely binary hash codes;
step 3, calculating the Hamming distances between the binary hash codes of the test set and of the query image obtained in step 2, and sorting them in ascending order to obtain an initial ranking result;
and step 4, selecting the binary hash codes of the top p images in the initial ranking result, computing their Hamming distances to the binary hash code of the query image again, and sorting in ascending order of Hamming distance to obtain a reranked result, namely the q retrieval results (q < p) most similar to the query image.
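As an illustrative sketch only (not the patented implementation), the retrieval steps above can be outlined in NumPy, with random binary codes standing in for the GoogLeNet-hash outputs and `p`, `q` following the text's notation:

```python
import numpy as np

def hamming_rank(query_code, codes):
    """Ascending ranking of codes by Hamming distance to the query (step 3)."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

def retrieve(query_code, codes, p=100, q=10):
    """Initial ranking, then rerank the top p codes and return the q most
    similar images (q < p), mirroring steps 3-4."""
    order = hamming_rank(query_code, codes)           # step 3: initial ranking
    top_p = order[:p]                                 # step 4: keep the top p candidates
    rerank = hamming_rank(query_code, codes[top_p])   # second Hamming pass
    return top_p[rerank][:q]                          # final q results

# toy 16-bit codes standing in for test-set hash features
rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(1000, 16), dtype=np.uint8)
query = codes[42].copy()          # a query whose code matches image 42 exactly
result = retrieve(query, codes, p=50, q=5)
```

In this sketch the second pass recomputes the same Hamming distance over the reduced candidate set; in a full system the rerank stage is where a finer comparison over the top p candidates would pay off.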
The present invention is also characterized in that,
the process of generating the binary hash code in the hash layer in the step 1 and the step 2 specifically comprises the following steps:
after an m-dimensional image depth feature x is obtained from the fully connected layer of the GoogLeNet-hash network model, x is passed to the hash layer. Assuming the hash layer has q nodes, there are q hash functions, which generate a q-bit hash code as shown in the following formula:
(h1, h2, ..., hq)^T = (sigmoid(W1x), sigmoid(W2x), ..., sigmoid(Wqx))^T   (2)
where h1-hq are the 1st to q-th bits of the hash code, sigmoid(W1x)-sigmoid(Wqx) are the 1st to q-th codes relaxed by the sigmoid function, and W1-Wq are q m-dimensional random vectors, stacked into a matrix W ∈ R^(q×m) and generated from a Gaussian distribution;
the relaxed hash code H = {h1, h2, ..., hq}^T is then quantized by thresholding to obtain the final binary hash code, as in the following formula:
hi = 1 if sigmoid(Wix) ≥ 0.5, and hi = 0 otherwise, for i = 1, ..., q   (3)
that is, the binary hash code H is a code consisting of 0s and 1s.
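A minimal NumPy sketch of the relaxation-and-thresholding above; the dimensions m and q and the Gaussian initialization of W are illustrative choices, not values fixed by the invention:

```python
import numpy as np

rng = np.random.default_rng(1)
m, q = 1024, 48                      # feature dimension and code length (illustrative)
W = rng.normal(size=(q, m))          # q m-dimensional Gaussian random vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hash_layer(x):
    """Relax each bit into (0,1) with sigmoid, then threshold at 0.5."""
    relaxed = sigmoid(W @ x)         # (h1, ..., hq), each in (0,1)
    return (relaxed >= 0.5).astype(np.uint8)

x = rng.normal(size=m)               # stand-in for an m-dimensional depth feature
H = hash_layer(x)                    # q-bit binary hash code of 0s and 1s
```

Since sigmoid is monotonic, thresholding at 0.5 is equivalent to taking the sign of W·x, which is exactly what the un-relaxed sgn formulation computes.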
In step 1, the loss function is iteratively optimized and updated to obtain the optimal network parameters and the final deep hash network model GoogLeNet-hash, specifically as follows:
step 1.1, calculating the probability that each image in the training set belongs to each class:
f(Zk) = exp(Zk) / (exp(Z1) + exp(Z2) + ... + exp(Zn))   (4)
where Zk represents the image feature after hash-layer weighting for the true class k, n represents the number of image classes, f(Zk) represents the probability that the image belongs to class k, Zi represents the score of the i-th class, 1 ≤ i ≤ n, and k is the true class of the image;
step 1.2, calculating the value of the loss function Loss according to f(Zk):
Loss = −log f(Zk)   (5)
step 1.3, minimizing Loss and updating the weight coefficient θ by gradient descent:
θ = θ − η((f(Zk) − 1) + γθ)   (6)
where γ is a decay factor and η is the learning rate; this completes the correction of the Softmax classifier and the updating of the network parameters, yielding the final deep hash network model GoogLeNet-hash.
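Steps 1.1-1.3 above can be traced numerically; the class scores Z, learning rate, and decay factor below are illustrative, and the scalar θ update mirrors the text's rule rather than full backpropagation:

```python
import numpy as np

def softmax(Z):
    """Step 1.1: probability of each class; max-subtraction for stability."""
    e = np.exp(Z - Z.max())
    return e / e.sum()

Z = np.array([2.0, 1.0, 0.1])   # hash-layer-weighted scores for n = 3 classes
k = 0                           # true class of the image
f = softmax(Z)
loss = -np.log(f[k])            # step 1.2: Loss = -log f(Zk)

eta, gamma = 0.1, 1e-4          # learning rate and decay factor
theta = 0.5                     # one representative weight coefficient
theta = theta - eta * ((f[k] - 1.0) + gamma * theta)  # step 1.3 update
```

Because f(Zk) < 1, the gradient term f(Zk) − 1 is negative, so the update pushes θ in the direction that raises the true-class probability, with γθ acting as weight decay.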
The feature extraction in step 2 inputs the images into the deep hash network GoogLeNet-hash, extracts their binary hash features, and thresholds them to obtain the final feature set, with the following specific steps:
given a test set Ψ = {I1, I2, ..., Ig}, where Ig represents the g-th image in the test set, the test-set images are input into the deep hash network model GoogLeNet-hash, and their hash features are extracted and thresholded to obtain the final feature set ΨH = {H1, H2, ..., Hg}, where Hg ∈ {0,1}^q;
given a query image Ik, the query image Ik is input into the deep hash network model GoogLeNet-hash, and its hash feature is extracted and thresholded to obtain its binary hash code Hk;
where Hg and Hk are generated according to H = {h1, h2, ..., hq}^T and then thresholded according to formula (3).
The step 3 specifically comprises the following steps:
computing the Hamming distance between the binary hash code Hk of the query image Ik and each binary hash code Hg in the binary hash code set ΨH = {H1, H2, ..., Hg} of the test-set images, and sorting the distances in ascending order to obtain the initial ranking of retrieval results.
When computing a Hamming distance, the binary hash codes Hk and Hg are compared bit by bit; for each bit that differs, the Hamming distance is increased by 1, giving the corresponding Hamming distance.
At the CPU end, the binary hash code Hk of the query image Ik and the binary hash code set ΨH = {H1, H2, ..., Hg} of the test-set images are obtained; Hk and ΨH are transferred to the GPU end of the graphics processor, where the Hamming distances are computed; after computation, the results are sorted by Hamming distance from small to large to obtain the initial ranking, which is transferred back to the CPU end.
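The initial ranking above can be sketched as follows; the same array code runs on the GPU by swapping NumPy for CuPy (assuming CuPy is installed — it provides NumPy-compatible `count_nonzero` and `argsort`), which is one way to realize the CPU + GPU split:

```python
import numpy as np  # on the GPU end, `import cupy as np` would be the drop-in swap

def initial_ranking(Hk, Psi_H):
    """Hamming distance of query code Hk to every test-set code, then an
    ascending sort: smaller distance = higher similarity."""
    dists = np.count_nonzero(Psi_H != Hk, axis=1)  # differing bits per image
    order = np.argsort(dists)
    return order, dists[order]

Hk    = np.array([1, 0, 0, 0, 1, 0, 0, 1], dtype=np.uint8)   # query code 10001001
Psi_H = np.array([[1, 0, 1, 1, 0, 0, 0, 1],                  # 10110001: 3 bits differ
                  [1, 0, 0, 0, 1, 0, 0, 1],                  # identical: distance 0
                  [0, 1, 1, 1, 0, 1, 1, 0]],                 # complement: distance 8
                 dtype=np.uint8)
order, dists = initial_ranking(Hk, Psi_H)
```

With CuPy, `Hk` and `Psi_H` would be transferred to device memory before the call and `order` copied back to the host, matching the CPU → GPU → CPU flow described in the text.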
The step 4 specifically comprises: the CPU end computes the Hamming distances between the binary hash codes of the top p images and the binary hash code of the query image again to obtain a reranked result, namely the q images (q < p) most similar to the query image, as the final retrieval result.
The CIFAR-10 dataset contains 60,000 images in total; the training set comprises 10 classes with 5,000 images per class, and the test set comprises 10 classes with 1,000 images per class.
The invention has the beneficial effects that:
the invention combines a deep learning network with a hash algorithm to form an end-to-end deep hash network model, extracts binary hash codes of CIFAR-10 images as feature indexes, accelerates retrieval by introducing GPU-parallel feature matching and distance measurement, and finally improves the precision of the final retrieval result by result reranking.
Drawings
FIG. 1 is a flow chart of an image retrieval method based on deep hash feature and heterogeneous parallel processing according to the present invention;
fig. 2 is a schematic diagram of a CPU + GPU heterogeneous parallel processing structure in the image retrieval method based on the deep hash feature and heterogeneous parallel processing of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an image retrieval method based on deep hash characteristics and heterogeneous parallel processing, the flow of which is shown in figure 1 and is specifically implemented according to the following steps:
step 1, off-line training network model
A GoogLeNet network model is adopted as the initial network structure, and the last classification layer is replaced with a hash layer, where the number of units in the hash layer equals the number of bits of the code to be generated for each image, yielding the GoogLeNet-1 network model. The image dataset CIFAR-10, which contains 60,000 images in total, is divided into a training set and a test set: the training set comprises 10 classes with 5,000 images per class, and the test set comprises 10 classes with 1,000 images per class. The training set is input into the GoogLeNet-1 network model; image depth features are extracted through the convolutional layers while hash-function learning is performed; the final depth features are mapped through the hash layer to obtain the corresponding binary hash codes; and the loss function is iteratively optimized and updated to obtain the optimal network parameters and the final deep hash network model GoogLeNet-hash;
step 2, sending the test set and the query image into a trained GoogLeNet-hash network model to obtain the depth hash characteristics of the test set and the query image, namely binary hash codes;
according to the invention, by designing the hash layer, the parameter values of the hash functions are learned from the training data so as to generate more compact hash features. After the image depth features are obtained from the fully connected layer of the GoogLeNet-hash network model, the depth features are passed into the hash layer to generate the binary hash codes;
the process of generating the binary hash code in the hash layer in the step 1 and the step 2 specifically comprises the following steps:
after an m-dimensional image depth feature x is obtained from the fully connected layer of the GoogLeNet-hash network model, x is passed to the hash layer. Assuming the hash layer has q nodes, there are q hash functions, which generate a q-bit hash code as shown in the following formula:
(h1, h2, ..., hq)^T = (sgn(W1x), sgn(W2x), ..., sgn(Wqx))^T   (1)
since the sgn function is not convex and the objective function cannot be optimized with gradient-based methods, the sigmoid function is selected as a relaxation, constraining the code values to the interval (0,1); the final hash code generated by the q hash functions is shown in the following formula:
(h1, h2, ..., hq)^T = (sigmoid(W1x), sigmoid(W2x), ..., sigmoid(Wqx))^T   (2)
where h1-hq are the 1st to q-th bits of the hash code, sigmoid(W1x)-sigmoid(Wqx) are the 1st to q-th codes relaxed by the sigmoid function, and W1-Wq are q m-dimensional random vectors, stacked into a matrix W ∈ R^(q×m) and generated from a Gaussian distribution;
the relaxed hash code H = {h1, h2, ..., hq}^T is then quantized by thresholding to obtain the final binary hash code, as in the following formula:
hi = 1 if sigmoid(Wix) ≥ 0.5, and hi = 0 otherwise, for i = 1, ..., q   (3)
that is, the binary hash code H is a code consisting of 0s and 1s;
the optimal network parameters and the final deep hash network model GoogLeNet-hash are obtained through iterative optimization and updating of the loss function, specifically as follows:
step 1.1, calculating the probability that each image in the training set belongs to each class:
f(Zk) = exp(Zk) / (exp(Z1) + exp(Z2) + ... + exp(Zn))   (4)
where Zk represents the image feature after hash-layer weighting for the true class k, n represents the number of image classes, f(Zk) represents the probability that the image belongs to class k, Zi represents the score of the i-th class, 1 ≤ i ≤ n, and k is the true class of the image;
step 1.2, calculating the value of the loss function Loss according to f(Zk):
Loss = −log f(Zk)   (5)
step 1.3, minimizing Loss and updating the weight coefficient θ by gradient descent:
θ = θ − η((f(Zk) − 1) + γθ)   (6)
where γ is a decay factor and η is the learning rate, completing the correction of the Softmax classifier and the updating of the network parameters to obtain the final deep hash network model GoogLeNet-hash;
the hash layer is a hidden layer of the neural network, and there is no fixed rule for the number of hidden-layer neurons. The number of hash-layer nodes designed in the invention determines the length of the image's binary code features, so it is determined experimentally by comparing the training speed and the retrieval precision of the binary codes for different node counts.
The feature extraction in step 2 inputs the images into the deep hash network GoogLeNet-hash, extracts their binary hash features, and thresholds them to obtain the final feature set, with the following specific steps:
given a test set Ψ = {I1, I2, ..., Ig}, where Ig represents the g-th image in the test set, the test-set images are input into the deep hash network model GoogLeNet-hash, and their hash features are extracted and thresholded to obtain the final feature set ΨH = {H1, H2, ..., Hg}, where Hg ∈ {0,1}^q;
given a query image Ik, the query image Ik is input into the deep hash network model GoogLeNet-hash, and its hash feature is extracted and thresholded to obtain its binary hash code Hk;
where Hg and Hk are generated according to H = {h1, h2, ..., hq}^T and then thresholded according to formula (3).
Step 3, calculating the Hamming distances between the binary hash codes of the test set and of the query image obtained in step 2, and sorting them in ascending order to obtain the initial ranking result, specifically comprising the following steps:
computing the Hamming distance between the binary hash code Hk of the query image Ik and each binary hash code Hg in the binary hash code set ΨH = {H1, H2, ..., Hg} of the test-set images, and sorting the distances in ascending order to obtain the initial ranking of retrieval results.
When computing a Hamming distance, the binary hash codes Hk and Hg are compared bit by bit; for each bit that differs, the Hamming distance is increased by 1. For example, 10001001 and 10110001 differ in 3 bits, so their Hamming distance is 3. A larger Hamming distance means a greater difference between the query image and the test-set image, i.e., lower similarity; sorting the Hamming distances from small to large therefore ranks the images by similarity.
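The worked example above can be checked directly; packing the bit strings into integers and using XOR with a popcount is a common trick for Hamming distance (shown here as an illustrative snippet, not the patented kernel):

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance of two equal-length bit strings via XOR + popcount."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")

d = hamming("10001001", "10110001")  # the example from the text: 3 differing bits
```

The XOR leaves a 1 exactly where the two codes disagree, so counting the 1-bits of the result gives the Hamming distance in one pass.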
As shown in FIG. 2, at the CPU end of the central processing unit, the binary hash code Hk of the query image Ik and the binary hash code set ΨH = {H1, H2, ..., Hg} of the test-set images are obtained; Hk and ΨH are then transferred to the GPU end of the graphics processor, where the Hamming distances are computed; after computation, the results are sorted by Hamming distance from small to large to obtain the initial ranking, which is transferred back to the CPU end;
and step 4, selecting the binary hash codes of the top p images in the initial ranking result; the CPU end computes their Hamming distances to the binary hash code of the query image, and they are sorted in ascending order of Hamming distance to obtain the reranked result, namely the q retrieval results (q < p) most similar to the query image.
The invention uses a deep neural network to extract image features, and the network structure has an important influence on training. An overly complex network is difficult to train and prone to overfitting, while an overly simple one cannot exploit the network's learning capacity. The GoogLeNet network is therefore selected: it increases the number of network layers, adds losses at different depths to avoid the vanishing-gradient problem, and concatenates convolution kernels of different sizes to fuse features of different scales.
The large-scale image retrieval based on deep hash features and heterogeneous parallel processing can be divided into four parts, as shown in FIG. 1: a network model training part, an image feature extraction part, a parallel processing and computation part, and a retrieval result reranking part. The training part replaces the last fully connected layer of GoogLeNet with a hash layer to obtain the GoogLeNet-1 network model, and then obtains the final deep hash network model GoogLeNet-hash through hash learning and parameter optimization; the feature extraction part uses the pre-trained network model to extract the deep features of the test-set images and the query image; the parallel processing and computation part exploits the strong data-processing capability of the GPU, assigning threads to compute the Hamming distances between the binary hash codes of the query image and of the test-set images and ranking by similarity, where a smaller distance means greater similarity; the result reranking part is a method for improving retrieval precision, obtaining the final reranked result and the q most similar images by computing the Hamming distances twice.
In terms of execution, the large-scale image retrieval method based on deep hash features and heterogeneous parallel processing first obtains the deep hash network model GoogLeNet-hash from the training set; second, it extracts the binary hash code features of the images using the pre-trained deep hash network model; then it extracts the features of the query image and performs feature matching with CPU + GPU heterogeneous parallel processing, where threads compute the Hamming distances between the binary hash codes of the query image and of the test-set images to obtain the initial ranking based on Hamming distance; finally, it reranks the results, improving retrieval precision through a second Hamming distance computation to obtain the q images most similar to the query image. The method makes full use of the deep features of images and the compactness of binary hash codes, and combines the strong data-processing capability of the GPU to achieve fast and accurate large-scale image retrieval.