CN110457503B - Method for quickly optimizing depth hash image coding and target image retrieval


Info

Publication number
CN110457503B
CN110457503B
Authority
CN
China
Prior art keywords
image
hash
coding
network
formula
Prior art date
Legal status
Active
Application number
CN201910701690.6A
Other languages
Chinese (zh)
Other versions
CN110457503A (en)
Inventor
Chao Zhang
Shupeng Su
Kai Han
Yonghong Tian
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201910701690.6A priority Critical patent/CN110457503B/en
Publication of CN110457503A publication Critical patent/CN110457503A/en
Application granted granted Critical
Publication of CN110457503B publication Critical patent/CN110457503B/en


Classifications

    • G06F16/53: Querying, within G06F16/50 (information retrieval of still image data; G06F: electric digital data processing)
    • G06N3/045: Combinations of networks (neural-network architectures, computing arrangements based on biological models)
    • G06N3/084: Backpropagation, e.g. using gradient descent (neural-network learning methods)
    • G06T9/00: Image coding


Abstract

The invention discloses a method for quickly optimizing deep hash image coding and a target image retrieval method. Based on a greedy strategy, a hash image coding model is established for a large-scale image data set, and binary codes of all images are generated through the deep hash coding network obtained after optimization. During target image retrieval, similar images of the query image can be obtained quickly by computing the Hamming distance between the query image code and the database image codes. Combined with a neural network, the method better resolves the vanishing-gradient and quantization-error problems and achieves better coding performance; it completes deep network training in fewer iterations, so training is faster; it can be applied to a wide variety of problems with discrete constraints, giving it a broad application range; and it further improves the optimization speed of the deep neural network and the retrieval performance of the generated image codes, effectively improving retrieval precision on large image databases.

Description

Method for quickly optimizing depth hash image coding and target image retrieval
Technical Field
The invention belongs to the technical field of information retrieval, relates to image processing and fast image retrieval technology, and particularly relates to a greedy-strategy-based method for fast optimization of deep hash image coding and a corresponding target image retrieval method.
Background
With the advent of the big-data era, data in every field has grown explosively, and in this wave of big data, how to quickly retrieve the information a user needs is an important and urgent research topic. The hash algorithm is an algorithm for quickly completing target image retrieval on large image data sets. Its main idea is to encode images into strings of binary codes (i.e., each image is represented by a finite-length binary code serving as its feature) and to obtain the Hamming distance between images through fast XOR operations between the binary codes, so that approximate nearest-neighbor image retrieval (finding the images in an image database closest to a query image) is completed after sorting. This binary representation of image features brings very low storage requirements and very high retrieval speed (the Hamming distance between features is obtained with the computer's simplest bitwise XOR operation, from which the similarity of two images is judged), so it has great research potential and a wide range of applications.
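For illustration, here is a minimal Python sketch of this XOR-based distance computation (the function name and the 8-bit example codes are illustrative, not from the patent; int.bit_count() requires Python 3.10+):

```python
# Hamming distance between two binary codes stored as Python integers.
def hamming_distance(code_a: int, code_b: int) -> int:
    # XOR sets exactly the bits where the two codes differ;
    # counting those bits gives the Hamming distance.
    return (code_a ^ code_b).bit_count()

a = 0b10110010
b = 0b10011010  # differs from a in two bit positions
print(hamming_distance(a, b))  # -> 2
```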
In recent years, the rapid development of deep learning, and in particular of its representative model the convolutional neural network, has qualitatively improved the performance of all major image applications (classification, object detection, image retrieval, etc.). Deep neural networks are mainly trained with stochastic gradient descent: in short, an image is fed into the network, the network propagates it forward to obtain image features, a corresponding loss function is computed (also called an objective function; for retrieval, for example, the objective is that images of the same class should have similar features), the loss is then propagated backward, and the gradient of the neurons in each layer is computed (in the direction that reduces the loss), completing parameter updating and network training until the loss function is minimized (as shown in FIG. 1). The power of deep learning has prompted hashing researchers to propose combining the hash algorithm with deep networks (the deep hash algorithm) to further improve the retrieval performance of image coding.
The deep hash algorithm solves the fast image retrieval task in two steps. First, the powerful feature-learning capacity of the convolutional neural network is used to learn a deep feature representation of each image in the image database; compared with representing an image by its raw pixel values or by features extracted with traditional feature-extraction algorithms, the deep network outputs image features that better characterize the input image. Second, a hash algorithm further encodes the continuous-valued image features into binary features, greatly reducing the storage requirement and sharply improving retrieval efficiency, thus truly meeting the demand for fast and accurate retrieval. The deep hash algorithm effectively integrates these two steps into a single deep network framework, so that deep feature learning and hash coding promote each other during learning and training, yielding optimal binary image codes and the corresponding coding network.
In practice, however, true end-to-end network training for the deep hash algorithm remains a very challenging problem. The main difficulty is that the gradient (derivative) of the sign function used to encode an image into a binary code (as shown in FIG. 2) is zero everywhere, which is fatal to a deep neural network trained by gradient descent: the layers in front of the sign function obtain no gradient update information, and training fails.
The hash algorithm can quickly complete the target image retrieval task on large image data sets by encoding an image into a string of compact binary codes (i.e., given a query image, the algorithm finds similar images in a large image database and returns them to the user), and it has a very wide range of applications, such as image search and face authentication. The deep hash algorithm seeks to combine the strengths of deep learning and hashing to further improve the performance of image retrieval systems. A very troublesome problem in deep hash research is that the sign function used to encode images into binary codes (output +1 for inputs greater than 0, output −1 for inputs less than 0) has a gradient of zero everywhere; this is fatal to a deep neural network trained by gradient descent, since the vanishing gradient leaves the front layers of the network with no update information, training finally fails, and images cannot be encoded effectively.
Most existing deep hash algorithms propose relaxing the original problem: rather than strictly requiring binary codes {−1, +1} in the training stage, the codes are relaxed to continuous values between −1 and +1 (the corresponding generating function is differentiable everywhere), so that the network can complete training; the continuous-valued features are then quantized in the final testing stage to obtain true binary codes. Although this avoids the vanishing-gradient problem, the relaxation strategy introduces a quantization-error problem: the true binary codes forcibly generated in the testing stage differ from the image features produced in the training stage, and performance drops.
Although HashNet and DSDH can work around obstacles such as vanishing gradients and quantization error to complete network training, their shortcomings are obvious. On the one hand, both require many training iterations: DSDH updates only one bit of the binary code at a time, so the different code positions must be updated in rotation, while HashNet must retrain after each tightening of the scale parameter β, so more training iterations are needed and the training cost is high. On the other hand, the discrete cyclic coordinate descent method used by DSDH applies only to discrete quadratic programming problems, limiting its range of application, and HashNet still ends up with partial quantization error, so its performance is not optimal.
Most deep hash algorithms apply a relaxation strategy to address the vanishing-gradient problem. For example, document [1] uses the tanh function (whose derivative is not 0) to output continuous values in [−1, +1] in place of the discrete values {−1, +1} of the sign function, and after training the resulting image features are strictly binarized to obtain true binary features. Although this solves the vanishing-gradient problem, the relaxation introduces quantization error: the true binary codes generated in the testing stage differ from the continuous-valued image features generated in the training stage, so the resulting binary codes and coding network are suboptimal. For the quantization-error problem, documents [2] and [3] present their own solutions and achieve leading performance in the field. Document [2] proposes the HashNet algorithm, which reduces quantization error by first using a relaxed coding function y = tanh(βx) and then gradually increasing β during training to approximate the original sign function y = sgn(x); the gradual increase in training difficulty keeps the network from failing at the start of training. Document [3] proposes the DSDH algorithm, which solves the discrete hash coding optimization problem with a discrete cyclic coordinate descent method; the whole solving and training stage maintains the discrete-value constraint without relaxation, avoiding quantization error. Although retrieval performance improves, HashNet and DSDH still suffer from slow network training, a limited range of application, and similar problems.
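As a small illustration of the continuation idea of document [2], the following sketch (toy values of our choosing, not from the cited papers) shows y = tanh(βx) approaching sgn(x) as β grows, which is exactly the quantization gap HashNet tightens during training:

```python
import numpy as np

x = np.array([-1.3, -0.2, 0.4, 2.1])     # toy pre-activation feature values
for beta in (1.0, 5.0, 25.0):
    relaxed = np.tanh(beta * x)           # continuous training-stage output
    binary = np.sign(x)                   # strict test-stage code
    gap = np.abs(binary - relaxed).max()  # worst-case quantization gap
    print(f"beta={beta:5.1f} relaxed={np.round(relaxed, 3)} max-gap={gap:.3f}")
```

As β increases, the maximum gap shrinks toward zero.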
Reference documents:
[1] Zhao F, Huang Y, Wang L, et al. Deep semantic ranking based hashing for multi-label image retrieval[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1556-1564.
[2] Cao Z, Long M, Wang J, et al. HashNet: Deep learning to hash by continuation[C]//ICCV. 2017: 5609-5618.
[3] Li Q, Sun Z, He R, et al. Deep supervised discrete hashing[C]//Advances in Neural Information Processing Systems. 2017: 2482-2491.
[4] Goodfellow I, Bengio Y, Courville A, et al. Deep Learning[M]. Cambridge: MIT Press, 2016.
[5] Li W J, Wang S, Kang W C. Feature learning based deep supervised hashing with pairwise labels[J]. arXiv preprint arXiv:1511.03855, 2015.
[6] Zhu H, Long M, Wang J, et al. Deep hashing network for efficient similarity retrieval[C]//AAAI. 2016: 2415-2421.
[7] Lai H, Pan Y, Liu Y, et al. Simultaneous feature learning and hash coding with deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3270-3278.
[8] Xia R, Pan Y, Lai H, et al. Supervised hashing for image retrieval via image representation learning[C]//AAAI. 2014, 1(2014): 2.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a greedy-strategy-based method for fast optimization of deep hash image coding and a target image retrieval method. Combined with a neural network, the method better resolves the vanishing-gradient and quantization-error problems and achieves better coding performance; it completes deep network training in fewer iterations, so training is faster; it can be applied to a wide variety of problems with discrete constraints, giving it a broader application range; and it further improves the optimization speed of the deep neural network and the retrieval performance of the generated image codes, effectively improving retrieval precision on large image databases.
The technical scheme provided by the invention is as follows:
a target image retrieval method for fast optimizing depth hash coding comprises the following steps: a greedy strategy-based method for rapidly optimizing a depth hash image coding method and a target image retrieval method are provided. Based on a greedy strategy, aiming at a large-scale image data set, a Hash image coding model is established, and binary codes of all images are generated through a depth Hash coding network obtained after optimization. When the target image is searched, similar images of the query image can be quickly obtained by calculating the Hamming distance between the query image code and the database image code. The method comprises the following steps:
1) modeling a Hash image coding problem to obtain a Hash image coding model;
modeling the Hash image coding problem to obtain an optimization problem, namely a Hash image coding model, which is expressed as formula (1):
min_B L(B)
s.t. B ∈ {−1, +1}^K    formula (1)
where B represents the binary code generated by forward propagation of the input image through the deep network, and the constraint restricts each bit of the code B to values in {−1, +1}, with K bits in total (i.e., each image is coded as a K-bit binary code). L(B) represents the loss function computed on B; because, unlike DSDH, the algorithm proposed by the invention does not require L to take a quadratic-programming form, L in the above formula can be any loss function commonly used in deep learning, such as a mean square error function or a cross-entropy function.
2) Solving a Hash image coding model (formula (1)) by using a greedy strategy to obtain an optimal binary code B; the method comprises the following operations:
21) in the solving process, without considering the discrete constraint B ∈ {−1, +1}^K, first calculate the gradient of L with respect to B, ∂L/∂B;
The iterative update is then performed using the gradient descent method, expressed as formula (2):

B^{t+1} = B^t − lr · ∂L/∂B^t    formula (2)

where t denotes the t-th round of updating during training, lr denotes the learning rate preset by the algorithm, B^t denotes the code B after the t-th round of updating, B^{t+1} denotes the code B after the (t+1)-th round of updating, and L denotes the loss function of the model.
B^{t+1} found by the gradient update of formula (2) is the optimal update direction of L(B) in the current iteration, selected without considering the discrete constraint;
22) take the solution nearest to the continuous value B^{t+1} that satisfies the discrete-value constraint, namely sgn(B^{t+1}), where sgn(·) denotes applying the sign function element-wise;
23) update the parameters toward this solution sgn(B^{t+1}); that is, solve the hash coding optimization problem of formula (1) using formula (3):

B^{t+1} = sgn(B^t − lr · ∂L/∂B^t)    formula (3)
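The greedy step of formula (3) can be exercised outside any network. The following minimal sketch (toy quadratic loss and learning rate of our choosing, not fixed by the patent) applies the sign-snapped gradient step to a 4-bit code:

```python
import numpy as np

def greedy_step(B, grad_L, lr):
    # formula (3): descend along the gradient, then snap back onto
    # the discrete set {-1,+1}^K with the element-wise sign function
    return np.sign(B - lr * grad_L)

target = np.array([1.0, -1.0, 1.0, -1.0])   # code minimizing the toy loss
B = np.array([-1.0, -1.0, -1.0, 1.0])       # arbitrary initial binary code
for t in range(3):
    grad = 2.0 * (B - target)               # gradient of L(B) = ||B - target||^2
    B = greedy_step(B, grad, lr=0.6)
print(B)                                    # -> [ 1. -1.  1. -1.]
```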
3) designing a deep Hash coding module in a deep network, training a Hash image coding model, and realizing an updating mode of the formula (3); the method comprises the following operations:
31) representing an input image as a series of image features H taking continuous values using a convolutional neural network;
32) designing a brand-new deep hash image coding module as the last layer of the convolutional neural network, whose input is the continuous-valued image feature H and whose output is the code B; the update rule of formula (3) is realized inside this module: in the module's forward propagation, the sign function is applied to H bit by bit to obtain a strictly binary code B; in the module's backward propagation, the gradient information obtained by the code B is assigned directly to H (i.e., the gradient information of H is set equal to that of B), so that the gradient is passed back smoothly to the front-layer network;
The deep hash image coding module thus solves the vanishing-gradient and quantization-error problems of the deep hash algorithm and achieves fast optimization and accurate coding. The module completes the following operations:
321) introducing the variable H into formula (3) yields formula (4), comprising (4a) and (4b):

B^{t+1} = sgn(H^{t+1})    formula (4a)
H^{t+1} = H^t − lr · ∂L/∂B^t    formula (4b)

where H^{t+1} denotes the variable H after the (t+1)-th round of updating;
322) to implement equation (4a), a sign function is applied to the input H in the forward propagation of the depth hash image coding module, and is expressed as equation (5):
B = sgn(H)    formula (5)
323) To realize the formula (4b), adding a penalty term in the target function of the network training to assist the depth hash image coding module;
for (4b), a penalty term ‖H − sgn(H)‖_p^p is first added to the loss function L, enabling the network to approximately satisfy H ≈ sgn(H) = B during training; for the variable H, its update formula then becomes formula (6):

H^{t+1} = H^t − lr · ∂L/∂H^t    formula (6)

where H^t denotes the variable H after the t-th round of updating;
comparing formula (6) with formula (4b), the way to implement (4b) is to perform direct gradient assignment in the backward propagation of the deep hash coding module, expressed as formula (7):

∂L/∂H = ∂L/∂B    formula (7)

Formula (7) means that, in the backward propagation of the newly designed coding module, the gradient of the loss function with respect to B is transmitted directly and completely back to the front layer H, and finally back to the network front end, completing parameter learning and network updating.
During network training, the sign function is used strictly in forward propagation so that the discrete constraint always holds; in backward propagation the gradient is passed back completely to the front-layer network, solving the vanishing-gradient problem while updating all coding bits synchronously, so that effective neural network training and convergence are completed quickly;
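A minimal PyTorch sketch of such a coding module is given below (the class name and shapes are ours, for illustration only): the forward pass applies the sign function strictly, as in formula (5), and the backward pass copies the incoming gradient of B onto H unchanged, as in formula (7):

```python
import torch

class SignWithIdentityGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h):
        return torch.sign(h)       # forward: B = sgn(H), strictly binary

    @staticmethod
    def backward(ctx, grad_b):
        return grad_b              # backward: dL/dH := dL/dB, passed back intact

h = torch.randn(4, 12, requires_grad=True)   # batch of 4 features, 12-bit codes
b = SignWithIdentityGrad.apply(h)
b.sum().backward()                           # gradients flow through to h
print(b[0], h.grad[0])                       # h.grad equals the gradient of b
```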
4) after training is finished, a trained image depth Hash coding network is obtained;
5) coding all database images by using a trained deep hash coding network to generate database image codes;
through the process, rapid optimization depth hash image coding based on the greedy strategy is achieved.
When the target image retrieval is carried out, the following operations are carried out:
6) when a user provides a query image, the depth hash coding network is used for coding the query image to generate a query image code;
7) then, the Hamming distances between the query image code and all database image codes are calculated, and after sorting, the M database images with the smallest distances (M being the number of returned images set by the user) are returned as the similar-image retrieval result for the query image, as sketched below;
Through this process, database images similar to the query image are quickly retrieved.
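For illustration, here is a minimal retrieval sketch (the names and sizes are ours; it exploits the fact that for ±1 codes the Hamming distance reduces to a dot product, hamming = (K − ⟨q, d⟩)/2):

```python
import numpy as np

def retrieve(query_code, db_codes, M):
    # Hamming distance for {-1,+1} codes via an inner product.
    K = query_code.shape[0]
    dists = (K - db_codes @ query_code) / 2
    return np.argsort(dists)[:M]     # indices of the M nearest database images

db = np.sign(np.random.randn(1000, 48))   # 1000 database images, 48-bit codes
q = db[123].copy()                        # query identical to database image 123
print(retrieve(q, db, M=5))               # image 123 ranks first
```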
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a greedy strategy-based target image retrieval method for rapidly optimizing deep hash coding, which solves a deep hash discrete optimization problem by using the greedy strategy and updates a network by iteratively solving an approximate optimal solution meeting discrete constraints under the current condition so as to complete rapid and effective training. In the specific implementation, a brand-new deep hash coding module is designed, the sign function is strictly used during forward propagation to keep the discrete constraint always true, the problem of quantization error is avoided, the gradient is completely transmitted back to a front-layer network during reverse propagation, the problem of gradient disappearance is solved, and simultaneously, each coding bit is synchronously updated, so that effective neural network training and convergence are rapidly completed. In addition, a penalty term is added in the objective function of network training to assist the proposed coding module, so that gradient deviation generated in backward propagation of the coding module is effectively reduced, the accuracy of the parameter updating direction and the stability of network optimization are ensured, and the generated image coding is more accurate and robust. Compared with the traditional algorithm, the deep hash coding method provided by the invention has higher training speed and better retrieval performance and fully illustrates the application potential of the method in the large-scale image database retrieval problem.
The technical advantages of the method of the invention are embodied as follows:
(I) A brand-new deep hash coding module is proposed based on the greedy strategy. The module keeps using the sign function strictly in forward propagation to maintain the discrete constraint, preventing the introduction of quantization error; it then passes gradients back completely in backward propagation, avoiding the vanishing-gradient problem and updating all coding bits synchronously, helping the network complete fast training and improve coding performance. For large-scale image retrieval applications, the module effectively reduces training cost and markedly improves retrieval precision.
(II) A penalty term ‖H − sgn(H)‖_p^p is added to the objective function of network training to assist the proposed coding module, bringing the value of the continuous variable H closer to the discrete coding variable B; this reduces the gradient deviation in the module's backward propagation, ensures the accuracy of the parameter update direction and the stability of network optimization, and makes the generated image codes more accurate and robust.
Drawings
FIG. 1 is a schematic diagram of a prior art neural network gradient descent method;
wherein, the abscissa is a parameter w of the network; the ordinate is the loss function of the training network.
Fig. 2 is a schematic diagram of a sign function.
Fig. 3 is a flowchart of a target image retrieval method provided by the present invention.
FIG. 4 is a schematic diagram of a network model used in the practice of the present invention;
the front-layer backbone of the network adopts the AlexNet structure; an image passes through AlexNet to obtain the feature H, which passes through the hash coding module provided by the invention to obtain the code B.
FIG. 5 compares the optimization speed of the method of the invention with other prior-art methods.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
The invention provides a greedy-strategy-based method for fast optimization of deep hash image coding and a target image retrieval method. By designing a brand-new deep hash coding module, the sign function is used strictly in forward propagation so that the discrete constraint always holds, avoiding the quantization-error problem; the gradient is passed back completely to the front-layer network in backward propagation, and all coding bits are updated synchronously while solving the vanishing-gradient problem, so that effective neural network training and convergence are completed quickly. In addition, a penalty term is added to the objective function of network training to assist the proposed coding module, effectively reducing the gradient deviation generated in the module's backward propagation, ensuring the accuracy of the parameter update direction and the stability of network optimization, and making the generated image codes more accurate and robust.
The flow of the method of the invention is shown in FIG. 3. The overall structure of the network is shown in fig. 4.
In a specific implementation, image codes are generated for all query images and all database images by the following steps:
firstly, modeling a Hash coding problem to obtain the following optimization problem:
min_B L(B)
s.t. B ∈ {−1, +1}^K    formula (1)
where B represents the binary code generated by forward propagation of the input image through the deep network, and the constraint restricts each bit of the code B to values in {−1, +1}, with K bits in total (i.e., each image is coded as a K-bit binary code). L(B) represents the loss function computed on B; because, unlike DSDH, the algorithm proposed by the invention does not require L to take a quadratic-programming form, L in the above formula can be any loss function commonly used in deep learning, such as a mean square error function or a cross-entropy function.
To solve this optimization problem and obtain the optimal binary code B: without considering the discrete constraint B ∈ {−1, +1}^K, the gradient of L with respect to B, ∂L/∂B, can first be calculated.
The iterative update is then performed using the gradient descent method:

B^{t+1} = B^t − lr · ∂L/∂B^t    formula (2)

where t denotes the t-th round of updating during training, lr denotes the learning rate preset by the algorithm, B^t denotes the code B after the t-th round of updating, B^{t+1} denotes the code B after the (t+1)-th round of updating, and L denotes the loss function of the model. However, the B calculated by this formula will almost never satisfy the constraint B ∈ {−1, +1}^K, yet once the discrete-value constraint is considered, problem (1) becomes NP-hard. A fast and effective algorithm for such NP-hard problems is the greedy algorithm, which makes the optimal choice under the current conditions at each iterative update and can thereby obtain a solution very close to the global optimum. If B^{t+1} found by the gradient update of formula (2) is the optimal update direction of L(B) in the current iteration, chosen without taking the discrete constraint into account, then by the greedy principle the solution nearest to the continuous value B^{t+1} that satisfies the discrete-value constraint, sgn(B^{t+1}), is likely to be the optimal discrete solution of this iteration. The invention therefore "greedily" updates the parameters toward this solution sgn(B^{t+1}); that is, problem (1) is solved using the following formula:

B^{t+1} = sgn(B^t − lr · ∂L/∂B^t)    formula (3)
although equation (3) may not be the most effective method for solving the discrete optimization problem alone, it is one of the most efficient ways to solve the discrete optimization problem by fusing with the neural network, because there are three points:
1. the deep neural network completes parameter updating and learning through a random gradient descent method, and gradient descent is a greedy strategy which completes final convergence by iteratively updating one step towards the steepest descent direction of the current situation. It can be seen that the feasibility of the greedy strategy in the neural network is very high, so that the idea is very reasonable and time-consuming to solve the problem of discrete value optimization in the deep network.
2. The update form of equation (3) is actually a variant of the neural network gradient update formula (also calculating the parameter gradient and then performing the parameter update), and the update mode with the two similar forms lays a solid foundation for implementing equation (3) in the neural network (see section 4.2).
3. As indicated in document [4], the stochastic gradient descent algorithm (using a part of samples to calculate the gradient) is originally equivalent to adding noise to the true gradient information (calculated using all training samples), and the appropriate noise in the gradient can bring some regularization effect to the neural network and also enable the neural network to escape from most local minimum points and saddle points in the optimization. Observing equation (3) can find that the action is equivalent to introducing "noise" to the original equation (2) through the sign function sgn (). Therefore, the use of the formula (3) in the neural network is not only beneficial to solving the discrete value optimization problem of the network, but also improves the optimization process performance of the network to a certain extent.
The problem of discrete coding optimization in a neural network is solved based on the greedy strategy, and the deep hash coding module provided by the present invention is specifically described below to implement the formula (3) update mode in the deep network.
As in the basic flow of the deep hash algorithm described in the Background section, an input image is first represented as a string of continuous-valued image features using a convolutional neural network (any common deep network can be used, such as AlexNet or ResNet), and this feature string is denoted H. A new hash coding module is then designed whose input is the continuous-valued image feature H and whose output is the code B. The update rule of formula (3) derived above must be implemented inside this module; the construction of the module makes intuitively clear how it solves the vanishing-gradient and quantization-error problems of the deep hash algorithm and achieves fast optimization and accurate coding.
First introducing the variable H into equation (3), one can obtain:
B^{t+1} = sgn(H^{t+1})    formula (4a)
H^{t+1} = H^t − lr · ∂L/∂B^t    formula (4b)

where H^{t+1} denotes the variable H after the (t+1)-th round of updating.
The task then becomes how to implement (4a) and (4b). Implementing (4a) is straightforward: apply the sign function to the input H in the forward propagation of the newly designed coding module:

B = sgn(H)    formula (5)
For (4b), a penalty term ‖H − sgn(H)‖_p^p needs to be added to the loss function L. With this penalty term, the network can be made to approximately satisfy H ≈ sgn(H) = B during training, and the update formula of the variable H becomes:

H^{t+1} = H^t − lr · ∂L/∂H^t    formula (6)

where H^t denotes the variable H after the t-th round of updating.
Comparing formula (4b) with formula (6) reveals how to implement (4b), namely in the backward propagation of the newly designed deep hash coding module:

∂L/∂H = ∂L/∂B    formula (7)

This formula means that, in the backward propagation of the newly designed coding module, the gradient of the loss function with respect to B is transmitted directly and completely back to the front layer H, and finally back to the network front end, completing parameter learning and network updating.
Intuitively, the two most critical parts of the hash coding module (formulas (5) and (7)) respectively solve the two problems of vanishing gradients and quantization error in the deep hash field. First, the hash coding module uses formula (5) in forward propagation, which means the discrete-valued property of the code is kept strictly throughout training, without any relaxation, so the quantization-error problem is avoided at its root. In backward propagation, the coding layer uses formula (7), so the gradient of the loss function with respect to B is transmitted directly and completely back to the variable H; on the one hand this solves the vanishing-gradient problem caused by using the sign function sgn(·) in forward propagation, and on the other hand every coding bit obtains its gradient and is updated simultaneously, so the network completes training and learning quickly.
In addition, the invention adds a penalty term ‖H − sgn(H)‖_p^p to the objective function of network training to assist the proposed coding module. Its role can be understood intuitively as pulling the value of the variable H closer to the variable B, thereby reducing the gradient deviation (i.e., making the two sides of formula (6) as nearly equal as possible) and ensuring the accuracy of the parameter update direction and the stability of network optimization.
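A short sketch of this penalty as we reconstruct it (the exponent p is our assumption and is not fixed by the text here):

```python
import torch

def sign_penalty(h: torch.Tensor, p: int = 3) -> torch.Tensor:
    # ||H - sgn(H)||_p^p summed over all bits; sgn(H) is detached and treated
    # as a constant target, so the gradient flows only into H.
    return (h - torch.sign(h).detach()).abs().pow(p).sum()

h = torch.tensor([0.9, -0.2, 1.4], requires_grad=True)
loss = sign_penalty(h)
loss.backward()
print(loss.item(), h.grad)   # descent on this loss pulls each entry toward +/-1
```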
Therefore, thanks to formula (7), the algorithm of the invention achieves a faster optimization speed than other algorithms, and the cooperation of formulas (5) and (7) with the penalty term lets the hash coding layer and the deep neural network fuse better, giving the algorithm better coding performance.
The following is a verification of the method of the invention:
first, the experimental details are clarified, and the invention will accomplish the code implementation of the algorithm of the invention on the pyrrch framework. AlexNet is selected as the basis of the convolutional neural network, namely AlexNet is used for completing the operation of extracting the continuous value features of the image, then the Hash coding layer is connected to the AlexNet output H to generate the image binary coding B, then the most common cross entropy loss is used for classifying the coding B, the sum of the misclassification losses is calculated and back propagation is started, parameters are updated, and the training of the network is completed. The present invention will use an update of the batch sample processing, with the batch size being 32. Meanwhile, the invention uses a random gradient descent optimizer with momentum, the learning rate lr is set to be 0.001, and the momentum parameter is set to be 0.9.
First, the fast-optimization performance of the algorithm is shown by comparing it with the DSDH algorithm, which maintains the discrete constraint throughout, and with several common relaxation-strategy hash coding methods (in the figure, "ours" denotes the algorithm proposed by the invention, "tanh" denotes replacing the sgn function with the relaxed tanh function, "penalty" denotes pushing the network output toward −1 and +1 values through a penalty alone, without a hard constraint, "12" denotes coding with 12 bits, and "48" with 48 bits); the results are shown in FIG. 5.
The figure clearly shows that the proposed algorithm achieves a faster and larger gain in retrieval performance (MAP) with fewer training iterations (epochs), so its training cost is lower than that of traditional algorithms.
The excellent retrieval performance of the present invention on large image datasets is demonstrated next. The results of comparing the present invention with several current deep hash algorithms with top performance in the field are shown in table 1 (tested on dataset CIFAR 10) and table 2 (tested on dataset ImageNet).
TABLE 1 Retrieval performance (MAP) comparison on the CIFAR10 dataset
[The table data are reproduced as an image in the original publication.]
TABLE 2 Retrieval performance (MAP) comparison on the ImageNet dataset
[The table data are reproduced as an image in the original publication.]
Compared with existing deep hash algorithms such as DSDH and HashNet, the method of the invention has better coding performance, meaning that the binary image codes it generates achieve better retrieval performance, so it is very suitable for application in large-scale image retrieval systems.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (6)

1. A fast optimized depth Hash image coding method is characterized in that a Hash image coding model is established based on a greedy strategy aiming at a large image data set, and binary codes of all images are generated through a depth Hash coding network obtained after optimization; the method comprises the following steps:
1) modeling a Hash image coding problem to obtain a Hash image coding model;
the hash image coding model is expressed by equation (1):
min_B L(B)
s.t. B ∈ {−1, +1}^K    formula (1)
wherein B represents the binary code generated by forward propagation of the input image through a deep network; the constraint restricts each bit of the code B to values in {−1, +1}, with K bits in total, i.e., each image is coded as a K-bit binary code; L(B) represents the loss function computed on B;
2) solving a Hash image coding model by using a greedy strategy to obtain an optimal binary code B; the method comprises the following operations:
21) in the solving process, without considering the discrete constraint B ∈ {−1, +1}^K, the gradient of L with respect to B, ∂L/∂B, is first calculated;
the iterative update is then performed using the gradient descent method, expressed as formula (2):

B^{t+1} = B^t − lr · ∂L/∂B^t    formula (2)

wherein t represents the t-th round of updating during training, lr represents the learning rate preset by the algorithm, B^t represents the code B after the t-th round of updating, B^{t+1} represents the code B after the (t+1)-th round of updating, and L represents the loss function of the model; B^{t+1} found by the gradient update of formula (2) is the optimal update direction of L(B) in the current iteration, selected without considering the discrete constraint;
22) taking the solution nearest to the continuous value B^{t+1} that satisfies the discrete-value constraint, namely sgn(B^{t+1}), wherein sgn(·) represents applying the sign function element-wise;
23) updating the parameters toward this solution sgn(B^{t+1}), that is, solving formula (1) using formula (3):

B^{t+1} = sgn(B^t − lr · ∂L/∂B^t)    formula (3)
3) designing a depth Hash image coding module in a depth network, training a Hash image coding model, and realizing an updating mode of the formula (3); the method comprises the following operations:
31) representing an input image as a series of image features H taking continuous values using a convolutional neural network;
32) designing a brand-new depth hash image coding module at the last layer of the convolutional neural network:
the input is continuous value image feature H, and the output is code B;
realizing the updating mode of the formula (3) in a depth hash image coding module; using a sign function for H bit by bit in module forward propagation to obtain a binary code B;
when the module reversely transmits, directly assigning the gradient information obtained by the coding B to H, namely enabling the gradient information of H to be equal to the gradient information of B, and enabling the gradient to be smoothly transmitted back to a front-layer network;
4) after the training and convergence of the neural network are completed, a trained image deep hash coding network is obtained;
5) coding all database images by using a trained deep hash coding network to generate database image codes;
through the process, rapid optimization depth hash image coding based on the greedy strategy is achieved.
2. The fast optimized depth hash image coding method of claim 1, wherein the loss function comprises a mean square error function, a cross entropy function.
3. The fast optimized depth hash image coding method according to claim 1, wherein in step 32), the depth hash image coding module implements the following operations:
321) the variable H is first introduced into formula (3) to give formula (4), including formulae (4a) and (4b):

B^{t+1} = sgn(H^{t+1})    formula (4a)
H^{t+1} = H^t − lr · ∂L/∂B^t    formula (4b)

wherein H^{t+1} represents the variable H after the (t+1)-th round of updating;
322) to implement equation (4a), a sign function is applied to the input H in the forward propagation of the depth hash image coding module, and is expressed as equation (5):
B = sgn(H)    formula (5)
323) adding a penalty term to the objective function of network training to assist the deep hash image coding module in realizing formula (4b);
for (4b), a penalty term ‖H − sgn(H)‖_p^p is first added to the loss function L, enabling the network to approximately satisfy H ≈ sgn(H) = B during training; for the variable H, the update formula is formula (6):

H^{t+1} = H^t − lr · ∂L/∂H^t    formula (6)

wherein H^t represents the variable H after the t-th round of updating;
comparing formula (6) with formula (4b), the gradient is assigned back directly during the backward propagation of the deep hash coding module, expressed as formula (7):

∂L/∂H = ∂L/∂B    formula (7)

formula (7) represents that, during the backward propagation of the newly designed coding module, the gradient of the loss function with respect to B is transmitted directly and completely back to the front layer H, and finally back to the network front end, thereby completing parameter learning and network updating.
4. The fast optimized deep hash image coding method according to claim 1, wherein the convolutional neural network employs a deep network AlexNet or ResNet.
5. A target image retrieval method for fast optimization depth hash coding is characterized in that a hash image coding model is established by the fast optimization depth hash image coding method according to any one of claims 1 to 4, and binary codes of all images are generated through a depth hash coding network obtained after optimization; then, by calculating the Hamming distance between the query image code and the database image code, similar images of the query image are obtained; namely, the rapid retrieval of the database images with the target image similar to the query image is realized.
6. The method for retrieving a target image with fast optimized depth hash coding as claimed in claim 5, wherein the following operations are specifically performed:
6) when a user provides a query image, firstly, coding the query image by using the fast optimization depth hash image coding method to generate a code of the query image;
7) then, by calculating Hamming distances between the query image codes and all database image codes, returning the first M database images with the smallest distance after sorting, namely, similar image retrieval results of the query image; m is the number of return images set by the user.
CN201910701690.6A 2019-07-31 2019-07-31 Method for quickly optimizing depth hash image coding and target image retrieval Active CN110457503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910701690.6A CN110457503B (en) 2019-07-31 2019-07-31 Method for quickly optimizing depth hash image coding and target image retrieval


Publications (2)

Publication Number Publication Date
CN110457503A CN110457503A (en) 2019-11-15
CN110457503B (en) 2022-03-25

Family

ID=68484255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910701690.6A Active CN110457503B (en) 2019-07-31 2019-07-31 Method for quickly optimizing depth hash image coding and target image retrieval

Country Status (1)

Country Link
CN (1) CN110457503B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506761B (en) * 2020-04-22 2021-05-14 上海极链网络科技有限公司 Similar picture query method, device, system and storage medium
US11748868B2 (en) 2020-09-08 2023-09-05 Kla Corporation Unsupervised pattern synonym detection using image hashing
CN112862096A (en) * 2021-02-04 2021-05-28 百果园技术(新加坡)有限公司 Model training and data processing method, device, equipment and medium
CN113034626B (en) * 2021-03-03 2024-04-02 中国科学技术大学 Optimization method for alignment of target object in feature domain in structured image coding
CN113343020B (en) * 2021-08-06 2021-11-26 腾讯科技(深圳)有限公司 Image processing method and device based on artificial intelligence and electronic equipment
CN115495546B (en) * 2022-11-21 2023-04-07 中国科学技术大学 Similar text retrieval method, system, device and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9351007B1 (en) * 2005-07-28 2016-05-24 Teradici Corporation Progressive block encoding using region analysis
CN102799881A (en) * 2012-07-05 2012-11-28 哈尔滨理工大学 Fingerprint direction information obtaining method based on binary image encoding model
CN108932314A (en) * 2018-06-21 2018-12-04 南京农业大学 A kind of chrysanthemum image content retrieval method based on the study of depth Hash
CN110069644A (en) * 2019-04-24 2019-07-30 南京邮电大学 A kind of compression domain large-scale image search method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of deep hashing in large-scale image processing; Liu Yuying et al.; 2017 21st Annual Conference on New Network Technologies and Applications of the Network Application Branch of the China Computer Users Association; 2017-12-21; full text *

Also Published As

Publication number Publication date
CN110457503A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110457503B (en) Method for quickly optimizing depth hash image coding and target image retrieval
Huang et al. YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers
Chang et al. Data: Differentiable architecture approximation
EP3983948A1 (en) Optimised machine learning
CN105069173A (en) Rapid image retrieval method based on supervised topology keeping hash
CN112417097B (en) Multi-modal data feature extraction and association method for public opinion analysis
CN110019652B (en) Cross-modal Hash retrieval method based on deep learning
CN109919084B (en) Pedestrian re-identification method based on depth multi-index hash
CN110442741B (en) Tensor fusion and reordering-based cross-modal image-text mutual search method
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
WO2023004206A1 (en) Unsupervised hashing method for cross-modal video-text retrieval with clip
CN115269865A (en) Knowledge graph construction method for auxiliary diagnosis
CN111008224A (en) Time sequence classification and retrieval method based on deep multitask representation learning
CN114461890A (en) Hierarchical multi-modal intellectual property search engine method and system
Jiang et al. xLightFM: Extremely memory-efficient factorization machine
CN113806554A (en) Knowledge graph construction method for massive conference texts
CN114996493A (en) Electric power scene image data screening method based on data elimination and redundancy elimination
CN105183845A (en) ERVQ image indexing and retrieval method in combination with semantic features
CN114860973A (en) Depth image retrieval method for small sample scene
Qiu et al. Efficient document retrieval by end-to-end refining and quantizing BERT embedding with contrastive product quantization
CN113515540A (en) Query rewriting method for database
CN117235216A (en) Knowledge reasoning method based on heterogeneous knowledge fusion
Zhang et al. Pairwise teacher-student network for semi-supervised hashing
CN116204673A (en) Large-scale image retrieval hash method focusing on relationship among image blocks
Sun et al. CellNet: An Improved Neural Architecture Search Method for Coal and Gangue Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant