CN110457503B - Method for quickly optimizing depth hash image coding and target image retrieval


Info

Publication number
CN110457503B
CN110457503B
Authority
CN
China
Prior art keywords
image
hash
coding
network
formula
Prior art date
Legal status
Active
Application number
CN201910701690.6A
Other languages
Chinese (zh)
Other versions
CN110457503A (en)
Inventor
Chao Zhang
Shupeng Su
Kai Han
Yonghong Tian
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201910701690.6A priority Critical patent/CN110457503B/en
Publication of CN110457503A publication Critical patent/CN110457503A/en
Application granted granted Critical
Publication of CN110457503B publication Critical patent/CN110457503B/en


Classifications

    • G06F16/53: Querying, within G06F16/50 (information retrieval of still image data; G06F: electric digital data processing)
    • G06N3/045: Combinations of networks (neural-network architectures, computing arrangements based on biological models)
    • G06N3/084: Backpropagation, e.g. using gradient descent (neural-network learning methods)
    • G06T9/00: Image coding


Abstract

The invention discloses a method for quickly optimizing deep hash image coding and a target image retrieval method. Based on a greedy strategy, a hash image coding model is established for a large-scale image data set, and binary codes of all images are generated through the deep hash coding network obtained after optimization. During target image retrieval, similar images of the query image can be obtained quickly by computing the Hamming distance between the query image code and the database image codes. Combined with a neural network, the method better resolves the vanishing-gradient and quantization-error problems and achieves better coding performance; it completes deep network training in fewer iterations, so training is faster; it can be applied to a wide variety of problems with discrete constraints, giving it a broad application range; and it further improves the optimization speed of the deep neural network and the retrieval performance of the generated image codes, effectively improving retrieval precision on large image databases.

Description

Method for quickly optimizing depth hash image coding and target image retrieval
Technical Field
The invention belongs to the technical field of information retrieval, relates to image processing and fast image retrieval technology, and particularly relates to a greedy-strategy-based method for fast optimization of deep hash image coding and a corresponding target image retrieval method.
Background
With the advent of the big-data era, data in every field has grown explosively, and in this wave of big data, how to quickly retrieve the information a user needs is an important and urgent research topic. The hash algorithm is an algorithm for quickly completing target image retrieval on large image data sets. Its main idea is to encode images into strings of binary codes (i.e., each image is represented by a finite-length binary code serving as its feature) and to obtain the Hamming distance between images through fast XOR operations between the binary codes, so that approximate nearest-neighbor image retrieval (finding the images in an image database closest to a query image) is completed after sorting. This binary representation of image features brings very low storage requirements and very high retrieval speed (the Hamming distance between features is obtained with the computer's simplest bitwise XOR operation, from which the similarity of two images is judged), so it has great research potential and a wide range of applications.
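For illustration, here is a minimal Python sketch of this XOR-based distance computation (the function name and the 8-bit example codes are illustrative, not from the patent; int.bit_count() requires Python 3.10+):

```python
# Hamming distance between two binary codes stored as Python integers.
def hamming_distance(code_a: int, code_b: int) -> int:
    # XOR sets exactly the bits where the two codes differ;
    # counting those bits gives the Hamming distance.
    return (code_a ^ code_b).bit_count()

a = 0b10110010
b = 0b10011010  # differs from a in two bit positions
print(hamming_distance(a, b))  # -> 2
```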
In recent years, the rapid development of deep learning, and in particular of its representative model the convolutional neural network, has qualitatively improved the performance of all major image applications (classification, object detection, image retrieval, etc.). Deep neural networks are mainly trained with stochastic gradient descent: in short, an image is fed into the network, the network propagates it forward to obtain image features, a corresponding loss function is computed (also called an objective function; for retrieval, for example, the objective is that images of the same class should have similar features), the loss is then propagated backward, and the gradient of the neurons in each layer is computed (in the direction that reduces the loss), completing parameter updating and network training until the loss function is minimized (as shown in FIG. 1). The power of deep learning has prompted hashing researchers to propose combining the hash algorithm with deep networks (the deep hash algorithm) to further improve the retrieval performance of image coding.
The deep hash algorithm solves the fast image retrieval task in two steps. First, the powerful feature-learning capacity of the convolutional neural network is used to learn a deep feature representation of each image in the image database; compared with representing an image by its raw pixel values or by features extracted with traditional feature-extraction algorithms, the deep network outputs image features that better characterize the input image. Second, a hash algorithm further encodes the continuous-valued image features into binary features, greatly reducing the storage requirement and sharply improving retrieval efficiency, thus truly meeting the demand for fast and accurate retrieval. The deep hash algorithm effectively integrates these two steps into a single deep network framework, so that deep feature learning and hash coding promote each other during learning and training, yielding optimal binary image codes and the corresponding coding network.
In practice, however, true end-to-end network training for the deep hash algorithm remains a very challenging problem. The main difficulty is that the gradient (derivative) of the sign function used to encode an image into a binary code (as shown in FIG. 2) is zero everywhere, which is fatal to a deep neural network trained by gradient descent: the layers in front of the sign function obtain no gradient update information, and training fails.
The hash algorithm can quickly complete the target image retrieval task on large image data sets by encoding an image into a string of compact binary codes (i.e., given a query image, the algorithm finds similar images in a large image database and returns them to the user), and it has a very wide range of applications, such as image search and face authentication. The deep hash algorithm seeks to combine the strengths of deep learning and hashing to further improve the performance of image retrieval systems. A very troublesome problem in deep hash research is that the sign function used to encode images into binary codes (output +1 for inputs greater than 0, output −1 for inputs less than 0) has a gradient of zero everywhere; this is fatal to a deep neural network trained by gradient descent, since the vanishing gradient leaves the front layers of the network with no update information, training finally fails, and images cannot be encoded effectively.
Most existing deep hash algorithms propose relaxing the original problem: rather than strictly requiring binary codes {−1, +1} in the training stage, the codes are relaxed to continuous values between −1 and +1 (the corresponding generating function is differentiable everywhere), so that the network can complete training; the continuous-valued features are then quantized in the final testing stage to obtain true binary codes. Although this avoids the vanishing-gradient problem, the relaxation strategy introduces a quantization-error problem: the true binary codes forcibly generated in the testing stage differ from the image features produced in the training stage, and performance drops.
Although HashNet and DSDH can work around obstacles such as vanishing gradients and quantization error to complete network training, their shortcomings are obvious. On the one hand, both require many training iterations: DSDH updates only one bit of the binary code at a time, so the different code positions must be updated in rotation, while HashNet must retrain after each tightening of the scale parameter β, so more training iterations are needed and the training cost is high. On the other hand, the discrete cyclic coordinate descent method used by DSDH applies only to discrete quadratic programming problems, limiting its range of application, and HashNet still ends up with partial quantization error, so its performance is not optimal.
Most deep hash algorithms apply a relaxation strategy to address the vanishing-gradient problem. For example, document [1] uses the tanh function (whose derivative is not 0) to output continuous values in [−1, +1] in place of the discrete values {−1, +1} of the sign function, and after training the resulting image features are strictly binarized to obtain true binary features. Although this solves the vanishing-gradient problem, the relaxation introduces quantization error: the true binary codes generated in the testing stage differ from the continuous-valued image features generated in the training stage, so the resulting binary codes and coding network are suboptimal. For the quantization-error problem, documents [2] and [3] present their own solutions and achieve leading performance in the field. Document [2] proposes the HashNet algorithm, which reduces quantization error by first using a relaxed coding function y = tanh(βx) and then gradually increasing β during training to approximate the original sign function y = sgn(x); the gradual increase in training difficulty keeps the network from failing at the start of training. Document [3] proposes the DSDH algorithm, which solves the discrete hash coding optimization problem with a discrete cyclic coordinate descent method; the whole solving and training stage maintains the discrete-value constraint without relaxation, avoiding quantization error. Although retrieval performance improves, HashNet and DSDH still suffer from slow network training, a limited range of application, and similar problems.
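As a small illustration of the continuation idea of document [2], the following sketch (toy values of our choosing, not from the cited papers) shows y = tanh(βx) approaching sgn(x) as β grows, which is exactly the quantization gap HashNet tightens during training:

```python
import numpy as np

x = np.array([-1.3, -0.2, 0.4, 2.1])     # toy pre-activation feature values
for beta in (1.0, 5.0, 25.0):
    relaxed = np.tanh(beta * x)           # continuous training-stage output
    binary = np.sign(x)                   # strict test-stage code
    gap = np.abs(binary - relaxed).max()  # worst-case quantization gap
    print(f"beta={beta:5.1f} relaxed={np.round(relaxed, 3)} max-gap={gap:.3f}")
```

As β increases, the maximum gap shrinks toward zero.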
Reference documents:
[1] Zhao F, Huang Y, Wang L, et al. Deep semantic ranking based hashing for multi-label image retrieval[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1556-1564.
[2] Cao Z, Long M, Wang J, et al. HashNet: Deep learning to hash by continuation[C]//ICCV. 2017: 5609-5618.
[3] Li Q, Sun Z, He R, et al. Deep supervised discrete hashing[C]//Advances in Neural Information Processing Systems. 2017: 2482-2491.
[4] Goodfellow I, Bengio Y, Courville A, et al. Deep Learning[M]. Cambridge: MIT Press, 2016.
[5] Li W J, Wang S, Kang W C. Feature learning based deep supervised hashing with pairwise labels[J]. arXiv preprint arXiv:1511.03855, 2015.
[6] Zhu H, Long M, Wang J, et al. Deep hashing network for efficient similarity retrieval[C]//AAAI. 2016: 2415-2421.
[7] Lai H, Pan Y, Liu Y, et al. Simultaneous feature learning and hash coding with deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3270-3278.
[8] Xia R, Pan Y, Lai H, et al. Supervised hashing for image retrieval via image representation learning[C]//AAAI. 2014, 1(2014): 2.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a greedy-strategy-based method for fast optimization of deep hash image coding and a target image retrieval method. Combined with a neural network, the method better resolves the vanishing-gradient and quantization-error problems and achieves better coding performance; it completes deep network training in fewer iterations, so training is faster; it can be applied to a wide variety of problems with discrete constraints, giving it a broader application range; and it further improves the optimization speed of the deep neural network and the retrieval performance of the generated image codes, effectively improving retrieval precision on large image databases.
The technical scheme provided by the invention is as follows:
a target image retrieval method for fast optimizing depth hash coding comprises the following steps: a greedy strategy-based method for rapidly optimizing a depth hash image coding method and a target image retrieval method are provided. Based on a greedy strategy, aiming at a large-scale image data set, a Hash image coding model is established, and binary codes of all images are generated through a depth Hash coding network obtained after optimization. When the target image is searched, similar images of the query image can be quickly obtained by calculating the Hamming distance between the query image code and the database image code. The method comprises the following steps:
1) modeling a Hash image coding problem to obtain a Hash image coding model;
modeling the Hash image coding problem to obtain an optimization problem, namely a Hash image coding model, which is expressed as formula (1):
min_B L(B)
s.t. B ∈ {−1, +1}^K    formula (1)
where B represents the binary code generated by forward propagation of the input image through the deep network, and the constraint restricts each bit of the code B to values in {−1, +1}, with K bits in total (i.e., each image is coded as a K-bit binary code). L(B) represents the loss function computed on B; because, unlike DSDH, the algorithm proposed by the invention does not require L to take a quadratic-programming form, L in the above formula can be any loss function commonly used in deep learning, such as a mean square error function or a cross-entropy function.
2) Solving a Hash image coding model (formula (1)) by using a greedy strategy to obtain an optimal binary code B; the method comprises the following operations:
21) in the solving process, without considering the discrete constraint B ∈ {−1, +1}^K, first calculate the gradient of L with respect to B, ∂L/∂B;
The iterative update is then performed using the gradient descent method, expressed as formula (2):

B^{t+1} = B^t − lr · ∂L/∂B^t    formula (2)

where t denotes the t-th round of updating during training, lr denotes the learning rate preset by the algorithm, B^t denotes the code B after the t-th round of updating, B^{t+1} denotes the code B after the (t+1)-th round of updating, and L denotes the loss function of the model.
B^{t+1} found by the gradient update of formula (2) is the optimal update direction of L(B) in the current iteration, selected without considering the discrete constraint;
22) take the solution nearest to the continuous value B^{t+1} that satisfies the discrete-value constraint, namely sgn(B^{t+1}), where sgn(·) denotes applying the sign function element-wise;
23) update the parameters toward this solution sgn(B^{t+1}); that is, solve the hash coding optimization problem of formula (1) using formula (3):

B^{t+1} = sgn(B^t − lr · ∂L/∂B^t)    formula (3)
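The greedy step of formula (3) can be exercised outside any network. The following minimal sketch (toy quadratic loss and learning rate of our choosing, not fixed by the patent) applies the sign-snapped gradient step to a 4-bit code:

```python
import numpy as np

def greedy_step(B, grad_L, lr):
    # formula (3): descend along the gradient, then snap back onto
    # the discrete set {-1,+1}^K with the element-wise sign function
    return np.sign(B - lr * grad_L)

target = np.array([1.0, -1.0, 1.0, -1.0])   # code minimizing the toy loss
B = np.array([-1.0, -1.0, -1.0, 1.0])       # arbitrary initial binary code
for t in range(3):
    grad = 2.0 * (B - target)               # gradient of L(B) = ||B - target||^2
    B = greedy_step(B, grad, lr=0.6)
print(B)                                    # -> [ 1. -1.  1. -1.]
```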
3) designing a deep Hash coding module in a deep network, training a Hash image coding model, and realizing an updating mode of the formula (3); the method comprises the following operations:
31) representing an input image as a series of image features H taking continuous values using a convolutional neural network;
32) designing a brand-new deep hash image coding module as the last layer of the convolutional neural network, whose input is the continuous-valued image feature H and whose output is the code B; the update rule of formula (3) is realized inside this module: in the module's forward propagation, the sign function is applied to H bit by bit to obtain a strictly binary code B; in the module's backward propagation, the gradient information obtained by the code B is assigned directly to H (i.e., the gradient information of H is set equal to that of B), so that the gradient is passed back smoothly to the front-layer network;
The deep hash image coding module thus solves the vanishing-gradient and quantization-error problems of the deep hash algorithm and achieves fast optimization and accurate coding. The module completes the following operations:
321) introducing the variable H into formula (3) yields formula (4), comprising (4a) and (4b):

B^{t+1} = sgn(H^{t+1})    formula (4a)
H^{t+1} = H^t − lr · ∂L/∂B^t    formula (4b)

where H^{t+1} denotes the variable H after the (t+1)-th round of updating;
322) to implement equation (4a), a sign function is applied to the input H in the forward propagation of the depth hash image coding module, and is expressed as equation (5):
B = sgn(H)    formula (5)
323) To realize the formula (4b), adding a penalty term in the target function of the network training to assist the depth hash image coding module;
for (4b), a penalty term ‖H − sgn(H)‖_p^p is first added to the loss function L, enabling the network to approximately satisfy H ≈ sgn(H) = B during training; for the variable H, its update formula then becomes formula (6):

H^{t+1} = H^t − lr · ∂L/∂H^t    formula (6)

where H^t denotes the variable H after the t-th round of updating;
comparing formula (6) with formula (4b), the way to implement (4b) is to perform direct gradient assignment in the backward propagation of the deep hash coding module, expressed as formula (7):

∂L/∂H = ∂L/∂B    formula (7)

Formula (7) means that, in the backward propagation of the newly designed coding module, the gradient of the loss function with respect to B is transmitted directly and completely back to the front layer H, and finally back to the network front end, completing parameter learning and network updating.
During network training, the sign function is used strictly in forward propagation so that the discrete constraint always holds; in backward propagation the gradient is passed back completely to the front-layer network, solving the vanishing-gradient problem while updating all coding bits synchronously, so that effective neural network training and convergence are completed quickly;
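A minimal PyTorch sketch of such a coding module is given below (the class name and shapes are ours, for illustration only): the forward pass applies the sign function strictly, as in formula (5), and the backward pass copies the incoming gradient of B onto H unchanged, as in formula (7):

```python
import torch

class SignWithIdentityGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h):
        return torch.sign(h)       # forward: B = sgn(H), strictly binary

    @staticmethod
    def backward(ctx, grad_b):
        return grad_b              # backward: dL/dH := dL/dB, passed back intact

h = torch.randn(4, 12, requires_grad=True)   # batch of 4 features, 12-bit codes
b = SignWithIdentityGrad.apply(h)
b.sum().backward()                           # gradients flow through to h
print(b[0], h.grad[0])                       # h.grad equals the gradient of b
```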
4) after training is finished, a trained image depth Hash coding network is obtained;
5) coding all database images by using a trained deep hash coding network to generate database image codes;
through the process, rapid optimization depth hash image coding based on the greedy strategy is achieved.
When the target image retrieval is carried out, the following operations are carried out:
6) when a user provides a query image, the depth hash coding network is used for coding the query image to generate a query image code;
7) then, the Hamming distances between the query image code and all database image codes are calculated, and after sorting, the M database images with the smallest distances (M being the number of returned images set by the user) are returned as the similar-image retrieval result for the query image, as sketched below;
Through this process, database images similar to the query image are quickly retrieved.
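For illustration, here is a minimal retrieval sketch (the names and sizes are ours; it exploits the fact that for ±1 codes the Hamming distance reduces to a dot product, hamming = (K − ⟨q, d⟩)/2):

```python
import numpy as np

def retrieve(query_code, db_codes, M):
    # Hamming distance for {-1,+1} codes via an inner product.
    K = query_code.shape[0]
    dists = (K - db_codes @ query_code) / 2
    return np.argsort(dists)[:M]     # indices of the M nearest database images

db = np.sign(np.random.randn(1000, 48))   # 1000 database images, 48-bit codes
q = db[123].copy()                        # query identical to database image 123
print(retrieve(q, db, M=5))               # image 123 ranks first
```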
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a greedy strategy-based target image retrieval method for rapidly optimizing deep hash coding, which solves a deep hash discrete optimization problem by using the greedy strategy and updates a network by iteratively solving an approximate optimal solution meeting discrete constraints under the current condition so as to complete rapid and effective training. In the specific implementation, a brand-new deep hash coding module is designed, the sign function is strictly used during forward propagation to keep the discrete constraint always true, the problem of quantization error is avoided, the gradient is completely transmitted back to a front-layer network during reverse propagation, the problem of gradient disappearance is solved, and simultaneously, each coding bit is synchronously updated, so that effective neural network training and convergence are rapidly completed. In addition, a penalty term is added in the objective function of network training to assist the proposed coding module, so that gradient deviation generated in backward propagation of the coding module is effectively reduced, the accuracy of the parameter updating direction and the stability of network optimization are ensured, and the generated image coding is more accurate and robust. Compared with the traditional algorithm, the deep hash coding method provided by the invention has higher training speed and better retrieval performance and fully illustrates the application potential of the method in the large-scale image database retrieval problem.
The technical advantages of the method of the invention are embodied as follows:
(I) A brand-new deep hash coding module is proposed based on the greedy strategy. The module keeps using the sign function strictly in forward propagation to maintain the discrete constraint, preventing the introduction of quantization error; it then passes gradients back completely in backward propagation, avoiding the vanishing-gradient problem and updating all coding bits synchronously, helping the network complete fast training and improve coding performance. For large-scale image retrieval applications, the module effectively reduces training cost and markedly improves retrieval precision.
(II) A penalty term ‖H − sgn(H)‖_p^p is added to the objective function of network training to assist the proposed coding module, bringing the value of the continuous variable H closer to the discrete coding variable B; this reduces the gradient deviation in the module's backward propagation, ensures the accuracy of the parameter update direction and the stability of network optimization, and makes the generated image codes more accurate and robust.
Drawings
FIG. 1 is a schematic diagram of a prior art neural network gradient descent method;
wherein, the abscissa is a parameter w of the network; the ordinate is the loss function of the training network.
Fig. 2 is a schematic diagram of a sign function.
Fig. 3 is a flowchart of a target image retrieval method provided by the present invention.
FIG. 4 is a schematic diagram of a network model used in the practice of the present invention;
the front-layer backbone of the network adopts the AlexNet structure; an image passes through AlexNet to obtain the feature H, which passes through the hash coding module provided by the invention to obtain the code B.
FIG. 5 compares the optimization speed of the method of the invention with other prior-art methods.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
The invention provides a greedy-strategy-based method for fast optimization of deep hash image coding and a target image retrieval method. By designing a brand-new deep hash coding module, the sign function is used strictly in forward propagation so that the discrete constraint always holds, avoiding the quantization-error problem; the gradient is passed back completely to the front-layer network in backward propagation, and all coding bits are updated synchronously while solving the vanishing-gradient problem, so that effective neural network training and convergence are completed quickly. In addition, a penalty term is added to the objective function of network training to assist the proposed coding module, effectively reducing the gradient deviation generated in the module's backward propagation, ensuring the accuracy of the parameter update direction and the stability of network optimization, and making the generated image codes more accurate and robust.
The flow of the method of the invention is shown in FIG. 3. The overall structure of the network is shown in fig. 4.
In a specific implementation, image codes are generated for all query images and all database images by the following steps:
firstly, modeling a Hash coding problem to obtain the following optimization problem:
min_B L(B)
s.t. B ∈ {−1, +1}^K    formula (1)
where B represents the binary code generated by forward propagation of the input image through the deep network, and the constraint restricts each bit of the code B to values in {−1, +1}, with K bits in total (i.e., each image is coded as a K-bit binary code). L(B) represents the loss function computed on B; because, unlike DSDH, the algorithm proposed by the invention does not require L to take a quadratic-programming form, L in the above formula can be any loss function commonly used in deep learning, such as a mean square error function or a cross-entropy function.
To solve this optimization problem and obtain the optimal binary code B: without considering the discrete constraint B ∈ {−1, +1}^K, the gradient of L with respect to B, ∂L/∂B, can first be calculated.
The iterative update is then performed using the gradient descent method:

B^{t+1} = B^t − lr · ∂L/∂B^t    formula (2)

where t denotes the t-th round of updating during training, lr denotes the learning rate preset by the algorithm, B^t denotes the code B after the t-th round of updating, B^{t+1} denotes the code B after the (t+1)-th round of updating, and L denotes the loss function of the model. However, the B calculated by this formula will almost never satisfy the constraint B ∈ {−1, +1}^K, yet once the discrete-value constraint is considered, problem (1) becomes NP-hard. A fast and effective algorithm for such NP-hard problems is the greedy algorithm, which makes the optimal choice under the current conditions at each iterative update and can thereby obtain a solution very close to the global optimum. If B^{t+1} found by the gradient update of formula (2) is the optimal update direction of L(B) in the current iteration, chosen without taking the discrete constraint into account, then by the greedy principle the solution nearest to the continuous value B^{t+1} that satisfies the discrete-value constraint, sgn(B^{t+1}), is likely to be the optimal discrete solution of this iteration. The invention therefore "greedily" updates the parameters toward this solution sgn(B^{t+1}); that is, problem (1) is solved using the following formula:

B^{t+1} = sgn(B^t − lr · ∂L/∂B^t)    formula (3)
although equation (3) may not be the most effective method for solving the discrete optimization problem alone, it is one of the most efficient ways to solve the discrete optimization problem by fusing with the neural network, because there are three points:
1. the deep neural network completes parameter updating and learning through a random gradient descent method, and gradient descent is a greedy strategy which completes final convergence by iteratively updating one step towards the steepest descent direction of the current situation. It can be seen that the feasibility of the greedy strategy in the neural network is very high, so that the idea is very reasonable and time-consuming to solve the problem of discrete value optimization in the deep network.
2. The update form of equation (3) is actually a variant of the neural network gradient update formula (also calculating the parameter gradient and then performing the parameter update), and the update mode with the two similar forms lays a solid foundation for implementing equation (3) in the neural network (see section 4.2).
3. As indicated in document [4], the stochastic gradient descent algorithm (using a part of samples to calculate the gradient) is originally equivalent to adding noise to the true gradient information (calculated using all training samples), and the appropriate noise in the gradient can bring some regularization effect to the neural network and also enable the neural network to escape from most local minimum points and saddle points in the optimization. Observing equation (3) can find that the action is equivalent to introducing "noise" to the original equation (2) through the sign function sgn (). Therefore, the use of the formula (3) in the neural network is not only beneficial to solving the discrete value optimization problem of the network, but also improves the optimization process performance of the network to a certain extent.
The problem of discrete coding optimization in a neural network is solved based on the greedy strategy, and the deep hash coding module provided by the present invention is specifically described below to implement the formula (3) update mode in the deep network.
As in the basic flow of the deep hash algorithm described in the Background section, an input image is first represented as a string of continuous-valued image features using a convolutional neural network (any common deep network can be used, such as AlexNet or ResNet), and this feature string is denoted H. A new hash coding module is then designed whose input is the continuous-valued image feature H and whose output is the code B. The update rule of formula (3) derived above must be implemented inside this module; the construction of the module makes intuitively clear how it solves the vanishing-gradient and quantization-error problems of the deep hash algorithm and achieves fast optimization and accurate coding.
First introducing the variable H into equation (3), one can obtain:
B^{t+1} = sgn(H^{t+1})    formula (4a)
H^{t+1} = H^t − lr · ∂L/∂B^t    formula (4b)

where H^{t+1} denotes the variable H after the (t+1)-th round of updating.
The task then becomes how to implement (4a) and (4b). Implementing (4a) is straightforward: apply the sign function to the input H in the forward propagation of the newly designed coding module:

B = sgn(H)    formula (5)
For (4b), a penalty term ‖H − sgn(H)‖_p^p needs to be added to the loss function L. With this penalty term, the network can be made to approximately satisfy H ≈ sgn(H) = B during training, and the update formula of the variable H becomes:

H^{t+1} = H^t − lr · ∂L/∂H^t    formula (6)

where H^t denotes the variable H after the t-th round of updating.
Comparing formula (4b) with formula (6) reveals how to implement (4b), namely in the backward propagation of the newly designed deep hash coding module:

∂L/∂H = ∂L/∂B    formula (7)

This formula means that, in the backward propagation of the newly designed coding module, the gradient of the loss function with respect to B is transmitted directly and completely back to the front layer H, and finally back to the network front end, completing parameter learning and network updating.
Intuitively, the two most critical parts of the hash coding module (formulas (5) and (7)) respectively solve the two problems of vanishing gradients and quantization error in the deep hash field. First, the hash coding module uses formula (5) in forward propagation, which means the discrete-valued property of the code is kept strictly throughout training, without any relaxation, so the quantization-error problem is avoided at its root. In backward propagation, the coding layer uses formula (7), so the gradient of the loss function with respect to B is transmitted directly and completely back to the variable H; on the one hand this solves the vanishing-gradient problem caused by using the sign function sgn(·) in forward propagation, and on the other hand every coding bit obtains its gradient and is updated simultaneously, so the network completes training and learning quickly.
In addition, the invention adds a penalty term ‖H − sgn(H)‖_p^p to the objective function of network training to assist the proposed coding module. Its role can be understood intuitively as pulling the value of the variable H closer to the variable B, thereby reducing the gradient deviation (i.e., making the two sides of formula (6) as nearly equal as possible) and ensuring the accuracy of the parameter update direction and the stability of network optimization.
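A short sketch of this penalty as we reconstruct it (the exponent p is our assumption and is not fixed by the text here):

```python
import torch

def sign_penalty(h: torch.Tensor, p: int = 3) -> torch.Tensor:
    # ||H - sgn(H)||_p^p summed over all bits; sgn(H) is detached and treated
    # as a constant target, so the gradient flows only into H.
    return (h - torch.sign(h).detach()).abs().pow(p).sum()

h = torch.tensor([0.9, -0.2, 1.4], requires_grad=True)
loss = sign_penalty(h)
loss.backward()
print(loss.item(), h.grad)   # descent on this loss pulls each entry toward +/-1
```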
Therefore, thanks to formula (7), the algorithm of the invention achieves a faster optimization speed than other algorithms, and the cooperation of formulas (5) and (7) with the penalty term lets the hash coding layer and the deep neural network fuse better, giving the algorithm better coding performance.
The following is a verification of the method of the invention:
first, the experimental details are clarified, and the invention will accomplish the code implementation of the algorithm of the invention on the pyrrch framework. AlexNet is selected as the basis of the convolutional neural network, namely AlexNet is used for completing the operation of extracting the continuous value features of the image, then the Hash coding layer is connected to the AlexNet output H to generate the image binary coding B, then the most common cross entropy loss is used for classifying the coding B, the sum of the misclassification losses is calculated and back propagation is started, parameters are updated, and the training of the network is completed. The present invention will use an update of the batch sample processing, with the batch size being 32. Meanwhile, the invention uses a random gradient descent optimizer with momentum, the learning rate lr is set to be 0.001, and the momentum parameter is set to be 0.9.
First, the fast-optimization performance of the algorithm is shown by comparing it with the DSDH algorithm, which maintains the discrete constraint throughout, and with several common relaxation-strategy hash coding methods (in the figure, "ours" denotes the algorithm proposed by the invention, "tanh" denotes replacing the sgn function with the relaxed tanh function, "penalty" denotes pushing the network output toward −1 and +1 values through a penalty alone, without a hard constraint, "12" denotes coding with 12 bits, and "48" with 48 bits); the results are shown in FIG. 5.
The figure clearly shows that the proposed algorithm achieves a faster and larger gain in retrieval performance (MAP) with fewer training iterations (epochs), so its training cost is lower than that of traditional algorithms.
The excellent retrieval performance of the present invention on large image datasets is demonstrated next. The results of comparing the present invention with several current deep hash algorithms with top performance in the field are shown in table 1 (tested on dataset CIFAR 10) and table 2 (tested on dataset ImageNet).
TABLE 1 Retrieval performance (MAP) comparison on the CIFAR10 dataset
[The table data are reproduced as an image in the original publication.]
TABLE 2 Retrieval performance (MAP) comparison on the ImageNet dataset
[The table data are reproduced as an image in the original publication.]
Compared with existing deep hash algorithms such as DSDH and HashNet, the method of the invention has better coding performance, meaning that the binary image codes it generates achieve better retrieval performance, so it is very suitable for application in large-scale image retrieval systems.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (6)

1. A fast optimized depth Hash image coding method is characterized in that a Hash image coding model is established based on a greedy strategy aiming at a large image data set, and binary codes of all images are generated through a depth Hash coding network obtained after optimization; the method comprises the following steps:
1) modeling a Hash image coding problem to obtain a Hash image coding model;
the hash image coding model is expressed by equation (1):
min_B L(B)
s.t. B ∈ {−1, +1}^K    formula (1)
wherein B represents the binary code generated by forward propagation of the input image through a deep network; the constraint restricts each bit of the code B to values in {−1, +1}, with K bits in total, i.e., each image is coded as a K-bit binary code; L(B) represents the loss function computed on B;
2) solving a Hash image coding model by using a greedy strategy to obtain an optimal binary code B; the method comprises the following operations:
21) in the solving process, without considering the discrete constraint B ∈ {−1, +1}^K, the gradient of L with respect to B, ∂L/∂B, is first calculated;
the iterative update is then performed using the gradient descent method, expressed as formula (2):

B^{t+1} = B^t − lr · ∂L/∂B^t    formula (2)

wherein t represents the t-th round of updating during training, lr represents the learning rate preset by the algorithm, B^t represents the code B after the t-th round of updating, B^{t+1} represents the code B after the (t+1)-th round of updating, and L represents the loss function of the model; B^{t+1} found by the gradient update of formula (2) is the optimal update direction of L(B) in the current iteration, selected without considering the discrete constraint;
22) taking the solution nearest to the continuous value B^{t+1} that satisfies the discrete-value constraint, namely sgn(B^{t+1}), wherein sgn(·) represents applying the sign function element-wise;
23) updating the parameters toward this solution sgn(B^{t+1}), that is, solving formula (1) using formula (3):

B^{t+1} = sgn(B^t − lr · ∂L/∂B^t)    formula (3)
3) designing a depth Hash image coding module in a depth network, training a Hash image coding model, and realizing an updating mode of the formula (3); the method comprises the following operations:
31) representing an input image as a series of image features H taking continuous values using a convolutional neural network;
32) designing a brand-new depth hash image coding module at the last layer of the convolutional neural network:
the input is continuous value image feature H, and the output is code B;
realizing the updating mode of the formula (3) in a depth hash image coding module; using a sign function for H bit by bit in module forward propagation to obtain a binary code B;
when the module reversely transmits, directly assigning the gradient information obtained by the coding B to H, namely enabling the gradient information of H to be equal to the gradient information of B, and enabling the gradient to be smoothly transmitted back to a front-layer network;
4) after the training and convergence of the neural network are completed, a trained image deep hash coding network is obtained;
5) coding all database images by using a trained deep hash coding network to generate database image codes;
through the process, rapid optimization depth hash image coding based on the greedy strategy is achieved.
2. The fast optimized depth hash image coding method of claim 1, wherein the loss function comprises a mean square error function, a cross entropy function.
3. The fast optimized depth hash image coding method according to claim 1, wherein in step 32), the depth hash image coding module implements the following operations:
321) the variable H is first introduced into formula (3) to give formula (4), including formulae (4a) and (4b):

B^{t+1} = sgn(H^{t+1})    formula (4a)
H^{t+1} = H^t − lr · ∂L/∂B^t    formula (4b)

wherein H^{t+1} represents the variable H after the (t+1)-th round of updating;
322) to implement equation (4a), a sign function is applied to the input H in the forward propagation of the depth hash image coding module, and is expressed as equation (5):
B = sgn(H)    formula (5)
323) adding a penalty term to the objective function of network training to assist the deep hash image coding module in realizing formula (4b);
for (4b), a penalty term ‖H − sgn(H)‖_p^p is first added to the loss function L, enabling the network to approximately satisfy H ≈ sgn(H) = B during training; for the variable H, the update formula is formula (6):

H^{t+1} = H^t − lr · ∂L/∂H^t    formula (6)

wherein H^t represents the variable H after the t-th round of updating;
comparing formula (6) with formula (4b), the gradient is assigned back directly during the backward propagation of the deep hash coding module, expressed as formula (7):

∂L/∂H = ∂L/∂B    formula (7)

formula (7) represents that, during the backward propagation of the newly designed coding module, the gradient of the loss function with respect to B is transmitted directly and completely back to the front layer H, and finally back to the network front end, thereby completing parameter learning and network updating.
4. The fast optimized deep hash image coding method according to claim 1, wherein the convolutional neural network employs a deep network AlexNet or ResNet.
5. A target image retrieval method for fast optimization depth hash coding is characterized in that a hash image coding model is established by the fast optimization depth hash image coding method according to any one of claims 1 to 4, and binary codes of all images are generated through a depth hash coding network obtained after optimization; then, by calculating the Hamming distance between the query image code and the database image code, similar images of the query image are obtained; namely, the rapid retrieval of the database images with the target image similar to the query image is realized.
6. The method for retrieving a target image with fast optimized depth hash coding as claimed in claim 5, wherein the following operations are specifically performed:
6) when a user provides a query image, firstly, coding the query image by using the fast optimization depth hash image coding method to generate a code of the query image;
7) then, by calculating Hamming distances between the query image codes and all database image codes, returning the first M database images with the smallest distance after sorting, namely, similar image retrieval results of the query image; m is the number of return images set by the user.
CN201910701690.6A 2019-07-31 2019-07-31 Method for quickly optimizing depth hash image coding and target image retrieval Active CN110457503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910701690.6A CN110457503B (en) 2019-07-31 2019-07-31 Method for quickly optimizing depth hash image coding and target image retrieval


Publications (2)

Publication Number Publication Date
CN110457503A CN110457503A (en) 2019-11-15
CN110457503B (en) 2022-03-25

Family

ID=68484255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910701690.6A Active CN110457503B (en) 2019-07-31 2019-07-31 Method for quickly optimizing depth hash image coding and target image retrieval

Country Status (1)

Country Link
CN (1) CN110457503B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506761B (en) * 2020-04-22 2021-05-14 上海极链网络科技有限公司 Similar picture query method, device, system and storage medium
US11748868B2 (en) 2020-09-08 2023-09-05 Kla Corporation Unsupervised pattern synonym detection using image hashing
CN112862096A (en) * 2021-02-04 2021-05-28 百果园技术(新加坡)有限公司 Model training and data processing method, device, equipment and medium
CN113034626B (en) * 2021-03-03 2024-04-02 中国科学技术大学 Optimization method for alignment of target object in feature domain in structured image coding
CN113343020B (en) * 2021-08-06 2021-11-26 腾讯科技(深圳)有限公司 Image processing method and device based on artificial intelligence and electronic equipment
CN115495546B (en) * 2022-11-21 2023-04-07 中国科学技术大学 Similar text retrieval method, system, device and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9351007B1 (en) * 2005-07-28 2016-05-24 Teradici Corporation Progressive block encoding using region analysis
CN102799881A (en) * 2012-07-05 2012-11-28 哈尔滨理工大学 Fingerprint direction information obtaining method based on binary image encoding model
CN108932314A (en) * 2018-06-21 2018-12-04 南京农业大学 A kind of chrysanthemum image content retrieval method based on the study of depth Hash
CN110069644A (en) * 2019-04-24 2019-07-30 南京邮电大学 A kind of compression domain large-scale image search method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of deep hashing in large-scale image processing; Liu Yuying et al.; 2017 21st Annual Conference on New Network Technologies and Applications of the Network Application Branch of the China Computer Users Association; 2017-12-21; full text *

Also Published As

Publication number Publication date
CN110457503A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110457503B (en) Method for quickly optimizing depth hash image coding and target image retrieval
Huang et al. YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers
Chang et al. Data: Differentiable architecture approximation
EP3983948A1 (en) Optimised machine learning
CN105069173A (en) Rapid image retrieval method based on supervised topology keeping hash
CN112417097B (en) Multi-modal data feature extraction and association method for public opinion analysis
CN110019652B (en) Cross-modal Hash retrieval method based on deep learning
CN109919084B (en) Pedestrian re-identification method based on depth multi-index hash
CN110442741B (en) Tensor fusion and reordering-based cross-modal image-text mutual search method
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
WO2023004206A1 (en) Unsupervised hashing method for cross-modal video-text retrieval with clip
CN115269865A (en) Knowledge graph construction method for auxiliary diagnosis
CN111008224A (en) Time sequence classification and retrieval method based on deep multitask representation learning
CN114461890A (en) Hierarchical multi-modal intellectual property search engine method and system
Jiang et al. xLightFM: Extremely memory-efficient factorization machine
CN113806554A (en) Knowledge graph construction method for massive conference texts
CN114996493A (en) Electric power scene image data screening method based on data elimination and redundancy elimination
CN105183845A (en) ERVQ image indexing and retrieval method in combination with semantic features
CN114860973A (en) Depth image retrieval method for small sample scene
Qiu et al. Efficient document retrieval by end-to-end refining and quantizing BERT embedding with contrastive product quantization
CN113515540A (en) Query rewriting method for database
CN117235216A (en) Knowledge reasoning method based on heterogeneous knowledge fusion
Zhang et al. Pairwise teacher-student network for semi-supervised hashing
CN116204673A (en) Large-scale image retrieval hash method focusing on relationship among image blocks
Sun et al. CellNet: An Improved Neural Architecture Search Method for Coal and Gangue Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant