US20220414144A1 - Multi-task deep hash learning-based retrieval method for massive logistics product images
- Publication number: US20220414144A1 (U.S. application Ser. No. 17/809,601)
- Authority: US (United States)
- Legal status: Pending
Classifications
- G06V 10/454 — Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V 10/761 — Proximity, similarity or dissimilarity measures
- G06V 10/7715 — Feature extraction, e.g. by transforming the feature space; mappings, e.g. subspace methods
- G06V 10/776 — Validation; performance evaluation
- G06V 10/82 — Pattern recognition or machine learning using neural networks
- G06V 10/95 — Architectures for image or video understanding structured as a network, e.g. client-server architectures
- G06F 16/51 — Still image retrieval: indexing; data structures and storage structures therefor
- G06F 16/53 — Still image retrieval: querying
- G06F 16/56 — Still image retrieval: image data having vectorial format
- Table 1 provides a first simulation experiment result of the method of the present disclosure, measured by MAP. Test results on the NUS-WIDE dataset show that the performance of multi-tasking is better than that of single Hash code learning, which verifies the rationality of the multi-tasking idea.
- Table 2 provides a second simulation experiment result of the method of the present disclosure, measured by MAP. The NUS-WIDE dataset is further used to study the influence of the number of Hash codes of multiple lengths on a Hash code of any given length, and it is verified that learning more Hash codes simultaneously also improves the retrieval performance of a Hash code of any single length (taking 24 bits as an example).
Abstract
The present disclosure provides a multi-task deep Hash learning-based retrieval method for massive logistics product images. Following the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as the waste of hardware resources and the high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, information association among Hash codes of a plurality of lengths is mined, and a mutual information loss is designed to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of the Hash codes.
Description
- This patent application claims the benefit and priority of Chinese Patent Application No. 202110732492.3, filed on Jun. 29, 2021, the disclosure of which is incorporated by reference herein in its entirety as part of the present application.
- The present disclosure relates to the technical field of image processing, and in particular to a multi-task deep Hash learning-based retrieval method for massive logistics product images.
- In recent years, with the rapid development of the Internet and electronics technology, information on the Internet has grown explosively. As a result, massive multimedia data such as texts, images, and audio are uploaded almost every second. This has posed a great challenge to many areas requiring efficient nearest neighbor search, especially the retrieval of massive images. When the number of images in the database is small, the simplest and most direct way is exhaustive search: calculate the Euclidean distance between each point in the database and the query point, and sort the results by distance. The time complexity is linear, O(dn), where d and n denote the dimension and the sample size of the data, respectively. However, when the number of images is large, such as millions to hundreds of millions, linear search is no longer applicable. In addition, it has become a tendency in the field of computer vision to use high-dimensional or structured data to express the image information of an object more accurately, and to calculate the distance between images using complex similarity formulas. In these cases, exhaustive search has enormous limitations, making it impossible to complete the nearest neighbor search efficiently.
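The exhaustive search described above can be sketched in a few lines of Python; the two-dimensional toy data below is made up for illustration:

```python
import math

def euclidean(p, q):
    # Euclidean distance between two points of equal dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def linear_search(database, query):
    # Exhaustive scan: O(d*n) per query. Returns database indices
    # sorted from nearest to farthest.
    dists = [(euclidean(point, query), i) for i, point in enumerate(database)]
    dists.sort()
    return [i for _, i in dists]

db = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(linear_search(db, (0.9, 1.1)))  # [2, 0, 1] -- index 2 is nearest
```

With n database points of dimension d, every query costs n distance computations, which is exactly the O(dn) behavior that makes linear search impractical at millions of images.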
- Therefore, approximate nearest neighbor search has recently been adopted to find effective solutions quickly. Hashing is an extensively studied approximate nearest neighbor search technique, which can convert documents, images, videos and other multimedia information into compact binary codes while preserving the similarity between the original data. The Hamming distance is used for measuring the distance between binary codes (also known as Hash codes), and it can be computed quickly via a hardware Exclusive OR (XOR). Therefore, Hash algorithms have great advantages in storage and efficiency, making them among the most popular approximate nearest neighbor search algorithms. The present disclosure is oriented towards massive logistics product images in the logistics industry, where quickly and effectively searching a database for the required pictures has become one of the key problems to solve. Owing to these advantages, Hash learning for nearest neighbor search has become a powerful tool for mass data search in recent years.
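A minimal sketch of the XOR-based Hamming distance between two binary Hash codes packed into Python integers (the example codes are made up):

```python
def hamming(code_a: int, code_b: int) -> int:
    # XOR leaves a 1 bit exactly where the two codes differ,
    # so counting the set bits of the XOR gives the Hamming distance.
    return bin(code_a ^ code_b).count("1")

# Two 8-bit codes differing in 3 positions:
a = 0b10110100
b = 0b00011100
print(hamming(a, b))  # 3
```

Because the XOR and the bit count both map to single hardware instructions on packed codes, comparing a query code against millions of database codes is far cheaper than computing floating-point Euclidean distances.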
- In most Hash methods, a fixed length (e.g., 16, 32, or 48 bits) is first predetermined for the Hash code to be retrieved. The model is then trained to learn the Hash code as a high-level image representation, and is used to retrieve mass multimedia data quickly and effectively. Because the length of the Hash code is predefined, a Hash code of another length is required for representation and retrieval once the demand changes. As a result, the model needs to be retrained to learn the new Hash code, which wastes hardware resources and increases time cost. Secondly, it is well known that a Hash code is a compact representation of the original sample, and one sample can be represented by Hash codes of different lengths. Intuitively speaking, Hash codes of different lengths representing the same sample reflect different types of specific information about the original sample. If they are treated as different views of the original sample, there should be both differences and connections among those views. When only Hash codes of a single length are considered, the potential relationship among them is ignored, resulting in the loss of interactive information, reduced representational capacity and low retrieval accuracy. Moreover, in most linear, non-deep Hash algorithms, feature extraction and Hash function learning are asynchronous. The design of a Hash function is a complex task, and finding an optimization method for the model is even more difficult.
- To overcome disadvantages of the above technologies, the present disclosure provides a multi-task deep Hash learning-based retrieval method for massive logistics product images, so as to improve the performance of Hashing retrieval.
- The technical solution used in the present disclosure to resolve the technical problem thereof is as follows:
- a multi-task deep Hash learning-based retrieval method for massive logistics product images, including the following steps:
- a) conducting image preprocessing on an input logistics product image xi, and constructing a similarity matrix S among logistics product images according to a label of the image xi;
- b) conducting convolution and pooling on the preprocessed logistics product image to obtain a one-dimensional feature vector himg of the image, and taking the one-dimensional feature vector himg as a low-level image feature;
- c) inputting the low-level image feature himg to a multi-branch network to obtain a high-level image representation Bk indicated by Hash codes of a plurality of lengths, where the multi-branch network is composed of N branches of a same structure;
- d) calculating a similarity loss function SILoss by formula
- where sij denotes similarity between an ith image and a jth image, sij∈{1,0}, the value of sij being 1 indicates the ith image is similar to the jth image, the value of sij being 0 indicates the ith image is not similar to the jth image, bi denotes a binary Hash code regarding data of the ith image, bj denotes a binary Hash code regarding data of the jth image, and T denotes transposition;
- e) calculating a mutual information loss function MILoss by formula MILoss = Loss(Bk, Wk^T Bk+1) + γk∥Wk∥1,
- where Bk denotes a Hash code output from a kth branch, k∈0, . . . , N−1, Bk+1 denotes a Hash code output from a k+1th branch, Wk denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the k+1th branch, γk denotes a regularization parameter, ∥⋅∥1 denotes an L1 norm, and ak denotes an optimization parameter;
- f) optimizing the similarity loss function SILoss and the mutual information loss function MILoss using a stochastic gradient descent algorithm, and after optimization, repeating Step a) to Step e) at least M times to obtain a trained model;
- g) inputting image data in a database to the trained model in Step f) to obtain a binary Hash code representation Bdatabase of different lengths for each image;
- h) inputting an image to be retrieved imgquery to the trained model in Step f) to obtain a binary Hash code representation Bquery of the image to be retrieved imgquery; and
- i) calculating a Hamming distance DistHamming by formula DistHamming=∥Bquery⊕Bdatabase∥, and returning, based on the calculated Hamming distance DistHamming, mean average precision of a query set of all images to be retrieved in a measurement manner of Average Precision to complete similarity retrieval.
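As a sketch of the Average Precision measurement mentioned in Step i): given database items ranked by ascending Hamming distance, `rel[i]` below is assumed to be 1 when the item at rank i+1 shares the query's label (the relevance list itself is made up):

```python
def average_precision(rel):
    # AP = mean of precision@rank taken at each relevant rank.
    hits, precisions = 0, []
    for rank, r in enumerate(rel, start=1):
        if r:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

print(average_precision([1, 0, 1, 1, 0]))  # (1/1 + 2/3 + 3/4) / 3
```

The mean of this quantity over every query in the query set gives the mean average precision (MAP) used to report retrieval quality.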
- Preferably, there are five convolution layers in Step b), each of the convolution layers is connected to a pooling layer, and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layer and the pooling layer apply a Relu activation function.
- Preferably, the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three full connect layers connected in series with one another.
- Preferably, N in Step c) is a positive integer.
- Preferably, M in Step f) is 5000.
- The present disclosure has the following advantages: according to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as waste of hardware resources and high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, in the present disclosure, information association among Hash codes of a plurality of lengths is mined, and the mutual information loss is designed to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code, and thus improves the retrieval performance of Hash codes. In the meanwhile, the model is based on end-to-end learning, that is, image feature extraction and Hash code learning are carried out simultaneously. Compared with the traditional linear Hash method, the model has an intuitive structure, and is easy to migrate and deploy. The multi-task deep Hash learning-based image retrieval method can be well expanded to retrieval of massive images, and therefore has a broad prospect in image retrieval for massive objects in the logistics industry.
- FIG. 1 is a flowchart of a method for multi-task feature extraction according to the present disclosure; and
- FIG. 2 is a flowchart of a method for Hash code learning according to the present disclosure.
- The present disclosure is further described with reference to FIG. 1 and FIG. 2.
- A multi-task deep Hash learning-based retrieval method for massive logistics product images, including the following steps: a) Conduct image preprocessing on an input logistics product image xi, and construct a similarity matrix S among logistics product images according to a label of the image xi.
- b) Conduct convolution and pooling on the preprocessed logistics product image: by stacking a certain number of convolution kernels and pooling kernels and processing the image data, obtain a one-dimensional feature vector himg of the image, which is taken as a low-level image feature.
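Assuming padded 3*3 convolutions that preserve spatial size and 2*2 pooling that halves it (and an illustrative 224*224 input, which the text does not specify), the spatial size of the feature maps after the five convolution-pooling blocks works out as follows:

```python
def feature_map_size(side, num_blocks=5, pool=2):
    # Each block: padded 3x3 conv (keeps spatial size, by assumption)
    # followed by 2x2 pooling (integer-halves height and width).
    for _ in range(num_blocks):
        side = side // pool
    return side

print(feature_map_size(224))  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```

The resulting 7*7 map (times the channel count) is what gets flattened into the one-dimensional low-level feature vector himg.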
- c) Adopt a hard-parameter-sharing network: the low-level feature networks have the same structure and share parameters, while the high-level feature networks have the same structure but the parameters of each branch network are differentiated according to the differences in the high-level features generated. Input the low-level image feature himg to the multi-branch network to obtain a high-level image representation Bk indicated by Hash codes of a plurality of lengths, where the multi-branch network is composed of N branches of a same structure.
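A toy sketch of the hard-parameter-sharing idea: one shared low-level feature vector feeds N branches, each emitting a Hash code of its own length. The code lengths [16, 32, 48], the random stand-in weights, and the sign binarization are illustrative assumptions, not the patent's exact fully connected layers:

```python
import random

def sign(x):
    return 1 if x >= 0 else -1

def multi_branch_codes(h_img, lengths, seed=0):
    # One branch per target code length; each branch owns its weights
    # (a stand-in for its three fully connected layers), while h_img
    # comes from the shared low-level trunk.
    rng = random.Random(seed)
    codes = []
    for length in lengths:
        W = [[rng.uniform(-1, 1) for _ in h_img] for _ in range(length)]
        codes.append([sign(sum(w * x for w, x in zip(row, h_img))) for row in W])
    return codes

h_img = [0.2, -0.5, 0.7, 0.1]                # shared low-level feature
codes = multi_branch_codes(h_img, [16, 32, 48])
print([len(c) for c in codes])  # [16, 32, 48]
```

One forward pass through the shared trunk thus yields all N code lengths at once, which is what removes the need to retrain a separate model per length.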
- d) Calculate a similarity loss function SILoss, where sij denotes the similarity between an ith image and a jth image, sij∈{1,0}; the value of sij being 1 indicates the ith image is similar to the jth image, and the value of sij being 0 indicates the ith image is not similar to the jth image; bi denotes a binary Hash code regarding data of the ith image, bj denotes a binary Hash code regarding data of the jth image, and T denotes transposition. This formula mainly establishes a relationship between the Hash codes and the similarity of the original samples: if the original samples are similar, the corresponding Hash codes should be as similar as possible; and if the original samples are not similar, the corresponding Hash codes should not be similar.
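The exact similarity loss formula is not reproduced in this text. A common pairwise form consistent with the symbols defined above — used, for example, in pairwise-likelihood deep hashing, and shown here only as an assumed stand-in — sets θij = ½ bi^T bj and minimizes Σ [log(1 + e^θij) − sij·θij], which pushes similar pairs toward large inner products and dissimilar pairs toward small ones:

```python
import math

def si_loss(codes, S):
    # codes: list of +/-1 Hash code vectors; S[i][j]: 1 similar, 0 not.
    # Pairwise negative log-likelihood (assumed form, see lead-in).
    total = 0.0
    n = len(codes)
    for i in range(n):
        for j in range(i + 1, n):
            theta = 0.5 * sum(a * b for a, b in zip(codes[i], codes[j]))
            total += math.log(1.0 + math.exp(theta)) - S[i][j] * theta
    return total

codes = [[1, 1, -1, 1], [1, 1, -1, -1], [-1, -1, 1, -1]]  # toy codes
S = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]                     # toy similarities
print(round(si_loss(codes, S), 4))
```

Minimizing this quantity by stochastic gradient descent (Step f) is what ties the learned binary codes to the label-derived similarity matrix S.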
- e) Calculate a mutual information loss function MILoss by formula MILoss = Loss(Bk, Wk^T Bk+1) + γk∥Wk∥1, where Bk denotes the Hash code output from a kth branch, k∈{0, . . . , N−1}, Bk+1 denotes the Hash code output from the (k+1)th branch, Wk denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the (k+1)th branch, γk denotes a regularization parameter, ∥⋅∥1 denotes the L1 norm, and ak denotes an optimization parameter. Generally speaking, the length of a Hash code is positively correlated with its representational capacity. The purpose of minimizing the mutual information loss MILoss is to draw the representational capacity of a shorter Hash code closer to that of a longer Hash code and to further enhance the correlation among the plurality of Hash codes, so that the learned Hash codes have good representational capacity and Hash code retrieval is improved.
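A sketch of the mutual information loss under the stated formula, taking Loss(·,·) to be mean squared error (an assumption; the text leaves Loss unspecified) and using made-up toy codes and a made-up mapping matrix. Here Bk is n×Lk, Bk1 is n×Lk1, and Wk is Lk1×Lk, so each row of Bk1 multiplied by Wk projects the longer code onto the shorter one:

```python
def mi_loss(Bk, Bk1, Wk, gamma):
    # MILoss = MSE(Bk, Bk1 @ Wk) + gamma * ||Wk||_1 (assumed Loss = MSE).
    n, Lk = len(Bk), len(Bk[0])
    projected = [[sum(Bk1[i][r] * Wk[r][c] for r in range(len(Wk)))
                  for c in range(Lk)] for i in range(n)]
    mse = sum((Bk[i][c] - projected[i][c]) ** 2
              for i in range(n) for c in range(Lk)) / (n * Lk)
    l1 = sum(abs(w) for row in Wk for w in row)
    return mse + gamma * l1

Bk = [[1, -1], [-1, 1]]                      # shorter codes (2 bits)
Bk1 = [[1, -1, 1], [-1, 1, -1]]              # longer codes (3 bits)
Wk = [[0.5, 0.0], [0.0, -0.5], [0.5, 0.0]]   # toy 3-bit -> 2-bit mapping
print(round(mi_loss(Bk, Bk1, Wk, gamma=0.01), 4))
```

The L1 term keeps the mapping matrix sparse, so each bit of the shorter code is explained by only a few bits of the longer code.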
- f) Optimize the similarity loss function SILoss and the mutual information loss function MILoss using a stochastic gradient descent algorithm, and after optimization, repeat Step a) to Step e) at least M times to obtain a trained model.
- g) Input image data in a database to the trained model in Step f) to obtain a binary Hash code representation Bdatabase of different lengths for each image. For example, the lengths may take various combinations, such as [16 bits, 32 bits, 48 bits, 64 bits] or [128 bits, 256 bits, 512 bits].
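The joint stochastic gradient descent of Step f) can be illustrated with a toy loop over relaxed codes from two branches. The surrogate loss forms, shapes, learning rate, and initialization scales are all illustrative assumptions, not the disclosure's exact training setup:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Relaxed (pre-binarization) codes from a 24-bit and a 48-bit branch for a
# mini-batch of 8 images, plus the mapping matrix Wk -- all updated by SGD.
U24 = torch.randn(8, 24, requires_grad=True)
U48 = torch.randn(8, 48, requires_grad=True)
Wk = (0.1 * torch.randn(48, 24)).requires_grad_()
S = (torch.rand(8, 8) > 0.5).float()     # hypothetical pairwise similarity labels

opt = torch.optim.SGD([U24, U48, Wk], lr=0.01)
loss_history = []
for _ in range(100):
    opt.zero_grad()
    theta = 0.5 * U24 @ U24.T
    si_loss = (F.softplus(theta) - S * theta).mean()   # similarity-loss surrogate
    # mutual-information term: longer code mapped onto shorter, plus L1 on Wk
    mi_loss = ((U24 - U48 @ Wk) ** 2).mean() + 1e-3 * Wk.abs().sum()
    total = si_loss + mi_loss
    total.backward()
    opt.step()
    loss_history.append(float(total))
```

In the full method the gradients would flow through the branch networks and the shared trunk rather than directly into free code tensors.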
- h) Input an image to be retrieved imgquery to the trained model in Step f) to obtain a binary Hash code representation Bquery of the image to be retrieved imgquery.
- i) Calculate a Hamming distance DistHamming by formula DistHamming=∥Bquery⊕Bdatabase∥, and return, based on the calculated Hamming distance DistHamming, mean average precision of a query set of all images to be retrieved in a measurement manner of Average Precision to complete similarity retrieval.
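The retrieval step reduces to XOR-and-count distances followed by Average Precision over the ranked results. A self-contained sketch with a hypothetical three-image database (the codes and relevance labels below are made up for illustration):

```python
import numpy as np

def hamming_distances(b_query, B_database):
    """Hamming distance between one query code and every database code:
    count the differing bits (the XOR-then-popcount of the formula).
    Codes are arrays over {0, 1}."""
    return np.count_nonzero(B_database != b_query, axis=1)

def average_precision(ranked_relevance):
    """Average Precision over a ranked list of 0/1 relevance flags."""
    hits = np.cumsum(ranked_relevance)
    precisions = hits / (np.arange(len(ranked_relevance)) + 1)
    n_rel = hits[-1]
    return float((precisions * ranked_relevance).sum() / n_rel) if n_rel else 0.0

# Toy retrieval: rank database codes by Hamming distance to the query.
B_db = np.array([[0, 0, 1, 1],
                 [1, 1, 1, 1],
                 [0, 0, 0, 0]])
query = np.array([0, 0, 1, 1])
relevant = np.array([1, 0, 1])           # hypothetical ground-truth labels
order = np.argsort(hamming_distances(query, B_db), kind="stable")
ap = average_precision(relevant[order])
```

Mean average precision (MAP), the metric reported in the tables below, is the mean of this AP over all queries in the query set.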
- In the multi-task deep Hash learning-based retrieval method for massive logistics product images, the theory of multi-view learning is adopted to mine the potential relevance of Hash codes of different lengths. Hash codes of a plurality of lengths are essentially different feature representations of the original data in Hamming space. Associative learning of Hash codes of a plurality of lengths exploits the complementarity and correlation of these features, and this process can also be regarded as multi-level feature fusion of the same samples. Related theories of multi-feature fusion and multi-view learning provide a theoretical and technical guarantee for the feasibility of this method, which further improves the performance of Hashing retrieval.
- According to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as the waste of hardware resources and the high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, the present disclosure mines the information association among Hash codes of a plurality of lengths and designs the mutual information loss to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of Hash codes. Meanwhile, the model is based on end-to-end learning, that is, image feature extraction and Hash code learning are carried out simultaneously. Compared with the traditional linear Hash method, the model has an intuitive structure and is easy to migrate and deploy. The multi-task deep Hash learning-based image retrieval method can be well expanded to retrieval of massive images, and therefore has broad prospects in image retrieval for masses of objects in the logistics industry.
- Table 1 provides a first simulation experiment result according to the method of the present disclosure, which is measured by MAP. Test results on NUS-WIDE data sets show that the performance of multi-tasking is better than that of single Hash code learning, which verifies the rationality of the idea of multi-tasking.
-
TABLE 1

Method | 24 bits | 48 bits | 64 bits | 128 bits | 256 bits
---|---|---|---|---|---
DJMH-Single | 0.73 | 0.78 | 0.79 | 0.827 | 0.833
DJMH-Multiple | 0.801 | 0.827 | 0.831 | 0.846 | 0.855

- Table 2 provides a second simulation experiment result according to the method of the present disclosure, which is measured by MAP. The NUS-WIDE data sets are further studied for the influence of the number of Hash codes of multiple lengths on a Hash code of any length, and it is verified that simultaneously learning more Hash codes can also improve the retrieval performance of a Hash code of any length (taking 24 bits as an example).
-
TABLE 2

Method | 24 bits | 48 bits | 64 bits | 128 bits | 256 bits
---|---|---|---|---|---
DJMH-24, 48 | 0.755 | 0.777 | | |
DJMH-24, 48, 64 | 0.777 | 0.8 | 0.806 | |
DJMH-24, 48, 64, 128 | 0.791 | 0.816 | 0.821 | 0.834 |
DJMH-24, 48, 64, 128, 256 | 0.8 | 0.822 | 0.828 | 0.847 | 0.855

- Preferably, there are five convolution layers in Step b), each of the convolution layers is connected to a pooling layer and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layers and the pooling layers apply a Relu activation function.
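The preferred convolutional trunk of Step b) can be sketched as below, assuming a PyTorch implementation. The kernel sizes (3*3 convolutions, 2*2 pooling, five blocks, ReLU) follow the preferred embodiment; the channel widths and input resolution are illustrative assumptions the disclosure does not specify:

```python
import torch
import torch.nn as nn

def make_conv_trunk(channels=(3, 32, 64, 128, 256, 512)):
    """Five convolution blocks: a 3*3 convolution with ReLU, each followed by
    2*2 max pooling; the result is flattened into the one-dimensional
    low-level feature vector himg."""
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers += [
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),   # halves each spatial dimension
        ]
    layers.append(nn.Flatten())
    return nn.Sequential(*layers)
```

With a 224*224 input, the five 2*2 poolings reduce the spatial size to 7*7 before flattening.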
- Preferably, the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three full connect layers connected in series with one another.
- Preferably, N in Step c) is a positive integer.
- Preferably, M in Step f) is 5000.
- Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Although the present disclosure is described in detail with reference to the foregoing embodiments, a person skilled in the art can still make modifications to the technical solutions described in the foregoing embodiments, or make equivalent replacement of some technical features therein. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present disclosure should be included within the protection scope of the present disclosure.
Claims (5)
1. A multi-task deep Hash learning-based retrieval method for massive logistics product images, comprising the following steps:
a) conducting image preprocessing on an input logistics product image xi, and constructing a similarity matrix S among logistics product images according to a label of the image xi;
b) conducting convolution and pooling on the preprocessed logistics product image to obtain a one-dimensional feature vector himg of the image, and taking the one-dimensional feature vector himg as a low-level image feature;
c) inputting the low-level image feature himg to a multi-branch network to obtain a high-level image representation Bk indicated by Hash codes of a plurality of lengths, wherein the multi-branch network is composed of N branches of a same structure;
d) calculating a similarity loss function SILoss by formula
wherein sij denotes similarity between an ith image and a jth image, sij∈{1,0}, the value of sij being 1 indicates the ith image is similar to the jth image, the value of sij being 0 indicates the ith image is not similar to the jth image, bi denotes a binary Hash code regarding data of the ith image, bj denotes a binary Hash code regarding data of the jth image, and T denotes transposition;
e) calculating a mutual information loss function MILoss by formula MILoss=Loss(Bk, Wk TBk+1)+γk∥Wk∥1
wherein Bk denotes a Hash code output from a kth branch, k∈0, . . . , N−1, Bk+1 denotes a Hash code output from a k+1th branch, Wk denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the k+1th branch, γk denotes a regularization parameter, ∥⋅∥1 denotes an L1 norm, and ak denotes an optimization parameter;
f) optimizing the similarity loss function SILoss and the mutual information loss function MILoss using a stochastic gradient descent algorithm, and after optimization, repeating Step a) to Step e) at least M times to obtain a trained model;
g) inputting image data in a database to the trained model in Step f) to obtain a binary Hash code representation Bdatabase of different lengths for each image;
h) inputting an image to be retrieved imgquery to the trained model in Step f) to obtain a binary Hash code representation Bquery of the image to be retrieved imgquery; and
i) calculating a Hamming distance DistHamming by formula DistHamming=∥Bquery ⊕Bdatabase∥, and returning, based on the calculated Hamming distance DistHamming, mean average precision of a query set of all images to be retrieved in a measurement manner of Average Precision to complete similarity retrieval.
2. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein there are five convolution layers in Step b), each of the convolution layers is connected to a pooling layer, and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layer and the pooling layer apply a Relu activation function.
3. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three full connect layers connected in series with one another.
4. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein N in Step c) is a positive integer.
5. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein M in Step f) is 5000.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110732492.3A CN113377981B (en) | 2021-06-29 | 2021-06-29 | Large-scale logistics commodity image retrieval method based on multitask deep hash learning |
CN202110732492.3 | 2021-06-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220414144A1 true US20220414144A1 (en) | 2022-12-29 |
Family
ID=77580183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/809,601 Pending US20220414144A1 (en) | 2021-06-29 | 2022-06-29 | Multi-task deep hash learning-based retrieval method for massive logistics product images |
Country Status (2)
Country | Link |
---|---|
US (1) | US20220414144A1 (en) |
CN (1) | CN113377981B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704522B (en) * | 2021-10-28 | 2022-02-18 | 山东建筑大学 | Artificial intelligence-based target image rapid retrieval method and system |
CN114419402B (en) * | 2022-03-29 | 2023-08-18 | 中国人民解放军国防科技大学 | Image story description generation method, device, computer equipment and storage medium |
CN116108217B (en) * | 2022-10-27 | 2023-12-19 | 浙江大学 | Fee evasion vehicle similar picture retrieval method based on depth hash coding and multitask prediction |
CN117292104B (en) * | 2023-11-22 | 2024-02-27 | 南京掌控网络科技有限公司 | Goods shelf display detection method and system based on image recognition |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108657A (en) * | 2017-11-16 | 2018-06-01 | 浙江工业大学 | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning |
US20180260665A1 (en) * | 2017-03-07 | 2018-09-13 | Board Of Trustees Of Michigan State University | Deep learning system for recognizing pills in images |
US20180276528A1 (en) * | 2015-12-03 | 2018-09-27 | Sun Yat-Sen University | Image Retrieval Method Based on Variable-Length Deep Hash Learning |
US20190171665A1 (en) * | 2017-12-05 | 2019-06-06 | Salk Institute For Biological Studies | Image similarity search via hashes with expanded dimensionality and sparsification |
CN110659726A (en) * | 2019-09-24 | 2020-01-07 | 北京达佳互联信息技术有限公司 | Image processing method and device, electronic equipment and storage medium |
US20200242422A1 (en) * | 2019-01-29 | 2020-07-30 | Boe Technology Group Co., Ltd. | Method and electronic device for retrieving an image and computer readable storage medium |
CN107679250B (en) * | 2017-11-01 | 2020-12-01 | 浙江工业大学 | Multi-task layered image retrieval method based on deep self-coding convolutional neural network |
CN109063112B (en) * | 2018-07-30 | 2022-04-01 | 成都快眼科技有限公司 | Rapid image retrieval method, model and model construction method based on multitask learning deep semantic hash |
US20220147743A1 (en) * | 2020-11-09 | 2022-05-12 | Nvidia Corporation | Scalable semantic image retrieval with deep template matching |
CN111460200B (en) * | 2020-03-04 | 2023-07-04 | 西北大学 | Image retrieval method and model based on multitask deep learning and construction method thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165306B (en) * | 2018-08-09 | 2021-11-23 | 长沙理工大学 | Image retrieval method based on multitask Hash learning |
CN109508320A (en) * | 2018-11-27 | 2019-03-22 | 聂秀山 | Multiple-length Hash combination learning method |
CN110674333B (en) * | 2019-08-02 | 2022-04-01 | 杭州电子科技大学 | Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing |
CN111177432B (en) * | 2019-12-23 | 2020-11-03 | 北京航空航天大学 | Large-scale image retrieval method based on hierarchical depth hash |
-
2021
- 2021-06-29 CN CN202110732492.3A patent/CN113377981B/en active Active
-
2022
- 2022-06-29 US US17/809,601 patent/US20220414144A1/en active Pending
Non-Patent Citations (1)
Title |
---|
Multi-Task Learning for Deep Semantic Hashing, Ma et al, IEEE (Year: 2018) * |
Also Published As
Publication number | Publication date |
---|---|
CN113377981A (en) | 2021-09-10 |
CN113377981B (en) | 2022-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220414144A1 (en) | Multi-task deep hash learning-based retrieval method for massive logistics product images | |
US20210224286A1 (en) | Search result processing method and apparatus, and storage medium | |
CN110866140B (en) | Image feature extraction model training method, image searching method and computer equipment | |
US10776685B2 (en) | Image retrieval method based on variable-length deep hash learning | |
WO2020182019A1 (en) | Image search method, apparatus, device, and computer-readable storage medium | |
Wu et al. | Online feature selection with streaming features | |
CN108334574B (en) | Cross-modal retrieval method based on collaborative matrix decomposition | |
US20100211588A1 (en) | Context-Aware Query Suggestion By Mining Log Data | |
CN110929080B (en) | Optical remote sensing image retrieval method based on attention and generation countermeasure network | |
CN113806746B (en) | Malicious code detection method based on improved CNN (CNN) network | |
CN108399185B (en) | Multi-label image binary vector generation method and image semantic similarity query method | |
CN108446334B (en) | Image retrieval method based on content for unsupervised countermeasure training | |
CN112115716A (en) | Service discovery method, system and equipment based on multi-dimensional word vector context matching | |
CN109766469A (en) | A kind of image search method based on the study optimization of depth Hash | |
CN109829065B (en) | Image retrieval method, device, equipment and computer readable storage medium | |
CN113806580B (en) | Cross-modal hash retrieval method based on hierarchical semantic structure | |
CN112307182B (en) | Question-answering system-based pseudo-correlation feedback extended query method | |
CN111125411A (en) | Large-scale image retrieval method for deep strong correlation hash learning | |
CN104731882A (en) | Self-adaptive query method based on Hash code weighting ranking | |
CN105320764A (en) | 3D model retrieval method and 3D model retrieval apparatus based on slow increment features | |
CN112199532A (en) | Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism | |
CN110008306A (en) | A kind of data relationship analysis method, device and data service system | |
CN111325264A (en) | Multi-label data classification method based on entropy | |
CN107908757B (en) | Website classification method and system | |
CN113516019B (en) | Hyperspectral image unmixing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SHANDONG JIANZHU UNIVERSITY, CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NIE, XIUSHAN;WANG, LETIAN;LIU, XINGBO;AND OTHERS;REEL/FRAME:060354/0857 Effective date: 20220627 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |