US20220414144A1 - Multi-task deep hash learning-based retrieval method for massive logistics product images - Google Patents

Multi-task deep hash learning-based retrieval method for massive logistics product images

Info

Publication number
US20220414144A1
US20220414144A1
Authority
US
United States
Prior art keywords
image
hash
denotes
loss
hash code
Prior art date
Legal status
Pending
Application number
US17/809,601
Inventor
Xiushan NIE
Letian Wang
Xingbo Liu
Shaohua Wang
Current Assignee
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date
Filing date
Publication date
Application filed by Shandong Jianzhu University
Assigned to SHANDONG JIANZHU UNIVERSITY reassignment SHANDONG JIANZHU UNIVERSITY ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LIU, XINGBO, NIE, XIUSHAN, WANG, LETIAN, WANG, SHAOHUA
Publication of US20220414144A1

Classifications

    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06F16/51 Information retrieval of still image data: indexing; data structures therefor; storage structures
    • G06F16/53 Information retrieval of still image data: querying
    • G06F16/56 Information retrieval of still image data having vectorial format
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space
    • G06V10/776 Validation; performance evaluation
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding, structured as a network, e.g. client-server architectures

Definitions

  • Table 1 provides a first simulation experiment result for the method of the present disclosure, measured by MAP. Test results on the NUS-WIDE data set show that the performance of multi-tasking is better than that of single Hash code learning, which verifies the rationality of the multi-tasking idea.
  • Table 2 provides a second simulation experiment result for the method of the present disclosure, measured by MAP. The NUS-WIDE data set is further used to study the influence of the number of simultaneously learned Hash code lengths on a Hash code of any given length, and it is verified that learning more Hash codes at the same time also improves the retrieval performance of a Hash code of any given length (taking 24 bits as an example).

Abstract

The present disclosure provides a multi-task deep Hash learning-based retrieval method for massive logistics product images. According to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as waste of hardware resources and high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, information association among Hash codes of a plurality of lengths is mined, and a mutual information loss is designed to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of Hash codes.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This patent application claims the benefit and priority of Chinese Patent Application No. 202110732492.3, filed on Jun. 29, 2021, the disclosure of which is incorporated by reference herein in its entirety as part of the present application.
  • TECHNICAL FIELD
  • The present disclosure relates to the technical field of image processing, and in particular to a multi-task deep Hash learning-based retrieval method for massive logistics product images.
  • BACKGROUND ART
  • In recent years, with the rapid development of the Internet and electronics technology, information on the Internet has shown explosive growth. As a result, massive multimedia data such as texts, images, and audio are uploaded nearly every second. This has posed a great challenge to many areas requiring efficient nearest neighbor search, especially retrieval of massive images. When the image database is small, the simplest and most direct way to achieve exhaustive search is to calculate a Euclidean distance between each point in the database and a query point, and finally sort the points by distance. The time complexity is linear, O(dn), where d and n denote the dimension and the sample size of the data, respectively. However, when the image database is large, for example millions to hundreds of millions of images, linear search is no longer applicable. In addition, it has become a tendency in the field of computer vision to use high-dimensional or structured data to express image information of an object more accurately, and to calculate the distance between images of the object using complex similarity calculation formulas. In these cases, exhaustive search has enormous limitations, which makes it impossible to efficiently complete the nearest neighbor search.
  • Therefore, approximate nearest neighbor search has recently been adopted as an effective solution for fast search. Hash is an extensively studied approximate nearest neighbor search algorithm, which can convert documents, images, videos and other multimedia information into compact binary codes while retaining the similarity between the original data. The Hamming distance is used for measuring the distance between binary codes (also known as Hash codes), and can be computed quickly with hardware exclusive-OR (XOR) operations. Therefore, the Hash algorithm has great advantages in storage and efficiency, making it one of the most popular approximate nearest neighbor search algorithms. The present disclosure is oriented towards massive logistics product images in the logistics industry, where quickly and effectively searching a database for the required images has become one of the points to be broken through. Owing to these advantages, Hash learning based on nearest neighbor search has become a powerful tool for mass data search in recent years.
  • According to most Hash methods, a fixed length (e.g., 16, 32, or 48 bits) is first predetermined for the Hash code to be retrieved. The model is then trained to learn the Hash code as a high-level image representation, and is used to retrieve mass multimedia data quickly and effectively. Because the length of the Hash code is predefined, a Hash code of another length is required for representation and retrieval once the demand changes; as a result, the model needs to be retrained to learn the new Hash code, which causes a waste of hardware resources and an increase in time cost. Secondly, it is well known that a Hash code is a compact representation of the original sample, and one sample can be represented by Hash codes of different lengths. Intuitively speaking, Hash codes of different lengths representing the same sample reflect specific information of a type different from the original sample. If they are treated as different views of the original sample, there should be some differences and connections among the different views. When merely Hash codes of a single length are considered, the potential relationship between them is ignored, resulting in the loss of interactive information, reduced representational capacity and low retrieval accuracy. Moreover, for most linear, non-deep Hash algorithms, feature extraction and Hash function learning are asynchronous: the design of the Hash function is a complex task, and seeking an optimization method for the model is even more difficult.
  • SUMMARY
  • To overcome disadvantages of the above technologies, the present disclosure provides a multi-task deep Hash learning-based retrieval method for massive logistics product images, so as to improve the performance of Hashing retrieval.
  • The technical solution used in the present disclosure to resolve the technical problem thereof is as follows:
  • a multi-task deep Hash learning-based retrieval method for massive logistics product images, including the following steps:
  • a) conducting image preprocessing on an input logistics product image xi, and constructing a similarity matrix S among logistics product images according to a label of the image xi;
  • b) conducting convolution and pooling on the preprocessed logistics product image to obtain a one-dimensional feature vector himg of the image, and taking the one-dimensional feature vector himg as a low-level image feature;
  • c) inputting the low-level image feature himg to a multi-branch network to obtain a high-level image representation Bk indicated by Hash codes of a plurality of lengths, where the multi-branch network is composed of N branches of a same structure;
  • d) calculating a similarity loss function SILoss by formula
  • SI_Loss = Loss(s_ij, b_i b_j^T) = −(1/n) Σ_{s_ij∈S} ( s_ij b_i b_j^T − log(1 + e^{b_i b_j^T}) ),
  • where sij denotes similarity between an ith image and a jth image, sij∈{1,0}, the value of sij being 1 indicates the ith image is similar to the jth image, the value of sij being 0 indicates the ith image is not similar to the jth image, bi denotes a binary Hash code regarding data of the ith image, bj denotes a binary Hash code regarding data of the jth image, and T denotes transposition;
  • e) calculating a mutual information loss function MILoss by formula
  • MI_Loss = Loss(B_k, W_k^T B_{k+1}) + γ_k ∥W_k∥_1 = Σ_{k=0}^{N−1} a_k ∥B_k − W_k^T B_{k+1}∥_1 + Σ_{k=0}^{N−1} γ_k ∥W_k∥_1,
  • where B_k denotes the Hash code output from the kth branch, k ∈ {0, . . . , N−1}; B_{k+1} denotes the Hash code output from the (k+1)th branch; W_k denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the (k+1)th branch; γ_k denotes a regularization parameter; ∥·∥_1 denotes the L1 norm; and a_k denotes an optimization parameter;
  • f) optimizing the similarity loss function SILoss and the mutual information loss function MILoss using a stochastic gradient descent algorithm, and after optimization, repeating Step a) to Step e) at least M times to obtain a trained model;
  • g) inputting image data in a database to the trained model in Step f) to obtain a binary Hash code representation Bdatabase of different lengths for each image;
  • h) inputting an image to be retrieved imgquery to the trained model in Step f) to obtain a binary Hash code representation Bquery of the image to be retrieved imgquery; and
  • i) calculating a Hamming distance DistHamming by formula DistHamming=∥Bquery⊕Bdatabase∥, and returning, based on the calculated Hamming distance DistHamming, mean average precision of a query set of all images to be retrieved in a measurement manner of Average Precision to complete similarity retrieval.
  • Preferably, there are five convolution layers in Step b); each of the convolution layers is connected to a pooling layer and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layers and the pooling layers apply a ReLU activation function.
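As a quick sanity check on this architecture, the spatial size of the feature maps can be traced through the five convolution-pooling stages. The sketch below assumes stride-1 convolutions with padding 1 (so each 3*3 convolution preserves size), which the disclosure does not state explicitly:

```python
def feature_map_size(h, w, stages=5, pool=2):
    """Spatial size after `stages` conv+pool blocks.

    Assumes each 3x3 convolution uses stride 1 and padding 1 (so it
    preserves the spatial size -- the patent does not state the padding),
    and each 2x2 pooling halves the height and width.
    """
    for _ in range(stages):
        h, w = h // pool, w // pool
    return h, w

# e.g. a 224x224 input yields a 7x7 map after five stages, which is
# then flattened into the one-dimensional feature vector h_img
```

Under these assumptions, only the pooling layers change the spatial resolution, so the feature-map side shrinks by a factor of 2^5 = 32 overall.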
  • Preferably, the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three fully connected layers connected in series with one another.
  • Preferably, N in Step c) is a positive integer.
  • Preferably, M in Step f) is 5000.
  • The present disclosure has the following advantages: according to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as waste of hardware resources and high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, in the present disclosure, information association among Hash codes of a plurality of lengths is mined, and the mutual information loss is designed to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code, and thus improves the retrieval performance of Hash codes. Meanwhile, the model is based on end-to-end learning, that is, image feature extraction and Hash code learning are carried out simultaneously. Compared with the traditional linear Hash method, the model has an intuitive structure, and is easy to migrate and deploy. The multi-task deep Hash learning-based image retrieval method can be well expanded to retrieval of massive images, and therefore has a broad prospect in image retrieval for massive numbers of objects in the logistics industry.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flowchart of a method for multi-task feature extraction according to the present disclosure; and
  • FIG. 2 is a flowchart of a method for Hash code learning according to the present disclosure.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS
  • The present disclosure is further described with reference to FIG. 1 and FIG. 2 .
  • A multi-task deep Hash learning-based retrieval method for massive logistics product images, including the following steps: a) Conduct image preprocessing on an input logistics product image xi, and construct a similarity matrix S among logistics product images according to a label of the image xi.
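For illustration only, Step a)'s similarity matrix can be sketched in Python as follows. The rule that two images are similar when they share at least one label is an assumption, since the disclosure only states that S is constructed from image labels:

```python
import numpy as np

def build_similarity_matrix(labels):
    """s_ij = 1 if images i and j share at least one label, else 0.

    `labels` is an (n, c) multi-hot array; the shared-label rule is an
    assumption -- the patent only says S is built from image labels.
    """
    L = np.asarray(labels)
    return (L @ L.T > 0).astype(int)
```

For single-label data the same rule reduces to s_ij = 1 exactly when the two images carry the same class label.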
  • b) Conduct convolution and pooling on the preprocessed logistics product image by stacking a certain quantity of convolution kernels and pooling kernels, to obtain a one-dimensional feature vector himg of the image, which is taken as the low-level image feature.
  • c) Adopt a hard parameter-sharing network: the low-level feature networks have the same structure and share parameters, while the high-level feature networks have the same structure but the parameters of each branch network are differentiated according to the difference in the high-level features generated. Input the low-level image feature himg to the multi-branch network to obtain a high-level image representation Bk indicated by Hash codes of a plurality of lengths, where the multi-branch network is composed of N branches of a same structure.
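The multi-branch forward pass of Step c) can be sketched as below. The layer widths, the ReLU activations between the three fully connected layers, and the final sign() binarization are illustrative assumptions, not details fixed by the disclosure:

```python
import numpy as np

def multi_branch_codes(h_img, branch_weights):
    """Map the shared low-level feature h_img through N branches.

    Each branch is three fully connected layers in series; the width of
    the last layer sets that branch's Hash-code length.  ReLU between
    layers and sign() binarization are assumptions for illustration
    (np.sign can emit 0 for exact zeros of the pre-activation).
    """
    codes = []
    for W1, W2, W3 in branch_weights:
        x = np.maximum(h_img @ W1, 0)   # FC layer 1 + ReLU
        x = np.maximum(x @ W2, 0)       # FC layer 2 + ReLU
        codes.append(np.sign(x @ W3))   # FC layer 3 -> binarized code B_k
    return codes
```

Because every branch consumes the same h_img, the low-level network is shared (hard parameter sharing) while each branch keeps its own fully connected weights.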
  • d) Calculate a similarity loss function SILoss by formula
  • SI_Loss = Loss(s_ij, b_i b_j^T) = −(1/n) Σ_{s_ij∈S} ( s_ij b_i b_j^T − log(1 + e^{b_i b_j^T}) ),
  • where s_ij denotes the similarity between an ith image and a jth image, s_ij ∈ {1,0}; the value of s_ij being 1 indicates the ith image is similar to the jth image, and the value of s_ij being 0 indicates the ith image is not similar to the jth image; b_i denotes a binary Hash code regarding data of the ith image, b_j denotes a binary Hash code regarding data of the jth image, and T denotes transposition. This formula mainly establishes a relationship between Hash codes and the similarity of the original samples: if the original samples are similar, the corresponding Hash codes should be as similar as possible; and if the original samples are not similar, the corresponding Hash codes should not be similar.
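The similarity loss of Step d) can be sketched numerically as follows. Averaging over all n*n pairs and the use of relaxed (real-valued) codes during training are assumptions about details the disclosure leaves open:

```python
import numpy as np

def similarity_loss(S, B):
    """SI_Loss = -(1/n) * sum( s_ij * theta_ij - log(1 + e^theta_ij) ).

    S is the (n, n) 0/1 similarity matrix and B an (n, L) matrix of
    relaxed (real-valued) codes, so theta_ij = b_i b_j^T.  Averaging
    over all pairs is an assumption about the normalisation; logaddexp
    gives a numerically stable log(1 + e^theta).
    """
    theta = B @ B.T
    return float(-np.mean(S * theta - np.logaddexp(0.0, theta)))
```

As expected, a labeling that marks high-inner-product pairs as similar yields a lower loss than the opposite labeling.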
  • e) Calculate a mutual information loss function MILoss by formula
  • MI_Loss = Loss(B_k, W_k^T B_{k+1}) + γ_k ∥W_k∥_1 = Σ_{k=0}^{N−1} a_k ∥B_k − W_k^T B_{k+1}∥_1 + Σ_{k=0}^{N−1} γ_k ∥W_k∥_1,
  • where B_k denotes the Hash code output from the kth branch, k ∈ {0, . . . , N−1}; B_{k+1} denotes the Hash code output from the (k+1)th branch; W_k denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the (k+1)th branch; γ_k denotes a regularization parameter; ∥·∥_1 denotes the L1 norm; and a_k denotes an optimization parameter. Generally speaking, the length of a Hash code is positively correlated with its representational capacity. The purpose of minimizing the mutual information loss MILoss is to draw the representational capacity of a shorter Hash code closer to that of a longer Hash code, and to further enhance the correlation among a plurality of Hash codes, so that the learned Hash codes have good representational capacity and Hash code retrieval performance is improved.
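The mutual information loss of Step e) can be sketched as below, assuming the codes are stored column-wise (B_k with shape (L_k, n)) so that W_k^T maps B_{k+1} into B_k's space; the layout is an assumption for illustration:

```python
import numpy as np

def mutual_information_loss(codes, Ws, a, gamma):
    """MI_Loss = sum_k a_k*||B_k - W_k^T B_{k+1}||_1 + sum_k gamma_k*||W_k||_1.

    `codes[k]` is B_k with shape (L_k, n) (samples as columns, an
    assumed layout); Ws[k] has shape (L_{k+1}, L_k) so that Ws[k].T
    maps B_{k+1} into B_k's space.  a and gamma hold the optimization
    and regularization parameters a_k and gamma_k.
    """
    loss = 0.0
    for k in range(len(codes) - 1):
        loss += a[k] * np.abs(codes[k] - Ws[k].T @ codes[k + 1]).sum()
        loss += gamma[k] * np.abs(Ws[k]).sum()
    return float(loss)
```

When W_k maps the longer code exactly onto the shorter one, only the L1 regularization term remains, which is what drives W_k toward a sparse mapping.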
  • f) Optimize the similarity loss function SILoss and the mutual information loss function MILoss using a stochastic gradient descent algorithm, and after optimization, repeat Step a) to Step e) at least M times to obtain a trained model.
  • g) Input image data in a database to the trained model in Step f) to obtain a binary Hash code representation Bdatabase of different lengths for each image. For example, the lengths may take various combinations, such as [16 bits, 32 bits, 48 bits, 64 bits] or [128 bits, 256 bits, 512 bits].
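The gradient-based optimization of Step f) can be illustrated on the similarity loss alone. This is a simplification and an assumption, not the patented training procedure: the patent optimizes both losses through the CNN with stochastic gradient descent, whereas the toy below takes full-batch descent steps directly on relaxed codes B, using the closed-form gradient of the pairwise loss (with theta = B B^T, the gradient is (G + G^T) B where G = (sigmoid(theta) − S) / n^2):

```python
import numpy as np

def si_loss_and_grad(B, S):
    """Similarity loss and its gradient w.r.t. relaxed codes B.
    theta = B B^T; loss = -(1/n^2) sum(S*theta - log(1 + e^theta))."""
    n = B.shape[0]
    theta = B @ B.T
    sig = 1.0 / (1.0 + np.exp(-theta))
    loss = float(-(S * theta - np.logaddexp(0.0, theta)).sum() / (n * n))
    G = (sig - S) / (n * n)
    return loss, (G + G.T) @ B      # theta depends on B through rows and columns

rng = np.random.default_rng(2)
B = np.tanh(rng.normal(size=(6, 16)))           # relaxed codes for 6 samples
S = (rng.random((6, 6)) > 0.5).astype(float)
S = np.maximum(S, S.T)                          # similarity is symmetric
np.fill_diagonal(S, 1.0)                        # each sample is similar to itself

lr, losses = 0.2, []
for _ in range(100):                            # full-batch descent steps
    loss, grad = si_loss_and_grad(B, S)
    losses.append(loss)
    B -= lr * grad
print(losses[0] > losses[-1])                   # the loss decreases
```

In the actual method the descent is taken on the network weights rather than on the codes themselves, but the pairwise gradient structure is the same.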
  • h) Input an image to be retrieved imgquery to the trained model in Step f) to obtain a binary Hash code representation Bquery of the image to be retrieved imgquery.
  • i) Calculate a Hamming distance DistHamming by formula DistHamming=∥Bquery ⊕ Bdatabase∥, where ⊕ denotes the exclusive-OR operation, and return, based on the calculated Hamming distance DistHamming, the mean average precision over the query set of all images to be retrieved, measured in the manner of Average Precision, to complete similarity retrieval.
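The Hamming-distance ranking and the Average Precision measurement of this step can be sketched as follows. For codes in {−1, +1}, the XOR-count distance equals (L − Bquery·Bdatabase)/2; the four database codes and the relevance labels below are purely illustrative:

```python
import numpy as np

def hamming_dist(Bq, Bdb):
    """Hamming distance from one query code to every database code;
    for {-1, +1} codes, dist = (L - dot product) / 2."""
    L = Bq.shape[-1]
    return ((L - Bdb @ Bq) // 2).astype(int)

def average_precision(ranked_relevance):
    """Average Precision over a ranked 0/1 relevance list."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    prec_at_k = np.cumsum(rel) / (np.arange(rel.size) + 1)
    return float((prec_at_k * rel).sum() / rel.sum())

# toy database of four 8-bit codes and one query
Bdb = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
                [ 1,  1,  1,  1,  1, -1, -1, -1],
                [-1, -1, -1, -1,  1,  1,  1,  1],
                [-1, -1,  1,  1,  1,  1, -1, -1]])
Bq  = np.array([ 1,  1,  1,  1, -1, -1, -1, -1])

d = hamming_dist(Bq, Bdb)                 # distances to the four codes
order = np.argsort(d)                     # nearest database codes first
relevant = np.array([1, 1, 0, 0])         # illustrative ground-truth labels
print(d.tolist(), average_precision(relevant[order]))   # [0, 1, 8, 4] 1.0
```

Mean average precision (MAP) is then the mean of these AP values over every query in the query set.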
  • In the multi-task deep Hash learning-based retrieval method for massive logistics product images, the theory of multi-view learning is adopted to mine the potential relevance among Hash codes of different lengths. Hash codes of a plurality of lengths are essentially different feature representations of the same original data in Hamming space. Associative learning of Hash codes of a plurality of lengths exploits the complementarity and correlation of these features, and this process can also be regarded as multi-level feature fusion of the same samples. Related theories of multi-feature fusion and multi-view learning provide a theoretical and technical guarantee for the feasibility of this method, which further improves the performance of Hashing retrieval.
  • According to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method avoids shortcomings such as wasted hardware resources and high time cost caused by model retraining under single-tasking. Compared with the traditional approach of learning a single Hash code as an image representation and using it for retrieval, the present disclosure mines the information association among Hash codes of a plurality of lengths and designs the mutual information loss to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of Hash codes. Meanwhile, the model is based on end-to-end learning, that is, image feature extraction and Hash code learning are carried out simultaneously. Compared with traditional linear Hash methods, the model has an intuitive structure and is easy to migrate and deploy. The multi-task deep Hash learning-based image retrieval method scales well to retrieval of massive images, and therefore has a broad prospect in image retrieval for masses of objects in the logistics industry.
  • Table 1 provides a first simulation experiment result according to the method of the present disclosure, which is measured by MAP. Test results on NUS-WIDE data sets show that the performance of multi-tasking is better than that of single Hash code learning, which verifies the rationality of the idea of multi-tasking.
  • TABLE 1
    Method         24 bits  48 bits  64 bits  128 bits  256 bits
    DJMH-Single    0.73     0.78     0.79     0.827     0.833
    DJMH-Multiple  0.801    0.827    0.831    0.846     0.855
  • Table 2 provides a second simulation experiment result according to the method of the present disclosure, measured by MAP. On the NUS-WIDE data sets, the influence of the number of jointly learned Hash code lengths on a Hash code of any single length is further studied, and it is verified that learning more Hash codes at the same time also improves the retrieval performance of a Hash code of any single length (take 24 bits as an example).
  • TABLE 2
    Method                     24 bits  48 bits  64 bits  128 bits  256 bits
    DJMH-24, 48                0.755    0.777    —        —         —
    DJMH-24, 48, 64            0.777    0.8      0.806    —         —
    DJMH-24, 48, 64, 128       0.791    0.816    0.821    0.834     —
    DJMH-24, 48, 64, 128, 256  0.8      0.822    0.828    0.847     0.855
  • Preferably, there are five convolution layers in Step b); each of the convolution layers is connected to a pooling layer and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layers and the pooling layers apply a ReLU activation function.
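Under the common assumptions of stride-1, 'same'-padded convolutions (the patent fixes only the 3*3 and 2*2 kernel sizes, so padding and stride here are assumptions), the spatial size left after the five conv-pool stages can be computed as:

```python
def feature_shape(hw, n_stages=5):
    """Spatial size after n_stages of [3x3 'same' conv -> 2x2 max pool]:
    the conv keeps the size, the pool halves it (integer division)."""
    h, w = hw
    for _ in range(n_stages):
        h, w = h // 2, w // 2
    return h, w

print(feature_shape((224, 224)))   # (7, 7)
```

The resulting map (times the channel count) is then flattened into the one-dimensional feature vector himg of Step b).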
  • Preferably, the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three fully connected layers connected in series with one another.
  • Preferably, N in Step c) is a positive integer.
  • Preferably, M in Step f) is 5000.
  • Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Although the present disclosure is described in detail with reference to the foregoing embodiments, a person skilled in the art can still make modifications to the technical solutions described in the foregoing embodiments, or make equivalent replacement of some technical features therein. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present disclosure should be included within the protection scope of the present disclosure.

Claims (5)

What is claimed is:
1. A multi-task deep Hash learning-based retrieval method for massive logistics product images, comprising the following steps:
a) conducting image preprocessing on an input logistics product image xi, and constructing a similarity matrix S among logistics product images according to a label of the image xi;
b) conducting convolution and pooling on the preprocessed logistics product image to obtain a one-dimensional feature vector himg of the image, and taking the one-dimensional feature vector himg as a low-level image feature;
c) inputting the low-level image feature himg to a multi-branch network to obtain a high-level image representation Bk indicated by Hash codes of a plurality of lengths, wherein the multi-branch network is composed of N branches of a same structure;
d) calculating a similarity loss function SILoss by formula
SILoss = Loss(sij, bi bjT) = −(1/n) Σ_{n=0}^{1000} (sij bi bjT − log(1 + e^{bi bjT})),
wherein sij denotes similarity between an ith image and a jth image, sij∈{1,0}, the value of sij being 1 indicates the ith image is similar to the jth image, the value of sij being 0 indicates the ith image is not similar to the jth image, bi denotes a binary Hash code regarding data of the ith image, bj denotes a binary Hash code regarding data of the jth image, and T denotes transposition;
e) calculating a mutual information loss function MILoss by formula MILoss = Loss(Bk, WkTBk+1) + γk∥Wk∥1 = Σ_{k=0}^{N−1} ak∥Bk − WkTBk+1∥1 + Σ_{k=0}^{N−1} γk∥Wk∥1,
wherein Bk denotes a Hash code output from a kth branch, k∈{0, . . . , N−1}, Bk+1 denotes a Hash code output from a (k+1)th branch, Wk denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the (k+1)th branch, γk denotes a regularization parameter, ∥⋅∥1 denotes an L1 norm, and ak denotes an optimization parameter;
f) optimizing the similarity loss function SILoss and the mutual information loss function MILoss using a stochastic gradient descent algorithm, and after optimization, repeating Step a) to Step e) at least M times to obtain a trained model;
g) inputting image data in a database to the trained model in Step f) to obtain a binary Hash code representation Bdatabase of different lengths for each image;
h) inputting an image to be retrieved imgquery to the trained model in Step f) to obtain a binary Hash code representation Bquery of the image to be retrieved imgquery; and
i) calculating a Hamming distance DistHamming by formula DistHamming=∥Bquery ⊕Bdatabase∥, and returning, based on the calculated Hamming distance DistHamming, mean average precision of a query set of all images to be retrieved in a measurement manner of Average Precision to complete similarity retrieval.
2. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein there are five convolution layers in Step b), each of the convolution layers is connected to a pooling layer, and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layer and the pooling layer apply a Relu activation function.
3. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three full connect layers connected in series with one another.
4. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein N in Step c) is a positive integer.
5. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein M in Step f) is 5000.
US17/809,601 2021-06-29 2022-06-29 Multi-task deep hash learning-based retrieval method for massive logistics product images Pending US20220414144A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110732492.3A CN113377981B (en) 2021-06-29 2021-06-29 Large-scale logistics commodity image retrieval method based on multitask deep hash learning
CN202110732492.3 2021-06-29

Publications (1)

Publication Number Publication Date
US20220414144A1 true US20220414144A1 (en) 2022-12-29

Family

ID=77580183

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/809,601 Pending US20220414144A1 (en) 2021-06-29 2022-06-29 Multi-task deep hash learning-based retrieval method for massive logistics product images

Country Status (2)

Country Link
US (1) US20220414144A1 (en)
CN (1) CN113377981B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704522B (en) * 2021-10-28 2022-02-18 山东建筑大学 Artificial intelligence-based target image rapid retrieval method and system
CN114419402B (en) * 2022-03-29 2023-08-18 中国人民解放军国防科技大学 Image story description generation method, device, computer equipment and storage medium
CN116108217B (en) * 2022-10-27 2023-12-19 浙江大学 Fee evasion vehicle similar picture retrieval method based on depth hash coding and multitask prediction
CN117292104B (en) * 2023-11-22 2024-02-27 南京掌控网络科技有限公司 Goods shelf display detection method and system based on image recognition

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
US20180260665A1 (en) * 2017-03-07 2018-09-13 Board Of Trustees Of Michigan State University Deep learning system for recognizing pills in images
US20180276528A1 (en) * 2015-12-03 2018-09-27 Sun Yat-Sen University Image Retrieval Method Based on Variable-Length Deep Hash Learning
US20190171665A1 (en) * 2017-12-05 2019-06-06 Salk Institute For Biological Studies Image similarity search via hashes with expanded dimensionality and sparsification
CN110659726A (en) * 2019-09-24 2020-01-07 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium
US20200242422A1 (en) * 2019-01-29 2020-07-30 Boe Technology Group Co., Ltd. Method and electronic device for retrieving an image and computer readable storage medium
CN107679250B (en) * 2017-11-01 2020-12-01 浙江工业大学 Multi-task layered image retrieval method based on deep self-coding convolutional neural network
CN109063112B (en) * 2018-07-30 2022-04-01 成都快眼科技有限公司 Rapid image retrieval method, model and model construction method based on multitask learning deep semantic hash
US20220147743A1 (en) * 2020-11-09 2022-05-12 Nvidia Corporation Scalable semantic image retrieval with deep template matching
CN111460200B (en) * 2020-03-04 2023-07-04 西北大学 Image retrieval method and model based on multitask deep learning and construction method thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165306B (en) * 2018-08-09 2021-11-23 长沙理工大学 Image retrieval method based on multitask Hash learning
CN109508320A (en) * 2018-11-27 2019-03-22 聂秀山 Multiple-length Hash combination learning method
CN110674333B (en) * 2019-08-02 2022-04-01 杭州电子科技大学 Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing
CN111177432B (en) * 2019-12-23 2020-11-03 北京航空航天大学 Large-scale image retrieval method based on hierarchical depth hash


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-Task Learning for Deep Semantic Hashing, Ma et al, IEEE (Year: 2018) *

Also Published As

Publication number Publication date
CN113377981A (en) 2021-09-10
CN113377981B (en) 2022-05-27

Similar Documents

Publication Publication Date Title
US20220414144A1 (en) Multi-task deep hash learning-based retrieval method for massive logistics product images
US20210224286A1 (en) Search result processing method and apparatus, and storage medium
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
US10776685B2 (en) Image retrieval method based on variable-length deep hash learning
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
Wu et al. Online feature selection with streaming features
CN108334574B (en) Cross-modal retrieval method based on collaborative matrix decomposition
US20100211588A1 (en) Context-Aware Query Suggestion By Mining Log Data
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN113806746B (en) Malicious code detection method based on improved CNN (CNN) network
CN108399185B (en) Multi-label image binary vector generation method and image semantic similarity query method
CN108446334B (en) Image retrieval method based on content for unsupervised countermeasure training
CN112115716A (en) Service discovery method, system and equipment based on multi-dimensional word vector context matching
CN109766469A (en) A kind of image search method based on the study optimization of depth Hash
CN109829065B (en) Image retrieval method, device, equipment and computer readable storage medium
CN113806580B (en) Cross-modal hash retrieval method based on hierarchical semantic structure
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
CN111125411A (en) Large-scale image retrieval method for deep strong correlation hash learning
CN104731882A (en) Self-adaptive query method based on Hash code weighting ranking
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
CN112199532A (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN110008306A (en) A kind of data relationship analysis method, device and data service system
CN111325264A (en) Multi-label data classification method based on entropy
CN107908757B (en) Website classification method and system
CN113516019B (en) Hyperspectral image unmixing method and device and electronic equipment

Legal Events

Date Code Title Description
AS Assignment

Owner name: SHANDONG JIANZHU UNIVERSITY, CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NIE, XIUSHAN;WANG, LETIAN;LIU, XINGBO;AND OTHERS;REEL/FRAME:060354/0857

Effective date: 20220627

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER