CN112084362A - Image hash retrieval method based on hierarchical feature complementation - Google Patents

Image hash retrieval method based on hierarchical feature complementation

Info

Publication number
CN112084362A
CN112084362A (application CN202010789986.0A; granted publication CN112084362B)
Authority
CN
China
Prior art keywords
feature map
low
feature
level
image
Prior art date
Legal status
Granted
Application number
CN202010789986.0A
Other languages
Chinese (zh)
Other versions
CN112084362B (en)
Inventor
刘庆杰
马田瑶
许杰浩
王蕴红
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
2020-08-07 Application filed by Beihang University
2020-08-07 Priority to CN202010789986.0A
2020-12-15 Publication of CN112084362A
2022-06-14 Application granted
2022-06-14 Publication of CN112084362B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions

Abstract

The invention discloses an image hash retrieval method based on hierarchical feature complementation. The method can be applied to large-scale content-based image retrieval: it simultaneously and effectively extracts the low-level detail information and the high-level semantic information of an image and makes full use of both its global and local features. The method extracts a low-level feature map and a high-level feature map from the convolutional neural network at the same time, capturing both low-level and high-level information of the image. The attention module it introduces reduces noise interference in the low-level feature map and ensures its effectiveness, and the multi-scale feature fusion added to the high-level feature map aggregates context information from different regions, improving the network's ability to capture local detail. By enhancing and fusing information from different levels, the model can fully extract the rich and complex content of the image, and the hash codes better preserve the similarity between images.

Description

Image hash retrieval method based on hierarchical feature complementation
Technical Field
The invention relates to the technical field of computer vision, in particular to an image hash retrieval method based on hierarchical feature complementation.
Background
With the rapid development of the internet, multimedia technologies, mass storage technologies, and smart devices, multimedia data on the network is being generated, distributed, and stored at an explosive rate. Because vast numbers of images are produced at every moment, content-based image retrieval increasingly has to cope with very large data volumes. Conventional image retrieval methods, with their weak image feature extraction and slow encoding, are therefore gradually being replaced by efficient hashing methods.
In recent years, convolutional neural network (CNN) technology has been applied with great success to fields such as image processing and computer vision. Compared with hand-crafted feature extraction and matching algorithms, a convolutional neural network can be trained on a dataset so that the semantic information of the image is well preserved. With this in mind, researchers in the field began to explore combining convolutional neural networks with hash algorithms for large-scale image retrieval. Deep-hash-based image retrieval enables fast retrieval over large image collections, which is of great practical significance as internet-related industries increasingly rely on big data as a foundation and growth point.
At present, most deep hash algorithms first use a convolutional neural network to extract image features and then use a fully connected hash layer to quantize and encode those features into binary hash codes. In the feature extraction part, most hashing methods use convolutional neural networks with many layers, such as AlexNet or ResNet50. After the image passes through many convolution and pooling operations, each element of the extracted feature map has a large receptive field, and the map constitutes a high-level global feature rich in semantic information. Because the encoding uses features that carry the image's global semantics, such hash methods outperform traditional non-deep methods that encode local features. In practical application scenarios, however, image content is quite complex; if only the high-level global features are used for encoding, the key information of the image may be drowned out by secondary, irrelevant information such as the background, and the hash model then fails to encode the truly effective information. Conversely, if the model uses a network with few convolution layers, each element of the feature map has a small receptive field and the map represents low-level local features of the image; quantization encoding then misses the global semantic information, which can degrade retrieval performance. Related studies have shown that low-level features give better results in image instance retrieval, because their relatively higher resolution carries more position and local detail information. However, since they result from fewer convolutions of the original image, low-level features contain less semantic information and more noise. By comparison, the high-level global features obtained after the image passes through a deep convolutional neural network carry richer semantics, but their resolution is low and they perceive image detail poorly.
In summary, current deep hash algorithms have a clear weakness in feature extraction: low-level local features are underused, the detail information of the image is ignored, and retrieval precision suffers.
Disclosure of Invention
In view of this, the present invention provides an image hash retrieval method based on hierarchical feature complementation, so as to effectively extract low-level detail information and high-level semantic information of an image and fully utilize global features and local features of the image.
The invention provides an image hash retrieval method based on hierarchical feature complementation, which comprises the following steps:
S1: inputting an image to be retrieved into a convolutional neural network to extract features;
S2: intercepting a feature map generated by an intermediate layer of the convolutional neural network as the low-level feature map L, and inputting L into a spatial attention module, which aggregates context information into L to obtain a feature map $L_1$;
S3: inputting the low-level feature map L into a channel attention module, which models the semantic dependence among the channels of L to obtain a feature map $L_2$;
S4: adding the obtained feature maps $L_1$ and $L_2$ to obtain a feature map $L_3$, and encoding $L_3$ with a fully connected hash layer to generate a low-level hash code of length $l_1$;
S5: taking the feature map generated by the last layer of the convolutional neural network as the high-level feature map K, and convolving K with several convolution kernels of different sizes to generate several feature maps of different scales;
S6: applying point-wise convolution to each of the generated feature maps in a multi-scale feature fusion module, reducing the number of channels of each feature map to 1/4 of the number of channels of K;
S7: upsampling each point-wise-convolved feature map by bilinear interpolation back to the same scale as K, and concatenating the restored feature maps with K along the channel direction; the fused feature map contains information of different scales from different subregions, realizing the fusion of local and global information;
S8: encoding the fused feature map with a fully connected hash layer to generate a high-level hash code of length $l_2$;
S9: concatenating the low-level hash code and the high-level hash code to obtain a hash code of length $l_1+l_2$ for image retrieval.
In a possible implementation of the image hash retrieval method based on hierarchical feature complementation provided by the present invention, in step S2, intercepting the feature map generated by the intermediate layer of the convolutional neural network as the low-level feature map L, inputting L into the spatial attention module, and aggregating context information into L to obtain the feature map $L_1$ specifically comprises the following steps:
Given a low-level feature map $L \in \mathbb{R}^{C \times H \times W}$, two different convolution layers are applied to L to generate feature maps Y and Z, with $\{Y, Z\} \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, H the height, and W the width of the feature map. Y and Z are reshaped to dimension $C \times N$, giving $\{Y', Z'\} \in \mathbb{R}^{C \times N}$, where $N = H \times W$ is the total number of pixels on one channel of the feature map. The transpose of $Z'$ is multiplied by $Y'$ and a softmax activation is applied, yielding the spatial feature relation map $S \in \mathbb{R}^{N \times N}$:
$$S_{ij}=\frac{\exp\!\big((Z'^{\top})_i \cdot Y'_j\big)}{\sum_{j=1}^{N}\exp\!\big((Z'^{\top})_i \cdot Y'_j\big)}\tag{1}$$
where $S_{ij}$, the value of S in row i and column j, expresses the relation between the corresponding local features of Y and Z; the larger $S_{ij}$, the greater the similarity and correlation of the two local features ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$). $(Z'^{\top})_i$ denotes the i-th row of the transpose of $Z'$, and $Y'_j$ the j-th column of $Y'$. After the spatial feature relation map S is obtained, a mean pooling layer and a convolution layer mine the relative weight of L at each spatial position; the weights are then applied back to L to complete the re-calibration in the spatial dimension:
$$L_1=\mathrm{conv}\big(\mathrm{avg}(S)\big)\cdot L\tag{2}$$
where avg denotes a mean pooling layer and conv a convolution layer with sigmoid as the activation function. Equation (2) weights the spatial positions of L, enhancing its key information in the spatial dimension to obtain the feature map $L_1$.
In a possible implementation of the image hash retrieval method based on hierarchical feature complementation provided by the present invention, in step S3, inputting the low-level feature map L into the channel attention module, which models the semantic dependence among the channels of L to obtain the feature map $L_2$, specifically comprises the following steps:
The low-level feature map L is reshaped to dimension $C \times N$ to obtain $L' \in \mathbb{R}^{C \times N}$; $L'$ is multiplied by its own transpose and softmax is applied as the activation function, yielding the channel feature relation map $G \in \mathbb{R}^{C \times C}$:
$$G_{mn}=\frac{\exp\!\big(L'_m \cdot (L'^{\top})_n\big)}{\sum_{n=1}^{C}\exp\!\big(L'_m \cdot (L'^{\top})_n\big)}\tag{3}$$
where $G_{mn}$ is the value of G in row m and column n ($m = 1, 2, \ldots, C$; $n = 1, 2, \ldots, C$); $L'_m$ denotes the m-th row of $L'$, and $(L'^{\top})_n$ the n-th column of its transpose. After the channel feature relation map G is obtained, a mean pooling layer and a fully connected layer (the multilayer perceptron of Eq. (4)) mine the relative weight of L on each channel; the weights are then applied back to L to complete the re-calibration in the channel dimension:
$$L_2=\mathrm{mlp}\big(\mathrm{avg}(G)\big)\cdot L\tag{4}$$
where mlp denotes a multilayer perceptron with sigmoid as the activation function. Equation (4) weights the channels of L, enhancing its key information in the channel dimension to obtain the feature map $L_2$.
The image hash retrieval method based on hierarchical feature complementation provided by the invention can be applied to large-scale content-based image retrieval. It simultaneously and effectively extracts the low-level detail information and the high-level semantic information of an image, and makes full use of both its global and local features. The method extracts a low-level feature map and a high-level feature map from the convolutional neural network at the same time, capturing both low-level and high-level information. The attention module it introduces reduces noise interference in the low-level feature map and ensures its effectiveness; the multi-scale feature fusion added to the high-level feature map aggregates context information from different regions and improves the network's ability to capture local detail. By enhancing and fusing information from different levels, the convolutional neural network can fully extract the rich and complex content of the image, and the hash codes better preserve the similarity between images.
Drawings
Fig. 1 is a flowchart of an image hash retrieval method based on hierarchical feature complementation according to the present invention;
fig. 2 is a schematic structural diagram of a multi-scale feature fusion module in an image hash retrieval method based on hierarchical feature complementation according to the present invention;
FIG. 3 is a schematic structural diagram of a spatial attention module in an image hash retrieval method based on hierarchical feature complementation according to the present invention;
FIG. 4 is a schematic structural diagram of a channel attention module in the hierarchical feature complementation-based image hash retrieval method according to the present invention;
fig. 5 is a comparison diagram after feature map weights of an original image are visualized by respectively using the existing ResNet50 and an image hash retrieval method based on hierarchical feature complementation provided by the present invention;
FIG. 6 is a graph of the results of t-SNE visualization experiments for the DHA method;
FIG. 7 is a graph showing the results of t-SNE visualization experiment of DHA + method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only illustrative and are not intended to limit the present invention.
The invention provides an image hash retrieval method based on hierarchical feature complementation which, as shown in Fig. 1, comprises the following steps:
S1: inputting an image to be retrieved into a convolutional neural network to extract features;
S2: intercepting a feature map generated by an intermediate layer of the convolutional neural network as the low-level feature map L, and inputting L into a spatial attention module, which aggregates context information into L to obtain a feature map $L_1$;
S3: inputting the low-level feature map L into a channel attention module, which models the semantic dependence among the channels of L to obtain a feature map $L_2$;
S4: adding the obtained feature maps $L_1$ and $L_2$ to obtain a feature map $L_3$, and encoding $L_3$ with a fully connected hash layer to generate a low-level hash code of length $l_1$;
S5: taking the feature map generated by the last layer of the convolutional neural network as the high-level feature map K, and convolving K with several convolution kernels of different sizes to generate several feature maps of different scales;
S6: applying point-wise convolution to each of the generated feature maps in a multi-scale feature fusion module, reducing the number of channels of each feature map to 1/4 of the number of channels of K;
S7: upsampling each point-wise-convolved feature map by bilinear interpolation back to the same scale as K, and concatenating the restored feature maps with K along the channel direction; the fused feature map contains information of different scales from different subregions, realizing the fusion of local and global information;
S8: encoding the fused feature map with a fully connected hash layer to generate a high-level hash code of length $l_2$;
S9: concatenating the low-level hash code and the high-level hash code to obtain a hash code of length $l_1+l_2$ for image retrieval.
The following describes a specific implementation of the image hash retrieval method based on hierarchical feature complementation according to a specific embodiment.
Example 1:
The invention uses ResNet50 as the backbone of the convolutional neural network and improves upon it: features from different levels are used to generate hash codes that represent information at the corresponding levels, and a more effective hash code combining the different levels is obtained by direct concatenation. In this description, the hash code generated from low-level information is called the low-level hash code, and the hash code generated from high-level information the high-level hash code.
To generate the high-level hash code, the feature map produced by the last layer of the convolutional neural network is taken as the high-level feature map. It is convolved with kernels of several different sizes to produce feature maps at several scales; these are point-wise convolved and then upsampled, restoring each to the same scale as the high-level feature map, and the restored maps are concatenated with the high-level feature map along the channel direction. The feature map with the fused multi-scale features is encoded by a fully connected hash layer, generating a high-level hash code of length $l_2$. To generate the low-level hash code, a feature map produced by an intermediate layer of the network is first intercepted as the low-level feature map; an attention mechanism (a combination of spatial attention and channel attention) then enhances it, reducing its noise interference and semantic divergence; finally it is encoded by the fully connected hash layer, yielding a low-level hash code of length $l_1$ that represents the low-level detail features of the image. Concatenating the high-level hash code and the low-level hash code yields a hash code of length $l_1+l_2$ for image retrieval.
In the process of generating the high-level hash code, as shown in Fig. 2, the input feature is the high-level feature map. First, the high-level feature map is convolved with several kernels of different sizes to generate several feature maps of different scales; for example, feature map a in Fig. 2 has a size of 1 × 1, and the other maps have various other sizes. Next, point-wise convolution is applied to the feature maps of different scales, reducing the number of channels of each to 1/4. Finally, bilinear interpolation upsamples each feature map back to the original scale, and each restored map is fused with the high-level feature map; the fused feature map contains information of different scales from different subregions, realizing the fusion of local and global information.
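The following PyTorch sketch illustrates one possible reading of this module. The text fixes the point-wise reduction to C/4 channels, the bilinear upsampling back to the scale of K, and the channel-wise concatenation with K, but not the exact branch configuration; the pooling-based downsampling and the scale set (1, 2, 3, 6) used here are assumptions, not taken from the patent.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Sketch of the multi-scale feature fusion module of Fig. 2.
    Branch scales and pooling-based downsampling are assumptions."""
    def __init__(self, channels: int, scales=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),                # assumed way to reach an s x s scale
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.Conv2d(channels, channels // 4, 1),  # point-wise conv: C -> C/4
            )
            for s in scales
        ])

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        h, w = k.shape[-2:]
        outs = [k]
        for branch in self.branches:
            y = branch(k)
            # Bilinear upsampling restores each branch to the scale of K.
            y = F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)
            outs.append(y)
        # Splice along the channel direction: C + len(scales) * C/4 channels.
        return torch.cat(outs, dim=1)
```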
In the process of generating the low-level hash code, the attention mechanism uses a combination of a spatial attention module and a channel attention module, which are described separately below.
The spatial attention module, shown in Fig. 3, takes the low-level feature map $L \in \mathbb{R}^{C \times H \times W}$ and applies two different convolution layers to it, generating feature maps Y and Z with $\{Y, Z\} \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, H the height, and W the width of the feature map. Y and Z are reshaped to dimension $C \times N$, giving $\{Y', Z'\} \in \mathbb{R}^{C \times N}$, where $N = H \times W$ is the total number of pixels on one channel of the feature map. The transpose of $Z'$ is multiplied by $Y'$ and a softmax activation is applied, yielding the spatial feature relation map $S \in \mathbb{R}^{N \times N}$:
$$S_{ij}=\frac{\exp\!\big((Z'^{\top})_i \cdot Y'_j\big)}{\sum_{j=1}^{N}\exp\!\big((Z'^{\top})_i \cdot Y'_j\big)}\tag{1}$$
where $S_{ij}$, the value of S in row i and column j, expresses the relation between the corresponding local features of Y and Z; the larger $S_{ij}$, the greater the similarity and correlation of the two local features ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$). $(Z'^{\top})_i$ denotes the i-th row of the transpose of $Z'$, and $Y'_j$ the j-th column of $Y'$. After the spatial feature relation map S is obtained, a mean pooling layer and a convolution layer mine the relative weight of L at each spatial position; the weights are then applied back to L to complete the re-calibration in the spatial dimension:
$$L_1=\mathrm{conv}\big(\mathrm{avg}(S)\big)\cdot L\tag{2}$$
where avg denotes a mean pooling layer and conv a convolution layer with sigmoid as the activation function. Equation (2) weights the spatial positions of L, enhancing its key information in the spatial dimension to obtain the feature map $L_1$.
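As a concrete reading of Eqs. (1) and (2), the sketch below implements the spatial attention module under stated assumptions: the two convolutions producing Y and Z are taken as 1×1 convolutions, and avg(S) is taken as a mean over the rows of S; the text fixes neither detail.
```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of the spatial attention module (Fig. 3, Eqs. (1)-(2)).
    Not memory-optimized: S is N x N with N = H * W."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_y = nn.Conv2d(channels, channels, 1)  # assumed 1x1 convs for Y and Z
        self.conv_z = nn.Conv2d(channels, channels, 1)
        self.conv_w = nn.Conv2d(1, 1, 3, padding=1)     # the conv(.) of Eq. (2)

    def forward(self, l: torch.Tensor) -> torch.Tensor:
        b, c, h, w = l.shape
        n = h * w
        y = self.conv_y(l).view(b, c, n)  # Y' in R^{C x N}
        z = self.conv_z(l).view(b, c, n)  # Z' in R^{C x N}
        # Eq. (1): S = softmax(Z'^T Y'), one softmax per spatial position.
        s = torch.softmax(torch.bmm(z.transpose(1, 2), y), dim=-1)  # (B, N, N)
        # avg(S) -> one value per position, then conv + sigmoid (Eq. (2)).
        weight = torch.sigmoid(self.conv_w(s.mean(dim=1).view(b, 1, h, w)))
        return weight * l  # L1: spatially re-weighted low-level map
```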
The channel attention module, shown in Fig. 4, performs no additional convolution, unlike the spatial attention module: it computes the channel feature relation map $G \in \mathbb{R}^{C \times C}$ directly from the input low-level feature map $L \in \mathbb{R}^{C \times H \times W}$. First, L is reshaped to dimension $C \times N$ to obtain $L' \in \mathbb{R}^{C \times N}$; $L'$ is then multiplied by its own transpose and softmax is applied as the activation function, yielding the channel feature relation map:
$$G_{mn}=\frac{\exp\!\big(L'_m \cdot (L'^{\top})_n\big)}{\sum_{n=1}^{C}\exp\!\big(L'_m \cdot (L'^{\top})_n\big)}\tag{3}$$
where $G_{mn}$, the value of G in row m and column n, expresses the degree of association between channels m and n of the low-level feature map ($m = 1, 2, \ldots, C$; $n = 1, 2, \ldots, C$); $L'_m$ denotes the m-th row of $L'$, and $(L'^{\top})_n$ the n-th column of its transpose. After the channel feature relation map G is obtained, a mean pooling layer and a fully connected layer mine the relative weight of L on each channel; the weights are then applied back to L to complete the re-calibration in the channel dimension:
$$L_2=\mathrm{mlp}\big(\mathrm{avg}(G)\big)\cdot L\tag{4}$$
where mlp denotes a multilayer perceptron with sigmoid as the activation function. Equation (4) weights the channels of L, enhancing its key information in the channel dimension to obtain the feature map $L_2$. The channel attention module models the semantic dependency between feature channels so that similar semantic features reinforce one another, improving the feature map's ability to express image semantics.
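A corresponding sketch of the channel attention module of Eqs. (3) and (4) follows; the hidden width of the perceptron is not given in the text, so the reduction factor of 4 is an assumption.
```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the channel attention module (Fig. 4, Eqs. (3)-(4))."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(                      # the mlp(.) of Eq. (4)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, l: torch.Tensor) -> torch.Tensor:
        b, c, h, w = l.shape
        lp = l.view(b, c, h * w)  # L' in R^{C x N}
        # Eq. (3): G = softmax(L' L'^T), one softmax per channel row.
        g = torch.softmax(torch.bmm(lp, lp.transpose(1, 2)), dim=-1)  # (B, C, C)
        # avg(G) -> one value per channel, then the perceptron (Eq. (4)).
        weight = self.mlp(g.mean(dim=-1)).view(b, c, 1, 1)
        return weight * l  # L2: channel re-weighted low-level map
```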
In summary, the present invention uses a high-level feature map and a low-level feature map to capture global and local information simultaneously. The high-level feature map has a large receptive field and contains the global, deep semantic information of the image, but its resolution is low and much detail is lost; to address this, the invention introduces a multi-scale feature fusion module that effectively combines feature information of different scales, fusing global and local information. The low-level feature map contains more of the image's structural information, such as details of texture, color, and shape that strongly influence the classification result, but suffers from severe background clutter and semantic divergence; the invention therefore applies an attention mechanism to the low-level feature map to reduce the influence of noise.
Feature fusion can improve detection and segmentation performance in object detection and image segmentation tasks. By the order of fusion relative to prediction, feature fusion divides into early fusion and late fusion. The method adopts early fusion: the extracted multi-level image features are fused first, and prediction is then performed on the fused feature vector. Two fusion modes are available, superposition fusion and direct addition. Superposition fusion concatenates two feature vectors; if their dimensions are x and y, the fused dimension is x + y. Additive fusion combines two feature vectors into a single complex vector, z = x + iy, where i is the imaginary unit. For feature maps of different scales, the method uses bilinear interpolation to upsample each feature map back to the original scale before fusing, as illustrated below.
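A minimal illustration of the two fusion modes just described; the vector sizes are illustrative only and not taken from the patent.
```python
import torch

x = torch.randn(8, 128)  # a batch of feature vectors (sizes are illustrative)
y = torch.randn(8, 128)

concat = torch.cat([x, y], dim=1)  # superposition fusion: 128 + 128 = 256 dims
z = torch.complex(x, y)            # additive fusion: z = x + iy, stays 128-dim
```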
The attention mechanism combines channel attention with spatial attention. Spatial attention focuses on where the important information lies in the map, so the invention uses the spatial relations of the features to generate an attention map, aggregating richer context information into the local features and strengthening their expressive power. As for channel attention, each channel of the feature map carries the semantic information of some instance in the image, and the semantics of different channels are correlated; since the information in low-level local features diverges semantically and a convolutional neural network struggles to aggregate similar semantics, mining the interdependence among channels improves the feature map's representation of specific semantics. The invention therefore uses the cross-channel relations of the features to generate an attention map and learn the correlation among channels, which can be regarded as a complement to spatial attention. Given an input image, the two attention modules attend respectively to the category information of objects in the feature map and to their position information in the image, improving the quality of the low-level feature map; additive fusion then yields an information-enhanced fused feature. Compared with the original features, the fused features suffer less interference from background noise and express semantics better.
In summary, unlike existing deep hash methods that directly use the high-level features for classification prediction, the method combines the high-level and low-level feature maps, mitigating their respective shortcomings with a feature fusion module and an attention mechanism, as the end-to-end sketch below illustrates.
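The sketch below wires the three module sketches above around a ResNet50 backbone, assuming they are in scope. Taking the low-level map from layer2, global average pooling before the fully connected hash layers, the tanh relaxation of the binary codes, and the code lengths l1 = l2 = 32 are all illustrative assumptions; the patent does not fix them.
```python
import torch
import torch.nn as nn
from torchvision import models

class HierarchicalHash(nn.Module):
    """End-to-end sketch of hierarchical feature complementation;
    reuses MultiScaleFusion, SpatialAttention and ChannelAttention above."""
    def __init__(self, l1: int = 32, l2: int = 32):
        super().__init__()
        r = models.resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool,
                                  r.layer1, r.layer2)   # low-level map: 512 channels
        self.high = nn.Sequential(r.layer3, r.layer4)   # high-level map: 2048 channels
        self.spatial = SpatialAttention(512)
        self.channel = ChannelAttention(512)
        self.fusion = MultiScaleFusion(2048)            # 4 branches: 2048 + 4*512 = 4096
        self.hash_low = nn.Linear(512, l1)
        self.hash_high = nn.Linear(4096, l2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = self.stem(x)                              # low-level feature map L
        l3 = self.spatial(low) + self.channel(low)      # L3 = L1 + L2
        code_low = torch.tanh(self.hash_low(l3.mean(dim=(2, 3))))
        fused = self.fusion(self.high(low))             # multi-scale high-level map
        code_high = torch.tanh(self.hash_high(fused.mean(dim=(2, 3))))
        return torch.cat([code_low, code_high], dim=1)  # length l1 + l2
```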
The accuracy and feature visualization of the image hash retrieval method based on hierarchical feature complementation provided by the invention are analyzed through experiments below.
The evaluation index mAP@5000 was used on the multi-label datasets NUS-WIDE and MS COCO to compare different methods. In the experiments, the lengths of the final hash codes were set to 16, 32, 48 and 64 bits; the results are shown in Table 1.
TABLE 1 results of different hashing methods on multi-label datasets NUS-WIDE and MS COCO
(Table 1 is provided as an image in the original publication and is not reproduced here.)
In Table 1, DHA+ denotes the original DHA model augmented with the image hash retrieval method based on hierarchical feature complementation provided by the present invention, and HashNet+ likewise denotes the augmented HashNet. The retrieval performance of both DHA+ and HashNet+ improves considerably over hash models that use ResNet50 as the backbone. The improvement on the MS COCO dataset is more pronounced; analysis of MS COCO shows that its images contain more objects at different scales, suggesting that the method extracts features at different scales better and thereby improves retrieval performance.
Likewise, the evaluation index mAP@54000 was used on the single-label dataset CIFAR-10 to compare different methods. In the experiments, the lengths of the final hash codes were set to 16, 32, 48 and 64 bits; the results are shown in Table 2.
TABLE 2 results of different hashing methods on the single label dataset CIFAR-10
(Table 2 is provided as an image in the original publication and is not reproduced here.)
In experiments with four hash code lengths (16, 32, 48 and 64 bits), DHA+, obtained with the image hash retrieval method based on hierarchical feature complementation provided by the invention, improves retrieval performance over the original DHA on the multi-label datasets NUS-WIDE and MS COCO and on the single-label dataset CIFAR-10. On CIFAR-10, however, each image has relatively low resolution and contains only one instance object, so the improvement brought by the method is comparatively small.
The image hash retrieval method based on hierarchical feature complementation provided by the invention brings improvements of varying degrees on different datasets, demonstrating its generality. Notably, its retrieval performance improves most on image datasets with complex content, indicating good robustness.
Besides verifying the accuracy of the image hash retrieval method based on hierarchical feature complementation, Grad-CAM is used to visualize the feature-map weights in the convolutional neural network, in order to observe and analyze the differences between plain ResNet50 and the proposed method.
Experiments were performed on images from part of the MS COCO dataset: Grad-CAM maps the weights of the feature maps generated by the convolutional neural network back onto the original images to produce heat maps, and three representative images are selected for display in Fig. 5. As Fig. 5 shows, the method focuses on the people in the images, is not distracted by background or other non-critical information, and is highly robust.
The feature-visualization results on images of different complexity show that the image hash retrieval method based on hierarchical feature complementation detects the key information in the images completely. Through feature complementation, the retrieval model alleviates, to a certain extent, the semantic divergence and noise interference that a convolutional neural network suffers during feature extraction.
Furthermore, t-SNE visualization experiments were performed. t-SNE is a commonly used nonlinear dimensionality reduction method that maps high-dimensional data into a low-dimensional space. The CIFAR-10 dataset consists of color images in 10 classes, each containing 6000 images. First, 1000 images were randomly selected from each class of CIFAR-10, and 64-bit hash codes were generated with DHA and DHA+ respectively; the 64-dimensional vectors were then reduced with t-SNE and displayed in a two-dimensional plane. The results are shown in Fig. 6 (DHA) and Fig. 7 (DHA+). As Figs. 6 and 7 show, both DHA and DHA+ map most images of the same class into neighboring regions, but DHA has more misplaced samples than DHA+ and its clusters are more dispersed.
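A sketch of this visualization procedure, with random stand-ins for the real hash codes and labels:
```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Stand-ins for the real data: 1000 images per CIFAR-10 class, 64-bit codes.
codes = np.random.randn(10000, 64)       # would be the model's relaxed hash codes
labels = np.repeat(np.arange(10), 1000)  # would be the class labels

emb = TSNE(n_components=2, init="pca").fit_transform(codes)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab10")
plt.show()
```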
In summary, it can be seen through accuracy and feature visualization experiments that the image hash retrieval method based on hierarchical feature complementation provided by the invention has good performance, especially on an image data set with complex content.
The image hash retrieval method based on hierarchical feature complementation provided by the invention can be applied to large-scale content-based image retrieval. It simultaneously and effectively extracts the low-level detail information and the high-level semantic information of an image, and makes full use of both its global and local features. The method extracts a low-level feature map and a high-level feature map from the convolutional neural network at the same time, capturing both low-level and high-level information. The attention module it introduces reduces noise interference in the low-level feature map and ensures its effectiveness; the multi-scale feature fusion added to the high-level feature map aggregates context information from different regions and improves the network's ability to capture local detail. By enhancing and fusing information from different levels, the model can fully extract the rich and complex content of the image, and the hash codes better preserve the similarity between images.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (3)

1. An image hash retrieval method based on hierarchical feature complementation, characterized by comprising the following steps:
S1: inputting an image to be retrieved into a convolutional neural network to extract features;
S2: intercepting a feature map generated by an intermediate layer of the convolutional neural network as the low-level feature map L, and inputting L into a spatial attention module, which aggregates context information into L to obtain a feature map $L_1$;
S3: inputting the low-level feature map L into a channel attention module, which models the semantic dependence among the channels of L to obtain a feature map $L_2$;
S4: adding the obtained feature maps $L_1$ and $L_2$ to obtain a feature map $L_3$, and encoding $L_3$ with a fully connected hash layer to generate a low-level hash code of length $l_1$;
S5: taking the feature map generated by the last layer of the convolutional neural network as the high-level feature map K, and convolving K with several convolution kernels of different sizes to generate several feature maps of different scales;
S6: applying point-wise convolution to each of the generated feature maps in a multi-scale feature fusion module, reducing the number of channels of each feature map to 1/4 of the number of channels of K;
S7: upsampling each point-wise-convolved feature map by bilinear interpolation back to the same scale as K, and concatenating the restored feature maps with K along the channel direction; the fused feature map contains information of different scales from different subregions, realizing the fusion of local and global information;
S8: encoding the fused feature map with a fully connected hash layer to generate a high-level hash code of length $l_2$;
S9: concatenating the low-level hash code and the high-level hash code to obtain a hash code of length $l_1+l_2$ for image retrieval.
2. The image hash retrieval method based on hierarchical feature complementation according to claim 1, characterized in that in step S2, intercepting the feature map generated by the intermediate layer of the convolutional neural network as the low-level feature map L, inputting L into the spatial attention module, and aggregating context information into L to obtain the feature map $L_1$ specifically comprises:
given a low-level feature map $L \in \mathbb{R}^{C \times H \times W}$, applying two different convolution layers to L to generate feature maps Y and Z, with $\{Y, Z\} \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, H the height, and W the width of the feature map; reshaping Y and Z to dimension $C \times N$, giving $\{Y', Z'\} \in \mathbb{R}^{C \times N}$, where $N = H \times W$ is the total number of pixels on one channel of the feature map; multiplying the transpose of $Z'$ by $Y'$ and applying a softmax activation to obtain the spatial feature relation map $S \in \mathbb{R}^{N \times N}$:
$$S_{ij}=\frac{\exp\!\big((Z'^{\top})_i \cdot Y'_j\big)}{\sum_{j=1}^{N}\exp\!\big((Z'^{\top})_i \cdot Y'_j\big)}\tag{1}$$
wherein $S_{ij}$, the value of S in row i and column j, expresses the relation between the corresponding local features of Y and Z, the larger $S_{ij}$, the greater the similarity and correlation of the two local features ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, N$); $(Z'^{\top})_i$ denotes the i-th row of the transpose of $Z'$, and $Y'_j$ the j-th column of $Y'$; after the spatial feature relation map S is obtained, mining the relative weight of L at each spatial position with a mean pooling layer and a convolution layer, and applying the weights back to L to complete the re-calibration in the spatial dimension:
$$L_1=\mathrm{conv}\big(\mathrm{avg}(S)\big)\cdot L\tag{2}$$
wherein avg denotes a mean pooling layer and conv a convolution layer with sigmoid as the activation function; equation (2) weights the spatial positions of L, enhancing its key information in the spatial dimension to obtain the feature map $L_1$.
3. The image hash retrieval method based on hierarchical feature complementation according to claim 2, characterized in that in step S3, inputting the low-level feature map L into the channel attention module, which models the semantic dependence among the channels of L to obtain the feature map $L_2$, specifically comprises:
reshaping the low-level feature map L to dimension $C \times N$ to obtain $L' \in \mathbb{R}^{C \times N}$; multiplying $L'$ by its own transpose and applying softmax as the activation function to obtain the channel feature relation map $G \in \mathbb{R}^{C \times C}$:
$$G_{mn}=\frac{\exp\!\big(L'_m \cdot (L'^{\top})_n\big)}{\sum_{n=1}^{C}\exp\!\big(L'_m \cdot (L'^{\top})_n\big)}\tag{3}$$
wherein $G_{mn}$ is the value of G in row m and column n ($m = 1, 2, \ldots, C$; $n = 1, 2, \ldots, C$); $L'_m$ denotes the m-th row of $L'$, and $(L'^{\top})_n$ the n-th column of its transpose; after the channel feature relation map G is obtained, mining the relative weight of L on each channel with a mean pooling layer and a fully connected layer, and applying the weights back to L to complete the re-calibration in the channel dimension:
$$L_2=\mathrm{mlp}\big(\mathrm{avg}(G)\big)\cdot L\tag{4}$$
wherein mlp denotes a multilayer perceptron with sigmoid as the activation function; equation (4) weights the channels of L, enhancing its key information in the channel dimension to obtain the feature map $L_2$.
CN202010789986.0A 2020-08-07 2020-08-07 Image hash retrieval method based on hierarchical feature complementation Active CN112084362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010789986.0A CN112084362B (en) 2020-08-07 2020-08-07 Image hash retrieval method based on hierarchical feature complementation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010789986.0A CN112084362B (en) 2020-08-07 2020-08-07 Image hash retrieval method based on hierarchical feature complementation

Publications (2)

Publication Number Publication Date
CN112084362A true CN112084362A (en) 2020-12-15
CN112084362B CN112084362B (en) 2022-06-14

Family

ID=73735430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010789986.0A Active CN112084362B (en) 2020-08-07 2020-08-07 Image hash retrieval method based on hierarchical feature complementation

Country Status (1)

Country Link
CN (1) CN112084362B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612913A (en) * 2020-12-28 2021-04-06 厦门市美亚柏科信息股份有限公司 Image searching method and system
CN112906780A (en) * 2021-02-08 2021-06-04 中国科学院计算技术研究所 Fruit and vegetable image classification system and method
CN113011425A (en) * 2021-03-05 2021-06-22 上海商汤智能科技有限公司 Image segmentation method and device, electronic equipment and computer readable storage medium
CN113220926A (en) * 2021-05-06 2021-08-06 安徽大学 Footprint image retrieval method based on multi-scale local attention enhancement network
CN113239217A (en) * 2021-06-04 2021-08-10 图灵深视(南京)科技有限公司 Image index library construction method and system and image retrieval method and system
CN114064952A (en) * 2021-07-09 2022-02-18 武汉邦拓信息科技有限公司 Graph retrieval method based on spatial perception enhancement
CN114295368A (en) * 2021-12-24 2022-04-08 江苏国科智能电气有限公司 Multi-channel fused wind power planetary gear box fault diagnosis method
CN114581456A (en) * 2022-05-09 2022-06-03 深圳市华汉伟业科技有限公司 Multi-image segmentation model construction method, image detection method and device
CN115375980A (en) * 2022-06-30 2022-11-22 杭州电子科技大学 Block chain-based digital image evidence storing system and method
CN116955675A (en) * 2023-09-21 2023-10-27 中国海洋大学 Hash image retrieval method and network based on fine-grained similarity relation contrast learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114774A1 (en) * 2017-10-16 2019-04-18 Adobe Systems Incorporated Generating Image Segmentation Data Using a Multi-Branch Neural Network
CN109840290A (en) * 2019-01-23 2019-06-04 北京航空航天大学 A kind of skin lens image search method based on end-to-end depth Hash

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114774A1 (en) * 2017-10-16 2019-04-18 Adobe Systems Incorporated Generating Image Segmentation Data Using a Multi-Branch Neural Network
CN109840290A (en) * 2019-01-23 2019-06-04 北京航空航天大学 A kind of skin lens image search method based on end-to-end depth Hash

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIEHAO XU ET AL.: "DHA: Supervised Deep Learning to Hash with an Adaptive Loss Function", 《2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOP (ICCVW)》 *
曾梦琪 (Zeng Mengqi) et al.: "Efficient image retrieval scheme based on hybrid similarity" (基于混合相似度的高效图像检索方案), 《计算机工程》 (Computer Engineering) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612913A (en) * 2020-12-28 2021-04-06 厦门市美亚柏科信息股份有限公司 Image searching method and system
CN112906780A (en) * 2021-02-08 2021-06-04 中国科学院计算技术研究所 Fruit and vegetable image classification system and method
CN113011425A (en) * 2021-03-05 2021-06-22 上海商汤智能科技有限公司 Image segmentation method and device, electronic equipment and computer readable storage medium
WO2022183730A1 (en) * 2021-03-05 2022-09-09 上海商汤智能科技有限公司 Image segmentation method and apparatus, electronic device, and computer readable storage medium
CN113220926B (en) * 2021-05-06 2022-09-09 安徽大学 Footprint image retrieval method based on multi-scale local attention enhancement network
CN113220926A (en) * 2021-05-06 2021-08-06 安徽大学 Footprint image retrieval method based on multi-scale local attention enhancement network
CN113239217A (en) * 2021-06-04 2021-08-10 图灵深视(南京)科技有限公司 Image index library construction method and system and image retrieval method and system
CN113239217B (en) * 2021-06-04 2024-02-06 图灵深视(南京)科技有限公司 Image index library construction method and system, and image retrieval method and system
CN114064952A (en) * 2021-07-09 2022-02-18 武汉邦拓信息科技有限公司 Graph retrieval method based on spatial perception enhancement
CN114295368A (en) * 2021-12-24 2022-04-08 江苏国科智能电气有限公司 Multi-channel fused wind power planetary gear box fault diagnosis method
CN114581456A (en) * 2022-05-09 2022-06-03 深圳市华汉伟业科技有限公司 Multi-image segmentation model construction method, image detection method and device
CN114581456B (en) * 2022-05-09 2022-10-14 深圳市华汉伟业科技有限公司 Multi-image segmentation model construction method, image detection method and device
CN115375980A (en) * 2022-06-30 2022-11-22 杭州电子科技大学 Block chain-based digital image evidence storing system and method
CN115375980B (en) * 2022-06-30 2023-05-09 杭州电子科技大学 Digital image certification system and certification method based on blockchain
CN116955675A (en) * 2023-09-21 2023-10-27 中国海洋大学 Hash image retrieval method and network based on fine-grained similarity relation contrast learning
CN116955675B (en) * 2023-09-21 2023-12-12 中国海洋大学 Hash image retrieval method and network based on fine-grained similarity relation contrast learning

Also Published As

Publication number Publication date
CN112084362B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN112084362B (en) Image hash retrieval method based on hierarchical feature complementation
Mohamed et al. Content-based image retrieval using convolutional neural networks
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
US20220382553A1 (en) Fine-grained image recognition method and apparatus using graph structure represented high-order relation discovery
CN107590505B (en) Learning method combining low-rank representation and sparse regression
CN110929080A (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN111860124B (en) Remote sensing image classification method based on space spectrum capsule generation countermeasure network
CN113743484A (en) Image classification method and system based on space and channel attention mechanism
CN115222998B (en) Image classification method
Wang et al. An image similarity descriptor for classification tasks
CN112446431A (en) Feature point extraction and matching method, network, device and computer storage medium
Guo et al. Multi-view feature learning for VHR remote sensing image classification
Zhou et al. Discriminative attention-augmented feature learning for facial expression recognition in the wild
Wei et al. An automated detection model of threat objects for X-ray baggage inspection based on depthwise separable convolution
US20220019846A1 (en) Image analysis system and operating method of the same
CN116994155B (en) Geological lithology interpretation method, device and storage medium
Sima et al. Composite kernel of mutual learning on mid-level features for hyperspectral image classification
Arulmozhi et al. DSHPoolF: deep supervised hashing based on selective pool feature map for image retrieval
Bibi et al. Deep features optimization based on a transfer learning, genetic algorithm, and extreme learning machine for robust content-based image retrieval
Chen et al. Edge data based trailer inception probabilistic matrix factorization for context-aware movie recommendation
Vijayalakshmi K et al. Copy-paste forgery detection using deep learning with error level analysis
Moujahid et al. Multi-scale multi-block covariance descriptor with feature selection
Li et al. Aggregating hierarchical binary activations for image retrieval
CN113343953B (en) FGR-AM method and system for remote sensing scene recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant