CN114511452A - Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention - Google Patents

Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention

Info

Publication number
CN114511452A
CN114511452A
Authority
CN
China
Prior art keywords
convolution
remote sensing
module
image
sensing image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111480268.6A
Other languages
Chinese (zh)
Other versions
CN114511452B (en)
Inventor
侯东阳 (Hou Dongyang)
王思远 (Wang Siyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202111480268.6A priority Critical patent/CN114511452B/en
Publication of CN114511452A publication Critical patent/CN114511452A/en
Application granted granted Critical
Publication of CN114511452B publication Critical patent/CN114511452B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/70: Denoising; Smoothing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10032: Satellite or aerial image; Remote sensing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention, comprising the following steps: A) constructing a reference network based on a residual structure; B) replacing the convolution modules in the residual structure with multi-scale dilated convolution modules to enhance the image features; C) embedding a triplet attention module in the residual structure formed by the multi-scale dilated convolution modules, the triplet attention module being embedded after the last convolution layer of each residual block of the residual structure; D) constructing an online label smoothing loss function, feeding the remote sensing image data into the residual structure for training, and dynamically generating a smoothing weight matrix during training; E) extracting the feature vector of the remote sensing image; F) matching the features of the remote sensing image against the features of the database images and retrieving the most similar images. The method extracts the salient semantic features of remote sensing images and effectively improves retrieval precision.

Description

Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention
Technical Field
The invention relates to an image retrieval method, in particular to a remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention.
Background
Remote sensing image retrieval is the process of querying scenes or targets of interest to a user from a remote sensing image (library) according to some similarity measure, and it is one of the key technologies for sharing and efficiently mining massive remote sensing imagery.
However, because annotating massive remote sensing images is time-consuming and labor-intensive, and annotation text often cannot accurately express image content, content-based remote sensing image retrieval ("searching images by images"), which uses image features as the basis for similarity computation, has become the mainstream approach. In recent years, deep learning methods represented by convolutional neural networks (CNNs) have learned global image features from large amounts of data and greatly improved the effect of remote sensing image retrieval.
Nevertheless, although retrieval with deep features can effectively find the required images, remote sensing images have rich targets, complex backgrounds and inconsistent scales, so the global features extracted by a CNN fail in some scenes and retrieval accuracy drops.
Disclosure of Invention
The invention aims to provide a remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention that effectively improves retrieval precision.
To solve this technical problem, the invention provides a remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention, comprising the following steps:
A) constructing a reference network based on a residual structure;
B) replacing the convolution modules in the residual structure with multi-scale dilated convolution modules;
C) embedding a triplet attention module in the residual structure formed by the multi-scale dilated convolution modules, wherein the triplet attention module is embedded after the last convolution layer of each residual block of the residual structure;
D) constructing an online label smoothing loss function, feeding remote sensing image data into the residual structure for training, and dynamically generating a smoothing weight matrix during training;
E) extracting the feature vector of the remote sensing image;
F) matching the features of the remote sensing image against the features of the database images and retrieving the most similar image.
Preferably, in step B), the convolution modules in the residual structure are replaced with multi-scale dilated convolution modules as follows:
B1) setting each 3 × 3 convolution module in the residual structure as a dilated convolution module;
B2) setting the dilation rates of the dilated convolution modules to [1, 2, 5, 9] respectively, forming the multi-scale dilated convolution module; a minimal sketch of such a module is given below.
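The following PyTorch sketch illustrates steps B1) and B2). The text fixes only the 3 × 3 kernel and the dilation rates [1, 2, 5, 9]; fusing the four branches by elementwise summation, and writing the module as a drop-in replacement for a 3 × 3 convolution, are assumptions of this sketch.

```python
import torch.nn as nn

class MultiScaleDilatedConv(nn.Module):
    """Parallel 3x3 convolutions at dilation rates [1, 2, 5, 9], fused by
    elementwise summation (assumed): a drop-in replacement for a 3x3 conv."""
    def __init__(self, in_channels, out_channels, stride=1,
                 dilations=(1, 2, 5, 9)):
        super().__init__()
        # padding = dilation keeps every branch at the same spatial size,
        # so the multi-scale responses can be summed elementwise.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride,
                      padding=d, dilation=d, bias=False)
            for d in dilations
        ])

    def forward(self, x):
        # Sum of the four multi-scale responses.
        return sum(branch(x) for branch in self.branches)
```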
Further preferably, in step C), the triplet attention module models channel attention and spatial attention separately through cross-dimension interaction between the channel dimension and the spatial dimensions.
Preferably, the interaction steps of the triplet attention module are as follows:
C1) given an input feature map X ∈ R^{H×W×C}, i.e., of size H × W × C;
C2) computing the information of the three branches of the triplet attention module separately;
C3) aggregating the features extracted by each branch through averaging to produce the output.
Further preferably, the first branch of the triplet attention module is the spatial attention calculation branch, in which the input features undergo channel pooling and dilated convolution, after which a Sigmoid activation function generates the spatial attention weights.
Preferably, the second branch of the triplet attention module captures the interaction between the channel dimension C and the spatial dimension W: the input feature X is first transposed into an H × C × W feature, pooled along the H dimension, and finally transformed back into a C × H × W feature through convolution and a Sigmoid activation function.
Further preferably, the third branch of the triplet attention module captures the interaction between the channel dimension C and the spatial dimension H: the input feature X is first transposed into a W × H × C feature, pooled along the W dimension, and finally transformed back into a C × H × W feature through convolution and a Sigmoid activation function.
Preferably, in step D), the smoothing weight matrix imposes a differentiated distance constraint on images of different categories. The specific formulas of the smoothing weight matrix are

$$L_{hard} = -\sum_{k=1}^{K} q(k \mid x_i)\,\log p(k \mid x_i)$$

$$L_{soft} = -\sum_{k=1}^{K} S^{t}_{y_i,k}\,\log p(k \mid x_i)$$

$$q(k = y_i \mid x_i) = 1, \qquad q(k \neq y_i \mid x_i) = 0$$

where $L_{hard}$ is the cross-entropy loss, $x_i$ denotes an input image, $y_i$ the true category of the input image, $k$ the predicted category, $K$ the total number of image categories, $p(k \mid x_i)$ the probability that input image $x_i$ is predicted as class $k$, $q$ the distribution of $y_i$, $L_{soft}$ the online label smoothing loss, $t$ the number of training iterations, and $S^{t}$ the label smoothing threshold, which is adjusted continuously by iteration during training.
Further preferably, in step D), the model loss and the normalized threshold used when training with the online label smoothing loss function are computed as follows: the model loss is calculated with the current smoothing threshold $S^{t}$; the threshold is then updated with the prediction probabilities of the reference network model,

$$\hat{S}^{t+1}_{y_i,k} = \hat{S}^{t+1}_{y_i,k} + p(k \mid x_i),$$

and $\hat{S}^{t+1}$ is normalized to obtain the smoothing threshold at training iteration $t+1$:

$$S^{t+1}_{y_i,k} = \frac{\hat{S}^{t+1}_{y_i,k}}{\sum_{k'=1}^{K} \hat{S}^{t+1}_{y_i,k'}}$$
Preferably, the reference network model is trained jointly with the cross-entropy loss function and the online label smoothing loss function, the total training loss being

$$L = \alpha L_{hard} + (1 - \alpha)\, L_{soft}$$

where $L$ is the total training loss and $\alpha$ is a balance coefficient that weights the cross-entropy loss against the online label smoothing loss. A sketch of this training scheme is given below.
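The following sketch assembles the pieces above: a hard cross-entropy term, a soft term computed from the smoothing threshold $S^{t}$, accumulation of the prediction probabilities of correctly classified samples into $\hat{S}^{t+1}$, and per-epoch normalization. Storing $S$ as a K × K matrix whose row y holds the smoothed target distribution for class y, the uniform initialization, and the class and method names are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

class OnlineLabelSmoothing:
    """Sketch of the online label smoothing loss L = a*L_hard + (1-a)*L_soft."""
    def __init__(self, num_classes, alpha=0.5, device="cpu"):
        self.alpha = alpha                         # balance coefficient
        # S^t: row y is the smoothed target distribution for class y
        # (initialized uniformly; an assumption of this sketch).
        self.S = torch.full((num_classes, num_classes),
                            1.0 / num_classes, device=device)
        self.S_accum = torch.zeros_like(self.S)    # accumulator for S^{t+1}

    def loss(self, logits, target):
        log_p = F.log_softmax(logits, dim=1)
        l_hard = F.nll_loss(log_p, target)                    # cross entropy
        l_soft = -(self.S[target] * log_p).sum(dim=1).mean()  # smoothed term
        with torch.no_grad():
            # Accumulate predictions of correctly classified samples.
            p = log_p.exp()
            correct = p.argmax(dim=1) == target
            self.S_accum.index_add_(0, target[correct], p[correct])
        return self.alpha * l_hard + (1 - self.alpha) * l_soft

    def end_epoch(self):
        # Normalize the accumulator row-wise to obtain S^{t+1};
        # rows of classes never predicted correctly stay zero in this sketch.
        row_sum = self.S_accum.sum(dim=1, keepdim=True).clamp_min(1e-12)
        self.S = self.S_accum / row_sum
        self.S_accum = torch.zeros_like(self.S)
```

In use, `loss(...)` is called per batch and `end_epoch()` once per epoch, so the smoothing weight matrix is regenerated dynamically as training proceeds.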
According to the above technical scheme, the remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention extracts the features of ground objects at different scales with the multi-scale dilated convolution module, and adds a triplet attention module to the residual feature structure model to enhance the features of the remote sensing image; the triplet attention module working together with the multi-scale dilated convolution module ensures the accuracy of the extracted image features. In view of the complexity of remote sensing images, an online label smoothing loss constrains images of different categories during training, so that the retrieved images are more accurate.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
FIG. 1 is a flow chart of the remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention according to the invention;
FIG. 2 is an overall schematic diagram of the method;
FIG. 3 is a schematic diagram of the first residual structure and the second residual structure;
FIG. 4 is a schematic diagram of the third residual structure and the fourth residual structure;
FIG. 5 compares the visualized features of an airplane remote sensing image produced by the invention and by the conventional method;
FIG. 6 compares the visualized features of a port image produced by the invention and by the conventional method;
FIG. 7 compares the visualized features of a golf course image produced by the invention and by the conventional method;
FIG. 8 compares the visualized features of a parking lot image produced by the invention and by the conventional method;
FIG. 9 compares the visualized features of a reservoir image produced by the invention and by the conventional method;
FIG. 10 compares the visualized features of similar images produced by the invention and by the conventional method.
Reference numerals
1  remote sensing image
2  first convolution layer
3  first residual structure
4  second residual structure
5  third residual structure
6  fourth residual structure
7  fully connected layer
8  online label smoothing
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples illustrate and explain the invention but do not limit it.
As shown in fig. 1 to 4, an embodiment of the remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention provided by the present invention comprises the following steps:
A) constructing a reference network based on a residual structure;
B) replacing the convolution modules in the residual structure with multi-scale dilated convolution modules;
C) embedding a triplet attention module in the residual structure formed by the multi-scale dilated convolution modules, wherein the triplet attention module is embedded after the last convolution layer of each residual block of the residual structure;
D) constructing an online label smoothing 8 loss function, feeding the data of the remote sensing image 1 into the residual network for training, and dynamically generating a smoothing weight matrix during training;
E) extracting the feature vector of the remote sensing image 1;
F) matching the features of the remote sensing image 1 against the features of the database images and retrieving the most similar image.
As shown in fig. 2, on the residual feature structure based on the ResNet50 reference network, a reference network model that integrates the multi-scale dilated convolution module and the triplet attention module effectively improves the accuracy of remote sensing image retrieval. In the adopted reference network model, a captured remote sensing image 1 is fed into the first convolution layer 2 as input. After the first convolution layer 2, repeated convolutions form the first residual structure 3 and the second residual structure 4; the convolution modules in the first residual structure 3 and the second residual structure 4 are then replaced with multi-scale dilated convolution modules, so that features are extracted under different receptive fields with multi-scale dilated convolution. A parameter-free triplet attention module is embedded after the last convolution layer of each residual structure, forming the third residual structure 5 and the fourth residual structure 6; through cross-dimension interaction between space and channels, an attention weight matrix is learned adaptively so that the network focuses on the important features of the image. The features extracted through the residual structures are classified by the fully connected layer 7, and the network is trained end to end with the online label smoothing 8 loss function to reduce intra-class differences and enhance inter-class separability. Finally, verification on public remote sensing image 1 data sets shows that the retrieval accuracy of the remote sensing image 1 is effectively improved. A hedged sketch of this assembly is given below.
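The sketch below assembles such a network under the assumption that the reference network is torchvision's ResNet-50: the 3 × 3 convolution of every bottleneck is replaced by the MultiScaleDilatedConv module sketched earlier, and a TripletAttention module (sketched later in this description) is chained after the last convolution of each block. Modifying all four stages and hooking the attention onto the final BatchNorm are assumptions; the text does not fix these details.

```python
import torch.nn as nn
import torchvision

def build_reference_network(num_classes: int) -> nn.Module:
    model = torchvision.models.resnet50(weights=None)
    for layer in (model.layer1, model.layer2, model.layer3, model.layer4):
        for block in layer:                  # torchvision Bottleneck blocks
            old = block.conv2                # the block's 3x3 convolution
            block.conv2 = MultiScaleDilatedConv(old.in_channels,
                                                old.out_channels,
                                                stride=old.stride[0])
            # Embed triplet attention after the block's last convolution
            # (conv3) by chaining it onto the following BatchNorm, so it
            # runs before the residual addition.
            block.bn3 = nn.Sequential(block.bn3, TripletAttention())
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # classifier head
    return model
```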
Specifically, compared with natural images, the remote sensing image 1 has a more complex background, which tends to produce larger intra-class differences, while images of different classes often show high similarity. The trained depth features therefore have large intra-class distances and unclear inter-class boundaries, so inter-class separability and intra-class compactness must be increased during training to group similar images into more compact clusters. The dynamically generated smoothing weight matrix imposes a differentiated distance constraint on images of different classes, shrinking intra-class distances and enlarging inter-class differences. The specific formulas of the smoothing weight matrix are

$$L_{hard} = -\sum_{k=1}^{K} q(k \mid x_i)\,\log p(k \mid x_i)$$

$$L_{soft} = -\sum_{k=1}^{K} S^{t}_{y_i,k}\,\log p(k \mid x_i)$$

$$q(k = y_i \mid x_i) = 1, \qquad q(k \neq y_i \mid x_i) = 0$$

where $L_{hard}$ is the cross-entropy loss, $x_i$ denotes an input image, $y_i$ the true class of the input image, $k$ the predicted class, $K$ the total number of image classes, $p(k \mid x_i)$ the probability that input image $x_i$ is predicted as class $k$, $q$ the distribution of $y_i$, $L_{soft}$ the online label smoothing 8 loss, $t$ the number of training iterations, and $S^{t}$ the label smoothing threshold, which is adjusted continuously by iteration during training.
Specifically, the model loss and the normalized threshold used in the training method of the online label smoothing loss function are computed as follows: the model loss is calculated with the current smoothing threshold $S^{t}$; the threshold is then updated with the prediction probabilities of the reference network model,

$$\hat{S}^{t+1}_{y_i,k} = \hat{S}^{t+1}_{y_i,k} + p(k \mid x_i),$$

and $\hat{S}^{t+1}$ is normalized to obtain the smoothing threshold at training iteration $t+1$:

$$S^{t+1}_{y_i,k} = \frac{\hat{S}^{t+1}_{y_i,k}}{\sum_{k'=1}^{K} \hat{S}^{t+1}_{y_i,k'}}$$
Then the reference network model is trained jointly with the cross-entropy loss function and the online label smoothing 8 loss function, the total training loss being

$$L = \alpha L_{hard} + (1 - \alpha)\, L_{soft}$$

where $L$ is the total training loss and $\alpha$ is a balance coefficient that weights the cross-entropy loss against the online label smoothing 8 loss.
In an embodiment of the remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention, step B) specifically comprises embedding multi-scale dilated convolutions with dilation rates [1, 2, 5, 9] into the residual structure.
Specifically, dilated convolution obtains a larger receptive field without introducing additional parameters and can simultaneously capture multi-scale context information; it has been applied to image segmentation and target detection. To capture the features of the remote sensing image 1 at different scales, a multi-scale dilated convolution module is designed in the reference network model, realizing feature extraction of remote sensing image 1 information at different scales.
In particular, a larger range of feature information is captured without introducing extra parameters. The dilation rate of a dilated convolution defines the spacing between the positions at which the convolution kernel samples the data. For a convolution kernel of size k × k and dilation rate r, the effective kernel size k_d × k_d is obtained from equation (1):

k_d = k + (k - 1) · (r - 1).    (1)
While dilated convolution enlarges the information receptive field, the convolved positions are spatially discontinuous, which makes distant information unrelated and, for a remote sensing image 1 with a complex background, can lose the information of small targets; the multi-scale dilated convolution module adopted here preserves the continuity of image information. The dilation rates of stacked dilated convolutions should not share a common divisor greater than 1, and their distribution follows a sawtooth (zigzag) heuristic structure. For example, for a kernel with k = 3, an ascending group of dilation rates [1, 2, 5, 9] is set to adaptively extract ground-object information of different sizes: convolutions with smaller dilation rates capture short-range ground-object information, while convolutions with larger dilation rates capture long-range information, so information is gathered from a wider area without destroying the continuity of the convolved region. The effective kernel sizes of this group are checked below.
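A short check of equation (1) for the ascending dilation-rate group used here:

```python
# Effective kernel size k_d = k + (k - 1) * (r - 1), equation (1),
# for the 3 x 3 kernels at dilation rates [1, 2, 5, 9].
def effective_kernel_size(k: int, r: int) -> int:
    return k + (k - 1) * (r - 1)

for r in (1, 2, 5, 9):
    # r=1 -> 3, r=2 -> 5, r=5 -> 11, r=9 -> 19
    print(r, effective_kernel_size(3, r))
```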
In an embodiment of the remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention, the interaction steps of the triplet attention module are as follows:
C1) given an input feature map X ∈ R^{H×W×C}, i.e., of size H × W × C;
C2) computing the information of the three branches of the triplet attention module separately;
C3) aggregating the features extracted by each branch through averaging to produce the output.
Specifically, the visual attention mechanism rapidly scans the global image to obtain the target areas that deserve attention, devotes more attention resources to those areas to obtain more detailed information about the targets of interest, and suppresses other useless information. When applied to the remote sensing image 1, which contains a large amount of background information that strongly affects depth feature discrimination, a nearly parameter-free triplet attention module is embedded into the residual feature structure model: two of its branches capture the cross-dimension interactions between the channel dimension and the spatial dimensions, while the third performs the spatial attention weight calculation, so channel attention and spatial attention are modeled separately. The first branch is the spatial attention calculation branch: the input features undergo channel pooling and a 7 × 7 convolution, after which a Sigmoid activation function generates the spatial attention weights. The second branch captures the interaction between channel C and spatial dimension W: the input feature X is first transposed into an H × C × W feature, pooled along the H dimension, and finally transformed back into a C × H × W feature through a 7 × 7 convolution and a Sigmoid activation function. The third branch captures the interaction between channel C and spatial dimension H: the input feature X is first transposed into a W × H × C feature, pooled along the W dimension, and finally transformed back into a C × H × W feature through convolution and a Sigmoid activation function. Finally, the information extracted by the branches is aggregated by averaging to produce the output. A minimal sketch of this module follows.
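The following PyTorch sketch reflects this three-branch structure: channel pooling stacks the max and mean maps, a 7 × 7 convolution and a Sigmoid produce the attention gate, and the two rotated branches swap H (resp. W) into the channel position before gating. The BatchNorm inside the gate is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Channel pooling (max + mean), 7x7 conv, Sigmoid: one attention branch."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(1)   # an assumption of this sketch

    def forward(self, x):
        # Pool over dim 1 (whichever dimension currently sits in the
        # channel position) and stack the max and mean maps.
        pooled = torch.cat([x.max(dim=1, keepdim=True).values,
                            x.mean(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.bn(self.conv(pooled)))

class TripletAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate_hw = AttentionGate()   # spatial attention branch
        self.gate_cw = AttentionGate()   # C-W interaction (H pooled)
        self.gate_ch = AttentionGate()   # C-H interaction (W pooled)

    def forward(self, x):                # x: (B, C, H, W)
        out_hw = self.gate_hw(x)
        # Transpose to (B, H, C, W), gate, transpose back.
        out_cw = self.gate_cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        # Transpose to (B, W, H, C), gate, transpose back.
        out_ch = self.gate_ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        return (out_hw + out_cw + out_ch) / 3.0
```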
To verify the accuracy of the retrieval method, experiments were run on an Ubuntu 20 system equipped with an Intel 3.7 GHz i9-10900K processor and an NVIDIA GeForce GTX3090 graphics card. In the training phase, training runs for 40 epochs with the Adam optimizer, an initial learning rate of 3e-4 and a weight decay of 3e-4. In all experiments, the input images were resized to 224 × 224 pixels. For comparison, four public remote sensing image 1 data sets were used as verification data sets:
1) UCMD: the UCMD dataset contains 2100 remote sensing images 1 from the US Geological Survey (USGS), covering 21 categories (airplanes, buildings, rivers, etc.), each category containing 100 images of 256 × 256 pixels.
2) NWPU: the NWPU dataset contains 45 classes of images, each class containing 700 images, for a total of 31500 images of 256 × 256 pixels.
3) PatternNet: the PatternNet dataset consists of 38 classes, each class containing 800 images of 256 × 256 pixels collected from Google Earth. The ground resolution of the images is 0.6-4.7 meters.
4) VArcGIS: the VArcGIS large-scale remote sensing dataset consists of 38 classes of images collected from ArcGIS World Imagery, each class containing 1504- images.
For each reference dataset, each class of images was randomly split into a training set and a test set at an 8:2 ratio; the training set was further split, with 80% of the images used for training and the remaining 20% for validation. During testing, the fully connected layer 7 is removed from the model and its input serves as the image feature, with the Euclidean distance used to measure feature similarity: the closer the visual features of the query image are to those of other images, the more similar those images are. For comparative evaluation, results are reported with three standard retrieval metrics: Average Normalized Modified Retrieval Rank (ANMRR), mean average precision (mAP) and precision at k (P@k), with k set to 5, 10, 20, 50, 100 and 1000; lower ANMRR and higher mAP and P@k values indicate better retrieval precision. A sketch of this matching and evaluation step is given below.
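The sketch below illustrates this step: database descriptors are ranked by Euclidean distance to the query descriptor, and P@k is computed against the class labels. Function and variable names are illustrative.

```python
import numpy as np

def rank_by_euclidean(query_feat: np.ndarray, db_feats: np.ndarray) -> np.ndarray:
    """Return database indices sorted from nearest to farthest.
    query_feat: (D,) descriptor; db_feats: (N, D) database descriptors."""
    dists = np.linalg.norm(db_feats - query_feat[None, :], axis=1)
    return np.argsort(dists)

def precision_at_k(ranked_ids, query_label, db_labels, k):
    """Fraction of the top-k retrieved images sharing the query's class."""
    top_k = ranked_ids[:k]
    return float(np.mean(db_labels[top_k] == query_label))

# Usage with the k values from the experiments:
# order = rank_by_euclidean(q_feat, feats)
# for k in (5, 10, 20, 50, 100, 1000):
#     print(k, precision_at_k(order, q_label, labels, k))
```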
Experiments on the four data sets produce the results shown in Tables 1 and 2.
Table 1: Retrieval accuracy on the four reference datasets (table available only as an image in the original document)
Table 2: Retrieval accuracy of different methods on the UCMD dataset (table available only as an image in the original document)
In Table 1, larger mAP and P@k values and smaller ANMRR values are better. As Table 1 shows, the average retrieval precision on the PatternNet and VArcGIS data sets, whose targets are clear, improves by 6.17% and 9.67% respectively, and on the UCMD and NWPU data sets, whose backgrounds are complex, by 24.46% and 33.84% respectively. Compared with other algorithms, the method of the present invention obtains the smallest ANMRR value and the largest mAP value on the complex-background UCMD dataset, i.e., the highest retrieval precision. These comparisons clearly show that images with complex backgrounds place higher demands on feature extraction capability; by extracting multi-scale features and key-region features from the remote sensing image 1, the reference network model achieves a large performance improvement on data sets with rich scenes and complex backgrounds.
In addition, to test the effectiveness of the multi-scale feature extraction module and the attention module, the Grad-CAM++ tool is used to visually compare the feature heat maps output by the models and thus their image characterization capabilities, as shown in FIGS. 5 to 10; the redder the color, the more sensitive the model is to the pixel values at that position, i.e., the higher the attention. Comparing the reference method with the remote sensing image 1 detection method adopted by the present invention, the heat maps of the reference method are generally mislocated. In fig. 5, for example, fig. 5(a) is the captured remote sensing image 1; in fig. 5(b), produced by the conventional reference method, the spatial position of the feature heat map deviates, its focus falling on the blank area to the lower right of the airplane, whereas in fig. 5(c), produced by the present method, the feature heat map sits exactly on the airplane without deviation. In fig. 6, fig. 6(a) is the base remote sensing image 1; in fig. 6(b) the reference method locates the estuaries with the feature heat map clearly offset, lying between the two estuaries, while in fig. 6(c) the present method locates the feature heat map accurately on both estuaries without any deviation. In figs. 7 to 10, the spatial localization of the feature heat maps by the reference method in figs. 7(b) to 10(b) deviates to varying degrees; this is particularly obvious in fig. 10, where the reference method in fig. 10(b) places the feature heat map wrongly, covering an irrelevant area, while in fig. 10(c) the present method captures the target object features accurately. These comparisons show that the reference model is weak at capturing the salient features of an image. By contrast, the remote sensing image 1 retrieval method of the invention captures the target object accurately: the resulting feature heat map covers the target object, with a more reasonable coverage position and higher fineness. In the parking lot image of fig. 8, for example, the feature heat map generated by this method covers an accurate range, its focus falling on the ground-object targets at a finer level of detail. These heat-map comparisons demonstrate that the remote sensing image 1 retrieval method has a stronger image feature extraction capability, better captures the multi-scale and discriminative features of the remote sensing image 1, and effectively improves retrieval precision.
In the description of the present invention, reference to "one embodiment", "some embodiments", "an implementation", etc., means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such schematic expressions do not necessarily refer to the same embodiment or example, and the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention have been described in detail above with reference to the accompanying drawings, but the invention is not limited thereto. Within the scope of the technical idea of the invention, numerous simple modifications can be made to the technical solution, including combining the individual specific technical features in any suitable way; to avoid unnecessary repetition, these possible combinations are not described separately. Such simple modifications and combinations should likewise be considered within the scope of the invention.

Claims (10)

1. A remote sensing image retrieval method fusing multi-scale dilated convolution and triplet attention, characterized by comprising the following steps:
A) constructing a reference network based on a residual structure;
B) replacing the convolution modules in the residual structure with multi-scale dilated convolution modules;
C) embedding a triplet attention module in the residual structure formed by the multi-scale dilated convolution modules, wherein the triplet attention module is embedded after the last convolution layer of each residual block of the residual structure;
D) constructing an online label smoothing loss function, feeding remote sensing image data into the residual structure for training, and dynamically generating a smoothing weight matrix during training;
E) extracting the feature vector of the remote sensing image; and
F) matching the features of the remote sensing image against the features of the database images and retrieving the most similar image.
2. The method according to claim 1, wherein in step B) the convolution modules in the residual structure are replaced with multi-scale dilated convolution modules as follows:
B1) setting each 3 × 3 convolution module in the residual structure as a dilated convolution module; and
B2) setting the dilation rates of the dilated convolution modules to [1, 2, 5, 9] respectively, forming the multi-scale dilated convolution module.
3. The method of claim 1, wherein in step C) the triplet attention module models channel attention and spatial attention separately through cross-dimension interaction between the channel dimension and the spatial dimensions.
4. The method of claim 3, wherein the interaction steps of the triplet attention module are as follows:
C1) given an input feature map X ∈ R^{H×W×C}, i.e., of size H × W × C;
C2) computing the information of the three branches of the triplet attention module separately; and
C3) aggregating the features extracted by each branch through averaging to produce the output.
5. The method of claim 3, wherein the first branch of the triplet attention module is the spatial attention calculation branch, in which the input features undergo channel pooling and dilated convolution, after which a Sigmoid activation function generates the spatial attention weights.
6. The method of claim 3, wherein the second branch of the triplet attention module captures the interaction between the channel dimension C and the spatial dimension W: the input feature X is first transposed into an H × C × W feature, pooled along the H dimension, and finally transformed back into a C × H × W feature through convolution and a Sigmoid activation function.
7. The method of claim 3, wherein the third branch of the triplet attention module captures the interaction between the channel dimension C and the spatial dimension H: the input feature X is first transposed into a W × H × C feature, pooled along the W dimension, and finally transformed back into a C × H × W feature through convolution and a Sigmoid activation function.
8. The method according to claim 1, wherein in step D) the smoothing weight matrix imposes a differentiated distance constraint on images of different categories, the specific formulas of the smoothing weight matrix being

$$L_{hard} = -\sum_{k=1}^{K} q(k \mid x_i)\,\log p(k \mid x_i)$$

$$L_{soft} = -\sum_{k=1}^{K} S^{t}_{y_i,k}\,\log p(k \mid x_i)$$

$$q(k = y_i \mid x_i) = 1, \qquad q(k \neq y_i \mid x_i) = 0$$

where $L_{hard}$ is the cross-entropy loss, $x_i$ denotes an input image, $y_i$ the true category of the input image, $k$ the predicted category, $K$ the total number of image categories, $p(k \mid x_i)$ the probability that input image $x_i$ is predicted as class $k$, $q$ the distribution of $y_i$, $L_{soft}$ the online label smoothing loss, $t$ the number of training iterations, and $S^{t}$ the label smoothing threshold, which is adjusted continuously by iteration during training.
9. The method according to claim 8, wherein in step D) the model loss and the normalized threshold used in the training method of the online label smoothing loss function are computed as follows: after calculating the model loss, the threshold is updated with the prediction probabilities of the reference network model,

$$\hat{S}^{t+1}_{y_i,k} = \hat{S}^{t+1}_{y_i,k} + p(k \mid x_i),$$

and $\hat{S}^{t+1}$ is normalized to obtain the smoothing threshold at training iteration $t+1$:

$$S^{t+1}_{y_i,k} = \frac{\hat{S}^{t+1}_{y_i,k}}{\sum_{k'=1}^{K} \hat{S}^{t+1}_{y_i,k'}}$$
10. The method of claim 8, wherein the reference network model is trained jointly with a cross-entropy loss function and the online label smoothing loss function, the total training loss being

$$L = \alpha L_{hard} + (1 - \alpha)\, L_{soft}$$

where $L$ is the total training loss and $\alpha$ is a balance coefficient that weights the cross-entropy loss against the online label smoothing loss.
CN202111480268.6A 2021-12-06 2021-12-06 Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention Active CN114511452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111480268.6A CN114511452B (en) 2021-12-06 2021-12-06 Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111480268.6A CN114511452B (en) 2021-12-06 2021-12-06 Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention

Publications (2)

Publication Number Publication Date
CN114511452A true CN114511452A (en) 2022-05-17
CN114511452B CN114511452B (en) 2024-03-19

Family

ID=81548234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111480268.6A Active CN114511452B (en) Remote sensing image retrieval method integrating multi-scale dilated convolution and triplet attention

Country Status (1)

Country Link
CN (1) CN114511452B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309927A (en) * 2022-10-09 2022-11-08 中国海洋大学 Multi-label guiding and multi-view measuring ocean remote sensing image retrieval method and system
CN115618098A (en) * 2022-09-08 2023-01-17 淮阴工学院 Cold-chain logistics recommendation method and device based on knowledge enhancement and hole convolution
CN117073848A (en) * 2023-10-13 2023-11-17 中国移动紫金(江苏)创新研究院有限公司 Temperature measurement method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110578A (en) * 2019-02-21 2019-08-09 北京工业大学 A kind of indoor scene semanteme marking method
WO2019210737A1 (en) * 2018-05-04 2019-11-07 上海商汤智能科技有限公司 Object prediction method and apparatus, electronic device and storage medium
CN110705457A (en) * 2019-09-29 2020-01-17 核工业北京地质研究院 Remote sensing image building change detection method
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
WO2020215984A1 (en) * 2019-04-22 2020-10-29 腾讯科技(深圳)有限公司 Medical image detection method based on deep learning, and related device
CN112101190A (en) * 2020-09-11 2020-12-18 西安电子科技大学 Remote sensing image classification method, storage medium and computing device
CN112183414A (en) * 2020-09-29 2021-01-05 南京信息工程大学 Weak supervision remote sensing target detection method based on mixed hole convolution
CN112669323A (en) * 2020-12-29 2021-04-16 深圳云天励飞技术股份有限公司 Image processing method and related equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019210737A1 (en) * 2018-05-04 2019-11-07 上海商汤智能科技有限公司 Object prediction method and apparatus, electronic device and storage medium
CN110110578A (en) * 2019-02-21 2019-08-09 北京工业大学 A kind of indoor scene semanteme marking method
WO2020215984A1 (en) * 2019-04-22 2020-10-29 腾讯科技(深圳)有限公司 Medical image detection method based on deep learning, and related device
CN110705457A (en) * 2019-09-29 2020-01-17 核工业北京地质研究院 Remote sensing image building change detection method
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN112101190A (en) * 2020-09-11 2020-12-18 西安电子科技大学 Remote sensing image classification method, storage medium and computing device
CN112183414A (en) * 2020-09-29 2021-01-05 南京信息工程大学 Weak supervision remote sensing target detection method based on mixed hole convolution
CN112669323A (en) * 2020-12-29 2021-04-16 深圳云天励飞技术股份有限公司 Image processing method and related equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONGYANG HOU et al.: "An Attention-Enhanced End-to-End Discriminative Network With Multiscale Feature Learning for Remote Sensing Image Retrieval", IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pages 8245 *
SI-BAO CHEN et al.: "Remote Sensing Scene Classification via Multi-Branch Local Attention Network", IEEE Transactions on Image Processing, vol. 31, pages 99-109, XP011890281, DOI: 10.1109/TIP.2021.3127851 *
PENG JINCHAO et al.: "MS-VSCN: a multi-scale visual similarity comparison network for image matching", Journal of Geomatics Science and Technology, vol. 38, no. 1, pages 56-63 *
XU SHENGJUN et al.: "Building segmentation of remote sensing images using multi-scale feature fusion dilated convolution ResNet", Optics and Precision Engineering, no. 07, pages 179-190 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618098A (en) * 2022-09-08 2023-01-17 淮阴工学院 Cold-chain logistics recommendation method and device based on knowledge enhancement and hole convolution
CN115309927A (en) * 2022-10-09 2022-11-08 中国海洋大学 Multi-label guiding and multi-view measuring ocean remote sensing image retrieval method and system
CN115309927B (en) * 2022-10-09 2023-02-03 中国海洋大学 Multi-label guiding and multi-view measuring ocean remote sensing image retrieval method and system
CN117073848A (en) * 2023-10-13 2023-11-17 中国移动紫金(江苏)创新研究院有限公司 Temperature measurement method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114511452B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
CN109948425B (en) Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching
Bai et al. Learning-based efficient graph similarity computation via multi-scale convolutional set matching
CN114511452B (en) Remote sensing image retrieval method integrating multi-scale cavity convolution and triplet attention
CN109670528B (en) Data expansion method facing pedestrian re-identification task and based on paired sample random occlusion strategy
CN110929607B (en) Remote sensing identification method and system for urban building construction progress
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
CN110334578B (en) Weak supervision method for automatically extracting high-resolution remote sensing image buildings through image level annotation
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
CN114758362B (en) Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN106021603A (en) Garment image retrieval method based on segmentation and feature matching
Pang et al. Deep feature aggregation and image re-ranking with heat diffusion for image retrieval
CN104850822B (en) Leaf identification method under simple background based on multi-feature fusion
CN112949740B (en) Small sample image classification method based on multilevel measurement
CN112699834B (en) Traffic identification detection method, device, computer equipment and storage medium
Tzeng et al. User-driven geolocation of untagged desert imagery using digital elevation models
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
CN112132014A (en) Target re-identification method and system based on non-supervised pyramid similarity learning
Lei et al. Boundary extraction constrained siamese network for remote sensing image change detection
CN105654122A (en) Spatial pyramid object identification method based on kernel function matching
CN110348287A (en) A kind of unsupervised feature selection approach and device based on dictionary and sample similar diagram
Shao et al. Land use classification using high-resolution remote sensing images based on structural topic model
Zhang et al. Semisupervised center loss for remote sensing image scene classification
CN105447869A (en) Particle swarm optimization algorithm based camera self-calibration method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant