CN115115938A - Method for detecting salient target of remote sensing image - Google Patents

Method for detecting salient target of remote sensing image

Info

Publication number
CN115115938A
CN115115938A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
features
salient
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210879580.0A
Other languages
Chinese (zh)
Inventor
夏鲁瑞
蔺崎辉
李森
陈雪旗
卢妍
张占月
王鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University filed Critical Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN202210879580.0A
Publication of CN115115938A
Legal status: Pending (Current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting salient targets in remote sensing images, which comprises the following steps: S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder; S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data, and stopping training when a preset loss function converges, thereby obtaining the trained remote sensing image salient target detection model; and S3, performing salient target prediction on the test-set remote sensing image data with the trained remote sensing image salient target detection model, and outputting the corresponding saliency maps. Because the method decodes features through a cascade structure, missed and false detections of small targets in remote sensing images are alleviated, the prediction confidence of salient regions is improved, and more accurate salient target boundaries can be predicted.

Description

Method for detecting salient target of remote sensing image
Technical Field
The invention mainly relates to the technical field of remote sensing image application, in particular to a method for detecting a salient target of a remote sensing image.
Background
With the explosive growth of remote sensing image data, the traditional utilization of remote sensing images through manual visual interpretation can no longer meet practical needs, so intelligent interpretation methods for remote sensing images are urgently required. As an important preprocessing step in computer vision, salient object detection has achieved good results in natural scenes. However, because remote sensing scenes involve varied shooting angles, diverse ground feature types and complex backgrounds, few salient target detection methods exist for remote sensing images. Moreover, in detecting salient targets in remote sensing images, existing methods perform poorly on the edge regions of salient targets and are prone to false and missed detections of small targets, leaving a considerable gap before practical application.
Disclosure of Invention
In view of the above, the present invention provides a method for detecting salient targets in remote sensing images. The method adopts an encoder-decoder structure, introduces an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, and designs a new loss function to train the target detection model. Detecting salient targets in remote sensing images with the trained model effectively improves the detection of salient target edges and reduces missed and false detections of small targets.
The invention discloses a method for detecting a salient target of a remote sensing image, which comprises the following steps:
S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder;
S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data, and stopping training when the preset loss function converges, thereby obtaining the trained remote sensing image salient target detection model;
and S3, performing salient target prediction on the test-set remote sensing image data with the trained remote sensing image salient target detection model, and outputting the corresponding saliency map.
Further, the detection feature encoder is a dense attention flow encoder, obtained by improving a VGG16 network used as the backbone network. The improvement process is as follows: the last three fully connected layers of the VGG16 network are removed, and the network is truncated before its last pooling layer, resulting in the dense attention flow encoder.
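As a minimal sketch of this encoder construction (assuming the torchvision VGG16 implementation, whose features module already excludes the three fully connected layers; the stage indices below follow torchvision's layer ordering):

```python
import torch.nn as nn
from torchvision.models import vgg16

def build_dense_attention_flow_backbone():
    # torchvision's vgg16().features contains only the convolutional part,
    # i.e. the three fully connected layers are already removed.
    features = vgg16().features
    # Truncate before the last pooling layer (index 30 is the final MaxPool2d).
    return nn.Sequential(*list(features.children())[:-1])

def extract_stage_features(backbone, x):
    # Collect the output of the last layer of each of the five conv stages;
    # indices 3, 8, 15, 22, 29 are the last ReLU of each stage in torchvision.
    stage_ends, feats = {3, 8, 15, 22, 29}, []
    for i, layer in enumerate(backbone):
        x = layer(x)
        if i in stage_ends:
            feats.append(x)
    return feats  # five feature maps, the last two at the same resolution
```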
Further, the specific implementation manner of step S2 includes:
S21, introducing an attention mechanism: extracting the output features of the last layer of each stage of the improved VGG16 network, merging the output feature dimensions, and constructing an inter-pixel operation matrix based on a preset spatial pixel relation matrix, thereby representing the relations among pixels;
S22, performing normalization on the inter-pixel operation matrix to obtain attention weights, and multiplying the dimension-merged output features by the attention weights to obtain the spatially self-attention-weighted features;
S23, adding the output features and the spatially self-attention-weighted features through a residual connection, and passing the result through a channel attention mechanism to obtain the output deep features; the process is formulated as:
F = CA(f + δ·(f * Re^(-1)(Re(f) ⊙ R)))
where Re^(-1) denotes the inverse of the dimension-merging operation, R denotes the pixel relation matrix, * denotes element-wise multiplication, δ denotes a learnable coefficient, CA(·) denotes the channel attention mechanism, and f denotes the initial features output by the backbone network;
S24, upsampling the deep features and applying a 1×1 convolution so that their size and channel number match those of the current features;
S25, based on a preset step-by-step splicing module, splicing the upsampled and 1×1-convolved deep features with the current features, starting from the layer next to the current features and proceeding in order from shallow to deep;
S26, adjusting the channel number of the spliced features to that of the deep features output by the detection feature encoder, and feeding the result into the cascade feature decoder for decoding;
and S27, activating the final output of the cascade feature decoder by using a Sigmoid function, and further completing the training of the remote sensing image salient target detection model.
Further, the preset spatial pixel relation matrix in step S21 is formulated as:
M = {(Re(f))^T ⊙ Re(f)}^T
where Re(·) denotes the operation of merging the last two dimensions of the output features into one dimension, ⊙ denotes matrix multiplication, and T denotes transposition.
Further, the normalization of the inter-pixel operation matrix in step S22 is formulated as:
r(x, y) = e^(m(x,y)) / Σ_x e^(m(x,y))
where r(x, y) denotes the importance of the influence of pixel x on pixel y, m(x, y) denotes an element of the pixel relation matrix, and e denotes the natural constant.
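Taken together, steps S21 to S23 can be sketched in PyTorch as follows. The squeeze-and-excitation form used for the channel attention CA(·) and the axis of the softmax normalization are assumptions not fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextAttention(nn.Module):
    """Sketch of S21-S23: spatial self-attention built from the pixel relation
    matrix, a residual connection, then channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(1))        # learnable coefficient delta
        self.ca = nn.Sequential(                         # assumed SE-style CA(.)
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, f):                                # f: (B, C, H, W)
        b, c, h, w = f.shape
        flat = f.view(b, c, h * w)                       # Re(f): merge last two dims
        m = torch.bmm(flat.transpose(1, 2), flat)        # (Re(f))^T (.) Re(f): (B, HW, HW)
        r = F.softmax(m, dim=1)                          # r(x, y): normalize over x
        weighted = torch.bmm(flat, r).view(b, c, h, w)   # Re(f) (.) R, then Re^(-1)
        out = f + self.delta * (f * weighted)            # residual: f + delta*(f * ...)
        scale = self.ca(out.mean(dim=(2, 3))).view(b, c, 1, 1)
        return out * scale                               # F = CA(.)
```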
Further, the method includes a step S23' in which information is extracted from the output features using multi-level pyramid fusion of multi-scale spatial attention, specifically: the output features are downsampled by factors of 2 and 4 to obtain three channels of different resolutions; the features at each scale are refined with the multi-scale spatial attention fused by the multi-level pyramid; the refined features are fused with the output features through a residual structure; the three levels of features are then fused in order of resolution from low to high, yielding deep features carrying the multi-level-pyramid-fused multi-scale spatial attention weights; finally, these deep features are merged with the deep features output by step S23.
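A minimal sketch of this pyramid attention under stated assumptions: the per-scale spatial attention is taken to be a 7×7 convolution producing a sigmoid gate, and average pooling is used for the 2× and 4× downsampling, neither of which is fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidSpatialAttention(nn.Module):
    """Sketch of step S23': three resolutions (1x, 1/2, 1/4), per-scale
    refinement with a residual structure, fused from low to high resolution."""
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3) for _ in range(3))

    def forward(self, f):
        scales = [f, F.avg_pool2d(f, 2), F.avg_pool2d(f, 4)]   # 2x and 4x downsampling
        refined = [s + s * torch.sigmoid(g(s))                 # refine + residual per scale
                   for s, g in zip(scales, self.gates)]
        out = refined[2]                                       # start at lowest resolution
        for r in (refined[1], refined[0]):                     # fuse low -> high
            out = r + F.interpolate(out, size=r.shape[2:],
                                    mode="bilinear", align_corners=False)
        return out
```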
Further, in step S25 the upsampled and 1×1-convolved deep features are spliced with the current features, formulated as:
F_k = Conc(Conv(Up(F_5)), ..., Conv(Up(F_(k-1))), F_k)
where Up(·) denotes the upsampling that aligns the deep features with the current features, F_k denotes the k-th level features fed to the cascade decoder, F_5 denotes the 5th-level features fed to the cascade decoder, Conc denotes concatenation, and Conv denotes a convolutional layer.
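A sketch of this splicing step; reading the spliced deeper levels as k+1 through 5 (the deepest level down to the level just below F_k) is our interpretation of the index convention, and the 1×1 channel-adjusting convolutions are supplied by the caller:

```python
import torch
import torch.nn.functional as F

def progressive_splice(feats, k, convs1x1):
    """Sketch of S25: concatenate upsampled, 1x1-convolved deeper features
    onto the current level F_k. feats = [F1..F5] with F5 the deepest
    (1-based levels); convs1x1[j-1] adjusts the channels of level j."""
    current = feats[k - 1]
    pieces = []
    for j in range(5, k, -1):                             # F5, F4, ..., F_{k+1}
        up = F.interpolate(feats[j - 1], size=current.shape[2:],
                           mode="bilinear", align_corners=False)   # Up(.)
        pieces.append(convs1x1[j - 1](up))                # Conv(.): 1x1 convolution
    pieces.append(current)
    return torch.cat(pieces, dim=1)                       # Conc(.)
```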
Further, the preset loss function is a combined loss function with different weight coefficients, formulated as:
L = ω_1·L_P + ω_2·L_R + ω_3·L_MAE + ω_4·L_S
where L_P, L_R, L_MAE and L_S denote the precision loss term, the recall loss term, the mean absolute error loss term and the structural similarity loss term respectively, and ω_1, ω_2, ω_3 and ω_4 denote the weight coefficients of L_P, L_R, L_MAE and L_S respectively, where:
L_P = 1 - (1/N)·Σ_(n=1..N) (Σ_(i,j) S(i,j)·G(i,j) + ε) / (Σ_(i,j) S(i,j) + ε)
L_R = 1 - (1/N)·Σ_(n=1..N) (Σ_(i,j) S(i,j)·G(i,j) + ε) / (Σ_(i,j) G(i,j) + ε)
L_MAE = (1/(W×H))·Σ_(i=1..W) Σ_(j=1..H) |S(i,j) - G(i,j)|
L_S = 1 - S_measure
S_measure = α×S_o + (1-α)×S_r
where N is the total number of samples, n denotes the sample index, j denotes the pixel index along the image height, i denotes the pixel index along the image width, ε denotes a preset constant, W and H are the width and height of the remote sensing image respectively, S(i,j) ∈ S denotes the predicted value of each pixel, G(i,j) ∈ G denotes the true value of each pixel, S denotes the saliency prediction result, G denotes the ground-truth label, S_r is the region-oriented similarity measure, S_o is the object-structure-oriented similarity measure, and α denotes a hyper-parameter that balances the region-oriented and the object-structure-oriented similarity measures.
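A hedged PyTorch sketch of this combined loss follows; the precision and recall terms use the soft forms given above, and S_measure is supplied by the caller since its full object/region decomposition is lengthy:

```python
import torch

def combined_loss(s, g, s_measure_fn, weights=(1.0, 1.0, 1.0, 1.0), eps=1e-6):
    """Sketch of L = w1*L_P + w2*L_R + w3*L_MAE + w4*L_S.

    s, g: (N, 1, H, W) prediction and ground truth in [0, 1];
    s_measure_fn computes S_measure = alpha*S_o + (1 - alpha)*S_r.
    """
    tp = (s * g).sum(dim=(1, 2, 3))                                # soft true positives
    l_p = (1 - (tp + eps) / (s.sum(dim=(1, 2, 3)) + eps)).mean()   # precision loss term
    l_r = (1 - (tp + eps) / (g.sum(dim=(1, 2, 3)) + eps)).mean()   # recall loss term
    l_mae = (s - g).abs().mean()                                   # mean absolute error term
    l_s = 1 - s_measure_fn(s, g)                                   # structural similarity term
    w1, w2, w3, w4 = weights
    return w1 * l_p + w2 * l_r + w3 * l_mae + w4 * l_s
```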
Further, the method includes a step S4 of comparing the output saliency map with the ground-truth map, thereby evaluating the quality of the saliency maps generated by the remote sensing image salient target model.
Further, the specific implementation of step S4 is: evaluating the saliency maps generated by the remote sensing image salient target model with the preset indices, namely the PR curve, the F value, the mean absolute error and the S value.
Compared with the prior art, the method for detecting salient targets in remote sensing images has the following advantages:
(1) The method decodes features with a cascade structure, so that more high-level semantic features guide the feature decoding process, effectively alleviating missed and false detections of small targets in remote sensing images.
(2) The invention designs a new loss function for training the remote sensing image salient target detection model, which improves the prediction confidence of salient regions and enables the model to predict more accurate salient target boundaries.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain, rather than limit, the invention. In the drawings:
FIG. 1 is a flow chart of a method for detecting a salient object in a remote sensing image according to the present invention;
FIG. 2 is a schematic structural diagram of a method for detecting a salient object in a remote sensing image according to the present invention;
FIG. 3 is a schematic diagram of the self-attention mechanism of the present invention;
FIG. 4 is a graph of P-R curve results for an embodiment of the present invention;
FIG. 5 shows the results of improved false and missed detection of small targets, where (a) is the remote sensing image, (b) is the ground-truth map of the salient target, (c) is the saliency map generated by the original method, and (d) is the saliency map generated by the present method;
FIG. 6 shows the results of improved prediction of salient target boundary regions, where (a) is the remote sensing image, (b) is the ground-truth map of the salient target, (c) is the saliency map generated by the original method, and (d) is the saliency map generated by the present method.
Detailed Description
It should be noted that the embodiments and the features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below through embodiments with reference to the attached drawings.
Referring to fig. 1 to 6, the method for detecting a salient target in a remote sensing image of the invention comprises the following steps:
S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder;
in this step, the detection feature encoder is a dense attention flow encoder, which is obtained by improving a VGG16 network as a backbone network, and the improvement process is as follows: removing the last three fully connected layers of the VGG16 network and pooling the layers at the last level of the VGG16 networkForward truncation, and thereby obtaining the dense attention flow encoder. Therefore, the characteristic dimension of the first four layers in the improved VGG16 network is
Figure BDA0003763728030000051
W and H are width and height of the remote sensing image respectively, k is a backbone network layer, the last layer is provided with a pooling layer removed, so that the characteristic dimension of the last layer is consistent with that of the fourth layer, and after each characteristic extraction layer is finished, current-level characteristics are selected and refined and then sent to the next level for extraction.
In this embodiment, EORSSD is used as a remote sensing image data set, 1400 remote sensing images are randomly selected from the remote sensing image data set as a training set, and 600 remote sensing images are used as a test set.
S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data, and stopping training when the preset loss function converges, thereby obtaining the trained remote sensing image salient target detection model;
in this step, it is specifically:
S21, introducing an attention mechanism: extracting the output features of the last layer of each stage of the improved VGG16 network, merging the output feature dimensions, and constructing an inter-pixel operation matrix based on a preset spatial pixel relation matrix, thereby representing the relations among pixels;
wherein the preset spatial pixel relation matrix is formulated as:
M = {(Re(f))^T ⊙ Re(f)}^T
where Re(·) denotes the operation of merging the last two dimensions of the output features into one dimension, ⊙ denotes matrix multiplication, and T denotes transposition;
S22, performing normalization on the inter-pixel operation matrix to obtain attention weights, multiplying the dimension-merged output features by the attention weights, and then restoring the original dimensions of the output features to obtain the spatially self-attention-weighted features, which carry global information;
The normalization of the inter-pixel operation matrix is formulated as:
r(x, y) = e^(m(x,y)) / Σ_x e^(m(x,y))
where r(x, y) denotes the importance of the influence of pixel x on pixel y, m(x, y) denotes an element of the pixel relation matrix, and e denotes the natural constant;
S23, adding the output features and the spatially self-attention-weighted features through a residual connection, and passing the result through a channel attention mechanism to obtain the output deep features; the process is formulated as:
F = CA(f + δ·(f * Re^(-1)(Re(f) ⊙ R)))
where Re^(-1) denotes the inverse of the dimension-merging operation, R denotes the pixel relation matrix, * denotes element-wise multiplication, δ denotes a learnable coefficient, CA(·) denotes the channel attention mechanism, and f denotes the initial features output by the backbone network;
in this embodiment, after the improved VGG16 network feature extraction and feature refinement are performed on the remote sensing image, five features with different scales are finally formed, and deeper layers of the five features with different scales include more semantic features, and shallower layers retain more detailed features.
S24, upsampling the deep features and applying a 1×1 convolution so that their size and channel number match those of the current features;
S25, based on the preset step-by-step splicing module, splicing the upsampled and 1×1-convolved deep features with the current features, starting from the layer next to the current features and proceeding in order from shallow to deep; the splicing process is formulated as:
F_k = Conc(Conv(Up(F_5)), ..., Conv(Up(F_(k-1))), F_k)
where Up(·) denotes the upsampling that aligns the deep features with the current features, F_k denotes the k-th level features fed to the cascade decoder, F_5 denotes the 5th-level features fed to the cascade decoder, Conc denotes concatenation, and Conv denotes a convolutional layer;
in this embodiment, in order to realize complete extraction of image features, a multi-level feature is fused by an attention fusion method of cascading from a shallow layer to a deep layer, for example, a GCA4 module, a GCA4 module receives and splices output features of a GCA1 module, a GCA2 module, and a GCA3 module, and adjusts the number of channels to 1 again, so as to form a final attention map, which is expressed by a formula:
A 4 =Conv(Conc(A1,A2,A3,A4))
the attention map is then multiplied by the refined features and residual concatenation is used to generate the deep features that are fed into the concatenated feature decoder.
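A sketch of this cascaded attention fusion; the sigmoid gate and the way the map is applied to the refined features are assumptions:

```python
import torch
import torch.nn as nn

class CascadedAttentionFusion(nn.Module):
    """Sketch of the GCA4-style fusion: attention maps from the shallower GCA
    modules are concatenated, squeezed back to one channel, and applied to
    the refined features with a residual connection."""
    def __init__(self, n_maps=4):
        super().__init__()
        self.conv = nn.Conv2d(n_maps, 1, kernel_size=1)    # adjust channels to 1

    def forward(self, attn_maps, refined):                 # attn_maps: [A1..A4], same size
        a4 = torch.sigmoid(self.conv(torch.cat(attn_maps, dim=1)))  # A4 = Conv(Conc(...))
        return refined + refined * a4                      # multiply, then residual
```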
S26, adjusting the channel number of the spliced features to that of the deep features output by the detection feature encoder, and feeding the result into the cascade feature decoder for decoding;
and S27, activating the final output of the cascade feature decoder by using a Sigmoid function, and further completing the training of the remote sensing image salient target detection model.
In this embodiment, the deepest features carry the richest semantics and can therefore guide every level of the decoder; moreover, any feature from a deeper layer carries more semantic information than the features from shallower layers. With the cascade feature decoder, not only the deepest global features but also the deep features obtained at each encoder level guide the shallower decoders, which facilitates the generation of the final saliency map. Each decoder unit receives the output of the decoder one level above together with the deep spliced features, and the output of the last decoder is activated with a Sigmoid function to obtain the final predicted saliency map.
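One cascade decoder stage might look like the following sketch; the two-convolution block and its layer sizes are assumptions, while the final Sigmoid activation follows step S27:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeDecoderUnit(nn.Module):
    """Sketch of one decoder stage: fuses the previous stage's output with
    the spliced deep features (steps S25-S26)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, prev_out, spliced):
        prev_out = F.interpolate(prev_out, size=spliced.shape[2:],
                                 mode="bilinear", align_corners=False)
        return self.block(torch.cat([prev_out, spliced], dim=1))

# The output of the last decoder is activated with a Sigmoid (step S27):
# saliency_map = torch.sigmoid(last_decoder_output)
```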
And S3, performing salient target prediction on the test-set remote sensing image data with the trained remote sensing image salient target detection model, and outputting the corresponding saliency map.
In this embodiment, salient target prediction is performed on the test-set remote sensing image data with the trained remote sensing image salient target detection model, yielding saliency maps with more accurate salient target boundaries.
Wherein the preset loss function is a combined loss function with different weight coefficients, formulated as:
L = ω_1·L_P + ω_2·L_R + ω_3·L_MAE + ω_4·L_S
where L_P, L_R, L_MAE and L_S denote the precision loss term, the recall loss term, the mean absolute error loss term and the structural similarity loss term respectively, and ω_1, ω_2, ω_3 and ω_4 denote the weight coefficients of L_P, L_R, L_MAE and L_S respectively, where:
L_P = 1 - (1/N)·Σ_(n=1..N) (Σ_(i,j) S(i,j)·G(i,j) + ε) / (Σ_(i,j) S(i,j) + ε)
L_R = 1 - (1/N)·Σ_(n=1..N) (Σ_(i,j) S(i,j)·G(i,j) + ε) / (Σ_(i,j) G(i,j) + ε)
L_MAE = (1/(W×H))·Σ_(i=1..W) Σ_(j=1..H) |S(i,j) - G(i,j)|
L_S = 1 - S_measure
S_measure = α×S_o + (1-α)×S_r
where N is the total number of samples, n denotes the sample index, j denotes the pixel index along the image height, i denotes the pixel index along the image width, ε denotes a preset constant, W and H are the width and height of the remote sensing image respectively, S(i,j) ∈ S denotes the predicted value of each pixel, G(i,j) ∈ G denotes the true value of each pixel, S denotes the saliency prediction result, G denotes the ground-truth label, S_r is the region-oriented similarity measure, S_o is the object-structure-oriented similarity measure, and α denotes a hyper-parameter that balances the region-oriented and the object-structure-oriented similarity measures.
In this embodiment, structural similarity compares the structural information between images, and such differences better match human visual perception; therefore, using the combined loss function with different weight coefficients as the preset loss function overcomes the weak capability of the cross-entropy loss function to detect edge regions in salient object detection.
In another embodiment, the method further includes a step S4 of comparing the output saliency map with the ground-truth map, thereby evaluating the quality of the saliency maps generated by the remote sensing image salient target model, specifically: evaluating the generated saliency maps with the preset indices, namely the PR curve, the F value, the mean absolute error and the S value.
In this embodiment, the saliency map generated by the model is compared with the ground-truth map to quantitatively measure the quality of saliency map generation. Specifically, four indices are used for evaluation: the PR curve, the F value, the mean absolute error (MAE) and the S value.
Precision is the proportion of the predicted positive samples that are truly positive; Recall is the proportion of the ground-truth positive samples that are correctly predicted. By sweeping the threshold over (0, 1), all (Precision, Recall) pairs can be obtained, and connecting them in order yields the Precision-Recall (PR) curve; the closer the PR curve approaches the point (1, 1) of the coordinate axes, the better the performance of the model. FIG. 4 shows the PR curve of the method for detecting salient targets in remote sensing images in this embodiment.
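The PR curve points can be computed from a saliency map and a binary ground-truth mask as in this sketch:

```python
import numpy as np

def pr_curve_points(s, g, n_thresholds=256):
    """Sweep thresholds over (0, 1); s is a saliency map in [0, 1] and
    g a binary ground-truth mask of the same shape."""
    precisions, recalls = [], []
    for t in np.linspace(0.0, 1.0, n_thresholds, endpoint=False):
        pred = s >= t
        tp = np.logical_and(pred, g).sum()
        precisions.append(tp / max(pred.sum(), 1))   # guard against empty predictions
        recalls.append(tp / max(g.sum(), 1))
    return precisions, recalls
```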
Wherein the F value is defined as
F_β = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall)
where β² is set to 0.3 to emphasize the importance of Precision;
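In code (β² = 0.3 as above; the small epsilon guarding the division is our addition):

```python
def f_measure(precision, recall, beta2=0.3, eps=1e-8):
    # F = (1 + beta^2) * Precision * Recall / (beta^2 * Precision + Recall)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
```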
The mean absolute error (MAE) is an index measuring the absolute error between the saliency prediction map and the ground-truth map, formulated as:
MAE = (1/(W×H))·Σ_(i=1..W) Σ_(j=1..H) |S(i,j) - G(i,j)|
where S denotes the saliency prediction result and G denotes the ground-truth label.
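In NumPy this is a short computation (assuming s and g are arrays of the same W×H shape):

```python
import numpy as np

def mae(s, g):
    # Mean absolute error between the saliency prediction S and ground truth G.
    return np.abs(s.astype(float) - g.astype(float)).mean()
```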
The S_measure value is an index evaluating the generated saliency map in terms of structural similarity, formulated as:
S_measure = α×S_o + (1-α)×S_r
where S_r is the region-oriented similarity measure, S_o is the object-structure-oriented similarity measure, and α denotes a hyper-parameter.
In this embodiment, the method for detecting salient targets in remote sensing images is used to detect the salient targets of the test set, and the detection results are shown in Table 1, FIG. 5 and FIG. 6.
Table 1 shows the detection results for salient targets in the remote sensing images:

Evaluation index    F         MAE       S
Value               0.9031    0.0048    0.9189
As can be seen from FIG. 5 and FIG. 6, the method for detecting salient targets in remote sensing images accurately predicts both the salient targets and their boundary regions, and accurately predicts salient targets in small-target scenes, thereby reducing missed and false detections.
In another embodiment, the method further includes a step S23' in which information is extracted from the output features using multi-level pyramid fusion of multi-scale spatial attention, specifically: the output features are downsampled by factors of 2 and 4 to obtain three channels of different resolutions; the features at each scale are refined with the multi-scale spatial attention fused by the multi-level pyramid; the refined features are fused with the output features through a residual structure; the three levels of features are then fused in order of resolution from low to high, yielding deep features carrying the multi-level-pyramid-fused multi-scale spatial attention weights; finally, these deep features are merged with the deep features output by step S23.
In this embodiment, besides the attention among pixels, multi-scale attention over the whole image space can also extract useful information; specifically, after obtaining the feature output with self-attention, the GCA module also applies the multi-level pyramid fusion of multi-scale spatial attention.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for detecting a salient target of a remote sensing image is characterized by comprising the following steps:
S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder;
S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data, and stopping training when the preset loss function converges, thereby obtaining the trained remote sensing image salient target detection model;
and S3, performing salient target prediction on the test-set remote sensing image data with the trained remote sensing image salient target detection model, and outputting the corresponding saliency map.
2. The method for detecting the salient object in the remote sensing image according to claim 1, wherein the detection feature encoder is a dense attention flow encoder, obtained by improving a VGG16 network used as the backbone network, the improvement process being as follows: the last three fully connected layers of the VGG16 network are removed, and the network is truncated before its last pooling layer, resulting in the dense attention flow encoder.
3. The method for detecting the salient object in the remote sensing image according to claim 2, wherein the step S2 is implemented in a specific manner that includes:
S21, introducing an attention mechanism: extracting the output features of the last layer of each stage of the improved VGG16 network, merging the output feature dimensions, and constructing an inter-pixel operation matrix based on a preset spatial pixel relation matrix, thereby representing the relations among pixels;
S22, performing normalization on the inter-pixel operation matrix to obtain attention weights, and multiplying the dimension-merged output features by the attention weights to obtain the spatially self-attention-weighted features;
S23, adding the output features and the spatially self-attention-weighted features through a residual connection, and passing the result through a channel attention mechanism to obtain the output deep features; the process is formulated as:
F = CA(f + δ·(f * Re^(-1)(Re(f) ⊙ R)))
where Re^(-1) denotes the inverse of the dimension-merging operation, R denotes the pixel relation matrix, * denotes element-wise multiplication, δ denotes a learnable coefficient, CA(·) denotes the channel attention mechanism, and f denotes the initial features output by the backbone network;
S24, upsampling the deep features and applying a 1×1 convolution so that their size and channel number match those of the current features;
S25, based on a preset step-by-step splicing module, splicing the upsampled and 1×1-convolved deep features with the current features, starting from the layer next to the current features and proceeding in order from shallow to deep;
S26, adjusting the channel number of the spliced features to that of the deep features output by the detection feature encoder, and feeding the result into the cascade feature decoder for decoding;
and S27, activating the final output of the cascade feature decoder by using a Sigmoid function, and further completing the training of the remote sensing image salient target detection model.
4. The method for detecting the salient object in the remote sensing image according to claim 3, wherein the preset spatial pixel relation matrix in step S21 is formulated as:
M = {(Re(f))^T ⊙ Re(f)}^T
where Re(·) denotes the operation of merging the last two dimensions of the output features into one dimension, ⊙ denotes matrix multiplication, and T denotes transposition.
5. The method for detecting the salient object in the remote sensing image according to claim 4, wherein the normalization of the inter-pixel operation matrix in step S22 is formulated as:
r(x, y) = e^(m(x,y)) / Σ_x e^(m(x,y))
where r(x, y) denotes the importance of the influence of pixel x on pixel y, m(x, y) denotes an element of the pixel relation matrix, and e denotes the natural constant.
6. The method for detecting the salient object in the remote sensing image according to claim 5, further comprising a step S23' in which information is extracted from the output features using multi-level pyramid fusion of multi-scale spatial attention, specifically: the output features are downsampled by factors of 2 and 4 to obtain three channels of different resolutions; the features at each scale are refined with the multi-scale spatial attention fused by the multi-level pyramid; the refined features are fused with the output features through a residual structure; the three levels of features are then fused in order of resolution from low to high, yielding deep features carrying the multi-level-pyramid-fused multi-scale spatial attention weights; finally, these deep features are merged with the deep features output by step S23.
7. The method for detecting the salient object in the remote sensing image according to claim 6, wherein in step S25 the upsampled and 1×1-convolved deep features are spliced with the current features, formulated as:
F_k = Conc(Conv(Up(F_5)), ..., Conv(Up(F_(k-1))), F_k)
where Up(·) denotes the upsampling that aligns the deep features with the current features, F_k denotes the k-th level features fed to the cascade decoder, F_5 denotes the 5th-level features fed to the cascade decoder, Conc denotes concatenation, and Conv denotes a convolutional layer.
8. The method for detecting the salient object in the remote sensing image according to claim 7, wherein the preset loss function is a combined loss function with different weight coefficients, formulated as:
L = ω_1·L_P + ω_2·L_R + ω_3·L_MAE + ω_4·L_S
where L_P, L_R, L_MAE and L_S denote the precision loss term, the recall loss term, the mean absolute error loss term and the structural similarity loss term respectively, and ω_1, ω_2, ω_3 and ω_4 denote the weight coefficients of L_P, L_R, L_MAE and L_S respectively, where:
L_P = 1 - (1/N)·Σ_(n=1..N) (Σ_(i,j) S(i,j)·G(i,j) + ε) / (Σ_(i,j) S(i,j) + ε)
L_R = 1 - (1/N)·Σ_(n=1..N) (Σ_(i,j) S(i,j)·G(i,j) + ε) / (Σ_(i,j) G(i,j) + ε)
L_MAE = (1/(W×H))·Σ_(i=1..W) Σ_(j=1..H) |S(i,j) - G(i,j)|
L_S = 1 - S_measure
S_measure = α×S_o + (1-α)×S_r
where N is the total number of samples, n denotes the sample index, j denotes the pixel index along the image height, i denotes the pixel index along the image width, ε denotes a preset constant, W and H are the width and height of the remote sensing image respectively, S(i,j) ∈ S denotes the predicted value of each pixel, G(i,j) ∈ G denotes the true value of each pixel, S denotes the saliency prediction result, G denotes the ground-truth label, S_r is the region-oriented similarity measure, S_o is the object-structure-oriented similarity measure, and α denotes a hyper-parameter that balances the region-oriented and the object-structure-oriented similarity measures.
9. The method for detecting the salient target of the remote sensing image according to claim 8, further comprising a step S4 of comparing the output saliency map with the ground-truth map, thereby evaluating the quality of the saliency maps generated by the remote sensing image salient target model.
10. The method for detecting the salient object in the remote sensing image according to claim 9, wherein the specific implementation of step S4 is: evaluating the saliency maps generated by the remote sensing image salient target model with the preset indices, namely the PR curve, the F value, the mean absolute error and the S value.
CN202210879580.0A 2022-07-25 2022-07-25 Method for detecting salient target of remote sensing image Pending CN115115938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210879580.0A CN115115938A (en) 2022-07-25 2022-07-25 Method for detecting salient target of remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210879580.0A CN115115938A (en) 2022-07-25 2022-07-25 Method for detecting salient target of remote sensing image

Publications (1)

Publication Number Publication Date
CN115115938A true CN115115938A (en) 2022-09-27

Family

ID=83334609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210879580.0A Pending CN115115938A (en) 2022-07-25 2022-07-25 Method for detecting salient target of remote sensing image

Country Status (1)

Country Link
CN (1) CN115115938A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620163A (en) * 2022-10-28 2023-01-17 西南交通大学 Semi-supervised learning deep cut valley intelligent identification method based on remote sensing image


Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN107766894B (en) Remote sensing image natural language generation method based on attention mechanism and deep learning
CN110738146B (en) Target re-recognition neural network and construction method and application thereof
CN102385592B (en) Image concept detection method and device
CN111325750B (en) Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
CN111462124A (en) Remote sensing satellite cloud detection method based on Deep L abV3+
CN110020658B (en) Salient object detection method based on multitask deep learning
CN113673346A (en) Motor vibration data processing and state recognition method based on multi-scale SE-Resnet
CN115131580B (en) Space target small sample identification method based on attention mechanism
CN114821340A (en) Land utilization classification method and system
CN115661459A (en) 2D mean teacher model using difference information
CN115115938A (en) Method for detecting salient target of remote sensing image
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN114821299A (en) Remote sensing image change detection method
CN115994558A (en) Pre-training method, device, equipment and storage medium of medical image coding network
CN113408540A (en) Synthetic aperture radar image overlap area extraction method and storage medium
CN110766708B (en) Image comparison method based on contour similarity
CN117152435A (en) Remote sensing semantic segmentation method based on U-Net3+
CN116959098A (en) Pedestrian re-recognition method and system based on dual-granularity tri-modal measurement learning
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN111047525A (en) Method for translating SAR remote sensing image into optical remote sensing image
CN115797684A (en) Infrared small target detection method and system based on context information
CN115641498A (en) Medium-term rainfall forecast post-processing correction method based on space multi-scale convolutional neural network
CN115546638A (en) Change detection method based on Siamese cascade differential neural network
CN116486183B (en) SAR image building area classification method based on multiple attention weight fusion characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination