CN116665065B - Cross attention-based high-resolution remote sensing image change detection method - Google Patents
- Publication number: CN116665065B
- Application number: CN202310934058.2A
- Authority: CN (China)
- Prior art keywords: change detection, attention, spatial, remote sensing, vector
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/10—Terrestrial scenes
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06V10/267—Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/776—Validation; Performance evaluation
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
Abstract
The invention provides a high-resolution remote sensing image change detection method based on criss-cross attention, belonging to the technical field of remote sensing science. The method comprises the following steps: acquire a high-resolution remote sensing image change detection data set, and apply data enhancement to the training-set data. Construct a change detection model comprising an encoder, a spatiotemporal attention module and a decoder. Input the training-set data into the encoder of the change detection model for feature extraction to obtain the multi-scale ground-object feature maps X1 and X2 of two adjacent phases; input X1 and X2 into the spatiotemporal attention module to obtain the features Z1 and Z2. Process Z1 and Z2 with a pyramid pooling module to obtain the trained change detection model. Input the test-set data into the trained change detection model to obtain the detection result. The invention improves the efficiency of semantic segmentation and reduces the consumption of computing resources.
Description
Technical Field
The invention relates to a high-resolution remote sensing image change detection method based on criss-cross attention, and belongs to the technical field of remote sensing science.
Background
Change detection is an important research direction in the remote sensing field. It adopts image processing methods and mathematical models, combines the characteristics of ground objects with the corresponding remote sensing imaging mechanisms, filters out irrelevant change information that acts as an interference factor from multi-period remote sensing images and related geospatial data of the same surface area, and thereby finds the change information of interest. By identifying changes in ground objects imaged at different times, change detection provides a research basis for urban planning and reconstruction, environmental monitoring, disaster assessment and other fields, and has wide application scenarios.
With the development of deep learning technology, deep-learning-based change detection has become a research hotspot for remote sensing images, and has been applied to the change detection of hyperspectral images to improve detection precision to a certain extent. Wang et al. (2022) proposed an end-to-end densely connected network named Y-Net that uses a twin (siamese) architecture for multi-type change detection; the network uses a dual-stream DenseNet to extract bi-temporal change features in the encoding stage and introduces an attention fusion mechanism in the decoding stage to strengthen attention to the change features. Chen et al. (2021) proposed a dual-attention fully convolutional twin network for change detection in high-resolution images; through a dual-attention mechanism, long-distance dependencies are captured to obtain more discriminative feature representations and enhance the recognition performance of the model. However, these models introduce many additional parameters, which greatly increases model complexity, limits the gain in accuracy, and to a certain extent hinders subsequent deployment of the model.
In general, existing attention methods cannot effectively highlight either the image difference information or the semantic information of interest, which limits the improvement of detection accuracy to a certain extent.
Disclosure of Invention
The invention aims to provide a high-resolution remote sensing image change detection method based on criss-cross attention, which improves the efficiency of semantic segmentation and reduces the consumption of computing resources.
The above object of the invention is achieved by the following technical scheme:
step 1: and acquiring a high-resolution remote sensing image change detection image pair and a corresponding binary semantic segmentation label thereof to obtain a high-resolution remote sensing image change detection data set, and acquiring two-phase image data and ground feature change label data of the same region.
Step 2: and dividing the high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data.
Step 3: a change detection model is constructed, the change detection model comprising an encoder, a spatiotemporal attention module, and a decoder.
Step 4: inputting the training set data into an encoder in a change detection model to perform feature extraction, and obtaining adjacent two-phase multi-scale ground feature images、/>The encoder is an ESNET model that removes the last pooling layer and full-connection layer.
Step 5: feature map of multi-scale ground object、/>A spatiotemporal attention module is input, the spatiotemporal attention module comprising a crisscrossed spatial attention module and a crisscrossed temporal attention module.
Firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>。
And then the multiscale space characteristic diagram of the refined space information、/>The pixels in the transverse and longitudinal directions and the time-space directions are respectively aggregated through the crisscross time attention module to obtain the characteristic +.>、/>。
Step 6: features to be characterized、/>And processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model.
Step 7: and inputting the test set data into the trained change detection model to obtain a detection result.
Preferably, the data enhancement includes: random flipping, random rotation, random transparency, HSV transformation, random noise, and random exchange of the order of the two images.
Preferably, the multi-scale ground-object feature maps X1 and X2 are each passed through the criss-cross spatial attention module to obtain the corresponding refined multi-scale spatial feature maps S1 and S2 in the following way:
Apply three 1 x 1 convolution layers to the multi-scale ground-object feature map X to obtain the spatial tensors Q, K and V, where Q, K, V ∈ R^(C x W x H), C is the number of channels of the feature, R denotes the set of real numbers, W is the width of the feature, and H is its height.
Compute the similarity D between the spatial tensors Q and K, and apply the softmax function to D to obtain the spatial attention weight matrix A. The softmax function is the normalized exponential function, which maps numbers to values between 0 and 1.
The similarity D is the set of similarities d_(i,u) between the vector of Q at each position u and the i-th vector of K in the criss-cross set of u, where i ∈ [0, W + H - 1]:
d_(i,u) = Q_u · (K_(i,u))^T
where Q_u ∈ R^C is the vector of Q at position u, and K_(i,u) ∈ R^C is the i-th vector of the set of vectors of K lying in the same row and column as position u.
Aggregate the spatial tensor V with the spatial attention weight matrix A according to the following formula:
S_u = Σ_(i=0)^(W+H-1) A_(i,u) · V_(i,u) + X_u
where S_u is the feature at position u, A_(i,u) is the spatial attention weight of the i-th scalar value at position u, V_(i,u) ∈ R^C is the i-th vector of the set of vectors of V in the same row and column as position u, and X_u ∈ R^C is the vector of the input feature map X at position u.
Compute the feature S_u at every position u to obtain the feature map S, and input S into the criss-cross spatial attention module again as the initial feature to obtain the refined multi-scale spatial feature map.
Preferably, the refined multi-scale spatial feature maps S1 and S2 are aggregated by the criss-cross temporal attention module along the horizontal, vertical and temporal directions to obtain the features Z1 and Z2, whose aggregated information is stronger and whose spatial expression capability is more comprehensive, in the following way:
Pass the refined multi-scale spatial feature map S1 through two different 1 x 1 convolution layers to obtain the spatial tensors Q1 and V1, and pass the refined multi-scale spatial feature map S2 through two different 1 x 1 convolution layers to obtain the spatial tensors Q2 and V2, where Q1, Q2, V1, V2 ∈ R^(C x W x H).
Compute the similarity D' between Q1 and Q2, and apply the softmax function to D' to obtain the temporal attention matrix A'.
The similarity D' is the set of similarities d'_(i,u) between the vector of Q1 at each position u and the i-th vector of Q2 in the criss-cross set of u, where i ∈ [0, W + H - 1]:
d'_(i,u) = (Q1)_u · ((Q2)_(i,u))^T
where (Q1)_u is the vector of Q1 at position u, and (Q2)_(i,u) is the i-th vector of the set of vectors of Q2 lying in the same row and column as position u.
Aggregate the spatial tensors V1 and V2 each with the temporal attention matrix A' according to the following formulas:
(Z1)_u = Σ_(i=0)^(W+H-1) A'_(i,u) · (V2)_(i,u) + (S1)_u
(Z2)_u = Σ_(i=0)^(W+H-1) A'_(i,u) · (V1)_(i,u) + (S2)_u
where (Z1)_u is the feature at position u, A'_(i,u) is the temporal attention weight of the i-th scalar value at position u, (V2)_(i,u) is the i-th vector of the set of vectors of V2 in the same row and column as position u, and (S1)_u is the vector of S1 at position u; (Z2)_u, (V1)_(i,u) and (S2)_u are defined analogously.
Compute the features (Z1)_u and (Z2)_u at every position u to obtain the features Z1 and Z2, and input Z1 and Z2 into the criss-cross temporal attention module again as initial features to obtain the final features Z1 and Z2.
The refined multi-scale change feature map is obtained from the features Z1 and Z2 by a simple absolute-value difference.
Preferably, the pyramid pooling module comprises convolution layers at 3 scales, batch normalization layers and rectified linear units; these three layers are connected to form ConvBNReLU modules.
Pass the input refined multi-scale change feature maps through the ConvBNReLU modules respectively, then upsample them with bilinear interpolation to obtain feature maps of the same size as before the pyramid module, and concatenate them along the channel dimension. A further ConvBNReLU module then produces features of shape (N, 2, W, H), and the change detection binary segmentation map is output after an argmax function.
Advantageous effects
The invention has the advantages that:
1. In the process of bi-temporal remote sensing image change detection, a criss-cross spatiotemporal attention mechanism is introduced and the structure of the model is optimized. Two spatiotemporal attention modules are designed using the criss-cross principle, paying more efficient attention to the semantic segmentation and change detection results.
2. In the feature extraction stage, the ESNet model with the pooling and fully connected layers removed is used as a twin convolutional neural network to perform feature extraction with weight sharing, which greatly reduces the time required in the training stage and improves training efficiency.
3. The spatiotemporal attention module can be applied conveniently and efficiently to multi-scale features at all levels without downsampling to a fixed scale. Combined with the pyramid pooling module, a change detection result with better segmentation performance is obtained.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a diagram of the overall network architecture and flow chart of the present invention.
FIG. 2 is a block diagram of a crisscrossed spatial attention model of the present invention.
FIG. 3 is a block diagram of a crisscross temporal attention model of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Unlike existing spatiotemporal attention mechanisms, the method extracts cross attention between the bi-temporal features based on the criss-cross principle, modelling the relationship between the two phases to realize temporal attention, and extracts the spatial features of each phase through criss-cross attention to realize spatial attention, thereby realizing change detection of high-resolution remote sensing images.
Step 1: and acquiring a high-resolution remote sensing image change detection image pair and a corresponding binary semantic segmentation label thereof to obtain a high-resolution remote sensing image change detection data set, and acquiring two-phase image data and ground feature change label data of the same region.
Step 2: dividing a high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data; the data enhancement mode comprises the following steps: random flipping, random rotation, random transparency, HSV transition, random noise, random exchange of two image sequences.
Step 3: a change detection model is constructed, the change detection model comprising an encoder, a spatiotemporal attention module, and a decoder.
Step 4: inputting the training set data into an encoder in a change detection model to perform feature extraction, and obtaining adjacent two-phase multi-scale ground feature images、/>The encoder is an ESNET model that removes the last pooling layer and full-connection layer.
Step 5: feature map of multi-scale ground object、/>A spatiotemporal attention module is input, the spatiotemporal attention module comprising a crisscrossed spatial attention module and a crisscrossed temporal attention module.
Firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>The method comprises the steps of carrying out a first treatment on the surface of the The specific mode is as follows:
Apply three 1 x 1 convolution layers to the multi-scale ground-object feature map X to obtain the spatial tensors Q, K and V, where Q, K, V ∈ R^(C x W x H), C is the number of channels of the feature, R denotes the set of real numbers, W is the width of the feature, and H is its height.
Compute the similarity D between the spatial tensors Q and K, and apply the softmax function to D to obtain the spatial attention weight matrix A. The softmax function is the normalized exponential function, which maps numbers to values between 0 and 1.
The similarity D is the set of similarities d_(i,u) between the vector of Q at each position u and the i-th vector of K in the criss-cross set of u, where i ∈ [0, W + H - 1]:
d_(i,u) = Q_u · (K_(i,u))^T
where Q_u ∈ R^C is the vector of Q at position u, and K_(i,u) ∈ R^C is the i-th vector of the set of vectors of K lying in the same row and column as position u.
Aggregate the spatial tensor V with the spatial attention weight matrix A according to the following formula:
S_u = Σ_(i=0)^(W+H-1) A_(i,u) · V_(i,u) + X_u
where S_u is the feature at position u, A_(i,u) is the spatial attention weight of the i-th scalar value at position u, V_(i,u) ∈ R^C is the i-th vector of the set of vectors of V in the same row and column as position u, and X_u ∈ R^C is the vector of the input feature map X at position u.
Compute the feature S_u at every position u to obtain the feature map S, and input S into the criss-cross spatial attention module again as the initial feature to obtain the refined multi-scale spatial feature map.
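A minimal NumPy sketch of the criss-cross spatial attention described above, under the assumption that the criss-cross set of position u is its full column plus its row with u counted once; the projection matrices stand in for the three 1 x 1 convolutions, and all names are illustrative:

```python
import numpy as np

def softmax_1d(z):
    """Normalized exponential over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def criss_cross_attention(x, wq, wk, wv):
    """Criss-cross spatial attention over a (C, H, W) feature map.

    For each position u, attention is computed only against the H + W - 1
    positions in the same row and column as u, and the output keeps a
    residual connection to the input, as in the aggregation formula above.
    wq, wk: (C', C); wv: (C, C) so the residual addition is well-defined.
    """
    c, h, w = x.shape
    q = np.tensordot(wq, x, axes=([1], [0]))     # (C', H, W)
    k = np.tensordot(wk, x, axes=([1], [0]))     # (C', H, W)
    v = np.tensordot(wv, x, axes=([1], [0]))     # (C,  H, W)
    out = np.empty_like(x)
    for yy in range(h):
        for xx in range(w):
            # criss-cross set: whole column, plus the row with u removed once
            ks = np.concatenate([k[:, :, xx], np.delete(k[:, yy, :], xx, axis=1)], axis=1)
            vs = np.concatenate([v[:, :, xx], np.delete(v[:, yy, :], xx, axis=1)], axis=1)
            d = q[:, yy, xx] @ ks                # similarities d_(i,u), shape (H+W-1,)
            a = softmax_1d(d)                    # spatial attention weights A_(i,u)
            out[:, yy, xx] = vs @ a + x[:, yy, xx]   # aggregation with residual
    return out
```

In the pipeline above the module is applied twice, the output being fed back in as the initial feature, e.g. `criss_cross_attention(criss_cross_attention(x, wq, wk, wv), wq, wk, wv)`.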
Then the refined multi-scale spatial feature maps S1 and S2 are aggregated by the criss-cross temporal attention module along the horizontal, vertical and temporal directions to obtain the features Z1 and Z2, whose aggregated information is stronger and whose spatial expression capability is more comprehensive. The specific way is as follows:
Pass the refined multi-scale spatial feature map S1 through two different 1 x 1 convolution layers to obtain the spatial tensors Q1 and V1, and pass the refined multi-scale spatial feature map S2 through two different 1 x 1 convolution layers to obtain the spatial tensors Q2 and V2, where Q1, Q2, V1, V2 ∈ R^(C x W x H).
Compute the similarity D' between Q1 and Q2, and apply the softmax function to D' to obtain the temporal attention matrix A'.
The similarity D' is the set of similarities d'_(i,u) between the vector of Q1 at each position u and the i-th vector of Q2 in the criss-cross set of u, where i ∈ [0, W + H - 1]:
d'_(i,u) = (Q1)_u · ((Q2)_(i,u))^T
where (Q1)_u is the vector of Q1 at position u, and (Q2)_(i,u) is the i-th vector of the set of vectors of Q2 lying in the same row and column as position u.
Aggregate the spatial tensors V1 and V2 each with the temporal attention matrix A' according to the following formulas:
(Z1)_u = Σ_(i=0)^(W+H-1) A'_(i,u) · (V2)_(i,u) + (S1)_u
(Z2)_u = Σ_(i=0)^(W+H-1) A'_(i,u) · (V1)_(i,u) + (S2)_u
where (Z1)_u is the feature at position u, A'_(i,u) is the temporal attention weight of the i-th scalar value at position u, (V2)_(i,u) is the i-th vector of the set of vectors of V2 in the same row and column as position u, and (S1)_u is the vector of S1 at position u; (Z2)_u, (V1)_(i,u) and (S2)_u are defined analogously.
Compute the features (Z1)_u and (Z2)_u at every position u to obtain the features Z1 and Z2, and input Z1 and Z2 into the criss-cross temporal attention module again as initial features to obtain the final features Z1 and Z2.
The refined multi-scale change feature map is obtained from the features Z1 and Z2 by a simple absolute-value difference.
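The criss-cross temporal attention and the final absolute-value difference can be sketched in the same style; the pairing in which each phase aggregates the OTHER phase's value vectors is our reading of the aggregation formulas and should be treated as an assumption, and all projection matrices are square so the residual additions are well-defined:

```python
import numpy as np

def softmax_1d(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def criss_cross_temporal(s1, s2, wq1, wq2, wv1, wv2):
    """Criss-cross temporal attention over two (C, H, W) refined feature maps.

    At each position u, similarity is computed between the phase-1 query
    vector and the criss-cross set of the phase-2 query tensor; the shared
    weights A' then aggregate the other phase's value vectors with a
    residual. Returns (z1, z2); the change map is np.abs(z1 - z2).
    """
    c, h, w = s1.shape
    q1 = np.tensordot(wq1, s1, axes=([1], [0]))
    q2 = np.tensordot(wq2, s2, axes=([1], [0]))
    v1 = np.tensordot(wv1, s1, axes=([1], [0]))
    v2 = np.tensordot(wv2, s2, axes=([1], [0]))
    z1, z2 = np.empty_like(s1), np.empty_like(s2)
    for yy in range(h):
        for xx in range(w):
            # criss-cross set of u: whole column plus the row with u removed once
            cross = lambda t: np.concatenate(
                [t[:, :, xx], np.delete(t[:, yy, :], xx, axis=1)], axis=1)
            a = softmax_1d(q1[:, yy, xx] @ cross(q2))     # temporal weights A'_(i,u)
            z1[:, yy, xx] = cross(v2) @ a + s1[:, yy, xx]  # aggregate the other phase
            z2[:, yy, xx] = cross(v1) @ a + s2[:, yy, xx]
    return z1, z2
```

The refined change feature map is then simply `np.abs(z1 - z2)`, matching the absolute-value difference described above.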
Step 6: features to be characterized、/>Processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model; the pyramid segmentation module comprises a convolution layer, a batch standardization layer and a correction linear unit with 3 scales, and the three layers are connected to form the ConvBNReLU module.
Respectively passing the input refined multi-scale change feature images through ConvBNReLU modules, then up-sampling by bilinear interpolation to obtain feature images with the same size in front of the pyramid module, and splicing on the channels; and then, a ConvBNReLU module is used for obtaining the characteristics with the shape of (N, 2, W and H), and a change detection binary segmentation graph is output after an argmax function.
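A rough sketch of this decoding head; strided pooling and nearest-neighbour upsampling stand in for the pyramid pooling and bilinear interpolation of the text, and the 1 x 1 ConvBNReLU below omits the learnable batch-norm scale and shift:

```python
import numpy as np

def conv_bn_relu(x, w, eps=1e-5):
    """1 x 1 conv + per-channel batch norm (over spatial positions) + ReLU.
    x: (C_in, H, W); w: (C_out, C_in)."""
    y = np.tensordot(w, x, axes=([1], [0]))                 # (C_out, H, W)
    mu = y.mean(axis=(1, 2), keepdims=True)
    var = y.var(axis=(1, 2), keepdims=True)
    return np.maximum((y - mu) / np.sqrt(var + eps), 0.0)

def upsample_nearest(x, h, w):
    """Stand-in for bilinear upsampling (nearest-neighbour for brevity)."""
    c, h0, w0 = x.shape
    ry = np.arange(h) * h0 // h
    rx = np.arange(w) * w0 // w
    return x[:, ry][:, :, rx]

def pyramid_head(feat, branch_ws, head_w):
    """Pool the change feature at several scales, re-upsample, concatenate on
    the channel axis, fuse to 2 channels, and take argmax per pixel.
    branch_ws: list of (scale, weight) pairs; head_w: (2, total_channels)."""
    c, h, w = feat.shape
    branches = [feat]
    for scale, w_b in branch_ws:
        pooled = feat[:, ::scale, ::scale]                  # crude strided pooling
        branches.append(upsample_nearest(conv_bn_relu(pooled, w_b), h, w))
    fused = conv_bn_relu(np.concatenate(branches, axis=0), head_w)  # (2, H, W)
    return fused.argmax(axis=0)                             # binary change mask
```

The argmax over the two output channels corresponds to the final binary changed / unchanged decision per pixel.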
Step 7: and inputting the test set data into the trained change detection model to obtain a detection result.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention, and the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.
Compared with the prior art:
The method is compared with several other change detection methods on the LEVIR-CD dataset, including FC-EF, STANet, BIT and ChangeFormer. The quantitative comparison results are shown in Table 1; the units of Precision, Recall, F1-score and OA are %, and the total parameter count is in MB. The recall and F1 score of the method are higher than those of the other 4 methods. Compared with FC-EF, Precision, Recall, F1 and OA are improved by 3.33%, 11.96%, 7.78% and 7.78% respectively. Compared with STANet, Precision, Recall, F1 and OA are improved by 6.43%, 1.13%, 3.92% and 0.45% respectively. Compared with BIT, Precision, Recall, F1 and OA are improved by 1.00%, 2.76%, 1.87% and 0.19% respectively. Compared with ChangeFormer, Recall, F1 and OA are improved by 3.33%, 0.78% and 0.07% respectively, while Precision is decreased by 1.81%.
Table 1 compares the quantitative results of the experiment on the LEVIR-CD dataset.
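The four reported metrics can be computed from a predicted binary change mask and the ground-truth label as follows (standard definitions, not code from the patent):

```python
import numpy as np

def change_metrics(pred, gt):
    """Precision, Recall, F1 and overall accuracy (OA), in %, for a binary
    change mask, matching the metrics reported in Table 1. Positives are
    'changed' pixels."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.sum(pred & gt)                  # changed pixels correctly detected
    fp = np.sum(pred & ~gt)                 # false alarms
    fn = np.sum(~pred & gt)                 # missed changes
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    oa = np.mean(pred == gt)                # overall per-pixel accuracy
    return 100 * precision, 100 * recall, 100 * f1, 100 * oa
```

In a benchmark such as LEVIR-CD these counts would be accumulated over all test tiles before the ratios are taken, rather than averaged per tile.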
Claims (7)
1. A high-resolution remote sensing image change detection method based on criss-cross attention, characterized by comprising the following steps:
step 1: acquiring a high-resolution remote sensing image change detection image pair and a binary semantic segmentation label corresponding to the high-resolution remote sensing image change detection image pair to obtain a high-resolution remote sensing image change detection data set, and acquiring two-time-phase image data and ground feature change label data of the same region;
step 2: dividing the high-resolution remote sensing image change detection data set into a training set, a validation set and a test set, and performing data enhancement on the training set data;
step 3: constructing a change detection model, wherein the change detection model comprises an encoder, a space-time attention module and a decoder;
step 4: inputting the training set data into the encoder of the change detection model for feature extraction to obtain multi-scale ground object feature maps X1 and X2 of the two adjacent temporal phases, the encoder being an ESNET model with its last pooling layer and fully connected layer removed;
step 5: inputting the multi-scale ground object feature maps X1 and X2 into a space-time attention module, wherein the space-time attention module comprises a crisscross spatial attention module and a crisscross temporal attention module;
firstly, the multi-scale ground object feature maps X1 and X2 are respectively passed through the crisscross spatial attention module to obtain multi-scale spatial feature maps X1′ and X2′ of refined spatial information;
then, the pixels of the multi-scale spatial feature maps X1′ and X2′ of refined spatial information are respectively aggregated in the transverse, longitudinal and temporal directions by the crisscross temporal attention module to obtain the features Y1 and Y2;
step 6: processing the features Y1 and Y2 by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a changed ground object segmentation map, and training the model by minimizing the loss between the final tensor and the label to obtain a trained change detection model;
step 7: and inputting the test set data into the trained change detection model to obtain a detection result.
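As an illustration of the final classification in step 6, the decoder's output tensor can be reduced to a binary change map by a channel-wise argmax. The (N, 2, H, W) layout and the values below are assumptions for the sketch, not the patent's trained model:

```python
import numpy as np

# Minimal sketch: the final tensor has shape (N, 2, H, W) -- one score map per
# class ("unchanged", "changed") -- and the binary change map is its
# channel-wise argmax. The logits below are illustrative placeholders.

logits = np.array([[[[2.0, -1.0],
                     [0.5,  0.5]],
                    [[1.0,  3.0],
                     [0.4,  2.0]]]])          # shape (1, 2, 2, 2)
change_map = logits.argmax(axis=1)            # shape (1, 2, 2), values in {0, 1}
print(change_map[0])
```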
2. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the data enhancement method includes: random flipping, random rotation, random transparency, HSV transformation, random noise, and random exchange of the order of the two images.
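A minimal sketch of how such paired augmentations might be applied: geometric transforms must act identically on both temporal images and the label, while the sequence exchange swaps the two images. Function names and probabilities are illustrative, not from the claim:

```python
import random
import numpy as np

# Hedged sketch of paired data enhancement: flips/rotations are applied
# identically to both temporal images and the label; "random exchange of the
# two image sequences" swaps t1/t2 (the binary change label is symmetric).

def augment_pair(img1, img2, label, rng):
    if rng.random() < 0.5:                        # random horizontal flip
        img1, img2, label = (np.flip(a, axis=-1) for a in (img1, img2, label))
    k = rng.randrange(4)                          # random rotation by k * 90 degrees
    img1, img2, label = (np.rot90(a, k, axes=(-2, -1)) for a in (img1, img2, label))
    if rng.random() < 0.5:                        # random exchange of the sequence
        img1, img2 = img2, img1
    return img1, img2, label

rng = random.Random(0)
a = np.zeros((3, 8, 8)); b = np.ones((3, 8, 8)); y = np.zeros((8, 8))
a2, b2, y2 = augment_pair(a, b, y, rng)
print(a2.shape, b2.shape, y2.shape)
```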
3. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the multi-scale ground object feature maps X1 and X2 are respectively passed through the crisscross spatial attention module to acquire the corresponding multi-scale spatial feature maps X1′ and X2′ of refined spatial information, in the following specific manner:
acquiring spatial dimension tensors Q, K and V from the multi-scale ground object feature map through three 1×1 convolution layers, wherein Q, K ∈ R^(C′×W×H), C′ is the number of channels of the feature, R represents the set of real numbers, W represents the width of the feature, and H represents the height of the feature;
calculating the similarity D of the spatial dimension tensors Q and K, and applying a softmax function to D to obtain a spatial attention weight matrix A; the softmax function refers to the normalized exponential function, which maps numbers to values between 0 and 1;
the similarity degreeIs a spatial dimension tensor->、/>First->Scalar value location +.>Similarity of->Of (2), wherein->;
wherein ,representing spatial dimension tensor->In every position->Vector available above, < >>;/>Represent the firstVector of scalar values>Said vector->Representing spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions, < >>;
the spatial dimension tensor V and the spatial attention weight matrix A are aggregated according to the following specific formula:

H′_u = Σ_{i=1}^{H+W−1} A_{i,u} · Φ_{i,u} + H_u,

wherein H′_u is the feature at position u, H′_u ∈ R^(C); A_{i,u} is the spatial attention weight of the i-th scalar value at position u; Φ_{i,u} is the vector of the i-th scalar value in the set Φ_u, the set Φ_u being the vectors of the spatial dimension tensor V in the transverse and longitudinal directions corresponding to position u, Φ_u ∈ R^((H+W−1)×C); and H_u is the vector of the input feature map at position u, H_u ∈ R^(C);
acquiring the feature H′_u at each position u to obtain the feature H′, and inputting H′ into the crisscross spatial attention module again as the initial feature to acquire the multi-scale spatial feature map of refined spatial information.
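Under the notation of claim 3, the crisscross aggregation at a single position u can be sketched in numpy as follows. Shapes, names and the random inputs are illustrative assumptions, not the claimed implementation, and the residual term H_u of the claim is noted in a comment rather than applied:

```python
import numpy as np

# Sketch of crisscross spatial attention at one position u = (i, j): the query
# vector Q_u attends only to the H + W - 1 positions of K/V lying in the same
# row and column. (The claim additionally adds the input feature H_u at u as a
# residual, omitted here for brevity.)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def criss_cross_at(Q, K, V, i, j):
    # Omega_u: the (H + W - 1) key vectors in row i and column j (u counted once)
    keys = np.concatenate([K[:, i, :].T, np.delete(K[:, :, j].T, i, axis=0)])
    vals = np.concatenate([V[:, i, :].T, np.delete(V[:, :, j].T, i, axis=0)])
    d = keys @ Q[:, i, j]                 # similarities d_{i,u}
    A = softmax(d)                        # spatial attention weights A_{i,u}
    return A @ vals                       # aggregated feature at u

C, H, W = 4, 5, 6
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((C, H, W)) for _ in range(3))
out = criss_cross_at(Q, K, V, 2, 3)
print(out.shape)   # one aggregated C-dimensional vector for position u
```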
4. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 3, wherein the pixels of the multi-scale spatial feature maps X1′ and X2′ of refined spatial information are respectively aggregated in the transverse, longitudinal and temporal directions by the crisscross temporal attention module to obtain the features Y1 and Y2, in the following specific manner:
the multi-scale spatial feature map X1′ of refined spatial information is passed through two different 1×1 convolution layers to obtain the spatial dimension tensors Q1 and V1; the multi-scale spatial feature map X2′ of refined spatial information is passed through two different 1×1 convolution layers to obtain the spatial dimension tensors Q2 and V2; wherein Q1, Q2 ∈ R^(C′×W×H) and V1, V2 ∈ R^(C×W×H);
the similarity D of Q1 and Q2 is calculated, and the softmax function is applied to D to obtain the temporal attention matrix A; the similarity D is the set of the similarities d_{i,u} between the spatial dimension tensors Q1 and Q2 at the i-th scalar value of each position u:

d_{i,u} = Q1_u · Ω_{i,u}^T, i = 1, …, H+W−1,

wherein Q1_u represents the vector of the spatial dimension tensor Q1 at every position u, Q1_u ∈ R^(C′); Ω_{i,u} represents the vector of the i-th scalar value in the set Ω_u, the set Ω_u representing the vectors of the spatial dimension tensor Q2 in the transverse and longitudinal directions corresponding to position u;
the spatial dimension tensors V1 and V2 are each aggregated with the temporal attention matrix A according to the following specific formulas:

H1′_u = Σ_{i=1}^{H+W−1} A_{i,u} · Φ1_{i,u} + H1_u,

H2′_u = Σ_{i=1}^{H+W−1} A_{i,u} · Φ2_{i,u} + H2_u,

wherein H1′_u is the feature at position u, H1′_u ∈ R^(C); A_{i,u} is the temporal attention weight of the i-th scalar value at position u; Φ1_{i,u} is the vector of the i-th scalar value in the set Φ1_u, the set Φ1_u being the vectors of the spatial dimension tensor V1 in the transverse and longitudinal directions corresponding to position u, Φ1_u ∈ R^((H+W−1)×C); and H1_u is the vector of X1′ at position u, H1_u ∈ R^(C);

likewise, H2′_u is the feature at position u, H2′_u ∈ R^(C); Φ2_{i,u} is the vector of the i-th scalar value in the set Φ2_u, the set Φ2_u being the vectors of the spatial dimension tensor V2 in the transverse and longitudinal directions corresponding to position u, Φ2_u ∈ R^((H+W−1)×C); and H2_u is the vector of X2′ at position u, H2_u ∈ R^(C);
acquiring the features H1′_u and H2′_u at each position u to obtain the features H1′ and H2′, and inputting H1′ and H2′ into the crisscross temporal attention module again as initial features to obtain the features Y1 and Y2;
the refined multi-scale change feature map is obtained from the features Y1 and Y2 by a simple absolute-value difference.
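A hedged numpy sketch of the cross-temporal exchange and absolute-value difference of claim 4. For brevity it uses full attention over all positions in place of the crisscross sampling pattern, so it is an approximation of the claimed module; all shapes and names are illustrative:

```python
import numpy as np

# Sketch: queries from one date attend to keys/values of the *other* date, and
# the refined change feature map is the element-wise absolute difference of
# the two attention outputs. Full attention stands in for the crisscross
# pattern to keep the example short.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_temporal(Q_a, K_b, V_b):
    C, H, W = Q_a.shape
    q = Q_a.reshape(C, -1).T              # (HW, C)
    k = K_b.reshape(C, -1).T
    v = V_b.reshape(C, -1).T
    A = softmax(q @ k.T, axis=-1)         # temporal attention matrix
    return (A @ v).T.reshape(C, H, W)

rng = np.random.default_rng(0)
Q1, V1, Q2, V2 = (rng.standard_normal((4, 5, 5)) for _ in range(4))
Y1 = cross_temporal(Q1, Q2, V2)           # t1 queries aggregate t2 values
Y2 = cross_temporal(Q2, Q1, V1)           # t2 queries aggregate t1 values
change = np.abs(Y1 - Y2)                  # refined change feature map
print(change.shape)
```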
5. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the pyramid pooling module comprises, at each of 3 scales, a convolution layer, a batch normalization layer and a rectified linear unit connected in sequence to form a ConvBNReLU module;
the input refined multi-scale change feature maps are respectively passed through the ConvBNReLU modules, up-sampled by bilinear interpolation to feature maps of the same size as before the pyramid module, and concatenated along the channel dimension; a further ConvBNReLU module then produces a feature of shape (N, 2, W, H), and the change detection binary segmentation map is output after an argmax function.
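A minimal numpy sketch of one ConvBNReLU unit from claim 5, using a 1×1 convolution and per-channel normalization as stand-ins; the kernel size, weights and shapes are illustrative assumptions:

```python
import numpy as np

# Sketch of a ConvBNReLU unit: 1x1 convolution, batch-norm-style per-channel
# normalization, then ReLU. Weights are random placeholders, not trained values.

def conv1x1_bn_relu(x, w, eps=1e-5):
    # x: (C_in, H, W), w: (C_out, C_in)
    y = np.einsum('oc,chw->ohw', w, x)            # 1x1 convolution
    mu = y.mean(axis=(1, 2), keepdims=True)       # per-channel statistics
    var = y.var(axis=(1, 2), keepdims=True)
    y = (y - mu) / np.sqrt(var + eps)             # normalize
    return np.maximum(y, 0.0)                     # ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w = rng.standard_normal((4, 8))
y = conv1x1_bn_relu(x, w)
print(y.shape)
```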
6. An apparatus for cross-attention based high resolution remote sensing image change detection comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the cross-attention based high resolution remote sensing image change detection method of any one of claims 1 to 5 when the program instructions are run.
7. A storage medium storing program instructions which, when executed, perform the method for detecting changes in high resolution remote sensing images based on crisscross attention as claimed in any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310934058.2A CN116665065B (en) | 2023-07-28 | 2023-07-28 | Cross attention-based high-resolution remote sensing image change detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116665065A CN116665065A (en) | 2023-08-29 |
CN116665065B true CN116665065B (en) | 2023-10-17 |
Family
ID=87720914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310934058.2A Active CN116665065B (en) | 2023-07-28 | 2023-07-28 | Cross attention-based high-resolution remote sensing image change detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116665065B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117372879B (en) * | 2023-12-07 | 2024-03-26 | 山东建筑大学 | Lightweight remote sensing image change detection method and system based on self-supervision enhancement |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183360A (en) * | 2020-09-29 | 2021-01-05 | 上海交通大学 | Lightweight semantic segmentation method for high-resolution remote sensing image |
CN113706482A (en) * | 2021-08-16 | 2021-11-26 | 武汉大学 | High-resolution remote sensing image change detection method |
CN114049335A (en) * | 2021-11-18 | 2022-02-15 | 感知天下(北京)信息科技有限公司 | Remote sensing image change detection method based on space-time attention |
US11482048B1 (en) * | 2022-05-10 | 2022-10-25 | INSEER Inc. | Methods and apparatus for human pose estimation from images using dynamic multi-headed convolutional attention |
CN115471467A (en) * | 2022-08-31 | 2022-12-13 | 核工业北京地质研究院 | High-resolution optical remote sensing image building change detection method |
CN116166642A (en) * | 2022-11-29 | 2023-05-26 | 北京航空航天大学 | Spatio-temporal data filling method, system, equipment and medium based on guide information |
CN116187561A (en) * | 2022-04-13 | 2023-05-30 | 北京工业大学 | PM10 concentration refined prediction method based on a spatio-temporal convolutional network |
Non-Patent Citations (3)
Title |
---|
A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection;Hao Chen等;《Remote Sensing》;第12卷(第10期);全文 * |
Remote sensing image object detection based on a dual attention mechanism; Zhou Xing; Chen Lifu; Computer and Modernization (No. 08); full text *
Research on change detection of high-resolution remote sensing images based on a deep encoder-decoder structure; Yu Jiangnan; China Master's Theses Full-text Database, Engineering Science and Technology II; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fan et al. | Balanced two-stage residual networks for image super-resolution | |
Shah et al. | Stacked U-Nets: a no-frills approach to natural image segmentation | |
CN116665065B (en) | Cross attention-based high-resolution remote sensing image change detection method | |
Zhao et al. | A deep cascade of neural networks for image inpainting, deblurring and denoising | |
Zhang et al. | An unsupervised remote sensing single-image super-resolution method based on generative adversarial network | |
Khan et al. | An encoder–decoder deep learning framework for building footprints extraction from aerial imagery | |
Duan et al. | Research on the natural image super-resolution reconstruction algorithm based on compressive perception theory and deep learning model | |
Ye et al. | Efficient point cloud segmentation with geometry-aware sparse networks | |
Xu et al. | Efficient image super-resolution integration | |
He et al. | Degradation-resistant unfolding network for heterogeneous image fusion | |
Chen et al. | Adaptive fusion network for RGB-D salient object detection | |
Zhou et al. | Adaptive weighted locality-constrained sparse coding for glaucoma diagnosis | |
Chaudhary et al. | Satellite imagery analysis for road segmentation using U-Net architecture | |
Shao et al. | Generative image inpainting with salient prior and relative total variation | |
Cao et al. | DAEANet: Dual auto-encoder attention network for depth map super-resolution | |
Xu et al. | Haar wavelet downsampling: A simple but effective downsampling module for semantic segmentation | |
Zhou et al. | A superior image inpainting scheme using Transformer-based self-supervised attention GAN model | |
Zhao et al. | Sharp feature consolidation from raw 3D point clouds via displacement learning | |
CN113208641A (en) | Pulmonary nodule auxiliary diagnosis method based on three-dimensional multi-resolution attention capsule network | |
Afzal et al. | Discriminative feature abstraction by deep L2 hypersphere embedding for 3D mesh CNNs | |
CN116343052A (en) | Attention and multiscale-based dual-temporal remote sensing image change detection network | |
Pu et al. | Hyperspectral image classification with localized spectral filtering-based graph attention network | |
Pandey et al. | A conspectus of deep learning techniques for single-image super-resolution | |
Ahn et al. | Multi-branch neural architecture search for lightweight image super-resolution | |
Qiao et al. | Depth super-resolution from explicit and implicit high-frequency features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||