CN116665065B - Cross attention-based high-resolution remote sensing image change detection method - Google Patents


Info

Publication number
CN116665065B
Authority
CN
China
Prior art keywords
change detection
attention
spatial
remote sensing
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310934058.2A
Other languages
Chinese (zh)
Other versions
CN116665065A (en)
Inventor
邢华桥
孙雨生
项俊武
王海航
赵欣
仇培元
Current Assignee
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date
Filing date
Publication date
Application filed by Shandong Jianzhu University filed Critical Shandong Jianzhu University
Priority to CN202310934058.2A
Publication of CN116665065A
Application granted
Publication of CN116665065B


Classifications

    • G06V20/10 — Scenes; scene-specific elements: terrestrial scenes
    • G06N3/0455 — Neural network architectures: auto-encoder networks; encoder-decoder networks
    • G06N3/0464 — Neural network architectures: convolutional networks [CNN, ConvNet]
    • G06N3/048 — Neural network architectures: activation functions
    • G06V10/267 — Image preprocessing: segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/28 — Image preprocessing: quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/52 — Feature extraction: scale-space analysis, e.g. wavelet analysis
    • G06V10/62 — Feature extraction: features relating to a temporal dimension, e.g. time-based feature extraction, pattern tracking
    • G06V10/774 — Machine learning: generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/776 — Machine learning: validation; performance evaluation
    • G06V10/806 — Machine learning: fusion of extracted features
    • G06V10/82 — Machine learning: using neural networks
    • G06V20/70 — Scenes: labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The invention provides a high-resolution remote sensing image change detection method based on criss-cross attention, belonging to the technical field of remote sensing science. The method comprises the following steps: acquire a high-resolution remote sensing image change detection data set and apply data enhancement to the training-set data; construct a change detection model comprising an encoder, a spatio-temporal attention module and a decoder; input the training set data into the encoder of the change detection model for feature extraction to obtain the multi-scale ground-object feature maps X1 and X2 of the two adjacent phases; input the multi-scale ground-object feature maps into the spatio-temporal attention module to obtain the features Z1 and Z2; process the features with a pyramid pooling module and train the model to obtain the trained change detection model; input the test set data into the trained change detection model to obtain the detection result. The invention improves the efficiency of semantic segmentation and reduces the consumption of computing resources.

Description

Cross attention-based high-resolution remote sensing image change detection method
Technical Field
The invention relates to a high-resolution remote sensing image change detection method based on crisscross attention, and belongs to the technical field of remote sensing science.
Background
Change detection is an important research direction in the remote sensing field. It applies image-processing methods and mathematical models, combined with the characteristics of ground objects and the corresponding remote sensing imaging mechanisms, to filter out irrelevant change information (interference factors) from multi-period remote sensing images and related geospatial data of the same surface area, and thereby extract the change information of interest. By identifying changes of ground objects across different periods, change detection provides a research basis for urban planning and reconstruction, environmental monitoring, disaster assessment and other fields, and has wide application scenarios.
With the development of deep learning, change detection methods based on deep learning have become a research hotspot for remote sensing imagery, and have been applied to the change detection of hyperspectral images to improve detection precision to a certain extent. Wang et al. (2022) proposed Y-Net, an end-to-end densely connected network that uses a twin (Siamese) architecture for multi-class change detection: a dual-stream DenseNet extracts bi-temporal change features in the encoding stage, and an attention fusion mechanism is introduced in the decoding stage to strengthen attention to the change features. Chen et al. (2021) proposed a dual-attention fully convolutional twin network for change detection in high-resolution images, in which a dual-attention mechanism captures long-range dependencies to obtain more discriminative feature representations and enhance the recognition performance of the model. However, these models introduce many additional parameters, which greatly increases model complexity while accuracy gains remain limited, and affects subsequent deployment of the models to a certain extent.
In general, existing attention methods can neither effectively highlight the image difference information nor effectively highlight the semantic information of interest, which limits the improvement of detection accuracy to a certain extent.
Disclosure of Invention
The invention aims to provide a high-resolution remote sensing image change detection method based on criss-cross attention, which improves the efficiency of semantic segmentation and reduces the consumption of computing resources.
The aim of the invention is achieved by the following technical scheme:
step 1: and acquiring a high-resolution remote sensing image change detection image pair and a corresponding binary semantic segmentation label thereof to obtain a high-resolution remote sensing image change detection data set, and acquiring two-phase image data and ground feature change label data of the same region.
Step 2: and dividing the high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data.
Step 3: a change detection model is constructed, the change detection model comprising an encoder, a spatiotemporal attention module, and a decoder.
Step 4: inputting the training set data into an encoder in a change detection model to perform feature extraction, and obtaining adjacent two-phase multi-scale ground feature images、/>The encoder is an ESNET model that removes the last pooling layer and full-connection layer.
Step 5: feature map of multi-scale ground object、/>A spatiotemporal attention module is input, the spatiotemporal attention module comprising a crisscrossed spatial attention module and a crisscrossed temporal attention module.
Firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>
And then the multiscale space characteristic diagram of the refined space information、/>The pixels in the transverse and longitudinal directions and the time-space directions are respectively aggregated through the crisscross time attention module to obtain the characteristic +.>、/>
Step 6: features to be characterized、/>And processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model.
Step 7: and inputting the test set data into the trained change detection model to obtain a detection result.
Preferably, the data enhancement modes include: random flipping, random rotation, random transparency, HSV transformation, random noise, and random exchange of the two image sequences.
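A minimal sketch of several of these enhancements on a bi-temporal image pair (HSV transformation and random transparency are omitted for brevity; all parameter choices are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_pair(img1, img2, label):
    """Sketch of the listed enhancements on a bi-temporal pair of (H, W, 3)
    images and an (H, W) change label: random flip, random 90-degree
    rotation, random noise, and random exchange of the two image sequence."""
    if rng.random() < 0.5:                        # random horizontal flip
        img1, img2, label = img1[:, ::-1], img2[:, ::-1], label[:, ::-1]
    k = rng.integers(0, 4)                        # random 90-degree rotation
    img1, img2, label = (np.rot90(a, k) for a in (img1, img2, label))
    img1 = img1 + rng.normal(0, 0.01, img1.shape) # random noise on one phase
    if rng.random() < 0.5:                        # swap the temporal order
        img1, img2 = img2, img1
    return img1, img2, label

a = np.zeros((8, 8, 3)); b = np.ones((8, 8, 3)); y = np.zeros((8, 8))
a2, b2, y2 = augment_pair(a, b, y)
```

The label must receive exactly the same geometric transforms as the images; the temporal swap leaves the change label unchanged because change is symmetric in time.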
Preferably, the multi-scale ground-object feature maps X1 and X2 are each passed through the criss-cross spatial attention module to obtain the corresponding multi-scale spatial feature maps P1 and P2 with refined spatial information in the following specific way (the same procedure is applied to each phase; X denotes the input feature map):
Three 1 × 1 convolution layers are applied to the multi-scale ground-object feature map X to obtain the spatial dimension tensors Q, K, V ∈ ℝ^{C×W×H}, where C is the number of channels of the feature, ℝ represents the set of real numbers, W represents the width of the feature and H represents the height of the feature.
The similarity D of the spatial dimension tensors Q and K is calculated, and a softmax function is applied to D to obtain the spatial attention weight matrix A. The softmax function is the normalized exponential function mapping numbers to values between 0 and 1.
The similarity D ∈ ℝ^{(H+W−1)×W×H} is the set of similarities d_{i,u} between each position u and the i-th scalar-value location of its criss-cross set:
d_{i,u} = Q_u · Ω_{i,u}^T, i = 0, 1, …, H+W−2,
where Q_u ∈ ℝ^C is the vector of Q at each position u, and Ω_{i,u} ∈ ℝ^C is the i-th vector of Ω_u ∈ ℝ^{(H+W−1)×C}, the set of vectors of K lying in the same row and column as position u.
The spatial dimension tensor V and the spatial attention weight matrix A are aggregated according to the following formula:
X′_u = Σ_{i=0}^{H+W−2} A_{i,u} Φ_{i,u} + X_u,
where X′_u ∈ ℝ^C is the feature at position u, A_{i,u} is the spatial attention weight of the i-th scalar value at position u, Φ_{i,u} ∈ ℝ^C is the i-th vector of Φ_u ∈ ℝ^{(H+W−1)×C}, the set of vectors of V in the same row and column as position u, and X_u ∈ ℝ^C is the vector of the input feature map at position u.
The features X′_u at every position u form the feature map X′, which is input into the criss-cross spatial attention module again as the initial feature to obtain the multi-scale spatial feature map with refined spatial information.
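The row-and-column attention described above can be sketched in NumPy as follows. This is a simplified illustration, not the patent's exact module: the three 1 × 1 convolutions are plain channel-mixing matrices, and the reduced key/query channel dimension often used in practice is omitted:

```python
import numpy as np

def criss_cross_attention(x, wq, wk, wv):
    """One criss-cross pass over a (C, H, W) feature map: every position u
    attends only to the H + W - 1 positions on its own row and column.
    wq/wk/wv are (C, C) channel-mixing matrices standing in for the
    1x1 convolutions (a simplifying assumption)."""
    c, h, w = x.shape
    q = np.einsum('dc,chw->dhw', wq, x)   # 1x1 conv == per-pixel matmul
    k = np.einsum('dc,chw->dhw', wk, x)
    v = np.einsum('dc,chw->dhw', wv, x)
    out = np.empty_like(x)
    rows = np.arange(h)
    for i in range(h):
        for j in range(w):
            # criss-cross set: full row i plus column j without (i, j)
            keys = np.concatenate([k[:, i, :], k[:, rows != i, j]], axis=1)
            vals = np.concatenate([v[:, i, :], v[:, rows != i, j]], axis=1)
            d = q[:, i, j] @ keys                    # affinities d_{i,u}
            a = np.exp(d - d.max()); a /= a.sum()    # softmax -> weights A
            out[:, i, j] = vals @ a + x[:, i, j]     # aggregate V + residual
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 6))
w = rng.normal(size=(3, 4, 4)) * 0.1
y1 = criss_cross_attention(x, *w)       # first pass
y2 = criss_cross_attention(y1, *w)      # the module is applied a second time
```

Applying the module twice, as the text prescribes, lets information from the full image reach every position, since two row/column hops connect any pair of pixels.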
Preferably, the multi-scale spatial feature maps P1 and P2 with refined spatial information are passed through the criss-cross temporal attention module, which aggregates pixels along the horizontal, vertical and temporal directions, to obtain the features Z1 and Z2 with stronger aggregated information and a more comprehensive feature expression capability, in the following specific way:
P1 is passed through two different 1 × 1 convolution layers to obtain the spatial dimension tensors Q1, V1, and P2 is passed through two different 1 × 1 convolution layers to obtain the spatial dimension tensors Q2, V2, where Q1, V1, Q2, V2 ∈ ℝ^{C×W×H}.
The similarity D of Q1 and Q2 is calculated, and the softmax function is applied to D to obtain the temporal attention matrix A.
The similarity D ∈ ℝ^{(H+W−1)×W×H} is the set of similarities d_{i,u} between Q1 and Q2 at the i-th scalar-value location of each position u:
d_{i,u} = Q1_u · Ω_{i,u}^T, i = 0, 1, …, H+W−2,
where Q1_u ∈ ℝ^C is the vector of Q1 at each position u, and Ω_{i,u} ∈ ℝ^C is the i-th vector of Ω_u ∈ ℝ^{(H+W−1)×C}, the set of vectors of Q2 in the same row and column as position u.
The spatial dimension tensors V1 and V2 are each aggregated with the attention weight matrix A according to the following formulas:
Z′1_u = Σ_{i=0}^{H+W−2} A_{i,u} Φ1_{i,u} + P1_u,
Z′2_u = Σ_{i=0}^{H+W−2} A_{i,u} Φ2_{i,u} + P2_u,
where Z′1_u, Z′2_u ∈ ℝ^C are the features at position u, A_{i,u} is the attention weight of the i-th scalar value at position u, Φ1_{i,u}, Φ2_{i,u} ∈ ℝ^C are the i-th vectors of the sets Φ1_u, Φ2_u ∈ ℝ^{(H+W−1)×C} of vectors of V1 and V2 in the same row and column as position u, and P1_u, P2_u ∈ ℝ^C are the vectors of the input feature maps at position u.
The features Z′1_u and Z′2_u at every position u form the feature maps Z′1 and Z′2, which are input into the criss-cross temporal attention module again as the initial features to obtain the features Z1 and Z2.
The refined multi-scale change feature map is obtained from the features Z1 and Z2 by a simple absolute-value difference.
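The last step above is a plain element-wise operation; a direct sketch:

```python
import numpy as np

def change_feature_map(z1, z2):
    """The refined multi-scale change feature map is the element-wise
    absolute difference of the two phase features Z1 and Z2 — a direct
    reading of the 'simple absolute-value difference' step."""
    return np.abs(z1 - z2)

z1 = np.array([[0.2, 0.9], [0.4, 0.1]])
z2 = np.array([[0.2, 0.1], [0.7, 0.1]])
diff = change_feature_map(z1, z2)
```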
Preferably, the pyramid pooling module contains ConvBNReLU modules at three scales, each formed by connecting a convolution layer, a batch normalization layer and a rectified linear unit.
The input refined multi-scale change feature maps are each passed through a ConvBNReLU module, upsampled by bilinear interpolation to feature maps of the same size as before the pyramid module, and concatenated along the channel dimension. A further ConvBNReLU module then produces features of shape (N, 2, W, H), and the change-detection binary segmentation map is output after an argmax function.
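The BNReLU part of the ConvBNReLU block and the final argmax can be sketched as follows. The convolution itself and the learnable batch-norm scale and shift are omitted for brevity; shapes follow the (N, 2, W, H) output described above:

```python
import numpy as np

def bn_relu(x, eps=1e-5):
    """Batch normalization (per channel over N, H, W) followed by ReLU —
    the 'BNReLU' part of the ConvBNReLU block, with the learnable scale
    and shift fixed to 1 and 0 (a simplification)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return np.maximum((x - mean) / np.sqrt(var + eps), 0.0)

def binary_segmentation(features):
    """argmax over the 2 class channels of the final (N, 2, H, W) features,
    producing the change-detection binary segmentation map."""
    return features.argmax(axis=1)

# toy final features: class 1 wins only on the diagonal
feats = np.stack([np.zeros((4, 4)), np.eye(4)])[None]   # (1, 2, 4, 4)
seg = binary_segmentation(feats)
act = bn_relu(np.arange(16, dtype=float).reshape(1, 1, 4, 4))
```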
Advantageous effects
The invention has the advantages that:
1. In bi-temporal remote sensing image change detection, a criss-cross spatio-temporal attention mechanism is introduced and the model structure is optimized: two spatio-temporal attention modules are designed with the criss-cross principle, giving more efficient attention for both the semantic segmentation results and the change detection results.
2. In the feature extraction stage, the ESNET model with the pooling and fully connected layers removed is used as a twin convolutional neural network that extracts features with shared weights, which greatly reduces the time required in the training stage and improves training efficiency.
3. The spatio-temporal attention module can be applied conveniently and efficiently to multi-scale features at all levels without downsampling to a fixed scale, and is combined with the pyramid pooling module to obtain change detection results with better segmentation performance.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a diagram of the overall network architecture and flow chart of the present invention.
FIG. 2 is a block diagram of a crisscrossed spatial attention model of the present invention.
FIG. 3 is a block diagram of a crisscross temporal attention model of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Unlike previous spatio-temporal attention mechanisms, the method realizes temporal attention by modeling the relationship between the bi-temporal features through criss-cross attention, realizes spatial attention by extracting the spatial features of each phase through criss-cross attention, and thereby realizes change detection for high-resolution remote sensing images.
Step 1: and acquiring a high-resolution remote sensing image change detection image pair and a corresponding binary semantic segmentation label thereof to obtain a high-resolution remote sensing image change detection data set, and acquiring two-phase image data and ground feature change label data of the same region.
Step 2: dividing a high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data; the data enhancement mode comprises the following steps: random flipping, random rotation, random transparency, HSV transition, random noise, random exchange of two image sequences.
Step 3: a change detection model is constructed, the change detection model comprising an encoder, a spatiotemporal attention module, and a decoder.
Step 4: inputting the training set data into an encoder in a change detection model to perform feature extraction, and obtaining adjacent two-phase multi-scale ground feature images、/>The encoder is an ESNET model that removes the last pooling layer and full-connection layer.
Step 5: feature map of multi-scale ground object、/>A spatiotemporal attention module is input, the spatiotemporal attention module comprising a crisscrossed spatial attention module and a crisscrossed temporal attention module.
Firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>The method comprises the steps of carrying out a first treatment on the surface of the The specific mode is as follows:
Three 1 × 1 convolution layers are applied to the multi-scale ground-object feature map X to obtain the spatial dimension tensors Q, K, V ∈ ℝ^{C×W×H}, where C is the number of channels of the feature, ℝ represents the set of real numbers, W represents the width of the feature and H represents the height of the feature.
The similarity D of the spatial dimension tensors Q and K is calculated, and the softmax function, i.e. the normalized exponential function mapping numbers to values between 0 and 1, is applied to D to obtain the spatial attention weight matrix A.
The similarity D ∈ ℝ^{(H+W−1)×W×H} is the set of similarities d_{i,u} between each position u and the i-th scalar-value location of its criss-cross set:
d_{i,u} = Q_u · Ω_{i,u}^T, i = 0, 1, …, H+W−2,
where Q_u ∈ ℝ^C is the vector of Q at each position u, and Ω_{i,u} ∈ ℝ^C is the i-th vector of Ω_u ∈ ℝ^{(H+W−1)×C}, the set of vectors of K lying in the same row and column as position u.
The spatial dimension tensor V and the spatial attention weight matrix A are aggregated according to the following formula:
X′_u = Σ_{i=0}^{H+W−2} A_{i,u} Φ_{i,u} + X_u,
where X′_u ∈ ℝ^C is the feature at position u, A_{i,u} is the spatial attention weight of the i-th scalar value at position u, Φ_{i,u} ∈ ℝ^C is the i-th vector of Φ_u ∈ ℝ^{(H+W−1)×C}, the set of vectors of V in the same row and column as position u, and X_u ∈ ℝ^C is the vector of the input feature map at position u.
The features X′_u at every position u form the feature map X′, which is input into the criss-cross spatial attention module again as the initial feature to obtain the multi-scale spatial feature map with refined spatial information.
Then the multi-scale spatial feature maps P1 and P2 with refined spatial information are passed through the criss-cross temporal attention module, which aggregates pixels along the horizontal, vertical and temporal directions, to obtain the features Z1 and Z2 with stronger aggregated information and a more comprehensive feature expression capability. The specific mode is as follows:
P1 is passed through two different 1 × 1 convolution layers to obtain the spatial dimension tensors Q1, V1, and P2 is passed through two different 1 × 1 convolution layers to obtain the spatial dimension tensors Q2, V2, where Q1, V1, Q2, V2 ∈ ℝ^{C×W×H}.
The similarity D of Q1 and Q2 is calculated, and the softmax function is applied to D to obtain the temporal attention matrix A.
The similarity D ∈ ℝ^{(H+W−1)×W×H} is the set of similarities d_{i,u} between Q1 and Q2 at the i-th scalar-value location of each position u:
d_{i,u} = Q1_u · Ω_{i,u}^T, i = 0, 1, …, H+W−2,
where Q1_u ∈ ℝ^C is the vector of Q1 at each position u, and Ω_{i,u} ∈ ℝ^C is the i-th vector of Ω_u ∈ ℝ^{(H+W−1)×C}, the set of vectors of Q2 in the same row and column as position u.
The spatial dimension tensors V1 and V2 are each aggregated with the attention weight matrix A according to the following formulas:
Z′1_u = Σ_{i=0}^{H+W−2} A_{i,u} Φ1_{i,u} + P1_u,
Z′2_u = Σ_{i=0}^{H+W−2} A_{i,u} Φ2_{i,u} + P2_u,
where Z′1_u, Z′2_u ∈ ℝ^C are the features at position u, A_{i,u} is the attention weight of the i-th scalar value at position u, Φ1_{i,u}, Φ2_{i,u} ∈ ℝ^C are the i-th vectors of the sets Φ1_u, Φ2_u ∈ ℝ^{(H+W−1)×C} of vectors of V1 and V2 in the same row and column as position u, and P1_u, P2_u ∈ ℝ^C are the vectors of the input feature maps at position u.
The features Z′1_u and Z′2_u at every position u form the feature maps Z′1 and Z′2, which are input into the criss-cross temporal attention module again as the initial features to obtain the features Z1 and Z2.
The refined multi-scale change feature map is obtained from the features Z1 and Z2 by a simple absolute-value difference.
Step 6: features to be characterized、/>Processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model; the pyramid segmentation module comprises a convolution layer, a batch standardization layer and a correction linear unit with 3 scales, and the three layers are connected to form the ConvBNReLU module.
Respectively passing the input refined multi-scale change feature images through ConvBNReLU modules, then up-sampling by bilinear interpolation to obtain feature images with the same size in front of the pyramid module, and splicing on the channels; and then, a ConvBNReLU module is used for obtaining the characteristics with the shape of (N, 2, W and H), and a change detection binary segmentation graph is output after an argmax function.
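The bilinear stretch back to the label's width and height can be sketched as follows (a corner-aligned mapping is assumed; the patent does not specify the interpolation variant):

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation of a 2-D array to (out_h, out_w), used to
    stretch the pyramid output to the same size as the label. The
    corner-aligned coordinate mapping is an assumption."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)             # sample rows in source coords
    xs = np.linspace(0, w - 1, out_w)             # sample cols in source coords
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

small = np.array([[0.0, 1.0], [2.0, 3.0]])
big = bilinear_resize(small, 4, 4)
```

With corner alignment the four corner values of the input are preserved exactly in the output.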
Step 7: and inputting the test set data into the trained change detection model to obtain a detection result.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and the invention is not limited thereto; although the invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solutions described therein or replace some of the technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.
Comparison with the prior art:
The method is compared with four other change detection methods on the LEVIR-CD dataset: FC-EF, STANet, BIT and ChangeFormer. The quantitative comparison results are shown in Table 1; Precision, Recall, F1-score and OA are given in %, and the total number of parameters in MB. The Recall and F1 score of the method are higher than those of the other four methods. Compared with FC-EF, Precision, Recall, F1 and OA are improved by 3.33%, 11.96%, 7.78% and 7.78% respectively; compared with STANet, by 6.43%, 1.13%, 3.92% and 0.45%; compared with BIT, by 1.00%, 2.76%, 1.87% and 0.19%. Compared with ChangeFormer, Recall, F1 and OA are improved by 3.33%, 0.78% and 0.07% respectively, while Precision is 1.81% lower.
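The four scores in Table 1 can be computed from a predicted binary change map and its ground truth as follows (a sketch of the standard definitions; the table's own values are not reproduced here):

```python
import numpy as np

def change_metrics(pred, gt):
    """Precision, Recall, F1 and overall accuracy (OA) for a binary change
    map, treating 'changed' (1) as the positive class — the four scores
    used in the quantitative comparison."""
    tp = np.sum((pred == 1) & (gt == 1))   # changed pixels found
    fp = np.sum((pred == 1) & (gt == 0))   # false alarms
    fn = np.sum((pred == 0) & (gt == 1))   # missed changes
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    oa = np.mean(pred == gt)               # overall accuracy over all pixels
    return precision, recall, f1, oa

pred = np.array([1, 1, 0, 0, 1, 0])
gt   = np.array([1, 0, 0, 0, 1, 1])
p, r, f1, oa = change_metrics(pred, gt)
```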
Table 1 compares the quantitative results of the experiment on the LEVIR-CD dataset.

Claims (7)

1. The high-resolution remote sensing image change detection method based on crisscross attention is characterized by comprising the following steps:
step 1: acquiring high-resolution remote sensing image change detection image pairs and the binary semantic segmentation labels corresponding to them to obtain a high-resolution remote sensing image change detection data set, namely two-phase image data of the same region and the corresponding ground-feature change label data;
step 2: dividing the high-resolution remote sensing image change detection data set into a training set, a validation set and a test set, and performing data enhancement on the training set data;
step 3: constructing a change detection model, wherein the change detection model comprises an encoder, a spatio-temporal attention module and a decoder;
step 4: inputting the training set data into the encoder of the change detection model for feature extraction to obtain the multi-scale ground-feature maps X1 and X2 of the two adjacent phases, wherein the encoder is an ESNET model with the last pooling layer and the fully connected layer removed;
step 5: inputting the multi-scale ground-feature maps X1 and X2 into the spatio-temporal attention module, wherein the spatio-temporal attention module comprises a crisscross spatial attention module and a crisscross temporal attention module;
first, the multi-scale ground-feature maps X1 and X2 each pass through the crisscross spatial attention module to obtain the multi-scale spatial feature maps X1' and X2' with refined spatial information;
then, the pixels of the multi-scale spatial feature maps X1' and X2' are aggregated in the horizontal, vertical and temporal directions by the crisscross temporal attention module to obtain the features F1 and F2;
step 6: processing the features F1 and F2 with a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a changed ground-feature segmentation map, and training the model by minimizing the loss between the final tensor and the label to obtain a trained change detection model;
step 7: inputting the test set data into the trained change detection model to obtain the detection result.
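Purely as an illustrative sketch of the data flow in claim 1 (steps 4-6), and not the patented ESNET/crisscross implementation, the pipeline can be laid out with toy numpy stand-ins for each module; every function body here is a placeholder assumption:

```python
import numpy as np

# Structural sketch of the change-detection pipeline: a weight-sharing
# (siamese) encoder, attention refinement, absolute difference, decode.
# All module bodies are toy placeholders, not the patent's design.

rng = np.random.default_rng(0)

def encoder(img, w):                         # stand-in for the backbone
    return np.einsum('chw,cd->dhw', img, w)  # 1x1-conv-like channel mixing

def spatial_attention(x):                    # placeholder: residual refinement
    return x + 0.1 * x

def temporal_attention(x1, x2):              # placeholder: cross-phase mixing
    m = 0.5 * (x1 + x2)
    return x1 + 0.1 * m, x2 + 0.1 * m

def decoder(f1, f2):                         # |difference| -> 2-class scores
    d = np.abs(f1 - f2).sum(axis=0)          # (H, W) change magnitude
    return np.stack([d.max() - d, d])        # (2, H, W) pseudo-logits

# two co-registered phases of the same region, 3 bands, 8 x 8 pixels
img_t1 = rng.standard_normal((3, 8, 8))
img_t2 = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((3, 16))             # shared (siamese) encoder weights

f1 = spatial_attention(encoder(img_t1, w))
f2 = spatial_attention(encoder(img_t2, w))
f1, f2 = temporal_attention(f1, f2)
change_map = decoder(f1, f2).argmax(axis=0)  # (8, 8) binary segmentation
```

The point of the sketch is the ordering: both phases pass through the same encoder weights, attention refines each phase before and across time, and only then is the difference decoded into a binary map.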
2. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as set forth in claim 1, wherein the data enhancement method includes: random flipping, random rotation, random transparency, HSV transformation, random noise, and random exchange of the order of the two temporal images.
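A minimal sketch of a subset of the augmentations in claim 2, applied jointly to a two-phase image pair and its change label so the pair stays co-registered; HSV transformation and random transparency are omitted, and the probabilities, noise scale and Gaussian-noise stand-in are assumptions:

```python
import numpy as np

# Joint augmentation of (image t1, image t2, change label): geometric
# transforms must be applied identically to all three, while photometric
# noise may be applied per phase. Parameter choices are illustrative.

def augment(img1, img2, label, rng=np.random.default_rng(0)):
    if rng.random() < 0.5:                       # random horizontal flip
        img1, img2, label = (np.flip(a, axis=-1) for a in (img1, img2, label))
    k = int(rng.integers(0, 4))                  # random 90-degree rotation
    img1, img2, label = (np.rot90(a, k, axes=(-2, -1)) for a in (img1, img2, label))
    img1 = img1 + rng.normal(0, 0.01, img1.shape)  # random noise on one phase
    if rng.random() < 0.5:                       # random swap of phase order
        img1, img2 = img2, img1
    return (np.ascontiguousarray(img1), np.ascontiguousarray(img2),
            np.ascontiguousarray(label))

a = np.zeros((3, 16, 16)); b = np.ones((3, 16, 16)); y = np.zeros((16, 16))
a2, b2, y2 = augment(a, b, y)
```

Swapping the phase order is specific to change detection: the change label is symmetric in time, so the swap doubles the effective training data without relabeling.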
3. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the multi-scale ground-feature maps X1 and X2 each pass through the crisscross spatial attention module to obtain the corresponding multi-scale spatial feature maps X1' and X2' with refined spatial information in the following specific manner:
for an input multi-scale ground-feature map X, the spatial-dimension tensors Q, K and V are obtained through three 1 x 1 convolution layers, wherein Q, K ∈ R^(C'×W×H) and V ∈ R^(C×W×H), C' is the number of channels of the feature after reduction, R represents the set of real numbers, W represents the width of the feature, and H represents the height of the feature;
the similarity D between the spatial-dimension tensors Q and K is computed, and a softmax function is applied to D to obtain the spatial attention weight matrix A; the softmax function refers to the normalized exponential function mapping numbers to values between 0 and 1;
the similarity D ∈ R^((H+W-1)×W×H) is the set of similarities d(i,u) between the vector of Q at each position u and the i-th vector of K in the same row or column as u, wherein i = 0, 1, ..., H+W-2:
d(i,u) = Q_u · Ω(i,u)^T
wherein Q_u ∈ R^(C') denotes the vector of Q at position u; Ω_u ∈ R^((H+W-1)×C') denotes the set of vectors of K in the same row and column as position u, and Ω(i,u) denotes the i-th vector of Ω_u;
the spatial-dimension tensor V and the spatial attention weight matrix A are aggregated according to the following specific formula:
X''_u = Σ_{i=0..H+W-2} A(i,u) · Φ(i,u) + X_u
wherein X''_u is the feature at position u; A(i,u) is the scalar attention weight of the i-th element of A at position u; Φ_u ∈ R^((H+W-1)×C) denotes the set of vectors of V in the same row and column as position u, Φ(i,u) denoting its i-th vector; and X_u is the vector of the input feature map at position u;
the features X''_u at all positions u form a feature map, which is input into the crisscross spatial attention module again as the initial feature to obtain the multi-scale spatial feature map with refined spatial information.
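For concreteness, one pass of the crisscross spatial attention of claim 3 can be sketched in plain numpy. The channel reduction C' = 4, the toy tensor shapes and the random 1 x 1 weights are assumptions; for every position u, the query attends to the H+W-1 keys in its row and column, and the values there are aggregated with the softmax weights plus a residual connection:

```python
import numpy as np

# Numpy sketch of criss-cross spatial attention: each position attends
# only to positions in the same row and column (H+W-1 of them), which a
# second recurrent pass propagates to the full image.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def criss_cross_attention(x, wq, wk, wv):
    c, h, w = x.shape
    q = np.einsum('chw,cd->dhw', x, wq)      # Q: (C', H, W) via 1x1 conv
    k = np.einsum('chw,cd->dhw', x, wk)      # K: (C', H, W)
    v = np.einsum('chw,cd->dhw', x, wv)      # V: (C,  H, W)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            # keys/values in the same column and row as position u = (i, j)
            ks = np.concatenate([k[:, :, j], np.delete(k[:, i, :], j, axis=1)], axis=1)
            vs = np.concatenate([v[:, :, j], np.delete(v[:, i, :], j, axis=1)], axis=1)
            d = q[:, i, j] @ ks                  # similarities d(i,u), (H+W-1,)
            a = softmax(d)                       # attention weights A(i,u)
            out[:, i, j] = vs @ a + x[:, i, j]   # aggregation + residual X_u
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 6))               # C=8, H=5, W=6
wq = rng.standard_normal((8, 4)); wk = rng.standard_normal((8, 4))
wv = rng.standard_normal((8, 8))
y = criss_cross_attention(x, wq, wk, wv)          # first pass
y = criss_cross_attention(y, wq, wk, wv)          # second (recurrent) pass
```

Applying the module twice, as the claim specifies, lets information reach every position from every other position (row-then-column paths), at far lower cost than full spatial self-attention.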
4. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 3, wherein the pixels of the multi-scale spatial feature maps X1' and X2' with refined spatial information are aggregated in the horizontal, vertical and temporal directions by the crisscross temporal attention module to obtain the features F1 and F2 in the following specific manner:
the multi-scale spatial feature map X1' with refined spatial information passes through two different 1 x 1 convolution layers to obtain the spatial-dimension tensors Q1 and V1; the multi-scale spatial feature map X2' with refined spatial information passes through two different 1 x 1 convolution layers to obtain the spatial-dimension tensors K2 and V2; wherein Q1, K2 ∈ R^(C'×W×H) and V1, V2 ∈ R^(C×W×H);
the similarity D between Q1 and K2 is computed, and the softmax function is applied to D to obtain the temporal attention matrix A;
the similarity D is the set of similarities d(i,u) between the vector of Q1 at each position u and the i-th vector of K2 in the same row or column as u, wherein i = 0, 1, ..., H+W-2:
d(i,u) = Q1_u · Ω(i,u)^T
wherein Q1_u ∈ R^(C') denotes the vector of Q1 at position u; Ω_u ∈ R^((H+W-1)×C') denotes the set of vectors of K2 in the same row and column as position u, and Ω(i,u) denotes the i-th vector of Ω_u;
the spatial-dimension tensors V1 and V2 are each aggregated with the temporal attention matrix A according to the following specific formulas:
F1_u = Σ_{i=0..H+W-2} A(i,u) · Φ1(i,u) + X1'_u
F2_u = Σ_{i=0..H+W-2} A(i,u) · Φ2(i,u) + X2'_u
wherein F1_u and F2_u are the features at position u; A(i,u) is the scalar attention weight of the i-th element of A at position u; Φ1_u, Φ2_u ∈ R^((H+W-1)×C) denote the sets of vectors of V1 and V2 in the same row and column as position u, Φ1(i,u) and Φ2(i,u) denoting their i-th vectors; and X1'_u and X2'_u are the vectors of the input feature maps at position u;
the features at all positions u form two feature maps, which are input into the crisscross temporal attention module again as initial features to obtain the features F1 and F2;
a refined multi-scale change feature map is obtained from the features F1 and F2 by a simple absolute-value difference.
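The crisscross temporal attention of claim 4 can likewise be sketched in numpy: queries come from one phase, keys from the other, and the shared attention matrix re-weights the values of both phases before the absolute-value difference. Shapes, the channel reduction and the random weights are assumptions, and only a single pass is shown:

```python
import numpy as np

# Numpy sketch of criss-cross temporal attention: the affinity between
# phase-1 queries and phase-2 keys (restricted to row/column positions)
# is shared to aggregate the values of both phases.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_strip(t, i, j):
    # vectors of tensor t in the same column and row as position (i, j)
    return np.concatenate([t[:, :, j], np.delete(t[:, i, :], j, axis=1)], axis=1)

def criss_cross_temporal(x1, x2, wq, wk, wv1, wv2):
    q1 = np.einsum('chw,cd->dhw', x1, wq)    # queries from phase 1
    k2 = np.einsum('chw,cd->dhw', x2, wk)    # keys from phase 2
    v1 = np.einsum('chw,cd->dhw', x1, wv1)   # values, phase 1
    v2 = np.einsum('chw,cd->dhw', x2, wv2)   # values, phase 2
    f1, f2 = np.empty_like(x1), np.empty_like(x2)
    c, h, w = x1.shape
    for i in range(h):
        for j in range(w):
            a = softmax(q1[:, i, j] @ cross_strip(k2, i, j))  # temporal attention
            f1[:, i, j] = cross_strip(v1, i, j) @ a + x1[:, i, j]
            f2[:, i, j] = cross_strip(v2, i, j) @ a + x2[:, i, j]
    return f1, f2

rng = np.random.default_rng(1)
x1 = rng.standard_normal((8, 5, 6)); x2 = rng.standard_normal((8, 5, 6))
wq = rng.standard_normal((8, 4)); wk = rng.standard_normal((8, 4))
wv1 = rng.standard_normal((8, 8)); wv2 = rng.standard_normal((8, 8))
f1, f2 = criss_cross_temporal(x1, x2, wq, wk, wv1, wv2)
change = np.abs(f1 - f2)                      # refined change feature map
```

Because the same attention matrix weights both phases, unchanged ground features are emphasized consistently in F1 and F2 and largely cancel in the absolute difference.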
5. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the pyramid pooling module comprises convolution layers, batch normalization layers and rectified linear units at 3 scales, the three layers being connected to form a ConvBNReLU module;
the input refined multi-scale change feature maps each pass through a ConvBNReLU module, are then upsampled by bilinear interpolation to feature maps of the same size as before the pyramid module, and are concatenated along the channel dimension; a further ConvBNReLU module then yields features of shape (N, 2, W, H), and the change detection binary segmentation map is output after an argmax function.
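A simplified numpy sketch of a pyramid-pooling head in the spirit of claim 5: the change feature map is pooled at 3 scales, each branch is upsampled back (nearest-neighbour here instead of bilinear, and without the BN/ReLU, purely for brevity), concatenated along channels, projected to 2 classes and argmax-ed. The scales (1, 2, 4) and the 1 x 1 projection are assumptions:

```python
import numpy as np

# Toy pyramid pooling head: multi-scale average pooling + upsampling +
# channel concat + 2-class projection + argmax binary segmentation.

def pool(x, s):
    c, h, w = x.shape
    # average-pool to an s x s grid (assumes h, w divisible by s)
    return x.reshape(c, s, h // s, s, w // s).mean(axis=(2, 4))

def upsample(x, h, w):
    c, sh, sw = x.shape
    return x.repeat(h // sh, axis=1).repeat(w // sw, axis=2)

def ppm_head(x, w_out):
    c, h, w = x.shape
    branches = [x] + [upsample(pool(x, s), h, w) for s in (1, 2, 4)]
    feats = np.concatenate(branches, axis=0)          # channel concat
    logits = np.einsum('chw,ck->khw', feats, w_out)   # (2, H, W) class scores
    return logits.argmax(axis=0)                      # binary map, (H, W)

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 16, 16))      # refined change feature map
w_out = rng.standard_normal((8 * 4, 2))   # 1x1 projection to 2 classes
seg = ppm_head(x, w_out)
```

The coarse branches inject scene-level context (is this region mostly changed or mostly unchanged?) into every pixel's decision, which is the motivation for pyramid pooling in the decoder.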
6. An apparatus for cross-attention based high resolution remote sensing image change detection comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the cross-attention based high resolution remote sensing image change detection method of any one of claims 1 to 5 when the program instructions are run.
7. A storage medium storing program instructions which, when executed, perform the method for detecting changes in high resolution remote sensing images based on crisscross attention as claimed in any one of claims 1 to 5.
CN202310934058.2A 2023-07-28 2023-07-28 Cross attention-based high-resolution remote sensing image change detection method Active CN116665065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310934058.2A CN116665065B (en) 2023-07-28 2023-07-28 Cross attention-based high-resolution remote sensing image change detection method

Publications (2)

Publication Number Publication Date
CN116665065A CN116665065A (en) 2023-08-29
CN116665065B true CN116665065B (en) 2023-10-17

Family

ID=87720914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310934058.2A Active CN116665065B (en) 2023-07-28 2023-07-28 Cross attention-based high-resolution remote sensing image change detection method

Country Status (1)

Country Link
CN (1) CN116665065B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372879B (en) * 2023-12-07 2024-03-26 山东建筑大学 Lightweight remote sensing image change detection method and system based on self-supervision enhancement

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN113706482A (en) * 2021-08-16 2021-11-26 武汉大学 High-resolution remote sensing image change detection method
CN114049335A (en) * 2021-11-18 2022-02-15 感知天下(北京)信息科技有限公司 Remote sensing image change detection method based on space-time attention
US11482048B1 (en) * 2022-05-10 2022-10-25 INSEER Inc. Methods and apparatus for human pose estimation from images using dynamic multi-headed convolutional attention
CN115471467A (en) * 2022-08-31 2022-12-13 核工业北京地质研究院 High-resolution optical remote sensing image building change detection method
CN116166642A (en) * 2022-11-29 2023-05-26 北京航空航天大学 Spatio-temporal data filling method, system, equipment and medium based on guide information
CN116187561A (en) * 2022-04-13 2023-05-30 北京工业大学 PM10 concentration refined prediction method based on a spatio-temporal convolutional network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Spatial-Temporal Attention-Based Method and a New Dataset for Remote Sensing Image Change Detection; Hao Chen et al.; Remote Sensing; Vol. 12, No. 10; full text *
Remote sensing image object detection based on a dual attention mechanism; Zhou Xing, Chen Lifu; Computer and Modernization, No. 08; full text *
Research on change detection in high-resolution remote sensing images based on a deep encoder-decoder structure; Yu Jiangnan; China Master's Theses Full-text Database, Engineering Science and Technology II; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant