CN116665065B

CN116665065B - Cross attention-based high-resolution remote sensing image change detection method

Info

Publication number: CN116665065B
Application number: CN202310934058.2A
Authority: CN
Inventors: 邢华桥; 孙雨生; 项俊武; 王海航; 赵欣; 仇培元
Original assignee: Shandong Jianzhu University
Current assignee: Shandong Jianzhu University
Priority date: 2023-07-28
Filing date: 2023-07-28
Publication date: 2023-10-17
Anticipated expiration: 2043-07-28
Also published as: CN116665065A

Abstract

The invention provides a high-resolution remote sensing image change detection method based on crisscross attention, and belongs to the technical field of remote sensing science. The method comprises the following steps: and acquiring a high-resolution remote sensing image change detection data set, and carrying out data enhancement on the training set high-resolution remote sensing image change detection data set data. Constructing a change detection model, wherein the change detection model comprises an encoder, a space-time attention module and a decoder; will trainThe encoder in the set data input change detection model performs feature extraction to obtain adjacent two-phase multi-scale ground feature images、Characteristic diagram of multi-scale ground object、Inputting the space-time attention module to obtain the characteristics、. Features to be characterized、And (3) processing by a pyramid pooling module to obtain a change detection model after training is completed. And inputting the test set data into the trained change detection model to obtain a detection result. The invention improves the efficiency of semantic segmentation and reduces the consumption of computing resources.

Description

Cross attention-based high-resolution remote sensing image change detection method

Technical Field

The invention relates to a high-resolution remote sensing image change detection method based on crisscross attention, and belongs to the technical field of remote sensing science.

Background

The change detection is an important research direction in the remote sensing field, adopts an image and image processing method and a mathematical model, combines the characteristics of ground features and corresponding remote sensing imaging mechanisms, filters incoherent change information serving as interference factors from multi-period remote sensing images and related geospatial data of the same ground surface area, and further finds out interesting change information. The change detection provides a research basis for urban planning and reconstruction, environment monitoring, disaster assessment and other fields by identifying the image change of the ground object in different periods, and has wide application scenes.

With the development of deep learning technology, a change detection method based on deep learning has become a research hotspot for detecting the change of remote sensing images. People apply the method to the change detection of hyperspectral images so as to improve the detection precision to a certain extent. Wang et al (2022) propose an end-to-end dense connection network named Y-Net that uses a twin architecture for multiple types of change detection. The proposed network uses dual stream DenseNet to extract dual temporal variation features during the encoding phase and introduces an attention fusion mechanism during the decoding phase to enhance the attention to the variation features. Chen et al (2021) propose a new approach, namely a dual-attention full convolution twin network, for change detection in high resolution images. Through a dual-attention mechanism, long-distance dependencies are captured to obtain more discriminant feature representations to enhance the recognition performance of the model. However, more parameters are introduced into the model by the models, so that the complexity of the model is greatly improved, the accuracy of the model is lower, and the subsequent deployment of the model is influenced to a certain extent.

In general, the existing attention method cannot effectively highlight the image difference information and cannot effectively highlight the interesting semantic information, which affects the improvement of the detection accuracy to a certain extent.

Disclosure of Invention

The invention aims to provide a high-resolution remote sensing image change detection method based on crisscross attention, which improves the efficiency of semantic segmentation and reduces the consumption of computing resources.

The invention aims to achieve the aim, and the aim is achieved by the following technical scheme:

step 1: and acquiring a high-resolution remote sensing image change detection image pair and a corresponding binary semantic segmentation label thereof to obtain a high-resolution remote sensing image change detection data set, and acquiring two-phase image data and ground feature change label data of the same region.

Step 2: and dividing the high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data.

Step 3: a change detection model is constructed, the change detection model comprising an encoder, a spatiotemporal attention module, and a decoder.

Step 4: inputting the training set data into an encoder in a change detection model to perform feature extraction, and obtaining adjacent two-phase multi-scale ground feature images、/>The encoder is an ESNET model that removes the last pooling layer and full-connection layer.

Step 5: feature map of multi-scale ground object、/>A spatiotemporal attention module is input, the spatiotemporal attention module comprising a crisscrossed spatial attention module and a crisscrossed temporal attention module.

Firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>。

And then the multiscale space characteristic diagram of the refined space information、/>The pixels in the transverse and longitudinal directions and the time-space directions are respectively aggregated through the crisscross time attention module to obtain the characteristic +.>、/>。

Step 6: features to be characterized、/>And processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model.

Step 7: and inputting the test set data into the trained change detection model to obtain a detection result.

Preferably, the data enhancement mode includes: random flipping, random rotation, random transparency, HSV transition, random noise, random exchange of two image sequences.

Preferably, the multi-scale ground object feature map、/>Acquiring a multiscale space feature map of corresponding refined space information through crisscross space attention modules respectively +.>、/>The specific mode of (2) is as follows:

acquiring a spatial dimension tensor from a multi-scale ground object feature map through three 1 x 1 convolution layers、/>、/>, wherein ,，/>is the number of channels of the feature>Representing the whole real set, +.>Representing the width of the feature>Representing the height of the feature.

Calculating spatial dimension tensors、/>Similarity of->Applying a softmax function toObtain a spatial attention weight matrix +.>The softmax function refers to a normalized exponential function that maps numbers to numbers from 0 to 1.

The similarity degreeIs a spatial dimension tensor->、/>First->Scalar value location +.>Similarity of->Of (2), wherein->。

wherein ,representing spatial dimension tensor->In every position->Vector available above, < >>。/>Indicate->Vector of scalar values>Said vector->Representing spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions, < >>。

Tensor of spatial dimensionAnd a spatial attention weighting matrix +.>The polymerization is carried out according to the following specific formula:

wherein ,is the position->Features of the upper part->，/>Is the position->First->Spatial attention weight matrix of individual scalar values +.>，/>Is the position->First->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions of (2), +.>，/>Is the position->Vector of->。

Acquiring each positionFeatures of->Get the characteristics->And->Doing againAnd inputting the cross space attention module for the initial features to acquire a multi-scale space feature map of the refined space information.

Preferably, the multi-scale space feature map of the spatial information to be refined、/>Pixels in the transverse and longitudinal directions and in the time-space directions are respectively aggregated through the crisscross time attention module, so that the characteristic that the aggregated information is stronger and has more comprehensive characteristic space expression capability is obtained>、/>The specific mode of (2) is as follows:

multi-scale spatial feature map to refine spatial informationSpatial dimension tensor is obtained by two different 1 x 1 convolution layers>、/>. Multiscale spatial feature map of refined spatial information +.>Spatial dimension tensor is obtained by two different 1 x 1 convolution layers>、/>. wherein ,/>，/>。

Calculation of and />Similarity of->Applying the softmax function to +.>Obtaining a time attention matrix->。

The similarity degreeIs a spatial dimension tensor-> and />First->Scalar value locationSimilarity of->Is a set of (a) and (b),

，

wherein ,representing spatial dimension tensor->In every position->Vector on->，/>Represent the firstVector of scalar values>, wherein ,/>Representing spatial dimension tensor->Corresponding to position->Vector in the transverse and longitudinal directions.

Tensor of spatial dimension、/>Respectively with the spatial attention weight matrix->The polymerization is carried out according to the following specific formula:

，

Is the position->Features of the upper part->，/>Is the position->First->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions of (2), +.>，/>Is the positionVector of->。

Acquiring each positionFeatures of->Get the characteristics->Obtain every position +.>Features of->Obtaining characteristicsAnd->、/>The cross time attention module is again entered as an initial feature to obtain the feature +.>、/>。

For characteristics of、/>And obtaining a refined multi-scale change characteristic map by simple absolute value difference.

Preferably, the pyramid pooling module comprises a convolution layer with 3 scales, a batch standardization layer and a correction linear unit, and the three layers are connected to form the ConvBNReLU module.

And respectively passing the input refined multi-scale change feature images through ConvBNReLU modules, then up-sampling by adopting bilinear interpolation to obtain feature images with the same size in front of the pyramid module, and splicing the feature images on the channels. And then, a ConvBNReLU module is used for obtaining the characteristics with the shape of (N, 2, W and H), and a change detection binary segmentation graph is output after an argmax function.

Advantageous effects

The invention has the advantages that:

1. in the process of detecting the remote sensing images in double time phases, a crisscross space-time attention mechanism is introduced and the structure of the model is optimized. Secondly, two space-time attention modules are designed by using the crisscross principle, and more efficient attention is paid to semantic segmentation results and change detection results.

2. In the feature extraction stage, the esnet model with the pooling removed and the full connection layer is used as a twin convolutional neural network to perform feature extraction in a weight sharing mode, so that the time required in the training stage can be greatly reduced, and the training efficiency is improved.

3. The space-time attention module can be conveniently and efficiently applied to various levels of multi-scale features without downsampling to a fixed scale. And combining the pyramid pooling model to obtain a change detection result with better segmentation performance.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.

FIG. 1 is a diagram of the overall network architecture and flow chart of the present invention.

FIG. 2 is a block diagram of a crisscrossed spatial attention model of the present invention.

FIG. 3 is a block diagram of a crisscross temporal attention model of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

The method is different from the prior space-time attention mechanism, realizes time attention by modeling the relationship of double-time-phase characteristics through the cross attention extraction of double-time-phase characteristics based on the cross principle, realizes space attention by extracting the space characteristics of each time phase through the cross attention based on the cross principle, and realizes the change detection of the high-resolution remote sensing image.

Step 2: dividing a high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data; the data enhancement mode comprises the following steps: random flipping, random rotation, random transparency, HSV transition, random noise, random exchange of two image sequences.

Firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>The method comprises the steps of carrying out a first treatment on the surface of the The specific mode is as follows:

acquiring a spatial dimension tensor from a multi-scale ground object feature map through three 1 x 1 convolution layers、/>、/>, wherein ，/>Is special toNumber of channels of symptoms, < >>Representing the whole real set, +.>Representing the width of the feature>Representing the height of the feature.

Calculating spatial dimension tensors、/>Similarity of->Applying the softmax function to D results in a spatial attention weighting matrix +.>The softmax function refers to a normalized exponential function that maps numbers to numbers from 0 to 1.

wherein ,representing spatial dimension tensor->In every position->Vector available above, < >>；/>Indicate->Vector of position->Said vector->Representing spatial dimension tensor->Corresponding to the position->Vectors in the transverse and longitudinal directions, < >>。

，

wherein ,is->Features on location->，/>Is->Position->Spatial attention weight matrix of individual scalar values +.>，/>Is->Position->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions of (2), +.>，/>Is->Vector of position->。

Acquiring each ofFeatures of the location->Get the characteristics->And->And inputting the multi-scale spatial feature map serving as the initial feature into a crisscross spatial attention module to acquire refined spatial information.

And then the multiscale space characteristic diagram of the refined space information、/>Pixels in the transverse and longitudinal directions and in the time-space directions are respectively aggregated through the crisscross time attention module, so that the characteristic that the aggregated information is stronger and has more comprehensive characteristic space expression capability is obtained>、/>The method comprises the steps of carrying out a first treatment on the surface of the The specific mode is as follows:

multi-scale spatial feature map to refine spatial informationSpatial dimension tensor is obtained by two different 1 x 1 convolution layers>、/>The method comprises the steps of carrying out a first treatment on the surface of the Multiscale spatial feature map of refined spatial information +.>Spatial dimension tensor is obtained by two different 1 x 1 convolution layers>、/>； wherein />，/>。

The similarity degreeFor the space dimensionTensor-> and />First->Scalar value locationSimilarity of->Of (2), wherein->。

wherein Representing spatial dimension tensor->In every position->Vector on->，/>Represent the firstVector of position->, wherein />Representing spatial dimension tensor->Corresponding to the position->Vector in the transverse and longitudinal directions.

，

Is->Features on location->，/>Is->Position->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions of (2), +.>，/>Is->Vector of position->。

Acquiring each ofFeatures of the location->Get the characteristics->Obtain each->Features of the location->Obtaining characteristicsAnd willCharacteristics->、/>The cross time attention module is again entered as an initial feature to obtain the feature +.>、/>；

Step 6: features to be characterized、/>Processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model; the pyramid segmentation module comprises a convolution layer, a batch standardization layer and a correction linear unit with 3 scales, and the three layers are connected to form the ConvBNReLU module.

Respectively passing the input refined multi-scale change feature images through ConvBNReLU modules, then up-sampling by bilinear interpolation to obtain feature images with the same size in front of the pyramid module, and splicing on the channels; and then, a ConvBNReLU module is used for obtaining the characteristics with the shape of (N, 2, W and H), and a change detection binary segmentation graph is output after an argmax function.

Finally, it should be noted that: the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto, but it is to be understood that modifications and equivalents of some of the technical features described in the foregoing embodiments may be made by those skilled in the art, although the present invention has been described in detail with reference to the foregoing embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Compared with the prior art:

the present method compares the results of several other change detection methods on the LEVIR-CD dataset, including FC-EF, STANet, BIT, changeFormer. The quantitative comparison results are shown in Table 1, and units of Precision, recall, F-Score and OA are%. The total parameter unit is MB. The recall rate and F1 fraction of the method are higher than those of other 4 methods, and Precision, recall, F and OA are respectively improved by 3.33%, 11.96%, 7.78% and 7.78% compared with FC-EF. Compared with STANet, precision, recall, F and OA are respectively improved by 6.43%, 1.13%, 3.92% and 0.45%. Compared with BIT, precision, recall, F and OA are respectively improved by 1.00%, 2.76%, 1.87% and 0.19%. Compared with ChangeFomer, recall, F1, OA was increased by 3.33%, 0.78%,0.07% and Precision was decreased by 1.81%, respectively.

Table 1 compares the quantitative results of the experiment on the LEVIR-CD dataset.

Claims

1. The high-resolution remote sensing image change detection method based on the crisscross attention is characterized by comprising the following steps of:

step 1: acquiring a high-resolution remote sensing image change detection image pair and a binary semantic segmentation label corresponding to the high-resolution remote sensing image change detection image pair to obtain a high-resolution remote sensing image change detection data set, and acquiring two-time-phase image data and ground feature change label data of the same region;

step 2: dividing a high-resolution remote sensing image change detection data set into a training set, a verification set and a test set, and carrying out data enhancement on the training set data;

step 3: constructing a change detection model, wherein the change detection model comprises an encoder, a space-time attention module and a decoder;

step 4: inputting the training set data into an encoder in a change detection model to perform feature extraction, and obtaining adjacent two-phase multi-scale ground feature images、/>The encoder is an ESNET model for removing the last pooling layer and the full connection layer;

step 5: feature map of multi-scale ground object、/>Inputting a space-time attention module, wherein the space-time attention module comprises a crisscross space attention module and a crisscross time attention module;

firstly, a multi-scale ground feature characteristic diagram、/>Obtaining multiscale space feature map of refined space information through crisscross space attention module respectively +.>、/>；

And then the multiscale space characteristic diagram of the refined space information、/>The pixels in the transverse and longitudinal directions and the time-space directions are respectively aggregated through the crisscross time attention module to obtain the characteristic +.>、/>；

Step 6: features to be characterized、/>Processing by a pyramid pooling module, stretching the width and height to the same size as the label by bilinear interpolation to obtain a change ground feature segmentation map, training the model by minimizing the loss of a final tensor and the label, and obtaining a trained change detection model;

2. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as set forth in claim 1, wherein the data enhancement method includes: random flipping, random rotation, random transparency, HSV transition, random noise, random exchange of two image sequences.

3. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the multi-scale ground object feature map is generated by、/>Acquiring a multiscale space feature map of corresponding refined space information through crisscross space attention modules respectively +.>、/>The specific mode of (2) is as follows:

acquiring a spatial dimension tensor from a multi-scale ground object feature map through three 1 x 1 convolution layers、/>、/>, wherein ,，/>is the number of channels of the feature>Representing the whole real set, +.>Representing the width of the feature>Representing the height of the feature;

calculating spatial dimension tensors、/>Similarity of->Applying a softmax function toObtain a spatial attention weight matrix +.>The softmax function refers to a normalized exponential function mapping numbers to numbers from 0 to 1;

the similarity degreeIs a spatial dimension tensor->、/>First->Scalar value location +.>Similarity of->Of (2), wherein->；

wherein ,representing spatial dimension tensor->In every position->Vector available above, < >>；/>Represent the firstVector of scalar values>Said vector->Representing spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions, < >>；

wherein ,is the position->Features of the upper part->，/>Is the position->First->Spatial attention weight matrix of individual scalar values +.>，/>Is the position->First->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to the positionVectors in the transverse and longitudinal directions of (2), +.>，/>Is the position->Vector of->；

Acquiring each positionFeatures of->Get the characteristics->And->And inputting the multi-scale spatial feature map serving as the initial feature into a crisscross spatial attention module to acquire refined spatial information.

4. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 3, wherein said multi-scale spatial feature map of spatial information to be refined、/>The pixels in the transverse direction, the longitudinal direction and the time-space direction are respectively aggregated through the crisscross time attention module to obtain the characteristics/>、/>The specific mode of (2) is as follows:

multi-scale spatial feature map to refine spatial informationSpatial dimension tensor is obtained by two different 1 x 1 convolution layers>、/>The method comprises the steps of carrying out a first treatment on the surface of the Multiscale spatial feature map of refined spatial information +.>Spatial dimension tensor is obtained by two different 1 x 1 convolution layers>、/>； wherein ,/>，/>；

Calculation of and />Similarity of->Applying the softmax function to +.>Obtaining a time attention matrix->；

The similarity degreeIs a spatial dimension tensor-> and />First->Scalar value location +.>Similarity of->Is a set of (a) and (b),

，

wherein ,representing spatial dimension tensor->In every position->Vector on->，/>Represent the firstVector of scalar values>, wherein ,/>Representing spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions;

，

wherein ,is the position->Features of the upper part->，/>Is the position->First->Spatial attention weight matrix of individual scalar values +.>，/>Is the position->First->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions of (2), +.>，/>Is the position->Vector of->；

Is the position->Features of the upper part->，/>Is the position->First->Vector of individual scalar values->Said vector->Is a spatial dimension tensor->Corresponding to position->Vectors in the transverse and longitudinal directions of (2), +.>，/>Is the position->Vector of->；

Acquiring each positionFeatures of->Get the characteristics->Obtain every position +.>Features of->Get the characteristics->And->、/>The cross time attention module is again entered as an initial feature to obtain the feature +.>、/>；

5. The method for detecting changes in high-resolution remote sensing images based on crisscross attention as claimed in claim 1, wherein the pyramid pooling module comprises a convolution layer, a batch standardization layer and a correction linear unit with 3 scales, and the three layers are connected to form a ConvBNReLU module;

6. An apparatus for cross-attention based high resolution remote sensing image change detection comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the cross-attention based high resolution remote sensing image change detection method of any one of claims 1 to 5 when the program instructions are run.

7. A storage medium storing program instructions which, when executed, perform the method for detecting changes in high resolution remote sensing images based on crisscross attention as claimed in any one of claims 1 to 5.