CN117975267A - Remote sensing image change detection method based on twin multi-scale cross attention - Google Patents
- Publication number: CN117975267A
- Application number: CN202410124654.9A
- Authority: CN
- Country: China
- Legal status: Pending
Classifications
- G06V20/176—Urban or other man-made structures
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V20/10—Terrestrial scenes
- G06V2201/07—Target detection
- Y02T10/40—Engine management systems
Abstract
The invention provides a remote sensing image change detection method based on twin multi-scale cross attention, relating to the technical field of change detection. To address the inaccurate change-region edges and the missed detection of tiny targets in existing change detection methods, a lightweight CBAM attention module is added in the feature extraction stage, significantly improving the feature extraction and representation capability of the network. Multi-scale features from different levels are fused to fully mine the semantic feature information of the bi-temporal remote sensing images and to reduce the detail loss caused by the encoder's successive downsampling. Cross attention is then used to further screen effective features, capture the interaction information between the two temporal images, and filter out pseudo-change information.
Description
Technical Field
The invention relates to the technical field of change detection, in particular to a remote sensing image change detection method based on twin multi-scale cross attention.
Background
Change detection is the process of identifying differences in state by observing the same geographical location at different times. In recent years, with the rapid development of deep learning and its excellent feature extraction capability in image processing, deep learning has been widely applied to remote sensing image change detection. Existing deep-learning-based remote sensing image change detection methods can be roughly divided into two categories by network structure. The first fuses the two images of different temporal phases and feeds the fused result into a single-branch fully convolutional network, detecting changes by maximizing a decision boundary. The second adopts a dual-branch twin (Siamese) network: features are first extracted separately from the two temporal images, and the distance between the extracted feature pairs is then measured to detect changed regions.
Although deep-learning-based approaches achieve good results, problems remain that prevent further improvement of network performance. The feature extraction capability of a change detection network model largely determines the quality of the change detection result; however, the successive downsampling operations in the feature extraction process generally cause the loss of accurate spatial positions and other detailed information, so tiny targets cannot be effectively located and the detailed features of changed targets cannot be captured, leading to missed detections and rough edges of change regions. In addition, a twin network extracts features from the two temporal images separately, ignoring the interaction information between the two images and thereby degrading detection performance. Change detection is now widely used in fields such as natural disaster detection, urban planning and land use, ecological environment protection, and national defense security, so a remote sensing image change detection method that accurately locates change regions and produces clear change-region edges is of great importance.
Although deep-learning-based methods have made some progress, most existing research methods still have the following problems:
First, the feature extraction capability of a change detection network model largely determines the quality of the change detection result, but most methods cannot fully extract the features of the bi-temporal remote sensing images; the successive downsampling operations in feature extraction generally lose accurate spatial position and difference information, so tiny targets cannot be effectively located and the detailed features of changed targets cannot be captured, leading to missed detections, rough edges of change regions, and similar issues.
Second, a twin network extracts features from the two temporal images separately and ignores the interaction information and spatio-temporal dependency between the two images, so the detection result contains pseudo-changes and the detection accuracy is low.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a remote sensing image change detection method based on twin multi-scale cross attention, which better solves the problem of losing accurate spatial position information during successive downsampling.
A remote sensing image change detection method based on twin multi-scale cross attention comprises the following steps:
Step 1: Acquire a change detection data set and divide it into a training set, a verification set and a test set;
Step 1.1: Acquire a change detection data set; the change detection data set contains multiple types of building pictures; the building pictures are bi-temporal remote sensing images and specifically cover three cases: newly built buildings, demolished buildings and no change;
Step 1.2: Crop the pictures in the change detection data set to a set size;
Step 1.3: Divide the cropped images into a training set, a test set and a verification set according to a set ratio;
Step 2: Construct a remote sensing image change detection model based on twin multi-scale cross attention;
The remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure of two weight-sharing branches to accept the bi-temporal remote sensing images as input. Shallow features are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then extract the information in the image, yielding features at four different scales. Each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before the ReLU activation, the feature is fused with the original input feature by pixel-wise addition. The output features of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively.
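A minimal PyTorch sketch of this twin backbone follows; it reuses the torchvision ResNet-18 residual stages and rebuilds the stride-1 stem by hand (the stock torchvision stem uses stride 2). The exact pooling parameters are assumptions, and the CBAM stacking described next is omitted here.

```python
import torch.nn as nn
from torchvision.models import resnet18

class SiameseResNet18(nn.Module):
    """Weight-sharing twin backbone sketch; avgpool and fc head are dropped."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # Stride-1 7x7 stem (assumption on padding), then BN + ReLU + stride-2 maxpool.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 1/2 scale
        )
        # Four residual stages with 64/128/256/512 output channels.
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward_single(self, x):
        x = self.stem(x)
        f1 = self.layer1(x)   # 1/2 scale,  64 channels
        f2 = self.layer2(f1)  # 1/4 scale, 128 channels
        f3 = self.layer3(f2)  # 1/8 scale, 256 channels
        f4 = self.layer4(f3)  # 1/16 scale, 512 channels
        return [f1, f2, f3, f4]

    def forward(self, t1, t2):
        # The same weights process both temporal phases (twin structure).
        return self.forward_single(t1), self.forward_single(t2)
```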
Furthermore, a stacked attention module containing four CBAM blocks is integrated into the feature extractor: one CBAM block is applied to the output features of each residual block, the resulting features are uniformly resized to half the size of the original input image, the four feature maps of equal size are concatenated along the channel direction, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information. Each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information. The specific process is as follows:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1. A weight-shared multi-layer perceptron (MLP), implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature, expressed as:
Mc(F)=σ(MLP(Avg(F))+MLP(Max(F)))
where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling. The channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F, expressed as:
F′=Mc(F)⊗F
The output feature F′ of size C×H×W from the channel attention module is then taken as the input of the spatial attention module. F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer; after concatenation along the channel dimension, they are fed into a 3×3 convolution layer, and a sigmoid activation finally yields the spatial attention map, expressed as:
Ms(F′)=σ(f3×3([Avg(F′); Max(F′)]))
where f3×3 denotes a 3×3 convolution layer. The feature F″ refined by CBAM is therefore obtained as:
F″=Ms(F′)⊗F′
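A minimal sketch of one CBAM block matching the two formulas above; the channel-reduction ratio inside the shared MLP is an assumption, as the text does not state it.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel + spatial attention block; reduction ratio is an assumption."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Weight-shared MLP realized as two 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # 3x3 conv over the concatenated avg/max maps, as stated in the text.
        self.spatial = nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, f):
        # Channel attention: Mc(F) = sigma(MLP(Avg(F)) + MLP(Max(F)))
        avg = f.mean(dim=(2, 3), keepdim=True)            # C x 1 x 1
        mx = f.amax(dim=(2, 3), keepdim=True)             # C x 1 x 1
        mc = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        f1 = f * mc                                       # F' = Mc(F) (x) F
        # Spatial attention: Ms(F') = sigma(f3x3([Avg(F'); Max(F')]))
        pooled = torch.cat([f1.mean(dim=1, keepdim=True),
                            f1.amax(dim=1, keepdim=True)], dim=1)  # 2 x H x W
        ms = torch.sigmoid(self.spatial(pooled))
        return f1 * ms                                    # F'' = Ms(F') (x) F'
```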
The cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention. Given input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to an attention layer. Attention weights are produced from the dot product between the query Qi and the key Kj, and attention information is then retrieved as the product of the value Vj and the attention weights. The attention layer is expressed as:
A(Q,K,V)=softmax(Q·Kᵀ/√d)·V
where d is the dimension of the key vectors. Once the attention vectors are obtained, they are added to the input feature Fi, yielding the new feature Fi,j:
Fi,j=Fi+A(Qi,Ki,Vi)+A(Qi,Kj,Vj)
The new feature Fj,i is calculated in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
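A minimal PyTorch sketch of this bi-directional cross attention follows; single-head attention, sharing the Q/K/V projections across the two branches, and the √d scaling are assumptions not fixed by the text above.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Bi-directional cross attention between the two temporal features."""
    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.to_v = nn.Conv2d(channels, channels, 1)
        # "3x3 convolution and normalization" for the output vector.
        self.out = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                 nn.BatchNorm2d(channels))
        self.scale = channels ** -0.5

    def attend(self, q, k, v):
        # A(Q, K, V) = softmax(Q K^T / sqrt(d)) V over flattened positions.
        b, c, h, w = q.shape
        q = q.flatten(2).transpose(1, 2)          # B x HW x C
        k = k.flatten(2)                          # B x C x HW
        v = v.flatten(2).transpose(1, 2)          # B x HW x C
        attn = torch.softmax(q @ k * self.scale, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, fi, fj):
        qi, ki, vi = self.to_q(fi), self.to_k(fi), self.to_v(fi)
        qj, kj, vj = self.to_q(fj), self.to_k(fj), self.to_v(fj)
        # F_{i,j} = F_i + A(Qi, Ki, Vi) + A(Qi, Kj, Vj), and symmetrically for F_{j,i}.
        fij = self.out(fi + self.attend(qi, ki, vi) + self.attend(qi, kj, vj))
        fji = self.out(fj + self.attend(qj, kj, vj) + self.attend(qj, ki, vi))
        return fij, fji
```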
The measurement module expands the feature map to the original input image size through bilinear interpolation and computes the Euclidean distance between the feature tensor pair at each spatial position in the embedding space. A contrastive loss function pulls unchanged feature tensor pairs closer together and pushes changed feature tensor pairs farther apart; the change detection result map is finally obtained by threshold segmentation.
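A sketch of this measurement module under the description above; the distance threshold value is an assumption (the text only states that threshold segmentation is applied).

```python
import torch
import torch.nn.functional as F

def change_map(feat_a, feat_b, image_size, threshold=1.0):
    """Upsample both feature maps, take per-pixel Euclidean distance, threshold.

    feat_a, feat_b: B x C x h x w feature tensors from the two branches.
    image_size: (H, W) of the original input image.
    """
    feat_a = F.interpolate(feat_a, size=image_size, mode="bilinear",
                           align_corners=False)
    feat_b = F.interpolate(feat_b, size=image_size, mode="bilinear",
                           align_corners=False)
    # Euclidean distance in embedding space at every spatial position.
    dist = torch.norm(feat_a - feat_b, p=2, dim=1)        # B x H x W
    return (dist > threshold).to(torch.uint8), dist       # binary map + distances
```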
step 3: and training a remote sensing image change detection model based on twin multi-scale cross attention.
Step 3.1: calculating an error between the predicted value and the real label using the contrast loss function:
D w(X1,X2) represents the euclidean distance between the input feature X 1 and the input feature X 2, W is a network weight, P represents a feature dimension, Y represents a predicted value, y=1 represents that the detected image corresponding position changes, y=0 represents that no change occurs, that is, no building is newly added or lost, m is a set threshold, and N is the number of samples.
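A sketch of this contrastive loss in PyTorch; the margin value is an assumption.

```python
import torch

def contrastive_loss(dist, label, margin=2.0):
    """Pixel-wise contrastive loss matching the formula above.

    dist:  B x H x W Euclidean distances Dw between the twin embeddings.
    label: B x H x W ground truth, 1 = changed, 0 = unchanged.
    """
    label = label.float()
    pos = (1.0 - label) * dist.pow(2)                        # pull unchanged pairs together
    neg = label * torch.clamp(margin - dist, min=0).pow(2)   # push changed pairs apart
    return (pos + neg).mean() / 2                            # ~ (1/2N) sum over samples
```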
Step 3.2: During training, a gradient descent method is adopted to optimize the network parameters; the gradient descent update is defined as:
θ ← θ − α·∇θ ξ(θ),  with  ξ(θ) = (1/2n) Σi (hθ(xi) − yi)²
where ξ(θ) is the given loss function, α is the learning rate, n is the number of samples per training batch, hθ(xi) is the network prediction for training sample xi, yi is the label of the sample, and i is the sample index.
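A toy illustration of this update rule on a synthetic least-squares problem; the data, learning rate and iteration count are illustrative only, not taken from the patent.

```python
import torch

torch.manual_seed(0)
x = torch.randn(100, 3)                                   # synthetic samples x_i
y = x @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)  # labels y_i
theta = torch.zeros(3, requires_grad=True)
alpha = 0.1                                               # learning rate

for step in range(200):
    xi = ((x @ theta - y) ** 2).mean() / 2   # xi(theta) = (1/2n) sum (h(x_i) - y_i)^2
    xi.backward()
    with torch.no_grad():
        theta -= alpha * theta.grad          # theta <- theta - alpha * grad xi(theta)
        theta.grad.zero_()
```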
Step 3.3: During training, the model parameters are iteratively optimized by computing the loss between the change detection result and the ground-truth label and applying the error backpropagation algorithm, until the model converges and training ends;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph.
And respectively inputting the pre-change remote sensing image and the post-change remote sensing image to be detected into two branches of a remote sensing image change detection model, and predicting by the remote sensing image change detection model to obtain a final change map.
The beneficial effects of the above technical scheme are as follows:
The invention provides a remote sensing image change detection method based on twin multi-scale cross attention, which has the following beneficial effects:
1. In the remote sensing image change detection method based on the multi-scale cross attention twin network, the feature extraction module retains features of different scales from different levels and fuses high-level semantic information with low-level spatial detail information, so that targets of different sizes are all taken into account and missed detection of small targets is greatly reduced. In addition, adding a lightweight CBAM attention module after each residual block yields more discriminative features, which substantially alleviates blurred edges of the change regions and further improves change detection accuracy.
2. In the remote sensing image building change detection method based on the multi-scale cross attention twin network, the cross attention module captures the interaction information between the two images and better models the spatio-temporal dependency of the bi-temporal images, thereby improving model performance.
Drawings
FIG. 1 is a schematic diagram of a network model structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a cross attention module in a network model structure according to an embodiment of the present invention;
FIG. 3 is a schematic view of a first temporal remote sensing image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second phase remote sensing image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a remote sensing image change detection result according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the actual change area in an embodiment of the present invention;
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
A remote sensing image change detection method based on twin multi-scale cross attention, as shown in figure 1, comprises the following steps:
Step 1: Acquire a change detection data set and divide it into a training set, a verification set and a test set;
Step 1.1: Acquire a change detection data set; the change detection data set contains bi-temporal remote sensing images of multiple building types, such as houses, warehouses, industrial parks and garages, and specifically covers three cases: newly built buildings, demolished buildings and no change;
In this embodiment, the open-source LEVIR-CD dataset was downloaded from the web. It was collected by Chen et al. and covers 20 different regions of Texas imaged between 2002 and 2018, comprising 637 pairs of 1024×1024 high-resolution bi-temporal images with a spatial resolution of 0.5 m/pixel. The LEVIR-CD dataset contains multiple types of buildings, such as houses, garages and warehouses.
Step 1.2: Crop the pictures in the change detection data set to a set size;
In this embodiment, each 1024×1024 large-scale image is cropped without overlap into 256×256 patches.
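A minimal sketch of this non-overlapping cropping (each 1024×1024 scene yields sixteen 256×256 patches; the function name is illustrative):

```python
import numpy as np

def crop_nonoverlap(img, tile=256):
    """Split an H x W (x C) image into non-overlapping tile x tile patches."""
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]
```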
Step 1.3: dividing the cut image into a training set, a testing set and a verification set according to a set proportion;
in this embodiment, according to 7:2:1 is divided into a training set, a testing set and a verification set; the cropped final dataset includes 7120 training image pairs, 2048 test image pairs, 1024 verification image pairs; in this embodiment, as shown in fig. 3 and fig. 4, two time phase remote sensing images in the training set are respectively;
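A sketch of the 7:2:1 split; shuffling and the fixed seed are assumptions:

```python
import random

def split_pairs(pairs, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split a list of image pairs into train/test/verification subsets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    a = int(ratios[0] * n)
    b = int((ratios[0] + ratios[1]) * n)
    return pairs[:a], pairs[a:b], pairs[b:]
```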
Step 2: Construct a remote sensing image change detection model based on twin multi-scale cross attention, as shown in fig. 2;
The remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure of two weight-sharing branches to accept the bi-temporal remote sensing images as input. Shallow features rich in spatial information are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then fully extract the information in the image, yielding features at four different scales. Each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before the ReLU activation, the feature is fused with the original input feature by pixel-wise addition. The output features of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively.
Furthermore, to fully capture the valid information in the multi-scale features, a stacked attention module containing four CBAM blocks is integrated into the feature extractor. More specifically, one CBAM block is applied to the output features of each residual block to emphasize useful spatial and channel information; these features are then uniformly resized to half the original input image size, the four feature maps of equal size are concatenated along the channel direction, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information. Each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information. The specific process is as follows:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1. A weight-shared multi-layer perceptron (MLP), implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature, expressed as:
Mc(F)=σ(MLP(Avg(F))+MLP(Max(F)))
where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling. The channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F, expressed as:
F′=Mc(F)⊗F
The output feature F′ of size C×H×W from the channel attention module is then taken as the input of the spatial attention module. F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer; after concatenation along the channel dimension, they are fed into a 3×3 convolution layer, and a sigmoid activation finally yields the spatial attention map, expressed as:
Ms(F′)=σ(f3×3([Avg(F′); Max(F′)]))
where f3×3 denotes a 3×3 convolution layer. The feature F″ refined by CBAM is therefore obtained as:
F″=Ms(F′)⊗F′
The cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention. Given input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to the attention layer. Attention weights are produced from the dot product between the query Qi and the key Kj, and attention information is then retrieved as the product of the value Vj and the attention weights. The attention layer is expressed as:
A(Q,K,V)=softmax(Q·Kᵀ/√d)·V
where d is the dimension of the key vectors. Once the attention vectors are obtained, they are added to the input feature Fi, yielding the new feature Fi,j:
Fi,j=Fi+A(Qi,Ki,Vi)+A(Qi,Kj,Vj)
The new feature Fj,i is calculated in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
The measurement module expands the feature map to the original input image size through bilinear interpolation and computes the Euclidean distance between the feature tensor pair at each spatial position in the embedding space. A contrastive loss function pulls unchanged feature tensor pairs closer together and pushes changed feature tensor pairs farther apart; the change detection result map is finally obtained by threshold segmentation.
step 3: and training a remote sensing image change detection model based on twin multi-scale cross attention.
Step 3.1: calculating an error between the predicted value and the real label using the contrast loss function:
D w(X1,X2) represents the euclidean distance between the input feature X 1 and the input feature X 2, W is a network weight, P represents a feature dimension, Y represents a predicted value, y=1 represents that the detected image corresponding position changes, y=0 represents that no change occurs, that is, no building is newly added or lost, m is a set threshold, and N is the number of samples.
Step 3.2: During training, a gradient descent method is adopted to optimize the network parameters; the gradient descent update is defined as:
θ ← θ − α·∇θ ξ(θ),  with  ξ(θ) = (1/2n) Σi (hθ(xi) − yi)²
where ξ(θ) is the given loss function, α is the learning rate, n is the number of samples per training batch, hθ(xi) is the network prediction for training sample xi, yi is the label of the sample, and i is the sample index.
Step 3.3: During training, the model parameters are iteratively optimized by computing the loss between the change detection result and the ground-truth label and applying the error backpropagation algorithm, until the model converges and training ends;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph, wherein the change result graph is shown in fig. 5.
The pre-change and post-change remote sensing images to be detected are input into the two branches of the remote sensing image change detection model, respectively, and the final change map is obtained through the model's prediction; the corresponding actual change area is shown in fig. 6.
The foregoing description covers only the preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, solutions in which the above features are replaced with (but not limited to) features with similar functions disclosed in the embodiments of the present disclosure.
Claims (7)
1. A remote sensing image change detection method based on twin multi-scale cross attention, characterized by comprising the following steps:
Step 1: Acquire a change detection data set and divide it into a training set, a verification set and a test set;
Step 2: Construct a remote sensing image change detection model based on twin multi-scale cross attention;
Step 3: training a remote sensing image change detection model based on twin multi-scale cross attention;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph.
2. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the step 1 specifically comprises the following steps:
Step 1.1: Acquire a change detection data set; the change detection data set contains multiple types of building pictures; the building pictures are bi-temporal remote sensing images and specifically cover three cases: newly built buildings, demolished buildings and no change;
Step 1.2: Crop the pictures in the change detection data set to a set size;
Step 1.3: Divide the cropped images into a training set, a test set and a verification set according to a set ratio.
3. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure of two weight-sharing branches to accept the bi-temporal remote sensing images as input; shallow features are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then extract the information in the image, yielding features at four different scales, where each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before the ReLU activation, the feature is fused with the original input feature by pixel-wise addition; the output features of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively;
Furthermore, a stacked attention module containing four CBAM blocks is integrated into the feature extractor: one CBAM block is applied to the output features of each residual block, the resulting features are uniformly resized to half the size of the original input image, the four feature maps of equal size are concatenated along the channel direction, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information; each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information; the specific process comprises the following steps:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1; a weight-shared multi-layer perceptron (MLP), implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature, expressed as:
Mc(F)=σ(MLP(Avg(F))+MLP(Max(F)))
where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling; the channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F, expressed as:
F′=Mc(F)⊗F
the output feature F′ of size C×H×W from the channel attention module is then taken as the input of the spatial attention module; F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer, which are concatenated along the channel dimension and fed into a 3×3 convolution layer; a sigmoid activation finally yields the spatial attention map, expressed as:
Ms(F′)=σ(f3×3([Avg(F′); Max(F′)]))
where f3×3 denotes a 3×3 convolution layer; the feature F″ refined by CBAM is therefore obtained as:
F″=Ms(F′)⊗F′
4. A method of detecting changes in remote sensing images based on twin multi-scale cross attention as claimed in claim 3, wherein the cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention; given input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to the attention layer; attention weights are produced from the dot product between the query Qi and the key Kj, and attention information is retrieved as the product of the value Vj and the attention weights; the attention layer is expressed as:
A(Q,K,V)=softmax(Q·Kᵀ/√d)·V
where d is the dimension of the key vectors; once the attention vectors are obtained, they are added to the input feature Fi, yielding the new feature Fi,j:
Fi,j=Fi+A(Qi,Ki,Vi)+A(Qi,Kj,Vj)
the new feature Fj,i is calculated in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
5. The remote sensing image change detection method based on twin multi-scale cross attention according to claim 3, wherein the measurement module expands the feature map to the original input image size through bilinear interpolation, computes the Euclidean distance between the feature tensor pair at each spatial position in the embedding space, uses a contrastive loss function to pull unchanged feature tensor pairs closer together and push changed feature tensor pairs farther apart, and finally obtains the change detection result map through threshold segmentation.
6. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the step 3 specifically comprises the following steps:
Step 3.1: Calculate the error between the predicted values and the ground-truth labels using the contrastive loss function:
L = (1/2N) Σ [(1−Y)·Dw(X1,X2)² + Y·max(0, m−Dw(X1,X2))²]
where Dw(X1,X2) is the Euclidean distance between input features X1 and X2 computed over the P feature dimensions, W is the network weight, Y is the label (Y=1 indicates that the corresponding position of the detected images has changed; Y=0 indicates no change, i.e. no building was newly added or has disappeared), m is a set margin threshold, and N is the number of samples;
Step 3.2: During training, a gradient descent method is adopted to optimize the network parameters; the gradient descent update is defined as:
θ ← θ − α·∇θ ξ(θ),  with  ξ(θ) = (1/2n) Σi (hθ(xi) − yi)²
where ξ(θ) is the given loss function, α is the learning rate, n is the number of samples per training batch, hθ(xi) is the network prediction for training sample xi, yi is the label of the sample, and i is the sample index;
Step 3.3: in the training process, the parameters in the model are continuously and iteratively optimized by calculating a loss function between a change detection result and a real label and using a back propagation algorithm of errors until the model converges, and the training is finished.
7. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein step 4 is specifically: inputting the pre-change and post-change remote sensing images to be detected into the two branches of the remote sensing image change detection model, respectively, and obtaining the final change map through the prediction of the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410124654.9A | 2024-01-30 | 2024-01-30 | Remote sensing image change detection method based on twin multi-scale cross attention
Publications (1)
Publication Number | Publication Date |
---|---|
CN117975267A true CN117975267A (en) | 2024-05-03 |
Family ID: 90859142
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN118470548A (en) * | 2024-07-12 | 2024-08-09 | 湖南大学 | Heterogeneous image change detection method based on width learning
CN118470548B (en) * | 2024-07-12 | 2024-09-17 | 湖南大学 | Heterogeneous image change detection method based on width learning
CN118587511A (en) * | 2024-08-02 | 2024-09-03 | 南京信息工程大学 | SPECT-MPI image classification method and system
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination