CN117496352A - Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features



Publication number: CN117496352A
Authority: CN (China)
Prior art keywords: feature, scale, fusion, difference, module
Legal status: Pending
Application number: CN202311507838.5A
Other languages: Chinese (zh)
Inventors: 王威, 夏罗成, 王新, 李骥, 张文杰
Current Assignee: Changsha University of Science and Technology
Original Assignee: Changsha University of Science and Technology
Application filed by Changsha University of Science and Technology

Classifications

    • G06V20/10 Terrestrial scenes
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The application relates to a remote sensing change detection method, device and equipment based on gradual fusion of adjacent features. The method provides a remote sensing change detection model based on gradual fusion of adjacent features, which comprises two novel modules: a feature difference enhancement module and an adjacent feature gradual fusion module. The model not only retains the original feature maps but also averages the feature difference maps of adjacent scales to reduce noise interference and strengthen change features, and it performs adjacent feature fusion on the multi-scale features to alleviate the information loss and boundary blurring caused by the semantic gaps between features of different scales. Compared with classical networks, the method is more effective and achieves a better balance between accuracy and computational cost.

Description

Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a remote sensing change detection method, device, and equipment based on gradual fusion of adjacent features.
Background
Currently, change detection methods based on deep learning have become a research hotspot. In recent years, deep learning methods typified by the convolutional neural network (CNN) have achieved good change detection results. However, owing to complex imaging conditions and to the temperature, illumination and environmental differences between bi-temporal images, it remains challenging to effectively extract and fuse the deep features of bi-temporal images and to preserve the integrity of change region boundaries so as to improve change detection accuracy.
Existing deep learning change detection methods have achieved certain results, such as: the Siamese network SNUNet-CD for change detection, which reduces the loss of localization information in the deep layers of the neural network through compact information transmission; the end-to-end superpixel-enhanced change detection network ESCNet, which combines differentiable superpixel segmentation to address the accurate localization of change regions; the deeply supervised image fusion network IFN, which fuses image difference features with the multi-level deep features of the original images to reconstruct the change map, improving the boundary completeness and internal compactness of objects in the output change map; and the attention-based deeply supervised network ADS-Net for detecting changes in bi-temporal remote sensing images, which combines adaptive attention over spatial and channel features to capture the relations between changes at different scales and realize more accurate change detection. The course of existing remote sensing change detection research shows that feature difference maps can help the model improve the extraction of change features and preserve the integrity of change region boundaries. However, existing work either does not make full use of the difference map or uses the difference map without retaining the original feature maps, which may result in insufficient mining of change information and blurred boundary information.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a remote sensing change detection method, device and equipment based on gradual fusion of adjacent features.
A remote sensing change detection method based on gradual fusion of adjacent features comprises the following steps:
Acquiring a bi-temporal remote sensing image pair to be detected;
Inputting the bi-temporal remote sensing image pair to be detected into a weight-shared twin feature extraction network to obtain multi-scale bi-temporal feature maps;
Inputting the multi-scale bi-temporal feature maps into a channel reduction and difference enhancement module to obtain multi-scale difference features; the channel reduction and difference enhancement module is used for performing channel reduction and feature difference enhancement on the bi-temporal feature maps of each scale with multiple branches, and the enhanced features are input into the branches of adjacent scales for further feature difference enhancement;
inputting the multi-scale difference features into an adjacent feature gradual fusion module to obtain fusion features; the adjacent feature gradual fusion module is used for enhancing the change features by utilizing complementary information between adjacent scale difference features through boundary compensation and knowledge review branches to obtain multi-scale primary fusion features, and continuing to fuse the adjacent features of the adjacent scale primary fusion features until a single-scale fusion feature is obtained.
Predicting according to the fusion features to obtain a remote sensing change detection prediction result.
In one embodiment, the weight-shared twin feature extraction network is a weight-shared twin network consisting of the four residual modules of a ResNet network.
The multi-scale bi-temporal feature maps are the features output by the four residual modules of the weight-shared twin feature extraction network.
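For illustration only, a minimal PyTorch sketch of such a weight-shared twin (Siamese) branch built from the four residual stages of a torchvision ResNet18 follows; the class name, the 3×3 input projection and the weights handling are assumptions added to make the sketch self-contained, not details from the application.

```python
# Illustrative weight-shared Siamese encoder keeping only the four residual
# stages of ResNet18 (no stem convolution/pooling, no final pooling/FC).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SiameseResNet18Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)  # pretrained weights could be loaded instead
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])
        # layer1 expects 64-channel input, so a 3-channel image needs a small
        # projection first; this 3x3 conv is an assumption for self-containment.
        self.proj = nn.Conv2d(3, 64, kernel_size=3, padding=1)

    def forward_single(self, x: torch.Tensor) -> list[torch.Tensor]:
        feats, x = [], self.proj(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # four scales: 64/128/256/512 channels
        return feats

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor):
        # Weight sharing: the same parameters process both temporal images.
        return self.forward_single(img_a), self.forward_single(img_b)
```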
In one embodiment, the multi-scale bi-temporal features comprise 4 bi-temporal feature maps of different scales; the channel reduction and difference enhancement module comprises four feature enhancement branches, each feature enhancement branch comprising a channel reduction module and a feature difference enhancement module.
Inputting the multi-scale bi-temporal features into the channel reduction and difference enhancement module to obtain the multi-scale difference features comprises the following steps (a wiring sketch follows this list):
Inputting the 4 bi-temporal feature maps of different scales into the four feature enhancement branches respectively, and performing channel reduction through the channel reduction modules.
Inputting the channel-reduced first-scale bi-temporal feature maps into the feature difference enhancement module of the first feature enhancement branch to obtain the first-scale difference feature map.
Inputting the channel-reduced second-scale bi-temporal feature maps and the first-scale difference feature map into the feature difference enhancement module of the second feature enhancement branch to obtain the second-scale difference feature map.
Inputting the channel-reduced third-scale bi-temporal feature maps and the second-scale difference feature map into the feature difference enhancement module of the third feature enhancement branch to obtain the third-scale difference feature map.
Inputting the channel-reduced fourth-scale bi-temporal feature maps and the third-scale difference feature map into the feature difference enhancement module of the fourth feature enhancement branch to obtain the fourth-scale difference feature map.
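By way of illustration, the cross-branch wiring just described can be sketched as follows; the reduce and FDE callables stand in for the channel reduction and feature difference enhancement modules, and all names are hypothetical rather than taken from the application.

```python
# Illustrative wiring of the four feature-enhancement branches: each branch
# reduces channels, and from the second branch onward the FDE module also
# receives the difference map produced by the previous (adjacent) branch.
def cascade(feats_a, feats_b, reduce_modules, fde_modules):
    diffs, prev_diff = [], None
    for fa, fb, cr, fde in zip(feats_a, feats_b, reduce_modules, fde_modules):
        fa, fb = cr(fa), cr(fb)             # unify each pair to 64 channels
        prev_diff = fde(fa, fb, prev_diff)  # branch i consumes branch i-1's output
        diffs.append(prev_diff)
    return diffs                            # D1..D4, one difference map per scale
```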
In one embodiment, inputting the channel-reduced second-scale bi-temporal feature maps and the first-scale difference feature map into the feature difference enhancement module of the second feature enhancement branch to obtain the second-scale difference feature map comprises the following steps (a module sketch follows this list):
Performing an element-wise subtraction on the channel-reduced second-scale bi-temporal feature maps followed by an absolute value operation to obtain a rough difference feature map.
Extracting features from the rough difference feature map through a first convolution layer and passing the result through a first spatial attention module to obtain a first feature difference attention map.
Downsampling the first-scale difference feature map through a second convolution layer, and passing the downsampled result through a second spatial attention module to obtain a second feature difference attention map.
Averaging the first feature difference attention map and the second feature difference attention map to obtain a refined feature difference attention map.
Multiplying each channel-reduced second-scale bi-temporal feature map pixel by pixel with the refined feature difference attention map to obtain enhanced bi-temporal feature maps.
Adding each enhanced bi-temporal feature map to the corresponding channel-reduced second-scale bi-temporal feature map, and applying a convolution operation to obtain refined bi-temporal feature maps.
Concatenating the refined bi-temporal feature maps, inputting the result into a channel attention module, and multiplying the obtained attention with the concatenated result to obtain a channel-enhanced feature map.
Adding the channel-enhanced feature map to the convolved rough difference feature map, and passing the sum through a convolution layer to obtain the second-scale difference feature map.
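The following PyTorch sketch makes these steps concrete. The CBAM-style internals of the spatial attention module (SAM) and channel attention module (CAM) are assumptions (the application names but does not define them here), and sharing one refinement convolution between the two temporal branches is likewise an illustrative choice.

```python
# Illustrative sketch of one feature difference enhancement (FDE) branch.
# SAM/CAM internals follow the common CBAM design, which is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, k, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride, k // 2, bias=False),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class ChannelAttention(nn.Module):          # CAM (CBAM-style, assumed)
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // reduction, 1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // reduction, ch, 1))

    def forward(self, x):
        return torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1))
                             + self.mlp(F.adaptive_max_pool2d(x, 1)))

class SpatialAttention(nn.Module):          # SAM (CBAM-style, assumed)
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        stats = torch.cat([x.mean(1, keepdim=True),
                           x.max(1, keepdim=True).values], dim=1)
        return torch.sigmoid(self.conv(stats))

class FDEModule(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.conv_rough = conv_bn_relu(ch, ch, 3)            # on |A - B|
        self.conv_prev = conv_bn_relu(ch, ch, 3, stride=2)   # downsample previous scale
        self.sam1, self.sam2 = SpatialAttention(), SpatialAttention()
        self.refine = conv_bn_relu(ch, ch, 3)                # shared refinement conv
        self.cam = ChannelAttention(2 * ch)
        self.conv_cat = conv_bn_relu(2 * ch, ch, 3)
        self.conv_out = conv_bn_relu(ch, ch, 3)

    def forward(self, fa, fb, prev_diff=None):
        rough = torch.abs(fa - fb)                   # element-wise |A - B|
        att = self.sam1(self.conv_rough(rough))      # first attention map
        if prev_diff is not None:                    # second map from previous scale
            att = 0.5 * (att + self.sam2(self.conv_prev(prev_diff)))
        fa = self.refine(fa * att + fa)              # enhance, add back, refine (t1)
        fb = self.refine(fb * att + fb)              # same for t2
        cat = torch.cat([fa, fb], dim=1)
        fused = self.conv_cat(self.cam(cat) * cat)   # channel enhancement
        return self.conv_out(fused + self.conv_rough(rough))  # add rough diff back
```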
In one embodiment, the adjacent feature gradual fusion module comprises a plurality of feature fusion modules.
Inputting the multi-scale difference feature maps into the adjacent feature gradual fusion module to obtain the fusion feature comprises the following steps (a topology sketch follows this list):
Inputting the first-scale difference feature map and the second-scale difference feature map into the first feature fusion module to obtain the first-scale primary fusion feature.
Inputting the second-scale difference feature map and the third-scale difference feature map into the second feature fusion module to obtain the second-scale primary fusion feature.
Inputting the third-scale difference feature map and the fourth-scale difference feature map into the third feature fusion module to obtain the third-scale primary fusion feature.
Inputting the first-scale primary fusion feature and the second-scale primary fusion feature into the fourth feature fusion module to obtain the first-scale secondary fusion feature.
Inputting the second-scale primary fusion feature and the third-scale primary fusion feature into the fifth feature fusion module to obtain the second-scale secondary fusion feature.
Inputting the first-scale secondary fusion feature and the second-scale secondary fusion feature into the sixth feature fusion module to obtain the fusion feature.
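A minimal sketch of this six-module topology follows; fb is assumed to be a sequence of six feature fusion modules (placeholder callables), and the names are illustrative.

```python
# Illustrative sketch of the adjacent-feature progressive fusion topology:
# three primary fusions, two secondary fusions, then one final fusion.
def progressive_fusion(d1, d2, d3, d4, fb):
    p1 = fb[0](d1, d2)    # first-scale primary fusion
    p2 = fb[1](d2, d3)    # second-scale primary fusion
    p3 = fb[2](d3, d4)    # third-scale primary fusion
    s1 = fb[3](p1, p2)    # first-scale secondary fusion
    s2 = fb[4](p2, p3)    # second-scale secondary fusion
    return fb[5](s1, s2)  # single-scale fused feature
```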
In one embodiment, the first scale difference feature map and the second scale difference feature map are input into a first feature fusion module, and the first scale primary fusion feature is obtained by:
$$\tilde{D}_{i+1} = \mathrm{Up}(D_{i+1}), \qquad a = \mathrm{Mask}(\tilde{D}_{i+1}), \qquad c = \mathrm{Mask}(D_i)$$

$$b = 1 - a$$

$$d = c \cdot (1-a) + a \cdot (1-c)$$

$$F_i^{1},\ F_i^{2} = \mathrm{Conv}_{1\times1}\big(\mathrm{Cat}(D_i,\ \tilde{D}_{i+1})\big)$$

$$D_i^{f} = \mathrm{Conv}_{3\times3}\big(\mathrm{Cat}(\hat{F}_i^{1},\ \hat{F}_i^{2})\big)$$

wherein $D_i^{f}$ is the first-scale primary fusion feature, $D_i$ is the first-scale difference feature map, $D_{i+1}$ is the second-scale difference feature map, $\tilde{D}_{i+1}$ is the upsampling feature of the second-scale difference feature map, $a$ and $c$ are the predicted change maps obtained by applying Mask to the change features $\tilde{D}_{i+1}$ and $D_i$ respectively, $d$ is the conflict attention map obtained between $D_i$ and $\tilde{D}_{i+1}$, $b$ represents the boundary compensation attention map, $\mathrm{Cat}(\cdot)$ is the feature concatenation operation, $\mathrm{Conv}_{1\times1}(\cdot)$ comprises a 1×1 convolution module, a batch normalization and a ReLU activation function, $\hat{F}_i^{1}$ and $\hat{F}_i^{2}$ are the two enhanced features, $\mathrm{Conv}_{3\times3}(\cdot)$ contains a 3×3 convolution module, a batch normalization and a ReLU activation function, CAM is the channel attention module, and Mask is the change map prediction module.
In one embodiment, the change map prediction module (Mask) includes a 3×3 convolutional layer, a batch normalization layer, a ReLU activation function, a 1×1 convolutional layer, and a Sigmoid activation function.
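For illustration, this layer list transcribes directly into PyTorch; the 64 input channels and the one-channel output of the final 1×1 convolution are assumptions consistent with a change probability map.

```python
# Illustrative sketch of the change map prediction module (Mask):
# 3x3 conv -> BN -> ReLU -> 1x1 conv -> Sigmoid.
import torch.nn as nn

def make_mask_module(ch: int = 64) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(ch, 1, kernel_size=1),  # one-channel change probability map
        nn.Sigmoid(),
    )
```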
In one embodiment, predicting the fusion feature to obtain a remote sensing change detection prediction result includes:
and processing the fusion characteristics by adopting a convolution layer with a convolution kernel of 1 multiplied by 1, and activating by bilinear interpolation up-sampling and Sigmoid activation functions to obtain a remote sensing change detection prediction result.
A remote sensing change detection device based on gradual fusion of adjacent features, the device comprising:
The bi-temporal remote sensing image pair acquisition module is used for acquiring the bi-temporal remote sensing image pair to be detected.
The feature extraction module is used for inputting the bi-temporal remote sensing image pair to be detected into the weight-shared twin feature extraction network to obtain the multi-scale bi-temporal feature maps;
the multi-scale difference feature map extraction module is used for inputting the multi-scale double-temporal feature map into the channel reduction and difference enhancement module to obtain a multi-scale difference feature map; the channel reduction and difference enhancement module is used for carrying out channel reduction and characteristic difference enhancement on the double-temporal characteristic map of each scale by adopting multiple branches, and the enhanced characteristics are input into the branches of adjacent scales for characteristic difference enhancement.
The feature fusion module is used for inputting the multi-scale difference feature map into the adjacent feature gradual fusion module to obtain fusion features; the adjacent feature gradual fusion module is used for enhancing the change features by using the boundary compensation and knowledge review branches and utilizing the complementary information between the adjacent scale difference features to obtain multi-scale primary fusion features, and continuing to fuse the adjacent features of the adjacent scale primary fusion features until the single-scale fusion features are obtained.
And the remote sensing change detection module is used for predicting the fusion characteristics to obtain a remote sensing change detection prediction result.
A computer device comprising a memory storing a computer program and a processor implementing the steps of any of the methods described above when the processor executes the computer program.
The method provides a remote sensing change detection model based on gradual fusion of adjacent features, which comprises two novel modules: a feature difference enhancement module and an adjacent feature gradual fusion module. The model not only retains the original feature maps but also averages the feature difference maps of adjacent scales to reduce noise interference and strengthen change features, and it performs adjacent feature fusion on the multi-scale features to alleviate the information loss and boundary blurring caused by the semantic gaps between features of different scales. Compared with classical networks, the method is more effective and achieves a better balance between accuracy and computational cost.
Drawings
FIG. 1 is a flow chart of a remote sensing change detection method based on gradual fusion of neighboring features in one embodiment;
FIG. 2 is a block diagram of a remote sensing change detection model based on gradual fusion of neighboring features in one embodiment;
FIG. 3 shows the structures of the residual blocks in one embodiment, where (a) is the first residual structure and (b) is the second residual structure;
FIG. 4 is a block diagram of a feature difference enhancement module in another embodiment;
FIG. 5 is a block diagram of an adjacent feature gradual fusion module in another embodiment;
FIG. 6 is a block diagram of a remote sensing change detection device based on gradual fusion of neighboring features in one embodiment;
fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, there is provided a remote sensing change detection method based on gradual fusion of adjacent features, the method comprising the steps of:
step 100: and acquiring a to-be-detected double-time remote sensing image pair.
Specifically, the bi-temporal remote sensing image pair to be detected consists of two remote sensing images, imageA and imageB, of the same scene at different times.
Step 102: and inputting the double-time remote sensing image pair to be detected into a twin feature extraction network with shared weights to obtain a multi-scale double-temporal feature map.
Specifically, the bi-temporal remote sensing image pair (imageA, imageB) to be detected is input into the weight-shared twin feature extraction network for feature extraction, and two groups of multi-scale feature maps corresponding to the image pair (imageA, imageB) are output.
The two branches of the weight-shared twin feature extraction network adopt the ResNet18 network as the backbone, but the convolution layer and pooling layer before the residual blocks of the ResNet18 network are not used, and the last pooling layer and the fully connected layer are removed.
Step 104: Inputting the multi-scale bi-temporal feature maps into the channel reduction and difference enhancement module to obtain the multi-scale difference features; the channel reduction and difference enhancement module is used for performing channel reduction and feature difference enhancement on the bi-temporal feature maps of each scale with multiple branches, and the enhanced features are input into the branches of adjacent scales for further feature difference enhancement.
Specifically, conventional change detection methods generally use only a feature concatenation operation or an element-wise subtraction operation to obtain the difference information of the bi-temporal features, which is not well suited to the change detection task. To address this issue, the application proposes a channel reduction and difference enhancement module that contains the same number of branches as there are scales in the multi-scale feature maps, each branch consisting of two parts: channel reduction and feature difference enhancement. Channel reduction greatly reduces the number of model parameters and the amount of computation, and the feature difference enhancement module obtains the temporal difference features within the bi-temporal features of different scales so as to capture the rich context information of the bi-temporal images, as illustrated by the sketch below.
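As an illustration of the savings, a channel reduction module can be as small as a 1×1 convolution; the batch normalization and ReLU that follow it here are assumptions.

```python
# Illustrative channel reduction (CR) module: a 1x1 convolution projecting each
# scale's features (64/128/256/512 channels for the four ResNet18 stages) down
# to a common 64 channels, so every later module operates on 64-channel maps.
import torch.nn as nn

def make_channel_reduce(cin: int, cout: int = 64) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

# One module per scale of the backbone:
reducers = [make_channel_reduce(c) for c in (64, 128, 256, 512)]
```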
Step 106: inputting the multi-scale difference features into an adjacent feature gradual fusion module to obtain fusion features; the adjacent feature gradual fusion module is used for enhancing the change features by utilizing complementary information between adjacent scale difference features through boundary compensation and knowledge review branches to obtain multi-scale primary fusion features, and continuing to fuse the adjacent features of the adjacent scale primary fusion features until a single-scale fusion feature is obtained.
Specifically, in convolutional neural networks, different levels of the convolution layers correspond to different degrees of feature representation. Low-level features typically contain the edge and texture information of the image, while high-level features extract the shape and semantic information of objects. Multi-level integration can aggregate spatial detail and semantic information. However, there is a semantic gap between low-level and high-level features, so fusing them directly may lose information or introduce unnecessary noise. Feature integration between adjacent layers is therefore an effective strategy to address this problem. The application provides an adjacent feature gradual fusion module that utilizes the complementary information between adjacent-layer features to refine spatial details and semantic information simultaneously.
Step 108: and predicting according to the fusion characteristics to obtain a remote sensing change detection prediction result.
The weight-shared twin feature extraction network, the channel reduction and difference enhancement module and the adjacent feature gradual fusion module form the remote sensing change detection model based on gradual fusion of adjacent features (the AFPF-Net network), whose structure is shown in FIG. 2. Specifically, the bi-temporal remote sensing image pair (imageA, imageB) to be detected is input into the two weight-shared ResNet18 feature extraction branches, which output two groups of multi-scale feature maps. Then the two feature maps of the same scale in the two groups are input into the corresponding channel reduction module, which first unifies the channel number of the bi-temporal feature pair to 64; the channel-reduced bi-temporal feature maps are then input into the feature difference enhancement module (FDE module) for feature difference enhancement, and the enhanced change features are sent both to the adjacent feature gradual fusion module (AFPF module) and to the FDE module of the adjacent scale. After all enhanced change features have been input into the AFPF module, the module fuses adjacent feature difference maps step by step, realizing the fusion of adjacent feature maps through a boundary feature compensation branch and a knowledge review branch. Finally, the final prediction result is obtained through a 1×1 convolution and a Sigmoid function.
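Composing the sketches given elsewhere in this description, one plausible (illustrative, not authoritative) end-to-end forward pass reads:

```python
# Illustrative end-to-end wiring of the AFPF-Net forward pass, composing the
# encoder, channel reduction, FDE cascade, progressive fusion and prediction
# head sketched in the surrounding sections; all names are hypothetical.
def afpf_net_forward(img_a, img_b, encoder, reducers, fdes, fusion_blocks, head):
    feats_a, feats_b = encoder(img_a, img_b)          # four-scale Siamese features
    diffs, prev = [], None
    for fa, fb, cr, fde in zip(feats_a, feats_b, reducers, fdes):
        prev = fde(cr(fa), cr(fb), prev)              # CR + FDE cascade
        diffs.append(prev)
    fused = progressive_fusion(*diffs, fusion_blocks) # six FB modules
    return head(fused)                                # 1x1 conv + upsample + Sigmoid
```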
In the remote sensing change detection method based on the gradual fusion of adjacent features, the method provides a remote sensing change detection model based on the gradual fusion of adjacent features, and the model comprises two innovation modules: the feature difference enhancement module and the adjacent feature gradual fusion module are used for gradually fusing the features; the remote sensing change detection model not only reserves an original feature map, but also adopts the feature difference map of adjacent scales to take the mean value to reduce noise interference and strengthen change features, and simultaneously carries out adjacent feature fusion on multi-scale features so as to relieve the problems of information loss and boundary blurring caused by semantic difference among the features of different scales. Compared with a classical network, the method has better effectiveness and better balance between accuracy and calculation cost.
In one embodiment, the weight-shared twin feature extraction network is a weight-shared twin network consisting of four residual modules in the ResNet network; the multi-scale bi-temporal feature map is a feature output by four residual modules of the weight-sharing twin feature extraction network.
Specifically, the first residual structure is shown in FIG. 3(a). To improve model performance, the weight-shared twin feature extraction network employs a ResNet-18 pre-trained on an image dataset, but does not use the convolution layer and pooling layer before the residual blocks, and the pooling layer and fully connected layer at the end of the network are removed; the result serves as a feature extraction branch of the weight-shared twin feature extraction network. Thus, the four encoder blocks of the feature extraction branch consist of several residual blocks. A residual block adds a skip connection to the convolution block, as shown in FIG. 3(a), which accelerates model convergence and prevents gradient vanishing. The second configuration, shown in FIG. 3(b), is a modified version of FIG. 3(a) that can be applied when the channel count changes after the convolution layers. The feature extraction network adopts a weight-shared twin structure, which improves the generalization of the model and reduces its computation and parameter count. Inputting the bi-temporal remote sensing image pair into the weight-shared twin feature extraction network yields four pairs of original features of different scales and depths. Equation (1) gives the feature map transformation through the first residual structure: given an input $X_{in\_1}$, the output $X_{out\_1}$ is obtained. Equation (2) gives the feature map transformation through the second residual block: given an input $X_{in\_2}$, the output $X_{out\_2}$ is obtained. The outputs of the two residual structures are:

$$X_{out\_1} = \mathrm{ReLU}\big(\mathrm{BN}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}(X_{in\_1}))))) + X_{in\_1}\big) \quad (1)$$

$$X_{out\_2} = \mathrm{ReLU}\big(\mathrm{BN}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}(X_{in\_2}))))) + \mathrm{Conv}(X_{in\_2})\big) \quad (2)$$
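For illustration, Equations (1) and (2) transcribe into PyTorch as follows; the 1×1 kernel of the projection shortcut is an assumed detail, since Equation (2) only specifies a convolution.

```python
# Illustrative transcription of the two residual structures of Equations (1)
# and (2): identity shortcut vs. projection shortcut for channel/stride changes.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, cin: int, cout: int, stride: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride, 1, bias=False),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, 1, 1, bias=False),
            nn.BatchNorm2d(cout),
        )
        # Equation (1): identity shortcut; Equation (2): projected shortcut.
        self.shortcut = (nn.Identity() if cin == cout and stride == 1
                         else nn.Conv2d(cin, cout, 1, stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + self.shortcut(x))
```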
in one embodiment, the multi-scale temporal feature comprises 4 different-scale bi-temporal feature maps; the channel reduction and difference enhancement module comprises four characteristic enhancement branches, wherein each characteristic enhancement branch comprises a channel reduction module and a characteristic difference enhancement module; step 104 comprises: respectively inputting 4 double-temporal feature maps with different scales into four feature enhancement branches, and carrying out channel reduction through a channel reduction module; inputting the first scale bi-temporal feature map after channel reduction into a feature difference enhancement module of a first feature enhancement branch to obtain a first scale difference feature map; inputting the second scale bi-temporal feature map and the first scale difference feature map after channel reduction into a feature difference enhancement module of a second feature enhancement branch to obtain a second scale difference feature map; inputting the channel-reduced third-scale bi-temporal feature map and the second-scale difference feature map into a feature difference enhancement module of a third feature enhancement branch to obtain a third-scale difference feature map; and inputting the fourth-scale bi-temporal feature map and the third-scale difference feature map after channel reduction into a feature difference enhancement module of a fourth feature enhancement branch to obtain a fourth-scale difference feature map.
In one embodiment, inputting the channel-reduced second-scale bi-temporal feature maps and the first-scale difference feature map into the feature difference enhancement module of the second feature enhancement branch to obtain the second-scale difference feature map includes: performing an element-wise subtraction on the channel-reduced second-scale bi-temporal feature maps followed by an absolute value operation to obtain a rough difference feature map; extracting features from the rough difference feature map through a first convolution layer and passing the result through a first spatial attention module to obtain a first feature difference attention map; downsampling the first-scale difference feature map through a second convolution layer and passing the downsampled result through a second spatial attention module to obtain a second feature difference attention map; averaging the first and second feature difference attention maps to obtain a refined feature difference attention map; multiplying each channel-reduced second-scale bi-temporal feature map pixel by pixel with the refined feature difference attention map to obtain enhanced bi-temporal feature maps; adding each enhanced bi-temporal feature map to the corresponding channel-reduced second-scale bi-temporal feature map and applying a convolution operation to obtain refined bi-temporal feature maps; concatenating the refined bi-temporal feature maps, inputting the result into a channel attention module, and multiplying the obtained attention with the concatenated result to obtain a channel-enhanced feature map; and adding the channel-enhanced feature map to the convolved rough difference feature map and passing the sum through a convolution layer to obtain the second-scale difference feature map.
Specifically, the multi-scale bi-temporal features extracted by ResNet18 are denoted $F_i^{t1}$ and $F_i^{t2}$, $i \in \{1,2,3,4\}$. Taking the $i=2$ bi-temporal features as an example, the channel reduction module (CR) in the channel reduction and difference enhancement part of FIG. 2 first unifies the bi-temporal features to 64 channels, which reduces the computational load and memory usage, and then inputs them into the FDE module. The input in FIG. 4 consists of three parts, $F_i^{t1}$, $F_i^{t2}$ and the feature difference map of the previous scale; since $i=1$ is the lowest bi-temporal feature pair, it has only two inputs, and the input consists of three parts only for $i \in \{2,3,4\}$. In the FDE module, the rough difference feature map $D_{di}$ is computed through an element-wise subtraction operation followed by an absolute value operation:

$$D_{di} = \left| F_i^{t1} \ominus F_i^{t2} \right| \quad (3)$$

Then difference features are extracted by a 3×3 convolution, and a feature difference attention map $A_i^{1}$ is obtained through a spatial attention module (SAM). The enhanced feature $D_{d(i-1)}$ of the previous scale likewise has its difference features extracted through a 3×3 convolution layer, with simultaneous downsampling so that its spatial dimensions are consistent with those of the $i=2$ bi-temporal features, and a feature difference attention map $A_i^{2}$ is obtained through a spatial attention module (SAM). By averaging the two feature difference attention maps into a refined map

$$\bar{A}_i = \tfrac{1}{2}\big(A_i^{1} + A_i^{2}\big),$$

the weight of pixels in truly changed regions is relatively increased, while the weight of pixels in unchanged regions is relatively decreased. Here $|\cdot|$ denotes the absolute value operation, $\ominus$ denotes pixel-wise subtraction, $D_{di}$ is the obtained feature difference map, $D_{d(i-1)}$ is the feature difference map of the previous scale, $\bar{A}_i$ is the refined feature difference attention map, $\mathrm{Conv}_{3\times3}(\cdot)$ contains a 3×3 convolution layer, a batch normalization and a ReLU activation function, and SAM denotes the spatial attention module. The changed regions of the temporal features are further highlighted by a pixel-wise multiplication operation, and the enhanced temporal features are added to the original features to improve the feature representation; the refined bi-temporal features are then extracted again by a 3×3 convolution layer:

$$\hat{F}_i^{t1} = \mathrm{Conv}_{3\times3}\big(F_i^{t1} \otimes \bar{A}_i \oplus F_i^{t1}\big), \qquad \hat{F}_i^{t2} = \mathrm{Conv}_{3\times3}\big(F_i^{t2} \otimes \bar{A}_i \oplus F_i^{t2}\big)$$

where $\oplus$ denotes pixel-wise addition, $\otimes$ denotes pixel-wise multiplication, and $\hat{F}_i^{t1}$ and $\hat{F}_i^{t2}$ are the refined temporal features at times t1 and t2, respectively. They are concatenated, channel correlation is modeled using a channel attention module (CAM), and the channel-enhanced features are input into a 3×3 convolution layer to suppress unimportant channels. The refined bi-temporal difference feature is added to the earlier rough feature difference map to compensate for the lost difference information, and the refined difference feature map is finally extracted through a 3×3 convolution:

$$D_i = \mathrm{Conv}_{3\times3}\Big(\mathrm{Conv}_{3\times3}\big(\mathrm{CAM}(F_c) \otimes F_c\big) \oplus \mathrm{Conv}_{3\times3}(D_{di})\Big), \qquad F_c = \mathrm{Cat}\big(\hat{F}_i^{t1},\ \hat{F}_i^{t2}\big)$$

where $\mathrm{Cat}(\cdot)$ is the feature concatenation operation, CAM denotes the channel attention module, $D_{di}$ is the feature difference map obtained in Equation (3), and $D_i$ is the obtained $i$-th scale difference feature map. Applying the FDE module to the four bi-temporal features of different scales extracts and fuses the feature information of the bi-temporal images and generates the multi-scale difference feature maps.
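As a quick usage check of the FDE sketch given after the step list above (names illustrative), the shapes behave as described: the previous-scale difference map is downsampled to match scale i, and the output keeps the 64-channel, scale-i resolution.

```python
# Illustrative shape check for the FDEModule sketched earlier.
import torch

fde = FDEModule(ch=64)
fa = torch.rand(1, 64, 64, 64)      # t1 features at scale i
fb = torch.rand(1, 64, 64, 64)      # t2 features at scale i
prev = torch.rand(1, 64, 128, 128)  # difference map from the previous, finer scale
d_i = fde(fa, fb, prev)
print(d_i.shape)                    # torch.Size([1, 64, 64, 64])
```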
In one embodiment, the adjacent feature gradual fusion module comprises a plurality of feature fusion modules (FB modules); step 106 includes: inputting the first-scale difference feature map and the second-scale difference feature map into the first feature fusion module to obtain the first-scale primary fusion feature; inputting the second-scale difference feature map and the third-scale difference feature map into the second feature fusion module to obtain the second-scale primary fusion feature; inputting the third-scale difference feature map and the fourth-scale difference feature map into the third feature fusion module to obtain the third-scale primary fusion feature; inputting the first-scale primary fusion feature and the second-scale primary fusion feature into the fourth feature fusion module to obtain the first-scale secondary fusion feature; inputting the second-scale primary fusion feature and the third-scale primary fusion feature into the fifth feature fusion module to obtain the second-scale secondary fusion feature; and inputting the first-scale secondary fusion feature and the second-scale secondary fusion feature into the sixth feature fusion module to obtain the fusion feature.
In one embodiment, the first scale difference feature map and the second scale difference feature map are input into a first feature fusion module, and the first scale primary fusion feature is obtained by:
$$b = 1 - a \quad (15)$$

$$d = c \cdot (1-a) + a \cdot (1-c) \quad (16)$$

$$c = \mathrm{Mask}(D_i) \quad (17)$$

wherein $D_i^{f}$ is the first-scale primary fusion feature, $D_i$ is the first-scale difference feature map, $D_{i+1}$ is the second-scale difference feature map, $\tilde{D}_{i+1}$ is the upsampling feature of the second-scale difference feature map, $a$ and $c$ are the predicted change maps obtained by applying Mask to the change features $\tilde{D}_{i+1}$ and $D_i$ respectively, $d$ is the conflict attention map obtained between $D_i$ and $\tilde{D}_{i+1}$, $b$ denotes the boundary compensation attention map, $\mathrm{Cat}(\cdot)$ is the feature concatenation operation, $\mathrm{Conv}_{1\times1}(\cdot)$ comprises a 1×1 convolution module, a batch normalization and a ReLU activation function, $\hat{F}_i^{1}$ and $\hat{F}_i^{2}$ are the two enhanced features, $\mathrm{Conv}_{3\times3}(\cdot)$ contains a 3×3 convolution module, a batch normalization and a ReLU activation function, CAM is the channel attention module, and Mask is the change map prediction module.
Specifically, the structure of the adjacent feature gradual fusion module (AFPF module) is shown in FIG. 5. Its inputs are the temporal difference features $D_i$, $i \in \{1,2,3\}$, and a multi-branch structure equipped with boundary compensation and knowledge review is used to capture the complementary information between the temporal difference features $D_i$ and $D_{i+1}$.
As described above, the multi-level temporal difference features capture different aspects of the changed objects. To force the network to pay more attention to the conflicts between the changed regions of high-level and low-level features and to extract complementary information, a knowledge review branch is implemented. In FIG. 5, $D_{i+1}$ is upsampled through bilinear interpolation to $\tilde{D}_{i+1}$ so that its spatial size is consistent with that of $D_i$; $a$ and $c$ are the predicted change maps obtained from the change features $\tilde{D}_{i+1}$ and $D_i$, respectively, by the change map prediction module (Mask module). The knowledge review branch computes the conflicting portion between $a$ and $c$. This process can be expressed as:

$$c = \mathrm{Mask}(D_i) \quad (22)$$

$$d = c \cdot (1-a) + a \cdot (1-c) \quad (23)$$
where $d$ is the conflict attention map obtained between $D_i$ and $\tilde{D}_{i+1}$. In addition to the conflict attention map, a boundary compensation attention map is introduced to obtain boundary information. High-level features are effective for object localization but provide little boundary information about the changed objects. Therefore, low-level features are used to compensate for the detail information in the high-level features under the guidance of the boundary compensation attention map. In particular, the boundary compensation attention map masks the changed regions predicted from the high-level features, forcing the network to pay more attention to the unchanged boundary regions. This process can be expressed as:

$$b = 1 - a \quad (24)$$
where $b$ denotes the boundary compensation attention map. Thereafter, the AFPF module concatenates $D_i$ and $\tilde{D}_{i+1}$ and performs a feature transition through two 1×1 convolution layers, splitting the concatenated features into two branches to extract change features so that the network captures different change regions. This process can be expressed as:

$$F_i^{1} = \mathrm{Conv}_{1\times1}\big(\mathrm{Cat}(D_i,\ \tilde{D}_{i+1})\big) \quad (25)$$

$$F_i^{2} = \mathrm{Conv}_{1\times1}\big(\mathrm{Cat}(D_i,\ \tilde{D}_{i+1})\big) \quad (26)$$

where $\mathrm{Cat}(\cdot)$ is the feature concatenation operation, $\mathrm{Conv}_{1\times1}(\cdot)$ contains a 1×1 convolution layer, a batch normalization and a ReLU activation function, and $F_i^{1}$ and $F_i^{2}$ are the input features of the two branches of the AFPF module. The conflict attention map and the boundary compensation attention map are inserted into the two branches, so that the network captures boundary information and fine-grained change regions. In addition, a 3×3 convolution layer following channel attention is injected into each branch of the AFPF module to enhance the feature representation capability and suppress unimportant channels; in these operations, $\hat{F}_i^{1}$ and $\hat{F}_i^{2}$ are the enhanced features, $F_i^{1}$ and $F_i^{2}$ are the results given in Equations (25) and (26), $\oplus$ denotes pixel-wise addition, $\otimes$ denotes pixel-wise multiplication, and CAM denotes the channel attention module. Finally, $\hat{F}_i^{1}$ and $\hat{F}_i^{2}$ are concatenated to generate the final temporal difference feature through a 3×3 convolution layer:

$$D_i^{f} = \mathrm{Conv}_{3\times3}\big(\mathrm{Cat}(\hat{F}_i^{1},\ \hat{F}_i^{2})\big)$$
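The following sketch assembles one feature fusion block from Equations (22) to (26), reusing conv_bn_relu, ChannelAttention and make_mask_module from the earlier sketches. How exactly the conflict map d and the boundary map b modulate their branches is not fully recoverable from the text, so the multiply-then-add residual form used here is an assumption.

```python
# Illustrative sketch of one feature fusion block (FB) of the AFPF module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.mask_hi = make_mask_module(ch)  # Mask for upsampled D_{i+1}
        self.mask_lo = make_mask_module(ch)  # Mask for D_i
        self.trans1 = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.BatchNorm2d(ch),
                                    nn.ReLU(inplace=True))
        self.trans2 = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.BatchNorm2d(ch),
                                    nn.ReLU(inplace=True))
        self.cam1, self.cam2 = ChannelAttention(ch), ChannelAttention(ch)
        self.conv1 = conv_bn_relu(ch, ch, 3)
        self.conv2 = conv_bn_relu(ch, ch, 3)
        self.out = conv_bn_relu(2 * ch, ch, 3)

    def forward(self, d_lo: torch.Tensor, d_hi: torch.Tensor) -> torch.Tensor:
        d_hi = F.interpolate(d_hi, size=d_lo.shape[-2:],
                             mode="bilinear", align_corners=False)
        a, c = self.mask_hi(d_hi), self.mask_lo(d_lo)  # predicted change maps
        d = c * (1 - a) + a * (1 - c)                  # conflict attention, Eq. (23)
        b = 1 - a                                      # boundary compensation, Eq. (24)
        cat = torch.cat([d_lo, d_hi], dim=1)
        f1, f2 = self.trans1(cat), self.trans2(cat)    # Eqs. (25) and (26)
        f1 = f1 * d + f1                               # knowledge review branch (assumed form)
        f2 = f2 * b + f2                               # boundary branch (assumed form)
        f1 = self.conv1(self.cam1(f1) * f1)            # CAM then 3x3 conv
        f2 = self.conv2(self.cam2(f2) * f2)
        return self.out(torch.cat([f1, f2], dim=1))    # final fused feature
```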
as shown in the step-by-step fusion module part of adjacent features in FIG. 2, complementary information among multi-scale time difference features is fully captured through the step-by-step fusion mode of adjacent feature graphs, the capability of detecting real changes and the robustness to pseudo changes of a network are improved, and high-level features are utilized to guide the low-level feature modes so as to further improve the boundary of a change region and finally generate a better change detection result.
In one embodiment, the change map prediction module includes a 3×3 convolutional layer, a batch normalization layer, a ReLU activation function, a 1×1 convolutional layer, and a Sigmoid activation function.
In one embodiment, step 108 includes: processing the fusion feature with a convolution layer having a 1×1 convolution kernel, then performing bilinear interpolation upsampling and Sigmoid activation to obtain the remote sensing change detection prediction result.
It should be understood that, although the steps in the flowchart of FIG. 1 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to this order, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, and may be performed in turn or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
In one validation example, to investigate the effectiveness of the proposed method, experiments were performed on two public and challenging datasets, WHU-CD and LEVIR-CD.
WHU-CD is a public building change detection dataset. The dataset records building changes in the Christchurch region of New Zealand and comprises a pair of images of 32507×15354 pixels with a spatial resolution of 0.2 m. The two original images were first cut into non-overlapping 256×256 patches, and the dataset was then randomly divided into 5947 training pairs, 744 validation pairs and 743 test pairs for the experiments.
The LEVIR-CD dataset consists of 637 bi-temporal high-resolution remote sensing image pairs of 1024×1024 pixels with a spatial resolution of 0.5 m. Cutting the original images into 256×256 patches yields 10192 pairs of 256×256 high-resolution remote sensing images. The dataset was randomly partitioned in a 7:1:2 ratio, with the training set containing 7120 image pairs, the validation set containing 1024 image pairs, and the remaining 2048 image pairs forming the test set.
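For illustration, the non-overlapping 256×256 tiling described for both datasets can be done as follows (a sketch under the assumption that images are loaded as NumPy arrays; paths and naming are hypothetical):

```python
# Illustrative non-overlapping tiling of a large image into 256x256 patches.
import numpy as np

def tile_image(img: np.ndarray, size: int = 256) -> list[np.ndarray]:
    h, w = img.shape[:2]
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]
```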
(1) Experimental environment
The experiments used the PyTorch framework and the Python language, and all methods were run on an Intel Xeon Gold 5315Y CPU and an NVIDIA A800 GPU in order to fairly compare the effectiveness of the models.
(2) Experimental results
To evaluate the network performance of AFPF-Net, several representative networks from the change detection field in recent years were chosen for comparative experiments, comprising 3 purely convolutional methods (FC-EF, FC-Siam-Di and FC-Siam-Conc), 5 attention-based methods (the deeply supervised image fusion network SNUNet, the Transformer-based Siamese network ChangeFormer, BIT, which connects CNN and Transformer in series, ICIF-Net, which connects CNN and Transformer in parallel, and the two-branch multi-level crossing network DMINet), and 1 method based on 3D convolution in which the features of adjacent layers are fused in a crossed manner. All experiments were run under the same variable settings and environment, and the experimental results show that the newly proposed AFPF-Net architecture achieves higher accuracy than these networks and also has advantages in computation and parameter count. Table 1 gives a comparison of the experimental results of the different models on the LEVIR-CD and WHU-CD datasets:
Table 1. Comparison of experimental results of different models on the LEVIR-CD and WHU-CD datasets
In one embodiment, as shown in FIG. 6, there is provided a remote sensing change detection device based on gradual fusion of adjacent features, comprising: a bi-temporal remote sensing image pair acquisition module, a feature extraction module, a multi-scale difference feature map extraction module, a feature fusion module and a remote sensing change detection module, wherein:
And the double-time remote sensing image pair acquisition module is used for acquiring the double-time remote sensing image pair to be detected.
The feature extraction module is used for inputting the bi-temporal remote sensing image pair to be detected into the weight-shared twin feature extraction network to obtain the multi-scale bi-temporal feature maps;
the multi-scale difference feature map extraction module is used for inputting the multi-scale double-temporal feature map into the channel reduction and difference enhancement module to obtain a multi-scale difference feature map; the channel reduction and difference enhancement module is used for carrying out channel reduction and characteristic difference enhancement on the double-temporal characteristic map of each scale by adopting multiple branches, and the enhanced characteristics are input into the branches of adjacent scales for characteristic difference enhancement.
The feature fusion module is used for inputting the multi-scale difference feature map into the adjacent feature gradual fusion module to obtain fusion features; the adjacent feature gradual fusion module is used for enhancing the change features by using the boundary compensation and knowledge review branches and utilizing the complementary information between the adjacent scale difference features to obtain multi-scale primary fusion features, and continuing to fuse the adjacent features of the adjacent scale primary fusion features until the single-scale fusion features are obtained.
And the remote sensing change detection module is used for predicting the fusion characteristics to obtain a remote sensing change detection prediction result.
In one embodiment, the weight-shared twin feature extraction network is a weight-shared twin network consisting of four residual modules in a ResNet18 network; the multi-scale bi-temporal feature map is a feature output by four residual modules of the weight-sharing twin feature extraction network.
In one embodiment, the multi-scale bi-temporal features comprise 4 bi-temporal feature maps of different scales; the channel reduction and difference enhancement module comprises four feature enhancement branches, each comprising a channel reduction module and a feature difference enhancement module; the multi-scale difference feature map extraction module is further configured to input the 4 bi-temporal feature maps of different scales into the four feature enhancement branches respectively and perform channel reduction through the channel reduction modules; input the channel-reduced first-scale bi-temporal feature maps into the feature difference enhancement module of the first feature enhancement branch to obtain the first-scale difference feature map; input the channel-reduced second-scale bi-temporal feature maps and the first-scale difference feature map into the feature difference enhancement module of the second feature enhancement branch to obtain the second-scale difference feature map; input the channel-reduced third-scale bi-temporal feature maps and the second-scale difference feature map into the feature difference enhancement module of the third feature enhancement branch to obtain the third-scale difference feature map; and input the channel-reduced fourth-scale bi-temporal feature maps and the third-scale difference feature map into the feature difference enhancement module of the fourth feature enhancement branch to obtain the fourth-scale difference feature map.
In one embodiment, the multi-scale difference feature map extraction module is further configured to perform an element-wise subtraction on the channel-reduced second-scale bi-temporal feature maps followed by an absolute value operation to obtain a rough difference feature map; extract features from the rough difference feature map through a first convolution layer and pass the result through a first spatial attention module to obtain a first feature difference attention map; downsample the first-scale difference feature map through a second convolution layer and pass the result through a second spatial attention module to obtain a second feature difference attention map; average the first and second feature difference attention maps to obtain a refined feature difference attention map; multiply each channel-reduced second-scale bi-temporal feature map pixel by pixel with the refined feature difference attention map to obtain enhanced bi-temporal feature maps; add each enhanced bi-temporal feature map to the corresponding channel-reduced second-scale bi-temporal feature map and apply a convolution operation to obtain refined bi-temporal feature maps; concatenate the refined bi-temporal feature maps, input the result into a channel attention module, and multiply the output with the concatenated result to obtain a channel-enhanced feature map; and add the channel-enhanced feature map to the convolved rough difference feature map and pass the sum through a convolution layer to obtain the second-scale difference feature map.
In one embodiment, the adjacent feature gradual fusion module comprises a plurality of feature fusion modules; the feature fusion module is further configured to input the first-scale difference feature map and the second-scale difference feature map into the first feature fusion module to obtain the first-scale primary fusion feature; input the second-scale difference feature map and the third-scale difference feature map into the second feature fusion module to obtain the second-scale primary fusion feature; input the third-scale difference feature map and the fourth-scale difference feature map into the third feature fusion module to obtain the third-scale primary fusion feature; input the first-scale primary fusion feature and the second-scale primary fusion feature into the fourth feature fusion module to obtain the first-scale secondary fusion feature; input the second-scale primary fusion feature and the third-scale primary fusion feature into the fifth feature fusion module to obtain the second-scale secondary fusion feature; and input the first-scale secondary fusion feature and the second-scale secondary fusion feature into the sixth feature fusion module to obtain the fusion feature.
In one embodiment, the feature fusion module is further configured to input the first-scale difference feature map and the second-scale difference feature map into the first feature fusion module to obtain the first-scale primary fusion feature as shown in Equations (11) to (19).
In one embodiment, the change map prediction module in the feature fusion module includes a 3×3 convolution layer, a batch normalization layer, a ReLU activation function, a 1×1 convolution layer, and a Sigmoid activation function.
In one embodiment, the remote sensing change detection module processes the fusion feature by adopting a convolution layer with a convolution kernel of 1×1, and then activates the fusion feature through bilinear interpolation up-sampling and Sigmoid activation function to obtain a remote sensing change detection prediction result.
For specific limitation of the remote sensing change detection device based on the gradual fusion of the adjacent features, reference may be made to the limitation of the remote sensing change detection method based on the gradual fusion of the adjacent features hereinabove, and the description thereof will not be repeated here. All or part of each module in the remote sensing change detection device based on the gradual fusion of adjacent features can be realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a remote sensing change detection method based on gradual fusion of adjacent features. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen; the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of a part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor that implements the steps of the above method embodiments when executing the computer program.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features, as long as it contains no contradiction, should be considered within the scope of this specification.
The above examples represent only a few embodiments of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A remote sensing change detection method based on gradual fusion of adjacent features, characterized by comprising the following steps:
acquiring a bi-temporal remote sensing image pair to be detected;
inputting the bi-temporal remote sensing image pair to be detected into a weight-shared twin feature extraction network to obtain a multi-scale bi-temporal feature map;
inputting the multi-scale bi-temporal feature map into a channel reduction and difference enhancement module to obtain multi-scale difference features; the channel reduction and difference enhancement module is configured to perform channel reduction and feature difference enhancement on the bi-temporal feature map of each scale through multiple branches, the enhanced features being fed into the branch of the adjacent scale for feature difference enhancement;
inputting the multi-scale difference features into an adjacent feature gradual fusion module to obtain a fusion feature; the adjacent feature gradual fusion module is configured to enhance the change features through boundary compensation and knowledge review branches using the complementary information between adjacent-scale difference features to obtain multi-scale primary fusion features, and to continue fusing adjacent-scale primary fusion features until a single-scale fusion feature is obtained; and
predicting according to the fusion feature to obtain a remote sensing change detection prediction result.
2. The method according to claim 1, wherein the weight-shared twin feature extraction network is a weight-shared twin network composed of four residual modules of a ResNet network;
the multi-scale bi-temporal feature map comprises the features output by the four residual modules of the weight-shared twin feature extraction network.
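One way such an encoder could be realized; resnet18 from torchvision is an assumption, since the claim only requires four residual modules of a ResNet network.

```python
import torch.nn as nn
from torchvision.models import resnet18

class SiameseResNet(nn.Module):
    # Weight-shared twin encoder: the identical four residual stages are
    # applied to both images of the pair (weight sharing follows from
    # reusing the same modules). resnet18 and random init are assumptions.
    def __init__(self):
        super().__init__()
        r = resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])

    def encode(self, x):
        x = self.stem(x)
        feats = []
        for stage in self.stages:   # four residual modules
            x = stage(x)
            feats.append(x)         # scales 1/4, 1/8, 1/16, 1/32
        return feats

    def forward(self, t1, t2):
        return self.encode(t1), self.encode(t2)
```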
3. The method of claim 1, wherein the multi-scale bi-temporal feature map comprises four bi-temporal feature maps of different scales; the channel reduction and difference enhancement module comprises four feature enhancement branches, each feature enhancement branch comprising a channel reduction module and a feature difference enhancement module;
inputting the multi-scale bi-temporal feature map into the channel reduction and difference enhancement module to obtain the multi-scale difference features comprises:
respectively inputting the four bi-temporal feature maps of different scales into the four feature enhancement branches, and performing channel reduction through the channel reduction module;
inputting the first scale bi-temporal feature map after channel reduction into a feature difference enhancement module of a first feature enhancement branch to obtain a first scale difference feature map;
inputting the second scale bi-temporal feature map and the first scale difference feature map after channel reduction into a feature difference enhancement module of a second feature enhancement branch to obtain a second scale difference feature map;
inputting the channel-reduced third-scale bi-temporal feature map and the second-scale difference feature map into a feature difference enhancement module of a third feature enhancement branch to obtain a third-scale difference feature map;
and inputting the fourth-scale bi-temporal feature map and the third-scale difference feature map after channel reduction into a feature difference enhancement module of a fourth feature enhancement branch to obtain a fourth-scale difference feature map.
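A sketch of the four-branch wiring recited in this claim, reusing the FeatureDiffEnhance sketch given earlier; the input channel widths (those of a ResNet-18) and the reduced width of 64 are assumptions.

```python
import torch.nn as nn

class ChannelReduceDiffEnhance(nn.Module):
    # Four feature enhancement branches: a 1x1 channel reduction module per
    # scale, followed by a feature difference enhancement module; each branch
    # after the first also receives the previous scale's difference map.
    def __init__(self, in_chs=(64, 128, 256, 512), c=64):
        super().__init__()
        self.reduce = nn.ModuleList([nn.Conv2d(ic, c, 1) for ic in in_chs])
        self.enhance = nn.ModuleList([FeatureDiffEnhance(c) for _ in range(4)])

    def forward(self, feats_t1, feats_t2):
        diffs, prev = [], None
        for i in range(4):
            a = self.reduce[i](feats_t1[i])   # channel-reduced bi-temporal maps
            b = self.reduce[i](feats_t2[i])
            prev = self.enhance[i](a, b, prev)
            diffs.append(prev)                # scale-i difference feature map
        return diffs
```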
4. The method of claim 3, wherein inputting the channel-reduced second-scale bi-temporal feature map and the first-scale difference feature map into a feature difference enhancement module of a second feature enhancement branch to obtain a second-scale difference feature map comprises:
carrying out an element-by-element subtraction followed by an absolute value operation on the channel-reduced second-scale bi-temporal feature maps to obtain a rough difference feature map;
passing the rough difference feature map through a first convolution layer for feature extraction and then through a first spatial attention module to obtain a first feature difference attention map;
downsampling the first-scale difference feature map through a second convolution layer, and passing the downsampled result through a second spatial attention module to obtain a second feature difference attention map;
carrying out a weighted average of the first feature difference attention map and the second feature difference attention map to obtain a refined feature difference attention map;
multiplying each channel-reduced second-scale bi-temporal feature map pixel-by-pixel with the refined feature difference attention map to obtain enhanced bi-temporal feature maps;
adding each enhanced bi-temporal feature map to the corresponding channel-reduced second-scale bi-temporal feature map and performing a convolution operation to obtain refined bi-temporal feature maps;
concatenating the refined bi-temporal feature maps and inputting the result into a channel attention module, and multiplying the obtained output with the concatenation result to obtain a channel-enhanced feature map;
and adding the channel-enhanced feature map to the convolved rough difference feature map, and passing the sum through a convolution layer to obtain the second-scale difference feature map.
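A quick shape check of the second branch under the assumptions of the earlier FeatureDiffEnhance sketch (batch 1, 64 reduced channels, second scale at 32×32, first-scale difference map at 64×64):

```python
import torch

fde = FeatureDiffEnhance(64)
t1 = torch.randn(1, 64, 32, 32)   # channel-reduced second-scale map, time T1
t2 = torch.randn(1, 64, 32, 32)   # channel-reduced second-scale map, time T2
d1 = torch.randn(1, 64, 64, 64)   # first-scale difference feature map
d2 = fde(t1, t2, d1)              # second-scale difference feature map
print(d2.shape)                   # torch.Size([1, 64, 32, 32])
```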
5. The method of claim 1, wherein the adjacent feature gradual fusion module comprises a plurality of feature fusion modules;
inputting the multi-scale difference features into the adjacent feature gradual fusion module to obtain the fusion feature comprises:
inputting the first scale difference feature map and the second scale difference feature map into a first feature fusion module to obtain first scale primary fusion features;
inputting the second scale difference feature map and the third scale difference feature map into a second feature fusion module to obtain a second scale primary fusion feature;
inputting the third scale difference feature map and the fourth scale difference feature map into a third feature fusion module to obtain a third scale primary fusion feature;
inputting the first-scale primary fusion feature and the second-scale primary fusion feature into a fourth feature fusion module to obtain a first-scale secondary fusion feature;
inputting the second-scale primary fusion feature and the third-scale primary fusion feature into a fifth feature fusion module to obtain a second-scale secondary fusion feature;
and inputting the first-scale secondary fusion feature and the second-scale secondary fusion feature into a sixth feature fusion module to obtain the fusion feature.
6. The method of claim 5, wherein inputting the first scale difference feature map and the second scale difference feature map into the first feature fusion module to obtain the first-scale primary fusion feature comprises:
a = Mask(D_{i+1}^{up})
c = Mask(D_i)
b = 1 - a
d = c·(1 - a) + a·(1 - c)
wherein D_i is the first scale difference feature map; D_{i+1} is the second scale difference feature map; D_{i+1}^{up} is the upsampled feature of the second scale difference feature map; a and c are the predicted change maps obtained by applying Mask to D_{i+1}^{up} and to D_i, respectively; b denotes the boundary compensation attention map; d is the conflict attention map obtained between D_i and D_{i+1}^{up}; Cat(·) is the feature concatenation operation; Conv_{1×1}(·) comprises a 1×1 convolution module, a batch normalization and a ReLU activation function; D̃_i and D̃_{i+1} are the two enhanced features; Conv_{3×3}(·) comprises a 3×3 convolution module, a batch normalization and a ReLU activation function; CAM is the channel attention module; and Mask is the change map prediction module, the first-scale primary fusion feature being obtained from the two enhanced features through the above operations.
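Read literally, the recited attention maps could be wired into a fusion block as below; where each attention map is applied, the residual form of the enhancement, and the sharing of one Mask head for both a and c are assumptions layered on the claim text (mask_head and ChannelAttention refer to the earlier sketches).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionModule(nn.Module):
    # Boundary compensation (b) enhances the finer feature D_i; the conflict /
    # knowledge-review attention (d) enhances the upsampled coarser feature.
    def __init__(self, c):
        super().__init__()
        self.mask = mask_head(c)                # change map prediction module
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(2 * c, c, 1), nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        self.cam = ChannelAttention(c)

    def forward(self, d_i, d_next):
        up = F.interpolate(d_next, size=d_i.shape[-2:], mode="bilinear",
                           align_corners=False)  # D_{i+1}^{up}
        a = self.mask(up)                        # a = Mask(D_{i+1}^{up})
        c_ = self.mask(d_i)                      # c = Mask(D_i)
        b = 1 - a                                # boundary compensation attention map
        d = c_ * (1 - a) + a * (1 - c_)          # conflict attention map
        e_i = d_i + d_i * b                      # enhanced feature (residual form assumed)
        e_up = up + up * d                       # enhanced feature (residual form assumed)
        fused = self.conv1x1(torch.cat([e_i, e_up], 1))  # Cat + Conv_1x1
        fused = fused * self.cam(fused)          # CAM
        return self.conv3x3(fused)               # Conv_3x3 -> primary fusion feature
```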
7. The method of claim 6, wherein the change map prediction module comprises a 3 x 3 convolutional layer, a batch normalization layer, a ReLU activation function, a 1 x 1 convolutional layer, and a Sigmoid activation function.
8. The method of claim 1, wherein predicting according to the fusion feature to obtain the remote sensing change detection prediction result comprises:
processing the fusion feature with a convolution layer having a 1×1 convolution kernel, and then applying bilinear interpolation upsampling and a Sigmoid activation function to obtain the remote sensing change detection prediction result.
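Assembling the earlier sketches gives an end-to-end pass matching the method claims; everything here remains an assumption-laden illustration rather than the disclosed implementation.

```python
import torch
import torch.nn as nn

class ChangeDetectionNet(nn.Module):
    # End-to-end assembly of the earlier sketches: twin encoder -> channel
    # reduction and difference enhancement -> adjacent feature gradual
    # fusion -> prediction head.
    def __init__(self, c=64):
        super().__init__()
        self.encoder = SiameseResNet()
        self.crde = ChannelReduceDiffEnhance(c=c)
        self.fusion = AdjacentFeatureFusion(c)
        self.head = PredictionHead(c)

    def forward(self, t1, t2):
        f1, f2 = self.encoder(t1, t2)         # multi-scale bi-temporal feature maps
        d1, d2, d3, d4 = self.crde(f1, f2)    # multi-scale difference features
        fused = self.fusion(d1, d2, d3, d4)   # single-scale fusion feature
        return self.head(fused)               # change probability map

# e.g.: net = ChangeDetectionNet()
#       p = net(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
```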
9. A remote sensing change detection device based on gradual fusion of adjacent features, the device comprising:
the bi-temporal remote sensing image pair acquisition module is used for acquiring a bi-temporal remote sensing image pair to be detected;
the feature extraction module is used for inputting the bi-temporal remote sensing image pair to be detected into a weight-shared twin feature extraction network to obtain a multi-scale bi-temporal feature map;
the multi-scale difference feature map extraction module is used for inputting the multi-scale bi-temporal feature map into a channel reduction and difference enhancement module to obtain a multi-scale difference feature map; the channel reduction and difference enhancement module is configured to perform channel reduction and feature difference enhancement on the bi-temporal feature map of each scale through multiple branches, the enhanced features being fed into the branch of the adjacent scale for feature difference enhancement;
the feature fusion module is used for inputting the multi-scale difference feature map into an adjacent feature gradual fusion module to obtain a fusion feature; the adjacent feature gradual fusion module is configured to enhance the change features through boundary compensation and knowledge review branches using the complementary information between adjacent-scale difference features to obtain multi-scale primary fusion features, and to continue fusing adjacent-scale primary fusion features until a single-scale fusion feature is obtained; and
the remote sensing change detection module is used for predicting according to the fusion feature to obtain a remote sensing change detection prediction result.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 8 when executing the computer program.
CN202311507838.5A 2023-11-13 2023-11-13 Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features Pending CN117496352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311507838.5A CN117496352A (en) 2023-11-13 2023-11-13 Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features

Publications (1)

Publication Number Publication Date
CN117496352A true CN117496352A (en) 2024-02-02

Family

ID=89670484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311507838.5A Pending CN117496352A (en) 2023-11-13 2023-11-13 Remote sensing change detection method, device and equipment based on gradual fusion of adjacent features

Country Status (1)

Country Link
CN (1) CN117496352A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117853738A (en) * 2024-03-06 2024-04-09 贵州健易测科技有限公司 Image processing method and device for grading tea leaves
CN117853738B (en) * 2024-03-06 2024-05-10 贵州健易测科技有限公司 Image processing method and device for grading tea leaves


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination