CN115049922A - Method and system for detecting change of remote sensing image - Google Patents


Info

Publication number
CN115049922A
Authority
CN
China
Prior art keywords: change; feature; cross; remote sensing; images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210540288.6A
Other languages
Chinese (zh)
Inventor
张凯
赵雪
孙建德
张风
万文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202210540288.6A
Publication of CN115049922A
Legal status: Pending

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00: Computing arrangements based on biological models
                    • G06N 3/02: Neural networks
                        • G06N 3/08: Learning methods
            • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 20/00: Scenes; scene-specific elements
                    • G06V 20/10: Terrestrial scenes
                • G06V 10/00: Arrangements for image or video recognition or understanding
                    • G06V 10/40: Extraction of image or video features
                        • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
                    • G06V 10/70: Arrangements using pattern recognition or machine learning
                        • G06V 10/764: using classification, e.g. of video objects
                        • G06V 10/82: using neural networks

Abstract

The invention provides a method and a system for detecting changes in remote sensing images, which extract features from the pre-change and post-change images and extract local information through a CNN. Built on cross-temporal Transformer and convolutional neural network structures, the method makes full use of the Transformer's characteristics: the pre- and post-change images respectively enter cross-temporal Transformers with a new attention mechanism and Transformer decoders, global change information is extracted, and the final change detection result is obtained through feature stacking.

Description

Method and system for detecting change of remote sensing image
Technical Field
The disclosure belongs to the technical field of image processing, and particularly relates to a method and a system for detecting changes in remote sensing images.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Change detection techniques detect changes between images of the same area acquired at different times, and have been applied in many fields such as land use, urban expansion, farmland change, and forest protection.
The core of change detection is the extraction of image change information, for which there are mainly three approaches. 1) Pixel-based extraction of change information: the algorithm takes the pixel as the processing unit and computes change information pixel by pixel; it detects large-area changes well but is prone to noise. 2) Time-series analysis: it targets long-term effects but places high demands on temporal resolution. 3) Deep learning methods: they offer end-to-end network structures and are now widely used in change detection, including neural networks, deep neural networks, and recurrent neural networks; in recent years, the Transformer has also been applied to computer vision. However, existing change detection methods do not fully exploit the Transformer's characteristics and use it only as a feature-learning tool.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a method and a system for detecting changes in remote sensing images. Built on cross-temporal Transformer and convolutional neural network structures, they make full use of the Transformer's characteristics: the pre- and post-change images respectively enter cross-temporal Transformers with a new attention mechanism and Transformer decoders to capture global change information, local information is extracted in combination with a CNN, and the final change detection result is obtained through feature stacking.
According to some embodiments, the following technical scheme is adopted in the disclosure:
a method for detecting changes in remote sensing images comprises the following training steps:
acquiring a pre-change image and a post-change image and preprocessing them to obtain paired training data;
based on a deep neural network structure with a cross-temporal Transformer, performing feature extraction on the image data before and after the change using convolutional layers to obtain original feature maps;
capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the deep neural network to obtain change features;
and stacking the obtained change features with the original feature maps and performing convolutional feature extraction to obtain the final change detection map.
According to other embodiments, the present disclosure further provides a remote sensing image change detection system, comprising:
an image acquisition module, used for acquiring pre-change and post-change images and preprocessing them to obtain paired training data;
a feature extraction module, used for extracting features of the image data before and after the change using convolutional layers to obtain original feature maps;
a feature blocking module, used for capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the cross-temporal Transformer deep neural network, obtaining change features;
a feature stacking module, used for stacking the obtained change features with the original feature maps;
and a feature training module, used for performing convolutional feature extraction and outputting the final change detection map.
Compared with the prior art, the beneficial effects of the present disclosure are:
The method constructs cross-temporal Transformer and convolutional neural network structures that fully capture the characteristics of image change; it exploits the Transformer's properties, proposes a new attention mechanism to capture the changed regions, and combines with a CNN to achieve accurate change detection.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow chart of a method implementation of the present disclosure;
FIG. 2 is a graph of the results of change detection of the present disclosure;
FIG. 3 is a schematic structural diagram of the two cross-temporal Transformers of the present disclosure;
FIG. 4 is a schematic diagram of the corresponding Multi-Head Cross-Temporal Attention structure in the cross-temporal Transformer of the present disclosure.
Detailed Description
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
Example 1
The present disclosure provides a method for detecting changes of a remote sensing image, comprising the following steps:
S101: acquiring a pre-change image and a post-change image and preprocessing them to obtain paired training data;
S102: based on a deep neural network structure with a cross-temporal Transformer, performing feature extraction on the image data before and after the change using convolutional layers to obtain original feature maps;
S103: capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the network to obtain change features;
S104: stacking the obtained change features with the original feature maps and performing convolutional feature extraction to obtain the final change detection map.
In step S101, the pre-change image and the post-change image of the scene to be detected are acquired and denoted I1 and I2, respectively; during training, the pre-change image I1 and the post-change image I2 are input separately. Specifically, data from a public large building change detection dataset may be used, containing 637 pairs of high-resolution (0.5 m) remote sensing images of size 1024 × 1024.
The images are preprocessed, mainly to regularize their size: the acquired pre- and post-change images are cut into small overlapping blocks of size 3 × 256 × 256, giving paired training data.
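For illustration, a minimal PyTorch sketch of this preprocessing step is given below. The 3 × 256 × 256 tile size comes from the description, while the 50% overlap (stride of 128) and the helper name `split_into_patches` are assumptions made for the example.

```python
import torch

def split_into_patches(image: torch.Tensor, patch: int = 256, stride: int = 128) -> torch.Tensor:
    """Cut a C x H x W image into overlapping C x patch x patch tiles."""
    c, h, w = image.shape
    tiles = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            tiles.append(image[:, top:top + patch, left:left + patch])
    return torch.stack(tiles)  # (num_tiles, C, patch, patch)

# Paired training data: tile the pre- and post-change images identically.
img1 = torch.rand(3, 1024, 1024)  # pre-change image I1
img2 = torch.rand(3, 1024, 1024)  # post-change image I2
pairs = list(zip(split_into_patches(img1), split_into_patches(img2)))
```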
An improved deep neural network is constructed by placing a cross-temporal Transformer module inside it: the network, containing several convolution modules and the cross-temporal Transformer module, extracts features from the pre- and post-change images and captures the changed regions. In the proposed network, the first convolution module uses a residual network (ResNet18) to down-sample the acquired input images and obtain the extracted features.
In step S102, based on the deep neural network structure with the cross-temporal Transformer, feature extraction is performed on the image data before and after the change using convolutional layers to obtain the original feature maps.
Specifically, after the sizes of the pre- and post-change images are fixed, the images I1 and I2 each pass through one of two identical residual network structures for feature extraction, outputting original feature maps F1 and F2 of size 32 × 128 × 128: the pre-change image yields F1 and the post-change image yields F2.
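One possible reading of this feature-extraction stage is sketched below. The patent names ResNet18 and the 32 × 128 × 128 output size, but not the truncation point or how 32 channels are reached; using the stride-2 stem plus a 1 × 1 projection is an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SiameseBackbone(nn.Module):
    """Weight-shared ResNet18 stem producing 32 x 128 x 128 feature maps.

    Where to truncate ResNet18 and how to reach 32 channels are not spelled
    out in the patent; this wiring is one plausible interpretation.
    """
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)  # 3 -> 64 channels, stride 2
        self.proj = nn.Conv2d(64, 32, kernel_size=1)             # 64 -> 32 channels

    def forward(self, x):
        return self.proj(self.stem(x))

backbone = SiameseBackbone()
i1 = torch.rand(1, 3, 256, 256)       # pre-change patch
i2 = torch.rand(1, 3, 256, 256)       # post-change patch
f1, f2 = backbone(i1), backbone(i2)   # each (1, 32, 128, 128), i.e. F1 and F2
```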
In step S103, the changed regions of the extracted feature maps are captured based on the cross-temporal Transformer structure of the cross-temporal Transformer deep neural network, obtaining change features.
feature maps F1 and F2 are subjected to feature blocking to obtain token1 and token2, wherein the feature blocking is realized by combining convolution with a Flatten function and a Transpose function, wherein the convolution is achieved by taking the convolution kernel size as the blocking size, and the step size is the blocking size. The block size was 16 × 16, and the obtained token1 and token2 had a size of 8 × 64 × 32.
The feature maps F1 and F2 extracted from the pre- and post-change images are converted into blocks to obtain token1 and token2, respectively; query and value are then obtained from token1 and key from token2, these are input into the cross-temporal Transformer, and the changed region is captured through the attention mechanism.
Specifically, in the feature blocking, the inputs F1 and F2 are convolved with kernel size and stride equal to the block size, yielding a four-dimensional tensor whose last two dimensions each equal the square root of the number of blocks; the Flatten function reduces this tensor to three dimensions, with the block count as the last dimension, and the Transpose function swaps the last two dimensions, producing token1 and token2.
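This blocking step maps directly onto a convolution followed by Flatten and Transpose, as in the sketch below. The kernel, stride, and block sizes follow the figures quoted above; the class name and module boundaries are illustrative.

```python
import torch
import torch.nn as nn

class FeatureBlocking(nn.Module):
    """Conv + Flatten + Transpose tokenizer, following the description.

    Kernel size == stride == block size (16), so each 16 x 16 block of the
    32 x 128 x 128 feature map becomes one token of embedding dimension 32.
    """
    def __init__(self, in_ch: int = 32, embed_dim: int = 32, block: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, embed_dim, kernel_size=block, stride=block)

    def forward(self, f):                  # f: (B, 32, 128, 128)
        x = self.conv(f)                   # (B, 32, 8, 8); 8 = sqrt(64 blocks)
        x = torch.flatten(x, start_dim=2)  # (B, 32, 64)   - Flatten, blocks last
        return x.transpose(1, 2)           # (B, 64, 32)   - Transpose last two dims

tokenize = FeatureBlocking()
f1 = torch.rand(8, 32, 128, 128)
token1 = tokenize(f1)  # (8, 64, 32), matching the 8 x 64 x 32 quoted in the text
```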
Symmetrically, token2 provides query and value and token1 provides key; these are input to another cross-temporal Transformer, which captures the changed region through the attention mechanism, yielding new tokens token3 and token4.
Specifically, token1 is linearly mapped to obtain query and value, and token2 to obtain key; the resulting query, key, and value are input together into one cross-temporal Transformer, whose newly proposed attention mechanism captures the changed region. Likewise, token2 is linearly mapped to obtain query and value, and token1 to obtain key; these are input into the other cross-temporal Transformer, which captures the changed region in the same way. The outputs are the new tokens token3 and token4.
The corresponding Multi-Head Cross-Temporal Attention structure in the cross-temporal Transformer is expressed as follows:
$$\mathrm{Attention}_1=\mathrm{Softmax}\left(\frac{\mathrm{abs}\left(Q_1K_1^{T}-Q_1K_2^{T}\right)}{\sqrt{d}}\right)V_1$$

$$\mathrm{Attention}_2=\mathrm{Softmax}\left(\frac{\mathrm{abs}\left(Q_2K_2^{T}-Q_2K_1^{T}\right)}{\sqrt{d}}\right)V_2$$
where Q denotes query, K denotes key, and V denotes value; Q1, K1, and V1 are obtained by linear mapping of token1, and Q2, K2, and V2 by linear mapping of token2; d is the number of columns of Q1 and K1; abs() denotes the absolute-value operation; and Softmax is the normalized exponential function.
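A single-head sketch of this attention is given below to make the formula concrete. The multi-head split, the rearrange step, the output projection, and the sharing of one key projection across the two time phases are simplifications or assumptions not fixed by the text.

```python
import math
import torch
import torch.nn as nn

class CrossTemporalAttention(nn.Module):
    """Single-head sketch of Softmax(abs(Q1*K1^T - Q1*K2^T)/sqrt(d)) * V1."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)  # applied to both phases (an assumption)
        self.v = nn.Linear(dim, dim)

    def forward(self, token_self, token_other):
        q1, k1, v1 = self.q(token_self), self.k(token_self), self.v(token_self)
        k2 = self.k(token_other)
        d = q1.size(-1)
        # The difference of affinities between the two time phases highlights
        # tokens whose relationships changed, i.e. the changed regions.
        diff = (q1 @ k1.transpose(-2, -1) - q1 @ k2.transpose(-2, -1)).abs()
        attn = torch.softmax(diff / math.sqrt(d), dim=-1)
        return attn @ v1

attention = CrossTemporalAttention()
token1, token2 = torch.rand(8, 64, 32), torch.rand(8, 64, 32)
token3 = attention(token1, token2)  # changed-region tokens, (8, 64, 32)
token4 = attention(token2, token1)  # symmetric branch
```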
In step S104, the obtained changed features and the original feature map are subjected to feature stacking, and convolution feature extraction is performed to obtain a final change detection map.
The obtained feature maps F1 and F2 are down-sampled; token3 and feature map F1 are input to one Transformer decoder, and token4 and feature map F2 to another Transformer decoder, obtaining change feature maps T1 and T2.
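The decoder wiring is not detailed in the text; one plausible reading, sketched below, treats the flattened (down-sampled) feature map as queries attending to the change tokens as memory, in the style of token-based change detectors. The down-sampling factor used in the demo is an assumption.

```python
import torch
import torch.nn as nn

class TokenDecoder(nn.Module):
    """Sketch of one Transformer-decoder branch (wiring is an assumption)."""
    def __init__(self, dim: int = 32, heads: int = 4, layers: int = 1):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)

    def forward(self, feat: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        queries = feat.flatten(2).transpose(1, 2)  # (B, H*W, C) pixel queries
        out = self.decoder(queries, tokens)        # cross-attend to change tokens
        return out.transpose(1, 2).reshape(b, c, h, w)

decoder = TokenDecoder()
f1_small = torch.rand(1, 32, 32, 32)  # F1 after down-sampling (factor assumed)
token3 = torch.rand(1, 64, 32)        # tokens from the cross-temporal Transformer
t1 = decoder(f1_small, token3)        # change feature map T1, (1, 32, 32, 32)
```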
The obtained change feature map T1 is stacked with feature map F1, and change feature map T2 with feature map F2, yielding new feature change maps T1' and T2'.
The obtained feature change maps T1' and T2' are stacked to obtain a feature map T, which is input to the convolutional feature extraction module to output the final change detection map.
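The stacking and final convolutional extraction might look as follows. Channel-wise concatenation for "stacking", and the depth and width of the head (including the upsample back to the 256 × 256 input resolution) are assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

# Dummy change features and backbone features at 128 x 128 resolution.
t1, t2 = torch.rand(1, 32, 128, 128), torch.rand(1, 32, 128, 128)
f1, f2 = torch.rand(1, 32, 128, 128), torch.rand(1, 32, 128, 128)

t1p = torch.cat([t1, f1], dim=1)   # T1' = [T1, F1] -> (1, 64, 128, 128)
t2p = torch.cat([t2, f2], dim=1)   # T2' = [T2, F2]
t = torch.cat([t1p, t2p], dim=1)   # T = [T1', T2'] -> (1, 128, 128, 128)

# Convolutional feature extraction head (layout is illustrative).
head = nn.Sequential(
    nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(64, 2, kernel_size=3, padding=1),   # two classes: changed / unchanged
)
logits = head(t)  # (1, 2, 256, 256) per-pixel change-detection map
```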
Example 2
A network structure comprising cross-temporal Transformers and a convolutional network is constructed that fully exploits the Transformer's characteristics: the pre- and post-change images respectively enter cross-temporal Transformers with a new attention mechanism and Transformer decoders, and the final change detection result is obtained through feature stacking. The specific steps are as follows:
(1) Inputting the images:
The pre-change image I1 and the post-change image I2 are input respectively, and features are extracted from both by convolutional layers, giving paired training data.
(2) Constructing a deep neural network based on a cross-temporal Transformer:
and (3) performing feature extraction on the images before and after the change and capturing the change area by constructing a deep neural network containing a plurality of convolution modules and a cross-time phase Transformer module. In the deep neural network provided by the invention, a residual error network (Resnet18) is used by a first layer convolution module to extract the characteristics of an input image.
(2a) The pre-change image passes through a residual network module to obtain feature map F1, and the post-change image passes through the same module to obtain feature map F2.
(2b) Feature maps F1 and F2 are converted into token1 and token2, respectively, by the feature blocking module. The blocking is a convolution whose kernel size and stride both equal the block size, combined with the Flatten and Transpose functions: convolving F1 and F2 yields a four-dimensional tensor whose last two dimensions each equal the square root of the number of blocks; the Flatten function reduces it to three dimensions with the block count as the last dimension; and the Transpose function swaps the last two dimensions, producing token1 and token2.
(2c) token1 is linearly mapped to obtain query and value, and token2 to obtain key; these are input together into one cross-temporal Transformer, and the changed region is captured by the newly proposed attention mechanism. Symmetrically, token2 is mapped to query and value and token1 to key, which enter the other cross-temporal Transformer. The outputs are the new tokens token3 and token4.
The structures of two trans-phase transformers are shown in FIG. 3:
As shown in FIG. 3(a), Q1, K1, and V1 are three matrices obtained by linear mapping of token1, and K2 is a matrix obtained by linear mapping of token2. K2, Q1, K1, and V1 enter a cross-temporal Transformer, which consists of a multi-head cross-temporal attention layer and a feed-forward network layer, with residual connection and normalization modules.
As shown in FIG. 3(b), Q2, K2, and V2 are three matrices obtained by linear mapping of token2, and K1 is a matrix obtained by linear mapping of token1. K1, Q2, K2, and V2 enter the other cross-temporal Transformer, with the same multi-head cross-temporal attention, feed-forward, residual connection, and normalization structure.
The corresponding Multi-Head Cross-Temporal Attention structure in the cross-temporal Transformer is shown in FIG. 4:
As shown in FIG. 4(a), Q1, K1, V1, and K2 are obtained by linearly mapping token1 and token2 into three-dimensional matrices, which a rearrange function reshapes into four-dimensional (multi-head) matrices. Q1 times the transpose of K2 is then subtracted from Q1 times the transpose of K1, and the absolute value is taken; the result is divided by the square root of d, where d is the number of columns of Q, normalized by Softmax, and finally multiplied by V1.
As shown in FIG. 4(b), Q2, K2, V2, and K1 are obtained by linearly mapping token2 and token1 in the same way. Q2 times the transpose of K1 is subtracted from Q2 times the transpose of K2, the absolute value is taken, the result is divided by the square root of d, normalized by Softmax, and finally multiplied by V2.
The concrete expression of the structure is as follows:
$$\mathrm{Attention}_1=\mathrm{Softmax}\left(\frac{\mathrm{abs}\left(Q_1K_1^{T}-Q_1K_2^{T}\right)}{\sqrt{d}}\right)V_1$$

$$\mathrm{Attention}_2=\mathrm{Softmax}\left(\frac{\mathrm{abs}\left(Q_2K_2^{T}-Q_2K_1^{T}\right)}{\sqrt{d}}\right)V_2$$
where Q1, K1, and V1 are matrices obtained by linear mapping of token1, Q2, K2, and V2 are matrices obtained by linear mapping of token2, d is the number of columns of Q and K, abs() denotes the absolute-value operation, and Softmax is the normalized exponential function.
(2d) F1 and F2 are down-sampled; token3 obtained in (2c) and F1 obtained in (2b) are input into one Transformer decoder, and token4 from (2c) and F2 from (2b) into another Transformer decoder, obtaining change features T1 and T2.
(2e) The feature map T1 obtained in (2d) is stacked with the feature map F1 obtained in (2b), and T2 from (2d) with F2 from (2b), yielding new feature maps T1' and T2'.
(2f) The feature maps T1' and T2' obtained in (2e) are stacked to obtain a feature map T.
(2g) The feature map T from (2f) is input into the convolutional feature extraction module to obtain the final change detection map.
(3) The network is trained with the training samples generated in step (1) using a stochastic gradient descent algorithm; the following cross-entropy loss is minimized to optimize the network parameters:
$$L=\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}l\left(P_{hw},Y_{hw}\right)$$
where $l(P_{hw},y)=-\log(P_{hwy})$ is the cross-entropy loss and $Y_{hw}$ is the ground-truth value at pixel position $(h,w)$.
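In PyTorch, this corresponds to the standard per-pixel cross entropy averaged over the H × W grid, e.g.:

```python
import torch
import torch.nn.functional as F

# F.cross_entropy applies log-softmax internally, so `logits` are raw scores.
logits = torch.rand(1, 2, 256, 256)          # P: per-pixel class scores
labels = torch.randint(0, 2, (1, 256, 256))  # Y: ground-truth change mask
loss = F.cross_entropy(logits, labels)       # mean of -log P_{hwy} over h, w
```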
(4) Training and testing:
The deep neural network is trained with the training samples obtained in step (1) using the stochastic gradient descent algorithm to obtain the trained network. The pre- and post-change images to be detected are then input into the trained network, producing the image of the changed regions as the change detection result.
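A skeletal version of this training procedure is shown below, with a stand-in model and an assumed learning rate and epoch count (neither is specified in the text).

```python
import torch
import torch.nn.functional as F

# Stand-in model: the real network is the backbone + cross-temporal
# Transformers + decoders + head assembled in the steps above.
model = torch.nn.Conv2d(6, 2, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One dummy mini-batch of paired patches and ground-truth change masks.
batches = [(torch.rand(2, 3, 256, 256), torch.rand(2, 3, 256, 256),
            torch.randint(0, 2, (2, 256, 256)))]

for epoch in range(10):                    # epoch count is an assumption
    for x1, x2, y in batches:
        logits = model(torch.cat([x1, x2], dim=1))
        loss = F.cross_entropy(logits, y)  # the loss defined above
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```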
Example 3
The present disclosure further provides a remote sensing image change detection system, which specifically includes:
an image acquisition module, used for acquiring pre-change and post-change images and preprocessing them to obtain paired training data;
a feature extraction module, used for extracting features of the image data before and after the change using convolutional layers to obtain original feature maps;
a feature blocking module, used for capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the deep neural network, obtaining change features;
a feature stacking module, used for stacking the obtained change features with the original feature maps;
and a feature training module, used for performing convolutional feature extraction and outputting the final change detection map.
These modules implement the following method steps:
S101: acquiring pre-change and post-change images and preprocessing them to obtain paired training data;
S102: based on the deep neural network structure with a cross-temporal Transformer, performing feature extraction on the image data before and after the change using convolutional layers to obtain original feature maps;
S103: capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the network to obtain change features;
S104: stacking the obtained change features with the original feature maps and performing convolutional feature extraction to obtain the final change detection map.
The effects of the present disclosure can be further illustrated by the following simulations:
1. Simulation environment:
PyCharm Community Edition 2021.02 x64, NVIDIA 2080Ti GPU, Ubuntu 16.04.
2. Simulation content:
Simulation 1: the present disclosure uses data from a public large building change detection dataset containing 637 pairs of high-resolution (0.5 m) remote sensing images of size 1024 × 1024, with a time span of 5 to 14 years between the bitemporal images. The detection results are shown in FIG. 2, where:
fig. 2(a) is a remote sensing image before change, and the size is 3 × 256 × 256.
Fig. 2(b) shows the remote sensing image after the change, and the size is 3 × 256 × 256.
Fig. 2(c) is a change detection image obtained by detecting a change in fig. 2(a) and 2(b) according to the present invention, and has a size of 3 × 256 × 256.
FIG. 2(d) is the ground truth for the change detection of FIGS. 2(a) and 2(b), of size 3 × 256 × 256.
As can be seen from fig. 2, the detection result of the method provided by the present disclosure is substantially consistent with the true value, and a very accurate change detection effect is achieved.
Simulation 2: to demonstrate the effect of the present invention, the proposed method is used to perform change detection on the images of FIGS. 2(a) and 2(b), and the detection results are evaluated with objective indexes (all in %); the results are shown in Table 1.
TABLE 1 Objective evaluation of the results of various methods of change detection
[Table 1 is provided as an image in the original publication; it reports Pre, Recall, F1, Iou, and OA for the compared methods.]
The evaluation indexes are as follows: Pre is the precision rate, i.e., the proportion of correctly predicted positive samples among all samples predicted positive. Its value lies in [0, 1]; larger is better.
Recall is the recall rate, i.e., the proportion of correctly predicted positive samples among all positive samples. Its value lies in [0, 1]; larger is better.
F1 is the harmonic mean of precision and recall. Its value lies in [0, 1]; larger is better.
Iou is the intersection-over-union ratio. Its value lies in [0, 1]; larger is better.
OA is the overall accuracy, i.e., the proportion of correctly classified samples among all samples. Its value lies in [0, 1]; larger is better.
As can be seen from Table 1, the F1, Iou, and Recall of the present disclosure all exceed the corresponding prior-art indexes, so most of the objective evaluation indexes of the present disclosure are superior to those of the prior art.
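For reference, the five indexes can be computed from binary change masks as follows; these are the standard definitions, not the patent's own code.

```python
import torch

def change_metrics(pred: torch.Tensor, gt: torch.Tensor) -> dict:
    """Pre, Recall, F1, Iou, and OA for binary change masks (1 = changed)."""
    tp = ((pred == 1) & (gt == 1)).sum().item()  # true positives
    fp = ((pred == 1) & (gt == 0)).sum().item()  # false positives
    fn = ((pred == 0) & (gt == 1)).sum().item()  # false negatives
    tn = ((pred == 0) & (gt == 0)).sum().item()  # true negatives
    eps = 1e-9                                   # guard against division by zero
    pre = tp / (tp + fp + eps)
    rec = tp / (tp + fn + eps)
    f1 = 2 * pre * rec / (pre + rec + eps)
    iou = tp / (tp + fp + fn + eps)
    oa = (tp + tn) / (tp + tn + fp + fn)
    return {"Pre": pre, "Recall": rec, "F1": f1, "Iou": iou, "OA": oa}

pred = torch.randint(0, 2, (256, 256))
gt = torch.randint(0, 2, (256, 256))
print(change_metrics(pred, gt))
```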
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A method for detecting changes of a remote sensing image, characterized by comprising the following training steps:
acquiring a pre-change image and a post-change image and preprocessing them to obtain paired training data;
based on a deep neural network structure with a cross-temporal Transformer, performing feature extraction on the image data before and after the change using convolutional layers to obtain original feature maps;
capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the deep neural network to obtain change features;
and stacking the obtained change features with the original feature maps and performing convolutional feature extraction to obtain a final change detection map.
2. The remote sensing image change detection method of claim 1, wherein the sizes of the pre-change and post-change images are fixed at 3 × 256 × 256, the images are then each passed through one of two identical residual network structures for feature extraction, and original feature maps F1 and F2 of size 32 × 128 × 128 are output respectively.
3. The remote sensing image change detection method of claim 1, wherein capturing the changed regions of the feature maps based on the cross-temporal Transformer structure of the cross-temporal Transformer deep neural network specifically comprises: performing feature blocking on the extracted feature maps F1 and F2 of the pre- and post-change images to obtain token1 and token2, respectively; obtaining query and value from token1 and key from token2; inputting them into the cross-temporal Transformer; and capturing the changed regions through the attention mechanism.
4. The remote sensing image change detection method of claim 3, wherein token2 provides query and value and token1 provides key, which are input into another cross-temporal Transformer that captures the changed regions through the attention mechanism, obtaining new token3 and token4.
5. The remote sensing image change detection method of claim 3, wherein the feature blocking is implemented by a convolution, whose kernel size and stride both equal the block size, combined with the Flatten and Transpose functions.
6. A method for detecting changes in remote sensing images as claimed in claim 3, wherein the specific expression of the attention mechanism structure is:
$$\mathrm{Attention}_1=\mathrm{Softmax}\left(\frac{\mathrm{abs}\left(Q_1K_1^{T}-Q_1K_2^{T}\right)}{\sqrt{d}}\right)V_1$$

$$\mathrm{Attention}_2=\mathrm{Softmax}\left(\frac{\mathrm{abs}\left(Q_2K_2^{T}-Q_2K_1^{T}\right)}{\sqrt{d}}\right)V_2$$
where Q1, K1, and V1 are matrices obtained by linear mapping of token1, Q2, K2, and V2 are matrices obtained by linear mapping of token2, d is the number of columns of Q and K, abs() denotes the absolute-value operation, and Softmax is the normalized exponential function.
7. The remote sensing image change detection method of claim 1, wherein the obtained feature maps F1 and F2 are down-sampled, token3 and feature map F1 are input into one Transformer decoder, and token4 and feature map F2 into another Transformer decoder, obtaining change feature maps T1 and T2.
8. The remote sensing image change detection method of claim 7, wherein the obtained change feature map T1 is stacked with feature map F1, and change feature map T2 with feature map F2, yielding new feature change maps T1' and T2'.
9. The remote sensing image change detection method of claim 8, wherein the obtained feature change maps T1' and T2' are stacked into a feature map T, which is then input into the convolutional feature extraction module to output the final change detection map.
10. A remote sensing image change detection system, comprising:
the image acquisition module is used for acquiring images before change and images after change and preprocessing the images to obtain paired training data;
a feature extraction module, used for extracting features of the image data before and after the change using convolutional layers to obtain original feature maps;
a feature blocking module, used for capturing the changed regions of the extracted feature maps based on the cross-temporal Transformer structure of the cross-temporal Transformer deep neural network, obtaining change features;
a feature stacking module, used for stacking the obtained change features with the original feature maps;
and a feature training module, used for performing convolutional feature extraction and outputting the final change detection map.
CN202210540288.6A 2022-05-18 2022-05-18 Method and system for detecting change of remote sensing image Pending CN115049922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210540288.6A CN115049922A (en) 2022-05-18 2022-05-18 Method and system for detecting change of remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210540288.6A CN115049922A (en) 2022-05-18 2022-05-18 Method and system for detecting change of remote sensing image

Publications (1)

Publication Number Publication Date
CN115049922A (en) 2022-09-13

Family

ID=83158594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210540288.6A Pending CN115049922A (en) 2022-05-18 2022-05-18 Method and system for detecting change of remote sensing image

Country Status (1)

Country Link
CN (1) CN115049922A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861824A (en) * 2023-02-23 2023-03-28 汕头大学 Remote sensing image identification method based on improved Transformer
CN115861824B (en) * 2023-02-23 2023-06-06 汕头大学 Remote sensing image recognition method based on improved Transformer


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination