CN110059658B - Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network - Google Patents
Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network
- Publication number
- CN110059658B (Application CN201910342178.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- dimensional
- time
- change
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Astronomy & Astrophysics (AREA)
- Remote Sensing (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network, and proposes a three-dimensional U-net model. The model's input has four dimensions: image length, width, channel number, and time; three-dimensional convolution operates on length, width, and time simultaneously, and three-dimensional pooling and up-sampling are applied in the same way. The correlation between images is controlled by choosing a suitable convolution kernel size in the time dimension, and enlarging that size allows more images to be taken into account. To address the heavy labelling burden of earlier methods, the model can be trained directly from a small amount of supervised data by setting the loss-function weight of unsupervised data to zero during training, which greatly reduces the labelling workload.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network.
Background
The conventional U-net model processes two-dimensional images in the spatial dimensions. For an ordinary image, the input may be represented as C×W×H, where W and H are the length and width of the image and C is the number of channels (3 for an RGB image). The two-dimensional convolution kernel size in the model can be expressed as KW×KH, and a convolution operation is performed on each channel in the spatial dimensions. In the multi-temporal problem, the data to be predicted consist of several images of the same size observed at different times.
The traditional U-net takes two images (before and after the change) as input: the two C×W×H images are concatenated along the channel dimension into a single 2C×W×H tensor and treated as one image. Because the input is really multi-temporal image data, folding the time axis into the channel axis inflates the channel count and the workload, and the data-processing problem created by the extra information being introduced remains to be solved.
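As a rough sketch of the conventional input scheme described above (shapes only — the nested Python lists below merely stand in for real image tensors, and the helper names are illustrative, not from the patent):

```python
# Sketch of the traditional 2D U-net input scheme: two C x W x H images
# (before / after) are concatenated along the channel axis into one
# 2C x W x H tensor and treated as a single image.

def make_image(c, w, h, fill=0.0):
    """Create a C x W x H image as nested lists (placeholder data)."""
    return [[[fill] * h for _ in range(w)] for _ in range(c)]

def concat_channels(img_a, img_b):
    """Stack two images along the channel dimension."""
    return img_a + img_b  # list concatenation acts on the channel axis

before = make_image(3, 64, 64)   # RGB image at time t-1
after = make_image(3, 64, 64)    # RGB image at time t
stacked = concat_channels(before, after)

print(len(stacked), len(stacked[0]), len(stacked[0][0]))  # 6 64 64
```

This is exactly the channel inflation the invention avoids: each extra time step would add another C channels.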
Therefore, the invention provides a remote sensing satellite image multi-time phase change detection method based on a three-dimensional convolution neural network to solve the problems.
Disclosure of Invention
The invention aims to solve the defects in the prior art, namely: the traditional U-net takes two images (before and after the change) as input, concatenating the two C×W×H images along the channel dimension into a single 2C×W×H tensor treated as one image. Because the input is really multi-temporal image data, folding the time axis into the channel axis inflates the channel count and the workload, and the data-processing problem created by the extra information being introduced remains to be solved.
In order to achieve the purpose, the invention adopts the following technical scheme:
a remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network comprises the following steps:
the method comprises the following steps: the time dimension is added in the U-net, image information at different times can be effectively brought into calculation of change detection, and due to the introduction of convolution of the time dimension, the time receptive field is increased layer by layer in the U-net, and the general relation of the convolution neural network is satisfied:
Rout=Rin+(KT-1)D (1)
where Rout and Rin are the output and input receptive fields of one layer of the convolutional neural network, respectively, and D is the distance between the features.
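Equation (1) can be iterated layer by layer to track how the temporal receptive field grows. The sketch below does exactly that; the kernel sizes and feature spacings are illustrative values, not taken from the patent:

```python
# Layer-by-layer growth of the temporal receptive field following
# equation (1): R_out = R_in + (K_T - 1) * D, where D is the distance
# between adjacent features at that layer.

def receptive_field(layers, r_in=1):
    """layers: list of (K_T, D) pairs; returns R_out after each layer."""
    fields = []
    r = r_in
    for k_t, d in layers:
        r = r + (k_t - 1) * d   # equation (1) applied at this layer
        fields.append(r)
    return fields

# Three stacked convolutions with K_T = 2 and unit feature spacing:
print(receptive_field([(2, 1), (2, 1), (2, 1)]))  # [2, 3, 4]
```

With K_T = 2 each layer widens the temporal view by one frame, which is why the receptive field "increases layer by layer" as the text states.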
Step two: the input and output data are kept with the same time dimension, and the spatial dimensions before and after each convolution are kept unchanged using the image-edge padding method of U-net, which simplifies the network design. Each output of the three-dimensional U-net, Change(i-1, i), corresponds one-to-one to the input image Image i with the same index; Change(i-1, i) represents the change between image i-1 and image i. Since the first image, Image 1, has no predecessor, the corresponding Change(0,1) is empty, and no loss function or optimization is computed on this value during training.
Step three: the temporal receptive field of the three-dimensional U-net is controlled by the convolution kernel size KT in the time dimension. When KT is 1, each convolution processes only one frame in the time dimension, so adjacent images cannot be compared for change detection; the minimum useful value of KT is therefore 2.
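The reason KT = 1 is useless for change detection can be seen on a single pixel's time series, taken here as a toy stand-in for the full 3D convolution (the helper below is an illustration, not the patent's implementation):

```python
# Why the minimum temporal kernel size is K_T = 2: a size-1 kernel sees
# a single frame and cannot compare neighbours, while a size-2 kernel
# such as [-1, 1] computes the difference between consecutive frames.

def temporal_conv(series, kernel):
    """Valid 1D convolution (no padding) along the time axis."""
    k = len(kernel)
    return [sum(kernel[j] * series[i + j] for j in range(k))
            for i in range(len(series) - k + 1)]

pixel = [0, 0, 1, 1]                    # brightness jumps at t = 2
print(temporal_conv(pixel, [1]))        # K_T = 1: [0, 0, 1, 1], just a copy
print(temporal_conv(pixel, [-1, 1]))    # K_T = 2: [0, 1, 0], flags the change
```

A learned size-2 kernel can represent this differencing (and more), which is exactly the capability a size-1 kernel lacks.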
Preferably, in step two, the same spatio-temporal size is used for all convolution kernels in the three-dimensional U-net: the spatial size is fixed at KW = 3 and KH = 3, KT is adjusted as required, and the moving step (Stride) of the convolution kernels is 1.
Preferably, in step two, for different KT the padding scheme must be adjusted to keep the output time dimension unchanged. For a convolutional network with dilation rate 1, the output size is TOUT = (TIN + Padding - KT)/Stride + 1, where Padding counts the total number of padded frames; to keep TOUT = TIN with KT = 2 and Stride = 1, the Padding value is 1, i.e. the number of frames that must be padded in this dimension.
Preferably, in step two, padding is added before Image 1 ahead of each convolution by copying Image 1 into the Image 0 position. When KT is 3, Padding is 2; the two extra frames can be inserted at the head or the tail of the time dimension, for example as two copies of Image 1 inserted before it. This method relates multiple temporal images by taking several earlier images into account before change detection, strengthening the detection result.
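The head-padding scheme above can be sketched with frames represented by simple labels rather than real image tensors (the helper names are illustrative, not from the patent):

```python
# Temporal head padding: copies of the first image are inserted at the
# head of the sequence so that a valid convolution with kernel K_T
# keeps the time dimension unchanged (Padding = K_T - 1 frames).

def head_pad(frames, k_t):
    """Replicate the first frame K_T - 1 times at the head."""
    return [frames[0]] * (k_t - 1) + frames

def out_length(t_in, k_t, padding, stride=1):
    """Output length of a 1D convolution: (T + P - K) // S + 1."""
    return (t_in + padding - k_t) // stride + 1

frames = ["img1", "img2", "img3", "img4"]
for k_t in (2, 3):
    padded = head_pad(frames, k_t)
    print(k_t, padded, out_length(len(frames), k_t, padding=k_t - 1))
# 2 ['img1', 'img1', 'img2', 'img3', 'img4'] 4
# 3 ['img1', 'img1', 'img1', 'img2', 'img3', 'img4'] 4
```

For both KT = 2 and KT = 3, the output time length equals the input time length TIN = 4, matching the TOUT = TIN requirement of step two.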
Preferably, in step two, when detecting Image i-1 and Image i, the features of Image i-2 are also added to the detection calculation.
Preferably, in step two, when detecting changes between Image i-1 and Image i, an image of a future time, Image i+1, is added to the detection calculation. The two schemes are compared in the subsequent result analysis; for larger KT there are more combinations for distributing the Padding, and the analysis is similar.
Compared with the prior art, the invention has the beneficial effects that:
To solve the data-processing problem created by the extra information, the invention proposes a three-dimensional U-net model for a remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network. The model's input has four dimensions: image length, width, channel number, and time; three-dimensional convolution operates on length, width, and time simultaneously, and three-dimensional pooling and up-sampling are applied in the same way. The model can process multi-temporal satellite images of any time length. The correlation between images is controlled by choosing a suitable kernel size in the time dimension, and enlarging that size allows more images to be taken into account. To address the heavy labelling burden of earlier methods, the model can be trained directly from a small amount of supervised data by setting the loss-function weight of unsupervised data to zero during training, which greatly reduces the labelling workload.
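The semi-supervised trick of zero-weighting the loss on unlabelled data can be sketched as follows. Per-frame losses are plain numbers here; in a real model they would be per-pixel loss maps, and the function name is an assumption for illustration:

```python
# Sketch of the loss masking described above: frames without labels
# (including the empty Change(0,1) output) receive weight 0, so only
# labelled frames drive the optimisation.

def masked_loss(frame_losses, has_label):
    """Mean loss over labelled frames only; unlabelled frames weigh 0."""
    weights = [1.0 if lab else 0.0 for lab in has_label]
    total = sum(w * l for w, l in zip(weights, frame_losses))
    n = sum(weights)
    return total / n if n else 0.0

losses = [0.9, 0.5, 0.6, 0.25]           # one loss per Change(i-1, i)
labels = [False, True, False, True]      # Change(0,1) is always unlabelled
print(masked_loss(losses, labels))       # (0.5 + 0.25) / 2 = 0.375
```

Because the unlabelled terms contribute nothing to the gradient, the whole multi-temporal sequence can still be fed through the network while only the few labelled change maps need to be annotated.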
The first embodiment is as follows:
a remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network comprises the following steps:
the method comprises the following steps: the time dimension is added in the U-net, image information at different times can be effectively brought into calculation of change detection, and due to the introduction of convolution of the time dimension, the time receptive field is increased layer by layer in the U-net, and the general relation of the convolution neural network is satisfied:
Rout=Rin+(KT-1)D (1)
where Rout and Rin are the output and input receptive fields of one layer of the convolutional neural network, respectively, and D is the distance between the features.
Step two: the input and output data are kept with the same time dimension, and the spatial dimensions before and after each convolution are kept unchanged using the image-edge padding method of U-net, which simplifies the network design. Each output of the three-dimensional U-net, Change(i-1, i), corresponds one-to-one to the input image Image i with the same index; Change(i-1, i) represents the change between image i-1 and image i. Since the first image, Image 1, has no predecessor, Change(0,1) is empty, and no loss function or optimization is computed on this value during training.
Step three: the temporal receptive field of the three-dimensional U-net is controlled by the convolution kernel size KT in the time dimension. When KT is 1, each convolution processes only one frame in the time dimension, so adjacent images cannot be compared for change detection; the minimum useful value of KT is therefore 2.
In step two, the same spatio-temporal size is used for all convolution kernels in the three-dimensional U-net: the spatial size is fixed at KW = 3 and KH = 3, KT is adjusted as required, and the moving step (Stride) of the convolution kernels is 1.
In step two, for different KT the padding scheme must be adjusted to keep the output time dimension unchanged. For a convolutional network with dilation rate 1, the output size is TOUT = (TIN + Padding - KT)/Stride + 1, where Padding counts the total number of padded frames; to keep TOUT = TIN with KT = 2 and Stride = 1, the Padding value is 1.
Padding is added before Image 1 ahead of each convolution by copying Image 1 into the Image 0 position. When KT is 3, Padding is 2; the two extra frames can be inserted at the head or the tail of the time dimension, for example as two copies of Image 1 inserted before it. This method relates multiple temporal images by taking several earlier images into account before change detection, strengthening the detection result.
In step two, when detecting Image i-1 and Image i, the features of Image i-2 are also added to the detection calculation.
In step two, when detecting changes between Image i-1 and Image i, an image of a future time, Image i+1, can instead be added to the detection calculation. The two schemes are compared in the subsequent result analysis; for larger KT there are more combinations for distributing the Padding, and the analysis is similar.
In the invention, to solve the data-processing problem created by the extra information, this patent proposes a three-dimensional U-net model for a remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network. The model's input has four dimensions: image length, width, channel number, and time; three-dimensional convolution operates on length, width, and time simultaneously, and three-dimensional pooling and up-sampling are applied in the same way. The model can process multi-temporal satellite images of any time length. The correlation between images is controlled by choosing a suitable kernel size in the time dimension, and enlarging that size allows more images to be taken into account. For the heavy labelling burden of earlier methods, the model can also be trained directly from a small amount of supervised data by setting the loss-function weight of unsupervised data to zero during training, greatly reducing the labelling workload. The specific implementation steps are as follows.
A time dimension is added to the U-net so that image information from different times can be effectively included in the change-detection computation. Because convolution is introduced in the time dimension, the temporal receptive field grows layer by layer in the U-net and satisfies the general relation for convolutional neural networks:
Rout = Rin + (KT - 1) D (1)
where Rout and Rin are the output and input receptive fields of one layer of the convolutional neural network, and D is the distance between features.
Further, the input and output data are kept with the same time dimension, and the spatial dimensions before and after each convolution are kept unchanged using the image-edge padding method of U-net, which simplifies the network design; each output Change(i-1, i) of the three-dimensional U-net corresponds one-to-one to the input image Image i with the same index. Change(i-1, i) represents the change between image i-1 and image i; since the first image, Image 1, has no predecessor, Change(0,1) is empty, and no loss function or optimization is computed on this value during training.
Further, the temporal receptive field of the three-dimensional U-net is controlled by the convolution kernel size KT in the time dimension; when KT is 1, each convolution processes only one frame in the time dimension, so adjacent images cannot be compared for change detection, and the minimum value of KT is 2.
Further, the same spatio-temporal size is used for all convolution kernels: the spatial size is fixed at KW = 3 and KH = 3, KT is adjusted as required, and the moving step (Stride) of the convolution kernels is 1.
Further, for different KT the padding scheme is adjusted to keep the output time dimension unchanged; for a convolutional network with dilation rate 1, the output size is TOUT = (TIN + Padding - KT)/Stride + 1, so to keep TOUT = TIN with KT = 2 and Stride = 1, the Padding value is 1.
Furthermore, padding is added before Image 1 ahead of each convolution by copying Image 1 into the Image 0 position; when KT is 3, Padding is 2, and the two extra frames can be inserted at the head or the tail of the time dimension, for example as two copies of Image 1 inserted before it. This method relates multiple temporal images by taking several earlier images into account before change detection to strengthen the detection result. When detecting Image i-1 and Image i, the features of Image i-2 are also added to the detection calculation. Finally, when detecting the change between Image i-1 and Image i, an image of a future time, Image i+1, can be added to the detection calculation; these two schemes are compared in the subsequent result analysis, and for larger KT there are more combinations for allocating the Padding, with a similar analysis.
Experimental part:
To construct a multi-temporal change-detection data set for this study, the flood disaster in the middle and lower reaches of the Yangtze River caused by continuous rainstorms from 30 June to 5 July 2017 was selected, and Sentinel-2 image data covering river changes in Hubei Province during this period were collected. The three-dimensional time-series U-net was tested on this Yangtze River basin change-detection data set and the results were compared with the standard U-net. In the result maps, yellow marks true changes that were successfully detected, red marks true changes that were missed, and blue marks unchanged areas falsely detected as changed.
First, the performance of the three-dimensional U-net with time convolution kernel KT = 2 was tested and compared with the two-dimensional U-net;
then, with KT = 3, the different filling (padding) modes of the three-dimensional network were compared.
For the result comparison, pixels are counted as follows:
(1) pixels whose true value is changed and which the model predicts as changed are counted as TP;
(2) pixels whose true value is changed but which the model predicts as unchanged are counted as FN;
(3) pixels whose true value is unchanged but which the model predicts as changed are counted as FP;
(4) pixels whose true value is unchanged and which the model predicts as unchanged are counted as TN.
Then, the change-detection evaluation indices are as follows:
False alarm rate (FA, False Alarm)
Missed alarm rate (MA, Missed Alarm)
Accuracy (ACC)
Precision
Recall
F1 score (F1-Score), the harmonic mean of precision and recall, which allows an overall accuracy assessment.
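The indices above follow directly from the four confusion-matrix counts. A minimal sketch (the counts fed in at the bottom are made-up illustrative numbers, not the patent's experimental results):

```python
# Change-detection evaluation indices computed from the confusion-matrix
# counts TP, FN, FP, TN defined in the comparison step.

def change_metrics(tp, fn, fp, tn):
    fa = fp / (fp + tn)                    # false alarm rate
    ma = fn / (fn + tp)                    # missed alarm rate
    acc = (tp + tn) / (tp + fn + fp + tn)  # accuracy
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"FA": fa, "MA": ma, "ACC": acc,
            "Precision": precision, "Recall": recall, "F1": f1}

m = change_metrics(tp=80, fn=20, fp=20, tn=380)
print(m["ACC"], m["Precision"], m["Recall"])  # 0.92 0.8 0.8
```

Note that ACC can look high even when few pixels actually change (TN dominates), which is why FA, MA, and F1 are reported alongside it.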
Table 1: Result comparison
As can be seen from Table 1, after temporal information is added, the detection accuracy of the proposed three-dimensional network is higher than that of the traditional method. U-net 3D with KT = 2 and U-net 3D with KT = 3 and head filling both perform well overall; in terms of precision, U-net 3D with KT = 3 and head filling is best, but its recall is lower than that of U-net 3D with KT = 2.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto. Any equivalent substitution or modification of the technical solutions and inventive concepts of the present invention made by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.
Claims (5)
1. A remote sensing satellite image multi-temporal change detection method based on a three-dimensional convolutional neural network, characterized by comprising the following steps:
the method comprises the following steps: the time dimension is added in the U-net, image information at different times can be effectively brought into calculation of change detection, and due to the introduction of convolution of the time dimension, the time receptive field is increased layer by layer in the U-net, and the general relation of the convolution neural network is satisfied:
Rout=Rin+(KT-1)D (1)
wherein Rout and Rin are respectively the output and input receptive fields of one layer of the convolutional neural network, and D is the distance between the characteristics;
step two: the input and output data are kept with the same time dimension, and the spatial dimensions before and after each convolution are kept unchanged using the image-edge padding method of U-net, which simplifies the network design; each output of the three-dimensional U-net, Change(i-1, i), corresponds one-to-one to the input image Image i with the same index; Change(i-1, i) represents the change between image i-1 and image i; since the first image, Image 1, has no corresponding change, Change(0,1) is empty, and no loss function or optimization is computed on this value during training; in step two, padding is added before Image 1 ahead of each convolution by copying Image 1 into the Image 0 position; when KT is 3, Padding is 2, and two identical pictures can be inserted before Image 1 at the head or the tail of the time dimension; this method relates multiple temporal images by taking several earlier images into account before change detection, strengthening the detection result;
step three: size K in time dimension by convolution kernelTTo control the time receptive field of the three-dimensional U-net, when KTAt 1, the network only computes one frame of image per convolution in the time dimension, so that the difference between adjacent pictures cannot be compared for change detection, so KTIs 2.
2. The method for detecting the multi-temporal change of remote sensing satellite images based on a three-dimensional convolutional neural network as claimed in claim 1, wherein in step two the same spatio-temporal size is used for all convolution kernels in the three-dimensional U-net, the spatial size is fixed at KW = 3 and KH = 3, KT is adjusted as required, and the moving step (Stride) of the convolution kernels is 1.
3. The method for detecting the multi-temporal change of remote sensing satellite images based on a three-dimensional convolutional neural network as claimed in claim 1, wherein in step two, for different KT, the padding scheme is adjusted to keep the output time dimension unchanged; for a convolutional network with dilation rate 1, the output size is TOUT = (TIN + Padding - KT)/Stride + 1; to keep TOUT = TIN, when KT is 2 and Stride is 1, the Padding value is 1, i.e. the number of frames padded in this dimension.
4. The method for detecting the multi-temporal change of remote sensing satellite images based on a three-dimensional convolutional neural network as claimed in claim 1, wherein in step two, when detecting Image i-1 and Image i, the features of Image i-2 are also added to the detection calculation.
5. The method for detecting the multi-temporal change of remote sensing satellite images based on a three-dimensional convolutional neural network as claimed in claim 1, wherein in step two, when detecting changes between Image i-1 and Image i, an image of a future time, Image i+1, is added to the detection calculation; the two methods are compared in the subsequent result analysis; for a larger KT there are more combinations for assigning the Padding, and the analysis method is similar.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910342178.7A CN110059658B (en) | 2019-04-26 | 2019-04-26 | Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910342178.7A CN110059658B (en) | 2019-04-26 | 2019-04-26 | Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110059658A CN110059658A (en) | 2019-07-26 |
CN110059658B true CN110059658B (en) | 2020-11-24 |
Family
ID=67320971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910342178.7A Active CN110059658B (en) | 2019-04-26 | 2019-04-26 | Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110059658B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111339890A (en) * | 2020-02-20 | 2020-06-26 | 中国测绘科学研究院 | Method for extracting newly-added construction land information based on high-resolution remote sensing image |
CN112508936B (en) * | 2020-12-22 | 2023-05-19 | 中国科学院空天信息创新研究院 | Remote sensing image change detection method based on deep learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106682734A (en) * | 2016-12-30 | 2017-05-17 | 中国科学院深圳先进技术研究院 | Method and apparatus for increasing generalization capability of convolutional neural network |
CN108564002A (en) * | 2018-03-22 | 2018-09-21 | 中国科学院遥感与数字地球研究所 | A kind of remote sensing image time series variation detection method and system |
CN108596108A (en) * | 2018-04-26 | 2018-09-28 | 中国科学院电子学研究所 | Method for detecting change of remote sensing image of taking photo by plane based on the study of triple semantic relation |
CN108846835A (en) * | 2018-05-31 | 2018-11-20 | 西安电子科技大学 | The image change detection method of convolutional network is separated based on depth |
CN108921198A (en) * | 2018-06-08 | 2018-11-30 | 山东师范大学 | commodity image classification method, server and system based on deep learning |
CN109409263A (en) * | 2018-10-12 | 2019-03-01 | 武汉大学 | A kind of remote sensing image city feature variation detection method based on Siamese convolutional network |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8144937B2 (en) * | 2008-10-15 | 2012-03-27 | The Boeing Company | System and method for airport mapping database automatic change detection |
CN103745087B (en) * | 2013-12-18 | 2018-02-09 | 广西生态工程职业技术学院 | A kind of dynamic changes of forest resources Forecasting Methodology based on remote sensing technology |
US10262205B2 (en) * | 2015-07-28 | 2019-04-16 | Chiman KWAN | Method and system for collaborative multi-satellite remote sensing |
CN105608698B (en) * | 2015-12-25 | 2018-12-25 | 西北工业大学 | A kind of method for detecting change of remote sensing image based on SAE |
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
CN109272534B (en) * | 2018-05-16 | 2022-03-04 | 西安电子科技大学 | SAR image change detection method based on multi-granularity cascade forest model |
CN108805083B (en) * | 2018-06-13 | 2022-03-01 | 中国科学技术大学 | Single-stage video behavior detection method |
- 2019
- 2019-04-26: Application CN201910342178.7A granted as patent CN110059658B (Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106682734A (en) * | 2016-12-30 | 2017-05-17 | 中国科学院深圳先进技术研究院 | Method and apparatus for increasing generalization capability of convolutional neural network |
CN108564002A (en) * | 2018-03-22 | 2018-09-21 | 中国科学院遥感与数字地球研究所 | A kind of remote sensing image time series variation detection method and system |
CN108596108A (en) * | 2018-04-26 | 2018-09-28 | 中国科学院电子学研究所 | Method for detecting change of remote sensing image of taking photo by plane based on the study of triple semantic relation |
CN108846835A (en) * | 2018-05-31 | 2018-11-20 | 西安电子科技大学 | The image change detection method of convolutional network is separated based on depth |
CN108921198A (en) * | 2018-06-08 | 2018-11-30 | 山东师范大学 | commodity image classification method, server and system based on deep learning |
CN109409263A (en) * | 2018-10-12 | 2019-03-01 | 武汉大学 | A kind of remote sensing image city feature variation detection method based on Siamese convolutional network |
Non-Patent Citations (2)
Title |
---|
Learning Spatiotemporal Features with 3D Convolutional Networks; Du Tran et al.; 2015 IEEE International Conference on Computer Vision; 2015; p. 4489 (Introduction), pp. 4490-4491 (Section 3.1) *
Current status and prospects of multi-temporal remote sensing image change detection; Zhang Liangpei et al.; Acta Geodaetica et Cartographica Sinica; October 2017; Vol. 46, No. 10, pp. 1447-1459 *
Also Published As
Publication number | Publication date |
---|---|
CN110059658A (en) | 2019-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110889343B (en) | Crowd density estimation method and device based on attention type deep neural network | |
CN110570363A (en) | Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator | |
CN112489164B (en) | Image coloring method based on improved depth separable convolutional neural network | |
CN109389156B (en) | Training method and device of image positioning model and image positioning method | |
CN104992403B (en) | Hybrid operator image redirection method based on visual similarity measurement | |
CN112597985A (en) | Crowd counting method based on multi-scale feature fusion | |
CN113011329A (en) | Pyramid network based on multi-scale features and dense crowd counting method | |
CN111723693A (en) | Crowd counting method based on small sample learning | |
CN112818969A (en) | Knowledge distillation-based face pose estimation method and system | |
CN109145836A (en) | Ship target video detection method based on deep learning network and Kalman filtering | |
CN110059658B (en) | Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network | |
CN110827320B (en) | Target tracking method and device based on time sequence prediction | |
CN110570402B (en) | Binocular salient object detection method based on boundary perception neural network | |
CN110827312A (en) | Learning method based on cooperative visual attention neural network | |
CN112907573B (en) | Depth completion method based on 3D convolution | |
CN115457464B (en) | Crowd counting method based on transformer and CNN | |
CN115424209A (en) | Crowd counting method based on spatial pyramid attention network | |
CN112668532A (en) | Crowd counting method based on multi-stage mixed attention network | |
CN116402851A (en) | Infrared dim target tracking method under complex background | |
CN112101113B (en) | Lightweight unmanned aerial vehicle image small target detection method | |
CN116519106B (en) | Method, device, storage medium and equipment for determining weight of live pigs | |
WO2022120996A1 (en) | Visual position recognition method and apparatus, and computer device and readable storage medium | |
CN113724293A (en) | Vision-based intelligent internet public transport scene target tracking method and system | |
CN113066074A (en) | Visual saliency prediction method based on binocular parallax offset fusion | |
CN117292324A (en) | Crowd density estimation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||