CN110533631B - SAR image change detection method based on pyramid pooling twin network

SAR image change detection method based on pyramid pooling twin network

Info

Publication number
CN110533631B
Authority
CN
China
Prior art keywords
layer, network, pooling, time phase, convolution
Prior art date
Legal status: Active (assumed; not a legal conclusion)
Application number
CN201910635704.9A
Other languages
Chinese (zh)
Other versions
CN110533631A
Inventor
王蓉芳
丁凡
陈佳伟
刘波
郝红侠
尚荣华
熊涛
Current Assignee
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910635704.9A priority Critical patent/CN110533631B/en
Publication of CN110533631A publication Critical patent/CN110533631A/en
Application granted granted Critical
Publication of CN110533631B publication Critical patent/CN110533631B/en

Classifications

    • G06F18/24 — Pattern recognition; analysing; classification techniques
    • G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06V10/464 — Extraction of image or video features; salient features, e.g. scale-invariant feature transform [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW]
    • G06T2207/10032 — Image acquisition modality: satellite or aerial image; remote sensing
    • G06T2207/10044 — Image acquisition modality: radar image
    • G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]


Abstract

The invention provides a SAR image change detection method based on a pyramid pooling twin network, which mainly solves the problem that in traditional methods the change detection accuracy depends on a difference map, leading to inaccurate results. The implementation steps are as follows: 1) generate training samples, test samples and sample labels; 2) construct a deep pyramid pooling twin network; 3) construct a classification network; 4) train the deep pyramid pooling twin network and the classification network with the training samples and sample labels to obtain a trained model; 5) test the test samples with the trained model to obtain the change detection result. The invention avoids using a difference map, thereby eliminating the influence of the difference map on the change detection result and improving detection accuracy, and can be used for environmental monitoring and disaster detection.

Description

SAR image change detection method based on pyramid pooling twin network
Technical Field
The invention belongs to the field of image processing, and further relates to a synthetic aperture radar (SAR) image change detection method which can be used to detect changed areas between two SAR images of different time phases in agricultural investigation, natural disaster detection and forest resource monitoring.
Background
Change detection is a technique for obtaining change information by observing the states of an object or phenomenon at different times. SAR image change detection analyses SAR images of the same region acquired at different times and detects the change information of that region. As a key technology for earth observation satellites, synthetic aperture radar (SAR) image change detection has been used in various fields including agricultural investigation, natural disaster detection and forest resource monitoring.
The traditional change detection method follows a classical three-step paradigm: 1) input two preprocessed synthetic aperture radar SAR images; 2) obtain a difference map using a difference operator or another method; 3) analyse the difference map. Early work computed the difference map with a difference operator; because acquired images carry various kinds of noise, log-ratio operators, mean-ratio operators and the like were proposed in succession. For traditional change detection methods, performance depends on the formation of the difference map, and since SAR images are severely affected by speckle noise, the detection accuracy is often low when the difference map is of poor quality.
To overcome the above drawbacks, deep learning, with its strong capacity for abstract representation, can be used for change detection. Gong et al. (Maoguo Gong, Feng Chen, Differencing Neural Network for Change Detection in Synthetic Aperture Radar Images, from SpringerLink) propose a SAR image change detection method based on a deep neural network, which uses stacked restricted Boltzmann machines and achieves higher accuracy through layer-by-layer pre-training and fine-tuning of the whole network. However, because the difference map is used as training data, some noise remains in the change detection result and affects it.
Yang Zhan et al. (Yang Zhan, Kun Fu, Change Detection Based on Deep Siamese Convolutional Networks for Optical Aerial Images, from IGARSS 2017) propose an optical image change detection method based on a convolutional twin network, which trains a pair of convolutional twin networks end to end and achieves higher accuracy. However, because the network structure is simple, the variety of image features learned by the network is limited, and the change detection result still leaves room for improvement.
Disclosure of Invention
The invention aims to provide a SAR image change detection method based on a pyramid pooling twin network, so as to solve the existing methods' dependence on a difference map and insufficient richness of image features, and to improve the accuracy of change detection.
In order to achieve the above purpose, the technical scheme of the invention comprises the following steps:
(1) Inputting two SAR images of different time phases in the same region to generate a training sample, a test sample and a sample label;
(2) The method comprises the steps of constructing a deep pyramid pooling twin network consisting of two pyramid pooling networks with identical structures and parameters, wherein each pyramid pooling network comprises the following three parts:
the first part is a convolutional neural network, and the structure of the convolutional neural network sequentially comprises an input layer, a first layer of convolutional layers, a first layer of batch normalization layers, a second layer of convolutional layers, a second layer of batch normalization layers, a third layer of convolutional layers, a third layer of batch normalization layers, a maximum pooling layer and a dropout layer;
the second part is a pyramid pooling module which comprises four convolutional neural networks with the same structure, wherein each convolutional neural network represents one level and has 4 different scale levels in total; each convolution neural network structure sequentially comprises an average two-dimensional pooling layer, a convolution layer and a batch normalization layer, and the outputs of the four networks are fused into the output of the pyramid pooling module;
the third part is a convolutional neural network, and the structure of the third part is that a layer 1 convolutional layer, a batch normalization layer, a dropout layer and a layer 2 convolutional layer are sequentially arranged;
(3) The method comprises the steps of constructing a classification network, wherein the structure of the classification network sequentially comprises a first full-connection layer, a second full-connection layer and a third full-connection layer;
(4) Setting parameters of a deep pyramid pooling twin network and a classification network;
(5) Training the deep pyramid pooling twin network and the classification network in stages using a cross-entropy loss function:
(5a) Inputting training samples and sample labels of the two time phase diagrams into a deep pyramid pooling twin network in batches, extracting features, and splicing the extracted features in pairs to obtain a spliced feature diagram;
(5b) Inputting the spliced feature images into a classification network for training until the cross entropy loss function converges, and obtaining a probability prediction matrix M after the training is completed;
(6) And testing the test sample by using the probability prediction matrix M to obtain a change detection result.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention constructs a twin network formed from two identical pyramid pooling networks and performs the change detection operation directly on the two time-phase diagrams, avoiding the use of a difference map and thus solving the problem that the change detection result in traditional methods is affected by the difference map.
Second, because the pyramid pooling module of the deep pyramid pooling twin network fuses image features at four different scales, the richness of the image features is greatly increased, which improves the change detection accuracy.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a graph showing the results of simulation experiment 1 according to the present invention;
FIG. 3 is a graph showing the results of simulation experiment 2 of the present invention;
FIG. 4 is a graph showing the results of simulation experiment 3 of the present invention;
fig. 5 is a graph showing the results of simulation experiment 4 of the present invention.
The specific embodiment is as follows:
embodiments and effects of the present invention are further described below with reference to the accompanying drawings.
Referring to fig. 1, the specific steps for this example are as follows.
And step 1, generating a training sample and a sample label.
1.1) Input two SAR images of the same region at different time phases and perform histogram matching between them;
taking each pixel point in the first time-phase diagram, the second time-phase diagram and the label image as the center, select a square image block of size 31×31 around each pixel point, and take each image block as a sample;
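The per-pixel patch sampling described above can be sketched as follows. This is a minimal illustration, not the patent's code: the function name is hypothetical, and reflection padding at the image borders is an assumption (the patent does not state how edge pixels are handled).

```python
import numpy as np

def extract_patches(image, patch_size=31):
    """Take every pixel as a centre and cut a patch_size x patch_size
    square block around it; borders are padded by reflection (an
    assumption) so that edge pixels also yield full-size patches."""
    half = patch_size // 2
    padded = np.pad(image, half, mode="reflect")
    h, w = image.shape
    patches = np.empty((h * w, patch_size, patch_size), dtype=image.dtype)
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = padded[i:i + patch_size, j:j + patch_size]
    return patches

phase1 = np.random.rand(40, 40).astype(np.float32)  # stand-in for one time-phase image
samples = extract_patches(phase1)
print(samples.shape)  # (1600, 31, 31): one 31x31 sample per pixel
```

Each of the two time-phase images and the label image would be sampled the same way, giving aligned sample triples.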
the label image refers to a reference image in which the changed and unchanged parts of the region have been marked, either manually according to the optical images of the region corresponding to the input first and second time-phase diagrams, or by field inspection;
1.2) Randomly select 30% of all samples in the two time-phase diagrams as training samples and 30% as test samples, and select the corresponding 30% of the label-image samples as sample labels; all samples on the boundary between the changed and unchanged classes are included.
And 2, constructing a depth pyramid pooling twin network.
The network is composed of two deep neural networks with identical structures and parameters, used to extract the features of the first and second time-phase diagrams respectively. Each network consists of three parts, whose components are as follows:
2.1 Building a first part of convolutional neural network:
the convolutional neural network structure of the first part is, in order: input layer, first convolution layer, first batch normalization layer, second convolution layer, second batch normalization layer, third convolution layer, third batch normalization layer, maximum pooling layer and dropout layer. The numbers of convolution kernels of the first, second and third convolution layers are 32, 64 and 128 respectively; all convolution kernels are 3×3 with stride 1×1; the momentum of the three batch normalization layers is 0.95; the pooling window of the maximum pooling layer is 2×2 with stride 2×2; the dropout rate of the dropout layer is 0.5; the activation function of each layer is the relu function;
2.2 A pyramid pooling module of the second part is constructed:
the pyramid pooling module is formed by connecting in parallel four convolutional neural networks with the same structure, each consisting in turn of an average two-dimensional pooling layer, a convolution layer and a batch normalization layer. The pooling window sizes of the average pooling layers in the four networks are 2×2, 4×4, 8×8 and 16×16 respectively, with strides of 2×2, 4×4, 8×8 and 16×16; each convolution layer has 32 convolution kernels of size 1×1 with stride 1×1; the momentum of each batch normalization layer is 0.95; the activation functions in the four networks are relu functions. The pyramid pooling module fuses features at four different pyramid scales;
to preserve the weight of the global features, the pyramid has 4 scale levels; after pooling, each scale level undergoes a 1×1 convolution operation that reduces the feature map to 1/4 of its original size; the feature map is then restored to its size before pooling by bilinear interpolation, and the feature maps of the four scales are fused together;
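The four-level pool/convolve/upsample/fuse flow can be sketched in numpy. This is an illustrative sketch only: a 16×16 input map is assumed so the four windows divide evenly, the weights are random (untrained), all names are hypothetical, and nearest-neighbour repeat stands in for the bilinear interpolation the patent specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, k):
    """Average pooling with window k and stride k on a (C, H, W) map."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def conv1x1(x, weight):
    """1x1 convolution: a per-pixel linear map over channels."""
    return np.tensordot(weight, x, axes=([1], [0]))  # (C_out, H, W)

def pyramid_pooling(x, out_channels=32, levels=(2, 4, 8, 16)):
    """Pool at 4 scale levels, apply a 1x1 convolution to each branch,
    upsample back to the input size and fuse (concatenate) the four
    branches. Nearest-neighbour repeat replaces bilinear interpolation
    here purely for brevity."""
    c, h, w = x.shape
    branches = []
    for k in levels:
        weight = rng.standard_normal((out_channels, c)) * 0.1  # untrained stand-in
        y = conv1x1(avg_pool(x, k), weight)
        y = y.repeat(k, axis=1).repeat(k, axis=2)  # back to (out_channels, H, W)
        branches.append(y)
    return np.concatenate(branches, axis=0)  # (4 * out_channels, H, W)

feat = rng.standard_normal((128, 16, 16))  # assumed output of the first part
fused = pyramid_pooling(feat)
print(fused.shape)  # (128, 16, 16): four 32-channel branches fused
```

The 16×16 level pools the whole (assumed) map to a single cell, which is what gives the module its global-feature branch.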
2.3 Constructing a convolutional neural network of the third part:
the convolutional neural network structure of the third part is, in order: a first convolution layer, a batch normalization layer, a dropout layer and a second convolution layer. The first convolution layer has 64 convolution kernels of size 3×3 with stride 1×1; the momentum of the batch normalization layer is 0.95; the dropout rate of the dropout layer is 0.5; the second convolution layer has 2 convolution kernels of size 1×1 with stride 1×1;
the two identical networks are respectively and independently trained in the subsequent steps to respectively extract the characteristics of the two phase diagrams, so as to realize the twin network function and simultaneously avoid using the difference diagrams.
And 3, constructing a classification network.
The network structure sequentially comprises a first full-connection layer, a second full-connection layer and a third full-connection layer. The number of neurons of each fully-connected layer is 1024; the activation functions of the first full-connection layer and the second full-connection layer are relu functions, and the activation function of the third full-connection layer is softmax function.
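A forward pass through this classification network can be sketched as follows. This is a hedged illustration, not the patent's implementation: weights are random (training would fit them), and the final layer is reduced to the two output classes for clarity, whereas the patent states 1024 neurons for every fully connected layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stabilised
    return e / e.sum(axis=-1, keepdims=True)

def classification_network(features, n_hidden=1024, n_classes=2):
    """Three fully connected layers (relu, relu, softmax) mapping the
    spliced feature vector into the sample label space (changed /
    unchanged). Weights are random stand-ins."""
    d = features.shape[-1]
    w1 = rng.standard_normal((d, n_hidden)) * 0.01
    w2 = rng.standard_normal((n_hidden, n_hidden)) * 0.01
    w3 = rng.standard_normal((n_hidden, n_classes)) * 0.01
    h = relu(features @ w1)
    h = relu(h @ w2)
    return softmax(h @ w3)  # per-sample class probabilities

batch = rng.standard_normal((4, 256))  # 4 spliced feature vectors (dimension assumed)
probs = classification_network(batch)
print(probs.shape)  # (4, 2): one (changed, unchanged) probability pair per sample
```

The softmax output is what later fills the probability prediction matrix M.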
And step 4, training the deep pyramid pooling twin network and the classification network in stages to obtain a prediction probability matrix.
In the whole training process, the loss function adopts a cross entropy loss function, the optimization algorithm adopts random gradient descent, and the learning rate is 1e-6.
The cross entropy loss function is defined as L_log(y, p) = −(y·log(p) + (1 − y)·log(1 − p)), where y is the label (changed class 1, unchanged class 0) and p is the prediction probability.
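As a quick numeric check of this loss, a minimal sketch (the function name is hypothetical; the clipping constant is an added numerical guard, not part of the patent's definition):

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """L_log(y, p) = -(y*log(p) + (1-y)*log(1-p)) for binary labels
    y (1 = changed, 0 = unchanged) and predicted probability p."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(cross_entropy(1, 0.9))  # ~0.105
print(cross_entropy(1, 0.1))  # ~2.303
print(np.mean(cross_entropy(np.array([1, 0, 1]), np.array([0.8, 0.2, 0.6]))))
```

Averaging the per-sample losses over a minibatch gives the quantity minimised during training.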
Gradient descent is defined as follows: given the loss function
J(θ) = (1/2m) · Σ_{i=1}^{m} (h_θ(x_i) − y_i)²,
where m is the number of samples per training input and h_θ(x_i) is the prediction of the model with weights θ on training sample x_i, first take the partial derivative of J(θ) with respect to θ according to the gradient descent rule:
∂J(θ)/∂θ = (1/m) · Σ_{i=1}^{m} (h_θ(x_i) − y_i) · x_i.
Since the loss function is to be minimized, the parameter θ is updated in the direction of its negative gradient:
θ := θ − α · ∂J(θ)/∂θ,
where α is the learning rate.
random gradient descent refers to randomly extracting a group of samples from training samples, and updating according to a gradient descent rule after each training.
4.1) Training the deep pyramid pooling twin network:
4.1a) Input the training samples of the two time-phase diagrams into the first-part convolutional neural network of the twin network for three convolution operations and one pooling operation, obtaining 128 preliminary feature maps of the first time-phase diagram and 128 preliminary feature maps of the second time-phase diagram;
4.1b) Input the feature maps of the two time-phase diagrams obtained in 4.1a) into the second-part pyramid pooling module of the twin network; each scale level pools the feature maps once, and a 1×1 convolution operation is performed after each pooling to reduce the feature maps of the two time-phase diagrams to 1/4 of their original size; the reduced feature maps of the two time-phase diagrams are then restored to their size before pooling by bilinear interpolation;
4.1c) Fuse the feature maps of the two time-phase diagrams at the four scale levels obtained in 4.1b), input the fused feature maps into the third-part convolutional neural network of the twin network for two convolution operations to obtain the final feature maps of the two time-phase diagrams, and splice the feature maps of the two time-phase diagrams together;
4.2) Training the classification network:
input the spliced feature maps of the two time-phase diagrams into the classification network, where the three fully connected layers map them into the sample label space;
4.3) Update the deep pyramid pooling twin network and the classification network according to the stochastic gradient descent rule until the cross-entropy loss function converges, obtaining the trained deep pyramid pooling twin network and classification network; the trained classification network outputs a probability prediction matrix M predicting the probabilities of the changed and unchanged classes.
And 5, testing the test sample by using the probability prediction matrix M to obtain a change detection result.
5.1) Construct a tag matrix with the total number of rows equal to the width of the test image and the total number of columns equal to the height of the test image, and fill it with elements selected in order from the probability prediction matrix M;
5.2) Set a threshold τ = 0.5 and compare each element of the tag matrix with τ: if the element is greater than τ, classify it as the changed class, denoted by 1; otherwise classify it as the unchanged class, denoted by 0, obtaining a new tag matrix containing only the elements 0 and 1;
5.3) Output the new tag matrix in the form of an image to obtain the change detection result map.
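The thresholding in step 5 reduces to one vectorised comparison. A minimal sketch with a small hypothetical probability matrix M (the real M would have one entry per test-image pixel):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in probability prediction matrix M: changed-class probability
# for each pixel of a hypothetical 4x5 test image.
M = rng.uniform(size=(4, 5))

tau = 0.5
change_map = (M > tau).astype(np.uint8)  # 1 = changed class, 0 = unchanged class

print(change_map.shape)  # (4, 5): binary tag matrix, output as an image
```

Writing `change_map` out as a black-and-white image yields the change detection result map.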
The effects of the present invention are further described below in conjunction with simulation experiments:
simulation experiment condition
1. Experiment platform
The hardware platform of the simulation experiment of the invention is: the processor is Intel i9-9700k CPU, the main frequency is 3.6GHz, and the memory is 16G.
The software platform of the simulation experiment of the invention is: ubuntu 16.04 operating system and python3.6.
2. Experimental parameters:
the simulation experiment of the invention uses four groups of real SAR image data and corresponding reference pictures, wherein:
the first set of real SAR image data and the corresponding reference map are SAR images of Sendai region in japan, and as shown in fig. 2, the image size is 549x 560, wherein fig. 2 (a) is SAR image of Sendai a region in 10 months and 20 days in 2010, fig. 2 (b) is SAR image of Sendai a region in 5 months and 6 days in 2011, and fig. 2 (c) is the corresponding reference map of Sendai region in japan.
The second set of real SAR image data and the corresponding reference map are SAR images of the Sendai B region in Japan, as shown in fig. 3; the image size is 613×641. Fig. 3(a) is the SAR image of the Sendai B region on 20 October 2010, fig. 3(b) is the SAR image of the Sendai B region on 6 May 2011, and fig. 3(c) is the corresponding reference map of the Sendai B region.
The third set of real SAR image data and the corresponding reference map are SAR images of the Yellow River A region, as shown in fig. 4; the image size is 306×291. Fig. 4(a) is the SAR image of the Yellow River A region in 2008, fig. 4(b) is the SAR image of the Yellow River A region in 2009, and fig. 4(c) is the corresponding reference map of the Yellow River A region.
The fourth set of real SAR image data and the corresponding reference map are SAR images of the Yellow River B region, as shown in fig. 5; the image size is 400×350. Fig. 5(a) is the SAR image of the Yellow River B region in 2008, fig. 5(b) is the SAR image of the Yellow River B region in June 2009, and fig. 5(c) is the corresponding reference map of the Yellow River B region.
3. Evaluation index of simulation experiment
Missed detection FN: the number of changed-class pixels wrongly detected as unchanged;
False detection FP: the number of unchanged-class pixels wrongly detected as changed;
Overall error OE: the total number of falsely detected and missed pixels;
Kappa coefficient: a parameter measuring the consistency of the change detection map with the reference map.
Second, simulation content:
four groups of simulation experiments were performed under the above experimental conditions using the present invention:
simulation experiment 1 the entire network was trained using the first and second time phase diagrams 3 (a) and 3 (b) of the second set of SendaiB data sets and tested on the first set of SendaiA data sets, and the resulting change detection results are shown in fig. 2 (d).
Simulation experiment 2 the entire network was trained using the first and second time phase diagrams 2 (a) and 2 (b) of the first set of SendaiA datasets and tested on the second set of SendaiB datasets, resulting in a change detection result diagram as shown in fig. 3 (d).
Simulation experiment 3, the whole network is trained by using the first time phase diagram 5 (a) and the second time phase diagram 5 (B) of the fourth group of yellow river B data sets, and the test is performed on the third group of yellow river A data sets, and the obtained change detection result diagram is shown in fig. 4 (d).
Simulation experiment 4, the whole network is trained by using the first time phase diagram 4 (a) and the second time phase diagram 4 (B) of the third group of yellow river A data sets, and tests are carried out on the fourth group of yellow river B data sets, and the obtained change detection result diagram is shown in fig. 5 (d).
The change detection evaluation indexes obtained by the four groups of simulation experiments are shown in table 1.
TABLE 1
Data set FP FN OE Kappa
SendaiA 6165 10246 16411 0.8234
SendaiB 3538 5174 8712 0.8719
Yellow river A 865 226 1091 0.8959
Yellow river B 1643 1211 2854 0.8922
Third, result analysis
As can be seen from Table 1, all four data sets yielded high Kappa coefficients.
Comparing the change detection result map 2(d) generated for the Sendai A region with the corresponding reference map 2(c), map 3(d) for the Sendai B region with reference map 3(c), map 4(d) for the Yellow River A region with reference map 4(c), and map 5(d) for the Yellow River B region with reference map 5(c) shows that the method effectively overcomes the influence of the difference map on the final result in traditional change detection methods. It reduces false detections FP and missed detections FN, preserves boundary details well while remaining robust to noise, and improves the Kappa coefficient of change detection.

Claims (7)

1. A SAR image change detection method based on a pyramid pooling twin network, characterized by comprising the following steps:
(1) Inputting two SAR images of different time phases in the same region to generate a training sample, a test sample and a sample label;
(2) The method comprises the steps of constructing a deep pyramid pooling twin network consisting of two pyramid pooling networks with identical structures and parameters, wherein each pyramid pooling network comprises the following three parts:
the first part is a convolutional neural network, and the structure of the convolutional neural network sequentially comprises an input layer, a first layer of convolutional layers, a first layer of batch normalization layers, a second layer of convolutional layers, a second layer of batch normalization layers, a third layer of convolutional layers, a third layer of batch normalization layers, a maximum pooling layer and a dropout layer;
the second part is a pyramid pooling module which comprises four convolutional neural networks with the same structure, wherein each convolutional neural network represents one level and has 4 different scale levels in total; each convolution neural network structure sequentially comprises an average two-dimensional pooling layer, a convolution layer and a batch normalization layer, and the outputs of the four networks are fused into the output of the pyramid pooling module;
the third part is a convolutional neural network, and the structure of the third part is that a layer 1 convolutional layer, a batch normalization layer, a dropout layer and a layer 2 convolutional layer are sequentially arranged;
(3) The method comprises the steps of constructing a classification network, wherein the structure of the classification network sequentially comprises a first full-connection layer, a second full-connection layer and a third full-connection layer;
(4) Setting parameters of a deep pyramid pooling twin network and a classification network;
(5) Training the deep pyramid pooling twin network and the classification network in stages using a cross-entropy loss function:
(5a) Inputting training samples and sample labels of the two time phase diagrams into a deep pyramid pooling twin network in batches, extracting features, and splicing the extracted features in pairs to obtain a spliced feature diagram;
(5b) Inputting the spliced feature images into a classification network for training until the cross entropy loss function converges, and obtaining a probability prediction matrix M after the training is completed;
(6) And testing the test sample by using the probability prediction matrix M to obtain a change detection result.
2. The method of claim 1, wherein generating training samples, test samples, and sample tags in (1) is accomplished by:
respectively taking each pixel point of the first time-phase diagram, the second time-phase diagram and the label image as a center, selecting square image blocks of size 31×31 around each pixel point, and taking each image block as a sample;
selecting 30% of all time-phase diagram samples as training samples and 30% as test samples, where both sets include all samples on the boundary between the changed and unchanged classes;
selecting 30% from all label image samples as sample labels;
the tag image refers to a reference image in which the changed and unchanged parts of the area have been marked, either manually according to the input SAR images of the area corresponding to the first and second time-phase diagrams, or by field inspection.
3. The method of claim 1, wherein the parameters of the deep pyramid pooling twin network in (4) are set as follows:
for the first-part convolutional neural network, the numbers of convolution kernels of the first, second and third convolutional layers are 32, 64 and 128 respectively, all with kernel size 3×3 and stride 1×1; the momentum of every batch-normalization layer is 0.95; the pooling window of the max-pooling layer is 2×2 with stride 2×2; the dropout rate of the dropout layer is 0.5; the activation function of each layer is the ReLU function;
for the second-part pyramid pooling module, the average-pooling window sizes of the four convolutional neural networks are 2×2, 4×4, 8×8 and 16×16 in turn, with pooling strides of 2×2, 4×4, 8×8 and 16×16 respectively; each convolutional layer has 32 convolution kernels of size 1×1 and stride 1×1; the momentum of the batch-normalization layer in each convolutional neural network is 0.95; the activation function in each convolutional neural network is the ReLU function;
for the third-part convolutional neural network, the first convolutional layer has 64 convolution kernels of size 3×3 and stride 1×1; the momentum of the batch-normalization layer is 0.95; the dropout rate of the dropout layer is 0.5; the second convolutional layer has 2 convolution kernels of size 1×1 and stride 1×1.
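The feature-map sizes produced by the claim-3 pyramid settings (pooling window equal to stride in each branch, no padding assumed) follow from the standard pooling-size formula; the 16×16 input size used below is purely illustrative and is not stated in the claims:

```python
# Output size of a pooling layer: floor((n - k) / s) + 1 (no padding assumed).
def pool_out(n, k, s):
    return (n - k) // s + 1

# Claim-3 pyramid settings: window size == stride for every branch.
windows = [2, 4, 8, 16]
n = 16  # illustrative input feature-map size, not from the claims
sizes = [pool_out(n, k, k) for k in windows]
print(sizes)  # [8, 4, 2, 1]
```

With window equal to stride, each branch simply tiles the map, so the four branches summarize the same features at four progressively coarser resolutions.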
4. The method of claim 1, wherein the parameters of the classification network in (4) are set as follows: the number of neurons in each fully connected layer is 1024, the activation functions of the first and second fully connected layers are the ReLU function, and the activation function of the third fully connected layer is the softmax function.
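A minimal NumPy sketch of the softmax activation named in claim 4 for the third fully connected layer, with the usual max-subtraction for numerical stability (an implementation convention, not part of the claim):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Two output neurons correspond to the changed / unchanged classes.
logits = np.array([[2.0, 0.5]])
probs = softmax(logits)
print(probs.sum())  # 1.0
```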
5. The method of claim 1, wherein extracting the features of the two time-phase images with the deep pyramid pooling twin network in (5a) is implemented as follows:
(5a1) Inputting the training samples of the two time-phase images into the first-part convolutional neural network of the twin network for three convolution operations and one pooling operation, obtaining 128 preliminary feature maps for the first time-phase image and 128 preliminary feature maps for the second time-phase image;
(5a2) Inputting the feature maps of the two time-phase images obtained in (5a1) into the second-part pyramid pooling module of the twin network, pooling the feature maps once at each scale level and applying a 1×1 convolution after each pooling, so that the feature maps of the two time-phase images are reduced to 1/2, 1/4, 1/8 and 1/16 of their original size at the four scale levels respectively; then restoring the reduced feature maps of the two time-phase images to their pre-pooling size by bilinear interpolation;
(5a3) Fusing the feature maps of the two time-phase images from the four scale levels obtained in (5a2), inputting the fused feature maps into the third-part convolutional neural network of the twin network for two convolution operations, and outputting the final feature map of the first time-phase image and the final feature map of the second time-phase image.
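A toy NumPy sketch of one pyramid branch from step (5a2): average-pool a feature map with a 2×2 window and stride, then restore it to its original size by bilinear interpolation. The single-channel input and the align-corners-style coordinate mapping are simplifying assumptions:

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling (window == stride, no padding)."""
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def bilinear_resize(x, out_h, out_w):
    """Bilinear upsampling; maps output pixels back to input coordinates."""
    h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

f = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 feature map
small = avg_pool(f, 2)                         # 4x4, as in the 2x2 branch
restored = bilinear_resize(small, 8, 8)        # back to pre-pooling size
print(small.shape, restored.shape)  # (4, 4) (8, 8)
```

In the patented network the 1×1 convolution sits between these two steps to compress the pooled features before they are restored and fused.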
6. The method of claim 1, wherein in (5b) the concatenated feature maps of the two time-phase images are input into the classification network for training as follows: the feature maps of the two time-phase images are mapped into the sample-label space through the three fully connected layers of the classification network, and the network is updated continuously until the cross-entropy loss function L_log(y, p) = −(y log(p) + (1 − y) log(1 − p)) converges; training is then complete and the probability prediction matrix M is output, where y is the label (1 denotes the changed class, 0 the unchanged class) and p is the predicted probability.
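The binary cross-entropy loss of claim 6 can be sketched directly in NumPy; the clipping constant `eps` is an implementation detail added to avoid log(0), not part of the claim:

```python
import numpy as np

def bce(y, p, eps=1e-12):
    """Mean binary cross-entropy: -(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

y = np.array([1.0, 0.0, 1.0])  # 1 = changed, 0 = unchanged
p = np.array([0.9, 0.1, 0.8])  # predicted probabilities
loss = bce(y, p)
```

Confident, correct predictions give a small loss; an uninformative p = 0.5 everywhere gives log 2 ≈ 0.693.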
7. The method of claim 1, wherein in (6) the test samples are tested with the probability prediction matrix M to obtain the change detection result, implemented as follows:
(6a) Constructing a label matrix whose number of rows equals the width of the test image and whose number of columns equals its height, and filling it with elements taken in sequence from the probability prediction matrix M;
(6b) Setting a threshold τ = 0.5 and comparing each element of the label matrix with τ: if an element is greater than τ it is classified as the changed class, denoted by 1; if it is smaller than τ it is classified as the unchanged class, denoted by 0; this yields a new label matrix containing only the two elements 0 and 1;
(6c) Outputting the new label matrix as an image to obtain the change detection result map.
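A minimal NumPy sketch of claim 7's thresholding step, assuming M is supplied as a flat sequence of probabilities (the claim leaves the tie case p = τ unspecified; here values strictly greater than τ map to 1):

```python
import numpy as np

def change_map(M, width, height, tau=0.5):
    """Fill a width x height label matrix from M and binarise at tau.

    Values > tau become 1 (changed); the rest become 0 (unchanged).
    """
    labels = np.asarray(M, dtype=float).reshape(width, height)
    return (labels > tau).astype(np.uint8)

M = [0.9, 0.2, 0.6, 0.4]       # toy probability matrix, row-major
result = change_map(M, 2, 2)
print(result)  # [[1 0]
               #  [1 0]]
```

Writing `result` out as a binary image gives the change detection result map of step (6c).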
CN201910635704.9A 2019-07-15 2019-07-15 SAR image change detection method based on pyramid pooling twin network Active CN110533631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910635704.9A CN110533631B (en) 2019-07-15 2019-07-15 SAR image change detection method based on pyramid pooling twin network

Publications (2)

Publication Number Publication Date
CN110533631A (en) 2019-12-03
CN110533631B (en) 2023-07-04

Family

ID=68660222

Country Status (1)

Country Link
CN (1) CN110533631B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723732B (en) * 2020-06-18 2023-08-11 西安电子科技大学 Optical remote sensing image change detection method, storage medium and computing equipment
CN112115808A (en) * 2020-08-28 2020-12-22 河海大学 Remote sensing image change detection method based on VGG16 and twin neural network
CN112215795B (en) * 2020-09-02 2024-04-09 苏州超集信息科技有限公司 Intelligent detection method for server component based on deep learning
CN112183432B (en) * 2020-10-12 2022-04-15 中国科学院空天信息创新研究院 Building area extraction method and system based on medium-resolution SAR image
CN112580660B (en) * 2020-11-17 2023-03-24 上海闻泰信息技术有限公司 Image processing method, image processing device, computer equipment and readable storage medium
CN112330666B (en) * 2020-11-26 2022-04-29 成都数之联科技股份有限公司 Image processing method, system, device and medium based on improved twin network
CN112666533B (en) * 2020-12-31 2022-04-08 西安电子科技大学 Repetition frequency change steady target identification method based on spatial pyramid pooling network
CN112617850B (en) * 2021-01-04 2022-08-30 苏州大学 Premature beat and heart beat detection system for electrocardiosignals
CN112712050B (en) * 2021-01-12 2023-05-16 西安电子科技大学 Polarized SAR image semantic change detection method based on DS evidence fusion
CN113128518B (en) * 2021-03-30 2023-04-07 西安理工大学 Sift mismatch detection method based on twin convolution network and feature mixing
CN113033454B (en) * 2021-04-07 2023-04-25 桂林电子科技大学 Method for detecting building change in urban video shooting
CN113240023B (en) * 2021-05-19 2022-09-09 中国民航大学 Change detection method and device based on change image classification and feature difference value prior
CN113420662B (en) * 2021-06-23 2023-04-07 西安电子科技大学 Remote sensing image change detection method based on twin multi-scale difference feature fusion
CN113505834A (en) * 2021-07-13 2021-10-15 阿波罗智能技术(北京)有限公司 Method for training detection model, determining image updating information and updating high-precision map
CN113886226B (en) * 2021-09-23 2022-05-17 中国人民解放军战略支援部队信息工程大学 Test data generation method of confrontation generation model based on twin network
CN115019186B (en) * 2022-08-08 2022-11-22 中科星图测控技术(合肥)有限公司 Method and system for detecting remote sensing change

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203354A (en) * 2016-07-14 2016-12-07 南京信息工程大学 Scene recognition method based on interacting depth structure
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN107944353A (en) * 2017-11-10 2018-04-20 西安电子科技大学 SAR image change detection based on profile ripple BSPP networks

Similar Documents

Publication Publication Date Title
CN110533631B (en) SAR image change detection method based on pyramid pooling twin network
CN112052755B (en) Semantic convolution hyperspectral image classification method based on multipath attention mechanism
CN114092832B (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN110929607A (en) Remote sensing identification method and system for urban building construction progress
CN110555841B (en) SAR image change detection method based on self-attention image fusion and DEC
CN108446616B (en) Road extraction method based on full convolution neural network ensemble learning
CN109117883A (en) SAR image sea ice classification method and system based on long memory network in short-term
CN112766283B (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN111222545B (en) Image classification method based on linear programming incremental learning
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
CN115170874A (en) Self-distillation implementation method based on decoupling distillation loss
CN112785479B (en) Image invisible watermark universal detection method based on few sample learning
CN114332075A (en) Rapid structural defect identification and classification method based on lightweight deep learning model
CN112884721B (en) Abnormality detection method, abnormality detection system and computer-readable storage medium
CN112818777B (en) Remote sensing image target detection method based on dense connection and feature enhancement
CN113591608A (en) High-resolution remote sensing image impervious surface extraction method based on deep learning
CN110751201B (en) SAR equipment task failure cause reasoning method based on textural feature transformation
CN115456957B (en) Method for detecting change of remote sensing image by full-scale feature aggregation
CN116206203A (en) Oil spill detection method based on SAR and Dual-EndNet
CN108053093A (en) A kind of k- neighbour's method for diagnosing faults based on the conversion of average influence Value Data
CN114818945A (en) Small sample image classification method and device integrating category adaptive metric learning
CN113935413A (en) Distribution network wave recording file waveform identification method based on convolutional neural network
CN111860258A (en) Examination room global event detection method and system based on three-dimensional convolutional neural network
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception
CN112613371A (en) Hyperspectral image road extraction method based on dense connection convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant