Method for detecting remote sensing image change based on image segmentation and twin neural network
Technical Field
The invention relates to the field of remote sensing change detection, and in particular to a high-resolution remote sensing image change detection method based on deep learning, which can be used for change detection on bi-temporal high-resolution remote sensing images from satellites and unmanned aerial vehicles.
Background
With the improvement of the spatial resolution and revisit frequency of satellite images, the rapid and accurate discovery of surface change information using change detection technology has become a research hotspot in the remote sensing field. Remote sensing change detection uses multi-source remote sensing images and related geospatial data of the same surface area in different periods, combined with the corresponding ground-feature characteristics and the remote sensing imaging mechanism, to determine and analyze changes in the position, extent, properties, and state of ground features by means of image and graphics processing theory and mathematical models, separating out the change information of interest and filtering out irrelevant changes that act as interference. It is widely applied in fields such as vegetation change monitoring, urban expansion, and illegal-building detection.
Early methods mainly targeted medium- and low-resolution remote sensing images, taking the pixel as the basic unit of analysis and extracting change information by comparing spectral differences pixel by pixel. Such methods use only the feature information of a single pixel and easily ignore its spatial and spectral context, producing "salt-and-pepper" noise and incomplete delineation of change areas. With the commercialization of high-resolution remote sensing images, object-oriented image analysis techniques were introduced into high-resolution remote sensing image analysis, and the basic unit of change detection shifted from pixels to objects. Taking the object as the basic analysis unit integrates the spectral information of a pixel with the spatial information of its neighborhood, reducing the false-alarm and missed-detection rates in the difference map. However, owing to the characteristics of high-resolution imagery, low-level features such as texture and shape appear in very complex forms; traditional change detection usually relies on manual feature extraction guided by expert priors and cannot extract effective features that contain deep change information. Manual features often carry redundant information and noise, which greatly degrades change detection accuracy. In practice, the performance of a detection algorithm also depends to a large extent on its hyper-parameter settings; although strategies such as grid search and random search improve the efficiency of parameter search, they still waste considerable manpower and computing resources.
The main limitation of manually set algorithm parameters is that conventional optimization strategies can hardly reach a global optimum in the parameter space, which manifests as poor generalization performance.
By virtue of its strong image feature extraction capability, deep-learning-based change detection has become a hotspot of remote sensing image change detection research. Unlike traditional change detection algorithms, current mainstream deep learning change detection methods no longer take a single pixel or object as the basic unit of analysis; instead, they adopt an image-comparison approach, treating change detection as a semantic segmentation task and converting the input directly into a change map through a fully convolutional neural network. This end-to-end detection mode simplifies the change detection pipeline, effectively improves the accuracy of the detection results, offers a clear advantage in detection speed, and facilitates rapid processing of large volumes of data.
Aiming at the problems of tedious work, heavy workload, and low degree of automation in urban land-resource change detection, Wang Ming et al. proposed FPN Res-UNet, a multi-scale network based on a residual structure and a feature pyramid network; fusing the residual structure and the feature pyramid network into a UNet model enhances the model's detection performance on targets of different scales. Yuan et al. proposed a change detection algorithm fusing UNet++ with an attention mechanism, combined with a multi-output fusion strategy for remote sensing image change detection, so that the detection results better preserve the smoothness and integrity of edges. To reduce pseudo-change phenomena in detection results, Ning Bo et al. proposed a remote sensing image change detection method based on a twin residual neural network: superpixels of multi-temporal multispectral images are segmented and merged, features are extracted from the segmented sub-blocks, a twin residual neural network then performs binary classification to obtain similarities, and the final change detection difference map is obtained after OTSU threshold segmentation.
The above studies represent the three ways in which current deep learning detection methods combine the two time-phase images: (1) early fusion: image data of different time phases are stacked and input into the network together; (2) twin (Siamese) neural network: the bi-temporal images are fed in turn into one shared feature extractor, and the output feature-map pair is then combined; (3) pseudo-twin network: the two time-phase images are input into two different feature extractors respectively. In the early fusion method, image data of different time phases are stacked at the input, so difference detection begins at the first layer of the network; features belonging to different time phases therefore interfere with each other, and the high-dimensional features of the original images are hard to preserve. The twin neural network method receives image data of different time phases through two inputs and links the feature extraction and difference recognition of the original images within the same multi-layer network; although the high-dimensional features of the images are retained, the risk of vanishing gradients is greatly increased, and the original-image features extracted at the front end of the network are poorly representative.
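The three combination strategies above can be contrasted in a minimal PyTorch sketch; `TinyExtractor` is a hypothetical stand-in feature extractor, not the network of the invention:

```python
import torch
import torch.nn as nn

class TinyExtractor(nn.Module):
    # Stand-in feature extractor (hypothetical; real methods use deeper CNNs).
    def __init__(self, in_ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())

    def forward(self, x):
        return self.conv(x)

x1 = torch.randn(1, 3, 64, 64)  # time-phase 1 image
x2 = torch.randn(1, 3, 64, 64)  # time-phase 2 image

# (1) Early fusion: stack the two phases channel-wise before the network,
#     so difference detection starts at the very first layer.
ef = TinyExtractor(6)(torch.cat([x1, x2], dim=1))

# (2) Twin (Siamese): ONE extractor with shared weights sees each phase;
#     the output feature maps are compared afterwards.
shared = TinyExtractor(3)
tw = torch.abs(shared(x1) - shared(x2))

# (3) Pseudo-twin: two independent extractors, no weight sharing.
pt = torch.abs(TinyExtractor(3)(x1) - TinyExtractor(3)(x2))
```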
Disclosure of Invention
The invention provides a method for detecting changes in remote sensing images based on image segmentation and a twin neural network, which detects changes in high-resolution remote sensing images by constructing a new change detection model based on image segmentation and a twin neural network. The detection model is divided into an encoder and a decoder: a deep convolution module in the encoder extracts the high-dimensional features of the images and feeds them into a multi-level fused twin neural network to generate a multi-scale contrastive feature difference map; the decoder is responsible for identifying change areas from the difference map, and finally a detection result of the same size as the original image is obtained by bilinear interpolation.
In order to solve the above technical problems, the invention adopts the following technical scheme: the method for detecting changes in remote sensing images based on image segmentation and a twin neural network specifically comprises the following steps:
S1: construct a training sample library from the collected images to be processed, surface coverage vectors, and raster files, the training sample library containing multi-temporal image data of the same area and label data of ground-feature changes;
S2: train the change detection network Siam-Deep using the multi-temporal image data and the ground-feature change label data in the training sample library constructed in step S1, learning the change characteristics of different ground features in high-resolution remote sensing images;
S3: post-process the extracted change detection result, remove noise and speckle from the change areas, and regularize building outlines to obtain the final change detection result.
By adopting the above technical scheme, a new remote sensing image change detection model, Siam-Deep, based on image segmentation and a twin neural network, is constructed for change detection of high-resolution remote sensing images; training data and labels are needed to train the model. The detection model is divided into an encoder and a decoder: a deep convolution module in the encoder extracts the high-dimensional features of the images and feeds them into a multi-level fused twin neural network to generate a multi-scale contrastive feature difference map; the decoder is responsible for identifying change areas from the difference map, and finally a detection result of the same size as the original image is obtained by bilinear interpolation.
As a preferred technical solution of the present invention, the training sample library in step S1 further includes real label data based on manual annotation and label data obtained by image difference analysis based on the results of a universal segmentation model.
As a preferred technical solution of the present invention, in the step S1, the specific steps of constructing the training sample library include:
S11 registration and spatio-temporal matching of change-area images: perform spatio-temporal matching on the collected high-resolution image data according to the area covered by the existing change vector and raster information data, i.e., match data of the same longitude-latitude area in different periods. If the image to be processed is a sheet (tiled) image, cut and mosaic the adjacent sheets to obtain a complete image.
S12 image resampling: count the resolutions of the high-resolution images cropped by the spatio-temporal matching in the previous step, and resample the other images using the most common resolution as the reference;
S13 vector change label rasterization: rasterize the collected vector change files into raster labels with the same resolution as the corresponding images, where the label pixels take two values, changed area and unchanged area;
S14 model training sample preparation: repeatedly cut 256 × 256 label blocks at random positions of the raster change labels, count the number of changed pixels each block contains, and keep the blocks in which the ratio of changed pixels to total pixels exceeds 0.5; cut the corresponding image blocks from the high-resolution images of the different periods at the positions of the retained label blocks; name the label blocks and sample blocks and store them in the training sample library. Changed areas are labeled 1 and unchanged areas 0.
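A minimal NumPy sketch of this cropping-and-filtering procedure, assuming the label and both images are already co-registered arrays; the function name and number of trials are illustrative, and the 0.5 threshold follows the value stated in step S14:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_patches(label, img_t1, img_t2, size=256, n_tries=100, min_ratio=0.5):
    """Randomly crop size x size label blocks; keep those whose changed-pixel
    ratio exceeds min_ratio, and cut the matching blocks from both images."""
    h, w = label.shape
    samples = []
    for _ in range(n_tries):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        blk = label[y:y + size, x:x + size]
        # Changed pixels are labeled 1, unchanged 0, so the mean is the ratio.
        if blk.mean() > min_ratio:
            samples.append((blk,
                            img_t1[y:y + size, x:x + size],
                            img_t2[y:y + size, x:x + size]))
    return samples
```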
As a preferred embodiment of the present invention, the change detection network Siam-Deep in step S2 includes two parts, an Encoder and a Decoder, wherein the encoder is composed of two deep convolution modules (DCNN) and a twin spatial pyramid module (Siam-ASPP); the decoder part consists of an up-sampling module, a feature connection module, and convolution modules.
As a preferred technical solution of the present invention, the deep convolution module (DCNN) of the encoder consists of two groups of successively stacked atrous convolutions (Atrous Convolution) and rectified linear units (ReLU); the twin spatial pyramid module (Siam-ASPP) is composed of atrous convolution and atrous spatial pyramid pooling (ASPP). The 4 features of different scales produced by the twin spatial pyramid module are concatenated along the channel dimension and then fed into a 1 × 1 convolution for fusion, yielding a new 256-channel feature.
As a preferred technical solution of the present invention, the decoder includes an up-sampling module, a feature connection module, and two convolution modules; the up-sampling module consists of two up-sampling layers; the feature connection module is a concatenation layer; the two convolution modules are a 1 × 1 convolution layer and a 3 × 3 convolution layer, respectively. The decoder first uses a 1 × 1 convolution to reduce the dimension of the low-level features output by the deep convolution module; next, the features obtained by the encoder are upsampled 4× by bilinear interpolation; these are then concatenated with the low-level features of corresponding size from the encoder; a 3 × 3 convolution then further fuses the features; finally, bilinear interpolation produces a segmentation prediction of the same size as the original image.
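The decoder path just described can be sketched in PyTorch as follows; the channel widths (48 after the 1 × 1 reduction, 256 after fusion) are assumptions borrowed from typical DeepLab-style decoders, not values stated in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Sketch: 1x1 conv reduces the low-level features, the encoder output is
    upsampled 4x by bilinear interpolation, the two are concatenated, fused by
    a 3x3 conv, and upsampled to the original image size."""
    def __init__(self, low_ch=64, enc_ch=256, out_classes=2):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, 48, 1)            # 1x1 dimension reduction
        self.fuse = nn.Conv2d(enc_ch + 48, 256, 3, padding=1)  # 3x3 fusion
        self.classify = nn.Conv2d(256, out_classes, 1)

    def forward(self, low_feat, enc_feat, out_size):
        x = F.interpolate(enc_feat, scale_factor=4,
                          mode="bilinear", align_corners=False)
        x = torch.cat([x, self.reduce(low_feat)], dim=1)  # concat with low-level
        x = F.relu(self.fuse(x))
        x = self.classify(x)
        # Final bilinear interpolation back to the original image size.
        return F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)

dec = Decoder()
# low-level features at 1/4 resolution, encoder output at 1/16 resolution.
pred = dec(torch.randn(1, 64, 64, 64), torch.randn(1, 256, 16, 16), (256, 256))
```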
As a preferred embodiment of the present invention, all max pooling layers (Max Pooling Layer) among the atrous convolutions in the encoder are replaced by depthwise separable convolutions with stride = 2.
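A sketch of this replacement; the channel counts are illustrative:

```python
import torch
import torch.nn as nn

class SeparableDownsample(nn.Module):
    """stride=2 depthwise-separable convolution used in place of max pooling:
    a per-channel (depthwise) 3x3 conv with stride 2, then a 1x1 pointwise conv.
    Unlike max pooling, the downsampling is learnable."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=2,
                                   padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 64, 64)
y = SeparableDownsample(32, 64)(x)  # spatial size halved, like 2x2 max pooling
```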
As a preferred embodiment of the present invention, the data enhancement methods in step S15 include: image rotation, flipping, random noise addition, and brightness adjustment.
As a preferred embodiment of the present invention, the dilation rate of the atrous convolution is 2; the twin spatial pyramid module uses 3 dilation rates, 6, 12, and 18; the convolution kernels are all 3 × 3, and the convolution stride is 1.
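These settings can be sketched as follows; using a 1 × 1 convolution as the fourth branch is an assumption, since the text only states that 4 features of different scales are concatenated and fused into a 256-channel feature:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid sketch: three 3x3 branches with dilation rates
    6, 12, 18 (stride 1) plus a 1x1 branch give 4 multi-scale features, which
    are concatenated channel-wise and fused by a 1x1 conv to 256 channels."""
    def __init__(self, in_ch, branch_ch=256):
        super().__init__()
        self.branches = nn.ModuleList([nn.Conv2d(in_ch, branch_ch, 1)])
        for rate in (6, 12, 18):
            # padding == dilation keeps the spatial size unchanged for 3x3 kernels
            self.branches.append(nn.Conv2d(in_ch, branch_ch, 3, stride=1,
                                           padding=rate, dilation=rate))
        self.fuse = nn.Conv2d(4 * branch_ch, 256, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))

aspp = ASPP(in_ch=512)
out = aspp(torch.randn(1, 512, 16, 16))
```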
As a preferred embodiment of the present invention, in the post-processing of step S3, a low-pass filtering convolution kernel is used to remove noise and speckle from the detection result, and a polygon regularization method consisting of coarse adjustment and fine adjustment transforms the polygons in the detection result into structured contours.
Compared with the prior art, the invention has the following beneficial effects: 1) a twin-neural-network-based change detection model is provided for change detection of high-resolution remote sensing images; 2) even in complex change areas, the detection result remains smooth and close to the real ground-feature change; 3) the trained network model can be used to detect changes in various complex scenes such as water bodies, buildings, forests, and roads.
Drawings
FIG. 1 is a schematic structural diagram of the overall model of the method for detecting changes in remote sensing images based on image segmentation and a twin neural network;
FIG. 2 is a schematic diagram of the twin atrous convolution module (Siam-ASPP) in the method for detecting changes in remote sensing images based on image segmentation and a twin neural network;
FIG. 3 shows samples of change detection results of the method for detecting changes in remote sensing images based on image segmentation and a twin neural network; wherein (a) is an aerial remote sensing image of a certain place, (b) is a later revisit aerial remote sensing image of the same area as (a), and (c) is the ground-feature change analysis result for (a) and (b); (d) is an aerial remote sensing image of another place, (e) is a later revisit aerial remote sensing image of the same area as (d), and (f) is the ground-feature change analysis result for (d) and (e).
Detailed Description
The technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments.
Embodiment: the method for detecting changes in remote sensing images based on image segmentation and a twin neural network specifically comprises the following steps:
S1: as shown in fig. 1, to train the model, training data and labels are first prepared; a training sample library is constructed from the collected images to be processed, surface coverage vectors, and raster files acquired by satellites and unmanned aerial vehicles, the training sample library containing multi-temporal image data of the same area and label data of ground-feature changes;
in step S1, the specific steps of constructing the training sample library include:
S11 registration and spatio-temporal matching of change-area images: perform spatio-temporal matching on the collected high-resolution image data according to the area covered by the existing change vector and raster information data, i.e., match data of the same longitude-latitude area in different periods; if the image to be processed is a sheet (tiled) image, cut and mosaic the adjacent sheets to obtain a complete image;
S12 image resampling: count the resolutions of the high-resolution images cropped by the spatio-temporal matching in step S11, and resample the other images using the most common resolution as the reference;
S13 vector change label rasterization: rasterize the collected vector change files into raster labels with the same resolution as the corresponding images, where the label pixels take two values, changed area (1) and unchanged area (0);
S14 model training sample preparation: repeatedly cut 256 × 256 label blocks at random positions of the raster change labels, count the number of changed pixels each block contains, and keep the blocks in which the ratio of changed pixels to total pixels exceeds 0.5; at the positions of the retained label blocks, cut the corresponding image blocks from the high-resolution images of the different periods; name the label blocks and sample blocks and store them in the training sample library;
S15 data enhancement: perform data enhancement on the image blocks and corresponding label blocks to generate the training sample library; the data enhancement methods in step S15 include: image rotation and flipping, random noise addition, and brightness adjustment; the training sample library also includes real label data based on manual annotation and label data obtained by image difference analysis based on the results of a universal segmentation model;
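A minimal NumPy sketch of the four augmentations listed in S15; the noise level and brightness range are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image, label):
    """Rotation, flipping, random noise, and brightness adjustment.
    image is an HxWxC float array in [0, 1]; label is an HxW array.
    Geometric transforms are applied identically to image and label;
    photometric transforms affect only the image."""
    k = int(rng.integers(0, 4))                 # rotate by 0/90/180/270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:                      # horizontal flip
        image, label = image[:, ::-1], label[:, ::-1]
    image = image + rng.normal(0, 0.01, image.shape)  # random Gaussian noise
    image = image * rng.uniform(0.8, 1.2)             # brightness scaling
    return np.clip(image, 0, 1), label
```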
S2: train the change detection network Siam-Deep using the multi-temporal image data and the ground-feature change label data in the training sample library constructed in step S1, learning the change characteristics of different ground features in high-resolution remote sensing images;
the change detection network Siam-Deep in step S2 includes two parts, an Encoder and a Decoder, wherein the encoder is composed of two deep convolution modules (DCNN) and a twin spatial pyramid module (Siam-ASPP); the decoder part consists of an up-sampling module, a feature connection module, and convolution modules; the deep convolution module (DCNN) of the encoder consists of two groups of successively stacked atrous convolutions (Atrous Convolution) and rectified linear units (ReLU); as shown in fig. 2, the twin spatial pyramid module (Siam-ASPP) is composed of atrous convolution and atrous spatial pyramid pooling (ASPP); the 4 features of different scales produced by the twin spatial pyramid module are concatenated along the channel dimension and then fed into a 1 × 1 convolution for fusion, yielding a new 256-channel feature; the decoder comprises an up-sampling module, a feature connection module (Concat Layer), and two convolution modules; the up-sampling module consists of two up-sampling layers; the feature connection module is a concatenation layer; the two convolution modules are a 1 × 1 convolution layer (Convolution Layer) and a 3 × 3 convolution layer, respectively; the dilation rate of the atrous convolution is 2; the twin spatial pyramid module uses 3 dilation rates, 6, 12, and 18; the convolution kernels are all 3 × 3, and the convolution strides are all 1;
the decoder first uses a 1 × 1 convolution to reduce the dimension of the low-level features output by the deep convolution module; next, the features obtained by the encoder are upsampled 4× by bilinear interpolation; these are then concatenated with the low-level features of corresponding size from the encoder; a 3 × 3 convolution then further fuses the features; finally, bilinear interpolation produces a segmentation prediction of the same size as the original image; all max pooling layers (Max Pooling Layer) among the atrous convolutions in the encoder are replaced by depthwise separable convolutions with stride = 2;
S3: post-process the extracted change detection result, remove noise and speckle from the change areas, and regularize building outlines to obtain the final change detection result; in the post-processing of step S3, a low-pass filtering convolution kernel is used to remove noise and speckle from the detection result, and a polygon regularization method consisting of coarse adjustment and fine adjustment transforms the polygons in the detection result into structured contours. FIG. 3 shows example results of applying the method of the present invention, wherein (a) in FIG. 3 is an aerial remote sensing image of a certain place, (b) is a later revisit aerial remote sensing image of the same area as (a), and (c) is the ground-feature change analysis result for (a) and (b); (d) in FIG. 3 is an aerial remote sensing image of another place, (e) is a later revisit aerial remote sensing image of the same area as (d), and (f) is the ground-feature change analysis result for (d) and (e).
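A minimal sketch of the low-pass denoising step (the polygon regularization by coarse and fine adjustment is omitted here); the kernel size and threshold are illustrative choices:

```python
import numpy as np
from scipy import ndimage

def remove_speckle(change_map, size=5, thresh=0.5):
    """Low-pass filtering of a binary change map: a uniform (box) kernel
    smooths isolated noise pixels, and re-thresholding restores a binary map.
    Isolated speckle averages to a small value and is suppressed, while the
    interior of large change regions stays above the threshold."""
    smoothed = ndimage.uniform_filter(change_map.astype(float), size=size)
    return (smoothed > thresh).astype(np.uint8)
```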
To verify the effectiveness of the method, it is compared with 4 existing deep learning methods; Table 1 compares the accuracy of the method of the present invention with that of the other deep-learning-based methods.
TABLE 1 Comparison of the method of the present invention with 4 other deep learning methods

| Detection method  | Precision | Recall | F1 score | Overall accuracy |
|-------------------|-----------|--------|----------|------------------|
| FC-EF             | 0.609     | 0.528  | 0.594    | 0.911            |
| FC-Siam-diff      | 0.706     | 0.658  | 0.667    | 0.932            |
| EF-UNet++         | 0.911     | 0.883  | 0.896    | 0.978            |
| DASNet (ResNet50) | 0.932     | 0.922  | 0.927    | 0.982            |
| Siam-Deeplab      | 0.947     | 0.934  | 0.940    | 0.986            |
As can be seen from Table 1, our method outperforms the other 4 methods on all metrics, with a precision of 0.947, a recall of 0.934, an F1 score of 0.940, and an overall accuracy of 0.986.
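The four metrics in Table 1 can be computed pixel-wise from the confusion matrix; a minimal sketch over flattened binary change maps:

```python
def change_metrics(pred, truth):
    """Precision, recall, F1 score, and overall accuracy for a binary change
    map, where 1 marks changed pixels and 0 unchanged pixels."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))  # true positives
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))  # false alarms
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))  # missed changes
    tn = sum(p == 0 and t == 0 for p, t in zip(pred, truth))  # true negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    overall_accuracy = (tp + tn) / len(pred)
    return precision, recall, f1, overall_accuracy
```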
The above description is only an exemplary embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.