CN117115663A - Remote sensing image change detection system and method based on deep supervision network - Google Patents
- Publication number: CN117115663A (application CN202311264681.8A)
- Authority: CN (China)
- Prior art keywords: network, convolution, feature, change detection, remote sensing
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention provides a remote sensing image change detection system and method based on a deep supervision network. DSNet is an end-to-end encoder-decoder network built on the Unet++ model and consists of an encoder, a decoder and a classifier. The bi-temporal images Image1 and Image2 are used as the input of a twin (Siamese) network, and a multi-scale residual feature extraction module performs successive feature extraction and downsampling. At each level, the feature difference information of the bi-temporal images is concatenated with the output of the preceding node and used as the input of the decoder; the decoder fuses feature maps of different scales, and the resulting feature maps are classified through a deep supervision network to obtain the predicted change map. The asymmetric multi-scale residual feature extraction module (MultiRes block) designed in the system improves the F1-Score by 1.4%, and the deep fusion supervision part improves the F1-Score by 2.1% while adding only a small number of parameters.
Description
Technical Field
The invention belongs to the technical field of remote sensing imagery and relates in particular to a remote sensing image change detection system and method based on a deep supervision network.
Background
Remote sensing image change detection is a technique that identifies the changed pixels between multi-temporal remote sensing images acquired over the same area at different times. It is widely used in land use monitoring, urban expansion, disaster assessment and related fields. With the development of machine learning, researchers have progressively applied methods such as support vector machines, random forests and decision trees to remote sensing images. Because conventional pixel-level classification tends to produce a salt-and-pepper effect, the basic unit of change detection has shifted from the pixel to the object. At the same time, many methods that take the relationship between adjacent pixels into account have been introduced into object-level change detection, for example classifying unchanged and uncertain pixels in a difference image by linear weighting, or applying an improved Markov random field method.
Convolutional neural networks (CNNs) have been widely adopted in remote sensing image research; representative algorithms such as FC-EF, FC-Siam-conc and FC-Siam-diff achieve end-to-end training. Change detection methods based on fully convolutional networks can be roughly divided into two types: pre-fusion and post-fusion. Pre-fusion methods concatenate the multi-temporal images as the input of a single-branch network, which then directly produces a binary change map. For example, an improved Unet++ network uses global and fine-grained information to generate prediction maps with higher spatial accuracy and then fuses the prediction maps of different levels into a high-accuracy final change map; its effectiveness and reliability were verified on very-high-resolution (VHR) satellite images. However, this improved Unet++ structure requires a large amount of change image data and cannot effectively generate multi-scale change features, which limits its broader application to some extent.
Post-fusion methods extract image features with a weight-sharing dual-branch network, map them to feature maps in a high-dimensional space for fusion, and finally obtain the change map through upsampling. Representative algorithms include DifUnet++, SNUNet-CD and Siam-DeepLabv3+. However, remote sensing images typically contain complex scenes and rich information. When extracting features, most existing networks do not fully exploit the original image information, ignore the semantic correlation between different levels, and train a single model against a single output, so the spatial and spectral characteristics of the images are not fully used and change detection performance cannot be improved further.
Although existing remote sensing image change detection methods extract features reasonably well, the following problems remain:
First: the original image information is not sufficiently extracted.
Second: the semantic correlation between different levels is ignored, so the spatial and spectral characteristics of the images are not fully used.
Third: images acquired in the earlier and later periods are easily affected by external conditions such as illumination, which makes change detection difficult.
Disclosure of Invention
To address these technical problems, the invention provides a remote sensing image change detection system and method based on a deep supervision network.
Specifically:
a remote sensing image change detection system based on a deep supervision network comprises an encoder, a decoder and a classifier, which together form DSNet, an end-to-end encoder-decoder network based on the Unet++ model;
the bi-temporal images Image1 and Image2 are used as the input of a twin network; successive feature extraction and downsampling are performed by a multi-scale residual feature extraction module; the feature difference information of the bi-temporal images obtained at each level is concatenated with the output of the preceding node and used as the input of the decoder; the decoder fuses feature maps of different scales; and the resulting feature maps are classified through a deep supervision network to obtain a predicted change map.
The encoder serves as the backbone of the network and extracts the difference information of the bi-temporal images. Image1 and Image2 are used as the input of the twin network, successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module, and the feature difference information obtained at each level is concatenated with the output of the preceding node and used as the input of all subsequent nodes. Horizontally, the encoder comprises five paths, named L1 to L5 from top to bottom. Convolution units on the same path produce output feature maps of the same size, different paths contain different numbers of convolution units, and the numbers of output channels of the convolution units on L1 to L5 are 32, 64, 128, 256 and 512, respectively.
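As a quick consistency check, the sketch below tabulates the output shape of each encoder path. The 256×256 input and the factor-2 downsampling between paths are assumptions (the text only gives channel counts), chosen because they agree with the 1×, 2×, 4× and 8× upsampling rates the decoder uses to return to 256×256; the helper name `path_shapes` is hypothetical.

```python
# Hypothetical helper: output shape of each DSNet encoder path (L1-L5).
# Assumes a 256x256 input and 2x downsampling between successive paths.
CHANNELS = (32, 64, 128, 256, 512)  # output channels of L1..L5, as stated in the text


def path_shapes(height: int = 256, width: int = 256) -> list:
    """Return (path name, channels, height, width) for every encoder path."""
    shapes = []
    for level, channels in enumerate(CHANNELS):
        factor = 2 ** level  # path L(level+1) has been downsampled 2**level times
        shapes.append((f"L{level + 1}", channels, height // factor, width // factor))
    return shapes


for name, c, h, w in path_shapes():
    print(f"{name}: {c} x {h} x {w}")
```

Under these assumptions, paths L1 to L4 have spatial sizes 256, 128, 64 and 32, which are the paths whose last nodes X0,4, X1,3, X2,2 and X3,1 feed the decoder.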
The multi-scale residual feature extraction module is structured as follows:
first, a standard 3×3 convolution layer extracts shallow information and detail features of the image; next, the features pass through asymmetric convolution kernels of size 1×3 and 3×1. Then a 3×3 separable convolution layer is applied: the output of the asymmetric convolution is convolved channel by channel in the two-dimensional plane, and the resulting independent channels are combined by weighting along the depth direction. The outputs of these three convolution layers are then concatenated to extract semantic information at different scales. Meanwhile, a 1×1 convolution layer is used twice in the module, once for the residual connection and once to adjust the number of channels after concatenation. Finally, the concatenated output features and the residual-connection features are added to obtain additional spatial information.
The decoder generates the feature difference map:
the feature maps X0,4, X1,3, X2,2 and X3,1 each pass through a multi-scale residual feature extraction module and a transposed convolution, which map the fused features of each path into feature maps F1, F2, F3 and F4 with 32 channels each. Each path uses a different upsampling rate (X0,4 -> U1×1, X1,3 -> U2×2, X2,2 -> U4×4, X3,1 -> U8×8), realized with different strides in the transposed convolutions, so that the output feature maps F1, F2, F3 and F4 all have a size of 256×256.
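The effect of choosing a different stride per path can be checked with the output-size relation of a transposed convolution (the same relation that the PyTorch `ConvTranspose2d` documentation gives). This is a sketch: the input sizes assume a 256×256 image downsampled by 2 per path, and setting the kernel size equal to the stride with no padding is a hypothetical configuration, since the text states only that different strides are used.

```python
def conv_transpose_out(size: int, kernel: int, stride: int,
                       padding: int = 0, output_padding: int = 0, dilation: int = 1) -> int:
    """Output length of a transposed convolution along one spatial axis:
    (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1
    """
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


# Hypothetical configuration: kernel size equal to stride and no padding,
# so each transposed convolution upsamples by exactly its stride.
DECODER_INPUTS = {  # node: (input spatial size, upsampling rate)
    "X0,4": (256, 1),
    "X1,3": (128, 2),
    "X2,2": (64, 4),
    "X3,1": (32, 8),
}

output_sizes = {node: conv_transpose_out(size, kernel=rate, stride=rate)
                for node, (size, rate) in DECODER_INPUTS.items()}
print(output_sizes)
```

Every entry comes out as 256, matching the stated 256×256 size of F1 to F4. Other kernel/padding pairs work too; for example, kernel 4, stride 2 and padding 1 also doubles the size exactly.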
Based on this system, the invention also provides a remote sensing image change detection method based on a deep supervision network, comprising the following steps:
Step one: collect image data and build a dataset;
Step two: preprocess the images to remove irrelevant information, recover useful real information, enhance the detectability of relevant information and simplify the data as far as possible, so that the data are better suited as model input;
Step three: set the evaluation metrics and parameters.
Comparing the prediction result (PredictedResult) with the ground truth assigns each pixel to one of four cases: predicted positive and correct (true positive, TP), predicted positive but wrong (false positive, FP), predicted negative but wrong (false negative, FN), and predicted negative and correct (true negative, TN). The model is evaluated with recall, precision and F1-Score. In the change detection task, a higher recall means the model finds more of the changed pixels, and a higher precision means the detected changed pixels are more accurate. F1-Score combines precision and recall as their harmonic mean and ranges from 0 to 1. The evaluation metrics are defined as follows:
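The formula images are not reproduced in this text. Assuming they follow the standard definitions of these metrics (an assumption; the term P mentioned in the next sentence is not reconstructed here), they read:

```latex
\begin{aligned}
\mathrm{Precision} &= \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN},\\
F1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
OA = \frac{TP + TN}{N}, \quad N = TP + TN + FP + FN.
\end{aligned}
```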
where N is the total number of pixels, P is the ratio between the true and predicted values for a given class distribution, and OA is the overall classification accuracy. DSNet is trained for 100 epochs with a batch size of 16. The learning rate follows a step schedule (StepLR) with an initial value of 0.001 that is multiplied by 0.5 every 10 epochs.
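The stated schedule amounts to lr(e) = 0.001 × 0.5^⌊e/10⌋. The sketch below assumes 0-based epoch counting with the decay applied after every tenth epoch; in PyTorch this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)`, although the text does not say which framework was used.

```python
def step_lr(epoch: int, base_lr: float = 1e-3, step_size: int = 10, gamma: float = 0.5) -> float:
    """Learning rate at a 0-based epoch under step decay:
    base_lr is multiplied by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


EPOCHS = 100     # training period stated in the text
BATCH_SIZE = 16  # batch size stated in the text (not used by the schedule itself)

schedule = [step_lr(epoch) for epoch in range(EPOCHS)]
print(schedule[0], schedule[10], schedule[99])
```

Over 100 epochs the rate takes ten distinct values, ending at 0.001 / 2^9 ≈ 1.95e-6 for epochs 90 to 99.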
The invention has the following characteristics:
First: to address the problem that existing models cannot fully extract the original image information, a multi-scale residual feature extraction module (MultiRes block) is introduced in the encoder to replace the ordinary convolution block of the original Unet++. The bi-temporal images Image1 and Image2 are used as the input of the twin network, successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module, and the feature difference information obtained at each level is concatenated with the output of the preceding node and used as the input of all subsequent nodes. First, a standard 3×3 convolution layer extracts shallow information and detail features of the image. Second, asymmetric convolution kernels of size 1×3 and 3×1 enhance the feature expression capability and robustness of the network while requiring fewer parameters and less computation than a standard convolution layer. Then a 3×3 separable convolution layer convolves the output of the asymmetric convolution channel by channel in the two-dimensional plane and combines the resulting independent channels by weighting along the depth direction, which makes effective use of the feature information of different channels at the same spatial position and reduces the computational cost of the network. Meanwhile, a 1×1 convolution layer is used twice in the module, once for the residual connection and once to adjust the number of channels after concatenation. Finally, the concatenated output features and the residual-connection features are added to obtain additional spatial information.
The output of each convolution unit is denoted Xi,j, where i is the i-th path along the downsampling direction and j is the j-th convolution unit along the skip-connection direction; C(·) denotes the convolution operation of the multi-scale residual feature extraction module, D(·) the downsampling operation, U(·) the upsampling operation, and [·] concatenation. The output of each encoder node is:
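The node formula image is missing here. If DSNet follows the standard Unet++ node recurrence (an assumption; the bi-temporal difference features that DSNet additionally concatenates into each node are not shown), the output of each node can be written with the symbols defined above as:

```latex
X_{i,j} =
\begin{cases}
C\left(D\left(X_{i-1,j}\right)\right), & j = 0,\ i \ge 1,\\[4pt]
C\left(\left[\,\left[X_{i,k}\right]_{k=0}^{j-1},\ U\left(X_{i+1,j-1}\right)\right]\right), & j > 0,
\end{cases}
```

with X0,0 computed directly from the input. Under this recurrence, X0,4 concatenates X0,0 to X0,3 with the upsampled X1,3, which is the dense skip pattern that distinguishes Unet++ from U-Net.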
Second: to address the weak semantic correlation between different levels and the insufficient use of the spatial and spectral characteristics of the images, the decoder fuses feature maps of different scales. The lateral output layers of the encoder provide four feature maps of different scales, X0,4, X1,3, X2,2 and X3,1, which complete the deep supervision at the output, from low-level to high-level features. X0,4, X1,3, X2,2 and X3,1 each pass through a multi-scale residual feature extraction module and a transposed convolution, which map the fused features of each path into feature maps F1, F2, F3 and F4 with 32 channels each. To restore the output feature maps to the original image size, each path uses a different upsampling rate (X0,4 -> U1×1, X1,3 -> U2×2, X2,2 -> U4×4, X3,1 -> U8×8), realized with different strides in the transposed convolutions, so that F1, F2, F3 and F4 all have a size of 256×256. An objective function supervises the training of the different upsampling branches; the information learned at the different supervision levels is fused and integrated into a single highly fused branch, which reduces the feature gap between the encoder and the decoder and improves the detection performance on remote sensing images.
Third: to address the problem that images acquired in the earlier and later periods are easily affected by external conditions such as illumination, which makes change detection difficult, a normalization-based attention module (NAM) is introduced in the decoder. The same-size feature maps from the different paths are concatenated into a fused feature map FC with 128 channels, which first undergoes batch normalization (BN), is then multiplied by the weight coefficient Wγ, and finally passes through an activation function to give the feature output FD:
$$F_D = \mathrm{sigmoid}\left(W_\gamma\left(\mathrm{BN}(F_C)\right)\right)$$
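The formula above can be illustrated in plain Python. This sketch assumes the weighting used in the published NAM (Normalization-based Attention Module) design, where Wγ scales each channel by its |γ| divided by the sum of all |γ|; the patent does not spell out Wγ. BN uses batch statistics (training mode), and the function name and toy values are hypothetical. The NAM reference design also multiplies the sigmoid output back onto the input; the patent's formula stops at the sigmoid, so this sketch does too.

```python
import math


def nam_channel_attention(features, gamma, beta, eps=1e-5):
    """Sketch of F_D = sigmoid(W_gamma(BN(F_C))).

    features: list of channels; each channel is a flat list of its activations
              over the batch and spatial positions (128 channels for F_C in the text).
    gamma, beta: per-channel BN scale and shift (learned in a real model).
    """
    total = sum(abs(g) for g in gamma)
    output = []
    for x, g, b in zip(features, gamma, beta):
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        weight = abs(g) / total  # W_gamma: this channel's share of all BN scales
        bn = [g * (v - mean) / math.sqrt(var + eps) + b for v in x]
        output.append([1.0 / (1.0 + math.exp(-weight * y)) for y in bn])
    return output


# Toy example: two channels with three activations each (hypothetical values).
fd = nam_channel_attention([[1.0, 2.0, 3.0], [0.0, 0.0, 4.0]], gamma=[1.0, 3.0], beta=[0.0, 0.0])
print(fd)
```

Channels with a larger BN scale γ get a larger share of Wγ, so their normalized activations are pushed further from 0.5; this is how NAM uses the BN scale factors as a measure of channel importance.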
Fourth: to counter the loss of learning capacity as the network deepens and to improve its robustness, the classifier adopts a multi-scale prediction method based on deep supervision. This improves the segmentation accuracy of the network, strengthens its ability to handle pseudo-changes, and further improves the model's learning of changed regions. Deep supervision monitors the hidden layers and feeds back to them by computing the loss between the ground-truth label and each hidden-layer classifier. The network computes the loss not only for the fused feature map FD but also for the feature maps F1, F2, F3 and F4 of the different levels produced by the decoder, and back-propagates all of these losses. That is, F1, F2, F3, F4 and FD each pass through a 1×1 convolution layer and a Softmax activation to produce two-dimensional predicted change maps M1, M2, M3, M4 and M5; the loss between each predicted change map and the ground truth is then computed for supervised training. The loss of the whole network is computed as follows:
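The loss formula image is missing here. A common deep supervision form is L = Σ λi ℓ(Mi, Y) over i = 1..5; the sketch below assumes pixel-wise cross-entropy for ℓ and equal weights λi = 1, neither of which the text confirms (the actual loss may combine other terms). The function names are hypothetical.

```python
import math


def pixel_cross_entropy(prob_changed, labels, eps=1e-7):
    """Mean cross-entropy over pixels for a two-class softmax output.

    prob_changed: per-pixel probability of the 'changed' class (flat list).
    labels: per-pixel ground truth, 1 = changed, 0 = unchanged.
    With two classes, -log p(true class) reduces to this binary form.
    """
    total = 0.0
    for p, y in zip(prob_changed, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def deep_supervision_loss(prediction_maps, labels, weights=None):
    """Weighted sum of the losses of M1..M5 against the same ground truth.

    Equal weights are an assumption; the text does not give the weighting.
    """
    if weights is None:
        weights = [1.0] * len(prediction_maps)
    return sum(w * pixel_cross_entropy(m, labels) for w, m in zip(weights, prediction_maps))


labels = [1, 0, 1, 0]
maps = [[0.9, 0.2, 0.7, 0.1]] * 4 + [[0.95, 0.05, 0.9, 0.05]]  # M1..M4, then M5 from F_D
print(deep_supervision_loss(maps, labels))
```

Because every side output is compared with the same ground truth, gradients reach the shallow decoder branches directly instead of only through the fused map.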
Compared with the prior art, the invention has the following effects:
First: in the ablation experiments, the asymmetric multi-scale residual feature extraction module (MultiRes block) designed in the system improves the F1-Score by 1.4%, and the deep fusion supervision part improves the F1-Score by 2.1% while adding only a small number of parameters.
Second: in comparative segmentation experiments on scenes with large, medium and small targets, and on scenes whose earlier and later images differ strongly because of different illumination, the system shows no missed detections, false detections or blurred edges, unlike the other network models compared.
Third: in comparison, the system requires only 60 G floating-point operations (FLOPs) while improving model performance, achieving a balance between network performance and computational cost.
Drawings
FIG. 1 is an overall block diagram of the algorithm model of the invention;
FIG. 2 is a block diagram of the encoder of the system;
FIG. 3 is a block diagram of the multi-scale residual feature extraction module of the invention;
FIG. 4 is a diagram of the decoder of the invention that generates the feature difference map;
FIG. 5 is a block diagram of the deep supervision of the invention;
FIG. 6a is the first schematic diagram of data processing in the embodiment;
FIG. 6b is the second schematic diagram of data processing in the embodiment;
FIG. 6c is the third schematic diagram of data processing in the embodiment;
FIG. 7 is a schematic diagram of the confusion matrix in the embodiment;
FIG. 8 shows visualizations of different change detection methods in the embodiment;
FIG. 9 shows the F1-Score curves of the training and validation sets in the embodiment.
Detailed Description
The specific technical solution of the invention is described below with reference to the embodiments.
As shown in FIG. 1, the remote sensing image change detection system based on a deep supervision network uses a new end-to-end encoder-decoder network, DSNet, designed on the Unet++ model and consisting mainly of an encoder, a decoder and a classifier. The bi-temporal images Image1 and Image2 are used as the input of a twin network; successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module; the feature difference information obtained at each level is concatenated with the output of the preceding node and used as the input of the decoder; the decoder fuses feature maps of different scales; and the resulting feature maps are classified through the deep supervision network to obtain the predicted change map.
As shown in FIG. 2, the encoder serves as the backbone of the network and extracts the difference information of the bi-temporal images. Image1 and Image2 are used as the input of the twin network, successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module, and the feature difference information obtained at each level is concatenated with the output of the preceding node and used as the input of all subsequent nodes. Horizontally, the encoder comprises five paths, named L1 to L5 from top to bottom. Convolution units on the same path produce output feature maps of the same size, different paths contain different numbers of convolution units, and the numbers of output channels of the convolution units on L1 to L5 are 32, 64, 128, 256 and 512, respectively.
As shown in FIG. 3, a standard 3×3 convolution layer first extracts shallow information and detail features of the image. The features then pass through asymmetric convolution kernels of size 1×3 and 3×1, which have been shown to enhance the feature expression capability and robustness of a network while requiring fewer parameters and less computation than a standard convolution layer. Next, a 3×3 separable convolution layer convolves the output of the asymmetric convolution channel by channel in the two-dimensional plane and combines the resulting independent channels by weighting along the depth direction, which makes effective use of the feature information of different channels at the same spatial position and reduces the computational cost of the network. The outputs of the three convolution layers are then concatenated to extract semantic information at different scales. Meanwhile, a 1×1 convolution layer is used twice in the module, once for the residual connection and once to adjust the number of channels after concatenation. Finally, the concatenated output features and the residual-connection features are added to obtain additional spatial information.
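The parameter savings attributed to the asymmetric and separable convolutions can be checked with simple weight counts. This sketch compares single layers only (not the whole MultiRes block), ignores bias and BN parameters, and uses 64 channels and a 128×128 feature map as purely illustrative values; the function names are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def asymmetric_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a 1 x k convolution followed by a k x 1 convolution."""
    return k * c_in * c_out + k * c_out * c_out


def separable_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def multiply_accumulates(params: int, height: int, width: int) -> int:
    """MACs of a stride-1, 'same'-padded convolution: each weight is used once per output pixel."""
    return params * height * width


c = 64  # illustrative channel count
for name, fn in [("standard 3x3", standard_conv_params),
                 ("1x3 + 3x1", asymmetric_conv_params),
                 ("separable 3x3", separable_conv_params)]:
    p = fn(c, c)
    print(f"{name}: {p} weights, {multiply_accumulates(p, 128, 128)} MACs at 128x128")
```

At 64 channels the 1×3 + 3×1 pair needs two thirds of the weights of a 3×3 layer, and the separable layer about one eighth. When comparing against the 60 G figure quoted for the system, note that some tools report FLOPs as 2 × MACs and others report MACs directly; the text does not say which convention it uses.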
As shown in FIG. 4, the decoder generates the feature difference map: the feature maps X0,4, X1,3, X2,2 and X3,1 each pass through a multi-scale residual feature extraction module and a transposed convolution, which map the fused features of each path into feature maps F1, F2, F3 and F4 with 32 channels each. To restore the output feature maps to the original image size, each path uses a different upsampling rate (X0,4 -> U1×1, X1,3 -> U2×2, X2,2 -> U4×4, X3,1 -> U8×8), realized with different strides in the transposed convolutions, so that F1, F2, F3 and F4 all have a size of 256×256.
As shown in the deep supervision block diagram of FIG. 5, the network computes the loss not only for the fused feature map FD but also for the feature maps F1, F2, F3 and F4 of the different levels produced by the decoder, and back-propagates all of these losses. That is, F1, F2, F3, F4 and FD each pass through a 1×1 convolution layer and a Softmax activation to produce two-dimensional predicted change maps M1, M2, M3, M4 and M5. Finally, the loss between each predicted change map and the ground truth is computed for supervised training.
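A 1×1 convolution is a per-pixel linear map from C channels to 2 class scores, so each side output can be sketched as below. The weights here are hypothetical toy values (in DSNet they would be learned), C would be 32 for F1 to F4 and 128 for FD, and taking the argmax to obtain a binary change map is an assumption about how the maps are thresholded at inference.

```python
import math


def conv1x1_softmax(feature_map, weights, bias):
    """1x1 convolution to two classes followed by a per-pixel softmax.

    feature_map: H x W grid; each pixel is a list of C feature values.
    weights: 2 x C matrix; bias: length-2 list (hypothetical values).
    Returns an H x W grid of [P(unchanged), P(changed)].
    """
    probs = []
    for row in feature_map:
        prob_row = []
        for pixel in row:
            logits = [sum(w * v for w, v in zip(w_row, pixel)) + b
                      for w_row, b in zip(weights, bias)]
            top = max(logits)  # subtract the max logit for numerical stability
            exps = [math.exp(z - top) for z in logits]
            total = sum(exps)
            prob_row.append([e / total for e in exps])
        probs.append(prob_row)
    return probs


def change_mask(probs):
    """Binary change map: 1 where P(changed) exceeds P(unchanged)."""
    return [[1 if p[1] > p[0] else 0 for p in row] for row in probs]


# Toy 1 x 2 feature map with C = 2 channels and identity weights.
m = conv1x1_softmax([[[1.0, 0.0], [0.0, 1.0]]], weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(change_mask(m))  # [[0, 1]]
```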
The system can be applied to land use, urban expansion, disaster assessment and related fields. It improves detection performance when the images acquired in the earlier and later periods are affected by external conditions such as illumination, achieving higher detection accuracy with a low miss rate. The implementation steps are as follows.
Step one: collect image data and build a dataset; the LEVIR-CD dataset is used in the experiments. LEVIR-CD is an annotated dataset whose image pairs were labeled by professionals; it contains high-resolution Google Earth images taken in different seasons at a resolution of 0.5 m. The images span 2002 to 2018 and come from 20 different regions in several cities in Texas, USA. The dataset mainly captures significant urban changes over periods of 5 to 14 years and contains many seasonal and illumination variations, which helps in training a change detection model.
Step two: image preprocessing, which removes irrelevant information from the images, recovers useful real information, enhances the detectability of relevant information and simplifies the data as far as possible, making the data better suited as model input. As shown in figs. 6a, 6b and 6c, the original dataset contains 637 pairs of 1024 × 1024 images, from which 10192 pairs of 256 × 256 pixel images are generated by cropping and rotation. The dataset is randomly divided into three parts: 70% of the samples for training, 20% for validation and 10% for testing.
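The counts in step two are consistent with cutting each 1024 × 1024 pair into a 4 × 4 grid of non-overlapping 256 × 256 tiles, since 637 × 16 = 10192. The sketch below checks that arithmetic and performs a seeded random 70/20/10 split; the tile indexing, the seed and the rounding rule (remainder to the test set) are assumptions, and rotation is not modelled.

```python
import random

# Hypothetical dataset bookkeeping for step two (not the patent's code).
# Assumptions: non-overlapping 256x256 tiles, seeded shuffle, remainder -> test.


def tile_origins(image_size: int = 1024, tile: int = 256):
    """Top-left corners of the non-overlapping tiles covering one image."""
    return [(r, c) for r in range(0, image_size, tile)
                   for c in range(0, image_size, tile)]


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle once, then cut into train / validation / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(ratios[0] * len(items))
    n_val = int(ratios[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])        # whatever is left goes to test


if __name__ == "__main__":
    n_pairs = 637
    tiles = [(pair, origin) for pair in range(n_pairs) for origin in tile_origins()]
    train, val, test = split_dataset(tiles)
    print(len(tile_origins()), len(tiles))   # 16 10192
    print(len(train), len(val), len(test))   # 7134 2038 1020
```

Splitting at the tile level, as here, can place tiles from the same original pair in different subsets; splitting by pair index instead avoids that, if the experiment requires it.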
Step three: evaluation metrics and parameter settings. In fig. 7, the ground truth (GroundTruth), obtained by manual annotation, is compared with the model prediction (PredictedResult). According to the combination of the prediction and the ground truth, each pixel falls into one of four cases: predicted as a positive (changed) sample but actually negative, a false positive (FP); predicted as positive and actually positive, a true positive (TP); predicted as a negative (unchanged) sample but actually positive, a false negative (FN); and predicted as negative and actually negative, a true negative (TN). To evaluate the performance of DSNet more intuitively, fig. 8 visualizes the segmentation results of the representative networks listed in Table 1 on the LEVIR-CD dataset. Four sets of typical aerial scenes are selected, covering large, medium and small targets as well as scenes in which illumination differs strongly between the two acquisition dates; the regions marked with circles and squares show local segmentation details for easier comparison. The following metrics are used: recall, precision and F1-Score. In change detection, a higher recall means the model finds more of the changed pixels, and a higher precision means the pixels it reports as changed are more likely to be correct. As shown in fig. 9, the F1-Score combines precision and recall as their harmonic mean, with a maximum of 1 and a minimum of 0. The evaluation metrics are computed as follows:
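For reference, the conventional definitions of these indices in terms of the confusion counts are sketched below. They are assumed rather than reproduced from the source; the kappa coefficient κ is included only because N, P and OA, defined next, are its usual ingredients.

```latex
\begin{aligned}
\mathrm{Precision} &= \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \\
F1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \\
N &= TP + FP + FN + TN, \qquad
OA = \frac{TP + TN}{N}, \\
P &= \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^{2}}, \qquad
\kappa = \frac{OA - P}{1 - P}.
\end{aligned}
```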
where N denotes the total number of pixels, P denotes the agreement between the ground truth and the prediction expected from the class distributions, and OA denotes the overall classification accuracy. DSNet is trained for 100 epochs with a batch size of 16. The learning rate is adjusted at fixed intervals with StepLR: it starts at 0.001 and is multiplied by 0.5 every 10 epochs.
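The indices and the training schedule can be sketched in a few lines of Python. The helpers assume binary change maps flattened to lists of 0 (unchanged) and 1 (changed); the kappa term is an assumption, included because P and OA are its usual ingredients, and `step_lr` mirrors the StepLR rule lr = 0.001 · 0.5^⌊epoch/10⌋ with 0-based epochs. All function names are illustrative.

```python
from typing import Dict, Sequence

# Hypothetical evaluation and schedule helpers (not the patent's code).


def confusion(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, int]:
    """Count TP / FP / FN / TN over paired binary pixels (1 = changed)."""
    pairs = list(zip(pred, truth))
    return {
        "TP": sum(1 for p, t in pairs if p == 1 and t == 1),
        "FP": sum(1 for p, t in pairs if p == 1 and t == 0),
        "FN": sum(1 for p, t in pairs if p == 0 and t == 1),
        "TN": sum(1 for p, t in pairs if p == 0 and t == 0),
    }


def metrics(c: Dict[str, int]) -> Dict[str, float]:
    tp, fp, fn, tn = c["TP"], c["FP"], c["FN"], c["TN"]
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    oa = (tp + tn) / n
    # chance agreement expected from the class distributions (assumed meaning of P)
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (oa - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"Precision": precision, "Recall": recall, "F1": f1,
            "OA": oa, "P": p_e, "Kappa": kappa}


def step_lr(epoch: int, base_lr: float = 1e-3, step_size: int = 10,
            gamma: float = 0.5) -> float:
    """Learning rate for a 0-based epoch under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    c = confusion([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(c)                                   # 2 TP, 1 FP, 1 FN, 2 TN
    print(metrics(c))
    print([step_lr(e) for e in (0, 9, 10, 20, 99)])
```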
Step four: the proposed system is compared with existing deep-learning-based change detection methods. The results, given in Table 1, show that the system performs well in change detection for complex scenes.
Table 1 Results of comparative change detection experiments with different methods
For large and medium target scenes, unlike the other models, the system shows no missed detections, false detections or blurred edges; in complex scenes where large and small targets coexist, it extracts the large targets well and also detects the small targets completely. By introducing an attention mechanism, the system reweights the fused feature maps, suppresses part of the redundant noise and emphasizes the important information of target objects, which helps the network resist, as far as possible, irrelevant changes caused by different illumination or climate conditions. In the comparison experiments, the MultiRes block improves the F1-Score by 1.4%, and the deep fusion supervision part improves it by 2.1% while adding only a small number of parameters. The proposed method requires only 60G FLOPs, outperforms the other methods, and balances network performance against computational cost.
Claims (5)
1. A remote sensing image change detection system based on a deep supervision network, characterized by comprising an end-to-end encoder-decoder network DSNet based on the Unet++ model, which comprises an encoder, a decoder and a classifier;
the bi-temporal images Image1 and Image2 are used as inputs of a twin network; successive feature extraction and downsampling are carried out by a multi-scale residual feature extraction module; the feature difference information of the bi-temporal images obtained at each level is concatenated with the outputs of the preceding nodes and used as input of the decoder; the decoder fuses feature maps of different scales, and the resulting feature maps are classified through the deep supervision network to obtain a predicted change map.
2. The system according to claim 1, wherein the encoder serves as the backbone of the network and extracts difference information from the bi-temporal images: the bi-temporal images Image1 and Image2 are used as inputs of the twin network, successive feature extraction and downsampling are carried out by the multi-scale residual feature extraction module, and the feature difference information of the bi-temporal images obtained at each level is concatenated with the outputs of the preceding nodes and used as input of all subsequent nodes; horizontally, the encoder comprises five paths, named L1 to L5 from top to bottom; convolution units on the same path produce output feature maps of the same size, different paths contain different numbers of convolution units, and the numbers of output channels of the convolution units on L1 to L5 are 32, 64, 128, 256 and 512 respectively.
3. The remote sensing image change detection system based on the deep supervision network according to claim 1, wherein the multi-scale residual feature extraction module first extracts shallow information and detail features of the image with a standard 3×3 convolution layer; second, applies asymmetric convolution kernels of size 1×3 and 3×1; then, introduces a 3×3 separable convolution layer that convolves the features output by the asymmetric convolution channel by channel on the two-dimensional plane and combines the resulting independent channels with weights along the depth direction; then, concatenates the output features of the three convolution layers to extract semantic information at different scales; meanwhile, the module uses two 1×1 convolution layers, one for the residual connection and one for adjusting the number of channels after concatenation; and finally, adds the concatenated output features to the residual-connection features to obtain additional spatial information.
4. The remote sensing image change detection system based on the deep supervision network according to claim 1, wherein the decoder is configured to generate feature difference maps: the feature maps X0,4, X1,3, X2,2 and X3,1 pass through the multi-scale residual feature extraction module and a transposed convolution, and the fused features of each path are mapped to feature maps F1, F2, F3 and F4 with 32 channels each;
each path uses a different upsampling rate, X0,4 → ×1, X1,3 → ×2, X2,2 → ×4 and X3,1 → ×8, implemented with different strides in the transposed convolutions, so that the output feature maps F1, F2, F3 and F4 are all 256×256.
5. A remote sensing image change detection method based on a deep supervision network, characterized in that it uses the remote sensing image change detection system based on a deep supervision network according to any one of claims 1 to 4 and comprises the following steps:
step one: collecting image data to establish a data set;
step two: image preprocessing, namely eliminating irrelevant information in the image, recovering useful real information, enhancing the detectability of relevant information and simplifying the data to the greatest extent, so that the data are better suited as model input;
step three: evaluating metrics and setting parameters;
according to the combination of the prediction result and the true value, each pixel falls into one of four cases: FP, predicted as positive but actually negative; TP, predicted as positive and actually positive; FN, predicted as negative but actually positive; and TN, predicted as negative and actually negative; to evaluate the performance of the model, the following indexes are used: recall, precision and F1-Score; in the change detection task, a higher recall means the model finds more of the changed pixels, and a higher precision means the detected changed pixels are more accurate; the F1-Score combines precision and recall and is regarded as their harmonic mean, with a maximum of 1 and a minimum of 0; the evaluation index formulas are as follows:
where N denotes the total number of pixels, P denotes the agreement between the true value and the predicted value expected from the class distributions, and OA denotes the overall classification accuracy; DSNet is trained for 100 epochs with a batch size of 16; the learning rate is adjusted at fixed intervals by StepLR, starting at 0.001 and multiplied by 0.5 every 10 epochs.
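The structural numbers in claims 2 and 3 can be cross-checked with a small bookkeeping sketch. It assumes the standard UNet++ layout (node X_{i,j} exists when i + j ≤ 4, which matches the decoder nodes X0,4, X1,3, X2,2 and X3,1), a 256 × 256 input halved at each depth, and dense skip inputs as in UNet++; the weight count for the claim-3 module further assumes that each of the three convolution stages outputs c channels, that the asymmetric step is a 1×3 layer followed by a 3×1 layer, and that bias terms are ignored. None of these choices is confirmed by the source.

```python
# Hypothetical bookkeeping for claims 2 and 3 (not the patent's implementation).
# Twin-branch difference features are folded into the encoder nodes X_{i,0}.

CHANNELS = [32, 64, 128, 256, 512]  # paths L1..L5, from claim 2
DEPTH = len(CHANNELS)


def build_grid(input_size=256):
    """Map node names such as 'X0,4' to their path, channels, size and inputs."""
    grid = {}
    for i in range(DEPTH):
        for j in range(DEPTH - i):          # UNet++ layout: i + j <= 4
            if j == 0:
                inputs = [] if i == 0 else [f"down(X{i - 1},0)"]
            else:
                inputs = [f"X{i},{k}" for k in range(j)] + [f"up(X{i + 1},{j - 1})"]
            grid[f"X{i},{j}"] = {
                "path": f"L{i + 1}",
                "channels": CHANNELS[i],
                "size": input_size // 2 ** i,
                "inputs": inputs,
            }
    return grid


def conv_params(c_in, c_out, k_h, k_w):
    """Weights of a standard convolution, bias ignored."""
    return c_in * c_out * k_h * k_w


def multires_block_params(c_in, c):
    """Rough weight count of the claim-3 module under the assumptions above."""
    conv3x3 = conv_params(c_in, c, 3, 3)
    asymmetric = conv_params(c, c, 1, 3) + conv_params(c, c, 3, 1)
    separable = c * 3 * 3 + conv_params(c, c, 1, 1)  # depthwise 3x3 + pointwise 1x1
    fuse_1x1 = conv_params(3 * c, c, 1, 1)      # channel adjustment after concatenation
    residual_1x1 = conv_params(c_in, c, 1, 1)   # residual (shortcut) connection
    return conv3x3 + asymmetric + separable + fuse_1x1 + residual_1x1


if __name__ == "__main__":
    grid = build_grid()
    for name in ("X0,4", "X1,3", "X2,2", "X3,1"):
        node = grid[name]
        print(name, node["path"], node["channels"], node["size"], node["inputs"])
    print(multires_block_params(64, 64))
```

Under these assumptions the 1×3/3×1 pair uses two thirds of the weights of a full 3×3 layer, and the separable layer far fewer, which is in line with the small parameter increase reported for the MultiRes block.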
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311264681.8A CN117115663A (en) | 2023-09-28 | 2023-09-28 | Remote sensing image change detection system and method based on deep supervision network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117115663A true CN117115663A (en) | 2023-11-24 |
Family
ID=88794895
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311264681.8A Pending CN117115663A (en) | 2023-09-28 | 2023-09-28 | Remote sensing image change detection system and method based on deep supervision network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117115663A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117494031A (en) * | 2024-01-02 | 2024-02-02 | 深圳市伟昊净化设备有限公司 | Intelligent monitoring method and system for compressed air pipeline |
CN117494031B (en) * | 2024-01-02 | 2024-04-26 | 深圳市伟昊净化设备有限公司 | Intelligent monitoring method and system for compressed air pipeline |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||