
CN117115663A - Remote sensing image change detection system and method based on deep supervision network

Info

Publication number
CN117115663A
Authority
CN
China
Prior art keywords
network
convolution
feature
change detection
remote sensing
Prior art date
Legal status
Pending
Application number
CN202311264681.8A
Other languages
Chinese (zh)
Inventor
袁小平
陈烨
王小倩
贺智杰
刘晨希
张文豪
王子玮
杨婕
杨梓恬
陈宗琦
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology (CUMT)
Priority to CN202311264681.8A
Publication of CN117115663A
Legal status: Pending

Classifications

    • G06V 20/10 (Scenes; scene-specific elements: terrestrial scenes)
    • G06N 3/0455 (Neural networks, architecture: auto-encoder networks; encoder-decoder networks)
    • G06N 3/0464 (Neural networks, architecture: convolutional networks [CNN, ConvNet])
    • G06N 3/08 (Neural networks: learning methods)
    • G06V 10/454 (Local feature extraction: integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN])
    • G06V 10/764 (Recognition using pattern recognition or machine learning: classification, e.g. of video objects)
    • G06V 10/806 (Fusion: combining extracted features at the sensor, preprocessing, feature extraction or classification level)
    • G06V 10/82 (Recognition using pattern recognition or machine learning: using neural networks)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image change detection system and method based on a deep supervision network. The system is an end-to-end encoder-decoder network, DSNet, built on the UNet++ model and comprising an encoder, a decoder and a classifier. The bi-temporal images Image1 and Image2 serve as the input of a twin network; successive feature extraction and downsampling are performed by a multi-scale residual feature extraction module; the feature difference information of the bi-temporal images obtained at each layer is concatenated with the outputs of the preceding nodes and fed to the decoder; the decoder fuses feature maps of different scales; and the resulting feature maps are classified through the deep supervision network to obtain a predicted change map. The asymmetric multi-scale residual feature extraction module (MultiRes block) designed in the system improves F1-Score by 1.4%, and the deeply fused supervision part improves F1-Score by 2.1% while adding only a small number of parameters.

Description

Remote sensing image change detection system and method based on deep supervision network
Technical Field
The invention provides a remote sensing image change detection system and method based on a deep supervision network, and belongs to the technical field of remote sensing image processing.
Background
Remote sensing image change detection is a technique for identifying the pixels that have changed between multi-temporal remote sensing images acquired over the same area at different times. The technique is widely applied in fields such as land use, urban expansion and disaster assessment. With the development of machine learning, methods such as support vector machines, random forests and decision trees have gradually been applied to remote sensing images. Since conventional pixel-wise classification tends to produce a salt-and-pepper effect, the basic unit of change detection has shifted from the pixel to the object. Many methods that take the relationships between adjacent pixels into account have also been introduced into object-level change detection, for example classifying invariant and uncertain pixels in a difference image by linear weighting and applying an improved Markov random field method for change detection.
Convolutional neural networks (CNNs) have been well developed in remote sensing research; representative algorithms such as FC-EF, FC-Siam-conc and FC-Siam-diff achieve end-to-end training. Change detection methods based on fully convolutional networks can be roughly divided into pre-fusion and post-fusion approaches. Pre-fusion methods concatenate the temporal images as the input of a single-branch network and then directly generate a binary change map. For example, an improved UNet++ network uses global and fine-grained information to generate prediction maps with higher spatial precision, then fuses the prediction maps of different layers to produce a high-precision final change map; the effectiveness and reliability of this method have been verified on very-high-resolution (VHR) satellite images. However, the improved UNet++ structure requires a large amount of change-image data and cannot effectively generate multi-scale change features, which limits its wide application to some extent.
Post-fusion methods extract features from the images with a weight-shared dual-branch network, map the images to high-dimensional feature maps for fusion, and finally obtain the change map through upsampling. Representative algorithms include DifUnet++, SNUNet-CD and Siam-DeepLabv3+. However, remote sensing images generally have complex scenes and rich information. When most existing networks extract features from remote sensing images, the original image information is not fully exploited, the semantic relevance between different layers is ignored, and training relies on a single model matched to a single output, so the spatial and spectral characteristics of the images cannot be fully utilized and the change detection capability cannot be further improved.
Although existing remote sensing image change detection can extract features reasonably well, several problems remain:
First: the original image information is not sufficiently extracted.
Second: the semantic relevance between different layers is ignored, so the spatial and spectral characteristics of the images cannot be fully utilized.
Third: remote sensing images acquired at the two time points are easily affected by external conditions such as illumination, which makes change detection difficult.
Disclosure of Invention
Aiming at the above technical problems, the invention provides a remote sensing image change detection system and method based on a deep supervision network.
Specifically:
A remote sensing image change detection system based on a deep supervision network comprises an encoder, a decoder and a classifier, forming an end-to-end encoder-decoder network DSNet based on the UNet++ model.
The bi-temporal images Image1 and Image2 serve as the input of a twin network. Successive feature extraction and downsampling are performed by a multi-scale residual feature extraction module; the feature difference information of the bi-temporal images obtained at each layer is concatenated with the outputs of the preceding nodes and fed to the decoder; the decoder fuses feature maps of different scales; and the resulting feature maps are classified through the deep supervision network to obtain a predicted change map.
The encoder serves as the backbone of the network and extracts the difference information of the bi-temporal images. The bi-temporal images Image1 and Image2 are used as the input of the twin network; successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module; and the feature difference information obtained at each layer, concatenated with the outputs of the preceding nodes, is used as the input of all subsequent nodes. Viewed horizontally, the encoder comprises five paths, named L1-L5 from top to bottom. Convolution units in the same path have the same feature output mapping, different paths contain different numbers of convolution units, and the numbers of output channels after the convolution units of L1-L5 are 32, 64, 128, 256 and 512 respectively.
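As an illustration of this twin-branch layout, the following is a minimal PyTorch sketch. Weight sharing between the two branches follows the twin-network description; max-pooling as the downsampling operator and absolute difference as the per-level change cue are assumptions not stated above.

```python
import torch
import torch.nn as nn

ENCODER_CHANNELS = [32, 64, 128, 256, 512]  # output widths of paths L1-L5

class SiameseEncoder(nn.Module):
    """Weight-shared twin encoder: both images pass through the same blocks."""
    def __init__(self, block_fn, in_ch=3):
        super().__init__()
        chans = [in_ch] + ENCODER_CHANNELS
        self.blocks = nn.ModuleList(
            block_fn(chans[i], chans[i + 1]) for i in range(5)
        )
        self.pool = nn.MaxPool2d(2)  # assumed downsampling operator

    def features(self, x):
        feats = []
        for i, block in enumerate(self.blocks):
            x = block(x)
            feats.append(x)
            if i < len(self.blocks) - 1:
                x = self.pool(x)
        return feats

    def forward(self, image1, image2):
        f1, f2 = self.features(image1), self.features(image2)
        # per-level feature difference (absolute difference is an assumption)
        return [torch.abs(a - b) for a, b in zip(f1, f2)]
```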
The multi-scale residual feature extraction module is structured as follows:
First, shallow information and detail features of the image are extracted through a standard 3×3 convolution layer. Next, the features pass through asymmetric convolution kernels of sizes 1×3 and 3×1. Then a 3×3 depthwise-separable convolution layer is introduced: the features output by the asymmetric convolutions are convolved channel by channel on the two-dimensional plane, and the resulting independent channels are weighted and combined along the depth direction. The outputs of the three convolution layers are then concatenated to extract semantic information at different scales. Meanwhile, a 1×1 convolution layer is used twice in the module, for the residual connection and for adjusting the number of channels after concatenation, respectively. Finally, the concatenated features and the residual-connected features are added to obtain additional spatial information.
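A minimal PyTorch sketch of this block follows. The branch order (3×3, then 1×3/3×1, then depthwise-separable 3×3), the concatenation and the two 1×1 convolutions follow the description above; the equal per-branch width and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

class MultiResBlock(nn.Module):
    """Sketch of the asymmetric multi-scale residual block described above."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        b = out_ch // 3                                        # assumed per-branch width
        self.conv3x3 = nn.Conv2d(in_ch, b, 3, padding=1)       # shallow/detail features
        self.asym = nn.Sequential(                             # 1x3 then 3x1 asymmetric kernels
            nn.Conv2d(b, b, (1, 3), padding=(0, 1)),
            nn.Conv2d(b, b, (3, 1), padding=(1, 0)),
        )
        self.depthwise = nn.Sequential(                        # 3x3 depthwise-separable conv
            nn.Conv2d(b, b, 3, padding=1, groups=b),           # channel-by-channel on the plane
            nn.Conv2d(b, b, 1),                                # weighted merge across depth
        )
        self.fuse = nn.Conv2d(3 * b, out_ch, 1)                # 1x1: adjust channels after concat
        self.residual = nn.Conv2d(in_ch, out_ch, 1)            # 1x1: residual connection
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.conv3x3(x)
        f2 = self.asym(f1)
        f3 = self.depthwise(f2)
        cat = torch.cat([f1, f2, f3], dim=1)                   # multi-scale semantic concat
        return self.act(self.fuse(cat) + self.residual(x))     # add residual for extra spatial info
```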
The decoder is used to generate the feature difference map:
The feature maps X0,4, X1,3, X2,2 and X3,1 pass through the multi-scale residual feature extraction module and transposed convolution operations, and the fused features of each path are mapped to feature maps F1, F2, F3 and F4 with 32 channels each. Each path uses a different upsampling rate (X0,4 -> U1×1, X1,3 -> U2×2, X2,2 -> U4×4, X3,1 -> U8×8); that is, the output feature maps F1, F2, F3 and F4 are produced with different strides in the transposed convolution, and all have dimensions 256×256.
Based on the system, the invention also provides a remote sensing image change detection method based on the depth supervision network, which comprises the following steps:
step one: collecting image data to establish a data set;
step two: image preprocessing, eliminating irrelevant information in an image, recovering useful real information, enhancing the detectability of related information and simplifying data to the maximum extent, so that the method is more suitable for inputting models.
Step three: and evaluating the index and parameter setting.
Based on the combination of the prediction result (Predicted Result) and the ground truth, each pixel falls into one of four cases: predicted positive but actually negative (FP), predicted positive and actually positive (TP), predicted negative but actually positive (FN), and predicted negative and actually negative (TN). To evaluate the performance of the model, the following indices are used: recall, precision and F1-Score. In the change detection task, a higher recall means the model is better at finding more changed pixels, and a higher precision means the detected changed pixels are more accurate. F1-Score combines precision and recall and can be regarded as their harmonic mean, with a maximum of 1 and a minimum of 0. The evaluation index formulas are as follows:
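The formulas themselves did not survive extraction. The standard definitions consistent with the surrounding text would be the following; the OA and Kappa forms are assumptions inferred from the description of N, P and OA below.

```latex
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

```latex
\mathrm{OA} = \frac{TP + TN}{N}, \qquad
\mathrm{Kappa} = \frac{\mathrm{OA} - P}{1 - P}
```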
where N denotes the total number of pixels, P denotes the expected agreement between the true and predicted values given the class distribution, and OA denotes the overall classification accuracy. In the DSNet network, training runs for 100 epochs with a batch size of 16. The learning rate is adjusted with a step scheduler (StepLR), with an initial learning rate of 0.001 decayed by a factor of 0.5 every 10 epochs.
The invention has the following characteristics:
First: to address the problem that existing models cannot fully extract the original image information, a multi-scale residual feature extraction module (MultiRes block) is introduced into the encoder to replace the ordinary convolution block of the original UNet++. The bi-temporal images Image1 and Image2 are used as the input of the twin network; successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module; and the feature difference information of the bi-temporal images obtained at each layer, concatenated with the outputs of the preceding nodes, is used as the input of all subsequent nodes. First, shallow information and detail features of the image are extracted through a standard 3×3 convolution layer. Second, asymmetric convolution kernels of sizes 1×3 and 3×1 effectively enhance the feature expression capability and robustness of the network while reducing the parameter count relative to a standard convolution layer. Then a 3×3 depthwise-separable convolution layer is introduced: the features output by the asymmetric convolutions are convolved channel by channel on the two-dimensional plane, and the resulting independent channels are weighted and combined along the depth direction, which effectively exploits the feature information of different channels at the same spatial position and reduces the computational cost of the network. Meanwhile, a 1×1 convolution layer is used twice in the module, for the residual connection and for adjusting the number of channels after concatenation, respectively. Finally, the concatenated features and the residual-connected features are added to obtain additional spatial information. The output feature map of a convolution unit is denoted X^{i,j}, where i indexes the path along the downsampling direction and j indexes the convolution unit along the skip-connection direction; C(·) denotes the convolution operation of the multi-scale residual feature extraction module, D(·) a downsampling operation, U(·) an upsampling operation, and [·] a concatenation operation. Each node output of the encoder is:
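The node equation itself was lost in extraction. Reconstructed from the operator definitions above and the dense-skip topology of UNet++ on which the system is built (a reconstruction, not a verbatim quote), it would read:

```latex
X^{i,j} =
\begin{cases}
C\!\left( D\!\left( X^{i-1,\,0} \right) \right), & j = 0 \\[4pt]
C\!\left( \left[ X^{i,0}, \ldots, X^{i,j-1},\; U\!\left( X^{i+1,\,j-1} \right) \right] \right), & j > 0
\end{cases}
```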
second,: for the problem that semantic relevance among different layers is poor and spatial and spectral characteristics of images cannot be fully utilized, a decoder part fuses characteristic diagrams of different scales. Four feature maps X of different scales are constructed by a transverse output layer encoder 0,4 ,X 1,3 ,X 2,2 ,X 3,1 And finishing the deep supervision process of the output end from the low-level features to the high-level features. Feature map X 0 , 4 ,X 1,3 ,X 2,2 ,X 3,1 Through a multi-scale residual error feature extraction module and transposition convolution operation, the fusion features of each path are respectively mapped into a feature map F with the channel number of 32 1 ,F 2 ,F 3 ,F 4 . To ensure that the output signature can be restored to original size, each path uses a different up-sampling rate (X 0,4 ->U1×1,X 1,3 ->U2×2,X 2,2 ->U4×,X 3 ,1 ->U8×8), i.e. feature map F with different step sizes for output in transpose convolution 1 ,F 2 ,F 3 ,F 4 The dimensions are 256×256. The training process of different up-sampling branches is supervised through an objective function, learning information from different supervision layers is fused, results are integrated into a highly fused branch, feature differences between an encoder and a decoder are reduced, and the detection effect of a remote sensing image is improved.
Third: to address the problem that remote sensing images acquired at the two time points are easily affected by external conditions such as illumination, making change detection difficult, a normalization-based attention module (NAM) is introduced into the decoder. Feature maps of the same size from different paths are concatenated into a fused feature map F_C with 128 channels; batch normalization (BN) is applied first, the result is multiplied by the weight coefficient W_γ, and the feature output F_D is finally obtained through an activation function. The specific formula is:
F_D = sigmoid(W_γ(BN(F_C)))
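A minimal PyTorch sketch of such a normalization-based attention step follows, in the spirit of the published NAM channel-attention module. Deriving W_γ from the BN layer's own scale factors is an assumption consistent with the formula above.

```python
import torch
import torch.nn as nn

class ChannelNAM(nn.Module):
    """Normalization-based channel attention: F_D = sigmoid(W_gamma(BN(F_C)))."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=True)

    def forward(self, f_c: torch.Tensor) -> torch.Tensor:
        x = self.bn(f_c)
        # W_gamma: per-channel weights from the BN scale factors (an assumption;
        # the published NAM also multiplies this sigmoid gate back onto the
        # input, whereas the formula above returns the gated output directly).
        w_gamma = self.bn.weight.abs() / self.bn.weight.abs().sum()
        x = x * w_gamma.view(1, -1, 1, 1)
        return torch.sigmoid(x)  # attention-weighted feature output F_D
```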
Fourth: to counter the decline of network learning capability during deep training and improve the robustness of the network, the classifier part borrows the idea of deep supervision and designs a multi-scale prediction method that improves the segmentation precision of the network, strengthens its ability to detect pseudo-changes, and further improves the model's ability to learn the change regions. Deep supervision is mainly realized by computing the loss between the real labels and the hidden-layer classifiers, so that the hidden layers are monitored and receive feedback. The whole network computes the loss not only for the fused feature map F_D but also for the feature maps F1, F2, F3 and F4 of the different layers obtained by the decoder, and backpropagates it. That is, F1, F2, F3, F4 and F_D each pass through a 1×1 convolution layer and a Softmax activation function to generate the two-dimensional predicted change maps M1, M2, M3, M4 and M5; finally, the loss between every predicted change map and the ground truth is computed for supervised training. The loss of the whole network is computed as follows:
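The loss formula did not survive extraction. A form consistent with the description, assuming an equally weighted sum over the five supervised branches with ground truth Y and a per-pixel classification loss ℓ such as cross-entropy (both the weighting and the choice of ℓ are assumptions), would be:

```latex
L_{\mathrm{total}} = \sum_{k=1}^{5} \ell\!\left( M_k,\; Y \right)
```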
compared with the prior art, the invention has the following effects:
First: in ablation experiments, the asymmetric multi-scale residual feature extraction module (MultiRes block) designed in the system improves F1-Score by 1.4%, and the deeply fused supervision part improves F1-Score by 2.1% while adding only a small number of parameters.
Second: in comparative segmentation experiments on scenes with large, medium and small targets and on scenes where different illumination causes large differences between the two acquisition dates, the system, unlike other network models, exhibits no missed detections, false detections or blurred edges.
Third: by comparison, the floating-point operation count (FLOPs) of the system is only 60 G; the model performance is improved while a balance between network performance and computational cost is achieved.
Drawings
FIG. 1 is the overall block diagram of the algorithm model of the invention;
FIG. 2 is a block diagram of the encoder of the system;
FIG. 3 is a block diagram of the multi-scale residual feature extraction module of the invention;
FIG. 4 shows the feature difference map generation performed by the decoder of the invention;
FIG. 5 is a deep supervision block diagram of the invention;
FIG. 6a is one of the data processing schematics of the embodiment;
FIG. 6b is a second schematic diagram of data processing according to an embodiment;
FIG. 6c is a third schematic diagram of data processing according to an embodiment;
FIG. 7 is a schematic diagram of a confusion matrix of an embodiment;
FIG. 8 is a visualization of different change detection methods of an embodiment;
FIG. 9 is a F1-Score plot of training and validation sets of an embodiment.
Detailed Description
The specific technical scheme of the invention is described below with reference to the embodiments.
A remote sensing image change detection system based on a deep supervision network is shown in FIG. 1. A new end-to-end encoder-decoder network, DSNet, is designed based on the UNet++ model and mainly comprises an encoder, a decoder and a classifier. The bi-temporal images Image1 and Image2 serve as the input of the twin network; successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module; the feature difference information of the bi-temporal images obtained at each layer is concatenated with the outputs of the preceding nodes and fed to the decoder; the decoder fuses feature maps of different scales; and the resulting feature maps are classified through the deep supervision network to obtain a predicted change map.
As shown in FIG. 2, the encoder serves as the backbone of the network and extracts the difference information of the bi-temporal images. The bi-temporal images Image1 and Image2 are used as the input of the twin network; successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module; and the feature difference information obtained at each layer, concatenated with the outputs of the preceding nodes, is used as the input of all subsequent nodes. Viewed horizontally, the encoder comprises five paths, named L1-L5 from top to bottom. Convolution units in the same path have the same feature output mapping, different paths contain different numbers of convolution units, and the numbers of output channels after the convolution units of L1-L5 are 32, 64, 128, 256 and 512 respectively.
As shown in FIG. 3, shallow information and detail features of the image are first extracted through a standard 3×3 convolution layer. Next, the features pass through asymmetric convolution kernels of sizes 1×3 and 3×1, which have been shown to effectively enhance the feature expression capability and robustness of the network while reducing the parameter count relative to a standard convolution layer. Then a 3×3 depthwise-separable convolution layer is introduced: the features output by the asymmetric convolutions are convolved channel by channel on the two-dimensional plane, and the resulting independent channels are weighted and combined along the depth direction, which effectively exploits the feature information of different channels at the same spatial position and reduces the computational cost of the network. The outputs of the three convolution layers are then concatenated to extract semantic information at different scales. Meanwhile, a 1×1 convolution layer is used twice in the module, for the residual connection and for adjusting the number of channels after concatenation, respectively. Finally, the concatenated features and the residual-connected features are added to obtain additional spatial information.
The decoder, which generates the feature difference map, is shown in FIG. 4: the feature maps X0,4, X1,3, X2,2 and X3,1 pass through the multi-scale residual feature extraction module and transposed convolution operations, and the fused features of each path are mapped to feature maps F1, F2, F3 and F4 with 32 channels each. To ensure that the output feature maps can be restored to the original image size, each path uses a different upsampling rate (X0,4 -> U1×1, X1,3 -> U2×2, X2,2 -> U4×4, X3,1 -> U8×8); that is, the output feature maps F1, F2, F3 and F4 are produced with different strides in the transposed convolution, all with dimensions 256×256.
As shown in the deep supervision block diagram of FIG. 5, the whole network computes the loss not only for the fused feature map FD but also for the feature maps F1, F2, F3 and F4 of the different layers obtained by the decoder, and backpropagates it. That is, F1, F2, F3, F4 and FD each pass through a 1×1 convolution layer and a Softmax activation function to generate the two-dimensional predicted change maps M1, M2, M3, M4 and M5. Finally, the loss between every predicted change map and the ground truth is computed for supervised training.
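A minimal PyTorch sketch of these five prediction heads follows. The channel counts (32 for F1-F4, 128 for FD) come from the description above; everything else is only what the 1×1-conv-plus-Softmax recipe implies, with two output classes (changed/unchanged) assumed.

```python
import torch.nn as nn

# One 1x1 conv head per supervised feature map, two output classes each.
heads = nn.ModuleList(
    [nn.Conv2d(32, 2, kernel_size=1) for _ in range(4)]  # heads for F1..F4
    + [nn.Conv2d(128, 2, kernel_size=1)]                 # head for FD
)
softmax = nn.Softmax(dim=1)  # normalize across the class channel

def predict(feature_maps):
    """feature_maps = [F1, F2, F3, F4, FD] -> [M1, M2, M3, M4, M5]."""
    return [softmax(head(f)) for head, f in zip(heads, feature_maps)]
```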
The system can be used in fields such as land use, urban expansion and disaster assessment. It improves the detection performance on remote sensing images when the two acquisitions are affected by external conditions such as illumination, achieves higher detection accuracy, has a low miss rate in change detection and delivers a better overall detection result. The specific implementation steps are as follows.
Step one: collect image data to build a dataset; the experiments use the LEVIR-CD dataset. LEVIR-CD is an annotated dataset whose image pairs were labeled by professionals. It contains very-high-resolution Google Earth images, taken in different seasons, with 0.5 m resolution. The images span 2002 to 2018 and come from 20 different regions in several cities in Texas, USA. The dataset is mainly used to identify major urban changes over spans of 5 to 14 years, and it contains a large amount of change information caused by seasons and lighting, which helps to train a change detection model.
Step two: image preprocessing, which eliminates irrelevant information in the images, recovers the useful real information, enhances the detectability of the relevant information and simplifies the data as much as possible, making the images better suited as model input. As shown in FIGS. 6a, 6b and 6c, the original dataset contains 637 pairs of 1024×1024 images, from which 10192 pairs of 256×256 images are generated by cropping and rotation. The experiments randomly divide the dataset into three parts: 70% of the samples for training, 20% for validation and 10% for testing.
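A sketch of this tiling and split step follows, assuming non-overlapping 256×256 tiles from each 1024×1024 image and a simple random 70/20/10 split; the exact cropping pattern, rotation policy and random seed are assumptions.

```python
import random
from PIL import Image

def tiles(img: Image.Image, size: int = 256):
    """Cut an image into non-overlapping size x size tiles (16 per 1024x1024)."""
    w, h = img.size
    return [img.crop((x, y, x + size, y + size))
            for y in range(0, h, size) for x in range(0, w, size)]

def split_dataset(pairs, seed: int = 0):
    """Randomly split sample pairs 70% / 20% / 10% for train / val / test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    return (pairs[:int(0.7 * n)],
            pairs[int(0.7 * n):int(0.9 * n)],
            pairs[int(0.9 * n):])
```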
Step three: evaluation indices and parameter settings. As shown in FIG. 7, the ground truth (Ground Truth) is obtained by manual labeling. Based on the combination of the prediction result (Predicted Result) and the ground truth, each pixel can be classified into one of four cases: predicted positive but actually negative (FP), predicted positive and actually positive (TP), predicted negative but actually positive (FN), and predicted negative and actually negative (TN). To evaluate the performance of DSNet more intuitively, FIG. 8 further visualizes the segmentation results on the LEVIR dataset of the representative networks listed in Table 1. The experiments select four groups of typical aerial scene images, mainly covering large, medium and small target scenes and scenes where different illumination causes large differences between the two acquisition dates; the regions marked with circles and squares in the figures show local segmentation details to ease observation and comparison when assessing model performance. The following indices are adopted: recall, precision and F1-Score. In the change detection task, a higher recall means the model is better at finding more changed pixels, and a higher precision means the detected changed pixels are more accurate. As shown in FIG. 9, F1-Score combines precision and recall and can be regarded as their harmonic mean, with a maximum of 1 and a minimum of 0. The evaluation indices follow the formulas given above,
where N denotes the total number of pixels, P denotes the expected agreement between the true and predicted values given the class distribution, and OA denotes the overall classification accuracy. In the DSNet network, training runs for 100 epochs with a batch size of 16. The learning rate is adjusted with a step scheduler (StepLR), with an initial learning rate of 0.001 decayed by a factor of 0.5 every 10 epochs.
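The stated schedule maps directly onto PyTorch's StepLR. In the sketch below, the optimizer choice (Adam) and the training-loop helpers (`model`, `loader`, `train_one_epoch`) are placeholders, not details given in the text.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # initial lr 0.001 (optimizer assumed)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(100):                        # 100 epochs; batch size 16 set in `loader`
    train_one_epoch(model, loader, optimizer)   # hypothetical per-epoch training loop
    scheduler.step()                            # halve the learning rate every 10 epochs
```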
Step four: the method of the system is compared with existing deep-learning-based change detection methods; the results, shown in Table 1, demonstrate that the system achieves excellent performance in change detection of complex scenes.
Table 1: Results of comparative experiments of different methods on the change detection task
For large and medium target scenes, no missed detections, false detections or blurred edges occur compared with other models; for complex scenes where large and small targets coexist, the system extracts the large targets well and also detects the small targets completely. By introducing the attention mechanism, the system redistributes the weights of the fused feature maps, removes part of the redundant noise information and emphasizes the important information of the target objects, so that the network overcomes, to the greatest extent, the influence of irrelevant changes in illumination or climate on the model. Experimental comparison shows that the MultiRes block in the system improves F1-Score by 1.4%, and the deeply fused supervision part improves F1-Score by 2.1% while adding only a small number of parameters. The FLOPs of the proposed method amount to only 60 G, the model performance is superior to the other methods, and a balance between network performance and computational cost is achieved.

Claims (5)

1. A remote sensing image change detection system based on a deep supervision network, characterized by comprising an end-to-end encoder-decoder network DSNet based on the UNet++ model, the network comprising an encoder, a decoder and a classifier;
the bi-temporal images Image1 and Image2 serve as the input of a twin network; successive feature extraction and downsampling are performed by a multi-scale residual feature extraction module; the feature difference information of the bi-temporal images obtained at each layer is concatenated with the outputs of the preceding nodes and fed to the decoder; the decoder fuses feature maps of different scales; and the resulting feature maps are classified through the deep supervision network to obtain a predicted change map.
2. The system of claim 1, wherein the encoder serves as the backbone of the network and extracts the difference information of the bi-temporal images; the bi-temporal images Image1 and Image2 are used as the input of the twin network, successive feature extraction and downsampling are performed by the multi-scale residual feature extraction module, and the feature difference information of the bi-temporal images obtained at each layer, concatenated with the outputs of the preceding nodes, is used as the input of all subsequent nodes; viewed horizontally, the encoder comprises five paths, named L1-L5 from top to bottom; convolution units in the same path have the same feature output mapping, different paths contain different numbers of convolution units, and the numbers of output channels after the convolution units of L1-L5 are 32, 64, 128, 256 and 512 respectively.
3. The remote sensing image change detection system based on a deep supervision network according to claim 1, wherein the multi-scale residual feature extraction module first extracts shallow information and detail features of the image through a standard 3×3 convolution layer; second, the features pass through asymmetric convolution kernels of sizes 1×3 and 3×1; then a 3×3 depthwise-separable convolution layer is introduced, the features output by the asymmetric convolutions are convolved channel by channel on the two-dimensional plane, and the resulting independent channels are weighted and combined along the depth direction; the outputs of the three convolution layers are then concatenated to extract semantic information at different scales; meanwhile, a 1×1 convolution layer is used twice in the module, for the residual connection and for adjusting the number of channels after concatenation, respectively; and finally, the concatenated features and the residual-connected features are added to obtain additional spatial information.
4. The remote sensing image change detection system based on a deep supervision network according to claim 1, wherein the decoder is configured to generate a feature difference map: the feature maps X0,4, X1,3, X2,2 and X3,1 pass through the multi-scale residual feature extraction module and transposed convolution operations, and the fused features of each path are mapped to feature maps F1, F2, F3 and F4 with 32 channels each;
each path uses a different upsampling rate (X0,4 -> U1×1, X1,3 -> U2×2, X2,2 -> U4×4, X3,1 -> U8×8); that is, the output feature maps F1, F2, F3 and F4 are produced with different strides in the transposed convolution, and all have dimensions 256×256.
5. A remote sensing image change detection method based on a deep supervision network, characterized in that a remote sensing image change detection system based on a deep supervision network as claimed in any one of claims 1 to 4 is adopted, the method comprising the following steps:
step one: collecting image data to establish a data set;
step two: image preprocessing, which eliminates irrelevant information in the images, recovers the useful real information, enhances the detectability of the relevant information and simplifies the data as much as possible, making the images better suited as model input;
step three: evaluating indexes and setting parameters;
based on the combination of the prediction result and the ground truth, each pixel is classified into one of four cases: predicted positive but actually negative (FP), predicted positive and actually positive (TP), predicted negative but actually positive (FN), and predicted negative and actually negative (TN); to evaluate the performance of the model, the following indices are adopted: recall, precision and F1-Score; in the change detection task, a higher recall means the model is better at finding more changed pixels, and a higher precision means the detected changed pixels are more accurate; F1-Score combines precision and recall and is regarded as their harmonic mean, with a maximum of 1 and a minimum of 0; the evaluation indices are computed accordingly,
where N represents the total number of pixels, P represents the expected agreement between the true and predicted values given the class distribution, and OA represents the overall classification accuracy; in the DSNet network, the training period is 100 epochs and the batch size is set to 16; the learning rate is adjusted with StepLR at intervals, with an initial learning rate of 0.001 and a decay of 0.5 every 10 epochs.
CN202311264681.8A 2023-09-28 2023-09-28 Remote sensing image change detection system and method based on deep supervision network Pending CN117115663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311264681.8A 2023-09-28 2023-09-28 Remote sensing image change detection system and method based on deep supervision network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311264681.8A 2023-09-28 2023-09-28 Remote sensing image change detection system and method based on deep supervision network

Publications (1)

Publication Number Publication Date
CN117115663A true CN117115663A (en) 2023-11-24

Family

ID=88794895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311264681.8A Pending CN117115663A (en) 2023-09-28 2023-09-28 Remote sensing image change detection system and method based on deep supervision network

Country Status (1)

Country Link
CN (1) CN117115663A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117494031A (en) * 2024-01-02 2024-02-02 深圳市伟昊净化设备有限公司 Intelligent monitoring method and system for compressed air pipeline
CN117494031B (en) * 2024-01-02 2024-04-26 深圳市伟昊净化设备有限公司 Intelligent monitoring method and system for compressed air pipeline

Similar Documents

Publication Publication Date Title
CN111160311B (en) Yellow river ice semantic segmentation method based on multi-attention machine system double-flow fusion network
Zheng et al. ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection
Yang et al. CDnet: CNN-based cloud detection for remote sensing imagery
Guo et al. CDnetV2: CNN-based cloud detection for remote sensing imagery with cloud-snow coexistence
Miao et al. Multigranularity decoupling network with pseudolabel selection for remote sensing image scene classification
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN111797779A (en) Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
CN110111345B (en) Attention network-based 3D point cloud segmentation method
Wang et al. Change detection from synthetic aperture radar images via graph-based knowledge supplement network
Rougier et al. Comparison of sampling strategies for object-based classification of urban vegetation from Very High Resolution satellite images
CN116524361A (en) Remote sensing image change detection network and detection method based on double twin branches
CN113569815B (en) Method for detecting remote sensing image change based on image segmentation and twin neural network
Wang et al. RSCNet: A residual self-calibrated network for hyperspectral image change detection
CN112287983B (en) Remote sensing image target extraction system and method based on deep learning
CN117115663A (en) Remote sensing image change detection system and method based on deep supervision network
Lin et al. Improving impervious surface extraction with shadow-based sparse representation from optical, SAR, and LiDAR data
CN115937697A (en) Remote sensing image change detection method
Chen et al. Change detection algorithm for multi-temporal remote sensing images based on adaptive parameter estimation
Xu et al. TCIANet: Transformer-based context information aggregation network for remote sensing image change detection
Song et al. PSTNet: Progressive sampling transformer network for remote sensing image change detection
Niu et al. Reg-SA–UNet++: A lightweight landslide detection network based on single-temporal images captured postlandslide
CN115496950A (en) Neighborhood information embedded semi-supervised discrimination dictionary pair learning image classification method
Lin et al. An unsupervised transformer-based multivariate alteration detection approach for change detection in VHR remote sensing images
CN114943902A (en) Urban vegetation unmanned aerial vehicle remote sensing classification method based on multi-scale feature perception network
CN116977747B (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination