CN115601661A - Building change detection method for urban dynamic monitoring - Google Patents

Building change detection method for urban dynamic monitoring

Info

Publication number
CN115601661A
CN115601661A (application CN202211344397.7A)
Authority
CN
China
Prior art keywords
image
representing
output
loss
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211344397.7A
Other languages
Chinese (zh)
Inventor
徐川
叶昭毅
杨威
梅礼晔
张琪
李迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN202211344397.7A priority Critical patent/CN115601661A/en
Publication of CN115601661A publication Critical patent/CN115601661A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/176 Urban or other man-made structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Astronomy & Astrophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)

Abstract

The invention discloses a building change detection method for urban dynamic monitoring. Double-time-phase images of the urban surface acquired by a remote sensing satellite are received, the original images are cropped, the cropped images are input into an automatic urban building detection model, and the model outputs the change detection result for buildings in the double-time-phase images. The automatic urban building detection model comprises an encoding stage and a decoding stage. In the encoding stage, a weight-shared twin network down-samples the input double-time-phase images to extract rich multi-scale feature information, and a twin cross attention mechanism enhances the representation of this feature information. In the decoding stage, a multi-scale feature fusion module progressively fuses the extracted multi-scale features, and a differential context discrimination module pushes the detection result closer to the actual change. The method can efficiently discriminate and fuse multiple features, thereby improving the accuracy of urban building change detection.

Description

Building change detection method for urban dynamic monitoring
Technical Field
The invention belongs to the field of urban dynamic monitoring, and particularly relates to a building change detection method for urban dynamic monitoring.
Background
At present, most automatic urban building monitoring systems require large-scale detection equipment and cables to be laid around the city. Powering and maintaining this equipment is very costly, and the systems are strongly affected by signal interference, changes in shooting angle and illumination, which can cause false alarms and missed detections. Remote sensing technology can acquire information about the earth surface at fixed time intervals and extract the dynamic changes of the same surface over multiple periods. The automatic urban building detection model is based on remote sensing change detection technology: its task is to observe how the same target differs between periods and to assign a label to every image pixel, namely label 0 (unchanged) or label 1 (changed). Researchers have done a great deal of work to date on the theory and application of remote sensing change detection. These contributions are of great significance for land resource management, urban construction and planning, and the management of illegal construction.
Over the past several decades, many algorithms have been proposed for remote sensing image change detection. These algorithms can be broadly divided into two categories: traditional methods and deep learning based methods. Early on, when the resolution of remote sensing images was limited, traditional methods mostly performed change detection at the pixel level, analyzing the spectral features of each pixel with Change Vector Analysis (CVA) and Principal Component Analysis (PCA). With the rapid development of aerospace and remote sensing technologies, the ability to acquire high-resolution remote sensing images has improved, and scholars introduced the concept of objects into change detection, mainly using spectral, texture and spatial background information at the object level. Although these methods achieved good results at the time, traditional methods require manually designed features and hand-specified thresholds to guarantee the final detection result, can only extract shallow features, and cannot fully represent building changes in high-resolution remote sensing images, so they struggle to meet the accuracy requirements of practical applications.
On the other hand, with the growth of computing power and the accumulation of massive data, change detection algorithms based on deep learning have become mainstream because of their strong performance. Most current deep learning based change detection methods are derived from networks that perform well on contrastive learning and segmentation tasks. Some scholars adopt a focal contrastive loss for change detection, which reduces the intra-class variance and increases the inter-class difference, and finally obtain a binarized detection result through thresholding. Segmentation networks perform change detection based on the idea of image segmentation; representative examples are the U-shaped network (UNet), the Fully Convolutional Network (FCN) and the DeepLab series of networks.
Although these methods achieve high performance, the following problems remain. First, when the pre- and post-change images contain a large number of pseudo changes, current attention mechanisms cannot efficiently focus on the unchanged and changed areas in a targeted way, which causes serious false detections. Secondly, the large number of down-sampling and up-sampling operations in existing networks loses feature information from the pre- and post-change images, and coarse fusion strategies aggravate this problem, so the network cannot recover the original image features well in the final change detection, leading to missed detections and irregular change edges. Finally, current algorithms cannot differentiate context information well, so they perform poorly on urban building images that contain many pseudo changes.
Disclosure of Invention
In view of the defects and improvement needs of the prior art, the invention provides a building change detection method for urban dynamic monitoring, which can accurately and automatically detect urban buildings. The method comprises the following steps:
s1, taking an image of an urban building acquired by a remote sensing satellite as a data set, acquiring an actual change image corresponding to each building in the data set, and dividing the actual change image and a corresponding double-time-phase image into a training set and a test set;
s2, building an automatic building detection model consisting of an encoder and a decoder, wherein the encoder comprises a weight-shared double-channel twin network and a twin cross attention module, and the decoder comprises a multi-scale feature fusion and differential context discrimination module;
the weight-shared two-channel twin network comprises a batch normalization layer and a plurality of down-sampling blocks; the double-time-phase image is input and feature maps of different scales are obtained;
the twin cross attention module firstly carries out embedding operation on feature maps of different scales, and then extracts deeper variation feature semantic information by using a multi-head cross attention mechanism, so that the global attention to the feature information is improved;
the multi-scale feature fusion module adopts a double progressive fusion strategy of reconstruction and up-sampling blocks to fuse the extracted features containing rich multi-scale semantic information;
the inputs of the differential context discrimination module are the output image of the multi-scale feature fusion module and the difference image of the front and rear time-phase images; its purpose is to improve the discrimination capability of the network by combining the context information in the images, so that the detection result image is closer to the real change image and the detection accuracy is improved;
and S3, training the building automatic detection model in the S2 by using the training set in the S1, and realizing building change detection by using the trained model.
In some alternative embodiments, step S1 comprises:
the method comprises the steps of adopting an artificially-made urban building change image as a data set, and making an actual change image according to a front-back time sequence image in the data set, wherein the actual change image is a change area in the front-back time sequence image, and each pixel in the front-back time sequence image represents a category (unchanged or changed).
The front and rear time-phase images and the corresponding actual change images form the automatic urban building detection image dataset, which is divided into a training set and a test set at a ratio of 8:2.
In some optional embodiments, the encoder comprises a weight-shared two-channel twin network and twin cross attention module, and the decoder comprises a multi-scale feature fusion and differential context discrimination module.
In this embodiment, the weight-shared two-channel twin network in the encoder is implemented as a multi-scale densely connected UNet that contains skip connections and can fully extract both low-level and high-level features. The twin cross attention module in the encoder combines a Transformer multi-head attention mechanism: it first performs the embedding operation independently on the double-time-phase images to obtain the corresponding multi-stage embedded tokens. Through the multi-head attention mechanism the feature information is divided into query vectors, query keys and query values; a Sigmoid function further activates the attended feature information, and the multi-layer perceptron block effectively reduces the time complexity of the network. Finally, the attention channels attend to the changed and unchanged areas in the image respectively, while the image information is divided into sliding windows for self-attention computation, improving the network's ability to model global information.
The multi-scale fusion module in the decoder fuses the multi-stage embedded tokens extracted by the encoder with the context-rich channel attention outputs using a multi-scale feature fusion technique, and then fuses the features with up-sampling operations, so that the network can restore the original image information as much as possible and the miss rate is reduced. The inputs of the differential context discrimination module in the decoder are the output image of the multi-scale fusion module and the difference image of the front and rear time-phase images; its purpose is to combine the context information in the images to improve the discrimination capability of the network, so that the detection result image approaches the real change image more closely and the detection accuracy is improved.
In some optional embodiments, the weight-shared two-channel twin network in step S2 first applies a batch-normalized stem to the input double-time-phase images, consisting of a two-dimensional convolution with kernel size 3 and stride 1, a two-dimensional BatchNorm and a ReLU activation with 64 output channels, and then extracts feature information through 3 down-sampling blocks. Defining $x^{i,j}$ as the output node of a down-sampling block, the objective function of the down-sampling block is:

$$x^{i,j}=\begin{cases}N\big(D(x^{i-1,j})\big), & j=0\\[4pt] N\Big(\big[\,[x^{i,k}]_{k=0}^{j-1},\;U(x^{i+1,j-1})\,\big]\Big), & j>0\end{cases}$$

where $N(\cdot)$ denotes the nested convolution function, $D(\cdot)$ the down-sampling layer, $U(\cdot)$ the up-sampling layer, $[\cdot]$ the feature concatenation function, and $x^{i,j}$ the output feature map; $i$ denotes the layer index, $j$ the $j$-th convolution layer of that layer, and $k$ the $k$-th connection layer. Finally, the twin network channel outputs four kinds of multi-scale feature information.
In some optional embodiments, the twin cross attention module in step S2 performs an embedding operation on the four outputs of the two-channel twin network: a 2D convolution first extracts features, which are then flattened into two-dimensional token sequences $T_1, T_2, T_3$ and $T_4$ with patch sizes 32, 16, 8 and 4 respectively, and $T_1$-$T_4$ are concatenated to obtain $T_\Sigma$. Multi-head cross attention is then applied; the objective function of the first stage is:

$$Q_u = T_l W_{Q_u},\qquad K = T_\Sigma W_K,\qquad V = T_\Sigma W_V$$

where $W_{Q_u}$, $W_K$ and $W_V$ are the weight matrices of the different inputs, $T_l$ denotes the token of the feature information at the $l$-th scale, and $T_\Sigma$ denotes the concatenation of the four tokens; this yields the query vector $Q_u$, the query key $K$ and the query value $V$, with $l = 1,2,3,4$ and $u = 1,2,3,4$;
the objective function of the second stage is:

$$\mathrm{CA}_h = \sigma\!\left(\psi\!\left(\frac{Q_u^{\top} K}{\sqrt{C_\Sigma}}\right)\right) V^{\top}$$

where $\sigma(\cdot)$ and $\psi(\cdot)$ denote the softmax function and the instance normalization function respectively, and $C_\Sigma$ denotes the sum of the channel numbers;
the objective function of the third stage of multi-head cross attention is:

$$\mathrm{MCA}_p = \frac{1}{N}\sum_{h=1}^{N}\mathrm{CA}_h$$

where $\mathrm{CA}_h$ denotes the output of the second stage of multi-head cross attention for the $h$-th attention head, and $N$ is the number of attention heads;
the objective function of the final stage of multi-head cross attention is:

$$O_r = \mathrm{MCA}_p + \mathrm{MLP}(Q_u + \mathrm{MCA}_p)$$

which determines the final output of multi-head cross attention, where $\mathrm{MCA}_p$ denotes the output of the third stage of multi-head cross attention, $p$ denotes the $p$-th output, $\mathrm{MLP}(\cdot)$ is a multi-layer perceptron function, and $Q_u$ denotes the $u$-th query vector.
In some optional embodiments, in step S2 the objective function of the multi-scale feature fusion module is:

$$M_i = W_1\cdot V(T_l) + W_2\cdot V(O_r)$$

where $W_1$ and $W_2$ are the weight parameters of two linear layers, $T_l$ denotes the token of the feature information at the $l$-th scale, $O_r$ denotes the output of the multi-head cross attention module, and $r$ denotes the output of the $r$-th attention head.
In some optional embodiments, in step S2 the differential context discrimination module comprises a generator and a discriminator. The generator receives two inputs: the detection image obtained at the last layer of the multi-scale feature fusion module, and the generated image obtained by a difference operation between the first and second time phases; the loss between the two is computed to push the result closer to the actual change image. The generator uses the weighted sum of the SCAD loss and the least-squares LSGAN loss as its loss function to reduce the false detection rate of the model; the discriminator uses the least-squares LSGAN loss function to improve the detection precision, and the generator and discriminator losses are summed to obtain the final probability loss.
In some optional embodiments, in step S2 the objective function of the differential context discrimination module is:

$$L(P) = L(D) + L(G)$$
$$L(D) = L_{LSGAN}(D)$$
$$L(G) = L_{LSGAN}(G) + \alpha L_{SCAD}$$

where $L(P)$ denotes the probability loss, $L(D)$ the discriminator loss, $L(G)$ the generator loss, $L_{LSGAN}(D)$ the least-squares LSGAN loss of the discriminator, $L_{LSGAN}(G)$ the least-squares LSGAN loss of the generator, and $L_{SCAD}$ the SCAD loss.
In some alternative embodiments, the SCAD loss is defined as:

[SCAD loss equation; given as an image in the original publication]

where $C$ denotes the detection class, $v(C)$ the pixel error value of the detection class, $J_C$ the loss term, and $\rho$ a continuously optimized parameter; $v(c)$ is defined as:

[equation for $v(c)$; given as an image in the original publication]

where $y_i$ is the actual change image, $s_g(c)$ is the detection score, and $g$ denotes the $g$-th pixel.
In some alternative embodiments, the least-squares LSGAN loss of the discriminator is:

$$L_{LSGAN}(D)=\tfrac{1}{2}\,\mathbb{E}_{x_1,y}\big[(D(x_1,y)-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_1}\big[D(x_1,G(x_1))^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2,y}\big[(D(x_2,y)-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2}\big[D(x_2,G(x_2))^2\big]$$

where $D(x_1,y)$ and $D(x_1,G(x_1))$ denote the outputs of the discriminator for the first time-phase image, $G(x_1)$ denotes the output of the generator for the first time-phase image, $D(x_2,y)$ and $D(x_2,G(x_2))$ denote the outputs of the discriminator for the second time-phase image, $G(x_2)$ denotes the output of the generator for the second time-phase image, $\mathbb{E}_{x_1,y}$ and $\mathbb{E}_{x_1}$ denote the detection expectations over the first time-phase image, $\mathbb{E}_{x_2,y}$ and $\mathbb{E}_{x_2}$ denote the detection expectations over the second time-phase image, $x_1$ and $x_2$ denote the first and second time-phase images input to the discriminator, and $y$ denotes the actual change image.
In some alternative embodiments, the least-squares LSGAN loss of the generator is:

$$L_{LSGAN}(G)=\tfrac{1}{2}\,\mathbb{E}_{x_1}\big[(D(x_1,G(x_1))-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2}\big[(D(x_2,G(x_2))-1)^2\big]$$

where $\mathbb{E}_{x_1}$ denotes the detection expectation over the first time-phase image, $\mathbb{E}_{x_2}$ denotes the detection expectation over the second time-phase image, $D(x_1,G(x_1))$ denotes the output of the discriminator for the first time-phase image, $G(x_1)$ denotes the output of the generator for the first time-phase image, $D(x_2,G(x_2))$ denotes the output of the discriminator for the second time-phase image, $G(x_2)$ denotes the output of the generator for the second time-phase image, and $x_1, x_2$ denote the first and second time-phase images input to the discriminator.
In general, compared with the prior art, the above technical solution of the present invention achieves the following beneficial effects: based on a deep convolutional neural network, an automatic building detection model consisting of an encoder and a decoder is constructed, which can effectively discriminate and fuse the multi-scale feature information in the double-time-phase images and effectively improves building change detection accuracy. Finally, the change of urban buildings can be detected automatically simply by inputting the double-time-phase images into the trained model.
Drawings
Fig. 1 is a schematic flow chart of an automated urban building detection system and method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an automated building inspection model according to an embodiment of the present invention;
FIG. 3 is a diagram of a multi-head cross attention mechanism network according to an embodiment of the present invention;
FIG. 4 is a comparison chart of detection results obtained by different methods according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
A multi-scale densely connected UNet network is adopted to extract rich feature information from the double-time-phase images; the twin attention mechanism attends to the changed and unchanged areas in the double-time-phase images respectively, enhancing the representation of feature information and improving the global attention to that information; a multi-scale feature fusion module progressively fuses the feature information of each scale; meanwhile, the differential context discrimination module computes the weighted sum of the generator and discriminator losses as the probability loss, pushing the detection result closer to the real change image. Eight evaluation indices are adopted to evaluate the performance of the invention: Precision, Recall, F1-score, Intersection over Union (IoU), unchanged IoU (IoU_0), changed IoU (IoU_1), Overall Accuracy (OA) and the Kappa coefficient.
Fig. 1 is a schematic flow chart of an automated urban building detection system and method according to an embodiment of the present invention, which specifically includes the following steps:
s1: and (3) data set construction: the method comprises the steps of constructing a data set by using images of urban buildings acquired by a remote sensing satellite, acquiring actual change images corresponding to all buildings in the data set, and taking the actual change images and corresponding double-time-phase images as the data set;
the detection precision of the model can be effectively improved by constructing a reasonable building change detection data set. In the experiments of the present example, a LEVIR-CD dataset was used, which contained a wide variety of architectural images from 20 regions. The original size of each image is 1024 × 1024 pixels, and the spatial resolution is 0.5m. Considering the limitation of the memory capacity of the GPU, each image is cut into 16 area images with the size of 256 multiplied by 256 pixels by adopting an image segmentation algorithm, and 4450 front and back time sequence image pairs are finally obtained. The invention adopts professional computer vision labeling software to label the urban building image. For each pair of front and rear time sequence images, a corresponding actual change image group channel is obtained, each pixel point in the actual change image represents a category, wherein the category labels in the actual change image are represented by 0 and 1, 0 represents an unchanged area (which can be displayed as black), and 1 represents a changed area (which can be displayed as white).
Through the above processing the front and rear time-phase images and the corresponding actual change images are obtained; each pair of front and rear time-phase images together with its actual change image forms the automatic urban building detection image dataset, which is divided into a training set (3560 images in total) and a test set (890 images in total) at a ratio of 8:2.
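The tiling and splitting described above can be sketched as follows. This is an illustrative script rather than the patent's own code; the directory layout (A/B/label folders), the file naming and the helper functions are assumptions.

```python
# Illustrative sketch of LEVIR-CD style preprocessing: crop 1024x1024 pairs into
# 16 tiles of 256x256 and split the tile pairs 8:2 into train/test sets.
import random
from pathlib import Path
from PIL import Image

TILE = 256

def crop_to_tiles(img, tile=TILE):
    """Yield tile x tile crops covering the whole image (1024 -> 16 tiles)."""
    w, h = img.size
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            yield img.crop((left, top, left + tile, top + tile))

def build_dataset(root, out, train_ratio=0.8, seed=0):
    root, out = Path(root), Path(out)
    samples = []
    for a_path in sorted((root / "A").glob("*.png")):        # first time phase
        b_path = root / "B" / a_path.name                    # second time phase
        y_path = root / "label" / a_path.name                # actual change image
        tiles = zip(*(crop_to_tiles(Image.open(p)) for p in (a_path, b_path, y_path)))
        for k, (ta, tb, ty) in enumerate(tiles):
            samples.append((a_path.stem, k, ta, tb, ty))
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * train_ratio)                # 8:2 split
    for i, (stem, k, ta, tb, ty) in enumerate(samples):
        split = "train" if i < n_train else "test"
        for sub, tile in (("A", ta), ("B", tb), ("label", ty)):
            d = out / split / sub
            d.mkdir(parents=True, exist_ok=True)
            tile.save(d / f"{stem}_{k:02d}.png")
```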
S2: building automatic detection model construction: constructing a twin cross attention discrimination network consisting of an encoder and a decoder as a building automatic detection model;
as shown in fig. 2, the building automation detection model of the embodiment of the present invention includes two main modules: an encoder and a decoder. The encoder comprises a two-channel twin network and twin cross attention module which are shared by weight, and the decoder comprises a multi-scale feature fusion and difference context discrimination module.
The encoder is responsible for extracting multi-scale characteristic information and high-level semantic information in the input image. The decoder carries out progressive fusion on the extracted multi-scale features, calculates probability loss by combining context difference information, and continuously pushes a result graph to be close to the Ground Truth.
As shown in Fig. 2(a), the weight-shared two-channel twin network first applies a batch-normalized stem to the input double-time-phase images, consisting of a two-dimensional convolution with kernel size 3 and stride 1, a two-dimensional BatchNorm and a ReLU activation, with 64 output channels. Feature information is then extracted by the down-sampling blocks. Defining $x^{i,j}$ as the output node of a down-sampling block, the objective function of the down-sampling block is:
$$x^{i,j}=\begin{cases}N\big(D(x^{i-1,j})\big), & j=0\\[4pt] N\Big(\big[\,[x^{i,k}]_{k=0}^{j-1},\;U(x^{i+1,j-1})\,\big]\Big), & j>0\end{cases}$$

where $N(\cdot)$ denotes the nested convolution function, $D(\cdot)$ the down-sampling layer, $U(\cdot)$ the up-sampling layer, $[\cdot]$ the feature concatenation function, and $x^{i,j}$ the output feature map; $i$ denotes the layer index, $j$ the $j$-th convolution layer of that layer, and $k$ the $k$-th connection layer. To better describe the network parameters, the output channel numbers of the three down-sampling blocks are defined as 128, 256 and 512 respectively. Finally, the twin network channel outputs four kinds of multi-scale feature information.
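A minimal PyTorch sketch of the weight-shared twin branch with the channel widths stated above (64, 128, 256, 512) follows; for brevity the densely connected UNet blocks are reduced to plain down-sampling blocks, so the exact block composition is an assumption.

```python
# Sketch (not the patent's code): weight-shared twin stem plus three down-sampling
# blocks, producing four multi-scale feature maps per time phase.
import torch
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    def __init__(self, c_in, c_out):
        super().__init__(nn.Conv2d(c_in, c_out, 3, 1, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class TwinEncoder(nn.Module):
    """Shared-weight (twin) encoder applied to both time phases."""
    def __init__(self):
        super().__init__()
        self.stem = ConvBNReLU(3, 64)                        # 3x3 conv, stride 1, 64 channels
        self.down = nn.ModuleList([
            nn.Sequential(nn.MaxPool2d(2), ConvBNReLU(c_in, c_out))
            for c_in, c_out in [(64, 128), (128, 256), (256, 512)]])

    def forward_single(self, x):
        feats = [self.stem(x)]
        for block in self.down:
            feats.append(block(feats[-1]))
        return feats                                         # four scales: 64/128/256/512 channels

    def forward(self, x1, x2):
        # the same weights are reused for both time phases
        return self.forward_single(x1), self.forward_single(x2)

# usage: f1, f2 = TwinEncoder()(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
```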
As shown in Fig. 2(b), the twin cross attention module performs an embedding operation on the four outputs of the weight-shared two-channel twin network: a 2D convolution first extracts features, which are then flattened into two-dimensional token sequences $T_1, T_2, T_3$ and $T_4$ with patch sizes 32, 16, 8 and 4 respectively, and $T_1$-$T_4$ are concatenated to obtain $T_\Sigma$.
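The embedding operation can be sketched as a strided convolution acting as patch embedding, followed by flattening into tokens; keeping the embedding dimension equal to the channel count of each scale is an assumption.

```python
# Sketch of the embedding step: one patch embedding per scale, patch sizes 32/16/8/4.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, c_in, dim, patch):
        super().__init__()
        self.proj = nn.Conv2d(c_in, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)    # (B, N_tokens, dim)

embeds = nn.ModuleList([PatchEmbed(c, c, p)
                        for c, p in [(64, 32), (128, 16), (256, 8), (512, 4)]])
# T_sigma is obtained by concatenating the four token sequences along the channel axis.
```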
As shown in Fig. 3, the twin cross attention module uses a multi-head cross attention mechanism to extract deeper change-feature semantic information and improve the global attention to the feature information. The objective function of the first stage of multi-head cross attention is:

$$Q_u = T_l W_{Q_u},\qquad K = T_\Sigma W_K,\qquad V = T_\Sigma W_V$$

where $W_{Q_u}$, $W_K$ and $W_V$ are the weight matrices of the different inputs, $T_l$ denotes the token of the feature information at the $l$-th scale, and $T_\Sigma$ denotes the concatenation of the four tokens. This yields the query vector $Q_u$ ($u = 1,2,3,4$), the query key $K$ and the query value $V$. The channel numbers of the four query vectors are 64, 128, 256 and 512 respectively.
Since a global attention mechanism makes the time complexity of the network large, a transposed attention mechanism is adopted to reduce the amount of computation, where $Q_u^{\top}$ and $V^{\top}$ are the transposes of the query vector $Q_u$ and the query value $V$ respectively. The objective function of the second stage of multi-head cross attention is therefore:

$$\mathrm{CA}_h = \sigma\!\left(\psi\!\left(\frac{Q_u^{\top} K}{\sqrt{C_\Sigma}}\right)\right) V^{\top}$$

which determines the output of the second stage of multi-head cross attention, where $\sigma(\cdot)$ and $\psi(\cdot)$ denote the softmax function and the instance normalization function respectively, $W_{Q_u}$, $W_K$ and $W_V$ are the weight matrices of the different inputs, $T_l$ denotes the token of the feature information at the $l$-th scale, $T_\Sigma$ denotes the concatenation of the four tokens, and $C_\Sigma$ denotes the sum of the channel numbers.
The objective function of the third stage of multi-head cross attention is:

$$\mathrm{MCA}_p = \frac{1}{N}\sum_{h=1}^{N}\mathrm{CA}_h$$

where $\mathrm{CA}_h$ ($h = 1,2,3,4$) denotes the output of the second stage of multi-head cross attention for the $h$-th attention head, and $N$ is the number of attention heads; experiments show that the network achieves the best detection effect when $N$ is 4.
The objective function of the final stage of multi-head cross attention is:

$$O_r = \mathrm{MCA}_p + \mathrm{MLP}(Q_u + \mathrm{MCA}_p)$$

which determines the final output of multi-head cross attention, where $\mathrm{MCA}_p$ denotes the output of the third stage of multi-head cross attention, $p$ denotes the $p$-th output, $\mathrm{MLP}(\cdot)$ is a multi-layer perceptron function, and $Q_u$ denotes the $u$-th query vector ($u = 1,2,3,4$). Four outputs $O_1, O_2, O_3$ and $O_4$ are finally obtained.
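A compact PyTorch sketch of the transposed (channel-wise) cross attention outlined by the stages above follows; the per-head projections, the MLP width and the use of the first head's query in the residual term are assumptions made to keep the example short.

```python
# Sketch (assumed details): channel-wise "transposed" cross attention. The per-scale
# token T_l supplies the query; the concatenation T_sigma supplies key and value.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelCrossAttention(nn.Module):
    def __init__(self, c_l, c_sigma, heads=4):
        super().__init__()
        self.w_q = nn.ModuleList([nn.Linear(c_l, c_l, bias=False) for _ in range(heads)])
        self.w_k = nn.ModuleList([nn.Linear(c_sigma, c_sigma, bias=False) for _ in range(heads)])
        self.w_v = nn.ModuleList([nn.Linear(c_sigma, c_sigma, bias=False) for _ in range(heads)])
        self.norm = nn.InstanceNorm2d(1)                      # psi(.) in the text
        self.mlp = nn.Sequential(nn.Linear(c_l, 4 * c_l), nn.GELU(),
                                 nn.Linear(4 * c_l, c_l))

    def forward(self, t_l, t_sigma):                          # (B, N, C_l), (B, N, C_sigma)
        c_sigma = t_sigma.shape[-1]
        heads = []
        for wq, wk, wv in zip(self.w_q, self.w_k, self.w_v):
            q, k, v = wq(t_l), wk(t_sigma), wv(t_sigma)
            attn = q.transpose(1, 2) @ k / c_sigma ** 0.5     # (B, C_l, C_sigma)
            attn = F.softmax(self.norm(attn.unsqueeze(1)).squeeze(1), dim=-1)
            heads.append((attn @ v.transpose(1, 2)).transpose(1, 2))  # CA_h: (B, N, C_l)
        mca = torch.stack(heads).mean(dim=0)                  # MCA: average over the heads
        q0 = self.w_q[0](t_l)                                 # query reused in the residual MLP
        return mca + self.mlp(q0 + mca)                       # O = MCA + MLP(Q + MCA)
```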
As shown in Fig. 2(c), the multi-scale feature fusion module adopts a dual progressive fusion strategy of reconstruction and up-sampling blocks to fuse the extracted features containing rich multi-scale semantic information. The reconstruction strategy first fuses the four embedded tokens $T_1, T_2, T_3$ and $T_4$ of the cross attention module with the four outputs $O_1, O_2, O_3$ and $O_4$ of the multi-head cross attention mechanism.
The objective function of the reconstruction strategy is:

$$M_i = W_1\cdot V(T_l) + W_2\cdot V(O_r)$$

where $W_1$ and $W_2$ are the weight parameters of two linear layers, $T_l$ denotes the token of the feature information at the $l$-th scale, $O_r$ denotes the output of the multi-head cross attention module, and $r$ denotes the output of the $r$-th attention head ($r = 1,2,3,4$). Four outputs $M_1, M_2, M_3$ and $M_4$ are obtained.
For better fusion of the multi-scale feature information, the four outputs are passed through up-sampling blocks whose output channels are 256, 128, 64 and 64 respectively. Each up-sampling block contains a two-dimensional convolution with kernel size 2, an average pooling layer and a ReLU activation function. Finally, a convolution with kernel size 1 and stride 1 is applied to the output of the fourth up-sampling block to obtain the detection image.
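A sketch of the reconstruction step and the up-sampling path with the channel widths given above (256, 128, 64, 64) follows; how the token sequences are reshaped back into feature maps, and the exact composition of the up-sampling block, are assumptions.

```python
# Sketch (assumed details): M_i = W1*V(T_l) + W2*V(O_r), then up-sampling blocks and
# a 1x1 convolution head. Tokens are assumed to be reshaped to maps before up-sampling.
import torch
import torch.nn as nn

class Reconstruct(nn.Module):
    """Fuse an embedded token T_l with the attention output O_r through two linear layers."""
    def __init__(self, dim):
        super().__init__()
        self.w1, self.w2 = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, t_l, o_r):               # both (B, N, dim)
        return self.w1(t_l) + self.w2(o_r)     # M_i

class UpBlock(nn.Sequential):
    """Up-sampling block: x2 upsample, conv with kernel size 2, average pooling, ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(c_in, c_out, kernel_size=2, padding=1),
            nn.AvgPool2d(kernel_size=2, stride=1),
            nn.ReLU(inplace=True))

up_path = nn.Sequential(UpBlock(512, 256), UpBlock(256, 128),
                        UpBlock(128, 64), UpBlock(64, 64))
head = nn.Conv2d(64, 1, kernel_size=1, stride=1)   # final conv producing the detection image
# usage sketch: detection = head(up_path(deepest_fused_feature_map))
```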
As shown in Fig. 2(d), the differential context discrimination module comprises a generator and a discriminator. The generator receives two inputs: the detection image obtained at the last layer of the multi-scale feature fusion module, and the generated image obtained by a difference operation between the first and second time phases. The loss between the two is computed to push the result closer to the actual change image. The generator uses the weighted sum of the SCAD loss and the least-squares LSGAN loss as its loss function to reduce the false detection rate of the model; the discriminator uses the least-squares LSGAN loss function to improve the detection precision. The generator and discriminator losses are summed to obtain the final probability loss. The objective function of the differential context discrimination module is:
$$L(P) = L(D) + L(G)$$
$$L(D) = L_{LSGAN}(D)$$
$$L(G) = L_{LSGAN}(G) + \alpha L_{SCAD}$$
The SCAD loss is defined as:

[SCAD loss equation; given as an image in the original publication]

where $C$ denotes the detection class, $v(C)$ the pixel error value of the detection class, $J_C$ the loss term, and $\rho$ a continuously optimized parameter; $v(c)$ is defined as:

[equation for $v(c)$; given as an image in the original publication]

where $y_i$ is the actual change image, $s_g(c)$ is the detection score, and $g$ denotes the $g$-th pixel.
The least-squares LSGAN loss of the discriminator is:

$$L_{LSGAN}(D)=\tfrac{1}{2}\,\mathbb{E}_{x_1,y}\big[(D(x_1,y)-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_1}\big[D(x_1,G(x_1))^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2,y}\big[(D(x_2,y)-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2}\big[D(x_2,G(x_2))^2\big]$$

which determines the least-squares LSGAN loss of the discriminator, where $D(x_1,y)$ and $D(x_1,G(x_1))$ denote the outputs of the discriminator for the first time-phase image, $G(x_1)$ denotes the output of the generator for the first time-phase image, $D(x_2,y)$ and $D(x_2,G(x_2))$ denote the outputs of the discriminator for the second time-phase image, $G(x_2)$ denotes the output of the generator for the second time-phase image, $\mathbb{E}_{x_1,y}$ and $\mathbb{E}_{x_1}$ denote the detection expectations over the first time-phase image, $\mathbb{E}_{x_2,y}$ and $\mathbb{E}_{x_2}$ denote the detection expectations over the second time-phase image, $x_1$ and $x_2$ denote the first and second time-phase images input to the discriminator, and $y$ denotes the actual change image.
The least-squares LSGAN loss of the generator in the present invention is:

$$L_{LSGAN}(G)=\tfrac{1}{2}\,\mathbb{E}_{x_1}\big[(D(x_1,G(x_1))-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2}\big[(D(x_2,G(x_2))-1)^2\big]$$

which determines the least-squares LSGAN loss of the generator, where $\mathbb{E}_{x_1}$ denotes the detection expectation over the first time-phase image, $\mathbb{E}_{x_2}$ denotes the detection expectation over the second time-phase image, $D(x_1,G(x_1))$ denotes the output of the discriminator for the first time-phase image, $G(x_1)$ denotes the output of the generator for the first time-phase image, $D(x_2,G(x_2))$ denotes the output of the discriminator for the second time-phase image, $G(x_2)$ denotes the output of the generator for the second time-phase image, and $x_1, x_2$ denote the first and second time-phase images input to the discriminator.
Thus, the objective function of the differential context discrimination module is:

$$L(P) = L(D) + L(G)$$
$$L(D) = L_{LSGAN}(D)$$
$$L(G) = L_{LSGAN}(G) + \alpha L_{SCAD}$$

where $L(P)$ denotes the probability loss, $L(D)$ the discriminator loss, $L(G)$ the generator loss, $L_{LSGAN}(D)$ the least-squares LSGAN loss of the discriminator, $L_{LSGAN}(G)$ the least-squares LSGAN loss of the generator, and $L_{SCAD}$ the SCAD loss. $\alpha$ is a weighting parameter that controls the relative importance of the two losses. Guided by this objective function, the generator and the discriminator iterate in a loop, producing the probability loss, until it falls below a set threshold, and the detection result is then output.
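A sketch of how the probability loss L(P) = L(D) + L(G), with L(G) = L_LSGAN(G) + αL_SCAD, could be assembled follows; the SCAD term is left as a placeholder (an L1 surrogate) because its exact formula appears only as an image in the original, and the weighting α = 0.5 is an assumption.

```python
# Sketch of the probability loss; the LSGAN terms follow the standard least-squares GAN form.
import torch
import torch.nn.functional as F

def lsgan_d_loss(d_real_1, d_fake_1, d_real_2, d_fake_2):
    """Discriminator least-squares loss over both time phases."""
    real = 0.5 * ((d_real_1 - 1).pow(2).mean() + (d_real_2 - 1).pow(2).mean())
    fake = 0.5 * (d_fake_1.pow(2).mean() + d_fake_2.pow(2).mean())
    return real + fake

def lsgan_g_loss(d_fake_1, d_fake_2):
    """Generator least-squares loss over both time phases."""
    return 0.5 * ((d_fake_1 - 1).pow(2).mean() + (d_fake_2 - 1).pow(2).mean())

def scad_loss(pred, target, rho=1.0):
    # Placeholder for the patent's SCAD term (exact definition not reproduced here);
    # an L1 surrogate is used only so that the sketch runs end to end.
    return F.l1_loss(pred, target)

def probability_loss(d_out, pred, target, alpha=0.5):
    """L(P) = L(D) + L(G); d_out bundles the four discriminator outputs."""
    l_d = lsgan_d_loss(*d_out)
    l_g = lsgan_g_loss(d_out[1], d_out[3]) + alpha * scad_loss(pred, target)
    return l_d + l_g
```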
S3: Train the automatic building detection model of S2 with the training set of S1, use the trained model to perform building change detection, and finally evaluate the detection results with the evaluation indices of the automatic building detection model;
and training the LEVIR-CD data set constructed in the step S1 by using the network structure provided by the invention to obtain model weights for model evaluation. The training process is based on a PyTorch deep learning framework, the software environment is Ubuntu20.04, the hardware environment is 3090 display card, and the video memory is 24GB. The batchsize is set to 8 for a total of 100 epochs. Each input comprises three images: the first time phase image, the second time phase image and the actual change image are tested once after being trained once, and the change information of the urban buildings in the double time phase image and the real change image is continuously learned in the network training process. And (5) iterating circularly until the epoch reaches 100, and finishing the training.
Precision, Recall, the comprehensive evaluation index F1-score, Intersection over Union (IoU), unchanged IoU (IoU_0), changed IoU (IoU_1), Overall Accuracy (OA) and the Kappa coefficient are selected as evaluation indices. With TP, FP, FN and TN denoting the numbers of true positive, false positive, false negative and true negative pixels, the evaluation indices are computed as:

$$\mathrm{Precision}=\frac{TP}{TP+FP},\qquad \mathrm{Recall}=\frac{TP}{TP+FN}$$

$$\mathrm{F1}=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

$$\mathrm{IoU\_1}=\frac{TP}{TP+FP+FN},\qquad \mathrm{IoU\_0}=\frac{TN}{TN+FP+FN}$$

$$\mathrm{OA}=\frac{TP+TN}{TP+TN+FP+FN}$$

$$\mathrm{Kappa}=\frac{\mathrm{OA}-P_e}{1-P_e},\qquad P_e=\frac{(TP+FP)(TP+FN)+(FN+TN)(FP+TN)}{(TP+TN+FP+FN)^2}$$
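The evaluation step can be sketched by accumulating a binary confusion matrix over the test set and deriving the indices listed above; the tensor conventions (single-channel probability output, 0/1 label map) are assumptions.

```python
# Sketch of the evaluation: confusion-matrix counts -> Precision, Recall, F1, IoU_1,
# IoU_0, OA and Kappa, reported in percent.
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def evaluate(model, test_set, device="cuda", thresh=0.5):
    tp = fp = fn = tn = 0
    model.eval()
    for img1, img2, label in DataLoader(test_set, batch_size=8):
        pred = (model(img1.to(device), img2.to(device)) > thresh).long().cpu()
        label = label.long()
        tp += ((pred == 1) & (label == 1)).sum().item()
        fp += ((pred == 1) & (label == 0)).sum().item()
        fn += ((pred == 0) & (label == 1)).sum().item()
        tn += ((pred == 0) & (label == 0)).sum().item()
    eps = 1e-9
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou_1 = tp / (tp + fp + fn + eps)
    iou_0 = tn / (tn + fp + fn + eps)
    total = tp + tn + fp + fn
    oa = (tp + tn) / (total + eps)
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total ** 2 + eps)
    kappa = (oa - pe) / (1 - pe + eps)
    return {name: 100 * val for name, val in [("Precision", precision), ("Recall", recall),
            ("F1", f1), ("IoU_1", iou_1), ("IoU_0", iou_0), ("OA", oa), ("Kappa", kappa)]}
```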
in order to verify the performance of the building automation detection model provided by the invention, the invention provides a final experimental result, fig. 4 is a visual comparison graph of various methods, and table 1 is a quantitative index of various methods.
Fig. 4 shows the building detection result images obtained by the various methods. Image (a) is the earlier time-phase image, (b) the later time-phase image, (c) the real change image (GT), and (d)-(g) the detection result images of the different methods. Compared with the actual change image, black represents unchanged areas, white represents changed areas, red represents false detection areas and green represents missed detection areas.
Table 1: Building detection precision of various methods on the LEVIR-CD dataset

[Table 1 is given as an image in the original publication.]

Note that all indices are in percent, and larger values indicate better performance. For ease of observation, the best results are shown in bold.
It should be noted that, according to the implementation requirement, each step/component described in the present application can be divided into more steps/components, and two or more steps/components or partial operations of the steps/components can be combined into new steps/components to achieve the purpose of the present invention.
It will be appreciated by those skilled in the art that the foregoing is only a preferred embodiment of the invention and is not intended to limit the invention, which is to cover any modifications, equivalents, improvements and the like, which fall within the spirit and scope of the invention.

Claims (10)

1. A building change detection method for urban dynamic monitoring is characterized by comprising the following steps:
s1, taking an image of an urban building acquired by a remote sensing satellite as a data set, acquiring an actual change image corresponding to each building in the data set, and dividing the actual change image and a corresponding double-time-phase image into a training set and a test set;
s2, building an automatic building detection model consisting of an encoder and a decoder, wherein the encoder comprises a two-channel twin network and a twin cross attention module which are shared by weight, and the decoder comprises a multi-scale feature fusion and differential context discrimination module;
the weight-shared two-channel twin network comprises a batch normalization layer and a plurality of down-sampling blocks; double-time-phase images are input to obtain feature maps of different scales;
the twin cross attention module firstly carries out embedding operation on feature maps of different scales, and then extracts deeper variation feature semantic information by using a multi-head cross attention mechanism, so that the global attention to the feature information is improved;
the multi-scale feature fusion module adopts a dual progressive fusion strategy of reconstruction and up-sampling blocks to fuse the extracted features containing rich multi-scale semantic information;
the input of the differential context discrimination module is the output image of the multi-scale fusion module and the front and back time sequence differential image, and the purpose is to improve the discrimination capability of the network by combining the context information in the image, so that the detection result image is closer to the real change image, and the detection accuracy is improved;
and S3, training the building automatic detection model in the S2 by using the training set in the S1, and realizing building change detection by using the trained model.
2. The method according to claim 1, wherein step S1 comprises:
the method comprises the steps of taking an artificially-made urban building change image as a data set, and making an actual change image according to a double-time-phase image in the data set, wherein the actual change image is a change area in the double-time-phase image, and each pixel in the actual change image represents a type and is unchanged or changed;
and the front and rear time-phase images and the corresponding actual change images form the automatic urban building detection image dataset, which is divided into a training set and a test set at a ratio of 8:2.
3. The method of claim 1, wherein: the weight-shared dual-channel twin network in step S2 applies a batch-normalized stem to the input double-time-phase images, consisting of a two-dimensional convolution with kernel size 3 and stride 1, a two-dimensional BatchNorm and a ReLU activation with 64 output channels, and then extracts feature information through 3 down-sampling blocks; defining $x^{i,j}$ as the output node of a down-sampling block, the objective function of the down-sampling block is:

$$x^{i,j}=\begin{cases}N\big(D(x^{i-1,j})\big), & j=0\\[4pt] N\Big(\big[\,[x^{i,k}]_{k=0}^{j-1},\;U(x^{i+1,j-1})\,\big]\Big), & j>0\end{cases}$$

where $N(\cdot)$ denotes the nested convolution function, $D(\cdot)$ the down-sampling layer, $U(\cdot)$ the up-sampling layer, $[\cdot]$ the feature concatenation function, and $x^{i,j}$ the output feature map; $i$ denotes the layer index, $j$ the $j$-th convolution layer of that layer, and $k$ the $k$-th connection layer; finally, the twin network channel outputs four kinds of multi-scale feature information.
4. The method of claim 1, wherein: the twin cross attention module in step S2 performs an embedding operation on the four outputs of the dual-channel twin network, first applying a 2D convolution to extract features and then flattening the features into two-dimensional token sequences $T_1, T_2, T_3$ and $T_4$ with patch sizes 32, 16, 8 and 4 respectively; $T_1$-$T_4$ are concatenated to obtain $T_\Sigma$, and a multi-head cross attention mechanism is then applied, whose first-stage objective function is:

$$Q_u = T_l W_{Q_u},\qquad K = T_\Sigma W_K,\qquad V = T_\Sigma W_V$$

where $W_{Q_u}$, $W_K$ and $W_V$ are the weight matrices of the different inputs, $T_l$ denotes the token of the feature information at the $l$-th scale, and $T_\Sigma$ denotes the concatenation of the four tokens, yielding the query vector $Q_u$, the query key $K$ and the query value $V$, with $l = 1,2,3,4$ and $u = 1,2,3,4$;
the objective function of the second stage is:

$$\mathrm{CA}_h = \sigma\!\left(\psi\!\left(\frac{Q_u^{\top} K}{\sqrt{C_\Sigma}}\right)\right) V^{\top}$$

where $\sigma(\cdot)$ and $\psi(\cdot)$ denote the softmax function and the instance normalization function respectively, and $C_\Sigma$ denotes the sum of the channel numbers;
the objective function of the third stage of multi-head cross attention is:

$$\mathrm{MCA}_p = \frac{1}{N}\sum_{h=1}^{N}\mathrm{CA}_h$$

where $\mathrm{CA}_h$ denotes the output of the second stage of multi-head cross attention for the $h$-th attention head, and $N$ is the number of attention heads;
the objective function of the final stage of multi-head cross attention is:

$$O_r = \mathrm{MCA}_p + \mathrm{MLP}(Q_u + \mathrm{MCA}_p)$$

which determines the final output of multi-head cross attention, where $\mathrm{MCA}_p$ denotes the output of the third stage of multi-head cross attention, $p$ denotes the $p$-th output, $\mathrm{MLP}(\cdot)$ is a multi-layer perceptron function, and $Q_u$ denotes the $u$-th query vector.
5. The method of claim 4, wherein: in step S2 the objective function of the multi-scale feature fusion module is:

$$M_i = W_1\cdot V(T_l) + W_2\cdot V(O_r)$$

where $W_1$ and $W_2$ are the weight parameters of two linear layers, $T_l$ denotes the token of the feature information at the $l$-th scale, $O_r$ denotes the output of the multi-head cross attention module, and $r$ denotes the output of the $r$-th attention head.
6. The method of claim 1, wherein: in step S2 the differential context discrimination module comprises a generator and a discriminator; the generator receives two inputs, the detection image obtained at the last layer of the multi-scale feature fusion module and the generated image obtained by a difference operation between the first and second time phases, and the loss between the two is computed to push the result closer to the actual change image; the generator uses the weighted sum of the SCAD loss and the least-squares LSGAN loss as its loss function to reduce the false detection rate of the model; the discriminator uses the least-squares LSGAN loss function to improve the detection precision, and the generator and discriminator losses are summed to obtain the final probability loss.
7. The method of claim 6, wherein: in step S2 the objective function of the differential context discrimination module is:

$$L(P) = L(D) + L(G)$$
$$L(D) = L_{LSGAN}(D)$$
$$L(G) = L_{LSGAN}(G) + \alpha L_{SCAD}$$

where $L(P)$ denotes the probability loss, $L(D)$ the discriminator loss, $L(G)$ the generator loss, $L_{LSGAN}(D)$ the least-squares LSGAN loss of the discriminator, $L_{LSGAN}(G)$ the least-squares LSGAN loss of the generator, and $L_{SCAD}$ the SCAD loss.
8. The method of claim 7, wherein: the SCAD loss is defined as:

[SCAD loss equation; given as an image in the original publication]

where $C$ denotes the detection class, $v(C)$ the pixel error value of the detection class, $J_C$ the loss term, and $\rho$ a continuously optimized parameter; $v(c)$ is defined as:

[equation for $v(c)$; given as an image in the original publication]

where $y_i$ is the actual change image, $s_g(c)$ is the detection score, and $g$ denotes the $g$-th pixel.
9. The method of claim 7, wherein: the least-squares LSGAN loss of the discriminator is:

$$L_{LSGAN}(D)=\tfrac{1}{2}\,\mathbb{E}_{x_1,y}\big[(D(x_1,y)-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_1}\big[D(x_1,G(x_1))^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2,y}\big[(D(x_2,y)-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2}\big[D(x_2,G(x_2))^2\big]$$

where $D(x_1,y)$ and $D(x_1,G(x_1))$ denote the outputs of the discriminator for the first time-phase image, $G(x_1)$ denotes the output of the generator for the first time-phase image, $D(x_2,y)$ and $D(x_2,G(x_2))$ denote the outputs of the discriminator for the second time-phase image, $G(x_2)$ denotes the output of the generator for the second time-phase image, $\mathbb{E}_{x_1,y}$ and $\mathbb{E}_{x_1}$ denote the detection expectations over the first time-phase image, $\mathbb{E}_{x_2,y}$ and $\mathbb{E}_{x_2}$ denote the detection expectations over the second time-phase image, $x_1$ and $x_2$ denote the first and second time-phase images input to the discriminator, and $y$ denotes the actual change image.
10. The method of claim 7, wherein: the least-squares LSGAN loss of the generator is:

$$L_{LSGAN}(G)=\tfrac{1}{2}\,\mathbb{E}_{x_1}\big[(D(x_1,G(x_1))-1)^2\big]+\tfrac{1}{2}\,\mathbb{E}_{x_2}\big[(D(x_2,G(x_2))-1)^2\big]$$

where $\mathbb{E}_{x_1}$ denotes the detection expectation over the first time-phase image, $\mathbb{E}_{x_2}$ denotes the detection expectation over the second time-phase image, $D(x_1,G(x_1))$ denotes the output of the discriminator for the first time-phase image, $G(x_1)$ denotes the output of the generator for the first time-phase image, $D(x_2,G(x_2))$ denotes the output of the discriminator for the second time-phase image, $G(x_2)$ denotes the output of the generator for the second time-phase image, and $x_1, x_2$ denote the first and second time-phase images input to the discriminator.
CN202211344397.7A 2022-10-31 2022-10-31 Building change detection method for urban dynamic monitoring Pending CN115601661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211344397.7A CN115601661A (en) 2022-10-31 2022-10-31 Building change detection method for urban dynamic monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211344397.7A CN115601661A (en) 2022-10-31 2022-10-31 Building change detection method for urban dynamic monitoring

Publications (1)

Publication Number Publication Date
CN115601661A true CN115601661A (en) 2023-01-13

Family

ID=84850193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211344397.7A Pending CN115601661A (en) 2022-10-31 2022-10-31 Building change detection method for urban dynamic monitoring

Country Status (1)

Country Link
CN (1) CN115601661A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051519A (en) * 2023-02-02 2023-05-02 广东国地规划科技股份有限公司 Method, device, equipment and storage medium for detecting double-time-phase image building change
CN116051519B (en) * 2023-02-02 2023-08-22 广东国地规划科技股份有限公司 Method, device, equipment and storage medium for detecting double-time-phase image building change
CN116091492A (en) * 2023-04-06 2023-05-09 中国科学技术大学 Image change pixel level detection method and system
CN116091492B (en) * 2023-04-06 2023-07-14 中国科学技术大学 Image change pixel level detection method and system
CN116343052A (en) * 2023-05-30 2023-06-27 华东交通大学 Attention and multiscale-based dual-temporal remote sensing image change detection network
CN116343052B (en) * 2023-05-30 2023-08-01 华东交通大学 Attention and multiscale-based dual-temporal remote sensing image change detection network
CN116862252A (en) * 2023-06-13 2023-10-10 河海大学 Urban building loss emergency assessment method based on composite convolution operator
CN116862252B (en) * 2023-06-13 2024-04-26 河海大学 Urban building loss emergency assessment method based on composite convolution operator
CN117576574A (en) * 2024-01-19 2024-02-20 湖北工业大学 Electric power facility ground feature change detection method and device, electronic equipment and medium
CN117576574B (en) * 2024-01-19 2024-04-05 湖北工业大学 Electric power facility ground feature change detection method and device, electronic equipment and medium
CN118212532A (en) * 2024-04-28 2024-06-18 西安电子科技大学 Method for extracting building change region in double-phase remote sensing image based on twin mixed attention mechanism and multi-scale feature fusion

Similar Documents

Publication Publication Date Title
CN111126202B (en) Optical remote sensing image target detection method based on void feature pyramid network
CN115601661A (en) Building change detection method for urban dynamic monitoring
CN111723732B (en) Optical remote sensing image change detection method, storage medium and computing equipment
CN112668494A (en) Small sample change detection method based on multi-scale feature extraction
CN109741328A (en) A kind of automobile apparent mass detection method based on production confrontation network
Liu et al. An attention-based multiscale transformer network for remote sensing image change detection
CN108399248A (en) A kind of time series data prediction technique, device and equipment
Chen et al. Changemamba: Remote sensing change detection with spatio-temporal state space model
CN113569788B (en) Building semantic segmentation network model training method, system and application method
CN103714148B (en) SAR image search method based on sparse coding classification
CN116524361A (en) Remote sensing image change detection network and detection method based on double twin branches
Li et al. A review of deep learning methods for pixel-level crack detection
Eftekhari et al. Building change detection using the parallel spatial-channel attention block and edge-guided deep network
CN116823664B (en) Remote sensing image cloud removal method and system
CN115937774A (en) Security inspection contraband detection method based on feature fusion and semantic interaction
CN114565594A (en) Image anomaly detection method based on soft mask contrast loss
CN116434069A (en) Remote sensing image change detection method based on local-global transducer network
CN116703885A (en) Swin transducer-based surface defect detection method and system
CN115035334B (en) Multi-classification change detection method and system for multi-scale fusion double-time-phase remote sensing image
CN115937697A (en) Remote sensing image change detection method
CN117911879B (en) SAM-fused fine-granularity high-resolution remote sensing image change detection method
CN114972882A (en) Wear surface damage depth estimation method and system based on multi-attention machine system
Seydi et al. BDD-Net+: A building damage detection framework based on modified coat-net
Fan et al. Application of YOLOv5 neural network based on improved attention mechanism in recognition of Thangka image defects
Zhang et al. CDMamba: Remote Sensing Image Change Detection with Mamba

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination