Disclosure of Invention
Therefore, the invention aims to provide a method for detecting pixel-level changes of an optical satellite image based on multi-scale feature fusion, so as to solve the problem of insufficient registration precision and achieve accurate pixel-level change detection for remote sensing images.
In order to achieve the above object, the present invention provides a method for detecting pixel-level changes of an optical satellite image based on multi-scale feature fusion, which comprises the following steps:
S1, acquiring a plurality of groups of change image pairs to be detected, and inputting the change image pairs into a trained change detection model; the change detection model adopts a convolutional neural network combined with a multi-scale feature fusion strategy; the multi-scale feature fusion strategy applies different processing to different types of images to obtain a plurality of feature maps with different scales and different feature levels;
S2, taking the feature maps processed by the multi-scale feature fusion strategy as the input of a convolutional neural network, wherein the convolutional neural network comprises a plurality of half-group convolution modules, and each half-group convolution module comprises a separation part and a cascade part; the separation part recombines and groups part of the input feature maps to form a plurality of sub-feature maps, and the cascade part performs cascade processing on the input feature maps and the sub-feature maps;
and S3, performing binary classification on the final fused feature map formed after multiple separations and cascades by using a classifier to obtain a final change detection result map.
Further preferably, in S1, the images to be detected are large-range remote sensing images acquired in two different periods, which have been subjected to geometric correction, orthorectification and resampling.
Further preferably, in S1, the multi-scale feature fusion strategy includes setting a plurality of spatial feature extraction branches and a plurality of image down-sampling processing branches, where the number of the spatial feature extraction branches and the number of the down-sampling processing branches are the same.
Further preferably, each spatial feature extraction branch adopts two 3 × 3 convolution kernels followed by a pooling layer to extract features, so as to obtain the feature map of each spatial feature extraction branch.
Further preferably, the down-sampling process of each image down-sampling processing branch is performed by using a bilinear interpolation method.
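To make the two branch types concrete, the following is a minimal PyTorch sketch; the class names, channel counts, activation functions and the use of max pooling with stride 2 are illustrative assumptions rather than the exact configuration of the claimed strategy.

```python
# Hypothetical sketch of the two branch types in the multi-scale feature fusion
# strategy: a spatial feature extraction branch (two 3x3 convolutions followed by
# pooling) and an image down-sampling branch (bilinear interpolation followed by
# a 1x1 convolution). Names and channel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialFeatureBranch(nn.Module):
    """Two 3x3 convolutions followed by a pooling layer (assumed max pooling, stride 2)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.conv(x))


class DownSamplingBranch(nn.Module):
    """Bilinear down-sampling followed by a 1x1 convolution."""

    def __init__(self, in_ch: int, out_ch: int, scale: int):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=1.0 / self.scale, mode="bilinear",
                          align_corners=False)
        return self.conv(x)


# Example: a bi-temporal image pair stacked along the channel dimension.
pair = torch.randn(1, 6, 256, 256)              # two 3-band images
f1 = SpatialFeatureBranch(6, 32)(pair)          # 1 x 32 x 128 x 128
d1 = DownSamplingBranch(6, 32, scale=8)(pair)   # 1 x 32 x 32 x 32
```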
Further preferably, in S2, the step of using the feature maps processed by the multi-scale feature fusion strategy as the input of a convolutional neural network and finally forming a fused feature map through a plurality of half-group convolution modules includes the following steps:
S201, the change image pairs {(I1, I2, CM*)t | t = 1, 2, …, T} are used as the input of the change detection model Φ, and input feature maps L1-0, …, L4-0 at different scales are obtained through the multi-scale feature fusion strategy;
S202, high-level features are extracted from L1-0 through two branches to obtain feature maps L2-1 and L2-2; feature maps L2-0, L2-1 and L2-2 form the input of the first half-group convolution module Ω1; the output feature map obtained by the separation part of Ω1 is Sp1 = {L3-1, L3-2, L3-3}, and the output feature map of the cascade part is Cp1 = {L2-c};
S203, feature map L3-0 and the output feature map of Ω1 are cascaded to obtain L3-4; feature map L3-0, the output Sp1 of the separation part of Ω1 and feature map L3-4 together form the input of the second half-group convolution module Ω2; the outputs of Ω2 are Sp2 = {L4-1, L4-2, L4-3, L4-4} and Cp2 = {L3-c}, respectively;
S204, feature map L4-0 and the output feature map of Ω2 are cascaded to obtain L4-5;
S205, to supplement deep image information, the model Φ cascades L3-c and L4-5 to obtain feature map L4-6, and performs a channel compression operation to obtain L4-c;
S206, feature map L4-c is subjected to deconvolution and up-sampling operations and cascaded with feature map L3-c to obtain feature map L3-u; this step is repeated for the feature map sets {L3-u, L2-c} and {L2-u, L2-c} to obtain L2-u and L1-u in sequence.
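For illustration, the cascade, channel compression and up-sampling operations in S205 and S206 might be sketched as follows; the layer choices (a stride-2 transposed convolution and a 1 × 1 compression convolution) and the channel counts are assumptions, not the exact layers of the model.

```python
# Hypothetical sketch of one decoder step (S205/S206): cascade a deep feature map
# with a cascade-part feature map, compress channels with a 1x1 convolution, then
# deconvolve to up-sample before the next cascade. Channel counts and layer
# choices are illustrative assumptions.
import torch
import torch.nn as nn


class CascadeUpStep(nn.Module):
    def __init__(self, deep_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.compress = nn.Conv2d(deep_ch + skip_ch, out_ch, kernel_size=1)        # channel compression
        self.deconv = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=2, stride=2)  # deconvolution / x2 up-sampling

    def forward(self, deep: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([deep, skip], dim=1)   # cascade along the channel dimension (e.g. L4-5 with L3-c)
        fused = self.compress(fused)             # e.g. L4-6 -> L4-c
        return self.deconv(fused)                # up-sampled map, to be cascaded with the next skip feature


# Example: a 16x16 deep map and a 16x16 cascade map produce a 32x32 output.
deep = torch.randn(1, 128, 16, 16)
skip = torch.randn(1, 64, 16, 16)
out = CascadeUpStep(128, 64, 64)(deep, skip)     # shape: 1 x 64 x 32 x 32
```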
Further preferably, in S3, when the final fused feature map formed by multiple separations and cascades is classified by a classifier, the classifier is represented by the following formula:
wherein fi is the output vector of the convolutional layer, exp() is the exponential function, and F(fi) is the classification output; as a binary classification task, F(fi) has an output range of [0, 1], indicating the probability of a pixel change.
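The classifier formula itself is not reproduced above; based on the use of the exponential function and the stated [0, 1] output range, a softmax over two output channels is a plausible reading, and the sketch below is offered only under that assumption.

```python
# Hypothetical sketch of the binary change classifier, assuming a softmax over a
# two-channel convolutional output: F(f_i) = exp(f_i) / sum_j exp(f_j).
# The softmax interpretation is an assumption, not the claimed formula itself.
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 256, 256)   # f_i: two-channel output of the last convolutional layer
probs = F.softmax(logits, dim=1)       # per-pixel class probabilities in [0, 1]
change_prob = probs[:, 1]              # probability that each pixel has changed
print(float(change_prob.min()), float(change_prob.max()))  # both values lie in [0, 1]
```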
Further preferably, in S3, the method further includes binarizing the change probability results of all pixels to obtain a prediction result CM of change detection, and calculating a loss function based on the degree of similarity between the prediction result CM and the true value CM*.
Further preferably, the loss function is expressed by the following formula:
E = Ebce + λEdc
wherein λ is a weight control parameter for adjusting the ratio between Ebce and Edc, Ebce is the result of the binary cross-entropy loss function, and Edc is the result of the Dice coefficient loss function.
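A minimal sketch of such a combined loss is given below; since the exact definitions of Ebce and Edc are not reproduced here, the standard (unweighted) binary cross-entropy and soft-Dice formulations are assumed.

```python
# Hypothetical sketch of the combined loss E = E_bce + lambda * E_dc, assuming
# the standard binary cross-entropy and soft-Dice loss formulations.
import torch
import torch.nn.functional as F


def combined_loss(change_prob: torch.Tensor, target: torch.Tensor,
                  lam: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
    """change_prob: per-pixel change probabilities; target: {0, 1} true-value map."""
    e_bce = F.binary_cross_entropy(change_prob, target)          # cross-entropy term
    inter = (change_prob * target).sum()
    e_dc = 1.0 - (2.0 * inter + eps) / (change_prob.sum() + target.sum() + eps)  # Dice term
    return e_bce + lam * e_dc                                    # E = E_bce + lambda * E_dc


# Example usage with random values.
p = torch.rand(2, 256, 256)
y = (torch.rand(2, 256, 256) > 0.9).float()
print(float(combined_loss(p, y)))
```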
Compared with the prior art, the optical satellite image pixel-level change detection method based on multi-scale feature fusion has at least the following advantages:
the invention provides a multi-scale feature fusion strategy aiming at the detection of the change position in the satellite remote sensing images acquired at different time before and after, which can effectively resist the registration error between the satellite images and further improve the precision of pixel-level change detection. The half-group convolution designed by the invention can effectively improve the processing efficiency of the model and reduce the processing time of change detection. Based on a multi-scale feature fusion strategy and a half-group convolution module, the change detection model constructed by the method has better feature extraction capability and higher processing efficiency, can resist the problem of registration error between satellite images, and is more suitable for the pixel-level change detection of the optical satellite images.
Detailed Description
The invention is described in further detail below with reference to the figures and the detailed description.
As shown in Fig. 1, in order of implementation, the method for detecting pixel-level changes of an optical satellite image based on multi-scale feature fusion according to an embodiment of the present invention includes the following parts: input-output definition, model training, and use.
as shown in fig. 5, when in use, comprises
S1, acquiring a plurality of groups of change image pairs to be detected, and inputting the change image pairs into a trained change detection model; the change detection model adopts a convolutional neural network combined with a multi-scale feature fusion strategy; the multi-scale feature fusion strategy applies different processing to different types of images to obtain a plurality of feature maps with different scales and different feature levels;
S2, taking the feature maps processed by the multi-scale feature fusion strategy as the input of a convolutional neural network, wherein the convolutional neural network comprises a plurality of half-group convolution modules, and each half-group convolution module comprises a separation part and a cascade part; the separation part recombines and groups part of the input feature maps to form a plurality of sub-feature maps, and the cascade part performs cascade processing on the input feature maps and the sub-feature maps;
and S3, performing binary classification on the final fused feature map formed after multiple separations and cascades by using a classifier to obtain a final change detection result map.
The model definition and training comprise the following steps:
input-output definition: the input data of the method are two remote sensing images I which need to be subjected to change detection 1 And I 2 .I 1 And I 2 The method is characterized in that large-range remote sensing images acquired in different periods are acquired through strict geometric correction, orthorectification, resampling and other steps. The output data of the method is a binary change detection result image, namely a binary image CM. Image I 1 、I 2 And CM have identical image size, ground resolution and geographic coverage. In the image CM, the pixel value C (m, n) =0 in the mth column and nth row indicates that no feature change has occurred at the position, and C (m, n) =1 indicates that a feature change has occurred at the position.
Model training: the proposed change detection model Φ = {Θ, K, Γ} is trained with a large amount of manually annotated data {(I1, I2, CM*)t | t = 1, 2, …, T}. Wherein CM* is the manually annotated change detection patch, hereinafter referred to as the true value; Θ represents the model parameters to be trained; K represents the designed network feature maps; Γ denotes the change detection classifier. The invention adopts a multi-scale feature fusion strategy in the proposed change detection model Φ to improve the robustness of the model to image registration errors and the change detection precision; the structure of the strategy is shown in Fig. 1. A half-group convolution module is designed to improve the detection efficiency of the model; its structure is shown in Fig. 2. During training, the output of the network Φ is Kc, where c denotes the number of feature map channels. Feature dimension reduction is performed on Kc to obtain a feature map K1, and the change detection classifier Γ = {K1, 2} then performs binary classification on K1 to obtain a binary change detection result map CM. The training process is supervised by calculating the degree of similarity between the model prediction result map CM and the actual change condition CM*, and the learnable parameters in the model are updated by a back-propagation strategy. Training is iterative: the loss function is reduced and the model performance improved by continuously updating the model parameters until an iteration stop condition is met.
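The training procedure described above might be sketched as follows; the stand-in network, optimizer, learning rate and synthetic data are placeholders and do not reproduce the actual model Φ or its multi-scale fusion and half-group convolution structure.

```python
# Hypothetical training-loop sketch: forward pass, similarity-based loss against
# the true value CM*, and parameter update by back-propagation. The toy network,
# optimizer settings and synthetic data are placeholder assumptions.
import torch
import torch.nn as nn


class ToyChangeNet(nn.Module):
    """Trivial stand-in for the change detection model Phi (illustration only)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 1), nn.Sigmoid())

    def forward(self, i1: torch.Tensor, i2: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([i1, i2], dim=1)).squeeze(1)


model = ToyChangeNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
bce = nn.BCELoss()

# Synthetic batch standing in for annotated training data {(I1, I2, CM*)_t}.
i1 = torch.rand(2, 3, 64, 64)
i2 = torch.rand(2, 3, 64, 64)
cm_true = (torch.rand(2, 64, 64) > 0.9).float()

for step in range(10):                        # iterate until a stop condition is met
    prob = model(i1, i2)                      # per-pixel change probability
    loss = bce(prob, cm_true)                 # supervision by similarity to CM*
    optimizer.zero_grad()
    loss.backward()                           # back-propagation
    optimizer.step()                          # update the learnable parameters
```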
Model prediction: the fully trained model Φ = {Θ, K, Γ} is used to perform change detection on the images to be detected, yielding a binary change detection image. During use, the model parameters Φ = {Θ, K, Γ} are fixed.
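Correspondingly, at prediction time the parameters are fixed and the per-pixel change probabilities are binarized; a small sketch is shown below, where the probability map and the 0.5 threshold are illustrative assumptions.

```python
# Hypothetical prediction-time binarization: the trained model's per-pixel change
# probabilities are thresholded to produce the binary change map CM.
# The probability tensor and the 0.5 threshold are illustrative assumptions.
import torch

change_prob = torch.rand(1, 256, 256)        # change probability output of the fixed model
cm = (change_prob > 0.5).to(torch.uint8)     # binary change detection result map CM
```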
Preferably, the convolutional neural network model Φ used in model training includes a multi-scale feature fusion strategy Ψ = {Fi, DIj}. The strategy includes two types of image processing operations, namely a spatial feature extraction operation F and an image down-sampling process DI. The subscript i in Fi indicates the output feature map obtained by each of the i spatial feature extraction branches; the subscript j in DIj indicates the output result obtained by each of the j image down-sampling branches.
A specific use of this structure comprises the following sub-steps:
step (i): fig. 1 shows a multi-scale feature fusion strategy proposed by the present invention. For an input image, the processing procedure outputs a plurality of image features with different scales and different feature levels. Preferably, the spatial feature extraction branch and the downsampling processing branch in the multi-scale feature fusion strategy are the same in number, i.e. i = j.
Step (ii): preferably, the convolution operations adopted by each spatial feature extraction branch are two convolution kernels of size {3 × 3}, and the convolution stride of Fi is 2^(i-1). The spatial feature extraction operation is followed by a pooling operation to obtain the output feature map Fi of the processing branch. Preferably, a maximum pooling operation is employed, and the stride of the pooling operation is 2.
Step (iii): the down-sampling operation is preferably performed using bilinear interpolation, and the down-sampling scale of DI1 is 2^(i+1); the scales of DI2 through DIj are {2^(i+2), …, 2^(i+j)} in sequence. Preferably, the image is down-sampled and then convolved to obtain the output DIj of the branch. Preferably, the convolution is performed using a convolution kernel of size {1 × 1}.
Preferably, the convolutional neural network model Φ used for model training includes a half-group convolution module Ω = {Sp, Cp}. The half-group convolution module Ω contains two components: a separation part Sp and a cascade part Cp. The separation part recombines and groups part of the input feature maps, and the cascade part cascades the feature maps in the input that do not pass through the separation part. The feature maps of the separation part are processed by the corresponding convolution operations and then cascaded with the feature maps of the cascade part to obtain the final output feature map.
The specific use of the module comprises the following sub-steps:
step (i): fig. 2 shows a half set of convolution modules designed by the present invention. The half-set convolution splits a process from an input profile to an output profile into a split part and a concatenated part S p And C p . Preferably, S is p And C p Have the sameThe number of channels in (2) is 1/2 of the number of channels of the input feature map.
Step (ii): preferably, the separation part Sp divides the given feature map into g groups, and the number of channels of each group is 1/g of the total. After the grouping of the feature maps is completed, each group is followed by two convolution kernels of size {3 × 3} for high-level feature extraction.
Step (iii): preferably, the cascade part performs cascade processing on the feature maps along the channel dimension, namely an element-level summation operation is performed on the given feature maps in the channel dimension.
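The following is a minimal sketch of such a half-group convolution module, assuming g = 2 groups, an even channel count and ReLU activations; it illustrates the separation/cascade split rather than reproducing the exact module of Fig. 2. Note that the sketch cascades the two parts by channel concatenation, which is one common reading of the cascade operation described above.

```python
# Hypothetical sketch of a half-group convolution module: the input channels are
# split in half into a separation part S_p and a cascade part C_p; S_p is further
# divided into g groups, each processed by two 3x3 convolutions, and the results
# are cascaded with C_p along the channel dimension. Channel counts, g and layer
# names are illustrative assumptions.
import torch
import torch.nn as nn


class HalfGroupConv(nn.Module):
    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        assert channels % 2 == 0 and (channels // 2) % groups == 0
        self.half = channels // 2
        self.groups = groups
        g_ch = self.half // groups
        self.group_convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(g_ch, g_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(g_ch, g_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            )
            for _ in range(groups)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s_p, c_p = torch.split(x, self.half, dim=1)      # separation half / cascade half
        s_groups = torch.chunk(s_p, self.groups, dim=1)  # divide S_p into g groups
        s_out = [conv(g) for conv, g in zip(self.group_convs, s_groups)]
        return torch.cat(s_out + [c_p], dim=1)           # cascade with C_p along channels


x = torch.randn(1, 64, 64, 64)
y = HalfGroupConv(64, groups=2)(x)   # same spatial size, 64 channels
```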
Fig. 3 shows the efficient optical satellite image pixel-level change detection model based on the multi-scale feature fusion strategy designed by the present invention. Preferably, the specific training process of the model Φ includes the following sub-steps:
step I: will change the detection image to { (I) 1 ,I 2 ,CM * ) t I T =1,2, \8230, T is used as the input of the model phi, and an input characteristic diagram L under different scales is obtained through a multi-scale characteristic fusion strategy 1-0, …L (i+j)-0 。
Step II: high-level features are extracted from L1-0 through two branches to obtain feature maps L2-1 and L2-2. Feature maps L2-0, L2-1 and L2-2 form the input of the first half-group convolution module Ω1. The output feature map obtained by the separation part of Ω1 is Sp1 = {L3-1, L3-2, L3-3}, and the output feature map of the cascade part is Cp1 = {L2-c}.
Step III: feature map L3-0 and the output feature map of Ω1 are cascaded to obtain L3-4. Feature map L3-0, the output Sp1 of the separation part of Ω1 and feature map L3-4 together form the input of the second half-group convolution module Ω2. Its outputs are Sp2 = {L4-1, L4-2, L4-3, L4-4} and Cp2 = {L3-c}, respectively.
Step IV: feature map L4-0 and the output feature map of Ω2 are cascaded to obtain L4-5.
Step V: to supplement deep image information, the model Φ cascades L3-c and L4-5 to obtain feature map L4-6, and performs a channel compression operation to obtain L4-c.
Step VI: feature map L4-c is subjected to deconvolution and up-sampling operations and cascaded with feature map L3-c to obtain feature map L3-u. Step VI is repeated for the feature map sets {L3-u, L2-c} and {L2-u, L2-c} to obtain L2-u and L1-u in sequence.
Step VII: deconvolution processing is performed on L1-u to obtain the feature map Kc(m, n) of the convolutional layer, where c represents the number of channels of the feature map and (m, n) represents the rows and columns of the image. Dimension transformation is performed on Kc to obtain K1, where the subscript 1 indicates that the result is a single-channel vector. A classifier Γ = {K1, 2} is added after the convolutional layers; Γ performs binary classification on the input feature vector K1. Preferably, the classifier Γ may be defined as:
wherein fi is the output vector of the convolutional layer, exp() is the exponential function, and F(fi) is the classification output. As a binary classification task, F(fi) has an output range of [0, 1] and represents the probability that the pixel (m, n) has changed. The change probability results of all pixels are binarized to obtain the prediction result CM of change detection.
Preferably, the model training step uses a loss function formed by combining the binary cross-entropy loss function Ebce and the Dice coefficient loss function Edc, wherein Ebce and Edc can be defined as:
wherein N is the total number of pixels of image I1, yn=1 indicates the number of changed pixels in the image, yn=0 indicates the number of unchanged pixels, and pn indicates the change probability.
wherein Y represents the given true-value change map, and the other term represents the predicted change result map.
Preferably, the loss function used in the model training process may be defined as:
E = Ebce + λEdc
wherein λ is a weight control parameter for adjusting the ratio between Ebce and Edc. Preferably, it is set to 0.5.
Training is iterative: the loss function is reduced and the network performance improved by continuously updating the model parameters until an iteration stop condition is met.
Model prediction: the network Φ is fixed, and change detection is performed on each pair of images to be detected to obtain a change detection result map CM of the corresponding size. Fig. 4 shows the change detection true value for images 1 and 2 and the change detection result obtained by the method of the present invention.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.