Disclosure of Invention
Therefore, the invention aims to provide a method for detecting pixel-level changes of an optical satellite image based on multi-scale feature fusion, so as to solve the problem of insufficient registration precision and achieve accurate pixel-level change detection for remote sensing images.
In order to achieve the above object, the present invention provides a method for detecting pixel-level changes of an optical satellite image based on multi-scale feature fusion, which comprises the following steps:
S1, acquiring a plurality of groups of change image pairs to be detected, and inputting the change image pairs into a trained change detection model; the change detection model adopts a convolutional neural network combined with a multi-scale feature fusion strategy; the multi-scale feature fusion strategy applies different processing to different types of images to obtain a plurality of feature maps with different scales and different feature levels;
S2, taking the feature maps processed by the multi-scale feature fusion strategy as the input of a convolutional neural network, wherein the convolutional neural network comprises a plurality of half-group convolution modules, and each half-group convolution module comprises a separation part and a cascade part; the separation part recombines and groups part of the input feature maps to form a plurality of sub-feature maps, and the cascade part performs cascade processing on the input feature maps and the sub-feature maps;
and S3, performing binary classification on the final fused feature map formed after multiple separations and cascades by using a classifier to obtain a final change detection result map.
Further preferably, in S1, the images to be detected are large-range remote sensing images acquired in two different periods, which have been subjected to geometric correction, orthorectification and resampling.
Further preferably, in S1, the multi-scale feature fusion strategy includes setting a plurality of spatial feature extraction branches and a plurality of image down-sampling processing branches, where the number of the spatial feature extraction branches and the number of the down-sampling processing branches are the same.
Further preferably, each spatial feature extraction branch adopts two 3 × 3 convolution kernels followed by a pooling layer to extract features, so as to obtain the feature map of each spatial feature extraction branch.
Further preferably, the down-sampling process of each image down-sampling processing branch is performed by using a bilinear interpolation method.
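To make the two branch types concrete, the following is a minimal PyTorch sketch; the class names, channel counts, activation functions and the use of max pooling with stride 2 are illustrative assumptions rather than the exact configuration of the claimed strategy.

```python
# Hypothetical sketch of the two branch types in the multi-scale feature fusion
# strategy: a spatial feature extraction branch (two 3x3 convolutions followed by
# pooling) and an image down-sampling branch (bilinear interpolation followed by
# a 1x1 convolution). Names and channel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialFeatureBranch(nn.Module):
    """Two 3x3 convolutions followed by a pooling layer (assumed max pooling, stride 2)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.conv(x))


class DownSamplingBranch(nn.Module):
    """Bilinear down-sampling followed by a 1x1 convolution."""

    def __init__(self, in_ch: int, out_ch: int, scale: int):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=1.0 / self.scale, mode="bilinear",
                          align_corners=False)
        return self.conv(x)


# Example: a bi-temporal image pair stacked along the channel dimension.
pair = torch.randn(1, 6, 256, 256)              # two 3-band images
f1 = SpatialFeatureBranch(6, 32)(pair)          # 1 x 32 x 128 x 128
d1 = DownSamplingBranch(6, 32, scale=8)(pair)   # 1 x 32 x 32 x 32
```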
Further preferably, in S2, the step of using the feature maps processed by the multi-scale feature fusion strategy as the input of a convolutional neural network and finally forming a fused feature map through a plurality of half-group convolution modules includes the following steps:
S201, the change image pairs {(I1, I2, CM*)t | t = 1, 2, …, T} are used as the input of the change detection model Φ, and input feature maps L1-0, …, L4-0 at different scales are obtained through the multi-scale feature fusion strategy;
S202, high-level features are extracted from L1-0 through two branches to obtain feature maps L2-1 and L2-2; feature maps L2-0, L2-1 and L2-2 form the input of the first half-group convolution module Ω1; the output feature map obtained by the separation part of Ω1 is Sp1 = {L3-1, L3-2, L3-3}, and the output feature map of the cascade part is Cp1 = {L2-c};
S203, feature map L3-0 and the output feature map of Ω1 are cascaded to obtain L3-4; feature map L3-0, the output Sp1 of the separation part of Ω1 and feature map L3-4 together form the input of the second half-group convolution module Ω2; the outputs of Ω2 are Sp2 = {L4-1, L4-2, L4-3, L4-4} and Cp2 = {L3-c}, respectively;
S204, feature map L4-0 and the output feature map of Ω2 are cascaded to obtain L4-5;
S205, to supplement deep image information, the model Φ cascades L3-c and L4-5 to obtain feature map L4-6, and performs a channel compression operation to obtain L4-c;
S206, feature map L4-c is subjected to deconvolution and up-sampling operations and cascaded with feature map L3-c to obtain feature map L3-u; this step is repeated for the feature map sets {L3-u, L2-c} and {L2-u, L2-c} to obtain L2-u and L1-u in sequence.
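For illustration, the cascade, channel compression and up-sampling operations in S205 and S206 might be sketched as follows; the layer choices (a stride-2 transposed convolution and a 1 × 1 compression convolution) and the channel counts are assumptions, not the exact layers of the model.

```python
# Hypothetical sketch of one decoder step (S205/S206): cascade a deep feature map
# with a cascade-part feature map, compress channels with a 1x1 convolution, then
# deconvolve to up-sample before the next cascade. Channel counts and layer
# choices are illustrative assumptions.
import torch
import torch.nn as nn


class CascadeUpStep(nn.Module):
    def __init__(self, deep_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.compress = nn.Conv2d(deep_ch + skip_ch, out_ch, kernel_size=1)        # channel compression
        self.deconv = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=2, stride=2)  # deconvolution / x2 up-sampling

    def forward(self, deep: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([deep, skip], dim=1)   # cascade along the channel dimension (e.g. L4-5 with L3-c)
        fused = self.compress(fused)             # e.g. L4-6 -> L4-c
        return self.deconv(fused)                # up-sampled map, to be cascaded with the next skip feature


# Example: a 16x16 deep map and a 16x16 cascade map produce a 32x32 output.
deep = torch.randn(1, 128, 16, 16)
skip = torch.randn(1, 64, 16, 16)
out = CascadeUpStep(128, 64, 64)(deep, skip)     # shape: 1 x 64 x 32 x 32
```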
Further preferably, in S3, when the final fused feature map formed by multiple separations and cascades is classified by a classifier, the classifier is represented by the following formula:
wherein fi is the output vector of the convolutional layer, exp() is the exponential function, and F(fi) is the classification output; as a binary classification task, F(fi) has an output range of [0, 1], indicating the probability of a pixel change.
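The classifier formula itself is not reproduced above; based on the use of the exponential function and the stated [0, 1] output range, a softmax over two output channels is a plausible reading, and the sketch below is offered only under that assumption.

```python
# Hypothetical sketch of the binary change classifier, assuming a softmax over a
# two-channel convolutional output: F(f_i) = exp(f_i) / sum_j exp(f_j).
# The softmax interpretation is an assumption, not the claimed formula itself.
import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 256, 256)   # f_i: two-channel output of the last convolutional layer
probs = F.softmax(logits, dim=1)       # per-pixel class probabilities in [0, 1]
change_prob = probs[:, 1]              # probability that each pixel has changed
print(float(change_prob.min()), float(change_prob.max()))  # both values lie in [0, 1]
```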
Further preferably, in S3, the method further includes binarizing the change probability results of all pixels to obtain a prediction result CM of change detection, and calculating a loss function based on the degree of similarity between the prediction result CM and the true value CM*.
Further preferably, the loss function is expressed by the following formula:
E = Ebce + λEdc
wherein λ is a weight control parameter for adjusting the ratio between Ebce and Edc, Ebce is the result of the binary cross-entropy loss function, and Edc is the result of the Dice coefficient loss function.
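A minimal sketch of such a combined loss is given below; since the exact definitions of Ebce and Edc are not reproduced here, the standard (unweighted) binary cross-entropy and soft-Dice formulations are assumed.

```python
# Hypothetical sketch of the combined loss E = E_bce + lambda * E_dc, assuming
# the standard binary cross-entropy and soft-Dice loss formulations.
import torch
import torch.nn.functional as F


def combined_loss(change_prob: torch.Tensor, target: torch.Tensor,
                  lam: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
    """change_prob: per-pixel change probabilities; target: {0, 1} true-value map."""
    e_bce = F.binary_cross_entropy(change_prob, target)          # cross-entropy term
    inter = (change_prob * target).sum()
    e_dc = 1.0 - (2.0 * inter + eps) / (change_prob.sum() + target.sum() + eps)  # Dice term
    return e_bce + lam * e_dc                                    # E = E_bce + lambda * E_dc


# Example usage with random values.
p = torch.rand(2, 256, 256)
y = (torch.rand(2, 256, 256) > 0.9).float()
print(float(combined_loss(p, y)))
```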
Compared with the prior art, the optical satellite image pixel-level change detection method based on multi-scale feature fusion has at least the following advantages:
the invention provides a multi-scale feature fusion strategy aiming at the detection of the change position in the satellite remote sensing images acquired at different time before and after, which can effectively resist the registration error between the satellite images and further improve the precision of pixel-level change detection. The half-group convolution designed by the invention can effectively improve the processing efficiency of the model and reduce the processing time of change detection. Based on a multi-scale feature fusion strategy and a half-group convolution module, the change detection model constructed by the method has better feature extraction capability and higher processing efficiency, can resist the problem of registration error between satellite images, and is more suitable for the pixel-level change detection of the optical satellite images.
Detailed Description
The invention is described in further detail below with reference to the figures and the detailed description.
As shown in Fig. 1, in order of implementation, the method for detecting pixel-level changes of an optical satellite image based on multi-scale feature fusion according to an embodiment of the present invention includes the following parts: input-output definition, model training, and use.
as shown in fig. 5, when in use, comprises
S1, acquiring a plurality of groups of change image pairs to be detected, and inputting the change image pairs into a trained change detection model; the change detection model adopts a convolutional neural network combined with a multi-scale feature fusion strategy; the multi-scale feature fusion strategy applies different processing to different types of images to obtain a plurality of feature maps with different scales and different feature levels;
S2, taking the feature maps processed by the multi-scale feature fusion strategy as the input of a convolutional neural network, wherein the convolutional neural network comprises a plurality of half-group convolution modules, and each half-group convolution module comprises a separation part and a cascade part; the separation part recombines and groups part of the input feature maps to form a plurality of sub-feature maps, and the cascade part performs cascade processing on the input feature maps and the sub-feature maps;
and S3, performing binary classification on the final fused feature map formed after multiple separations and cascades by using a classifier to obtain a final change detection result map.
The model definition and training comprise the following steps:
input-output definition: the input data of the method are two remote sensing images I which need to be subjected to change detection 1 And I 2 .I 1 And I 2 The method is characterized in that large-range remote sensing images acquired in different periods are acquired through strict geometric correction, orthorectification, resampling and other steps. The output data of the method is a binary change detection result image, namely a binary image CM. Image I 1 、I 2 And CM have identical image size, ground resolution and geographic coverage. In the image CM, the pixel value C (m, n) =0 in the mth column and nth row indicates that no feature change has occurred at the position, and C (m, n) =1 indicates that a feature change has occurred at the position.
Model training: the proposed change detection model Φ = {Θ, K, Γ} is trained with a large amount of manually annotated data {(I1, I2, CM*)t | t = 1, 2, …, T}. Wherein CM* is the manually annotated change detection patch, hereinafter referred to as the true value; Θ represents the model parameters to be trained; K represents the designed network feature maps; Γ denotes the change detection classifier. The invention adopts a multi-scale feature fusion strategy in the proposed change detection model Φ to improve the robustness of the model to image registration errors and the change detection precision; the structure of the strategy is shown in Fig. 1. A half-group convolution module is designed to improve the detection efficiency of the model; its structure is shown in Fig. 2. During training, the output of the network Φ is Kc, where c denotes the number of feature map channels. Feature dimension reduction is performed on Kc to obtain a feature map K1, and the change detection classifier Γ = {K1, 2} then performs binary classification on K1 to obtain a binary change detection result map CM. The training process is supervised by calculating the degree of similarity between the model prediction result map CM and the actual change condition CM*, and the learnable parameters in the model are updated by a back-propagation strategy. Training is iterative: the loss function is reduced and the model performance improved by continuously updating the model parameters until an iteration stop condition is met.
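The training procedure described above might be sketched as follows; the stand-in network, optimizer, learning rate and synthetic data are placeholders and do not reproduce the actual model Φ or its multi-scale fusion and half-group convolution structure.

```python
# Hypothetical training-loop sketch: forward pass, similarity-based loss against
# the true value CM*, and parameter update by back-propagation. The toy network,
# optimizer settings and synthetic data are placeholder assumptions.
import torch
import torch.nn as nn


class ToyChangeNet(nn.Module):
    """Trivial stand-in for the change detection model Phi (illustration only)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 1), nn.Sigmoid())

    def forward(self, i1: torch.Tensor, i2: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([i1, i2], dim=1)).squeeze(1)


model = ToyChangeNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
bce = nn.BCELoss()

# Synthetic batch standing in for annotated training data {(I1, I2, CM*)_t}.
i1 = torch.rand(2, 3, 64, 64)
i2 = torch.rand(2, 3, 64, 64)
cm_true = (torch.rand(2, 64, 64) > 0.9).float()

for step in range(10):                        # iterate until a stop condition is met
    prob = model(i1, i2)                      # per-pixel change probability
    loss = bce(prob, cm_true)                 # supervision by similarity to CM*
    optimizer.zero_grad()
    loss.backward()                           # back-propagation
    optimizer.step()                          # update the learnable parameters
```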
Model prediction: the fully trained model Φ = {Θ, K, Γ} is used to perform change detection on the images to be detected, yielding a binary change detection image. During use, the model parameters Φ = {Θ, K, Γ} are fixed.
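Correspondingly, at prediction time the parameters are fixed and the per-pixel change probabilities are binarized; a small sketch is shown below, where the probability map and the 0.5 threshold are illustrative assumptions.

```python
# Hypothetical prediction-time binarization: the trained model's per-pixel change
# probabilities are thresholded to produce the binary change map CM.
# The probability tensor and the 0.5 threshold are illustrative assumptions.
import torch

change_prob = torch.rand(1, 256, 256)        # change probability output of the fixed model
cm = (change_prob > 0.5).to(torch.uint8)     # binary change detection result map CM
```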
Preferably, the convolutional neural network model Φ used in model training includes a multi-scale feature fusion strategy Ψ = {Fi, DIj}. The strategy includes two types of image processing operations, namely a spatial feature extraction operation F and an image down-sampling process DI. The subscript i in Fi indicates the output feature map obtained by each of the i spatial feature extraction branches; the subscript j in DIj indicates the output result obtained by each of the j image down-sampling branches.
A specific use of this structure comprises the following sub-steps:
step (i): fig. 1 shows a multi-scale feature fusion strategy proposed by the present invention. For an input image, the processing procedure outputs a plurality of image features with different scales and different feature levels. Preferably, the spatial feature extraction branch and the downsampling processing branch in the multi-scale feature fusion strategy are the same in number, i.e. i = j.
Step (ii): preferably, the convolution operations adopted by each spatial feature extraction branch are two convolution kernels of size {3 × 3}, and the convolution stride of Fi is 2^(i-1). The spatial feature extraction operation is followed by a pooling operation to obtain the output feature map Fi of the processing branch. Preferably, a maximum pooling operation is employed, and the stride of the pooling operation is 2.
Step (iii): the down-sampling operation is preferably performed using bilinear interpolation, and the down-sampling scale of DI1 is 2^(i+1); the scales of DI2 through DIj are {2^(i+2), …, 2^(i+j)} in sequence. Preferably, the image is down-sampled and then convolved to obtain the output DIj of the branch. Preferably, the convolution is performed using a convolution kernel of size {1 × 1}.
Preferably, the convolutional neural network model Φ used for model training includes a half-group convolution module Ω = {Sp, Cp}. The half-group convolution module Ω contains two components: a separation part Sp and a cascade part Cp. The separation part recombines and groups part of the input feature maps, and the cascade part cascades the feature maps in the input that do not pass through the separation part. The feature maps of the separation part are processed by the corresponding convolution operations and then cascaded with the feature maps of the cascade part to obtain the final output feature map.
The specific use of the module comprises the following sub-steps:
step (i): fig. 2 shows a half set of convolution modules designed by the present invention. The half-set convolution splits a process from an input profile to an output profile into a split part and a concatenated part S p And C p . Preferably, S is p And C p Have the sameThe number of channels in (2) is 1/2 of the number of channels of the input feature map.
Step (ii): preferably, the separation part Sp divides the given feature map into g groups, and the number of channels of each group is 1/g of the total. After the grouping of the feature maps is completed, each group is followed by two convolution kernels of size {3 × 3} for high-level feature extraction.
Step (iii): preferably, the cascade part performs cascade processing on the feature maps along the channel dimension, namely an element-level summation operation is performed on the given feature maps in the channel dimension.
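The following is a minimal sketch of such a half-group convolution module, assuming g = 2 groups, an even channel count and ReLU activations; it illustrates the separation/cascade split rather than reproducing the exact module of Fig. 2. Note that the sketch cascades the two parts by channel concatenation, which is one common reading of the cascade operation described above.

```python
# Hypothetical sketch of a half-group convolution module: the input channels are
# split in half into a separation part S_p and a cascade part C_p; S_p is further
# divided into g groups, each processed by two 3x3 convolutions, and the results
# are cascaded with C_p along the channel dimension. Channel counts, g and layer
# names are illustrative assumptions.
import torch
import torch.nn as nn


class HalfGroupConv(nn.Module):
    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        assert channels % 2 == 0 and (channels // 2) % groups == 0
        self.half = channels // 2
        self.groups = groups
        g_ch = self.half // groups
        self.group_convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(g_ch, g_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(g_ch, g_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            )
            for _ in range(groups)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s_p, c_p = torch.split(x, self.half, dim=1)      # separation half / cascade half
        s_groups = torch.chunk(s_p, self.groups, dim=1)  # divide S_p into g groups
        s_out = [conv(g) for conv, g in zip(self.group_convs, s_groups)]
        return torch.cat(s_out + [c_p], dim=1)           # cascade with C_p along channels


x = torch.randn(1, 64, 64, 64)
y = HalfGroupConv(64, groups=2)(x)   # same spatial size, 64 channels
```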
Fig. 3 shows the efficient optical satellite image pixel-level change detection model based on the multi-scale feature fusion strategy designed by the present invention. Preferably, the specific training process of the model Φ includes the following sub-steps:
step I: will change the detection image to { (I) 1 ,I 2 ,CM * ) t I T =1,2, \8230, T is used as the input of the model phi, and an input characteristic diagram L under different scales is obtained through a multi-scale characteristic fusion strategy 1-0, …L (i+j)-0 。
Step II: high-level features are extracted from L1-0 through two branches to obtain feature maps L2-1 and L2-2. Feature maps L2-0, L2-1 and L2-2 form the input of the first half-group convolution module Ω1. The output feature map obtained by the separation part of Ω1 is Sp1 = {L3-1, L3-2, L3-3}, and the output feature map of the cascade part is Cp1 = {L2-c}.
Step III: feature map L3-0 and the output feature map of Ω1 are cascaded to obtain L3-4. Feature map L3-0, the output Sp1 of the separation part of Ω1 and feature map L3-4 together form the input of the second half-group convolution module Ω2. Its outputs are Sp2 = {L4-1, L4-2, L4-3, L4-4} and Cp2 = {L3-c}, respectively.
Step IV: feature map L4-0 and the output feature map of Ω2 are cascaded to obtain L4-5.
Step V: to supplement deep image information, the model Φ cascades L3-c and L4-5 to obtain feature map L4-6, and performs a channel compression operation to obtain L4-c.
Step VI: feature map L4-c is subjected to deconvolution and up-sampling operations and cascaded with feature map L3-c to obtain feature map L3-u. Step VI is repeated for the feature map sets {L3-u, L2-c} and {L2-u, L2-c} to obtain L2-u and L1-u in sequence.
Step VII: deconvolution processing is performed on L1-u to obtain the feature map Kc(m, n) of the convolutional layer, where c represents the number of channels of the feature map and (m, n) represents the rows and columns of the image. Dimension transformation is performed on Kc to obtain K1, where the subscript 1 indicates that the result is a single-channel vector. A classifier Γ = {K1, 2} is added after the convolutional layers; Γ performs binary classification on the input feature vector K1. Preferably, the classifier Γ may be defined as:
wherein fi is the output vector of the convolutional layer, exp() is the exponential function, and F(fi) is the classification output. As a binary classification task, F(fi) has an output range of [0, 1] and represents the probability that the pixel (m, n) has changed. The change probability results of all pixels are binarized to obtain the prediction result CM of change detection.
Preferably, the model training step uses a loss function formed by combining the binary cross-entropy loss function Ebce and the Dice coefficient loss function Edc, wherein Ebce and Edc can be defined as:
wherein N is the total number of pixels of image I1, yn=1 indicates the number of changed pixels in the image, yn=0 indicates the number of unchanged pixels, and pn indicates the change probability.
wherein Y represents the given true-value change map, and the other term represents the predicted change result map.
Preferably, the loss function used in the model training process may be defined as:
E = Ebce + λEdc
wherein λ is a weight control parameter for adjusting the ratio between Ebce and Edc. Preferably, it is set to 0.5.
Training is iterative: the loss function is reduced and the network performance improved by continuously updating the model parameters until an iteration stop condition is met.
Model prediction: the network Φ is fixed, and change detection is performed on each pair of images to be detected to obtain a change detection result map CM of the corresponding size. Fig. 4 shows the change detection true value for images 1 and 2 and the change detection result obtained by the method of the present invention.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.