CN114494870A

CN114494870A - Double-time-phase remote sensing image change detection method, model construction method and device

Info

Publication number: CN114494870A
Application number: CN202210073167.5A
Authority: CN
Inventors: 于建志; 曹书语; 丁兆旭; 王智慧; 崔宾阁; 刘成龙; 曹越
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2022-01-21
Filing date: 2022-01-21
Publication date: 2022-05-13

Abstract

The invention discloses a bi-temporal remote sensing image change detection method, a model construction method and a device, belonging to the technical field of remote sensing image processing. A feature extractor for bi-temporal remote sensing images is built from a deep residual network to which a squeeze-and-excitation (SE) module is added. The feature extractor fuses the rich semantic information of high-level features with the rich detail information of low-level features, and the SE module weights the information in each channel so that the model focuses on the more important features, improving feature extraction. Existing methods have difficulty detecting changed targets of different sizes. To address this, a pyramid attention module changes the way the image is cut: it cuts the feature map into several groups of feature sub-maps of different sizes whose edge pixels overlap, and processes the sub-maps with a co-attention algorithm. This improves the model's ability to detect changed regions of different sizes, particularly small ones.

Description

Double-time-phase remote sensing image change detection method, model construction method and device

Technical Field

The invention relates to the technical field of remote sensing image processing, and in particular to a bi-temporal remote sensing image change detection method, a model construction method and a device.

Background

In recent years, remote sensing images have been widely used because they make it possible to acquire surface information about any part of the world conveniently and accurately. Change detection, an important branch of remote sensing image research, has been studied for decades. Today, change detection tasks generally include land use change detection, forest and vegetation change detection, urban expansion detection, and the monitoring and assessment of disasters such as earthquakes and forest fires. Change detection usually covers a large surface area, so performing it manually is time-consuming and labor-intensive. Deep-learning-based change detection algorithms have greatly reduced the labor, material and financial costs for researchers.

Change detection methods for bi-temporal remote sensing images in the prior art mainly fall into the following categories:

First, traditional methods: the two bi-temporal remote sensing images are first preprocessed. A difference map of the two images is then obtained by methods such as change vector analysis and wavelet fusion. Finally, the difference map is processed to generate a binary change detection image. Traditional methods depend heavily on engineers' experience and on lengthy algorithm tuning, so their implementation cost is high.

Second, pixel-based change detection methods: change information is extracted by processing the spectral information of pixel pairs at the same positions in the two bi-temporal images. These methods retain detail information that other methods lose, and they are simple to implement and widely applicable. However, they are not robust to interference factors such as changes in illumination angle and intensity or registration errors. They also do not fully exploit the spatial relationship between each pixel and its neighbors.

Third, object-based change detection methods: the remote sensing image is first segmented, and change detection is then performed on the objects produced by the segmentation. This approach partly compensates for the shortcomings of pixel-based methods. Segmentation pays more attention to the spatial relationships among pixels, which makes the method more robust to noise. However, the completeness and accuracy of existing segmentation techniques still need to be improved, so this approach also leaves room for improvement.

These methods share three weaknesses. They perform poorly at detecting changed targets of different sizes against complex backgrounds. Their ability to distinguish changed regions from unchanged regions in remote sensing images needs to be improved. They also cannot effectively handle the degradation of training caused by the imbalance between positive and negative samples in the training data set.

Disclosure of Invention

The invention provides a bi-temporal remote sensing image change detection method, a model construction method and a device. It aims to solve the following problems of prior-art change detection methods for bi-temporal remote sensing images: poor detection of changed targets of different sizes against complex backgrounds, poor discrimination between changed and unchanged regions, and degraded training caused by the imbalance of positive and negative samples in the training data set.

The specific technical scheme provided by the invention is as follows:

In one aspect, the invention provides a method for constructing a bi-temporal remote sensing image change detection model, comprising the following steps:

constructing a bi-temporal remote sensing image feature extractor from a deep residual network model to which a squeeze-and-excitation module is added;

constructing a pyramid attention module that calculates the correlation between pixel pairs of the bi-temporal feature maps and weights the feature maps using this correlation as the weight, a co-attention algorithm being nested inside the pyramid attention module; and

training, with a composite loss function, a bi-temporal remote sensing image change detection model composed of the feature extractor and the pyramid attention module, the model being used to obtain a binary change detection image from a bi-temporal remote sensing image pair.

Optionally, constructing the bi-temporal remote sensing image feature extractor specifically comprises:

keeping the first through fifth convolutional layers of the deep residual network model, deleting the subsequent global pooling layer, fully connected layer and activation function layer, and using the retained layers as the backbone network of the feature extractor;

adding a 1 × 1 convolutional layer after the second convolutional layer of the deep residual network model, adding a 1 × 1 convolutional layer and an upsampling layer after each of the third and fourth convolutional layers, and adding a global average pooling layer, a 1 × 1 convolutional layer, an upsampling layer and a concatenation layer after the output of the fifth convolutional layer; and

adding a squeeze-and-excitation module after the concatenation layer to complete the feature extractor, the squeeze-and-excitation module performing weighted learning on each channel of the feature map.

Optionally, the pyramid attention module is constructed as follows:

constructing an image cutting module that cuts the feature map output by the bi-temporal feature extractor into feature sub-maps, using an overlapping mechanism for the edge pixels of adjacent sub-maps;

predicting each pair of feature sub-maps with a co-attention algorithm to obtain an attention sub-map;

constructing an image stitching module that stitches the attention sub-maps together. The stitched attention feature map has the same size as the image before cutting. In each overlapping pixel region, the stitching result is the equally weighted average of the predictions for that region from the adjacent attention sub-maps that cover it; and

constructing a concatenation-convolution module consisting of one concatenation layer and one 1 × 1 convolutional layer, which fuses the stitched attention feature maps; the fused attention feature map has the same size as the feature map input to the pyramid attention module.

Optionally, predicting each pair of feature sub-maps with the co-attention algorithm to obtain an attention sub-map specifically comprises:

reshaping the two sub-maps of a bi-temporal sub-map pair to preset shapes and multiplying them to obtain a correlation feature map, each element of which represents the degree of correlation between two pixels, one from each sub-map; and

using the correlation feature map as weights, multiplying it with each of the two sub-maps to obtain two weighted feature maps, and fusing the weighted feature maps with the two sub-maps to obtain the attention sub-map.

Optionally, an Adam optimizer is used to optimize the model parameters when the bi-temporal remote sensing image change detection model is trained with the composite loss function.

Optionally, the composite loss function is:

[Equation not preserved in the source text.]

where h and w denote the height and width of the attention feature map; y_ij denotes the value of the label image at row i, column j; and n_u and n_c denote the total numbers of unchanged and changed pixel pairs in the attention feature map, respectively. Posdist enlarges the distance of changed pixel pairs whose distance is too small. Negdist reduces the distance of unchanged pixel pairs whose distance is too large. Posdiff and Negdiff make the model learn more strongly from hard-to-distinguish samples.

Optionally, the total numbers of unchanged and changed pixel pairs in the attention feature map are calculated by the following formulas:

$$n_u = \sum_{i=1}^{h}\sum_{j=1}^{w}\left(1 - y_{ij}\right), \qquad n_c = \sum_{i=1}^{h}\sum_{j=1}^{w} y_{ij}$$

where h and w denote the height and width of the attention feature map; y_ij ∈ {0, 1} is the value of the label image at row i, column j, with y_ij = 1 marking a changed pixel pair; and n_u and n_c denote the total numbers of unchanged and changed pixel pairs in the attention feature map, respectively.

Optionally,

$$\mathrm{Posdist}_{ij} = \max\{m - D_{ij},\ 0\}$$

$$\mathrm{Negdist}_{ij} = \max\{D_{ij} - \tau,\ 0\}$$

where D_ij denotes the value at row i, column j of the distance map D; α and β are weighting coefficients and τ and m are thresholds, with α > β > 1 and m > τ.
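The surviving terms can be illustrated with a small sketch. It rests on several assumptions. First, D is taken to be the per-pixel Euclidean distance between the two attention feature maps. Second, Posdist is applied to changed pairs and Negdist to unchanged pairs, as the text describes. Third, the class balancing divides the Posdist total by n_c and the Negdist total by n_u. This is a hypothetical balanced hinge loss, not the patent's composite loss. It omits Posdiff, Negdiff, α and β, whose definitions are not given in this text. The default thresholds m = 2.0 and τ = 0.5 are also placeholders.

```python
import math

def distance_map(f1, f2):
    """Per-pixel Euclidean distance D between two maps indexed [row][col][channel].

    Assumption: the patent does not state how D is computed.
    """
    return [[math.dist(p, q) for p, q in zip(r1, r2)] for r1, r2 in zip(f1, f2)]

def balanced_hinge_loss(d, y, m=2.0, tau=0.5):
    """Hypothetical class-balanced loss built only from Posdist and Negdist.

    d: h x w distance map; y: h x w labels (1 = changed, 0 = unchanged).
    Posdiff/Negdiff (hard-sample terms) are deliberately omitted.
    """
    pairs = [(dij, yij) for dr, yr in zip(d, y) for dij, yij in zip(dr, yr)]
    n_c = sum(yij for _, yij in pairs)             # changed pairs
    n_u = len(pairs) - n_c                         # unchanged pairs
    # Posdist: push changed pairs at least m apart.
    pos = sum(max(m - dij, 0.0) for dij, yij in pairs if yij == 1)
    # Negdist: pull unchanged pairs to within tau.
    neg = sum(max(dij - tau, 0.0) for dij, yij in pairs if yij == 0)
    # Dividing by n_c and n_u keeps the rarer class from being swamped.
    loss = (pos / n_c if n_c else 0.0) + (neg / n_u if n_u else 0.0)
    return loss, n_u, n_c

f1 = [[[0.0, 0.0], [0.0, 0.0]]]      # 1 x 2 map with 2 channels, time 1
f2 = [[[3.0, 4.0], [0.0, 1.0]]]      # same location, time 2
d = distance_map(f1, f2)
loss, n_u, n_c = balanced_hinge_loss(d, [[1, 0]])
print(d, loss, n_u, n_c)             # [[5.0, 1.0]] 0.5 1 1
```

Here the changed pair is already farther apart than m, so it contributes nothing. The unchanged pair exceeds τ by 0.5, so it contributes 0.5.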

In another aspect, the invention further provides a device for constructing a bi-temporal remote sensing image change detection model, comprising:

a feature extractor construction module, configured to construct a bi-temporal remote sensing image feature extractor from a deep residual network model to which a squeeze-and-excitation module is added;

a pyramid attention construction module, configured to construct a pyramid attention module that calculates the correlation between pixel pairs of the bi-temporal feature maps and weights the feature maps using this correlation as the weight, a co-attention algorithm being nested inside the pyramid attention module; and

a model training module, configured to take a preset remote sensing image data set and apply image offset, brightness and contrast transformations to it. The module then trains, with a composite loss function, the bi-temporal remote sensing image change detection model composed of the feature extractor and the pyramid attention module. The model is used to obtain a binary change detection image from a bi-temporal remote sensing image pair.

In a further aspect, the invention also provides a bi-temporal remote sensing image change detection method. The method uses the change detection model constructed by the above method to detect changes in a bi-temporal remote sensing image pair and output a binary change detection image.

The invention has the following beneficial effects:

The invention provides a bi-temporal remote sensing image change detection method, a model construction method and a device. The bi-temporal feature extractor is built from a deep residual network with a squeeze-and-excitation module. It fuses the rich semantic information of high-level features with the rich detail information of low-level features. The squeeze-and-excitation module weights the information in each channel so that the model focuses on the more important features, improving feature extraction.

Existing methods have difficulty with changed targets of different sizes. The pyramid attention module therefore improves the way the image is cut: it cuts the feature map into several groups of feature sub-maps of different sizes with overlapping edge pixels and processes them with a co-attention algorithm. This improves the model's ability to detect changed regions of different sizes, especially small ones. The co-attention algorithm calculates the correlation between features in the bi-temporal feature maps and weights the feature maps with this correlation, improving the model's ability to distinguish changed from unchanged regions.

In the training stage, the model is optimized with a composite loss function and an Adam optimizer. The composite loss function uses weighting to reduce the influence of positive/negative sample imbalance on training. It also increases the model's learning strength on hard samples, improving prediction against complex backgrounds. The Adam optimizer keeps the changes of the model parameters within a bounded range during training, so training remains stable.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. Those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for constructing a bi-temporal remote sensing image change detection model according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a bi-temporal remote sensing image change detection model according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a feature extractor according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of image cutting using the adjacent sub-map edge pixel overlapping mechanism according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a pyramid attention module according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a co-attention module according to an embodiment of the present invention;

FIG. 7 shows example detection results of the bi-temporal remote sensing image change detection method according to an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.

The terms "comprises," "comprising," and "having," and any variations thereof, in the description and claims of this invention, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The bi-temporal remote sensing image change detection method, model construction method and device according to embodiments of the present invention are described in detail below with reference to FIGS. 1 to 7.

Referring to FIGS. 1 to 4, a method for constructing a bi-temporal remote sensing image change detection model according to an embodiment of the present invention comprises:

Step 100: construct a bi-temporal remote sensing image feature extractor from a deep residual network model to which a squeeze-and-excitation module is added.

In this embodiment the deep residual network model is ResNet-18, so the feature extractor is built on ResNet-18. The ResNet-18 structure is as follows. The first convolutional layer, conv1, consists of a 7 × 7 convolution followed by a 3 × 3 max pooling layer. It is followed by the conv2_x, conv3_x, conv4_x and conv5_x stages, each built from BasicBlock residual blocks, and finally by a global pooling layer and a fully connected layer. The feature extractor of this embodiment deletes, modifies and adds some sub-structures of ResNet-18. As a result, the modified structure can be applied to bi-temporal remote sensing image feature extraction, and the extracted feature maps carry rich semantic and detail information. A channel weighting mechanism also increases the model's learning strength on the more important features.

Specifically, step 100 includes the following steps:

(1) Keep the first through fifth convolutional layers of the deep residual network model, delete the subsequent global pooling layer, fully connected layer and activation function layer, and use the retained layers as the backbone network of the feature extractor.

Here, a ResNet-18 residual network with five convolutional stages serves as the backbone of the feature extractor. Only the five convolutional stages of ResNet-18 (conv1, conv2_x, conv3_x, conv4_x and conv5_x) are kept; the subsequent global pooling layer, fully connected layer and softmax activation function are deleted.

(2) Add a 1 × 1 convolutional layer after the second convolutional layer of the deep residual network model. Add a 1 × 1 convolutional layer and an upsampling layer after each of the third and fourth convolutional layers. After the output of the fifth convolutional layer, add a global average pooling layer, a 1 × 1 convolutional layer, an upsampling layer and a concatenation layer.

A 1 × 1 convolutional layer is added after the conv2_x stage of ResNet-18. A 1 × 1 convolutional layer and an upsampling layer are added after each of the conv3_x and conv4_x stages. A global average pooling layer, a 1 × 1 convolutional layer and an upsampling layer are added after the output of conv5_x. These layers process the output feature maps of the four stages, and a concatenation layer then fuses the four resulting feature maps along the channel dimension.
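The data flow of steps (1) and (2) can be traced by shape alone. The sketch below is a hypothetical illustration, not the patent's implementation. It assumes a 256 × 256 input and the standard ResNet-18 strides and channel widths (64, 128, 256, 512). It also assumes that every added 1 × 1 convolution outputs `FUSE_CH = 64` channels, a number the patent does not give. Finally, it assumes that upsampling restores the conv2_x resolution, because conv2_x is the only branch without an upsampling layer.

```python
# Shape-only walk-through of the modified ResNet-18 feature extractor.
# Assumptions (not from the patent): each added 1x1 conv outputs FUSE_CH
# channels, and upsampling brings every branch to the conv2_x resolution.

FUSE_CH = 64  # hypothetical width of each added 1x1 convolution

def conv_out(size, kernel, stride, pad):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

def extractor_shapes(h):
    shapes = {}
    s = conv_out(h, 7, 2, 3)                 # conv1: 7x7 conv, stride 2
    s = conv_out(s, 3, 2, 1)                 # conv1: 3x3 max pool, stride 2
    shapes["conv2_x"] = (64, s, s)           # BasicBlocks keep the resolution
    shapes["conv3_x"] = (128, s // 2, s // 2)
    shapes["conv4_x"] = (256, s // 4, s // 4)
    shapes["conv5_x"] = (512, s // 8, s // 8)

    t = shapes["conv2_x"][1]                 # common resolution for fusion
    branches = [
        (FUSE_CH, t, t),                     # conv2_x -> 1x1 conv
        (FUSE_CH, t, t),                     # conv3_x -> 1x1 conv -> upsample x2
        (FUSE_CH, t, t),                     # conv4_x -> 1x1 conv -> upsample x4
        (FUSE_CH, t, t),                     # conv5_x -> GAP (1x1) -> 1x1 conv -> upsample
    ]
    # Channel-wise concatenation; the SE module leaves the shape unchanged.
    shapes["concat+SE"] = (sum(c for c, _, _ in branches), t, t)
    return shapes

for name, shape in extractor_shapes(256).items():
    print(name, shape)
```

For a 256 × 256 input, the sketch gives a 64 × 64 fused map with 4 × `FUSE_CH` channels, which is what the SE module then reweights.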

(3) Add a squeeze-and-excitation module after the concatenation layer to complete the feature extractor; the squeeze-and-excitation module performs weighted learning on each channel of the feature map.

Referring to FIGS. 1, 3 and 4, suppose a pair of bi-temporal remote sensing images is input into the change detection model of this embodiment. The feature extractor processes the two images separately and in the same way. Each image first passes through the conv1 layer and then through the four stages conv2_x, conv3_x, conv4_x and conv5_x in turn. These four stages produce feature maps at four different levels. The feature extractor applies convolution and upsampling to these maps, with global average pooling additionally applied to the conv5_x output. It then fuses the maps in the concatenation layer and passes the result through the squeeze-and-excitation module (SE layer), so that the model weights the features according to their importance. The resulting feature map is the output of the feature extractor, so processing the two bi-temporal images yields two feature maps.

The four stages conv2_x, conv3_x, conv4_x and conv5_x lie at different depths and learn features of the remote sensing image at different levels. Low-level features contain richer detail and have higher resolution, but they carry less semantic information and more noise. High-level features are semantically rich but perceive details poorly. The feature extractor therefore fuses features from different levels to exploit the strengths of each. Applying the global average pooling layer to the conv5_x output reduces its dimensionality, which simplifies the parameters and reduces computation. The SE layer weights the information in each channel. This increases the model's learning strength on important features, improves feature extraction, and effectively supplements the model's semantic information.
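The SE layer can be sketched with the standard squeeze-and-excitation design. The squeeze step applies global average pooling to each channel. The excitation step applies a fully connected layer, a ReLU, a second fully connected layer and a sigmoid. The final step rescales each channel. The patent does not specify these internals, so the reduction ratio, the weight values and the omission of biases below are illustrative assumptions.

```python
import math

def se_module(x, w1, w2):
    """Squeeze-and-excitation over a feature map indexed x[channel][row][col].

    w1: (C/r) x C weights of the reducing FC layer (hypothetical values);
    w2: C x (C/r) weights of the expanding FC layer. Biases are omitted.
    Returns the reweighted map and the per-channel weights.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives channel weights in (0, 1).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * hc for w, hc in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's weight.
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(x, scale)]
    return out, scale

# Two 2x2 channels with means 1 and 3; reduction ratio r = 2 (hypothetical weights).
x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
out, scale = se_module(x, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
print([round(s, 4) for s in scale])   # [0.8808, 0.1192]
```

In a trained network, w1 and w2 are learned, so the channel weights reflect how useful each channel is for the change detection task.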

Step 200: construct a pyramid attention module that calculates the correlation between pixel pairs of the bi-temporal feature maps and weights the feature maps using this correlation as the weight.

The pyramid attention module cuts the image into regions of different sizes for prediction, with several pixels overlapping between adjacent sub-maps. A co-attention module is nested inside the pyramid attention algorithm. The co-attention algorithm calculates the correlation between pixel pairs of the bi-temporal feature maps and weights the feature maps using this correlation as the weight.

If two images contain the same object, the object exhibits similar features in both images. The change detection model of this embodiment therefore strengthens its ability to distinguish changed targets from unchanged ones by mining the correlation between the features of the bi-temporal images. This is implemented by the co-attention module.

In addition, changed targets vary in pixel size, and change detection algorithms often predict small targets poorly. Existing methods usually address this by cutting the image at several scales. However, cutting inevitably splits some changed targets into two parts that lie in adjacent sub-maps, which harms prediction. The image cutting method of this embodiment preserves the locality of the sub-maps while ensuring that most changed targets are wholly contained in a sub-map of suitable size. This largely preserves the integrity of changed targets within the sub-maps.

Specifically, constructing the pyramid attention module comprises the following four steps:

Step one: construct an image cutting module that cuts the feature maps output by the bi-temporal feature extractor into feature sub-maps, using the adjacent sub-map edge pixel overlapping mechanism.

The image cutting module cuts the feature map obtained by the feature extractor evenly into s × s sub-maps (s ∈ {1, 2, 4, 8}). The cutting uses an edge-pixel overlapping mechanism for adjacent sub-maps: where two sub-maps share an edge, the pixels within 5 to 10 pixels of that edge, on each side, belong to both sub-maps. After the two bi-temporal feature maps have been cut separately, the resulting sub-maps are regrouped into several groups of bi-temporal feature sub-map pairs.

For example, if a 256 × 256 feature map is input into the image cutting module, four groups of feature sub-maps are produced. The first group contains a single 256 × 256 map, identical to the feature map before cutting.

The second group contains four sub-maps. The cutting module first cuts the original feature map into 2 × 2 sub-maps of 128 × 128 each. However, a cutting line may pass through small changed targets, so the edges of adjacent sub-maps must overlap by several pixels to better preserve the integrity of changed targets within the sub-maps. The right and lower edges of the upper-left sub-map are therefore extended by ten pixels to the right and downward, respectively. Likewise, the two edges of the upper-right, lower-left and lower-right sub-maps that adjoin other sub-maps are each extended by ten pixels. Each final feature sub-map is therefore 138 × 138, as shown in FIG. 5.

The third group contains 16 sub-maps of 74 × 74 each (64 × 64 before extension). To keep all sub-maps in this group the same size, the extension depends on how many neighbors a sub-map has along each direction. A sub-map that adjoins other sub-maps on both sides along a direction (for example, both left and right) is extended by five pixels on each of those sides. A sub-map that adjoins another sub-map on only one side along a direction is extended by ten pixels on that side.

Similarly, the fourth group contains 64 sub-maps of 42 × 42 each. The image cutting module cuts each of the two bi-temporal feature maps, yielding two sets of sub-maps, each containing four groups of sub-maps of different sizes. For convenience, two sub-maps with the same size and position in the two sets are called a pair of bi-temporal feature sub-maps.
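The cutting rule can be made concrete with the following sketch, which uses the 256 × 256 example above. The rule implemented here is one reading of the text that reproduces the stated sizes of 138, 74 and 42. Each sub-map extends by 10 pixels in total along each axis. Border sub-maps put the full 10 pixels on their single inner side, and interior sub-maps put 5 pixels on each side. The function names are hypothetical.

```python
def tile_ranges(size, s, overlap=10):
    """[start, end) ranges of s tiles along one axis of a size-long map.

    Border tiles grow by the full overlap toward their single neighbour;
    interior tiles grow by overlap // 2 on both sides, so every tile is
    size // s + overlap long (no growth when s == 1).
    """
    base = size // s
    ranges = []
    for k in range(s):
        start, end = k * base, (k + 1) * base
        if s > 1:
            if k == 0:
                end += overlap
            elif k == s - 1:
                start -= overlap
            else:
                start -= overlap // 2
                end += overlap // 2
        ranges.append((start, end))
    return ranges

def cut_boxes(size, s, overlap=10):
    """(row_start, row_end, col_start, col_end) for all s * s sub-maps."""
    r = tile_ranges(size, s, overlap)
    return [(r0, r1, c0, c1) for r0, r1 in r for c0, c1 in r]

for s in (1, 2, 4, 8):
    boxes = cut_boxes(256, s)
    r0, r1, c0, c1 = boxes[0]
    print(s, len(boxes), (r1 - r0, c1 - c0))
# 1 1 (256, 256)
# 2 4 (138, 138)
# 4 16 (74, 74)
# 8 64 (42, 42)
```

Slicing each bi-temporal feature map with the same boxes gives the sub-map pairs, because the two sub-maps in a pair share a box.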

Step two: predict each pair of feature sub-maps with the co-attention algorithm to obtain an attention sub-map.

Specifically, the co-attention module reshapes the two sub-maps of a bi-temporal sub-map pair to preset shapes and multiplies them to obtain a correlation feature map. Each element of this map represents the degree of correlation between two pixels, one from each sub-map.

Suppose both sub-maps in the pair are of size h × w. One is reshaped to N × 1 and the other to 1 × N (N = h × w). After matrix multiplication, the correlation feature map is of size N × N, and each element corresponds to the similarity between two pixels taken from the two sub-maps. The correlation feature map is then used as weights and multiplied with each of the two sub-maps, giving two weighted feature maps. Finally, these are fused with the two original sub-maps to obtain the attention sub-map.
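The co-attention step can be sketched as follows. This is a minimal version under stated assumptions, not the patent's exact module. Each h × w × C sub-map is flattened to N pixel vectors of length C, i.e. an N × C matrix. Multiplying it by the transpose of the other gives the N × N correlation map; the N × 1 / 1 × N case in the text is the single-channel special case. Two further choices are assumptions, since the patent does not state how the weights are normalized or how the maps are fused. The correlations are softmax-normalized before they are used as weights, and fusion is per-pixel concatenation.

```python
import math

def softmax(values):
    m = max(values)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def co_attention(a, b):
    """Co-attention between two flattened sub-maps a, b (N pixels x C channels).

    Returns the N x N correlation map and an N-pixel fused attention sub-map.
    Softmax weighting and concatenation fusion are assumptions.
    """
    n, c = len(a), len(a[0])
    # corr[i][j]: similarity of pixel i in sub-map a and pixel j in sub-map b.
    corr = [[dot(a[i], b[j]) for j in range(n)] for i in range(n)]
    w_ab = [softmax(row) for row in corr]                             # a attends to b
    w_ba = [softmax([corr[i][j] for i in range(n)]) for j in range(n)]  # b attends to a
    # Weighted feature maps: each pixel becomes a correlation-weighted mix of the other map.
    a_att = [[sum(w_ab[i][j] * b[j][k] for j in range(n)) for k in range(c)] for i in range(n)]
    b_att = [[sum(w_ba[j][i] * a[i][k] for i in range(n)) for k in range(c)] for j in range(n)]
    # Fuse the two originals with the two weighted maps, pixel by pixel.
    fused = [a[p] + a_att[p] + b[p] + b_att[p] for p in range(n)]
    return corr, fused

a = [[1.0, 0.0], [0.0, 1.0]]   # flattened 1 x 2 sub-map at time 1 (N = 2, C = 2)
b = [[1.0, 0.0], [0.0, 1.0]]   # matching sub-map at time 2
corr, fused = co_attention(a, b)
print(corr)                     # [[1.0, 0.0], [0.0, 1.0]]
print(len(fused), len(fused[0]))  # 2 8
```

Identical pixels produce the largest correlations. As a result, each weighted pixel in an unchanged region is dominated by its counterpart in the other image, which is the similarity cue that the text relies on.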

The co-attention module captures the correlation between the features of the two bi-temporal remote sensing images. An unchanged target has highly similar features in the two images. Weighting the feature maps by the correlation between features therefore helps the model distinguish changed regions from unchanged ones.

Step three: construct an image stitching module that stitches the attention sub-maps together. The stitched attention feature map has the same size as the image before cutting. In each overlapping pixel region, the stitching result is the equally weighted average of the predictions of the adjacent attention sub-maps that cover the region.

Referring to FIGS. 5 and 6, the image stitching module reassembles the attention sub-maps into an attention feature map of the same size as the image before cutting. In each overlapping pixel region, the stitching result is the equally weighted average of the predictions of the corresponding regions of the adjacent attention sub-maps.

For example, consider the 2 × 2 sub-maps. The pixels in rows 1-118 and columns 119-138 of the original image lie in both the upper-left and the upper-right sub-maps. In this region the stitching result is (prediction of the upper-left sub-map + prediction of the upper-right sub-map) / 2. The pixels in rows 119-138 and columns 119-138 lie in all four sub-maps, so in this region the stitching result is the mean of the predictions of the four sub-maps.
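The equal-weight stitching can be sketched by accumulating each sub-map's prediction and a coverage count per pixel, then dividing the two. This gives exactly the mean of all sub-maps covering each pixel. The example reuses the 2 × 2 cut of a 256 × 256 map with a 10-pixel extension, and it gives each sub-map a constant hypothetical prediction so that the averages are easy to read. Indices in the code are 0-based, whereas the text above counts from 1.

```python
def stitch(tiles, boxes, height, width):
    """Average-stitch overlapping sub-map predictions into one map.

    tiles: 2-D predictions (lists of rows), one per box.
    boxes: matching (row_start, row_end, col_start, col_end), end-exclusive.
    A pixel covered by k sub-maps receives the mean of the k predictions.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for tile, (r0, r1, c0, c1) in zip(tiles, boxes):
        for r in range(r0, r1):
            for c in range(c0, c1):
                acc[r][c] += tile[r - r0][c - c0]
                cnt[r][c] += 1
    merged = [[acc[r][c] / cnt[r][c] for c in range(width)] for r in range(height)]
    return merged, cnt

ranges = [(0, 138), (118, 256)]          # 2 x 2 cut of a 256 x 256 map
boxes = [(r0, r1, c0, c1) for r0, r1 in ranges for c0, c1 in ranges]
values = [1.0, 2.0, 3.0, 4.0]            # upper-left, upper-right, lower-left, lower-right
tiles = [[[v] * (c1 - c0) for _ in range(r1 - r0)]
         for v, (r0, r1, c0, c1) in zip(values, boxes)]
merged, cnt = stitch(tiles, boxes, 256, 256)
print(merged[0][0], merged[0][120], merged[120][120])   # 1.0 1.5 2.5
```

Because the division uses the coverage count, the same function handles the 4 × 4 and 8 × 8 groups, where some pixels are covered by two sub-maps and others by four.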

Step four: construct a cascade convolution module consisting of one cascade (concatenation) layer and one 1 × 1 convolution layer. The module fuses the spliced attention feature maps, and the fused attention feature map has the same size as the feature map before it was input into the pyramid attention module.

Referring to figs. 6 and 7, the cascade convolution module fuses the spliced attention feature maps. The module consists of a cascade (concatenation) layer and a 1 × 1 convolution layer. The fused attention feature map has the same size as the feature map before input into the pyramid attention module.
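The cascade convolution module can be sketched with NumPy as channel concatenation followed by a 1 × 1 convolution. A 1 × 1 convolution is simply a per-pixel linear map over channels.

This sketch assumes that the spliced maps from each pyramid level for one time phase share the same spatial size. The number of levels and the weights are hypothetical. Setting the output channel count equal to the input feature map's channel count restores the pre-pyramid size, as the text requires.

```python
import numpy as np

def cascade_conv_fuse(maps, weight, bias):
    """Concatenate spliced attention maps along channels, then apply a 1x1 convolution.

    maps:   list of arrays, each (C_i, H, W), one per pyramid level (hypothetical)
    weight: (C_out, sum(C_i)) 1x1-convolution kernel
    bias:   (C_out,)
    """
    x = np.concatenate(maps, axis=0)  # cascade (concatenation) layer
    # 1x1 convolution: out[o, h, w] = sum_c weight[o, c] * x[c, h, w] + bias[o]
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
```

In PyTorch, the same structure would typically be written as `torch.cat` along the channel dimension followed by `torch.nn.Conv2d` with `kernel_size=1`.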

Referring to figs. 1, 5 and 6, the double-temporal remote sensing image change detection model of the embodiment of the present invention combines three mechanisms:

- The pyramid attention module helps the model identify small change regions.
- The overlapping-pixel mechanism improves the completeness and accuracy with which changed targets at sub-image edges are identified.
- The common attention module improves the model's ability to discriminate between changed and unchanged regions.

Step 300: apply image offset, brightness and contrast transformations to a preset remote sensing image data set. Then use a composite loss function to train the double-time-phase remote sensing image change detection model, which is composed of the double-time-phase remote sensing image feature extractor and the pyramid attention module.

The double-time-phase remote sensing image change detection model of the embodiment of the invention produces a binary change detection image from a double-time-phase remote sensing image pair.

Data enhancement of the training set improves the model's robustness to interference such as differences in illumination intensity, illumination angle and registration. The composite loss function reduces the effect of the imbalance between positive and negative samples on training, and it strengthens the model's learning of hard-to-classify samples. As a result, the model performs better in more complex change detection tasks. The Adam optimizer keeps the change in the model parameters within a certain range at each iteration, which improves the stability of training.
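For reference, here is a single Adam update in NumPy. It uses the commonly cited default hyperparameters; the source does not state which values were used.

The update divides the bias-corrected first moment by the square root of the bias-corrected second moment. As a result, the step size is roughly bounded by the learning rate in typical cases, and on the first step it is almost exactly `lr` regardless of gradient scale. This is the bounded-change property described above.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba); t is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections for zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Gradients of very different magnitudes (100 and 0.01, for instance) both produce a first step of about 0.001 in the parameter.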

Specifically, the double-time-phase remote sensing images and label images of public remote sensing data sets are preprocessed to build the training set and test set used by the model in the embodiment of the invention. The public data sets are the SZTAKI data set and the LEVIR-CD data set.

Preprocessing of the double-temporal remote sensing images and the label images includes, but is not limited to:

- random image offsets of at most 5 pixels, applied to the double-time-phase images and the label images;
- random rotations of at most 15 degrees, applied to the double-time-phase images and the label images;
- random changes of at most 10% in the brightness and contrast of the double-time-phase images.

These data enhancements improve the model's robustness to noise. The image offset increases robustness to registration errors between the two time phases. The brightness and contrast changes improve adaptability to variations in illumination intensity and illumination angle.
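The offset and photometric parts of this augmentation can be sketched with NumPy alone, under these assumptions:

- Images are float arrays in [0, 1].
- Vacated pixels are zero-filled.
- One random shift is shared by both images and the label; the source does not say whether the shift is shared or drawn separately per image.
- Brightness and contrast are multiplicative factors in [0.9, 1.1], with contrast stretched around the image mean.

Rotation is omitted to keep the sketch dependency-free. `scipy.ndimage.rotate(..., reshape=False)` could supply it, with `order=0` for the label. All function names are hypothetical.

```python
import numpy as np

def shift(img, dy, dx):
    """Shift an (H, W, ...) array by (dy, dx) pixels, filling vacated pixels with zeros."""
    H, W = img.shape[:2]
    out = np.zeros_like(img)
    src_r = slice(max(0, -dy), H - max(0, dy))
    dst_r = slice(max(0, dy), H - max(0, -dy))
    src_c = slice(max(0, -dx), W - max(0, dx))
    dst_c = slice(max(0, dx), W - max(0, -dx))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out

def jitter_brightness_contrast(img, rng, max_rel=0.10):
    """Randomly scale contrast and brightness by at most +/-max_rel; result clipped to [0, 1]."""
    b = 1.0 + rng.uniform(-max_rel, max_rel)
    c = 1.0 + rng.uniform(-max_rel, max_rel)
    mean = img.mean()
    out = (img - mean) * c + mean       # contrast: stretch around the mean
    return np.clip(out * b, 0.0, 1.0)   # brightness: global scale

def augment_pair(img1, img2, label, rng, max_shift=5):
    """One shared random shift for images and label, then independent photometric jitter."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    img1, img2, label = (shift(a, dy, dx) for a in (img1, img2, label))
    return (jitter_brightness_contrast(img1, rng),
            jitter_brightness_contrast(img2, rng),
            label)
```

The label only goes through the geometric step, so it stays binary.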

The composite loss function is used as the loss function of the double-time-phase remote sensing image change detection model of the embodiment of the invention, and the Adam optimizer optimizes the model parameters to obtain the trained model. The Euclidean distance between each pair of pixels at the same position in the two attention feature maps is calculated according to the formula

$D_{ij} = \lVert F^{1}_{ij} - F^{2}_{ij} \rVert_2$

to obtain the distance map D. Here D_ij is the value in the i-th row and j-th column of D, and F^1_ij and F^2_ij are the feature vectors at that position in the two attention feature maps. y_ij is the corresponding value in the label image.

The distance map D measures the similarity between features: pixel pairs with larger distances are more likely to belong to changed regions.
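The distance map is a per-pixel L2 norm over the channel axis. This sketch assumes both attention feature maps are stored as (C, h, w) arrays; the function name is hypothetical.

```python
import numpy as np

def distance_map(feat1, feat2):
    """D[i, j] = Euclidean distance between the feature vectors at (i, j) of two (C, h, w) maps."""
    return np.sqrt(((feat1 - feat2) ** 2).sum(axis=0))
```

For example, feature vectors (0, 0) and (3, 4) at the same position give a distance of 5.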

The composite loss function L is calculated according to the following formula:

where:

- h and w are the height and width of the attention feature map;
- y_ij is the value of the label image;
- n_u and n_c are the total number of unchanged pixel pairs and the total number of changed pixel pairs in the attention feature map;
- Posdist enlarges the distance of changed pixel pairs whose distance is too small, and Negdist reduces the distance of unchanged pixel pairs whose distance is too large;
- Posdiff and Negdiff make the model focus more on learning hard-to-classify samples.

y_ij is the value of the label image: pixels in changed regions have the value 1, and pixels in unchanged regions have the value 0. Using y_ij, the loss function is split into two parts: one for the set of changed pixels (label 1) and one for the set of unchanged pixels (label 0). n_u and n_c, the total numbers of unchanged and changed pixel pairs in the attention feature map, are calculated according to the following formulas:

$n_c = \sum_{i=1}^{h}\sum_{j=1}^{w} y_{ij}, \qquad n_u = \sum_{i=1}^{h}\sum_{j=1}^{w} \left(1 - y_{ij}\right)$

where n_u and n_c are the total numbers of unchanged and changed pixel pairs in the attention feature map, y_ij is the value of the binary label map, and h and w are the height and width of the attention feature map.
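Counting the two classes from a binary h × w label map takes one sum, because every pixel is either changed (1) or unchanged (0). The function name is hypothetical.

```python
import numpy as np

def class_counts(label):
    """Return (n_u, n_c): numbers of unchanged (0) and changed (1) pixel pairs in a binary label."""
    n_c = int(label.sum())      # sum of y_ij
    n_u = label.size - n_c      # sum of (1 - y_ij) = h * w - n_c
    return n_u, n_c
```

Note that n_u + n_c always equals h × w, so only one of the two sums has to be computed.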

Studies have shown that the training of convolutional neural networks is sensitive to class balance in the training samples. Class-balanced samples help improve prediction, whereas imbalanced samples bias the model's predictions toward one class. In practice, the numbers of positive and negative training samples can differ greatly. To handle this, the composite loss function computes the totals n_u and n_c of the two sample types and uses them to weight the losses of the two types, wherein the weight of the positive samples is

The weight of the negative example is

Posdist and Negdist serve the following purposes: Posdist increases the distance of changed pixel pairs whose distance is too small, and Negdist decreases the distance of unchanged pixel pairs whose distance is too large. Posdist and Negdist are calculated according to the following formulas:

$\mathrm{Posdist}_{ij} = \max\{m - D_{ij},\ 0\}$

$\mathrm{Negdist}_{ij} = \max\{D_{ij} - \tau,\ 0\}$

where D_ij is the value in the i-th row and j-th column of the distance map D; α and β are weighting coefficients; τ and m are thresholds; α > β > 1; and m > τ.

Posdist and Negdist implement a mechanism similar to a contrastive loss:

- For positive samples whose predicted distance is smaller than the threshold m, the loss function enlarges the distance.
- For negative samples whose predicted distance is larger than the threshold τ, the loss function reduces the distance.
- For the large majority of samples that are already well predicted, the loss is ignored, which reduces computation.

This mechanism helps enlarge the decision margin between the classes and makes the model more accurate on hard samples.
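The two hinge terms follow directly from the Posdist and Negdist formulas. They are masked by the label, so that Posdist applies only to changed pixels (y_ij = 1) and Negdist only to unchanged pixels (y_ij = 0).

The way these terms are combined with the class weights and the Posdiff/Negdiff terms is given by the composite loss formula, which is not reproduced in this text. This sketch therefore stops at the per-pixel terms. The function names are hypothetical.

```python
import numpy as np

def posdist(D, m):
    """Posdist_ij = max{m - D_ij, 0}: nonzero only when a pair's distance is below m."""
    return np.maximum(m - D, 0.0)

def negdist(D, tau):
    """Negdist_ij = max{D_ij - tau, 0}: nonzero only when a pair's distance exceeds tau."""
    return np.maximum(D - tau, 0.0)

def masked_hinge_terms(D, y, m, tau):
    """Posdist on changed pixels (y == 1), Negdist on unchanged pixels (y == 0)."""
    pos = posdist(D, m) * (y == 1)
    neg = negdist(D, tau) * (y == 0)
    return pos, neg
```

Pixels that already satisfy their margin contribute exactly zero. This is how the well-predicted majority is ignored.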

Posdiff and Negdiff serve the following purpose. Each pixel pair is assigned to one of three difficulty levels according to its class and its distance in the distance map D, and each level is learned with a different weight:

- hard samples receive the largest weight;
- moderate samples receive a smaller weight;
- the loss of easy samples is ignored.

They are calculated according to the following formulas:

Posdiff and Negdiff make the double-time-phase remote sensing image change detection model of the embodiment of the invention focus more on learning hard-to-classify samples.

Positive and negative samples are each partitioned by the two thresholds m and τ according to their difficulty. Difficulty is reflected by the pixel pair's distance in the distance map D: negative samples with too large a predicted distance, and positive samples with too small a predicted distance, are hard to classify. Hard samples receive a large weight in the loss function, moderately difficult samples a correspondingly smaller weight, and the loss of the large number of easy samples is ignored.

The Adam optimizer dynamically adjusts the learning rate of each parameter using first-moment and second-moment estimates of the gradient. As a result, the effective learning rate in each iteration stays within a definite range and changes smoothly.
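The Posdiff and Negdiff formulas are not reproduced in this text. The following is therefore only one possible reading of the three-level description above, not the patent's formula. It assumes that, with m > τ, the bands are:

| Class | Hard (weight α) | Moderate (weight β) | Easy (weight 0) |
| --- | --- | --- | --- |
| Changed (y = 1) | D < τ | τ ≤ D < m | D ≥ m |
| Unchanged (y = 0) | D > m | τ < D ≤ m | D ≤ τ |

The easy bands coincide with the regions where Posdist and Negdist are zero. The band boundaries and the function name are assumptions.

```python
import numpy as np

def difficulty_weights(D, y, m, tau, alpha, beta):
    """Hypothetical three-level difficulty weights (one reading of Posdiff / Negdiff).

    Changed (y == 1):   D < tau -> alpha, tau <= D < m -> beta, D >= m -> 0
    Unchanged (y == 0): D > m   -> alpha, tau < D <= m -> beta, D <= tau -> 0
    """
    assert alpha > beta > 1 and m > tau  # constraints stated in the source
    w = np.zeros_like(D, dtype=float)
    pos, neg = (y == 1), (y == 0)
    w[pos & (D < tau)] = alpha
    w[pos & (D >= tau) & (D < m)] = beta
    w[neg & (D > m)] = alpha
    w[neg & (D > tau) & (D <= m)] = beta
    return w
```

Under this reading, the weights would multiply the per-pixel Posdist and Negdist terms. The actual combination depends on the missing composite loss formula.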

The invention provides a double-time-phase remote sensing image change detection model construction method.

A deep residual network model with an added squeeze-excitation module is used to build the double-time-phase remote sensing image feature extractor. The feature extractor combines the rich semantic information of high-level features with the rich detail information of low-level features. The squeeze-excitation module weights the information of each channel so that the model focuses more on important features, which improves feature extraction.

The pyramid attention module improves the image cutting mode for detecting changed targets of different sizes. It cuts the feature map into several groups of feature sub-maps of different sizes with overlapping edge pixels and processes them with the common attention algorithm. This improves the model's ability to detect change regions of different sizes, especially small ones. The common attention algorithm computes the correlation between features in the double-temporal feature maps and uses this correlation as a weight for the feature maps, which improves the model's ability to distinguish changed from unchanged regions.

In the training stage, the model is optimized with a composite loss function and an Adam optimizer. Through weighting, the composite loss function reduces the effect of the imbalance between positive and negative samples on training and strengthens the model's learning of hard-to-classify samples, which improves prediction against complex backgrounds. The Adam optimizer keeps the changes in the model parameters stable and within a certain range during training.

Based on the same inventive concept, an embodiment of the present invention further provides a double-temporal remote sensing image change detection model construction device for executing the above double-temporal remote sensing image change detection model construction method, wherein the double-temporal remote sensing image change detection model construction device includes:

and a model training module configured to apply image offset, brightness and contrast transformations to a preset remote sensing image data set and then train, with a composite loss function, a double-time-phase remote sensing image change detection model composed of the double-time-phase remote sensing image feature extractor and the pyramid attention module, the model being used to obtain a binary change detection image from the double-time-phase remote sensing images.

Based on the same inventive concept, the embodiment of the invention also provides a double-time-phase remote sensing image change detection method, which adopts the double-time-phase remote sensing image change detection model constructed by the method to detect the change of the double-time-phase remote sensing image and output a binary change detection image.

Specifically, in the double-time-phase remote sensing image change detection method, the double-time-phase remote sensing images are input into the double-time-phase remote sensing image change detection model trained by the above method. The final binary change detection image is then obtained through a threshold segmentation mechanism.

The distance map D is obtained by calculating the Euclidean distance between pixel pairs at the same position in the two attention feature maps. Every pixel D_ij of the distance map D is then compared with a constant threshold θ: if D_ij > θ, the pixel is predicted as changed; otherwise it is predicted as unchanged. Referring to fig. 7, this generates a binary change detection image in which changed pixels have the value 1 and unchanged pixels have the value 0.
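The threshold segmentation step can be sketched in a few lines. This sketch assumes (C, h, w) attention feature maps. The value of θ is not given in the source, so it is left as a parameter, and the function name is hypothetical.

```python
import numpy as np

def binary_change_map(feat1, feat2, theta):
    """Per-pixel Euclidean distance, thresholded: 1 = changed (D_ij > theta), 0 = unchanged."""
    D = np.sqrt(((feat1 - feat2) ** 2).sum(axis=0))
    return (D > theta).astype(np.uint8)
```

The comparison is vectorized over the whole map, so no explicit pixel loop is needed.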

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass these modifications and variations.

Claims

1. A double-time-phase remote sensing image change detection model construction method is characterized by comprising the following steps:

constructing a pyramid attention module for calculating the correlation between pixel pairs of the double-temporal feature images and weighting the feature images with the correlation as the weight, wherein a common attention algorithm is nested inside the pyramid attention module;

2. The method for constructing the double-temporal remote sensing image change detection model according to claim 1, wherein the method for constructing the double-temporal remote sensing image feature extractor specifically comprises the following steps:

3. The method for constructing the double-temporal remote sensing image change detection model according to claim 1, wherein the concrete process for constructing the pyramid attention module is as follows:

cutting the feature map output by the double-time-phase remote sensing image feature extractor, based on a mechanism in which the edge pixels of adjacent sub-images overlap, to obtain feature sub-images;

4. The method for constructing the double-temporal remote sensing image change detection model according to claim 3, wherein a common attention algorithm is adopted for each pair of feature sub-images to predict to obtain an attention sub-image, and specifically the method comprises the following steps:

transforming the two sub-images of a pair of double-time-phase sub-images to preset sizes and multiplying them to obtain a correlation feature map, wherein each element of the correlation feature map represents the degree of correlation between two pixels, one in each of the two sub-images;

5. The method for constructing the double-temporal remote sensing image change detection model according to claim 1, characterized in that an Adam optimizer is used for optimizing model parameters in the process of training the double-temporal remote sensing image change detection model by using a composite loss function.

6. The method for constructing the double-temporal remote sensing image change detection model according to claim 1, wherein the composite loss function is as follows:

wherein h and w respectively represent the height and width of the attention feature map; y_ij represents the value of the label image; n_u and n_c respectively represent the total number of unchanged pixel pairs and the total number of changed pixel pairs in the attention feature map; Posdist and Negdist respectively enlarge the distance of changed pixel pairs whose distance is too small and reduce the distance of unchanged pixel pairs whose distance is too large; and Posdiff and Negdiff make the model focus more on learning hard-to-distinguish samples.

7. The method for constructing the double-temporal remote sensing image change detection model according to claim 6, wherein the total number of the invariant pixel point pairs and the total number of the variant pixel point pairs in the attention feature map are respectively calculated by adopting the following formulas:

wherein h and w respectively represent the height and width of the attention feature map; y_ij represents the value of the label image; and n_u and n_c respectively represent the total number of unchanged pixel pairs and the total number of changed pixel pairs in the attention feature map.

8. The method for constructing a double-temporal remote sensing image change detection model according to claim 6, wherein:

$\mathrm{Posdist}_{ij} = \max\{m - D_{ij},\ 0\}$

$\mathrm{Negdist}_{ij} = \max\{D_{ij} - \tau,\ 0\}$

wherein D_ij represents the value in the i-th row and j-th column of the distance map D; α and β are weighting coefficients; τ and m are thresholds; α > β > 1; and m > τ.

9. A double-temporal remote sensing image change detection model construction device, characterized in that the double-temporal remote sensing image change detection model construction device comprises:

10. A double-time-phase remote sensing image change detection method is characterized in that the double-time-phase remote sensing image change detection method adopts the double-time-phase remote sensing image change detection model constructed according to any one of claims 1 to 9 to detect changes of double-time-phase remote sensing images and outputs binary change detection images.