CN116681930A - Remote sensing image change detection and model training method, device and storage medium thereof


Info

Publication number
CN116681930A
Authority
CN
China
Prior art keywords
feature map
pair
module
adopting
characteristic diagram
Prior art date
Legal status
Pending
Application number
CN202310576700.4A
Other languages
Chinese (zh)
Inventor
刘亚岚
任玉环
吴飒莎
柳树福
王大成
Current Assignee
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS
Priority to CN202310576700.4A
Publication of CN116681930A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application disclose a remote sensing image change detection method, a training method for the change detection model, a device, and a storage medium. The change detection model comprises a feature extraction module with two branches that share a temporal correlation sub-module. The training method comprises the following steps: acquiring a set of image sample pairs; performing feature extraction on one image sample of each image sample pair with the initial feature extraction sub-module of each branch, to obtain a set of first feature map pairs; performing temporal correlation enhancement on the first feature map pairs with the temporal correlation sub-module, to obtain updated first feature map pairs; determining, with a difference module, a difference feature map between corresponding image sample pairs based on the updated first feature map pairs; classifying the difference feature map with a prediction module, to obtain a predicted change region of a target object in the corresponding image sample pair; and training the change detection model to convergence based on the predicted change regions and the actual change regions.

Description

Remote sensing image change detection and model training method, device and storage medium thereof
Technical Field
The present application relates to, but is not limited to, the field of computer vision, and in particular to a remote sensing image change detection method, a model training method, and a corresponding device and storage medium.
Background
Change detection of remote sensing images refers to the process of extracting change information about a target object (such as bare land or a building) from remote sensing images, digital line graphs or stereo image pairs acquired at different phases (times) over the same surface area, determining the change region and analyzing the surface change. In the related art, change detection is mostly performed as follows: traditional methods based on band operations, image transformations and the like extract relatively simple feature information, which limits the accuracy of the detection result; traditional machine learning methods such as support vector machines and decision trees require manual feature extraction and selection, lack universality, and make automatic detection of change regions difficult; and convolutional neural networks used to extract the change region of a target object have a limited receptive field, making global image features difficult to acquire, so their extraction of large-extent features is not ideal. Therefore, it is desirable to provide a new method for detecting changes in remote sensing images.
Disclosure of Invention
In view of this, embodiments of the present application provide at least a remote sensing image change detection method, a model training method, a device, and a storage medium.
In a first aspect, an embodiment of the present application provides a training method for a change detection model of remote sensing images, where the change detection model includes a feature extraction module, a difference module, and a prediction module, the feature extraction module includes two branches sharing a temporal correlation sub-module, and each branch further includes an initial feature extraction sub-module. The method includes: acquiring a set of image sample pairs, where each image sample pair consists of two-phase remote sensing images of the same area; performing feature extraction on one image sample of each image sample pair with the initial feature extraction sub-module of each branch, to obtain a set of first feature map pairs; performing temporal correlation enhancement on the first feature map pairs in the set with the temporal correlation sub-module, to obtain updated first feature map pairs; determining, with the difference module, a difference feature map between corresponding image sample pairs based on the updated first feature map pairs; classifying the difference feature map with the prediction module, to obtain a predicted change region of a target object in the corresponding image sample pair; and training the change detection model to convergence based on the predicted change region and the actual change region of the target object in each image sample pair.
In a second aspect, an embodiment of the present application provides a method for detecting changes in remote sensing images, applied to a change detection model that includes a feature extraction module, a difference module, and a prediction module, where the feature extraction module includes two branches sharing a temporal correlation sub-module and each branch further includes an initial feature extraction sub-module. The method includes: acquiring an image pair to be detected, where the image pair is a pair of image blocks of a preset size, cropped respectively from two remote sensing images to be detected and corresponding to the same area; performing feature extraction on one image of the pair with the initial feature extraction sub-module of each branch, to obtain a set of first feature map pairs; performing temporal correlation enhancement on the first feature map pairs in the set with the temporal correlation sub-module, to obtain updated first feature map pairs; determining, with the difference module, a difference feature map between the image pair based on the updated first feature map pairs; and classifying the difference feature map with the prediction module, to obtain the change region of the target object in the image pair.
In a third aspect, embodiments of the present application provide a computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing some or all of the steps of the above method when the program is executed.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs some or all of the steps of the above method.
An embodiment of the present application provides a training method for a change detection model of remote sensing images. The change detection model comprises a feature extraction module, a difference module and a prediction module; the feature extraction module comprises two branches sharing a temporal correlation sub-module, and each branch further comprises an initial feature extraction sub-module. The method comprises the following steps: first, acquiring a set of image sample pairs, where each image sample pair consists of two-phase remote sensing images of the same area; second, performing feature extraction on one image sample of each pair with the initial feature extraction sub-module of each branch, to obtain a set of first feature map pairs; then performing temporal correlation enhancement on the first feature map pairs with the temporal correlation sub-module, to obtain updated first feature map pairs; then obtaining, with the difference module, a difference feature map between the image sample pairs from the updated first feature map pairs; then classifying the difference feature map with the prediction module, to obtain a predicted change region of the target object in the corresponding image sample pair; and finally, training the change detection model to convergence based on the predicted change region and the real change region of the target object in each image sample pair.
Because the image samples come from remote sensing images, factors such as imaging time, atmospheric conditions and viewing angle can produce pseudo-changes, which the model may mistake for real changes of the target object, reducing its accuracy. In the change detection model provided by the embodiments of the present application, the temporal correlation sub-module enhances the temporal correlation between the first feature map pairs, i.e., the relationship between pixels at different times and spatial locations. Since the first feature map pairs come from two-phase image sample pairs, misjudging pseudo-changes as real changes can be reduced, improving the accuracy of the model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1A is a schematic implementation flow diagram of a training method for a change detection model of remote sensing images according to an embodiment of the present application;
Fig. 1B is a schematic structural diagram of a change detection model according to an embodiment of the present application;
Fig. 1C is a schematic structural diagram of a temporal correlation sub-module according to an embodiment of the present application;
Fig. 2 is a schematic implementation flow chart of step S1032 according to an embodiment of the present application;
Fig. 3 is a schematic implementation flow chart of step S103 according to an embodiment of the present application;
Fig. 4 is a schematic implementation flow chart of a method for detecting changes in remote sensing images according to an embodiment of the present application;
Fig. 5A is a schematic diagram of remote sensing images of phase A and phase B according to an embodiment of the present application;
Fig. 5B is a schematic diagram of a change-region raster result according to an embodiment of the present application;
Fig. 5C is a schematic implementation flow chart of creating a remote sensing image bare land change detection sample dataset and operating a remote sensing image bare land change detection system according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the composition of a training apparatus for a change detection model of remote sensing images according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application will be further elaborated with reference to the accompanying drawings and embodiments. The described embodiments should not be construed as limiting the application; all other embodiments obtained by one skilled in the art without inventive effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
The term "first/second/third" is merely to distinguish similar objects and does not represent a particular ordering of objects, it being understood that the "first/second/third" may be interchanged with a particular order or precedence, as allowed, to enable embodiments of the application described herein to be implemented in other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the application only and is not intended to be limiting of the application.
An embodiment of the present application provides a training method for a change detection model of remote sensing images. The change detection model comprises a feature extraction module, a difference module and a prediction module; the feature extraction module comprises two branches sharing a temporal correlation sub-module, and each branch further comprises an initial feature extraction sub-module. The method may be executed by a processor of a computer device. The computer device may be a server, a notebook computer, a tablet computer, a desktop computer, or any other device with data processing capability. Fig. 1A is a schematic implementation flow chart of the training method according to an embodiment of the present application; as shown in fig. 1A, the method includes the following steps S101 to S106:
Step S101: acquiring a set of image sample pairs, where each image sample pair consists of two-phase remote sensing images of the same area;
Here, an image sample pair may be obtained by cropping remote sensing images of two phases. The remote sensing images may be preprocessed by radiometric correction and precise geometric correction. In some embodiments, an image sample pair corresponds to the same area, so as to determine whether a target object in that area has changed. The image sample pair may contain a target object such as bare land or a building. In some embodiments, the ratio of target object pixels to background pixels in the image sample pair may be greater than 25% for better training of the model.
Step S102: performing feature extraction on one image sample of each image sample pair with the initial feature extraction sub-module of each branch, to obtain a set of first feature map pairs;
Here, the initial feature extraction sub-module performs feature extraction on an image sample to obtain a first feature map. The initial feature extraction sub-module in each branch extracts features from one image sample of the image sample pair, and the two branches process different image samples, so that a first feature map pair is obtained. The embodiments of the present application do not limit the structure of the initial feature extraction sub-module.
In some embodiments, the initial feature extraction sub-module may produce first feature map pairs at a single resolution; in other embodiments, it may produce first feature map pairs at several different resolutions.
Fig. 1B shows the structure of a change detection model, where the feature extraction module is obtained by modifying a Siamese MobileViT network. It comprises a first branch 101 and a second branch 102, which share a temporal correlation sub-module; that sub-module is obtained by modifying the MobileViT block of the original MobileViT network so as to connect the two branches and make use of a Transformer. The initial feature extraction sub-module comprises a 3×3 convolution (Conv 3×3) that extracts feature maps at 128×128 resolution and several inverted residual blocks, and it produces first feature map pairs at different resolutions (128×128, 64×64, 32×32, 16×16). Note that the original MobileViT network also outputs 8×8 feature maps; since the information extracted from 8×8 feature maps is limited and stacking additional CNN layers increases the parameter count and model size, the initial feature extraction sub-module does not output an 8×8 first feature map pair.
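To make this backbone concrete, the following is a minimal PyTorch sketch of one weight-shared initial feature extraction branch. It is an illustrative sketch under stated assumptions, not the patent's implementation: the channel widths, the MobileNetV2-style inverted residual block, and the SiLU activation are all assumptions.

```python
# Hypothetical sketch of the initial feature extraction sub-module of one branch.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual block: 1x1 expand -> 3x3 depthwise -> 1x1 project."""
    def __init__(self, c_in: int, c_out: int, stride: int = 1, expand: int = 4):
        super().__init__()
        hidden = c_in * expand
        self.use_res = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, c_out, 1, bias=False), nn.BatchNorm2d(c_out))
    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out

class InitialFeatureExtractor(nn.Module):
    """Produces 128/64/32/16-resolution feature maps from a 256x256 input;
    the 8x8 stage of the original MobileViT backbone is omitted, as in the patent."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),  # Conv 3x3
                                  nn.BatchNorm2d(16), nn.SiLU())             # 256 -> 128
        self.stage64 = InvertedResidual(16, 32, stride=2)                    # 128 -> 64
        self.stage32 = InvertedResidual(32, 64, stride=2)                    # 64 -> 32
        self.stage16 = InvertedResidual(64, 96, stride=2)                    # 32 -> 16
    def forward(self, x):
        f128 = self.stem(x)
        f64 = self.stage64(f128)
        f32 = self.stage32(f64)
        f16 = self.stage16(f32)
        return f128, f64, f32, f16

# Siamese use: the same (weight-shared) extractor is applied to both phases A and B.
extractor = InitialFeatureExtractor()
feats_A = extractor(torch.randn(1, 3, 256, 256))
feats_B = extractor(torch.randn(1, 3, 256, 256))
```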
Step S103: performing temporal correlation enhancement on the first feature map pairs in the set with the temporal correlation sub-module, to obtain updated first feature map pairs;
Here, the first feature map pairs subjected to temporal correlation enhancement may be all of the first feature map pairs in the set, or only part of them.
Temporal correlation enhancement strengthens the temporal correlation between the first feature map pairs. Remote sensing images are prone to pseudo-changes caused by factors such as imaging time, atmospheric conditions and sensor viewing angle, and conventional change detection models misdetect such pseudo-changes as real changes, reducing detection accuracy. Enhancing the temporal correlation between the first feature map pairs, i.e., the relationship between pixels at different times and spatial locations, improves the accuracy of the change detection model. The embodiments of the present application do not limit the structure of the temporal correlation sub-module.
In some embodiments, as shown in fig. 1C, the temporal correlation sub-module, modified from a Siamese MobileViT block, may include a local characterization unit 103, a global characterization unit 104 and a fusion unit 105; correspondingly, the implementation of step S103 may include the following steps S1031 to S1033:
Step S1031: performing region division and position coding on the first feature map pairs in the set with the local characterization unit, to obtain second feature map pairs;
This is understood with reference to fig. 1C: each first feature map (F_As and F_Bs) is divided into a number of h×w regions (2×2 in the figure), and a different position code is assigned to each small region of each first feature map; for example, the upper-left region of the first row has position code 11, the middle region of the first row has position code 12, and the upper-right region of the first row has position code 13. A different position code is then assigned to each pixel within each region; for example, in the upper-left region, the upper-left pixel has position code 111, the upper-right pixel 112, the lower-left pixel 113, and the lower-right pixel 114.
The manner of region division and position coding follows the MobileViT block in the original MobileViT network. The embodiments of the present application do not limit the size of the divided regions or the way position codes are assigned.
Step S1032: performing temporal correlation enhancement on the second feature map pair with the global characterization unit, to obtain a third feature map pair;
In some embodiments, the global characterization unit 104 may include a Transformer, and temporal correlation enhancement is performed on the second feature map pairs via the Transformer's self-attention mechanism.
Fig. 1C shows the structure of a global characterization unit: the global characterization unit 104 includes an unfolding subunit 1041, a channel connection subunit 1042, an association subunit (Transformer) 1043, a channel separation subunit 1044, and a folding subunit 1045;
correspondingly, as shown in fig. 2, step S1032 (performing temporal correlation enhancement on the second feature map pair with the global characterization unit, to obtain a third feature map pair) includes the following steps S201 to S205 (understood with reference to fig. 1C):
Step S201: unfolding each second feature map of the second feature map pair with the unfolding subunit, to obtain a fourth feature map pair;
Here, the implementation of step S201 may include: according to the position codes of each second feature map, the features at the same relative position within the divided h×w regions are gathered together, so that each second feature map is unfolded into a sequence, yielding a fourth feature map pair (T_As and T_Bs). The unfolding follows the unfold operation in the original MobileViT block: for a feature map of height H, width W and channel dimension d, N = h×w is the number of pixels per region and P = HW/N is the number of regions (a code sketch covering steps S201 to S205 is given after step S205 below).
Step S202: splicing the fourth feature maps of the fourth feature map pair with the channel connection subunit;
Here, the fourth feature maps may be spliced along the unfolding direction of the second feature map pair. With reference to fig. 1C, since the second feature map pair is unfolded along the P direction, the fourth feature map pair may be spliced along the P direction, so that features of the two-phase images can be fused by the Transformer in the association subunit.
Step S203: performing temporal correlation enhancement on the spliced fourth feature map with the association subunit, to obtain a fifth feature map;
Here, the association subunit may include a Transformer, and the correlation between the first phase and the second phase in the spliced fourth feature map is enhanced by the Transformer's self-attention mechanism.
The Transformer comprises layer normalization (LN), a multi-head attention mechanism (MHA) and a multilayer perceptron (MLP). After the spliced fourth feature map (T_As + T_Bs) undergoes temporal correlation enhancement by the Transformer, a fifth feature map (T_As-new + T_Bs-new) is obtained. For the computation performed by the Transformer, see the related art.
Step S204: separating the fifth feature map with the channel separation subunit, by reversing the splicing performed by the channel connection subunit, to obtain a sixth feature map pair;
Here, because the channel connection subunit splices along the P direction, the separation is likewise performed along the P direction, so that each sixth feature map (T_As-new and T_Bs-new) has the same dimensions as the corresponding fourth feature map (T_As and T_Bs).
Step S205: folding each sixth feature map of the sixth feature map pair with the folding subunit, to obtain a third feature map pair.
Here, the folding corresponds to the unfolding: each sixth feature map is folded back into its pre-unfolding positional arrangement by the reverse of the unfold operation. The folding follows the fold operation in the original MobileViT block.
In the embodiment of the present application, because the temporal correlation sub-module is modified from a Siamese MobileViT block, features are generated in pairs, unlike in the single MobileViT block. In order to reuse the Transformer of the MobileViT block, a channel connection subunit and a channel separation subunit are added to the temporal correlation sub-module: the channel connection subunit joins the feature pair input to the Transformer, and the channel separation subunit splits the Transformer output back into a new feature pair, so that the generated feature pair contains more temporal correlation information.
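The sketch below walks through steps S201 to S205 in PyTorch under the patent's convention (N = h×w relative positions, P = HW/N regions). It is a hedged sketch, not the patent's code: the region size h = w = 2, the embedding width d, the head count, and treating each relative position as an independent batch item for attention are all assumptions.

```python
# Minimal sketch of the global characterization unit: unfold -> channel connection
# along P -> Transformer -> channel separation -> fold.
import torch
import torch.nn as nn

def unfold(x: torch.Tensor, h: int = 2, w: int = 2) -> torch.Tensor:
    """(B, d, H, W) -> (B, N, P, d), with N = h*w pixels per region and
    P = (H//h)*(W//w) regions (the patent's convention)."""
    B, d, H, W = x.shape
    x = x.reshape(B, d, H // h, h, W // w, w)           # split H and W into regions
    x = x.permute(0, 3, 5, 2, 4, 1)                     # (B, h, w, H/h, W/w, d)
    return x.reshape(B, h * w, (H // h) * (W // w), d)

def fold(x: torch.Tensor, H: int, W: int, h: int = 2, w: int = 2) -> torch.Tensor:
    """Exact inverse of unfold: (B, N, P, d) -> (B, d, H, W)."""
    B, N, P, d = x.shape
    x = x.reshape(B, h, w, H // h, W // w, d)
    x = x.permute(0, 5, 3, 1, 4, 2)                     # (B, d, H/h, h, W/w, w)
    return x.reshape(B, d, H, W)

class TransformerBlock(nn.Module):
    """Association subunit: LN + multi-head attention + MLP, with residuals."""
    def __init__(self, d: int = 96, heads: int = 4, mlp_ratio: int = 2):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, d * mlp_ratio), nn.GELU(),
                                 nn.Linear(d * mlp_ratio, d))
    def forward(self, x):                               # x: (batch, sequence, d)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

def temporal_association(f_a, f_b, block):
    """Steps S201-S205 for one pair of (B, d, H, W) second feature maps."""
    B, d, H, W = f_a.shape
    t_a, t_b = unfold(f_a), unfold(f_b)                 # S201
    N, P = t_a.shape[1], t_a.shape[2]
    x = torch.cat([t_a, t_b], dim=2)                    # S202: join along P
    x = block(x.reshape(B * N, 2 * P, d))               # S203: attention spans phases
    x = x.reshape(B, N, 2 * P, d)
    t_a_new, t_b_new = x[:, :, :P], x[:, :, P:]         # S204: channel separation
    return fold(t_a_new, H, W), fold(t_b_new, H, W)     # S205: fold back
```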
Step S1033: fusing each third feature map of the third feature map pair with the corresponding first feature map using the fusion unit, to obtain an updated first feature map pair.
Here, since the updated first feature map pair incorporates the third feature maps obtained by the global characterization unit's temporal correlation enhancement of the second feature map pair, the updated first feature map pair carries the temporal correlation between the image sample pair.
In some embodiments, the fusion unit may apply a 1×1 convolution (Conv 1×1) to each third feature map to obtain F_As' and F_Bs'; then fuse F_As' and F_Bs' with the corresponding first feature maps to obtain F_As'' and F_Bs''; and finally apply an n×n convolution (Conv n×n) to F_As'' and F_Bs'' to obtain the updated first feature map pair (F_As-new and F_Bs-new).
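A compact sketch of this fusion unit follows. The patent only specifies "Conv 1×1 ... fuse ... Conv n×n", so fusing by channel concatenation and the kernel size n = 3 are assumptions.

```python
# Hypothetical fusion unit (step S1033) for one branch.
import torch
import torch.nn as nn

class FusionUnit(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.proj = nn.Conv2d(d, d, kernel_size=1)                 # Conv 1x1
        self.out = nn.Conv2d(2 * d, d, kernel_size=3, padding=1)   # Conv n*n, n = 3
    def forward(self, third: torch.Tensor, first: torch.Tensor) -> torch.Tensor:
        x = self.proj(third)                 # F_As'
        x = torch.cat([x, first], dim=1)     # F_As'': fuse with the first feature map
        return self.out(x)                   # F_As-new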
In the embodiment of the present application, with the temporal correlation sub-module modified from a Siamese MobileViT block, the lightweight MobileViT performs information interaction only between pixels at the same relative position, which greatly reduces the amount of computation; multiple sequences can be computed in parallel, training efficiency is high, the model adapts well to small sample datasets, and overfitting is avoided. Moreover, the temporal correlation sub-module concatenates the vector sequences of the two phases, so that the Transformer's multi-head attention attends simultaneously to information from different phases and different positions. This strengthens the temporal correlation between the two-phase images: feature extraction for the two phases is tightly coupled rather than carried out independently, which reduces the influence of pseudo-changes and improves change detection accuracy.
Step S104: determining, with the difference module, a difference feature map between corresponding image sample pairs based on the updated first feature map pairs;
Here, when there is only one updated first feature map pair (i.e., the set contains first feature map pairs at a single resolution), the difference feature map between the corresponding image sample pair can be obtained by taking the element-wise absolute difference of the updated first feature map pair.
When there is more than one updated first feature map pair, the element-wise absolute difference of each updated pair yields an initial difference feature map, and the initial difference feature maps are fused to obtain the difference feature map between the corresponding image sample pair.
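The per-scale difference itself is the simple operation described above; a one-line sketch makes it explicit (the multi-scale fusion of several such maps is sketched later):

```python
# Step S104 per scale: element-wise absolute difference of an updated feature pair.
import torch

def difference_map(f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
    """|F_As-new - F_Bs-new|, computed element-wise."""
    return torch.abs(f_a - f_b)
```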
Step S105: classifying the difference feature map with the prediction module, to obtain a predicted change region of a target object in the corresponding image sample pair;
Here, the prediction module may be a two-layer fully convolutional network, or any other network structure capable of performing the prediction. The classification categories may include the target object change region and other regions; once the model is trained, the prediction module classifies the difference feature map to obtain the predicted change region of the target object in the image sample pair.
Step S106: training the change detection model to convergence based on the predicted change region and the actual change region of the target object in each image sample pair in the set.
Here, the implementation of step S106 may include: computing the loss function of the change detection model from the predicted change region and the real change region of the target object in each image sample pair, and training the change detection model according to the loss function until it converges.
The loss function may be, among others, a negative log-likelihood loss, a cross-entropy loss, or an exponential loss.
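As an illustration of step S106, here is a minimal training-loop sketch assuming pixel-wise cross-entropy over two classes (changed / unchanged). The optimizer, learning rate, epoch count, and the names model and loader are hypothetical, not from the patent.

```python
# Hypothetical training loop for the change detection model.
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 50, lr: float = 1e-4) -> None:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for img_a, img_b, label in loader:   # label: (B, H, W) long tensor in {0, 1}
            logits = model(img_a, img_b)     # (B, 2, H, W) change-map logits
            loss = loss_fn(logits, label)
            opt.zero_grad()
            loss.backward()
            opt.step()
```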
An embodiment of the present application provides a training method for a change detection model of remote sensing images. The change detection model comprises a feature extraction module, a difference module and a prediction module; the feature extraction module comprises two branches sharing a temporal correlation sub-module, and each branch further comprises an initial feature extraction sub-module. The method comprises the following steps: first, acquiring a set of image sample pairs, where each image sample pair consists of two-phase remote sensing images of the same area; second, performing feature extraction on one image sample of each pair with the initial feature extraction sub-module of each branch, to obtain a set of first feature map pairs; then performing temporal correlation enhancement on the first feature map pairs with the temporal correlation sub-module, to obtain updated first feature map pairs; then obtaining, with the difference module, a difference feature map between the image sample pairs from the updated first feature map pairs; then classifying the difference feature map with the prediction module, to obtain a predicted change region of the target object in the corresponding image sample pair; and finally, training the change detection model to convergence based on the predicted change region and the real change region of the target object in each image sample pair.
Because the image samples come from remote sensing images, factors such as imaging time, atmospheric conditions and viewing angle can produce pseudo-changes, which the model may mistake for real changes of the target object, reducing its accuracy. In the change detection model provided by the embodiments of the present application, the temporal correlation sub-module enhances the temporal correlation between the first feature map pairs, i.e., the relationship between pixels at different times and spatial locations. Since the first feature map pairs come from two-phase image sample pairs, misjudging pseudo-changes as real changes can be reduced, improving the accuracy of the model.
In some embodiments, the set of first feature map pairs includes first feature map pairs at different resolutions. As shown in fig. 3, the implementation of step S103 (performing temporal correlation enhancement on the first feature map pairs with the temporal correlation sub-module, to obtain updated first feature map pairs) may include the following steps S1031 and S1032:
Step S1031: determining, in the set, first feature map pairs at one or more target resolutions;
Here, a target resolution is a resolution at which temporal correlation enhancement is required. When the resolution is too high or too low, temporal correlation enhancement contributes little to the accuracy of the final model while increasing computation; step S1031 therefore selects only the first feature map pairs at the target resolutions for temporal correlation enhancement.
Step S1032: performing temporal correlation enhancement on the first feature map pairs at each target resolution with the temporal correlation sub-module, to obtain at least one updated first feature map pair.
Here, the implementation of step S1032 can be seen in step S103.
Correspondingly, the implementation of "determining a difference feature map between corresponding image sample pairs based on the updated first feature map pair" in step S104 may include the following steps S1041 to S1043:
Step S1041: determining, in the set, first feature map pairs at one or more non-target resolutions;
here, the non-target resolution may be all or part of the resolutions other than the target resolution among all the resolutions included in the first feature map pair set.
For example: all resolutions contained by the first set of feature map pairs include: 128×128, 64×64, 32×32, 16×16, the target resolution includes: 32 x 32, 16 x 16, the non-target resolution may include: 64*64.
Step S1042: determining a first difference feature map between each updated first feature map pair and a second difference feature map between each first feature map pair at the non-target resolution respectively;
Here, the first or second difference feature map is determined by taking the element-wise absolute difference of the corresponding feature map pair, which yields the corresponding difference feature map; this corresponds to the difference feature extraction in fig. 1B.
Step S1043: and fusing each first difference feature map and each second difference feature map to obtain a difference feature map between corresponding image sample pairs.
Here, by fusing each first difference feature map and each second difference feature map, the finally obtained difference feature map fuses features under different resolutions, so that not only coarse-granularity feature information but also fine-granularity feature information can be obtained, and the judgment of the model on the change region of the target object is facilitated.
An embodiment of step S1043 is described below by taking the example that the non-target resolution includes a first resolution, the target resolution includes a second resolution and a third resolution, and the third resolution is smaller than the second resolution, and the second resolution is smaller than the first resolution.
Correspondingly, the implementation of step S1043 "fusing each of the first difference feature map and each of the second difference feature maps to obtain a difference feature map between corresponding image sample pairs" may include the following steps S301 to S305 (understood with reference to fig. 1B):
Step S301: upsampling the first difference feature map corresponding to the third resolution, to obtain a third difference feature map with the same resolution as the first difference feature map corresponding to the second resolution, and a fourth difference feature map with the same resolution as the first difference feature map corresponding to the first resolution;
Here, this embodiment is directed to the feature extraction module modified from the Siamese MobileViT network shown in fig. 1B.
The third resolution is 16×16, the second resolution is 32×32, and the first resolution is 64×64. The first difference feature maps corresponding to the third, second and first resolutions are denoted D16, D32 and D64, respectively. Upsampling D16 to the resolution of D32 gives the third difference feature map, denoted U32; upsampling D16 to the resolution of D64 gives the fourth difference feature map, denoted U64.
In some embodiments, the difference module includes an attention sub-module, and the first difference feature map corresponding to the third resolution may be determined by: taking the element-wise absolute difference of the updated first feature map pair at the third resolution, to obtain an initial first difference feature map for the third resolution; and applying the attention sub-module to this initial map for feature enhancement, to obtain the first difference feature map corresponding to the third resolution;
Here, the attention sub-module may be the lightweight Convolutional Block Attention Module (CBAM) in fig. 1B. CBAM comprises two independent sub-modules, a channel attention module (CAM) and a spatial attention module (SAM), which perform channel and spatial feature enhancement respectively. This not only saves parameters and computation, but also lets CBAM be integrated into existing network architectures as a plug-and-play module.
The computation flow of the channel attention module is as follows: max pooling and average pooling over height and width are applied to the input features, yielding two feature maps of size 1×1×C. Each is fed into a shared fully connected layer, which first compresses the channel number to 1/r of the original and then expands it back, with a ReLU activation producing two activated features. The two outputs of the shared fully connected layer are then added and passed through a sigmoid activation to generate the final channel attention output features.
The computation flow of the spatial attention module is as follows: the output of the channel attention module serves as its input. First, channel-wise max pooling and average pooling yield two H×W×1 feature maps, which are spliced along the channel dimension; then a 7×7 convolution reduces the result to a single channel, i.e., H×W×1; finally, a sigmoid generates the output features of the spatial attention module.
The CBAM may perform feature enhancement processing in a channel and a spatial dimension, where the channel attention makes the network pay more attention to the feature map related to the change, and the spatial attention increases the spatial weight of the changed target object, so that accuracy of model identification of the change region of the target object may be improved.
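The two flows above translate directly into a compact CBAM sketch. The reduction ratio r = 16 is an assumption (the text only says 1/r); the 7×7 spatial kernel follows the text, and implementing the shared fully connected layer as 1×1 convolutions is a standard choice.

```python
# CBAM sketch: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared fully connected layer
            nn.Conv2d(channels, channels // r, 1), nn.ReLU(),
            nn.Conv2d(channels // r, channels, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        # Channel attention: max- and average-pool over H and W -> two 1x1xC maps.
        ca = torch.sigmoid(self.mlp(x.amax(dim=(2, 3), keepdim=True)) +
                           self.mlp(x.mean(dim=(2, 3), keepdim=True)))
        x = x * ca
        # Spatial attention: channel-wise max and mean -> HxWx2 -> 7x7 conv -> HxWx1.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.amax(dim=1, keepdim=True), x.mean(dim=1, keepdim=True)], dim=1)))
        return x * sa
```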
Step S302: fusing the first difference feature map corresponding to the first resolution with the fourth difference feature map to obtain a fifth difference feature map;
that is, the first difference feature map D64 and the fourth difference feature map U64 corresponding to the first resolution are fused to obtain a fifth difference feature map M64.
The fusion here may include: splicing the first difference feature map corresponding to the first resolution and the fourth difference feature map along the channel dimension. It should be noted that the fusion in steps S303 to S305 below may likewise be splicing along the channel dimension (corresponding to the C channel connection in fig. 1B).
In the case where the difference module includes an attention sub-module, the implementation of step S302 may include the following steps S3021 and S3022:
step S3021: fusing the first difference feature map corresponding to the first resolution with the fourth difference feature map to obtain an initial fifth difference feature map;
Step S3022: and carrying out feature enhancement processing on the initial fifth difference feature map by adopting the attention sub-module to obtain a fifth difference feature map.
Step S303: fusing the first difference feature map corresponding to the second resolution with the third difference feature map to obtain a sixth difference feature map;
that is, the first difference feature map D32 and the third difference feature map U32 corresponding to the second resolution are fused to obtain a sixth difference feature map M32.
In the case where the difference module includes an attention sub-module, the implementation of step S303 may include the following steps S3031 and S3032:
step S3031: fusing the first difference feature map corresponding to the second resolution with the third difference feature map to obtain an initial sixth difference feature map;
step S3032: and carrying out feature enhancement processing on the initial sixth difference feature map by adopting the attention sub-module to obtain a sixth difference feature map.
Step S304: upsampling the sixth difference feature map, to obtain a seventh difference feature map with the same resolution as the fifth difference feature map;
Here, the resolution of the fifth difference feature map M64 is 64×64, and the seventh difference feature map obtained by upsampling the sixth difference feature map M32 is denoted M64'.
Step S305: and fusing the seventh difference feature map with the fifth difference feature map to obtain a difference feature map between the corresponding image sample pairs.
Here, the seventh difference feature map M64' and the fifth difference feature map M64 are fused to obtain a difference feature map between the corresponding image sample pair.
In the embodiment of the present application, this fusion method makes the resulting difference feature map contain features at different resolutions, which can improve the accuracy of the model.
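Putting steps S301 to S305 together gives the following sketch. d16, d32 and d64 stand for D16, D32 and D64; bilinear interpolation for the upsampling and the cbam* attention modules (instances of the CBAM sketch above, with channel counts matching their inputs) are assumptions.

```python
# Hypothetical multi-scale difference fusion (steps S301-S305).
import torch
import torch.nn.functional as F

def fuse_differences(d16, d32, d64, cbam16, cbam32, cbam64):
    d16 = cbam16(d16)                                           # enhanced D16
    u32 = F.interpolate(d16, scale_factor=2, mode="bilinear")   # U32 (S301)
    u64 = F.interpolate(d16, scale_factor=4, mode="bilinear")   # U64 (S301)
    m64 = cbam64(torch.cat([d64, u64], dim=1))                  # M64 (S302)
    m32 = cbam32(torch.cat([d32, u32], dim=1))                  # M32 (S303)
    m64p = F.interpolate(m32, scale_factor=2, mode="bilinear")  # M64' (S304)
    return torch.cat([m64p, m64], dim=1)                        # final map (S305)
```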
An embodiment of the present application further provides a method for detecting changes in remote sensing images, applied to a change detection model of remote sensing images. The change detection model comprises a feature extraction module, a difference module and a prediction module; the feature extraction module comprises two branches sharing a temporal correlation sub-module, and each branch further comprises an initial feature extraction sub-module. As shown in fig. 4, the method comprises the following steps S401 to S405:
Step S401: acquiring an image pair to be detected, where the image pair is a pair of image blocks of a preset size, cropped respectively from two remote sensing images to be detected and corresponding to the same area;
Here, the remote sensing images to be detected may be two-phase remote sensing images after radiometric correction and fine geometric correction. The preset size may be 256×256. In some embodiments, after the two remote sensing images are cropped, each pair of image blocks may additionally be labeled with its position, which facilitates later stitching of the classification results of the image block pairs.
Step S402: the initial feature extraction sub-module of each branch is adopted to extract features of one to-be-detected image in the to-be-detected image pairs, and a first feature image pair set is obtained;
step S403: performing time relevance enhancement processing on the first feature map pairs in the first feature map pair set by adopting the time relevance sub-module to obtain updated first feature map pairs;
step S404: determining a difference feature map between the image pair to be detected based on the updated first feature map by adopting the difference module;
step S405: and classifying the difference feature map by adopting the prediction module to obtain a change region of the target object in the image pair to be detected.
Here, the result of the classification may take the form of a binary image, in which 1 denotes the change region of the target object and 0 denotes other regions. Because the image pairs to be detected are cropped from two remote sensing images, after the change region of the target object is obtained, the classification results (binary images) of all image pairs can be stitched together according to the position label of each pair of image blocks; the geographic position information is then restored; and finally a raster image and a vector image of the change region are produced, to facilitate later data processing.
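A small sketch of the stitching step follows. The (row, col) position labels and the numpy representation are assumptions; restoring the georeferencing and exporting raster/vector files would be done afterwards with a GIS library.

```python
# Hypothetical stitching of per-tile binary results into a full-scene change map.
import numpy as np

def stitch(tiles, positions, full_h: int, full_w: int, tile: int = 256) -> np.ndarray:
    """tiles: list of (tile, tile) uint8 arrays; positions: matching (row, col) labels."""
    out = np.zeros((full_h, full_w), dtype=np.uint8)
    for patch, (r, c) in zip(tiles, positions):
        out[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = patch
    return out
```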
On this basis, an embodiment of the present application further provides a change detection system for remote sensing images, which may comprise: cropping the remote sensing images into image blocks; classifying and predicting the image blocks; stitching the classification results; restoring the geographic position information of the stitched image; and generating a raster image and a vector image of the change region. Thus, once two-phase remote sensing images that have undergone radiometric correction and fine geometric correction are fed into the system, the raster and vector images of the change region are output directly, so that the change region of the target object is extracted from the two-phase remote sensing images accurately, rapidly and automatically, enabling streamlined extraction, management, querying and analysis of target object change regions in remote sensing images.
In some embodiments, the temporal correlation sub-module comprises a local characterization unit, a global characterization unit and a fusion unit, modified from a Siamese MobileViT block.
Correspondingly, the implementation of step S403 (performing temporal correlation enhancement on the first feature map pairs with the temporal correlation sub-module, to obtain updated first feature map pairs) may include the following steps S4031 to S4033:
Step S4031: performing region division and position coding on the first feature map pairs in the set with the local characterization unit, to obtain second feature map pairs;
Step S4032: performing temporal correlation enhancement on the second feature map pairs with the global characterization unit, to obtain third feature map pairs;
Step S4033: fusing each third feature map of the third feature map pair with the corresponding first feature map using the fusion unit, to obtain an updated first feature map pair.
In some embodiments, the global characterization unit includes an unfolding subunit, a channel connection subunit, an association subunit, a channel separation subunit, and a folding subunit;
correspondingly, the implementation of step S4032 (performing temporal correlation enhancement on the second feature map pair with the global characterization unit, to obtain a third feature map pair) may include the following steps S431 to S435:
Step S431: unfolding each second feature map of the second feature map pair with the unfolding subunit, to obtain a fourth feature map pair;
Step S432: splicing the fourth feature maps of the fourth feature map pair with the channel connection subunit;
Step S433: performing temporal correlation enhancement on the spliced fourth feature map with the association subunit, to obtain a fifth feature map;
Step S434: separating the fifth feature map with the channel separation subunit, by reversing the splicing performed by the channel connection subunit, to obtain a sixth feature map pair;
Step S435: folding each sixth feature map of the sixth feature map pair with the folding subunit, to obtain a third feature map pair.
The following description takes bare land as an example of the target object:
because the urban land is increased in urban speed, a large amount of bare land exists due to development, utilization, removal and the like of urban land in many large and medium cities, and the quality of the regional atmospheric environment and the appearance of the urban appearance are seriously influenced; the dust pollution can reduce photosynthesis of green plants, reduce carbon absorption and have adverse effects on human health. Many cities are in urgent need of developing supervision work for large-area bare land treatment. The remote sensing technology has the characteristics of macroscopicity, objectivity, dynamics and rapidness, reduces the cost of manpower and time, and provides an effective technical means for accurately and efficiently monitoring the change of the bare land in towns in a large range.
According to the Classification of Land Use Status (GB/T 21010-2017), bare land means "land whose surface layer is soil and which has essentially no vegetation cover". At present, the land cover remote sensing classification systems commonly used at home and abroad pay little attention to bare land, and can hardly meet the needs of fine urban environmental management. The embodiments of the present application take as the change detection object bare land that appears in remote sensing images with essentially no vegetation cover and that easily causes dust pollution. The characteristics of bare land differ from those of other land features such as buildings and farmland, so change detection models designed for other land features have limited applicability to bare land and low detection accuracy. Bare land objects also vary greatly in spatial scale, from small temporarily vacant plots to large areas of bare land for farmland and building construction. Meanwhile, as the resolution of satellite images grows, the information they contain becomes ever richer; conventional approaches based on band operations (difference, ratio, bare soil index) and image transformations (principal component transformation, tasseled cap transformation) are limited, and geographic information system (GIS) assistance and traditional machine learning methods such as support vector machines and decision trees still require manual feature extraction and selection; such methods lack universality and make automatic extraction of bare land change difficult.
In recent years, applied research on deep learning for remote sensing image change detection has generally been realized by making Siamese versions of classical networks such as UNet and ResNet, but little of it focuses on bare land. Convolutional neural networks (CNNs) are easy to train, perform well on small datasets and converge quickly, but their receptive field is limited, global image features are difficult to acquire, feature extraction at large scales is unsatisfactory, and some information is inevitably lost during encoding and decoding. The Transformer family, currently among the most popular deep learning models, acquires global information effectively through its self-attention mechanism; although designed for natural language problems, it has been shown effective for image processing, and models combining CNNs and Transformers have unique advantages, though stacking CNN and Transformer layers increases model size. Lightweight networks such as MobileNet and MobileViT are small yet have strong feature extraction ability and perform well in object recognition and image classification tasks, but they have rarely been studied for change detection.
Therefore, it is necessary to construct a lightweight change detection model to reduce the number of model parameters, improve training efficiency, and better accommodate small sample datasets. At present, deep learning based change detection methods for two-phase remote sensing images mostly adopt a weight-shared Siamese network structure and extract the features of the two-phase images independently. Owing to pseudo-changes caused by imaging time, atmospheric conditions, viewing angle and the like, a weight-shared Siamese network that extracts two-phase features independently cannot avoid the influence of pseudo-changes. Considering the temporal correlation between the two-phase images, i.e., the relationship between pixels at different times and spatial locations, is therefore beneficial to the performance of the change detection model.
Currently, most deep-learning change detection technologies focus on the model and algorithm, performing change detection on small fixed-size standard image blocks such as 256×256, 512×512 or 1024×1024 pixels, and no automated flow has been established for extracting changes from large-area two-phase remote sensing images. In practical applications, remote sensing generally needs to extract ground feature changes over large areas, and the processing is complex and inefficient. Integrating the cutting, prediction, splicing and other steps of remote sensing image change detection into one system can therefore improve the efficiency of detecting ground feature changes over large areas, save labor and time, and has important application value. Establishing a remote sensing image bare land change detection system realizes automated, intelligent detection of bare land change information and can provide technical support for large-area bare land management and supervision.
To solve the above technical problems, the embodiment of the application provides a method and system for detecting bare land change in remote sensing images that considers temporal correlation: it provides a lightweight spatio-temporal correlation enhanced change detection model that targets bare land, adapts to small-sample data sets, and accounts for the temporal correlation between the two-phase images, and it establishes a remote sensing image bare land change detection system to extract bare land change results automatically.
The method specifically comprises the following steps:
step 1, constructing a spatio-temporal correlation enhanced change detection model;
step 2, establishing a remote sensing image bare land change detection system based on the spatio-temporal correlation enhanced change detection model.
The following describes the steps of the embodiments of the present application in detail:
step 1, constructing a spatio-temporal correlation enhanced change detection model; comprising the following: as shown in fig. 1B, the spatio-temporal correlation enhanced change detection model (i.e., the above change detection model) consists of a feature extractor (i.e., the above feature extraction module), a multi-scale difference fusion device (i.e., the above difference module) and a prediction head (i.e., the above prediction module). The feature extractor is improved from a twin MobileVIT network: it retains only the convolution and inverted residual structure parts of the network that extract feature maps F_As, F_Bs with side length 16 or larger (s = 16, 32, 64, 128, where s is the side size), since the large-size feature maps provide more fine-grained detail information and the small-size feature maps provide more coarse-grained semantic information. The undersized feature map (s = 8) carries limited information, and the extra stacked CNN layers would increase the parameter count and model size, so it is removed. As shown in fig. 1C, the time correlation enhanced MobileVIT module channel-connects the two-phase vector sequences T_As, T_Bs (s = 16, 32) and inputs them into a Transformer for feature enhancement, performing spatio-temporal context modeling and enhancing the temporal correlation between the two phase images A and B input to the model. The inverted residual structure raises and lowers the dimensionality with pointwise convolutions so that the depthwise separable convolution can extract more information in the high-dimensional feature space. Because bare land objects differ greatly in spatial scale, ranging from small-area temporarily vacant bare land to large-area farmland bare land and building-construction bare land, a Transformer is introduced to overcome the limited receptive field of CNNs, which makes global image features hard to obtain and large-scale feature extraction unsatisfactory.
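By way of illustration only, the following sketch shows the weight-sharing two-branch arrangement just described: a single backbone is applied to both phase images, so the twin branches share weights and produce multi-scale feature maps with side sizes 128, 64, 32 and 16 for a 256×256 input, with the s = 8 stage omitted. Plain strided convolutions stand in for the inverted residual and MobileVIT stages; all layer widths and names are assumptions, not values from the embodiment.

```python
import torch
import torch.nn as nn

class TwinFeatureExtractor(nn.Module):
    """Weight-shared two-branch backbone (illustrative widths and names)."""

    def __init__(self, channels=(16, 32, 64, 96)):
        super().__init__()
        stages, in_ch = [], 3
        for out_ch in channels:  # each stage halves the spatial size
            stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.SiLU(),
            ))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward_single(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # [F_128, F_64, F_32, F_16]

    def forward(self, img_a, img_b):
        # Shared weights: the very same module processes both phases.
        return self.forward_single(img_a), self.forward_single(img_b)

extractor = TwinFeatureExtractor()
f_a, f_b = extractor(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print([f.shape[-1] for f in f_a])  # [128, 64, 32, 16]
```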
The time correlation enhanced MobileVIT module first uses local characterization to map the input features F_As, F_Bs (s = 16, 32) into a high-dimensional space, obtaining H×W×d high-dimensional features (where H and W are the height and width of the feature map and d is the channel dimension). The high-dimensional feature is divided into several h×w standard blocks (each standard block corresponds to a vector, and its position information is recorded), and the pixels at the same relative position within the standard blocks are spliced to obtain an unfolded N×P×d vector sequence T_As, T_Bs (where N = h×w and P = HW/N is the length of each vector sequence). T_As and T_Bs are then channel-connected and input into the Transformer for feature enhancement, so that the temporal correlation between F_As and F_Bs is enhanced during feature extraction.
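The unfolding step can be pictured with a small sketch: a B×d×H×W feature map is rearranged so that pixels sharing the same relative position inside an h×w block form one sequence. The function below is a hedged reading of that description (the patent gives no reference implementation), using the patent's own convention N = h×w, P = HW/N.

```python
import torch

def unfold_to_sequences(feat, patch_h=2, patch_w=2):
    """Rearrange a B x d x H x W feature map into B x N x P x d sequences,
    where N = patch_h * patch_w relative positions and P = (H*W)/N patches
    (a sketch of the unfolding convention described above)."""
    B, d, H, W = feat.shape
    nh, nw = H // patch_h, W // patch_w           # patches per axis
    x = feat.reshape(B, d, nh, patch_h, nw, patch_w)
    # Group by relative position inside the patch: pixels that share the
    # same (patch_h, patch_w) offset form one sequence of length P = nh*nw.
    x = x.permute(0, 3, 5, 2, 4, 1)               # B, ph, pw, nh, nw, d
    return x.reshape(B, patch_h * patch_w, nh * nw, d)

seq = unfold_to_sequences(torch.randn(1, 64, 32, 32))
print(seq.shape)  # torch.Size([1, 4, 256, 64]) -> N = 4, P = 256, d = 64
```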
The Transformer consists mainly of layer normalization (Layer Normalization, LN), a multi-head attention mechanism (Multi-Head Attention, MHA) and a fully connected layer (Multilayer Perceptron, MLP). The Transformer-enhanced vector sequence is separated to obtain vector sequences T_As-new, T_Bs-new rich in contextual spatio-temporal information; after folding and pointwise convolution, feature maps F_As', F_Bs' with unchanged channel number and rich contextual spatio-temporal information are obtained, channel-connected with the original inputs F_As, F_Bs to give F_As'', F_Bs'', and passed through pointwise convolution to obtain feature maps F_As-new, F_Bs-new that comprehensively consider the temporal correlation between A and B. Compared with a conventional Transformer, which models interaction between every pair of pixels, MobileVIT only models interaction between pixels at the same relative position, greatly reducing computation; multiple sequences can be computed in parallel, so training efficiency is high. By connecting T_As and T_Bs, the time correlation enhanced MobileVIT module lets the Transformer's multi-head attention mechanism attend simultaneously to information from different phases and different positions, enhancing the temporal correlation between the two phase images, so that feature extraction of the two phase images is closely linked rather than carried out independently.
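A minimal sketch of this cross-phase enhancement follows, assuming the "channel connection" joins the two phases' token sequences along the token axis so that multi-head attention can attend across phases before the sequences are separated again; encoder depth, width and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeCorrelationEncoder(nn.Module):
    """Sketch: concatenate T_As and T_Bs, encode jointly, separate."""

    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True, norm_first=True)  # LN, then MHA, then MLP
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, t_a, t_b):
        # t_a, t_b: (B*N) x P x d -- one sequence per relative position.
        p = t_a.shape[1]
        joint = torch.cat([t_a, t_b], dim=1)   # connect the two phases
        joint = self.encoder(joint)            # attention mixes A and B
        return joint[:, :p], joint[:, p:]      # separate back into phases

enc = TimeCorrelationEncoder()
t_a, t_b = torch.randn(4, 256, 64), torch.randn(4, 256, 64)
t_a_new, t_b_new = enc(t_a, t_b)
print(t_a_new.shape)  # torch.Size([4, 256, 64])
```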
The multi-scale difference fusion device first obtains the multi-scale two-phase feature maps F_As, F_Bs (s = 64) and F_As-new, F_Bs-new (s = 16, 32), and computes multi-scale difference feature maps Ds (s = 16, 32, 64) by subtraction followed by an absolute value operation. D16 is up-sampled to obtain U32 and U64, which are channel-connected with the Ds of corresponding size and input into CBAM modules to obtain M32 and M64; M32 is then up-sampled and fused with the M64 multi-scale difference feature map, so that the spatio-temporal correlation enhanced change detection model considers both coarse-grained semantic features and fine-grained detail features. CBAM (Convolutional Block Attention Module) is a lightweight convolutional attention module combining channel and spatial attention mechanisms, which makes the network weights focus more on the changes between the two phase images in both the channel and spatial dimensions: channel attention makes the network attend more to the feature maps related to the changes, and spatial attention increases the spatial weight of the changes.
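The sketch below illustrates the per-scale difference computation and one attention-guided fusion step with a simplified CBAM-style block (channel attention followed by spatial attention); channel counts and the 7×7 spatial kernel are assumptions rather than values from the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSpatialAttention(nn.Module):
    """Minimal CBAM-style block: channel attention, then spatial attention."""

    def __init__(self, ch, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel weights
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial(s))          # spatial weights

def difference_map(f_a, f_b):
    """Per-scale difference: subtraction plus absolute value, as described."""
    return torch.abs(f_a - f_b)

# One fusion step: up-sample the coarse difference map and channel-connect it
# with the finer one before the attention block (channel counts assumed).
fa16, fb16 = torch.randn(1, 96, 16, 16), torch.randn(1, 96, 16, 16)
fa32, fb32 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
d16, d32 = difference_map(fa16, fb16), difference_map(fa32, fb32)
u32 = F.interpolate(d16, scale_factor=2, mode="bilinear", align_corners=False)
m32 = ChannelSpatialAttention(96 + 64)(torch.cat([u32, d32], dim=1))
print(m32.shape)  # torch.Size([1, 160, 32, 32])
```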
The prediction head finally uses a two-layer fully convolutional network (Fully Convolutional Networks, FCN) as the classifier. The whole network of the spatio-temporal correlation enhanced change detection model consists of the lightweight MobileVIT, a shallow FCN, attention mechanisms and the like, and has a small parameter count and model size, fast fitting and high prediction accuracy.
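A plausible reading of such a head is sketched below: two convolutional layers map the fused difference features to per-pixel change/background logits, followed by up-sampling back to the tile size. Channel counts and the up-sampling factor are assumptions, not values from the embodiment.

```python
import torch.nn as nn

# Illustrative two-layer fully convolutional classifier head.
prediction_head = nn.Sequential(
    nn.Conv2d(160, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 2, kernel_size=1),  # logits: change vs. background
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
)
```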
Step 2, establishing a remote sensing image bare land change detection system based on the spatio-temporal correlation enhanced change detection model, specifically comprising the following: remote sensing images acquired at two phases (hereinafter, the two-phase images) and preprocessed with radiometric correction and precise geometric correction are input into the remote sensing image bare land change detection system, and the change detection result for the two-phase images is extracted through automated processing such as cutting, prediction, splicing, position information recovery and change detection result generation.
First, Jilin-1 (Jilin-1 GXA) remote sensing images of the test area with 0.8 meter (m) spatial resolution, acquired in July and October 2021 and preprocessed with radiometric correction and precise geometric correction (as shown in fig. 5A, where the left side 501 is the A-phase remote sensing image and the right side 502 is the B-phase remote sensing image), with geometric registration error within the allowable range, are input into the bare land change detection system. The two-phase images are clipped to the same size and extent to obtain several standard image blocks of 256×256 pixels, which are named by row and column number to form paired image blocks (corresponding to the above image sample pairs). Each pair of clipped image blocks is input into the trained spatio-temporal correlation enhanced change detection model to obtain the corresponding pixel-level binary change prediction image block (where 1 denotes a bare land change region, corresponding to a change region of the target object, and 0 denotes other regions). The predicted image blocks are spliced by row and column number to obtain the pixel-level binary change prediction result. Finally, the geographic position information of the pixel-level binary change prediction result is restored to obtain the change region raster result shown in fig. 5B, where the black region 503 denotes a change region, and a polygon vector result is generated from the change region raster result for post-processing according to different requirements.
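The cut–predict–stitch flow can be summarized in a short sketch; `predict_pair` stands in for the trained model, and recovery of geographic position information (which in practice would reuse the source image's geotransform) is omitted. The padding of edge tiles is an assumption, since the embodiment does not describe it.

```python
import numpy as np

def detect_changes_tiled(img_a, img_b, predict_pair, tile=256):
    """Cut two co-registered H x W x C images into tile x tile blocks,
    predict each pair, and stitch the binary blocks back by row/column."""
    h, w = img_a.shape[:2]
    mosaic = np.zeros((h, w), dtype=np.uint8)
    for r in range(0, h, tile):                    # row number
        for c in range(0, w, tile):                # column number
            hh, ww = min(tile, h - r), min(tile, w - c)
            a = np.zeros((tile, tile, img_a.shape[2]), dtype=img_a.dtype)
            b = np.zeros_like(a)
            a[:hh, :ww] = img_a[r:r + hh, c:c + ww]  # pad edge tiles
            b[:hh, :ww] = img_b[r:r + hh, c:c + ww]
            block = predict_pair(a, b)               # 1 = changed bare land
            mosaic[r:r + hh, c:c + ww] = block[:hh, :ww]
    return mosaic
```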
Step 2 is also preceded by the following steps: a bare land change detection sample data set is established using remote sensing images acquired at different times and preprocessed with radiometric correction and precise geometric correction, and input into the spatio-temporal correlation enhanced change detection model for training, yielding the trained spatio-temporal correlation enhanced change detection model. Beijing-2 (BJ-2) satellite remote sensing images of the training and validation test areas with 0.8 m spatial resolution, acquired in July and September 2020 and January and May 2021 and preprocessed with radiometric correction and precise geometric correction, were collected and visually interpreted, and the bare land in the remote sensing images at different phases was annotated with a polygon vector base map. The images of the two phases July and September 2020 were selected, and binary bare land change labels for the two-phase images were generated from the bare land distribution in the polygon vector base map; the two-phase images and the corresponding binary labels were clipped in the same way into standard image blocks of 256×256 pixels and saved as png images, and after sample screening the front phase image A, back phase image B and binary label (i.e., the above real change area) of the bare land change detection sample set were obtained. The images of the two phases January and May 2021 were processed in the same way, yielding 1083 groups of samples, which were randomly divided at a ratio of 8:2:1 into a training set of 779 groups, a validation set of 198 groups and a test set of 109 groups; the proportion of bare land change target pixels to background pixels is approximately 1:4. The prepared bare land change detection data set is input into the spatio-temporal correlation enhanced change detection model, optimal hyperparameters are tuned and selected for model training, and as the number of iterations increases the training loss keeps decreasing until it stabilizes and the model converges, completing the training of the model.
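For concreteness, a generic training step consistent with this description is sketched below; the optimizer, learning rate and loss function are assumptions, since the embodiment states only that optimal hyperparameters were selected and training ran until the loss stabilized, and a trivial stand-in module takes the place of the change detection model.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(6, 2, kernel_size=1)  # toy stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(img_a, img_b, label):
    """One optimization step on a batch of image pairs and binary labels."""
    optimizer.zero_grad()
    logits = model(torch.cat([img_a, img_b], dim=1))  # B x 2 x H x W
    loss = criterion(logits, label)                   # label: B x H x W
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(2, 3, 256, 256),
                  torch.randn(2, 3, 256, 256),
                  torch.randint(0, 2, (2, 256, 256)))
```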
As shown in fig. 5C, an embodiment of the present application provides a method for making a remote sensing image bare land change detection sample data set, which includes steps S501 to S505a as follows:
step S501: an A-phase high-resolution image subjected to radiometric correction and precise geometric correction and a B-phase high-resolution image subjected to radiometric correction and precise geometric correction;
step S502a: labeling the change of bare land;
step S503a: bare land change labels;
step S504a: cutting;
step S505a: and (5) sample screening.
Here, the method of sample screening may include: retaining image pairs in which the ratio of target pixels to background pixels is greater than 1:4.
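A hedged reading of this screening rule is sketched below; the source text is terse, so interpreting it as retaining pairs whose changed-pixel to background-pixel ratio exceeds 1:4 is an assumption.

```python
import numpy as np

def keep_sample(label_block, min_ratio=1 / 4):
    """Return True when the binary label block has a target (changed bare
    land, value 1) to background (value 0) pixel ratio above min_ratio."""
    target = int((label_block == 1).sum())
    background = int((label_block == 0).sum())
    return background > 0 and target / background > min_ratio
```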
As shown in fig. 5C, an embodiment of the present application provides an operation method of a remote sensing image bare land change detection system, which includes steps S501 to S506b as follows:
step S501: an A-phase high-resolution image subjected to radiometric correction and precise geometric correction and a B-phase high-resolution image subjected to radiometric correction and precise geometric correction;
step S502b: cutting;
step S503b: predicting;
here, the cut image blocks are predicted by the trained spatio-temporal correlation enhanced change detection model.
step S504b: splicing;
step S505b: recovering the position information;
step S506b: extracting the change detection result.
The operation method of the remote sensing image bare land change detection system provided by the embodiment of the application has the following beneficial effects:
1. Existing change detection methods pay little attention to bare land and do not consider the large spatial-scale differences of bare land objects; in addition, most deep learning change detection models have a large model size and parameter count, are unsuited to small samples, and train inefficiently; and twin-structure change detection models mostly fail to fully consider the temporal correlation between the two-phase images. The embodiment of the application constructs a spatio-temporal correlation enhanced change detection model and introduces a Transformer to overcome the limited receptive field of CNNs, which makes global image features hard to obtain and large-scale feature extraction unsatisfactory. Compared with a conventional Transformer, which models interaction between every pair of pixels, the lightweight MobileVIT only models interaction between pixels at the same relative position, greatly reducing computation; multiple sequences can be computed in parallel, training efficiency is high, small-sample data sets are better accommodated, and overfitting is avoided. By connecting the vector sequences of the two phases, the time correlation enhanced MobileVIT module enables the Transformer's multi-head attention mechanism to attend simultaneously to information from different phases and different positions, enhancing the temporal correlation between the two-phase images, so that feature extraction of the two-phase images is closely linked rather than carried out independently.
2. Most existing deep-learning change detection technologies focus on models and algorithms, perform change detection on fixed-size standard image blocks, and establish no automated flow for extracting changes from large-area two-phase remote sensing images, so processing is complex and inefficient. The embodiment of the application provides a method and system for detecting bare land change in remote sensing images that considers temporal correlation: two-phase remote sensing images, after radiometric correction and precise geometric correction, are input into the system, which directly outputs bare land change raster and vector results. It can accurately and quickly extract, automatically, the bare land changes relevant to environmental governance from the two-phase remote sensing images, facilitates evaluation, analysis and interpretation of bare land governance effects, and can meet the practical application requirements of bare land monitoring and governance in fine urban environmental management.
Based on the foregoing embodiments, the embodiments of the present application provide a training device for a change detection model of a remote sensing image. The modules included in the device, and the units included in each module, may be implemented by a processor in a computer device; in implementation, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), or the like.
Fig. 6 is a schematic structural diagram of a training device for a change detection model of a remote sensing image according to an embodiment of the present application, where the change detection model includes a feature extraction module, a difference module and a prediction module, the feature extraction module includes two branches sharing a time correlation sub-module, and each of the branches further includes an initial feature extraction sub-module. As shown in fig. 6, a training device 600 for a change detection model of a remote sensing image includes: a first acquisition module 610, a first extraction module 620, a first association module 630, a first determination module 640, a first classification module 650, and a training module 660, wherein:
a first obtaining module 610, configured to obtain a set of image sample pairs, where the image sample pairs are two-phase remote sensing images in the same space;
a first extraction module 620, configured to perform feature extraction on one image sample in each of the image sample pairs by using the initial feature extraction sub-module of each branch, so as to obtain a first feature map pair set;
a first correlation module 630, configured to perform a time correlation enhancement process on a first feature map pair in the first feature map pair set by using the time correlation sub-module, so as to obtain an updated first feature map pair;
a first determining module 640, configured to determine, using the difference module, a difference feature map between the corresponding image sample pair based on the updated first feature map pair;
the first classification module 650 is configured to perform classification processing on the difference feature map by using the prediction module, so as to obtain a predicted change area of the target object in the corresponding image sample pair;
and a training module 660, configured to train the change detection model to converge based on the predicted change region and the actual change region of the target object in each of the image sample pairs in the image sample pair set.
In some embodiments, the first feature map pair set includes first feature map pairs at different resolutions; the first association module 630 includes: a determining submodule, configured to determine a first feature map pair at at least one target resolution in the first feature map pair set; and a first correlation sub-module, configured to perform time correlation enhancement processing on the first feature map pair at each target resolution by using the time correlation sub-module, to obtain at least one updated first feature map pair.
In some embodiments, the first determining module 640 includes: a first determining unit, configured to determine a first feature map pair at at least one non-target resolution in the first feature map pair set; a second determining unit, configured to determine a first difference feature map between each updated first feature map pair and a second difference feature map between each first feature map pair at the non-target resolution, respectively; and a fusion unit, configured to fuse each first difference feature map and each second difference feature map to obtain a difference feature map between the corresponding image sample pair.
In some embodiments, the time correlation submodule includes a local characterization unit, a global characterization unit, and a fusion unit that are improved based on the twin MobileVIT module, and the first correlation module 630 includes: the first dividing sub-module is used for carrying out region division and position coding on the first characteristic diagram pairs in the first characteristic diagram pair set by adopting the local characterization unit to obtain second characteristic diagram pairs; the second association submodule is used for carrying out time association enhancement processing on the second feature map pair by adopting the global characterization unit to obtain a third feature map pair; and the first fusion sub-module is used for fusing each third characteristic diagram in the third characteristic diagram pair with the corresponding first characteristic diagram by adopting the fusion unit to obtain an updated first characteristic diagram pair.
In some embodiments, the global characterization unit includes an unfolding subunit, a channel connection subunit, an association subunit, a channel separation subunit, and a folding subunit; the second association sub-module includes: the first unfolding unit is used for unfolding each second characteristic diagram of the second characteristic diagram pair by adopting the unfolding subunit to obtain a fourth characteristic diagram pair; the first splicing unit is used for splicing the fourth characteristic diagrams in the fourth characteristic diagram pair by adopting the channel connection subunit; the first association unit is used for carrying out time association enhancement processing on the spliced fourth feature images by adopting the association subunit to obtain fifth feature images; the first separation unit is used for separating the fifth characteristic diagram by adopting the channel separation unit according to the reverse direction of the splicing mode of the channel connection subunit to obtain a sixth characteristic diagram pair; and the first folding unit is used for carrying out folding processing on each sixth characteristic diagram in the sixth characteristic diagram pair by adopting the folding subunit to obtain a third characteristic diagram pair.
The embodiment of the application provides a change detection device of a remote sensing image, which is applied to a change detection model of the remote sensing image, wherein the change detection model comprises a feature extraction module, a difference module and a prediction module, the feature extraction module comprises two branches sharing a time correlation sub-module, each branch further comprises an initial feature extraction sub-module, and the device comprises:
the second acquisition module is used for acquiring an image pair to be detected, where the image pair to be detected is a pair of image blocks of a preset size, respectively cut from two remote sensing images to be detected and corresponding to the same space;
the second extraction module is used for extracting the characteristics of one to-be-detected image in the to-be-detected image pairs by adopting the initial characteristic extraction sub-module of each branch respectively to obtain a first characteristic image pair set;
the second association module is used for carrying out time association enhancement processing on the first feature map pairs in the first feature map pair set by adopting the time association sub-module to obtain updated first feature map pairs;
the second determining module is used for determining, by using the difference module, a difference feature map between the image pair to be detected based on the updated first feature map pair;
And the second classification module is used for classifying the difference feature images by adopting the prediction module to obtain a change area of the target object in the image pair to be detected.
In some embodiments, the time correlation sub-module includes a local characterization unit, a global characterization unit and a fusion unit improved based on the twin MobileVIT module,
the second association module includes: the second dividing sub-module is used for carrying out region division and position coding on the first characteristic diagram pairs in the first characteristic diagram pair set by adopting the local characterization unit to obtain second characteristic diagram pairs; the third association sub-module is used for carrying out time association enhancement processing on the second feature map pair by adopting the global characterization unit to obtain a third feature map pair; and the second fusion sub-module is used for fusing each third characteristic diagram in the third characteristic diagram pair with the corresponding first characteristic diagram by adopting the fusion unit to obtain an updated first characteristic diagram pair.
In some embodiments, the global characterization unit includes an unfolding subunit, a channel connection subunit, an association subunit, a channel separation subunit, and a folding subunit; the third association sub-module includes: the second unfolding unit is used for unfolding each second characteristic diagram of the second characteristic diagram pair by adopting the unfolding subunit to obtain a fourth characteristic diagram pair; the second splicing unit is used for splicing the fourth characteristic diagrams in the fourth characteristic diagram pair by adopting the channel connection subunit; the second association unit is used for carrying out time association enhancement processing on the spliced fourth feature images by adopting the association subunit to obtain a fifth feature image; the second separation unit is used for separating the fifth characteristic diagram by adopting the channel separation unit according to the reverse direction of the splicing mode of the channel connection subunit to obtain a sixth characteristic diagram pair; and the second folding unit is used for carrying out folding processing on each sixth characteristic diagram in the sixth characteristic diagram pair by adopting the folding subunit to obtain a third characteristic diagram pair.
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to perform the methods described in the embodiments of the methods, and for technical details that are not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the description of the embodiments of the methods of the present disclosure for understanding.
It should be noted that, in the embodiments of the present application, if the remote sensing image change detection and model training methods are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, an optical disk, or other media capable of storing program code. Thus, embodiments of the application are not limited to any specific hardware, software, or firmware, or any combination of hardware, software, and firmware.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor realizes part or all of the steps in the method when executing the program.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
Embodiments of the present application provide a computer program comprising computer readable code which, when run in a computer device, causes a processor in the computer device to perform some or all of the steps for carrying out the above method.
Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, in other embodiments the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, the storage medium, the computer program and the computer program product of the present application, reference should be made to the description of the embodiments of the method of the present application.
It should be noted that fig. 7 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application, and as shown in fig. 7, the hardware entity of the computer device 700 includes: a processor 701, a communication interface 702, and a memory 703, wherein:
the processor 701 generally controls the overall operation of the computer device 700.
Communication interface 702 may enable the computer device to communicate with other terminals or servers over a network.
The memory 703 is configured to store instructions and applications executable by the processor 701, and may also cache data (e.g., image data) to be processed or processed by various modules in the processor 701 and the computer device 700, which may be implemented by FLASH memory (FLASH) or random access memory (Random Access Memory, RAM). Data transfer may occur between the processor 701, the communication interface 702 and the memory 703 via the bus 704.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the steps/processes described above do not imply an order of execution; the execution order of each step/process should be determined by its functions and inherent logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for description and do not represent the superiority or inferiority of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps implementing the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes: a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk, an optical disk, or other media capable of storing program code.
Alternatively, the above-described integrated units of the present application, if implemented in the form of software function modules and sold or used as independent products, may also be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, a ROM, a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application.

Claims (10)

1. A training method of a change detection model of a remote sensing image, wherein the change detection model comprises a feature extraction module, a difference module and a prediction module, the feature extraction module comprises two branches sharing a time correlation sub-module, each branch further comprises an initial feature extraction sub-module, the method comprises:
acquiring an image sample pair set, wherein the image sample pair is a two-phase remote sensing image in the same space;
the initial feature extraction submodule of each branch is adopted to carry out feature extraction on one image sample in each image sample pair, and a first feature map pair set is obtained;
performing time relevance enhancement processing on the first feature map pairs in the first feature map pair set by adopting the time relevance sub-module to obtain updated first feature map pairs;
determining a difference feature map between corresponding image sample pairs based on the updated first feature map pairs by adopting the difference module;
Classifying the difference feature map by adopting the prediction module to obtain a prediction change area of a target object in the corresponding image sample pair;
and training the change detection model to be converged based on the predicted change area and the actual change area of the target object in each image sample pair in the image sample pair set.
2. The method of claim 1, wherein the first set of feature map pairs comprises first feature map pairs at different resolutions;
the step of performing time relevance enhancement processing on the first feature map pairs in the first feature map pair set by using the time relevance sub-module to obtain updated first feature map pairs includes:
determining a first feature map pair at at least one target resolution in the first feature map pair set;
and carrying out time relevance enhancement processing on the first feature map pairs under each target resolution by adopting the time relevance sub-module to obtain at least one updated first feature map pair.
3. The method of claim 2, wherein determining a difference feature map between a corresponding image sample pair based on the updated first feature map pair comprises:
determining a first feature map pair at at least one non-target resolution in the first feature map pair set;
determining a first difference feature map between each updated first feature map pair and a second difference feature map between each first feature map pair at the non-target resolution respectively;
and fusing each first difference feature map and each second difference feature map to obtain a difference feature map between corresponding image sample pairs.
4. The method according to any one of claims 1 to 3, wherein the time correlation sub-module comprises a local characterization unit, a global characterization unit and a fusion unit improved based on a twin MobileVIT module,
the step of performing time relevance enhancement processing on the first feature map pairs in the first feature map pair set by using the time relevance sub-module to obtain updated first feature map pairs includes:
performing region division and position coding on the first feature map pairs in the first feature map pair set by adopting the local characterization unit to obtain second feature map pairs;
performing time relevance enhancement processing on the second feature map pair by adopting the global characterization unit to obtain a third feature map pair;
And fusing each third characteristic diagram in the third characteristic diagram pair with the corresponding first characteristic diagram by adopting the fusion unit to obtain an updated first characteristic diagram pair.
5. The method of claim 4, wherein the global characterization unit comprises an unfolding subunit, a channel connection subunit, an association subunit, a channel separation subunit, and a folding subunit;
and performing time relevance enhancement processing on the second feature map pair by adopting the global characterization unit to obtain a third feature map pair, wherein the method comprises the following steps:
expanding each second characteristic diagram of the second characteristic diagram pair by adopting the expanding subunit to obtain a fourth characteristic diagram pair;
splicing the fourth characteristic diagram in the fourth characteristic diagram pair by adopting the channel connection subunit;
performing time relevance enhancement processing on the spliced fourth feature images by adopting the relevance subunit to obtain fifth feature images;
separating the fifth characteristic diagram by adopting the channel separation sub-unit according to the reverse direction of the splicing mode of the channel connection sub-unit to obtain a sixth characteristic diagram pair;
and carrying out folding processing on each sixth characteristic diagram in the sixth characteristic diagram pair by adopting the folding subunit to obtain a third characteristic diagram pair.
6. A method for detecting changes in a remote sensing image, the method being applied to a change detection model of the remote sensing image, the change detection model comprising a feature extraction module, a difference module and a prediction module, the feature extraction module comprising two branches sharing a time-dependent sub-module, and each of the branches further comprising an initial feature extraction sub-module, the method comprising:
acquiring an image pair to be detected, wherein the image pair to be detected is a pair of image blocks of a preset size, respectively cut from two remote sensing images to be detected and corresponding to the same space;
the initial feature extraction sub-module of each branch is adopted to extract features of one to-be-detected image in the to-be-detected image pairs, and a first feature image pair set is obtained;
performing time relevance enhancement processing on the first feature map pairs in the first feature map pair set by adopting the time relevance sub-module to obtain updated first feature map pairs;
determining, by adopting the difference module, a difference feature map between the image pair to be detected based on the updated first feature map pair;
and classifying the difference feature map by adopting the prediction module to obtain a change region of the target object in the image pair to be detected.
7. The method of claim 6, wherein the time correlation sub-module comprises a local characterization unit, a global characterization unit and a fusion unit improved based on a twin MobileVIT module,
the step of performing time relevance enhancement processing on the first feature map pairs in the first feature map pair set by using the time relevance sub-module to obtain updated first feature map pairs includes:
performing region division and position coding on the first feature map pairs in the first feature map pair set by adopting the local characterization unit to obtain second feature map pairs;
performing time relevance enhancement processing on the second feature map pair by adopting the global characterization unit to obtain a third feature map pair;
and fusing each third characteristic diagram in the third characteristic diagram pair with the corresponding first characteristic diagram by adopting the fusion unit to obtain an updated first characteristic diagram pair.
8. The method of claim 7, wherein the global characterization unit comprises an unfolding subunit, a channel connection subunit, an association subunit, a channel separation subunit, and a folding subunit;
and performing time relevance enhancement processing on the second feature map pair by adopting the global characterization unit to obtain a third feature map pair, wherein the method comprises the following steps:
Expanding each second characteristic diagram of the second characteristic diagram pair by adopting the expanding subunit to obtain a fourth characteristic diagram pair;
splicing the fourth characteristic diagram in the fourth characteristic diagram pair by adopting the channel connection subunit;
performing time relevance enhancement processing on the spliced fourth feature images by adopting the relevance subunit to obtain fifth feature images;
separating the fifth characteristic diagram by adopting the channel separation sub-unit according to the reverse direction of the splicing mode of the channel connection sub-unit to obtain a sixth characteristic diagram pair;
and carrying out folding processing on each sixth characteristic diagram in the sixth characteristic diagram pair by adopting the folding subunit to obtain a third characteristic diagram pair.
9. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 8 when the program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, realizes the steps in the method according to any one of claims 1 to 8.
CN202310576700.4A 2023-05-19 2023-05-19 Remote sensing image change detection and model training method, device and storage medium thereof Pending CN116681930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310576700.4A CN116681930A (en) 2023-05-19 2023-05-19 Remote sensing image change detection and model training method, device and storage medium thereof

Publications (1)

Publication Number Publication Date
CN116681930A true CN116681930A (en) 2023-09-01

Family

ID=87777902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310576700.4A Pending CN116681930A (en) 2023-05-19 2023-05-19 Remote sensing image change detection and model training method, device and storage medium thereof

Country Status (1)

Country Link
CN (1) CN116681930A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911879A (en) * 2024-03-20 2024-04-19 湘潭大学 SAM-fused fine-granularity high-resolution remote sensing image change detection method
CN117911879B (en) * 2024-03-20 2024-06-07 湘潭大学 SAM-fused fine-granularity high-resolution remote sensing image change detection method

Similar Documents

Publication Publication Date Title
CN112541503B (en) Real-time semantic segmentation method based on context attention mechanism and information fusion
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN110969124A (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN110689021A (en) Real-time target detection method in low-visibility environment based on deep learning
CN111860233B (en) SAR image complex building extraction method and system based on attention network selection
CN113487618B (en) Portrait segmentation method, portrait segmentation device, electronic equipment and storage medium
CN110866494A (en) Optical remote sensing image-based town group extraction method and system
CN114187520B (en) Building extraction model construction and application method
CN114359130A (en) Road crack detection method based on unmanned aerial vehicle image
CN116681930A (en) Remote sensing image change detection and model training method, device and storage medium thereof
CN116343052B (en) Attention and multiscale-based dual-temporal remote sensing image change detection network
CN117496347A (en) Remote sensing image building extraction method, device and medium
Feng et al. A novel saliency detection method for wild animal monitoring images with WMSN
Yu et al. Frequency feature pyramid network with global-local consistency loss for crowd-and-vehicle counting in congested scenes
CN115601236A (en) Remote sensing image super-resolution reconstruction method based on characteristic information distillation network
CN114926734A (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN114596503A (en) Road extraction method based on remote sensing satellite image
CN112800932B (en) Method for detecting remarkable ship target in offshore background and electronic equipment
Bashmal et al. Language Integration in Remote Sensing: Tasks, datasets, and future directions
CN112651987A (en) Method and system for calculating grassland coverage of sample
Qu An approach based on object detection for image forensics
CN110910496B (en) VR natural environment automatic construction method based on big data and AI
CN114972864A (en) Hyperspectrum and laser radar fusion classification method based on shuffle feature enhancement
CN112597825A (en) Driving scene segmentation method and device, electronic equipment and storage medium
Alamayreh et al. Which country is this picture from? new data and methods for dnn-based country recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination