CN115713529A - Light-weight optical remote sensing image change detection method based on efficient attention - Google Patents

Light-weight optical remote sensing image change detection method based on efficient attention

Info

Publication number
CN115713529A
Authority
CN
China
Prior art keywords
remote sensing
attention
image
feature
change
Prior art date
Legal status
Pending
Application number
CN202211524552.3A
Other languages
Chinese (zh)
Inventor
李军伟
李世杰
杨伟
金勇
郭凌辉
Current Assignee
Henan University
Original Assignee
Henan University
Priority date
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202211524552.3A
Publication of CN115713529A
Legal status: Pending



Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a lightweight optical remote sensing image change detection method based on efficient attention, comprising the following steps: first, the optical remote sensing images are preprocessed and the corresponding change label map is obtained; the samples are then cropped to produce training samples; next, the bi-temporal images are concatenated and passed through a FOCUS module, depth residual blocks, and a lightweight attention module to obtain refined feature maps at different scales, after which a multi-scale feature fusion module aggregates the resulting feature maps to generate a change map; after training, all parameter information of the model is saved; finally, the preprocessed sample to be detected is input into the change detection model, and the detection result map is output through calculation. The scheme of the invention employs a FOCUS downsampling layer, depth residual blocks that enlarge the receptive field, an efficient attention mechanism, and a multi-scale feature fusion module to extract change regions with few parameters and little computation.

Description

Light-weight optical remote sensing image change detection method based on efficient attention
Technical Field
The invention relates to the field of optical remote sensing image change detection, and in particular to a lightweight optical remote sensing image change detection method based on efficient attention.
Background
Remote sensing image change detection is a technique for obtaining change information by comparing and analyzing two or more remote sensing images of the same region acquired at different times. Different application areas define change differently, for example agricultural surveys, forest monitoring, urban expansion, and disaster assessment. In recent years, with the rapid development of satellite remote sensing technology and computer vision, remote sensing image change detection has become an active research topic.
With the rapid development of computer technology and the continual growth of high-resolution optical remote sensing image datasets, researchers have proposed many change detection methods based on deep learning. Deep learning change detection methods possess nonlinearity and excellent feature extraction ability: they predict pixel classification maps and highly abstract spatial context from the original images, blurring the boundary between traditional pixel-based and object-based methods and better understanding complex scenes. Because deep learning change detection methods require no image preprocessing, they both reduce manual intervention and avoid the errors preprocessing introduces, so the use of deep-learning-based remote sensing image change detection has grown exponentially. Current mainstream deep learning change detection methods can be roughly divided into two types: change detection methods based on convolutional neural networks (CNNs) and change detection methods based on Transformers. CNNs are widely used in deep learning by virtue of their powerful feature learning ability, and in recent years many CNN-based change detection methods [1]-[7] have been proposed. The self-attention mechanism [8] is widely used in natural language processing to find correlations between different parts of the input; Vision Transformer [9] and Swin Transformer [10] introduced the self-attention mechanism into computer vision and improved it. Networks based on self-attention and Transformers model long-range dependencies of the input feature maps through non-local self-attention, and on this basis many researchers have introduced them into change detection to obtain better detection performance [11-13].
However, deep learning change detection methods based on CNNs and Transformers still face problems. Simple models [1-2,7] have few parameters and fast inference, but their change detection performance is too low to accurately identify change regions. Complex models [3-6,11-13] use more modules, larger structures, and more complex training processes; their change detection ability is greatly improved, but their large parameter counts and slow inference limit their application in large-scale remote sensing image processing, industrial settings, and real-time applications.
Disclosure of Invention
The aim of the invention is to provide a lightweight optical remote sensing image change detection method based on efficient attention that achieves good change detection performance with only 0.88 MB of parameters and an inference time of 4.75 ms.
The technical scheme adopted by the invention is as follows:
A. sequentially performing orthorectification, image registration, image stretching and numerical normalization preprocessing on the bi-temporal optical remote sensing images to obtain remote sensing images with consistent data distribution;
B. labeling the updated parts in the preprocessed bi-temporal optical remote sensing images obtained in step A to obtain the corresponding change label map;
C. cropping the change label map obtained in step B and the preprocessed bi-temporal optical remote sensing images obtained in step A to the same size to obtain training samples;
D. concatenating the bi-temporal remote sensing images in the training samples;
E. downsampling the concatenated image pairs with a FOCUS module, and inputting the downsampled feature maps into a depth residual block (DRB) for encoding to extract feature maps related to the change regions;
F. inputting the feature maps obtained in step E into an efficient attention module (EAM) to refine the feature maps;
G. inputting the features of different scales obtained in step F into a multi-scale feature fusion module (MFFM) to obtain the final feature map X;
H. inputting the final fused feature X into a prediction head (FIG. 2(c)) formed by a 1 × 1 convolution to obtain the predicted change map of the bi-temporal training images;
I. combining the binary cross-entropy loss and the Dice loss into a hybrid loss function to compute the loss between the predicted change map obtained in step H and the corresponding label map;
J. after training, saving the weight parameters and hyper-parameters of the trained change detection model;
K. sequentially performing orthorectification, image registration, image stretching and numerical normalization preprocessing on the pre- and post-change remote sensing images to be detected, and then cropping them to the same size to obtain the samples to be detected;
L. inputting the samples to be detected into the change detection model obtained in step J, and outputting the predicted change map of the samples to be detected through calculation (a minimal sketch of this training and inference flow follows).
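As an aid to understanding, the following is a minimal, runnable PyTorch sketch of the training flow in steps D through J. The two-layer stand-in network, the random tensors, and the file name are illustrative assumptions that stand in for the LCDNet components and the preprocessed samples detailed later in the description; they are not the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the full LCDNet: any module mapping the 6-channel concatenated
# image pair to a 1-channel change-probability map could slot in here.
model = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

t1 = torch.rand(4, 3, 256, 256)                     # pre-change crops (step C)
t2 = torch.rand(4, 3, 256, 256)                     # post-change crops
label = (torch.rand(4, 1, 256, 256) > 0.9).float()  # change label maps

x = torch.cat([t1, t2], dim=1)                      # step D: channel-wise concat
pred = model(x)                                     # steps E-H: predicted change map
bce = F.binary_cross_entropy(pred, label)           # step I: cross-entropy part
dice = 1 - 2 * (pred * label).sum() / (pred.sum() + label.sum() + 1e-7)
loss = bce + dice                                   # hybrid loss L = L_bce + L_dice
optimizer.zero_grad()
loss.backward()
optimizer.step()
torch.save(model.state_dict(), "lcdnet_demo.pth")   # step J: save weight parameters
```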
The invention provides a lightweight optical remote sensing image change detection method based on efficient attention (Lightweight Change Detection Network, LCDNet), which takes optical remote sensing image change detection as its application background and addresses the difficulty existing change detection methods have in balancing detection performance against model parameters. The detection method achieves good change detection performance and fast inference with few parameters. Specifically, the invention designs an end-to-end lightweight change detection method and approaches the problem from four aspects: the downsampling layer, the convolution scheme, the attention mechanism, and feature fusion, so as to meet the requirements of fewer parameters and higher change detection performance. The invention reduces the number of model parameters by placing the downsampling layer at the beginning of each encoding layer rather than at the end. Because downsampling at the beginning of an encoding layer can lose part of the feature information, the invention introduces the FOCUS module, widely used in object detection, to solve this problem: the FOCUS module achieves twofold downsampling of the feature map while ensuring that no information is lost. The invention uses depthwise (DW) convolutions with large kernels in the network, which enlarges the receptive field while greatly compressing the parameters and computation. To make the network attend more to change regions and improve its change detection performance, the invention designs an efficient attention module (EAM); the EAM adds the channel-dimension weights obtained by a fast one-dimensional convolution to the spatial-dimension weights obtained by a single-layer two-dimensional convolution and reassigns the weights, thereby preserving the correlation between channel and spatial features. The invention also designs a multi-scale feature fusion module (MFFM) that uses only effective feature streams during feature fusion; the MFFM achieves effective fusion of multi-scale features with a simple structure and few parameters. Compared with traditional algorithms, the scheme of the invention achieves better change detection performance with fewer parameters and faster inference, and effectively addresses the difficulty of applying high-performance change detection algorithms in industrial settings or real-time applications.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic diagram of the LCDNet of the present invention.
FIG. 3 is a diagram of the efficient attention module of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of the present invention.
As shown in FIG. 1, the present invention comprises the following steps:
A. sequentially performing orthorectification, image registration, image stretching and numerical normalization preprocessing on the bi-temporal optical remote sensing images to obtain remote sensing images with consistent data distribution;
B. labeling the updated parts in the preprocessed bi-temporal optical remote sensing images obtained in step A (mainly including vegetation changes, newly built urban buildings, suburban expansion, foundations before construction, road expansion, and the like) to obtain the corresponding change label map;
C. cropping the change label map obtained in step B and the preprocessed bi-temporal optical remote sensing images obtained in step A to the same size to obtain training samples;
D. because the change detection task can be regarded as segmenting the change regions in the bi-temporal images, concatenating the pre- and post-change remote sensing images in the training samples into a single input;
E. encoding the concatenated remote sensing image obtained in step D. The input is first downsampled to reduce the number of model parameters: the first two encoder layers use a FOCUS module to downsample the feature map by a factor of two while ensuring that no feature information is lost, and the last two encoder layers perform the same twofold downsampling with MaxPool2d using a kernel of 3 and a stride of 2. Since the feature maps of shallow layers contain more detailed texture information, the invention uses the FOCUS module only in the first two encoding layers. A drawback of standard CNNs is the limited receptive field caused by the fixed small convolution kernels used in the network; to overcome this, recent work has focused on enlarging the receptive field with larger kernels. The invention therefore stacks depthwise convolution layers with 3 × 3 and 5 × 5 kernels, which enlarges the receptive field while greatly compressing the parameters and computation. To prevent network degradation as the number of layers increases, a 1 × 1 convolution is applied to the downsampled feature map to form a residual connection;
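As a concrete illustration of step E, the following is a minimal PyTorch sketch of the FOCUS downsampling operation and one depth residual block (DRB). The class names, channel widths, activation, and the exact residual wiring are illustrative assumptions under the description above, not the patented implementation.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Space-to-depth slicing: halves H and W without discarding any pixel,
    then fuses the 4x channels with a 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        # Interleaved sub-sampling: every pixel lands in one of four slices.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

class DepthResidualBlock(nn.Module):
    """Stacked 3x3 and 5x5 depthwise convolutions enlarge the receptive field
    cheaply; a 1x1 convolution forms the residual shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.dw3 = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        self.dw5 = nn.Conv2d(ch, ch, 5, padding=2, groups=ch)
        self.pw = nn.Conv2d(ch, ch, 1)    # pointwise channel mixing
        self.skip = nn.Conv2d(ch, ch, 1)  # 1x1 residual branch
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.pw(self.dw5(self.dw3(x)))
        return self.act(y + self.skip(x))

# The bi-temporal pair is concatenated channel-wise (step D) before encoding.
t1 = torch.randn(1, 3, 256, 256)
t2 = torch.randn(1, 3, 256, 256)
x = torch.cat([t1, t2], dim=1)                   # 6-channel input
feat = DepthResidualBlock(32)(Focus(6, 32)(x))
print(feat.shape)                                # torch.Size([1, 32, 128, 128])
```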
F. inputting the feature map F obtained from the encoder in step E into the designed efficient attention module (EAM) to further refine the features extracted in step E and thereby improve the change detection performance of the network; the efficient attention module is shown in FIG. 3. An average pooling layer is applied to the feature F to generate aggregated vectors of sizes C × 1 × 1 and 1 × H × W (C is the number of channels; H and W are the height and width of the feature map). A one-dimensional convolution with kernel size 3 is applied to the C × 1 × 1 vector to obtain the channel-dimension attention map, and a two-dimensional convolution with kernel size 7 is applied to the 1 × H × W vector to obtain the spatial-dimension attention map. The two attention maps are expanded to C × H × W, added, and the weights are reassigned to obtain an attention map M(F) that preserves the correlation between channel and spatial features. The input feature map is multiplied element by element with M(F) to obtain the refined feature map F':
M(F) = σ(C1D_3(AvgPool(F)) + C2D_7(AvgPool(F)))
F' = M(F) ⊗ F
where F denotes the feature map obtained after the depth residual block, AvgPool(·) denotes the average pooling operation, C1D_3(·) denotes a one-dimensional convolution with kernel size 3, C2D_7(·) denotes a two-dimensional convolution with kernel size 7, σ denotes the sigmoid function, ⊗ denotes element-wise multiplication, and F' denotes the weighted, refined feature map;
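The following is a minimal PyTorch sketch of the efficient attention module under the formulas above; the kernel sizes of 3 (channel branch) and 7 (spatial branch) follow the description, while the remaining wiring is an illustrative reading rather than the patented implementation.

```python
import torch
import torch.nn as nn

class EfficientAttention(nn.Module):
    def __init__(self):
        super().__init__()
        # Channel branch: fast 1-D convolution over the C x 1 x 1 vector.
        self.c1d = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
        # Spatial branch: single 2-D convolution over the 1 x H x W map.
        self.c2d = nn.Conv2d(1, 1, kernel_size=7, padding=3, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f):                            # f: (B, C, H, W)
        ch_vec = f.mean(dim=(2, 3))                  # (B, C): spatial avg pool
        sp_map = f.mean(dim=1, keepdim=True)         # (B, 1, H, W): channel avg pool
        ch_att = self.c1d(ch_vec.unsqueeze(1))       # (B, 1, C)
        ch_att = ch_att.transpose(1, 2).unsqueeze(-1)  # (B, C, 1, 1)
        sp_att = self.c2d(sp_map)                    # (B, 1, H, W)
        # Broadcast both maps to C x H x W, add, and reassign the weights.
        m = self.sigmoid(ch_att + sp_att)            # M(F)
        return f * m                                 # F' = M(F) ⊗ F

f = torch.randn(2, 32, 64, 64)
print(EfficientAttention()(f).shape)                 # torch.Size([2, 32, 64, 64])
```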
G. to effectively fuse the feature maps of different scales extracted in step F, the invention proposes a multi-scale feature fusion module (MFFM), as shown in FIG. 2(b). The fused feature maps are obtained as follows:
X_1 = C(F_4)
X_2 = C(C(X_1, F_3))
X_3 = C(C(X_1, X_2, F_2))
X_4 = C(X_2, X_3, F_1)
X = X_1 + X_2 + X_3 + X_4
where the function C(·) denotes the convolution operation using the convolution block (FIG. 2(d)) composed of a 1 × 1 convolution and a 3 × 3 DW convolution, F_1, F_2, F_3 and F_4 denote the refined feature maps obtained from the four depth residual blocks after the efficient attention module, and X_1, X_2, X_3 and X_4 denote the intermediate fusion feature maps. The final fused feature X is obtained by upsampling X_1, X_2, X_3 and X_4 and adding them. Because only effective feature streams and the designed convolution blocks are used during feature fusion, the proposed MFFM achieves effective fusion of multi-scale features with a simple structure and fewer parameters;
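A minimal PyTorch sketch of the MFFM under the equations above follows. The common channel width, the bilinear upsampling, and the reading of the nested C(·) applications as two stacked convolution blocks are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # C(.): a 1x1 convolution followed by a 3x3 depthwise (DW) convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 1),
        nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch),
    )

class MFFM(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.c1 = conv_block(ch, ch)                        # X1 = C(F4)
        self.c2 = nn.Sequential(conv_block(2 * ch, ch),
                                conv_block(ch, ch))         # X2 = C(C(X1, F3))
        self.c3 = nn.Sequential(conv_block(3 * ch, ch),
                                conv_block(ch, ch))         # X3 = C(C(X1, X2, F2))
        self.c4 = conv_block(3 * ch, ch)                    # X4 = C(X2, X3, F1)

    def forward(self, f1, f2, f3, f4):
        # f1 is the finest scale, f4 the coarsest; all projected to `ch` channels.
        def up(t, ref):
            return F.interpolate(t, size=ref.shape[-2:], mode='bilinear',
                                 align_corners=False)
        x1 = self.c1(f4)
        x2 = self.c2(torch.cat([up(x1, f3), f3], dim=1))
        x3 = self.c3(torch.cat([up(x1, f2), up(x2, f2), f2], dim=1))
        x4 = self.c4(torch.cat([up(x2, f1), up(x3, f1), f1], dim=1))
        # Final fusion: upsample every Xi to the finest scale and add element-wise.
        return up(x1, f1) + up(x2, f1) + up(x3, f1) + x4

feats = [torch.randn(1, 32, s, s) for s in (64, 32, 16, 8)]
print(MFFM(32)(*feats).shape)                               # torch.Size([1, 32, 64, 64])
```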
H. inputting the final fused feature X into a prediction head (FIG. 2(c)) formed by a 1 × 1 convolution to obtain the predicted change map of the bi-temporal training images;
I. the binary cross-entropy loss commonly used in two-class classification,
L_bce = -(1/(n·m)) Σ_{i=1}^{n} Σ_{j=1}^{m} [ y_{i,j} log(ŷ_{i,j}) + (1 − y_{i,j}) log(1 − ŷ_{i,j}) ],
and the Dice loss, which mitigates the foreground/background imbalance in the samples,
L_dice = 1 − 2 Σ_{i,j} y_{i,j} ŷ_{i,j} / ( Σ_{i,j} y_{i,j} + Σ_{i,j} ŷ_{i,j} ),
are combined into the hybrid loss L = L_bce + L_dice to compute the loss between the predicted change map of the bi-temporal training images obtained in step H and the corresponding label map, where y_{i,j} denotes the probability that pixel (i, j) in the label map is a changed pixel, ŷ_{i,j} denotes the probability that pixel (i, j) in the predicted change map is a changed pixel, and n and m denote the width and height of the image in pixels;
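The following is a small PyTorch sketch of the hybrid loss under the standard definitions of binary cross-entropy and Dice loss given above; the epsilon smoothing term is an illustrative assumption.

```python
import torch

def hybrid_loss(pred, target, eps=1e-7):
    # pred: predicted change probabilities in (0, 1); target: 0/1 change labels.
    pred = pred.clamp(eps, 1.0 - eps)
    # Binary cross-entropy averaged over all n x m pixels (L_bce).
    l_bce = -(target * pred.log() + (1 - target) * (1 - pred).log()).mean()
    # Dice loss mitigating the foreground/background imbalance (L_dice).
    inter = (pred * target).sum()
    l_dice = 1.0 - 2.0 * inter / (pred.sum() + target.sum() + eps)
    return l_bce + l_dice

pred = torch.rand(2, 1, 256, 256)
target = (torch.rand(2, 1, 256, 256) > 0.9).float()
print(hybrid_loss(pred, target))
```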
J. after training, saving the weight parameters and hyper-parameter information of the trained change detection model;
K. sequentially performing orthorectification, image registration, image stretching and numerical normalization preprocessing on the pre- and post-change remote sensing images to be detected, and cropping them to the same size to obtain the samples to be detected;
L. inputting the samples to be detected into the change detection model saved in step J, and outputting the predicted change map of the samples to be detected through calculation.
To address the problems that existing high-performance change detection methods have large parameter counts and slow inference, making them difficult to deploy in industrial settings or real-time applications, the invention employs a FOCUS module at the beginning of the encoder, depth residual blocks in the encoder that enlarge the receptive field, an efficient attention module that refines the features extracted by the encoder, and a multi-scale fusion module that makes full use of feature information at different scales. Specifically, the FOCUS downsampling layer is placed at the beginning of each encoding layer to reduce the number of model parameters without losing information; the depth residual block enlarges the receptive field and greatly compresses the parameters and computation through DW convolutions with 3 × 3 and 5 × 5 kernels; the efficient attention mechanism refines the feature maps with a negligible increase in parameters to improve the change detection performance of the network; and the multi-scale feature fusion module achieves effective fusion of multi-scale features using only effective feature streams and a small number of parameters.
The present invention was evaluated on the change detection (CDD) dataset [1], which contains multiple types of changes. To verify the effectiveness of the proposed LCDNet, the following thirteen advanced remote sensing image change detection methods were selected for comparison with the method of the present invention and are briefly introduced.
FC-EF (Fully Convolutional-Early Fusion) [1] is based on the U-Net architecture: the bi-temporal images are concatenated into a multi-band image for input, and skip connections gradually transmit multi-scale features from the encoder to the decoder to recover spatial information. FC-Siam-conc (Fully Convolutional-Siamese-Concatenation) [1], a variant of FC-EF, uses a Siamese encoder to extract features of the bi-temporal images and then connects the same-level encoder features to the decoder. FC-Siam-diff (Fully Convolutional-Siamese-Difference) [1] is another variant of FC-EF; unlike FC-Siam-conc, its skip connections convey the absolute difference between the two temporal features. CDNet [2] was designed for street-scene change detection; it consists of contraction and expansion blocks, and the change map is obtained through a softmax layer. DDCNN (Difference-enhanced Dense-attention Convolutional Neural Network) [3] simplifies UNet++; it incorporates a dense attention method into feature fusion and uses high-level features to guide the selection of low-level features so as to retain the texture and detail information of change regions. DSIFN (Deeply Supervised Image Fusion Network) [4] uses channel and spatial attention to repeatedly fuse, across multiple scales, the feature maps obtained from a pre-trained VGG16 model so as to obtain change maps more accurately. SNUNet-CD (Siamese NestedUNet-Change Detection) [5] combines a Siamese network with a UNet++ network and uses an Ensemble Channel Attention Module (ECAM) to fuse the feature maps obtained from the backbone at multiple semantic levels, thereby suppressing localization errors and semantic gaps. RDP-Net (Region Detail Preserving Network) [6] is a ConvMixer-based network that achieves good change detection performance with only a small number of parameters; it proposes an easy-to-hard training scheme for learning detail information and an edge loss that focuses on boundary details. LSNet (Lightweight Siamese Network) [7] replaces standard convolutions with depthwise separable atrous convolutions. LSNet_denseFPN fuses multi-scale features using the denseFPN (dense Feature Pyramid Network) proposed by SNUNet-CD, while LSNet_diffFPN proposes a diffFPN (difference Feature Pyramid Network) based on denseFPN; LSNet_diffFPN eliminates redundant dense connections and keeps only the effective feature streams during Siamese feature fusion, thereby compressing the parameters and computation. STANet (Spatial-Temporal Attention Network) [11] feeds the global features extracted by a ResNet18 backbone into a self-attention module and captures long-range spatiotemporal correlations to learn better representations. DASNet (Dual Attentive Fully Convolutional Siamese Networks) [12] applies the dual attention mechanism to Siamese networks. BIT (Bitemporal Image Transformer) [13] expresses the bi-temporal image as a few semantic tokens and uses a Transformer encoder to model contexts in the compact token-based space-time; the tokens are fed back to pixel space by a Transformer decoder to refine the original features.
Table I presents a comparative experiment performed on the CDD dataset. Precision (P), Recall (R), F1 score (F1), and Intersection over Union (IoU) are used to quantitatively evaluate the performance of the compared methods. The number of parameters (Params), floating-point operations (FLOPs), and inference time are used to measure the computational load and efficiency of the compared methods. Precision, recall, F1 score, and IoU are calculated as follows:
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 × P × R / (P + R)
IoU = TP / (TP + FP + FN)
Here, true positives (TP) denote the number of correctly detected changed pixels, false positives (FP) denote the number of unchanged pixels incorrectly predicted as changed, and false negatives (FN) denote the number of changed pixels that were not detected. Precision is the probability that a detected changed pixel is truly changed. Recall is the probability that a changed pixel is correctly detected. F1 is the harmonic mean of precision and recall, balancing the two by considering them jointly. IoU is the overlap area between the predicted and ground-truth changed pixels divided by the area of their union.
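For reference, a small Python sketch computing the four metrics from pixel-level confusion counts; the counts in the usage line are illustrative only.

```python
def change_metrics(tp: int, fp: int, fn: int):
    # Precision, recall, F1 and IoU from pixel-level confusion counts.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    iou = tp / (tp + fp + fn)
    return p, r, f1, iou

print(change_metrics(tp=9500, fp=300, fn=450))  # illustrative counts only
```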
Table I Comparative experiments on the CDD dataset
(Table I is reproduced as an image in the original publication.)
As the data in Table I show, the scheme of the present invention improves F1 by 0.56% and IoU by 1.01% on the CDD dataset compared with the other existing remote sensing image change detection methods. The proposed model achieves state-of-the-art change detection performance with only 0.88 MB of parameters, 2.20 GFLOPs, and an inference time of 4.75 ms. The scheme of the invention thus attains the best performance on the CDD dataset and identifies change regions with fewer parameters and faster speed.
To address the large computation and slow inference of the prior art, the invention constructs an end-to-end network architecture called the Lightweight Change Detection Network (LCDNet). LCDNet greatly compresses the parameters and computation through the FOCUS downsampling module, the depth residual module, and the multi-scale feature fusion module. To enable the network to refine features and attend to change regions, the invention constructs an efficient attention module based on channel attention and spatial attention. To effectively fuse feature maps of different scales, the invention constructs a simple and effective multi-scale feature fusion module.
Through the FOCUS module, the depth residual blocks, and the lightweight attention, the proposed LCDNet achieves faster inference while maintaining good change detection performance; the proposed multi-scale feature fusion module makes full use of the information in the feature maps of every scale and uses only effective feature streams during fusion, thereby extracting change regions more efficiently and quickly.
The references in the invention are as follows:
[1] Daudt R C, Le Saux B, Boulch A. Fully convolutional siamese networks for change detection[C]//2018 25th IEEE International Conference on Image Processing (ICIP), 2018: 4063-4067.
[2] Alcantarilla P F, Stent S, Ros G, et al. Street-view change detection with deconvolutional networks[J]. Autonomous Robots, 2018, 42(7): 1301-1322.
[3] Peng X, Zhong R, Li Z, et al. Optical remote sensing image change detection based on attention mechanism and image difference[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(9): 7296-7307.
[4] Zhang C, Yue P, Tapete D, et al. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 166: 183-200.
[5] Fang S, Li K, Shao J, et al. SNUNet-CD: A densely connected Siamese network for change detection of VHR images[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 19: 1-5.
[6] Chen H, Pu F, Yang R, et al. RDP-Net: Region detail preserving network for change detection[J]. arXiv 2022, arXiv:2202.09745.
[7] Liu B, Chen H, Wang Z. LSNet: Extremely light-weight siamese network for change detection in remote sensing image[J]. arXiv 2022, arXiv:2201.09156.
[8] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.
[9] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[J]. arXiv 2020, arXiv:2010.11929.
[10] Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012-10022.
[11] Chen H, Shi Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection[J]. Remote Sensing, 2020, 12(10): 1662.
[12] Chen J, Yuan Z, Peng J, et al. DASNet: Dual attentive fully convolutional siamese networks for change detection in high-resolution satellite images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 14: 1194-1206.
[13] Chen H, Qi Z, Shi Z. Remote sensing image change detection with transformers[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-14.
[14] Bourdis N, Marraud D, Sahbi H. Constrained optical flow for aerial image change detection[C]//Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2011: 4176-4179.
in the description of the present invention, it should be noted that, for the terms of orientation, such as "central", "lateral", "longitudinal", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc., indicate orientations and positional relationships based on the orientations or positional relationships shown in the drawings, which are merely for convenience of description and simplification of the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and should not be construed as limiting the specific scope of the present invention.
It is noted that the terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar elements and not necessarily to describe a particular sequence or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article or apparatus.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.

Claims (4)

1. A lightweight optical remote sensing image change detection method based on efficient attention, characterized by comprising the following steps:
A. sequentially performing orthorectification, image registration, image stretching and numerical normalization preprocessing on the bi-temporal optical remote sensing images to obtain bi-temporal optical remote sensing images with consistent data distribution;
B. labeling the updated parts in the preprocessed bi-temporal optical remote sensing images obtained in step A to obtain the corresponding change label map;
C. cropping the change label map obtained in step B and the preprocessed bi-temporal optical remote sensing images obtained in step A to the same size to obtain training samples;
D. concatenating the bi-temporal remote sensing images in the training samples obtained in step C;
E. downsampling the concatenated images with a FOCUS module, and then inputting the downsampled feature maps into a depth residual block formed by DW convolutions with kernel sizes of 3 × 3 and 5 × 5 for encoding, so as to extract feature maps related to the change regions;
F. inputting the feature maps obtained in step E into an efficient attention module to refine the feature maps;
G. inputting the features of different layers obtained in step F into a multi-scale feature fusion module to obtain the final feature map X;
H. inputting the final fused feature X obtained in step G into a prediction head formed by a 1 × 1 convolution to obtain the predicted change map of the bi-temporal training images;
I. combining the binary cross-entropy loss and the Dice loss into a hybrid loss function to compute the loss between the predicted change map of the bi-temporal training images obtained in step H and the corresponding label map;
J. after training, saving the weight parameters and hyper-parameter information of the trained change detection model;
K. sequentially performing orthorectification, image registration, image stretching and numerical normalization preprocessing on the pre- and post-change remote sensing images to be detected, and then cropping them to the same size to obtain the samples to be detected;
L. inputting the samples to be detected into the change detection model obtained in step J, and outputting the predicted change map of the samples to be detected through calculation.
2. The efficient attention-based lightweight optical remote sensing image change detection method according to claim 1, characterized in that step F specifically comprises: first, an average pooling layer is applied to the feature map F to generate aggregated vectors of sizes C × 1 × 1 and 1 × H × W, where C is the number of channels and H and W are the height and width of the feature map; then, a one-dimensional convolution is applied to the C × 1 × 1 vector to obtain the channel-dimension attention map, and a two-dimensional convolution is applied to the 1 × H × W vector to obtain the spatial-dimension attention map; the channel-dimension and spatial-dimension attention maps are expanded to C × H × W, added, and the weights reassigned to obtain the final attention map M(F); finally, M(F) is multiplied element by element with the input feature map:
M(F) = σ(C1D_3(AvgPool(F)) + C2D_7(AvgPool(F)))
F' = M(F) ⊗ F
where F denotes the input feature map, AvgPool(·) denotes the average pooling operation, C1D_3(·) denotes a one-dimensional convolution with kernel size 3, C2D_7(·) denotes a two-dimensional convolution with kernel size 7, σ denotes the sigmoid function, ⊗ denotes element-wise multiplication, and F' denotes the weighted, refined feature map.
3. The efficient attention-based lightweight optical remote sensing image change detection method according to claim 1, characterized in that the final fused feature X in step G is obtained by upsampling and element-wise addition of four feature maps X_1, X_2, X_3 and X_4 at different network depths, the feature maps of the four scales being obtained as: X_1 = C(F_4), X_2 = C(C(X_1, F_3)), X_3 = C(C(X_1, X_2, F_2)), X_4 = C(X_2, X_3, F_1), and the final fused feature X: X = X_1 + X_2 + X_3 + X_4, where the function C(·) denotes the convolution operation using a convolution block composed of a 1 × 1 convolution and a 3 × 3 DW convolution, F_1, F_2, F_3 and F_4 denote the refined feature maps obtained from the four depth residual blocks after the efficient attention module, and X_1, X_2, X_3 and X_4 denote the intermediate fusion feature maps.
4. The efficient attention-based lightweight optical remote sensing image change detection method according to claim 1, characterized in that the hybrid loss function in step I is L = L_bce + L_dice, where the binary cross-entropy loss is
L_bce = -(1/(n·m)) Σ_{i=1}^{n} Σ_{j=1}^{m} [ y_{i,j} log(ŷ_{i,j}) + (1 − y_{i,j}) log(1 − ŷ_{i,j}) ]
and the Dice loss is
L_dice = 1 − 2 Σ_{i,j} y_{i,j} ŷ_{i,j} / ( Σ_{i,j} y_{i,j} + Σ_{i,j} ŷ_{i,j} )
where y_{i,j} denotes the probability that pixel (i, j) in the corresponding label map is a changed pixel, ŷ_{i,j} denotes the probability that pixel (i, j) in the predicted change map is a changed pixel, and n and m denote the width and height of the image in pixels.
CN202211524552.3A 2022-11-30 2022-11-30 Light-weight optical remote sensing image change detection method based on efficient attention Pending CN115713529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211524552.3A CN115713529A (en) 2022-11-30 2022-11-30 Light-weight optical remote sensing image change detection method based on efficient attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211524552.3A CN115713529A (en) 2022-11-30 2022-11-30 Light-weight optical remote sensing image change detection method based on efficient attention

Publications (1)

Publication Number Publication Date
CN115713529A (en) 2023-02-24

Family

ID=85235420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211524552.3A Pending CN115713529A (en) 2022-11-30 2022-11-30 Light-weight optical remote sensing image change detection method based on efficient attention

Country Status (1)

Country Link
CN (1) CN115713529A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343043A (en) * 2023-03-30 2023-06-27 南京审计大学 Remote sensing image change detection method with multi-scale feature fusion function
CN116343043B (en) * 2023-03-30 2023-11-21 南京审计大学 Remote sensing image change detection method with multi-scale feature fusion function
CN116778294A (en) * 2023-04-14 2023-09-19 南京审计大学 Remote sensing change detection method for contexts in combined image and between images
CN116778294B (en) * 2023-04-14 2024-03-26 南京审计大学 Remote sensing change detection method for contexts in combined image and between images
CN116343052A (en) * 2023-05-30 2023-06-27 华东交通大学 Attention and multiscale-based dual-temporal remote sensing image change detection network

Similar Documents

Publication Publication Date Title
CN115713529A (en) Light-weight optical remote sensing image change detection method based on efficient attention
CN115797931A (en) Remote sensing image semantic segmentation method based on double-branch feature fusion
CN112668494A (en) Small sample change detection method based on multi-scale feature extraction
Xia et al. A deep Siamese postclassification fusion network for semantic change detection
CN114821342B (en) Remote sensing image road extraction method and system
CN113313180B (en) Remote sensing image semantic segmentation method based on deep confrontation learning
CN116580241B (en) Image processing method and system based on double-branch multi-scale semantic segmentation network
CN114359130A (en) Road crack detection method based on unmanned aerial vehicle image
CN114187520B (en) Building extraction model construction and application method
CN114943902A (en) Urban vegetation unmanned aerial vehicle remote sensing classification method based on multi-scale feature perception network
CN117496347A (en) Remote sensing image building extraction method, device and medium
CN115393718A (en) Optical remote sensing image change detection method based on self-adaptive fusion NestedUNet
CN115861260A (en) Deep learning change detection method for wide-area city scene
Chen et al. MSF-Net: A multiscale supervised fusion network for building change detection in high-resolution remote sensing images
Xu et al. AMCA: Attention-guided multiscale context aggregation network for remote sensing image change detection
CN117830788B (en) Image target detection method for multi-source information fusion
Wang et al. STCD: efficient Siamese transformers-based change detection method for remote sensing images
CN114092824A (en) Remote sensing image road segmentation method combining intensive attention and parallel up-sampling
Alimjan et al. An image change detection algorithm based on multi-feature self-attention fusion mechanism UNet network
Hu et al. Supervised multi-scale attention-guided ship detection in optical remote sensing images
CN115984714B (en) Cloud detection method based on dual-branch network model
Xing et al. Building extraction from google earth images
Laban et al. Enhanced pixel based urban area classification of satellite images using convolutional neural network
CN116778346A (en) Pipeline identification method and system based on improved self-attention mechanism
Qi et al. JAED-Net: joint attention encoder–decoder network for road extraction from remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination