CN117975267A - Remote sensing image change detection method based on twin multi-scale cross attention - Google Patents
- Publication number: CN117975267A
- Application number: CN202410124654.9A
- Authority: CN
- Country: China
- Legal status: Pending
Classifications
- G06V20/176—Urban or other man-made structures
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V20/10—Terrestrial scenes
- G06V2201/07—Target detection
- Y02T10/40—Engine management systems
Abstract
The invention provides a remote sensing image change detection method based on twin multi-scale cross attention, relating to the technical field of change detection. To address the inaccurate change-region edges and the missed detection of tiny targets in existing change detection methods, a lightweight CBAM attention module is added in the feature extraction stage, significantly improving the feature extraction and representation capability of the network. Multi-scale features from different levels are fused to fully mine the semantic feature information of the bi-temporal remote sensing images and to reduce the detail loss caused by the encoder's successive downsampling. Cross attention is then used to further screen effective features, capture the interaction information between the two temporal images, and filter out pseudo-change information.
Description
Technical Field
The invention relates to the technical field of change detection, in particular to a remote sensing image change detection method based on twin multi-scale cross attention.
Background
Change detection is the process of identifying differences in state by observing the same geographical location at different times. In recent years, with the rapid development of deep learning and its excellent feature extraction capability in image processing, deep learning has been widely applied to remote sensing image change detection. Existing deep-learning-based remote sensing image change detection methods can be roughly divided into two categories by network structure. The first fuses the two images of different temporal phases and feeds the fused result into a single-branch fully convolutional network, detecting changes by maximizing a decision boundary. The second adopts a dual-branch twin (Siamese) network: features are first extracted separately from the two temporal images, and the distance between the extracted feature pairs is then measured to detect changed regions.
Although deep-learning-based approaches achieve good results, problems remain that prevent further improvement of network performance. The feature extraction capability of a change detection network model largely determines the quality of the change detection result; however, the successive downsampling operations in the feature extraction process generally cause the loss of accurate spatial positions and other detailed information, so tiny targets cannot be effectively located and the detailed features of changed targets cannot be captured, leading to missed detections and rough edges of change regions. In addition, a twin network extracts features from the two temporal images separately, ignoring the interaction information between the two images and thereby degrading detection performance. Change detection is now widely used in fields such as natural disaster detection, urban planning and land use, ecological environment protection, and national defense security, so a remote sensing image change detection method that accurately locates change regions and produces clear change-region edges is of great importance.
Although deep-learning-based methods have made some progress, most existing research methods still have the following problems:
First, the feature extraction capability of a change detection network model largely determines the quality of the change detection result, but most methods cannot fully extract the features of the bi-temporal remote sensing images; the successive downsampling operations in feature extraction generally lose accurate spatial position and difference information, so tiny targets cannot be effectively located and the detailed features of changed targets cannot be captured, leading to missed detections, rough edges of change regions, and similar issues.
Second, a twin network extracts features from the two temporal images separately and ignores the interaction information and spatio-temporal dependency between the two images, so the detection result contains pseudo-changes and the detection accuracy is low.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a remote sensing image change detection method based on twin multi-scale cross attention, which better solves the problem of losing accurate spatial position information during successive downsampling.
A remote sensing image change detection method based on twin multi-scale cross attention comprises the following steps:
Step 1: Acquire a change detection data set and divide it into a training set, a verification set and a test set;
Step 1.1: Acquire a change detection data set; the change detection data set contains multiple types of building pictures; the building pictures are bi-temporal remote sensing images and specifically cover three cases: newly built buildings, demolished buildings and no change;
Step 1.2: Crop the pictures in the change detection data set to a set size;
Step 1.3: Divide the cropped images into a training set, a test set and a verification set according to a set ratio;
Step 2: Construct a remote sensing image change detection model based on twin multi-scale cross attention;
The remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure of two weight-sharing branches to accept the bi-temporal remote sensing images as input. Shallow features are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then extract the information in the image, yielding features at four different scales. Each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before the ReLU activation, the feature is fused with the original input feature by pixel-wise addition. The output features of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively.
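A minimal PyTorch sketch of this twin backbone follows; it reuses the torchvision ResNet-18 residual stages and rebuilds the stride-1 stem by hand (the stock torchvision stem uses stride 2). The exact pooling parameters are assumptions, and the CBAM stacking described next is omitted here.

```python
import torch.nn as nn
from torchvision.models import resnet18

class SiameseResNet18(nn.Module):
    """Weight-sharing twin backbone sketch; avgpool and fc head are dropped."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # Stride-1 7x7 stem (assumption on padding), then BN + ReLU + stride-2 maxpool.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 1/2 scale
        )
        # Four residual stages with 64/128/256/512 output channels.
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward_single(self, x):
        x = self.stem(x)
        f1 = self.layer1(x)   # 1/2 scale,  64 channels
        f2 = self.layer2(f1)  # 1/4 scale, 128 channels
        f3 = self.layer3(f2)  # 1/8 scale, 256 channels
        f4 = self.layer4(f3)  # 1/16 scale, 512 channels
        return [f1, f2, f3, f4]

    def forward(self, t1, t2):
        # The same weights process both temporal phases (twin structure).
        return self.forward_single(t1), self.forward_single(t2)
```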
Furthermore, a stacked attention module containing four CBAM blocks is integrated into the feature extractor: one CBAM block is applied to the output features of each residual block, the resulting features are uniformly resized to half the size of the original input image, the four feature maps of equal size are concatenated along the channel direction, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information. Each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information. The specific process is as follows:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1. A weight-shared multi-layer perceptron (MLP), implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature, expressed as:
Mc(F)=σ(MLP(Avg(F))+MLP(Max(F)))
where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling. The channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F, expressed as:
F′=Mc(F)⊗F
The output feature F′ of size C×H×W from the channel attention module is then taken as the input of the spatial attention module. F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer; after concatenation along the channel dimension, they are fed into a 3×3 convolution layer, and a sigmoid activation finally yields the spatial attention map, expressed as:
Ms(F′)=σ(f3×3([Avg(F′); Max(F′)]))
where f3×3 denotes a 3×3 convolution layer. The feature F″ refined by CBAM is therefore obtained as:
F″=Ms(F′)⊗F′
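A minimal sketch of one CBAM block matching the two formulas above; the channel-reduction ratio inside the shared MLP is an assumption, as the text does not state it.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel + spatial attention block; reduction ratio is an assumption."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Weight-shared MLP realized as two 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # 3x3 conv over the concatenated avg/max maps, as stated in the text.
        self.spatial = nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, f):
        # Channel attention: Mc(F) = sigma(MLP(Avg(F)) + MLP(Max(F)))
        avg = f.mean(dim=(2, 3), keepdim=True)            # C x 1 x 1
        mx = f.amax(dim=(2, 3), keepdim=True)             # C x 1 x 1
        mc = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        f1 = f * mc                                       # F' = Mc(F) (x) F
        # Spatial attention: Ms(F') = sigma(f3x3([Avg(F'); Max(F')]))
        pooled = torch.cat([f1.mean(dim=1, keepdim=True),
                            f1.amax(dim=1, keepdim=True)], dim=1)  # 2 x H x W
        ms = torch.sigmoid(self.spatial(pooled))
        return f1 * ms                                    # F'' = Ms(F') (x) F'
```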
The cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention. Given input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to an attention layer. Attention weights are produced from the dot product between the query Qi and the key Kj, and attention information is then retrieved as the product of the value Vj and the attention weights. The attention layer is expressed as:
A(Q,K,V)=softmax(Q·Kᵀ/√d)·V
where d is the dimension of the key vectors. Once the attention vectors are obtained, they are added to the input feature Fi, yielding the new feature Fi,j:
Fi,j=Fi+A(Qi,Ki,Vi)+A(Qi,Kj,Vj)
The new feature Fj,i is calculated in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
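A minimal PyTorch sketch of this bi-directional cross attention follows; single-head attention, sharing the Q/K/V projections across the two branches, and the √d scaling are assumptions not fixed by the text above.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Bi-directional cross attention between the two temporal features."""
    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Conv2d(channels, channels, 1)
        self.to_k = nn.Conv2d(channels, channels, 1)
        self.to_v = nn.Conv2d(channels, channels, 1)
        # "3x3 convolution and normalization" for the output vector.
        self.out = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                 nn.BatchNorm2d(channels))
        self.scale = channels ** -0.5

    def attend(self, q, k, v):
        # A(Q, K, V) = softmax(Q K^T / sqrt(d)) V over flattened positions.
        b, c, h, w = q.shape
        q = q.flatten(2).transpose(1, 2)          # B x HW x C
        k = k.flatten(2)                          # B x C x HW
        v = v.flatten(2).transpose(1, 2)          # B x HW x C
        attn = torch.softmax(q @ k * self.scale, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, fi, fj):
        qi, ki, vi = self.to_q(fi), self.to_k(fi), self.to_v(fi)
        qj, kj, vj = self.to_q(fj), self.to_k(fj), self.to_v(fj)
        # F_{i,j} = F_i + A(Qi, Ki, Vi) + A(Qi, Kj, Vj), and symmetrically for F_{j,i}.
        fij = self.out(fi + self.attend(qi, ki, vi) + self.attend(qi, kj, vj))
        fji = self.out(fj + self.attend(qj, kj, vj) + self.attend(qj, ki, vi))
        return fij, fji
```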
The measurement module expands the feature map to the original input image size through bilinear interpolation and computes the Euclidean distance between the feature tensor pair at each spatial position in the embedding space. A contrastive loss function pulls unchanged feature tensor pairs closer together and pushes changed feature tensor pairs farther apart; the change detection result map is finally obtained by threshold segmentation.
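A sketch of this measurement module under the description above; the distance threshold value is an assumption (the text only states that threshold segmentation is applied).

```python
import torch
import torch.nn.functional as F

def change_map(feat_a, feat_b, image_size, threshold=1.0):
    """Upsample both feature maps, take per-pixel Euclidean distance, threshold.

    feat_a, feat_b: B x C x h x w feature tensors from the two branches.
    image_size: (H, W) of the original input image.
    """
    feat_a = F.interpolate(feat_a, size=image_size, mode="bilinear",
                           align_corners=False)
    feat_b = F.interpolate(feat_b, size=image_size, mode="bilinear",
                           align_corners=False)
    # Euclidean distance in embedding space at every spatial position.
    dist = torch.norm(feat_a - feat_b, p=2, dim=1)        # B x H x W
    return (dist > threshold).to(torch.uint8), dist       # binary map + distances
```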
step 3: and training a remote sensing image change detection model based on twin multi-scale cross attention.
Step 3.1: calculating an error between the predicted value and the real label using the contrast loss function:
D w(X1,X2) represents the euclidean distance between the input feature X 1 and the input feature X 2, W is a network weight, P represents a feature dimension, Y represents a predicted value, y=1 represents that the detected image corresponding position changes, y=0 represents that no change occurs, that is, no building is newly added or lost, m is a set threshold, and N is the number of samples.
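A sketch of this contrastive loss in PyTorch; the margin value is an assumption.

```python
import torch

def contrastive_loss(dist, label, margin=2.0):
    """Pixel-wise contrastive loss matching the formula above.

    dist:  B x H x W Euclidean distances Dw between the twin embeddings.
    label: B x H x W ground truth, 1 = changed, 0 = unchanged.
    """
    label = label.float()
    pos = (1.0 - label) * dist.pow(2)                        # pull unchanged pairs together
    neg = label * torch.clamp(margin - dist, min=0).pow(2)   # push changed pairs apart
    return (pos + neg).mean() / 2                            # ~ (1/2N) sum over samples
```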
Step 3.2: During training, a gradient descent method is adopted to optimize the network parameters; the gradient descent update is defined as:
θ ← θ − α·∇θ ξ(θ),  with  ξ(θ) = (1/2n) Σi (hθ(xi) − yi)²
where ξ(θ) is the given loss function, α is the learning rate, n is the number of samples per training batch, hθ(xi) is the network prediction for training sample xi, yi is the label of the sample, and i is the sample index.
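A toy illustration of this update rule on a synthetic least-squares problem; the data, learning rate and iteration count are illustrative only, not taken from the patent.

```python
import torch

torch.manual_seed(0)
x = torch.randn(100, 3)                                   # synthetic samples x_i
y = x @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)  # labels y_i
theta = torch.zeros(3, requires_grad=True)
alpha = 0.1                                               # learning rate

for step in range(200):
    xi = ((x @ theta - y) ** 2).mean() / 2   # xi(theta) = (1/2n) sum (h(x_i) - y_i)^2
    xi.backward()
    with torch.no_grad():
        theta -= alpha * theta.grad          # theta <- theta - alpha * grad xi(theta)
        theta.grad.zero_()
```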
Step 3.3: During training, the model parameters are iteratively optimized by computing the loss between the change detection result and the ground-truth label and applying the error backpropagation algorithm, until the model converges and training ends;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph.
And respectively inputting the pre-change remote sensing image and the post-change remote sensing image to be detected into two branches of a remote sensing image change detection model, and predicting by the remote sensing image change detection model to obtain a final change map.
The beneficial effects of the above technical scheme are as follows:
The invention provides a remote sensing image change detection method based on twin multi-scale cross attention, which has the following beneficial effects:
1. In the remote sensing image change detection method based on the multi-scale cross attention twin network, the feature extraction module retains features of different scales from different levels and fuses high-level semantic information with low-level spatial detail information, so that targets of different sizes are all taken into account and missed detection of small targets is greatly reduced. In addition, adding a lightweight CBAM attention module after each residual block yields more discriminative features, which substantially alleviates blurred edges of the change regions and further improves change detection accuracy.
2. In the remote sensing image building change detection method based on the multi-scale cross attention twin network, the cross attention module captures the interaction information between the two images and better models the spatio-temporal dependency of the bi-temporal images, thereby improving model performance.
Drawings
FIG. 1 is a schematic diagram of a network model structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a cross attention module in a network model structure according to an embodiment of the present invention;
FIG. 3 is a schematic view of a first temporal remote sensing image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second phase remote sensing image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a remote sensing image change detection result according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the actual change area in an embodiment of the present invention;
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
A remote sensing image change detection method based on twin multi-scale cross attention, as shown in figure 1, comprises the following steps:
Step 1: Acquire a change detection data set and divide it into a training set, a verification set and a test set;
Step 1.1: Acquire a change detection data set; the change detection data set contains bi-temporal remote sensing images of multiple building types, such as houses, warehouses, industrial parks and garages, and specifically covers three cases: newly built buildings, demolished buildings and no change;
In this embodiment, the open-source LEVIR-CD dataset was downloaded from the web. It was collected by Chen et al. and covers 20 different regions of Texas imaged between 2002 and 2018, comprising 637 pairs of 1024×1024 high-resolution bi-temporal images with a spatial resolution of 0.5 m/pixel. The LEVIR-CD dataset contains multiple types of buildings, such as houses, garages and warehouses.
Step 1.2: Crop the pictures in the change detection data set to a set size;
In this embodiment, each 1024×1024 large-scale image is cropped without overlap into 256×256 patches.
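A minimal sketch of this non-overlapping cropping (each 1024×1024 scene yields sixteen 256×256 patches; the function name is illustrative):

```python
import numpy as np

def crop_nonoverlap(img, tile=256):
    """Split an H x W (x C) image into non-overlapping tile x tile patches."""
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]
```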
Step 1.3: dividing the cut image into a training set, a testing set and a verification set according to a set proportion;
in this embodiment, according to 7:2:1 is divided into a training set, a testing set and a verification set; the cropped final dataset includes 7120 training image pairs, 2048 test image pairs, 1024 verification image pairs; in this embodiment, as shown in fig. 3 and fig. 4, two time phase remote sensing images in the training set are respectively;
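A sketch of the 7:2:1 split; shuffling and the fixed seed are assumptions:

```python
import random

def split_pairs(pairs, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split a list of image pairs into train/test/verification subsets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    a = int(ratios[0] * n)
    b = int((ratios[0] + ratios[1]) * n)
    return pairs[:a], pairs[a:b], pairs[b:]
```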
Step 2: Construct a remote sensing image change detection model based on twin multi-scale cross attention, as shown in fig. 2;
The remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure of two weight-sharing branches to accept the bi-temporal remote sensing images as input. Shallow features rich in spatial information are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then fully extract the information in the image, yielding features at four different scales. Each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before the ReLU activation, the feature is fused with the original input feature by pixel-wise addition. The output features of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively.
Furthermore, to fully capture the valid information in the multi-scale features, a stacked attention module containing four CBAM blocks is integrated into the feature extractor. More specifically, one CBAM block is applied to the output features of each residual block to emphasize useful spatial and channel information; these features are then uniformly resized to half the original input image size, the four feature maps of equal size are concatenated along the channel direction, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information. Each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information. The specific process is as follows:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1. A weight-shared multi-layer perceptron (MLP), implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature, expressed as:
Mc(F)=σ(MLP(Avg(F))+MLP(Max(F)))
where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling. The channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F, expressed as:
F′=Mc(F)⊗F
The output feature F′ of size C×H×W from the channel attention module is then taken as the input of the spatial attention module. F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer; after concatenation along the channel dimension, they are fed into a 3×3 convolution layer, and a sigmoid activation finally yields the spatial attention map, expressed as:
Ms(F′)=σ(f3×3([Avg(F′); Max(F′)]))
where f3×3 denotes a 3×3 convolution layer. The feature F″ refined by CBAM is therefore obtained as:
F″=Ms(F′)⊗F′
The cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention. Given input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to the attention layer. Attention weights are produced from the dot product between the query Qi and the key Kj, and attention information is then retrieved as the product of the value Vj and the attention weights. The attention layer is expressed as:
A(Q,K,V)=softmax(Q·Kᵀ/√d)·V
where d is the dimension of the key vectors. Once the attention vectors are obtained, they are added to the input feature Fi, yielding the new feature Fi,j:
Fi,j=Fi+A(Qi,Ki,Vi)+A(Qi,Kj,Vj)
The new feature Fj,i is calculated in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
The measurement module expands the feature map to the original input image size through bilinear interpolation and computes the Euclidean distance between the feature tensor pair at each spatial position in the embedding space. A contrastive loss function pulls unchanged feature tensor pairs closer together and pushes changed feature tensor pairs farther apart; the change detection result map is finally obtained by threshold segmentation.
step 3: and training a remote sensing image change detection model based on twin multi-scale cross attention.
Step 3.1: calculating an error between the predicted value and the real label using the contrast loss function:
D w(X1,X2) represents the euclidean distance between the input feature X 1 and the input feature X 2, W is a network weight, P represents a feature dimension, Y represents a predicted value, y=1 represents that the detected image corresponding position changes, y=0 represents that no change occurs, that is, no building is newly added or lost, m is a set threshold, and N is the number of samples.
Step 3.2: During training, a gradient descent method is adopted to optimize the network parameters; the gradient descent update is defined as:
θ ← θ − α·∇θ ξ(θ),  with  ξ(θ) = (1/2n) Σi (hθ(xi) − yi)²
where ξ(θ) is the given loss function, α is the learning rate, n is the number of samples per training batch, hθ(xi) is the network prediction for training sample xi, yi is the label of the sample, and i is the sample index.
Step 3.3: During training, the model parameters are iteratively optimized by computing the loss between the change detection result and the ground-truth label and applying the error backpropagation algorithm, until the model converges and training ends;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph, wherein the change result graph is shown in fig. 5.
The pre-change and post-change remote sensing images to be detected are input into the two branches of the remote sensing image change detection model, respectively, and the final change map is obtained through the model's prediction; the corresponding actual change area is shown in fig. 6.
The foregoing description covers only the preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, solutions in which the above features are replaced with (but not limited to) features with similar functions disclosed in the embodiments of the present disclosure.
Claims (7)
1. A remote sensing image change detection method based on twin multi-scale cross attention, characterized by comprising the following steps:
Step 1: Acquire a change detection data set and divide it into a training set, a verification set and a test set;
Step 2: Construct a remote sensing image change detection model based on twin multi-scale cross attention;
Step 3: training a remote sensing image change detection model based on twin multi-scale cross attention;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph.
2. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the step 1 specifically comprises the following steps:
Step 1.1: Acquire a change detection data set; the change detection data set contains multiple types of building pictures; the building pictures are bi-temporal remote sensing images and specifically cover three cases: newly built buildings, demolished buildings and no change;
Step 1.2: Crop the pictures in the change detection data set to a set size;
Step 1.3: Divide the cropped images into a training set, a test set and a verification set according to a set ratio.
3. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure of two weight-sharing branches to accept the bi-temporal remote sensing images as input; shallow features are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then extract the information in the image, yielding features at four different scales, where each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before the ReLU activation, the feature is fused with the original input feature by pixel-wise addition; the output features of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively;
Furthermore, a stacked attention module containing four CBAM blocks is integrated into the feature extractor: one CBAM block is applied to the output features of each residual block, the resulting features are uniformly resized to half the size of the original input image, the four feature maps of equal size are concatenated along the channel direction, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information; each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information; the specific process comprises the following steps:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1; a weight-shared multi-layer perceptron (MLP), implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature, expressed as:
Mc(F)=σ(MLP(Avg(F))+MLP(Max(F)))
where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling; the channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F, expressed as:
F′=Mc(F)⊗F
the output feature F′ of size C×H×W from the channel attention module is then taken as the input of the spatial attention module; F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer, which are concatenated along the channel dimension and fed into a 3×3 convolution layer; a sigmoid activation finally yields the spatial attention map, expressed as:
Ms(F′)=σ(f3×3([Avg(F′); Max(F′)]))
where f3×3 denotes a 3×3 convolution layer; the feature F″ refined by CBAM is therefore obtained as:
F″=Ms(F′)⊗F′
4. A method of detecting changes in remote sensing images based on twin multi-scale cross attention as claimed in claim 3, wherein the cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention; given input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to the attention layer; attention weights are produced from the dot product between the query Qi and the key Kj, and attention information is retrieved as the product of the value Vj and the attention weights; the attention layer is expressed as:
A(Q,K,V)=softmax(Q·Kᵀ/√d)·V
where d is the dimension of the key vectors; once the attention vectors are obtained, they are added to the input feature Fi, yielding the new feature Fi,j:
Fi,j=Fi+A(Qi,Ki,Vi)+A(Qi,Kj,Vj)
the new feature Fj,i is calculated in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
5. The remote sensing image change detection method based on twin multi-scale cross attention according to claim 3, wherein the measurement module expands the feature map to the original input image size through bilinear interpolation, computes the Euclidean distance between the feature tensor pair at each spatial position in the embedding space, uses a contrastive loss function to pull unchanged feature tensor pairs closer together and push changed feature tensor pairs farther apart, and finally obtains the change detection result map through threshold segmentation.
6. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the step 3 specifically comprises the following steps:
Step 3.1: Calculate the error between the predicted values and the ground-truth labels using the contrastive loss function:
L = (1/2N) Σ [(1−Y)·Dw(X1,X2)² + Y·max(0, m−Dw(X1,X2))²]
where Dw(X1,X2) is the Euclidean distance between input features X1 and X2 computed over the P feature dimensions, W is the network weight, Y is the label (Y=1 indicates that the corresponding position of the detected images has changed; Y=0 indicates no change, i.e. no building was newly added or has disappeared), m is a set margin threshold, and N is the number of samples;
Step 3.2: During training, a gradient descent method is adopted to optimize the network parameters; the gradient descent update is defined as:
θ ← θ − α·∇θ ξ(θ),  with  ξ(θ) = (1/2n) Σi (hθ(xi) − yi)²
where ξ(θ) is the given loss function, α is the learning rate, n is the number of samples per training batch, hθ(xi) is the network prediction for training sample xi, yi is the label of the sample, and i is the sample index;
Step 3.3: in the training process, the parameters in the model are continuously and iteratively optimized by calculating a loss function between a change detection result and a real label and using a back propagation algorithm of errors until the model converges, and the training is finished.
7. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein step 4 is specifically: inputting the pre-change and post-change remote sensing images to be detected into the two branches of the remote sensing image change detection model, respectively, and obtaining the final change map through the prediction of the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410124654.9A | 2024-01-30 | 2024-01-30 | Remote sensing image change detection method based on twin multi-scale cross attention
Publications (1)
Publication Number | Publication Date |
---|---|
CN117975267A true CN117975267A (en) | 2024-05-03 |
Family ID: 90859142
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN118470548A (en) * | 2024-07-12 | 2024-08-09 | 湖南大学 | Heterogeneous image change detection method based on width learning
CN118470548B (en) * | 2024-07-12 | 2024-09-17 | 湖南大学 | Heterogeneous image change detection method based on width learning
CN118587511A (en) * | 2024-08-02 | 2024-09-03 | 南京信息工程大学 | SPECT-MPI image classification method and system
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination