CN117975267A - Remote sensing image change detection method based on twin multi-scale cross attention

Remote sensing image change detection method based on twin multi-scale cross attention

Info

Publication number
CN117975267A
Authority
CN
China
Prior art keywords
feature
attention
remote sensing
change detection
sensing image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410124654.9A
Other languages
Chinese (zh)
Inventor
宋玉莹
任酉贵
鲍玉斌
冷芳玲
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2024-01-30
Filing date: 2024-01-30
Publication date: 2024-05-03
Application filed by 东北大学
Priority to CN202410124654.9A
Publication of CN117975267A
Current legal status: Pending


Classifications

    • G06V 20/176 Urban or other man-made structures (G06V 20/00 Scenes; scene-specific elements; G06V 20/10 Terrestrial scenes)
    • G06N 3/045 Combinations of networks (G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] (G06V 10/44 Local feature extraction; G06V 10/449 Biologically inspired filters)
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/771 Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 20/10 Terrestrial scenes
    • G06V 2201/07 Target detection (indexing scheme relating to image or video recognition or understanding)
    • Y02T 10/40 Engine management systems (Y02T 10/10 Internal combustion engine [ICE] based vehicles, under climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image change detection method based on twin multi-scale cross attention, relating to the technical field of change detection. To address the inaccurate pixel-edge detection and missed tiny targets of existing change detection methods, a lightweight CBAM attention module is added in the feature extraction stage, which markedly improves the feature extraction and representation capability of the network. Multi-scale features from different layers are fused to fully mine the semantic feature information of the bi-temporal remote sensing images and to reduce the detail loss caused by the encoder's successive downsampling. Cross attention then further screens effective features, captures the interaction information between the two temporal images, and filters out pseudo-change information.

Description

Remote sensing image change detection method based on twin multi-scale cross attention
Technical Field
The invention relates to the technical field of change detection, in particular to a remote sensing image change detection method based on twin multi-scale cross attention.
Background
Change detection is the process of identifying differences in the state of the same geographical location by observing it at different times. In recent years, with the rapid development of deep learning and its excellent feature extraction capability in image processing, deep learning has been widely applied to remote sensing image change detection. Existing deep-learning-based remote sensing image change detection methods can be roughly divided into two categories by network structure. The first category fuses the two images of different temporal phases and feeds the fused result into a single-branch fully convolutional network, detecting changes by maximizing a decision boundary. The second category adopts a dual-branch twin (Siamese) network: features are first extracted separately from the two temporal images, and the distance between the extracted feature pairs is then measured to detect changed regions.
Although deep-learning-based approaches achieve good results, problems remain that prevent further improvement of network performance. The feature extraction capability of a change detection network largely determines the quality of its results, yet the successive downsampling operations during feature extraction generally lose accurate spatial positions and other detailed information, so tiny targets cannot be effectively located and the detailed features of changed targets cannot be captured, leading to missed detections, rough edges of changed regions, and similar defects. In addition, a twin network extracts features from the two temporal images separately, ignoring the interaction information between them, which degrades the detection results. Change detection is now widely used in natural disaster monitoring, urban planning and land use, ecological environment protection, national defense and security, and other fields, so a remote sensing image change detection method that locates changed regions accurately and produces clear region edges is of great importance.
Although deep-learning-based methods have made some progress, most existing research methods still have the following problems:
First, the feature extraction capability of the change detection network largely determines the quality of the results, but most methods cannot fully extract the features of the bi-temporal remote sensing images, and the successive downsampling operations during feature extraction generally lose accurate spatial position and difference information; tiny targets cannot be effectively located and the detailed features of changed targets cannot be captured, resulting in missed detections, rough edges of changed regions, and similar defects.
Second, the twin network extracts features from the two temporal images separately and ignores the interaction information and spatio-temporal dependency between them, so the detection results contain pseudo-changes and the detection accuracy is low.
Disclosure of Invention
To address the defects of the prior art, the invention provides a remote sensing image change detection method based on twin multi-scale cross attention, which better solves the problem of losing accurate spatial position information during successive downsampling.
a remote sensing image change detection method based on twin multi-scale cross attention comprises the following steps:
step 1: acquiring a change detection data set, and dividing it into a training set, a validation set and a test set;
Step 1.1: acquiring a change detection data set; the change detection data set comprises pictures of multiple types of buildings, where the building pictures are bi-temporal remote sensing images and specifically cover three cases: newly built buildings, demolished buildings, and no change;
step 1.2: cropping the pictures in the change detection data set to a set size;
step 1.3: dividing the cropped images into a training set, a test set and a validation set according to a set ratio;
step 2: constructing a remote sensing image change detection model based on twin multi-scale cross attention;
The remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure with two weight-sharing branches to accept the bi-temporal remote sensing images as input. Shallow features are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2. Four residual blocks then extract the information in the image, yielding four features at different scales; each residual block contains two 3×3 convolution layers, one BN layer, and one ReLU activation function. Before entering the ReLU activation function, the feature is fused with the original input feature by pixel-wise addition. The output feature maps of the four residual blocks are 1/2, 1/4, 1/8, and 1/16 of the input image size, with 64, 128, 256, and 512 channels, respectively.
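For illustration only, the following PyTorch sketch shows one way such a weight-sharing twin extractor could be assembled from torchvision's ResNet-18, with the stride-1 first convolution described above; the class and variable names are illustrative assumptions, not part of the claimed method:

```python
import torch.nn as nn
from torchvision.models import resnet18

class TwinMultiScaleExtractor(nn.Module):
    """Weight-sharing twin feature extractor built on ResNet-18
    (final average pooling and fully connected layers removed)."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # 7x7 convolution with stride 1 (per the description), then BN + ReLU
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3, bias=False)
        self.bn1, self.relu = net.bn1, net.relu
        self.maxpool = net.maxpool  # 3x3 max pooling, stride 2
        # the four residual stages of ResNet-18
        self.layers = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def extract(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))  # 1/2 of input size
        feats = []
        for layer in self.layers:  # 1/2, 1/4, 1/8, 1/16; 64/128/256/512 channels
            x = layer(x)
            feats.append(x)
        return feats

    def forward(self, x1, x2):
        # the same weights process both temporal images (twin structure)
        return self.extract(x1), self.extract(x2)
```

With 256×256 inputs, the four returned feature maps are 128×128×64, 64×64×128, 32×32×256, and 16×16×512, matching the scales and channel counts stated above.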
Furthermore, a stacked attention module containing four CBAM blocks is integrated into the feature extractor: one CBAM block is applied to the output features of each residual block, the resulting features are uniformly resized to half the original input image size, the four same-sized feature maps are concatenated along the channel dimension, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information. Each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information. The specific process is as follows:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1. A multi-layer perceptron (MLP) with shared weights, implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature:

Mc(F) = σ(MLP(Avg(F)) + MLP(Max(F)))

where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling. The channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F:

F′ = Mc(F) ⊗ F

The output feature F′ of the channel attention module, of size C×H×W, is then taken as the input of the spatial attention module. F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer; these are concatenated along the channel dimension and fed into a 3×3 convolution layer, and a sigmoid activation function finally yields the spatial refinement matrix:

Ms(F′) = σ(f3×3([Avg(F′); Max(F′)]))

where f3×3 denotes a 3×3 convolution layer. The feature F″ refined by CBAM is therefore obtained by:

F″ = Ms(F′) ⊗ F′
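As a concrete reference, a minimal sketch of such a CBAM block is given below, using the 3×3 spatial convolution stated above (the original CBAM design uses 7×7); the channel reduction ratio of 16 is an assumed hyperparameter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16):  # reduction ratio is assumed
        super().__init__()
        # shared MLP of two 1x1 convolutions for channel attention
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # 3x3 convolution over the stacked avg/max maps for spatial attention
        self.spatial = nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, f):
        # Mc(F) = sigmoid(MLP(Avg(F)) + MLP(Max(F)))
        mc = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(f, 1)) +
                           self.mlp(F.adaptive_max_pool2d(f, 1)))
        f1 = mc * f                              # F' = Mc(F) (x) F
        avg = f1.mean(dim=1, keepdim=True)       # 1 x H x W average map
        mx, _ = f1.max(dim=1, keepdim=True)      # 1 x H x W max map
        ms = torch.sigmoid(self.spatial(torch.cat([avg, mx], dim=1)))
        return ms * f1                           # F'' = Ms(F') (x) F'
```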
The cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module. With input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to an attention layer. The attention weights are produced from the dot product between the query Qi and the key Kj, and the attention information is retrieved as the product of the value Vj and the attention weights. The attention layer is expressed as:

A(Qi, Kj, Vj) = softmax(Qi Kj^T / √d) Vj

where d is the feature dimension. Once the attention vector is obtained, it is added to the input feature Fi, yielding a new feature Fi,j:

Fi,j = Fi + A(Qi, Ki, Vi) + A(Qi, Kj, Vj)

The new feature Fj,i is computed in the same way. Finally, the output vector is computed through a 3×3 convolution and normalization.
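A sketch of this cross attention exchange is given below; the 1×1 query/key/value projections and the scaling by √d follow the standard scaled dot-product form consistent with the description above, and all names are illustrative:

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 projections for queries, keys and values (shared by both branches)
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        # final 3x3 convolution and normalization of the output vector
        self.out = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                 nn.BatchNorm2d(channels))

    @staticmethod
    def attend(q, k, v):
        b, c, h, w = q.shape
        q = q.flatten(2).transpose(1, 2)      # B x HW x C
        k = k.flatten(2)                      # B x C x HW
        v = v.flatten(2).transpose(1, 2)      # B x HW x C
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # softmax(QK^T / sqrt(d))
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, fi, fj):
        qi, ki, vi = self.q(fi), self.k(fi), self.v(fi)
        qj, kj, vj = self.q(fj), self.k(fj), self.v(fj)
        # F_ij = F_i + A(Qi,Ki,Vi) + A(Qi,Kj,Vj); F_ji computed symmetrically
        fij = fi + self.attend(qi, ki, vi) + self.attend(qi, kj, vj)
        fji = fj + self.attend(qj, kj, vj) + self.attend(qj, ki, vi)
        return self.out(fij), self.out(fji)
```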
The measurement module enlarges the feature maps to the original input image size through bilinear interpolation, computes the Euclidean distance of each position point of the feature tensor pair in the embedding space, then uses a contrastive loss function to pull unchanged feature tensor pairs closer together and push changed feature tensor pairs farther apart, and finally obtains the change detection result map through threshold segmentation;
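A minimal sketch of this measurement step, with an assumed threshold value, could look as follows:

```python
import torch
import torch.nn.functional as F

def change_map(f1, f2, out_size, threshold=1.0):  # threshold value is assumed
    # upsample both feature maps to the original input image size
    f1 = F.interpolate(f1, size=out_size, mode="bilinear", align_corners=False)
    f2 = F.interpolate(f2, size=out_size, mode="bilinear", align_corners=False)
    dist = torch.norm(f1 - f2, p=2, dim=1)    # per-pixel Euclidean distance
    return dist, (dist > threshold).float()   # 1 = changed, 0 = unchanged
```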
step 3: and training a remote sensing image change detection model based on twin multi-scale cross attention.
Step 3.1: calculating an error between the predicted value and the real label using the contrast loss function:
D w(X1,X2) represents the euclidean distance between the input feature X 1 and the input feature X 2, W is a network weight, P represents a feature dimension, Y represents a predicted value, y=1 represents that the detected image corresponding position changes, y=0 represents that no change occurs, that is, no building is newly added or lost, m is a set threshold, and N is the number of samples.
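Assuming per-pixel distances and labels as defined above, a sketch of this contrastive loss (the margin m assumed to be 2.0) is:

```python
import torch

def contrastive_loss(dist, label, margin=2.0):  # margin m is assumed
    # label = 1 where the position changed, label = 0 where it did not
    unchanged = (1.0 - label) * dist.pow(2)                      # pull together
    changed = label * torch.clamp(margin - dist, min=0).pow(2)   # push apart
    return 0.5 * (unchanged + changed).mean()
```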
Step 3.2: in the training process, a gradient descent method is adopted to optimize network parameters, and the gradient descent is defined as follows:
Where, ζ (θ) is a given loss function, n is the number of samples per training input, h θ(xi) is the weight of the training samples, x i is the training sample value, y i is the label value of the samples, and i is the serial number of the samples.
Step 3.3: in the training process, the parameters in the model are continuously and iteratively optimized by calculating a loss function between a change detection result and a real label and using a reverse propagation algorithm of errors until the model converges, and the training is finished;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph.
And respectively inputting the pre-change remote sensing image and the post-change remote sensing image to be detected into two branches of a remote sensing image change detection model, and predicting by the remote sensing image change detection model to obtain a final change map.
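Assuming the model and helpers sketched above, inference reduces to a single forward pass through the two branches followed by thresholding; the random tensors below stand in for a real image pair:

```python
import torch

pre_image = torch.randn(1, 3, 256, 256)   # placeholder pre-change image
post_image = torch.randn(1, 3, 256, 256)  # placeholder post-change image

model.eval()
with torch.no_grad():
    feats1, feats2 = model(pre_image, post_image)  # the two twin branches
    _, change = change_map(feats1[-1], feats2[-1], out_size=(256, 256))
# `change` is a binary map: 1 marks changed pixels, 0 unchanged ones
```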
Adopting the above technical solution produces the following beneficial effects:
The invention provides a remote sensing image change detection method based on twin multi-scale cross attention, which has the following beneficial effects:
1. The remote sensing image change detection method based on the multi-scale cross attention twin network preserves features of different scales from different levels through the feature extraction module and fuses high-level semantic information with low-level spatial detail information, so that targets of different sizes are all taken into account and the problem of missed detection of small targets is greatly alleviated. In addition, adding a lightweight CBAM attention module after each residual block yields more discriminative features, which markedly reduces the blurring of change-region edges and further improves change detection accuracy.
2. By using the cross attention module, the method obtains the interaction information between the two images and better captures the spatio-temporal dependency of the bi-temporal images, thereby improving model performance.
Drawings
FIG. 1 is a schematic diagram of a network model structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a cross attention module in a network model structure according to an embodiment of the present invention;
FIG. 3 is a schematic view of a first temporal remote sensing image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second temporal remote sensing image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a remote sensing image change detection result according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the actual change area in an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
A remote sensing image change detection method based on twin multi-scale cross attention, as shown in figure 1, comprises the following steps:
step 1: acquiring a change detection data set, and dividing it into a training set, a validation set and a test set;
Step 1.1: acquiring a change detection data set; the data set contains bi-temporal remote sensing images of multiple types of buildings, such as houses, warehouses, industrial parks, and garages, and specifically covers three cases: newly built buildings, demolished buildings, and no change;
In this embodiment, the open-source LEVIR-CD data set was downloaded from the web. It was collected by Chen et al. and contains images of 20 different regions of Texas captured between 2002 and 2018, comprising 637 pairs of 1024x1024 high-resolution bi-temporal images with a spatial resolution of 0.5 m/pixel. The LEVIR-CD data set covers multiple types of buildings, such as houses, garages, and warehouses.
Step 1.2: cutting the pictures in the change detection data set according to the set size;
In this embodiment, each 1024x1024 large-scale image is cropped, without overlap, into 256x256 images.
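For illustration, a non-overlapping tiling step of this kind could be implemented as follows; the directory layout and PNG file pattern are assumptions, not part of the data set specification:

```python
from pathlib import Path
from PIL import Image

def crop_tiles(src_dir, dst_dir, tile=256):
    """Non-overlapping tiling of 1024x1024 scenes into 256x256 patches.
    The same tiling would be applied to both temporal images and labels."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(img_path)
        w, h = img.size
        for top in range(0, h, tile):
            for left in range(0, w, tile):
                patch = img.crop((left, top, left + tile, top + tile))
                patch.save(dst / f"{img_path.stem}_{top}_{left}.png")
```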
Step 1.3: dividing the cut image into a training set, a testing set and a verification set according to a set proportion;
in this embodiment, according to 7:2:1 is divided into a training set, a testing set and a verification set; the cropped final dataset includes 7120 training image pairs, 2048 test image pairs, 1024 verification image pairs; in this embodiment, as shown in fig. 3 and fig. 4, two time phase remote sensing images in the training set are respectively;
step 2: constructing a remote sensing image change detection model based on twin multi-scale cross attention, as shown in fig. 2;
The remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure with two weight-sharing branches to accept the bi-temporal remote sensing images as input. Shallow features rich in spatial information are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2. Four residual blocks then fully extract the information in the image, yielding four features at different scales; each residual block contains two 3×3 convolution layers, one BN layer, and one ReLU activation function. Before entering the ReLU activation function, the feature is fused with the original input feature by pixel-wise addition. The output feature maps of the four residual blocks are 1/2, 1/4, 1/8, and 1/16 of the input image size, with 64, 128, 256, and 512 channels, respectively.
Furthermore, to fully capture the valid information in the multi-scale features, a stacked attention module containing four CBAM blocks is integrated into the feature extractor. More specifically, one CBAM block is applied to the output features of each residual block to emphasize useful spatial and channel information; these features are then uniformly resized to half the original input image size, the four same-sized feature maps are concatenated along the channel dimension, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information. Each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information. The specific process is as follows:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1. A multi-layer perceptron (MLP) with shared weights, implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature:

Mc(F) = σ(MLP(Avg(F)) + MLP(Max(F)))

where MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling. The channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F:

F′ = Mc(F) ⊗ F

The output feature F′ of the channel attention module, of size C×H×W, is then taken as the input of the spatial attention module. F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer; these are concatenated along the channel dimension and fed into a 3×3 convolution layer, and a sigmoid activation function finally yields the spatial refinement matrix:

Ms(F′) = σ(f3×3([Avg(F′); Max(F′)]))

where f3×3 denotes a 3×3 convolution layer. The feature F″ refined by CBAM is therefore obtained by:

F″ = Ms(F′) ⊗ F′
The cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module. With input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to the attention layer. The attention weights are produced from the dot product between the query Qi and the key Kj, and the attention information is retrieved as the product of the value Vj and the attention weights. The attention layer is expressed as:

A(Qi, Kj, Vj) = softmax(Qi Kj^T / √d) Vj

where d is the feature dimension. Once the attention vector is obtained, it is added to the input feature Fi, yielding a new feature Fi,j:

Fi,j = Fi + A(Qi, Ki, Vi) + A(Qi, Kj, Vj)

The new feature Fj,i is computed in the same way. Finally, the output vector is computed through a 3×3 convolution and normalization.
The measurement module enlarges the feature maps to the original input image size through bilinear interpolation, computes the Euclidean distance of each position point of the feature tensor pair in the embedding space, then uses a contrastive loss function to pull unchanged feature tensor pairs closer together and push changed feature tensor pairs farther apart, and finally obtains the change detection result map through threshold segmentation;
step 3: and training a remote sensing image change detection model based on twin multi-scale cross attention.
Step 3.1: calculating an error between the predicted value and the real label using the contrast loss function:
D w(X1,X2) represents the euclidean distance between the input feature X 1 and the input feature X 2, W is a network weight, P represents a feature dimension, Y represents a predicted value, y=1 represents that the detected image corresponding position changes, y=0 represents that no change occurs, that is, no building is newly added or lost, m is a set threshold, and N is the number of samples.
Step 3.2: in the training process, a gradient descent method is adopted to optimize network parameters, and the gradient descent is defined as follows:
Where, ζ (θ) is a given loss function, n is the number of samples per training input, h θ(xi) is the weight of the training samples, x i is the training sample value, y i is the label value of the samples, and i is the serial number of the samples.
Step 3.3: in the training process, the parameters in the model are continuously and iteratively optimized by calculating a loss function between a change detection result and a real label and using a reverse propagation algorithm of errors until the model converges, and the training is finished;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph, wherein the change result graph is shown in fig. 5.
The pre-change remote sensing image and the post-change remote sensing image to be detected are respectively input into two branches of a remote sensing image change detection model, and a final change diagram is obtained through the prediction of the remote sensing image change detection model, as shown in fig. 6.
The foregoing description covers only the preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combinations of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example solutions in which the above features are replaced with (but not limited to) features with similar functions disclosed in the embodiments of the present disclosure.

Claims (7)

1. A remote sensing image change detection method based on twin multi-scale cross attention, characterized by comprising the following steps:
step 1: acquiring a change detection data set, and dividing it into a training set, a validation set and a test set;
step 2: constructing a remote sensing image change detection model based on twin multi-scale cross attention;
Step 3: training a remote sensing image change detection model based on twin multi-scale cross attention;
Step 4: and carrying out change detection by using the trained remote sensing image change detection model to obtain a change result graph.
2. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the step 1 specifically comprises the following steps:
Step 1.1: acquiring a change detection data set; the change detection data set comprises a plurality of types of building pictures, wherein the building pictures are double-time-phase remote sensing images, and specifically comprise three conditions of building new construction, building elimination and no change;
step 1.2: cutting the pictures in the change detection data set according to the set size;
Step 1.3: dividing the cut image into a training set, a testing set and a verification set according to a set proportion.
3. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the remote sensing image change detection model comprises a multi-scale feature extraction module, a cross attention module and a measurement module;
The multi-scale feature extraction module uses a ResNet-18 network with the final average pooling and fully connected layers removed as the feature extractor, and extends it into a twin structure with two weight-sharing branches to accept the bi-temporal remote sensing images as input; shallow features are first extracted by a 7×7 convolution layer with stride 1, followed by a BN layer and a ReLU activation function, and then a max pooling layer with stride 2; four residual blocks then extract the information in the image, yielding four features at different scales, wherein each residual block contains two 3×3 convolution layers, one BN layer and one ReLU activation function; before entering the ReLU activation function, the feature is fused with the original input feature by pixel-wise addition; the output feature maps of the four residual blocks are 1/2, 1/4, 1/8 and 1/16 of the input image size, with 64, 128, 256 and 512 channels, respectively;
furthermore, a stacked attention module containing four CBAM blocks is integrated into the feature extractor: one CBAM block is applied to the output features of each residual block, the resulting features are uniformly resized to half the original input image size, the four same-sized feature maps are concatenated along the channel dimension, and the number of channels is adjusted by a 1×1 convolution layer, yielding more discriminative features that contain multi-scale information; each CBAM block contains a channel attention module to capture channel relationships and a spatial attention module to capture spatial semantic information; the specific process comprises the following steps:
For a given feature F of size C×H×W, the channel attention module first applies an average pooling layer and a max pooling layer to the input feature, producing two vectors of size C×1×1; a multi-layer perceptron (MLP) with shared weights, implemented as two 1×1 convolution layers, then learns a weight for each channel; the two results are added, and a sigmoid activation function σ yields the channel attention feature:

Mc(F) = σ(MLP(Avg(F)) + MLP(Max(F)))

wherein MLP(·) is the multi-layer perceptron, Avg(·) is average pooling, and Max(·) is max pooling; the channel-refined feature F′ is the element-wise product of the channel attention feature Mc(F) and F:

F′ = Mc(F) ⊗ F

the output feature F′ of the channel attention module, of size C×H×W, is then taken as the input of the spatial attention module; F′ is first compressed into two matrices of size 1×H×W by an average pooling layer and a max pooling layer, which are concatenated along the channel dimension and fed into a 3×3 convolution layer, and a sigmoid activation function finally yields the spatial refinement matrix:

Ms(F′) = σ(f3×3([Avg(F′); Max(F′)]))

wherein f3×3 denotes a 3×3 convolution layer; the feature F″ refined by CBAM is therefore obtained by:

F″ = Ms(F′) ⊗ F′
4. The remote sensing image change detection method based on twin multi-scale cross attention according to claim 3, wherein the cross attention module further refines the image features F″ extracted by the multi-scale feature extraction module using cross attention; with input features Fi and Fj, queries Qi and Qj, keys Ki and Kj, and values Vi and Vj are first generated and passed to the attention layer; the attention weights are produced from the dot product between the query Qi and the key Kj, and the attention information is retrieved as the product of the value Vj and the attention weights; the attention layer is expressed as:

A(Qi, Kj, Vj) = softmax(Qi Kj^T / √d) Vj

wherein d is the feature dimension; once the attention vector is obtained, it is added to the input feature Fi, yielding a new feature Fi,j:

Fi,j = Fi + A(Qi, Ki, Vi) + A(Qi, Kj, Vj)

the new feature Fj,i is computed in the same way; finally, the output vector is computed through a 3×3 convolution and normalization.
5. The remote sensing image change detection method based on twin multi-scale cross attention according to claim 3, wherein the measurement module enlarges the feature maps to the original input image size through bilinear interpolation, computes the Euclidean distance of each position point of the feature tensor pair in the embedding space, then uses a contrastive loss function to pull unchanged feature tensor pairs closer together and push changed feature tensor pairs farther apart, and finally obtains the change detection result map through threshold segmentation.
6. The method for detecting the change of the remote sensing image based on the twin multi-scale cross attention according to claim 1, wherein the step 3 specifically comprises the following steps:
step 3.1: calculating the error between the predicted values and the ground-truth labels using the contrastive loss function:

L(W) = (1/(2N)) · Σ_{i=1}^{N} [ (1-Y)·Dw(X1,X2)² + Y·max(0, m-Dw(X1,X2))² ]

wherein Dw(X1,X2) = sqrt( Σ_{p=1}^{P} (X1^p - X2^p)² ) denotes the Euclidean distance between the input features X1 and X2, W is the network weight, P is the dimension of the feature, Y is the predicted value (Y=1 indicates that the corresponding position of the detected images has changed; Y=0 indicates no change, i.e., no building was added or removed), m is a set threshold, and N is the number of samples;
step 3.2: during training, the network parameters are optimized by gradient descent, defined as:

θ := θ - α·∇θ ξ(θ), with ξ(θ) = (1/(2n)) · Σ_{i=1}^{n} (hθ(xi) - yi)²

wherein ξ(θ) is the given loss function, α is the learning rate, n is the number of samples input per training step, hθ(xi) is the model's predicted value for training sample xi, xi is the training sample value, yi is the label value of the sample, and i is the index of the sample;
Step 3.3: in the training process, the parameters in the model are continuously and iteratively optimized by calculating a loss function between a change detection result and a real label and using a back propagation algorithm of errors until the model converges, and the training is finished.
7. The remote sensing image change detection method based on twin multi-scale cross attention according to claim 1, wherein step 4 specifically comprises: inputting the pre-change and post-change remote sensing images to be detected into the two branches of the remote sensing image change detection model respectively, and predicting the final change map by the remote sensing image change detection model.
CN202410124654.9A (priority date 2024-01-30; filing date 2024-01-30): Remote sensing image change detection method based on twin multi-scale cross attention. Published as CN117975267A. Status: Pending.

Priority Applications (1)

Application Number: CN202410124654.9A; Priority Date: 2024-01-30; Filing Date: 2024-01-30; Title: Remote sensing image change detection method based on twin multi-scale cross attention

Applications Claiming Priority (1)

Application Number: CN202410124654.9A; Priority Date: 2024-01-30; Filing Date: 2024-01-30; Title: Remote sensing image change detection method based on twin multi-scale cross attention

Publications (1)

Publication Number: CN117975267A; Publication Date: 2024-05-03

Family

Family ID: 90859142

Family Applications (1)

Application Number: CN202410124654.9A; Title: Remote sensing image change detection method based on twin multi-scale cross attention; Priority Date: 2024-01-30; Filing Date: 2024-01-30

Country Status (1)

Country Link
CN (1) CN117975267A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118470548A (en) * 2024-07-12 2024-08-09 湖南大学 Heterogeneous image change detection method based on width learning
CN118470548B (en) * 2024-07-12 2024-09-17 湖南大学 Heterogeneous image change detection method based on width learning
CN118587511A (en) * 2024-08-02 2024-09-03 南京信息工程大学 SPECT-MPI image classification method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination