CN114359613A - Remote sensing image scene classification method based on space and multi-channel fusion self-attention network - Google Patents
Abstract
The invention relates to the technical field of image processing, and provides a remote sensing image scene classification method based on a spatial and multi-channel fusion self-attention mechanism, comprising the following steps: a. extracting residual feature information of the remote sensing image with a ResNet network; b. performing feature mapping on the foreground and background content with a spatial self-attention network to obtain spatial mapping features; c. performing multi-channel, multi-scale feature mapping on the residual feature information with a multi-channel fusion self-attention network to obtain multi-channel fusion mapping features; d. synthesizing the extracted spatial mapping features and the multi-channel fusion mapping features; e. classifying the synthesized mapping features with a width classifier to obtain the classification result. The remote sensing image scene classification method based on spatial and multi-channel fusion can effectively improve the average precision on remote sensing scene classification data sets while reducing computational cost.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a scene classification method for remote sensing images.
Background
The purpose of remote sensing image scene classification is to assign a specific semantic class to a remote sensing image. The technology has attracted attention owing to its application potential in fields such as urban monitoring, environment detection, and geographic structure analysis. The network framework for these tasks generally consists of two basic networks: a feature mapping network and a class classification network.
In recent years, the strong feature learning capability of deep convolutional neural networks (CNNs) has remarkably improved the accuracy of remote sensing image scene classification. Existing methods for this task can be roughly divided into two types: raw-CNN methods and feature mapping methods. The first type classifies scenes directly with a standard CNN, extracting features only from the last layer of the deep convolutional neural network. Feature mapping methods instead first encode these features to improve scene classification performance. Indeed, previous work on remote sensing scene classification has shown that semantic features extracted from different layers of the hierarchy significantly improve classification accuracy. In addition, complicated image background information poses a considerable obstacle, and spatial information can significantly help with the object detection problem. However, existing approaches typically ignore both the multi-layer structure and spatial region information.
Disclosure of Invention
Aiming at the defects of prior remote sensing image scene classification methods, the invention provides an end-to-end remote sensing image scene classification method that integrates the multi-layer feature structure with spatial information. In addition, the method introduces a new classifier, the width classifier, to identify the class label of a remote sensing image. First, for the misrecognition caused by incomplete feature representation, a new module is provided: the multi-channel fusion self-attention module, which strengthens the relationships between feature mappings across scales. Second, for the detection errors caused by complex backgrounds, a spatial self-attention mechanism is added to the original feature mapping network. In the final stage, the width classifier is trained on the remapped feature information. Because the width classifier is built on a broad learning system (BLS), it effectively reduces training time while maintaining classification performance.
In order to achieve the purpose, the remote sensing image scene classification method based on the spatial and multi-channel fusion self-attention network comprises the following steps:
step S1: providing a remote sensing image to be classified, preprocessing the remote sensing image and acquiring corresponding mapping characteristics.
Step S2: building a spatial self-attention network, wherein the network comprises a convolutional layer and an excitation layer; and inputting the mapping characteristics into a spatial self-attention network to obtain spatial mapping characteristics.
Step S3: building a multi-channel fusion self-attention network, wherein the network comprises an upper sampling layer, a connecting layer, a fusion layer, a splitting layer, a lower sampling layer and a convolution layer; and inputting the mapping characteristics into a multi-channel fusion self-attention network to obtain the multi-channel fusion characteristics.
Step S4: and synthesizing the spatial mapping feature and the multi-channel fusion feature.
Step S5: building a width classifier, wherein the classifier comprises a width feature mapping layer and a width feature enhancement layer; and inputting the synthesized spatial mapping features and multi-channel fusion features into the width classifier to obtain the classification result.
Optionally, in an embodiment of the invention, the input of step S1 is an H × W remote sensing image (H and W respectively denote the height and width of the image), and the multi-layer residual mapping features of the image are extracted through a ResNet network, denoted R-Conv-1~4.
Optionally, in an embodiment of the present invention, the input of step S2 is the first-layer residual mapping feature R-Conv-1, and the spatial self-attention weight is obtained by a convolution operation followed by an activation function. The step can be expressed as:
S = sigmoid(A_s F) * F
wherein S is the output spatial self-attention mapping feature, sigmoid is the sigmoid activation function, A_s is the convolution kernel of the convolution layer, and F ∈ R^(H×W×C) (C is the feature dimension) is the input residual mapping feature.
Optionally, in an embodiment of the present invention, the step S3 includes:
Step S31: the input is the layer 2~4 residual mapping features extracted from ResNet, denoted R-Conv-2~4; R-Conv-3~4 are upsampled to the same size as R-Conv-2.
Step S32: inputting the aligned mapping feature map into the connection layer, and forming a complementary channel fusion feature through the fusion layer.
Step S33: inputting the channel fusion characteristics into a segmentation layer, segmenting the complementary fusion characteristics, and restoring the complementary fusion characteristics to the original size.
Optionally, in an embodiment of the present invention, the synthesis method in step S4 is as follows:
F′ = F + sigmoid((W_s * F) * (W_c * F))
wherein F′ is the synthesized feature, W_s is the spatial self-attention network weight, W_c is the multi-channel fusion self-attention network weight, and * denotes matrix multiplication.
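As a rough illustration, the synthesis step can be sketched in NumPy. Here the branch weights W_s and W_c are modeled as 1×1 channel-mixing matrices and the product of the two branches is taken element-wise; this is an assumed reading of the formula, not the patent's exact operator:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def synthesize(feat, w_s, w_c):
    """F' = F + sigmoid((W_s F) * (W_c F)) on an (H, W, C) feature map.
    The branch weights are modeled as 1x1 convolutions (channel-mixing
    matrices); an illustrative reading of the patent's formula."""
    spatial = feat @ w_s   # spatial self-attention branch
    channel = feat @ w_c   # multi-channel fusion branch
    return feat + sigmoid(spatial * channel)  # residual combination

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 8, 4))        # toy residual feature map
W_s = rng.standard_normal((4, 4)) * 0.1   # hypothetical branch weights
W_c = rng.standard_normal((4, 4)) * 0.1
F_prime = synthesize(F, W_s, W_c)
print(F_prime.shape)  # (8, 8, 4)
```

Because the sigmoid gate lies strictly in (0, 1), the synthesized feature is always a bounded residual offset of the input, which keeps the original mapping intact while injecting the attention signal.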
Optionally, in an embodiment of the present invention, the step S5 includes:
S51: the input of the width classifier is the synthesized feature F′, and the corresponding width mapping features can be expressed as:
M_i = φ(F′ W_si + β_si), i = 1, 2, ..., n
wherein M_i is the i-th width mapping feature, φ is the activation function, W_si is the width mapping convolution kernel, β_si is a randomly generated bias term, and n is the total number of feature points. Thus, all width mapping features can be denoted as M^n = [M_1, M_2, ..., M_n].
S52: the input of the width feature enhancement layer is the width mapping features, and the obtained m-th group of width enhancement features satisfies the following expression:
Hm=σ(MnWhm+βhm)
wherein H_m is the m-th group of width enhancement features, σ is the activation function, W_hm is the width enhancement convolution kernel, and β_hm is a randomly generated bias term.
S53: the output classification result can be expressed as:
Y = [M_1, M_2, ..., M_n | σ(M^n W_h1 + β_h1), ..., σ(M^n W_hm + β_hm)] W^m
wherein W^m represents the connection weights of the width mapping nodes and the width enhancement nodes.
The remote sensing image scene classification method provided by the invention comprises a spatial self-attention network, a multi-channel fusion self-attention network, and a width classifier. The core idea of the method is to fully exploit the relationships between different layers and the spatial information of the target object. Compared with the prior art, the method has at least the following advantages:
1) the invention adopts the multi-channel fusion self-attention network, fully utilizes the relation among different network layers and improves the classification accuracy.
2) The invention adopts the space self-attention network, and can effectively extract useful information from the complex background image.
3) The invention adopts the width classifier, a flat network that does not require a complex back-propagation process during training, which significantly accelerates training and reduces training time.
Drawings
FIG. 1 is a flow chart of a remote sensing image scene classification method based on a spatial and multi-channel fusion self-attention network;
FIG. 2 shows a network design block diagram of the remote sensing image scene classification method based on the spatial and multi-channel fusion self-attention network;
FIG. 3 is a schematic diagram comparing the remote sensing image scene classification of the method of the present invention with that of other methods.
Detailed description of the invention
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1, the method of an embodiment of the present invention operates as follows: s1, extracting mapping characteristics of a remote sensing image; s2, extracting spatial features; s3, extracting multi-channel fusion characteristics; s4, synthesizing spatial features and multi-channel fusion features; and S5, classifying the remote sensing image scene.
For step S1, the invention employs a ResNet network to extract a multi-scale feature map. The structure of the proposed method is shown in FIG. 1. Let X ∈ R^(H×W) denote the input image (H and W respectively denote the height and width of the image); the residual feature map learned by the ResNet network is denoted F ∈ R^(H×W×C).
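The residual mapping that ResNet learns can be illustrated with a minimal identity-shortcut block; the 1×1 channel-mixing weights below are toy stand-ins for the real convolution layers, not the patent's architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = x + W2 relu(W1 x): the identity shortcut that defines a
    residual mapping. 1x1 channel-mixing matrices stand in for the
    real 3x3 convolutions of a ResNet block."""
    return x + relu(x @ w1) @ w2

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 8, 4))        # toy input image features
w1 = rng.standard_normal((4, 4)) * 0.1    # hypothetical block weights
w2 = rng.standard_normal((4, 4)) * 0.1
F = residual_block(X, w1, w2)
print(F.shape)  # (8, 8, 4)
```

The shortcut means the block only has to learn the residual correction to its input, which is what makes features from layers 1 through 4 usable as the complementary R-Conv-1~4 maps later in the method.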
For step S2, the invention proposes a spatial self-attention mechanism. As shown in FIG. 1, the input of the spatial self-attention module is the feature map F ∈ R^(H×W×C), and the output is S ∈ R^(H×W×C). The spatial self-attention mechanism can be expressed as:
S = sigmoid(A_s F) * F
wherein sigmoid is the sigmoid activation function and A_s is the convolution kernel of the spatial self-attention module.
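A minimal NumPy sketch of this gating step, under the assumption that A_s acts as a 1×1 convolution (the patent only specifies a convolution operation connected to an activation function):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_self_attention(feat, a_s):
    """S = sigmoid(A_s F) * F: a learned mask in (0, 1) gates every
    spatial position of F. A_s is modeled as a 1x1 convolution, i.e.
    a C x C channel-mixing matrix (an assumed form)."""
    mask = sigmoid(feat @ a_s)   # same (H, W, C) shape as the input
    return mask * feat           # element-wise re-weighting

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 8, 4))        # toy R-Conv-1 feature map
A_s = rng.standard_normal((4, 4)) * 0.1   # hypothetical 1x1 kernel
S = spatial_self_attention(F, A_s)
print(S.shape)  # (8, 8, 4)
```

Since the mask is bounded by 1, attention can only suppress background responses, never amplify them, which is the intended effect on complex-background images.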
For step S3, the feature map reflects the semantic and structural information of the remote sensing image, which is of great significance for visual classification and recognition tasks. Furthermore, feature maps from different levels tend to carry different characteristics. The invention therefore provides a multi-channel fusion self-attention network that integrates feature mappings from different layers to strengthen the discrimination of scene types. As shown in FIG. 2, the residual features extracted by ResNet are denoted R-Conv-1~4, and R-Conv-3~4 are upsampled to the same size as R-Conv-2. The aligned feature maps are then concatenated into new channel mapping information, and enhancement is performed across all layers. The fused block is then split and restored to the original sizes. In effect, this operation strengthens the complementary relationships between feature mappings at these different levels.
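The align, concatenate, fuse, split, restore pipeline can be sketched as follows; the nearest-neighbour resizing and the simple fusion gate are illustrative choices, since the patent does not fix those operators:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def resize_nn(feat, size):
    """Nearest-neighbour resize of an (H, W, C) map to (size, size, C)."""
    h, w, _ = feat.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return feat[rows][:, cols]

def multi_channel_fusion(feats):
    """Upsample every map to the first (largest) map's size, concatenate
    along the channel axis, apply a toy fusion gate, then split per layer
    and restore the original sizes. The gate is illustrative only."""
    target = feats[0].shape[0]
    sizes = [f.shape[0] for f in feats]
    channels = [f.shape[2] for f in feats]
    aligned = [resize_nn(f, target) for f in feats]             # up-sampling
    fused = np.concatenate(aligned, axis=2)                     # connection
    fused = fused * sigmoid(fused.mean(axis=2, keepdims=True))  # fusion gate
    splits = np.split(fused, np.cumsum(channels)[:-1], axis=2)  # splitting
    return [resize_nn(s, sz) for s, sz in zip(splits, sizes)]   # down-sampling

# toy R-Conv-2..4 maps at decreasing resolution and increasing depth
rng = np.random.default_rng(3)
feats = [rng.standard_normal((16, 16, 4)),
         rng.standard_normal((8, 8, 8)),
         rng.standard_normal((4, 4, 16))]
out = multi_channel_fusion(feats)
print([o.shape for o in out])  # [(16, 16, 4), (8, 8, 8), (4, 4, 16)]
```

Each output map keeps its original shape, so the fused maps can be dropped back into the network exactly where the originals were, with each layer now carrying information gated by all the others.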
For step S4, selecting an appropriate way to represent the semantic information of an image is important for the classification task, and the spatial and multi-channel fusion self-attention mechanism addresses this image representation problem. Recognizing the image class is equally important, so the invention provides a width classifier to identify image classes: a flat network called a broad learning system performs the classification task. The whole classification process is divided into two steps, feature mapping and node enhancement.
The input of the width classifier is the synthesized feature F′, and the corresponding width mapping features can be expressed as:
M_i = φ(F′ W_si + β_si), i = 1, 2, ..., n
wherein M_i is the i-th width mapping feature, φ is the activation function, W_si is the width mapping convolution kernel, β_si is a randomly generated bias term, and n is the total number of feature points. Thus, all width mapping features can be denoted as M^n = [M_1, M_2, ..., M_n].
The input of the width feature enhancement layer is the width mapping features, and the obtained m-th group of width enhancement features satisfies the following expression:
Hm=σ(MnWhm+βhm)
wherein H_m is the m-th group of width enhancement features, σ is the activation function, W_hm is the width enhancement convolution kernel, and β_hm is a randomly generated bias term.
The output classification result can be expressed as:
Y = [M_1, M_2, ..., M_n | σ(M^n W_h1 + β_h1), ..., σ(M^n W_hm + β_hm)] W^m
wherein W^m represents the connection weights of the width mapping nodes and the width enhancement nodes.
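The width classifier's flat structure (random width mapping nodes, random enhancement nodes, then a closed-form solve for the connection weights W^m, with no back-propagation) can be sketched as below. Node counts, the regulariser, and the ridge-regression solve are assumptions in the spirit of broad learning systems, not values from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class WidthClassifier:
    """Flat broad-learning-style classifier: randomly weighted mapping
    nodes M_i and enhancement nodes H_m; only the output weights W^m are
    learned, in closed form by ridge regression (no back-propagation)."""

    def __init__(self, n_map=10, n_enh=10, reg=1e-2, seed=0):
        self.n_map, self.n_enh, self.reg = n_map, n_enh, reg
        self.rng = np.random.default_rng(seed)

    def _nodes(self, X):
        M = sigmoid(X @ self.W_s + self.b_s)   # width mapping features M^n
        H = sigmoid(M @ self.W_h + self.b_h)   # width enhancement features
        return np.hstack([M, H])               # [M^n | H_1 .. H_m]

    def fit(self, X, Y):
        d = X.shape[1]
        self.W_s = self.rng.standard_normal((d, self.n_map))
        self.b_s = self.rng.standard_normal(self.n_map)
        self.W_h = self.rng.standard_normal((self.n_map, self.n_enh))
        self.b_h = self.rng.standard_normal(self.n_enh)
        A = self._nodes(X)
        # ridge-regularised least squares for the connection weights W^m
        self.W_m = np.linalg.solve(A.T @ A + self.reg * np.eye(A.shape[1]),
                                   A.T @ Y)
        return self

    def predict(self, X):
        return self._nodes(X) @ self.W_m

# toy two-class problem standing in for flattened synthesized features F'
rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((50, 16)) + 2,
               rng.standard_normal((50, 16)) - 2])
Y = np.vstack([np.tile([1.0, 0.0], (50, 1)),
               np.tile([0.0, 1.0], (50, 1))])
clf = WidthClassifier().fit(X, Y)
acc = (clf.predict(X).argmax(axis=1) == Y.argmax(axis=1)).mean()
print(clf.predict(X).shape)  # (100, 2)
```

Because training reduces to one linear solve over the concatenated node matrix, the fast-training claim follows directly from the structure: the only learned parameters are W^m.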
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (6)
1. A remote sensing image scene classification method based on a space and multi-channel fusion attention mechanism is characterized by comprising the following steps:
step S1: providing a remote sensing image to be classified, preprocessing the remote sensing image and acquiring corresponding mapping characteristics.
Step S2: building a spatial self-attention network, wherein the network comprises a convolutional layer and an excitation layer; and inputting the mapping characteristics into a spatial self-attention network to obtain spatial mapping characteristics.
Step S3: building a multi-channel fusion self-attention network, wherein the network comprises an upper sampling layer, a connecting layer, a fusion layer, a splitting layer, a lower sampling layer and a convolution layer; and inputting the mapping characteristics into a multi-channel fusion self-attention network to obtain the multi-channel fusion characteristics.
Step S4: and synthesizing the spatial mapping feature and the multi-channel fusion feature.
Step S5: building a width classifier, wherein the classifier comprises a width feature mapping layer and a width feature enhancement layer; and inputting the synthesized spatial mapping features and multi-channel fusion features into the width classifier to obtain the classification result.
2. The method according to claim 1, wherein the input of step S1 is an H × W remote sensing image (H and W respectively denote the height and width of the image), and the multi-layer residual mapping features of the image are extracted through a ResNet network, denoted R-Conv-1~4.
3. The method according to claim 1, wherein the input of step S2 is the first-layer residual mapping feature R-Conv-1, and the spatial self-attention weight is obtained by a convolution operation followed by an activation function, expressed as:
S = sigmoid(A_s F) * F
wherein S is the output spatial self-attention mapping feature, sigmoid is the sigmoid activation function, A_s is the convolution kernel of the convolution layer, and F ∈ R^(H×W×C) (C is the feature dimension) is the input residual mapping feature.
4. The method according to claim 1, wherein the step S3 includes:
Step S31: the input is the layer 2~4 residual mapping features extracted from ResNet, denoted R-Conv-2~4; R-Conv-3~4 are upsampled to the same size as R-Conv-2.
Step S32: inputting the aligned mapping feature map into the connection layer, and forming a complementary channel fusion feature through the fusion layer.
Step S33: inputting the channel fusion characteristics into a segmentation layer, segmenting the complementary fusion characteristics, and restoring the complementary fusion characteristics to the original size.
5. The method according to claim 1, wherein the synthesis method in step S4 is as follows:
F′ = F + sigmoid((W_s * F) * (W_c * F))
wherein F′ is the synthesized feature, W_s is the spatial self-attention network weight, W_c is the multi-channel fusion self-attention network weight, and * denotes matrix multiplication.
6. The method according to claim 1, wherein step S5 includes:
S51: the input of the width classifier is the synthesized feature F′, and the corresponding width mapping features can be expressed as:
M_i = φ(F′ W_si + β_si), i = 1, 2, ..., n
wherein M_i is the i-th width mapping feature, φ is the activation function, W_si is the width mapping convolution kernel, β_si is a randomly generated bias term, and n is the total number of feature points. Thus, all width mapping features can be denoted as M^n = [M_1, M_2, ..., M_n].
S52: the input of the width feature enhancement layer is the width mapping features, and the obtained m-th group of width enhancement features satisfies the following expression:
Hm=σ(MnWhm+βhm)
wherein H_m is the m-th group of width enhancement features, σ is the activation function, W_hm is the width enhancement convolution kernel, and β_hm is a randomly generated bias term.
S53: the output classification result can be expressed as:
Y = [M_1, M_2, ..., M_n | σ(M^n W_h1 + β_h1), ..., σ(M^n W_hm + β_hm)] W^m
wherein W^m represents the connection weights of the width mapping nodes and the width enhancement nodes.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011093081.6A | 2020-10-13 | 2020-10-13 | Remote sensing image scene classification method based on space and multi-channel fusion self-attention network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114359613A | 2022-04-15 |
Family

ID=81089672

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011093081.6A (Pending) | Remote sensing image scene classification method based on space and multi-channel fusion self-attention network | 2020-10-13 | 2020-10-13 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114359613A (en) |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116721301A | 2023-08-10 | 2023-09-08 | 中国地质大学(武汉) | Training method, classifying method, device and storage medium for target scene classifying model |
| CN116721301B | 2023-08-10 | 2023-10-24 | 中国地质大学(武汉) | Training method, classifying method, device and storage medium for target scene classifying model |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |