CN114359613A - Remote sensing image scene classification method based on space and multi-channel fusion self-attention network - Google Patents

Remote sensing image scene classification method based on space and multi-channel fusion self-attention network

Info

Publication number
CN114359613A
Authority
CN
China
Prior art keywords
mapping
width
feature
layer
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011093081.6A
Other languages
Chinese (zh)
Inventor
陈志华
刘韵娜
刘潇丽
胡灼亮
仇隽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202011093081.6A priority Critical patent/CN114359613A/en
Publication of CN114359613A publication Critical patent/CN114359613A/en
Pending legal-status Critical Current

Abstract

The invention relates to the technical field of image processing, and provides a remote sensing image scene classification method based on a space and multi-channel fusion attention mechanism, which comprises the following steps: a. extracting residual characteristic information of the remote sensing image by using a ResNet network; b. performing feature mapping on the foreground content and the background content by using a spatial self-attention network to obtain spatial mapping features; c. performing multi-channel multi-scale feature mapping on the residual feature information by using a multi-channel fusion self-attention network to obtain multi-channel fusion mapping features; d. synthesizing the extracted spatial mapping features and the multi-channel fusion mapping features; e. classifying the synthesized mapping features by using a width classifier to obtain a classification result. The remote sensing image scene classification method based on space and multi-channel fusion can effectively improve the average accuracy on remote sensing scene classification data sets and reduce the computational cost.

Description

Remote sensing image scene classification method based on space and multi-channel fusion self-attention network
Technical Field
The invention relates to the technical field of image processing, in particular to a scene classification method of a remote sensing image.
Background
The purpose of remote sensing image scene classification is to assign a specific semantic class to a remote sensing image. Remote sensing image scene classification technology has attracted attention because of its application potential in fields such as urban monitoring, environmental monitoring and geographic structure analysis. The network framework for these tasks generally consists of two basic networks, namely a feature mapping network and a classification network.
In recent years, the strong feature learning capability of deep convolutional neural networks (CNNs) has remarkably improved the accuracy of remote sensing image scene classification. Existing methods for this task can be roughly divided into two types: methods based on the original CNN and feature mapping methods. Methods of the first type classify scenes directly with an existing CNN, extracting features only from the last layer of the deep convolutional neural network. Feature mapping methods typically first encode these features to improve the performance of scene classification. In fact, previous remote sensing image scene classification methods have shown that semantic features extracted from different hierarchical layers significantly improve classification accuracy. In addition, complex image background information also poses a considerable obstacle, and spatial information can significantly improve object detection. However, existing approaches typically ignore multi-layer structure and spatial region information.
Disclosure of Invention
Aiming at the defects of prior-art remote sensing image scene classification methods, the invention provides an end-to-end remote sensing image scene classification method that integrates a multilayer feature structure with spatial information. In addition, the classification method provides a new classifier, namely a width classifier, which is used to identify the class label of the remote sensing image. First, for the misrecognition problem caused by incomplete feature representation, a new algorithm is provided: the relationship between feature mappings across scales is enhanced through a multi-channel fusion attention module. Second, for detection errors caused by complex backgrounds, the method provides a spatial attention mechanism and adds it to the original feature mapping network. In the final stage, the width classifier is trained using the remapped feature information. The width classifier, which is built on a width learning system (BLS), can effectively reduce training time while maintaining classification performance.
In order to achieve the purpose, the remote sensing image scene classification method based on the spatial and multi-channel fusion self-attention network comprises the following steps:
step S1: providing a remote sensing image to be classified, preprocessing the remote sensing image and acquiring corresponding mapping characteristics.
Step S2: building a spatial self-attention network, wherein the network comprises a convolutional layer and an excitation layer; and inputting the mapping characteristics into a spatial self-attention network to obtain spatial mapping characteristics.
Step S3: building a multi-channel fusion self-attention network, wherein the network comprises an upsampling layer, a connection layer, a fusion layer, a splitting layer, a downsampling layer and a convolution layer; and inputting the mapping characteristics into the multi-channel fusion self-attention network to obtain the multi-channel fusion characteristics.
Step S4: synthesizing the spatial mapping characteristics and the multi-channel fusion characteristics.
Step S5: building a width classifier, wherein the classifier comprises a width feature mapping layer and a width feature enhancement layer; and inputting the synthesized spatial mapping characteristics and multi-channel fusion characteristics into the width classifier to obtain a classification result.
Optionally, in an embodiment of the invention, the input of step S1 is an H × W remote sensing image (H and W respectively denote the height and width of the remote sensing image), and the multi-layer residual mapping features of the image are extracted through a ResNet network and are denoted R-Conv-1~4.
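As an illustration only (not the patented implementation), the multi-layer residual feature extraction of this embodiment could be sketched in PyTorch as follows; the use of a torchvision ResNet-50 backbone, the 224 × 224 input size, and the function and variable names are assumptions made for the example:

```python
import torch
import torchvision

# Backbone used for the sketch; ResNet-50 is an assumption, any ResNet variant would do.
backbone = torchvision.models.resnet50(weights=None)
backbone.eval()

def extract_residual_features(x):
    """Return the outputs of the four residual stages, i.e. R-Conv-1~4."""
    x = backbone.conv1(x)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    r1 = backbone.layer1(x)   # R-Conv-1
    r2 = backbone.layer2(r1)  # R-Conv-2
    r3 = backbone.layer3(r2)  # R-Conv-3
    r4 = backbone.layer4(r3)  # R-Conv-4
    return r1, r2, r3, r4

# Example: one 224 x 224 RGB remote sensing image
with torch.no_grad():
    feats = extract_residual_features(torch.randn(1, 3, 224, 224))
```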
Optionally, in an embodiment of the present invention, the input of the step S2 is a first-layer residual mapping feature R-Conv-1, and the spatial self-attention weight is obtained by a convolution operation connected to an activation function, and the step may be represented as:
S = sigmoid(A_s * F) * F
wherein S is the output spatial self-attention mapping feature, sigmoid is the sigmoid activation function, A_s is the convolution kernel in the convolution layer, and F ∈ R^(H×W×C) (C is the feature dimension) is the input residual mapping feature.
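For illustration, a minimal PyTorch sketch of this spatial self-attention step is given below; realizing the convolution kernel A_s as a 1 × 1 convolution and the re-weighting as an element-wise product are assumptions of the sketch rather than details fixed by the embodiment:

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """S = sigmoid(A_s * F) * F, with A_s realized as a convolution (assumed 1x1)."""
    def __init__(self, channels):
        super().__init__()
        self.attn_conv = nn.Conv2d(channels, channels, kernel_size=1)  # plays the role of A_s

    def forward(self, f):
        # f: residual mapping feature of shape (B, C, H, W)
        return torch.sigmoid(self.attn_conv(f)) * f  # element-wise re-weighting of F

# Example usage on a 256-channel, 56 x 56 feature map
s = SpatialSelfAttention(256)(torch.randn(1, 256, 56, 56))
```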
Optionally, in an embodiment of the present invention, the step S3 includes:
Step S31: the inputs are the residual mapping features of layers 2-4 extracted from ResNet, denoted R-Conv-2~4, and R-Conv-3~4 are upsampled to the same size as R-Conv-2.
Step S32: inputting the aligned mapping feature map into the connection layer, and forming a complementary channel fusion feature through the fusion layer.
Step S33: inputting the channel fusion features into the splitting layer, splitting the complementary fusion features, and restoring them to their original sizes.
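The following hedged sketch illustrates steps S31-S33; the bilinear interpolation used for resampling, the 1 × 1 fusion convolution, and the default channel widths (those of ResNet-50 layers 2-4) are assumptions made for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelFusion(nn.Module):
    def __init__(self, channels=(512, 1024, 2048)):  # channel widths of R-Conv-2~4 (assumed)
        super().__init__()
        self.channels = list(channels)
        self.fuse = nn.Conv2d(sum(channels), sum(channels), kernel_size=1)  # fusion layer (assumed 1x1)

    def forward(self, r2, r3, r4):
        size = r2.shape[-2:]
        # S31: upsample R-Conv-3~4 to the spatial size of R-Conv-2
        aligned = [r2,
                   F.interpolate(r3, size=size, mode="bilinear", align_corners=False),
                   F.interpolate(r4, size=size, mode="bilinear", align_corners=False)]
        # S32: connection layer (concatenation) followed by the fusion layer
        fused = self.fuse(torch.cat(aligned, dim=1))
        # S33: split the complementary fusion feature and restore the original sizes
        f2, f3, f4 = torch.split(fused, self.channels, dim=1)
        f3 = F.interpolate(f3, size=r3.shape[-2:], mode="bilinear", align_corners=False)
        f4 = F.interpolate(f4, size=r4.shape[-2:], mode="bilinear", align_corners=False)
        return f2, f3, f4
```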
Optionally, in an embodiment of the present invention, the synthesis method in step S4 is as follows:
F′ = F + sigmoid((W_s * F) * (W_c * F))
wherein F′ is the synthesized feature, W_s is the spatial self-attention network weight, W_c is the multi-channel fusion self-attention network weight, and * denotes matrix multiplication.
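A minimal sketch of this synthesis is given below; reading W_s * F and W_c * F as the responses of the spatial and multi-channel fusion branches, supplied here as hypothetical callables, is an assumption about the intended reading of the formula rather than a statement of the patented implementation:

```python
import torch

def synthesize(f, spatial_branch, channel_branch):
    """F' = F + sigmoid((W_s * F) * (W_c * F)), with the two attention branches
    passed in as callables (hypothetical names used only for this sketch)."""
    ws_f = spatial_branch(f)   # W_s * F : spatial self-attention response
    wc_f = channel_branch(f)   # W_c * F : multi-channel fusion response
    return f + torch.sigmoid(ws_f * wc_f)  # residual re-weighting of F
```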
Optionally, in an embodiment of the present invention, the step S5 includes:
s51: the input of the width classifier is a synthesized feature F', and the corresponding width mapping feature can be expressed as:
M_i = φ(F′ W_si + β_si), i = 1, 2, ..., n
wherein M_i is the i-th width mapping feature, φ is the activation function, W_si is the width mapping convolution kernel, β_si is a randomly generated bias term, and n is the total number of feature nodes. Thus, all width mapping features may be denoted as M^n = [M_1, M_2, ..., M_n].
S52: the input of the width feature enhancement layer is the width mapping features, and the m-th group of width enhancement features satisfies the following expression:
H_m = σ(M^n W_hm + β_hm)
wherein H_m is the m-th group of width enhancement features, σ is the activation function, W_hm is the width enhancement convolution kernel, and β_hm is a randomly generated bias term.
S53: the output classification result can be expressed as:
Y = [M_1, M_2, ..., M_n | σ(M^n W_h1 + β_h1), ..., σ(M^n W_hm + β_hm)] W_m
wherein W_m represents the connection weights of the width mapping nodes and the width enhancement nodes.
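For illustration, a NumPy sketch of such a width classifier (a broad/width learning system) is given below; the node counts, the tanh activations, and the ridge-regularized pseudo-inverse solution for W_m are assumptions typical of width learning systems rather than details taken from the embodiment:

```python
import numpy as np

def train_width_classifier(f_prime, y, n_map=10, map_dim=64, n_enh=256, lam=1e-3, seed=0):
    """Fit the width classifier: width mapping nodes M_i = phi(F' W_si + beta_si),
    enhancement nodes H_m = sigma(M^n W_hm + beta_hm), and output weights W_m
    solved in closed form (ridge regression) rather than by back-propagation.

    f_prime: (N, D) synthesized features; y: (N, K) one-hot class labels."""
    rng = np.random.default_rng(seed)
    d = f_prime.shape[1]
    # width feature mapping layer: M_i = phi(F' W_si + beta_si)
    maps = []
    for _ in range(n_map):
        w_s = rng.standard_normal((d, map_dim))
        b_s = rng.standard_normal(map_dim)
        maps.append(np.tanh(f_prime @ w_s + b_s))
    m_n = np.hstack(maps)                       # M^n = [M_1, ..., M_n]
    # width feature enhancement layer: H_m = sigma(M^n W_hm + beta_hm)
    w_h = rng.standard_normal((m_n.shape[1], n_enh))
    b_h = rng.standard_normal(n_enh)
    h_m = np.tanh(m_n @ w_h + b_h)
    a = np.hstack([m_n, h_m])                   # [M^n | H^m]
    # W_m via ridge-regularized least squares: no back-propagation required
    w_m = np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T @ y)
    return w_m
```

Because W_m is obtained in closed form, no back-propagation through the mapping and enhancement layers is required, which is the source of the training-time advantage claimed for the width classifier.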
The remote sensing image scene classification method provided by the invention comprises a spatial self-attention network, a multi-channel fusion self-attention network and a width classifier. The core idea of the method is to fully utilize the relationships between different layers and the spatial information of the target object. Compared with the prior art, the remote sensing image scene classification method has at least the following advantages:
1) the invention adopts the multi-channel fusion self-attention network, fully utilizes the relation among different network layers and improves the classification accuracy.
2) The invention adopts the space self-attention network, and can effectively extract useful information from the complex background image.
3) The invention adopts the width classifier as a flat network; it does not require a complex back-propagation process during training, which can significantly accelerate training and reduce training time.
Drawings
FIG. 1 is a flow chart of a remote sensing image scene classification method based on a spatial and multi-channel fusion self-attention network;
FIG. 2 shows a network design block diagram of the remote sensing image scene classification method based on the spatial and multi-channel fusion self-attention network;
FIG. 3 is a schematic diagram comparing the remote sensing image scene classification of the method of the present invention with that of other methods.
Detailed description of the invention
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Referring to fig. 1, the method of an embodiment of the present invention operates as follows: s1, extracting mapping characteristics of a remote sensing image; s2, extracting spatial features; s3, extracting multi-channel fusion characteristics; s4, synthesizing spatial features and multi-channel fusion features; and S5, classifying the remote sensing image scene.
For step S1, the present invention employs a ResNet network to extract a multi-scale feature map. The structure of our proposed method is shown in FIG. 1. X ∈ R^(H×W) denotes the input image (H and W denote the height and width of the image, respectively), and the residual feature map learned from the ResNet network is denoted as F ∈ R^(H×W×C).
For step S2, the present invention proposes a spatial self-attention mechanism. As shown in FIG. 1, the input feature map of the spatial attention module is F ∈ R^(H×W×C) and the output is S ∈ R^(H×W×C). The spatial attention mechanism can be expressed as:
S = sigmoid(A_s * F) * F
wherein sigmoid is the sigmoid activation function and A_s is the convolution kernel of the spatial attention module.
For step S3, the feature map reflects the semantic and structural information of the remote sensing image, which is of great significance for visual classification and recognition tasks. Furthermore, feature maps from different levels tend to have different characteristics. Therefore, the invention provides a channel fusion self-attention network, which integrates the feature mappings of different layers to enhance the ability to discriminate between scene types. As shown in FIG. 2, we represent the residual features extracted in ResNet as R-Conv-1~4 and upsample R-Conv-3~4 to the same size as R-Conv-2. Then, the aligned feature maps are concatenated to form new channel mapping information, and enhancement is performed across all the layers. We then split the fused block and restore each part to its original size. In effect, this operation strengthens the complementary relationships among the feature mappings of these different levels.
For step S4, it is important for the image classification task to select an appropriate way to represent the semantic information of the image. We propose a spatial and multi-channel fusion self-attention mechanism to solve this image representation problem. Recognizing the image category is equally important; accordingly, the present invention provides a width classifier to identify image classes. A flat network called the width learning system is used to perform the image classification task. The whole classification process is divided into two steps: feature mapping and node enhancement.
The input of the width classifier is a synthesized feature F', and the corresponding width mapping feature can be expressed as:
M_i = φ(F′ W_si + β_si), i = 1, 2, ..., n
wherein M_i is the i-th width mapping feature, φ is the activation function, W_si is the width mapping convolution kernel, β_si is a randomly generated bias term, and n is the total number of feature nodes. Thus, all width mapping features may be denoted as M^n = [M_1, M_2, ..., M_n].
The input of the width feature enhancement layer is the width mapping features, and the m-th group of width enhancement features satisfies the following expression:
H_m = σ(M^n W_hm + β_hm)
wherein H_m is the m-th group of width enhancement features, σ is the activation function, W_hm is the width enhancement convolution kernel, and β_hm is a randomly generated bias term.
The output classification result can be expressed as:
Y = [M_1, M_2, ..., M_n | σ(M^n W_h1 + β_h1), ..., σ(M^n W_hm + β_hm)] W_m
wherein W_m represents the connection weights of the width mapping nodes and the width enhancement nodes.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A remote sensing image scene classification method based on a space and multi-channel fusion attention mechanism is characterized by comprising the following steps:
step S1: providing a remote sensing image to be classified, preprocessing the remote sensing image and acquiring corresponding mapping characteristics.
Step S2: building a spatial self-attention network, wherein the network comprises a convolutional layer and an excitation layer; and inputting the mapping characteristics into a spatial self-attention network to obtain spatial mapping characteristics.
Step S3: building a multi-channel fusion self-attention network, wherein the network comprises an upsampling layer, a connection layer, a fusion layer, a splitting layer, a downsampling layer and a convolution layer; and inputting the mapping characteristics into the multi-channel fusion self-attention network to obtain the multi-channel fusion characteristics.
Step S4: synthesizing the spatial mapping characteristics and the multi-channel fusion characteristics.
Step S5: building a width classifier, wherein the classifier comprises a width feature mapping layer and a width feature enhancement layer; and inputting the synthesized spatial mapping characteristics and multi-channel fusion characteristics into the width classifier to obtain a classification result.
2. The method according to claim 1, wherein the input of step S1 is an H × W remote sensing image (H and W respectively denote the height and width of the remote sensing image), and the multi-layer residual mapping features of the image are extracted through a ResNet network and are denoted R-Conv-1~4.
3. The method according to claim 1, wherein the input of step S2 is a first layer residual mapping feature R-Conv-1, and the spatial self-attention weight is obtained by a convolution operation connected to an activation function, and the steps are expressed as:
S = sigmoid(A_s * F) * F
wherein S is the output spatial self-attention mapping feature, sigmoid is the sigmoid activation function, A_s is the convolution kernel in the convolution layer, and F ∈ R^(H×W×C) (C is the feature dimension) is the input residual mapping feature.
4. The method according to claim 1, wherein the step S3 includes:
Step S31: the inputs are the residual mapping features of layers 2-4 extracted from ResNet, denoted R-Conv-2~4, and R-Conv-3~4 are upsampled to the same size as R-Conv-2.
Step S32: inputting the aligned mapping feature map into the connection layer, and forming a complementary channel fusion feature through the fusion layer.
Step S33: inputting the channel fusion features into the splitting layer, splitting the complementary fusion features, and restoring them to their original sizes.
5. The method according to claim 1, wherein the synthesis method in step S4 is as follows:
F′ = F + sigmoid((W_s * F) * (W_c * F))
wherein F′ is the synthesized feature, W_s is the spatial self-attention network weight, W_c is the multi-channel fusion self-attention network weight, and * denotes matrix multiplication.
6. The method according to claim 1, wherein step S5 includes:
s51: the input of the width classifier is a synthesized feature F', and the corresponding width mapping feature can be expressed as:
M_i = φ(F′ W_si + β_si), i = 1, 2, ..., n
wherein M_i is the i-th width mapping feature, φ is the activation function, W_si is the width mapping convolution kernel, β_si is a randomly generated bias term, and n is the total number of feature nodes. Thus, all width mapping features may be denoted as M^n = [M_1, M_2, ..., M_n].
S52: the input of the width feature enhancement layer is the width mapping features, and the m-th group of width enhancement features satisfies the following expression:
H_m = σ(M^n W_hm + β_hm)
wherein H_m is the m-th group of width enhancement features, σ is the activation function, W_hm is the width enhancement convolution kernel, and β_hm is a randomly generated bias term.
S53: the output classification result can be expressed as:
Y = [M_1, M_2, ..., M_n | σ(M^n W_h1 + β_h1), ..., σ(M^n W_hm + β_hm)] W_m
wherein W_m represents the connection weights of the width mapping nodes and the width enhancement nodes.
CN202011093081.6A 2020-10-13 2020-10-13 Remote sensing image scene classification method based on space and multi-channel fusion self-attention network Pending CN114359613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011093081.6A CN114359613A (en) 2020-10-13 2020-10-13 Remote sensing image scene classification method based on space and multi-channel fusion self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011093081.6A CN114359613A (en) 2020-10-13 2020-10-13 Remote sensing image scene classification method based on space and multi-channel fusion self-attention network

Publications (1)

Publication Number Publication Date
CN114359613A true CN114359613A (en) 2022-04-15

Family

ID=81089672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011093081.6A Pending CN114359613A (en) 2020-10-13 2020-10-13 Remote sensing image scene classification method based on space and multi-channel fusion self-attention network

Country Status (1)

Country Link
CN (1) CN114359613A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721301A (en) * 2023-08-10 2023-09-08 中国地质大学(武汉) Training method, classifying method, device and storage medium for target scene classifying model
CN116721301B (en) * 2023-08-10 2023-10-24 中国地质大学(武汉) Training method, classifying method, device and storage medium for target scene classifying model

Similar Documents

Publication Publication Date Title
CN112966684B (en) Cooperative learning character recognition method under attention mechanism
CN107038448B (en) Target detection model construction method
CN111210443A (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN111047551A (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN109948707B (en) Model training method, device, terminal and storage medium
CN112949673A (en) Feature fusion target detection and identification method based on global attention
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN110517270B (en) Indoor scene semantic segmentation method based on super-pixel depth network
CN111160407A (en) Deep learning target detection method and system
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN112232371A (en) American license plate recognition method based on YOLOv3 and text recognition
Li et al. A review of deep learning methods for pixel-level crack detection
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN114743126A (en) Lane line sign segmentation method based on graph attention machine mechanism network
CN114359613A (en) Remote sensing image scene classification method based on space and multi-channel fusion self-attention network
CN113223006B (en) Lightweight target semantic segmentation method based on deep learning
CN114494827A (en) Small target detection method for detecting aerial picture
Kaushik et al. A Survey of Approaches for Sign Language Recognition System
CN115272814B (en) Long-distance space self-adaptive multi-scale small target detection method
CN117075778B (en) Information processing system for picture and text
CN117496160B (en) Indoor scene-oriented semantic segmentation method for low-illumination image shot by unmanned aerial vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination