CN112949771A - Hyperspectral remote sensing image classification method based on multi-depth multi-scale hierarchical attention fusion mechanism - Google Patents
Hyperspectral remote sensing image classification method based on multi-depth multi-scale hierarchical attention fusion mechanism
- Publication number: CN112949771A
- Application number: CN202110377199.XA
- Authority
- CN
- China
- Prior art keywords
- feature
- scale
- convolution
- depth
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. likelihood ratio or false-acceptance versus false-rejection rate
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/08 — Learning methods
Abstract
The invention discloses a hyperspectral remote sensing image classification method based on a multi-depth multi-scale hierarchical attention fusion mechanism, comprising the following steps: preprocessing the hyperspectral image data; dividing the preprocessed hyperspectral image data into a training sample set and a test sample set; constructing a convolutional neural network based on gradient-shrinkage multi-scale feature extraction; constructing a fusion model that combines gradient-shrinkage multi-scale feature extraction with hierarchical attention feature enhancement; initializing the weight parameters of the fusion model; training the fusion model to obtain a hyperspectral image classification model; and predicting the test sample set to obtain the classification result. The method effectively improves the feature utilization rate of the model, alleviates the low efficiency of multi-scale feature extraction in hyperspectral remote sensing image classification, and improves classification accuracy while maintaining operating efficiency, achieving a better classification effect.
Description
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a hyperspectral remote sensing image classification method based on a multi-depth multi-scale hierarchical attention fusion mechanism.
Background
Remote sensing image processing plays an increasingly important role in production and daily life thanks to advantages such as high flexibility and good repeatability. Hyperspectral image processing is an important branch of this field, and among its many sub-technologies, hyperspectral image classification has long been a research focus and hotspot. With hyperspectral image classification, the distribution and growth condition of crops can be monitored for effective, science-based agricultural management; urban houses, pavements and the like can be accurately identified and classified to support urban planning.
Traditional hyperspectral image classification methods can be roughly divided into spectrum-matching methods, based on the spectral characteristic curves of ground objects, and methods based on the statistical characteristics of the data; the latter can be further divided into supervised and unsupervised classification according to whether labeled samples are required for model training. Unsupervised classification uses only the training data, without any samples labeled with classification results, leaving the computer to analyze the data features on its own. Supervised classification feeds the training data together with labeled samples into the model, which generally yields higher classification accuracy but requires a large number of labeled samples.
The excellent performance of deep learning has led to extensive research in many fields in recent years, with various derived network architectures emerging depending on the field of study. In supervised learning, for example, recurrent, convolutional and graph neural networks have achieved remarkable success in natural language processing, computer vision and irregular data processing, respectively. However, when applied to hyperspectral image classification, the diversity of spatial feature sizes is often ignored, which limits further improvement of classification accuracy, especially under small-sample conditions. Methods based on convolutional neural networks have demonstrated strong feature extraction capability, and fusing multi-scale and multi-stage features can significantly improve a model's feature extraction efficiency, but several defects remain. First, as the network deepens, the extracted features gradually change from concrete texture and edge features to abstract semantic features, so sensitivity to feature scale differs at different network depths; extracting multi-scale features only at the model input, or designing the entire model for multi-scale extraction, causes feature loss and redundancy. Second, although the abstract semantic information extracted by a deep network contributes greatly to the classification task, heavy use of convolution and pooling severely impairs edge information. Third, models with many training parameters can improve classification accuracy, but consume large amounts of time and storage resources.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, a hyperspectral remote sensing image classification method based on a multi-depth multi-scale hierarchical attention fusion mechanism is provided, which addresses the low extraction efficiency of deep-neural-network-based hyperspectral remote sensing image classification when extracting multi-scale features and improves the operating efficiency of the model.
The technical scheme is as follows: in order to achieve the purpose, the invention provides a hyperspectral remote sensing image classification method based on a multi-depth multi-scale hierarchical attention fusion mechanism, which comprises the following steps:
s1: preprocessing the hyperspectral image data;
s2: acquiring and dividing a data set of the preprocessed hyperspectral image data to acquire a training sample set and a test sample set;
s3: constructing a convolutional neural network based on gradient shrinkage multi-scale feature extraction;
s4: constructing a gradient shrinkage-based multi-scale feature extraction and hierarchical attention feature enhancement fusion model according to the convolutional neural network of the step S3;
s5: initializing the weight parameters of the fusion model constructed in the step S4;
s6: training a fusion model by using the training sample set obtained in the step S2 to obtain a hyperspectral image classification model;
s7: and predicting the test sample set obtained in the step S2 through a hyperspectral image classification model to obtain a classification result.
Further, the preprocessing in step S1 includes normalization and de-redundancy processing, and the specific processing method is as follows:
A1: Acquire the data B_i of each band of the hyperspectral image, and calculate the mean value Ave_i and standard deviation S_i of each band;
A2: Calculate the normalized value N_i of each band according to the formula:
N_i = (B_i - Ave_i) / S_i
A3: recombining all normalized wave bands obtained in the step A1 and the step A2 into a normalized hyperspectral image H;
A4: Remove the redundant bands of the normalized hyperspectral image H obtained in step A3 by PCA, obtaining the preprocessed hyperspectral remote sensing image.
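Steps A1–A4 can be sketched in NumPy as follows. This is an illustrative implementation, not the patent's own code; in particular, the number of principal components retained after PCA is an assumed parameter that the text does not specify.

```python
import numpy as np

def preprocess(cube, n_components=30):
    """Steps A1-A3: per-band z-score normalization; step A4: PCA band reduction.
    cube: hyperspectral image of shape (H, W, B); n_components is illustrative."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)
    # A1-A2: N_i = (B_i - Ave_i) / S_i for every band i
    norm = (flat - flat.mean(axis=0)) / flat.std(axis=0)
    # A3: the normalized bands together form the normalized image H (kept flat here)
    # A4: PCA via eigendecomposition of the band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(norm, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return (norm @ top).reshape(h, w, n_components)
```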
Further, the method for acquiring the training sample set and the test sample set in step S2 includes:
b1: taking each pixel point to be classified as a center of the preprocessed hyperspectral data to obtain a cube neighborhood block;
b2: and selecting a set amount of neighborhood blocks and class labels corresponding to central pixel points of the neighborhood blocks from all the obtained neighborhood blocks as a training sample set, and using the rest neighborhood blocks and the class labels corresponding to the central pixel points of the neighborhood blocks as a test sample set.
Further, the convolutional neural network in step S3 sequentially includes an input layer, a multi-depth multi-scale feature extraction module a, a multi-depth multi-scale feature extraction module B, a multi-depth multi-scale feature extraction module C, and a multi-depth multi-scale feature extraction module D.
A feature extraction network is constructed from the four different multi-depth multi-scale feature extraction modules. Each module is formed by connecting multi-scale feature extraction branches in parallel, where each branch stacks a different number of dilated convolutions and the dilation rate of the convolution kernels along each branch rises gradually, yielding local multi-depth multi-scale feature extraction. In addition, the number of branches decreases gradually from multi-depth multi-scale feature extraction module A to module D, forming a gradient-shrinkage multi-scale feature extraction network.
Further, the number of branches of the multi-depth multi-scale feature extraction module a, the multi-depth multi-scale feature extraction module B, the multi-depth multi-scale feature extraction module C and the multi-depth multi-scale feature extraction module D is 4, 3, 2 and 1 respectively;
the structure of the 4 branches of multi-depth multi-scale feature extraction module A is: feature input X_1 → 3×3 convolution, dilation rate 1 → feature 1_1; X_1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 1_2; X_1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → feature 1_3; X_1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → 3×3 convolution, dilation rate 4 → feature 1_4; feature 1_1 ⊕ feature 1_2 ⊕ feature 1_3 ⊕ feature 1_4 = feature X_2 output by module A, where ⊕ denotes feature concatenation;
The structure of the 3 branches of multi-depth multi-scale feature extraction module B is: feature input X_2 → 3×3 convolution, dilation rate 1 → feature 2_1; X_2 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 2_2; X_2 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → feature 2_3; feature 2_1 ⊕ feature 2_2 ⊕ feature 2_3 = feature X_3 output by module B;
The structure of the 2 branches of multi-depth multi-scale feature extraction module C is: feature input X_3 → 3×3 convolution, dilation rate 1 → feature 3_1; X_3 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 3_2; feature 3_1 ⊕ feature 3_2 = feature X_4 output by module C;
The structure of the 1 branch of multi-depth multi-scale feature extraction module D is: feature input X_4 → 3×3 convolution, dilation rate 1 → feature 4_1; feature 4_1 = feature X_5 output by module D.
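Since every branch stacks stride-1 dilated 3×3 convolutions, and each such convolution with dilation rate d enlarges the receptive field by 2·d, the receptive field of each branch of module A can be computed directly. A small sketch consistent with the dilation rates listed above:

```python
def branch_receptive_field(dilations, kernel=3):
    """Receptive field of a stride-1 stack of dilated convolutions:
    each kernel x kernel conv with dilation d adds d * (kernel - 1)."""
    rf = 1
    for d in dilations:
        rf += d * (kernel - 1)
    return rf

# Dilation rates of the four branches of module A:
branches = [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
fields = [branch_receptive_field(b) for b in branches]
```

The four branches thus see 3×3, 7×7, 13×13 and 21×21 neighborhoods respectively, which is how the parallel branches capture multi-scale context.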
Further, the fusion mode of the fusion model in step S4 is as follows: the low-level texture features X_2 and X_3 are fused using an attention mechanism to obtain the low-level fused feature LF; in the same way, the high-level abstract semantic features X_4 and X_5 are fused to obtain the high-level fused feature HF; finally, LF and HF are fused by the attention mechanism to obtain the feature map LHF finally used for classification.
Further, the low-level fused feature LF is obtained as follows:
C1: X_2 → 1×1 convolution → dimension-reduced feature map X'_2;
C2: X_3 → 1×1 convolution → dimension-reduced feature map X'_3;
C3: Add X'_2 and X'_3 to obtain the preliminary fused feature X;
C4: Pass the preliminary fused feature X through the following network: X → GAP (global average pooling) → fully connected layer → BatchNorm → ReLU activation → fully connected layer → softmax → feature weight matrix M;
C5: Multiply the feature weight matrix M with the dimension-reduced feature maps X'_2 and X'_3 to obtain the attention-enhanced feature maps X''_2 and X''_3;
C6: Add X''_2 and X''_3 to obtain the shallow low-level fused feature LF.
Further, the classification result in step S7 is obtained as follows:
D1: Input the test sample set into the hyperspectral image classification model to obtain its prediction results;
D2: Compute the accuracy of the classification results by comparing the predictions with the class labels of the test sample set.
Further, the accuracy of the classification result in step D2 includes the overall accuracy (OA), average accuracy (AA) and Kappa coefficient.
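OA, AA and the Kappa coefficient can all be computed from a confusion matrix; a sketch:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa coefficient
    from reference and predicted labels (step D2)."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                         # row: true class, col: predicted
    n = cm.sum()
    oa = np.trace(cm) / n                     # fraction of correct predictions
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean per-class accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```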
In the method, several multi-depth multi-scale feature extraction modules of different sizes are first built and assembled in a set order into a multi-depth multi-scale feature extraction network; a hierarchical feature fusion network based on an attention mechanism is then constructed and applied to the extraction network to fuse the multi-stage features; finally, the fused features are fed to a softmax classifier for classification. This manner of feature extraction and fusion improves the deep convolutional neural network's ability to extract multi-scale features in hyperspectral classification tasks, as well as its ability to utilize the extracted features.
Beneficial effects: compared with the prior art, the method enlarges the receptive field while extracting multi-scale features yet generates fewer training parameters, improving classification speed; it fuses shallow edge features and deep abstract semantic features separately to obtain clear, concrete edge information and accurate abstract information, and finally fuses these two feature levels to extract more discriminative features. This effectively improves the feature utilization rate of the model, alleviates the low efficiency of multi-scale feature extraction in hyperspectral remote sensing image classification, and improves classification accuracy while maintaining operating efficiency, achieving a better classification effect.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a multi-depth multi-scale feature extraction module A constructed in the present invention;
FIG. 3 is a diagram showing the sizes of the receptive fields obtained from the first three branches in the multi-depth multi-scale feature extraction module A constructed by the present invention;
FIG. 4 is a data diagram of an Indian Pines hyperspectral remote sensing image in a simulation experiment of the present invention;
FIG. 5 is a pseudo-color label diagram of the simulation experiment Indian Pines hyperspectral remote sensing image data of the present invention;
FIG. 6 is a classification result diagram of an un-hierarchical attention-enhancing fusion model of Indian Pines hyperspectral remote sensing image data of a simulation experiment of the invention;
FIG. 7 is a classification result diagram of a hyperspectral remote sensing image classification method based on gradient shrinkage multiscale feature extraction and hierarchical attention feature fusion for Indian Pines hyperspectral remote sensing image data of a simulation experiment of the invention.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
As shown in FIG. 1, the invention provides a hyperspectral remote sensing image classification method based on a multi-depth multi-scale hierarchical attention fusion mechanism, which comprises the following steps:
s1: preprocessing the hyperspectral image data;
s2: acquiring and dividing a data set of the preprocessed hyperspectral image data to acquire a training sample set and a test sample set;
s3: constructing a convolutional neural network based on gradient shrinkage multi-scale feature extraction;
s4: constructing a gradient shrinkage-based multi-scale feature extraction and hierarchical attention feature enhancement fusion model according to the convolutional neural network of the step S3;
s5: initializing the weight parameters of the fusion model constructed in the step S4;
s6: training a fusion model by using the training sample set obtained in the step S2 to obtain a hyperspectral image classification model;
s7: and predicting the test sample set obtained in the step S2 through a hyperspectral image classification model to obtain a classification result.
In this embodiment, the preprocessing in step S1 includes normalization and de-redundancy processing, and the specific processing method is as follows:
A1: Acquire the data B_i of each band of the hyperspectral image, and calculate the mean value Ave_i and standard deviation S_i of each band;
A2: Calculate the normalized value N_i of each band according to the formula:
N_i = (B_i - Ave_i) / S_i
A3: recombining all normalized wave bands obtained in the step A1 and the step A2 into a normalized hyperspectral image H;
A4: Remove the redundant bands of the normalized hyperspectral image H obtained in step A3 by PCA, obtaining the preprocessed hyperspectral remote sensing image.
In this embodiment, the method for acquiring the training sample set and the test sample set in step S2 includes:
B1: Taking each pixel point to be classified of the preprocessed hyperspectral data as a center, acquire a cube neighborhood block of 15×15 pixels;
b2: and selecting 10% of the neighborhood blocks and the class labels corresponding to the central pixel points thereof from all the obtained neighborhood blocks as a training sample set X _ train, and using the rest of the neighborhood blocks and the class labels corresponding to the central pixel points thereof as a test sample set X _ test.
In this embodiment, the convolutional neural network in step S3 sequentially includes an input layer, a multi-depth multi-scale feature extraction module a, a multi-depth multi-scale feature extraction module B, a multi-depth multi-scale feature extraction module C, and a multi-depth multi-scale feature extraction module D. The number of branches from the multi-depth multi-scale feature extraction module A to the multi-depth multi-scale feature extraction module D is gradually reduced, and a gradient-shrinkage multi-scale feature extraction network is formed. The number of branches of the multi-depth multi-scale feature extraction module A, the multi-depth multi-scale feature extraction module B, the multi-depth multi-scale feature extraction module C and the multi-depth multi-scale feature extraction module D is respectively 4, 3, 2 and 1;
As shown in FIG. 2, the structure of the 4 branches of multi-depth multi-scale feature extraction module A is: feature input X_1 → 3×3 convolution, dilation rate 1 → feature 1_1; X_1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 1_2; X_1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → feature 1_3; X_1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → 3×3 convolution, dilation rate 4 → feature 1_4; feature 1_1 ⊕ feature 1_2 ⊕ feature 1_3 ⊕ feature 1_4 = feature X_2 output by module A, where ⊕ denotes feature concatenation;
Referring to FIG. 2, the structure of the 3 branches of multi-depth multi-scale feature extraction module B is: feature input X_2 → 3×3 convolution, dilation rate 1 → feature 2_1; X_2 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 2_2; X_2 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → feature 2_3; feature 2_1 ⊕ feature 2_2 ⊕ feature 2_3 = feature X_3 output by module B;
Referring to FIG. 2, the structure of the 2 branches of multi-depth multi-scale feature extraction module C is: feature input X_3 → 3×3 convolution, dilation rate 1 → feature 3_1; X_3 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 3_2; feature 3_1 ⊕ feature 3_2 = feature X_4 output by module C;
Referring to FIG. 2, the structure of the 1 branch of multi-depth multi-scale feature extraction module D is: feature input X_4 → 3×3 convolution, dilation rate 1 → feature 4_1; feature 4_1 = feature X_5 output by module D.
FIG. 3 shows the receptive field sizes obtained by the first three branches of multi-depth multi-scale feature extraction module A, illustrating how the receptive field is progressively enlarged.
In this embodiment, a bilateral attention feature fusion module is constructed in step S4 to achieve enhanced fusion of features. The fusion mode of the fusion model is as follows: the low-level texture features X_2 and X_3 are fused using an attention mechanism to obtain the low-level fused feature LF; in the same way, the high-level abstract semantic features X_4 and X_5 are fused to obtain the high-level fused feature HF; finally, LF and HF are fused by the attention mechanism to obtain the feature map LHF finally used for classification.
The low-level fused feature LF is obtained as follows:
C1: X_2 → 1×1 convolution → dimension-reduced feature map X'_2;
C2: X_3 → 1×1 convolution → dimension-reduced feature map X'_3;
C3: Add X'_2 and X'_3 to obtain the preliminary fused feature X;
C4: Pass the preliminary fused feature X through the following network: X → GAP (global average pooling) → fully connected layer → BatchNorm → ReLU activation → fully connected layer → softmax → feature weight matrix M;
C5: Multiply the feature weight matrix M with the dimension-reduced feature maps X'_2 and X'_3 to obtain the attention-enhanced feature maps X''_2 and X''_3;
C6: Add X''_2 and X''_3 to obtain the shallow low-level fused feature LF.
The high-level fusion feature HF is obtained in the same manner as the shallow-level fusion feature LF, and will not be described here.
The method for obtaining the classification result in step S7 in this embodiment is as follows:
d1: inputting the test sample set into a hyperspectral image classification model to obtain a prediction result of the test sample set;
D2: Compute the accuracy of the classification results by comparing the predictions with the class labels of the test sample set; the accuracy measures include the overall accuracy (OA), average accuracy (AA) and Kappa coefficient.
Based on the above scheme, in order to verify the effect of the method of the present invention, the present embodiment performs a simulation experiment, which is specifically as follows:
FIG. 4 shows the Indian Pines hyperspectral remote sensing image data used in the experiment. Applying the method to these data yields the pseudo-color label map shown in FIG. 5 and the classification result of the non-hierarchical attention-enhanced fusion model shown in FIG. 6, and finally the classification result of the proposed hyperspectral remote sensing image classification method based on gradient-shrinkage multi-scale feature extraction and hierarchical attention feature fusion, shown in FIG. 7.
Table 1 compares the classification accuracy of the method of the invention with other current advanced methods on the Indian Pines hyperspectral remote sensing image.
TABLE 1
As can be seen from the data in Table 1, the classification effect of the method is better than that of the other models: the overall accuracy (OA), average accuracy (AA) and Kappa coefficient are all higher than those of the other models, which verifies the effectiveness of the method.
Claims (10)
1. A hyperspectral remote sensing image classification method based on a multi-depth multi-scale hierarchical attention fusion mechanism is characterized by comprising the following steps:
s1: preprocessing the hyperspectral image data;
s2: acquiring and dividing a data set of the preprocessed hyperspectral image data to acquire a training sample set and a test sample set;
s3: constructing a convolutional neural network based on gradient shrinkage multi-scale feature extraction;
s4: constructing a gradient shrinkage-based multi-scale feature extraction and hierarchical attention feature enhancement fusion model according to the convolutional neural network of the step S3;
s5: initializing the weight parameters of the fusion model constructed in the step S4;
s6: training a fusion model by using the training sample set obtained in the step S2 to obtain a hyperspectral image classification model;
s7: and predicting the test sample set obtained in the step S2 through a hyperspectral image classification model to obtain a classification result.
2. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 1, wherein the preprocessing in the step S1 comprises normalization and de-redundancy processing, and the specific processing method comprises the following steps:
a1: acquire the data Bi of each band of the hyperspectral image, and calculate the mean value Avei and the standard deviation Si of the band data;
A2: calculate the normalized value Ni of each band according to the following formula:
Ni = (Ii − Avei)/Si
where Ii denotes the pixel values of band Bi;
A3: recombining all normalized wave bands obtained in the step A1 and the step A2 into a normalized hyperspectral image H;
3. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 1, wherein the method for acquiring the training sample set and the test sample set in the step S2 comprises:
b1: take each pixel to be classified in the preprocessed hyperspectral data as a center to obtain a cube neighborhood block;
b2: and selecting a set amount of neighborhood blocks and class labels corresponding to central pixel points of the neighborhood blocks from all the obtained neighborhood blocks as a training sample set, and using the rest neighborhood blocks and the class labels corresponding to the central pixel points of the neighborhood blocks as a test sample set.
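Steps b1–b2 can be sketched as follows. Zero padding at the image border and the (H, W, B) layout are assumptions, since the claim leaves both unspecified; label 0 is taken to mean "unlabeled":

```python
import numpy as np

def extract_patches(cube, labels, size=5):
    """Cut a size x size neighborhood block around every labeled pixel.

    cube: (H, W, B) preprocessed hyperspectral data.
    labels: (H, W) integer class map, 0 = unlabeled.
    Returns the stacked neighborhood blocks and the class label of
    each block's center pixel.
    """
    r = size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="constant")
    patches, ys = [], []
    for i, j in zip(*np.nonzero(labels)):
        patches.append(padded[i:i + size, j:j + size, :])
        ys.append(labels[i, j])
    return np.stack(patches), np.array(ys)
```

A set amount of the resulting (block, label) pairs would then be drawn as the training sample set and the remainder kept as the test sample set.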
4. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 1, wherein the structure of the convolutional neural network in the step S3 sequentially comprises an input layer, a multi-depth multi-scale feature extraction module A, a multi-depth multi-scale feature extraction module B, a multi-depth multi-scale feature extraction module C and a multi-depth multi-scale feature extraction module D.
5. The hyperspectral remote sensing image classification method based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 4 is characterized in that, in the four multi-depth multi-scale feature extraction modules, the number of branches from the multi-depth multi-scale feature extraction module A to the multi-depth multi-scale feature extraction module D is gradually reduced to form a gradient-shrinkage multi-scale feature extraction network.
6. The hyperspectral remote sensing image classification method based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 5 is characterized in that the number of branches of the multi-depth multi-scale feature extraction module A, the multi-depth multi-scale feature extraction module B, the multi-depth multi-scale feature extraction module C and the multi-depth multi-scale feature extraction module D is respectively 4, 3, 2 and 1;
the structure of the 4 branches of the multi-depth multi-scale feature extraction module A is as follows: feature input X1 → 3×3 convolution, dilation rate 1 → feature 1_1; X1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 1_2; X1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → feature 1_3; X1 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → 3×3 convolution, dilation rate 4 → feature 1_4; feature 1_1 ⊕ feature 1_2 ⊕ feature 1_3 ⊕ feature 1_4 = feature X2 output by the multi-depth multi-scale feature extraction module A, where ⊕ represents feature concatenation;
the structure of the 3 branches of the multi-depth multi-scale feature extraction module B is as follows: feature input X2 → 3×3 convolution, dilation rate 1 → feature 2_1; X2 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 2_2; X2 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → 3×3 convolution, dilation rate 3 → feature 2_3; feature 2_1 ⊕ feature 2_2 ⊕ feature 2_3 = feature X3 output by the multi-depth multi-scale feature extraction module B;
the structure of the 2 branches of the multi-depth multi-scale feature extraction module C is as follows: feature input X3 → 3×3 convolution, dilation rate 1 → feature 3_1; X3 → 3×3 convolution, dilation rate 1 → 3×3 convolution, dilation rate 2 → feature 3_2; feature 3_1 ⊕ feature 3_2 = feature X4 output by the multi-depth multi-scale feature extraction module C;
the structure of the 1 branch of the multi-depth multi-scale feature extraction module D is as follows: feature input X4 → 3×3 convolution, dilation rate 1 → feature 4_1; feature 4_1 = feature X5 output by the multi-depth multi-scale feature extraction module D.
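The dilation rates in claim 6 determine each branch's receptive field: a 3×3 convolution with dilation d enlarges the receptive field by 2·d pixels, so the deepest branch of module A ([1, 2, 3, 4]) covers a 21×21 window while the single-convolution branches cover only 3×3. A small sketch of this arithmetic (the helper name is illustrative, not from the patent):

```python
def receptive_field(dilations):
    """Receptive field of a stack of 3x3 convolutions at stride 1.

    Each 3x3 conv with dilation d contributes 2*d pixels on top of
    the single center pixel.
    """
    rf = 1
    for d in dilations:
        rf += 2 * d
    return rf

# Branch dilation schedules of module A, as listed in claim 6.
branches_a = [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4]]
```

Modules B, C and D reuse the first 3, 2 and 1 of these schedules respectively, which is the "gradient shrinkage" of the branch count.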
7. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 6, wherein the fusion mode of the fusion model in step S4 is as follows: the low-level texture features X2 and X3 are fused using an attention mechanism to obtain the low-level fusion feature LF; in the same way, the high-level abstract semantic features X4 and X5 are fused to obtain the high-level fusion feature HF; finally, the low-level fusion feature LF and the high-level fusion feature HF are fused using an attention mechanism to obtain the feature map LHF finally used for classification.
8. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 7, wherein the low-level fusion feature LF is obtained by the following steps:
C1: X2 → 1×1 convolution → dimension-reduction feature map X′2;
C2: X3 → 1×1 convolution → dimension-reduction feature map X′3;
C3: add X′2 and X′3 to obtain the preliminary fusion feature X;
C4: pass the preliminary fusion feature X through the following network: X → GAP → fully connected layer → BatchNorm → ReLU activation function → fully connected layer → softmax layer → feature weight matrix M;
C5: multiply the feature weight matrix M with the dimension-reduction feature maps X′2 and X′3 to obtain the attention-enhanced feature maps X″2 and X″3;
C6: add X″2 and X″3 to obtain the shallow low-level fusion feature LF.
9. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 1, wherein the classification result obtained in the step S7 is obtained by:
d1: inputting the test sample set into a hyperspectral image classification model to obtain a prediction result of the test sample set;
d2: compute the accuracy of the classification result by comparing the prediction result with the class labels of the test sample set.
10. The method for classifying the hyperspectral remote sensing images based on the multi-depth multi-scale hierarchical attention fusion mechanism according to claim 1, wherein the accuracy of the classification result in the step D2 comprises overall classification accuracy, average classification accuracy and KAPPA coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110377199.XA CN112949771A (en) | 2021-04-08 | 2021-04-08 | Hyperspectral remote sensing image classification method based on multi-depth multi-scale hierarchical attention fusion mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112949771A true CN112949771A (en) | 2021-06-11 |
Family
ID=76231117
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110377199.XA Pending CN112949771A (en) | 2021-04-08 | 2021-04-08 | Hyperspectral remote sensing image classification method based on multi-depth multi-scale hierarchical attention fusion mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112949771A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115147375A (en) * | 2022-07-04 | 2022-10-04 | 河海大学 | Concrete surface defect characteristic detection method based on multi-scale attention |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797779A (en) * | 2020-07-08 | 2020-10-20 | 兰州交通大学 | Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion |
CN112101190A (en) * | 2020-09-11 | 2020-12-18 | 西安电子科技大学 | Remote sensing image classification method, storage medium and computing device |
Non-Patent Citations (1)
Title |
---|
Shi Wenxu et al., "Application of an improved convolutional neural network to the classification of adenocarcinoma pathology images", Science Technology and Engineering * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210611 |