CN114048810A - Hyperspectral image classification method based on multilevel feature extraction network - Google Patents

Hyperspectral image classification method based on multilevel feature extraction network

Info

Publication number
CN114048810A
Authority
CN
China
Prior art keywords
feature
feature extraction
hyperspectral
extraction network
hyperspectral image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111326280.1A
Other languages
Chinese (zh)
Inventor
陆小辰
杨德政
贾逢德
阳云龙
翟梦琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN202111326280.1A
Publication of CN114048810A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a hyperspectral image classification method based on a multilevel feature extraction network. A multilevel joint feature extraction network is constructed that uses a lightweight framework to acquire the most discriminative features on each band of a hyperspectral pixel and converts them into global attention details for further joint feature extraction, so that deep semantic features are obtained and the classification performance on hyperspectral images is improved. The method makes full use of the spectral information of the hyperspectral image, classifies ground objects with a limited number of samples, effectively reduces the number of parameters in the feature extraction process, and effectively resists the Hughes phenomenon. The proposed model fully exploits the spectral characteristics of the hyperspectral image and effectively alleviates the high intra-class difference and high inter-class similarity of hyperspectral pixels that trouble other methods.

Description

Hyperspectral image classification method based on multilevel feature extraction network
Technical Field
The invention relates to a hyperspectral image classification method.
Background
Hyperspectral imaging is one of the most important detection means in the field of remote sensing. It obtains the spectral information of various ground objects while acquiring surface distribution information, thereby combining imagery and spectra. The most prominent characteristics of a hyperspectral image are its fine spectral resolution and rich spectral information, which make it possible to detect diagnostic features that traditional multispectral sensors or the human eye cannot find, remarkably improving the human ability to perceive the world. Hyperspectral images are therefore widely used in remote sensing image classification. With hyperspectral image classification, changes in the ground environment can be monitored, providing strong data support for ecological protection; the growth condition of surface crops can be obtained, providing real-time image data for agricultural development; hyperspectral image classification is further applied in fields such as mineral resource surveying and surface building identification.
Traditional hyperspectral image classification methods include support vector machines, Markov random fields, decision trees and the like. Most of these methods depend on hand-crafted feature information and have difficulty mining deep information in the original data. With the development of deep learning, convolutional neural networks have shown great potential and advantages in the field of image processing: they can autonomously learn and fit various features of an image to form an accurate representation of it. Convolutional neural networks are now widely applied to remote sensing image classification, but because of the inherent spectral heterogeneity and the Hughes phenomenon of hyperspectral images, existing convolutional-neural-network-based classification methods struggle to make full use of the spectral information of hyperspectral images and suffer from a severe Hughes phenomenon when only a small number of samples are available.
Disclosure of Invention
The purpose of the invention is: to overcome the limitation of traditional image classification that feature extraction models must be designed and labeled by hand, to make full use of the spectral information of the hyperspectral image with a limited number of samples by means of a convolutional neural network, and to improve the representation of the captured features, thereby effectively improving the ground object classification performance on hyperspectral images.
In order to achieve the above object, the present invention provides a hyperspectral image classification method based on a multilevel feature extraction network. A multilevel joint feature extraction network is constructed that uses a lightweight framework to acquire the most discriminative features on each band of a hyperspectral pixel and converts them into global attention details for further joint feature extraction, obtaining deep semantic features and thereby improving the classification effect on hyperspectral images. The method specifically comprises the following steps:
step 1, constructing a multi-level combined feature extraction network, wherein the multi-level combined feature extraction network comprises two channel-by-channel multi-scale feature extraction modules, two combined attention detail fusion modules, two point-by-point convolution units and a feature fusion and decision module;
step 2, generating a data set:
based on the existing labeled data in the hyperspectral image, padding the hyperspectral image by edge mirror padding and generating a data block centred on each labeled pixel, then dividing the obtained data blocks into a training sample set, a verification sample set and a test sample set in proportion;
step 3, sending the training sample set into the multi-level combined feature extraction network constructed in step 1, and normalizing the input training samples through a regularization layer to obtain the regularized training samples P',

P' ∈ R^(N×S×S×C)

where S × S is the spatial size of a data block, N is the batch size of the training samples, and C denotes that the hyperspectral image has C spectral bands;
step 4, sending the training sample P' into a channel-by-channel multi-scale feature extraction module to obtain feature information on each channel in the input data, and the method comprises the following steps:
step 401, sending the data simultaneously into a 3 × 3 two-dimensional depthwise convolution layer with dilation rate 1 and a 3 × 3 two-dimensional depthwise convolution layer with dilation rate 2, so as to obtain multi-scale context information that is more focused on the central pixel, and obtaining two feature maps of the same size;
step 402, regularizing the two feature maps of the same size respectively and adding them, then activating the result with the rectified linear unit function, sending the activated feature information into a standard 3 × 3 convolution layer and regularizing it to obtain the feature map DC3;
step 403, strengthening, through residual mapping, the distribution of the key information of the feature map DC3 on each channel of the input data, and realizing further information interaction between channels through a point-by-point convolution unit to obtain the feature map Fpw; the activation function is applied to Fpw to obtain the feature map F3;
the feature map F3 will be sent to the feature fusion and decision module;
step 5, sending the feature map Fpw obtained in step 4 into the joint attention detail fusion module:
the joint attention detail fusion module first sends the feature map Fpw into a sigmoid function, converting the extracted spatial-spectral features into a joint attention detail distribution over the original input, multiplies this distribution by the training sample P' of step 3 and injects the product into P' to obtain feature information with suppressed noise, giving the training sample P'1; a 3 × 7 three-dimensional convolution layer is then used to extract the combined feature F4 of the training sample P'1,

F4 ∈ R^(N×S'×S'×C')

where S' × S' is the spatial size of the combined feature and C' represents the corresponding channel size;
step 6, combining characteristics F obtained in the step 54Extracting the deep semantic features by the same steps of the step 4 and the step 5 to obtain the combined features F output by the repeated step 55And feature map F output by using repeated step 4036
Figure BDA0003347038330000032
Step 7, converting the characteristic diagram F3、F5And F6Meanwhile, global pooling operation on a spatial scale is carried out and is sent into a feature fusion and decision module; in the feature fusion and decision module, the feature graph F is passed3、F5And F6Splicing the obtained three eigenvectors to obtain a multi-level eigenvector F7
Figure BDA0003347038330000033
Feature vector F7The feature vectors are sent into two full-connection layers for information fusion, and finally, the fused feature vectors are sent into a softmax classifier to obtain a final classification result;
step 8, training the network
Combining the classification result obtained in the step 7 with the real category of the training sample, and constructing a loss function of the multi-level combined feature extraction network through cross entropy; meanwhile, carrying out L2 regularization constraint on the parameters of the first full-connection layer in the step 7, and adding a Dropout layer to prevent an overfitting phenomenon in the training process; finally, repeating the steps from the step 3 to the step 7 through a back propagation algorithm to carry out multiple iterations so as to obtain the best classification effect;
in the training process, model parameters are adjusted through a verification set; after an ideal classification effect is achieved, the model is evaluated through a test set;
step 9, generating data blocks for all pixels of the original hyperspectral image obtained in real time in the same manner as in step 2, sending them into the multi-level combined feature extraction network trained in step 8 for classification prediction, and finally obtaining the classified hyperspectral image.
According to the method, the spectral information of the hyperspectral image is fully utilized, ground objects can be classified with a limited number of samples, the number of parameters in the feature extraction process is effectively reduced, and the Hughes phenomenon is effectively resisted. Through the channel-by-channel multi-scale feature extraction module, the most characteristic feature information on each spectral channel of the hyperspectral image can be effectively obtained, providing a basis for further locating key information in the hyperspectral image; through the joint attention detail fusion module, the invention converts the existing global features into joint attention details in an accompanying manner and suppresses redundant information in the input data without increasing the number of training parameters, so as to obtain deeper spatial-spectral joint features. The proposed model can make full use of the spectral characteristics of the hyperspectral image and effectively alleviate the high intra-class difference and high inter-class similarity of hyperspectral pixels that trouble other methods.
Drawings
FIG. 1 is a schematic illustration of a process according to the present invention;
FIG. 2 is a flow chart of a method of the present invention;
FIG. 3 shows the data used in embodiments of the method of the present invention, wherein a) is a Kennedy Space Center hyperspectral pseudo-color image, b) the Kennedy Space Center dataset sample class labels, c) a Salinas hyperspectral pseudo-color image, d) the Salinas dataset sample class labels, e) a Pavia University hyperspectral pseudo-color image, and f) the Pavia University dataset sample class labels;
FIG. 4 shows the results obtained by the method of the present invention on the example data, wherein a-1) to a-9) are the classification results on the Kennedy Space Center hyperspectral image, b-1) to b-9) are the classification results on the Salinas hyperspectral image, and c-1) to c-9) are the classification results on the Pavia University hyperspectral image.
Detailed Description
The invention is further illustrated by the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
Take as an example the Salinas hyperspectral image acquired over the Salinas Valley, California by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) of the U.S. National Aeronautics and Space Administration: after removing noisy bands the image contains 204 spectral bands, covers spectral information in the 0.4-2.5 μm range, has a spatial size of 512 × 217, and has a spectral resolution of 10 nm and a spatial resolution of 3.7 m. In the whole image, a total of 54129 pixels are labeled into 16 classes; 5% of the pixels are used as training samples, 2.5% as validation samples, and the remainder as test samples.
The hyperspectral image classification method based on the multilevel feature extraction network provided by the invention specifically comprises the following steps:
1) constructing a multi-level combined feature extraction network, wherein the specific structure is shown in figure 1;
2) The original hyperspectral image contains H × W pixels with C spectral bands, among which M labeled pixels are divided into T categories according to their spectral characteristics. Based on the M labeled pixels, three-dimensional data blocks centred on the labeled pixels are generated by edge mirror padding,

P ∈ R^(M×S×S×C)

where S × S is the spatial size of a data block. The obtained three-dimensional data blocks P are divided, in the proportions stated above, into a training sample set Ptrain, a validation sample set Pval and a test sample set Ptest; that is, 5% of the three-dimensional data blocks P are used as training samples, 2.5% as validation samples, and the remainder as test samples.
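To make the data preparation of step 2) concrete, the following is a minimal sketch (not the authors' code) of mirror padding, patch extraction around labeled pixels, and a proportional random train/validation/test split. It assumes a NumPy cube `image` of shape (H, W, C) and an integer label map `labels` in which 0 marks unlabeled pixels; the patch size S, the split ratios and the random seed are hypothetical parameters.

```python
import numpy as np

def generate_patches(image, labels, S=9, train_ratio=0.05, val_ratio=0.025, seed=0):
    """Mirror-pad the cube and cut an S x S patch around every labeled pixel.

    image  : (H, W, C) float array, the hyperspectral cube
    labels : (H, W) int array, 0 = unlabeled, 1..T = class index
    Returns (patches, targets, train_idx, val_idx, test_idx).
    """
    H, W, C = image.shape
    pad = S // 2
    # Edge mirror padding so border pixels also get a full S x S neighbourhood.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")

    rows, cols = np.nonzero(labels)                    # coordinates of labeled pixels
    patches = np.stack([padded[r:r + S, c:c + S, :] for r, c in zip(rows, cols)])
    targets = labels[rows, cols] - 1                   # classes 0..T-1

    rng = np.random.default_rng(seed)
    order = rng.permutation(len(targets))              # proportional random split
    n_train = int(train_ratio * len(order))
    n_val = int(val_ratio * len(order))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]
    return patches, targets, train_idx, val_idx, test_idx
```

In practice the split would usually be stratified per class so that every category appears in the training set; the sketch uses a plain random split for brevity.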
3) The training sample set Ptrain is sent into the multi-level combined feature extraction network constructed in step 1). In the multi-level combined feature extraction network, the training samples are first normalized by a regularization layer to obtain the regularized training samples

P' ∈ R^(N×S×S×C)

where N is the batch size of the training samples; in this embodiment N is 32. The training samples P' correspond to the one-hot label vector set

Y ∈ R^(N×T)

The related calculation is shown in formula (1):

P' = ο(Ptrain)   (1)

where ο(·) denotes the regularization operation.
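For completeness, the normalization of formula (1) and the one-hot encoding of the labels could be sketched as follows; treating the regularization operation ο(·) as a BatchNorm over the C spectral bands, and the patch spatial size of 9, are assumptions made for the sketch, while N = 32, C = 204 and T = 16 follow the Salinas embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Regularization layer o(.) applied to a batch of patches (N, C, S, S);
# BatchNorm over the C = 204 spectral bands is assumed here.
norm = nn.BatchNorm2d(num_features=204)

p_train = torch.randn(32, 204, 9, 9)                  # N = 32 patches, S = 9 assumed
p_prime = norm(p_train)                               # formula (1): P' = o(P_train)

labels = torch.randint(0, 16, (32,))                  # T = 16 classes for Salinas
onehot = F.one_hot(labels, num_classes=16).float()    # (N, T) one-hot label vectors
```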
4) Sending the training sample P' into a channel-by-channel multi-scale feature extraction module to obtain feature information on each channel in the training sample data, and specifically comprising the following steps:
401) the training sample data is sent simultaneously into a 3 × 3 two-dimensional depthwise convolution layer with dilation rate 1 and a 3 × 3 two-dimensional depthwise convolution layer with dilation rate 2, so as to obtain multi-scale context information that is more concentrated on the central pixel;
402) the two feature maps of the same size obtained by the two depthwise convolution layers are regularized respectively and then added;
403) activation is performed using the rectified linear unit (ReLU) function; the calculation is shown in formula (2):

DC1 = W1 * P' + b1
DC2 = W2 * P' + b2   (2)
F1 = δ(ο(DC1) + ο(DC2))

where W1 and W2 are the convolution kernels of the ordinary depthwise convolution and of the depthwise convolution with dilation rate 2, respectively, b1 and b2 are the corresponding filter bias parameters, DC1 and DC2 are the two feature maps of equal size, δ(·) is the ReLU activation function, and F1 is the output feature map;
404) the activated feature map F1 is sent into a standard 3 × 3 convolution layer and regularized; the resulting feature map DC3 then has the distribution of its key information on each channel of the input data strengthened through residual mapping, and a point-by-point convolution unit is used to realize further information interaction between channels; the calculation is shown in formula (3):

DC3 = ο(W3 * F1 + b3)
F2 = DC3 + P'   (3)
Fpw = Wpw * F2 + bpw
F3 = δ(Fpw)

where W3 and b3 are the convolution kernel of the standard 3 × 3 convolution layer and the corresponding filter bias parameter, F2 is the corresponding intermediate result, Fpw is the feature map output by the point-by-point convolution, Wpw and bpw are the convolution kernel and the corresponding filter bias parameter of the point-by-point convolution, and F3 is the feature map to be fed into the feature fusion and decision module;
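As an illustration only, the channel-by-channel multi-scale extraction of step 4) together with the point-by-point convolution could be sketched in PyTorch roughly as below. The channel counts, the use of BatchNorm2d as the regularization ο(·), and the residual connection back to the input P' are assumptions made for the sketch rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class ChannelwiseMultiScaleBlock(nn.Module):
    """Sketch of the channel-by-channel multi-scale feature extraction module.

    Two 3x3 depthwise convolutions (dilation 1 and 2) run in parallel, their
    regularized outputs are added and passed through ReLU (formula (2)); a
    standard 3x3 convolution, a residual connection to the input and a
    point-by-point (1x1) convolution then follow (formula (3)).
    """

    def __init__(self, channels: int):
        super().__init__()
        # Depthwise convolutions: groups=channels processes every band separately.
        self.dw1 = nn.Conv2d(channels, channels, 3, padding=1, dilation=1, groups=channels)
        self.dw2 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2, groups=channels)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)   # standard 3x3 conv
        self.bn3 = nn.BatchNorm2d(channels)
        self.pw = nn.Conv2d(channels, channels, 1)                 # point-by-point conv

    def forward(self, x):            # x: (N, C, S, S), e.g. the regularized P'
        f1 = self.relu(self.bn1(self.dw1(x)) + self.bn2(self.dw2(x)))   # formula (2)
        dc3 = self.bn3(self.conv3(f1))
        f2 = dc3 + x                                                     # residual mapping
        f_pw = self.pw(f2)                                               # point-by-point conv
        f3 = self.relu(f_pw)
        return f_pw, f3
```

A patch batch of shape (N, C, S, S) would be fed in as `f_pw, f3 = block(x)`, with f3 later pooled by the feature fusion and decision module and f_pw passed on to the attention module of step 5).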
5) The feature map Fpw obtained in step 4) is sent into the joint attention detail fusion module. In the joint attention detail fusion module, the feature map Fpw is first sent into a sigmoid function, which converts the extracted spatial-spectral features into a joint attention detail distribution over the original input; this distribution is multiplied by the training sample P' of step 3) and the product is injected into P', giving feature information in which noise is suppressed; a three-dimensional convolution layer of size 3 × 7 is then used to extract the joint feature of the obtained feature information; the calculation is shown in formula (4):

A = σ(Fpw)
P'1 = P' + A ⊙ P'   (4)
F4 = W4 * P'1 + b4

where the sigmoid function σ(·) scales the feature map Fpw into the range [0, 1] and converts it into the global attention detail distribution A, ⊙ denotes element-wise multiplication, the product A ⊙ P' is injected into P' to obtain P'1, the combined feature F4 is then obtained by the three-dimensional convolution operation, and W4 and b4 are the convolution kernel of the standard three-dimensional convolution and the corresponding filter bias parameter;
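A rough PyTorch sketch of the joint attention detail fusion of formula (4) might look like the following; the exact three-dimensional kernel size, the output channel count, and the choice to add the attention-weighted input back onto P' are assumptions inferred from the description rather than details confirmed by the patent.

```python
import torch
import torch.nn as nn

class JointAttentionDetailFusion(nn.Module):
    """Sketch of the joint attention detail fusion module (formula (4)).

    Fpw is squashed to [0, 1] by a sigmoid to form the attention detail
    distribution A, A * P' is injected back into P', and a three-dimensional
    convolution extracts the joint spatial-spectral feature F4.
    """

    def __init__(self, out_channels: int = 16, kernel=(7, 3, 3)):
        # kernel is (spectral, height, width); both values are assumptions.
        super().__init__()
        self.conv3d = nn.Conv3d(1, out_channels, kernel,
                                padding=(kernel[0] // 2, kernel[1] // 2, kernel[2] // 2))

    def forward(self, f_pw, p_prime):          # both: (N, C, S, S)
        attention = torch.sigmoid(f_pw)        # global attention detail distribution A
        p1 = p_prime + attention * p_prime     # inject the weighted details into P'
        # Treat the band axis as the depth of a single-channel 3-D volume.
        volume = p1.unsqueeze(1)               # (N, 1, C, S, S)
        f4 = self.conv3d(volume)               # (N, out_channels, C, S, S)
        return f4
```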
6) The combined feature F4 obtained in step 5) is processed by the same operations as in step 4) and step 5) to extract deep semantic features, yielding the combined feature F5 and the feature map F6 output by the second point-by-point convolution unit.
7) The feature maps F3, F5 and F6 are simultaneously subjected to a global pooling operation on the spatial scale and sent into the feature fusion and decision module, where S' × S' and S'' × S'' are the spatial sizes of the corresponding data blocks and C' and C'' represent the sizes of the corresponding features in the channel dimension.
In the feature fusion and decision module, the three obtained feature vectors are spliced to obtain the multi-level feature vector F7, as shown in formula (5):

F7 = Concat(GAP(F3), GAP(F5), GAP(F6))   (5)

where GAP(·) and Concat(·) denote the global pooling operation and the vector splicing operation, respectively.
The feature vector F7 is sent into two fully connected layers for information fusion to obtain the fused feature vector F8, as shown in formula (6):

F8 = Wfc2 · φ(δ(Wfc1 · F7 + bfc1)) + bfc2   (6)

where ‖·‖2 denotes the 2-norm; λ is the penalty factor, set to 0.02 in this embodiment, with which an L2 regularization constraint λ‖Wfc1‖2² is imposed on the parameters Wfc1 of the first fully connected layer; φ(·) is the Dropout layer operation with the mask rate set to 0.4; · denotes matrix multiplication; F8 is the output of the second fully connected layer; bfc1 represents the filter bias parameter of the first fully connected layer; and bfc2 represents the filter bias parameter of the second fully connected layer.
The fused feature vector F8 is sent into the softmax classifier to obtain the final classification result, as shown in formula (7):

ŷi = softmax(F8,i)
Ŷ = {ŷ1, ŷ2, …, ŷN}   (7)
Prediction = argmax(Ŷ)

where F8,i is the i-th prediction vector in the set of prediction vectors F8; ŷi is the softmax(·) output vector corresponding to F8,i; Ŷ is the softmax(·) output for the batch of training samples; and Prediction is the classification prediction result of the final batch of training samples, obtained by the argmax(·) function.
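The feature fusion and decision module of formulas (5) to (7) could be sketched as below. The hidden width of the first fully connected layer and the way the three pooled vectors are flattened are assumptions made for the sketch, while the Dropout mask rate of 0.4 follows the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionDecision(nn.Module):
    """Sketch of the feature fusion and decision module (formulas (5)-(7))."""

    def __init__(self, dims, num_classes, hidden=128, dropout=0.4):
        # dims: channel sizes of F3, F5 and F6 after global spatial pooling.
        super().__init__()
        self.fc1 = nn.Linear(sum(dims), hidden)    # first fully connected layer
        self.drop = nn.Dropout(dropout)            # mask rate 0.4
        self.fc2 = nn.Linear(hidden, num_classes)  # second fully connected layer

    @staticmethod
    def global_pool(x):
        # Global pooling over all spatial (and, for 5-D inputs, spectral) axes.
        return x.flatten(2).mean(dim=2)            # (N, C, ...) -> (N, C)

    def forward(self, f3, f5, f6):
        f7 = torch.cat([self.global_pool(f3),
                        self.global_pool(f5),
                        self.global_pool(f6)], dim=1)      # formula (5)
        f8 = self.fc2(self.drop(F.relu(self.fc1(f7))))     # formula (6)
        return F.softmax(f8, dim=1)                        # formula (7)
```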
8) Training the network.
The classification result obtained in step 7) is combined with the real categories of the training samples, and the loss function of the network is constructed through cross entropy, as shown in formula (8):

Loss = −(1/N) Σi Σt Yi,t log(Ŷi,t)   (8)

where the sums run over the N samples in the batch and the T categories, Yi,t is the t-th value of the real label Yi corresponding to the i-th sample in the batch, and Ŷi,t is the corresponding predicted label.
Finally, steps 3) to 7) are repeated over batches for multiple iterations through the back-propagation algorithm and the Adam stochastic gradient descent method to obtain the best classification effect. During training, the validation set is used to adjust the model parameters; after the desired classification effect is achieved, the test set is used to evaluate the model.
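A compact sketch of the training procedure of step 8) is given below. It assumes the network returns the softmax output of formula (7), that `train_loader` and `val_loader` yield (patch, one-hot label) batches, and that `model.fc1` is the first fully connected layer whose weights receive the L2 penalty; the learning rate 0.0003, the penalty factor 0.02 and the 300 iterations follow the embodiment, everything else is illustrative.

```python
import torch

def train(model, train_loader, val_loader, epochs=300, lr=3e-4, lam=0.02, device="cpu"):
    """Sketch of step 8): cross-entropy loss (formula (8)) plus an L2 penalty
    on the first fully connected layer, optimized with Adam."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        model.train()
        for patches, onehot in train_loader:              # onehot: (N, T)
            patches, onehot = patches.to(device), onehot.to(device)
            probs = model(patches)                        # softmax output, (N, T)
            ce = -(onehot * torch.log(probs + 1e-12)).sum(dim=1).mean()  # formula (8)
            l2 = lam * model.fc1.weight.pow(2).sum()      # L2 constraint on first FC layer
            loss = ce + l2
            optimizer.zero_grad()
            loss.backward()                               # back propagation
            optimizer.step()

        # The validation set is used to monitor and tune the model.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for patches, onehot in val_loader:
                probs = model(patches.to(device))
                pred = probs.argmax(dim=1).cpu()
                correct += (pred == onehot.argmax(dim=1)).sum().item()
                total += onehot.size(0)
        print(f"epoch {epoch}: val accuracy {correct / max(total, 1):.4f}")
```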
9) Data blocks are generated, in the same manner as in step 2), for all pixels of the original hyperspectral image and sent into the network trained in step 8) for classification prediction, finally obtaining the classified hyperspectral image.
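Finally, a hedged sketch of how step 9) could be carried out with a trained model: every pixel of the mirror-padded image is turned into an S × S patch and classified, producing a label map for the whole scene. The function and parameter names are illustrative assumptions, not the patent's reference implementation.

```python
import numpy as np
import torch

def classify_image(model, image, S=9, batch_size=256, device="cpu"):
    """Sketch of step 9): classify every pixel of the hyperspectral cube by
    sliding an S x S patch over the mirror-padded image."""
    model = model.to(device).eval()
    H, W, C = image.shape
    pad = S // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")

    label_map = np.zeros((H, W), dtype=np.int64)
    coords = [(r, c) for r in range(H) for c in range(W)]
    with torch.no_grad():
        for start in range(0, len(coords), batch_size):
            chunk = coords[start:start + batch_size]
            patches = np.stack([padded[r:r + S, c:c + S, :] for r, c in chunk])
            # (N, S, S, C) -> (N, C, S, S) as expected by 2-D convolutions.
            x = torch.from_numpy(patches).float().permute(0, 3, 1, 2).to(device)
            pred = model(x).argmax(dim=1).cpu().numpy()
            for (r, c), p in zip(chunk, pred):
                label_map[r, c] = p
    return label_map
```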
Fig. 3 shows the data used in the embodiments of the present invention. Panel a) is a pseudo-color image of the Kennedy Space Center hyperspectral data and panel b) is the class-label map of its labeled pixels; this data set was acquired by the AVIRIS instrument over the Kennedy Space Center in the United States, and after removing low signal-to-noise-ratio bands the image contains 176 bands covering spectral information in the 0.4-2.5 μm range, with a spectral resolution of 10 nm and a spatial resolution of 18 m. In the whole image, 5211 pixels are labeled into 13 classes, of which 15% are used as training samples, 5% as validation samples and the rest as test samples. Panel c) is a pseudo-color image of the Salinas hyperspectral data and panel d) is the class-label map of its labeled pixels; its detailed information is described above. Panel e) is a pseudo-color image of the Pavia University hyperspectral data and panel f) is the class-label map of its labeled pixels; this data set was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) over the University of Pavia, Italy, and the image contains 103 bands covering spectral information in the 0.43-0.86 μm range, with a spectral resolution of 3.74 nm and a spatial resolution of 1.3 m. In the whole image, 42766 pixels are labeled into 9 classes, of which 5% are used as training samples, 2.5% as validation samples and the rest as test samples.
The method designed by the invention is compared with 8 hyperspectral image classification methods, namely SVM (a classification method based on support vector machines), EMP (a classification method based on extended morphological profiles), HybridSN (a classification method based on a hybrid convolutional network), RSSAN (a classification method based on a residual spectral-spatial attention network), JSAN (a classification method based on a joint spatial-spectral attention network), SSSERN (a classification method based on a spatial-spectral squeeze-and-excitation residual network), TSCNN (a classification method based on a two-stream convolutional neural network with a deep feature fusion network), and FG-SSCA (a classification method based on a serial spatial-spectral attention mechanism and a feature grouping network). The classification results on the Kennedy Space Center data are shown in Table 1, the classification results on the Salinas data in Table 2, and the classification results on the Pavia University data in Table 3, where the evaluation metrics are Overall Accuracy (OA), Average Accuracy (AA) and the Kappa coefficient.
Table 1 Kennedy Space Center data set classification evaluation results (%) (table provided as an image in the original publication)

Table 2 Salinas data set classification evaluation results (%) (table provided as an image in the original publication)

Table 3 Pavia University data set classification evaluation results (%) (table provided as an image in the original publication)
Among the eight comparison methods, the relevant parameters of the first two traditional methods are obtained by manual tuning, and the relevant parameters of the other six deep-learning-based classification methods are selected according to the recommendations of the corresponding papers. In the method provided by the invention, the learning rate is set to 0.0003 with a momentum of 0.9, the learning rate is attenuated by a coefficient of 0.0005 every 100 iterations, the batch size is set to 32, and 300 iterations are performed. The results show that the method designed by the invention achieves a better classification effect. The per-class prediction accuracy on each data set also demonstrates the effectiveness of the method: the method provided by the invention can make full use of the spectral features of the hyperspectral image and obtain distinguishable multi-level combined features for classification, and to a certain extent it relieves the challenges brought by spectral heterogeneity when only a small amount of data is available.
Fig. 4 shows the classification result images of the present invention and of the compared methods. As can be seen from the figure, compared with the other methods, the method provided by the invention better overcomes the spectral heterogeneity phenomenon and obtains good classification images.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto, and those skilled in the art can make possible variations and modifications of the present invention using the above-described methods and techniques without departing from the spirit and scope of the present invention. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention, unless the content of the technical solution of the present invention is departed from.

Claims (1)

1. A hyperspectral image classification method based on a multilevel feature extraction network, characterized in that a multilevel joint feature extraction network is constructed, the multilevel joint feature extraction network acquires the most discriminative features on each band of a hyperspectral pixel with a lightweight framework, converts these features into global attention details for further joint feature extraction, and obtains deep semantic features, thereby improving the classification effect on hyperspectral images, the method specifically comprising the following steps:
step 1, constructing a multi-level combined feature extraction network, wherein the multi-level combined feature extraction network comprises two channel-by-channel multi-scale feature extraction modules, two combined attention detail fusion modules, two point-by-point convolution units and a feature fusion and decision module;
step 2, generating a data set:
based on the existing labeled data in the hyperspectral image, padding the hyperspectral image by edge mirror padding and generating a data block centred on each labeled pixel, then dividing the obtained data blocks into a training sample set, a verification sample set and a test sample set in proportion;
step 3, sending the training sample set into the multi-level combined feature extraction network constructed in step 1, and normalizing the input training samples through a regularization layer to obtain the regularized training samples P',

P' ∈ R^(N×S×S×C)

where S × S is the spatial size of a data block, N is the batch size of the training samples, and C denotes that the hyperspectral image has C spectral bands;
step 4, sending the training sample P' into a channel-by-channel multi-scale feature extraction module to obtain feature information on each channel in the input data, and the method comprises the following steps:
step 401, sending the data simultaneously into a 3 × 3 two-dimensional depthwise convolution layer with dilation rate 1 and a 3 × 3 two-dimensional depthwise convolution layer with dilation rate 2, so as to obtain multi-scale context information that is more focused on the central pixel, and obtaining two feature maps of the same size;
step 402, regularizing the two feature maps of the same size respectively and adding them, then activating the result with the rectified linear unit function, sending the activated feature information into a standard 3 × 3 convolution layer and regularizing it to obtain the feature map DC3;
step 403, strengthening, through residual mapping, the distribution of the key information of the feature map DC3 on each channel of the input data, and realizing further information interaction between channels through a point-by-point convolution unit to obtain the feature map Fpw; the activation function is applied to Fpw to obtain the feature map F3;
the feature map F3 will be sent to the feature fusion and decision module;
step 5, sending the feature map Fpw obtained in step 4 into the joint attention detail fusion module:
the joint attention detail fusion module first sends the feature map Fpw into a sigmoid function, converting the extracted spatial-spectral features into a joint attention detail distribution over the original input, multiplies this distribution by the training sample P' of step 3 and injects the product into P' to obtain feature information with suppressed noise, giving the training sample P'1; a 3 × 7 three-dimensional convolution layer is then used to extract the combined feature F4 of the training sample P'1,

F4 ∈ R^(N×S'×S'×C')

where S' × S' is the spatial size of the data block and C' represents the corresponding channel size;
step 6, combining characteristics F obtained in the step 54Extracting the deep semantic features by the same steps of the step 4 and the step 5 to obtain the combined features F output by the repeated step 55And feature map F output by using repeated step 4036
Figure FDA0003347038320000022
Step 7, converting the characteristic diagram F3、F5And F6Meanwhile, global pooling operation on a spatial scale is carried out and is sent into a feature fusion and decision module; in the feature fusion and decision module, the feature graph F is passed3、F5And F6Splicing the obtained three eigenvectors to obtain a multi-level eigenvector F7
Figure FDA0003347038320000023
Feature vector F7The feature vectors are sent into two full-connection layers for information fusion, and finally, the fused feature vectors are sent into a softmax classifier to obtain a final classification result;
step 8, training the network
Combining the classification result obtained in the step 7 with the real category of the training sample, and constructing a loss function of the multi-level combined feature extraction network through cross entropy; meanwhile, carrying out L2 regularization constraint on the parameters of the first full-connection layer in the step 7, and adding a Dropout layer to prevent an overfitting phenomenon in the training process; finally, repeating the steps from the step 3 to the step 7 through a back propagation algorithm to carry out multiple iterations so as to obtain the best classification effect;
in the training process, model parameters are adjusted through a verification set; after an ideal classification effect is achieved, the model is evaluated through a test set;
step 9, generating data blocks for all pixels of the original hyperspectral image obtained in real time in the same manner as in step 2, sending them into the multi-level combined feature extraction network trained in step 8 for classification prediction, and finally obtaining the classified hyperspectral image.
CN202111326280.1A 2021-11-10 2021-11-10 Hyperspectral image classification method based on multilevel feature extraction network Pending CN114048810A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111326280.1A CN114048810A (en) 2021-11-10 2021-11-10 Hyperspectral image classification method based on multilevel feature extraction network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111326280.1A CN114048810A (en) 2021-11-10 2021-11-10 Hyperspectral image classification method based on multilevel feature extraction network

Publications (1)

Publication Number Publication Date
CN114048810A true CN114048810A (en) 2022-02-15

Family

ID=80207998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111326280.1A Pending CN114048810A (en) 2021-11-10 2021-11-10 Hyperspectral image classification method based on multilevel feature extraction network

Country Status (1)

Country Link
CN (1) CN114048810A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563571A (en) * 2023-05-16 2023-08-08 北京师范大学 Boltzmann entropy similarity-based hyperspectral image band selection method and system
CN116563571B (en) * 2023-05-16 2023-11-21 北京师范大学 Boltzmann entropy similarity-based hyperspectral image band selection method and system
CN116595208A (en) * 2023-07-17 2023-08-15 云南大学 Classification method and device for hyperspectral images and electronic equipment
CN116595208B (en) * 2023-07-17 2023-10-13 云南大学 Classification method and device for hyperspectral images and electronic equipment
CN117115668A (en) * 2023-10-23 2023-11-24 安徽农业大学 Crop canopy phenotype information extraction method, electronic equipment and storage medium
CN117115668B (en) * 2023-10-23 2024-01-26 安徽农业大学 Crop canopy phenotype information extraction method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Kang et al. Classification of hyperspectral images by Gabor filtering based deep network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
Gao et al. Spectral superresolution of multispectral imagery with joint sparse and low-rank learning
Qu et al. uDAS: An untied denoising autoencoder with sparsity for spectral unmixing
Bergado et al. Recurrent multiresolution convolutional networks for VHR image classification
CN109145992B (en) Hyperspectral image classification method for cooperatively generating countermeasure network and spatial spectrum combination
Nandhini Abirami et al. Deep CNN and deep GAN in computational visual perception-driven image analysis
Peng et al. Self-paced nonnegative matrix factorization for hyperspectral unmixing
CN109766858A (en) Three-dimensional convolution neural network hyperspectral image classification method combined with bilateral filtering
CN114048810A (en) Hyperspectral image classification method based on multilevel feature extraction network
CN103971123B (en) Hyperspectral image classification method based on linear regression Fisher discrimination dictionary learning (LRFDDL)
Li et al. Structure-aware collaborative representation for hyperspectral image classification
Basu et al. A semiautomated probabilistic framework for tree-cover delineation from 1-m NAIP imagery using a high-performance computing architecture
CN107145836B (en) Hyperspectral image classification method based on stacked boundary identification self-encoder
CN107563442B (en) Hyperspectral image classification method based on sparse low-rank regular graph tensor embedding
CN113723255B (en) Hyperspectral image classification method and storage medium
CN111639587B (en) Hyperspectral image classification method based on multi-scale spectrum space convolution neural network
Sun et al. Extracting urban impervious surface from worldView-2 and airborne LiDAR data using 3D convolutional neural networks
CN112347888A (en) Remote sensing image scene classification method based on bidirectional feature iterative fusion
CN111783884B (en) Unsupervised hyperspectral image classification method based on deep learning
CN112580480B (en) Hyperspectral remote sensing image classification method and device
CN102629374A (en) Image super resolution (SR) reconstruction method based on subspace projection and neighborhood embedding
CN113139512B (en) Depth network hyperspectral image classification method based on residual error and attention
CN115187861A (en) Hyperspectral image change detection method and system based on depth twin network
Wu et al. A Siamese template matching method for SAR and optical image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination