CN111985543A - Construction method, classification method and system of hyperspectral image classification model - Google Patents
Construction method, classification method and system of hyperspectral image classification model Download PDFInfo
- Publication number
- CN111985543A CN111985543A CN202010781786.0A CN202010781786A CN111985543A CN 111985543 A CN111985543 A CN 111985543A CN 202010781786 A CN202010781786 A CN 202010781786A CN 111985543 A CN111985543 A CN 111985543A
- Authority
- CN
- China
- Prior art keywords
- hyperspectral image
- network
- classification
- sample set
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 238000013145 classification model Methods 0.000 title claims abstract description 24
- 238000010276 construction Methods 0.000 title claims description 14
- 230000003595 spectral effect Effects 0.000 claims abstract description 45
- 238000012549 training Methods 0.000 claims abstract description 29
- 238000012795 verification Methods 0.000 claims abstract description 18
- 238000012360 testing method Methods 0.000 claims abstract description 13
- 230000009467 reduction Effects 0.000 claims description 15
- 230000007246 mechanism Effects 0.000 claims description 13
- 238000001228 spectrum Methods 0.000 claims description 13
- 230000004913 activation Effects 0.000 claims description 10
- 238000011176 pooling Methods 0.000 claims description 5
- 238000012545 processing Methods 0.000 claims description 5
- 238000010606 normalization Methods 0.000 claims description 4
- 238000007781 pre-processing Methods 0.000 claims description 4
- 230000003044 adaptive effect Effects 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 3
- 239000000284 extract Substances 0.000 abstract description 3
- 238000000605 extraction Methods 0.000 description 10
- 230000006870 function Effects 0.000 description 9
- 238000013527 convolutional neural network Methods 0.000 description 5
- 238000013135 deep learning Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 238000012800 visualization Methods 0.000 description 4
- 238000002360 preparation method Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 241001466077 Salina Species 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 240000007087 Apium graveolens Species 0.000 description 1
- 235000015849 Apium graveolens Dulce Group Nutrition 0.000 description 1
- 235000010591 Appio Nutrition 0.000 description 1
- 208000037170 Delayed Emergence from Anesthesia Diseases 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000012733 comparative method Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000001617 migratory effect Effects 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000013441 quality evaluation Methods 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
- 235000013311 vegetables Nutrition 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a method for constructing a hyperspectral image classification model, an image classification method and a system, wherein the method for constructing the hyperspectral image classification model comprises the following steps: acquiring a hyperspectral image, and standardizing the hyperspectral image to enable the hyperspectral image to accord with the Gaussian distribution of unit variance of 0 mean value; performing category calibration to obtain a calibration sample set and a label set; constructing a neighborhood data cube, and classifying according to a set proportion to obtain a training sample set, a verification sample set and a test sample set; the hyperspectral image classification method provided by the invention adopts a cascade network structure which extracts spectral attention features and then spatial attention features, so that the network focuses more on interested spatial regions and meaningful spectral bands, the spectral features and the spatial features are combined, abundant spectral information and spatial information are fully utilized, therefore, higher classification precision can be obtained, and the obtained classified images are more continuous visually.
Description
Technical Field
The invention belongs to the technical field of hyperspectral image processing, and particularly relates to a construction method, a classification method and a system of a hyperspectral image classification model.
Background
The hyperspectral image is usually regarded as a three-dimensional data cube, and besides two dimensions of width and height in space, the hyperspectral image also has spectral information of hundreds of wave bands. The wave bands comprise visible light spectrums and also comprise wave band information such as ultraviolet light and near infrared light, people can find interested targets invisible in the visible light by utilizing the abundant information, and strong technical support is provided for the human learning world and the world change, so that more and more researches are carried out on hyperspectral images.
The hyperspectral image classification is a research hotspot in hyperspectral image research, and means that images are classified pixel by using spatial information and spectral information of the hyperspectral images. Early methods include classification methods based on support vector machines, methods for principal component analysis, dimensionality reduction and reclassification, and methods based on sparse representations. Because the methods cannot well utilize the depth information of the image, the classification accuracy is low.
In recent years, methods based on deep learning have been successful in image classification and target detection, and therefore many hyperspectral image classification methods based on deep learning have appeared. Common deep learning networks include automatic encoders, deep belief networks and convolutional neural networks, as well as networks that combine residual blocks with migratory learning. Although the existing deep learning-based method can obtain better classification accuracy, most of the existing methods need more training samples, and have the defects of high network complexity, deeper network depth, more parameters needing training and the like. In practical application, calibration samples of hyperspectral images are often limited, so that the classification accuracy of the existing method needs to be improved, and the robustness for different data sets needs to be enhanced.
Disclosure of Invention
Aiming at the defects and shortcomings of the prior art, the invention aims to provide a method for constructing a hyperspectral image classification model, an image classification method and a system, and solves the problem that the hyperspectral image classification accuracy is low due to insufficient extraction of hyperspectral image features in the method in the prior art.
In order to solve the technical problems, the invention adopts the following technical scheme: a construction method of a hyperspectral image classification model comprises the following steps:
acquiring a hyperspectral image, and standardizing the hyperspectral image to enable the hyperspectral image to accord with the Gaussian distribution of unit variance of 0 mean value;
secondly, performing category calibration on the ground object categories corresponding to the pixels of the standardized hyperspectral image to obtain a calibration sample set and a label set;
constructing a neighborhood data cube by taking the pixel of each sample in the calibration sample set as a center, wherein the dimension of the neighborhood data cube is w W B, w represents the spatial neighborhood of the pixel, B is the initial wave band number of the hyperspectral image, and dividing the constructed neighborhood data cube according to a set proportion to obtain a training sample set, a verification sample set and a test sample set;
taking a neighborhood data cube corresponding to each sample in the training sample set and the verification sample set as the input of the network, taking the label set as the expected output, and training the network to obtain a trained network model;
the network comprises a network input layer, a data dimension reduction module, a spectrum attention module, a secondary data dimension reduction module, a space attention mechanism module, a pooling module and a full-connection network which are sequentially connected in series;
the data dimension reduction module comprises two convolution modules which are connected in sequence, and each convolution module comprises a convolution layer, a batch normalization layer and an activation function layer which are connected in sequence;
the spectral attention module comprises a plurality of spectral attention feature layers which are connected in sequence;
the secondary data dimension reduction module comprises a convolution layer;
the space attention mechanism module comprises a plurality of space attention characteristic layers which are connected in sequence.
And if the boundary problem is encountered when the neighborhood data cube is constructed in the third step, extending the boundary by adopting a 0 boundary filling method.
The concrete method of the fourth step is as follows:
taking a neighborhood data cube corresponding to each sample in a training sample set and a verification sample set as the input of a network, taking a label set as expected output, calculating the cross entropy loss between a real label of the training sample and a network prediction output value, taking the minimized cross entropy loss as an optimization target, optimizing the network by adopting an adaptive moment estimation algorithm, testing an obtained network model on the verification set, updating network model parameters until iteration is finished if the obtained classification accuracy is superior to that of the previous network model, storing the model with the highest classification accuracy on the verification set, obtaining the trained network model, and evaluating the classification performance of the trained network model by adopting the test sample set.
A hyperspectral image classification method comprises the following steps:
acquiring a hyperspectral image set to be classified, inputting the hyperspectral image set into a network model, and classifying each pixel of the hyperspectral image by using the network model to obtain a final classification result;
the network model is a hyperspectral image classification model constructed by the construction method of the hyperspectral image classification model.
A hyperspectral image classification system comprises an image acquisition module, a data preprocessing module and a classification module;
the image acquisition module is used for acquiring a hyperspectral image;
the data preprocessing module is used for carrying out standardization processing on the hyperspectral image so that the input image conforms to Gaussian distribution of unit variance of 0 mean value;
the classification module classifies the hyperspectral images by adopting the method and outputs classification results.
A method of spatial attention feature visualization, comprising the steps of:
a, extracting a network layer label L where a space attention feature is located and a previous layer label L-1 according to a constructed hyperspectral image classification model and a constructed hyperspectral image classification system;
b, calling an L-1 network parameter M1 and an L-1 network parameter M2 from the hyperspectral image classification model, taking the hyperspectral image as an input image, obtaining an image before attention feature extraction by using the parameter M1, obtaining an image after attention feature extraction by using the parameter M2, and realizing the visualization of the attention feature;
the hyperspectral image classification model is a hyperspectral image classification model constructed by the construction method of the hyperspectral image classification model;
the hyperspectral image classification system is the image classification system.
Compared with the prior art, the invention has the following technical effects:
the hyperspectral image classification only uses spectral features to be influenced by 'same object different spectrum' and 'same foreign object spectrum', only uses spatial features to lose abundant spectral information, and both the two situations can cause lower classification precision, so the hyperspectral image classification technology provided by the invention adopts a cascade network structure which extracts the spectral features and then extracts the spatial features, combines the spectral features and the spatial features, and fully utilizes the abundant spectral information and the spatial information, thereby obtaining higher classification precision and obtaining more continuous classified images visually;
(II) the attention mechanism adopted by the invention enables the network to give different weights to different spectral bands and spatial regions, so that the method is more in line with the human visual mechanism, more meaningful characteristics can be obtained, and more meaningful characteristics can be obtained when the number of network layers is not too deep, so that the network has more advantages under the condition of a small sample training set;
(III) the invention adopts an end-to-end network structure, the whole network uniformly adopts 3D CNN on three modules of data dimension reduction, spectral attention feature extraction and spatial attention feature extraction, and fully utilizes the three-dimensional characteristics of a hyperspectral image;
(IV) the invention visualizes the spatial attention characteristics in the deep network, the former deep learning network is similar to a black box, so that the function of each layer is not easy to understand, and the invention can more intuitively understand the function of the attention mechanism by visualizing the spatial attention characteristics.
Drawings
FIG. 1 is a hyperspectral image classification model framework diagram of the invention;
FIG. 2 is a block diagram of a spectral attention module of the present invention;
FIG. 3 is a block diagram of a spatial attention module of the present invention;
FIG. 4 is a schematic diagram of a hyperspectral image classification model construction according to the invention;
FIG. 5 is a graph comparing experimental results provided in examples of the present invention;
fig. 6 is a graph of experimental results of an embodiment of the spatial attention feature visualization proposed in the present invention.
The present invention will be explained in further detail with reference to examples.
Detailed Description
The following embodiments of the present invention are provided, and it should be noted that the present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention are within the protection scope of the present invention.
First, technical terms appearing in the present invention are explained to help better understand the technical contents of the present application:
an attention mechanism is as follows: the attention mechanism is inspired by the human vision mechanism, when the human vision observes a scene, the human vision does not pay attention to all regions in the scene to the same extent, but pays more attention to an interested target region to acquire more detailed information of a target needing attention, and suppresses other useless information.
BN layer: batch normalization layer, an efficient regularization method. It also belongs to one layer of the network, as well as the activation function layer, convolutional layer, full link layer, pooling layer. The BN layer can solve the problem that the data distribution of the middle layer is changed in the training process, and has the advantages of improving the gradient of the network, greatly improving the training speed, reducing the strong dependence of the network on initialization and the like.
In the invention, the convolutional layer is used for a plurality of times when the deep network is built, different parameters are needed to be adopted when convolution operation is implemented in different stages of the network so as to realize different functions, and the main parameters used by the different convolutional layers in the invention are described as follows:
the number of filters: the number of the characteristic graphs obtained after the convolution operation is 24;
convolution kernel size: the invention adopts 3D CNN, the size of convolution kernel is expressed by three-dimensional vector, three main function modules of network adopt different convolution kernel sizes, which are specifically explained in the embodiment;
step length: the method is used for designating convolution steps in 3 dimensions, only the first convolution layer of a data dimension reduction module adopts (1,1,2) in the method, and referring to fig. 1, the other convolution layers uniformly adopt default values (1,1, 1);
the expansion rate is specified, the expansion rate for expanding convolution is specified, in the invention, only the convolution layer of the spectrum attention module adopts (1,1,2), referring to fig. 2, and the other convolution layers uniformly adopt default values (1,1, 1);
and boundary processing, namely whether the boundary is filled during convolution operation or not, wherein the spectrum attention module and the space attention module adopt a boundary filling strategy, and the rest convolution layers are defaulted and are not filled with the boundary.
In specific implementations, parameters not specifically described are default values.
Example 1:
a construction method of a hyperspectral image classification model comprises the following steps:
acquiring a hyperspectral image, wherein the initial wave band number of the hyperspectral image is B, and standardizing the hyperspectral image to enable the hyperspectral image to accord with the Gaussian distribution of a unit variance of a mean value 0;
secondly, performing category calibration on the ground object categories corresponding to the pixels of the standardized hyperspectral image to obtain a calibration sample set and a label set;
in this example, the data set used was a scene above the valley of Salinas (Salinas) california collected by an AVIRIS sensor with 224 bands, with high spatial resolution (3.7 meters per pixel). The covered area consists of 512 rows of 217 samples each. The 20 water absorption bands [108- & 112], [154- & 167], 224 were discarded and 204 bands were actually used, so the dimension of the data set actually used was 521 × 217 × 204. It includes the categories of vegetables, bare land and vineyards. 16 classes were calibrated and the total number of samples calibrated was 54129. The specific category names and the number of calibration samples are shown in table 1.
TABLE 1Salinas dataset and classification of training set, validation set and test set
No. | Categories | Calibration sample | Training sample | Validating a | Test specimen | |
1 | Brocoli_green_weeds_1 | 2009 | 21 | 21 | 1967 | |
2 | Brocoli_green_weeds_2 | 3726 | 38 | 38 | 3650 | |
3 | Fallow | 1976 | 20 | 20 | 1936 | |
4 | Fallow_rough_plow | 1394 | 14 | 14 | 1366 | |
5 | Fallow_smooth | 2678 | 27 | 27 | 2624 | |
6 | Stubble | 3959 | 40 | 40 | 3879 | |
7 | Celery | 3579 | 36 | 36 | 3507 | |
8 | Grapes_untrained | 11271 | 113 | 113 | 11045 | |
9 | Soil_vinyard_develop | 6203 | 63 | 63 | 6077 | |
10 | Corn_senesced_green_weeds | 3278 | 33 | 33 | 3212 | |
11 | Lettuce_romaine_4wk | 1068 | 11 | 11 | 1046 | |
12 | Lettuce_romaine_5wk | 1927 | 20 | 20 | 1887 | |
13 | Lettuce_romaine_6wk | 916 | 10 | 10 | 896 | |
14 | Lettuce_romaine_7wk | 1070 | 11 | 11 | 1048 | |
15 | Vinyard_untrained | 7268 | 73 | 73 | 7122 | |
16 | Vinyard_vertical_trellis | 1807 | 19 | 19 | 1769 | |
Total | 54129 | 549 | 549 | 53031 |
Constructing a neighborhood data cube by taking the pixel of each sample in the calibration sample set as a center, wherein the dimension of the neighborhood data cube is 7 × 204, wherein 7 × 7 represents the spatial neighborhood of the central pixel, 204 is the initial wave band number, and dividing the constructed neighborhood data cube according to a set proportion to obtain a training sample set, a verification sample set and a test sample set;
the proportion parameter train ratio of the training sample is set to be 1 percent, namely, fewer samples are adopted for training, and the embodiment can measure the classification performance of the invention under a small sample training set; the number of the verification set samples is the same as that of the training set samples;
the batch processing sample number during training is batchsize which is 50, and the iteration number epoch is 100;
classifying the calibration samples (pixels) according to a specified proportion, wherein a training sample set accounts for 1%, a verification sample set accounts for 1%, and the rest 98% of the calibration samples are classified into a test sample set;
taking a neighborhood data cube corresponding to each sample in the training sample set and the verification sample set as the input of the network, taking the label set as the expected output, and training the network to obtain a trained network model;
taking a neighborhood data cube corresponding to each sample in a training sample set and a verification sample set as the input of a network, taking a label set as expected output, calculating the cross entropy loss between a real label of the training sample and a network prediction output value, taking the minimized cross entropy loss as an optimization target, optimizing the network by adopting an adaptive moment estimation algorithm, testing an obtained network model on the verification set, updating network model parameters until iteration is finished if the obtained classification accuracy is superior to that of the previous network model, storing the model with the highest classification accuracy on the verification set, obtaining the trained network model, and evaluating the classification performance of the trained network model by adopting the test sample set.
If the boundary problem is encountered when the neighborhood cube of the calibration sample is established, 0 boundary filling is adopted to expand the boundary, so that the boundary crossing problem is avoided.
The network comprises a network input layer, a data dimension reduction module, a spectrum attention module, a secondary data dimension reduction module, a space attention mechanism module, a pooling module and a full-connection network which are sequentially connected in series;
the input of the network model is a data cube of 7 × 204, and the corresponding label of the data cube is a category label corresponding to the central pixel of the cube, wherein 7 × 7 represents the spatial neighborhood of the central pixel, and 204 represents the initial wave band number of the hyperspectral image;
performing a first convolution operation on the input 7 × 204 data cube in the spectral dimension (3 rd dimension), wherein the convolution kernel size is 1 × 7, the step length of the convolution operation is 2, extracting a first layer of shallow features of the data, and reducing the spectral dimension of the data;
performing 2 nd convolution operation on the dimensionality-reduced data on a 3 rd dimension, wherein the size of a convolution kernel is 1 x 7, boundary filling is not performed during convolution, only effective data is convolved, so that a second layer of shallow layer features can be extracted, the spectral dimensionality of the data is reduced again, and the output of the layer is used as the input of a spectral attention module to extract spectral dimension depth features;
the data dimension reduction module comprises two convolution modules which are connected in sequence, and each convolution module comprises a convolution layer, a BN layer and an activation function layer which are connected in sequence; performing two 3-dimensional convolutions on the input 7 x 204 data cube, setting step sizes larger than 1 on a spectrum dimension (3 rd dimension) by the first convolution, and greatly reducing the dimension; the convolution operations are three-dimensional convolution, the convolution kernel size is 1 × 7, the step length is (1,1,2), the first shallow layer feature of the data is extracted, and the spectral dimension of the data is reduced to (int (B-7+1)/2), and int represents rounding-down;
performing a 2 nd convolution operation on the dimensionality-reduced data in a spectrum dimension, wherein the convolution kernel size is 1 × 7, so that the second-layer shallow feature can be extracted, the spectrum dimensionality of the data is reduced to (int (B-7+1)/2-7+1) again, TK is (int (B-7+1)/2-7+1, and the output is used as the input of a spectrum attention module to extract the depth spectrum feature; the convolution operation is followed by a BN layer and a ReLu activation layer.
The spectral attention module comprises a plurality of spectral attention feature layers which are connected in sequence; extracting spectral features from input data by using a convolution kernel of 1 x 3, and performing expansion convolution with an expansion rate of 2 in a 3 rd dimension during convolution, so that a larger convolution field can be covered by using fewer convolution kernels;
performing hyperbolic tangent transformation on the extracted spectral features, and then converting the extracted spectral features into weight values from 0 to 1 through a softmax activation function, wherein the weight values represent the degrees of different spectral band space regions needing attention;
performing element-level multiplication operation on the obtained weight value and the input value to obtain a layer of spectral attention characteristics, and then adding a batch normalization BN layer and a ReLU activation layer to obtain the output of a first layer of spectral attention network;
taking the output of the first layer of spectral attention network as the input of the next layer, repeating the above operations to obtain the output of the second layer of spectral attention network, and so on to obtain the output of the multilayer spectral attention network, wherein the values represent spectral attention characteristics at different depths;
the obtained multilayer spectral attention features are summed, and the result is fused with spectral attention features of different depths, so that the image classification precision is improved;
the secondary data dimension reduction module comprises a convolution layer; the second convolution does not perform boundary filling in the spectral dimension, and the dimension is reduced slightly. For the output dimensionality of the multilayer spectral attention network to be 7 × TK, the convolution operation of the convolution kernel size (1,1, TK) is adopted to convert the 3 rd dimensionality of the data into 1, namely, all the spectral features are concentrated on one dimensionality, the dimensionality of the obtained result data is 7 × 1, and preparation is made for subsequent spatial feature extraction;
performing convolution operation, wherein the 3 rd dimension of the data is converted into 1, namely all the spectral features are concentrated on a wave band, the dimension of the obtained result data is 7 × 1, and preparation is made for subsequent spatial feature extraction; the convolution operations are all three-dimensional convolution, the convolution kernel size is 1 × TK, and the convolution adopts expansion convolution with expansion rate dimension >1 in the 3 rd dimension. The output dimension of the last step is 7 × TK, the 3 rd dimension of the data can be converted into 1 by adopting convolution operation of convolution kernel size (1,1, TK), namely, all spectral features are concentrated on one dimension, the dimension of the obtained result data is 7 × 1, and preparation is made for subsequent spatial feature extraction;
the space attention mechanism module comprises a plurality of space attention characteristic layers which are connected in sequence;
performing 3-dimensional convolution operation on input data, and extracting spatial features by adopting a convolution kernel of 3 × 1;
performing hyperbolic tangent transformation on the extracted spatial features, and then converting the extracted spatial features into weight values of 0 to 1 through a softmax activation function, wherein the weight values represent the attention degrees of different regions (or pixels) on the space under an attention mechanism, and the weight values are called as the weight of the spatial attention features;
multiplying the obtained weights pixel by pixel with the input from which the spatial features were extracted to obtain one layer of spatial attention features; a BN layer and a ReLU activation layer then give the output of the first spatial attention network layer;
taking the output of the first spatial attention network layer as the input of the next layer and repeating the operation to obtain the output of the second layer, and so on, to obtain the outputs of the multilayer spatial attention network, which represent spatial attention features at different depths;
summing the obtained multilayer spatial attention features fuses spatial attention features of different depths, which helps improve the classification accuracy of the images; the convolution operations are all three-dimensional convolutions with convolution kernel size 3 × 3 × 1.
Average pooling is then carried out on the result, and the pooled result is input into the fully connected network to obtain the final classification result.
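The steps above can be sketched as a toy numpy example. The convolutional feature extraction, BN and ReLU layers are omitted for brevity, and the 7 × 7 map size, the 3-layer depth, and the `spatial_attention_layer` helper are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a flat vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def spatial_attention_layer(feat):
    """One spatial attention layer: tanh transform, softmax over all pixels,
    then pixel-by-pixel reweighting of the input (conv/BN/ReLU omitted)."""
    h, w = feat.shape
    scores = np.tanh(feat)                           # hyperbolic tangent transform
    weights = softmax(scores.ravel()).reshape(h, w)  # attention weights in (0, 1)
    return weights * feat                            # pixel-by-pixel multiplication

x = np.random.rand(7, 7)       # stand-in for one spatial feature map
outs = []
for _ in range(3):             # three stacked spatial attention layers
    x = spatial_attention_layer(x)
    outs.append(x)
fused = np.sum(outs, axis=0)   # summation fuses attention features of different depths
pooled = fused.mean()          # average pooling before the fully connected head
```

The fused map would then be pooled and fed to the fully connected network for classification.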
Further, the convolution operations are all three-dimensional convolutions, and each convolution operation is followed by a batch normalization (BN) layer and a ReLU activation layer.
The embodiment also discloses a hyperspectral image classification method, which comprises the following steps:
acquiring a hyperspectral image set to be classified, inputting the hyperspectral image set into a network model, and classifying each pixel of the hyperspectral image by using the network model to obtain a final classification result;
the network model is a hyperspectral image classification model constructed by the construction method of the hyperspectral image classification model.
The classification quality is evaluated mainly with three indexes: OA (Overall Accuracy), AA (Average Accuracy) and the Kappa coefficient. They respectively measure the percentage of correctly classified pixels overall, the mean of the per-class accuracies, and the proportional reduction in error relative to completely random classification; for all three, higher is better. Table 2 shows the results provided by the present invention together with those of other comparative methods; each result is the mean and variance of 10 random experiments, where a higher mean indicates better classification performance and a smaller variance indicates a more stable classification method. The comparison shows that the invention obtains better results on all indexes.
TABLE 2 comparison of the Classification Effect obtained by different methods
| SSFCN | HbridSN | SSRN | CDSCN | Spe_AN | Spa_AN | The invention |
---|---|---|---|---|---|---|---
AA | 95.13±0.93 | 97.84±0.46 | 97.94±0.44 | 97.54±1.08 | 97.41±2.68 | 98.69±0.27 | 98.91±0.34 |
Kappa | 94.75±1.23 | 96.38±0.54 | 95.11±1.59 | 94.91±1.59 | 96.57±1.55 | 97.77±0.44 | 98.17±0.44 |
OA | 95.29±1.1 | 96.75±0.49 | 95.61±1.41 | 95.43±1.43 | 96.92±1.39 | 98±0.39 | 98.35±0.39 |
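The three indexes above can be computed from a confusion matrix. The following numpy sketch (with hypothetical toy labels, not data from Table 2) shows one standard way to derive OA, AA and Kappa:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """OA, AA and Kappa computed from predicted vs. reference labels."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                 # rows: true class, cols: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                         # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))        # mean of per-class accuracies
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)                      # agreement beyond chance
    return oa, aa, kappa

# Toy example: 6 pixels, 3 classes, one error.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]
oa, aa, kappa = classification_metrics(y_true, y_pred, 3)
print(oa, aa, kappa)  # 5/6, 5/6, 0.75
```

Here Kappa discounts the accuracy that random assignment would achieve given the class marginals, which is why it is lower than OA.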
The classification results are shown in Fig. 5. In Fig. 5, Fig. 5(a) is a pseudo-color image of the hyperspectral image; Fig. 5(b) is the calibration image; Fig. 5(c) is the classification image of the SSFCN (spectral-spatial fully convolutional neural network) method; Fig. 5(d) that of the HbridSN (mixed spectral convolutional neural network) method; Fig. 5(e) that of the SSRN (spectral-spatial residual network) method; Fig. 5(f) that of the CDSCN (cascaded two-scale crossing network) method; Fig. 5(g) that of the Spe_AN (spectral attention feature network only) method; Fig. 5(h) that of the Spa_AN (spatial attention feature network only) method; and Fig. 5(i) that of the classification method provided by the present invention. It can be clearly seen that the classification method provided by the invention yields continuous classification results with fewer noise points and better overall quality.
A method for visualizing spatial attention features is performed according to the following steps:
step A, from the constructed hyperspectral image classification model and classification system, denote the stored optimal network model by M, the label of the network layer at which the spatial attention feature is extracted by L, and the label of the previous layer by L-1;
step B, calling the layer L-1 network parameters M1 and the layer L network parameters M2 from the hyperspectral image classification model; taking the hyperspectral image as the input image, the parameters M1 yield the image before attention feature extraction (Fig. 6(a)) and the parameters M2 yield the image after attention feature extraction (Fig. 6(b)), realizing the visualization of the attention features. It can be seen that the attention features focus more on feature-rich regions such as boundary lines and inflection points between different regions of the image.
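Steps A and B amount to reading out intermediate feature maps at depths L-1 and L. A toy numpy sketch, in which the two-element `layers` list, the layer functions, and the 7 × 7 input are all hypothetical stand-ins for the stored model M:

```python
import numpy as np

# Toy stand-ins for the stored model: a list of layer functions. Layer index
# L-1 yields the map before attention extraction, layer L the map after it.
layers = [lambda z: np.tanh(z), lambda z: z * (z > 0)]  # hypothetical layers
L = 2  # 1-based label of the attention feature layer

def feature_map_at(x, depth):
    """Run the input through the first `depth` layers and return the result."""
    for f in layers[:depth]:
        x = f(x)
    return x

img = np.random.randn(7, 7)           # stand-in hyperspectral input patch
before = feature_map_at(img, L - 1)   # cf. Fig. 6(a): before attention extraction
after = feature_map_at(img, L)        # cf. Fig. 6(b): after attention extraction
```

Plotting `before` and `after` side by side (e.g. as heat maps) gives the kind of visualization described, with attention concentrated on boundary-like structures.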
Claims (5)
1. A construction method of a hyperspectral image classification model is characterized by comprising the following steps:
acquiring a hyperspectral image, and standardizing the hyperspectral image so that it conforms to a Gaussian distribution with zero mean and unit variance;
secondly, performing category calibration on the ground object categories corresponding to the pixels of the standardized hyperspectral image to obtain a calibration sample set and a label set;
constructing a neighborhood data cube centered on the pixel of each sample in the calibration sample set, wherein the dimension of the neighborhood data cube is w × w × B, w denotes the spatial neighborhood of the pixel and B is the initial number of bands of the hyperspectral image; and dividing the constructed neighborhood data cubes in a set proportion to obtain a training sample set, a verification sample set and a test sample set;
taking a neighborhood data cube corresponding to each sample in the training sample set and the verification sample set as the input of the network, taking the label set as the expected output, and training the network to obtain a trained network model;
the network comprises a network input layer, a data dimension reduction module, a spectral attention module, a secondary data dimension reduction module, a spatial attention mechanism module, a pooling module and a fully connected network, arranged sequentially in cascade;
the data dimension reduction module comprises two convolution modules which are connected in sequence, and each convolution module comprises a convolution layer, a batch normalization layer and an activation function layer which are connected in sequence;
the spectral attention module comprises a plurality of spectral attention feature layers which are connected in sequence;
the secondary data dimension reduction module comprises a convolution layer;
the spatial attention mechanism module comprises a plurality of spatial attention feature layers which are connected in sequence.
2. The construction method according to claim 1, wherein, when a boundary problem is encountered during construction of the neighborhood data cube in the third step, the boundary is extended by zero padding.
3. The construction method according to claim 1, wherein the concrete method of the fourth step is as follows:
taking the neighborhood data cube corresponding to each sample in the training sample set and the verification sample set as the input of the network and the label set as the expected output; calculating the cross-entropy loss between the real labels of the training samples and the network's predicted output, and, with minimizing this cross-entropy loss as the optimization target, optimizing the network with an adaptive moment estimation algorithm; testing the obtained network model on the verification set and updating the network model parameters whenever its classification accuracy is superior to that of the previous network model, until the iteration is finished; storing the model with the highest classification accuracy on the verification set to obtain the trained network model; and evaluating the classification performance of the trained network model with the test sample set.
4. A hyperspectral image classification method is characterized by comprising the following steps:
acquiring a hyperspectral image set to be classified, inputting the hyperspectral image set into a network model, and classifying each pixel of the hyperspectral image by using the network model to obtain a final classification result;
the network model is a hyperspectral image classification model constructed by the hyperspectral image classification model construction method according to any one of claims 1 to 3.
5. A hyperspectral image classification system is characterized by comprising an image acquisition module, a data preprocessing module and a classification module;
the image acquisition module is used for acquiring a hyperspectral image;
the data preprocessing module is used for standardizing the hyperspectral image so that the input image conforms to a Gaussian distribution with zero mean and unit variance;
the classification module classifies the hyperspectral images by adopting the method of claim 4 and outputs a classification result.
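The training and model-selection procedure of claim 3 can be sketched as follows. This is a minimal toy illustration in numpy: the random `probs` array is a stand-in for the network's softmax output, the label values are invented for the example, and the adaptive moment estimation (Adam) update itself is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(probs, labels):
    """Mean cross-entropy between predicted class probabilities and true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Toy stand-ins: 4 verification samples, 3 classes. A real model would produce
# `probs` from the neighborhood data cubes via the attention network.
labels = np.array([0, 1, 2, 1])
best_acc, best_model = -1.0, None
for epoch in range(5):
    probs = rng.random((4, 3))
    probs /= probs.sum(axis=1, keepdims=True)    # normalize rows to probabilities
    loss = cross_entropy(probs, labels)          # quantity minimized during training
    val_acc = np.mean(probs.argmax(axis=1) == labels)
    if val_acc > best_acc:                       # keep whichever model does best
        best_acc, best_model = val_acc, probs    # on the verification set
```

The retained `best_model` corresponds to the stored optimal network model M, which would then be scored on the held-out test sample set.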
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010781786.0A CN111985543B (en) | 2020-08-06 | 2020-08-06 | Construction method, classification method and system of hyperspectral image classification model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985543A true CN111985543A (en) | 2020-11-24 |
CN111985543B CN111985543B (en) | 2024-05-10 |
Family
ID=73445227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010781786.0A Active CN111985543B (en) | 2020-08-06 | 2020-08-06 | Construction method, classification method and system of hyperspectral image classification model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985543B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060251324A1 (en) * | 2004-09-20 | 2006-11-09 | Bachmann Charles M | Method for image data processing |
WO2017215284A1 (en) * | 2016-06-14 | 2017-12-21 | 山东大学 | Gastrointestinal tumor microscopic hyper-spectral image processing method based on convolutional neural network |
CN109376804A (en) * | 2018-12-19 | 2019-02-22 | 中国地质大学(武汉) | Based on attention mechanism and convolutional neural networks Classification of hyperspectral remote sensing image method |
CN110222773A (en) * | 2019-06-10 | 2019-09-10 | 西北工业大学 | Based on the asymmetric high spectrum image small sample classification method for decomposing convolutional network |
Non-Patent Citations (1)
Title |
---|
ZHANG Jing; YUAN Xiguo: "Hyperspectral remote sensing image classification algorithm based on small-sample learning", Journal of Liaocheng University (Natural Science Edition), no. 06 *
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886870A (en) * | 2018-12-29 | 2019-06-14 | 西北大学 | Remote sensing image fusion method based on binary channels neural network |
CN112801133A (en) * | 2020-12-30 | 2021-05-14 | 核工业北京地质研究院 | Spectrum identification and classification method based on keras model |
CN112580670A (en) * | 2020-12-31 | 2021-03-30 | 中国人民解放军国防科技大学 | Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning |
CN112580670B (en) * | 2020-12-31 | 2022-04-19 | 中国人民解放军国防科技大学 | Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning |
CN113435253A (en) * | 2021-05-31 | 2021-09-24 | 西安电子科技大学 | Multi-source image combined urban area ground surface coverage classification method |
CN113420798A (en) * | 2021-06-09 | 2021-09-21 | 中国石油大学(华东) | Hyperspectral image classification based on twin spectral attention consistency |
CN113420798B (en) * | 2021-06-09 | 2024-09-17 | 中国石油大学(华东) | Hyperspectral image classification method based on consistency of attention of twin spectrums |
CN114255348B (en) * | 2021-09-27 | 2023-01-10 | 海南电网有限责任公司电力科学研究院 | Insulator aging and fouling spectrum classification method for improving B _ CNN |
CN114255348A (en) * | 2021-09-27 | 2022-03-29 | 海南电网有限责任公司电力科学研究院 | Insulator aging and fouling spectrum classification method for improving B _ CNN |
CN114264626A (en) * | 2021-12-18 | 2022-04-01 | 复旦大学 | Fabric nondestructive quantitative analysis method based on time series residual error network |
CN114550305A (en) * | 2022-03-04 | 2022-05-27 | 合肥工业大学 | Human body posture estimation method and system based on Transformer |
CN114663821A (en) * | 2022-05-18 | 2022-06-24 | 武汉大学 | Real-time nondestructive detection method for product quality based on video hyperspectral imaging technology |
CN114778485A (en) * | 2022-06-16 | 2022-07-22 | 中化现代农业有限公司 | Variety identification method and system based on near infrared spectrum and attention mechanism network |
CN114863223A (en) * | 2022-06-30 | 2022-08-05 | 中国自然资源航空物探遥感中心 | Hyperspectral weak supervision classification method combining denoising autoencoder and scene enhancement |
CN115909052A (en) * | 2022-10-26 | 2023-04-04 | 杭州师范大学 | Hyperspectral remote sensing image classification method based on hybrid convolutional neural network |
CN117474893A (en) * | 2023-11-14 | 2024-01-30 | 国网新疆电力有限公司乌鲁木齐供电公司 | Intelligent detection method for electroscope based on semantic feature template matching |
CN117829731A (en) * | 2023-12-29 | 2024-04-05 | 光谷技术有限公司 | Military equipment warehouse management method and system based on RFID and AI vision |
CN117893537A (en) * | 2024-03-14 | 2024-04-16 | 深圳市普拉托科技有限公司 | Decoloring detection method and system for tray surface material |
CN117893537B (en) * | 2024-03-14 | 2024-05-28 | 深圳市普拉托科技有限公司 | Decoloring detection method and system for tray surface material |
CN118135341A (en) * | 2024-05-07 | 2024-06-04 | 武汉理工大学三亚科教创新园 | Hyperspectral image classification method based on cascade space cross attention network |
CN118135341B (en) * | 2024-05-07 | 2024-07-16 | 武汉理工大学三亚科教创新园 | Hyperspectral image classification method based on cascade space cross attention network |
Also Published As
Publication number | Publication date |
---|---|
CN111985543B (en) | 2024-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111985543B (en) | Construction method, classification method and system of hyperspectral image classification model | |
CN110363215B (en) | Method for converting SAR image into optical image based on generating type countermeasure network | |
CN110399909B (en) | Hyperspectral image classification method based on label constraint elastic network graph model | |
CN110287869A (en) | High-resolution remote sensing image Crop classification method based on deep learning | |
CN111259898A (en) | Crop segmentation method based on unmanned aerial vehicle aerial image | |
CN108629287A (en) | A kind of remote sensing image terrain classification method | |
CN111639587B (en) | Hyperspectral image classification method based on multi-scale spectrum space convolution neural network | |
CN115170979B (en) | Mining area fine land classification method based on multi-source data fusion | |
CN111652039B (en) | Hyperspectral remote sensing ground object classification method based on residual error network and feature fusion module | |
CN114926748A (en) | Soybean remote sensing identification method combining Sentinel-1/2 microwave and optical multispectral images | |
CN113610905B (en) | Deep learning remote sensing image registration method based on sub-image matching and application | |
CN111080652A (en) | Optical remote sensing image segmentation method based on multi-scale lightweight cavity convolution | |
CN115331104A (en) | Crop planting information extraction method based on convolutional neural network | |
CN108364011A (en) | PolSAR image multi-stage characteristics extract and unsupervised segmentation method | |
CN114494821A (en) | Remote sensing image cloud detection method based on feature multi-scale perception and self-adaptive aggregation | |
CN112052758A (en) | Hyperspectral image classification method based on attention mechanism and recurrent neural network | |
CN110516727B (en) | Hyperspectral image classification method based on FPGA (field programmable Gate array) depth edge filter | |
CN116664954A (en) | Hyperspectral ground object classification method based on graph convolution and convolution fusion | |
CN117115675A (en) | Cross-time-phase light-weight spatial spectrum feature fusion hyperspectral change detection method, system, equipment and medium | |
CN115410074B (en) | Remote sensing image cloud detection method and device | |
CN113920323B (en) | Different-chaos hyperspectral image classification method based on semantic graph attention network | |
CN113887656B (en) | Hyperspectral image classification method combining deep learning and sparse representation | |
CN115187463A (en) | Landslide remote sensing image set super-resolution reconstruction method and system | |
CN115100091A (en) | Conversion method and device for converting SAR image into optical image | |
CN114998101A (en) | Satellite image super-resolution method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||