CN111639587A - Hyperspectral image classification method based on multi-scale spectrum space convolution neural network - Google Patents
- Publication number
- CN111639587A CN111639587A CN202010461596.0A CN202010461596A CN111639587A CN 111639587 A CN111639587 A CN 111639587A CN 202010461596 A CN202010461596 A CN 202010461596A CN 111639587 A CN111639587 A CN 111639587A
- Authority
- CN
- China
- Prior art keywords
- layer
- activation function
- convolution
- function layer
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/194—Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention discloses a hyperspectral image classification method based on a multi-scale spectral-spatial convolutional neural network. It mainly addresses two shortcomings of the prior art: only single-scale features are extracted during inter-spectral and spatial feature extraction, and classification is poor for ground-object classes whose samples are sparsely distributed or few in number. The implementation scheme is as follows: 1) input a hyperspectral image and generate training and test sample sets of different sizes; 2) construct a multi-scale spectral-spatial convolutional neural network; 3) feed the training set into the network to obtain predicted classes, compute a hinge cross-entropy loss between the predictions and the true labels, and train the network by stochastic gradient descent until the loss converges; 4) feed the test samples into the trained network to obtain the classification result. The method achieves high classification accuracy with few training samples and can be used to detect ground-object classes in hyperspectral images.
Description
Technical Field
The invention belongs to the technical field of remote sensing information processing, and further relates to a hyperspectral image classification method which can be used for detecting the types of ground objects of hyperspectral images.
Background
Hyperspectral imagery records the continuous spectral characteristics of ground-object targets across rich band information, making it possible to recognize more target types and to classify ground objects with higher precision. The key to hyperspectral image classification is to exploit both the spatial and the inter-spectral characteristics of the image. The technology is of great significance for land-resource assessment and disaster monitoring. However, existing classification methods rely mainly on large numbers of training samples; because sample labels are hard to obtain, overfitting easily arises in the few-sample setting, which in turn degrades classification accuracy.
Zilong Zhong et al., in their paper "Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework" (IEEE Transactions on Geoscience and Remote Sensing, 2017: 1-12), proposed classifying hyperspectral images with an end-to-end spectral-spatial residual network, SSRN. The method takes the original three-dimensional cube as input; within the end-to-end network, an inter-spectral residual block and a spatial residual block continuously learn discriminative features from the rich spectral and spatial information of the hyperspectral image. The spectral features from the inter-spectral residual block and the spatial features from the spatial residual block are fused by cascading, and the fused high-level spectral-spatial features are finally fed into a classification layer. Because the inter-spectral and spatial residual blocks use single-scale convolutions, only single-scale features are extracted, so the network's classification performance on hyperspectral images suffers.
Northwestern Polytechnical University, in the granted patent "Hyperspectral image classification method based on spectral-spatial combination of a deep convolutional neural network" (grant publication number CN105320965B), discloses a spectral-spatial classification method based on a deep convolutional neural network. The specific steps are: first, train a convolutional neural network with a small amount of labeled data and let the network autonomously extract spatial-spectral features of the hyperspectral image without any compression or dimensionality reduction; then train a support vector machine (SVM) classifier on the extracted spatial-spectral features to classify the images; finally, combine the trained deep convolutional neural network with the trained classifier into a structure, DCNN-SVM, that can autonomously extract and classify the spatial-spectral features of the hyperspectral image. Because the deep convolutional neural network still extracts spatial-spectral features at a single scale, the SVM classifier's performance on hyperspectral image classification suffers.
Beyond the two methods listed above, existing hyperspectral image classification methods based on deep convolutional neural networks are similar: their feature-extraction modules use single-scale convolutions and do not extract multi-scale spatial and inter-spectral features at the feature level, so they classify hyperspectral images poorly when trained on few samples.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a hyperspectral image classification method based on a multi-scale spectral-spatial convolutional neural network, so as to improve the accuracy of ground-object classification in hyperspectral images when few training samples are available.
The idea is as follows: construct a multi-scale inter-spectral feature extraction module and a multi-scale spatial feature extraction module, and combine them into a multi-scale spectral-spatial convolutional neural network; feed the hyperspectral three-dimensional data into the inter-spectral module to extract multi-scale inter-spectral features; feed those features into the spatial module to extract joint multi-scale spectral-spatial features and classify; train the network with a hinge cross-entropy loss; finally, feed the test samples into the trained network to classify the hyperspectral image. This model addresses the problem of hyperspectral image classification in the few-sample setting.
In order to achieve the above purpose, the specific steps of the invention comprise:
1. a hyperspectral image classification method based on a multiscale spectral space convolution neural network is characterized by comprising the following steps:
(1) inputting an original three-dimensional hyperspectral image and zero-padding its spatial edges; in the padded image, selecting a hyperspectral image block centred on each pixel point;
(2) generating training sample sets and testing sample sets with different sample numbers for the hyperspectral image blocks;
(3) constructing a multi-scale spectrum space convolution neural network:
(3a) constructing a multi-scale inter-spectrum feature extraction module formed by connecting two inter-spectrum residual modules in series;
(3b) constructing a multi-scale spatial feature extraction module formed by connecting two spatial residual modules in series;
(3c) sequentially connecting the multi-scale inter-spectral feature extraction module, the multi-scale spatial feature extraction module and a softmax classifier in series to form a multi-scale spectral-spatial convolutional neural network;
(4) training a multi-scale spectrum space convolution neural network:
(4a) constructing a hinge cross entropy loss function, wherein the formula is as follows:
wherein L represents the hinge cross-entropy loss between the predicted label vector and the true label vector, argmax(·) represents the position of the maximum value, y represents the predicted label vector, y* represents the true label vector, θ represents a set threshold, y_i* represents the i-th element of the true label vector, y_i represents the i-th element of the predicted label vector, and M represents the total number of classes in the training set;
(4b) inputting the training sample set and the training sample labels into the multi-scale spectral-spatial convolutional neural network to obtain the predicted label y_i of each training sample; training the network with a stochastic gradient descent algorithm until the hinge cross-entropy loss converges, yielding the trained multi-scale spectral-spatial convolutional neural network;
(5) and inputting the test sample into the trained multi-scale spectrum space convolution neural network to obtain a class label of the test sample, and completing the classification of the hyperspectral image.
Compared with the prior art, the invention has the following advantages:
First, the invention constructs a multi-scale spatial feature extraction block and a multi-scale inter-spectral feature extraction block, extracting higher-resolution, more discriminative multi-scale inter-spectral and spatial features from the hyperspectral image. This overcomes the prior art's limitation of extracting only single-scale features during inter-spectral and spatial feature extraction when training with few samples, and improves the classification accuracy of ground objects in hyperspectral images.
Second, the newly constructed hinge cross-entropy loss makes the multi-scale spectral-spatial convolutional neural network pay more attention to hard-to-classify samples. This overcomes the poor performance of prior-art loss functions on ground-object classes with sparsely distributed or few samples, and improves the network's classification ability when trained on few samples.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a model structure of the multi-scale spectral space-convolution neural network in the present invention.
FIG. 3 is a schematic diagram of the structure of the inter-spectral residual block in the multi-scale spectral space-convolution neural network of the present invention.
FIG. 4 is a simulation diagram of classification results of the present invention and the prior art on two different data sets.
Detailed Description
Embodiments and effects of the present invention will be further described below with reference to the accompanying drawings.
Referring to fig. 1, implementation steps for the present example include the following.
Step 1, inputting a hyperspectral image.
The hyperspectral image is three-dimensional data S ∈ R^{a×b×c}; each band of the hyperspectral image corresponds to a two-dimensional matrix S_i ∈ R^{a×b}, where ∈ denotes set membership, R denotes the real number field, a denotes the length of the hyperspectral image, b its width, c its number of spectral bands, and i = 1, 2, …, c indexes the spectral bands.
And 2, acquiring a hyperspectral image block set.
Zero-pad the edges of the original three-dimensional hyperspectral data with k pixels of 0s. In the padded image, take each pixel point as a centre and select an image block of spatial size (2k+1) × (2k+1) with d channels, yielding the hyperspectral image block set; the channel number d equals the number of spectral bands of the hyperspectral image. In this example k = 4, although the invention is not limited to this value.
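The padding-and-patching step above can be sketched in a few lines of NumPy; the function name and the toy image size are illustrative, not from the patent:

```python
import numpy as np

def extract_patches(cube, k=4):
    """Zero-pad a hyperspectral cube of shape (a, b, c) by k pixels on each
    spatial edge, then cut one (2k+1, 2k+1, c) patch centred on every pixel."""
    a, b, c = cube.shape
    padded = np.pad(cube, ((k, k), (k, k), (0, 0)), mode="constant")
    patches = np.empty((a * b, 2 * k + 1, 2 * k + 1, c), dtype=cube.dtype)
    for i in range(a):
        for j in range(b):
            patches[i * b + j] = padded[i:i + 2 * k + 1, j:j + 2 * k + 1, :]
    return patches

cube = np.random.rand(6, 5, 10)      # toy image: 6x5 pixels, 10 bands
patches = extract_patches(cube, k=4)
print(patches.shape)                  # (30, 9, 9, 10)
```

Each patch's centre pixel (index (k, k) in the patch) is exactly the original pixel the patch was cut around, which is what makes the centre label usable as the patch label in step 3.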
And 3, generating a training sample set and a testing sample set with different sample numbers from the hyperspectral image block set.
Firstly, distributing a hyperspectral image block to a set to which a central pixel point belongs according to the category of the central pixel point;
then, selecting image blocks in each category set according to the proportion of 0.01 as a training set, and taking a central pixel point label of each image block in the training set as a label of the image block;
and finally, taking the residual image blocks in each category set as a test set.
For two different hyperspectral datasets, Kennedy Space Center and Salinas scene, this example takes a 0.01 share of each target class as the training set and the remaining 0.99 as the test set.
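The per-class 0.01/0.99 split described above can be sketched as follows; the function name, the seed handling, and the at-least-one-sample floor are assumptions of this sketch, not specified in the patent:

```python
import numpy as np

def split_per_class(labels, train_ratio=0.01, seed=0):
    """Return (train_idx, test_idx): for every class, a train_ratio share of
    its pixel indices (at least one) goes to training, the rest to testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                                  # random per-class draw
        n_train = max(1, int(round(train_ratio * idx.size)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

labels = np.repeat(np.arange(3), [200, 300, 500])   # toy label map, 3 classes
tr, te = split_per_class(labels, 0.01)
print(tr.size, te.size)                              # 2+3+5 = 10 train, 990 test
```

Splitting per class rather than globally keeps every ground-object category represented in the training set even at a 1% sampling rate.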
And 4, constructing a multi-scale spectrum space convolution neural network.
Referring to fig. 2, the specific implementation steps of the present embodiment include the following.
4.1) constructing a multi-scale inter-spectral feature extraction module consisting of two inter-spectrum residual modules connected in series, wherein:
the inter-spectrum residual error module structure sequentially comprises: the first convolution layer → the first activation function layer → the first average layer → the second convolution layer → the second activation function layer → the first fusion layer → the third convolution layer → the third activation function layer → the second fusion layer → the fourth convolution layer → the fourth activation function layer → the first carboxylate splice layer → the fifth convolution layer → the fifth activation function layer → the second average layer → the sixth convolution layer → the sixth activation function layer → the third fusion layer → the seventh convolution layer → the seventh activation function layer → the fourth fusion layer → the eighth convolution layer → the eighth activation function layer → the second carboxylate splice layer; the structure is shown in fig. 3.
The first uniform layer is used for uniformly dividing the feature graph output by the first activation function layer into four parts;
the first fusion layer is used for adding the second activation function layer and a third part of the first activation function layer;
the second fusion layer is used for adding the third activation function layer and the fourth part of the first activation function layer;
the first splice layer is used for splicing the first parts of the second activation function layer, the third activation function layer, the fourth activation function layer and the first activation function layer together;
the second uniform layer is used for uniformly dividing the feature map output by the fifth activation function layer into four parts;
the third fusion layer is used for adding the sixth activation function layer and a third part of the fifth activation function layer;
the fourth fusion layer is used for adding the seventh activation function layer and a fourth part of the fifth activation function layer;
the second splice layer is used for splicing the first parts of the sixth, seventh, eighth and fifth activation function layers together;
the parameters of each layer are set as follows:
The convolution kernels of all three-dimensional convolution layers in the inter-spectrum residual module are set to 1 × 7, with 15 kernels each; the activation function of every activation-function layer in the inter-spectrum residual module is set to the ReLU activation function;
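The split → convolve → fuse → concatenate data flow of one residual module, as described for the uniform-splitting, fusion and concatenation layers above, can be traced with plain arrays. The real module uses learned 3-D convolutions followed by ReLU activations; this sketch replaces them with a fixed scaling placeholder purely to make the topology visible:

```python
import numpy as np

def toy_conv(x):
    """Placeholder for one convolution + activation layer: a scaled copy,
    so the multi-scale data flow can be traced with plain numbers."""
    return 0.5 * x

def multiscale_block(x):
    """Data flow of one residual module: split the feature map into 4 equal
    parts along the channel axis, convolve part 2, add the result into
    part 3 and convolve, add that into part 4 and convolve, then
    concatenate the three conv outputs with the untouched part 1."""
    p1, p2, p3, p4 = np.split(x, 4, axis=-1)  # uniform-splitting layer
    y2 = toy_conv(p2)                          # 2nd conv + activation
    y3 = toy_conv(p3 + y2)                     # 1st fusion, 3rd conv + activation
    y4 = toy_conv(p4 + y3)                     # 2nd fusion, 4th conv + activation
    return np.concatenate([y2, y3, y4, p1], axis=-1)  # concatenation layer

x = np.ones((9, 9, 16))        # toy feature map with 16 channels
out = multiscale_block(x)
print(out.shape)                # (9, 9, 16)
```

Because parts 2-4 pass through one, two and three convolutions respectively while part 1 passes through none, the concatenated output mixes receptive fields of several sizes, which is the multi-scale property the module is built for.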
4.2) a multi-scale space feature extraction module which is formed by connecting two space residual modules in series is built, wherein:
the structure of the space residual error module is as follows in sequence: the 1 st convolution layer → the 1 st activation function layer → the 1 st mean layer → the 2 nd convolution layer → the 2 nd activation function layer → the 1 st fusion layer → the 3 rd convolution layer → the 3 rd activation function layer → the 2 nd fusion layer → the 4 th convolution layer → the 4 th activation function layer → the 1 st linkage splice layer → the 5 th convolution layer → the 5 th activation function layer → the 2 nd mean layer → the 6 th convolution layer → the 6 th activation function layer → the 3 rd fusion layer → the 7 th convolution layer → the 7 th activation function layer → the 4 th fusion layer → the 8 th convolution layer → the 8 th activation function layer → the 2 nd linkage splice layer; the structure is the same as that of fig. 3.
The 1 st uniform layer is used for uniformly dividing the feature map output by the 1 st activation function layer into four parts;
the 1 st fusion layer is used for adding the 2 nd activation function layer and a third copy of the 1 st activation function layer;
the 2 nd fusion layer is used for adding the 3 rd activation function layer and the fourth part of the 1 st activation function layer;
the 1 st splice layer is used for splicing the 2 nd, 3 rd, 4 th and 1 st activation function layers together;
the 2 nd uniform layer is used for uniformly dividing the feature map output by the 5 th activation function layer into four parts;
the 3 rd fusion layer is used for adding the 6 th activation function layer and a third copy of the 5 th activation function layer;
the 4 th fusion layer is used for adding the 7 th activation function layer and the fourth part of the 5 th activation function layer;
the 1 st splice layer is used for splicing the first parts of the 6 th, 7 th, 8 th and 5 th activation function layers together;
the parameters of each layer are as follows:
the convolution kernel size of all convolution layers in the space residual module is set to be 3 x 128, and the number of the convolution kernels is set to be 15;
and setting the activation function of each activation function layer in the spatial residual module as a ReLU activation function.
4.3) sequentially connecting the multi-scale inter-spectral feature extraction module, the multi-scale spatial feature extraction module and the softmax classifier in series to form the multi-scale spectral-spatial convolutional neural network.
And 5, constructing a hinge cross entropy loss function.
In the hyperspectral image classification task, some ground-object classes have sparsely distributed or few samples, and the ordinary cross-entropy loss cannot distinguish such hard samples well. To make the network pay more attention to hard samples, this embodiment designs a new hinge cross-entropy loss, formulated as follows:
wherein L represents the hinge cross-entropy loss between the predicted label vector and the true label vector, argmax(·) represents the position of the maximum value, y represents the predicted label vector, y* represents the true label vector, θ represents a set threshold, y_i* represents the i-th element of the true label vector, y_i represents the i-th element of the predicted label vector, and M represents the total number of classes in the training set.
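The published formula itself appears only as an image in the source, so the following is one plausible reading consistent with the symbol definitions above, offered strictly as an assumption: the cross-entropy is charged only while the predicted probability at the true-class position (argmax of y*) is still below the threshold θ, so confidently classified easy samples contribute no loss and hard samples dominate training:

```python
import numpy as np

def hinge_cross_entropy(y_pred, y_true, theta=0.9, eps=1e-12):
    """Hypothetical reading of the patent's hinge cross-entropy: per sample,
    charge the usual cross-entropy only while the predicted probability at
    the true class (argmax of the one-hot label y*) is below the threshold
    theta; confident, easy samples incur zero loss."""
    true_pos = np.argmax(y_true, axis=-1)                 # argmax(y*) per sample
    p_true = y_pred[np.arange(len(y_pred)), true_pos]     # predicted prob of true class
    ce = -np.sum(y_true * np.log(y_pred + eps), axis=-1)  # standard cross-entropy
    return np.mean(np.where(p_true < theta, ce, 0.0))     # hinge gate at theta

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.95, 0.05],    # easy sample: above theta, zero loss
                   [0.40, 0.60]])   # hard sample: below theta, full CE
loss = hinge_cross_entropy(y_pred, y_true, theta=0.9)
```

This gating is one simple way to realise the stated goal of focusing the network on hard samples; the exact published formula may differ.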
And 6, training the multi-scale spectrum space convolution neural network.
Inputting a training sample set and a training sample label into a multi-scale spectrum space convolution neural network;
training the multi-scale spectral-spatial convolutional neural network with a stochastic gradient descent algorithm: obtain the predicted label y of each training sample, compute the hinge cross-entropy loss between y and the true label y* by the formula in step 5, and iterate until the loss converges, yielding the trained multi-scale spectral-spatial convolutional neural network.
And 7, classifying the test sample set.
And inputting the test sample into the trained multi-scale spectrum space convolution neural network to obtain the category of the test sample, and completing the classification of the hyperspectral image.
The effect of the present invention is further explained below with simulation experiments.
1. Simulation experiment conditions are as follows:
the hardware platform of the simulation experiment of the invention is as follows: intercore i7-6700, frequency of 3.4GHz, NvidiaGeForce GTX1080 Ti. The software of the simulation experiment of the present invention uses a pytorech.
The simulation experiment of the invention is to classify the ground object targets in Kennedy Space Center and Salinas scene hyperspectral data sets respectively by adopting the invention and two existing 3D-Densenet and SSRN methods.
The 3D-DenseNet method refers to the hyperspectral image classification method using a 3D densely connected network proposed by Chunju Zhang et al. in the Journal of Applied Remote Sensing, 2019, abbreviated 3D-DenseNet.
The SSRN classification method refers to the end-to-end spectral-spatial residual network for hyperspectral image classification proposed in "Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework" (IEEE Transactions on Geoscience and Remote Sensing, 2017: 1-12), abbreviated SSRN.
The Kennedy Space Center and Salinas scene hyperspectral datasets used in the invention were collected by the AVIRIS sensor over the Kennedy Space Center and over Salinas Valley, California, respectively. The datasets used in the simulation experiments were downloaded from:
http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes. The Kennedy Space Center hyperspectral image is 512 × 614 pixels with 176 bands and contains 13 ground-object classes; the class names and sample counts are shown in Table 1.
TABLE 1 Kennedy Space Center sample classes and quantities
The size of the Salinas scene hyperspectral dataset image is 512 x 217, the Salinas scene hyperspectral dataset image has 204 wave bands and comprises 16 types of ground objects, and the category and the number of each type of ground object are shown in Table 2.
TABLE 2 Salinas scene sample classes and quantities
Class label | Ground-object class | Number of samples
1 | Brocoli_green_weeds_1 | 2009 |
2 | Brocoli_green_weeds_2 | 3726 |
3 | Fallow | 1976 |
4 | Fallow_rough_plow | 1394 |
5 | Fallow_smooth | 2678 |
6 | Stubble | 3959 |
7 | Celery | 3579 |
8 | Graps_untrained | 11271 |
9 | Soil_vinyard_develop | 6203 |
10 | Corn_senesced_green_weeds | 3278 |
11 | Lettuce_romaine_4wk | 1068 |
12 | Lettuce_romaine_5wk | 1927 |
13 | Lettuce_romaine_6wk | 916 |
14 | Lettuce_romaine_7wk | 1070 |
15 | Vinyard_untrained | 7268 |
16 | Vinyard_vertical_trellis | 1807 |
2. Simulation experiment content and result analysis:
2.1) To verify the efficiency and classification performance of the invention, three evaluation indexes are adopted: overall accuracy OA, average accuracy AA, and the Kappa coefficient.
The overall accuracy OA is the number of correctly classified pixels on the test set divided by the total number of pixels; it lies between 0% and 100%, and larger values indicate better classification.
The average accuracy AA divides, for each class, the number of correctly classified pixels of that class on the test set by the total number of pixels of that class, then averages these per-class accuracies; it lies between 0% and 100%, and larger values indicate better classification.
The Kappa coefficient is an evaluation index defined on the confusion matrix. It jointly considers the diagonal and off-diagonal elements and thus objectively reflects classification performance; it lies between -1 and 1, and larger values indicate better classification.
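The three indexes just defined can all be computed from a single confusion matrix; a minimal NumPy sketch (the label vectors and class count are toy values, not the experimental data):

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy, average (per-class) accuracy and the Kappa
    coefficient, computed from integer label vectors via the confusion
    matrix cm, where cm[t, p] counts true class t predicted as p."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                                   # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))              # mean per-class accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1])
oa, aa, kappa = oa_aa_kappa(y_true, y_pred, 2)
```

Note that Kappa discounts the agreement expected by chance (pe), which is why it can expose a classifier that looks good on OA merely because one class dominates the pixel count.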
The classification accuracies of the invention and the two prior-art methods on the Kennedy Space Center and Salinas scene hyperspectral datasets were compared; the results are shown in Table 3.
TABLE 3 comparison of classification accuracy of three networks under different data sets
As can be seen from Table 3, the method of the present invention obtains a higher classification accuracy under the data sets of Kennedy Space Center and Salinas scene, which indicates that the method of the present invention can more accurately predict the category of the hyperspectral image sample.
2.2) The simulated classification results of the method of the invention and the two prior-art methods on the Kennedy Space Center and Salinas scene datasets are shown in FIG. 4, wherein:
FIG. 4a is a graph showing the results of classification on a Kennedy Space Center hyperspectral dataset using a prior art 3D-DenseNet;
FIG. 4b is a graph showing the results of classification using the existing SSRN on a Kennedy Space Center hyperspectral dataset;
FIG. 4c is a graph showing the results of classification on the Kennedy Space Center hyperspectral dataset using the method of the present invention;
FIG. 4D is a graph showing the results of classification using the existing 3D-DenseNet on the Salinas scene hyperspectral dataset;
FIG. 4e is a graph showing the results of classification using the existing SSRN on the Salinas scene hyperspectral dataset;
FIG. 4f is a diagram showing the classification result on the Salinas scene hyperspectral dataset by the method of the present invention;
as can be seen from FIG. 4c, the classification result graph of the present invention on the Kennedy Space Center data set is clearly smoother and more edge-defined than those of FIGS. 4a and 4 b.
As can be seen from FIG. 4f, the classification result graph of the present invention on the Salinas scene data set is obviously smoother, clearer in edge and better in regional target consistency than those of FIGS. 4d and 4 e. This directly illustrates that the method proposed by the present invention can generate a higher quality hyperspectral image classification result map than the two existing techniques.
The above simulation experiments show that: the method can acquire multi-scale spectrum space characteristics by using the constructed multi-scale space and inter-spectrum characteristic extraction module, can acquire a hyperspectral image classification result by building a multi-scale spectrum space convolution neural network, and trains the neural network by adopting a hinge cross entropy loss function, so that the multi-scale spectrum space convolution neural network focuses more on ground object categories with unconcentrated sample distribution or small sample amount. The method solves the problem that the classification accuracy is low under the condition of few training samples due to the fact that single-scale convolution operation is adopted in spectral and spatial features in the prior art, and is a very practical hyperspectral image classification method under the condition of few training samples.
Claims (7)
1. A hyperspectral image classification method based on a multi-scale spectrum space convolution neural network, characterized by comprising the following steps:
(1) inputting an original three-dimensional hyperspectral image and performing a 0 edge filling operation on it; in the filled hyperspectral image, selecting a hyperspectral image block centered on each pixel point;
(2) generating training sample sets and test sample sets with different numbers of samples from the hyperspectral image blocks;
(3) constructing a multi-scale spectrum space convolution neural network:
(3a) constructing a multi-scale inter-spectrum feature extraction module formed by connecting two inter-spectrum residual modules in series;
(3b) constructing a multi-scale spatial feature extraction module formed by connecting two spatial residual modules in series;
(3c) sequentially connecting the multi-scale inter-spectrum feature extraction module, the multi-scale spatial feature extraction module and a softmax classifier in series to form the multi-scale spectrum space convolution neural network;
(4) training a multi-scale spectrum space convolution neural network:
(4a) constructing a hinge cross entropy loss function, wherein the formula is as follows:

L = −∑_{i=1}^{M} y_i^* log(y_i), if argmax(y) ≠ argmax(y^*) or y_{argmax(y^*)} < θ;  L = 0, otherwise

wherein L represents the hinge cross entropy loss value between the predicted label vector and the true label vector, argmax(·) represents the position of the maximum value, y represents the predicted label vector, y^* represents the true label vector, θ represents a set threshold, y_i^* represents the i-th element in the true label vector, y_i represents the i-th element in the predicted label vector, and M represents the total number of classes in the training set;
(4b) inputting the training sample set and the training sample labels into the multi-scale spectrum space convolution neural network to obtain the predicted labels y_i of the training samples; training the multi-scale spectrum space convolution neural network with a stochastic gradient descent algorithm until the hinge cross entropy loss function converges, yielding the trained multi-scale spectrum space convolution neural network;
(5) inputting the test samples into the trained multi-scale spectrum space convolution neural network to obtain the class label of each test sample, completing the classification of the hyperspectral image.
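Read as a procedure, the hinge cross entropy loss of step (4a) can be sketched as follows. This is an illustrative reconstruction, assuming the loss falls back to ordinary cross entropy whenever the prediction is wrong or its confidence in the true class is below the threshold θ, and is clamped (hinged) to zero otherwise; the function name and the default value of `theta` are hypothetical:

```python
import numpy as np

def hinge_cross_entropy(y_pred, y_true, theta=0.9, eps=1e-12):
    """Illustrative sketch of a hinge cross entropy loss.

    y_pred : predicted probability vector of length M (e.g. softmax output)
    y_true : one-hot true label vector of length M
    theta  : confidence threshold (an assumed hyperparameter)
    """
    correct = np.argmax(y_pred) == np.argmax(y_true)
    confident = y_pred[np.argmax(y_true)] >= theta
    if correct and confident:
        return 0.0  # hinge: no penalty for confident, correct samples
    # otherwise fall back to ordinary cross entropy over the M classes
    return float(-np.sum(y_true * np.log(y_pred + eps)))

# A confident, correct prediction incurs no loss; a wrong one is penalized.
print(hinge_cross_entropy(np.array([0.95, 0.03, 0.02]),
                          np.array([1.0, 0.0, 0.0])))  # 0.0
```

Because correctly classified, confident samples contribute no gradient, the remaining gradient mass concentrates on classes that are still misclassified or predicted with low confidence, which is the stated motivation for the loss.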
2. The method according to claim 1, wherein the original three-dimensional hyperspectral image in (1) is three-dimensional data S ∈ R^(a×b×c), and each spectral band of the hyperspectral image corresponds to a two-dimensional matrix S_i ∈ R^(a×b) in the three-dimensional data, wherein ∈ denotes the membership symbol, R denotes the real number field, a denotes the length of the hyperspectral image, b denotes the width of the hyperspectral image, c denotes the number of spectral bands of the hyperspectral image, i denotes the index of a spectral band in the hyperspectral image, and i = 1, 2, …, c.
3. The method of claim 1, wherein the 0 edge fill operation in (1) fills the edges of the hyperspectral image with 0-valued pixels to a width of k.
4. The method according to claim 1, wherein the hyperspectral image block in (1) has a spatial size of (2k+1) × (2k+1), its number of channels d equals the number of spectral bands of the hyperspectral image, and k is the width of the 0-valued pixel filling at the edges of the hyperspectral image.
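The padding and block-extraction of claims 3 and 4 can be sketched as follows; this is a minimal numpy illustration (function name hypothetical), assuming the cube is stored as (height, width, bands) and every pixel yields one block:

```python
import numpy as np

def extract_patches(image, k):
    """Zero-pad a hyperspectral cube and cut one (2k+1)x(2k+1) block per pixel.

    image : array of shape (a, b, c) -- height, width, spectral bands
    k     : edge-fill width in pixels (claim 3)
    Returns an array of shape (a * b, 2k+1, 2k+1, c); the channel count
    stays equal to the number of spectral bands (claim 4).
    """
    a, b, _ = image.shape
    # fill only the two spatial dimensions with 0-valued pixels
    padded = np.pad(image, ((k, k), (k, k), (0, 0)), mode="constant")
    patches = [
        padded[i:i + 2 * k + 1, j:j + 2 * k + 1, :]
        for i in range(a)
        for j in range(b)
    ]
    return np.stack(patches)

cube = np.random.rand(6, 5, 10)      # toy 6x5 image with 10 bands
blocks = extract_patches(cube, k=2)
print(blocks.shape)                   # (30, 5, 5, 10)
```

Note that after padding, the block for pixel (i, j) is centered on that pixel, so the center element of each block recovers the original pixel's full spectrum.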
5. The method according to claim 1, wherein generating training sample sets and test sample sets with different numbers of samples in (2) comprises: first assigning each hyperspectral image block to the set of the category of its central pixel point; then selecting image blocks from each category set at a ratio of 0.01 as the training set, taking the label of each image block's central pixel point as the label of the block; and using the remaining image blocks in each category set as the test sets.
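The per-class split of claim 5 can be sketched as follows; this is an illustrative stand-alone sketch (function name and seed are hypothetical) assuming a flat list of center-pixel labels and a 0.01 sampling ratio per class:

```python
import random
from collections import defaultdict

def stratified_split(labels, ratio=0.01, seed=0):
    """Split sample indices per class: `ratio` of each class goes to the
    training set, the remainder to the test set (sketch of step (2))."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)          # assign blocks to their category set
    rng = random.Random(seed)
    train, test = [], []
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        n_train = max(1, int(len(idxs) * ratio))  # keep at least one per class
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test

labels = [0] * 200 + [1] * 300
train, test = stratified_split(labels, ratio=0.01)
print(len(train), len(test))  # 5 495
```

Splitting per class rather than globally keeps rare ground-object categories represented in the training set even at a 1% sampling ratio.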
6. The method of claim 1, wherein the structure of each multi-scale inter-spectrum residual module in (3a) is, in order: the first convolution layer → the first activation function layer → the first uniform layer → the second convolution layer → the second activation function layer → the first fusion layer → the third convolution layer → the third activation function layer → the second fusion layer → the fourth convolution layer → the fourth activation function layer → the first splice layer → the fifth convolution layer → the fifth activation function layer → the second uniform layer → the sixth convolution layer → the sixth activation function layer → the third fusion layer → the seventh convolution layer → the seventh activation function layer → the fourth fusion layer → the eighth convolution layer → the eighth activation function layer → the second splice layer;
the first uniform layer is used for uniformly dividing the feature map output by the first activation function layer into four parts;
the first fusion layer is used for adding the second activation function layer and the third part of the first activation function layer;
the second fusion layer is used for adding the third activation function layer and the fourth part of the first activation function layer;
the first splice layer is used for splicing the second, third and fourth activation function layers together with the first part of the first activation function layer;
the second uniform layer is used for uniformly dividing the feature map output by the fifth activation function layer into four parts;
the third fusion layer is used for adding the sixth activation function layer and the third part of the fifth activation function layer;
the fourth fusion layer is used for adding the seventh activation function layer and the fourth part of the fifth activation function layer;
the second splice layer is used for splicing the sixth, seventh and eighth activation function layers together with the first part of the fifth activation function layer;
the convolution kernel size of all convolution layers in the multi-scale inter-spectrum module is set to be 1 × 7, and the number of the convolution kernels is set to be 15;
and setting the activation function of each activation function layer in the multi-scale inter-spectrum module as a ReLU activation function.
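The split–fuse–splice data flow of claim 6 (a Res2Net-style multi-scale convolution) can be sketched as follows. This is a framework-free illustration: the real convolution-plus-ReLU pairs are replaced by a caller-supplied stand-in transform `conv`, and the function name is hypothetical:

```python
import numpy as np

def multiscale_block(x, conv):
    """Sketch of one half of the inter-spectrum residual module of claim 6.

    x    : feature map of shape (H, W, C) with C divisible by 4
    conv : stand-in for a convolution + activation that preserves (H, W, C//4)
    """
    # uniform layer: split the feature map evenly into four parts along channels
    p1, p2, p3, p4 = np.split(x, 4, axis=-1)
    y2 = conv(p2)                 # 2nd convolution + activation
    f1 = y2 + p3                  # 1st fusion layer: add to the third part
    y3 = conv(f1)                 # 3rd convolution + activation
    f2 = y3 + p4                  # 2nd fusion layer: add to the fourth part
    y4 = conv(f2)                 # 4th convolution + activation
    # splice layer: concatenate the processed parts with the untouched first part
    return np.concatenate([y2, y3, y4, p1], axis=-1)

relu = lambda t: np.maximum(t, 0.0)     # toy stand-in transform
out = multiscale_block(np.random.randn(7, 7, 16), relu)
print(out.shape)  # (7, 7, 16)
```

Because each successive branch sees the sum of an earlier branch's output and a fresh slice of the input, the branches have progressively larger effective receptive fields, which is what makes the module multi-scale.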
7. The method of claim 1, wherein the structure of each multi-scale space module in (3b) is, in order: the 1st convolution layer → the 1st activation function layer → the 1st uniform layer → the 2nd convolution layer → the 2nd activation function layer → the 1st fusion layer → the 3rd convolution layer → the 3rd activation function layer → the 2nd fusion layer → the 4th convolution layer → the 4th activation function layer → the 1st splice layer → the 5th convolution layer → the 5th activation function layer → the 2nd uniform layer → the 6th convolution layer → the 6th activation function layer → the 3rd fusion layer → the 7th convolution layer → the 7th activation function layer → the 4th fusion layer → the 8th convolution layer → the 8th activation function layer → the 2nd splice layer;
the 1st uniform layer is used for uniformly dividing the feature map output by the 1st activation function layer into four parts;
the 1st fusion layer is used for adding the 2nd activation function layer and the third part of the 1st activation function layer;
the 2nd fusion layer is used for adding the 3rd activation function layer and the fourth part of the 1st activation function layer;
the 1st splice layer is used for splicing the 2nd, 3rd and 4th activation function layers together with the first part of the 1st activation function layer;
the 2nd uniform layer is used for uniformly dividing the feature map output by the 5th activation function layer into four parts;
the 3rd fusion layer is used for adding the 6th activation function layer and the third part of the 5th activation function layer;
the 4th fusion layer is used for adding the 7th activation function layer and the fourth part of the 5th activation function layer;
the 2nd splice layer is used for splicing the 6th, 7th and 8th activation function layers together with the first part of the 5th activation function layer;
the convolution kernel size of all convolution layers in the multi-scale space module is set to 3 × 128, and the number of convolution kernels is set to 15;
and setting the activation function of each activation function layer in the multi-scale space module as a ReLU activation function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010461596.0A CN111639587B (en) | 2020-05-27 | 2020-05-27 | Hyperspectral image classification method based on multi-scale spectrum space convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111639587A true CN111639587A (en) | 2020-09-08 |
CN111639587B CN111639587B (en) | 2023-03-24 |
Family
ID=72331129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010461596.0A Active CN111639587B (en) | 2020-05-27 | 2020-05-27 | Hyperspectral image classification method based on multi-scale spectrum space convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111639587B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668421A (en) * | 2020-12-18 | 2021-04-16 | 武汉大学 | Attention mechanism-based rapid classification method for hyperspectral crops of unmanned aerial vehicle |
CN112949592A (en) * | 2021-03-31 | 2021-06-11 | 云南大学 | Hyperspectral image classification method and device and electronic equipment |
CN113095145A (en) * | 2021-03-15 | 2021-07-09 | 南京理工大学 | Hyperspectral anomaly detection deep learning method based on pixel pair matching and double-window discrimination |
CN113095409A (en) * | 2021-04-13 | 2021-07-09 | 西安电子科技大学 | Hyperspectral image classification method based on attention mechanism and weight sharing |
CN114842264A (en) * | 2022-05-17 | 2022-08-02 | 北京理工大学 | Hyperspectral image classification method based on multi-scale spatial spectral feature joint learning |
CN115187870A (en) * | 2022-09-13 | 2022-10-14 | 浙江蓝景科技有限公司杭州分公司 | Marine plastic waste material identification method and system, electronic equipment and storage medium |
CN117893816A (en) * | 2024-01-18 | 2024-04-16 | 安徽大学 | Hyperspectral image classification method of hierarchical residual spectrum space convolution network |
CN117893816B (en) * | 2024-01-18 | 2024-07-02 | 安徽大学 | Hyperspectral image classification method of hierarchical residual spectrum space convolution network |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316013A (en) * | 2017-06-14 | 2017-11-03 | 西安电子科技大学 | Hyperspectral image classification method with DCNN is converted based on NSCT |
WO2017215284A1 (en) * | 2016-06-14 | 2017-12-21 | 山东大学 | Gastrointestinal tumor microscopic hyper-spectral image processing method based on convolutional neural network |
WO2018045626A1 (en) * | 2016-09-07 | 2018-03-15 | 深圳大学 | Super-pixel level information fusion-based hyperspectral image classification method and system |
US20180231871A1 (en) * | 2016-06-27 | 2018-08-16 | Zhejiang Gongshang University | Depth estimation method for monocular image based on multi-scale CNN and continuous CRF |
US20190236411A1 (en) * | 2016-09-14 | 2019-08-01 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks |
CN110084159A (en) * | 2019-04-15 | 2019-08-02 | 西安电子科技大学 | Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint |
CN110321963A (en) * | 2019-07-09 | 2019-10-11 | 西安电子科技大学 | Based on the hyperspectral image classification method for merging multiple dimensioned multidimensional sky spectrum signature |
US20200026953A1 (en) * | 2018-07-23 | 2020-01-23 | Wuhan University | Method and system of extraction of impervious surface of remote sensing image |
CN111191736A (en) * | 2020-01-05 | 2020-05-22 | 西安电子科技大学 | Hyperspectral image classification method based on depth feature cross fusion |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017215284A1 (en) * | 2016-06-14 | 2017-12-21 | 山东大学 | Gastrointestinal tumor microscopic hyper-spectral image processing method based on convolutional neural network |
US20180231871A1 (en) * | 2016-06-27 | 2018-08-16 | Zhejiang Gongshang University | Depth estimation method for monocular image based on multi-scale CNN and continuous CRF |
WO2018045626A1 (en) * | 2016-09-07 | 2018-03-15 | 深圳大学 | Super-pixel level information fusion-based hyperspectral image classification method and system |
US20190236411A1 (en) * | 2016-09-14 | 2019-08-01 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks |
CN107316013A (en) * | 2017-06-14 | 2017-11-03 | 西安电子科技大学 | Hyperspectral image classification method with DCNN is converted based on NSCT |
US20200026953A1 (en) * | 2018-07-23 | 2020-01-23 | Wuhan University | Method and system of extraction of impervious surface of remote sensing image |
CN110084159A (en) * | 2019-04-15 | 2019-08-02 | 西安电子科技大学 | Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint |
CN110321963A (en) * | 2019-07-09 | 2019-10-11 | 西安电子科技大学 | Based on the hyperspectral image classification method for merging multiple dimensioned multidimensional sky spectrum signature |
CN111191736A (en) * | 2020-01-05 | 2020-05-22 | 西安电子科技大学 | Hyperspectral image classification method based on depth feature cross fusion |
Non-Patent Citations (2)
Title |
---|
刘启超等: "SSCDenseNet:一种空-谱卷积稠密网络的高光谱图像分类算法", 《电子学报》 * |
宋晗等: "基于卷积神经网络与主动学习的高光谱图像分类", 《中国科学院大学学报》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668421A (en) * | 2020-12-18 | 2021-04-16 | 武汉大学 | Attention mechanism-based rapid classification method for hyperspectral crops of unmanned aerial vehicle |
CN112668421B (en) * | 2020-12-18 | 2022-04-29 | 武汉大学 | Attention mechanism-based rapid classification method for hyperspectral crops of unmanned aerial vehicle |
CN113095145A (en) * | 2021-03-15 | 2021-07-09 | 南京理工大学 | Hyperspectral anomaly detection deep learning method based on pixel pair matching and double-window discrimination |
CN112949592A (en) * | 2021-03-31 | 2021-06-11 | 云南大学 | Hyperspectral image classification method and device and electronic equipment |
CN113095409A (en) * | 2021-04-13 | 2021-07-09 | 西安电子科技大学 | Hyperspectral image classification method based on attention mechanism and weight sharing |
CN114842264A (en) * | 2022-05-17 | 2022-08-02 | 北京理工大学 | Hyperspectral image classification method based on multi-scale spatial spectral feature joint learning |
CN115187870A (en) * | 2022-09-13 | 2022-10-14 | 浙江蓝景科技有限公司杭州分公司 | Marine plastic waste material identification method and system, electronic equipment and storage medium |
CN115187870B (en) * | 2022-09-13 | 2023-01-03 | 浙江蓝景科技有限公司杭州分公司 | Marine plastic waste material identification method and system, electronic equipment and storage medium |
CN117893816A (en) * | 2024-01-18 | 2024-04-16 | 安徽大学 | Hyperspectral image classification method of hierarchical residual spectrum space convolution network |
CN117893816B (en) * | 2024-01-18 | 2024-07-02 | 安徽大学 | Hyperspectral image classification method of hierarchical residual spectrum space convolution network |
Also Published As
Publication number | Publication date |
---|---|
CN111639587B (en) | 2023-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639587B (en) | Hyperspectral image classification method based on multi-scale spectrum space convolution neural network | |
CN110321963B (en) | Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features | |
CN111274865B (en) | Remote sensing image cloud detection method and device based on full convolution neural network | |
Li et al. | Adaptive multiscale deep fusion residual network for remote sensing image classification | |
CN111080629B (en) | Method for detecting image splicing tampering | |
EP3614308B1 (en) | Joint deep learning for land cover and land use classification | |
CN108573276B (en) | Change detection method based on high-resolution remote sensing image | |
Xie et al. | Multiscale densely-connected fusion networks for hyperspectral images classification | |
CN110084159B (en) | Hyperspectral image classification method based on combined multistage spatial spectrum information CNN | |
CN103440505B (en) | The Classification of hyperspectral remote sensing image method of space neighborhood information weighting | |
CN107392130A (en) | Classification of Multispectral Images method based on threshold adaptive and convolutional neural networks | |
CN107832797B (en) | Multispectral image classification method based on depth fusion residual error network | |
CN112052755A (en) | Semantic convolution hyperspectral image classification method based on multi-path attention mechanism | |
CN102646200A (en) | Image classifying method and system for self-adaption weight fusion of multiple classifiers | |
CN104299232B (en) | SAR image segmentation method based on self-adaptive window directionlet domain and improved FCM | |
CN109766936A (en) | Image change detection method based on information transmitting and attention mechanism | |
CN110414616B (en) | Remote sensing image dictionary learning and classifying method utilizing spatial relationship | |
CN104239902A (en) | Hyper-spectral image classification method based on non-local similarity and sparse coding | |
CN112990282B (en) | Classification method and device for fine-granularity small sample images | |
CN106485238A (en) | A kind of high-spectrum remote sensing feature extraction and sorting technique and its system | |
CN109784205B (en) | Intelligent weed identification method based on multispectral inspection image | |
CN112528058B (en) | Fine-grained image classification method based on image attribute active learning | |
CN112052758B (en) | Hyperspectral image classification method based on attention mechanism and cyclic neural network | |
CN115222994A (en) | Hyperspectral image classification method based on hybrid spectrum network and multi-head self-attention mechanism | |
CN111291818B (en) | Non-uniform class sample equalization method for cloud mask |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||