CN111507409A - Hyperspectral image classification method and device based on deep multi-view learning - Google Patents

Hyperspectral image classification method and device based on deep multi-view learning


Publication number
CN111507409A
CN111507409A (application number CN202010307781.4A)
Authority
CN
China
Prior art keywords
sample
samples
depth
training
unmarked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010307781.4A
Other languages
Chinese (zh)
Other versions
CN111507409B (en)
Inventor
刘冰
郭文月
余岸竹
王瑞瑞
余旭初
张鹏强
谭熊
魏祥坡
高奎亮
左溪冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202010307781.4A priority Critical patent/CN111507409B/en
Publication of CN111507409A publication Critical patent/CN111507409A/en
Application granted granted Critical
Publication of CN111507409B publication Critical patent/CN111507409B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention provides a hyperspectral image classification method and device based on deep multi-view learning, belonging to the technical field of remote sensing image processing and application. The classification method comprises the following steps: constructing at least two different views for each unlabeled sample in a training sample set, and training a deep residual network model with the resulting multi-view data of all unlabeled samples in the training sample set; constructing multi-view data for each sample in the set of samples to be classified and inputting one view of each sample into the trained model to obtain the feature vector of the corresponding sample (alternatively, each sample, either as-is or after dimensionality reduction, may be input into the trained model to obtain its feature vector), thereby obtaining the feature vectors of all samples in the set to be classified; and inputting the feature vectors of all samples in the set to be classified into a pre-trained classification model to complete the classification of the hyperspectral image. The method improves the classification accuracy of hyperspectral images under small-sample conditions.

Description

Hyperspectral image classification method and device based on deep multi-view learning
Technical Field
The invention relates to a hyperspectral image classification method and device based on deep multi-view learning, and belongs to the technical field of remote sensing image processing and application.
Background
Hyperspectral image classification is one of the important steps in hyperspectral image applications. Its basic task is to assign a category label to each pixel in the image, and it therefore has high practical application value.
Existing hyperspectral image classification methods mainly include the following:
(1) Hyperspectral image classification based on traditional machine learning methods, such as the support vector machine, the semi-supervised support vector machine, and the random forest. Although these methods can achieve a certain classification effect, they usually require complex hand-crafted feature design and rely heavily on expert experience for parameter tuning; their applicability is therefore limited and their classification accuracy is low.
(2) Hyperspectral image classification based on deep learning methods, such as the convolutional neural network. Although these methods can automatically extract the spatial-spectral features of a hyperspectral image without manually designed features, they achieve good classification results only with a large number of labeled training samples. In practical applications, acquiring labeled hyperspectral samples is time-consuming and labor-intensive, so the number of labeled samples available for training is very small; without a sufficient number of labeled training samples, these methods struggle to reach high classification accuracy.
(3) Hyperspectral image classification based on a deep residual network model. Exploiting the fact that unlabeled samples in a hyperspectral image are numerous and easy to acquire, this line of work studies how to better mine the feature information of the hyperspectral image using unlabeled samples and improve small-sample classification performance. For example, the invention patent application with publication number CN109754017A discloses a hyperspectral image classification method based on a separable three-dimensional residual network and transfer learning, in which a three-dimensional residual network model is constructed to autonomously extract the deep features of the hyperspectral image. Compared with classification methods based on plain deep learning, the network model is deeper and more accurate, and achieves better classification results under small-sample conditions. However, this method performs feature learning based on reconstruction errors and therefore cannot extract deeper abstract features, so its classification accuracy under small-sample conditions still needs further improvement.
In summary, among existing hyperspectral image classification methods: the classification methods based on traditional machine learning require complex feature design relying on expert experience and achieve low classification accuracy; the classification methods based on deep learning depend on a large number of labeled training samples and achieve low accuracy under small-sample conditions; and the classification method based on a deep residual network model performs feature learning based on reconstruction errors, cannot extract deeper abstract features, and its small-sample classification accuracy still needs further improvement.
Disclosure of Invention
The invention aims to provide a hyperspectral image classification method and device based on deep multi-view learning, so as to solve the problem that existing hyperspectral image classification methods have low classification accuracy under small-sample conditions.
To achieve the above object, the invention provides a hyperspectral image classification method based on deep multi-view learning, comprising the following steps:
(1) inputting a hyperspectral image;
(2) extracting a set number of unlabeled samples from the hyperspectral image to form a training sample set, with the remaining samples forming the set of samples to be classified, wherein a sample is an m × m × b data cube centered on the pixel to be processed in the hyperspectral image, m being the size of the spatial neighborhood and b the number of spectral bands;
(3) dividing all spectral bands of each unlabeled sample in the training sample set into at least two groups and processing each group of spectral bands to obtain one view, thereby constructing at least two different views of the unlabeled sample and obtaining its multi-view data, and in turn the multi-view data of all unlabeled samples in the training sample set;
(4) training a deep residual network model with the multi-view data of all unlabeled samples in the training sample set, wherein training is performed in multiple rounds, N unlabeled samples (N ≥ 1) being drawn from the training sample set for each round, and the training process is as follows: the multi-view data of each unlabeled sample is input into the deep residual network model, each view yielding one vector, so each unlabeled sample yields at least two vectors; the contrastive losses between all vectors of each unlabeled sample are calculated with a loss function, giving the total contrastive loss over all unlabeled samples in that round; whether the total contrastive loss meets the set requirement is then judged, and if not, the deep residual network model is optimized until the total contrastive loss meets the set requirement, completing the training of the deep residual network model;
(5) constructing the multi-view data of each sample in the set of samples to be classified and inputting one view of each sample into the trained deep residual network model to obtain the feature vector of the corresponding sample, thereby obtaining the feature vectors of all samples in the set to be classified;
or first reducing the dimensionality of each sample in the set of samples to be classified and inputting each dimension-reduced sample into the trained deep residual network model to obtain the feature vector of the corresponding sample, thereby obtaining the feature vectors of all samples in the set to be classified;
or directly inputting each sample in the set of samples to be classified into the trained deep residual network model to obtain the feature vector of the corresponding sample, thereby obtaining the feature vectors of all samples in the set to be classified;
(6) inputting the feature vectors of all samples in the set of samples to be classified into a pre-trained classification model to complete the classification of the hyperspectral image.
The invention also provides a hyperspectral image classification device based on deep multi-view learning, comprising a processor and a memory, the processor executing a computer program stored in the memory to implement the above hyperspectral image classification method based on deep multi-view learning.
The hyperspectral image classification method and device based on deep multi-view learning have the following beneficial effects. An m × m × b data cube centered on the pixel to be processed in the hyperspectral image is selected as a sample, at least two different views are constructed for each unlabeled sample in the training sample set, and a deep residual network model (hereinafter, the model) is trained with the multi-view data of all unlabeled samples in the training sample set. On the one hand, this makes full use of the spatial-spectral joint information in the hyperspectral image; on the other hand, it fully exploits the advantage that unlabeled samples in a hyperspectral image are plentiful, mining the deep feature information of the hyperspectral image from a large number of unlabeled samples, so that the trained model can extract the deep features of the hyperspectral image. Moreover, because model training is completed only when the total contrastive loss meets the set requirement, the trained model deeply mines the consistency of the multi-view data, making the feature vectors output for different views of the same sample consistent, which ensures that the feature vectors extracted by the trained model are more representative and discriminative. Classifying the hyperspectral image with the feature vectors extracted by the trained model therefore effectively improves classification accuracy, especially when the number of labeled training samples is small.
Further, in the above hyperspectral image classification method and device based on deep multi-view learning, the loss function is a cosine similarity function constructed from the cosine similarity between vectors.
The benefit of doing so is that a cosine similarity function constructed from the cosine similarity between vectors pulls together different views of the same sample while pushing the multi-view data of different samples apart, so the consistency of the multi-view data in the hyperspectral image can be mined and the mined information is more representative. This effectively improves classification accuracy, and the improvement is especially pronounced when the number of labeled training samples is small.
Further, in the above hyperspectral image classification method and device based on deep multi-view learning, the cosine similarity function is:
ζ_{i,j} = −log [ exp(sim(z_i, z_j)) / Σ_{k=1}^{2N} 1_{[k≠i]} exp(sim(z_i, z_k)) ]

where ζ_{i,j} denotes the contrastive loss between vector z_i and vector z_j; sim(z_i, z_j) = z_i^T z_j / (‖z_i‖ · ‖z_j‖) is the cosine similarity between z_i and z_j, and the closer it is to 1, the more similar the two vectors are; z_i^T denotes the transpose of z_i, and ‖z_i‖, ‖z_j‖ denote the norms of z_i and z_j respectively; 1_{[k≠i]} is the indicator function, taking the value 1 when k ≠ i; i, j, and k are indices; and N denotes the number of unlabeled samples.
In order to reduce the amount of data processing while ensuring that the obtained multi-view sample data contains sufficient spatial-spectral joint information, further, in the above hyperspectral image classification method and device based on deep multi-view learning, the process of processing a group of spectral bands to obtain a view comprises: performing principal component analysis on the group of spectral bands and taking its first M principal components as one view, where M ≥ 1.
In order to improve the generalization ability of the deep residual network model, further, in the above hyperspectral image classification method and device based on deep multi-view learning, sample data augmentation is performed by random cropping and random Gaussian blurring when training the deep residual network model.
In order to ensure that the deep residual network model has a deep network structure, so that the trained model can extract the deep features of the hyperspectral image and thereby improve small-sample classification accuracy, further, in the above hyperspectral image classification method and device based on deep multi-view learning, the deep residual network model comprises 49 convolutional layers and 2 fully connected layers, the 49 convolutional layers of the ResNet-50 model serving as its 49 convolutional layers.
Further, in the above hyperspectral image classification method and device based on deep multi-view learning, the classification model is a support vector machine classification model, a random forest classification model, or a convolutional neural network classification model.
Drawings
FIG. 1 is a flowchart of the hyperspectral image classification method based on deep multi-view learning in a method embodiment of the invention;
FIG. 2 is a schematic diagram of the process of constructing the multi-view data of an unlabeled sample in a method embodiment of the invention;
FIG. 3 is a schematic structural diagram of the deep residual network model in a method embodiment of the invention;
FIG. 4 is a schematic diagram of a standard residual block of the deep residual network model of FIG. 3;
FIG. 5 is a comparison of the classification results of various methods on the Salinas dataset in a method embodiment of the invention;
FIG. 6 is a structural diagram of the hyperspectral image classification device based on deep multi-view learning in a device embodiment.
Detailed Description
Method embodiment
The hyperspectral image classification method based on deep multi-view learning (hereinafter, the classification method) of this embodiment is shown in fig. 1 and comprises the following steps:
Step 1, inputting a hyperspectral image;
Step 2, extracting a set number of unlabeled samples (set according to actual needs) from the hyperspectral image to form a training sample set, with the remaining samples forming the set of samples to be classified; to make full use of the spatial information in the hyperspectral image and effectively improve classification accuracy, an m × m × b data cube centered on the pixel to be processed in the hyperspectral image is taken as a sample, where m is the size of the spatial neighborhood and b is the number of spectral bands;
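The cube extraction in step 2 can be sketched as follows. This is an illustrative assumption rather than code from the patent; in particular, reflect-padding at the image border is my own choice, since the patent does not specify how edge pixels are handled.

```python
import numpy as np

def extract_cube(image, row, col, m):
    """Extract an m x m x b data cube centered on pixel (row, col).

    `image` is an H x W x b hyperspectral array; edge pixels are
    handled by reflect-padding (an assumption, not specified in
    the original description).
    """
    r = m // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    return padded[row:row + m, col:col + m, :]

# Toy image: 10 x 10 pixels, 200 bands, neighborhood size m = 5.
img = np.random.rand(10, 10, 200)
cube = extract_cube(img, 0, 0, 5)   # works even at the image corner
print(cube.shape)                   # (5, 5, 200)
```

The padded slice keeps the pixel of interest at the center of the cube, so one cube per pixel yields the samples described above.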
Step 3, constructing two different views of each unlabeled sample in the training sample set to obtain its multi-view data, and in turn the multi-view data of all unlabeled samples in the training sample set;
in the embodiment, a depth multi-view learning method is used for constructing multi-view data of a non-marked sample, and in order to reduce complexity of a depth multi-view learning process, two different views are constructed for the same non-marked sample, wherein the view construction process specifically comprises the steps of averagely dividing all spectral bands of the same non-marked sample into two groups, performing principal component analysis on each group of spectral bands, then taking the first 3 principal components of a first group of spectral bands as a first view of the non-marked sample, taking the first 3 principal components of a second group of spectral bands as a second view of the non-marked sample, for example, in fig. 2, selecting a 28 × 28 × 200 data cube for a certain non-marked pixel as the non-marked sample of the pixel, wherein the non-marked sample has 200 spectral bands, averagely dividing the 200 spectral bands of the non-marked sample into two groups, obtaining spectral band groups with the number of two spectral bands being 100, performing principal component analysis on each group of spectral bands, taking the first 3 principal components of the first group of spectra as the first view of the non-marked sample, and taking the first 3 principal components of the second group of the spectral bands as the second group of the non-marked.
As other implementations, the first M principal components of a group of spectral bands may be taken as a view, where M ≥ 1 and the value of M is set according to actual needs; alternatively, each group of spectral bands may be used directly as a view, omitting the principal component analysis step.
In addition, since different bands of a hyperspectral image reflect different properties of ground objects, and so that the obtained multi-view data contains as much of the spatial-spectral information of the hyperspectral image as possible, 3 or more different views may be constructed for the same unlabeled sample as other implementations; the view construction process is similar to that for two views and is not repeated here. In particular, each band of the unlabeled sample may also be taken as one view.
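The band-grouping and principal component analysis described above can be sketched as follows. This is a minimal illustration under assumptions, not the patent's implementation: the eigendecomposition-based PCA and the even two-way split follow the 28 × 28 × 200 example, while the function names are my own.

```python
import numpy as np

def pca_view(band_group, n_components=3):
    """Project one group of bands onto its first n principal components.

    `band_group` is an m x m x k cube; PCA is performed on the k
    spectral bands, treating each of the m*m pixels as one observation.
    """
    m1, m2, k = band_group.shape
    X = band_group.reshape(-1, k)              # (m*m, k) pixel spectra
    X = X - X.mean(axis=0)                     # center each band
    cov = np.cov(X, rowvar=False)              # k x k band covariance
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues ascending
    top = vecs[:, ::-1][:, :n_components]      # leading eigenvectors
    return (X @ top).reshape(m1, m2, n_components)

def build_views(sample):
    """Split the spectral bands in half and build one 3-component view each."""
    b = sample.shape[2]
    return pca_view(sample[:, :, :b // 2]), pca_view(sample[:, :, b // 2:])

sample = np.random.rand(28, 28, 200)           # the 28 x 28 x 200 example
v1, v2 = build_views(sample)
print(v1.shape, v2.shape)                      # (28, 28, 3) (28, 28, 3)
```

Each view thus keeps the full spatial neighborhood while compressing its half of the spectrum to 3 components.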
Step 4, training the deep residual network model with the multi-view data of all unlabeled samples in the training sample set;
the deep residual network model constructed in this embodiment is composed of a network f (-) including 49 convolutional layers and a network g (-) including 2 fully connected layers, where the network f (-) adopts a Resnet50 model with a de-classified layer as a basic structure, specifically see fig. 3, in fig. 3, Zero PAD denotes a 0-padding operation on the image periphery, CONV denotes a two-dimensional convolutional layer, batatchm denotes a batch normalization layer, Re L U denotes a Re L U activation function, MaxPool denotes a maximum pooling layer, AVGPool denotes a global maximum pooling layer, CONVBlock denotes a residual Block shown in fig. 4, Block 3 denotes repeating the residual Block 3 times, it can be seen that the network f (-) includes 16 standard residual blocks, the structure of each residual Block is shown in fig. 4, CONV2 34 denotes a two-dimensional convolutional layer, batchnom denotes a batch layer, Re 4656 denotes a Re 29U activation function, shcut denotes a network g +3 +.
In this embodiment, after the parameters of network f(·) are set according to table 1, the 2048-dimensional output vector h of f(·) is fed into network g(·), which follows f(·); by setting the numbers of input and output nodes of g(·), the dimensionality of h can be reduced so that the resulting feature vector z is low-dimensional, the dimensionality of z being set according to actual needs.
TABLE 1 Parameter settings for network f(·) (table provided as an image in the original publication)
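The role of the two fully connected layers g(·) can be illustrated with a minimal numpy sketch. The hidden width 512 and output dimension 128 are assumed values for illustration; the patent only states that g(·) reduces the 2048-dimensional vector h to a low-dimensional z.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical weights for the two fully connected layers of g(.);
# the sizes (2048 -> 512 -> 128) are illustrative choices, not
# values given in the original description.
W1, b1 = rng.normal(0, 0.01, (2048, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.01, (512, 128)), np.zeros(128)

def g(h):
    """Project the 2048-d backbone output h to a low-dimensional z."""
    return relu(h @ W1 + b1) @ W2 + b2

h = rng.normal(size=2048)      # stands in for the ResNet-50 feature f(x)
z = g(h)
print(z.shape)                 # (128,)
```

In training, the contrastive loss would be computed on z, not on h.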
The pseudocode of the deep residual network model training process in this embodiment is shown in table 2:
TABLE 2 Pseudocode of the deep residual network model training procedure (table provided as an image in the original publication)
The deep residual network model is trained with the multi-view data of all unlabeled samples in the training sample set using a mini-batch training strategy over multiple rounds: in each round, N unlabeled samples (N ≥ 1) are randomly selected from the training sample set for training. After deep multi-view learning, the N unlabeled samples yield 2N views; among these 2N views, two views from the same unlabeled sample are called a positive view pair, and two views from different unlabeled samples are called a negative view pair.
For the two views of the same unlabeled sample, two vectors z_i and z_j are produced by the deep residual network model, and the contrastive loss ζ_{i,j} between the two vectors can be calculated with a loss function. In each batch training round, the total contrastive loss ζ over all positive view pairs is calculated, and whether ζ meets the set requirement (set according to actual needs) is judged; if not, the deep residual network model is optimized, and training of the model is completed when the total contrastive loss ζ meets the set requirement. In this way, under unsupervised conditions, the feature information shared by different views is learned from the similarity between views.
In this embodiment, the cosine similarity function shown in formula (1) is used as the loss function, where formula (1) is:
ζ_{i,j} = −log [ exp(sim(z_i, z_j)) / Σ_{k=1}^{2N} 1_{[k≠i]} exp(sim(z_i, z_k)) ]    (1)

where ζ_{i,j} denotes the contrastive loss between vector z_i and vector z_j; sim(z_i, z_j) = z_i^T z_j / (‖z_i‖ · ‖z_j‖) is the cosine similarity between z_i and z_j, and the closer it is to 1, the more similar the two vectors are; z_i^T denotes the transpose of z_i, and ‖z_i‖, ‖z_j‖ denote the norms of z_i and z_j respectively; 1_{[k≠i]} is the indicator function, taking the value 1 when k ≠ i; i, j, and k are indices; and N denotes the number of unlabeled samples.
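A small numerical sketch of this loss follows. It is an illustration under assumptions: it implements formula (1) as written, with no temperature parameter, and assumes the two views of sample t occupy rows 2t and 2t+1 of the batch, which is a layout choice of mine.

```python
import numpy as np

def cos_sim(zi, zj):
    """Cosine similarity z_i^T z_j / (||z_i|| ||z_j||)."""
    return float(zi @ zj / (np.linalg.norm(zi) * np.linalg.norm(zj)))

def pair_loss(Z, i, j):
    """Contrastive loss zeta_{i,j} over the 2N view vectors in Z.

    Z has shape (2N, d); the denominator sums exp(similarity) over
    every view k != i, matching the indicator function in formula (1).
    """
    num = np.exp(cos_sim(Z[i], Z[j]))
    den = sum(np.exp(cos_sim(Z[i], Z[k])) for k in range(len(Z)) if k != i)
    return -np.log(num / den)

def batch_loss(Z):
    """Average loss over all positive pairs, counting both orderings
    of each pair; views (2t, 2t+1) are assumed to come from sample t."""
    N = len(Z) // 2
    total = sum(pair_loss(Z, 2 * t, 2 * t + 1) + pair_loss(Z, 2 * t + 1, 2 * t)
                for t in range(N))
    return total / (2 * N)

rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 16))   # N = 4 samples, 2 views each, 16-d vectors
print(batch_loss(Z) > 0)       # each term is -log of a ratio < 1
```

Minimizing this pulls each positive pair's similarity up relative to all other views in the batch.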
Since the multi-view data of the same unlabeled sample is input into the deep residual network model and each view yields one vector, when the number of views constructed for a sample is 3 or more, the number of vectors obtained for that sample is also 3 or more. In that case, the contrastive loss between all vectors of the sample is calculated as follows: the vectors obtained for the sample are first combined pairwise, the contrastive loss of each pairwise combination is calculated with formula (1), and the contrastive losses of all the different combinations are summed to give the contrastive loss between all vectors of the sample. For example, if an unlabeled sample has 3 views, it yields 3 vectors a, b, and c; combining them pairwise, vectors a and b give a contrastive loss ζ_{a,b}, vectors a and c give ζ_{a,c}, and vectors b and c give ζ_{b,c}, so the contrastive loss between all vectors of the sample is ζ_{a,b} + ζ_{a,c} + ζ_{b,c}.
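The pairwise combination of three or more view vectors can be sketched as follows. This is illustrative only; for simplicity the batch here is assumed to contain just this one sample's vectors, so the denominator of formula (1) runs over those vectors alone.

```python
import numpy as np
from itertools import combinations

def cos_sim(zi, zj):
    return float(zi @ zj / (np.linalg.norm(zi) * np.linalg.norm(zj)))

def pair_loss(Z, i, j):
    """zeta_{i,j} per formula (1), restricted here to the vectors in Z."""
    num = np.exp(cos_sim(Z[i], Z[j]))
    den = sum(np.exp(cos_sim(Z[i], Z[k])) for k in range(len(Z)) if k != i)
    return -np.log(num / den)

def sample_loss(Z):
    """Sum zeta over every pairwise combination of one sample's vectors,
    e.g. zeta_{a,b} + zeta_{a,c} + zeta_{b,c} for three views."""
    return sum(pair_loss(Z, i, j) for i, j in combinations(range(len(Z)), 2))

rng = np.random.default_rng(2)
views = rng.normal(size=(3, 16))        # vectors a, b, c from three views
print(sample_loss(views) > 0)
```

With 3 vectors, `combinations` enumerates exactly the three pairs named in the text.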
As other embodiments, other forms of cosine similarity function may also be used to calculate the contrastive loss between two vectors, as long as the function is constructed from the cosine similarity between vectors; in addition, a prior-art cross-entropy loss function may be used to calculate the contrastive loss between two vectors.
The ResNet-50 model adopted in this embodiment has a deep network structure; as another embodiment, network f(·) may also be constructed with another deep residual network model having a deep structure, for example the Resnet100 model.
In other embodiments, to further enhance the training effect and the robustness of the model, sample data augmentation can be performed when training the deep residual network model using two data augmentation methods: random cropping and random Gaussian blurring.
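A possible numpy sketch of these two augmentations is given below. It rests on assumptions for illustration: the crop size, the 5-tap blur kernel, and the nearest-neighbor resize back to the original spatial size are my choices, none of which are specified in the original description.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_crop(view, out_size):
    """Crop a random out_size x out_size window, then resize back by
    nearest-neighbor so the network input shape is unchanged."""
    m = view.shape[0]
    r0, c0 = rng.integers(0, m - out_size + 1, size=2)
    crop = view[r0:r0 + out_size, c0:c0 + out_size, :]
    idx = np.arange(m) * out_size // m          # nearest-neighbor indices
    return crop[idx][:, idx, :]

def gaussian_blur(view, sigma):
    """Separable Gaussian blur applied per channel with a 5-tap kernel."""
    x = np.arange(-2, 3)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.empty_like(view)
    for c in range(view.shape[2]):
        rows = np.apply_along_axis(
            lambda r: np.convolve(r, k, mode="same"), 1, view[:, :, c])
        out[:, :, c] = np.apply_along_axis(
            lambda r: np.convolve(r, k, mode="same"), 0, rows)
    return out

v = rng.random((28, 28, 3))                     # one 28 x 28 x 3 view
aug = gaussian_blur(random_crop(v, 24), sigma=rng.uniform(0.1, 2.0))
print(aug.shape)                                # (28, 28, 3)
```

Both operations preserve the view's shape, so augmented views can be fed to the model unchanged.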
Step 5, constructing the multi-view data of each sample in the set of samples to be classified, inputting one view of each sample into the trained deep residual network model to obtain the feature vector of the corresponding sample, and thus obtaining the feature vectors of all samples in the set to be classified;
the method for constructing the multi-view data of each sample in the to-be-classified sample set is similar to the method for constructing the multi-view data without the labeled sample in the step 3, and is not repeated, and because the optimization goal of the depth residual error network model in the step 4 is to enable the feature vectors output by the multi-view data from the same sample to be consistent, any one view of the to-be-classified sample can be input into the trained depth residual error network model to extract the feature vector of the to-be-classified sample; as another embodiment, the dimension reduction can be performed on each sample in the sample set to be classified, and each sample after the dimension reduction is input into the trained deep residual error network model to obtain the feature vector of the corresponding sample, so as to obtain the feature vectors of all samples in the sample set to be classified; or directly inputting each sample in the sample set to be classified into the trained deep residual error network model to obtain the feature vector of the corresponding sample, and further obtaining the feature vectors of all samples in the sample set to be classified.
Step 6, inputting the feature vectors of all samples in the set to be classified into a pre-trained classification model to complete the hyperspectral image classification.
The pre-trained classification model can be a classification model trained by a traditional machine learning method, such as a support vector machine or random forest classification model, or a classification model trained by a deep learning method, such as a convolutional neural network classification model.
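For illustration, a deliberately simple nearest-centroid classifier is used below as a stand-in for the classification model. This is an assumption: the patent names support vector machine, random forest, and convolutional neural network models, which in practice would be trained on the same feature vectors (e.g. with scikit-learn's SVC or RandomForestClassifier).

```python
import numpy as np

class NearestCentroid:
    """Minimal stand-in classifier over extracted feature vectors;
    not one of the models named in the text, just an illustration
    of the fit/predict step."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from each feature vector to each class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(4)
# Toy feature vectors for two ground-cover classes, 5 labeled samples each.
X_train = np.vstack([rng.normal(0, 0.1, (5, 16)), rng.normal(3, 0.1, (5, 16))])
y_train = np.array([0] * 5 + [1] * 5)
clf = NearestCentroid().fit(X_train, y_train)
print(clf.predict(np.array([np.full(16, 3.0)])))   # -> [1]
```

The same fit/predict interface applies if the stand-in is swapped for an SVM or random forest.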
The method of this embodiment has the following advantages. (1) Multi-view data is constructed from the three-dimensional data cube by principal component analysis, making full use of the spatial-spectral joint information in the hyperspectral image. (2) Two different views are constructed for each unlabeled sample in the training sample set, and the deep residual network model (hereinafter, the model) is trained with the multi-view data of all unlabeled samples. On the one hand, this fully exploits the abundance of unlabeled samples in the hyperspectral image, mining its deep feature information from a large number of unlabeled samples so that the trained model can extract the deep features of the image; on the other hand, because training is completed only when the total contrastive loss meets the set requirement, the trained model deeply mines the consistency of the multi-view data, making the feature vectors output for different views of the same sample consistent and thus ensuring that the extracted feature vectors are more representative, discriminative, and robust. Classifying the hyperspectral image with the feature vectors extracted by the trained model therefore effectively improves classification accuracy, especially when the number of labeled training samples is small (that is, under small-sample conditions). (3) The data augmentation used in training further improves the generalization ability of the network model.
The simulation conditions in this embodiment are an Intel Core i7-5700HQ 2.7 GHz central processor, a GeForce GTX 970M graphics processor, and 32 GB of memory. On the Salinas dataset, 5 labeled samples of each class of ground object are randomly selected as training samples and the remaining samples serve as test samples. Experiments are carried out with extended morphological attribute profile features plus a support vector machine (EMP+SVM), a transductive support vector machine (TSVM), a three-dimensional convolutional autoencoder (3DCAE), a generative adversarial network (GAN), deep few-shot learning plus a support vector machine (DFSL+SVM), a 50-layer residual network model (Resnet50), and the classification method of this embodiment. The experimental results are shown in Table 3 and Fig. 5, where DMVL+SVM denotes the classification method of this embodiment implemented with an SVM classification model and DMVL+RF denotes the classification method of this embodiment implemented with a random forest classification model.
TABLE 3 Classification results of various methods on Salinas dataset
[Table 3 appears as an image in the original publication; its per-class figures are not reproduced here.]
In Table 3, OA denotes the overall classification accuracy, AA the average per-class classification accuracy, and k the kappa coefficient. Comparing the OA, AA, and k values of the methods shows that, relative to the other methods, the classification method of this embodiment greatly improves the classification accuracy of the hyperspectral image in the small-sample case (5 labeled samples per class).
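The three metrics reported in Table 3 follow their standard definitions; the sketch below (not code from the patent) computes them from a confusion matrix.

```python
import numpy as np

def oa_aa_kappa(conf):
    """Overall accuracy (OA), average per-class accuracy (AA), and Cohen's
    kappa from a square confusion matrix (rows = reference, cols = prediction)."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    oa = np.trace(conf) / n
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))
    pe = np.sum(conf.sum(axis=0) * conf.sum(axis=1)) / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

conf = [[50, 0], [10, 40]]     # toy 2-class confusion matrix
oa, aa, k = oa_aa_kappa(conf)
print(round(oa, 2), round(aa, 2), round(k, 2))   # 0.9 0.9 0.8
```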
Device embodiment
As shown in fig. 6, the hyperspectral image classification apparatus based on depth multi-view learning of this embodiment comprises a processor and a memory, the memory storing a computer program operable on the processor, and the processor implementing the method of the foregoing method embodiment when executing the computer program.
That is, the method in the above method embodiment should be understood as a flow of the hyperspectral image classification method based on depth multi-view learning that can be implemented by computer program instructions. These computer program instructions may be provided to a processor, such that executing the instructions on the processor implements the functions specified in the method flow described above.
The processor referred to in this embodiment is a processing device such as a microcontroller unit (MCU) or a programmable logic device such as an FPGA.
The memory referred to in this embodiment includes physical devices for storing information; in general, information is digitized and then stored in a medium electrically, magnetically, optically, or by similar means. For example: memories that store information electrically, such as RAM and ROM; memories that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic-core memories, bubble memories, and USB flash drives; and memories that store information optically, such as CDs and DVDs. Of course, there are other kinds of memory as well, such as quantum memories and graphene memories.
The apparatus comprising the memory, the processor, and the computer program is realized by the processor executing the corresponding program instructions in the computer, and the processor can run various operating systems, such as Windows, Linux, Android, and iOS.
In other embodiments, the apparatus can also comprise a display for presenting the classification results to operators.

Claims (8)

1. A hyperspectral image classification method based on depth multi-view learning, characterized by comprising the following steps:
(1) inputting a hyperspectral image;
(2) extracting a set number of unlabeled samples from the hyperspectral image to form a training sample set, the remaining samples forming a set of samples to be classified, wherein each sample is an m × m × b data cube selected with a pixel to be processed in the hyperspectral image as its center, m being the spatial neighborhood size and b the number of spectral bands;
(3) dividing all spectral bands of each unlabeled sample in the training sample set into at least two groups and processing each group of spectral bands to obtain one view, thereby constructing at least two different views of the unlabeled sample and obtaining multi-view data of the unlabeled sample, and further the multi-view data of all unlabeled samples in the training sample set;
(4) training a deep residual network model with the multi-view data of all unlabeled samples in the training sample set, the training being performed over multiple iterations, N unlabeled samples (N ≥ 1) being drawn from the training sample set in each iteration, and each iteration proceeding as follows: inputting the multi-view data of each unlabeled sample into the deep residual network model, each view yielding one vector, so that each unlabeled sample yields at least two vectors; computing the contrastive losses between all vectors of each unlabeled sample with a loss function, and from them the total contrastive loss of all unlabeled samples in the iteration; and judging whether the total contrastive loss meets a set requirement, and if not, optimizing the deep residual network model until the total contrastive loss meets the set requirement, whereupon training of the deep residual network model is complete;
(5) constructing multi-view data for each sample in the set of samples to be classified and inputting one view of each sample into the trained deep residual network model to obtain the feature vector of that sample, thereby obtaining the feature vectors of all samples in the set;
or first reducing the dimensionality of each sample in the set of samples to be classified and inputting each reduced sample into the trained deep residual network model to obtain its feature vector, thereby obtaining the feature vectors of all samples in the set;
or directly inputting each sample in the set of samples to be classified into the trained deep residual network model to obtain its feature vector, thereby obtaining the feature vectors of all samples in the set;
(6) inputting the feature vectors of all samples in the set of samples to be classified into a pre-trained classification model to complete classification of the hyperspectral image.
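Steps (5) and (6) of claim 1 can be sketched as below. This is a hedged illustration only: the linear `encode` function is a stand-in for the trained deep residual network (whose weights are not given here), the data are synthetic, and scikit-learn's `SVC` is used as one of the classifier choices named in claim 7.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))         # hypothetical frozen encoder weights

def encode(sample):
    """Stand-in for the trained deep residual network of step (5):
    maps a flattened sample to a 16-dimensional feature vector."""
    return sample.reshape(-1) @ W

# toy samples with known labels standing in for the labeled training data
# that the pre-trained classification model of step (6) would have seen
X = rng.standard_normal((40, 64))
y = np.array([0] * 20 + [1] * 20)
X[y == 1] += 4.0                          # make the two toy classes separable

feats = np.stack([encode(x) for x in X])  # step (5): feature extraction
clf = SVC(kernel="rbf").fit(feats, y)     # step (6): classification model
print(clf.score(feats, y))                # training accuracy on the toy data
```

The point of the split is that the encoder is trained once, without labels, and any conventional classifier can then be fitted cheaply on the extracted feature vectors.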
2. The method for classifying hyperspectral images based on depth multi-view learning according to claim 1, wherein the loss function is a cosine similarity function constructed according to cosine similarity between vectors.
3. The hyperspectral image classification method based on depth multi-view learning according to claim 2, characterized in that the cosine similarity function is:

$$\zeta_{i,j} = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)\big)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\big(\mathrm{sim}(z_i, z_k)\big)}, \qquad \mathrm{sim}(z_i, z_j) = \frac{z_i^{\mathrm{T}} z_j}{\lVert z_i \rVert \, \lVert z_j \rVert}$$

in the formula, $\zeta_{i,j}$ denotes the contrastive loss between vector $z_i$ and vector $z_j$; $\mathrm{sim}(z_i, z_j)$ is the cosine similarity between $z_i$ and $z_j$, and the closer it is to 1, the more similar the two vectors are; $z_i^{\mathrm{T}}$ denotes the transpose of $z_i$; $\lVert z_i \rVert$ and $\lVert z_j \rVert$ denote the moduli of $z_i$ and $z_j$, respectively; $\mathbb{1}_{[k \neq i]}$ is the indicator function, equal to 1 when $k \neq i$; $i$, $j$, and $k$ are indices; and $N$ denotes the number of unlabeled samples.
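A numerical sketch of such a pairwise cosine contrastive loss follows, under the assumption that it takes the SimCLR-style form in which cosine similarities are exponentiated and the indicator 1[k≠i] excludes the anchor from the denominator over the batch's view vectors; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def contrastive_loss(Z, i, j):
    """Z: (2N, d) array of view vectors; returns the loss zeta_{i,j},
    which is small when z_i and z_j are more similar than z_i is to the
    remaining vectors in the batch."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # unit-normalize rows
    sims = Zn @ Zn[i]                                  # cosine sim of z_i to every z_k
    exp = np.exp(sims)
    denom = exp.sum() - exp[i]                         # indicator 1[k != i]
    return -np.log(exp[j] / denom)

# two samples, two views each: rows 0/1 agree, rows 2/3 agree
Z = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, -0.1]])
print(float(contrastive_loss(Z, 0, 1)))  # low: views of the same sample
print(float(contrastive_loss(Z, 0, 2)))  # high: views of different samples
```

Minimizing this loss over all matching view pairs is what pushes the network toward producing consistent feature vectors for the views of one sample.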
4. The method for classifying hyperspectral images based on depth multi-view learning according to any one of claims 1 to 3, wherein the process of processing a group of spectral bands to obtain a view comprises: performing principal component analysis on the group of spectral bands and taking the first M principal components of the group as one view, M ≥ 1.
5. The method for classifying hyperspectral images based on depth multi-view learning according to any one of claims 1 to 3, wherein sample data augmentation is performed by random cropping and random Gaussian blurring when training the deep residual network model.
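The two augmentations named in claim 5 can be sketched as follows; the crop size, blur probability, kernel size, and sigma are assumptions (the patent does not fix them), and the 3-tap separable kernel is a minimal stand-in for a full Gaussian filter.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_crop(sample, out_size):
    """Randomly crop an m x m x c sample to out_size x out_size x c."""
    m = sample.shape[0]
    top = rng.integers(0, m - out_size + 1)
    left = rng.integers(0, m - out_size + 1)
    return sample[top:top + out_size, left:left + out_size, :]

def random_gaussian_blur(sample, p=0.5, sigma=1.0):
    """With probability p, blur spatially with a separable 3-tap Gaussian
    kernel applied along each of the two spatial axes."""
    if rng.random() >= p:
        return sample
    k = np.exp(-0.5 * (np.arange(-1, 2) / sigma) ** 2)
    k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, sample)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)
    return out

x = rng.random((9, 9, 3))                         # toy view of one sample
aug = random_gaussian_blur(random_crop(x, 7))
print(aug.shape)                                  # (7, 7, 3)
```

Because both operations preserve the sample's label-free content while perturbing its appearance, they enlarge the effective training set without requiring any annotation.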
6. The method for classifying hyperspectral images based on depth multi-view learning according to any one of claims 1 to 3, wherein the deep residual network model comprises 49 convolutional layers and 2 fully connected layers, the 49 convolutional layers of the Resnet50 model serving as the 49 convolutional layers of the deep residual network model.
7. The method for classifying hyperspectral images based on depth multi-view learning according to any of claims 1 to 3, wherein the classification model is a support vector machine classification model, a random forest classification model or a convolutional neural network classification model.
8. A hyperspectral image classification apparatus based on depth multi-view learning, characterized in that the apparatus comprises a processor and a memory, wherein the processor executes a computer program stored by the memory to implement the hyperspectral image classification method based on depth multi-view learning according to any one of claims 1 to 7.
CN202010307781.4A 2020-04-17 2020-04-17 Hyperspectral image classification method and device based on depth multi-view learning Active CN111507409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010307781.4A CN111507409B (en) 2020-04-17 2020-04-17 Hyperspectral image classification method and device based on depth multi-view learning


Publications (2)

Publication Number Publication Date
CN111507409A true CN111507409A (en) 2020-08-07
CN111507409B CN111507409B (en) 2022-10-18

Family

ID=71876206


Country Status (1)

Country Link
CN (1) CN111507409B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464891A (en) * 2020-12-14 2021-03-09 湖南大学 Hyperspectral image classification method
CN112749752A (en) * 2021-01-15 2021-05-04 中国人民解放军战略支援部队信息工程大学 Hyperspectral image classification method based on depth transform
CN112948897A (en) * 2021-03-15 2021-06-11 东北农业大学 Webpage tamper-proofing detection method based on combination of DRAE and SVM
CN113191442A (en) * 2021-05-14 2021-07-30 中国石油大学(华东) Mutual-conductance learning hyperspectral image classification method
CN114155397A (en) * 2021-11-29 2022-03-08 中国船舶重工集团公司第七0九研究所 Small sample image classification method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871830A (en) * 2019-03-15 2019-06-11 中国人民解放军国防科技大学 Spatial-spectral fusion hyperspectral image classification method based on three-dimensional depth residual error network
CN110852227A (en) * 2019-11-04 2020-02-28 中国科学院遥感与数字地球研究所 Hyperspectral image deep learning classification method, device, equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871830A (en) * 2019-03-15 2019-06-11 中国人民解放军国防科技大学 Spatial-spectral fusion hyperspectral image classification method based on three-dimensional depth residual error network
CN110852227A (en) * 2019-11-04 2020-02-28 中国科学院遥感与数字地球研究所 Hyperspectral image deep learning classification method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Pengqiang et al.: "Generative Adversarial Networks for Hyperspectral Image Classification", Bulletin of Surveying and Mapping (《测绘通报》) *


Also Published As

Publication number Publication date
CN111507409B (en) 2022-10-18

Similar Documents

Publication Publication Date Title
CN111507409B (en) Hyperspectral image classification method and device based on depth multi-view learning
Fasy et al. Introduction to the R package TDA
Yuan et al. Dual-clustering-based hyperspectral band selection by contextual analysis
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
Litman et al. Learning spectral descriptors for deformable shape correspondence
CN107133496B (en) Gene feature extraction method based on manifold learning and closed-loop deep convolution double-network model
Wang et al. Fully contextual network for hyperspectral scene parsing
Bronstein Spectral descriptors for deformable shapes
CN114429422A (en) Image super-resolution reconstruction method and system based on residual channel attention network
CN105631469A (en) Bird image recognition method by multilayer sparse coding features
CN103390170A (en) Surface feature type texture classification method based on multispectral remote sensing image texture elements
CN111476287A (en) Hyperspectral image small sample classification method and device
Huang et al. Deep neural network for 3D point cloud completion with multistage loss function
CN101794290B (en) Colour three-dimensional model search method based on vision cognitive characteristic
Siméoni et al. Unsupervised object discovery for instance recognition
CN112560949B (en) Hyperspectral classification method based on multilevel statistical feature extraction
CN104268535A (en) Method for extracting features of two-dimensional image
CN115830375A (en) Point cloud classification method and device
CN116386803A (en) Cytopathology report generation method based on graph
Czech Graph descriptors from B-matrix representation
CN111985501B (en) Hyperspectral image feature extraction method based on self-adaptive high-order tensor decomposition
Han et al. Grass leaf identification using dbN wavelet and CILBP
Yang et al. Automatic brain tumor segmentation using cascaded FCN with DenseCRF and K-means
CN103679201B (en) Calibration method of point set matching for image matching, recognition and retrieval
Liu et al. Skin Disease Classification Based on Multi-level Feature Fusion and Attention Mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant