CN112633386A

CN112633386A - SACVAEGAN-based hyperspectral image classification method

Info

Publication number: CN112633386A
Application number: CN202011569729.2A
Authority: CN
Inventors: 陈志涛; 同磊; 禹晶; 肖创柏
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-12-26
Filing date: 2020-12-26
Publication date: 2021-04-09

Abstract

The invention discloses a SACVAEGAN-based hyperspectral image classification method. A latent vector classifier module is added to CVAEGAN to classify the latent vectors corresponding to hyperspectral data, so that the latent vectors are trained cooperatively with the decoder and the sample classifier. This solves the problem that randomly generated latent vectors in a GAN are difficult to match to classes, and further improves accuracy. A self-attention mechanism and spectral normalization are applied to the decoder, the encoder, and the discriminator: the self-attention mechanism enables the network model to better extract the features of hyperspectral data, and spectral normalization improves the stability of the model. In the sample classifier, features are extracted from both the spatial and the spectral perspective and a residual network structure is added, which improves the classification performance of the model.

Description

SACVAEGAN-based hyperspectral image classification method

Technical Field

The invention relates to the field of hyperspectral image classification, in particular to a method for classifying hyperspectral images.

Background

With the continuous development of remote sensing technology, hyperspectral imagery (HSI) has brought significant breakthroughs in the field of earth observation. Unlike traditional three-channel color images, an HSI captures hundreds of spectral bands simultaneously and therefore carries very rich spectral information. As a result, hyperspectral images are widely used in satellite remote sensing, crop observation, mineral exploration, and other fields.

Classification has always been one of the most active topics in hyperspectral data processing. Classifiers for hyperspectral data generally fall into two groups: spectral classifiers and spectral-spatial classifiers. Conventional hyperspectral image classification algorithms include support vector machines (SVM), k-nearest neighbors (KNN), maximum likelihood, neural networks, and logistic regression. However, because the same material may show spectral differences and different materials may have similar spectral characteristics, it is difficult to distinguish classes accurately from spectral information alone. To solve this problem, some scholars have proposed combining spectral information with spatial information to improve classification performance.

Because traditional classification methods rely heavily on experience and manual parameter setting, deep learning methods have been widely applied to hyperspectral image classification in recent years, and convolutional neural networks (CNNs) in particular have received great attention. For example, W. Hu proposed a method that uses a five-layer CNN model to extract the spectral features of HSI and obtains better classification performance. Li proposed a pixel-pair method as a deep spectral classifier, which achieves good results when training data are scarce but performs convolution mainly in the spectral domain and ignores spatial details. Ying Li proposed a method that extracts spectral-spatial features with 3D convolution, takes the various characteristics of hyperspectral data fully into account, and obtains good classification results.

Although deep learning based approaches have made great progress in HSI classification, they still face a problem: too little labeled data. For this reason, some scholars have proposed using generative adversarial networks (GANs) to alleviate the shortage of hyperspectral data. A GAN typically includes a generative model G and a discriminative model D, which are trained adversarially: G tries to generate pseudo-samples that are as realistic as possible from a random vector Z, and D tries to tell real samples from the pseudo-samples generated by G. The two compete until D can no longer identify the fake samples. By properly using the samples generated by the GAN as virtual samples, classification accuracy can be improved and the shortage of hyperspectral training data alleviated.

Disclosure of Invention

The invention aims to solve the above problems and provides a hyperspectral image classification method based on SACVAEGAN (a self-attention-based conditional variational autoencoder generative adversarial network). The invention adopts CVAEGAN (a conditional variational autoencoder generative adversarial network) as the basic structure to address the shortage of hyperspectral training data: the decoder in the CVAEGAN generates virtual hyperspectral samples, which augment the training data and thereby improve classification accuracy. In addition, a latent vector classifier module is added to CVAEGAN to classify the latent vectors corresponding to hyperspectral data, so that the latent vectors are trained cooperatively with the decoder and the sample classifier. This solves the problem that randomly generated latent vectors in a GAN are difficult to match to classes, and improves accuracy. A self-attention mechanism and spectral normalization are applied to the decoder, the encoder, and the discriminator: the self-attention mechanism enables the network model to better extract the features of hyperspectral data, and spectral normalization improves the stability of the model. In the sample classifier, features are extracted from both the spatial and the spectral perspective and a residual network structure is added, which improves the classification performance of the model.

In order to achieve the purpose, the technical scheme and the experimental steps adopted by the invention are as follows:

(1) First, the hyperspectral image data are preprocessed.

(1a) The edges of the original hyperspectral data are zero-padded so that a window of size patchsize × patchsize can be taken centered on each point. For the IndianPines and Salinas datasets patchsize is 28, and for the PaviaU dataset patchsize is 24.

(1b) K points are randomly selected as training labels: 500 points for the IndianPines and PaviaU datasets and 200 points for the Salinas dataset. The remaining points are used as test labels.

(1c) The sample set of the hyperspectral image is obtained. Windows of size patchsize × patchsize centered on the K randomly selected training labels form the training data, and windows of the same size centered on the remaining labels form the test data.
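The random selection in step (1b) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the invention; the function name `split_labeled_pixels`, the fixed seed, and the convention that label 0 marks unlabeled background are assumptions made for the example.

```python
import random

def split_labeled_pixels(labels, k, seed=0):
    """Pick k labeled pixels for training; the other labeled pixels are for testing.

    labels: 2-D list of ints; 0 is assumed to mean "unlabeled background".
    Returns two lists of (row, col) coordinates.
    """
    labeled = [(r, c)
               for r, row in enumerate(labels)
               for c, value in enumerate(row)
               if value != 0]
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    train = rng.sample(labeled, k)     # k distinct training positions
    chosen = set(train)
    test = [rc for rc in labeled if rc not in chosen]
    return train, test

# Toy 3 x 4 label map with 9 labeled pixels and 3 classes.
toy_labels = [
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [3, 3, 0, 2],
]
train_coords, test_coords = split_labeled_pixels(toy_labels, k=4)
```

With K = 500 (IndianPines, PaviaU) or K = 200 (Salinas), the same call would yield the training and test centers around which the patchsize × patchsize windows are cut.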

(2) Building a network model

After data preprocessing, the network model is constructed. The training network model consists of four parts: the conditional variational autoencoder, the discriminator, the sample classifier, and the latent vector classifier.

(2a) The conditional variational autoencoder is divided into an encoder and a decoder (i.e., the generator). The encoder generates the latent vectors corresponding to real hyperspectral data, and the decoder generates the corresponding virtual hyperspectral data from latent vectors. Both combine a self-attention mechanism with spectral normalization.

(2b) The discriminator judges whether the input hyperspectral data are real or generated, and combines a self-attention mechanism with spectral normalization.

(2c) The sample classifier classifies the input hyperspectral data. It is composed of two branches that acquire the spatial features and the spectral features of the hyperspectral image, and it extracts features with a residual network.

(2d) The latent vector classifier classifies the randomly generated latent vectors, and thereby assigns a class to the virtual hyperspectral data that the generator produces from those latent vectors, for use in the subsequent operations.

(3) Training network

Training starts once the data and the model are prepared. The training process is divided into four parts: training the conditional variational autoencoder, the discriminator, the sample classifier, and the latent vector classifier.

(3a) The discriminator is trained first. Three kinds of data are input into the discriminator: real hyperspectral data, virtual hyperspectral data generated by the decoder from the latent vectors produced by the encoder of the conditional variational autoencoder, and virtual hyperspectral data generated from randomly generated latent vectors. The loss function is calculated to optimize the parameters of the discriminator.

(3b) The conditional variational autoencoder is trained in five steps. First, real hyperspectral data are fed into the encoder to generate the corresponding latent vectors, and the corresponding loss function is calculated. Second, the virtual hyperspectral data generated from the encoder's latent vectors and the real hyperspectral data are input into the discriminator, and the corresponding loss function is calculated. Third, the same virtual and real hyperspectral data are fed into the sample classifier, and the corresponding loss function is calculated. Fourth, the randomly generated latent vectors and the corresponding virtual hyperspectral data are input into the latent vector classifier and the sample classifier, respectively, and the corresponding loss functions are calculated. Fifth, the latent vectors generated by the encoder are input into the latent vector classifier, and the corresponding loss function is calculated.

(3c) The latent vector classifier is trained. The latent vectors corresponding to real hyperspectral data are input into the latent vector classifier for classification, and the loss function is calculated.

(3d) The sample classifier is trained in three steps. First, real hyperspectral data are input into the sample classifier to calculate the classification loss function. Second, real and virtual hyperspectral data are each input into the sample classifier to calculate the corresponding loss function. Third, the randomly generated latent vectors are input into the latent vector classifier for classification, the virtual hyperspectral data generated from them are input into the sample classifier for classification, and the corresponding loss function is calculated from the two classification results.

(4) Hyperspectral image classification

After model training is completed, the model is tested. The test results are compared with the ground truth to obtain the classification result, and the accuracy is calculated. The latent vector classifier added in the invention solves the problem of matching randomly generated latent vectors to classes in the GAN model, and experimental results show that the accuracy of the model in classifying hyperspectral data is remarkably improved after the latent vector classifier is added. The self-attention mechanism and spectral normalization added to the network model improve the stability and training speed of the model. Extracting features from both the spatial and the spectral perspective in the sample classifier and adding a residual network structure improve the accuracy of the sample classifier in classifying hyperspectral data.

Drawings

FIG. 1 is a flow chart of the present invention.

Fig. 2 is an overall structural view of the present invention.

Fig. 3 is a block diagram of a sample classifier.

Fig. 4 is a structural diagram of the conditional variational autoencoder, in which the upper part is the encoder and the lower part is the decoder.

Fig. 5 is a structural diagram of the discriminator.

Fig. 6 is a block diagram of a potential vector classifier.

In Fig. 7, (a) is the IndianPines hyperspectral image used in the present invention, (b) is the PaviaU hyperspectral image used in the present invention, and (c) is the Salinas hyperspectral image used in the present invention.

In Fig. 8, (a) is the result of classifying the IndianPines hyperspectral image with SVM, (b) is the result with 2D-CNN, (c) is the result with 3D-CNN, (d) is the result with DCGAN, and (e) is the result with the present invention.

In Fig. 9, (a) is the result of classifying the PaviaU hyperspectral image with SVM, (b) is the result with 2D-CNN, (c) is the result with 3D-CNN, (d) is the result with DCGAN, and (e) is the result with the present invention.

In Fig. 10, (a) is the result of classifying the Salinas hyperspectral image with SVM, (b) is the result with 2D-CNN, (c) is the result with 3D-CNN, (d) is the result with DCGAN, and (e) is the result with the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1, the experimental procedure of the present invention is as follows:

Step 1, data preprocessing

(1) Edge filling and normalization of data

The edges of the data are zero-padded so that a window of size patchsize × patchsize can be taken centered on every available data point. The data are then normalized.

(2) Obtaining a sample set

K data points are randomly selected from the original data, and the patchsize × patchsize data blocks centered on these K points serve as the training set. The patchsize × patchsize data blocks centered on the remaining data points form the test set.
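The padding and windowing of Step 1 can be sketched in plain Python as follows. The text does not state which normalization is used, in which order padding and normalization are applied, or how the window is positioned when patchsize is even (24 or 28), so the min-max scaling, the normalize-then-pad order, and the "center at offset patchsize // 2" convention below are assumptions, as are all function names.

```python
def normalize(cube):
    """Min-max scale every value of an H x W x B cube into [0, 1] (assumed scheme)."""
    flat = [v for row in cube for pixel in row for v in pixel]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0           # avoid dividing by zero for a constant cube
    return [[[(v - lo) / scale for v in pixel] for pixel in row] for row in cube]

def zero_pad(cube, margin):
    """Surround the cube with `margin` rows and columns of all-zero pixels."""
    bands = len(cube[0][0])
    width = len(cube[0]) + 2 * margin

    def zeros(count):
        return [[0.0] * bands for _ in range(count)]

    padded = [zeros(width) for _ in range(margin)]
    for row in cube:
        padded.append(zeros(margin) + [list(pixel) for pixel in row] + zeros(margin))
    padded.extend(zeros(width) for _ in range(margin))
    return padded

def extract_patch(padded, row, col, patch_size):
    """Window for ORIGINAL pixel (row, col); that pixel lands at offset patch_size // 2."""
    return [padded[r][col:col + patch_size] for r in range(row, row + patch_size)]

# Toy 2 x 2 image with a single band, and a 3 x 3 window.
cube = [[[1.0], [2.0]],
        [[3.0], [5.0]]]
patch_size = 3
padded = zero_pad(normalize(cube), patch_size // 2)
patch = extract_patch(padded, 1, 1, patch_size)   # window around original pixel (1, 1)
```

Because the margin is patch_size // 2 on every side, a window can be cut around every pixel, including those on the image border, which is the purpose of the zero padding described above.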

Step 2, constructing a model

The model mainly comprises four parts: the conditional variational autoencoder, the discriminator, the sample classifier, and the latent vector classifier.

The conditional variational autoencoder consists of an encoder and a decoder. The encoder generates the latent vector corresponding to an input sample, and the decoder generates sample data of the corresponding class from an input latent vector. So that the extracted features are put to better use in generating the latent vectors, the encoder uses two branches. One branch processes the data block with 2-D convolution to obtain its spatial features, and the other processes the central pixel of the data block with 1-D convolution to obtain its spectral features. After extraction, the spatial and spectral features are concatenated for the subsequent operations. Both branches consist of convolutional layers, fully connected layers, batch normalization, spectral normalization, Dropout, and a self-attention mechanism. Except for the last convolutional layer, all convolutional layers are spectrally normalized to satisfy the 1-Lipschitz constraint. The activation function of the convolutional layers is ReLU. The structure of the decoder is similar to that of the encoder except that it has a single branch. It consists of deconvolution layers, fully connected layers, batch normalization, spectral normalization, Dropout, and a self-attention mechanism, and again all convolutional layers except the last are spectrally normalized to satisfy the 1-Lipschitz constraint.

Batch normalization normalizes the data of each training mini-batch, which allows a higher learning rate and accelerates convergence. Dropout helps prevent overfitting when training samples are few. Spectral normalization reduces the likelihood of mode collapse in the GAN and improves its stability. The self-attention mechanism allows feature information to be extracted better. After the convolution and self-attention layers, the encoder flattens the features, feeds them into a fully connected layer, and applies a Sigmoid activation. After the deconvolution and self-attention layers, the decoder flattens the features, feeds them into a fully connected layer, and finally applies a Tanh activation.

The main function of the discriminator is to judge whether the input sample data is real or virtual. Its structure consists of convolutional layers, fully connected layers, batch normalization, spectral normalization, Dropout, and a self-attention mechanism. Except for the last convolutional layer, all convolutional layers are spectrally normalized to satisfy the 1-Lipschitz constraint. The activation function of the convolutional layers is LeakyReLU. Batch normalization normalizes the data of each training mini-batch, which allows a higher learning rate and accelerates convergence. Dropout helps prevent overfitting when training samples are few. Spectral normalization reduces the likelihood of mode collapse in the GAN and improves its stability. The self-attention mechanism allows feature information to be extracted better. After the convolutional and self-attention layers, the discriminator flattens the features, feeds them into a fully connected layer, and finally applies a Sigmoid activation.

The main function of the sample classifier is to classify the input sample data. Its structure is divided into two branches: one branch processes the data block with 2-D convolution to obtain its spatial features, and the other processes the central pixel of the data block with 1-D convolution to obtain its spectral features. After extraction, the spatial and spectral features are concatenated for the subsequent operations. Both branches consist of convolutional layers, fully connected layers, batch normalization, Dropout, and residual network structures. The activation function of the convolutional layers is LeakyReLU. Batch normalization normalizes the data of each training mini-batch, which allows a higher learning rate and accelerates convergence. Dropout helps prevent overfitting when training samples are few. The residual network structure allows the network to be made deeper and improves its accuracy. After the convolutional and residual layers, the sample classifier flattens the features, feeds them into a fully connected layer, and finally applies a LogSoftmax activation.

The main role of the latent vector classifier is to classify latent vectors. Its structure is composed of convolutional layers, fully connected layers, and Dropout. The activation function of the convolutional layers is ReLU. Batch normalization normalizes the data of each training mini-batch, which allows a higher learning rate and accelerates convergence. Dropout helps prevent overfitting when training samples are few. In the latent vector classifier, the latent vector is first mapped to a fixed size by a fully connected layer, then reshaped and fed into the convolutional layers, then flattened and fed into a fully connected layer, and finally activated by a LogSoftmax function.

Batch normalization (BN) normalizes the activations of the previous layer over each batch. In other words, it transforms the mean of the previous layer's activations to 0 and their standard deviation to 1. Assume that the batch size is n and that $x = \{x_1, x_2, \dots, x_n\}$ are the activation values produced by the previous layer. Batch normalization is calculated as:

$$\hat{x}_i = \gamma \, \frac{x_i - E[x]}{\sqrt{\mathrm{Var}[x]}} + \beta \tag{1}$$

where $\hat{x}_i$ represents the output for the i-th sample of the batch after batch normalization, $E[x]$ and $\mathrm{Var}[x]$ represent the expectation and variance of $x$, and $\gamma$ and $\beta$ are learned parameters.
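As a small worked example of batch normalization, the sketch below normalizes one scalar activation per sample in plain Python. The function name `batch_norm` and the small constant `eps`, which is commonly added under the square root for numerical safety, are assumptions of this sketch and are not taken from the text.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of activations to zero mean and unit variance, then scale and shift.

    eps is an assumed numerical-stability constant; gamma and beta are the learned
    scale and shift, fixed here to 1 and 0 for illustration.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n   # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
```

Here `out_mean` is 0 up to rounding and `out_var` is very close to 1, which is the property the paragraph above describes.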

The main role of spectral normalization is to make the parameters of each convolution satisfy the 1-Lipschitz constraint, which makes the network more stable and reduces mode collapse. It is implemented by dividing the parameters of each layer by the spectral norm of that layer's parameter matrix.
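The division by the spectral norm can be illustrated with a minimal plain-Python sketch. The spectral norm (largest singular value) is estimated here by power iteration, which is the usual approach but is not stated in the text; the function names and the iteration count are assumptions. In PyTorch the same idea is available as `torch.nn.utils.spectral_norm`, though the text does not say whether that utility was used.

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))

def spectral_norm(weight, iterations=50):
    """Estimate the largest singular value of a 2-D weight matrix by power iteration."""
    weight_t = [list(col) for col in zip(*weight)]
    v = [1.0] * len(weight[0])
    for _ in range(iterations):
        u = matvec(weight, v)
        u_len = l2_norm(u)
        u = [x / u_len for x in u]
        v = matvec(weight_t, u)
        v_len = l2_norm(v)
        v = [x / v_len for x in v]
    return l2_norm(matvec(weight, v))

def spectrally_normalize(weight, iterations=50):
    """Divide every entry by the spectral norm so the largest singular value becomes 1."""
    sigma = spectral_norm(weight, iterations)
    return [[w / sigma for w in row] for row in weight]

weight = [[3.0, 0.0],
          [0.0, 1.0]]
sigma = spectral_norm(weight)               # singular values are 3 and 1
normalized_weight = spectrally_normalize(weight)
```

For a convolution kernel, the weight tensor would first be reshaped into a 2-D matrix before this step; that reshaping is omitted from the sketch.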

The main function of the self-attention mechanism is to improve the quality of the generated images and thereby the classification performance of the classifier. Most GAN-based image generation models are built from convolutional layers. Convolutional layers process information in a local neighborhood, so modeling long-range correlations in an image with convolutional layers alone is computationally inefficient. A self-attention mechanism is therefore introduced, which enables the generator and the discriminator to efficiently model the relationships between widely separated spatial regions.

Suppose that the image features of the previous hidden layer, $x \in R^{C \times N}$, are transformed into two feature spaces f and g in order to calculate the attention, where $f(x) = W_f x$ and $g(x) = W_g x$.

$\beta_{ij}$ represents the degree to which region i participates when the model synthesizes the image content of region j, i.e., the correlation between the two. The output of the attention layer is $o = (o_1, o_2, \dots, o_j, \dots, o_N) \in R^{C \times N}$, where C is the number of channels and N is the number of feature locations in the previous hidden layer.

Finally, the output of the attention layer is multiplied by a scale parameter and added back to the input feature map. The final output is therefore:

$$y_i = \gamma o_i + x_i \tag{4}$$

where $\gamma$ is a learnable scalar initialized to 0, so that the network first relies on local neighborhood information and then gradually learns to assign weight to more distant features.
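The attention computation can be sketched in plain Python as follows. The formulas for the attention weights and the attention output are not reproduced in the text, so this sketch assumes the standard self-attention GAN formulation: scores $s_{ij} = f(x_i)^T g(x_j)$, weights obtained by a softmax over i, and $o_j = \sum_i \beta_{ij} h(x_i)$ with a third projection $h(x) = W_h x$. The projection `w_h`, the omission of a final output projection, and all function names are assumptions.

```python
import math

def project(weight, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in weight]

def self_attention(xs, w_f, w_g, w_h, gamma):
    """xs: list of N feature vectors of length C, one per spatial location.

    Returns y_j = gamma * o_j + x_j for every location j.
    """
    f = [project(w_f, x) for x in xs]
    g = [project(w_g, x) for x in xs]
    h = [project(w_h, x) for x in xs]      # w_h must map back to C channels
    outputs = []
    for j, x_j in enumerate(xs):
        scores = [sum(a * b for a, b in zip(f_i, g[j])) for f_i in f]
        top = max(scores)                  # subtract the max for a stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        beta = [e / total for e in exps]   # beta[i]: weight of location i for location j
        o_j = [sum(beta[i] * h[i][c] for i in range(len(xs)))
               for c in range(len(x_j))]
        outputs.append([gamma * o + x for o, x in zip(o_j, x_j)])
    return outputs

identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # N = 3 locations, C = 2 channels
at_init = self_attention(features, identity, identity, identity, gamma=0.0)
attended = self_attention(features, identity, identity, identity, gamma=1.0)
```

With gamma = 0, as at initialization, the layer returns its input unchanged; as gamma grows during training, each location receives a weighted mixture of the features at all other locations.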

The residual network consists of a series of residual blocks, each of which can be expressed as:

$$x_{l+1} = x_l + F(x_l, W_l) \tag{5}$$

where $x_l$ is the direct (identity) mapping, which passes the data of the previous layer through unchanged, and $F(x_l, W_l)$ is the residual part, generally composed of two or three convolutions. A residual network is easy to optimize, and its depth can be increased to improve accuracy.
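The residual connection can be illustrated with a deliberately small plain-Python sketch. The function `toy_transform` is a hypothetical stand-in for the two or three convolutions of the residual part (an elementwise scale followed by ReLU); it is not the layer stack used in the invention.

```python
def toy_transform(x, weights):
    """Hypothetical residual part F: elementwise scaling followed by ReLU."""
    return [max(0.0, w * v) for w, v in zip(weights, x)]

def residual_block(x, weights, transform=toy_transform):
    """x_{l+1} = x_l + F(x_l, W_l): add the transformed signal to the unchanged input."""
    fx = transform(x, weights)
    return [a + b for a, b in zip(x, fx)]

x_l = [1.0, -2.0, 3.0]
x_next = residual_block(x_l, [0.5, 0.5, 0.0])
identity_out = residual_block(x_l, [0.0, 0.0, 0.0])   # F contributes nothing
```

When the residual part outputs zero, the block reduces to the identity mapping, which is one reason deep stacks of such blocks remain easy to optimize.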

Step 3, training the network

Training the conditional variational autoencoder is divided into four parts: training the encoder, training the decoder against the discriminator, training the decoder against the sample classifier, and training the decoder against the latent vector classifier. The loss function for training the conditional variational autoencoder is:

$$L_G = L_E + L_{GD} + L_{GC} + L_{aux\_real} \tag{6}$$

In these terms, $v$ and $\xi$ are the mean and variance of the latent vector output by the encoder network, $x$ is the real data, and $x'$ is the generated virtual data.

$P_r$ and $P_z$ are the distributions of the real data and of the latent variables, respectively, and $f_D$ and $f_C$ are the features output by the intermediate layers of the discriminator and of the sample classifier, respectively.

In the classification terms, $x$ represents real data or virtual data generated from the latent vectors corresponding to real data, $O_C$ represents the final output of the sample classifier, and $O_{aux}$ represents the output of the latent vector classifier.
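The text describes the encoder as outputting a mean and a variance for the latent vector but does not reproduce the expression of the encoder loss. As a hedged illustration only, the sketch below shows the two operations that a variational autoencoder conventionally builds on these outputs: sampling a latent vector by reparameterization, and the Kullback-Leibler divergence to a standard normal prior. Whether the invention uses exactly this term is an assumption, and the function names are hypothetical.

```python
import math
import random

def sample_latent(mean, variance, rng):
    """Reparameterization: z = mean + sqrt(variance) * noise, with noise ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, variance)]

def kl_to_standard_normal(mean, variance):
    """KL( N(mean, variance) || N(0, 1) ), summed over the latent dimensions."""
    return 0.5 * sum(m * m + v - math.log(v) - 1.0 for m, v in zip(mean, variance))

rng = random.Random(0)
latent_size = 100                                   # size of z stated in the text
z = sample_latent([0.0] * latent_size, [1.0] * latent_size, rng)
kl_prior = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])   # already the prior
kl_shifted = kl_to_standard_normal([1.0], [1.0])           # mean moved by 1
```

The divergence is zero when the encoder output equals the prior and grows as the mean or variance departs from it, which keeps encoder-generated latent vectors compatible with the randomly generated ones used elsewhere in training.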

The training of the discriminator is divided into three parts: the real data, the virtual samples generated from the latent vectors corresponding to the real data, and the virtual samples generated from randomly generated latent vectors are each input into the discriminator to calculate the corresponding loss functions. The loss function for training the discriminator is:

$$L_D = L_{D\_real} + L_{D\_fake} + L_{D\_fake\_random} \tag{11}$$

where $x$ denotes the real data and $P_r$ the distribution of the real data, $z$ denotes the latent vector corresponding to the real data and $P_z$ the distribution of $z$, and $z_{random}$ denotes a randomly generated latent vector and $P_{z\_random}$ the distribution of $z_{random}$.
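The three individual discriminator terms are not written out in the text. Since the discriminator ends in a Sigmoid, one plausible reading is the standard binary cross-entropy GAN loss, which the sketch below assumes; the function name and the toy probabilities are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(d_real, d_fake, d_fake_random):
    """Sum of three cross-entropy terms (assumed form, not taken from the text).

    d_real:        discriminator outputs in (0, 1) for real samples
    d_fake:        outputs for samples decoded from encoder latent vectors
    d_fake_random: outputs for samples decoded from random latent vectors
    """
    loss_real = -mean([math.log(p) for p in d_real])
    loss_fake = -mean([math.log(1.0 - p) for p in d_fake])
    loss_fake_random = -mean([math.log(1.0 - p) for p in d_fake_random])
    return loss_real + loss_fake + loss_fake_random

undecided = discriminator_loss([0.5], [0.5], [0.5])            # cannot tell real from fake
confident = discriminator_loss([0.999], [0.001], [0.001])      # separates them well
```

An undecided discriminator pays log 2 on each term, while a discriminator that separates real from generated samples drives all three terms toward zero.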

The training of the sample classifier is divided into three parts: the real data, the virtual samples generated from the latent vectors corresponding to the real data, and the virtual samples generated from randomly generated latent vectors are each input into the sample classifier to calculate the corresponding loss functions. The loss function for training the sample classifier is:

$$L_C = L_{C\_real} + L_{C\_fake} + L_{C\_fake\_random} \tag{14}$$

where

$$L_{C\_real} = -E[\log P(c \mid x_r)] \tag{15}$$

$$L_{C\_fake} = \lVert f(x_r) - f(x_g) \rVert \tag{16}$$

$$L_{C\_fake\_random} = -E[\log P(c \mid x_{g\_random})] \tag{17}$$

Here $x_r$ represents the real data, $x_g$ represents the virtual data generated by the decoder from the latent vectors corresponding to the real data, and $x_{g\_random}$ represents the virtual data generated by the decoder from randomly generated latent vectors. $c$ represents the class, and $f$ represents the intermediate-layer output of the sample classifier.
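Equations (14) to (17) can be evaluated on toy values as follows. This is a plain-Python sketch: the expectation is taken as a batch mean, the norm in (16) is taken as an L2 norm averaged over sample pairs, and the labels of the randomly generated samples are assumed to come from the latent vector classifier. These choices and all function names are assumptions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax, matching the classifier's LogSoftmax output."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_sum for v in logits]

def nll(log_probs, labels):
    """-E[log P(c | x)], with the expectation taken as a batch mean."""
    return -sum(lp[c] for lp, c in zip(log_probs, labels)) / len(labels)

def feature_distance(feats_real, feats_fake):
    """Mean L2 distance between intermediate features of paired real and generated samples."""
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(fr, fg)))
             for fr, fg in zip(feats_real, feats_fake)]
    return sum(dists) / len(dists)

def classifier_loss(logp_real, labels_real, feats_real, feats_fake,
                    logp_random, labels_random):
    loss_real = nll(logp_real, labels_real)                   # eq. (15)
    loss_fake = feature_distance(feats_real, feats_fake)      # eq. (16)
    loss_fake_random = nll(logp_random, labels_random)        # eq. (17)
    return loss_real + loss_fake + loss_fake_random           # eq. (14)

uniform = log_softmax([0.0, 0.0])            # two classes, no preference
loss = classifier_loss([uniform], [0], [[1.0, 2.0]], [[1.0, 2.0]], [uniform], [1])
gap = feature_distance([[0.0, 0.0]], [[3.0, 4.0]])
```

In the toy call the generated features match the real ones exactly, so only the two classification terms contribute.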

The training of the latent vector classifier has only one part: the latent vectors corresponding to the real data are input into the latent vector classifier for classification, and the corresponding loss function is calculated. The loss function for training the latent vector classifier is:

$$L_{aux} = -E[\log P(c \mid z)] \tag{18}$$

where $z$ is the latent vector corresponding to the real data.

In addition, the classifiers, the discriminator, and the generator all use the Adam optimization algorithm with a batch size of 32 and a weight decay of 0.0005. The learning rate of the generator is 0.0001, that of the discriminator is 0.0002, and that of both classifiers is 0.0001; at iterations 20, 40, 80, and 100 the learning rate is multiplied by 0.7. The number of iterations is 500. For the IndianPines dataset the data block size is 28 × 28 and the training set size is 500. For the PaviaU dataset the data block size is 24 × 24 and the training set size is 500. For the Salinas dataset the data block size is 28 × 28 and the training set size is 200. The latent variable z has a size of 100.
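The step-wise learning rate decay can be written as a short function. The sketch assumes that the decay takes effect from the milestone iteration onward and that "iteration" means one pass over the training set; the function name `learning_rate` is hypothetical. In PyTorch the same schedule is commonly expressed with `torch.optim.lr_scheduler.MultiStepLR`, though the text does not say which scheduler was used.

```python
def learning_rate(base_lr, iteration, milestones=(20, 40, 80, 100), factor=0.7):
    """Learning rate after multiplying by `factor` once per milestone already reached."""
    drops = sum(1 for m in milestones if iteration >= m)
    return base_lr * factor ** drops

generator_lr = [learning_rate(0.0001, it) for it in (0, 19, 20, 40, 80, 100, 499)]
discriminator_final = learning_rate(0.0002, 499)
```

After the last milestone the rate stays at 0.7 to the fourth power, about 0.24, of its initial value for the remaining iterations.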

Step 4, classifying the hyperspectral images

The output of the classifier on the test set is compared with the ground truth to obtain the classification result, and the accuracy is calculated.

Step 5, outputting the classification result image.

Experiments and analyses

1. Conditions of the experiment

The hardware test platform of the invention is an Intel(R) Core(TM) i5-9300H CPU with a base frequency of 2.40 GHz, 16 GB of memory, and a GTX 1660 Ti graphics card. The software platform is the Windows 10 operating system and PyCharm 2019. The programming language is Python, and the network structure is implemented with the PyTorch deep learning framework.

2. Experimental data

The performance evaluation of the present invention mainly uses three datasets: the Indian Pines dataset (Indiana, USA), the Pavia University dataset (Italy), and the Salinas dataset (Salinas Valley, California, USA).

The Indian Pines dataset was collected in 1992 by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in northwestern Indiana. The image has 220 original bands, of which 200 remain after the useless bands are removed. The image contains 16 classes of samples. Table 1 shows the number of samples of each class in the Indian Pines image and the number of each class in the training set and the test set used by the invention on this dataset.

TABLE 1

The Pavia University dataset is part of the hyperspectral data acquired over Pavia, Italy, in 2003 by the German airborne Reflective Optics System Imaging Spectrometer (ROSIS). The spectrometer images 115 contiguous bands in the wavelength range 0.43-0.86 μm, and the spatial resolution of the image is 1.3 m. Twelve of these bands are rejected because of noise, so the image composed of the remaining 103 spectral bands is generally used. The data has a size of 610 × 340 and contains 9 classes of samples. Table 2 shows the number of samples of each class in the Pavia University image and the number of each class in the training set and the test set used by the invention on this dataset.

TABLE 2

The Salinas dataset is an image of the Salinas Valley, California, USA, acquired by the AVIRIS imaging spectrometer. Its spatial resolution is 3.7 m. The image has 224 original bands, of which 204 remain after several invalid bands are removed. The size of the image is 512 × 217, and it contains 16 classes of samples. Table 3 shows the number of samples of each class in the Salinas image and the number of each class in the training set and the test set used by the invention on this dataset.

TABLE 3

3. Performance comparison

The four prior art comparison and classification methods used in the invention are as follows:

(1) the hyperspectral images are classified by using an svm (supported vector machine) based on a Radial Basis Function (RBF) kernel.

(2) The hyperspectral image classification method provided by Rotewara et al in hyperspectral remote sensing image classification based on a deep convolutional neural network. Referred to as convolutional neural network classification method. Firstly, dimensionality reduction is performed on hyperspectral data through PCA (principal Component analysis), and then a 2D convolution is adopted to classify hyperspectral image.

(3) The hyperspectral image classification method proposed by Amina Ben Hamida et al. in "3-D Deep Learning for Remote Sensing Image Classification", referred to as the 3D convolutional network classification method for short. The hyperspectral images are classified using 3D convolution.

(4) The hyperspectral image classification method proposed by Lin Zhu et al. in "Generative Adversarial Networks for Hyperspectral Image Classification", referred to as the generative adversarial network classification method for short. The hyperspectral images are classified by a DCGAN (deep convolutional generative adversarial network).

In the experiment, the following three indexes were used to evaluate the performance of the present invention:

The first evaluation index is the overall accuracy (OA), the proportion of correctly classified samples among all samples; a larger value indicates a better classification result.

The second evaluation index is the average accuracy (AA), the mean of the per-class classification accuracies; a larger value indicates a better classification result.

The third evaluation index is the Kappa coefficient, which is computed from the confusion matrix and measures the agreement between the classification result and the ground truth after correcting for chance agreement; a larger value indicates a better classification result.
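The three indexes above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the evaluation code of the invention: the function names `confusion_matrix` and `classification_scores` are hypothetical, class labels are assumed to be integers starting at 0, and Kappa is computed with the standard Cohen's kappa formula (observed agreement minus chance agreement, divided by one minus chance agreement).

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def classification_scores(cm):
    """Return (OA, AA, Kappa) computed from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # samples per true class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # samples per predicted class
    correct = sum(cm[i][i] for i in range(n))

    # OA: correctly classified samples over all samples.
    oa = correct / total
    # AA: mean of the per-class accuracies; classes without test samples are skipped.
    per_class = [cm[i][i] / row_sums[i] for i in range(n) if row_sums[i] > 0]
    aa = sum(per_class) / len(per_class)
    # Kappa: agreement corrected for the agreement expected by chance (pe).
    # Undefined when pe == 1, i.e. when every sample falls in one class.
    pe = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

For example, with true labels `[0, 0, 0, 0, 1, 1, 2, 2, 2, 2]` and predictions `[0, 0, 0, 1, 1, 1, 2, 2, 0, 2]`, 8 of 10 samples are correct, so OA is 0.8, while AA averages the per-class accuracies 0.75, 1.0 and 0.75.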

Table 4 compares the classification accuracy of the present invention with that of the comparison methods on the Indian Pines hyperspectral image.

Table 5 compares the classification accuracy of the present invention with that of the comparison methods on the Pavia University hyperspectral image of Pavia, Italy.

Table 6 compares the classification accuracy of the present invention with that of the comparison methods on the Salinas Valley hyperspectral image of California, USA.

TABLE 4

TABLE 5

TABLE 6

As can be seen from Tables 4, 5 and 6, the hyperspectral classification method provided by the invention achieves a better classification result than the other methods on the same hyperspectral dataset. The classification performance of the network is about 3%, 9% and 2% better than that of the current best method on the Pavia University dataset, the Indian Pines dataset and the Salinas dataset, respectively.

In addition, Figs. 8, 9 and 10 show the classification maps, whose visual quality is consistent with the results listed in Tables 4, 5 and 6. Visually, the classification maps produced by the method are also better.

TABLE 7

Table 7 compares the training and test times of the present invention with those of SVM, 2D-CNN, 3D-CNN and DCGAN.

As can be seen from Table 7, the training time of the present invention is much longer than that of the other methods because of the complex structure of the model. The test times of the present invention and DCGAN are comparable, and both are longer than those of 2D-CNN and 3D-CNN.

In summary, the invention provides a hyperspectral image classification method based on the SACVAEGAN network structure. A self-attention mechanism is added on the basis of CVAEGAN, so that the network can better learn the features of the hyperspectral image. Other structures of the network are also modified: a residual network is added to the sample classifier, which deepens the network and improves accuracy; two branches that extract spectral features and spatial features are added to the sample classifier to better extract the features of the hyperspectral image; and a potential vector classifier module is added to the network, so that the potential vectors generated by SACVAEGAN are more accurate and a more accurate label can be assigned to each randomly generated potential vector. The experimental results show that the method achieves higher classification accuracy than the prior art.

Claims

1. A SACVAEGAN-based hyperspectral image classification method, characterized by comprising the following steps:

(1) preprocessing the hyperspectral image data;

(2) constructing a network model;

after data preprocessing, a network model is constructed; the network model to be trained consists of five parts, namely the encoder and the decoder of a conditional variational autoencoder, a discriminator, a sample classifier and a potential vector classifier;

(3) training a network;

after the data and the model have been prepared, training is started; the training process is divided into four parts, namely training the conditional variational autoencoder, the discriminator, the sample classifier and the potential vector classifier;

(4) classifying the hyperspectral images;

testing is performed after the model training is finished; the test result is compared with the ground truth to obtain the classification result, and the accuracy is calculated.

2. The SACVAEGAN-based hyperspectral image classification method according to claim 1, characterized in that: in step (1), (1a) the edges of the original hyperspectral data are first zero-padded, so that a window of size patchsize × patchsize centred on every pixel can be obtained, wherein patchsize is 28 for the Indian Pines and Salinas datasets and 24 for the PaviaU dataset;

(1b) K points are randomly selected as training labels, wherein 500 points are used as training labels for the Indian Pines and PaviaU datasets and 200 points for the Salinas dataset, and the remaining points are used as test labels;

(1c) a sample set of the hyperspectral image is obtained: windows of size patchsize × patchsize centred on the K randomly selected training labels are taken as training data, and windows of size patchsize × patchsize centred on the remaining labels are taken as test data.
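Steps (1a) to (1c) can be illustrated by the following NumPy sketch. It is a minimal example under stated assumptions rather than the preprocessing code of the invention: the function names `pad_cube`, `extract_patch` and `build_sample_sets` are hypothetical, the hyperspectral cube is assumed to be stored as a (height, width, bands) array, label 0 is assumed to mark unlabelled pixels, and K is assumed to be smaller than the number of labelled pixels.

```python
import numpy as np


def pad_cube(cube, patch_size):
    """Step (1a): zero-pad the two spatial axes of an (H, W, bands) cube."""
    half = patch_size // 2
    return np.pad(cube, ((half, half), (half, half), (0, 0)), mode="constant")


def extract_patch(padded, row, col, patch_size):
    """Window for original pixel (row, col), taken from the padded cube.

    For an even patch_size (24 or 28) the pixel sits at index
    patch_size // 2 of the window, i.e. just past the geometric centre.
    """
    return padded[row:row + patch_size, col:col + patch_size, :]


def build_sample_sets(cube, labels, patch_size, k, seed=0):
    """Steps (1b)-(1c): K random training patches, remaining patches for testing."""
    padded = pad_cube(cube, patch_size)
    rows, cols = np.nonzero(labels)  # coordinates of labelled pixels (label 0 = unlabelled)
    order = np.random.default_rng(seed).permutation(len(rows))

    def gather(idx):
        patches = np.stack(
            [extract_patch(padded, rows[i], cols[i], patch_size) for i in idx]
        )
        return patches, labels[rows[idx], cols[idx]]

    x_train, y_train = gather(order[:k])
    x_test, y_test = gather(order[k:])
    return x_train, y_train, x_test, y_test
```

On full-size images, materialising every test patch at once can use a large amount of memory; one option is to keep only the test coordinates and call `extract_patch` batch by batch.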

3. The SACVAEGAN-based hyperspectral image classification method according to claim 1, characterized in that: in step (2), (2a) the conditional variational autoencoder is divided into an encoder and a decoder; the encoder is mainly used for generating the potential vectors corresponding to real hyperspectral data, and combines a self-attention mechanism with spectral normalization; the decoder is mainly used for generating the corresponding virtual hyperspectral data from a potential vector, and likewise combines a self-attention mechanism with spectral normalization;

(2b) the discriminator is mainly used for discriminating whether the input hyperspectral data are real or fake, and combines a self-attention mechanism with spectral normalization;

(2c) the sample classifier is mainly used for classifying input hyperspectral data; it consists of two branches that acquire the spatial features and the spectral features of the hyperspectral image respectively, and extracts features in combination with a residual network.
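The spectral normalization referred to in (2a) and (2b) is commonly implemented by dividing a layer's weight by an estimate of its largest singular value obtained through power iteration. The NumPy sketch below shows that general technique only; it is not taken from the invention, and the function name `spectral_normalize`, the iteration count and the flattening of convolution kernels to a 2D matrix are assumptions made for illustration.

```python
import numpy as np


def spectral_normalize(weight, n_iter=50, seed=0):
    """Scale `weight` so that its largest singular value is approximately 1.

    A convolution kernel of shape (out, in, kh, kw) is flattened to an
    (out, in * kh * kw) matrix before the singular value is estimated.
    """
    w = weight.reshape(weight.shape[0], -1)
    u = np.random.default_rng(seed).standard_normal(w.shape[0])
    u /= np.linalg.norm(u)
    # Power iteration: alternately refine the right (v) and left (u)
    # singular vectors belonging to the largest singular value.
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    sigma = u @ w @ v  # estimate of the largest singular value
    return weight / sigma
```

Bounding the largest singular value limits how strongly a layer can amplify its input, which is the usual explanation for the stabilising effect on adversarial training. Deep learning frameworks typically run only one power-iteration step per training update and carry `u` over between updates, whereas this sketch iterates to convergence on a fixed weight.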

4. The SACVAEGAN-based hyperspectral image classification method according to claim 1, characterized in that: in step (3), (3a) the discriminator is trained first; the discriminator is trained on three inputs, namely real hyperspectral data, virtual hyperspectral data generated by the decoder from the potential vectors produced by the encoder of the conditional variational autoencoder, and virtual hyperspectral data generated from randomly generated potential vectors; the loss function is calculated to optimize the parameters of the discriminator;

(3b) the conditional variational autoencoder is trained; the training is divided into five steps: the real hyperspectral data are input into the encoder to generate the corresponding potential vectors, and the corresponding loss function is calculated; the virtual hyperspectral data generated from the potential vectors produced by the encoder and the real hyperspectral data are input into the discriminator, and the corresponding loss function is calculated; the same virtual hyperspectral data and real hyperspectral data are input into the sample classifier, and the corresponding loss function is calculated; the randomly generated potential vectors and the corresponding virtual hyperspectral data are input into the potential vector classifier and the sample classifier respectively, and the corresponding loss functions are calculated; the potential vectors generated by the encoder are input into the potential vector classifier, and the corresponding loss function is calculated;

(3c) the potential vector classifier is trained; the potential vectors corresponding to the real hyperspectral data are input into the potential vector classifier for classification, and the loss function is calculated;

(3d) the sample classifier is trained; the training is divided into three steps: the real hyperspectral data are input into the sample classifier and the classification loss function is calculated; the real hyperspectral data and the virtual hyperspectral data are input into the sample classifier and the corresponding loss function is calculated; and the randomly generated potential vectors are input into the potential vector classifier for classification, the virtual hyperspectral data generated from the randomly generated potential vectors are input into the sample classifier for classification, and the corresponding loss functions are calculated from the classification results.