CN110533077A - Shape adaptive convolution deep neural network method for hyperspectral image classification - Google Patents

Shape adaptive convolution deep neural network method for hyperspectral image classification

Info

Publication number
CN110533077A
CN110533077A (application CN201910709042.5A)
Authority
CN
China
Prior art keywords
convolution
space
spectrum
dimensional
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910709042.5A
Other languages
Chinese (zh)
Other versions
CN110533077B (en)
Inventor
肖亮 (Xiao Liang)
刘启超 (Liu Qichao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN201910709042.5A priority Critical patent/CN110533077B/en
Publication of CN110533077A publication Critical patent/CN110533077A/en
Application granted granted Critical
Publication of CN110533077B publication Critical patent/CN110533077B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a shape adaptive convolution deep neural network method for hyperspectral image classification. The method comprises: adopting a spatial structure information learning branch; adopting a trainable shape adaptive convolution kernel based on a guide map; composing a space-spectrum feature extraction unit from a spectral one-dimensional convolution layer and a spatial two-dimensional convolution layer, each unit having two inputs, namely a feature map and a guide map; stacking multiple space-spectrum feature extraction units into a deep network, with skip-layer connections established between every two feature extraction units; and using a weighted cross entropy as the network loss function. By learning the spatial correlation between adjacent pixels in the space-spectrum data, the invention can adaptively adjust the receptive field shape of the convolution operation according to the explicitly defined spatial structure relationship between pixels, overcomes the inability of fixed rectangular convolution to capture anisotropic features, and exhibits excellent classification and generalization performance on hyperspectral images of different resolutions and scene complexities.

Description

Shape adaptive convolution deep neural network method for hyperspectral image classification
Technical Field
The invention relates to hyperspectral image classification technology, and in particular to a shape adaptive convolution deep neural network method for hyperspectral image classification.
Background
The hyperspectral camera can acquire cube data that integrates image and spectrum ("atlas-in-one") and is rich in material information; it can achieve nanometer (nm) level spectral resolution across the visible to near-infrared, shortwave infrared, and even mid-infrared and thermal infrared bands, records hundreds of continuous, narrow spectral band images, and is widely applied in fields such as military reconnaissance, environmental monitoring, geological exploration, and target detection. The supervised classification of hyperspectral images (HSI) is one of the most important research topics in this field.
Over the past decade, researchers have proposed many supervised classification methods for HSI. From simple statistics-based models to complex methods based on feature representation, HSI classification has become a dedicated research topic in the remote sensing field. General classification methods, such as linear or nonlinear regression (LR or NLR), the Support Vector Machine (SVM), the Extreme Learning Machine (ELM), and multiple kernel learning (MKL), can only roughly partition the spectral data in some high-dimensional space without specifying the discriminative features of the spectrum. To explore the structure of hyperspectral data, methods based on feature representation, such as Sparse Representation (SR), Dictionary Learning (DL), manifold learning, wavelet transforms, Principal Component Analysis (PCA), and Linear Discriminant Analysis (LDA), reveal spectral discriminative features to some extent. However, noise caused by low imaging quality (e.g., low resolution, illumination, shooting angle) and coarse labeling leads to some pixels of different classes having the same or similar spectra. To reduce the influence of such noise, researchers have proposed classification methods that improve the smoothness of the classification map by exploiting the aggregation of pixels in homogeneous regions, i.e., space-spectrum joint classification methods. Methods based on space-spectrum feature extraction and methods based on post-processing are the two common families. In space-spectrum feature extraction, artificial features such as Gabor features, morphological features, and texture features are generally used to represent the spatial structure of the HSI. In addition, post-processing based methods, such as Markov Random Fields (MRF), local classification voting, and relearning, use the prior of local pixel aggregation to correct partially misclassified pixels and ultimately improve classification accuracy.
However, when processing different types of hyperspectral data, classification methods based on artificial features have clear limitations. For example, a method whose parameters are well configured for certain datasets may not perform well on other types of data acquired by different cameras. In other words, most conventional classification methods lack sufficient generalization capability. Fortunately, deep learning methods can learn hierarchical feature representations directly from raw data, which offers another effective solution to this problem. Researchers have conducted extensive studies, and typical deep learning methods, such as Convolutional Neural Networks (CNN), Deep Belief Networks (DBN), and Stacked Autoencoders (SAE), have been applied to HSI classification. Although deep learning has powerful feature learning and representation capabilities, the traditional HSI deep learning classification architecture still has limitations. In particular, conventional CNNs that perform well on 2D data (e.g., images) have difficulty handling 3D data (e.g., HSI) well. This is because the spatial structure information of HSI exists only in local spatial regions rather than the global spatial domain, and the spectrum is the main source of information for distinguishing materials, while spatial information plays only an auxiliary role. For these reasons, many deep learning methods take HSI neighborhood blocks as the input of the algorithm, thereby utilizing both spatial and spectral information. However, the standard CNN for HSI classification has a significant drawback: due to the fixed geometry of the CNN module, the convolution unit samples the input feature map at fixed locations, which introduces interference into pixel-level feature extraction and leads to misclassification of pixels near the boundaries between different materials. The obvious consequence is that the classification map becomes too smooth and loses the detail of many scenes, so classification performance is poor on scenes rich in detail.
Disclosure of Invention
The invention aims to provide a shape adaptive convolution deep neural network method for hyperspectral image classification.
The technical solution for realizing the purpose of the invention is as follows: a shape adaptive convolution deep neural network method for hyperspectral image classification, comprising the following steps:
firstly, a convolutional neural network branch is used to learn the spatial structure information of the hyperspectral image and store it in a guide map;
secondly, a shape adaptive convolution is constructed and anisotropic space-spectrum features are extracted in cooperation with the guide map;
thirdly, a space-spectrum feature extraction unit is composed of a spectral one-dimensional convolution layer and a spatial two-dimensional shape adaptive convolution layer, executing the spectral one-dimensional convolution and the spatial two-dimensional shape adaptive convolution in sequence; each feature extraction unit has two inputs, namely a feature map and a guide map;
fourthly, the deep network is formed by stacking a plurality of space-spectrum feature extraction units layer by layer, with skip-layer connections established between every two feature extraction units, so that the input of each unit is the concatenation of the outputs of all preceding units;
and fifthly, constructing a weighted cross entropy loss function.
Compared with the prior art, the invention has the following remarkable advantages: (1) the spatial structure information of the space-spectrum data is extracted by network learning; (2) the shape adaptive convolution can dynamically adjust the shape of the convolution receptive field according to the distribution of real ground objects, avoiding the misclassification of pixels near edges caused by the fixed geometry of traditional convolution; (3) the feature extraction unit composed of a spectral one-dimensional convolution layer and a spatial two-dimensional shape adaptive convolution layer can effectively extract anisotropic space-spectrum features; (4) the network model is an end-to-end classification model in which all learning modules are trained and inferred jointly without extra supervision or training stages, and it has excellent generalization and classification performance.
Drawings
FIG. 1 is a flow chart of the shape adaptive convolution deep neural network method for hyperspectral image classification according to the invention.
Fig. 2 is a schematic diagram of shape adaptive convolution.
Fig. 3 is a structural diagram of a spatio-spectral feature extraction unit.
FIG. 4 is a graph of results of different methods for classifying a synthetic dataset.
FIG. 5 is a diagram of results of different methods for classifying Indian Pines datasets.
Detailed Description
With reference to fig. 1, a shape adaptive convolution deep neural network method for hyperspectral image classification includes the following steps:
Firstly, a spatial structure information learning branch is adopted: a convolutional neural network branch learns the spatial structure information of the hyperspectral image and stores it in a feature map called the guide map. Denote by $X \in \mathbb{R}^{H \times W \times B}$ and $G \in \mathbb{R}^{H \times W \times N}$ the three-dimensional space-spectrum data input to the network and the guide map, respectively, where H, W, B and N are the height, the width, the number of channels of the three-dimensional space-spectrum data and the number of channels of the guide map, respectively. For each spatial coordinate $p_0 = (x, y)$ on the input space-spectrum data, the guide map is calculated as:
$G_j(p_0) = f\big(W_j \cdot X(p_0) + b_j\big)$
where $X(p_0)$ denotes the pixel at spatial coordinate $p_0$ in the input space-spectrum data, $W_j$ and $b_j$ denote the $j$-th one-dimensional convolution kernel and bias respectively, $G_j$ denotes the $j$-th band of the output guide map, and $f(\cdot)$ denotes the softsign activation function.
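As an illustration, the guide branch amounts to a 1 × 1 spectral convolution followed by softsign. The following NumPy sketch (the helper names are ours; the patent specifies no code) computes a guide map this way:

```python
import numpy as np

def softsign(x):
    """Softsign activation f(x) = x / (1 + |x|)."""
    return x / (1.0 + np.abs(x))

def guide_map(X, W, b):
    """Guide branch as a 1x1 spectral convolution plus softsign.
    X: (H, W, B) space-spectrum cube; W: (N, B) 1-D kernels; b: (N,) biases.
    Returns the guide map G with shape (H, W, N)."""
    G = np.einsum('hwb,nb->hwn', X, W) + b  # W_j . X(p0) + b_j at every pixel
    return softsign(G)
```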
Secondly, a trainable shape adaptive convolution kernel based on the guide map is adopted: a shape adaptive convolution differing from the traditional fixed-position sampling mode is constructed, and anisotropic space-spectrum features are extracted in cooperation with the guide map, as shown in fig. 2. Denote by $\mathcal{R}$ the receptive field of the convolution operation; for a $3 \times 3$ kernel, for example, $\mathcal{R} = \{(-1,-1), (-1,0), \dots, (0,1), (1,1)\}$.
For each spatial coordinate $p_0 = (x, y)$ on the input feature map, ignoring the bias and activation function, the shape adaptive convolution operation is expressed as:

$y_i(p_0) = \sum_{p_n \in \mathcal{R}} S_i(p_0, p_n) \cdot X(p_0 + p_n)$
where $S_i$ denotes the $i$-th deformable convolution kernel, $y_i$ denotes the $i$-th channel of the output feature map, and $G$ is the guide map. The deformable convolution kernel can be separated into the product of two independent kernels:

$S_i(p_0, p_n) = k_i^{iso}(p_n) \cdot k^{anis}(p_0, p_n)$
where $k_i^{iso}$ denotes the isotropic kernel, which is the same as a standard convolution kernel, and $k^{anis}$ denotes the anisotropic kernel, calculated as:

$k^{anis}(p_0, p_n) = \exp\!\left(-\frac{\| G(p_0 + p_n) - G(p_0) \|_2^2}{2\sigma^2}\right)$
where $G$ is the guide map, $\sigma$ is a sensitivity-adjusting parameter, $\| \cdot \|_2$ denotes the L2 norm, and $\exp(\cdot)$ is the exponential function with the natural constant e as its base.
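To make the kernel separation concrete, the sketch below (continuing the NumPy sketch above) evaluates one output pixel of the shape adaptive convolution, assuming the Gaussian form of $k^{anis}$ reconstructed above; border handling and the function names are ours:

```python
def k_anis(G, p0, pn, sigma):
    """Anisotropic weight between the center pixel p0 and the offset pn,
    computed from the guide map G (assumed Gaussian form)."""
    d = G[p0[0] + pn[0], p0[1] + pn[1]] - G[p0[0], p0[1]]
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def shape_adaptive_conv_pixel(X, G, S, p0, sigma):
    """One output value y_i(p0): a standard (isotropic) kernel S of shape
    (k, k, B) modulated pointwise by the guide-dependent anisotropic weight.
    Interior pixels only; bias and activation are omitted as in the text."""
    r = S.shape[0] // 2
    y = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            w = k_anis(G, p0, (dy, dx), sigma)  # shrinks across material edges
            y += w * np.dot(S[dy + r, dx + r], X[p0[0] + dy, p0[1] + dx])
    return y
```

Because the anisotropic weight decays with the guide-map difference, samples lying across a material boundary contribute little, which is how the receptive field shape adapts to the scene.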
Thirdly, a space-spectrum feature extraction unit is composed of a spectral one-dimensional convolution layer and a spatial two-dimensional shape adaptive convolution layer, executing the spectral one-dimensional convolution and the spatial two-dimensional shape adaptive convolution in sequence, as shown in fig. 3. Each feature extraction unit has two inputs, namely a feature map and a guide map. Let the input of the $l$-th hidden layer unit be $I_l$ and its output be $O_l$. Batch normalization is performed first:

$\hat{I}_l = \frac{I_l - E(I_l)}{\sqrt{Var(I_l)}}$
where $E(\cdot)$ and $Var(\cdot)$ denote the mean and variance functions, respectively. The spectral one-dimensional convolution is then performed:

$T_{l|j}(p_0) = f\big(k_{l|j} \cdot \hat{I}_l(p_0) + b_{l|j}\big)$
where $k_{l|j}$ and $b_{l|j}$ denote the $j$-th one-dimensional convolution kernel and bias in the $l$-th feature extraction unit, $T_{l|j}$ denotes the $j$-th channel of the output feature map, and $f(\cdot)$ denotes the softsign activation function. Finally, the spatial two-dimensional shape adaptive convolution is executed, outputting the space-spectrum feature map $O_l$:

$O_{l|j}(p_0) = f\Big(\sum_{p_n \in \mathcal{R}} s_{l|j}(p_n) \cdot k^{anis}(p_0, p_n) \cdot T_l(p_0 + p_n) + b_{l|j}\Big)$
where $s_{l|j}$ and $b_{l|j}$ denote the $j$-th two-dimensional deformable convolution kernel and the corresponding bias in the $l$-th feature extraction unit, $p_n$ enumerates the coordinates of the receptive field $\mathcal{R}$, $G$ is the guide map, $O_{l|j}$ denotes the $j$-th channel of the output space-spectrum feature map, and $f(\cdot)$ denotes the softsign activation function.
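Putting the three operations together, one unit can be sketched as follows (illustrative NumPy only, reusing `softsign` and `k_anis` from above; loops are kept explicit for clarity, and borders and learnable batch-norm parameters are omitted):

```python
def feature_unit(I, G, k1d, b1d, s2d, b2d, sigma):
    """One space-spectrum feature extraction unit:
    batch normalization -> spectral 1-D convolution + softsign
    -> spatial shape adaptive convolution + softsign.
    I: (H, W, B) input features; G: (H, W, N) guide map;
    k1d: (M, B), b1d: (M,)  -- spectral kernels and biases;
    s2d: (J, k, k, M), b2d: (J,)  -- deformable kernels and biases."""
    H, W, _ = I.shape
    I_hat = (I - I.mean()) / np.sqrt(I.var() + 1e-8)          # batch normalization
    T = softsign(np.einsum('hwb,mb->hwm', I_hat, k1d) + b1d)  # spectral 1-D conv
    J, k = s2d.shape[0], s2d.shape[1]
    r = k // 2
    O = np.zeros((H, W, J))
    for y in range(r, H - r):
        for x in range(r, W - r):
            for j in range(J):
                acc = b2d[j]
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        w = k_anis(G, (y, x), (dy, dx), sigma)
                        acc += w * np.dot(s2d[j, dy + r, dx + r], T[y + dy, x + dx])
                O[y, x, j] = softsign(acc)
    return O
```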
Fourthly, the deep network is formed by stacking a plurality of space-spectrum feature extraction units, with skip-layer connections established between every two space-spectrum feature extraction units; that is, the feature extraction units are stacked layer by layer to form the deep network, and the input of each unit is the concatenation of the outputs of all preceding units. Let the input of the $l$-th hidden layer unit be $I_l$ and its output be $O_l$; then $I_l$ is calculated as:
$I_l = [O_1, O_2, \dots, O_{l-1}]$
where $[\cdot]$ denotes concatenating multiple feature maps along the spectral dimension.
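The dense skip-layer wiring then reduces to concatenation along the channel axis. A minimal sketch, assuming each `unit` closure wraps `feature_unit` with its own weights and that the raw cube X feeds the first unit:

```python
def forward_network(X, G, units):
    """Stack units with dense skip connections: the l-th unit consumes the
    concatenation of all earlier outputs, I_l = [O_1, ..., O_{l-1}]."""
    outputs = []
    for l, unit in enumerate(units):
        I_l = X if l == 0 else np.concatenate(outputs, axis=-1)
        outputs.append(unit(I_l, G))
    return np.concatenate(outputs, axis=-1)  # I = [O_1, ..., O_L] for the classifier
```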
Fifthly, the network loss function is a weighted cross entropy; that is, a weighted cross entropy loss function is constructed to alleviate the class imbalance problem. Denote the network input by $X \in \mathbb{R}^{H \times W \times B}$, whose space-spectrum pixels may be divided into $C$ different classes, and the network output by $Y \in \mathbb{R}^{H \times W \times C}$, where H, W, B and C are the height, width, number of channels and number of classes of the three-dimensional space-spectrum data, respectively. The network is formed by stacking $L$ ($1 \leq L$) hidden layer units, the output of the $l$-th ($1 \leq l \leq L$) hidden layer unit being $O_l$; the feature map that the network hidden layers feed to the classification layer is then:
$I = [O_1, O_2, \dots, O_L]$
the transformation from the feature map of the spatio-spectral data to the pixel generic probability data is represented as:
where $[\cdot]$ denotes concatenating multiple feature maps along the spectral dimension, $p_0 = (x, y)$ denotes the spatial coordinate of a pixel in the space-spectrum data, $k_j$ and $b_j$ denote the $j$-th one-dimensional convolution kernel and bias respectively, and $Y_j(p_0)$ denotes the probability that the pixel at position $p_0$ of the hyperspectral image belongs to the $j$-th class. Let $\Omega$ denote the set of spatial coordinates of all training samples in the hyperspectral image, $L(p_t)$ the vectorized (one-hot) label of sample $X(p_t)$, $w_c$ a class weight inversely proportional to the class sample count, and $N_c$ ($1 \leq c \leq C$) the number of training samples of the $c$-th class; the weighted cross entropy loss function is then expressed as:

$\mathcal{L} = -\frac{1}{|\Omega|} \sum_{p_t \in \Omega} \sum_{c=1}^{C} w_c \, L_c(p_t) \log Y_c(p_t)$
where $p_t$ enumerates all coordinates in $\Omega$, $L_c(p_t)$ denotes the $c$-th value of the vectorized label $L(p_t)$, and $Y_c$ denotes the $c$-th channel of the probability map $Y$.
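A sketch of the weighted cross entropy over the training pixels, under our assumption of an inverse-frequency weight $w_c = \sum_i N_i / (C \cdot N_c)$ (one common choice; the patent only states that the weighting alleviates class imbalance):

```python
def weighted_cross_entropy(Y, labels_1hot, coords, N_per_class):
    """Y: (H, W, C) class probability map; labels_1hot: coord -> (C,) one-hot
    label L(p_t); coords: training coordinates Omega; N_per_class: (C,) counts."""
    C = len(N_per_class)
    w = N_per_class.sum() / (C * N_per_class)   # assumed inverse-frequency weights
    loss = 0.0
    for p_t in coords:
        L = labels_1hot[p_t]
        loss -= np.sum(w * L * np.log(Y[p_t[0], p_t[1]] + 1e-12))
    return loss / len(coords)
```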
The method can adaptively adjust the receptive field shape of the convolution and preserve scene details in the classification map, is applicable to the supervised classification of hyperspectral images of different resolutions and scene complexities, and has excellent generalization and classification performance.
By learning the spatial correlation between adjacent pixels in the space-spectrum data, the network can adaptively adjust the receptive field shape of the convolution operation according to the explicitly defined spatial structure relationship between pixels, overcomes the inability of fixed square convolution to capture anisotropic features, and has excellent classification and generalization performance on hyperspectral images of different resolutions and scene complexities.
The effects of the present invention can be further illustrated by the following simulation experiments.
Examples
The hyperspectral image is typical three-dimensional space-spectrum data; the simulation experiments use one synthetic hyperspectral dataset (the synthetic dataset) and one real hyperspectral dataset (Indian Pines). The synthetic dataset contains 162 spectral bands in the wavelength range 0.4-2.5 μm, with an image size of 200 × 200 and 5 classes of ground objects, for a total of 40000 annotated samples. The Indian Pines dataset is a hyperspectral remote sensing image acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site in Indiana, USA. The image contains 220 bands in total, with a spatial resolution of 20 m and an image size of 145 × 145. After removing 20 water vapor absorption and low signal-to-noise-ratio bands (band numbers 104-108, 150-163 and 220), 200 bands remained. The area contains 10366 samples of 16 known land-cover classes. For the synthetic dataset, 1% of the samples of each class were randomly selected as the training set, 1% as the validation set, and the remaining 98% as the test set. For the Indian Pines dataset, 10% of the samples of each class were randomly selected as the training set, 1% as the validation set, and the rest as the test set. Each experiment was repeated 10 times and the results averaged; OA (Overall Accuracy), AA (Average Accuracy) and the Kappa coefficient were used as evaluation indexes. Neither dataset underwent any other pre-processing. The comparison methods include: a 2D convolutional neural network (2D-CNN), a dual-channel convolutional neural network (DC-CNN), a 3D convolutional neural network (3D-CNN), a multi-channel convolutional neural network (MC-CNN), a deep spectral-spatial residual network (SSRN), and a fast dense spectral-spatial convolution network (FDSSC).
The network modules in the experiment include a 1 × 1 convolutional layer (called the guide layer) for generating the guide map and 5 space-spectrum feature extraction units, where: the number of output channels of the guide layer is set to 3; the number of channels of the output feature map of the 1st feature extraction unit is set to 128, and that of the 2nd-5th feature extraction units to 32; in all feature extraction units, the size of the deformable convolution kernel is set to 5 × 5 and the initial value of the sensitivity parameter σ to 1. The network uses the Adam optimizer, with a learning rate of 0.01 for σ and 0.001 for the remaining parameters, the first-moment exponential decay rate β1 set to 0.9, the second-moment exponential decay rate β2 set to 0.999, ε set to 1e-8, and the number of iterations set to 500. The experimental environment: CPU: i7-8700K, GPU: GTX-1080Ti, memory: 32 GB, TensorFlow-1.12.
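For reference, the experimental configuration above can be collected into a single dictionary (the structure and key names are ours, purely illustrative; the patent does not prescribe them):

```python
# Hyperparameters of the experiment, gathered for readability (illustrative only).
config = {
    "guide_channels": 3,                     # 1x1 guide layer output bands
    "unit_channels": [128, 32, 32, 32, 32],  # the 5 feature extraction units
    "kernel_size": 5,                        # deformable kernel, 5x5
    "sigma_init": 1.0,                       # sensitivity parameter sigma
    "optimizer": "Adam",                     # beta1=0.9, beta2=0.999, eps=1e-8
    "lr_sigma": 0.01,                        # learning rate for sigma
    "lr_rest": 0.001,                        # learning rate for all other weights
    "iterations": 500,
}
```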
Table 1 and Table 2 show the classification accuracy of simulation experiments performed on the synthetic data set and the Indian Pines data set by the method of the present invention, respectively.
TABLE 1 results of classification of synthetic datasets by different methods
TABLE 2 results of classification of Indian Pines datasets by different methods
The experimental results show that the method is very effective on the synthetic dataset, and its performance is markedly higher than that of the advanced methods including SSRN and FDSSC. Owing to the inherent defects of 2D convolution, 2D-CNN, DC-CNN, SSRN and FDSSC all exhibit over-smoothing on this dataset, whereas the method effectively retains the original scene details and obtains a better classification result, demonstrating its effectiveness. For the synthetic dataset, the classification maps obtained by the different methods are shown in fig. 4. On the Indian Pines dataset the method still achieves the best classification result among all compared methods. Since the Indian Pines dataset contains more noise, and the training set contains noise as well, the method automatically adjusts the degree to which scene details are retained according to the training set so as to achieve the optimal classification accuracy. For the Indian Pines dataset, the classification maps obtained by the different methods are shown in fig. 5. The results show that the method can effectively learn the structural information of the space-spectrum data, adjust the retention of scene details according to the training samples, and achieve a better classification effect.

Claims (6)

1. A shape adaptive convolution deep neural network method for hyperspectral image classification, characterized by comprising the following steps:
firstly, using a convolutional neural network branch to learn the spatial structure information of the hyperspectral image and storing it in a guide map;
secondly, constructing a shape adaptive convolution and extracting anisotropic space-spectrum features in cooperation with the guide map;
thirdly, composing a space-spectrum feature extraction unit from a spectral one-dimensional convolution layer and a spatial two-dimensional shape adaptive convolution layer, executing the spectral one-dimensional convolution and the spatial two-dimensional shape adaptive convolution in sequence, each feature extraction unit having two inputs, namely a feature map and a guide map;
fourthly, stacking a plurality of space-spectrum feature extraction units layer by layer to form the deep network, with skip-layer connections established between every two feature extraction units, so that the input of each unit is the concatenation of the outputs of all preceding units;
and fifthly, constructing a weighted cross entropy loss function.
2. The shape adaptive convolution deep neural network method for hyperspectral image classification according to claim 1, characterized in that the first step is specifically:
adopting a spatial structure information learning branch, namely learning the spatial structure information of the hyperspectral image with a convolutional neural network branch and storing it in a feature map called the guide map;
denote by $X \in \mathbb{R}^{H \times W \times B}$ and $G \in \mathbb{R}^{H \times W \times N}$ the three-dimensional space-spectrum data input to the network and the guide map respectively, where H, W, B and N are the height, the width, the number of channels of the three-dimensional space-spectrum data and the number of channels of the guide map, respectively; for each spatial coordinate $p_0 = (x, y)$ on the input space-spectrum data, the guide map is calculated as:
$G_j(p_0) = f\big(W_j \cdot X(p_0) + b_j\big)$
where $X(p_0)$ denotes the pixel at spatial coordinate $p_0$ in the input space-spectrum data, $W_j$ and $b_j$ denote the $j$-th one-dimensional convolution kernel and bias respectively, $G_j$ denotes the $j$-th band of the output guide map, and $f(\cdot)$ denotes the softsign activation function.
3. The shape adaptive convolution deep neural network method for hyperspectral image classification according to claim 1, characterized in that the second step is specifically:
adopting a trainable shape adaptive convolution kernel based on the guide map, namely constructing a shape adaptive convolution differing from the traditional fixed-position sampling mode, and extracting anisotropic space-spectrum features in cooperation with the guide map;
denote by $\mathcal{R}$ the receptive field of the convolution operation; for each spatial coordinate $p_0 = (x, y)$ on the input feature map, ignoring the bias and activation function, the shape adaptive convolution operation is expressed as:

$y_i(p_0) = \sum_{p_n \in \mathcal{R}} S_i(p_0, p_n) \cdot X(p_0 + p_n)$
where $S_i$ denotes the $i$-th deformable convolution kernel, $y_i$ denotes the $i$-th channel of the output feature map, and $G$ is the guide map;
the deformable convolution kernel can be separated into the product of two independent kernels:

$S_i(p_0, p_n) = k_i^{iso}(p_n) \cdot k^{anis}(p_0, p_n)$
where $k_i^{iso}$ denotes the isotropic kernel, which is the same as a standard convolution kernel, and $k^{anis}$ denotes the anisotropic kernel, calculated as:

$k^{anis}(p_0, p_n) = \exp\!\left(-\frac{\| G(p_0 + p_n) - G(p_0) \|_2^2}{2\sigma^2}\right)$
where $G$ is the guide map, $\sigma$ is a sensitivity-adjusting parameter, $\| \cdot \|_2$ denotes the L2 norm, and $\exp(\cdot)$ is the exponential function with the natural constant e as its base.
4. The shape adaptive convolution deep neural network method for hyperspectral image classification according to claim 1, characterized in that in the third step, a space-spectrum feature extraction unit is composed of a spectral one-dimensional convolution layer and a spatial two-dimensional shape adaptive convolution layer, the spectral one-dimensional convolution and the spatial two-dimensional shape adaptive convolution being performed in sequence, each feature extraction unit having two inputs, namely a feature map and a guide map, specifically:
let the input of the $l$-th hidden layer unit be $I_l$ and its output be $O_l$; batch normalization is performed first:

$\hat{I}_l = \frac{I_l - E(I_l)}{\sqrt{Var(I_l)}}$
where $E(\cdot)$ and $Var(\cdot)$ denote the mean and variance functions, respectively;
the spectral one-dimensional convolution is then performed:

$T_{l|j}(p_0) = f\big(k_{l|j} \cdot \hat{I}_l(p_0) + b_{l|j}\big)$
where $k_{l|j}$ and $b_{l|j}$ denote the $j$-th one-dimensional convolution kernel and bias in the $l$-th feature extraction unit, $T_{l|j}$ denotes the $j$-th channel of the output feature map, and $f(\cdot)$ denotes the softsign activation function;
finally, the spatial two-dimensional shape adaptive convolution is executed, outputting the space-spectrum feature map $O_l$:

$O_{l|j}(p_0) = f\Big(\sum_{p_n \in \mathcal{R}} s_{l|j}(p_n) \cdot k^{anis}(p_0, p_n) \cdot T_l(p_0 + p_n) + b_{l|j}\Big)$
where $s_{l|j}$ and $b_{l|j}$ denote the $j$-th two-dimensional deformable convolution kernel and the corresponding bias in the $l$-th feature extraction unit, $p_n$ enumerates the coordinates of the receptive field $\mathcal{R}$, $G$ is the guide map, $O_{l|j}$ denotes the $j$-th channel of the output space-spectrum feature map, and $f(\cdot)$ denotes the softsign activation function.
5. The shape adaptive convolution deep neural network method for hyperspectral image classification according to claim 1, characterized in that in the fourth step, the deep network is formed by stacking a plurality of space-spectrum feature extraction units layer by layer, with a skip-layer connection established between every two space-spectrum feature extraction units, so that the input of each unit is the concatenation of the outputs of all preceding units, specifically:
let the input of the $l$-th hidden layer unit be $I_l$ and its output be $O_l$; then $I_l$ is calculated as:
$I_l = [O_1, O_2, \dots, O_{l-1}]$
where $[\cdot]$ denotes concatenating multiple feature maps along the spectral dimension.
6. The shape adaptive convolution deep neural network method for hyperspectral image classification according to claim 1, characterized in that in the fifth step, the network loss function is a weighted cross entropy, namely a weighted cross entropy loss function constructed to alleviate the class imbalance problem, specifically:
denote the network input by $X \in \mathbb{R}^{H \times W \times B}$, whose space-spectrum pixels may be divided into $C$ different classes, and the network output by $Y \in \mathbb{R}^{H \times W \times C}$, where H, W, B and C are the height, width, number of channels and number of classes of the three-dimensional space-spectrum data, respectively; the network is formed by stacking $L$ hidden layer units, the output of the $l$-th hidden layer unit being $O_l$, with $1 \leq l \leq L$; the feature map that the network hidden layers feed to the classification layer is then:
$I = [O_1, O_2, \dots, O_L]$
the transformation from the feature map of the space-spectrum data to the pixel class probabilities is expressed as:

$Y_j(p_0) = \frac{\exp\big(k_j \cdot I(p_0) + b_j\big)}{\sum_{c=1}^{C} \exp\big(k_c \cdot I(p_0) + b_c\big)}$
where $[\cdot]$ denotes concatenating multiple feature maps along the spectral dimension, $p_0 = (x, y)$ denotes the spatial coordinate of a pixel in the space-spectrum data, $k_j$ and $b_j$ denote the $j$-th one-dimensional convolution kernel and bias respectively, and $Y_j(p_0)$ denotes the probability that the pixel at position $p_0$ of the hyperspectral image belongs to the $j$-th class; let $\Omega$ denote the set of spatial coordinates of all training samples in the hyperspectral image, $L(p_t)$ the vectorized (one-hot) label of sample $X(p_t)$, $w_c$ a class weight inversely proportional to the class sample count, and $N_c$ the number of training samples of the $c$-th class, $1 \leq c \leq C$; the weighted cross entropy loss function is then expressed as:

$\mathcal{L} = -\frac{1}{|\Omega|} \sum_{p_t \in \Omega} \sum_{c=1}^{C} w_c \, L_c(p_t) \log Y_c(p_t)$
where $p_t$ enumerates all coordinates in $\Omega$, $L_c(p_t)$ denotes the $c$-th value of the vectorized label $L(p_t)$, and $Y_c$ denotes the $c$-th channel of the probability map $Y$.
CN201910709042.5A 2019-08-01 2019-08-01 Shape adaptive convolution deep neural network method for hyperspectral image classification Active CN110533077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910709042.5A CN110533077B (en) 2019-08-01 2019-08-01 Shape adaptive convolution deep neural network method for hyperspectral image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910709042.5A CN110533077B (en) 2019-08-01 2019-08-01 Shape adaptive convolution deep neural network method for hyperspectral image classification

Publications (2)

Publication Number Publication Date
CN110533077A true CN110533077A (en) 2019-12-03
CN110533077B CN110533077B (en) 2022-09-27

Family

ID=68662064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910709042.5A Active CN110533077B (en) Shape adaptive convolution deep neural network method for hyperspectral image classification

Country Status (1)

Country Link
CN (1) CN110533077B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845381A (en) * 2017-01-16 2017-06-13 西北工业大学 Sky based on binary channels convolutional neural networks composes united hyperspectral image classification method
CN109376753A (en) * 2018-08-31 2019-02-22 南京理工大学 A kind of the three-dimensional space spectrum separation convolution depth network and construction method of dense connection

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144423B (en) * 2019-12-26 2023-05-05 哈尔滨工业大学 Hyperspectral remote sensing data multi-scale spectral feature extraction method based on one-dimensional group convolutional neural network
CN111144423A (en) * 2019-12-26 2020-05-12 哈尔滨工业大学 Hyperspectral remote sensing data multi-scale spectral feature extraction method based on one-dimensional group convolution neural network
CN111612127B (en) * 2020-04-29 2022-09-06 南京理工大学 Multi-direction information propagation convolution neural network construction method for hyperspectral image classification
CN111612127A (en) * 2020-04-29 2020-09-01 南京理工大学 Multi-direction information propagation convolution neural network construction method for hyperspectral image classification
CN113743429A (en) * 2020-05-28 2021-12-03 中国人民解放军战略支援部队信息工程大学 Hyperspectral image classification method and device
CN111667019A (en) * 2020-06-23 2020-09-15 哈尔滨工业大学 Hyperspectral image classification method based on deformable separation convolution
CN111797941A (en) * 2020-07-20 2020-10-20 中国科学院长春光学精密机械与物理研究所 Image classification method and system carrying spectral information and spatial information
CN112990315A (en) * 2021-03-17 2021-06-18 北京大学 3D shape image classification method of equal-variation 3D convolution network based on partial differential operator
CN112990315B (en) * 2021-03-17 2023-10-20 北京大学 3D shape image classification method of constant-variation 3D convolution network based on partial differential operator
CN114186641B (en) * 2021-12-16 2022-08-09 长安大学 Landslide susceptibility evaluation method based on deep learning
CN114186641A (en) * 2021-12-16 2022-03-15 长安大学 Landslide susceptibility evaluation method based on deep learning
CN114638762A (en) * 2022-03-24 2022-06-17 华南理工大学 Modularized hyperspectral image scene self-adaptive panchromatic sharpening method
CN114638762B (en) * 2022-03-24 2024-05-24 华南理工大学 Modularized hyperspectral image scene self-adaptive panchromatic sharpening method
CN116704241A (en) * 2023-05-22 2023-09-05 齐鲁工业大学(山东省科学院) Full-channel 3D convolutional neural network hyperspectral remote sensing image classification method
CN116612356A (en) * 2023-06-02 2023-08-18 北京航空航天大学 Hyperspectral anomaly detection method based on deep learning network
CN116612356B (en) * 2023-06-02 2023-11-03 北京航空航天大学 Hyperspectral anomaly detection method based on deep learning network

Also Published As

Publication number Publication date
CN110533077B (en) 2022-09-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant