CN114821164A - Hyperspectral image classification method based on twin network - Google Patents

Hyperspectral image classification method based on twin network

Info

Publication number
CN114821164A
Authority
CN
China
Prior art keywords
twin
network
convolution
hyperspectral image
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210385028.6A
Other languages
Chinese (zh)
Other versions
CN114821164B (en)
Inventor
许德梅
同磊
段娟
肖创柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202210385028.6A priority Critical patent/CN114821164B/en
Publication of CN114821164A publication Critical patent/CN114821164A/en
Application granted granted Critical
Publication of CN114821164B publication Critical patent/CN114821164B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture


Abstract

The invention discloses a hyperspectral image classification method based on a twin (Siamese) network, which comprises three parts: a twin residual network classification model based on spatial-spectral information; a twin residual feature extraction network based on parallel branches with multi-scale feature fusion; and a twin residual feature extraction network based on spatial and channel attention mechanisms. Multi-scale feature fusion enlarges the receptive field of the feature extraction network, captures global dependencies between pixels, and widens the network to obtain richer nonlinear features, yielding a better classification result. Channel and spatial attention modules are applied to the networks that extract spectral and spatial information from the hyperspectral image; the attention mechanism strengthens important spatial or spectral features and suppresses unnecessary ones, delivering a clear performance improvement at a small additional computational cost.

Description

Hyperspectral image classification method based on twin network
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a hyperspectral image classification method based on a twin network.
Background
A hyperspectral image (HSI) is a three-dimensional image acquired by an airborne or spaceborne hyperspectral imager that samples the spectrum from the visible to the short-wave infrared region. Each pixel typically contains hundreds of contiguous spectral channels and carries rich spatial and spectral information about surface objects. Compared with earlier remote sensing technologies, hyperspectral remote sensing integrates space and spectrum, using a series of bands from visible light to infrared and even thermal infrared as a comprehensive remote sensing technique. It combines the spectrum that identifies a material or ground object with the image that reveals its spatial and geometric relationships, so the spectral characteristics of a ground object are obtained without losing its overall form or its relationship to surrounding objects. Hyperspectral remote sensing is particularly good at identifying weak signals and performing quantitative detection when ground information is faint. Thanks to these characteristics, hyperspectral images can accurately identify and classify surface materials, with applications in military target detection, mineral exploration, agricultural production, and other fields.
Hyperspectral image classification is one of the key tasks in hyperspectral data analysis: it uses the information in the image to accurately identify the different ground objects in the imaged scene. Unlike natural image classification, hyperspectral classification operates on every pixel of the image, assigning each pixel a class label according to its sample features. However, hyperspectral data are large in volume, contain many strongly correlated bands, and suffer from mixed pixels and the difficulty of labeling samples, all of which make classification challenging; hyperspectral image classification has therefore become one of the research hotspots in remote sensing. Approaches fall into traditional methods based on mathematical statistics and methods based on deep learning. Deep learning methods typically require a large number of labeled samples, whereas in hyperspectral image classification only a small amount of labeled data is usually available.
Twin networks (Siamese networks) are similarity-metric models: they learn a metric from data and then use the learned metric to compare and match samples of unknown classes. The two twin branches share one set of parameters and weights; an embedding function maps each input into a target space, and a simple distance function computes similarity. During training, the distance between a pair of same-class samples is driven down while the distance between a pair of different-class samples is driven up. At test time, the twin network predicts a sample's class by computing similarities or distances. The approach therefore suits classification tasks with many classes but few samples per class. Using a twin convolutional neural network for hyperspectral image classification is a form of few-shot learning that can reach high accuracy with very few labeled samples.
Disclosure of Invention
The invention provides a hyperspectral image classification method based on a twin network, aiming to better handle spectral data redundancy and hyperspectral image classification under small-sample conditions. The twin network model mainly comprises three parts:
1) Twin residual network classification model based on spatial-spectral information
The twin residual network classification model uses a residual network as the main feature extraction backbone and constructs the twin branches by extracting spatial and spectral features separately and then fusing them at the feature level. The model consists of two twin branches; each branch contains a two-dimensional residual network for extracting spatial-information features and a one-dimensional residual network for extracting spectral-information features. The spatial geometric information and the spectral information of the hyperspectral image are processed separately; the extracted spectral features and spatial features are concatenated into one long feature vector, which is fed to the classifier to produce the final classification result.
2) Twin residual feature extraction network based on parallel branches with multi-scale feature fusion
The size of the sampled pixel block is designed according to the data distribution characteristics of the hyperspectral dataset, and a multi-scale feature fusion module is adopted. The module consists of four parallel branches built from three convolution kernel sizes, 1 × 1, 3 × 3 and 5 × 5, where the 5 × 5 convolution is replaced by two stacked 3 × 3 convolutions; the branch outputs are finally concatenated along the channel dimension.
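As a minimal sketch of the shape arithmetic behind such a module (sizes chosen for illustration, not taken from the patent): with stride 1 and padding matched to each kernel size, every parallel branch preserves the spatial dimensions, so the branch outputs can be concatenated along the channel axis, and two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field with 2 × 9 = 18 weights per channel instead of 25.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1

# Parallel branches on a 24 x 24 input, each padded so the spatial size
# is preserved: 1x1 (pad 0), 3x3 (pad 1), and 5x5 emulated by two
# stacked 3x3 convolutions (pad 1 each).
size = 24
b1 = conv2d_out(size, 1, padding=0)
b2 = conv2d_out(size, 3, padding=1)
b3 = conv2d_out(conv2d_out(size, 3, padding=1), 3, padding=1)
assert b1 == b2 == b3 == 24  # branches align, so channels can be concatenated
```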
3) Twin residual feature extraction network based on spatial and channel attention mechanisms
A spatial attention mechanism is used when extracting high-level features from the two-dimensional spatial information, and a channel attention mechanism is used when extracting low-level features from the one-dimensional spectral information. The channel attention module models the importance of each feature channel and, depending on the task, enhances or suppresses different spectral bands. The spatial attention module applies convolutional layers to extract attention information across the pixel positions within a channel, capturing the correlations and relative importance of different positions.
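The channel attention idea can be illustrated with a small squeeze-and-excite style sketch (the weight matrices below are random stand-ins for learned layers, and the exact module in the patent may differ): each channel is summarized by global average pooling, passed through a small bottleneck, and turned into a per-channel gate that enhances or suppresses that spectral band.

```python
import numpy as np

def channel_attention(features, reduction=4, seed=0):
    """Squeeze-and-excite style channel attention sketch.

    `features` has shape (C, H, W). The two weight matrices are random
    stand-ins for the learned fully connected layers of a real module."""
    c = features.shape[0]
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c))
    w2 = rng.standard_normal((c, c // reduction))
    squeeze = features.mean(axis=(1, 2))          # global average pool: (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)        # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # per-channel sigmoid gate
    return features * gate[:, None, None]         # reweight each band/channel

x = np.ones((16, 8, 8))
assert channel_attention(x).shape == (16, 8, 8)
```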
Compared with the prior art, the invention has the following beneficial effects. The twin residual network structure based on spatial-spectral separation preserves the original spectral feature information while reducing the number of parameters for spatial feature extraction, improving classification accuracy. Multi-scale feature fusion enlarges the receptive field of the feature extraction network, captures global dependencies between pixels, and widens the network to obtain richer nonlinear features, yielding a better classification result. Channel and spatial attention modules are applied to the networks that extract spectral and spatial information from the hyperspectral image; the attention mechanism strengthens important spatial or spectral features and suppresses unnecessary ones, delivering a clear performance improvement at a small additional computational cost.
Drawings
FIG. 1 is an overall structure diagram of the twin residual network classification model of the present invention
FIG. 2 is a diagram of the multi-scale feature fusion twin residual feature extraction network architecture of the present invention
FIG. 3 is a block diagram of the multi-scale feature fusion module of the present invention
FIG. 4 is a block diagram of the attention mechanism of the present invention
FIG. 5 is a basic flow diagram of the present invention
FIG. 6(a) is a hyperspectral image classification map of the Salinas dataset
FIG. 6(b) is the classification map of the Salinas dataset under the 2-D CNN method
FIG. 6(c) is the classification map of the Salinas dataset under the 3-D SSRN method
FIG. 6(d) is the classification map of the Salinas dataset under the DSCNN method
FIG. 6(e) is the classification map of the Salinas dataset under the DSSRN method of the invention
FIG. 7(a) is a hyperspectral image classification map of the KSC dataset
FIG. 7(b) is the classification map of the KSC dataset under the 2-D CNN method
FIG. 7(c) is the classification map of the KSC dataset under the 3-D SSRN method
FIG. 7(d) is the classification map of the KSC dataset under the DSCNN method
FIG. 7(e) is the classification map of the KSC dataset under the DSSRN method of the invention
Detailed Description
As shown in FIG. 5, the hyperspectral image classification method based on the twin network is realized in four steps: preprocessing the hyperspectral image data, constructing the twin network model, training the twin network model, and applying the twin network model to hyperspectral image classification. For ease of understanding, the specific steps of embodiments of the invention are described below in conjunction with the drawings.
Step one, hyperspectral image data preprocessing
1. Data reading
After the hyperspectral image is read, the size of the pixel neighborhood block is set; for example, the neighborhood block size PatchSize is 7 × 7 or 27 × 27. Principal component analysis (PCA) is then performed, followed by an extended morphological profile (EMP) transformation, reducing the dimensionality of the hyperspectral data and generating a low-dimensional image for spatial-information feature extraction.
In the embodiment of the invention, PatchSize is set to 27 × 27, the number of PCA principal components is set to 4, and the channel dimension becomes 36 after the EMP transformation.
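The neighborhood-block sampling can be sketched as follows (border handling is assumed to be zero-padding, which the text does not specify; `extract_patch` is an illustrative helper, not the patent's code):

```python
import numpy as np

def extract_patch(cube, row, col, patch_size=27):
    """Neighborhood block around a center pixel of an H x W x C cube,
    zero-padding at the image border (an assumed convention)."""
    half = patch_size // 2
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)))
    return padded[row:row + patch_size, col:col + patch_size, :]

cube = np.random.rand(100, 80, 36)   # e.g. 36 channels after PCA + EMP
patch = extract_patch(cube, 0, 0)    # a corner pixel still yields a full block
assert patch.shape == (27, 27, 36)
```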
2. Data sampling
Random sampling is carried out on the nonzero pixels of the hyperspectral image: 6 to 8 points per class are randomly selected as training samples, another 6 to 8 points per class as validation samples, and the remaining pixels serve as the test set.
In the embodiment of the invention, 8 sampling points of each type of the data set are set as training set samples.
3. Sample pairing
Training set samples are paired according to the ground-object class labels of the pixels, with a 1:1 ratio of positive to negative pairs. For each sample, a center pixel vector and a pixel neighborhood block are obtained; the neighborhood block size is a hyperparameter set with reference to the specific dataset.
The embodiment of the invention sets the neighborhood block size to 27 × 27.
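A hypothetical sketch of the pairing step, building an index list with an exact 1:1 positive-to-negative ratio (`make_pairs` is an illustrative helper; the patent does not give its pairing code):

```python
import random

def make_pairs(samples, labels, n_pairs, seed=0):
    """Build a 1:1 balanced list of (i, j, same_class) index pairs.

    `n_pairs` is assumed even so the positive/negative split is exact."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    pairs = []
    while len(pairs) < n_pairs:
        i, j = rng.sample(idx, 2)
        same = 1 if labels[i] == labels[j] else 0
        n_pos = sum(p[2] for p in pairs)
        # accept the pair only if its side (positive or negative) still has room
        if (same and n_pos < n_pairs // 2) or (not same and len(pairs) - n_pos < n_pairs // 2):
            pairs.append((i, j, same))
    return pairs

labels = [0, 0, 1, 1, 2, 2]
pairs = make_pairs(list(range(6)), labels, 4)
assert sum(p[2] for p in pairs) == 2  # two positive and two negative pairs
```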
4. Data encapsulation
The hyperspectral training, validation and test sets are packaged and loaded as datasets. The training-set batch size is a hyperparameter, typically set according to the number of ground-object classes in the dataset; the validation and test batch sizes are set to 1.
The embodiment of the invention sets the training-set batch size to 16.
Step two, constructing a twin network model
1. Twin network structure
As shown in FIG. 1, the twin network model consists of two twin branches with identical neural network structures; each branch contains a two-dimensional feature extraction network based on spatial information and a one-dimensional feature extraction network based on spectral information. The model uses a residual network as the main feature extraction backbone and fuses the separately extracted spatial and spectral features at the feature level.
2. Twin branched network structure
As shown in FIG. 2, each twin branch uses a lightweight convolutional neural network as the basic framework of the residual network, improved by introducing a multi-scale feature fusion module and an attention module. The branch is composed mainly of these layer types: input, convolutional (Conv), pooling, and fully connected layers, which are stacked to construct a complete residual network. As shown in FIG. 3, the residual network consists of a first convolutional layer followed by 3 residual blocks, each using multi-scale feature fusion. As shown in FIG. 4, an attention mechanism assigns weights before the multi-scale feature fusion module, enhancing important spatial-spectral features and suppressing unnecessary ones.
3. Convolutional layer
The input to the two-dimensional residual network for spatial-information feature extraction is a 27 × 27 × 36 image block. The first convolutional layer uses 32 convolution kernels of size 4 × 4, with stride 1 and no edge padding. That is, the first convolutional layer produces 32 feature maps; each neuron in a feature map is connected to a 4 × 4 neighborhood of the input, and the output feature map size is 24 × 24 × 32. Counting each 4 × 4 kernel as 16 weight parameters plus one bias, the layer has (4 × 4 + 1) × 32 = 544 trainable parameters and 24 × 24 × 544 = 313,344 connections. Batch normalization and ReLU activation follow, accelerating convergence.
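The parameter and connection counts above can be checked mechanically. The sketch below reproduces the text's counting convention, which treats each kernel as 4 × 4 weights plus a bias regardless of input depth; counting the 36 input channels as well would give a larger figure, as the last line shows.

```python
def conv_params(kernel, out_channels, in_channels=1):
    """Trainable parameters of a conv layer: weights plus one bias per kernel."""
    return (kernel * kernel * in_channels + 1) * out_channels

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1

# Counting as in the text (kernel depth not multiplied in):
assert conv_params(4, 32) == 544
# Output feature map: 27 -> 24 with a 4x4 kernel, stride 1, no padding.
assert conv_out(27, 4) == 24
# Connections in the same counting style: one per output unit per weight+bias.
assert conv_out(27, 4) ** 2 * conv_params(4, 32) == 313344
# With the 36 EMP channels counted in, the true parameter count is larger:
assert conv_params(4, 32, in_channels=36) == 18464
```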
The first convolutional layer is followed by 3 consecutive residual blocks with 32, 32 and 64 convolution kernels respectively. Each residual block uses the three kernel sizes 1 × 1, 3 × 3 and 5 × 5, with the 5 × 5 convolution replaced by two stacked 3 × 3 convolutions to reduce the parameter count. With stride set to 1 and edge padding set to 0, 1 and 2 for the three kernel sizes respectively, all branches produce features of the same spatial dimensions; after batch normalization and ReLU activation, the three scales of extracted features are concatenated. The result is added to the input of the residual block, and a further ReLU activation yields the new feature map. The 1st residual block outputs 32 feature maps, which enter the next multi-scale residual block; the 2nd residual block outputs 32 feature maps and the processing continues; the 3rd residual block outputs 64 feature maps, finally producing a 1 × 1 × 64 feature map.
The one-dimensional residual network for spectral-information feature extraction has a structure similar to the two-dimensional one, except that it uses one-dimensional convolutions with 16, 16, 16 and 32 kernels per layer. Its input is a 1 × bands spectral vector, where bands is the number of spectral bands of the dataset; for example, bands = 176 for the KSC dataset.
4. Pooling layer
The pooling layer uses a 2 × 2 filter with stride 2 and no edge padding. Each unit in a pooled feature map is connected to a 2 × 2 neighborhood of the corresponding feature map in the last convolutional layer; there are 64 such 2 × 2 feature maps, the output size is 1 × 1 × 64, and the layer has no parameters to learn.
5. Full connection layer
As in a fully convolutional neural network, the invention does not use a fully connected layer to compute the dot product between the input vector and the weight vector; all such computations are replaced with convolutional layers.
Step three, training the twin network model
1. Forward propagation
The twin network model constructed by the invention has two twin branches whose input is a pair of samples; each sample consists of a center pixel vector and the corresponding neighborhood pixel block. The hyperspectral image data are first preprocessed, and random sampling and sample pairing produce balanced positive and negative input pairs. The training set contains 6 to 8 samples per class of the hyperspectral dataset. The two inputs of a pair then enter the two twin branches for spatial and spectral feature extraction: each branch uses a two-dimensional convolutional network to extract the spatial-information features and a one-dimensional convolutional network to extract the spectral-information features, after which the two are fused at the feature level. After the spatial-spectral feature fusion, the output is computed by logistic regression over the L1 distance, as follows.
d_1 = sigmoid( Σ_j α_j · |f(I_1)_j − f(I_2)_j| )    Formula (1)
where d_1 is the output of the twin branch network, I_1 and I_2 are the pair of input samples, f(·) is the embedding computed by a twin branch, α_j are learned weights, and sigmoid is the logistic function.
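The logistic regression over the L1 distance can be sketched as follows (`f1`/`f2` stand for the two branch embeddings and `alpha` for the learned weights; a learned bias term, omitted here, would shift the output away from 0.5 for identical inputs):

```python
import math

def siamese_similarity(f1, f2, alpha):
    """d1 = sigmoid(sum_j alpha_j * |f1_j - f2_j|): logistic regression
    over the component-wise L1 distance of the two branch embeddings."""
    s = sum(a * abs(x - y) for a, x, y in zip(alpha, f1, f2))
    return 1.0 / (1.0 + math.exp(-s))

f1 = [0.2, 0.8, 0.5]
f2 = [0.2, 0.8, 0.5]
# identical embeddings -> zero distance -> sigmoid(0) = 0.5
assert siamese_similarity(f1, f2, [1.0, 1.0, 1.0]) == 0.5
```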
2. Loss function
After the model is built, the twin network is trained using the binary cross-entropy loss function (BCELoss) to compute the binary classifier's error, as follows.
L = −(1/N) · Σ_{i=1}^{N} [ y_i · log(p_i) + (1 − y_i) · log(1 − p_i) ]    Formula (2)
where N is the total number of sample pairs, y_i is the class of the i-th pair, and p_i is the predicted class score for the i-th pair, here a probability value. The loss function drives the output toward the desired classification result: it penalizes the network most when, for example, a pair that is in fact dissimilar is judged by the network to be quite similar.
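Formula (2) in code form (a plain-Python sketch; in practice a framework implementation such as PyTorch's BCELoss would be used):

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy over N sample pairs:
    L = -(1/N) * sum_i [ y_i*log(p_i) + (1-y_i)*log(1-p_i) ]."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n

# A dissimilar pair (y=0) that the network scores as very similar (p=0.99)
# is penalized far more heavily than one scored correctly (p=0.01).
assert bce_loss([0], [0.99]) > bce_loss([0], [0.01])
```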
3. Optimizer
The stochastic gradient descent (SGD) algorithm selects one mini-batch at a time, rather than all samples, and updates the model parameters by gradient descent. The update proceeds as follows: compute the gradient of the objective function with respect to the current parameters, compute the descent step for the current iteration, and update the parameters accordingly. Let the parameters to be optimized be W, the objective function f(x), the initial learning rate α, the number of training epochs epoch, and g_t the gradient at step t. The parameter update formula is:
W_{t+1} = W_t − α · g_t    Formula (3)
The twin network model is optimized with gradient descent: the batch size BatchSize is set to the number of classes, the number of epochs is set to 200, and the learning rate α is initialized to 0.001 and adjusted automatically as the epochs progress. The optimizer moves each parameter of the loss (objective) function by an appropriate amount in the correct direction, so that each update brings the loss value closer to the global minimum.
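Formula (3) as a one-line update (an illustrative sketch of plain SGD, without the momentum or weight decay a framework optimizer might add):

```python
def sgd_step(params, grads, lr=0.001):
    """One plain SGD update: W_{t+1} = W_t - alpha * g_t."""
    return [w - lr * g for w, g in zip(params, grads)]

w = [1.0, -2.0]
g = [10.0, -10.0]
assert sgd_step(w, g, lr=0.1) == [0.0, -1.0]
```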
Step four, applying a twin network model to classify the hyperspectral images
In the testing stage, each test sample is paired with one sample of every possible known label. The twin network model computes and compares the similarities of these pairs, obtained from the distance formula, and assigns the test sample the class label of its most similar pair.
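The test-stage decision rule can be sketched as follows (the labels, embeddings and stand-in metric below are hypothetical; in the patent the similarity comes from its distance formula):

```python
def classify(test_embedding, references, similarity):
    """Predict the label of the reference sample most similar to the
    test sample. `references` maps labels to embeddings; `similarity`
    is a pairwise score where higher means more alike."""
    return max(references, key=lambda lbl: similarity(test_embedding, references[lbl]))

refs = {"corn": [0.9, 0.1], "water": [0.1, 0.9]}
neg_l1 = lambda a, b: -sum(abs(x - y) for x, y in zip(a, b))  # stand-in metric
assert classify([0.8, 0.2], refs, neg_l1) == "corn"
```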
The experimental configuration and analysis are briefly described below; comparison and analysis of the experimental results demonstrate the improvement in classification performance.
1. Conditions of the experiment
The hardware test platform of the invention uses an Intel(R) Core(TM) i7-10700F CPU with a base frequency of 2.90 GHz, 16 GB of memory, and an Nvidia GeForce GTX 3060 graphics card. The software platform is the Windows 10 operating system with the PyCharm 2021 development environment. The programming language is Python and the deep learning framework is PyTorch.
2. Experimental data
The performance evaluation of the invention mainly uses two datasets: the Salinas Valley dataset (USA) and the KSC (Kennedy Space Center, Florida, USA) dataset.
The Salinas Valley dataset has a size of 512 × 217 with 204 usable bands, containing 111,104 pixels in total, of which 56,975 are background pixels and 54,129 are usable for classification; these pixels fall into 16 classes. Taking 8 sample points per class gives 128 pixels for the training set, another 128 pixels form the validation set, and the remaining 54,001 pixels form the test set. Table 1 shows the training and test sample selection for the Salinas Valley dataset. Note: the test data listed in the table include both the validation set and the test set.
TABLE 1
The KSC (Kennedy Space Center, Florida, USA) dataset has 176 usable bands and contains 314,368 pixels, of which 309,157 are background pixels and 5,211 are usable for classification; these pixels fall into 13 classes. Taking 8 sample points per class gives 104 pixels for the training set, another 104 pixels form the validation set, and the remaining 5,107 pixels form the test set. Table 2 shows the training and test sample selection for the KSC dataset. Note: the test data listed in the table include both the validation set and the test set.
TABLE 2
3. Performance comparison
The three prior-art comparison methods used in the invention are as follows:
(1) A two-dimensional convolutional neural network hyperspectral image classification method based on joint spatial-spectral features, proposed by Camps-Valls et al. in "Composite kernels for hyperspectral image classification", abbreviated 2-D CNN.
(2) A three-dimensional convolutional neural network hyperspectral image classification method based on joint spatial-spectral features, presented as a 3-D deep learning framework and abbreviated 3-D SSRN.
(3) A dual-path twin network based on spatial-spectral separation, proposed by Lingbo Huang et al. in "Dual-Path Siamese CNN for Hyperspectral Image Classification With Limited Training Samples", abbreviated DSCNN.
The experiments use the following three indexes to evaluate the performance of the invention:
the first evaluation index is Overall Accuracy (OA), which represents the proportion of correctly classified samples to all samples, with larger values indicating better classification. The second evaluation index is the Average Accuracy (AA), which represents the average of the accuracy of classification for each class, with larger values indicating better classification results. The third evaluation index is a chi-square coefficient (Kappa) which represents different weights in the confusion matrix, and the larger the value is, the better the classification effect is.
Table 3 shows the accuracy of the present invention and three other methods for classifying hyperspectral images of Salinas datasets in the United states.
TABLE 3
Table 4 shows the classification accuracy of the invention and the comparison methods on the U.S. KSC Kennedy hyperspectral images.
TABLE 4
As can be seen from Tables 3 and 4, on the same hyperspectral image dataset the classification accuracy of the proposed method is superior to that of the other classification methods. In addition, FIGS. 6 and 7 show the classification maps of each method; the visual results agree with those listed in Tables 3 and 4. As the images show, under small-sample conditions the classification map produced by the method has smaller errors and outperforms the 2-D CNN, 3-D SSRN and DSCNN methods.
In summary, the invention provides a hyperspectral image classification method based on a twin network to better handle spectral data redundancy and hyperspectral image classification under small-sample conditions. A twin residual network structure based on spatial-spectral information is constructed; its spatial-spectral separated feature extraction preserves the original spectral feature information while reducing the number of parameters for spatial feature extraction, improving both classification accuracy and speed. A feature extraction network with parallel multi-scale feature fusion branches enlarges the receptive field of the network and increases its width. A feature extraction network based on spatial and channel attention mechanisms strengthens important spatial-spectral features and suppresses unnecessary ones, further improving classification accuracy. Analysis of real experimental data shows that the proposed method effectively classifies hyperspectral images under small-sample conditions, with accuracy better than other networks of the same type.

Claims (2)

1. The hyperspectral image classification method based on the twin network is characterized by comprising the following four steps: preprocessing the hyperspectral image data, constructing the twin network model, training the twin network model, and applying the twin network model to hyperspectral image classification.
Firstly, preprocessing hyperspectral image data;
1) reading data;
after the hyperspectral image is read, the size of the pixel-neighborhood image block is set, principal component analysis (PCA) is performed, and then extended morphological profiles (EMP) are extracted; the hyperspectral image data is thus reduced in dimensionality to a low-dimensional image used for spatial-information feature extraction.
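As an illustrative sketch of the PCA dimensionality-reduction step only (the EMP extraction is elided, and the cube shape, seed and `n_components` value are assumptions, not taken from the patent):

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce the spectral dimension of an (H, W, bands) cube via PCA."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(np.float64)
    flat -= flat.mean(axis=0)                      # center each band
    # eigen-decomposition of the band covariance matrix
    cov = np.cov(flat, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]              # largest variance first
    components = eigvecs[:, order[:n_components]]
    return (flat @ components).reshape(h, w, n_components)

rng = np.random.default_rng(0)
cube = rng.random((27, 27, 200))                   # toy hyperspectral patch
reduced = pca_reduce(cube, 36)
print(reduced.shape)                               # (27, 27, 36)
```

The reduced image would then feed the spatial-information branch.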
2) Sampling data;
random sampling is carried out on the nonzero pixels of the hyperspectral image: 6-8 points of each class are randomly selected as training-set samples, another 6-8 points of each class are randomly selected as validation-set samples, and the remaining pixel points serve as test-set samples.
3) Sample pairing;
the training-set samples are paired according to the ground-object class labels of the pixel points, with a 1:1 ratio of positive to negative samples. For each sample, a central pixel vector and a pixel-neighborhood image block are acquired; the size of the neighborhood image block is a hyperparameter.
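A minimal sketch of the 1:1 positive/negative pairing, assuming labelled pixel indices (the helper names and toy label array are illustrative, not from the patent):

```python
import numpy as np

def make_pairs(indices, labels, rng):
    """Pair training samples so positives (same class) and negatives
    (different class) appear in a 1:1 ratio."""
    pairs, pair_labels = [], []
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    for idx in indices:
        same = [j for j in by_class[labels[idx]] if j != idx]
        diff = [j for j in indices if labels[j] != labels[idx]]
        if same and diff:
            pairs.append((idx, rng.choice(same)))   # positive pair, label 1
            pair_labels.append(1)
            pairs.append((idx, rng.choice(diff)))   # negative pair, label 0
            pair_labels.append(0)
    return pairs, pair_labels

rng = np.random.default_rng(1)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])      # toy ground-object labels
pairs, pair_labels = make_pairs(list(range(9)), labels, rng)
print(sum(pair_labels), len(pair_labels))           # positives are half
```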
4) Data encapsulation;
the hyperspectral image training set, validation set and test set are packaged and loaded as datasets respectively. The batch size of the training set is a hyperparameter, set with reference to the number of ground-object classes of the hyperspectral dataset.
Step two, constructing a twin network model;
1) a twin network structure;
the twin network model is composed of two twin branches with identical neural network structures; each twin branch comprises a two-dimensional feature extraction network based on spatial information and a one-dimensional feature extraction network based on spectral information. The twin network model uses a residual network as the feature extraction network and adopts spatial-spectral separated feature extraction followed by feature-level fusion.
2) A twin branched network structure;
the twin branch network takes a lightweight convolutional neural network model as the basic framework of the residual network, improved by introducing a multi-scale feature fusion module and an attention mechanism module. The twin branch network is composed of several types of layers: input layers (Input), convolution layers (Conv), pooling layers (Pooling) and fully connected layers (Full Connection); stacking these layers yields a complete residual network. The residual network consists of a first convolution layer and 3 residual blocks, each residual block adopting multi-scale feature fusion. Before the multi-scale feature fusion module, an attention mechanism performs weight distribution, enhancing important spatial-spectral features and suppressing unnecessary ones.
3) A convolution layer;
the input of the two-dimensional residual network for spatial-information feature extraction is a 27 × 27 × 36 hyperspectral image block. The first convolution layer uses 32 convolution kernels of size 4 × 4, with stride 1 and edge padding 0; that is, the first convolution layer produces 32 feature maps, each neuron in a feature map is connected to a 4 × 4 neighborhood of the input, and the output feature map size is 24 × 24 × 32. The first convolution layer has 640 trainable parameters. Batch normalization and ReLU activation then follow to accelerate convergence.
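The stated feature-map sizes can be checked with the usual convolution output-size formula, out = (in − k + 2p)/s + 1 (the helper name is illustrative):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: (in - k + 2p) // s + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# 27x27 spatial input, 4x4 kernel, stride 1, no padding -> 24x24
print(conv_out(27, 4))                                               # 24
# 'same'-size residual branches: 1x1/3x3/5x5 kernels with paddings 0/1/2
print([conv_out(24, k, 1, p) for k, p in [(1, 0), (3, 1), (5, 2)]])  # [24, 24, 24]
```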
The first convolution layer is followed by 3 consecutive residual blocks with 32, 32 and 64 convolution kernels respectively. Each residual block uses three convolution kernel sizes, 1 × 1, 3 × 3 and 5 × 5, where the 5 × 5 convolution is replaced by two stacked 3 × 3 convolutions to reduce the amount of parameter computation. With the convolution stride set to 1 and the paddings set to 0, 1 and 2 respectively, edge filling yields features of the same dimensions; the three scale features are extracted, batch-normalized, ReLU-activated and concatenated together. The input of the residual block is then added, followed by ReLU activation, to obtain a new feature map. The 1st residual block convolution layer produces 32 feature maps, which enter the next multi-scale residual block convolution. The 2nd residual block convolution layer produces 32 feature maps and continues to the next multi-scale residual block convolution. The 3rd residual block convolution layer produces 64 feature maps, and after the subsequent multi-scale residual block convolution a 1 × 1 × 64 feature map is obtained.
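A minimal numpy sketch of the fuse-and-skip logic of such a block, under stated assumptions: the per-scale convolutions are elided (random maps of equal spatial size stand in for their outputs), and the concatenated channels are projected back to the block width by a 1 × 1 convolution expressed as a per-pixel matmul; all names are illustrative.

```python
import numpy as np

def fuse_residual(x, branches, proj, relu=lambda a: np.maximum(a, 0)):
    """x: (H, W, C) block input; branches: list of (H, W, Ci) scale outputs.
    Concatenate scales along channels, project to C, add the skip, ReLU."""
    fused = np.concatenate([relu(b) for b in branches], axis=-1)
    fused = fused @ proj                  # 1x1 conv back to C channels
    return relu(fused + x)                # residual (skip) connection

rng = np.random.default_rng(2)
x = rng.standard_normal((24, 24, 32))                       # block input
branches = [rng.standard_normal((24, 24, 32)) for _ in range(3)]
proj = rng.standard_normal((96, 32)) / np.sqrt(96)          # 1x1 conv weights
out = fuse_residual(x, branches, proj)
print(out.shape)                                            # (24, 24, 32)
```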
The one-dimensional residual error network structure extracted based on the spectral information features is similar to the two-dimensional residual error network structure, wherein the one-dimensional residual error network uses one-dimensional convolution, and the number of convolution kernels of each layer is 16, 16, 16 and 32 respectively. The hyperspectral image input of the one-dimensional residual error network is 1 multiplied by bands, wherein the bands are the number of data collection spectrum bands.
4) Pooling layer
The pooling layer uses a 2 × 2 filter with stride 2 and edge padding 0; that is, each unit in a pooling-layer feature map is connected to a 2 × 2 neighborhood of the corresponding feature map in the last convolution layer. There are 64 feature maps of size 2 × 2, the output feature map size is 1 × 1 × 64, and no parameters need to be learned.
5) A fully-connected layer;
the fully connected layer computes the dot products between the input vectors and the weight vectors, with all of its neurons connected to the outputs of the convolution layers.
Step three, training a twin network model;
1) forward propagation;
the constructed twin network model has two twin branches whose input is a pair of samples, each sample consisting of a central pixel vector and a corresponding neighborhood pixel block. The hyperspectral image data is first preprocessed; random sampling and sample pairing then produce positive and negative input sample pairs in balanced proportion. The training set contains 6 to 8 samples of each class of the hyperspectral image dataset. The two inputs of the sample pair enter the two twin branches respectively for spatial and spectral feature extraction: each twin branch extracts spatial-information features with a two-dimensional convolutional neural network and spectral-information features with a one-dimensional convolutional neural network, and the respective features are then fused at the feature level. After spatial-spectral feature-level fusion, the output is computed by logistic regression of the L1 distance, as in the following formula.
d_1 = σ(W·|f(I_1) − f(I_2)| + b)    formula (1)
In the formula, d_1 is the output of the twin branch network, and I_1 and I_2 are the input sample pair; f(·) denotes the feature vector extracted by a twin branch, σ is the sigmoid function, and W and b are the trainable weights and bias of the logistic regression.
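A minimal sketch of this L1-distance logistic-regression step, assuming the branch embeddings are plain vectors (the weight values and names are illustrative):

```python
import numpy as np

def similarity(f1, f2, w, b):
    """d1 = sigmoid(w . |f1 - f2| + b): logistic regression on the
    element-wise L1 distance between the two branch embeddings."""
    z = np.dot(w, np.abs(f1 - f2)) + b
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
f = rng.standard_normal(64)                # stand-in branch embedding
w = np.full(64, 0.1)                       # illustrative regression weights
print(similarity(f, f, w, b=0.0))          # identical embeddings -> 0.5
```

The sign and scale of the learned weights determine whether larger distances push the output toward 0 or 1.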
2) Loss function;
after the model is built, the error of the binary classifier during twin-network training is calculated with the binary cross-entropy loss function (BCE Loss), given by the following formula.
L = −(1/N)·Σ_{i=1}^{N} [y_i·log(p_i) + (1 − y_i)·log(1 − p_i)]    formula (2)
where N is the total number of samples, y_i is the class to which the ith sample belongs, and p_i is the class prediction for the ith sample.
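The binary cross-entropy computation can be sketched as follows (the clipping epsilon is a standard numerical safeguard, not from the patent):

```python
import numpy as np

def bce_loss(y, p, eps=1e-12):
    """Binary cross-entropy averaged over N sample pairs."""
    p = np.clip(p, eps, 1.0 - eps)         # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 1.0, 0.0])         # pair labels (same / different)
p = np.array([0.9, 0.1, 0.8, 0.2])         # predicted similarities
print(round(bce_loss(y, p), 4))            # 0.1643
```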
3) Optimizer;
the model parameters are updated with the stochastic gradient descent algorithm. The update proceeds as follows: compute the gradient of the objective function with respect to the current parameters, compute the descent step at the current moment, and update the parameters accordingly. Let the parameters to be optimized be W, the objective function f(x), the initial learning rate α, the number of iteration epochs epoch, and g_t the gradient at the current parameters. The parameter update formula is as follows:
W_{t+1} = W_t − α·g_t    formula (3)
The twin network model is optimized using gradient descent with the batch size (BatchSize) set to the number of classes. The optimizer guides each parameter of the loss function to update by an appropriate amount in the correct direction, so that each update moves the loss value steadily toward the global minimum.
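The parameter update step above amounts to the following one-liner (toy values; a real training loop would iterate this over batches and epochs):

```python
import numpy as np

def sgd_step(w, grad, lr):
    """One stochastic-gradient-descent update: W_{t+1} = W_t - alpha * g_t."""
    return w - lr * grad

w = np.array([1.0, -2.0])                  # current parameters W_t
grad = np.array([0.5, -0.5])               # gradient g_t
print(sgd_step(w, grad, lr=0.1))           # [ 0.95 -1.95]
```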
Step four, applying a twin network model to classify the hyperspectral images;
in the testing stage, the test sample is paired with one sample of each possible known label; the twin network model determines the class label of the test sample by computing and comparing the similarities of these sample pairs, where the similarity is obtained from the distance calculation formula.
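A hedged sketch of this test stage: the trained network is stood in for by a similarity function, one labelled reference sample per class is assumed, and all names are illustrative.

```python
import numpy as np

def predict_label(test_emb, ref_embs, ref_labels, similarity):
    """Pair the test sample with each known-label reference and take the
    label of the most similar pair."""
    scores = [similarity(test_emb, r) for r in ref_embs]
    return ref_labels[int(np.argmax(scores))]

sim = lambda a, b: -np.abs(a - b).sum()        # negative L1 distance as similarity
refs = [np.array([0.0, 0.0]), np.array([5.0, 5.0]), np.array([9.0, 9.0])]
print(predict_label(np.array([4.6, 5.2]), refs, [0, 1, 2], sim))   # 1
```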
2. The twin-network-based hyperspectral image classification method according to claim 1, wherein the twin network model comprises the following three parts:
1) designing a twin residual network classification model based on spatial-spectral information;
the twin residual network classification model takes a residual network as the main feature extraction network and constructs the twin branches using spatial-spectral separated feature extraction followed by feature-level combination. The model comprises two twin branches, each containing a two-dimensional feature-extraction residual network based on spatial information and a one-dimensional feature-extraction residual network based on spectral information. The spatial geometric information and the spectral information of the hyperspectral image are processed separately: the spectral features and the spatial features are extracted, the two are concatenated into one long feature vector, the fused feature vector is fed to the classifier, and the classification result is finally obtained.
2) Designing a twin residual feature extraction network based on multi-scale feature fusion with parallel branches;
the size of the hyperspectral sampling pixel block is designed according to the data-distribution characteristics of the hyperspectral dataset, and a multi-scale feature fusion module is adopted. The module consists of four parallel branch structures using three different convolution kernel sizes, 1 × 1, 3 × 3 and 5 × 5, where the 5 × 5 convolution is replaced by two stacked 3 × 3 convolutions, and the four channels are finally combined.
3) Designing a twin residual feature extraction network based on spatial and channel attention mechanisms;
a spatial attention mechanism is used for the high-level features of the two-dimensional spatial information, and a channel attention mechanism is used for the low-level features of the one-dimensional spectral information. The channel attention module enhances or suppresses different spectral bands for different tasks by modeling the importance of each feature channel. The spatial attention module uses alternating convolution layers with identical kernels to extract attention information among pixel positions within a channel, obtaining the correlation and relative importance of different pixel positions.
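A minimal numpy sketch of these two mechanisms, under stated assumptions: the channel attention follows the common squeeze-and-excitation pattern, and a sigmoid of channel-pooled statistics stands in for the spatial attention module's convolutions; the exact layer shapes are not specified by the patent, so all sizes and names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (H, W, C). Squeeze to per-channel averages, excite through a
    small bottleneck, and rescale each channel by its learned weight."""
    squeeze = x.mean(axis=(0, 1))                        # (C,)
    weights = sigmoid(np.maximum(squeeze @ w1, 0) @ w2)  # (C,) in (0, 1)
    return x * weights                                   # broadcast over H, W

def spatial_attention(x):
    """Weight pixel positions by a sigmoid of channel-pooled statistics."""
    pooled = np.stack([x.mean(-1), x.max(-1)], -1)       # (H, W, 2)
    attn = sigmoid(pooled.sum(-1, keepdims=True))        # (H, W, 1) conv stand-in
    return x * attn

rng = np.random.default_rng(4)
x = rng.standard_normal((8, 8, 16))                      # toy feature map
w1 = rng.standard_normal((16, 4)); w2 = rng.standard_normal((4, 16))
print(channel_attention(x, w1, w2).shape, spatial_attention(x).shape)
```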
CN202210385028.6A 2022-04-13 2022-04-13 Hyperspectral image classification method based on twin network Active CN114821164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210385028.6A CN114821164B (en) 2022-04-13 2022-04-13 Hyperspectral image classification method based on twin network

Publications (2)

Publication Number Publication Date
CN114821164A true CN114821164A (en) 2022-07-29
CN114821164B CN114821164B (en) 2024-06-14

Family

ID=82535611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210385028.6A Active CN114821164B (en) 2022-04-13 2022-04-13 Hyperspectral image classification method based on twin network

Country Status (1)

Country Link
CN (1) CN114821164B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353463A (en) * 2020-03-12 2020-06-30 北京工业大学 Hyperspectral image classification method based on random depth residual error network
CN112836773A (en) * 2021-04-08 2021-05-25 河海大学 Hyperspectral image classification method based on global attention residual error network
CN113705526A (en) * 2021-09-07 2021-11-26 安徽大学 Hyperspectral remote sensing image classification method

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272776A (en) * 2022-09-26 2022-11-01 山东锋士信息技术有限公司 Hyperspectral image classification method based on double-path convolution and double attention and storage medium
CN116597177A (en) * 2023-03-08 2023-08-15 西北工业大学 Multi-source image block matching method based on dual-branch parallel depth interaction cooperation
CN116595208B (en) * 2023-07-17 2023-10-13 云南大学 Classification method and device for hyperspectral images and electronic equipment
CN116595208A (en) * 2023-07-17 2023-08-15 云南大学 Classification method and device for hyperspectral images and electronic equipment
CN116612334A (en) * 2023-07-18 2023-08-18 山东科技大学 Medical hyperspectral image classification method based on spatial spectrum combined attention mechanism
CN116612334B (en) * 2023-07-18 2023-10-10 山东科技大学 Medical hyperspectral image classification method based on spatial spectrum combined attention mechanism
CN116934796A (en) * 2023-07-20 2023-10-24 河南大学 Visual target tracking method based on twinning residual error attention aggregation network
CN116662593A (en) * 2023-07-21 2023-08-29 湖南大学 FPGA-based full-pipeline medical hyperspectral image neural network classification method
CN116662593B (en) * 2023-07-21 2023-10-27 湖南大学 FPGA-based full-pipeline medical hyperspectral image neural network classification method
CN117076985A (en) * 2023-08-14 2023-11-17 上海如海光电科技有限公司 Classification recognition method for iron ore LIBS data by twin network integrated with self-encoder
CN116977747A (en) * 2023-08-28 2023-10-31 中国地质大学(北京) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN116977747B (en) * 2023-08-28 2024-01-23 中国地质大学(北京) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN117876699A (en) * 2024-03-13 2024-04-12 中国人民解放军火箭军工程大学 Simulation image fidelity assessment method based on twin neural network
CN118587511A (en) * 2024-08-02 2024-09-03 南京信息工程大学 SPECT-MPI image classification method and system

Also Published As

Publication number Publication date
CN114821164B (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant