CN117671384A - Hyperspectral image classification method - Google Patents



Publication number
CN117671384A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311735609.9A
Other languages
Chinese (zh)
Inventor
张绍泉
唐孟雄
梁联晖
李政奎
李璠
赖鹏飞
郑嘉焌
邓承志
汪胜前
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Institute of Technology
Guangzhou Construction Co Ltd
Original Assignee
Nanchang Institute of Technology
Guangzhou Construction Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Institute of Technology and Guangzhou Construction Co Ltd
Priority to CN202311735609.9A
Publication of CN117671384A
Legal status: Pending


Abstract

The invention discloses a hyperspectral image classification method that combines the advantages of a convolutional neural network and a hybrid combination Transformer attention network. For spectral feature extraction, local spectral feature information of the hyperspectral image is acquired with a 3D dense convolutional neural network, after which a hybrid combination Transformer attention network simultaneously captures tokens at various group sizes and the associations among groups. For spatial feature extraction, local spatial feature information of the hyperspectral image is acquired with a 2D dense convolutional neural network, after which a hybrid combination Transformer attention network strengthens the use of the global spatial information of the hyperspectral image. Finally, the spatial and spectral feature information is fused through a feature fusion network, and the classification result is output through a Softmax layer. The method achieves accurate classification of hyperspectral images, and the lightweight Sophia optimizer it adopts greatly increases the running speed of the model.

Description

Hyperspectral image classification method
Technical Field
The invention belongs to the technical field of hyperspectral image analysis and processing, and particularly relates to a hyperspectral image classification method.
Background
Hyperspectral technology is an interdisciplinary technology spanning computer science, geography and other subjects: a hyperspectral imager images narrow spectral intervals across different electromagnetic wave ranges to obtain spectral curves from which the spectral characteristics of ground objects can be inverted. Data from hundreds of spectral bands are recorded at the same spatial resolution, forming a three-dimensional hyperspectral image rich in spatial and spectral information. A hyperspectral image uses two-dimensional spatial imaging to represent the reflectance of surface objects in a single band, and the reflectances of multiple bands are stacked in sequence to form a many-layered, approximately continuous spectral vector dimension. Each hyperspectral pixel is characterized by such a spectral vector: every pixel is a continuous spectral curve that records the observed ground-object information in detail. Because hyperspectral images can describe both the spectral and the spatial information of ground objects in detail, hyperspectral image classification has, with the development of classification technology, been widely applied in fields such as precision agriculture, urban and rural construction, and mineral exploitation.
Hyperspectral image classification methods can be roughly divided into methods based on spectral information alone, methods based on joint spatial-spectral features, and deep learning methods. The first class uses only the spectral dimension of the hyperspectral image and ignores the spatial correlation among pixels. The second class improves classification performance to some extent but depends largely on hand-crafted features: the classification map is determined mainly by low-level features, which cannot represent the complex content of a hyperspectral image, so classification performance remains limited. Compared with these two traditional shallow approaches, the third class has stronger characterization and generalization ability, can extract deeper image features, and obtains more discriminative features, yielding good classification results.
Although these methods achieve better classification results, models based on convolutional neural networks cannot model the long-range dependencies among hyperspectral pixels, so the global spectral-spatial feature information in a hyperspectral image cannot be fully exploited. Researchers have attempted to use Transformers to build long-range dependencies within hyperspectral images, but the self-attention framework of traditional Transformer-based methods captures only associations between tokens at a single granularity and ignores associations between token groups, so the global feature information in hyperspectral images cannot be fully captured. In addition, with respect to optimizers, methods based on the traditional Adam optimizer converge relatively slowly and incur higher training cost compared with the recent lightweight second-order Sophia optimizer. It is therefore of great significance to make full use of the local feature information of hyperspectral images, effectively establish long-range dependencies among hyperspectral pixels, fully exploit the local and global spectral-spatial feature information, improve classification accuracy, and reduce model training cost.
In view of the problems in the prior art of classifying hyperspectral images with Transformers and traditional optimizers, many practitioners have begun studying hyperspectral image classification with hybrid convolutional neural networks. For example, the invention patent with grant publication number CN111310598B discloses a hyperspectral remote sensing image classification method based on mixed 3-dimensional and 2-dimensional convolution. The method comprises: obtaining the hyperspectral remote sensing image to be classified; performing spectral dimension reduction with principal component analysis; arranging the spectral bands of the dimension-reduced hyperspectral remote sensing image from high to low information content, from the middle of the channel axis toward its two sides; assigning each spectral band a weight according to the amount of spectral information it contains; taking a cube of fixed spatial size around each pixel, extracting spectral-spatial features from the cube with 3-dimensional convolution, and fusing spectral information with 2-dimensional convolution to obtain the final feature information; extracting second-order information from the feature information by covariance pooling and outputting a feature vector; and feeding the feature vector into a three-layer fully connected network to obtain the predicted classification result. That method combines the advantages of 3-dimensional and 2-dimensional convolution and achieves accurate classification of hyperspectral remote sensing images with few training samples.
However, that method is based only on convolutional networks and still cannot establish long-range dependencies among hyperspectral pixels, so it cannot fully exploit the global spectral-spatial feature information of the hyperspectral image. Moreover, it relies on the traditional Adam optimizer; compared with the recent lightweight second-order Sophia optimizer, its model converges relatively slowly and its training cost is higher.
Accordingly, there is a need in the art for a new hyperspectral image classification method to address the above-described problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a hyperspectral image classification method based on a convolutional neural network and a hybrid combination Transformer attention network. The method combines the advantages of the convolutional neural network and the hybrid combination Transformer attention network and replaces the traditional Adam optimizer with the new lightweight second-order Sophia optimizer, thereby achieving accurate classification of hyperspectral images while accelerating the running speed and reducing the training cost of the model.
To achieve the above object, the present invention provides a hyperspectral image classification method, wherein the hyperspectral images are captured by a hyperspectral camera. The method is based on a convolutional neural network and a hybrid combination Transformer attention network and comprises the following steps:
S1, acquiring a hyperspectral image to be classified, and preprocessing it to obtain hyperspectral data blocks X1 and X2;
S2, processing the input hyperspectral data block X1 with a 3D dense convolutional neural network to obtain local spectral feature information of the hyperspectral image at different convolution layers;
S3, performing channel transformation on the hyperspectral data output in step S2 with a 3D convolutional neural network layer, then performing dimension transformation, and feeding the result into a spectral-feature-based hybrid combination Transformer attention network;
S4, in parallel with step S2, reducing the dimension of the input hyperspectral data block X2 with a 3D convolutional neural network layer;
S5, after step S4, acquiring local spatial feature information of the hyperspectral image at different convolution layers with a 2D dense convolutional neural network;
S6, performing dimension transformation on the hyperspectral data output in step S5 and feeding it into a spatial-feature-based hybrid combination Transformer attention network;
S7, fusing the local spectral feature information of the different convolution layers obtained in step S3 with the local spatial feature information of the different convolution layers obtained in step S6 through a spectral-spatial feature fusion network;
S8, replacing the traditional optimizer used in the model constructed in steps S1 to S7 with the Sophia optimizer;
S9, processing the feature vector output by the last fully connected layer of the spectral-spatial feature fusion network with a Softmax layer to obtain the final predicted hyperspectral image classification result.
Further, the step S1 includes:
let the size of the image used for hyperspectral image classification be W × H × C;
reshape the hyperspectral image classification data to X of size N × C, where N = W × H;
perform data preprocessing on the hyperspectral image classification data X to obtain hyperspectral data blocks X1 and X2, with dimensions Xse × Xse × C and Xsa × Xsa × C respectively; the resulting blocks X1 and X2 serve as the inputs of the subsequent spectral feature extraction branch network and spatial feature extraction branch network, respectively.
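The preprocessing in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patch sizes Xse and Xsa, the zero-padding at image borders, and the random toy cube are all assumptions made here for demonstration.

```python
import numpy as np

def extract_patch(img, row, col, size):
    """Extract a size x size x C neighborhood centered on (row, col),
    zero-padding at the image border (the padding scheme is an assumption)."""
    half = size // 2
    padded = np.pad(img, ((half, half), (half, half), (0, 0)), mode="constant")
    return padded[row:row + size, col:col + size, :]

# Toy hyperspectral cube of size W x H x C (values are placeholders)
W, H, C = 8, 8, 20
img = np.random.rand(W, H, C).astype(np.float32)

# Reshape to X of size N x C, where N = W * H (step S1)
X = img.reshape(-1, C)

# Per-pixel data blocks X1 (spectral branch) and X2 (spatial branch);
# Xse and Xsa are hyperparameters left unspecified in the text.
Xse, Xsa = 7, 9
X1 = extract_patch(img, 4, 4, Xse)   # Xse x Xse x C
X2 = extract_patch(img, 4, 4, Xsa)   # Xsa x Xsa x C
```

In practice one such pair of blocks would be extracted around every labeled pixel to build the training set.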
Further, step S2 includes:
inputting the hyperspectral data block X1 into the 3D dense convolutional neural network for processing to obtain hyperspectral data feature information Ye1, whose dimensions are Xse × Xse × C with 60 feature channels; wherein:
the 3D dense convolutional neural network comprises 4 convolution layers with kernel size (1, 1, Ke), where the value of Ke is 5; the first convolution layer has 24 channels and each of the other convolution layers has 12 channels;
each convolution layer in the 3D dense convolutional neural network consists of a 3D convolutional neural network layer, a batch normalization layer and a Mish activation function layer.
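A minimal PyTorch sketch of such a block under the stated hyperparameters (4 layers, spectral kernel length Ke = 5, channel counts 24/12/12/12, batch normalization and Mish). The dense concatenation pattern and the padding that preserves the spectral length are assumptions; the patent does not spell them out.

```python
import torch
import torch.nn as nn

class Dense3DBlock(nn.Module):
    """Sketch of the 4-layer 3D dense convolutional block: kernel (1, 1, Ke)
    with Ke = 5, 24 channels in the first layer and 12 in each later layer,
    dense concatenation giving 24 + 3 * 12 = 60 output feature channels."""
    def __init__(self, ke=5):
        super().__init__()
        pad = (0, 0, ke // 2)  # keep the spectral length unchanged (assumption)
        def conv(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=(1, 1, ke), padding=pad),
                nn.BatchNorm3d(cout),
                nn.Mish(),
            )
        self.l1 = conv(1, 24)
        self.l2 = conv(24, 12)
        self.l3 = conv(36, 12)   # input: cat of the two previous outputs
        self.l4 = conv(48, 12)   # input: cat of the three previous outputs

    def forward(self, x):        # x: (B, 1, Xse, Xse, C)
        f1 = self.l1(x)
        f2 = self.l2(f1)
        f3 = self.l3(torch.cat([f1, f2], dim=1))
        f4 = self.l4(torch.cat([f1, f2, f3], dim=1))
        return torch.cat([f1, f2, f3, f4], dim=1)   # 60 feature channels
```

With a 7 × 7 × 20 input patch the output keeps the spatial and spectral sizes and carries 60 channels, matching the 60-channel Ye1 described above.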
Further, the step S3 includes:
first, performing channel transformation on the hyperspectral data feature information Ye1 output in step S2 with a 3D convolutional neural network layer to obtain new hyperspectral data feature information Ye2; then performing dimension transformation on Ye2 and feeding it into the spectral-feature-based hybrid combination Transformer attention network; this network divides the tokens of the input feature information Ye2 into segments and generates group agents to replace individual tokens through multiple groups of aggregators of different sizes; wherein:
the kernel size of the 3D convolutional neural network layer is (1, 1, Ke1), where the value of Ke1 is determined by C;
when C is even, Ke1 = Floor((C - Ke + 1)/2); when C is odd, Ke1 = Floor((C - Ke)/2 + 1), where Floor(·) denotes the rounding-down function.
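The piecewise rule for Ke1 can be written directly; the two-band example values below follow from the formulas with Ke = 5:

```python
import math

def ke1(c, ke=5):
    """Kernel length Ke1 of the channel-transform 3D convolution (step S3):
    Floor((C - Ke + 1)/2) for even C, Floor((C - Ke)/2 + 1) for odd C."""
    if c % 2 == 0:
        return math.floor((c - ke + 1) / 2)
    return math.floor((c - ke) / 2 + 1)

# e.g. C = 200 (even): Floor(196/2) = 98; C = 103 (odd): Floor(49 + 1) = 50
```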
Further, the step S4 includes:
inputting the hyperspectral data block X2 into a 3D convolutional neural network layer for dimension reduction to obtain hyperspectral data feature information Ya1 of dimensions Xsa × Xsa; wherein:
the kernel size of the 3D convolutional neural network layer is (1, 1, Ka0), where Ka0 equals the spectral channel size of the input hyperspectral data block X2, and the number of convolution kernel channels is 48.
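A sketch of this dimension reduction: a single 3D convolution whose spectral kernel length Ka0 equals the full band count C collapses the spectral axis to length 1 while producing 48 feature channels. The Xsa value and the final squeeze into a 2D feature map are assumptions for illustration.

```python
import torch
import torch.nn as nn

C = 20                              # band count of the toy input (assumption)
reduce3d = nn.Conv3d(1, 48, kernel_size=(1, 1, C))  # Ka0 = C, 48 channels

x2 = torch.randn(2, 1, 9, 9, C)    # (B, 1, Xsa, Xsa, C), Xsa = 9 assumed
ya1 = reduce3d(x2)                 # (B, 48, Xsa, Xsa, 1): spectral axis collapsed
ya1 = ya1.squeeze(-1)              # (B, 48, Xsa, Xsa), ready for the 2D branch
```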
Further, the step S5 includes:
inputting the hyperspectral data feature information Ya1 into the 2D dense convolutional neural network for processing to obtain new hyperspectral data feature information Ya2, whose dimensions are Xsa × Xsa with 60 feature channels; wherein:
the 2D dense convolutional neural network comprises 4 convolution layers with kernel size (Ka1, Ka1), where the value of Ka1 is 5; the first convolution layer has 24 channels and each of the other convolution layers has 12 channels;
each convolution layer in the 2D dense convolutional neural network consists of a 2D convolutional neural network layer, a batch normalization layer and a Mish activation function layer.
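The 2D dense block mirrors the 3D one; a sketch under the stated hyperparameters (4 layers, 5 × 5 kernels, 24/12/12/12 channels, BN and Mish). The 48-channel input width follows the step S4 reduction; the dense wiring and same-size padding are assumptions.

```python
import torch
import torch.nn as nn

class Dense2DBlock(nn.Module):
    """Sketch of the 4-layer 2D dense convolutional block: kernel (Ka1, Ka1)
    with Ka1 = 5, 24 channels in the first layer and 12 in each later layer,
    dense concatenation giving 60 output feature channels."""
    def __init__(self, cin=48, ka1=5):
        super().__init__()
        def conv(ci, co):
            return nn.Sequential(
                nn.Conv2d(ci, co, kernel_size=ka1, padding=ka1 // 2),
                nn.BatchNorm2d(co),
                nn.Mish(),
            )
        self.l1 = conv(cin, 24)
        self.l2 = conv(24, 12)
        self.l3 = conv(36, 12)
        self.l4 = conv(48, 12)

    def forward(self, x):        # x: (B, 48, Xsa, Xsa)
        f1 = self.l1(x)
        f2 = self.l2(f1)
        f3 = self.l3(torch.cat([f1, f2], dim=1))
        f4 = self.l4(torch.cat([f1, f2, f3], dim=1))
        return torch.cat([f1, f2, f3, f4], dim=1)   # 60 feature channels
```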
Further, the step S6 includes:
performing dimension transformation on the hyperspectral data feature information Ya2 and feeding it into the spatial-feature-based hybrid combination Transformer attention network; this network divides the tokens of the input feature information Ya2 into segments and generates group agents to replace individual tokens through multiple groups of aggregators of different sizes.
Further, the step S7 includes:
fusing the local spectral feature information of the different convolution layers obtained in step S3 and the local spatial feature information of the different convolution layers obtained in step S6 through fully connected layers to form a new joint spectral-spatial feature, and outputting a feature vector; the number of fully connected layers is 2.
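A sketch of the two-layer fusion head followed by the Softmax of step S9. The hidden width (128), the class count (9, as on the Pavia University dataset), and the assumption that each branch is pooled to a 60-dimensional vector are illustrative choices; only the number of fully connected layers (2) is fixed by the text.

```python
import torch
import torch.nn as nn

num_classes = 9                      # assumption: Pavia University has 9 classes
fusion = nn.Sequential(
    nn.Linear(60 + 60, 128),         # concatenated spectral + spatial features
    nn.Mish(),
    nn.Linear(128, num_classes),     # second (last) fully connected layer
)

spec_feat = torch.randn(4, 60)       # pooled spectral-branch features (assumed)
spat_feat = torch.randn(4, 60)       # pooled spatial-branch features (assumed)
logits = fusion(torch.cat([spec_feat, spat_feat], dim=1))
probs = torch.softmax(logits, dim=1) # step S9: Softmax over class scores
```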
Further, the hyperspectral image classification method is carried out by a hyperspectral image classification system comprising a hyperspectral image module, a 3D dense convolutional neural network module, a spectral-feature-based hybrid combination Transformer attention network module, a 2D dense convolutional neural network module, a spatial-feature-based hybrid combination Transformer attention network module, a spectral-spatial feature fusion network and a classification image module. Step S1 is performed in the hyperspectral image module; step S2 in the 3D dense convolutional neural network module; step S3 in the spectral-feature-based hybrid combination Transformer attention network module; steps S4 and S5 in the 2D dense convolutional neural network module; step S6 in the spatial-feature-based hybrid combination Transformer attention network module; steps S7 and S8 in the spectral-spatial feature fusion network; and step S9 in the classification image module.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method combines the advantages of dense convolutional neural networks and hybrid combination Transformer attention networks. For spectral feature extraction, local spectral feature information of the hyperspectral image is acquired with a 3D dense convolutional neural network, after which a hybrid combination Transformer attention network simultaneously captures tokens at various group sizes and the associations among groups, improving the use of the global spectral feature information of the hyperspectral image. Likewise, for spatial feature extraction, local spatial feature information is acquired with a 2D dense convolutional neural network, after which a hybrid combination Transformer attention network strengthens the use of the global spatial information. Finally, the local spectral and spatial feature information of the hyperspectral image is fused through a feature fusion network to achieve spatial-spectral feature fusion, and the classification result is output through a Softmax layer.
(2) By introducing group agents and an ingenious aggregation operation, the hybrid combination Transformer attention network adopted by the method comprehensively models the associations between groups of different sizes and individual tokens, so the model captures the complex spectral and spatial structural feature information in hyperspectral images more completely, improves its sensitivity to feature information at different scales and levels, and adapts better to a variety of complex hyperspectral image application scenarios. In addition, the new lightweight second-order Sophia optimizer replaces the traditional Adam optimizer in the network model, which greatly accelerates model convergence and reduces training cost.
In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages. The invention will be described in further detail with reference to the accompanying drawings.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
FIG. 1 is a block diagram of the hyperspectral image classification method based on a convolutional neural network and a hybrid combination Transformer attention network of the present invention;
FIG. 2 is a flow chart of the 3D dense convolutional neural network and the spectral-feature-based hybrid combination Transformer attention network of the present invention;
FIG. 3 is a flow chart of the 2D dense convolutional neural network and the spatial-feature-based hybrid combination Transformer attention network of the present invention;
FIG. 4 is a flow chart of the hybrid combination Transformer attention network of the present invention;
FIG. 5 shows classification maps on the Pavia University dataset produced by different methods: (a) the pseudo-color map, (b) the ground-truth map, (c) SF, (d) SSFFT, (e) BOS2T, (f) GAHT, (g) FTSCN, and (h) the method of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the embodiments shown in the drawings, but it should be understood that the invention is not limited to these embodiments; functional, methodological or structural equivalents and alternatives derived from the embodiments by those skilled in the art fall within the scope of protection of the present invention.
This embodiment provides a hyperspectral image classification method based on a convolutional neural network and a hybrid combination Transformer attention network. It makes full use of the advantages of the convolutional neural network, the hybrid combination Transformer attention network and the new lightweight second-order Sophia optimizer, fully mines the local and global spectral-spatial feature information among hyperspectral pixels, greatly accelerates model convergence, reduces training cost, and achieves highly accurate classification results at low training cost.
As shown in fig. 1, the hyperspectral image classification method based on a convolutional neural network and a hybrid combination Transformer attention network in this embodiment includes the following steps:
step S1, acquiring hyperspectral images to be classified, and preprocessing the hyperspectral images to be classified to obtain hyperspectral data blocks X1 and X2. The method specifically comprises the following steps:
First, let the size of the image used for hyperspectral image classification be W × H × C; then reshape the hyperspectral image classification data to X of size N × C, where N = W × H; then perform data preprocessing on X to obtain hyperspectral data blocks X1 and X2, with dimensions Xse × Xse × C and Xsa × Xsa × C respectively; the resulting blocks X1 and X2 serve as the inputs of the subsequent spectral feature extraction branch network and spatial feature extraction branch network, respectively.
In step S2, the input hyperspectral data block X1 is processed with a 3D dense convolutional neural network to acquire local spectral feature information of the hyperspectral image at different convolution layers. Preferably, the 3D dense convolutional neural network comprises 4 convolution layers with kernel size (1, 1, Ke), where the value of Ke is 5; the first convolution layer has 24 channels and each of the other convolution layers has 12 channels.
In a specific embodiment, the implementation flow is as shown in fig. 2: the hyperspectral data block X1, of dimensions Xse × Xse × C, is input into the 3D dense convolutional neural network for processing;
the 3D dense convolutional neural network has 4 convolution layers, and the convolution kernel value Ke is 5; the first convolution layer has 24 channels, and the second, third and fourth convolution layers each have 12 channels;
each convolution layer in the 3D dense convolutional neural network consists of a 3D convolutional neural network (3D-Conv) layer, a batch normalization (BN) layer and a Mish activation function layer;
after processing by the 3D dense convolutional neural network, new hyperspectral data feature information Ye1 is obtained, whose dimensions are Xse × Xse × C with 60 feature channels.
In step S3, the hyperspectral data output after step S2 is channel-transformed with a 3D convolution layer, dimension-transformed, and then fed into the spectral-feature-based hybrid combination Transformer attention network to improve the use of the global spectral feature information of the hyperspectral image.
In a specific embodiment, a 3D convolutional neural network layer performs channel transformation on the hyperspectral data feature information Ye1 output after step S2, obtaining new hyperspectral data feature information Ye2; Ye2 is then dimension-transformed and fed into the spectral-feature-based hybrid combination Transformer attention network. This network divides the tokens of the input feature information Ye2 into segments and generates group agents through groups of aggregators of different sizes to replace individual tokens. The kernel size of the 3D convolutional neural network layer is (1, 1, Ke1), where Ke1 is determined by C: when C is even, Ke1 = Floor((C - Ke + 1)/2); when C is odd, Ke1 = Floor((C - Ke)/2 + 1), where Floor(·) denotes the rounding-down function. By introducing group agents and an ingenious aggregation operation, step S3 comprehensively models the associations between groups of different sizes and individual tokens, so the model captures the structural information in the hyperspectral image more completely, improves its sensitivity to features at different scales and levels, and further improves the use of the global spectral feature information of the hyperspectral image.
In the Transformer model, Q, K and V denote the Query, Key and Value respectively. As shown in fig. 4, the hybrid combination Transformer attention network generates group agents by replacing some of the entries in Q, K and V with aggregations of whole groups, which is implemented efficiently by a sliding-window-based operation Agg(·); this expands attention from focusing only on individual tokens to also focusing on whole groups. Specifically:
first, Q, K and V are generated from the input tokens using three learnable linear projections; the Q, K and V entries are divided uniformly into n segments, each of which participates in a different computation.
Next, the network consists of an attention branch module and a non-attention branch module. The branches in the attention branch module are also called pre-attention branches, since their outputs are ultimately fed to the attention layer for computation. There are 4 pre-attention branches in total. In the first three, different implementations (a pooling operation and depthwise convolution operations) serve as the aggregator Agg(·); the convolution operation consists of three depthwise convolution layers with different kernel sizes, preferably 3, 5 and 7. To further diversify the structure, the fourth attention branch uses no aggregator, making it an identity map.
In the non-attention branch module, a branch with an aggregator but without attention is constructed.
The outputs are mixed by a token integration layer, implemented simply as a linear projection with normalization and activation. Finally, multiplying the computed attention map by Value recombines the associated groups and individual tokens, thereby better capturing the structural information in the hyperspectral image.
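The group-agent idea can be illustrated with a deliberately simplified attention module: Q, K and V come from learnable linear projections, and K and V are extended with group agents produced by an average-pooling aggregator so that attention targets include both individual tokens and whole groups. This is a sketch of the concept only: it uses a single pooling window rather than the multi-branch structure with windows 3/5/7, and all widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupAgentAttention(nn.Module):
    """Simplified sketch of attention over tokens plus group agents.
    Agg(.) is realized here as average pooling over fixed-size groups."""
    def __init__(self, dim, group=3):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # three learnable linear projections
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.group = group
        self.scale = dim ** -0.5

    def agg(self, t):                  # (B, N, D) -> (B, N // group, D)
        return F.avg_pool1d(t.transpose(1, 2), self.group).transpose(1, 2)

    def forward(self, x):              # x: (B, N, D)
        q = self.q(x)
        k, v = self.k(x), self.v(x)
        # attention targets: individual tokens AND their group agents
        k = torch.cat([k, self.agg(k)], dim=1)
        v = torch.cat([v, self.agg(v)], dim=1)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                # (B, N, D)
```

Each query token thus attends jointly to single-token keys and group-agent keys, which is the association-across-granularities effect the patent attributes to the hybrid combination attention.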
Step S4, in parallel with step S2, reduces the dimension of the input hyperspectral data block X2 with a 3D convolutional neural network layer. Specifically:
the hyperspectral data block X2 is input into a 3D convolutional neural network layer for dimension reduction, yielding hyperspectral data feature information Ya1 of dimensions Xsa × Xsa; the kernel size of this 3D convolutional neural network layer is (1, 1, Ka0), where Ka0 equals the spectral channel size of the input hyperspectral data block X2, and the number of convolution kernel channels is 48.
Step S5, after step S4, acquires local spatial feature information of the hyperspectral image at different convolution layers with a 2D dense convolutional neural network. Specifically:
the hyperspectral data feature information Ya1 is input into the 2D dense convolutional neural network for processing, yielding new hyperspectral data feature information Ya2, whose dimensions are Xsa × Xsa with 60 feature channels.
The 2D dense convolutional neural network comprises 4 convolution layers with kernel size (Ka1, Ka1), where the value of Ka1 is 5; the first convolution layer has 24 channels and each of the other convolution layers has 12 channels. Each convolution layer in the 2D dense convolutional neural network consists of a 2D convolutional neural network layer, a batch normalization layer and a Mish activation function layer. The execution flow of the 2D dense convolutional neural network is shown in fig. 3.
Step S6 performs dimension transformation on the hyperspectral data output after step S5 and feeds it into the spatial-feature-based hybrid combination Transformer attention network to improve the use of the global spatial feature information of the hyperspectral image. Specifically: the hyperspectral data feature information Ya2 is dimension-transformed and then input into the spatial-feature-based hybrid combination Transformer attention network; this network divides the tokens of the input feature information Ya2 into segments and generates group agents to replace individual tokens through multiple groups of aggregators of different sizes.
Step S7, carrying out feature fusion on the spectral feature information of the different convolution layers obtained in step S3 and the spatial feature information of the different convolution layers obtained in step S6 through a feature fusion network, forming a new spectral-spatial joint feature through the fully connected layers, and outputting a feature vector; the number of fully connected layers is preferably 2.
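A minimal sketch of the fusion head described in step S7 together with the Softmax output of step S9 follows. The hidden width of 64 and the Mish nonlinearity between the two fully connected layers are assumptions; 9 classes corresponds to the Pavia University dataset used in the experiments:

```python
import torch
import torch.nn as nn

spec = torch.randn(4, 60)            # spectral-branch feature vectors (assumed flat)
spat = torch.randn(4, 60)            # spatial-branch feature vectors
num_classes = 9

head = nn.Sequential(
    nn.Linear(120, 64),              # FC layer 1 on the concatenated feature
    nn.Mish(),
    nn.Linear(64, num_classes),      # FC layer 2 -> class logits
)
logits = head(torch.cat([spec, spat], dim=1))
probs = torch.softmax(logits, dim=1)  # Softmax layer yields class probabilities
```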
Step S8, replacing the traditional Adam optimizer used in the model constructed in steps S1 to S7 with the new lightweight second-order Sophia optimizer, so as to accelerate the running speed of the model and reduce its training cost.
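The Sophia optimizer (Liu et al., 2023) preconditions the gradient momentum with a running estimate of the diagonal Hessian and clips the result per coordinate. One parameter update can be sketched as below; the constants and clipping form are common defaults rather than values from this embodiment, and the Hessian estimate h is assumed to be refreshed elsewhere (e.g. by a Hutchinson probe every few steps):

```python
import torch

def sophia_step(param, grad, m, h, lr=1e-3, beta1=0.96, rho=0.04, eps=1e-12):
    """One Sophia-style update: gradient momentum divided by the Hessian
    diagonal estimate, with the per-coordinate step clipped to +-rho."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)              # EMA of the gradient
    update = (m / torch.clamp(h, min=eps)).clamp_(-rho, rho)
    param.sub_(lr * update)                                # preconditioned step
    return param
```

Because the per-coordinate step is bounded by rho, the update stays stable even when the Hessian estimate is noisy, which is what makes such a second-order method cheap enough to be called lightweight.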
And S9, processing the new feature vector obtained from the last fully connected layer in the spectral-spatial feature fusion network through a Softmax layer to obtain the prediction classification result.
The hyperspectral image classification method of the embodiment of the invention is completed by using a hyperspectral image classification system, wherein the hyperspectral image classification system comprises a hyperspectral image module 1, a 3D dense convolutional neural network module 2, a hybrid combination Transformer attention network module based on spectral characteristics 3, a 2D dense convolutional neural network module 4, a hybrid combination Transformer attention network module based on spatial characteristics 5, a spectral-spatial characteristic fusion network module 6 and a classification image module 7. Step S1 is completed in the hyperspectral image module 1, step S2 is completed in the 3D dense convolutional neural network module 2, step S3 is completed in the hybrid combination Transformer attention network module based on spectral characteristics 3, step S4 and step S5 are completed in the 2D dense convolutional neural network module 4, step S6 is completed in the hybrid combination Transformer attention network module based on spatial characteristics 5, step S7 and step S8 are completed in the spectral-spatial characteristic fusion network module 6, and step S9 is completed in the classification image module 7.
The embodiment utilizes a convolutional neural network and a hybrid combination Transformer attention network to extract the spectral-spatial characteristic information of the hyperspectral image. For spectral feature extraction, a 3D dense convolutional neural network acquires local spectral feature information from the hyperspectral image, and a hybrid combination Transformer attention network then captures tokens of various group sizes and the associations among groups simultaneously, so as to improve the utilization of global spectral feature information of the hyperspectral image; for spatial feature extraction, a 2D dense convolutional neural network acquires local spatial feature information of the hyperspectral image, and a hybrid combination Transformer attention network then enhances the utilization of its global spatial information. Finally, the spatial and spectral feature information is fused through a feature fusion network, and the classification result is output through a Softmax layer.
The hybrid combination Transformer attention network adopted by the method introduces group agents and an ingenious aggregation operation to comprehensively model the associations among groups of different sizes and individual tokens, so that the model can more completely capture the complex spectral and spatial structure information in hyperspectral images, is more sensitive to hyperspectral feature information at different scales and levels, and is better suited to a variety of complex hyperspectral image application scenes. In addition, the new lightweight second-order Sophia optimizer replaces the traditional Adam optimizer in the network model, which greatly accelerates the convergence of the model and reduces its training cost.
Example 1
The experimental hardware platform is a high-performance computer configured as follows: an eight-core Intel Core i9-9900K @ 3.60 GHz, 128 GB of memory and an RTX 2080Ti graphics card; the software environment is Python 3.6.0 and PyTorch 1.5.0.
1. Experimental data and sample partitioning
To evaluate the classification effect of the proposed method, the Pavia University dataset was selected to verify its performance.
The Pavia University dataset is hyperspectral image data acquired by a hyperspectral imaging sensor over the University of Pavia in northern Italy; it measures 610 × 340 pixels and has 103 effective spectral bands available for classification in the 430-860 nm wavelength range. The dataset contains 9 ground-object categories; the name and sample count of each category are shown in Table 1.
The classification performance of hyperspectral images is measured by three indices, namely, overall classification accuracy (OA), average classification accuracy (AA) and Kappa coefficient.
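The three indices can be computed from a confusion matrix as follows (a small self-contained sketch, not code from this embodiment):

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Return OA, AA and the Kappa coefficient from predicted labels."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    oa = np.trace(cm) / cm.sum()                    # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # mean per-class accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    kappa = (oa - pe) / (1 - pe)                    # chance-corrected agreement
    return oa, aa, kappa
```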
2. Parameter setting
In the experiment, besides the design of the network structure, three parameters (batch size, learning rate and number of training iterations) also have a significant influence on the results. In this example, taking the Pavia University dataset as an example, all experiments were performed with a batch size of 64, a learning rate of 0.001 and 200 training iterations to optimize the classification performance of the model.
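The stated configuration corresponds to a training loop of the following shape (the model and data here are stand-ins for illustration only; per step S8, the Adam optimizer below would be replaced by Sophia):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(120, 9)                      # stand-in for the full network
data = TensorDataset(torch.randn(256, 120), torch.randint(0, 9, (256,)))
loader = DataLoader(data, batch_size=64, shuffle=True)   # batch size 64
opt = torch.optim.Adam(model.parameters(), lr=0.001)     # learning rate 0.001
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(200):                             # 200 training iterations
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
```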
3. Experimental results
In order to ensure the accuracy of the experimental results, the experiment was repeated 20 times and then the average value was taken.
To verify the effectiveness and superiority of the proposed method, the present embodiment is experimentally compared with the latest Transformer-based methods (proposed after 2022), e.g. SF, SSFTT, GAHT, BOS2T and FSTCN. The classification performance of the different methods on the Pavia University dataset is compared in Table 2.
As can be seen from the results in Table 2, the OA, AA and Kappa values of the proposed method are significantly higher than those of the other latest methods on the Pavia University dataset. The OA value is 0.14% higher than FSTCN, 1.34% higher than BOS2T, 4.61% higher than SSFTT, 5.37% higher than GAHT and 17.54% higher than SF. The AA value is 0.32% higher than FSTCN, 2.27% higher than BOS2T, 9.00% higher than SSFTT, 6.17% higher than GAHT and 24.62% higher than SF. The Kappa value is 0.18% higher than FSTCN, 1.74% higher than BOS2T, 6.14% higher than SSFTT, 7.59% higher than GAHT and 24.29% higher than SF. All three indices show that the proposed method outperforms the other recently proposed methods in classification performance.
Table 2 classification performance of different methods in Pavia University dataset
A classification map of the different methods on the Pavia University dataset is shown in fig. 5. It can be seen from the figure that the final classification results of the SF, SSFTT and GAHT methods all contain many noisy spots, with frequent misclassification in the "bare" and "brick" areas. The FSTCN and BOS2T methods give a visibly better classification than the former methods, with fewer misclassified points in the "brick" area and better homogeneity. However, the BOS2T method still has noticeably more misclassified points in the "gravel" and "asphalt rooftop" areas than the classification map of the proposed method, and the FSTCN method has noticeably more misclassified pixels in the "gravel" area. The proposed method has fewer misclassified points in the "gravel" area, classifies the other ground-object categories almost entirely correctly, and is smoother in homogeneous areas. This further verifies the effectiveness and superiority of the method of the invention.
In addition, compared with the BOS2T and FSTCN models, the proposed method adopts a 2D dense convolutional neural network for spatial feature extraction, which is relatively simpler than the 3D dense convolutional neural network and the SimAM-attention-based dense convolutional neural network used by those models. For enhancing global spectral and spatial feature extraction, the method adopts a hybrid combination Transformer attention network, which introduces group agents and an ingenious aggregation operation to comprehensively model the associations among groups of different sizes and individual tokens, so that the model captures the complex spatial structure information in hyperspectral images more completely, is more sensitive to feature information at different scales and levels, and is better suited to a variety of complex hyperspectral image application scenes. Moreover, the new lightweight second-order Sophia optimizer replaces the traditional Adam optimizer in the network model, greatly accelerating convergence and reducing training cost.
4. Conclusion
In order to enhance the extraction of local spectral-spatial features from hyperspectral images, strengthen the model's ability to establish long-distance dependencies between pixels, and improve hyperspectral image classification performance, the invention proposes a new model based on a dense convolutional neural network and a hybrid combination Transformer attention network. The model has a simple structure, relatively few parameters, requires no complex data pre- or post-processing, and can be trained end to end. Experiments show that, compared with the latest methods of the past two years, the proposed method extracts the spatial and spectral feature information of hyperspectral images more fully and achieves a better classification effect.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features have been described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (9)

1. The hyperspectral image classification method is characterized by comprising the following steps of:
s1, acquiring a hyperspectral image to be classified, and preprocessing the hyperspectral image to be classified to obtain hyperspectral data blocks X1 and X2;
s2, processing an input hyperspectral data block X1 by using a 3D dense convolutional neural network to obtain local spectral characteristic information of hyperspectral images of different convolutional layers;
s3, firstly carrying out channel transformation on the local spectral feature information output in the step S2 by using a 3D convolutional neural network layer, then carrying out dimension transformation, and then inputting the result into a hybrid combination Transformer attention network based on spectral features;
s4, in parallel with the step S2, reducing the dimension of the input hyperspectral data block X2 by using a 3D convolutional neural network layer;
s5, after the step S4 is executed, acquiring local spatial characteristic information of hyperspectral images of different convolution layers by using a 2D dense convolutional neural network;
s6, carrying out dimension transformation on the local spatial feature information output in the step S5, and then inputting the result into a hybrid combination Transformer attention network based on spatial features;
s7, carrying out feature information fusion on the local spectral feature information of the different convolution layers obtained in the step S3 and the local spatial feature information of the different convolution layers obtained in the step S6 by utilizing a spectral-spatial feature fusion network;
s8, replacing the traditional optimizers used in the models constructed in the steps S1 to S7 by Sophia optimizers;
and S9, processing the feature vector output by the last fully connected layer in the spectral-spatial feature fusion network by utilizing the Softmax layer, and finally obtaining a hyperspectral image prediction classification result.
2. The hyperspectral image classification method as claimed in claim 1, wherein the step S1 includes:
let the size of the image for hyperspectral image classification be w×h×c;
reshaping the hyperspectral image classification data to X with a size of N×C, where N=W×H;
carrying out data preprocessing on the hyperspectral image classification data X to obtain hyperspectral data blocks X1 and X2, wherein the dimensions of X1 and X2 are Xse multiplied by Xse multiplied by C and Xsa multiplied by Xsa multiplied by C respectively; and the obtained hyperspectral data blocks X1 and X2 are respectively used as the input of a subsequent spectral feature extraction branch network and a spatial feature extraction branch network.
3. The hyperspectral image classification method as claimed in claim 2, wherein step S2 comprises:
inputting the hyperspectral data block X1 into the 3D dense convolutional neural network for processing to obtain hyperspectral data characteristic information Ye1, wherein the dimension size of the hyperspectral data characteristic information Ye1 is Xse multiplied by Xse multiplied by C, and the characteristic channel number is 60; wherein:
the 3D dense convolutional neural network comprises 4 convolution layers, the convolution kernel size is (1, 1, Ke), and the value of Ke is 5; the number of channels of the first convolution layer is 24, and the number of channels of each other convolution layer is 12;
each convolution layer in the 3D dense convolution neural network consists of a 3D convolution neural network layer, a batch normalization layer and a Mish activation function layer.
4. A hyperspectral image classification method as claimed in claim 3 wherein step S3 comprises:
firstly, carrying out channel transformation on the hyperspectral data characteristic information Ye1 output in the step S2 by using a 3D convolutional neural network layer to obtain new hyperspectral data characteristic information Ye2; then, carrying out dimension transformation on the hyperspectral data characteristic information Ye2, and inputting it into a hybrid combination Transformer attention network based on spectral features; the hybrid combination Transformer attention network based on spectral features divides the tokens of the hyperspectral data characteristic information Ye2 input into the network into fragments, and generates group agents to replace individual tokens through multiple groups of aggregators of different sizes; wherein:
the convolution kernel size of the 3D convolutional neural network layer is (1, 1, Ke1), wherein the value of Ke1 is determined by C;
when C is an even number, the value of Ke1 is Floor((C-Ke+1)/2); when C is an odd number, the value of Ke1 is Floor((C-Ke)/2+1), where Floor(·) represents the rounding-down function.
5. The hyperspectral image classification method as claimed in claim 2, wherein the step S4 includes:
inputting a hyperspectral data block X2 into a 3D convolutional neural network layer for dimension reduction to obtain hyperspectral data characteristic information Ya1, wherein the dimension of the hyperspectral data characteristic information Ya1 is Xsa multiplied by Xsa; wherein:
the convolution kernel size of the 3D convolutional neural network layer is (1, 1, Ka0), wherein the value of Ka0 is the spectral channel size of the input hyperspectral data block X2, and the number of convolution kernel channels is 48.
6. The hyperspectral image classification method as claimed in claim 5, wherein the step S5 includes:
inputting the hyperspectral data characteristic information Ya1 into the 2D dense convolutional neural network for processing to obtain new hyperspectral data characteristic information Ya2, wherein the dimension of the hyperspectral data characteristic information Ya2 is Xsa multiplied by Xsa, and the characteristic channel number is 60; wherein:
the 2D dense convolutional neural network comprises 4 convolution layers, wherein the convolution kernel size is (Ka1, Ka1), and the value of Ka1 is 5; the number of channels of the first convolution layer is 24, and the number of channels of each other convolution layer is 12;
each convolution layer in the 2D dense convolution neural network consists of a 2D convolution neural network layer, a batch normalization layer and a Mish activation function layer.
7. The hyperspectral image classification method as claimed in claim 6, wherein the step S6 includes:
the hyperspectral data characteristic information Ya2 is subjected to dimension transformation and then input into the hybrid combination Transformer attention network based on spatial features; the hybrid combination Transformer attention network based on spatial features divides the tokens of the hyperspectral data characteristic information Ya2 input into the network into fragments, and generates group agents to replace individual tokens through multiple groups of aggregators of different sizes.
8. The hyperspectral image classification method as claimed in claim 1, wherein the step S7 includes:
the local spectral characteristic information of the different convolution layers obtained in the step S3 and the local spatial characteristic information of the different convolution layers obtained in the step S6 are fused through the fully connected layers to form a new spectral-spatial joint feature, and a feature vector is output; the number of fully connected layers is 2.
9. The hyperspectral image classification method according to any one of claims 1 to 8, characterized in that the method is performed using a hyperspectral image classification system, wherein the hyperspectral image classification system comprises a hyperspectral image module (1), a 3D dense convolutional neural network module (2), a hybrid combination Transformer attention network module based on spectral features (3), a 2D dense convolutional neural network module (4), a hybrid combination Transformer attention network module based on spatial features (5), a spectral-spatial feature fusion network module (6) and a classification image module (7); the step S1 is completed in the hyperspectral image module (1), the step S2 is completed in the 3D dense convolutional neural network module (2), the step S3 is completed in the hybrid combination Transformer attention network module based on spectral features (3), the step S4 and the step S5 are completed in the 2D dense convolutional neural network module (4), the step S6 is completed in the hybrid combination Transformer attention network module based on spatial features (5), the step S7 and the step S8 are completed in the spectral-spatial feature fusion network module (6), and the step S9 is completed in the classification image module (7).
CN202311735609.9A 2023-12-18 2023-12-18 Hyperspectral image classification method Pending CN117671384A (en)


Publications (1)

Publication Number Publication Date
CN117671384A true CN117671384A (en) 2024-03-08

Family

ID=90076966




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination