CN116524265A - Hyperspectral image classification method based on multi-scale mixed convolution network
- Publication number
- CN116524265A (application CN202310492375.3A)
- Authority
- CN
- China
- Prior art keywords
- scale
- layer
- dimensional convolution
- hyperspectral image
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention belongs to the technical field of image classification processing, and in particular relates to a hyperspectral image classification method based on a multi-scale mixed convolution network, which specifically comprises the following steps: step 1, hyperspectral image: a public hyperspectral image dataset is used; step 2, image preprocessing: data dimension reduction is performed on the hyperspectral image of step 1, and sample blocks are taken from the reduced image to obtain hyperspectral sample blocks. The invention extracts the spatial-spectral joint features of the hyperspectral image with several multi-scale 3D convolution modules, so that the spectral and spatial features at different scales are fully fused and the classification performance on hyperspectral images is improved.
Description
Technical Field
The invention relates to the technical field of image classification processing, in particular to a hyperspectral image classification method based on a multi-scale mixed convolution network.
Background
Hyperspectral remote sensing uses many electromagnetic bands to acquire data about a measured object and is at the leading edge of current remote-sensing technology. A hyperspectral image combines surface image information with spectral information, and has the advantages of a large amount of spectral information, nanoscale spectral resolution, and integrated imaging and spectroscopy. Classifying hyperspectral images, acquired by imaging spectrometers carried on aircraft and satellites, with deep-learning techniques has become an emerging research field.
Hyperspectral image classification is based on spectral information and spatial information, and the combined use of the two is now widespread in the field. Chinese patent publication CN113837314A, "Hyperspectral image classification method based on a mixed convolutional neural network", uses a single 3D convolution model to extract the spectral and spatial features of the preprocessed hyperspectral image simultaneously, a single 2D convolution model to further extract spatial features, and a single 1D convolution model to process the output information. That method does not consider the relations between features at different scales and different levels, does not exploit the relations between distant pixel blocks, and its classification performance still needs improvement.
Disclosure of Invention
(I) Technical problems to be solved
Aiming at the defects of the prior art, the invention provides a hyperspectral image classification method based on a multi-scale mixed convolution network, which addresses the problem of limited classification accuracy.
(II) technical scheme
The invention adopts the following technical scheme to achieve the above purpose: a hyperspectral image classification method based on a multi-scale mixed convolution network, which specifically comprises the following steps:
Step 1, hyperspectral image: using a public hyperspectral image dataset;
step 2, image preprocessing: performing data dimension reduction on the hyperspectral image of step 1, and taking sample blocks from the reduced image to obtain hyperspectral sample blocks;
step 3, constructing a multi-scale mixed convolution network model: the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, a mixed attention block, a three-dimensional convolution layer, a two-dimensional convolution layer, a one-dimensional convolution layer, two full connection blocks and a classifier. The hyperspectral sample block is input into the two directly connected multi-scale three-dimensional convolution blocks to extract spatial-spectral features containing multi-scale information; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are reshaped and then respectively input into the two multi-scale two-dimensional convolution blocks to further extract multi-scale, multi-level spatial features; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are also input into the three-dimensional convolution layer, and the convolution result is reshaped and then input into the two-dimensional convolution layer to further extract spectral features; the multi-scale, multi-level spatial features and the spectral features are fused and then input into the mixed attention block, so that the network attends only to useful spatial and spectral information; the attended spatial and spectral information is reshaped and then input into the one-dimensional convolution layer to obtain features in vector form; finally, these are input into the two full connection blocks, and the classification result is output through the classifier.
Step 4, selecting a loss function and an evaluation index: the loss between the classification result and the label is calculated, and training of the model parameters is considered complete when the number of training iterations reaches a set threshold or the loss value falls within a set range; meanwhile, an evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;
step 5, saving the trained model: the set of model parameters with the best performance during training is selected and saved (frozen); afterwards, whenever hyperspectral image classification is required, a hyperspectral image can be input directly into the network to obtain the final classified image.
Further, the public datasets adopted in step 1 are the Indian Pines dataset (IN), the University of Pavia dataset (UP), and the Salinas dataset (SA).
Further, the data dimension reduction in step 2 uses principal component analysis (PCA); the dimension-reduction process is as follows:
The original hyperspectral image I1 of dimension W×H×C1 is converted, through eigendecomposition of its covariance matrix, into a new hyperspectral image I2 of dimension W×H×C2, where W is the image width, H is the image height, C1 is the number of original image channels, and C2 is the number of bands after conversion. The dimension-reduction operation reduces the redundancy of the data features and also reduces the number of computed parameters.
Further, the sample block-taking operation in step 2 is as follows:
The new hyperspectral image I2 is cut into three-dimensional sample blocks of size w×w×C2, which are input into the network model, where w is the window size.
Further, the multi-scale three-dimensional convolution block in step 3 consists of three branches: small-scale, medium-scale, and large-scale. The small-scale branch consists, in order, of a three-dimensional convolution layer, a batch normalization layer, and an activation function layer; the medium-scale and large-scale branches each consist, in order, of a three-dimensional convolution layer, an activation function layer, a three-dimensional convolution layer, a batch normalization layer, and an activation function layer; the tensors produced by the three branches are spliced together along the first dimension. The multi-scale two-dimensional convolution block likewise consists of small-scale, medium-scale, and large-scale branches; each branch consists, in order, of a two-dimensional convolution layer, a batch normalization layer, and an activation function layer, and the tensors produced by the three branches are spliced together along the first dimension. The mixed attention block comprises a spectral attention block and a spatial attention block connected in series. In the spectral attention block, the input first passes through two branches, global average pooling and global max pooling; each pooling result is fed in turn into a one-dimensional convolution layer and activation function layer 1; the resulting tensors are spliced along the first dimension and then fed in turn into linear layer 1, activation function layer 2, linear layer 2, activation function layer 1, a one-dimensional convolution layer, and activation function layer 1; finally, the result is multiplied element-wise with the input features. In the spatial attention block, the input first passes through two branches, average pooling and max pooling; each pooling result is fed in turn into a two-dimensional convolution layer and a batch normalization layer; the resulting tensors are spliced along the first dimension and then fed in turn into a two-dimensional convolution layer and an activation function layer. The full connection block consists, in order, of a linear layer, an activation function layer, and a Dropout layer.
Further, the loss function in step 4 is the cross-entropy loss function; the choice of loss function affects model quality, since it should truly reflect the difference between the predicted value and the actual value and correctly feed back the quality of the model. The evaluation indexes are overall accuracy, average accuracy, and the consistency (Kappa) coefficient, which can effectively evaluate the quality of the classification and measure the effect of the classification network.
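As an illustrative sketch only (not code from the patent), the cross-entropy loss named above can be written in a few lines of NumPy; the toy logits and labels are assumptions:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy loss for integer class labels, computed from raw
    logits with a numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# hypothetical network outputs for two samples, three classes
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([0, 1])     # both predictions agree with the labels
loss = cross_entropy(logits, labels)
```

A lower loss for correct predictions than for wrong ones is what drives the parameter training described in step 4.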
(III) beneficial effects
Compared with the prior art, the hyperspectral image classification method based on the multi-scale mixed convolution network has the following beneficial effects:
1. according to the invention, the space-spectrum combined features of the hyperspectral image are extracted by adopting a plurality of multi-scale 3D convolution modules, so that the spectrum and space dimension features under different scales are fully fused, and the classification performance of the hyperspectral image is improved.
2. According to the method, a plurality of multi-scale 2D convolution modules are adopted, the space dimension characteristics of different layers are further extracted, the calculated amount is reduced, and meanwhile, the classification precision of hyperspectral images is further improved.
3. The invention adds a mixed attention module to establish connections between distant pixel blocks, which further improves the classification accuracy of hyperspectral images.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a network architecture of the present invention;
FIG. 3 is a schematic diagram of the specific composition of a multi-scale 3D convolution module according to the present disclosure;
FIG. 4 is a schematic diagram of the specific composition of a multi-scale 2D convolution module according to the present disclosure;
FIG. 5 is a schematic diagram showing the specific components of the hybrid attention module of the present invention;
FIG. 6 is a schematic diagram showing the specific composition of a spectral attention block in the hybrid attention module of the present invention;
FIG. 7 is a schematic diagram showing the specific composition of a spatial attention block in the hybrid attention module of the present invention;
FIG. 8 is a schematic diagram showing the specific composition of the full connection block of the present invention;
FIG. 9 is a comparison chart of relevant metrics of the present invention on the three datasets.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
As shown in fig. 1-9, a hyperspectral image classification method based on a multi-scale mixed convolution network according to an embodiment of the present invention specifically includes the following steps:
step 1, hyperspectral image: the disclosed hyperspectral image dataset is employed.
Step 2, image preprocessing: data dimension reduction is performed on the hyperspectral image of step 1, and sample blocks are taken from the reduced image to obtain hyperspectral sample blocks; specifically, the original hyperspectral image I1 is reduced in dimension by principal component analysis, and sample blocks are then taken from the new reduced hyperspectral image I2 to obtain three-dimensional image blocks.
Because hyperspectral images contain a large amount of data and numerous bands, a dimension-reduction operation on the data is necessary. Principal component analysis (PCA) is a statistical method that converts a group of possibly correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation; the converted variables are called principal components.
The sample block-taking is implemented as follows: in the spatial dimension, the new hyperspectral image I2 is cut into three-dimensional image blocks of size w×w×B, which are input into the network model, where w is the window size; each sample block takes the label of its center pixel.
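The block-taking step above can be sketched in NumPy as follows; this is an illustrative sketch, not the patent's code, and the toy cube size and window w=5 are assumptions (borders are zero-padded so every pixel gets a block):

```python
import numpy as np

def extract_blocks(image, w):
    """Cut an H x W x C hyperspectral cube into one w x w x C sample block
    per pixel; each block is centered on (and labeled by) its center pixel."""
    H, W_, C = image.shape
    m = w // 2
    # zero-pad the spatial borders so edge pixels also get full windows
    padded = np.pad(image, ((m, m), (m, m), (0, 0)), mode="constant")
    blocks = np.empty((H * W_, w, w, C), dtype=image.dtype)
    k = 0
    for r in range(H):
        for c in range(W_):
            blocks[k] = padded[r:r + w, c:c + w, :]
            k += 1
    return blocks

cube = np.random.rand(8, 8, 5)      # toy stand-in for the reduced image I2
blocks = extract_blocks(cube, w=5)  # 64 blocks of size 5 x 5 x 5
```

The center of block k is exactly the pixel whose label the block inherits.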
Step 3, constructing a multi-scale mixed convolution network model: the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, a mixed attention block, a three-dimensional convolution layer, a two-dimensional convolution layer, a one-dimensional convolution layer, two full connection blocks and a classifier. The hyperspectral sample block is input into the two directly connected multi-scale three-dimensional convolution blocks to extract spatial-spectral features containing multi-scale information; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are reshaped and then respectively input into the two multi-scale two-dimensional convolution blocks to further extract multi-scale, multi-level spatial features; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are also input into the three-dimensional convolution layer, and the convolution result is reshaped and then input into the two-dimensional convolution layer to further extract spectral features; the multi-scale, multi-level spatial features and the spectral features are fused and then input into the mixed attention block, so that the network attends only to useful spatial and spectral information; the attended spatial and spectral information is reshaped and then input into the one-dimensional convolution layer to obtain features in vector form; finally, these are input into the two full connection blocks, and the classification result is output through the classifier. The multi-scale three-dimensional convolution block consists of three branches: small-scale, medium-scale, and large-scale. The small-scale branch consists, in order, of a three-dimensional convolution layer, a batch normalization layer, and an activation function layer; the medium-scale and large-scale branches each consist, in order, of a three-dimensional convolution layer, an activation function layer, a three-dimensional convolution layer, a batch normalization layer, and an activation function layer; the tensors produced by the three branches are spliced together along the first dimension. The multi-scale two-dimensional convolution block likewise consists of small-scale, medium-scale, and large-scale branches; each branch consists, in order, of a two-dimensional convolution layer, a batch normalization layer, and an activation function layer, and the tensors produced by the three branches are spliced together along the first dimension. The mixed attention block comprises a spectral attention block and a spatial attention block connected in series. In the spectral attention block, the input first passes through two branches, global average pooling and global max pooling; each pooling result is fed in turn into a one-dimensional convolution layer and activation function layer 1; the resulting tensors are spliced along the first dimension and then fed in turn into linear layer 1, activation function layer 2, linear layer 2, activation function layer 1, a one-dimensional convolution layer, and activation function layer 1; finally, the result is multiplied element-wise with the input features. In the spatial attention block, the input first passes through two branches, average pooling and max pooling; each pooling result is fed in turn into a two-dimensional convolution layer and a batch normalization layer; the resulting tensors are spliced along the first dimension and then fed in turn into a two-dimensional convolution layer and an activation function layer. The full connection block consists, in order, of a linear layer, an activation function layer, and a Dropout layer.
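To illustrate the core idea of the spectral attention block — squeeze each channel by global average and max pooling, pass both through a small gating network, and reweight the input channels — here is a minimal NumPy sketch. It only loosely follows the layer ordering described above: the gate here is a simple linear-ReLU-linear pair with sigmoid output, and all weights and shapes are assumptions, not the patent's parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spectral_attention(feat, W1, W2):
    """Channel attention over a (C, H, W) feature map: global average and
    max pooling squeeze each channel to a scalar, a shared two-layer gate
    scores the channels, and the input is rescaled channel-wise."""
    avg = feat.mean(axis=(1, 2))                     # (C,) average pooling
    mx = feat.max(axis=(1, 2))                       # (C,) max pooling
    gate = lambda v: W2 @ np.maximum(W1 @ v, 0.0)    # linear -> ReLU -> linear
    weights = sigmoid(gate(avg) + gate(mx))          # (C,) in (0, 1)
    return feat * weights[:, None, None]             # element-wise reweighting

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 7, 7))    # toy fused feature map, 16 channels
W1 = rng.standard_normal((4, 16)) * 0.1   # hypothetical reduction to 4 units
W2 = rng.standard_normal((16, 4)) * 0.1
out = spectral_attention(feat, W1, W2)
```

Because the attention weights lie in (0, 1), the block can only attenuate channels it finds uninformative, which matches the stated goal of attending only to useful spectral information.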
The batch normalization layer uses a normalization step to pull the distribution of the input values of each layer's neurons back toward a standard normal distribution with mean 0 and variance 1, so that the activation inputs fall in the region where the nonlinear function is sensitive to its input; the network outputs then do not grow too large, relatively large gradients are obtained, the vanishing-gradient problem is avoided, and the larger gradients also speed up learning convergence. The Dropout layer stops some neurons from working with a set probability during forward propagation; training then proceeds, updating the weight parameters of the neurons that remain active. After the parameters are updated, a new set of neurons is again stopped according to the set probability and training continues: a neuron that was active in an earlier iteration keeps its updated parameters while it is inactive, and its parameters continue to be updated whenever it is active again. This process is repeated until training ends, and it prevents the network from overfitting during learning.
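The two mechanisms just described can be sketched in NumPy; this is a generic training-time forward pass (the batch size, feature width, and dropout probability are assumptions, and the running statistics used at inference time are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature of an (N, C) batch to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def dropout(x, p, rng):
    """Inverted dropout: zero each activation with probability p during
    training and scale the survivors by 1/(1-p) to keep expectations."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 8)) * 3.0 + 5.0     # toy pre-activation batch
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
d = dropout(x, p=0.5, rng=rng)
```

After batch normalization each column of `y` has (near) zero mean and unit variance, which is exactly the "pull back to the standard normal distribution" behavior described above.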
Step 4, selecting a loss function and an evaluation index: the loss between the classification result image and the label is calculated, and training of the model parameters is considered complete when the number of training iterations reaches a set threshold or the loss value falls within a set range; the model parameters are then saved. Meanwhile, an evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system. The choice of loss function affects model quality, since it should truly reflect the difference between the predicted value and the true value and correctly feed back the quality of the model; the evaluation indexes are overall accuracy, average accuracy, and the consistency (Kappa) coefficient, which can effectively evaluate the quality of the classification and measure the effect of the classification network.
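The three evaluation indexes named above can be computed from a confusion matrix; the following NumPy sketch is illustrative only (the toy labels are assumptions), and "consistency" is taken here to mean Cohen's Kappa coefficient, as is standard in hyperspectral classification work:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average per-class accuracy (AA), and the
    Kappa coefficient, all derived from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                       # fraction correct overall
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))      # mean of per-class recalls
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                    # agreement beyond chance
    return oa, aa, kappa

y_true = np.array([0, 0, 1, 1, 2, 2])   # hypothetical ground-truth labels
y_pred = np.array([0, 0, 1, 0, 2, 2])   # one sample of class 1 misclassified
oa, aa, kappa = classification_metrics(y_true, y_pred, 3)
```

Kappa discounts agreement that would occur by chance, so it is a stricter measure than OA when class sizes are imbalanced.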
Step 5, saving a training model: and (3) selecting a group of model parameters with the best effect in the training process for solidification, and then when hyperspectral image classification operation is needed, directly inputting hyperspectral images into a network to obtain final classified images.
Further, step 1 selects the Indian Pines dataset (IN), the University of Pavia dataset (UP), and the Salinas dataset (SA). The Indian Pines dataset (IN) is a hyperspectral image obtained by an airborne visible/infrared imaging spectrometer over northwest Indiana; the spatial size of the image is 145×145, the number of bands is 220, and the spectral and spatial resolutions are 10 nm and 20 m. After background pixels are removed, the number of pixels generally used for experiments is 10249, with 16 ground-truth classes; of the 220 bands, 20 are unusable, and only the remaining 200 bands are used in the experiments. The University of Pavia dataset (UP) was obtained by the AVIRIS sensor in Florida in 1996, with a spatial size of 512×614 and a spatial resolution of 18 m, and is divided into 9 categories; of its 115 bands, 12 noise bands are removed, leaving 103 usable bands. The Salinas dataset (SA) is a hyperspectral image obtained in the United States by the AVIRIS sensor; the spatial size of the image is 512×217 and the spatial resolution is 1.7 m, with 16 ground-object categories and 224 bands, of which 20 water-absorption bands are removed, leaving 204 bands for the hyperspectral image classification experiments.
Further, in the step 2, taking the Indian Pines data set as an example, the original hyperspectral image I1 is first reduced to 145×145×30: the covariance matrix of the original hyperspectral image is computed, its eigenvalues λ1 ≥ λ2 ≥ … ≥ λ200 are calculated, a threshold θ is set, and the first P principal components whose eigenvalues exceed θ are selected; the unit eigenvectors corresponding to these P eigenvalues are combined into a matrix, the transpose of this matrix is taken, and the original hyperspectral image is transformed by it to obtain the dimension-reduced hyperspectral image. The dimension-reduced image I2 is then cut into blocks, giving three-dimensional image blocks of size 25×25×30. The band mean and covariance matrix are computed as follows:

M = (1/Q) Σ_{i=1}^{Q} X_i,    Cov = (1/Q) Σ_{i=1}^{Q} (X_i − M)(X_i − M)^T

wherein X_i represents the i-th pixel of the original hyperspectral image, Q represents the number of pixels, X_j represents the j-th band of the original hyperspectral image, and B represents the number of bands;
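As a minimal illustration of the PCA dimension reduction and block-taking described in step 2, the following numpy sketch follows the mean/covariance formulas above; the function names and the reflective border padding are illustrative assumptions, not part of the patent.

```python
import numpy as np

def pca_reduce(image, n_components=30):
    """Reduce the band dimension of an (H, W, B) hyperspectral cube with PCA."""
    h, w, b = image.shape
    pixels = image.reshape(-1, b).astype(np.float64)   # Q x B matrix of pixels
    mean = pixels.mean(axis=0)                          # per-band mean M
    cov = np.cov(pixels - mean, rowvar=False)           # B x B covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # top-P principal components
    proj = eigvecs[:, order]                            # B x P projection matrix
    reduced = (pixels - mean) @ proj                    # transform every pixel
    return reduced.reshape(h, w, n_components)

def extract_patches(image, window=25):
    """Cut the reduced cube into window x window x C blocks around each pixel."""
    pad = window // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    h, w, _ = image.shape
    for i in range(h):
        for j in range(w):
            yield padded[i:i + window, j:j + window, :]
```

With the Indian Pines settings this would turn a 145×145×200 cube into 145×145×30 and yield 25×25×30 blocks.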
Further, in the step 3, the structure of the multi-scale mixed convolution network model is shown in fig. 2; the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, one mixed attention block, one three-dimensional convolution layer, one two-dimensional convolution layer, one one-dimensional convolution layer, two fully connected blocks and one classifier. The structure of the multi-scale three-dimensional convolution block is shown in fig. 3, wherein all activation function layers use the Mish function. In the small-scale branch of the first multi-scale three-dimensional convolution block, the convolution kernel size in the convolution layer is 1×1 and the number of convolution kernels is 16; in the medium-scale branch of the first multi-scale three-dimensional convolution block, the kernel size in the first three-dimensional convolution layer is 1×1 with 8 kernels, and the kernel size in the second three-dimensional convolution layer is 3×3 with 16 kernels; in the large-scale branch of the first multi-scale three-dimensional convolution block, the kernel size in the first three-dimensional convolution layer is 1×1 with 8 kernels, and the kernel size in the second three-dimensional convolution layer is 5×5 with 32 kernels. In the small-scale branch of the second multi-scale three-dimensional convolution block, the kernel size in the convolution layer is 1×1 with 32 kernels; in the medium-scale branch, the kernel size in the first three-dimensional convolution layer is 1×1 with 16 kernels, and the kernel size in the second three-dimensional convolution layer is 3×3 with 32 kernels; in the large-scale branch, the kernel size in the first three-dimensional convolution layer is 1×1 with 16 kernels, and the kernel size in the second three-dimensional convolution layer is 5×5 with 64 kernels. The structure of the multi-scale two-dimensional convolution block is shown in fig. 4, wherein all activation function layers use the Mish function. In the first multi-scale two-dimensional convolution block, the small-scale branch uses 1×1 kernels (16 kernels), the medium-scale branch uses 3×3 kernels (16 kernels), and the large-scale branch uses 5×5 kernels (16 kernels); in the second multi-scale two-dimensional convolution block, the small-scale branch uses 1×1 kernels (32 kernels), the medium-scale branch uses 3×3 kernels (32 kernels), and the large-scale branch uses 5×5 kernels (32 kernels). The kernel size in the three-dimensional convolution layer connected to the second multi-scale three-dimensional convolution block is 1×1 with 64 kernels, and the kernel size in the two-dimensional convolution layer connected to that three-dimensional convolution layer is 3×3 with 64 kernels. The structure of the mixed attention block is shown in fig. 5. The structure of the spectral attention block is shown in fig. 6: the kernel size in the one-dimensional convolution layers of the two branches is B (B being the number of channels after dimension reduction), activation function layer 1 uses the Sigmoid function, activation function layer 2 uses the Mish function, and the kernel size in the subsequent one-dimensional convolution layer is 2B. The structure of the spatial attention block is shown in fig. 7: the kernel size of the two-dimensional convolution layers of the two branches is 1×1, the kernel size in the subsequent two-dimensional convolution layer is 3×3, and the activation function layer uses the Sigmoid function. The structure of the fully connected block is shown in fig. 8, wherein the activation function layer uses the Mish function and the Dropout coefficient is set to 0.2. The Mish activation function is a non-monotonic smooth activation function that achieves better accuracy and generalization; the Sigmoid activation function is less affected by noisy data; LogSoftmax is selected as the classifier, which speeds up computation and improves numerical stability. The Sigmoid function, the Mish function and the LogSoftmax function are defined as follows:
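The three-branch multi-scale three-dimensional convolution block described above can be sketched in PyTorch roughly as follows; since the text gives kernel sizes such as 1×1 and 3×3 without the third dimension, the cubic kernels, padding values and default channel counts here are assumptions matching the first block.

```python
import torch
import torch.nn as nn

class MultiScale3DBlock(nn.Module):
    """Sketch of a multi-scale 3D convolution block: three branches
    (small / medium / large scale) whose outputs are concatenated along
    the channel dimension, all using the Mish activation."""
    def __init__(self, in_ch, small_ch=16, mid_ch=8):
        super().__init__()
        # small-scale branch: 1x1x1 conv -> batch norm -> Mish
        self.small = nn.Sequential(
            nn.Conv3d(in_ch, small_ch, kernel_size=1),
            nn.BatchNorm3d(small_ch), nn.Mish())
        # medium-scale branch: 1x1x1 conv -> Mish -> 3x3x3 conv -> BN -> Mish
        self.medium = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=1), nn.Mish(),
            nn.Conv3d(mid_ch, 2 * mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(2 * mid_ch), nn.Mish())
        # large-scale branch: 1x1x1 conv -> Mish -> 5x5x5 conv -> BN -> Mish
        self.large = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=1), nn.Mish(),
            nn.Conv3d(mid_ch, 4 * mid_ch, kernel_size=5, padding=2),
            nn.BatchNorm3d(4 * mid_ch), nn.Mish())

    def forward(self, x):
        # concatenate the three branch outputs along the channel dimension
        return torch.cat([self.small(x), self.medium(x), self.large(x)], dim=1)
```

With the defaults above, a single-channel input yields 16 + 16 + 32 = 64 output channels, matching the first block's branch widths.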
Sigmoid(x) = 1 / (1 + e^(−x))

Mish(x) = x · tanh(ln(1 + e^x))

LogSoftmax(x_i) = ln( e^(x_i) / Σ_j e^(x_j) )

wherein x represents the input feature information, x_i represents the predicted label value, and x_j represents the true label value.
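The three functions defined above can be written directly in numpy; this is an illustrative implementation of the stated formulas (with a standard max-shift added to LogSoftmax for numerical stability), not the patent's code.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def mish(x):
    # Mish(x) = x * tanh(ln(1 + e^x)): smooth and non-monotonic
    return x * np.tanh(np.log1p(np.exp(x)))

def log_softmax(x):
    # LogSoftmax over the last axis, shifted by the max for stability
    shifted = x - np.max(x, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
```

Exponentiating the LogSoftmax output recovers a probability vector that sums to 1.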
Further, in the step 4 a loss function is calculated from the network output and the labels; the cross-entropy loss function is selected, defined as follows:

C = −(1/n) Σ_x [ y ln a + (1 − y) ln(1 − a) ]

wherein C represents the cost, x represents a sample, y represents the actual value, a represents the output value, and n represents the total number of samples.
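The cross-entropy cost above can be computed as follows; the clipping constant is an illustrative guard against log(0), not part of the formula.

```python
import numpy as np

def cross_entropy(y, a):
    """C = -(1/n) * sum(y*ln(a) + (1-y)*ln(1-a)) over n samples."""
    eps = 1e-12                       # avoid log(0) for saturated outputs
    a = np.clip(a, eps, 1.0 - eps)
    return -np.mean(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))
```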
The overall accuracy (OA) measures the overall classification accuracy of the model, the average accuracy (AA) indicates the classification accuracy of the model on each individual class, and the consistency (Kappa) coefficient measures the agreement between the predicted values and the true values. They are calculated as follows:

OA = (TP + TN) / (TP + TN + FP + FN)

AA = (1/C) Σ_{i=1}^{C} (T_i / a_i)

Kappa = (p_o − p_e) / (1 − p_e),  with p_o = OA and p_e = ( Σ_{i=1}^{C} a_i · b_i ) / n²

where TP are the positive samples correctly classified by the model, FN the positive samples incorrectly classified, FP the negative samples incorrectly classified, and TN the negative samples correctly classified; C is the total number of classes, T_i is the number of correctly classified samples of class i, a_i is the number of true samples of class i, b_i is the number of samples predicted as class i, and n is the total number of samples.
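A compact way to compute all three evaluation indexes is via a confusion matrix, as sketched below; the function name is illustrative.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy, average accuracy and Kappa from label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                  # rows: true, cols: predicted
    n = cm.sum()
    oa = np.trace(cm) / n                              # overall accuracy p_o
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    aa = per_class.mean()                              # average accuracy
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                       # consistency coefficient
    return oa, aa, kappa
```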
The number of training epochs is set to 200, and the number of image blocks input to the network at a time (the batch size) is 64; the upper limit of the batch size is mainly determined by the performance of the computer's graphics processor, and in general a larger batch size makes the network training more stable. The learning rate of the training process is set to 0.005, which ensures fast fitting of the network without causing overfitting. The Adam optimizer is selected as the network optimizer: it is simple to implement, computationally efficient and memory-light, its parameter updates are unaffected by gradient scaling transformations, and the updates are stable. The loss function threshold is set to about 0.005; when the loss value falls below 0.005, training of the whole network can be considered essentially complete.
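A training loop with the hyper-parameters given above might look like the following PyTorch sketch; the `train` helper and the NLLLoss choice (pairing with a LogSoftmax output layer) are assumptions consistent with, but not dictated by, the text.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=200, lr=0.005, loss_threshold=0.005):
    """Adam optimizer, lr=0.005, up to 200 epochs, early stop below 0.005."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.NLLLoss()                 # pairs with a LogSoftmax output
    for _ in range(epochs):
        total = 0.0
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item() * x.size(0)
        mean_loss = total / len(loader.dataset)
        if mean_loss < loss_threshold:       # training considered complete
            break
    return mean_loss
```

The batch size of 64 would be set on the DataLoader rather than in the loop itself.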
The implementations of the convolution, activation function and concatenation operations are algorithms well known to those skilled in the art; their specific flows and methods can be found in corresponding textbooks or technical literature.
The hyperspectral image classification method based on the multi-scale mixed convolution network can classify hyperspectral images; it exploits the multi-scale, multi-level spatial-spectral feature information of the hyperspectral image and establishes connections between long-distance pixels, thereby improving the classification accuracy of hyperspectral images. The feasibility and superiority of the method are further verified by computing the related indexes of the images obtained by existing methods.
On the Indian Pines data set (IN), the Pavia University data set (UP) and the Salinas data set (SA), a comparison of the related indexes between the prior art and the proposed method of the present invention is shown in fig. 9; in each data set, 10% of the sample data is used as the training set, 10% as the validation set, and the rest as the test set. On all three data sets, the proposed method achieves higher values of the three indexes (overall accuracy, average accuracy and consistency coefficient) than the existing methods, which further indicates that the proposed method has a better classification effect.
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention and the present invention is not limited thereto; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.
Claims (6)
1. A hyperspectral image classification method based on a multi-scale mixed convolution network, characterized in that the method specifically comprises the following steps:
step 1, hyperspectral image: using the disclosed hyperspectral image dataset;
step 2, image preprocessing: performing data dimension reduction on the hyperspectral image in the step 1, and performing sample block taking on the image after dimension reduction to obtain a hyperspectral sample block;
step 3, constructing a multi-scale mixed convolution network model: the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, a mixed attention block, a three-dimensional convolution layer, a two-dimensional convolution layer, a one-dimensional convolution layer, two fully connected blocks and a classifier; a hyperspectral sample block is input into the two directly connected multi-scale three-dimensional convolution blocks to extract spatial-spectral features containing multi-scale information; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are reconstructed and then respectively input into the two multi-scale two-dimensional convolution blocks to further extract multi-scale, multi-level spatial features; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are also input into the three-dimensional convolution layer, and the convolution result is reconstructed and then input into the two-dimensional convolution layer to further extract spectral features; the multi-scale, multi-level spatial features and the spectral features are fused and then input into the mixed attention block, so that the network attends only to the useful spatial and spectral information; the attended spatial and spectral information is reconstructed and then input into the one-dimensional convolution layer to obtain features in vector form; finally, these are input into the two fully connected blocks, and the classification result is output through the classifier;
step 4, selecting a loss function and an evaluation index: a loss function is calculated from the classification result and the labels, and model parameter training is considered complete when the number of training iterations reaches a set threshold or the value of the loss function reaches a set range; meanwhile, evaluation indexes are selected to measure the accuracy of the algorithm and evaluate the performance of the system;
step 5, saving a training model: the group of model parameters with the best effect in the training process is selected and solidified; thereafter, when a hyperspectral image classification operation is needed, the hyperspectral image can be input directly into the network to obtain the final classified image.
2. The hyperspectral image classification method based on the multi-scale mixed convolution network according to claim 1, characterized in that the public data sets adopted in the step 1 are: an Indian Pines data set (IN), a Pavia University data set (UP), and a Salinas data set (SA).
3. The hyperspectral image classification method based on the multi-scale mixed convolution network according to claim 1, characterized in that the data dimension reduction method in the step 2 uses principal component analysis (PCA, Principal Component Analysis), and the dimension reduction process is as follows:
the original hyperspectral image I1 of dimension W×H×C1 is subjected to covariance-matrix eigendecomposition and converted into a new hyperspectral image I2 of dimension W×H×C2, where W is the image width, H is the image height, C1 is the number of original image channels, and C2 is the number of bands after conversion; the dimension reduction operation reduces the redundancy of the data features and also reduces the number of computation parameters.
4. A hyperspectral image classification method based on a multi-scale mixed convolution network as claimed in claim 3 wherein: the block taking operation process of the sample in the step 2 is as follows:
the new hyperspectral image I2 is cut into three-dimensional sample blocks of size w×w×C2, which are input into the network model, where w is the window size.
5. The hyperspectral image classification method based on the multi-scale mixed convolution network according to claim 1, characterized in that: the multi-scale three-dimensional convolution block in the step 3 consists of three branches, small-scale, medium-scale and large-scale; the small-scale branch consists, in order, of a three-dimensional convolution layer, a batch normalization layer and an activation function layer; the medium-scale and large-scale branches each consist, in order, of a three-dimensional convolution layer, an activation function layer, a three-dimensional convolution layer, a batch normalization layer and an activation function layer; the tensors obtained from the three branches are concatenated along the first dimension; the multi-scale two-dimensional convolution block likewise consists of small-scale, medium-scale and large-scale branches, each consisting, in order, of a two-dimensional convolution layer, a batch normalization layer and an activation function layer, and the tensors obtained from the three branches are concatenated along the first dimension; the mixed attention block comprises a spectral attention block and a spatial attention block connected in series; in the spectral attention block, the input first passes through two branches performing global average pooling and global max pooling, each pooling result is input, in order, into a one-dimensional convolution layer and activation function layer 1, the resulting tensors are concatenated along the first dimension and then input, in order, into linear layer 1, activation function layer 2, linear layer 2, activation function layer 1, a one-dimensional convolution layer and activation function layer 1, and finally the tensor is multiplied element-wise with the input features; in the spatial attention block, the input first passes through two branches performing average pooling and max pooling, each pooling result is input, in order, into a two-dimensional convolution layer and a batch normalization layer, the resulting tensors are concatenated along the first dimension and then input, in order, into a two-dimensional convolution layer and an activation function layer; the fully connected block consists, in order, of a linear layer, an activation function layer and a Dropout layer.
6. The hyperspectral image classification method based on the multi-scale mixed convolution network according to claim 1, characterized in that: the loss function in the step 4 is the cross-entropy loss function; the choice of loss function affects the quality of the model, as it should truly reflect the difference between the predicted and actual values and correctly feed back the quality of the model; the evaluation indexes are the overall accuracy, the average accuracy and the consistency coefficient, which effectively evaluate the classification quality and measure the effect of the classification network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310492375.3A CN116524265A (en) | 2023-05-04 | 2023-05-04 | Hyperspectral image classification method based on multi-scale mixed convolution network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116524265A true CN116524265A (en) | 2023-08-01 |
Family
ID=87397179
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |