CN116524265A - Hyperspectral image classification method based on multi-scale mixed convolution network - Google Patents

Hyperspectral image classification method based on multi-scale mixed convolution network

Info

Publication number
CN116524265A
CN116524265A
Authority
CN
China
Prior art keywords
scale
layer
dimensional convolution
hyperspectral image
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310492375.3A
Other languages
Chinese (zh)
Inventor
葛微
陈博文
陈婷婷
王鹏
詹伟达
唐雁峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Research Institute Of Changchun University Of Technology
Original Assignee
Chongqing Research Institute Of Changchun University Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Research Institute Of Changchun University Of Technology filed Critical Chongqing Research Institute Of Changchun University Of Technology
Priority to CN202310492375.3A priority Critical patent/CN116524265A/en
Publication of CN116524265A publication Critical patent/CN116524265A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image classification processing, and in particular relates to a hyperspectral image classification method based on a multi-scale mixed convolution network. The method comprises the following steps. Step 1, hyperspectral image: a publicly available hyperspectral image dataset is used. Step 2, image preprocessing: data dimension reduction is performed on the hyperspectral image of step 1, and sample blocks are extracted from the dimension-reduced image to obtain hyperspectral sample blocks. According to the invention, the spatial-spectral joint features of the hyperspectral image are extracted by several multi-scale 3D convolution modules, so that spectral and spatial features at different scales are fully fused and the classification performance on hyperspectral images is improved.

Description

Hyperspectral image classification method based on multi-scale mixed convolution network
Technical Field
The invention relates to the technical field of image classification processing, and in particular to a hyperspectral image classification method based on a multi-scale mixed convolution network.
Background
Hyperspectral remote sensing uses many narrow electromagnetic bands to acquire data about a measured object and is at the leading edge of current remote sensing technology. A hyperspectral image combines surface image information with spectral information, and offers advantages such as a large amount of spectral information, nanometer-scale spectral resolution, and the unification of image and spectrum. Classifying hyperspectral images, acquired by imaging spectrometers carried on aircraft and satellites, with deep learning techniques has become an emerging research field.
The basis of hyperspectral image classification is spectral information and spatial information, and their joint use is now widespread in the field. Chinese patent publication CN113837314A, entitled "Hyperspectral image classification method based on a mixed convolutional neural network", discloses a network model that uses a single 3D convolution model to simultaneously extract the spectral and spatial features of the preprocessed hyperspectral image, a single 2D convolution model to further extract spatial features, and a single 1D convolution model to process the output information. That method does not consider the relations between features at different scales and different levels, does not exploit the relations between distant pixel blocks, and its classification performance leaves room for improvement.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides a hyperspectral image classification method based on a multi-scale mixed convolution network, which addresses the problem of insufficient classification accuracy.
(II) Technical solution
The invention adopts the following technical solution to achieve the above purpose: a hyperspectral image classification method based on a multi-scale mixed convolution network, which specifically comprises the following steps:
Step 1, hyperspectral image: a publicly available hyperspectral image dataset is used;
step 2, image preprocessing: data dimension reduction is performed on the hyperspectral image of step 1, and sample blocks are extracted from the dimension-reduced image to obtain hyperspectral sample blocks;
step 3, constructing a multi-scale mixed convolution network model: the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, a mixed attention block, a three-dimensional convolution layer, a two-dimensional convolution layer, a one-dimensional convolution layer, two full connection blocks and a classifier. The hyperspectral sample blocks are input into the two directly connected multi-scale three-dimensional convolution blocks to extract spatial-spectral features containing multi-scale information; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are reshaped and then respectively input into the two multi-scale two-dimensional convolution blocks to further extract multi-scale, multi-level spatial features; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are also input into the three-dimensional convolution layer, and the convolution result is reshaped and input into the two-dimensional convolution layer to further extract spectral features; the multi-scale, multi-level spatial features and the spectral features are fused and then input into the mixed attention block, so that the network attends only to useful spatial and spectral information; the attended spatial and spectral information is reshaped and input into the one-dimensional convolution layer to obtain features in vector form; finally, the features are input into the two full connection blocks, and the classification result is output by the classifier;
Step 4, selecting a loss function and an evaluation index: the model parameter training is considered to be completed by calculating the classified result and the loss function of the label until the training times reach a set threshold value or the value of the loss function reaches a set range, and meanwhile, an evaluation index is selected to measure the accuracy of the algorithm and evaluate the performance of the system;
step 5, saving the trained model: the set of model parameters with the best performance during training is selected and frozen; thereafter, whenever hyperspectral image classification is required, a hyperspectral image can be fed directly into the network to obtain the final classified image.
Further, the publicly available datasets adopted in step 1 are: the Indian Pines dataset (IN), the Pavia University dataset (UP) and the Salinas dataset (SA).
Further, the data dimension reduction in step 2 uses principal component analysis (PCA), and the dimension reduction process is as follows:
covariance-matrix eigendecomposition is performed on the original hyperspectral image I1 of dimension W×H×C1, converting it into a new hyperspectral image I2 of dimension W×H×C2, where W is the image width, H is the image height, C1 is the number of original image channels, and C2 is the number of bands after conversion. The dimension reduction operation reduces the redundancy of the data features and also reduces the computational parameters.
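By way of illustration, the dimension reduction described above can be sketched in Python (a minimal sketch assuming scikit-learn's PCA, which performs the covariance eigendecomposition internally; the function and variable names are illustrative, not part of the disclosure):

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_bands(image: np.ndarray, n_components: int) -> np.ndarray:
    """Reduce a W x H x C1 hyperspectral cube to W x H x C2 via PCA."""
    w, h, c1 = image.shape
    pixels = image.reshape(-1, c1)        # Q = W*H pixel spectra, one per row
    pca = PCA(n_components=n_components)  # eigendecomposition of the covariance
    reduced = pca.fit_transform(pixels)   # project onto the top C2 components
    return reduced.reshape(w, h, n_components)

# Example: Indian Pines, 145 x 145 x 200 -> 145 x 145 x 30
# I2 = reduce_bands(I1, n_components=30)
```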
Further, the sample block extraction in step 2 proceeds as follows:
the new hyperspectral image I2 is cut into three-dimensional image blocks of size w×w×C2, which are fed into the network model, where w is the window size.
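For concreteness, the block extraction could look like the following sketch (edge padding so that every pixel yields a full w×w block is an assumption; the patent does not specify border handling):

```python
import numpy as np

def extract_blocks(cube: np.ndarray, w: int = 25) -> np.ndarray:
    """Cut a W x H x C2 cube into one w x w x C2 block per pixel."""
    m = w // 2
    padded = np.pad(cube, ((m, m), (m, m), (0, 0)), mode="edge")
    blocks = [
        padded[r:r + w, c:c + w, :]
        for r in range(cube.shape[0])
        for c in range(cube.shape[1])
    ]
    return np.stack(blocks)  # shape (W*H, w, w, C2)
```

Each block would then be labeled with the class of its center pixel, as described in the detailed embodiment below.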
Further, the multi-scale three-dimensional convolution block in step 3 consists of a small-scale, a medium-scale and a large-scale branch: the small-scale branch consists in sequence of a three-dimensional convolution layer, a batch normalization layer and an activation function layer; the medium-scale and large-scale branches each consist in sequence of a three-dimensional convolution layer, an activation function layer, a three-dimensional convolution layer, a batch normalization layer and an activation function layer; and the tensors produced by the three branches are concatenated along the first dimension. The multi-scale two-dimensional convolution block likewise consists of a small-scale, a medium-scale and a large-scale branch; each branch consists in sequence of a two-dimensional convolution layer, a batch normalization layer and an activation function layer, and the tensors produced by the three branches are concatenated along the first dimension. The mixed attention block comprises a spectral attention block and a spatial attention block connected in series. In the spectral attention block, the input first passes through two branches performing global average pooling and global max pooling; each pooling result is fed in sequence into a one-dimensional convolution layer and activation function layer 1; the resulting tensors are concatenated along the first dimension and fed in sequence into linear layer 1, activation function layer 2, linear layer 2, activation function layer 1, a one-dimensional convolution layer and activation function layer 1; finally, the resulting tensor is multiplied element-wise with the input features. In the spatial attention block, the input first passes through two branches performing average pooling and max pooling; each pooling result is fed in sequence into a two-dimensional convolution layer and a batch normalization layer; the resulting tensors are concatenated along the first dimension and then fed in sequence into a two-dimensional convolution layer and an activation function layer. The full connection block consists in sequence of a linear layer, an activation function layer and a Dropout layer.
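A minimal PyTorch sketch of one multi-scale three-dimensional convolution block follows (channel counts are taken from the first block of the detailed embodiment; the padding that makes the three branch outputs concatenable is an assumption):

```python
import torch
import torch.nn as nn

class MultiScale3DBlock(nn.Module):
    """Small-, medium- and large-scale branches, concatenated channel-wise."""
    def __init__(self, in_ch: int = 1):
        super().__init__()
        # small scale: 3D conv -> batch norm -> Mish activation
        self.small = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=1),
            nn.BatchNorm3d(16), nn.Mish())
        # medium scale: 1x1x1 conv -> Mish -> 3x3x3 conv -> batch norm -> Mish
        self.medium = nn.Sequential(
            nn.Conv3d(in_ch, 8, kernel_size=1), nn.Mish(),
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16), nn.Mish())
        # large scale: 1x1x1 conv -> Mish -> 5x5x5 conv -> batch norm -> Mish
        self.large = nn.Sequential(
            nn.Conv3d(in_ch, 8, kernel_size=1), nn.Mish(),
            nn.Conv3d(8, 32, kernel_size=5, padding=2),
            nn.BatchNorm3d(32), nn.Mish())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # concatenate the three branch outputs along the channel ("first") dim
        return torch.cat([self.small(x), self.medium(x), self.large(x)], dim=1)
```

The multi-scale two-dimensional convolution block is analogous, with Conv2d/BatchNorm2d layers and a single convolution per branch.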
Further, the loss function in step 4 is the cross-entropy loss function; the choice of loss function affects the quality of the model, and it should truly reflect the difference between predicted and actual values and correctly feed back the quality of the model. The evaluation indices selected are overall accuracy, average accuracy and consistency, which effectively evaluate the quality of the classification and measure the effect of the classification network.
(III) Beneficial effects
Compared with the prior art, the hyperspectral image classification method based on the multi-scale mixed convolution network has the following beneficial effects:
1. The invention extracts the spatial-spectral joint features of the hyperspectral image with several multi-scale 3D convolution modules, so that spectral and spatial features at different scales are fully fused and the classification performance on hyperspectral images is improved.
2. The invention employs several multi-scale 2D convolution modules to further extract spatial features at different levels, reducing the amount of computation while further improving the classification accuracy of hyperspectral images.
3. The invention adds a mixed attention module to establish connections between distant pixel blocks, further improving the classification accuracy of hyperspectral images.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a network architecture of the present invention;
FIG. 3 is a schematic diagram of the specific composition of the multi-scale 3D convolution module of the present invention;
FIG. 4 is a schematic diagram of the specific composition of the multi-scale 2D convolution module of the present invention;
FIG. 5 is a schematic diagram showing the specific components of the hybrid attention module of the present invention;
FIG. 6 is a schematic diagram showing the specific composition of a spectral attention block in the hybrid attention module of the present invention;
FIG. 7 is a schematic diagram showing the specific composition of a spatial attention block in the hybrid attention module of the present invention;
FIG. 8 is a schematic diagram showing the specific composition of the full connection block of the present invention;
FIG. 9 is a graph comparing relevant indices of the present invention on three datasets.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
As shown in fig. 1-9, a hyperspectral image classification method based on a multi-scale mixed convolution network according to an embodiment of the present invention specifically includes the following steps:
step 1, hyperspectral image: the disclosed hyperspectral image dataset is employed.
Step 2, image preprocessing: data dimension reduction is performed on the hyperspectral image of step 1, and sample blocks are extracted from the dimension-reduced image to obtain hyperspectral sample blocks. Specifically, the original hyperspectral image I1 is reduced in dimension by principal component analysis, and sample blocks are extracted from the new dimension-reduced hyperspectral image I2 to obtain three-dimensional image blocks.
Because hyperspectral images are large and contain numerous bands, a dimension reduction operation on the data is necessary. Principal component analysis (PCA) is a statistical method that converts a group of possibly correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation; the converted variables are called principal components.
The sample block extraction is implemented as follows: in the spatial dimension, the new hyperspectral image I2 is cut into three-dimensional image blocks of size w×w×B, which are fed into the network model, where w is the window size and B is the number of bands after dimension reduction; each sample block is labeled with the class of its center pixel.
Step 3, constructing the multi-scale mixed convolution network model: the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, a mixed attention block, a three-dimensional convolution layer, a two-dimensional convolution layer, a one-dimensional convolution layer, two full connection blocks and a classifier. The hyperspectral sample blocks are input into the two directly connected multi-scale three-dimensional convolution blocks to extract spatial-spectral features containing multi-scale information; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are reshaped and then respectively input into the two multi-scale two-dimensional convolution blocks to further extract multi-scale, multi-level spatial features; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are also input into the three-dimensional convolution layer, and the convolution result is reshaped and input into the two-dimensional convolution layer to further extract spectral features; the multi-scale, multi-level spatial features and the spectral features are fused and then input into the mixed attention block, so that the network attends only to useful spatial and spectral information; the attended spatial and spectral information is reshaped and input into the one-dimensional convolution layer to obtain features in vector form; finally, the features are input into the two full connection blocks, and the classification result is output by the classifier. The multi-scale three-dimensional convolution block consists of a small-scale, a medium-scale and a large-scale branch: the small-scale branch consists in sequence of a three-dimensional convolution layer, a batch normalization layer and an activation function layer; the medium-scale and large-scale branches each consist in sequence of a three-dimensional convolution layer, an activation function layer, a three-dimensional convolution layer, a batch normalization layer and an activation function layer; and the tensors produced by the three branches are concatenated along the first dimension. The multi-scale two-dimensional convolution block likewise consists of a small-scale, a medium-scale and a large-scale branch; each branch consists in sequence of a two-dimensional convolution layer, a batch normalization layer and an activation function layer, and the tensors produced by the three branches are concatenated along the first dimension. The mixed attention block comprises a spectral attention block and a spatial attention block connected in series. In the spectral attention block, the input first passes through two branches performing global average pooling and global max pooling; each pooling result is fed in sequence into a one-dimensional convolution layer and activation function layer 1; the resulting tensors are concatenated along the first dimension and fed in sequence into linear layer 1, activation function layer 2, linear layer 2, activation function layer 1, a one-dimensional convolution layer and activation function layer 1; finally, the resulting tensor is multiplied element-wise with the input features. In the spatial attention block, the input first passes through two branches performing average pooling and max pooling; each pooling result is fed in sequence into a two-dimensional convolution layer and a batch normalization layer; the resulting tensors are concatenated along the first dimension and then fed in sequence into a two-dimensional convolution layer and an activation function layer. The full connection block consists in sequence of a linear layer, an activation function layer and a Dropout layer.
The batch normalization layer uses a normalization step to pull the distribution of the input values of every neuron in each network layer back to a standard normal distribution with mean 0 and variance 1, so that the activation inputs fall in the region where the nonlinear function is sensitive to its input; the outputs of the network therefore do not grow too large, relatively large gradients are obtained, the vanishing-gradient problem is avoided, and the larger gradients also mean faster learning convergence. The Dropout layer stops some neurons from working with a set probability during forward propagation: training then proceeds, updating and retaining the weight parameters of the neurons that remain active; after all parameters have been updated, a new subset of neurons is deactivated according to the same set probability and training continues, updating the parameters of newly active neurons that were trained before and leaving the weights of the currently deactivated neurons at their last values; this process is repeated until training ends, which prevents the network from overfitting during learning.
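As an illustration of the full connection block described above (linear layer, activation function layer, Dropout layer), a short sketch follows; the feature widths are placeholders, while the Dropout coefficient of 0.2 comes from the detailed embodiment:

```python
import torch.nn as nn

def full_connection_block(in_features: int, out_features: int) -> nn.Sequential:
    """Linear -> Mish -> Dropout; Dropout is active only in training mode."""
    return nn.Sequential(
        nn.Linear(in_features, out_features),
        nn.Mish(),
        nn.Dropout(p=0.2),  # each neuron is deactivated with probability 0.2
    )
```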
Step 4, selecting a loss function and evaluation indices: the loss function between the classification result image and the labels is computed; training of the model parameters is considered complete when the number of training iterations reaches a set threshold or the loss value falls within a set range, and the model parameters are saved. Evaluation indices are also selected to measure the accuracy of the algorithm and evaluate the performance of the system. The choice of loss function affects the quality of the model, and it should truly reflect the difference between the predicted and true values and correctly feed back the quality of the model. The evaluation indices selected are overall accuracy, average accuracy and consistency, which effectively evaluate the quality of the classification and measure the effect of the classification network.
Step 5, saving the trained model: the set of model parameters with the best performance during training is selected and frozen; thereafter, whenever hyperspectral image classification is required, a hyperspectral image can be fed directly into the network to obtain the final classified image.
Further, step 1 selects the Indian Pines dataset (IN), the Pavia University dataset (UP) and the Salinas dataset (SA). The Indian Pines dataset (IN) is a hyperspectral image acquired by the airborne visible/infrared imaging spectrometer (AVIRIS) over northwest Indiana; the spatial size of the image is 145×145 with 220 bands, and the spectral and spatial resolutions are 10 nm and 20 m. After background pixels are removed, 10249 spatial pixels are generally used for experiments, covering 16 ground-truth classes; of the 220 bands, 20 are unusable, and only the remaining 200 bands are used for the experimental research. The Pavia University dataset (UP) was acquired by the ROSIS sensor over Pavia, northern Italy; the spatial size is 610×340 with a spatial resolution of 1.3 m, and the dataset is divided into 9 classes; of its 115 bands, 12 noisy bands are removed, leaving 103 usable bands. The Salinas dataset (SA) is a hyperspectral image acquired by the AVIRIS sensor over the Salinas Valley, California; the spatial size of the image is 512×217 with a spatial resolution of 3.7 m; the ground objects comprise 16 classes; of the 224 bands, 20 water-absorption bands are removed, and the remaining 204 bands are used for hyperspectral image classification experiments.
Further, in step 2, taking the Indian Pines dataset as an example, the original hyperspectral image I1 is first reduced to 145×145×30: the covariance matrix of the original hyperspectral image is computed and its eigenvalues λ1 ≥ λ2 ≥ … ≥ λ200 are obtained; a threshold θ is set and the first P principal components larger than θ are selected; the unit eigenvectors corresponding to these P eigenvalues are combined into a matrix and transposed, and the original hyperspectral image is transformed by the transposed matrix to obtain the dimension-reduced hyperspectral image. The dimension-reduced image I2 is then cut into blocks, yielding three-dimensional image blocks of size 25×25×30. The band mean and covariance matrix are computed as

$$\mu = \frac{1}{Q}\sum_{i=1}^{Q} X_i, \qquad \Sigma = \frac{1}{Q}\sum_{i=1}^{Q}\left(X_i-\mu\right)\left(X_i-\mu\right)^{T}$$

where X_i denotes the i-th pixel of the original hyperspectral image (a B-dimensional vector whose j-th entry X_j is the value of the j-th band), Q denotes the number of pixels, and B denotes the number of bands.
further, in the step 3, the structure of the multi-scale mixed convolution network model is shown in fig. 2, and the whole network model includes two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, one mixed attention block, one three-dimensional convolution layer, one two-dimensional convolution layer, one-dimensional convolution layer, two full connection blocks and one classifier. The structure of the multi-scale three-dimensional convolution block is shown in fig. 3, wherein all activation function layers use a mich function; in a small-scale branch of a first multi-scale three-dimensional convolution block, the size of convolution kernels in a convolution layer is 1 multiplied by 1, and the number of the convolution kernels is 16; in a mesoscale branch of the first multi-scale three-dimensional convolution block, the size of convolution kernels in the first three-dimensional convolution layer is 1 multiplied by 1, the number of convolution kernels is 8, the size of convolution kernels in the second three-dimensional convolution layer is 3 multiplied by 3, and the number of convolution kernels is 16; in a large-scale branch of a first multi-scale three-bit convolution block, the size of convolution kernels in a first three-dimensional convolution layer is 1 multiplied by 1, the number of convolution kernels is 8, the size of convolution kernels in a second three-dimensional convolution layer is 5 multiplied by 5, and the number of convolution kernels is 32; in the small-scale branch of the second multi-scale three-dimensional convolution block, the size of convolution kernels in the convolution layer is 1 multiplied by 1, and the number of the convolution kernels is 32; in a mesoscale branch of the second multi-scale three-dimensional convolution block, the size of convolution kernels in the first three-dimensional convolution layer is 1 multiplied by 1, the number of convolution kernels is 16, the size of convolution kernels in the second three-dimensional convolution layer is 3 multiplied by 3, and the number of convolution kernels is 32; in the large-scale branch of the second multi-scale three-dimensional convolution block, the size of convolution kernels in the first three-dimensional convolution layer is 1 multiplied by 1, the number of convolution kernels is 16, the size of convolution kernels in the second three-dimensional convolution layer is 5 multiplied by 5, and the number of convolution kernels is 64; the structure of the multi-scale two-dimensional convolution block is shown in fig. 
4, wherein all activation function layers use a mich function; in a small-scale branch of a first multi-scale two-dimensional convolution block, the convolution kernel size in the convolution layer is 1 multiplied by 1, and the number of the convolution kernels is 16; in a mesoscale branch of a first multiscale two-dimensional convolution block, the size of convolution kernels in a convolution layer is 3 multiplied by 3, and the number of the convolution kernels is 16; in a large-scale branch of the first multi-scale two-dimensional convolution block, the size of convolution kernels in the convolution layer is 5 multiplied by 5, and the number of the convolution kernels is 16; in a small-scale branch of the second multi-scale two-dimensional convolution block, the convolution kernel size in the convolution layer is 1 multiplied by 1, and the number of the convolution kernels is 32; in a mesoscale branch of the second multiscale two-dimensional convolution block, the size of convolution kernels in the convolution layer is 3 multiplied by 3, and the number of the convolution kernels is 32; in the large-scale branch of the second multi-scale two-dimensional convolution block, the size of convolution kernels in the convolution layer is 5 multiplied by 5, and the number of the convolution kernels is 32; the size of the convolution kernel in the three-dimensional convolution layer connected with the second multi-scale three-dimensional convolution block is 1 multiplied by 1, the number of the convolution kernels is 64, the size of the convolution kernel in the two-dimensional convolution layer connected with the three-dimensional convolution layer is 3 multiplied by 3, and the number of the convolution kernels is 64; the structure of the mixed attention block is shown in fig. 5; the structure of the spectrum attention block is shown in fig. 6, the size of a convolution kernel in a one-dimensional convolution layer in two branches is B (B is the number of channels after dimension reduction), a Sigmoid function is used for an activation function layer 1, a Mish function is used for an activation function layer 2, and the size of the convolution kernel in the subsequent one-dimensional convolution layer is 2B; the structure of the spatial attention block is shown in fig. 7, wherein the convolution kernel size of the two-dimensional convolution layers of the two branches is 1×1, the convolution kernel size in the subsequent two-dimensional convolution layer is 3×3, and the activation function layer uses a Sigmoid function; the structure of the fully connected layer is shown in fig. 8, wherein the activation function layer uses a Mish function, and the Dropout coefficient is set to 0.2; the above-mentioned Mish activation function is a non-monotonic smooth activation function, which can achieve better accuracy and generalization; the Sigmoid activation function is less affected by noise data; the LogSoftmax classifier is selected by the classifier, so that the operation speed can be increased, and the data stability can be improved; the Sigmoid function, the mix function, and the LogSoftmax function are defined as follows:
$$f(x)_{\mathrm{Sigmoid}}=\frac{1}{1+e^{-x}}, \qquad f(x)_{\mathrm{Mish}}=x\cdot\tanh\left(\ln\left(1+e^{x}\right)\right), \qquad f(x_i)_{\mathrm{LogSoftmax}}=\ln\frac{e^{x_i}}{\sum_{j}e^{x_j}}$$

where x denotes the input feature information, x_i denotes the predicted value for the i-th class, and the sum over x_j runs over the predicted values of all classes.
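A non-authoritative sketch of the spectral attention block along the lines described above follows; how the stated kernel sizes B and 2B map onto tensor shapes is not fully recoverable from this rendering, so small 1D kernels are used as a stand-in assumption:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Pooled band descriptors produce weights that gate the input bands."""
    def __init__(self, b: int):
        super().__init__()
        def conv_sig():
            return nn.Sequential(
                nn.Conv1d(1, 1, kernel_size=3, padding=1), nn.Sigmoid())
        self.branch_avg, self.branch_max = conv_sig(), conv_sig()
        self.fc = nn.Sequential(nn.Linear(2 * b, b), nn.Mish(),
                                nn.Linear(b, b), nn.Sigmoid())
        self.out = conv_sig()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, B, H, W); global average / max pooling over the spatial dims
        avg = x.mean(dim=(2, 3)).unsqueeze(1)                      # (N, 1, B)
        mx = x.amax(dim=(2, 3)).unsqueeze(1)                       # (N, 1, B)
        z = torch.cat([self.branch_avg(avg), self.branch_max(mx)], dim=2)
        w = self.fc(z.squeeze(1)).unsqueeze(1)                     # (N, 1, B)
        w = self.out(w).squeeze(1)                                 # (N, B)
        return x * w.unsqueeze(-1).unsqueeze(-1)  # gate each band of the input
```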
Further, in step 4 the loss function is computed between the network output and the labels; the cross-entropy loss function is selected, defined as follows:

$$C=-\frac{1}{n}\sum_{x}\left[y\ln a+(1-y)\ln(1-a)\right]$$

where C denotes the cost, x ranges over the samples, y denotes the actual value, a denotes the output value, and n denotes the total number of samples.
The overall accuracy is an index measuring the classification accuracy of the model over all samples, the average accuracy is an index indicating the classification accuracy of the model on each individual class, and the consistency (kappa) coefficient measures the agreement between the predicted and true values. They are computed as follows:

$$OA=\frac{TP+TN}{TP+TN+FP+FN}, \qquad AA=\frac{1}{C}\sum_{i=1}^{C}\frac{T_i}{a_i}, \qquad Kappa=\frac{OA-p_e}{1-p_e}, \quad p_e=\frac{\sum_{i=1}^{C}a_i b_i}{n^{2}}$$

where TP denotes the positive samples correctly classified by the model, FN the positive samples incorrectly classified, FP the negative samples incorrectly classified, and TN the negative samples correctly classified; C is the total number of classes, T_i is the number of correctly classified samples of class i, a_i is the number of true samples of class i, b_i is the number of samples predicted as class i, and n is the total number of samples.
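A worked sketch of these three indices, assuming integer label vectors (scikit-learn's accuracy_score and cohen_kappa_score compute the same quantities):

```python
import numpy as np

def classification_indices(y_true: np.ndarray, y_pred: np.ndarray):
    """Return overall accuracy, average accuracy and the kappa coefficient."""
    classes = np.unique(y_true)
    n = y_true.size
    oa = np.mean(y_true == y_pred)                    # overall accuracy
    # average accuracy: mean over classes of T_i / a_i (per-class recall)
    aa = np.mean([np.mean(y_pred[y_true == c] == c) for c in classes])
    # chance agreement p_e = sum_i(a_i * b_i) / n^2, then kappa
    pe = sum(np.sum(y_true == c) * np.sum(y_pred == c) for c in classes) / n**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```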
The number of training epochs is set to 200, and 64 images are fed into the network in each batch; the upper limit of the batch size is mainly determined by the performance of the computer's graphics processor, and in general a larger batch makes the network more stable. The learning rate of the training process is set to 0.005, which ensures rapid fitting of the network without causing overfitting. The Adam optimizer is selected as the network optimizer; it is simple to implement, computationally efficient and memory-light, its parameter updates are unaffected by gradient scaling transformations, and its updates are stable. The loss threshold is set to about 0.005; when the loss falls below 0.005, training of the whole network can be considered essentially complete.
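Under these settings, the training loop might be sketched as follows; model and train_loader are assumed to exist, and NLLLoss is paired with the LogSoftmax classifier so that the two together give the cross-entropy loss:

```python
import torch
import torch.nn as nn

def train(model, train_loader, device: str = "cuda") -> None:
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
    criterion = nn.NLLLoss()  # expects log-probabilities from LogSoftmax
    for epoch in range(200):                         # 200 training epochs
        epoch_loss = 0.0
        for blocks, labels in train_loader:          # batches of 64 sample blocks
            blocks, labels = blocks.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(blocks), labels)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item() * labels.size(0)
        epoch_loss /= len(train_loader.dataset)
        if epoch_loss < 0.005:   # loss threshold from the text: training done
            break
```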
The implementations of convolution, activation functions and concatenation are algorithms well known to those skilled in the art; the specific procedures and methods can be found in the corresponding textbooks or technical literature.
The hyperspectral image classification method based on the multi-scale mixed convolution network can classify hyperspectral images by exploiting their multi-scale, multi-level spatial-spectral feature information and by establishing connections between distant pixels, thereby improving the classification accuracy of hyperspectral images. The feasibility and superiority of the method are further verified by computing the relevant indices against images obtained by existing methods.
On the Indian Pines (IN), Pavia University (UP) and Salinas (SA) datasets, the comparison of the relevant indices between the prior art and the proposed method is shown in FIG. 9. In each dataset, 10% of the samples are used as the training set, 10% as the validation set, and the rest as the test set. On all three datasets, the values of the three indices of the proposed method (overall accuracy, average accuracy and consistency coefficient) are higher than those of existing methods, further indicating that the proposed method achieves a better classification effect.
Finally, it should be noted that the foregoing description is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of the technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in its scope of protection.

Claims (6)

1. A hyperspectral image classification method based on a multi-scale mixed convolution network, characterized by comprising the following steps:
step 1, hyperspectral image: a publicly available hyperspectral image dataset is used;
step 2, image preprocessing: data dimension reduction is performed on the hyperspectral image of step 1, and sample blocks are extracted from the dimension-reduced image to obtain hyperspectral sample blocks;
step 3, constructing a multi-scale mixed convolution network model: the whole network model comprises two multi-scale three-dimensional convolution blocks, two multi-scale two-dimensional convolution blocks, a mixed attention block, a three-dimensional convolution layer, a two-dimensional convolution layer, a one-dimensional convolution layer, two full connection blocks and a classifier; the hyperspectral sample blocks are input into the two directly connected multi-scale three-dimensional convolution blocks to extract spatial-spectral features containing multi-scale information; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are reshaped and then respectively input into the two multi-scale two-dimensional convolution blocks to further extract multi-scale, multi-level spatial features; the spatial-spectral features extracted by the two multi-scale three-dimensional convolution blocks are also input into the three-dimensional convolution layer, and the convolution result is reshaped and input into the two-dimensional convolution layer to further extract spectral features; the multi-scale, multi-level spatial features and the spectral features are fused and then input into the mixed attention block, so that the network attends only to useful spatial and spectral information; the attended spatial and spectral information is reshaped and input into the one-dimensional convolution layer to obtain features in vector form; finally, the features are input into the two full connection blocks, and the classification result is output by the classifier;
step 4, selecting a loss function and evaluation indices: the loss function between the classification result and the labels is computed; training of the model parameters is considered complete when the number of training iterations reaches a set threshold or the loss value falls within a set range; evaluation indices are also selected to measure the accuracy of the algorithm and evaluate the performance of the system;
step 5, saving the trained model: the set of model parameters with the best performance during training is selected and frozen; thereafter, whenever hyperspectral image classification is required, a hyperspectral image can be fed directly into the network to obtain the final classified image.
2. The hyperspectral image classification method based on a multi-scale mixed convolution network according to claim 1, characterized in that: the publicly available datasets adopted in step 1 are the Indian Pines dataset (IN), the Pavia University dataset (UP) and the Salinas dataset (SA).
3. The hyperspectral image classification method based on a multi-scale mixed convolution network according to claim 1, characterized in that: the data dimension reduction in step 2 uses principal component analysis (PCA), and the dimension reduction process is as follows:
covariance-matrix eigendecomposition is performed on the original hyperspectral image I1 of dimension W×H×C1, converting it into a new hyperspectral image I2 of dimension W×H×C2, where W is the image width, H is the image height, C1 is the number of original image channels, and C2 is the number of bands after conversion; the dimension reduction operation reduces the redundancy of the data features and also reduces the computational parameters.
4. The hyperspectral image classification method based on a multi-scale mixed convolution network according to claim 3, characterized in that: the sample block extraction in step 2 proceeds as follows:
the new hyperspectral image I2 is cut into three-dimensional image blocks of size w×w×C2, which are fed into the network model, where w is the window size.
5. The hyperspectral image classification method based on a multi-scale mixed convolution network according to claim 1, characterized in that: the multi-scale three-dimensional convolution block in step 3 consists of a small-scale, a medium-scale and a large-scale branch: the small-scale branch consists in sequence of a three-dimensional convolution layer, a batch normalization layer and an activation function layer; the medium-scale and large-scale branches each consist in sequence of a three-dimensional convolution layer, an activation function layer, a three-dimensional convolution layer, a batch normalization layer and an activation function layer; and the tensors produced by the three branches are concatenated along the first dimension; the multi-scale two-dimensional convolution block likewise consists of a small-scale, a medium-scale and a large-scale branch, each branch consisting in sequence of a two-dimensional convolution layer, a batch normalization layer and an activation function layer, with the tensors produced by the three branches concatenated along the first dimension; the mixed attention block comprises a spectral attention block and a spatial attention block connected in series; in the spectral attention block, the input first passes through two branches performing global average pooling and global max pooling, each pooling result is fed in sequence into a one-dimensional convolution layer and activation function layer 1, the resulting tensors are concatenated along the first dimension and fed in sequence into linear layer 1, activation function layer 2, linear layer 2, activation function layer 1, a one-dimensional convolution layer and activation function layer 1, and finally the resulting tensor is multiplied element-wise with the input features; in the spatial attention block, the input first passes through two branches performing average pooling and max pooling, each pooling result is fed in sequence into a two-dimensional convolution layer and a batch normalization layer, and the resulting tensors are concatenated along the first dimension and then fed in sequence into a two-dimensional convolution layer and an activation function layer; the full connection block consists in sequence of a linear layer, an activation function layer and a Dropout layer.
6. The hyperspectral image classification method based on a multi-scale mixed convolution network according to claim 1, characterized in that: the loss function in step 4 is the cross-entropy loss function; the choice of loss function affects the quality of the model, and it should truly reflect the difference between the predicted and actual values and correctly feed back the quality of the model; the evaluation indices are overall accuracy, average accuracy and consistency, which effectively evaluate the quality of the classification and measure the effect of the classification network.
CN202310492375.3A 2023-05-04 2023-05-04 Hyperspectral image classification method based on multi-scale mixed convolution network Pending CN116524265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310492375.3A CN116524265A (en) 2023-05-04 2023-05-04 Hyperspectral image classification method based on multi-scale mixed convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310492375.3A CN116524265A (en) 2023-05-04 2023-05-04 Hyperspectral image classification method based on multi-scale mixed convolution network

Publications (1)

Publication Number Publication Date
CN116524265A true CN116524265A (en) 2023-08-01

Family

ID=87397179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310492375.3A Pending CN116524265A (en) 2023-05-04 2023-05-04 Hyperspectral image classification method based on multi-scale mixed convolution network

Country Status (1)

Country Link
CN (1) CN116524265A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination