CN112580670A - Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning

Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning

Info

Publication number
CN112580670A
Authority
CN
China
Prior art keywords
hyperspectral
spatial
data
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011633323.6A
Other languages
Chinese (zh)
Other versions
CN112580670B (en)
Inventor
彭元喜
赵丽媛
杨文婧
周侗
刘煜
黄达
李雪琼
徐利洋
蓝龙
任静
杨绍武
徐炜遐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202011633323.6A
Publication of CN112580670A
Application granted
Publication of CN112580670B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a spatial-spectral joint feature extraction method for hyperspectral images based on transfer learning, belonging to the field of deep learning for remote sensing. The method first designs a 1D CNN and a 2D CNN to extract the spectral and spatial features of the hyperspectral data respectively, and then fuses the two sets of features. To overcome the contradiction between the large amount of training data a deep neural network requires and the scarcity of labeled hyperspectral samples, the invention transfers a ResNet-18 model pre-trained on the RGB image dataset ImageNet to the hyperspectral image target domain, thereby sharing network parameters and reducing the computational cost of training the model. A SoftMax layer is trained on the extracted joint features to perform the hyperspectral target classification task. Finally, a fine-tuning transfer learning strategy makes the migrated model fit the hyperspectral data better and improves classification accuracy. The invention has a clear structure, is easy to implement, and has a solid theoretical basis and practical significance.

Description

Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning
Technical Field
The invention relates generally to the field of hyperspectral image classification, and in particular to a spatial-spectral joint feature extraction method for hyperspectral data based on transfer learning, used for hyperspectral classification.
Background
With the development of spectral imaging technology, hyperspectral imaging has attracted extensive attention in the field of remote sensing thanks to its broadband detection from the visible to the near-infrared range. Target detection and image classification on hyperspectral data are becoming increasingly mature. Early hyperspectral classification methods relied heavily on manual expertise, and classification was mostly based on shallow features, so accuracy was low and cost was high. With the development of deep learning in recent years, deep learning methods have achieved better feature extraction and expression capability by simulating more complex functions and extracting features hierarchically.
Applying deep learning to the hyperspectral classification problem improves algorithm performance, reduces the cost of large-scale manual labeling, and mines deeper features. Representative methods include the Stacked Autoencoder (SAE), the Restricted Boltzmann Machine (RBM), and the Convolutional Neural Network (CNN). The CNN is the most widely used deep learning network in hyperspectral classification; commonly adopted structures include 1D CNN, 2D CNN, 3D CNN, and hybrid CNN networks.
However, deep-learning-based hyperspectral classification faces two contradictions. First, the spectral signatures of different hyperspectral classes differ little, so classification using spectral features alone has a high error rate. Second, as the number of bands involved in the computation grows, classification accuracy first rises and then falls, and the deep learning model easily overfits. For the first problem, spatial-domain information is usually embedded into the spectral features, i.e., the spatial-spectral features of the hyperspectral image are considered jointly before classification. For the Hughes phenomenon, representative countermeasures are Transfer Learning (TL), Data Augmentation (DA), and unsupervised/semi-supervised learning. Transfer learning means transferring network parameters from a source domain to a target domain, thereby skipping the random initialization process and achieving better performance. Its advantages are: (1) a large number of training samples is not required; (2) training time is saved by sharing the weights of a pre-trained model.
In most methods that apply transfer learning to hyperspectral image problems, the source and target domains belong to homologous or heterologous hyperspectral data. For example, Yang et al. proposed a dual-branch CNN structure based on transfer learning that extracts spatial-spectral joint features for hyperspectral classification; although transfer learning spares the random initialization process, the method still requires a large amount of labeled hyperspectral data to pre-train the model. Likewise, the three-dimensional residual network and transfer learning classification method of patent CN109754017A achieves small-sample hyperspectral classification with a deeper network by transferring between hyperspectral data acquired by different sensors; however, it targets transfer learning between heterologous hyperspectral data and does not address the mismatch between feature dimensionality and sample count caused by the Hughes phenomenon. The features extracted by its 3D network structure are deeper and higher-dimensional, and the data shift problem between heterologous hyperspectral datasets is not effectively improved.
Meanwhile, the training and test datasets of the deep neural network models that have performed best on image classification tasks in recent years (ResNet, VGG-Net, GoogLeNet, etc.) are all mass-collected RGB images. To make full use of these excellent model structures, to let transfer learning improve hyperspectral classification, and to address the scarcity of labeled hyperspectral training data, the dimension mismatch that arises when transferring from RGB images to hyperspectral images must be solved first.
To apply a model trained on a 2D RGB image dataset directly to 3D hyperspectral data, either the RGB data or the model can be reshaped. For example, Zhang et al. expand RGB images into 3D cube data, pre-train a 3D CNN model on them, and then feed hyperspectral cube data into the model to realize transfer learning. This makes full use of the large existing RGB image resources, spares the random initialization process, and improves the feature expression capability of the model. However, because of the physical difference between hyperspectral and RGB data, the classification accuracy of the migrated model on hyperspectral data is not high enough; a model pre-trained on expanded RGB images cannot fully exploit the rich hyperspectral spectral features, weakens the expression of spectral features, and does not fundamentally solve the mismatch between feature dimensionality and sample count caused by the Hughes phenomenon.
In general, existing hyperspectral image classification methods either classify poorly or take too long, are constrained by the high cost of manually labeling hyperspectral data, and have little data to work with. These factors severely restrict the application of artificial intelligence to hyperspectral image classification. Against the shortcomings of existing deep-learning-based methods, the invention proposes an improvement over past classification on spectral features or spatial features alone. Making full use of the information in hyperspectral data, a dual-branch structure of a 1D CNN and a 2D CNN extracts the hyperspectral spectral information and spatial information respectively, and classification is predicted from the fusion of the two features. Meanwhile, by adopting transfer learning in the spatial feature extraction process, the structural superiority of ResNet-18 is fully exploited: pre-training on a large RGB image dataset spares the random initialization of network parameters and reduces training time, and finally a fine-tuning strategy applied to the whole target-domain hyperspectral dataset markedly improves classification accuracy.
Disclosure of Invention
The technical problem mainly solved by the invention: aiming at the high labeling cost, scarce dataset sources and long training time of traditional hyperspectral image classification, the invention provides a hyperspectral classification method with a clear structure that is easy to implement, classifies well and takes little time.
In the invention, a network (ResNet-18) pre-trained on a 2D RGB image dataset (ImageNet) extracts the spatial features of the hyperspectral data while a 1D CNN extracts the spectral features, which are embedded into the whole model so that the information of the hyperspectral dataset is fully used; finally, a fine-tuning strategy makes the migrated model fit the hyperspectral target-domain dataset better.
In this hyperspectral image classification problem, unlike the scope of conventional transfer learning strategies, millions of 2D RGB images are treated as the source domain of the transfer and the hyperspectral dataset as its target domain. The invention designs a dual-branch feature extraction network model whose branches extract the spectral and the spatial features of the hyperspectral image respectively. The 1D CNN branch is formed by alternately connecting three convolutional layers, three max-pooling layers and one fully connected layer, and aims to extract the hyperspectral spectral features.
Network depth is an important factor in the feature expression capability of a deep neural network; however, as depth keeps growing, exploding gradients become a major obstacle to training, causing the network to stop converging or to degrade even as convergence accuracy approaches saturation. Degradation can be prevented by superposing identity mappings on top of a shallower network, because a multi-layer nonlinear network by itself struggles to approximate the identity mapping. ResNet realizes such identity mappings through shortcut connections without introducing extra parameters or computational complexity. Thanks to this excellent performance, the ResNet architecture once took first place in the ImageNet classification challenge and is widely used in image detection, segmentation, recognition and related fields. The ResNet network is structurally simple and well suited to transfer learning in image classification tasks. To extract deeper spatial features while avoiding excessive hyperparameters, the invention selects the ResNet-18 network as the structure of the spatial branch.
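For illustration only, the residual block that ResNet stacks can be written in a few lines; the following PyTorch sketch is our own rendering, not part of the claimed method (class and variable names are assumptions):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Minimal ResNet-style residual block computing y = F(x) + x.

    The identity shortcut adds no parameters when the channel count
    is unchanged, which is why ResNet gains depth without extra cost."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                        # shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)    # F(x) + x
```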
The technical scheme of the invention is as follows:
A hyperspectral spatial-spectral joint feature extraction method based on transfer learning comprises the following steps:
S1: Represent the original hyperspectral data as I ∈ R^(M×N×L), where I denotes a sample data cube captured by the hyperspectral image sensor, M is the height, N is the width, and L is the number of spectral bands. Before training the neural network model, preprocess the original hyperspectral image by performing a Principal Component Analysis (PCA) operation along the spectral channel to reduce the dimensionality of the hyperspectral data and hence the computational cost. PCA reduces the number of spectral channels from L to k while keeping the spatial size unchanged, so the preprocessed data is expressed as I ∈ R^(M×N×k), where k is the number of principal components; here k is set to 3 to match the number of RGB channels.

After the PCA operation the hyperspectral data is expressed as I ∈ R^(M×N×3). Each pixel of the hyperspectral image is represented as x_ij = [x_1, x_2, ..., x_k] with x_ij ∈ X_L, and the r-pixel neighborhood around each pixel is extracted to compose an image, so each sample is represented as:

{ x_mn : i - r ≤ m ≤ i + r, j - r ≤ n ≤ j + r }

When x_ij lies at the edge of the image, a zero-filling operation is applied to the part that exceeds the image border; this completes the preprocessing of the image data fed into the spatial feature extraction network.

After preprocessing, each pixel is thus converted from its original 1D vector into a picture of size (2r+1) × (2r+1) × 3. With the data preprocessed, I ∈ R^((2r+1)×(2r+1)×3) serves as the input of the spatial branch and I ∈ R^(3×3×L) as the input of the spectral branch.
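A minimal sketch of this S1 preprocessing, assuming scikit-learn's PCA and NumPy zero padding (the function name and library choices are illustrative assumptions; all patches are materialized at once only for clarity, whereas in practice they would be cut lazily per mini-batch):

```python
import numpy as np
from sklearn.decomposition import PCA

def preprocess(cube: np.ndarray, k: int = 3, r: int = 33) -> np.ndarray:
    """PCA along the spectral axis, then one (2r+1) x (2r+1) x k patch per pixel.

    cube: hyperspectral data of shape (M, N, L).
    Returns an array of shape (M*N, 2r+1, 2r+1, k)."""
    M, N, L = cube.shape
    flat = cube.reshape(-1, L)                                  # (M*N, L) spectra
    reduced = PCA(n_components=k).fit_transform(flat).reshape(M, N, k)
    padded = np.pad(reduced, ((r, r), (r, r), (0, 0)))          # zero-fill image edges
    patches = np.empty((M * N, 2 * r + 1, 2 * r + 1, k), dtype=reduced.dtype)
    for i in range(M):
        for j in range(N):
            patches[i * N + j] = padded[i:i + 2 * r + 1, j:j + 2 * r + 1, :]
    return patches
```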
S2: Spatial features are extracted based on the ResNet-18 network model structure, with the network parameters pre-trained on the two-dimensional image dataset ImageNet transferred directly into the network that extracts the spatial features of the hyperspectral data. The ResNet-18 structure consists of 17 convolutional layers and 1 fully connected (FC) layer; ResNet-18 is built from stacked residual blocks, and residual blocks with different convolution structures are connected through downsampling with stride 2. Small convolution and pooling kernels are adopted: the convolutional layers use 3 × 3 filters and the pooling layers 2 × 2 filters, and the FC layer follows an average pooling operation after the convolutions. The last FC layer of the original ResNet-18 contains 1000 channels, and the final part of the model is a SoftMax classifier used for classification. The migration model for spatial feature extraction deletes that FC layer, adds a new FC layer with 1024 neurons, and keeps the rest unchanged as the spatial branch network; a schematic diagram of the spatial feature extraction framework based on ResNet-18 transfer learning is shown in FIG. 2.
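In code, the S2 migration amounts to loading the pre-trained weights and swapping the head; a sketch with torchvision (the `pretrained=True` flag reflects older torchvision versions, newer ones take a `weights=` argument):

```python
import torch.nn as nn
from torchvision import models

def build_spatial_branch(fc_dim: int = 1024) -> nn.Module:
    """ResNet-18 with ImageNet-pretrained parameters; the original
    1000-channel FC layer is deleted and a 1024-neuron FC layer added."""
    net = models.resnet18(pretrained=True)           # transfer the learned weights
    net.fc = nn.Linear(net.fc.in_features, fc_dim)   # new FC head, rest unchanged
    return net
```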
S3, applying the 1D CNN network to spectral feature extraction of hyperspectrum, wherein the 1D CNN network is formed by connecting three convolution layers with convolution kernel size of 1 x 1 and filter size of 512, three maximum pooling layers with size of 2 x 2 and a full-link layer containing 1024 neurons; after extraction of spectral and spatial features by 1D CNN and ResNet-18, respectively, the last two fully connected layers from the spatial and spectral branches are connected together as follows:
Figure BDA0002880598440000041
where F denotes a cascaded FC layer of the spatial spectrum joint feature,
Figure BDA0002880598440000042
the FC layer representing the spatial branch is,
Figure BDA0002880598440000043
the last FC layer representing a spectral branch; the method comprises the steps of fusing two full-link layers to achieve space-spectrum combined feature extraction, and finally adding a SoftMax classifier to output to predict probability; the schematic diagram of the network structure is shown in fig. 1.
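A sketch of the spectral branch and the fusion F = [F_spa, F_spe] follows (the exact layer ordering and the reduction of the 3 × 3 neighborhood to a single spectrum fed as a (batch, 1, L) tensor are assumptions; the text above fixes only the kernel and filter sizes):

```python
import torch
import torch.nn as nn

class SpectralBranch(nn.Module):
    """1D CNN: three (1x1 conv, 512 filters) + max-pool(2) stages, then FC-1024."""
    def __init__(self, bands: int, fc_dim: int = 1024):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(3):
            layers += [nn.Conv1d(in_ch, 512, kernel_size=1), nn.ReLU(),
                       nn.MaxPool1d(kernel_size=2)]
            in_ch = 512
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(512 * (bands // 8), fc_dim)   # three pool(2) stages: L -> L // 8

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, 1, bands)
        return self.fc(self.features(x).flatten(1))

class DualBranchNet(nn.Module):
    """Concatenates F = [F_spa, F_spe]; the SoftMax lives in the training loss."""
    def __init__(self, spatial: nn.Module, spectral: nn.Module,
                 fc_dim: int = 1024, n_classes: int = 9):
        super().__init__()
        self.spatial, self.spectral = spatial, spectral
        self.classifier = nn.Linear(2 * fc_dim, n_classes)

    def forward(self, patch: torch.Tensor, spectrum: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.spatial(patch), self.spectral(spectrum)], dim=1)
        return self.classifier(fused)       # logits for the SoftMax classifier
```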
S4, adopting a fine-tuning strategy method to take the process of the network parameters of the ResNet-18 model which is transferred on the RGB image and is pre-trained as a step for randomly initializing the alternative space branch model, then training the whole model with the spectrum branches, and updating and optimizing each parameter in the new training process, so that the adaptability of the transfer model to the hyperspectral data is higher, and the classification effect is better.
Compared with the prior art, the invention has the following advantages:
1. A spatial feature extraction model based on transfer learning is applied to the hyperspectral image classification problem. This is the first attempt to apply a ResNet-18 network pre-trained on an RGB image dataset (ImageNet) directly to a hyperspectral classification target domain, enabling better classification performance than transfer learning between two homologous or heterologous hyperspectral datasets.
2. To make full use of the spectral and spatial information provided by hyperspectral imaging, a dual-branch feature extractor is designed to combine spatial and spectral features. By training the whole dual-branch model together with the transferred weights, the joint spatial-spectral features are expected to classify better under transfer learning than spectral or spatial features alone.
3. To make the migrated model fit the new hyperspectral dataset better, a fine-tuning strategy optimizes and updates the parameters of the whole model, further improving classification accuracy; the model expresses hyperspectral features more accurately without overfitting, and the model-incompatibility problem caused by transfer learning across imaging physics is alleviated to a certain extent.
Drawings
FIG. 1 is a schematic diagram of the network framework for spatial-spectral joint feature extraction according to the present invention;
FIG. 2 is a schematic diagram of the network structure of the transfer-learned ResNet-18 proposed by the present invention;
FIG. 3 is a schematic representation of the hyperspectral dataset employed in an example embodiment of the invention;
FIG. 4 is a diagram illustrating the preprocessing of the spatial branch input data in an example embodiment of the invention;
FIG. 5 is an overall framework diagram of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific examples.
A hyperspectral spatial-spectral joint feature extraction method based on transfer learning comprises the following steps:
S1: Represent the original hyperspectral data as I ∈ R^(M×N×L), where I denotes a sample data cube captured by the hyperspectral image sensor, M is the height, N is the width, and L is the number of spectral bands. Before training the neural network model, preprocess the original hyperspectral image by performing a Principal Component Analysis (PCA) operation along the spectral channel to reduce the dimensionality of the hyperspectral data and hence the computational cost. PCA reduces the number of spectral channels from L to k while keeping the spatial size unchanged, so the preprocessed data is expressed as I ∈ R^(M×N×k), where k is the number of principal components; here k is set to 3 to match the number of RGB channels.

After the PCA operation the hyperspectral data is expressed as I ∈ R^(M×N×3). Each pixel of the hyperspectral image is represented as x_ij = [x_1, x_2, ..., x_k] with x_ij ∈ X_L, and the r-pixel neighborhood around each pixel is extracted to compose an image, so each sample is represented as:

{ x_mn : i - r ≤ m ≤ i + r, j - r ≤ n ≤ j + r }

When x_ij lies at the edge of the image, a zero-filling operation is applied to the part that exceeds the image border; this completes the preprocessing of the image data fed into the spatial feature extraction network.

After preprocessing, each pixel is thus converted from its original 1D vector into a picture of size (2r+1) × (2r+1) × 3. With the data preprocessed, I ∈ R^((2r+1)×(2r+1)×3) serves as the input of the spatial branch and I ∈ R^(3×3×L) as the input of the spectral branch.
S2: Spatial features are extracted based on the ResNet-18 network model structure, with the network parameters pre-trained on the two-dimensional image dataset ImageNet transferred directly into the network that extracts the spatial features of the hyperspectral data. The ResNet-18 structure consists of 17 convolutional layers and 1 fully connected (FC) layer; ResNet-18 is built from stacked residual blocks, and residual blocks with different convolution structures are connected through downsampling with stride 2. Small convolution and pooling kernels are adopted: the convolutional layers use 3 × 3 filters and the pooling layers 2 × 2 filters, and the FC layer follows an average pooling operation after the convolutions. The last FC layer of the original ResNet-18 contains 1000 channels, and the final part of the model is a SoftMax classifier used for classification. The migration model for spatial feature extraction deletes that FC layer, adds a new FC layer with 1024 neurons, and keeps the rest unchanged as the spatial branch network; a schematic diagram of the spatial feature extraction framework based on ResNet-18 transfer learning is shown in FIG. 2.
S3: the 1D CNN network is used for spectral feature extraction of hyperspectrum, wherein the 1D CNN network is formed by connecting three convolution layers with convolution kernel size of 1 x 1 and filter size of 512, three maximum pooling layers with size of 2 x 2 and a full-connection layer containing 1024 neurons; after extraction of spectral and spatial features by 1D CNN and ResNet-18, respectively, the last two fully connected layers from the spatial and spectral branches are connected together as follows:
Figure BDA0002880598440000061
where F denotes a cascaded FC layer of the spatial spectrum joint feature,
Figure BDA0002880598440000062
the FC layer representing the spatial branch is,
Figure BDA0002880598440000063
the last FC layer representing a spectral branch; the method comprises the steps of fusing two full-link layers to achieve space-spectrum combined feature extraction, and finally adding a SoftMax classifier to output to predict probability; the schematic diagram of the network structure is shown in FIG. 1
S4: and a fine-tuning strategy method is adopted to only serve the process of the network parameters of the ResNet-18 model which is transferred on the RGB image and is pre-trained as a step for randomly initializing the alternative space branch model, then the whole model with the spectrum branches is trained, and each parameter is updated and optimized in a new training process, so that the adaptability of the transfer model to the hyperspectral data is higher, and the classification effect is better.
As shown in FIG. 3, to describe the proposed method more concretely, the invention adopts the Pavia University hyperspectral scene as the target hyperspectral dataset and realizes classification prediction on the Pavia University data by transferring a model pre-trained on the RGB dataset ImageNet. The Pavia University dataset was acquired by the ROSIS sensor, measures 610 × 340 pixels, and has 103 spectral bands; its ground truth is divided into 9 classes.
S1: as shown in fig. 4, after PCA preprocessing is performed on the Pavia University hyperspectral dataset, the first three principal components are extracted, and a picture similar to an RGB image composed of three channels is formed. Then, the field r of each pixel is expanded, and zero padding is performed on the edge pixels, so that each pixel is converted into a picture with the size of (2r +1) × (2r +1) × 3 from the original one-dimensional spectral vector, and the picture is used as the input of the spatial branch in the invention. According to the input picture size requirement of the ResNet-18 model, the value of the field r is larger than 32, and therefore 33 is selected as the size of r. After preprocessing, each pixel is converted into a picture with the size of 67 × 67 × 3 from the original one-dimensional spectral vector as the input of a spatial feature extraction branch, each 3 × 3 pixel is extracted by a spectral branch, and a rectangular data block with the spectral dimension of 103 is used as the input of the spectral branch.
S2, as shown in FIG. 2, the model of migration of the spatial-branched network structure core block is composed of classical ResNet-18, which removes the last full-link layer and adds a full-link layer composed of 1024 neurons. The network parameters are obtained by direct migration of parameters pre-trained on the RGB image dataset ImageNet.
S3: As shown in FIG. 1, feature extraction and feature fusion are performed on the Pavia University hyperspectral dataset through the spectral branch and the spatial branch, yielding a hyperspectral classification model based on transfer learning and spatial-spectral joint feature extraction.
S4, performing optimization, updating and retraining on the parameters of the whole network by a fine adjustment method, and finally performing prediction and classification on 9 categories corresponding to the SoftMax layer to obtain prediction results of different pixels in the hyperspectral image, wherein a flow diagram of the whole method is shown in FIG. 5.
The above is only a preferred embodiment of the present invention, and the scope of protection is not limited to the above embodiment; all technical solutions within the idea of the invention belong to its scope of protection. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention also fall within the scope of protection of the invention.

Claims (1)

1. A hyperspectral spatial-spectral joint feature extraction method based on transfer learning, characterized by comprising the following steps:
S1: representing the original hyperspectral data as I ∈ R^(M×N×L), where I denotes a sample data cube captured by the hyperspectral image sensor, M is the height, N is the width, and L is the number of spectral bands; before training the neural network model, preprocessing the original hyperspectral image by performing a Principal Component Analysis (PCA) operation along the spectral channel to reduce the dimensionality of the hyperspectral data and hence the computational cost; reducing the number of spectral channels from L to k by PCA while keeping the spatial size unchanged, the preprocessed data being expressed as I ∈ R^(M×N×k), where k is the number of principal components and is set to 3 to match the number of RGB channels;

after the PCA operation, the hyperspectral data being expressed as I ∈ R^(M×N×3), representing each pixel of the hyperspectral image as x_ij = [x_1, x_2, ..., x_k] with x_ij ∈ X_L, and extracting the r-pixel neighborhood around each pixel to compose an image, each sample being represented as:

{ x_mn : i - r ≤ m ≤ i + r, j - r ≤ n ≤ j + r }

when x_ij lies at the edge of the image, performing a zero-filling operation on the part that exceeds the image border, thereby preprocessing the image data input to the spatial feature extraction network;

after preprocessing, each pixel being converted from its original 1D vector into a picture of size (2r+1) × (2r+1) × 3; after the data preprocessing, taking I ∈ R^((2r+1)×(2r+1)×3) as the input of the spatial branch and I ∈ R^(3×3×L) as the input of the spectral branch;
S2: extracting spatial features based on the ResNet-18 network model structure, directly transferring the network parameters pre-trained on the two-dimensional image dataset ImageNet into the network that extracts the spatial features of the hyperspectral data, wherein the ResNet-18 structure consists of 17 convolutional layers and 1 fully connected (FC) layer, ResNet-18 is built from stacked residual blocks, and residual blocks with different convolution structures are connected through downsampling with stride 2; the convolutional layers use 3 × 3 filters and the pooling layers 2 × 2 filters, the FC layer follows an average pooling operation after the convolutions, the last FC layer of the original ResNet-18 contains 1000 channels, and the final part of the model is a SoftMax classifier used for classification; the migration model for spatial feature extraction deletes that FC layer, adds a new FC layer with 1024 neurons, and keeps the rest unchanged as the spatial branch network;
S3: applying the 1D CNN network to hyperspectral spectral feature extraction, wherein the 1D CNN is formed by alternately connecting three convolutional layers with 1 × 1 kernels and 512 filters, three max-pooling layers of size 2 × 2, and one fully connected layer containing 1024 neurons; after the spectral and spatial features have been extracted by the 1D CNN and ResNet-18 respectively, concatenating the last fully connected layers of the spatial and spectral branches as follows:

F = [F_spa, F_spe]

where F denotes the concatenated FC layer carrying the spatial-spectral joint feature, F_spa denotes the FC layer of the spatial branch, and F_spe denotes the last FC layer of the spectral branch; fusing the two fully connected layers realizes the spatial-spectral joint feature extraction, and a SoftMax classifier is finally appended to output the predicted probabilities;
S4: adopting a fine-tuning strategy in which the transferred, pre-trained network parameters of the ResNet-18 model serve merely as a substitute for randomly initializing the spatial branch model, then training the whole model together with the spectral branch, and updating and optimizing every parameter in the new training process, so that the transferred model adapts better to the hyperspectral data and classifies more accurately.
CN202011633323.6A 2020-12-31 2020-12-31 Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning Active CN112580670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011633323.6A CN112580670B (en) 2020-12-31 2020-12-31 Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011633323.6A CN112580670B (en) 2020-12-31 2020-12-31 Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning

Publications (2)

Publication Number Publication Date
CN112580670A (en) 2021-03-30
CN112580670B (en) 2022-04-19

Family

ID=75144539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011633323.6A Active CN112580670B (en) 2020-12-31 2020-12-31 Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning

Country Status (1)

Country Link
CN (1) CN112580670B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2187339A1 (en) * 2008-11-12 2010-05-19 Fundación Robotiker Method for integrating spectral and spatial features for classifying materials
CN107451616A (en) * 2017-08-01 2017-12-08 西安电子科技大学 Multi-spectral remote sensing image terrain classification method based on the semi-supervised transfer learning of depth
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
CN109766858A (en) * 2019-01-16 2019-05-17 中国人民解放军国防科技大学 Three-dimensional convolution neural network hyperspectral image classification method combined with bilateral filtering
CN109961096A (en) * 2019-03-19 2019-07-02 大连理工大学 A kind of multimode high spectrum image migration classification method
EP3716136A1 (en) * 2019-03-26 2020-09-30 Koninklijke Philips N.V. Tumor boundary reconstruction using hyperspectral imaging
CN111191736A (en) * 2020-01-05 2020-05-22 西安电子科技大学 Hyperspectral image classification method based on depth feature cross fusion
CN111723731A (en) * 2020-06-18 2020-09-29 西安电子科技大学 Hyperspectral image classification method based on spatial spectrum convolution kernel, storage medium and device
CN111914696A (en) * 2020-07-16 2020-11-10 河海大学 Hyperspectral remote sensing image classification method based on transfer learning
CN111985543A (en) * 2020-08-06 2020-11-24 西北大学 Construction method, classification method and system of hyperspectral image classification model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENG DENG et al.: "Active Transfer Learning Network: A Unified Deep Joint Spectral-Spatial Feature Learning Model for Hyperspectral Image Classification", 《IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING》 *
LIU, YU et al.: "Deep Residual Prototype Learning Network for Hyperspectral Image", 《PROCEEDINGS OF THE SPIE》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159189A (en) * 2021-04-23 2021-07-23 南京理工大学 Hyperspectral image classification method and system based on double-branch multi-attention convolutional neural network
CN113708863A (en) * 2021-09-10 2021-11-26 中国人民解放军63891部队 Method and device for constructing spectrum sensing training data set
CN113708863B (en) * 2021-09-10 2023-08-01 中国人民解放军63891部队 Method and device for constructing spectrum sensing training data set
CN114037671A (en) * 2021-11-01 2022-02-11 大连医科大学附属第二医院 Microscopic hyperspectral leukocyte detection method based on improved fast RCNN
CN114202690A (en) * 2021-12-09 2022-03-18 东北林业大学 Multi-scale network analysis method based on mixed multilayer perceptron
CN114202690B (en) * 2021-12-09 2024-04-12 东北林业大学 Multi-scale network analysis method based on hybrid multi-layer perceptron
CN115375951A (en) * 2022-09-20 2022-11-22 中国矿业大学 Small sample hyperspectral image classification method based on primitive migration network

Also Published As

Publication number Publication date
CN112580670B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN112580670B (en) Hyperspectral-spatial-spectral combined feature extraction method based on transfer learning
Guo et al. Scene-driven multitask parallel attention network for building extraction in high-resolution remote sensing images
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN111259828B (en) High-resolution remote sensing image multi-feature-based identification method
CN110136075B (en) Remote sensing image defogging method for generating countermeasure network based on edge sharpening cycle
CN111339862B (en) Remote sensing scene classification method and device based on channel attention mechanism
CN110751271B (en) Image traceability feature characterization method based on deep neural network
CN113538457A (en) Video semantic segmentation method utilizing multi-frequency dynamic hole convolution
He et al. LKAT-GAN: a GAN for thermal infrared image colorization based on large kernel and attentionunet-transformer
Jiang et al. An Improved Semantic Segmentation Method for Remote Sensing Images Based on Neural Network.
CN113763417B (en) Target tracking method based on twin network and residual error structure
CN113139431A (en) Image saliency target detection method based on deep supervised learning
CN117115563A (en) Remote sensing land coverage classification method and system based on regional semantic perception
Luo et al. A fast denoising fusion network using internal and external priors
CN117115632A (en) Underwater target detection method, device, equipment and medium
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
CN115965968A (en) Small sample target detection and identification method based on knowledge guidance
Yang et al. Multicue contrastive self-supervised learning for change detection in remote sensing
CN111860668B (en) Point cloud identification method for depth convolution network of original 3D point cloud processing
CN112560824B (en) Facial expression recognition method based on multi-feature adaptive fusion
Zhang et al. Single image haze removal for aqueous vapour regions based on optimal correction of dark channel
Khoond et al. Image enhancement using nonlocal prior and gradient residual minimization for improved visualization of deep underwater image
Zeng et al. Analysis of hyperspectral image classification technology and application based on convolutional neural networks
Anand et al. A Survey on Semantic Segmentation Models for Underwater Images
Futschik Colorization of black-and-white images using deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant