CN104008383A - Hyperspectral image characteristic extraction algorithm based on manifold learning linearization - Google Patents

Hyperspectral image characteristic extraction algorithm based on manifold learning linearization

Info

Publication number
CN104008383A
CN104008383A (application CN201410286545.3A)
Authority
CN
China
Prior art keywords
matrix
manifold learning
algorithm
linearization
hyperspectral image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410286545.3A
Other languages
Chinese (zh)
Other versions
CN104008383B (en)
Inventor
张淼
赖镇洲
刘攀
沈毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen filed Critical Harbin Institute of Technology Shenzhen
Priority to CN201410286545.3A priority Critical patent/CN104008383B/en
Publication of CN104008383A publication Critical patent/CN104008383A/en
Application granted granted Critical
Publication of CN104008383B publication Critical patent/CN104008383B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

A hyperspectral image feature extraction algorithm based on manifold learning linearization, belonging to the technical field of hyperspectral image data processing and application. Addressing the lack of generalization ability of manifold learning algorithms, the invention proposes an improved manifold learning linearization algorithm. The method comprises the following steps: 1. compute the preliminary dimensionality reduction result and the Laplacian matrix; 2. construct the constant-term matrix and the coefficient matrix of the matrix equation system; 3. compute the feature transformation matrix; 4. compute the final dimensionality reduction result through the feature transformation matrix. Because the global-linear-mapping assumption of the linearized manifold learning algorithms LPP, NPE and LLTSA often does not hold, the invention adds to the original cost function a penalty term for deviating from the result of the original manifold learning algorithm and discards the constraint term of the original objective function, turning the solution of the optimal feature transformation matrix into the solution of a system of matrix equations. The method is suitable for feature extraction from hyperspectral images.

Description

Hyperspectral Image Feature Extraction Algorithm Based on Manifold Learning Linearization

Technical Field

The invention belongs to the technical field of hyperspectral image data processing and application, and relates to a hyperspectral image feature extraction algorithm, in particular to a hyperspectral image feature extraction algorithm based on manifold learning linearization.

Background Art

A hyperspectral image is a data cube containing a huge amount of information; every pixel corresponds to a spectral curve with hundreds of bands, which makes it possible to study the relationship between materials and their spectral curves. However, hyperspectral data suffer from data redundancy and the curse of dimensionality, so there is an urgent need to remove this redundancy, which is caused mainly by the correlation between the bands of the hyperspectral data. Dimensionality reduction is therefore an important preprocessing step. Linear dimensionality reduction algorithms such as PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) are simple to implement, but hyperspectral images are nonlinear in nature, and manifold learning algorithms can better exploit the nonlinear structure of hyperspectral data and improve data analysis capability. Classic manifold learning algorithms include LE (Laplacian Eigenmap), LLE (Locally Linear Embedding) and LTSA (Local Tangent Space Alignment), all of which can be used for feature extraction from hyperspectral images.

However, classic manifold learning algorithms such as LE have no generalization ability: when new hyperspectral samples appear, the dimensionality reduction result for the new samples can only be obtained by combining them with the original samples and re-running the learning on the whole set. When the number of new samples is small compared with the number of original samples, repeating such large-scale computation greatly increases the time complexity of the algorithm. From the standpoint of hyperspectral classification, a manifold learning algorithm must be able to generalize, because in many cases the training data and the test data to be classified cannot be learned together; in that situation the test data must be mapped into the low-dimensional feature space by a generalization algorithm, otherwise the new hyperspectral data cannot be classified in that space.

Although linear dimensionality reduction methods such as PCA do not perform well on nonlinear data, they yield a global mapping function and therefore possess generalization ability. For this reason many researchers have linearized manifold learning to overcome its lack of generalization ability. Typical examples are the linearizations of LE, LLE and LTSA, which yield LPP (Locality Preserving Projections), NPE (Neighborhood Preserving Embedding) and LLTSA (Linear Local Tangent Space Alignment), respectively.

Manifold learning algorithms share a unified framework. For manifold learning algorithms based on spectral decomposition, such as LE, LLE and LTSA, the optimal dimensionality reduction result Y* can be obtained by solving the following optimization problem:

$$Y^* = \arg\min_{Y}\ \operatorname{tr}\!\left(Y L Y^T\right) \quad \text{s.t.}\quad Y B Y^T = I \qquad (1),$$

where tr(·) is the matrix trace operator, L is the Laplacian matrix, and B is the constraint matrix. LPP, NPE and LLTSA all assume that a global linear mapping exists in the dimensionality reduction process:

$$Y = V^T X \qquad (2),$$

so that the optimization problem in equation (1) becomes:

$$V^* = \arg\min_{V}\ \operatorname{tr}\!\left(V^T X L X^T V\right) \quad \text{s.t.}\quad V^T X B X^T V = I \qquad (3).$$

Once the optimal linear mapping matrix V* is obtained, both new and old samples can be mapped into the reduced-dimensional space by equation (2). However, the assumption in equation (2) often does not hold. In order to obtain a global linear mapping, these algorithms in fact compromise, to some extent, the local-preserving property of the original algorithms.

Summary of the Invention

Aiming at the lack of generalization ability of manifold learning algorithms, the invention proposes an improved manifold learning linearization algorithm and applies it to feature extraction from hyperspectral images.

The purpose of the invention is achieved through the following technical solution:

In order that the linearization process sacrifices as little as possible of the original manifold learning algorithm's ability to preserve local structure, the invention adds to the original cost function a penalty term for deviating from the result of the original manifold learning algorithm, so that the optimization problem in equation (3) becomes:

$$\min_{V}\ \operatorname{tr}\!\left(V^T X L X^T V\right) + \alpha \left\lVert Y_L - V^T X \right\rVert_F^2 \qquad (4),$$

where Y_L is the dimensionality reduction result that has already been learned — it can be obtained by LE, LLE, LTSA or any other manifold learning algorithm that works well but lacks generalization ability — and α is the penalty coefficient. Compared with equation (3), equation (4) no longer needs the constraint term V^T XBX^T V = I, which broadens the search domain of the optimization problem and makes it possible to obtain a better dimensionality reduction than traditional algorithms. This generalization algorithm is the process of finding a linear mapping matrix V given a known Y_L, i.e. a global linear regression algorithm. The original manifold learning algorithms all obtain the optimal dimensionality reduction result through eigendecomposition, whereas the improved method proposed by the invention obtains it by solving a matrix equation.

The solution of the optimization problem in (4) proceeds as follows:

$$\begin{aligned} g(V) &= \operatorname{tr}\!\left(V^T X L X^T V\right) + \alpha \left\lVert Y_L - V^T X \right\rVert_F^2 \\ &= \operatorname{tr}\!\left(V^T X L X^T V\right) + \alpha \operatorname{tr}\!\left\{ \left[Y_L - V^T X\right]\left[Y_L - V^T X\right]^T \right\} \\ &= \operatorname{tr}\!\left(V^T X(\alpha I + L)X^T V\right) + \alpha \operatorname{tr}\!\left(Y_L Y_L^T\right) - \alpha \operatorname{tr}\!\left(Y_L X^T V\right) - \alpha \operatorname{tr}\!\left(V^T X Y_L^T\right) \end{aligned} \qquad (5)$$

Differentiating the cost function g(V) with respect to V gives:

$$\frac{\partial g(V)}{\partial V} = 2\left[X A X^T\right] V - 2\alpha X Y_L^T = 0 \qquad (6),$$

where A = αI + L.

Thus the optimization problem in equation (4) becomes the problem of solving a matrix equation:

$$\left[X A X^T\right] V = \alpha X Y_L^T \qquad (7).$$

This is equivalent to the following system of linear equations:

$$\operatorname{vec}(V) = \left(X A X^T \otimes I_d\right)^{-1} \operatorname{vec}\!\left(\alpha X Y_L^T\right) = \left[\left(X A X^T\right)^{-1} \otimes I_d\right] \operatorname{vec}\!\left(\alpha X Y_L^T\right) \qquad (8).$$

Therefore:

$$V = \operatorname{avec}\!\left\{ \left[\left(X A X^T\right)^{-1} \otimes I_d\right] \operatorname{vec}(B),\ d \right\} \qquad (9).$$

Equation (9) is a functional expression in which vec(·) is the matrix vectorization operator: vec(B) converts the D×d matrix B into a column vector with D·d entries. avec(·) is the inverse, matrixization operator. For a D×d matrix, vec(·) converts the elements of each row into a d-dimensional column vector and stacks the converted column vectors of all rows, in row order, into a single column vector with D·d entries; avec(·) is the inverse operation, converting such a column vector back into a D×d matrix.
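For illustration, the row-wise vec(·)/avec(·) pair defined above corresponds to row-major flattening and reshaping in NumPy; a minimal sketch with purely illustrative sizes:

```python
import numpy as np

D, d = 4, 2
M = np.arange(D * d).reshape(D, d)   # a D x d matrix

# vec(.): stack the d elements of each row, row by row, into a (D*d) x 1 column vector
vec_M = M.reshape(-1, 1)             # row-major flattening matches the definition above

# avec(.): the inverse operation, restoring the D x d matrix
M_back = vec_M.reshape(D, d)

assert np.array_equal(M, M_back)
```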

Simplifying (9) further gives:

$$V = \left(X A X^T\right)^{-1} B \qquad (10).$$
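The simplification from (8)–(9) to (10) can be checked numerically. The sketch below builds a small synthetic problem (random data and an identity matrix standing in for the Laplacian — none of it taken from the patent) and verifies that the Kronecker-product route of equation (8) and the direct formula of equation (10) give the same transformation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, N, alpha = 6, 3, 40, 1.0

X = rng.standard_normal((D, N))      # data matrix, D x N
Y_L = rng.standard_normal((d, N))    # stand-in for a learned embedding, d x N
L = np.eye(N)                        # stand-in for a Laplacian matrix, N x N

A = alpha * np.eye(N) + L
C = X @ A @ X.T                      # coefficient matrix X A X^T, D x D
B = alpha * X @ Y_L.T                # constant-term matrix, D x d

# Route of Eqs. (8)-(9): vectorize, solve via the Kronecker product, de-vectorize
vecV = np.kron(np.linalg.inv(C), np.eye(d)) @ B.reshape(-1, 1)
V_kron = vecV.reshape(D, d)

# Route of Eq. (10): direct closed form
V_direct = np.linalg.inv(C) @ B

assert np.allclose(V_kron, V_direct)
```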

The specific steps of the hyperspectral image feature extraction algorithm based on manifold learning linearization provided by the invention are as follows:

Step 1: compute the preliminary dimensionality reduction result and the Laplacian matrix.

Given a hyperspectral data set X, where X is a D×N matrix, D is the data dimension, N is the number of samples and d is the reduced dimension, run one of the manifold learning algorithms LE, LLE or LTSA to obtain its preliminary dimensionality reduction result Y_L and Laplacian matrix L, where Y_L is a d×N matrix and L is an N×N matrix.

Step 2: construct the constant-term matrix and the coefficient matrix of the matrix equation system.

Linearized manifold learning methods such as LPP, NPE and LLTSA usually obtain the optimal feature transformation matrix through an eigendecomposition involving the N×N Laplacian matrix L, which is computationally expensive. The innovation of this step is to convert the solution of the optimal feature transformation matrix into the solution of a system of matrix equations.

The system of matrix equations is given by equation (7); the constant-term matrix and the coefficient matrix are constructed as follows:

1) Construct the D×d constant-term matrix B of the matrix equation system:

$$B = \alpha X Y_L^T,$$

where α is a positive penalty coefficient, which can be set to 1 by default.

2) Construct the D×D coefficient matrix C of the matrix equation system:

$$C = X(\alpha I + L) X^T,$$

where I is the N×N identity matrix.
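With the shapes fixed in step 1 (X is D×N, Y_L is d×N, L is N×N), steps 1) and 2) amount to two matrix products; a minimal NumPy sketch (the default α = 1 follows the text above, while the function and variable names are illustrative):

```python
import numpy as np

def build_equation_system(X, Y_L, L, alpha=1.0):
    """Step 2: constant-term matrix B (D x d) and coefficient matrix C (D x D)."""
    N = X.shape[1]
    B = alpha * X @ Y_L.T                  # B = alpha * X * Y_L^T
    C = X @ (alpha * np.eye(N) + L) @ X.T  # C = X (alpha*I + L) X^T
    return B, C
```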

Step 3: compute the feature transformation matrix.

Solving a system of matrix equations usually involves vectorizing matrices, matrixizing vectors and computing Kronecker products, which makes the solution both time-consuming and memory-intensive. The innovation of this step is to reduce the solution of the matrix equation system to the product of two matrices, which is easy to implement and has low algorithmic complexity.

1) Invert the coefficient matrix C of the equation system to obtain the matrix H:

$$H = C^{-1};$$

2) Multiply the matrix H by the constant-term matrix B of the equation system to obtain the feature transformation matrix V:

$$V = HB.$$
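In code, step 3 is a single linear solve; a minimal sketch. Calling np.linalg.solve(C, B) is equivalent in exact arithmetic to forming H = C^{-1} and multiplying, but avoids the explicit inverse; this is an implementation choice for numerical robustness, not something the patent prescribes:

```python
import numpy as np

def compute_transformation(B, C):
    """Step 3: feature transformation matrix V (D x d) from C V = B."""
    # Equivalent to H = inv(C); V = H @ B, but solving the system directly
    # avoids forming the explicit inverse.
    return np.linalg.solve(C, B)
```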

Step 4: compute the final dimensionality reduction result through the feature transformation matrix:

$$Y = V^T X,$$

where Y is the final dimensionality reduction result.
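Step 4, and the generalization to unseen samples that motivates the invention, are the same projection; a minimal sketch (X_new is a hypothetical matrix of new pixels with the same number of bands D, not something defined in the patent):

```python
def project(V, X):
    """Step 4: final dimensionality reduction result Y = V^T X (d x N)."""
    return V.T @ X

# Because V is a global linear map, new samples are reduced in the same way:
# Y_new = project(V, X_new)   # X_new: D x N_new matrix of previously unseen pixels
```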

The beneficial effects of the invention are:

1. The algorithm is a global linear regression algorithm: any manifold learning algorithm without generalization ability can, after being linearized by this method, acquire generalization ability, and it can achieve better feature extraction results than traditional dimensionality reduction algorithms.

2. Because the global-linear-mapping assumption of the LPP, NPE and LLTSA linearized manifold learning algorithms often does not hold, a penalty term for deviating from the result of the original manifold learning algorithm is added to the original cost function and the constraint term of the original objective function is discarded; this widens the search domain of the optimization problem and yields a better dimensionality reduction than traditional algorithms.

Brief Description of the Drawings

Fig. 1 is a flow chart of the steps of the invention;

Fig. 2 is a feature scatter plot of the hyperspectral image data in the invention;

Fig. 3 shows the classification accuracy of hyperspectral images after feature extraction based on the invention.

Detailed Description of the Embodiments

The technical solution of the invention is further described below with reference to the accompanying drawings, but it is not limited thereto; any modification or equivalent replacement of the technical solution of the invention that does not depart from its spirit and scope shall fall within the protection scope of the invention.

In its first step the invention needs an existing manifold learning algorithm to obtain the Laplacian matrix and the preliminary dimensionality reduction result; here the LLE algorithm is taken as the manifold learning algorithm of the first step, and the proposed algorithm is then used to extract features from a hyperspectral image. The IND PINE hyperspectral image is used as the experimental data. It was acquired by the Kennedy Space Center of the United States over a farmland in Indiana, USA, and contains 16 different crop classes. The spatial resolution of the image is 20×20 m², each pixel has 224 bands covering the spectral range 0.2–2.4 μm, and the spectral resolution is 10 nm.

1500 pixels are randomly selected from the IND PINE hyperspectral image as training samples, and 1500 pixels are randomly selected from the remaining samples as test samples. In this embodiment the invention is used to learn from the training samples and obtain the feature transformation matrix; the feature transformation matrix is then used to extract features from the test samples, and a KNN classifier is used to classify them. To verify the effectiveness of the feature extraction of the invention, NPE, the classic linearization of LLE, is used as a comparison algorithm on the same training data, test data and classifier.
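A minimal sketch of this random split is given below. The arrays `pixels` (a D×M matrix of labeled pixels) and `labels` are hypothetical names, and loading the IND PINE image itself is not shown; the seed is likewise only for reproducibility of the sketch:

```python
import numpy as np

def split_train_test(pixels, labels, n_train=1500, n_test=1500, seed=0):
    """Randomly pick disjoint training and test pixels, as in the experiment."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(pixels.shape[1])
    train_idx, test_idx = idx[:n_train], idx[n_train:n_train + n_test]
    X_train, y_train = pixels[:, train_idx], labels[train_idx]
    X_test, y_test = pixels[:, test_idx], labels[test_idx]
    return X_train, y_train, X_test, y_test
```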

As shown in Fig. 1, the specific steps of feature extraction with the invention are as follows:

Step 1: obtain the dimensionality reduction result Y_L and the Laplacian matrix L through the LLE manifold learning algorithm.

1) Input the training set X, in which each sample has 224 dimensions and the number of training samples is 1500; set the number of nearest neighbours to 20; set the reduced dimensionality d to 30.

2) Find the neighbourhood sets.

For each sample x_i in the training set, where i is the position index of x_i in the training sample set X, i = 1, 2, ..., 1500, sort the samples by Euclidean distance and search the training set for the 20 samples closest to x_i; these constitute the neighbourhood set X_i of sample x_i.
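Step 2) is a plain k-nearest-neighbour search under the Euclidean distance (k = 20 here); a minimal vectorized sketch, with illustrative function and variable names:

```python
import numpy as np

def neighbourhoods(X, k=20):
    """For each column x_i of X (D x N), return the indices of its k nearest columns."""
    # Pairwise squared Euclidean distances between all samples
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist2, np.inf)          # exclude each sample itself
    return np.argsort(dist2, axis=1)[:, :k]  # N x k index matrix
```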

3) Construct the reconstruction coefficient matrix W:

$$W_{ij} = \begin{cases} T(x_j, X_i)\,/\,S(x_j, X_i), & x_j \in X_i \\ 0, & \text{otherwise} \end{cases} \qquad i, j = 1, 2, \ldots, 1500,$$

where W_ij is the reconstruction coefficient of x_j with respect to x_i, and x_j is the training sample with position index j in the training sample set X; T(x_j, X_i) is computed by the following formula:

$$T(x_j, X_i) = \sum_{l=1}^{20} P_{l,\ \operatorname{index}(x_j, X_i)},$$

where index(x_j, X_i) denotes the position index of x_j within X_i, so T is the sum of all elements in column index(x_j, X_i) of the matrix P. The matrix P is the inverse of the local covariance matrix of the neighbourhood set X_i and can be computed by the following formula:

$$P = \left[(X_i - x_i)^T (X_i - x_i)\right]^{-1},$$

and S is the sum of all elements of the matrix P, computed by the following formula:

$$S(x_j, X_i) = \sum_{l=1}^{20} \sum_{m=1}^{20} P_{l,m}.$$
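Putting the formulas for P, T and S together, the weight matrix of step 3) can be built neighbourhood by neighbourhood. The sketch below assumes the index matrix `nbrs` from the previous sketch and adds a small ridge term when inverting the local covariance for numerical stability, which the patent itself does not mention:

```python
import numpy as np

def reconstruction_weights(X, nbrs, reg=1e-6):
    """Build the N x N reconstruction coefficient matrix W of step 3)."""
    N, k = nbrs.shape
    W = np.zeros((N, N))
    for i in range(N):
        Xi = X[:, nbrs[i]]                                   # neighbourhood set X_i, D x k
        G = (Xi - X[:, [i]]).T @ (Xi - X[:, [i]])            # local covariance, k x k
        P = np.linalg.inv(G + reg * np.trace(G) * np.eye(k)) # P = G^{-1} (ridge added here)
        T = P.sum(axis=0)                                    # T(x_j, X_i): column sums of P
        S = P.sum()                                          # S: sum of all elements of P
        W[i, nbrs[i]] = T / S                                # W_ij = T / S for x_j in X_i, else 0
    return W
```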

4) Construct the Laplacian matrix:

$$L = (I - W)^T (I - W),$$

where I is the N×N identity matrix.

5) Perform the eigendecomposition:

$$Lz = \lambda z.$$

6) Obtain the preliminary dimensionality reduction result:

the preliminary dimensionality reduction result Y_L is formed from z_ii, the eigenvector corresponding to the ii-th smallest eigenvalue of L, for ii = 2, ..., d+1; these d eigenvectors (the one belonging to the smallest eigenvalue is discarded) are stacked as the rows of the d×N matrix Y_L.
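Steps 4)–6) reduce to one symmetric eigendecomposition; a minimal sketch (it discards the eigenvector of the smallest eigenvalue and keeps the next d, matching the description above; d = 30 in this embodiment):

```python
import numpy as np

def lle_embedding_and_laplacian(W, d=30):
    """Steps 4)-6): Laplacian L = (I-W)^T (I-W) and preliminary embedding Y_L (d x N)."""
    N = W.shape[0]
    L = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues for a symmetric matrix
    Y_L = eigvecs[:, 1:d + 1].T           # skip the smallest (constant) eigenvector
    return Y_L, L
```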

Step 2: construct the constant-term matrix B and the coefficient matrix C of the matrix equation system.

1) Set the positive penalty coefficient α; here α = 1.

2) The constant-term matrix B of the matrix equation system is computed by the following formula:

$$B = \alpha X Y_L^T.$$

3) The coefficient matrix C of the matrix equation system is computed by the following formula:

$$C = X(\alpha I + L) X^T.$$

Step 3: compute the feature transformation matrix V.

1) Invert the coefficient matrix C of the equation system to obtain the matrix H:

$$H = C^{-1}.$$

2) Multiply the matrix H by the constant-term matrix B of the equation system to obtain the feature transformation matrix V:

$$V = HB.$$

Step 4: compute the final dimensionality reduction results through the feature transformation matrix.

1) Compute the final dimensionality reduction result Y of the training set by the following formula:

$$Y = V^T X.$$

2) Compute the final dimensionality reduction result of the test set by the following formula:

$$\tilde{Y} = V^T \tilde{X},$$

where $\tilde{X}$ is the test sample set.

The final dimensionality reduction results of the test set and the training set are shown in Fig. 2. The figure shows that the generalization result for the new (test) samples is essentially consistent with the dimensionality reduction result of the training samples, which indicates that the proposed feature extraction algorithm generalizes well.

Step 5: use the KNN classification algorithm to classify the test samples.

1) Compute the Euclidean distance between each test sample in the final dimensionality reduction result of the test sample set and every sample in the final dimensionality reduction result Y of the training sample set.

2) Take the 5 nearest training samples as the neighbours of the test sample.

3) Classify the test sample according to the majority class among these 5 neighbours.

4) Compute the classification accuracy.
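Steps 1)–4) amount to a 5-nearest-neighbour majority vote in the reduced space; a minimal sketch (y_train and y_test are hypothetical integer label vectors for the 1500 training and 1500 test pixels; they are not given in the patent):

```python
import numpy as np

def knn_accuracy(Y_train, y_train, Y_test, y_test, k=5):
    """Classify each reduced test sample by the majority label of its k nearest
    reduced training samples and return the overall classification accuracy."""
    correct = 0
    for j in range(Y_test.shape[1]):
        d2 = np.sum((Y_train - Y_test[:, [j]]) ** 2, axis=0)  # distances to all training samples
        nearest = np.argsort(d2)[:k]                          # indices of the k nearest neighbours
        votes = y_train[nearest]                              # labels assumed to be integer class indices
        pred = np.bincount(votes).argmax()                    # majority class among the neighbours
        correct += int(pred == y_test[j])
    return correct / Y_test.shape[1]
```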

The classification accuracy is shown in Fig. 3. The figure shows that, after the hyperspectral image is generalized by the proposed feature extraction algorithm, KNN classification achieves a high overall classification accuracy, and the classification accuracy of the LLE-GLR algorithm is clearly higher than that of the LLE and NPE algorithms, which shows that the feature extraction algorithm proposed in the invention helps to improve the classification accuracy of hyperspectral images.

From the above analysis it can be concluded that the hyperspectral image feature extraction algorithm based on manifold learning linearization proposed by the invention indeed improves upon existing manifold learning linearization algorithms. The feature extraction algorithm proposed in the invention can effectively overcome the inability of existing manifold learning to handle new samples, and can also improve the classification accuracy of hyperspectral images, which gives it great practical engineering value.

Claims (1)

1. A hyperspectral image feature extraction algorithm based on manifold learning linearization, characterized in that the steps of the hyperspectral image feature extraction algorithm are as follows:

Step 1: given a hyperspectral data set X, obtain the preliminary dimensionality reduction result Y_L and the Laplacian matrix L through a manifold learning algorithm, where X is a D×N matrix, D is the data dimension, N is the number of samples, Y_L is a d×N matrix, L is an N×N matrix, and d is the reduced dimension;

Step 2: construct the constant-term matrix B and the coefficient matrix C of the matrix equation system:

1) construct the D×d constant-term matrix B of the matrix equation system: B = αXY_L^T, where α is a positive penalty coefficient;

2) construct the D×D coefficient matrix C of the matrix equation system: C = X(αI + L)X^T, where I is the N×N identity matrix;

Step 3: compute the feature transformation matrix:

1) invert the coefficient matrix C of the matrix equation system to obtain the matrix H: H = C^{-1};

2) multiply the matrix H by the constant-term matrix B of the matrix equation system to obtain the feature transformation matrix V: V = HB;

Step 4: compute the final dimensionality reduction result through the feature transformation matrix: Y = V^T X, where Y is the final dimensionality reduction result.
CN201410286545.3A 2014-06-24 2014-06-24 Based on manifold learning linearizing high spectrum image feature extracting method Active CN104008383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410286545.3A CN104008383B (en) 2014-06-24 2014-06-24 Based on manifold learning linearizing high spectrum image feature extracting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410286545.3A CN104008383B (en) 2014-06-24 2014-06-24 Based on manifold learning linearizing high spectrum image feature extracting method

Publications (2)

Publication Number Publication Date
CN104008383A true CN104008383A (en) 2014-08-27
CN104008383B CN104008383B (en) 2017-03-08

Family

ID=51369032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410286545.3A Active CN104008383B (en) 2014-06-24 2014-06-24 Based on manifold learning linearizing high spectrum image feature extracting method

Country Status (1)

Country Link
CN (1) CN104008383B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203779A (en) * 2017-05-11 2017-09-26 中国科学院西安光学精密机械研究所 Hyperspectral dimensionality reduction method based on spatial-spectral information maintenance
CN110222631A (en) * 2019-06-04 2019-09-10 电子科技大学 Based on deblocking it is parallel cut space arrangement SAR image target recognition method
CN110781974A (en) * 2019-10-31 2020-02-11 上海融军科技有限公司 Dimension reduction method and system for hyperspectral image
CN112016366A (en) * 2019-05-31 2020-12-01 北京车和家信息技术有限公司 Obstacle positioning method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060165284A1 (en) * 2005-01-25 2006-07-27 Shmuel Aharon Manifold learning for discriminating pixels in multi-channel images, with application to image/volume/video segmentation and clustering
CN103136736A (en) * 2013-03-19 2013-06-05 哈尔滨工业大学 Hyperspectral remote sensing data non-linear dimension descending method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060165284A1 (en) * 2005-01-25 2006-07-27 Shmuel Aharon Manifold learning for discriminating pixels in multi-channel images, with application to image/volume/video segmentation and clustering
CN103136736A (en) * 2013-03-19 2013-06-05 哈尔滨工业大学 Hyperspectral remote sensing data non-linear dimension descending method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHI HE ET AL.: "Hyperspectral image classification with multivariate empirical mode decomposition-based features", 《2014 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE》 *
普晗晔 et al.: "A new dimensionality reduction algorithm for hyperspectral images based on manifold learning", 《红外与激光工程》 (Infrared and Laser Engineering) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203779A (en) * 2017-05-11 2017-09-26 中国科学院西安光学精密机械研究所 Hyperspectral dimensionality reduction method based on spatial-spectral information maintenance
CN112016366A (en) * 2019-05-31 2020-12-01 北京车和家信息技术有限公司 Obstacle positioning method and device
CN110222631A (en) * 2019-06-04 2019-09-10 电子科技大学 Based on deblocking it is parallel cut space arrangement SAR image target recognition method
CN110222631B (en) * 2019-06-04 2022-03-15 电子科技大学 Target recognition method of tangential space arrangement SAR image based on data block and parallel
CN110781974A (en) * 2019-10-31 2020-02-11 上海融军科技有限公司 Dimension reduction method and system for hyperspectral image

Also Published As

Publication number Publication date
CN104008383B (en) 2017-03-08

Similar Documents

Publication Publication Date Title
Zhao et al. Incomplete multi-view clustering via deep semantic mapping
WO2022041678A1 (en) Remote sensing image feature extraction method employing tensor collaborative graph-based discriminant analysis
CN104281855B (en) Hyperspectral image classification method based on multi-task low rank
CN106446936B (en) Hyperspectral data classification method based on convolutional neural network combined spatial spectrum data to waveform map
CN107590515B (en) Hyperspectral image classification method of self-encoder based on entropy rate superpixel segmentation
CN112381144B (en) Heterogeneous deep network method for non-European and Euclidean domain space spectrum feature learning
CN107341786A (en) The infrared and visible light image fusion method that wavelet transformation represents with joint sparse
CN104732546B (en) The non-rigid SAR image registration method of region similitude and local space constraint
CN105574534A (en) Significant object detection method based on sparse subspace clustering and low-order expression
CN104318243B (en) High-spectral data dimension reduction method based on rarefaction representation and empty spectrum Laplce's figure
CN103413151A (en) Hyperspectral image classification method based on image regular low-rank expression dimensionality reduction
CN107451565A (en) A kind of semi-supervised small sample deep learning image model classifying identification method
CN107609573A (en) High spectrum image time varying characteristic extracting method based on low-rank decomposition and empty spectrum constraint
CN108197650A (en) The high spectrum image extreme learning machine clustering method that local similarity is kept
CN104008383B (en) Based on manifold learning linearizing high spectrum image feature extracting method
CN106023221A (en) Remote sensing image segmentation method based on nonnegative low-rank sparse correlated drawing
CN102800113B (en) Digital image analysis method based on fractal dimension
CN105550641A (en) Age estimation method and system based on multi-scale linear differential textural features
CN104463219A (en) Polarimetric SAR image classification method based on eigenvector measurement spectral clustering
CN101877065B (en) Extraction and identification method of non-linear authentication characteristic of facial image under small sample condition
Zhang et al. Discriminative tensor sparse coding for image classification.
CN104463246A (en) Manifold-based linear regression learning method
CN102902984B (en) Remote-sensing image semi-supervised projection dimension reducing method based on local consistency
CN104050482B (en) A kind of manifold learning generalization algorithm based on local linear smoothing
CN106446965A (en) Spacecraft visible light image classification method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant