CN103530658A

CN103530658A - Method for recognizing plant leaf data based on sparse representation

Info

Publication number: CN103530658A
Application number: CN201310481334.0A
Authority: CN
Inventors: 李波; 田贝贝; 黄德双
Original assignee: Wuhan University of Science and Technology WHUST
Current assignee: Wuhan University of Science and Technology WHUST
Priority date: 2013-10-15
Filing date: 2013-10-15
Publication date: 2014-01-22
Anticipated expiration: 2033-10-15
Also published as: CN103530658B

Abstract

The invention relates to a method for recognizing plant leaf data based on sparse representation. The technical solution rests on the assumption that data of the same class lie on the same manifold, while data of different classes lie on different manifolds. On the one hand, the class information of the plant leaf data is used to define an inter-manifold distance. On the other hand, the local neighborhood of each manifold is sparsely represented: the linear representation coefficients are obtained by establishing a sparse representation relationship in the local neighbor graph. An objective function is then set up to find the optimal low-dimensional projection subspace, in which the inter-manifold distance is maximized while the sparsity of the multiple manifolds is well preserved. Finally, a nearest-neighbor classifier classifies and identifies the sparse features of the plant leaves in this subspace, which improves the recognition performance on plant leaf data.

Description

Method for recognizing plant leaf data based on sparse representation

Technical field

The invention belongs to the technical field of plant leaf data recognition. It relates specifically to a method for recognizing plant leaf data based on sparse representation.

Background art

Plants are among the most abundant and widely distributed forms of life on Earth. They are an important genetic resource for human survival and development, a major source of food, and an essential resource for production and daily life. Plants also play a vital role in soil and water conservation, desertification control and climate improvement. In recent years, growing human activity has continuously damaged the ecological environment. According to survey statistics, about 34,000 plant species worldwide are on the verge of extinction. That is about 13% of the roughly 250,000 known plant species. Protecting plants has become imperative.

Plants can currently be classified in many ways, for example by plant cytotaxonomy, chemotaxonomy, serotaxonomy and plant genetics. However, these methods are difficult for non-specialists to master, or impractical for them to use. It is therefore necessary to study the computer-aided automatic extraction of plant taxonomic traits using information technologies such as digital image processing, pattern recognition and artificial intelligence. This would achieve automatic classification and machine identification of plant species. It would also make it possible to study the significance of these digital taxonomic traits in the ecological classification of plant species.

Current research shows that the most widely and successfully used approach to plant species recognition is the neural network method combined with leaf shape features. Its success depends on the construction of the neural network and on the vectorization of plant image features, that is, on how features are extracted from plant images. Feature extraction and selection are crucial for machine learning methods: the extracted and selected features determine the performance of the classifier and the outcome of the whole algorithm. Most machine learning methods applied to plant species identification differ mainly in how they vectorize plant image features, which shows the importance of feature extraction and selection in plant identification.

The most commonly used feature extraction technique is principal component analysis (PCA), which is also a common dimensionality reduction method in plant species identification. PCA works well on data with a linear structure, which it discovers from the second-order statistics of the data. It cannot, however, recover the true distribution of highly nonlinear data. Manifold learning methods based on analyzing the intrinsic dimensionality of nonlinearly distributed data offer a new solution. Manifold learning aims to discover the intrinsic regularities of data distributed on high-dimensional manifolds; in essence, it learns the intrinsic geometric structure of a low-dimensional manifold from sampled data. Manifold learning therefore reflects the essential structure of the data better than traditional dimensionality reduction methods and is more conducive to understanding and further processing the data. For multi-class, high-dimensional plant species data, manifold learning helps reveal the intrinsic distribution and geometric structure of the data. This gives plant taxonomy a new and effective tool for analyzing taxonomic traits.

Manifold learning has already been applied, in a preliminary way, to feature extraction and classification of plant leaf data. However, it needs a large number of training samples to learn the local structure of the manifold, and training data for plant leaves remain relatively scarce.

Summary of the invention

The invention aims to overcome the shortcomings of the prior art by providing a sparse-representation-based method for recognizing plant leaf data that improves recognition performance.

To achieve this object, the technical solution of the invention comprises the following specific steps:

1) Preprocessing of plant leaf data

The processing runs in the following order:

- The originally acquired plant leaf images are denoised and smoothed.
- The plant leaves are segmented from the images.
- The segmented color images are converted to grayscale.
- The grayscale images are normalized and vectorized.

This yields the preprocessed vector X_i of any single plant leaf image and the preprocessed data matrix X of all plant leaf images.
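The preprocessing chain can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation. The patent names the steps but not the filters, so the following choices are illustrative:

- a 3×3 mean filter for denoising and smoothing;
- a brightness-threshold segmentation that assumes a light background;
- ITU-R BT.601 grayscale weights;
- unit-norm normalization.

`preprocess_leaf` and `build_data_matrix` are hypothetical names. Each image becomes one column of X, so X is (pixels × images). This is the orientation formula (1) needs for J_D = X(Q - H)X^T to be pixels × pixels. The embodiments quote the transposed size (e.g. 1000×4096).

```python
import numpy as np


def mean_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 box filter with edge padding (assumed denoising/smoothing step)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def preprocess_leaf(img_rgb: np.ndarray, background_thresh: float = 0.9) -> np.ndarray:
    """Turn one H x W x 3 uint8 leaf image into a unit-norm vector X_i (hypothetical pipeline)."""
    rgb = img_rgb[..., :3].astype(float) / 255.0
    # Denoise and smooth each colour channel separately.
    smooth = np.stack([mean_filter3(rgb[..., c]) for c in range(3)], axis=-1)
    # Grayscale via ITU-R BT.601 luma weights.
    gray = smooth @ np.array([0.299, 0.587, 0.114])
    # Segmentation (assumed light background): zero out pixels brighter than the threshold.
    # Masking before or after the grayscale conversion gives the same result.
    leaf = np.where(gray < background_thresh, gray, 0.0)
    # Normalize to unit Euclidean norm and vectorize (row-major flatten).
    x = leaf.reshape(-1)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def build_data_matrix(images) -> np.ndarray:
    """Stack preprocessed images as columns: X has shape (pixels, n_images)."""
    return np.column_stack([preprocess_leaf(im) for im in images])
```

For the 64×64 images of Example 1, `build_data_matrix` would return a 4096×1000 array.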

2) Calculating the projection Y_i of the preprocessed vector X_i of any plant leaf image

A. Establishing the inter-manifold difference matrix J_D

From the preprocessed data matrix X of all plant leaf images and the class information matrix H, establish the inter-manifold difference matrix J_D:

$\begin{aligned} J_D &= \frac{1}{2}\sum_{i,j=1}^{n} H_{ij}(X_i - X_j)(X_i - X_j)^T \\ &= \sum_i X_i Q_{ii} X_i^T - \sum_{i,j} X_i H_{ij} X_j^T = X(Q - H)X^T \end{aligned} \qquad (1)$

In formula (1), H_ij denotes the element in row i, column j of the class information matrix H.

Q_ii denotes the element in row i, column i of the diagonal matrix Q derived from the class information matrix:

$Q_{ii} = \sum_j H_{ij} \qquad (3)$
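Formulas (1) and (3) can be checked numerically. The patent does not define the entries of the class information matrix H. The sketch below assumes H_ij = 1 when X_i and X_j belong to different classes and 0 otherwise. This is a symmetric choice, so maximizing J_D pushes different manifolds apart. Function names are hypothetical. The closed form X(Q - H)X^T equals the explicit pairwise sum whenever H is symmetric.

```python
import numpy as np


def class_info_matrix(labels) -> np.ndarray:
    """Assumed H: H_ij = 1 if samples i and j have different class labels, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def inter_manifold_scatter(X: np.ndarray, H: np.ndarray) -> np.ndarray:
    """J_D = X (Q - H) X^T, formula (1), with Q_ii = sum_j H_ij, formula (3).

    X is (features, n_samples): column i is the preprocessed vector X_i.
    """
    Q = np.diag(H.sum(axis=1))
    return X @ (Q - H) @ X.T
```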

B. Establishing the sparse-representation-based manifold local structure matrix J_L

From the preprocessed data matrix X of all plant leaf images, establish the sparse-representation-based manifold local structure matrix J_L:

$\begin{aligned} J_L &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} S_{ij}(X_i - X_j)(X_i - X_j)^T \\ &= \sum_{i,j} X_i S_{ij} X_i^T - \sum_{i,j} X_i S_{ij} X_j^T = \sum_i X_i D_{ii} X_i^T - \sum_{i,j} X_i S_{ij} X_j^T \\ &= X(D - S)X^T \end{aligned} \qquad (4)$

In formula (4), S_ij denotes the element in row i, column j of the sparse representation coefficient matrix. The coefficient vector S_i is obtained from:

$S_i = \arg\min_{S_i} \|S_i\|_1 \quad \text{s.t.} \quad X_i = X S_i \qquad (5)$

D_ii denotes the element in row i, column i of the diagonal matrix D derived from the sparse representation coefficient matrix:

$D_{ii} = \sum_j S_{ij} \qquad (6)$
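Formula (5) is an equality-constrained l1 problem (basis pursuit). The sketch below makes three assumptions:

- It swaps in the Lasso relaxation min_s 0.5*||X_i - A s||_2^2 + lam*||s||_1, solved with ISTA (iterative soft-thresholding).
- It excludes X_i itself from the dictionary A. Otherwise the trivial solution S_i = e_i satisfies the constraint.
- It symmetrizes S as (S + S^T)/2 before D and J_L are formed. The step from the first to the second line of formula (4) holds in general only for a symmetric S.

Row i of S holds the weights used to reconstruct X_i. The function names, lam and the iteration count are hypothetical. With n = 1000 images this runs n separate Lasso problems, which is slow but adequate for a sketch.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(x: np.ndarray, A: np.ndarray, lam: float = 0.01, n_iter: int = 500) -> np.ndarray:
    """ISTA for min_s 0.5 * ||x - A s||^2 + lam * ||s||_1 (Lasso relaxation of formula (5))."""
    s = np.zeros(A.shape[1])
    lip = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth term's gradient
    if lip == 0.0:
        return s
    for _ in range(n_iter):
        grad = A.T @ (A @ s - x)
        s = soft_threshold(s - grad / lip, lam / lip)
    return s


def sparse_weight_matrix(X: np.ndarray, lam: float = 0.01, n_iter: int = 500) -> np.ndarray:
    """Row i of S holds the weights reconstructing column X_i from the other columns."""
    n = X.shape[1]
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)  # exclude X_i itself (assumption, avoids S_i = e_i)
        S[i, others] = sparse_code(X[:, i], X[:, others], lam, n_iter)
    return S


def local_structure_scatter(X: np.ndarray, S: np.ndarray) -> np.ndarray:
    """J_L = X (D - S) X^T, formula (4), after symmetrizing S; D_ii = sum_j S_ij, formula (6)."""
    S_sym = 0.5 * (S + S.T)
    D = np.diag(S_sym.sum(axis=1))
    return X @ (D - S_sym) @ X.T
```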

C. Calculating the projection Y_i of the preprocessed vector X_i of one plant leaf image

A linear transformation maps the preprocessed vector X_i of one plant leaf image to its projection Y_i:

$Y_i = W^T X_i \qquad (7)$

In formula (7), W denotes the transformation matrix, which is obtained from the following objective function:

$\max_W \operatorname{tr}\{W^T (J_D - J_L) W\} \qquad (8)$

Perform eigenvalue decomposition of (J_D - J_L):

$(J_D - J_L)\nu = \lambda\nu \qquad (9)$

In formula (9), λ denotes an eigenvalue and ν the corresponding eigenvector.

Sort the eigenvalues λ in descending order and take the eigenvectors ν corresponding to the d largest eigenvalues to form the projection matrix W.
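Formulas (7) to (9) amount to a symmetric eigenproblem. As written, objective (8) is unbounded unless W is constrained. Taking the top-d eigenvectors corresponds to the usual orthonormality constraint W^T W = I (an assumption here, not stated in the patent). Under that constraint, the maximum of the trace is the sum of the d largest eigenvalues. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the sketch reverses them. Function names are hypothetical.

```python
import numpy as np


def projection_matrix(J_D: np.ndarray, J_L: np.ndarray, d: int) -> np.ndarray:
    """W = eigenvectors of (J_D - J_L) for its d largest eigenvalues, formulas (8)-(9)."""
    M = J_D - J_L
    M = 0.5 * (M + M.T)                     # remove round-off asymmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]     # indices of the d largest eigenvalues
    return eigvecs[:, top]


def project(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Formula (7), Y_i = W^T X_i, applied to every column of X at once."""
    return W.T @ X
```

With d = 38 and 4096-dimensional inputs, as in Example 1, `projection_matrix` returns a 4096×38 matrix.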

3) Recognition of plant leaf data

For the preprocessed vector X_i of each plant leaf image of unknown class, compute its projection Y_i. Then identify the class of Y_i in the low-dimensional space with the nearest-neighbor method.

The nearest-neighbor method uses a K-nearest-neighbor classifier with K = 1 when classifying in the low-dimensional space.
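The K = 1 nearest-neighbor rule in the projected space can be sketched as follows. The patent does not name a distance, so Euclidean distance is assumed. Samples are stored as columns, matching Y_i = W^T X_i, and the function name is hypothetical.

```python
import numpy as np


def nearest_neighbor_classify(Y_train: np.ndarray, train_labels, Y_test: np.ndarray) -> np.ndarray:
    """1-NN (K = 1) with Euclidean distance; samples are columns of Y_train and Y_test."""
    # Squared distances ||y_test - y_train||^2 for every (test, train) pair.
    d2 = (np.sum(Y_test ** 2, axis=0)[:, None]
          + np.sum(Y_train ** 2, axis=0)[None, :]
          - 2.0 * Y_test.T @ Y_train)
    nearest = np.argmin(d2, axis=1)          # index of the closest training sample
    return np.asarray(train_labels)[nearest]
```

In a full pipeline one would typically learn W from the training images only, then pass Y_train = W^T X_train and Y_test = W^T X_test to this function.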

The technical solution above addresses the recognition of plant leaf images. It assumes that leaf data of the same class lie on the same manifold and that leaf data of different classes lie on different manifolds. On that basis, it provides a sparse-representation-based method for recognizing plant leaf data, with the following beneficial effects:

- Linear representation coefficients are obtained by establishing a sparse representation relationship in the local neighbor graph. Compared with traditional manifold learning methods, this gives a more robust way of learning these coefficients.
- The inter-manifold distance is defined from the class information of the leaf data, and an objective function that maximizes this distance is set up to meet the classification requirements of plant leaf data.

Together, these improve the recognition performance on plant leaf data.

Detailed description of the embodiments

The invention is further described below with reference to specific embodiments, which do not limit its scope of protection.

Example 1

A method for recognizing plant leaf data based on sparse representation, with the following specific steps:

1) Preprocessing of plant leaf data

In this embodiment, 1000 plant leaf images in 20 classes were originally collected, each of 64×64 pixels. The plant leaf images are first denoised and smoothed, and the plant leaves are then segmented. The segmented color images are converted to grayscale using a standard RGB-to-grayscale conversion. Finally, the grayscale images are normalized and vectorized. This gives a 4096-dimensional preprocessed vector X_i for each plant leaf image and a 1000×4096 preprocessed data matrix X for all plant leaf images.

From the preprocessed data matrix X of all plant leaf images and the class information matrix H, establish the 4096×4096 inter-manifold difference matrix J_D by formula (1), where

$Q_{ii} = \sum_j H_{ij} \qquad (3)$

From the preprocessed data matrix X of all plant leaf images, establish the 4096×4096 sparse-representation-based manifold local structure matrix J_L by formula (4). In formula (4), S_ij denotes the element in row i, column j of the sparse representation coefficient matrix, and D_ii denotes the element in row i, column i of its diagonal matrix D:

$D_{ii} = \sum_j S_{ij} \qquad (6)$

$Y_i = W^T X_i \qquad (7)$

$\max_W \operatorname{tr}\{W^T (J_D - J_L) W\} \qquad (8)$

$(J_D - J_L)\nu = \lambda\nu \qquad (9)$

In formula (9), λ denotes an eigenvalue and ν the corresponding eigenvector.

The eigenvalues λ are sorted in descending order, and the eigenvectors ν of the d largest eigenvalues form the projection matrix W. For the 1000 plant leaf images in 20 classes, the eigenvectors of the 38 largest eigenvalues are taken, giving a 4096×38 projection matrix W.

3) Recognition of plant leaf data

For the preprocessed vector X_i of each plant leaf image of unknown class, compute its projection Y_i. Then identify the class of Y_i in the low-dimensional space with the nearest-neighbor method.

In this embodiment, the nearest-neighbor method uses a K-nearest-neighbor classifier with K = 1 in the low-dimensional space.

The experiment was repeated 10 times and the prediction accuracy was averaged. Compared with the conventional manifold learning method LPP (locality preserving projections), the method of the invention improves the recognition rate by 2.34%.

Example 2

1) Preprocessing of plant leaf data

In this embodiment, 1500 plant leaf images in 50 classes were originally collected, each of 32×32 pixels. The plant leaf images are first denoised and smoothed, and the plant leaves are then segmented. The segmented color images are converted to grayscale using a standard RGB-to-grayscale conversion. Finally, the grayscale images are normalized and vectorized. This gives a 1024-dimensional preprocessed vector X_i for each plant leaf image and a 1500×1024 preprocessed data matrix X for all plant leaf images.

From the preprocessed data matrix X of all plant leaf images and the class information matrix H, establish the 1024×1024 inter-manifold difference matrix J_D by formula (1), where

$Q_{ii} = \sum_j H_{ij} \qquad (3)$

From the preprocessed data matrix X of all plant leaf images, establish the 1024×1024 sparse-representation-based manifold local structure matrix J_L by formula (4), where

$D_{ii} = \sum_j S_{ij} \qquad (6)$

$Y_i = W^T X_i \qquad (7)$

$\max_W \operatorname{tr}\{W^T (J_D - J_L) W\} \qquad (8)$

$(J_D - J_L)\nu = \lambda\nu \qquad (9)$

In formula (9), λ denotes an eigenvalue and ν the corresponding eigenvector.

The eigenvalues λ are sorted in descending order, and the eigenvectors ν of the d largest eigenvalues form the projection matrix W. For the 1500 plant leaf images in 50 classes, the eigenvectors of the 102 largest eigenvalues are taken, giving a 1024×102 projection matrix W.

3) Recognition of plant leaf data

The experiment was repeated 10 times and the prediction accuracy was averaged. Compared with the conventional manifold learning method LPP (locality preserving projections), the method of the invention improves the recognition rate by 1.67%.

The beneficial effects of these embodiments are those described above for the invention. Sparse representation in the local neighbor graph yields more robust linear representation coefficients than traditional manifold learning methods. In addition, maximizing the class-defined inter-manifold distance in the learned subspace meets the classification requirements of plant leaf data. Together, these improve the recognition of plant leaf data.

Claims

1. A method for recognizing plant leaf data based on sparse representation, characterized in that the sparse feature extraction and recognition comprise the following specific steps:

1) Preprocessing of plant leaf data

The originally acquired plant leaf images are first denoised and smoothed, and the plant leaves are then segmented from the images. The segmented color images are converted to grayscale images. Finally, the grayscale images are normalized and vectorized to obtain the preprocessed vector data X_i of any plant leaf image and the preprocessed matrix data X of all plant leaf images;

2) Calculating the projected vector data Y_i of the preprocessed vector data X_i of any plant leaf image

A. Establishing the inter-manifold difference matrix J_D

According to the preprocessed matrix data X of all plant leaf images and the class information matrix H, the inter-manifold difference matrix J_D is established:

$\begin{aligned} J_D &= \frac{1}{2}\sum_{i,j=1}^{n} H_{ij}(X_i - X_j)(X_i - X_j)^T \\ &= \sum_i X_i Q_{ii} X_i^T - \sum_{i,j} X_i H_{ij} X_j^T = X(Q - H)X^T \end{aligned} \qquad (1)$

In formula (1): H_ij represents the element in row i, column j of the class information matrix H;

Q_ii represents the element in row i, column i of the diagonal matrix Q of the class information matrix:

$Q_{ii} = \sum_j H_{ij} \qquad (3)$

B. Establishing the sparse-representation-based manifold local structure matrix J_L

According to the preprocessed matrix data X of all plant leaf images, the sparse-representation-based manifold local structure matrix J_L is established:

$\begin{aligned} J_L &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} S_{ij}(X_i - X_j)(X_i - X_j)^T \\ &= \sum_{i,j} X_i S_{ij} X_i^T - \sum_{i,j} X_i S_{ij} X_j^T = \sum_i X_i D_{ii} X_i^T - \sum_{i,j} X_i S_{ij} X_j^T \\ &= X(D - S)X^T \end{aligned} \qquad (4)$

In formula (4): S_ij represents the element in row i, column j of the sparse representation coefficient matrix, where

$S_i = \arg\min_{S_i} \|S_i\|_1 \quad \text{s.t.} \quad X_i = X S_i \qquad (5)$

D_ii represents the element in row i, column i of the diagonal matrix D of the sparse representation coefficient matrix:

$D_{ii} = \sum_j S_{ij} \qquad (6)$

C. Calculating the projected vector data Y_i of the preprocessed vector data X_i of one plant leaf image

Through a linear transformation, the projected vector data Y_i of the preprocessed vector data X_i of one plant leaf image is obtained:

$Y_i = W^T X_i \qquad (7)$

In formula (7): W represents the transformation matrix, which is obtained from the following objective function:

$\max_W \operatorname{tr}\{W^T (J_D - J_L) W\} \qquad (8)$

Eigenvalue decomposition is performed on (J_D - J_L):

$(J_D - J_L)\nu = \lambda\nu \qquad (9)$

In formula (9): λ represents an eigenvalue;

ν represents an eigenvector;

The eigenvalues λ are arranged in descending order, and the eigenvectors ν corresponding to the d largest eigenvalues form the projection matrix W;

3) Recognition of plant leaf data

For the preprocessed vector data X_i of each plant leaf image of unknown class, the projected vector data Y_i is obtained. The class of Y_i is then identified in the low-dimensional space by the nearest-neighbor method.

2. The method for recognizing plant leaf data based on sparse representation according to claim 1, characterized in that the nearest-neighbor method is: when classifying in the low-dimensional space, a K-nearest-neighbor classifier with K = 1 is used.