CN105181650B

CN105181650B - A method of quickly differentiating local tea variety using near-infrared spectrum technique

Info

Publication number: CN105181650B
Application number: CN201510652180.6A
Authority: CN
Inventors: 武斌; 武小红; 贾红雯
Original assignee: Chuzhou Vocational and Technical College
Current assignee: Yiyang Jiaming Tea Industry Co ltd
Priority date: 2015-10-08
Filing date: 2015-10-08
Publication date: 2018-08-17
Anticipated expiration: 2035-10-08
Also published as: CN105181650A

Abstract

The present invention is a method for rapidly identifying tea varieties using near-infrared spectroscopy. First, the near-infrared diffuse reflectance spectra of tea leaves are acquired with a near-infrared spectrometer. Principal component analysis (PCA) then reduces the dimensionality of the high-dimensional spectra, linear discriminant analysis (LDA) extracts the variety-discriminating information from the spectral data, and finally a new generalized noise clustering method performs the discriminant analysis of tea varieties. The invention offers fast detection, high identification accuracy and environmental friendliness, and enables accurate identification of tea varieties.

Description

A method for rapid identification of tea varieties using near-infrared spectroscopy

Technical field

The invention relates to the technical field of tea variety identification, and in particular to a method for rapidly identifying tea varieties using near-infrared spectroscopy.

Background art

Tea is one of the world's three major beverages. It contains organic substances such as tea polyphenols, proteins and amino acids, as well as inorganic substances such as potassium, calcium and magnesium. It has calming, eyesight-improving and heat-clearing effects, and regular tea drinking is beneficial to health. Leshan Zhuyeqing is a tea brand unique to the Leshan region. However, inferior tea is passed off as premium tea on the market, and ordinary consumers, who cannot distinguish famous high-quality tea from low-quality tea, are often deceived. Such substitution also damages the brand reputation of famous high-quality teas, infringes on consumers' rights and interests, and hampers the marketing of famous teas. It is therefore necessary to develop a tea variety identification method that is simple, easy to operate and fast.

Near-infrared (NIR) spectroscopy, as a rapid non-destructive detection technique, has been applied to tea quality analysis in recent years. Zhang Long et al. used NIR spectroscopy, principal component analysis and canonical discriminant analysis to classify non-fermented, semi-fermented and fermented tea. Ning Jingming et al. used NIR spectroscopy and neural networks to distinguish Pu-erh teas with three different degrees of fermentation. Huang et al. used NIR spectroscopy and an ant colony optimization model to determine the total anthocyanin content of scented tea. Ren et al. used NIR spectroscopy to determine the chemical composition of black tea and to identify its geographical origin. He et al. used NIR spectroscopy, partial least squares discriminant analysis and the Euclidean distance method to trace the geographical origin of tea. Xiong et al. used NIR spectroscopy and a multispectral imaging system to determine the total polyphenol content of Tieguanyin tea.

Fuzzy C-means clustering (FCM) is a well-known and widely used fuzzy clustering algorithm, but it is sensitive to noisy data. Noise clustering is a fuzzy clustering algorithm suited to the cluster analysis of noisy data: it treats the noise data as a separate class. However, noise clustering depends on its parameters. Moreover, the objective functions of noise clustering are all built on the squared Euclidean distance from a sample to a class center vector, so their accuracy is often unsatisfactory when clustering data with a complex topology.
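To make the noise-sensitivity point concrete, the following minimal sketch compares, for a single outlier, the membership formula of standard FCM with that of standard noise clustering, in which an extra "noise" class sits at a constant distance `delta` from every sample. The function names, the distances and `delta` are illustrative assumptions, not values or formulas taken from this patent.

```python
def fcm_memberships(dists, m=2.0):
    """FCM memberships of one sample, given its distances to the c class centers."""
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** e for d_j in dists) for d_i in dists]


def noise_memberships(dists, delta, m=2.0):
    """Noise-clustering memberships: the c real classes plus a noise class at distance delta.

    Returns (memberships of the c real classes, membership of the noise class).
    """
    e = 2.0 / (m - 1.0)
    real = [1.0 / (sum((d_i / d_j) ** e for d_j in dists) + (d_i / delta) ** e)
            for d_i in dists]
    return real, 1.0 - sum(real)  # all c + 1 memberships sum to 1


# An outlier roughly equidistant (about 50 units) from three class centers.
print(fcm_memberships([50.0, 51.0, 52.0]))          # about 1/3 each: FCM must share it out
print(noise_memberships([50.0, 51.0, 52.0], 5.0))   # tiny real memberships, ~0.97 noise
```

Because FCM memberships are forced to sum to 1 over the real classes, a distant outlier still pulls every center with weight about 1/c; the noise class absorbs it instead.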

The NIR diffuse reflectance spectra of tea collected with a NIR spectrometer are high-dimensional data, and after dimensionality reduction and feature extraction the cluster topology of the data is relatively complex. If such data are clustered with noise clustering, which measures the data with the Euclidean distance, the clustering result is unsatisfactory.

Summary of the invention

To address the shortcomings of the noise clustering method in the prior art, the present invention proposes a method for rapidly identifying tea varieties using near-infrared spectroscopy that offers fast detection, high identification accuracy and environmental friendliness, and achieves accurate identification of tea varieties. It thereby overcomes the limitation that noise clustering can only cluster data with a simple topological structure, and improves the accuracy of noise clustering.

The object of the present invention is achieved by the following technical means: a method for rapidly identifying tea varieties using near-infrared spectroscopy, characterized in that it comprises the following steps:

Step 1. Acquisition of the NIR spectra of the tea samples: tea samples of different varieties are measured with a NIR spectrometer to obtain their NIR diffuse reflectance spectra;

Step 2. Dimensionality reduction of the NIR spectra of the tea samples: principal component analysis (PCA) transforms the NIR spectra of the tea samples from high-dimensional data into low-dimensional data;

Step 3. Extraction of the discriminant information from the NIR spectra of the tea samples: linear discriminant analysis (LDA) extracts the discriminant information from the NIR spectra of the tea samples;

Step 4. Fuzzy C-means clustering is run to obtain the initial cluster centers;

Step 5. Identification of tea varieties with a generalized noise clustering method: the generalized noise clustering method is run from the initial cluster centers of Step 4 to obtain fuzzy membership degrees, from which the tea varieties are identified.

The principle of the invention, underlying the NIR diffuse reflectance spectra used in Steps 1, 2 and 3, is that the NIR diffuse reflectance spectra of different tea samples carry different internal quality information: teas of different varieties differ in internal quality, and their corresponding NIR diffuse reflectance spectra therefore also differ.

The generalized noise clustering method in Step 5 classifies the tea varieties by generalized noise clustering based on the p-th power of the Euclidean distance, as follows:

(1). Initialization

Set the number of tea NIR spectral samples n (1 < n < +∞), the number of sample classes c (1 < c < n), the weighting exponents m (1 < m < +∞) and p (1 < p < +∞), the initial iteration counter r = 1, the maximum number of iterations $r_{\max}$, the error tolerance ε, and the initial class centers $v_{i,0}$;

(2). Calculate the parameter $\alpha_{ik}$:

Here $\sigma^2$ is the sample variance; $\alpha_{ik}$ is the parameter of the k-th sample (k = 1, 2, …, n) with respect to the i-th class (i = 1, 2, …, c); $D_{ik,r} = \lVert x_k - v_{i,r-1} \rVert$ is the Euclidean distance between $x_k$, the k-th sample, and $v_{i,r-1}$, the center vector of class i at iteration r − 1; $D_{jk,r} = \lVert x_k - v_{j,r-1} \rVert$ is the Euclidean distance between $x_k$ and $v_{j,r-1}$, the center vector of class j at iteration r − 1; the overall sample mean is $\frac{1}{n}\sum_{j=1}^{n} x_j$, where $x_j$ is the j-th sample;

(3). Calculate the fuzzy membership value $u_{ik,r}$ at the r-th iteration:

Here $u_{ik,r}$ is the fuzzy membership of the k-th sample in the i-th class at the r-th iteration; $D_{ik,r} = \lVert x_k - v_{i,r-1} \rVert$, and $v_{i,r-1}$ is the center vector of class i at iteration r − 1;

(4). Calculate the class center $v_{i,r}$ at the r-th iteration:

When $\max_i \lVert v_{i,r} - v_{i,r-1} \rVert < \varepsilon$ or $r = r_{\max}$, the iteration terminates; otherwise, set r = r + 1 and return to step (2) to continue iterating.
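The equations for $\alpha_{ik}$, $u_{ik,r}$ and $v_{i,r}$ are not reproduced in this text, so the loop structure of steps (1) to (4) can only be sketched under an assumption. The sketch below is **not** the patented formula: it assumes a standard noise-clustering objective with the squared distance replaced by $D^p$, i.e. $J = \sum_i\sum_k u_{ik}^m D_{ik}^p + \sum_k u_{*k}^m \delta^p$, where `delta` is a hypothetical noise distance, and it omits $\alpha_{ik}$ entirely. Memberships then follow $u_{ik} \propto D_{ik}^{-p/(m-1)}$, and setting the gradient with respect to $v_i$ to zero gives a fixed-point center update with weights $u_{ik}^m D_{ik}^{p-2}$. For p = 2 it reduces to ordinary noise clustering. The function name is hypothetical.

```python
import numpy as np


def generalized_noise_clustering(X, V0, m=2.0, p=3.0, delta=1.0,
                                 eps=1e-5, r_max=100):
    """Hypothetical sketch: noise clustering with distance D**p (alpha_ik omitted).

    X: (n, d) samples; V0: (c, d) initial centers (e.g. from FCM).
    Returns (U, V, r): U is the (c, n) membership matrix of the c real classes,
    V the final centers, r the number of iterations performed.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    tiny = 1e-12                                   # guards against zero distances
    for r in range(1, r_max + 1):
        # Step (2)/(3): D[i, k] = ||x_k - v_{i,r-1}||
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + tiny
        # Memberships proportional to D**(-p/(m-1)); the noise class sits at distance delta.
        w = D ** (-p / (m - 1.0))
        noise_w = delta ** (-p / (m - 1.0))
        U = w / (w.sum(axis=0, keepdims=True) + noise_w)
        # Step (4): fixed-point center update for the D**p objective.
        W = (U ** m) * D ** (p - 2.0)
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.max(np.linalg.norm(V_new - V, axis=1))
        V = V_new
        if shift < eps:                            # max_i ||v_{i,r} - v_{i,r-1}|| < eps
            break
    return U, V, r
```

The loop order mirrors the text: memberships at iteration r use the centers from iteration r − 1, and the stopping test compares consecutive centers.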

Compared with the prior art, the present invention has the following notable advantages:

1. The invention classifies tea varieties by generalized noise clustering based on the p-th power of the Euclidean distance, which overcomes the limitation that noise clustering can only cluster data with a simple topology and improves the accuracy of noise clustering.

2. The method acquires the NIR diffuse reflectance spectra of tea leaves with a NIR spectrometer, reduces the dimensionality of the high-dimensional spectra with principal component analysis (PCA), extracts variety information from the spectral data with linear discriminant analysis (LDA), and finally identifies the tea varieties with a new generalized noise clustering method.

3. The invention offers fast detection, high identification accuracy and environmental friendliness, and achieves accurate identification of tea varieties.

Description of drawings

Fig. 1 is a schematic flow chart of the present invention;

Fig. 2 shows the diffuse reflectance NIR spectra of the tea samples in the present invention;

Fig. 3 shows the two-dimensional data obtained after LDA feature extraction in the present invention;

Fig. 4 shows the fuzzy membership degrees obtained by the method of the present invention;

Fig. 5 shows the clustering accuracy of tea variety identification achieved by the method of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings and a specific embodiment. The NIR tea variety identification method based on generalized noise clustering is applicable to the identification of tea varieties; its implementation flow is shown in Figure 1.

Example

Step 1. Acquisition of the NIR spectra of the tea samples: tea samples of different varieties are measured with a NIR spectrometer to obtain their NIR diffuse reflectance spectra.

Three teas were collected: high-quality Leshan Zhuyeqing, inferior Leshan Zhuyeqing and Emeishan Maofeng, with 32 samples of each, 96 samples in total. All tea samples were ground and passed through a 40-mesh sieve; 0.5 g of each sample was mixed uniformly with potassium bromide at a ratio of 1:100, and 1 g of the mixture was pressed into a pellet. During spectrum acquisition the laboratory temperature was about 25 °C and the relative humidity about 50%, and the FTIR-7600 Fourier transform near-infrared spectrometer was warmed up for 1 hour after being switched on. The spectrometer scanned each tea sample 32 times over the wavenumber range 4001.569 to 401.1211 cm⁻¹ with a sampling interval of 1.9285 cm⁻¹, so the NIR spectrum of each tea sample is a 1868-dimensional vector. Each sample was measured 3 times, and the averaged spectrum was used as the experimental data for subsequent modeling. The NIR spectra of the tea samples are shown in Figure 2.
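As a quick consistency check on the stated dimensionality, the number of sampled wavenumbers follows from the range and the interval, counting both endpoints. The sketch below also shows the point-wise averaging of the 3 replicate measurements. Both helper names are illustrative, and because the reported interval is rounded, the step count is rounded to the nearest integer.

```python
def n_spectral_points(wn_high, wn_low, step):
    """Number of sampled wavenumbers between wn_high and wn_low, endpoints included."""
    return round((wn_high - wn_low) / step) + 1


def average_replicates(scans):
    """Point-wise mean of replicate spectra given as equal-length lists."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


print(n_spectral_points(4001.569, 401.1211, 1.9285))  # 1868
```

Here (4001.569 − 401.1211) / 1.9285 ≈ 1866.97 ≈ 1867 intervals, giving the 1868 points stated above.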

Step 2. Dimensionality reduction of the NIR spectra of the tea samples: PCA transforms the NIR spectra of the tea samples from high-dimensional data into low-dimensional data.

PCA compressed the NIR spectral data of the 96 samples into 20-dimensional data.
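A minimal sketch of Step 2, assuming the 96 averaged spectra are stacked row-wise in a NumPy array and that PCA means mean-centering followed by a singular value decomposition. The patent does not state its preprocessing (for example, whether variables were scaled), and the function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components=20):
    """Project the rows of X (samples x wavenumbers) onto the leading principal components.

    Returns (scores, components, mean, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each wavenumber
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                      # (n_samples, n_components)
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, components, mean, explained[:n_components]
```

Usage, with `spectra` a hypothetical 96 × 1868 array: `scores, comps, mu, ev = pca_reduce(spectra, 20)` yields the 96 × 20 matrix passed on to Step 3. Fitting PCA on all 96 samples before the train/test split follows the order described here.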

Step 3. Extraction of the discriminant information from the NIR spectra of the tea samples: LDA extracts the discriminant information from the NIR spectra of the tea samples.

From each tea variety, 13 samples were selected to form the training set (39 samples in total), and the remaining samples formed the test set (57 samples in total). LDA was run on the 20-dimensional training samples to compute the discriminant vectors; the first 2 discriminant vectors were retained, and the 20-dimensional test samples were projected onto them. The resulting LDA score plot of the test samples is shown in Figure 3.
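With c = 3 varieties, the between-class scatter matrix has rank at most c − 1 = 2, so Fisher LDA yields at most two informative discriminant vectors, which is consistent with keeping the first two. A minimal sketch of this step, assuming classical Fisher LDA (eigenvectors of $S_w^{-1} S_b$) and using a pseudo-inverse in case $S_w$ is singular. The function name and the choice of which 13 samples go into the training set are assumptions.

```python
import numpy as np


def lda_fit(X, y, n_components=2):
    """Fisher LDA: directions maximizing between-class vs. within-class scatter.

    X: (n, d) training data; y: (n,) class labels. Returns a (d, n_components) matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                           # within-class scatter
    Sb = np.zeros((d, d))                           # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += Xc.shape[0] * diff @ diff.T
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]            # largest eigenvalues first
    return evecs.real[:, order[:n_components]]
```

Usage: fit on the 39 training rows of the 20-dimensional PCA scores, `W = lda_fit(Z_train, y_train, 2)`, then project the 57 test rows with `Z_test @ W` to obtain two-dimensional scores of the kind plotted in Figure 3 (`Z_train`, `y_train`, `Z_test` are hypothetical names).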

Step 4. Fuzzy C-means clustering is run to obtain the initial cluster centers.

The weighting exponent of fuzzy C-means clustering (FCM) was set to m = 2.0, the maximum number of iterations to $r_{\max} = 100$ and the error tolerance to ε = 0.00001; the initial class center vectors of FCM were the first 3 samples of the test data in Figure 3. The class center vectors computed by FCM, used as the initial centers $v_{i,0}$ in Step 5, are:

$v_{1,0} = [-0.097,\ 0.0026]$

$v_{2,0} = [0.0198,\ -0.0910]$

$v_{3,0} = [0.0660,\ 0.0472]$
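Step 4 uses ordinary FCM only to seed Step 5. A minimal sketch with the parameters stated above (m = 2, ε = 0.00001, $r_{\max}$ = 100), using the standard FCM membership and center-update formulas; the function name is illustrative.

```python
import numpy as np


def fcm(X, V0, m=2.0, eps=1e-5, r_max=100):
    """Standard fuzzy C-means. Returns memberships U (c, n) and centers V (c, d)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(r_max):
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        w = D ** (-2.0 / (m - 1.0))
        U = w / w.sum(axis=0, keepdims=True)          # each column sums to 1
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        shift = np.max(np.linalg.norm(V_new - V, axis=1))
        V = V_new
        if shift < eps:
            break
    return U, V
```

Usage, with `Z_test` a hypothetical 57 × 2 array of LDA test scores: `_, V0 = fcm(Z_test, Z_test[:3])` initializes from the first three test samples, as described above, and returns three centers like $v_{1,0}$ to $v_{3,0}$.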

Step 5. Identification of tea varieties with the generalized noise clustering method: the generalized noise clustering method is run from the initial cluster centers of Step 4 to obtain the fuzzy membership degrees, from which the tea varieties are identified.

The generalized noise clustering method in Step 5 is as follows:

(1). Initialization

Set the number of tea NIR spectral samples n = 57, the number of sample classes c = 3, the weighting exponent m = 2 and the exponent p (1 < p < +∞), the initial iteration counter r = 1, the maximum number of iterations $r_{\max} = 100$, the error tolerance ε = 0.00001, and the initial class centers $v_{i,0}$ (i = 1, 2, 3);

(2). Calculate the parameter $\alpha_{ik}$:

Here $\sigma^2$ is the sample variance; $\alpha_{ik}$ is the parameter of the k-th sample (k = 1, 2, …, n) with respect to the i-th class (i = 1, 2, …, c); $D_{ik,r} = \lVert x_k - v_{i,r-1} \rVert$ is the Euclidean distance between $x_k$, the k-th sample, and $v_{i,r-1}$, the center vector of class i at iteration r − 1; $D_{jk,r} = \lVert x_k - v_{j,r-1} \rVert$ is the Euclidean distance between $x_k$ and $v_{j,r-1}$, the center vector of class j at iteration r − 1; the overall sample mean is $\frac{1}{n}\sum_{j=1}^{n} x_j$, where $x_j$ is the j-th sample.

Here $u_{ik,r}$ is the fuzzy membership of the k-th sample in the i-th class at the r-th iteration; $D_{ik,r} = \lVert x_k - v_{i,r-1} \rVert$, and $v_{i,r-1}$ is the center vector of class i at iteration r − 1.

Experimental results: the iteration terminates after 7 iterations (r = 7), and the resulting fuzzy membership values $u_{ik,7}$ are shown in Figure 4. For the k-th sample, the index i with the largest $u_{ik,7}$ is taken, i.e., the k-th sample is judged to belong to class i. With the exponent p set to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13 in turn, the clustering accuracies obtained from the fuzzy membership values are shown in Figure 5.
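The maximum-membership rule and the accuracy figure can be sketched as follows. Cluster indices are arbitrary, so an accuracy computation needs a mapping from cluster index to variety; here that mapping is passed in explicitly as an assumption. The helper names are illustrative.

```python
import numpy as np


def hard_labels(U):
    """Assign sample k to the class i with the largest membership u_ik (U has shape (c, n))."""
    return np.argmax(np.asarray(U), axis=0)


def clustering_accuracy(U, true_labels, cluster_to_class):
    """Fraction of samples whose mapped cluster label equals the true variety label."""
    pred = np.asarray(cluster_to_class)[hard_labels(U)]
    return float(np.mean(pred == np.asarray(true_labels)))


U = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.0, 0.7]])                      # 3 classes x 3 samples
print(hard_labels(U))                                # [0 1 2]
```

The arg-max is taken over the real classes only, so it still applies when noise clustering leaves each column summing to less than 1. Repeating the clustering and this accuracy computation for each p from 2 to 13 produces an accuracy-versus-p curve of the kind shown in Figure 5.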

Experimental results: the iteration terminates at r = 7, and the final class centers $v_{i,7}$ are:

a. Determine which variety the tea with class center $v_{0,7}$ belongs to:

Since the corresponding value is the smallest, the tea with class center $v_{0,7}$ is judged to be high-quality Leshan Zhuyeqing.

b. Determine which variety the tea with class center $v_{1,7}$ belongs to:

Since the corresponding value is the smallest, the tea with class center $v_{1,7}$ is judged to be Emeishan Maofeng.

c. Determine which variety the tea with class center $v_{2,7}$ belongs to:

Since the corresponding value is the smallest, the tea with class center $v_{2,7}$ is judged to be inferior Leshan Zhuyeqing.
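The quantity compared in items a to c is not reproduced in this text. One plausible reading, sketched here purely as an assumption, is that each final center is assigned to the variety whose reference point (for example, the mean of that variety's labeled training samples in LDA space) is nearest in Euclidean distance. The function name and the reference means are hypothetical.

```python
import numpy as np


def map_centres_to_varieties(V, ref_means, names):
    """Assign each final center v_i to the variety whose reference mean is nearest.

    Note: this is a plain arg-min per center; it does not enforce a one-to-one mapping.
    """
    V = np.asarray(V, dtype=float)
    R = np.asarray(ref_means, dtype=float)
    d = np.linalg.norm(V[:, None, :] - R[None, :, :], axis=2)   # d[i, j] = ||v_i - r_j||
    return [names[j] for j in np.argmin(d, axis=1)]


names = ["high-quality Leshan Zhuyeqing", "Emeishan Maofeng", "inferior Leshan Zhuyeqing"]
print(map_centres_to_varieties([[0, 0], [5, 5], [10, 0]],
                               [[0.1, 0.0], [9.8, 0.2], [5.0, 4.9]], names))
```

If two centers map to the same variety, the clustering has effectively merged classes, and a one-to-one assignment (for example, the Hungarian algorithm) would be needed instead.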

The above describes only some specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for rapidly identifying tea varieties using near-infrared spectroscopy, characterized in that it comprises the following steps:

Step 1. Collection of near-infrared spectra of tea samples: collecting tea samples of different varieties with a near-infrared spectrometer to obtain near-infrared diffuse reflectance spectra of tea samples;

Step 2. Dimensionality reduction of the NIR spectra of the tea samples: principal component analysis transforms the NIR spectra of the tea samples from high-dimensional data into low-dimensional data;

Step 3. Extraction of the discriminant information from the NIR spectra of the tea samples: linear discriminant analysis extracts the discriminant information from the NIR spectra of the tea samples;

Step 4. Fuzzy C-means clustering is run to obtain the initial cluster centers;

Step 5. Identification of tea varieties with a generalized noise clustering method: the generalized noise clustering method is run from the initial cluster centers of Step 4 to obtain fuzzy membership degrees, from which the tea varieties are identified;

The generalized noise clustering method in Step 5 classifies the tea varieties by generalized noise clustering based on the p-th power of the Euclidean distance, as follows:

(1). Initialization

Set the number of tea NIR spectral samples n (1 < n < +∞), the number of sample classes c (1 < c < n), the weighting exponents m (1 < m < +∞) and p (1 < p < +∞), the initial iteration counter r = 1, the maximum number of iterations $r_{\max}$, the error tolerance ε, and the initial class centers $v_{i,0}$;

(2). Calculate the parameter $\alpha_{ik}$:

Here $\sigma^2$ is the sample variance; $\alpha_{ik}$ is the parameter of the k-th sample (k = 1, 2, …, n) with respect to the i-th class (i = 1, 2, …, c); $D_{ik,r} = \lVert x_k - v_{i,r-1} \rVert$ is the Euclidean distance between $x_k$, the k-th sample, and $v_{i,r-1}$, the center vector of class i at iteration r − 1; $D_{jk,r} = \lVert x_k - v_{j,r-1} \rVert$ is the Euclidean distance between $x_k$ and $v_{j,r-1}$, the center vector of class j at iteration r − 1; the overall sample mean is $\frac{1}{n}\sum_{j=1}^{n} x_j$, where $x_j$ is the j-th sample;

(3). Calculate the fuzzy membership value $u_{ik,r}$ at the r-th iteration:

Here $u_{ik,r}$ is the fuzzy membership of the k-th sample in the i-th class at the r-th iteration; $D_{ik,r} = \lVert x_k - v_{i,r-1} \rVert$, and $v_{i,r-1}$ is the center vector of class i at iteration r − 1;

(4). Calculate the class center $v_{i,r}$ at the r-th iteration:

When $\max_i \lVert v_{i,r} - v_{i,r-1} \rVert < \varepsilon$ or $r = r_{\max}$, the iteration terminates; otherwise, set r = r + 1 and return to step (2) to continue iterating.