CN110749565A

CN110749565A - Method for rapidly identifying storage years of Pu'er tea

Info

Publication number: CN110749565A
Application number: CN201911201641.2A
Authority: CN
Inventors: 冯德军; 王淑贤; 朱佳成; 肖航; 杨振发; 姜明顺; 隋青美
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2020-02-04

Abstract

The invention discloses a method for rapidly identifying the storage year of Pu'er tea, comprising the following steps. Collecting the original spectrum: Pu'er tea leaves are ground and crushed to prepare a Pu'er tea sample, and the original spectrum of the sample is measured with a near-infrared spectrometer. Preprocessing the original spectrum: the original spectrum is preprocessed with a combination of first derivative and multivariate scatter correction, and the preprocessed spectral data are divided into a calibration set and a validation set. Constructing a discriminant partial least squares model: a discriminant partial least squares model is established from the spectral data of the calibration-set samples, and its validity is verified with the spectral data of the validation-set samples. Identifying a sample: the near-infrared spectrum of a Pu'er tea sample of unknown storage year is acquired and preprocessed, and the preprocessed data are input into the validated discriminant partial least squares model to obtain the storage year of the Pu'er tea. The disclosed method is simple to operate and gives highly accurate identification results.

Description

Method for rapidly identifying storage years of Pu'er tea

Technical Field

The invention relates to the technical field of tea identification, and in particular to a method for rapidly identifying the storage year of Pu'er tea.

Background

Pu'er tea is a distinctive tea product of Yunnan Province, China. It is produced mainly in Xishuangbanna, Lincang, Pu'er and other areas of Yunnan and has a unique taste and aroma. Studies have reported that Pu'er tea has blood-lipid-lowering, antibacterial, antiviral and laxative activities; these health benefits make it highly sought after and have given it a large market. Pu'er tea is priced higher than most other teas on the market, prices differ widely between types of Pu'er tea, and the different types look similar, so ordinary consumers find them difficult to tell apart. Sensory evaluation and physicochemical index testing are currently the two main methods for assessing the quality of Pu'er tea. However, sensory evaluation depends mainly on the evaluator's experience and is prone to subjective error, while physicochemical testing is complex, time-consuming and labor-intensive.

In recent years, various analytical techniques combined with chemometric methods have been applied to the quantitative and qualitative analysis of Pu'er tea. The analytical techniques mainly include infrared spectroscopy, electronic nose, laser-induced breakdown spectroscopy and surface-enhanced Raman spectroscopy; the chemometric methods include principal component analysis, artificial neural networks, linear discriminant analysis and support vector machines.

Near-infrared (NIR) spectroscopy is a detection technique that has developed rapidly and been widely adopted in recent years. It is rapid and nondestructive, has low analysis cost and high detection speed, and is widely used for qualitative and quantitative analysis. In NIR spectroscopy, the near-infrared spectrum of a sample is measured and the resulting spectral data are analyzed with chemometric methods to identify the sample.

Among existing spectroscopic studies of Pu'er tea storage year, some methods distinguish Pu'er teas of different years by comparing absorption peaks and absorbance ratios, but cannot identify unknown samples. Others combine spectroscopy with chemometric methods for qualitative discrimination, but their accuracy is unsatisfactory. In addition, the spectral matrix measured by NIR spectroscopy contains a great deal of useless noise, which causes large errors in the identification results.

Disclosure of Invention

To solve these technical problems, the invention provides a method for rapidly identifying the storage year of Pu'er tea that is simple to operate and gives highly accurate identification results.

To achieve this purpose, the technical solution of the invention is as follows:

A method for rapidly identifying the storage year of Pu'er tea comprises the following steps:

(1) Collecting the original spectrum: Pu'er tea leaves are ground and crushed to prepare a Pu'er tea sample, and the original spectrum of the sample is measured with a near-infrared spectrometer;

(2) Preprocessing the original spectrum: the original spectrum is preprocessed with a combination of first derivative and multivariate scatter correction, and the preprocessed spectral data are divided into a calibration set and a validation set;

(3) Constructing a discriminant partial least squares model: a discriminant partial least squares model is established from the spectral data of the calibration set, and its validity is verified with the spectral data of the validation set;

(4) Identifying the sample: the near-infrared spectrum of a Pu'er tea sample of unknown storage year is collected and preprocessed with the combined first derivative and multivariate scatter correction method, and the processed data are input into the validated discriminant partial least squares model to obtain the storage year of the Pu'er tea.

In the above scheme, step (1) uses a Fourier transform near-infrared spectrometer with polytetrafluoroethylene as the background reference, a quartz halogen lamp as the light source, and a high-throughput dual-axis Michelson interferometer.

In the above scheme, the preprocessing in step (2) is performed as follows:

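1) Calculate the average spectrum $\bar{X}$ of the spectral matrix $X_{n\times k}$ containing $n$ samples and $k$ wavenumber points:

$$\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{i,j}, \quad j = 1, 2, \ldots, k \qquad (1)$$

where $n$ is the number of Pu'er tea samples, $k$ is the number of wavenumber points in each spectrum, and $X_{i,j}$ is the absorbance of the $i$th sample at the $j$th wavenumber point;

2) Establish a linear regression between each spectrum $X_i$ and the average spectrum $\bar{X}$ to obtain the offset $a_i$ and the slope $b_i$:

$$X_i = a_i + b_i\,\bar{X} + e_i \qquad (2)$$

where $e_i$ is the regression residual;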

3) Perform multivariate scatter correction: from each spectrum $X_i$ and its corresponding $a_i$ and $b_i$, obtain the corrected spectrum $X_i^{(\mathrm{MSC})}$:

$$X_i^{(\mathrm{MSC})} = \frac{X_i - a_i}{b_i} \qquad (3)$$

4) Apply the direct difference method to $X_i^{(\mathrm{MSC})}$ to compute the first derivative with difference width $g$ at wavenumber point $k$:

$$X_i^{(\mathrm{1st})} = \frac{X_{i,k+g} - X_{i,k}}{g} \qquad (4)$$

where $X_{i,k+g}$ and $X_{i,k}$ are the absorbances of the MSC-corrected spectrum of the $i$th sample at wavenumber points $k+g$ and $k$, respectively.
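Steps 1) to 4) can be sketched in Python with NumPy. This is a minimal sketch, assuming the spectra are stored one per row in an (n, k) array; the difference width g is not given in the source, so g=1 is only a placeholder default, and `np.polyfit` stands in for the least-squares fit of equation (2). The function names are hypothetical.

```python
import numpy as np


def msc(X, reference=None):
    """Multivariate scatter correction following equations (1)-(3).

    X is an (n, k) array with one spectrum per row. If no reference is
    given, the average spectrum of X (equation (1)) is used.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, xi in enumerate(X):
        # Least-squares fit xi ~ a_i + b_i * ref (equation (2)); polyfit returns [slope, intercept].
        b_i, a_i = np.polyfit(ref, xi, deg=1)
        corrected[i] = (xi - a_i) / b_i  # equation (3)
    return corrected


def first_derivative(X, g=1):
    """Forward (direct) difference with width g, equation (4).

    Returns an (n, k - g) array whose column j is (X[:, j + g] - X[:, j]) / g.
    """
    if g < 1:
        raise ValueError("difference width g must be at least 1")
    X = np.asarray(X, dtype=float)
    return (X[:, g:] - X[:, :-g]) / g


def preprocess(X, g=1, reference=None):
    # MSC first, then the first derivative, in the order of steps 3) and 4).
    return first_derivative(msc(X, reference), g)


base = np.sin(np.linspace(0.0, 3.0, 50))
X = np.array([0.1 + 1.2 * base, -0.05 + 0.9 * base, 0.02 + 1.0 * base])
Xp = preprocess(X, g=2)
print(Xp.shape)  # (3, 48)
```

Because the three toy spectra differ only by an offset and a gain, MSC maps all of them onto the same average spectrum, so the preprocessed rows coincide. The derivative drops the last g points, which have no partner at k + g.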

Further, step (3) is performed as follows:

1) Perform principal component decomposition of the spectral matrix $X_{n\times k}$ and the class matrix $C_{n\times 1}$ of the calibration-set samples:

$$X_{n\times k} = T_{n\times d}\,P_{d\times k} + E_{n\times k} \qquad (5)$$

$$C_{n\times 1} = U_{n\times d}\,Q_{d\times 1} + F_{n\times 1} \qquad (6)$$

where $T_{n\times d}$ and $U_{n\times d}$ are the absorbance characteristic factor (score) matrix and the class characteristic factor matrix, $P_{d\times k}$ and $Q_{d\times 1}$ are the absorbance loading matrix and the class loading matrix, and $E_{n\times k}$ and $F_{n\times 1}$ are the error matrices;

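2) Regress $U_{n\times d}$ on $T_{n\times d}$ by multiple linear regression:

$$U_{n\times d} = T_{n\times d}\,B \qquad (7)$$

$$B = (T'_{n\times d}\,T_{n\times d})^{-1}\,T'_{n\times d}\,U_{n\times d} \qquad (8)$$

where $'$ denotes the matrix transpose;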

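3) Determine the number of characteristic factors $d$. Substituting equation (7) into equation (6), with $B$ given by equation (8), yields:

$$C_{n\times 1} = T_{n\times d}\,B\,Q_{d\times 1} + F_{n\times 1} \qquad (9)$$

The number of characteristic factors $d$ is chosen according to the root mean square error of cross-validation (RMSECV) between the true and predicted values of the calibration set, $\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{c}_i - c_i)^2}$, where $c_i$ is the class value of calibration sample $i$ and $\hat{c}_i$ is its cross-validated prediction;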

4) Verify the validity of the discriminant partial least squares model with the spectral data of the validation-set samples.
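Equations (5) to (9) describe a latent-factor decomposition in which the class vector guides the extraction of the factors. A minimal sketch of one standard way to fit such a model is the NIPALS PLS1 algorithm below; it is a textbook algorithm swapped in for illustration, and the source does not say which algorithm its software used. It assumes a single numeric class column C, mean-centres X and C, and returns T, P, Q and the residual E in the shapes of equations (5) and (6). The class scores U are not formed explicitly in PLS1. The names `pls1_nipals` and `pls1_predict` are hypothetical.

```python
import numpy as np


def pls1_nipals(X, c, d):
    """NIPALS PLS1 with d latent factors for one response column (the class vector C).

    Returns scores T (n x d), X loadings P (d x k), class loadings Q (d,),
    the X residual E (n x k), centring means and a regression vector beta.
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float).ravel()
    x_mean, c_mean = X.mean(axis=0), c.mean()
    E, f = X - x_mean, c - c_mean          # centred data, deflated inside the loop
    T, P, W, Q = [], [], [], []
    for _ in range(d):
        w = E.T @ f
        w /= np.linalg.norm(w)             # weight vector built from the class information
        t = E @ w                          # score vector (one column of T)
        tt = t @ t
        p = E.T @ t / tt                   # X loading (one row of P)
        q = (f @ t) / tt                   # class loading (one entry of Q)
        E = E - np.outer(t, p)             # deflate X
        f = f - q * t                      # deflate the class vector
        T.append(t)
        P.append(p)
        W.append(w)
        Q.append(q)
    T, P, W, Q = np.array(T).T, np.array(P), np.array(W).T, np.array(Q)
    beta = W @ np.linalg.solve(P @ W, Q)   # coefficients in the original variable space
    return {"T": T, "P": P, "Q": Q, "E": E,
            "x_mean": x_mean, "c_mean": c_mean, "beta": beta}


def pls1_predict(model, X):
    """Predicted class value for each row of X."""
    return (np.asarray(X, dtype=float) - model["x_mean"]) @ model["beta"] + model["c_mean"]


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
c = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 3.0
model = pls1_nipals(X, c, d=2)
print(model["T"].shape, model["P"].shape)  # (30, 2) (2, 4)
```

By construction the centred X equals T P plus the final residual E, which is equation (5) with the mean removed; with as many factors as X has independent columns, the model reduces to ordinary least squares.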

Preferably, the threshold of the discriminant partial least squares model is set to 0.5: a validation-set sample is correctly discriminated when the absolute difference between its predicted and actual values is less than 0.5. The discriminant partial least squares model is established with $d = 6$ characteristic factors.
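The 0.5 threshold rule can be written as a one-line test. This sketch assumes predictions and class codes are plain numbers; the function names are hypothetical.

```python
def is_correct(predicted, actual, threshold=0.5):
    """A prediction counts as correct when |predicted - actual| < threshold."""
    return abs(predicted - actual) < threshold


def accuracy(predictions, actuals, threshold=0.5):
    """Fraction of samples whose prediction lies within the threshold of the true class."""
    hits = sum(is_correct(p, a, threshold) for p, a in zip(predictions, actuals))
    return hits / len(actuals)


print(accuracy([1.1, 2.6, 3.0, 4.4], [1, 2, 3, 4]))  # 0.75
```

With integer class codes spaced one apart, a 0.5 threshold amounts to rounding the prediction to the nearest class, with values exactly halfway between two classes counted as wrong.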

Preferably, in step (1), the near-infrared spectrum is collected over the wavenumber range 10000-4000 cm^-1 with 32 repeated scans per acquisition and a spectral resolution of 4 cm^-1; each sample is measured 3 times and the average is taken as the final spectrum.

Through the above technical solution, the method for rapidly identifying the storage year of Pu'er tea provided by the invention has the following advantages:

1. Near-infrared spectroscopy is fast, efficient, low-cost and widely applicable, can analyze solid samples directly, and, when full-spectrum analysis is combined with chemometric methods, can extract a large amount of useful information from the near-infrared spectrum.

2. In preprocessing, multivariate scatter correction (MSC) corrects each spectrum against the average spectrum, which removes light-scattering effects caused by optical path differences and by non-uniform particle size and density of the sample; the first derivative (1st Der) differentiates the spectrum by the direct difference method, which effectively removes baseline drift and background interference and improves the signal-to-noise ratio and resolution of the spectrum.
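The two corrections target different distortions. The short NumPy demonstration below uses a synthetic two-band spectrum (invented data, not from the source) and models scatter, as an assumption, as an additive offset plus a multiplicative gain. Regressing the distorted spectrum on its reference and inverting the fit recovers the reference, and a constant baseline shift disappears under a forward difference (`np.diff`, i.e. g = 1).

```python
import numpy as np

# Synthetic two-band "spectrum" on an arbitrary axis (illustrative data only).
axis = np.linspace(0.0, 1.0, 200)
reference = (np.exp(-((axis - 0.4) / 0.05) ** 2)
             + 0.5 * np.exp(-((axis - 0.7) / 0.08) ** 2))

# Scatter modelled as an additive offset a and a multiplicative gain b (assumed values).
a, b = 0.15, 1.3
observed = a + b * reference

# MSC: fit observed ~ intercept + slope * reference, then invert the fit.
slope, intercept = np.polyfit(reference, observed, deg=1)
msc_corrected = (observed - intercept) / slope
print(np.allclose(msc_corrected, reference))  # True

# First derivative (forward difference, g = 1): a constant baseline shift cancels.
shift = 0.8
print(np.allclose(np.diff(reference), np.diff(reference + shift)))  # True
```

A baseline that slopes linearly along the axis is reduced by the first derivative to a constant offset rather than removed entirely.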

3. Discriminant partial least squares (PLS-DA) is a discriminant-analysis variant of partial least squares (PLS). PLS combines features of multiple linear regression, canonical correlation analysis and principal component analysis, and discriminates well by projecting a high-dimensional data matrix into a lower-dimensional space. When decomposing the spectral matrix, PLS-DA introduces the information of the class matrix into the spectral matrix and then performs the orthogonal decomposition. In this way, useless noise in the spectral matrix and useless information in the class matrix are both effectively removed, which helps to obtain an optimal calibration model.

The discriminant model for the storage year of Pu'er tea, established by combining near-infrared spectroscopy with the discriminant partial least squares algorithm, achieves high accuracy and can be cross-checked by physicochemical methods, which supports the validity of the discrimination results; the method is simple to operate and accurate.

Drawings

To illustrate the embodiments of the invention and the related technical solutions more clearly, the drawings used in describing the embodiments are briefly introduced below.

FIG. 1 shows the original spectra of Pu'er tea from five different storage years;

FIG. 2 shows the spectra after first-derivative and MSC preprocessing;

FIG. 3 shows the PLS-DA discriminant model results for the Pu'er tea calibration-set samples;

FIG. 4 shows the PLS-DA discriminant model results for the Pu'er tea validation-set samples.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings.

The invention provides a method for rapidly identifying the storage year of Pu'er tea; a specific embodiment is as follows:

1. Samples

The samples were Pu'er teas of five different storage years from the unworked mountain area of Pu'er City, Yunnan Province, produced in 2010, 2012, 2014, 2016 and 2018, respectively, and are reasonably representative. Forty samples were prepared for each storage year, giving 200 samples in total. For each sample, 5 g of tea leaves (weighed to 0.01 g) was ground in a solid sample crusher (Wanke Instruments Co., Ltd.) and transferred to a sample bottle to serve as the Pu'er tea sample.

2. Instruments and spectral acquisition

Near-infrared spectra were measured with a Fourier transform near-infrared spectrometer (ABB MB3600, Switzerland) equipped with a high-sensitivity InGaAs detector, a diffuse reflectance probe and a solid sample accessory. The spectrometer uses a quartz halogen lamp as the light source and a high-throughput dual-axis Michelson interferometer, which ensures stability and repeatability. Near-infrared spectral data were acquired with Horizon MB software (version 3.4.0.3, ABB, Switzerland). The scanning range was 10000-4000 cm^-1, and data were recorded at 1.928 cm^-1 intervals, giving 3112 variables per spectrum.
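As a quick consistency check on the acquisition settings, the scan range divided by the sampling interval should give roughly the stated number of variables. The source does not say whether the instrument grid includes both endpoints (which would add one point), so this sketch only checks that about 3112 intervals of 1.928 cm^-1 span 6000 cm^-1.

```python
start, stop, step = 4000.0, 10000.0, 1.928  # cm^-1, values from the text

n_steps = (stop - start) / step
print(round(n_steps, 2))  # 3112.03
print(round(n_steps))     # 3112
```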

Each spectrum was obtained from 32 consecutive scans at a spectral resolution of 4 cm^-1, and each sample was measured 3 times and the results averaged to give the final spectrum. Polytetrafluoroethylene (PTFE, mod. skg8613g, ABB, Switzerland) was used as the background reference. The collected original spectra are shown in FIG. 1.

3. Spectral preprocessing

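The spectra are preprocessed with the combination of first derivative (1st Der) and multivariate scatter correction (MSC), as follows:

1) Calculate the average spectrum $\bar{X}$ of the spectral matrix $X_{n\times k}$, where $n = 200$, $k = 3112$, and $X_{i,j}$ is the absorbance of the $i$th sample at the $j$th wavenumber point:

$$\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{i,j} \qquad (1)$$

2) Establish a linear regression between each spectrum $X_i$ and the average spectrum $\bar{X}$ to obtain $a_i$ and $b_i$:

$$X_i = a_i + b_i\,\bar{X} + e_i \qquad (2)$$

3) Correct each spectrum by multivariate scatter correction:

$$X_i^{(\mathrm{MSC})} = \frac{X_i - a_i}{b_i} \qquad (3)$$

4) Compute the first derivative of $X_i^{(\mathrm{MSC})}$ by the direct difference method with difference width $g$:

$$X_i^{(\mathrm{1st})} = \frac{X_{i,k+g} - X_{i,k}}{g} \qquad (4)$$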

The preprocessed spectra are shown in FIG. 2.

The data of the 200 Pu'er tea samples of different storage years were divided into a calibration set and a validation set at a ratio of 3:1: for each storage year, 30 samples form the calibration set and 10 the validation set, so that across the five storage years the calibration set contains 150 samples and the validation set 50.
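A per-year (stratified) 3:1 split can be sketched with the standard library. The source does not say how samples were assigned within each year, so random assignment with a fixed seed is an assumption here, and `stratified_split` is a hypothetical name.

```python
import random
from collections import defaultdict


def stratified_split(labels, n_cal_per_class=30, seed=0):
    """Return (calibration_indices, validation_indices).

    Within each class, n_cal_per_class samples go to the calibration set and
    the rest to the validation set; assignment within a class is random.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    cal, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        cal.extend(idxs[:n_cal_per_class])
        val.extend(idxs[n_cal_per_class:])
    return sorted(cal), sorted(val)


# 40 samples for each of the five storage years, as in the text.
years = [y for y in (2010, 2012, 2014, 2016, 2018) for _ in range(40)]
cal, val = stratified_split(years)
print(len(cal), len(val))  # 150 50
```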

4. Construction of the discriminant partial least squares (PLS-DA) model

PLS-DA is a partial least squares algorithm based on discriminant analysis. It performs PLS analysis on a matrix representing the class attributes of the samples and a matrix containing the sample spectral data, builds a PLS discriminant model between the class variables and the spectral data, and uses it to predict the classes of the unknown samples in the validation set. Because the class matrix replaces the concentration matrix of an ordinary partial least squares regression model, noise in the spectral matrix and useless information in the class matrix are both effectively removed, which helps to obtain an optimal calibration model. The modeling procedure is as follows:

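1) Perform principal component decomposition of the spectral matrix $X_{n\times k}$ and the class matrix $C_{n\times 1}$ of the calibration-set samples:

$$X_{n\times k} = T_{n\times d}\,P_{d\times k} + E_{n\times k} \qquad (5)$$

$$C_{n\times 1} = U_{n\times d}\,Q_{d\times 1} + F_{n\times 1} \qquad (6)$$

For the calibration set, $n = 150$ and $k = 3112$.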

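2) Regress $U_{n\times d}$ on $T_{n\times d}$ by multiple linear regression:

$$U_{n\times d} = T_{n\times d}\,B \qquad (7)$$

$$B = (T'_{n\times d}\,T_{n\times d})^{-1}\,T'_{n\times d}\,U_{n\times d} \qquad (8)$$

3) Determine the number of characteristic factors $d$. Substituting equation (7) into equation (6), with $B$ given by equation (8), yields:

$$C_{n\times 1} = T_{n\times d}\,B\,Q_{d\times 1} + F_{n\times 1} \qquad (9)$$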

In the invention, RMSECV decreases steadily as the number of characteristic factors increases: at $d = 6$, RMSECV is 0.0041, and for $d > 6$ RMSECV levels off and the residual matrix $F_{n\times 1}$ becomes negligible. Six characteristic factors are therefore selected for modeling.
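Choosing d from RMSECV can be sketched as follows. Several details are assumptions not stated in the source: 10-fold cross-validation with random fold assignment, a compact NIPALS PLS1 as the model, and a "levels off" rule that picks the smallest d whose RMSECV lies within 5% of the minimum. All function names are hypothetical, and d_max must not exceed the rank of the training data.

```python
import numpy as np


def pls1_fit_predict(X_train, y_train, d):
    """Compact NIPALS PLS1 with d factors; returns a prediction function."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    E, f = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(d):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        qa = (f @ t) / tt
        E, f = E - np.outer(t, p), f - qa * t  # deflate X and y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    beta = W @ np.linalg.solve(P.T @ W, np.array(q))
    return lambda X_new: (X_new - x_mean) @ beta + y_mean


def rmsecv(X, y, d, n_folds=10, seed=0):
    """sqrt(sum of squared cross-validated prediction errors / n)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        predict = pls1_fit_predict(X[train], y[train], d)
        sq_err += float(np.sum((predict(X[fold]) - y[fold]) ** 2))
    return np.sqrt(sq_err / len(y))


def choose_d(X, y, d_max, rel_tol=0.05, **cv_kwargs):
    """Smallest d whose RMSECV is within rel_tol of the best value (assumed rule)."""
    scores = np.array([rmsecv(X, y, d, **cv_kwargs) for d in range(1, d_max + 1)])
    d_best = int(np.argmax(scores <= scores.min() * (1 + rel_tol))) + 1
    return d_best, scores


rng = np.random.default_rng(3)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + 1.0
d_best, scores = choose_d(X, y, d_max=8)
print(d_best)
```

On real spectra the RMSECV curve usually flattens rather than reaching zero, so the tolerance controls how early the "levels off" point is declared.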

4) The validity of the discriminant partial least squares model is verified with the spectral data of the validation-set samples. Model performance is judged by the discrimination accuracy on the validation set: the higher the accuracy, the better the model. Spectral preprocessing and modeling were carried out in Matlab with PLS_Toolbox. FIG. 3 shows the PLS-DA discriminant model results for the Pu'er tea calibration-set samples, and FIG. 4 those for the validation-set samples. As FIG. 3 and FIG. 4 show, after 1st Der and MSC preprocessing, none of the 50 validation-set samples was misclassified, giving a discrimination accuracy of 100.00%.

5. Identifying the storage year of Pu'er tea

The Pu'er tea sample to be identified is scanned by near-infrared spectroscopy, and the resulting spectral data are preprocessed with 1st Der and MSC and input into the validated model to identify the storage year of the Pu'er tea.
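Step 5 chains the stored preprocessing and the model. The sketch below rests on stated assumptions: the MSC reference for an unknown sample is taken to be the average calibration spectrum saved at modeling time (the source does not say which reference is used); the model is a hypothetical dict holding that reference, the difference width g, the centring means and a regression vector; and the 0.5 threshold and the class coding 1 to 5 for 2010 to 2018 come from the text. A prediction outside every class window returns None rather than a year.

```python
import numpy as np

# Class coding from the text: 1 -> 2010, 2 -> 2012, 3 -> 2014, 4 -> 2016, 5 -> 2018.
YEARS = {1: 2010, 2: 2012, 3: 2014, 4: 2016, 5: 2018}


def preprocess_unknown(x, reference, g=1):
    """MSC against the stored calibration reference, then a forward difference of width g."""
    x = np.asarray(x, dtype=float)
    b, a = np.polyfit(reference, x, deg=1)  # slope, intercept
    x_msc = (x - a) / b
    return (x_msc[g:] - x_msc[:-g]) / g


def identify_year(x, model, threshold=0.5):
    """Return the storage year, or None if no class lies within the threshold."""
    z = preprocess_unknown(x, model["reference"], model["g"])
    value = (z - model["x_mean"]) @ model["beta"] + model["y_mean"]
    code = int(round(value))
    if code in YEARS and abs(value - code) < threshold:
        return YEARS[code]
    return None
```

Using the calibration mean as the MSC reference keeps an unknown spectrum on the same scale as the calibration data; computing the mean from the single unknown spectrum alone would leave it uncorrected, because a spectrum regressed on itself gives an offset of 0 and a slope of 1.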

6. Conclusion

Pu'er tea samples of five different storage years from Pu'er City, Yunnan Province, were identified using near-infrared spectroscopy and the partial least squares algorithm. Preprocessing the near-infrared spectra with 1st Der and MSC and then building a PLS-DA model gave good identification results and allows the storage year of Pu'er tea to be determined accurately.

Comparison of different preprocessing methods:

The data of the 200 Pu'er tea samples of different storage years were again divided into a calibration set and a validation set at a ratio of 3:1, with 30 calibration and 10 validation samples per storage year, i.e. 150 calibration-set and 50 validation-set samples in total.

Building PLS-DA (partial least squares discriminant analysis) models from differently preprocessed spectral data

The original spectral data were preprocessed with several methods, and the data from each method were divided into a calibration set and a validation set. Following the PLS-DA procedure, class variables were assigned to the calibration-set samples of different storage years: Pu'er teas from 2010, 2012, 2014, 2016 and 2018 were assigned 1, 2, 3, 4 and 5, respectively. Regression analysis was then carried out between the calibration-set spectra and their class variables to build a PLS model relating the spectral features to the class variable. The discrimination accuracies obtained with the various preprocessing methods were compared to select the optimal preprocessing method and discrimination combination. Table 1 gives the storage-year discrimination results of the PLS-DA models built with the different preprocessing methods.
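The class coding above turns each storage year into one number of the class matrix C. A minimal NumPy sketch of building C for the 150 calibration samples; `class_matrix` is a hypothetical name.

```python
import numpy as np

# Class coding from the text: 2010 -> 1, 2012 -> 2, 2014 -> 3, 2016 -> 4, 2018 -> 5.
YEAR_TO_CLASS = {2010: 1, 2012: 2, 2014: 3, 2016: 4, 2018: 5}


def class_matrix(years):
    """Build the n x 1 class matrix C used as the PLS response."""
    return np.array([[YEAR_TO_CLASS[y]] for y in years], dtype=float)


# Calibration set: 30 samples per storage year, 150 in total.
cal_years = [y for y in sorted(YEAR_TO_CLASS) for _ in range(30)]
C = class_matrix(cal_years)
print(C.shape)  # (150, 1)
```

A single ordinal column places the five storage years, which are evenly spaced two years apart, on one equally spaced axis, which suits an ageing trend. A common alternative in PLS-DA is a one-hot (dummy) class matrix with one column per class.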

Table 1 Calibration and validation results of PLS-DA models built with different preprocessing methods

Preprocessing method	Calibration-set misclassifications	Calibration-set accuracy	Validation-set misclassifications	Validation-set accuracy
Original spectrum	15	90.00%	17	66.00%
MSC	14	90.67%	5	90.00%
SNV	15	90.00%	5	90.00%
Normalization	16	89.33%	6	88.00%
1st Der	1	99.33%	1	98.00%
2nd Der	0	100.00%	1	98.00%
MSC+1st Der	0	100.00%	0	100.00%
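The accuracy columns of Table 1 follow directly from the error counts and the set sizes (150 calibration, 50 validation samples). The short Python check below recomputes them; it only re-derives the published percentages and does not reproduce the models.

```python
def accuracy_pct(n_errors, n_total):
    """Accuracy in percent, rounded to two decimals as in Table 1."""
    return round(100.0 * (n_total - n_errors) / n_total, 2)


# (calibration errors, validation errors) from Table 1; set sizes 150 and 50.
TABLE_1 = {
    "Original spectrum": (15, 17),
    "MSC": (14, 5),
    "SNV": (15, 5),
    "Normalization": (16, 6),
    "1st Der": (1, 1),
    "2nd Der": (0, 1),
    "MSC+1st Der": (0, 0),
}

for method, (cal_err, val_err) in TABLE_1.items():
    print(f"{method}: {accuracy_pct(cal_err, 150):.2f}% / {accuracy_pct(val_err, 50):.2f}%")
```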

As Table 1 shows, preprocessing the spectral data with 1st Der combined with MSC gives the PLS-DA discriminant model its highest accuracy: 100% for both the calibration set and the validation set, i.e. all 150 calibration-set samples and all 50 validation-set samples are correctly discriminated. In summary, combining 1st Der with MSC to preprocess the spectral data yields the PLS-DA model with the highest discrimination accuracy and good stability, which favors sample discrimination. The invention therefore uses 1st Der combined with MSC as the optimal preprocessing method for building the PLS-DA model for discriminant analysis of the storage year of Pu'er tea.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

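1. A method for rapidly identifying the storage year of Pu'er tea, characterized by comprising the following steps:

(1) collecting the original spectrum: grinding and crushing Pu'er tea leaves to prepare a Pu'er tea sample, and measuring the original spectrum of the sample with a near-infrared spectrometer;

(2) preprocessing the original spectrum: preprocessing the original spectrum with a combination of first derivative and multivariate scatter correction, and dividing the preprocessed spectral data into a calibration set and a validation set;

(3) constructing a discriminant partial least squares model: establishing a discriminant partial least squares model from the spectral data of the calibration set, and verifying the validity of the model with the spectral data of the validation set;

(4) identifying the sample: collecting the near-infrared spectrum of a Pu'er tea sample of unknown storage year, preprocessing it with the combined first derivative and multivariate scatter correction method, and inputting the processed data into the validated discriminant partial least squares model to obtain the storage year of the Pu'er tea.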

2. The method for rapidly identifying the storage year of Pu'er tea according to claim 1, wherein in step (1) a Fourier transform near-infrared spectrometer is used, with polytetrafluoroethylene as the background reference, a quartz halogen lamp as the light source, and a high-throughput dual-axis Michelson interferometer.

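3. The method for rapidly identifying the storage year of Pu'er tea according to claim 1, wherein the preprocessing in step (2) is as follows:

1) calculating the average spectrum $\bar{X}$ of the spectral matrix $X_{n\times k}$ containing $n$ samples and $k$ wavenumber points:

$$\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{i,j}, \quad j = 1, 2, \ldots, k \qquad (1)$$

where $n$ is the number of Pu'er tea samples, $k$ is the number of wavenumber points in each spectrum, and $X_{i,j}$ is the absorbance of the $i$th sample at the $j$th wavenumber point;

2) establishing a linear regression between each spectrum $X_i$ and the average spectrum $\bar{X}$ to obtain $a_i$ and $b_i$:

$$X_i = a_i + b_i\,\bar{X} + e_i \qquad (2)$$

where $e_i$ is the regression residual;

3) performing multivariate scatter correction on each spectrum $X_i$ with its $a_i$ and $b_i$ to obtain the corrected spectrum $X_i^{(\mathrm{MSC})}$:

$$X_i^{(\mathrm{MSC})} = \frac{X_i - a_i}{b_i} \qquad (3)$$

4) computing the first derivative of $X_i^{(\mathrm{MSC})}$ by the direct difference method with difference width $g$ at wavenumber point $k$:

$$X_i^{(\mathrm{1st})} = \frac{X_{i,k+g} - X_{i,k}}{g} \qquad (4)$$

where $X_{i,k+g}$ and $X_{i,k}$ are the absorbances of the $i$th corrected spectrum at wavenumber points $k+g$ and $k$, respectively.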

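4. The method for rapidly identifying the storage year of Pu'er tea according to claim 3, wherein step (3) is performed as follows:

1) performing principal component decomposition of the spectral matrix $X_{n\times k}$ and the class matrix $C_{n\times 1}$ of the calibration-set samples:

$$X_{n\times k} = T_{n\times d}\,P_{d\times k} + E_{n\times k} \qquad (5)$$

$$C_{n\times 1} = U_{n\times d}\,Q_{d\times 1} + F_{n\times 1} \qquad (6)$$

where $T_{n\times d}$ and $U_{n\times d}$ are the absorbance characteristic factor matrix and the class characteristic factor matrix, $P_{d\times k}$ and $Q_{d\times 1}$ are the absorbance loading matrix and the class loading matrix, and $E_{n\times k}$ and $F_{n\times 1}$ are the error matrices;

2) regressing $U_{n\times d}$ on $T_{n\times d}$ by multiple linear regression:

$$U_{n\times d} = T_{n\times d}\,B \qquad (7)$$

$$B = (T'_{n\times d}\,T_{n\times d})^{-1}\,T'_{n\times d}\,U_{n\times d} \qquad (8)$$

3) determining the number of characteristic factors $d$: substituting equation (7) into equation (6), with $B$ given by equation (8), yields

$$C_{n\times 1} = T_{n\times d}\,B\,Q_{d\times 1} + F_{n\times 1} \qquad (9)$$

and the number of characteristic factors $d$ is chosen according to the root mean square error of cross-validation between the true and predicted values of the calibration set;

4) verifying the validity of the discriminant partial least squares model with the spectral data of the validation-set samples.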

5. The method for rapidly identifying the storage year of Pu'er tea according to claim 4, wherein the threshold of the discriminant partial least squares model is set to 0.5, a validation-set sample being correctly discriminated when the absolute difference between its predicted and actual values is less than 0.5, and the discriminant partial least squares model is established with $d = 6$ characteristic factors.

6. The method as claimed in claim 2, wherein in step (1) the near-infrared spectrum is collected over the wavenumber range 10000-4000 cm^-1 with 32 repeated scans per acquisition and a spectral resolution of 4 cm^-1, and each sample is measured 3 times and the results averaged to give the final spectrum.