CN115165950A

CN115165950A - Method for identifying origin tracing of tea leaves through double-phase extraction NMR spectrum and application thereof

Info

Publication number: CN115165950A
Application number: CN202210495061.4A
Authority: CN
Inventors: 侯如燕; 金戈; 崔传坚; 韦朝领
Original assignee: Anhui Agricultural University AHAU
Current assignee: Anhui Agricultural University AHAU
Priority date: 2022-05-07
Filing date: 2022-05-07
Publication date: 2022-10-11
Anticipated expiration: 2042-05-07
Also published as: CN115165950B

Abstract

The invention relates to the field of nuclear magnetic resonance detection, in particular to a method for tracing the geographical origin of tea by two-phase extraction NMR spectroscopy and an application thereof. Based on two-phase extraction ¹H NMR fingerprinting combined with multivariate data analysis, the geographical traceability of Taiping Houkui green tea was analyzed. Principal component analysis was used as an exploratory tool for a clustering overview, and support vector machines (SVM) and random forests (RF) were further applied for classification. The RF model combining polar and non-polar extracts achieved the best accuracy of 87.5%, with catechins, fatty acids and sucrose identified as contributors to this classification and as important differential metabolites. These results support the use of ¹H NMR combined with machine learning tools to identify green tea from narrow geographical origins.

Description

Method for identifying origin tracing of tea leaves through double-phase extraction NMR spectrum and application thereof

Technical Field

The invention relates to the field of nuclear magnetic resonance detection, in particular to a method for identifying the origin tracing of tea leaves by using a two-phase extraction NMR spectrum and application thereof.

Background

Tea, one of the three most popular beverages in the world, is favored by consumers for its unique flavor. Among all tea types, green tea has the largest market share in China. The demand for green tea is closely tied to the geographical origin and corresponding quality of the leaves, which in turn affect price and consumer choice. In China, famous green teas such as West Lake Longjing, Huangshan Maofeng and Taiping Houkui are usually produced in narrow production areas. Taiping Houkui is regarded as the king of green tea, with a unique appearance and an orchid-like aroma. It is mainly produced in Xinming Town, Sankou Town and Longmen Town and in Yanjia, Hougang and Houkeng in Huangshan City, Anhui Province. Yanjia, Hougang and Houkeng are the core production areas and the most distinctive places of origin. Taiping Houkui teas from different origins are similar in appearance and come from geographically close locations. Driven by profit, some unscrupulous merchants knowingly label tea produced elsewhere as coming from a valuable place of origin. There is therefore an urgent need for an effective method to authenticate green tea from narrow places of origin.

Traditional identification of tea origin relies on sensory evaluation, which depends on personal experience, is easily influenced by subjective factors, and therefore has clear limitations. In recent years, emerging detection techniques have been widely used to complement sensory evaluation, such as high performance liquid chromatography-mass spectrometry, headspace solid phase microextraction with gas chromatography-mass spectrometry, stable isotope analysis, elemental analysis, electronic noses and electronic tongues. However, these techniques typically require complex sample pretreatment or derivatization, have long run times, and are not suitable for routine analysis. In contrast, nuclear magnetic resonance (NMR) is fast (typically 3-5 minutes per sample) and can produce reliable metabolite fingerprints from very small samples. In addition, NMR allows multiple chemical components to be identified simultaneously in one experiment with good reproducibility. Because metabolites differ enormously from one another, a complete analysis of the metabolome is not feasible. Some occur in large amounts (up to ten percent of dry matter), while others occur only in trace amounts (pmol or less). Some are extremely hydrophilic (such as sugars) and others are lipophilic (such as fats), which makes it almost impossible to extract both in a single extract. In addition, the spectral range of ¹H NMR is limited, so signals often overlap; small signals close to large signals are especially difficult to detect. These factors limit the development of NMR spectroscopy. Two-phase extraction can obtain polar and non-polar metabolites in a single step and comprehensively reflect the metabolite information in tea.

In recent decades, there have been several reports on NMR-based traceability of tea. Studies in different countries have attempted to differentiate green tea by origin without obtaining satisfactory results. The accuracy of distinguishing oolong tea from three different origins using NMR data was only 68.2-78.7%. These studies used only the polar metabolites in tea for identification; the contribution of non-polar metabolites to the identification of tea origin is still unknown, and the ability of polar extracts to trace tea to narrow origins is limited. In addition, when NMR is applied to classify teas from narrow production areas with no obvious climate differences, the classification accuracy needs further improvement. Published reports suggest that obtaining a comprehensive metabolic fingerprint may provide additional insight. However, few studies have applied ¹H NMR-based metabolomics using both polar and non-polar extracts to the certification of tea origin.

In view of the above-mentioned drawbacks, the inventors of the present invention have finally obtained the present invention through a long period of research and practice.

Disclosure of Invention

The invention aims to solve the following problems in existing research: only polar metabolites in tea are used for identification, the contribution of non-polar metabolites to tea origin identification is unknown, and the accuracy of NMR-based classification is low for teas from narrow production areas with no obvious climate differences. To this end, the invention provides a method for tracing the origin of tea by two-phase extraction NMR spectroscopy and an application thereof.

In order to achieve the aim, the invention discloses a method for identifying the origin tracing of a tea production place by using a two-phase extraction NMR spectrum, which comprises the following steps:

S1: crushing a tea sample, freeze-drying it, and collecting nuclear magnetic resonance spectra;

S2: pre-processing the NMR spectra with MestReNova software;

S3: performing principal component analysis on the spectral data to reduce dimensionality and visualize the results;

S4: importing the spectral data into the model, calculating the accuracy and evaluating the model.

In step S1, the crushed tea sample is subjected to ultrasonic treatment and centrifugation, and the nuclear magnetic resonance spectra are acquired on a 600 MHz NMR spectrometer at a temperature of 298 K.

The regions of interest selected in step S2 are: D₂O, 0.6-8.12 ppm, excluding 4.52-5.0 ppm; and CDCl₃, 0.48-7.60 ppm, excluding chemical shifts of 7.2-7.28 ppm.

In step S3, principal component analysis (PCA) is used to reduce the dimensionality of the data so that the data can be visualized.

The specific process of importing the spectral data into the model in step S4 is as follows. For the spectra of the aqueous phase, the range 0.6-8.12 ppm is selected, excluding chemical shifts of 4.52-5.0 ppm, and the spectrum is segmented into bins of 0.04 ppm width, giving a total of 176 variables. For the spectra of the chloroform phase, the range 0.48-7.60 ppm is selected, excluding chemical shifts of 7.2-7.28 ppm, and the spectrum is segmented into bins of 0.04 ppm width, giving a total of 176 variables. The NMR data obtained from the aqueous phase and the chloroform phase are then directly combined by low-level data fusion and imported into a random forest (RF) model.
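The bin counts stated above can be checked with a short calculation. The following Python sketch is illustrative only and is not the MestReNova procedure used in the invention; the function names `bin_edges` and `fuse` are hypothetical. It assumes that bins start at the lower limit of each range and that a bin is dropped only when it lies entirely inside the excluded region.

```python
def bin_edges(start, stop, width=0.04, excluded=()):
    """Return (low, high) ppm pairs for fixed-width bins, skipping excluded ranges."""
    n_bins = round((stop - start) / width)
    bins = []
    for i in range(n_bins):
        low = round(start + i * width, 2)
        high = round(low + width, 2)
        # Drop bins that lie entirely inside an excluded region (assumption).
        if any(ex_lo <= low and high <= ex_hi for ex_lo, ex_hi in excluded):
            continue
        bins.append((low, high))
    return bins


def fuse(water_row, chloroform_row):
    """Low-level data fusion: concatenate the two feature vectors of one sample."""
    return list(water_row) + list(chloroform_row)


# Ranges and exclusions as stated in the text.
water_bins = bin_edges(0.60, 8.12, excluded=[(4.52, 5.00)])
chloroform_bins = bin_edges(0.48, 7.60, excluded=[(7.20, 7.28)])

# One fused feature vector per sample (placeholder zeros stand in for intensities).
fused_row = fuse([0.0] * len(water_bins), [0.0] * len(chloroform_bins))
```

Under these assumptions the aqueous range gives 188 bins minus 12 excluded and the chloroform range gives 178 bins minus 2 excluded, so each phase contributes 176 variables and a fused sample has 352.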

The formula for calculating accuracy in step S4 is:

accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%

Wherein TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.

In step S4, the sensitivity and specificity of the model are also evaluated, calculated respectively as:

Sensitivity = TP/(TP + FN) × 100%

Specificity = TN/(TN + FP) × 100%

The invention also discloses application of the method for identifying the origin tracing of the tea through the two-phase extraction NMR spectrum in identifying the origin tracing of the Taiping Houkui tea.

Because metabolites differ greatly from one another, a complete analysis of the metabolome is not feasible. Some occur in large amounts (up to ten percent of dry matter), while others occur only in trace amounts (pmol or less). Some are extremely hydrophilic (such as sugars) and others are lipophilic (such as fats), which makes it almost impossible to extract both in a single extract. In addition, the spectral range of ¹H NMR is limited, so signals often overlap; small signals close to large signals are especially difficult to detect. A single extraction recovers only part of the metabolites, so to improve the coverage of extracted metabolites, a comprehensive NMR metabolic fingerprint of the tea is obtained by a single two-phase extraction.

Compared with the prior art, the invention has the following beneficial effects. Fusing the ¹H NMR data of polar and non-polar compounds is highly advantageous for tracing narrow production areas. Two-phase extraction ¹H NMR fingerprinting combined with machine learning can distinguish green tea from narrow areas, whereas single-solvent extraction is of limited use for tracing tea within such areas. Fusing the polar and non-polar metabolite data markedly improves classification accuracy, and the random forest model shows the best classification accuracy of 87.50%. The method can serve as a rapid screening technique that helps professional auditors identify production areas and provides an additional reference based on objective measurement.

Drawings

FIG. 1 is a sample collection plot of Taiping Houkui;

FIG. 2 shows the 600 MHz ¹H NMR spectra of a Taiping Houkui sample (mixed sample): (a) two-phase extraction (D₂O); (b) two-phase extraction (CDCl₃);

FIG. 3 shows the PCA of the two-phase extraction of TPHK: (a) D₂O phase; (b) CDCl₃ phase;

FIG. 4 is a visualization of the Pearson correlation coefficients between the D₂O and CDCl₃ data;

FIG. 5 shows the data fusion for two-phase extraction: (a) PCA visualization; (b) RF model features ranked by their contribution to classification accuracy; core production area: Yanjia, Hougang and Houkeng; other production areas: Xinming, Sankou and Longmen;

FIG. 6 is a visualization of the first five principal components of the fused D₂O and CDCl₃ data;

FIG. 7 shows the bins that differ between the core and other production areas (P < 0.05);

FIG. 8 is a box plot of the significantly different metabolites in TPHK samples from the two regions, the core production area and the other production areas (Kruskal-Wallis test, P < 0.05, FDR < 0.05, RF model feature variable screening).

Detailed Description

The above and further features and advantages of the present invention are described in more detail below with reference to the accompanying drawings.

1. Green tea samples

A total of 72 Taiping Houkui samples, covering almost the whole tea production season, were collected from the core production area (Yanjia, Hougang and Houkeng) and the other production areas (Longmen Town, Xinming Town and Sankou Town) of Huangshan City, Anhui Province (FIG. 1). Samples from the different production areas were made by the same production process and collected in different batches. Tea producers were commissioned to make the tea samples according to the Chinese national standard GB/T 19698-2008. Detailed information on the samples is shown in Table 1. All samples were stored at 4 ℃ until analysis.

TABLE 1 Taiping Houkui sample information

2. Sample preparation

The Taiping Houkui sample was pulverized with a blender and freeze-dried for 48 hours. Then, 100 mg of the freeze-dried tea was transferred to a 2 mL centrifuge tube, and 0.8 mL of D₂O was added, followed by 0.8 mL of CDCl₃ (TMS 0.03% w/v). The mixture was sonicated for 10 minutes and then centrifuged (13000 × g) at 20 ℃ for 5 minutes. Next, 0.4 mL of the D₂O phase was transferred to an NMR tube and 0.1 mL of D₂O (TSP 0.05% w/v) was added. Finally, 0.4 mL of the CDCl₃ phase was transferred to another NMR tube.

3. Nuclear magnetic resonance data acquisition, processing and analysis

All ¹H NMR spectra were acquired on a 600 MHz NMR spectrometer (Agilent Technologies, CA, USA) at a temperature of 298 K. The ¹H NMR spectra of the CDCl₃ extracts were acquired with the following parameters: number of scans = 64; spectral width = 9615.4 Hz; FID size (TD) = 65536; relaxation delay = 1 second; acquisition time = 1.7 seconds. The ¹H NMR spectra of the D₂O extracts were acquired with a WET1D pulse sequence, which uses shaped selective pulses to suppress the residual water signal. Each spectrum consisted of 64 scans and 65536 data points, with a spectral width of 9615.4 Hz, a relaxation delay of 1.5 seconds and an acquisition time of 4.00 seconds.

The NMR spectra were pre-processed with MestReNova software (MestReNova v14.0.1, 2018, Mestrelab Research, Santiago de Compostela, Spain). The signal peaks of the internal references TMS and TSP were set to a chemical shift of 0.00 ppm. Automatic phase and baseline corrections were performed for all spectra. The extracted spectra were segmented into bins with a width of 0.04 ppm. The regions of interest were D₂O (0.6-8.12 ppm, excluding 4.52-5.0 ppm) and CDCl₃ (0.48-7.60 ppm, excluding 7.2-7.28 ppm). For multivariate analysis, the intensity of each bin was normalized to the total intensity of its spectrum. Two data matrices were obtained, one for the CDCl₃ extracts (72 × 176) and one for the D₂O extracts (72 × 176). These matrices were then merged into a third, fused data matrix (72 × 352).
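As an illustration of the binning and total-intensity normalization described above, the following Python sketch turns one spectrum into one row of the data matrix. It is a simplified stand-in for the MestReNova processing, not a reproduction of it: the function name `bin_and_normalize`, the summation of point intensities within each bin, and the synthetic flat spectrum are all assumptions made for the example.

```python
def bin_and_normalize(ppm, intensity, start, stop, width=0.04, excluded=()):
    """Sum intensities in fixed-width bins, drop excluded bins, scale the row to sum to 1."""
    n_bins = round((stop - start) / width)

    def is_excluded(i):
        low = round(start + i * width, 2)
        high = round(low + width, 2)
        return any(ex_lo <= low and high <= ex_hi for ex_lo, ex_hi in excluded)

    sums = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        i = int((shift - start) / width)
        if shift >= start and i < n_bins:
            sums[i] += value

    kept = [s for i, s in enumerate(sums) if not is_excluded(i)]
    total = sum(kept)
    # Normalize each bin to the total intensity of the retained spectrum.
    return [s / total for s in kept]


# Synthetic flat D2O-phase spectrum (hypothetical data): 20 points per 0.04 ppm bin.
ppm_axis = [0.601 + 0.002 * k for k in range(3760)]
flat_intensity = [1.0] * len(ppm_axis)
row = bin_and_normalize(ppm_axis, flat_intensity, 0.60, 8.12, excluded=[(4.52, 5.00)])
```

Stacking one such row per sample gives a matrix with one row per sample and 176 columns per phase; for a flat spectrum every retained bin carries the same share of the total intensity.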

Current research on tea focuses on polar extracts, and the non-polar fingerprint is neglected. To obtain a comprehensive metabolic fingerprint of Taiping Houkui, two-phase extraction was used to analyze both polar and non-polar metabolites. Based on the published literature and the HMDB database, 16 Taiping Houkui metabolites were identified in the D₂O phase of the two-phase extraction (FIG. 2a and Table 2). The metabolites identified in this study include carbohydrates (sucrose, α-glucose, β-glucose and fructose), amino acids (theanine, alanine, isoleucine, leucine and threonine), organic acids (gallic acid, quinic acid and acetic acid) and phenols (EGCG, EC, ECG and EGC). In the CDCl₃ phase of the two-phase extraction, the main fatty acids in the tea spectrum are linolenic acid, linoleic acid, oleic acid and palmitoleic acid. The proton assignments of the different functional groups are shown in FIG. 2b and Table 3. Because the different fatty acids in tea are chemically similar, their ¹H NMR signals resonate close together, and the ¹H NMR spectrum of the CDCl₃ phase of tea shows only a small number of characteristic peaks (FIG. 2b). Fatty acids are precursors of the fresh, green odor of tea infusions. They are oxidized by lipoxygenase (LOX), whose activity changes with ambient temperature, so tea aroma differs between environments; the fatty acid content in different production areas therefore indirectly reflects differences in aroma. In addition, fatty acids give rise to cyclic aroma compounds such as methyl jasmonate. Methyl jasmonate is an important contributor to orchid aroma, which is considered a characteristic aroma of high-quality Taiping Houkui. In conclusion, non-polar extraction helps trace the origin of Taiping Houkui green tea within a narrow production area.

TABLE 2 Peak assignment of the TPHK (D₂O) NMR spectrum

No.	Component	Chemical shift δ (ppm)	References
1	Theanine	1.12, 2.15, 2.48, 3.22, 3.79	(Kumar et al., 2016; Gall et al., 2004)
2	EGCG	2.88, 3.02, 5.05, 5.54, 6.09, 6.64, 6.96	(Gall et al., 2004)
3	EGC	2.78, 2.91, 4.31, 6.09, 6.64	(Gall et al., 2004)
4	ECG	2.91, 3.04, 5.05, 6.09, 6.85, 6.92	(Gall et al., 2004)
5	EC	2.76, 2.88, 4.27, 6.09, 6.94, 7.04	(Gall et al., 2004)
6	Sucrose	3.4, 3.65, 3.70, 4.08, 4.23, 5.43	(Kumar et al., 2016)
7	α-glucose	3.50, 5.25	(Bo et al., 2019)
8	β-glucose	4.58	(Gall et al., 2004)
9	Fructose	3.56, 4.13	(Bo et al., 2019)
10	Leucine	0.98	(Lee et al., 2010)
11	Isoleucine	1.03, 1.98	(Lee et al., 2010)
12	Threonine	1.36, 4.23	(Gall et al., 2004)
13	Alanine	1.50, 3.84	(Bo et al., 2019)
14	Quinic acid	2.00, 4.04	(Kumar et al., 2016; Lee et al., 2010)
15	Acetic acid	2.07	(Lee et al., 2011)
16	Gallic acid	7.18	(Kumar et al., 2016)

TABLE 3 Peak assignment of the TPHK (CDCl₃) NMR spectrum

4. Multivariate data analysis and classification

Principal component analysis was performed to reduce dimensionality and visualize the results. The significance of the variables was assessed with the Kruskal-Wallis test at a 95% confidence level. P-values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) method (FDR < 0.05). Pearson correlation analysis was performed in MATLAB 2018b (The MathWorks Inc., Natick, MA, USA).
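For readers unfamiliar with the Benjamini-Hochberg adjustment, the following Python sketch shows the standard step-up procedure on a small hypothetical list of p-values. It illustrates the general method only; the function name `benjamini_hochberg` and the example values are not taken from the study, whose statistics were computed with other software.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four bins.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [q < 0.05 for q in fdr]
```

Each raw p-value is multiplied by the number of tests and divided by its rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone. A bin is retained when its adjusted value is below the chosen threshold, here 0.05.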

(1) RF model

The random forest (RF) algorithm was used to classify Taiping Houkui production areas by geographical origin. The method builds an ensemble of decision trees from bootstrap samples; the number of trees was set to 1000. RF is a tree ensemble method that is developed on a training data set and validated internally, with the aim of accurately predicting a target variable from predictors. It grows multiple classification and regression trees (CART) from bootstrap samples of the original training data, and searches a random subset of features to determine the split points of each growing tree. Importantly, the RF model can rank feature variables by their contribution to classification accuracy.

(2) SVM model

SVM is a machine learning technique that maps data from a low-dimensional space to a high-dimensional space and constructs an optimal hyperplane to separate data points of different sample classes. In this study, the model was built using an SVM with a linear kernel function and 10-fold cross-validation. Cross-validation helps guard against overfitting when the data set is small and yields a more reliable and stable model. The SVM algorithm was run in MATLAB 2018b.
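The 10-fold cross-validation scheme can be pictured as a partition of the 72 samples into ten held-out folds. The Python sketch below shows only this partitioning step; it is not the MATLAB routine used in the study, and the function name `k_fold_indices`, the shuffling and the fixed seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in indices if i not in held_out_set]
        yield train, held_out


splits = list(k_fold_indices(72, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
```

With 72 samples, two folds hold 8 samples and eight folds hold 7, and every sample is held out exactly once. A classifier would be trained on each `train` list and scored on the matching held-out list, with the scores averaged over the ten folds.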

(3) Model evaluation method

The performance of each established model was evaluated by its accuracy, sensitivity and specificity, calculated according to the following formulas. The higher the value, the better the classification performance.

Accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%

Sensitivity = TP/(TP + FN) × 100%

Specificity = TN/(TN + FP) × 100%.

In the formula, TP, FP, TN and FN refer to true positive, false positive, true negative and false negative results, respectively.
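A worked example may make the three measures concrete. The Python sketch below uses the standard definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The confusion-matrix counts are hypothetical: they assume 36 samples in each class, which the source does not state, and were chosen only because they reproduce the percentages reported for the fused RF model.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    sensitivity = tp / (tp + fn) * 100  # share of true positives that were found
    specificity = tn / (tn + fp) * 100  # share of true negatives that were found
    return accuracy, sensitivity, specificity


# Hypothetical counts for 72 samples, assuming 36 per class (not given in the source).
acc, sens, spec = classification_metrics(tp=32, fp=5, tn=31, fn=4)
print(f"accuracy={acc:.2f}% sensitivity={sens:.2f}% specificity={spec:.2f}%")
```

With these counts the function gives 87.50% accuracy, 88.89% sensitivity and 86.11% specificity.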

To distinguish TPHK from different origins, principal component analysis (PCA) was performed to visualize group separation and estimate within-group differences. The PCA results showed that the samples were better separated in D₂O than in CDCl₃ (FIG. 3). This may be because the lipid composition is complex: the signals of the non-polar extracts overlap easily, which makes differences between samples difficult to find (FIG. 2b). Overall, the relatively small distance between the groups blurs the boundaries between them. The overlap between the groups was substantial, which is consistent with the sensory evaluation results showing larger within-group differences and smaller between-group differences. This is also what makes tracing a narrow origin difficult. The unsupervised PCA method provides limited information, whereas supervised methods, which use knowledge of the sample classes, can extract more. Machine learning was therefore used to analyze the data further. The models ranked by accuracy as follows (Table 4): SVM (CDCl₃) > RF (D₂O) > RF (CDCl₃) > SVM (D₂O). Only the SVM (CDCl₃) model exceeded 80% accuracy; however, its specificity was only 72.22%. This means that a single phase can hardly solve the complex problem of narrow-area traceability.

TABLE 4 accuracy of different models

5. Data fusion

Because the production area of TPHK is narrow, the data from either phase alone hardly reflected the differences between samples. The D₂O-phase and CDCl₃-phase data were therefore fused, and Pearson correlation analysis was performed on the fused data to investigate the correlation between variables. The Pearson correlation matrix showed little correlation between the polar and non-polar extracts (|r| < 0.5), so the two phases reflect different characteristics of TPHK (FIG. 4). This favors better results when the fused data are combined with machine learning. PCA was then applied to visualize the data structure of the fused polar and non-polar extracts (FIG. 5a). The spatial distribution of the sample points showed overlapping clusters between the groups, indicating that significant overlap remains even when the two metabolic profiles are fused. Since the first two PCs account for only 57% of the total variance, the first five PCs, which together account for 78.5% of the variance, were also examined (FIG. 6). However, the overlap between groups did not improve. The metabolites captured by ¹H NMR are only one aspect of the information that determines tea origin, and the resolution of the instrument limits what unsupervised PCA can reveal. Interestingly, in supervised learning, fusing the polar and non-polar data significantly improved accuracy (Table 4). The accuracy of the SVM classifier in identifying the core production area was 94.00%. Unfortunately, its identification of the other production areas was poor, at only 75.00%. Overall, the RF classifier achieved the best classification rate of 87.50%, with acceptable specificity and sensitivity of 86.11% and 88.89%, respectively (Table 4). The results show that two-phase extraction is suitable for distinguishing TPHK from different production areas, with accuracy rising from 78% for a single phase to 87.5% for the fused two-phase data. The fused two-phase data set distinguishes samples better because the spectra obtained from the two phases are complementary, giving a more complete picture of the chemical changes associated with different production areas.
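The correlation check between the two phases can be sketched as follows. This Python example computes the Pearson coefficient between every aqueous-phase variable and every chloroform-phase variable; it is an illustration only, since the study performed this analysis in MATLAB, and the function names and the tiny three-sample matrices are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cross_correlation(water, chloroform):
    """Matrix r[i][j] between water-phase column i and chloroform-phase column j."""
    water_cols = list(zip(*water))
    chloroform_cols = list(zip(*chloroform))
    return [[pearson_r(w, c) for c in chloroform_cols] for w in water_cols]


# Hypothetical data: 3 samples, 2 water-phase bins and 1 chloroform-phase bin.
water_matrix = [[1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]
chloroform_matrix = [[2.0], [4.0], [6.0]]
r = cross_correlation(water_matrix, chloroform_matrix)
```

In the study's setting the two inputs would be the 72 × 176 matrices, and the observation that |r| stays below 0.5 across the resulting matrix is what supports treating the two phases as complementary rather than redundant.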

6. Related metabolites

Using the Kruskal-Wallis test (P < 0.05), 61 bins that differed significantly between the two production areas were obtained (FIG. 7). To further determine which intervals are most important for distinguishing the core production area from the other production areas, FDR-corrected Kruskal-Wallis testing and RF feature variable screening were performed (FIG. 5b), and 55 bins were excluded (Table 5). Of these, the theanine bins at 1.16, 2.2 and 3.8 ppm distinguished the two production areas. Theanine has been reported as a potential marker for distinguishing green tea from three narrow production areas; under more rigorous screening, however, theanine proved less important than the more relevant differential metabolites. The 6 most relevant potential metabolic markers were screened (Table 6), and their relative concentrations are summarized in a box plot (FIG. 8). Sucrose, the most abundant carbohydrate in tea and partly produced by photosynthesis during growth, gave an NMR signal in the 3.68 ppm bin. It has previously been reported that the sucrose content of black tea samples from different geographical regions and climates varies greatly. The core production area has higher EGCG, ECG, EGC and EC content (6.08-6.20 ppm and 7.00 ppm) than the other production areas. The synthesis of catechins has been shown to be affected by climate, which has been given as the reason for significant differences among green teas from three different origins. The 0.84 ppm bin, which represents all fatty acids except linolenic acid, contributed very prominently to the classification results. Fatty acids have been used effectively to determine the geographical origin of oils; however, they are often overlooked in tea. During tea processing, fatty acids are converted into saturated and unsaturated C6 and C9 aldehydes and alcohols, which give the tea infusion a delicate fragrance. As aroma precursors, they make a significant contribution to aroma. The linolenic acid bins were 2.04, 2.8, 5.24 and 5.36 ppm (Table 5), and the linolenic acid concentration in the core production area was lower than in the other production areas. During tea processing, linolenic acid is converted into methyl jasmonate, which contributes greatly to the orchid aroma of TPHK. This may be another reason why the core area has a higher aroma score than the other areas, although these bins were removed by FDR correction and therefore contributed less to the classification than the 0.84 ppm bin. It should be noted that some less relevant bins are also useful for classification, which rests on the co-existence and interaction of multiple bins. We therefore consider the entire spectrum, which is characteristic of a particular TPHK. Because the full spectrum carries both qualitative and quantitative information, combining it with a machine learning algorithm allows the production area of Taiping Houkui to be identified.

TABLE 5 Bins that differ significantly between the other and core production areas (Kruskal-Wallis test, P < 0.05); 55 bins were excluded by the FDR method

TABLE 6 Bins that differ significantly between the other and core production areas; P values were obtained from the Wilcoxon rank sum test and corrected using the FDR method

Bins (δ ppm, ¹H)	p-value	-log10(p)	FDR
3.68 (D₂O)	2.72×10^-6	5.5652	0.000958
7 (D₂O)	1.15×10^-5	4.939	0.002026
6.16 (D₂O)	3.85×10^-5	4.4147	0.004515
0.84 (CDCl₃)	0.000243	3.6149	0.020672
6.2 (D₂O)	0.000294	3.5322	0.020672
6.12 (D₂O)	0.000471	3.3274	0.027607

7. Model comparison

In this study, we used ¹H NMR fingerprinting combined with machine learning to trace narrow green tea production areas. One key finding is that polar extracts alone have limited power to trace tea to narrow origins. We found that non-polar extracts are also very important for classification. Importantly, fusing polar and non-polar extracts can significantly improve classification accuracy.

Seeger proposed an NMR metabolomics approach that uses polar and non-polar metabolites to distinguish black tea from green tea simply and rapidly. However, black tea and green tea can be distinguished by eye alone. In addition, their metabolites are very different and can be distinguished with a single extraction of the polar fraction. For narrow production areas, the metabolites are very similar. When TPHK from two narrow production areas was distinguished using the polar extract alone, the accuracy was only 76.39%. With two-phase extraction, the data for both phases are obtained in a single step, and the fused two-phase data give satisfactory accuracy (87.50%). Previous studies that used NMR to obtain fingerprints of polar and non-polar metabolites separately found that accuracy after data fusion was lower than with a single metabolite class, which suggests that only one type of metabolite is needed to classify geographically distant origins, while adding other variables introduces redundancy. In contrast to these results, we found that fusing the polar and non-polar metabolites of tea does not introduce redundancy (FIG. 4, |r| < 0.5), and that combining the fused data with machine learning markedly improves classification accuracy. Furthermore, we used two-phase extraction to obtain both classes of metabolites at once, which is faster than obtaining polar and non-polar metabolites separately. Because it requires little time and a minimal amount of sample, the comprehensive ¹H NMR fingerprint of tea obtained by this simple and efficient two-phase extraction has great potential for quality control of tea products.

The foregoing describes only preferred embodiments of the present invention and is illustrative rather than limiting. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for identifying the source tracing of a tea producing area through a two-phase extraction NMR spectrum is characterized by comprising the following steps:

S2: preprocessing the chemical shifts of the region of interest of the NMR spectrum by summation normalization using MestReNova software;

S4: importing the spectral data into the model, calculating the accuracy, evaluating the model, and selecting the model with the highest accuracy.

2. The method for identifying the source tracing of tea production places through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the crushed tea sample in the step S1 is subjected to ultrasonic treatment and centrifugation, and the nuclear magnetic resonance spectrum is obtained through a 600 MHz NMR spectrometer at a temperature of 298 K.

3. The method for identifying the source tracing of tea leaf origin by means of two-phase extraction NMR spectrum according to claim 1, wherein the regions of interest selected in the step S2 are: D₂O, 0.6-8.12 ppm, excluding 4.52-5.0 ppm; and CDCl₃, 0.48-7.60 ppm, excluding chemical shifts of 7.2-7.28 ppm.

4. The method for identifying the tea leaf origin tracing through the biphase extraction NMR spectrum according to claim 1, wherein the principal component analysis in the step S3 uses PCA to reduce the dimension of the data, so as to visualize the data.

5. The method for identifying the tea leaf origin tracing through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the specific process of importing the spectral data into the model in the step S4 is as follows: for the spectra of the aqueous phase, the range 0.6-8.12 ppm is selected, excluding chemical shifts of 4.52-5.0 ppm, and the spectrum is segmented into bins of 0.04 ppm width to obtain a total of 176 variables; for the spectra of the chloroform phase, the range 0.48-7.60 ppm is selected, excluding chemical shifts of 7.2-7.28 ppm, and the spectrum is segmented into bins of 0.04 ppm width to obtain a total of 176 variables; and the NMR data obtained from the aqueous phase and the chloroform phase are directly combined by low-level data fusion and imported into a random forest (RF) model.

6. The method for identifying the source tracing of tea leaf origin by means of two-phase extraction NMR spectrum according to claim 1, wherein the accuracy of the calculation in the step S4 is represented by the formula:

accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%

Wherein, TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.

7. The method for identifying the tea leaf origin tracing through the biphase extraction NMR spectrum as claimed in claim 1, wherein the sensitivity and specificity of the model are also evaluated in the step S4, and the calculation formulas of the sensitivity and specificity are respectively as follows:

Sensitivity = TP/(TP + FN) × 100%

Specificity = TN/(TN + FP) × 100%

8. Use of the method for identifying the origin tracing of tea through two-phase extraction NMR spectrum as defined in any one of claims 1 to 7 for identifying the origin of Taiping Houkui tea.