CN115165950A - Method for identifying origin tracing of tea leaves through double-phase extraction NMR spectrum and application thereof - Google Patents

Info

Publication number
CN115165950A
Authority
CN
China
Prior art keywords
tea
identifying
nmr spectrum
spectrum
origin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210495061.4A
Other languages
Chinese (zh)
Other versions
CN115165950B (en)
Inventor
侯如燕
金戈
崔传坚
韦朝领
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Agricultural University AHAU
Original Assignee
Anhui Agricultural University AHAU
Application filed by Anhui Agricultural University AHAU
Priority to CN202210495061.4A
Publication of CN115165950A
Application granted
Publication of CN115165950B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N24/00 Investigating or analyzing materials by the use of nuclear magnetic resonance, electron paramagnetic resonance or other spin effects
    • G01N24/08 Investigating or analyzing materials by the use of nuclear magnetic resonance, electron paramagnetic resonance or other spin effects by using nuclear magnetic resonance
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/34 Purifying; Cleaning
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/40 Concentrating samples
    • G01N1/4055 Concentrating samples by solubility techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/40 Concentrating samples
    • G01N1/4055 Concentrating samples by solubility techniques
    • G01N2001/4061 Solvent extraction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30 Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biochemistry (AREA)
  • Medical Informatics (AREA)
  • High Energy & Nuclear Physics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention relates to the field of nuclear magnetic resonance detection, in particular to a method for identifying and tracing the origin of tea leaves by using two-phase extraction NMR spectra, and an application thereof. Based on two-phase extraction 1H NMR fingerprinting combined with multivariate data analysis, the geographic traceability of Taiping Houkui green tea was analyzed. Principal component analysis (PCA) was used as an exploratory tool for summarizing clusters, and support vector machines (SVM) and random forests (RF) were further applied for classification. The RF model combining polar and non-polar extracts achieved the best accuracy of 87.5%, with catechins, fatty acids and sucrose identified as contributors to this classification and as important differential metabolites. These results support the use of 1H NMR combined with machine learning tools to identify green tea from narrow origins.

Description

Method for identifying origin tracing of tea leaves through double-phase extraction NMR spectrum and application thereof
Technical Field
The invention relates to the field of nuclear magnetic resonance detection, in particular to a method for identifying the origin tracing of tea leaves by using a two-phase extraction NMR spectrum and application thereof.
Background
Tea, one of the three most popular beverages in the world, is favored by consumers for its unique flavors. Among all tea types, green tea has the largest market share in China. The demand for green tea is closely related to the geographical origin and corresponding quality of the tea leaves, which in turn affect price and consumer choice. In China, famous green teas are usually produced in narrow production areas, such as West Lake Longjing, Huangshan Maofeng and Taiping Houkui. Taiping Houkui is regarded as the king of green tea and has a unique appearance and an orchid-like aroma. It is mainly produced in Xinming Town, Sankou Town, Longmen Town, Yanjia, Hougang (monkey hillock) and Houkeng (monkey pit) in Huangshan City, Anhui Province. Yanjia, Hougang and Houkeng are the core production areas and the most distinctive origins. Taiping Houkui teas from different origins are similar in appearance and come from geographically close locations. Driven by profit, some unscrupulous merchants label tea as coming from a prized origin even though they know it was produced elsewhere. Therefore, an effective method for authenticating green tea from narrow places of origin is urgently needed.
Traditional identification of tea origin relies on sensory evaluation, which depends on the assessor's experience, is easily influenced by subjective factors, and therefore has limitations. In recent years, emerging detection techniques have been widely used to complement sensory evaluation, such as high performance liquid chromatography-mass spectrometry, headspace solid phase microextraction with gas chromatography-mass spectrometry, stable isotope analysis, elemental analysis, electronic noses and electronic tongues. However, these techniques typically require complex sample pre-treatment or derivatization, have long run times, and are not suitable for routine analysis. In contrast, nuclear magnetic resonance (NMR) is fast (typically 3-5 minutes per sample) and can produce reliable metabolite fingerprints from very small samples. In addition, NMR allows multiple chemical components to be identified simultaneously in one experiment with good reproducibility. Nevertheless, because metabolites differ so widely, a complete analysis of the metabolome is not feasible. Some metabolites occur in large amounts (up to ten percent of dry matter) while others occur only in trace amounts (pmol or less). Some are extremely hydrophilic (such as sugars) and others are lipophilic (such as fats), which makes extracting both in a single extract almost impossible. In addition, because the spectral range of 1H NMR is limited, signals often overlap, and small signals close to large signals are especially difficult to detect. These factors limit the development of NMR spectroscopy. Two-phase extraction can obtain polar and non-polar metabolites in one step and comprehensively reflect the metabolite information in the tea.
In recent decades, there have been several reports on NMR-based tracing of tea origin. Green teas from different countries have been studied in order to differentiate them by origin, but no satisfactory results have been obtained. The accuracy of distinguishing oolong tea from three different origins using NMR data was only 68.2-78.7%. These studies used only the polar metabolites in tea leaves for identification; the contribution of non-polar metabolites to the identification of tea origin remains unknown, and the ability of polar extracts to trace tea to narrow origins is limited. In addition, when NMR is applied to classify teas from narrow production areas with no obvious climatic differences, the classification accuracy needs further improvement. Published reports suggest that obtaining a comprehensive metabolic fingerprint may provide additional insight. However, few studies have applied 1H NMR-based metabolomics to the authentication of tea origin using both polar and non-polar extracts.
In view of the above-mentioned drawbacks, the inventors of the present invention have finally obtained the present invention through a long period of research and practice.
Disclosure of Invention
The invention aims to solve the following problems in existing research: only polar metabolites in tea leaves are used for identification, the contribution of non-polar metabolites to tea origin identification is unknown, and the accuracy of NMR-based classification of teas from narrow origins with no obvious climatic differences is low. To this end, it provides a method for identifying and tracing the origin of tea leaves by using two-phase extraction NMR spectra, and an application thereof.
In order to achieve the aim, the invention discloses a method for identifying the origin tracing of a tea production place by using a two-phase extraction NMR spectrum, which comprises the following steps:
S1: crushing a tea sample, freeze-drying it, and collecting a nuclear magnetic resonance spectrum;
S2: pre-processing the NMR spectra with MestReNova software;
S3: performing principal component analysis on the spectral data to reduce dimensionality and visualize the results;
S4: importing the spectral data into the model, calculating the accuracy and evaluating the model.
In step S1, the crushed tea sample is subjected to ultrasonic treatment and centrifugation, and the nuclear magnetic resonance spectrum is obtained on a 600 MHz NMR spectrometer at a temperature of 298 K.
The regions of interest selected in step S2 are 0.6-8.12 ppm for D2O, excluding 4.52-5.0 ppm, and 0.48-7.60 ppm for CDCl3, excluding 7.2-7.28 ppm.
In step S3, principal component analysis (PCA) is used to reduce the dimensionality of the data so that the data can be visualized.
The specific process of importing the spectral data into the model in step S4 is as follows. For the spectra of the aqueous phase, the 0.6-8.12 ppm region is selected, excluding chemical shifts of 4.52-5.0 ppm, and the spectrum is segmented into bins 0.04 ppm wide, giving a total of 176 variables. For the spectra of the chloroform phase, the 0.48-7.60 ppm region is selected, excluding chemical shifts of 7.2-7.28 ppm, and the spectrum is likewise segmented into 0.04 ppm bins, giving a total of 176 variables. The NMR data obtained from the aqueous phase and the chloroform phase are combined directly by low-level data fusion and imported into a random forest (RF) model.
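The variable counts quoted above follow directly from the stated regions and bin width. The short sketch below reproduces that arithmetic; the function name `count_bins` is illustrative only, and the regions are expressed in integer hundredths of a ppm purely to avoid floating-point rounding at the bin edges.

```python
# Count 0.04 ppm bins per phase, working in integer hundredths of a ppm
# so that bin edges are exact.
BIN_WIDTH = 4  # 0.04 ppm expressed in units of 0.01 ppm


def count_bins(start, stop, excl_start, excl_stop, width=BIN_WIDTH):
    """Number of whole bins in [start, stop) after dropping [excl_start, excl_stop)."""
    kept = (stop - start) - (excl_stop - excl_start)
    if kept % width != 0:
        raise ValueError("region is not a whole number of bins")
    return kept // width


# Regions taken from the text, in hundredths of a ppm.
d2o_bins = count_bins(60, 812, 452, 500)    # aqueous (D2O) phase
cdcl3_bins = count_bins(48, 760, 720, 728)  # chloroform (CDCl3) phase
fused_bins = d2o_bins + cdcl3_bins          # low-level fusion concatenates the variables

print(d2o_bins, cdcl3_bins, fused_bins)  # 176 176 352
```

Both phases happen to yield the same number of variables, so the fused matrix has 352 columns.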
The formula for calculating accuracy in step S4 is:
accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%
Wherein TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
In step S4, the sensitivity and the specificity of the model are also evaluated, and are calculated respectively as:
Sensitivity = TP/(TP + FN) × 100%
Specificity = TN/(TN + FP) × 100%
Wherein TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
The invention also discloses application of the method for identifying the origin tracing of the tea through the two-phase extraction NMR spectrum in identifying the origin tracing of the Taiping Houkui tea.
Because metabolites differ so widely, a complete analysis of the metabolome is not feasible. Some occur in large amounts (up to ten percent of dry matter) while others occur only in trace amounts (pmol or less). Some are extremely hydrophilic (such as sugars) and others are lipophilic (such as fats), which makes extracting both in a single extract almost impossible. In addition, because the spectral range of 1H NMR is limited, signals often overlap, and small signals close to large signals are especially difficult to detect. A single extraction recovers only part of the metabolites; to improve the coverage of the extracted metabolites, a comprehensive NMR metabolic fingerprint of the tea leaves is obtained by a one-step biphasic extraction.
Compared with the prior art, the invention has the following beneficial effects. Fusing the 1H NMR data of polar and non-polar compounds is highly advantageous for tracing narrow production areas. Two-phase extraction 1H NMR fingerprints combined with machine learning can distinguish green teas from narrow areas. Single-solvent extraction is of limited use for tracing tea origin within narrow areas, whereas fusing the extracts of polar and non-polar metabolites markedly improves classification accuracy: the random forest model shows the best classification accuracy of 87.50%. The method can serve as a rapid screening technique that helps professional auditors identify production areas and provides an additional reference based on objective measurement.
Drawings
FIG. 1 is a sample collection plot of Taiping Houkui;
FIG. 2 shows the 600 MHz 1H NMR spectra of a Taiping Houkui sample (mixed sample): (a) two-phase extraction (D2O); (b) two-phase extraction (CDCl3);
FIG. 3 shows the PCA of the two-phase extraction of TPHK: (a) D2O phase; (b) CDCl3 phase;
FIG. 4 is a visualization of the Pearson correlation coefficients between the D2O and CDCl3 data;
FIG. 5 shows the data fusion for biphasic extraction: (a) PCA visualization; (b) RF model features ranked by their contribution to classification accuracy; core zone: core production areas (Yanjia, Hougang, Houkeng); other production areas: Xinming, Sankou, Longmen;
FIG. 6 is a visualization of the first five principal components of the fused D2O and CDCl3 data;
FIG. 7 shows the bins that differ between the core and non-core production areas (P < 0.05);
FIG. 8 is a box plot of the significantly different metabolites in TPHK samples from the two regions (Kruskal-Wallis test, P < 0.05, FDR < 0.05, RF model feature variable screening); core production zone: core production areas; other production zones: other production areas.
Detailed Description
The above and further features and advantages of the present invention are described in more detail below with reference to the accompanying drawings.
1. Green tea samples
A total of 72 Taiping Houkui samples were collected from the core production areas (Yanjia, Hougang and Houkeng) and the other production areas (Longmen Town, Xinming Town and Sankou Town) of Huangshan City, Anhui Province, covering almost the whole tea production season (FIG. 1). Samples from the different production areas were made by the same production process and came from different batches. Modern tea producers were entrusted to make the tea samples according to the Chinese National Official Standard (CNOS) GB/T 19698-2008. Detailed information on the samples is shown in Table 1. All samples were stored at 4 °C until analysis.
TABLE 1 Taiping Houkui sample information
(Table 1 is provided as images in the original publication.)
2. Sample preparation
The Taiping Houkui samples were pulverized with a blender and freeze-dried for 48 hours. Then, 100 mg of the freeze-dried tea was transferred to a 2 mL centrifuge tube, and 0.8 mL of D2O was added, followed by 0.8 mL of CDCl3 (TMS 0.03% w/v). The mixture was sonicated for 10 minutes and then centrifuged (13000 × g) at 20 °C for 5 minutes. Next, 0.4 mL of the D2O phase was transferred to an NMR tube and 0.1 mL of D2O (TSP 0.05% w/v) was added. Finally, 0.4 mL of the CDCl3 phase was transferred to another NMR tube.
3. Nuclear magnetic resonance data acquisition, processing and analysis
All 1H NMR spectra were acquired on a 600 MHz NMR spectrometer (Agilent Technologies, CA, USA) at a temperature of 298 K. The 1H NMR spectra of the CDCl3 extracts were acquired with the following parameters: number of scans = 64; spectral width = 9615.4 Hz; size of FID (TD) = 65536; relaxation delay = 1 second; acquisition time = 1.7 seconds. The 1H NMR spectra of the D2O extracts were acquired with a WET1D pulse sequence, which uses shaped selective pulses to suppress the residual water signal. Each spectrum consisted of 64 scans and 65536 data points, with a spectral width of 9615.4 Hz, a relaxation delay of 1.5 seconds, and an acquisition time of 4.00 seconds.
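As a rough aid to reading these parameters, the sketch below converts between Hz and ppm. It assumes the proton frequency is exactly the nominal 600 MHz, which the source does not state precisely, so the results are approximate; the helper names are illustrative.

```python
# Assumption: the 1H frequency is taken as exactly 600 MHz (nominal value only).
SPECTROMETER_MHZ = 600.0
SPECTRAL_WIDTH_HZ = 9615.4   # from the acquisition parameters
BIN_WIDTH_PPM = 0.04         # bin width used for the segmented spectra


def hz_to_ppm(hz, spectrometer_mhz=SPECTROMETER_MHZ):
    """1 ppm corresponds to `spectrometer_mhz` Hz."""
    return hz / spectrometer_mhz


def ppm_to_hz(ppm, spectrometer_mhz=SPECTROMETER_MHZ):
    return ppm * spectrometer_mhz


sw_ppm = hz_to_ppm(SPECTRAL_WIDTH_HZ)  # roughly 16 ppm of spectral width
bin_hz = ppm_to_hz(BIN_WIDTH_PPM)      # roughly 24 Hz per bin

print(f"spectral width ~ {sw_ppm:.2f} ppm, bin width ~ {bin_hz:.1f} Hz")
```

Under this assumption the acquired window (about 16 ppm) comfortably covers both regions of interest, which end at 8.12 ppm and 7.60 ppm.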
The NMR spectra were pre-processed with MestReNova software (MestReNova v14.0.1, 2018, Mestrelab Research, Santiago de Compostela, Spain). The signal peaks of the internal references TMS and TSP were set to a chemical shift of 0.00 ppm. Automatic phase and baseline corrections were performed on all spectra. The spectra were segmented into bins 0.04 ppm wide. The regions of interest were 0.6-8.12 ppm (excluding 4.52-5.0 ppm) for D2O and 0.48-7.60 ppm (excluding 7.2-7.28 ppm) for CDCl3. For multivariate analysis, the intensity of each bin was normalized to the total intensity of its spectrum. Two data matrices were obtained, one for the CDCl3 extracts (72 × 176) and one for the D2O extracts (72 × 176). These matrices were then merged into a third, fused data matrix (72 × 352).
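The binning, total-intensity normalization and fusion steps can be sketched in plain Python as follows. This is only an illustration, not the MestReNova procedure: the function names, the flat placeholder spectrum and the 0.01 ppm grid are assumptions, and normalizing to the sum of the retained bins (rather than the whole spectrum) is one possible reading of the text.

```python
def bin_spectrum(ppm, intensity, start, stop, exclude, width=0.04):
    """Sum intensities into consecutive `width`-ppm bins over [start, stop),
    then drop the bins whose centre lies in the excluded (solvent) window."""
    n_bins = round((stop - start) / width)
    sums = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        k = int((x - start) // width)
        if 0 <= k < n_bins:
            sums[k] += y
    kept = []
    for k, s in enumerate(sums):
        centre = start + (k + 0.5) * width
        if not (exclude[0] <= centre < exclude[1]):
            kept.append(s)
    return kept


def normalize_to_total(bins):
    """Scale the bins so that they sum to 1 (assumed normalization)."""
    total = sum(bins)
    return [b / total for b in bins]


# Hypothetical spectrum: a flat signal on a 0.01 ppm grid from 0 to 9 ppm.
ppm_axis = [0.005 + 0.01 * i for i in range(900)]
flat = [1.0] * len(ppm_axis)

d2o_row = normalize_to_total(bin_spectrum(ppm_axis, flat, 0.60, 8.12, (4.52, 5.00)))
cdcl3_row = normalize_to_total(bin_spectrum(ppm_axis, flat, 0.48, 7.60, (7.20, 7.28)))
fused_row = d2o_row + cdcl3_row  # low-level fusion: concatenate the two variable blocks

print(len(d2o_row), len(cdcl3_row), len(fused_row))  # 176 176 352
```

Stacking one such fused row per sample would give the 72 × 352 matrix described above.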
At present, research on tea focuses on polar extracts, and the non-polar fingerprint is neglected. To obtain a comprehensive metabolic fingerprint of Taiping Houkui, two-phase extraction was used to analyze both polar and non-polar metabolites. Based on the published literature and the HMDB database, 16 Taiping Houkui metabolites were identified in the D2O phase of the two-phase extraction (FIG. 2a and Table 2). The metabolites identified in the current study include carbohydrates (sucrose, α-glucose, β-glucose and fructose), amino acids (theanine, alanine, isoleucine, leucine and threonine), organic acids (gallic acid, quinic acid and acetic acid) and phenols (EGCG, EC, ECG and EGC). In the spectrum of the CDCl3 phase, the main fatty acids of the tea are linolenic acid, linoleic acid, oleic acid and palmitoleic acid. The proton assignments of the different functional groups are shown in FIG. 2b and Table 3. Because the chemical properties of the different fatty acids in tea are similar, their 1H NMR signals resonate close together, and the 1H NMR spectrum of the CDCl3 phase shows only a small number of characteristic peaks (FIG. 2b). Fatty acids are precursors of the fresh, green odor of tea infusions. They are oxidized by lipoxygenase (LOX), whose activity changes with ambient temperature, so tea aroma differs between environments; the fatty acid content of different production areas therefore indirectly reflects differences in aroma. In addition, fatty acids give rise to cyclic aroma compounds such as methyl jasmonate. Methyl jasmonate is an important contributor to the orchid-like aroma that is considered characteristic of high-quality Taiping Houkui. In conclusion, non-polar extraction helps trace the origin of Taiping Houkui green tea within narrow production areas.
TABLE 2 Peak assignment of the TPHK (D2O) NMR spectrum
No. | Component | Chemical shift δ (ppm) | References
1 | Theanine | 1.12, 2.15, 2.48, 3.22, 3.79 | (Kumar et al., 2016; Gall et al., 2004)
2 | EGCG | 2.88, 3.02, 5.05, 5.54, 6.09, 6.64, 6.96 | (Gall et al., 2004)
3 | EGC | 2.78, 2.91, 4.31, 6.09, 6.64 | (Gall et al., 2004)
4 | ECG | 2.91, 3.04, 5.05, 6.09, 6.85, 6.92 | (Gall et al., 2004)
5 | EC | 2.76, 2.88, 4.27, 6.09, 6.94, 7.04 | (Gall et al., 2004)
6 | Sucrose | 3.4, 3.65, 3.70, 4.08, 4.23, 5.43 | (Kumar et al., 2016)
7 | α-glucose | 3.50, 5.25 | (Bo et al., 2019)
8 | β-glucose | 4.58 | (Gall et al., 2004)
9 | Fructose | 3.56, 4.13 | (Bo et al., 2019)
10 | Leucine | 0.98 | (Lee et al., 2010)
11 | Isoleucine | 1.03, 1.98 | (Lee et al., 2010)
12 | Threonine | 1.36, 4.23 | (Gall et al., 2004)
13 | Alanine | 1.50, 3.84 | (Bo et al., 2019)
14 | Quinic acid | 2.00, 4.04 | (Kumar et al., 2016; Lee et al., 2010)
15 | Acetic acid | 2.07 | (Lee et al., 2011)
16 | Gallic acid | 7.18 | (Kumar et al., 2016)
TABLE 3 Peak assignment of the TPHK (CDCl3) NMR spectrum
(Table 3 is provided as images in the original publication.)
4. Multivariate data analysis and classification
Principal component analysis was performed to reduce dimensionality and visualize the results. The significance of each variable was assessed with the Kruskal-Wallis test at a 95% confidence level. P-values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) method (FDR < 0.05). Pearson correlation analysis was performed in MATLAB 2018b (The MathWorks Inc., Natick, MA, USA).
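For readers unfamiliar with the Benjamini-Hochberg adjustment, a minimal plain-Python sketch is given below. It is not the MATLAB code used in the study; the function name and the four example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example_p = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, one per bin
adj = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in adj]  # bins retained at FDR < 0.05

print([round(q, 4) for q in adj])  # [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is scaled by m/rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.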
(1) RF model
The random forest (RF) algorithm was used to classify Taiping Houkui production areas by geographic origin. The method uses bootstrap samples to grow an ensemble of decision trees; the number of trees was set to 1000. RF is a tree-ensemble method that is developed from a training data set and validated internally, with the aim of accurately predicting a target variable from predictors. RF builds multiple classification and regression trees (CART) from bootstrap samples of the original training data and searches a random subset of features to determine the split points of each growing tree. Importantly, the RF model can rank feature variables by their contribution to classification accuracy.
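The bootstrap resampling that underlies RF, and the out-of-bag samples that make internal validation possible, can be illustrated with the standard library alone. The sketch below only draws the per-tree samples and does not grow any trees; the seed and helper name are arbitrary assumptions.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)         # fixed seed, chosen arbitrarily
N_SAMPLES, N_TREES = 72, 1000  # sample count and tree count from the text

bags = [bootstrap_indices(N_SAMPLES, rng) for _ in range(N_TREES)]
mean_oob_fraction = sum(len(oob) for _, oob in bags) / (N_TREES * N_SAMPLES)

# On average a little over a third of the samples are left out of each
# bootstrap sample and can serve as an internal test set for that tree.
print(f"mean out-of-bag fraction: {mean_oob_fraction:.3f}")
```

In a full RF implementation each tree would be fitted on its in-bag samples and scored on its out-of-bag samples.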
(2) SVM model
SVM is a machine learning technique that maps data from a low-dimensional space into a high-dimensional space and constructs an optimal hyperplane to separate the data points of different sample classes. In this study, the model was built using an SVM with a linear kernel function and 10-fold cross-validation. Cross-validation helps prevent overfitting when the data set is small and yields a reliable and stable model. The SVM algorithm was run in MATLAB 2018b.
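How 72 samples might be partitioned for 10-fold cross-validation is sketched below. The source does not say whether the folds were stratified by production area, so this shows a simple shuffled, unstratified split; the function name and seed are illustrative.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k (train, test) pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits


splits = k_fold_indices(72, 10)
fold_sizes = sorted(len(test) for _, test in splits)

print(fold_sizes)  # [7, 7, 7, 7, 7, 7, 7, 7, 8, 8]
```

Each sample is used for testing exactly once, and the model is trained on the remaining 64 or 65 samples in every round.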
(3) Model evaluation method
The performance of each established model was evaluated by calculating its accuracy, sensitivity and specificity according to the following formulas. The higher the value, the better the classification performance.
Accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%
Sensitivity = TP/(TP + FN) × 100%
Specificity = TN/(TN + FP) × 100%
In the formulas, TP, FP, TN and FN refer to true positive, false positive, true negative and false negative results, respectively.
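The three measures can be computed as in the sketch below, which uses the standard definitions sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The confusion-matrix counts are hypothetical: the source does not report them or the class sizes, and they were chosen (assuming 36 samples per class) only because they reproduce the percentages the text reports for the fused RF model.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    sensitivity = tp / (tp + fn) * 100  # share of true positives recovered
    specificity = tn / (tn + fp) * 100  # share of true negatives recovered
    return accuracy, sensitivity, specificity


# Hypothetical counts for a 72-sample, two-class problem (not from the source).
acc, sens, spec = classification_metrics(tp=32, fp=5, tn=31, fn=4)

print(f"{acc:.2f} {sens:.2f} {spec:.2f}")  # 87.50 88.89 86.11
```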
To distinguish TPHK from different origins, principal component analysis (PCA) was performed to visualize group separation and estimate internal differences. The PCA results showed that the samples were better separated in D2O than in CDCl3 (FIG. 3). This may be because the lipid composition is complex: the signals of the non-polar extracts overlap readily, making it difficult to find differences between samples (FIG. 2b). Overall, the relatively small distances between groups blur the group boundaries. The overlap between the groups was significant, further verifying that, as in the sensory evaluation results, intra-group differences were high and inter-group differences low. This is also the difficulty in tracing narrow origins. The unsupervised PCA method provides limited information, whereas supervised methods, which use knowledge of the sample classes, extract more. Therefore, machine learning was used to further analyze the data. The models ranked by accuracy as follows (Table 4): SVM (CDCl3) > RF (D2O) > RF (CDCl3) > SVM (D2O). Only the SVM (CDCl3) model had an accuracy above 80%; however, its specificity was only 72.22%. This means that a single phase can hardly solve the complex problem of tracing narrow production areas.
TABLE 4 accuracy of different models
(Table 4 is provided as an image in the original publication.)
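The PCA step used for visualization can be sketched with NumPy via a singular value decomposition. The data below are random placeholders with the same shape as one phase's matrix (72 × 176), and mean-centering without further scaling is an assumption, since the source does not state which scaling was applied.

```python
import numpy as np


def pca(X, n_components):
    """Return PCA scores and the explained-variance ratio of each component."""
    Xc = X - X.mean(axis=0)  # mean-centre each variable (assumed scaling)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.random((72, 176))  # placeholder data, not real spectra

scores, ratio = pca(X, n_components=5)
cumulative = np.cumsum(ratio)  # cumulative share of variance explained

print(scores.shape, cumulative.round(3))
```

Plotting the first two columns of `scores`, colored by production area, gives the kind of score plot described for FIG. 3, and `cumulative` is the quantity behind statements about how much variance the first few PCs explain.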
5. Data fusion
Because the production area of TPHK is narrow, the data from either phase of the two-phase extraction alone hardly reflected the differences between samples. Therefore, the data of the D2O phase and the CDCl3 phase were fused, and Pearson correlation analysis was performed on the fused data to investigate the correlation between the variables. The Pearson correlation matrix showed little correlation between the polar and non-polar extracts (|r| < 0.5), so the two phases effectively reflect different characteristics of TPHK (FIG. 4). This favors better results when the fused data are combined with machine learning. PCA was then applied to visualize the data structure of the fused polar and non-polar extracts (FIG. 5a). PCA showed that the sample points of the different groups formed overlapping clusters. These results indicate that there is significant overlap between the groups even when the metabolic profiles are fused. Since the first two PCs accounted for only 57% of the total variance, the first five PCs, which together explained 78.5% of the variance, were examined further (FIG. 6). However, the overlap between groups did not improve. The metabolites detected by 1H NMR capture only one aspect of the information that determines tea origin, and the resolution of the instrument limits the results of unsupervised learning with PCA. Interestingly, in supervised learning, fusing the polar and non-polar data significantly improved accuracy (Table 4). The accuracy of the SVM classifier in identifying the core production area was 94.00%. Unfortunately, its identification of the other production areas was poor, at only 75.00%. Overall, the RF classifier achieved the best classification rate of 87.50%, with acceptable specificity and sensitivity of 86.11% and 88.89%, respectively (Table 4).
The results show that two-phase extraction is suitable for distinguishing TPHK from different production areas, with accuracy rising from 78% for a single phase to 87.5% for the fused two phases. The fused two-phase data set distinguishes samples better because the spectra obtained from the two phases are complementary, giving a more complete picture of the chemical changes associated with different production areas.
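The Pearson correlation check between polar and non-polar variables described in this section can be sketched as follows. The toy columns are invented for illustration (each inner list stands for one bin measured across four samples); only the |r| < 0.5 threshold comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_correlation(polar_cols, nonpolar_cols):
    """Matrix of r values: rows are polar bins, columns are non-polar bins."""
    return [[pearson(p, q) for q in nonpolar_cols] for p in polar_cols]


# Toy data (not from the source): two polar and two non-polar bins.
polar = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
nonpolar = [[1.0, 3.0, 2.0, 4.0], [4.0, 3.0, 2.0, 1.0]]

r = cross_correlation(polar, nonpolar)
weak = [[abs(v) < 0.5 for v in row] for row in r]  # pairs below the threshold

print([[round(v, 2) for v in row] for row in r])  # [[0.8, -1.0], [0.0, -0.6]]
```

With real data, `polar` would hold the 176 D2O bins and `nonpolar` the 176 CDCl3 bins, and a mostly weak cross-block would support treating the two phases as complementary.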
6. Related metabolites
Using the Kruskal-Wallis test (p < 0.05), 61 bins that differed significantly between the two producing zones were obtained (FIG. 7). To further identify the bins of high importance for distinguishing the core producing zone from the other producing zones, an FDR-corrected Kruskal-Wallis test and RF feature-variable screening were performed (FIG. 5b), which excluded 55 bins (Table 5). Of these, the theanine bins at 1.16, 2.2 and 3.8 ppm distinguished the two producing zones. Theanine has been reported as a potential marker for distinguishing green tea from three narrow producing areas, but under the more rigorous screening it proved less important than the more relevant differential metabolites. The 6 most relevant potential metabolic markers were screened (Table 6), and their relative concentrations are summarized in a boxplot (FIG. 8). The NMR signal at 3.68 ppm belongs to sucrose, the most abundant carbohydrate in tea, which is partly produced by photosynthesis during growth. It has previously been reported that the sucrose content of black tea samples from different geographical regions and climates varies greatly. The core producing zone has higher EGCG, ECG, EGC and EC contents (6.08-6.20 ppm and 7.00 ppm) than the other producing zones. The synthesis of catechins has been shown to be affected by climate, which explains the significant differences between green teas from three different origins. The bin at 0.84 ppm, which represents all fatty acids (except linolenic acid), contributes very prominently to the classification results. Fatty acids have been used effectively to determine the geographical origin of oils; however, they are often overlooked in tea. During tea processing, fatty acids are converted into saturated and unsaturated C6 and C9 aldehydes and alcohols, which give the tea infusion a delicate fragrance. As aroma precursors, they contribute significantly to the aroma.
The bins of linolenic acid were 2.04, 2.8, 5.24 and 5.36 ppm (Table 5), and the linolenic acid concentration in the core producing zone was lower than in the other producing zones. During tea processing, linolenic acid is converted into methyl jasmonate, which contributes greatly to the orchid aroma of TPHK. This may be another reason why the core zone has a higher aroma score than the other zones, although the linolenic acid bins were removed by FDR correction and therefore contribute less to the classification than the 0.84 ppm bin. Note that some less relevant bins are also useful for classification, because the classification is based on the co-existence and interaction of multiple bins. We therefore consider the entire spectrum to be characteristic of a particular TPHK. Owing to the qualitative and quantitative information in the full spectrum, combined with a machine learning algorithm, the producing area of Taiping Houkui can be identified.
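The per-bin screening step can be illustrated with a small, standard-library-only sketch of the Kruskal-Wallis test for two groups (core versus other producing zones). This is an illustration under stated assumptions, not the inventors' code: the function names and the toy intensities are hypothetical, and the p-value uses the chi-square approximation with one degree of freedom, which for two groups reduces to erfc(sqrt(H/2)).

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all values equal to the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        tie_sizes.append(end - pos + 1)
        pos = end + 1
    return ranks, tie_sizes


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two groups of bin intensities (hypothetical helper)."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])
    h = 12 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b)
    ) - 3 * (n + 1)
    tie_correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if tie_correction == 0:
        raise ValueError("all values are identical; the test is undefined")
    h = max(h / tie_correction, 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Toy intensities for one bin in each zone (made-up numbers).
h_stat, p_value = kruskal_wallis_two_groups([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In practice the test would be repeated for every bin, and the resulting p-values would then be corrected for multiple testing before any bin is treated as a marker.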
Table 5. Bins with significant differences between the other and core producing zones (Kruskal-Wallis test, P < 0.05); 55 bins were excluded by the FDR method
[Table 5 is provided as images (BDA0003632576430000111, BDA0003632576430000121) in the original publication.]
Table 6. Bins with significant differences between the other and core producing zones; P values were obtained from the Wilcoxon rank-sum test and corrected using the FDR method
Bin (δ ppm, 1H)    p-value       -log10(p)   FDR
3.68 (D2O)         2.72×10^-6    5.5652      0.000958
7.00 (D2O)         1.15×10^-5    4.939       0.002026
6.16 (D2O)         3.85×10^-5    4.4147      0.004515
0.84 (CDCl3)       0.000243      3.6149      0.020672
6.20 (D2O)         0.000294      3.5322      0.020672
6.12 (D2O)         0.000471      3.3274      0.027607
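The source states only that the "FDR method" was used to correct the p-values. A common choice is the Benjamini-Hochberg procedure, and the sketch below assumes that choice; it is not confirmed by the text. The function names and the toy p-values are hypothetical. The helper neg_log10 shows how a -log10(p) column is derived from a p-value.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def neg_log10(p):
    """Transform a p-value to the -log10 scale used for ranking bins."""
    return -math.log10(p)


# Toy p-values for four bins (made-up numbers, not from the study).
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The correction must be run over the p-values of all tested bins at once; applying it only to the few rows of a summary table would give smaller adjusted values than a correction over the full set.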
7. Model comparison
In this study, we used 1H NMR fingerprinting combined with machine learning to trace narrow green tea producing areas. One key finding is that polar extracts of tea leaves have limited ability to trace narrow origins. We found that non-polar extracts are also very important for classification. Importantly, fusing the polar and non-polar extracts significantly improves classification accuracy.
Seeger proposed an NMR metabolomics approach that uses selected polar and non-polar metabolites to distinguish black tea from green tea simply and rapidly. However, black tea and green tea can be distinguished by visual inspection alone; moreover, their metabolites differ greatly, so they can be distinguished with a single extraction of the polar fraction. For narrow producing areas, by contrast, the metabolites are very similar: when TPHK from two narrow producing areas was distinguished using the polar extract alone, the accuracy was only 76.39%. With the two-phase extraction method, data for both phases are obtained in a single extraction, and the fused two-phase data give satisfactory accuracy (87.50%). In previous studies that used NMR to obtain fingerprints of polar and non-polar metabolites separately, accuracy after data fusion was lower than with a single class of metabolites, which suggests that only one type of metabolite is needed to classify geographically distant origins and that incorporating other variables introduces redundancy. In contrast to those results, we found that fusing the polar and non-polar metabolites of tea leaves does not cause data overload (FIG. 4, r < 0.5), and that combining the fused data with machine learning clearly improves classification accuracy. Furthermore, the two-phase extraction obtains all metabolites at once, which is faster than obtaining polar and non-polar metabolites separately. Because it requires little time and a minimal sample size, the comprehensive 1H NMR fingerprint of tea leaves obtained by this simple and efficient two-phase extraction has great potential for quality control of tea products.
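The binning of each phase and the low-level fusion of the two phases can be sketched in plain Python. The ppm windows (0.6-8.12 ppm excluding 4.52-5.0 ppm for the aqueous phase, 0.48-7.60 ppm excluding 7.2-7.28 ppm for the chloroform phase) and the 0.04 ppm bin width are those given for the method; the function names and the use of integer hundredths of ppm to avoid floating-point drift are assumptions made for illustration. The sketch only builds bin edges and concatenates per-sample feature vectors; it does not read spectra or train the random forest.

```python
def bin_edges(start_ppm, stop_ppm, width_ppm=0.04, exclude=None):
    """Return (low, high) bin edges in ppm, skipping bins inside the excluded window."""
    # Work in integer hundredths of ppm so repeated additions stay exact.
    start = round(start_ppm * 100)
    stop = round(stop_ppm * 100)
    width = round(width_ppm * 100)
    if exclude is not None:
        ex_low, ex_high = round(exclude[0] * 100), round(exclude[1] * 100)
    bins = []
    for low in range(start, stop, width):
        high = low + width
        if exclude is not None and low >= ex_low and high <= ex_high:
            continue  # e.g. residual solvent region
        bins.append((low / 100, high / 100))
    return bins


def fuse_low_level(aqueous_rows, chloroform_rows):
    """Concatenate the two per-sample feature vectors (low-level data fusion)."""
    if len(aqueous_rows) != len(chloroform_rows):
        raise ValueError("both phases must contain the same samples")
    return [list(a) + list(c) for a, c in zip(aqueous_rows, chloroform_rows)]


aqueous_bins = bin_edges(0.6, 8.12, exclude=(4.52, 5.0))
chloroform_bins = bin_edges(0.48, 7.60, exclude=(7.2, 7.28))

# One placeholder sample per phase (made-up intensities).
fused = fuse_low_level([[0.0] * len(aqueous_bins)], [[1.0] * len(chloroform_bins)])
```

Both windows yield 176 bins, so each fused sample carries 352 variables. Concatenation keeps every original variable, which is why checking the correlation between the two blocks matters before modelling.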
The foregoing merely illustrates preferred embodiments of the present invention and is not to be taken as limiting the invention. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A method for identifying the source tracing of a tea producing area through a two-phase extraction NMR spectrum is characterized by comprising the following steps:
S1: crushing a tea sample, freeze-drying it, and collecting a nuclear magnetic resonance spectrum;
S2: preprocessing the chemical shifts of the region of interest of the NMR spectrum by summation normalization using MestReNova software;
S3: performing principal component analysis on the spectral data to reduce dimensionality and visualize the results;
S4: importing the spectral data into the models, calculating the accuracy, evaluating the models, and selecting the model with the highest accuracy.
2. The method for identifying the source tracing of tea production places through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the crushed tea sample in the step S1 is subjected to ultrasonic treatment and centrifugation, and the nuclear magnetic resonance spectrum is obtained with a 600 MHz NMR spectrometer at a temperature of 298 K.
3. The method for identifying the source tracing of tea leaf origin by means of two-phase extraction NMR spectrum according to claim 1, wherein the regions of interest selected in the step S2 are, for D2O, 0.6-8.12 ppm excluding 4.52-5.0 ppm, and, for CDCl3, 0.48-7.60 ppm excluding the chemical shifts of 7.2-7.28 ppm.
4. The method for identifying the tea leaf origin tracing through the biphase extraction NMR spectrum according to claim 1, wherein the principal component analysis in the step S3 uses PCA to reduce the dimension of the data, so as to visualize the data.
5. The method for identifying the tea leaf origin tracing through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the specific process of importing the spectral data into the model in the step S4 is as follows: for the spectrum of the aqueous-phase extract, 0.6-8.12 ppm is selected, excluding the chemical shifts of 4.52-5.0 ppm, and the spectrum is segmented into bins of 0.04 ppm, giving a total of 176 variables; for the spectrum of the chloroform-phase extract, 0.48-7.60 ppm is selected, excluding the chemical shifts of 7.2-7.28 ppm, and the spectrum is segmented into bins of 0.04 ppm, giving a total of 176 variables; the NMR spectral data obtained from the aqueous phase and the chloroform phase are then combined directly by low-level data fusion and imported into a random forest (RF) model.
6. The method for identifying the source tracing of tea leaf origin by means of two-phase extraction NMR spectrum according to claim 1, wherein the accuracy of the calculation in the step S4 is represented by the formula:
accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%
Wherein, TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
7. The method for identifying the tea leaf origin tracing through the biphase extraction NMR spectrum as claimed in claim 1, wherein the sensitivity and specificity of the model are also evaluated in the step S4, and the calculation formulas of the sensitivity and specificity are respectively as follows:
Sensitivity = TP/(TP + FN) × 100%
Specificity = TN/(TN + FP) × 100%
Wherein, TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
8. Use of the method for identifying the origin of tea leaves by two-phase extraction NMR spectroscopy as defined in any one of claims 1 to 7 for identifying the producing area of Taiping Houkui tea.
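The evaluation metrics named in the claims can be computed from confusion-matrix counts as in the following sketch. It uses the accuracy formula given above together with the standard definitions sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity as percentages."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one sample")
    return {
        "accuracy": (tp + tn) / total * 100,
        "sensitivity": tp / (tp + fn) * 100,  # share of true positives recovered
        "specificity": tn / (tn + fp) * 100,  # share of true negatives recovered
    }


# Made-up counts for a two-class problem (core zone = positive class).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

Reporting sensitivity and specificity alongside accuracy shows whether a model favours one producing zone, which accuracy alone can hide when the classes are unbalanced.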
CN202210495061.4A 2022-05-07 2022-05-07 Method for identifying tea production place tracing through double-phase extraction NMR spectrum and application thereof Active CN115165950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210495061.4A CN115165950B (en) 2022-05-07 2022-05-07 Method for identifying tea production place tracing through double-phase extraction NMR spectrum and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210495061.4A CN115165950B (en) 2022-05-07 2022-05-07 Method for identifying tea production place tracing through double-phase extraction NMR spectrum and application thereof

Publications (2)

Publication Number Publication Date
CN115165950A true CN115165950A (en) 2022-10-11
CN115165950B CN115165950B (en) 2024-06-04

Family

ID=83483634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210495061.4A Active CN115165950B (en) 2022-05-07 2022-05-07 Method for identifying tea production place tracing through double-phase extraction NMR spectrum and application thereof

Country Status (1)

Country Link
CN (1) CN115165950B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116485418A (en) * 2023-06-21 2023-07-25 福建基茶生物科技有限公司 Tracing method and system for tea refining production

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005008256A2 (en) * 2003-07-10 2005-01-27 Barry Callebaut A.G. Method of determining the geographical origin of cocoa beans and derivative products thereof
KR100905414B1 (en) * 2008-06-25 2009-07-02 대한민국 Origin discrimination method of herbal medicine
CN108931548A (en) * 2018-06-06 2018-12-04 厦门大学 A method of tea-leaf producing area difference is identified by purifying displacement study H NMR spectroscopy
CN109001306A (en) * 2018-06-01 2018-12-14 南昌大学 The prediction technique of squalene and sterol index in a kind of tea oil
CN111272931A (en) * 2020-02-17 2020-06-12 江苏一片叶高新科技有限公司 Method for tracing origin of tea

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005008256A2 (en) * 2003-07-10 2005-01-27 Barry Callebaut A.G. Method of determining the geographical origin of cocoa beans and derivative products thereof
KR100905414B1 (en) * 2008-06-25 2009-07-02 대한민국 Origin discrimination method of herbal medicine
CN109001306A (en) * 2018-06-01 2018-12-14 南昌大学 The prediction technique of squalene and sterol index in a kind of tea oil
CN108931548A (en) * 2018-06-06 2018-12-04 厦门大学 A method of tea-leaf producing area difference is identified by purifying displacement study H NMR spectroscopy
CN111272931A (en) * 2020-02-17 2020-06-12 江苏一片叶高新科技有限公司 Method for tracing origin of tea

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
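刘艳丽 et al.: "Research progress on aluminum and fluorine accumulation in tea plants", Plant Science Journal, 27 December 2016 (2016-12-27) *
袁玉伟; 胡桂仙; 邵圣枝; 张永志; 张玉; 朱加虹; 杨桂玲; 张志恒: "Research progress on techniques for tracing and identifying the geographical origin of tea", Journal of Nuclear Agricultural Sciences, no. 04, 27 April 2013 (2013-04-27) *
金戈 et al.: "Tracing the origin of Taiping Houkui green tea using 1H NMR and HS-SPME-GC-MS chemical fingerprints, data fusion and chemometrics", Food Chemistry, 1 June 2023 (2023-06-01) *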

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116485418A (en) * 2023-06-21 2023-07-25 福建基茶生物科技有限公司 Tracing method and system for tea refining production
CN116485418B (en) * 2023-06-21 2023-09-05 福建基茶生物科技有限公司 Tracing method and system for tea refining production

Also Published As

Publication number Publication date
CN115165950B (en) 2024-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant