CN115165950A - Method for identifying origin tracing of tea leaves through double-phase extraction NMR spectrum and application thereof - Google Patents
- Publication number
- CN115165950A (application CN202210495061.4A)
- Authority
- CN
- China
- Prior art keywords
- tea
- identifying
- nmr spectrum
- spectrum
- origin
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N24/00—Investigating or analyzing materials by the use of nuclear magnetic resonance, electron paramagnetic resonance or other spin effects
- G01N24/08—Investigating or analyzing materials by the use of nuclear magnetic resonance, electron paramagnetic resonance or other spin effects by using nuclear magnetic resonance
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/34—Purifying; Cleaning
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/40—Concentrating samples
- G01N1/4055—Concentrating samples by solubility techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/40—Concentrating samples
- G01N1/4055—Concentrating samples by solubility techniques
- G01N2001/4061—Solvent extraction
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/30—Assessment of water resources
Abstract
The invention relates to the field of nuclear magnetic resonance detection, in particular to a method for tracing the origin of tea leaves using two-phase extraction NMR spectra, and to its application. Based on two-phase extraction fingerprinting by ¹H NMR combined with multivariate data analysis, the geographic traceability of Taiping Houkui green tea was analyzed. Principal component analysis was used as an exploratory tool to summarize clustering, and support vector machines and random forests (RF) were then applied for classification. The RF model combining polar and non-polar extracts achieved the best accuracy, 87.5%, with catechins, fatty acids and sucrose identified as contributors to the classification and as important differential metabolites. These results support the use of ¹H NMR combined with machine learning tools to identify green tea from narrowly defined origins.
Description
Technical Field
The invention relates to the field of nuclear magnetic resonance detection, in particular to a method for tracing the origin of tea leaves using a two-phase extraction NMR spectrum, and to its application.
Background
Tea, one of the three most popular beverages in the world, is favored by consumers for its unique flavor. Among all tea types, green tea has the largest market share in China. Demand for green tea is closely tied to the geographical origin and corresponding quality of the leaves, which in turn affect price and consumer choice. In China, famous green teas are usually produced in narrow production areas, for example West Lake Longjing, Huangshan Maofeng and Taiping Houkui. Taiping Houkui is regarded as the king of green teas and has a unique appearance and an orchid-like aroma. It is mainly produced in Xinming Town, Sankou Town, Longmen Town, Yanjia, Hougang and Houkeng in Huangshan City, Anhui Province. Yanjia, Hougang and Houkeng are the core producing areas and the most distinctive locations. Taiping Houkui teas from different origins are similar in appearance and come from geographically close locations. Driven by profit, some unscrupulous merchants label products they know come from elsewhere as originating from a prized geographic area. Therefore, an effective method for authenticating green tea from narrow places of origin is urgently needed.
Traditionally, the production place of tea is identified by sensory evaluation, which depends on human experience, is easily influenced by subjective factors, and therefore has clear limitations. In recent years, a number of emerging detection techniques have been widely used to complement sensory evaluation, such as high performance liquid chromatography-mass spectrometry, headspace solid phase microextraction with gas chromatography-mass spectrometry, stable isotope analysis, elemental analysis, electronic noses and electronic tongues. However, these techniques typically require complex sample pre-treatment or derivatization, have long run times, and are not suitable for routine analysis. In contrast, nuclear magnetic resonance (NMR) is fast (typically 3-5 minutes per sample) and can produce reliable metabolite fingerprints from very small samples. In addition, NMR allows multiple chemical components to be identified simultaneously in a single experiment with good reproducibility. Because metabolites differ so widely, a complete analysis of the metabolome is not feasible. Some metabolites occur in large amounts (up to ten percent of dry matter), while others occur only in trace amounts (pmol or less). Some are extremely hydrophilic (such as sugars) and others are lipophilic (such as fats), which makes it almost impossible to extract both in a single extract. In addition, because the spectral range of ¹H NMR is limited, signals often overlap; small signals close to large signals are especially difficult to detect. These factors limit the development of NMR spectroscopy. Two-phase extraction yields polar and non-polar metabolites in a single step and gives a comprehensive picture of the metabolites in tea.
In recent decades there have been several reports on NMR-based tea traceability. Studies covering different countries have attempted to differentiate green tea by origin, but without satisfactory results. The accuracy of distinguishing oolong tea from three different origins using NMR data was only 68.2-78.7%. These studies used only the polar metabolites in tea leaves for identification; the contribution of non-polar metabolites to identifying tea origin is still unknown, and polar extracts alone offer limited traceability to narrow origins. In addition, when NMR is used to classify teas from narrow production areas with no obvious climate differences, the classification accuracy needs further improvement. Research reports suggest that obtaining a comprehensive metabolic fingerprint may provide additional insight. However, few studies have applied ¹H NMR-based metabolomics to the certification of tea origin using both polar and non-polar extracts.
In view of the above-mentioned drawbacks, the inventors of the present invention have finally obtained the present invention through a long period of research and practice.
Disclosure of Invention
The invention aims to solve the following problems in existing research: only polar metabolites in tea leaves are used for identification; the contribution of non-polar metabolites to tea origin identification is unknown; and the accuracy is low when NMR is used to classify tea leaves from narrow origins with no obvious climate differences. To this end, the invention provides a method for tracing the origin of tea leaves using a two-phase extraction NMR spectrum, and its application.
To achieve this aim, the invention discloses a method for tracing the origin of tea using a two-phase extraction NMR spectrum, which comprises the following steps:
S1: crushing a tea sample, freeze-drying it, and collecting a nuclear magnetic resonance spectrum;
S2: pre-processing the NMR spectra with MestReNova software;
S3: performing principal component analysis on the spectral data to reduce dimensionality and visualize the results;
S4: importing the spectral data into the model, calculating the accuracy and evaluating the model.
In step S1, the crushed tea sample is subjected to ultrasonic treatment and centrifugation, and the NMR spectrum is acquired on a 600 MHz NMR spectrometer at a temperature of 298 K.
In step S2, the regions of interest are 0.6-8.12 ppm for D₂O, excluding 4.52-5.0 ppm, and 0.48-7.60 ppm for CDCl₃, excluding 7.2-7.28 ppm.
In step S3, principal component analysis (PCA) is used to reduce the dimensionality of the data so that it can be visualized.
The specific process of importing the spectral data into the model in step S4 is as follows. For the spectra of the aqueous phase, the region 0.6-8.12 ppm is selected, excluding the chemical shifts 4.52-5.0 ppm, and the spectrum is segmented into bins with a width of 0.04 ppm, giving a total of 176 variables. For the spectra of the chloroform phase, the region 0.48-7.60 ppm is selected, excluding the chemical shifts 7.2-7.28 ppm, and the spectrum is segmented into bins with a width of 0.04 ppm, giving a total of 176 variables. The NMR spectral data obtained from the aqueous phase and the chloroform phase are combined directly by low-level data fusion and imported into a random forest (RF) model.
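The variable counts stated above can be checked with a short sketch. This is only an illustration, not the binning routine used by MestReNova: the function name `bin_edges` is hypothetical, and it assumes bins start at the lower limit of each region and that a bin is dropped only when it lies entirely inside the excluded window. Chemical shifts are held as integer hundredths of a ppm to avoid floating-point rounding.

```python
# Illustrative sketch (hypothetical helper, not MestReNova's algorithm):
# count 0.04 ppm bins in a region after removing an excluded window.
# All values are integer hundredths of a ppm (0.60 ppm -> 60).

def bin_edges(start, stop, excl_lo, excl_hi, width=4):
    """Return (lo, hi) bin edges, skipping bins inside the excluded window."""
    edges = []
    lo = start
    while lo + width <= stop:
        hi = lo + width
        # Keep the bin unless it lies fully inside the excluded window.
        if not (lo >= excl_lo and hi <= excl_hi):
            edges.append((lo, hi))
        lo = hi
    return edges

# Aqueous (D2O) phase: 0.60-8.12 ppm, excluding 4.52-5.00 ppm.
d2o_bins = bin_edges(60, 812, 452, 500)
# Chloroform (CDCl3) phase: 0.48-7.60 ppm, excluding 7.20-7.28 ppm.
cdcl3_bins = bin_edges(48, 760, 720, 728)

# Low-level fusion concatenates the two variable sets per sample.
n_fused = len(d2o_bins) + len(cdcl3_bins)
print(len(d2o_bins), len(cdcl3_bins), n_fused)
```

Under these assumptions each phase yields 176 bins, so the fused feature vector for one sample has 352 variables.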
The formula for calculating accuracy in step S4 is:
accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%
Wherein TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
In the step S4, the sensitivity and the specificity of the model are also evaluated, and the calculation formulas of the sensitivity and the specificity are respectively as follows:
sensitivity = TP/(TP + FN) × 100%
specificity = TN/(TN + FP) × 100%
Wherein TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
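As a worked example of the evaluation metrics, the sketch below computes accuracy, sensitivity and specificity from confusion-matrix counts. It uses the standard definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). The function name and the counts passed to it are hypothetical and are not results from the invention.

```python
# Illustrative sketch: evaluation metrics from confusion-matrix counts.
# The function name and the example counts are hypothetical.

def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total * 100
    sensitivity = tp / (tp + fn) * 100  # share of true positives recovered
    specificity = tn / (tn + fp) * 100  # share of true negatives recovered
    return accuracy, sensitivity, specificity

# Made-up counts for a two-class problem (core vs. other production areas).
acc, sens, spec = classification_metrics(tp=9, fp=1, tn=8, fn=2)
print(f"accuracy={acc:.1f}%  sensitivity={sens:.1f}%  specificity={spec:.1f}%")
```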
The invention also discloses the application of the method for tracing the origin of tea using a two-phase extraction NMR spectrum to tracing the origin of Taiping Houkui tea.
Because metabolites differ so widely, a complete analysis of the metabolome is not feasible. Some metabolites occur in large amounts (up to ten percent of dry matter), while others occur only in trace amounts (pmol or less). Some are extremely hydrophilic (such as sugars) and others are lipophilic (such as fats), which makes it almost impossible to extract both in a single extract. In addition, because the spectral range of ¹H NMR is limited, signals often overlap; small signals close to large signals are especially difficult to detect. A single extraction recovers only part of the metabolites, so to improve metabolite coverage, a comprehensive NMR metabolic fingerprint of the tea leaves is obtained by a single biphasic extraction.
Compared with the prior art, the invention has the following beneficial effects. Fusing the ¹H NMR data of polar and non-polar compounds is highly advantageous for traceability within narrow production areas. Two-phase extraction ¹H NMR fingerprinting combined with machine learning can distinguish green teas from narrow areas, whereas single-solvent extraction is of limited use for tracing tea origin within such areas. Fusing the polar and non-polar metabolite extracts clearly improves classification accuracy, and the random forest model shows the best classification accuracy, 87.50%. The method can serve as a rapid screening technique that helps professional auditors identify production places, and as an additional reference based on objective measurement.
Drawings
FIG. 1 is a sample collection map for Taiping Houkui;
FIG. 2 shows the 600 MHz ¹H NMR spectra of a Taiping Houkui sample (mixed sample): (a) two-phase extraction (D₂O); (b) two-phase extraction (CDCl₃);
FIG. 3 shows the PCA of the two-phase extraction of Taiping Houkui: (a) D₂O phase; (b) CDCl₃ phase;
FIG. 4 is a visualization of the Pearson correlation coefficients between the D₂O and CDCl₃ data;
FIG. 5 shows the data fusion of the two-phase extraction: (a) PCA visualization; (b) RF model features ranked by their contribution to classification accuracy. Core zone: core production area (Yanjia, Hougang, Houkeng); other production areas: Xinming, Sankou, Longmen;
FIG. 6 is a visualization of the first five principal components of the fused D₂O and CDCl₃ data;
FIG. 7 shows the bins that differ between the core and non-core production areas (P < 0.05);
FIG. 8 shows box plots of the significantly different metabolites in Taiping Houkui samples from the two regions (Kruskal-Wallis test, P < 0.05, FDR < 0.05, RF model feature variable screening); the groups are the core production area and the other production areas.
Detailed Description
The above and further features and advantages of the present invention are described in more detail below with reference to the accompanying drawings.
1. Green tea samples
A total of 72 Taiping Houkui samples were collected from the core production area (Yanjia, Hougang, Houkeng) and other production areas (Longmen Town, Xinming Town, Sankou Town) of Huangshan City, Anhui Province, covering almost the whole tea production season (FIG. 1). Samples from the different production areas were made by the same production process and collected in different batches. Tea producers were commissioned to make the tea samples according to the Chinese national standard GB/T 19698-2008. Detailed sample information is shown in Table 1. All samples were stored at 4 °C until analysis.
TABLE 1 Taiping Houkui sample information
2. Sample preparation
The Taiping Houkui samples were pulverized with a blender and freeze-dried for 48 hours. Then, 100 mg of the freeze-dried tea was transferred to a 2 mL centrifuge tube, and 0.8 mL of D2O followed by 0.8 mL of CDCl3 (TMS 0.03% w/v) was added. The mixture was sonicated for 10 minutes and centrifuged (13000 × g) at 20 ℃ for 5 minutes. Next, 0.4 mL of the D2O phase was transferred to an NMR tube, and 0.1 mL of D2O (TSP 0.05% w/v) was added. Finally, 0.4 mL of the CDCl3 phase was transferred to another NMR tube.
3. Nuclear magnetic resonance data acquisition, processing and analysis
All 1H NMR spectra were acquired on a 600 MHz NMR spectrometer (Agilent Technologies, CA, USA) at 298 K. The 1H NMR spectra of the CDCl3 extracts were acquired with the following parameters: number of scans = 64; spectral width = 9615.4 Hz; size of FID (TD) = 65536; relaxation delay = 1 second; acquisition time = 1.7 seconds. The 1H NMR spectra of the D2O extracts were acquired with a WET1D pulse sequence, which uses shaped selective pulses to suppress the residual water signal. Each spectrum consisted of 64 scans and 65536 data points, with a spectral width of 9615.4 Hz, a relaxation delay of 1.5 seconds, and an acquisition time of 4.00 seconds.
The NMR spectra were pre-processed with MestReNova software (MestReNova v14.0.1, 2018, Mestrelab Research, Santiago de Compostela, Spain). The signal peaks of the internal references TMS and TSP were set to a chemical shift of 0.00 ppm. Automatic phase and baseline corrections were applied to all spectra. Each spectrum was segmented into bins 0.04 ppm wide. The regions of interest were 0.6-8.12 ppm for D2O (excluding 4.52-5.0 ppm) and 0.48-7.60 ppm for CDCl3 (excluding 7.2-7.28 ppm). For multivariate analysis, the intensity of each bin was normalized to the total intensity of its spectrum. This yielded two data matrices, one for the CDCl3 extract (72 x 176) and one for the D2O extract (72 x 176). These matrices were then merged into a third, fused data matrix (72 x 352).
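The binning, total-intensity normalization and merging described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MestReNova workflow itself: the function names `bin_edges` and `bin_and_normalize`, the synthetic random spectra and the 5001-point ppm axis are all assumptions made for the example. Only the bin width, the regions of interest and the excluded windows come from the text.

```python
import numpy as np

def bin_edges(lo, hi, width=0.04, exclude=None):
    """Return (start, end) ppm pairs covering [lo, hi), skipping bins inside `exclude`."""
    n = int(round((hi - lo) / width))
    edges = []
    for i in range(n):
        start = round(lo + i * width, 4)   # rounding avoids float drift at bin borders
        end = round(start + width, 4)
        if exclude and start >= exclude[0] and end <= exclude[1]:
            continue
        edges.append((start, end))
    return edges

def bin_and_normalize(ppm, intensity, edges):
    """Sum the intensity inside each bin, then divide by the total so the row sums to 1."""
    binned = np.array([intensity[(ppm >= a) & (ppm < b)].sum() for a, b in edges])
    return binned / binned.sum()

# Regions of interest and excluded windows as stated in the text.
d2o_edges = bin_edges(0.60, 8.12, exclude=(4.52, 5.00))
cdcl3_edges = bin_edges(0.48, 7.60, exclude=(7.20, 7.28))

# Synthetic placeholder spectra (assumption): 72 samples on a common ppm axis.
rng = np.random.default_rng(0)
ppm = np.linspace(0.0, 10.0, 5001)
X_d2o = np.vstack([bin_and_normalize(ppm, rng.random(ppm.size), d2o_edges) for _ in range(72)])
X_cdcl3 = np.vstack([bin_and_normalize(ppm, rng.random(ppm.size), cdcl3_edges) for _ in range(72)])

# Low-level fusion: concatenate the two blocks column-wise.
X_fused = np.hstack([X_d2o, X_cdcl3])
print(X_d2o.shape, X_cdcl3.shape, X_fused.shape)  # (72, 176) (72, 176) (72, 352)
```

With a 0.04 ppm bin width, each region gives 176 bins once the excluded window is dropped, which matches the matrix sizes stated above.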
At present, research on tea focuses on polar extracts and neglects the nonpolar fingerprint. To obtain a comprehensive metabolic fingerprint of Taiping Houkui, two-phase extraction was used to analyze both polar and nonpolar metabolites. Based on the published literature and the HMDB database, 16 Taiping Houkui metabolites were identified in the D2O phase of the two-phase extraction (FIG. 2a and Table 2). The metabolites identified in the current study include carbohydrates (sucrose, α-glucose, β-glucose and fructose), amino acids (theanine, alanine, isoleucine, leucine and threonine), organic acids (gallic acid, quinic acid and acetic acid) and phenols (EGCG, EC, ECG and EGC). In the spectrum of the CDCl3 phase, the main fatty acids of the tea are linolenic acid, linoleic acid, oleic acid and palmitoleic acid. The proton assignments of the different functional groups are shown in FIG. 2b and Table 3. Because the different fatty acids in tea are chemically similar, their 1H NMR signals resonate close together, so the 1H NMR spectrum of the CDCl3 phase has only a small number of characteristic peaks (FIG. 2b). Fatty acids are precursors of the fresh, green odour of tea infusions. Fatty acids are oxidized by lipoxygenase (LOX), whose activity changes with ambient temperature, so tea aroma differs between environments; the fatty acid content in different production areas therefore indirectly reflects differences in aroma. In addition, fatty acids give rise to cyclic aroma compounds such as methyl jasmonate. Methyl jasmonate is an important contributor to orchid aroma, which is considered a characteristic aroma of high-quality Taiping Houkui. In conclusion, nonpolar extraction is helpful for tracing the origin of Taiping Houkui green tea within a narrow production area.
TABLE 2 Peak assignment of the TPHK (D2O) 1H NMR spectrum
No. | Component | Chemical shift δ (ppm) | Reference |
1 | Theanine | 1.12,2.15,2.48,3.22,3.79 | (Kumar et al.,2016;Gall et al.,2004) |
2 | EGCG | 2.88,3.02,5.05,5.54,6.09,6.64,6.96 | (Gall et al.,2004) |
3 | EGC | 2.78,2.91,4.31,6.09,6.64 | (Gall et al.,2004) |
4 | ECG | 2.91,3.04,5.05,6.09,6.85,6.92 | (Gall et al.,2004) |
5 | EC | 2.76,2.88,4.27,6.09,6.94,7.04 | (Gall et al.,2004) |
6 | Sucrose | 3.4,3.65,3.70,4.08,4.23,5.43 | (Kumar et al.,2016) |
7 | α-glucose | 3.50,5.25 | (Bo et al.,2019) |
8 | β-glucose | 4.58 | (Gall et al.,2004) |
9 | Fructose | 3.56,4.13 | (Bo et al.,2019) |
10 | Leucine | 0.98 | (Lee et al.,2010) |
11 | Isoleucine | 1.03,1.98 | (Lee et al.,2010) |
12 | Threonine | 1.36,4.23 | (Gall et al.,2004) |
13 | Alanine | 1.50,3.84 | (Bo et al.,2019) |
14 | Quinic acid | 2.00,4.04 | (Kumar et al.,2016;Lee et al.,2010) |
15 | Acetic acid | 2.07 | (Lee et al.,2011) |
16 | Gallic acid | 7.18 | (Kumar et al.,2016) |
TABLE 3 Peak assignment of the TPHK (CDCl3) 1H NMR spectrum
4. Multivariate data analysis and classification
Principal component analysis was performed to reduce dimensionality and visualize the results. The significance of the variables was assessed with the Kruskal-Wallis test at the 95% confidence level. P-values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) method (FDR < 0.05). Pearson correlation analysis was performed in MATLAB 2018b (The MathWorks Inc., Natick, MA, USA).
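The principal component analysis step can be sketched as below. The text does not say which software or scaling was used for PCA, so this is only an assumed minimal version: PCA by singular value decomposition of the mean-centred data matrix in NumPy, applied to a random placeholder matrix of the same size as the fused data. The function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components=5):
    """PCA by SVD of the mean-centred data.

    Returns the sample scores and the explained-variance ratio of each component.
    """
    Xc = X - X.mean(axis=0)                      # centre each variable (bin)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # fraction of total variance per PC
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Placeholder data (assumption): 72 samples x 352 bins of random values.
rng = np.random.default_rng(1)
X = rng.random((72, 352))

scores, ratio = pca(X, n_components=5)
print(scores.shape)          # (72, 5)
print(np.cumsum(ratio))      # cumulative variance explained by PC1..PC5
```

Plotting the first two columns of `scores`, coloured by production area, gives the kind of score plot used to inspect group separation; the cumulative ratio shows how much variance a given number of components retains.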
(1) RF model
A random forest (RF) algorithm was used to classify Taiping Houkui production areas by geographic origin. This method uses bootstrap samples to build an ensemble of decision trees. The number of trees was set to 1000. RF is a tree-ensemble method that is developed from a training data set and validated internally, with the aim of accurately predicting a target variable from predictors. RF creates multiple classification and regression trees (CART) from bootstrap samples of the original training data. It also searches a random subset of features to determine the split points of each growing tree. Importantly, the RF model can rank feature variables by their contribution to classification accuracy.
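A random forest of this kind can be sketched with scikit-learn. The text does not name the RF software, so using `RandomForestClassifier` is an assumption, as are the synthetic data, the two balanced classes and the artificially informative bin at column 10. Only the number of trees (1000) comes from the text. The out-of-bag score is used here as one possible form of internal validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the fused matrix (assumption): 72 samples x 352 bins,
# two classes of 36 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(72, 352))
y = np.repeat([0, 1], 36)
X[y == 1, 10] += 4.0          # make one hypothetical bin carry class information

# 1000 bootstrap-grown trees; each split considers a random subset of features.
rf = RandomForestClassifier(n_estimators=1000, oob_score=True, random_state=0)
rf.fit(X, y)

# Rank bins by their importance (mean decrease in impurity).
ranking = np.argsort(rf.feature_importances_)[::-1]
print("out-of-bag accuracy:", rf.oob_score_)
print("five most important bins:", ranking[:5])
```

In this toy setting the informative column should come out at the top of `ranking`; with real spectra the same ranking is what allows bins to be screened by their contribution to the classification.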
(2) SVM model
SVM is a machine learning technique that maps data from a low-dimensional space to a high-dimensional space and constructs an optimal hyperplane to separate data points of different classes. In this study, the model was built using an SVM algorithm with a linear kernel function and 10-fold cross-validation. Cross-validation helps prevent overfitting when the data set is small and yields a reliable and stable model. The SVM algorithm was run in MATLAB 2018b.
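The original analysis was run in MATLAB 2018b. As a rough equivalent, a linear-kernel SVM with 10-fold cross-validation can be sketched in Python with scikit-learn. The standardization step, the value `C=1.0`, the stratified shuffled folds and the synthetic data are all assumptions made for this example and are not stated in the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 72 samples x 352 bins, two classes,
# with a shift in the first 20 bins of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(72, 352))
y = np.repeat([0, 1], 36)
X[y == 1, :20] += 1.5

# Scaling is fitted inside each training fold, so no test-fold information leaks in.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Each sample is predicted exactly once, by a model that never saw it in training.
y_pred = cross_val_predict(model, X, y, cv=cv)
print("10-fold cross-validated accuracy:", np.mean(y_pred == y))
```

Collecting the held-out predictions in `y_pred` makes it straightforward to build a confusion matrix and derive sensitivity and specificity from the same cross-validation run.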
(3) Model evaluation method
The performance of each established model was evaluated by calculating its accuracy, sensitivity and specificity according to the following formulas. The higher the value, the better the classification performance.
Accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%
Sensitivity = TP/(TP + FN) × 100%
Specificity = TN/(TN + FP) × 100%.
In the formula, TP, FP, TN and FN refer to true positive, false positive, true negative and false negative results, respectively.
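As a worked check of these metrics, the sketch below computes them from the four confusion-matrix counts, using the standard definition of sensitivity, TP/(TP + FN). The function name and the counts are hypothetical and chosen only so the percentages are easy to verify by hand; they are not taken from Table 4.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity in percent from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),   # share of true positives recovered
        "specificity": 100.0 * tn / (tn + fp),   # share of true negatives recovered
    }

# Hypothetical counts for 72 samples, 36 per class (illustration only).
m = classification_metrics(tp=32, fp=5, tn=31, fn=4)
print({k: round(v, 2) for k, v in m.items()})
# {'accuracy': 87.5, 'sensitivity': 88.89, 'specificity': 86.11}
```

Here 63 of 72 samples are classified correctly (87.5%), 32 of 36 positives are found (88.89%) and 31 of 36 negatives are found (86.11%).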
To distinguish TPHK of different origins, principal component analysis (PCA) was performed to visualize group separation and estimate within-group differences. The PCA results showed that the samples were better separated in D2O than in CDCl3 (FIG. 3). This may be because the lipid composition is complex: the signals of the nonpolar extracts overlap readily, making differences between samples hard to find (FIG. 2b). Overall, the relatively small distances between groups blur the group boundaries. The overlap between groups was substantial, which further confirms that the sensory evaluation results had high intra-group differences and low inter-group differences. This is also what makes origin tracing within a narrow area difficult. The unsupervised PCA method provides limited information, whereas supervised methods, which use knowledge of the sample classes, extract more information. Therefore, machine learning was used to analyze the data further. The models ranked by accuracy as follows (Table 4): SVM (CDCl3) > RF (D2O) > RF (CDCl3) > SVM (D2O). Only the SVM (CDCl3) model had an accuracy above 80%; however, its specificity was only 72.22%. This means that a single phase can hardly solve the complex problem of narrow-area traceability.
TABLE 4 accuracy of different models
5. Data fusion
Because the production area of TPHK is narrow, the data from either phase of the two-phase extraction alone hardly reflected the differences between samples. The D2O-phase and CDCl3-phase data were therefore fused, and Pearson correlation analysis was performed on the fused data to investigate the correlation between variables. The Pearson correlation matrix showed little correlation between the polar and nonpolar extracts (|r| < 0.5), so the two phases effectively reflect different characteristics of TPHK (FIG. 4). This favors better results when the fused data are combined with machine learning. PCA was then applied to visualize the structure of the fused polar and nonpolar data (FIG. 5a). The PCA shows that the sample points of the different groups form overlapping clusters. These results indicate that there is significant overlap between groups even when the metabolic profiles are fused. Since the first two PCs account for only 57% of the total variance, the first five PCs, which together explain 78.5% of the variance, were also examined (FIG. 6). However, the overlap between groups did not improve. The metabolites observed by 1H NMR are only one aspect of the information that determines tea origin, and the resolution of the instrument limits what unsupervised PCA can reveal. Interestingly, in supervised learning, fusing the polar and nonpolar data significantly improved accuracy (Table 4). The SVM classifier identified the core production area with an accuracy of 94.00%. Unfortunately, its identification of the other production areas was poor, at only 75.00%. Overall, the RF classifier achieved the best classification rate, 87.50%, with acceptable specificity and sensitivity of 86.11% and 88.89%, respectively (Table 4).
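The Pearson correlation between the two data blocks was computed in MATLAB in the original work. A minimal NumPy sketch of the same idea is given below; the function name `cross_block_correlation` and the random placeholder matrices are assumptions. It returns one correlation coefficient for every pair of a D2O bin and a CDCl3 bin, which is the kind of matrix a heat map such as FIG. 4 would display.

```python
import numpy as np

def cross_block_correlation(A, B):
    """Pearson r between every column of A and every column of B (samples in rows)."""
    Az = (A - A.mean(axis=0)) / A.std(axis=0)   # z-score each column
    Bz = (B - B.mean(axis=0)) / B.std(axis=0)
    return Az.T @ Bz / A.shape[0]               # mean product of z-scores = Pearson r

# Placeholder blocks (assumption): 72 samples x 176 bins per phase.
rng = np.random.default_rng(2)
X_polar = rng.random((72, 176))
X_nonpolar = rng.random((72, 176))

R = cross_block_correlation(X_polar, X_nonpolar)
print(R.shape)                                   # (176, 176)
print("largest |r| between blocks:", np.abs(R).max())
```

If the largest absolute coefficient stays low, the two blocks carry largely independent information, which is the argument made above for fusing them.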
The results show that two-phase extraction is suitable for distinguishing TPHK from different production areas, with accuracy rising from 78% for a single phase to 87.5% for the fused two phases. The fused two-phase data set distinguishes samples better because the spectra obtained from the two phases are complementary, giving a more complete picture of the chemical changes associated with different production areas.
6. Related metabolites
Using the Kruskal-Wallis test (P < 0.05), 61 bins that differed significantly between the two production areas were obtained (FIG. 7). To further determine which bins are most important for distinguishing the core production area from the other production areas, FDR-corrected Kruskal-Wallis testing and RF feature variable screening were performed (FIG. 5b), which excluded 55 bins (Table 5). Of these, the theanine bins at 1.16, 2.2 and 3.8 ppm distinguished the two production areas. Theanine has been reported as a potential marker for distinguishing green tea from three narrow production areas; however, the more rigorous screening here showed that theanine was less important than the more relevant differential metabolites. The 6 most relevant potential metabolic markers were screened (Table 6), and their relative concentrations are summarized in a box plot (FIG. 8). The NMR signal at 3.68 ppm belongs to sucrose, the most abundant carbohydrate in tea, which is partly produced by photosynthesis during growth. It has been previously reported that the sucrose content of black tea samples from different geographical regions and climates varies greatly. The core production area has higher EGCG, ECG, EGC and EC contents (6.08-6.20 ppm and 7.00 ppm) than the other production areas. The synthesis of catechins has been shown to be affected by climate, which is the reason for the significant differences between green teas from three different origins. The bin at 0.84 ppm, which represents all fatty acids (except linolenic acid), contributed very prominently to the classification results. Fatty acids have been used effectively to determine the geographical origin of oils; however, they are often overlooked in tea. During tea processing, fatty acids are converted into saturated and unsaturated C6 and C9 aldehydes and alcohols, which give the tea infusion a delicate fragrance. As aroma precursors, they make a significant contribution to the aroma.
The bins of linolenic acid were 2.04, 2.8, 5.24 and 5.36 ppm (Table 5), and the linolenic acid concentration in the core production area was lower than in the other production areas. During tea processing, linolenic acid is converted into methyl jasmonate, which contributes greatly to the orchid aroma of TPHK. This may be another reason why the core area has a higher aroma score than the other areas, although the linolenic acid bins were removed by FDR correction, so their contribution to classification is smaller than that of the 0.84 ppm bin. It should be noted that some less relevant bins are also useful for classification. The classification is based on the co-existence and interaction of multiple bins. Therefore, we consider the entire spectrum, which is characteristic of a particular TPHK. Owing to the qualitative and quantitative information in the full spectrum, combined with a machine learning algorithm, the production area of Taiping Houkui can be identified.
TABLE 5 Bins differing significantly between the other and core production areas (Kruskal-Wallis test, P < 0.05); 55 bins were excluded by the FDR method
TABLE 6 Bins differing significantly between the other and core production areas; P values were obtained from the Wilcoxon rank sum test and corrected using the FDR method
Bin (δ ppm, 1H) | p-value | -log10(p) | FDR |
3.68 (D2O) | 2.72×10^-6 | 5.5652 | 0.000958 |
7.00 (D2O) | 1.15×10^-5 | 4.939 | 0.002026 |
6.16 (D2O) | 3.85×10^-5 | 4.4147 | 0.004515 |
0.84 (CDCl3) | 0.000243 | 3.6149 | 0.020672 |
6.20 (D2O) | 0.000294 | 3.5322 | 0.020672 |
6.12 (D2O) | 0.000471 | 3.3274 | 0.027607 |
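An FDR column like the one in Table 6 is obtained by adjusting the raw p-values for multiple testing. The sketch below shows the Benjamini-Hochberg adjustment named earlier in this description, written in plain Python. The function name and the four example p-values are hypothetical and serve only to make the arithmetic easy to follow; they are not values from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices from smallest p to largest
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four bins (illustration only).
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
print([round(v, 4) for v in q])                      # [0.02, 0.04, 0.04, 0.02]

# Bins retained at FDR < 0.05.
kept = [i for i, v in enumerate(q) if v < 0.05]
print(kept)                                          # [0, 1, 2, 3]
```

Because of the running minimum, two bins with different raw p-values can share the same adjusted value, as the 0.84 ppm and 6.20 ppm rows of Table 6 do.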
7. Model comparison
In this study, we used 1H NMR fingerprinting combined with machine learning to trace narrow green tea production areas. One key finding is that polar extracts of tea leaves have limited power for tracing narrow origins. We found that nonpolar extracts are also very important for classification. Importantly, fusing polar and nonpolar extracts can significantly improve classification accuracy.
Seeger proposed an NMR metabolomics approach that uses polar and nonpolar metabolites to distinguish black tea from green tea simply and rapidly. However, black tea and green tea can be distinguished simply by eye. In addition, the metabolites of black and green tea are very different and can be distinguished with a single extraction of the polar fraction. For narrow production areas, the metabolites are very similar. When TPHK from the two narrow production areas was distinguished using the polar extract alone, the accuracy was only 76.39%. With the two-phase extraction method, data for both phases are obtained in a single extraction, and the fused two-phase data give satisfactory accuracy (87.50%). Previous studies that used NMR to obtain fingerprints of polar and nonpolar metabolites separately found that the fused data were less accurate than a single metabolite class, meaning that only one type of metabolite is needed to classify geographically distant origins, while adding other variables introduces redundancy. In contrast to these results, we found that fusing the polar and nonpolar metabolites of tea leaves did not introduce redundant data (FIG. 4, |r| < 0.5) and that, combined with machine learning, it significantly improved classification accuracy. Furthermore, we used a two-phase extraction to obtain all metabolites at once, which is faster than obtaining polar and nonpolar metabolites separately. The comprehensive 1H NMR fingerprint of tea leaves obtained by this simple and efficient two-phase extraction has great application potential for quality control of tea products because it requires little time and a minimal sample size.
The foregoing describes only preferred embodiments of the present invention and is illustrative rather than limiting. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. A method for identifying the source tracing of a tea producing area through a two-phase extraction NMR spectrum is characterized by comprising the following steps:
S1: crushing a tea sample, freeze-drying it, and collecting a nuclear magnetic resonance spectrum;
S2: preprocessing the chemical shift region of interest of the NMR spectrum with MestReNova software, using sum normalization;
S3: performing principal component analysis on the spectral data to reduce dimensionality and visualize the results;
S4: importing the spectral data into models, calculating the accuracy to evaluate each model, and selecting the model with the highest accuracy.
2. The method for identifying the source tracing of tea production places through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the crushed tea sample in the step S1 is subjected to ultrasonic treatment and centrifugation, and the nuclear magnetic resonance spectrum is obtained with a 600 MHz NMR spectrometer at a temperature of 298 K.
3. The method for identifying the source tracing of tea leaf origin by means of two-phase extraction NMR spectrum according to claim 1, wherein the regions of interest in the step S2 are D2O: 0.6-8.12 ppm, excluding the chemical shifts of 4.52-5.0 ppm, and CDCl3: 0.48-7.60 ppm, excluding the chemical shifts of 7.2-7.28 ppm.
4. The method for identifying the tea leaf origin tracing through the biphase extraction NMR spectrum according to claim 1, wherein the principal component analysis in the step S3 uses PCA to reduce the dimension of the data, so as to visualize the data.
5. The method for identifying the tea leaf origin tracing through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the specific process of importing the spectral data into the model in the step S4 is as follows: for the spectral data of the aqueous phase, the region 0.6-8.12 ppm is selected, excluding the chemical shifts of 4.52-5.0 ppm, and the spectrum is segmented into bins of 0.04 ppm, giving a total of 176 variables; for the spectral data of the chloroform phase, the region 0.48-7.60 ppm is selected, excluding the chemical shifts of 7.2-7.28 ppm, and the spectrum is segmented into bins of 0.04 ppm, giving a total of 176 variables; and the NMR spectral data of the aqueous phase and the chloroform phase are directly combined by low-level data fusion and imported into a random forest (RF) model.
6. The method for identifying the source tracing of tea leaf origin by means of two-phase extraction NMR spectrum according to claim 1, wherein the accuracy of the calculation in the step S4 is represented by the formula:
accuracy = (TP + TN)/(TP + TN + FP + FN) × 100%
Wherein, TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
7. The method for identifying the tea leaf origin tracing through the two-phase extraction NMR spectrum as claimed in claim 1, wherein the sensitivity and specificity of the model are also evaluated in the step S4, and the calculation formulas of the sensitivity and specificity are respectively as follows:
Sensitivity = TP/(TP + FN) × 100%
Specificity = TN/(TN + FP) × 100%
Wherein, TP, FP, TN and FN are true positive, false positive, true negative and false negative results respectively.
8. Use of the method for identifying the origin of tea leaves by two-phase extraction NMR spectroscopy as defined in any one of claims 1 to 7 for identifying the origin of Taiping Houkui.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210495061.4A CN115165950B (en) | 2022-05-07 | 2022-05-07 | Method for identifying tea production place tracing through double-phase extraction NMR spectrum and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115165950A | 2022-10-11 |
CN115165950B | 2024-06-04 |
Family
ID=83483634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210495061.4A Active CN115165950B (en) | 2022-05-07 | 2022-05-07 | Method for identifying tea production place tracing through double-phase extraction NMR spectrum and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115165950B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116485418A (en) * | 2023-06-21 | 2023-07-25 | 福建基茶生物科技有限公司 | Tracing method and system for tea refining production |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005008256A2 (en) * | 2003-07-10 | 2005-01-27 | Barry Callebaut A.G. | Method of determining the geographical origin of cocoa beans and derivative products thereof |
KR100905414B1 (en) * | 2008-06-25 | 2009-07-02 | 대한민국 | Origin discrimination method of herbal medicine |
CN108931548A (en) * | 2018-06-06 | 2018-12-04 | 厦门大学 | A method of tea-leaf producing area difference is identified by purifying displacement study H NMR spectroscopy |
CN109001306A (en) * | 2018-06-01 | 2018-12-14 | 南昌大学 | The prediction technique of squalene and sterol index in a kind of tea oil |
CN111272931A (en) * | 2020-02-17 | 2020-06-12 | 江苏一片叶高新科技有限公司 | Method for tracing origin of tea |
Non-Patent Citations (3)
Title |
---|
刘艳丽等: "茶树铝、氟富集研究进展", 《植物科学学报》, 27 December 2016 (2016-12-27) * |
袁玉伟;胡桂仙;邵圣枝;张永志;张玉;朱加虹;杨桂玲;张志恒;: "茶叶产地溯源与鉴别检测技术研究进展", 核农学报, no. 04, 27 April 2013 (2013-04-27) * |
金戈等: "Tracing the origin of taiping houkui green tea using 1H NMR and HS-SPME-GC-MS chemical fingerprints, data fusion and chemometrics", 《FOOD CHEMISTRY》, 1 June 2023 (2023-06-01) * |
Also Published As
Publication number | Publication date |
---|---|
CN115165950B (en) | 2024-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||