CN111220754A - Ginseng recognition platform and ginseng recognition method using same - Google Patents
- Publication number: CN111220754A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G01N30/88: Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
- G01N30/02: Column chromatography
- G01N30/72: Mass spectrometers (under G01N30/62: Detectors specially adapted therefor)
- G01N2030/8809: Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
- All of the above fall under G: PHYSICS › G01: MEASURING; TESTING › G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES › G01N30/00: Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
Abstract
The application discloses a ginseng identification platform and a method for identifying ginseng using the platform. The platform comprises a known sample information database module, an unknown sample information database module, a known sample chromatogram-mass spectrum image module, an unknown sample chromatogram-mass spectrum image module and an unknown sample identification module. The generated chromatogram-mass spectrum data image of the unknown sample is compared with that of the known sample to determine whether the chromatogram-mass spectrum data of the unknown sample match those of the known sample, thereby identifying the unknown sample. The invention applies traditional Chinese medicine chromatography-mass spectrometry high-dimensional image technology, which can comprehensively characterize the spatial information among the large number of compounds in a ginseng sample and uses this spatial information to match and identify unknown samples. The approach is rapid, high-throughput, highly precise and highly reliable.
Description
Technical Field
The application relates to the technical field of traditional Chinese medicine detection, and in particular to a ginseng identification platform and a ginseng identification method.
Background
The compound composition of complex samples is extremely complicated. Traditional Chinese medicine (TCM) is a typical complex sample: its components are extremely complex and diverse in both structure and class. Common classes include phenols, alkaloids, saponins, terpenoids, flavonoids, lactones, anthrones, organic acids and tannins. A single TCM contains hundreds to thousands of secondary metabolites and small-molecule components, and a TCM compound preparation combining several TCMs contains even more. Complex samples therefore carry a large amount of information, which bears on scientific questions such as the interrelations among the compounds of Chinese herbal medicines, the differences in drug properties and effects among different herbs, the differences in chemical composition within the same herb, and the influence of production area, year and growth environment on herb quality.
Current research on complex samples faces two important bottlenecks. First, most studies use fragmented, point-wise low-dimensional data, such as chromatographic retention time, m/z values and product-ion fragment information; such data cannot reflect the correlations among the large number of chemical components, so these correlations are neglected. High-dimensional data is a powerful carrier of massive amounts of information: compared with low-dimensional data, it can effectively represent the spatial information of the data points in a sample and thus reflect their spatial relationships. It is therefore desirable to acquire, process and mine high-dimensional data on the compounds in complex samples. Second, the data generated by experiments are huge in volume but scattered, and the data produced by related studies cannot be integrated and reused, so scientific research consumes considerable manpower, materials and time while yielding limited output. Database technology is a means of computer-aided data management and integration, and combining high-dimensional data with database technology to build a high-dimensional data database is a way to solve the above problems.
Acquiring high-dimensional data requires hyphenated instruments. Chromatography-mass spectrometry combines chromatography, a separation method with an extremely wide range of applications, with mass spectrometry, which is sensitive and specific and provides molecular weight and structural information; it is therefore an ideal means of acquiring high-dimensional data from complex samples. Existing databases based on chromatography-mass spectrometry can be roughly divided into two types:
1. Standard compound mass spectral databases: for example, the NIST mass spectral library of standard compounds published by the U.S. National Institute of Standards and Technology (NIST) records tens of thousands of reference spectra and has played a major role in metabolomics research on GC-MS platforms; another example is the Human Metabolome Database (HMDB), currently the most complete and comprehensive database of human metabolites and human metabolism. Such databases are widely used in many research areas. However, they provide only a limited number of compounds and no chromatographic retention information for them. Zhang Jiayuan et al. (Acta Pharmaceutica Sinica, 2012, 47(9): 1187-1192) used high-performance liquid chromatography-electrospray ionization ion trap tandem mass spectrometry (HPLC-ESI-IT-MS/MS), with the library editor of a commercial workstation as the platform, to build a liquid chromatography-mass spectrometry database (LC-MS-DS) of 636 natural compounds (covering common classes of natural products such as flavones, coumarins, lignans, terpenes and their glycosides, steroids and their glycosides, organic acids, alkaloids, anthraquinones and amino acids) for the identification and targeted isolation of unknown natural-product components. This is a standard-compound spectral database: the reliability of a library search can be assessed by matching the retention time and UV absorption spectrum of an unknown component with those of a reference substance, or by checking whether the main fragment ions in their multi-stage mass spectra are the same, which improves the reliability of the result. However, such a database can only be used to identify compounds, not biological samples such as natural products.
2. Compound information libraries: the UNIFI traditional Chinese medicine database released by Waters Corporation covers all herbs listed in the 2010 edition of the Chinese Pharmacopoeia and information on thousands of related compounds (the main compounds reported in the literature). To use it, chromatography-mass spectrometry data of the traditional Chinese medicine to be tested are acquired by ultra-performance liquid chromatography (UPLC) with quadrupole time-of-flight mass spectrometry (QTOF MS); molecular formulas are inferred from accurate masses and matched against the compound structures in the database, and theoretical fragments calculated by software are matched against the acquired product ions for confirmation. The database's advantage is that it integrates all herbs and main compounds in the 2010 edition of the Chinese Pharmacopoeia, on a scale of thousands of compounds; compared with standard-compound spectral databases, whose size is limited by the availability of reference standards, it clearly makes scaling up the number of compounds feasible. However, the database does not contain actual chromatography-mass spectrometry data for each compound: compounds are identified only by inferring molecular formulas from accurate masses measured by high-resolution mass spectrometry, with reliability improved by matching theoretically calculated product ions. Although high-resolution mass spectrometry gives accurate molecular weights from which possible molecular formulas can be predicted, a single molecular formula can correspond to many candidate structures; moreover, although the database contains thousands of compounds in total, each herb averages only a few dozen, most of which are common high-abundance compounds.
The chemical components of traditional Chinese medicines are typically complex and diverse; each traditional Chinese medicine may contain hundreds of components, while the database may cover only a small fraction of those present in the medicine being tested, so its ability to identify medium- and low-abundance components is very limited. In addition, theoretical calculation of product ions is still immature and of limited accuracy, so match results may deviate and cause false positives or false negatives. The database also has compatibility limitations: it works only with the Waters workstation system. CellTie et al. invented a database construction method for mass spectrometry data analysis of natural products (application No. 201510443268.7). The method downloads all related compounds from the PubChem, CA or Reaxys compound databases, performs in silico fragmentation of the compounds based on fragmentation rules to obtain their fragments, records the related information on the compounds and fragments, and then builds the database. Compared with the UNIFI traditional Chinese medicine database, this method offers a richer set of compounds, and because its fragmentation rules are based on those reported in the literature and compound identification is completed with in silico fragmentation, the reliability of the results is relatively higher. Like the UNIFI database, however, its data are based only on compound structure information, with no actual spectra of the compounds; furthermore, different instruments and parameters strongly affect the fragmentation behavior of compounds, and the adaptability of the database to data from different sources (instruments, experimental conditions, etc.) is unclear.
Existing chromatography-mass spectrometry databases are compound-centered: they focus on single-dimension features of the data, store only part of the multidimensional data, and do not convert it into high-dimensional data for integrated use. The traditional Chinese medicine chromatogram-mass spectrum high-dimensional image database established by the invention is centered on the traditional Chinese medicine itself and contains both the overall information of each medicine and the single-point information of its compounds. It can support many kinds of research, such as the identification, classification, quality control and in-depth data mining of Chinese medicines.
It should be noted that the Chinese medicine identification method of the invention can be applied to data obtained under identical or similar sample analysis conditions, which greatly improves its applicability.
Disclosure of Invention
To solve the problems in the prior art, one aspect of the present invention provides a ginseng identification platform comprising the following modules:
a known sample information database module, an unknown sample information database module, a known sample chromatogram-mass spectrum image module, an unknown sample chromatogram-mass spectrum image module and an unknown sample identification module;
the known sample information database module transmits chromatogram-mass spectrum data of a known sample to the known sample chromatogram-mass spectrum image module, and the known sample chromatogram-mass spectrum image module outputs a first data image;
the unknown sample information database module transmits the chromatogram-mass spectrum data of the unknown sample to the unknown sample chromatogram-mass spectrum image module, and the unknown sample chromatogram-mass spectrum image module outputs a second data image;
the unknown sample identification module records the sample information of the known sample and the generated first data image, and compares the generated second data image with the first data image to determine whether the chromatogram-mass spectrum data of the unknown sample match those of the known sample.
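The comparison performed by the unknown sample identification module can be sketched in Python. This is a minimal illustration under assumptions not stated in the source: each data image is a grid of pixel intensities of identical shape, similarity is measured by cosine similarity, and candidates below a hypothetical threshold of 0.8 are discarded. The class and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[list[float]], b: list[list[float]]) -> float:
    """Cosine similarity of two equally sized images, each flattened to a vector."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    if len(fa) != len(fb):
        raise ValueError("images must have the same shape")
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
    return dot / norm if norm else 0.0


@dataclass
class KnownSampleRecord:
    info: str                 # sample information (species, origin, part, ...)
    image: list[list[float]]  # the "first data image" of this known sample


@dataclass
class IdentificationModule:
    records: list[KnownSampleRecord] = field(default_factory=list)

    def register(self, record: KnownSampleRecord) -> None:
        """Record a known sample together with its generated image."""
        self.records.append(record)

    def compare(self, unknown_image, threshold=0.8):
        """Return (info, score) for known samples scoring >= threshold, best first."""
        scored = [(r.info, cosine_similarity(unknown_image, r.image)) for r in self.records]
        return sorted((s for s in scored if s[1] >= threshold),
                      key=lambda s: s[1], reverse=True)
```

Cosine similarity ignores overall scale, so an extract that is uniformly more concentrated still matches; any other image-similarity measure could be substituted without changing the module structure.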
In a preferred embodiment, the chromatography-mass spectrometry data of the known sample comprises raw chromatography-mass spectrometry information of the known sample and the chromatography-mass spectrometry data of the unknown sample comprises raw chromatography-mass spectrometry information of the unknown sample.
In a preferred embodiment, the chromatography-mass spectrometry data of the known sample further comprises high dimensional data of each compound in the known sample, and the chromatography-mass spectrometry data of the unknown sample further comprises high dimensional data of each compound in the unknown sample.
The high-dimensional data expresses spatial information among data points in the sample, and the spatial information is a matrix formed by at least one of the following information: distance information between data points; angular relationship information between data points; coordinate position information of the data points; density information of the data points; edge range information for the set of data points; intensity information of the data points.
Preferably, the distance information between data points comprises at least one of chromatographic retention time t, m/z value, m value, z value, peak intensity I.
Preferably, the intensity information of a data point is represented by at least one of the size and the brightness of the data point.
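One way to turn a list of detected ions into the spatial-information matrices listed above is sketched below. It assumes each data point is a (t, m/z, I) tuple and that t and m/z are rescaled by user-chosen factors before distances are taken; the scale factors, the neighbourhood radius used for density, and the function name are hypothetical choices, not values from the source.

```python
import math


def spatial_features(points, t_scale=1.0, mz_scale=1.0, radius=1.0):
    """points: list of (t, m/z, I). Returns several spatial-information matrices/vectors."""
    n = len(points)
    # Coordinate positions after rescaling, so t (s) and m/z are comparable (assumed scales).
    coords = [(t / t_scale, mz / mz_scale) for t, mz, _ in points]
    # Distance information: pairwise Euclidean distances in the (t, m/z) plane.
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    # Angular relationship: direction (radians) of the vector from point i to point j.
    angle = [[math.atan2(coords[j][1] - coords[i][1], coords[j][0] - coords[i][0])
              for j in range(n)] for i in range(n)]
    # Density information: number of other points within `radius` of each point.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius) for i in range(n)]
    # Edge range of the point set: bounding box (t_min, t_max, mz_min, mz_max), unscaled.
    ts = [p[0] for p in points]
    mzs = [p[1] for p in points]
    edge = (min(ts), max(ts), min(mzs), max(mzs))
    # Intensity information: peak intensity of each point.
    intensity = [p[2] for p in points]
    return {"coords": coords, "distance": dist, "angle": angle,
            "density": density, "edge": edge, "intensity": intensity}
```

Stacking these arrays gives one concrete form of the "matrix" of spatial information; the choice of scales decides how much a retention-time gap weighs against an m/z gap.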
Preferably, the high-dimensional data may be stored as a table file or a text file; further preferably, the table file is one or more of .xls, .xlsx, .csv and .xml, and the text file is at least one of .docx, .txt and .rtf.
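A minimal sketch of storing point data as a .csv table with Python's standard csv module follows. The column names are hypothetical, and io.StringIO stands in for a real file opened with open(path, "w", newline="").

```python
import csv
import io

FIELDS = ["t_s", "mz", "intensity"]  # hypothetical column names


def write_points_csv(points, stream):
    """Write (t, m/z, intensity) rows under a header line."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    writer.writerows(points)


def read_points_csv(stream):
    """Read the rows back as tuples of floats."""
    return [tuple(float(row[f]) for f in FIELDS) for row in csv.DictReader(stream)]


if __name__ == "__main__":
    buf = io.StringIO()  # in-memory stand-in for a .csv file
    write_points_csv([(1.5, 301.2, 1000.0)], buf)  # illustrative values only
    buf.seek(0)
    print(read_points_csv(buf))
```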
In a preferred embodiment, the high-dimensional data image generated by the high-dimensional data includes at least one of an original image generated by the high-dimensional data, an image generated based on image features, an image generated by performing conversion processing on the image, and an image constructed by using a function.
Preferably, the image features comprise point clusters of data points, common particles and sample contours.
Preferably, the image conversion processing includes at least one of blurring the image and rendering the image at different resolutions.
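The sketch below shows one possible reading of an image generated from the high-dimensional data and of the two conversions named above. It assumes a fixed (t, m/z) window rendered on a grid with summed intensity as brightness; unlike the one-to-one ion-to-point mapping of the method, this binning can merge ions that fall in the same pixel. The 3x3 box blur and block-averaging downsample are generic stand-ins chosen for illustration, and all names are hypothetical.

```python
def rasterize(points, t_range, mz_range, width, height):
    """Bin (t, m/z, intensity) points into a height x width grid (rows = m/z, cols = t)."""
    t0, t1 = t_range
    m0, m1 = mz_range
    img = [[0.0] * width for _ in range(height)]
    for t, mz, inten in points:
        if not (t0 <= t < t1 and m0 <= mz < m1):
            continue  # point lies outside the imaged window
        col = min(int((t - t0) / (t1 - t0) * width), width - 1)
        row = min(int((mz - m0) / (m1 - m0) * height), height - 1)
        img[row][col] += inten  # brightness encodes intensity
    return img


def box_blur(img):
    """3x3 mean filter; edge pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out


def downsample(img, factor):
    """Lower the resolution by averaging factor x factor blocks (leftover rows/cols dropped)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(w)]
            for r in range(h)]
```

Blurring and downsampling both make the comparison tolerant of small retention-time or m/z shifts, at the cost of fine detail.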
Preferably, the function takes as variables at least one of chromatographic retention time t, m/z, m and I.
Preferably, the high-dimensional image is an image of more than two dimensions;
preferably, the image file may be stored in any image file format.
In a preferred embodiment, the known sample comprises at least one of a standard substance and a known Chinese medicine sample.
Preferably, the standard substance comprises at least one of the traditional Chinese medicine reference substances, traditional Chinese medicine marker components and main chemical components of traditional Chinese medicines listed in the 2015 edition of the Chinese Pharmacopoeia.
Preferably, the known traditional Chinese medicine sample is a sample with definite category information, and the category information comprises at least one of the species, the origin, the part and the processing mode of the sample;
Preferably, the known traditional Chinese medicine sample comprises at least one of traditional Chinese medicine raw material, decoction pieces and powder. Further preferably, the known Chinese medicine sample includes at least one of the different parts of a Chinese medicine and their processed products.
In a preferred embodiment, the unknown sample identification module comprises an image segmentation tool or a clustering tool.
In a preferred embodiment, the databases in the database modules of the ginseng identification platform provided by the present invention include at least one of a folder data set, a web page database, a database based on a commercial workstation and a database based on a user-developed workstation.
Preferably, the database format includes at least one of text, Excel, Oracle, MySQL, split or Microsoft SQL Server.
In another aspect of the present invention, there is provided a method for recognizing ginseng using a ginseng recognition platform, the method at least comprising the steps of:
1) acquiring raw chromatography-mass spectrometry data of the known sample and the unknown sample by chromatography and mass spectrometry;
2) generating chromatogram-mass spectrum high-dimensional data of a known sample and an unknown sample, wherein the chromatogram-mass spectrum high-dimensional data expresses spatial information among data points;
3) generating chromatogram-mass spectrum high-dimensional data images of the known sample and the unknown sample, in which each ion in the high-dimensional data corresponds one-to-one to a point in the image, each point has its own coordinate information, and the intensity of each point is represented by its size and/or brightness, so that the points in the image correspond one-to-one to the high-dimensional data;
4) dividing the points in the chromatogram-mass spectrum high-dimensional image of the unknown sample into n point clusters (n being an integer ≥ 1) with an image segmentation tool or a clustering tool, and scanning and matching the point-cluster-extracted high-dimensional image of the unknown sample against the chromatogram-mass spectrum high-dimensional image of each known sample, one by one;
5) ranking the known samples that match the unknown sample by degree of match, and then, in order of that ranking, searching the raw chromatogram-mass spectrum data and/or high-dimensional data of the unknown sample for the marker compounds of each known sample, each known sample having at least one marker compound. When the marker compounds of a known sample are found in the unknown sample, the unknown sample is identified as that known sample and the search stops; if the marker compounds of the first-ranked known sample are not found in the unknown sample, the marker compounds of the second-ranked known sample are searched for, and so on until a marker compound is found. If none of the marker compounds of the matched known samples is found in the unknown sample, the established database is considered not to contain the unknown sample.
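Step 5) can be sketched as a loop over the ranked candidates. The sketch assumes that a marker compound counts as retrieved when some point in the unknown sample lies within a retention-time tolerance and an m/z tolerance of the marker, and that all markers of a candidate must be retrieved before it is accepted; the source fixes neither rule, and the tolerances and function names are hypothetical.

```python
def marker_found(unknown, marker, t_tol, mz_tol):
    """True if some unknown-sample point (t, m/z, I) lies within tolerance of the marker's (t, m/z)."""
    t_m, mz_m = marker
    return any(abs(t - t_m) <= t_tol and abs(mz - mz_m) <= mz_tol for t, mz, _ in unknown)


def identify(ranked, markers, unknown, t_tol, mz_tol):
    """ranked: [(known_name, match_score), ...], best match first.
    markers: known_name -> [(t, m/z), ...] marker compounds (at least one each).
    Returns the accepted known sample name, or None if no candidate is confirmed."""
    for name, _score in ranked:
        sample_markers = markers.get(name, [])
        # Assumption: every marker of the candidate must be retrieved, not just one.
        if sample_markers and all(marker_found(unknown, m, t_tol, mz_tol)
                                  for m in sample_markers):
            return name  # accept the unknown sample as this known sample; stop searching
    return None  # the database is considered not to contain the unknown sample
```

Checking marker compounds only after image matching keeps the expensive raw-data search limited to a short, ordered list of candidates.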
In a preferred embodiment, the coordinate information includes at least one of: distance information between data points, angular relationship information between data points, coordinate positions of data points, density of data points, edge range of a set of data points, and intensity of data points.
In a preferred embodiment, a point cluster is a set of spatially close data points and contains at least 3 data points.
Preferably, each point cluster has its own center point.
Preferably, the point clusters may be of any shape.
In a preferred embodiment, raw chromatography-mass spectrometry data of the known and unknown samples are obtained as follows:
separating the mixed molecules in the known and unknown samples by selective interaction, using a chromatograph and an ion mobility spectrometer, to obtain chromatographic retention time information t;
separating and detecting the compounds in a sample according to their mass-to-charge ratios, using the electromagnetic field of a mass spectrometer, to obtain mass-to-charge ratio information m/z;
analyzing the sample extract with a chromatography-mass spectrometer to obtain raw chromatography-mass spectrometry data.
In a preferred embodiment, the chromatographic separation time t ranges from 1 to 10000 s and the ion m/z scan range is 50 to 10000 Da.
In a preferred embodiment, the method may further comprise subjecting the acquired raw chromatography-mass spectrometry data to at least one of retention time correction, filtering and normalization.
In a preferred embodiment, the method may further comprise using quality control samples and mixed standards with internal standards.
Preferably, the quality control sample comprises at least one of: a known sample or a mixture of known samples, an unknown sample or a mixture of unknown samples, and a mixture of two or more standards, and is used to evaluate data quality.
Preferably, when mixed standards are used, internal standards may be added to them to improve the reproducibility of the assay and to perform retention time correction.
In a preferred embodiment, the unknown sample is at least one of a raw herb, a decoction piece, a powder, a preparation, a different part of a herb, and a processed product thereof.
Preferably, the preparation comprises traditional Chinese medicine granules or traditional Chinese medicine injection.
The beneficial effects of this application include:
1) The ginseng recognition platform established by the invention includes a traditional Chinese medicine chromatogram-mass spectrum high-dimensional image database that is organized around the medicinal materials themselves and contains both holistic information on each medicine and point-level information on its individual compounds. The platform can therefore reveal correlations among the many complex components of a traditional Chinese medicine and comprehensively characterize the spatial relationships among the large number of compounds in a sample.
2) The ginseng chromatogram-mass spectrum high-dimensional image database can be used for various researches such as identification, classification, quality control and deep data mining of ginseng.
3) The ginseng identification method is applicable to data obtained under identical or similar analytical conditions, which greatly broadens its applicability.
4) By using the spatial information of samples to match and identify known and unknown samples, the ginseng identification method provided by the invention is rapid, high-throughput, precise and reliable.
Drawings
Fig. 1 is a schematic diagram illustrating the inventive concept.
Detailed Description
The present application will be described in detail with reference to examples, but the present application is not limited to these examples.
The relevant terms are uniformly interpreted as follows:
In the present application, "high-dimensional" refers to two or more dimensions, and "low-dimensional" refers to one dimension.
"Common ions" are the same component (identical retention time and m/z) appearing in the high-dimensional images of the same or different samples.
"sample contours" refer to contours of a high-dimensional image produced by a sample.
A schematic diagram of the inventive concept is shown in fig. 1.
1. Establishing a Chinese medicine chromatogram-mass spectrum high-dimensional image database:
1) acquiring and processing raw chromatography-mass spectrometry (X-MS) data of known traditional Chinese medicine samples in the known sample information database module 20: raw X-MS data of the known samples are acquired by chromatography-mass spectrometry, imported into peak extraction software such as Progenesis QI, and processed;
2) generating high-dimensional data 200 of the known traditional Chinese medicine samples and generating high-dimensional data images in the known sample chromatogram-mass spectrum image module 22: the m/z, t, I, m and z values of each compound in a sample are obtained and assembled into a high-dimensional data matrix (for example an m/z-t-I, m-z-t-I or m-t-I matrix), yielding the chromatography-mass spectrometry high-dimensional data 200 of the known sample; the high-dimensional data 200 are imported into image generation software such as Matlab to generate a first data image 220. Each ion in the high-dimensional data corresponds one-to-one to a point in the image, each point has its own coordinates (for example t and m/z, or m and z), and the intensity of each point is represented by its size and/or brightness;
3) establishing the chromatogram-mass spectrum high-dimensional image database of known traditional Chinese medicine samples: the high-dimensional data images obtained for one or more classes of known samples, each class containing one or more samples, form the traditional Chinese medicine chromatogram-mass spectrum high-dimensional image database, which contains the sample information, raw X-MS data, high-dimensional data and high-dimensional image data of the known samples;
2. Rapid identification of ginseng:
1) acquiring high-dimensional image data 400 of the unknown sample: using the same or similar operating parameters and conditions as in step 1, the unknown sample is analyzed following steps 1) to 2) of step 1 to obtain its raw X-MS data and high-dimensional data, and image generation software is then used to produce the X-MS second data image 420 of the unknown sample;
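Step 2) above turns each row of the m/z-t-I matrix into one image point whose size or brightness encodes intensity. The Python sketch below illustrates that one-to-one mapping under stated assumptions: the tuple layout `(m/z, t, I)`, the hypothetical `ImagePoint` type and the linear scaling of brightness and marker size to the most intense ion are illustrative choices, not the Matlab procedure used in the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical row layout of the m/z-t-I matrix: (m/z, retention time t in s, intensity I).
Ion = Tuple[float, float, float]


@dataclass(frozen=True)
class ImagePoint:
    mz: float          # horizontal coordinate
    t: float           # vertical coordinate
    brightness: float  # intensity scaled to [0, 1]
    size: float        # marker area, also proportional to intensity


def to_image_points(matrix: List[Ion], max_size: float = 50.0) -> List[ImagePoint]:
    """Map every ion to exactly one image point (one-to-one correspondence)."""
    if not matrix:
        return []
    i_max = max(intensity for _, _, intensity in matrix)
    points = []
    for mz, t, intensity in matrix:
        # Assumed linear scaling relative to the most intense ion.
        scale = intensity / i_max if i_max > 0 else 0.0
        points.append(ImagePoint(mz=mz, t=t, brightness=scale, size=max_size * scale))
    return points
```

Because no two ions are merged, the image keeps the one-to-one correspondence with the high-dimensional data that the method relies on for later matching.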
2) identifying the unknown sample in the unknown sample identification module 60;
A. The points in the X-MS high-dimensional image of the unknown sample are divided into n point clusters (n an integer ≥ 1) using an image segmentation tool, such as the segmentation functions built into Matlab2016b for machine learning, or a clustering tool such as K-Means, DBSCAN or Fanny;
a point cluster is a set of points that are close to one another in space and contains at least 3 points;
each point cluster may have a center point, and a point cluster may take any shape;
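Sub-step A leaves the choice of segmentation or clustering tool open. As one hedged illustration, the sketch below implements a simplified DBSCAN (one of the tools named above) in plain Python, using a rectangular neighborhood with half-widths `eps_t` and `eps_mz` and a minimum of 3 points per cluster, and then computes each cluster's center as the mean of its points. The rectangular neighborhood, the parameter names and the centroid definition are assumptions; the worked example in this document uses Clusterdp instead.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (t, m/z)


def neighbors(points: List[Point], idx: int, eps_t: float, eps_mz: float) -> List[int]:
    """Indices of points inside the rectangular window around points[idx] (itself included)."""
    t0, mz0 = points[idx]
    return [j for j, (t, mz) in enumerate(points)
            if abs(t - t0) <= eps_t and abs(mz - mz0) <= eps_mz]


def dbscan(points: List[Point], eps_t: float, eps_mz: float, min_pts: int = 3) -> List[int]:
    """Simplified DBSCAN: one label per point, -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps_t, eps_mz)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(points, j, eps_t, eps_mz)
            if len(more) >= min_pts:  # core point: keep growing the cluster
                queue.extend(more)
    return [lab if lab is not None else -1 for lab in labels]


def cluster_centers(points: List[Point], labels: List[int]) -> Dict[int, Point]:
    """Geometric center (mean t, mean m/z) of each non-noise cluster."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(p)
    return {lab: (sum(t for t, _ in g) / len(g), sum(mz for _, mz in g) / len(g))
            for lab, g in groups.items()}
```

Clusters found this way have arbitrary shape, which is consistent with the statement above that a point cluster may take any shape.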
B. The point-clustered X-MS second data image 420 of the unknown sample is scanned and matched, one by one, against the X-MS first data image 220 of each known traditional Chinese medicine sample in the traditional Chinese medicine X-MS high-dimensional image database;
during scanning, the origins, t axes and m/z (or m) axes of the two X-MS high-dimensional images are aligned;
during scanning, each point cluster moves as a whole within the range 0 to T_k, where T_k is the maximum analysis time of the known traditional Chinese medicine sample;
during scanning, each point cluster of the unknown sample keeps its m/z (or m) positions and geometric shape and is moved along the time axis (t);
scanning searches for common points, accurately matched in both t and m/z (or m), between each point cluster of the unknown sample and the X-MS high-dimensional image of the known sample. When a point of an unknown-sample point cluster is matched against a point of the known sample's X-MS high-dimensional image, the allowed absolute t deviation of each point (t tolerance) is ≥ T, where T is the sum of the mean absolute retention-time deviation of the chromatograph during acquisition of the unknown sample's X-MS data and that during acquisition of the known sample's X-MS data (each of which can be determined by repeatedly measuring one or more standards, or one or more compounds in a given sample);
similarly, the allowed absolute m/z (or m) measurement error of each point [m/z (or m) tolerance] is ≥ A, where A is the sum of the mean absolute mass deviations of the mass spectrometer during acquisition of the unknown and known samples' X-MS data (each of which can be determined by repeatedly measuring the calibration solution used by the instrument);
a point of an unknown-sample point cluster is considered to meet the matching requirement when it lies within both the t tolerance and the m/z (or m) tolerance of some point of the known sample;
during scanning, the step size of the point cluster along the time axis (t) is ≤ T; normally, 0 s < T < 10000 s;
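Sub-step B defines the tolerance floors T and A as sums of mean absolute deviations from the unknown-sample and known-sample acquisitions. The Python sketch below computes them from replicate measurements. It assumes, because the text does not say, that a retention-time deviation is measured from each compound's mean over its replicates and that a mass deviation is measured from the calibrant's theoretical m/z; all function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def mean_rt_deviation(replicates_by_compound: Dict[str, Sequence[float]]) -> float:
    """Mean |t - mean(t)| over all replicate injections of one or more compounds."""
    deviations = []
    for replicates in replicates_by_compound.values():
        center = mean(replicates)
        deviations.extend(abs(t - center) for t in replicates)
    return mean(deviations)


def mean_mass_deviation(measured: Sequence[float], theoretical: Sequence[float]) -> float:
    """Mean |measured - theoretical| m/z over repeated calibrant measurements."""
    return mean(abs(m - th) for m, th in zip(measured, theoretical))


def tolerance_floors(
    rt_unknown: Dict[str, Sequence[float]],
    rt_known: Dict[str, Sequence[float]],
    mass_unknown: Tuple[Sequence[float], Sequence[float]],
    mass_known: Tuple[Sequence[float], Sequence[float]],
) -> Tuple[float, float]:
    """Return (T, A): each is the unknown-run deviation plus the known-run deviation."""
    t_floor = mean_rt_deviation(rt_unknown) + mean_rt_deviation(rt_known)
    a_floor = mean_mass_deviation(*mass_unknown) + mean_mass_deviation(*mass_known)
    return t_floor, a_floor
```

The tolerances actually applied during matching are then chosen to be at least T and A, and the scan step along t is chosen to be at most T.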
C. At each position of a point cluster along the t axis of the known sample's X-MS high-dimensional image, the number of matched points, the coordinates of each matched point and the coordinates of the cluster's geometric center are recorded;
D. At each position, the matching degree Si between point cluster i (i an integer ≥ 1) of the unknown sample and the X-MS high-dimensional image of the known sample can be calculated with a statistical tool such as Matlab, using one or more of: the number of matched points, the similarity (for example the Euclidean distance method used in image similarity calculation) or the correlation (for example the 2-D correlation coefficient in Matlab) between point cluster i and the known sample's image;
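Sub-steps B and C describe sliding each point cluster along t while its m/z positions stay fixed. The sketch below assumes (the text does not specify this) that a cluster's position is defined by its earliest point, which is placed at offsets 0, step, 2 x step and so on up to T_k; at every offset it records the matched points and the cluster's geometric center. The point layout `(t, m/z)` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (t, m/z)


def is_match(p: Point, q: Point, t_tol: float, mz_tol: float) -> bool:
    """A point matches when it is within both the t and the m/z tolerance."""
    return abs(p[0] - q[0]) <= t_tol and abs(p[1] - q[1]) <= mz_tol


def scan_cluster(cluster: List[Point], known: List[Point], t_max: float,
                 step: float, t_tol: float, mz_tol: float) -> List[Dict]:
    """Slide the cluster along t (m/z fixed) and record matches at each offset."""
    if not cluster:
        return []
    t_min = min(t for t, _ in cluster)
    records = []
    for k in range(int(t_max // step) + 1):
        offset = k * step
        # Translate along t only, preserving the cluster's geometric shape.
        shifted = [(t - t_min + offset, mz) for t, mz in cluster]
        matched = [p for p in shifted
                   if any(is_match(p, q, t_tol, mz_tol) for q in known)]
        center = (sum(t for t, _ in shifted) / len(shifted),
                  sum(mz for _, mz in shifted) / len(shifted))
        records.append({"offset": offset, "n_matches": len(matched),
                        "matched": matched, "center": center})
    return records
```

Picking the record with the largest `n_matches` gives the best position of that cluster, which feeds the matching degree computed in sub-step D.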
the matching degrees obtained by these three methods are expressed as the point count (or a function of it), the similarity or the correlation, respectively;
the matching degree of a point cluster is linearly or nonlinearly related to four variables of its matched points: their number, their coordinates (t and m/z) and their intensity; the point count (or a function of it), the similarity and the correlation are each computed from transformations of these four variables;
different calculation methods can be chosen to compute the overall matching degree between each point cluster and the X-MS high-dimensional image of the known sample;
the number of matched points is the number of points in the point cluster that satisfy the matching condition. On this basis, the maximum matching degrees Si of all point clusters in the unknown sample's X-MS high-dimensional image are combined mathematically (for example by summing, averaging or taking logarithms) to obtain the overall matching degree Sc between the X-MS high-dimensional images of the unknown sample and the known sample;
E. The above steps are repeated to compute, one by one, the matching degree between the unknown sample's X-MS high-dimensional image and those of the other known samples, giving the overall matching degree Sc with each known traditional Chinese medicine sample;
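Sub-step D leaves the scoring method open. The sketch below shows only the point-count variant: Si is taken as the largest number of matched points over all scan positions, and Sc combines the Si values by summing, averaging, or summing logarithms. Using `log1p` so that clusters with zero matches remain well defined is an assumption, as are the function names; the similarity and correlation variants are not shown.

```python
import math
from typing import Sequence


def cluster_score(match_counts: Sequence[int]) -> int:
    """Si: best number of matched points over all scan positions of cluster i."""
    return max(match_counts)


def overall_score(cluster_scores: Sequence[float], mode: str = "sum") -> float:
    """Sc: combine the Si of all point clusters of the unknown sample."""
    if not cluster_scores:
        raise ValueError("no point clusters to combine")
    if mode == "sum":
        return float(sum(cluster_scores))
    if mode == "mean":
        return sum(cluster_scores) / len(cluster_scores)
    if mode == "log":
        # log1p keeps clusters with Si = 0 well defined (assumed variant).
        return sum(math.log1p(s) for s in cluster_scores)
    raise ValueError(f"unknown mode: {mode}")
```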
F. the class to which the unknown sample belongs may be determined without or with the aid of a threshold value;
without a threshold, the unknown sample is matched against the known samples as described above and the matching degrees are sorted in descending order; the higher a known sample ranks, the more likely the unknown sample is that sample, and vice versa;
with a threshold, a threshold γ is set to define the credible range for matching unknown samples of different origins to known samples of the same class;
the threshold can be set statistically: following steps 1) to 2) with the same or similar operating parameters and conditions, two or more known samples of the same class are analyzed as training samples for that class to obtain raw X-MS data; image generation software (such as Matlab2016b) converts the raw X-MS data or a multi-dimensional information text into X-MS high-dimensional images, giving a training image set for that class; the training images are matched against the X-MS high-dimensional images of known samples of the same class, the distribution interval of the matching degree is determined by a statistical method (such as probabilities or ratios), and the lower limit of that interval is taken as the threshold γ for the class;
alternatively, the threshold can be derived from literature reports or experimental observations: the distribution interval of matching degrees between n (n ≥ 2) samples of a given class and the known traditional Chinese medicine samples (analyzed per steps 1) to 2) with the same or similar operating parameters and conditions) is determined, and its lower limit is taken as the threshold γ for that class;
the unknown sample is matched against the known samples and the matching degrees are sorted in descending order; the higher a known sample ranks, provided that Sc exceeds the threshold γ determined for that known sample, the more likely the unknown sample is that sample; otherwise, the less likely;
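Sub-step F can be sketched as two small pieces: one derives a class threshold γ as the lower limit of the training matching-degree distribution, and one ranks known samples by Sc and flags whether each passes its own γ. Both lower-limit rules shown (the empirical minimum, and mean minus k standard deviations) are assumptions, since the text only says a statistical method is used; the names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Sequence, Tuple


def threshold_min(train_scores: Sequence[float]) -> float:
    """gamma as the empirical lower limit of the training matching degrees."""
    return min(train_scores)


def threshold_mean_minus_k_sd(train_scores: Sequence[float], k: float = 2.0) -> float:
    """gamma as mean - k * sample standard deviation (needs >= 2 training samples)."""
    return mean(train_scores) - k * stdev(train_scores)


def rank_candidates(
    scores: Dict[str, float], thresholds: Optional[Dict[str, float]] = None
) -> List[Tuple[str, float, Optional[bool]]]:
    """Sort known samples by Sc (descending); flag Sc > gamma where a gamma exists."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    result = []
    for name, sc in ranked:
        if thresholds is None or name not in thresholds:
            passes = None  # no threshold available: ranking only
        else:
            passes = sc > thresholds[name]
        result.append((name, sc, passes))
    return result
```

Calling `rank_candidates` without thresholds reproduces the threshold-free ranking; passing per-class γ values adds the credibility check.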
3) verification of unknown sample identification results
The known traditional Chinese medicine samples matched to the unknown sample in step 2 are ranked by matching degree, and in that order the original X-MS data and/or high-dimensional data of the unknown sample are searched for the marker compounds (at least one) of each known sample. When the marker compounds of a known sample are found in the unknown sample, the unknown sample is accepted as that known sample and the search stops; if the marker compounds of the first-ranked known sample are not found, those of the second-ranked known sample are searched for, and so on until a marker compound is found. If none of the marker compounds of the matched known samples is found in the unknown sample, the established database is considered not to contain the unknown sample.
In step 2, the procedure differs slightly depending on whether a reference standard of the marker compound is available:
search for marker compounds with a standard: the high-dimensional data of the standard are obtained by the method of step 1 and matched against the high-dimensional data of the unknown sample, searching for ions in the unknown sample whose retention time t and m/z both fall within the threshold window of the marker compound;
search for marker compounds without a standard: the unknown sample is searched for ions whose retention time t and m/z both fall within the threshold window of the marker compound as observed in the known traditional Chinese medicine sample.
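The verification loop above can be sketched as follows. The presence test is passed in as a function (for example a retention-time and m/z window check), and whether one marker or all markers of a sample must be found is exposed as a `require_all` flag because the text allows either reading. Function names, sample names and marker coordinates are placeholders, not data from the patent.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Marker = Tuple[float, float]  # (t, m/z) of a marker compound
Ion = Tuple[float, float]     # (t, m/z) of an ion in the unknown sample


def verify_identity(
    ranked_known: Sequence[str],
    markers_by_sample: Dict[str, List[Marker]],
    unknown_ions: List[Ion],
    is_present: Callable[[Marker, List[Ion]], bool],
    require_all: bool = False,
) -> Optional[str]:
    """Return the first-ranked known sample whose markers are found, else None."""
    for name in ranked_known:
        markers = markers_by_sample.get(name, [])
        if not markers:
            continue  # nothing to verify against for this sample
        hits = [is_present(m, unknown_ions) for m in markers]
        if all(hits) if require_all else any(hits):
            return name  # accept the unknown sample as this known sample; stop
    return None  # the database is considered not to contain the unknown sample
```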
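Both variants reduce to the same window test on retention time and m/z; they differ only in where the marker's reference t and m/z come from (the standard, or the known traditional Chinese medicine sample). A minimal Python sketch, with a hypothetical function name and window widths supplied by the caller:

```python
from typing import List, Tuple

Ion = Tuple[float, float, float]  # (t in s, m/z, intensity) of an unknown-sample ion


def find_marker_ions(
    marker_t: float,
    marker_mz: float,
    unknown_ions: List[Ion],
    t_window: float,
    mz_window: float,
) -> List[Ion]:
    """Ions whose t and m/z both lie within the marker's threshold window."""
    return [
        ion
        for ion in unknown_ions
        if abs(ion[0] - marker_t) <= t_window and abs(ion[1] - marker_mz) <= mz_window
    ]
```

With a standard, `marker_t` and `marker_mz` come from the standard's high-dimensional data; without one, they come from the marker's position in the known sample's data.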
In step 1, to make the unknown sample comparable with the known traditional Chinese medicine samples, the same or similar reproducible methods of sample preparation, raw data acquisition and data processing should be used for every sample.
In step 1, the mean (absolute) retention-time deviation of the chromatograph is the mean absolute time deviation of each compound when the chromatograph repeatedly measures the same sample under the same conditions; it can be measured with a mixed standard.
In step 1, raw chromatography-mass spectrometry data is obtained by:
1) the mixed molecules in the traditional Chinese medicine sample are separated by selective interaction in a chromatograph and an ion mobility spectrometer to obtain retention time information t;
2) the mass spectrometer separates and detects the molecules according to their mass-to-charge ratios under an electric or magnetic field to obtain mass-to-charge ratio information m/z;
3) the traditional Chinese medicine sample extract is analyzed with a chromatography-mass spectrometer, with a chromatographic separation time (t) range of 1 to 10000 s and an ion (m/z) scan range of 50 to 10000 Da, to obtain chromatography-mass spectrometry (X-MS) data.
In step 1, the acquired raw data can undergo one or more processing steps such as retention time correction, filtering and normalization; retention time correction can use several (≥ 2) compounds in the sample to be analyzed, a mixed standard, or another correction method.
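Retention time correction with two or more compounds can be illustrated with an ordinary least-squares line that maps observed retention times onto reference ones. The linear model and the function name are assumptions made for this sketch; the text does not specify the form of the correction.

```python
from typing import Callable, Sequence


def fit_rt_correction(
    observed: Sequence[float], reference: Sequence[float]
) -> Callable[[float], float]:
    """Least-squares fit of reference_t = slope * observed_t + intercept."""
    n = len(observed)
    if n < 2 or n != len(reference):
        raise ValueError("need >= 2 paired retention times")
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    if sxx == 0:
        raise ValueError("observed retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def correct(t: float) -> float:
        # Map an observed retention time onto the reference time scale.
        return slope * t + intercept

    return correct
```

For example, if two marker compounds elute at 10 s and 20 s in a run but at 12 s and 22 s in the reference, the fitted correction maps an observed 15 s to 17 s.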
The high-dimensional data may include all ions in the high-dimensional data matrix, or the ions in the high-dimensional data matrix may be selectively retained.
The position of each spot in the high-dimensional data image is determined by the properties of the compound: the vertical axis represents chromatographic retention time, along which compounds are distributed from high to low polarity; the horizontal axis represents m/z, along which compounds are distributed in increasing order of m/z. The same compound can appear in the mass spectrum in several forms, such as quasi-molecular ions, adduct ions and fragment ions, so a single compound can produce spots at different horizontal positions at the same vertical position. Spots of compounds with similar properties form regional point clusters representing a class of substances.
The more ions the chromatography-mass spectrometry data contain, the richer the information in the resulting high-dimensional image and the more reliable the identification.
Noise can bias recognition; denoising beforehand, using the signal-to-noise ratio or isotope distribution pattern of each ion in the raw chromatography-mass spectrometry data, improves recognition accuracy.
Retention time correction is not mandatory in step 1.
The chromatogram-mass spectrum information or ion mobility spectrum-mass spectrum information in the database can be expanded into two dimensions, three dimensions or higher dimensions.
Example 1: Establishment of the traditional Chinese medicine chromatography-mass spectrometry high-dimensional image database
First, preparation of known traditional Chinese medicine samples
Sample preparation methods include, but are not limited to, solvent extraction, and encompass any method suitable for preparing traditional Chinese medicine samples. The known samples in the database of the present invention were prepared from reference medicinal materials of 547 varieties obtained from the National Institutes for Food and Drug Control (see Table 1). For each reference material, 100 mg of powder was extracted ultrasonically with 0.5 ml of 50% (v/v) methanol for 10 min and centrifuged at 15000 r/min for 10 min, and the supernatant was collected; the residue was extracted again in the same way (0.5 ml of 50% (v/v) methanol, ultrasonication for 10 min, centrifugation at 15000 r/min for 10 min). The two supernatants were combined.
Second, acquisition and processing of raw chromatography-mass spectrometry data of known traditional Chinese medicine samples
Raw data of the known samples were obtained by hyphenated chromatography-mass spectrometry. To obtain comparable chromatography-mass spectrometry high-dimensional images, all known samples must be analyzed under the same conditions. An Agilent 1290 ultra-performance liquid chromatography system (Agilent, Waldbronn, Germany) coupled to a 6520Q-TOF-MS (Agilent Corp, USA) was used.
1. Chromatographic method
An Agilent ZORBAX Eclipse Plus C18 column (3.0 × 150 mm, 1.8 μm) was used. Mobile phase A was water (containing 0.5% acetic acid) and mobile phase B was acetonitrile, with gradient elution: 0 to 15 min, 5% to 100% B; 15 to 20 min, 100% B; 20 to 21 min, 100% to 5% B; 21 to 25 min, 5% B. The flow rate was 0.3 ml/min, the column temperature 60 ℃ and the injection volume 2 μl.
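The gradient program above can be written as a piecewise-linear function of time. Assuming linear ramps between the listed time points (the usual reading of such a program), the sketch below returns the percentage of phase B at any time in the 25-min run; the names `GRADIENT` and `percent_b` are hypothetical.

```python
from typing import List, Tuple

# (time in min, % phase B) breakpoints taken from the gradient program above.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 5.0), (15.0, 100.0), (20.0, 100.0), (21.0, 5.0), (25.0, 5.0),
]


def percent_b(t_min: float) -> float:
    """Linearly interpolate % phase B at time t_min (minutes)."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time not covered by any segment")
```

For instance, halfway through the first ramp (7.5 min) the eluent is 52.5% B, and the same composition recurs at 20.5 min during the return to initial conditions.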
2. Mass spectrometry method
Data were acquired with an ESI ion source in negative ion mode over m/z 100 to 3200. The temperature was 350 ℃, the drying gas flow rate 8 L/min, the nebulizer gas pressure 40 psi, the capillary voltage 3500 V, the fragmentor voltage 200 V and the skimmer voltage 65 V.
3. Data processing of chromatography-mass spectrometry raw data of known traditional Chinese medicine sample
The raw data of the present invention comprise chromatographic information (such as retention time and peak intensity) and mass spectral information (such as mass-to-charge ratio) for each compound in the sample extract. Raw data processing includes correction, filtering and normalization. The raw data were imported into the peak extraction software Progenesis QI with a threshold of 0.005% of the base peak intensity to remove noise signals; the m/z, t and I values of each compound in the sample were obtained, and the resulting m/z-t-I data matrix was saved as an EXCEL-readable CSV file.
Third, acquisition of high-dimensional data and chromatography-mass spectrometry high-dimensional images of known traditional Chinese medicine samples
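The noise cut-off stated above (0.005% of the base peak intensity) is a simple relative-intensity filter. The sketch below reproduces that rule on an m/z-t-I list; it illustrates the rule only, not how Progenesis QI applies it internally, and keeping ions exactly at the cut-off is an assumption.

```python
from typing import List, Tuple

Ion = Tuple[float, float, float]  # (m/z, t in s, intensity I)


def remove_noise(ions: List[Ion], fraction: float = 0.005 / 100) -> List[Ion]:
    """Drop ions whose intensity is below `fraction` of the base (most intense) peak."""
    if not ions:
        return []
    base_peak = max(intensity for _, _, intensity in ions)
    cutoff = base_peak * fraction
    return [ion for ion in ions if ion[2] >= cutoff]
```

With a base peak of 1,000,000 counts, the cut-off is 50 counts, so an ion at 60 counts is kept and one at 40 counts is discarded.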
1. Acquisition of high dimensional data
The file produced by the raw data processing step was imported into Matlab, and the 2000 ions with the highest intensities were retained.
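Retaining the 2000 most intense ions is a top-N selection. In Python it can be sketched with the standard library's `heapq.nlargest`, assuming rows of the form `(m/z, t, I)`; the function name is hypothetical.

```python
import heapq
from typing import List, Tuple

Ion = Tuple[float, float, float]  # (m/z, t in s, intensity I)


def top_n_ions(ions: List[Ion], n: int = 2000) -> List[Ion]:
    """Keep the n most intense ions, ordered from most to least intense."""
    return heapq.nlargest(n, ions, key=lambda ion: ion[2])
```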
2. Creation of high dimensional data images
The points in the chromatography-mass spectrometry high-dimensional image correspond one-to-one to the high-dimensional data. The high-dimensional data were imported into Matlab and an m/z-t-I plot of the sample was drawn with m/z and t as coordinates; each detectable compound has specific mass and time coordinates, and its mass spectral signal intensity (peak height) I is represented by the area or the color value of its point.
3. Conversion of high-dimensional image of chromatogram-mass spectrum
The original high-dimensional data image established in the steps above can be further transformed, for example by image blurring or by rendering at different resolutions.
Fourth, spatial information of chromatography-mass spectrometry high-dimensional images
The X-MS high-dimensional image of the present invention includes, but is not limited to, spots and point clusters. Each spot is produced by one compound, but one compound can produce one or more spots. The position of a spot is determined by the properties of the compound: the vertical axis represents chromatographic retention time, along which compounds are distributed from high to low polarity; the horizontal axis represents m/z, along which compounds are distributed in increasing order of m/z. The same compound can appear in the mass spectrum in several forms, such as quasi-molecular ions, adduct ions and fragment ions, so a single compound can produce spots at the same vertical position but different horizontal positions. Spots of compounds with similar properties form regional point clusters representing a class of substances.
Fifth, establishment of the traditional Chinese medicine chromatography-mass spectrometry high-dimensional image database
The database established in this example can take forms including, but not limited to, text, EXCEL, Oracle, MySQL, split or Microsoft SQL Server. A traditional Chinese medicine chromatography-mass spectrometry high-dimensional image database covering the reference medicinal materials of 547 varieties was obtained, comprising: 1) a sample information base in EXCEL format containing sample number, name, source, specification, medicinal part, order, family, genus and species; 2) a folder-format database of the raw chromatography-mass spectrometry data of all varieties; and 3) a folder-format database of the high-dimensional data images of all varieties.
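Blurring and resolution changes are standard raster operations. Assuming the image has first been rasterized into a 2-D intensity grid (rows along t, columns along m/z, which is an assumption of this sketch rather than a step stated in the text), a 3 x 3 mean blur and block-sum downsampling could look like this; the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # rows along t, columns along m/z (assumed layout)


def box_blur(img: Grid) -> Grid:
    """3 x 3 mean filter; edge pixels average only the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out


def downsample(img: Grid, factor: int) -> Grid:
    """Lower the resolution by summing intensities in factor x factor blocks."""
    h, w = len(img), len(img[0])
    out = []
    for r0 in range(0, h, factor):
        row = []
        for c0 in range(0, w, factor):
            block = [img[r][c]
                     for r in range(r0, min(h, r0 + factor))
                     for c in range(c0, min(w, c0 + factor))]
            row.append(sum(block))
        out.append(row)
    return out
```

Summing rather than averaging during downsampling preserves the total signal in each region, which is one reasonable choice when intensity is encoded as brightness.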
Example 2: Rapid identification of unknown samples
First, preparation of unknown sample
The unknown sample was prepared in the same way as the known traditional Chinese medicine samples. In this example, ginseng purchased on the market was used as the unknown sample, designated NCYXT-A-D3-01. 100 mg of the unknown sample powder was extracted ultrasonically with 0.5 ml of 50% (v/v) methanol for 10 min and centrifuged at 15000 r/min for 10 min, and the supernatant was collected; the residue was extracted again in the same way (0.5 ml of 50% (v/v) methanol, ultrasonication for 10 min, centrifugation at 15000 r/min for 10 min). The two supernatants were combined.
Second, acquisition and processing of raw chromatography-mass spectrometry data of the unknown sample
Raw data of the unknown sample were acquired by chromatography-mass spectrometry. To obtain comparable chromatography-mass spectrometry high-dimensional images, the unknown sample must be analyzed under the same or similar conditions as the known traditional Chinese medicine samples. Raw data of unknown sample NCYXT-A-D3-01 were acquired on an Agilent 1290 ultra-performance liquid chromatography system (Agilent, Waldbronn, Germany) coupled to a 6540Q-TOF-MS (Agilent Corp, USA).
1. Chromatographic method
An Agilent ZORBAX Eclipse Plus C18 column (3.0 × 150 mm, 1.8 μm) was used. Mobile phase A was water (containing 0.5% acetic acid) and mobile phase B was acetonitrile, with gradient elution: 0 to 15 min, 5% to 100% B; 15 to 20 min, 100% B; 20 to 21 min, 100% to 5% B; 21 to 25 min, 5% B. The flow rate was 0.3 ml/min, the column temperature 60 ℃ and the injection volume 2 μl.
2. Mass spectrometry method
Data were acquired on the Agilent 6540Q-TOF-MS with an ESI ion source in negative ion mode over m/z 100 to 3200. The temperature was 350 ℃, the drying gas flow rate 8 L/min, the nebulizer gas pressure 40 psi, the capillary voltage 3500 V, the fragmentor voltage 200 V and the skimmer voltage 65 V.
3. Data processing of unknown sample chromatography-mass spectrometry raw data
The raw data comprise chromatographic information (such as retention time and peak intensity) and mass spectral information (such as mass-to-charge ratio) for each compound in the sample extract. Raw data processing includes correction, filtering and normalization. The raw data were imported into the peak extraction software Progenesis QI with a threshold of 0.005% of the base peak intensity to remove noise signals; the m/z, t and I values of each compound in the sample were obtained, and the resulting m/z-t-I data matrix was saved as an EXCEL-readable CSV file.
Third, acquisition of high-dimensional data and chromatography-mass spectrometry high-dimensional images of the unknown sample
1. Acquisition of high dimensional data
The file produced by the raw data processing step was imported into Matlab, and the 2000 ions with the highest intensities were retained.
2. Creation of high dimensional data images
The points in the chromatography-mass spectrometry high-dimensional image correspond one-to-one to the high-dimensional data. The high-dimensional data were imported into Matlab and an m/z-t-I plot of the sample was drawn with m/z and t as coordinates; each detectable compound has specific mass and time coordinates, and its mass spectral signal intensity (peak height) I is represented by the area or the color value of its point.
3. Conversion of the chromatography-mass spectrometry high-dimensional image
The high-dimensional data image may be the original image created in the steps above or a converted version of it; conversions include image blurring, rendering the image at different resolutions, and similar processing. In this example, the original chromatography-mass spectrometry high-dimensional image of the high-dimensional data was used.
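The two conversions named above can be sketched on a rasterized image as follows. The specific kernels are assumptions, since the text does not specify them: a k × k mean (box) filter stands in for "blurring", and summing non-overlapping blocks stands in for "different resolutions". Both function names are hypothetical.

```python
import numpy as np


def box_blur(img, k=3):
    """Blur with a k x k mean filter; k must be odd. Edges are zero-padded."""
    if k % 2 != 1:
        raise ValueError("k must be odd")
    pad = k // 2
    padded = np.pad(img, pad)  # constant (zero) padding by default
    out = np.zeros_like(img, dtype=float)
    # Accumulate the k*k shifted copies, then divide to get the local mean.
    for dr in range(k):
        for dc in range(k):
            out += padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out / (k * k)


def downsample(img, factor=2):
    """Reduce resolution by summing factor x factor blocks.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.sum(axis=(1, 3))
```

Blurring spreads each point over its neighbours, so small retention-time or m/z shifts between samples overlap more; downsampling has a similar tolerance-widening effect at lower cost.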
Fourth, identification of unknown samples
1. First, the points in the X-MS high-dimensional image of the sample to be tested, NCYXT-A-D3-01, are divided into 34 point clusters using the machine-learning clustering tool Clusterdp; each point cluster contains n ≥ 10 points;
2. The point-cluster-extracted chromatography-mass spectrometry high-dimensional image of the sample to be tested is scanned and matched against the chromatography-mass spectrometry high-dimensional image of each reference sample (m);
3. During scanning, the origins, t axes and m/z axes of the two chromatography-mass spectrometry high-dimensional images are aligned; each point cluster of the sample to be tested then keeps its m/z position and geometric shape and is scanned continuously along the time axis (t). The scan finds the common points that match accurately in both t and m/z between the point cluster of the sample to be tested and the chromatography-mass spectrometry high-dimensional image of reference sample (m);
4. During scanning, each point cluster moves as a whole within the range 0 to T_k, where T_k is the effective analysis time of the sample; in this example, T_k = 1000 s;
5. During scanning, the scanning step length of the point cluster along a time axis (t) is 1 s;
6. When a point cluster of the sample to be tested is matched against the points in the chromatography-mass spectrometry high-dimensional image of reference sample (m), the retention-time tolerance (t tolerance) for each point is ±30 s and the m/z (or m) tolerance is ±0.01 Da;
7. Each time a point cluster moves to a new position along the t axis of the X-MS high-dimensional image of reference sample (m), the number of matching points, the coordinates of each matching point and the coordinates of the geometric center of the point cluster are recorded;
8. The correlation between point cluster (i) of the sample to be tested and reference sample (m) in the traditional Chinese medicine X-MS high-dimensional image database is calculated using a 2D correlation function in Matlab;
9. For each point cluster of the sample to be tested, the maximum correlation with the chromatography-mass spectrometry high-dimensional image of the reference sample along the t axis is determined;
10. At the position giving the maximum correlation, the matching degree (S_i) between each point cluster in the X-MS high-dimensional image of the sample to be tested and the chromatography-mass spectrometry high-dimensional image of the reference sample is calculated using a point-counting method;
Here, S_i is the matching degree of the i-th point cluster; k is the total number of points in the point cluster that meet the matching requirement; f is a function of the m/z (or m), the chromatographic retention time t and the ion signal intensity I of each matching point; and f_j is the value of this function at the j-th point;
x, y and z are the exponents of the three variables I, m/z and t, respectively, where x ≥ 0, y ≥ 0 and z ≥ 0;
in this embodiment, x = 0, y = 1/2 and z = 1/2;
11. Following the steps above, the overall matching degree (S_c) between the X-MS high-dimensional image of the sample to be tested and the X-MS high-dimensional image of reference sample (m) is calculated;
here, n is the total number of matching points over all point clusters at the maximum matching degree, and f_j (j = 1 to n) is the function value of each point obtained by point-cluster matching;
12. The steps above are repeated to obtain the matching degree for each tested sample.
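Step 1 relies on Clusterdp. Assuming this refers to the density-peak clustering algorithm of Rodriguez and Laio ("Clustering by fast search and find of density peaks", Science, 2014), a minimal NumPy sketch is given below; it is not the Clusterdp code itself. Each point's local density rho counts neighbours within a cut-off distance dc, delta is the distance to the nearest denser point, the points with the largest rho × delta become cluster centers, and every other point inherits the label of its nearest denser point. Clusters with fewer than `min_size` points (10 here, following step 1) are relabelled -1 as noise, which is an assumed treatment. The m/z and t axes must be scaled to comparable units by the caller before clustering; that scaling is not specified in the text.

```python
import numpy as np


def density_peak_clusters(points, n_clusters, dc, min_size=10):
    """Density-peak clustering with a cut-off kernel (illustrative sketch).

    points: (N, 2) array-like of scaled (m/z, t) coordinates.
    Returns one integer label per point; -1 marks clusters smaller than min_size.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    rho = (dist < dc).sum(axis=1) - 1            # neighbours within dc, minus self
    order = np.argsort(-rho, kind="stable")      # indices by descending density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                    # points ranked denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    gamma = rho * delta
    gamma[order[0]] = np.inf                     # densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:                              # denser points are labelled first
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    counts = np.bincount(labels, minlength=len(centers))
    return np.where(counts[labels] >= min_size, labels, -1)
```

In the example above, `n_clusters` would be 34; choosing dc is the main tuning decision, and the decision graph of rho against delta is the usual way to check that the chosen number of centers is sensible.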
The sample to be tested, NCYXT-A-D3-01, was matched against the reference samples of each of the 547 classes; its matching degree was highest with the ginseng reference sample DB-A2-1-0001, at 218.19% (the matching degrees for all reference samples are listed in Table 2).
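Steps 2 to 11 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation. The cluster keeps its m/z values and shape and is slid so that its earliest point sits at each offset from 0 to T_k (1000 s) in 1 s steps; a cluster point matches if any reference point lies within ±30 s and ±0.01 Da. The best position is chosen by the number of matched points instead of Matlab's 2D correlation, with the earliest offset kept on ties. Because the formulas for S_i and S_c are not reproduced in this text, `weighted_score` sums a hypothetical f = I^x · (m/z)^y · t^z over matched points, using each matched point's original (unshifted) values, with the embodiment's exponents as defaults; it does not apply whatever normalization turns the score into a percentage. All names are hypothetical.

```python
import numpy as np


def scan_cluster(cluster, reference, t_max=1000.0, step=1.0,
                 tol_t=30.0, tol_mz=0.01):
    """Slide one point cluster along the t axis over a reference point set.

    cluster: rows of (mz, t, I); reference: rows of (mz, t, ...).
    Returns (best_offset, matched_mask) for the offset with the most matches.
    """
    c = np.asarray(cluster, dtype=float)
    ref = np.asarray(reference, dtype=float)
    rel_t = c[:, 1] - c[:, 1].min()              # cluster shape along t
    # The m/z test does not depend on the t offset, so compute it once.
    mz_ok = np.abs(c[:, None, 0] - ref[None, :, 0]) <= tol_mz
    best_offset, best_mask = 0.0, np.zeros(len(c), dtype=bool)
    for offset in np.arange(0.0, t_max + step, step):
        t_ok = np.abs((rel_t + offset)[:, None] - ref[None, :, 1]) <= tol_t
        mask = (mz_ok & t_ok).any(axis=1)        # point matches any reference point
        if mask.sum() > best_mask.sum():         # strict: earliest offset wins ties
            best_offset, best_mask = float(offset), mask
    return best_offset, best_mask


def weighted_score(points, mask, x=0.0, y=0.5, z=0.5):
    """Sum the hypothetical f = I**x * (m/z)**y * t**z over matched points."""
    p = np.asarray(points, dtype=float)[mask]
    return float(np.sum(p[:, 2] ** x * p[:, 0] ** y * p[:, 1] ** z))
```

With x = 0, y = 1/2 and z = 1/2, intensity drops out and each matched point contributes the square root of m/z × t. Summing such scores over all clusters would give an unnormalized analogue of S_c for ranking reference samples.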
Fifth, verification of the identification result for the unknown sample
Ranked by matching degree, the known sample with the highest matching degree is ginseng, so the main component of the ginseng reference (t 9.73 min, m/z 1163.5859) was searched for in the unknown sample NCYXT-A-D3-01. A compound at t 9.12 min, m/z 1163.5903 was retrieved from NCYXT-A-D3-01; because it lies within the acceptable retention time and m/z window, the unknown sample NCYXT-A-D3-01 was accepted as ginseng. Comparison with the recorded medicinal material information of the unknown sample confirmed that the ginseng sample had been correctly identified.
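The verification step, together with the ranked fallback described in step 5) of claim 7, can be sketched in Python as below. The text does not state the size of the "acceptable retention time and m/z window", so the defaults `tol_t=1.0` (min) and `tol_mz=0.01` (Da) are placeholders chosen only for illustration. The sketch accepts a candidate as soon as any one of its marker compounds is found, which is one reading of the claim; requiring all markers would be stricter. The function names are hypothetical.

```python
def find_marker(sample, marker, tol_t, tol_mz):
    """Return the first (mz, t) feature of the sample inside the marker's window, else None."""
    m_mz, m_t = marker
    for mz, t in sample:
        if abs(mz - m_mz) <= tol_mz and abs(t - m_t) <= tol_t:
            return (mz, t)
    return None


def identify(sample, ranked_candidates, tol_t=1.0, tol_mz=0.01):
    """Search candidates' marker compounds in order of matching degree.

    sample: list of (mz, t_min) features of the unknown sample.
    ranked_candidates: list of (name, [(mz, t_min), ...]) sorted best first.
    Returns (name, marker, hit) for the first accepted candidate, or None
    if no marker is found (the database is taken not to contain the sample).
    """
    for name, markers in ranked_candidates:
        for marker in markers:
            hit = find_marker(sample, marker, tol_t, tol_mz)
            if hit is not None:
                return name, marker, hit
    return None
```

For the example above, the retrieved compound differs from the ginseng marker by 0.0044 in m/z and 0.61 min in retention time, so it is accepted under these placeholder tolerances but would be rejected with a retention-time window narrower than 0.61 min.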
Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.
TABLE 1
TABLE 2
Claims (10)
1. A ginseng recognition platform, the platform comprising:
a known sample information database module, an unknown sample information database module, a known sample chromatogram-mass spectrum image module, an unknown sample chromatogram-mass spectrum image module, and an unknown sample identification module;
the known sample information database module transmits chromatogram-mass spectrum data of a known sample to the known sample chromatogram-mass spectrum image module, and the known sample chromatogram-mass spectrum image module outputs a first data image;
the unknown sample information database module transmits the chromatography-mass spectrometry data of the unknown sample to the unknown sample chromatography-mass spectrometry image module, and the unknown sample chromatography-mass spectrometry image module outputs a second data image;
the unknown sample identification module is used for recording the sample information of the known sample and the generated first data image, and comparing the generated second data image with the first data image to determine whether the chromatogram-mass spectrum data of the unknown sample is matched with the chromatogram-mass spectrum data of the known sample.
2. The ginseng identification platform of claim 1, wherein the chromatography-mass spectrometry data of the known sample comprises raw chromatography-mass spectrometry information of the known sample, and wherein the chromatography-mass spectrometry data of the unknown sample comprises raw chromatography-mass spectrometry information of the unknown sample;
preferably, the chromatography-mass spectrometry data of the known sample further comprises high-dimensional data of each compound in the known sample, and the chromatography-mass spectrometry data of the unknown sample further comprises high-dimensional data of each compound in the unknown sample;
further preferably, the high-dimensional data express the spatial information between data points in the sample as a matrix formed by at least one of the following:
distance information between data points;
angular relationship information between data points;
coordinate position information of the data points;
density information of the data points;
edge range information for the set of data points;
intensity information of the data points;
preferably, the distance information between the data points comprises at least one of chromatographic retention time t, m/z value, m value, z value, peak intensity I;
preferably, the intensity information of a data point is reflected by at least one of the size and the brightness of the data point.
3. The ginseng recognition platform according to claim 2, wherein the high-dimensional data image generated from the high-dimensional data comprises at least one of an original image generated from the high-dimensional data, an image generated based on image features, an image obtained by image conversion, and an image constructed using a function;
preferably, the image features comprise point clusters of data points, common particles, and sample contours;
preferably, the image conversion includes at least one of blurring the image and rendering the image at different resolutions;
preferably, the function is a function of at least one of chromatographic retention time t, m/z, m, and peak intensity I;
preferably, the high-dimensional image is an image of more than two dimensions;
preferably, the image file is stored in an image file format.
4. The ginseng recognition platform of claim 1, wherein the known samples comprise at least one of standard substances and known Chinese herbal medicine samples;
preferably, the standard substance comprises at least one of a reference substance, a traditional Chinese medicine marker component, and a main chemical component of a traditional Chinese medicine listed in the 2015 edition of the Chinese Pharmacopoeia;
preferably, the known traditional Chinese medicine sample is a sample with definite category information, and the category information comprises at least one of the species, the producing area, the part and the processing mode of the sample;
preferably, the known traditional Chinese medicine sample comprises at least one of raw traditional Chinese medicine materials, decoction pieces and powder, and further preferably, the known traditional Chinese medicine sample comprises at least one of different parts of traditional Chinese medicines and processed products thereof.
5. The ginseng recognition platform of claim 1, wherein the unknown sample recognition module comprises an image segmentation tool or a clustering tool.
6. The ginseng recognition platform of claim 1, wherein the database types in the database modules comprise at least one of a folder data set, a web page database, a database based on a commercial workstation or a database based on a user self-developed workstation.
7. A method for identifying ginseng using the ginseng identification platform according to any one of claims 1 to 6, the method comprising at least the steps of:
1) acquiring raw chromatography-mass spectrometry data of a known sample and an unknown sample using chromatography and mass spectrometry;
2) generating chromatography-mass spectrometry high-dimensional data of a known sample and an unknown sample, wherein the chromatography-mass spectrometry high-dimensional data expresses spatial information among data points;
3) generating chromatography-mass spectrometry high-dimensional data images of the known sample and the unknown sample, wherein each ion in the high-dimensional data corresponds one-to-one to a point in the resulting image, each point has its own coordinate information, the intensity of each point is represented by the size and/or the brightness of the point, and the points in the high-dimensional data image thus correspond one-to-one to the high-dimensional data;
4) dividing the points in the chromatography-mass spectrometry high-dimensional image of the unknown sample into n point clusters using an image segmentation tool or a clustering tool, n being an integer ≥ 1, and scanning and matching the point-cluster-extracted chromatography-mass spectrometry high-dimensional image of the unknown sample against the chromatography-mass spectrometry high-dimensional image of each known sample, one by one;
5) ranking the known samples matched with the unknown sample by matching degree and, following this ranking, searching the raw chromatography-mass spectrometry data and/or the high-dimensional data of the unknown sample for the marker compounds of each known sample in turn, each known sample having at least one marker compound; when the marker compounds of a known sample are retrieved in the unknown sample, the unknown sample is accepted as that known sample and the search stops; if the marker compounds of the first-ranked known sample are not found in the unknown sample, the marker compounds of the second-ranked known sample are searched for, and so on until a marker compound is retrieved; if none of the marker compounds of the matched known samples is retrieved in the unknown sample, the established database is considered not to contain the unknown sample;
preferably, the coordinate information includes at least one of distance information between data points, angular relationship information between data points, coordinate position information of data points, density information of data points, edge range information of a data point set, and intensity information of data points;
preferably, the point cluster is a set of data points close in space, and the number n of the data points in the point cluster is more than or equal to 3;
preferably, each of said clusters has its own centre point;
preferably, the shape of the dot clusters is arbitrary.
8. The method of claim 7, wherein the raw chromatography-mass spectrometry data of the known and unknown samples is obtained by:
separating the mixed molecules in the known and unknown samples by selective action by using a chromatograph and an ion mobility spectrometry instrument to obtain different chromatographic retention time information t;
separating and detecting compounds in a sample according to different mass-to-charge ratios of molecules by using the electromagnetic field effect of a mass spectrometer to obtain different mass-to-charge ratio information m/z;
analyzing the sample extract by using a chromatography-mass spectrometer to obtain original chromatography-mass spectrometry data;
preferably, the time t used for chromatographic separation is in the range of 1-10000s and the m/z scan of the ions is in the range of 50-10000 Da.
9. The method of claim 8, further comprising subjecting the acquired raw chromatography-mass spectrometry data to at least one of retention time correction, filtering, and normalization;
preferably, the method further comprises using quality control samples and internal standards of mixed standards;
preferably, the quality control sample comprises at least one of a known sample or a mixture thereof, an unknown sample or a mixture thereof, and a mixture of two or more standards, and the quality control sample is used for evaluating data quality;
preferably, internal standards of the mixed standards are used when the mixed standards are employed to improve the reproducibility of the assay and to perform retention time corrections.
10. The method of claim 8, wherein the unknown sample is at least one of a raw herb, a decoction piece, a powder, a preparation, different parts of a herb, and processed products thereof;
preferably, the preparation comprises traditional Chinese medicine granules or a traditional Chinese medicine injection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811418055.9A CN111220754A (en) | 2018-11-26 | 2018-11-26 | Ginseng recognition platform and ginseng recognition method using same |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111220754A true CN111220754A (en) | 2020-06-02 |
Family
ID=70810460
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111220754A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003037250A2 (en) * | 2001-10-26 | 2003-05-08 | Phytoceutica, Inc. | Matrix methods for analyzing properties of botanical samples |
CN103823008A (en) * | 2014-03-14 | 2014-05-28 | Beijing Center for Disease Prevention and Control | Method for detecting unknown poison by establishing liquid chromatography-mass spectrometry database |
CN105574474A (en) * | 2014-10-14 | 2016-05-11 | Dalian Institute of Chemical Physics, Chinese Academy of Sciences | Mass spectrometry information-based biological characteristic image identification method |
GB201512602D0 (en) * | 2015-07-17 | 2015-08-26 | Ixico Technologies Ltd And Imp Innovations Ltd | Method of modelling biomarkers |
CN108152434A (en) * | 2016-12-02 | 2018-06-12 | Dalian Institute of Chemical Physics, Chinese Academy of Sciences | Method for searching for specific components of traditional Chinese medicine based on visualized mass spectrometry information |
Non-Patent Citations (3)
Title |
---|
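SANG-KYUN KIM et al.: "TM-MC: a database of medicinal materials and chemical compounds in Northeast Asian traditional medicine", BMC COMPLEMENTARY MEDICINE AND THERAPIES * |
ZHOU Luwei (周璐薇) et al.: "Rapid evaluation of critical quality attributes of traditional Chinese medicine: near-infrared chemical imaging visualization technology", World Science and Technology: Modernization of Traditional Chinese Medicine (世界科学技术-中医药现代化) * |
TIAN Hongzhe (田宏哲) et al.: "Establishment and application of an LC-MS/MS mass spectral database of more than 50 pesticides in agricultural products", Food Science (食品科学) * |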
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113933442A (en) * | 2021-09-17 | 2022-01-14 | Shenzhen University | Comprehensive two-dimensional gas chromatography-mass spectrometry data analysis method, system and application |
CN113933442B (en) * | 2021-09-17 | 2023-09-29 | Shenzhen University | Comprehensive two-dimensional gas chromatography-mass spectrometry data analysis method, system and application |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200602 |