CN114283877A - Method for establishing metabolite model and metabonomics database thereof - Google Patents


Info

Publication number
CN114283877A
Authority
CN
China
Prior art keywords
metabolites
retention time
metabolite
database
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110471744.1A
Other languages
Chinese (zh)
Inventor
赵爽
韩伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Mailio Technology Co ltd
Original Assignee
Xiamen Mailio Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a method for establishing a metabolite model and a metabolomics database based on it. A metabolite retention time model is first established, and a new metabolomics database that uses predicted retention times is then built on the basis of a known metabolomics database. During model building, the known metabolites are randomly divided into a training group and a test group. The metabolite descriptors (MDs) and retention times of the training-group metabolites are used to build the model with the support vector machine (SVM) method, and the test group is used to verify the model. The retention times of the new metabolites to be included are then obtained from their descriptors using the established model. The method aims to obtain predicted retention times by computer-aided simulation when chemical standards for the metabolites are unavailable.

Description

Method for establishing metabolite model and metabonomics database thereof
Technical Field
The invention relates to the field of bioinformatics, in particular to a method for establishing a metabolite model and a metabolomics database thereof.
Background
Metabolite identification is an important step in non-targeted metabolomics. Through metabolite identification, peak signals collected by an instrument (such as high performance liquid chromatography-mass spectrometry) are converted into metabolite information, giving qualitative and quantitative results for the metabolites. Identification is typically done by comparing unknown signals with known information in a metabolite database, matching on one or more parameters to determine the metabolite. Depending on the number of parameters used, the accuracy of the match, and the matching threshold, an identification is classified as accurate (matched on two or more independent parameters) or inferred (matched on only one parameter).
In metabolomics research, the effectiveness of metabolite identification is closely related to the metabolomics database used. The more metabolites and metabolite classes the database covers, and the more detailed and accurate the parameter information for each metabolite, the better the identification: more metabolites can be identified, and the results are more accurate.
HP-CIL metabolomics refers to high-performance chemical isotope labeling combined with liquid chromatography-mass spectrometry metabolomics. Unlike conventional metabolomics analysis, HP-CIL metabolomics derivatizes samples with chemical isotope labeling reagents during sample processing. The conventional workflow is: sample pretreatment, sample preparation, instrument analysis, data processing, metabolite identification, and biological analysis. Because metabolites are detected in their original form, the database holds parameters of the underivatized metabolites. The HP-CIL workflow is: sample pretreatment, sample preparation, metabolite derivatization, instrument analysis, data processing, metabolite identification, and biological analysis, and the detected signals come from the derivatized metabolites. The database used for metabolite identification must therefore contain the parameter information of the derivatized metabolites.
In HP-CIL metabolomics, the parameters used for metabolite identification are accurate mass, retention time, and secondary (MS/MS) mass spectral fragment information. An identification made with at least two of these parameters (for example, accurate mass and retention time) is an accurate identification; one made with a single parameter (generally accurate mass) is only an estimate and is less reliable. Of these parameters, retention time generally has to be measured by analyzing the metabolite experimentally, giving the experimental retention time.
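The classification rule described above (two or more matched parameters give an accurate identification, a single matched parameter gives only an estimate) can be sketched in a few lines of Python. This is a minimal illustration, not part of the disclosed method; the parameter labels and the `classify_identification` helper are hypothetical names chosen for the sketch.

```python
from enum import Enum


class Confidence(Enum):
    ACCURATE = "accurate identification"
    ESTIMATE = "estimated (putative) identification"
    NONE = "no identification"


# Illustrative labels for the three HP-CIL parameters named in the text;
# these strings are assumptions, not identifiers from the patent.
PARAMETERS = {"accurate_mass", "retention_time", "ms2_fragments"}


def classify_identification(matched):
    """Return the confidence level for a collection of matched parameter names."""
    hits = PARAMETERS & set(matched)
    if len(hits) >= 2:
        return Confidence.ACCURATE
    if len(hits) == 1:
        return Confidence.ESTIMATE
    return Confidence.NONE
```

For instance, a peak matched on both `accurate_mass` and `retention_time` would be classified as `Confidence.ACCURATE`, while a match on `accurate_mass` alone would be `Confidence.ESTIMATE`.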
The current method for establishing an HP-CIL metabolomics database comprises the following steps:
1. Purchase chemical standards of the metabolites or synthesize them in the laboratory.
2. Dissolve the chemical standard of each metabolite separately and derivatize it with the corresponding derivatization reagent, following the same derivatization procedure.
3. Analyze the derivatized metabolite standards by high performance liquid chromatography-mass spectrometry. Each derivatized standard is analyzed individually, and its accurate mass, experimental retention time, and secondary mass spectral fragment information are collected.
4. Consolidate the collected information and establish the database. Each entry in the database is one metabolite and contains the information for these three parameters.
The information in a metabolomics database established in this way comes from real experiments and is accurate; the retention time obtained is called the experimental retention time. Metabolite identifications made with this information are highly reliable. However, purchasing or synthesizing chemical standards is costly in both money and time, and some metabolites have no chemical standard available and therefore cannot be added to the database. Because chemical standards are difficult and expensive to obtain, databases established by the existing method are small and contain few metabolites (<1000). For convenience, a database established by this method is referred to below as a CIL metabolomics database.
Disclosure of Invention
To solve the problem that databases built by the existing method are small because standards are difficult to obtain, the invention provides a new method for establishing a database. The new method obtains predicted retention times of metabolites by computer-aided simulation, without requiring chemical standards of the metabolites.
In order to achieve the above object, the present invention provides a method for modeling a metabolite, comprising the steps of:
1) establishing a metabolite retention time model:
a) searching PubChem for the metabolites in a known metabolomics database to obtain the SMILES structure and other related information of each metabolite;
b) analyzing all metabolites in the known metabolomics database with the Chemistry Development Kit (CDK), based on the SMILES structures obtained from PubChem, to obtain their CDK Descriptors;
c) combining the PubChem Descriptors and the CDK Descriptors to obtain a complete property representation of every metabolite, namely its Metabolite Descriptors (MDs);
d) pairing the MDs of each metabolite with the corresponding retention time recorded in the known metabolomics database;
e) randomly dividing all metabolites in the known metabolomics database into two groups: the training group, containing 6/7 of the metabolites, and the test group, containing the remaining 1/7;
f) establishing a regression model from the MDs and retention times of the training-group metabolites using the support vector machine (SVM) method, with the requirement Q² > 0.7;
2) Creating a new metabolite list:
a) determining the metabolites to be included in the database;
b) obtaining the PubChem SMILES structures and CDK Descriptors of these metabolites by the method described above;
c) combining the PubChem Descriptors and CDK Descriptors of these metabolites to obtain a complete property representation of each metabolite, namely its MDs;
3) obtaining a predicted retention time:
a) using the MDs of each new metabolite as input to the established model to obtain its predicted retention time;
b) building the database from the list of new metabolites, their accurate masses, and the predicted retention times obtained.
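Steps 1c), 1d) and 2c) amount to merging two descriptor sets per metabolite and arranging them in a fixed column order before modeling. The following Python sketch shows one way this could look, assuming each descriptor source is available as a name-to-value dictionary; the helper names and the numeric values are hypothetical and only illustrate the data handling, not the PubChem or CDK APIs.

```python
def build_md(pubchem_desc, cdk_desc):
    """Merge PubChem and CDK descriptor dicts into one MD dict.

    A descriptor present in both sources must carry the same value,
    otherwise the merge is rejected.
    """
    md = dict(pubchem_desc)
    for name, value in cdk_desc.items():
        if name in md and md[name] != value:
            raise ValueError(f"conflicting values for descriptor {name!r}")
        md[name] = value
    return md


def to_feature_row(md, feature_names):
    """Order MD values by a fixed feature list so every metabolite has the same columns."""
    return [float(md[name]) for name in feature_names]


# Hypothetical descriptor values for one metabolite (not taken from the patent tables).
md = build_md({"XLogP": 1.5}, {"nRotB": 3, "MLogP": 1.1})
row = to_feature_row(md, ["XLogP", "nRotB", "MLogP"])
```

Keeping the feature order fixed matters because the same column layout has to be used for the training group, the test group, and the new metabolites whose retention times are later predicted.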
Further, the specific steps of establishing the model by the SVM method in step f) are as follows:
i) running the e1071 package in R;
ii) using the MDs and retention times of the training-group metabolites as input variables;
iii) using a radial basis kernel to transform the MD data into a high-dimensional data space; the radial basis kernel equation is
$K(u, v) = e^{-r \lVert u - v \rVert^{2}}$
where u and v are variables, e is the natural constant, and r is the kernel parameter; a further parameter, cost, is used in model building but does not appear in the kernel equation;
iv) running the program after the parameters are determined to establish a regression model between the MD data and the retention time; the model can be expressed as rt ~ XLOGP + LipinskiFailures + nRotB + MLogP + nAtomLAC + ..., where each variable is an MD constrained by a weight (w), and the model has an intercept (b); preferably, the weights w are 109.6916 for XLOGP, -45.92101 for LipinskiFailures, -31.93641 for nRotB, 128.7612 for MLogP, and 96.8386 for nAtomLAC, and b is -1.53251329.
Furthermore, after the model is established, it is verified with the metabolites of the test group, as follows:
1) loading the established regression model (an .Rdata file) in the R program;
2) inputting the MDs of the test-group metabolites as variables;
3) running the program to obtain the predicted retention times, and comparing them with the experimental retention times;
the criteria for successful retention time prediction by the model are:
the predicted and experimental retention times of all test-group metabolites are linearly correlated;
the difference between the predicted and experimental retention time of every metabolite falls within a certain range; preferably, this range, which is used as the retention time threshold for metabolite identification, is within 180 seconds.
Further, the known metabolomics database is a known CIL metabolomics database.
The invention also provides a method for establishing a metabolomics database, characterized in that the metabolomics database is built from the metabolite data obtained by the above method.
The metabonomics database established by the invention has the following characteristics:
1) Because chemical standards of the metabolites are lacking, the retention time obtained by this method is a predicted retention time, not an experimental one. Predicted retention times are less accurate than experimental ones, so identifications made with them are less reliable than those made with experimental retention times.
2) Identifications made with a database built in this way are regarded as high-confidence putative results: without experimental information from chemical standards they can only be putative, but because two independent parameters are used (accurate mass and predicted retention time), they are more reliable than ordinary single-parameter estimates.
3) The metabolomics database is established by computer-aided simulation and requires neither chemical standards nor extensive experiments, avoiding large economic and time costs.
4) A database established in this way covers more metabolites, up to tens of thousands.
Drawings
FIG. 1 is a flow chart of the method of the present invention, using a CIL metabolomics database as an example;
FIG. 2 is a plot of the retention times predicted by the established model against the experimental retention times, showing their linear correlation.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it. Where the examples do not specify particular techniques or conditions, they are performed according to the techniques or conditions described in the literature in the art or according to the product specifications. Reagents and instruments whose manufacturers are not indicated are conventional, commercially available products.
The following example follows the workflow shown in FIG. 1.
Example 1:
Dns-Library is a CIL metabolomics database, that is, one established by the prior art method. Its establishment and the database itself are published in:
Tao Huan, Yiman Wu, Chenqu Tang, Guohui Lin and Liang Li, 2015, "DnsID in MyCompoundID for Rapid Identification of Dansylated Amine- and Phenol-Containing Metabolites in LC-MS-Based Metabolomics", Anal. Chem. 87, 9838–9845.
The database contains only 273 metabolites, so when it is used for metabolite identification only a small number of metabolites can be identified from instrument data. For human urine sample analysis, 105 metabolites were identified using accurate mass and experimental retention time.
Using the method provided by the invention, a new metabolomics database that uses predicted retention times is established on the basis of this CIL metabolomics database, following the steps above. During model building, the 273 metabolites are randomly divided into a training group and a test group. With 1/7 of the metabolites assigned to the test group, 234 of the 273 metabolites form the training group and 39 form the test group. The training group is used to build the model and the test group is used to verify it.
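The random 6/7 to 1/7 split can be sketched as follows. This is a minimal Python illustration of the partitioning arithmetic only; the source does not say how the random split was implemented, so the `split_train_test` helper, the fixed seed, and the use of integer placeholders for metabolites are assumptions.

```python
import random


def split_train_test(items, test_fraction=1 / 7, seed=0):
    """Shuffle the items and return (training group, test group)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 273 placeholder metabolite indices, matching the size of the database in the example.
train, test = split_train_test(range(273))
print(len(train), len(test))  # 234 39
```

Fixing the seed is a design choice for reproducibility: rerunning the sketch always yields the same two groups, which makes later validation results comparable.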
The specific model establishing steps are as follows:
1) establishing a metabolite retention time model:
a) The 273 metabolites in the CIL metabolomics database are searched on PubChem (https://pubchem.ncbi.nlm.nih.gov/) to obtain their SMILES structures and other related information (Table 1 gives a few examples);
Table 1. SMILES structures and other related data for several metabolites (table image not reproduced)
b) The 273 metabolites in the CIL metabolomics database are analyzed with the Chemistry Development Kit (https://cdk.github.io/), based on the SMILES structures obtained from PubChem, to obtain their CDK Descriptors;
Table 2. CDK Descriptors data for several metabolites (table images not reproduced)
c) The PubChem Descriptors and CDK Descriptors are combined to obtain a complete property representation of each of the 273 metabolites (namely Metabolite Descriptors, MDs);
d) The MDs of the 273 metabolites are paired with the corresponding retention times (namely Experimental RT) recorded in the CIL metabolomics database; Table 3 lists the experimental retention times of three of the metabolites;
Table 3. Experimental retention time data for three metabolites (table image not reproduced)
e) The 273 metabolites in the CIL metabolomics database are randomly divided into two groups: the training group, containing 6/7 of the metabolites (234 metabolites), and the test group, containing the remaining 1/7 (39 metabolites); see Tables 4 and 5 for examples.
Table 4. Examples of training-group metabolites (table image not reproduced)
Table 5. Examples of test-group metabolites (table images not reproduced)
f) The MDs and retention times of the training-group metabolites are used to build a model with the support vector machine (SVM) method. The method maps data from a low-dimensional space into a high-dimensional space in order to capture the relationships among the variables of the training data, and builds a model describing the mathematical relationship between the MDs and retention time. The specific steps are:
i) Run the e1071 package in R (https://cran.r-project.org/web/packages/e1071/index.html);
ii) Use the MDs and retention times of the training-group metabolites as input variables;
iii) Transform the MD data into a high-dimensional data space using a radial basis kernel. The radial basis kernel equation is
$K(u, v) = e^{-r \lVert u - v \rVert^{2}}$
where u and v are variables, e is the natural constant, and r is a parameter; a further parameter, cost, is used in model building but does not appear in this equation. In this example, r = 0.000244140625 and cost = 32;
iv) Run the program after the parameters are determined to establish a regression model between the MD data and retention time (the model is a high-dimensional matrix model and is stored as an .Rdata file).
The model can be expressed as rt ~ XLOGP + LipinskiFailures + nRotB + MLogP + nAtomLAC + ..., where each variable is an MD constrained by a weight (w), and the model has an intercept (b). In this example, the weights w are 109.6916 for XLOGP, -45.92101 for LipinskiFailures, -31.93641 for nRotB, 128.7612 for MLogP, and 96.8386 for nAtomLAC, and b is -1.53251329.
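To make the role of the parameter r concrete, the radial basis kernel can be evaluated directly. The sketch below is in Python rather than R and assumes the standard radial form documented for e1071, exp(-gamma * |u - v|^2), with the patent's r playing the role of gamma; the kernel equation itself is only an image in the source, so this form is an assumption. The function name `rbf_kernel` is hypothetical.

```python
import math

R = 0.000244140625  # kernel parameter r from the example (equal to 2**-12)
COST = 32           # cost parameter from the example; used by the SVM solver, not by the kernel


def rbf_kernel(u, v, r=R):
    """Radial basis kernel exp(-r * ||u - v||^2) for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-r * squared_distance)
```

With this form, identical descriptor vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a small r such as the one above makes that decay slow, so metabolites with fairly different descriptors still influence each other's predictions.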
After the model is established, it is verified with the test-group metabolites: the model is used to predict the retention times of the 39 metabolites in the test group, as follows:
i) Load the established regression model (.Rdata file) in R;
ii) Use the MDs of the test-group metabolites as input variables;
iii) Run the program to obtain the predicted retention times.
The predicted retention times are compared with the experimental retention times. The criteria for successful retention time prediction by the model are:
the predicted and experimental retention times of all test-group metabolites are linearly correlated, as shown in FIG. 2 and Table 6;
the difference between the predicted and experimental retention time of every metabolite falls within a certain range (this range is used as the retention time threshold for metabolite identification; in this example it is within 180 seconds).
When the SVM model is built, its Q² value is calculated; Q² > 0.7 is required.
Table 6. Predicted and experimental retention times for five test-group metabolites

Name                     Experimental RT (s)   Predicted RT (s)   Difference (s)
Citrulline               224.4                 156.76             66.6398
5-Aminopentanoic acid    520.8                 576                55.2
Homovanillic acid        990.6                 1032               41.4
Serotonin                1479                  1374.42            104.584
L-Thyronine              1526.4                1500.68            25.7188
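The two success criteria can be checked mechanically. The Python sketch below applies them to the five rows of Table 6 only, assuming the retention times are in seconds; it is not the validation code used for the full 39-metabolite test group, and the `pearson_r` and `validate` helpers are hypothetical names.

```python
import math

# (name, experimental RT, predicted RT) copied from Table 6; seconds are assumed.
TABLE_6 = [
    ("Citrulline", 224.4, 156.76),
    ("5-Aminopentanoic acid", 520.8, 576.0),
    ("Homovanillic acid", 990.6, 1032.0),
    ("Serotonin", 1479.0, 1374.42),
    ("L-Thyronine", 1526.4, 1500.68),
]
RT_THRESHOLD = 180.0  # retention time threshold from the example


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validate(rows, threshold=RT_THRESHOLD):
    """Return (R squared, largest absolute RT difference, whether all differences pass)."""
    experimental = [row[1] for row in rows]
    predicted = [row[2] for row in rows]
    max_diff = max(abs(e - p) for e, p in zip(experimental, predicted))
    return pearson_r(experimental, predicted) ** 2, max_diff, max_diff <= threshold


r_squared, max_diff, within_threshold = validate(TABLE_6)
```

For these five rows the largest difference is the Serotonin entry, which stays below the 180-second threshold, so `within_threshold` is true.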
2) Creating a new metabolite list:
a) A list is compiled of 3281 metabolites that are to be included in the database but whose chemical standards are difficult to obtain, for example 3-Methyl-histidine, Trans-4-hydroxy-L-Proline, Sepiapterin, Malonyl-CoA, and 2-Hydroxyestradiol.
b) The PubChem SMILES structures and CDK Descriptors of these metabolites are obtained by the method described above; Tables 7 and 8 show some of the metabolites.
Table 7. Data for several metabolites (table image not reproduced)
Table 8. Data for several metabolites (table images not reproduced)
c) The PubChem Descriptors and CDK Descriptors of these metabolites are combined to obtain a complete property representation of each metabolite (namely Metabolite Descriptors, MDs);
3) Obtaining a predicted retention time:
a) The MDs of each new metabolite are input to the established model, which describes the mathematical relationship between MDs and retention time, to obtain the predicted retention time of the new metabolite; Table 9 lists the predicted retention times of several metabolites.
Table 9. Predicted retention times for several metabolites (table image not reproduced)
b) The database is built from the list of new metabolites, their accurate masses (taken from the metabolite list), and the predicted retention times obtained.
The verification results are as follows:
The results show that when the test-group metabolites are predicted with the newly established model, the predicted retention times are linearly correlated with the experimental retention times (FIG. 2), with R² = 0.9624. Cross-validation gives Q² = 0.792, showing that the model was established successfully. (The model is judged by performing cross-validation and using the resulting Q², which represents the predictive performance of the model: the higher Q² is, that is, the closer to 1, the more accurate the model. Q² is generally required to be greater than 0.7.)
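The source does not give the formula used for Q². A common definition, assumed here, is Q² = 1 - PRESS/TSS, where PRESS is the sum of squared differences between observed values and their cross-validated predictions, and TSS is the total sum of squares about the observed mean. The Python sketch below implements only that assumed definition; the function names are hypothetical and the cross-validation loop that produces the predictions is not shown.

```python
def q_squared(observed, cv_predicted):
    """Cross-validated Q^2 = 1 - PRESS / TSS (assumed definition, see lead-in)."""
    if len(observed) != len(cv_predicted):
        raise ValueError("observed and cv_predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    tss = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - press / tss


def passes_q2(observed, cv_predicted, minimum=0.7):
    """Apply the Q^2 > 0.7 acceptance criterion stated in the text."""
    return q_squared(observed, cv_predicted) > minimum
```

Under this definition, perfect cross-validated predictions give Q² = 1, and a model that does no better than always predicting the mean gives Q² = 0, which is why values close to 1 indicate good predictive performance.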
The newly established metabolomics database contains 3554 metabolites in total (the 273 metabolites of the original database plus the 3281 newly added metabolites).
Chemical standards were purchased for some of the metabolites included by this method, and their experimental retention times were measured and compared with the predicted values to further verify the accuracy of the predicted retention times. The verification results are consistent with the test-group results; examples are given in Table 10:
Table 10. Verification of the differences between predicted and experimental retention times (table images not reproduced)
Analysis of human urine:
Using human urine as the sample, 380 metabolites were identified, as shown in Table 11. Other novel metabolites are listed in Table 12.
The human urine sample was subjected to sample preparation, LC-MS analysis and data processing using the HP-CIL metabolomics technique, and metabolite identification was performed on the resulting peak-pair list. Unknown peak pairs in the urine sample were identified by matching their accurate mass and retention time against those of the metabolites in the database. When the newly established database is used, the retention time matched against is the retention time predicted by the model. The matching thresholds are 10 ppm for accurate mass (m/z) and 180 seconds for retention time.
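The two-parameter matching step can be sketched as a simple filter over database entries. The Python code below is an illustration under stated assumptions: the dictionary layout of a database entry (`name`, `mz`, `rt`) and the function names are hypothetical, and the mass error is computed relative to the database value, which the source does not specify.

```python
def ppm_error(observed_mz, reference_mz):
    """Absolute mass error in parts per million, relative to the reference m/z."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def match_peak(peak_mz, peak_rt, database, ppm_tol=10.0, rt_tol=180.0):
    """Return database entries within both the mass and retention time thresholds.

    `database` is assumed to be a list of dicts with keys 'name', 'mz' and
    'rt' (retention time in seconds); this layout is hypothetical.
    """
    return [
        entry
        for entry in database
        if ppm_error(peak_mz, entry["mz"]) <= ppm_tol
        and abs(peak_rt - entry["rt"]) <= rt_tol
    ]
```

Requiring both conditions is what makes the result a two-parameter identification: a peak whose mass matches within 10 ppm but whose retention time is more than 180 seconds from the predicted value is not reported as a match.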
Table 11. The 380 metabolites identified in human urine (table images not reproduced)
Table 12. Other novel metabolites (table image not reproduced)
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (5)

1. A method of modeling a metabolite, comprising the steps of:
1) establishing a metabolite retention time model:
a) searching PubChem for the metabolites in a known metabolomics database to obtain the SMILES structure and other related information of each metabolite;
b) analyzing all metabolites in the known metabolomics database with the Chemistry Development Kit (CDK), based on the SMILES structures obtained from PubChem, to obtain their CDK Descriptors;
c) combining the PubChem Descriptors and the CDK Descriptors to obtain a complete property representation of every metabolite, namely its Metabolite Descriptors (MDs);
d) pairing the MDs of each metabolite with the corresponding retention time recorded in the known metabolomics database;
e) randomly dividing all metabolites in the known metabolomics database into two groups: the training group, containing 6/7 of the metabolites, and the test group, containing the remaining 1/7;
f) establishing a regression model from the MDs and retention times of the training-group metabolites using the support vector machine (SVM) method, with the requirement Q² > 0.7;
2) Creating a new metabolite list:
a) determining the metabolites to be included in the database;
b) obtaining the PubChem SMILES structures and CDK Descriptors of these metabolites by the method described above;
c) combining the PubChem Descriptors and CDK Descriptors of these metabolites to obtain a complete property representation of each metabolite, namely its MDs;
3) obtaining a predicted retention time:
a) using the MDs of each new metabolite as input to the established model to obtain its predicted retention time;
b) building the database from the list of new metabolites, their accurate masses, and the predicted retention times obtained.
2. The method for modeling a metabolite according to claim 1, wherein the specific steps of establishing the model by the SVM method in step f) are as follows:
i) running the e1071 package in R;
ii) using the MDs and retention times of the training-group metabolites as input variables;
iii) using a radial basis kernel to transform the MD data into a high-dimensional data space; the radial basis kernel equation is
$K(u, v) = e^{-r \lVert u - v \rVert^{2}}$
where u and v are variables, e is the natural constant, and r is the kernel parameter; a further parameter, cost, is used in model building but does not appear in the kernel equation;
iv) running the program after the parameters are determined to establish a regression model between the MD data and the retention time; the model can be expressed as rt ~ XLOGP + LipinskiFailures + nRotB + MLogP + nAtomLAC + ..., where each variable is an MD constrained by a weight (w), and the model has an intercept (b); preferably, the weights w are 109.6916 for XLOGP, -45.92101 for LipinskiFailures, -31.93641 for nRotB, 128.7612 for MLogP, and 96.8386 for nAtomLAC, and b is -1.53251329.
3. The method for modeling metabolites according to claim 1, wherein after the modeling, the model is validated with the metabolites of the test group by the following steps:
1) loading the established regression model, an .Rdata file, in the R program;
2) inputting MD of the test group metabolites as variables;
3) running the program to obtain a predicted retention time; comparing the predicted retention time with the experimental retention time;
the criteria for a successful retention time prediction by the model are:
the predicted retention times and the experimental retention times of all the metabolites in the test group are linearly related;
the difference between the predicted retention time and the experimental retention time of every metabolite is within a certain range; preferably, this range, which is used as the retention time threshold for metabolite identification, is within 180 seconds.
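The two success criteria can be checked with a few lines of code. The following Python sketch is only one possible reading, under stated assumptions: linear relatedness is measured with the Pearson correlation coefficient and a hypothetical cutoff `min_r` (the claim gives no numeric cutoff), and Q2 from claim 1 is computed as 1 - SS_res / SS_tot on the test-group predictions (the claims do not define how Q2 is calculated). All function names and retention times below are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def q_squared(predicted, experimental):
    """Assumed definition: 1 - (residual sum of squares / total sum of squares)."""
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum((e - p) ** 2 for p, e in zip(predicted, experimental))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot


def validate(predicted, experimental, threshold_s=180.0, min_r=0.9, min_q2=0.7):
    """Apply the correlation, Q2 and maximum-deviation checks to test-group data."""
    r = pearson_r(predicted, experimental)
    q2 = q_squared(predicted, experimental)
    max_abs_error = max(abs(p - e) for p, e in zip(predicted, experimental))
    return {
        "pearson_r": r,
        "q2": q2,
        "max_abs_error_s": max_abs_error,
        "passed": r >= min_r and q2 >= min_q2 and max_abs_error <= threshold_s,
    }


# Invented retention times in seconds, for illustration only.
experimental_rt = [120, 300, 480, 660, 900]
predicted_rt = [150, 280, 500, 700, 860]
report = validate(predicted_rt, experimental_rt)
```

A constant offset larger than the threshold fails the check even when the correlation is perfect, which is why the deviation criterion is tested separately from the linear relationship.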
4. The method for modeling metabolites according to claim 1, wherein said known metabonomics database is a known CIL metabonomics database.
5. A method for creating a metabolomics database, wherein the metabolomics database is obtained by using the metabolite data obtained by the method according to any one of claims 1-4.
CN202110471744.1A 2021-04-29 2021-04-29 Method for establishing metabolite model and metabonomics database thereof Pending CN114283877A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110471744.1A CN114283877A (en) 2021-04-29 2021-04-29 Method for establishing metabolite model and metabonomics database thereof

Publications (1)

Publication Number Publication Date
CN114283877A (en) 2022-04-05

Family

ID=80868299

Country Status (1)

Country Link
CN (1) CN114283877A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination