CN114594171A - Deep annotation method for metabolome - Google Patents

Publication number
CN114594171A
Authority
CN
China
Prior art keywords
metabolites
metabolite
molecular
measured
molecular structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011407735.8A
Other languages
Chinese (zh)
Other versions
CN114594171B (en
Inventor
许国旺
李在芳
王鑫欣
亓彦鹏
路鑫
林晓惠
赵春霞
赵欣捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Dalian Institute of Chemical Physics of CAS
Original Assignee
Dalian University of Technology
Dalian Institute of Chemical Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology and Dalian Institute of Chemical Physics of CAS
Priority to CN202011407735.8A
Publication of CN114594171A
Application granted
Publication of CN114594171B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/26 Conditioning of the fluid carrier; Flow patterns
    • G01N30/28 Control of physical parameters of the fluid carrier
    • G01N30/34 Control of physical parameters of the fluid carrier of fluid composition, e.g. gradient
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G01N30/7233 Mass spectrometers interfaced to liquid or supercritical fluid chromatograph
    • G01N30/724 Nebulising, aerosol formation or ionisation
    • G01N30/7266 Nebulising, aerosol formation or ionisation by electric field, e.g. electrospray
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/86 Signal analysis
    • G01N30/8675 Evaluation, i.e. decoding of the signal into analytical information
    • G01N30/8679 Target compound analysis, i.e. whereby a limited number of peaks is analysed
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/86 Signal analysis
    • G01N30/8675 Evaluation, i.e. decoding of the signal into analytical information
    • G01N30/8686 Fingerprinting, e.g. without prior knowledge of the sample components

Landscapes

  • Chemical & Material Sciences (AREA)
  • Biochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Dispersion Chemistry (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a deep annotation method for the metabolome of complex biological samples. A biological sample extract is subjected to non-targeted metabolomics analysis by ultra-high performance liquid chromatography-high resolution mass spectrometry (UHPLC-HRMS) to obtain its chromatography-mass spectrometry information. Matching candidate metabolites are screened from a metabolome database according to the primary-ion mass-to-charge ratios and experimental retention times in the non-targeted data, and a metabolite molecular structure association network is constructed from the molecular fingerprint similarity of the candidate metabolites. The metabolome is then identified on a large scale using the experimental non-targeted UHPLC-HRMS data, with the molecular structure association network as the background network. The method does not depend on a large-scale experimental secondary spectrum database and offers higher annotation coverage and reliability.

Description

Deep annotation method for metabolome
Technical Field
The invention relates to the field of analytical chemistry and metabonomics, in particular to a metabolite deep annotation method based on a molecular structure association network.
Background of the study
Metabolites are diverse and species-specific, and metabolome characterization has long been a bottleneck in metabolomics and analytical chemistry. Non-targeted ultra-high performance liquid chromatography-high resolution mass spectrometry (UHPLC-HRMS) is one of the mainstream technologies in metabolomics research, and with the continuous progress of high resolution mass spectrometry, generating high-throughput metabolomics data is no longer the main bottleneck. Non-targeted UHPLC-HRMS methods can detect tens of thousands of mass spectral peaks (metabolic features) in a single run, but the number of metabolites that can be obtained from them is typically fewer than 1000, and the number that can be identified is typically only a few hundred. Because so little of the non-targeted experimental data can be annotated, many of the differential metabolites discovered cannot be used in subsequent studies, such as functional and mechanistic research, because their structures are unknown.
High-reliability metabolite identification by mass spectrometry typically requires library matching on accurate mass, retention time, and secondary mass spectra (MS/MS). Metabolome databases currently record a large number of endogenous metabolites, but they lack chromatographic retention times and contain few experimental secondary spectra; most of the recorded secondary spectra are theoretically predicted and differ considerably from measured spectra. In addition, secondary spectra acquired on different types of mass spectrometers reproduce poorly, which limits the identification power of database searching and seriously hampers effective metabolite identification. A deep annotation method for non-targeted UHPLC-HRMS metabolome data is therefore urgently needed.
Disclosure of Invention
The invention provides a method for large-scale identification of the metabolome. To this end, a biological sample extract is subjected to non-targeted metabolomics analysis by ultra-high performance liquid chromatography-high resolution mass spectrometry to obtain its metabolome-related chromatography-mass spectrometry information; candidate metabolites are collected from a metabolome database based on the non-targeted metabolomics data; a metabolite molecular structure association network is constructed from the molecular fingerprint similarity of the candidate metabolites; and the metabolome is identified on a large scale using the experimental non-targeted data, with the molecular structure association network as the background network. The technical scheme adopted by the invention comprises the following steps:
firstly, performing non-targeted metabolomics analysis on an extract of the biological sample to be tested by ultra-high performance liquid chromatography-high resolution mass spectrometry, and obtaining the chromatography-mass spectrometry information of the extract metabolome, including the experimentally measured retention time $t_{r,\mathrm{measured}}$ of each metabolite peak, the primary mass spectral information, i.e., the primary-ion mass-to-charge ratio $m/z_{\mathrm{measured}}$, and the corresponding secondary mass spectral information, i.e., the mass-to-charge ratios and intensities of the secondary ions; primary ions are the ions collected directly after the compound is ionized; secondary ions are the ions collected after the primary ions are fragmented by collision at a given energy;
secondly, constructing a candidate metabolite molecular structure database; the primary-ion mass-to-charge ratio $m/z_{\mathrm{measured}}$ and experimental retention time $t_{r,\mathrm{measured}}$ of all metabolite peaks in the biological sample extract are obtained from the experiment of the first step. The theoretical primary-ion mass-to-charge ratio $m/z_{\mathrm{theoretical}}$ is calculated from the molecular formula of each metabolite in an open-source metabolome database, and the predicted retention time $t_{r,\mathrm{predicted}}$ of each metabolite is obtained from a retention time prediction model constructed from the structure-retention relationship of known metabolites. The $m/z_{\mathrm{theoretical}}$ and $t_{r,\mathrm{predicted}}$ of the metabolites in the open-source metabolome database are matched against the $m/z_{\mathrm{measured}}$ and $t_{r,\mathrm{measured}}$ of the experimental metabolite data, and metabolites that simultaneously satisfy
$$|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}| < 2\ \mathrm{min}$$
and
$$\frac{|m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}|}{m/z_{\mathrm{theoretical}}} \times 10^{6} < 5\ \mathrm{ppm}$$
are taken as candidate metabolites to construct a candidate metabolite database; the database contains the simplified molecular-input line-entry system (SMILES) string, name, molecular formula, molecular structure and predicted retention time of each metabolite;
thirdly, constructing the molecular structure association network of the metabolome; a molecular fingerprint is obtained from the molecular structure of each metabolite in the candidate metabolite database, where the fingerprint may be any one of the Morgan, MACCS, atom-pair and Daylight fingerprints; the similarity between the molecular fingerprints of every pair of candidate metabolites is calculated with the open-source toolkit RDKit. A similarity threshold is set and, taking the metabolites as nodes and the fingerprint similarities as edges, metabolites whose fingerprint similarity is greater than or equal to the threshold are connected to construct the metabolome-level molecular structure association network;
fourthly, carrying out large-scale identification of metabolites based on the molecular structure association network; the molecular structure association network constructed in the third step is taken as the background network and the candidate metabolite database as the reference. 5-50 metabolites are selected from the background network and identified in the non-targeted UHPLC-HRMS metabolome experimental data using their standard samples; these 5-50 metabolites serve as seed metabolites. The seed metabolites are mapped into the molecular structure association network, and their adjacent metabolites, i.e., the metabolites directly connected to them by an edge in the network, are obtained. The secondary mass spectrum of a seed metabolite is assigned to each adjacent metabolite as its pseudo-secondary mass spectrum, and the search thresholds are set to $|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}| < 2$ min, $|m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}| / m/z_{\mathrm{theoretical}} \times 10^{6} < 5$ ppm, and a similarity of at least 0.5 between the experimental secondary mass spectrum of the metabolite peak and the pseudo-secondary mass spectrum of the adjacent metabolite. The experimental data are searched for a metabolite peak that matches the $m/z_{\mathrm{theoretical}}$, $t_{r,\mathrm{predicted}}$ and pseudo-secondary mass spectrum of each adjacent metabolite; if the match succeeds, the identification of that metabolite peak is complete. The identified metabolites are used as new seeds, and the identification process is repeated until no new metabolites are identified. When there are multiple matching results, they are scored; the higher the score of a metabolite peak, the more reliable its identification, and a metabolite identified in this way is not used as a new seed.
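The score is calculated as
$$\mathrm{score} = 0.25 \times \left(1 - \frac{|m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}| \times 10^{6}}{m/z_{\mathrm{theoretical}} \times 5}\right) + 0.25 \times \left(1 - \frac{|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}|}{2}\right) + 0.5 \times S_{\mathrm{MS/MS}}$$
where $S_{\mathrm{MS/MS}}$ is the secondary spectrum similarity.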
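The scoring rule can be sketched in a few lines of Python. This is only an illustration: the function and parameter names (`ppm_error`, `match_score`, `ppm_tol`, `rt_tol`) are hypothetical, and the three peaks are the ones quoted in Example 1 for adjacent metabolite 5 (theoretical m/z 339.1345, predicted retention time 6.19 min).

```python
def ppm_error(mz_theoretical, mz_measured):
    """Relative mass error in parts per million."""
    return abs(mz_theoretical - mz_measured) / mz_theoretical * 1e6


def match_score(mz_theoretical, mz_measured, rt_predicted, rt_measured,
                ms2_similarity, ppm_tol=5.0, rt_tol=2.0):
    """Weighted score: 25% mass accuracy, 25% retention time, 50% MS/MS similarity."""
    mass_term = 1.0 - ppm_error(mz_theoretical, mz_measured) / ppm_tol
    rt_term = 1.0 - abs(rt_predicted - rt_measured) / rt_tol
    return 0.25 * mass_term + 0.25 * rt_term + 0.5 * ms2_similarity


# (measured m/z, measured retention time in min, MS/MS similarity) from Example 1
peaks = [(339.1332, 5.89, 0.77), (339.1330, 5.47, 0.61), (339.1335, 6.63, 0.63)]
scores = [match_score(339.1345, mz, 6.19, rt, sim) for mz, rt, sim in peaks]

# Output candidates in descending order of score, as the text describes
ranked = sorted(zip(peaks, scores), key=lambda pair: pair[1], reverse=True)
for (mz, rt, sim), score in ranked:
    print(f"m/z {mz:.4f}, {rt:.2f} min, similarity {sim:.2f}: score {score:.2f}")
```

Recomputed from the rounded values printed in the text, the scores agree with the reported 0.66, 0.50 and 0.62 to within about 0.01, and the ranking is the same.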
According to the invention, on the premise that MS/MS of metabolites with similar molecular structures have similarity, an experimental data-oriented large-scale qualitative method based on a molecular structure correlation network is established, and the structural identification of unknown metabolites is realized. By establishing a candidate metabolite database and a candidate metabolite molecular structure correlation network thereof and adopting the network to identify the metabolite without a standard MS/MS spectrogram, the structural identification of the metabolite can be independent of a large-scale standard MS/MS database. The invention is a deep annotation method of the metabolome without depending on a large-scale experimental secondary spectrogram database, can realize the large-scale and reliable qualitative of the metabolome, and obviously expands the coverage of the annotation of the metabolome.
Drawings
FIG. 1 shows a molecular structure association network (similarity threshold of metabolite molecular fingerprints is 0.7);
FIG. 2 is a partial enlarged view of the molecular structure association network;
FIG. 3 is a schematic diagram of a metabolite characterization process based on a molecular structure association network;
FIG. 4A is the molecular structure association network for corn silk in positive ion mode;
FIG. 4B is the molecular structure association network for corn silk in negative ion mode.
Detailed Description
The following detailed description of the invention refers to the accompanying drawings in which: the present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following embodiments.
Example 1
To confirm the effectiveness and feasibility of the invention, a mixed standard consisting of 173 hydroxycinnamic acid amides (including N-Cinnamoyl-putrescine, N-(p-Coumaroyl)-cadaverine, N-(p-Coumaroyl)-agmatine, N,N'-Caffeoyl-feruloyl-putrescine, N,N',N''-Caffeoyl-feruloyl-sinapoyl-spermidine, N,N',N''-Tris-feruloyl-spermidine and the like) was spiked into a plant extract, and UHPLC-HRMS data were collected for the spiked extract. The identification of the hydroxycinnamic acid amides in the resulting non-targeted metabolomics data is used as an example to illustrate the principle of the invention.
Extraction of the plant tissue metabolome: metabolites were extracted from corn silk using a plant metabolomics method. First, 50 mg of corn silk powder was weighed into a 1.5 mL centrifuge tube, 1.0 mL of methanol/water (4:1, v/v) extractant was added, and the mixture was vortexed for 5 min and centrifuged at 15000 rpm for 10 min at 4 °C. 700 μL of the supernatant was transferred to a vacuum centrifugal concentrator and lyophilized. The lyophilized sample was redissolved in 100 μL of methanol/water (4:1, v/v), vortexed for 1 min, and centrifuged at 15000 rpm for 10 min at 4 °C.
Acquisition of non-targeted chromatography-mass spectrometry information: data were collected on an ACQUITY UPLC ultra-high performance liquid chromatography system (Waters, Milford, MA, USA) coupled to a Q Exactive HF high resolution mass spectrometer (Thermo Fisher Scientific, Rockford, IL, USA).
The liquid chromatography conditions for the positive ion mode of the electrospray ionization source were as follows: phases A and B were 0.1% formic acid in water (v/v) and 0.1% formic acid in acetonitrile (v/v), respectively, and the flow rate was 0.35 mL/min. The initial elution gradient was 5% B, held for 1 min; B was then increased linearly to 100% within 23 min and held for 4 min, followed by a linear return to the initial composition within 0.1 min, which was held for 2.9 min, for a total analysis time of 30 min. Samples were separated on an ACQUITY BEH C18 column (100 mm × 2.1 mm, 1.7 μm; Waters, Milford, MA, USA). The column temperature was 50 °C, the autosampler temperature was set to 4 °C, and the injection volume was 5 μL.
The liquid chromatography conditions for the negative ion mode of the electrospray ionization source were as follows: phases A and B were 6.5 mM ammonium bicarbonate in water and 6.5 mM ammonium bicarbonate in 95% methanol/water (v/v), respectively, and the flow rate was 0.35 mL/min. The initial elution gradient was 2% B, held for 1 min; B was increased linearly to 100% at 18 min and held until 22 min, then returned to the initial composition at 22.1 min and held until 25 min. Samples were separated on an ACQUITY HSS T3 column (100 mm × 2.1 mm, 1.8 μm; Waters, Milford, MA, USA). The column temperature was 50 °C, the autosampler temperature was set to 4 °C, and the injection volume was 5 μL.
The Q Exactive HF mass spectrometry conditions were as follows: the scan mode was full scan with automatically triggered data-dependent secondary mass spectrometry (full MS/data-dependent MS2). For the full scan, the resolution was 120,000, and the automatic gain control target (AGC target) and maximum injection time (maximum IT) were set to $3 \times 10^{6}$ ions and 100 ms, respectively. The full-scan mass range was m/z 85-1250. For the secondary mass spectra, the AGC target and maximum IT were set to $1 \times 10^{5}$ ions and 50 ms, respectively. The isolation window was m/z 1.0. Stepped normalized collision energies (NCE) of 15%, 30% and 45% were used. Secondary mass spectra were triggered for the 10 most intense ions in each full scan cycle. An inclusion list was added and switched on. The electrospray voltage was 3.5 kV in positive ion mode and 3.0 kV in negative ion mode, the ion transfer tube temperature was 320 °C, and the auxiliary gas temperature was 350 °C. The sheath and auxiliary gas flow rates were 45 and 10 (arbitrary units), respectively. The S-lens was set to 50.0 (arbitrary units).
Acquisition of experimental chromatography-mass spectrometry information: the non-targeted metabolomics data of the spiked extract were processed with Compound Discoverer 3.1 to obtain a peak table containing the experimental retention time $t_{r,\mathrm{measured}}$ and the primary mass spectral information, i.e., the primary-ion mass-to-charge ratio $m/z_{\mathrm{measured}}$, which was exported as an Excel table. The raw data were converted with ProteoWizard into .mgf files containing the corresponding secondary mass spectral information, i.e., the mass-to-charge ratios and intensities of the secondary ions. The $m/z_{\mathrm{measured}}$ and $t_{r,\mathrm{measured}}$ of each metabolite peak in the experimental data were matched to the corresponding secondary mass spectrum with a mass window of 10 ppm and a retention time window of 10 s. The $t_{r,\mathrm{measured}}$, $m/z_{\mathrm{measured}}$ and corresponding secondary mass spectral information of the 173 hydroxycinnamic acid amides were extracted from the collected non-targeted metabolomics data.
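The pairing of peak-table entries with secondary spectra (10 ppm mass window, 10 s retention time window) could look roughly like the following sketch. The data layout, the function name `pair_ms2`, and the rule of keeping the scan closest in retention time are assumptions made for illustration; the text does not say how ties are resolved, and the fragment lists are placeholder values.

```python
def pair_ms2(features, ms2_scans, ppm_tol=10.0, rt_tol_s=10.0):
    """Attach one MS/MS spectrum to each feature, if any scan falls in both windows.

    features:  list of (mz_measured, rt_minutes)
    ms2_scans: list of (precursor_mz, rt_minutes, spectrum)
    Returns {feature index: spectrum}. When several scans qualify, the one
    closest in retention time is kept (an assumption, not stated in the text).
    """
    paired = {}
    for idx, (mz, rt) in enumerate(features):
        best = None
        for precursor_mz, scan_rt, spectrum in ms2_scans:
            ppm = abs(precursor_mz - mz) / mz * 1e6
            delta_s = abs(scan_rt - rt) * 60.0  # minutes -> seconds
            if ppm <= ppm_tol and delta_s <= rt_tol_s:
                if best is None or delta_s < best[0]:
                    best = (delta_s, spectrum)
        if best is not None:
            paired[idx] = best[1]
    return paired


# One feature from Example 1 and two placeholder scans (fragment values are made up)
features = [(383.1594, 6.97)]
ms2_scans = [
    (383.1596, 6.95, [(145.03, 100.0)]),  # about 0.5 ppm and 1.2 s away: accepted
    (383.1596, 7.50, [(163.04, 100.0)]),  # about 32 s away: rejected
]
paired = pair_ms2(features, ms2_scans)
```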
Construction of the retention time prediction model: 127 hydroxycinnamic acid amide standards (including N-(p-Coumaroyl)-spermidine, N-Sinapoyl-tyramine, N,N'-Cinnamoyl-sinapoyl-putrescine and N,N',N''-(p-Coumaroyl)-bis-caffeoyl-spermidine, etc.) were analyzed under the same UHPLC-HRMS data acquisition conditions as the plant extracts to obtain their experimental liquid chromatography retention times. The 1D and 2D molecular descriptors of each standard were calculated from its SDF file on the open-source website ChemDes (http://www.scbdd.com/ChemDes). A retention time prediction model was then built by multiple linear regression with stepwise variable selection, using the liquid chromatography retention time as the dependent variable and the molecular descriptors as independent variables.
Candidate metabolites were collected from the open-source plant hydroxycinnamic acid amide metabolome database (https://pubs.acs.org/doi/abs/10.1021/acs.analchem.8b03654), which contains 846 hydroxycinnamic acid amides. First, the theoretical primary-ion mass-to-charge ratio $m/z_{\mathrm{theoretical}}$ of each hydroxycinnamic acid amide was calculated from its molecular formula in the database, and the predicted retention times $t_{r,\mathrm{predicted}}$ of the 846 hydroxycinnamic acid amides were obtained with the retention time prediction model constructed in the previous step. The database was then searched with the $m/z_{\mathrm{measured}}$ and $t_{r,\mathrm{measured}}$ of the 173 hydroxycinnamic acid amides obtained from the non-targeted metabolomics experiment on the spiked plant extract. The 220 hydroxycinnamic acid amides in the database that simultaneously satisfied
$$|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}| < 2\ \mathrm{min}$$
and
$$\frac{|m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}|}{m/z_{\mathrm{theoretical}}} \times 10^{6} < 5\ \mathrm{ppm}$$
were taken as candidate metabolites; their SMILES, names, molecular formulas, molecular structures and predicted retention times were collected to construct the candidate hydroxycinnamic acid amide database.
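As a rough illustration of this screening step, the sketch below keeps every database entry that falls within both windows of at least one measured feature. The class and function names are hypothetical; the two real entries reuse values quoted elsewhere in this example, and the third entry is an invented non-match.

```python
from dataclasses import dataclass


@dataclass
class DbEntry:
    name: str
    mz_theoretical: float
    rt_predicted: float  # minutes, from the retention time prediction model


@dataclass
class Feature:
    mz_measured: float
    rt_measured: float  # minutes


def is_candidate(entry, feature, ppm_tol=5.0, rt_tol=2.0):
    """True if the entry lies within both the ppm and retention time windows."""
    ppm = abs(entry.mz_theoretical - feature.mz_measured) / entry.mz_theoretical * 1e6
    return ppm < ppm_tol and abs(entry.rt_predicted - feature.rt_measured) < rt_tol


def screen_candidates(database, features, ppm_tol=5.0, rt_tol=2.0):
    """Keep database entries matched by at least one measured feature."""
    return [e for e in database
            if any(is_candidate(e, f, ppm_tol, rt_tol) for f in features)]


database = [
    DbEntry("N-Sinapoyl-serotonin", 383.1607, 6.62),
    DbEntry("N-Feruloyl-octopamine", 330.1341, 5.91),
    DbEntry("hypothetical non-matching entry", 500.0000, 3.00),
]
features = [Feature(383.1594, 6.97), Feature(330.1330, 5.71)]
candidates = screen_candidates(database, features)
```

The comparison is strict (`<`), following the inequalities in the text.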
Construction of the molecular structure association network: Morgan fingerprints were generated from the molecular structures of the candidate hydroxycinnamic acid amides, and the similarity between the Morgan fingerprints of every pair of candidates was calculated. With the molecular fingerprint similarity threshold set to 0.7, the molecular structure association network was constructed by taking the candidate hydroxycinnamic acid amides as nodes and connecting any two candidates whose Morgan fingerprint similarity reached the threshold with an edge. The resulting network, shown in FIG. 1, has 220 nodes and 3866 edges.
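The network construction can be sketched without any cheminformatics dependency by representing each fingerprint as a set of on-bit indices. Two points here are assumptions: the text says the similarity is computed with RDKit but does not name the metric, so Tanimoto similarity is assumed; and the toy fingerprints below are invented placeholders, not real Morgan fingerprints.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def build_network(fingerprints, threshold=0.7):
    """Return an adjacency map {metabolite: set of neighbours}.

    Two metabolites are connected when their fingerprint similarity is
    greater than or equal to the threshold.
    """
    network = {name: set() for name in fingerprints}
    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            network[a].add(b)
            network[b].add(a)
    return network


# Placeholder fingerprints: A and B share 7 of 9 distinct bits (similarity ~0.78)
toy_fingerprints = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8},
    "B": {1, 2, 3, 4, 5, 6, 7, 9},
    "C": {20, 21, 22},
}
network = build_network(toy_fingerprints, threshold=0.7)
n_edges = sum(len(neighbours) for neighbours in network.values()) // 2
```

The all-pairs loop is quadratic in the number of candidates, which is unproblematic at the scale of the 220 nodes reported here.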
Identification based on the molecular structure association network: the spiked metabolites in the non-targeted UHPLC-HRMS metabolome data were identified using the constructed molecular structure association network as the background network. The specific identification process is as follows:
1) Six hydroxycinnamic acid amides were identified as seed metabolites in the non-targeted UHPLC-HRMS metabolome experimental data of the spiked plant extract using standard samples. They were mapped into the molecular structure association network, and their adjacent metabolites, i.e., the metabolites directly connected to them by an edge in the network, were obtained. FIG. 2 is a partial enlarged view of the molecular structure association network, in which seed metabolite 1 is N-Caffeoyl-5-methoxytryptamine. It has 5 adjacent metabolites: adjacent metabolite 1, N-Sinapoyl-serotonin; adjacent metabolite 2, N,N'-Feruloyl-cinnamoyl-cadaverine; adjacent metabolite 3, N,N'-(p-Coumaroyl)-feruloyl-agmatine; adjacent metabolite 4, N-Feruloyl-octopamine; and adjacent metabolite 5, N-Caffeoyl-serotonin. The ($m/z_{\mathrm{theoretical}}$, $t_{r,\mathrm{predicted}}$) values of adjacent metabolites 1 to 5 are (m/z 383.1607, 6.62 min), (m/z 409.2127, 8.90 min), (m/z 453.2138, 8.14 min), (m/z 330.1341, 5.91 min) and (m/z 339.1345, 6.19 min), respectively.
2) The secondary mass spectrum of the seed metabolite is assigned to the adjacent metabolite as its "pseudo-secondary mass spectrum". Setting a search threshold value:
$$|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}| < 2\ \mathrm{min},$$
$$\frac{|m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}|}{m/z_{\mathrm{theoretical}}} \times 10^{6} < 5\ \mathrm{ppm},$$
and the similarity between the experimental secondary mass spectrum and the pseudo-secondary mass spectrum of the adjacent metabolite is greater than or equal to 0.5.
The identification procedure is illustrated as follows. As shown in FIG. 3, the secondary spectrum of seed metabolite 1 (the red spectrum in the figure) is taken as the "pseudo-secondary mass spectrum" of its 5 adjacent metabolites, and the experimental data are searched for metabolite peaks that match the $m/z_{\mathrm{theoretical}}$, $t_{r,\mathrm{predicted}}$ and pseudo-secondary mass spectrum of each adjacent metabolite. A metabolite peak with a retention time of 6.97 min and [M+H]+ at m/z 383.1594 was found in the experimental data. Relative to adjacent metabolite 1 (N-Sinapoyl-serotonin), $|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}| = 0.35$ min and $\Delta m = |m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}| / m/z_{\mathrm{theoretical}} \times 10^{6} = 3.4$ ppm, and the experimental secondary spectrum of this peak (blue) has a similarity of 0.86 to the "pseudo-secondary mass spectrum" (red) of adjacent metabolite 1. The metabolite peak was therefore identified as N-Sinapoyl-serotonin. In the same way, 3 metabolite peaks in the experimental data, with ($m/z_{\mathrm{measured}}$, $t_{r,\mathrm{measured}}$, secondary spectrum similarity) of (m/z 409.2109, 9.34 min, 0.78), (m/z 453.2118, 7.92 min, 0.76) and (m/z 330.1330, 5.71 min, 0.86), matched adjacent metabolites 2, 3 and 4, respectively, and were also successfully identified.
3) When a plurality of matching results are searched out from the experimental data, the matching results are scored, and the scoring rule is as follows:
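$$\mathrm{score} = 0.25 \times \left(1 - \frac{|m/z_{\mathrm{theoretical}} - m/z_{\mathrm{measured}}| \times 10^{6}}{m/z_{\mathrm{theoretical}} \times 5}\right) + 0.25 \times \left(1 - \frac{|t_{r,\mathrm{predicted}} - t_{r,\mathrm{measured}}|}{2}\right) + 0.5 \times S_{\mathrm{MS/MS}}$$
where $S_{\mathrm{MS/MS}}$ is the secondary spectrum similarity.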
For example, if 3 metabolite peaks matching adjacent metabolite 5 are found in the experimental data and all satisfy the search thresholds, with ($m/z_{\mathrm{measured}}$, $t_{r,\mathrm{measured}}$, secondary spectrum similarity) of (m/z 339.1332, 5.89 min, 0.77), (m/z 339.1330, 5.47 min, 0.61) and (m/z 339.1335, 6.63 min, 0.63), the 3 results are scored; the corresponding scores are 0.66, 0.50 and 0.62. The results are output in descending order of score, and the highest-scoring identification is the most reliable. Metabolite peaks identified in this way no longer participate as seeds in the next round of identification.
4) The metabolites identified above are used as new seeds, and the identification process is repeated until no new metabolite peaks are identified. For example, the metabolite peak (m/z 383.1594, 6.97 min) was successfully identified in the experimental data as N-Sinapoyl-serotonin (adjacent metabolite 1 in FIG. 2), and its experimental secondary spectrum was assigned to next-level adjacent metabolite 1 in FIG. 2 (N,N',N''-Feruloyl-bis-cinnamoyl-spermidine) as its "pseudo-secondary mass spectrum". The ($m/z_{\mathrm{theoretical}}$, $t_{r,\mathrm{predicted}}$) of next-level adjacent metabolite 1 is (m/z 582.2968, 11.19 min). A metabolite peak satisfying the thresholds (m/z 582.2948, 11.65 min) was found in the experimental data; the similarity between its experimental secondary spectrum and the pseudo-secondary mass spectrum was 0.75, so the match was successful. The metabolite peak (m/z 582.2948, 11.65 min) was identified as N,N',N''-Feruloyl-bis-cinnamoyl-spermidine, and the above identification procedure was repeated with it as a new seed.
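Steps 1) to 4) amount to a breadth-first propagation of secondary spectra through the network, which the following sketch tries to capture. Several things are assumptions: the text does not specify the spectral similarity measure, so a simple cosine similarity over m/z values rounded to two decimals is used; all names (`propagate`, `cosine_similarity`) are hypothetical; the fragment lists are invented placeholders; and only the unambiguous single-hit case is handled (multiple hits would be scored and ranked, and not reused as seeds).

```python
from collections import deque
from math import sqrt


def cosine_similarity(spec_a, spec_b, decimals=2):
    """Cosine similarity of two spectra given as [(mz, intensity), ...]."""
    a = {round(mz, decimals): inten for mz, inten in spec_a}
    b = {round(mz, decimals): inten for mz, inten in spec_b}
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propagate(network, candidates, peaks, seeds, ppm_tol=5.0, rt_tol=2.0, sim_min=0.5):
    """Annotate peaks by passing seed spectra to adjacent metabolites.

    network:    {name: set of adjacent names}
    candidates: {name: (mz_theoretical, rt_predicted)}
    peaks:      list of {"mz": ..., "rt": ..., "ms2": [(mz, intensity), ...]}
    seeds:      {name: MS/MS spectrum of a seed identified with a standard}
    Returns {name: peak index} for newly annotated metabolites.
    """
    spectra = dict(seeds)   # metabolites whose spectrum may act as a pseudo-spectrum
    annotated = {}
    queue = deque(seeds)
    while queue:
        seed = queue.popleft()
        for neighbour in sorted(network.get(seed, ())):
            if neighbour in spectra:
                continue
            mz_theo, rt_pred = candidates[neighbour]
            hits = [i for i, p in enumerate(peaks)
                    if abs(mz_theo - p["mz"]) / mz_theo * 1e6 < ppm_tol
                    and abs(rt_pred - p["rt"]) < rt_tol
                    and cosine_similarity(spectra[seed], p["ms2"]) >= sim_min]
            if len(hits) == 1:  # unambiguous match: annotate and reuse as a new seed
                annotated[neighbour] = hits[0]
                spectra[neighbour] = peaks[hits[0]]["ms2"]
                queue.append(neighbour)
    return annotated


SEED = "N-Caffeoyl-5-methoxytryptamine"
ADJ = "N-Sinapoyl-serotonin"
NEXT = "N,N',N''-Feruloyl-bis-cinnamoyl-spermidine"
network = {SEED: {ADJ}, ADJ: {SEED, NEXT}, NEXT: {ADJ}}
candidates = {ADJ: (383.1607, 6.62), NEXT: (582.2968, 11.19)}
peaks = [  # m/z and retention times from the text; fragment lists are placeholders
    {"mz": 383.1594, "rt": 6.97, "ms2": [(145.03, 90.0), (163.04, 50.0), (207.07, 20.0)]},
    {"mz": 582.2948, "rt": 11.65, "ms2": [(145.03, 80.0), (207.07, 30.0)]},
]
seeds = {SEED: [(145.03, 100.0), (163.04, 40.0)]}
annotated = propagate(network, candidates, peaks, seeds)
```

The loop stops when the queue is empty, which corresponds to the condition that no new metabolite peaks are identified.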
Using this method with 6 hydroxycinnamic acid amides as the initial seed metabolites, 167 hydroxycinnamic acid amides were successfully identified, and the accuracy of the identification results was 98.8%. The correct structure was ranked first for 141 of them, second for 19, third for 5 and fourth for 2. The correct structure was not always ranked first because 80 of the 169 hydroxycinnamic acid amides have isomers, and these isomers have similar retention times and similar secondary mass spectra.
For comparison with common database searching: the ReSpect database (http://spectra.psc.riken.jp/) contains only 23 hydroxycinnamic acid amides and METLIN (https://metlin.scripps.edu) contains 44, and these databases contain almost no secondary spectra of hydroxycinnamic acid amides. Searching can therefore rely only on the primary-ion mass-to-charge ratio, so the reliability of the identification cannot be guaranteed and the coverage is limited.
The results show that the metabolite identification method based on the molecular structure association network achieves reliable identification without relying on a large-scale experimental secondary spectrum database, and that the coverage of metabolome annotation can be significantly expanded by using open-source structure databases.
Example 2
The method of the invention was used to identify metabolites in an actual biological sample extract. The metabolome of a plant tissue (corn silk) was extracted, ultra-high performance liquid chromatography-high resolution mass spectrometry data were acquired for the corn silk extract, and the metabolites in the resulting non-targeted metabolomics data were identified.
The procedure and conditions were the same as in example 1, except that:
extraction of plant tissue metabolome: the same as in example 1.
Non-targeted metabolomics data acquisition: the same as in example 1.
Acquiring experimental chromatography-mass spectrometry information: based on the non-targeted metabolomics data of the corn silk extract, the software Compound Discoverer 3.1 was used to obtain a peak table containing the experimental retention time t_R,measured and the primary mass spectrum information, i.e., the primary ion mass-to-charge ratio m/z_measured, which was exported as an Excel table. The raw data were converted with the software ProteoWizard into an .mgf file containing the corresponding secondary mass spectrum information, i.e., the mass-to-charge ratios and intensities of the secondary ions.
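As an illustration of what the converted file provides, the following minimal sketch reads secondary spectra from MGF text. It assumes the usual MGF layout (BEGIN IONS / END IONS blocks, KEY=VALUE header lines, then one "m/z intensity" pair per line); the function name and the sample block are hypothetical, and the fragment values are invented.

```python
def parse_mgf(text):
    """Return a list of spectra, each with header params and (m/z, intensity) peaks."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None or not line:
            continue  # ignore anything outside a spectrum block
        elif "=" in line:
            key, value = line.split("=", 1)
            current["params"][key.upper()] = value
        else:
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
    return spectra

# Hypothetical sample block; 699.0 s corresponds to 11.65 min.
SAMPLE_MGF = """BEGIN IONS
TITLE=peak_1
PEPMASS=582.2948
RTINSECONDS=699.0
147.0441 100.0
177.0546 55.5
END IONS
"""
spectra = parse_mgf(SAMPLE_MGF)
```

Header values are kept as strings because some exporters append an intensity after the precursor m/z in PEPMASS; convert them as needed for the file at hand.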
Constructing the retention time prediction model: under the same ultra-high performance liquid chromatography-high resolution mass spectrometry acquisition conditions as for the plant extract, 254 standard samples (including 1,3-Dihydroxyacetone, Benzoic acid, Methionine sulfoxide, 7-Methoxycoumarin, Vibalanone B, Nardosine, and the like) were analyzed in positive ion mode and 327 standard samples (including 3-Hydroxypropanoic acid, 2-Hydroxyquinoline, Coixol, 6-benzothiazoaminopurin, Quercetin, Daphnoretin, and the like) in negative ion mode to obtain their experimental liquid chromatography retention times. The 1D and 2D molecular descriptors of each standard sample were calculated from its SDF file on the open-source website ChemDes (http://www.scbdd.com/ChemDes). Retention time prediction models for the positive and negative ion modes were then constructed separately by multiple linear regression with stepwise variable selection, taking the liquid chromatography retention time as the dependent variable and the molecular descriptors as independent variables.
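A multiple linear regression with stepwise descriptor selection could look like the sketch below. It is an assumption-laden illustration: the source does not state the selection criterion, so a simple forward selection on the relative drop in residual sum of squares is used here, and the function names, the `min_gain` parameter, and the synthetic descriptor matrix are hypothetical stand-ins for the ChemDes descriptors and measured retention times.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with intercept; returns coefficients and RSS."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return coef, rss

def forward_stepwise(X, y, max_features=5, min_gain=0.05):
    """Greedily add the descriptor that lowers RSS most; stop when the
    relative improvement falls below min_gain (hypothetical criterion)."""
    selected = []
    _, best_rss = fit_ols(X[:, selected], y)  # intercept-only model
    while len(selected) < max_features:
        trials = [(fit_ols(X[:, selected + [j]], y)[1], j)
                  for j in range(X.shape[1]) if j not in selected]
        if not trials:
            break
        rss, j = min(trials)
        if best_rss - rss < min_gain * best_rss:
            break
        selected.append(j)
        best_rss = rss
    coef, _ = fit_ols(X[:, selected], y)
    return selected, coef

def predict_rt(X, selected, coef):
    """Predicted retention time from the selected descriptors."""
    return coef[0] + X[:, selected] @ coef[1:]

# Synthetic stand-in data: 200 "standards", 6 descriptors, 2 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 5.0 + 2.0 * X[:, 1] - 3.0 * X[:, 4] + rng.normal(scale=0.1, size=200)
selected, coef = forward_stepwise(X, y)
rt_pred = predict_rt(X, selected, coef)
```

In practice one model would be fitted per ionization mode, using the measured retention times of the positive-mode and negative-mode standards as `y`.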
The open-source metabolome databases Universal Natural Products Database (UNPD, http://pkuxxj.pku.edu.cn/UNPD/), Plant Metabolic Network (https://Plant.org/), and KEGG (https://www.genome.jp/KEGG/) were used. First, the theoretical primary ion mass-to-charge ratio m/z_theoretical of each metabolite was obtained from its molecular formula in the database, and the predicted retention time t_R,predicted of each metabolite was obtained with the retention time prediction model. The databases were then searched with the primary ion mass-to-charge ratio m/z_measured and the experimental retention time t_R,measured of each metabolite peak obtained in the non-targeted metabolomics experiment on the plant extract. Metabolites in the databases that simultaneously satisfy
|t_R,predicted − t_R,measured| < 2 min,
|m/z_theoretical − m/z_measured| / m/z_theoretical × 1000000 < 5 ppm
were taken as candidate metabolites; their SMILES, names, molecular formulas, molecular structures, and predicted retention times were collected to construct the candidate metabolite database.
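The two screening conditions can be written as a small filter. The sketch below is illustrative only: the function names are hypothetical, the formula parser handles only simple formulas without brackets, and the choice of [M+H]+ and [M−H]− as the primary ions is an assumption, since the adduct types are not stated here.

```python
import re

# Monoisotopic masses (u) of common elements; extend as needed.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.973762, "S": 31.972071}
PROTON = 1.007276466812

def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def theoretical_mz(formula, mode="positive"):
    """Assumed adducts: [M+H]+ in positive mode, [M-H]- in negative mode."""
    mass = monoisotopic_mass(formula)
    return mass + PROTON if mode == "positive" else mass - PROTON

def ppm_error(mz_theoretical, mz_measured):
    return abs(mz_theoretical - mz_measured) / mz_theoretical * 1_000_000

def is_candidate(mz_theoretical, rt_predicted, mz_measured, rt_measured,
                 rt_tol=2.0, ppm_tol=5.0):
    """True when both the retention time and the mass error conditions hold."""
    return (abs(rt_predicted - rt_measured) < rt_tol
            and ppm_error(mz_theoretical, mz_measured) < ppm_tol)

# Values taken from Example 1: theoretical 582.2968 / predicted 11.19 min
# against the measured peak 582.2948 / 11.65 min.
match = is_candidate(582.2968, 11.19, 582.2948, 11.65)
```

For the Example 1 values the mass error is about 3.4 ppm and the retention time difference is 0.46 min, so both conditions are met.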
Constructing the molecular structure association network: Morgan fingerprints were obtained from the molecular structures of the candidate metabolites, and the similarity between the Morgan fingerprints of any two candidate metabolites was calculated. The molecular fingerprint similarity threshold was set to 0.6. Taking the candidate metabolites as nodes and the Morgan fingerprint similarity between any two candidate metabolites as edges, the molecular structure association network was constructed. The network in positive ion mode comprises 1965 metabolites (nodes) and 28199 edges, see FIG. 4A; the network in negative ion mode comprises 1945 metabolites (nodes) and 34451 edges, see FIG. 4B.
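The thresholding step can be illustrated without a cheminformatics dependency by treating each fingerprint as the set of its on-bit indices. This is a sketch under assumptions: Tanimoto similarity is assumed because the similarity measure is not named here, the bit sets below are invented, and in practice the fingerprints would come from a toolkit such as RDKit rather than being written by hand.

```python
from itertools import combinations

def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

def build_network(fingerprints, threshold=0.6):
    """Connect every pair of metabolites whose similarity is >= threshold.
    Returns {(node_u, node_v): similarity} with node_u < node_v."""
    edges = {}
    for u, v in combinations(sorted(fingerprints), 2):
        similarity = tanimoto(fingerprints[u], fingerprints[v])
        if similarity >= threshold:
            edges[(u, v)] = similarity
    return edges

def adjacent(edges, node):
    """Metabolites directly connected to `node` by an edge."""
    return ({v for u, v in edges if u == node}
            | {u for u, v in edges if v == node})

# Hypothetical on-bit sets for three candidate metabolites.
fingerprints = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 3, 4, 6},
    "C": {7, 8, 9},
}
edges = build_network(fingerprints, threshold=0.6)
```

Here A and B share 4 of 6 distinct bits (similarity about 0.67) and are connected, while C remains an isolated node.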
Identification based on the molecular structure association network: using the constructed molecular structure association network as the background network, the experimental data acquired by non-targeted ultra-high performance liquid chromatography-high resolution mass spectrometry metabolomics were identified and the metabolites in the biological sample to be tested were determined. The identification process is the same as in Example 1.
This process shows that abundant candidate metabolites can be obtained from the metabolome data of a complex plant tissue extract. When the similarity between Morgan fingerprints is calculated for the candidate metabolites and the molecular fingerprint similarity threshold is set to 0.6, a complete, connected network can be formed, enabling large-scale identification of the metabolome.

Claims (8)

1. A method for deep annotation of a metabolome, comprising:
first, performing non-targeted metabolomics analysis on a biological sample extract to be tested by ultra-high performance liquid chromatography-high resolution mass spectrometry, and obtaining chromatography-mass spectrometry information of the metabolome of the extract, including the experimentally measured retention time t_R,measured of each metabolite peak, the mass-to-charge ratio m/z_measured of its primary mass spectrum ion, and the mass-to-charge ratios and intensities of the corresponding secondary mass spectrum ions;
second, constructing a candidate metabolite molecular structure database: according to the primary ion mass-to-charge ratio m/z_measured and the experimental retention time t_R,measured of all metabolites in the biological sample extract obtained in the first step, screening out from open-source metabolomics databases the metabolites that match m/z_measured and t_R,measured as candidate metabolites, and constructing a candidate metabolite database; the database contains the simplified molecular-input line-entry system (SMILES) string, name, molecular formula, molecular structure, and predicted retention time of each metabolite;
third, constructing a metabolite molecular structure association network: obtaining a molecular fingerprint from the molecular structure of each metabolite in the candidate metabolite database; calculating the similarity between the molecular fingerprints of any two candidate metabolites, the similarity being calculated with the open-source tool RDKit; setting a similarity threshold between molecular fingerprints, generally 0.5-0.8; and, taking metabolites as nodes and molecular fingerprint similarities as edges, connecting metabolites whose molecular fingerprint similarity is greater than or equal to the similarity threshold to construct the molecular structure association network;
fourth, identifying metabolites based on the molecular structure association network: using the molecular structure association network constructed in the third step as a background network, identifying the experimental data acquired by non-targeted ultra-high performance liquid chromatography-high resolution mass spectrometry and determining the metabolites in the biological sample to be tested.
2. The method of claim 1, wherein: the primary mass spectrum ions are ions collected directly after a compound is ionized by the mass spectrometer; the secondary mass spectrum ions are ions collected after the primary ions are fragmented by collision at a certain applied energy.
3. The method of claim 1, wherein: in the second step, the candidate metabolites are obtained as follows: the theoretical primary mass spectrum mass-to-charge ratio m/z_theoretical in the positive and negative ionization modes is obtained from the molecular formula of each metabolite in the public metabolome databases, and the predicted retention time t_R,predicted of the metabolite is then obtained from the structural parameters of the metabolite; the inclusion criterion for a candidate metabolite is that it simultaneously satisfies
|t_R,predicted − t_R,measured| < 2 min and
|m/z_theoretical − m/z_measured| / m/z_theoretical × 1000000 < 5 ppm.
4. The method of claim 1, wherein: in the fourth step, the metabolite identification method based on the molecular structure association network comprises: selecting 5-50 metabolites from the candidate metabolite database and, using standard samples of these 5-50 metabolites as references, identifying them in the non-targeted ultra-high performance liquid chromatography-high resolution mass spectrometry metabolome experimental data as seed metabolites; mapping the seed metabolites onto the constructed molecular structure association network and obtaining the adjacent metabolites of the seed metabolites from the network; assigning the secondary mass spectrum of a seed metabolite to its adjacent metabolite as the pseudo-secondary mass spectrum of the adjacent metabolite; setting a search threshold and searching the experimental data for a metabolite peak that matches the m/z_theoretical, t_R,predicted, and pseudo-secondary mass spectrum of the adjacent metabolite, the identification of the metabolite peak being completed if the match is successful; and using the identified metabolites as new seeds and repeating the identification process until no new metabolites are identified;
when there are multiple matching results, the matching results are scored and ranked from high to low, a higher score indicating a more accurate identification of the metabolite peak, and metabolites identified in this case are no longer used as new seeds.
5. The method of claim 4, wherein the search threshold is:
|t_R,predicted − t_R,measured| < 2 min,
|m/z_theoretical − m/z_measured| / m/z_theoretical × 1000000 < 5 ppm,
and the similarity between the experimental secondary mass spectrum of the metabolite peak and the pseudo-secondary mass spectrum of the adjacent metabolite is greater than or equal to 0.5.
6. The method of claim 1, wherein: in the third step, the molecular fingerprint is any one of a Morgan fingerprint, a MACCS fingerprint, an Atom-pair fingerprint, and a Daylight fingerprint.
7. The method according to claim 3 or 5, wherein: the predicted retention time of the metabolite is obtained from a retention time prediction model, and the retention time prediction model is constructed from the structure-retention relationship of known metabolites.
8. The method of claim 4, wherein: the adjacent metabolites are metabolites directly connected by an edge in the molecular structure association network.
CN202011407735.8A 2020-12-03 2020-12-03 Metabolome deep annotation method Active CN114594171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011407735.8A CN114594171B (en) 2020-12-03 2020-12-03 Metabolome deep annotation method

Publications (2)

Publication Number Publication Date
CN114594171A 2022-06-07
CN114594171B 2023-12-15

Family

ID=81813178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011407735.8A Active CN114594171B (en) 2020-12-03 2020-12-03 Metabolome deep annotation method

Country Status (1)

Country Link
CN (1) CN114594171B (en)

Also Published As

Publication number Publication date
CN114594171B (en) 2023-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant