WO2017128497A1 - Simulation generation method and system for MS/MS mass spectra of metabolic mixtures - Google Patents

Publication number
WO2017128497A1
Authority
WO
WIPO (PCT)
Prior art keywords
mass
simulation
mass spectrum
unit
metabolite
Prior art date
Application number
PCT/CN2016/076226
Other languages
English (en)
French (fr)
Inventor
周家锐
纪震
殷夫
朱泽轩
Original Assignee
哈尔滨工业大学深圳研究生院
周家锐
纪震
殷夫
朱泽轩
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 哈尔滨工业大学深圳研究生院, 周家锐, 纪震, 殷夫, 朱泽轩
Publication of WO2017128497A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B5/20 Probabilistic models

Definitions

  • The invention relates to the field of simulation, and in particular to a simulation generation method and system for MS/MS mass spectra of metabolic mixtures.
  • "Metabolite" is the general term for the small-molecule organic compounds that carry out metabolic processes in living organisms; metabolites carry rich information about physiological state. Metabolomics studies metabolites systematically as a whole, which can effectively reveal the mechanisms behind physiological phenomena and give a fuller picture of the dynamic state of a living organism. It has therefore attracted growing attention and is widely used in many fields of research and practice. Mass spectrometry (MS) is one of the most important research tools in metabolomics. Tandem mass spectrometry (MS/MS) in particular has become a major direction of development in recent years, because it can effectively distinguish different metabolites and accurately measure signal intensity; its data format is shown in Figures 1 and 2. In practice, a mixture containing multiple metabolites usually has to be analyzed, and its MS/MS mass spectrum is the data foundation for related research and development.
  • Existing MS/MS mass spectra of metabolic mixtures are mainly obtained in two ways.
  • The first is to acquire the MS/MS mass spectrum of a specific mixture sample directly, by experiment on a mass spectrometer.
  • This yields real spectral line data and is the most important source of information for metabolomics.
  • However, it is costly, and spectra differ considerably across mixtures and parameter settings, so it is difficult to meet the needs of related research.
  • The second is to use computer simulation to generate putative spectra from known single-metabolite MS/MS mass spectra and physicochemical knowledge. This is cheaper and can produce large numbers of simulated metabolic-mixture spectra under specific parameters. However, its accuracy is limited, and using it for research and development may lead to wrong results.
  • In current metabolomics research, MS/MS data generated by computer simulation is generally used for initial development, and real performance is then verified on experimental data. The accuracy of the simulated spectra determines the quality and speed of the research.
  • The chemical species in a mixture affect one another during MS/MS analysis, so the resulting MS/MS data is not a simple superposition of the individual mass spectra.
  • The spectral line distribution also varies with the mass spectrometer parameter settings. Experimental data is therefore hard to reuse: all mass spectrometry data has to be redesigned and collected again for each specific research and development project, which is extremely costly.
  • The noise models used in existing methods are too simple, generally Gaussian noise or editing errors unrelated to biological information, so the generated MS/MS simulated spectra hardly reflect real conditions.
  • In view of these shortcomings, an object of the present invention is to provide a simulation generation method and system for MS/MS mass spectra of metabolic mixtures, aiming to solve the problem that existing generation methods are either costly and dependent on samples that are hard to collect, or give poor analysis performance and large errors.
  • A simulation generation method for MS/MS mass spectra of metabolic mixtures, comprising the following steps:
  • Step B specifically includes:
  • the mean of T is computed as μ_T
  • and its variance as σ_T
  • so that the mass-to-charge ratio probability model is constructed as the normal distribution N(μ_T, σ_T);
  • the mean of I is computed as μ_I
  • and its variance as σ_I
  • and the intensity probability model is constructed as the normal distribution N(μ_I, σ_I);
  • Step C specifically includes:
  • Step C4: add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and then proceed to step C6;
  • Step C4 specifically includes:
  • a random value r uniformly distributed in [0, 1] is generated, and if r < p_del, the corresponding spectral line is deleted.
  • p_del is the spectral line deletion probability
  • p_mz is the mass-to-charge ratio offset probability
  • p_int is the intensity offset probability
  • Step D specifically includes:
  • each m_d receives a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and the corresponding intensity is computed using R_l
  • A simulation generation system for MS/MS mass spectra of metabolic mixtures, comprising:
  • a noise probability model statistics module for computing a noise probability model for each metabolite from the real MS/MS mass spectrum of that metabolite
  • a simulated spectrum set generation module for generating the simulated spectrum set of each corresponding metabolite in Φ according to that metabolite's noise probability model
  • a simulated spectrum generation module for generating metabolic-mixture MS/MS simulated mass spectra one after another from the simulated spectrum sets of all metabolites
  • the noise probability model statistics module specifically includes:
  • a first construction unit for computing the mean μ_T and variance σ_T of T, so that the mass-to-charge ratio probability model is constructed as the normal distribution N(μ_T, σ_T);
  • a second construction unit for computing the mean μ_I and variance σ_I of I, and constructing the intensity probability model as the normal distribution N(μ_I, σ_I);
  • the simulated spectrum set generation module specifically includes:
  • an adding unit for generating a random value r uniformly distributed in [0, 1] and, if r < p_ins, adding a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
  • in the simulation generation system described above, the replacement unit specifically includes:
  • a deletion subunit for generating a random value r uniformly distributed in [0, 1] and, if r < p_del, deleting the corresponding spectral line.
  • an intensity offset subunit for generating a random value r uniformly distributed in [0, 1] and, if r < p_int, replacing i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
  • p_del is the spectral line deletion probability
  • p_mz is the mass-to-charge ratio offset probability
  • p_int is the intensity offset probability
  • in the simulation generation system described above, the simulated spectrum generation module specifically includes:
  • a calculation unit for extracting the mass-to-charge ratio vector M_l of S_l and computing its mass-to-charge ratio probability model N_l(μ_T, σ_T);
  • The present invention does not rely on real experiments: by changing parameter settings it can generate the required metabolic-mixture MS/MS simulated mass spectra in large numbers, at very low cost and with a sample size that is not limited by acquisition conditions.
  • The invention uses a nonlinear regression model to generate the MS/MS simulated mass spectra, which avoids the accuracy problems caused by simple linear superposition in traditional algorithms.
  • A noise probability model is built from statistics of real metabolite mass spectrometry data, covering the complex interference encountered in real applications. The generated spectra are closer to reality, can effectively guide early-stage metabolomics research and development, and can be used in part to verify algorithm performance.
  • Figures 1 and 2 are schematic diagrams of the data structure of a tandem (MS/MS) mass spectrum in the present invention.
  • Figure 3 is a flow chart of a preferred embodiment of the simulation generation method for MS/MS mass spectra of metabolic mixtures according to the present invention.
  • Figure 4 shows the construction method for the MS/MS simulated mass spectrum of a single metabolite.
  • Figure 5 shows the construction method for the MS/MS simulated mass spectrum of a metabolic mixture.
  • The present invention provides a simulation generation method and system for MS/MS mass spectra of metabolic mixtures.
  • To make the objects, technical solutions and effects of the present invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
  • Figure 3 is a flow chart of a preferred embodiment of the simulation generation method for MS/MS mass spectra of metabolic mixtures according to the present invention. As shown in the figure, the method includes the following steps:
  • Starting from the information in existing single-metabolite MS/MS databases, a noise probability model is built by computing the distributions of mass-to-charge ratio (m/z) and intensity. Editing errors are then used to add or remove original spectral lines, and simulated noise is added according to the noise probability model, forming a set of putative metabolite MS/MS spectra. Finally, a regression model is used to model the simulated spectra nonlinearly, and the MS/MS simulated mass spectrum of the mixture is produced as the algorithm output.
  • The mixture to be simulated contains N metabolites.
  • Step B specifically includes:
  • the mean of T is computed as μ_T
  • and its variance as σ_T
  • so that the mass-to-charge ratio probability model is constructed as the normal distribution N(μ_T, σ_T);
  • the mean of I is computed as μ_I
  • and its variance as σ_I
  • and the intensity probability model is constructed as the normal distribution N(μ_I, σ_I);
  • Step C specifically includes:
  • Step C4: add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and then proceed to step C6;
  • Step C4 specifically includes:
  • a random value r uniformly distributed in [0, 1] is generated, and if r < p_del, the corresponding spectral line is deleted.
  • p_del is the spectral line deletion probability
  • p_mz is the mass-to-charge ratio offset probability
  • p_int is the intensity offset probability
  • Step D specifically includes:
  • the mass-to-charge ratio vector of S_l is extracted as M_l
  • and its mass-to-charge ratio probability model is computed as N_l(μ_T, σ_T); for the specific method, see steps B1 to B3.
  • each m_d receives a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and the corresponding intensity is computed using R_l
  • The present invention also provides a preferred embodiment of a simulation generation system for MS/MS mass spectra of metabolic mixtures, comprising:
  • a noise probability model statistics module for computing a noise probability model for each metabolite from the real MS/MS mass spectrum of that metabolite
  • a simulated spectrum set generation module for generating the simulated spectrum set of each corresponding metabolite in Φ according to that metabolite's noise probability model
  • a simulated spectrum generation module for generating metabolic-mixture MS/MS simulated mass spectra one after another from the simulated spectrum sets of all metabolites
  • the noise probability model statistics module specifically includes:
  • a first construction unit for computing the mean μ_T and variance σ_T of T, so that the mass-to-charge ratio probability model is constructed as the normal distribution N(μ_T, σ_T);
  • a second construction unit for computing the mean μ_I and variance σ_I of I, and constructing the intensity probability model as the normal distribution N(μ_I, σ_I);
  • the simulated spectrum set generation module specifically includes:
  • an adding unit for generating a random value r uniformly distributed in [0, 1] and, if r < p_ins, adding a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
  • the replacement unit specifically includes:
  • a deletion subunit for generating a random value r uniformly distributed in [0, 1] and, if r < p_del, deleting the corresponding spectral line.
  • an intensity offset subunit for generating a random value r uniformly distributed in [0, 1] and, if r < p_int, replacing i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
  • p_del is the spectral line deletion probability
  • p_mz is the mass-to-charge ratio offset probability
  • p_int is the intensity offset probability
  • the simulated spectrum generation module specifically includes:
  • a calculation unit for extracting the mass-to-charge ratio vector M_l of S_l and computing its mass-to-charge ratio probability model N_l(μ_T, σ_T);
  • The present invention does not rely on real experiments: by changing parameter settings it can generate the required metabolic-mixture MS/MS simulated mass spectra in large numbers, at very low cost and with a sample size that is not limited by acquisition conditions.
  • The invention uses a nonlinear regression model to generate the MS/MS simulated mass spectra, which avoids the accuracy problems caused by simple linear superposition in traditional algorithms.
  • The noise probability model is built from statistics of real metabolite mass spectrometry data, covering the complex interference encountered in real applications. The generated spectra are closer to reality, can effectively guide early-stage metabolomics research and development, and can be used in part to verify algorithm performance.

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

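A simulation generation method and system for MS/MS mass spectra of metabolic mixtures. It does not rely on real experiments: by changing parameter settings, it can produce the required metabolic-mixture MS/MS simulated mass spectra in large numbers, and the sample size is not limited by acquisition conditions. Moreover, when conditions or the environment change, experiments do not need to be redesigned and rerun, which helps improve the efficiency of metabolomics research and development. The method uses a nonlinear regression model to generate the MS/MS simulated mass spectra, avoiding the accuracy problems caused by simple linear superposition in traditional algorithms. In addition, a noise probability model is built from statistics of real metabolite mass spectrometry data, covering the complex interference encountered in real applications. The generated spectra are closer to reality, can effectively guide early-stage metabolomics research and development, and can be used in part to verify algorithm performance.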

Description

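Simulation generation method and system for MS/MS mass spectra of metabolic mixtures
Technical Field
The present invention relates to the field of simulation, and in particular to a simulation generation method and system for MS/MS mass spectra of metabolic mixtures.
Background
"Metabolite" is the general term for the small-molecule organic compounds that carry out metabolic processes in living organisms; metabolites carry rich information about physiological state. Metabolomics studies metabolites systematically as a whole, which can effectively reveal the mechanisms behind physiological phenomena and give a fuller picture of the dynamic state of a living organism. It has therefore attracted growing attention and is widely used in many fields of research and practice. Mass spectrometry (MS) is one of the most important research tools in metabolomics. Tandem mass spectrometry (MS/MS) in particular has become a major direction of development in recent years, because it can effectively distinguish different metabolites and accurately measure signal intensity; its data format is shown in Figures 1 and 2. In practice, a mixture containing multiple metabolites usually has to be analyzed, and its MS/MS mass spectrum is the data foundation for related research and development.
Existing MS/MS mass spectra of metabolic mixtures are mainly obtained in two ways:
The first is to acquire the MS/MS mass spectrum of a specific mixture sample directly, by experiment on a mass spectrometer. This yields real spectral line data and is the most important source of information for metabolomics. However, it is costly, and spectra differ considerably across mixtures and parameter settings, so it is difficult to meet the needs of related research.
The second is to use computer simulation to generate putative spectra from known single-metabolite MS/MS mass spectra and physicochemical knowledge. This is cheaper and can produce large numbers of simulated metabolic-mixture spectra under specific parameters. However, its accuracy is limited, and using it for research and development may lead to wrong results.
In current metabolomics research, MS/MS data generated by computer simulation is generally used for initial development, and real performance is then verified on experimental data. The accuracy of the simulated spectra determines the quality and speed of the research.
Existing experiment-based methods for obtaining metabolic-mixture MS/MS mass spectra have the following drawbacks:
First, the chemical species in a mixture affect one another during MS/MS analysis, so the resulting MS/MS data is not a simple superposition of the spectral lines of the individual substances. The spectral line distribution also varies with the mass spectrometer parameter settings. Experimental data is therefore hard to reuse: all mass spectrometry data has to be redesigned and collected again for each specific research and development project, which is extremely costly.
Second, certain metabolic mixtures, such as blood samples from diabetic patients, are difficult and expensive to collect. Only a limited number of samples can be obtained from each individual, so the total sample size cannot be guaranteed, which hampers subsequent research.
Existing simulation-based methods for generating mixture MS/MS mass spectra have the following drawbacks:
First, existing algorithms are usually based on linear superposition of the single-metabolite spectra, which differs considerably from the nonlinear mixing that occurs in practice. When used for metabolomics research, this tends to produce oversimplified models that perform poorly on real mixture MS/MS mass spectra.
Second, the noise models used in existing methods are too simple, generally Gaussian noise or editing errors unrelated to biological information, so the generated MS/MS simulated spectra hardly reflect real conditions.
Therefore, the prior art still needs improvement and development.
Summary of the Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a simulation generation method and system for MS/MS mass spectra of metabolic mixtures, aiming to solve the problem that existing generation methods are either costly and dependent on samples that are hard to collect, or give poor analysis performance and large errors.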
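The technical solution of the present invention is as follows:
A simulation generation method for MS/MS mass spectra of metabolic mixtures, comprising the steps of:
A. Let the mixture to be simulated contain N metabolites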
Figure PCTCN2016076226-appb-000001
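whose real MS/MS mass spectra are S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and m_d and i_d are the mass-to-charge ratio (m/z) and intensity of the d-th spectral line;
B. From the real MS/MS mass spectrum of each metabolite, compute a noise probability model for that metabolite;
C. Using each metabolite's noise probability model, generate the simulated spectrum set of the corresponding metabolite in Φ;
D. From the simulated spectrum sets of all metabolites, generate metabolic-mixture MS/MS simulated mass spectra one after another;
E. Set the maximum number of simulated spectra to L, collect the generated metabolic-mixture MS/MS simulated mass spectra into S* = {S*_1, S*_2, …, S*_L}, and output it as the result.
In the simulation generation method described above, step B specifically includes:
B1. Let the current input be the real MS/MS mass spectrum S_n of the n-th metabolite, S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …]; extract its m/z vector M = [m_1, m_2, …] and its intensity vector I = [i_1, i_2, …];
B2. For each m/z value in M, take its fractional part to form the m/z offset vector T = [t_1, t_2, …];
B3. Compute the mean μ_T and variance σ_T of T, and construct the m/z probability model as the normal distribution N(μ_T, σ_T);
B4. Compute the mean μ_I and variance σ_I of I, and construct the intensity probability model as the normal distribution N(μ_I, σ_I);
B5. The noise probability model of the n-th metabolite is then P_n = [N(μ_T, σ_T), N(μ_I, σ_I)].
In the simulation generation method described above, step C specifically includes:
C1. Let the current input be the real MS/MS mass spectrum S_n of the n-th metabolite and its noise probability model P_n; initialize the counter k = 1;
C2. Compute the range of the m/z vector of S_n as R = [min(M), max(M)], and let C be the vector of all integer values within R;
C3. For each c ∈ C, if the range R_c = [c − 0.5, c + 0.5] contains no spectral line, go to step C5; if it contains spectral lines, go to step C4;
C4. Add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and go to step C6;
C5. Generate a random value r uniformly distributed in [0, 1]; if r < p_ins, add a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
C6. Store the modified line data as the k-th simulated spectrum S*_{n,k} of the n-th metabolite and update the counter k = k + 1; if k < K, go to step C2, where K is the maximum number of spectra to generate;
C7. Output the simulated spectrum set of the n-th metabolite, S*_n = {S*_{n,1}, S*_{n,2}, …, S*_{n,K}}.
In the simulation generation method described above, step C4 specifically includes:
Generate a random value r uniformly distributed in [0, 1]; if r < p_del, delete the corresponding spectral line.
Generate a random value r uniformly distributed in [0, 1]; if r < p_mz, apply to m_d a random offset t drawn from N(μ_T, σ_T) ∈ P_n, giving m*_d = m_d + t;
Generate a random value r uniformly distributed in [0, 1]; if r < p_int, replace i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
Here p_del is the spectral line deletion probability, p_mz is the m/z offset probability, and p_int is the intensity offset probability.
In the simulation generation method described above, step D specifically includes:
D1. From the simulated spectrum set S*_n of each metabolite, n = 1, 2, …, N, randomly select one spectrum S*_{n,k}, k ∈ {1, …, K}, for N spectra in total; merge all of their spectral lines into a new spectrum vector S_l = [(m_1, i_1), (m_2, i_2), …];
D2. Extract the m/z vector M_l of S_l and compute its m/z probability model N_l(μ_T, σ_T);
D3. Model S_l with a regression algorithm to form a nonlinear model R_l;
D4. For each m_d in M_l, apply a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and use R_l to compute the corresponding intensity i*_d, forming a new simulated spectral line (m*_d, i*_d); assemble all simulated lines into the metabolic-mixture MS/MS simulated mass spectrum S*_l = [(m*_1, i*_1), (m*_2, i*_2), …] as the current output;
D5. Update the counter l = l + 1; if l < L, go to step D1.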
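A simulation generation system for MS/MS mass spectra of metabolic mixtures, comprising:
a setting module, configured to let the mixture to be simulated contain N metabolites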
Figure PCTCN2016076226-appb-000002
Figure PCTCN2016076226-appb-000003
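whose real MS/MS mass spectra are S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and m_d and i_d are the mass-to-charge ratio (m/z) and intensity of the d-th spectral line;
a noise probability model statistics module, configured to compute a noise probability model for each metabolite from its real MS/MS mass spectrum;
a simulated spectrum set generation module, configured to generate, from each metabolite's noise probability model, the simulated spectrum set of the corresponding metabolite in Φ;
a simulated spectrum generation module, configured to generate metabolic-mixture MS/MS simulated mass spectra one after another from the simulated spectrum sets of all metabolites;
a result output module, configured to set the maximum number of generated spectra to L, collect the generated metabolic-mixture MS/MS simulated mass spectra into S* = {S*_1, S*_2, …, S*_L}, and output it as the result.
In the simulation generation system described above, the noise probability model statistics module specifically includes:
an extraction unit, configured to take as the current input the real MS/MS mass spectrum S_n of the n-th metabolite, S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and extract its m/z vector M = [m_1, m_2, …] and its intensity vector I = [i_1, i_2, …];
an m/z offset vector forming unit, configured to take the fractional part of each m/z value in M to form the m/z offset vector T = [t_1, t_2, …];
a first construction unit, configured to compute the mean μ_T and variance σ_T of T and construct the m/z probability model as the normal distribution N(μ_T, σ_T);
a second construction unit, configured to compute the mean μ_I and variance σ_I of I and construct the intensity probability model as the normal distribution N(μ_I, σ_I);
a noise probability model generation unit, configured to obtain the noise probability model of the n-th metabolite as P_n = [N(μ_T, σ_T), N(μ_I, σ_I)].
In the simulation generation system described above, the simulated spectrum set generation module specifically includes:
an initialization unit, configured to take as the current input the real MS/MS mass spectrum S_n of the n-th metabolite and its noise probability model P_n, and initialize the counter k = 1;
a rounding unit, configured to compute the range of the m/z vector of S_n as R = [min(M), max(M)] and let C be the vector of all integer values within R;
a judgment unit, configured, for each c ∈ C, to pass control to the adding unit if the range R_c = [c − 0.5, c + 0.5] contains no spectral line, and to the replacement unit if it contains spectral lines;
a replacement unit, configured to add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and then pass control to the storage unit;
an adding unit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_ins, add a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
a storage unit, configured to store the modified line data as the k-th simulated spectrum S*_{n,k} of the n-th metabolite, update the counter k = k + 1, and, if k < K, pass control to the rounding unit, where K is the maximum number of spectra to generate;
an output unit, configured to output the simulated spectrum set of the n-th metabolite, S*_n = {S*_{n,1}, S*_{n,2}, …, S*_{n,K}}.
In the simulation generation system described above, the replacement unit specifically includes:
a deletion subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_del, delete the corresponding spectral line.
an m/z offset subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_mz, apply to m_d a random offset t drawn from N(μ_T, σ_T) ∈ P_n, giving m*_d = m_d + t;
an intensity offset subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_int, replace i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
Here p_del is the spectral line deletion probability, p_mz is the m/z offset probability, and p_int is the intensity offset probability.
In the simulation generation system described above, the simulated spectrum generation module specifically includes:
a mixing unit, configured to randomly select, from the simulated spectrum set S*_n of each metabolite, n = 1, 2, …, N, one spectrum S*_{n,k}, k ∈ {1, …, K}, for N spectra in total, and merge all of their spectral lines into a new spectrum vector S_l = [(m_1, i_1), (m_2, i_2), …];
a calculation unit, configured to extract the m/z vector M_l of S_l and compute its m/z probability model N_l(μ_T, σ_T);
a modeling unit, configured to model S_l with a regression algorithm to form a nonlinear model R_l;
a random offset unit, configured, for each m_d in M_l, to apply a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and use R_l to compute the corresponding intensity i*_d, forming a new simulated spectral line (m*_d, i*_d), and to assemble all simulated lines into the metabolic-mixture MS/MS simulated mass spectrum S*_l = [(m*_1, i*_1), (m*_2, i*_2), …] as the current output;
an update unit, configured to update the counter l = l + 1 and, if l < L, pass control to the mixing unit.
Beneficial effects: the present invention does not rely on real experiments. By changing parameter settings it can generate the required metabolic-mixture MS/MS simulated mass spectra in large numbers, at very low cost and with a sample size that is not limited by acquisition conditions. Moreover, when conditions or the environment change, experiments do not need to be redesigned and rerun, which helps improve the efficiency of metabolomics research and development. The invention uses a nonlinear regression model to generate the MS/MS simulated mass spectra, avoiding the accuracy problems caused by simple linear superposition in traditional algorithms. In addition, a noise probability model is built from statistics of real metabolite mass spectrometry data, covering the complex interference encountered in real applications. The generated spectra are closer to reality, can effectively guide early-stage metabolomics research and development, and can be used in part to verify algorithm performance.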
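Brief Description of the Drawings
Figures 1 and 2 are schematic diagrams of the data structure of a tandem (MS/MS) mass spectrum in the present invention.
Figure 3 is a flow chart of a preferred embodiment of the simulation generation method for MS/MS mass spectra of metabolic mixtures according to the present invention.
Figure 4 shows the construction method for the MS/MS simulated mass spectrum of a single metabolite.
Figure 5 shows the construction method for the MS/MS simulated mass spectrum of a metabolic mixture.
Detailed Description
The present invention provides a simulation generation method and system for MS/MS mass spectra of metabolic mixtures. To make the objects, technical solutions and effects of the invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
Referring to Figure 3, which is a flow chart of a preferred embodiment of the simulation generation method for MS/MS mass spectra of metabolic mixtures according to the present invention, the method includes the steps of:
A. Let the mixture to be simulated contain N metabolites Φ, the target metabolite set: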
Figure PCTCN2016076226-appb-000004
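whose real MS/MS mass spectra are S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and m_d and i_d are the mass-to-charge ratio (m/z) and intensity of the d-th spectral line;
B. From the real MS/MS mass spectrum of each metabolite, compute a noise probability model for that metabolite;
C. Using each metabolite's noise probability model, generate the simulated spectrum set of the corresponding metabolite in Φ;
D. From the simulated spectrum sets of all metabolites, generate metabolic-mixture MS/MS simulated mass spectra one after another;
E. Set the maximum number of simulated spectra to L, collect the generated metabolic-mixture MS/MS simulated mass spectra into S* = {S*_1, S*_2, …, S*_L}, and output it as the result.
The present invention starts from the information in existing single-metabolite MS/MS mass spectrum databases and builds a noise probability model by computing the distributions of mass-to-charge ratio (m/z) and intensity. Editing errors are then used to add or remove original spectral lines, and simulated noise is added according to the noise probability model, forming a set of putative metabolite MS/MS mass spectra. Finally, a regression model is used to model the simulated spectra nonlinearly, and the MS/MS simulated mass spectrum of the mixture is produced as the output of the algorithm.
In step A, let the mixture to be simulated contain N metabolites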
Figure PCTCN2016076226-appb-000005
Figure PCTCN2016076226-appb-000006
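By querying an existing metabolite MS/MS mass spectrum database such as MassBank, the real MS/MS mass spectra of these N metabolites are obtained as S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …], S_n ∈ S, and m_d and i_d are the m/z and intensity of its d-th spectral line.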
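The data layout of step A can be sketched in a few lines of Python. This is only an illustration, not part of the patent: the type aliases `Line` and `Spectrum`, the helper `split_vectors`, and the metabolite names and peak values are all made up for the example, and no real MassBank record or API is used.

```python
from typing import Dict, List, Tuple

Line = Tuple[float, float]   # one spectral line: (m/z, intensity)
Spectrum = List[Line]        # one MS/MS spectrum S_n

# Hypothetical reference spectra S = {S_1, ..., S_N}; the values are invented.
reference_spectra: Dict[str, Spectrum] = {
    "metabolite_1": [(41.04, 12.0), (55.05, 100.0), (73.03, 48.0)],
    "metabolite_2": [(43.02, 100.0), (58.07, 31.0)],
}


def split_vectors(spectrum: Spectrum) -> Tuple[List[float], List[float]]:
    """Return the m/z vector M and the intensity vector I of one spectrum."""
    mz = [m for m, _ in spectrum]
    intensity = [i for _, i in spectrum]
    return mz, intensity
```

Keeping each spectrum as an ordered list of (m/z, intensity) pairs mirrors the notation S_n = [(m_1, i_1), (m_2, i_2), …] used throughout the description.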
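Step B specifically includes:
B1. Let the current input be the real MS/MS mass spectrum S_n of the n-th metabolite, S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …]; extract its m/z vector M = [m_1, m_2, …] and its intensity vector I = [i_1, i_2, …];
B2. For each m/z value in M, take its fractional part to form the m/z offset vector T = [t_1, t_2, …]; for example, if m_d = 12.36 ∈ M, then the corresponding t_d = 0.36, t_d ∈ T, where t_d is the fractional part of m_d.
B3. Compute the mean μ_T and variance σ_T of T, and construct the m/z probability model as the normal distribution N(μ_T, σ_T);
B4. Compute the mean μ_I and variance σ_I of I, and construct the intensity probability model as the normal distribution N(μ_I, σ_I);
B5. The noise probability model of the n-th metabolite is then P_n = [N(μ_T, σ_T), N(μ_I, σ_I)].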
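Steps B1 to B5 amount to fitting two normal distributions per metabolite. The sketch below shows one way to do that with the Python standard library. The names `NoiseModel`, `fractional_part` and `fit_noise_model` are illustrative, and the use of the population variance is an assumption: the text only says "variance" and does not specify the estimator.

```python
import math
from statistics import fmean, pvariance
from typing import List, NamedTuple, Tuple

Line = Tuple[float, float]  # (m/z, intensity)


class NoiseModel(NamedTuple):
    """P_n = [N(mu_T, sigma_T), N(mu_I, sigma_I)], stored as means and variances."""
    mu_t: float
    var_t: float
    mu_i: float
    var_i: float


def fractional_part(mz: float) -> float:
    """Step B2: fractional part of an m/z value, e.g. 12.36 -> 0.36."""
    return mz - math.floor(mz)


def fit_noise_model(spectrum: List[Line]) -> NoiseModel:
    """Steps B1-B5 for one reference spectrum (population variance assumed)."""
    offsets = [fractional_part(m) for m, _ in spectrum]   # vector T
    intensities = [i for _, i in spectrum]                # vector I
    return NoiseModel(
        mu_t=fmean(offsets),
        var_t=pvariance(offsets),
        mu_i=fmean(intensities),
        var_i=pvariance(intensities),
    )
```

Because the model stores variances, any sampler that expects a standard deviation (such as `random.gauss`) needs the square root of `var_t` or `var_i`.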
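As shown in Figure 4, step C specifically includes:
C1. Let the current input be the real MS/MS mass spectrum S_n of the n-th metabolite and its noise probability model P_n; initialize the counter k = 1;
C2. Compute the range of the m/z vector of S_n as R = [min(M), max(M)], and let C be the vector of all integer values within R; for example, if R = [0, 5], then C = [0, 1, 2, 3, 4, 5].
C3. For each c ∈ C, if the range R_c = [c − 0.5, c + 0.5] contains no real spectral line, that is, no m_d ∈ M lies within R_c, go to step C5; if R_c contains spectral lines, go to step C4;
C4. Add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and go to step C6;
C5. Generate a random value r uniformly distributed in [0, 1]; if r < p_ins, add a noise spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
C6. Store the modified line data as the k-th simulated spectrum S*_{n,k} of the n-th metabolite and update the counter k = k + 1; if k < K, go to step C2, where K is the maximum number of spectra to generate;
C7. Output the (MS/MS) simulated spectrum set of the n-th metabolite, S*_n = {S*_{n,1}, S*_{n,2}, …, S*_{n,K}}.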
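The integer grid of step C2, the bin test of step C3 and the insertion rule of step C5 can be sketched as follows. The function names are illustrative. Two points are assumptions rather than statements from the text: "all integer values within R" is read as the integers from ceil(min(M)) to floor(max(M)), and σ is treated as a variance, so its square root is passed to `random.gauss`.

```python
import math
import random
from typing import List, Optional, Tuple

Line = Tuple[float, float]  # (m/z, intensity)


def integer_grid(spectrum: List[Line]) -> List[int]:
    """Step C2: all integers inside R = [min(M), max(M)]."""
    mz = [m for m, _ in spectrum]
    return list(range(math.ceil(min(mz)), math.floor(max(mz)) + 1))


def lines_in_bin(spectrum: List[Line], c: int) -> List[Line]:
    """Step C3: lines with m/z in the closed bin R_c = [c - 0.5, c + 0.5].

    With closed bins, a line at exactly c + 0.5 belongs to two neighbouring bins.
    """
    return [(m, i) for m, i in spectrum if c - 0.5 <= m <= c + 0.5]


def maybe_insert_line(c: int, mu_t: float, var_t: float, mu_i: float,
                      var_i: float, p_ins: float,
                      rng: random.Random) -> Optional[Line]:
    """Step C5: with probability p_ins, create a noise line in an empty bin."""
    if rng.random() < p_ins:
        t = rng.gauss(mu_t, math.sqrt(var_t))           # random m/z offset
        intensity = rng.gauss(mu_i, math.sqrt(var_i))   # random intensity
        return (c + t, intensity)
    return None
```

A normal draw for the intensity can come out negative; the text does not say how such values should be handled, so the sketch leaves them untouched.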
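Step C4 specifically includes:
Generate a random value r uniformly distributed in [0, 1]; if r < p_del, delete the corresponding spectral line.
Generate a random value r uniformly distributed in [0, 1]; if r < p_mz, apply to m_d a random offset t drawn from N(μ_T, σ_T) ∈ P_n, giving m*_d = m_d + t.
Generate a random value r uniformly distributed in [0, 1]; if r < p_int, replace i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
Here p_del is the spectral line deletion probability, p_mz is the m/z offset probability, and p_int is the intensity offset probability.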
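The three editing operations of step C4 can be sketched as a single function applied to one spectral line. The name `perturb_line` is illustrative. The sketch assumes that the three random tests are independent, that a deleted line skips the remaining two tests, and that σ denotes a variance (hence the square root); the text does not spell out any of these details.

```python
import math
import random
from typing import Optional, Tuple

Line = Tuple[float, float]  # (m/z, intensity)


def perturb_line(line: Line, mu_t: float, var_t: float, mu_i: float,
                 var_i: float, p_del: float, p_mz: float, p_int: float,
                 rng: random.Random) -> Optional[Line]:
    """Apply deletion, m/z offset and intensity replacement to one line.

    Returns None when the line is deleted, otherwise the new (m*, i*).
    """
    m, i = line
    if rng.random() < p_del:                       # deletion test
        return None
    if rng.random() < p_mz:                        # m/z offset: m* = m + t
        m = m + rng.gauss(mu_t, math.sqrt(var_t))
    if rng.random() < p_int:                       # intensity replaced by a new draw
        i = rng.gauss(mu_i, math.sqrt(var_i))
    return (m, i)
```

Passing an explicit `random.Random` instance keeps the simulation reproducible when a seed is fixed.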
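As shown in Figure 5, step D specifically includes:
D1. From the simulated spectrum set S*_n of each metabolite, n = 1, 2, …, N, randomly select one spectrum S*_{n,k}, k ∈ {1, …, K}, for N spectra in total; merge all of their spectral lines into a new spectrum vector S_l = [(m_1, i_1), (m_2, i_2), …];
D2. Extract the m/z vector M_l of S_l and compute its m/z probability model N_l(μ_T, σ_T); for the specific method, see steps B1 to B3.
D3. Model S_l with a regression algorithm to form a nonlinear model R_l; for example, a method such as support vector machine regression (SVR) can be used to build the nonlinear model R_l;
D4. For each m_d in M_l, apply a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and use R_l to compute the corresponding intensity i*_d, forming a new simulated spectral line (m*_d, i*_d); assemble all simulated lines into the metabolic-mixture MS/MS simulated mass spectrum S*_l = [(m*_1, i*_1), (m*_2, i*_2), …] as the current output;
D5. Update the counter l = l + 1; if l < L, go to step D1.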
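One pass of steps D1 to D4 can be sketched as below. To keep the example dependency-free, the nonlinear model R_l is a Gaussian-kernel (Nadaraya-Watson) regressor, used here as a stand-in for the SVR that the text mentions as one option; it is not the patent's model. The names `kernel_regressor` and `simulate_mixture` and the `bandwidth` parameter are illustrative, the population variance is assumed, and the merged spectrum is assumed to contain at least one line.

```python
import math
import random
from statistics import fmean, pvariance
from typing import Callable, List, Sequence, Tuple

Line = Tuple[float, float]  # (m/z, intensity)
Spectrum = List[Line]


def kernel_regressor(lines: Spectrum, bandwidth: float = 0.5) -> Callable[[float], float]:
    """Nonlinear regression of intensity on m/z (stand-in for SVR in step D3)."""
    def predict(x: float) -> float:
        weights = [math.exp(-0.5 * ((x - m) / bandwidth) ** 2) for m, _ in lines]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed: fall back to the nearest line's intensity.
            return min(lines, key=lambda line: abs(line[0] - x))[1]
        return sum(w * i for w, (_, i) in zip(weights, lines)) / total
    return predict


def simulate_mixture(spectrum_sets: Sequence[Sequence[Spectrum]],
                     rng: random.Random, bandwidth: float = 0.5) -> Spectrum:
    """One pass of steps D1-D4 over the per-metabolite simulated spectrum sets."""
    # D1: pick one simulated spectrum per metabolite and merge all lines.
    mixed = [line for group in spectrum_sets for line in rng.choice(group)]
    # D2: m/z offset model of the merged spectrum (as in steps B1-B3).
    offsets = [m - math.floor(m) for m, _ in mixed]
    mu_t, sigma_t = fmean(offsets), math.sqrt(pvariance(offsets))
    # D3: fit the nonlinear intensity model R_l.
    predict = kernel_regressor(mixed, bandwidth)
    # D4: shift each m/z and read the intensity off the model.
    simulated = []
    for m, _ in mixed:
        m_new = m + rng.gauss(mu_t, sigma_t)
        simulated.append((m_new, predict(m_new)))
    return simulated
```

Calling `simulate_mixture` L times and collecting the results corresponds to the loop closed by step D5 and the output set S* of step E.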
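Based on the above method, the present invention also provides a preferred embodiment of a simulation generation system for MS/MS mass spectra of metabolic mixtures, comprising:
a setting module, configured to let the mixture to be simulated contain N metabolites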
Figure PCTCN2016076226-appb-000007
Figure PCTCN2016076226-appb-000008
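whose real MS/MS mass spectra are S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and m_d and i_d are the mass-to-charge ratio (m/z) and intensity of the d-th spectral line;
a noise probability model statistics module, configured to compute a noise probability model for each metabolite from its real MS/MS mass spectrum;
a simulated spectrum set generation module, configured to generate, from each metabolite's noise probability model, the simulated spectrum set of the corresponding metabolite in Φ;
a simulated spectrum generation module, configured to generate metabolic-mixture MS/MS simulated mass spectra one after another from the simulated spectrum sets of all metabolites;
a result output module, configured to set the maximum number of generated spectra to L, collect the generated metabolic-mixture MS/MS simulated mass spectra into S* = {S*_1, S*_2, …, S*_L}, and output it as the result.
Further, the noise probability model statistics module specifically includes:
an extraction unit, configured to take as the current input the real MS/MS mass spectrum S_n of the n-th metabolite, S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and extract its m/z vector M = [m_1, m_2, …] and its intensity vector I = [i_1, i_2, …];
an m/z offset vector forming unit, configured to take the fractional part of each m/z value in M to form the m/z offset vector T = [t_1, t_2, …];
a first construction unit, configured to compute the mean μ_T and variance σ_T of T and construct the m/z probability model as the normal distribution N(μ_T, σ_T);
a second construction unit, configured to compute the mean μ_I and variance σ_I of I and construct the intensity probability model as the normal distribution N(μ_I, σ_I);
a noise probability model generation unit, configured to obtain the noise probability model of the n-th metabolite as P_n = [N(μ_T, σ_T), N(μ_I, σ_I)].
Further, the simulated spectrum set generation module specifically includes:
an initialization unit, configured to take as the current input the real MS/MS mass spectrum S_n of the n-th metabolite and its noise probability model P_n, and initialize the counter k = 1;
a rounding unit, configured to compute the range of the m/z vector of S_n as R = [min(M), max(M)] and let C be the vector of all integer values within R;
a judgment unit, configured, for each c ∈ C, to pass control to the adding unit if the range R_c = [c − 0.5, c + 0.5] contains no spectral line, and to the replacement unit if it contains spectral lines;
a replacement unit, configured to add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and then pass control to the storage unit;
an adding unit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_ins, add a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
a storage unit, configured to store the modified line data as the k-th simulated spectrum S*_{n,k} of the n-th metabolite, update the counter k = k + 1, and, if k < K, pass control to the rounding unit, where K is the maximum number of spectra to generate;
an output unit, configured to output the simulated spectrum set of the n-th metabolite, S*_n = {S*_{n,1}, S*_{n,2}, …, S*_{n,K}}.
Further, the replacement unit specifically includes:
a deletion subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_del, delete the corresponding spectral line.
an m/z offset subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_mz, apply to m_d a random offset t drawn from N(μ_T, σ_T) ∈ P_n, giving m*_d = m_d + t;
an intensity offset subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_int, replace i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
Here p_del is the spectral line deletion probability, p_mz is the m/z offset probability, and p_int is the intensity offset probability.
Further, the simulated spectrum generation module specifically includes:
a mixing unit, configured to randomly select, from the simulated spectrum set S*_n of each metabolite, n = 1, 2, …, N, one spectrum S*_{n,k}, k ∈ {1, …, K}, for N spectra in total, and merge all of their spectral lines into a new spectrum vector S_l = [(m_1, i_1), (m_2, i_2), …];
a calculation unit, configured to extract the m/z vector M_l of S_l and compute its m/z probability model N_l(μ_T, σ_T);
a modeling unit, configured to model S_l with a regression algorithm to form a nonlinear model R_l;
a random offset unit, configured, for each m_d in M_l, to apply a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and use R_l to compute the corresponding intensity i*_d, forming a new simulated spectral line (m*_d, i*_d), and to assemble all simulated lines into the metabolic-mixture MS/MS simulated mass spectrum S*_l = [(m*_1, i*_1), (m*_2, i*_2), …] as the current output;
an update unit, configured to update the counter l = l + 1 and, if l < L, pass control to the mixing unit.
The technical details of the above modules and units have been described in the method above and are not repeated here.
In summary, the present invention does not rely on real experiments. By changing parameter settings it can generate the required metabolic-mixture MS/MS simulated mass spectra in large numbers, at very low cost and with a sample size that is not limited by acquisition conditions. Moreover, when conditions or the environment change, experiments do not need to be redesigned and rerun, which helps improve the efficiency of metabolomics research and development. The invention uses a nonlinear regression model to generate the MS/MS simulated mass spectra, avoiding the accuracy problems caused by simple linear superposition in traditional algorithms. In addition, a noise probability model is built from statistics of real metabolite mass spectrometry data, covering the complex interference encountered in real applications. The generated spectra are closer to reality, can effectively guide early-stage metabolomics research and development, and can be used in part to verify algorithm performance.
It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications fall within the scope of protection of the appended claims.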

Claims (10)

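  1. A simulation generation method for MS/MS mass spectra of metabolic mixtures, characterized by comprising the steps of:
    A. Let the mixture to be simulated contain N metabolites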
    Figure PCTCN2016076226-appb-100001
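    whose real MS/MS mass spectra are S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and m_d and i_d are the mass-to-charge ratio (m/z) and intensity of the d-th spectral line;
    B. From the real MS/MS mass spectrum of each metabolite, compute a noise probability model for that metabolite;
    C. Using each metabolite's noise probability model, generate the simulated spectrum set of the corresponding metabolite in Φ;
    D. From the simulated spectrum sets of all metabolites, generate metabolic-mixture MS/MS simulated mass spectra one after another;
    E. Set the maximum number of simulated spectra to L, collect the generated metabolic-mixture MS/MS simulated mass spectra into S* = {S*_1, S*_2, …, S*_L}, and output it as the result.
  2. The simulation generation method for MS/MS mass spectra of metabolic mixtures according to claim 1, characterized in that step B specifically includes:
    B1. Let the current input be the real MS/MS mass spectrum S_n of the n-th metabolite, S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …]; extract its m/z vector M = [m_1, m_2, …] and its intensity vector I = [i_1, i_2, …];
    B2. For each m/z value in M, take its fractional part to form the m/z offset vector T = [t_1, t_2, …];
    B3. Compute the mean μ_T and variance σ_T of T, and construct the m/z probability model as the normal distribution N(μ_T, σ_T);
    B4. Compute the mean μ_I and variance σ_I of I, and construct the intensity probability model as the normal distribution N(μ_I, σ_I);
    B5. The noise probability model of the n-th metabolite is then P_n = [N(μ_T, σ_T), N(μ_I, σ_I)].
  3. The simulation generation method for MS/MS mass spectra of metabolic mixtures according to claim 1, characterized in that step C specifically includes:
    C1. Let the current input be the real MS/MS mass spectrum S_n of the n-th metabolite and its noise probability model P_n; initialize the counter k = 1;
    C2. Compute the range of the m/z vector of S_n as R = [min(M), max(M)], and let C be the vector of all integer values within R;
    C3. For each c ∈ C, if the range R_c = [c − 0.5, c + 0.5] contains no spectral line, go to step C5; if it contains spectral lines, go to step C4;
    C4. Add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and go to step C6;
    C5. Generate a random value r uniformly distributed in [0, 1]; if r < p_ins, add a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
    C6. Store the modified line data as the k-th simulated spectrum S*_{n,k} of the n-th metabolite and update the counter k = k + 1; if k < K, go to step C2, where K is the maximum number of spectra to generate;
    C7. Output the simulated spectrum set of the n-th metabolite, S*_n = {S*_{n,1}, S*_{n,2}, …, S*_{n,K}}.
  4. The simulation generation method for MS/MS mass spectra of metabolic mixtures according to claim 3, characterized in that step C4 specifically includes:
    Generate a random value r uniformly distributed in [0, 1]; if r < p_del, delete the corresponding spectral line.
    Generate a random value r uniformly distributed in [0, 1]; if r < p_mz, apply to m_d a random offset t drawn from N(μ_T, σ_T) ∈ P_n, giving m*_d = m_d + t;
    Generate a random value r uniformly distributed in [0, 1]; if r < p_int, replace i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
    Here p_del is the spectral line deletion probability, p_mz is the m/z offset probability, and p_int is the intensity offset probability.
  5. The simulation generation method for MS/MS mass spectra of metabolic mixtures according to claim 1, characterized in that step D specifically includes:
    D1. From the simulated spectrum set S*_n of each metabolite, n = 1, 2, …, N, randomly select one spectrum S*_{n,k}, k ∈ {1, …, K}, for N spectra in total; merge all of their spectral lines into a new spectrum vector S_l = [(m_1, i_1), (m_2, i_2), …];
    D2. Extract the m/z vector M_l of S_l and compute its m/z probability model N_l(μ_T, σ_T);
    D3. Model S_l with a regression algorithm to form a nonlinear model R_l;
    D4. For each m_d in M_l, apply a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and use R_l to compute the corresponding intensity i*_d, forming a new simulated spectral line (m*_d, i*_d); assemble all simulated lines into the metabolic-mixture MS/MS simulated mass spectrum S*_l = [(m*_1, i*_1), (m*_2, i*_2), …] as the current output;
    D5. Update the counter l = l + 1; if l < L, go to step D1.
  6. A simulation generation system for MS/MS mass spectra of metabolic mixtures, characterized by comprising:
    a setting module, configured to let the mixture to be simulated contain N metabolites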
    Figure PCTCN2016076226-appb-100002
    Figure PCTCN2016076226-appb-100003
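    whose real MS/MS mass spectra are S = {S_1, S_2, …, S_n, …, S_N}, where each S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and m_d and i_d are the mass-to-charge ratio (m/z) and intensity of the d-th spectral line;
    a noise probability model statistics module, configured to compute a noise probability model for each metabolite from its real MS/MS mass spectrum;
    a simulated spectrum set generation module, configured to generate, from each metabolite's noise probability model, the simulated spectrum set of the corresponding metabolite in Φ;
    a simulated spectrum generation module, configured to generate metabolic-mixture MS/MS simulated mass spectra one after another from the simulated spectrum sets of all metabolites;
    a result output module, configured to set the maximum number of generated spectra to L, collect the generated metabolic-mixture MS/MS simulated mass spectra into S* = {S*_1, S*_2, …, S*_L}, and output it as the result.
  7. The simulation generation system for MS/MS mass spectra of metabolic mixtures according to claim 6, characterized in that the noise probability model statistics module specifically includes:
    an extraction unit, configured to take as the current input the real MS/MS mass spectrum S_n of the n-th metabolite, S_n = [(m_1, i_1), (m_2, i_2), …, (m_d, i_d), …], and extract its m/z vector M = [m_1, m_2, …] and its intensity vector I = [i_1, i_2, …];
    an m/z offset vector forming unit, configured to take the fractional part of each m/z value in M to form the m/z offset vector T = [t_1, t_2, …];
    a first construction unit, configured to compute the mean μ_T and variance σ_T of T and construct the m/z probability model as the normal distribution N(μ_T, σ_T);
    a second construction unit, configured to compute the mean μ_I and variance σ_I of I and construct the intensity probability model as the normal distribution N(μ_I, σ_I);
    a noise probability model generation unit, configured to obtain the noise probability model of the n-th metabolite as P_n = [N(μ_T, σ_T), N(μ_I, σ_I)].
  8. The simulation generation system for MS/MS mass spectra of metabolic mixtures according to claim 6, characterized in that the simulated spectrum set generation module specifically includes:
    an initialization unit, configured to take as the current input the real MS/MS mass spectrum S_n of the n-th metabolite and its noise probability model P_n, and initialize the counter k = 1;
    a rounding unit, configured to compute the range of the m/z vector of S_n as R = [min(M), max(M)] and let C be the vector of all integer values within R;
    a judgment unit, configured, for each c ∈ C, to pass control to the adding unit if the range R_c = [c − 0.5, c + 0.5] contains no spectral line, and to the replacement unit if it contains spectral lines;
    a replacement unit, configured to add simulated noise to each spectral line (m_d, i_d) in R_c to obtain (m*_d, i*_d), replace the original (m_d, i_d) with it, and then pass control to the storage unit;
    an adding unit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_ins, add a spectral line (m_d, i_d) within R_c, where m_d = c + t, t is a random offset drawn from N(μ_T, σ_T) ∈ P_n, i_d is a random value drawn from N(μ_I, σ_I) ∈ P_n, and p_ins is the spectral line insertion probability;
    a storage unit, configured to store the modified line data as the k-th simulated spectrum S*_{n,k} of the n-th metabolite, update the counter k = k + 1, and, if k < K, pass control to the rounding unit, where K is the maximum number of spectra to generate;
    an output unit, configured to output the simulated spectrum set of the n-th metabolite, S*_n = {S*_{n,1}, S*_{n,2}, …, S*_{n,K}}.
  9. The simulation generation system for MS/MS mass spectra of metabolic mixtures according to claim 8, characterized in that the replacement unit specifically includes:
    a deletion subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_del, delete the corresponding spectral line.
    an m/z offset subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_mz, apply to m_d a random offset t drawn from N(μ_T, σ_T) ∈ P_n, giving m*_d = m_d + t;
    an intensity offset subunit, configured to generate a random value r uniformly distributed in [0, 1] and, if r < p_int, replace i_d with a new random value i*_d drawn from N(μ_I, σ_I) ∈ P_n;
    Here p_del is the spectral line deletion probability, p_mz is the m/z offset probability, and p_int is the intensity offset probability.
  10. The simulation generation system for MS/MS mass spectra of metabolic mixtures according to claim 6, characterized in that the simulated spectrum generation module specifically includes:
    a mixing unit, configured to randomly select, from the simulated spectrum set S*_n of each metabolite, n = 1, 2, …, N, one spectrum S*_{n,k}, k ∈ {1, …, K}, for N spectra in total, and merge all of their spectral lines into a new spectrum vector S_l = [(m_1, i_1), (m_2, i_2), …];
    a calculation unit, configured to extract the m/z vector M_l of S_l and compute its m/z probability model N_l(μ_T, σ_T);
    a modeling unit, configured to model S_l with a regression algorithm to form a nonlinear model R_l;
    a random offset unit, configured, for each m_d in M_l, to apply a random offset t drawn from N_l(μ_T, σ_T), giving m*_d = m_d + t, and use R_l to compute the corresponding intensity i*_d, forming a new simulated spectral line (m*_d, i*_d), and to assemble all simulated lines into the metabolic-mixture MS/MS simulated mass spectrum S*_l = [(m*_1, i*_1), (m*_2, i*_2), …] as the current output;
    an update unit, configured to update the counter l = l + 1 and, if l < L, pass control to the mixing unit.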
PCT/CN2016/076226 2016-01-25 2016-03-14 Simulation generation method and system for MS/MS mass spectra of metabolic mixtures WO2017128497A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610049964.4A CN105760708B (zh) 2016-01-25 2016-01-25 Simulation generation method and system for MS/MS mass spectra of metabolic mixtures
CN201610049964.4 2016-01-25

Publications (1)

Publication Number Publication Date
WO2017128497A1 (zh) 2017-08-03

Family

ID=56342555

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/076226 WO2017128497A1 (zh) 2016-01-25 2016-03-14 Simulation generation method and system for MS/MS mass spectra of metabolic mixtures

Country Status (2)

Country Link
CN (1) CN105760708B (zh)
WO (1) WO2017128497A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
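CN102339356A (zh) * 2011-07-01 2012-02-01 苏州大学 Method for evaluating and predicting drug toxicity and efficacy using metabolomics techniques
CN104615903A (zh) * 2015-02-16 2015-05-13 厦门大学 Model-adaptive NMR metabolomics data normalization method
CN104834832A (zh) * 2015-05-26 2015-08-12 哈尔滨工业大学深圳研究生院 Computer simulation method for metabolite MS/MS mass spectra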

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10840073B2 (en) * 2012-05-18 2020-11-17 Thermo Fisher Scientific (Bremen) Gmbh Methods and apparatus for obtaining enhanced mass spectrometric data
CA2884633A1 (en) * 2012-09-13 2014-03-20 President And Fellows Of Harvard College Methods for multiplex analytical measurements in single cells of solid tissues
CN103728379B (zh) * 2012-10-11 2015-06-10 中国科学院大连化学物理研究所 Method for judging the collection quality of blood samples used for scientific research

Also Published As

Publication number Publication date
CN105760708A (zh) 2016-07-13
CN105760708B (zh) 2018-12-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16887379

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16887379

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09/05/2019)