CN104316591B - A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode - Google Patents
- Publication number: CN104316591B
- Application number: CN201410498854.7A
- Authority: CN (China)
- Prior art keywords: characteristic parameter, vector, parameter
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The present invention relates to a method for extracting characteristic parameters of peptide mass spectral peaks. Existing methods struggle to guarantee the precision of the extracted peak parameters when the sampling points that form a peak in a peptide-fragment mass spectrum deviate substantially from the assumed distribution. The invention proposes an extraction method based on nonlinear fitting: it uses multiple sampling points, takes minimizing the difference between the measured data and the fitted result as its objective, and iteratively updates the parameter estimates until a convergence condition is met, which yields the final characteristic parameter estimates. The method effectively reduces the adverse effect of sampling-point deviations on solving for the Gaussian curve parameters and improves the numerical precision of the characteristic parameters, which in turn helps improve the precision of peptide identification.
Description
Technical field
The invention belongs to the technical field of biological mass spectrometry data preprocessing and information extraction, and specifically relates to a method for extracting characteristic parameters of peptide mass spectral peaks based on nonlinear fitting.
Background technology
Peptide identification based on tandem mass spectrometry is a widely used technique in current proteomics research. The peptide to be identified is fragmented into fragment ions in the mass spectrometer, producing tandem mass spectrometry data. These data are compared against a theoretical tandem mass spectrum library or a library of already identified peptide mass spectra, which completes the identification of the unknown peptide.
When an ion is measured by mass spectrometry, the detected mass-to-charge ratio data normally form not a single numerical point but several sampling points, which on the mass spectrum fit a Gaussian curve, i.e. a Gaussian peak. To determine the mass-to-charge ratio of the ion, these sampling points must be preprocessed to compute their centroid along the horizontal axis, which is the measured mass-to-charge ratio of the ion. From the centroid, other characteristic parameters such as the maximum abundance of the ion can then be derived. Several centroid-solving methods exist. A common approach assumes that every sampling point of a Gaussian peak lies exactly on a Gaussian curve, substitutes the sampling-point values (mass-to-charge ratio and abundance) into the general Gaussian function with unknown parameters to build a system of simultaneous equations, and solves it for the characteristic parameters of the peak, including the centroid and the maximum abundance. MaxQuant, a very widely used proteomics data analysis software package, adopts this approach. In actual measurements, however, factors such as experimental conditions, the surrounding environment and instrument noise mean that the sampling points often do not lie exactly on a Gaussian curve but deviate from it. When the deviations are large, the assumption underlying the above approach no longer holds, so the solved characteristic parameters carry large numerical errors, which in turn degrades the precision of peptide identification.
Summary of the invention
The object of the invention is to overcome the shortcomings of the above approach by proposing a method for extracting characteristic parameters of peptide mass spectral peaks based on nonlinear fitting.
Suppose the Gaussian peak of an ion in the mass spectrum consists of $N$ sampling points, where normally $N \ge 3$. After the sampling points are sorted by abundance in descending order, their coordinates form the set $A$:
$$A = \{(m_1, d_1), (m_2, d_2), \ldots, (m_N, d_N)\}$$
where $m_i$ denotes the mass-to-charge ratio and $d_i$ the abundance, with $d_i > 0$ and $i \in \{1, 2, \ldots, N\}$. The Gaussian curve to be fitted through the sampling points has the functional form:
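The formula itself is not reproduced in this text. One form consistent with the parameter definitions that follow (an assumption, with $p_1$ acting as the peak height rather than as the factor of a normalized density) is:

```latex
f(x, P) = p_1 \exp\!\left( -\frac{(x - p_2)^2}{2 p_3^2} \right)
```

Under this assumed form, the maximum abundance equals $p_1$ and is reached at the centroid $x = p_2$.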
where the function $f(x, P)$ denotes the abundance, the independent variable $x$ denotes the mass-to-charge ratio, and $p_1$, $p_2$ and $p_3$ are the Gaussian curve characteristic parameters to be solved, representing the scaling factor, the centroid and the standard deviation respectively; together they form the characteristic parameter vector $P = [p_1\ p_2\ p_3]$. The characteristic parameter extraction method proceeds as follows:
Step (1): Assign initial values to the Gaussian curve characteristic parameters from the 3 sampling points with the largest abundance.
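The initial-value formulas are not reproduced in this text. A sketch of one closed-form choice, assuming the peak model $f(x, P) = p_1 \exp(-(x - p_2)^2 / (2 p_3^2))$ and requiring it to pass exactly through the three most abundant points $(m_1, d_1)$, $(m_2, d_2)$, $(m_3, d_3)$, is given here; the helper symbols $a$ and $b$ are introduced only for illustration:

```latex
\begin{aligned}
a &= \ln\frac{d_1}{d_2}, \qquad b = \ln\frac{d_1}{d_3}, \\
p_2 &= \frac{a\,(m_3^2 - m_1^2) - b\,(m_2^2 - m_1^2)}{2\,\bigl[\,a\,(m_3 - m_1) - b\,(m_2 - m_1)\,\bigr]}, \\
p_3 &= \sqrt{\frac{(m_3 - p_2)^2 - (m_1 - p_2)^2}{2b}}, \\
p_1 &= d_1 \exp\!\left( \frac{(m_1 - p_2)^2}{2 p_3^2} \right)
\end{aligned}
```

These expressions follow from taking logarithms of the assumed model at the three points, which turns the Gaussian into a parabola in $m$. They require distinct mass-to-charge values and $d_1 > d_3$.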
where $\ln(\cdot)$ denotes the natural logarithm.
Step (2): Initialize the iteration step-size parameter $\lambda$ with a suitable value. The size of this initial value affects the number of iterations and the convergence speed of the method.
Step (3): Compute the fitting error Err and determine whether the iteration has finished.
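The expression for Err is not reproduced in this text. A common choice for this kind of least-squares fitting, assumed here, is the sum of squared residuals between the measured abundances and the fitted curve:

```latex
\mathrm{Err} = \sum_{i=1}^{N} \bigl( d_i - f(m_i, P) \bigr)^2
```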
Set a decision threshold $\varepsilon_1$. If $\mathrm{Err} \le \varepsilon_1$, processing ends and the characteristic parameter values in the current vector $P$ are the final result. Otherwise, if $\mathrm{Err} > \varepsilon_1$, go to step (4).
The value of $\varepsilon_1$ determines the precision of the extracted characteristic parameter values and also affects the number of iterations. The smaller $\varepsilon_1$ is, the higher the precision of the parameter values and the more iterations are needed; note that if $\varepsilon_1$ is too small, the iteration may fail to converge. Conversely, the larger $\varepsilon_1$ is, the lower the precision of the extracted parameters and the fewer the iterations.
Step (4): Construct the matrix $J$ from the current characteristic parameter vector $P$.
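The entries of $J$ are not reproduced in this text. Given the update formula in step (5), $J$ plays the role of the $N \times 3$ Jacobian of the fitted curve with respect to $P$. Assuming the peak model $f(x, P) = p_1 \exp(-(x - p_2)^2 / (2 p_3^2))$, and writing $z_i = (m_i - p_2)/p_3$ and $g_i = \exp(-z_i^2/2)$ (both introduced here only for illustration), row $i$ would be:

```latex
J_{i,:} = \left[ \frac{\partial f}{\partial p_1},\ \frac{\partial f}{\partial p_2},\ \frac{\partial f}{\partial p_3} \right]_{x = m_i}
        = \left[ g_i,\ \frac{p_1\, g_i\, z_i}{p_3},\ \frac{p_1\, g_i\, z_i^2}{p_3} \right]
```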
Step (5): In each iteration, compute the update vector of the characteristic parameter vector $P$, $H = [\Delta p_1\ \Delta p_2\ \Delta p_3]^T$, where $\Delta p_1$, $\Delta p_2$ and $\Delta p_3$ are the pending updates of the characteristic parameters $p_1$, $p_2$ and $p_3$ respectively. Construct the error vector $E$:
$$E = [\,d_1 - f(m_1, P),\ d_2 - f(m_2, P),\ \ldots,\ d_N - f(m_N, P)\,]^T$$
Then:
$$H = [\,J^T J + \lambda\,\mathrm{diag}(J^T J)\,]^{-1} J^T E$$
where $\mathrm{diag}(\cdot)$ extracts the diagonal elements of a matrix and forms a diagonal matrix from them.
Step (6): Compute the metric $\rho(H)$ of the update vector $H$.
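The expression for $\rho(H)$ is not reproduced in this text. In Levenberg-Marquardt-style schemes that use the update rule of step (5), a commonly used metric is the gain ratio, which compares the actual reduction of the squared error with the reduction predicted by the local linear model. Assuming Err is the sum of squared residuals, it would read:

```latex
\rho(H) = \frac{\mathrm{Err}(P) - \mathrm{Err}(P + H)}
               {H^{T} \bigl( \lambda\, \mathrm{diag}(J^{T} J)\, H + J^{T} E \bigr)}
```

With this choice, a value near 1 means the linearized model predicted the improvement well, while a value at or below 0 means the step did not reduce the error.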
Step (7): Update the characteristic parameter vector $P$ and the iteration step-size parameter $\lambda$. Set a decision threshold $\varepsilon_2$. If the metric of the update vector satisfies $\rho(H) > \varepsilon_2$, the current vector $P$ is replaced by $P + H$, i.e. $P \leftarrow P + H$, which completes the update, and the step-size parameter is reduced to $\lambda / K$, i.e. $\lambda \leftarrow \lambda / K$. Otherwise, if $\rho(H) \le \varepsilon_2$, the current vector $P$ stays unchanged and the step-size parameter is increased $K$-fold, i.e. $\lambda \leftarrow K \times \lambda$. $K$ is a scale factor, typically in the range 5 to 20. The threshold $\varepsilon_2$ should be set to a suitable value according to specific requirements such as the degree of sampling-point deviation and the required convergence speed.
After the characteristic parameter vector $P$ and the step-size parameter $\lambda$ have been updated, return to step (3) and start the next iteration.
The method of the invention solves for the characteristic parameters by nonlinear fitting over multiple sampling points, which reduces the adverse effect of sampling-point deviations and improves the precision of parameter extraction, and in turn helps improve the precision of peptide identification.
Detailed description of the invention
A method for extracting characteristic parameters of peptide mass spectral peaks based on nonlinear fitting, specifically as follows:
Suppose the Gaussian peak of an ion in the mass spectrum consists of $N$ sampling points, where normally $N \ge 3$. After the sampling points are sorted by abundance in descending order, their coordinates form the set $A$:
$$A = \{(m_1, d_1), (m_2, d_2), \ldots, (m_N, d_N)\}$$
where $m_i$ denotes the mass-to-charge ratio and $d_i$ the abundance, with $d_i > 0$ and $i \in \{1, 2, \ldots, N\}$. The Gaussian curve to be fitted through the sampling points has the functional form:
where the function $f(x, P)$ denotes the abundance, the independent variable $x$ denotes the mass-to-charge ratio, and $p_1$, $p_2$ and $p_3$ are the Gaussian curve characteristic parameters to be solved, representing the scaling factor, the centroid and the standard deviation respectively; together they form the characteristic parameter vector $P = [p_1\ p_2\ p_3]$. The characteristic parameter extraction method proceeds as follows:
Step (1): Assign initial values to the Gaussian curve characteristic parameters from the 3 sampling points with the largest abundance.
where $\ln(\cdot)$ denotes the natural logarithm.
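Step (1) can be sketched in code as follows. This is a minimal illustration, not the patent's own formulas (which are not reproduced in this text): it assumes the peak model `p1 * exp(-(x - p2)**2 / (2 * p3**2))` and solves it exactly through the three most abundant points. The function names `gaussian` and `three_point_init` are hypothetical.

```python
import numpy as np

def gaussian(x, P):
    """Assumed peak model: f(x, P) = p1 * exp(-(x - p2)^2 / (2 * p3^2))."""
    p1, p2, p3 = P
    return p1 * np.exp(-((x - p2) ** 2) / (2.0 * p3 ** 2))

def three_point_init(m, d):
    """Initial P from the three most abundant samples.

    m, d: mass-to-charge ratios and abundances, sorted by abundance in
    descending order. Assumes distinct m values and d[0] > d[2] > 0.
    """
    m1, m2, m3 = m[:3]
    a = np.log(d[0] / d[1])            # ln(d1 / d2)
    b = np.log(d[0] / d[2])            # ln(d1 / d3)
    u2, u3 = m2 - m1, m3 - m1          # offsets from m1 limit cancellation error
    c = (a * u3**2 - b * u2**2) / (2.0 * (a * u3 - b * u2))  # p2 - m1
    var = (u3**2 - 2.0 * u3 * c) / (2.0 * b)                 # p3^2
    p1 = d[0] * np.exp(c**2 / (2.0 * var))
    return np.array([p1, m1 + c, np.sqrt(var)])
```

When the three points lie exactly on a Gaussian, this recovers its parameters; with noisy points it only provides a starting estimate, and `var` may even come out non-positive for strongly distorted peaks, which a real implementation would need to guard against.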
Step (2): Initialize the iteration step-size parameter $\lambda$ with a suitable value. The size of this initial value affects the number of iterations and the convergence speed of the method.
Step (3): Compute the fitting error Err and determine whether the iteration has finished.
Set a decision threshold $\varepsilon_1$. If $\mathrm{Err} \le \varepsilon_1$, processing ends and the characteristic parameter values in the current vector $P$ are the final result. Otherwise, if $\mathrm{Err} > \varepsilon_1$, go to step (4).
The value of $\varepsilon_1$ determines the precision of the extracted characteristic parameter values and also affects the number of iterations. The smaller $\varepsilon_1$ is, the higher the precision of the parameter values and the more iterations are needed; note that if $\varepsilon_1$ is too small, the iteration may fail to converge. Conversely, the larger $\varepsilon_1$ is, the lower the precision of the extracted parameters and the fewer the iterations.
Step (4): Construct the matrix $J$ from the current characteristic parameter vector $P$.
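As an illustration of step (4), the sketch below builds $J$ as the $N \times 3$ Jacobian of the fitted curve with respect to $P$. The entries of $J$ are not reproduced in this text, so both the Jacobian interpretation and the peak model `p1 * exp(-(x - p2)**2 / (2 * p3**2))` are assumptions; the function name `jacobian` is hypothetical.

```python
import numpy as np

def jacobian(m, P):
    """N x 3 matrix of partial derivatives of the assumed Gaussian model.

    Column k holds d f(m_i, P) / d p_(k+1) evaluated at every sample m_i.
    """
    p1, p2, p3 = P
    z = (m - p2) / p3                  # standardized distance from the centroid
    g = np.exp(-0.5 * z**2)            # unit-height Gaussian at each sample
    return np.column_stack((g,                   # d f / d p1
                            p1 * g * z / p3,     # d f / d p2
                            p1 * g * z**2 / p3)) # d f / d p3
```

Analytic derivatives are used here because they are cheap for a three-parameter Gaussian; a finite-difference Jacobian would be a drop-in alternative.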
Step (5): In each iteration, compute the update vector of the characteristic parameter vector $P$, $H = [\Delta p_1\ \Delta p_2\ \Delta p_3]^T$, where $\Delta p_1$, $\Delta p_2$ and $\Delta p_3$ are the pending updates of the characteristic parameters $p_1$, $p_2$ and $p_3$ respectively. Construct the error vector $E$:
$$E = [\,d_1 - f(m_1, P),\ d_2 - f(m_2, P),\ \ldots,\ d_N - f(m_N, P)\,]^T$$
Then:
$$H = [\,J^T J + \lambda\,\mathrm{diag}(J^T J)\,]^{-1} J^T E$$
where $\mathrm{diag}(\cdot)$ extracts the diagonal elements of a matrix and forms a diagonal matrix from them.
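The update formula of step (5) translates almost directly into NumPy. The sketch below solves the linear system instead of forming the matrix inverse explicitly, which is a numerical-stability choice made here rather than something the text prescribes; the function name `lm_step` is hypothetical.

```python
import numpy as np

def lm_step(J, E, lam):
    """Update vector H = [J^T J + lam * diag(J^T J)]^(-1) J^T E.

    J: N x 3 matrix from step (4); E: length-N error vector; lam: step-size
    parameter. np.diag(np.diag(A)) keeps only the diagonal of A.
    """
    JTJ = J.T @ J
    damped = JTJ + lam * np.diag(np.diag(JTJ))
    return np.linalg.solve(damped, J.T @ E)
```

With `lam = 0` the step reduces to the ordinary least-squares (Gauss-Newton) step; larger `lam` values weight the diagonal more heavily and shorten the step.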
Step (6): Compute the metric $\rho(H)$ of the update vector $H$.
Step (7): Update the characteristic parameter vector $P$ and the iteration step-size parameter $\lambda$. Set a decision threshold $\varepsilon_2$. If the metric of the update vector satisfies $\rho(H) > \varepsilon_2$, the current vector $P$ is replaced by $P + H$, i.e. $P \leftarrow P + H$, which completes the update, and the step-size parameter is reduced to $\lambda / K$, i.e. $\lambda \leftarrow \lambda / K$. Otherwise, if $\rho(H) \le \varepsilon_2$, the current vector $P$ stays unchanged and the step-size parameter is increased $K$-fold, i.e. $\lambda \leftarrow K \times \lambda$. $K$ is a scale factor, typically in the range 5 to 20. The threshold $\varepsilon_2$ should be set to a suitable value according to specific requirements such as the degree of sampling-point deviation and the required convergence speed.
After the characteristic parameter vector $P$ and the step-size parameter $\lambda$ have been updated, return to step (3) and start the next iteration.
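Steps (3) to (7) can be put together as in the sketch below. Several parts are assumptions because the corresponding formulas are not reproduced in this text: the peak model, Err as the sum of squared residuals, $J$ as the Jacobian, and $\rho(H)$ as the usual Levenberg-Marquardt gain ratio. The function names, the default values of `lam`, `K`, `eps1`, `eps2`, and the `max_iter` cap are likewise illustrative; the starting vector `P0` is passed in, for example from the three-point initialization of step (1).

```python
import numpy as np

def gaussian(x, P):
    """Assumed peak model: p1 * exp(-(x - p2)^2 / (2 * p3^2))."""
    p1, p2, p3 = P
    return p1 * np.exp(-((x - p2) ** 2) / (2.0 * p3 ** 2))

def jacobian(x, P):
    """Assumed J: partial derivatives of the model w.r.t. p1, p2, p3."""
    p1, p2, p3 = P
    z = (x - p2) / p3
    g = np.exp(-0.5 * z ** 2)
    return np.column_stack((g, p1 * g * z / p3, p1 * g * z ** 2 / p3))

def fit_peak(m, d, P0, lam=1e-2, K=10.0, eps1=1e-9, eps2=1e-3, max_iter=200):
    """Iterate steps (3)-(7); returns the final parameter vector P."""
    P = np.asarray(P0, dtype=float)
    for _ in range(max_iter):              # cap added here as a safeguard
        E = d - gaussian(m, P)             # error vector E
        err = float(E @ E)                 # assumed Err: sum of squared residuals
        if err <= eps1:                    # step (3): convergence test
            break
        J = jacobian(m, P)                 # step (4)
        JTJ = J.T @ J
        D = np.diag(np.diag(JTJ))
        grad = J.T @ E
        H = np.linalg.solve(JTJ + lam * D, grad)        # step (5)
        predicted = float(H @ (lam * (D @ H) + grad))   # predicted error drop
        if not predicted > 0.0:            # step is numerically zero: stop
            break
        E_new = d - gaussian(m, P + H)
        rho = (err - float(E_new @ E_new)) / predicted  # step (6), assumed form
        if rho > eps2:                     # step (7): accept, reduce lam
            P, lam = P + H, lam / K
        else:                              # step (7): reject, increase lam
            lam = lam * K
    return P
```

The iteration cap reflects the remark above that an overly small $\varepsilon_1$ may prevent convergence: with noisy data the error never reaches the threshold, so the loop ends through `max_iter` instead.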
Claims (1)
1. A method for extracting characteristic parameters of peptide mass spectral peaks based on nonlinear fitting, characterized in that:
the Gaussian peak of an ion in the mass spectrum consists of $N$ sampling points, $N \ge 3$; after the sampling points are sorted by abundance in descending order, their coordinates form the set $A$;
$$A = \{(m_1, d_1), (m_2, d_2), \ldots, (m_N, d_N)\}$$
where $m_i$ denotes the mass-to-charge ratio, $d_i$ denotes the abundance, and $i \in \{1, 2, \ldots, N\}$; the Gaussian curve to be fitted through the sampling points has the functional form:
where the function $f(x, P)$ denotes the theoretical abundance, the independent variable $x$ denotes the mass-to-charge ratio, and $p_1$, $p_2$ and $p_3$ are the Gaussian curve characteristic parameters to be solved, representing the scaling factor, the centroid and the standard deviation respectively, which together form the characteristic parameter vector $P = [p_1\ p_2\ p_3]$;
the specific steps are as follows:
step (1): assign initial values to the Gaussian curve characteristic parameters from the 3 sampling points with the largest abundance;
where $\ln(\cdot)$ denotes the natural logarithm;
step (2): initialize the iteration step-size parameter $\lambda$ with a suitable value, the size of this initial value affecting the number of iterations and the convergence speed;
step (3): compute the fitting error Err and determine whether the iteration has finished;
set a decision threshold $\varepsilon_1$; if $\mathrm{Err} \le \varepsilon_1$, processing ends and the characteristic parameter values in the current vector $P$ are the final result; otherwise, if $\mathrm{Err} > \varepsilon_1$, go to step (4);
step (4): construct the matrix $J$ from the current characteristic parameter vector $P$;
step (5): in each iteration, compute the update vector of the characteristic parameter vector $P$, $H = [\Delta p_1\ \Delta p_2\ \Delta p_3]^T$, where $\Delta p_1$, $\Delta p_2$ and $\Delta p_3$ are the pending updates of the characteristic parameters $p_1$, $p_2$ and $p_3$ respectively; construct the error vector $E$;
$$E = [\,d_1 - f(m_1, P),\ d_2 - f(m_2, P),\ \ldots,\ d_N - f(m_N, P)\,]^T$$
then:
$$H = [\,J^T J + \lambda\,\mathrm{diag}(J^T J)\,]^{-1} J^T E$$
where $\mathrm{diag}(\cdot)$ extracts the diagonal elements of a matrix and forms a diagonal matrix from them;
step (6): compute the metric $\rho(H)$ of the update vector $H$;
step (7): update the characteristic parameter vector $P$ and the iteration step-size parameter $\lambda$; set a decision threshold $\varepsilon_2$; if the metric of the update vector satisfies $\rho(H) > \varepsilon_2$, the current vector $P$ is replaced by $P + H$, i.e. $P \leftarrow P + H$, which completes the update, and the step-size parameter is reduced to $\lambda / K$, i.e. $\lambda \leftarrow \lambda / K$; otherwise, if $\rho(H) \le \varepsilon_2$, the current vector $P$ stays unchanged and the step-size parameter is increased $K$-fold, i.e. $\lambda \leftarrow K \times \lambda$; $K$ is a scale factor in the range 5 to 20; after the characteristic parameter vector $P$ and the step-size parameter $\lambda$ have been updated, return to step (3) and carry out the next iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410498854.7A CN104316591B (en) | 2014-09-25 | 2014-09-25 | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410498854.7A CN104316591B (en) | 2014-09-25 | 2014-09-25 | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104316591A (en) | 2015-01-28 |
CN104316591B (en) | 2016-09-07 |
Family
ID=52371851
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410498854.7A (Expired - Fee Related) CN104316591B (en) | 2014-09-25 | 2014-09-25 | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104316591B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6695086B2 (en) * | 2016-10-04 | 2020-05-20 | アトナープ株式会社 | System and method for accurately quantifying the composition of a sample to be measured |
CN114487073B (en) * | 2021-12-27 | 2024-04-12 | 浙江迪谱诊断技术有限公司 | Time-of-flight nucleic acid mass spectrum data calibration method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08329123A (en) * | 1995-05-30 | 1996-12-13 | Mitsubishi Electric Corp | Parameter extraction system |
CN1769891A (en) * | 2004-11-03 | 2006-05-10 | 中国科学院计算技术研究所 | Method for identifying peptide by using tandem mass spectrometry data |
CN102914515A (en) * | 2012-07-29 | 2013-02-06 | 安徽皖仪科技股份有限公司 | Method for extracting low-concentration signals of laser gas analyzer |
CN103018194A (en) * | 2012-12-06 | 2013-04-03 | 江苏省质量安全工程研究院 | Asymmetric least square baseline correction method based on background estimation |
CN103217679A (en) * | 2013-03-22 | 2013-07-24 | 北京航空航天大学 | Full-waveform laser radar echo data gaussian decomposition method based on genetic algorithm |
CN103389335A (en) * | 2012-05-11 | 2013-11-13 | 中国科学院大连化学物理研究所 | Analysis device and method for identifying biomacromolecules |
CN103777192A (en) * | 2012-10-24 | 2014-05-07 | 中国人民解放军第二炮兵工程学院 | Linear feature extraction method based on laser sensor |
CN104062644A (en) * | 2013-11-22 | 2014-09-24 | 董立新 | Method for extracting tree height from laser radar Gaussian echo data |
Non-Patent Citations (5)
Title |
---|
Characterization of 1H NMR spectroscopic data and the generation of synthetic validation sets;Anderson Paul E. et al.;《Bioinformatics》;20091115;第25卷(第22期);第2992-3000页 * |
Increasing Peptide Identification in Tandem Mass Spectrometry Through Automatic Function Switching Optimization;Carrillo Brian et al.;《Journal of The American Society for Mass Spectrometry》;20051130;第16卷(第11期);第1818-1826页 * |
整体最小二乘的迭代解法;孔建 等;《武汉大学学报 信息科学版》;20100630;第35卷(第6期);第711-714页 * |
激光诱导击穿光谱数据特征自动提取方法研究;刘立拓 等;《光谱学与光谱分析》;20111231;第31卷(第12期);第3285-3288页 * |
蛋白质质谱分析的无标记定量算法研究进展;张伟 等;《生物化学与生物物理进展》;20110630;第38卷(第6期);第506-518页 * |
Also Published As
Publication number | Publication date |
---|---|
CN104316591A (en) | 2015-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Marzani et al. | Fitting the strong coupling constant with soft-drop thrust | |
CN103559129B (en) | Statistical regression test data generating method based on genetic algorithm | |
CN106094786A (en) | Industrial process flexible measurement method based on integrated-type independent entry regression model | |
CN105590829A (en) | Systems and methods for calibrating gain in an electron multiplier | |
CN104316591B (en) | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode | |
Charity et al. | 2 p-2 p decay of 8 C and isospin-allowed 2 p decay of the isobaric-analog state in 8 B | |
WO2014128912A1 (en) | Data processing device, and data processing method | |
US20220293403A1 (en) | Mass spectrometer calibration | |
CN105334185A (en) | Spectrum projection discrimination-based near infrared model maintenance method | |
KR20200050434A (en) | Method and apparatus for identifying strain based on mass spectrum | |
Stratigopoulos et al. | Efficient Monte Carlo-based analog parametric fault modelling | |
CN109257160A (en) | A kind of side channel template attack method based on decision tree | |
US20140249765A1 (en) | Method and system for analyzing sugar-chain structure | |
Martyna et al. | Analysis of lead isotopic ratios of glass objects with the aim of comparing them for forensic purposes | |
Cammin et al. | The ATLAS discovery potential for the channel ttH, H to bb | |
US10522335B2 (en) | Mass spectrometry data processing apparatus, mass spectrometry system, and method for processing mass spectrometry data | |
CN107064042B (en) | Qualitative analysis method of infrared spectrum | |
Araki et al. | Adaptive Markov chain Monte Carlo for auxiliary variable method and its application to parallel tempering | |
CN114487072B (en) | Time-of-flight mass spectrum peak fitting method | |
CN114199989B (en) | Method and system for identifying pericarpium citri reticulatae based on mass spectrum data fusion | |
US20230410947A1 (en) | Systems and methods for rapid microbial identification | |
Peralta et al. | Unit commitment with load uncertainty by joint chance-constrained programming | |
CN104297328A (en) | Least-square-method-based peptide mass spectrum peak characteristic parameter extraction method | |
CN110147614B (en) | Engineering safety evaluation method based on grading difference Stacking multi-model ensemble learning | |
Rosenbusch et al. | Accurately accounting for effects on times-of-flight caused by finite field-transition times during the ejection of ions from a storage trap: A study for single-reference TOF and MRTOF mass spectrometry |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160907; Termination date: 20170925 |