CN104316591A - Non-linear fitting mode-based peptide mass spectrum peak characteristic parameter extraction method - Google Patents
- Publication number
- CN104316591A (application CN201410498854.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Complex Calculations (AREA)
Abstract
The invention relates to a method for extracting the characteristic parameters of peptide mass spectrum peaks based on nonlinear fitting. With the prior art, it is difficult to guarantee the precision of the extracted peak characteristic parameters when the sample points that form a peak in a peptide mass spectrum deviate substantially from the ideal distribution. The method uses the data of multiple sample points, takes minimization of the difference between the measured data and the fitted result as its objective, and iteratively updates the estimated characteristic parameters until a convergence condition is satisfied, which yields the final estimates. The method effectively reduces the adverse influence of sample-point distribution error on solving for the Gaussian curve characteristic parameters, improves the precision of the parameter values, and helps improve the precision of peptide identification.
Description
Technical field
The invention belongs to the field of biological mass spectrometry data preprocessing and information extraction, and specifically relates to a method for extracting the characteristic parameters of peptide mass spectrum peaks based on nonlinear fitting.
Background technology
Peptide identification based on tandem mass spectrometry is a widely used technique in current proteomics research. The peptide to be identified is fragmented into fragment ions in the mass spectrometer, which generates tandem mass spectrum data. These data are compared against a theoretical tandem mass spectrum library or a library of previously identified peptide spectra, and this comparison completes the identification of the unknown peptide.
Normally, when a mass spectrometer detects an ion, the measured mass-to-charge ratio is not a single value but a set of sample points that, on the mass spectrum, follow a Gaussian curve, i.e. a Gaussian peak. To determine the mass-to-charge ratio of the ion, these sample points must be preprocessed to compute their centroid along the horizontal axis, which is the measured mass-to-charge ratio of the ion. From the centroid, other characteristic parameters such as the maximum abundance of the ion can then be derived. Several methods exist for solving for the centroid. A common approach assumes that every sample point forming the Gaussian peak lies exactly on a single Gaussian curve, substitutes the values of the sample points (mass-to-charge ratio and abundance) into a general Gaussian function with unknown parameters, constructs a system of simultaneous equations, and solves it for the characteristic parameters of the peak, including the centroid and the maximum abundance. MaxQuant, a widely used proteomics data analysis package, adopts this approach. In actual measurements, however, factors such as experimental conditions, the laboratory environment and instrument noise mean that the sample points often do not lie exactly on a Gaussian curve but deviate from it. When the deviations are large, the assumption behind the method above no longer holds, the solved characteristic parameters carry large numerical errors, and the precision of peptide identification suffers.
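As an illustration of the simultaneous-equations approach described above, the following sketch assumes a Gaussian of the form $f(x) = p_1 \exp\left(-(x - p_2)^2 / (2 p_3^2)\right)$ with amplitude $p_1$, centroid $p_2$ and standard deviation $p_3$, and three sample points $(m_i, d_i)$ taken to lie exactly on the curve. The specific functional form is an assumption made here for illustration. Taking logarithms turns each sample into an equation that is quadratic in $m_i$:

$$\ln d_i = \ln p_1 - \frac{(m_i - p_2)^2}{2 p_3^2}, \qquad i = 1, 2, 3$$

Subtracting pairs of equations eliminates $\ln p_1$:

$$\ln\frac{d_1}{d_2} = \frac{(m_2 - m_1)(m_1 + m_2 - 2 p_2)}{2 p_3^2}, \qquad \ln\frac{d_1}{d_3} = \frac{(m_3 - m_1)(m_1 + m_3 - 2 p_2)}{2 p_3^2}$$

Dividing one equation by the other removes $p_3$ and leaves a linear equation in $p_2$; $p_3$ and $p_1$ then follow by back-substitution. Because three points determine three unknowns exactly, any deviation of a sample point from the curve passes directly into the solved parameters, which is the weakness noted above.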
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the method described above by proposing a method for extracting the characteristic parameters of peptide mass spectrum peaks based on nonlinear fitting.
Suppose that in the mass spectrum the Gaussian peak of a given ion consists of N sample points, where normally $N \ge 3$. After the sample points are sorted by abundance in descending order, their coordinates form the set A:

$$A = \{(m_1, d_1), (m_2, d_2), \ldots, (m_N, d_N)\}$$

where $m_i$ denotes the mass-to-charge ratio and $d_i$ denotes the abundance, whose value is greater than 0, with $i \in \{1, 2, 3, \ldots, N\}$. The Gaussian curve to be fitted to the sample points has the functional form:
where the function $f(x, P)$ gives the abundance, the independent variable $x$ is the mass-to-charge ratio, and $p_1$, $p_2$ and $p_3$ are the Gaussian curve characteristic parameters to be solved, representing the scaling factor, the centroid and the standard deviation respectively. Together they form the characteristic parameter vector $P = [p_1\ p_2\ p_3]$. The processing steps of the characteristic parameter extraction method are as follows:
Step (1): Initialize the Gaussian curve characteristic parameters from the data of the 3 sample points with the largest abundance.
where $\ln(\cdot)$ denotes the natural logarithm.
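The initialization formulas are not reproduced in the text above. A minimal sketch of one way such an initialization could work, assuming the hypothetical model $f(x, P) = p_1 \exp\left(-(x - p_2)^2 / (2 p_3^2)\right)$, is to fit a parabola to the logarithm of the three largest abundances. The function name `init_from_top3` and the model form are assumptions for illustration, not the patent's own expressions.

```python
import math

def init_from_top3(points):
    """Estimate (p1, p2, p3) from the three most abundant sample points.

    Assumes the hypothetical model f(x) = p1 * exp(-(x - p2)**2 / (2 * p3**2)),
    so that ln f is a parabola in x. Abundances must be > 0 and the three
    mass-to-charge values must be distinct.
    """
    # Set A in the text: samples sorted by abundance, largest first.
    top3 = sorted(points, key=lambda md: md[1], reverse=True)[:3]
    (x1, d1), (x2, d2), (x3, d3) = top3
    # Shift by x1 so the parabola is fitted on small numbers.
    u2, u3 = x2 - x1, x3 - x1
    y2, y3 = math.log(d2 / d1), math.log(d3 / d1)
    # Parabola ln(d / d1) = a*u**2 + b*u through (0, 0), (u2, y2), (u3, y3).
    a = (y3 / u3 - y2 / u2) / (u3 - u2)
    b = y2 / u2 - a * u2
    if a >= 0:
        raise ValueError("top three points are not concave in log-abundance")
    p3 = math.sqrt(-1.0 / (2.0 * a))        # from a = -1 / (2 * p3**2)
    p2 = x1 - b / (2.0 * a)                 # vertex of the parabola
    p1 = d1 * math.exp(-b * b / (4.0 * a))  # height at the vertex
    return p1, p2, p3
```

When the three points lie exactly on a Gaussian, this recovers its parameters; with noisy points it only gives a starting value for the iteration in the later steps.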
Step (2): Choose a suitable value to initialize the iteration step-size parameter $\lambda$. The magnitude of this initial value affects the number of iterations and the convergence speed of the method.
Step (3): Compute the fitting error Err and determine whether the iteration has finished.
Set a decision threshold $\varepsilon_1$. If $Err \le \varepsilon_1$, processing ends and the characteristic parameter values in the current vector P are the final solution. Otherwise, if $Err > \varepsilon_1$, proceed to step (4).
The value of the threshold $\varepsilon_1$ determines the precision of the extracted characteristic parameter values and also affects the number of iterations. The smaller $\varepsilon_1$ is, the higher the precision of the parameter values and the more iterations are required. Note that if $\varepsilon_1$ is too small, the iteration may never converge. Conversely, the larger $\varepsilon_1$ is, the lower the precision of the extracted parameters and the fewer iterations are needed.
Step (4): Construct the matrix J from the current characteristic parameter vector P.
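The entries of J are not reproduced in the text above. Given the update formula in step (5), a natural reading is that J is the $N \times 3$ Jacobian of the model with respect to P, evaluated at the sample points. Under that assumption, and the further assumption that the model is $f(x, P) = p_1 \exp\left(-(x - p_2)^2 / (2 p_3^2)\right)$, the matrix would be:

$$J = \begin{bmatrix} \dfrac{\partial f(m_1, P)}{\partial p_1} & \dfrac{\partial f(m_1, P)}{\partial p_2} & \dfrac{\partial f(m_1, P)}{\partial p_3} \\ \vdots & \vdots & \vdots \\ \dfrac{\partial f(m_N, P)}{\partial p_1} & \dfrac{\partial f(m_N, P)}{\partial p_2} & \dfrac{\partial f(m_N, P)}{\partial p_3} \end{bmatrix}$$

$$\frac{\partial f}{\partial p_1} = \exp\left(-\frac{(x - p_2)^2}{2 p_3^2}\right), \qquad \frac{\partial f}{\partial p_2} = f(x, P)\,\frac{x - p_2}{p_3^2}, \qquad \frac{\partial f}{\partial p_3} = f(x, P)\,\frac{(x - p_2)^2}{p_3^3}$$

With derivatives taken of $f$ rather than of the residual $d_i - f(m_i, P)$, the sign convention matches the error vector E and the additive update $P \leftarrow P + H$ used in steps (5) and (7).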
Step (5): Compute the update vector $H = [\Delta p_1\ \Delta p_2\ \Delta p_3]^T$ of the characteristic parameter vector P for the current iteration, where $\Delta p_1$, $\Delta p_2$ and $\Delta p_3$ are the pending updates to the characteristic parameters $p_1$, $p_2$ and $p_3$ respectively. Construct the error vector E:

$$E = [d_1 - f(m_1, P),\ d_2 - f(m_2, P),\ \ldots,\ d_N - f(m_N, P)]^T$$

Then:

$$H = [J^T \times J + \lambda \times \mathrm{diag}(J^T \times J)]^{-1} \times J^T \times E$$

where $\mathrm{diag}(\cdot)$ extracts the diagonal elements of a matrix and forms a diagonal matrix from them.
Step (6): Compute the metric $\rho(H)$ of the update vector H.
Step (7): Update the characteristic parameter vector P and the iteration step-size parameter $\lambda$. Set a decision threshold $\varepsilon_2$. If the metric of the update vector satisfies $\rho(H) > \varepsilon_2$, the current characteristic parameter vector P is replaced by P + H, i.e. $P \leftarrow P + H$, and at the same time the step-size parameter is reduced to $\lambda / K$, i.e. $\lambda \leftarrow \lambda / K$. Otherwise, if $\rho(H) \le \varepsilon_2$, P remains unchanged and the step-size parameter is increased by a factor of K, i.e. $\lambda \leftarrow K \times \lambda$. K is a scaling factor, generally in the range 5 to 20. The decision threshold $\varepsilon_2$ should be set to a suitable value according to specific criteria such as the degree of sample-point deviation and the required convergence speed.
After the characteristic parameter vector P and the iteration step-size parameter $\lambda$ have been updated, return to step (3) for the next iteration.
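Steps (3) to (7) can be sketched as a damped least-squares loop of the Levenberg-Marquardt type. The sketch below rests on several assumptions that the text does not confirm: the model is $f(x, P) = p_1 \exp\left(-(x - p_2)^2 / (2 p_3^2)\right)$, Err is the sum of squared residuals, J is the Jacobian of $f$ with respect to P, and $\rho(H)$ is the ratio of actual to predicted error reduction. The function names and the `max_iter` guard are hypothetical additions.

```python
import math

def gaussian(x, P):
    p1, p2, p3 = P
    return p1 * math.exp(-((x - p2) ** 2) / (2.0 * p3 ** 2))

def sse(points, P):
    """Assumed form of Err: sum of squared residuals."""
    return sum((d - gaussian(m, P)) ** 2 for m, d in points)

def normal_equations(points, P):
    """Return (J^T J, J^T E) for the assumed Gaussian model."""
    JtJ = [[0.0] * 3 for _ in range(3)]
    JtE = [0.0] * 3
    p1, p2, p3 = P
    for m, d in points:
        g = math.exp(-((m - p2) ** 2) / (2.0 * p3 ** 2))
        row = [g, p1 * g * (m - p2) / p3 ** 2, p1 * g * (m - p2) ** 2 / p3 ** 3]
        e = d - p1 * g
        for a in range(3):
            JtE[a] += row[a] * e
            for b in range(3):
                JtJ[a][b] += row[a] * row[b]
    return JtJ, JtE

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))) / M[i][i]
    return x

def fit_peak(points, P0, lam=0.01, K=10.0, eps1=1e-9, eps2=0.0, max_iter=200):
    P = list(P0)
    for _ in range(max_iter):  # guard: eps1 may be too small to ever be reached
        err = sse(points, P)                                  # step (3)
        if err <= eps1:
            break
        JtJ, JtE = normal_equations(points, P)                # step (4)
        A = [[JtJ[a][b] + (lam * JtJ[a][a] if a == b else 0.0)
              for b in range(3)] for a in range(3)]
        H = solve3(A, JtE)                                    # step (5)
        trial = [P[i] + H[i] for i in range(3)]
        predicted = sum(H[i] * (lam * JtJ[i][i] * H[i] + JtE[i]) for i in range(3))
        rho = 0.0                                             # step (6), assumed gain ratio
        if trial[2] > 0 and predicted > 0:
            rho = (err - sse(points, trial)) / predicted
        if rho > eps2:                                        # step (7): accept
            P, lam = trial, lam / K
        else:                                                 # step (7): reject
            lam *= K
    return P
```

A caller would typically pass the step (1) initial values as `P0`. The default `K=10.0` lies inside the 5 to 20 range stated in step (7); the other defaults are illustrative and would need tuning for real spectra.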
The peptide mass spectrum peak characteristic parameter extraction method of the invention solves for the characteristic parameters by multi-point nonlinear fitting. This reduces the adverse effect of sample-point distribution deviation, improves the precision of parameter extraction, and in turn helps improve the precision of peptide identification.
Embodiment
Step (1): Initialize the Gaussian curve characteristic parameters from the data of the 3 sample points with the largest abundance.
where $\ln(\cdot)$ denotes the natural logarithm.
Step (2): Choose a suitable value to initialize the iteration step-size parameter $\lambda$. The magnitude of this initial value affects the number of iterations and the convergence speed of the method.
Step (3): Compute the fitting error Err and determine whether the iteration has finished.
Set a decision threshold $\varepsilon_1$. If $Err \le \varepsilon_1$, processing ends and the characteristic parameter values in the current vector P are the final solution. Otherwise, if $Err > \varepsilon_1$, proceed to step (4).
The value of the threshold $\varepsilon_1$ determines the precision of the extracted characteristic parameter values and also affects the number of iterations. The smaller $\varepsilon_1$ is, the higher the precision of the parameter values and the more iterations are required. Note that if $\varepsilon_1$ is too small, the iteration may never converge. Conversely, the larger $\varepsilon_1$ is, the lower the precision of the extracted parameters and the fewer iterations are needed.
Step (4): Construct the matrix J from the current characteristic parameter vector P.
Step (5): Compute the update vector $H = [\Delta p_1\ \Delta p_2\ \Delta p_3]^T$ of the characteristic parameter vector P for the current iteration, where $\Delta p_1$, $\Delta p_2$ and $\Delta p_3$ are the pending updates to the characteristic parameters $p_1$, $p_2$ and $p_3$ respectively. Construct the error vector E:

$$E = [d_1 - f(m_1, P),\ d_2 - f(m_2, P),\ \ldots,\ d_N - f(m_N, P)]^T$$

Then:

$$H = [J^T \times J + \lambda \times \mathrm{diag}(J^T \times J)]^{-1} \times J^T \times E$$

where $\mathrm{diag}(\cdot)$ extracts the diagonal elements of a matrix and forms a diagonal matrix from them.
Step (6): Compute the metric $\rho(H)$ of the update vector H.
Step (7): Update the characteristic parameter vector P and the iteration step-size parameter $\lambda$. Set a decision threshold $\varepsilon_2$. If the metric of the update vector satisfies $\rho(H) > \varepsilon_2$, the current characteristic parameter vector P is replaced by P + H, i.e. $P \leftarrow P + H$, and at the same time the step-size parameter is reduced to $\lambda / K$, i.e. $\lambda \leftarrow \lambda / K$. Otherwise, if $\rho(H) \le \varepsilon_2$, P remains unchanged and the step-size parameter is increased by a factor of K, i.e. $\lambda \leftarrow K \times \lambda$. K is a scaling factor, generally in the range 5 to 20. The decision threshold $\varepsilon_2$ should be set to a suitable value according to specific criteria such as the degree of sample-point deviation and the required convergence speed.
After the characteristic parameter vector P and the iteration step-size parameter $\lambda$ have been updated, return to step (3) for the next iteration.
Claims (1)
1. A method for extracting the characteristic parameters of peptide mass spectrum peaks based on nonlinear fitting, characterized in that:
Suppose that in the mass spectrum the Gaussian peak of a given ion consists of N sample points, $N \ge 3$; after the sample points are sorted by abundance in descending order, their coordinates form the set A;

$$A = \{(m_1, d_1), (m_2, d_2), \ldots, (m_N, d_N)\}$$

where $m_i$ denotes the mass-to-charge ratio and $d_i$ denotes the abundance, $i \in \{1, 2, 3, \ldots, N\}$; the Gaussian curve to be fitted to the sample points has the functional form:
where the function $f(x, P)$ gives the theoretical abundance, the independent variable $x$ is the mass-to-charge ratio, and $p_1$, $p_2$ and $p_3$ are the Gaussian curve characteristic parameters to be solved, representing the scaling factor, the centroid and the standard deviation respectively, and forming the characteristic parameter vector $P = [p_1\ p_2\ p_3]$;
The specific steps are as follows:
Step (1): initialize the Gaussian curve characteristic parameters from the data of the 3 sample points with the largest abundance;
where $\ln(\cdot)$ denotes the natural logarithm;
Step (2): choose a suitable value to initialize the iteration step-size parameter $\lambda$, the magnitude of this initial value affecting the number of iterations and the convergence speed;
Step (3): compute the fitting error Err and determine whether the iteration has finished;
set a decision threshold $\varepsilon_1$; if $Err \le \varepsilon_1$, processing ends and the characteristic parameter values in the current vector P are the final solution; otherwise, if $Err > \varepsilon_1$, proceed to step (4);
Step (4): construct the matrix J from the current characteristic parameter vector P;
Step (5): compute the update vector $H = [\Delta p_1\ \Delta p_2\ \Delta p_3]^T$ of the characteristic parameter vector P for the current iteration, where $\Delta p_1$, $\Delta p_2$ and $\Delta p_3$ are the pending updates to the characteristic parameters $p_1$, $p_2$ and $p_3$ respectively; construct the error vector E;

$$E = [d_1 - f(m_1, P),\ d_2 - f(m_2, P),\ \ldots,\ d_N - f(m_N, P)]^T$$

then:

$$H = [J^T \times J + \lambda \times \mathrm{diag}(J^T \times J)]^{-1} \times J^T \times E$$

where $\mathrm{diag}(\cdot)$ extracts the diagonal elements of a matrix and forms a diagonal matrix from them;
Step (6): compute the metric $\rho(H)$ of the update vector H;
Step (7): update the characteristic parameter vector P and the iteration step-size parameter $\lambda$; set a decision threshold $\varepsilon_2$; if the metric of the update vector satisfies $\rho(H) > \varepsilon_2$, the current characteristic parameter vector P is replaced by P + H, i.e. $P \leftarrow P + H$, and at the same time the step-size parameter is reduced to $\lambda / K$, i.e. $\lambda \leftarrow \lambda / K$; otherwise, if $\rho(H) \le \varepsilon_2$, P remains unchanged and the step-size parameter is increased by a factor of K, i.e. $\lambda \leftarrow K \times \lambda$; K is a scaling factor in the range 5 to 20; after the characteristic parameter vector P and the iteration step-size parameter $\lambda$ have been updated, return to step (3) for the next iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410498854.7A CN104316591B (en) | 2014-09-25 | 2014-09-25 | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410498854.7A CN104316591B (en) | 2014-09-25 | 2014-09-25 | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104316591A true CN104316591A (en) | 2015-01-28 |
CN104316591B CN104316591B (en) | 2016-09-07 |
Family
ID=52371851
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410498854.7A Expired - Fee Related CN104316591B (en) | 2014-09-25 | 2014-09-25 | A kind of peptide mass spectra peak characteristic parameter extraction method based on nonlinear fitting mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104316591B (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08329123A (en) * | 1995-05-30 | 1996-12-13 | Mitsubishi Electric Corp | Parameter extraction system |
CN100376895C (en) * | 2004-11-03 | 2008-03-26 | 中国科学院计算技术研究所 | Method for identifying peptide by using tandem mass spectrometry data |
CN103389335A (en) * | 2012-05-11 | 2013-11-13 | 中国科学院大连化学物理研究所 | Analysis device and method for identifying biomacromolecules |
CN102914515A (en) * | 2012-07-29 | 2013-02-06 | 安徽皖仪科技股份有限公司 | Method for extracting low-concentration signals of laser gas analyzer |
CN103018194A (en) * | 2012-12-06 | 2013-04-03 | 江苏省质量安全工程研究院 | Asymmetric least square baseline correction method based on background estimation |
CN103217679B (en) * | 2013-03-22 | 2014-10-08 | 北京航空航天大学 | Full-waveform laser radar echo data gaussian decomposition method based on genetic algorithm |
CN104062644A (en) * | 2013-11-22 | 2014-09-24 | 董立新 | Method for extracting tree height from laser radar Gaussian echo data |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109844515A (en) * | 2016-10-04 | 2019-06-04 | Atonarp株式会社 | System and method for the accurately composition of quantified goal sample |
CN114487073A (en) * | 2021-12-27 | 2022-05-13 | 浙江迪谱诊断技术有限公司 | Time-of-flight nucleic acid mass spectrum data calibration method |
CN114487073B (en) * | 2021-12-27 | 2024-04-12 | 浙江迪谱诊断技术有限公司 | Time-of-flight nucleic acid mass spectrum data calibration method |
Also Published As
Publication number | Publication date |
---|---|
CN104316591B (en) | 2016-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160907; Termination date: 20170925 |