CN106248844B - Method and system for predicting the liquid chromatography retention time of peptides

Info

Publication number
CN106248844B
Authority
CN
China
Prior art keywords
peptide fragment
retention time
amino acid
established
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610941299.XA
Other languages
Chinese (zh)
Other versions
CN106248844A (en)
Inventor
涂慧君
刘超
迟浩
贺思敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201610941299.XA
Publication of CN106248844A
Application granted
Publication of CN106248844B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/86 Signal analysis

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The present invention proposes a method and system for predicting the liquid chromatography retention time of peptides, and relates to bioinformatics. The method comprises: searching a raw mass spectrometry data file to obtain peptide-spectrum matches (PSMs) as identification results; for the PSMs from the target database whose FDR in the identification results is below 1%, extracting the experimental retention time of the corresponding peptides, and setting up a training sample and a test sample; using the training sample and treating each modified amino acid as a new amino acid, establishing a multiple linear regression model and solving the retention coefficient of each amino acid by gradient descent; for each peptide in the training sample, extracting 56-dimensional features and calculating the corresponding feature values; and establishing a prediction model to predict the retention time of peptides of known sequence in the test sample. The invention can predict the retention time of modified peptides under different chromatographic conditions and greatly improves speed: compared with Elude on different data sets, it is more than 30 times faster.

Description

Method and system for predicting the liquid chromatography retention time of peptides
Technical field
The present invention relates to bioinformatics and liquid chromatography retention time prediction, and more particularly to a method and system for predicting the liquid chromatography retention time of peptides.
Background
In the prior art, when proteins are identified by the "shotgun" approach, peptides are separated by chromatography before they enter the mass spectrometer. This avoids the severe signal overlap that would occur if too many peptides were fed into the mass spectrometer at once. The retention time of a peptide is the time that elapses between its injection into the chromatographic column and the moment its signal intensity reaches its maximum. Retention time is an important dimension of information that is independent of the mass spectrometry information. Under given reversed-phase liquid chromatography conditions, different peptides have different retention times. The retention time of a peptide can be predicted from information such as its sequence, and the predicted retention time can be combined with the mass spectrometry information to improve the sensitivity or reliability of peptide identification results.
Existing mainstream retention time prediction software includes SSCalc, BioLCC, and Elude. Some of these tools only support prediction under specific chromatographic conditions, some cannot predict modified peptides, and their running efficiency is low, so they cannot meet current data processing requirements.
The existing technology has three major problems or shortcomings:
1. Few existing techniques support retention time prediction under different chromatographic conditions. When the chromatographic conditions change, the retention times of peptides change accordingly, and the original model no longer applies.
2. Most existing techniques target regular (unmodified) peptides and offer little support for modified peptides. Studies have found that specific modifications affect peptide retention time, so existing models predict inaccurately when peptides are modified.
3. Existing techniques process data inefficiently. For example, the well-known software Elude generally runs for more than 20 minutes on several test data sets.
While studying peptide retention time prediction models, the inventors found that the prior art often supports prediction only under specific chromatographic conditions and only for regular peptides. On the one hand, existing research methods are rather limited: some laboratories study only data sets acquired under specific chromatographic conditions, and the parameters selected for those data sets do not suit other chromatographic conditions. On the other hand, researchers have not recognized the important influence of modifications on peptide retention time. In addition, the existing techniques are generally inefficient because the parameter selection process is very time-consuming.
The invention "A method for predicting the retention time of high performance liquid chromatography peaks" relates to predicting the retention time of high performance liquid chromatography peaks. The method comprises: measuring the standard retention times of the components of various samples; selecting two of the target components of each sample as the dual-marker reference components of that sample; obtaining the measured retention times of the dual-marker reference components in the test solution of the sample to be tested; obtaining the measured retention times of the other target components; and performing two-point verification and multi-point verification, among other steps. The method provided by that invention can accurately predict the retention times of the chromatographic peaks of the components of the sample to be tested, which allows the chromatographic peaks to be identified qualitatively and the sample to be discriminated. The method has high prediction accuracy, applies to a large number of chromatographic columns, and is clearly superior to the existing relative retention time method. That invention measures the standard retention times of the sample components and uses the experimental retention times of the marker reference components in the sample to be tested to calculate the relative retention times of the other target components. The present invention differs in that it does not need to select marker components: once the experimental retention times of any subset of the peptides in a chromatography experiment are obtained, the retention times of peptides of known sequence can be predicted, which is more general.
The invention "A method for predicting retention time in the gradient elution mode of reversed-phase high performance liquid chromatography" obtains a retention equation describing the relationship between mobile phase composition and capacity factor; approximates linear multi-step gradient elution conditions using plate theory to obtain the initial volume fraction of the i-th gradient step and the corresponding retention factor $k_i$; obtains the concentration of the compound under test in the mobile phase from the initial volume fraction and the corresponding retention factor $k_i$; and calculates the retention time of the compound from that concentration. The method predicts retention time with high accuracy under arbitrary gradient conditions, and the prediction process is simple. Its feasibility is demonstrated by three embodiments, and the accuracy improves further when the instrument delay time is taken into account. That invention is based on plate theory and predicts retention time with a manually constructed retention equation, so it belongs to the class of methods that build empirical formulas from experimental parameters. The present invention differs in that it needs no empirical formula: by analyzing the experimental peptides under the given chromatographic conditions and describing their physicochemical properties with multidimensional features, it can predict the retention time of the peptides under test.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a method and system for predicting the liquid chromatography retention time of peptides.
The present invention proposes a method for predicting the liquid chromatography retention time of peptides, comprising:
Step 1, searching a raw mass spectrometry data file to obtain peptide-spectrum matches (PSMs) as identification results; for the PSMs from the target database whose FDR in the identification results is below 1%, extracting the experimental retention time of the corresponding peptides, and setting up a training sample and a test sample;
Step 2, using the training sample and treating each modified amino acid as a new amino acid, establishing a multiple linear regression model, and solving the retention coefficient of each amino acid by gradient descent;
Step 3, for each peptide in the training sample, extracting 56-dimensional features and calculating the corresponding feature values;
Step 4, establishing a prediction model and predicting the retention time of peptides of known sequence in the test sample.
Step 1 comprises:
Step 11, treating peptides that differ in their modification sites as different peptides;
Step 12, when the same peptide corresponds to multiple MS/MS spectra, selecting the highest-scoring identification of the peptide and extracting its experimental retention time;
Step 13, when extracting the experimental retention time of a peptide with a given mass-to-charge ratio, searching for its signal on consecutive MS1 spectra and recording the maximum intensity of the signal; the search stops when the current intensity falls below 10% of the maximum intensity, which determines the start and end points of the signal; the time corresponding to the maximum intensity is taken as the experimental retention time of the peptide;
Step 24, while processing each peptide, counting the names and frequencies of the modifications that occur, and storing them.
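The signal search in Step 13 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the peptide's signal has already been reduced to a list of `(retention_time, intensity)` pairs, one per consecutive MS1 scan, and that the search starts at a seed scan (for example the MS1 scan nearest the identified MS/MS spectrum). The function name, the seed convention, and the two-directional walk are assumptions made for the example.

```python
from typing import List, Tuple

def apex_retention_time(
    trace: List[Tuple[float, float]],  # (retention_time, intensity) per consecutive MS1 scan
    seed: int,                         # index of the scan where the search starts (assumed)
    cutoff: float = 0.10,              # stop below 10% of the maximum seen so far
) -> Tuple[float, int, int]:
    """Return (apex_rt, start_index, end_index) of the elution signal."""
    max_intensity = trace[seed][1]
    apex = seed

    # Walk towards later scans until the intensity drops below the cutoff.
    end = seed
    for i in range(seed + 1, len(trace)):
        intensity = trace[i][1]
        if intensity < cutoff * max_intensity:
            break
        end = i
        if intensity > max_intensity:
            max_intensity, apex = intensity, i

    # Walk towards earlier scans with the same stopping rule.
    start = seed
    for i in range(seed - 1, -1, -1):
        intensity = trace[i][1]
        if intensity < cutoff * max_intensity:
            break
        start = i
        if intensity > max_intensity:
            max_intensity, apex = intensity, i

    # The time of the most intense scan is the experimental retention time.
    return trace[apex][0], start, end
```

The returned start and end indices bound the signal, and the first element is the time of the most intense scan inside those bounds.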
Step 2 comprises:
Step 21, constructing a multiple linear regression formula from the new amino acids together with the 20 naturally occurring amino acids, as follows:
$$T = \sum_i R_i N_i + b + \varepsilon$$
where $R_i$ is the retention coefficient of each amino acid composing the peptide, $N_i$ is the number of occurrences of each amino acid, $b$ is the dead time, and $\varepsilon$ is the random error;
Step 22, a gradient descent step size that is too small makes convergence slow, while one that is too large prevents convergence; based on testing, the step size is set to 0.000001.
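To make the regression formula concrete, the sketch below counts the amino acids of a peptide, treating each modified residue as its own "new amino acid", and evaluates $\sum_i R_i N_i + b$. The token format `M[Oxidation]`, the 0-based position convention, and both function names are assumptions chosen for illustration; the coefficient values in any call are made up.

```python
from collections import Counter
from typing import Dict

def composition(sequence: str, mods: Dict[int, str]) -> Counter:
    """Count amino acids; a modified residue becomes a separate token.

    `mods` maps a 0-based position to a modification name (hypothetical format).
    """
    counts: Counter = Counter()
    for pos, residue in enumerate(sequence):
        mod = mods.get(pos)
        token = f"{residue}[{mod}]" if mod else residue
        counts[token] += 1
    return counts

def linear_retention_time(counts: Counter, coeffs: Dict[str, float], dead_time: float) -> float:
    """Evaluate sum_i R_i * N_i + b for one peptide (random error omitted)."""
    return sum(coeffs[token] * n for token, n in counts.items()) + dead_time
```

For example, `composition("AMMK", {2: "Oxidation"})` yields one count each for `A`, `M`, `M[Oxidation]`, and `K`, so the oxidized methionine receives its own coefficient.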
In Step 4, establishing the prediction model comprises:
Step 41, establishing the prediction model from the feature values, as follows:
$$\min \frac{1}{2}\lVert w \rVert^{2}$$
subject to $\lVert y_i - (w^{T} x_i + b) \rVert \le \varepsilon$, $i = 1, \ldots, n$, $\varepsilon \ge 0$, where $\varepsilon$ is the maximum gap between the predicted and the actual retention time; $y_i$ is the actual retention time; $x_i$ holds the values of the features of each dimension in the prediction model; $w$ holds the weights of the features and $w^{T}$ is the transpose of $w$; $b$ is the dead time;
Step 42, if a modification that does not occur in the training sample occurs in the test sample, treating the amino acid carrying that modification as a regular amino acid.
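The fallback in Step 42 can be sketched as a token normalization pass. The example assumes, hypothetically, that each residue is written as a one-letter code optionally followed by a bracketed modification name (such as `M[Oxidation]`) and that the set of tokens seen during training is available; neither the format nor the function name comes from the patent.

```python
from typing import Iterable, List, Set

def normalize_tokens(tokens: Iterable[str], known_tokens: Set[str]) -> List[str]:
    """Replace residues carrying an unseen modification by the bare residue."""
    normalized = []
    for token in tokens:
        if token in known_tokens:
            normalized.append(token)
        else:
            # Unseen modification: keep only the one-letter residue code,
            # i.e. treat the residue as a regular amino acid.
            normalized.append(token[0])
    return normalized
```

With this pass applied before feature extraction, a test peptide never references a retention coefficient that was not fitted on the training sample.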
The present invention also proposes a system for predicting the liquid chromatography retention time of peptides, comprising:
a matching module, for searching a raw mass spectrometry data file to obtain peptide-spectrum matches (PSMs) as identification results; for the PSMs from the target database whose FDR in the identification results is below 1%, extracting the experimental retention time of the corresponding peptides, and setting up a training sample and a test sample;
a multiple linear regression model building module, for using the training sample and treating each modified amino acid as a new amino acid, establishing a multiple linear regression model, and solving the retention coefficient of each amino acid by gradient descent;
a feature value calculation module, for extracting 56-dimensional features for each peptide in the training sample and calculating the corresponding feature values;
a prediction model building module, for establishing a prediction model and predicting the retention time of peptides of known sequence in the test sample.
The matching module performs the following:
treating peptides that differ in their modification sites as different peptides;
when the same peptide corresponds to multiple MS/MS spectra, selecting the highest-scoring identification of the peptide and extracting its experimental retention time;
when extracting the experimental retention time of a peptide with a given mass-to-charge ratio, searching for its signal on consecutive MS1 spectra and recording the maximum intensity of the signal; the search stops when the current intensity falls below 10% of the maximum intensity, which determines the start and end points of the signal; the time corresponding to the maximum intensity is taken as the experimental retention time of the peptide;
while processing each peptide, counting the names and frequencies of the modifications that occur, and storing them.
The multiple linear regression model building module performs the following:
constructing a multiple linear regression formula from the new amino acids together with the 20 naturally occurring amino acids, as follows:
$$T = \sum_i R_i N_i + b + \varepsilon$$
where $R_i$ is the retention coefficient of each amino acid composing the peptide, $N_i$ is the number of occurrences of each amino acid, $b$ is the dead time, and $\varepsilon$ is the random error;
a gradient descent step size that is too small makes convergence slow, while one that is too large prevents convergence; based on testing, the step size is set to 0.000001.
In the prediction model building module, establishing the prediction model comprises:
establishing the prediction model from the feature values, as follows:
$$\min \frac{1}{2}\lVert w \rVert^{2}$$
subject to $\lVert y_i - (w^{T} x_i + b) \rVert \le \varepsilon$, $i = 1, \ldots, n$, $\varepsilon \ge 0$, where $\varepsilon$ is the maximum gap between the predicted and the actual retention time; $y_i$ is the actual retention time; $x_i$ holds the values of the features of each dimension in the prediction model; $w$ holds the weights of the features and $w^{T}$ is the transpose of $w$; $b$ is the dead time;
if a modification that does not occur in the training sample occurs in the test sample, treating the amino acid carrying that modification as a regular amino acid.
As the above scheme shows, the advantages of the present invention are as follows:
The present invention adopts a strategy that automatically adjusts the core parameters of the model. On the one hand, it can predict the retention time of modified peptides under different chromatographic conditions; on the other hand, it greatly improves speed: compared with Elude on different data sets, it is more than 30 times faster.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The technical scheme can be divided into five steps:
Step 1, the raw mass spectrometry data file is searched with pFind 3 (pFind is currently the only protein identification engine in China with independent intellectual property rights); for each spectrum in the mass spectrometry data file, the corresponding peptide identification result, i.e. the peptide-spectrum match (PSM), is obtained.
Step 2, for the PSMs from the target database whose FDR (False Discovery Rate) in the identification results is below 1%, the experimental retention times of the corresponding peptides are extracted, and the peptides are randomly divided into two equal, disjoint parts: the training sample and the test sample. pFind 3 controls the FDR at the spectrum level by the target-decoy database method. Taking the PSMs from the target database with an FDR below 1% means that at least 99% of the PSMs are expected to be correct, so these PSMs are reliable and can be used for training and testing.
Step 3, using the training sample, each modified amino acid is treated as an "amino acid", a multiple linear regression model is established, and the retention coefficient of each amino acid is solved by gradient descent.
Step 4, for each peptide in the training set, 56-dimensional features are extracted and the corresponding feature values are calculated.
Step 5, a prediction model is established with the SVR method, and the retention time of peptides of known sequence in the test set is predicted.
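The filtering and splitting described in Step 2 above can be sketched as follows. This is an illustration under stated assumptions, not pFind 3's procedure, which the text does not detail: FDR is estimated here as the ratio of decoy to target PSMs above a score threshold (a common convention), each PSM is a hypothetical `(peptide, score, is_decoy)` tuple, and the fixed random seed exists only to make the example reproducible.

```python
import random
from typing import List, Sequence, Tuple

Psm = Tuple[str, float, bool]  # (peptide, score, is_decoy), hypothetical layout

def filter_by_fdr(psms: Sequence[Psm], max_fdr: float = 0.01) -> List[Psm]:
    """Keep target PSMs above the lowest score threshold whose FDR is below max_fdr."""
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)
    targets = decoys = 0
    cut = 0
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimator: FDR = decoys / targets among the top-i PSMs.
        if targets and decoys / targets < max_fdr:
            cut = i
    return [p for p in ranked[:cut] if not p[2]]

def split_in_half(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Randomly split items into two disjoint halves (training, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

Only PSMs from the target database survive the filter; decoy hits are used solely to estimate the error rate.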
Step 2 further comprises:
Step 21, peptides that differ in their modification sites are treated as different peptides.
Step 22, when the same peptide corresponds to multiple MS/MS spectra, the highest-scoring identification of the peptide is selected and its experimental retention time is extracted.
Step 23, when extracting the experimental retention time of a peptide with a given mass-to-charge ratio, its signal is searched on consecutive MS1 spectra and the current maximum intensity is recorded; the search stops when the intensity falls below 10% of the maximum intensity, which determines the start and end points of the signal; the time corresponding to the maximum intensity is taken as the experimental retention time of the peptide.
Step 24, while processing each peptide, the names and frequencies of the modifications that occur are counted and recorded in a text file.
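Steps 21, 22, and 24 amount to grouping identifications by sequence plus modification sites, keeping the best score per group, and tallying modification names. The sketch below assumes a hypothetical PSM layout of `(sequence, modifications, score)`, where `modifications` is a tuple of `(position, name)` pairs; the patent does not specify a data layout.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Mods = Tuple[Tuple[int, str], ...]   # ((position, modification name), ...)
Psm = Tuple[str, Mods, float]        # (sequence, modifications, score)

def best_psm_per_peptide(psms: Iterable[Psm]) -> Dict[Tuple[str, Mods], Psm]:
    """Keep the highest-scoring PSM for each (sequence, modification sites) pair."""
    best: Dict[Tuple[str, Mods], Psm] = {}
    for seq, mods, score in psms:
        key = (seq, mods)  # same sequence, different sites -> different peptide
        if key not in best or score > best[key][2]:
            best[key] = (seq, mods, score)
    return best

def modification_frequencies(peptides: Iterable[Tuple[str, Mods]]) -> Counter:
    """Count how often each modification name occurs across the peptides."""
    freq: Counter = Counter()
    for _, mods in peptides:
        for _, name in mods:
            freq[name] += 1
    return freq
```

Passing the keys of the dictionary returned by `best_psm_per_peptide` to `modification_frequencies` gives the per-modification counts that the later regression step uses to define the "new amino acids".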
Step 3 further comprises:
Step 31, the modifications counted in step 2 are treated as new groups, and each modified amino acid is regarded as a "new amino acid"; together with the 20 naturally occurring amino acids, these are used to construct a multiple linear regression formula, which is solved by gradient descent. The multiple linear regression formula is as follows:
$$T = \sum_i R_i N_i + b + \varepsilon$$
where $R_i$ is the retention coefficient of each amino acid composing the peptide and is the value to be solved, $N_i$ is the number of occurrences of each amino acid, $b$ is the dead time, and $\varepsilon$ is the random error. $R_i$ can be positive or negative: a negative value shortens the retention time of the peptide, and a positive value extends it. $T$ is the experimental retention time of a peptide in the training set.
Step 32, a gradient descent step size that is too small makes convergence slow, while one that is too large prevents convergence; based on testing, the step size is set to 0.000001.
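A minimal sketch of solving the regression by gradient descent follows. It assumes a squared-error loss and batch updates, which the text does not state explicitly, and represents each training peptide as a hypothetical `(counts, observed_retention_time)` pair. The default step size is the value given above; a suitable step depends on the scale of the data, so toy examples may need a larger one.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, int], float]  # (amino acid counts N_i, observed T)

def fit_retention_coefficients(
    samples: List[Sample],
    step: float = 1e-6,   # step size stated in the text
    epochs: int = 1000,   # number of passes (assumed; not given in the text)
) -> Tuple[Dict[str, float], float]:
    """Fit R_i and the dead time b by batch gradient descent on squared error."""
    tokens = sorted({t for counts, _ in samples for t in counts})
    coeffs = {t: 0.0 for t in tokens}
    dead_time = 0.0
    for _ in range(epochs):
        grad = {t: 0.0 for t in tokens}
        grad_b = 0.0
        for counts, observed in samples:
            predicted = sum(coeffs[t] * n for t, n in counts.items()) + dead_time
            error = predicted - observed
            # Gradient of 0.5 * error^2 with respect to R_i is error * N_i.
            for t, n in counts.items():
                grad[t] += error * n
            grad_b += error
        for t in tokens:
            coeffs[t] -= step * grad[t]
        dead_time -= step * grad_b
    return coeffs, dead_time
```

A fitted coefficient that comes out negative corresponds to an amino acid that shortens retention, consistent with the sign convention described in Step 31.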
Step 4 further comprises:
Step 41, for each peptide, 56-dimensional features are calculated, as shown in Table 1.
Table 1. Features used in the prediction model
a: Features marked with * require the amino acid retention coefficients for their calculation. When all peptides in the data set are regular peptides and no modifications are processed, these features are calculated twice, once with the retention coefficients obtained in step 3 and once with the Kyte-Doolittle hydrophobicity. When the data set contains modified peptides, they are calculated only with the retention coefficients obtained in step 3. # indicates that the feature dimension depends on the number of amino acid types: it equals 20 plus the number of "new amino acid" types involved in the calculation.
Step 5 further comprises:
Step 51, the SVR model is trained with the above 56-dimensional features. The objective function of the SVR model is as follows:
$$\min \frac{1}{2}\lVert w \rVert^{2}$$
subject to $\lVert y_i - (w^{T} x_i + b) \rVert \le \varepsilon$, $i = 1, \ldots, n$, where $\varepsilon \ge 0$ is the maximum gap between the predicted and the actual retention time; $y_i$ is the actual retention time; $x_i$ holds the values of the features of each dimension in the prediction model; $w$ holds the weights of the features and $w^{T}$ is the transpose of $w$; $b$ is the dead time. The constraint $\lVert y_i - (w^{T} x_i + b) \rVert \le \varepsilon$ means that the difference between the predicted and the actual retention time must not exceed $\varepsilon$.
Step 52, if a modification that does not occur in the training set occurs in the test set, the amino acid carrying that modification is treated as a regular amino acid, which prevents the program from crashing.
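To illustrate what the SVR objective and constraint in Step 51 express, the sketch below evaluates $\frac{1}{2}\lVert w \rVert^{2}$ for a given weight vector and checks whether every training peptide lies inside the $\varepsilon$-tube. It does not train a model; the function names are assumptions, and the two-dimensional feature vectors in any call stand in for the 56-dimensional features. In practice an off-the-shelf $\varepsilon$-SVR implementation (for example `sklearn.svm.SVR` in scikit-learn) could be used for the optimization itself; whether the inventors used such a library is not stated.

```python
from typing import List, Sequence

def svr_objective(w: Sequence[float]) -> float:
    """Value of the objective 1/2 * ||w||^2."""
    return 0.5 * sum(wi * wi for wi in w)

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted retention time w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def within_tube(
    w: Sequence[float],
    b: float,
    xs: List[Sequence[float]],
    ys: List[float],
    eps: float,
) -> bool:
    """True if |y_i - (w^T x_i + b)| <= eps holds for every sample."""
    return all(abs(y - predict(w, b, x)) <= eps for x, y in zip(xs, ys))
```

Among all weight vectors that satisfy the tube constraint, the optimization prefers the one with the smallest norm, which keeps the fitted function as flat as possible.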
The present invention also proposes a system for predicting the liquid chromatography retention time of peptides, comprising:
a matching module, for searching a raw mass spectrometry data file to obtain peptide-spectrum matches (PSMs) as identification results; for the PSMs from the target database whose FDR in the identification results is below 1%, extracting the experimental retention time of the corresponding peptides, and setting up a training sample and a test sample;
a multiple linear regression model building module, for using the training sample and treating each modified amino acid as a new amino acid, establishing a multiple linear regression model, and solving the retention coefficient of each amino acid by gradient descent;
a feature value calculation module, for extracting 56-dimensional features for each peptide in the training sample and calculating the corresponding feature values;
a prediction model building module, for establishing a prediction model and predicting the retention time of peptides of known sequence in the test sample.
The matching module performs the following:
treating peptides that differ in their modification sites as different peptides;
when the same peptide corresponds to multiple MS/MS spectra, selecting the highest-scoring identification of the peptide and extracting its experimental retention time;
when extracting the experimental retention time of a peptide with a given mass-to-charge ratio, searching for its signal on consecutive MS1 spectra and recording the maximum intensity of the signal; the search stops when the current intensity falls below 10% of the maximum intensity, which determines the start and end points of the signal; the time corresponding to the maximum intensity is taken as the experimental retention time of the peptide;
while processing each peptide, counting the names and frequencies of the modifications that occur, and storing them.
The multiple linear regression model building module performs the following:
constructing a multiple linear regression formula from the new amino acids together with the 20 naturally occurring amino acids, as follows:
$$T = \sum_i R_i N_i + b + \varepsilon$$
where $R_i$ is the retention coefficient of each amino acid composing the peptide, $N_i$ is the number of occurrences of each amino acid, $b$ is the dead time, and $\varepsilon$ is the random error;
a gradient descent step size that is too small makes convergence slow, while one that is too large prevents convergence; based on testing, the step size is set to 0.000001.
In the prediction model building module, establishing the prediction model comprises:
establishing the prediction model from the feature values, as follows:
$$\min \frac{1}{2}\lVert w \rVert^{2}$$
subject to $\lVert y_i - (w^{T} x_i + b) \rVert \le \varepsilon$, $i = 1, \ldots, n$, $\varepsilon \ge 0$, where $\varepsilon$ is the maximum gap between the predicted and the actual retention time; $y_i$ is the actual retention time; $x_i$ holds the values of the features of each dimension in the prediction model; $w$ holds the weights of the features and $w^{T}$ is the transpose of $w$; $b$ is the dead time;
if a modification that does not occur in the training sample occurs in the test sample, treating the amino acid carrying that modification as a regular amino acid.

Claims (6)

  1. A method for predicting the liquid chromatography retention time of peptides, characterized by comprising:
    Step 1, searching a raw mass spectrometry data file to obtain peptide-spectrum matches (PSMs) as identification results; for the PSMs from the target database whose FDR in the identification results is below 1%, extracting the experimental retention time of the corresponding peptides, and setting up a training sample and a test sample;
    Step 2, using the training sample and treating each modified amino acid as a new amino acid, establishing a multiple linear regression model, and solving the retention coefficient of each amino acid by gradient descent;
    Step 3, for each peptide in the training sample, extracting 56-dimensional features and calculating the corresponding feature values;
    Step 4, establishing a prediction model and predicting the retention time of peptides of known sequence in the test sample;
    wherein, in Step 4, establishing the prediction model comprises:
    Step 41, establishing the prediction model from the feature values, as follows:
    $$\min \frac{1}{2}\lVert w \rVert^{2}$$
    subject to $\lVert y_i - (w^{T} x_i + b) \rVert \le \varepsilon$, $i = 1, \ldots, n$, $\varepsilon \ge 0$, where $\varepsilon$ is the maximum gap between the predicted and the actual retention time; $y_i$ is the actual retention time; $x_i$ holds the values of the features of each dimension in the prediction model; $w$ holds the weights of the features and $w^{T}$ is the transpose of $w$; $b$ is the dead time;
    Step 42, if a modification that does not occur in the training sample occurs in the test sample, treating the amino acid carrying that modification as a regular amino acid.
  2. The method for predicting the liquid chromatography retention time of peptides according to claim 1, characterized in that Step 1 comprises:
    Step 11, treating peptides that differ in their modification sites as different peptides;
    Step 12, when the same peptide corresponds to multiple MS/MS spectra, selecting the highest-scoring identification of the peptide and extracting its experimental retention time;
    Step 13, when extracting the experimental retention time of a peptide with a given mass-to-charge ratio, searching for its signal on consecutive MS1 spectra and recording the maximum intensity of the signal; the search stops when the current intensity falls below 10% of the maximum intensity, which determines the start and end points of the signal; the time corresponding to the maximum intensity is taken as the experimental retention time of the peptide;
    Step 24, while processing each peptide, counting the names and frequencies of the modifications that occur, and storing them.
  3. The method for predicting the liquid chromatography retention time of peptides according to claim 1, characterized in that Step 2 comprises:
    Step 21, constructing a multiple linear regression formula from the new amino acids together with the 20 naturally occurring amino acids, as follows:
    $$T = \sum_i R_i N_i + b + \varepsilon$$
    where $R_i$ is the retention coefficient of each amino acid composing the peptide, $N_i$ is the number of occurrences of each amino acid, $b$ is the dead time, and $\varepsilon$ is the random error;
    Step 22, a gradient descent step size that is too small makes convergence slow, while one that is too large prevents convergence; based on testing, the step size is set to 0.000001.
  4. A kind of 4. peptide fragment liquid chromatogram predicting retention time system, it is characterised in that including:Matching module, to raw mass spectrum number Scanned for according to file, obtain peptide fragment-spectrogram matching and be used as qualification result, for the coming less than 1% of FDR in the qualification result From peptide fragment-spectrogram matching of object library, the experiment retention time of corresponding peptide fragment in extraction peptide fragment-spectrogram matching, and training is set Sample and test sample;
    a multiple linear regression model establishing module, for using the training samples, treating each amino acid carrying a modification as a new amino acid, establishing a multiple linear regression model, and solving for the retention coefficient of every kind of amino acid by gradient descent;
    a feature value calculation module, for extracting 56-dimensional features for every peptide in the training samples and calculating the corresponding feature values;
    a prediction model establishing module, for establishing a prediction model and predicting the retention time of peptides of known sequence in the test samples;
    wherein the step of establishing the prediction model in the prediction model establishing module comprises:
    establishing the prediction model according to the feature values, as follows:
    $$\min \frac{1}{2}\|w\|^2$$
    subject to the constraint $\|y_i - (w^T x_i + b)\| \le \varepsilon$, where $i = 1, \dots, n$ and $\varepsilon \ge 0$; $\varepsilon$ is the maximum gap between the predicted retention time and the actual retention time; $y_i$ is the actual retention time; $x_i$ is the value of each feature dimension in the prediction model; $w$ is the weight of each feature dimension, and $w^T$ is the transpose of $w$; $b$ is the dead time;
    if a modification that did not occur in the training samples appears in the test samples, the amino acid carrying that unseen modification is treated as the conventional (unmodified) amino acid.
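The prediction model above has the form of an ε-insensitive support vector regression. The sketch below does not train such a model; it only evaluates the stated objective and constraint for a given weight vector, and shows the fallback for modifications unseen in training. All function names and the `"K[Acetyl]"` token format are hypothetical, and the two-dimensional feature vectors stand in for the 56-dimensional features.

```python
def predict_rt(w, b, x):
    """Predicted retention time w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def objective(w):
    """The quantity minimized by the model: 0.5 * ||w||^2."""
    return 0.5 * sum(wj * wj for wj in w)


def within_epsilon(w, b, xs, ys, eps):
    """True if every sample satisfies |y_i - (w^T x_i + b)| <= eps."""
    return all(abs(y - predict_rt(w, b, x)) <= eps for x, y in zip(xs, ys))


def strip_unseen_modifications(tokens, known_tokens):
    """Treat a residue with a modification absent from training as unmodified."""
    return [t if t in known_tokens else t.split("[", 1)[0] for t in tokens]


w, b = [2.0, -1.0], 5.0
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [7.2, 3.9]
feasible = within_epsilon(w, b, xs, ys, eps=0.25)
cleaned = strip_unseen_modifications(
    ["A", "M[Oxidation]", "K[Acetyl]"], {"A", "K", "M[Oxidation]"}
)
```

In practice a library solver would search for the `w` and `b` that minimize the objective under this constraint, typically with slack variables to tolerate outliers; that relaxation is not stated in the claim.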
  5. The peptide liquid chromatography retention time prediction system according to claim 4, characterized in that the matching module is configured for:
    processing peptides separately according to their modification sites;
    where the same peptide corresponds to multiple MS2 spectra, choosing the highest-scoring match and extracting its experimental retention time;
    when extracting the experimental retention time, for a peptide of a given mass-to-charge ratio, searching for its signal on consecutive MS1 spectra and recording the maximum intensity of the signal, stopping the search when the current intensity falls below 10% of the maximum intensity, thereby determining the start and end points of the signal, and taking the time corresponding to the maximum intensity as the experimental retention time of the peptide;
    while processing each peptide, counting the names and frequencies of the modifications that occur, and storing them.
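The selection and bookkeeping performed by the matching module can be sketched as below. The record layout (dictionaries with `sequence`, `mods`, `score`, and `rt` keys, where `mods` is a tuple of `(position, name)` pairs) is an assumption made for illustration, as are the function names.

```python
from collections import Counter


def select_best_psms(psms):
    """Keep the highest-scoring match per (sequence, modification sites)."""
    best = {}
    for psm in psms:
        # Same sequence with different modification sites is a different peptide.
        key = (psm["sequence"], psm["mods"])
        if key not in best or psm["score"] > best[key]["score"]:
            best[key] = psm
    return list(best.values())


def count_modifications(psms):
    """Count how often each modification name occurs across the peptides."""
    counts = Counter()
    for psm in psms:
        for _position, name in psm["mods"]:
            counts[name] += 1
    return counts


psms = [
    {"sequence": "AMK", "mods": ((2, "Oxidation"),), "score": 10.0, "rt": 31.2},
    {"sequence": "AMK", "mods": ((2, "Oxidation"),), "score": 14.5, "rt": 31.4},
    {"sequence": "AMK", "mods": (), "score": 9.0, "rt": 35.0},
]
best = select_best_psms(psms)
mod_counts = count_modifications(best)
```

Counting after selection, as done here, records each modification once per distinct peptide; counting over all matches instead would weight abundant peptides more heavily.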
  6. The peptide liquid chromatography retention time prediction system according to claim 4, characterized in that the multiple linear regression model establishing module is configured for:
    taking the new amino acids (amino acids carrying a modification) together with the 20 naturally occurring amino acids, constructing a multiple linear regression formula as follows:
    $$T = \sum_i (R_i \cdot N_i) + b + \varepsilon$$
    where $R_i$ is the retention coefficient of each kind of amino acid composing the peptide, $N_i$ is the number of occurrences of that amino acid, $b$ is the dead time, and $\varepsilon$ is the random error;
    since a gradient descent step size that is too small makes convergence slow and one that is too large prevents convergence, setting the step size to 0.000001 based on testing.
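For reference, the role of the step size can be written out explicitly. The claims do not state the loss function; the derivation below assumes the usual squared-error loss over $K$ training peptides, with $N_{k,i}$ the count of amino acid $i$ in peptide $k$, $T_k$ its experimental retention time, and $\alpha$ the step size.

```latex
\hat{T}_k = \sum_i R_i N_{k,i} + b, \qquad
J(R, b) = \frac{1}{2} \sum_{k=1}^{K} \bigl(\hat{T}_k - T_k\bigr)^2

\frac{\partial J}{\partial R_i} = \sum_{k=1}^{K} \bigl(\hat{T}_k - T_k\bigr) N_{k,i}, \qquad
\frac{\partial J}{\partial b} = \sum_{k=1}^{K} \bigl(\hat{T}_k - T_k\bigr)

R_i \leftarrow R_i - \alpha \frac{\partial J}{\partial R_i}, \qquad
b \leftarrow b - \alpha \frac{\partial J}{\partial b}, \qquad
\alpha = 10^{-6}
```

Under this assumption the gradient grows with both the number of peptides and the amino acid counts, so a small $\alpha$ is needed to keep the updates from diverging on a large training set.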
CN201610941299.XA 2016-10-25 2016-10-25 A kind of peptide fragment liquid chromatogram retention time prediction method and system Active CN106248844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610941299.XA CN106248844B (en) 2016-10-25 2016-10-25 A kind of peptide fragment liquid chromatogram retention time prediction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610941299.XA CN106248844B (en) 2016-10-25 2016-10-25 A kind of peptide fragment liquid chromatogram retention time prediction method and system

Publications (2)

Publication Number Publication Date
CN106248844A CN106248844A (en) 2016-12-21
CN106248844B true CN106248844B (en) 2018-05-04

Family

ID=57600700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610941299.XA Active CN106248844B (en) 2016-10-25 2016-10-25 A kind of peptide fragment liquid chromatogram retention time prediction method and system

Country Status (1)

Country Link
CN (1) CN106248844B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020129895A1 (en) * 2018-12-20 2020-06-25 キヤノン株式会社 Information processing device, method for controlling information processing device, and program
CN113936742A (en) * 2021-09-14 2022-01-14 上海中科新生命生物科技有限公司 Peptide spectrum retention time prediction method and system based on mass spectrometry

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050267688A1 (en) * 2002-12-18 2005-12-01 Konstantinos Petritis Method for enhanced accuracy in predicting peptides elution time using liquid separations or chromatography
WO2010060218A1 (en) * 2008-11-28 2010-06-03 University Of Manitoba A composition for use as a peptide retention standard and a method of predicting peptide hydrophobicity in liquid chromatography
US20130090862A1 (en) * 2010-05-20 2013-04-11 University Of Manitoba Methods and systems for analysis of peptide sample streams using tandem mass spectroscopy
CN101871945B (en) * 2010-06-13 2013-05-08 中国科学院计算技术研究所 Spectrum library generating method and spectrogram identifying method of tandem mass spectrometry
CN102507813B (en) * 2011-09-26 2013-11-13 天津大学 Method for forecasting retention time of gas chromatography under temperature programming after shortening of chromatographic column
CN103091434B (en) * 2013-01-10 2014-10-15 天津大学 Method for predicting retention time of gradient elution mode of reversed-phase high-performance liquid chromatography
CN103439440B (en) * 2013-07-11 2015-08-26 中国食品药品检定研究院 A kind of prediction method for retention time of high performance liquid chromatographic peak
CN104182658B (en) * 2014-08-06 2017-05-03 中国科学院计算技术研究所 Tandem mass spectrogram identification method
GB2532430B (en) * 2014-11-18 2019-03-20 Thermo Fisher Scient Bremen Gmbh Method for time-alignment of chromatography-mass spectrometry data sets

Also Published As

Publication number Publication date
CN106248844A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
Causon et al. Fingerprinting of traditionally produced red wines using liquid chromatography combined with drift tube ion mobility-mass spectrometry
CN104297355B (en) Simulative-target metabonomics analytic method based on combination of liquid chromatography and mass spectrum
Rachineni et al. Identifying type of sugar adulterants in honey: Combined application of NMR spectroscopy and supervised machine learning classification
CN109696510B (en) Method for acquiring metabolic difference between transgenic corn and non-transgenic corn based on UHPLC-MS
CN106568924A (en) Method used for determining molecular composition of crude oil based on crude oil macroscopic properties
CN110441423A (en) A kind of method and its system measuring grain fragrance component
CN113480599A (en) Characteristic polypeptide for identifying deer antler glue of sika deer or red deer and application thereof
CN105301163A (en) Targeted metabo lomics analysis method for determining metabolites of living body
CN106248844B (en) A kind of peptide fragment liquid chromatogram retention time prediction method and system
CN106749598A (en) A kind of feature peptide for detecting the adulterated ratio of milk powder in goat milk powder is combined and method
CN111060642A (en) Method for classifying and identifying tobacco leaves of same variety and different producing areas
CN101606970A (en) The method of quality control of radix scutellariae medicinal materials
CN105181678A (en) Identification method of rice varieties based on laser-induced breakdown spectroscopy (LIBS)
EP2834659B1 (en) Method for substance identification from nmr spectrum
CN104182658B (en) Tandem mass spectrogram identification method
JP3707010B2 (en) General-purpose multicomponent simultaneous identification and quantification method in chromatograph / mass spectrometer
JP2007256126A (en) Mass spectrometry system
CN111812254A (en) 2-decene diacid used as indicator substance for honey authenticity evaluation and application thereof in honey adulteration identification
CN108205042B (en) Anhua dark tea identification method
CN103439441B (en) Peptide identification method based on subset error rate estimation
CN112415208A (en) Method for evaluating quality of proteomics mass spectrum data
CN107300535A (en) The method of near-infrared quick detection organic fertilizer active constituent content
CN106908527A (en) A kind of method for differentiating the honey of lychee flowers place of production
GB2549354A (en) Chromatograph mass spectrometer and control method therefor
CN108027346A (en) Mass spectrometer, mass spectrometric analysis method and mass spectral analysis program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant