CN106018535A - Complete glycopeptide identifying method and system - Google Patents
- Publication number
- CN106018535A (application CN201610309699.9A)
- Authority
- CN
- China
- Prior art keywords
- sugar chain
- peptide fragment
- glycopeptide
- chain structure
- ion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
Landscapes
- Chemical & Material Sciences (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Electrochemistry (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention provides a method for identifying intact glycopeptides. For any measured tandem mass spectrum to be identified, the method traverses a sugar chain structure database; for each sugar chain structure, it infers from the precursor ion mass of the current tandem spectrum the masses of the glycopeptide Y ions that fragmentation could produce, counts the peaks of the current MS/MS spectrum that match them, and uses that count as the coarse score of the match between the glycopeptide Y ions and the current MS/MS spectrum. The sugar chain structures with the top K coarse scores are taken as candidate sugar chain structures. For the current tandem spectrum, all candidate sugar chain structures are then traversed; for each one, the match between the measured spectrum and the theoretical spectrum of the peptide is scored, as is the match between the measured spectrum and the theoretical spectrum of the sugar chain structure, and the glycopeptide structure identification result is derived from these scores. The method improves the reliability of large-scale intact glycopeptide identification while keeping computational complexity low.
Description
Technical field
The present invention relates to the technical field of bioinformatics, and in particular to glycoproteomics and mass spectrometry.
Background art
Mass spectrometry is the principal means of identifying site-specific protein glycosylation on a large scale. Typically, intact glycopeptides are first identified from their tandem mass spectra, and the glycosylation modifications on the protein are then inferred. Two strategies currently exist for large-scale identification of intact glycopeptide spectra: (1) methods represented by systems such as GRIP and ArMone 2.0, which identify the sugar chain from the intact glycopeptide tandem spectrum and then match a peptide by peptide mass; and (2) methods represented by systems such as Byonic, which identify the peptide from the intact glycopeptide spectrum and then infer the sugar chain composition from the sugar chain mass. Both approaches are briefly described below, followed by an analysis of their shortcomings.
Method (1) relies mainly on the sugar chain fragment ion information in the glycopeptide spectrum: the sugar chain is identified by matching against the Y ions of the glycopeptide, and the inferred peptide mass is then matched against the peptide sequences in a protein sequence database to infer the peptide sequence. Method (2) relies mainly on the peptide fragment ion information in the glycopeptide spectrum: the peptide is identified by matching against the peptide fragment ions of the glycopeptide, and the inferred sugar chain mass is then matched against the sugar chain structures in a sugar chain database to infer the sugar chain structure.
Both methods use the intact glycopeptide spectrum to identify only one part of the glycopeptide, either the sugar chain or the peptide, and obtain the other part purely by mass inference, so their reliability is poor. For example, suppose the sugar chain portion has been identified from the intact glycopeptide spectrum and the inferred peptide mass is 999.5633. Many peptides can match this mass within the error tolerance: the two peptides LTEAKPVDK and DVPKAETLK have exactly the same mass and both match 999.5633, so mass matching alone cannot distinguish them. The same problem arises when the sugar chain composition is inferred from the sugar chain mass alone.
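The ambiguity in this example can be checked directly. The sketch below sums standard monoisotopic residue masses plus one water for the two peptides named above; the function name `peptide_mass` is illustrative only. Because the two sequences are rearrangements of the same nine residues, their masses are necessarily identical, and the computed value falls close to the mass quoted in the text.

```python
# Monoisotopic residue masses (Da) for the amino acids used below.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "K": 128.09496,
    "L": 113.08406, "P": 97.05276, "T": 101.04768, "V": 99.06841,
}
WATER = 18.010565  # one H2O is added for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


for seq in ("LTEAKPVDK", "DVPKAETLK"):
    print(seq, f"{peptide_mass(seq):.4f}")
```

Both lines print the same mass, which is why a search by mass alone returns both sequences as equally good candidates.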
Therefore, a large-scale intact glycopeptide identification solution with higher reliability is urgently needed.
Summary of the invention
The object of the present invention is to provide a large-scale intact glycopeptide identification solution with higher reliability.
The invention provides a method for identifying intact glycopeptides, comprising the following steps:
1) obtaining measured tandem mass spectra of glycopeptide MS/MS fragmentation that contain both sugar chain fragment information and peptide fragment information;
2) for any measured tandem mass spectrum to be identified, traversing the sugar chain structure database and performing steps 21) to 22) for each sugar chain structure, until all sugar chain structures in the database have been traversed;
21) for the current sugar chain structure, inferring, from the precursor ion mass of the current tandem spectrum, the masses of all glycopeptide Y ions that fragmentation could produce;
22) for the glycopeptide Y-ion masses obtained in step 21) in each case, counting the peaks of the current MS/MS spectrum that match them, and taking this count as the coarse score of the match between the glycopeptide Y ions in that case and the current MS/MS spectrum;
3) taking the sugar chain structures whose coarse scores rank in the top K as the candidate sugar chain structures of the current tandem spectrum, where K is a preset number of candidate sugar chain structures;
4) for the current tandem spectrum, traversing all candidate sugar chain structures and, for each candidate, scoring the spectrum-to-spectrum match between the measured spectrum and the theoretical spectrum of the peptide and the spectrum-to-spectrum match between the measured spectrum and the theoretical spectrum of the sugar chain structure, and thereby obtaining the glycopeptide structure identification result.
In step 21), the method of inferring the masses of the glycopeptide Y ions that fragmentation could produce comprises calculating: glycopeptide Y-ion mass = peptide mass + sugar chain reducing-end ion mass. The peptide mass is obtained by subtracting the mass of the current sugar chain structure from the precursor ion mass of the current tandem spectrum. The sugar chain reducing-end ion mass is calculated as follows: all possible fragmentation cases of the current sugar chain structure are analysed to obtain the structure of the sugar chain reducing-end ion remaining after fragmentation in each case, and the reducing-end ion masses are then derived from these structures.
In step 21), the sugar chain reducing-end ion mass may be obtained by looking up a sugar chain index table, which is indexed by mass and records the sugar chain reducing-end ion structures corresponding to each mass.
The sugar chain index table of step 21) is built in advance by the following substeps:
211) importing the sugar chain structure database and traversing each sugar chain structure in it;
212) for the current sugar chain, determining from its structure all possible cleavage positions, and deriving the structure of the reducing-end ion corresponding to each cleavage position;
213) calculating the mass of each possible reducing-end ion, and thereby obtaining the sugar chain index table.
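Substeps 211) to 213) can be sketched as follows, under several assumptions that are not stated in the text: a sugar chain is modelled as a nested tuple `(monosaccharide, children)` rooted at the reducing end, only glycosidic-bond cleavages are considered, a fragment's mass is taken as the sum of its monosaccharide residue masses, and masses are rounded to four decimals to form index keys. All function names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharides.
MONO_MASS = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791,
             "NeuAc": 291.09542, "NeuGc": 307.09033}


def reducing_end_fragments(node):
    """Yield every connected fragment that keeps `node`, the reducing-end side."""
    name, children = node
    # Each branch is either cleaved off (None) or kept as one of its own fragments.
    options = [[None] + list(reducing_end_fragments(c)) for c in children]
    for choice in product(*options):
        yield (name, tuple(c for c in choice if c is not None))


def fragment_mass(node):
    """Sum of residue masses over all monosaccharides in the fragment."""
    name, children = node
    return MONO_MASS[name] + sum(fragment_mass(c) for c in children)


def build_glycan_index(glycans, decimals=4):
    """Map rounded mass -> set of reducing-end ion structures with that mass."""
    index = defaultdict(set)
    for glycan in glycans:
        for frag in reducing_end_fragments(glycan):
            index[round(fragment_mass(frag), decimals)].add(frag)
    return index


# Pentasaccharide core: HexNAc-HexNAc-Hex with two Hex branches.
core = ("HexNAc", (("HexNAc", (("Hex", (("Hex", ()), ("Hex", ()))),)),))
index = build_glycan_index([core])
for mass in sorted(index):
    print(mass, len(index[mass]))
```

Storing a set per key reflects the point that one mass may correspond to several reducing-end ion structures; a real index would also need a mass-tolerance lookup rather than exact rounded keys.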
Step 21) further comprises directly discarding sugar chain structures that are clearly inconsistent with the current tandem spectrum: if, in the MS/MS spectrum of the current tandem spectrum, the intensity of the 274 peak exceeds 10% of the highest peak and the current sugar chain structure does not contain NeuAc, the sugar chain structure is discarded; if the MS/MS spectrum contains no 274 peak and the current sugar chain structure contains NeuAc, the sugar chain structure is discarded; if the intensity of the 290 peak exceeds 10% of the highest peak and the current sugar chain structure does not contain NeuGc, the sugar chain structure is discarded; if the MS/MS spectrum contains no 290 peak and the current sugar chain structure contains NeuGc, the sugar chain structure is discarded.
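The four discard rules can be written as one small consistency check. This is a sketch only: the spectrum is assumed to be a non-empty list of `(m/z, intensity)` pairs, the sugar chain is assumed to be summarised as a composition dictionary, the peak positions 274.09 and 290.09 and the matching tolerance of 0.02 are assumed values (the text gives only 274 and 290), and the function names are hypothetical.

```python
def has_peak(peaks, mz, tol=0.02, min_rel_intensity=0.0):
    """True if a peak within `tol` of `mz` exceeds the given fraction of the highest peak."""
    base = max(intensity for _, intensity in peaks)
    return any(abs(p_mz - mz) <= tol and intensity > min_rel_intensity * base
               for p_mz, intensity in peaks)


def sialic_acid_consistent(peaks, composition):
    """Apply the NeuAc (274) and NeuGc (290) rules; False means discard the sugar chain."""
    for mz, residue in ((274.09, "NeuAc"), (290.09, "NeuGc")):
        in_glycan = composition.get(residue, 0) > 0
        strong = has_peak(peaks, mz, min_rel_intensity=0.10)
        present = has_peak(peaks, mz)
        if strong and not in_glycan:
            return False  # marker peak above 10% but residue absent from the structure
        if not present and in_glycan:
            return False  # residue in the structure but no marker peak at all
    return True


spectrum = [(138.06, 50.0), (204.09, 100.0), (274.09, 30.0)]
print(sialic_acid_consistent(spectrum, {"Hex": 5, "HexNAc": 4, "NeuAc": 1}))
print(sialic_acid_consistent(spectrum, {"Hex": 5, "HexNAc": 4}))
```

A weak marker peak (present but below 10%) passes both rules for either kind of structure, which follows from the wording of the four conditions.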
Step 21) further comprises: if the glycopeptide sample to be identified is known to be an N-glycopeptide, discarding clearly inconsistent sugar chain structures according to the number of pentasaccharide core ions of the current sugar chain structure.
Step 4) comprises the following substeps:
41) traversing all candidate sugar chain structures and performing step 42) for each candidate sugar chain structure;
42) inferring the peptide mass from the current sugar chain structure and the precursor ion mass of the current tandem spectrum, then searching the peptide index table for peptides matching this mass;
43) theoretically fragmenting each matched peptide to obtain the ions that satisfy the fragmentation conditions;
44) matching the theoretical spectrum of the theoretical fragment ions of each matched peptide against the measured spectrum, spectrum to spectrum, to obtain the fine score of the corresponding peptide;
45) for the current candidate sugar chain structure, matching the theoretical spectrum of the theoretical glycopeptide Y ions against the measured spectrum, spectrum to spectrum, to obtain the fine score of the sugar chain structure;
46) taking the weighted average of the sugar chain structure fine score and the peptide fine score as the fine score of the glycopeptide formed by the corresponding sugar chain structure and peptide;
47) deriving the glycopeptide structure identification result from the fine scores of the glycopeptide structures.
In step 42), the peptide index table is built in advance from a protein sequence database; it is indexed by mass and records the peptide sequences corresponding to each mass.
The peptide index table is built by the following substeps:
421) importing the protein sequence database and traversing each protein sequence in it;
422) for the current protein sequence, analysing all possible theoretical enzymatic cleavage cases and deriving the peptide sequence corresponding to each case;
423) for each possible peptide, generating the corresponding modified peptides based on all possible modification forms;
424) calculating the mass of each possible peptide, and thereby obtaining the peptide index table, whose peptides include both unmodified and modified peptides.
The process of building the sugar chain index table further comprises the following substeps:
214) constructing a decoy sugar chain library from the sugar chains in the sugar chain structure database, analysing the reducing-end ion structures of the decoy sugar chains, and calculating the corresponding reducing-end ion masses;
215) adding the index entries of the reducing-end ions of the decoy sugar chains to the sugar chain index table generated from the original sugar chain library in step 213);
The process of building the peptide index table further comprises:
425) deriving, from the protein sequence database and all possible enzymatic cleavage cases, a peptide list containing all possible peptides, constructing a decoy peptide list from this peptide list, and calculating the mass of each decoy peptide;
426) merging the decoy peptide list into the peptide index table obtained in step 424);
The intact glycopeptide identification method further comprises the following step:
5) performing steps 2) to 4) for each tandem spectrum to be identified, taking the result with the highest glycopeptide fine score for each tandem spectrum, estimating the false discovery rate of the intact glycopeptides, and outputting the final identification results together with the false discovery rate. Step 5) comprises the following substeps:
51) marking the top-ranked glycopeptide of each tandem spectrum as the identification result of that spectrum;
52) among all top-ranked results, treating those whose sugar chain comes from the decoy library as incorrect sugar chain identifications, and estimating from them the sugar chain false discovery rate, that is, the false discovery rate of the set of sugar chain identifications whose score is greater than or equal to x, where I_GP denotes the set of scores of all identification results, x denotes a manually set score threshold, G=False denotes the event that the sugar chain identification is incorrect, and p denotes the probability function;
53) among all top-ranked results, treating those whose peptide comes from the decoy library as incorrect peptide identifications, and calculating from the number of incorrect peptide identifications the peptide false discovery rate, that is, the false discovery rate of the set of peptide identifications whose score is greater than or equal to x, where I_GP denotes the set of scores of all identification results, x denotes a manually set score threshold, and P=False denotes the event that the peptide identification is incorrect;
54) among all top-ranked results, treating those whose sugar chain and peptide both come from the decoy libraries as identifications in which the sugar chain and the peptide are simultaneously incorrect, and calculating the false discovery rate for simultaneous sugar chain and peptide errors in the set of glycopeptide identifications whose score is greater than or equal to x, where G=False ∩ P=False denotes the event that the sugar chain and the peptide are both incorrectly identified;
55) calculating the false discovery rate of the intact glycopeptides.
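The definitions given in steps 52) to 55) suggest the following reading, written here in assumed notation: s denotes a score in I_GP, and the symbols FDR_G, FDR_P, FDR_{G∩P} and FDR_GP are names chosen for this sketch. The last line additionally assumes that an intact glycopeptide identification counts as incorrect when its sugar chain or its peptide is incorrect, in which case the combination follows from the inclusion-exclusion rule for the probability of a union of two events.

```latex
\begin{align*}
\mathrm{FDR}_{G}(x) &= p\left(G=\mathrm{False} \mid s \ge x\right) \\
\mathrm{FDR}_{P}(x) &= p\left(P=\mathrm{False} \mid s \ge x\right) \\
\mathrm{FDR}_{G \cap P}(x) &= p\left(G=\mathrm{False} \cap P=\mathrm{False} \mid s \ge x\right) \\
\mathrm{FDR}_{GP}(x) &= p\left(G=\mathrm{False} \cup P=\mathrm{False} \mid s \ge x\right)
  = \mathrm{FDR}_{G}(x) + \mathrm{FDR}_{P}(x) - \mathrm{FDR}_{G \cap P}(x)
\end{align*}
```

Subtracting the joint term avoids counting twice the identifications in which both parts are wrong.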
Compared with the prior art, the present invention has the following technical effects:
1. The invention significantly improves the reliability of large-scale intact glycopeptide identification.
2. The invention maintains low computational complexity while improving identification accuracy.
3. The invention accurately controls the false discovery rate of intact glycopeptide identification results, making the identification results more complete.
Brief description of the drawings
Embodiments of the invention are described in detail below with reference to the accompanying drawing, in which:
Fig. 1 shows a flow chart of the intact glycopeptide identification method of one embodiment of the invention.
Detailed description of the invention
For ease of understanding, the meanings of some technical terms used herein are given first:
Glycosidic bond: the chemical bond formed when one monosaccharide is linked to another;
Reducing end of a sugar chain: the end containing a free aldehyde group (-CHO); in glycoproteomics, the end of the sugar chain that is attached to the peptide;
Reducing-end ion of a sugar chain: the fragment ion that retains the reducing end after a glycosidic bond of the sugar chain breaks, sometimes also called the Y ion of the sugar chain;
Y ion of a glycopeptide: a reducing-end ion of the sugar chain plus the complete peptide sequence of the glycopeptide;
b/y ions of a glycopeptide: the b/y ions of the peptide portion of the glycopeptide, which do not contain any sugar chain structure;
Pentasaccharide core: the fixed five-monosaccharide structure usually present at the reducing end of an N-linked sugar chain, i.e. 2 × HexNAc + 3 × Hex.
The invention is further described below with reference to the drawing and embodiments.
Fig. 1 shows a flow chart of the intact glycopeptide identification method of one embodiment of the invention. The flow includes the following steps:
Step 100: Build a sugar chain index table from the sugar chain structure library, and build a peptide index table from the protein sequence database. The sugar chain index table is indexed by mass and records the sugar chain reducing-end ion structures corresponding to each mass; one mass may correspond to several reducing-end ion structures. The peptide index table is likewise indexed by mass and records the peptide sequences (i.e. the amino acid sequences making up the peptides) corresponding to each mass. Note that this step is a preparatory step for intact glycopeptide identification: if the sugar chain index table and peptide index table are already available, it can be omitted from the identification flow.
According to one embodiment of the invention, building the sugar chain index table from the sugar chain structure library includes the following substeps:
Step 110: Import the sugar chain structure database and traverse each sugar chain structure in it (hereinafter sometimes simply called a sugar chain).
Step 111: For the current sugar chain, determine from its structure all possible cleavage positions, and derive the structure of the reducing-end ion corresponding to each cleavage position.
Step 112: Calculate the mass of each possible reducing-end ion, and thereby obtain the sugar chain index table.
Building the peptide index table from the protein sequence database includes the following substeps:
Step 120: Import the protein sequence database and traverse each protein sequence in it.
Step 121: For the current protein sequence, analyse all possible theoretical enzymatic cleavage cases and derive the peptide sequence corresponding to each case (here, a peptide sequence is a fragment obtained by cleaving the current protein sequence, sometimes simply called a peptide).
Step 122: For each possible peptide, generate the corresponding modified peptides based on all possible modification forms.
Step 123: Calculate the mass of each possible peptide, and thereby obtain the peptide index table. The peptides in the table include both unmodified and modified peptides.
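Steps 120 to 123 can be sketched as below. The text does not name an enzyme or a modification list, so this example assumes trypsin-like cleavage (after K or R unless followed by P), at most one missed cleavage, and methionine oxidation as the only variable modification; the rounding of masses to four decimals and all function names are likewise assumptions.

```python
from collections import defaultdict
from itertools import product

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
OXIDATION = 15.994915  # assumed variable modification on methionine


def digest(protein, max_missed=1):
    """Yield peptides from cleavage after K/R (not before P), allowing missed cleavages."""
    cuts = [0] + [i + 1 for i in range(len(protein) - 1)
                  if protein[i] in "KR" and protein[i + 1] != "P"] + [len(protein)]
    for a in range(len(cuts) - 1):
        for b in range(a + 1, min(a + 2 + max_missed, len(cuts))):
            yield protein[cuts[a]:cuts[b]]


def modified_forms(peptide):
    """Yield every combination of unmodified and oxidised methionines."""
    options = [("M", "M(ox)") if aa == "M" else (aa,) for aa in peptide]
    return product(*options)


def form_mass(form):
    """Mass of one modified form: residues, plus oxidation where marked, plus water."""
    return sum(RESIDUE_MASS[r[0]] + (OXIDATION if r.endswith("(ox)") else 0.0)
               for r in form) + WATER


def build_peptide_index(proteins, decimals=4):
    """Map rounded mass -> set of (possibly modified) peptide sequences."""
    index = defaultdict(set)
    for protein in proteins:
        for peptide in digest(protein):
            for form in modified_forms(peptide):
                index[round(form_mass(form), decimals)].add("".join(form))
    return index


index = build_peptide_index(["LTEAKPVDK", "DVPKAETLK"])
for mass in sorted(index):
    print(mass, sorted(index[mass]))
```

With these two toy proteins, LTEAKPVDK and DVPKAETLK end up under the same key, which is the situation the spectrum-to-spectrum peptide scoring is meant to resolve.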
Step 200: Subject the glycopeptide sample to MS/MS fragmentation to obtain measured tandem mass spectra containing both sugar chain fragment information and peptide fragment information. In one embodiment, higher-energy collisional dissociation (HCD) with a combination of multiple collision energies is used as the fragmentation mode. While studying glycopeptide fragmentation behaviour, the inventors found that HCD with combined energies yields sugar chain fragments and peptide fragments in a single tandem spectrum, which provides the basis for identifying the sugar chain and the peptide simultaneously.
Step 300: Screen and export the precursor ions from the MS1 spectra. In this embodiment, this is done with the pParse system; for its technical details see [Yuan, Z.F., et al. pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 12, 226-235 (2012)]. Note that the invention is not limited to this MS1 precursor ion screening method; other embodiments may use other screening methods.
Step 400: From the spectra obtained by MS/MS fragmentation, select a batch of tandem mass spectra as the spectra to be identified, and take one tandem mass spectrum at a time as the current spectrum to be identified. As is well known, a tandem mass spectrum comprises an associated MS1 spectrum and MS/MS spectrum. In this step, the MS/MS spectrum of the current tandem mass spectrum is preprocessed, for example by removing isotope peaks. In one embodiment, removing isotope peaks means retaining the monoisotopic peaks and removing the other isotope peaks. The purpose is to make identification more accurate, because the other isotope peaks may be matched by incorrect results, which increases the probability of a false identification.
Step 500: For the preprocessed MS/MS spectrum, perform an open search of the sugar chain database; for each sugar chain structure, match it against the MS/MS spectrum by mass to obtain the number of matched peaks, and from these counts derive the candidate sugar chain structures. In this embodiment, this mass-based matching of the MS/MS spectrum that yields the number of matched peaks is also called coarse scoring.
In one embodiment, step 500 includes the following substeps:
Step 510: Make a preliminary assessment of the current MS/MS spectrum from the information in it, and screen out spectra that are clearly not glycopeptide spectra or are of poor quality. If the current spectrum is screened out, return to step 400 and continue with the next tandem mass spectrum.
In one embodiment, this screening proceeds as follows: if peaks at 138.06 and 204.09 (these values, like 274 and 290 below, are mass-to-charge ratios) are both present in the spectrum and the intensity of each relative to the highest peak of the spectrum exceeds 20%, the spectrum is judged to be a glycopeptide spectrum; otherwise it is judged to be a non-glycopeptide spectrum and is screened out.
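The screening rule of step 510 amounts to a two-peak test. In the sketch below the spectrum is assumed to be a non-empty list of `(m/z, intensity)` pairs, and the m/z matching tolerance of 0.02 is an assumed parameter that the text does not specify.

```python
def is_glycopeptide_spectrum(peaks, tol=0.02, min_rel=0.20):
    """Keep a spectrum only if 138.06 and 204.09 both exceed 20% of the highest peak."""
    base = max(intensity for _, intensity in peaks)

    def strong(mz):
        return any(abs(p_mz - mz) <= tol and intensity > min_rel * base
                   for p_mz, intensity in peaks)

    return strong(138.06) and strong(204.09)


print(is_glycopeptide_spectrum([(138.06, 40.0), (204.09, 100.0), (366.14, 60.0)]))
print(is_glycopeptide_spectrum([(138.06, 10.0), (204.09, 100.0)]))
```

In the second call the 138.06 peak is only 10% of the highest peak, so the spectrum would be screened out and the flow would return to step 400.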
Step 520: Traverse the sugar chain structure database and perform steps 530 to 533 for each sugar chain structure.
Step 530: For the current sugar chain structure, infer from the precursor ion mass of the current spectrum the masses of all glycopeptide Y ions that fragmentation could produce. Glycopeptide Y-ion mass = peptide mass + sugar chain reducing-end ion mass. The peptide mass is obtained by subtracting the mass of the current sugar chain structure from the precursor ion mass of the spectrum. The sugar chain reducing-end ion masses are calculated as follows: all possible fragmentation cases of the current sugar chain structure are analysed to obtain the reducing-end ion remaining after fragmentation in each case, and the reducing-end ion masses are then calculated from the structures of these ions.
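The arithmetic of step 530 is short enough to show in full. The sketch assumes that the precursor mass has already been converted to a neutral mass and that the reducing-end ion masses for the current sugar chain are available (for instance from the sugar chain index table); the function name and the example numbers are illustrative.

```python
def glycopeptide_y_masses(precursor_mass, glycan_mass, reducing_end_masses):
    """Peptide mass = precursor - sugar chain; each Y ion = peptide + reducing-end ion."""
    peptide_mass = precursor_mass - glycan_mass
    return sorted(peptide_mass + m for m in reducing_end_masses)


# Example: a 999.5600 Da peptide carrying an 892.3172 Da sugar chain.
y_masses = glycopeptide_y_masses(1891.8772, 892.3172, [203.0794, 406.1587])
print([round(m, 4) for m in y_masses])
```

Because the peptide mass is inferred purely by subtraction, the Y-ion masses can be predicted before any peptide sequence is known, which is what makes the sugar chain search a cheap first pass.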
Step 531: Discard clearly inconsistent sugar chain structures by comparing the information in the current spectrum with whether the current sugar chain structure contains NeuAc or NeuGc. If the intensity of the 274 peak in the current spectrum exceeds 10% of the highest peak and the current sugar chain structure does not contain NeuAc, the sugar chain structure is discarded; if the spectrum contains no 274 peak and the current sugar chain structure contains NeuAc, the sugar chain structure is discarded; if the intensity of the 290 peak in the spectrum exceeds 10% of the highest peak and the current sugar chain structure does not contain NeuGc, the sugar chain structure is discarded; if the spectrum contains no 290 peak and the current sugar chain structure contains NeuGc, the sugar chain structure is discarded.
Step 532: For the glycopeptide Y-ion masses obtained in step 530 in each case, match these masses against the current MS/MS spectrum and count the matched peaks. This count is the coarse score of the match between the glycopeptide Y ions in that case and the current MS/MS spectrum.
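A minimal version of the coarse score in step 532 is a tolerance-based count. This sketch counts the theoretical Y-ion m/z values that have at least one observed peak within an assumed tolerance of 0.02; the text does not specify the tolerance, charge-state handling, or whether peak intensity plays any role, so those choices are assumptions.

```python
from bisect import bisect_left


def count_matched_peaks(theoretical_mz, observed_mz, tol=0.02):
    """Number of theoretical values with an observed peak within +/- tol."""
    observed = sorted(observed_mz)
    matched = 0
    for mz in theoretical_mz:
        i = bisect_left(observed, mz - tol)  # first observed peak >= mz - tol
        if i < len(observed) and observed[i] <= mz + tol:
            matched += 1
    return matched


print(count_matched_peaks([1202.64, 1405.72, 1567.77],
                          [1202.645, 1300.0, 1567.76]))
```

Sorting once and using binary search keeps each lookup logarithmic in the number of peaks, which matters when every spectrum is scored against every sugar chain in the database.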
Step 533: If the glycopeptide to be identified is known to be an N-glycopeptide, discard clearly inconsistent sugar chain structures according to the number of pentasaccharide core ions of the current sugar chain structure. In this embodiment, the current sugar chain structure is discarded if fewer than 2 pentasaccharide core ions are matched.
After step 533, select the next sugar chain structure and repeat steps 530 to 533 until all sugar chain structures in the sugar chain structure database have been traversed.
Step 540: Take the K sugar chains with the highest numbers of matched peaks as the candidate sugar chain structures of the current glycopeptide spectrum. K is a configurable parameter; in this embodiment, K is chosen to be greater than or equal to 50.
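Selecting the top K candidates in step 540 does not require a full sort. The sketch below assumes the coarse scores are held in a dictionary keyed by a sugar chain identifier (the identifiers shown are made up) and uses the standard-library `heapq.nlargest`.

```python
import heapq


def top_k_glycans(coarse_scores, k=50):
    """Return the identifiers of the k sugar chains with the highest coarse scores."""
    return heapq.nlargest(k, coarse_scores, key=coarse_scores.get)


scores = {"G1": 3, "G2": 7, "G3": 5}
print(top_k_glycans(scores, k=2))
```

If fewer than K sugar chains survive the earlier filters, all of them are returned.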
Step 600: For the current MS/MS spectrum, traverse all candidate sugar chain structures (the K sugar chains obtained in step 540) and, for each candidate, further perform peptide matching against the spectrum to obtain the fine score of the complete glycopeptide. Fine scoring is generally based on spectrum-to-spectrum matching between the measured spectrum and a theoretical spectrum, and measures more accurately how well each candidate glycopeptide structure matches the current MS/MS spectrum.
In this embodiment, the glycopeptide fine score is derived by combining the sugar chain structure fine score and the peptide fine score. The process includes the following substeps:
Step 610: Traverse all candidate sugar chain structures and perform step 611 for each candidate.
Step 611: Infer the peptide mass from the current sugar chain structure and the precursor ion mass of the current spectrum, and search the peptide index table for peptides matching this mass.
Step 612: Theoretically fragment each peptide to obtain the ions that satisfy the fragmentation conditions. Under HCD fragmentation, for example, these ions include the b/y ions of the peptide, the b/y + 83.03711 Da ions containing the glycosylation site, and the b/y + HexNAc ions.
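The ion types listed in step 612 can be generated as follows. The sketch makes several assumptions that go beyond the text: only singly charged ions are produced, the peptide is passed in as a list of residue masses, `site` is the 0-based position of the glycosylated residue, and both the +83.03711 Da and the +HexNAc variants are generated only for fragments that contain that site.

```python
PROTON = 1.007276    # proton mass (Da)
WATER = 18.010565    # H2O mass (Da)
HEXNAC = 203.07937   # HexNAc residue mass (Da)
DELTA_83 = 83.03711  # mass shift quoted in step 612 (Da)


def peptide_fragment_mz(residue_masses, site):
    """Singly charged b/y ions, plus +83 and +HexNAc variants for site-containing ions."""
    n = len(residue_masses)
    ions = {}
    for i in range(1, n):
        b = sum(residue_masses[:i]) + PROTON           # b_i covers residues 0..i-1
        y = sum(residue_masses[i:]) + WATER + PROTON   # y_(n-i) covers residues i..n-1
        ions[f"b{i}"] = b
        ions[f"y{n - i}"] = y
        # Exactly one of the two complementary fragments contains the site.
        label, mz = (f"b{i}", b) if site < i else (f"y{n - i}", y)
        ions[f"{label}+83"] = mz + DELTA_83
        ions[f"{label}+HexNAc"] = mz + HEXNAC
    return ions


# Toy peptide A-N-K with the glycosylation site on N (position 1).
for name, mz in peptide_fragment_mz([71.03711, 114.04293, 128.09496], site=1).items():
    print(name, round(mz, 4))
```

The resulting dictionary is the theoretical peptide spectrum that the fine-scoring step compares with the measured spectrum.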
Step 613: Match the theoretical spectrum of the theoretical fragment ions of each peptide against the measured spectrum to obtain the fine score of the corresponding peptide. In this step, the spectrum-to-spectrum match between the theoretical and measured spectra can be scored with the KSDP algorithm; see [Fu, Y., et al. Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics 20, 1948-1954 (2004)].
Step 614: Match the theoretical spectrum of the theoretical Y ions of the glycopeptide against the measured spectrum to obtain the fine score of the sugar chain structure. In this step, the spectrum-to-spectrum match between the theoretical and measured spectra can likewise be scored with the KSDP algorithm; see [Fu, Y., et al. Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics 20, 1948-1954 (2004)].
Step 615: Take the weighted average of the sugar chain structure fine score and the peptide fine score as the fine score of the glycopeptide formed by the corresponding sugar chain structure and peptide. The weights can be set according to the relative importance of the sugar chain structure and the peptide. In a preferred embodiment, the weight of the peptide score is 0.65 and the weight of the sugar chain score is 0.35.
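The combination in step 615 is a plain weighted average. The sketch below uses the weights of the preferred embodiment as defaults and divides by their sum so that other weight choices remain a proper average; the function name and the example scores are illustrative, and the two input scores are assumed to be on comparable scales.

```python
def glycopeptide_fine_score(peptide_score, glycan_score,
                            peptide_weight=0.65, glycan_weight=0.35):
    """Weighted average of the peptide and sugar chain fine scores."""
    total = peptide_weight + glycan_weight
    return (peptide_weight * peptide_score + glycan_weight * glycan_score) / total


print(glycopeptide_fine_score(10.0, 20.0))
```

Giving the peptide the larger weight means that, of two candidates with the same sugar chain score, the one with the better-supported peptide ranks first.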
Once the fine score of each glycopeptide structure has been obtained, the final identification result of the current spectrum can be determined.
This embodiment significantly improves the reliability of large-scale intact glycopeptide identification, and maintains low computational complexity while improving identification accuracy.
Further, according to a preferred embodiment of the invention, the false discovery rate of the above intact glycopeptide identification method is also studied, and a scheme for controlling the false discovery rate of the identification results is provided.
In this embodiment, the process of building the sugar chain index table further includes:
Step 113: From the sugar chains in the sugar chain library (i.e. the sugar chain structure database), construct a decoy sugar chain library, analyze the reducing-end ion structures of the decoy sugar chains, and calculate the corresponding reducing-end ion masses.
Step 114: Add the index entries for the reducing-end ions of the decoy sugar chains to the sugar chain index table that step 112 generated from the original sugar chain library.
The process of building the peptide index table further includes:
Step 124: Based on the protein sequence database and all possible enzymatic cleavage cases, derive a peptide list containing all possible peptides, construct a decoy peptide list from that peptide list, and calculate the mass of each peptide in it.
Step 125: Merge the decoy peptide list into the peptide index table obtained in step 123.
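Steps 124 and 125 can be sketched as follows. The text does not say how decoy peptides are constructed; the sketch assumes the common convention of reversing the sequence while keeping the C-terminal residue in place. The names `make_decoy` and `merge_decoys` and the `(mass, sequence, is_decoy)` layout are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[float, str, bool]  # (mass, sequence, is_decoy)


def make_decoy(sequence: str) -> str:
    """Reverse all residues except the C-terminal one (assumed decoy rule)."""
    return sequence[:-1][::-1] + sequence[-1:]


def merge_decoys(targets: List[Tuple[str, float]]) -> List[Entry]:
    """Return a mass-sorted index holding both target and decoy peptides."""
    index: List[Entry] = []
    for sequence, mass in targets:
        index.append((mass, sequence, False))
        # Reversal keeps the residue composition, so the decoy mass is unchanged.
        index.append((mass, make_decoy(sequence), True))
    index.sort(key=lambda entry: entry[0])  # stable sort: target stays before decoy
    return index
```

Keeping the `is_decoy` flag next to each entry lets the later false discovery rate steps recognise which top-ranked results came from the decoy list.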
The intact glycopeptide identification method of this embodiment further includes:
Step 700: For each spectrum, take the top-ranked result by glycopeptide fine score, estimate the false discovery rate (FDR) of the intact glycopeptides, and output the final identification result file annotated with the FDR.
Step 700 includes the following substeps:
Step 710: For each spectrum, mark the top-ranked glycopeptide as the identification result of that spectrum.
Step 720: Among all top-ranked results, those whose sugar chain comes from the decoy library are regarded as incorrect sugar chain identifications, and the sugar chain false discovery rate is estimated accordingly as
$$\mathrm{FDR}_G(x) = p(G=\mathrm{False} \mid I_{GP} \ge x)$$
where $\mathrm{FDR}_G(x)$ denotes the false discovery rate of the set of sugar chain identifications with score greater than or equal to $x$, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, $G=\mathrm{False}$ denotes the event that the sugar chain identification is wrong, and $p$ denotes the probability function.
In one example, step 720 includes substeps 721~723:
Step 721: Based on the sugar chain matches from the decoy library, use a finite Gaussian mixture model to estimate the distribution function of incorrect sugar chain identifications, $f_{G=\mathrm{False}}(x)$,
where $f_{G=\mathrm{False}}(x)$ denotes the score distribution function when the sugar chain identification is wrong.
Step 722: Given the distribution function of incorrect sugar chain identifications $f_{G=\mathrm{False}}(x)$, use a finite Gaussian mixture model to estimate the distribution function of the sugar chain matches from the non-decoy library, $f_G(x) = \pi_{G=\mathrm{False}} f_{G=\mathrm{False}}(x) + \pi_{G=\mathrm{True}} f_{G=\mathrm{True}}(x)$,
where $f_G(x)$ denotes the mixture distribution function of the incorrect and correct sugar chain match scores, $f_{G=\mathrm{True}}(x)$ denotes the distribution function of correct sugar chain match scores, and $\pi_{G=\mathrm{False}}$ and $\pi_{G=\mathrm{True}}$ denote the proportions of incorrect and correct matches, respectively, among all match results.
Step 723: Calculate the sugar chain false discovery rate $\mathrm{FDR}_G(x)$, where $x_i$ denotes the score of the $i$-th sugar chain match, $\#\{x_i \ge x\}$ denotes the number of sugar chain matches with score greater than or equal to $x$, and $N$ is the total number of sugar chain matches.
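Steps 721 to 723 can be sketched in plain Python under several stated assumptions. The sketch fits a single Gaussian to the decoy scores (the one-component case of a finite mixture) and models correct matches with one further Gaussian. It also assumes that step 723 takes the form FDR_G(x) = π_{G=False} · N · P_{G=False}(score ≥ x) / #{x_i ≥ x}, which uses only the symbols defined above but is not spelled out in the text. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_sf(x: float, mu: float, sigma: float) -> float:
    """Upper-tail probability P(X >= x) of a normal distribution."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def fit_normal(values: List[float]) -> Tuple[float, float]:
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, max(math.sqrt(var), 1e-6)


def fit_false_fraction(target: List[float], mu_f: float, sd_f: float,
                       iters: int = 200) -> float:
    """EM estimate of pi_false in f_G = pi_false*f_false + pi_true*f_true.

    The "false" component is held fixed at the decoy fit; only the mixing
    weight and the "true" component are updated.
    """
    pi_f = 0.5
    mu_t, sd_t = max(target), fit_normal(target)[1]  # crude start: correct matches score high
    for _ in range(iters):
        # E-step: posterior probability that each match is wrong
        resp = []
        for x in target:
            a = pi_f * normal_pdf(x, mu_f, sd_f)
            b = (1.0 - pi_f) * normal_pdf(x, mu_t, sd_t)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: update the mixing weight and the "true" component
        w_true = [1.0 - r for r in resp]
        total_true = sum(w_true)
        pi_f = sum(resp) / len(target)
        if total_true > 0:
            mu_t = sum(w * x for w, x in zip(w_true, target)) / total_true
            var_t = sum(w * (x - mu_t) ** 2 for w, x in zip(w_true, target)) / total_true
            sd_t = max(math.sqrt(var_t), 1e-6)
    return pi_f


def glycan_fdr(x: float, target: List[float], decoy: List[float]) -> float:
    mu_f, sd_f = fit_normal(decoy)                 # step 721 (single component here)
    pi_f = fit_false_fraction(target, mu_f, sd_f)  # step 722
    n_above = sum(1 for s in target if s >= x)
    if n_above == 0:
        return 0.0
    expected_false = pi_f * len(target) * normal_sf(x, mu_f, sd_f)  # step 723 (assumed form)
    return min(1.0, expected_false / n_above)
```

A production implementation would more likely use several mixture components and a tested statistics library; the sketch only shows how the decoy fit constrains the mixture fitted to the non-decoy scores.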
Step 730: Among all top-ranked results, those whose peptide comes from the decoy library are regarded as incorrect peptide identifications. From the number of incorrect peptide identifications, calculate
$$\mathrm{FDR}_P(x) = p(P=\mathrm{False} \mid I_{GP} \ge x)$$
where $\mathrm{FDR}_P(x)$ denotes the false discovery rate of the set of peptide identifications with score greater than or equal to $x$, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, and $P=\mathrm{False}$ denotes the event that the peptide identification is wrong.
Step 740: Among all top-ranked results, those whose sugar chain and peptide both come from the decoy library are regarded as identifications in which the sugar chain and the peptide are both wrong. Following steps 721 to 723 above, calculate
$$\mathrm{FDR}_{G \cap P}(x) = p(G=\mathrm{False} \cap P=\mathrm{False} \mid I_{GP} \ge x)$$
where $\mathrm{FDR}_{G \cap P}(x)$ denotes the false discovery rate, within the set of glycopeptide identifications with score greater than or equal to $x$, of identifications whose sugar chain and peptide are both wrong, and $G=\mathrm{False} \cap P=\mathrm{False}$ denotes the event that the sugar chain and the peptide are both identified incorrectly.
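Step 750: Calculate the false discovery rate of the intact glycopeptides, $\mathrm{FDR}_{GP}(x)$, i.e. the sugar chain false discovery rate plus the peptide false discovery rate minus the rate at which both are wrong: $\mathrm{FDR}_{GP}(x) = \mathrm{FDR}_G(x) + \mathrm{FDR}_P(x) - \mathrm{FDR}_{G \cap P}(x)$.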
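The way steps 720 to 750 combine can be illustrated with a simplified Python sketch. It estimates each of the three rates as the plain fraction of decoy hits among results scoring at least x. This is a simplification: the text estimates the sugar chain rates with mixture models. The `Result` layout and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

# One top-ranked result per spectrum:
# (fine score, sugar chain is decoy, peptide is decoy)
Result = Tuple[float, bool, bool]


def decoy_fraction(results: List[Result], x: float,
                   pick: Callable[[Result], bool]) -> float:
    """Fraction of results scoring >= x for which pick(result) is True."""
    kept = [r for r in results if r[0] >= x]
    if not kept:
        return 0.0
    return sum(1 for r in kept if pick(r)) / len(kept)


def glycopeptide_fdr(results: List[Result], x: float) -> float:
    fdr_g = decoy_fraction(results, x, lambda r: r[1])              # sugar chain wrong
    fdr_p = decoy_fraction(results, x, lambda r: r[2])              # peptide wrong
    fdr_both = decoy_fraction(results, x, lambda r: r[1] and r[2])  # both wrong
    # Inclusion-exclusion: P(G or P wrong) = P(G wrong) + P(P wrong) - P(both wrong)
    return fdr_g + fdr_p - fdr_both
```

Subtracting the "both wrong" term prevents results with a decoy sugar chain and a decoy peptide from being counted twice.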
To verify the practical effect of the present invention, the inventors used computer simulation to generate spectra of a number of glycopeptides as a labeled data set and tested the above error rate estimation algorithm on it. The results show that the error rate estimated by the algorithm is 1%, while the actual error rate obtained directly from the labeled data set is 1.29%, which meets expectations. The algorithm therefore estimates the error rate of glycopeptide identification results relatively accurately, which in turn allows a better balance between the identification rate and the accuracy of glycopeptide identification.
Finally, it should be noted that the above embodiments serve only to describe the technical solution of the present invention and not to limit it. The present invention can be extended in application to other modifications, changes, applications, and embodiments, and all such modifications, changes, applications, and embodiments are therefore considered to fall within the spirit and teaching of the present invention.
Claims (10)
1. A method for identifying intact glycopeptides, comprising the following steps:
1) obtaining measured tandem mass spectra from MS2 fragmentation of glycopeptides, the spectra containing both sugar chain fragment information and peptide fragment information;
2) for any measured tandem mass spectrum to be identified, traversing the sugar chain structure database and, for each sugar chain structure therein, performing steps 21)~22) until all sugar chain structures in the sugar chain structure database have been traversed;
21) for the current sugar chain structure, inferring, from the precursor ion mass of the current tandem mass spectrum, the masses of all glycopeptide Y ions that may be obtained in the fragmentation experiment;
22) from the glycopeptide Y-ion masses obtained in step 21) for each case, counting the spectral peaks of the current MS2 spectrum that they match, and taking this number of matched peaks as the coarse score of the match between the glycopeptide Y ions of the corresponding case and the current MS2 spectrum;
3) taking the sugar chain structures whose coarse scores rank in the top K as the candidate sugar chain structures of the current tandem mass spectrum, where K is a preset number of candidate sugar chain structures;
4) for the current tandem mass spectrum, traversing all candidate sugar chain structures and, for each candidate sugar chain structure, scoring the spectrum-spectrum match between the measured spectrum and the theoretical spectrum of the peptide and scoring the spectrum-spectrum match between the measured spectrum and the theoretical spectrum of the sugar chain structure, thereby obtaining the glycopeptide structure identification result.
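The coarse scoring of steps 22) and 3) can be sketched as below. The sketch assumes that theoretical Y-ion values and measured peaks are expressed on the same scale (for example, deconvoluted singly charged m/z) and uses a fixed absolute tolerance. Both are assumptions, as are the function names. It counts the theoretical ions that find at least one measured peak within the tolerance.

```python
import bisect
import heapq
from typing import Dict, List


def count_matched_peaks(theoretical: List[float], peaks: List[float],
                        tol: float = 0.02) -> int:
    """Coarse score: number of theoretical values with a measured peak within +/- tol."""
    ordered = sorted(peaks)
    matched = 0
    for mass in theoretical:
        i = bisect.bisect_left(ordered, mass - tol)  # first peak >= mass - tol
        if i < len(ordered) and ordered[i] <= mass + tol:
            matched += 1
    return matched


def top_k_candidates(y_ions_by_glycan: Dict[str, List[float]],
                     peaks: List[float], k: int) -> List[str]:
    """Return the ids of the K sugar chain structures with the highest coarse score."""
    scores = {gid: count_matched_peaks(ions, peaks)
              for gid, ions in y_ions_by_glycan.items()}
    return heapq.nlargest(k, scores, key=scores.get)
```

Because the coarse score is only a peak count, it is cheap enough to evaluate for every structure in the database before the more expensive fine scoring runs on the K survivors.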
2. The method for identifying intact glycopeptides according to claim 1, characterized in that, in step 21), the method of inferring the masses of all glycopeptide Y ions that may be obtained in the fragmentation experiment comprises: calculating glycopeptide Y-ion mass = peptide mass + sugar chain reducing-end ion mass, wherein the peptide mass is obtained by subtracting the mass of the current sugar chain structure from the precursor ion mass of the current tandem mass spectrum, and the sugar chain reducing-end ion mass is calculated as follows: analyzing all possible fragmentation cases of the current sugar chain structure, obtaining the structure of the sugar chain reducing-end ion after fragmentation in each possible case, and then deriving the sugar chain reducing-end ion masses from the structures of these sugar chain reducing-end ions.
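The mass relation in claim 2 amounts to two lines of arithmetic, sketched here in Python. The sketch treats all values as neutral monoisotopic masses, which is an assumption, and the function names are hypothetical.

```python
from typing import List


def peptide_mass(precursor_mass: float, glycan_mass: float) -> float:
    """Peptide mass = precursor ion mass minus the mass of the current sugar chain."""
    return precursor_mass - glycan_mass


def glycopeptide_y_ion_masses(precursor_mass: float, glycan_mass: float,
                              reducing_end_masses: List[float]) -> List[float]:
    """Glycopeptide Y-ion mass = peptide mass + sugar chain reducing-end ion mass."""
    pep = peptide_mass(precursor_mass, glycan_mass)
    return [pep + m for m in reducing_end_masses]
```

For example, with a precursor mass of 3000.0 and a sugar chain of mass 1216.4228 (five hexose and two N-acetylhexosamine residues), the inferred peptide mass is 1783.5772, and a reducing-end fragment of one N-acetylhexosamine (203.0794) gives a Y ion at 1986.6566.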
3. The method for identifying intact glycopeptides according to claim 2, characterized in that, in step 21), the sugar chain reducing-end ion mass is obtained by searching a sugar chain index table, the sugar chain index table using mass as the index key and recording the sugar chain reducing-end ion structures corresponding to each mass.
4. The method for identifying intact glycopeptides according to claim 3, characterized in that, in step 21), the sugar chain index table is built in advance by the following substeps:
211) importing the sugar chain structure database and traversing each sugar chain structure therein;
212) for the current sugar chain, determining from the structure of this sugar chain all of its possible cleavage positions, and obtaining the structure of the reducing-end ion corresponding to each cleavage position;
213) calculating the mass of each possible reducing-end ion, thereby obtaining the sugar chain index table.
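Substeps 211) to 213) can be sketched by modelling a sugar chain as a tree rooted at the reducing end and enumerating every connected sub-tree that keeps the root. The tree encoding, the function names, and the rounding of masses to four decimals for use as index keys are assumptions of this sketch. The monosaccharide residue masses are standard monoisotopic values.

```python
from itertools import product
from typing import Dict, List, Tuple

# Monoisotopic residue masses (monosaccharide minus water), in Da.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579,
                "NeuAc": 291.0954, "NeuGc": 307.0903}

Glycan = Tuple[str, tuple]  # (monosaccharide, children); the root is the reducing end


def reducing_end_fragments(node: Glycan) -> List[Glycan]:
    """All connected sub-trees that keep the root, i.e. candidate reducing-end ions."""
    name, children = node
    # For every child branch: either cleave it off (None) or keep one of its fragments.
    options = [[None] + reducing_end_fragments(child) for child in children]
    fragments = []
    for choice in product(*options):
        kept = tuple(c for c in choice if c is not None)
        fragments.append((name, kept))
    return fragments


def fragment_mass(node: Glycan) -> float:
    name, children = node
    return RESIDUE_MASS[name] + sum(fragment_mass(c) for c in children)


def build_glycan_index(glycans: Dict[str, Glycan]) -> Dict[float, List[Tuple[str, Glycan]]]:
    """Map rounded reducing-end ion mass -> list of (glycan id, fragment structure)."""
    index: Dict[float, List[Tuple[str, Glycan]]] = {}
    for gid, root in glycans.items():
        for frag in reducing_end_fragments(root):
            index.setdefault(round(fragment_mass(frag), 4), []).append((gid, frag))
    return index
```

The enumeration does not include the empty fragment (the bare peptide with no sugar attached); a caller that needs that ion would add a zero-mass entry separately.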
5. The method for identifying intact glycopeptides according to claim 3, characterized in that step 21) further comprises: directly discarding sugar chain structures that clearly do not match the current tandem mass spectrum: if, in the MS2 spectrum of the current tandem mass spectrum, the intensity of the peak at 274 exceeds 10% of the highest peak and the current sugar chain structure does not contain NeuAc, discarding this sugar chain structure; if the MS2 spectrum of the current tandem mass spectrum does not contain the peak at 274 and the current sugar chain structure contains NeuAc, discarding this sugar chain structure; if, in the MS2 spectrum of the current tandem mass spectrum, the intensity of the peak at 290 exceeds 10% of the highest peak and the current sugar chain structure does not contain NeuGc, discarding this sugar chain structure; and if the MS2 spectrum of the current tandem mass spectrum does not contain the peak at 290 and the current sugar chain structure contains NeuGc, discarding this sugar chain structure.
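The four discard rules of claim 5 can be sketched as a single filter. The marker values 274 and 290 and the 10% threshold are taken from the claim; the matching window of ±0.5 around each marker, the peak and composition data layouts, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def marker_fraction(peaks: List[Peak], marker_mz: float, tol: float = 0.5) -> float:
    """Intensity of the strongest peak near marker_mz, relative to the highest peak."""
    base = max((inten for _, inten in peaks), default=0.0)
    if base <= 0:
        return 0.0
    near = [inten for mz, inten in peaks if abs(mz - marker_mz) <= tol]
    return max(near, default=0.0) / base


def keep_glycan(peaks: List[Peak], composition: Dict[str, int]) -> bool:
    """Return False when the marker peaks contradict the sugar chain composition."""
    for marker_mz, sugar in ((274.0, "NeuAc"), (290.0, "NeuGc")):
        frac = marker_fraction(peaks, marker_mz)
        has_sugar = composition.get(sugar, 0) > 0
        if frac > 0.10 and not has_sugar:
            return False  # strong marker peak, but the sugar is absent
        if frac == 0.0 and has_sugar:
            return False  # marker peak missing, but the sugar is present
    return True
```

A marker peak that is present but below 10% of the highest peak leaves the decision open, so such structures are kept for coarse scoring.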
6. The method for identifying intact glycopeptides according to claim 3, characterized in that step 21) further comprises: if the glycopeptide sample to be identified is known to be N-glycopeptide, discarding clearly mismatched sugar chain structures according to the number of pentasaccharide core ions in the current sugar chain structure.
7. The method for identifying intact glycopeptides according to claim 4, characterized in that step 4) comprises the following substeps:
41) traversing all candidate sugar chain structures and performing step 42) for each candidate sugar chain structure;
42) inferring the peptide mass from the current sugar chain structure and the precursor ion mass of the current tandem mass spectrum, and then searching the peptide index table for the peptides matching this mass;
43) theoretically fragmenting the matched peptides to obtain the ions that satisfy the fragmentation conditions;
44) performing spectrum-spectrum matching between the theoretical spectrum of the theoretical fragment ions of each matched peptide and the measured spectrum to obtain the fine score of the corresponding peptide;
45) for the current candidate sugar chain structure, performing spectrum-spectrum matching between the theoretical spectrum of the theoretical glycopeptide Y ions and the measured spectrum to obtain the fine score of the sugar chain structure;
46) taking the weighted average of the sugar chain structure fine score and the peptide fine score to obtain the fine score of the glycopeptide formed by the corresponding sugar chain structure and the corresponding peptide;
47) obtaining the glycopeptide structure identification result from the fine score of each glycopeptide structure.
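The index lookup in substep 42) can be sketched with a binary search over a mass-sorted peptide index. The `(mass, sequence)` layout, the absolute tolerance, and the function name are assumptions made for this sketch.

```python
import bisect
from typing import List, Tuple


def find_peptides(index: List[Tuple[float, str]], precursor_mass: float,
                  glycan_mass: float, tol: float = 0.02) -> List[str]:
    """Return the peptides whose mass matches precursor mass minus sugar chain mass.

    index must be sorted by mass; tol is an absolute tolerance in Da.
    """
    target = precursor_mass - glycan_mass
    masses = [m for m, _ in index]
    lo = bisect.bisect_left(masses, target - tol)
    hi = bisect.bisect_right(masses, target + tol)
    return [seq for _, seq in index[lo:hi]]
```

A real implementation would build the list of masses once rather than on every call; it is rebuilt here only to keep the sketch self-contained.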
8. The method for identifying intact glycopeptides according to claim 7, characterized in that, in step 42), the peptide index table is built in advance from a protein sequence database, the peptide index table using mass as the index key and recording the peptide sequences corresponding to each mass.
9. The method for identifying intact glycopeptides according to claim 8, characterized in that the process of building the peptide index table comprises the following substeps:
421) importing the protein sequence database and traversing each protein sequence therein;
422) for the current protein sequence, analyzing all possible theoretical enzymatic cleavage cases of this protein sequence, and obtaining the peptide sequence corresponding to each theoretical cleavage case;
423) for each possible peptide, generating the corresponding modified peptides based on all possible modification forms;
424) calculating the mass of each possible peptide, thereby obtaining the peptide index table, the peptides therein including both unmodified peptides and modified peptides.
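Substeps 421) to 424) can be sketched as follows. The claim does not name an enzyme or a set of modifications. The sketch assumes a trypsin-like rule (cleave after K or R, ignoring the proline exception), at most one missed cleavage, and variable oxidation of methionine as the only modification. All of these choices and all function names are illustrative. The residue masses are standard monoisotopic values.

```python
import re
from itertools import combinations
from typing import List, Tuple

RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
           "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
           "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
           "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
           "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056
OXIDATION = 15.99491  # variable oxidation on M, chosen only as an example


def digest(protein: str, missed: int = 1) -> List[str]:
    """Cleave after K/R and join up to `missed` adjacent pieces (assumed enzyme rule)."""
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_forms(peptide: str) -> List[Tuple[float, str]]:
    """Unmodified peptide plus every combination of oxidised M positions."""
    base = sum(RESIDUE[a] for a in peptide) + WATER
    sites = [i for i, a in enumerate(peptide) if a == "M"]
    forms = []
    for n in range(len(sites) + 1):
        for chosen in combinations(sites, n):
            label = "".join(a + ("(ox)" if i in chosen else "")
                            for i, a in enumerate(peptide))
            forms.append((base + n * OXIDATION, label))
    return forms


def build_peptide_index(proteins: List[str]) -> List[Tuple[float, str]]:
    """Mass-sorted list of (mass, peptide form) over all proteins, without duplicates."""
    entries = {form for prot in proteins
               for pep in digest(prot)
               for form in peptide_forms(pep)}
    return sorted(entries)
```

Sorting the entries by mass makes the table directly usable for the mass-window lookup of step 42).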
10. The method for identifying intact glycopeptides according to claim 7, characterized in that the process of building the sugar chain index table further comprises the following substeps:
214) constructing a decoy sugar chain library from the sugar chains in the sugar chain structure database, analyzing the reducing-end ion structures of the decoy sugar chains, and calculating the corresponding reducing-end ion masses;
215) adding the index entries for the reducing-end ions of the decoy sugar chains to the sugar chain index table that step 33) generated from the original sugar chain library;
the process of building the peptide index table further comprises:
425) based on the protein sequence database and all possible enzymatic cleavage cases, deriving a peptide list containing all possible peptides, constructing a decoy peptide list from that peptide list, and calculating the mass of each peptide in it;
426) merging the decoy peptide list into the peptide index table obtained in step 213);
the intact glycopeptide identification method further comprises the following step:
5) performing steps 2)~4) for each tandem mass spectrum to be identified, taking for each tandem mass spectrum the top-ranked result by glycopeptide fine score, estimating the false discovery rate of the intact glycopeptides, and outputting the final identification results annotated with the false discovery rate, wherein step 5) comprises the following substeps:
51) for each tandem mass spectrum, marking the top-ranked glycopeptide as the identification result of that spectrum;
52) among all top-ranked results, regarding those whose sugar chain comes from the decoy library as incorrect sugar chain identifications, and estimating accordingly
$$\mathrm{FDR}_G(x) = p(G=\mathrm{False} \mid I_{GP} \ge x)$$
where $\mathrm{FDR}_G(x)$ denotes the false discovery rate of the set of sugar chain identifications with score greater than or equal to $x$, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, $G=\mathrm{False}$ denotes the event that the sugar chain identification is wrong, and $p$ denotes the probability function;
53) among all top-ranked results, regarding those whose peptide comes from the decoy library as incorrect peptide identifications, and calculating from the number of incorrect peptide identifications
$$\mathrm{FDR}_P(x) = p(P=\mathrm{False} \mid I_{GP} \ge x)$$
where $\mathrm{FDR}_P(x)$ denotes the false discovery rate of the set of peptide identifications with score greater than or equal to $x$, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, and $P=\mathrm{False}$ denotes the event that the peptide identification is wrong;
54) among all top-ranked results, regarding those whose sugar chain and peptide both come from the decoy library as identifications in which the sugar chain and the peptide are both wrong, and calculating
$$\mathrm{FDR}_{G \cap P}(x) = p(G=\mathrm{False} \cap P=\mathrm{False} \mid I_{GP} \ge x)$$
where $\mathrm{FDR}_{G \cap P}(x)$ denotes the false discovery rate, within the set of glycopeptide identifications with score greater than or equal to $x$, of identifications whose sugar chain and peptide are both wrong, and $G=\mathrm{False} \cap P=\mathrm{False}$ denotes the event that the sugar chain and the peptide are both identified incorrectly;
55) calculating the false discovery rate of the intact glycopeptides, $\mathrm{FDR}_{GP}(x) = \mathrm{FDR}_G(x) + \mathrm{FDR}_P(x) - \mathrm{FDR}_{G \cap P}(x)$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610309699.9A CN106018535B (en) | 2016-05-11 | 2016-05-11 | A kind of method and system of intact glycopeptide identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106018535A (en) | 2016-10-12 |
CN106018535B (en) | 2018-11-09 |
Family ID: 57099828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610309699.9A Active CN106018535B (en) | 2016-05-11 | 2016-05-11 | A kind of method and system of intact glycopeptide identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106018535B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2616888A1 (en) * | 2001-12-08 | 2003-07-03 | Micromass Uk Limited | Method of mass spectrometry |
US20040096982A1 (en) * | 2002-11-19 | 2004-05-20 | International Business Machines Corporation | Methods and apparatus for analysis of mass spectra |
WO2004061407A2 (en) * | 2003-01-03 | 2004-07-22 | Caprion Pharmaceuticals, Inc. | Glycopeptide identification and analysis |
CN102072932A (en) * | 2009-11-19 | 2011-05-25 | 复旦大学 | Method and device for identifying glycopeptide segment |
JP2012220365A (en) * | 2011-04-11 | 2012-11-12 | Shimadzu Corp | Sugar peptide analysis method and analysis apparatus |
Non-Patent Citations (1)
Title |
---|
WEN-FENG ZENG等: "pGlyco: a pipeline for the identification of intact N-glycopeptides by using HCD-and CID-MS/MS and MS3", 《SCIENTIFIC REPORTS》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111220749A (en) * | 2018-11-25 | 2020-06-02 | 中国科学院大连化学物理研究所 | Analysis method of O-linked glycopeptide |
CN109738532A (en) * | 2018-12-31 | 2019-05-10 | 复旦大学 | A method of automatically parsing stable isotope labeling sugar chain quantitative mass spectral data |
CN109738532B (en) * | 2018-12-31 | 2022-07-22 | 复旦大学 | Method for automatically analyzing quantitative mass spectrum data of stable isotope labeled sugar chains |
US11686713B2 (en) | 2019-11-21 | 2023-06-27 | Shimadzu Corporation | Glycopeptide analyzer |
CN112824894A (en) * | 2019-11-21 | 2021-05-21 | 株式会社岛津制作所 | Glycopeptide analysis device |
CN112824894B (en) * | 2019-11-21 | 2024-01-09 | 株式会社岛津制作所 | Glycopeptide analyzer |
CN112326770A (en) * | 2020-11-04 | 2021-02-05 | 西北大学 | Method for identifying N-linked sugar chain type on complete glycopeptide |
CN112326769A (en) * | 2020-11-04 | 2021-02-05 | 西北大学 | Method for identifying N-sugar chain branch structure on complete glycopeptide |
CN112326770B (en) * | 2020-11-04 | 2021-10-26 | 西北大学 | Method for identifying N-linked sugar chain type on complete glycopeptide |
CN113571129A (en) * | 2021-09-24 | 2021-10-29 | 北京理工大学 | Complex cross-linked peptide identification method based on mass spectrum |
CN114166925A (en) * | 2021-10-22 | 2022-03-11 | 西安电子科技大学 | Method and system for identifying Denovo by N-sugar chain structure based on mass spectrum data |
CN114166925B (en) * | 2021-10-22 | 2024-03-26 | 西安电子科技大学 | Denovo method and system for identifying N-sugar chain structure based on mass spectrum data |
CN115662534A (en) * | 2022-12-14 | 2023-01-31 | 药融云数字科技(成都)有限公司 | Chemical structure determination method and system based on map, storage medium and terminal |
Also Published As
Publication number | Publication date |
---|---|
CN106018535B (en) | 2018-11-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||