CN106018535A - Complete glycopeptide identifying method and system - Google Patents

Publication number
CN106018535A
Authority
CN
China
Prior art keywords
sugar chain
peptide fragment
glycopeptide
chain structure
ion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610309699.9A
Other languages
Chinese (zh)
Other versions
CN106018535B (en)
Inventor
曾文锋
刘铭琪
张晓今
吴建强
张扬
孙瑞祥
杨芃原
贺思敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS
Priority to CN201610309699.9A
Publication of CN106018535A
Application granted
Publication of CN106018535B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode

Landscapes

  • Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a method for identifying intact (complete) glycopeptides. For any measured tandem mass spectrum to be identified, the method traverses a sugar chain structure database. For each sugar chain structure, it infers, from the precursor ion mass of the current tandem spectrum, the masses of the glycopeptide Y ions that could be produced in the fragmentation experiment, counts the peaks of the current MS2 spectrum that match those masses, and uses that count as the coarse score of the match between the glycopeptide Y ions and the current MS2 spectrum. The sugar chain structures with the top K coarse scores are taken as candidate sugar chain structures. For the current tandem spectrum, all candidate sugar chain structures are then traversed; for each candidate, the spectrum-to-spectrum match between the measured spectrum and the theoretical spectrum of the peptide is scored, as is the match between the measured spectrum and the theoretical spectrum of the sugar chain structure, and the glycopeptide structure identification result is derived from these scores. The method improves the reliability of large-scale intact glycopeptide identification while keeping computational complexity low.

Description

Method and system for intact glycopeptide identification
Technical field
The present invention relates to the technical field of bioinformatics, and in particular to glycoproteomics and mass spectrometry.
Background
Mass spectrometry is the main means of identifying site-specific protein glycosylation at large scale. In mass spectrometry, intact glycopeptides are usually first identified from their tandem mass spectra, and the glycosylation modifications on the protein are then inferred. At present there are two strategies for large-scale identification of intact glycopeptide spectra: (1) methods, represented by systems such as GRIP and ArMone 2.0, that identify the sugar chain from the intact glycopeptide tandem spectrum and then match the peptide by peptide mass; and (2) methods, represented by systems such as Byonic, that identify the peptide from the intact glycopeptide spectrum and then infer the sugar chain composition from the sugar chain mass. Both methods are briefly described below, followed by an analysis of their shortcomings.
Method (1) mainly uses the sugar chain fragment ion information in the glycopeptide spectrum: the sugar chain is identified by matching against the Y ions of the glycopeptide, and the inferred peptide mass is then matched against the peptide sequences in a protein sequence database to infer the peptide sequence. Method (2) mainly uses the peptide fragment ion information in the glycopeptide spectrum: the peptide is identified by matching against the peptide fragment ions of the glycopeptide, and the inferred sugar chain mass is then matched against the sugar chain structures in a sugar chain database to infer the sugar chain structure.
In both methods, the intact glycopeptide spectrum is used to identify only the sugar chain part or only the peptide part of the glycopeptide, while the other part is inferred directly from mass, so reliability is poor. For example, suppose the sugar chain part has been identified from the intact glycopeptide spectrum and the inferred peptide mass is 999.5633. Many peptides can match that mass within the error tolerance; for instance, the two peptides LTEAKPVDK and DVPKAETLK have exactly the same mass and both match 999.5633, so matching by mass alone cannot distinguish them at all. Likewise, inferring the sugar chain composition from the sugar chain mass alone suffers from the same problem.
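The ambiguity between LTEAKPVDK and DVPKAETLK is easy to check numerically. The sketch below sums standard monoisotopic residue masses, rounded to five decimals, so the total lands close to, but not exactly on, the value quoted above. The function name `peptide_mass` is illustrative and not part of the patent.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "K": 128.09496,
    "L": 113.08406, "P": 97.05276, "T": 101.04768, "V": 99.06841,
}
WATER = 18.010565  # one water accounts for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


m1 = peptide_mass("LTEAKPVDK")
m2 = peptide_mass("DVPKAETLK")
# Both peptides contain the same residues in a different order,
# so their masses agree up to floating-point rounding.
print(round(m1, 4), round(m2, 4))
```

Because the two sequences are permutations of the same residues, no mass tolerance, however tight, can separate them; only fragment ions can.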
Therefore, a solution for large-scale intact glycopeptide identification with higher reliability is urgently needed.
Summary of the invention
The object of the present invention is to provide a solution for large-scale intact glycopeptide identification with higher reliability.
The invention provides a method for intact glycopeptide identification, comprising the following steps:
1) obtaining measured tandem mass spectra of MS2 glycopeptide fragmentation that contain both sugar chain fragment information and peptide fragment information;
2) for any measured tandem mass spectrum to be identified, traversing a sugar chain structure database and, for each sugar chain structure, performing steps 21) to 22) until all sugar chain structures in the database have been traversed;
21) for the current sugar chain structure, inferring, from the precursor ion mass of the current tandem spectrum, the masses of all glycopeptide Y ions that could be obtained in the fragmentation experiment;
22) for each case of glycopeptide Y ion masses obtained in step 21), counting the peaks of the current MS2 spectrum that are matched, and taking that count as the coarse score of the match between the glycopeptide Y ions of that case and the current MS2 spectrum;
3) taking the sugar chain structures whose coarse scores rank in the top K as the candidate sugar chain structures of the current tandem spectrum, where K is a preset number of candidate sugar chain structures;
4) for the current tandem spectrum, traversing all candidate sugar chain structures and, for each candidate, scoring the spectrum-to-spectrum match between the measured spectrum and the theoretical spectrum of the peptide and the spectrum-to-spectrum match between the measured spectrum and the theoretical spectrum of the sugar chain structure, and thereby deriving the glycopeptide structure identification result.
Wherein, in step 21), the method of inferring the masses of the glycopeptide Y ions that could be obtained in the fragmentation experiment includes: calculating glycopeptide Y ion mass = peptide mass + sugar chain reducing-end ion mass. The peptide mass is obtained by subtracting the mass of the current sugar chain structure from the precursor ion mass of the current tandem spectrum. The sugar chain reducing-end ion mass is calculated as follows: all possible fragmentations of the current sugar chain structure are analyzed to obtain the structure of the reducing-end ion left after each possible fragmentation, and the reducing-end ion masses are then derived from those structures.
Wherein, in step 21), the sugar chain reducing-end ion mass is obtained by looking up a sugar chain index table, which uses mass as the index key and records the sugar chain reducing-end ion structures corresponding to each mass.
Wherein, in step 21), the sugar chain index table is built in advance by the following substeps:
211) importing the sugar chain structure database and traversing each sugar chain structure in it;
212) for the current sugar chain, determining from its structure all possible cleavage positions of the sugar chain, and deriving the structure of the reducing-end ion corresponding to each cleavage position;
213) calculating the mass of each possible reducing-end ion, thereby obtaining the sugar chain index table.
Wherein, step 21) further includes directly discarding sugar chain structures that clearly do not match the current tandem spectrum: if, in the MS2 spectrum of the current tandem spectrum, the intensity of the 274 peak exceeds 10% of the base peak and the current sugar chain structure does not contain NeuAc, the sugar chain structure is discarded; if the MS2 spectrum contains no 274 peak and the current sugar chain structure contains NeuAc, the sugar chain structure is discarded; if the intensity of the 290 peak exceeds 10% of the base peak and the current sugar chain structure does not contain NeuGc, the sugar chain structure is discarded; and if the MS2 spectrum contains no 290 peak and the current sugar chain structure contains NeuGc, the sugar chain structure is discarded.
Wherein, step 21) further includes: if the glycopeptide sample to be identified is known to be an N-glycopeptide, discarding clearly mismatched sugar chain structures according to the number of pentasaccharide core ions in the current sugar chain structure.
Wherein, step 4) includes the following substeps:
41) traversing all candidate sugar chain structures and performing step 42) for each candidate sugar chain structure;
42) inferring the peptide mass from the current sugar chain structure and the precursor ion mass of the current tandem spectrum, then searching a peptide index table for the peptides matching that mass;
43) theoretically fragmenting the matched peptides to obtain the ions that satisfy the fragmentation conditions;
44) for each matched peptide, performing spectrum-to-spectrum matching between the theoretical spectrum of its theoretical fragment ions and the measured spectrum, to obtain the fine score of that peptide;
45) for the current candidate sugar chain structure, performing spectrum-to-spectrum matching between the theoretical spectrum of the theoretical glycopeptide Y ions and the measured spectrum, to obtain the fine score of the sugar chain structure;
46) taking the weighted average of the sugar chain structure fine score and the peptide fine score to obtain the fine score of the glycopeptide formed by the corresponding sugar chain structure and peptide;
47) deriving the glycopeptide structure identification result from the fine scores of the glycopeptide structures.
Wherein, in step 42), the peptide index table is built in advance from a protein sequence database; it uses mass as the index key and records the peptide sequences corresponding to each mass.
Wherein, the process of building the peptide index table includes the following substeps:
421) importing the protein sequence database and traversing each protein sequence in it;
422) for the current protein sequence, analyzing all possible theoretical enzymatic cleavages of the sequence and deriving the peptide sequences corresponding to each;
423) for each possible peptide, generating the corresponding modified peptides based on all possible modification forms;
424) calculating the mass of each possible peptide, thereby obtaining the peptide index table, whose peptides include both unmodified and modified peptides.
Wherein, the process of building the sugar chain index table further includes the following substeps:
214) constructing a decoy sugar chain database from the sugar chains in the sugar chain structure database, analyzing the reducing-end ion structures of the decoy sugar chains, and calculating the corresponding reducing-end ion masses;
215) adding the index entries of the reducing-end ions from the decoy sugar chains to the sugar chain index table obtained in step 213), which was generated from the original sugar chain database;
The process of building the peptide index table further includes:
425) deriving, from the protein sequence database and all possible enzymatic cleavages, a peptide list containing all possible peptides, constructing a decoy peptide list from that peptide list, and calculating the mass of each peptide in it;
426) merging the decoy peptide list into the peptide index table obtained in step 424);
The intact glycopeptide identification method further comprises the following step:
5) for each tandem spectrum to be identified, performing steps 2) to 4), taking the result with the top-ranked glycopeptide fine score for each tandem spectrum, estimating the false discovery rate of the intact glycopeptides, and outputting the final identification results with false discovery rates. Step 5) includes the following substeps:
51) taking the top-ranked glycopeptide of each tandem spectrum as the identification result of that spectrum;
52) among all top-ranked results, treating the results whose sugar chain comes from the decoy database as wrong sugar chain identifications, and estimating the sugar chain false discovery rate accordingly (the formula is not reproduced in this text),
where the estimated quantity is the false discovery rate of the set of sugar chain identifications whose score is greater than or equal to x, I_GP denotes the set of scores of all identification results, x denotes a manually set score threshold, G=False denotes the event that the sugar chain identification is wrong, and p denotes the probability function;
53) among all top-ranked results, treating the results whose peptide comes from the decoy database as wrong peptide identifications, and calculating the peptide false discovery rate from the number of wrong peptide identifications (the formula is not reproduced in this text),
where the calculated quantity is the false discovery rate of the set of peptide identifications whose score is greater than or equal to x, I_GP denotes the set of scores of all identification results, x denotes a manually set score threshold, and P=False denotes the event that the peptide identification is wrong;
54) among all top-ranked results, treating the results whose sugar chain and peptide both come from the decoy databases as identifications in which the sugar chain and the peptide are both wrong, and calculating the corresponding false discovery rate (the formula is not reproduced in this text),
where the calculated quantity is the false discovery rate, within the set of glycopeptide identifications whose score is greater than or equal to x, of results in which the sugar chain and the peptide are wrong simultaneously, and G=False ∩ P=False denotes the event that the sugar chain and the peptide are both identified wrongly;
55) calculating the false discovery rate of the intact glycopeptides (the formula is not reproduced in this text).
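The decoy-based estimation of steps 51) to 55) can be sketched as follows. Because the formulas themselves are not reproduced in this text, the sketch rests on an assumed reading: each rate is taken as the fraction of top-ranked results scoring at least x that involve a decoy, and the intact glycopeptide rate combines the three rates by inclusion-exclusion (sugar chain rate plus peptide rate minus the rate of both being wrong). That combination, the estimator itself, and the names `Hit` and `estimate_fdr` are illustrative assumptions, not the patent's formulas.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    score: float         # glycopeptide fine score of the top-ranked result
    glycan_decoy: bool   # sugar chain came from the decoy database
    peptide_decoy: bool  # peptide came from the decoy peptide list


def estimate_fdr(hits, x):
    """Assumed estimator: decoy fractions among hits with score >= x."""
    kept = [h for h in hits if h.score >= x]
    if not kept:
        return {"glycan": 0.0, "peptide": 0.0, "both": 0.0, "glycopeptide": 0.0}
    n = len(kept)
    g = sum(h.glycan_decoy for h in kept) / n
    p = sum(h.peptide_decoy for h in kept) / n
    both = sum(h.glycan_decoy and h.peptide_decoy for h in kept) / n
    # Inclusion-exclusion: a glycopeptide is wrong if either part is wrong.
    return {"glycan": g, "peptide": p, "both": both, "glycopeptide": g + p - both}


hits = [
    Hit(0.9, False, False),
    Hit(0.8, True, False),
    Hit(0.7, False, True),
    Hit(0.6, False, False),
    Hit(0.2, True, True),
]
rates = estimate_fdr(hits, x=0.5)
```

Subtracting the "both wrong" term avoids counting a result twice when its sugar chain and peptide are decoys at the same time.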
Compared with the prior art, the present invention has the following technical effects:
1. The invention can significantly improve the reliability of large-scale intact glycopeptide identification.
2. The invention maintains low computational complexity while improving identification accuracy.
3. The invention can accurately control the false discovery rate of intact glycopeptide identification results, making the identification results more complete.
Brief description of the drawings
Embodiments of the invention are described in detail below with reference to the accompanying drawing, in which:
Fig. 1 shows a flow chart of the intact glycopeptide identification method of one embodiment of the invention.
Detailed description
For ease of understanding, the meanings of some technical terms used herein are given first:
Glycosidic bond: the chemical bond formed when one monosaccharide is linked to another;
Reducing end of a sugar chain: the end containing a free -CHO group; in glycoproteomics, the end of the sugar chain that is attached to the peptide;
Reducing-end ion of a sugar chain: the fragment ion that remains at the reducing end after a glycosidic bond of the sugar chain breaks, also called the Y ion of the sugar chain;
Y ion of a glycopeptide: a reducing-end ion of the sugar chain plus the complete peptide sequence of the glycopeptide;
b/y ions of a glycopeptide: the b/y ions of the peptide part of the glycopeptide; these b/y ions contain no sugar chain structure;
Pentasaccharide core: the fixed structure of five monosaccharides usually present at the reducing end of an N-linked sugar chain, i.e. 2 × HexNAc + 3 × Hex.
The invention is further described below with reference to the drawing and embodiments.
Fig. 1 shows a flow chart of the intact glycopeptide identification method of one embodiment of the invention, which includes the following steps:
Step 100: Build a sugar chain index table from the sugar chain structure database and a peptide index table from the protein sequence database. The sugar chain index table uses mass as the index key and records the sugar chain reducing-end ion structures corresponding to each mass; one mass may correspond to several reducing-end ion structures. The peptide index table likewise uses mass as the index key and records the peptide sequences (i.e. the amino acid sequences making up the peptides) corresponding to each mass. Note that this step is a preliminary step of intact glycopeptide identification; if the sugar chain index table and the peptide index table are already available, it can be omitted from the identification process.
According to one embodiment of the invention, the process of building the sugar chain index table from the sugar chain structure database includes the following substeps:
Step 110: Import the sugar chain structure database and traverse each sugar chain structure in it (hereinafter sometimes simply called a sugar chain).
Step 111: For the current sugar chain, determine from its structure all possible cleavage positions of the sugar chain, and derive the structure of the reducing-end ion corresponding to each cleavage position.
Step 112: Calculate the mass of each possible reducing-end ion, thereby obtaining the sugar chain index table.
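Steps 110 to 112 can be sketched as below, under several assumptions not taken from the patent: a sugar chain is modeled as a nested `(name, children)` tuple rooted at the reducing end, a reducing-end ion is any connected fragment that keeps the root, and the index stores monosaccharide compositions rather than full structures to keep the example short. The monosaccharide masses are standard monoisotopic residue masses; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharides.
MONO_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791,
             "NeuAc": 291.09542, "NeuGc": 307.09033}


def reducing_end_fragments(node):
    """Yield every connected fragment that keeps `node`, as a tuple of names."""
    name, children = node
    # Each branch is either cleaved off entirely (the empty tuple)
    # or contributes one of its own root-containing fragments.
    options = [[()] + list(reducing_end_fragments(child)) for child in children]
    for combo in product(*options):
        yield (name,) + tuple(m for part in combo for m in part)


def build_glycan_index(glycans, decimals=4):
    """Map rounded reducing-end ion mass -> set of monosaccharide compositions."""
    index = defaultdict(set)
    for root in glycans:
        for fragment in reducing_end_fragments(root):
            mass = round(sum(MONO_MASS[m] for m in fragment), decimals)
            index[mass].add(tuple(sorted(fragment)))
    return index


# Pentasaccharide core: HexNAc-HexNAc-Hex with two Hex branches.
core = ("HexNAc", [("HexNAc", [("Hex", [("Hex", []), ("Hex", [])])])])
index = build_glycan_index([core])
```

For the pentasaccharide core this yields five distinct masses, because losing either of the two terminal Hex branches gives the same composition.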
The process of building the peptide index table from the protein sequence database includes the following substeps:
Step 120: Import the protein sequence database and traverse each protein sequence in it.
Step 121: For the current protein sequence, analyze all possible theoretical enzymatic cleavages of the sequence and derive the peptide sequences corresponding to each (here, a peptide sequence is a fragment produced by cleaving the current protein sequence, sometimes simply called a peptide).
Step 122: For each possible peptide, generate the corresponding modified peptides based on all possible modification forms.
Step 123: Calculate the mass of each possible peptide, thereby obtaining the peptide index table, whose peptides include both unmodified and modified peptides.
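A minimal sketch of steps 120 to 123 follows. The patent does not name an enzyme or a mass tolerance, so the trypsin-like rule (cleave after K or R, ignoring the proline exception), the single missed cleavage, the 0.01 Da lookup tolerance, and all function names are assumptions made for illustration. Modified peptides (step 122) are left out for brevity.

```python
import re
from bisect import bisect_left, bisect_right

# Standard monoisotopic amino acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def digest(protein, missed_cleavages=1):
    """Assumed trypsin-like digestion: cleave after every K or R."""
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = set()
    for start in range(len(pieces)):
        last = min(start + missed_cleavages + 1, len(pieces))
        for end in range(start + 1, last + 1):
            peptides.add("".join(pieces[start:end]))
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_peptide_index(proteins, missed_cleavages=1):
    """Sorted list of (mass, peptide) pairs, searchable by mass."""
    peptides = set()
    for protein in proteins:
        peptides |= digest(protein, missed_cleavages)
    return sorted((peptide_mass(p), p) for p in peptides)


def lookup(index, mass, tol=0.01):
    """Return all peptides whose mass lies within `tol` Da of `mass`."""
    masses = [m for m, _ in index]
    lo = bisect_left(masses, mass - tol)
    hi = bisect_right(masses, mass + tol)
    return [p for _, p in index[lo:hi]]


index = build_peptide_index(["RLTEAKPVDKGG", "KDVPKAETLK"])
matches = lookup(index, 999.56)
```

Keeping the index sorted by mass lets each lookup run as two binary searches, which matters when the same table is queried once per candidate sugar chain per spectrum.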
Step 200: Subject the glycopeptide sample to MS2 fragmentation and obtain measured tandem mass spectra of the fragmented glycopeptides that contain both sugar chain fragment information and peptide fragment information. In one embodiment, higher-energy collisional dissociation (HCD) with a combination of multiple energies is used as the fragmentation mode. In studying glycopeptide fragmentation rules, the inventors found that HCD with combined energies yields both sugar chain fragments and peptide fragments in a single tandem spectrum, which provides the basis for identifying the sugar chain and the peptide simultaneously.
Step 300: Screen and export the MS1 precursor ions. In this embodiment, this is done with the pParse system. For the technical details of pParse, see: [Yuan, Z.F., et al. pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 12, 226-235 (2012)]. Note that in the present invention the MS1 precursor ion screening method is not limited to this one; other embodiments may use other MS1 precursor ion screening methods.
Step 400: Select a batch of tandem mass spectra from the spectra obtained by MS2 fragmentation as the spectra to be identified, and take one tandem mass spectrum at a time as the current spectrum to be identified. As is well known, a tandem mass spectrum comprises an associated MS1 spectrum and MS2 spectrum. In this step, the MS2 spectrum of the current tandem spectrum is preprocessed, for example by removing isotope peaks. In one embodiment, removing isotope peaks means keeping the monoisotopic peaks and removing the other isotope peaks. The purpose is to make the identification more accurate, because the other isotope peaks may be matched by incorrect results, which increases the probability of misidentification.
Step 500: For the preprocessed MS2 spectrum, perform an open search of the sugar chain database: for each sugar chain structure, match it against the MS2 spectrum by mass to obtain the number of matched peaks, and from these counts derive the candidate sugar chain structures. In this embodiment, this mass-based matching of the MS2 spectrum to count the matched peaks is also called coarse scoring.
In one embodiment, step 500 includes the following substeps:
Step 510: For the current MS2 spectrum to be identified, make a preliminary assessment from the information in the spectrum and screen out spectra that are clearly not glycopeptide spectra or are of poor quality. If the current spectrum is screened out, return to step 400 and continue with the next tandem mass spectrum.
In one embodiment, this screening proceeds as follows: if peaks at 138.06 and 204.09 are both present in the spectrum (these values, like 274 and 290 below, are mass-to-charge ratios) and the intensity of each exceeds 20% of the base peak of the spectrum, the spectrum is judged to be a glycopeptide spectrum; otherwise it is judged to be a non-glycopeptide spectrum and is screened out.
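The screening rule of step 510 can be sketched as follows. A spectrum is assumed to be a list of `(m/z, intensity)` pairs; the 0.02 matching tolerance is an assumption, since the patent gives only the two m/z values and the 20% threshold. The function names are illustrative.

```python
def has_marker(peaks, mz, min_rel, tol=0.02):
    """True if a peak within `tol` of `mz` exceeds `min_rel` of the base peak."""
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)
    return any(abs(m - mz) <= tol and i > min_rel * base for m, i in peaks)


def is_glycopeptide_spectrum(peaks, tol=0.02):
    """Both diagnostic peaks must be present above 20% of the base peak."""
    return all(has_marker(peaks, mz, 0.20, tol) for mz in (138.06, 204.09))


glyco = [(138.055, 40.0), (204.087, 100.0), (366.14, 25.0)]
plain = [(138.055, 5.0), (175.119, 100.0), (204.087, 60.0)]
print(is_glycopeptide_spectrum(glyco), is_glycopeptide_spectrum(plain))
```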
Step 520: Traverse the sugar chain structure database and perform steps 530 to 533 for each sugar chain structure.
Step 530: For the current sugar chain structure, infer from the precursor ion mass of the current spectrum the masses of all glycopeptide Y ions that could be obtained in the fragmentation experiment: glycopeptide Y ion mass = peptide mass + sugar chain reducing-end ion mass. The peptide mass is obtained by subtracting the mass of the current sugar chain structure from the precursor ion mass of the spectrum. The sugar chain reducing-end ion masses are calculated as follows: all possible fragmentations of the current sugar chain structure are analyzed to obtain the reducing-end ion left after each possible fragmentation, and the reducing-end ion masses are then calculated from the structures of those ions.
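The arithmetic of step 530 can be sketched as below. The sketch assumes the precursor and sugar chain masses are neutral monoisotopic masses, includes a reducing-end mass of 0.0 to represent the bare peptide, and converts to m/z for a chosen charge; those conventions and the function name are assumptions, not details given in the patent.

```python
PROTON = 1.007276  # mass of a proton (Da)


def glycopeptide_y_ions(precursor_mass, glycan_mass, reducing_end_masses, charge=1):
    """m/z of each glycopeptide Y ion: peptide mass + reducing-end ion mass."""
    peptide_mass = precursor_mass - glycan_mass
    neutral = sorted({peptide_mass + m for m in reducing_end_masses})
    return [(m + charge * PROTON) / charge for m in neutral]


# Example: a peptide of mass 999.56003 carrying the pentasaccharide core.
glycan_mass = 2 * 203.07937 + 3 * 162.05282
precursor_mass = 999.56003 + glycan_mass
reducing_end = [0.0, 203.07937, 406.15874, 568.21156, 730.26438, glycan_mass]
y_ions = glycopeptide_y_ions(precursor_mass, glycan_mass, reducing_end)
```

Note that the peptide sequence is never needed here: the Y ion masses follow from the precursor mass and the candidate sugar chain alone, which is what makes coarse scoring cheap.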
Step 531: Discard clearly mismatched sugar chain structures by combining the information in the current spectrum with whether the current sugar chain structure contains NeuAc or NeuGc. If the intensity of the 274 peak in the current spectrum exceeds 10% of the base peak and the current sugar chain structure does not contain NeuAc, the sugar chain structure is discarded; if the spectrum contains no 274 peak and the current sugar chain structure contains NeuAc, the sugar chain structure is discarded; if the intensity of the 290 peak in the spectrum exceeds 10% of the base peak and the current sugar chain structure does not contain NeuGc, the sugar chain structure is discarded; and if the spectrum contains no 290 peak and the current sugar chain structure contains NeuGc, the sugar chain structure is discarded.
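The four rules of step 531 reduce to two symmetric checks per marker peak, as in the sketch below. The patent gives only the nominal values 274 and 290; the more precise marker positions 274.09 and 290.09, the 0.05 tolerance, and the function name are assumptions made for this example.

```python
def keep_sugar_chain(peaks, monosaccharides, tol=0.05, min_rel=0.10):
    """Return False if the candidate contradicts the NeuAc/NeuGc marker peaks.

    peaks: list of (m/z, intensity); monosaccharides: names in the candidate.
    """
    base = max(intensity for _, intensity in peaks)
    for marker_mz, sugar in ((274.09, "NeuAc"), (290.09, "NeuGc")):
        near = [i for mz, i in peaks if abs(mz - marker_mz) <= tol]
        strong = any(i > min_rel * base for i in near)
        contains = sugar in monosaccharides
        if strong and not contains:
            return False  # marker peak is intense but the sugar is absent
        if not near and contains:
            return False  # sugar is present but its marker peak is missing
    return True


peaks = [(138.06, 50.0), (204.09, 100.0), (274.09, 30.0)]
print(keep_sugar_chain(peaks, ["HexNAc", "Hex", "NeuAc"]))
```

A weak marker peak (present but below 10% of the base peak) discards nothing under these rules, whether or not the candidate contains the corresponding sugar.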
Step 532: For each case of glycopeptide Y ion masses obtained in step 530, match the Y ion masses against the current MS2 spectrum and count the matched peaks. This count is the coarse score of the match between the glycopeptide Y ions of that case and the current MS2 spectrum.
Step 533: If the glycopeptide to be identified is known to be an N-glycopeptide, discard clearly mismatched sugar chain structures according to the number of pentasaccharide core ions in the current sugar chain structure. In this embodiment, if fewer than 2 pentasaccharide core ions are matched, the current sugar chain structure is discarded.
After step 533, the next sugar chain structure is selected and steps 530 to 533 are executed again, until all sugar chain structures in the sugar chain structure database have been traversed.
Step 540: Take the sugar chain structures ranking in the top K by number of matched peaks as the candidate sugar chain structures of the current glycopeptide spectrum. K is a configurable parameter; in this embodiment, K is chosen to be greater than or equal to 50.
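Coarse scoring and top-K selection (steps 532 and 540) can be sketched as follows. As a simplification, the score counts theoretical Y ions that have at least one measured peak within the tolerance; the 0.02 tolerance, the dictionary input format, and the function names are assumptions for this example.

```python
import heapq
from bisect import bisect_left


def coarse_score(y_mzs, sorted_peak_mzs, tol=0.02):
    """Count theoretical Y ions with a measured peak within `tol`."""
    count = 0
    for mz in y_mzs:
        i = bisect_left(sorted_peak_mzs, mz - tol)
        if i < len(sorted_peak_mzs) and sorted_peak_mzs[i] <= mz + tol:
            count += 1
    return count


def top_k_candidates(y_ions_by_glycan, peak_mzs, k=50, tol=0.02):
    """Return the k best (score, glycan name) pairs, highest score first."""
    peaks = sorted(peak_mzs)
    scored = ((coarse_score(y, peaks, tol), name)
              for name, y in y_ions_by_glycan.items())
    return heapq.nlargest(k, scored)


candidates = {"A": [100.0, 200.0, 300.0], "B": [100.0, 500.0], "C": [999.0]}
measured = [100.01, 200.0, 300.01, 700.0]
best = top_k_candidates(candidates, measured, k=2)
```

Because the score is only a count over a binary search, the whole database can be ranked per spectrum before the more expensive spectrum-to-spectrum matching is applied to the K survivors.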
Step 600: For the current MS2 spectrum, traverse all candidate sugar chain structures (the K sugar chains obtained in step 540) and, for each candidate, perform peptide matching against the spectrum to obtain the fine score of the intact glycopeptide. Fine scoring is generally based on spectrum-to-spectrum matching between the measured spectrum and a theoretical spectrum, and measures more accurately how well each candidate glycopeptide structure matches the current MS2 spectrum.
In this embodiment, the glycopeptide fine score is obtained by combining the sugar chain structure fine score and the peptide fine score; the process includes the following substeps:
Step 610: Traverse all candidate sugar chain structures and perform step 611 for each candidate sugar chain structure.
Step 611: Infer the peptide mass from the current sugar chain structure and the precursor ion mass of the current spectrum, and search the peptide index table for the peptides matching that mass.
Step 612: Theoretically fragment the peptide to obtain the ions that satisfy the fragmentation conditions. Under HCD fragmentation, for example, these ions include the b/y ions of the peptide, the b/y + 83.03711 Da ions that contain the glycosylation site, and the b/y + HexNAc ions.
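The theoretical fragmentation of step 612 can be sketched as below for singly charged ions. The sketch assumes that the + 83.03711 Da and + HexNAc variants are generated only for fragments that contain the glycosylation site; the patent states this explicitly only for the 83.03711 Da ions, so applying it to the HexNAc ions is an assumption, as are the function name, the label format, and the example peptide.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {"A": 71.03711, "N": 114.04293, "V": 99.06841,
                "T": 101.04768, "K": 128.09496}
WATER, PROTON = 18.010565, 1.007276
SHIFT_83, HEXNAC = 83.03711, 203.07937


def peptide_fragment_ions(peptide, site):
    """Singly charged b/y m/z values; `site` is the 0-based glycosylated residue."""
    ions = {}
    n = len(peptide)
    for i in range(1, n):
        b = sum(RESIDUE_MASS[a] for a in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[a] for a in peptide[i:]) + WATER + PROTON
        ions[f"b{i}"] = b
        ions[f"y{n - i}"] = y
        # Exactly one of the two complementary fragments carries the site.
        label, mz = (f"b{i}", b) if site < i else (f"y{n - i}", y)
        ions[f"{label}+83.03711"] = mz + SHIFT_83
        ions[f"{label}+HexNAc"] = mz + HEXNAC
    return ions


ions = peptide_fragment_ions("ANVTK", site=1)
```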
Step 613: Match the theoretical spectrum of the theoretical fragment ions of each peptide against the measured spectrum to obtain the fine score of that peptide. In this step, the spectrum-to-spectrum match scoring between the theoretical and measured spectra can be implemented with the KSDP algorithm. For details, see [Fu, Y., et al. Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics 20, 1948-1954 (2004)].
Step 614: Match the theoretical spectrum of the theoretical Y ions of the glycopeptide against the measured spectrum to obtain the sugar chain structure fine score. In this step, too, the spectrum-to-spectrum match scoring can be implemented with the KSDP algorithm; see the reference cited in step 613.
Step 615: Take the weighted average of the sugar chain structure fine score and the peptide fine score to obtain the fine score of the glycopeptide formed by the corresponding sugar chain structure and peptide. The weights can be set according to the relative importance of the sugar chain structure and the peptide. In a preferred embodiment, the weight of the peptide score is 0.65 and the weight of the sugar chain score is 0.35.
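The weighted combination of step 615 is a one-line computation; the sketch below uses the 0.65 and 0.35 weights of the preferred embodiment as defaults and picks the best-scoring glycopeptide. The function names and the tuple layout of the candidates are illustrative assumptions, and the component scores are taken as already computed (for example by KSDP).

```python
def glycopeptide_fine_score(peptide_score, glycan_score,
                            w_peptide=0.65, w_glycan=0.35):
    """Weighted average of the peptide and sugar chain fine scores."""
    total = w_peptide + w_glycan
    return (w_peptide * peptide_score + w_glycan * glycan_score) / total


def best_glycopeptide(candidates):
    """candidates: iterable of (glycan, peptide, glycan_score, peptide_score)."""
    return max(candidates,
               key=lambda c: glycopeptide_fine_score(c[3], c[2]))


candidates = [
    ("glycan1", "LTEAKPVDK", 0.80, 0.30),
    ("glycan1", "DVPKAETLK", 0.80, 0.70),
    ("glycan2", "DVPKAETLK", 0.20, 0.70),
]
winner = best_glycopeptide(candidates)
```

In this example the two isobaric peptides share the same sugar chain score, so the peptide fragment evidence alone decides between them, which is exactly the case that mass matching cannot resolve.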
After the glycopeptide drawing each structure is carefully given a mark, i.e. can determine that the qualification result that current spectrum figure is final.
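Steps 611 and 615 can be illustrated with a short Python sketch. It is a minimal illustration, not the patented implementation: the index layout (a mass-sorted list of `(mass, sequence)` tuples), the function names and the 0.02 Da tolerance are assumptions made here. Only the mass subtraction of step 611 and the 0.65/0.35 weights of the preferred embodiment come from the text, and the two fine scores (steps 613 and 614) are treated as given inputs.

```python
from bisect import bisect_left, bisect_right

PEPTIDE_WEIGHT = 0.65  # weight of the peptide fine score (preferred embodiment)
GLYCAN_WEIGHT = 0.35   # weight of the sugar chain fine score (preferred embodiment)


def infer_peptide_mass(precursor_mass, glycan_mass):
    """Step 611: peptide mass = precursor (glycopeptide) mass - sugar chain mass."""
    return precursor_mass - glycan_mass


def lookup_peptides(index, target_mass, tol_da=0.02):
    """Return the sequences whose mass lies within tol_da of target_mass.

    `index` is a list of (mass, sequence) tuples sorted by mass.
    The tolerance is a hypothetical value chosen for this sketch.
    """
    masses = [mass for mass, _ in index]
    lo = bisect_left(masses, target_mass - tol_da)
    hi = bisect_right(masses, target_mass + tol_da)
    return [seq for _, seq in index[lo:hi]]


def glycopeptide_fine_score(peptide_score, glycan_score):
    """Step 615: weighted average of the peptide and sugar chain fine scores."""
    return PEPTIDE_WEIGHT * peptide_score + GLYCAN_WEIGHT * glycan_score
```

Keeping the index sorted by mass lets each lookup run as two binary searches instead of a full scan, which matters when every candidate sugar chain structure triggers its own peptide search.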
This embodiment can significantly improve the reliability of large-scale intact glycopeptide identification and, while improving identification accuracy, keeps the computational complexity relatively low.
Further, a preferred embodiment of the present invention also addresses the false discovery rate of the above intact glycopeptide identification method and provides a scheme for controlling the false discovery rate of the identification results.
In this embodiment, the process of building the sugar chain index table further includes:
Step 113: Construct a decoy sugar chain library from the sugar chains in the sugar chain library (i.e. the sugar chain structure database), analyze the reducing-end ion structures of the decoy sugar chains, and calculate the corresponding reducing-end ion masses.
Step 114: Add the index entries for the reducing-end ions of the decoy sugar chains to the sugar chain index table that step 112 generated from the original sugar chain library.
The process of building the peptide index table further includes:
Step 124: Based on the protein sequence database and all possible enzymatic cleavage cases, derive a peptide list containing all possible peptides, construct a decoy peptide list from this peptide list, and calculate the mass of each decoy peptide.
Step 125: Merge the decoy peptide list into the peptide index table obtained in step 123.
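The text does not say how the decoy peptide list of step 124 is constructed. The sketch below assumes one common convention, reversing each target sequence while keeping its C-terminal residue in place, and then merges targets and decoys into a single mass-sorted index as in step 125. The function names, the tuple layout and the small monoisotopic residue-mass table are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) for a few amino acids; extend as needed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "T": 101.04768, "N": 114.04293, "K": 128.09496,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def make_decoy(seq):
    """Assumed decoy rule: reverse all residues except the C-terminal one."""
    return seq[:-1][::-1] + seq[-1]


def build_index(targets):
    """Return mass-sorted (mass, sequence, is_decoy) entries for targets and decoys."""
    entries = []
    for seq in targets:
        entries.append((peptide_mass(seq), seq, False))
        decoy = make_decoy(seq)
        if decoy != seq:  # skip decoys identical to their target
            entries.append((peptide_mass(decoy), decoy, True))
    entries.sort()
    return entries
```

Because a reversed peptide has the same mass as its target, each decoy competes with its target in exactly the same mass window, which is what makes decoy hits usable as a proxy for incorrect identifications.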
The intact glycopeptide identification method of this embodiment further includes:
Step 700: For each spectrum, take the result with the top glycopeptide fine score, estimate the false discovery rate (FDR) of the intact glycopeptides, and output the final identification result file together with the FDR.
Step 700 includes the following sub-steps:
Step 710: Take the top-ranked glycopeptide of each spectrum as the identification result of that spectrum.
Step 720: Among all top-ranked results, those whose sugar chain comes from the decoy library are regarded as incorrect sugar chain identifications, from which the sugar chain false discovery rate $\mathrm{FDR}_G(x) = p(G=\mathrm{False} \mid x_i \ge x,\ x_i \in I_{GP})$ is estimated.
Here $\mathrm{FDR}_G(x)$ denotes the false discovery rate of the set of sugar chain identifications with score greater than or equal to $x$, $x_i$ denotes the score of the $i$-th identification, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, $G=\mathrm{False}$ denotes the event that the sugar chain is identified incorrectly, and $p$ denotes the probability function.
In one example, step 720 includes sub-steps 721 to 723:
Step 721: Based on the sugar chain matches from the decoy library, use a finite Gaussian mixture model to estimate the distribution function $f_{G=\mathrm{False}}(x)$ of incorrect sugar chain identifications.
Here $f_{G=\mathrm{False}}(x)$ denotes the score distribution function when the sugar chain is identified incorrectly.
Step 722: Given the distribution function $f_{G=\mathrm{False}}(x)$ of incorrect sugar chain identifications, use a finite Gaussian mixture model to estimate the distribution function of the sugar chain matches from the non-decoy library, $f_G(x) = \pi_{G=\mathrm{False}} f_{G=\mathrm{False}}(x) + \pi_{G=\mathrm{True}} f_{G=\mathrm{True}}(x)$.
Here $f_G(x)$ denotes the mixture distribution function of the scores of incorrect and correct sugar chain matches, $f_{G=\mathrm{True}}(x)$ denotes the distribution function of the scores of correct sugar chain matches, and $\pi_{G=\mathrm{False}}$ and $\pi_{G=\mathrm{True}}$ denote the proportions of incorrect and correct matches, respectively, among all matching results.
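Step 723: Calculate $\mathrm{FDR}_G(x) = \dfrac{N\,\pi_{G=\mathrm{False}} \int_x^{\infty} f_{G=\mathrm{False}}(t)\,dt}{\#\{x_i \ge x\}}$.
Here $x_i$ denotes the score of the $i$-th sugar chain match, $\#\{x_i \ge x\}$ denotes the number of sugar chain matches with score greater than or equal to $x$, and $N$ is the total number of sugar chain matches.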
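As a rough illustration of sub-steps 721 to 723, the Python sketch below evaluates a mixture-based false discovery rate once the model has been fitted. It assumes that the estimate is the expected number of incorrect matches scoring at or above the threshold, N times the incorrect-match proportion times the tail mass of the incorrect-match distribution, divided by the observed number of matches at or above the threshold. The fitting itself (for example by expectation-maximization) is not shown, and the function names and the `(weight, mean, std)` parameter layout are assumptions of this sketch.

```python
import math


def gaussian_tail(x, mean, std):
    """P(X >= x) for X ~ Normal(mean, std)."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))


def mixture_tail(x, components):
    """Tail mass above x of a finite Gaussian mixture.

    `components` is a list of (weight, mean, std) triples whose weights sum to 1.
    """
    return sum(w * gaussian_tail(x, m, s) for w, m, s in components)


def glycan_fdr(scores, x, false_components, pi_false):
    """Estimated FDR of the sugar chain matches scoring at or above x.

    scores           -- scores x_i of all N sugar chain matches
    false_components -- fitted mixture describing incorrect-match scores
    pi_false         -- fitted proportion of incorrect matches
    """
    n_pass = sum(1 for s in scores if s >= x)
    if n_pass == 0:
        return 0.0  # nothing passes the threshold, so nothing can be wrong
    expected_false = len(scores) * pi_false * mixture_tail(x, false_components)
    return min(1.0, expected_false / n_pass)
```

Modelling the score distribution, rather than only counting decoy hits above the threshold, gives a smooth estimate even when very few decoy sugar chains score highly.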
Step 730: Among all top-ranked results, those whose peptide comes from the decoy library are regarded as incorrect peptide identifications; from the number of incorrect peptide identifications, calculate the peptide false discovery rate $\mathrm{FDR}_P(x) = p(P=\mathrm{False} \mid x_i \ge x,\ x_i \in I_{GP})$.
Here $\mathrm{FDR}_P(x)$ denotes the false discovery rate of the set of peptide identifications with score greater than or equal to $x$, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, and $P=\mathrm{False}$ denotes the event that the peptide is identified incorrectly.
Step 740: Among all top-ranked results, those whose sugar chain and peptide both come from the decoy library are regarded as identifications in which the sugar chain and the peptide are both incorrect; with reference to steps 721 to 723 above, calculate $\mathrm{FDR}_{G \cap P}(x) = p(G=\mathrm{False} \cap P=\mathrm{False} \mid x_i \ge x,\ x_i \in I_{GP})$.
Here $\mathrm{FDR}_{G \cap P}(x)$ denotes the false discovery rate, within the set of glycopeptide identifications with score greater than or equal to $x$, of results whose sugar chain and peptide are both incorrect, and $G=\mathrm{False} \cap P=\mathrm{False}$ denotes the event that the sugar chain and the peptide are both identified incorrectly.
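Step 750: Calculate the false discovery rate of the intact glycopeptides from the sugar chain, peptide and joint false discovery rates of steps 720 to 740, i.e. $\mathrm{FDR}_{GP}(x) = \mathrm{FDR}_G(x) + \mathrm{FDR}_P(x) - \mathrm{FDR}_{G \cap P}(x)$.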
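The combination in step 750 can be sketched in a few lines of Python. The sketch assumes that a glycopeptide identification counts as wrong when its sugar chain or its peptide (or both) is wrong, so the three rates combine by inclusion-exclusion; the function names and the 1% cut-off in the second helper are illustrative choices, not values prescribed by the text.

```python
def combined_fdr(fdr_glycan, fdr_peptide, fdr_both):
    """Glycopeptide FDR by inclusion-exclusion, clipped to the range [0, 1].

    fdr_glycan  -- FDR of the sugar chain part (step 720)
    fdr_peptide -- FDR of the peptide part (step 730)
    fdr_both    -- FDR of results with both parts incorrect (step 740)
    """
    fdr = fdr_glycan + fdr_peptide - fdr_both
    return max(0.0, min(1.0, fdr))


def accept(fdr_glycan, fdr_peptide, fdr_both, max_fdr=0.01):
    """True if the combined FDR at this score threshold is within max_fdr."""
    return combined_fdr(fdr_glycan, fdr_peptide, fdr_both) <= max_fdr
```

Subtracting the joint term avoids counting results in which both parts are wrong twice, once as a sugar chain error and once as a peptide error.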
To verify the practical effect of the present invention, the inventors used computer simulation to generate spectra of a number of glycopeptides as a labeled data set and tested the above error rate estimation algorithm on it. The results show that the error rate estimated by the algorithm is 1%, while the actual error rate obtained directly from the labeled data set is 1.29%, which is in line with expectations. It can be seen that the above error rate estimation algorithm estimates the error rate of glycopeptide identification results relatively accurately and can therefore better balance the identification rate and the accuracy of glycopeptide identification.
Finally, it should be noted that the above embodiments only describe the technical solution of the present invention and do not limit this technical method. The present invention can be extended in application to other modifications, variations, applications and embodiments, and all such modifications, variations, applications and embodiments are therefore considered to fall within the spirit and teaching of the present invention.

Claims (10)

1. A method for identifying intact glycopeptides, comprising the following steps:
1) obtaining measured tandem mass spectra from the MS2 fragmentation of glycopeptides, the spectra containing both sugar chain fragment information and peptide fragment information;
2) for any measured tandem mass spectrum to be identified, traversing the sugar chain structure database and performing steps 21) to 22) for each sugar chain structure in it, until all sugar chain structures in the sugar chain structure database have been traversed;
21) for the current sugar chain structure, inferring, from the precursor ion mass of the current tandem spectrum, the masses of all glycopeptide Y ions that may be obtained in the fragmentation experiment;
22) according to the masses of the glycopeptide Y ions in each case obtained in step 21), counting the peaks of the current MS2 spectrum that they match, and taking this number of matched peaks as the coarse score of the match between the glycopeptide Y ions in that case and the current MS2 spectrum;
3) taking the sugar chain structures whose coarse scores rank in the top K as the candidate sugar chain structures of the current tandem spectrum, where K is the preset number of candidate sugar chain structures;
4) for the current tandem spectrum, traversing all candidate sugar chain structures and, for each candidate sugar chain structure, performing spectrum-to-spectrum match scoring of the measured spectrum against the theoretical spectrum of the sugar chain structure and of the measured spectrum against the theoretical spectrum of the peptide, thereby obtaining the glycopeptide structure identification result.
2. The method for identifying intact glycopeptides according to claim 1, characterized in that, in step 21), the method of inferring the masses of all glycopeptide Y ions that may be obtained in the fragmentation experiment comprises: calculating glycopeptide Y ion mass = peptide mass + sugar chain reducing-end ion mass, where the peptide mass is obtained by subtracting the mass of the current sugar chain structure from the precursor ion mass of the current tandem spectrum, and the sugar chain reducing-end ion masses are calculated as follows: analyzing all possible fragmentation cases of the current sugar chain structure to obtain the structure of the sugar chain reducing-end ion after fragmentation in each possible case, and then deriving the sugar chain reducing-end ion masses from the structures of these sugar chain reducing-end ions.
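The coarse scoring of steps 21) to 3), together with the Y ion mass rule of claim 2, can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: peaks and ions are treated as neutral masses, a Y ion counts as matched when any peak lies within a hypothetical 0.02 Da tolerance, and the sugar chain database is a plain dictionary mapping a name to `(sugar chain mass, reducing-end ion masses)`. All function names are invented for this sketch.

```python
import heapq


def y_ion_masses(precursor_mass, glycan_mass, reducing_end_masses):
    """Y ion mass = peptide mass + reducing-end ion mass (claim 2)."""
    peptide_mass = precursor_mass - glycan_mass
    return [peptide_mass + m for m in reducing_end_masses]


def coarse_score(y_masses, peak_masses, tol_da=0.02):
    """Number of theoretical Y ions that match at least one observed peak."""
    return sum(
        1 for y in y_masses
        if any(abs(y - p) <= tol_da for p in peak_masses)
    )


def top_k_glycans(precursor_mass, peak_masses, glycan_db, k):
    """Names of the K sugar chain structures with the highest coarse scores."""
    scored = []
    for name, (glycan_mass, reducing_end_masses) in glycan_db.items():
        ys = y_ion_masses(precursor_mass, glycan_mass, reducing_end_masses)
        scored.append((coarse_score(ys, peak_masses), name))
    return [name for _, name in heapq.nlargest(k, scored)]
```

Counting matched peaks is cheap, so it can be applied to every structure in the database before the more expensive fine scoring is run on the K survivors.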
3. The method for identifying intact glycopeptides according to claim 2, characterized in that, in step 21), the sugar chain reducing-end ion masses are obtained by searching a sugar chain index table, the sugar chain index table using mass as the index entry and recording the sugar chain reducing-end ion structures corresponding to each mass.
4. The method for identifying intact glycopeptides according to claim 3, characterized in that, in step 21), the sugar chain index table is built in advance by the following sub-steps:
211) importing the sugar chain structure database and traversing each sugar chain structure in it;
212) for the current sugar chain, determining all possible fragmentation positions of the sugar chain from an analysis of its structure, and obtaining the structure of the reducing-end ion corresponding to each fragmentation position;
213) calculating the mass of each possible reducing-end ion, thereby obtaining the sugar chain index table.
5. The method for identifying intact glycopeptides according to claim 3, characterized in that step 21) further comprises directly discarding sugar chain structures that obviously do not match the current tandem spectrum: if the intensity of the 274 peak in the MS2 spectrum of the current tandem spectrum exceeds 10% of that of the highest peak and the current sugar chain structure does not contain NeuAc, the sugar chain structure is discarded; if the MS2 spectrum of the current tandem spectrum contains no 274 peak and the current sugar chain structure contains NeuAc, the sugar chain structure is discarded; if the intensity of the 290 peak in the MS2 spectrum of the current tandem spectrum exceeds 10% of that of the highest peak and the current sugar chain structure does not contain NeuGc, the sugar chain structure is discarded; if the MS2 spectrum of the current tandem spectrum contains no 290 peak and the current sugar chain structure contains NeuGc, the sugar chain structure is discarded.
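The four discard rules of claim 5 can be expressed as a small Python filter. The claim names the diagnostic peaks only by their nominal values 274 and 290; the exact m/z values 274.0921 and 290.0870 used below (the dehydrated NeuAc and NeuGc oxonium ions), the 0.02 tolerance, and the representation of a sugar chain as a set of monosaccharide names are assumptions of this sketch.

```python
# (assumed exact m/z of the diagnostic peak, monosaccharide it indicates)
DIAGNOSTIC_IONS = [(274.0921, "NeuAc"), (290.0870, "NeuGc")]


def peak_intensity(peaks, mz, tol=0.02):
    """Highest intensity among peaks within tol of mz, or 0.0 if there is none."""
    return max((inten for p_mz, inten in peaks if abs(p_mz - mz) <= tol),
               default=0.0)


def keep_sugar_chain(peaks, monosaccharides, rel_threshold=0.10):
    """Return False if the sugar chain obviously contradicts the spectrum.

    peaks           -- list of (m/z, intensity) pairs of the MS2 spectrum
    monosaccharides -- set of monosaccharide names in the sugar chain
    """
    base = max((inten for _, inten in peaks), default=0.0)  # highest peak
    for mz, sugar in DIAGNOSTIC_IONS:
        inten = peak_intensity(peaks, mz)
        # Strong diagnostic peak, but the sugar chain lacks that monosaccharide.
        if inten > rel_threshold * base and sugar not in monosaccharides:
            return False
        # No diagnostic peak at all, but the sugar chain contains it.
        if inten == 0.0 and sugar in monosaccharides:
            return False
    return True
```

A weak diagnostic peak (present but at or below 10% of the highest peak) triggers neither rule, so such structures are kept for scoring rather than discarded.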
6. The method for identifying intact glycopeptides according to claim 3, characterized in that step 21) further comprises: if the glycopeptide sample to be identified is known to be an N-glycopeptide, discarding obviously unmatched sugar chain structures according to the number of pentasaccharide core ions of the current sugar chain structure.
7. The method for identifying intact glycopeptides according to claim 4, characterized in that step 4) includes the following sub-steps:
41) traversing all candidate sugar chain structures and performing step 42) for each candidate sugar chain structure;
42) inferring the peptide mass from the mass of the current sugar chain structure and the precursor ion mass of the current tandem spectrum, then searching the peptide index table for the peptides whose mass matches;
43) fragmenting each matched peptide theoretically to obtain the ions that satisfy the fragmentation conditions;
44) performing spectrum-to-spectrum matching between the theoretical spectrum built from the theoretical fragment ions of each matched peptide and the measured spectrum to obtain the fine score of that peptide;
45) for the current candidate sugar chain structure, performing spectrum-to-spectrum matching between the theoretical spectrum built from the theoretical Y ions of the glycopeptide and the measured spectrum to obtain the fine score of the sugar chain structure;
46) taking the weighted average of the sugar chain structure fine score and the peptide fine score to obtain the fine score of the glycopeptide formed by that sugar chain structure and that peptide;
47) obtaining the glycopeptide structure identification result from the fine scores of the glycopeptide structures.
8. The method for identifying intact glycopeptides according to claim 7, characterized in that, in step 42), the peptide index table is built in advance from a protein sequence database, and the peptide index table uses mass as the index entry and records the peptide sequences corresponding to each mass.
9. The method for identifying intact glycopeptides according to claim 8, characterized in that the process of building the peptide index table includes the following sub-steps:
421) importing the protein sequence database and traversing each protein sequence in it;
422) for the current protein sequence, analyzing all possible theoretical enzymatic cleavage cases of the protein sequence and obtaining the peptide sequence corresponding to each theoretical cleavage case;
423) for each possible peptide, generating the corresponding modified peptides based on all possible modification forms;
424) calculating the mass of each possible peptide, thereby obtaining the peptide index table, the peptides in which include both unmodified peptides and modified peptides.
10. The method for identifying intact glycopeptides according to claim 7, characterized in that the process of building the sugar chain index table further includes the following sub-steps:
214) constructing a decoy sugar chain library from the sugar chains in the sugar chain structure database, analyzing the reducing-end ion structures of the decoy sugar chains, and calculating the corresponding reducing-end ion masses;
215) adding the index entries for the reducing-end ions of the decoy sugar chains to the sugar chain index table, generated from the original sugar chain library, that is obtained in step 33);
the process of building the peptide index table further includes:
425) based on the protein sequence database and all possible enzymatic cleavage cases, deriving a peptide list containing all possible peptides, constructing a decoy peptide list from this peptide list, and calculating the mass of each decoy peptide;
426) merging the decoy peptide list into the peptide index table obtained in step 213);
the intact glycopeptide identification method further comprises the following step:
5) for each tandem spectrum to be identified, performing steps 2) to 4), taking the result with the top glycopeptide fine score for each tandem spectrum, estimating the false discovery rate of the intact glycopeptides, and outputting the final identification results together with the false discovery rate, step 5) including the following sub-steps:
51) taking the top-ranked glycopeptide of each tandem spectrum as the identification result of that spectrum;
52) among all top-ranked results, regarding those whose sugar chain comes from the decoy library as incorrect sugar chain identifications, and estimating accordingly $\mathrm{FDR}_G(x) = p(G=\mathrm{False} \mid x_i \ge x,\ x_i \in I_{GP})$,
where $\mathrm{FDR}_G(x)$ denotes the false discovery rate of the set of sugar chain identifications with score greater than or equal to $x$, $x_i$ denotes the score of the $i$-th identification, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, $G=\mathrm{False}$ denotes the event that the sugar chain is identified incorrectly, and $p$ denotes the probability function;
53) among all top-ranked results, regarding those whose peptide comes from the decoy library as incorrect peptide identifications, and calculating, from the number of incorrect peptide identifications, $\mathrm{FDR}_P(x) = p(P=\mathrm{False} \mid x_i \ge x,\ x_i \in I_{GP})$,
where $\mathrm{FDR}_P(x)$ denotes the false discovery rate of the set of peptide identifications with score greater than or equal to $x$, $I_{GP}$ denotes the set of scores of all identification results, $x$ denotes the manually set score threshold, and $P=\mathrm{False}$ denotes the event that the peptide is identified incorrectly;
54) among all top-ranked results, regarding those whose sugar chain and peptide both come from the decoy library as identifications in which the sugar chain and the peptide are both incorrect, and calculating $\mathrm{FDR}_{G \cap P}(x) = p(G=\mathrm{False} \cap P=\mathrm{False} \mid x_i \ge x,\ x_i \in I_{GP})$,
where $\mathrm{FDR}_{G \cap P}(x)$ denotes the false discovery rate, within the set of glycopeptide identifications with score greater than or equal to $x$, of results whose sugar chain and peptide are both incorrect, and $G=\mathrm{False} \cap P=\mathrm{False}$ denotes the event that the sugar chain and the peptide are both identified incorrectly;
55) calculating the false discovery rate of the intact glycopeptides, $\mathrm{FDR}_{GP}(x) = \mathrm{FDR}_G(x) + \mathrm{FDR}_P(x) - \mathrm{FDR}_{G \cap P}(x)$.
CN201610309699.9A 2016-05-11 2016-05-11 A kind of method and system of intact glycopeptide identification Active CN106018535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610309699.9A CN106018535B (en) 2016-05-11 2016-05-11 A kind of method and system of intact glycopeptide identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610309699.9A CN106018535B (en) 2016-05-11 2016-05-11 A kind of method and system of intact glycopeptide identification

Publications (2)

Publication Number Publication Date
CN106018535A true CN106018535A (en) 2016-10-12
CN106018535B CN106018535B (en) 2018-11-09

Family

ID=57099828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610309699.9A Active CN106018535B (en) 2016-05-11 2016-05-11 A kind of method and system of intact glycopeptide identification

Country Status (1)

Country Link
CN (1) CN106018535B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109738532A (en) * 2018-12-31 2019-05-10 复旦大学 A method of automatically parsing stable isotope labeling sugar chain quantitative mass spectral data
CN111220749A (en) * 2018-11-25 2020-06-02 中国科学院大连化学物理研究所 Analysis method of O-linked glycopeptide
CN112326770A (en) * 2020-11-04 2021-02-05 西北大学 Method for identifying N-linked sugar chain type on complete glycopeptide
CN112326769A (en) * 2020-11-04 2021-02-05 西北大学 Method for identifying N-sugar chain branch structure on complete glycopeptide
CN112824894A (en) * 2019-11-21 2021-05-21 株式会社岛津制作所 Glycopeptide analysis device
CN113571129A (en) * 2021-09-24 2021-10-29 北京理工大学 Complex cross-linked peptide identification method based on mass spectrum
CN114166925A (en) * 2021-10-22 2022-03-11 西安电子科技大学 Method and system for identifying Denovo by N-sugar chain structure based on mass spectrum data
CN115662534A (en) * 2022-12-14 2023-01-31 药融云数字科技(成都)有限公司 Chemical structure determination method and system based on map, storage medium and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2616888A1 (en) * 2001-12-08 2003-07-03 Micromass Uk Limited Method of mass spectrometry
US20040096982A1 (en) * 2002-11-19 2004-05-20 International Business Machines Corporation Methods and apparatus for analysis of mass spectra
WO2004061407A2 (en) * 2003-01-03 2004-07-22 Caprion Pharmaceuticals, Inc. Glycopeptide identification and analysis
CN102072932A (en) * 2009-11-19 2011-05-25 复旦大学 Method and device for identifying glycopeptide segment
JP2012220365A (en) * 2011-04-11 2012-11-12 Shimadzu Corp Sugar peptide analysis method and analysis apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEN-FENG ZENG等: "pGlyco: a pipeline for the identification of intact N-glycopeptides by using HCD-and CID-MS/MS and MS3", 《SCIENTIFIC REPORTS》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111220749A (en) * 2018-11-25 2020-06-02 中国科学院大连化学物理研究所 Analysis method of O-linked glycopeptide
CN109738532A (en) * 2018-12-31 2019-05-10 复旦大学 A method of automatically parsing stable isotope labeling sugar chain quantitative mass spectral data
CN109738532B (en) * 2018-12-31 2022-07-22 复旦大学 Method for automatically analyzing quantitative mass spectrum data of stable isotope labeled sugar chains
US11686713B2 (en) 2019-11-21 2023-06-27 Shimadzu Corporation Glycopeptide analyzer
CN112824894A (en) * 2019-11-21 2021-05-21 株式会社岛津制作所 Glycopeptide analysis device
CN112824894B (en) * 2019-11-21 2024-01-09 株式会社岛津制作所 Glycopeptide analyzer
CN112326770A (en) * 2020-11-04 2021-02-05 西北大学 Method for identifying N-linked sugar chain type on complete glycopeptide
CN112326769A (en) * 2020-11-04 2021-02-05 西北大学 Method for identifying N-sugar chain branch structure on complete glycopeptide
CN112326770B (en) * 2020-11-04 2021-10-26 西北大学 Method for identifying N-linked sugar chain type on complete glycopeptide
CN113571129A (en) * 2021-09-24 2021-10-29 北京理工大学 Complex cross-linked peptide identification method based on mass spectrum
CN114166925A (en) * 2021-10-22 2022-03-11 西安电子科技大学 Method and system for identifying Denovo by N-sugar chain structure based on mass spectrum data
CN114166925B (en) * 2021-10-22 2024-03-26 西安电子科技大学 Denovo method and system for identifying N-sugar chain structure based on mass spectrum data
CN115662534A (en) * 2022-12-14 2023-01-31 药融云数字科技(成都)有限公司 Chemical structure determination method and system based on map, storage medium and terminal

Also Published As

Publication number Publication date
CN106018535B (en) 2018-11-09

Similar Documents

Publication Publication Date Title
CN106018535A (en) Complete glycopeptide identifying method and system
CN103884806B (en) In conjunction with the Leaf proteins Label-free Protein Quantification Methods of second order ms and machine learning algorithm
US20080139396A1 (en) Method of Identifying Sugar Chain Structure and Apparatus For Analyzing the Same
CN106935477B (en) Tandem Mass Spectrometry Analysis data processing equipment
CN103852513B (en) A kind of peptide section de novo sequencing method and system based on HCD and ETD mass spectrogram
Curran et al. Computer aided manual validation of mass spectrometry-based proteomic data
CN105136714A (en) Terahertz spectral wavelength selection method based on genetic algorithm
CN103776891A (en) Method for detecting differentially-expressed protein
Ahrné et al. An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates
WO2024082581A1 (en) M protein detection method
US8372653B2 (en) Mass tag reagents for simultaneous quantitation and identification of small molecules
CN104965020B (en) Multi-stage mses structure of biological macromolecule authentication method
CN104820011B (en) A kind of method of protein post-translational modification positioning
JP2014169879A (en) Method and device for sugar chain structure analysis
Zou et al. Charge state determination of peptide tandem mass spectra using support vector machine (SVM)
JP2007263641A (en) Structure analysis system
CN112326769B (en) Method for identifying N-sugar chain branch structure on complete glycopeptide
CN103439441B (en) Peptide identification method based on subset error rate estimation
CN106404883B (en) A kind of polysaccharide topological structure analytic method based on mass spectral analysis
JP2021536567A (en) Identification and scoring of related compounds in composite samples
US20230113788A1 (en) System based on learning peptide properties for predicting spectral profile of peptide-producing ions in liquid chromatograph-mass spectrometry
CN112331269A (en) Method for constructing N-linked sugar chain branch structure database aiming at sample to be detected
Mandal et al. Top‐down characterization of proteins and drug‐protein complexes using nanoelectrospray tandem mass spectrometry
CN106198706B (en) A kind of pair of polypeptide crosslinking peptide fragment carries out the False discovery rate control method of Mass Spectrometric Identification
CN109145887A (en) A kind of Threshold Analysis method for obscuring differentiation based on spectrum latent variable

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant