WO1999062930A2 - Protein sequencing using tandem mass spectroscopy - Google Patents
- Publication number
- WO1999062930A2 (PCT/US1999/012221)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- mass
- graph
- peptide
- ions
Classifications
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01J—ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
- H01J49/00—Particle spectrometers or separator tubes
- H01J49/0027—Methods for using particle spectrometers
- H01J49/0036—Step by step routines describing the handling of the data generated during a measurement
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K1/00—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
- C07K1/12—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length by hydrolysis, i.e. solvolysis in general
- C07K1/128—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length by hydrolysis, i.e. solvolysis in general sequencing
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01J—ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
- H01J49/00—Particle spectrometers or separator tubes
- H01J49/004—Combinations of spectrometers, tandem spectrometers, e.g. MS/MS, MSn
Definitions
- a tandem mass spectrometer is capable of automatically ionizing a mixture of peptides and measuring their respective parent mass/charge ratios, then selectively fragmenting each peptide into constitutive pieces and measuring the mass/charge ratios of the fragment ions (MS/MS spectra of peptides).
- the peptide sequencing problem is then to derive the sequences of peptides given their MS/MS spectra.
- the sequence of the peptide could simply be determined by converting the mass differences of consecutive fragment ions in the spectrum to their corresponding amino acids.
- de novo peptide sequencing remains an open problem, and even a simple spectrum may require tens of minutes for a trained expert to interpret.
- the number of sequence permutations examined can be further pruned by limiting the possible amino acid composition derived either through chemical amino acid analysis or through composition measurement for ions below m/z 160 in the tandem mass spectrum.
- the difficulty with the prefix approach is that pruning frequently discards the correct sequence if its prefixes are poorly represented in the spectrum.
- Another intrinsic problem with the global approach is that the spectrum information is used for scoring only after the potential peptide sequences are generated.
- de novo programs based on the global approach typically have running times on the order of hours.
- the peaks in the spectrum serve as vertices in the spectrum graph while the edges of the graph correspond to linking of vertices differing by the mass of an amino acid residue.
- Fundamental to graph theory approaches is the prior transformation of each peak in the experimental spectrum into several vertices in a spectrum graph. Each vertex represents a different possible fragment ion type assignment for the peak.
- the de novo peptide sequencing problem is thus cast as finding the longest path in the resulting directed acyclic graph. Since the number of edges in the spectrum graph is at most quadratic in the number of ions in the spectrum, and since efficient algorithms for finding the longest paths are known, such approaches have the potential to efficiently prune the set of all peptides to the set of high-scoring paths in the spectrum graph.
- Let $A$ be the set of amino acids with molecular masses $w(a)$, $a \in A$.
- A (parent) peptide $P = p_1 \ldots p_n$ is a sequence of amino acids.
- A partial peptide $P' \subset P$ is a substring $p_i \ldots p_j$ of $P$ of mass $\sum_{i \le t \le j} m(p_t)$.
- The electronic spectrum $E(P)$ of peptide $P$ is the set of masses of its partial peptides.
- The match $m(S,P) = \sum_{s \in S} m(s,P)$ between spectrum $S$ and peptide $P$ is the number of ions in spectrum $S$ that match peptide $P$.
- m(S,P) is the number of masses that experimental and electronic spectra have in common.
- The peptide sequencing problem can be stated as follows: given a spectrum $S$ and a parent mass $m$, find a peptide of mass $m$ with the maximal match to spectrum $S$.
- A $\delta$-ion of a partial peptide $P' \subset P$ is a modification of $P'$ that has molecular mass $m(P') - \delta$.
- The electronic spectrum of peptide $P$ for a set of ion types $\Delta$ (denoted $E_\Delta$) is created by subtracting all offsets in $\Delta$ from the masses of all partial peptides of $P$.
- $W_\Delta(P) = W_\Delta(P_1) \cup \cdots \cup W_\Delta(P_{n-1})$.
- The set of vertices of the spectrum graph is then $\{v_{\text{initial}}\} \cup V(s_1) \cup \cdots \cup V(s_m)$.
- A spectrum $S$ of a peptide $P$ is called "complete" if $S$ contains an ion corresponding to $P_i$ for every $1 \le i \le n$.
- The use of the spectrum graph is based on the observation that, for a complete spectrum $S$ of peptide $P$, there exists a path of length $n$ from $v_{\text{initial}}$ to $v_{\text{final}}$ in $G_\Delta(S)$ that is labeled by $P$, and … $\sum_{v \in t} s(v)$, where $s(v)$ denotes the multiplicity with which vertex $v$ was created.
- An offset frequency function is introduced that represents an important new tool for defining the ion type tendencies for particular mass-spectrometers.
- the offset frequency function allows one to compare different mass spectrometers based on their propensity to generate different ion types, thus making our algorithm instrument-independent.
- Peaks in a spectrum represent either random noise or $\delta$-ions of partial peptides.
- Let $d(S)$ be the average distance between the peaks.
- The probability of observing an offset $\delta$ is approximately $(1 - p(\delta))\,\frac{1}{d(S)} + p(\delta)$, where $p(\delta)$ is the probability of a $\delta$-ion (the portion of partial peptides that produce $\delta$-ions).
- the average $d(S)$ for our sample spectra is 17.5; therefore the probability of a random offset is $1/17.5 \approx 0.057$.
- the probability of an a-ion with offset -27 is 0.23.
- the offset -27 is observed 4 times more frequently than the average offset.
- the statistics of offsets over all ions and all partial peptides provides a reliable learning algorithm for ion types.
- Offsets $\{\delta_1, \ldots, \delta_k\}$ corresponding to peaks of $H(x)$ represent the ion types produced by a given mass spectrometer. Under normal circumstances we expect these offsets to correspond to ion types that have sufficient chemical support.
- Table 1 Information about terminal ion types learned from experimental spectra. The remaining offsets have average count 45 and average intensity 0.431024. When computing filtered counts, the peaks that have been identified as ions are not counted again for subsequent ion types.
- Table 1 lists the offsets that have larger-than-expected counts and the corresponding ion types as known in chemistry. All the significant offsets we found correspond to known ion types. Surprisingly, some ion types turned out to be more significant than previously thought (e.g., b-H2O-H2O has a larger count than y-NH3). Fig. 1 also clearly shows the presence of internal b-ions in the spectra.
- a part of the learning of ion types is to decide what interval of offsets should be considered for particular ion type.
- Peaks in a spectrum differ in intensity and one has to address the question of setting a threshold for distinguishing the signal from noise in a spectrum prior to transforming it to a spectrum graph. Low thresholds lead to excessive growth of the spectrum graph while high thresholds lead to fragmentation of the spectrum graph.
- Earlier de novo sequencing algorithms set up the intensity thresholds for experimental spectra in a largely heuristic manner and have not addressed the fact that the intensity thresholds are ion-type dependent.
- the offset frequency function allows one to set up intensity thresholds in a rigorous way.
- Let K be the length of the underlying peptide. Since this information is usually unavailable, K may be chosen as the ratio of the peptide mass to the average mass of an amino acid.
- the analysis of b-ions can be limited to intensity ranks 1, 2 and 3, while the analysis of b-H2O ions can be limited to intensity ranks 3, 4 and 5.
- a similar analysis (Fig. 3) shows that only intensities ranked 1 and 2 (i.e., the 20-30 highest-intensity peaks) should be considered for y-ions, while intensities ranked 2, 3 and 4 represent potential y-H2O ions.
- the merging algorithm decides which vertices in the spectrum graph are to be merged into one vertex. It is important to merge the appropriate vertices: if we do not merge vertices that correspond to the same partial peptide, we will interpret meaningful peaks of the spectrum as noise. On the other hand, if we merge vertices that do not correspond to the same peptide, we may interpret noise as meaningful peaks.
- SHERENGA uses a greedy algorithm for merging vertices and introduces bridge edges in the resulting graph.
- A gap edge in the spectrum graph is a directed edge from $u$ to $v$ such that $v - u$ is …
- the goal of scoring is to answer the question of how well a candidate peptide "explains" a spectrum and to choose the peptide that explains the spectrum the best.
- Let $p(P,S)$ be the probability that peptide $P$ produces spectrum $S$. The scoring scheme should be designed so that high-scoring peptides $P$ have a high probability $p(P,S)$.
- We evaluate $p(P,S)$ and derive a scoring scheme for paths in the spectrum graph based on the probabilities of the corresponding peptides. The longest path in the weighted spectrum graph corresponds to the peptide $P$ that best "explains" spectrum $S$.
- the protein sequencing algorithm involves generating the weighted spectrum graph (as described above) and searching for the highest-scoring paths in the spectrum graph.
- Every peak in the spectrum may be interpreted either as an N-terminal ion or as a C-terminal ion. Therefore, every "real" vertex (corresponding to a mass m) has a
- Let $G$ be a graph and let $T$ be a set of forbidden pairs of vertices of $G$ (twins).
- a path in G is called anti-symmetric if it contains at most one vertex from every forbidden pair.
- The anti-symmetric longest path problem is to find a longest anti-symmetric path in $G$ with a set of forbidden pairs $T$.
- an intrinsic property of the conventional longest path algorithms is that they use only the neighbors of a given vertex while computing the longest path ending in this vertex.
- Vertices in the spectrum graph are numbers that correspond to masses of potential partial peptides.
- Two forbidden pairs of vertices $(x_1, y_1)$ and $(x_2, y_2)$ are non-interleaving if the intervals $(x_1, y_1)$ and $(x_2, y_2)$ do not interleave, i.e., one of them is contained inside the other.
- a graph G with a set of forbidden pairs is called proper if every two forbidden pairs of vertices are non-interleaving.
- The tandem mass spectrometry peptide sequencing problem corresponds to the anti-symmetric longest path problem in a proper graph. We submit that there exists an efficient algorithm for the anti-symmetric longest path problem in a proper graph.
- Let $C(G)$ be a graph in which a path corresponds to a path in the spectrum graph that is folded in the middle.
- the vertices of the combined graph are pairs $(e, x)$ such that edge $e$ covers vertex $x$.
- The initial vertex corresponds to the pair $(v_{\text{initial}}, v_{\text{final}})$, and a final vertex … corresponds to a folding point of the spectrum graph.
- The weight of the new vertex is the weighted average $\frac{i(s)\,u + i(t)\,v}{i(s) + i(t)}$ of the weights of $u$ and $v$.
- the greedy algorithm for merging provides satisfying results for most spectra.
- a peak of a spectrum is actually the mass/charge (m/z) ratio of the corresponding ion.
- For singly charged ions ($z = 1$), the m/z of the peak is the same as the mass of the corresponding ion.
- Some mass spectrometers can produce ions with charge 2 or higher. In this case the observed mass is one half (one third, and so on) of the ion's actual mass.
- Let $c(S, S(x))$ be the number of peaks $s_i \in S$ and $s'_j \in S(x)$ such that
- The value of $x$ that maximizes $c(S, S(x))$ would then be an appropriate choice for the parent mass. If there are several such values of $x$, we can select the one that minimizes the sum of distances
- This approach significantly improves the accuracy of the parent mass determination.
- This approach can similarly be used to correct a mis-assignment of the parent mass/charge value resulting from an incorrect charge assignment.
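In the idealized case described above, every consecutive fragment ion is present. Reading the sequence then reduces to matching each mass difference against a table of residue masses. The following Python sketch makes several assumptions:

- the input is a complete, noise-free ladder of a single ion type, sorted by mass;
- residue masses are the standard monoisotopic values;
- the function name `read_ladder` and the 0.02 Da tolerance are hypothetical choices, not taken from the source.

```python
# Standard monoisotopic residue masses (Da). Leucine and isoleucine are
# isobaric, so a single entry "L" stands for both.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder, tol=0.02):
    """Translate a sorted ladder of same-type fragment masses into a sequence.

    Each consecutive difference is matched to the closest residue mass.
    Returns None as soon as a difference is not within `tol` of any residue,
    i.e. when the idealized "every ion present" assumption breaks down.
    """
    sequence = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        diff = heavier - lighter
        aa, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - diff))
        if abs(mass - diff) > tol:
            return None
        sequence.append(aa)
    return "".join(sequence)
```

For example, `read_ladder([0.0, 57.02146, 128.05857, 215.09060])` gives `"GAS"`. Dropping the middle ion, as in `read_ladder([0.0, 128.05857])`, gives `"Q"`, because G + A (128.05857 Da) is nearly isobaric with Q (128.05858 Da). This shows how easily a single missing ion misleads the naive approach.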
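The spectrum-graph formulation, in which vertices are linked when they differ by an amino acid mass and sequencing becomes a longest path search in a DAG, can be sketched as follows. Several simplifications apply:

- each peak is assumed to be already converted to one candidate prefix (partial-peptide) mass;
- 0 and the total residue mass serve as source and sink;
- path length (number of residues) stands in for the probabilistic scoring described in the text;
- `build_spectrum_graph` and `longest_path` are hypothetical names.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(masses, tol=0.02):
    """Vertices are candidate prefix masses. An edge u -> v labeled with
    amino acid a exists when v - u is within tol of the residue mass of a."""
    vertices = sorted(masses)
    edges = {v: [] for v in vertices}
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            for aa, m in RESIDUE_MASS.items():
                if abs((v - u) - m) <= tol:
                    edges[u].append((v, aa))
    return vertices, edges


def longest_path(vertices, edges, source, sink):
    """Longest source-to-sink path, counted in edges, in the DAG.

    Every edge goes from a lighter to a heavier mass, so sorted mass order
    is a topological order and one left-to-right DP pass suffices.
    """
    best = {v: None for v in vertices}
    back = {}
    best[source] = 0
    for u in vertices:
        if best[u] is None:
            continue  # u is not reachable from the source
        for v, aa in edges[u]:
            if best[v] is None or best[u] + 1 > best[v]:
                best[v] = best[u] + 1
                back[v] = (u, aa)
    if best[sink] is None:
        return None
    labels, v = [], sink
    while v != source:
        v, aa = back[v]  # step back to the predecessor on the best path
        labels.append(aa)
    return "".join(reversed(labels))


vertices, edges = build_spectrum_graph([0.0, 57.02146, 100.0, 128.05857, 215.09060])
print(longest_path(vertices, edges, 0.0, 215.09060))
```

In this toy graph the noise vertex at 100.0 gets no edges, and the short path 0 → Q → S loses to the longer path 0 → G → A → S, so the script prints `GAS`.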
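The definitions of the electronic spectrum $E(P)$ and the match $m(S,P)$ translate directly into code. This sketch makes three assumptions:

- spectrum masses are compared to partial-peptide masses with no ion-type offset (i.e. $\delta = 0$);
- $m(s,P) = 1$ when $s$ lies within a tolerance of some mass in $E(P)$;
- the exponential search over all peptides of parent mass $m$ is replaced by a caller-supplied candidate list.

The residue table is a three-letter subset, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da); only the residues used in the example.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203}


def electronic_spectrum(peptide):
    """E(P): masses of all partial peptides p_i..p_j (all substrings of P)."""
    masses = []
    for i in range(len(peptide)):
        total = 0.0
        for j in range(i, len(peptide)):
            total += RESIDUE_MASS[peptide[j]]
            masses.append(total)
    return masses


def match(spectrum, peptide, tol=0.02):
    """m(S,P): number of spectrum masses within tol of some mass in E(P)."""
    theoretical = electronic_spectrum(peptide)
    return sum(1 for s in spectrum if any(abs(s - e) <= tol for e in theoretical))


def best_peptide(spectrum, candidates, tol=0.02):
    """Pick the candidate (all of the same parent mass) with maximal match."""
    return max(candidates, key=lambda p: match(spectrum, p, tol))


spectrum = [57.02146, 158.06914, 300.0]
print(match(spectrum, "GAS"), match(spectrum, "AGS"))
print(best_peptide(spectrum, ["AGS", "GAS"]))
```

"GAS" and "AGS" share the same parent mass. Only "GAS" explains both the 57.02 (G) and 158.07 (AS) peaks, so it wins with a match of 2 against 1.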
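The offset-probability approximation can be checked against the numbers reported in the text: $d(S) = 17.5$ and $p(\delta) = 0.23$ for the a-ion offset -27. The sketch below is arithmetic only; `offset_probability` is a hypothetical helper name. It suggests that the quoted "4 times" corresponds to the ratio $p(\delta) \cdot d(S)$. Including the noise term $(1 - p(\delta))/d(S)$ raises the ratio slightly.

```python
def offset_probability(p_delta, d_s):
    """Approximate chance of observing offset delta: (1 - p) / d(S) + p."""
    return (1.0 - p_delta) / d_s + p_delta


d_s = 17.5        # average distance between peaks reported for the sample spectra
p_random = 1.0 / d_s  # probability that an arbitrary offset is hit by chance
p_a = 0.23        # reported probability of an a-ion (offset -27)

print(round(p_random, 3))                                  # 0.057
print(round(p_a / p_random, 1))                            # 4.0
print(round(offset_probability(p_a, d_s) / p_random, 1))   # 4.8
```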
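The offset frequency function $H(x)$ can be sketched as a histogram of offsets between observed peaks and partial-peptide masses of known peptides. The sketch makes these assumptions:

- training data are pairs of a peak list and its known peptide;
- only N-terminal partial peptides are used;
- offsets are binned to 1 Da;
- an offset is measured as peak mass minus partial-peptide mass, the sign convention under which a-ions fall near -27 as in the text;
- `offset_frequency` is a hypothetical name.

```python
from collections import Counter


def offset_frequency(training, residue_mass, bin_width=1.0):
    """Histogram H(x) of offsets x = s - m(P').

    Offsets are taken over every peak s and every N-terminal partial peptide
    P' of the known peptide, for each (peaks, peptide) training pair.
    Offsets are rounded to the nearest multiple of bin_width (Da).
    """
    H = Counter()
    for peaks, peptide in training:
        prefixes, total = [], 0.0
        for aa in peptide[:-1]:  # proper prefixes P_1 .. P_{n-1}
            total += residue_mass[aa]
            prefixes.append(total)
        for s in peaks:
            for m in prefixes:
                H[round((s - m) / bin_width) * bin_width] += 1
    return H


masses = {"G": 57.02146, "A": 71.03711, "S": 87.03203}
PROTON, CO = 1.00728, 27.99491
b_ions = [57.02146 + PROTON, 57.02146 + 71.03711 + PROTON]  # b1, b2 of "GAS"
a_ions = [b - CO for b in b_ions]                            # a1, a2
H = offset_frequency([(b_ions + a_ions + [75.0], "GAS")], masses)
print(H.most_common(2))  # the two recurring offsets: about +1 (b) and -27 (a)
```

With many training spectra, true ion types pile up at fixed offsets, while noise spreads across all bins. This is why peaks of $H(x)$ reveal the instrument's ion types.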
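The intensity-rank thresholds can be sketched under an interpretation the text only implies: peaks are sorted by decreasing intensity and cut into consecutive blocks of $K$ peaks. This is consistent with "ranks 1 and 2 (i.e., the 20-30 highest-intensity peaks)". $K$ is estimated as parent mass divided by an average residue mass. The default of 110 Da is a commonly used approximation and an assumption here, and `intensity_ranks` is a hypothetical name.

```python
def intensity_ranks(intensities, parent_mass, avg_residue_mass=110.0):
    """Return (K, ranks) for a list of peak intensities.

    K estimates the peptide length as parent mass / average residue mass.
    Peaks are sorted by decreasing intensity and split into blocks of K
    (ASSUMPTION): rank 1 is the K most intense peaks, rank 2 the next K,
    and so on.
    """
    K = max(1, round(parent_mass / avg_residue_mass))
    order = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    ranks = [0] * len(intensities)
    for position, i in enumerate(order):
        ranks[i] = position // K + 1
    return K, ranks


K, ranks = intensity_ranks(list(range(25)), parent_mass=1100.0)
b_candidates = [i for i, r in enumerate(ranks) if r in (1, 2, 3)]  # b-ions: ranks 1-3
y_candidates = [i for i, r in enumerate(ranks) if r in (1, 2)]     # y-ions: ranks 1-2
print(K, len(b_candidates), len(y_candidates))
```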
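The text states that path weights are derived from the probability $p(P,S)$ but does not spell out the formula here. One standard way to turn learned per-ion-type probabilities into additive vertex weights is a log-odds score, sketched below. This is a swapped-in technique, not necessarily SHERENGA's exact scheme, and it rests on two assumptions:

- ion types are independent;
- $q$ is the chance of a random match, for example $1/d(S)$.

The b and y probabilities in the example are illustrative values, not taken from the source.

```python
import math


def vertex_score(observed, ion_probs, q_random):
    """Log-odds score of one vertex.

    For each ion type with learned probability p, add log(p / q) if a peak
    supports that ion type and log((1 - p) / (1 - q)) if it does not.
    """
    score = 0.0
    for ion, p in ion_probs.items():
        if ion in observed:
            score += math.log(p / q_random)
        else:
            score += math.log((1.0 - p) / (1.0 - q_random))
    return score


def path_score(path_observations, ion_probs, q_random):
    """Score of a path: sum of its vertex scores (independence assumed)."""
    return sum(vertex_score(obs, ion_probs, q_random) for obs in path_observations)


probs = {"b": 0.5, "y": 0.6, "a": 0.23}  # b, y illustrative; a = 0.23 from the text
q = 1.0 / 17.5
print(vertex_score({"b", "y"}, probs, q) > 0 > vertex_score(set(), probs, q))  # True
```

Because the weights add along a path, the highest-scoring path in the weighted spectrum graph maximizes the (approximate) log-likelihood ratio.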
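Why the sequencing instance yields a proper graph can be checked directly under one assumption, which the truncated twin description suggests but does not state: each vertex at mass $m$ has its twin at $M - m$ for a fixed total mass $M$. Every twin interval is then centred on $M/2$, so any two intervals are nested. The helper names in this sketch are hypothetical.

```python
def twin_pairs(masses, total):
    """Forbidden pairs: each vertex mass m with its twin total - m, as (low, high)."""
    return [tuple(sorted((m, total - m))) for m in masses]


def non_interleaving(p, q):
    """True if one interval is contained in the other."""
    (x1, y1), (x2, y2) = p, q
    return (x1 <= x2 and y2 <= y1) or (x2 <= x1 and y1 <= y2)


def is_proper(pairs):
    """A graph is proper if every two forbidden pairs are non-interleaving."""
    return all(non_interleaving(p, q) for i, p in enumerate(pairs) for q in pairs[i + 1:])


def is_anti_symmetric(path, pairs):
    """A path is anti-symmetric if it holds at most one vertex of every pair."""
    on_path = set(path)
    return all(not (x in on_path and y in on_path) for x, y in pairs)


pairs = twin_pairs([57.0, 128.0, 300.0], total=400.0)
print(pairs)                          # [(57.0, 343.0), (128.0, 272.0), (100.0, 300.0)]
print(is_proper(pairs))               # True: all intervals are centred on 200
print(is_proper([(1, 5), (3, 8)]))    # False: these intervals interleave
```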
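The greedy merging step, combined with the intensity-weighted average $\frac{i(s)\,u + i(t)\,v}{i(s) + i(t)}$ given in the definitions, can be sketched as one pass over vertices sorted by mass. The sketch makes three assumptions:

- a vertex is merged into the previous one when their masses differ by at most a tolerance;
- the merged vertex keeps the summed intensity;
- the tolerance value and the name `greedy_merge` are hypothetical.

```python
def greedy_merge(vertices, tol=0.5):
    """Merge nearby vertices greedily.

    vertices: list of (mass, intensity) pairs. Scanning in mass order, a
    vertex within tol of the current merged vertex is absorbed into it. The
    merged mass is the intensity-weighted average (i_u*u + i_v*v)/(i_u + i_v)
    and the merged intensity is i_u + i_v.
    """
    merged = []
    for mass, intensity in sorted(vertices):
        if merged and abs(mass - merged[-1][0]) <= tol:
            u, i_u = merged[-1]
            merged[-1] = ((i_u * u + intensity * mass) / (i_u + intensity), i_u + intensity)
        else:
            merged.append((mass, intensity))
    return merged


print(greedy_merge([(100.0, 1.0), (100.3, 3.0), (200.0, 2.0)]))
```

The merged vertex lands near 100.225, closer to the more intense of the two peaks, and the vertex at 200.0 is left alone.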
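The parent-mass correction chooses the $x$ that maximizes $c(S, S(x))$ and breaks ties by the smallest sum of distances. It can be sketched under the following assumptions, since the definition of $S(x)$ is cut off in the text:

- $S(x) = \{x - s : s \in S\}$ is a reflected spectrum, so complementary N- and C-terminal ions map onto each other;
- $x$ therefore estimates the sum of complementary ion masses, which differs from the parent mass by a constant that depends on ion types and charges;
- the scan window, step, tolerance and function names are hypothetical.

```python
def complement_score(peaks, x, tol=0.5):
    """c(S, S(x)) with S(x) = {x - t : t in S} (ASSUMPTION).

    Counts pairs (s, t) with |s - (x - t)| <= tol and also returns the
    summed distance of those pairs for tie-breaking.
    """
    count, dist = 0, 0.0
    for s in peaks:
        for t in peaks:
            d = abs(s - (x - t))
            if d <= tol:
                count += 1
                dist += d
    return count, dist


def correct_parent_mass(peaks, guess, window=2.0, step=0.1, tol=0.5):
    """Scan x in [guess - window, guess + window]. Keep the x with the largest
    c(S, S(x)); among equal counts, keep the smallest summed distance."""
    n = int(round(2 * window / step))
    best_x, best_key = guess, None
    for k in range(n + 1):
        x = guess - window + k * step
        count, dist = complement_score(peaks, x, tol)
        key = (count, -dist)
        if best_key is None or key > best_key:
            best_x, best_key = x, key
    return best_x


peaks = [100.0, 400.0, 150.0, 350.0, 230.0]  # two complementary pairs summing to 500
print(round(correct_parent_mass(peaks, guess=499.0), 3))
```

Every $x$ between 499.5 and 500.5 matches the same four pairs. The distance tie-break is what pins the estimate to about 500.0.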
Landscapes
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Organic Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Biochemistry (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Medicinal Chemistry (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention relates to a new algorithm, SHERENGA, for de novo spectral interpretation that automatically learns fragment ion types and intensity thresholds from a collected set of training spectra generated by any type of mass spectrometer. The algorithm implements a graph-theoretical approach. The training data are used to construct optimal path scoring in graph representations of tandem mass spectra. A ranked list of high-scoring paths corresponds to potential peptide sequences. SHERENGA is particularly useful for interpreting the sequences of peptides from unknown proteins not yet encountered in genome sequencing, as well as for text-based pattern matching in homology searches against known proteins. The algorithm also serves as an effective complement for validating the results of database-matching algorithms in high-throughput, fully automated peptide sequencing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU42284/99A AU4228499A (en) | 1998-06-03 | 1999-06-02 | Protein sequencing using tandem mass spectroscopy |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US8778598P | 1998-06-03 | 1998-06-03 | |
US60/087,785 | 1998-06-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999062930A2 (fr) | 1999-12-09 |
Family
ID=22207246
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/012221 | WO1999062930A2 (fr): Protein sequencing using tandem mass spectroscopy | 1998-06-03 | 1999-06-02 |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU4228499A (fr) |
WO (1) | WO1999062930A2 (fr) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002021139A2 (fr) * | 2000-09-08 | 2002-03-14 | Oxford Glycosciences (Uk) Ltd. | Identification automatisee de peptides |
WO2003046577A1 (fr) * | 2001-11-30 | 2003-06-05 | The European Molecular Biology Laboratory | Systeme et procede de sequencage automatique de proteines par spectrometrie de masse |
WO2003075306A1 (fr) * | 2002-03-01 | 2003-09-12 | Applera Corporation | Procede d'identification de proteines au moyen de donnees de spectrometrie de masse |
WO2003098190A2 (fr) * | 2002-05-20 | 2003-11-27 | Purdue Research Foundation | Identification de proteines a partir de spectres d'ions produits de proteines |
EP1366360A2 (fr) * | 2001-03-09 | 2003-12-03 | Applera Corporation | Procedes d'appariement de proteines a grande echelle |
WO2004008371A1 (fr) * | 2002-07-10 | 2004-01-22 | Institut Suisse De Bioinformatique | Procede d'identification de peptides et de proteines |
WO2004083233A2 (fr) * | 2003-02-10 | 2004-09-30 | Battelle Memorial Institute | Identification de peptides |
US6800449B1 (en) | 2001-07-13 | 2004-10-05 | Syngenta Participations Ag | High throughput functional proteomics |
DE10323917A1 (de) * | 2003-05-23 | 2004-12-16 | Protagen Ag | Verfahren und System zur Aufklärung der Primärstruktur von Biopolymeren |
US6963807B2 (en) | 2000-09-08 | 2005-11-08 | Oxford Glycosciences (Uk) Ltd. | Automated identification of peptides |
US7158862B2 (en) * | 2000-06-12 | 2007-01-02 | The Arizona Board Of Regents On Behalf Of The University Of Arizona | Method and system for mining mass spectral data |
DE102011014805A1 (de) * | 2011-03-18 | 2012-09-20 | Friedrich-Schiller-Universität Jena | Verfahren zur Identifizierung insbesondere unbekannter Substanzen durch Massenspektrometrie |
-
1999
- 1999-06-02 AU AU42284/99A patent/AU4228499A/en not_active Withdrawn
- 1999-06-02 WO PCT/US1999/012221 patent/WO1999062930A2/fr active Application Filing
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7158862B2 (en) * | 2000-06-12 | 2007-01-02 | The Arizona Board Of Regents On Behalf Of The University Of Arizona | Method and system for mining mass spectral data |
WO2002021139A3 (fr) * | 2000-09-08 | 2003-02-06 | Oxford Glycosciences Uk Ltd | Identification automatisee de peptides |
WO2002021139A2 (fr) * | 2000-09-08 | 2002-03-14 | Oxford Glycosciences (Uk) Ltd. | Identification automatisee de peptides |
US6963807B2 (en) | 2000-09-08 | 2005-11-08 | Oxford Glycosciences (Uk) Ltd. | Automated identification of peptides |
EP1366360A2 (fr) * | 2001-03-09 | 2003-12-03 | Applera Corporation | Procedes d'appariement de proteines a grande echelle |
EP1366360A4 (fr) * | 2001-03-09 | 2005-03-16 | Applera Corp | Procedes d'appariement de proteines a grande echelle |
US6800449B1 (en) | 2001-07-13 | 2004-10-05 | Syngenta Participations Ag | High throughput functional proteomics |
WO2003046577A1 (fr) * | 2001-11-30 | 2003-06-05 | The European Molecular Biology Laboratory | Systeme et procede de sequencage automatique de proteines par spectrometrie de masse |
WO2003075306A1 (fr) * | 2002-03-01 | 2003-09-12 | Applera Corporation | Procede d'identification de proteines au moyen de donnees de spectrometrie de masse |
WO2003098190A2 (fr) * | 2002-05-20 | 2003-11-27 | Purdue Research Foundation | Identification de proteines a partir de spectres d'ions produits de proteines |
WO2003098190A3 (fr) * | 2002-05-20 | 2004-07-15 | Purdue Research Foundation | Identification de proteines a partir de spectres d'ions produits de proteines |
WO2004008371A1 (fr) * | 2002-07-10 | 2004-01-22 | Institut Suisse De Bioinformatique | Procede d'identification de peptides et de proteines |
WO2004083233A3 (fr) * | 2003-02-10 | 2004-12-29 | Battelle Memorial Institute | Identification de peptides |
WO2004083233A2 (fr) * | 2003-02-10 | 2004-09-30 | Battelle Memorial Institute | Identification de peptides |
US7979214B2 (en) | 2003-02-10 | 2011-07-12 | Battelle Memorial Institute | Peptide identification |
DE10323917A1 (de) * | 2003-05-23 | 2004-12-16 | Protagen Ag | Verfahren und System zur Aufklärung der Primärstruktur von Biopolymeren |
DE102011014805A1 (de) * | 2011-03-18 | 2012-09-20 | Friedrich-Schiller-Universität Jena | Verfahren zur Identifizierung insbesondere unbekannter Substanzen durch Massenspektrometrie |
Also Published As
Publication number | Publication date |
---|---|
AU4228499A (en) | 1999-12-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
n/a | AK | Designated states | Kind code of ref document: A2. Designated state(s): AU CA JP |
n/a | AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
n/a | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | |
n/a | WA | Withdrawal of international application | |
n/a | 122 | Ep: PCT application non-entry in European phase | |