WO2005057208A1 - Method for identifying peptides and proteins
- Publication number
- WO2005057208A1 (PCT/US2004/040225)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- peptide
- peptide sequence
- putative
- protein
- sequence
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6848—Methods of protein analysis involving mass spectrometry
Definitions
- proteomics involves characterization of gene and cellular function by determining the activities, interactions, localization and modifications of individual proteins and protein complexes present in a cell or tissue.
- a proteome is highly dynamic because the types of proteins expressed by a cell and their abundances, modifications and subcellular locations vary substantially with the physiological condition of a cell or tissue. Characterization of changes in protein content and activity in response to disease, therefore, may assist in identifying new targets useful for drug development and novel biomarkers for the diagnosis and early detection of disease. Furthermore, proteomics research is highly complementary to other functional approaches to understanding cellular processes, such as microarray-based expression profiles, systematic genetics, and small molecule based arrays. Integration of information from these diverse perspectives via bioinformatic analysis promises to greatly facilitate our emerging understanding of systems-level cellular behavior.
- Proteomics is a complex field involving a very large number of proteins and protein complexes corresponding to a genome.
- The human proteome is expected to consist of between about 400,000 and about 1,000,000 proteins, which may interact to form a huge number of protein-protein complexes important in regulating cellular behavior.
- This complexity is further compounded by the large dynamic range associated with protein expression, typically exceeding six orders of magnitude, and by important post-translational modifications that affect protein activity and function ["From Genomics to Proteomics," Tyers, M. and Mann, Matthias, Nature, Vol. 422, pg 193-197 (2003)].
- Mass spectrometry has played a long-standing role in identifying proteins in complex mixtures, probing protein-protein interactions and characterizing post-translational modifications. Mass spectrometric analysis provides sensitive, fast and selective detection and requires extremely small quantities of protein samples. In addition, mass spectrometric analysis is well suited for automated, high throughput operation, particularly when combined with multidimensional separation techniques, such as high performance liquid chromatography (HPLC) or capillary electrophoresis.
- HPLC: high performance liquid chromatography
- The application of mass spectrometric methods to protein identification has been the subject of numerous scientific publications, including "Mass Spectrometry and the Age of the Proteome," Yates, J.R., J. Mass Spectrometry, Vol. 33, 1-19 (1998); "Mass Spectrometry-based Proteomics," Aebersold, R. and Mann, Matthias, Nature, Vol. 422, 198-207 (2003); "Proteomics to Study Genes and Genomes," Pandey, A. and Mann, M., Nature, Vol. 405, 837-846 (2000); "Mass Spectrometry in Proteomics," Aebersold, R. and Goodlett, D.R., Chem. Rev., Vol. 101, 269-295 (2001); "An Automated Multidimensional Protein Identification Technology for Shotgun Proteomics," Wolters, D.A., Washburn, M.P. and Yates, J.R., III, Anal. Chem., Vol. 73, 5683-5690 (2001); and "Analysis of Proteins and Proteomes by Mass Spectrometry," Mann, M., Hendrickson, R.C. and Pandey, A., Annu. Rev. Biochem., Vol. 70, 437-473 (2001), which are all hereby incorporated by reference in their entireties to the extent not inconsistent with the present description.
- protein sequences are determined by stepwise enzymatic degradation of purified proteins into peptide fragments, for example by trypsin digestion, and subsequent mass analysis of peptide fragments by mass spectrometry.
- The recent availability of complete or partially complete gene and genome sequence databases has revolutionized the use of mass spectrometry to identify proteins.
- peptide mapping mass spectrometric methods such as peptide-mass mapping or peptide-mass fingerprinting.
- A purified protein sample is subjected to proteolytic digestion and the resulting peptides are analyzed in a mass spectrometer, commonly an electrospray ionization (ESI) mass spectrometer or a matrix-assisted laser desorption/ionization (MALDI) mass spectrometer, thereby generating a list of experimentally determined peptide molecular masses.
- ESI: electrospray ionization
- MALDI: matrix-assisted laser desorption/ionization
- Protein sequences are identified by matching the list of experimentally determined peptide masses with calculated lists of all possible peptide masses in each entry of a comprehensive protein sequence database. Protein identification via peptide mapping may often be improved by incorporation of auxiliary sequence database search constraints including the estimated molecular weight of the parent protein and the cleavage specificity of the protease used for digestion.
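The peptide-mass fingerprinting search described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular search engine: the function names (`tryptic_digest`, `peptide_mass`, `fingerprint_hits`, `best_match`), the 0.01 Da tolerance and the toy database are assumptions made for the example, and missed cleavages and modifications are ignored. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_hits(observed_masses, protein, tol=0.01):
    """Count observed masses that match some theoretical tryptic peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tol=0.01):
    """Return the name of the database entry with the most matching masses."""
    return max(
        database,
        key=lambda name: fingerprint_hits(observed_masses, database[name], tol),
    )
```

The cleavage rule in `tryptic_digest` is one example of the protease-specificity constraint mentioned above; a parent-protein molecular weight filter could be added in the same way by discarding database entries outside an assumed mass window before counting hits.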
- Tandem mass spectrometry (MS/MS) analysis methods have recently replaced traditional peptide-mapping or peptide-mass fingerprinting methods as the preferred protein identification technique.
- MS/MS: tandem mass spectrometry
- a protein-containing sample is first subjected to proteolytic digestion, usually by a protease having high digestion specificity, such as trypsin.
- the resulting complex mixture of peptides is fractionated and delivered to a mass spectrometer for peptide identification.
- Peptides are ionized, thereby forming precursor ions, which are selectively transmitted by a first mass analyzer on the basis of molecular mass, charge state or both.
- Transmitted precursor ions are broken down into fragment ions (or daughter ions) of the precursor ion. Fragment ions are subsequently mass analyzed by a second mass analyzer and detected, thereby generating a fragmentation mass spectrum comprising a series of peaks corresponding to the mass-to-charge ratios of all charge carrying fragments generated upon dissociation.
- each peptide in the complex mixture may be characterized in terms of: (1) a precursor ion molecular mass and (2) a fragmentation spectrum.
- Acquired fragmentation spectra are analyzed using peptide sequence database search tools (e.g. spectrum matching tools), which compare acquired fragmentation spectra to theoretical peptide fragmentation spectra generated on the basis of peptide molecular mass.
- a de novo peptide-sequencing algorithm may be used to directly interpret a fragmentation mass spectrum and provide putative peptide sequence assignments.
- the output of these search tools is a list of putative peptide sequence assignments for each peptide analyzed.
- each putative peptide sequence assignment in the list is characterized by a confidence score that is intended to provide a measure of the accuracy of the assignment.
- the scored list of putative peptide assignments is provided as input to a protein identification algorithm which reconstructs the sequences of proteins originally present in the sample using a protein sequence database derived from protein, gene and genome sequence information.
- MS/MS analysis characterizes each peptide with respect to a fragmentation spectrum, in addition to peptide molecular mass.
- the peptide fragmentation process depends on the amino acid sequence of a peptide and in many cases the fragmentation products of a given peptide can be accurately predicted.
- peptide fragmentation spectra often provide unique signatures of peptide identities. For example, different peptides having the same or similar molecular masses are often easily distinguished on the basis of the peak patterns in their respective fragmentation spectra.
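To illustrate why fragmentation spectra can separate peptides of equal mass, the sketch below predicts singly charged b- and y-type fragment m/z values from a sequence. It assumes simple backbone cleavage only (no neutral losses, no multiply charged fragments); the function names are hypothetical, and the masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b-ion m/z values, b1 through b(n-1)."""
    out, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 through y(n-1)."""
    out, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        out.append(total + WATER + PROTON)
    return out


def neutral_mass(peptide):
    """Neutral monoisotopic peptide mass."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
```

For example, `GASK` and `AGSK` have identical neutral masses, yet `b_ions("GASK")[0]` and `b_ions("AGSK")[0]` differ by the mass difference between glycine and alanine, so the two sequences give distinguishable peak patterns.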
- Second, the additional information contained in fragmentation spectra allows specific proteins to be identified in the presence of other proteins.
- MS/MS analysis methods are capable of identification and characterization of proteins in complex mixtures and are well suited for high throughput analysis.
- Several algorithms have recently emerged for relating MS/MS data to sequence information using either spectrum matching or de novo peptide sequencing methods.
- The "peptide sequence tag" approach uses short, unambiguous amino acid sequences, which are derived from fragmentation spectra, in combination with peptide molecular mass measurements, to provide a specific probe to determine the identity of some proteins in a sample.
- The "cross-correlation method" uses peptide sequences extracted from a protein sequence database to generate theoretical peptide fragmentation spectra.
- Time-of-flight/time-of-flight (TOF-TOF) instrumentation provides for MS/MS analysis of primarily singly charged precursor ions generated by matrix-assisted laser desorption/ionization methods.
- triple quadrupole mass spectrometers provide MS/MS analysis of multiply charged precursor ions generated by electrospray ionization methods.
- stable-isotope dilution methods enhance the extent of quantitative information, which may be extracted from peptide fragmentation spectra.
- stable isotope tags are introduced into proteins via metabolic labeling, enzymatic reactions or via chemical reactions using isotope-coded affinity tags.
- Peptide sequence verification primarily involves distinguishing correct peptide assignments from false identifications in peptide or protein sequence database search results.
- a range of approaches to the problem of peptide sequence verification has been examined over the last several years.
- manual verification by researchers having expertise in fragmentation spectrum interpretation may assist in reducing or eliminating false peptide sequence assignments.
- Such manual verification approaches are not feasible for the analysis of high throughput data sets, which may comprise thousands of individual peptide fragmentation spectra.
- manual verification regularly entails a significant amount of subjective fragmentation spectrum interpretation, which often generates different protein identifications from different experts.
- sequence assignment filtering methods have also been applied to peptide sequence assignment verification.
- Filtering criteria based on observed and predicted peak positions in fragmentation spectra are applied to either the list of putative peptide sequence assignments or to the list of protein sequence assignments to reduce the rate of false peptide and/or protein identifications.
- Although these methods reduce the sheer amount of MS/MS data input into protein identification algorithms, no single filtering criterion is capable of providing a truly correct list of peptides or proteins.
- attempts have been made to combine a plurality of filters in a serial fashion. Use of a plurality of filtering criteria, however, often results in a propagation of errors associated with each filter criterion and may actually increase the incidence of peptide and protein misidentification.
- serial filtering data reduction methods fail to utilize important correlations between filtering criteria and other experimentation parameters not derived from fragmentation spectra, which may be especially important for sequence assignment verification.
- This invention provides methods for identifying biological molecules by mass spectrometry, particularly peptides and proteins.
- the present invention provides methods of identifying peptides from fragmentation spectra.
- the present invention includes methods of identifying proteins by analyzing peptides derived from proteins. It is an object of the present invention to provide methods of correlating MS/MS data to amino acid sequences in protein sequence and/or peptide sequence databases and/or amino acid sequences derived from de novo peptide-sequencing algorithms.
- the present invention provides methods for identifying the amino acid sequence of peptides wherein peptide sequence annotators are used to independently verify assigned peptide sequence identities.
- the molecular mass of a peptide analyte is determined and a peptide fragmentation mass spectrum is generated comprising a series of peaks corresponding to fragments of the peptide analyte.
- a plurality of putative peptide sequence assignments is generated using any peptide sequence assignment algorithm or method that uses peptide fragmentation mass spectra and/or peptide mass data as input.
- putative peptide sequence assignments are generated using a spectrum matching algorithm in combination with one or more protein sequence databases comprising a plurality of protein amino acid sequences and/or peptide sequence databases comprising a plurality of peptide sequences and peptide fragment sequences.
- Exemplary peptide sequence databases useful in the methods of the present invention may be derived from protein amino acid sequence data and/or genomic data.
- peptide sequence assignments are generated using one or more de novo peptide-sequencing algorithms.
- the putative peptide sequence assignments generated by the spectrum matching algorithm or de novo peptide-sequencing algorithm are characterized by molecular masses within a selected range of the molecular mass of said peptide analyte.
- putative peptide sequence assignments may be determined on the basis of both measured peptide analyte mass and observed peak positions in a fragmentation spectrum using information from a protein sequence database and/or information from a de novo peptide-sequencing algorithm.
- a peptide sequence annotator index comprising a plurality of peptide sequence annotators is compiled for each of the putative peptide sequence assignments.
- at least a portion of the peptide sequence annotators are determined by comparing the fragmentation mass spectrum of the peptide analyte to the entries of one or more peptide or protein sequence databases comprising masses of peptides, masses of fragments of peptides or both and/or comparing the fragmentation spectrum to one or more protein sequence databases comprising protein amino acid sequences.
- annotators can be determined on the basis of other characteristics, physical properties and/or chemical properties corresponding to each putative peptide sequence assignment, such as predicted retention times, elution times and mobilities on specific chromatographic media, molecular mass and expected fragmentation products. Further, annotators can be determined on the basis of mass spectrometric and/or chromatographic instrumentation used to analyze the peptide analyte, experimental conditions in the mass spectrometer, the composition and/or purity of the sample containing the peptide analyte, and statistical parameters characterizing the closeness of the measured peptide fragmentation spectrum to a theoretical fragmentation spectrum corresponding to a given putative peptide sequence assignment.
- At least a portion of the peptide sequence annotators in each peptide sequence annotator index is input into a parallel confidence assessment algorithm. Selection of which peptide sequence annotators are input into the parallel confidence assessment algorithm in the present invention may be based on a wide range of experimental parameters including, but not limited to, the instrumentation used for MS/MS analysis, the composition of the sample containing the peptide analyte, sample purity, the presence of known background proteins, MS/MS data quality, signal to noise ratio in the peptide fragmentation mass spectrum, and precursor ion charge state. Operation of the parallel confidence assessment algorithm generates a quantitative confidence assessment for each putative peptide sequence assignment generated using the protein sequence database. The identity of the peptide analyte is determined by selecting the putative peptide sequence assignment having the highest confidence assessment.
- a parallel confidence assessment algorithm of the present invention calculates the sum of selected peptide sequence annotators multiplied by weighting factors, each selected to achieve an accurate assessment of the confidence of each putative peptide sequence assignment.
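A weighted-sum confidence assessment of this kind might look like the following sketch. The annotator names (`matched_y`, `matched_b`, `mass_error_ppm`) and the weight values are invented for illustration only; the description does not specify them, and in practice the weights would have to be chosen or trained on data with known identities.

```python
def confidence(annotators, weights):
    """Weighted sum of the selected annotators (those that have a weight)."""
    return sum(
        weights[name] * value
        for name, value in annotators.items()
        if name in weights
    )


def identify(candidates, weights):
    """Score every putative assignment and return the best one with all scores.

    candidates maps a putative sequence to its annotator index (a dict).
    """
    scored = {seq: confidence(index, weights) for seq, index in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored


# Hypothetical annotator indices for two putative assignments.
candidates = {
    "PEPTIDEK": {"matched_y": 6, "matched_b": 4, "mass_error_ppm": 2.0},
    "PEPTLDEK": {"matched_y": 3, "matched_b": 2, "mass_error_ppm": 8.0},
}
# Hypothetical weights; a negative weight penalizes large mass errors.
weights = {"matched_y": 1.0, "matched_b": 0.5, "mass_error_ppm": -0.25}

best, scored = identify(candidates, weights)
```

Because all annotators enter one expression together, a weak value in one annotator can be offset by strong values in others, which is the contrast with serial filtering, where a single failed criterion discards the assignment.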
- the parallel confidence assessment algorithm comprises a series of peptide sequence assessment rules determined by an artificial neural network algorithm, preferably peptide sequence assessment rules derived from the operation of an artificial neural network algorithm on one or more MS/MS data sets generated by analyzing a plurality of peptides having known identities or one or more MS/MS data sets wherein sequence assignments are manually verified.
- the parallel confidence assessment algorithm evaluates one or more correlations between different peptide sequence annotators in each peptide sequence annotator index to achieve an accurate assessment of the confidence of each putative peptide sequence assignment.
- Peptide sequence annotators useable in the present invention comprise any information useful by itself or in combination with other annotators for assessing the accuracy of a putative peptide sequence assignment.
- peptide sequence annotators of the present invention are capable of distinguishing correct putative peptide sequence assignments from incorrect putative peptide sequence assignments and are capable of determining a confidence assessment for each putative peptide sequence assignment.
- Sequence annotators may be determined empirically and/or predicted from known information relating to a putative peptide sequence assignment.
- Sequence annotators may be derived from predicted chemical and/or physical properties of putative peptide sequence assignments, for example, on the basis of the amino acid sequence, the presence of modifications in the amino acids comprising the peptide, size, affinity, structure, molecular mass or any combination of these properties.
- peptide sequence annotators of the present invention may be derived from experimental conditions or measurements.
- exemplary sequence annotators may be derived from ionization conditions or fragmentation conditions in a mass spectrometer.
- peptide sequence annotators of the present invention may be derived from the combination of predicted chemical or physical properties of putative peptide sequence assignments and experimental conditions or measurements.
- exemplary annotators may be derived from a comparison of the measured molecular mass or observed fragmentation spectrum of a peptide analyte and the molecular mass or predicted fragments corresponding to a putative peptide sequence assignment.
- exemplary sequence annotators are derived from correlations between predicted fragmentation patterns of putative peptide sequence assignments and fragment masses extracted from a peptide fragmentation mass spectrum.
- the observed pattern of peaks in a fragmentation mass spectrum is analyzed to provide a series of intensities corresponding to a plurality of fragments having different molecular masses.
- Annotators of the present invention may be determined by comparing the masses and/or relative intensities of fragments observed in a peptide fragmentation mass spectrum to peptide fragments predicted for a given putative peptide sequence assignment using one or more peptide and/or protein sequence databases comprising amino acid sequences of proteins, masses of peptides, mass of expected fragments of peptides or any combination of these.
- Annotators of the present invention may be determined by comparing the masses and/or relative intensities of fragments observed in a peptide fragmentation mass spectrum to peptide fragments predicted on the basis of known peptide fragmentation kinetics and dynamics.
- The phrase "fragments predicted for each putative peptide sequence assignment" refers to the fragments that are expected to be generated upon analysis of a peptide having the same sequence as the putative peptide sequence via a selected MS/MS analysis method or instrument.
- the presence or absence of "matching fragments" which are present in a peptide fragmentation mass spectrum and predicted for a selected putative peptide sequence assignment is an indicator as to the accuracy of a given putative peptide sequence assignment.
- Annotators of the present invention can be derived by analysis of the molecular masses corresponding to all fragments observed in a peptide fragmentation spectrum or from a wide variety of specific fragment types, such as a-type fragments, b-type fragments, c-type fragments, x-type fragments, y-type fragments, z-type fragments, internal fragments, immonium ions and satellite ions.
- Exemplary annotators of the present invention comprise cumulative relative intensities and/or numbers of all matching fragments, all matching a-type fragments, all matching b-type fragments, all matching c-type fragments, all matching x-type fragments, all matching y-type fragments, all matching z-type fragments, all matching internal fragments and all matching immonium ions.
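As a rough illustration of the matching-fragment annotators listed above, the sketch below counts observed peaks that fall within a tolerance of a predicted fragment and sums their intensities, separately for each fragment type. Normalizing to the total ion intensity and using a 0.5 m/z tolerance are assumptions made for the example; normalizing to the base peak would be an equally plausible reading of "relative intensity".

```python
def matching_fragment_annotators(observed_peaks, predicted_by_type, tol=0.5):
    """Per fragment type, count matching peaks and sum their relative intensity.

    observed_peaks: list of (mz, intensity) tuples from the fragmentation spectrum.
    predicted_by_type: dict such as {"b": [...], "y": [...]} of predicted m/z values.
    """
    total = sum(intensity for _, intensity in observed_peaks)
    annotators = {}
    for frag_type, predicted in predicted_by_type.items():
        matched = [
            intensity
            for mz, intensity in observed_peaks
            if any(abs(mz - p) <= tol for p in predicted)
        ]
        annotators["n_matched_" + frag_type] = len(matched)
        annotators["rel_intensity_" + frag_type] = (
            sum(matched) / total if total else 0.0
        )
    return annotators


# Toy spectrum and toy predictions (values are illustrative only).
peaks = [(100.0, 10.0), (200.0, 30.0), (300.0, 60.0)]
predicted = {"b": [100.2], "y": [300.1, 400.0]}
index = matching_fragment_annotators(peaks, predicted)
```

The returned dictionary can serve directly as part of an annotator index for one putative peptide sequence assignment.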
- annotators may be derived from correlations between the molecular masses and/or relative intensities of two or more observed or predicted peptide fragments.
- An exemplary annotator of the present invention is determined by comparing fragment masses observed in a fragmentation mass spectrum and de novo sequence tags predicted for a given putative peptide sequence assignment.
- The term "de novo sequence tags" refers to one or more correlations between the masses of peptide fragments predicted on the basis of the amino acid sequence of a given putative peptide sequence assignment. Such correlations typically comprise one or more mass differences between a plurality of expected peptide fragments.
- the presence or absence of fragment masses extracted from a fragmentation spectrum which are characterized by the same correlations as a de novo sequence tag is an indicator as to the accuracy of a given putative peptide sequence assignment.
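One simple way to picture this annotator is to take the mass differences between consecutive predicted fragments as the tags and to check whether any pair of observed peaks shows the same difference. The sketch below does exactly that and nothing more; the function names, the use of consecutive differences only, and the 0.02 tolerance are assumptions for illustration.

```python
from itertools import combinations


def tag_differences(predicted_mz):
    """Mass differences between consecutive predicted fragments (the tags)."""
    ordered = sorted(predicted_mz)
    return [hi - lo for lo, hi in zip(ordered, ordered[1:])]


def count_supported_tags(observed_mz, predicted_mz, tol=0.02):
    """Count tags whose mass difference also occurs between two observed peaks."""
    observed_diffs = [abs(a - b) for a, b in combinations(observed_mz, 2)]
    return sum(
        any(abs(diff - tag) <= tol for diff in observed_diffs)
        for tag in tag_differences(predicted_mz)
    )
```

Because only differences are compared, this kind of annotator is insensitive to a constant mass offset between the observed and predicted fragment ladders, which a direct peak-position comparison would not tolerate.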
- Exemplary sequence annotators are derived from peptide fractionation and/or separation properties, such as elution time, retention time and mobility on specific chromatographic media under specific conditions.
- Exemplary separation properties include mobilities and/or retention times characterized using liquid phase and/or gas phase chromatographic methods.
- a peptide-containing sample is subjected to fractionation prior to MS/MS analysis and peptide analytes in the sample are characterized in terms of retention time, elution time or mobility.
- Exemplary fractionation techniques useful in the present invention include, but are not limited to, chromatographic and electrophoresis methods, such as capillary electrophoresis.
- Annotators of the present invention may be determined by comparing the fractionation and/or separation properties experimentally determined for a peptide analyte to the predicted fractionation and/or separation properties of a peptide corresponding to a putative peptide sequence assignment.
- An exemplary annotator of the present invention is determined by comparing the observed retention time of a peptide analyte on specific chromatographic media to a retention time predicted for a peptide having the same sequence as a putative peptide sequence assignment on the same chromatographic media. Retention times and other fractionation/separation properties can be predicted for a putative peptide sequence on the basis of amino acid sequence, peptide structure, size, shape, affinity or any combination of these properties.
- exemplary sequence annotators are derived by comparing the measured molecular mass of a peptide analyte and the molecular mass corresponding to a putative peptide sequence assignment.
- An exemplary peptide sequence annotator is determined by subtracting the experimentally determined molecular mass of a peptide analyte from the molecular mass corresponding to a selected putative peptide assignment. The closeness of these molecular masses is an indication of the accuracy of a given putative peptide assignment.
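The mass-difference annotator reduces to a subtraction, shown here together with a precursor mass window test of the kind used to restrict putative assignments to a selected range around the measured mass. Reporting the difference in parts per million as well as in daltons, and the function names themselves, are additions made for this example.

```python
def mass_annotators(measured_mass, candidate_mass):
    """Candidate mass minus measured mass, in daltons and in ppm."""
    delta_da = candidate_mass - measured_mass
    delta_ppm = delta_da / candidate_mass * 1e6
    return {"delta_da": delta_da, "delta_ppm": delta_ppm}


def within_window(measured_mass, candidate_mass, window_da):
    """True if the candidate mass lies within +/- window_da of the measured mass."""
    return abs(candidate_mass - measured_mass) <= window_da
```

A value of `delta_da` near zero supports the assignment; how near is near enough depends on the mass accuracy of the instrument, which is why this annotator is best weighed together with the others rather than applied as a hard cutoff.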
- the peptide identification methods of the present invention provide several advantages over conventional methods of peptide identification by mass spectrometry.
- the methods of the present invention are highly versatile and are applicable to a wide variety of mass spectrometric analysis methods and instrumentation.
- the present methods are amenable to computer assisted automation and, thus, are well suited to high throughput analysis of a large number of different peptide analytes.
- Peptide sequence verification provided by the present invention uses an objective validation criterion based on an observed peptide fragmentation spectrum and predicted chemical and/or physical properties of peptides. Therefore, the present methods are not susceptible to operator-introduced subjective bias.
- the present methods reduce the number of false peptide sequence assignments and missed peptide sequence assignments generated for a given peptide analyte.
- the present invention provides methods of quantifying the confidence of a putative peptide sequence assignment generated by a de novo peptide-sequencing algorithm, or a peptide identification algorithm employing one or more protein sequence databases comprising protein amino acid sequences and/or peptide sequence databases comprising protein masses, peptide masses, peptide fragment masses or both, or any other peptide sequence assignment method, algorithm or computer software that uses peptide fragmentation mass spectra and/or peptide mass data as input.
- a peptide sequence annotator index is generated for a putative peptide sequence assignment using the methods of the present invention.
- At least a portion of the peptide sequence annotators in the index are input into a parallel peptide sequence assessment algorithm of the present invention and operation of the parallel peptide sequence assessment algorithm yields a determination of the confidence assessment of the putative peptide sequence assignment.
- "Confidence assessment of a peptide sequence assignment" refers to the probability that the sequence assignment is correct. Therefore, the present invention provides methods of determining the probability that a peptide sequence assignment is correct or incorrect, preferably for a chosen statistical significance level.
- the peptide sequence confidence assessment methods of the present invention are capable of ranking a plurality of putative peptide sequence assignments, such as the sequence assignments determined by a conventional peptide identification algorithm, in order of ascending or descending probability that the putative peptide sequence assignment is correct.
- the present invention comprises methods of identifying the amino acid sequence of a protein employing peptide sequence verification by analyzing peptide MS/MS data.
- A protein analyte is decomposed into a plurality of peptide analytes, preferably by selective proteolytic digestion.
- the peptides are fractionated and sequentially delivered to a mass spectrometer.
- Peptide molecular masses and peptide fragmentation spectra are determined for some or all of the peptides and input into a peptide identification algorithm, for example a spectrum matching algorithm using one or more protein sequence databases comprising protein amino acid sequences and/or one or more peptide or protein sequence databases comprising peptide masses, peptide fragment masses or both, or a de novo peptide sequence assignment algorithm.
- a series of putative peptide sequence assignments are generated for each fragmentation spectrum.
- a confidence assessment of each putative peptide sequence assignment in each series of putative peptide sequence assignments is made using the methods of the present invention. At least a portion of the peptide sequence assignments corresponding to each fragmentation spectrum are input into a protein identification algorithm utilizing one or more protein amino acid sequence databases.
- Operation of the protein identification algorithm results in a determination of one or more putative protein sequences associated with the protein analyte, particularly the amino acid sequence of the protein.
- operation of the protein identification algorithm results in the determination of a single protein sequence associated with the protein analyte.
- peptide sequence assignments having a confidence assessment greater than a selected threshold value are input into the protein identification algorithm.
- a confidence assessment is assigned to every putative peptide sequence assignment for each peptide analyte. Confidence assessments are input into the protein identification algorithm along with each putative sequence assignment, thereby resulting in more accurate protein sequence identifications by the protein identification algorithm.
- putative peptide sequence assignments corresponding to each peptide analyte are ranked in order of descending confidence assessment and are input into the protein identification algorithm in the form of an ordered list.
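The thresholding and ranking options described above amount to a filter followed by a sort. The following sketch assumes each putative assignment is represented as a `(sequence, confidence)` pair; that representation and the function name are illustrative choices, not something the description prescribes.

```python
def select_for_protein_identification(assignments, threshold):
    """Keep assignments above the threshold, ordered by descending confidence.

    assignments: list of (sequence, confidence) pairs for one peptide analyte.
    """
    kept = [pair for pair in assignments if pair[1] > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy confidence assessments for one fragmentation spectrum.
assignments = [("PEPTLDEK", 0.20), ("PEPTIDEK", 0.95), ("PEPTIDER", 0.60)]
ranked = select_for_protein_identification(assignments, threshold=0.5)
```

Passing the confidence values along with the sequences, rather than only the surviving sequences, lets the downstream protein identification step weigh each peptide instead of treating all survivors as equally reliable.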
- Methods of identifying protein amino acid sequences employing peptide sequence verification provide several advantages over conventional protein identification methods.
- First, use of peptide sequence verification decreases the number of putative peptide assignments submitted to the protein sequence analysis algorithm and thus, reduces the computational resources required to generate protein identifications.
- Second, peptide sequence verification also reduces the rate of false protein identifications and provides more accurate sequence assessments of identified proteins.
- Protein sequence identification methods of the present invention may be used to identify proteins in substantially purified samples or in complex mixtures. Proteins may be identified using the present methods in the presence of one or more different proteins or other biological molecules, such as oligonucleotides, polysaccharides and carbohydrates. Methods of the present invention are capable of identifying a plurality of proteins present in a protein-containing sample.
- the protein identification methods of the present invention are capable of detecting and characterizing post-translational modification of proteins. These methods are based on the fact that peptides comprising modified amino acids exhibit different fragmentation processes than peptides comprising unsubstituted amino acids and, therefore, generate different fragments. To distinguish between modified and unmodified peptides, masses of fragments observed in peptide fragmentation spectra are compared to one or more protein sequence databases and/or protein or peptide sequence databases comprising the masses of expected fragments of unmodified and modified proteins and/or peptides.
- masses of fragments observed in a peptide fragmentation spectrum may be analyzed using a de novo peptide-sequencing algorithm to generate a plurality of putative peptide sequences, including peptide sequences comprising one or more modified amino acids.
- the composition of modified proteins is reconstructed by inputting putative peptide sequence assignments, including putative peptide sequence assignments comprising modified amino acids, into a protein identification algorithm.
- Post-translational modifications detectable and characterizable by the methods of the present invention include, but are not limited to, phosphorylation, lipidation, prenylation, sulfation, hydroxylation, acetylation, addition of cofactors, formation of disulfide bonds and proteolysis.
- the methods of the present invention are broadly applicable to the analysis of any polymeric material, particularly biopolymers such as oligonucleotides, polysaccharides and carbohydrates.
- Application of the present methods to identifying polymers involves cleaving the polymer into its constituent parts (or cleavage products) and identifying the composition of these parts by MS/MS analysis. Cleavage of biopolymers may be performed by any cleaving means known in the art including, but not limited to, enzymatic degradation, chemical degradation, photolytic degradation and photochemical degradation. Preferred means of cleaving biological molecules provide cleavage at specific bonds.
- the methods of identifying polymers of the present invention include the step of generating a sequence database, which characterizes the sequence of analyte biomolecules with respect to their expected cleavage products.
- the methods of identifying biopolymers of the present invention include the step of generating expected fragment databases for analyte biopolymers and comprising the masses of polymers, cleavage products of polymers, fragments of cleavage products or any combination of these.
- the present invention provides a method for identifying a peptide analyte, said method comprising the steps of: (1) measuring the molecular mass of said peptide analyte; (2) generating a fragmentation mass spectrum of said peptide analyte comprising a series of peaks corresponding to fragments of said peptide analyte; (3) determining a plurality of putative peptide sequence assignments for said peptide analyte using a peptide sequence database comprising protein amino acid sequences, a spectrum matching algorithm, a de novo peptide sequence assignment algorithm or any combination of these; (4) compiling a peptide sequence annotator index for each of said putative peptide sequence assignments comprising a plurality of peptide sequence annotators; (5) combining at least a portion of said peptide sequence annotators in a parallel confidence assessment algorithm, thereby generating a confidence assessment for each putative peptide sequence assignment, wherein said parallel confidence assessment algorithm comprise a series of
- the present invention provides a method for identifying a protein analyte comprising the steps of: (1) digesting said protein analyte, thereby generating a plurality of peptide analytes; (2) measuring the molecular mass of said peptide analytes; (3) generating a fragmentation mass spectrum for each of said peptide analytes comprising a series of peaks corresponding to fragments of said peptide analytes; (4) determining a plurality of putative peptide sequence assignments for each of said peptide analytes using a peptide sequence database comprising protein amino acid sequences, a spectrum matching algorithm, a de novo peptide sequence assignment algorithm or any combination of these; (5) compiling a peptide sequence annotator index for each of said putative peptide sequence assignments comprising a plurality of peptide sequence annotators; (6) combining at least a portion of said peptide sequence annotators in a parallel confidence assessment algorithm, thereby generating a confidence
- Figure 1 is a schematic illustrating an exemplary method of identifying the amino acid sequence of protein analytes in a protein-containing sample.
- Figure 2 is a schematic diagram illustrating exemplary methods of peptide sequence verification and data analysis for the analysis of peptide and protein analytes.
- Figures 3A-D show MS/MS spectra corresponding to four different peptide analytes.
- Figure 3A corresponds to Sequence Identity No. 1.
- Figure 3B corresponds to Sequence Identity No. 2.
- Figure 3C corresponds to Sequence Identity No. 3.
- Figure 3D corresponds to Sequence Identity No. 4.
- “Peptide” and “polypeptide” are used synonymously in the present disclosure, and refer to a class of compounds composed of amino acid residues chemically bonded together by amide bonds (or peptide bonds). Peptides and polypeptides also include polymeric compounds composed of amino acid residues including one or more modified amino acid residues. Modifications can be naturally occurring or non-naturally occurring, such as modifications generated by chemical synthesis.
- Modifications to amino acids in peptides or polypeptides include, but are not limited to, phosphorylation, lipidation, prenylation, sulfonation, hydroxylation, acetylation, methionine oxidation, alkylation, acylation, carbamylation, iodination and the addition of cofactors.
- Peptides and polypeptides are polymeric compounds comprising at least two amino acid residues or modified amino acid residues.
- Peptides and polypeptides of the present invention may be generated by degradation of proteins, for example by proteolytic digestion.
- Peptides and polypeptides may be generated by substantially complete digestion or by partial digestion of proteins.
- Identifying a peptide or polypeptide refers to determination of its composition, particularly its amino acid sequence, and characterization of any modifications of one or more amino acids comprising the peptide or polypeptide.
- Protein refers to a class of compounds comprising one or more polypeptide chains and/or modified polypeptide chains. Proteins may be modified by naturally occurring processes such as post-translational modifications or co-translational modifications. Exemplary post-translational modifications or co-translational modifications include, but are not limited to, phosphorylation, lipidation, prenylation, sulfonation, hydroxylation, acetylation, methionine oxidation, the addition of cofactors, proteolysis, and assembly of proteins into macromolecular complexes.
- Modification of proteins may also include non-naturally occurring derivatives, analogues and functional mimetics generated by chemical synthesis.
- exemplary derivatives include chemical modifications such as alkylation, acylation, carbamylation, iodination or any modification that derivatizes the protein.
- proteins may be modified by labeling methods, such as metabolic labeling, enzymatic labeling or by chemical reactions. Proteins may be modified by the introduction of stable isotope tags, for example as is typically done in a stable isotope dilution experiment.
- Proteins of the present invention may be derived from sources, which include but are not limited to cells, cell or tissue lysates, cell culture medium after cell growth, whole organisms or organism lysates or any excreted fluid or solid from a cell or organism.
- “Fragment” refers to a portion of a polymer analyte, such as a peptide.
- Fragments may be derived from bond cleavage in a parent polymer, such as a parent peptide. Fragments may also be generated from multiple cleavage events or steps. Fragments may be a truncated peptide, either carboxy-terminal, amino-terminal or both, of a parent peptide. A fragment may refer to products generated upon the cleavage of a peptide bond, a C-C bond, a C-N bond, a C-O bond or a combination of these processes. Fragments may refer to products formed by processes whereby one or more side chains of amino acids are removed, or a modification is removed, or any combination of these processes.
- Fragments useful in the present invention include fragments formed under metastable conditions or resulting from the introduction of energy to the precursor by a variety of methods including, but not limited to, collision induced dissociation (CID), surface induced dissociation (SID), laser induced dissociation (LID), electron capture dissociation or any combination of these methods or any equivalents known in the art of tandem mass spectrometry. Fragments useful in the present invention also include, but are not limited to, x-type fragments, y-type fragments, z-type fragments, a-type fragments, b-type fragments, c-type fragments, internal ions (or internal cleavage ions), immonium ions or satellite ions.
- fragments derived from a parent polymer analyte often depend on the sequence of the parent, method of fragmentation, charge state of the parent precursor ion, amount of energy introduced to the parent precursor ion and method of delivering energy into the parent precursor ion.
- comparison of the molecular masses of fragments derived from a fragmentation spectrum to the molecular masses of expected fragments corresponding to a putative peptide sequence assignment is used to determine peptide sequence annotators which are useful in verifying peptide sequence assignments.
- Properties of fragments, such as molecular mass may be characterized by analysis of a fragmentation mass spectrum.
- Fragmentation mass spectrum refers to one or more peaks corresponding to the mass-to-charge ratios of fragments generated upon dissociation of a parent precursor ion in a mass spectrometer.
- Exemplary fragmentation mass spectra are mass spectra obtained upon the dissociation of a parent precursor ion. Dissociation may occur under metastable conditions or result from the introduction of energy to the precursor by a variety of methods including, but not limited to, collision induced dissociation (CID), surface induced dissociation (SID), laser induced dissociation (LID), electron capture dissociation or any combination of these methods or any equivalents known in the art of tandem mass spectrometry.
- Exemplary methods of the present invention use peptide sequence annotators derived from observed and predicted collision induced dissociation (CID) fragmentation mass spectra.
- “Putative peptide sequence assignment” is a sequence of amino acid residues and/or modified amino acid residues, which is associated with a fragmentation spectrum.
- putative peptide sequence assignments are generated by measuring the molecular mass of a peptide analyte and determining all possible peptide sequences in proteins of a given proteome which have molecular masses within a certain range of the measured molecular mass of the peptide analyte.
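- The mass-window selection described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function names, the use of in-silico tryptic cleavage (after K or R, except before P) to enumerate candidate peptides, and the fixed absolute tolerance are all assumptions made here for the example. Residue masses are standard monoisotopic values for unmodified amino acids.

```python
# Monoisotopic residue masses (Da) for the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic mass of H2O


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def tryptic_peptides(protein):
    """Split after K or R unless the next residue is P (assumed cleavage rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def candidates_by_mass(proteome, measured_mass, tolerance):
    """Return (protein_id, peptide) pairs whose mass lies within the window.

    proteome is a hypothetical dict mapping protein identifiers to sequences.
    """
    hits = []
    for protein_id, protein in proteome.items():
        for pep in tryptic_peptides(protein):
            if abs(peptide_mass(pep) - measured_mass) <= tolerance:
                hits.append((protein_id, pep))
    return hits
```

- In practice the window would more likely be expressed relative to instrument mass accuracy, and missed cleavages and modified residues would enlarge the candidate list; neither is handled in this sketch.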
- putative peptide sequence assignments may also be determined by analyzing the fragmentation mass spectrum of a peptide analyte.
- Putative peptide sequence assignments useful in the methods of the present invention may be determined using any peptide sequence assignment method, algorithm or computer software that uses peptide fragmentation mass spectra and/or peptide mass data as input.
- putative peptide sequence assignments are determined using one or more protein sequence databases comprising amino acid sequences of proteins, such as a protein sequence database derived from gene and genome databases.
- putative peptide sequence assignments are determined using one or more peptide sequence databases comprising peptide sequences, peptide fragment sequences, or a combination of peptide sequences and peptide fragment sequences.
- putative peptide sequence assignments are determined using one or more de novo peptide-sequencing algorithms. The present invention includes methods of verifying putative peptide sequence assignments.
- De novo peptide-sequencing algorithm refers to methods, algorithms and computer software that determine the sequence of a peptide without prior knowledge of the peptide sequence.
- De novo sequencing of peptides in the present invention can be performed by any method known in the art including, but not limited to, Edman degradation and by interpretation of MS/MS spectra corresponding to peptide analytes.
- Exemplary de novo peptide-sequencing algorithms are described in "De Novo Peptide Sequencing by Nanoelectrospray Tandem Mass Spectrometry Using Triple Quadrupole and Quadrupole/Time-of-Flight Instruments," Shevchenko, A. et al, Mass Spectrometry of Proteins and Peptides, ISBN 1-59259-045-4 (2000); "Rapid 'de Novo' Peptide Sequencing by a Combination of Nanoelectrospray,
- de novo peptide-sequencing algorithms use peptide mass data and peptide fragmentation spectra as input and generate one or more putative peptide assignments.
- De novo peptide sequencing may be achieved by measuring distances between peaks in a fragmentation spectrum, and comparing these distances to masses of either amino acid residues, modified amino acid residues, or fragments of amino acid residues or modified amino acid residues.
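- As a rough sketch of the peak-distance comparison described above, the following compares the gap between each pair of adjacent peaks to a table of residue masses. The function name, the restriction to adjacent peaks, the tolerance value and the small residue table are assumptions chosen for illustration; a practical de novo algorithm would consider many more peak pairs, ion types and modified residues.

```python
# Small illustrative subset of monoisotopic residue masses (Da).
RESIDUE_MASS_SUBSET = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "L": 113.08406, "I": 113.08406,
}


def residues_from_peak_gaps(peak_masses, residue_mass, tolerance=0.02):
    """For each pair of adjacent peaks, list residues whose mass matches the gap.

    Returns a list of (gap, [matching residue letters]) tuples, ordered by
    increasing peak mass. An empty list of letters means no residue matched.
    """
    peaks = sorted(peak_masses)
    calls = []
    for lower, upper in zip(peaks, peaks[1:]):
        gap = upper - lower
        matches = [aa for aa, mass in residue_mass.items()
                   if abs(gap - mass) <= tolerance]
        calls.append((gap, matches))
    return calls
```

- Gaps that match more than one residue, such as leucine and isoleucine, which have identical masses, yield several candidate letters, which is one reason a single spectrum can give rise to a plurality of putative sequences.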
- operation of a de novo peptide-sequencing algorithm generates a plurality of possible peptide sequence assignments corresponding to a single peptide fragmentation spectrum.
- the methods of the present invention are ideally suited to verify and/or assess the confidence of putative peptide assignments generated by de novo peptide-sequencing algorithms.
- the present invention also includes peptide sequence assignment algorithms that employ a combination of one or more de novo peptide-sequencing algorithms and one or more protein and/or peptide database search algorithms.
- the present invention includes methods wherein all or some peptide sequences generated by de novo peptide sequencing are assigned to one or more peptide sequence entries in a protein sequence and/or peptide sequence database.
- “Spectrum matching” refers to a process in which peaks that correspond to the mass-to-charge ratios of fragments and/or precursor ions are matched to predicted fragments or precursor ions derived from peptide and/or protein databases comprising protein masses, peptide masses, peptide fragment masses or any combinations of these. Alternatively, the peaks that correspond to the mass-to-charge ratios of fragments and/or precursor ions are matched to stored spectra or stored representations of spectra generated from actual peptide or protein samples.
- the matching process may be performed entirely manually, or more preferably some or all of the steps may be performed automatically using a computer-based spectrum matching algorithm. Spectrum matching in the present invention may be performed by any method, algorithm or computer software known in the art.
- This invention provides methods of identifying peptides and proteins using MS/MS data.
- the present invention provides methods of verifying peptide sequence assignments derived from protein and peptide sequence databases or derived from another peptide sequence assignment algorithm, such as a de novo peptide sequence algorithm. Further, the present invention provides methods for detecting and characterizing modifications of peptides and proteins.
- Figure 1 is a schematic illustrating an exemplary method of identifying the amino acid sequences of protein analytes in a protein-containing sample.
- a protein sample containing one or more protein analytes is subjected to digestion resulting in a mixture of peptides.
- Preferred digestion methods of the present invention include proteolytic digestion exhibiting highly specific sequence site cleavage.
- Exemplary methods of digestion usable in the present invention include the use of proteases or combinations of proteases, such as trypsin, thrombin and chymotrypsin.
- peptides may be generated from proteins by the addition of chemical reagents, such as cyanogen bromide, acids and/or bases.
- the peptide mixture is fractionated, thereby generating a plurality of discrete peptide fractions.
- discrete peptide fractions correspond to spatially separated aliquots, which comprise substantially purified peptide analytes.
- discrete peptide fractions correspond to spatially separated aliquots which each comprise substantially a single peptide analyte. Fractionation may be achieved by any method known in the art of peptide separation including, but not limited to, chromatographic methods and electrophoresis methods.
- Exemplary chromatographic separation methods useable in the present invention include single and multidimensional chromatography, such as strong cation exchange/reverse phase high performance liquid chromatography or strong cation exchange/avidin/reverse phase high performance liquid chromatography.
- Exemplary chromatographic methods useful in the present invention also include separation on the basis of hydrophobicity, for example using C18 columns.
- Preferred methods and instrumentation of fractionating peptides include online or offline methods and instrumentation, which are capable of interfacing with a mass spectrometer.
- discrete peptide fractions are separately delivered to a tandem mass spectrometer for analysis, wherein the molecular masses of peptide analytes in each discrete peptide fraction are experimentally determined and at least one fragmentation mass spectrum is acquired corresponding to each peptide fraction.
- peptide analytes in each discrete fraction are ionized, for example by electrospray ionization or matrix assisted laser desorption/ionization methods, and are subjected to conditions for dissociation, thereby generating one or more charge carrying fragments.
- Charge carrying fragments are subsequently mass analyzed, for example, by time-of-flight analysis, quadrupole mass filtering or ion trap methods, and detected, thereby generating fragmentation mass spectra.
- Exemplary fragmentation mass spectra comprise a series of peaks of varying abundance corresponding to the mass-to-charge ratios of fragments generated upon collision induced dissociation of a peptide precursor ion. Any method known in the art of mass spectrometry for determining the mass of peptide analytes and acquiring fragmentation mass spectra is useable in the methods of the present invention.
- the result of MS/MS analysis is that each discrete fraction is characterized in terms of at least one molecular mass and at least one fragmentation mass spectrum.
- each fragmentation mass spectrum is analyzed to provide lists of the mass-to-charge ratios, molecular masses and relative intensities of each charge carrying fragment.
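- One possible shape for such a peak list is sketched below. The function name and record fields are hypothetical, and the conversion from mass-to-charge ratio to neutral mass assumes that each fragment carries its charge as added protons and that the charge state of each peak is already known.

```python
PROTON = 1.007276  # mass of a proton (Da)


def build_peak_list(peaks):
    """Convert (mz, charge, intensity) tuples into peak-list records.

    peaks must be a sequence (it is traversed twice). The neutral mass is
    computed as charge * (mz - PROTON), which assumes protonated ions.
    """
    total = sum(intensity for _, _, intensity in peaks)
    rows = []
    for mz, charge, intensity in peaks:
        rows.append({
            "mz": mz,
            "mass": charge * (mz - PROTON),
            "relative_intensity": intensity / total if total else 0.0,
        })
    return rows
```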
- Figure 2 is a schematic diagram illustrating exemplary methods of peptide sequence verification and data analysis for the analysis of peptide and protein analytes.
- the peptide analyte mass and peak lists corresponding to mass-to-charge ratios, molecular masses or both of fragments observed in the fragmentation mass spectrum are input into a peptide sequence assignment algorithm.
- An exemplary peptide sequence assignment algorithm usable in the methods of the present invention is a peptide database search algorithm which compares the peptide analyte mass and peak lists corresponding to a peptide analyte to entries in a protein sequence database and/or a peptide sequence database.
- An exemplary protein sequence database comprises a plurality of protein amino acid sequences and an exemplary peptide sequence database comprises a plurality of peptide masses and a plurality of masses of fragments of peptides. Operation of the peptide database search algorithm generates a plurality of putative peptide sequence assignments corresponding to the peptide analyte.
- Exemplary peptide database search algorithms include conventional peptide database search software tools, such as MASCOT and SEQUEST computer software packages.
- the present invention may be practiced using a peptide sequence assignment algorithm comprising one or more de novo peptide-sequencing algorithms.
- each putative peptide sequence assignment is analyzed to generate a peptide sequence annotator index comprising a plurality of peptide sequence annotators.
- peptide sequence annotator indices for all putative peptide sequence assignments are organized in a relational database. Derivation of peptide sequence annotator indices for putative peptide sequence assignments may be achieved by any means known in the art including the use of one or more peptide sequence annotator algorithms, preferably automated, computer-assisted peptide sequence annotator algorithms.
- peptide sequence annotator algorithms compare the masses of fragments observed in fragmentation mass spectra to the entries of protein sequence and peptide sequence databases.
- the methods of the present invention may further comprise steps of inputting additional data into the peptide sequence annotator algorithm, such as the type of instrumentation used for MS/MS analysis, experimental conditions in the mass spectrometer and observed physical and chemical properties characterizing the discrete peptide fractions, such as retention times, elution times or mobilities for specific chromatographic media (for both liquid and gas phase chromatography) under specific conditions.
- Preferred peptide sequence annotator algorithms of the present invention are capable of evaluating this additional data and deriving additional peptide sequence annotators.
- Peptide sequence annotators useful in the present invention may be derived from results generated by conventional peptide database search software tools, such as MASCOT and SEQUEST computer software packages or any other peptide sequence assignment algorithm, such as a de novo sequencing algorithm.
- At least a portion of the peptide sequence annotators in each peptide sequence annotator index are input into a parallel confidence assessment algorithm, thereby generating a confidence assessment for each putative peptide sequence assignment.
- evaluation of a single annotator is not able to accurately assess the confidence of a putative peptide sequence assignment or series of putative peptide sequence assignments.
- evaluation of a plurality of selected annotators provides an assessment of the probability that a given putative peptide sequence assignment is correct.
- Any parallel confidence assessment algorithm capable of accurately characterizing the confidence of a putative peptide sequence assignment is useable in the present invention.
- An exemplary parallel confidence assessment algorithm comprises a plurality of peptide sequence assignment rules derived from the operation of an artificial neural network algorithm on one or more MS/MS data sets corresponding to known peptide and/or protein analytes.
- exemplary parallel confidence assessment algorithms may assign peptide sequence annotators different weighting factors, such as weighting factors determined by operation of an artificial neural network algorithm on one or more MS/MS data sets corresponding to known peptide and/or protein analytes.
- summation of at least a portion of the peptide sequence annotators multiplied by their respective weighting factors provides a quantitative assessment of the confidence of a given putative peptide sequence assignment.
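- The weighted summation described above can be sketched as follows. The annotator names and weight values used in the usage note are invented for illustration; in the disclosure the weighting factors are determined by operating an artificial neural network algorithm on MS/MS data sets for known analytes, which this sketch does not attempt to reproduce.

```python
def confidence_score(annotators, weights):
    """Sum of annotator values multiplied by their weighting factors.

    Only annotators that have an entry in weights contribute, mirroring the
    use of "at least a portion" of the annotators.
    """
    return sum(weights[name] * value
               for name, value in annotators.items()
               if name in weights)
```

- For example, with hypothetical weights `{"matched_peaks": 0.1, "matched_intensity": 2.0}`, an assignment having 10 matched peaks and a matched relative intensity of 0.5 would receive a score of 2.0, and any annotator without a weight would be ignored.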
- the present invention includes parallel confidence assessment algorithms employing non-linear peptide sequence annotator weighting and fully automated parallel confidence assessment algorithms. Exemplary methods of the invention also use machine learning algorithms, statistical tools or a combination of these.
- At least a portion of the annotated putative peptide sequence assignments corresponding to peptide analytes in each discrete peptide fraction are input into a protein identification algorithm.
- Exemplary protein identification algorithms compare the putative peptide sequence assignments to entries in one or more protein sequence databases comprising protein amino acid sequences.
- only putative peptide sequence assignments having a confidence assessment greater than a selected threshold value are input into the protein identification algorithm.
- putative peptide sequence assignments may be input into the protein identification algorithm in an ordered list ranked in order of decreasing or increasing confidence assessment.
- the present methods include embodiments wherein putative peptide sequence assignments and associated confidence assessments are input into the protein identification algorithm together, preferably in the form of a relational database.
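- The thresholding and ranking options described above can be combined in a short sketch. The function name and the dictionary layout of each assignment are assumptions made for illustration.

```python
def select_for_protein_identification(assignments, threshold):
    """Keep assignments above the confidence threshold, highest confidence first.

    Each assignment is assumed to be a dict with at least the keys
    "sequence" and "confidence".
    """
    kept = [a for a in assignments if a["confidence"] > threshold]
    return sorted(kept, key=lambda a: a["confidence"], reverse=True)
```

- Because each record keeps its confidence value alongside the sequence, the returned list also serves the embodiment in which assignments and confidence assessments are passed to the protein identification algorithm together.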
- Peptide sequence annotators useable in the present invention include annotators derived from the predicted fragmentation patterns of peptides.
- the types and abundance of charge carrying fragments observed in the MS/MS analysis of peptides depends on a number of factors including primary amino acid sequence, the presence of modified amino acids in the peptide sequence, the amount of energy imparted to the peptide precursor ion and the charge state of the peptide precursor ion.
- fragments which are expected to be generated from a selected peptide sequence may be accurately predicted in many instances on the basis of its primary sequence, the type of MS/MS instrumentation employed for analysis and the properties of the precursor ion mass-selected for collision-induced dissociation.
- Peptide ions subjected to a variety of dissociation conditions may fragment at any bond along the peptide backbone, thereby generating a ladder of sequence ions.
- peptide ions frequently fragment at amide bonds (or peptide bonds), thereby generating a sequence of daughter ions, such as y-type ions and b-type ions.
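- Specifically, if charge is retained on the fragment ion corresponding to the N-terminal portion of the peptide after cleavage of the amide bond, b-type ions are formed. If charge is retained on the fragment ion corresponding to the C-terminal portion of the peptide after cleavage of the amide bond, however, y-type ions are formed.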
- peptide ions subjected to dissociation conditions may fragment at C-C bonds in a peptide, thereby generating a sequence of daughter ions, such as a-type ions and x-type ions. Specifically, if charge is retained on the fragment ion corresponding to the N-terminal portion of the peptide after cleavage of the C-C bond, a-type ions are formed. If charge is retained on the fragment ion corresponding to the C-terminal portion of the peptide after cleavage of the C-C bond, however, x-type ions are formed.
- peptide ions subjected to dissociation conditions may fragment at C-N bonds adjacent to the amide bond in a peptide, thereby generating a sequence of daughter ions, such as c-type ions and z-type ions.
- Double backbone cleavage of a peptide may also generate charge-carrying fragments, commonly referred to as internal fragments. Usually, these are formed by a combination of b-type and y-type cleavage processes to produce an amino-acylium ion. Alternatively, double cleavage by a combination of a-type and y-type cleavage processes produces an amino-immonium ion. An internal fragment with a single side chain formed by a combination of a-type and y-type cleavage processes is referred to as an immonium ion.
- exemplary annotators of the present invention are determined by comparing or matching the masses of fragments observed in peptide CID fragmentation mass spectra to the predicted fragments for a given putative peptide sequence assignment.
- the term "matched peak" refers to a peak in a peptide fragmentation mass spectrum which corresponds to a molecular mass that is within a selected range of one of the molecular masses of fragments predicted for a putative peptide sequence assignment.
- An exemplary annotator of the present invention comprises the number of all matched peaks in a peptide fragmentation mass spectrum.
- annotators of the present invention may comprise the number of matched peaks in a peptide fragmentation mass spectrum which correspond to one or more specific fragment ion types.
- Exemplary annotators of the present invention include the number of matched peaks corresponding to a-type fragments, b-type fragments, c-type fragments, x-type fragments, y-type fragments, z-type fragments, internal fragments, immonium ions or any combinations of these.
- exemplary annotators of the present invention are determined by calculating the relative intensities of matched peaks in a peptide fragmentation mass spectrum.
- the term "relative intensity" refers to the integrated areas of one or more selected peaks in a peptide fragmentation mass spectrum divided by the sum of integrated areas of all peaks in the fragmentation mass spectrum and may be expressed by the equation:
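- Following the definition above, the expression can be written as shown here, where the symbols are notation assumed for this illustration: $A_i$ is the integrated area of peak $i$, $S$ is the set of selected peaks and $N$ is the total number of peaks in the fragmentation mass spectrum.

```latex
I_{\mathrm{rel}} = \frac{\sum_{i \in S} A_i}{\sum_{j=1}^{N} A_j}
```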
- An exemplary annotator of the present invention comprises the relative intensity of all matched peaks in a peptide fragmentation mass spectrum.
- annotators may be determined by calculating the relative intensity of matched peaks corresponding to one or more specific fragment ion types.
- Exemplary annotators include the relative intensities of matched peaks in a peptide fragmentation mass spectrum that correspond to a-type fragments, b-type fragments, c-type fragments, x-type fragments, y-type fragments, z-type fragments, internal fragments, immonium ions or any combination of these, or fragments derived from these.
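- The matched-peak count and matched relative intensity annotators can be sketched together as follows. The function name and the tolerance are assumptions, and peak intensity is used as a stand-in for integrated peak area. Passing only the predicted masses for one ion type (for example, only y-type fragments) gives the corresponding ion-type-specific annotators.

```python
def match_peaks(observed, predicted_masses, tolerance=0.5):
    """Count matched peaks and compute their relative intensity.

    observed is a list of (mass, intensity) pairs; a peak is matched when its
    mass lies within tolerance of any predicted fragment mass.
    Returns (number_of_matched_peaks, relative_intensity_of_matched_peaks).
    """
    matched = [(mass, intensity) for mass, intensity in observed
               if any(abs(mass - p) <= tolerance for p in predicted_masses)]
    total = sum(intensity for _, intensity in observed)
    matched_total = sum(intensity for _, intensity in matched)
    relative = matched_total / total if total else 0.0
    return len(matched), relative
```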
- Annotators may also be derived from the identities of one or more fragments, which are matched to one or more peaks in a peptide fragmentation mass spectrum. For example, the presence of one or more peaks in a peptide fragmentation mass spectrum that are positively matched to an internal fragment having a known identity, such as an immonium ion having a known side chain, may serve as the basis of an exemplary annotator.
- exemplary annotators may be derived from correlations between observed peaks in a fragmentation mass spectrum and one or more de novo peptide sequence tags. In this embodiment, correlations between the relative positions of peaks in a fragmentation mass spectrum and fragment masses predicted on the basis of de novo sequence tags are an indicator of a specific peptide sequence identity or specific protein sequence identity.
- Peptide sequence annotators useable in the present invention include annotators derived from predicted retention times, elution times and mobilities of peptides or peptide ions through a gas or fluid under influence of an electric field, or on specific chromatographic media, such as a high performance liquid chromatography column.
- Peptide retention times for example, in many cases can be accurately predicted on the basis of primary amino acid sequence, affinity, size, molecular mass, structure or any combination of these properties.
- an annotator of the present invention is calculated as the difference between the retention time measured for a peptide analyte on specific chromatographic media and the retention time predicted for a putative peptide sequence assignment on the same specific chromatographic media.
- An exemplary annotator is calculated using predicted and measured retention times using chromatographic separation on the basis of hydrophobicity, for example using C18 columns.
- the putative peptide sequence assignments themselves are used to generate a regression line used for predicting peptide retention times under relevant experimental conditions.
- the putative peptide sequence assignments used to derive the regression line have confidence scores, such as MASCOT peptide assignment scores, larger than a selected threshold value to ensure that the regression line accurately reflects actual peptide retention times.
- Another exemplary annotator is based on whether or not a particular putative peptide sequence assignment was used to determine the regression line used for predicting peptide retention times. If the retention time of the putative peptide sequence assignment was used to determine the regression line, then the putative peptide sequence assignment may be allowed a much lower tolerance than if it was not used.
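- The retention-time annotators described above can be illustrated with an ordinary least-squares line. Everything named here is an assumption for the sake of the sketch: the record fields, the use of a single precomputed per-sequence predictor called "hydrophobicity", and the simple linear model. The sketch fits the line using only assignments whose score exceeds a threshold, then reports for every assignment the difference between observed and predicted retention time and whether that assignment contributed to the fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept).

    Requires at least two points with distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def retention_time_annotators(assignments, score_threshold):
    """Derive delta-retention-time and used-in-fit annotators.

    Each assignment is assumed to be a dict with the keys "sequence",
    "score", "hydrophobicity" (hypothetical predictor) and "observed_rt".
    """
    training = [a for a in assignments if a["score"] > score_threshold]
    slope, intercept = fit_line(
        [a["hydrophobicity"] for a in training],
        [a["observed_rt"] for a in training],
    )
    results = []
    for a in assignments:
        predicted = slope * a["hydrophobicity"] + intercept
        results.append({
            "sequence": a["sequence"],
            "delta_rt": a["observed_rt"] - predicted,
            "used_in_fit": a["score"] > score_threshold,
        })
    return results
```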
- peptide sequence annotators are determined on the basis of the gas phase ion mobility, such as electrophoretic mobility.
- precursor ions generated from peptide analytes are analyzed by a differential mobility analyzer. Flight times through a given mobility medium, such as a gas or mixture of gases at a selected pressure, are determined for each peptide analyte and analyzed to determine gas phase ion mobilities.
- An exemplary peptide sequence annotator is determined by subtracting the measured gas ion mobility from gas ion mobilities predicted for each putative peptide sequence assignment.
- a number of useful peptide sequence annotators may be derived from other empirical observations made during protein and peptide analysis.
- exemplary annotators may be derived on the basis of whether or not the same peptide sequence was detected from a precursor ion having a different charge state. Putative peptide sequence assignments are more likely to be accurate if the same or similar fragments are generated from the same precursor ion having two charge states.
- exemplary annotators may be derived on the basis of whether or not the same peptide sequence was determined by analysis using more than one type of MS/MS instrumentation. Since different instruments employ different ionization, mass analysis and dissociation conditions, identification of the same peptide by more than one MS/MS instrument will reduce the probability that the assignment is a random occurrence.
- exemplary annotators may be derived on the basis of the total number of times a peptide is identified.
- exemplary annotators may be derived on the basis of whether or not proline is present in the sequence. If proline is present, an annotator may be calculated from the relative intensity of the most dominant peak. This exemplary annotator is based on the fact that a peptide containing proline is likely to cleave on the N-terminal side of proline, and that the peak resulting from that single bond cleavage is often the most prominent peak in the observed fragmentation mass spectrum.
- exemplary annotators may be derived on the basis of whether or not the putative peptide is a part of a known background protein. If a peptide is known to be derived from a background protein it is less likely to be a component of a protein analyte.
- Annotators may also be derived from statistical analysis of correlations between masses and relative intensities in a fragmentation mass spectrum and entries in one or more protein sequence or peptide sequence databases.
- An exemplary annotator comprises an error distribution annotator, which determines if the distribution of disparities between matched fragments and predicted fragments is random or if it follows a pattern. The hypothesis testing is performed by using the likelihood ratio method. The null hypothesis in this case is that the error distribution of the peptide under consideration is random and the exemplary annotator calculates the probability that this is the case.
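One way to picture an error distribution annotator is as a log-likelihood ratio between two models of the fragment mass errors. The source does not specify either model, so the sketch below makes two assumptions: the "random" null hypothesis is taken to be a uniform distribution over the matching tolerance window, and the "pattern" alternative is taken to be a normal distribution with maximum-likelihood mean and variance. The function name is hypothetical, and the value returned is a log ratio rather than the probability the text refers to.

```python
import math


def error_log_likelihood_ratio(errors, tolerance):
    """Log-likelihood of a fitted normal model minus that of a uniform model.

    errors:    observed minus predicted fragment masses, in Daltons
    tolerance: half-width of the matching window, in Daltons
    Positive values favour a systematic error pattern; negative values
    favour errors scattered at random across the window.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two matched fragments")
    # Assumed null hypothesis: uniform density 1 / (2 * tolerance).
    log_l_null = -n * math.log(2.0 * tolerance)
    # Assumed alternative: normal density with maximum-likelihood parameters.
    mean = sum(errors) / n
    variance = sum((e - mean) ** 2 for e in errors) / n
    variance = max(variance, 1e-12)  # guard against identical errors
    log_l_alt = -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)
    return log_l_alt - log_l_null


clustered = error_log_likelihood_ratio([0.10, 0.11, 0.09, 0.10, 0.12], 0.8)
scattered = error_log_likelihood_ratio([-0.7, 0.6, -0.2, 0.75, -0.5, 0.1], 0.8)
```

Because the two assumed models are not nested, the ratio is best read as a relative score. Turning it into a probability would need a calibration step that the source does not describe.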
- Exemplary parallel confidence assessment algorithms of the present invention are derived from the operation of an artificial neural network algorithm on one or more MS/MS data sets resulting from the analysis of known peptides and/or proteins or one or more MS/MS data sets wherein putative protein sequences are manually verified.
- artificial neural network packages that may be used in the methods of the present invention include, but are not limited to, STATISTICA by StatSoft (http://www.statsoftinc.com/) and Neural Network Toolbox for MATLAB by The MathWorks.
- a training set comprising thousands of manually verified peptide sequence assignments is used in conjunction with an artificial neural network algorithm to determine a set of intricate relationships between selected peptide sequence annotators in a peptide sequence annotator index.
- An exemplary artificial neural network is a system comprising a large number of simple processing elements linked together by connections, modeled after the neuronal structure of a brain.
- Exemplary ANNs typically consist of one input layer, one or more processing layers and one output layer.
- Artificial neural networks are used in situations where there is a relationship between the proposed (known) input and desired (unknown) output but the nature of the relationship is not precisely known. ANNs are particularly useful in situations in which the relationship between the input and output is not linear.
- An artificial neural network is first designed to best fit the problem it will be used for. Then, before it can be used, it needs to be trained, and that training can be supervised or non-supervised.
- supervised training is used, which consists of feeding an artificial neural network with large amounts of input data, together with the desired output. For example, a large number of putative peptide assignments with full sets of corresponding peptide sequence annotators are provided to the network as input. In this embodiment, manual determinations of whether or not a given peptide sequence assignment is correct are provided as the desired output.
- the ANN determines the best possible relationship between the input and output data by a trial-and-error method.
- After training is complete, such an artificial neural network can be provided with new data sets as input, and it can then calculate the (previously unknown) output.
- Neural networks used in the present invention may be trained by any method known in the art, including use of back-propagation algorithms.
- the present invention includes use of artificial neural networks that employ non-supervised training.
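The supervised training described above can be illustrated with a very small network. This is a toy sketch, not the network used by the inventors: it assumes two annotators per assignment, already scaled to the range 0 to 1, a single hidden layer of three units, sigmoid activations, a squared-error loss and per-sample back-propagation. The layer sizes, learning rate, training values and function names are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, x):
    """Return (hidden activations, output) for one annotator vector."""
    w_hidden, w_out = network
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1])
              for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, output


def train(samples, labels, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    """Train a one-hidden-layer network by back-propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Each weight list ends with a bias term.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    network = (w_hidden, w_out)
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            hidden, y = forward(network, x)
            # Gradient of the squared error at the output unit.
            delta_out = (y - target) * y * (1.0 - y)
            for j in range(n_hidden):
                # Propagate the error back before changing w_out[j].
                delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                for i in range(n_in):
                    w_hidden[j][i] -= rate * delta_h * x[i]
                w_hidden[j][-1] -= rate * delta_h
                w_out[j] -= rate * delta_out * hidden[j]
            w_out[-1] -= rate * delta_out
    return network


def predict(network, x):
    return forward(network, x)[1]


# Hypothetical training set: 1 = manually verified correct, 0 = incorrect.
annotator_vectors = [[0.9, 0.9], [0.8, 0.7], [0.7, 0.9],
                     [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
verified = [1, 1, 1, 0, 0, 0]
model = train(annotator_vectors, verified)
```

Once trained, `predict(model, [a1, a2])` returns a value between 0 and 1 for a new annotator vector, which plays the role of the previously unknown output.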
- the parallel confidence assessment algorithm is a series of rules that applies weights to different annotators.
- the parallel confidence assessment algorithm is a parallel, multivariate statistical algorithm. For example, selected peptide sequence annotators and corresponding weighting factors may be combined in a parallel, multivariate statistical algorithm with linear weighting to provide an assessment of the confidence of a given putative peptide sequence assignment provided by the equation: $C = \beta_0 + \sum_i \beta_i A_i$
- $C$ is the confidence assessment of a given putative peptide sequence assignment
- $A_i$ are peptide sequence annotators and $\beta_i$ and $\beta_0$ are weighting factors.
- the present invention also includes non-linear parallel, multivariate statistical algorithms employing a wide range of nonlinear weighting schemes including exponential factor weighting, logarithmic factor weighting, polynomial factor weighting or any combinations of these.
- the present invention includes methods using parallel, multivariate statistical algorithms employing a combination of linear and non-linear weighting.
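A weighted combination of annotators of the kind described above might look like the following sketch. It assumes the linear form is a constant term plus a weighted sum of annotator values, and it models non-linear weighting by applying a transform to each annotator before it is weighted. The function names, weights and transforms are hypothetical and carry no values from the source.

```python
import math


def linear_confidence(annotators, weights, intercept):
    """Constant term plus the weighted sum of annotator values."""
    if len(annotators) != len(weights):
        raise ValueError("one weight is required per annotator")
    return intercept + sum(w * a for w, a in zip(weights, annotators))


def nonlinear_confidence(annotators, weights, intercept, transforms):
    """Like linear_confidence, but each annotator is transformed first.

    transforms holds one function per annotator, for example math.log for
    logarithmic weighting or a power function for polynomial weighting.
    """
    if not (len(annotators) == len(weights) == len(transforms)):
        raise ValueError("annotators, weights and transforms must align")
    return intercept + sum(
        w * f(a) for w, a, f in zip(weights, annotators, transforms)
    )


# Hypothetical values chosen only to make the arithmetic easy to follow.
c_linear = linear_confidence([2.0, 3.0], [0.5, -1.0], intercept=1.0)
c_mixed = nonlinear_confidence(
    [1.0, 3.0], [2.0, 0.5], intercept=0.0,
    transforms=[math.log, lambda a: a ** 2],
)
```

Passing the identity function as a transform recovers linear weighting for that annotator, so the same routine covers the mixed linear and non-linear case.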
- the process of assigning weights to one or more peptide sequence annotators may be performed using artificial neural networks or other decision-making algorithms. Correlations between annotators are also especially important in deriving a confidence assessment.
- exemplary parallel confidence assessment algorithms of the present invention are capable of evaluating interdependencies and correlations between different annotators.
- interdependencies of annotators, such as the numbers or relative intensities of matched peaks correlating to specific fragment ion types, depend strongly on the instrumentation used for MS/MS analysis. Therefore, the MS/MS instrumentation used will often factor into an analysis of peptide sequence annotator interdependencies.
- correlations between annotators based on the number of times a peptide has been identified and annotators based on peptide retention time typically indicate a higher confidence assessment.
- MS/MS systems include TOF-TOF mass spectrometers, triple quadrupole mass spectrometers, linear ion traps, 3D ion traps, quadrupole-time-of-flight mass spectrometers and Fourier transform ion cyclotron resonance mass spectrometers.
- Ion formation via electrospray ionization or MALDI methods is useable in the methods of the present invention.
- the present methods are applicable to low energy CID conditions, high energy CID conditions, electron capture dissociation, laser induced dissociation, or any combination of these methods or any other equivalent methods known in the art of mass spectrometry.
- exemplary computers useable in the present methods include microcomputers, such as an IBM personal computer or suitable equivalent thereof, and work station computers.
- algorithms of the present invention are embedded in a computer readable medium, such as a computer compact disc or floppy disc.
- computer readable medium may be in the form of a hard disk or memory chip, such as random access memory or read only memory.
- computer software code embodying the methods and algorithms of the present invention may be written using any suitable programming language.
- Exemplary languages include, but are not limited to, C or any versions of C, Perl, Java, Pascal, or any equivalents of these. While it is preferred for some applications of the present invention that a computer be used to accomplish all the steps of the present methods, it is contemplated that a computer may be used to perform only a certain step or selected series of steps in the present methods.
- Example 1: Exemplary methods of verifying putative peptide assignments. The methods of the present invention were used to determine the amino acid sequences of several peptides by analyzing peptide fragmentation mass spectra. Specifically, the present methods were used to verify peptide sequence assignments generated by conventional protein and peptide sequence database search tools. The results of these studies indicate that the peptide identification methods of the present invention are useful for confirming or rejecting sequence identities generated by these search tools.
- Peptide samples in this study were generated by proteolytic digestion of a sample containing a plurality of parent proteins. The peptide-containing sample resulting from digestion was fractionated prior to MS/MS analysis using multidimensional chromatography employing strong cation exchange HPLC and separation on the basis of hydrophobicity using C18 columns. Peptide retention times for fractionated peptide-containing aliquots were measured. Peptide fragmentation mass spectra were acquired using a three-dimensional quadrupole ion trap-based instrument employing an electrospray ionization source.
- The mass agreement criterion employed for matching peaks in the fragmentation mass spectra and peaks in the database was 0.8 Daltons. The mass agreement criterion employed for matching the observed mass of the precursor ion and the putative peptide sequence assignment was 1.5 Daltons.
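The two mass agreement criteria can be expressed as simple tolerance checks. The sketch below assumes that a predicted fragment counts as matched when any observed peak lies within the tolerance, and that both tolerances are inclusive. The source states only the tolerance values, so the matching rule, the function names and the example masses are hypothetical.

```python
FRAGMENT_TOLERANCE_DA = 0.8   # fragment peak criterion stated in the text
PRECURSOR_TOLERANCE_DA = 1.5  # precursor ion criterion stated in the text


def count_matched_fragments(observed, predicted,
                            tolerance=FRAGMENT_TOLERANCE_DA):
    """Return (number of matched predicted fragments, total predicted)."""
    matched = 0
    for p in predicted:
        # Assumed rule: a predicted fragment is matched if any observed
        # peak falls within the tolerance window around it.
        if any(abs(o - p) <= tolerance for o in observed):
            matched += 1
    return matched, len(predicted)


def precursor_agrees(observed_mass, candidate_mass,
                     tolerance=PRECURSOR_TOLERANCE_DA):
    """True if the candidate peptide mass lies within the precursor window."""
    return abs(observed_mass - candidate_mass) <= tolerance


# Hypothetical peak list and predicted fragment masses, in Daltons.
matched_total = count_matched_fragments(
    observed=[175.1, 304.2, 500.0],
    predicted=[175.12, 303.6, 450.0, 600.0],
)
```

The returned pair has the same shape as the matched and total fragment counts that the text reports for each entry of Table 7.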
- Peptide sequence annotator indices were compiled for each putative peptide sequence assignment. Individual peptide sequence annotators used included: (1) the difference between observed peptide retention times on the C18 column and predicted retention times for each putative peptide sequence assignment; (2) the number of matched y-type fragments; (3) the number of matched b-type fragments; (4) the number of matched y neutral loss fragments; (5) the number of matched b neutral loss fragments; (6) the relative intensity of all matched fragments; (7) the relative intensity of matched y-type fragments; (8) the relative intensity of matched b-type fragments; (9) the relative intensity of matched y neutral loss fragments; (10) the relative intensity of matched b neutral loss fragments; and (11) the error distribution.
- Table 6 summarizes peptide sequence annotators based on measured and predicted peptide retention times.
- Table 7 summarizes peptide sequence annotators based on the number of matched fragments. The two numbers provided in each entry in Table 7 correspond to the number of matched fragments and the number of total fragments, respectively.
- Table 8 summarizes peptide sequence annotators based on the relative intensity of matched fragments.
- Table 9 summarizes peptide sequence annotators based on calculated error distributions.
SEQ ID NO | RT (a) | Predicted RT (a) | Difference |
---|---|---|---|
1 | 35.65 | 33.45833 | 2.191674 |
2 | 18.59 | 18.68694 | 0.096942 |
3 | 19.25 | 24.17538 | 4.925377 |
4 | 25.1 | 25.71325 | 0.6132463 |
- a "RT" is an abbreviation for retention time.
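The Table 6 values above can be checked by recomputing the Difference column. The short sketch below assumes the column is the unsigned difference between measured and predicted retention time. The arithmetic supports that reading, since rows 2 to 4 have predicted values larger than measured ones yet are tabulated as positive numbers. The function names are hypothetical.

```python
# Rows copied from Table 6:
# (SEQ ID NO, measured RT, predicted RT, tabulated difference)
TABLE_6 = [
    (1, 35.65, 33.45833, 2.191674),
    (2, 18.59, 18.68694, 0.096942),
    (3, 19.25, 24.17538, 4.925377),
    (4, 25.1, 25.71325, 0.6132463),
]


def absolute_rt_difference(measured, predicted):
    """Unsigned retention time annotator."""
    return abs(measured - predicted)


def max_discrepancy(rows):
    """Largest gap between recomputed and tabulated differences."""
    return max(
        abs(absolute_rt_difference(measured, predicted) - tabulated)
        for _, measured, predicted, tabulated in rows
    )


if __name__ == "__main__":
    for seq_id, measured, predicted, tabulated in TABLE_6:
        print(seq_id, absolute_rt_difference(measured, predicted), tabulated)
```

The small residual gaps arise because the tabulated predicted values are rounded to fewer digits than the tabulated differences.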
- c "No. of b-type" is an abbreviation for the number of matched b-type fragments.
- d "No. of y nls" is an abbreviation for the number of matched y neutral loss fragments.
- e "No. of b nls" is an abbreviation for the number of matched b neutral loss fragments.
- Table 8 Summary of Peptide Sequence Annotators Based on the Relative Intensities of Matching Fragments.
- f "rel all" is an abbreviation for the relative intensity of all matched fragments.
- g "rel Y" is an abbreviation for the relative intensity of matched y-type fragments.
- h "rel Ynl" is an abbreviation for the relative intensity of matched y neutral loss fragments.
- k "ED(p)" is an abbreviation for error distribution.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US52704003P | 2003-12-03 | 2003-12-03 | |
US60/527,040 | 2003-12-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005057208A1 (fr) | 2005-06-23 |
Family
ID=34676691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2004/040225 WO2005057208A1 (fr) | 2003-12-03 | 2004-12-02 | Procede d'identification de peptides et de proteines |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2005057208A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6489608B1 (en) * | 1999-04-06 | 2002-12-03 | Micromass Limited | Method of determining peptide sequences by mass spectrometry |
US6489121B1 (en) * | 1999-04-06 | 2002-12-03 | Micromass Limited | Methods of identifying peptides and proteins by mass spectrometry |
US6582965B1 (en) * | 1997-05-22 | 2003-06-24 | Oxford Glycosciences (Uk) Ltd | Method for de novo peptide sequence determination |
- 2004-12-02 WO PCT/US2004/040225 patent/WO2005057208A1/fr active Application Filing
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8278115B2 (en) * | 2007-11-30 | 2012-10-02 | Wisconsin Alumni Research Foundation | Methods for processing tandem mass spectral data for protein sequence analysis |
US20090173878A1 (en) * | 2007-11-30 | 2009-07-09 | Coon Joshua J | Methods for Processing Tandem Mass Spectral Data for Protein Sequence Analysis |
CN107991411A (zh) * | 2014-05-21 | 2018-05-04 | 萨默费尼根有限公司 | 使用优化的低聚物调度用于质谱生物聚合物分析的方法 |
CN107991411B (zh) * | 2014-05-21 | 2020-10-16 | 萨默费尼根有限公司 | 使用优化的低聚物调度用于质谱生物聚合物分析的方法 |
EP3345003A4 (fr) * | 2015-08-31 | 2019-05-15 | DH Technologies Development Pte. Ltd. | Identification de protéines en l'absence d'identifications de peptides |
CN109643633B (zh) * | 2016-08-10 | 2021-09-14 | Dh科技发展私人贸易有限公司 | 自动化质谱库保留时间校正 |
EP3497709A4 (fr) * | 2016-08-10 | 2020-04-01 | DH Technologies Development Pte. Ltd. | Correction automatique de temps de rétention d'une spectrothèque |
CN109643633A (zh) * | 2016-08-10 | 2019-04-16 | Dh科技发展私人贸易有限公司 | 自动化质谱库保留时间校正 |
CN107103205A (zh) * | 2017-05-27 | 2017-08-29 | 湖北普罗金科技有限公司 | 一种基于蛋白质质谱数据注释真核生物基因组的生物信息学方法 |
WO2020014767A1 (fr) * | 2017-07-17 | 2020-01-23 | Bioinformatics Solutions Inc. | Systèmes et procédés de séquençage de peptides de novo à partir d'une acquisition indépendante de données à l'aide d'un apprentissage profond |
US11573239B2 (en) | 2017-07-17 | 2023-02-07 | Bioinformatics Solutions Inc. | Methods and systems for de novo peptide sequencing using deep learning |
US11694769B2 (en) | 2017-07-17 | 2023-07-04 | Bioinformatics Solutions Inc. | Systems and methods for de novo peptide sequencing from data-independent acquisition using deep learning |
WO2020016428A1 (fr) * | 2018-07-20 | 2020-01-23 | Univerzita Palackeho V Olomouci | Procédé d'identification d'entités à partir de spectres de masse |
EP3598135A1 (fr) * | 2018-07-20 | 2020-01-22 | Univerzita Palackého v Olomouci | Procédé d'identification d'entités à partir de spectres de masse |
JP2021531586A (ja) * | 2018-07-20 | 2021-11-18 | ウニヴェルジタ パラケーホ ヴ オロモウツ | 質量スペクトルからの存在物の同定の方法 |
JP7218019B2 (ja) | 2018-07-20 | 2023-02-06 | ウニヴェルジタ パラケーホ ヴ オロモウツ | 質量スペクトルからの存在物の同定の方法 |
WO2021172946A1 (fr) * | 2020-02-28 | 2021-09-02 | ㈜베르티스 | Système basé sur des propriétés peptidiques d'apprentissage pour prédire un profil spectral d'ions produisant des peptides en spectrométrie de masse en phase liquide |
CN114577967A (zh) * | 2020-12-01 | 2022-06-03 | 中国科学院大连化学物理研究所 | 基于人工神经网络和差谱的中药复方样品色谱分析方法 |
CN114577967B (zh) * | 2020-12-01 | 2023-03-24 | 中国科学院大连化学物理研究所 | 基于人工神经网络和差谱的中药复方样品色谱分析方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
122 | Ep: pct application non-entry in european phase |