CA2468689A1 - A system and method for automatic protein sequencing by mass spectrometry

Info

Publication number: CA2468689A1
Application number: CA002468689A
Authority: CA
Inventors: Matthias Wilm; Gitte Jackie Neubauer
Original assignee: Europaisches Laboratorium fuer Molekularbiologie EMBL
Current assignee: Individual
Priority date: 2001-11-30
Filing date: 2001-11-30
Publication date: 2003-06-05
Also published as: AU2002218321A1; WO2003046577A1; JP2005510732A

Abstract

A method of deducing the sequence of a protein by analysing tandem mass spectrometry data. The protein is subjected to partial isotopic labelling by enzymatic digestion in a water mixture comprising a non-natural abundance of H₂¹⁸O. Differential scanning mass spectrometry is applied to the peptide fragments obtained from the digestion. Peaks in the spectra are analyzed to ascertain whether they arise from isotopically labeled fragments or not. A filtered spectrum is calculated that comprises only peaks from the y-ions. The sequence of the peptide is deduced by computing the difference in mass between adjacent y-ion peaks.

Description

A SYSTEM AND METHOD FOR AUTOMATIC PROTEIN
SEQUENCING BY MASS SPECTROMETRY
FIELD OF THE INVENTION
The present invention relates generally to a computer implemented method of determining the amino acid sequence of a protein by automatic interpretation of mass spectra of isotopically-labeled C-terminal peptide fragments of the protein.
BACKGROUND
The linear arrangement of amino acids in a protein is elucidated by protein sequencing. Knowledge of the sequence of a protein is essential to the techniques of molecular biology. For example, protein sequence information is a prerequisite for DNA cloning and provides information for making oligonucleotide probes and polymerase chain reaction (PCR) primers. Furthermore, protein sequencing allows the synthesis of peptides to be used in antibody production, enables the identification of proteins of interest, and helps characterize recombinant products.
When the sequence of a peptide sample is deduced without any additional information such as the sequence of a known related peptide, the approach is known as de novo sequencing. Despite the progress in genomic DNA sequencing, de novo sequencing of proteins and peptides is still required in a biological research environment since many experiments are carried out in organisms whose genomes are not sequenced.
The basic method of protein sequencing is Edman degradation (Ward & Simpson, "Proteins and Peptides, Isolation for Sequence Analysis of," in Molecular Biology and Biotechnology, Robert A. Meyers, Ed., VCH Publishers, Inc. (1995), p. 767), a three-step chemical process based on N-terminal cleavage. Although laboratory automation has made today's practice of the Edman method very efficient, it has several drawbacks, including sensitivity to non-protein contaminants (see, e.g., Keen & Findlay, "Protein Sequencing Techniques" in Molecular Biology and Biotechnology, Robert A. Meyers, Ed., VCH Publishers, Inc., (1995)).
Chemical sequencing of the C-terminus of a protein can be accomplished by the thiocyanate method (Schlack & Kumpf, Physiol. Chem., (1926) 154:125-170).
Although useful for sequencing proteins and peptides that are blocked at the N-terminus, this method also has its drawbacks, including the severity of the reaction conditions and the need to couple the protein to a solid support (Bailey, J. Chromatog. A, (1995), 705:47-65).
Ultimately, mass spectrometry (MS) has emerged as an attractive alternative to chemical methods and has been used to solve sequencing problems that are not easily handled by conventional techniques of protein chemistry (see, e.g., Carr & Annan, "Overview of Peptide and Protein Analysis by Mass Spectrometry," in Current Protocols in Molecular Biology, Ausubel et al., Eds., John Wiley & Sons, Inc., (1997), 10.21). In mass spectrometry, the molecular weights of gas-phase ions that are formed from intact neutral molecules are determined by separation based on their mass-to-charge (m/z) ratios.
One effective way of sequencing proteins is the use of mass spectrometry to determine the molecular weights of peptides in mixtures, such as those resulting from proteolytic digestion. The digestion of a protein with a particular enzyme, e.g., trypsin, cleaves the protein at specific sites whose locations depend on the amino acid sequence of the protein. The result is a collection of peptides that gives rise to a signature mass spectrum, often called a "fingerprint." When m/z values are measured to better than 0.01% accuracy, the amino acid composition of a peptide fragment can be reliably deduced. Thus, a fingerprint can be utilized to unambiguously identify a protein, or to verify a translation product by comparing it to information contained in a database of peptide fingerprints of known proteins.
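The site-specific cleavage described above can be illustrated with a minimal Python sketch. It assumes the commonly cited trypsin rule (cleave after lysine or arginine unless the next residue is proline), which is not stated in this document; the function name `tryptic_digest` and the example sequence are hypothetical.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence into tryptic peptides.

    Assumed rule (not taken from this document): cleave on the C-terminal
    side of K or R, except when the following residue is P.
    """
    # Zero-width split point: preceded by K or R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # A split at the very end of the sequence yields an empty string; drop it.
    return [p for p in pieces if p]


if __name__ == "__main__":
    for peptide in tryptic_digest("MKRPAGKLLR"):
        print(peptide)
```

Because the cleavage sites depend only on the sequence, the set of peptides (and hence their masses) is reproducible for a given protein, which is what makes the fingerprint comparison possible.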
Mass spectrometry is not limited to measuring the masses of single species but, through the technique of tandem mass spectrometry (MS/MS), can also reveal structural information, including peptide sequences. In many mass spectrometry systems, further fragmentation of the gas phase ions occurs, either spontaneously, or by collision with gas molecules in so-called "collision induced dissociation" (CID). The subfragments that are generated can also be separated from one another by m/z ratio.
A particular advantage of tandem mass spectrometry is that it can provide amino acid sequence information for peptides at the picomolar or femtomolar level.
In this application, tandem mass spectrometry typically uses a first mass analyzer to select a particular peptide ion that it permits to undergo fragmentation, for example by CID, to produce subfragment ions of the parent peptide or peptide fragment. The technique also utilizes a second mass analyzer so that, after initial peptide ionization and ion selection, subfragment ions are separated and analyzed. The resulting mass spectra contain m/z ratios for the subfragments.
The fragmentation mechanisms undergone by organic molecules, for example during CID, have been well-studied. Therefore important structural information can be revealed by analyzing the masses of both parent species and their subfragments. In particular, molecules tend to preferentially cleave at weak chemical bonds but many functional groups remain intact during the fragmentation process. It has been found that the peptide amide linkage is particularly susceptible to cleavage under the conditions employed in MS/MS.

Consequently, the tandem mass spectra of peptides contain peaks corresponding to subfragments which differ from one another by single amino acid residues and can therefore assist in sequence determination (see, e.g., Hunt et al., (1986), Proc. Natl. Acad. Sci. USA, 83:6233-6237).
Nevertheless, the problem of analyzing tandem mass spectra remains formidable for a number of reasons. First, cleavage at a peptide bond gives rise to a pair of fragments, one containing the N-terminus, the other containing the C-terminus. Which of these fragments bears the charge after fragmentation is not predictable so that most spectra contain two series of fragments: those containing the C-terminus, known as the y-ions (also called Y" ions); and those containing the N-terminus, known as the b-ions (also called B-ions). The main challenge of de novo sequencing by mass spectrometry is to reliably recognize the ions of one series of fragments in an otherwise complicated spectrum.
Second, the fragmentation process is not ideal. Some amide linkages are not cleaved during CID so that the differences between some peaks in the MS/MS spectrum do not correspond to masses of single amino acid residues but to two or more residues.
Similarly, some fragmentation occurs within amino acid residues to produce subfragments whose masses do not differ from the masses of other subfragments by exact numbers of amino acid residues.
Third, the conditions under which peptide samples are ionized often give rise to multiply charged ions. Therefore there may be series of peaks in the spectrum which correspond to ions of the same fragment bearing different charges. In such circumstances, the peaks which correspond to fragments differing by a single amino acid residue will differ by a m/z value which is a fraction of the mass of the residue.
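To make the effect of multiple charging concrete, the following Python sketch computes the m/z spacing between two fragments that differ by one residue. It assumes protonated positive ions; the proton mass and the glycine residue mass are standard values supplied here, and the function name `mz` and the example masses are hypothetical.

```python
PROTON_MASS = 1.007276  # Da; standard value, not taken from this document
GLYCINE_RESIDUE = 57.02146  # Da, monoisotopic; standard value


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a fragment carrying `charge` protons (assumes protonated positive ions)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    shorter = 1000.0                      # hypothetical neutral fragment mass
    longer = shorter + GLYCINE_RESIDUE    # same fragment extended by one glycine
    for z in (1, 2, 3):
        spacing = mz(longer, z) - mz(shorter, z)
        print(f"z={z}: spacing = {spacing:.5f} (residue mass / z)")
```

The spacing equals the residue mass divided by the charge, so the same residue appears at a different peak separation in each charge-state series.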
Finally, a further problem, which depends upon the resolution of the instrument employed, is that it may not be possible to resolve closely spaced peaks that correspond to different isotopically substituted forms of the same fragment.
In general, then, the outcome of a typical de novo MS/MS analysis of a polypeptide is a spectrum whose interpretation is far from straightforward and which usually results in some unidentified members of the peptide sequence.
Hitherto, computational methods for the interpretation of de novo MS/MS peptide spectra have been only partially successful. Part of the reason is that the spectra themselves do not have sufficient sensitivity or resolution to permit thorough analysis.
Another reason is that the algorithms employed are either too time-consuming to be practical or not accurate enough to be useful. For example, in one early approach, measured masses of peptide fragments after enzymatic digestion are compared with theoretical peptide masses from each sequence entry in a database, using the same cleavage specificity. The comparison gives a score with which to quantify the goodness of fit (see, e.g., Cottrell, Pept. Res., (1994), 7:115-124; Matsui et al., Electrophoresis, (1997), 18:409-417).
In another approach, theoretical spectra of many possible sequences are matched with the actual spectrum until a good fit is obtained (Eng, J.K., et al., "An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database," Journal of the American Society of Mass Spectrometry, 5:976-989, (1994)). A drawback with this approach is that there can be a combinatorial explosion associated with trying to match many possible sequences of amino acid residues with the set that gives rise to a spectrum. In employing approximations to limit the inevitable combinatorial explosion, the correct sequence may often be rejected.
In situations where the protein of interest may have a high homology with proteins whose sequence is already known, rapid identification of proteins has recently been achieved by combining partial sequence data obtained by mass spectrometry with efficient methods of searching large sequence databases (Neubauer et al., Proc. Natl. Acad. Sci. USA, (1997), 94:385-390; Neubauer et al., Nature Genetics, (1998), 20:46-50).
Similarly, direct analysis of large protein complexes has been accomplished by using computer algorithms to correlate acquired peptide fragment mass spectra with predicted amino acid sequences in translated genomic databases (Link et al., Nature Biotechnology, (1999), 17:676-682).
The foregoing techniques will not always impact upon de novo sequencing, however, because of the dependence upon existing sequence data. In an early approach to de novo sequencing by mass spectrometry, mass differences between successive adjacent peaks in the spectra were compared with the masses of the amino acids in turn until a match was found. A sequence was deduced based on a score associated with the intensity of the peaks (Yates, et al., "Computer Aided Interpretation of Low Energy MS/MS Mass Spectra of Peptides", in Techniques In Protein Chemistry II, Ed., J. J. Villafranca, (1991), Academic Press, Inc., p. 477).
In another approach to de novo sequencing, a so-called "spectrum graph" is derived from the measured spectrum by assigning a vertex to each peak and constructing an edge between pairs of vertices whose masses differ by the mass of an amino acid residue (Dancik, et al., J. Comp. Biol., (1999), 6:327-342). The correct sequence can be inferred from the longest path within the graph but only if noise is efficiently eliminated from the spectrum. However, this method produces a large number of suggested sequences with a scoring probability associated with each, and relies upon carrying out a graph theoretical technique, the antisymmetric longest path problem, which scales very poorly with increasing peptide length.
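The construction of a spectrum graph can be sketched in a few lines of Python. This is only an illustration of the vertex-and-edge idea described above, not the cited authors' implementation: the residue table is a deliberately small subset of standard monoisotopic masses, and the function name, tolerance, and peak list are hypothetical.

```python
# Illustrative subset of monoisotopic residue masses in Da (standard values,
# not taken from this document); a real table would cover all amino acids.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}


def spectrum_graph(peaks, tol=0.01):
    """Return edges (low_peak, high_peak, residue) of a spectrum graph.

    Each peak is a vertex; an edge joins two peaks whose mass difference
    matches a residue mass within `tol` Da (an assumed tolerance).
    """
    ordered = sorted(peaks)
    edges = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            diff = high - low
            for residue, mass in RESIDUES.items():
                if abs(diff - mass) <= tol:
                    edges.append((low, high, residue))
    return edges


if __name__ == "__main__":
    # Hypothetical singly charged peak list; 250.0 plays the role of noise.
    for edge in spectrum_graph([175.119, 232.140, 250.0, 303.177]):
        print(edge)
```

Every noise peak adds a vertex that may gain spurious edges, which is one way to see why the number of candidate paths, and hence of suggested sequences, grows quickly for longer peptides.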
The most recent experimental approaches to protein sequencing by mass spectrometry have utilized labelling or tagging of peptide sequences. In a chemical method, for example methyl labelling of C-terminal residues by methyl ester formation, comparison of spectra for labelled and unlabelled samples can lead to sequence data from characteristic peak spacings in the spectra (Hunt et al., Proc. Natl. Acad. Sci. USA, (1986), 83:6233-6237). A general drawback of chemical labeling is the chemical reaction step involved and the need to obtain spectra for two different samples.
In another labelling method, deuterium is exchanged for acidic hydrogens along the peptide sequence (Sepetov, et al., Rapid Commun. in Mass Spect., (1993), 7:58-62).
Although this method permits ready differentiation between b-ion and y-ion series peaks, the technique is only practical for short peptide sequences (< 10 residues) and only offers additional sequence information for those residues with acidic side chains.
Isotopic labelling of the peptide sequence prior to MS/MS analysis with labels other than deuterium has been a desirable technique for some time but requires a sensitivity which is hard to obtain with tandem mass spectrometry and usually requires comparison of spectra for two samples. An example of a technique in which information can be obtained from a single spectrum is described in Gygi et al., "Quantitative analysis of complex protein mixtures using isotope-coded affinity tags," Nature Biotech., 17:994-999 (1999). In this technique, proteins are labeled with a reagent such as iodoacetamide that has affinity for sulfhydryl groups. Proteins from one sample are labeled with a normal reagent and proteins from another sample are labeled with reagent that has been substituted with 8 deuteriums.
Both samples are combined, and further labeled with a biotin affinity tag prior to analysis by mass spectrometry. Peaks from the two samples are separated by 8 mass units.
The drawback of the method is the need for a cysteine residue on the protein samples.
Although an application to ¹⁸O labelling with a four-sector tandem mass spectrometer has been reported (Takao et al., Anal. Chem., (1993), 65:2394-2399) in which two spectra on a single sample are obtained, the method of analysis is a simple comparison of spectra which becomes rapidly impractical for sequences longer than those reported (about 10 residues). De novo sequencing of proteins by mass spectrometry has therefore been a challenging problem for quite some time.
Recently, however, the sensitivity required for de novo sequencing of isotopically labeled peptides has been achieved by combining a nanoelectrospray ion source with a quadrupole time-of-flight tandem mass spectrometer. The approach utilizes an intrinsic feature of the quadrupole time-of-flight device which gives rise to a higher sensitivity and resolution than other types of mass spectrometers (Shevchenko et al., Rapid Communications in Mass Spectrometry, (1997) 11:1015-1024). Isotopic labeling of C-terminal peptide fragments, e.g., by enzymatic digestion of a protein in 1:1 ¹⁶O/¹⁸O water, provides a characteristic isotopic distribution for these fragments that can be readily identified (Schnolzer et al., Electrophoresis, (1996), 17:945-953). The principle of the method is to identify C-terminal fragment ions of a peptide in one spectrum by their 1:1 ¹⁶O/¹⁸O isotopic pattern when the peptide has been labeled at its C-terminus to 50% with ¹⁸O isotopes and to 50% with ¹⁶O isotopes before being subjected to a tandem mass spectrometric analysis. Although two spectra are required, they are both obtained for the same sample. The fact that analysis of the difference between the two spectra, i.e., a subtraction, is used means that measurements can be made with an enhanced sensitivity, leading to identification of a series of peaks from isotopically labelled subfragments. These peaks arise from C-terminal ions that differ in mass by one amino acid, a fact which allows elucidation of the amino acid sequence.
Nevertheless, the analysis of isotopically labelled spectra remains complicated for a number of reasons. The identification of a C-terminal peptide by a characteristic isotopic distribution, such as that obtained when digesting a protein in water having a known percentage of ¹⁸O, is made difficult by the natural isotopic abundance of isotopes such as, inter alia, ¹³C and ¹⁵N. For example, as a result of these natural isotopic abundances, two peaks in a peptide mass spectrum that are separated by 2 Daltons (Da) might arise from: peptide subfragment ions having either a ¹⁶O atom or a ¹⁸O atom at the C-terminus; or peptide subfragment ions having either a ¹⁶O atom at the C-terminus, or one ¹³C atom and one ¹⁵N atom, or two ¹³C atoms, or two ¹⁵N atoms. As the peptides become larger, there is a greater chance for incorporation of the less abundant ¹³C and ¹⁵N isotopes, and the problem of identifying C-terminal peaks for amino acid sequencing becomes increasingly difficult.
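The ambiguity of a 2 Da peak separation can be checked numerically. The Python sketch below uses standard monoisotopic masses, which are supplied here and do not come from this document, to compute the exact mass shift of each substitution listed above; the function name `two_dalton_shifts` is hypothetical.

```python
# Standard monoisotopic masses in Da (reference values, not from this document).
ISOTOPE_MASS = {
    "12C": 12.0000000, "13C": 13.0033548,
    "14N": 14.0030740, "15N": 15.0001089,
    "16O": 15.9949146, "18O": 17.9991610,
}


def two_dalton_shifts() -> dict[str, float]:
    """Exact mass shifts of substitutions that each add a nominal 2 Da."""
    d_c = ISOTOPE_MASS["13C"] - ISOTOPE_MASS["12C"]  # one 13C for 12C
    d_n = ISOTOPE_MASS["15N"] - ISOTOPE_MASS["14N"]  # one 15N for 14N
    d_o = ISOTOPE_MASS["18O"] - ISOTOPE_MASS["16O"]  # one 18O for 16O
    return {
        "18O replaces 16O": d_o,
        "two 13C": 2 * d_c,
        "one 13C and one 15N": d_c + d_n,
        "two 15N": 2 * d_n,
    }


if __name__ == "__main__":
    for label, shift in two_dalton_shifts().items():
        print(f"{label:>20s}: {shift:.4f} Da")
```

All four shifts lie within roughly a hundredth of a Dalton of one another, so the position of a peak alone does not reveal whether it carries the ¹⁸O label; the relative intensities within the isotopic cluster must be considered as well.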
In summary, existing methods for de novo protein sequencing all have drawbacks.
Mass spectrometry is a more promising technique for protein sequencing because it requires picomolar or even femtomolar amounts of sample and produces highly accurate spectra.
However, difficulties in spectral interpretation are significant for larger peptides and proteins. Accordingly, the present art is in need of an analytical technique that permits the sequence of large peptides to be deduced from mass spectra.
Citation of a reference herein shall not be construed as indicating that such reference is prior art to the present invention.

SUMMARY OF THE INVENTION
The present invention involves the derivation of the amino acid residue sequence of a protein or peptide through the automated analysis of differential scanning mass spectrometry data. Specifically, the aspect of peptide sequence analysis addressed by the present invention is the automated identification of C-terminal, or y-ion, peaks in the mass spectrometry data. Once y-ion peaks have been identified, peptide sequences can be deduced by calculating mass differences between adjacent y-ion peaks and attributing each mass difference to a specific amino acid residue. Since a mass spectrum of a peptide consists of a large number of peaks, the derivation of the peptide sequence by human inspection of a simple difference between a pair of spectra is usually not straightforward and rarely fast.
Accordingly, the subject of the present invention is a computer algorithm for deducing the peptide sequence of a peptide from a pair of MS/MS spectra obtained on a partially isotopically labelled sample. The algorithm seeks to compute a "filtered" spectrum comprising just the C-terminus set of subfragments (the y-ion series), from which it is possible to accurately deduce the amino acid sequence.
The present invention involves an apparatus for determining the amino acid residue sequence of a peptide, comprising: an input device configured to accept mass spectrometry data obtained by applying differential scanning mass spectrometry to a sample of the peptide in which an isotopic label is present in a proportion which is substantially different from its natural abundance; a processor configured to execute mathematical operations on the mass spectrometry data; and a memory connected to the processor to store:
a first set of instructions to direct the processor to generate a probability that a peak in the mass spectrometry data derives from a y-ion subfragment of the peptide, wherein the first set of instructions are repeatedly executed for each peak in the mass spectrometry data; a second set of instructions to direct the processor to produce a filtered mass spectrum of the peptide, wherein each peak in the filtered mass spectrum whose intensity is greater than a threshold value is predicted to correspond to a y-ion subfragment of the peptide; and a third set of instructions to direct the processor to derive and store in the memory an amino acid residue sequence of the peptide from the filtered mass spectrum. In a preferred embodiment, the isotopic label is ¹⁸O, and the proportion is 50%.
According to the technique of differential scanning mass spectrometry, the mass spectrometry data comprises a first mass spectrum that has signals from subfragment ions in which the isotopic label is both present and absent, and a second mass spectrum in which signals from subfragment ions in which the isotopic label is not present are substantially suppressed. In a preferred embodiment, the probability is computed from a product of a first scoring value and a second scoring value, wherein the first scoring value is proportional to the likelihood that a peak in the first mass spectrum arises from an isotopic cluster that comprises a signal from a subfragment ion in which the isotopic label is absent and also a signal from a subfragment ion in which the isotopic label is present in the proportion; and wherein the second scoring value is proportional to the likelihood that a peak in the second mass spectrum arises from an isotopic cluster containing a peak from a subfragment ion in which the isotopic label is present in the proportion and in which a peak from a subfragment ion in which the isotopic label is absent is effectively suppressed relative to the first mass spectrum.
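The combination of the two scoring values can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the claimed algorithm: the scoring values are taken as given inputs in the range 0 to 1, the filtered intensity is assumed to be the measured intensity weighted by their product, and the `Peak` class, `filter_spectrum` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # peak position
    intensity: float  # measured intensity in the first spectrum
    s1: float         # first scoring value, assumed to lie in [0, 1]
    s2: float         # second scoring value, assumed to lie in [0, 1]


def filter_spectrum(peaks, threshold):
    """Return (m/z, weighted intensity) pairs predicted to be y-ions.

    Assumption for illustration: the filtered intensity is the measured
    intensity multiplied by the product s1 * s2, and a peak is kept only
    if that weighted intensity exceeds `threshold`.
    """
    filtered = []
    for peak in peaks:
        weighted = peak.intensity * peak.s1 * peak.s2
        if weighted > threshold:
            filtered.append((peak.mz, weighted))
    return filtered


if __name__ == "__main__":
    spectrum = [
        Peak(175.1, 100.0, 0.9, 0.9),  # scores well in both spectra
        Peak(200.0, 100.0, 0.9, 0.1),  # fails the second criterion
        Peak(232.1, 80.0, 0.8, 1.0),
    ]
    for mz_value, weighted in filter_spectrum(spectrum, threshold=50.0):
        print(f"{mz_value:.1f}  {weighted:.1f}")
```

Using a product means that a peak must score well on both criteria to survive: a low value for either scoring value suppresses the peak regardless of the other.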
The present invention additionally involves a method for determining the amino acid residue sequence of a peptide, the method comprising: accepting mass spectrometry data obtained by applying differential scanning mass spectrometry to a sample of the peptide in which an isotopic label is present in a proportion which is substantially different from its natural abundance; generating a probability that a peak in the mass spectrometry data derives from a y-ion subfragment of the peptide wherein the first set of instructions are repeatedly executed for each peak in the mass spectrometry data; producing a filtered mass spectrum of the peptide, wherein each peak in the filtered mass spectrum whose intensity is greater than a threshold value is predicted to correspond to a y-ion subfragment of the peptide; and deriving an amino acid residue sequence of the peptide from the filtered mass spectrum. According to a preferred embodiment of the present invention, the method for determining the amino acid residue sequence of a peptide is executed by a computer under the control of a program, the computer including a memory for storing the program, an input device configured to accept mass spectrometry data and a processor configured to execute mathematical operations on said mass spectrometry data.
BRIEF DESCRIPTION OF THE DRAWINGS
Other advantages of the invention will become apparent upon reading the following detailed description and upon reference to the drawings, in which:
Figure 1. A computer system according to the present invention.
Figure 2. A quadrupole time of flight mass spectrometer used in the preferred embodiment of the invention.
Figure 3. Flow chart of partial isotopic labelling for use with a preferred embodiment of the present invention.
Figure 4. Flow chart of a differential scanning method.
Figure 5. Flow chart of an algorithm according to the present invention.
Figure 6. A representative shape of the scoring values S1" and S2" as a function of g" calculated using Equations 3 and 6, respectively.
Figure 7. Spectra showing comparison of unfiltered and filtered peptide subfragment ion mass spectra.
Figure 8. Representative mass spectrometer for practicing the invention.
DETAILED DESCRIPTION OF THE INVENTION
Introduction
A method of identifying the y-ion peaks of a protein in a tandem mass spectrum is described. The term "protein" is used herein in a broad sense which includes, mutatis mutandis, peptides, polypeptides and oligopeptides, and derivatives thereof, such as glycoproteins, lipoproteins, phosphoproteins, and metalloproteins. The principal distinguishing feature of such species is that the "protein" comprises one or more peptide (-N(-H)C(=O)-) linkages.
The aim of the method is to simplify and automate analysis of the MS/MS spectra in such a way that a likely peptide sequence can be proposed. The method is implemented in a computer algorithm. It is based on acquiring not just one, but two, fragment-ion spectra of peptides from a protein sample which has been enzymatically digested in a water mixture comprising known proportions of H₂¹⁸O and H₂¹⁶O. The water mixture is such that the fractional composition of H₂¹⁸O is substantially greater than its natural abundance and the conditions are such that the peptide fragments incorporate ¹⁸O labels at their C-termini in the same proportion as is present in the water mixture. One spectrum is obtained by selecting the entire ¹⁶O/¹⁸O isotopic mixture of the peptide for fragmentation and a second spectrum is obtained for which only ¹⁸O labeled peptide ions are fragmented.
After acquisition of the ¹⁶O/¹⁸O and ¹⁸O mass spectra, the data are analyzed using the computer program product and methods of the present invention in order to identify the peaks which arise from y-ions. Peaks corresponding to C-terminal peptide subfragments can be identified when comparing the two spectra using two criteria. The first criterion is their ¹⁶O/¹⁸O isotopic distribution in the first spectrum which is usually difficult or impossible to recognize unambiguously by visual inspection. The second criterion is the change in the isotopic distribution of C-terminal subfragment ion peaks when comparing the first spectrum with the second. C-terminal ions are identified by having peaks from complete ¹⁶O/¹⁸O isotopic distributions in the first spectrum but only peaks from ¹⁸O isotopes in the second spectrum. Non C-terminal ions have the same isotopic representation in both spectra since they do not contain the ¹⁸O isotope in the proportion introduced by enzymatic digestion.
Once all C-terminal fragment ions have been identified, the peptide sequence can be deduced by calculating the mass difference between adjacent fragments and from their order in the spectrum. The methods and computer program product of the present invention may further comprise the calculation of subtracted and filtered mass spectra.
The methods of the present invention may be applied to proteins or peptides of any length, provided that machine resolution permits a well-resolved mass spectrum to be obtained, in particular as long as the different isotopes can be resolved. The number of amino acids which can be read is sequence dependent, so there will be peptides of say 20 amino acids in length for which only 5 amino acids can be read, whereas there may be others that are 25 amino acids in length and for which all 25 residues can be read. In general, a tendency is observed that the readable sequence gets shorter when the peptide exceeds a certain size. It has been found that peptides up to a size of 3 kDa (approximately 30 residues) can still be sequenced to a sufficient length (i.e., it is possible to read 20 of the 30 amino acids). There is no lower limit for sequencing.
Apparatus
The invention, as shown in Figure 1, comprises a system 100 for deducing a peptide sequence from mass spectrometry data obtained from mass spectrometer 130.
System 100 comprises a processor 102; a section of memory 104 which will typically include both high speed random access memory as well as non-volatile memory (such as one or more magnetic disk drives); an input device 106, for inputting user-specific parameters, which may comprise a keyboard, mouse and/or touch-screen display; an output device 108 for printing or displaying the sequence of the protein or peptide, and at least one bus 110 connecting the processor 102, the memory 104, the input device 106, and the output device 108. Though not shown in Figure 1, the system 100 also preferably comprises a network or other communication interface for communicating with other computers as well as other devices.
The memory preferably stores an operating system 120 for providing basic system services, a file system 122, an analysis module 128 configured to analyze mass spectrometry data, a cache 126 and optionally a graphical user interface (GUI) 124.
Preferably, system 100 acquires mass spectrometry data via data channel 132 from mass spectrometer 130. In one embodiment of the present invention, the mass spectrometer 130 is a triple quadrupole mass spectrometer.
The analysis module 128, upon receiving a request to deduce the sequence of the peptide or protein from mass spectrometry data, executes instructions which enable identification, with substantial probability, of which peaks in the mass spectrometry data correspond to peptide subfragments in the y-ion series. Once the y-ions are identified, the amino acid sequence is determined by calculating the mass differences between adjacent y-ion peaks. Each mass difference corresponds to the mass of one amino acid residue. All amino acids in a peptide chain, except for leucine and isoleucine which have the same mass as each other, may be distinguished. The entire protein sequence may be determined by concatenating or overlapping separate peptide sequences determined from the spectra of different peptide fragments, using principles well known to one skilled in the art.
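The final read-out step, turning adjacent y-ion mass differences into residues, can be sketched in Python as follows. The residue table holds standard monoisotopic masses supplied here rather than taken from this document; the function name `read_residues`, the tolerance, and the example peak list are hypothetical, and singly charged ions are assumed.

```python
# Standard monoisotopic residue masses in Da (reference values, not from this
# document). Leucine and isoleucine share one entry because their masses are equal.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_residues(y_ions, tol=0.01):
    """Assign a residue to each gap between adjacent singly charged y-ion peaks.

    Residues are returned in order of increasing y-ion mass, i.e. reading
    from the C-terminus toward the N-terminus. A gap that matches no single
    residue within `tol` Da (an assumed tolerance) is reported as "?".
    """
    ordered = sorted(y_ions)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        diff = high - low
        match = next(
            (name for name, mass in RESIDUE_MASS.items() if abs(diff - mass) <= tol),
            "?",
        )
        residues.append(match)
    return residues


if __name__ == "__main__":
    # Hypothetical y1..y4 peaks of a peptide ending in ...A-(L/I)-G-R.
    print(read_residues([175.11895, 232.14041, 345.22447, 416.26158]))
```

Reversing the returned list gives the residues in the conventional N-to-C direction. A "?" entry marks a gap larger than one residue, for instance where an amide bond was not cleaved.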
This system, when operated in a laboratory environment in conjunction with mass spectrometry data can provide an efficient and useful method of deducing the amino acid residue sequence of a protein or peptide.
Instrumentation
A mass spectrometer separates ions according to their m/z ratio, the ratio of their mass, m, to charge, z. In a first stage, a sample is ionized, for example by electron bombardment, creating ions that, in a subsequent stage, are accelerated through an inhomogeneous electromagnetic field towards a detector. In one embodiment, the magnetic field perturbs the trajectories of the ions according to their m/z ratio: an ion with a small mass will travel more quickly and be less easily perturbed than a heavier ion; an ion with a small charge will be perturbed more than one with a large charge. In practice, most ions that are produced carry only a single positive charge, though some ionization techniques can readily give rise to multiply charged ions. The production of ions of different m/z from the same sample arises for several reasons: the conditions of ionization may cause the molecules to dissociate; the ions themselves may subsequently rearrange and dissociate; and because there are invariably many different isotopic substituents in the molecules of a given sample.
In one embodiment, a triple quadrupole mass spectrometer is used to acquire peptide subfragment data. An example of such a machine is an API III from Perkin Elmer Sciex (PE-Sciex). In this embodiment, three quadrupoles are used as an ion guide, the mass filter and the collision cell. The typical layout of such a mass spectrometer 300 is shown in Figure 3, though it is understood that variations on the components of such a mass spectrometer are envisaged for practice with the methods of the present invention.
In tandem mass spectrometry, two stages of mass analysis are used. In the first stage, precursor ions are produced from an ionization source 304. In a preferred embodiment, electrospray ionization is used to produce the precursor ions. The precursor ions are optionally passed through a first quadrupole 306 which acts as an ion guide. This ion guide is not usually a mass-selective quadrupole and is usually only present in triple-quadrupole machines. Precursor ions pass into a mass filter 310 that selects a precursor ion having a particular value of the m/z ratio, or, more generally precursor ions whose m/z ratios lie within a narrow range. Currently, it is known that the mass filter 310 which gives the greatest sensitivity is the quadrupole mass filter. An ion trap can alternatively be used.
In a preferred embodiment of the present invention, mass filter 310 is a quadrupole mass filter. The range of m/z ratios transmitted by the quadrupole mass filter is known as the transmission window.
In a preferred embodiment, mass spectrometer 300 used to acquire peptide subfragment data is a quadrupole time of flight ("Q-TOF") mass spectrometer. An example of such a machine is the "Q-Tof2" by Micromass, in the United Kingdom. Such a machine employs two quadrupoles. A quadrupole 312 is employed as the mass filter for precursor ion selection, and a quadrupole 322 is used in a collision cell 320 where the precursor ion is further fragmented into subfragments. A time of flight ("TOF") mass analyzer 340 is used to examine the subfragment ions. A representative mass spectrometer design for practicing the invention is also shown as Figure 8.
Ionization Techniques:
There are a number of ionization techniques used to produce precursor ions for mass spectrometry analysis. These include, but are not limited to, electron ionization, chemical ionization, field ionization, field desorption, fast-atom bombardment, plasma desorption, laser desorption, and electrospray ionization. The two most commonly-used ionization techniques for biomolecule analysis are matrix-assisted laser desorption ionization ("MALDI") and electrospray ionization ("ESI"). The methods of the present invention are independent of the ionization technique employed.
In one embodiment of the mass spectrometer utilized with the present invention, MALDI is used. MALDI is a specific type of laser desorption in which biomolecules are co-crystallized with a large molar excess of a small, ultraviolet radiation-absorbing organic acid (matrix). Upon irradiation of the co-crystal with an ultraviolet laser, matrix molecules and biomolecules are sent into the gas phase, where protons are transferred from the matrix molecules to the biomolecules, thus forming biomolecule precursor ions for analysis.
MALDI usually gives rise to singly-charged precursor ions which subsequently undergo "post-source decay" (PSD) to produce fragment ions. It is not usually necessary to use collision-induced dissociation with MALDI, therefore. The MALDI method is often used in conjunction with a time of flight mass spectrometer and therefore may be used with the methods of the present invention.
In a preferred embodiment, in use with the present invention, the ionization source 304 produces precursor ions by ESI, according to which ions are formed by spraying a dilute solution of biomolecules at atmospheric pressure from the tip of a fine metal capillary. The spray creates a fine mist of droplets that become highly charged in a high electric field. As the droplets evaporate, the biomolecules pick up one or more protons from the solvent to form ions with single or multiple positive charges. As the droplets shrink, charge repulsion causes the ions to be evaporated from the droplet surface, and these ions are then analyzed in the mass spectrometer. In a preferred embodiment of the mass spectrometer used with the present invention, ESI is used to generate precursor ions.
Whereas MALDI can result in extensive fragmentation of the sample and precursor ions, ESI results in little to no fragmentation. Furthermore, samples for ESI are in solution so that the technique is ideally suited for coupling with purification techniques, such as HPLC.
Mass Filters:
In a preferred embodiment of the mass spectrometer in use with the present invention, a quadrupole mass filter 310 is used to select precursor ions. A quadrupole mass filter comprises a quadrupole 312, consisting of two pairs of precisely parallel metal rods, with opposite rods being electrically connected. A voltage made up of a direct current potential ("DC") and an alternating radiofrequency ("RF") component is applied to each pair of rods. Because ions passing through the quadrupole are alternately attracted to and repulsed from the rods, they have an oscillating trajectory, and only those ions with kinetic energy in a certain range pass between the rods and out the other side. All other ions collide with the rods. Since the kinetic energy of any given ion is proportional to its mass, the selection of ions is mass-dependent. The ions that pass through the quadrupole define the transmission window of the quadrupole. If the DC and RF amplitudes are varied together in such a way that their ratio, DC/RF, remains constant, the center of the transmission window can be shifted to other m/z values, and ions with different masses can be "filtered" through and analyzed.
Production of Subfragment Ions:
In one embodiment of the mass spectrometer used with the present invention, the filtered precursor ions having a particular m/z are sent to a collision cell 320. In a triple quadrupole mass spectrometer, the collision cell comprises the third quadrupole. In a TOF machine, it typically comprises the second of two quadrupoles. It is understood that many machines that are compatible with the present invention utilize collision cells that comprise quadrupoles. In machines that utilize ion traps, the ion trap itself is a collision cell because ions can be collided with rest gas atoms inside it.
In collision cell 320, the filtered precursor ions collide with uncharged gas molecules, such as argon or xenon, or dinitrogen, delivered from a source 314.
The kinetic energy of the precursor ions is partially transformed into vibrational energy, resulting in the breaking of the precursor ions' predominantly weak chemical bonds. Peptide precursor ions preferentially fragment at their peptide amide bonds to produce peptide subfragments. The resulting subfragment ions are analyzed by the mass analyzer 340.
Mass Analyzers:
In a preferred embodiment, the mass analyzer used is a time of flight ("TOF") mass analyzer. In this type of mass analyzer, subfragment ions are accelerated through accelerating plates 342 and pass into a region that has no external electric field, known as a drift tube 344. If all of the subfragment ions entering the drift tube have the same kinetic energy, given by ½mv² for an ion of mass m and speed v, then since velocity is inversely proportional to the square-root of mass, subfragments with larger mass 346 will travel more slowly than subfragments with smaller mass 348. The heavier subfragment ions will therefore reach the detector 350 at the end of the drift tube at a later time than the lighter subfragment ions. TOF analyzers are often used in conjunction with MALDI. TOF analyzers are advantageous in that they have virtually unlimited mass range and high scan rates.
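The dependence of arrival time on mass can be made concrete with a short calculation. The following is a minimal sketch, assuming ions accelerated from rest through a single potential difference and then drifting through a field-free tube; the accelerating voltage (20 kV) and tube length (1 m) are illustrative assumptions and are not taken from the description.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, in kilograms

def flight_time(mass_da, charge, accel_volts, tube_m):
    """Drift time in seconds for an ion whose kinetic energy is z*e*V."""
    kinetic_energy = charge * E_CHARGE * accel_volts        # equals (1/2) m v^2
    speed = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return tube_m / speed

# Hypothetical instrument settings, for illustration only.
light = flight_time(500.0, 1, 20_000.0, 1.0)
heavy = flight_time(2000.0, 1, 20_000.0, 1.0)
```

Because speed scales with the inverse square root of mass at fixed kinetic energy, the ion that is four times heavier takes twice as long to reach the detector.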

In a preferred embodiment, the detector 350 is an electron multiplier, wherein the display of the mass spectrum is effectively instantaneous. Detector 350 transmits mass data to computer system 100, via transmission channel 132.
A limitation of TOF analyzers is that peaks are broadened because not all members of the same subfragment ion population have the same kinetic energy. Since the initial energy spread is mass dependent, peaks from heavier subfragment ions are broader. As is well known to one skilled in the art, the initial kinetic energy distribution of subfragment ions entering the drift tube can be decreased by increasing the final accelerating voltage.
The resolution of the TOF analyzer can also be increased by increasing the length of the drift tube, which increases the time difference between arrivals of ions of different m/z, but also increases the spread of arrival times of ions having the same m/z. In another embodiment of the mass spectrometer used with the present invention, the TOF analyzer is a "reflectron" type in which the ions follow a curved path. A reflectron TOF analyzer slows the ions down and turns them round before directing them to the detector. When the ions turn around, the slower ones catch up with the faster ones.
Mass Spectrometry Data
Mass spectrometry data comprises a number of elements, wherein each element has an intensity value, I, for a m/z value. The data comprises elements across a range of m/z values. A widely used name for the unit of m/z is the "Thomson" (Th). The collection of data comprising intensity values for a range of m/z values in Thomson is often called a "mass spectrum." The m/z values in a mass spectrum are typically separated from one another by 0.02 Th, but, depending upon resolution, may be separated from one another by 0.01 Th or 0.05 Th.
A "peak" in a mass spectrum is defined by a collection of adjacent elements, at which each intensity value is above a threshold intensity value. Mass spectrometry data typically also comprises a background intensity, and many low-intensity pieces of data, often called noise. The threshold intensity value may be chosen so that noise is eliminated from consideration during analysis. Usually a peak intensity is proportional to its height, though this approximation may break down for more complex spectra, particularly for heavier ions.
Strictly, the overall intensity of a peak is obtained by calculating the area under the peak. In one embodiment, the calculation is achieved by a centroiding method.
In centroiding, for any peak whose width, measured as full-width at half maximum height ("FWHM"), is at least 0.04 Th, data in a window of width 0.08 Th are merged into the peak and added up. Centroiding is not generally good enough for the accuracy needed with the present invention because separate peaks may be accidentally merged.
Accordingly, in a preferred embodiment, an integration method is employed for calculating peak intensities.
This method preferably adds all intensities that are present around a peak within a window of about ± 0.02 Th. Because different subfragment ions within the same spectrum may have different charges, a different window should be chosen according to the number of charges on the subfragment ion. Accordingly, for a singly charged fragment, the window is preferably 0.04 Th; for a doubly charged fragment, the window is preferably 0.02 Th. It is consistent with the methods of the present invention that other windows may be chosen when carrying out peak integration. Indeed it is also possible that different sized windows may be chosen over different regions of a mass spectrum.
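The charge-dependent integration window can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name, the parallel-list data layout, and the generalization of the window to 0.04 Th divided by the charge (which reproduces the two quoted values for charges 1 and 2) are assumptions.

```python
def integrate_peak(mz, intensity, center, charge):
    """Sum the intensities inside a charge-dependent window around `center`.

    The full window width is taken as 0.04 Th / charge (an assumption that
    matches the quoted 0.04 Th for z = 1 and 0.02 Th for z = 2).
    """
    half_width = 0.04 / charge / 2.0
    tolerance = 1e-9  # absorbs floating-point rounding at the window edge
    return sum(i for x, i in zip(mz, intensity)
               if abs(x - center) <= half_width + tolerance)

# Synthetic data points spaced 0.02 Th apart around a peak at 500.00 Th.
mz = [499.96, 499.98, 500.00, 500.02, 500.04]
intensity = [1.0, 2.0, 5.0, 2.0, 1.0]
singly = integrate_peak(mz, intensity, 500.00, charge=1)
doubly = integrate_peak(mz, intensity, 500.00, charge=2)
```

With these synthetic points, the singly charged window collects the three central points, while the narrower doubly charged window collects only the central one.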
Most of the chemical elements of which organic molecules are comprised have more than one naturally occurring isotope, see Table 1, hereinbelow. Because a mass spectrum is made up from signals produced by a large number of ions, the spectrum comprises a statistical sampling of all of the naturally occurring isotopes. As molecules become larger, the percentage of the population of molecules having one or more atoms of a heavier isotope also increases. Consequently, a given subfragment ion does not appear as a single sharp peak in the spectrum (except in the case of artificially ensured isotopic purity). Instead, the portion of the mass spectrum around the m/z value of a given ion contains a number of peaks because each of the elements present in the ion has its own distribution of isotopes in nature.
Table 1
Isotopic Mass and Abundance Values for Atoms in Proteins
(Taken from Wapstra & Audi, Nucl. Phys., (1985), A432:1-54)

Element    Isotope  Mass             Natural abundance (%)
Hydrogen   ¹H       1.007 825 035    99.985
           ²H       2.014 101 779    0.015
Carbon     ¹²C      12.000 000 000   98.90
           ¹³C      13.003 354 826   1.10
Nitrogen   ¹⁴N      14.003 074 002   99.634
           ¹⁵N      15.000 108 97    0.366
Oxygen     ¹⁶O      15.994 914 63    99.762
           ¹⁷O      16.999 131 2     0.038
           ¹⁸O      17.999 160 3     0.200
Sulphur    ³²S      31.972 070 698   95.02
           ³³S      32.971 458 428   0.75
           ³⁴S      33.967 866 650   4.21
           ³⁶S      35.967 080 620   0.02

Therefore, the mass spectrum of a given peptide subfragment will comprise a number of closely separated peaks, each of which corresponds to a particular distribution of isotopes amongst its atoms. If the peptide subfragment attains a single charge during ionization, then the closely separated peaks for that subfragment are each separated by approximately one m/z unit. The collection of peaks which correspond to fragments differing from one another only by isotopic variation is called a cluster.
With the exception of ¹²C, whose mass is defined to be 12.0000 atomic mass units, no isotope has an integer mass. The mass of a peptide molecule with one ¹³C atom is not exactly the same as the mass of the same peptide molecule with one ¹⁷O atom but no ¹³C atoms. Therefore the peaks within a cluster may be poorly resolved and may overlap to a great extent.
The mass of those molecules in which every atom is present as the most abundant isotope is called the "monoisotopic mass." The monoisotopic mass of a molecule comprises a sum of the accurate masses for the most abundant isotopes over all the atoms. The peak which corresponds to the monoisotopic mass is typically of lowest mass because the most abundant isotope of each element occurring in a protein or peptide has the lowest mass of all the isotopes. This peak is not always the most intense, however.
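As a worked illustration of this definition, the monoisotopic mass can be computed by summing the Table 1 masses of the most abundant isotopes. The sketch below is only an example: the function name and the dictionary representation of an elemental formula are assumptions, and glycine (C2H5NO2) is chosen purely for illustration.

```python
# Masses of the most abundant isotope of each element, from Table 1.
MOST_ABUNDANT = {
    "H": 1.007825035,
    "C": 12.000000000,
    "N": 14.003074002,
    "O": 15.99491463,
    "S": 31.972070698,
}

def monoisotopic_mass(formula):
    """Sum the most-abundant-isotope masses over all atoms in `formula`.

    `formula` maps an element symbol to its atom count.
    """
    return sum(MOST_ABUNDANT[element] * count
               for element, count in formula.items())

# Glycine, C2H5NO2, used here only as a small worked example.
glycine = monoisotopic_mass({"C": 2, "H": 5, "N": 1, "O": 2})
```

For glycine the sum is 2(12.000000) + 5(1.007825035) + 14.003074002 + 2(15.99491463), or about 75.03203.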
The intensity distribution of the peaks within a cluster is often called an "envelope"
and its shape is the result of many contributing factors. For very large molecules, the peak corresponding to the monoisotopic mass is not necessarily the most intense.
The most significant contributor to the isotopic peak pattern for biomolecules is ¹³C.
The occurrences of the heavy isotopes of oxygen, nitrogen, and sulfur also contribute to the isotope envelope.
Carbon has two principal naturally-occurring isotopes: ¹²C, which has a mass of 12.000000 and a natural abundance of 98.9%; and ¹³C, which has a mass of 13.003355 and a natural abundance of 1.1%. Irrespective of peptide size, the first peak in the resolved isotopic cluster arises from the all-¹²C-containing ion. For peptides with masses less than approximately 1,800 Daltons (corresponding to peptides containing approximately 100 carbon atoms), this is the most intense peak. However, for peptides with masses greater than 1,800 Daltons (corresponding to peptides containing more than about 100 carbon atoms), the first peak in the isotopic cluster will not be the most intense peak because the all-¹²C-containing ion will no longer be the most abundant, i.e., on average every molecule in the sample will contain at least one atom of ¹³C. In such cases, it may be more useful to consider the most intense peak and refer to it as the "average mass."
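The shift of the most intense peak away from the first peak of the cluster can be illustrated with a simple binomial model. The sketch below is a simplification that considers carbon only and ignores the heavy isotopes of hydrogen, nitrogen, oxygen, and sulfur; the function name is an assumption.

```python
from math import comb

P13C = 0.011  # natural abundance of 13C, as a fraction (1.1%)

def c13_distribution(n_carbons, max_heavy=4):
    """Probability of finding k atoms of 13C among n carbons, k = 0..max_heavy."""
    return [comb(n_carbons, k) * P13C**k * (1.0 - P13C)**(n_carbons - k)
            for k in range(max_heavy + 1)]

small = c13_distribution(50)    # first peak (k = 0) is the most intense
large = c13_distribution(100)   # peak with one 13C (k = 1) overtakes it
```

In this carbon-only model the ratio of the second peak to the first is n × 0.011 / 0.989, which exceeds one once the ion contains about 90 carbon atoms, in line with the approximate figure of 100 carbon atoms given above.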
Manipulation of Data:
A feature of the present invention is the comparison of two mass spectra obtained for the same sample, the two spectra differing from one another by the centering of the transmission window. Obtaining the difference between the two spectra on the same sample by simple subtraction is rarely straightforward and several data processing operations should be carried out. One problem is that subtraction may lead to negative peaks due to phase mismatches. As an illustration, there may be a peak in both of the two spectra corresponding to an m/z value of 123. In the first spectrum it may start at 122.85 and end at 123.15; but in the second spectrum the corresponding peak may start at 122.88 and end at 123.18. In order for subtraction to be effective, precise alignment of the spectra is required, a procedure that may be difficult if the phase mismatch is not constant over the whole range of the spectrum. In practice, this problem is preferably addressed by "partial centroiding:"
A bin-width is chosen, typically 0.05 Th but which may be as low as 0.02 Th, according to the mass resolution of the instrument. The spectra are divided into regular spacings of this bin width. If two data points are within the bin-width, their intensities are added up.
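The binning step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the parallel-list input, and the choice to report each bin by its center are not taken from the description, and the data points are synthetic.

```python
import math

def partial_centroid(mz, intensity, bin_width=0.05):
    """Sum the intensities of all data points that fall into the same bin.

    Bins are regular intervals of `bin_width` Th starting at 0 Th, so two
    spectra processed with the same bin width share a common m/z grid.
    Returns a list of (bin_center, summed_intensity) sorted by m/z.
    """
    totals = {}
    for x, i in zip(mz, intensity):
        index = math.floor(x / bin_width)
        totals[index] = totals.get(index, 0.0) + i
    return [((index + 0.5) * bin_width, total)
            for index, total in sorted(totals.items())]

# Two synthetic spectra whose sampling points are offset by 0.01 Th.
first = partial_centroid([123.01, 123.03], [10.0, 20.0])
second = partial_centroid([123.02, 123.04], [12.0, 18.0])
```

After binning, both synthetic spectra report their intensity at the same bin center, so they can be subtracted point by point despite the original phase offset.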
A second complication in the subtraction of spectra is the fact that, under slightly different operating conditions (as necessarily arises for the two different spectra), a pair of peaks common to both spectra do not necessarily have the same intensity. So, even when aligned in phase, the subtraction of one peak from another may not give a baseline value, thus giving rise to small positive or negative peaks. Although the spectra can be scaled so as to match peak heights to one another, the scaling factor required may vary over the range of the spectrum. In a preferred embodiment, the spectrum of the ¹⁸O-containing peptide subfragment is scaled to overlap with the ¹⁶O/¹⁸O spectrum. The spectra are divided into windows, typically 20 Th wide, though other values, both larger and smaller, are consistent with the methods of the present invention. In each such window, the highest peak in the ¹⁶O/¹⁸O spectrum is determined with, say, m/z value mP and intensity I1. The highest peak in the ¹⁸O spectrum in the range (mP-1, mP+2) is determined, with intensity I2.
The 20 Th wide window of the ¹⁸O spectrum is scaled with the factor I1/I2. Finally, once the scaled spectrum has been subtracted from the unscaled spectrum, noise filtering is applied to the resulting spectrum: any peak whose width is below some threshold, say 0.05 Th, is eliminated.
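The window-by-window scaling and subtraction can be sketched as below. This is a hedged illustration rather than the implementation of the invention: it assumes both spectra have already been placed on one shared, ascending m/z grid, the names `mixed` (the ¹⁶O/¹⁸O spectrum) and `heavy` (the ¹⁸O spectrum) are hypothetical, and the final width-based noise filter is omitted.

```python
def scale_and_subtract(mz, mixed, heavy, window=20.0):
    """Scale `heavy` per window by I1/I2 and subtract it from `mixed`.

    mz, mixed and heavy are parallel lists on a shared ascending m/z grid.
    """
    diff = list(mixed)
    if not mz:
        return diff
    start = mz[0]
    while start <= mz[-1]:
        idx = [k for k, x in enumerate(mz) if start <= x < start + window]
        start += window
        if not idx:
            continue
        top = max(idx, key=lambda k: mixed[k])      # highest peak of `mixed`
        m_p, i1 = mz[top], mixed[top]
        near = [heavy[k] for k, x in enumerate(mz) if m_p - 1 < x < m_p + 2]
        i2 = max(near, default=0.0)                 # highest `heavy` peak near mP
        if i2 <= 0:
            continue                                # nothing to scale against
        factor = i1 / i2
        for k in idx:
            diff[k] = mixed[k] - factor * heavy[k]
    return diff

# Toy intensities chosen only to exercise the arithmetic.
result = scale_and_subtract([500.0, 502.0, 510.0],
                            [90.0, 100.0, 40.0],
                            [0.0, 50.0, 20.0])
```

In this toy case the scaling factor is 100/50 = 2, so the peaks at 502 and 510 cancel and only the peak at 500, which is absent from the ¹⁸O spectrum, remains.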
Partial Isotopic Labeling of Peptides
In a preferred embodiment, the computer program product and methods of the present invention are for use in conjunction with partial isotopic labeling of peptide fragments of a protein and the differential scanning mass spectrometry technique. Partial isotopic labeling of the C-termini of peptide fragments can be accomplished by methods known to those of skill in the art. A preferred embodiment for use with the present invention is shown in Figure 3. Peptides are labeled by enzymatic digestion of a protein 200 using, inter alia, trypsin, chymotrypsin, or papain, preferably trypsin, in bulk solvent water, a known proportion of which is ¹⁸O-labeled water, i.e., H₂¹⁸O, step 202.
The known proportion of labelled water is substantially different from the proportion of the label found naturally. Preferably, substantially different means present in an amount that renders contribution from the natural abundance of the isotope insignificant when carrying out mass spectrometry measurements and means present in an amount that facilitates automated analysis of a mass spectrum so that signals from peptides that have incorporated label from the labeled water are readily distinguished. In one embodiment of the present invention, the protein is digested in the presence of 30% by volume ¹⁸O-labeled water, preferably in the presence of 33% by volume ¹⁸O-labeled water, more preferably in the presence of 40% by volume ¹⁸O-labeled water, or most preferably 50% by volume ¹⁸O-labeled water. In general, any known proportion between about 30% and about 75% by volume of ¹⁸O-labeled water is suitable for carrying out the methods of the present invention. Proportions by volume of about 30% to about 75% ¹⁸O-labeled water are substantially different from the natural abundance of ¹⁸O-labeled water.
Those of skill in the art will recognize that enzymatic digestion of a protein in, e.g., 50% H₂¹⁸O and 50% H₂¹⁶O results in the generation of many peptide fragments.
After digestion, the peptide fragments are purified and separated, step 204, by, e.g., gel electrophoresis or HPLC. The peptide fragments 206 that are produced are analysed by mass spectrometry. Accordingly, hereinafter the term peptide will also include the term peptide fragment, as understood to be a peptide that has been produced by fragmentation of some longer peptide.
When the enzyme digests the protein, it cleaves a peptide amide bond leaving at least one peptide fragment with a free amino group (N-terminus) and a corresponding peptide fragment with a trailing carbonyl group (C-terminus). A water molecule from the bulk solvent water adds to the C-terminus group to produce a carboxylic acid group. Due to the presence of a known proportion of ¹⁸O-labeled water, a known proportion of the cleaved peptide fragments will have ¹⁸O at the C-terminus. The proportion of cleaved peptide fragments with ¹⁸O at the C-terminus is preferably substantially the same as the proportion of ¹⁸O-labeled water by volume in the bulk solvent water.
In a preferred embodiment, the known proportion of ¹⁸O-labeled water is 50% by volume. For a particular peptide, digested in bulk solvent water, 50% of which by volume is ¹⁸O-labeled water, for every peptide fragment molecule of mass m with a ¹⁶O atom incorporated at the C-terminus, there will be approximately one peptide fragment molecule with mass m+2 because an ¹⁸O atom is incorporated at the C-terminus. Therefore, the peptide fragment and each subfragment of the peptide fragment that includes the C-terminus will have the characteristic 1:1 ¹⁶O/¹⁸O isotopic distribution that should be distinguishable in a mass spectrum as two peaks of similar intensity separated by two mass units. At lower resolution, such a pair of peaks may appear to be a single split peak. Thus, y-ions in an MS/MS spectrum of such a sample should appear as split peaks or pairs of similar intensity peaks.
Unfortunately, more often than not, it is not possible to discern visually the 1:1 ¹⁶O/¹⁸O isotopic pattern in a mass spectrum of a peptide fragment. Other ions can mimic the pair of peaks or split peak distribution and overlapping subfragment ions with similar masses can distort it. For example, as already mentioned, in sufficiently long peptide sequences, the peak at m+1 due to molecules with one ¹³C substituent is at least as big as the peak at m corresponding to those molecules with no ¹³C substituent. If it were straightforward to identify y-ion peaks in this way then sequencing of the peptide by inspection of a mass spectrum of the isotopic mixture would be feasible.
Although the preferred embodiments of the invention are described herein using peptides labeled at the C-terminus with ¹⁶O and ¹⁸O using tryptic digestion in H₂¹⁶O and H₂¹⁸O, those of skill in the art will recognize that the present invention can be used in conjunction with peptide fragments partially labeled with other isotopes, for example ¹⁷O, and using alternative peptide labeling techniques. It will be understood that the amounts of label that should be incorporated may differ for other isotopic labels from those that are preferred for ¹⁸O but that one of skill in the art will be able to determine the amount of label, different from the natural abundance, that should be utilized in order to practice the methods of the present invention.
Furthermore, in principle, there is an analogous labeling scheme for b-ions. In such an embodiment, the labeling is preferably on the N-terminus. Isotopic labeling at the N-terminus is not as straightforward as isotopic labeling at the C-terminus, which is readily accomplished at the same time as enzymatic digestion. An ¹⁵N-based labeling scheme is not ideal because there are very few practical reactions which could introduce such an isotopic label into the peptide. Thus, labeling at the N-terminus is preferably accomplished artificially, for example, by acetylation. An acetylation reaction introduces a CH3-C(=O)- group at the N-terminus (see, for example, Pfeifer, T., Rucknagel, P., Kuellertz, G., and Schierhorn, A., "A strategy for rapid and efficient sequencing of Lys-C peptides by matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry post-source decay," Rapid Commun. Mass Spectrom., 13(5):362-9 (1999)). Carrying out the acetylation reaction with a mixture of reagents, one ordinary, the other containing a heavier isotope, could introduce a mixture of isotopes at the N-terminus. Since the acetylation reaction is an additional reaction to be performed in such a scheme, isotopic labeling cannot ordinarily be accomplished with the same efficiency as with C-terminal labeling during enzymatic digestion, which is usually required to generate the peptide fragments for sequencing.
Additionally, the isotopically labeled component is preferably CH3-C(=¹⁸O)- or C(=O), (both of which give a mass shift of 2 Da) which are more expensive than H₂¹⁸O, which can be readily purchased.
The methods of the present invention are not limited to sequence determination of peptide fragments obtained by enzymatic digestion of a protein. The sequence of any peptide that has been subjected to partial isotopic labelling may be determined by the method of the present invention.
Differential Scanning Mass Spectrometry
In differential scanning mass spectrometry, outlined in Figure 4, two MS/MS spectra are obtained for a given peptide fragment 400. A first spectrum, denoted SP1, is obtained, step 402, for the mixture of ¹⁶O and ¹⁸O containing peptide and their respective subfragments. A second spectrum, denoted SP2, is obtained, step 406, for just the ¹⁸O containing peptide and its subfragments. Thus, in SP2, signals for the ¹⁶O containing peptide and its subfragment ions are substantially suppressed. In a preferred embodiment, these two spectra are collected on the same peptide sample. The first and second spectra may be obtained in any order but are separated by a step of re-centering the transmission window, step 404. Computational analysis, step 408, of the two spectra can produce a substantially clean spectrum for the C-terminus series of subfragments of the ¹⁶O containing peptide. Peaks arising from non C-terminal subfragments always have their normal isotopic distribution (irrespective of ¹⁸O labelling through enzymatic digestion) and therefore should not remain when the two spectra are subtracted from one another. The peptide sequence 410 can be obtained from the analysis.
A peptide sample usually contains many different species, for example the different peptide fragments which result from enzymatic digestion, or the different isotopically substituted forms of a particular peptide or peptide fragment. In a preferred embodiment, the different isotopically substituted forms of a particular peptide or peptide fragment constitute the sample that is introduced into the mass spectrometer. The selection of a precursor ion or precursor ions by appropriate adjustment of the transmission window therefore permits analysis of a particular species or a restricted subset of all of the species.
The performance of precursor ion selection from a quadrupole mass filter entails a compromise between resolution and sensitivity. The resolution is determined by the width of the transmission window. Although the highest resolution is obtained from the narrowest window, the highest resolution also requires the highest sensitivity.
Therefore, operating the quadrupole mass filter so that it selects a single isotope results in insufficient transmission of precursor ions to permit accurate analysis. That is, at the highest resolution possible, not enough sample is transmitted to give a useful spectrum at the sensitivity levels employed.
The transmission window is not uniform, however. That is, ions whose m/z ratios lie within the transmission window are not transmitted with equal intensities. The way in which the intensity varies across the transmission window of a mass filter is called the transmission curve.
Differential scanning mass spectrometry is based in part on Applicants' surprising discovery that, because of the shape of the transmission curve of a quadrupole mass filter, the transmission window can be chosen in such a way that ions may effectively be excluded without a concomitant loss of sensitivity. Without being bound by any theory, the shape of the transmission curve of a quadrupole mass filter is not symmetric around the selected m/z, but has a sharply rising flank (toward the lower m/z) and an extended, longer tail (toward the higher m/z). Because of this characteristic, when a transmission window of constant width is re-centered, i.e., is moved from one m/z to a slightly higher m/z, ions with the lower m/z are not transmitted. Thus, by moving the window to higher values, the lighter ¹⁶O isotopes fall out of the transmission window and the ¹⁸O isotopes fall within it. The transmission window behaves as if it has a sharp cut-off 'edge' at the lower end of its m/z range.
The quadrupole mass filter is such that a transmission window corresponding to, e.g., 3 Da, can be chosen. If it is centered at an m/z value corresponding to the mono-isotopic mass, it transmits both the ¹⁶O- and ¹⁸O-containing ions of a particular peptide, giving a first spectrum, SP1. The transmission window is then re-centered around a second position, at a m/z value corresponding to one mass unit higher, without changing its width and thereby without reducing the signal-to-noise ratio, in order to obtain a second spectrum, SP2. In its second position, the transmission window effectively prevents transmission of the ¹⁶O-containing ion without affecting transmission of the ¹⁸O-containing ion. Therefore, transmission of the peptide containing the lower molecular weight oxygen isotope at its C-terminus is essentially completely suppressed in the second spectrum, SP2. The second position of the transmission window permits transmission of ions whose masses are two mass units higher than the monoisotopic mass. Although such species include normal isotopic variants of the ¹⁶O-containing species (e.g., those ions containing two ¹³C atoms), their contribution is out-weighed by the contribution from the peptide ions which have picked up an unnatural proportion of ¹⁸O through enzymatic digestion. In an alternate embodiment, the transmission window can be centered at the second position prior to the first position.
As shown in Fig. 2, the selected precursor ions are subsequently passed into a collision cell 320 wherein the precursor ions are fragmented into "subfragments." Subfragments are also identified herein as "peptide subfragments," or "subfragment ions."
In the second stage of mass analysis, subfragment ions that are produced from a precursor ion are passed into a mass analyzer 340 and thereafter to a detector 350.
In order to accurately assign masses from m/z values in the spectrum, it is usually preferable to calibrate the mass analyzer. As is well known to one skilled in the art, calibration can take the form of recording a spectrum for a sample whose mass is known accurately.
A transmission window of 3 Da is not so narrow that unacceptable loss of sensitivity occurs in the resulting spectrum. Therefore, a given fragment of a C-terminal peptide digested in 50% H₂¹⁸O and 50% H₂¹⁶O, whose ¹⁶O-containing form has mass m, will give rise to two peaks of approximately equal intensity in the first spectrum, and only one peak in the second spectrum. The two peaks in the first spectrum SP1 correspond to fragments with masses at m and m+2 whereas the single peak in the second spectrum SP2 corresponds to fragment ions with masses m+2.
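The expected appearance of a subfragment in the two spectra can be summarized in a few lines. The sketch below is only a restatement of the idealized behavior described here, using the nominal mass shift of 2 and ignoring natural isotope peaks; the function name and the dictionary layout are assumptions.

```python
def expected_peaks(fragment_mass, has_c_terminus):
    """Idealized masses expected for one subfragment in SP1 and SP2.

    SP1 is recorded from both the 16O and 18O precursors, SP2 from the
    18O precursor only. Only subfragments that retain the C-terminus
    carry the label, so only they change between the two spectra.
    """
    if has_c_terminus:
        return {"SP1": [fragment_mass, fragment_mass + 2],
                "SP2": [fragment_mass + 2]}
    return {"SP1": [fragment_mass], "SP2": [fragment_mass]}

y_like = expected_peaks(500, has_c_terminus=True)
b_like = expected_peaks(400, has_c_terminus=False)
```

A subfragment containing the C-terminus therefore shows a doublet in SP1 that collapses to its heavier member in SP2, while a subfragment lacking the C-terminus appears at the same mass in both spectra.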
Mass resolution is often expressed as the ratio m/Δm, where m and m+Δm are the masses of two adjacent peaks of approximately equal intensity to be resolved in the mass spectrum. The differential scanning technique requires the mass analyzer 340 and detector 350 to be able to resolve signals for subfragment ions whose molecular masses differ by at most about one or two Daltons. Specifically, the peak arising from a peptide subfragment with mass m, having a ¹⁶O atom at the C-terminus, and the peak arising from the same peptide subfragment having mass m+2 because of a ¹⁸O atom at the C-terminus, both of which have the same charge, must be resolvable in the spectrum. The larger the peptide, the larger the mass of the subfragment ions. Therefore, the resolution of the analyzer must be greater for larger peptides, if the m and m+2 peaks are to be resolvable.
Precursor ions created by electrospray ionization often have multiple charges and consequently their m/z values are fractions of their masses. Although a doubly charged subfragment ion will appear at m/z values one half of its mass, it will be necessary to resolve peaks for subfragment ions of mass m and m+2 which are separated by a single m/z unit.
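The arithmetic behind these resolution requirements is small enough to write out. The following sketch uses hypothetical function names and treats the mass shift of the label as exactly 2 mass units, which is the nominal value used in this description.

```python
def required_resolution(mass, delta=2.0):
    """Resolution m / delta-m needed to separate peaks at m and m + delta."""
    return mass / delta

def doublet_spacing(charge, label_shift=2.0):
    """Separation in m/z units between the m and m + 2 peaks of one ion."""
    return label_shift / charge

def mz_value(ion_mass, charge):
    """m/z at which an ion of the given mass and charge appears."""
    return ion_mass / charge

# A doubly charged ion of mass 1000 appears at m/z 500, and its labeled
# partner of mass 1002 appears at m/z 501, one m/z unit away.
spacing = mz_value(1002.0, 2) - mz_value(1000.0, 2)
```

A subfragment of mass 1,000 thus needs a resolution of about 500 to separate the doublet, one of mass 3,000 needs about 1,500, and doubling the charge halves the spacing of the doublet on the m/z axis.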
Those of skill in the art will recognize that the resolution of the instrument used to collect data will influence accurate identification of C-terminal ions.
Although the methods of the present invention can be practiced on low-resolution machines, such as triple quadrupoles, they are preferably carried out on high-resolution machines.
If all of the C-terminal peptide subfragment ions can be identified by the characteristic appearance of the m and m+2 doublet in the 16O/18O spectrum and by corresponding suppression of the 16O peak in the 18O spectrum, then the sequence 216 of the peptide or protein can, in principle, be "read" from the spectrum by looking at the m/z differences between successive peaks in the C-terminal series. All amino acids except for leucine and isoleucine, which have the same mass as one another, are distinguishable from each other by their characteristic masses and hence m/z values.
In practice, however, peptide sequencing using differential scanning mass spectrometry is more difficult than simple comparison of spectra.
Identification of the peaks arising from C-terminal peptide subfragment ions usually cannot be accomplished by visual inspection, particularly for longer peptides. The computer program product and methods of the invention alleviate this difficulty and allow for fast and accurate interpretation of mass spectra acquired using the differential scanning technique, resulting in fast and accurate determination of the previously-unknown amino acid sequence of a protein.
Algorithm For Identification of C-terminal Peptide Subfragment Ions
The main problem addressed by the algorithms of the present invention is the identification of y-ions in the mass spectrum of a peptide. The overall principle is to compute a filtered spectrum, SS, for the peptide, see Figure 5. The filtered spectrum is effectively a simulated spectrum which contains a peak at an m/z value of mP, if mP corresponds to a y-ion of a 16O-containing peptide. The height of a peak in the filtered spectrum is analogous to an intensity in a measured spectrum but is calculated by a cumulative multiplication of factors, each of which indicates the likelihood that the peak corresponds to a y-ion. An advantage of a filtered spectrum is that it is also visually pleasing and easy to interpret.
The steps that precede production of a filtered spectrum SS are as follows, with reference to Figure 5. The charge on the peptide that gives rise to the spectrum is preferably ascertained. The starting points are the 16O/18O mass spectrum SP1 500 and the spectrum SP2 502, from which the charges on the subfragment ions are deduced, step 504.
Subsequently, the peak for each subfragment in the 16O/18O mass spectrum is analyzed to see whether it corresponds to an 18O-labeled ion, step 506, and a scoring value S1 508 for each peak is deduced. Peaks in the 18O mass spectrum are also analyzed to see whether they represent 16O-containing peptide subfragments whose presence is suppressed in the 18O mass spectrum relative to the 16O/18O mass spectrum, to produce a scoring value S2, 512. It is to be understood that steps 506 and 510 may be reversed in order without departing from the scope of the present invention. Finally, scoring values S1 and S2 are combined to produce a filtered spectrum SS, 514. Each of the foregoing steps is now described in greater detail.
The algorithm utilizes data for the 16O/18O spectrum, SP1, and the 18O-only spectrum, SP2. The principal task of the algorithm is to produce a scoring dataset, SD, in which every peak in the 16O/18O spectrum is assigned a probability value that it is a y-ion of a 16O-containing peptide subfragment. The filtered spectrum, SS, is then computed, for every value mP, according to equation (1):
$$SS(m_P) = SD(m_P) \cdot SP1(m_P) \qquad (1)$$
The final result of the algorithm is a filtered spectrum which contains computed m/z values for 16O y-ions, with all other ions screened out. It is to be understood that the methods of the present invention are equally applicable to calculations of filtered spectra that correspond to just parts, or ranges, of the measured spectra. It is not to be construed that the methods of the present invention are limited to calculations of filtered spectra, scoring datasets or scoring values that encompass the entirety of measured spectra for either of the positions of the transmission window.
In conjunction with differential scanning mass spectrometry, the identification of y-ions using the computer program product and methods of the present invention is facilitated by recognizing two essential features of y-ions in the spectra. First, y-ions have a 16O/18O isotopic distribution in the 16O/18O spectrum, SP1. Second, the 16O peaks of y-ions are suppressed in the 18O spectrum, SP2. These two features are used to calculate an overall scoring value for each peak in the 16O/18O spectrum, which forms part of the scoring dataset, SD.
The first step is to deduce a mass value, m, of the fragment which gives rise to the peak at position mP. Methods for accomplishing this can be found in: Uttenweiler-Joseph, S., Neubauer, G., Christoforidis, S., Zerial, M., and Wilm, M., "Automated de novo sequencing of proteins using the differential scanning technique," Proteomics, 1(5):668-682, (2001), incorporated herein by reference. According to the type of ionization employed, the subfragment ion giving rise to the peak at mP may be multiply charged. The electrospray ionization method typically gives rise to multiply charged ions. Methods of deducing the number of charges are well known to those skilled in the art. The most straightforward way of identifying multiply charged ions is to examine the spacing of peaks associated with adjacent isotopes. For example, if such peaks are 0.5 m/z units apart, the ions are doubly charged. If the peaks are 0.33 or 0.25 m/z units apart, the ions are triply or quadruply charged respectively. More sophisticated methods of interpreting mass spectra of multiply charged ions include those described in U.S. Patent 5,072,115, to Zhou, incorporated herein by reference.
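The isotope-spacing rule just described might be sketched as below. The function name, the maximum charge considered and the tolerance are illustrative assumptions, not values taken from the specification.

```python
from typing import Optional


def charge_from_isotope_spacing(
    spacing_mz: float, max_charge: int = 4, tolerance: float = 0.03
) -> Optional[int]:
    """Return the charge z whose expected isotope spacing (1/z) matches `spacing_mz`.

    Returns None when no charge up to `max_charge` fits within `tolerance`.
    """
    for charge in range(1, max_charge + 1):
        if abs(spacing_mz - 1.0 / charge) <= tolerance:
            return charge
    return None


# Spacings quoted in the text: 0.5, 0.33 and 0.25 m/z units.
for spacing in (1.0, 0.5, 0.33, 0.25):
    print(spacing, charge_from_isotope_spacing(spacing))
```

Once the charge is known, the peak position can be converted to a fragment mass before the mass-dependent isotope formulae are applied.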
The overall scoring value, SD(mP), for a peak at mP, which measures the overall probability that the peak is the first peak of a doublet arising from a partially-labeled peptide subfragment, is computed from a product of two factors, equation 2:
$$SD(m_P) = S1(m_P) \cdot S2(m_P) \qquad (2)$$
S1(mP) is a first scoring value that is a probability calculated by comparing the distribution and intensities of peaks in the envelope around the peak at mP in the 16O/18O spectrum with the expected distribution and intensities of peaks for a peptide of the same mass using natural isotopic abundances. Therefore S1(mP) indicates how likely it is that the peak at mP arises from a fragment with the 16O/18O ratio resulting from enzymatic digestion in 50% H218O and 50% H216O, or, in an alternative embodiment, in a water mixture containing some other proportion of H218O.
S2(mP) is a second scoring value that is a probability calculated by comparing the intensity of the peak at mP in the 16O/18O spectrum SP1 with the intensity of the peak at mP in the 18O spectrum SP2 and evaluating the degree of suppression of this peak in the second spectrum. Therefore S2(mP) indicates how likely it is that the peak at mP corresponds to the 16O-containing y-ion of a peptide.

Calculation of a first scoring value S1 based on expected and observed isotopic distributions.
The first step in the method of the present invention calculates a first probability, known as a first scoring value, S1, that a particular peak at position mP arises from the first isotope of a 16O/18O isotopic cluster in spectrum SP1. For a peptide whose monoisotopic mass is m0, the observed isotope envelope comprises contributions from ions whose masses are approximately m0+1, m0+2, m0+3, etc. If the monoisotopic species gives rise to a peak at mP with intensity I0, the envelope will comprise successive peaks, denoted (mP+1) with intensity I1, (mP+2) with intensity I2, (mP+3) with intensity I3, and so forth.
If the ions contributing to the cluster have a single charge, the successive peaks in the envelope are separated by approximately one m/z unit. The highest mass that is usually to be considered depends on the value of m0, since larger peptides are expected to incorporate a greater number of heavy isotopes, and will therefore have more significant peaks in the isotope envelope. The observed peak intensities of the isotope envelope, I0, I1, I2 and so forth, are usually governed by the natural isotopic abundance. The natural isotopic distribution of carbon, nitrogen, oxygen and sulfur (Table 1) has been a factor that has complicated the interpretation of peptide mass spectra, but in the present invention it can be used to some advantage. Because natural abundances are known, it is straightforward to identify when they are perturbed, for example, by the artificial 16O/18O ratio arising from enzymatic digestion in a mixture of 50% H218O and 50% H216O, and to quantify the extent to which they are perturbed.
The theoretical appearance of a peptide subfragment's isotope envelope may be accurately modeled by solving a polynomial expression that calculates the abundance-weighted sum of the contributions of the isotopes of each element to the molecular ion cluster (Yergey (1983) J. Mass Spectrom. Ion Process 52:337-349). Examples of formulae which are employed by the present algorithm include, but are not limited to:
if (Mpep > 300)
    δ1 = -0.015468 + 0.00056164 Mpep
    I1 = I1 + δ1
endif
if (Mpep > 1000)
    δ2 = 0.020233 - 0.000039644 Mpep + 0.00000017749 Mpep^2
    I2 = I2 + δ2
endif
if (Mpep > 1800)
    δ3 = -0.0033252 - 0.000052477 Mpep - 0.000000049304 Mpep^2 + 0.000000000045306 Mpep^3
    I3 = I3 + δ3
endif
if (Mpep > 2400)
    δ4 = -0.01038 + 0.00012296 Mpep - 0.00000014875 Mpep^2 + 0.000000000053833 Mpep^3
    I4 = I4 + δ4
endif
if (Mpep > 3100)
    δ5 = -0.0025452 + 0.0000122914 Mpep - 0.0000000116655 Mpep^2 + 0.000000000010044 Mpep^3 + 6.2631 x 10^-18 Mpep^4
    I5 = I5 + δ5
endif
if (Mpep > 3500)
    δ6 = -0.53925 + 0.00055053 Mpep - 0.00000012987 Mpep^2
    I6 = I6 + δ6
endif
In these formulae, a fragment of mass Mpep has a fragment of mass Mpep + n in its isotopic cluster. The intensity, In, of fragment Mpep + n in the envelope can be calculated by addition of the term δn. The initial values for all the In's are 0 with the exception of I0, which is never calculated because it is set to 1. I0 is the intensity of the monoisotopic species of the molecule. In is the intensity of the n'th isotope after the first (the n+1'th isotope altogether).
In order to illustrate an application of these formulae, consider the 3rd isotope above the initial one. As discussed hereinabove, there may be more than one contribution to every isotopic mass. There is a contribution from the 16O-containing peptide that has isotopic substituents from other elements, so the contribution is calculated taking δ3. But there is also a contribution from the 18O-containing peptide. Since the 18O-peptide is 2 Da heavier, the same mass is also the first isotope after the initial one for the 18O-labeled peptide. Hence, for the contribution of the 18O-labeled peptide to the I3 peak, the contribution must be calculated taking δ1. In this case, then, I3 is incremented by both δ1 and δ3.
These exemplary formulae depend only upon fragment mass, not element type or chemical composition, nor charge. Accordingly, before applying such formulae, the charge on the fragment must be deduced, so that its mass can be found. The formulae are approximations derived from average abundances amongst currently known peptide sequences. The most up-to-date compilations of peptide sequences that are suitable for deriving these formulae include, for example, a "non-redundant" database, updated on a regular basis by the European Bioinformatics Institute (EBI). See for example http://www.ebi.ac.uk/, also available at ftp://ftp.embl-heidelberg.de/pub/databases/nrdb.
Another similar database, compiled by NCBI, can be found at http://www.ncbi.nlm.nih.gov/.
Note that the heavier the peptide subfragment ion, the more peaks in the envelope have significant intensity. It can be seen that the envelope of a peptide subfragment whose monoisotopic mass is 1100, say, will have a fragment at mass 1101, whose intensity is calculated by the first formula, and a fragment at mass 1102, whose intensity is calculated by the second formula. For a peptide subfragment of mass 1100, contributions at 1103 and greater are negligible.
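The mass-1100 example can be worked through with a short sketch. It uses only the first two formulae (thresholds 300 and 1000 with their coefficients as given in the text); the table layout and function name are illustrative assumptions, and the remaining formulae could be appended to the table in the same form.

```python
# (threshold mass, polynomial coefficients c0, c1, c2, ...) for delta_1 and delta_2.
# Values are those of the first two formulae in the text.
DELTA_FORMULAE = [
    (300.0, (-0.015468, 0.00056164)),
    (1000.0, (0.020233, -0.000039644, 0.00000017749)),
]


def natural_envelope(m_pep: float) -> list:
    """Relative intensities [I0, I1, I2] expected from natural isotope abundance.

    I0 is fixed at 1; each In starts at 0 and receives delta_n only when
    m_pep exceeds the threshold of the corresponding formula.
    """
    envelope = [1.0]
    for threshold, coeffs in DELTA_FORMULAE:
        delta = 0.0
        if m_pep > threshold:
            delta = sum(c * m_pep ** power for power, c in enumerate(coeffs))
        envelope.append(delta)
    return envelope


# Envelope for a subfragment of monoisotopic mass 1100 (peaks at 1100, 1101, 1102).
print(natural_envelope(1100.0))
```

For mass 1100 this gives roughly 0.60 for the peak at 1101 and 0.19 for the peak at 1102, relative to the monoisotopic peak.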
In the differential scanning technique, the isotopic envelope of y-ions is perturbed by the characteristic isotopic distribution from partial labeling with 18O. For example, if the peak at mP is the first peak in a y-ion 16O/18O isotope cluster, then the observed intensities of the peaks (mP+2), (mP+3) and so forth will be different from those of a subfragment whose oxygen content is that naturally occurring. The characteristic doublet of the 16O/18O isotopic cluster and the isotope envelope of the 18O-containing ion will be superimposed onto the isotope envelope of the 16O-containing ion. In contrast, the isotopic envelope of a non-y-ion will simply follow the expected naturally occurring form. As a result, although it is extremely difficult to visually assign peaks in a peptide mass spectrum arising from y-ions, the fact that their isotopic envelopes differ considerably from the envelopes expected for the naturally occurring distribution of isotopes can be exploited computationally.
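The superposition described above can be illustrated with a small sketch. It assumes the labelled envelope is simply the natural envelope shifted by two mass units and scaled by the labelled-to-unlabelled ratio (1.0 for a 50%/50% water mixture); the function name and the example intensities are hypothetical.

```python
def labelled_envelope(natural: list, ratio: float = 1.0) -> list:
    """Superimpose the 18O-shifted envelope onto the natural (16O) envelope.

    `natural` holds relative intensities [I0, I1, I2, ...] of the 16O species;
    `ratio` is the abundance of the 18O species relative to the 16O species.
    """
    combined = list(natural) + [0.0, 0.0]
    for n, intensity in enumerate(natural):
        # The n'th isotope of the 18O species lands on peak n + 2 of the cluster.
        combined[n + 2] += ratio * intensity
    return combined


# A natural envelope of [1.0, 0.6, 0.19] becomes a five-peak cluster in which
# the (mP+2) peak is dominated by the monoisotopic 18O species.
print(labelled_envelope([1.0, 0.6, 0.19]))
```

In this picture the (mP+3) peak receives the natural I3 term plus the I1 term of the labelled species, in line with the δ1 and δ3 example given earlier.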
The theoretically expected peak intensities, denoted I1*, I2*, I3*, and so forth, based on natural abundance of isotopes, for a 16O-containing y-ion can be calculated using a polynomial expression of the type shown above. The observed and calculated intensities are normalized to I0 and I0* respectively to permit quantitative comparison. The scoring value S1 is a function of the difference between the observed and calculated intensities for each peak in the isotopic envelope, as shown in equation 3:
$$S1_n(m_P) = \lambda \left(0.001 + e^{-\sigma \Delta_n^2}\right) \qquad (3)$$
wherein:
$$\Delta_n = \left| I_n - I_n^{*} \right| \qquad (4)$$
The absolute value of the difference in intensities, Δn, is calculated from In, the observed intensity for peak (mP+n), and In*, the intensity calculated for a peak (mP+n) assuming that the peak at mP arises from a 16O-containing y-ion. S1n(mP) in equation (3) is the contribution to the scoring value S1 of the peak at mP from the intensity of the peak at (mP+n). The fundamental constant, e (≈ 2.71828...), is the base of the natural logarithm.
There are two parameters in equation (3) which have the following effects: λ is a "strength", i.e., a weight given to the scoring value, adjustable according to how significant this criterion is to be; σ is a "sharpness" parameter affecting how quickly S1n drops to zero with increasing Δ. The form of equation (3) is shown in Figure 6(a), wherein λ = 5 and σ = 0.25. Δ values are on the x-axis, S1 values on the y-axis.
The sharpness parameter, σ, determines how fast the scoring values drop to 0.001*λ. It is preferably not fixed, but calculated from the data itself. The purpose of the scoring action is to multiply peaks which have an 18O isotope with the scoring strength λ and peaks which do not have this isotope with a very small value, 0.001*λ, according to a preferred form of equation (3). Since most of the peaks will not have an 18O isotope (only the C-terminal fragments have it), the average peak should be multiplied with a very small value close to 0.001*λ. Thus, σ is preferably chosen such that the average peak is multiplied with about 0.003*λ. This means mathematically:
$$\sigma = -\frac{\ln(0.002)}{(\Delta_{avg})^2} \qquad (5)$$
with Δavg the average of all Δ values determined from this spectrum.
In a preferred embodiment of the present invention, λ is fixed at the value 10.0.
Values of σ and λ are sensitive to the machine employed and the quality of the data. It is within the capability of one skilled in the art to choose values of λ and σ different from those given here in order to produce better results, according to the sample and the machine employed.
The exponential term in equation (3) ensures that large differences (Δ) in observed and calculated intensities, when squared, result in small S1n values. Note that, because both spectra are normalized to I0 = I0* = 1, S1n(mP) for n = 0 equals 1.001 λ (≈ λ). In an alternative embodiment, the relative contribution of the two terms in equation (3) can be adjusted by separately altering the values of their coefficients, while ensuring that their sum remains close to 1.0.
Still other forms for equation (3) are possible without deviating from the principles of the present invention.
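A minimal sketch of the first scoring contribution follows. It reads equation (3) as λ(0.001 + exp(-σΔ²)) and equation (5) as σ = -ln(0.002)/Δavg², which is the form implied by the surrounding description; the function names and the sample deviations are assumptions made for illustration.

```python
import math


def sharpness_from_data(deltas: list, target: float = 0.002) -> float:
    """Sharpness chosen so that a peak with the average deviation scores
    about (0.001 + target) times the strength, following equation (5)."""
    average = sum(deltas) / len(deltas)
    if average == 0.0:
        raise ValueError("average deviation is zero; sharpness is undefined")
    return -math.log(target) / average ** 2


def s1_contribution(
    observed: float, expected: float, strength: float = 10.0, sharpness: float = 0.25
) -> float:
    """Contribution S1n for one peak of the envelope, following equation (3)."""
    delta = abs(observed - expected)
    return strength * (0.001 + math.exp(-sharpness * delta ** 2))


deviations = [0.2, 0.4]                      # hypothetical deviations from one spectrum
sigma = sharpness_from_data(deviations)
print(s1_contribution(1.0, 1.0, sharpness=sigma))   # perfect agreement
print(s1_contribution(0.3, 0.0, sharpness=sigma))   # average deviation
```

With the strength at 10.0, perfect agreement scores 10.01 and a peak with the average deviation scores about 0.03, i.e. 0.003 times the strength.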
It is to be noted that actual isotopic abundances of the fragments in the sample cannot be measured perfectly either. Two major reasons for this are: that a given peak comprises signals from only a small number of ions, so statistically the full abundance of all isotopic substituents may not be realized; and that a given peak often overlaps with signals from other ions, giving rise to distortions that cannot be predicted.
For every cluster in the spectrum, a contribution S1n(mP) is calculated for peaks (mP+n), according to the mass of the peptide subfragment ion giving rise to the isotopic envelope. For an unlabelled fragment with 300 < m0 < 1000, just (mP+1) is considered; for 1000 < m0 < 1800, peaks (mP+1) and (mP+2) are considered. When considering a potentially 18O-labelled fragment, isotopes with mass m0 + 2 are also considered, so for 300 < m0 < 1000 this includes peaks (mP+1), (mP+2) and (mP+3). It is to be understood that the methods of the present invention can also be practiced by computing S1 over a subset of clusters in the spectrum.
Thus, the first scoring value, S1(mP), is the measure of similarity between observed intensities of peaks in the isotopic envelope around a given peak at mP and the intensities calculated for these peaks assuming that the peak at mP is the first isotope in a 16O/18O isotopic cluster in SP1. The scoring value takes into account not only the degree of y-ion labeling, but also the natural abundance of isotopes, which have traditionally complicated the mass spectra of large peptides. A small difference between observed and calculated intensities is reflected in a high scoring value, which indicates a high probability that the peak at mP is due to a 16O-containing y-ion, i.e., the monoisotopic species.
In a final step, S1 values are preferably normalized to 1, by dividing the entire S1 function by its maximum value after it has been calculated for every peak mP. This step effectively converts the scoring values into probabilities.

Calculation of a second scoring value based on the degree of suppression of a peak in the second spectrum.
The second procedure in the method of the present invention is to compute a second probability, known as a second scoring value, S2, that a particular peak at mP arises from the first isotope of a 16O/18O isotopic cluster whose 16O isotopes are suppressed in spectrum SP2. This calculation is achieved by comparing the two spectra, SP1 and SP2, thereby determining the amount of suppression of the peak in SP2.
As described hereinabove, the transmission window of a quadrupole mass filter can be re-centered to a higher m/z value without being narrowed, so that transmission of a lighter isotope is effectively precluded. Use of a constant transmission window width ensures constant sensitivity. In this way, the two different spectra, SP1 from an isotopic mixture of a particular peptide, and SP2 from only the heavy isotope-containing peptide, have similar signal-to-noise ratios.
A peak at mP with intensity I0 in spectrum SP1, collected when the quadrupole transmission window embraces the signals for both the 16O- and the 18O-containing fragments, gives rise to additional peaks denoted (mP+n), each having intensity In, due to the natural distribution of isotopes and the mixture of substituted fragments.
Similarly, in the second spectrum, collected when the quadrupole transmission window is centered around the fragment containing the 18O isotope at the C-terminus, the peak at mP has intensity denoted by K0, and the peak (mP+n) in the same envelope has intensity denoted by Kn.
First, the intensities of the peaks in SP1 arising from the peak at mP are normalized to I0, and the intensities of the peaks in SP2 arising from the peak at mP are normalized to K0. If the peak at mP arises from the first isotope in the 16O/18O isotopic cluster, K0 << I0 because this peak is suppressed in the second spectrum. By making K0 = 1, the intensities of the other peaks, K1, K2, etc., are set to artificially high values.
For the calculation of the second scoring value, it is desirable to average abundant isotopes of a fragment ion which could be 18O labeled. For a peptide fragment whose m0 is less than 1,400 Da, the mP and (mP+2) peaks are considered: if unlabeled, only the first isotope, at m0, is abundant; if labeled, both the isotopes m0 and m0+2 are abundant. For m0 that is greater than 1,400 Da, the mP, (mP+1), (mP+2), and (mP+3) peaks are considered: if unlabeled, only the first isotope is abundant; if labeled with 18O, all of the isotopes m0 to m0+4 are abundant. This is because, as mentioned above, for heavier peptide ions, the contribution of subfragments containing multiple isotopic substituents increases. The choice of 1,400 Da is not fixed and other values in the region of about 1,400 Da can be chosen without departing from the spirit of the present invention.

For each spectrum, the average relative isotopic intensity is calculated by taking the average of the intensities of all of the peaks considered. For a particular ion having mass m0, the averages are I(ave) and K(ave) for spectra SP1 and SP2, respectively. Thus, for m0 < 1,400 Da, I(ave) = (I0 + I2)/2; for m0 > 1,400 Da, I(ave) = (I0 + I1 + I2 + I3)/4.
If the peak at mP is the first isotope in the 16O/18O isotopic cluster, which is suppressed in the second spectrum, then K(ave) > I(ave). If the peak at mP is the first isotope of a non-y-ion, then K(ave) ≈ I(ave).
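The averaging rule can be sketched as follows, assuming each envelope is given as a list of raw intensities starting at the peak at mP. The function name and the example intensities are hypothetical; the 1,400 Da limit is the value given in the text.

```python
def average_relative_intensity(envelope: list, m0: float, mass_limit: float = 1400.0) -> float:
    """Average of the envelope intensities after normalizing to the first peak.

    Below `mass_limit` the peaks at mP and mP+2 are averaged; above it the
    peaks at mP, mP+1, mP+2 and mP+3 are averaged.
    """
    first = envelope[0]
    if first <= 0:
        raise ValueError("the peak at mP must have positive intensity")
    offsets = (0, 2) if m0 < mass_limit else (0, 1, 2, 3)
    return sum(envelope[n] / first for n in offsets) / len(offsets)


# Hypothetical y-ion of mass 1000: the 16O peak is strong in SP1, suppressed in SP2.
sp1_envelope = [100.0, 60.0, 110.0, 60.0]
sp2_envelope = [5.0, 3.0, 110.0, 60.0]
i_ave = average_relative_intensity(sp1_envelope, 1000.0)
k_ave = average_relative_intensity(sp2_envelope, 1000.0)
print(i_ave, k_ave)
```

Because the suppressed first peak is used as the normalization reference in SP2, K(ave) comes out much larger than I(ave) for a y-ion, whereas the two averages stay close for a non-y-ion.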
An expression for the second scoring value, S2, that measures the degree of suppression of the peak at mP in the second spectrum is shown in equation 6:
$$S2_n(m_P) = \lambda_2 \left(1 - e^{-\sigma_2 \Delta_n^2}\right) \qquad (6)$$
wherein the parameter λ2 is a scoring weight given to S2(mP) and Δn is the difference in peak intensities between the two spectra, i.e., the peak suppression. In a preferred embodiment, the scoring weight parameter is given a value of 5. Figure 6(b) gives an example for λ2 = 5 and σ2 = 0.25. In Fig. 6(b), Δ values are on the x-axis, and S2 values are on the y-axis.
If the suppression is negative, i.e., if the intensity of a given isotope in the second spectrum is greater than in the first spectrum, then Δ is set to 0 (corresponding to S2 = 0, i.e., no suppression). As with the formulae for S1, σ2 is a sharpness parameter since it determines how fast the scoring values drop to 0 if there is no suppression (i.e., Δn is small).
The scoring function should have the value λ2 for peaks which were 18O labelled. Such peaks are suppressed in the second spectrum. Since most of the peaks are not labeled, the average peak is not 18O labelled. For an average peak the scoring value should therefore be very low, say 0.002*λ2 in a preferred embodiment, and σ2 is preferably chosen accordingly.
This means mathematically that σ2 may be expressed as a function of the form:
$$\sigma_2 = -\frac{\ln(1 - \beta_2)}{(\Delta_{avg})^2} \qquad (7)$$
with Δavg being the average of all Δ values determined from this spectrum and β2 being a parameter that is preferably chosen to be 0.002. Of course, many other mathematical forms for σ2 are consistent with the methods of the present invention.
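The second scoring contribution might be sketched as below. It reads equation (6) as λ2(1 - exp(-σ2Δ²)) and equation (7) as σ2 = -ln(1 - β2)/Δavg², which is the form implied by the description; the function names and sample values are assumptions, and the suppression Δ is taken as an input rather than computed here.

```python
import math


def sharpness2_from_data(average_delta: float, beta2: float = 0.002) -> float:
    """Sharpness chosen so that a peak with the average suppression scores
    beta2 times the scoring weight, following equation (7)."""
    if average_delta == 0.0:
        raise ValueError("average suppression is zero; sharpness is undefined")
    return -math.log(1.0 - beta2) / average_delta ** 2


def s2_contribution(delta: float, weight: float = 5.0, sharpness: float = 0.25) -> float:
    """Suppression score for one peak, following equation (6).

    A negative suppression is clamped to 0, which yields a score of 0.
    """
    delta = max(delta, 0.0)
    return weight * (1.0 - math.exp(-sharpness * delta ** 2))


sigma2 = sharpness2_from_data(0.3)            # hypothetical average suppression
print(s2_contribution(-1.0))                  # no suppression
print(s2_contribution(0.3, sharpness=sigma2)) # average peak
print(s2_contribution(100.0))                 # strongly suppressed peak
```

With a weight of 5, an unsuppressed peak scores 0, an average peak scores about 0.01 (0.002 times the weight), and a strongly suppressed peak approaches the full weight.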

If I(ave) > K(ave), then S2(mP) is given a value of 0. This indicates that there was no suppression of the peak at mP in the second spectrum, and therefore, it is not the first peak in a 16O/18O isotopic cluster.
A high value of S2(mP) indicates a high probability that the peak arises from a y-ion.
In order to convert S2 values to probabilities, after calculation of all S2 values, the scoring values are divided by their maximum value.
Finally, a filtered spectrum SS may be calculated using equation 8, obtained by substituting equation (2) into equation (1), to calculate an intensity for a peak at each value of mP:
$$SS(m_P) = SP1(m_P) \cdot S1(m_P) \cdot S2(m_P) \qquad (8)$$
The procedure described is preferably repeated for every peak in both spectra, or according to choice for as many peaks as are of interest.
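Equation (8), together with the normalization of S1 and S2 by their maximum values, can be sketched as follows. Representing spectra and scores as dictionaries keyed by peak position is an assumption made for illustration, as are the function names and the sample numbers.

```python
def normalise(scores: dict) -> dict:
    """Divide every score by the maximum so that values behave like probabilities."""
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {position: 0.0 for position in scores}
    return {position: value / top for position, value in scores.items()}


def filtered_spectrum(sp1: dict, s1: dict, s2: dict) -> dict:
    """SS(mP) = SP1(mP) * S1(mP) * S2(mP) for every peak position in SP1."""
    s1_norm, s2_norm = normalise(s1), normalise(s2)
    return {
        position: intensity * s1_norm.get(position, 0.0) * s2_norm.get(position, 0.0)
        for position, intensity in sp1.items()
    }


# Hypothetical peaks: 500.0 behaves like a y-ion, the others do not.
sp1 = {500.0: 100.0, 502.0: 90.0, 600.0: 80.0}
s1 = {500.0: 10.0, 502.0: 0.03, 600.0: 5.0}
s2 = {500.0: 5.0, 502.0: 0.01, 600.0: 0.0}
print(filtered_spectrum(sp1, s1, s2))
```

Only the peak at 500.0 keeps an appreciable intensity; the other peaks are scaled to zero or nearly zero, which is the screening effect described for the filtered spectrum.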
As demonstrated hereinabove, the scoring functions for every peak depend on peak-specific parameters (such as suppression and deviation from the expected isotopic distribution for an 18O-labelled peak) and on parameters which can only be calculated if all suppressions and all deviations are known (i.e., giving rise to the averaged values Δavg).
Therefore, in a preferred embodiment, the calculation of the filtered spectrum starts by ensuring that all deviations and suppressions are calculated for every peak.
Subsequently, the scoring values for all peaks are calculated and then the spectrum is multiplied with the two scoring functions. It is therefore preferred that no peak cluster is skipped. All calculations are done for all peaks, always evaluating whether each peak could be the first of a 16O/18O cluster. Even if one peak could be the first isotope, it would be premature to skip all the peaks belonging to this cluster, since it cannot be determined a priori that the first peak already evaluated is really the first one until the others have been evaluated for the same purpose. For example, it is possible that the second or third peak is a much more appropriate first peak of a 16O/18O cluster.
Determination of Amino Acid Sequences
Having identified a series of y-ions in the mass spectrum of the peptide, it is possible to deduce the sequence of the peptide by considering mass differences between adjacent y-ion peaks in the spectrum.
The method of differential scanning in conjunction with the algorithm of the present invention permits identification of peaks in the mass spectrum which correspond to the series of y-ion subfragments. Each ion in this series contains the C-terminus of the peptide.

In ideal conditions of collision induced dissociation, in which the peptide amide bonds are preferentially cleaved, each y-ion corresponds to a peptide subfragment containing an exact number of amino acid residues. Accordingly, if every peptide amide bond is cleaved in the collision chamber, each y-ion in the series differs from the nearest y-ion in mass by the mass of an amino acid residue. Because all amino acid residues have unique masses, except for leucine and isoleucine whose masses are the same, by calculating the mass difference between adjacent y-ion peaks, it is possible to identify the exact amino acid residue which has been cleaved from the heavier fragment in order to produce the lighter fragment or to show that it must be either leucine or isoleucine.
In one embodiment of the present invention, once a mass difference has been computed for a pair of adjacent y-ion peaks, the mass difference is compared with the mass of each of the 20 naturally occurring amino acid residues in turn until a match is found. If the mass difference is the same as that of one of the amino acid residues, the corresponding position in the peptide sequence is assigned to that amino acid residue. This procedure is repeated for each adjacent pair of y-ion peaks in the mass spectrum. In a preferred embodiment of the present invention, if a mass difference between two adjacent y-ion peaks does not correspond to the mass of one of the 20 naturally occurring amino acids, the mass difference is compared to the sums of the masses of all pairs of amino acid residues to search for a match. If a match is found with a pair of amino acid masses, the two amino acid residues are placed in the sequence. The peptide amide bond between this pair of amino acids has not cleaved easily enough in the collision chamber to generate a separate subfragment containing each of the pair of residues. In this case, it is not possible to infer the order in which the pair of amino acids occurs unless other information, for example from other overlapping fragments, is available.
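The matching procedure can be sketched as follows. The residue masses are standard monoisotopic values that are not listed in this specification, and the tolerance, the function names and the example y-ion masses are assumptions made for illustration. Leucine and isoleucine share one entry because their masses are identical.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses in Da (assumed; not given in the text).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L/I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}


def match_difference(diff: float, tolerance: float = 0.02) -> list:
    """Residues matching `diff`; if none match, unordered residue pairs that do."""
    singles = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - diff) <= tolerance]
    if singles:
        return singles
    return [
        a + "+" + b
        for a, b in combinations_with_replacement(RESIDUE_MASS, 2)
        if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - diff) <= tolerance
    ]


def read_sequence(y_ion_masses: list, tolerance: float = 0.02) -> list:
    """Candidate residues for each gap between adjacent y-ions, lightest first."""
    ordered = sorted(y_ion_masses)
    return [match_difference(high - low, tolerance) for low, high in zip(ordered, ordered[1:])]


# Hypothetical y-ion masses with gaps of G, F and an uncleaved pair.
print(read_sequence([200.0, 257.02146, 404.08987, 574.19539]))
```

Because each heavier y-ion extends the fragment toward the N-terminus, the gaps are read from the C-terminal end of the peptide. The last gap in the example matches more than one residue pair with the same summed mass, which illustrates why a pair match leaves both the order and, in some cases, the identity of the residues open without further information.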
In a preferred embodiment of the present invention, the procedure of matching mass differences between adjacent y-ion peaks is repeated for each distinct peptide or peptide fragment produced by enzymatic digestion of the protein. The sequence of each peptide or peptide fragment is deduced and the sequence of the protein inferred by joining or overlapping the sequences of each fragment, according to methods well known to one skilled in the art (See, for example, Mann, M., "A shortcut to interesting human genes: peptide sequence tags, expressed-sequence tags and computers," Trends in Biochemical Sciences, (1996), 21:494-495).
REFERENCES CITED

All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.
Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art.
The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.
Alternate Embodiments
The present invention can be implemented as a computer program product that includes a computer program mechanism embedded in a computer readable storage medium. For instance, the computer program product could contain a number of separate program modules that may be stored on a CD-ROM, magnetic disk storage product, or any other computer readable data or program storage product. The software modules in the computer program product may also be distributed electronically, via the Internet or otherwise, by transmission of a computer data signal (in which the software modules are embedded) on a carrier wave.
While the present invention has been described with reference to a few specific embodiments, the description is illustrative of the invention and is not to be construed as limiting the invention. Various modifications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims.

SEQUENCE LISTING
<110> EMBLEM
<120> MASS SPECTROMETRY METHOD
<130> 9882-010-228
<160> 1
<170> PatentIn version 3.1
<210> 1
<211> 26
<212> PRT
<213> Unknown
<220>
<223> Description of unknown sequence: Unknown peptide
<400> 1
Leu Phe Val Arg Pro Phe Pro Leu Asp Val Gln Glu Ser Glu Leu Asn Glu Ile Phe Gly Pro Phe Gly Pro Phe Lys

Claims

What is claimed is:

1. An apparatus for determining the amino acid residue sequence of a peptide, comprising:
an input device configured to accept mass spectrometry data obtained by applying differential scanning mass spectrometry to a sample of the peptide in which an isotopic label is present in a proportion which is different from its natural abundance;
a processor configured to execute mathematical operations on said mass spectrometry data; and a memory connected to said processor to store:
a first set of instructions to direct said processor to generate a probability that a peak in said mass spectrometry data derives from a y-ion subfragment of said peptide wherein said first set of instructions are repeatedly executed for each peak in said mass spectrometry data;
a second set of instructions to direct said processor to produce a filtered mass spectrum of said peptide, wherein each peak in said filtered mass spectrum whose intensity is greater than a threshold value, is predicted to correspond to a y-ion subfragment of said peptide; and a third set of instructions to direct said processor to derive and store in said memory an amino acid residue sequence of said peptide from said filtered mass spectrum.

2. The apparatus of claim 1, wherein said mass spectrometry data comprises a first mass spectrum that has signals from subfragment ions in which said isotopic label is present and from subfragment ions in which said isotopic label is absent, and a second mass spectrum in which signals from subfragment ions in which said isotopic label is not present are substantially suppressed.

3. The apparatus of claim 1, wherein said isotopic label is 18O.

4. The apparatus of claim 3, wherein said proportion is 50%.

5. The apparatus of claim 3, wherein said proportion is 33%.

6. The apparatus of claim 2, wherein said probability is computed from a product of a first scoring value, S1, and a second scoring value, S2, wherein said first scoring value is proportional to (i) the likelihood that a peak in said first mass spectrum arises from an isotopic cluster that comprises a signal from a subfragment ion in which said isotopic label is absent and (ii) a signal from a subfragment ion in which said isotopic label is present in said proportion; and wherein said second scoring value is proportional to the likelihood that a peak in said second mass spectrum arises from an isotopic cluster containing a peak from a subfragment ion in which said isotopic label is present in said proportion and in which a peak from a subfragment ion in which said isotopic label is absent is effectively suppressed relative to said first mass spectrum.

7. The apparatus of Claim 6, wherein said first scoring value, S1, is computed for a peak at mₚ from the relationship wherein λ and σ are user-defined parameters, and wherein Iₙ is the intensity of the nth peak in the isotopic cluster in which peak mₚ is found in said first spectrum and Iₙ* is the calculated intensity of the nth peak in the isotopic cluster in which peak mₚ is found in a spectrum according to natural isotopic abundances.

8. The apparatus of claim 7 in which the value of λ is 10.0.

9. The apparatus of claim 7 in which σ is given by:

10. The apparatus of claim 7 wherein said second scoring value, S2, is computed for a peak at mₚ from the relationship wherein λ₂ and σ₂ are user-defined parameters, and wherein Iₙ is the intensity of the nth peak in the isotopic cluster in which peak mₚ is found in said first spectrum and Iₙ* is the calculated intensity of the nth peak in the isotopic cluster in which peak mₚ is found in a spectrum according to natural isotopic abundances.

11. The apparatus of claim 10 wherein the value of λ₂ is 5.

12. The apparatus of claim 10 in which σ₂ is given by:

wherein β₂ is a parameter.

13. The apparatus of Claim 2, wherein said third set of instructions comprises instructions to: calculate a mass difference between a pair of peaks in said filtered mass spectrum and compare said mass difference with the mass of the residue of each of the 20 naturally occurring amino acids.

14. A method for determining the amino acid residue sequence of a peptide, said method comprising:
accepting mass spectrometry data obtained by applying differential scanning mass spectrometry to a sample of the peptide in which an isotopic label is present in a proportion which is different from its natural abundance;
generating a probability that a peak in said mass spectrometry data derives from a y-ion subfragment of said peptide wherein said first set of instructions are repeatedly executed for each peak in said mass spectrometry data;
producing a filtered mass spectrum of said peptide, wherein each peak in said filtered mass spectrum whose intensity is greater than a threshold value, is predicted to correspond to a y-ion subfragment of said peptide; and deriving an amino acid residue sequence of said peptide from said filtered mass spectrum.

15. The method of claim 14, wherein said mass spectrometry data comprises a first mass spectrum that has signals from subfragment ions in which said isotopic label is present and from subfragment ions in which said isotopic label is absent, and a second mass spectrum in which signals from subfragment ions in which said isotopic label is not present are substantially suppressed.

16. The method of claim 15, wherein said isotopic label is 18O.

17. The method of claim 15, wherein said proportion is 50%.

18. The method of claim 15, wherein said proportion is 33%.

19. The method of claim 15, wherein said probability is computed from a product of a first scoring value, S1, and a second scoring value, S2, wherein said first scoring value is proportional to (i) the likelihood that a peak in said first mass spectrum arises from an isotopic cluster that comprises a signal from a subfragment ion in which said isotopic label is absent and (ii) a signal from a subfragment ion in which said isotopic label is present in said proportion; and wherein said second scoring value is proportional to the likelihood that a peak in said second mass spectrum arises from an isotopic cluster containing a peak from a subfragment ion in which said isotopic label is present in said proportion and in which a peak from a subfragment ion in which said isotopic label is absent is effectively suppressed relative to said first mass spectrum.

20. The method of claim 19, wherein said first factor, S1, is computed for peak mₚ from the relationship wherein λ and σ are user-defined parameters, and wherein Iₙ is the intensity of the nth peak in the isotopic cluster in which peak mₚ is found in said first spectrum and Iₙ* is the intensity of the nth peak in the isotopic cluster in which peak mₚ is found in a spectrum calculated according to natural isotopic abundances.

21. The method of claim 20 in which the value of λ is 10.0.

22. The method of claim 20 wherein said second scoring value, S2, is computed for a peak at mₚ from the relationship wherein λ₂ and σ₂ are user-defined parameters, and wherein Iₙ is the intensity of the nth peak in the isotopic cluster in which peak mₚ is found in said first spectrum and Iₙ* is the calculated intensity of the nth peak in the isotopic cluster in which peak mₚ is found in a spectrum according to natural isotopic abundances.

23. The method of claim 22 wherein the value of λ₂ is 5.

24. The method of claim 22 in which σ₂ is given by:

wherein β₂ is a parameter.

25. The method of Claim 15, wherein the step of producing a filtered mass spectrum for said peptide comprises calculating a mass difference between a pair of peaks in said filtered mass spectrum and comparing said mass difference with the mass of each residue of the 20 naturally occurring amino acids.

26. The method of any one of Claims 14 - 25, wherein said method is executed by a computer under the control of a program, said computer including a memory for storing said program, an input device configured to accept mass spectrometry data and a processor configured to execute mathematical operations on said mass spectrometry data.
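
By way of illustration only, the threshold filtering recited in claims 1 and 14 and the mass-difference comparison recited in claims 13 and 25 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (filter_peaks, match_residues, read_ladder), the 0.02 Da tolerance, the restriction to adjacent peaks, and the assumption of singly charged ions (so that an m/z difference equals a mass difference) are all hypothetical choices made for the example. The residue masses are standard monoisotopic values, and the peak list is computed from those masses rather than taken from measured data.

```python
# Monoisotopic residue masses (Da) of the 20 naturally occurring amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Leu": 113.08406,
    "Ile": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}


def filter_peaks(peaks, threshold):
    """Keep (m/z, intensity) peaks above the threshold, sorted by m/z."""
    return sorted((mz, inten) for mz, inten in peaks if inten > threshold)


def match_residues(delta, tol=0.02):
    """Return every residue whose mass lies within tol (Da) of delta.

    Leu and Ile are isobaric, so both are returned for the same delta.
    """
    return [name for name, mass in RESIDUE_MASS.items()
            if abs(mass - delta) <= tol]


def read_ladder(peaks, threshold, tol=0.02):
    """Compare the mass difference of each adjacent peak pair in the
    filtered spectrum with the residue masses (singly charged ions assumed).
    """
    filtered = filter_peaks(peaks, threshold)
    return [match_residues(hi - lo, tol)
            for (lo, _), (hi, _) in zip(filtered, filtered[1:])]


if __name__ == "__main__":
    # Hypothetical y1..y4 ions for a peptide ending ...Gly-Pro-Phe-Lys,
    # plus one low-intensity peak that the threshold removes.
    spectrum = [(147.1128, 40.0), (200.1, 2.0), (294.1812, 55.0),
                (391.2340, 30.0), (448.2554, 25.0)]
    print(read_ladder(spectrum, threshold=5.0))
    # prints [['Phe'], ['Pro'], ['Gly']]
```

Reading the y-ion ladder from low to high m/z yields residues starting next to the C-terminal residue and moving toward the N-terminus, so the output above corresponds to the Gly-Pro-Phe stretch preceding the C-terminal Lys. A mass difference that matches no residue produces an empty list, which a fuller implementation would have to handle, for example by also considering non-adjacent peak pairs.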