WO2008111911A1 - System and process for pre-filtering of tandem mass spectra using piecewise convolution - Google Patents

System and process for pre-filtering of tandem mass spectra using piecewise convolution

Info

Publication number
WO2008111911A1
Authority
WO
WIPO (PCT)
Prior art keywords
spectrum
peaks
convolution
filtering
polypeptide
Prior art date
Application number
PCT/SG2007/000070
Other languages
French (fr)
Other versions
WO2008111911A9 (en
Inventor
Keng Wah Choo
Tsu Soo Tan
Original Assignee
Nanyang Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanyang Polytechnic filed Critical Nanyang Polytechnic
Priority to PCT/SG2007/000070 priority Critical patent/WO2008111911A1/en
Publication of WO2008111911A1 publication Critical patent/WO2008111911A1/en
Publication of WO2008111911A9 publication Critical patent/WO2008111911A9/en

Links

Classifications

    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01JELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00Particle spectrometers or separator tubes
    • H01J49/0027Methods for using particle spectrometers
    • H01J49/0036Step by step routines describing the handling of the data generated during a measurement

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The present invention provides a process for pre-filtering an MS/MS spectrum of a polypeptide by performing piecewise convolution upon the MS/MS spectrum. The present invention also provides a system for pre-filtering an MS/MS spectrum of a polypeptide.

Description

SYSTEM AND PROCESS FOR PRE-FILTERING OF TANDEM MASS SPECTRA
USING PIECEWISE CONVOLUTION
Field of the Invention
[0001] The present invention generally relates to mass spectrometry technologies, and more particularly to a system and process for pre-filtering of tandem mass spectra using piecewise convolution.
Background of the Invention
[0002] Mass Spectrometry (MS) in proteomics is a powerful analytical tool that is used for obtaining amino acid sequences of peptides and for protein identification. A mass spectrometer is an instrument that measures the mass-to-charge ratio of individual molecules that have been converted into electrically charged gas-phase molecules, or ions. These ions are filtered in such a way as to produce an ordered separation of the ions as they pass through the instrument, ordered from lower to higher mass-to-charge ratios. The data is typically displayed as a plot of intensity vs. mass-to-charge ratio. The ionization techniques generally used to ionize sample peptides in most proteomic analyses are Matrix Assisted Laser Desorption Ionization (MALDI) and Electrospray Ionization (ESI).
[0003] Mass spectrometry is experiencing a period of rapid growth because of its applications in proteomic analyses. As a consequence of this growth, the need to collect the large amounts of information produced and make it available to researchers worldwide has resulted in a number of MS databases. For example, the Illinois Bio-Grid Mass Spectrometry Database (IBG-MSD) is a public database of curated and annotated empirically derived mass spectra of peptides. The goal of the database is to address the need for a public database of mass spectrometry data and to implement a useful web interface that will allow researchers to access the data and perform a variety of tasks based on their individual needs.
[0004] With the advent of tandem mass spectrometry (MS/MS), the identification of proteins has become an easier and more accurate process. MS/MS involves two stages of mass spectrometry analysis of a sample in a single experiment. The first stage filters out the ions of interest from the sample. These selected ions are then passed into a collision cell that breaks the ions into smaller ion fragments. The second stage then separates and detects these smaller ion fragments.
[0005] Currently, the actual MS/MS data have to be pre-filtered before being used for identification of amino acid sequences of peptides; this is because the actual MS/MS data contain too much noise from the systems and/or artificial operations. Many known pre-filtering algorithms are available; a few of them will be described briefly herein.
[0006] The Open Mass Spectrometry Search Algorithm (OMSSA) is an efficient search engine for identifying MS/MS peptide spectra by searching libraries of known protein sequences. OMSSA scores significant hits with a probability score developed using classical hypothesis testing, the same statistical method used in BLAST. The OMSSA noise filter has several steps. The first step is to delete background peaks whose intensity is below a percentage of the maximum intensity peak. By default, OMSSA cuts peaks below 2.5% of the maximum intensity, although this value is user adjustable and is modified dynamically in the last part of the algorithm. The second step is to delete any remaining precursor ion peaks. The third step is to eliminate peaks that are obviously not mono-isotopic. This is accomplished by examining peaks in order of intensity and deleting peaks that are within 0 to 2 Da of the m/z value of the peak being examined. The last step in the noise filter is to filter out peaks that are too close together. In this step, precursor charge 1+ and 2+ spectra are treated differently from charge 3+ spectra. It is assumed that 1+ and 2+ spectra contain mainly 1+ product ions. For these spectra, the filter examines each peak in order of intensity. In a region ±27 Da of the peak being examined, all peaks except the most intense one are deleted under the assumption that in a region of this size, no other peak from the same ion series will exist and only one ion from the complementary ion series can exist. 27 Da is used as it is less than the residue mass of the smallest amino acid.
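The first and last steps of the noise filter described in paragraph [0006] can be sketched as follows. This is a minimal illustration based only on the wording above, not OMSSA's actual code: the function names, the `(m/z, intensity)` tuple layout and the sample peaks are assumptions made for the example, and the precursor and isotope steps are omitted.

```python
def threshold_filter(peaks, fraction=0.025):
    """Drop peaks whose intensity is below a fraction of the most intense peak."""
    cutoff = fraction * max(intensity for _, intensity in peaks)
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


def window_filter(peaks, half_width=27.0):
    """Visit peaks from most to least intense; keep a peak only if no
    already-kept (more intense) peak lies within +/- half_width Da of it."""
    kept = []
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if all(abs(mz - kept_mz) > half_width for kept_mz, _ in kept):
            kept.append((mz, intensity))
    return sorted(kept)


# Hypothetical peaks as (m/z, intensity) pairs.
peaks = [(100.0, 50.0), (110.0, 100.0), (150.0, 1.0), (200.0, 30.0)]
filtered = window_filter(threshold_filter(peaks))
```

With these sample values the 1.0-intensity peak falls below the 2.5% cutoff, and the peak at 100.0 is removed because it lies within 27 Da of the more intense peak at 110.0.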
[0007] (X) of chromatography mass spectrometry (XCMS) as a preprocessing and analysis algorithm includes peak detection, peak matching, and retention time alignment. The peak detection is based on cutting the LC/MS data into slices a fraction of a mass unit (0.1 m/z) wide and then operating on those individual slices in the chromatographic time domain. Within each slice, the signal is determined by taking the maximum intensity at each time point in the slice. After filtration, peaks are selected using a signal-to-noise ratio cutoff. Because of the second-derivative transformation and the resulting negative
[0020] In another embodiment of the system, the computer program evaluates the quality of the MS/MS spectrum based on the number of peaks obtained from the performance of piecewise convolution.
[0021] The objectives and advantages of the invention will become apparent from the following detailed description of preferred embodiments thereof in connection with the accompanying drawings.
Brief Description of the Drawings
[0022] Preferred embodiments according to the present invention will now be described with reference to the Figures, in which like reference numerals denote like elements.
[0023] FIG 1 illustrates various breakage points within a peptide during fragmentation.
[0024] FIG 2 shows an exemplary MS/MS spectrum of a peptide AVAGCAGAR.
[0025] FIG 3 shows typical MS/MS spectrum data including noise and isotopic peaks.
[0026] FIG 4 shows an exemplary illustration of the principles of the piecewise convolution in accordance with one embodiment of the present invention.
[0027] FIG 5 is a figure showing a common tandem mass spectrum.
[0028] FIG 6 is a figure showing a portion of the resultant tandem mass spectrum after the noise and isotopic filtering processes, where all peaks have been normalized to 100.
[0029] FIG 7 is a figure showing the result of piecewise convolution performed on the tandem mass spectrum shown in FIG 6.
[0030] FIG 8 is a figure showing the scatter plot of MF Score versus Mascot score.
[0031] FIG 9 is a figure showing the scatter plot of MF Score versus SONAR score.
[0032] FIG 10 is a plot showing the sensitivity and the selectivity of the present invention.
[0033] FIG 11 shows the pseudo code of the pre-filtering algorithm using piecewise convolution that was used in the tests of the present invention.
[0034] FIG 12 shows the scores obtained from MASCOT, SONAR and the algorithm of the present invention.
[0035]
Detailed Description of the Invention
[0036] The present invention may be understood more readily with reference to the following detailed description of certain embodiments of the invention.
[0037] Throughout this application, where publications are referenced, the disclosures of these publications are hereby incorporated by reference, in their entireties, into this application in order to more fully describe the state of the art to which this invention pertains.
[0038] In the following detailed description, specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the relevant art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and materials have not been described in detail so as not to obscure the present invention.
[0039] The present invention teaches various embodiments of the system and process that pre-filter MS/MS spectrum data by employing the principles of convolution. The general principles of convolution and fragmentation are well known in their relevant arts; nonetheless, for better understanding of the present invention, brief descriptions of convolution and fragmentation are provided herein.
[0040] Now there is provided a brief description of peptide fragmentation and resultant mass spectra.
[0041] When a peptide is subjected to fragmentation by collision with neutral gas molecules, the fragmentation is induced by transfer of kinetic energy from the neutral gas molecules to the peptide. While the breakage can occur between any bonds in the peptide, it commonly occurs at the peptide bond. When a peptide is fragmented at a single peptide bond between the carbonyl and nitrogen, two fragments are formed. In the case where one peptide fragment retains the positive charge at the C-terminus of the peptide ion, it is called a y-ion. If the fragment retains the positive charge at the N-terminus, it is known as a b-ion. A b-ion usually has a complementary y-ion. When a singly charged peptide is fragmented, the charge is retained only at one terminus and only the fragment containing the charge is detected while the other fragment is lost as a neutral fragment. Doubly charged peptides tend to produce two singly charged ions, though sometimes doubly charged ions can also be formed. b-ions and y-ions are usually formed when fragmentation occurs under low energy conditions.
[0042] As shown in FIG 1, fragmentation may cause various breakage points within a peptide, resulting in different fragments. In addition to y-ions and b-ions, other ions including a-ions and x-ions (complementary pairs) and c-ions and z-ions (complementary pairs) are also formed. These ions are formed when fragmentation occurs under high-energy conditions since higher amounts of energy are required to break these bonds.
[0043] When mass spectra consist mainly of y-ions and b-ions, they can be used for protein sequencing and identification. FIG 2 shows an exemplary MS/MS spectrum of a peptide AVAGCAGAR. The weights of the b-ions are as follows:
[0044] b(n) = [171.11 242.15 299.17 473.22 544.26 601.28 672.31].
[0045] The weights of the y-ions are as follows:
[0046] y(n)=[775.39 676.32 605.28 548.26 374.22 303.18 246.16 175.12].
[0047] To derive the peptide sequence from the above spectrum, the weight differences between the b-ion peaks and the weight differences between the y-ion peaks are calculated. For example, from the spectrum above, the weight difference between b2 and b3 ions is 71.04 Daltons, which corresponds to the amino acid 'Alanine', and the weight difference between b3 and b4 ions is 57.02 Daltons, which corresponds to the amino acid 'Glycine', and so on, thus yielding the final peptide sequence of AVAGCAGAR.
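The difference-based reading of a fragment ladder described in paragraph [0047] can be sketched as below. This is only an illustration: the function name, the tolerance and the three-entry residue table (approximate monoisotopic residue masses for glycine, alanine and valine) are assumptions for the example, while the b-ion values are those listed in paragraph [0044].

```python
# Approximate monoisotopic residue masses in Da (illustrative subset only).
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "V": 99.068}


def residues_from_ladder(peaks, tol=0.05):
    """Map successive peak differences to residue letters ('?' if unmatched)."""
    ordered = sorted(peaks)
    letters = []
    for low, high in zip(ordered, ordered[1:]):
        diff = high - low
        match = "?"
        for letter, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tol:
                match = letter
                break
        letters.append(match)
    return "".join(letters)


b_ions = [171.11, 242.15, 299.17, 473.22, 544.26, 601.28, 672.31]
ladder = residues_from_ladder(b_ions)
print(ladder)  # prints "AG?AGA"
```

The "?" marks the 174.05 Da gap, which sits at the position of C in AVAGCAGAR and is left unassigned here because the small table above does not include it.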
[0048] The inventors of the present application discovered that there is symmetry in mass spectra data around the mid point of a tandem MS/MS spectrum, where the mid point equals (mass-to-charge ratio of peptide) / 2. For example, as shown in FIG 2, the weight difference between b1 and b2 is equivalent to that between y8 and y7 as they represent the same amino acid 'Alanine' at 71.04 Dalton and b1 corresponds to y8 while b2 corresponds to y7. Likewise, the weight difference between b2 and b3 is equivalent to that which is between y7 and y6 as they represent the same amino acid 'Glycine' at 57.02 Dalton and b3 corresponds to y6, and so on. This observed symmetry proved to be a useful feature that is suitable for the application of convolution in pre-filtering MS/MS spectrum data.
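One way to see the symmetry numerically is to add each b-ion to its complementary y-ion: if the peaks mirror each other about a mid point, every pair should give nearly the same sum. The sketch below uses the values from paragraphs [0044] and [0046]. The pairing is an assumption inferred from the numbers (the b list is taken to run from the second to the eighth b-ion, so it is paired with the y list from its second entry onward), and the function name is hypothetical.

```python
b_ions = [171.11, 242.15, 299.17, 473.22, 544.26, 601.28, 672.31]
y_ions = [775.39, 676.32, 605.28, 548.26, 374.22, 303.18, 246.16, 175.12]


def complementary_sums(b_peaks, y_peaks):
    """Pair ascending b peaks with descending y peaks and sum each pair."""
    return [round(b + y, 2) for b, y in zip(b_peaks, y_peaks)]


# Assumed pairing: first listed b-ion with the second listed y-ion, and so on.
sums = complementary_sums(b_ions, y_ions[1:])
spread = max(sums) - min(sums)

# Half of the (nearly constant) pair sum is the point the peaks mirror about.
midpoint = sum(sums) / len(sums) / 2
print(sums, round(midpoint, 2))
```

All seven sums agree to within about 0.01 Da, which is the mirror relationship the pre-filter relies on.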
[0049] Now there is provided a brief description of the fast Fourier transform and convolution.
[0050] The fast Fourier transform (FFT) is a discrete Fourier transform algorithm which reduces the number of computations needed for $N$ points from $2N^2$ to $2N \lg N$, where lg is the base-2 logarithm. If the function to be transformed is not harmonically related to the sampling frequency, the response of an FFT looks like a sinc function (although the integrated power is still correct). Aliasing (leakage) can be reduced by apodization using a tapering function. However, aliasing reduction is at the expense of broadening the spectral response.
[0051] Fast Fourier transform algorithms generally fall into two classes: decimation in time, and decimation in frequency. The Cooley-Tukey FFT algorithm first rearranges the input elements in bit-reversed order, and then builds the output transform (decimation in time). The basic idea is to break up a transform of length $N$ into two transforms of length $N/2$ using the identity according to the following equation (1):
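[0052]
$$\sum_{n=0}^{N-1} a_n e^{-2\pi i n k/N} = \sum_{n=0}^{N/2-1} a_{2n} e^{-2\pi i (2n) k/N} + \sum_{n=0}^{N/2-1} a_{2n+1} e^{-2\pi i (2n+1) k/N}$$
$$= \sum_{n=0}^{N/2-1} a_n^{\text{even}} e^{-2\pi i n k/(N/2)} + e^{-2\pi i k/N} \sum_{n=0}^{N/2-1} a_n^{\text{odd}} e^{-2\pi i n k/(N/2)} \qquad (1)$$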
[0053] The algorithm for ifft(x) is the same as the algorithm for fft(x), except for a sign change and a scale factor of n = length(x). As for fft, the execution time for ifft depends on the length of the transform. It is fastest for powers of two. It is almost as fast for lengths that have only small prime factors. It is typically several times slower for lengths that are prime or which have large prime factors.
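The even/odd splitting identity and the relation between the forward and inverse transforms can be illustrated with a short recursive sketch. It is a plain-Python teaching version, assuming the input length is a power of two; it recurses on even and odd samples rather than performing an explicit bit-reversal pass, and it is not the routine used in the tests of the invention.

```python
import cmath


def fft(a):
    """Radix-2 decimation-in-time FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])  # transform of the even-indexed samples
    odd = fft(a[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(a):
    """Inverse FFT: same algorithm with the sign flipped (via conjugation)
    and a scale factor of 1/n."""
    n = len(a)
    conjugated = fft([x.conjugate() for x in a])
    return [x.conjugate() / n for x in conjugated]


block = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]  # an 8-sample block
spectrum = fft(block)
restored = ifft(spectrum)
```

Applying `ifft` to the output of `fft` recovers the original block up to floating-point rounding.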
[0054] Convolution was performed using two blocks of 8 intensities each at any one time, starting from both ends of the spectrum, until the entire spectrum had been processed. A block length of 8 peaks was chosen because the FFT was used to perform the convolution, for which the length is best a power of two (2^n). In addition, the precision of this length would be 0.8 Dalton.
[0055] Let f(t) and g(t) be arbitrary functions of time t with Fourier transforms, so as to have the following equations (2) and (3):
[0056]
Figure imgf000008_0001
[0057] where $\mathcal{F}_\nu^{-1}[\,\cdot\,](t)$ denotes the inverse Fourier transform (where the transform pair is defined to have constants A = 1 and B = -2π). Then the convolution is performed according to the following equation (4):
[0058]
Figure imgf000008_0002
[0059] Interchange the order of integration according to the following equation (5):
[0060]
Figure imgf000008_0003
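The equation images for (2) to (5) are not reproduced in this text. Assuming they follow the standard derivation of the convolution theorem under the stated convention (A = 1, B = -2π), they would read roughly as follows; this is a reconstruction for the reader's convenience rather than a transcription of the original figures.

```latex
% (2), (3): f and g written as inverse Fourier transforms of F and G
f(t) = \mathcal{F}_\nu^{-1}[F(\nu)](t) = \int_{-\infty}^{\infty} F(\nu)\, e^{2\pi i \nu t}\, d\nu
g(t) = \mathcal{F}_\nu^{-1}[G(\nu)](t) = \int_{-\infty}^{\infty} G(\nu)\, e^{2\pi i \nu t}\, d\nu

% (4): the convolution, with f replaced by its inverse-transform form
(f * g)(t) = \int_{-\infty}^{\infty} g(t')\, f(t - t')\, dt'
           = \int_{-\infty}^{\infty} g(t') \left[ \int_{-\infty}^{\infty} F(\nu)\, e^{2\pi i \nu (t - t')}\, d\nu \right] dt'

% (5): after interchanging the order of integration
(f * g)(t) = \int_{-\infty}^{\infty} F(\nu) \left[ \int_{-\infty}^{\infty} g(t')\, e^{-2\pi i \nu t'}\, dt' \right] e^{2\pi i \nu t}\, d\nu
           = \int_{-\infty}^{\infty} F(\nu)\, G(\nu)\, e^{2\pi i \nu t}\, d\nu
           = \mathcal{F}_\nu^{-1}[F(\nu)\, G(\nu)](t)
```

In words, convolving two signals is equivalent to multiplying their Fourier transforms and transforming back, which is why an FFT can be used to carry out the convolution.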
[0061] Now, there is provided a more detailed description of the pre-filtering process in accordance with one embodiment of the present invention. FIG 3 shows typical MS/MS spectrum data including noise and isotopic peaks. The pre-filtering process comprises noise peak filtering, isotopic peak removal, and piecewise convolution.
[0062] Noise peak filtering is the initial step of the pre-filtering process. In one embodiment, the noise peaks in MS/MS spectrum data are removed according to their intensities. A peak can be treated as noise if it falls below a pre-determined intensity threshold that can be set by a user according to certain parameters. The noise peaks are simply removed from the MS/MS spectrum data before any further processing.
[0063] The isotopic peaks result from the existence of isotopes for certain atoms, and interfere with the identification of amino acid sequences of peptides; thus the following step of the pre-filtering process is to remove the isotopic peaks. In one embodiment, only the first high peak is kept while the isotopic peaks are removed, where the isotopic peaks are within 3 Dalton from the first peak.
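The two cleaning steps of paragraphs [0062] and [0063] can be sketched as follows. This is one possible reading of the text, not the pseudo code of FIG 11: the function names, the `(m/z, intensity)` tuple layout, the example threshold and the choice to scan in ascending m/z order are all assumptions.

```python
def remove_noise(peaks, threshold):
    """Keep only peaks whose intensity is at or above the user-set threshold."""
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


def remove_isotopes(peaks, window=3.0):
    """Scan in ascending m/z; keep the first peak of each cluster and drop
    any following peak that lies within `window` Da of that kept peak."""
    kept = []
    for mz, intensity in sorted(peaks):
        if kept and mz - kept[-1][0] <= window:
            continue  # treated as an isotopic peak of the last kept peak
        kept.append((mz, intensity))
    return kept


# Hypothetical spectrum fragment as (m/z, intensity) pairs.
raw = [(175.12, 80.0), (176.12, 20.0), (177.13, 6.0), (180.5, 2.0), (246.16, 60.0)]
cleaned = remove_isotopes(remove_noise(raw, threshold=5.0))
```

For this sample the low peak at 180.5 is dropped as noise, and the peaks at 176.12 and 177.13 are dropped as isotopes of the peak at 175.12.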
[0064] After the MS/MS spectrum data have been cleaned by removing the noise and isotopic peaks, they are ready for the piecewise convolution. As discussed above, the tandem mass spectrum has the symmetry property. Now referring to FIG 4, there is provided an illustration of the principles of the piecewise convolution in accordance with one embodiment of the present invention. The algorithm first picks two "pieces" of spectra from the given spectrum; the first piece, the "left piece", is extracted from the lowest m/z end of the spectrum, and the second piece, the "right piece", is extracted from the highest m/z end of the spectrum, which also contains the precursor mass of the peptide for charge z=1. The algorithm then performs convolution on the two "pieces" of spectra. The peak values of this convolution are kept as the representative peaks of the selected piece. Then the "left piece" is replaced by the next inner piece of the spectrum at its right side; likewise the "right piece" is replaced by the next inner piece of the spectrum at its left side. The peak values of this convolution are again kept as the representative peaks of the second piece. This process is iterated until the final "left piece" and "right piece" have been convolved, and the representative peaks of the last piece have been obtained.
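The iteration over mirrored pieces can be sketched as below. This is a hedged illustration of the outside-in pairing only: the spectrum is assumed to be already binned into a list of intensities, the names are hypothetical, the convolution is computed by direct summation rather than by FFT to keep the example self-contained, and "representative peaks" is simplified to the single largest convolution value per pair. The pseudo code of FIG 11 is not reproduced here and may differ.

```python
def convolve(a, b):
    """Direct linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def piecewise_convolution(intensities, block=8):
    """Convolve mirrored blocks, working from both ends toward the middle.

    Returns a list of (left_start, right_start, peak_value) tuples."""
    results = []
    left, right = 0, len(intensities)
    while left + block <= right - block:
        left_piece = intensities[left:left + block]
        right_piece = intensities[right - block:right]
        conv = convolve(left_piece, right_piece)
        results.append((left, right - block, max(conv)))
        left += block   # move the left piece inward
        right -= block  # move the right piece inward
    return results


# Hypothetical binned spectrum: 32 bins with one peak near each end.
spectrum = [0.0] * 32
spectrum[1] = 100.0
spectrum[30] = 100.0
pairs = piecewise_convolution(spectrum)
```

Because a convolution collects products of samples whose indices add up to the same value, two peaks placed symmetrically about the mid point reinforce each other in one output position, which is one way to read why the symmetry property suits this operation.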
[0065] Now referring to FIG 5, there is provided a figure showing a common tandem mass spectrum.
[0066] Now referring to FIG 6, there is provided a figure showing a portion of the resultant tandem mass spectrum after the noise and isotopic filtering processes, where all peaks have been normalized to 100.
[0067] Now referring to FIG 7, there is provided a figure showing the result of piecewise convolution performed on the tandem mass spectrum shown in FIG 6. This pre-filtered mass spectrum can be sent for protein identification, either by de novo methods or database search methods.
[0068] There is another novel use of the result. The number of the peaks left with high intensity can be used as an indication of the quality of the mass spectrum. A spectrum producing a higher number of peaks should be better than one that produces fewer peaks. In one embodiment, the MS Filtered Score (MF Score) is defined as the quality measurement of the mass spectrum, obtained by counting the number of high intensity peaks that remain after the filtering process.
[0069] EXAMPLES
[0070] The following examples are for the sole purpose of illustrating the present invention. It by no means intends to limit the scope of the present invention.
[0071] The pseudo code of the pre-filtering algorithm using piecewise convolution that was used in the tests of the present invention is shown in FIG 11.
[0072] The test was performed with 557 unpublished MS/MS datasets. Two software programmes were used to verify the results; the first one is MASCOT, and the second one is SONAR. The scores obtained from MASCOT, SONAR and the algorithm of the present invention are shown in FIG 12.
[0073] Now referring to FIG 8, there is provided a figure showing the scatter plot of MF Score versus Mascot score. There is a linear correlation between these scores, i.e. as the Mascot score increases, MF score increases linearly, which implies that good mass spectra produce high matching peak counts.
[0074] Now referring to FIG 9, there is provided a figure showing the scatter plot of MF Score versus SONAR score. There is a linear correlation between these scores, i.e. as the SONAR score increases, MF score increases linearly, which again implies that good mass spectra produce high matching peak counts.
[0075] Now referring to FIG 10, there is provided a plot showing the sensitivity and the selectivity of the present invention. A good MS is defined as one for which a protein can be found in a database search. A bad MS is one that does not yield a result after a database search. A cautionary note: there could be proteins that are not found in the current protein database, which makes the search fail to identify any protein even though the spectrum could be a good one. The plot of the good-MS and bad-MS curves shows an intersection at a score of 0.3788. This score threshold helps in differentiating the passing MS from the failing MS judging from the score itself.
[0076] Table 1. Results from SONAR [table image: imgf000010_0001]
[0077] The sensitivity and selectivity can then be calculated:
[0078] Sensitivity = 243 / ( 243+48 ) = 0.835052 = 83.5052%
[0079] Selectivity = 223 / ( 231+30 ) = 0.854406 = 85.4406%
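The arithmetic in [0078] and [0079] can be checked with a short sketch. The counts are copied exactly as printed; the helper name `proportion` is hypothetical, and the sketch makes no assumption about which cells of Table 1 the individual counts come from.

```python
from typing import Sequence


def proportion(numerator: int, denominator_terms: Sequence[int]) -> float:
    """Divide the numerator by the sum of the denominator terms."""
    return numerator / sum(denominator_terms)


# Counts exactly as printed in [0078] and [0079].
sensitivity = proportion(243, (243, 48))
selectivity = proportion(223, (231, 30))

print(f"Sensitivity = {sensitivity:.6f} = {sensitivity:.4%}")
print(f"Selectivity = {selectivity:.6f} = {selectivity:.4%}")
```

Rounded to six decimal places, both values agree with the figures stated above.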
[0080] In certain embodiments of the present invention, there is provided a system by which MS/MS spectra data can be processed by the piecewise convolution described above. The system comprises electronic means for receiving the MS/MS spectra data, and a computer executable storage medium in which the computer programs are embedded. The embedded programs can perform the piecewise convolution on the MS/MS spectra data. The system further comprises an output means for outputting and displaying the results. The system may be any electronic device, including a PC, a notebook, or the like.
[0081] While the present invention has been described with reference to particular embodiments, it is understood that the embodiments are illustrative and that the scope of the invention is not so limited. Alternative embodiments of the present invention will become apparent to those having ordinary skill in the art to which the present invention pertains. Such alternative embodiments are considered to be encompassed within the spirit and scope of the present invention. Accordingly, the scope of the present invention is described by the appended claims and is supported by the foregoing description.

Claims

What is claimed is:
1. A process for pre-filtering a MS/MS spectrum of a polypeptide, said process comprising the steps of: obtaining the MS/MS spectrum of a polypeptide from a source; and performing piecewise convolution upon the MS/MS spectrum, and then outputting the piecewise convoluted MS/MS spectrum to an algorithm for identifying the amino acid sequence of the polypeptide; thereby the amino acid sequence of the polypeptide can be identified.
2. The process of claim 1, wherein before the step of performing piecewise convolution, further comprising a step of removing the noise peaks that have intensities lower than a pre-determined threshold.
3. The process of claim 1, wherein before the step of performing piecewise convolution, further comprising a step of removing isotopic peaks.
4. The process of claim 1, wherein before the step of performing piecewise convolution, further comprises a step of normalization, where all selected peaks are normalized into a pre-set scale.
5. The process of claim 1, wherein the step of performing piecewise convolution evaluates the quality of the MS/MS spectrum based on the number of peaks.
6. A system for pre-filtering a MS/MS spectrum of a polypeptide, said system comprising: a computer executable storage medium in which computer programs are embedded, wherein the computer programs enable the system to obtain the MS/MS spectrum from a source, and to perform piecewise convolution upon the MS/MS spectrum; thereby the amino acid sequence of the polypeptide can be generated.
7. The system of claim 6, wherein the computer programs further enable the system to remove the noise peaks that have intensities lower than a pre-determined threshold.
8. The system of claim 6, wherein the computer programs further enable the system to remove isotopic peaks.
9. The system of claim 6, wherein the computer programs further enable the system to normalize all selected peaks into a pre-set scale.
10. The system of claim 6, wherein the computer programs evaluate the quality of the MS/MS spectrum based on the number of peaks obtained from the performance of piecewise convolution.
PCT/SG2007/000070 2007-03-13 2007-03-13 System and process for pre-filtering of tandem mass spectra using piecewise convolution WO2008111911A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2007/000070 WO2008111911A1 (en) 2007-03-13 2007-03-13 System and process for pre-filtering of tandem mass spectra using piecewise convolution

Publications (2)

Publication Number Publication Date
WO2008111911A1 true WO2008111911A1 (en) 2008-09-18
WO2008111911A9 WO2008111911A9 (en) 2009-11-26

Family

ID=39759766

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000036285A (en) * 1998-07-17 2000-02-02 Jeol Ltd Spectrum processing method for time-of-flight mass spectrometer
JP2000113853A (en) * 1998-10-07 2000-04-21 Jeol Ltd Mass spectrometry
WO2004046731A2 (en) * 2002-11-18 2004-06-03 Ludwig Institute For Cancer Research Method for analysing amino acids, peptides and proteins using mass spectroscopy of fixed charge-modified derivatives
US20050063864A1 (en) * 2003-08-13 2005-03-24 Akihiro Sano Mass spectrometer system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8455818B2 (en) 2010-04-14 2013-06-04 Wisconsin Alumni Research Foundation Mass spectrometry data acquisition mode for obtaining more reliable protein quantitation
US9040903B2 (en) 2011-04-04 2015-05-26 Wisconsin Alumni Research Foundation Precursor selection using an artificial intelligence algorithm increases proteomic sample coverage and reproducibility
WO2016198984A1 (en) * 2015-06-11 2016-12-15 Dh Technologies Development Pte. Ltd. Method for deconvolution
US10128093B2 (en) 2015-06-11 2018-11-13 Dh Technologies Development Pte. Ltd. Method for deconvolution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 07716158; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 07716158; Country of ref document: EP; Kind code of ref document: A1)