WO2008111911A1 - System and process for pre-filtering of tandem mass spectra using piecewise convolution - Google Patents
- Publication number
- WO2008111911A1 (application PCT/SG2007/000070)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- peaks
- convolution
- filtering
- polypeptide
Classifications
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01J—ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
- H01J49/00—Particle spectrometers or separator tubes
- H01J49/0027—Methods for using particle spectrometers
- H01J49/0036—Step by step routines describing the handling of the data generated during a measurement
Landscapes
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The present invention provides a process for pre-filtering an MS/MS spectrum of a polypeptide by performing piecewise convolution upon the MS/MS spectrum. The present invention also provides a system for pre-filtering an MS/MS spectrum of a polypeptide.
Description
SYSTEM AND PROCESS FOR PRE-FILTERING OF TANDEM MASS SPECTRA
USING PIECEWISE CONVOLUTION
Field of the Invention
[0001] The present invention generally relates to mass spectrometry technologies, and more particularly to a system and process for pre-filtering of tandem mass spectra using piecewise convolution.
Background of the Invention
[0002] Mass Spectrometry (MS) in proteomics is a powerful analytical tool that is used for obtaining amino acid sequences of peptides and for protein identification. A mass spectrometer is an instrument that measures the mass-to-charge ratio of individual molecules that have been converted into electrically charged gas-phase molecules, or ions. These ions are filtered in such a way as to produce an ordered separation of the ions as they pass through the instrument, ordered from lower to higher mass-to-charge ratios. The data is typically displayed as a plot of intensity vs. mass-to-charge ratio. The ionization techniques generally used to ionize sample peptides in most proteomic analyses are Matrix Assisted Laser Desorption Ionization (MALDI) and Electrospray Ionization (ESI).
[0003] Mass spectrometry is experiencing a period of rapid growth because of its applications in proteomic analyses. As a consequence of this growth, the need to collect the large amounts of information produced and make it available to researchers worldwide has resulted in a number of MS databases. For example, the Illinois Bio-Grid Mass Spectrometry Database (IBG-MSD) is a public database of curated and annotated empirically derived mass spectra of peptides. The goal of the database is to address the need for a public database of mass spectrometry data and to implement a useful web interface that will allow researchers to access the data and perform a variety of tasks based on their individual needs.
[0004] With the advent of tandem mass spectrometry (MS/MS), the identification of proteins has become an easier and more accurate process. MS/MS involves two stages of mass spectrometry analysis of a sample in a single experiment. The first stage filters out the ions of interest from the sample. These selected ions are then passed into a collision cell that breaks the ions into smaller ion fragments. The second stage then separates and detects these smaller ion fragments.
[0005] Currently, the actual MS/MS data have to be pre-filtered before being used for identification of amino acid sequences of peptides; this is because the actual MS/MS data contain too much noise arising from the systems and/or artificial operations. Many known pre-filtering algorithms are available; a few of them will be described briefly herein.
[0006] The Open Mass Spectrometry Search Algorithm (OMSSA) is an efficient search engine for identifying MS/MS peptide spectra by searching libraries of known protein sequences. OMSSA scores significant hits with a probability score developed using classical hypothesis testing, the same statistical method used in BLAST. The OMSSA noise filter has several steps. The first step is to delete background peaks whose intensity is below a percentage of the maximum intensity peak. By default, OMSSA cuts peaks below 2.5% of the maximum intensity, although this value is user adjustable and is modified dynamically in the last part of the algorithm. The second step is to delete any remaining precursor ion peaks. The third step is to eliminate peaks that are obviously not mono-isotopic. This is accomplished by examining peaks in order of intensity and deleting peaks that are within 0 to 2 Da of the m/z value of the peak being examined. The last step in the noise filter is to filter out peaks that are too close together. In this step, precursor charge 1+ and 2+ spectra are treated differently from charge 3+ spectra. It is assumed that 1+ and 2+ spectra contain mainly 1+ product ions. For these spectra, the filter examines each peak in order of intensity. Within ±27 Da of the peak being examined, all peaks except the most intense one are deleted, under the assumption that in a region of this size no other peak from the same ion series will exist and only one ion from the complementary ion series can exist. 27 Da is used as it is less than the residue mass of the smallest amino acid.
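The first and last steps of the noise filter described above can be sketched as follows. This is a minimal illustration of the description in this paragraph, not OMSSA's actual code: the function names `drop_background` and `drop_close_peaks`, the peak representation as `(m/z, intensity)` tuples, and the sample values are all assumptions made for the example. The precursor-removal and mono-isotopic steps are omitted.

```python
def drop_background(peaks, fraction=0.025):
    """Step 1: drop peaks below a fraction of the most intense peak."""
    cutoff = fraction * max(intensity for _, intensity in peaks)
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


def drop_close_peaks(peaks, window=27.0):
    """Last step (1+/2+ spectra): visit peaks by decreasing intensity and
    keep a peak only if no already-kept peak lies within +/- window Da."""
    kept = []
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if all(abs(mz - kept_mz) > window for kept_mz, _ in kept):
            kept.append((mz, intensity))
    return sorted(kept)


# Hypothetical peak list: (m/z, intensity)
peaks = [(100.0, 50.0), (110.0, 80.0), (150.0, 20.0),
         (160.0, 30.0), (200.0, 1.0), (300.0, 10.0)]
filtered = drop_close_peaks(drop_background(peaks))
print(filtered)
```

Visiting peaks in order of decreasing intensity means a peak survives only if it is the most intense one in its own ±27 Da neighbourhood among the peaks still present.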
[0007] (X) of chromatography mass spectrometry (XCMS) as a preprocessing and analysis algorithm includes peak detection, peak matching, and retention time alignment. The peak detection is based on cutting the LC/MS data into slices a fraction of a mass unit (0.1 m/z) wide and then operating on those individual slices in the chromatographic time domain. Within each slice, the signal is determined by taking the maximum intensity at each time point in the slice. After filtration, peaks are selected using a signal-to-noise ratio cutoff. Because of the second-derivative transformation and the resulting negative
[0020] In another embodiment of the system, the computer programs evaluate the quality of the MS/MS spectrum based on the number of peaks obtained from the performance of piecewise convolution.
[0021] The objectives and advantages of the invention will become apparent from the following detailed description of preferred embodiments thereof in connection with the accompanying drawings.
Brief Description of the Drawings
[0022] Preferred embodiments according to the present invention will now be described with reference to the Figures, in which like reference numerals denote like elements.
[0023] FIG 1 illustrates various breakage points within a peptide during fragmentation.
[0024] FIG 2 shows an exemplary MS/MS spectrum of a peptide AVAGCAGAR.
[0025] FIG 3 shows typical MS/MS spectrum data including noise and isotopic peaks.
[0026] FIG 4 shows an exemplary illustration of the principles of the piecewise convolution in accordance with one embodiment of the present invention.
[0027] FIG 5 is a figure showing a common tandem mass spectrum.
[0028] FIG 6 is a figure showing a portion of the resultant tandem mass spectrum after the noise and isotopic filtering processes, where all peaks have been normalized to 100.
[0029] FIG 7 is a figure showing the result of piecewise convolution performed on the tandem mass spectrum shown in FIG 6.
[0030] FIG 8 is a figure showing the scatter plot of MF Score versus Mascot score.
[0031] FIG 9 is a figure showing the scatter plot of MF Score versus SONAR score.
[0032] FIG 10 is a plot showing the sensitivity and the selectivity of the present invention.
[0033] FIG 11 shows the pseudo code of the pre-filtering algorithm using piecewise convolution that was used in the tests of the present invention.
[0034] FIG 12 shows the scores obtained from MASCOT, SONAR and the algorithm of the present invention.
[0035]
Detailed Description of the Invention
[0036] The present invention may be understood more readily with reference to the following detailed description of certain embodiments of the invention.
[0037] Throughout this application, where publications are referenced, the disclosures of these publications are hereby incorporated by reference, in their entireties, into this application in order to more fully describe the state of art to which this invention pertains.
[0038] In the following detailed description, specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the relevant art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and materials have not been described in detail so as not to obscure the present invention.
[0039] The present invention teaches various embodiments of the system and process that pre-filter MS/MS spectrum data by employing the principles of convolution. The general principles of convolution and fragmentation are well known in their relevant arts; nonetheless, for better understanding of the present invention, brief descriptions of convolution and fragmentation are provided herein.
[0040] Now there is provided a brief description of peptide fragmentation and resultant mass spectra.
[0041] When a peptide is subjected to fragmentation by collision with neutral gas molecules, the fragmentation is induced by transfer of kinetic energy from the neutral gas molecules to the peptide. While the breakage can occur between any bonds in the peptide, it commonly occurs at the peptide bond. When a peptide is fragmented at a single peptide bond between the carbonyl and nitrogen, two fragments are formed. In the case where one peptide fragment retains the positive charge at the C-terminus of the peptide ion, it is called a y-ion. If the fragment retains the positive charge at the N-terminus, it is known as a b-ion. A b-ion usually has a complementary y-ion. When a singly charged peptide is fragmented, the charge is retained only at one terminus and only the fragment containing the charge is detected while the other fragment is lost as a neutral fragment. Doubly charged peptides tend to produce two singly charged ions, though sometimes doubly charged ions can also be formed. b-ions and y-ions are usually formed when fragmentation occurs under low energy conditions.
[0042] As shown in FIG 1, fragmentation may cause various breakage points within a peptide, resulting in different fragments. In addition to y-ions and b-ions, other ions including a-ions and z-ions (complementary pairs) and c-ions and x-ions (complementary pairs) are also formed. These ions are formed when fragmentation occurs under high-energy conditions, since higher amounts of energy are required to break these bonds.
[0043] When mass spectra consist mainly of y-ions and b-ions, they can be used for protein sequencing and identification. FIG 2 shows an exemplary MS/MS spectrum of a peptide AVAGCAGAR. The weights of the b-ions are as follows:
[0044] b(n) = [171.11 242.15 299.17 473.22 544.26 601.28 672.31].
[0045] The weights of the y-ions are as follows:
[0046] y(n)=[775.39 676.32 605.28 548.26 374.22 303.18 246.16 175.12].
[0047] To derive the peptide sequence from the above spectrum, the weight differences between the b-ion peaks and the weight differences between the y-ion peaks are calculated. For example, from the spectrum above, the weight difference between b2 and b3 ions is 71.04 Daltons, which corresponds to the amino acid 'Alanine', and the weight difference between b3 and b4 ions is 57.02 Daltons, which corresponds to the amino acid 'Glycine', and so on, thus yielding the final peptide sequence of AVAGCAGAR.
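The mass-difference reading described above can be sketched in a few lines. This is an illustrative sketch only: the residue masses are standard monoisotopic values rounded to three decimals, the 174.05 Da step is assumed to be a modified cysteine (an assumption made so that the listed peaks spell out the stated sequence), and the function name `residues_from_ladder` and the 0.02 Da tolerance are hypothetical.

```python
# Monoisotopic residue masses in Daltons (standard values, rounded).
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "V": 99.068, "R": 156.101}
# Assumption: the 174.05 Da step in the listed peaks is a modified cysteine.
RESIDUE_MASS["C"] = 174.046


def residues_from_ladder(masses, tol=0.02):
    """Map each consecutive mass difference in an ascending ion ladder to a residue."""
    letters = []
    for low, high in zip(masses, masses[1:]):
        diff = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(diff - m) <= tol]
        letters.append(matches[0] if matches else "?")
    return "".join(letters)


b_ions = [171.11, 242.15, 299.17, 473.22, 544.26, 601.28, 672.31]
y_ions = [775.39, 676.32, 605.28, 548.26, 374.22, 303.18, 246.16, 175.12]

from_b = residues_from_ladder(b_ions)                 # read N- to C-terminus
from_y = residues_from_ladder(sorted(y_ions))[::-1]   # y ladder reads C- to N-terminus
print(from_b, from_y)
```

Under these assumptions the b ladder yields residues 3 to 8 (`AGCAGA`) and the y ladder yields residues 2 to 8 (`VAGCAGA`) of AVAGCAGAR; the terminal residues are not bracketed by two peaks in the listed series, so they are not recovered by differences alone.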
[0048] The inventors of the present application discovered that there is symmetry in mass spectra data around the mid point of a tandem MS/MS spectrum, where the mid point equals (mass-to-charge ratio of peptide) / 2. For example, as shown in FIG 2, the weight difference between b2 and b3 is equivalent to that between y7 and y6, as both represent the same amino acid 'Alanine' at 71.04 Daltons; b2 corresponds to y7 while b3 corresponds to y6. Likewise, the weight difference between b3 and b4 is equivalent to that between y6 and y5, as both represent the same amino acid 'Glycine' at 57.02 Daltons, and b4 corresponds to y5, and so on. This observed symmetry proved to be a useful feature that is suitable for the application of convolution in pre-filtering MS/MS spectrum data.
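The symmetry can be checked numerically on the listed peaks of AVAGCAGAR. In the sketch below the b(n) list is read as b2 to b8 (171.11 Da is the mass of the two-residue fragment AV plus a proton) and the y(n) list as y8 to y1; the pairing of b_i with y_(9-i) is the usual complementary-ion relation for a 9-residue peptide. Variable names are illustrative.

```python
# Listed fragment masses for AVAGCAGAR, keyed by ion index.
b = {2: 171.11, 3: 242.15, 4: 299.17, 5: 473.22, 6: 544.26, 7: 601.28, 8: 672.31}
y = {8: 775.39, 7: 676.32, 6: 605.28, 5: 548.26, 4: 374.22, 3: 303.18, 2: 246.16, 1: 175.12}

n_residues = 9

# Each b_i is paired with its complementary y_(n - i); the pair sums are
# (nearly) constant, so the two series mirror each other about sum / 2.
pair_sums = [b[i] + y[n_residues - i] for i in sorted(b)]
spread = max(pair_sums) - min(pair_sums)
mid_point = sum(pair_sums) / len(pair_sums) / 2

print([round(s, 2) for s in pair_sums])
print(round(spread, 2), round(mid_point, 1))
```

Every pair sums to roughly 847.43 Da, with a spread of about 0.01 Da caused by rounding of the listed values, which places the mirror point near 423.7 on the m/z axis.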
[0049] Now there is provided a brief description of the fast Fourier transform and convolution.
[0050] The fast Fourier transform (FFT) is a discrete Fourier transform algorithm which reduces the number of computations needed for $N$ points from $2N^2$ to $2N \lg N$, where $\lg$ is the base-2 logarithm. If the function to be transformed is not harmonically related to the sampling frequency, the response of an FFT looks like a sinc function (although the integrated power is still correct). Aliasing (leakage) can be reduced by apodization using a tapering function. However, aliasing reduction is at the expense of broadening the spectral response.
[0051] Fast Fourier transform algorithms generally fall into two classes: decimation in time, and decimation in frequency. The Cooley-Tukey FFT algorithm first rearranges the input elements in bit-reversed order, and then builds the output transform (decimation in time). The basic idea is to break up a transform of length $N$ into two transforms of length $N/2$ using the identity according to the following equation (1):
[0052]
$$
\begin{aligned}
\sum_{n=0}^{N-1} a_n e^{-2\pi i n k/N}
&= \sum_{n=0}^{N/2-1} a_{2n}\, e^{-2\pi i (2n) k/N} + \sum_{n=0}^{N/2-1} a_{2n+1}\, e^{-2\pi i (2n+1) k/N} \\
&= \sum_{n=0}^{N/2-1} a_n^{\text{even}}\, e^{-2\pi i n k/(N/2)} + e^{-2\pi i k/N} \sum_{n=0}^{N/2-1} a_n^{\text{odd}}\, e^{-2\pi i n k/(N/2)} \qquad (1)
\end{aligned}
$$
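The even/odd split in equation (1) translates directly into a recursive routine. The following is a minimal sketch, assuming a power-of-two input length; it uses plain recursion rather than the explicit bit-reversal reordering mentioned above, and the function names `fft_radix2` and `dft_direct` are illustrative. The direct $O(N^2)$ transform is included only to check the result.

```python
import cmath


def fft_radix2(a):
    """Recursive decimation-in-time FFT for power-of-two lengths."""
    n = len(a)
    if n == 1:
        return list(a)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(a[0::2])   # transform of the even-indexed samples
    odd = fft_radix2(a[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft_direct(a):
    """Direct O(N^2) discrete Fourier transform, used as a reference."""
    n = len(a)
    return [sum(a[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


signal = [1.0, 2.0, 0.0, 3.0, 0.0, 0.0, 1.0, 4.0]  # one block of 8 intensities
fast = fft_radix2(signal)
slow = dft_direct(signal)
max_error = max(abs(p - q) for p, q in zip(fast, slow))
print(max_error)
```

The second half of the output reuses the same two half-length transforms with the sign of the twiddle factor flipped, which is where the saving over the direct transform comes from.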
[0053] The algorithm for ifft(x) is the same as the algorithm for fft(x), except for a sign change and a scale factor of n = length(x). As for fft, the execution time for ifft depends on the length of the transform. It is fastest for powers of two. It is almost as fast for lengths that have only small prime factors. It is typically several times slower for lengths that are prime or which have large prime factors.
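[0054] Convolution was performed using two blocks of 8 intensities each at any one time, starting from both ends of the spectrum, until the entire spectrum had been processed. A block of 8 peaks was chosen because the FFT was used to perform the convolution, so the length is best chosen as a power of two, $2^n$. In addition, the precision of this length would be 0.8 Dalton.
[0055] Let $f(t)$ and $g(t)$ be arbitrary functions of time $t$ with Fourier transforms, so as to have the following equations (2) and (3):
[0056]
$$ f(t) = \mathcal{F}_\nu^{-1}[F(\nu)](t) = \int_{-\infty}^{\infty} F(\nu)\, e^{2\pi i \nu t}\, d\nu \qquad (2) $$
$$ g(t) = \mathcal{F}_\nu^{-1}[G(\nu)](t) = \int_{-\infty}^{\infty} G(\nu)\, e^{2\pi i \nu t}\, d\nu \qquad (3) $$
[0057] Where $\mathcal{F}_\nu^{-1}(t)$ denotes the inverse Fourier transform (where the transform pair is defined to have constants $A = 1$ and $B = -2\pi$). Then the convolution is performed according to the following equation (4):
[0058]
$$ f * g = \int_{-\infty}^{\infty} g(t')\, f(t - t')\, dt' = \int_{-\infty}^{\infty} g(t') \left[ \int_{-\infty}^{\infty} F(\nu)\, e^{2\pi i \nu (t - t')}\, d\nu \right] dt' \qquad (4) $$
[0059] Interchange the order of integration according to the following equation (5):
[0060]
$$ f * g = \int_{-\infty}^{\infty} F(\nu) \left[ \int_{-\infty}^{\infty} g(t')\, e^{-2\pi i \nu t'}\, dt' \right] e^{2\pi i \nu t}\, d\nu = \int_{-\infty}^{\infty} F(\nu)\, G(\nu)\, e^{2\pi i \nu t}\, d\nu = \mathcal{F}_\nu^{-1}[F(\nu)\, G(\nu)](t) \qquad (5) $$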
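The relationship between convolution and the Fourier transform described above has a discrete counterpart that can be checked on two blocks of 8 intensities. The sketch below is an illustration under stated assumptions: it uses a direct discrete transform from the standard library instead of an FFT routine to stay self-contained, zero-pads both blocks so that the circular convolution produced by the transform equals the ordinary linear convolution, and uses hypothetical names (`dft`, `convolve_via_transform`, `convolve_direct`) and sample values.

```python
import cmath


def dft(x, inverse=False):
    """Direct discrete Fourier transform (forward uses e^{-2 pi i ...})."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def convolve_via_transform(f, g):
    """Transform both inputs, multiply pointwise, transform back."""
    size = len(f) + len(g) - 1                 # length of the linear convolution
    fp = list(f) + [0.0] * (size - len(f))     # zero-pad to avoid wrap-around
    gp = list(g) + [0.0] * (size - len(g))
    product = [a * b for a, b in zip(dft(fp), dft(gp))]
    return [v.real for v in dft(product, inverse=True)]


def convolve_direct(f, g):
    """Reference: sum over all index pairs."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


f = [1.0, 2.0, 0.0, 3.0, 0.0, 0.0, 1.0, 4.0]
g = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0]
via_transform = convolve_via_transform(f, g)
direct = convolve_direct(f, g)
max_diff = max(abs(p - q) for p, q in zip(via_transform, direct))
print(max_diff)
```

In practice the direct transform would be replaced by an FFT of power-of-two length, which is the reason given above for choosing blocks of 8.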
[0061] Now, there is provided a more detailed description of the pre-filtering process in accordance with one embodiment of the present invention. FIG 3 shows typical MS/MS spectrum data including noise and isotopic peaks. The pre-filtering process comprises noise peak filtering, isotopic peak removal, and piecewise convolution.
[0062] Noise peak filtering is the initial step of the pre-filtering process. In one embodiment, the noise peaks in MS/MS spectrum data are removed according to their intensities. A peak is treated as noise if it falls below a pre-determined intensity threshold that can be set by a user according to certain parameters. The noise peaks are simply removed from the MS/MS spectrum data before any further processing.
[0063] The isotopic peaks result from the existence of isotopes for certain atoms and interfere with the identification of amino acid sequences of peptides; thus the next step of the pre-filtering process is to remove the isotopic peaks. In one embodiment, only the first high peak is kept while the isotopic peaks are removed, where the isotopic peaks are within 3 Daltons of the first peak.
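The two cleaning steps, together with normalization of the surviving peaks to a scale of 100, might look like the sketch below. It is one possible reading of the description, not the pseudo code of FIG 11: the function names, the `(m/z, intensity)` tuple representation, the threshold value, and the sample peaks are assumptions, and "first peak" is interpreted as the lowest-m/z peak of each cluster.

```python
def remove_noise(peaks, threshold):
    """Drop peaks whose intensity falls below a user-set threshold."""
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


def remove_isotopic(peaks, window=3.0):
    """Keep the first peak of each cluster; drop peaks within `window` Da above it."""
    kept = []
    for mz, intensity in sorted(peaks):
        if kept and mz - kept[-1][0] <= window:
            continue  # treated as an isotopic peak of the last kept peak
        kept.append((mz, intensity))
    return kept


def normalize(peaks, scale=100.0):
    """Rescale intensities so that the most intense peak equals `scale`."""
    top = max(intensity for _, intensity in peaks)
    return [(mz, intensity * scale / top) for mz, intensity in peaks]


# Hypothetical raw peaks: two isotope clusters and one weak noise peak.
raw = [(171.11, 50.0), (172.11, 20.0), (173.12, 5.0),
       (200.00, 1.0), (242.15, 40.0), (243.15, 15.0)]
cleaned = normalize(remove_isotopic(remove_noise(raw, threshold=2.0)))
print(cleaned)
```

Running the noise step first matters in this sketch: a weak noise peak just below a real peak would otherwise be kept as the "first" peak of the cluster and cause the real peak to be discarded.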
[0064] After the MS/MS spectrum data have been cleaned by removing the noise and isotopic peaks, they are ready for the piecewise convolution. As discussed above, the tandem mass spectrum has the symmetry property. Now referring to FIG 4, there is provided an illustration of the principles of the piecewise convolution in accordance with one embodiment of the present invention. The algorithm first picks two "pieces" of spectra from the given spectrum; the first piece, the "left piece", is extracted from the lowest m/z end of the spectrum, and the second piece, the "right piece", is extracted from the highest m/z end of the spectrum, which also contains the precursor mass of the peptide for charge z=1. The algorithm then performs convolution on the two "pieces" of spectra. The peak values of this convolution are kept as the representative peaks of the selected piece. Then the "left piece" is replaced by the next inner piece of the spectrum at its right side; likewise the "right piece" is replaced by the next inner piece of the spectrum at its left side. The peak values of this convolution are again kept as the representative peaks of the second piece. This process is iterated until the final "left piece" and "right piece" have been convolved, and the representative peaks of the last piece have been obtained.
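The outside-in iteration can be sketched as below. This is a loose interpretation offered for orientation only, not the pseudo code of FIG 11: it assumes the cleaned spectrum has already been binned into a list of intensities, uses blocks of 8 bins, computes each convolution directly rather than by FFT, and keeps only the largest convolution value and its lag as the "representative peak" of each pair of pieces. All names and sample values are hypothetical.

```python
def convolve(f, g):
    """Direct linear convolution of two short lists."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def piecewise_convolution(intensities, block=8):
    """Convolve pieces taken from both ends of the spectrum, moving inward.

    Returns one (pair index, lag of maximum, maximum value) tuple per pair.
    """
    n = len(intensities)
    results = []
    k = 0
    while 2 * (k + 1) * block <= n:   # stop when the two pieces would overlap
        left = intensities[k * block:(k + 1) * block]
        right = intensities[n - (k + 1) * block:n - k * block]
        conv = convolve(left, right)
        peak = max(conv)
        results.append((k, conv.index(peak), peak))
        k += 1
    return results


# Hypothetical binned spectrum of 32 bins with two mirror-symmetric peak pairs.
spectrum = [0.0] * 32
spectrum[2] = spectrum[29] = 100.0   # outer pair, mirrored about the centre
spectrum[9] = spectrum[22] = 50.0    # inner pair, mirrored about the centre
result = piecewise_convolution(spectrum)
print(result)
```

In this sketch a peak at offset i in the left piece and its mirror image at offset block-1-i in the right piece always contribute to the same lag, block-1, so mirror-symmetric peak pairs reinforce one another there while unpaired peaks do not.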
[0065] Now referring to FIG 5, there is provided a figure showing a common tandem mass spectrum.
[0066] Now referring to FIG 6, there is provided a figure showing a portion of the resultant tandem mass spectrum after the noise and isotopic filtering processes, where all peaks have been normalized to 100.
[0067] Now referring to FIG 7, there is provided a figure showing the result of piecewise convolution performed on the tandem mass spectrum shown in FIG 6. This pre-filtered mass spectrum can be sent for protein identification, either by de novo methods or database search methods.
[0068] There is another novel use of the result. The number of high-intensity peaks that remain can be used as an indication of the quality of the mass spectrum. A spectrum producing a higher number of peaks should be better than one that produces fewer peaks. In one embodiment, the MS Filtered Score (MF Score) is defined as the quality measurement of the mass spectrum, obtained by counting the number of high-intensity peaks remaining after the filtering process.
[0069] EXAMPLES
[0070] The following examples are for the sole purpose of illustrating the present invention. They are by no means intended to limit the scope of the present invention.
[0071] The pseudo code of the pre-filtering algorithm using piecewise convolution that was used in the tests of the present invention is shown in FIG 11.
[0072] The test was performed with 557 unpublished MS/MS datasets. Two software programmes were used to verify the results; the first one is MASCOT, and the second one is SONAR. The scores obtained from MASCOT, SONAR and the algorithm of the present invention are shown in FIG 12.
[0073] Now referring to FIG 8, there is provided a figure showing the scatter plot of MF Score versus Mascot score. There is a linear correlation between these scores, i.e. as the Mascot score increases, MF score increases linearly, which implies that good mass spectra produce high matching peak counts.
[0074] Now referring to FIG 9, there is provided a figure showing the scatter plot of MF Score versus SONAR score. There is a linear correlation between these scores, i.e. as the SONAR score increases, MF score increases linearly, which again implies that good mass spectra produce high matching peak counts.
[0075] Now referring to FIG 10, there is provided a plot showing the sensitivity and the selectivity of the present invention. A good MS is defined as one for which a protein can be found in a database search. A bad MS is one that does not yield a result after a database search. A cautionary note: some proteins may be absent from the current protein database, in which case the search fails to identify any protein even though the spectrum could be a good one. The plot of the good-MS and bad-MS curves shows an intersection at a score = 0.3788. This score threshold helps to differentiate passing MS from failing MS on the basis of the score alone.
[0076] Table 1. Results from SONAR
[0077] So the sensitivity and selectivity could be calculated:
[0078] Sensitivity = 243 / ( 243+48 ) = 0.835052 = 83.5052%
[0079] Selectivity = 223 / ( 231+30 ) = 0.854406 = 85.4406%
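The two ratios can be reproduced with a few lines of arithmetic. The counts below are copied exactly as they appear in the two formulas above, since Table 1 itself is not reproduced in this text; the variable names are illustrative.

```python
# Counts taken verbatim from the two formulas in the text.
sensitivity = 243 / (243 + 48)
selectivity = 223 / (231 + 30)

print(f"Sensitivity = {sensitivity:.6f} = {sensitivity * 100:.4f}%")
print(f"Selectivity = {selectivity:.6f} = {selectivity * 100:.4f}%")
```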
[0080] In certain embodiments of the present invention, there is provided a system by which MS/MS spectra data can be processed by the piecewise convolution as described above. The system comprises electronic means for receiving the MS/MS spectra data, and a computer executable storage medium in which the computer programs are embedded. The embedded programs can perform the piecewise convolution on MS/MS spectra data. The system further comprises an output means for outputting and displaying the results. The system may be any electronic device including a PC, a notebook, or the like.
[0081] While the present invention has been described with reference to particular embodiments, it is understood that the embodiments are illustrative and that the scope of the invention is not so limited. Alternative embodiments of the present invention will become apparent to those having ordinary skill in the art to which the present invention pertains. Such alternative embodiments are considered to be encompassed within the spirit and scope of the present invention. Accordingly, the scope of the present invention is described by the appended claims and is supported by the foregoing description.
Claims
1. A process for pre-filtering a MS/MS spectrum of a polypeptide, said process comprising the steps of: obtaining the MS/MS spectrum of a polypeptide from a source; and performing piecewise convolution upon the MS/MS spectrum, and then outputting the piecewise convoluted MS/MS spectrum to an algorithm for identifying the amino acid sequence of the polypeptide; thereby the amino acid sequence of the polypeptide can be identified.
2. The process of claim 1, wherein before the step of performing piecewise convolution, further comprising a step of removing the noise peaks that have intensities lower than a pre-determined threshold.
3. The process of claim 1, wherein before the step of performing piecewise convolution, further comprising a step of removing isotopic peaks.
4. The process of claim 1, wherein before the step of performing piecewise convolution, further comprising a step of normalization, where all selected peaks are normalized into a pre-set scale.
5. The process of claim 1, wherein the step of performing piecewise convolution evaluates the quality of the MS/MS spectrum based on the number of peaks.
6. A system for pre-filtering a MS/MS spectrum of a polypeptide, said system comprising: a computer executable storage medium in which computer programs are embedded, wherein the computer programs enable the system to obtain the MS/MS spectrum from a source, and to perform piecewise convolution upon the MS/MS spectrum; thereby the amino acid sequence of the polypeptide can be generated.
7. The system of claim 6, wherein the computer programs further enable the system to remove the noise peaks that have intensities lower than a pre-determined threshold.
8. The system of claim 6, wherein the computer programs further enable the system to remove isotopic peaks.
9. The system of claim 6, wherein the computer programs further enable the system to normalize all selected peaks into a pre-set scale.
10. The system of claim 6, wherein the computer programs evaluate the quality of the MS/MS spectrum based on the number of peaks obtained from the performance of piecewise convolution.
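Two of the optional steps recited in the claims, noise-peak removal against a pre-determined threshold and normalization into a pre-set scale, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the representation of a spectrum as (m/z, intensity) pairs, the function names, the scale of 100 and the example peaks are all hypothetical, and isotopic-peak removal and the piecewise convolution itself are not shown.

```python
def remove_noise(peaks, min_intensity):
    """Drop peaks whose intensity is lower than a pre-determined threshold."""
    return [(mz, inten) for mz, inten in peaks if inten >= min_intensity]


def normalize(peaks, scale=100.0):
    """Rescale intensities so that the most intense peak equals `scale`.

    Assumption: the pre-set scale is defined relative to the base peak.
    """
    if not peaks:
        return []
    top = max(inten for _, inten in peaks)
    if top <= 0:
        return list(peaks)  # nothing sensible to rescale
    return [(mz, inten / top * scale) for mz, inten in peaks]


# Hypothetical (m/z, intensity) pairs, for illustration only.
spectrum = [(175.1, 40.0), (176.1, 3.0), (304.2, 200.0), (417.3, 80.0)]
cleaned = normalize(remove_noise(spectrum, min_intensity=5.0))
for mz, inten in cleaned:
    print(f"{mz:.1f}\t{inten:.1f}")
```

The cleaned peak list would then be passed to the piecewise convolution step.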
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2007/000070 WO2008111911A1 (en) | 2007-03-13 | 2007-03-13 | System and process for pre-filtering of tandem mass spectra using piecewise convolution |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2008111911A1 (en) | 2008-09-18 |
WO2008111911A9 (en) | 2009-11-26 |
Family
ID=39759766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2007/000070 WO2008111911A1 (en) | 2007-03-13 | 2007-03-13 | System and process for pre-filtering of tandem mass spectra using piecewise convolution |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008111911A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000036285A (en) * | 1998-07-17 | 2000-02-02 | Jeol Ltd | Spectrum processing method for time-of-flight mass spectrometer |
JP2000113853A (en) * | 1998-10-07 | 2000-04-21 | Jeol Ltd | Mass spectrometry |
WO2004046731A2 (en) * | 2002-11-18 | 2004-06-03 | Ludwig Institute For Cancer Research | Method for analysing amino acids, peptides and proteins using mass spectroscopy of fixed charge-modified derivatives |
US20050063864A1 (en) * | 2003-08-13 | 2005-03-24 | Akihiro Sano | Mass spectrometer system |
2007-03-13: WO PCT/SG2007/000070 (patent/WO2008111911A1/en), active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8455818B2 (en) | 2010-04-14 | 2013-06-04 | Wisconsin Alumni Research Foundation | Mass spectrometry data acquisition mode for obtaining more reliable protein quantitation |
US9040903B2 (en) | 2011-04-04 | 2015-05-26 | Wisconsin Alumni Research Foundation | Precursor selection using an artificial intelligence algorithm increases proteomic sample coverage and reproducibility |
WO2016198984A1 (en) * | 2015-06-11 | 2016-12-15 | Dh Technologies Development Pte. Ltd. | Method for deconvolution |
US10128093B2 (en) | 2015-06-11 | 2018-11-13 | Dh Technologies Development Pte. Ltd. | Method for deconvolution |
Also Published As
Publication number | Publication date |
---|---|
WO2008111911A9 (en) | 2009-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7457708B2 (en) | Methods and devices for identifying related ions from chromatographic mass spectral datasets containing overlapping components | |
US8975577B2 (en) | System and method for grouping precursor and fragment ions using selected ion chromatograms | |
JP5008564B2 (en) | Method and apparatus for identifying proteins in a mixture | |
US7498568B2 (en) | Real-time analysis of mass spectrometry data for identifying peptidic data of interest | |
JP4988884B2 (en) | Mass spectrometry system | |
JP4515819B2 (en) | Mass spectrometry system | |
CN107066789B (en) | Use of windowed mass spectrometry data for retention time determination or validation | |
EP2741224A1 (en) | Methods for generating local mass spectral libraries for interpreting multiplexed mass spectra | |
EP1779406B1 (en) | Mass spectrometer | |
EP3779454A1 (en) | Methods for mass spectrometry of mixtures of proteins or polypeptides using proton transfer reaction | |
US10460919B2 (en) | Automated determination of mass spectrometer collision energy | |
JP2010256101A (en) | Method and device for analyzing glycopeptide structure | |
US20150162175A1 (en) | Methods for Isolation and Decomposition of Mass Spectrometric Protein Signatures | |
EP1457776B1 (en) | Methods and devices for identifying biopolymers using mass spectroscopy | |
JP4058449B2 (en) | Mass spectrometry method and mass spectrometer | |
JP2021081365A (en) | Glycopeptide analyzer | |
WO2008111911A1 (en) | System and process for pre-filtering of tandem mass spectra using piecewise convolution | |
EP3745407A1 (en) | Operating a mass spectrometer utilizing a promotion list | |
JP5696592B2 (en) | Mass spectrometry data analysis method and analysis apparatus | |
US20020120404A1 (en) | Methods and apparatus for mass fingerprinting of biomolecules | |
JP2009168695A (en) | Three-dimensional structure prediction method, three-dimensional structure prediction program, and mass spectroscope | |
KR101768098B1 (en) | Method and system for identification and quantification of peptide considering noise of quantitative mass spectrometry analysis based on isobaric tag |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07716158; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 07716158; Country of ref document: EP; Kind code of ref document: A1 |