WO2007140355A2 - Analyzing mass spectral data - Google Patents

Analyzing mass spectral data


Publication number
WO2007140355A2
Authority
WO
WIPO (PCT)
Prior art keywords
ion
ions
fragments
mass
peak shape
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2007/069832
Other languages
English (en)
French (fr)
Other versions
WO2007140355A3 (en)
Inventor
Yongdong Wang
Ming Gu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cerno Bioscience LLC
Original Assignee
Cerno Bioscience LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cerno Bioscience LLC
Priority to JP2009512334A (JP5393449B2, ja)
Priority to EP07811952A (EP2032238A4, en)
Priority to CA002653400A (CA2653400A1, en)
Publication of WO2007140355A2 (en)
Publication of WO2007140355A3 (en)
Legal status: Ceased


Classifications

    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/0027 Methods for using particle spectrometers
    • H01J49/0036 Step by step routines describing the handling of the data generated during a measurement
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions

Definitions

  • The present invention relates to mass spectrometry systems. More particularly, it relates to mass spectrometry systems that are useful for the analysis of complex mixtures of molecules, including large organic molecules such as proteins or peptides, environmental pollutants, pharmaceuticals and their metabolites, and petrochemical compounds; to methods of analysis used therein; and to a computer program product having computer code embodied therein for causing a computer, or a computer and a mass spectrometer in combination, to effect such analysis.
  • LC/MS/MS Liquid chromatography interfaced with tandem mass spectrometry
  • This method involves a few processes including digestion of proteins, LC separation of peptide mixtures generated from the protein digests, MS/MS analysis of the resulting peptides, and database search for protein identification.
  • The key to identifying proteins effectively with LC/MS/MS is to produce as many high quality MS/MS spectra as possible to allow for reliable matching during database search. This is achieved by a data-dependent scanning technique in a quadrupole or an ion trap instrument.
  • The mass spectrometer checks the intensities and signal to noise ratios of the most abundant ion(s) in a full scan MS spectrum and performs MS/MS experiments when the intensities and signal to noise ratios of the most abundant ions exceed a preset or predetermined threshold.
  • the three most abundant ions are selected for the product ion scans to maximize the sequence information and minimize the time required, as the selection of more than three ions for MS/MS experiments would possibly result in missing other qualified peptides currently eluting from the LC to the mass spectrometer.
  • The widespread use of LC/MS/MS for identification of proteins is largely due to its many outstanding analytical characteristics. Firstly, it is a quite robust technique with excellent reproducibility. It has been demonstrated that it is reliable for high throughput LC/MS/MS analysis for protein identification. Secondly, when using nanospray ionization, the technique delivers quality MS/MS spectra of peptides at sub-femtomole levels. Thirdly, the MS/MS spectra carry sequence information of both C-terminal and N-terminal ions. This valuable information can be used not only for identification of proteins, but also for pinpointing which post translational modifications (PTMs) have occurred to the protein and at which amino acid residue each PTM takes place.
  • PTM post translational modifications
  • MALDI Matrix-Assisted Laser Desorption Ionization
  • TOF time of flight
  • MALDI/TOF is commonly used to detect 2DE separated intact proteins because of its excellent speed, high sensitivity, wide mass range, high resolution, and tolerance of contaminants.
  • MALDI/TOF with delayed extraction and reflecting ion optics can achieve impressive mass accuracy of 1-10 ppm and mass resolution (m/Δm) of 10000-15000 for the accurate analysis of peptides.
  • Limited MS/MS capability in MALDI/TOF is one of the major limitations for its use in proteomics applications.
  • PSD Post Source Decay
  • MALDI/TOF does generate sequence-like MS/MS information for peptides, but the operation of PSD often is not as robust as that of a triple quadrupole or an ion trap mass spectrometer.
  • A newly developed MALDI TOF/TOF system (T. Rejtar et al., J. Proteome Res. 1(2) 171-179 (2002)) delivers many attractive features.
  • The system consists of two TOFs and a collision cell, which is similar to the configuration of a tandem quadrupole system.
  • The first TOF is used to select precursor ions that undergo collision-induced dissociation (CID) in the cell to generate fragment ions. Subsequently, the fragment ions are detected by the second TOF.
  • CID collision-induced dissociation
  • TOF/TOF is able to perform as many data dependent MS/MS experiments as necessary, while a typical LC/MS/MS system selects only a few abundant ions for the experiments.
  • This unique development makes it possible for TOF/TOF to perform industry scale proteomic analysis.
  • The proposed solution is to collect fractions from 2D LC experiments and spot the fractions onto a MALDI plate for MS/MS. As a result, more MS/MS spectra can be acquired for more reliable protein identification by database search, as the quality of MS/MS spectra generated by high-energy CID in TOF/TOF is far better than that of PSD spectra.
  • FTICR-MS Fourier-Transform Ion-Cyclotron Resonance MS
  • FTMS Fourier-Transform Ion-Cyclotron Resonance MS
  • AMT Accurate Mass Tags
  • the user is usually supplied with a standard material having several known ions covering the mass spectral m/z range of interest.
  • Peak positions of these standard ions are determined either in terms of centroids or peak maxima through a low order polynomial fit at the peak top. These peak positions are then fit to the known peak positions through either a 1st or higher order polynomial fit to calibrate the mass (m/z) axis.
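The conventional calibration described above can be sketched as a polynomial fit of measured peak positions to known ones. This is a minimal illustration, assuming NumPy is available; the standard ions, the synthetic linear distortion, and the function name `fit_mass_calibration` are hypothetical and not taken from the patent.

```python
import numpy as np

# Hypothetical calibration standard: known m/z values of the standard ions.
known_mz = np.array([100.0, 300.0, 500.0, 700.0, 900.0])

# Synthetic "measured" peak positions: an assumed linear distortion of the axis.
measured_mz = 1.0005 * known_mz + 0.08

def fit_mass_calibration(measured, known, order=1):
    """Fit a polynomial mapping measured peak positions to known m/z values."""
    coeffs = np.polyfit(measured, known, order)
    return np.poly1d(coeffs)

calibrate = fit_mass_calibration(measured_mz, known_mz, order=1)

# Residual error of the calibration at the standard ions.
max_abs_residual = float(np.max(np.abs(known_mz - calibrate(measured_mz))))

# Apply the calibration to a new measured position (true m/z is 400 here).
corrected = float(calibrate(1.0005 * 400.0 + 0.08))
```

A higher `order` would be chosen when the distortion of the mass axis is visibly nonlinear across the range of interest.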
  • a typical mass spectral data trace would then be subjected to peak analysis where peaks (ions) are identified.
  • This peak detection routine is a highly empirical and compounded process where peak shoulders, noise in data trace, baselines due to chemical backgrounds or contamination, isotope peak interferences, etc., are considered.
  • For the peaks identified, a process called centroiding is typically applied to attempt to calculate the integrated peak areas and peak positions. Due to the many interfering factors outlined above and the intrinsic difficulties in determining peak areas in the presence of other peaks and/or baselines, this is a process plagued by many adjustable parameters that can make an isotope peak appear or disappear with no objective measures of the centroiding quality.
  • Nonlinear Operation uses a multi-stage disjointed process with many empirically adjustable parameters during each stage.
  • Systematic errors (biases) are generated at each stage and propagated down to the later stages in an uncontrolled, unpredictable, and nonlinear manner, making it impossible for the algorithms to report meaningful statistics as measures of data processing quality and reliability.
  • In nearly all applications of mass spectrometry, it is centroid mass spectral data that are compared with known mass spectral centroid data, acquired separately, taken from a known database, or derived from theoretical isotope calculations, for the purpose of ion or ion fragment identification.
  • When one form of acquired centroid data is compared with another form acquired earlier or on a different instrument, the above mentioned errors associated with mass determination and peak area integration (centroiding) appear twice (once for each instrument) before the actual comparison.
  • In many applications of mass spectrometry, such as with the use of MS/MS, electron impact (EI) ionization, electrospray ionization (ESI), and post source decay (PSD), an ion in the sample can typically be observed at multiple m/z (or mass) positions due to the creation of many fragment ions or the same ion with different charge states, or both. Even with the poorly processed centroid data mentioned above, the added information from multiple fragments can typically reduce the number of hits during a search while increasing the search confidence. This has made possible some important applications of mass spectrometry:
  • centroiding first and searching or comparison second typically has large peak integration errors associated with it, an issue further compounded by the experimentally varying abundances. This typically leads to algorithms that ignore the peak area or signal intensities through some form of normalization, for example, as disclosed in the United States Patent 5,538,897. While normalization provides an easy solution computationally, it inevitably results in the loss of valuable information regarding the likelihood of a particular ion fragment under consideration.
  • An additional aspect of the invention is, in general, a computer readable medium having thereon computer readable code for use with a mass spectrometer system having a data analysis portion including a computer, the computer readable code being for causing the computer to analyze data by performing the methods described herein.
  • The computer readable medium preferably further comprises computer readable code for causing the computer to perform at least one of the specific methods described.
  • the invention is also directed generally to a mass spectrometer system for analyzing chemical composition, the system including a mass spectrometer portion, and a data analysis system, the data analysis system operating by obtaining calibrated continuum spectral data by processing raw spectral data; generally in accordance with the methods described herein.
  • the data analysis portion may be configured to operate in accordance with the specifics of these methods.
  • the mass spectrometer system further comprises a sample preparation portion for preparing samples to be analyzed, and a sample separation portion for performing an initial separation of samples to be analyzed.
  • the separation portion may comprise at least one of an electrophoresis apparatus, a chemical affinity chip, or a chromatograph for separating the sample into various components.
  • Fig. 1 is a block diagram of a mass spectrometer in accordance with the invention.
  • Fig. 2 is a flow chart of the steps in the analysis used by the system of Fig. 1.
  • Referring to FIG. 1, there is shown a block diagram of an analysis system 10 that may be used to analyze proteins or other molecules, as noted above, incorporating features of the present invention.
  • Although the present invention will be described with reference to the single embodiment shown in the drawings, it should be understood that the present invention can be embodied in many alternate forms of embodiments. In addition, any suitable types of components could be used.
  • The sample preparation portion 12 may include a sample introduction unit 20, of the type that introduces a sample containing proteins or peptides of interest to system 10, such as the Finnigan LCQ Deca XP Max, manufactured by Thermo Electron Corporation of Waltham, MA, USA.
  • the sample preparation portion 12 may also include an analyte separation unit 22, which is used to perform a preliminary separation of analytes, such as the proteins to be analyzed by system 10.
  • Analyte separation unit 22 may be a chromatography column or an electrophoresis separation unit, such as a gel-based separation unit manufactured by Bio-Rad Laboratories, Inc. of Hercules, CA, both of which are well known in the art. In general, a voltage is applied to the unit to cause the proteins to be separated as a function of one or more variables, such as migration speed through a capillary tube, isoelectric focusing point
  • the mass spectrometer portion 14 may be a conventional mass spectrometer and may be any one available, but is preferably one of MALDI-TOF, quadrupole MS, ion trap MS, qTOF, TOF/TOF, or FTMS. If it has a MALDI or electrospray ionization ion source, such ion source may also provide for sample input to the mass spectrometer portion 14.
  • mass spectrometer portion 14 may include an ion source 24, a mass analyzer 26 for separating ions generated by ion source 24 by mass to charge ratio, an ion detector portion 28 for detecting the ions from mass analyzer 26, and a vacuum system 30 for maintaining a sufficient vacuum for mass spectrometer portion 14 to operate efficiently. If mass spectrometer portion 14 is an ion mobility spectrometer, generally no vacuum system is needed and the data generated are typically called a plasmagram instead of a mass spectrum.
  • the data analysis system 16 includes a data acquisition portion 32, which may include one or a series of analog to digital converters (not shown) for converting signals from ion detector portion 28 into digital data.
  • This digital data is provided to a real time data processing portion 34, which processes the digital data through operations such as summing and/or averaging.
  • a post processing portion 36 may be used to do additional processing of the data from real time data processing portion 34, including library searches, data storage and data reporting.
  • Computer system 18 provides control of sample preparation portion 12, mass spectrometer portion 14, and data analysis system 16, in the manner described below.
  • Computer system 18 may have a conventional computer monitor 40 to allow for the entry of data on appropriate screen displays, and for the display of the results of the analyses performed.
  • Computer system 18 may be based on any appropriate personal computer, operating for example with a Windows® or UNIX® operating system, or any other appropriate operating system.
  • Computer system 18 will typically have a hard drive 42, on which the operating system and the program for performing the data analysis described below is stored.
  • a drive 44 for accepting a CD or floppy disk is used to load the program in accordance with the invention on to computer system 18.
  • the program for controlling sample preparation portion 12 and mass spectrometer portion 14 will typically be downloaded as firmware for these portions of system 10.
  • Data analysis system 16 may be a program written to implement the processing steps discussed below, in any of several programming languages such as C++, JAVA or Visual Basic.
  • r is an (n x 1) matrix of the profile mode mass spectral data measured of the sample, digitized at n m/z values
  • c is a (p x 1) matrix of regression coefficients which are representative of the concentrations of p ions or fragments in the sample
  • K is an (n x p) matrix composed of profile mode mass spectral responses for the p components, all sampled at the same n m/z points as r
  • e is an (n x 1) matrix of a fitting residual with contributions from random noise and any systematic deviations from this model.
  • The components arranged in the columns of matrix K will be referred to as peak components, which may optionally include any baseline of known functionality such as a column of 1's for a flat baseline or an arithmetic series for a sloping baseline.
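The definitions of r, K, c and e above describe a linear model of the form r = Kc + e. The following is a minimal sketch of how such a peak component matrix might be assembled, assuming NumPy; the Gaussian peak, the m/z window, and the concentration values are hypothetical illustration choices, not values from the patent.

```python
import numpy as np

def gaussian_peak(mz, center, fwhm):
    """Assumed Gaussian profile-mode response of a single ion."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((mz - center) / sigma) ** 2)

mz = np.linspace(499.0, 503.0, 81)        # n = 81 sampling points
ion = gaussian_peak(mz, 501.0, 0.5)       # response of the ion of interest
flat = np.ones_like(mz)                   # column of 1's: flat baseline
slope = np.arange(mz.size, dtype=float)   # arithmetic series: sloping baseline

# Peak component matrix K, shape (n x p) with p = 3 components.
K = np.column_stack([ion, flat, slope])

# Noise-free measurement r = Kc for a hypothetical concentration vector c.
c_true = np.array([100.0, 5.0, 0.02])
r = K @ c_true
```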
  • a key peak component in matrix K is the known mass spectral response for the ion or fragment of interest, which can either be experimentally measured or theoretically calculated.
  • It is preferred that the peak component in matrix K be calculated as the convolution of the theoretical isotope distribution and the known mass spectral peak shape function.
  • This known mass spectral peak shape function may be directly measured from a section of the mass spectral data, mathematically calculated from actual measurements through deconvolution, or given by the target peak shape function if a comprehensive mass spectral calibration has already been applied, all using the approach outlined in United States Patent No. 6,983,213 and International Patent Application PCT/US2004/034618 filed on October 20, 2004.
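The convolution of a theoretical isotope distribution with a peak shape function can be illustrated as follows. This is only a sketch under stated assumptions: the isotope sticks, the Gaussian peak shape, and its 0.5 Da FWHM are hypothetical, whereas the patent obtains the peak shape function from measurement or from a comprehensive calibration.

```python
import numpy as np

step = 0.01                                # assumed uniform m/z spacing
mz = np.linspace(498.0, 506.0, 801)        # sampling grid with that spacing

# Hypothetical isotope distribution as (m/z, relative abundance) sticks.
isotopes = [(500.0, 1.00), (501.0, 0.55), (502.0, 0.18)]
sticks = np.zeros_like(mz)
for mass, abundance in isotopes:
    sticks[int(round((mass - mz[0]) / step))] += abundance

# Assumed Gaussian peak shape function, normalised to unit sum.
fwhm = 0.5
sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
half_width = int(round(4.0 * sigma / step))
offsets = np.arange(-half_width, half_width + 1) * step
peak_shape = np.exp(-0.5 * (offsets / sigma) ** 2)
peak_shape /= peak_shape.sum()

# Profile-mode isotope pattern: one candidate column for matrix K.
profile = np.convolve(sticks, peak_shape, mode="same")
```

Because the kernel has an odd number of points and is centred, `mode="same"` keeps each isotope peak at its original m/z position.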
  • actual measured profile mode MS data may be used as a peak component in K.
  • This actual measured profile mode MS data can be from, for example, an established library of many ions or fragments, which may have been measured on a different instrument (or instruments) of preferably higher resolution and quality. It is preferred that these library mass spectra have been calibrated using the above mentioned comprehensive mass spectral calibration process involving peak shape functions to ensure as close a match as possible between r and K in terms of mass spectral peak shape functions.
  • the centroid data from a library such as the EI library from NIST as described by S. E. Stein, J. Am. Soc. Mass Spectrom. 1999, 10, 770, can be convoluted with a peak shape function matching that for the spectrum in r to create peak components for inclusion in K.
  • a peak component included in K does not have to correspond to a pure ion or fragment. It can be a linear combination of a few ions or fragments, as would be the case when isotope labeled protein or peptide fragments are involved in MS/MS experiment.
  • the isotope pattern for each ion or fragment can be calculated or measured separately before combining the isotope patterns with given concentration ratios to form a single peak component in K.
  • one or more first derivatives corresponding to that of a peak component, a known linear combination of several peak components, or the measured mass spectral data r may be added into the peak components matrix K to account for any relative mass spectral errors between r and K.
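Why a first-derivative column absorbs a small mass error can be seen from a first-order Taylor expansion: a peak f shifted by a small δ is approximately f(m) − δ·f′(m). The sketch below, assuming NumPy and a hypothetical Gaussian peak, recovers the shift from the ratio of the two fitted coefficients; the variable names are illustrative only.

```python
import numpy as np

mz = np.linspace(499.0, 503.0, 401)
sigma = 0.2                                # assumed Gaussian peak width

def peak(center):
    return np.exp(-0.5 * ((mz - center) / sigma) ** 2)

component = peak(501.0)                    # expected peak component
derivative = np.gradient(component, mz)    # numerical first derivative

# Simulated measurement: same ion, but with a small mass error.
true_shift = 0.01
r = 50.0 * peak(501.0 + true_shift)

# Fit r with the component and its derivative as the two columns of K.
K = np.column_stack([component, derivative])
c, *_ = np.linalg.lstsq(K, r, rcond=None)

# f(m - d) ~ f(m) - d * f'(m), so the shift is minus the coefficient ratio.
estimated_shift = -c[1] / c[0]
```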
  • $K^{+}$ is a form of the inverse of $K$, which can, for example, take the form of: $K^{+} = (K^{T}K)^{-1}K^{T}$
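With the pseudo-inverse above, the concentration estimate follows as c = K⁺r. A minimal numerical sketch, assuming NumPy and a synthetic peak plus flat baseline with hypothetical noise, is given below; `np.linalg.lstsq` is shown as a numerically safer equivalent of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
mz = np.linspace(499.0, 503.0, 201)
peak = np.exp(-0.5 * ((mz - 501.0) / 0.2) ** 2)
K = np.column_stack([peak, np.ones_like(mz)])   # ion plus flat baseline

# Synthetic measurement with hypothetical concentrations and noise level.
c_true = np.array([100.0, 5.0])
r = K @ c_true + rng.normal(0.0, 0.5, mz.size)

# Pseudo-inverse K+ = (K^T K)^-1 K^T and the resulting estimate c = K+ r.
K_plus = np.linalg.inv(K.T @ K) @ K.T
c_hat = K_plus @ r

# Equivalent least squares solution without an explicit matrix inverse.
c_lstsq, *_ = np.linalg.lstsq(K, r, rcond=None)
```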
  • concentration vector c contains the concentration information of all included peak components including any baseline contribution automatically determined.
  • The concentration vector c also contains the relative mass error information for the given components included in the peak component matrix.
  • For most mass spectrometry applications where the noise in the mass spectral response r typically comes from ion shot noise, it is advantageous to use weighted regression in the above model where the weight at each mass sampling point would be inversely proportional to the signal variance at this mass spectral sampling point, i.e., the mass spectral intensity in r. This is further described by John Neter et al., in Applied Linear Regression, 2nd Ed., Irwin, 1989, p. 418, the entire disclosure of which is incorporated by reference herein.
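A weighted fit with weights inversely proportional to the measured intensity can be sketched by scaling each row of K and r by the square root of its weight. This assumes NumPy, Poisson-distributed counts as a stand-in for ion shot noise, and a hypothetical floor of 1 count to avoid division by zero; none of these specifics come from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
mz = np.linspace(499.0, 503.0, 201)
peak = np.exp(-0.5 * ((mz - 501.0) / 0.2) ** 2)
K = np.column_stack([peak, np.ones_like(mz)])

# Shot-noise-like measurement: counts with variance equal to the mean.
expected = K @ np.array([400.0, 20.0])
r = rng.poisson(expected).astype(float)

# Weight ~ 1 / variance ~ 1 / intensity (floored to avoid division by zero).
weights = 1.0 / np.maximum(r, 1.0)
sqrt_w = np.sqrt(weights)

# Weighted least squares via row scaling of K and r.
c_w, *_ = np.linalg.lstsq(K * sqrt_w[:, None], r * sqrt_w, rcond=None)
```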
  • $s_i$ is the standard deviation estimate for a particular peak component $i$ in its concentration estimate $c_i$, all using the approach outlined in United States Patent No. 6,983,213 and International Patent Application PCT/US2004/034618 filed on October 20, 2004.
  • a p-value can be defined as the probability that a non-existing ion with expected concentration of zero could have generated a high enough signal with the t value given in Equation 3, or,
  • In Equation 4, t(df) is the t distribution of the concentration estimate at given degrees of freedom df.
  • The larger the t-statistic in Equation 3, the smaller the p-value, and the more likely it is that this ion or fragment exists.
  • The t-distribution, p-value and degrees of freedom df are all described by John Neter et al., in Applied Linear Regression, 2nd Ed., Irwin, 1989, p. 8, p. 12, and p. 7, the entire disclosure of which is incorporated by reference herein.
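Equations 3 and 4 themselves are not reproduced in this text. The sketch below assumes the usual forms: a t-statistic equal to the concentration estimate divided by its standard deviation, and a p-value equal to the upper tail probability of the t distribution beyond that value. It uses only the Python standard library, integrating the t density with Simpson's rule; the function names are hypothetical.

```python
import math

def t_statistic(c_i, s_i):
    """Assumed form of the t value: estimate over its standard deviation."""
    return c_i / s_i

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def one_sided_p_value(t, df, intervals=2000):
    """Upper tail P(T >= t), using Simpson's rule on [0, |t|] (even intervals)."""
    a = abs(t)
    h = a / intervals
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    upper_tail = max(0.5 - total * h / 3.0, 0.0)
    return upper_tail if t >= 0 else 1.0 - upper_tail

# Example: a concentration estimate of 4.456 with standard deviation 2.0.
t_value = t_statistic(4.456, 2.0)
p_value = one_sided_p_value(t_value, df=10)
```

For very large t values the tail probability falls below the accuracy of this simple integration, so a statistics library would be the better choice in practice.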
  • When the ion or fragment signal is not very high, especially at low level ion abundances, the p-value from Equation 4 may not be small enough to give enough statistical confidence related to the likely presence of the given ion or fragment.
  • ion fragmentations in the ion source such as EI, tandem MS/MS experiments, post source decay (PSD) or other decays or ion reactions inside a mass analyzer such as dehydration or sodium adduct formation.
  • PSD post source decay
  • electrospray ionization (ESI) of large bio-molecules such as proteins or peptides
  • the same ion can be charged with multiple charge states all in the same experiment, creating multiple observable signals at various m/z values or masses.
  • Each p-value $p_j$ represents the false positive probability for the corresponding j-th ion or fragment resulting from the same starting ion. While $p_j$ could vary widely from one ion or fragment to another, depending on its abundance and the noise in its measurement, an overall false positive statistic can be established based on the individual $p_j$'s through the following equation,
  • J is the total number of ions or fragments observed from the same starting ion.
  • the probability for the presence of the given starting ion can be calculated as 1-p.
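Equation 5 is not reproduced in this text. One plausible reading, treated here strictly as an assumption, is that the overall false positive probability is the product of the individual p-values over the J fragments, which holds if the fragment tests are independent. A short standard-library sketch with hypothetical p-values:

```python
import math

def overall_false_positive(p_values):
    """Assumed combination rule: product of independent per-fragment p-values."""
    return math.prod(p_values)

# Hypothetical p-values for J = 3 fragments of the same starting ion.
fragment_p = [0.20, 0.10, 0.05]

p = overall_false_positive(fragment_p)
presence_probability = 1.0 - p     # probability that the starting ion is present
```

Under this assumption, several individually weak fragment detections combine into a much stronger overall statement about the starting ion.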
  • Peptides TIYTPGSTVLYR, SKDVFLNSVFSK, and QSDFTFGKVTIK all have the identical elemental composition $\mathrm{C_{63}H_{100}N_{15}O_{19}^{+}}$ with the same exact mass of 1370.7320 Da, making them indistinguishable even on high resolution FTMS systems.
  • Very different fragments will be generated from these peptides with very different $p_j$ values in Equation 5, resulting in very different overall p values to clearly differentiate one from the other.
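The claim that the three peptides share one elemental composition can be checked by summing standard amino acid residue formulas, adding H2O for the termini and one H for the charge. The sketch below uses only the Python standard library; the residue table covers just the residues needed here, and, to match the quoted 1370.7320 Da, the electron mass is not subtracted.

```python
from collections import Counter

# Elemental composition of amino acid residues (amino acid minus H2O).
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
}

# Monoisotopic masses of the most abundant isotopes, in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def protonated_composition(sequence):
    """Composition of the singly protonated peptide, [M+H]+."""
    total = Counter({"H": 3, "O": 1})   # H2O for the termini plus H for the charge
    for residue in sequence:
        total.update(RESIDUES[residue])
    return dict(total)

def exact_mass(composition):
    """Sum of monoisotopic masses (electron mass not subtracted)."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())

peptides = ["TIYTPGSTVLYR", "SKDVFLNSVFSK", "QSDFTFGKVTIK"]
compositions = [protonated_composition(p) for p in peptides]
masses = [exact_mass(c) for c in compositions]
```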
  • the analysis can be accomplished based on a single conventional MS measurement of multiple ions or fragments associated with the given ion of interest with all probability measures derived from this single MS measurement itself.
  • the one ion or fragment at a time approach in this invention not only avoids the problem of varying ion or fragment abundances, but also derives individual probability measures which can then be combined into an overall probability measure for the starting ion of interest.
  • Statistically rigorous confidence levels such as t-statistics or p-values can be established for a given ion to test for its presence or absence in the sample and used to rank possible candidates for compound identification including protein/peptide identification or database search.
  • this approach provides an easy, fast, yet mathematically sound and statistically rigorous measure for general compound identification through the use of multiple ions or fragments with applications to either de novo protein or peptide sequencing or database search.
  • raw continuum mass spectral data is obtained for a sample containing, for example, tandem MS/MS spectrum containing many isotope patterns or clusters corresponding to the many fragments of a given peptide. While, as mentioned above, most commercial techniques utilize stick spectral data, it will be recognized that the use of the entire raw spectrum means that data is not lost due to a premature gross simplification of the features of the data. However, this raw spectrum has characteristics relating to instrument peak shape function, instrument resolution, and baseline variations due to spurious ions and neutral particles that may reach the detector. Further, there may be a mass dependence with respect to all of these potential factors. For example, there is an exponential decay of baseline displacement as a function of increasing m/z in a MALDI system, principally due to ions of the matrix material, some of which arrive at the detector, despite every attempt to reduce their presence.
  • the raw data acquired in step 210 is subjected to a full calibration of the mass spectrometer based on internal and/or external standards so as to standardize the raw continuum data. This assures that the peaks are lined up at the proper m/z values, and that the shape of the peaks is properly defined and known mathematically. This is preferably accomplished by the procedure set forth in United States Patent No. 6,983,213 and
  • In step 230, candidate ion fragments are selected and proposed for matching with one observed isotope pattern or cluster in the mass spectrum. There are several approaches that can be used to select a candidate ion or fragment at this stage:
  • The multiple fragments within an MS/MS spectrum provide important information to deduce the amino acid sequences for the peptide of interest, i.e., de novo sequencing.
  • These two isotope clusters may contain one or a few amino acids with some modifications such as oxidation (addition of O or O2 to methionine), dehydration (loss of H2O), phosphorylation (addition of HPO3 on tyrosine, serine, and threonine), sulphation (on the O of tyrosine), and glycosylation.
  • A candidate fragment can be selected computationally efficiently with or without accurate monoisotopic mass measurement as a pre-filter through the elemental composition search disclosed in international patent application PCT/US2005/039186, filed on October 28, 2005.
  • This new fragment can be formed by adding to or deleting from the previous fragment, a new segment.
  • This new segment is typically composed of one or a few of the 20 known amino acids including possible complications such as modifications and incomplete fragmentations.
  • the same elemental composition search would be used to select and propose the new fragment(s).
  • Other modifications such as isotope tags or enzymatic modifications on terminus can also be incorporated.
  • Similar approaches can be applied to other more general or specific polymers including DNA molecules composed of A, G, T, and C bases in a chain.
  • b. For an MS/MS search of peptides/proteins through a database, the same approach as described above for de novo sequencing can be used to select and propose fragments, except that not every one of the 20 amino acids would be possible at each stage due to the limited known sequences available in the protein or peptide databases after in silico digestion with known enzymes. Further reduction in the search space can be achieved through accurate mass measurement on the peptide that generates the MS/MS fragments, i.e., the precursor ion, thus limiting the search to only those peptides that have exact masses within a tight, accurate mass window.
  • the charge state of each observed ion in the series needs to be determined and used to calculate the intact protein or peptide's mass, for example, as disclosed in United States Patent Nos. 5,300,771 and 6,118,120.
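The charge-state determination mentioned above can be illustrated with the textbook relation for two adjacent peaks in a protonated charge series. This is not the method of the cited patents, only a minimal standard-library sketch; the protein mass and the assumption that charges arise purely from added protons are hypothetical.

```python
PROTON = 1.007276   # mass of a proton in Da

def charge_from_adjacent(mz_high, mz_low):
    """Charge of the higher-m/z peak, given the next peak carrying one more proton."""
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral mass from an observed m/z and its charge state."""
    return z * (mz - PROTON)

# Hypothetical charge series of one protein at charge states 10, 11 and 12.
true_mass = 16951.5
series = [true_mass / z + PROTON for z in (10, 11, 12)]

z = charge_from_adjacent(series[0], series[1])
mass = neutral_mass(series[0], z)
```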
  • the calculated protein or peptide mass can then be used to generate a list of possible proteins or peptides from the database.
  • Each of these possible proteins/peptides associated with a given charge can now be selected as a candidate ion whereas the same protein/peptide with a different charge will be selected as the next candidate ion.
  • centroid data with all available isotopes would be preferred to maintain the data integrity.
  • Relative intensity data are also available for various fragments of a compound, and have been used in the currently available searching algorithms. In order for this search to work properly, however, the MS instrument must be painstakingly tuned to ensure peak ratios within 20-30% error bounds of expected values, for example, as mandated by the QC procedure of United States Environmental Protection Agency (USEPA) Method 525.2 with the use of decafluorotriphenylphosphine (DFTPP) as the tuning compound.
  • USEPA United States Environmental Protection Agency
  • DFTPP decafluorotriphenylphosphine
  • fragment abundance ratios change as much from one compound to another as with instrument conditions and therefore are very difficult to maintain even on a well tuned instrument. It is appreciated that, in this invention, the relative intensities of fragments in a library are no longer relevant, and the actual measured fragment intensity will be used to assess the probability of a given fragment before the establishment of an overall probability based on all of the fragment
  • the experimentally measured fragmentation data come from a high mass accuracy instrument such as qTOF or even FTMS, or a unit mass resolution system with comprehensive mass spectral calibration as outlined in United States Patent No. 6,983,213 and International Patent Application PCT/US2004/034618 filed on October 20, 2004, an elemental composition determination can be carried out for each observed fragment, using either commercially available formula search algorithms or preferably the approach outlined in international patent application PCT/US2005/039186, filed on October 28, 2005.
  • The exact mass locations for the candidate ion/fragment are calculated based on its elemental composition if available. This includes theoretically calculated isotope distributions, which are taken into account in the manner described in United States Patent No. 6,983,213 and International Patent Application PCT/US2004/034618 filed on October 20, 2004.
  • the isotope distribution is convoluted with the peak shape function calculated or specified as the target peak shape function in the full mass spectral calibration, all given in step 220, to obtain a calculated isotope pattern (mass spectral continuum) for the candidate ion/fragment.
  • convolution may refer to matrix operations, or point by point operation in Fourier transform space, or any other type of convolution, filtering, or correlation, either of a traditional type, or not.
  • steps 240A and 250A take an isotope pattern in profile or continuum mode as measured from an instrument or from a library and convert the isotope pattern to have desired peak shape function consistent with what is calculated (actual peak shape function) or transformed into (target peak shape function) in step 220.
  • This is achieved by either a separate full mass spectral calibration, just as the one in step 220, performed on this isotope pattern, or through a convolution of the isotope pattern measured on a higher resolution system here with the peak shape function from step 220.
  • When the isotope pattern is measured with high resolution, the original peak shape function observed in it becomes insignificant compared to the peak shape function in step 220.
  • A matrix K is generated to include known and sometimes mass-dependent baseline functions and the isotope pattern for the candidate ion/fragment.
  • Examples of possible baseline functions include a flat line and several lower order terms such as linear or quadratic terms. The combination of these lower order terms can adequately compensate for an exponentially decaying baseline within a small mass spectral range, and help arrive at the computationally efficient linear solutions in step 270, though one may choose to incorporate the nonlinear terms explicitly and seek a nonlinear solution instead.
  • Matrix K may optionally contain any other components interfering with the candidate fragment's isotope pattern, such as the isotope patterns of co-existing ions or fragments, including isotope-labeled versions of the fragments.
  • the first derivatives of the components in K or the first derivative of the sample measurement r may also be included.
  • A classical least squares regression (or weighted least squares regression with all weights equal to one) is performed to fit the components of the matrix generated in step 260 to the acquired and/or calibrated mass spectral data of step 220, in the form given in Equations 1-2.
  • The regression coefficients are reported as the relative concentrations of the components included in matrix K, along with probability measures in the form of either a t-statistic or a p-value, as given in Equations 3-4.
  • A statistical test based on the t-statistic, p-value, or other measures such as the F-statistic is performed to determine whether any or all of the components included in the matrix K are significant.
  • the baseline may be treated in the analysis as if it is another compound found in the sample (in the data produced in step 220). If any component is insignificant, then branching to step 290A occurs, and this component is removed from the matrix K before the next iteration back to step 270 (and continuing on to steps 280 and 290).
  • This process of first estimating the contribution of the possible components as part of an overall fit, followed by the removal of insignificant baseline components serves the purpose of unbiased correction of components including baselines without unnecessarily introducing extra components into matrix K and Equation 1.
  • In step 290B, a statistical test on the residual e (Equation 1) is performed to check whether other components are missing from the matrix K, resulting in larger-than-expected residuals; in that case, more components may be added in step 290C before returning to step 270 for another iteration.
  • These components may be an isotope labeled version of the fragment involved, or may be a fragment from an interfering precursor ion not separated in time and mass during the survey scan of an LC/MS/MS experiment.
  • When all components are deemed significant and the residuals are statistically insignificant, the process passes through step 290D and returns to step 230 for the analysis of the next ion/fragment.
  • the individual probability measures pertaining to each ion/fragment can be combined to form an overall probability measure for the ion that generates these fragments observed in r, in step 290E. Equation 5 above shows an example of how to progress from individual p-values to an overall p-value.
  • the results reported in step 300 can be used for unknown compound identification including de novo sequencing, where the amino acid sequence for a previously unknown peptide or protein can be determined.
  • the possible peptides or proteins selected from the library can be sorted in a search report based on their overall probability measures as scores.
  • Some combinations of various steps can be conceived by those skilled in the art, such as always performing an analysis as if there is no known protein or peptide library available, i.e., de novo sequencing, to determine the amino acid sequence before searching in an available library, in which case a simple and very fast text string search can be performed on the sequence through the use of known computational techniques such as BLAST.
  • the various isotope satellites of the same ion may be spectrally well separated from each other, may be interleaved with the isotope satellites of other ions in between, may have intensities located at a different nonlinear part of the ion detector response curve, may have different mass shifts due to space charges, and may have different baselines, etc.
  • One satellite isotope is the monoisotope, which may be advantageously used due to its simplicity, to propose initial possible elemental compositions for later confirmation and ranking, based on all observable satellite isotopes.
  • Each additional candidate ion (here, a satellite isotope) would have its elemental composition derived from another candidate ion (also a satellite isotope) by switching one or more isotopes in its elemental composition.
  • a calibrated library produced in accordance with the invention is a very valuable commodity that can be sold separately, because it has high intrinsic value to different users of different mass spectrometer systems that are standardized with respect to the same peak shape functions.
  • The terms mass and mass to charge ratio are used somewhat interchangeably in connection with information or output as defined by the mass to charge ratio axis of a mass spectrometer. This is a common practice in the scientific literature and in scientific discussions, and no ambiguity will occur, when the terms are read in context, by one skilled in the art.
  • the methods of analysis of the present invention can be realized in hardware, software, or a combination of hardware and software. Any kind of computer system - or other apparatus adapted for carrying out the methods and/or functions described herein - is suitable.
  • A typical combination of hardware and software could be a general purpose computer system with a computer program that, when loaded and executed, controls the computer system, which in turn controls an analysis system, such that the system carries out the methods described herein.
  • The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system (which in turn controls an analysis system), is able to carry out these methods.
  • Computer program means or computer program in the present context include any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation, and/or reproduction in a different material form.
  • the invention includes an article of manufacture, which comprises a computer usable medium having computer readable program code means embodied therein for causing a function described above.
  • the computer readable program code means in the article of manufacture comprises computer readable program code means for causing a computer to effect the steps of a method of this invention.
  • the present invention may be implemented as a computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing a function described above.
  • The computer readable program code means in the computer program product comprises computer readable program code means for causing a computer to effect one or more functions of this invention.
  • the present invention may be implemented as a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for causing one or more functions of this invention.
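The fitting loop described above (convolving an isotope distribution with a peak shape function, assembling matrix K, regressing, and removing insignificant components) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the Gaussian peak shape, the synthetic stick positions and abundances, the column orientation r = K c + e (Equations 1-4 are not reproduced in this section), the fixed |t| threshold of 3, and all function names.

```python
import numpy as np

def gaussian_peak(half_width, sigma):
    """Hypothetical target peak shape: a unit-area Gaussian on the sample grid."""
    x = np.arange(-half_width, half_width + 1)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def isotope_pattern(n_points, sticks, peak):
    """Convolve a stick isotope distribution {grid index: abundance} with the peak shape."""
    s = np.zeros(n_points)
    for idx, abundance in sticks.items():
        s[idx] = abundance
    return np.convolve(s, peak, mode="same")

def fit_components(r, K):
    """Classical least squares for r = K c + e, with one t-statistic per coefficient."""
    c, *_ = np.linalg.lstsq(K, r, rcond=None)
    e = r - K @ c
    dof = K.shape[0] - K.shape[1]
    s2 = (e @ e) / dof                       # residual variance estimate
    cov = s2 * np.linalg.inv(K.T @ K)        # covariance of the coefficients
    t = c / np.sqrt(np.diag(cov))
    return c, t, e

def prune_and_fit(r, K, names, t_crit=3.0):
    """Refit repeatedly, dropping the least significant column until all |t| >= t_crit."""
    names = list(names)
    while True:
        c, t, e = fit_components(r, K)
        worst = int(np.argmin(np.abs(t)))
        if abs(t[worst]) >= t_crit or K.shape[1] == 1:
            return dict(zip(names, c)), dict(zip(names, t)), e
        K = np.delete(K, worst, axis=1)
        del names[worst]

# Synthetic example: one fragment that is present, one candidate that is not,
# plus flat and linear baseline terms.
rng = np.random.default_rng(0)
n = 200
peak = gaussian_peak(15, 4.0)
frag = isotope_pattern(n, {60: 1.0, 80: 0.55, 100: 0.18}, peak)
other = isotope_pattern(n, {130: 1.0, 150: 0.4}, peak)
x = np.linspace(-1.0, 1.0, n)
K = np.column_stack([frag, other, np.ones(n), x])
r = 5.0 * frag + 0.2 + rng.normal(0.0, 0.01, n)   # measured spectrum (simulated)

conc, tstat, resid = prune_and_fit(r, K, ["frag", "other", "flat", "slope"])
print(conc)
print(tstat)
```

The threshold test here stands in for the statistical test on each component; a p-value from a t-distribution could be used instead. A check on the residual for missing components, as in step 290B, is not included in this sketch.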

Landscapes

  • Chemical & Material Sciences
  • Analytical Chemistry
  • Physics & Mathematics
  • Spectroscopy & Molecular Physics
  • Crystallography & Structural Chemistry
  • Life Sciences & Earth Sciences
  • Engineering & Computer Science
  • Bioinformatics & Cheminformatics
  • Bioinformatics & Computational Biology
  • Computing Systems
  • Theoretical Computer Science
  • Other Investigation Or Analysis Of Materials By Electrical Means
  • Electron Tubes For Measurement
PCT/US2007/069832 2006-05-26 2007-05-28 Analyzing mass spectral data Ceased WO2007140355A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2009512334A JP5393449B2 (ja) 2006-05-26 2007-05-28 質量スペクトルデータの解析
EP07811952A EP2032238A4 (en) 2006-05-26 2007-05-28 ANALYSIS OF MASS SPECTRAL DATA
CA002653400A CA2653400A1 (en) 2006-05-26 2007-05-28 Analyzing mass spectral data

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US80913506P 2006-05-26 2006-05-26
US60/809,135 2006-05-26
US11/754,305 US7781729B2 (en) 2006-05-26 2007-05-27 Analyzing mass spectral data
US11/754,305 2007-05-27

Publications (2)

Publication Number Publication Date
WO2007140355A2 (en) 2007-12-06
WO2007140355A3 (en) 2008-09-04

Family

ID=38779399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/069832 Ceased WO2007140355A2 (en) 2006-05-26 2007-05-28 Analyzing mass spectral data

Country Status (5)

Country Link
US (1) US7781729B2
EP (1) EP2032238A4
JP (1) JP5393449B2
CA (1) CA2653400A1
WO (1) WO2007140355A2

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CZ302779B6 (cs) * 2010-05-17 2011-11-02 Univerzita Palackého v Olomouci Mössbaueruv spektrometr
EP2447980A1 (en) * 2010-11-02 2012-05-02 Thermo Fisher Scientific (Bremen) GmbH Method of generating a mass spectrum having improved resolving power
GB2485257B (en) * 2010-11-03 2018-03-07 Agilent Technologies Inc System and method for curating mass spectral libraries

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7501621B2 (en) * 2006-07-12 2009-03-10 Leco Corporation Data acquisition system for a spectrometer using an adaptive threshold
WO2008059567A1 (en) * 2006-11-15 2008-05-22 Shimadzu Corporation Mass spectrometry method and mass spectrometry apparatus
US7977626B2 (en) * 2007-06-01 2011-07-12 Agilent Technologies, Inc. Time of flight mass spectrometry method and apparatus
US8803080B2 (en) * 2007-06-02 2014-08-12 Cerno Bioscience Llc Self calibration approach for mass spectrometry
US9128054B2 (en) 2008-05-09 2015-09-08 Nuctech Company Limited Detection method for an ion migration spectrum and an ion migration spectrometer using the same method
CN101576531A (zh) * 2008-05-09 2009-11-11 同方威视技术股份有限公司 离子迁移谱检测方法及使用该方法的离子迁移谱仪
WO2010129187A1 (en) * 2009-05-08 2010-11-11 Thermo Finnigan Llc Methods and systems for matching product ions to precursor ions
US10497466B2 (en) * 2010-05-14 2019-12-03 Dh Technologies Development Pte. Ltd. Systems and methods for calculating protein confidence values
US8805074B2 (en) * 2010-09-27 2014-08-12 Sharp Laboratories Of America, Inc. Methods and systems for automatic extraction and retrieval of auxiliary document content
US8935101B2 (en) 2010-12-16 2015-01-13 Thermo Finnigan Llc Method and apparatus for correlating precursor and product ions in all-ions fragmentation experiments
JP5664667B2 (ja) * 2011-01-11 2015-02-04 株式会社島津製作所 質量分析データ解析方法、質量分析データ解析装置、及び質量分析データ解析用プログラム
DE112012001185B4 (de) * 2011-03-11 2014-08-28 Leco Corporation Systeme und Verfahren zur Datenverarbeitung in Chromatographiesystemen
GB2495899B (en) * 2011-07-04 2018-05-16 Thermo Fisher Scient Bremen Gmbh Identification of samples using a multi pass or multi reflection time of flight mass spectrometer
US9329122B2 (en) * 2011-08-15 2016-05-03 Schlumberger Technology Corporation Diffuse reflectance infrared fourier transform spectroscopy for characterization of earth materials
US8723108B1 (en) 2012-10-19 2014-05-13 Agilent Technologies, Inc. Transient level data acquisition and peak correction for time-of-flight mass spectrometry
WO2014107573A1 (en) * 2013-01-04 2014-07-10 The Regents Of The University Of California Method for the determination of biomolecule turnover rates
CN107077592B (zh) * 2014-03-28 2021-02-19 威斯康星校友研究基金会 高分辨率气相色谱-质谱数据与单位分辨率参考数据库的改进谱图匹配的高质量精确度滤波
EP3268978A1 (en) * 2015-03-12 2018-01-17 Thermo Finnigan LLC Methods for data-dependent mass spectrometry of mixed biomolecular analytes
DE112017001151B4 (de) * 2016-03-04 2025-12-24 Leco Corporation Benutzerdefiniertes skaliertes Massendefektdiagramm mit Filterung und Kennzeichnung
US10615015B2 (en) 2017-02-23 2020-04-07 Thermo Fisher Scientific (Bremen) Gmbh Method for identification of the elemental composition of species of molecules
JP6994921B2 (ja) * 2017-12-05 2022-01-14 日本電子株式会社 質量分析データ処理装置および質量分析データ処理方法
CA3091796A1 (en) * 2018-02-19 2019-08-22 Cerno Bioscience Llc Reliable and automatic mass spectral analysis
WO2019232520A1 (en) * 2018-06-01 2019-12-05 Cerno Bioscience Llc Mass spectral analysis of large molecules
GB2585258B (en) 2019-01-30 2022-10-19 Bruker Daltonics Gmbh & Co Kg Mass spectrometric method for determining the presence or absence of a chemical element in an analyte
GB201907792D0 (en) * 2019-05-31 2019-07-17 Thermo Fisher Scient Bremen Gmbh Deconvolution of mass spectromerty data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040195500A1 (en) 2003-04-02 2004-10-07 Sachs Jeffrey R. Mass spectrometry data analysis techniques

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5130538A (en) * 1989-05-19 1992-07-14 John B. Fenn Method of producing multiply charged ions and for determining molecular weights of molecules by use of the multiply charged ions of molecules
US5300771A (en) * 1992-06-02 1994-04-05 Analytica Of Branford Method for determining the molecular weights of polyatomic molecules by mass analysis of their multiply charged ions
US5538897A (en) * 1994-03-14 1996-07-23 University Of Washington Use of mass spectrometry fragmentation patterns of peptides to identify amino acid sequences in databases
ATE554458T1 (de) * 2003-04-28 2012-05-15 Cerno Bioscience Llc Rechnerisches verfahren und system für die massenspektralanalyse
US6983213B2 (en) * 2003-10-20 2006-01-03 Cerno Bioscience Llc Methods for operating mass spectrometry (MS) instrument systems
US7199363B2 (en) * 2003-10-14 2007-04-03 Micromass Uk Limited Mass spectrometer
CA2585453C (en) * 2004-10-28 2020-02-18 Cerno Bioscience Llc Qualitative and quantitative mass spectral analysis
US7297940B2 (en) * 2005-05-03 2007-11-20 Palo Alto Research Center Incorporated Method, apparatus, and program product for classifying ionized molecular fragments
US7904253B2 (en) * 2006-07-29 2011-03-08 Cerno Bioscience Llc Determination of chemical composition and isotope distribution with mass spectrometry
US8803080B2 (en) * 2007-06-02 2014-08-12 Cerno Bioscience Llc Self calibration approach for mass spectrometry

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S. E. STEIN, J. AM. SOC. MASS SPECTROM., vol. 10, 1999, pages 770
See also references of EP2032238A4

Also Published As

Publication number Publication date
EP2032238A2 (en) 2009-03-11
WO2007140355A3 (en) 2008-09-04
JP5393449B2 (ja) 2014-01-22
US20080001079A1 (en) 2008-01-03
CA2653400A1 (en) 2007-12-06
EP2032238A4 (en) 2012-08-15
US7781729B2 (en) 2010-08-24
JP2009539068A (ja) 2009-11-12

Similar Documents

Publication Publication Date Title
US7781729B2 (en) Analyzing mass spectral data
US12033839B2 (en) Data independent acquisition of product ion spectra and reference spectra library matching
EP1623351B1 (en) Computational method and system for mass spectral analysis
JP4818270B2 (ja) 選択されたイオンクロマトグラムを使用して先駆物質および断片イオンをグループ化するシステムおよび方法
US7904253B2 (en) Determination of chemical composition and isotope distribution with mass spectrometry
US7451052B2 (en) Application of comprehensive calibration to mass spectral peak analysis and molecular screening
US7197402B2 (en) Determination of molecular structures using tandem mass spectrometry
US20090210167A1 (en) Computational methods and systems for multidimensional analysis
EP1623352B1 (en) Computational methods and systems for multidimensional analysis
US20260038637A1 (en) System and method for optimizing analysis of dia data by combining spectrum-centric with peptide-centric analysis
Li et al. Informatics for mass spectrometry-based protein characterization
JP2008170346A (ja) 質量分析システム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 07811952; Country of ref document: EP; Kind code of ref document: A2
WWE Wipo information: entry into national phase. Ref document number: 2653400; Country of ref document: CA. Ref document number: 2009512334; Country of ref document: JP
NENP Non-entry into the national phase. Ref country code: DE
REEP Request for entry into the european phase. Ref document number: 2007811952; Country of ref document: EP
WWE Wipo information: entry into national phase. Ref document number: 2007811952; Country of ref document: EP