WO2004111609A2 - Methods for accurate extraction of component intensity from separations-mass spectrometry data - Google Patents

Methods for accurate extraction of component intensity from separations-mass spectrometry data

Info

Publication number
WO2004111609A2
Authority
WO
WIPO (PCT)
Prior art keywords
mass
lineshape
spectrum
modeled
distribution
Application number
PCT/US2004/017908
Other languages
English (en)
Other versions
WO2004111609A3 (fr)
Inventor
Zulfikar Ahmed
Hans Bitter
Michael Brown
Jonathan C. Heller
David Donoho
Arjuna Balasingham
James Quarato
Perry De Valpine
Original Assignee
Predicant Biosciences, Inc.
Priority claimed from US10/462,228 (US7072772B2)
Priority claimed from US10/846,996 (US20050255606A1)
Application filed by Predicant Biosciences, Inc.
Publication of WO2004111609A2
Publication of WO2004111609A3


Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00: Sampling; Preparing specimens for investigation
    • H: ELECTRICITY
    • H01: ELECTRIC ELEMENTS
    • H01J: ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00: Particle spectrometers or separator tubes
    • H01J49/0027: Methods for using particle spectrometers
    • H01J49/0036: Step by step routines describing the handling of the data generated during a measurement
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N31/00: Investigating or analysing non-biological materials by the use of the chemical methods specified in the subgroup; Apparatus specially adapted for such methods

Definitions

  • Mass spectrometry has become increasingly important in the field of proteomics. Mass spectrometry can be used, for example, for protein sequencing, sample analysis, functional group identification, and phenotyping. Various mass spectrometers are commercially available. Most mass spectrometers are built around four key components: a sample inlet, an ionization source, a mass analyzer, and an ion detector.
  • Mass spectrometers may combine these four components in different ways, but all of them work by introducing a sample of molecules into the instrument, ionizing the molecules to convert them into ions, propelling the ions into the analyzer where they are separated, and detecting the ions according to their mass-to-charge ratio (m/z).
  • There are many forms of ionization. Commonly used forms include, but are not limited to, electrospray ionization (ESI), nanoelectrospray ionization (nanoESI), atmospheric pressure chemical ionization (APCI), matrix-assisted laser desorption/ionization (MALDI), desorption/ionization on silicon (DIOS), fast atom/ion bombardment (FAB), electron ionization (EI), and chemical ionization (CI).
  • In some embodiments, the mass spectrometer is an ESI or a MALDI mass spectrometer.
  • ESI generates a fine spray of charged droplets in the presence of an electric field by converting a liquid solution to a gas.
  • ESI can produce singly charged small molecules (e.g., a small peptide) as well as multiply charged larger molecules (e.g., a protein).
  • Nanoelectrospray (nanospray) has also been used with mass spectrometers. Nanoelectrospray can involve a spray needle with a flow rate of approximately 1-100, or more preferably 1-10, nanoliters per minute.
  • An electrospray ionization time-of-flight mass spectrum presents a number of difficulties that must be overcome before a neutral mass spectrum can be obtained.
  • Just as there are many forms of ionization sources, there are also many types of mass analyzers. Commonly used mass analyzers include, but are not limited to, quadrupole, quadrupole ion trap, time-of-flight (TOF), time-of-flight reflectron (TOFR), quadrupole time-of-flight (Quad-TOF), magnetic sector, and Fourier transform ion cyclotron resonance (FTMS or FT-ICR) analyzers. While different mass analyzers operate in different ways (e.g., some separate ions in space, others in time), all mass analyzers measure the relative intensity of gas-phase ions according to their m/z ratios.
  • A quadrupole mass analyzer uses four parallel rods, two positively charged and two negatively charged, with similarly charged rods placed opposite each other. Ions generated by the ionization source are forced between the four rods, on which a radio-frequency potential is superimposed.
  • A quadrupole ion trap mass analyzer is similar to a quadrupole mass analyzer; however, instead of passing through the quadrupole with a superimposed radio frequency, the ions are trapped in a radio-frequency quadrupole field.
  • Quadrupole ion traps commonly employ an ESI or MALDI ionization source.
  • A TOF mass analyzer measures the time it takes ions to reach a detector. Ions in a TOF mass analyzer are given the same amount of energy through an accelerating potential, so lighter ions reach the detector faster than heavier ions of equal charge state.
  • A modification of the TOF analyzer is the TOF reflectron analyzer.
  • The TOF reflectron analyzer adds an electrostatic mirror that increases the time ions need to reach the detector while compensating for their kinetic energy distribution, which narrows their temporal distribution. Since mass resolution is defined as the m/z of a peak divided by Δm, where Δm is the full width at half height (equivalently t/(2Δt), since m is related to t quadratically), increasing t and decreasing Δt results in higher resolution.
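The two forms of the resolution definition can be checked numerically. The sketch below is plain Python; the function names, the calibration constant `k` and the example peak are hypothetical. It assumes the idealized calibration m = k·t², converts a flight-time width into an m/z width, and shows that m/Δm and t/(2Δt) agree, since Δm/m ≈ 2Δt/t.

```python
def resolution_from_mz(mz: float, fwhm_mz: float) -> float:
    """Mass resolution R = m / dm, with dm the full width at half height."""
    return mz / fwhm_mz


def resolution_from_time(t: float, fwhm_t: float) -> float:
    """Time-domain form R = t / (2 dt); valid because m is proportional to t**2."""
    return t / (2.0 * fwhm_t)


def mz_from_time(t: float, k: float) -> float:
    """Idealized calibration m = k * t**2 (k is a hypothetical instrument constant)."""
    return k * t ** 2


k = 1.0e12              # hypothetical constant, m/z units per s^2
t, dt = 40e-6, 10e-9    # hypothetical peak: 40 us flight time, 10 ns FWHM

# Map the time-domain width onto the m/z axis and compare both definitions.
dm = mz_from_time(t + dt / 2, k) - mz_from_time(t - dt / 2, k)
print(resolution_from_mz(mz_from_time(t, k), dm))
print(resolution_from_time(t, dt))
# Doubling t at the same dt, which a longer reflectron path aims for, doubles R.
print(resolution_from_time(2 * t, dt))
```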
  • TOF and TOF reflectron mass analyzers function well with ESI, MALDI, and other ionization sources.
  • FTMS is based on the concept of monitoring a charged particle as it orbits in a magnetic field. While the ion is orbiting, a pulsed radio frequency (RF) signal is used to excite the ions and produce a detectable current. The image current generated by all of the ions is then Fourier-transformed to obtain the component frequencies of the different ions.
  • Mass spectrometry can be applied to the search for significant signatures that characterize and diagnose diseases. These signatures can be useful for the clinical management of disease and/or the drug development process for novel therapeutics.
  • A mass spectrometer can histogram a number of particles by mass.
  • Time-of-flight mass spectrometers, which can include an ionization source, a mass analyzer, and a detector, can histogram ion gases by mass-to-charge ratio.
  • Time-of-flight instruments typically pass the ion gas through a uniform electric field over a fixed distance. Regardless of mass, all ions of the same charge pick up the same kinetic energy. The ions then drift through an electric-field-free region of fixed length.
  • A histogram can then be prepared of the flight times of the particles through the field-free region, which are determined by mass-to-charge ratio.
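As a rough illustration of this single-stage picture (every ion of charge q gains energy qV, then drifts a distance D at constant speed), the flight time is t = D·sqrt(m/(2qV)), so it depends only on m/z. The sketch assumes NumPy, a hypothetical 20 kV, 1 m instrument and a made-up three-species ion population; the Gaussian timing jitter is an assumed stand-in for instrumental broadening.

```python
import numpy as np

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def drift_time(mz, accel_voltage, drift_length):
    """Flight time through the field-free region, t = D * sqrt(m / (2 q V)).

    `mz` is in daltons per elementary charge, so m/q = mz * DALTON / E_CHARGE.
    """
    m_over_q = np.asarray(mz, dtype=float) * DALTON / E_CHARGE
    return drift_length * np.sqrt(m_over_q / (2.0 * accel_voltage))


rng = np.random.default_rng(0)
# Hypothetical population: ion counts proportional to each species' abundance.
mz_values = np.repeat([1000.0, 1500.0, 2000.0], [500, 300, 200])
times = drift_time(mz_values, accel_voltage=20e3, drift_length=1.0)
times = times + rng.normal(0.0, 2e-9, times.size)   # assumed timing jitter, s

# Histogram of arrival times: the raw "spectrum" on the time axis.
counts, edges = np.histogram(times, bins=4000)
print(drift_time([1000.0, 4000.0], 20e3, 1.0))  # quadrupling m/z doubles t
```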
  • Typical mass spectrometers can measure approximately 5% of the ionized protein molecules in a sample.
  • Raw data analysis can treat each data point as an independent entity. However, the intensity at a data point may be due to overlapping peaks from several molecular species. Adjacent data points can have correlated intensities, rather than independent intensities. Ad hoc peak picking involves identifying peaks in a spectrum of raw data and collapsing each peak into a single data point.
  • The mass spectra of sera or other complex mixtures can be more problematic.
  • A complex mixture can contain many species within a small mass-to-charge window. The intensity value at any given data point may have contributions from a number of overlapping peaks from different species. Overlapping peaks can cause difficulties with accurate mass measurements, and can hide differences in mass spectra from one sample to the next.
  • Accurate modeling of the lineshapes (the shapes of the peaks) can improve the reliability and accuracy of the analysis of mass spectra of complex biological mixtures. Lineshape models (models of the peaks) can also be called modeled mass-to-charge distributions.
  • Mass spectral signal processing can address the resolution problem inherent in mass spectra of complex mixtures. Pattern discovery can be enhanced by signal processing techniques that remove noise, remove irrelevant information, and/or reduce variance. In one application, these methods can discover preliminary biostate profiles from proteomics or other studies. Therefore, it is desirable to reduce the noise and/or dimensionality of datasets, improve the sensitivity of mass spectrometry, and/or process the raw data generated by mass spectrometry to improve tasks such as pattern recognition.
  • In some embodiments, molecules can be represented with a modeled mass-to-charge distribution detected by a mass spectrometer.
  • The modeled mass-to-charge distribution can be based on a modeled initial distribution representing the molecules prior to traveling in the mass spectrometer.
  • The modeled initial distribution can represent the molecules as having multiple positions and/or multiple energies and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts.
  • The modeled mass-to-charge distribution of the molecules and an empirical mass-to-charge distribution of the molecules can be compared.
  • In some embodiments, molecules can be represented by an analytic expression of a modeled mass-to-charge distribution detected by a mass spectrometer.
  • The modeled mass-to-charge distribution can be based on a modeled initial distribution representing the molecules prior to traveling in the mass spectrometer.
  • The modeled initial distribution can represent the molecules as having multiple positions and/or multiple energies and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts.
  • The present invention contemplates methods for processing mass spectra data comprising performing a deconvolution of a one-dimensional (1D) spectrum to increase the mass resolution of the raw data accurately and to reduce or remove the noise in the spectrum.
  • Deconvolution of mass spectra output is preferably made using maximum entropy estimation or basis pursuit (BP).
  • The axis of the original 1D spectrum, e.g. the TOF axis, may be transformed prior to deconvolution and re-transformed subsequent to deconvolution. A collection of 1D spectra, such as those acquired at successive separation times, can form a two-dimensional (2D) data set.
  • Clustering analysis is performed on the two-dimensional data set subsequent to deconvolution of the 1D mass spectra output.
  • The role of clustering is to accurately represent the different peaks observed across time in a 2D separations-mass spectrum and to obtain an accurate count of these peaks.
  • Deconvolved, clustered peak lists are further processed to group the isotopes and charge states observed for distinct molecular ion species.
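One simple way to group isotopes, sketched below under stated assumptions, is to chain peaks whose m/z spacing matches the 13C-12C mass difference divided by the charge, then convert the lowest-mass member to a neutral monoisotopic mass assuming protonated ions [M + zH]^z+. The function names, the greedy chaining, and the 0.01 m/z tolerance are illustrative choices rather than the patent's procedure, and the sketch assumes the charge is already known.

```python
import numpy as np

ISOTOPE_SPACING = 1.003355  # Da, mass difference between 13C and 12C
PROTON_MASS = 1.007276      # Da


def group_isotopes(mz, intensity, charge, tol=0.01):
    """Greedily chain peaks spaced by ~ISOTOPE_SPACING / charge into isotope clusters.

    Returns a list of (mz_array, intensity_array) pairs, one per cluster.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if mz.size == 0:
        return []
    order = np.argsort(mz)
    mz, intensity = mz[order], intensity[order]
    step = ISOTOPE_SPACING / charge
    clusters, current = [], [0]
    for i in range(1, mz.size):
        if abs(mz[i] - mz[current[-1]] - step) <= tol:
            current.append(i)       # next isotope of the same cluster
        else:
            clusters.append(current)
            current = [i]           # start a new cluster
    clusters.append(current)
    return [(mz[c], intensity[c]) for c in clusters]


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]^z+ ion observed at m/z."""
    return charge * (mz - PROTON_MASS)


z = 2
mono = 2000.0 / z + PROTON_MASS   # hypothetical species with M = 2000 Da
peaks_mz = [mono, mono + ISOTOPE_SPACING / z, mono + 2 * ISOTOPE_SPACING / z, 1010.0]
peaks_int = [100.0, 80.0, 40.0, 55.0]
clusters = group_isotopes(peaks_mz, peaks_int, charge=z)
print(len(clusters), neutral_mass(clusters[0][0][0], z))
```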
  • The results of deconvolution of mass spectra output accurately represent the molecular ion species detected from the sample.
  • Preferably, at least 50% of the resulting peaks represent molecular ions detected in the sample, more preferably at least 70%, more preferably at least 80%, more preferably at least 90%, more preferably at least 95%, or more preferably at least 99%.
  • Preferably, at least 50% of the molecular ions detected from the sample are represented by resulting peaks, more preferably at least 70%, more preferably at least 80%, more preferably at least 90%, more preferably at least 95%, or more preferably at least 99%.
  • The results of deconvolution, clustering, and grouping isotopes and charge states accurately represent the neutral mass molecular species detected from the sample.
  • Preferably, at least 50% of the resulting peaks represent molecular species detected in the sample, more preferably at least 70%, more preferably at least 80%, more preferably at least 90%, more preferably at least 95%, or more preferably at least 99%.
  • Preferably, at least 50% of the molecular ions detected from the sample are represented by resulting peaks, more preferably at least 70%, more preferably at least 80%, more preferably at least 90%, more preferably at least 95%, or more preferably at least 99%.
  • Figure 1 is a flowchart illustrating one embodiment of performing signal processing on a mass spectrum.
  • Figure 2 is a flowchart illustrating aspects of some embodiments of performing signal processing on a mass spectrum.
  • Figure 3 is a simple schematic of a time-of-flight mass spectrometer.
  • Figure 4 is a simple schematic of a time-of-flight mass spectrometer with a reflectron.
  • Figure 5 illustrates a probability density function of a pushed forward Gaussian, showing a skew to the right.
  • Figure 6 shows a change of coordinates from (x, z) to (v, θ).
  • Figure 7 shows a mass spectrum.
  • Figure 8 shows an expanded view of Figure 7.
  • Figure 9A represents a "cluster" or a group of isotopes (e.g., a, b, and c) of the same charge state of a species.
  • Figure 9B represents different charge states (e.g., A-E) of a species.
  • A single charge state can consist of a single isotope, as illustrated in Figure 9B, or of multiple isotopes, as illustrated in Figure 9A.
  • Figure 10 illustrates a flow diagram of a high-throughput online system disclosed herein.
  • Figure 11 illustrates the process of scaling, deconvolving, and descaling.
  • Figure 11A is a 1D spectrum before scaling (raw data).
  • Figure 11B is a 1D spectrum after scaling but before deconvolving.
  • Figure 11C is a 1D spectrum after deconvolving but before descaling.
  • Figure 11D is a 1D spectrum after deconvolving and descaling.
  • Figure 12 illustrates the process of compiling multiple 1D spectra into a 2D spectrum.
  • Figure 12A illustrates the compilation of multiple 1D spectra such that similarly situated peaks are aligned vertically.
  • Figure 12B illustrates a compiled 2D spectrum of more than 20 individual 1D spectra.
  • Figure 13 illustrates a list of data output that may be generated by the methods herein.
  • Figure 13, Column 1 lists centroid mass values;
  • Figure 13, Column 2 lists the centroids in separation time value;
  • Figure 13, Column 3 lists the total intensity for those deconvolved, collapsed peaks.
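The three columns of Figure 13 can be illustrated with a minimal sketch: stack 1D spectra (one per separation time point) into a 2D array, as in Figure 12, then collapse a peak region into its intensity-weighted m/z centroid, separation-time centroid, and summed intensity. This assumes NumPy, a region that already contains a single deconvolved peak, and synthetic data; `collapse_peak` is a hypothetical helper.

```python
import numpy as np


def collapse_peak(intensity, mz_axis, time_axis):
    """Summarise a 2D peak region (rows = time, columns = m/z) as
    (centroid m/z, centroid separation time, total intensity)."""
    total = intensity.sum()
    mz_centroid = (intensity.sum(axis=0) * mz_axis).sum() / total
    time_centroid = (intensity.sum(axis=1) * time_axis).sum() / total
    return mz_centroid, time_centroid, total


time_axis = np.linspace(10.0, 14.0, 41)     # separation time, hypothetical units
mz_axis = np.linspace(999.0, 1002.0, 301)

# One synthetic 1D spectrum per time point, stacked row-wise into a 2D data set.
spectra = [
    np.exp(-0.5 * ((mz_axis - 1000.5) / 0.1) ** 2) * np.exp(-0.5 * ((t - 12.0) / 0.3) ** 2)
    for t in time_axis
]
data = np.vstack(spectra)
print(collapse_peak(data, mz_axis, time_axis))
```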
  • Figure 14 illustrates an overview of a high-throughput online system disclosed herein.
  • The present invention involves a high-throughput method that allows for the diagnosis and prognosis of various diseases, research for discovery of proteomic markers whose levels in biological samples can statistically distinguish between healthy and disease states as well as between different disease states, and identification of novel compositions that may function as targets or therapeutics in the treatment and management of diseases.
  • The present invention relates to methods to determine accurate estimates of the total intensity (abundance), the average and carbon-12 monoisotopic molecular weight, the mass-to-charge ratio, and the isotopic composition of molecular ion species present in a raw separations-mass spectrum. This allows the information in large multidimensional spectra to be summarized by output data that are several orders of magnitude smaller in size.
  • The present invention involves high-throughput methods that use a measured mass spectrum to estimate the signal that a mass spectrometer with higher resolution and lower noise levels would produce for the same sample, in the limit of an idealized mass spectrometer that gives exact estimates of location and intensity for each charge state and each isotope of every molecular species in the sample.
  • The present invention relates to methods for applying a lineshape model to raw mass spectral data.
  • Dimensionality reduction can be performed on the mass spectrometry data.
  • Signal processing can ensure that processed data contains as little noise and irrelevant information as possible. This increases the likelihood that the biostate profiles discovered by the pattern recognition algorithms are statistically significant and are not obtained purely by chance.
  • Dimensionality reduction techniques can reduce the scope of the problem.
  • An important tool of dimensionality reduction is the analysis of lineshapes, which are the shapes of peaks in a mass spectrum. Lineshapes, instead of individual data points, can be interpreted in a physically meaningful way.
  • The physics of the mass spectrometer can be used to derive mathematical models of mass spectrometry lineshapes. Ions traveling through mass spectrometers have well-defined statistical behavior, which can be modeled with probability distributions that describe lineshapes.
  • The modeled lineshapes can represent the distribution of the time-of-flight for a given mass-to-charge ratio (m/z), given factors such as the initial conditions of the ions and the instrument configuration.
  • Equations are derived for the flight time of an ion given its initial velocity and position.
  • A probability distribution of initial positions and/or velocities and/or other initial parameters that affect the time-of-flight is assumed, based on rigorous statistical mechanical approximation techniques and/or distributions such as gaussians.
  • Formulae are then calculated for the time-of-flight probability distributions that result from the probability-theoretical technique of "pushing forward" the initial position and/or velocity distributions by the time-of-flight equations.
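For a model like the position model described below (ions accelerated over a gaussian-distributed distance S, then drifting), the push-forward can be written in closed form with the change-of-variables rule. Assuming the drift time is T = k/sqrt(S) with k = D·sqrt(m/(2qE_0)), the inverse map is s = k²/t² with |ds/dt| = 2k²/t³, so p_T(t) = p_S(k²/t²)·2k²/t³. The sketch (NumPy; dimensionless k, s_0 and σ chosen only for illustration) evaluates this density and checks that it integrates to one.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    """Normal density N(mean, sd**2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def pushforward_drift_density(t, s0, sigma, k):
    """Density of T = k / sqrt(S) for S ~ N(s0, sigma**2), via change of variables.

    Inverse map s(t) = k**2 / t**2 with Jacobian |ds/dt| = 2 k**2 / t**3.
    """
    t = np.asarray(t, dtype=float)
    s = k ** 2 / t ** 2
    return gaussian_pdf(s, s0, sigma) * 2.0 * k ** 2 / t ** 3


# Dimensionless example: k = 1, s0 = 1, sigma = 0.05 (sigma << s0).
t_grid = np.linspace(0.85, 1.2, 20001)
density = pushforward_drift_density(t_grid, s0=1.0, sigma=0.05, k=1.0)
dt = t_grid[1] - t_grid[0]
print(density.sum() * dt)          # Riemann-sum integral, close to 1
print(t_grid[np.argmax(density)])  # mode close to k / sqrt(s0) = 1
```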
  • Each formula obtained can describe the lineshape for a mass-to-charge species.
  • A complex spectrum can be modeled as a mixture of such lineshapes.
  • Using the modeled lineshapes, real spectrometric raw data of an observed mass spectrum can be deconvolved into a more informative description.
  • The modeled lineshapes can be fitted to spectra, and/or residual error minimization techniques can be used, such as optimization algorithms with L2 and/or L1 penalties. Coefficients can be obtained that describe the components of the deconvolved spectrum. Thus, the data points that describe a given peak can be collapsed into a simpler record that gives, for example, the center of the peak and the total intensity of the peak.
  • A broad peak in a spectrum can be replaced with much less data: several m/z data points, or a single m/z data point, representing the observed component's abundance in the spectrometer, which in turn is correlated with the abundance of the observed component in the original sample.
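The L1-penalized fit mentioned above can be sketched as a nonnegative lasso over a dictionary of shifted lineshapes, solved by projected iterative soft-thresholding (ISTA). Everything here is an assumption for illustration: Gaussian columns stand in for the modeled lineshapes, the grid, width, penalty and iteration count are arbitrary, and ISTA is one generic solver rather than the method used in the patent. The coefficient sums near each peak estimate the component intensities.

```python
import numpy as np


def lineshape_dictionary(axis, centers, width):
    """Columns are unit-height Gaussian lineshapes (a stand-in for a modelled lineshape)."""
    return np.exp(-0.5 * ((axis[:, None] - centers[None, :]) / width) ** 2)


def nonneg_lasso_ista(A, y, lam, n_iter=20000):
    """Minimise 0.5 * ||A x - y||^2 + lam * sum(x) subject to x >= 0 (projected ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        # Gradient step, L1 shrinkage (sum(x) for x >= 0), then clip to x >= 0.
        x = np.maximum(x - step * (grad + lam), 0.0)
    return x


axis = np.linspace(20.0, 60.0, 401)
centers = np.arange(25.0, 55.5, 0.5)
A = lineshape_dictionary(axis, centers, width=1.0)

# Two overlapping peaks (hypothetical amplitudes 1.0 and 0.6) plus noise.
rng = np.random.default_rng(0)
truth = np.zeros(centers.size)
truth[centers == 40.0] = 1.0
truth[centers == 43.0] = 0.6
y = A @ truth + rng.normal(0.0, 0.01, axis.size)

x = nonneg_lasso_ista(A, y, lam=0.05)
print(centers[x > 0.05], x[x > 0.05])
```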
  • Filtering techniques can be performed to de-noise and/or compress data.
  • The processed data, with noise removed and/or reduced dimensionality, can be one or more orders of magnitude smaller than the original raw dataset.
  • The original raw dataset can be decomposed into chemically meaningful elements, despite the artifacts and broadening introduced by the mass spectrometer. This method can be applied to decompose the spectrum even where peaks overlap so much that they are visually indiscernible.
  • The processed data may be roughly physically interpretable and can be much better suited for pattern recognition, owing to significantly less noise, fewer data dimensions, and/or a more meaningful representation of charge states, isotopes of particular proteins, and/or chemical elements that relate to the abundance of different molecular species.
  • When applied to processed data, such pattern recognition methods can identify proteins that may be indicative of disease and/or aid in the diagnosis of disease in people, and can quantify their significance. Finding the proteins and/or making a disease diagnosis can be based at least partly on the modeled mass-to-charge distribution.
  • Figure 1 is a flowchart illustrating one embodiment of performing signal processing on a mass spectrum.
  • A modeled mass-to-charge distribution represents molecules that have traveled through a mass spectrometer.
  • The modeled mass-to-charge distribution is based on at least a modeled initial distribution of any parameter affecting time-of-flight, which represents the molecules prior to traveling in the mass spectrometer.
  • The modeled mass-to-charge distribution is compared with an empirical mass-to-charge distribution.
  • Various embodiments can add, delete, combine, rearrange, and/or modify parts of this flowchart.
  • Figure 2 is a flowchart illustrating aspects of some embodiments of performing signal processing on a mass spectrum.
  • A modeled initial distribution of one or more parameters affecting time-of-flight represents molecules prior to traveling in the mass spectrometer.
  • The modeled initial distribution is pushed forward by time-of-flight functions. The modeled distribution is thereby based at least partly on the modeled initial distribution.
  • A mass spectrometer detects an empirical distribution of molecules. This empirical distribution and the modeled distribution can be compared.
  • A fit is performed between the empirical and modeled distributions.
  • The fit is filtered.
  • Various embodiments can add, delete, combine, rearrange, and/or modify parts of this flowchart.
  • Figure 3 illustrates a simple schematic of a time-of-flight mass spectrometer.
  • The mass analyzer has two chambers: the extraction region 310 and the drift region 320 (also called the field-free region), at the end of which is the detector 330.
  • The flight axis 340 extends from the extraction chamber to the detector.
  • Ion 360 is closer to the back of the extraction chamber than ion 370.
  • Ion 360 is accelerated for a longer time in the extraction region 310 than ion 370.
  • Ion 360 exits the extraction region 310 with a higher velocity than ion 370.
  • Ion 360 reaches the detector 330 before ion 370.
  • Figure 4 illustrates a simple schematic of a time-of-flight mass spectrometer with a reflectron.
  • A reflectron 440 helps to lengthen the drift region 420 and focus the ions.
  • In some embodiments, the full gas content is completely localized in the extraction chamber with negligible kinetic energy in the direction of the flight axis.
  • Other embodiments permit the gas to have some kinetic energy in the direction of the flight axis, and/or some kinetic energy away from the direction of the flight axis.
  • In some embodiments, the gas ions have an initial spatial distribution within the extraction source.
  • In some embodiments, the gas ions have an initial spatial distribution within the extraction source and have some kinetic energy in the direction of the flight axis, and/or some kinetic energy away from the direction of the flight axis.
  • In some embodiments, an extraction chamber has a potentially pulsed uniform electric field $E_0$ in the direction of the flight axis, and has length $s_0$.
  • An ion of mass $m$ and charge $q$ that starts at the back of the extraction chamber will pick up kinetic energy $q E_0 s_0$ while traveling through the electric field.
  • The field-free region has length $D$. If the ion has constant energy while in the field-free region, then its speed there is $v = \sqrt{2 q E_0 s_0 / m}$ and the time it spends in the field-free region is $t_0 = D / v = D \sqrt{m / (2 q E_0 s_0)}$.
  • Other embodiments model an extraction chamber with a uniform electric field in a direction other than the flight axis, and/or an electric field that is at least partly nonuniform and/or at least partly time dependent.
  • The velocity can be a function of the distance traveled (from the energy gained). If $u$ is the distance traveled through the extraction field, then $v(u) = \sqrt{2 q E_0 u / m}$.
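The behaviour of ions 360 and 370 in Figure 3 follows from v(u). The sketch below is a hypothetical two-region model (uniform extraction field starting from rest, then a field-free drift of length D) with made-up instrument values; the total time is the extraction time v/a plus the drift time D/v, with a = qE_0/m. Under these assumptions the ion that starts farther back leaves faster and arrives first whenever D exceeds about twice the acceleration distance, because dt/dv = (1/a)(1 - D/(2u)).

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def drift_start_velocity(u, q, e0, m):
    """Speed at the drift-start plane after accelerating from rest over distance u."""
    return math.sqrt(2.0 * q * e0 * u / m)


def total_flight_time(u, q, e0, m, drift_length):
    """Extraction time (v / a, uniform acceleration from rest) plus drift time (D / v)."""
    a = q * e0 / m
    v = drift_start_velocity(u, q, e0, m)
    return v / a + drift_length / v


# Hypothetical singly charged 1000 Da ion, E0 = 1e5 V/m, D = 1 m.
q, m, e0, D = E_CHARGE, 1000.0 * DALTON, 1.0e5, 1.0
u_360, u_370 = 0.010, 0.008   # ion 360 starts farther from the drift-start plane
for name, u in (("ion 360", u_360), ("ion 370", u_370)):
    v = drift_start_velocity(u, q, e0, m)
    t_us = total_flight_time(u, q, e0, m, D) * 1e6
    print(f"{name}: v = {v:.1f} m/s, t = {t_us:.3f} us")
```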
  • Analogous equations can be derived to represent the ions as they move through other regions of a mass spectrometer.
  • Some factors that affect the time-of-flight distributions of a given mass-to-charge species are the initial spatial distribution within the extraction chamber, and the initial kinetic energy (alternatively, initial velocity) distribution in the flight-axis direction, and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts.
  • Other embodiments can represent the initial kinetic energy (alternatively initial velocity) distribution in a direction other than the flight-axis direction.
  • The initial distributions of the parameters of an ion species that affect the time-of-flight, which are pushed forward by the time-of-flight functions, can be called modeled initial distributions.
  • Some embodiments use distributions such as gaussian distributions of initial positions and/or energies (alternatively velocities).
  • Other embodiments can use various parametric distributions of initial positions and/or energies.
  • The parameters can result from data fitting and/or scientific heuristics.
  • Further embodiments rely on statistical mechanical models of ion gases or statistical mechanical models of parameters that affect the time-of-flight.
  • The quantity of material in the extraction region is in the picomole range ($10^{-12}$ mol corresponds to roughly $6 \times 10^{11}$ particles), and hence the statistics are reliable.
  • An issue is the timescale for the system to reach equilibrium.
  • Equilibrium statistical mechanics can apply if the system converges to equilibrium faster than, e.g., the microsecond range.
Model of species distributed in position
  • Some embodiments have a parametric model of the initial position distribution with a fixed initial energy.
  • The time-of-flight distribution to be observed can be modeled.
  • Let $S$ be a normal random variable with mean $s_0$ and variance $\sigma_0^2$, where $\sigma_0 \ll s_0$.
  • In this model, the distribution of the time-of-flight in the field-free region ($t_0$) is modeled rather than the total time-of-flight ($t_{\mathrm{tot}}$).
  • Other embodiments can model the total time-of-flight, or the time spent in field regions such as constant-field regions.
  • The time-of-flight can be treated as a random variable $f(S)$, and what will be observed in the mass spectrum is the probability density function of $f(S)$.
  • In this model, $f$ can be a strictly decreasing function; other embodiments have an increasing function.
  • A constant is defined:
  • The resulting density shows a clear skew to the right (Figure 5).
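The right skew can be reproduced by Monte Carlo sampling, which is also a handy check on any closed-form density. The sketch assumes NumPy and a dimensionless drift-time map t = k/sqrt(S) with hypothetical values k = 1, s_0 = 1, σ_0 = 0.05. Because this map is decreasing and convex in S, a symmetric spread of S becomes a right-skewed spread of t; a delta-method estimate of the skewness is about 4.5·σ_0/s_0 ≈ 0.22.

```python
import numpy as np


def sample_skewness(x):
    """Third standardised moment of a sample."""
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


rng = np.random.default_rng(42)
s0, sigma0, k = 1.0, 0.05, 1.0
S = rng.normal(s0, sigma0, 200_000)   # gaussian initial positions (sigma0 << s0)
t = k / np.sqrt(S)                    # strictly decreasing map S -> drift time

print(sample_skewness(S))  # near 0: the initial distribution is symmetric
print(sample_skewness(t))  # positive: the time-of-flight peak is skewed to the right
```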
  • In another model, the initial position is constant but the initial kinetic energy in the flight-axis direction has a gaussian distribution.
  • The initial distribution can be given by a normal (gaussian) random variable $U$.
  • The time-of-flight in the drift region is then given by $t_0 = D \sqrt{m / (2 (q E_0 s_0 + U))}$.
  • Here the total time-of-flight is $t_{\mathrm{tof}} = t_a + t_0$, where $t_a$ is the time the ion spends in the extraction chamber and $t_0$ is the time the ion spends in the field-free region.
  • Equations for calculating the time-of-flight of an ion through any system involving uniform electric fields can be derived from the laws of basic physics. Such equations can accurately determine the flight time as a function of the mass-to-charge ratio for any specific instrument, with distances, voltages and initial conditions. The accuracy of such calculations can be limited by uncertainties in the precise values of the input parameters and by the extent to which the simplified one-dimensional model accurately represents the real three-dimensional instrument. Other embodiments can use more than one dimension, such as a two-dimensional or three-dimensional model.
  • Analyzers with electric fields can have at least two kinds of regions: field-free regions and constant-field regions. The velocities of an ion can be traced through the different regions to determine the time-of-flight. In an ideal field-free region of length $L$, an ion's initial and final velocities are the same, so the time spent in the region is $t = L / v$.
  • Decelerations and/or accelerations can be accounted for in the time spent in the field-free region.
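Tracing velocities region by region reduces to two elementary formulas: t = L/v in a field-free region, and, in a uniform-field region with acceleration a, v_out² = v_in² + 2aL with t = (v_out - v_in)/a. The sketch below (plain Python; function names and values are hypothetical) chains them for an extraction region followed by a drift region. It does not handle an ion that turns around inside a region, as in a reflectron, which would need the return path added.

```python
import math


def time_field_free(length, v):
    """Ideal field-free region: the speed is constant, so t = L / v."""
    return length / v


def time_uniform_field(length, v_in, accel):
    """Uniform-field region: returns (time, exit speed).

    Uses v_out**2 = v_in**2 + 2 a L and t = (v_out - v_in) / a.
    """
    if accel == 0.0:
        return time_field_free(length, v_in), v_in
    v_out_sq = v_in ** 2 + 2.0 * accel * length
    if v_out_sq < 0.0:
        raise ValueError("ion turns back inside the region (not handled here)")
    v_out = math.sqrt(v_out_sq)
    return (v_out - v_in) / accel, v_out


# Hypothetical ion: acceleration 1e10 m/s^2 over 1 cm from rest, then a 1 m drift.
t_ext, v_exit = time_uniform_field(0.01, 0.0, 1.0e10)
t_drift = time_field_free(1.0, v_exit)
print(t_ext, t_drift, t_ext + t_drift)
```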
  • Some embodiments can be applied to a mass spectrometer including three chambers and a detector: an ion extraction chamber (e.g., rectangular), a field-free drift tube, and a reflectron.
  • The shape of the distribution of the time-of-flight of a single mass-to-charge species can be determined at least partly by the distributions of initial positions in the extraction chamber and/or the initial velocities along the flight axis.
  • Approximate formulae can be derived for the time-of-flight distribution for a species of fixed mass-to-charge ratio, in this example assuming that the distributions for initial positions and velocities are gaussian. The initial positions have a restricted range, and the assumption for initial position may be modified to reflect this.
  • The plane that separates the extraction region from the field-free drift region can be called the “drift start” plane.
  • The flight-axis velocity at the “drift start” plane can be referred to as the “drift start velocity.”
  • $E_0$ is the electric field strength of the extraction region.
  • The probability density that results when this distribution is pushed forward by $(x, y) \mapsto v(x, y)$ can be determined.
  • The resulting density in the space of velocities can be denoted by $p_v$.
  • $T$ can be used to push forward the density $p_v$ to a new density in $t$-space.
Expression for $p_v$ in the gaussian case
  • Let $F(x, y)$ be any function of $x$ and $y$.
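  • The expectation $E[F]$ can then be written as a double integral of $F$ against the initial density over $(x, z)$.
  • After the change of coordinates from $(x, z)$ to $(v, \theta)$ shown in Figure 6, $E[F]$ becomes a double integral over $v$ and $\theta$, with $F$ evaluated at arguments involving $v \sin\theta$ and $\cos^2\theta$, and an area element proportional to $v \cos\theta \, d\theta \, dv$.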
  • The mathematical forms derived above for the lineshapes (peak shapes) of the different species, based on the underlying physics of the mass spectrometer, can be applied to the analysis of spectra. Rigorous fits can be performed between empirical mass spectra and synthetic mass spectra generated from mixtures of lineshapes.
  • A more complex method for fitting a mass spectrum using modeled lineshape equations uses model basis vectors, such as wavelets and/or vaguelettes. This can be done generally and/or for a given mass spectrometer design.
  • A basis set is a set of vectors (or sub-spectra) whose combination can be used to model an observed spectrum.
  • An expansion of the lineshape equations can derive a basis set that is very specific for a given mass spectrometer design.
  • A spectrum can be described using the basis vectors.
  • An observed empirical spectrum can be described by a weighted sum of basis vectors, where each basis vector is weighted by multiplication by a coefficient.
  • Some embodiments use scaling.
  • The linewidth of the peak corresponding to a species in a mass spectrum depends on the time-of-flight of the species.
  • Consequently, the linewidth in a mass spectrum may not be constant for all species.
  • One way to address this is to rescale the spectrum such that the linewidths in the scaled spectrum are constant.
  • Such a method can utilize the linewidth as a function of time-of-flight. This can be determined and/or be estimated analytically, empirically, and/or by simulation. Spectra with constant linewidth can be suitable for many signal processing techniques which may not apply to non-constant linewidth spectra.
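As an example of rescaling, suppose (purely as an assumption) that the linewidth grows in proportion to flight time, w(t) ≈ c·t. Resampling onto a uniform grid in u = ln t then makes every peak about c wide; more generally, u = ∫ dt / w(t) yields unit width. The sketch uses NumPy's linear interpolation; `rescale_to_log_axis` and `peak_sd` are hypothetical helpers, and a real instrument would need its own measured or modeled w(t).

```python
import numpy as np


def rescale_to_log_axis(t, intensity, n_points=None):
    """Resample a spectrum onto a uniform grid in u = ln(t).

    If the linewidth is proportional to t (an assumption), every peak has roughly
    the same width on the u axis. Intensities are interpolated, not re-normalised.
    """
    n_points = n_points or t.size
    u = np.linspace(np.log(t[0]), np.log(t[-1]), n_points)
    return u, np.interp(np.exp(u), t, intensity)


def peak_sd(axis, y):
    """Intensity-weighted standard deviation of a single peak."""
    mean = (axis * y).sum() / y.sum()
    return np.sqrt(((axis - mean) ** 2 * y).sum() / y.sum())


# Two synthetic peaks whose widths scale with t (c = 0.01): sd 0.1 at t = 10, sd 0.4 at t = 40.
t = np.linspace(5.0, 60.0, 20_000)
y = np.exp(-0.5 * ((t - 10.0) / 0.1) ** 2) + np.exp(-0.5 * ((t - 40.0) / 0.4) ** 2)
u, y_u = rescale_to_log_axis(t, y)

split_t, split_u = 20.0, np.log(20.0)
print(peak_sd(t[t < split_t], y[t < split_t]), peak_sd(t[t >= split_t], y[t >= split_t]))
print(peak_sd(u[u < split_u], y_u[u < split_u]), peak_sd(u[u >= split_u], y_u[u >= split_u]))
```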
  • Some embodiments use linear combinations and/or matched filtering.
  • a weighted sum of lineshape functions representing peaks of different species can be fitted to the observed signal by minimizing error.
  • the post-processed data can include the resulting vector of weights, which can represent the abundance of species in the observed mass spectrum.
  • Fitting can assume that the spectrum has a fixed set of lineshape centers (including
  • a lineshape function may be determined for each center-width pair.
  • a synthetic spectrum may include a weighted sum of such lineshape functions.
  • One advantage of this method is that it reduces the number of data dimensions, since an observed spectrum with a large number of data points can be described by a few parameters. For example, if an observed spectrum has 20,000 data points and 20 peaks, the spectrum can be described by 60 numbers consisting of 20 triplets of center, width, and amplitude. The original
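A minimal sketch of the fixed-center fit: with centers and widths held fixed, fitting the weights is a linear least-squares problem. A Gaussian stands in here for the modeled (typically right-skewed) lineshape; `gaussian_lineshape` and `fit_amplitudes` are illustrative names, not part of any library.

```python
import numpy as np

def gaussian_lineshape(t, center, width):
    # Stand-in for a modeled lineshape; width is the standard deviation
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def fit_amplitudes(t, y, centers, widths):
    """Least-squares weights for a fixed set of (center, width) lineshapes."""
    # Each column of A is one lineshape evaluated on the sampling grid
    A = np.column_stack([gaussian_lineshape(t, c, w) for c, w in zip(centers, widths)])
    weights, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return weights
```

Plain least squares can return small negative weights. If abundances must be non-negative, `scipy.optimize.nnls` solves the same problem with a non-negativity constraint. Each fitted peak is then summarized by its (center, width, amplitude) triplet.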
  • Some embodiments construct convolution operators. Lineshapes constructed analytically, determined empirically, and/or determined by simulation may be used to approximate a convolution operator that replaces a delta peak (e.g., an ideal peak corresponding to the time-of-flight for a particular species) with the corresponding lineshape.
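One way to build such an operator as an explicit matrix is sketched below. It assumes the lineshape has already been sampled on the spectrum grid and is the same at every position, as in a rescaled spectrum. The helper name is hypothetical. For a position-dependent lineshape, each column would instead hold the lineshape for that column's time-of-flight.

```python
import numpy as np

def convolution_operator(lineshape, n):
    """n-by-n matrix K such that K @ x places a copy of `lineshape` at each point of x.

    `lineshape` is sampled on the spectrum grid with its reference point (the
    ideal time-of-flight) at index len(lineshape) // 2. Edges are truncated.
    """
    lineshape = np.asarray(lineshape, dtype=float)
    half = len(lineshape) // 2
    K = np.zeros((n, n))
    for j in range(n):                  # column j: response to a delta at index j
        for k, value in enumerate(lineshape):
            i = j + k - half
            if 0 <= i < n:
                K[i, j] = value
    return K
```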
  • Some embodiments use Fourier transform deconvolution.
  • the Fourier transform and/or numerical fast Fourier transform of a spectrum such as the rescaled spectrum can be multiplied by a suitable function of the Fourier transform of the lineshape determined analytically, estimated empirically, and/or by simulation.
  • the inverse Fourier transform or inverse fast Fourier transform can be applied to the resulting signal to recover a deconvolved spectrum.
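As an illustration, one common choice of "suitable function" is a regularized (Wiener-style) inverse filter, conj(U) / (|U|^2 + eps). This filter avoids dividing by near-zero Fourier coefficients of the lineshape. The filter choice and the `eps` parameter are assumptions for this sketch, not the method prescribed here.

```python
import numpy as np

def fourier_deconvolve(y, lineshape, eps=1e-3):
    """Deconvolve a constant-width (e.g. rescaled) spectrum with a regularized inverse filter.

    lineshape: kernel sampled on the same grid with its peak at index 0,
    wrapped circularly (np.fft.ifftshift converts a centered kernel).
    eps: regularization that keeps the filter bounded where |U| is small.
    """
    Y = np.fft.fft(np.asarray(y, dtype=float))
    U = np.fft.fft(np.asarray(lineshape, dtype=float), n=len(y))
    # conj(U) / (|U|^2 + eps) instead of a bare 1 / U
    X = Y * np.conj(U) / (np.abs(U) ** 2 + eps)
    return np.real(np.fft.ifft(X))
```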
  • Some embodiments use scaling and wavelet filtering. Any family of wavelet bases can be chosen, and used to transform a spectrum, such as a rescaled spectrum. A constant linewidth of the spectrum can be used to choose the level of decomposition for approximation and/or thresholding. The wavelet coefficients can be used to describe the spectrum with reduced dimensions and reduced noise.
  • Some embodiments use blocking and wavelet filtering.
  • the spectrum can be divided into blocks whose sizes can be determined by linewidths determined analytically, estimated empirically, and/or by simulation. Any family of wavelet bases can be chosen and used to transform a spectrum, such as the raw spectrum. Different width features can be described in the wavelet coefficients at different levels. The wavelet coefficients from the appropriate decomposition levels can be used to describe the spectrum with reduced dimensions and reduced noise.
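The following is a minimal sketch of wavelet filtering with an orthonormal Haar basis, written in plain NumPy so it runs without extra dependencies. The rule that links linewidth to the number of discarded fine levels is a hypothetical heuristic. For real use, libraries such as PyWavelets (`pywt.wavedec` / `pywt.waverec`) provide other families, such as db4.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal Haar transform; returns (approximation, details finest-first)."""
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    for d in reversed(details):             # coarsest level first
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def wavelet_filter(y, linewidth_samples, levels):
    """Zero the detail levels much finer than the (constant) linewidth."""
    approx, details = haar_dwt(y, levels)
    # Hypothetical heuristic: level j resolves scales of ~2**(j+1) samples;
    # levels at or below about half a linewidth are treated as noise.
    n_drop = max(0, int(np.floor(np.log2(linewidth_samples))) - 1)
    kept = [np.zeros_like(d) if j < n_drop else d for j, d in enumerate(details)]
    return haar_idwt(approx, kept)
```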
  • Some embodiments construct new wavelet bases.
  • Analytical lineshapes, empirically determined lineshapes, and/or simulated lineshapes for a given configuration of a mass spectrometer can be used to construct families of wavelets. These wavelets can then be used for filtering. Vaguelettes are another choice for basis sets.
  • the vaguelette vectors can include vaguelettes derived from wavelet vectors, vaguelettes derived from modeled lineshapes, and/or vaguelettes derived from empirical lineshapes.
  • Some embodiments use wavelet-vaguelette decomposition.
  • Another method based on wavelet filtering is the wavelet-vaguelette decomposition.
  • the modeled lineshape functions may be used to construct a convolution operator that replaces a delta peak with the corresponding lineshape.
  • Any family of wavelet bases may be chosen, such as 'db4', 'symmlet', or 'coiflet'.
  • the convolution operator may be applied to the wavelet bases to construct a set of vaguelettes.
  • a minimal error fit may be performed for the coefficients of the vaguelettes to the observed spectrum.
  • the resulting coefficients may be used with the corresponding wavelet vectors to produce a deconvolved spectrum that represents abundances of species in the observed spectrum.
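The wavelet-vaguelette steps above can be sketched with explicit matrices on a small grid. The sketch uses Haar wavelet vectors as the columns of `W` and a circulant operator `K` that replaces a delta with the lineshape. The vaguelettes are `K @ W`. Their coefficients are fitted by least squares, stabilized here by a small ridge term (an assumption), and the deconvolved spectrum is rebuilt from the wavelet vectors. Dense matrices keep the sketch short but are practical only for small grids. All function names are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix for n a power of 2 (rows are wavelets)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])               # coarser wavelets, stretched 2x
    fine = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale wavelets
    return np.vstack([coarse, fine]) / np.sqrt(2.0)

def circulant_operator(lineshape):
    """K @ x circularly convolves x with the lineshape (peak stored at index 0)."""
    u = np.asarray(lineshape, dtype=float)
    return np.column_stack([np.roll(u, j) for j in range(len(u))])

def wvd_deconvolve(y, lineshape, ridge=1e-8):
    """Wavelet-vaguelette deconvolution; len(y) == len(lineshape), a power of 2."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    W = haar_matrix(n).T                    # columns: wavelet vectors
    V = circulant_operator(lineshape) @ W   # columns: vaguelettes (blurred wavelets)
    # minimal-error fit of the vaguelette coefficients, with a small ridge term
    c = np.linalg.solve(V.T @ V + ridge * np.eye(n), V.T @ y)
    return W @ c                            # rebuild with the wavelet vectors
```

Thresholding the fitted coefficients `c` before rebuilding the spectrum would add the denoising step that is usually paired with this decomposition.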
  • Some embodiments use thresholding estimators.
  • the Kalifa-Mallat mirror wavelet basis can guarantee that K is almost diagonal in that basis.
  • the decomposition coefficients in this basis can be computed with a wavelet packet filter bank requiring O(N) operations. These coefficients can be soft-thresholded with almost optimal denoising properties for the reconstructed synthetic spectra.
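Soft thresholding, and the hard-thresholding rule described below, can be written as elementwise operations on basis coefficients. The universal threshold sigma * sqrt(2 ln N) of Donoho and Johnstone is included as one common default, assuming the noise level sigma is known or estimated.

```python
import numpy as np

def hard_threshold(coeffs, thresh):
    """Keep coefficients with magnitude >= thresh unchanged; zero the rest."""
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) >= thresh, c, 0.0)

def soft_threshold(coeffs, thresh):
    """Zero coefficients below thresh and shrink the rest toward zero by thresh."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)

def universal_threshold(sigma, n):
    """Donoho-Johnstone universal threshold sigma * sqrt(2 ln n) for n coefficients."""
    return sigma * np.sqrt(2.0 * np.log(n))
```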
  • Fitting a basis set to an observed empirical spectrum does not necessarily reduce the dimensionality, or the number of data points needed to describe a spectrum. However, fitting the basis set "changes the basis" and does yield coefficients (parameters) that can be filtered more easily. If many of the coefficients of the basis vectors are close to zero, then the new representation is sparse, and only some of the new basis vectors contain most of the information.
  • thresholding can be performed on the basis vector coefficients. These methods remove or de-emphasize the lowest-amplitude coefficients, leaving intensity values only for the true signals. Hard thresholding sets a minimum cutoff value and discards any coefficient whose magnitude is below that threshold; such small values may be considered noise. Soft thresholding also discards coefficients below the cutoff and, in addition, shrinks the remaining coefficients toward zero by the cutoff amount. Multiple thresholds and/or scales can be used. Figures 7 and 8 are empirical figures showing that real mass spectra have skewed lineshapes, consistent with the results of the pushed-forward lineshapes.
  • Figure 7 illustrates a mass spectrum of a three-peptide mixture of angiotensin (A), bradykinin (B), and neurotensin (N). Data were collected on an electrospray ionization time-of-flight mass spectrometer (ESI-TOF MS). Each peptide produces two peaks, one for the +2 charge state and one for the +3 charge state. For example, A(+2) is the angiotensin +2 charge state.
  • Figure 8 illustrates an expanded view of Figure 7 to display in detail the bradykinin +2 charge state.
  • the various peaks present are due to different isotope compositions of the bradykinin ions in the ensemble (e.g., ¹³C vs. ¹²C).
  • the peakshapes are skewed to the right.
  • a time-of-flight distribution can be considered an example of a mass-to-charge distribution.
  • Some embodiments can run on a computer cluster.
  • Networked computers can run many CPU-intensive jobs in parallel.
  • Daemons running on the computer nodes can accept jobs and notify a server node of each node's progress.
  • a daemon running on the server node can accept results from the computer nodes and keep track of the results.
  • a job control program can run on the server node to allow a user to submit jobs, check on their progress, and collect results.
  • the cluster can be loosely parallel, more like a simple network of individual computers, or tightly parallel, where each computer can be dedicated to the cluster.
  • Some embodiments can be implemented on a computer cluster or a supercomputer.
  • a computer cluster or a supercomputer can allow quick and exhaustive sweeps of parameter spaces to determine optimal signatures of diseases such as cancer, and/or discover patterns in cancer.
  • Separation-Mass Spectrum
  • A common feature of electrospray ionization mass spectrometry is the ability of the mass spectrometer to produce ions with multiple charge states.
  • An ESI mass spectrum generally comprises a sequence of multiply charged peaks. Each group of peaks belonging to a single charge state is often referred to as an isotope envelope.
  • An envelope is a cluster of peaks for a given charge state representing all of the different observable isotope states of a particular molecule.
  • An envelope represents one charge state of a molecule. Thus, multiple envelopes may represent one molecule in its different charge states.
  • Figures 9A and 9B illustrate the above concepts.
  • Figure 9A represents a single charge state of a species.
  • the charge state comprises a set of peaks (e.g., a, b, and c). Each peak represents a different isotope of the charge state.
  • Figure 9B represents different charge states (A-E) of the same species.
  • the species illustrated by Figures 9A and 9B are not the same.
  • Mass analyzers are finite-resolution instruments; hence, instead of producing a sharp, width-less spike for each ion species, they produce a positive-width lineshape or point-spread function whose width depends on the mass-to-charge of the species, the species' temporal and energy distributions, and the instrumental configuration for each m/z species.
  • a mass spectrometer may also produce noise that can distort the spectrum.
  • noise examples include "white noise" (usually modeled as "Gaussian noise") and "detector noise" (usually modeled as "Poisson noise").
  • White noise can result from various internal errors that can influence an entire data set.
  • White noise can occur, for example, as a result of an imperfect vacuum, impurities in the device or sample, insufficient concentration of sample, temperature, etc.
  • the white noise may be independent of the signal intensity.
  • detector noise may depend on the intensity of the signal.
  • the methods herein involve analyzing one or more samples 101.
  • the methods herein involve high throughput screening of numerous samples.
  • a sample analyzed by a mass spectrometer of the present invention can include one or more compositions including a carbohydrate, a polypeptide, a polynucleotide, a lipid, a synthetic polymer, a small or large organic or inorganic molecule, a mimetic, or a combination of any of the above.
  • a sample is obtained from a plant or an animal, more preferably from a mammal, and most preferably from a human.
  • liquid samples that may be derived from an individual include urine, nasal discharge, vaginal discharge, mucus, lymph, blood, serum, plasma, saliva, and tears. Non-liquid samples may also be used, since a non-liquid sample may be solubilized.
  • a sample may be input directly into the mass spectrometer for analysis or, in preferred embodiments, it may be first separated in step 105. Separation may be made according to, for example, size, weight, charge, isoelectric point, binding affinity, time of travel, etc.
  • a sample can be separated using, for example, electrophoresis, chromatography, filtration, centrifugation, fractionation, antibodies, or any other means for separating in time various components of the sample.
  • samples are separated by electrophoresis or chromatography, more preferably samples are separated by capillary electrophoresis (CE) or high performance liquid chromatography (HPLC).
  • Capillary electrophoresis refers to a set of related techniques that employ capillaries (e.g., 10-200 µm inner diameter) to perform high-efficiency separations.
  • CE can be used to separate both large and small molecules.
  • CE techniques perform separations based on, for example, molecular size, isoelectric focusing, and hydrophobicity.
  • high voltages may be used to separate molecules based on differences in charge and size.
  • separation results from the combination of electrophoretic migration and electro-osmotic flow.
  • CE is performed, for example, on a P/ACE™ MDQ (Beckman Instrument). Electrophoresis can also be performed on microfluidic chips with channels of smaller dimensions.
  • the separation step can be repeated more than one, two, three, or four times. Each time a separation step is repeated the same or a different separation technique may be utilized.
  • samples are separated twice or three times using capillary electrophoresis and/or HPLC. The greater the number of separations used, the greater the number of dimensions produced by the output of the mass spectrometer. However, no matter how many separations are conducted, the mass spectrum output may be deconvolved line-by-line as a 1D spectrum, as described in more detail herein.
  • the separation step may be preceded by an acidification step 104.
  • a liquid sample is acidified to denature proteins therein, thereby breaking up complexes.
  • the sample is then filtered or separated to remove a subset of species before separating it (e.g., by capillary electrophoresis).
  • the acidification step may be followed by a separation step 105 by ultracentrifugation and/or ultrafiltration. This allows a crude separation of components into fractions to be analyzed further and unwanted fractions to be discarded.
  • Acidification may occur with acids that will not cleave desired proteins.
  • acids used for acidification lower the pH of the sample to no less than pH 5, pH 4, pH 3, or pH 2.
  • formic acid may acidify a sample to a pH of 3. It is then possible to separate unwanted constituents in the sample by ultracentrifugation. Fractionating the liquid sample means that, for example, only fractions of proteins and/or peptides of a certain molecular weight are retained for further analysis.
  • proteins may be digested with proteases, e.g. trypsin, or by other means and those protein fragments may then be separated and analyzed by mass spectrometry. Information from such digestion experiments can help analyze larger proteins.
  • Separation step 105 is preferably automated and followed by the ionization step 110.
  • the ionization step 110 involves producing gas phase ions from analyte in solid or liquid phase.
  • a mass analyzer analyzes ionized samples/fragments in step 115.
  • any mass analyzer may be used to analyze the resulting ions.
  • the mass analyzer is a TOF mass analyzer or an FTMS mass analyzer.
  • the mass analyzer may be a tandem mass spectrometer as well, in which mass spectrometry is essentially performed twice. Species selected after the first mass analysis are fragmented and the fragments are analyzed in the second mass analyzer. This type of analysis can be helpful, for example, in identifying proteins.
  • tandem mass spectrometers include, for example, quadrupole-TOF mass spectrometers.
  • Output from a coupled separations-mass spectrometer system can include both a "1-dimensional (1D) mass spectrum," wherein m/z values are on the x-axis and intensity values are on the y-axis, and a "2-dimensional (2D) mass spectrum," wherein m/z values are on the x-axis, the migration time is on the y-axis, and contours or colors represent intensities.
  • Figures 12A and 12B: Figure 12A illustrates the compilation of multiple 1D spectra such that similarly situated peaks are aligned vertically.
  • Figure 12B illustrates a compiled 2D spectrum of more than 20 individual 1D spectra. Any number of 1D spectra can be compiled into a 2D spectrum.
  • the invention preferably utilizes separations procedures that allow elution of a single molecular species for longer than the acquisition time of a single mass spectrum.
  • peaks for the various charge states of a species appear in more than 1, more than 2, more than 3, more than 4, more than 5, or more preferably more than 10, more than 15, or more than 20 contiguous 1D spectra in similar m/z locations.
  • a 2D spectrum is formed whose "2D peaks" depend on both the mass-to-charge axis and the separation-time axis.
  • the 1D spectrum may be analyzed by determining a lineshape for the 1D mass spectrum in step 125, transforming (scaling) the lineshape signal in scaling step 130 to an axis on which the peaks have similar shape and width independent of the m/z of the species, deconvolving the scaled lineshape in step 135, and descaling the output of deconvolution back to the original mass spectrum axis in step 140.
  • scaling parameters, lineshape parameters, and noise levels are estimated in steps 121, 122, and 123, prior to determination of a lineshape.
  • scaling parameters are estimated in step 121 by fitting a statistical model in which a parameter represents the change of peak widths as a function of time-of-flight.
  • a subset of data from a 2D separations-spectrum is chosen judiciously, based on whether it contains resolved isotope clusters. A statistical fit for the peak-width parameter is then made from a collection of fits to isotope clusters whose time-of-flight centers cover a wide range.
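The statistical fit of the width parameter can be sketched as a least-squares fit of measured peak widths against time-of-flight. This assumes a width has already been measured for each resolved isotope cluster. The polynomial form (linear by default, optionally quadratic) follows the width models mentioned in this section; `fit_width_model` is a hypothetical name.

```python
import numpy as np

def fit_width_model(tof_centers, widths, degree=1):
    """Least-squares polynomial fit of peak width versus time-of-flight.

    degree=1 gives width = a*t + b; degree=2 adds a quadratic term.
    Coefficients are returned highest power first (np.polyfit convention).
    """
    t = np.asarray(tof_centers, dtype=float)
    w = np.asarray(widths, dtype=float)
    return np.polyfit(t, w, degree)
```

Spreading the clusters over a wide time-of-flight range, as the bullet above recommends, keeps the slope estimate well conditioned.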
  • lineshape parameters are estimated in step 122 based on parametric and non-parametric methods. For example, known lineshape parameters are estimated using physical parameters of the mass spectrometer and statistical distributions of the locations, velocities, and other physical parameters of the particles and of the mass spectrometer. Unknown lineshape parameters are estimated statistically by standard methods such as maximum likelihood, least squares, maximum entropy, and/or model selection methods such as information criteria. In preferred embodiments, noise levels are estimated in step 123 from the high-frequency wavelet coefficients of the signal. In other embodiments, noise levels are estimated by any well-known method in signal processing.
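For the noise estimate, one standard wavelet-based rule divides the median absolute value of the finest-scale coefficients by 0.6745, the median of |N(0, 1)|. The sketch below applies this rule with a single Haar level. It is offered as an example of the approach, not as the specific estimator used here.

```python
import numpy as np

def estimate_noise_sigma(y):
    """Noise standard deviation from the finest-scale Haar detail coefficients.

    Uses the median absolute deviation rule sigma ~ median(|d|) / 0.6745,
    which is robust to the few large coefficients produced by peaks.
    """
    y = np.asarray(y, dtype=float)
    n = len(y) - len(y) % 2                      # drop a trailing odd sample
    d = (y[0:n:2] - y[1:n:2]) / np.sqrt(2.0)     # orthonormal Haar details
    return np.median(np.abs(d)) / 0.6745
```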
  • a mass spectrum lineshape is determined. Certain methods of determining lineshape are provided herein and in U.S. Application Ser. No. 10/462,228, filed on 6/12/03, entitled “Method And Apparatus For Modeling Mass Spectrometer Lineshapes," incorporated herein by reference for all purposes, which discloses analytic models to determine some envelopes of lineshapes.
  • a lineshape u is calculated based on physical parameters of the mass spectrometer/separation-mass spectrometry system and statistical distributions of the locations, velocities, and other physical parameters of the particles and of the mass spectrometer/separation-mass spectrometry system. For particular settings of a mass spectrometer for which a well-understood physics model is available, this method allows calculation of the parameters that define u from data, with statistical bounds representing confidence of the fit of the model.
  • a lineshape u is calculated by combining physical derivation of the lineshape with statistical estimation of unknown features of the lineshape.
  • some features of the lineshape may be left unspecified, such as a reference width, tail shape, or other features.
  • the unspecified features may be estimated by statistically fitting u to a selected subset of single peaks or isotopic peak clusters.
  • a useful analogy for this approach is estimation of standard statistical distributions such as a normal distribution where the mean and variance are estimated from data; the distribution is specified as a parametric envelope with parameters to be estimated from data.
  • the lineshape is derived as a parametric envelope from understanding of the mass spectrometer, with some parameters to be estimated from data.
  • a likelihood for such a set of parameters can be calculated for the data, and the best parameters can be selected by optimizing the likelihood (or other fitting function).
  • the parameters to be estimated can also be formulated as unknown physical parameters of the mass spectrometer/separation-mass spectrometry.
  • a lineshape can be calculated, from which the value of the statistical fitting function can be calculated and optimized over the parameter space.
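The following is a minimal sketch of selecting lineshape parameters by optimizing a likelihood over a parameter grid. It assumes Poisson-distributed detector counts and uses a Gaussian as a stand-in for the physically derived lineshape; both are assumptions. A continuous optimizer could replace the grid search.

```python
import numpy as np

def gaussian_shape(t, center, width):
    # Unit-area Gaussian used as a stand-in parametric lineshape
    return np.exp(-0.5 * ((t - center) / width) ** 2) / (width * np.sqrt(2 * np.pi))

def fit_lineshape_poisson(t, counts, centers, widths):
    """Grid search maximizing a Poisson log-likelihood over (center, width).

    For a fixed shape g, the amplitude maximizing sum(y*log(A*g) - A*g)
    is A = sum(y) / sum(g), so only center and width are searched.
    Returns (center, width, amplitude).
    """
    counts = np.asarray(counts, dtype=float)
    best = None
    for c in centers:
        for w in widths:
            g = gaussian_shape(t, c, w)
            amp = counts.sum() / g.sum()
            lam = amp * g + 1e-12                 # guard against log(0)
            loglik = np.sum(counts * np.log(lam) - lam)
            if best is None or loglik > best[0]:
                best = (loglik, c, w, amp)
    return best[1:]
```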
  • a lineshape u is determined completely from raw data by relying exclusively on statistical estimation of the lineshape using flexible non-parametric methods for estimation of arbitrary distribution functions.
  • This method omits physical derivation of any aspects of the lineshape, and the three methods specified here represent a spectrum from completely physical derivation to combined physical and statistical estimation to completely statistical estimation.
  • flexible functional forms such as smoothing splines, B-splines, thin plate splines, piecewise polynomials, and mixtures of distributions may be used.
  • the lineshape can be considered a multiple of a probability density function.
  • the methods of the last paragraph can be used to estimate either the probability density function or the logarithm of the probability density function. Each of these methods involves parameters to be estimated, and some involve smoothness penalties that can be chosen manually or by automated methods such as cross-validation.
  • the density function estimator produces a particular lineshape, for which a likelihood (or other fitting function) can be calculated for the data, and the best parameters can be selected by optimizing the likelihood (or other fitting function) over the parameter space.
  • the 1D spectrum is scaled or transformed in step 130.
  • Scaling step 130 transforms the lineshape u along the time-of-flight axis.
  • scaling step 130 transforms the lineshape along the time-of-flight axis such that the peaks have the same shape and width independent of the m/z of the species or time-of-flight. This allows Fourier transform techniques to be used to deconvolve the spectrum, since the blurring effect of the mass spectrometer is independent of location in the transformed coordinates. This is especially useful for a single-extraction time-of-flight mass spectrum, whose peak widths increase linearly as a function of time-of-flight.
  • scaling step 130 involves transforming the lineshape of a spectrum to an artificial axis on which the peak widths of the underlying individual isotopes of each species are constant, and the lineshape is transformed along the time-of-flight axis such that lineshape u varies deterministically.
  • when using a TOF mass spectrometer with a single acceleration region, the present invention provides for an F(t) that is a continuous function of time-of-flight t, representing a signal with a fixed lineshape or point-spread function with the property that the peak centered at $t_0$ has peak width $a t_0 + b$, where $a > 0$.
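For this linear width model, a change of variables that makes peak widths constant can be derived directly. This is a sketch of one such transform, not necessarily the exact one used in step 130. Define the new coordinate by integrating the reciprocal width, starting from the smallest time-of-flight $t_{\min}$:

$$
s(t) = \int_{t_{\min}}^{t} \frac{d\tau}{a\tau + b} = \frac{1}{a}\,\ln\!\left(\frac{a t + b}{a t_{\min} + b}\right),
\qquad
\Delta s \approx s'(t_0)\,(a t_0 + b) = \frac{a t_0 + b}{a t_0 + b} = 1 .
$$

A peak centered at $t_0$ therefore spans about one unit of $s$ regardless of $t_0$. The map is inverted by

$$
t(s) = \frac{(a t_{\min} + b)\,e^{a s} - b}{a}.
$$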
  • scaling step 130 transforms the lineshape along the time-of-flight-axis such that the width of lineshape u varies linearly or quadratically as a function of time-of-flight.
  • Linear or quadratic parameters may be calculated from raw data using a parametric model of the lineshape.
  • the parametric model can be determined using a model of the lineshape that includes initial position and energy distribution of charged ions.
  • the parametric model can be Gaussian.
  • the parametric model can be a Student's t distribution.
  • the parametric model can be determined by computer simulation of the mass spectrometer. After scaling an observed signal, the scaled signal is deconvolved.
  • the operator $K$, defined by $Kx = u * x$, may be singular or at least numerically singular; hence recovering the true signal $x$ from the observed spectrum $y$ is not a well-posed problem, even in the noiseless case.
  • a scaled lineshape u is deconvolved in step 135.
  • deconvolution is made by parametric deconvolution techniques (PDPS).
  • PDPS is described in more detail in (Li et al., 2000), which is incorporated herein by reference for all purposes.
  • x denotes the "true signal".
  • the process of deconvolution can be performed using the maximum entropy (entropy penalty) method (Donoho D.L., 1992, and Ramanation R. et al., 2004, which are incorporated herein by reference for all purposes).
  • the process of deconvolution is performed by a least-squares estimate with an ℓ1-norm penalty, also known as the basis pursuit algorithm.
  • basis pursuit deconvolution is an optimization problem with asymptotic minimax optimality properties proven for signals in which a high percentage of the points are noise.
  • a 1D slice of a well-separated 2D signal falls in the regime of being "nearly black" in this sense.
  • Two major benefits of using the basis pursuit method for separating mass spectrometry peaks are as follows. First, the output is maximally sparse; second, with a carefully chosen penalty parameter, the output x is an asymptotically minimax (and hence in a measurable sense "best") statistical estimate of the true signal in the presence of white noise.
  • the basis pursuit method has been further described by Chen, S. S., et al., 2001, and Donoho, D. 1992, which are incorporated herein by reference for all purposes.
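Basis pursuit deconvolution can be sketched with iterative soft thresholding (ISTA), a simple proximal-gradient solver for the ℓ1-penalized least-squares objective. The solver choice, the fixed iteration count, and the function name are assumptions. `K` is any matrix that maps a spike train to the blurred spectrum.

```python
import numpy as np

def basis_pursuit_deconvolve(y, K, lam, n_iter=500):
    """Minimize 0.5*||K x - y||^2 + lam*||x||_1 by iterative soft thresholding (ISTA)."""
    y = np.asarray(y, dtype=float)
    step = 1.0 / np.linalg.norm(K, 2) ** 2       # 1 / Lipschitz constant of the gradient
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        z = x - step * (K.T @ (K @ x - y))       # gradient step on the squared error
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # l1 proximal step
    return x
```

Larger values of `lam` give sparser outputs. Once `lam` exceeds the largest entry of `|K.T @ y|`, the solution is identically zero.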
  • deconvolution step 135 further includes using fast wavelet transforms for convolution calculations.
  • Deconvolution step 135 may further include one or more means for removing noise and/or increasing resolution.
  • Poisson noise may be removed by any method known in the art.
  • Poisson noise may be removed separately from the white noise by assuming that the deconvolved output is a signal with only Poisson noise.
  • Poisson noise may be incorporated in the deconvolution model by modifying the objective function to be a penalized log-likelihood function rather than a penalized least-squares problem.
  • although white noise and Poisson noise are independent of position, there may be correlations between white noise and Poisson noise that may be detected by an operator skilled in the art.
  • noise level may be used in an objective function calculation for deconvolution step 135.
  • a deconvolution objective function may be modified by methods known in the art to reduce such noise.
  • the deconvolution step 135 may further include the use of the fast Fourier transform (FFT) for convolution calculations. This is possible because of a well-known mathematical relationship between the Fourier transform and convolution: given two signals A and B, the FFT of the (circular) convolution C of A and B is equal to the pointwise product of the FFT of A and the FFT of B.
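The convolution theorem stated above can be checked in a few lines. Zero-padding both signals to the full output length makes the FFT product correspond to ordinary (linear) convolution rather than circular convolution. The helper name is illustrative.

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution of a and b via the convolution theorem.

    Zero-padding both signals to len(a) + len(b) - 1 samples turns the FFT's
    circular convolution into ordinary (linear) convolution.
    """
    n = len(a) + len(b) - 1
    return np.real(np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)))
```

For long spectra this costs O(N log N) operations instead of the O(N·M) of direct convolution.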
  • once x is obtained by deconvolution, it is retransformed or descaled in step 140 to place the output signals in their correct positions on the original time-of-flight axis.
  • the descaling transformation for the linear peak width increase is the inverse function of the following algorithm:
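As a hedged sketch only, assume the forward scaling for peak widths $a t + b$ is the logarithmic map $s = \ln(a t + b)/a$. This is an assumption, since other constant-width transforms are possible. Descaling then applies the inverse map, $t = (e^{a s} - b)/a$, either to individual deconvolved peak positions or, by interpolation, to a whole sampled signal. The function names are hypothetical.

```python
import numpy as np

def scale_tof(t, a, b):
    # Assumed forward map for peak widths a*t + b: s = ln(a*t + b) / a
    return np.log(a * np.asarray(t, dtype=float) + b) / a

def descale_tof(s, a, b):
    # Inverse of scale_tof: t = (exp(a*s) - b) / a
    return (np.exp(a * np.asarray(s, dtype=float)) - b) / a

def descale_spectrum(s_grid, x_scaled, t_grid, a, b):
    """Place a deconvolved signal sampled on s_grid back onto the original t_grid."""
    # Look up each original time-of-flight at its position on the scaled axis
    return np.interp(scale_tof(t_grid, a, b), s_grid, x_scaled)
```

Interpolating sample values does not preserve integrated peak areas. Mapping each deconvolved peak's position with `descale_tof` and carrying its summed intensity avoids that issue.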
  • a 1D mass spectrum may be processed without scaling step 130 and descaling step 140 using a non-scaling method.
  • the non-scaling method is preferably a wavelet basis where the time-of-flight dependence of a blurring operator is included directly in the algorithm.
  • deconvolution algorithm yields data with increased resolution.
  • deconvolution step 135 enhances the signal-to-noise ratio of the spectrum by a factor of at least 2, more preferably at least 5, more preferably at least 10, more preferably at least 50, more preferably at least 100.
  • deconvolution step 135 yields data with increased resolution by a factor of at least 1.5, more preferably by at least 2, more preferably by at least 10, more preferably by at least 100.
  • the deconvolution step results in a spectrum with less than 20% artifact peaks, more preferably less than 10%, more preferably less than 5%, more preferably less than 1%, more preferably less than 0.1%.
  • the number of output peaks representing observable isotope states of ion species is 50% accurate, more preferably 60%, more preferably 70%, more preferably 80%, more preferably 90%, more preferably 95%, more preferably 99% accurate.
  • the mass-to-charge accuracy is preferably within 1% of its true
  • the intensity of the deconvolved output deviates from the count or the
  • a 1D mass spectrum may optionally be corrected by using isotope distribution data to group deconvolved peaks into isotopic clusters in step 145. For example, if a particular group of signals is known to belong to the signal for a particular molecular ion species, then a few statistics such as center of mass, total intensity, and approximate number of carbons may be estimated. Such statistics are sufficient to determine the binomial structure of the isotope distribution, and hence the charge state and the true isotope positions.
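The binomial isotope structure can be sketched as follows. The sketch considers only ¹³C substitution, an assumption that ignores other heavy isotopes such as ¹⁵N, ¹⁸O, and ³⁴S. The ¹³C abundance and mass-difference constants are approximate reference values. The carbon count would come from the estimated statistics mentioned above; the function name is hypothetical.

```python
import numpy as np
from math import comb

C13_ABUNDANCE = 0.0107   # approximate natural 13C fraction
C13_SPACING = 1.00336    # 13C - 12C mass difference in Da

def isotope_envelope(mono_mz, charge, n_carbons, n_peaks=5):
    """Expected m/z positions and relative intensities of an isotope cluster.

    The number of heavy carbons is modeled as Binomial(n_carbons, C13_ABUNDANCE),
    and each extra 13C shifts m/z by C13_SPACING / charge.
    """
    p = C13_ABUNDANCE
    probs = np.array([comb(n_carbons, i) * p ** i * (1 - p) ** (n_carbons - i)
                      for i in range(n_peaks)])
    positions = mono_mz + np.arange(n_peaks) * C13_SPACING / charge
    return positions, probs / probs.max()    # intensities relative to the tallest peak
```

Inverting the spacing rule gives the charge state, z ≈ 1.00336 / (observed peak spacing in m/z).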
  • Figure 11 illustrates the process of scaling, deconvolving, and descaling.
  • Figure 11A illustrates a 1D spectrum before scaling (raw data). As can be seen in this figure, each cluster comprises multiple peaks.
  • Figure 11B illustrates a 1D spectrum after scaling but before deconvolution.
  • Figure 11C illustrates the scaled and deconvolved spectrum.
  • Figure 11D illustrates the scaled and deconvolved spectrum after descaling.
  • Subsequent to deconvolving 135, descaling 140, and correcting 145, a 1D mass spectrum may be converted into a 2D spectrum in step 147.
  • data are formed into 2D by continuously ionizing a sample such that a peak of interest is detected in more than one, more than two, more than three, more than four, more than 5, or preferably more than 10 spectra.
  • Conversion of a 1D spectrum to a 2D spectrum preferably involves the use of a programmable computer unit that lines up 1D spectra such that identical m/z values line up on the x-axis and sequential spectra line up on the y-axis.
  • Figure 12 illustrates the process of compiling multiple 1D spectra into a 2D spectrum.
  • Figure 12A illustrates the compilation of multiple 1D spectra such that similarly situated peaks are aligned vertically. Peaks 1, 2, 3, and 4 are exemplary peaks that align in more than 1, 2, or 3 sequential mass spectra.
  • Figure 12B illustrates a compiled 2D spectrum of more than 20 individual 1D spectra.
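Lining up sequential 1D spectra can be sketched as resampling each spectrum onto a shared m/z grid and stacking the results as rows. Linear interpolation and zero fill outside each spectrum's range are assumptions made for this sketch, and the function name is hypothetical.

```python
import numpy as np

def stack_spectra(spectra, mz_grid):
    """Build a 2D array: rows are sequential 1D spectra, columns a shared m/z grid.

    spectra: list of (mz, intensity) pairs in acquisition order, each with mz ascending.
    Each spectrum is linearly interpolated onto mz_grid so identical m/z values
    fall in the same column; values outside a spectrum's range are set to 0.
    """
    rows = [np.interp(mz_grid, mz, inten, left=0.0, right=0.0) for mz, inten in spectra]
    return np.vstack(rows)
```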
  • the 2D spectrum is subject to cluster analysis and collapsing of 2D peaks in step 150.
  • Cluster analysis 150 allows for the determination of 2D peaks in order to allow each isotope/charge state combination for a molecular ion species to be represented only once in the resulting data.
  • in step 150, isotopic peak clusters may be identified by statistical estimation of a model defined by the physical properties of the isotopic variation for a charge state of a species. Specifically, isotopic clusters are expected to have spacing between peaks approximately equal to the inverse of the number of charges (charge state) for that cluster. Relative intensities of peaks within an isotopic cluster are expected to be described approximately by a probability distribution, such as a binomial distribution for the number of heavy carbon isotopes in the isotopic mass creating each peak in the isotopic cluster. Actual intensities may further vary by noise in m/z location and/or intensity according to Poisson or other statistical models. These physically derived statistical relationships of peak spacing and relative intensity within an isotope cluster define a model with parameters that can be estimated by standard methodology such as maximum likelihood or least squares methods.
  • Parameters to be estimated could include various combinations of: m/z location of the maximum intensity peak (or a reference peak for the cluster); parameters of the binomial or other statistical model describing relative peak intensities; overall intensity of the cluster (e.g. absolute intensity of the maximum-intensity peak); charge state (z) or inverse charge state (1/z) giving peak spacing; and parameters of distributions describing noise in m/z location and/or peak intensity.
  • Particular parameters can be estimated from a subset of the data and then used for the remainder of the data.
  • The 2D clustering analysis of step 150 usually involves one or more parameters having a minimum or maximum threshold.
  • The thresholds allow a programmable machine or a person to make a binary decision: whether or not a peak belongs to a cluster. If a peak belongs to a cluster, the peak is further analyzed as described below. If a peak does not belong to a cluster, it may be removed from further analysis or subjected to further analysis as described below. Examples of parameters with minimum or maximum thresholds that may be used in 2D clustering analysis to decide whether a peak belongs to a particular cluster include, but are not limited to, noise level, signal-to-noise ratio, spacing between peaks, and atomic mass unit differences.
  • A peak can be included in an envelope if it lies less than 1, 2, 4, or 8 peak widths away from the envelope (or from another peak).
  • One parameter for clustering may be the noise level.
  • The threshold for identifying resolved peaks may be an intensity above a particular noise level or a multiple of that noise level, e.g., 1, 20, 40, or 80 times the noise level. With such a threshold, all peaks below the threshold are eliminated from further calculations, while all peaks above it are analyzed further.
  • Using a threshold parameter (e.g., noise level), the original dataset may be reduced in size by at least 1, 2, 3, or 4 orders of magnitude.
  • The parameter for identifying resolved peaks might be the difference in atomic mass units (m/z) between two peaks. If a second peak is separated from the first by more than a particular threshold, e.g., >1 m/z, then that second peak is deemed to lie outside the first peak's cluster. If, on the other hand, the separation is less than the threshold, the second peak is deemed to belong to the cluster of the first peak and is analyzed further as described below.
  • The parameter used to cluster isotope states may also be determined empirically, without reference to m/z differences. Other parameters and numerical values for such parameters may also be used, independently or in conjunction with any of the above.
  • Parameters and their numerical values may be chosen depending on the sample, the mass spectrometer, and the 2D spectrum output; their selection is generally known to a person of ordinary skill in the art. Typically, raw separations-mass spectra contain several orders of magnitude more data points than the number of molecular ion species detected in the sample.
  • In step 150, the 2D mass spectrum data may be converted into a list of 2D peaks. The conversion involves grouping peaks across 1D spectra that occur at the same or similar m/z values and representing each group of peaks by a single intensity value for the cluster. The 2D peaks represent the intensity contribution of the collective isotope states of each ion species.
  • In step 160, each 2D peak is de-isotoped.
  • De-isotoping is the process of summing the contributions of all isotope-state intensities and placing the sum either at the m/z position of the molecular ion species in which only carbon-12 occurs (the monoisotopic peak) or at the centroid of the molecular ion species, where the centroid is defined as the intensity-weighted average m/z over all observable isotopes.
  • The sum of all isotope-state intensities for one cluster is also referred to as the "total intensity" of the cluster. Deisotoping may be performed by any known method.
  • Deisotoping may be performed for a cluster of 1D deconvolved peaks that represent isotopes of a molecular ion species to within an accuracy of 0.1 m/z by summing their intensities and placing the sum at the determined monoisotopic m/z.
  • Deisotoping may be performed for a cluster of 1D deconvolved peaks that may or may not represent the molecular ion species to within an accuracy of 0.1 m/z by estimating an average m/z position as the intensity-weighted average of the peaks and placing the sum of the intensities at that m/z position.
  • After the deisotoping of step 160 has been completed, the deisotoped peaks are de-charged in step 165.
  • De-charging is the process of determining which clusters represent the different charge states of the same molecular species, calculating the molecular weight and/or the average molecular weight of that species, and placing the sum of the intensities of all its charge states at the determined molecular weight. In other words, de-charging collapses the charge states of each species into a single neutral mass component. De-charging may be performed by any known method.
  • A cluster whose underlying 1D deconvolved peaks represent the isotope states of the molecular ion species to within an accuracy of 0.1 m/z may be de-charged by determining the spacing between it and other 1D deconvolved peaks (isotope peaks of charge state z are spaced about 1/z apart).
  • A cluster whose underlying 1D deconvolved peaks may or may not represent the isotope states of the molecular ion species to within an accuracy of 0.1 m/z may be de-charged using the width of the lineshape and the width at half maximum of the collection of peaks. The specific algorithm for this case is not detailed.
  • Charge state may be assigned by maximizing a score that is a function of charge state and intensity, and that increases with the intensities of contiguous charge states that are also present.
  • The likelihood that a given neutral mass component is present may be calculated by building a table with possible neutral masses on the x-axis and possible charge states on the y-axis, and entering a score in each cell. In a second step, the table is analyzed to determine which molecular weights are most likely present in the spectrum. Additional methods to calculate the likelihood of presence of a given neutral mass component include those disclosed in Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001; and Ludwig Fahrmeir and Gerhard Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, 1994, both of which are incorporated herein by reference in their entirety for all purposes.
  • Neutral mass data is compiled into a list, as illustrated by Figure 13.
  • Figure 13 illustrates a list of data output that may be generated by the methods herein.
  • Figure 13, Column 1 lists neutral mass values.
  • Figure 13, Column 2 lists the centroids in separation time.
  • Figure 13, Column 3 lists the total intensity under the island of delta neutrals.
  • the present invention further contemplates alignment of multiple neutral mass lists or multiple 2D peak lists.
  • Alignment can be done using a programmable computer unit. Alignment of spectra in the separation time axis can be accomplished by estimating a linear or non-linear relationship between the separation times of particular peaks between any two samples. Peaks used for estimation of the alignment relationship can include known calibrants or known endogenous peaks that are consistently present. The separation time of known peaks (calibrants or endogenous) is estimated for each sample. A reference set of separation times for each known peak is either estimated as the average separation time, or is fixed at known reference values, or is chosen to be the separation times for a particular sample, or is chosen or estimated by some other method.
  • The relationship between the separation times of the known peaks in each sample and the reference locations of those peaks is estimated using statistical function-estimation methods, such as linear regression, piecewise linear regression, non-linear regression (e.g., polynomial regression or piecewise or local polynomial regression), or other function estimation methods. Once the relationship is estimated, it is used to adjust the separation times of the non-reference spectra to match those of the reference spectra.
  • This method has been described assuming that known peaks (calibrants or endogenous) are available; the methodology also includes estimating those peaks from the data. The resulting data may be used to find patterns in data from many samples using statistical or pattern recognition methods. Alternatively, if a pattern of interest is already known, the data may be used to assess the presence or absence of that pattern in a dataset.
  • a mammal is diagnosed as having (or not having) a disease state by testing a sample from said mammal for the presence (or absence) of a particular 2D peak or neutral mass.
  • a mammal may be tested for a disease state wherein the disease is selected from the group consisting of a neoplastic disease, an immunologic disease, an endocrine disease, a metabolic disease, or a cardiovascular disease. More preferably, the disease state is a neoplastic disease.
  • Neoplastic diseases include, but are not limited to, any condition associated with excessive cellular proliferation, such as brain cancer, breast cancer, bone cancer, cancer of the larynx, gallbladder, pancreas, rectum, parathyroid, thyroid, adrenal, neural tissue, head and neck, colon, stomach, bronchi, kidneys, basal cell carcinoma, squamous cell carcinoma of both ulcerating and papillary type, metastatic skin carcinoma, osteosarcoma, Ewing's sarcoma, reticulum cell sarcoma, myeloma, giant cell tumor, small-cell lung tumor, gallstones, islet cell tumor, primary brain tumor, acute and chronic lymphocytic and granulocytic tumors, hairy-cell tumor, adenoma, hyperplasia, medullary carcinoma, pheochromocytoma, mucosal neuromas, intestinal ganglioneuromas, hyperplastic corneal nerve tumor, marfanoid habitus tumor, Wilms' tumor, semi
  • a solid or liquid sample or biopsy is obtained from the mammal.
  • liquid samples include urine, nasal discharge, vaginal discharge, mucus, lymph, blood, serum, plasma, saliva, and tears.
  • a liquid sample such as serum is used.
  • the sample is then acidified. This denatures proteins in the sample.
  • The sample is then subjected to a separation to eliminate molecules of certain sizes.
  • The sample is then introduced into a mass spectrometer, where it is ionized, preferably by electrospray or nano-electrospray. After ionization, a mass analyzer separates the ions according to their mass and charge (m/z).
  • The 1D mass spectrum produced by the mass analyzer is provided to a computer system for analysis as described herein.
  • This invention also relates to a high-throughput automated system for determining the composition(s) of sample(s) and the abundance of such composition(s). Patterns of sample composition can subsequently be used for diagnosis, prognosis, and as research tools.
  • Figure 14 illustrates an overview of the high-throughput automated system.
  • one or more samples are collected.
  • samples can be collected from control and case individuals in conducting association studies.
  • the samples are then loaded onto an apparatus that includes a preparation/separation unit 605 and a mass spectrometer unit 610.
  • the preparation/separation unit 605 is preferably a microfluidic chip that can perform sample preparation (e.g., acidification) and separation (e.g., electrophoresis).
  • the preparation/separation unit 605 and the mass spectrometer unit 610 are coupled for high-throughput screening.
  • The preparation/separation unit 605 of the fluidic device has an electrospray interface 607.
  • the mass spectrometer unit 610 and, optionally, the preparation/separation unit 605 are connected online via an interface to a computer unit 615.
  • the computer unit 615 can include a program to control sample preparation, sample separation, ionization, and mass analysis.
  • The computer unit 615 preferably includes means for storing measured values and data.
  • The computer unit 615 can also compare newly measured values with previously measured values already stored.
  • the computer is connected to the other units online and has an interface system as is illustrated in Figure 14.
  • the separation device is a capillary electrophoresis device. In other preferred embodiments, the separation device is a microfluidics chip.
  • A separation device preferably has high separation efficiency, permitting high-resolution separations in less than 24 hours, less than 2 hours, less than 30 minutes, more preferably less than 15 minutes, and more preferably less than 10 minutes.
  • The mass spectrometer device and computer device provide prompt information regarding a given sample (e.g., quality and quantity) and can be used for quick diagnosis, prognosis, and analysis. For example, markers for early stages of a disease or for genetic predisposition may be identified using the methods and devices herein. Such markers can then be used for diagnosis and prognosis of disease.
  • a sample may be analyzed in less than 15 minutes, more preferably less than 10 minutes, or more preferably less than 5 minutes.
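
As a minimal sketch of the step 150 isotope-cluster model, the Python example below fits a carbon-only binomial isotope pattern to observed peaks by least squares over a small set of candidate charge states. It assumes that the monoisotopic m/z and an approximate carbon count are already known, uses the 13C minus 12C mass difference (about 1.003355 Da) divided by z as the peak spacing, and ignores other elements' isotopes as well as the m/z and intensity noise models mentioned in the text. The names `isotope_pattern`, `fit_cluster`, `P13`, and their parameters are hypothetical.

```python
from math import comb

import numpy as np

C13_DELTA = 1.003355  # mass difference between 13C and 12C (Da)
P13 = 0.0107          # approximate natural abundance of 13C (assumption)


def isotope_pattern(n_carbons: int, n_peaks: int = 5, p13: float = P13) -> np.ndarray:
    """Binomial probabilities of 0..n_peaks-1 heavy-carbon atoms (carbon-only model)."""
    return np.array([
        comb(n_carbons, k) * p13**k * (1.0 - p13) ** (n_carbons - k)
        for k in range(n_peaks)
    ])


def fit_cluster(mz, intensity, mono_mz, n_carbons, charges=(1, 2, 3, 4),
                tol=0.02, n_peaks=5):
    """Return (z, scale, sse) for the charge state whose binomial pattern,
    scaled by a least-squares amplitude, best matches the observed peaks."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    template = isotope_pattern(n_carbons, n_peaks)
    best = None
    for z in charges:
        # Expected isotope peak positions: spacing of roughly 1/z on the m/z axis.
        expected = mono_mz + np.arange(n_peaks) * C13_DELTA / z
        observed = np.zeros(n_peaks)
        for i, pos in enumerate(expected):
            dist = np.abs(mz - pos)
            j = int(np.argmin(dist))
            if dist[j] <= tol:  # missing peaks are treated as zero intensity
                observed[i] = intensity[j]
        # Closed-form least-squares amplitude for observed ~ scale * template.
        scale = float(template @ observed / (template @ template))
        sse = float(np.sum((observed - scale * template) ** 2))
        if best is None or sse < best[2]:
            best = (z, scale, sse)
    return best
```

Treating z as a discrete search variable and the amplitude as a closed-form least-squares estimate keeps the fit simple; a maximum-likelihood version would replace the squared-error criterion with a likelihood under the chosen noise model.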
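
The noise-level threshold used in step 150 can be sketched as a simple intensity filter. The text does not say how the noise level itself is obtained, so this example assumes a robust estimate (the scaled median absolute deviation of the intensities); `estimate_noise`, `threshold_peaks`, and the default multiple of 20 are illustrative choices, not the patent's procedure.

```python
import numpy as np


def estimate_noise(intensity) -> float:
    """Robust noise estimate: 1.4826 * median absolute deviation (an assumption)."""
    x = np.asarray(intensity, dtype=float)
    return float(1.4826 * np.median(np.abs(x - np.median(x))))


def threshold_peaks(mz, intensity, noise_level, multiple=20.0):
    """Keep only peaks whose intensity exceeds multiple * noise_level."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = intensity > multiple * noise_level
    return mz[keep], intensity[keep]
```

Because every peak below the threshold is discarded, comparing the number of retained peaks with the original count gives a direct measure of how much the dataset shrinks.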
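
The spacing criterion of the step 150 clustering (a peak farther than a threshold from its neighbor starts a new cluster) amounts to single-linkage grouping along the m/z axis. In this sketch the default gap is 1.1 rather than exactly 1 m/z. That is an assumption made because singly charged 13C isotope peaks are about 1.003 m/z apart and would otherwise be split. `group_by_spacing` is a hypothetical name.

```python
import numpy as np


def group_by_spacing(mz, intensity, max_gap=1.1):
    """Group peaks into clusters: sorted neighbours closer than max_gap share a cluster."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    if mz.size == 0:
        return []
    order = np.argsort(mz)
    mz, intensity = mz[order], intensity[order]
    clusters = [[0]]
    for i in range(1, mz.size):
        if mz[i] - mz[i - 1] <= max_gap:
            clusters[-1].append(i)   # within threshold: same cluster
        else:
            clusters.append([i])     # gap too large: start a new cluster
    return [(mz[c], intensity[c]) for c in clusters]
```

The peak-width criterion for envelope membership fits the same function by setting `max_gap` to k times the peak width, with k in {1, 2, 4, 8}.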
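
Converting the 2D data into a list of 2D peaks (step 150) can be sketched as grouping 1D peaks that share an m/z (within a tolerance) and are contiguous in separation time, then reporting one intensity per group. The tolerances, the time-gap rule, and the intensity-weighted centroids are assumptions for illustration. `build_2d_peaks` and `_split_runs` are hypothetical helpers, and intensities are assumed to be positive.

```python
import numpy as np


def _split_runs(indices, values, max_gap):
    """Sort indices by values and split wherever consecutive values differ by > max_gap."""
    indices = sorted(indices, key=lambda i: values[i])
    runs = [[indices[0]]]
    for prev, cur in zip(indices[:-1], indices[1:]):
        if values[cur] - values[prev] <= max_gap:
            runs[-1].append(cur)
        else:
            runs.append([cur])
    return runs


def build_2d_peaks(times, mzs, intensities, mz_tol=0.01, max_time_gap=2.0):
    """Return sorted (mz_centroid, time_centroid, total_intensity) tuples, one per 2D peak."""
    t = np.asarray(times, dtype=float)
    m = np.asarray(mzs, dtype=float)
    y = np.asarray(intensities, dtype=float)
    if y.size == 0:
        return []
    peaks = []
    # First group 1D peaks that occur at the same or similar m/z ...
    for mz_group in _split_runs(range(m.size), m, mz_tol):
        # ... then split each m/z group where the separation-time gap is too large.
        for run in _split_runs(mz_group, t, max_time_gap):
            w = y[run]
            peaks.append((float(np.average(m[run], weights=w)),
                          float(np.average(t[run], weights=w)),
                          float(w.sum())))
    return sorted(peaks)
```

Splitting along time as well as m/z keeps two species that share an m/z but elute at different separation times from being merged into a single 2D peak.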
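
The two de-isotoping variants of step 160 differ only in where the total intensity is placed. This sketch assumes that, when no monoisotopic position is supplied, the lowest-m/z peak of the cluster is the monoisotopic (carbon-12 only) peak. `deisotope` is a hypothetical name.

```python
import numpy as np


def deisotope(mz, intensity, mode="centroid", mono_mz=None):
    """Return (position, total_intensity) for one isotope cluster.

    mode="monoisotopic": place the summed intensity at the monoisotopic m/z
    (mono_mz if given, else the lowest m/z in the cluster, an assumption).
    mode="centroid": place it at the intensity-weighted average m/z.
    """
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    total = float(y.sum())
    if mode == "monoisotopic":
        position = float(mz.min()) if mono_mz is None else float(mono_mz)
    elif mode == "centroid":
        position = float(np.dot(mz, y) / total)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return position, total
```

The monoisotopic mode suits clusters whose peaks are accurate to 0.1 m/z. The centroid mode does not depend on identifying the carbon-12 peak, which is why it suits less accurate clusters.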
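
De-charging in step 165 can be sketched in two pieces: inferring the charge state from isotope spacing (about 1/z, per the step 150 model) and collapsing peaks of different charge states onto one neutral mass. The sketch assumes positive-mode protonated ions [M+zH]^z+, so the neutral mass is M = z(m/z - 1.007276). The helper names, the 0.05 Da grouping tolerance, and the intensity-weighted averaging are assumptions.

```python
C13_DELTA = 1.003355  # 13C - 12C mass difference (Da)
PROTON = 1.007276     # proton mass (Da); assumes [M+zH]z+ ions


def charge_from_spacing(delta_mz: float) -> int:
    """Isotope peaks of charge z are about C13_DELTA / z apart on the m/z axis."""
    return max(1, round(C13_DELTA / delta_mz))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass of a protonated ion observed at mz with charge z."""
    return z * (mz - PROTON)


def decharge(peaks, tol=0.05):
    """Collapse (mz, z, intensity) peaks into (neutral_mass, summed_intensity) components."""
    items = sorted((neutral_mass(mz, z), inten) for mz, z, inten in peaks)
    groups = []
    for mass, inten in items:
        if groups and mass - groups[-1][-1][0] <= tol:
            groups[-1].append((mass, inten))  # same species, another charge state
        else:
            groups.append([(mass, inten)])
    components = []
    for group in groups:
        total = sum(inten for _, inten in group)
        mean_mass = sum(mass * inten for mass, inten in group) / total
        components.append((mean_mass, total))
    return components
```

Each returned component pairs a molecular weight with the summed intensity of all its charge states, which is the neutral mass component the text describes.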
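
The neutral-mass-by-charge-state table used in step 165 can be sketched as follows. Each cell holds the intensity of an observed peak matching the candidate mass at that charge (zero if none). Each mass is then scored by the largest summed intensity over a run of contiguous charge states, which is one simple way to build a score that increases with the intensities of contiguous charge states. The candidate grid, tolerance, proton-adduct assumption, and function names are hypothetical, and the score is a heuristic rather than a formal likelihood.

```python
import numpy as np

PROTON = 1.007276  # proton mass (Da); assumes [M+zH]z+ ions


def score_table(peak_mz, peak_int, candidate_masses, charges=range(1, 21), tol=0.02):
    """Rows: charge states; columns: candidate neutral masses; cell = matched intensity."""
    peak_mz = np.asarray(peak_mz, dtype=float)
    peak_int = np.asarray(peak_int, dtype=float)
    charges = list(charges)
    table = np.zeros((len(charges), len(candidate_masses)))
    for r, z in enumerate(charges):
        for c, mass in enumerate(candidate_masses):
            dist = np.abs(peak_mz - (mass / z + PROTON))
            j = int(np.argmin(dist))
            if dist[j] <= tol:
                table[r, c] = peak_int[j]
    return charges, table


def mass_scores(table):
    """Score each candidate mass by its best run of contiguous, non-empty charge states."""
    scores = np.zeros(table.shape[1])
    for c in range(table.shape[1]):
        best = run = 0.0
        for value in table[:, c]:
            run = run + value if value > 0 else 0.0  # a gap in charge states resets the run
            best = max(best, run)
        scores[c] = best
    return scores
```

Rewarding contiguous charge states helps separate a true mass from its harmonics: a species observed at z = 8, 9, 10 also matches half its mass at z = 4 and 5, but that harmonic collects fewer contiguous charges.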
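
The separation-time alignment described for multiple neutral mass or 2D peak lists reduces to fitting a function from a sample's known-peak times to the reference times, then applying it to all of that sample's peaks. This sketch uses a least-squares polynomial via `numpy.polyfit` (degree 1 is ordinary linear regression) and `numpy.interp` as a piecewise-linear alternative. The function names are hypothetical. Note that `numpy.interp` holds values constant outside the range of the known peaks rather than extrapolating.

```python
import numpy as np


def fit_alignment(sample_times, reference_times, degree=1):
    """Least-squares polynomial mapping a sample's separation times onto reference times."""
    coeffs = np.polyfit(np.asarray(sample_times, dtype=float),
                        np.asarray(reference_times, dtype=float), degree)
    return np.poly1d(coeffs)


def piecewise_align(times, sample_known, reference_known):
    """Piecewise-linear mapping through the known peaks (sample_known must be increasing)."""
    return np.interp(np.asarray(times, dtype=float), sample_known, reference_known)
```

For example, `fit_alignment([10.0, 20.0, 30.0], [12.0, 22.5, 33.0])` returns a callable that maps any separation time in that sample onto the reference time scale. Raising `degree` gives polynomial regression, at the cost of less stable behavior outside the calibrated range.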

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Molecular Biology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention relates to methods and apparatuses for modeling lineshapes of mass spectrometry data. Ions can be modeled with an initial distribution that models molecules as having multiple positions and/or energies before traveling through the mass spectrometer. These initial distributions can be advanced through time-of-flight functions. A fit can be performed between the modeled lineshapes and empirical data. Filtering can substantially reduce the size of the empirical data, remove noise, compress data, and recover lost and/or corrupted data.
PCT/US2004/017908 2003-06-12 2004-06-04 Methods for accurate component intensity extraction from separations-mass spectrometry data WO2004111609A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10/462,228 US7072772B2 (en) 2003-06-12 2003-06-12 Method and apparatus for modeling mass spectrometer lineshapes
US10/462,228 2003-06-12
US10/846,996 2004-05-13
US10/846,996 US20050255606A1 (en) 2004-05-13 2004-05-13 Methods for accurate component intensity extraction from separations-mass spectrometry data

Publications (2)

Publication Number Publication Date
WO2004111609A2 WO2004111609A2 (fr) 2004-12-23
WO2004111609A3 WO2004111609A3 (fr) 2005-07-14

Family

ID=33555136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/017908 WO2004111609A2 (fr) 2003-06-12 2004-06-04 Methods for accurate component intensity extraction from separations-mass spectrometry data

Country Status (1)

Country Link
WO (1) WO2004111609A2 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5247175A (en) * 1992-05-27 1993-09-21 Finnigan Corporation Method and apparatus for the deconvolution of unresolved data
US5300771A (en) * 1992-06-02 1994-04-05 Analytica Of Branford Method for determining the molecular weights of polyatomic molecules by mass analysis of their multiply charged ions
US6300626B1 (en) * 1998-08-17 2001-10-09 Board Of Trustees Of The Leland Stanford Junior University Time-of-flight mass spectrometer and ion analysis
US6675104B2 (en) * 2000-11-16 2004-01-06 Ciphergen Biosystems, Inc. Method for analyzing mass spectra

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1879684A4 (fr) * 2005-04-11 2009-07-15 Cerno Bioscience Llc Chromatographic and mass spectral data analysis
EP1879684A2 (fr) * 2005-04-11 2008-01-23 Cerno Bioscience LLC Chromatographic and mass spectral data analysis
CN102608061A (zh) * 2012-03-21 2012-07-25 西安交通大学 Improved TR method for extracting feature variables from multi-component gas Fourier-transform infrared spectra
WO2015040381A1 (fr) * 2013-09-23 2015-03-26 Micromass Uk Limited Peak assessment for mass spectrometers
US20160217986A1 (en) * 2013-09-23 2016-07-28 Micromass Uk Limited Peak Assessment for Mass Spectrometers
GB2532820A (en) * 2014-06-11 2016-06-01 Micromass Ltd Flagging ADC Coalescence
US9928999B2 (en) 2014-06-11 2018-03-27 Micromass Uk Limited Flagging ADC coalescence
GB2532820B (en) * 2014-06-11 2019-02-06 Micromass Ltd Flagging ADC Coalescence
US9496126B2 (en) 2015-04-17 2016-11-15 Thermo Finnigan Llc Systems and methods for improved robustness for quadrupole mass spectrometry
DE102015010602A1 (de) * 2015-08-18 2017-02-23 Hochschule Aschaffenburg Method for analyzing a data set of a time-of-flight mass spectrometry measurement, and a device
CN106530259B (zh) * 2016-11-24 2019-10-18 天津大学 All-in-focus image reconstruction method based on multi-scale defocus information
CN106530259A (zh) * 2016-11-24 2017-03-22 天津大学 All-in-focus image reconstruction method based on multi-scale defocus information
CN108717685A (zh) * 2018-05-14 2018-10-30 西北大学 Method and system for enhancing image resolution
US11211237B2 (en) * 2019-01-30 2021-12-28 Bruker Daltonik Gmbh Mass spectrometric method for determining the presence or absence of a chemical element in an analyte
WO2020243643A1 (fr) * 2019-05-31 2020-12-03 President And Fellows Of Harvard College Systems and methods for MS1-based mass identification including super-resolution techniques
CN111982949A (zh) * 2020-08-19 2020-11-24 东华理工大学 Method for separating overlapping EDXRF spectral peaks by combining the fourth derivative with a cubic spline wavelet transform
CN111982949B (zh) * 2020-08-19 2022-06-07 东华理工大学 Method for separating overlapping EDXRF spectral peaks by combining the fourth derivative with a cubic spline wavelet transform
GB2607378A (en) * 2021-06-02 2022-12-07 Bruker Scient Llc Physical-chemical property scoring for structure identification in ion spectrometry
CN114487072A (zh) * 2021-12-27 2022-05-13 浙江迪谱诊断技术有限公司 Time-of-flight mass spectrometry peak fitting method
CN114487072B (zh) * 2021-12-27 2024-04-12 浙江迪谱诊断技术有限公司 Time-of-flight mass spectrometry peak fitting method

Also Published As

Publication number Publication date
WO2004111609A3 (fr) 2005-07-14

Similar Documents

Publication Publication Date Title
US20050255606A1 (en) Methods for accurate component intensity extraction from separations-mass spectrometry data
US7493225B2 (en) Method for calibrating mass spectrometry (MS) and other instrument systems and for processing MS and other data
US10217619B2 (en) Methods for data-dependent mass spectrometry of mixed intact protein analytes
US8975577B2 (en) System and method for grouping precursor and fragment ions using selected ion chromatograms
EP2447980B1 (fr) Method of generating a mass spectrum having improved resolving power
US9337009B2 (en) Exponential scan mode for quadrupole mass spectrometers to generate super-resolved mass spectra
EP2775509B1 (fr) Methods and apparatus for decomposing tandem mass spectra generated by ion fragmentation
US20130131998A1 (en) Methods and Apparatus for Identifying Mass Spectral Isotope Patterns
EP2641260B1 (fr) Controlling hydrogen-deuterium exchange on a spectrum-by-spectrum basis
WO2004111609A2 (fr) Methods for accurate component intensity extraction from separations-mass spectrometry data
CN108982729B (zh) System and method for extracting mass traces
CN113495112B (zh) Mass spectrometry method and mass spectrometry system
US7072772B2 (en) Method and apparatus for modeling mass spectrometer lineshapes
EP3523818B1 (fr) System and method for real-time isotope identification
EP3542292B1 (fr) Techniques for mass analysis of a complex sample
Monchamp et al. Signal processing methods for mass spectrometry
Payne Profiling the metabolome using Fourier transform ion cyclotron resonance mass spectrometry, optimised signal processing, noise filtering and constraints methods

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase