US7653496B2 - Feature selection in mass spectral data - Google Patents
- Publication number
- US7653496B2 (application US11/670,955)
- Authority
- US
- United States
- Prior art keywords
- mass
- sample
- samples
- spectral data
- mass spectral
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
- G01N27/622—Ion mobility spectrometry
- G01N27/623—Ion mobility spectrometry combined with mass spectrometry
Definitions
- High throughput MS is a powerful technique in biomarker discovery.
- the use of this technique is complicated by a number of factors.
- Biological samples are very complex, and often contain hundreds to thousands of compounds, and analysis of these samples can often be difficult.
- the differential comparison of LC-MS data from different biological samples generates complex datasets, and presents significant data processing challenges.
- the analysis is time-consuming and there is often significant noise and variability that is not properly accounted for.
- Current methods to eliminate noise and detect mass spectral peaks use an ad hoc approach, and do not use any a priori or learned information with regard to peak shape, retention time, or relationship among peaks.
- Statistical methods used to subtract background and reduce noise often remove relevant information in addition to filtering out noise and irrelevant information. The resulting data sets are not suitable for downstream analysis during biomarker discovery.
- FIG. 1 is a diagram representing a method for the differential analysis of two complex biological samples using one method of this invention.
- FIG. 2 shows the effect of noise reduction in an MS spectrum. The pattern before noise reduction is on the left, while that after noise reduction is on the right.
- FIG. 3 shows a graphic user interface for filtering the data according to the user's choice.
- FIG. 4 is a screen capture showing background-subtracted mass spectra and TIC from a salt-containing cellular extract.
- FIG. 5 shows the total number of features as a function of m/z and retention time. From the top: (a) no filtering, (b) features present in all samples and with at least 2× variation in relative response, (c) Log ratio versus retention time, and (d) Log/Log plot.
- FIG. 6 (a) (left panel) graphical output of chemical identification in profiler; (b) (upper right panel) zoom of multiple charge components of insulin; and (c) (lower right panel) deconvoluted mass spectrum.
- the present invention relates to, inter alia, methods for differential profiling of samples.
- some embodiments of the methods of the present invention integrate chemical information with differential expression analysis and statistical methods to identify or differentiate expression level changes in a biological sample.
- the methods of the present invention use a molecular feature extraction process to group mass peaks in mass spectrometric data sets.
- the peaks are grouped according to particular chemical features or properties.
- Extracted molecular feature information is then normalized and statistically or visually analyzed to identify differentially expressed features.
- the methods of the present invention combine chemical information with differential expression analysis, thereby significantly reducing noise.
- using chemically relevant information to extract molecular features also reduces the complexity of the input data for the differential expression analysis.
- the present invention provides improved methods for rapid and accurate identification of differentially expressed entities in biological samples. Therefore, the methods of the present invention can be used to compare complex sets of data for various samples, and are particularly useful in biomarker discovery.
- differential profiling or “differential display” refers to investigating the differences between the mass spectral data for a first sample and those for a second sample.
- differential profiling can be performed for more than two sets of data, namely comparing the mass spectral data of three or more samples and investigating the differences among them. It should be noted that sometimes differential profiling is performed using sample sets, each of which comprises multiple samples. For instance, a user may wish to compare the molecules in the sera of breast cancer patients and those in the sera of normal controls. Thus, serum samples from multiple breast cancer patients are obtained, and serum samples from multiple normal controls are also collected.
- Each sample is analyzed, and differential profiling is conducted to compare the mass spectral data of the samples in the patient group to the mass spectral data of the control group.
- a differential display image or plot shows the differences between or among the samples, with respect to abundance of a particular component, presence of a particular chemical species, or changes in expression level of a particular component.
- sample as used herein relates to a material or complex mixture of materials, typically, although not necessarily, in fluid form.
- Samples of the present invention include, but are not limited to, biological samples obtained from natural biological sources, such as cells or tissues, or plants.
- the samples of the present invention include, but are not limited to, complex biological samples containing many different components or metabolites, such as urine or serum, for example.
- the samples of the present invention also include complex mixtures derived from non-animal sources, such as complex extracts derived from plants.
- the sample may also be non-biological, such as environmental samples (water, air, rain, etc.).
- spectral peak refers to a peak in the output from any type of spectral analysis instrument, and is known in the art. In a given analysis, peaks can represent one or more components in a sample.
- a “mass spectral peak” is a spectral peak in a mass spectrum.
- 3-D peak refers to a cluster of LC-MS (or GC-MS, CE-MS, etc.) signals that have the same m/z value (subject to variations in measurement), and similar retention time values.
- the signals could be either raw profile spectral pixels or spectral peaks.
- the spectral peaks that are related to the same compound are grouped together. This can be achieved by analyzing the features of the various spectral peaks. Specifically, spectral peaks of similar retention time (RT) and mass to charge ratio (m/z) are grouped, optionally also taking into consideration other properties such as isotope clustering, mass, adduct formation, dimer formation, and/or trimer formation. An exemplary grouping is demonstrated in Table 1 below, which shows eight feature groups that elute at approximately the same time in a liquid chromatography column.
- the first broad group in Table 1 relates to a compound (M) and comprises three subgroups.
- the first subgroup comprises M associated with a hydrogen (M+H), and four other species that are also M+H but differ in isotope compositions (M+H+1, M+H+2, M+H+3, M+H+4). Isotope clustering is explained in more detail in Example 1 of the present application.
- the second subgroup comprises the sodium adducts of M (M+Na), including species with different isotope compositions.
- the third subgroup comprises dimers of M (2M) with sodium, again including species with different isotope compositions.
- the other seven groups are similarly listed.
- the compound (also called molecular feature) that each group relates to is designated M and shown in bold with its average RT, mass, and cumulative abundance.
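The relationship between the subgroups described above (M+H, M+Na, and the sodium-bound dimer) can be illustrated with a minimal Python sketch. It assumes singly charged positive ions and co-eluting peaks; the adduct table, the function names, and the 0.01 Da tolerance are hypothetical choices for illustration and are not taken from the patent.

```python
# Ion masses in Da (neutral atom mass minus one electron).
PROTON = 1.007276       # H+
SODIUM_ION = 22.989221  # Na+

# Hypothetical adduct table: label -> (number of M units, mass of attached ion).
ADDUCTS = {
    "M+H": (1, PROTON),
    "M+Na": (1, SODIUM_ION),
    "2M+Na": (2, SODIUM_ION),
}


def neutral_mass(mz, adduct):
    """Neutral mass M implied by a singly charged peak under one adduct hypothesis."""
    n_units, ion_mass = ADDUCTS[adduct]
    return (mz - ion_mass) / n_units


def explain_coeluting_peaks(mzs, tol=0.01):
    """Find the neutral mass that explains the most co-eluting peaks.

    Returns (neutral_mass, {mz: adduct_label}).
    """
    candidates = [(neutral_mass(mz, a), mz, a) for mz in mzs for a in ADDUCTS]
    best_mass, best_labels = None, {}
    for mass, _, _ in candidates:
        # Collect every peak that agrees with this candidate mass.
        labels = {mz: a for m, mz, a in candidates if abs(m - mass) <= tol}
        if len(labels) > len(best_labels):
            best_mass, best_labels = mass, labels
    return best_mass, best_labels
```

For three co-eluting peaks at m/z 181.070676, 203.052621, and 383.116021, the sketch assigns them to M+H, M+Na, and 2M+Na of a single compound with a neutral mass near 180.0634 Da.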
- some embodiments of the present invention provide a method for analyzing mass spectral data of a sample, comprising dividing the mass spectral data into feature groups, each feature group relating to a compound, wherein said dividing is performed based on retention time, mass to charge ratio, and optionally at least one property selected from the group consisting of isotope clustering, mass, adduct formation, dimer formation, and trimer formation.
- the method may further comprise comparing the properties of any feature group to those of known compounds in order to identify the compound(s) in the feature group.
- the present invention further provides methods for differential profiling of multiple samples, by extracting chemically relevant information, followed by differential expression analysis.
- Chemically relevant information is extracted using a computerized algorithm that automatically extracts unique features from a data set.
- the extracted features are then differentially analyzed using a combination of statistical and visual methods.
- each mass spectral data set obtained from a sample can be analyzed by the feature grouping process described above.
- the grouped data derived from multiple samples are then compared to each other or one another to identify compounds that are present in one or some, but not all, of the samples, or compounds of which the abundance changes significantly between or among the samples.
- the results can be used, for example, to identify compounds that increase or decrease in a diseased sample versus the normal control; an animal that has been fed with a carcinogen versus the normal control; a water sample from a potentially polluted area versus an unpolluted area; and the like.
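The comparison of grouped data across samples can be sketched as follows. This is a hedged illustration only: features are assumed to be `(neutral_mass, retention_time, abundance)` tuples with positive abundances, and the matching tolerances and the 2-fold threshold are hypothetical parameters rather than values prescribed by the patent.

```python
import math


def differential_profile(sample_a, sample_b, mass_tol=0.01, rt_tol=0.2, min_fold=2.0):
    """Compare two feature lists.

    Returns (changed, only_a, only_b), where `changed` holds
    (mass, rt, log2 ratio of B over A) for matched features whose
    abundance differs by at least `min_fold`.
    """
    changed, only_a = [], []
    unmatched_b = list(sample_b)
    for mass, rt, abundance in sample_a:
        # First feature in B within both the mass and retention-time tolerances.
        match = next(
            (f for f in unmatched_b
             if abs(f[0] - mass) <= mass_tol and abs(f[1] - rt) <= rt_tol),
            None,
        )
        if match is None:
            only_a.append((mass, rt))
            continue
        unmatched_b.remove(match)
        log_ratio = math.log2(match[2] / abundance)
        if abs(log_ratio) >= math.log2(min_fold):
            changed.append((mass, rt, log_ratio))
    only_b = [(mass, rt) for mass, rt, _ in unmatched_b]
    return changed, only_a, only_b
```

Features returned in `only_a` or `only_b` correspond to compounds present in one sample but not the other, while `changed` lists compounds whose abundance differs markedly between the two.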
- the data can be filtered by criteria determined by the user.
- the user can choose a specific range of retention time and/or mass to include in the analysis.
- the user can choose the isotope pattern, charge state, abundance, etc. of the spectral peaks that the user wishes to include in the analysis. For example, if the user only wants to look for peptide markers that change in abundance between or among the samples being analyzed, the peptide filter can be selected (see Example 1 and FIG. 3 ). If the user only wants small molecules to be analyzed or displayed, a mass range can be prescribed.
- A simplified representation of certain embodiments of the method is shown in FIG. 1 .
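User-defined filtering of this kind could look like the sketch below. The `Feature` record and the `filter_features` signature are hypothetical names introduced for illustration; the patent does not specify a data model or an interface.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    mass: float       # neutral mass in Da
    rt: float         # retention time in minutes
    charge: int       # charge state
    abundance: float  # e.g. summed peak volume


def filter_features(features, rt_range=None, mass_range=None,
                    charges=None, min_abundance=0.0):
    """Keep only the features that satisfy every criterion the user supplied."""
    kept = []
    for f in features:
        if rt_range is not None and not (rt_range[0] <= f.rt <= rt_range[1]):
            continue
        if mass_range is not None and not (mass_range[0] <= f.mass <= mass_range[1]):
            continue
        if charges is not None and f.charge not in charges:
            continue
        if f.abundance < min_abundance:
            continue
        kept.append(f)
    return kept
```

Restricting the analysis to small molecules, for instance, amounts to passing a `mass_range` such as `(50.0, 1000.0)`; the bounds here are arbitrary example values.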
- a biological sample containing a complex mixture of chemical components is obtained and then analyzed by any of one or more spectral analysis methods, such as LC-MS, for example.
- Spectral peaks obtained from the LC-MS analysis are grouped according to m/z ratio and retention time into 3-D peaks.
- the peak volume can be measured for each 3-D peak.
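One way to picture the grouping of signals into 3-D peaks is the sketch below. It assumes each signal is an `(mz, retention_time, intensity)` tuple with positive intensity, bins signals by m/z, splits each bin wherever the retention-time gap is large, and approximates the peak volume as the summed intensity. The tolerances and the volume approximation are assumptions made for illustration, not the patented algorithm.

```python
def group_3d_peaks(signals, mz_tol=0.01, rt_gap=0.1):
    """Cluster (mz, rt, intensity) signals into 3-D peaks."""
    # Step 1: bin signals whose m/z values lie within mz_tol of their neighbor.
    mz_bins = []
    for sig in sorted(signals):
        if mz_bins and sig[0] - mz_bins[-1][-1][0] <= mz_tol:
            mz_bins[-1].append(sig)
        else:
            mz_bins.append([sig])

    peaks = []
    for mz_bin in mz_bins:
        # Step 2: within one m/z bin, split on large retention-time gaps.
        mz_bin.sort(key=lambda s: s[1])
        runs = [[mz_bin[0]]]
        for sig in mz_bin[1:]:
            if sig[1] - runs[-1][-1][1] <= rt_gap:
                runs[-1].append(sig)
            else:
                runs.append([sig])
        # Step 3: summarize each run as one 3-D peak.
        for run in runs:
            volume = sum(s[2] for s in run)
            peaks.append({
                "mz": sum(s[0] * s[2] for s in run) / volume,  # intensity-weighted m/z
                "rt": max(run, key=lambda s: s[2])[1],         # apex retention time
                "volume": volume,                              # summed intensity
            })
    return peaks
```

Signals that share an m/z value but elute minutes apart therefore end up in separate 3-D peaks, which matches the definition of a 3-D peak as a cluster with the same m/z and similar retention times.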
- the molecular features are extracted as clusters of these 3-D peaks, using an algorithm that filters out irrelevant features while associating or grouping 3-D peaks according to chemically relevant information.
- 3-D peaks can be associated as isotope clusters, dimers, adducts, etc.
- Peaks can be grouped according to chemical information available from a database, or based on user input.
- the extracted molecular features are listed in the output and can be used to identify different chemical components (or markers).
- the samples are then differentially profiled by comparing the extracted features (i.e. the filtered output) with the filtered output for a second biological sample, or by comparing the filtered output with molecular features extracted from known compounds.
- the present disclosure describes a method for differential profiling of highly complex biological samples.
- Samples of the present invention include, but are not limited to, biological samples obtained from natural biological sources, such as cells or tissues.
- the sample is a highly complex mixture of different components.
- the components may include, but are not limited to, proteins, metabolites, amino acids, glycerol esters, fatty acids, etc., and any derivatives or degradation products thereof.
- the sample contains hundreds to thousands of components, spanning a broad range of compositions and concentrations.
- the sample is a complex mixture of peptides or nucleic acids.
- the sample is a complex mixture of metabolites present in a biological sample.
- the sample is a complex mixture of small molecules.
- the present disclosure describes methods for analyzing a complex mixture of various components.
- the analysis begins with separating portions of the sample into multiple components.
- the sample can be separated using any of a number of separation techniques including, but not limited to, ion exclusion, ion exchange, normal/reversed phase partition, size exclusion, ligand exchange, liquid/gel phase isoelectric focusing, adsorption chromatography, and liquid chromatography.
- the components are separated by liquid chromatography (i.e. LC, such as HPLC). Each separated component is associated with a specific retention time. Variation in retention time can be reduced using flow-controlled capillary HPLC.
- the separated components are then further analyzed by mass spectrometry (MS) methods, to determine the identity of the separated components.
- the MS analysis can be performed by using any MS method or instrumentation.
- the MS analysis uses a combination of time-of-flight mass spectrometry (TOF-MS) and electrospray ionization mass spectrometry (ESI-MS).
- ESI methods are preferred because ions are generated directly from solution and therefore, ESI can be readily combined with liquid-phase separation methods, such as HPLC.
- the charge mode of ESI can be varied according to the sample being analyzed. For example, samples derived from plants and cellular extracts are typically analyzed in the positive ion ESI mode. On the other hand, samples containing mostly organic acids (such as a urine sample, for example) are typically analyzed in the negative ion ESI mode.
- each separated component of the sample yields multiple spectral peaks with a measured mass-to-charge (m/z) ratio. Variations in mass measurement can be reduced by normalization against an internal reference standard.
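Normalization against an internal reference standard can be illustrated with a single-point correction: every measured m/z is rescaled by the ratio of the reference ion's known m/z to its measured m/z. This is a deliberately simple sketch under that assumption; the function names are hypothetical, and real instruments may use multi-point or more elaborate calibration.

```python
def ppm_error(measured, true):
    """Mass error in parts per million."""
    return (measured - true) / true * 1e6


def correct_mz(measured_mzs, measured_ref, true_ref):
    """Rescale measured m/z values so the reference ion lands on its known m/z."""
    scale = true_ref / measured_ref
    return [mz * scale for mz in measured_mzs]
```

If the reference ion is observed 10 ppm too high, every m/z in the same spectrum is shifted down by roughly 10 ppm.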
- spectral peaks can be grouped, into 3-D peaks, according to m/z ratio and retention time.
- the magnitude parameter, m/z ratio and retention time for each peak are stored in a text-readable format.
- the magnitude parameter, m/z ratio and retention time are stored in a text-translatable format.
- the magnitude, m/z ratio and retention time are stored in text format and displayed as a graphical representation.
- the 3-D peaks obtained by MS analysis of a sample are associated or grouped on the basis of particular chemical or molecular features.
- 3-D peaks are associated or grouped into isotope clusters.
- isotope clusters are associated into molecular features characterized by their neutral masses and retention times.
- 3-D peaks are grouped into adducts, dimers, and charge states.
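The step from an isotope cluster to a neutral mass can be sketched as follows. The sketch assumes positive-mode ions of the form [M+zH]z+ and uses the fact that neighboring isotope peaks are spaced by about 1.003355/z in m/z (the 13C/12C mass difference divided by the charge). The function names and the tolerance are hypothetical.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C, in Da
PROTON = 1.007276       # mass of H+, in Da


def infer_charge(cluster_mzs, max_charge=6, tol=0.005):
    """Infer the charge state from the spacing of the first two isotope peaks."""
    mzs = sorted(cluster_mzs)
    if len(mzs) < 2:
        return None
    spacing = mzs[1] - mzs[0]
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_SPACING / z) <= tol:
            return z
    return None


def neutral_mass_from_cluster(cluster_mzs):
    """Neutral monoisotopic mass, assuming the lowest m/z is the monoisotopic [M+zH]z+."""
    z = infer_charge(cluster_mzs)
    if z is None:
        return None
    return (min(cluster_mzs) - PROTON) * z
```

Clusters from different charge states of the same compound yield the same neutral mass, which is what allows them to be merged into a single molecular feature.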
- non-chromatographic background is removed by subtracting the baseline.
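Baseline subtraction can be illustrated with a rolling-minimum estimate applied to an intensity trace. The patent does not state how the baseline is estimated, so the rolling minimum and the window size below are assumptions chosen for simplicity; a window narrower than a chromatographic peak would wrongly remove part of the peak.

```python
def subtract_baseline(intensities, half_window=3):
    """Subtract a rolling-minimum baseline from an intensity trace."""
    n = len(intensities)
    corrected = []
    for i, value in enumerate(intensities):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        baseline = min(intensities[lo:hi])  # local background estimate
        corrected.append(value - baseline)
    return corrected
```

A constant offset is removed entirely, while a peak that rises above its local background keeps its height relative to that background.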
- molecular features obtained from MS analysis of a sample are normalized before comparison with molecular features obtained from MS analysis of a known compound.
- the present disclosure describes a method for differential analysis of components in different samples or groups of samples.
- the differential analysis compares expression levels of the components in a first sample with expression levels of the components in a second sample.
- parameters for a chemical component in a sample can be compared with parameters for a known component.
- the associated or grouped spectral peaks for separated components in a sample are compared with the associated or grouped spectral peaks obtained from MS analysis of a known material.
- the known material is a peptide.
- the known material is a nucleic acid.
- the methods for MS analysis of biological samples include extracting molecular features from the spectral data.
- the analysis comprises identifying molecular features from a sample at a given retention time value, and then associating the identified molecular feature with the molecular features of a known material.
- associating molecular features comprises identifying and grouping isotope clusters.
- associating molecular features comprises identifying and grouping neutral mass components in the sample.
- the extraction of molecular features from the spectral data comprises associating spectral peaks into 3-D peaks, 3-D peaks into isotope clusters, and isotope clusters into molecular features on the basis of ion charge at a given retention time.
- a graphical representation is a plot of Log ratio versus retention time. In another aspect, a graphical representation is a Log/Log ratio plot.
- markers (compounds that change in abundance between or among samples)
- their properties (retention time, mass, etc.)
- the markers can then be studied in further detail.
- the system comprises a first apparatus for separating a complex biological sample into chemical components on the basis of retention time and a second apparatus that determines the mass of each of the separated chemical components.
- the retention time data and mass data for each separated component are retained in a storage medium.
- the system includes a processing subsystem that associates or groups the separated components on the basis of properties including retention time and mass.
- the system also includes an output subsystem for displaying the association of the separated chemical components.
- the first apparatus comprises a liquid chromatography column, a gas chromatography column, or a capillary electrophoresis device.
- the processing subsystem associates the components on the basis of spectral peak intensity. In another embodiment, the processing subsystem determines a magnitude parameter for each 3-D peak based on the intensity of the spectral peak, the retention time and measured m/z. In an aspect, the magnitude parameter is the volume of the 3-D peak.
- the processing subsystem associates 3-D peaks according to chemical properties, such as mass or charge state. In another aspect, the subsystem associates 3-D peaks into isotope clusters. In one embodiment, the processing subsystem can be used as part of a differential analysis system. The processing subsystem may optionally compare the associated spectral peaks for the components of the sample with the associated spectral peaks for a known material, to identify one or more components in the sample.
- the system includes a storage medium for retaining the retention time and mass for each separated chemical component in a sample.
- the storage medium is a computer-readable medium that stores a plurality of data objects.
- the stored data objects include data objects identifying the retention time for components in the sample, the m/z ratio for components in the sample, and other chemically relevant attributes of components within the sample. Chemically relevant attributes include charge states, isotope properties and adducts.
- the stored data objects contain information about peak magnitude or peak volume.
- the data objects to be stored on the computer-readable medium may be further selected on the basis of signal strength. In an aspect, only data objects having signal strength greater than a prescribed value are stored on the computer-readable medium.
- the data objects stored on the computer-readable medium can be manipulated as text.
- data objects are stored in data base form, such that data objects identifying retention time, m/z ratio and peak magnitude are displayed as related objects in a record.
- the method to be performed can be a method for dividing the mass spectral data from a sample into feature groups, each feature group relating to a compound, wherein said dividing is performed based on retention time, mass to charge ratio, and optionally at least one property selected from the group consisting of isotope clustering, mass, adduct formation, dimer formation, and trimer formation.
- the method may further comprise allowing the user to filter in or out compounds of interest based on one or more properties selected from the group consisting of retention time, mass, isotope pattern, charge state, abundance, mass defect, and number of ions, for example.
- the method may be a differential profiling method, in which each sample in a collection of multiple samples is first analyzed as described above, then the results from the multiple samples are compared to each other or one another to identify the differences.
- This example describes software for complexity reduction of liquid chromatography/mass spectrometry (LC/MS) data.
- MFE Molecular Feature Extractor
- LC/MS liquid chromatography/mass spectrometry
- Biomarkers are metabolites, peptides, and other biomolecules that can be used to determine the presence or absence of a specific condition such as a disease.
- LC/MS-based platforms are rapidly becoming popular for the discovery of new biomarkers.
- the general strategy for doing this is to isolate the type of biomarker from a biological sample (e.g. blood) and then perform reversed phase LC/MS on the resulting mixture.
- the major challenge for such analyses is the complexity of the sample, which can have concentrations spanning up to 12 orders of magnitude and can contain hundreds of thousands of components.
- LC/MS systems separate the signal for a single component in the mixture into hundreds of different peaks in the mass spectral data, and reconstructing the molecular entities is a difficult task.
- the high throughput and resolution of mass spectrometers result in data files that can be up to several gigabytes in size. Thus, accurate and high throughput data reduction is one of the critical steps in such a biomarker analysis platform.
- FIG. 2 shows contour plots of the signals from a peptide mixture before and after the removal of chemical noise.
- the program can remove all signals with a signal/noise ratio of less than 2.
- Peaks that belong to the same isotope cluster elute from the chromatographic column at the same time and are spaced at regular intervals that reflect the charge state of the isotopic cluster. However, this may not be sufficient since overlapping, coeluting isotopic clusters are common. Thus, the heights of the peaks in the same isotopic cluster can be taken into account based on knowledge of the chemical composition of the compounds in the sample. For example, the ratio between the amounts of naturally occurring 12 C and 13 C is known.
- A and B correspond to the same compound except that they contain 12 C and 13 C, respectively.
- isotope clusters into molecular features.
- the separate isotopic clusters for a given molecular feature elute at approximately the same time.
- These sets of isotope clusters with the same RT are grouped into different molecular features by their associated mass and according to chemistry rules such as the presence of salt adducts.
- FIG. 3 shows a graphical user interface that lists many filters.
- the user can choose a specific range of retention time and/or mass to include in the analysis.
- the user can choose the isotope pattern, charge state, abundance, etc. of the spectral peaks that the user wishes to include in the analysis.
- MFE speed performance
- a key feature of a biomarker discovery platform is the ability to detect small but significant changes in concentrations for proteins between a normal and a diseased sample within a highly complex mixture.
- a digest of the complete set of cytosolic proteins from E. coli was split three ways, with 100, 200, and 400 fmol of a tryptic digest of bovine serum albumin (BSA) and 200, 100, and 50 fmol of a tryptic digest of serotransferrin added to these three samples, respectively.
- BSA bovine serum albumin
- Each sample was run five times using LC/MS and each of the 15 resulting datafiles (~300 MB each) was analyzed using MFE to create lists of components.
- the resulting compounds were loaded into GeneSpring MS, filtered by fold change and the requirement that a compound must be present in 4 out of 5 datafiles for each condition, and subjected to K-means clustering.
- the resulting sets were submitted to Agilent Spectrum Mill Proteomics Workbench for manual peptide mass fingerprinting (PMF), and BSA and serotransferrin were successfully identified. It had been expected that the signals for BSA and serotransferrin should change in magnitude proportional to the amount of the corresponding protein in the sample. Indeed, the BSA peptides increased in intensity across the three samples while the serotransferrin peptides decreased in intensity, as expected.
- Metabolic samples typically are highly complex, containing hundreds to thousands of compounds, many of which co-elute, spanning a broad range of concentration and compound classes. These characteristics result in significant data-processing challenges.
- the MFE algorithm extracts chemically qualified molecular features from complex LC-TOF data sets.
- Mass Profiler software combines capabilities for cross-sample alignment of molecular features in both the retention time and mass dimensions, including several normalization options, with statistical methods and visualization tools to aid in identification of differentially expressed features.
- the software tools are applied to differential profiling of metabolites present in rat urine, Arabidopsis plant extracts, and pancreatic islet cell extracts, and accurate mass data are used in determining the chemical compositions and identities of differentially expressed features.
- Agilent 1100 Series Capillary HPLC system interfaced to an Agilent LC/MSD TOF mass spectrometer via a standard or a capillary-optimized ESI source.
- Rat urine samples contained 5 mM sodium azide as a preservative to ensure the stability of the samples.
- Minimal sample preparation included 100-fold dilution with HPLC grade water and filtration prior to analysis.
- the plant extract sample was dissolved in 320 µL acetonitrile, 16 µL of the supernatant was mixed with 4 µL of water and 1 µL was injected. Desalting using C18 coated pipette tips was used for a portion of the plant and cellular extracts.
- Chromatographic separation of the organic acids present in the rat urine was accomplished using a C18 3.5, 2.1 × 100 mm column, with a flow rate of 400 µL/min, solvent gradient: 0-6 min, 0%-20% B; 6-10 min, 20%-95% B; 10-11 min, 95-100% B; with a 3 minute hold time and injection volume of 3.0 µL.
- Chromatographic separation of the metabolites present in the plant extract was performed on a Zorbax SB C18, 5.0, 150 × 0.5 mm column, flow rate of 20 µL/min, solvent gradient: 0-5 min, 5% B, 5-25 min, 5%-95% B with a 20 minute hold and an injection volume of 1.0 µL.
- the mobile phase solvents (Water A, Acetonitrile B) were changed from 0.1% formic acid to 0.1% acetic acid to reduce the formation of sodium-bound dimers caused by the sodium present in the rat urine samples.
- Electrospray data for the rat urine samples were obtained using orthogonal ESI optimized with flow rate: 400 µL/min; drying gas temperature: 350° C.; drying gas flow: 9.5 L/min; nebulizer pressure: 40 psig; capillary voltage: 3000 V; and fragmenter voltage: 175 V.
- the plant extracts were analyzed at 20 µL/min using a micro-spray nebulizer with drying gas temperature: 300° C.; drying gas flow: 4.0 L/min; nebulizer pressure: 20 psig; capillary voltage: 3500 V; and fragmenter voltage: 215 V.
- Negative ion ESI mode provided enhanced detection of the organic acids present in the urine samples
- positive ion ESI mode provided enhanced detection of the metabolites in the plant and cellular extracts.
- the mass spectrometer was operated with a mass range of m/z 50 to 1,100 at 1.29 cycles/second, and with a mass range of m/z 100 to 3,200 for the cellular extracts.
- the internal reference mass correction was utilized to correct for scan to scan variations.
- the MFE algorithm identifies features in MS data by first finding the mass peaks in all mass spectra, and then removing non-chromatographic chemical background. Next, peaks are clustered in RT (in seconds) and m/z to form 3-D peaks. The 3-D peaks are centroided and a peak volume determined for each peak. Related 3-D peaks (isotopes, adducts, dimers, trimers, multiple charge states) are combined and assigned a neutral mass and total volume.
- An example of the output is shown below in FIG. 4 , including raw, background-subtracted, and extracted feature chromatograms, the list of detected grouped features including RT, m/z and volume values, and an example of an extracted mass spectrum for one of the features.
- LCMS TOF data obtained from a pancreatic islet cellular extract.
- the sample is a complex mixture of low molecular weight metabolites (amino acids, glycerol-esters, fatty acids) and insulin.
- the Profiler software employs the measured mass to determine putative elemental compositions. For example, the feature at 3.440 minutes corresponds to glycerol monopalmitate with an empirical formula of C19H38O4, a mass error magnitude of 0.2 ppm and an isotope match of 91 ( FIG. 6 a ). Deconvolution of multiple charge states by the software confirmed residual insulin ( FIGS. 6 b / 6 c ).
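The mass error quoted for a putative elemental composition is conventionally expressed in parts per million relative to the theoretical monoisotopic mass. The following is a minimal sketch of that arithmetic for glycerol monopalmitate (C19H38O4). The function names and the measured value are hypothetical and chosen only for illustration; they are not taken from the Profiler software.

```python
# Monoisotopic masses of the most abundant isotopes (standard values, in Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(composition):
    """Sum monoisotopic masses for a composition such as {"C": 19, "H": 38, "O": 4}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


theoretical = monoisotopic_mass({"C": 19, "H": 38, "O": 4})
# Hypothetical measured neutral mass, used only to illustrate the calculation.
measured = 330.2771
print(round(theoretical, 4), round(ppm_error(measured, theoretical), 2))
```

A candidate composition would typically be accepted only if the absolute ppm error falls below an instrument-dependent threshold, which is an assumption here rather than a value stated in the text.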
Abstract
The present invention provides, inter alia, methods of analyzing mass spectral data. In some embodiments, the methods can be used for differential profiling of samples, such as comparing a sample comprising a compound and a sample comprising metabolites of the same compound. The methods can also be used to identify and isolate biomarkers. Systems for performing the methods, as well as computer-readable media for performing the methods, are also described.
Description
This application claims the benefit of U.S. Provisional Patent Application No. 60/764,729, filed Feb. 2, 2006, which is herein incorporated by reference in its entirety.
Recent advances in biotechnology, such as the sequencing of the human genome, have increased the need for information on how various encoded gene products, or proteins, mediate the biological processes that either contribute to health, or cause diseases. Standard molecular biological techniques study these processes at the genomic level, but do not provide information at the protein level. The growing field of proteomics research involves the search for targets or biomarkers for drug discovery and development, as well as to provide information that can be used to diagnose disease.
Comprehensive system-wide biomarker discovery has been made easier by the advent of large-scale analytical methods such as DNA microarray technology, high-throughput mass spectrometry (MS) and other techniques used to study complex biological systems. Statistical and machine-learning methods have also been developed, allowing the study of very large datasets produced by high-throughput protein analysis methods.
High throughput MS is a powerful technique in biomarker discovery. However, the use of this technique is complicated by a number of factors. Biological samples are very complex, and often contain hundreds to thousands of compounds, and analysis of these samples can often be difficult. For example, the differential comparison of LC-MS data from different biological samples generates complex datasets, and presents significant data processing challenges. The analysis is time-consuming and there is often significant noise and variability that is not properly accounted for. Current methods to eliminate noise and detect mass spectral peaks use an ad hoc approach, and do not use any a priori or learned information with regard to peak shape, retention time, or relationship among peaks. Statistical methods used to subtract background and reduce noise often remove relevant information in addition to filtering out noise and irrelevant information. The resulting data sets are not suitable for downstream analysis during biomarker discovery.
Therefore, there is a need for methods to analyze complex MS data sets that will incorporate richer qualitative information and thereby improve biomarker analysis. One way to address these challenges is by using a software module that contains a means for a priori partitioning of features, such that irrelevant features are filtered out before performing differential analysis of the data, while preserving relevant features for later analysis. If molecular features corresponding to specific chemical properties can be extracted in a fast and efficient manner, the data obtained can be used to make a powerful bioinformatics system.
The present invention relates to, inter alia, methods for differential profiling of samples. In particular, some embodiments of the methods of the present invention integrate chemical information with differential expression analysis and statistical methods to identify or differentiate expression level changes in a biological sample.
In some embodiments, the methods of the present invention use a molecular feature extraction process to group mass peaks in mass spectrometric data sets. In an aspect, the peaks are grouped according to particular chemical features or properties. Extracted molecular feature information is then normalized and statistically or visually analyzed to identify differentially expressed features.
In some embodiments, the methods of the present invention combine chemical information with differential expression analysis, thereby significantly reducing noise. In an aspect, using chemically relevant information to extract molecular features also reduces the complexity of the input data for the differential expression analysis.
The present invention provides improved methods for rapid and accurate identification of differentially expressed entities in biological samples. Therefore, the methods of the present invention can be used to compare complex sets of data for various samples, and are particularly useful in biomarker discovery.
Various embodiments of the present invention will be described in detail with reference to the drawings, wherein like reference numerals represent like parts throughout the several views. Reference to various embodiments does not limit the scope of the invention, which is limited only by the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the claimed invention.
Prior to describing the invention in further detail, the terms used in this application are defined as follows unless otherwise indicated.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this invention belongs. Although any methods, devices and material similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices and materials are now described.
All publications cited in this specification, including patent publications, are indicative of the level of ordinary skill in the art to which this invention pertains and are incorporated herein by reference in their entireties.
As used herein, the term “differential profiling” or “differential display” refers to investigating the differences between the mass spectral data for a first sample and those for a second sample. Similarly, differential profiling can be performed for more than two sets of data, namely comparing the mass spectral data of three or more samples and investigating the differences among them. It should be noted that sometimes differential profiling is performed using sample sets, each of which comprises multiple samples. For instance, a user may wish to compare the molecules in the sera of breast cancer patients and those in the sera of normal controls. Thus, serum samples from multiple breast cancer patients are obtained, and serum samples from multiple normal controls are also collected. Each sample is analyzed, and differential profiling is conducted to compare the mass spectral data of the samples in the patient group to the mass spectral data of the control group. A differential display image or plot shows the differences between or among the samples, with respect to abundance of a particular component, presence of a particular chemical species, or changes in expression level of a particular component.
The term “sample” as used herein relates to a material or complex mixture of materials, typically, although not necessarily, in fluid form. Samples of the present invention include, but are not limited to, biological samples obtained from natural biological sources, such as cells or tissues, or plants. The samples of the present invention include, but are not limited to, complex biological samples containing many different components or metabolites, such as urine or serum, for example. The samples of the present invention also include complex mixtures derived from non-animal sources, such as complex extracts derived from plants. The sample may also be non-biological, such as environmental samples (water, air, rain, etc.).
The term “spectral peak” refers to a peak in the output from any type of spectral analysis instrument, and is known in the art. In a given analysis, peaks can represent one or more components in a sample. A “mass spectral peak” is a spectral peak in a mass spectrum.
The term “3-D peak” refers to a cluster of LC-MS (or GC-MS, CE-MS, etc.) signals that have the same m/z value (subject to variations in measurement), and similar retention time values. The signals could be either raw profile spectral pixels or spectral peaks.
In this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise.
Methods for Analysis of Samples
To reduce the complexity of the mass spectral data obtained from a sample, the spectral peaks that are related to the same compound are grouped together. This can be achieved by analyzing the features of the various spectral peaks. Specifically, spectral peaks of similar retention time (RT) and mass to charge ratio (m/z) are grouped, optionally also taking into consideration other properties such as isotope clustering, mass, adduct formation, dimer formation, and/or trimer formation. An exemplary grouping is demonstrated in Table 1 below, which shows eight feature groups that elute at approximately the same time in a liquid chromatography column.
| TABLE 1 |
| Exemplary Feature Groups |
| Group | species | RT | m/z | mass | abundance |
| 1 | M | 2.037 | | 310.0743 | 4195152 |
| | M + H | 2.037 | 311.0819 | 310.0746 | 2293700 |
| | M + H + 1 | 2.037 | 312.0838 | | 281053 |
| | M + H + 2 | 2.037 | 313.0796 | | 89762 |
| | M + H + 3 | 2.036 | 314.0822 | | 9620 |
| | M + H + 4 | 2.034 | 315.0858 | | 1271 |
| | M + Na | 2.038 | 333.0629 | 310.0737 | 681860 |
| | M + Na + 1 | 2.038 | 334.0652 | | 76900 |
| | M + Na + 2 | 2.038 | 335.0619 | | 25150 |
| | M + Na + 3 | 2.04 | 336.0628 | | 3048 |
| | 2M + Na | 2.036 | 643.1359 | 310.0734 | 267980 |
| | 2M + Na + 1 | 2.036 | 644.1388 | | 65029 |
| | 2M + Na + 2 | 2.036 | 645.1358 | | 26983 |
| | 2M + Na + 3 | 2.036 | 646.1369 | | 5102 |
| | 2M + Na + 4 | 2.033 | 647.1357 | | 1155 |
| | 2M + Na + 5 | 2.045 | 648.1414 | | 145 |
| 2 | M | 2.036 | | 310.3532 | 86544 |
| | M + H | 2.036 | 311.3602 | 310.3529 | 79826 |
| | M + H + 1 | 2.035 | 312.356 | | 881 |
| | M + Na | 2.036 | 333.3464 | 310.3572 | 5838 |
| 3 | M | 2.037 | | 369.1474 | 59441 |
| | M + H | 2.037 | 370.1547 | 369.1474 | 48982 |
| | M + H + 1 | 2.037 | 371.1566 | | 8742 |
| | M + H + 2 | 2.036 | 372.1538 | | 1717 |
| 4 | M | 2.037 | | 355.1317 | 40265 |
| | M + H | 2.037 | 356.1389 | 355.1317 | 34295 |
| | M + H + 1 | 2.038 | 357.1418 | | 5969 |
| 5 | M | 2.038 | | 468.0295 | 13908 |
| | M + H | 2.038 | 469.0368 | 468.0295 | 8874 |
| | M + H + 1 | 2.04 | 470.0376 | | 1442 |
| | M + H + 2 | 2.038 | 471.0452 | | 403 |
| | M + Na | 2.037 | 491.0186 | 468.0294 | 2004 |
| | M + Na + 1 | 2.036 | 492.0178 | | 166 |
| | 2M + H | 2.033 | 937.0685 | 468.0306 | 260 |
| | 2M + H + 1 | 2.041 | 938.061 | | 74 |
| | 2M + Na | 2.035 | 959.0463 | 468.0285 | 176 |
| 6 | M | 2.036 | | 778.1027 | 9792 |
| | M + H | 2.036 | 779.1102 | 778.1029 | 6379 |
| | M + H + 1 | 2.036 | 780.1115 | | 1863 |
| | M + H + 2 | 2.039 | 781.1053 | | 914 |
| | M + H + 3 | 2.032 | 782.1132 | | 167 |
| | M + Na | 2.031 | 801.088 | 778.0988 | 382 |
| | M + Na + 1 | 2.036 | 802.0854 | | 87 |
| 7 | M | 2.037 | | 664.1093 | 8630 |
| | M + H | 2.037 | 665.1164 | 664.1091 | 5531 |
| | M + H + 1 | 2.037 | 666.1182 | | 1478 |
| | M + Na | 2.036 | 687.0991 | 664.1099 | 1196 |
| | M + Na + 1 | 2.032 | 688.1061 | | 424 |
| 8 | M | 2.039 | | 354.0369 | 6416 |
| | M + H | 2.039 | 355.0442 | 354.0369 | 6010 |
| | M + H + 1 | 2.033 | 356.0482 | | 407 |
Thus, the first broad group in Table 1 relates to a compound (M) and comprises three subgroups. The first subgroup comprises M associated with a hydrogen (M+H), and four other species that are also M+H but differ in isotope compositions (M+H+1, M+H+2, M+H+3, M+H+4). Isotope clustering is explained in more detail in Example 1 of the present application. The second subgroup comprises the sodium adducts of M (M+Na), including species with different isotope compositions. The third subgroup comprises dimers of M (2M) with sodium, again including species with different isotope compositions. The other seven groups are similarly listed. The compound (also called molecular feature) that each group relates to is designated M and shown in bold with its average RT, mass, and cumulative abundance.
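The neutral masses listed in Table 1 for the adduct and dimer subgroups can be reproduced, to within rounding, by subtracting the mass of the attached cation from the measured m/z and dividing by the number of molecules in the ion. The sketch below assumes singly charged ions and uses standard values for the proton and sodium cation masses; the dictionary and function names are hypothetical, and the approach is an illustration rather than the procedure the software is stated to use.

```python
PROTON = 1.007276       # mass of H+ in Da
SODIUM_ION = 22.989218  # mass of Na+ in Da

# species label -> (number of molecules M in the ion, mass of the attached cation)
ADDUCTS = {
    "M + H": (1, PROTON),
    "M + Na": (1, SODIUM_ION),
    "2M + H": (2, PROTON),
    "2M + Na": (2, SODIUM_ION),
}


def neutral_mass(species, mz):
    """Neutral mass of M implied by a singly charged ion of the given species."""
    n_molecules, cation_mass = ADDUCTS[species]
    return (mz - cation_mass) / n_molecules


# (species, measured m/z, neutral mass listed in Table 1)
table_rows = [
    ("M + H", 311.0819, 310.0746),
    ("M + Na", 333.0629, 310.0737),
    ("2M + Na", 643.1359, 310.0734),
    ("2M + H", 937.0685, 468.0306),
]
for species, mz, listed in table_rows:
    print(f"{species:8s} m/z {mz:.4f} -> {neutral_mass(species, mz):.4f} (Table 1: {listed})")
```

Agreement among the neutral masses derived from different adducts at the same retention time is what allows the M+H, M+Na, and 2M+Na subgroups to be assigned to a single compound.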
These eight groups in Table 1 formed one peak in liquid chromatography and many peaks in subsequent mass spectrometry in various places. Upon grouping as shown above, however, their relationship becomes clear, which not only facilitates differential profiling, but also enables a practitioner to compare the properties of each compound to the spectral behavior of known compounds in an effort to identify the compounds in the sample.
Thus, some embodiments of the present invention provide a method for analyzing mass spectral data of a sample, comprising dividing the mass spectral data into feature groups, each feature group relating to a compound, wherein said dividing is performed based on retention time, mass to charge ratio, and optionally at least one property selected from the group consisting of isotope clustering, mass, adduct formation, dimer formation, and trimer formation. The method may further comprise comparing the properties of any feature group to those of known compounds in order to identify the compound(s) in the feature group.
The present invention further provides methods for differential profiling of multiple samples, by extracting chemically relevant information, followed by differential expression analysis. Chemically relevant information is extracted using a computerized algorithm that automatically extracts unique features from a data set. The extracted features are then differentially analyzed using a combination of statistical and visual methods. Thus, each mass spectral data set obtained from a sample can be analyzed by the feature grouping process described above. The grouped data derived from multiple samples are then compared with one another to identify compounds that are present in one or some, but not all, of the samples, or compounds of which the abundance changes significantly between or among the samples. The results can be used, for example, to identify compounds that increase or decrease in a diseased sample versus the normal control; an animal that has been fed with a carcinogen versus the normal control; a water sample from a potentially polluted area versus an unpolluted area; and the like.
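The comparison step can be illustrated with a small sketch that pairs features from two samples when their neutral mass and retention time agree within tolerances, then reports features unique to either sample and shared features whose abundance changed. The tolerances, the fold-change threshold, the greedy first-match pairing, and all function names are assumptions made for illustration; the abundances in the second sample are invented.

```python
def match_features(sample_a, sample_b, mass_tol=0.002, rt_tol=0.05):
    """Pair features (mass, rt, abundance) that agree in mass and RT within tolerances."""
    shared, only_a = [], []
    unmatched_b = list(sample_b)
    for feat_a in sample_a:
        hit = next(
            (f for f in unmatched_b
             if abs(f[0] - feat_a[0]) <= mass_tol and abs(f[1] - feat_a[1]) <= rt_tol),
            None,
        )
        if hit is None:
            only_a.append(feat_a)
        else:
            unmatched_b.remove(hit)
            shared.append((feat_a, hit))
    return shared, only_a, unmatched_b


def changed(shared, min_fold=2.0):
    """Shared features whose abundance ratio (B over A) is at least min_fold either way."""
    out = []
    for feat_a, feat_b in shared:
        ratio = feat_b[2] / feat_a[2]
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            out.append((feat_a[0], feat_a[1], ratio))
    return out


# Hypothetical feature lists; masses and RTs are borrowed from Table 1,
# abundances in the second sample are invented for illustration.
control = [(310.0743, 2.037, 4195152), (369.1474, 2.037, 59441), (355.1317, 2.037, 40265)]
treated = [(310.0745, 2.040, 1000000), (369.1473, 2.036, 60000), (468.0295, 2.038, 13908)]

shared, only_control, only_treated = match_features(control, treated)
print(len(shared), only_control, only_treated, changed(shared))
```

With replicate samples per condition, the fold-change test would normally be replaced or supplemented by a statistical test across replicates.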
To further facilitate the mass spectral analysis, and particularly the differential profiling, the data can be filtered by criteria determined by the user. As described in Example 1, the user can choose a specific range of retention time and/or mass to include in the analysis. Similarly, the user can choose the isotope pattern, charge state, abundance, etc. of the spectral peaks that the user wishes to include in the analysis. For example, if the user only wants to look for peptide markers that change in abundance between or among the samples being analyzed, the peptide filter can be selected (see Example 1 and FIG. 3 ). If the user only wants small molecules to be analyzed or displayed, a mass range can be prescribed.
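User-determined filtering of this kind amounts to applying a set of optional range and membership tests to each extracted feature. The following sketch shows one way to express it. The `Feature` record, its field names, and the charge states assigned to the example features are hypothetical; only the masses, retention times, and abundances are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    rt: float         # retention time, minutes
    mass: float       # neutral mass, Da
    charge: int       # charge state (assumed here; not listed in Table 1)
    abundance: float  # summed signal


def filter_features(features, rt_range=None, mass_range=None, charges=None, min_abundance=None):
    """Keep features that satisfy every criterion that was supplied (None means no limit)."""
    kept = []
    for f in features:
        if rt_range is not None and not (rt_range[0] <= f.rt <= rt_range[1]):
            continue
        if mass_range is not None and not (mass_range[0] <= f.mass <= mass_range[1]):
            continue
        if charges is not None and f.charge not in charges:
            continue
        if min_abundance is not None and f.abundance < min_abundance:
            continue
        kept.append(f)
    return kept


features = [
    Feature(2.037, 310.0743, 1, 4195152),
    Feature(2.036, 778.1027, 1, 9792),
    Feature(2.039, 354.0369, 1, 6416),
]
# Restrict to small molecules with a strong signal.
small_molecules = filter_features(features, mass_range=(0, 500), min_abundance=10000)
print(small_molecules)
```

Treating every criterion as optional mirrors the idea that the user selects only the filters relevant to the analysis at hand.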
A simplified representation of certain embodiments of the method is shown in FIG. 1 . In these embodiments, a biological sample containing a complex mixture of chemical components is obtained and then analyzed by any of one or more spectral analysis methods, such as LC-MS, for example. Spectral peaks obtained from the LC-MS analysis are grouped according to m/z ratio and retention time into 3-D peaks. The peak volume can be measured for each 3-D peak. The molecular features are extracted as clusters of these 3-D peaks, using an algorithm that filters out irrelevant features while associating or grouping 3-D peaks according to chemically relevant information. For example, 3-D peaks can be associated as isotope clusters, dimers, adducts, etc. Peaks can be grouped according to chemical information available from a database, or based on user input. The extracted molecular features are listed in the output and can be used to identify different chemical components (or markers). The samples are then differentially profiled by comparing the extracted features (i.e. the filtered output) with the filtered output for a second biological sample, or by comparing the filtered output with molecular features extracted from known compounds.
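The grouping of spectral signals into 3-D peaks can be sketched as a two-stage clustering: signals are first split into runs of similar m/z, and each run is then split wherever the retention time jumps. The sketch below assumes positive intensities, takes the peak volume to be the sum of intensities, and uses intensity-weighted centroids; the tolerances, the gap-based splitting, and the function names are illustrative assumptions rather than the algorithm used by the software.

```python
def split_on_gaps(items, key, max_gap):
    """Sort items by key and start a new run when consecutive keys differ by more than max_gap."""
    runs = []
    for item in sorted(items, key=key):
        if runs and key(item) - key(runs[-1][-1]) <= max_gap:
            runs[-1].append(item)
        else:
            runs.append([item])
    return runs


def find_3d_peaks(signals, mz_tol=0.005, rt_gap=0.02):
    """Cluster (rt, mz, intensity) signals into 3-D peaks and summarize each one."""
    peaks = []
    for mz_run in split_on_gaps(signals, key=lambda s: s[1], max_gap=mz_tol):
        for run in split_on_gaps(mz_run, key=lambda s: s[0], max_gap=rt_gap):
            volume = sum(s[2] for s in run)
            peaks.append({
                "rt": sum(s[0] * s[2] for s in run) / volume,  # intensity-weighted centroid
                "mz": sum(s[1] * s[2] for s in run) / volume,
                "volume": volume,
            })
    return peaks


# Hypothetical signals: two ions that co-elute, plus a later signal at the first m/z.
signals = [
    (2.03, 311.0818, 100.0), (2.04, 311.0820, 300.0), (2.05, 311.0819, 100.0),
    (2.03, 312.0837, 10.0), (2.04, 312.0839, 30.0),
    (2.50, 311.0819, 50.0),
]
peaks = find_3d_peaks(signals)
for peak in peaks:
    print(peak)
```

Gap-based splitting can chain together signals that drift gradually, so a production implementation would likely bound the total width of a run as well.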
In some embodiments, the present disclosure describes a method for differential profiling of highly complex biological samples. Samples of the present invention include, but are not limited to, biological samples obtained from natural biological sources, such as cells or tissues. In an aspect, the sample is a highly complex mixture of different components. The components may include, but are not limited to, proteins, metabolites, amino acids, glycerol esters, fatty acids, etc., and any derivatives or degradation products thereof. In an aspect, the sample contains hundreds to thousands of components, spanning a broad range of compositions and concentrations. In yet another aspect, the sample is a complex mixture of peptides or nucleic acids. In some embodiments, the sample is a complex mixture of metabolites present in a biological sample. In some embodiments, the sample is a complex mixture of small molecules.
The present disclosure describes methods for analyzing a complex mixture of various components. In some embodiments, the analysis begins with separating portions of the sample into multiple components. The sample can be separated using any of a number of separation techniques including, but not limited to, ion exclusion, ion exchange, normal/reversed phase partition, size exclusion, ligand exchange, liquid/gel phase isoelectric focusing, adsorption chromatography, and liquid chromatography. In an aspect, the components are separated by liquid chromatography (i.e. LC, such as HPLC). Each separated component is associated with a specific retention time. Variation in retention time can be reduced using flow-controlled capillary HPLC. The separated components are then further analyzed by mass spectrometry (MS) methods, to determine the identity of the separated components.
The MS analysis can be performed by using any MS method or instrumentation. In some embodiments, the MS analysis uses a combination of time-of-flight mass spectrometry (TOF-MS) and electrospray ionization mass spectrometry (ESI-MS). ESI methods are preferred because ions are generated directly in solution and therefore, ESI can be readily combined with other spectral analysis methods, such as HPLC. Although not limiting to the present description, the charge mode of ESI can be varied according to the sample being analyzed. For example, samples derived from plants and cellular extracts are typically analyzed in the positive ion ESI mode. On the other hand, samples containing mostly organic acids (such as a urine sample, for example) are typically analyzed in the negative ESI mode.
In mass spectrometry, each separated component of the sample yields multiple spectral peaks with a measured mass-to-charge (m/z) ratio. Variations in mass measurement can be reduced by normalization against an internal reference standard. In an aspect, spectral peaks can be grouped into 3-D peaks according to m/z ratio and retention time. In an aspect, the magnitude parameter, m/z ratio and retention time for each peak are stored in a text-readable format. In another aspect, the magnitude parameter, m/z ratio and retention time are stored in a text-translatable format. In yet another aspect, the magnitude, m/z ratio and retention time are stored in text format and displayed as a graphical representation.
In the methods described herein, the 3-D peaks obtained by MS analysis of a sample are associated or grouped on the basis of particular chemical or molecular features. In an aspect, 3-D peaks are associated or grouped into isotope clusters. In another aspect, isotope clusters are associated into molecular features characterized by their neutral masses and retention times. In yet another aspect, 3-D peaks are grouped into adducts, dimers, and charge states. In an aspect, non-chromatographic background is removed by subtracting the baseline. In another embodiment, molecular features obtained from MS analysis of a sample are normalized before comparison with molecular features obtained from MS analysis of a known compound.
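Because the peaks of an isotope cluster are spaced at regular m/z intervals that depend on charge, the charge state can be estimated from the spacing: for charge z the expected spacing is roughly the 13C/12C mass difference divided by z. The sketch below is a minimal illustration under that assumption; the tolerance, the maximum charge, and the function name are hypothetical, and the second cluster is invented.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C in Da


def infer_charge(mz_values, max_charge=6, tol=0.01):
    """Return the charge z whose expected spacing (delta / z) best fits the mean observed
    spacing between consecutive isotope peaks, or None if nothing fits within tol."""
    mzs = sorted(mz_values)
    if len(mzs) < 2:
        return None
    spacings = [b - a for a, b in zip(mzs, mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    best = min(range(1, max_charge + 1), key=lambda z: abs(mean_spacing - C13_C12_DELTA / z))
    return best if abs(mean_spacing - C13_C12_DELTA / best) <= tol else None


# M + H isotope cluster of group 1 in Table 1.
charge_group_1 = infer_charge([311.0819, 312.0838, 313.0796, 314.0822, 315.0858])
# Hypothetical doubly charged cluster, spaced by roughly 0.5 m/z.
charge_doubly = infer_charge([500.2500, 500.7517, 501.2533])
print(charge_group_1, charge_doubly)
```

The tolerance is deliberately loose because isotopes of other elements contribute to the higher peaks of a cluster and shift the spacing slightly away from the pure 13C value.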
The present disclosure describes a method for differential analysis of components in different samples or groups of samples. In some embodiments, the differential analysis compares expression levels of the components in a first sample with expression levels of the components in a second sample. In another embodiment, parameters for a chemical component in a sample can be compared with parameters for a known component. In an aspect, the associated or grouped spectral peaks for separated components in a sample are compared with the associated or grouped spectral peaks obtained from MS analysis of a known material. In an aspect, the known material is a peptide. In another aspect, the known material is a nucleic acid.
In some embodiments, the methods for MS analysis of biological samples include extracting molecular features from the spectral data. In some embodiments, the analysis comprises identifying molecular features from a sample at a given retention time value, and then associating the identified molecular feature with the molecular features of a known material. In an aspect, associating molecular features comprises identifying and grouping isotope clusters. In another aspect, associating molecular features comprises identifying and grouping neutral mass components in the sample. In some embodiments, the extraction of molecular features from the spectral data comprises associating spectral peaks into 3-D peaks, 3-D peaks into isotope clusters, and isotope clusters into molecular features on the basis of ion charge at a given retention time.
Methods for the differential profiling of biological samples, comprising compiling extracted molecular features from a plurality of responses or data sets are described herein. The compiled features from the plurality of responses are stored in a text-readable format, or in a text-translatable format. In an aspect, compiled molecular features for a first sample are cross-aligned with compiled features for a second sample. Statistical methods are then used to normalize the cross-aligned molecular features. In an aspect, differential analysis is performed using standard statistical methods. Differentially expressed features can also be identified visually, through graphical representation. In an aspect, a graphical representation is a plot of Log ratio versus retention time. In another aspect, a graphical representation is a Log/Log ratio plot.
The methods described herein can also be utilized to isolate compounds of interest. For example, after markers (compounds that change in abundance between or among samples) are identified by differential profiling, their properties (retention time, mass, etc.) can be used as criteria for isolation and purification from samples. The markers can then be studied in further detail.
Systems for Differential Analysis of Samples
A system for differential analysis of samples is described herein. In some embodiments, the system comprises a first apparatus for separating a complex biological sample into chemical components on the basis of retention time and a second apparatus that determines the mass of each of the separated chemical components. The retention time data and mass data for each separated component are retained in a storage medium. The system includes a processing subsystem that associates or groups the separated components on the basis of properties including retention time and mass. The system also includes an output subsystem for displaying the association of the separated chemical components. In some embodiments, the first apparatus comprises a liquid chromatography column, a gas chromatography column, or a capillary electrophoresis device.
In some embodiments, the processing subsystem associates the components on the basis of spectral peak intensity. In another embodiment, the processing subsystem determines a magnitude parameter for each 3-D peak based on the intensity of the spectral peak, the retention time and measured m/z. In an aspect, the magnitude parameter is the volume of the 3-D peak. The processing subsystem associates 3-D peaks according to chemical properties, such as mass or charge state. In another aspect, the subsystem associates 3-D peaks into isotope clusters. In one embodiment, the processing subsystem can be used as part of a differential analysis system. The processing subsystem may optionally compare the associated spectral peaks for the components of the sample with the associated spectral peaks for a known material, to identify one or more components in the sample.
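The disclosure does not state how the volume of a 3-D peak is computed. The following is a minimal sketch under one assumed definition: the intensities of all data points in each scan are summed, and the summed intensity is integrated over retention time with the trapezoid rule. The function name `peak_volume` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def peak_volume(points):
    """Approximate the volume of one 3-D peak.

    points: iterable of (rt_sec, mz, intensity) tuples assigned to the peak.
    Assumption: volume = integral over retention time of the intensity
    summed across m/z within each scan (trapezoid rule).
    """
    per_scan = defaultdict(float)
    for rt_sec, _mz, intensity in points:
        per_scan[rt_sec] += intensity

    scans = sorted(per_scan.items())
    volume = 0.0
    for (rt0, i0), (rt1, i1) in zip(scans, scans[1:]):
        volume += 0.5 * (i0 + i1) * (rt1 - rt0)
    return volume


example_points = [
    (10.0, 500.25, 0.0),
    (11.0, 500.25, 100.0),
    (11.0, 500.26, 100.0),
    (12.0, 500.25, 0.0),
]
example_volume = peak_volume(example_points)
```

Other definitions, such as a fitted peak-shape integral, would serve equally well as a magnitude parameter; the only requirement implied by the text is that the value reflects intensity, retention time and m/z extent.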
The system includes a storage medium for retaining the retention time and mass for each separated chemical component in a sample. In some embodiments, the storage medium is a computer-readable medium that stores a plurality of data objects. The stored data objects include data objects identifying the retention time for components in the sample, the m/z ratio for components in the sample, and other chemically relevant attributes of components within the sample. Chemically relevant attributes include charge states, isotope properties and adducts. In another aspect, the stored data objects contain information about peak magnitude or peak volume. The data objects to be stored on the computer-readable medium may be further selected on the basis of signal strength. In an aspect, only data objects having signal strength greater than a prescribed value are stored on the computer-readable medium. The data objects stored on the computer-readable medium can be manipulated as text. In some embodiments, data objects are stored in data base form, such that data objects identifying retention time, m/z ratio and peak magnitude are displayed as related objects in a record.
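As an illustration of storing data objects as text and retaining only those above a prescribed signal strength, the sketch below writes records to comma-separated text. The field names, the use of peak volume as the signal-strength measure, and the function `write_records` are assumptions made for this example; the disclosure does not prescribe a file layout.

```python
import csv
import io

# Hypothetical record layout: one row per 3-D peak.
FIELDS = ["rt_sec", "mz", "volume", "charge"]


def write_records(records, min_volume, out):
    """Write records whose volume exceeds min_volume as CSV text.

    Returns the number of records written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for rec in records:
        if rec["volume"] > min_volume:
            writer.writerow({name: rec[name] for name in FIELDS})
            kept += 1
    return kept


records = [
    {"rt_sec": 206.4, "mz": 331.2843, "volume": 1500.0, "charge": 1},
    {"rt_sec": 207.1, "mz": 160.0400, "volume": 3.0, "charge": 1},
]
buffer = io.StringIO()
kept = write_records(records, min_volume=10.0, out=buffer)
lines = buffer.getvalue().splitlines()
```

Because each row keeps retention time, m/z and magnitude together, the same text can be loaded into a database table in which those values appear as related fields of one record.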
Some embodiments of this invention provide a computer-readable medium comprising executable instructions for performing the analysis methods described herein. For example, the method to be performed can be a method for dividing the mass spectral data from a sample into feature groups, each feature group relating to a compound, wherein said dividing is performed based on retention time, mass to charge ratio, and optionally at least one property selected from the group consisting of isotope clustering, mass, adduct formation, dimer formation, and trimer formation. The method may further comprise allowing the user to filter in or out compounds of interest based on one or more properties selected from the group consisting of retention time, mass, isotope pattern, charge state, abundance, mass defect, and number of ions, for example. The method may be a differential profiling method, in which each sample in a collection of multiple samples is first analyzed as described above, then the results from the multiple samples are compared to each other or one another to identify the differences.
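The filtering step can be pictured as a set of optional range checks applied to each feature group. The sketch below is illustrative only: the `Feature` fields, the function names and the default thresholds are hypothetical, and only a few of the listed properties are shown.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    rt_min: float        # retention time, minutes
    neutral_mass: float  # neutral mass, Da
    abundance: float     # e.g. total peak volume
    n_ions: int          # number of ions grouped into the feature


def in_range(value, bounds):
    """True if bounds is None or lower <= value <= upper."""
    if bounds is None:
        return True
    lower, upper = bounds
    return lower <= value <= upper


def filter_features(features, rt_range=None, mass_range=None,
                    min_abundance=0.0, min_ions=1):
    """Keep features that satisfy every filter the user has set."""
    return [
        f for f in features
        if in_range(f.rt_min, rt_range)
        and in_range(f.neutral_mass, mass_range)
        and f.abundance >= min_abundance
        and f.n_ions >= min_ions
    ]


features = [
    Feature(rt_min=3.44, neutral_mass=330.277, abundance=5000.0, n_ions=3),
    Feature(rt_min=12.10, neutral_mass=5803.6, abundance=800.0, n_ions=6),
    Feature(rt_min=3.50, neutral_mass=160.040, abundance=20.0, n_ions=1),
]
selected = filter_features(features, rt_range=(0.0, 5.0), min_abundance=100.0)
```

Filters left at their defaults have no effect, so they can be used individually or in combination.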
In this disclosure, the following abbreviations have the following meanings unless indicated otherwise. Abbreviations not defined have their generally accepted meanings.
| Abbreviation | Meaning |
|---|---|
| °C | degree Celsius |
| hr | hour |
| min | minute |
| sec | second |
| mM | millimolar |
| μM | micromolar |
| nM | nanomolar |
| ml | milliliter |
| μl | microliter |
| nl | nanoliter |
| mg | milligram |
| μg | microgram |
| HPLC | high performance liquid chromatography |
| LC | liquid chromatography |
| MS | mass spectrometry |
| MFE | Molecular Feature Extractor |
| ppm | parts per million |
This example describes software for complexity reduction of liquid chromatography/mass spectrometry (LC/MS) data. This program, the Molecular Feature Extractor (MFE), combines coeluting spectral peaks into compounds, accurately calculating their neutral mass and abundance while removing chemical interferences. MFE is equally effective in small molecule (such as metabolites) and large molecule (such as peptides) applications. Complex peptide mixtures with known amounts of spiked proteins were first analyzed using MFE, and GeneSpring MS (Agilent Technologies, Santa Clara, Calif.) software was used to show that the known proteins were not only found but accurately quantified.
Biomarkers are metabolites, peptides, and other biomolecules that can be used to determine the presence or absence of a specific condition such as a disease. LC/MS-based platforms are rapidly becoming popular for the discovery of new biomarkers. The general strategy for doing this is to isolate the type of biomarker from a biological sample (e.g. blood) and then perform reversed phase LC/MS on the resulting mixture. The major challenge for such analyses is the complexity of the sample, which can have concentrations spanning up to 12 orders of magnitude and can contain hundreds of thousands of components. Further, LC/MS systems separate the signal for a single component in the mixture into hundreds of different peaks in the mass spectral data, and reconstructing the molecular entities is a rather difficult task. Furthermore, the high throughput and resolution of mass spectrometers result in data files that can be up to several gigabytes in size. Thus, accurate and high throughput data reduction is one of the critical steps in such a biomarker analysis platform.
Here, we demonstrate the utility of MFE using a complex peptide mixture.
Method
MFE (Molecular Feature Extractor) takes raw MS data as the input and outputs a list of “molecular features.” A molecular feature represents a chemical entity in the real world, such as a compound (e.g., a peptide). MFE identifies a feature by its mass and retention time (RT or elution time), together with the information on all its isotope clusters, associated with different ion species such as different charge states, dimers, and adducts. The algorithm procedure can be summarized as follows:
Removal of background chemical noise. Background noise is generally concentrated at certain m/z values and distributed evenly throughout the LC/MS run. FIG. 2 shows contour plots of the signals from a peptide mixture before and after the removal of chemical noise. For example, the program can remove all signals with a signal/noise ratio of less than 2.
Extraction of three dimensional peaks (m/z, RT, and peak height are the three dimensions). Each peak is a single isotopomer of a molecular feature.
Grouping of 3-D peaks into isotope clusters. Peaks that belong to the same isotope cluster elute from the chromatographic column at the same time and are spaced at regular intervals that reflect the charge state of the isotopic cluster. However, this may not be sufficient since overlapping, coeluting isotopic clusters are common. Thus, the heights of the peaks in the same isotopic cluster can be taken into account based on knowledge of chemical composition of the compounds in the sample. For example, the ratio between the amounts of naturally occurring 12C and 13C is known. If two coeluting peaks A and B differ in mass by 1, and the ratio between the heights of A and B equals (within tolerable error) the ratio of 12C/13C abundance, then A and B correspond to the same compound except that they contain 12C and 13C, respectively.
Grouping isotope clusters into molecular features. The separate isotopic clusters for a given molecular feature elute at approximately the same time. These sets of isotope clusters with the same RT are grouped into different molecular features by their associated mass and according to chemistry rules such as the presence of salt adducts.
The user is provided with the option of many filters, which can be used individually or in combination, to separate the result into different categories based on their chemical properties and relationship, and/or the goal of the user. For example, FIG. 3 shows a graphical user interface that lists many filters. Thus, the user can choose a specific range of retention time and/or mass to include in the analysis. Similarly, the user can choose the isotope pattern, charge state, abundance, etc. of the spectral peaks that the user wishes to include in the analysis.
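The spacing rule used when grouping 3-D peaks into isotope clusters can be sketched as follows. Adjacent isotope peaks of an ion with charge z are separated by roughly 1.00335/z in m/z (the 13C minus 12C mass difference divided by the charge), so the spacing between two coeluting peaks suggests a charge state. The tolerances, the maximum charge and the function names are assumptions for this example, and the peak-height check described above is deliberately left out.

```python
C13_C12_DELTA = 1.0033548  # Da, mass difference between 13C and 12C


def infer_charge(mz_low, mz_high, max_charge=6, tol=0.005):
    """Return the charge z whose isotope spacing matches, or None."""
    spacing = mz_high - mz_low
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_C12_DELTA / z) <= tol:
            return z
    return None


def same_isotope_cluster(peak_a, peak_b, rt_tol_sec=3.0):
    """peaks are (mz, rt_sec, height); checks coelution and spacing only."""
    co_eluting = abs(peak_a[1] - peak_b[1]) <= rt_tol_sec
    mz_low, mz_high = sorted((peak_a[0], peak_b[0]))
    return co_eluting and infer_charge(mz_low, mz_high) is not None


charge_doubly = infer_charge(500.0000, 500.5017)
charge_singly = infer_charge(500.0000, 501.0034)
charge_none = infer_charge(500.0000, 500.7000)
```

A fuller implementation would also compare peak heights against the isotope pattern expected for the compound class, as the text describes, to separate overlapping coeluting clusters.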
Results and Discussion
To measure the speed performance of MFE, we used a data set acquired during a 70-minute LC-MS run that has 238,163,145 data points. MFE extracted 2043 molecular features from this dataset within 5 minutes. Since the data was originally acquired over 70 minutes, this analysis time is fast enough for a high throughput system.
A key feature of a biomarker discovery platform is the ability to detect small but significant changes in concentrations for proteins between a normal and a diseased sample within a highly complex mixture. To test this, a digest of the complete set of cytosolic proteins from E. coli was split three ways, with 100, 200, and 400 fmol of a tryptic digest of bovine serum albumin (BSA) and 200, 100, and 50 fmol of a tryptic digest of serotransferrin added to these three samples, respectively. Each sample was run five times using LC/MS and each of the 15 resulting datafiles (~300 MB each) was analyzed using MFE to create lists of components. The resulting compounds were loaded into GeneSpring MS, filtered by fold change and the requirement that a compound must be present in 4 out of 5 datafiles for each condition, and subjected to K-means clustering. The resulting sets were submitted to Agilent Spectrum Mill Proteomics Workbench for manual peptide mass fingerprinting (PMF), and BSA and serotransferrin were successfully identified. It had been expected that the signals for BSA and serotransferrin should change in magnitude proportional to the amount of the corresponding protein in the sample. Indeed, the BSA peptides increased in intensity across the three samples while the serotransferrin peptides decreased in intensity, as expected.
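The replicate requirement applied before clustering (a compound must be present in 4 out of 5 datafiles for each condition) can be expressed as a short predicate. The data layout and the function name are hypothetical; this is a sketch of the rule as stated, not of the GeneSpring MS implementation.

```python
def passes_presence_filter(detections, min_present=4):
    """detections maps each condition to a list of booleans, one per
    replicate datafile, indicating whether the compound was found.
    The compound passes only if every condition meets the minimum."""
    return all(sum(flags) >= min_present for flags in detections.values())


compound_a = {
    "sample_1": [True, True, True, True, False],
    "sample_2": [True, True, True, True, True],
    "sample_3": [True, False, True, True, True],
}
compound_b = {
    "sample_1": [True, True, True, False, False],
    "sample_2": [True, True, True, True, True],
    "sample_3": [True, True, True, True, True],
}
keep_a = passes_presence_filter(compound_a)
keep_b = passes_presence_filter(compound_b)
```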
Metabolic samples typically are highly complex, containing hundreds to thousands of compounds, many of which co-elute, spanning a broad range of concentration and compound classes. These characteristics result in significant data-processing challenges. In this work we illustrate the use of two new MS informatics tools designed to facilitate rapid differential analysis of samples for metabolic profiling applications. The MFE algorithm extracts chemically qualified molecular features from complex LC-TOF data sets. Mass Profiler software combines capabilities for cross-sample alignment of molecular features in both the retention time and mass dimensions, including several normalization options, with statistical methods and visualization tools to aid in identification of differentially expressed features. The software tools are applied to differential profiling of metabolites present in rat urine, Arabidopsis plant extracts, and pancreatic islet cell extracts, and accurate mass data are used in determining the chemical compositions and identities of differentially expressed features.
Method
Accurate mass LC-ESI-MS data obtained from urine, cell and plant extracts were used to evaluate the performance of the MFE algorithm and Mass Profiler software. Mass spectral data were acquired with an Agilent LC/MSD TOF. Sample introduction and chromatographic separations were performed with an Agilent 1100 series capillary HPLC system. Minimal sample preparation (desalting, enrichment) was completed prior to analysis using commercially available desalting spin tubes and micropipette tips.
Experimental
Instrumentation:
Agilent 1100 Series Capillary HPLC system interfaced to an Agilent LC/MSD TOF mass spectrometer via a standard or a capillary-optimized ESI source.
Sample Preparation:
Rat urine samples contained 5 mM sodium azide as a preservative to ensure the stability of the samples. Minimal sample preparation included 100-fold dilution with HPLC grade water and filtration prior to analysis. The plant extract sample was dissolved in 320 μL acetonitrile; 16 μL of the supernatant was mixed with 4 μL of water and 1 μL was injected. Desalting using C18 coated pipette tips was used for a portion of the plant and cellular extracts.
Chromatographic Conditions:
Chromatographic separation of the organic acids present in the rat urine was accomplished using a C18 3.5, 2.1×100 mm column, with a flow rate of 400 μL/min, solvent gradient: 0-6 min, 0%-20% B; 6-10 min, 20%-95% B; 10-11 min, 95%-100% B; with a 3 minute hold time and injection volume of 3.0 μL. Chromatographic separation of the metabolites present in the plant extract was performed on a Zorbax SB C18, 5.0, 150×0.5 mm column, flow rate of 20 μL/min, solvent gradient: 0-5 min, 5% B; 5-25 min, 5%-95% B; with a 20 minute hold and an injection volume of 1.0 μL. The mobile phase solvents were changed from 0.1% formic acid (water, A; acetonitrile, B) to 0.1% acetic acid to reduce the formation of sodium-bound dimers arising from the sodium present in the rat urine samples.
Mass Spectrometer Conditions:
Electrospray data for the rat urine samples were obtained using orthogonal ESI optimized with a flow rate of 400 μL/min; drying gas temperature: 350° C.; drying gas flow: 9.5 L/min; nebulizer pressure: 40 psig; capillary voltage: 3000 V; and fragmenter voltage: 175 V. The plant extracts were analyzed at 20 μL/min using a micro-spray nebulizer with drying gas temperature: 300° C.; drying gas flow: 4.0 L/min; nebulizer pressure: 20 psig; capillary voltage: 3500 V; and fragmenter voltage: 215 V. Negative ion ESI mode provided enhanced detection of the organic acids present in the urine samples, while positive ion ESI mode provided enhanced detection of the metabolites in the plant and cellular extracts. For the plant and urine samples the mass spectrometer was operated with a mass range of m/z 50 to 1,100 and 1.29 cycles/second, and with a mass range of m/z 100 to 3200 for the cellular extracts. To ensure low-ppm mass accuracy, internal reference mass correction was used to correct for scan-to-scan variations.
Results and Discussion
MFE Algorithm
The MFE algorithm identifies features in MS data by first finding the mass peaks in all mass spectra, and then removing non-chromatographic chemical background. Next, peaks are clustered in RT (in seconds) and m/z to form 3-D peaks. The 3-D peaks are centroided and a peak volume determined for each peak. Related 3-D peaks (isotopes, adducts, dimers, trimers, multiple charge states) are combined and assigned a neutral mass and total volume.
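The step that assigns a neutral mass to related 3-D peaks can be illustrated for the simplest case, in which the charge carriers are protons. The sketch below assumes ions of the form [M+zH]z+ in positive mode and [M-zH]z- in negative mode; adducts such as sodium, and dimers or trimers, would need their own mass offsets and are not handled here. The function name is hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, charge, positive_mode=True):
    """Neutral mass for a protonated (or deprotonated) ion.

    positive mode: mz = (M + z * proton) / z  ->  M = z * (mz - proton)
    negative mode: mz = (M - z * proton) / z  ->  M = z * (mz + proton)
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    sign = 1 if positive_mode else -1
    return charge * (mz - sign * PROTON_MASS)


# Two charge states of the same hypothetical compound (M = 1000 Da)
# collapse to the same neutral mass.
mass_from_1plus = neutral_mass(1001.007276, 1)
mass_from_2plus = neutral_mass(501.007276, 2)
```

Peaks from different charge states that yield the same neutral mass at the same retention time can then be combined, and their volumes summed into a total volume for the feature.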
An example of the output is shown in FIG. 4, including raw, background-subtracted, and extracted feature chromatograms, the list of detected grouped features including RT, m/z and volume values, and an example of an extracted mass spectrum for one of the features.
Dynamic Range Evaluation
Since the MFE algorithm generates feature lists for input to differential expression analysis, we have evaluated its performance over an extended dynamic range. A representative rat urine sample was extracted using a signal-to-noise threshold of 2 and the resulting molecular feature list sorted by peak volume. The largest peak volume (m/z 178.052) was found to be 200 arbitrary units. By inspection, several molecular features were identified at m/z 160.040 with various retention times, and with relative volumes between 0.04% and 1%. The results were compared to the features identified via manual extraction of the m/z 160.040 ion from the raw data set using Analyst QS data analysis software (Sciex, Toronto, Canada). For every molecular feature studied, there was a corresponding peak in the extracted ion chromatogram, exhibiting good agreement for both retention time and relative abundance, over greater than 3 orders of magnitude.
Mass Profiler
In biomarker discovery, carefully designed protocols for sample collection, handling, preparation and analysis must be followed to enable meaningful differential analysis. Appropriate statistical analysis then allows segregation of experimental and within-population variations from cross-population expression level changes. The Profiler software compensates for scan-to-scan shifts in retention time (seconds) and measured masses (ppm). Common features are identified and cross-sample response relative standard deviation values are calculated. Results filters can be used to reduce the number of differentially expressed metabolites to be investigated.
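A minimal sketch of cross-sample matching and the relative standard deviation calculation follows. It assumes that two features are treated as the same compound when their retention times agree within a tolerance in seconds and their masses agree within a tolerance in ppm; the tolerance values and function names are illustrative and are not taken from the Profiler software.

```python
import statistics


def matches(feat_a, feat_b, rt_tol_sec=6.0, mass_tol_ppm=10.0):
    """feat_* are (rt_sec, neutral_mass) pairs from two data sets."""
    rt_a, mass_a = feat_a
    rt_b, mass_b = feat_b
    ppm = abs(mass_a - mass_b) / mass_a * 1e6
    return abs(rt_a - rt_b) <= rt_tol_sec and ppm <= mass_tol_ppm


def relative_std_dev(responses):
    """Percent RSD of one feature's response across replicate runs."""
    mean = statistics.mean(responses)
    return 100.0 * statistics.stdev(responses) / mean


same_compound = matches((206.4, 330.2770), (208.0, 330.2785))
different_mass = matches((206.4, 330.2770), (206.4, 330.2870))
rsd = relative_std_dev([90.0, 100.0, 110.0])
```

A low RSD across replicates indicates that a response difference between sample sets is less likely to be an artifact of run-to-run variation.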
Over 13,192 total features were extracted from the two sets of 3 replicates each of rat urine LC/MS data (FIG. 5a). However, only 699 features were present in all six LC/MS data sets and only 84 features exhibited a response difference of a factor of 2 or greater across the two sample sets (FIG. 5b). Differential expression changes are readily visualized using the Log ratio versus retention time and Log/Log ratio plots (FIGS. 5c and 5d).
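The factor-of-2 criterion and the Log ratio used for plotting can be sketched as below. The base of the logarithm is not stated in the text; base 2 is assumed here because a two-fold change then maps to a Log ratio of exactly 1. The feature identifiers and mean responses are made-up example values.

```python
import math


def log2_ratio(mean_a, mean_b):
    """Log2 of the response ratio (set B over set A); both must be > 0."""
    return math.log2(mean_b / mean_a)


def differential(features, min_fold=2.0):
    """features maps a feature id to (mean response in set A, in set B).
    Returns the log2 ratio for features changing by at least min_fold
    in either direction."""
    threshold = math.log2(min_fold)
    result = {}
    for feature_id, (mean_a, mean_b) in features.items():
        ratio = log2_ratio(mean_a, mean_b)
        if abs(ratio) >= threshold:
            result[feature_id] = ratio
    return result


example = {
    "f1": (100.0, 400.0),  # four-fold increase
    "f2": (100.0, 150.0),  # below the two-fold cutoff
    "f3": (200.0, 100.0),  # two-fold decrease
}
changed = differential(example)
```

Plotting each retained ratio against the feature's retention time gives a Log ratio versus retention time view of the kind referred to above.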
Chemical Identification of Metabolites
Following differential expression analysis, a next step is frequently the chemical identification of differentially expressed features. Chemical identification capabilities are demonstrated using LCMS TOF data obtained from a pancreatic islet cellular extract. The sample is a complex mixture of low molecular weight metabolites (amino acids, glycerol-esters, fatty acids) and insulin. The Profiler software employs the measured mass to determine putative elemental compositions. For example, the feature at 3.440 minutes corresponds to glycerol monopalmitate with an empirical formula of C19H38O4, a mass error of −0.2 ppm and an isotope match of 91 (FIG. 6a). Deconvolution of multiple charge states by the software confirmed residual insulin (FIGS. 6b/6c).
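The mass error quoted for a putative elemental composition is the difference between the measured and theoretical monoisotopic masses, expressed in parts per million. The sketch below computes the theoretical mass of C19H38O4 from standard monoisotopic atomic masses and defines the ppm error; the measured value used in the example is hypothetical, chosen only to reproduce an error of about −0.2 ppm.

```python
# Monoisotopic atomic masses in Da (12C, 1H, 16O).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(composition):
    """composition maps element symbol to atom count."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in composition.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


theoretical = monoisotopic_mass({"C": 19, "H": 38, "O": 4})
measured = theoretical * (1 - 0.2e-6)  # hypothetical measurement
error_ppm = ppm_error(measured, theoretical)
```

Candidate formulas can be ranked by the absolute value of this error, with the isotope match score used as a second criterion.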
The various embodiments described above are provided by way of illustration only and should not be construed to limit the invention. Those skilled in the art will readily recognize various modifications and changes that may be made to the present invention without following the example embodiments and applications illustrated and described herein, and without departing from the true spirit and scope of the present invention.
Claims (20)
1. A method of analyzing mass spectral data obtained from a sample, comprising:
dividing the mass spectral data into feature groups, wherein each of said feature groups relates to a compound; and
filtering the data based on properties selected from the group consisting of retention time, mass, isotope pattern, charge state, abundance, mass defect, and number of ions.
2. The method of claim 1, wherein said dividing is performed based on retention time, mass to charge ratio, isotope clustering, adduct formation, dimer formation, and trimer formation of spectral peaks.
3. The method of claim 1 further comprising comparing the properties of at least one feature group to the properties of a known material to identify one or more components in the sample.
4. The method of claim 1, wherein the sample is a biological sample.
5. A method for differential profiling multiple sets of mass spectral data, wherein each set of the mass spectral data is obtained from a distinct sample, the method comprising:
(a) analyzing each set of mass spectral data according to the method of claim 1;
(b) comparing the results of step (a) from different samples to identify compounds that are present in different amounts between or among the samples.
6. A computer-readable medium comprising executable instructions to perform the method of claim 1.
7. A system comprising the computer-readable medium of claim 6.
8. The system of claim 7 further comprising a mass spectrometer.
9. The system of claim 8 further comprising at least one liquid chromatography column.
10. The system of claim 8 further comprising at least one gas chromatography column.
11. The system of claim 8 further comprising at least one capillary electrophoresis apparatus.
12. The system of claim 8 wherein the mass spectrometer comprises an ion source selected from the group consisting of electrospray, matrix assisted laser desorption (MALDI), and photoionization ion sources.
13. The system of claim 8 wherein the mass spectrometer comprises a mass analyzer selected from the group consisting of quadrupole, time-of-flight, ion trap, and fourier transform-ion cyclotron resonance (FT-ICR) mass analyzers.
14. A method of comparing the compositions of multiple samples, comprising:
(a) separating at least part of the components in each sample;
(b) analyzing the separated components in each sample with a mass spectrometer to generate a mass spectral data set for each sample;
(c) recording the multiple mass spectral data sets; and
(d) analyzing each of the mass spectral data sets according to the method of claim 1 and comparing the results from said multiple samples.
15. The method of claim 14, wherein said dividing is performed based on retention time, mass to charge ratio, isotope clustering, adduct formation, dimer formation, and trimer formation of spectral peaks in the mass spectral data.
16. The method of claim 14 further comprising comparing the properties of at least one feature group to the properties of a known material to identify one or more components in each of the samples.
17. The method of claim 14, wherein the multiple samples are biological samples.
18. The method of claim 14, wherein the at least part of the components are separated in step (a) by liquid chromatography.
19. A method for differential profiling multiple sets of mass spectral data, wherein each set of the mass spectral data is obtained from a distinct sample, the method comprising:
(a) dividing the mass spectral data from each sample into feature groups, each feature group relating to a compound, wherein said dividing is performed based on retention time, mass to charge ratio, and optionally at least one property selected from the group consisting of isotope clustering, mass, adduct formation, dimer formation, and trimer formation; and
(b) comparing the results of step (a) from different samples to identify compounds that are present in different amounts between or among the samples.
20. A computer-readable medium comprising executable instructions to perform the method of claim 19.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/670,955 US7653496B2 (en) | 2006-02-02 | 2007-02-02 | Feature selection in mass spectral data |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US76472906P | 2006-02-02 | 2006-02-02 | |
| US11/670,955 US7653496B2 (en) | 2006-02-02 | 2007-02-02 | Feature selection in mass spectral data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20070176088A1 US20070176088A1 (en) | 2007-08-02 |
| US7653496B2 (en) | 2010-01-26 |
Family
ID=38321126
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US11/670,955 Active 2028-07-31 US7653496B2 (en) | 2006-02-02 | 2007-02-02 | Feature selection in mass spectral data |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US7653496B2 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100108876A1 (en) * | 2008-10-31 | 2010-05-06 | Horn David M | Mass Spectral Analysis Of Complex Samples Containing Large Molecules |
| US20130297230A1 (en) * | 2012-05-07 | 2013-11-07 | Shimadzu Corporation | Data-Processing System For Chromatographic Mass Spectrometry |
| EP3980764A4 (en) * | 2019-06-06 | 2023-07-05 | Icahn School of Medicine at Mount Sinai | SYSTEMS AND METHODS OF DIAGNOSING BIOLOGICAL DISORDERS ASSOCIATED WITH PERIODIC VARIATIONS IN METAL METABOLISM |
Families Citing this family (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4985153B2 (en) * | 2007-07-03 | 2012-07-25 | 株式会社島津製作所 | Chromatograph mass spectrometer |
| US7820964B2 (en) * | 2007-08-06 | 2010-10-26 | Metabolic Analyses, Inc | Method for generation and use of stable isotope patterns in mass spectral data |
| US8536520B2 (en) * | 2007-08-06 | 2013-09-17 | Iroa Technologies Llc | Method for generation and use of isotopic patterns in mass spectral data of simple organisms |
| US7820963B2 (en) | 2007-08-06 | 2010-10-26 | Metabolic Alayses, Inc. | Method for generation and use of isotopic patterns in mass spectral data of simple organisms |
| ES2456266T3 (en) * | 2007-10-02 | 2014-04-21 | Iroa Technologies Llc | Generation and use of isotopic patterns in phenotypic comparison of mass spectra of organisms |
| US8969251B2 (en) * | 2007-10-02 | 2015-03-03 | Methabolic Analyses, Inc. | Generation and use of isotopic patterns in mass spectral phenotypic comparison of organisms |
| DE102007000627A1 (en) | 2007-11-06 | 2009-05-07 | Agilent Technologies Inc., Santa Clara | Measured data evaluating device for e.g. liquid chromatography/mass spectrometry device, has processing unit for processing of measured data of measurements such that processed data are represented in two-dimensions |
| US8283631B2 (en) * | 2008-05-08 | 2012-10-09 | Kla-Tencor Corporation | In-situ differential spectroscopy |
| WO2012107786A1 (en) | 2011-02-09 | 2012-08-16 | Rudjer Boskovic Institute | System and method for blind extraction of features from measurement data |
| JP5941073B2 (en) * | 2011-03-11 | 2016-06-29 | レコ コーポレイションLeco Corporation | System and method for processing data in a chromatography system |
| JP6245387B2 (en) * | 2015-01-26 | 2017-12-13 | 株式会社島津製作所 | Three-dimensional spectral data processing apparatus and processing method |
| US10823714B2 (en) | 2016-12-29 | 2020-11-03 | Thermo Finnigan Llc | Simplified source control interface |
| US11519900B2 (en) * | 2017-01-23 | 2022-12-06 | Koninklijke Philips N.V. | Alignment of breath sample data for database comparisons |
| AU2018224235B2 (en) | 2017-02-24 | 2024-12-05 | Iroa Technologies, Llc | IROA metabolomics workflow for improved accuracy, identification and quantitation |
| CN110709700A (en) | 2017-04-14 | 2020-01-17 | 朱诺治疗学股份有限公司 | Method for evaluating cell surface glycosylation |
| JP6505167B2 (en) * | 2017-07-21 | 2019-04-24 | 株式会社日立ハイテクサイエンス | Mass spectrometer and mass spectrometry method |
| US20190391092A1 (en) * | 2018-06-21 | 2019-12-26 | Oregon Institute of Science and Medicine | Metabolic profiling with magnetic resonance mass spectrometry (mrms) |
| JP7504891B2 (en) | 2018-09-11 | 2024-06-24 | ジュノー セラピューティクス インコーポレイテッド | Methods for mass spectrometric analysis of engineered cell compositions - Patents.com |
| JP6915005B2 (en) * | 2019-08-28 | 2021-08-04 | 日本電子株式会社 | Mass spectrum processing device and model generation method |
| EP4070332A1 (en) * | 2019-12-05 | 2022-10-12 | BASF Plant Science Company GmbH | Method for analyzing the metabolic content of a biological sample |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5699269A (en) * | 1995-06-23 | 1997-12-16 | Exxon Research And Engineering Company | Method for predicting chemical or physical properties of crude oils |
| US7499807B1 (en) * | 2006-09-19 | 2009-03-03 | Battelle Memorial Institute | Methods for recalibration of mass spectrometry data |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100108876A1 (en) * | 2008-10-31 | 2010-05-06 | Horn David M | Mass Spectral Analysis Of Complex Samples Containing Large Molecules |
| US7910877B2 (en) * | 2008-10-31 | 2011-03-22 | Agilent Technologies, Inc. | Mass spectral analysis of complex samples containing large molecules |
| US20110198491A1 (en) * | 2008-10-31 | 2011-08-18 | Horn David M | Mass spectral analysis of complex samples containing large molecules |
| US8237108B2 (en) * | 2008-10-31 | 2012-08-07 | Agilent Technologies, Inc. | Mass spectral analysis of complex samples containing large molecules |
| US20130297230A1 (en) * | 2012-05-07 | 2013-11-07 | Shimadzu Corporation | Data-Processing System For Chromatographic Mass Spectrometry |
| US10607722B2 (en) * | 2012-05-07 | 2020-03-31 | Shimadzu Co. | Data-processing for chromatographic mass spectrometry |
| EP3980764A4 (en) * | 2019-06-06 | 2023-07-05 | Icahn School of Medicine at Mount Sinai | SYSTEMS AND METHODS OF DIAGNOSING BIOLOGICAL DISORDERS ASSOCIATED WITH PERIODIC VARIATIONS IN METAL METABOLISM |
Also Published As
| Publication number | Publication date |
|---|---|
| US20070176088A1 (en) | 2007-08-02 |
Similar Documents
| Publication | Title |
|---|---|
| US7653496B2 (en) | Feature selection in mass spectral data |
| Courant et al. | Basics of mass spectrometry based metabolomics |
| Karpievitch et al. | Liquid chromatography mass spectrometry-based proteomics: biological and technological aspects |
| Theodoridis et al. | Mass spectrometry‐based holistic analytical approaches for metabolite profiling in systems biology studies |
| US8068987B2 (en) | Method and system for profiling biological systems |
| JP4768189B2 (en) | Methods for non-targeted complex sample analysis |
| Metz et al. | Future of liquid chromatography–mass spectrometry in metabolic profiling and metabolomic studies for biomarker discovery |
| EP2834835B1 (en) | Method and apparatus for improved quantitation by mass spectrometry |
| CN109564207B (en) | Mass spectrometry methods for detection and quantification of metabolites |
| JP7431933B2 (en) | Method for absolute quantification of low abundance polypeptides using mass spectrometry |
| Schwämmle et al. | Computational and statistical methods for high-throughput analysis of post-translational modifications of proteins |
| WO2003102543A2 (en) | A method of using data binning in the analysis of chromatography/spectrometry data |
| EP3542292B1 (en) | Techniques for mass analyzing a complex sample |
| US10197576B2 (en) | Mass spectrometry imaging with substance identification |
| CA2614508C (en) | Means and methods for characterizing a chemical sample |
| US20220221433A1 (en) | Mass spectrometry assay methods for detection of metabolites |
| Hu et al. | Recent advances in mass spectrometry-based peptidome analysis |
| Mirnaghi et al. | Challenges of analyzing different classes of metabolites by a single analytical method |
| Furlani et al. | Liquid chromatography-mass spectrometry for clinical metabolomics: an overview |
| JP4317083B2 (en) | Mass spectrometry method and mass spectrometry system |
| Deda et al. | GC-MS-based metabolic phenotyping |
| Stojiljkovic et al. | Evaluation of horse urine sample preparation methods for metabolomics using LC coupled to HRMS |
| US8237108B2 (en) | Mass spectral analysis of complex samples containing large molecules |
| JP2009020037A (en) | Identification method by metabolome analysis, identification method of drug metabolite, and screening method thereof |
| Junot et al. | Metabolomics using Fourier transform mass spectrometry |
Legal Events
| Code | Title | Description |
|---|---|---|
| AS | Assignment | Owner name: AGILENT TECHNOLOGIES, INC., COLORADO. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LI, XIANGDONG DON;REEL/FRAME:018871/0509. Effective date: 20070202 |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| FPAY | Fee payment | Year of fee payment: 4 |
| FPAY | Fee payment | Year of fee payment: 8 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12 |