WO2007147938A1 - Normalization of spectroscopy data by means of multiple internal standards - Google Patents

Normalization of spectroscopy data by means of multiple internal standards

Info

Publication number
WO2007147938A1
Authority
WO
WIPO (PCT)
Prior art keywords
peaks
variability
spectrum
denotes
data
Prior art date
Application number
PCT/FI2007/050356
Other languages
English (en)
Other versions
WO2007147938A9 (fr)
Inventor
Matej Oresic
Original Assignee
Valtion Teknillinen Tutkimuskeskus
Priority date
2006-06-21
Filing date
2007-06-14
Publication date
2007-12-27
Application filed by Valtion Teknillinen Tutkimuskeskus
Publication of WO2007147938A1
Publication of WO2007147938A9

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/86 Signal analysis
    • G01N30/8624 Detection of slopes or peaks; baseline correction
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00 Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02 Column chromatography
    • G01N30/62 Detectors specially adapted therefor
    • G01N30/72 Mass spectrometers
    • G01N30/7233 Mass spectrometers interfaced to liquid or supercritical fluid chromatograph

Definitions

  • the invention relates to methods and equipment for normalizing spectroscopy data, particularly metabolomics data, by multiple internal standards. Particularly, the invention relates to forming an optimal selection of the multiple internal standards.
  • an internal standard compound means a standard compound which is added to a sample prior to extraction
  • an external standard compound means a standard compound which is added to the sample after extraction.
  • Metabolomics is a discipline dedicated to the global study of metabolites, their dynamics, composition, interactions, and responses to interventions or to changes in their environment, in cells, tissues, and biofluids. Concentration changes of specific groups of metabolites may be descriptive of systems' responses to environmental or genetic interventions, and their study may therefore be a powerful tool for characterization of complex phenotypes as well as for development of biomarkers for specific physiological responses. Study of the variability of metabolites in different states of biological systems is therefore an important task in systems biology. Because researchers' principal interest is in system responses which result in metabolite level regulation in relation to diverse genetic or environmental changes, it is important to separate such interesting biological variation from obscuring sources of variability introduced in experimental studies of metabolites.
  • the sources of the obscuring variation are many and platform-specific. Such sources may include variation in sample preparation and metabolite extraction, which are affected by primary sample handling such as quenching, pipetting error, reagent quality or temperature. In mass spectrometry-based detection, the sources include the variations in the ion source as well as biological sample-specific effects such as ion suppression. Following the measurement, the data pre-processing steps, such as peak detection and alignment, may introduce additional errors. Chemical diversity of metabolites, which may, for example, lead to different recoveries during extraction and responses during ionization in a mass spectrometer, hampers the task of separating interesting variations from obscuring ones.
  • Quantitative analytical methods have commonly relied on utilization of an isotope-labelled internal standard for each metabolite measured. However, in broad metabolic profiling approaches this is not practical, since the number of metabolites is very high. Their chemical diversity is too high for a common labelling approach, and many of the metabolites may not even be known.
  • a first category includes statistical models used to derive optimal scaling factors for each sample on the basis of a complete dataset, such as normalization by sum of squares of intensities or maximum likelihood method adopted from the approach developed for gene expression data.
  • a second category includes normalization techniques by one or more internal or external standard compounds on the basis of empirical rules, such as specific regions of retention time, or distance to the metabolite peaks in the spectra.
  • the retention time is not necessarily descriptive of all matrix and chemical properties leading to obscuring variation.
  • in lipid separation based on reverse-phase LC, diverse lipid species such as ceramides, sphingomyelins, diacylglycerols, and several phospholipid classes overlap in retention time, and it is not reasonable to assume that the same normalization factor can be applied to all these species. The situation is even more complex when analyzing water-soluble metabolites.
  • An object of the invention is to develop methods and equipment which alleviate some or all of the problems described above. Particularly, it is an object of the invention to improve the ability of spectrometer analysis equipment to distinguish between relevant and obscuring variations. This is accomplished by a novel normalization method which diminishes effects of systematic variation within the spectra.
  • An aspect of the invention is a method for normalizing a plurality of spectra, the method comprising: - preparing a plurality of experiment runs;
  • δ denotes variability of a quantity, wherein the variability is a measure of the quantity's deviation from an average value of the quantity over the sample runs;
  • the acronym "NOMIS" which stands for Normalization with Multiple Internal Standards, denotes the technique according to the invention.
  • the NOMIS technique can be used directly as a one-step normalization method, or as a two-step method where the normalization parameters containing information about the variabilities of internal standard compounds and their association to variabilities of metabolites are first calculated from a repeatability study. Additionally, the technique can be used to select standard compounds for normalization and evaluate their influence on variability of metabolites across the full spectrum.
  • the inventive method is formally expressed as follows.
  • the non-normalized metabolomics data resulting from first stages of pre-processing, which usually include peak detection and alignment, can be represented by a matrix of N variables (metabolite peaks) and M objects (samples). For example, in liquid chromatography/mass spectrometry-based (LC/MS) profiling, each peak is represented by its mass-to-charge ratio (m/z) and retention time (rt).
  • m_i is the actual intensity value, i.e. an intensity value independent of the run, r_ij is the correction factor, and e_ij is the random error.
  • the systematic variation in each individual metabolite X_i is modelled as a function of variation of standard compounds, as illustrated in Figure 2. Based on this assumption, the correction factors r_ij can be determined from the profiles of standard compounds.
  • the random error ε is assumed Gaussian with a zero mean and independent variables:
  • variable ρ: logarithm of the correction factor
  • can be obtained from the profiles of identified internal standards found in the spectra, and the parameters β can be calculated from equation [9].
  • because the matrix β relates the variability of each individual metabolite in a biological matrix to that of the internal standards for a specific platform and biological matrix, the parameters β can be obtained from a separate repeatability experiment involving a large number of repeated measurements. This may often be desirable due to the large number of normalization parameters (N×S) to be determined by the inventive technique.
  • the correction factors from equation [12] in a real biological application then include the matrix β obtained independently as well as the measured levels of internal standards from the biological experiment.
  • a technical benefit of the inventive normalization technique is improved spectroscopy analysis because the effect of systematic variation is diminished.
  • Those skilled in the art will realize that the use of the logarithm function as the data transformation function simplifies the description of the inventive normalization method. It also simplifies calculations for computers. However, the invention is not restricted to the use of the logarithm function, and a large variety of data transformation functions can be used.
  • Figure 2 illustrates an operating principle of the inventive normalization method
  • Figure 3 shows coefficient of variation distributions for different normalization methods
  • Figure 4 shows coefficients of variation for individual peaks in a liver repeatability study
  • Figure 5 shows an internal standard profile upon its addition to a raw dataset
  • Figure 6 illustrates the inventive method as a tool to select the best set of internal standards used for normalization
  • Figure 7 shows the beta (β) matrix values for selected liver lipid components
  • Figure 8 shows coefficients of variation for identified liver lipid species
  • Figure 9 which was described earlier, in the background section of this application, shows a comparison of two metabolomic total ion chromatograms (TIC) from two different mouse phenotypes.
  • Figure 1 is a flow chart illustrating main phases in a method according to an embodiment of the invention.
  • the invention relates to processing of spectral data from a plurality of sample runs. Each sample run produces a spectrum (spectral data) from a sample.
  • the samples used in the different sample runs can be subsamples from a common larger sample, or they can derive from different samples altogether.
  • Reference numeral 1-2 denotes sample preparation steps which are known to those skilled in the art and which have been briefly discussed in the background section of this document.
  • Reference numeral 1-4 denotes a step which comprises spectrometry operations, including recording of measured spectral data.
  • Reference numeral 1-6 denotes an optional step in which the spectral data is converted from a vendor-specific data format to some open data format, such as netCDF. A benefit of this step, or the corresponding routine and data structures in the software product, is the ability to support a wide variety of spectrometry instruments.
  • the spectral data is smoothed to suppress noise and other spurious data. In some implementations this step may be performed by the spectrometer itself.
  • In step 1-10, the spectral data is internally represented in two dimensions, wherein one dimension corresponds to mass-to-charge ratio m/z, while the other dimension corresponds to retention time rt.
  • the term 'internal representation' means that a visualization of the spectral data is not necessary, at least not at this stage.
  • Reference numeral 1-12 denotes a peak detection step in which peaks in the spectral data are detected.
  • Steps 1-2 through 1-12 are known to those skilled in the art and a detailed description is omitted for brevity. In these steps the several sample runs are typically processed serially, one sample run at a time. In the following steps the several sample runs are processed in parallel, interdependently.
  • In step 1-14, data from the several sample runs are aligned such that there is a maximal correspondence between the peaks of the spectra.
  • the verb 'align' may imply visualization, but visualization is not strictly necessary, and any equivalent data processing technique may be used.
  • the alignment operation searches for corresponding peaks across different mass spectrometry runs. Peaks from the same compound usually match closely in m/z values, but retention time between the runs may vary. The retention time largely depends on the analytical method used.
  • a method according to the invention comprises a second peak detection step 1-16, the purpose of which is to fill these gaps.
  • the second peak detection step employs the m/z and rt values for estimating locations in which the missing peaks can be expected.
  • a search is then conducted to find the highest local maximum over a range around the expected location in the raw spectral data. The search is performed over a search window which is preferably user-settable.
  • Step 1-18 relates to a normalization step which is further described in connection with Figure 2 and the above-described equations.
  • Figure 2 illustrates an operating principle of the inventive normalization method. As usual, “m/z” stands for mass-to-charge ratio and “rt” denotes retention time. Figure 2 illustrates how the normalization factors F_i(δIS1), ..., F_i(δIS4) for each metabolite peak M_i are influenced by the variability of each internal or external standard component and its association with the variability of the metabolite.
  • the standard components are shown as internal standard components IS1, ..., IS4.
  • Figure 3 shows coefficient of variation distributions for different normalization methods.
  • the data shown in Figure 3 is based on mouse liver repeatability and reproducibility run of 16 samples (3 extractions from the same biological sample, each with repeated runs of 10, 3, and 3 injections, respectively). A total of 1470 monoisotopic peaks were included in the analysis.
  • the technique according to the invention which is denoted by symbol "NOMIS" and placed in the upper-right hand corner of Figure 3, produces a notably narrower distribution of coefficient of variation (CV) as well as a lower median CV than do raw data and other normalization methods.
  • Figure 4 shows coefficients of variation for individual peaks in a liver repeatability study. Each detected peak is shown in a two-dimensional plot of m/z vs.
  • Figure 5 shows a coefficient of variation for an internal standard (GPEth(17:0/17:0)) profile upon its addition to a raw dataset.
  • the NOMIS method therefore utilized only four internal standards. While none of the methods produces a significant deviation in intensity, the NOMIS method leads to the lowest variability of the component.
  • Figure 6 illustrates the inventive technique as a tool for selecting an optimal set of internal standards used for normalization.
  • Figure 7 shows the beta (β) matrix values for selected liver lipid components.
  • the beta matrix values are shown for eight illustrative lipid molecular species of different functional class and for all internal standards used, which are abbreviated as shown in Table 1.
  • LPC has, as expected, a high influence on monoacyl lipids.
  • sphingomyelin, which does not have an internal standard of its own, is influenced most by ceramide and PC, as one would expect based on chemical structure.
  • the internal standard specific factor influencing the normalization is also proportional to the internal standard concentration.
  • Figure 8 shows coefficients of variation for identified liver lipid species. Each lipid molecular species is shown in a two-dimensional plot of m/z vs. retention time, with the colour corresponding to the coefficient of variation. The data is based on normalization performed on a different biological sample than in Figure 4, which was run nine times (three extractions with three injections each). A total of 360 identified lipid molecular species were included in the analysis. The NOMIS method utilized the beta matrix calculated previously from the 16-sample run which was described in connection with Figures 3 and 4.
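
The two-step use described above (β estimated from a repeatability study, then applied to correct intensities) can be sketched roughly as follows. This is a minimal illustration, not the patent's estimator: it assumes the logarithm as the transformation function and a plain least-squares fit of δY on δΩ, since equations [9] and [12] are not reproduced in this text. The names `fit_beta`, `normalize`, `cv`, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_beta(X, Z):
    """Estimate beta (N x S) from repeatability data.

    X: N x M intensities of metabolite peaks; Z: S x M intensities of
    internal standards; both strictly positive. Assumes a linear model
    in log space (an illustrative choice, not taken from the source).
    """
    Y = np.log(X)
    omega = np.log(Z)
    omega_mean = omega.mean(axis=1, keepdims=True)
    dY = Y - Y.mean(axis=1, keepdims=True)   # variability of each peak
    dO = omega - omega_mean                  # variability of each standard
    # Least squares for dY ~= beta @ dO, solved in transposed form.
    beta_T, *_ = np.linalg.lstsq(dO.T, dY.T, rcond=None)
    return beta_T.T, omega_mean


def normalize(X, Z, beta, omega_mean):
    """Divide each intensity by the correction factor exp(beta @ dOmega)."""
    dO = np.log(Z) - omega_mean
    return X / np.exp(beta @ dO)


def cv(a):
    """Coefficient of variation of each peak (row) across runs."""
    return a.std(axis=1, ddof=1) / a.mean(axis=1)


# Synthetic repeatability data: two run effects shared by standards and peaks.
rng = np.random.default_rng(0)
n_runs = 8
drift = rng.normal(scale=0.3, size=(2, n_runs))       # hypothetical run effects
Z = np.exp(drift + np.array([[5.0], [6.0]]))          # S = 2 internal standards
loadings = rng.uniform(0.2, 1.0, size=(5, 2))         # N = 5 metabolite peaks
X = np.exp(np.arange(3.0, 8.0)[:, None] + loadings @ drift)

beta, omega_mean = fit_beta(X, Z)
X_norm = normalize(X, Z, beta, omega_mean)
```

In a two-step setting, `beta` would come from the repeatability runs and `normalize` would be called on the intensities of a different experiment. Whether the reference mean `omega_mean` should then be taken from the repeatability study or from the new runs is a design choice this sketch does not settle.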

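The second peak detection step (1-16) described above, which searches the raw data around an expected location, might look roughly like this. It is a sketch under stated assumptions: the raw data is taken to be three parallel arrays of data points, the window is a simple rectangle in m/z and rt, and the highest intensity inside the window stands in for the "highest local maximum". The function name `fill_missing_peak` and the default tolerances are hypothetical, standing in for the user-settable search window.

```python
import numpy as np


def fill_missing_peak(mz, rt, intensity, mz_expected, rt_expected,
                      mz_tol=0.1, rt_tol=10.0):
    """Return (m/z, rt, intensity) of the strongest raw data point inside
    the search window around the expected location, or None if the
    window contains no data points."""
    mz = np.asarray(mz, dtype=float)
    rt = np.asarray(rt, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    in_window = (np.abs(mz - mz_expected) <= mz_tol) & \
                (np.abs(rt - rt_expected) <= rt_tol)
    if not in_window.any():
        return None
    candidates = np.flatnonzero(in_window)
    best = candidates[np.argmax(intensity[candidates])]
    return float(mz[best]), float(rt[best]), float(intensity[best])


# Hypothetical raw data points from one run.
raw_mz = [500.02, 500.05, 500.50, 500.01]
raw_rt = [120.0, 123.0, 121.0, 200.0]
raw_int = [50.0, 900.0, 5000.0, 7000.0]

# The last two points are stronger but fall outside the window.
found = fill_missing_peak(raw_mz, raw_rt, raw_int, 500.0, 120.0)
missing = fill_missing_peak(raw_mz, raw_rt, raw_int, 700.0, 120.0)
```

A stricter implementation could require the selected point to be a true local maximum of the signal rather than merely the largest value in the window.
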
Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention relates to normalization of spectra, comprising the steps of: preparing (1-2) experiment runs; processing (1-4) them in an LC/MS spectrometer to obtain a spectrum for each experiment run; internally representing (1-10) each spectrum as mass-to-charge ratio (m/z) versus retention time (rt); detecting peaks (1-12) in each spectrum; internally aligning (1-14) the detected peaks; and normalizing (1-18) the spectra, which comprises modelling the variations of Y_ij, denoted δY_ij, as a function of the variability of Ω, denoted f(δΩ). δ denotes the variability of a quantity (the deviation of the quantity from its average value over a batch of samples); X = X_ij = the intensity matrix of all peaks, mapped to Y via a first transformation function such that Y = f1(X); Z = Z_ij = the intensity matrix of the peaks of the internal standards (IS1 - IS4), mapped to Ω via a second transformation function such that Ω = f2(Z). i denotes peaks: i → {m/z, rt}, i = 1...N, and j indexes the experiment runs of a batch.
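
The quantities named in the abstract can be written compactly as below. This is only an illustrative restatement: the second transformation function is written f_2 here, the standards are indexed by s = 1...S and the runs by j = 1...M as in the description, and the linear form of f with coefficients β_is is an assumption chosen for concreteness. The text itself only requires that δY be modelled as some function of δΩ.

```latex
\begin{align}
Y_{ij} &= f_1(X_{ij}), &
\Omega_{sj} &= f_2(Z_{sj}), \\
\delta Y_{ij} &= Y_{ij} - \frac{1}{M}\sum_{k=1}^{M} Y_{ik}, &
\delta\Omega_{sj} &= \Omega_{sj} - \frac{1}{M}\sum_{k=1}^{M} \Omega_{sk}, \\
\delta Y_{ij} &\approx f(\delta\Omega)_{ij}
  = \sum_{s=1}^{S} \beta_{is}\,\delta\Omega_{sj}
  && \text{(assumed linear form)}.
\end{align}
```

With f_1 = f_2 = log, the modelled term corresponds to the logarithm of the correction factor, so dividing X_ij by the exponential of the sum removes the modelled systematic variation.
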
PCT/FI2007/050356 2006-06-21 2007-06-14 Normalization of spectroscopy data by means of multiple internal standards WO2007147938A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20065430 2006-06-21
FI20065430A FI20065430A0 (fi) 2006-06-21 2006-06-21 Normalization of spectroscopy data by means of multiple internal standards

Publications (2)

Publication Number Publication Date
WO2007147938A1 (fr) 2007-12-27
WO2007147938A9 (fr) 2008-03-20

Family

ID=36651525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2007/050356 WO2007147938A1 (fr) 2006-06-21 2007-06-14 Normalization of spectroscopy data by means of multiple internal standards

Country Status (3)

Country Link
US (1) US20080091359A1 (fr)
FI (1) FI20065430A0 (fr)
WO (1) WO2007147938A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
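CN104170052B (zh) * 2012-04-02 2017-08-11 塞莫费雪科学(不来梅)有限公司 Method and apparatus for improved quantitation by mass spectrometry
JP6152301B2 (ja) * 2013-06-03 2017-06-21 国立医薬品食品衛生研究所長 Quantification method and program
CN110044997B (zh) * 2018-01-15 2023-08-04 中国医学科学院药物研究所 Virtual ion-intensity correction and quantitative mass spectrometry imaging analysis method for drugs in vivo
CN112395983B (zh) * 2020-11-18 2022-03-18 深圳市步锐生物科技有限公司 Method and device for aligning peak positions in mass spectrometry data
CN112967758A (zh) * 2021-02-04 2021-06-15 麦特绘谱生物科技(上海)有限公司 Self-assembling metabolomics data processing system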

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060080040A1 (en) * 2004-04-23 2006-04-13 Roche Diagnostics Operations, Inc. Method and system for processing multi-dimensional measurement data
WO2006125865A1 (fr) * 2005-05-26 2006-11-30 Valtion Teknillinen Tutkimuskeskus Analysis techniques for liquid chromatography/mass spectrometry

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BERGEN H.R. ET AL.: "Normalization of relative peptide ratios derived from in-gel digests: applications to protein variant analysis at the peptide level", RAPID COMMUN. MASS SPECTROM., vol. 19, 2005, pages 2871 - 2877 *
DE RIDDER F. ET AL.: "An improved multiple internal standard normalisation for drift in LA-ICP-MS measurements", J. ANAL. AT. SPECTROM., vol. 17, 2002, pages 1461 - 1470 *
KATAJAMAA M. ET AL.: "Processing methods for differential analysis of LC/MS profile data", BMC BIOINFORMATICS 2005, vol. 6, no. 179, 2005 *
ORESIC M.: "Normalization of metabolomics data using multiple internal standards", PROCEEDINGS OF PROBABILISTIC MODELING AND MACHINE LEARNING IN STRUCTURAL AND SYSTEMS BIOLOGY WORKSHOP (PSMB 2006, JUHO ROUSU, SAMUEL KASKI AND ESKO UKKONEN, EDS., TUUSULA, FINLAND, 17 June 2006 (2006-06-17) - 18 June 2006 (2006-06-18), pages 147 - 152 *

Also Published As

Publication number Publication date
WO2007147938A9 (fr) 2008-03-20
FI20065430A0 (fi) 2006-06-21
US20080091359A1 (en) 2008-04-17

Similar Documents

Publication Publication Date Title
Domingo-Almenara et al. Metabolomics data processing using XCMS
Sysi-Aho et al. Normalization method for metabolomics data using optimal selection of multiple internal standards
Katajamaa et al. Data processing for mass spectrometry-based metabolomics
Cao et al. Predicting retention time in hydrophilic interaction liquid chromatography mass spectrometry and its use for peak annotation in metabolomics
US8068987B2 (en) Method and system for profiling biological systems
Tautenhahn et al. Highly sensitive feature detection for high resolution LC/MS
Kenar et al. Automated label-free quantification of metabolites from liquid chromatography–mass spectrometry data
CA2501003C (fr) Sample analysis to provide characterization data
JP4818116B2 (ja) Method and device for processing LC-MS or LC-MS/MS data in metabonomics
Farrés et al. Chemometric evaluation of Saccharomyces cerevisiae metabolic profiles using LC–MS
EP2834835B1 (fr) Method and apparatus for improved quantitation by mass spectrometry
O’Connor et al. LipidFinder: a computational workflow for discovery of lipids identifies eicosanoid-phosphoinositides in platelets
Burton et al. Instrumental and experimental effects in LC–MS-based metabolomics
Zhang et al. A comprehensive automatic data analysis strategy for gas chromatography-mass spectrometry based untargeted metabolomics
Khan et al. Evaluating a targeted multiple reaction monitoring approach to global untargeted lipidomic analyses of human plasma
Ruhaak et al. Chip-based nLC-TOF-MS is a highly stable technology for large-scale high-throughput analyses
Matsuda et al. Assessment of metabolome annotation quality: a method for evaluating the false discovery rate of elemental composition searches
JP2006528339A (ja) Method and system for annotating biomolecule patterns in chromatography/mass spectrometry
WO2007147938A1 (fr) Normalization of spectroscopy data by means of multiple internal standards
Mitchell et al. New methods to identify high peak density artifacts in Fourier transform mass spectra and to mitigate their effects on high-throughput metabolomic data analysis
Feng et al. Dynamic binning peak detection and assessment of various lipidomics liquid chromatography-mass spectrometry pre-processing platforms
Hübner et al. lipID—a software tool for automated assignment of lipids in mass spectra
Hnatyshyn et al. Automated and unbiased analysis of LC–MS metabolomic data
EP3002696B1 (fr) Methods for generating, searching and statistically validating a peptide fragment ion library
Rodrigues et al. Standard key steps in mass spectrometry-based plant metabolomics experiments: Instrument performance and analytical method validation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07788741

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: RU

122 Ep: pct application non-entry in european phase

Ref document number: 07788741

Country of ref document: EP

Kind code of ref document: A1