CN104615903A - Model adaptive NMR (nuclear magnetic resonance) metabonomics data normalization method - Google Patents

Info

Publication number
CN104615903A
CN104615903A CN201510084309.8A CN201510084309A CN104615903A CN 104615903 A CN104615903 A CN 104615903A CN 201510084309 A CN201510084309 A CN 201510084309A CN 104615903 A CN104615903 A CN 104615903A
Authority
CN
China
Prior art keywords
normalization
data
nmr
vector
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510084309.8A
Other languages
Chinese (zh)
Other versions
CN104615903B (en)
Inventor
董继扬
邓伶莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201510084309.8A priority Critical patent/CN104615903B/en
Publication of CN104615903A publication Critical patent/CN104615903A/en
Application granted granted Critical
Publication of CN104615903B publication Critical patent/CN104615903B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The invention relates to nuclear magnetic resonance, in particular to a model-adaptive NMR (nuclear magnetic resonance) metabonomics data normalization method. The method comprises the steps of (1) data acquisition, (2) data centering and initialization of the normalization coefficients, (3) normalization, (4) multivariate statistical analysis, (5) model-adaptive update of the normalization coefficients, and (6) loop iteration, in which steps (3) to (5) are repeated until the termination condition is satisfied. By iterating between the multivariate statistical analysis model and the maximization of the correlation coefficient between the projection vector and the class vector, the normalization coefficient vector is adjusted continually, so that the multivariate statistical analysis model built on the normalized data can accurately extract the characteristic information between groups. The model-adaptive normalization method adopts a normalization coefficient vector suited to the selected multivariate statistical analysis model. Compared with conventional data normalization methods, the method is more flexible and effective, and it effectively retains the structural information of the spectral data.

Description

A model-adaptive NMR metabonomics data normalization method
Technical field
The present invention relates to nuclear magnetic resonance, and in particular to a model-adaptive NMR metabonomics data normalization method.
Background art
Metabonomics is an emerging discipline that developed in the late 1990s. Using high-throughput, high-sensitivity, and high-accuracy modern analytical techniques, it analyzes the overall composition of endogenous metabolites in cells, tissues, and biological fluids, and identifies and interprets the physiological and pathological state of the studied subject from the complex, dynamic changes of those metabolites.
Because nuclear magnetic resonance (NMR) is non-invasive and unbiased, it has become a principal analytical technique in metabonomics. High-throughput, high-resolution modern NMR instruments yield richer and more accurate metabolic information from biological samples, but they also pose a considerable challenge for subsequent data analysis. A one-dimensional ¹H NMR spectrum of a biological sample typically has 4k to 32k data points, and these points are strongly collinear. To identify the main causes of variation in the data, statistical methods such as multivariate pattern recognition are needed.
At present, multivariate linear projection methods are mainly used to reduce the data dimension, remove collinearity, and obtain the biological metabolic information of interest. These include principal component analysis (PCA) (Wold S: Principal component analysis. Chemometrics and Intelligent Laboratory Systems 1987, 2(1):37-52), partial least squares (PLS) (Geladi P, Kowalski BR: Partial least-squares regression: a tutorial. Analytica Chimica Acta 1986, 185:1-17), and orthogonal partial least squares (Orthogonal PLS, OPLS) (Trygg J, Wold S: Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics 2002, 16(3):119-128). In complex biological samples, however, the concentrations of different metabolites often differ greatly. When variance-based multivariate statistical methods such as PCA and PLS are applied to such unprocessed data, small-scale signals are easily masked by large-scale signals. In fact, the variation of a large-scale signal may simply result from its large scale and may not reflect the actual variation in the data, so the results obtained may be meaningless. To eliminate the adverse effects of widely differing data scales, the data therefore need to be normalized.
Many data normalization methods exist. The ones commonly used in NMR metabonomics are unit variance scaling (Unit Variance, UV) (Van Den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, Van Der Werf MJ: Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 2006, 7(1):142), Pareto scaling (Pareto) (Odunsi K, Wollman RM, Ambrosone CB, Hutson A, McCann SE, Tammela J, Geisler JP, Miller G, Sellers T, Cliby W: Detection of epithelial ovarian cancer using 1H-NMR-based metabonomics. International Journal of Cancer 2005, 113(5):782-788), and variable stability scaling (Variable Stability, abbreviated VAST) (Keun HC, Ebbels T, Antti H, Bollard ME, Beckonert O, Holmes E, Lindon JC, Nicholson JK: Improved analysis of multivariate data by variable stability scaling: application to NMR-based metabolic profiling. Analytica Chimica Acta 2003, 490(1):265-276). Their general formula can be written as:
$$\tilde{x}_{ij} = x_{ij} \times s_j$$
where $s_j$ is the normalization weight coefficient of the j-th column (the j-th variable) of the matrix X.
UV scaling uses the standard deviation of each variable as the normalization scale. After UV scaling, every variable has the same standard deviation. The method is sensitive to noise: for data points with a low signal-to-noise ratio, the computed weight is strongly affected by noise. In particular, variables in the pure-noise regions of the spectrum have small standard deviations and therefore receive large weights after UV scaling, which hinders the identification of characteristic metabolites. Pareto scaling lies between no normalization and UV scaling: it uses the square root of each variable's standard deviation as the scale. It reduces the excessive influence of large signals while preserving the structure of the raw data to some extent, so its results are closer to the raw data than those of UV scaling. VAST builds on UV scaling and further adjusts each variable's scale factor according to the average stability of that variable within the different sample classes. Because noise points are generally unstable, VAST effectively reduces their weight and improves on the results of UV scaling.
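The two simplest weightings described above can be sketched in Python with NumPy. This is a minimal illustration, not code from the patent: the function name `scaling_weights` and the toy data are hypothetical, and the sketch assumes the usual definitions in which UV divides by the standard deviation and Pareto divides by its square root.

```python
import numpy as np

def scaling_weights(X, method="uv"):
    """Per-variable weights s for the general formula x~_ij = x_ij * s_j."""
    sd = X.std(axis=0, ddof=1)           # sample standard deviation of each column
    if method == "uv":
        return 1.0 / sd                  # unit variance: divide by the standard deviation
    if method == "pareto":
        return 1.0 / np.sqrt(sd)         # Pareto: divide by the square root of it
    raise ValueError(f"unknown method: {method}")

# Hypothetical toy data: one large-scale and one small-scale variable
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 100.0, 50), rng.normal(0, 0.1, 50)])
Xc = X - X.mean(axis=0)                  # mean-centre before scaling

X_uv = Xc * scaling_weights(Xc, "uv")          # every column now has unit std
X_par = Xc * scaling_weights(Xc, "pareto")     # column std becomes sqrt(original std)
```

After UV scaling both columns carry equal variance, so the pure-noise column is promoted to the same weight as the large signal; Pareto scaling narrows the gap between the columns without removing it.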
The purpose of normalization is to improve the results of multivariate statistical analysis and the interpretability of the multivariate statistical analysis model. The existing normalization methods start from the data alone and do not consider the effect on the subsequent multivariate statistical analysis, so the results of that analysis (e.g. PCA, PLS) are often unsatisfactory. To date, no normalization method for NMR metabonomics data that incorporates the multivariate statistical analysis model has been disclosed. The present invention combines the normalization of NMR metabonomics data with the subsequent multivariate statistical analysis model and obtains good results.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a model-adaptive NMR metabonomics data normalization method.
The present invention comprises the following steps:
1) Data acquisition: ¹H NMR spectra of the metabonomics biological samples to be analyzed are acquired on an NMR spectrometer; the ¹H NMR spectra are then processed to obtain the NMR metabonomics data to be analyzed.
In step 1), the spectral processing comprises phase and baseline correction, peak alignment, integration, and the like.
2) Data centering and initialization of the normalization coefficients: the NMR metabonomics data to be normalized are denoted X, each row of which represents one sample, and the sample class vector is denoted Y; X and Y are each mean-centered. The normalization coefficient vector is denoted $s = [s_1, s_2, \ldots, s_d]^T$, and s is initialized as an all-ones column vector;
3) Normalization:
$$X_s = X \, \mathrm{diag}(s)$$
In step 3), diag(·) is the diagonal-matrix operator: diag(s) is the diagonal matrix whose diagonal elements are the entries of the vector s and whose other elements are 0; $X_s$ is the normalized data matrix.
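Steps 2) and 3) translate almost directly into array operations. The sketch below uses hypothetical toy data and only shows that multiplying by diag(s) is the same as scaling each column of X by the matching entry of s, which avoids building a d × d matrix when d is large.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))                 # hypothetical data: 6 samples (rows), 4 variables
Y = np.array([1, 1, 1, 2, 2, 2], float)     # class labels

Xc = X - X.mean(axis=0)                     # step 2: centre each variable
Yc = Y - Y.mean()                           # step 2: centre the class vector
s = np.ones(X.shape[1])                     # step 2: all-ones initial coefficients

s[0] = 0.5                                  # pretend one coefficient has been updated
Xs_diag = Xc @ np.diag(s)                   # step 3 as written: X diag(s)
Xs_fast = Xc * s                            # equivalent column-wise broadcasting
```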
4) Multivariate statistical analysis: multivariate statistical analysis is applied to the normalized data matrix, and its loading vector is denoted u;
5) Model-adaptive normalization coefficients: maximize the correlation coefficient between the projection of $X_s$ onto u and the class vector Y, that is:
$$\max: \; r^2 = \left( \frac{\mathrm{cov}(X\,\mathrm{diag}(s)\,u,\; Y)}{\mathrm{std}(X\,\mathrm{diag}(s)\,u)\;\mathrm{std}(Y)} \right)^2$$
s is updated with the gradient descent method (the step is taken along the gradient, since $r^2$ is being maximized):
$$s = s + \eta \frac{\partial r^2}{\partial s}$$
where
$$\frac{\partial r^2}{\partial s} = \frac{2\,Y^T X\,\mathrm{diag}(s)\,u \cdot Y^T X\,\mathrm{diag}(u) \cdot \|X\,\mathrm{diag}(s)\,u\|_F^2 \;-\; \left(Y^T X\,\mathrm{diag}(s)\,u\right)^2 \cdot 2\,s^T \mathrm{diag}(u)\,X^T X\,\mathrm{diag}(u)}{\|X\,\mathrm{diag}(s)\,u\|_F^4 \;\|Y\|_F^2}$$
In step 5), r is the correlation coefficient; cov(·) and std(·) denote the covariance and the standard deviation, respectively; $\partial r^2 / \partial s$ is the gradient of $r^2$ with respect to s; η is a constant whose value lies in the range (0, 1).
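As a check on the gradient in step 5), the objective and its derivative can be written in a few lines of NumPy. This is a sketch under stated assumptions rather than the patent's own code: the function name `r2_and_grad` is hypothetical, X and Y are assumed to be mean-centered already (so that cov and std reduce to inner products), and u is held fixed while differentiating. It uses the identity X diag(s) u = X diag(u) s, which makes the projection linear in s.

```python
import numpy as np

def r2_and_grad(X, Y, s, u):
    """Squared correlation between X diag(s) u and Y, and its gradient with respect to s.

    X (n x d) and Y (n,) are assumed to be mean-centred; u (d,) is treated as fixed.
    """
    A = X * u                    # X diag(u), so that A @ s == X diag(s) u
    t = A @ s                    # projection of the normalized data onto u
    a = Y @ t                    # Y^T X diag(s) u
    b = t @ t                    # ||X diag(s) u||^2
    c = Y @ Y                    # ||Y||^2
    r2 = a * a / (b * c)
    # Quotient rule: d(a^2 / b) / ds = (2 a a' b - a^2 b') / b^2
    grad = (2 * a * (Y @ A) * b - a * a * 2 * (t @ A)) / (b * b * c)
    return r2, grad
```

Because $r^2$ does not change when s is multiplied by a constant, the gradient is always orthogonal to s. The update therefore changes the relative weights of the variables and has no first-order effect on the overall scale of s.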
6) Loop iteration: repeat steps 3) to 5) until the termination condition, which is stated in terms of a vector norm and the constant ε, is satisfied.
In step 6), ‖·‖ denotes the vector norm and ε is a user-defined constant.
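Steps 2) to 6) can be combined into one loop. The sketch below is a hedged illustration only: it takes PCA (computed by SVD) as the multivariate model, and because the exact termination formula is not reproduced in this text, it ASSUMES the loop stops when the norm of the change in s falls below ε. The function name `pca_mas` and the `max_iter` safeguard are hypothetical additions.

```python
import numpy as np

def pca_mas(X, Y, eta=0.01, eps=1e-6, max_iter=1000):
    """Sketch of the iterative scheme with PCA as the model; returns the vector s."""
    Xc = X - X.mean(axis=0)                    # step 2: centre the data
    Yc = Y - Y.mean()                          # step 2: centre the class vector
    s = np.ones(X.shape[1])                    # step 2: all-ones start
    for _ in range(max_iter):
        Xs = Xc * s                            # step 3: X diag(s)
        u = np.linalg.svd(Xs, full_matrices=False)[2][0]   # step 4: first PCA loading
        A = Xc * u                             # X diag(u), so that A @ s == Xs @ u
        t = A @ s
        a, b, c = Yc @ t, t @ t, Yc @ Yc
        grad = (2 * a * (Yc @ A) * b - a * a * 2 * (t @ A)) / (b * b * c)
        step = eta * grad                      # step 5: s = s + eta * d(r^2)/ds
        s = s + step
        if np.linalg.norm(step) < eps:         # ASSUMED stopping rule, not from the source
            break
    return s
```

The sign of a singular vector is arbitrary, but the gradient is unchanged when u is replaced by -u, so the loop does not need to fix an orientation for the loading. A call such as `pca_mas(X, Y, eta=0.01)` with a samples-by-variables matrix X and a label vector Y returns one coefficient per variable.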
The principle of the present invention is as follows: by iterating between the multivariate statistical analysis model and the maximization of the correlation coefficient between the projection vector and the class vector, the normalization coefficient vector is adjusted continually, so that the multivariate statistical analysis model built on the normalized data can accurately extract the characteristic information between groups.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The model-adaptive normalization method (MAS) adopts a normalization coefficient vector suited to the selected multivariate statistical analysis model; it is an adaptive normalization method. Compared with earlier normalization methods that are based on the data alone, it is more flexible and more effective. In addition, unlike earlier normalization algorithms, this method effectively preserves the structural information of the spectral data.
Brief description of the drawings
Fig. 1 shows the stacked spectra of the raw data (without centering).
Fig. 2 shows the stacked spectra after UV normalization.
Fig. 3 shows the stacked spectra after Pareto normalization.
Fig. 4 shows the stacked spectra after VAST normalization.
Fig. 5 shows the stacked spectra after PCA-MAS normalization.
Fig. 6 is the PCA score plot of the raw data. In Fig. 6, ■ denotes VM and ● denotes OM.
Fig. 7 is the PCA score plot after UV normalization. In Fig. 7, ■ denotes VM and ● denotes OM.
Fig. 8 is the PCA score plot after Pareto normalization. In Fig. 8, ■ denotes VM and ● denotes OM.
Fig. 9 is the PCA score plot after VAST normalization. In Fig. 9, ■ denotes VM and ● denotes OM.
Fig. 10 is the PCA score plot after PCA-MAS normalization. In Fig. 10, ■ denotes VM and ● denotes OM.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and an embodiment. The embodiments of the present invention are not limited to the following embodiment, which serves only as an example of the invention and not as a limitation.
Embodiment: PCA model-adaptive normalization (PCA-MAS) of urine ¹H NMR spectral data from a vegetarian population.
1) Collection and pretreatment of human urine samples
83 human urine samples were collected: 41 from men on a general diet (from an army unit) and 42 from lacto-ovo vegetarian men (from the Minnan Buddhist College and Nanputuo Temple, Xiamen). The general-diet volunteers were 23 to 55 years old. The lacto-ovo vegetarian volunteers were 28 to 40 years old, had maintained their diet for more than 5 years, and lived communally. The routine biochemical indices of all volunteers were within the normal range; volunteers with diabetes, abnormal blood glucose levels, hypertension, or similar diseases were excluded. Volunteers who had an acute illness within two weeks before sample collection, or who had taken drugs with potential hepatic or renal toxicity within the preceding 2 to 3 months, were also excluded, to reduce the influence of unknown factors on the metabolic phenotype. Volunteers were not allowed to drink alcohol in the 48 h before urine collection. Morning urine was collected from all volunteers and centrifuged at 6000 r/min for 5 min at 4 °C. The supernatant was stored at -20 °C until the NMR experiments.
2) Acquisition of ¹H NMR spectra
All 1D ¹H NMR experiments were performed on a Varian 500 MHz spectrometer (Palo Alto, CA, USA). A presaturation experiment based on the 1D NOESY sequence, NOESYPR (delay-90°-t1-90°-tm-90°-acquisition), was used for water suppression [7]. The t1 delay was set to 4 μs and the tm delay to 100 ms, and saturation irradiation was applied during the 2 s pulse delay and the mixing time to suppress the water signal. For every sample, the 90° pulse length was calibrated before the experiment; the experimental temperature was 293 K. Spectra were acquired with 5k data points, a spectral width of 5 kHz, and 256 accumulated scans, and were zero-filled to 16k points for the Fourier transform.
3) Spectral preprocessing
The samples were divided into two groups: vegetarian men (VM) and general-diet men (OM). Manual phasing, baseline correction, and peak alignment of the spectra were carried out with MestRe-C2.3 software (http://qobrue.usc.es/jsgroup/MestRe-C) and in-house software [8]. The spectral data points in the interval δ 0.5 to 9.0 were taken, three regions were removed, namely δ 4.6 to 6.0 (residual water peak and urea peak), δ 0.6 to 0.8 (DSS peak), and δ 1.6 to 1.8 (DSS peak), and each spectrum was integrated at equal intervals into 1348 data points.
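The region selection and equal-interval integration can be sketched as follows. This is an illustration under assumptions, not the in-house software cited above: the function name `bin_spectrum`, the default bin width of 0.005 ppm, and the simple summation used as the integral are all hypothetical choices (the text gives only the resulting number of points, not the bin width).

```python
import numpy as np

def bin_spectrum(ppm, intensity, lo=0.5, hi=9.0,
                 excluded=((4.6, 6.0), (0.6, 0.8), (1.6, 1.8)), width=0.005):
    """Integrate a spectrum into equal-width bins and drop the excluded regions."""
    edges = np.arange(lo, hi + width / 2, width)      # bin edges from lo to hi
    centres = (edges[:-1] + edges[1:]) / 2
    idx = np.digitize(ppm, edges) - 1                 # bin index of every data point
    valid = (idx >= 0) & (idx < len(centres))         # discard points outside [lo, hi)
    # Sum the intensities in each bin (a simple rectangular integral)
    binned = np.bincount(idx[valid], weights=intensity[valid], minlength=len(centres))
    keep = np.ones(len(centres), dtype=bool)
    for a, b in excluded:                             # water/urea and DSS regions
        keep &= ~((centres >= a) & (centres <= b))
    return centres[keep], binned[keep]
```

Applying the function to every spectrum and stacking the returned intensity vectors row by row gives a data matrix of the kind used in the following steps.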
4) After spectral preprocessing, the data of the vegetarian men (VM) and the general-diet men (OM) are expressed as the data matrices $X_{VM}$ and $X_{OM}$; the corresponding class labels are 1 and 2.
5) PCA-MAS normalization
a) The data matrix is $X = [X_{VM}; X_{OM}]$ and the corresponding class vector is $Y = \{y_i\}$, where $y_i$ takes the value 1 or 2. X and Y are each mean-centered, and the normalization coefficient vector is initialized as $s = [s_1, s_2, \ldots, s_d]^T$ with $s_i = 1$ and d = 1348.
b) X is normalized with s: $X_s = X \, \mathrm{diag}(s)$.
c) Principal component analysis (PCA) is applied to $X_s$, and the first principal component u is extracted.
d) The normalization coefficient vector is updated with the gradient descent method:
$$s = s + \eta \frac{\partial r^2}{\partial s}$$
where η = 0.01.
e) If the termination condition is satisfied, the algorithm ends; otherwise steps b) to d) are repeated until the algorithm converges.
6) Figs. 1 and 5 show the stacked spectra before and after normalization, respectively. Figs. 2 to 4 show the stacked spectra of the data after the other normalization methods.
7) PCA is applied to the normalized data matrix $X_s$; Fig. 10 is the resulting score plot. The PCA score plots for the raw data and the other normalization methods are shown in Figs. 6 to 9.
8) Comparison with other normalization methods: the predictive ability of the model is computed by Monte Carlo cross validation (MCCV).
a) Internal validation
The normalization coefficients are computed from the data matrix X, giving the normalized data matrix $X_s$. From $X_s$, 80% of the samples are chosen at random as the training set for modeling, and the remaining 20% serve as the test set. PCA is applied to the training set, and the first principal component is extracted to build the PCA model. The area under the receiver operating characteristic curve (Area Under Curve, AUC) is used to measure the separability of the test set in the PCA model. This is repeated 40 times and averaged; the result is denoted $Q^2_{int}$.
b) External validation
From the data matrix X, 80% of the samples are chosen at random each time as the training set for modeling, and the remaining 20% serve as the test set. The normalization coefficient vector s is computed from the training samples, PCA is applied to the normalized training samples, and the first principal component is extracted to build the PCA model. The test samples are normalized with the same vector s, and the AUC is used to measure the separability of the test set in the PCA model. This is repeated 40 times and averaged; the result is denoted $Q^2_{ext}$.
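The external validation scheme in b) can be sketched as below. This is a minimal illustration under assumptions: `fit_scaling` is a hypothetical callable that returns a coefficient vector s from centered training data (for example UV weights or a PCA-MAS fit), the AUC is computed with the rank-sum identity rather than a library routine, and because the sign of the first principal component is arbitrary the sketch ASSUMES the orientation-free value max(AUC, 1 - AUC).

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity; ties get average ranks."""
    scores = np.asarray(scores, float)
    pos = np.asarray(labels, bool)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for v in np.unique(scores):                  # average the ranks of tied scores
        tie = scores == v
        ranks[tie] = ranks[tie].mean()
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def mccv_auc(X, y, fit_scaling, n_rep=40, train_frac=0.8, seed=0):
    """Monte Carlo cross validation: mean test-set AUC of the PC1 scores."""
    rng = np.random.default_rng(seed)
    n, k = len(y), int(round(train_frac * len(y)))
    results = []
    for _ in range(n_rep):
        perm = rng.permutation(n)
        tr, te = perm[:k], perm[k:]
        if len(np.unique(y[te])) < 2:            # AUC is undefined with a single class
            continue
        mu = X[tr].mean(axis=0)                  # centring estimated on the training set only
        s = fit_scaling(X[tr] - mu, y[tr])       # normalization learned on the training set
        u = np.linalg.svd((X[tr] - mu) * s, full_matrices=False)[2][0]
        scores = ((X[te] - mu) * s) @ u          # project the normalized test samples on PC1
        a = auc(scores, y[te] == y.max())
        results.append(max(a, 1 - a))            # ASSUMED: ignore the arbitrary sign of PC1
    return float(np.mean(results))
```

For the internal scheme in a), the same loop would apply, except that the coefficient vector is computed once from the full matrix before the splits are drawn.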
Table 1
The comparison of the different normalization methods is given in Table 1.

Claims (5)

1. A model-adaptive NMR metabonomics data normalization method, characterized by comprising the following steps:
1) Data acquisition: ¹H NMR spectra of the metabonomics biological samples to be analyzed are acquired on an NMR spectrometer; the ¹H NMR spectra are then processed to obtain the NMR metabonomics data to be analyzed;
2) Data centering and initialization of the normalization coefficients: the NMR metabonomics data to be normalized are denoted X, each row of which represents one sample, and the sample class vector is denoted Y; X and Y are each mean-centered; the normalization coefficient vector is denoted $s = [s_1, s_2, \ldots, s_d]^T$, and s is initialized as an all-ones column vector;
3) Normalization:
$$X_s = X \, \mathrm{diag}(s)$$
4) Multivariate statistical analysis: multivariate statistical analysis is applied to the normalized data matrix, and its loading vector is denoted u;
5) Model-adaptive normalization coefficients: maximize the correlation coefficient between the projection of $X_s$ onto u and the class vector Y, that is:
$$\max: \; r^2 = \left( \frac{\mathrm{cov}(X\,\mathrm{diag}(s)\,u,\; Y)}{\mathrm{std}(X\,\mathrm{diag}(s)\,u)\;\mathrm{std}(Y)} \right)^2$$
s is updated with the gradient descent method:
$$s = s + \eta \frac{\partial r^2}{\partial s}$$
where
$$\frac{\partial r^2}{\partial s} = \frac{2\,Y^T X\,\mathrm{diag}(s)\,u \cdot Y^T X\,\mathrm{diag}(u) \cdot \|X\,\mathrm{diag}(s)\,u\|_F^2 \;-\; \left(Y^T X\,\mathrm{diag}(s)\,u\right)^2 \cdot 2\,s^T \mathrm{diag}(u)\,X^T X\,\mathrm{diag}(u)}{\|X\,\mathrm{diag}(s)\,u\|_F^4 \;\|Y\|_F^2}$$
6) Loop iteration: repeat steps 3) to 5) until the termination condition is satisfied.
2. The model-adaptive NMR metabonomics data normalization method as claimed in claim 1, characterized in that in step 1), the spectral processing comprises phase and baseline correction, peak alignment, and integration.
3. The model-adaptive NMR metabonomics data normalization method as claimed in claim 1, characterized in that in step 3), diag(·) is the diagonal-matrix operator: diag(s) is the diagonal matrix whose diagonal elements are the entries of the vector s and whose other elements are 0; $X_s$ is the normalized data matrix.
4. The model-adaptive NMR metabonomics data normalization method as claimed in claim 1, characterized in that in step 5), r is the correlation coefficient; cov(·) and std(·) denote the covariance and the standard deviation, respectively; $\partial r^2 / \partial s$ is the gradient of $r^2$ with respect to s; η is a constant whose value lies in the range (0, 1).
5. The model-adaptive NMR metabonomics data normalization method as claimed in claim 1, characterized in that in step 6), ‖·‖ denotes the vector norm and ε is a user-defined constant.
CN201510084309.8A 2015-02-16 2015-02-16 Model adaptive NMR (nuclear magnetic resonance) metabonomics data normalization method Expired - Fee Related CN104615903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510084309.8A CN104615903B (en) 2015-02-16 2015-02-16 Model adaptive NMR (nuclear magnetic resonance) metabonomics data normalization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510084309.8A CN104615903B (en) 2015-02-16 2015-02-16 Model adaptive NMR (nuclear magnetic resonance) metabonomics data normalization method

Publications (2)

Publication Number Publication Date
CN104615903A true CN104615903A (en) 2015-05-13
CN104615903B CN104615903B (en) 2017-05-03

Family

ID=53150344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510084309.8A Expired - Fee Related CN104615903B (en) 2015-02-16 2015-02-16 Model adaptive NMR (nuclear magnetic resonance) metabonomics data normalization method

Country Status (1)

Country Link
CN (1) CN104615903B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017128497A1 (en) * 2016-01-25 2017-08-03 哈尔滨工业大学深圳研究生院 Simulation generation method and system for metabolism mixture ms/ms mass spectra
CN112967758A (en) * 2021-02-04 2021-06-15 麦特绘谱生物科技(上海)有限公司 Self-assembled metabonomics data processing system
CN113974618A (en) * 2021-12-12 2022-01-28 广西澍源智能科技有限公司 Noninvasive blood glucose testing method based on water peak blood glucose correction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050170441A1 (en) * 2003-09-12 2005-08-04 Health Research, Inc. Early detection of cancer of specific type using 1HNMR metabonomics
CN102323285A (en) * 2010-11-15 2012-01-18 上海聚类生物科技有限公司 Method for analyzing NMR (Nuclear Magnetic Resonance) metabonomics detection data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董继扬 等: "核磁共振代谢组学数据的尺度归一化新方法", 《高等学校化学学报》 *

Also Published As

Publication number Publication date
CN104615903B (en) 2017-05-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170503