CN104091089A - Infrared spectrum data PLS modeling method - Google Patents
- Publication number
- CN104091089A
- Authority
- CN
- China
- Prior art keywords
- int
- interval section
- pls model
- omega
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Investigating Or Analysing Materials By Optical Means (AREA)
Abstract
The invention discloses a PLS modeling method for infrared spectrum data. The errors of the PLS models of all interval sections and the correlations between those errors are combined to determine the weight coefficients of the interval PLS models, so that the resulting fused PLS model has the minimum error. The method makes the best use of the spectral information of every interval section, is simple and visual, requires little computation, and finds the characteristic wavelength region quickly. Because the weight coefficients take into account both the errors of the models participating in the fusion and the correlations between those errors, it can be guaranteed that the fused model has the minimum error.
Description
Technical field
The invention belongs to the field of infrared spectrum identification, and specifically relates to a data processing method that improves partial least squares (PLS) modeling of infrared spectra.
Background
For multivariate infrared spectrum data with small sample sizes, the PLS model handles the variable collinearity and the curse of dimensionality that trouble other modeling methods, and it is therefore widely used in infrared spectrum identification. Although PLS can model the full spectrum directly, theory and a large body of experiments show that wavelength selection is still an effective way to improve a PLS model. Wavelength selection means screening characteristic wavelengths or bands by some method before modeling. Because irrelevant or nonlinear variables are rejected, the model built after wavelength selection is simpler than the full-wavelength model and has better predictive ability and robustness. Interval PLS (iPLS) is a commonly used wavelength selection method. Its advantages are that it is simple and visual, requires little computation, and finds the characteristic wavelength interval quickly. Its drawback is that it uses the spectral information of only one interval section and may lose useful spectral information in the other interval sections. How to make the best use of the spectral information of every interval section is therefore a problem in urgent need of a solution.
Summary of the invention
The technical problem to be solved by the present invention is to provide a PLS modeling method for infrared spectrum data that addresses the above deficiencies of the prior art.
To solve the above technical problem, the technical solution adopted by the present invention is a PLS modeling method for infrared spectrum data, comprising the following steps:
1) Set the maximum number of interval sections max_int_no, the maximum number of latent variables max_lv_no, and the fold numbers k_1 and k_2 of the cross-validation method, where k_1 and k_2 are both no less than 2;
2) For each number of interval sections int_no, where 1≤int_no≤max_int_no, calculate the cross-validation error of the corresponding fused PLS model according to steps 2.1 to 2.2:
2.1) Divide the spectrum matrix X in the infrared spectrum sample set data equally into int_no interval sections X_i. The number of columns of each interval section is l=[(number of columns of X)/int_no], where [ ] denotes rounding; the i-th interval section X_i corresponds to columns [(i-1)×l+1] to (i×l) of the spectrum matrix X; 1≤i≤int_no;
2.2) For each number of latent variables lv_no, where 1≤lv_no≤max_lv_no, calculate the cross-validation error of the fused PLS model according to steps 2.2.1 to 2.2.5;
2.2.1) Use k_1-fold cross-validation to calculate, for int_no interval sections and lv_no latent variables, the cross-validation error S(e_i) of the PLS model corresponding to each interval section, where y is the actual value of the dependent variable matrix in the infrared spectrum sample set data, ŷ_i is the predicted value of the dependent variable matrix obtained by k_1-fold cross-validation from the PLS model with lv_no latent variables corresponding to the i-th interval section, e_i is the corresponding prediction residual matrix, and n is the number of samples in the infrared spectrum sample set data;
2.2.2) Calculate, for int_no interval sections and lv_no latent variables, the correlations r_ij between the prediction residual matrices e_i and e_j of the PLS models corresponding to the interval sections;
2.2.3) Obtain by nonlinear optimization, for int_no interval sections and lv_no latent variables, the combination coefficients ω=[ω_1, ..., ω_int_no]′ of the PLS models corresponding to the interval sections;
2.2.4) Use k_2-fold cross-validation to calculate, for int_no interval sections and lv_no latent variables, the prediction residual matrix of the PLS model corresponding to each interval section, where the predicted value of the dependent variable matrix is obtained by k_2-fold cross-validation from the PLS model with lv_no latent variables corresponding to the i-th interval section, and from these calculate the cross-validation error of the fused PLS model;
2.2.5) Select the minimum of these errors over lv_no as the cross-validation error of the fused PLS model for int_no interval sections;
3) Select the minimum cross-validation error of the fused PLS model over all numbers of interval sections, and take the corresponding number of interval sections int_bt, number of latent variables lv_bt and combination coefficients ω_bt as the optimal model parameters;
4) Construct the fused PLS model from the optimal model parameters: divide the spectrum matrix X equally into int_bt interval sections; the fused PLS model is
$$y^{*}=\sum_{g=1}^{int\_bt}\omega\_bt_{g}\,(x_{g}b_{g}+c_{g})$$
where ω_bt_g is the g-th component of ω_bt; y* is the fused PLS model's predicted value of the dependent variable for a sample; b_g and c_g are the partial least squares regression coefficients and intercept of interval section X_g against the dependent variable matrix Y when the number of latent variables is lv_bt; and x_g is the infrared spectrum data of the sample in the g-th interval section.
The fused PLS model of the present invention is a weighted combination of multiple member models. Each member model is the PLS model corresponding to one interval section, so the number of interval sections equals the number of member models. The concrete form of the i-th member model is determined by the spectral data of the i-th interval section and the latent variables extracted.
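The weighted combination of member models described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fused_predict` and all numeric values are hypothetical, and the member coefficients are taken as given rather than fitted.

```python
import numpy as np

def fused_predict(x, weights, coefs, intercepts):
    """Weighted sum of member PLS predictions, one member per interval section."""
    x = np.asarray(x, dtype=float)
    l = x.shape[-1] // len(weights)          # columns per interval section
    y = 0.0
    for g, (w, b, c) in enumerate(zip(weights, coefs, intercepts)):
        x_g = x[..., g * l:(g + 1) * l]      # spectral data of interval g
        y = y + w * (x_g @ np.asarray(b, dtype=float) + c)
    return y

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
weights = [0.75, 0.25]
coefs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
intercepts = [10.0, 20.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_star = fused_predict(x, weights, coefs, intercepts)
```

Here the first member predicts 11 and the second predicts 30, so the fused value is 0.75×11 + 0.25×30.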
Compared with the prior art, the beneficial effects of the present invention are as follows: the method makes the best use of the spectral information of every interval section, is simple and visual, requires little computation, and finds the characteristic wavelength interval quickly; and because the method for determining the weight coefficients considers both the errors of the models participating in the fusion and the correlations between those errors, it can ensure that the fused model has the minimum error.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Embodiment
The present invention is further described below with reference to an example.
The spectral data are the spectra dataset shipped with MATLAB 2012a; the samples are gasoline, and the dependent variable is the octane number of each sample. The original sample data set contains 60 samples, and each sample has 700 spectral variables. For ease of description, this example uses only spectral variables 1 to 6 of samples 1 to 6 as the spectral data matrix of the sample set. The sample set data used in this example consist of the spectral data matrix X and the dependent variable matrix Y.
The specific implementation steps of the present invention are as follows:
Step 1, parameter setting: set the maximum number of interval sections max_int_no=2, the maximum number of latent variables max_lv_no=2, and the fold numbers of the cross-validation method k_1=4 and k_2=6. These parameters can be adjusted according to actual needs; the values here are chosen only to make the modeling procedure easy to explain.
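A minimal sketch of how these settings might be held and checked in Python. The class name `FusionSettings` and the `grid` helper are hypothetical conveniences, not part of the method; the defaults simply repeat the values of this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FusionSettings:
    max_int_no: int = 2   # maximum number of interval sections
    max_lv_no: int = 2    # maximum number of latent variables
    k1: int = 4           # folds used to estimate the member-model errors
    k2: int = 6           # folds used to evaluate the fused model

    def __post_init__(self):
        if self.max_int_no < 1 or self.max_lv_no < 1:
            raise ValueError("max_int_no and max_lv_no must be at least 1")
        if self.k1 < 2 or self.k2 < 2:
            raise ValueError("k1 and k2 must both be at least 2")

    def grid(self):
        """All (int_no, lv_no) pairs visited by steps 2 and 2.2."""
        return [(i, lv) for i in range(1, self.max_int_no + 1)
                for lv in range(1, self.max_lv_no + 1)]

settings = FusionSettings()
```

With six samples, k_2=6 amounts to leave-one-out cross-validation.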
Step 2, for each number of interval sections int_no, where 1≤int_no≤max_int_no, calculate the cross-validation error of the corresponding fused PLS model according to steps 2.1 to 2.2. The case int_no=2 is described below as an example:
Step 2.1, divide the spectrum matrix X in the infrared spectrum sample set data equally into int_no interval sections X_i. The number of columns of each interval section is l=[6/2]=3, so X_1 corresponds to columns 1 to 3 of the spectral data matrix X, and X_2 corresponds to columns 4 to 6 of the spectral data matrix X.
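The column split of step 2.1 can be sketched as follows. The helper name `split_intervals` is hypothetical, the rounding is assumed to be rounding down, and the matrix is a placeholder rather than the data of this example.

```python
import numpy as np

def split_intervals(X, int_no):
    """Split the columns of X into int_no equal-width interval sections."""
    X = np.asarray(X)
    l = X.shape[1] // int_no                 # columns per interval (floor assumed)
    # Interval i (1-based) covers columns (i-1)*l+1 .. i*l, i.e. the 0-based slice below.
    return [X[:, (i - 1) * l:i * l] for i in range(1, int_no + 1)]

X = np.arange(36).reshape(6, 6)              # placeholder 6x6 matrix, not the patent's data
X1, X2 = split_intervals(X, 2)
```

Under the floor assumption, any trailing columns beyond int_no×l are left out of every interval section.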
Step 2.2, for each number of latent variables lv_no, where 1≤lv_no≤max_lv_no, calculate the cross-validation error of the fused PLS model according to steps 2.2.1 to 2.2.5. The case lv_no=2 is described below as an example.
Step 2.2.1, use k_1-fold cross-validation to calculate, for int_no interval sections and lv_no latent variables, the cross-validation error of the PLS model corresponding to each interval section.
For the first interval section, k_1-fold cross-validation of the PLS model with lv_no latent variables gives the predicted value ŷ_1 of the dependent variable matrix, the prediction residual matrix e_1 and the standard deviation S(e_1) of e_1.
For the second interval section, k_1-fold cross-validation of the PLS model with lv_no latent variables gives the predicted value ŷ_2 of the dependent variable matrix, the prediction residual matrix e_2 and the standard deviation S(e_2) of e_2.
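Step 2.2.1 can be sketched with a small NumPy-only PLS1 fit and a k-fold loop. Several things here are assumptions rather than the patent's own choices: the NIPALS variant of PLS1, contiguous folds, the residual sign e = y − ŷ, and the use of the root mean square of e as S(e). The data are synthetic.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Minimal PLS1 (NIPALS). Returns (b, c) such that y_hat = X @ b + c."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)            # weight vector
        t = E @ w                            # scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        q_k = f @ t / tt                     # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - q_k * t                      # deflate y
        W.append(w); P.append(p); q.append(q_k)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # regression coefficients
    return b, y_mean - x_mean @ b            # intercept restores the means

def cv_residuals(X, y, n_lv, k):
    """k-fold cross-validated residuals e = y - y_hat (contiguous folds)."""
    n = len(y)
    y_hat = np.empty(n)
    for test in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), test)
        b, c = pls1_fit(X[train], y[train], n_lv)
        y_hat[test] = X[test] @ b + c
    return y - y_hat

def cv_error(e):
    """Root mean square of the residuals, used here as S(e)."""
    return float(np.sqrt(np.mean(e ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 3))                 # synthetic stand-in for one interval section
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=24)
e = cv_residuals(X, y, n_lv=2, k=4)
s = cv_error(e)
```

Calling `cv_residuals` once per interval section with k=k_1 would give the e_i and S(e_i) used in the following steps.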
Step 2.2.2, calculate, for int_no interval sections and lv_no latent variables, the correlations between the prediction residual matrices of the PLS models corresponding to the interval sections:
r_11=1.0000, r_12=-0.0900, r_21=-0.0900, r_22=1.0000.
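A correlation matrix of this shape can be computed from the residual vectors as sketched below. The sketch assumes the Pearson correlation coefficient, which is consistent with the unit diagonal and symmetric off-diagonal values listed above; the residuals are hypothetical, not those of this example.

```python
import numpy as np

def residual_correlations(residuals):
    """Pearson correlation matrix r_ij between the residual vectors e_i."""
    E = np.vstack([np.ravel(e) for e in residuals])   # one row per interval section
    return np.corrcoef(E)

# Hypothetical residuals for two interval sections (not the patent's values).
e1 = np.array([0.5, -0.25, 0.25, -0.5, 0.125, -0.125])
e2 = np.array([-0.25, 0.5, 0.125, -0.125, -0.5, 0.25])
R = residual_correlations([e1, e2])
```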
Step 2.2.3, obtain by nonlinear optimization, for int_no interval sections and lv_no latent variables, the combination coefficients of the PLS models corresponding to the interval sections:
ω=[0.7376 0.2624]′.
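One plausible reading of this step, offered as an assumption because the objective function is not reproduced in this text, is that the weights minimise the variance of the fused error, ω′Cω with C_ij = r_ij·S(e_i)·S(e_j), subject to the weights summing to 1. The sketch below solves that problem in closed form instead of calling a general nonlinear optimizer, and it does not enforce non-negative weights. The error values are hypothetical.

```python
import numpy as np

def min_variance_weights(s, R):
    """Weights minimising w' C w subject to sum(w) == 1, with C_ij = r_ij * s_i * s_j."""
    s = np.asarray(s, dtype=float)
    C = np.asarray(R, dtype=float) * np.outer(s, s)   # error covariance of the members
    z = np.linalg.solve(C, np.ones(len(s)))
    return z / z.sum()

# Hypothetical errors; only the correlation -0.09 is taken from the example above.
s = [1.0, 2.0]
R = [[1.0, -0.09], [-0.09, 1.0]]
w = min_variance_weights(s, R)
```

In this toy case the member with the smaller error receives the larger weight, and the fused variance falls below that of the better member.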
Step 2.2.4, use k_2-fold cross-validation to calculate, for int_no interval sections and lv_no latent variables, the prediction residual matrix of the PLS model corresponding to each interval section, using the predicted values of the dependent variable matrix given by the PLS models with lv_no latent variables corresponding to the first and second interval sections, and from these calculate the cross-validation error of the fused PLS model.
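The fused error of step 2.2.4 can be sketched as the error of the weighted sum of the member predictions. The root-mean-square error measure and all numbers below are assumptions made for illustration only.

```python
import numpy as np

def fused_cv_error(y, member_predictions, weights):
    """RMS error of the weighted combination of member predictions."""
    y = np.asarray(y, dtype=float)
    Y_hat = np.column_stack(member_predictions)       # n samples x int_no members
    e = y - Y_hat @ np.asarray(weights, dtype=float)  # fused residual
    return float(np.sqrt(np.mean(e ** 2)))

# Hypothetical k2-fold predictions of two member models.
y = np.array([85.0, 86.0, 87.0, 88.0])
y_hat_1 = np.array([85.4, 85.6, 87.4, 87.6])
y_hat_2 = np.array([84.2, 86.8, 86.2, 88.8])
err = fused_cv_error(y, [y_hat_1, y_hat_2], [0.75, 0.25])
```

The two members here have errors of opposite sign, so the fused error is smaller than either member's error.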
Step 2.2.5, select the minimum of these errors over lv_no as the cross-validation error of the fused PLS model for int_no interval sections. In this example the selection is made for int_no=1 and for int_no=2, giving the cross-validation error of the fused PLS model for each.
Step 3, select the minimum cross-validation error of the fused PLS model over all numbers of interval sections int_no (1≤int_no≤max_int_no). In this example the minimum is obtained at int_no=2, and the corresponding optimal model parameters are: number of interval sections int_bt=2, number of latent variables lv_bt=2, combination coefficients ω_bt=[0.7376 0.2624]′.
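The two-level selection of steps 2.2.5 and 3 can be sketched as below. The function name `select_best` and the error values are hypothetical.

```python
def select_best(errors):
    """Pick the (int_no, lv_no) pair with the smallest fused cross-validation error."""
    # Step 2.2.5: best latent-variable count for each number of interval sections.
    best_per_int = {}
    for (int_no, lv_no), err in errors.items():
        if int_no not in best_per_int or err < best_per_int[int_no][1]:
            best_per_int[int_no] = (lv_no, err)
    # Step 3: best number of interval sections overall.
    int_bt = min(best_per_int, key=lambda i: best_per_int[i][1])
    lv_bt, err_bt = best_per_int[int_bt]
    return int_bt, lv_bt, err_bt

# Hypothetical fused errors indexed by (int_no, lv_no).
errors = {(1, 1): 0.9, (1, 2): 0.7, (2, 1): 0.8, (2, 2): 0.4}
best = select_best(errors)
```

In practice the combination coefficients found for each pair would be stored alongside the error so that ω_bt can be read off together with int_bt and lv_bt.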
Step 4, construct the fused PLS model from the optimal model parameters. b_1=[64.4 -2120.4 443.4]′ and c_1=1565.1 are the partial least squares regression coefficients and intercept of X_1 against Y when the number of latent variables is 2. b_2=[105.8 596.3 1404.7]′ and c_2=-1544.9 are the partial least squares regression coefficients and intercept of X_2 against Y when the number of latent variables is 2. The final fused PLS model is
y=0.7376×(x_1 b_1+64.4)+0.2624×(x_2 b_2+105.8).
The complete spectral data x of a sample consist of x_1 and x_2, that is, x=[x_1 x_2], where x_1 is the spectral data of the first interval section and x_2 is the spectral data of the second interval section. y is the fused PLS model's predicted value of the dependent variable for the sample.
Claims (1)
1. A PLS modeling method for infrared spectrum data, characterized by comprising the following steps:
1) set the maximum number of interval sections max_int_no, the maximum number of latent variables max_lv_no, and the fold numbers k_1 and k_2 of the cross-validation method, where k_1 and k_2 are both no less than 2;
2) for each number of interval sections int_no, where 1≤int_no≤max_int_no, calculate the cross-validation error of the corresponding fused PLS model according to step 2.1) and step 2.2):
2.1) divide the spectrum matrix X in the infrared spectrum sample set data equally into int_no interval sections X_i; the number of columns of each interval section is l=[(number of columns of X)/int_no], where [ ] denotes rounding; the i-th interval section X_i corresponds to columns [(i-1)×l+1] to (i×l) of the spectrum matrix X; 1≤i≤int_no;
2.2) for each number of latent variables lv_no, where 1≤lv_no≤max_lv_no, calculate the cross-validation error of the fused PLS model according to steps 2.2.1) to 2.2.5):
2.2.1) use k_1-fold cross-validation to calculate, for int_no interval sections and lv_no latent variables, the cross-validation error S(e_i) of the PLS model corresponding to each interval section, where y is the actual value of the dependent variable matrix in the infrared spectrum sample set data, ŷ_i is the predicted value of the dependent variable matrix obtained by k_1-fold cross-validation from the PLS model with lv_no latent variables corresponding to the i-th interval section, e_i is the corresponding prediction residual matrix, and n is the number of samples in the infrared spectrum sample set data;
2.2.2) calculate, for int_no interval sections and lv_no latent variables, the correlations r_ij between the prediction residual matrices e_i and e_j of the PLS models corresponding to the interval sections;
2.2.3) obtain by nonlinear optimization, for int_no interval sections and lv_no latent variables, the combination coefficients ω=[ω_1, ..., ω_int_no]′ of the PLS models corresponding to the interval sections;
2.2.4) use k_2-fold cross-validation to calculate, for int_no interval sections and lv_no latent variables, the prediction residual matrix of the PLS model corresponding to each interval section, where the predicted value of the dependent variable matrix is obtained by k_2-fold cross-validation from the PLS model with lv_no latent variables corresponding to the i-th interval section, and from these calculate the cross-validation error of the fused PLS model;
2.2.5) select the minimum of these errors over lv_no as the cross-validation error of the fused PLS model for int_no interval sections;
3) select the minimum cross-validation error of the fused PLS model over all numbers of interval sections, and take the corresponding number of interval sections int_bt, number of latent variables lv_bt and combination coefficients ω_bt as the optimal model parameters;
4) construct the fused PLS model from the optimal model parameters: divide the spectrum matrix X equally into int_bt interval sections; the fused PLS model is
$$y^{*}=\sum_{g=1}^{int\_bt}\omega\_bt_{g}\,(x_{g}b_{g}+c_{g})$$
where ω_bt_g is the g-th component of ω_bt; y* is the fused PLS model's predicted value of the dependent variable for a sample; b_g and c_g are the partial least squares regression coefficients and intercept of interval section X_g against the dependent variable matrix Y when the number of latent variables is lv_bt; and x_g is the infrared spectrum data of the sample in the g-th interval section.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410362602.1A CN104091089B (en) | 2014-07-28 | 2014-07-28 | Infrared spectrum data PLS modeling method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104091089A true CN104091089A (en) | 2014-10-08 |
CN104091089B CN104091089B (en) | 2016-04-27 |
Family
ID=51638805
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410362602.1A Active CN104091089B (en) | 2014-07-28 | 2014-07-28 | Infrared spectrum data PLS modeling method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104091089B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1657907A (en) * | 2005-03-23 | 2005-08-24 | 江苏大学 | Agricultural products, food near-infrared spectral specragion selection method |
CN101021471A (en) * | 2007-03-13 | 2007-08-22 | 山东医学高等专科学校 | Method for Chinese patent drug fast quantitative analysis by acousto-optic filter near infrared spectral technique |
CN103398971A (en) * | 2013-07-19 | 2013-11-20 | 华北电力大学(保定) | Chemometrics method for determining cetane number of diesel oil |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462751A (en) * | 2014-10-29 | 2015-03-25 | 温州大学 | Near infrared spectrum modeling method based on multi-element Gaussian fitting |
CN104462751B (en) * | 2014-10-29 | 2017-05-03 | 温州大学 | Near infrared spectrum modeling method based on multi-element Gaussian fitting |
CN105092519A (en) * | 2015-07-10 | 2015-11-25 | 东北大学 | Sample composition determination method based on increment partial least square method |
CN105092519B (en) * | 2015-07-10 | 2017-11-14 | 东北大学 | Sample component assay method based on increment PLS |
CN108918446A (en) * | 2018-04-18 | 2018-11-30 | 天津大学 | A kind of super low concentration sulfur dioxide ultraviolet difference feature extraction algorithm |
CN108872142A (en) * | 2018-06-19 | 2018-11-23 | 温州大学 | The selection optimization method of multi-parameter in a kind of wavelength selection algorithm |
CN108872142B (en) * | 2018-06-19 | 2020-12-22 | 温州大学 | Multi-parameter selection optimization method in wavelength selection algorithm |
CN109060771A (en) * | 2018-07-26 | 2018-12-21 | 温州大学 | A kind of common recognition model building method based on spectrum different characteristic collection |
CN109060771B (en) * | 2018-07-26 | 2020-12-29 | 温州大学 | Consensus model construction method based on different characteristic sets of spectrum |
CN109060715A (en) * | 2018-07-31 | 2018-12-21 | 温州大学 | A kind of construction method of the near infrared spectrum common recognition model based on self organizing neural network |
CN111125629A (en) * | 2019-12-25 | 2020-05-08 | 温州大学 | Domain-adaptive PLS regression model modeling method |
CN111125629B (en) * | 2019-12-25 | 2023-04-07 | 温州大学 | Domain-adaptive PLS regression model modeling method |
Also Published As
Publication number | Publication date |
---|---|
CN104091089B (en) | 2016-04-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |