CN104091089B - PLS modeling method for infrared spectral data - Google Patents

PLS modeling method for infrared spectral data

Info

Publication number
CN104091089B
Authority
CN
China
Prior art keywords
int
interval section
pls model
omega
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410362602.1A
Other languages
Chinese (zh)
Other versions
CN104091089A (en)
Inventor
陈孝敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University
Priority to CN201410362602.1A
Publication of CN104091089A
Application granted
Publication of CN104091089B

Abstract

The invention discloses a PLS modeling method for infrared spectral data. The weight coefficient of the PLS model of each interval is determined from the errors of the per-interval PLS models together with the correlations between those errors, so that the resulting fused PLS model has the minimum error. The method makes the best use of the spectral information of every interval; it is simple and intuitive, requires little computation, and locates the characteristic wavelength intervals quickly. Because the weight coefficients are determined by taking into account both the error of each model participating in the fusion and the correlations between these errors, the fused model is guaranteed to have the minimum error.

Description

PLS modeling method for infrared spectral data
Technical field
The invention belongs to the field of infrared spectrum identification, and specifically relates to a data processing method that improves the performance of partial least squares (PLS) modeling of infrared spectra.
Background
For infrared spectral data with many variables and few samples, PLS models handle well the variable collinearity and curse-of-dimensionality problems that other modeling methods run into, and are therefore widely used in infrared spectrum identification. Although PLS can model the full spectrum directly, theory and a large body of experiments show that wavelength selection is still an effective way to improve a PLS model. Wavelength selection means screening characteristic wavelengths or bands by some method before modeling. Because irrelevant or nonlinear variables are eliminated, the model built after wavelength selection is simpler than the full-spectrum model, and its predictive ability and robustness are also better. Interval PLS (iPLS) is a commonly used wavelength selection method. Its advantages are that it is simple and intuitive, requires little computation, and locates the characteristic wavelength interval quickly. Its drawback is that it uses the spectral information of only one interval and may lose useful spectral information in the other intervals. How to make the best use of the spectral information of every interval is therefore a problem in urgent need of a solution.
Summary of the invention
The technical problem to be solved by the invention is to provide a PLS modeling method for infrared spectral data that addresses the above deficiencies of the prior art.
To solve this technical problem, the invention adopts the following technical solution: a PLS modeling method for infrared spectral data, comprising the following steps:
1) Set the maximum number of intervals max_int_no, the maximum number of latent variables max_lv_no, and the fold numbers $k_1$ and $k_2$ of the cross-validation, where $k_1$ and $k_2$ are both not less than 2;
2) For each number of intervals int_no, where 1≤int_no≤max_int_no, compute the cross-validation error of the corresponding fused PLS model according to steps 2.1 to 2.2:
2.1) Divide the spectral matrix X of the infrared spectral sample set equally into int_no intervals $X_i$. The number of columns of each interval, l, is the number of columns of X divided by int_no and rounded to an integer ([ ] denotes the rounding operation). The i-th interval $X_i$ corresponds to columns (i-1)×l+1 to i×l of the spectral matrix X, where 1≤i≤int_no;
2.2) For each number of latent variables lv_no, where 1≤lv_no≤max_lv_no, compute the cross-validation error of the fused PLS model according to steps 2.2.1 to 2.2.5;
2.2.1) Using $k_1$-fold cross-validation, compute the cross-validation error $S(e_i)$ of the PLS model of each interval for int_no intervals and lv_no latent variables, where y is the actual value of the dependent-variable matrix of the infrared spectral sample set, $\hat{y}_i$ is the predicted value of the dependent-variable matrix obtained by $k_1$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables, $e_i = y - \hat{y}_i$ is the corresponding prediction residual matrix, and n is the number of samples in the infrared spectral sample set;
2.2.2) For int_no intervals and lv_no latent variables, compute the correlations $r_{ij}$ between the prediction residual matrices of the PLS models of the intervals, where $\mathrm{cov}(e_i, e_j) = \frac{1}{n}\langle e_i, e_j \rangle$, i, j = 1, 2, ..., int_no;
2.2.3) Solve the following problem by nonlinear optimization:
$$f = \min\left( \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_i) + 2 \sum_{i=1}^{int\_no} \sum_{p>i}^{int\_no} \omega_i \omega_p \, r_{ip} \, S(e_i) S(e_p) \right)$$
$$\text{s.t.}\quad \sum_{i=1}^{int\_no} \omega_i = 1, \qquad 0 \le \omega_i \le 1;$$
This yields the combination coefficients $\omega = [\omega_1, \ldots, \omega_{int\_no}]'$ of the PLS models of the intervals for int_no intervals and lv_no latent variables;
2.2.4) Using $k_2$-fold cross-validation, compute the prediction residual matrices $e_{2i} = y - \hat{y}_{2i}$ of the PLS models of the intervals for int_no intervals and lv_no latent variables, where $\hat{y}_{2i}$ is the predicted value of the dependent-variable matrix obtained by $k_2$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables, and calculate
$$\hat{f}_{int\_no}^{lv\_no} = \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_{2i}) + 2 \sum_{i=1}^{int\_no} \sum_{p>i}^{int\_no} \omega_i \omega_p \, r_{ip} \, S(e_{2i}) S(e_{2p});$$
2.2.5) Select the minimum $\hat{f}_{int\_no}^{lv\_no}$ over all lv_no as the cross-validation error of the fused PLS model for int_no intervals, denoted $\hat{f}_{int\_no}$;
3) Select the minimum $\hat{f}_{int\_no}$ over all numbers of intervals; the number of intervals int_bt, the number of latent variables lv_bt and the combination coefficients ω_bt corresponding to this minimum are taken as the optimal model parameters;
4) Construct the fused PLS model from the optimal model parameters: divide the spectral matrix X equally into int_bt intervals; the fused PLS model is
$$y^* = \sum_{g=1}^{int\_bt} \omega\_bt_g \, (x_g \times b_g + c_g)$$
where $\omega\_bt_g$ is the g-th component of ω_bt; $y^*$ is the value of the dependent variable predicted for a sample by the fused PLS model; $b_g$ and $c_g$ are, respectively, the partial least squares regression coefficients and the intercept of interval $X_g$ against the dependent-variable matrix Y with lv_bt latent variables; and $x_g$ is the infrared spectral data of the sample in the g-th interval.
The fused PLS model of the invention is a weighted combination of several member models. The member models are the PLS models of the individual intervals, so the number of intervals equals the number of member models. The specific form of the i-th member model is determined by the spectral data of the i-th interval and the latent variables extracted from it.
Compared with the prior art, the invention has the following beneficial effects: the method makes the best use of the spectral information of every interval; it is simple and intuitive, requires little computation, and locates the characteristic wavelength intervals quickly. Because the weight coefficients are determined by taking into account both the error of each model participating in the fusion and the correlations between these errors, the fused model is guaranteed to have the minimum error.
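Why the objective of step 2.2.3 targets the error of the fused model can be seen from a short derivation. It is only a sketch, assuming that $S(e_i)$ is the standard deviation of the residual $e_i$ and $r_{ip}$ the correlation coefficient between $e_i$ and $e_p$ (the reading suggested by the worked example). Because the weights sum to 1, the residual of the fused model is the weighted sum of the member residuals, and its variance expands into exactly the minimized expression:

```latex
\begin{aligned}
e &= y - \sum_{i=1}^{int\_no} \omega_i \hat{y}_i
   = \sum_{i=1}^{int\_no} \omega_i \,(y - \hat{y}_i)
   = \sum_{i=1}^{int\_no} \omega_i e_i , \\
\operatorname{Var}(e) &= \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_i)
   + 2 \sum_{i=1}^{int\_no} \sum_{p>i}^{int\_no}
     \omega_i \omega_p \, r_{ip} \, S(e_i) S(e_p) .
\end{aligned}
```

For two member models the unconstrained minimizer has a closed form, $\omega_1 = \dfrac{S^2(e_2) - r_{12} S(e_1) S(e_2)}{S^2(e_1) + S^2(e_2) - 2 r_{12} S(e_1) S(e_2)}$ and $\omega_2 = 1 - \omega_1$, which is also the constrained solution whenever it falls inside [0, 1].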
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description
The invention is further described below with reference to an example.
The spectral data are the spectra data set shipped with MATLAB 2012a; the samples are gasoline, and the dependent variable is the octane number of each sample. The original sample set contains 60 samples, each with 700 spectral variables. For ease of description, this example uses only the first 6 spectral variables of the first 6 samples as the spectral data matrix of the sample set. The sample set used in this example consists of the spectral data matrix X and the dependent-variable matrix Y, given below:
$$X = \begin{bmatrix}
-0.0502 & -0.0459 & -0.0422 & -0.0372 & -0.0333 & -0.0312 \\
-0.0442 & -0.0396 & -0.0357 & -0.0309 & -0.0267 & -0.0239 \\
-0.0469 & -0.0413 & -0.0370 & -0.0315 & -0.0265 & -0.0233 \\
-0.0467 & -0.0422 & -0.0386 & -0.0345 & -0.0302 & -0.0277 \\
-0.0509 & -0.0451 & -0.0410 & -0.0364 & -0.0327 & -0.0315 \\
-0.0481 & -0.0427 & -0.0388 & -0.0340 & -0.0301 & -0.0277
\end{bmatrix}$$
$$Y = \begin{bmatrix} 85.3000 \\ 85.2500 \\ 88.4500 \\ 83.4000 \\ 87.9000 \\ 85.5000 \end{bmatrix}$$
The specific implementation steps of the invention are as follows:
Step 1, parameter setting: set the maximum number of intervals max_int_no=2, the maximum number of latent variables max_lv_no=2, and the cross-validation fold numbers $k_1$=4 and $k_2$=6. These parameters can be adjusted as needed; the values here are chosen only to make the modeling steps easy to follow.
Step 2: for each number of intervals int_no, where 1≤int_no≤max_int_no, compute the cross-validation error of the corresponding fused PLS model according to steps 2.1 to 2.2. The case int_no=2 is described below.
Step 2.1: divide the spectral matrix X of the infrared spectral sample set equally into int_no intervals $X_i$. The number of columns of each interval is l=3; $X_1$ corresponds to columns 1 to 3 of the spectral data matrix X, and $X_2$ to columns 4 to 6. $X_1$ and $X_2$ are as follows:
$$X_1 = \begin{bmatrix}
-0.0502 & -0.0459 & -0.0422 \\
-0.0442 & -0.0396 & -0.0357 \\
-0.0469 & -0.0413 & -0.0370 \\
-0.0467 & -0.0422 & -0.0386 \\
-0.0509 & -0.0451 & -0.0410 \\
-0.0481 & -0.0427 & -0.0388
\end{bmatrix}$$
$$X_2 = \begin{bmatrix}
-0.0372 & -0.0333 & -0.0312 \\
-0.0309 & -0.0267 & -0.0239 \\
-0.0315 & -0.0265 & -0.0233 \\
-0.0345 & -0.0302 & -0.0277 \\
-0.0364 & -0.0327 & -0.0315 \\
-0.0340 & -0.0301 & -0.0277
\end{bmatrix}$$
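The equal-width split of step 2.1 can be sketched in a few lines of Python. This is only an illustration: the function name `split_intervals` is hypothetical, rounding is assumed to mean truncation, and any trailing columns that do not fill a whole interval are assumed to be dropped.

```python
def split_intervals(X, int_no):
    """Split the columns of X (a list of rows) into int_no equal-width intervals."""
    n_cols = len(X[0])
    l = n_cols // int_no  # columns per interval (assumption: rounding = truncation)
    intervals = []
    for i in range(1, int_no + 1):
        # 1-based columns (i-1)*l+1 .. i*l map to the 0-based slice [(i-1)*l, i*l)
        start, stop = (i - 1) * l, i * l
        intervals.append([row[start:stop] for row in X])
    return intervals


# First two samples of the example matrix X
X = [
    [-0.0502, -0.0459, -0.0422, -0.0372, -0.0333, -0.0312],
    [-0.0442, -0.0396, -0.0357, -0.0309, -0.0267, -0.0239],
]
X1, X2 = split_intervals(X, 2)
print(X1[0])
print(X2[0])
```

With int_no=2 the first interval holds columns 1 to 3 and the second holds columns 4 to 6, matching the example.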
Step 2.2: for each number of latent variables lv_no, where 1≤lv_no≤max_lv_no, compute the cross-validation error of the fused PLS model according to steps 2.2.1 to 2.2.5. The case lv_no=2 is described below.
Step 2.2.1: using $k_1$-fold cross-validation, compute the cross-validation error of the PLS model of each interval for int_no intervals and lv_no latent variables.
For the first interval, the predicted value $\hat{y}_1$ of the dependent-variable matrix obtained by $k_1$-fold cross-validation from the PLS model with lv_no latent variables, the prediction residual matrix $e_1$, and the standard deviation $S(e_1)$ of $e_1$ are as follows:
$$\hat{y}_1 = \begin{bmatrix} 83.5924 \\ 86.7554 \\ 85.2694 \\ 82.1904 \\ 89.5037 \\ 87.4816 \end{bmatrix}, \quad e_1 = \begin{bmatrix} 1.7076 \\ -1.5054 \\ 3.1806 \\ 1.2096 \\ -1.6037 \\ -1.9816 \end{bmatrix}, \quad S(e_1) = 2.1490.$$
For the second interval, the predicted value $\hat{y}_2$ of the dependent-variable matrix obtained by $k_1$-fold cross-validation from the PLS model with lv_no latent variables, the prediction residual matrix $e_2$, and the standard deviation $S(e_2)$ of $e_2$ are as follows:
$$\hat{y}_2 = \begin{bmatrix} 80.1147 \\ 86.4685 \\ 86.9897 \\ 86.9970 \\ 81.4383 \\ 84.1570 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 5.1853 \\ -1.2185 \\ 1.4603 \\ -3.5970 \\ 6.4617 \\ 1.3430 \end{bmatrix}, \quad S(e_2) = 3.7823.$$
Step 2.2.2: for int_no intervals and lv_no latent variables, compute the correlations between the prediction residual matrices of the PLS models of the intervals:
$r_{11}=1.0000$, $r_{12}=-0.0900$, $r_{21}=-0.0900$, $r_{22}=1.0000$.
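The residual statistics of steps 2.2.1 and 2.2.2 can be checked with a short Python sketch. Two assumptions are made here that the text does not state: S is taken as the sample standard deviation (denominator n-1) and r as the Pearson correlation coefficient, because these choices reproduce the printed values. The fourth prediction of the first interval is taken as 82.1904, the value consistent with the listed residual $e_1$. The helper names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sample_std(v):
    """Sample standard deviation with denominator n-1 (an assumption)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


def pearson(u, v):
    """Pearson correlation coefficient between two residual vectors."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


y = [85.30, 85.25, 88.45, 83.40, 87.90, 85.50]
y_hat_1 = [83.5924, 86.7554, 85.2694, 82.1904, 89.5037, 87.4816]
y_hat_2 = [80.1147, 86.4685, 86.9897, 86.9970, 81.4383, 84.1570]

# Prediction residuals e_i = y - y_hat_i
e1 = [a - b for a, b in zip(y, y_hat_1)]
e2 = [a - b for a, b in zip(y, y_hat_2)]

print(round(sample_std(e1), 4), round(sample_std(e2), 4), round(pearson(e1, e2), 4))
```

Under these assumptions the computed values agree with $S(e_1)$, $S(e_2)$ and $r_{12}$ in the example to about three decimal places.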
Step 2.2.3: solve the following problem by nonlinear optimization:
$$f = \min\left( \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_i) + 2 \sum_{i=1}^{int\_no} \sum_{j>i}^{int\_no} \omega_i \omega_j \, r_{ij} \, S(e_i) S(e_j) \right)$$
$$\text{s.t.}\quad \sum_{i=1}^{int\_no} \omega_i = 1, \qquad 0 \le \omega_i \le 1$$
This yields the combination coefficients of the PLS models of the intervals for int_no intervals and lv_no latent variables:
ω=[0.7376 0.2624]′.
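The text only says that the weights are found "by nonlinear optimization" and does not name a solver. As one possible choice, the sketch below uses projected gradient descent onto the probability simplex; the function names, step-size rule and iteration count are illustrative assumptions, not part of the patent.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : sum(w) = 1, w >= 0}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:  # keep the threshold of the largest j that satisfies this
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fusion_weights(S, r, iterations=2000):
    """Minimize sum_i w_i^2 S_i^2 + 2 sum_{i<p} w_i w_p r_ip S_i S_p on the simplex."""
    n = len(S)
    # Covariance-like matrix C with C[i][p] = r_ip * S_i * S_p; objective is w' C w
    C = [[r[i][p] * S[i] * S[p] for p in range(n)] for i in range(n)]
    # Step 1/L, where L = 2 * (max absolute row sum) bounds the gradient's Lipschitz constant
    step = 1.0 / (2.0 * max(sum(abs(c) for c in row) for row in C))
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iterations):
        grad = [2.0 * sum(C[i][p] * w[p] for p in range(n)) for i in range(n)]
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w


S = [2.1490, 3.7823]              # S(e_1), S(e_2) from the example
r = [[1.0, -0.09], [-0.09, 1.0]]  # r_ij from the example
w = fusion_weights(S, r)
print([round(x, 4) for x in w])
```

Run on the example's S and r values, this gives weights close to the ω reported above. Any constrained solver that handles the sum-to-one and box constraints could be substituted.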
Step 2.2.4: using $k_2$-fold cross-validation, compute the prediction residual matrices of the PLS models of the intervals for int_no intervals and lv_no latent variables:
$$\hat{y}_{21} = \begin{bmatrix} 83.5924 \\ 86.7554 \\ 85.2694 \\ 82.1904 \\ 89.5037 \\ 87.4816 \end{bmatrix}, \quad \hat{y}_{22} = \begin{bmatrix} 80.1147 \\ 86.4685 \\ 86.9897 \\ 86.9970 \\ 81.4383 \\ 84.1570 \end{bmatrix}, \quad e_{21} = \begin{bmatrix} 1.7076 \\ -1.5054 \\ 3.1806 \\ 1.2096 \\ -1.6037 \\ -1.9816 \end{bmatrix}, \quad e_{22} = \begin{bmatrix} 5.1853 \\ -1.2185 \\ 1.4603 \\ -3.5970 \\ 6.4617 \\ 1.3430 \end{bmatrix}$$
Here $\hat{y}_{21}$ and $\hat{y}_{22}$ are the predicted values of the dependent-variable matrix from the PLS models with lv_no latent variables of the first and second intervals, respectively.
Calculate
$$\hat{f}_{int\_no}^{lv\_no} = \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_{2i}) + 2 \sum_{i=1}^{int\_no} \sum_{p>i}^{int\_no} \omega_i \omega_p \, r_{ip} \, S(e_{2i}) S(e_{2p}) = 1.7250$$
Step 2.2.5: select the minimum $\hat{f}_{int\_no}^{lv\_no}$ over all lv_no as the cross-validation error of the fused PLS model for int_no intervals, denoted $\hat{f}_{int\_no}$. In this example $\hat{f}_1^1 = 2.4440$, $\hat{f}_1^2 = 2.7208$, $\hat{f}_2^1 = 2.1265$ and $\hat{f}_2^2 = 1.7250$. Therefore, for int_no=1 the cross-validation error of the fused PLS model is $\hat{f}_1 = 2.4440$, and for int_no=2 it is $\hat{f}_2 = 1.7250$.
Step 3: select the minimum cross-validation error of the fused PLS model over all numbers of intervals int_no (1≤int_no≤max_int_no). In this example $\hat{f}_2$ is the minimum, and the corresponding optimal model parameters are: number of intervals int_bt=2, number of latent variables lv_bt=2, combination coefficients ω_bt=[0.7376 0.2624]′.
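The two-level selection in steps 2.2.5 and 3 amounts to taking a minimum over a small grid. The sketch below illustrates this with the four cross-validation errors of the example, reading the first index of each error as int_no and the second as lv_no; the function name `select_best` and the dictionary layout are hypothetical.

```python
def select_best(cv_errors):
    """cv_errors maps (int_no, lv_no) to the cross-validation error of the fused model."""
    best_per_int = {}
    # Step 2.2.5: for each int_no keep the lv_no with the smallest error
    for (int_no, lv_no), err in cv_errors.items():
        if int_no not in best_per_int or err < best_per_int[int_no][1]:
            best_per_int[int_no] = (lv_no, err)
    # Step 3: pick the int_no whose best error is smallest overall
    int_bt = min(best_per_int, key=lambda k: best_per_int[k][1])
    lv_bt, err_bt = best_per_int[int_bt]
    return int_bt, lv_bt, err_bt, best_per_int


cv_errors = {(1, 1): 2.4440, (1, 2): 2.7208, (2, 1): 2.1265, (2, 2): 1.7250}
int_bt, lv_bt, err_bt, best_per_int = select_best(cv_errors)
print(int_bt, lv_bt, err_bt)
```

In a full implementation the combination coefficients found for each grid point would be stored alongside the error, so that ω_bt can be looked up once int_bt and lv_bt are known.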
Step 4: construct the fused PLS model from the optimal model parameters. $b_1$=[64.4 -2120.4 443.4]′ and $c_1$=1565.1 are, respectively, the partial least squares regression coefficients and the intercept of $X_1$ against Y with 2 latent variables. $b_2$=[105.8 596.3 1404.7]′ and $c_2$=-1544.9 are, respectively, the partial least squares regression coefficients and the intercept of $X_2$ against Y with 2 latent variables. The final fused PLS model is
y = 0.7376×(x_1 b_1 + 1565.1) + 0.2624×(x_2 b_2 - 1544.9).
The complete spectral data x of a sample consist of $x_1$ and $x_2$, i.e. x=[x_1 x_2], where $x_1$ is the spectral data of the first interval and $x_2$ is the spectral data of the second interval. y is the value of the dependent variable predicted for the sample by the fused PLS model.
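Prediction with the fused model is a weighted sum of per-interval linear predictions. The sketch below shows the computation with small made-up coefficients chosen so the result is easy to check by hand; they are not the coefficients of the example, and the function name `predict_fused` is hypothetical.

```python
def predict_fused(x, weights, coefs, intercepts):
    """Return sum_g w_g * (x_g . b_g + c_g) for one sample x split into equal intervals."""
    l = len(x) // len(weights)  # columns per interval
    y = 0.0
    for g, (w, b, c) in enumerate(zip(weights, coefs, intercepts)):
        x_g = x[g * l:(g + 1) * l]  # spectral data of the g-th interval
        y += w * (sum(xi * bi for xi, bi in zip(x_g, b)) + c)
    return y


# Illustrative values only (hypothetical member models)
weights = [0.75, 0.25]
coefs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
intercepts = [1.0, 2.0]
x = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]

# Member predictions are 7.0 and 4.0, so the fused value is 0.75*7.0 + 0.25*4.0
print(predict_fused(x, weights, coefs, intercepts))
```

The same function applies to the example by passing ω_bt, the vectors $b_1$, $b_2$ and the intercepts $c_1$, $c_2$.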

Claims (1)

1. A PLS modeling method for infrared spectral data, characterized by comprising the following steps:
1) Set the maximum number of intervals max_int_no, the maximum number of latent variables max_lv_no, and the fold numbers $k_1$ and $k_2$ of the cross-validation, where $k_1$ and $k_2$ are both not less than 2;
2) For each number of intervals int_no, where 1≤int_no≤max_int_no, compute the cross-validation error of the corresponding fused PLS model according to steps 2.1) and 2.2):
2.1) Divide the spectral matrix X of the infrared spectral sample set equally into int_no intervals $X_i$. The number of columns of each interval, l, is the number of columns of X divided by int_no and rounded to an integer ([ ] denotes the rounding operation). The i-th interval $X_i$ corresponds to columns (i-1)×l+1 to i×l of the spectral matrix X, where 1≤i≤int_no;
2.2) For each number of latent variables lv_no, where 1≤lv_no≤max_lv_no, compute the cross-validation error of the fused PLS model according to steps 2.2.1) to 2.2.5):
2.2.1) Using $k_1$-fold cross-validation, compute the cross-validation error $S(e_i)$ of the PLS model of each interval for int_no intervals and lv_no latent variables, where y is the actual value of the dependent-variable matrix of the infrared spectral sample set, $\hat{y}_i$ is the predicted value of the dependent-variable matrix obtained by $k_1$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables, $e_i = y - \hat{y}_i$ is the corresponding prediction residual matrix, and n is the number of samples in the infrared spectral sample set;
2.2.2) For int_no intervals and lv_no latent variables, compute the correlations $r_{ij}$ between the prediction residual matrices of the PLS models of the intervals, where $\mathrm{cov}(e_i, e_j) = \frac{1}{n}\langle e_i, e_j \rangle$, i, j = 1, 2, ..., int_no;
2.2.3) Solve the following problem by nonlinear optimization:
$$f = \min\left( \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_i) + 2 \sum_{i=1}^{int\_no} \sum_{p>i}^{int\_no} \omega_i \omega_p \, r_{ip} \, S(e_i) S(e_p) \right)$$
$$\text{s.t.}\quad \sum_{i=1}^{int\_no} \omega_i = 1, \qquad 0 \le \omega_i \le 1;$$
This yields the combination coefficients $\omega = [\omega_1, \ldots, \omega_{int\_no}]'$ of the PLS models of the intervals for int_no intervals and lv_no latent variables;
2.2.4) Using $k_2$-fold cross-validation, compute the prediction residual matrices $e_{2i} = y - \hat{y}_{2i}$ of the PLS models of the intervals for int_no intervals and lv_no latent variables, where $\hat{y}_{2i}$ is the predicted value of the dependent-variable matrix obtained by $k_2$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables, and calculate
$$\hat{f}_{int\_no}^{lv\_no} = \sum_{i=1}^{int\_no} \omega_i^2 S^2(e_{2i}) + 2 \sum_{i=1}^{int\_no} \sum_{p>i}^{int\_no} \omega_i \omega_p \, r_{ip} \, S(e_{2i}) S(e_{2p});$$
2.2.5) Select the minimum $\hat{f}_{int\_no}^{lv\_no}$ over all lv_no as the cross-validation error of the fused PLS model for int_no intervals, denoted $\hat{f}_{int\_no}$;
3) Select the minimum $\hat{f}_{int\_no}$ over all numbers of intervals; the number of intervals int_bt, the number of latent variables lv_bt and the combination coefficients ω_bt corresponding to this minimum are taken as the optimal model parameters;
4) Construct the fused PLS model from the optimal model parameters: divide the spectral matrix X equally into int_bt intervals; the fused PLS model is
$$y^* = \sum_{g=1}^{int\_bt} \omega\_bt_g \, (x_g \times b_g + c_g)$$
where $\omega\_bt_g$ is the g-th component of ω_bt; $y^*$ is the value of the dependent variable predicted for a sample by the fused PLS model; $b_g$ and $c_g$ are, respectively, the partial least squares regression coefficients and the intercept of interval $X_g$ against the dependent-variable matrix Y with lv_bt latent variables; and $x_g$ is the infrared spectral data of the sample in the g-th interval.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410362602.1A CN104091089B (en) 2014-07-28 2014-07-28 PLS modeling method for infrared spectral data

Publications (2)

Publication Number Publication Date
CN104091089A CN104091089A (en) 2014-10-08
CN104091089B true CN104091089B (en) 2016-04-27

Family

ID=51638805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410362602.1A Active CN104091089B (en) 2014-07-28 2014-07-28 PLS modeling method for infrared spectral data

Country Status (1)

Country Link
CN (1) CN104091089B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1657907A (en) * 2005-03-23 2005-08-24 江苏大学 Agricultural products, food near-infrared spectral specragion selection method
CN101021471A (en) * 2007-03-13 2007-08-22 山东医学高等专科学校 Method for Chinese patent drug fast quantitative analysis by acousto-optic filter near infrared spectral technique
CN103398971A (en) * 2013-07-19 2013-11-20 华北电力大学(保定) Chemometrics method for determining cetane number of diesel oil

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant