CN104091089B - A PLS modeling method for infrared spectral data - Google Patents
A PLS modeling method for infrared spectral data
- Publication number: CN104091089B (also published as CN104091089A; application CN201410362602.1A)
- Authority: CN (China)
- Prior art keywords: int, interval section, PLS model, omega, matrix
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a PLS modeling method for infrared spectral data. The weight coefficient of the PLS model of each interval is determined from both the errors of the interval PLS models and the correlations between those errors, so that the resulting fused PLS model has minimum error. The method makes the best use of the spectral information of every interval; it is simple, visual and computationally cheap, and it finds characteristic wavelength intervals quickly. Because the weight coefficients take into account both the error of each member model and the correlations between the errors, the method can ensure that the fused model has minimum error.
Description
Technical field
The invention belongs to the field of infrared spectral identification, and specifically relates to a data processing method that improves partial least squares (PLS) modeling of infrared spectra.
Background art
For small-sample, multivariate infrared spectral data, the PLS model handles the variable collinearity and curse-of-dimensionality problems that other modeling methods run into, and is therefore widely used in infrared spectral identification. Although PLS can model the full spectrum directly, theory and a large body of experiments show that wavelength selection is still an effective way to improve a PLS model. Wavelength selection means screening characteristic wavelengths or bands by some method before modeling. Because irrelevant or nonlinear variables are removed, a model built after wavelength selection is simpler than the full-wavelength model and has better predictive ability and robustness. Interval PLS (iPLS) is a commonly used wavelength selection method. Its advantages are that it is simple, visual and computationally cheap, and that it finds characteristic wavelength intervals quickly. Its drawback is that it uses the spectral information of only one interval and may lose useful spectral information from the other intervals. How to make the best use of the spectral information of every interval is therefore a problem in urgent need of a solution.
Summary of the invention
The technical problem to be solved by the present invention is to provide a PLS modeling method for infrared spectral data that addresses the above deficiencies of the prior art.
To solve this technical problem, the present invention adopts the following technical solution: a PLS modeling method for infrared spectral data, comprising the following steps:
1) Set the maximum number of intervals max_int_no, the maximum number of latent variables max_lv_no, and the fold numbers $k_1$ and $k_2$ of the cross-validation method, where $k_1$ and $k_2$ are both not less than 2;
2) For each number of intervals int_no, 1 ≤ int_no ≤ max_int_no, compute the cross-validation error of the corresponding fused PLS model by steps 2.1 to 2.2:
2.1) Divide the spectrum matrix X of the infrared spectral sample set into int_no equal intervals $X_i$. Each interval has $l = [p / \mathrm{int\_no}]$ columns, where $p$ is the number of columns of X and $[\cdot]$ denotes rounding to an integer. The i-th interval $X_i$ corresponds to columns $[(i-1) \times l + 1]$ to $(i \times l)$ of X, 1 ≤ i ≤ int_no;
2.2) For each number of latent variables lv_no, 1 ≤ lv_no ≤ max_lv_no, compute the cross-validation error of the fused PLS model by steps 2.2.1 to 2.2.5:
2.2.1) Use $k_1$-fold cross-validation to compute, for int_no intervals and lv_no latent variables, the cross-validation error $S(e_i)$ of the PLS model of each interval, that is, the standard deviation of its prediction residuals. Here $y$ denotes the actual values of the dependent variable matrix in the infrared spectral sample set, $\hat{y}_i$ denotes the predicted values of the dependent variable matrix obtained by $k_1$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables, $e_i$ is the corresponding prediction residual matrix, and $n$ is the number of samples in the sample set;
2.2.2) Compute, for int_no intervals and lv_no latent variables, the correlations $r_{ij}$ between the prediction residual matrices $e_i$ and $e_j$ of the interval PLS models, 1 ≤ i, j ≤ int_no;
2.2.3) Solve by nonlinear optimization for the combination coefficients $\omega = [\omega_1, \ldots, \omega_{\mathrm{int\_no}}]'$ of the interval PLS models, for int_no intervals and lv_no latent variables, choosing the coefficients from the errors $S(e_i)$ and the correlations $r_{ij}$ so that the fused model has minimum error;
2.2.4) Use $k_2$-fold cross-validation to compute, for int_no intervals and lv_no latent variables, the prediction residual matrices of the interval PLS models, where $\hat{y}_i$ now denotes the predicted values of the dependent variable matrix obtained by $k_2$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables. From these residuals and the combination coefficients $\omega$, compute the cross-validation error of the fused PLS model;
2.2.5) Select the minimum of these errors over all lv_no as the cross-validation error of the fused PLS model for int_no intervals;
3) Select the minimum cross-validation error over all numbers of intervals, and take the corresponding number of intervals int_bt, number of latent variables lv_bt and combination coefficients ω_bt as the optimal model parameters;
4) Construct the fused PLS model from the optimal model parameters: divide the spectrum matrix X into int_bt equal intervals; the fused PLS model is

$$y^{*} = \sum_{g=1}^{\mathrm{int\_bt}} \omega\_bt_g \,(x_g b_g + c_g)$$

where $\omega\_bt_g$ is the g-th component of ω_bt; $y^{*}$ is the fused PLS model's predicted value of the dependent variable of a sample; $b_g$ and $c_g$ are, respectively, the partial least squares regression coefficients and the intercept between interval $X_g$ and the dependent variable matrix Y when the number of latent variables is lv_bt; and $x_g$ is the infrared spectral data of the sample in the g-th interval.
The fused PLS model of the present invention is a weighted combination of several member models. Each member model is the PLS model of one interval, so the number of intervals equals the number of member models. The specific form of the i-th member model is determined by the spectral data of the i-th interval and the latent variables extracted from it.
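As a minimal sketch of step 4, the fused prediction can be written as a weighted sum of the member models' predictions. The function name `fused_predict`, the list-based data layout and the toy numbers are illustrative assumptions and do not come from the patent; the intervals are taken as equal-width column blocks, as in step 2.1.

```python
def fused_predict(x, weights, coefs, intercepts):
    """Weighted sum of per-interval linear predictions for one sample.

    x          : spectrum of one sample (list of floats)
    weights    : combination coefficient of each interval model
    coefs      : regression coefficient vector b_g of each interval model
    intercepts : intercept c_g of each interval model
    """
    int_no = len(weights)
    l = len(x) // int_no  # columns per interval (assumes rounding down)
    y = 0.0
    for g in range(int_no):
        x_g = x[g * l:(g + 1) * l]  # spectrum of the g-th interval
        y_g = sum(xv * bv for xv, bv in zip(x_g, coefs[g])) + intercepts[g]
        y += weights[g] * y_g
    return y


# Toy (hypothetical) numbers: two intervals of two columns each.
y_star = fused_predict(
    x=[1.0, 2.0, 3.0, 4.0],
    weights=[0.75, 0.25],
    coefs=[[1.0, 1.0], [2.0, 0.0]],
    intercepts=[0.0, 1.0],
)
print(y_star)
```

Each member model only ever sees its own block of columns, so member models can be fitted independently and combined afterwards.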
Compared with the prior art, the present invention has the following beneficial effects: the method makes the best use of the spectral information of every interval; it is simple, visual and computationally cheap, and it finds characteristic wavelength intervals quickly. Because the weight coefficients in the present invention take into account both the error of each model participating in the fusion and the correlations between those errors, the method can ensure that the fused model has minimum error.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of the embodiment
The present invention is further described below with reference to an example.
The spectral data are the spectra data set shipped with MATLAB 2012a. The samples are gasoline, and the dependent variable is the octane number of each sample. The original data set contains 60 samples, and the spectral variable length of each sample is 700. For ease of description, this example uses only spectral variables 1 to 6 of samples 1 to 6 as the spectral data matrix of the sample set. The sample set data used in this example consist of the spectral data matrix X and the dependent variable matrix Y.
The specific implementation steps of the present invention are as follows:
Step 1, parameter setting: set the maximum number of intervals max_int_no = 2, the maximum number of latent variables max_lv_no = 2, and the cross-validation fold numbers $k_1 = 4$ and $k_2 = 6$. These parameters can be adjusted as needed; the values here are chosen only to make the modeling steps easy to illustrate.
Step 2: for each number of intervals int_no, 1 ≤ int_no ≤ max_int_no, compute the cross-validation error of the corresponding fused PLS model by steps 2.1 to 2.2. The case int_no = 2 is described below:
Step 2.1: divide the spectrum matrix X of the infrared spectral sample set into int_no equal intervals $X_i$. Each interval has $l = [6/2] = 3$ columns: $X_1$ corresponds to columns 1 to 3 of the spectral data matrix X, and $X_2$ corresponds to columns 4 to 6.
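The column split of step 2.1 can be sketched as follows. This is only an illustration: the function name `split_intervals` and the toy matrix are assumptions, and the rounding is assumed to be rounding down, so any leftover columns at the end of the spectrum are ignored.

```python
def split_intervals(X, int_no):
    """Split the columns of X (a list of rows) into int_no equal-width intervals.

    Interval i (1-based) takes columns (i-1)*l+1 .. i*l of X, which is
    the 0-based slice [(i-1)*l : i*l] used below.
    """
    p = len(X[0])    # number of spectral variables (columns)
    l = p // int_no  # columns per interval, assuming [.] rounds down
    return [[row[i * l:(i + 1) * l] for row in X] for i in range(int_no)]


# Toy 6 x 6 matrix (hypothetical values, not the gasoline data).
X = [[10 * r + c for c in range(1, 7)] for r in range(6)]
X1, X2 = split_intervals(X, 2)
print(X1[0], X2[0])  # first sample: columns 1-3 and columns 4-6
```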
Step 2.2: for each number of latent variables lv_no, 1 ≤ lv_no ≤ max_lv_no, compute the cross-validation error of the fused PLS model by steps 2.2.1 to 2.2.5. The case lv_no = 2 is described below.
Step 2.2.1: use $k_1$-fold cross-validation to compute, for int_no intervals and lv_no latent variables, the cross-validation error of the PLS model of each interval.
For the first interval, $k_1$-fold cross-validation of the PLS model with lv_no latent variables gives the predicted values $\hat{y}_1$ of the dependent variable matrix, the prediction residual matrix $e_1$, and the standard deviation $S(e_1)$ of $e_1$.
For the second interval, $k_1$-fold cross-validation of the PLS model with lv_no latent variables gives the predicted values $\hat{y}_2$ of the dependent variable matrix, the prediction residual matrix $e_2$, and the standard deviation $S(e_2)$ of $e_2$.
Step 2.2.2: compute, for int_no intervals and lv_no latent variables, the correlations between the prediction residual matrices of the interval PLS models:
$r_{11} = 1.0000$, $r_{12} = -0.0900$, $r_{21} = -0.0900$, $r_{22} = 1.0000$.
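A correlation matrix of this shape can be computed from the residual vectors as sketched below. The sketch assumes the ordinary Pearson correlation coefficient, which the surviving text does not state explicitly; the function names and residual values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (must not be constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def residual_correlations(residuals):
    """Matrix R with R[i][j] = correlation between residual vectors i and j."""
    return [[pearson(ei, ej) for ej in residuals] for ei in residuals]


# Hypothetical residual vectors of two interval models.
R = residual_correlations([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
print(R)
```

The diagonal entries are always 1 and the matrix is symmetric, matching the pattern of the values listed in step 2.2.2.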
Step 2.2.3: solve by nonlinear optimization for the combination coefficients of the interval PLS models, for int_no intervals and lv_no latent variables:
$\omega = [0.7376 \;\; 0.2624]'$.
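The objective function of this step is not preserved in the text, so the following is only a sketch under stated assumptions: the weights minimize the variance of the weighted residual, $\omega' C \omega$ with $C_{ij} = r_{ij} S(e_i) S(e_j)$, subject only to the weights summing to 1. Under these assumptions the minimizer has the closed form $\omega = C^{-1}\mathbf{1} / (\mathbf{1}' C^{-1} \mathbf{1})$, so no iterative optimizer is needed; the patent itself uses a nonlinear optimization method and may impose further constraints. All names and input values are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def min_variance_weights(S, R):
    """Weights minimizing w'Cw with sum(w) = 1, where C[i][j] = R[i][j]*S[i]*S[j]."""
    n = len(S)
    C = [[R[i][j] * S[i] * S[j] for j in range(n)] for i in range(n)]
    z = solve(C, [1.0] * n)
    total = sum(z)
    return [v / total for v in z]


# Hypothetical errors: the second model is twice as noisy and uncorrelated.
w = min_variance_weights([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
print(w)
```

The less accurate model receives the smaller weight, and a negative correlation between residuals lets the two models cancel part of each other's error.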
Step 2.2.4: use $k_2$-fold cross-validation to compute, for int_no intervals and lv_no latent variables, the prediction residual matrices of the interval PLS models, where $\hat{y}_1$ and $\hat{y}_2$ are the predicted values of the dependent variable matrix from the PLS models with lv_no latent variables of the first and second intervals respectively. From these residuals and the combination coefficients, compute the cross-validation error of the fused PLS model.
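One way to obtain the fused model's cross-validation error from the member residuals is sketched below. It assumes that residuals are defined as actual minus predicted values and that the weights sum to 1, in which case the residual of the fused prediction equals the same weighted sum of the member residuals; the error is then taken as the sample standard deviation of that fused residual. These choices, the function name and the numbers are assumptions, since the original formula is not preserved.

```python
import math


def fused_cv_error(weights, residuals):
    """Return the fused residual vector and its sample standard deviation.

    weights   : combination coefficients, assumed to sum to 1
    residuals : one cross-validated residual vector per interval model
    """
    n = len(residuals[0])
    fused = [sum(w * e[s] for w, e in zip(weights, residuals)) for s in range(n)]
    mean = sum(fused) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in fused) / (n - 1))
    return fused, std


# Hypothetical residuals of two interval models on three samples.
fused, err = fused_cv_error([0.5, 0.5], [[1.0, -1.0, 2.0], [-1.0, 1.0, 0.0]])
print(fused, err)
```

Using a second fold number $k_2$ here means the weights are evaluated on a different partition of the samples than the one used to estimate them.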
Step 2.2.5: select the minimum of these errors over all lv_no as the cross-validation error of the fused PLS model for int_no intervals. In this example, this gives one cross-validation error of the fused PLS model for int_no = 1 and one for int_no = 2 (the numerical values are not reproduced here).
Step 3: select the minimum cross-validation error of the fused PLS model over all numbers of intervals int_no (1 ≤ int_no ≤ max_int_no). In this example the minimum is obtained at int_no = 2, and the corresponding optimal model parameters are: number of intervals int_bt = 2, number of latent variables lv_bt = 2, and combination coefficients ω_bt = [0.7376 0.2624]'.
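The two-level selection of steps 2.2.5 and 3 amounts to a small grid search, sketched below. The dictionary layout, the function name and all error values are hypothetical; only the weight vector for two intervals and two latent variables is taken from the example.

```python
def select_optimal(fused_cv):
    """Pick (int_bt, lv_bt, weights_bt, error) with the smallest fused CV error.

    fused_cv maps (int_no, lv_no) -> (cv_error, weights).
    """
    # Step 2.2.5: for each int_no keep the lv_no with the smallest error.
    best_per_int = {}
    for (int_no, lv_no), (err, w) in fused_cv.items():
        if int_no not in best_per_int or err < best_per_int[int_no][1]:
            best_per_int[int_no] = (lv_no, err, w)
    # Step 3: pick the int_no whose best error is smallest overall.
    int_bt = min(best_per_int, key=lambda k: best_per_int[k][1])
    lv_bt, err_bt, w_bt = best_per_int[int_bt]
    return int_bt, lv_bt, w_bt, err_bt


# Hypothetical cross-validation errors for a 2 x 2 grid.
fused_cv = {
    (1, 1): (0.40, [1.0]),
    (1, 2): (0.35, [1.0]),
    (2, 1): (0.30, [0.6, 0.4]),
    (2, 2): (0.25, [0.7376, 0.2624]),
}
print(select_optimal(fused_cv))
```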
Step 4: construct the fused PLS model from the optimal model parameters. $b_1 = [64.4 \;\; -2120.4 \;\; 443.4]'$ and $c_1 = 1565.1$ are, respectively, the partial least squares regression coefficients and the intercept between $X_1$ and Y when the number of latent variables is 2. $b_2 = [105.8 \;\; 596.3 \;\; 1404.7]'$ and $c_2 = -1544.9$ are, respectively, the partial least squares regression coefficients and the intercept between $X_2$ and Y when the number of latent variables is 2. The final fused PLS model is as follows:
$y = 0.7376 \times (x_1 b_1 + 1565.1) + 0.2624 \times (x_2 b_2 - 1544.9)$.
The complete spectral data x of a sample consist of $x_1$ and $x_2$, i.e. $x = [x_1 \;\; x_2]$, where $x_1$ is the spectral data of the first interval and $x_2$ is the spectral data of the second interval. y is the fused PLS model's predicted value of the dependent variable of the sample.
Claims (1)
1. A PLS modeling method for infrared spectral data, characterized in that it comprises the following steps:
1) Set the maximum number of intervals max_int_no, the maximum number of latent variables max_lv_no, and the fold numbers $k_1$ and $k_2$ of the cross-validation method, where $k_1$ and $k_2$ are both not less than 2;
2) For each number of intervals int_no, 1 ≤ int_no ≤ max_int_no, compute the cross-validation error of the corresponding fused PLS model according to step 2.1) and step 2.2):
2.1) Divide the spectrum matrix X of the infrared spectral sample set into int_no equal intervals $X_i$. Each interval has $l = [p / \mathrm{int\_no}]$ columns, where $p$ is the number of columns of X and $[\cdot]$ denotes rounding to an integer. The i-th interval $X_i$ corresponds to columns $[(i-1) \times l + 1]$ to $(i \times l)$ of X, 1 ≤ i ≤ int_no;
2.2) For each number of latent variables lv_no, 1 ≤ lv_no ≤ max_lv_no, compute the cross-validation error of the fused PLS model according to steps 2.2.1) to 2.2.5):
2.2.1) Use $k_1$-fold cross-validation to compute, for int_no intervals and lv_no latent variables, the cross-validation error $S(e_i)$ of the PLS model of each interval, that is, the standard deviation of its prediction residuals. Here $y$ denotes the actual values of the dependent variable matrix in the infrared spectral sample set, $\hat{y}_i$ denotes the predicted values of the dependent variable matrix obtained by $k_1$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables, $e_i$ is the corresponding prediction residual matrix, and $n$ is the number of samples in the sample set;
2.2.2) Compute, for int_no intervals and lv_no latent variables, the correlations $r_{ij}$ between the prediction residual matrices $e_i$ and $e_j$ of the interval PLS models, 1 ≤ i, j ≤ int_no;
2.2.3) Solve by nonlinear optimization for the combination coefficients $\omega = [\omega_1, \ldots, \omega_{\mathrm{int\_no}}]'$ of the interval PLS models, for int_no intervals and lv_no latent variables, choosing the coefficients from the errors $S(e_i)$ and the correlations $r_{ij}$ so that the fused model has minimum error;
2.2.4) Use $k_2$-fold cross-validation to compute, for int_no intervals and lv_no latent variables, the prediction residual matrices of the interval PLS models, where $\hat{y}_i$ now denotes the predicted values of the dependent variable matrix obtained by $k_2$-fold cross-validation from the PLS model of the i-th interval with lv_no latent variables. From these residuals and the combination coefficients $\omega$, compute the cross-validation error of the fused PLS model;
2.2.5) Select the minimum of these errors over all lv_no as the cross-validation error of the fused PLS model for int_no intervals;
3) Select the minimum cross-validation error over all numbers of intervals, and take the corresponding number of intervals int_bt, number of latent variables lv_bt and combination coefficients ω_bt as the optimal model parameters;
4) Construct the fused PLS model from the optimal model parameters: divide the spectrum matrix X into int_bt equal intervals; the fused PLS model is

$$y^{*} = \sum_{g=1}^{\mathrm{int\_bt}} \omega\_bt_g \,(x_g b_g + c_g)$$

where $\omega\_bt_g$ is the g-th component of ω_bt; $y^{*}$ is the fused PLS model's predicted value of the dependent variable of a sample; $b_g$ and $c_g$ are, respectively, the partial least squares regression coefficients and the intercept between interval $X_g$ and the dependent variable matrix Y when the number of latent variables is lv_bt; and $x_g$ is the infrared spectral data of the sample in the g-th interval.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410362602.1A | 2014-07-28 | 2014-07-28 | A PLS modeling method for infrared spectral data (CN104091089B) |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104091089A | 2014-10-08 |
CN104091089B | 2016-04-27 |
Family
ID=51638805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410362602.1A | A PLS modeling method for infrared spectral data (CN104091089B, Active) | 2014-07-28 | 2014-07-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104091089B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104462751B (en) * | 2014-10-29 | 2017-05-03 | 温州大学 | Near infrared spectrum modeling method based on multi-element Gaussian fitting |
CN105092519B (en) * | 2015-07-10 | 2017-11-14 | 东北大学 | Sample component assay method based on increment PLS |
CN108918446B (en) * | 2018-04-18 | 2021-05-04 | 天津大学 | Ultra-low concentration sulfur dioxide ultraviolet difference feature extraction algorithm |
CN108872142B (en) * | 2018-06-19 | 2020-12-22 | 温州大学 | Multi-parameter selection optimization method in wavelength selection algorithm |
CN109060771B (en) * | 2018-07-26 | 2020-12-29 | 温州大学 | Consensus model construction method based on different characteristic sets of spectrum |
CN109060715A (en) * | 2018-07-31 | 2018-12-21 | 温州大学 | A kind of construction method of the near infrared spectrum common recognition model based on self organizing neural network |
CN111125629B (en) * | 2019-12-25 | 2023-04-07 | 温州大学 | Domain-adaptive PLS regression model modeling method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1657907A (en) * | 2005-03-23 | 2005-08-24 | 江苏大学 | Agricultural products, food near-infrared spectral specragion selection method |
CN101021471A (en) * | 2007-03-13 | 2007-08-22 | 山东医学高等专科学校 | Method for Chinese patent drug fast quantitative analysis by acousto-optic filter near infrared spectral technique |
CN103398971A (en) * | 2013-07-19 | 2013-11-20 | 华北电力大学(保定) | Chemometrics method for determining cetane number of diesel oil |
Also Published As
Publication number | Publication date |
---|---|
CN104091089A (en) | 2014-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104091089B (en) | A kind of ir data PLS modeling method | |
CN107194600A (en) | A kind of electric load Seasonal Characteristics sorting technique | |
WO2008156147A1 (en) | Coating color database creating method, search method using the database, their system, program, and recording medium | |
CN104700153A (en) | PH (potential of hydrogen) value predicting method of BP (back propagation) neutral network based on simulated annealing optimization | |
WO2008126209A1 (en) | Method, device, and program for making prediction model by multiple regression analysis | |
CN105630743A (en) | Spectrum wave number selection method | |
NO20071670L (en) | Method and apparatus for analyzing the performance of a hydrocarbon reservoir. | |
CN103279794B (en) | Electric power telecommunication network risk assessment method | |
CN102305792B (en) | Nonlinear partial least square optimizing model-based forest carbon sink remote sensing evaluation method | |
CN106248621A (en) | A kind of evaluation methodology and system | |
CN106384188A (en) | Water flooding production potential evaluating method for single horizontal well of strong heterogeneous carbonatite oil reservoir | |
CN104063577A (en) | Method for forecasting characteristic gas development tendency in transformer oil based on generalized recurrent neural network | |
CN104156530A (en) | Channel radiation quantity reconstructing method of high temperature target | |
CN105372198A (en) | Infrared spectrum wavelength selection method based on integrated L1 regularization | |
CN107729988B (en) | Blue algae bloom prediction method based on dynamic deep belief network | |
CN105823751B (en) | Infrared spectrum Multivariate Correction regression modeling method based on λ-SPXY algorithms | |
MX2015014815A (en) | Method for characterising a product by means of topological spectral analysis. | |
CN111879915A (en) | High-resolution monthly soil salinity monitoring method and system for coastal wetland | |
CN104462751B (en) | Near infrared spectrum modeling method based on multi-element Gaussian fitting | |
CN103366095B (en) | A kind of least square fitting signal processing method based on coordinate transform | |
CN105760682A (en) | Four-channel signal reconstruction method based on four-element Hankel matrix | |
CN105241823A (en) | Thermal power plant flue gas spectral quantitative analysis method based on sparse representation | |
CN106447029B (en) | Anti-dazzle glas chemical erosion process parameter optimizing method based on BP neural network | |
CN102868653A (en) | Digital modulation signal classification method based on bispectrum and sparse matrix | |
CN105842183B (en) | A kind of infrared spectrum modeling method based on common recognition selection technique |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |