CN111125629B

CN111125629B - Domain-adaptive PLS regression model modeling method

Info

Publication number: CN111125629B
Application number: CN201911353268.2A
Authority: CN
Inventors: 陈孝敬; 黄光造; 石文; 袁雷明; 陈熙
Original assignee: Wenzhou University
Current assignee: Wenzhou University
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2023-04-07
Anticipated expiration: 2039-12-25
Also published as: CN111125629A

Abstract

The invention discloses a domain-adaptive PLS regression model modeling method. An original domain spectrum centering matrix is constructed from near-infrared spectral data acquired in the original domain, and a target domain spectrum centering matrix is constructed from near-infrared spectral data acquired in the target domain, which eliminates the difference between the mean spectra of the two domains. Based on the two centering matrices, a transfer matrix is mapped into kernel matrix space to find the optimal projection directions and determine an optimal projection matrix, and the final PLS regression model is constructed on that projection matrix, which weakens the dependence between the projection scores of the different domains and the domain labels. The advantages are that a domain-adaptive algorithm eliminates the differences between near-infrared spectral data collected in different domains and that no concentration information has to be collected for target domain samples, so the modeling process is simplified and the constructed PLS regression model predicts target domain near-infrared spectral data with good accuracy.

Description

Domain-adaptive PLS regression model modeling method

Technical Field

The invention relates to a modeling method of a PLS regression model, in particular to a domain-adaptive modeling method of the PLS regression model.

Background

Near-infrared spectroscopy is a simple, rapid and reliable detection technique. It draws on results from several disciplines, including spectroscopy, computer technology and pattern recognition, and thanks to its particular advantages it is increasingly widely applied in many fields and has gradually gained public acceptance and official approval. Near-infrared spectroscopy is an indirect analysis method, so a regression model relating the near-infrared spectral data to the property of the sample being analyzed usually has to be constructed. The Partial Least Squares (PLS) regression model is the most commonly used multivariate regression model for this purpose. PLS can suppress noise in the spectrum matrix and the concentration matrix and achieves good prediction performance.

The existing modeling method for a PLS regression model in near-infrared spectral analysis is as follows: first, near-infrared spectral data and concentration data of standard samples are collected to construct the corresponding near-infrared spectrum data matrix and concentration vector; the near-infrared spectrum matrix is then decomposed and its optimal number of principal components is determined by cross-validation; finally, the mathematical relationship between the near-infrared spectrum matrix and the concentration vector is established by PLS regression.

The conventional PLS regression modeling method based on near-infrared spectral data requires both the near-infrared spectral data and the concentration data of standard samples. However, as application scenarios of near-infrared spectroscopy grow more complex, the detection conditions or the instrument itself often change, for example through changes in sample temperature or humidity, changes in sample form, instrument aging, or replacement of accessories. The near-infrared spectra collected from standard samples then often show absorbance differences and wavelength drift, so a PLS regression model built on data from the original domain (source domain, i.e. near-infrared spectral data collected under condition 1) gives predictions with large deviations on data from the target domain (i.e. near-infrared spectral data collected under condition 2).

Disclosure of Invention

The technical problem to be solved by the invention is to provide a domain-adaptive PLS regression model modeling method, which does not need to acquire concentration information of a target domain sample, simplifies the modeling process, and eliminates the difference of acquired near-infrared spectrum data in different domains by adopting a domain-adaptive algorithm, so that the constructed PLS regression model has good prediction precision on the near-infrared spectrum data of the target domain.

The technical scheme adopted by the invention for solving the technical problems is as follows: a domain-adaptive PLS regression model modeling method comprises the following steps:

step 1, acquiring ns near-infrared spectrum samples from the original domain, wherein ns is an integer greater than or equal to 5, and using the ns samples to construct an original domain near-infrared spectrum data set {x_sq, y_sq | q = 1, 2, …, ns}, where x_sq is the near-infrared spectral data of the q-th sample obtained from the original domain and y_sq is the concentration attribute value of the q-th sample obtained from the original domain;

acquiring nt near-infrared spectrum samples from the target domain, wherein nt is an integer greater than or equal to 5, and using the nt samples to construct a target domain near-infrared spectrum data set {x_tj | j = 1, 2, …, nt}, where x_tj is the near-infrared spectral data of the j-th sample obtained from the target domain; x_sq and x_tj are each vectors of 1 row and p columns, and p is the number of wavebands of the spectroscopic instrument used to collect the original domain near-infrared spectral data x_sq and the target domain near-infrared spectral data x_tj;

step 2, using the near-infrared spectral data x_s1 to x_sns in the original domain near-infrared spectrum data set to construct an original domain spectrum matrix X;

the original domain spectrum matrix X is centered to obtain an original domain spectrum centering matrix, specifically: calculate the mean of all data in each row of X, then subtract that row's mean from every entry of the row, giving the original domain spectrum centering matrix X_s;

using the near-infrared spectral data x_t1 to x_tnt in the target domain near-infrared spectrum data set to construct a target domain spectrum matrix S;

the target domain spectrum matrix S is centered to obtain a target domain spectrum centering matrix, specifically: calculate the mean of all data in each row of S, then subtract that row's mean from every entry of the row, giving the target domain spectrum centering matrix X_t;

using the concentration attribute values y_s1 to y_sns in the original domain near-infrared spectrum data set to construct an original domain concentration vector Y;
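The centering in step 2 can be sketched in a few lines. This is a minimal illustration that follows the row-wise wording of the text literally; the function name `center_rows` and the toy spectra are assumptions made for the example. If the intended operation is the per-waveband (column-wise) centering that is more usual in PLS, the same idea applies along the other axis.

```python
def center_rows(matrix):
    """Subtract each row's own mean from every entry of that row."""
    centered = []
    for row in matrix:
        row_mean = sum(row) / len(row)
        centered.append([value - row_mean for value in row])
    return centered


# Hypothetical toy spectra: 2 samples (rows) and 3 wavebands (columns).
X = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0]]
X_s = center_rows(X)
print(X_s)
```

The same function would be applied to the target domain spectrum matrix S to obtain X_t.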

step 3, designing a kernel function, and recording the kernel functions of the vector x and the vector y as k (x, y), wherein the k (x, y) is expressed by a formula (1):

in the formula (1), exp denotes the exponential function with the natural base e, ||x − y|| denotes the Euclidean distance between x and y, and d denotes the kernel parameter; the kernel matrix corresponding to any two matrices can be calculated using formula (1) and the conventional kernel matrix calculation method, and the kernel matrix so calculated for two matrices Q and D is denoted K(Q, D);
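A kernel matrix of the kind described in step 3 can be sketched as follows. Formula (1) itself is not reproduced in this text, so the Gaussian form exp(-||x - y||^2 / (2 d^2)) used below is an assumption: the text only states that the kernel involves exp, the Euclidean distance and a kernel parameter d. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, d):
    """Assumed Gaussian kernel: exp(-||x - y||^2 / (2 * d^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * d ** 2))


def kernel_matrix(Q, D, d):
    """K(Q, D): entry (i, j) is the kernel of row i of Q and row j of D."""
    return [[gaussian_kernel(q_row, d_row, d) for d_row in D] for q_row in Q]


# Hypothetical toy data: two rows in Q, one row in D, kernel parameter d = 1.
K = kernel_matrix([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0]], 1.0)
print(K)
```

With this convention K(Q, D) has one row per row of Q and one column per row of D, so K(X_s, X_s) is an ns by ns matrix.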

step 4, constructing a category label matrix with m + n rows and m + n columns, wherein m = nt and n = ns, and recording the category label matrix as L, wherein L is expressed by adopting a formula (2):

step 5, constructing a transfer matrix, denoted X_st, wherein X_st is expressed by formula (3):

X_st = [X_s → X_t]    (3)

wherein X_s → X_t denotes that the data of the matrices X_s and X_t are concatenated vertically, row by row;

step 6, constructing an intermediate matrix, denoted H, wherein H is expressed by formula (4):

where v is the number of rows of the transfer matrix X_st, I_{v×v} is the identity matrix with v rows and v columns, 1_v is a column vector of v elements that are all 1, the superscript T denotes the matrix transpose, and / denotes division;
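Steps 5 and 6 can be sketched together. Formula (4) is not reproduced in this text; the symbols defined for it (an identity-type matrix I, an all-ones vector 1_v, a transpose and a division by v) match the standard centering matrix H = I - (1_v 1_v^T)/v, which is what the sketch below assumes. The function names and toy matrices are hypothetical.

```python
def stack_rows(X_s, X_t):
    """Transfer matrix X_st: rows of X_s followed by rows of X_t."""
    return [list(row) for row in X_s] + [list(row) for row in X_t]


def centering_matrix(v):
    """Assumed form of H: identity minus the all-ones matrix divided by v."""
    return [[(1.0 if i == j else 0.0) - 1.0 / v for j in range(v)]
            for i in range(v)]


# Hypothetical toy matrices with 2 wavebands each.
X_s = [[0.1, 0.2], [0.3, 0.4]]
X_t = [[0.5, 0.6], [0.7, 0.8]]
X_st = stack_rows(X_s, X_t)
H = centering_matrix(len(X_st))  # v = ns + nt = 4
print(len(X_st), H[0])
```

Under this assumption every row of H sums to zero, which is the property that makes H remove the mean when it multiplies another matrix.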

step 7, setting the parameter optimization intervals of the parameters d, r and A, wherein d ∈ [10^-5, 10^-4, …, 10^4, 10^5], r ∈ [10^-5, 10^-4, …, 10^4, 10^5] and A ∈ [1, 2, …, 14, 15]; constructing parameter sets [d, r, A] by combining all values in the optimization intervals of d, r and A, to obtain the full collection of parameter sets [d, r, A];
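The grid of step 7 can be enumerated directly. The sketch assumes that the ellipses in the intervals stand for every integer power of ten between 10^-5 and 10^5 and for every integer from 1 to 15; under that assumption there are 11 × 11 × 15 = 1815 parameter sets.

```python
from itertools import product

# Assumed grids: integer powers of ten for d and r, integers 1..15 for A.
d_values = [10.0 ** exponent for exponent in range(-5, 6)]
r_values = [10.0 ** exponent for exponent in range(-5, 6)]
A_values = list(range(1, 16))

# Every combination [d, r, A] of the three grids.
parameter_sets = list(product(d_values, r_values, A_values))
print(len(parameter_sets))
```

Each tuple in `parameter_sets` is then processed independently in step 8.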

Step 8, setting a projection matrix W; for each of the parameter sets [d, r, A] obtained in step 7, calculating the corresponding projection matrix W by grid search, so that one projection matrix W is obtained per parameter set. The specific process is as follows:

a. judging whether A is equal to 1, and according to the judgment result, performing the following operations:

if A is equal to 1, the following steps are carried out:

a1-1, setting intermediate parameters KS_1, Y_1, KT_1 and B_1, and calculating them using formulas (5) to (8):

KS_1 = K(X_s, X_s)    (5)

Y_1 = Y    (6)

KT_1 = K(X_st, X_s)    (7)

In the formula (8), the superscript T denotes the matrix transpose, K(X_s, X_s) denotes the kernel matrix of the matrix X_s with itself, calculated using formula (1) and the conventional kernel matrix calculation method, and K(X_st, X_s) denotes the kernel matrix of the matrices X_st and X_s, calculated in the same way;

a1-2, setting an intermediate parameter w_1, and assigning the eigenvector corresponding to the largest eigenvalue of B_1 to w_1;

a1-3, taking w_1 as the projection matrix W calculated with the current parameter set [d, r, A];

if A is not equal to 1, the following steps are carried out:

a2-1, calculating the intermediate parameters KS_1, Y_1, KT_1 and B_1 according to step a1-1;

a2-2, obtaining the intermediate parameter w_1 by the method of step a1-2, and taking w_1 as the 1st-generation projection matrix, which completes the 1st-generation assignment of the projection matrix;

a2-3, setting intermediate parameters t1_1, t2_1, p1_1, p2_1 and c_1, and calculating them using formulas (9) to (13):

t1_1 = KS_1 w_1    (9)

t2_1 = KT_1 w_1    (10)

wherein the superscript -1 denotes matrix inversion and the superscript T denotes the matrix transpose;

a2-4, setting an algebraic variable i, initializing i, and enabling i to be equal to 2;

a2-5, carrying out ith generation assignment on the projection matrix, specifically:

S1, setting intermediate parameters KS_i, KT_i, Y_i and B_i, and calculating them using formulas (14) to (17):

Y_i = Y_{i-1} - c_i t1_{i-1}    (16)

S2, setting an intermediate parameter w_i, assigning the eigenvector corresponding to the largest eigenvalue of B_i to w_i, and taking w_i as the i-th-generation projection matrix, which completes the i-th-generation assignment of the projection matrix;

S3, setting intermediate parameters t1_i, t2_i, p1_i, p2_i and c_i, and calculating them using formulas (18) to (22):

t1_i = KS_i w_i    (18)

t2_i = KT_i w_i    (19)

S4, judging whether the value of i is equal to A or not, if not, adding 1 to the current value of i and updating the value of i, returning to the step S1 to carry out next generation assignment of the projection matrix, and if the value of i is equal to A, entering the step a2-6;

step a2-6, concatenating w_1 to w_A horizontally, column by column and in order, to form the projection matrix calculated with the current parameter set [d, r, A]: W = [w_1, …, w_A];
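The control flow of step 8 for one parameter set can be sketched as a loop that collects one direction per generation. Formulas (8) and (11) to (17) are not reproduced in this text, so the formula-dependent pieces are passed in as hypothetical callables (`direction` for the eigenvector step, `deflate` for the update of KS, KT, Y and B); only the looping and the column-wise assembly of W are illustrated.

```python
def build_projection_matrix(A, initial_state, direction, deflate):
    """Collect w_1 ... w_A and place them side by side as columns of W."""
    state = initial_state
    columns = []
    for generation in range(1, A + 1):
        w = direction(state)           # steps a1-2 / S2 (formula-dependent)
        columns.append(w)
        if generation < A:
            state = deflate(state, w)  # steps a2-3 and S1 (formula-dependent)
    # Row k of W holds the k-th entry of every w_i.
    return [[w[k] for w in columns] for k in range(len(columns[0]))]


# Dummy callables that only exercise the control flow.
W = build_projection_matrix(
    3,
    0,
    direction=lambda state: [state, state + 1],
    deflate=lambda state, w: state + 10,
)
print(W)
```

With A equal to 1 the loop runs once and W is the single column w_1, which matches the A = 1 branch of step 8.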

Step 9, setting intermediate variables T_s and T_t, and substituting each projection matrix W obtained in step 8 into formulas (23) and (24) to calculate the corresponding intermediate variables T_s and T_t:

T_s = K(X_s, X_s) W    (23)

T_t = K(X_t, X_s) W    (24)
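Formulas (23) and (24) are plain matrix products of a kernel matrix with the projection matrix W. The sketch below uses a hypothetical helper `matmul` and toy numbers; it only illustrates the shapes: an ns by ns kernel matrix times an ns by A projection matrix gives ns by A scores, and the kernel matrix in formula (24) gives one score row per row of that kernel matrix.

```python
def matmul(K, W):
    """Matrix product K W for matrices stored as lists of rows."""
    inner = len(W)
    return [[sum(K[i][k] * W[k][j] for k in range(inner))
             for j in range(len(W[0]))]
            for i in range(len(K))]


# Hypothetical 2 x 2 kernel matrix and 2 x 1 projection matrix (A = 1).
K_ss = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0], [1.0]]
T_s = matmul(K_ss, W)
print(T_s)
```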

Step 10, taking each intermediate variable T_s obtained in step 9 as the independent variable and Y as the dependent variable, to obtain the corresponding data matrices composed of independent and dependent variables.

Step 11, constructing the PLS regression models by 5-fold cross-validation, specifically: each data matrix composed of an intermediate variable T_s as the independent variable and Y as the dependent variable is randomly divided into 5 parts, and 4 randomly selected parts are used to construct a PLS regression model by cross-validation, giving one PLS regression model per data matrix;
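The random division into 5 parts in step 11 can be sketched as an index split. The function name, the fixed seed and the sample count are assumptions for the example; the fitting of the PLS model on the 4 selected parts is not shown.

```python
import random


def five_fold_indices(n_samples, seed=0):
    """Shuffle sample indices and deal them into 5 roughly equal parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::5] for k in range(5)]


# Hypothetical data set with 12 samples.
folds = five_fold_indices(12)
held_out = folds[0]                                   # the remaining 1 part
training = [i for fold in folds[1:] for i in fold]    # the 4 selected parts
print(len(training), len(held_out))
```

Using a seeded `random.Random` keeps the split reproducible across the many parameter sets of the grid search.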

step 12, taking the intermediate variables T_t obtained in step 9 as test data and, together with the independent variables in the remaining 1 part, using them as independent variables to test each PLS regression model 4 times in succession, to obtain for each PLS regression model the prediction result of the dependent variable Y and the prediction result of the dependent variable corresponding to X_t, wherein the first prediction result is derived from the independent variables in the remaining 1 part and the second prediction result is obtained with the intermediate variable T_t as the independent variable;

step 13, defining a PLS regression model optimization objective function, wherein the optimization objective function is expressed by an equation (25):

in the formula (25), Mean denotes taking the mean of a vector, Std denotes taking the standard deviation of a vector, and | | denotes taking the absolute value;

step 14, substituting the dependent variable Y and the two prediction results corresponding to each PLS regression model into formula (25) to calculate the value of f corresponding to each PLS regression model;

step 15, comparing all the values of f obtained in step 14, taking the parameter set [d, r, A] corresponding to the smallest f, calculating the corresponding projection matrix by the method of step 8, and taking this projection matrix as the optimal projection matrix, denoted W_op; setting the optimal independent variable, denoted T_sop, and calculating it as T_sop = K(X_s, X_s) W_op; and constructing the final PLS regression model with T_sop as the independent variable and Y as the dependent variable.
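The selection in step 15 is a minimum search over the values of f. The sketch assumes the grid search has produced a list of (f, [d, r, A]) pairs; the numbers below are invented placeholders, not results from the patent.

```python
def select_best(results):
    """Return the (f, (d, r, A)) pair with the smallest value of f."""
    return min(results, key=lambda item: item[0])


# Hypothetical (f, (d, r, A)) pairs, for illustration only.
results = [
    (0.42, (1e-2, 1.0, 3)),
    (0.17, (1e-1, 10.0, 5)),
    (0.30, (1.0, 1e-3, 2)),
]
best_f, (best_d, best_r, best_A) = select_best(results)
print(best_f, best_d, best_r, best_A)
```

The winning parameter set is then passed through step 8 once more to obtain W_op.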

Compared with the prior art, the invention has the following advantages: an original domain spectrum centering matrix is constructed from near-infrared spectral data acquired in the original domain and a target domain spectrum centering matrix is constructed from near-infrared spectral data acquired in the target domain, which eliminates the difference between the mean spectra of the two domains; based on the two centering matrices, the transfer matrix is then mapped into kernel matrix space to find the optimal projection directions and determine the optimal projection matrix, and the final PLS regression model is constructed on the optimal projection matrix, which weakens the dependence between the projection scores of the different domains and the domain labels.

Detailed Description

The present invention will be described in further detail with reference to examples.

The embodiment is as follows: a domain-adaptive PLS regression model modeling method comprises the following steps:

step 1, acquiring ns near-infrared spectrum samples from the original domain, wherein ns is an integer greater than or equal to 5, and using the ns samples to construct an original domain near-infrared spectrum data set {x_sq, y_sq | q = 1, 2, …, ns}, where x_sq is the near-infrared spectral data of the q-th sample obtained from the original domain and y_sq is the concentration attribute value of the q-th sample obtained from the original domain;

acquiring nt near-infrared spectrum samples from the target domain, wherein nt is an integer greater than or equal to 5, and using the nt samples to construct a target domain near-infrared spectrum data set {x_tj | j = 1, 2, …, nt}, where x_tj is the near-infrared spectral data of the j-th sample obtained from the target domain; x_sq and x_tj are each vectors of 1 row and p columns, and p is the number of wavebands of the spectroscopic instrument used to collect the original domain near-infrared spectral data x_sq and the target domain near-infrared spectral data x_tj;

using the near-infrared spectral data x_t1 to x_tnt in the target domain near-infrared spectrum data set to construct a target domain spectrum matrix S;

the target domain spectrum matrix S is centered to obtain a target domain spectrum centering matrix, specifically: calculate the mean of all data in each row of S, then subtract that row's mean from every entry of the row, giving the target domain spectrum centering matrix X_t;

using the concentration attribute values y_s1 to y_sns in the original domain near-infrared spectrum data set to construct an original domain concentration vector Y;

step 3, designing a kernel function, and marking the kernel functions of the vector x and the vector y as k (x, y), wherein the k (x, y) is expressed by a formula (1):

step 5, constructing a transfer matrix, denoted X_st, wherein X_st is expressed by formula (3):

X_st = [X_s → X_t]    (3)

where v is the number of rows of the transfer matrix X_st, I_{v×v} is the identity matrix with v rows and v columns, 1_v is a column vector of v elements that are all 1, the superscript T denotes the matrix transpose, and / denotes division;

Step 8, setting a projection matrix W; for each of the parameter sets [d, r, A] obtained in step 7, calculating the corresponding projection matrix W by grid search, so that one projection matrix W is obtained per parameter set. The specific process is as follows:

if A is equal to 1, the following steps are carried out:

KS_1 = K(X_s, X_s)    (5)

Y_1 = Y    (6)

KT_1 = K(X_st, X_s)    (7)

In the formula (8), the superscript T denotes the matrix transpose, K(X_s, X_s) denotes the kernel matrix of the matrix X_s with itself, calculated using formula (1) and the conventional kernel matrix calculation method, and K(X_st, X_s) denotes the kernel matrix of the matrices X_st and X_s, calculated in the same way;

a1-2, setting an intermediate parameter w_1, and assigning the eigenvector corresponding to the largest eigenvalue of B_1 to w_1;

if A is not equal to 1, the following steps are carried out:

a2-2, obtaining the intermediate parameter w_1 by the method of step a1-2, and taking w_1 as the 1st-generation projection matrix, which completes the 1st-generation assignment of the projection matrix;

t1_1 = KS_1 w_1    (9)

t2_1 = KT_1 w_1    (10)

S1, setting intermediate parameters KS_i, KT_i, Y_i and B_i, and calculating them using formulas (14) to (17):

Y_i = Y_{i-1} - c_i t1_{i-1}    (16)

S3, setting intermediate parameters t1_i, t2_i, p1_i, p2_i and c_i, and calculating them using formulas (18) to (22):

t1_i = KS_i w_i    (18)

t2_i = KT_i w_i    (19)

step a2-6, concatenating w_1 to w_A horizontally, column by column and in order, to form the projection matrix calculated with the current parameter set [d, r, A]: W = [w_1, …, w_A];

The intermediate variables T_s and T_t are given by formulas (23) and (24):

T_s = K(X_s, X_s) W    (23)

T_t = K(X_t, X_s) W    (24)

Step 10, taking each intermediate variable T_s obtained in step 9 as the independent variable and Y as the dependent variable, to obtain the corresponding data matrices;

the PLS regression models are constructed specifically as follows: each data matrix composed of an intermediate variable T_s as the independent variable and Y as the dependent variable is randomly divided into 5 parts, and 4 randomly selected parts are used to construct a PLS regression model by cross-validation; 5-fold cross-validation is one of the established methods for constructing a PLS regression model;

step 12, taking the intermediate variables T_t obtained in step 9 as test data and testing each PLS regression model 4 times, to obtain for each PLS regression model the prediction result of the dependent variable Y and the prediction result of the dependent variable corresponding to X_t, wherein the first prediction result is derived from the independent variables in the remaining 1 part and the second prediction result is obtained with the intermediate variable T_t as the independent variable;

in the formula (25), mean represents the Mean value of the vector, std represents the standard deviation of the vector, and | | represents the symbol of taking the absolute value;

step 14, substituting the dependent variable Y and the two prediction results corresponding to each PLS regression model into formula (25) to calculate the value of f corresponding to each PLS regression model;

step 15, comparing all the values of f obtained in step 14, taking the parameter set [d, r, A] corresponding to the smallest f, calculating the corresponding projection matrix by the method of step 8, and taking this projection matrix as the optimal projection matrix, denoted W_op; setting the optimal independent variable, denoted T_sop, and calculating it as T_sop = K(X_s, X_s) W_op; and constructing the final PLS regression model with T_sop as the independent variable and Y as the dependent variable.

When a test sample is predicted with the domain-adaptive PLS regression model, the near-infrared spectrum x_t of the test sample is acquired and treated as a one-row matrix; an independent variable T_t is calculated as T_t = K(x_t, X_s) W_op, and T_t is then substituted into the constructed PLS model as the independent variable to obtain the predicted value of the dependent variable corresponding to x_t.
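The projection of a new test spectrum can be sketched as one kernel row multiplied by W_op. As before, the Gaussian form of the kernel is an assumption because formula (1) is not reproduced in this text, and the function names, the toy X_s and the toy W_op are hypothetical. The final regression step on T_t is not shown.

```python
import math


def gaussian_kernel(x, y, d):
    """Assumed Gaussian kernel: exp(-||x - y||^2 / (2 * d^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * d ** 2))


def project_sample(x_t, X_s, W_op, d):
    """T_t = K(x_t, X_s) W_op for a single test spectrum x_t."""
    kernel_row = [gaussian_kernel(x_t, x_s, d) for x_s in X_s]  # 1 x ns
    n_components = len(W_op[0])
    return [sum(kernel_row[q] * W_op[q][a] for q in range(len(X_s)))
            for a in range(n_components)]


# Hypothetical toy values: 2 original-domain spectra, identity W_op, d = 1.
X_s = [[0.0, 0.0], [1.0, 0.0]]
W_op = [[1.0, 0.0], [0.0, 1.0]]
T_t = project_sample([0.0, 0.0], X_s, W_op, 1.0)
print(T_t)
```

The test spectrum is assumed to have been preprocessed in the same way as the spectra used to build X_s.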

Claims

1. A domain-adaptive PLS regression model modeling method is characterized by comprising the following steps:

acquiring nt near-infrared spectrum samples from the target domain, wherein nt is an integer greater than or equal to 5, and using the nt samples to construct a target domain near-infrared spectrum data set {x_tj | j = 1, 2, …, nt}, where x_tj is the near-infrared spectral data of the j-th sample obtained from the target domain; x_sq and x_tj are each vectors of 1 row and p columns, and p is the number of wavebands of the spectroscopic instrument used to collect the original domain near-infrared spectral data x_sq and the target domain near-infrared spectral data x_tj;

the target domain spectrum matrix S is centered to obtain a target domain spectrum centering matrix, specifically: calculate the mean of all data in each row of S, then subtract that row's mean from every entry of the row, giving the target domain spectrum centering matrix X_t;

step 4, constructing a category label matrix with m + n rows and m + n columns, wherein m = nt, n = ns, and recording the category label matrix as L, wherein L is expressed by adopting a formula (2):

X_st = [X_s → X_t]    (3)

wherein X_s → X_t denotes that the data of the matrices X_s and X_t are concatenated vertically, row by row;

where v is the number of rows of the transfer matrix X_st, I_{v×v} is the identity matrix with v rows and v columns, 1_v is a column vector of v elements that are all 1, the superscript T denotes the matrix transpose, and / denotes division;

step 7, setting the parameter optimization intervals of the parameters d, r and A, wherein d ∈ [10^-5, 10^-4, …, 10^4, 10^5], r ∈ [10^-5, 10^-4, …, 10^4, 10^5] and A ∈ [1, 2, …, 14, 15]; constructing parameter sets [d, r, A] by combining all values in the optimization intervals of d, r and A, to obtain the full collection of parameter sets [d, r, A];

for each of the parameter sets [d, r, A], calculating the corresponding projection matrix W by grid search, so that one projection matrix W is obtained per parameter set. The specific process is as follows:

if A is equal to 1, the following steps are carried out:

KS_1 = K(X_s, X_s)    (5)

Y_1 = Y    (6)

KT_1 = K(X_st, X_s)    (7)

a1-2, setting an intermediate parameter w_1, and assigning the eigenvector corresponding to the largest eigenvalue of B_1 to w_1;

if A is not equal to 1, the following steps are carried out:

t1_1 = KS_1 w_1    (9)

t2_1 = KT_1 w_1    (10)

Y_i = Y_{i-1} - c_i t1_{i-1}    (16)

S2, setting an intermediate parameter w_i, assigning the eigenvector corresponding to the largest eigenvalue of B_i to w_i, and taking w_i as the i-th-generation projection matrix, which completes the i-th-generation assignment of the projection matrix;

t1_i = KS_i w_i    (18)

t2_i = KT_i w_i    (19)

S4, judging whether the value of i is equal to A or not, if not, adopting the current value of i plus 1 and updating the value of i, returning to the step S1 for next generation assignment of the projection matrix, and if the value of i is equal to A, entering the step a2-6;

The intermediate variables T_s and T_t are given by formulas (23) and (24):

T_s = K(X_s, X_s) W    (23)

T_t = K(X_t, X_s) W    (24)

Step 10, taking each intermediate variable T_s obtained in step 9 as the independent variable and Y as the dependent variable, to obtain the corresponding data matrices composed of independent and dependent variables;

step 11, constructing the PLS regression models by 5-fold cross-validation, specifically: a PLS regression model is constructed from each data matrix composed of an intermediate variable T_s as the independent variable and Y as the dependent variable, giving one PLS regression model per data matrix;

step 12, taking the intermediate variables T_t obtained in step 9 as test data to obtain the prediction result of the dependent variable corresponding to X_t, wherein this prediction result is obtained with the intermediate variable T_t as the independent variable;

step 15, comparing all the values of f obtained in step 14, taking the parameter set [d, r, A] corresponding to the smallest f, calculating the corresponding projection matrix by the method of step 8, and taking this projection matrix as the optimal projection matrix, denoted W_op; setting the optimal independent variable, denoted T_sop, and calculating it as T_sop = K(X_s, X_s) W_op; and constructing the final PLS regression model with T_sop as the independent variable and Y as the dependent variable.