CN111220565A - CPLS-based infrared spectrum measuring instrument calibration migration method - Google Patents

Info

Publication number
CN111220565A
Authority
CN
China
Prior art keywords
center
matrix
spectrum
data set
domain data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010045812.3A
Other languages
Chinese (zh)
Other versions
CN111220565B (en)
Inventor
赵煜辉
刘晓东
李雪晶
芦鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University Qinhuangdao Branch filed Critical Northeastern University Qinhuangdao Branch
Priority to CN202010045812.3A priority Critical patent/CN111220565B/en
Publication of CN111220565A publication Critical patent/CN111220565A/en
Application granted granted Critical
Publication of CN111220565B publication Critical patent/CN111220565B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2201/00 Features of devices classified in G01N21/00
    • G01N2201/12 Circuits of general importance; Signal processing
    • G01N2201/127 Calibration; base line adjustment; drift compensation

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to the technical field of transfer learning in machine learning and provides a CPLS-based infrared spectrum measuring instrument calibration migration method. First, a source domain data set $\{X_m, Y\}$ and a target domain data set $\{X_s, Y\}$ are collected and centered to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$. Then, principal component analysis is performed on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm, and the same analysis is applied to the matrix $X_{s\_center}$. Next, the transfer matrices $M_{trans\_pre}$ and $M_{trans}$ are calculated. Finally, the substance concentration variables of the measured object are predicted. The invention can eliminate the random noise measured by the master instrument, improve the data utilization rate and the modeling precision, and reduce the time complexity.

Description

CPLS-based infrared spectrum measuring instrument calibration migration method
Technical Field
The invention relates to the technical field of transfer learning in machine learning, and in particular to a CPLS-based infrared spectrum measuring instrument calibration migration method.
Background
Near infrared spectroscopy (NIRS) analysis offers simple instrument operation, fast data analysis, low cost and no sample contamination, and is widely used in many fields. In production, however, calibration models built with near infrared spectroscopy can become invalid because of unstable measurement conditions and instrument hardware performance.
The main goal of transfer learning is to extract classification or regression knowledge from one or more tasks in the source domain and apply it to tasks in the target domain. If the knowledge of one task is successfully transferred to another, a model for the new task can be obtained without many new samples. Using knowledge learned in one or more source domains improves learning performance in the target domain and addresses problems such as missing target domain labels, high labeling cost and a time-consuming learning process.
Calibration migration refers to the migration of a multivariate calibration model between different measuring instruments or measurement states. It uses the linear relationship between spectral data from different sources to transform spectra measured on a new instrument or in a new state, so that the original model can be used directly to predict new samples. Migration can be applied across related fields rather than only within the same field, transferring useful information between domains. The original model therefore remains valid, or the original information speeds up modeling; resampling or remodeling the target domain with a large number of target domain samples is avoided, which improves model effectiveness, greatly reduces cost and accelerates modeling.
Existing calibration migration methods suffer from low prediction accuracy and limited applicability. Consider, for example, calibration migration based on partial least squares (PLS). PLS is one of the algorithms commonly used for data information extraction and process monitoring: it extracts the feature information with the maximum correlation between the process variables and the quality variables and partitions the process variables accordingly, projecting the process and quality variables into a principal component subspace and a residual subspace and thereby compressing the data. However, the PLS algorithm first extracts the principal components of the process variables and of the quality variables separately using principal component analysis, with no correlation between the two sets of principal components. By default it assumes that all process variables act on the quality variables and ignores the state information of internal variables. In many cases the process data lack excitation and there are many unmeasured process and quality disturbances; when the residual information of the quality variables changes, alarms fail and the PLS prediction output degrades. In fact, monitoring changes in the quality variable information is more important than monitoring the process variables. In addition, the optimization objective of the PLS model maximizes the correlation between the principal components of the process and quality variables but imposes no constraint on the residuals, so the residual variances of the process and quality variables are not guaranteed to be minimal, and a large amount of process and quality information may be left in the residuals. Moreover, near infrared spectrum modeling currently involves large data volumes, the time complexity of a serial partial least squares algorithm is high, and training and testing take a long time.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides the CPLS-based infrared spectrum measuring instrument calibration migration method, which can eliminate random noise measured by a main instrument, improve the data utilization rate and the modeling precision and reduce the time complexity.
The technical scheme of the invention is as follows:
a CPLS-based infrared spectrum measuring instrument calibration migration method is characterized by comprising the following steps:
Step 1: The infrared spectrum measurement master instrument corresponds to the source domain and the infrared spectrum measurement slave instrument corresponds to the target domain. The spectrum of each sample is collected with the master instrument and the slave instrument to obtain a master spectrum and a slave spectrum, respectively; spectral data are extracted from the master spectrum and the slave spectrum at intervals of $a$ nm within a wavelength range, and the substance concentration variable values of each sample are collected, giving the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data of the $i$-th sample in the master spectrum and the slave spectrum, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables;
Step 2: The source domain data set and the target domain data set are centered to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: Principal component analysis is performed on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: Based on the PLS algorithm, the calibration model $Y_{center} = X_{m\_center} B$ is established on the data set $\{X_{m\_center}, Y_{center}\}$; the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$ are calculated; a matrix $R$ is introduced such that $T = X_{m\_center} R$; and the number $l$ of latent variables is determined;
Step 3.2: The predictable substance concentration variable is calculated as
Figure BDA0002369344620000021
Singular value decomposition is performed on the predictable substance concentration variable to obtain
Figure BDA0002369344620000031
where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, which is orthogonal; $Q_c = V_c D_c^T$ contains the $l_c$ non-zero singular values in descending order and the corresponding right singular vectors;
From equation (2),
Figure BDA0002369344620000032
which gives
$R_c = R Q^T V_c D_c^{-1}$ (4)
Step 3.3: calculating an unpredictable substance concentration variable as
Figure BDA0002369344620000033
Extracting main components from unpredictable substance concentration variables to obtain lyThe main component number is
Figure BDA0002369344620000034
Wherein the content of the first and second substances,
Figure BDA0002369344620000035
is composed of
Figure BDA0002369344620000036
The output residual matrix of (3);
the matrix is obtained by equation (6)
Figure BDA0002369344620000037
Step 3.4: by spatially RcProjection of an input variable independent of the material concentration variable as
Figure BDA0002369344620000038
Wherein R isc *=(Rc TRc)-1Rc T
Subjecting the input variable independent of the concentration variable of the substance to principal component extraction to obtain lxThe main component number is
Figure BDA0002369344620000039
Wherein the content of the first and second substances,
Figure BDA00023693446200000310
is composed of
Figure BDA00023693446200000311
The input residual matrix of (3);
the matrix is obtained by equation (8)
Figure BDA00023693446200000312
Step 3.5: from step 3.1 to step 3.4, X is obtainedm_center、YcenterThe main components extracted by the PLS algorithm are respectively Xm_pre=TPT、Ypre=UQT,Xm_center、YcenterRespectively have a residual error of Xm_res_c=Xm_center-Xm_pre、Yres_c=Ycenter-YpreThat is to obtain
Figure BDA0002369344620000041
Figure BDA0002369344620000042
And 4, step 4: applying the same method as in step 3 to the matrix Xs_centerPerforming principal component analysis to obtain Xs_centerHas a residual error of Xs_res_c
And 5: computer masterAfter the spectrum extracts principal components through a PLS algorithm, the score T of the source domain data setm_pre=Xm_centerR, calculating the score T of the target domain data set after extracting the principal components from the spectrum by a PLS algorithms_pre=Xs_centerR, according to Tm_pre、Ts_preCalculating transfer matrix M based on least square methodtrans_pre(ii) a Calculating the score T of the data set of the source domain after extracting principal components from the residual error of the principal spectrumm=Xm_res_cP, calculating the score T of the target domain data set after extracting the principal component from the spectrum pair residual errors=Xs_res_cP, according to Tm、TsCalculating transfer matrix M based on least square methodtrans
Step 6: predicting the substance concentration variable of the measured object:
step 6.1: collecting the spectrum of the measured object from the instrument by infrared spectrometry, and extracting the spectrum data by the same method as step 1 to obtain J matrixes X formed by the spectrum data of the measured objects_test
Step 6.2: x pair based on CPLS algorithms_testPerforming principal component analysis to obtain Xs_testHas a residual error of Xs_res_c_test
Step 6.3: the matrix formed by predicting the material concentration variable of the measured object is Ytest_predict=(Xs_test*R*Mtrans_pre*PT+Xs_res_c_test*R*Mtrans*PT)*B。
Further, in step 1, the sample is grain, the spectral data is absorbance, and the substance concentration variables include moisture content, oil content, protein content, and starch content of the grain.
The invention has the beneficial effects that:
The invention first extracts principal components from the source domain data set and the target domain data set based on the CPLS algorithm, then extracts principal components once more from the residuals, and calculates the transfer matrices on the basis of these two extractions. This eliminates the random noise measured by the master instrument, improves the data utilization rate and the modeling precision, reduces the time complexity, and speeds up training and testing.
Drawings
Fig. 1 is a flow chart of the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS of the present invention.
Fig. 2 is a flow chart of the CPLS-based principal component analysis of the source domain data set in the calibration migration method of the CPLS-based infrared spectroscopic measuring instrument of the present invention.
Fig. 3 is a flow chart of solving a transfer matrix in the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS.
Fig. 4 is a flow chart of predicting the substance concentration variable of the measured object in the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS.
FIG. 5 is a graphical representation of cross validation error of oil on a corn data set as a function of principal component number in accordance with an embodiment.
FIG. 6 is a graph showing the fitting results of mp6spec to mp5spec in the embodiment.
FIG. 7 is a graph showing the fitting results of m5spec-mp5spec in the embodiment.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
The invention provides a CPLS-based infrared spectrum measuring instrument calibration migration method. In data processing, PLS extracts principal components from X and Y only once, but the residuals of X and Y usually still contain useful information; because this extraction is insufficient, the resulting model has a large error. The parallel partial least squares (CPLS) algorithm was therefore proposed: on the basis of PLS, it extracts principal components once more from the residuals, so that the model error is smaller and the fitted linear relationship is closer to the real situation. In practice, however, acquiring samples is expensive and time-consuming, so transfer learning is introduced on top of CPLS, and prediction on the target domain test set is accomplished by establishing a mapping between the standard sets of the source domain and the target domain.
The CPLS algorithm adopted by the invention is a further improvement on the PLS algorithm. Principal component analysis is applied separately to the process variable information that is unrelated to the quality variables and to the quality variable information that cannot be predicted, and the data are divided into 5 subspaces: the subspace of information relating process variables and quality variables (the related principal component subspace), the process variable principal component subspace, the process variable residual subspace, the quality variable principal component subspace and the quality variable residual subspace.
The CPLS model achieves three goals: (1) extracting scores directly related to predictable changes in the output from the standard PLS projection, and these score vectors constitute a co-variational subspace (CVS); (2) further projecting the unpredicted output changes to an Output Principal Subspace (OPS) and an Output Residual Subspace (ORS) to monitor these subspaces for abnormal changes; (3) input changes that are not related to the prediction output are further projected into an input principal component subspace (IPS) and an Input Residual Subspace (IRS) to monitor for abnormal changes in these subspaces.
The CPLS algorithm divides the process variable data into two main parts: information related to the quality variables and information unrelated to the quality variables. The quality variable data are likewise divided into two main parts: information that can be predicted from the process variables and information that cannot. The CPLS-based monitoring method thus provides a complete monitoring framework that covers the process variables, the quality variables and the remaining parts of the information.
As shown in fig. 1, the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS of the present invention includes the following steps:
Step 1: The infrared spectrum measurement master instrument corresponds to the source domain and the infrared spectrum measurement slave instrument corresponds to the target domain. The spectrum of each sample is collected with the master instrument and the slave instrument to obtain a master spectrum and a slave spectrum, respectively; spectral data are extracted from the master spectrum and the slave spectrum at intervals of $a$ nm within a wavelength range, and the substance concentration variable values of each sample are collected, giving the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data of the $i$-th sample in the master spectrum and the slave spectrum, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables.
In this embodiment, the samples are corn (a grain), the spectral data are absorbance values, and the substance concentration variables are the moisture, oil, protein and starch content of the corn. The corn data set consists of the data measured on the same $I = 80$ samples by three infrared spectrum measuring instruments, m5, mp5 and mp6. The spectra cover the wavelength range 1100-2498 nm at intervals of $a = 2$ nm, giving $J = 700$ spectral data points per sample. The master-slave spectrum pair of the first experiment is m5spec-mp6spec: the spectrum measured by m5 is taken as the master spectrum and the corresponding spectral data set as the initial source domain data set; because the spectrum measured by mp6 differs significantly from that measured by m5, it is selected as the slave spectrum and the corresponding spectral data set as the initial target domain data set. Five further experiments are then carried out in sequence on mp5spec-mp6spec, mp6spec-mp5spec, m5spec-mp5spec, mp5spec-m5spec and mp6spec-m5spec.
In this embodiment, the Kennard-Stone (KS) algorithm is used to split the corn data set. First, 20% of the data in the initial source domain data set and in the initial target domain data set, i.e. 16 samples each, are extracted as test samples; the test samples of the target domain are used to test the calibration migration model. The remaining 80% of the data in each initial data set, i.e. 64 samples each, serve as training samples. A reference model is established with the training samples of the source domain and used to predict the migrated samples of the target domain; a standard model of the target domain is established with the training samples of the target domain so that the performance of other migration models can be compared against it. Then, 20% of the data are extracted with the KS algorithm from the training samples of the source domain and of the target domain to form the standard sample sets of the source domain and the target domain, which are used as the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$ of the method of the invention, in order to establish the transfer relationship between source domain samples and target domain samples.
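The sample selection described above can be illustrated with a minimal NumPy sketch of Kennard-Stone selection in its commonly described form: start from the two samples farthest apart, then repeatedly add the sample whose distance to its nearest already-selected sample is largest. The function name `kennard_stone` and the toy data are assumptions made for illustration; the patent does not give its own implementation.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by Kennard-Stone selection."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_select <= n_samples:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n_samples) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Hypothetical one-dimensional toy data, only to show the call.
points = np.array([[0.0], [1.0], [2.0], [10.0], [5.0]])
selected = kennard_stone(points, 3)
```

Under this sketch, a 20% standard set would be drawn by calling `kennard_stone` on the training spectra with `n_select` set to about one fifth of the training sample count.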
Step 2: centralizing the source domain data set and the target domain data set, namely, averaging the data of each column, and then subtracting the average value of each column from the original data of each column to obtain a centralized source domain data set { Xm_center,YcenterAnd a target domain data set { X }s_center,YcenterAnd thus, deviation caused by large numerical difference can be effectively avoided.
Step 3: As shown in Fig. 2, principal component analysis is performed on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: Based on the PLS algorithm, the calibration model $Y_{center} = X_{m\_center} B$ is established on the data set $\{X_{m\_center}, Y_{center}\}$; the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$ are calculated; a matrix $R$ is introduced such that $T = X_{m\_center} R$; and the number $l$ of latent variables is determined;
Step 3.2: The predictable substance concentration variable is calculated as
Figure BDA0002369344620000071
Singular value decomposition (SVD) is performed on the predictable substance concentration variable to obtain
Figure BDA0002369344620000072
where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, which is orthogonal; $Q_c = V_c D_c^T$ contains the $l_c$ non-zero singular values in descending order and the corresponding right singular vectors;
From equation (2),
Figure BDA0002369344620000073
which gives
$R_c = R Q^T V_c D_c^{-1}$ (4)
Step 3.3: calculating an unpredictable substance concentration variable as
Figure BDA0002369344620000074
Principal component extraction (PCA) is performed on unpredictable substance concentration variables to obtain lyThe main component number is
Figure BDA0002369344620000075
Wherein the content of the first and second substances,
Figure BDA0002369344620000076
is composed of
Figure BDA0002369344620000077
The output residual matrix of (3);
the matrix is obtained by equation (6)
Figure BDA0002369344620000078
Step 3.4: by spatially RcProjection of an input variable independent of the material concentration variable as
Figure BDA0002369344620000081
Wherein R isc *=(Rc TRc)-1Rc T
Subjecting the input variable independent of the concentration variable of the substance to principal component extraction to obtain lxThe main component number is
Figure BDA0002369344620000082
Wherein the content of the first and second substances,
Figure BDA0002369344620000083
is composed of
Figure BDA0002369344620000084
The input residual matrix of (3);
the matrix is obtained by equation (8)
Figure BDA0002369344620000085
Step 3.5: from step 3.1 to step 3.4, X is obtainedm_center、YcenterThe main components extracted by the PLS algorithm are respectively Xm_pre=TPT、Ypre=UQT,Xm_center、YcenterRespectively have a residual error of Xm_res_c=Xm_center-Xm_pre、Yres_c=Ycenter-YpreThat is to obtain
Figure BDA0002369344620000086
Figure BDA0002369344620000087
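The equation images of step 3 are not reproduced in this text. For orientation only, the following LaTeX sketch writes out the decomposition as it is usually given for CPLS in the literature, with $X$ standing for $X_{m\_center}$ and $Y$ for $Y_{center}$. Only $R$, $Q$, $U_c$, $D_c$, $V_c$, $Q_c$, $R_c$ and $R_c^{*}$ appear in the text above; the symbols $\hat{Y}$, $\tilde{Y}_c$, $\tilde{X}_c$, $T_y$, $P_y$, $T_x$, $P_x$, $\tilde{Y}$ and $\tilde{X}$ are assumed notation, and the patent's own equations and numbering may differ.

```latex
\begin{aligned}
\hat{Y} &= X R Q^{T}
  && \text{predictable part of } Y \\
\hat{Y} &= U_c D_c V_c^{T} = U_c Q_c^{T}, \quad Q_c = V_c D_c^{T}
  && \text{singular value decomposition} \\
U_c &= \hat{Y} V_c D_c^{-1} = X R Q^{T} V_c D_c^{-1} = X R_c
  && \text{hence } R_c = R Q^{T} V_c D_c^{-1} \\
\tilde{Y}_c &= Y - U_c Q_c^{T} = T_y P_y^{T} + \tilde{Y}
  && \text{unpredictable part, PCA with } l_y \text{ components} \\
\tilde{X}_c &= X - U_c R_c^{*} = T_x P_x^{T} + \tilde{X}, \quad R_c^{*} = (R_c^{T} R_c)^{-1} R_c^{T}
  && \text{output-irrelevant part, PCA with } l_x \text{ components} \\
X &= U_c R_c^{*} + T_x P_x^{T} + \tilde{X} \\
Y &= U_c Q_c^{T} + T_y P_y^{T} + \tilde{Y}
\end{aligned}
```

The third line is consistent with equation (4) above: since $V_c$ has orthonormal columns, $\hat{Y} V_c = U_c D_c$, and right-multiplying by $D_c^{-1}$ isolates $U_c$.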
From the CPLS algorithm flow it can be seen that $X_{m\_center}$ and $Y_{center}$ are each divided into three parts: the principal components extracted by the PLS algorithm, the principal components extracted from the residual, and the unpredictable error. Compared with the PLS algorithm, CPLS adds the step of extracting principal components from the residual, which improves the data utilization rate.
Step 4: Principal component analysis is performed on the matrix $X_{s\_center}$ by the same method as in step 3 to obtain the residual $X_{s\_res\_c}$ of $X_{s\_center}$.
In this embodiment, the optimal number of principal components for the PLS algorithm is selected as follows. The number of principal components is chosen by 10-fold cross validation. Taking oil as an example, Fig. 5 shows how the cross-validation error of the oil content model on the target domain training set of the corn data set changes with the number of principal components. As can be seen from Fig. 5, the cross-validation error for oil reaches its global minimum at 12 principal components, so the optimal number of principal components for oil is set to 12. The optimal number of principal components for the other three components is selected in the same way.
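The selection procedure above can be sketched as follows. This is a hedged illustration, not the patent's implementation: it uses a single-response PLS1 fitted by the NIPALS iteration as a stand-in for the PLS model, synthetic data instead of the corn spectra, and the hypothetical helper names `pls1_coefficients` and `cv_rmse_by_components`.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Fit PLS1 by NIPALS on already-centered X, y; return the regression vector."""
    X = X.copy()
    y = y.copy()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    P = np.zeros((n_features, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # score vector
        tt = t @ t
        p = X.T @ t / tt                # X loading
        q[a] = y @ t / tt               # y loading
        X -= np.outer(t, p)             # deflate X
        y -= q[a] * t                   # deflate y
        W[:, a], P[:, a] = w, p
    return W @ np.linalg.solve(P.T @ W, q)

def cv_rmse_by_components(X, y, max_components, n_folds=10, seed=0):
    """Cross-validated RMSE for 1..max_components latent variables."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for n_comp in range(1, max_components + 1):
        squared_error = 0.0
        for test_idx in folds:
            train = np.setdiff1d(np.arange(len(y)), test_idx)
            # Center with training-fold means only.
            x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
            b = pls1_coefficients(X[train] - x_mean, y[train] - y_mean, n_comp)
            pred = (X[test_idx] - x_mean) @ b + y_mean
            squared_error += np.sum((y[test_idx] - pred) ** 2)
        errors.append(np.sqrt(squared_error / len(y)))
    return np.array(errors)

# Synthetic stand-in data (not the corn data set).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6)) * np.array([3.0, 2.0, 1.0, 1.0, 0.5, 0.5])
y = X @ np.array([1.0, -1.0, 2.0, 0.5, -2.0, 1.0]) + 0.01 * rng.normal(size=40)
errors = cv_rmse_by_components(X, y, max_components=6)
best_n_components = int(np.argmin(errors)) + 1
```

Plotting `errors` against the component count would give a curve of the kind described for Fig. 5, and `best_n_components` corresponds to the position of its minimum.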
Step 5: As shown in Fig. 3, transfer matrices that map the latent structure of the target domain to the latent structure of the source domain are established by the least squares method. The score $T_{m\_pre} = X_{m\_center} R$ of the source domain data set, after principal components are extracted from the master spectrum by the PLS algorithm, and the score $T_{s\_pre} = X_{s\_center} R$ of the target domain data set, after principal components are extracted from the slave spectrum by the PLS algorithm, are calculated, and the transfer matrix $M_{trans\_pre}$ is calculated from $T_{m\_pre}$ and $T_{s\_pre}$ by the least squares method. The score $T_m = X_{m\_res\_c} P$ of the source domain data set, after principal components are extracted from the residual of the master spectrum, and the score $T_s = X_{s\_res\_c} P$ of the target domain data set, after principal components are extracted from the residual of the slave spectrum, are calculated, and the transfer matrix $M_{trans}$ is calculated from $T_m$ and $T_s$ by the least squares method.
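A least-squares transfer matrix of this kind can be sketched with `numpy.linalg.lstsq`. The direction of the mapping is an assumption inferred from the prediction formula in step 6.3, where the slave scores are right-multiplied by the transfer matrix: the sketch solves for $M$ minimizing $\|T_s M - T_m\|_F$. The function name `transfer_matrix` and the synthetic scores are hypothetical.

```python
import numpy as np

def transfer_matrix(T_s, T_m):
    """Least-squares M minimizing ||T_s @ M - T_m||_F (slave scores -> master scores)."""
    M, *_ = np.linalg.lstsq(T_s, T_m, rcond=None)
    return M

# Synthetic scores: 13 standard samples, 3 latent variables (illustrative sizes).
rng = np.random.default_rng(0)
T_s_pre = rng.normal(size=(13, 3))
M_true = np.array([[1.1, 0.0, 0.2],
                   [0.0, 0.9, 0.0],
                   [0.1, 0.0, 1.0]])
T_m_pre = T_s_pre @ M_true          # noise-free master scores for the demo
M_trans_pre = transfer_matrix(T_s_pre, T_m_pre)
```

The same call applied to the residual scores $T_s$ and $T_m$ would give $M_{trans}$ under the same assumption.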
Step 6: As shown in Fig. 4, the substance concentration variables of the measured object are predicted:
Step 6.1: The spectrum of the measured object is collected with the infrared spectrum measurement slave instrument, and spectral data are extracted by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the $J$ spectral data of the measured object;
Step 6.2: Principal component analysis is performed on $X_{s\_test}$ based on the CPLS algorithm to obtain the residual $X_{s\_res\_c\_test}$ of $X_{s\_test}$;
Step 6.3: The matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
In this embodiment, the model is used to predict the data, and the prediction errors (RMSEP) for the different master-slave instrument combinations in the corn data set are shown in Table 1 below:
TABLE 1
Figure BDA0002369344620000091
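For reference, RMSEP is commonly computed as the root mean squared difference between reference and predicted values over the prediction samples. A minimal sketch with one value per concentration variable follows; the function name `rmsep` and the toy numbers are illustrative and are not taken from Table 1.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction, one value per column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

# Toy reference and predicted values for two concentration variables.
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.0, 4.0], [3.0, 0.0]])
errors_per_variable = rmsep(y_true, y_pred)
```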
Analysis of Table 1 shows that, in general, the method of the invention performs better between the spectra mp5spec and mp6spec than in the other two groups. This is because mp5spec and mp6spec are highly similar to each other, while both differ considerably from m5spec, so transfer learning between mp5spec and mp6spec is more meaningful and gives smaller errors. With mp6spec as the master spectrum and mp5spec as the slave spectrum, the prediction errors for water, oil, protein and starch are essentially the smallest of the six experiments, whereas the errors for migration between m5spec and mp5spec or mp6spec are the largest of the six.
Figs. 6 and 7 show the fitting results of mp6spec-mp5spec and m5spec-mp5spec in this embodiment. Comparing Fig. 6 with Fig. 7 makes the difference in fitting quality clear. For transfer learning between mp6spec and mp5spec, the two spectra are more similar and the fit is better: most points fall on or near the fitted line. For m5spec and mp5spec, all points fall below the fitted line. The transfer learning effect between mp6spec and mp5spec is therefore clearly better, and migration between m5spec and mp5spec is not worthwhile because the prediction is poor.
Since the spectrum pair mp6spec-mp5spec has the best migration effect, this pair is chosen for comparison with other algorithms, namely multiplicative scatter correction (MSC), canonical correlation analysis (CCA), slope and bias correction (SBC) and piecewise direct standardization (PDS). Table 2 shows the RMSEP comparison of the algorithms for mp6spec-m5spec in the corn data set. As can be seen from Table 2, the CPLS-based infrared spectrum measuring instrument calibration migration method of the invention migrates well overall: it is far superior to the MSC, CCA and PDS algorithms in predicting all four components; compared with the SBC algorithm, it predicts water and oil better and differs little in predicting protein and starch.
TABLE 2
Figure BDA0002369344620000101
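The RMSEP values compared in Table 2 can be illustrated with a minimal sketch. The text does not spell out the formula, so the usual definition is assumed here: the square root of the mean squared difference between reference and predicted values over the prediction set, computed separately for each component. The function name `rmsep` and the column-per-component layout are illustrative assumptions, not part of the patent.

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction, one value per column.

    Assumed layout: rows are prediction-set samples, columns are
    components (for example water, oil, protein, starch).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Average the squared errors over samples, then take the root.
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
```

A lower RMSEP for a component means the migrated model predicts that component more closely on the slave instrument.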
In summary, six groups of experiments on the corn data set, compared against the MSC, CCA, SBC and PDS algorithms, show that the CPLS algorithm combined with transfer learning predicts about as well as the SBC algorithm and far better than the MSC, CCA and PDS algorithms. The method therefore eliminates the random noise measured by the master instrument and improves data utilization and modeling accuracy.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present invention. The above examples serve only to explain the present invention and do not limit its scope of protection. All other embodiments that those skilled in the art can derive from the above-described embodiments without creative effort, namely all modifications, equivalents and improvements made within the spirit and principle of the present application, fall within the scope of protection claimed by the present invention.

Claims (2)

1. A CPLS-based infrared spectrum measuring instrument calibration migration method, characterized by comprising the following steps:
Step 1: let the infrared spectrum measurement master instrument correspond to the source domain and the infrared spectrum measurement slave instrument correspond to the target domain; collect the spectrum of each sample with the master instrument and with the slave instrument to obtain a master spectrum and a slave spectrum respectively; extract spectral data from the master spectrum and the slave spectrum at intervals of $a$ nm within a wavelength range; collect the substance concentration variable values of each sample; and obtain a source domain data set $\{X_m, Y\}$ and a target domain data set $\{X_s, Y\}$;
wherein $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data points of the $i$-th sample in the master spectrum and the slave spectrum respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables;
Step 2: center the source domain data set and the target domain data set to obtain a centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and a centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: based on the PLS algorithm, establish a calibration model $Y_{center} = X_{m\_center} B$ for the data set $\{X_{m\_center}, Y_{center}\}$, and calculate the coefficient matrix $B$, the score matrix $T$ of $X_{m\_center}$, the loading matrix $P$ of $X_{m\_center}$, the score matrix $U$ of $Y_{center}$ and the loading matrix $Q$ of $Y_{center}$; introduce a matrix $R$ such that $T = X_{m\_center} R$, and determine the number $l$ of latent variables;
step 3.2: calculating a predictable substance concentration variable of
Figure FDA0002369344610000011
Performing singular value decomposition on predictable substance concentration variables to obtain
Figure FDA0002369344610000012
wherein $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, $V_c$ is the right singular matrix, and $V_c$ is an orthogonal matrix; $Q_c = V_c D_c^T$ contains the $l_c$ non-zero singular values in descending order and the corresponding right singular vectors;
obtained by the formula (2)
Figure FDA0002369344610000013
To obtain
$R_c = R Q^T V_c D_c^{-1}$ (4)
Step 3.3: calculating an unpredictable substance concentration variable as
Figure FDA0002369344610000021
Performing principal component extraction on the unpredictable substance concentration variable, with $l_y$ principal components, gives
Figure FDA0002369344610000022
wherein
Figure FDA0002369344610000023
is the output residual matrix of
Figure FDA0002369344610000024
the matrix is obtained by equation (6)
Figure FDA0002369344610000025
Step 3.4: by projection with respect to the space of $R_c$, the input variable unrelated to the substance concentration variable is obtained as
Figure FDA0002369344610000026
wherein $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$
Performing principal component extraction on the input variable unrelated to the substance concentration variable, with $l_x$ principal components, gives
Figure FDA0002369344610000027
wherein
Figure FDA0002369344610000028
is the input residual matrix of
Figure FDA0002369344610000029
the matrix is obtained by equation (8)
Figure FDA00023693446100000210
Step 3.5: from step 3.1 to step 3.4, the principal components of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = T P^T$ and $Y_{pre} = U Q^T$ respectively, and the residuals of $X_{m\_center}$ and $Y_{center}$ are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$ respectively, that is,
Figure FDA00023693446100000211
Figure FDA00023693446100000212
Step 4: apply the same method as in step 3 to the matrix $X_{s\_center}$ to perform principal component analysis, obtaining the residual $X_{s\_res\_c}$ of $X_{s\_center}$;
Step 5: calculate the score $T_{m\_pre} = X_{m\_center} R$ of the source domain data set after principal components are extracted from the master spectrum by the PLS algorithm, and the score $T_{s\_pre} = X_{s\_center} R$ of the target domain data set after principal components are extracted from the slave spectrum by the PLS algorithm; from $T_{m\_pre}$ and $T_{s\_pre}$, calculate the transfer matrix $M_{trans\_pre}$ by the least squares method; calculate the score $T_m = X_{m\_res\_c} P$ of the source domain data set after principal components are extracted from the residual of the master spectrum, and the score $T_s = X_{s\_res\_c} P$ of the target domain data set after principal components are extracted from the residual of the slave spectrum; from $T_m$ and $T_s$, calculate the transfer matrix $M_{trans}$ by the least squares method;
Step 6: predict the substance concentration variables of the measured object:
Step 6.1: collect the spectrum of the measured object with the infrared spectrum measurement slave instrument, and extract spectral data by the same method as in step 1 to obtain a matrix $X_{s\_test}$ formed by the $J$ spectral data points of the measured object;
Step 6.2: perform principal component analysis on $X_{s\_test}$ based on the CPLS algorithm to obtain the residual $X_{s\_res\_c\_test}$ of $X_{s\_test}$;
Step 6.3: the matrix formed by the predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
2. The CPLS-based infrared spectrum measuring instrument calibration migration method according to claim 1, wherein in step 1 the samples are grain, the spectral data are absorbance values, and the substance concentration variables comprise the moisture content, oil content, protein content and starch content of the grain.
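The linear algebra of steps 2, 5 and 6.3 of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. The helper names (`center`, `pls_fit`, `residual`, `fit_transfer`, `predict`) are hypothetical. A minimal PLS2 routine stands in for step 3.1. The residual follows step 3.5 ($X - T P^T$ with $T = X R$) and reuses the master model's $R$ and $P$ for the slave and test spectra. Each transfer matrix is assumed to map slave scores onto master scores. Step 6.3 as printed multiplies the residual term by $R$; because $P^T R = I$, that product vanishes under this residual definition, so the sketch uses $P$ there to match step 5. Test spectra are assumed to be centered already, and the returned prediction is in centered units.

```python
import numpy as np

def center(A):
    """Step 2: subtract the column means."""
    return A - A.mean(axis=0)

def pls_fit(X, Y, n_components):
    """Minimal PLS2 on centered data (assumed stand-in for step 3.1).

    Returns R, P, B with T = X @ R and Y approximated by X @ B.
    """
    Xk, Yk = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of Xk^T Yk.
        w = np.linalg.svd(Xk.T @ Yk, full_matrices=False)[0][:, 0]
        t = Xk @ w
        p = Xk.T @ t / (t @ t)
        q = Yk.T @ t / (t @ t)
        # Deflate both blocks before extracting the next component.
        Xk = Xk - np.outer(t, p)
        Yk = Yk - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    R = W @ np.linalg.inv(P.T @ W)
    return R, P, R @ Q.T

def residual(X, R, P):
    """Step 3.5: X_res = X - T P^T with T = X R."""
    return X - X @ R @ P.T

def fit_transfer(Xm_c, Xs_c, R, P):
    """Step 5: least-squares transfer matrices (slave scores -> master scores)."""
    M_pre = np.linalg.lstsq(Xs_c @ R, Xm_c @ R, rcond=None)[0]
    Tm = residual(Xm_c, R, P) @ P
    Ts = residual(Xs_c, R, P) @ P
    M_res = np.linalg.lstsq(Ts, Tm, rcond=None)[0]
    return M_pre, M_res

def predict(Xs_test_c, R, P, B, M_pre, M_res):
    """Step 6.3, with P (not R) applied to the residual term; see the note above."""
    main = Xs_test_c @ R @ M_pre @ P.T
    res = residual(Xs_test_c, R, P) @ P @ M_res @ P.T
    return (main + res) @ B
```

In use, the master model would be fitted once with `pls_fit(center(Xm), center(Y), l)`, the two transfer matrices estimated from the paired master and slave spectra, and `predict` then applied to centered slave spectra of new samples.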
CN202010045812.3A 2020-01-16 2020-01-16 CPLS-based infrared spectrum measuring instrument calibration migration method Active CN111220565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010045812.3A CN111220565B (en) 2020-01-16 2020-01-16 CPLS-based infrared spectrum measuring instrument calibration migration method

Publications (2)

Publication Number Publication Date
CN111220565A true CN111220565A (en) 2020-06-02
CN111220565B CN111220565B (en) 2022-07-29

Family

ID=70827000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010045812.3A Active CN111220565B (en) 2020-01-16 2020-01-16 CPLS-based infrared spectrum measuring instrument calibration migration method

Country Status (1)

Country Link
CN (1) CN111220565B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113959979A (en) * 2021-10-29 2022-01-21 燕山大学 Near infrared spectrum model migration method based on deep Bi-LSTM network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5606164A (en) * 1996-01-16 1997-02-25 Boehringer Mannheim Corporation Method and apparatus for biological fluid analyte concentration measurement using generalized distance outlier detection
US20020183600A1 (en) * 2000-03-31 2002-12-05 Roumiana Tsenkova Method and apparatus for detecting mastitis by using visual light and/or near infrared lights
US20040033618A1 (en) * 1998-10-13 2004-02-19 Haass Michael J. Accommodating subject and instrument variations in spectroscopic determinations
CN106596450A (en) * 2017-01-06 2017-04-26 东北大学秦皇岛分校 Incremental method for analysis of material component content based on infrared spectroscopy
CN106680238A (en) * 2017-01-06 2017-05-17 东北大学秦皇岛分校 Method for analyzing material composition content on basis of infrared spectroscopy
CN107064054A (en) * 2017-02-28 2017-08-18 浙江大学 A kind of near-infrared spectral analytical method based on CC PLS RBFNN Optimized models
CN108152239A (en) * 2017-12-13 2018-06-12 东北大学秦皇岛分校 The sample composition content assaying method of feature based migration
CN108960329A (en) * 2018-07-06 2018-12-07 浙江科技学院 A kind of chemical process fault detection method comprising missing data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LI J: "Qualitative analysis of maize haploid kernels based on calibration transfer by near-infrared spectroscopy", 《ANALYTICAL LETTERS》 *
ZIMMERMAN N: "A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring", 《ATMOSPHERIC MEASUREMENT TECHNIQUES》 *
刘翠玲: "迁移学习在食用油光谱模型转移中的应用", 《食品科学技术学报》 *
吴静珠等: "基于Si-cPLS的小麦种子发芽率近红外模型优化研究", 《光谱学与光谱分析》 *
赵煜辉: "基于校正分布差异的标定迁移方法研究", 《东北大学学报(自然科学版)》 *
赵煜辉: "平均分布差异最小化的NIR标定迁移方法研究", 《光谱学与光谱分析》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113959979A (en) * 2021-10-29 2022-01-21 燕山大学 Near infrared spectrum model migration method based on deep Bi-LSTM network
CN113959979B (en) * 2021-10-29 2022-07-29 燕山大学 Near infrared spectrum model migration method based on deep Bi-LSTM network

Also Published As

Publication number Publication date
CN111220565B (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant