CN111220565B - CPLS-based infrared spectrum measuring instrument calibration migration method - Google Patents


Info

Publication number
CN111220565B
Authority
CN
China
Prior art keywords
center
matrix
data set
spectrum
domain data
Legal status
Active
Application number
CN202010045812.3A
Other languages
Chinese (zh)
Other versions
CN111220565A (en
Inventor
赵煜辉
刘晓东
李雪晶
芦鹏程
Current Assignee
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Application filed by Northeastern University Qinhuangdao Branch
Priority to CN202010045812.3A
Publication of CN111220565A
Application granted
Publication of CN111220565B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G01N2201/00 Features of devices classified in G01N21/00
    • G01N2201/12 Circuits of general importance; Signal processing
    • G01N2201/127 Calibration; base line adjustment; drift compensation

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention relates to the technical field of transfer learning in machine learning and provides a CPLS-based calibration migration method for infrared spectrum measuring instruments. First, a source domain data set $\{X_m, Y\}$ and a target domain data set $\{X_s, Y\}$ are collected and centered, giving the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$. Next, principal component analysis based on the CPLS algorithm is performed on the matrices $X_{m\_center}$ and $Y_{center}$, and the same analysis is applied to $X_{s\_center}$. The transfer matrices $M_{trans\_pre}$ and $M_{trans}$ are then calculated. Finally, the substance concentration variables of the measured object are predicted. The invention can eliminate random noise in the measurements of the master instrument, improve data utilization and modeling precision, and reduce time complexity.

Description

CPLS-based infrared spectrum measuring instrument calibration migration method
Technical Field
The invention relates to the technical field of transfer learning in machine learning, and in particular to a CPLS-based calibration migration method for infrared spectrum measuring instruments.
Background
Near infrared spectroscopy (NIRS) analysis offers simple instrument operation, fast data analysis, low cost, and no contamination of the sample, and is widely applied in many fields. In production, however, a calibration model built with NIRS can become invalid because measurement conditions are unstable and instrument hardware performance varies.
The main goal of transfer learning is to extract classification or regression knowledge from one or more source-domain tasks and apply it to a target-domain task. If the knowledge of one task is successfully transferred to another, a model for the new task can be obtained without many new samples. Using knowledge learned in one or more source domains improves learning performance in the target domain and addresses problems such as missing target-domain labels, high labeling cost, and a time-consuming learning process.
Calibration migration (calibration transfer) refers to transferring a multivariate calibration model between different measuring instruments or measurement states. It uses the linear relationship between spectral data from different sources to convert spectra measured on a new instrument, or in a new state, so that the original model can predict the new samples directly. Because migration works across related domains rather than within a single domain, it carries useful information across domains and converts between them. The original model therefore stays valid, or its information speeds up new modeling, and there is no need to collect a large number of target-domain samples and re-model the target domain. This improves model effectiveness, greatly reduces cost, and speeds up modeling.
Existing calibration migration methods suffer from low prediction precision and limited applicability. Consider a PLS-based calibration migration method. Partial least squares (PLS) is commonly used for data information extraction and process monitoring: it extracts the feature information with maximum correlation between the process variables and the quality variables and decomposes both into a principal component subspace and a residual subspace, thereby compressing and extracting the data. However, the PLS algorithm first extracts the principal components of the process variables and of the quality variables separately, and the two sets of principal components are not correlated. It assumes by default that all process variables act on the quality variables and ignores the state information of internal variables. In many cases, because the process data lack excitation, there are many unmeasured process and quality disturbances; when the residual information of the quality variables changes, alarms fail and the PLS prediction output deteriorates. In practice, monitoring changes in quality-variable information is more important than monitoring the process variables. In addition, building a PLS model maximizes the correlation between the principal components of the process and quality variables with no constraint on the residuals, so the residual variance of the process and quality variables cannot be guaranteed to be minimal, and a large amount of information may be left in the residuals. Finally, near infrared modeling currently involves large data volumes, and the serial partial least squares algorithm has high time complexity, so training and testing take a long time.
Disclosure of Invention
To address these problems in the prior art, the invention provides a CPLS-based calibration migration method for infrared spectrum measuring instruments that can eliminate random noise in the measurements of the master instrument, improve data utilization and modeling precision, and reduce time complexity.
The technical scheme of the invention is as follows:
A CPLS-based infrared spectrum measuring instrument calibration migration method, characterized by comprising the following steps:
Step 1: The infrared spectrum measurement master instrument corresponds to the source domain and the infrared spectrum measurement slave instrument corresponds to the target domain. The spectrum of each sample is collected with both instruments to obtain a master spectrum and a slave spectrum; spectral data are extracted from the master and slave spectra at intervals of a nm within a wavelength range, and the substance concentration variable values of each sample are collected, giving the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the j-th spectral data values of the i-th sample in the master and slave spectra, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, I is the total number of samples, and J is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, where $y_{ik}$ is the value of the k-th substance concentration variable of the i-th sample, $k = 1, 2, \ldots, K$, and K is the total number of substance concentration variables;
Step 2: Center the source domain data set and the target domain data set to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: Perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: Based on the PLS algorithm, establish the calibration model $Y_{center} = X_{m\_center} B$ on the data set $\{X_{m\_center}, Y_{center}\}$ and compute the coefficient matrix B, the score matrix T and load matrix P of $X_{m\_center}$, and the score matrix U and load matrix Q of $Y_{center}$; introduce a matrix R such that $T = X_{m\_center} R$, and determine the number of latent variables l;
Step 3.2: Compute the predictable substance concentration variables as
$$\hat{Y}_c = X_{m\_center} R Q^T = T Q^T \quad (1)$$
Perform singular value decomposition on the predictable substance concentration variables to obtain
$$\hat{Y}_c = U_c D_c V_c^T \quad (2)$$
where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, which is orthogonal; $Q_c = V_c D_c^T$ contains the $l_c$ non-zero singular values in descending order and the corresponding right singular vectors;
From formula (2),
$$U_c = X_{m\_center} R Q^T V_c D_c^{-1} \quad (3)$$
which gives
$$R_c = R Q^T V_c D_c^{-1} \quad (4)$$
Step 3.3: Compute the unpredictable substance concentration variables as
$$\tilde{Y}_c = Y_{center} - U_c Q_c^T \quad (5)$$
Extract $l_y$ principal components from the unpredictable substance concentration variables:
$$\tilde{Y}_c = T_y P_y^T + \tilde{Y} \quad (6)$$
where $\tilde{Y}$ is the output residual matrix of $\tilde{Y}_c$;
From equation (6), the residual matrix $\tilde{Y} = \tilde{Y}_c - T_y P_y^T$ is obtained.
Step 3.4: By projection with respect to $R_c$, obtain the input variables unrelated to the substance concentration variables:
$$\tilde{X}_c = X_{m\_center} - U_c R_c^{*} \quad (7)$$
where $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$.
Extract $l_x$ principal components from the input variables unrelated to the substance concentration variables:
$$\tilde{X}_c = T_x P_x^T + \tilde{X} \quad (8)$$
where $\tilde{X}$ is the input residual matrix of $\tilde{X}_c$;
From equation (8), the residual matrix $\tilde{X} = \tilde{X}_c - T_x P_x^T$ is obtained.
Step 3.5: From steps 3.1 to 3.4, the principal parts of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = T P^T$ and $Y_{pre} = U Q^T$, and their residuals are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$. Combining equations (5) to (8) gives
$$X_{m\_center} = U_c R_c^{*} + T_x P_x^T + \tilde{X} \quad (9)$$
$$Y_{center} = U_c Q_c^T + T_y P_y^T + \tilde{Y} \quad (10)$$
Step 4: Apply the same method as in step 3 to the matrix $X_{s\_center}$ to obtain its residual $X_{s\_res\_c}$;
Step 5: Compute the score of the source domain data set after the PLS algorithm extracts principal components from the master spectrum, $T_{m\_pre} = X_{m\_center} R$, and the score of the target domain data set after the PLS algorithm extracts principal components from the slave spectrum, $T_{s\_pre} = X_{s\_center} R$; from $T_{m\_pre}$ and $T_{s\_pre}$, compute the transfer matrix $M_{trans\_pre}$ by least squares. Compute the score of the source domain data set after principal components are extracted from the master-spectrum residual, $T_m = X_{m\_res\_c} P$, and the score of the target domain data set after principal components are extracted from the slave-spectrum residual, $T_s = X_{s\_res\_c} P$; from $T_m$ and $T_s$, compute the transfer matrix $M_{trans}$ by least squares;
Step 6: Predict the substance concentration variables of the measured object:
Step 6.1: Collect the spectrum of the measured object with the infrared spectrum measurement slave instrument and extract the spectral data by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the J spectral data points of the measured object;
Step 6.2: Perform principal component analysis on $X_{s\_test}$ based on the CPLS algorithm to obtain its residual $X_{s\_res\_c\_test}$;
Step 6.3: The matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
Further, in step 1, the samples are grain, the spectral data are absorbance values, and the substance concentration variables include the moisture, oil, protein, and starch content of the grain.
The beneficial effects of the invention are as follows:
Based on the CPLS algorithm, the invention extracts principal components once from the source domain and target domain data sets, then extracts principal components once more from the residuals, and computes the transfer matrices from both extractions. This eliminates random noise in the measurements of the master instrument, improves data utilization and modeling precision, reduces time complexity, and speeds up training and testing.
Drawings
Fig. 1 is a flow chart of the CPLS-based infrared spectrum measuring instrument calibration migration method of the present invention.
Fig. 2 is a flow chart of the CPLS-based principal component analysis of the source domain data set in the method of the present invention.
Fig. 3 is a flow chart of solving the transfer matrices in the method of the present invention.
Fig. 4 is a flow chart of predicting the substance concentration variables of the measured object in the method of the present invention.
Fig. 5 shows the cross-validation error of the oil model on the corn data set as a function of the number of principal components in the embodiment.
Fig. 6 shows the fitting results of mp6spec-mp5spec in the embodiment.
Fig. 7 shows the fitting results of m5spec-mp5spec in the embodiment.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
The invention provides a CPLS-based calibration migration method for infrared spectrum measuring instruments. In data processing, PLS extracts principal components from X and Y only once, but the residuals of X and Y usually still contain useful information; because this extraction is incomplete, the resulting model error is large. The concurrent partial least squares (CPLS) algorithm therefore extracts principal components from the residuals once more on top of PLS, which reduces model error and brings the fitted linear relationship closer to reality. However, collecting samples is expensive and time-consuming in practice, so transfer learning is built on top of CPLS: a mapping is established between the standard sets of the source and target domains and used to predict the target-domain test set.
The CPLS algorithm used in the invention further improves on PLS: it applies principal component analysis separately to the process-variable information unrelated to the quality variables and to the quality-variable information that cannot be predicted, and divides the data into five subspaces: the subspace of information shared by process and quality variables (the correlated principal subspace), the process-variable principal subspace, the process-variable residual subspace, the quality-variable principal subspace, and the quality-variable residual subspace.
The CPLS model achieves three goals: (1) it extracts, from the standard PLS projection, the scores directly related to predictable output changes, and these score vectors form the covariation subspace (CVS); (2) it further projects the unpredicted output changes onto an output principal subspace (OPS) and an output residual subspace (ORS) so that these subspaces can be monitored for abnormal changes; (3) it further projects input changes unrelated to the predicted output onto an input principal subspace (IPS) and an input residual subspace (IRS) so that these subspaces can be monitored for abnormal changes.
The CPLS algorithm splits the process-variable data into two main parts: information related to the quality variables and information unrelated to them. The quality-variable data are likewise split into two main parts: information that is predictable from the process variables and information that is not. CPLS-based monitoring therefore provides a complete framework that can monitor the process and quality variables as well as the remaining parts of the information.
As shown in Fig. 1, the CPLS-based infrared spectrum measuring instrument calibration migration method of the present invention comprises the following steps:
Step 1: The infrared spectrum measurement master instrument corresponds to the source domain and the infrared spectrum measurement slave instrument corresponds to the target domain. The spectrum of each sample is collected with both instruments to obtain a master spectrum and a slave spectrum; spectral data are extracted from the master and slave spectra at intervals of a nm within a wavelength range, and the substance concentration variable values of each sample are collected, giving the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
where $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the j-th spectral data values of the i-th sample in the master and slave spectra, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, I is the total number of samples, and J is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, where $y_{ik}$ is the value of the k-th substance concentration variable of the i-th sample, $k = 1, 2, \ldots, K$, and K is the total number of substance concentration variables.
In this embodiment, the samples are corn (a grain), the spectral data are absorbance values, and the substance concentration variables are the moisture, oil, protein, and starch content of the corn. The corn data set consists of data measured on the same I = 80 samples by three spectrometers: infrared spectra were measured by the instruments m5, mp5, and mp6 at intervals of a = 2 nm over the wavelength range 1100-2498 nm, giving J = 700 attributes. The master-slave pair in the first experiment is m5spec-mp6spec: the spectrum measured by m5 is the master spectrum and its spectral data set is the initial source domain data set; because the spectrum measured by mp6 differs markedly from that measured by m5, it is selected as the slave spectrum and its spectral data set as the initial target domain data set. Five further experiments were then carried out, in order, on mp5spec-mp6spec, mp6spec-mp5spec, m5spec-mp5spec, mp5spec-m5spec, and mp6spec-m5spec.
In this embodiment, the Kennard-Stone (KS) algorithm was used to partition the corn data set. First, 20% of the data in the initial source domain and target domain data sets, 16 samples each, were extracted as test samples; the target-domain test samples are used to test the calibration migration model. The remaining 80% of each data set, 64 samples each, were used as training samples. The source-domain training samples are used to build a reference model that predicts the target-domain transfer samples, and the target-domain training samples are used to build a standard model of the target domain against which the other migration models are compared. Then the KS algorithm was used to extract 20% of the data from each of the source and target training sets to form the source-domain and target-domain standard sample sets, which serve as the source domain data set $\{X_m, Y\}$ and target domain data set $\{X_s, Y\}$ in the method of the invention, to establish the transfer relationship between source-domain and target-domain samples.
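The Kennard-Stone selection used in this partition can be sketched as follows. This is a minimal NumPy sketch assuming Euclidean distance on the raw spectra, which is the classic form of the algorithm; the description does not state the distance metric, and the function name `kennard_stone` and the random placeholder data (not the corn data set) are illustrative.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Pick n_select row indices of X with the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples (n x n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Seed the selection with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample that is farthest from the current selection.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected)


# Hypothetical usage on placeholder data shaped like the corn set (80 x 700).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 700))
test_idx = kennard_stone(X_demo, 16)                # 20% of 80 samples
train_idx = np.setdiff1d(np.arange(80), test_idx)   # remaining 64 samples
```

Because each new sample maximizes its distance to the current selection, the chosen subset spans the spectral space evenly, which is why KS is a common choice for picking standard or calibration sets.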
Step 2: Center the source domain and target domain data sets: compute the mean of each column and subtract it from that column's original data, giving the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$. This avoids bias caused by large differences in numerical magnitude between variables.
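A minimal sketch of the column centering in step 2, assuming NumPy arrays with samples in rows and variables in columns. Returning the column means is an illustrative choice (the helper name `center_columns` is hypothetical); the description does not say which means are applied to test spectra, so any reuse of these means at prediction time is an assumption.

```python
import numpy as np


def center_columns(A):
    """Subtract each column's mean; return the centered matrix and the means."""
    A = np.asarray(A, dtype=float)
    col_mean = A.mean(axis=0)          # one mean per column (variable)
    return A - col_mean, col_mean


# Toy example: two samples, two spectral points.
X_m = np.array([[1.0, 2.0], [3.0, 6.0]])
X_m_center, X_m_mean = center_columns(X_m)
# X_m_center -> [[-1., -2.], [1., 2.]], X_m_mean -> [2., 4.]
```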
Step 3: As shown in Fig. 2, perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: Based on the PLS algorithm, establish the calibration model $Y_{center} = X_{m\_center} B$ on the data set $\{X_{m\_center}, Y_{center}\}$ and compute the coefficient matrix B, the score matrix T and load matrix P of $X_{m\_center}$, and the score matrix U and load matrix Q of $Y_{center}$; introduce a matrix R such that $T = X_{m\_center} R$, and determine the number of latent variables l;
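Step 3.1 relies on a PLS fit that yields B, T, P, U, Q and R with $T = X_{m\_center} R$. The sketch below assumes the standard NIPALS PLS2 algorithm, in which $R = W (P^T W)^{-1}$ for the X weight matrix W and $B = R Q^T$; the description does not specify which PLS variant is used, and `pls_nipals` is a hypothetical helper.

```python
import numpy as np


def pls_nipals(X, Y, n_components, tol=1e-12, max_iter=500):
    """PLS2 by NIPALS on centered X (n x m) and Y (n x p).

    Returns B, T, P, U, Q, R with T = X @ R and B = R @ Q.T.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    n, m = X.shape
    p = Y.shape[1]
    W = np.zeros((m, n_components))
    P = np.zeros((m, n_components))
    Q = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    Xr, Yr = X.copy(), Y.copy()                          # deflated copies
    for a in range(n_components):
        u = Yr[:, np.argmax(Yr.var(axis=0))].copy()      # start from the largest-variance Y column
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)                       # X weight vector
            t = Xr @ w                                   # X score
            q = Yr.T @ t / (t @ t)                       # Y loading
            u_new = Yr @ q / (q @ q)                     # Y score
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p_a = Xr.T @ t / (t @ t)                         # X loading
        Xr -= np.outer(t, p_a)                           # deflate X
        Yr -= np.outer(t, q)                             # deflate Y
        W[:, a], P[:, a], Q[:, a], T[:, a], U[:, a] = w, p_a, q, t, u
    R = W @ np.linalg.inv(P.T @ W)                       # gives T = X @ R on the undeflated X
    B = R @ Q.T
    return B, T, P, U, Q, R


# Placeholder data: 20 centered samples, 5 variables, 2 outputs.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 5))
X_demo -= X_demo.mean(axis=0)
beta = rng.normal(size=(5, 2))
Y_demo = X_demo @ beta
B, T, P, U, Q, R = pls_nipals(X_demo, Y_demo, n_components=5)
```

With as many components as the rank of X, PLS reproduces ordinary least squares, so B recovers `beta` here; in practice l is chosen by cross-validation as described in the embodiment.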
Step 3.2: Compute the predictable substance concentration variables as
$$\hat{Y}_c = X_{m\_center} R Q^T = T Q^T \quad (1)$$
Perform singular value decomposition (SVD) on the predictable substance concentration variables to obtain
$$\hat{Y}_c = U_c D_c V_c^T \quad (2)$$
where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, which is orthogonal; $Q_c = V_c D_c^T$ contains the $l_c$ non-zero singular values in descending order and the corresponding right singular vectors;
From formula (2),
$$U_c = X_{m\_center} R Q^T V_c D_c^{-1} \quad (3)$$
which gives
$$R_c = R Q^T V_c D_c^{-1} \quad (4)$$
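Equations (1) to (4) can be checked numerically. The sketch below assumes R and Q come from a PLS fit such as step 3.1 (random placeholders stand in for them here) and treats singular values below a relative tolerance `rel_tol`, an assumed parameter, as zero when counting $l_c$. The helper name `cpls_covariation` is hypothetical.

```python
import numpy as np


def cpls_covariation(X, R, Q, rel_tol=1e-10):
    """Step 3.2 sketch: SVD of the PLS-predictable output, equations (1)-(4)."""
    Y_hat = X @ R @ Q.T                                    # equation (1)
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)   # equation (2)
    l_c = int(np.sum(s > rel_tol * s[0]))                  # number of non-zero singular values
    U_c = U[:, :l_c]
    D_c = np.diag(s[:l_c])
    V_c = Vt[:l_c].T
    Q_c = V_c @ D_c                                        # so that Y_hat = U_c @ Q_c.T
    R_c = R @ Q.T @ V_c @ np.diag(1.0 / s[:l_c])           # equation (4), so U_c = X @ R_c
    return U_c, Q_c, R_c


# Placeholder shapes: 30 samples, 6 spectral points, l = 3 latent variables, K = 2 outputs.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(30, 6))
R_demo = rng.normal(size=(6, 3))
Q_demo = rng.normal(size=(2, 3))
U_c, Q_c, R_c = cpls_covariation(X_demo, R_demo, Q_demo)
```

Since $\hat{Y}_c$ has at most K columns, $l_c \le \min(l, K)$; here it is 2.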
Step 3.3: Compute the unpredictable substance concentration variables as
$$\tilde{Y}_c = Y_{center} - U_c Q_c^T \quad (5)$$
Perform principal component analysis (PCA) on the unpredictable substance concentration variables to extract $l_y$ principal components:
$$\tilde{Y}_c = T_y P_y^T + \tilde{Y} \quad (6)$$
where $\tilde{Y}$ is the output residual matrix of $\tilde{Y}_c$;
From equation (6), the residual matrix $\tilde{Y} = \tilde{Y}_c - T_y P_y^T$ is obtained.
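Step 3.3 amounts to subtracting the covariation part and running PCA on what is left. The sketch below assumes PCA by truncated SVD without re-centering (the inputs are already centered); `pca_split` and `cpls_output_part` are hypothetical helper names, and random placeholders stand in for $U_c$ and $Q_c$ from step 3.2.

```python
import numpy as np


def pca_split(A, n_components):
    """PCA via SVD without re-centering: A = T @ P.T + residual."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    P = Vt[:n_components].T        # loadings with orthonormal columns
    T = A @ P                      # scores
    return T, P, A - T @ P.T


def cpls_output_part(Y_center, U_c, Q_c, l_y):
    """Step 3.3 sketch: unpredictable output, equations (5) and (6)."""
    Y_tilde_c = Y_center - U_c @ Q_c.T                 # equation (5)
    T_y, P_y, Y_tilde = pca_split(Y_tilde_c, l_y)      # equation (6)
    return Y_tilde_c, T_y, P_y, Y_tilde


rng = np.random.default_rng(3)
Y_demo = rng.normal(size=(30, 4))
U_c_demo = rng.normal(size=(30, 2))
Q_c_demo = rng.normal(size=(4, 2))
Y_tilde_c, T_y, P_y, Y_tilde = cpls_output_part(Y_demo, U_c_demo, Q_c_demo, l_y=2)
```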
Step 3.4: By projection with respect to $R_c$, obtain the input variables unrelated to the substance concentration variables:
$$\tilde{X}_c = X_{m\_center} - U_c R_c^{*} \quad (7)$$
where $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$.
Extract $l_x$ principal components from the input variables unrelated to the substance concentration variables:
$$\tilde{X}_c = T_x P_x^T + \tilde{X} \quad (8)$$
where $\tilde{X}$ is the input residual matrix of $\tilde{X}_c$;
From equation (8), the residual matrix $\tilde{X} = \tilde{X}_c - T_x P_x^T$ is obtained.
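Step 3.4 can be sketched the same way. When $U_c = X_{m\_center} R_c$, equation (7) removes from each row of $X_{m\_center}$ its projection onto the column space of $R_c$, so the remaining input satisfies $\tilde{X}_c R_c = 0$. The helper names are hypothetical, PCA is again assumed to be a truncated SVD, and the data are random placeholders.

```python
import numpy as np


def pca_split(A, n_components):
    """PCA via SVD without re-centering: A = T @ P.T + residual."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    P = Vt[:n_components].T
    T = A @ P
    return T, P, A - T @ P.T


def cpls_input_part(X_center, U_c, R_c, l_x):
    """Step 3.4 sketch: input unrelated to the output, equations (7) and (8)."""
    R_c_star = np.linalg.solve(R_c.T @ R_c, R_c.T)    # R_c^* = (R_c^T R_c)^{-1} R_c^T
    X_tilde_c = X_center - U_c @ R_c_star              # equation (7)
    T_x, P_x, X_tilde = pca_split(X_tilde_c, l_x)      # equation (8)
    return X_tilde_c, T_x, P_x, X_tilde


rng = np.random.default_rng(5)
X_demo = rng.normal(size=(30, 6))
R_c_demo = rng.normal(size=(6, 2))
U_c_demo = X_demo @ R_c_demo                           # as produced by step 3.2
X_tilde_c, T_x, P_x, X_tilde = cpls_input_part(X_demo, U_c_demo, R_c_demo, l_x=3)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical-stability choice; it computes the same $R_c^{*}$.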
Step 3.5: From steps 3.1 to 3.4, the principal parts of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = T P^T$ and $Y_{pre} = U Q^T$, and their residuals are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$. Combining equations (5) to (8) gives
$$X_{m\_center} = U_c R_c^{*} + T_x P_x^T + \tilde{X} \quad (9)$$
$$Y_{center} = U_c Q_c^T + T_y P_y^T + \tilde{Y} \quad (10)$$
Following the CPLS algorithm flow, $X_{m\_center}$ and $Y_{center}$ are each divided into three parts: the principal component part extracted by the PLS algorithm, the principal component part extracted from the residual, and the unpredictable error. Compared with PLS, the CPLS flow adds the step of extracting principal components from the residual, which improves data utilization.
Step 4: Apply the same method as in step 3 to the matrix $X_{s\_center}$ to obtain its residual $X_{s\_res\_c}$.
In this embodiment, the optimal number of PLS principal components was selected by 10-fold cross-validation. Taking oil as an example, Fig. 5 shows how the cross-validation error of the oil model on the target-domain training set of the corn data set changes with the number of principal components. As Fig. 5 shows, the cross-validation error for oil on the corn set reaches its global minimum at 12 principal components, so the optimal number of principal components for oil is set to 12. The optimal numbers for the other three components are selected in the same way.
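The 10-fold cross-validation used to choose the number of principal components might look like the sketch below for a single concentration variable such as oil. It assumes a PLS1 (single-response) NIPALS fit, random fold assignment with a fixed seed, and centering on each training fold; none of these details are given in the description, all function names are hypothetical, and the data are random placeholders rather than the corn set.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """PLS1 (single response) by NIPALS on centered X, y; returns coefficients b."""
    Xr, yr = X.astype(float).copy(), y.astype(float).copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # X weight vector
        t = Xr @ w                             # X score
        tt = t @ t
        p = Xr.T @ t / tt                      # X loading
        c = (yr @ t) / tt                      # y loading
        Xr -= np.outer(t, p)                   # deflate X
        yr -= c * t                            # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)     # b = W (P^T W)^{-1} q


def cv_rmse(X, y, n_comp, n_folds=10, seed=0):
    """Root mean squared k-fold cross-validation error for one component count."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()   # center on the training fold only
        b = pls1_coef(X[train] - x_mean, y[train] - y_mean, n_comp)
        pred = (X[fold] - x_mean) @ b + y_mean
        sq_err.append((pred - y[fold]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))


def select_n_components(X, y, max_comp, n_folds=10):
    """Return the component count with the smallest CV error, plus all errors."""
    errors = [cv_rmse(X, y, a, n_folds) for a in range(1, max_comp + 1)]
    return int(np.argmin(errors)) + 1, errors


rng = np.random.default_rng(4)
X_demo = rng.normal(size=(60, 8))
y_demo = X_demo @ rng.normal(size=8) + 0.1 * rng.normal(size=60)
best, errors = select_n_components(X_demo, y_demo, max_comp=6)
```

Picking the global minimum of the error curve mirrors the choice of 12 components for oil described above; some practitioners prefer the smallest count within one standard error of the minimum, which this sketch does not do.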
And 5: as shown in fig. 3, a transfer matrix is established that maps the target domain latent structure to the source domain latent structure using a least squares algorithm: calculating the score T of the source domain data set after the principal spectrum is extracted by the PLS algorithm m_pre =X m_center R, calculating the score T of the target domain data set after extracting the principal components from the spectrum by a PLS algorithm s_pre =X s_center R, according to T m_pre 、T s_pre Calculating transfer matrix M based on least square method trans_pre (ii) a Calculating the score T of the data set of the source domain after extracting principal components from the residual error of the principal spectrum m =X m_res_c P, calculating the score T of the target domain data set after extracting the principal component from the spectrum pair residual error s =X s_res_c P, according to T m 、T s Calculating transfer matrix M based on least square method trans
Step 6: As shown in Fig. 4, predict the substance concentration variables of the measured object:
Step 6.1: Collect the spectrum of the measured object with the infrared spectrum measurement slave instrument and extract the spectral data by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the J spectral data points;
Step 6.2: Perform principal component analysis on $X_{s\_test}$ based on the CPLS algorithm to obtain its residual $X_{s\_res\_c\_test}$;
Step 6.3: The matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
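The prediction formula of step 6.3 translates directly into matrix products. The sketch assumes R, P, B and the transfer matrices come from the preceding steps; the description does not state whether $X_{s\_test}$ is centered or whether the column means of Y are added back, so the hypothetical function `predict_concentrations` applies the formula exactly as written and leaves that choice to the caller.

```python
import numpy as np


def predict_concentrations(X_s_test, X_s_res_c_test, R, P, B, M_trans_pre, M_trans):
    """Step 6.3: (X_s_test R M_trans_pre P^T + X_s_res_c_test R M_trans P^T) B."""
    X_mapped = (X_s_test @ R @ M_trans_pre @ P.T
                + X_s_res_c_test @ R @ M_trans @ P.T)   # slave spectra mapped toward master space
    return X_mapped @ B


# Placeholder shapes: 5 test samples, 10 spectral points, l = 3, K = 4 concentrations.
rng = np.random.default_rng(7)
n_test, m, l, K = 5, 10, 3, 4
X_s_test = rng.normal(size=(n_test, m))
X_s_res_c_test = rng.normal(size=(n_test, m))
R = rng.normal(size=(m, l))
P = rng.normal(size=(m, l))
B = rng.normal(size=(m, K))
Y_test_predict = predict_concentrations(X_s_test, X_s_res_c_test, R, P, B, np.eye(l), np.eye(l))
```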
In this embodiment, the model was used to predict the test data; the root mean square error of prediction (RMSEP) for the different master-slave instrument combinations on the corn data set is shown in Table 1:
TABLE 1
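[Table 1 image not reproduced: RMSEP of moisture, oil, protein and starch for the six master-slave instrument combinations on the corn data set]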
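RMSEP, the error measure reported in Table 1, is the root mean squared difference between predicted and reference concentrations over the test samples. A minimal sketch computing one value per concentration variable, assuming test samples in rows and concentration variables in columns (the helper name `rmsep` is hypothetical):

```python
import numpy as np


def rmsep(Y_true, Y_pred):
    """Root mean squared error of prediction, one value per output column."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    return np.sqrt(np.mean((Y_true - Y_pred) ** 2, axis=0))


# Toy check: two test samples, two concentration variables (e.g. moisture and oil).
err = rmsep([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [5.0, 4.0]])
# err is approximately [1.414, 0.0]
```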
Analysis of Table 1 shows that, overall, the method performs better between mp5spec and mp6spec than in the other two pairings. This is because mp5spec and mp6spec are more similar to each other and both differ more from m5spec, so transfer learning between them is more meaningful and the error is smaller. With mp6spec as the master spectrum and mp5spec as the slave spectrum, the prediction errors for moisture, oil, protein, and starch are essentially the smallest of the six experiments, while the migrations between m5spec and mp5spec or mp6spec have the largest errors of the six.
Figs. 6 and 7 show the fitting results of mp6spec-mp5spec and m5spec-mp5spec in this embodiment, and comparing them reveals a clear difference in fit quality. Between mp6spec and mp5spec, the spectra are more similar and the fit is better: most points fall on or near the fitted line. Between m5spec and mp5spec, all points fall below the fitted line. Transfer learning between mp6spec and mp5spec is therefore clearly better than between m5spec and mp5spec, and transfer between the latter two is not worthwhile because its prediction is poor.
Since mp6spec-mp5spec has the best migration effect, this pair was chosen for comparison with other algorithms: multivariate scatter correction (MSC), canonical correlation analysis (CCA), slope/bias correction (SBC), and piecewise direct standardization (PDS). Table 2 compares the RMSEP of each algorithm for mp6spec-m5spec on the corn data set. As Table 2 shows, the migration effect of the CPLS-based calibration migration method of the invention is generally very good: it far outperforms MSC, CCA, and PDS in predicting all four components, and compared with SBC it predicts moisture and oil better and performs similarly for protein and starch.
[Table 2: RMSEP comparison of the calibration transfer algorithms on the corn data set]
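Table 2 compares the algorithms by RMSEP (root-mean-square error of prediction), defined as $\mathrm{RMSEP}_k = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(\hat y_{ik} - y_{ik})^2}$ for each concentration variable $k$. A minimal NumPy sketch of that per-component computation follows. It assumes reference values and predictions are arrays with one row per test sample and one column per concentration variable (for example moisture, oil, protein, starch). The function name `rmsep` is hypothetical.

```python
import numpy as np


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction, one value per concentration column."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=0))
```

For example, `rmsep(Y_test, Y_test_predict)` returns one RMSEP value for each component.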
In summary, six groups of experiments were carried out on the corn data set, and the results were compared with those of the MSC, CCA, SBC and PDS algorithms. The prediction performance of the CPLS algorithm combined with transfer learning is similar to that of SBC and far better than that of MSC, CCA and PDS. The method therefore eliminates random noise in the measurements of the master instrument and improves data utilization and modeling precision.
It should be understood that the embodiments described above are only some, not all, of the embodiments of the present invention. They serve only to explain the invention and do not limit its scope of protection. All other embodiments that those skilled in the art can derive from them without creative effort fall within the scope of protection claimed by the present invention. This includes all modifications, equivalents, improvements and the like made within the spirit and principle of the present application.

Claims (2)

1. A CPLS-based infrared spectrum measuring instrument calibration migration method is characterized by comprising the following steps:
Step 1: the infrared spectrum measurement master instrument corresponds to the source domain, and the infrared spectrum measurement slave instrument corresponds to the target domain. The spectrum of each sample is collected with both the master instrument and the slave instrument to obtain a master spectrum and a slave spectrum. Spectral data of the master and slave spectra are extracted at intervals of a nm within a wavelength range, and the substance concentration variable values of each sample are collected. This gives the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
wherein $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^{\mathrm T}$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^{\mathrm T}$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$. Here $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data points of the $i$-th sample in the master and slave spectra, respectively, with $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$; $I$ is the total number of samples and $J$ is the total number of extracted spectral data points. $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^{\mathrm T}$ and $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, where $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of concentration variables;
Step 2: the source domain data set and the target domain data set are mean-centered to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: on the data set $\{X_{m\_center}, Y_{center}\}$, establish the calibration model $Y_{center} = X_{m\_center} B$ with the PLS algorithm. Calculate the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$. Introduce the matrix $R$ such that $T = X_{m\_center} R$, and determine the number $l$ of latent variables;
Step 3.2: calculate the predictable substance concentration variable as
$$\hat{Y}_{center} = T Q^{\mathrm T} \tag{1}$$
Perform singular value decomposition on the predictable substance concentration variable to obtain
$$\hat{Y}_{center} = U_c D_c V_c^{\mathrm T} \tag{2}$$
wherein $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, and $V_c$ is the right singular matrix, an orthogonal matrix; $Q_c = V_c D_c^{\mathrm T}$, where $D_c$ contains the $l_c$ non-zero singular values in descending order and $V_c$ contains the corresponding right singular vectors;
from formula (2),
$$U_c = \hat{Y}_{center} V_c D_c^{-1} = X_{m\_center} R Q^{\mathrm T} V_c D_c^{-1} = X_{m\_center} R_c \tag{3}$$
which gives
$$R_c = R Q^{\mathrm T} V_c D_c^{-1} \tag{4}$$
Step 3.3: calculate the unpredictable substance concentration variable as
$$\tilde{Y}_c = Y_{center} - U_c Q_c^{\mathrm T} \tag{5}$$
Extract $l_y$ principal components from the unpredictable substance concentration variable:
$$\tilde{Y}_c = T_y P_y^{\mathrm T} + \tilde{Y} \tag{6}$$
wherein $\tilde{Y}$ is the output residual matrix of $\tilde{Y}_c$;
the score matrix $T_y$ and loading matrix $P_y$ are obtained by equation (6);
Step 3.4: by removing the projection onto the space of $R_c$, obtain the input variable unrelated to the substance concentration variable:
$$\tilde{X}_c = X_{m\_center} - U_c R_c^{*} \tag{7}$$
wherein $R_c^{*} = (R_c^{\mathrm T} R_c)^{-1} R_c^{\mathrm T}$;
extract $l_x$ principal components from the input variable unrelated to the substance concentration variable:
$$\tilde{X}_c = T_x P_x^{\mathrm T} + \tilde{X} \tag{8}$$
wherein $\tilde{X}$ is the input residual matrix of $\tilde{X}_c$;
the score matrix $T_x$ and loading matrix $P_x$ are obtained by equation (8);
Step 3.5: from steps 3.1 to 3.4, the principal components of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = T P^{\mathrm T}$ and $Y_{pre} = U Q^{\mathrm T}$, respectively. The residuals of $X_{m\_center}$ and $Y_{center}$ are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$, that is,
$$X_{m\_res\_c} = X_{m\_center} - T P^{\mathrm T}$$
$$Y_{res\_c} = Y_{center} - U Q^{\mathrm T}$$
Step 4: apply the same method as in step 3 to the matrix $X_{s\_center}$ to obtain the residual $X_{s\_res\_c}$ of $X_{s\_center}$;
Step 5: calculate the score of the source domain data set after PLS principal component extraction from the master spectrum, $T_{m\_pre} = X_{m\_center} R$. Calculate the score of the target domain data set after PLS principal component extraction from the slave spectrum, $T_{s\_pre} = X_{s\_center} R$. From $T_{m\_pre}$ and $T_{s\_pre}$, calculate the transfer matrix $M_{trans\_pre}$ by the least squares method. Next, calculate the score of the source domain data set after principal component extraction from the master spectrum residual, $T_m = X_{m\_res\_c} P$, and the score of the target domain data set after principal component extraction from the slave spectrum residual, $T_s = X_{s\_res\_c} P$. From $T_m$ and $T_s$, calculate the transfer matrix $M_{trans}$ by the least squares method;
Step 6: predict the substance concentration variables of the measured object:
Step 6.1: collect the spectrum of the measured object with the infrared spectrum measurement slave instrument, and extract the spectral data by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the $J$ spectral data points of the measured object;
Step 6.2: perform principal component analysis on $X_{s\_test}$ based on the CPLS algorithm to obtain the residual $X_{s\_res\_c\_test}$ of $X_{s\_test}$;
Step 6.3: the matrix of predicted substance concentration variables of the measured object is $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^{\mathrm T} + X_{s\_res\_c\_test} R M_{trans} P^{\mathrm T}) B$.
2. The CPLS-based infrared spectrum measuring instrument calibration migration method according to claim 1, wherein in step 1 the samples are grain, the spectral data are absorbance values, and the substance concentration variables comprise the moisture, oil, protein and starch contents of the grain.
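The steps of claim 1 can be illustrated end to end with a minimal NumPy sketch. It is not the patentee's implementation, and it rests on these assumptions:

- PLS is computed by NIPALS, and the PCA in steps 3.3 and 3.4 by SVD; the claim names neither algorithm.
- The least-squares transfer matrices map slave scores onto master scores, which is the direction step 6.3 implies.
- The component counts $l$, $l_y$ and $l_x$ are passed in rather than selected, for example by cross-validation.
- The slave-side residuals of steps 4 and 6.2 are supplied by the caller, since the claim only says they follow "the same method as in step 3".

All function names are hypothetical.

```python
import numpy as np


def pls_nipals(X, Y, l, tol=1e-10, max_iter=500):
    """Step 3.1: PLS2 by NIPALS on centered X (I x J) and Y (I x K), l latent variables.

    Returns T, P (X scores/loadings), U, Q (Y scores/loadings), R with
    T = X @ R, and the coefficient matrix B with Y ~= X @ B.
    """
    Xa, Ya = np.array(X, dtype=float), np.array(Y, dtype=float)
    Ts, Us, Ws, Ps, Qs = [], [], [], [], []
    for _ in range(l):
        u = Ya[:, [np.argmax(Ya.var(axis=0))]]  # start from the largest-variance Y column
        for _ in range(max_iter):
            w = Xa.T @ u
            w /= np.linalg.norm(w)              # X weights
            t = Xa @ w                          # X scores
            q = Ya.T @ t / (t.T @ t)            # Y loadings
            u_new = Ya @ q / (q.T @ q)          # Y scores
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = Xa.T @ t / (t.T @ t)                # X loadings
        Xa -= t @ p.T                           # deflate X and Y
        Ya -= t @ q.T
        for store, v in ((Ts, t), (Us, u), (Ws, w), (Ps, p), (Qs, q)):
            store.append(v)
    T, U, W, P, Q = map(np.hstack, (Ts, Us, Ws, Ps, Qs))
    R = W @ np.linalg.inv(P.T @ W)              # T = X @ R for the original X
    B = R @ Q.T                                 # Y_center ~= X_m_center @ B
    return T, P, U, Q, R, B


def pca_svd(A, n):
    """PCA via SVD (assumed method): A = scores @ loadings.T + residual."""
    loadings = np.linalg.svd(A, full_matrices=False)[2][:n].T
    scores = A @ loadings
    return scores, loadings, A - scores @ loadings.T


def cpls_decompose(X, Y, T, Q, R, ly, lx, tol=1e-10):
    """Steps 3.2-3.4: concurrent decomposition of X and Y around the PLS model."""
    Uc, s, Vt = np.linalg.svd(T @ Q.T, full_matrices=False)  # SVD of the predictable output
    lc = int(np.sum(s > tol * s[0]))            # keep the non-zero singular values
    Uc, s, Vc = Uc[:, :lc], s[:lc], Vt[:lc].T
    Qc = Vc * s                                 # V_c D_c (D_c is diagonal)
    Rc = R @ Q.T @ Vc / s                       # R Q^T V_c D_c^{-1}, formula (4)
    Ty, Py, Y_res = pca_svd(Y - Uc @ Qc.T, ly)  # step 3.3: unpredictable output
    Rc_star = np.linalg.solve(Rc.T @ Rc, Rc.T)  # (R_c^T R_c)^{-1} R_c^T
    Tx, Px, X_res = pca_svd(X - Uc @ Rc_star, lx)  # step 3.4: output-irrelevant input
    return dict(Uc=Uc, Qc=Qc, Rc=Rc, Ty=Ty, Py=Py, Tx=Tx, Px=Px, X_res=X_res, Y_res=Y_res)


def transfer_matrix(T_slave, T_master):
    """Step 5 (assumed direction): least-squares M with T_slave @ M ~= T_master."""
    return np.linalg.lstsq(T_slave, T_master, rcond=None)[0]


def predict(Xs_test, Xs_res_test, R, P, B, M_trans_pre, M_trans):
    """Step 6.3, evaluated literally as written in claim 1."""
    return (Xs_test @ R @ M_trans_pre @ P.T + Xs_res_test @ R @ M_trans @ P.T) @ B
```

With centered data, steps 3.5 and 5 then reduce to a few calls: `X_m_res_c = Xm_center - T @ P.T`, `M_trans_pre = transfer_matrix(Xs_center @ R, Xm_center @ R)` and `M_trans = transfer_matrix(Xs_res_c @ P, X_m_res_c @ P)`. After that, `predict` evaluates the step 6.3 formula. Centering the test spectra and adding back the mean of Y are not part of that formula and are left to the caller.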
CN202010045812.3A 2020-01-16 2020-01-16 CPLS-based infrared spectrum measuring instrument calibration migration method Active CN111220565B (en)

Publications (2)

Publication Number Publication Date
CN111220565A CN111220565A (en) 2020-06-02
CN111220565B true CN111220565B (en) 2022-07-29

Family

ID=70827000

Country Status (1)

Country Link
CN (1) CN111220565B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant