CN111220565B - CPLS-based infrared spectrum measuring instrument calibration migration method - Google Patents
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2201/00—Features of devices classified in G01N21/00
- G01N2201/12—Circuits of general importance; Signal processing
- G01N2201/127—Calibration; base line adjustment; drift compensation
Landscapes
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Investigating Or Analysing Materials By Optical Means (AREA)
Abstract
The invention relates to the technical field of transfer learning in machine learning, and provides a CPLS-based infrared spectrum measuring instrument calibration migration method. First, a source domain data set $\{X_m, Y\}$ and a target domain data set $\{X_s, Y\}$ are collected and centered to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$. Then, based on the CPLS algorithm, principal component analysis is performed on the matrices $X_{m\_center}$ and $Y_{center}$, and likewise on the matrix $X_{s\_center}$. Next, the transfer matrices $M_{trans\_pre}$ and $M_{trans}$ are calculated. Finally, the substance concentration variables of the object under test are predicted. The invention can eliminate the random noise measured by the master instrument, improve data utilization and modeling accuracy, and reduce time complexity.
Description
Technical Field
The invention relates to the technical field of transfer learning in machine learning, and in particular to a CPLS-based infrared spectrum measuring instrument calibration migration method.
Background
Near-infrared spectroscopy (NIRS) analysis offers simple instrument operation, fast data analysis, low cost, and no sample contamination, and is widely applied in many fields. In production, however, calibration models built with near-infrared spectral analysis become invalid when measurement conditions or instrument hardware performance are unstable.
The main goal of transfer learning is to extract classification or regression knowledge from one or more tasks in the source domain and apply it to tasks in the target domain. If the knowledge of one task is successfully transferred to another, a model for the new task can be obtained without many new samples. Using knowledge learned in one or more source domains improves learning performance in the target domain and addresses problems such as missing target domain labels, high labeling cost, and a time-consuming learning process.
Calibration migration refers to the transfer of a multivariate calibration model between different measuring instruments or measurement states. It uses the linear relationship between spectral data from different sources to convert spectra measured on a new instrument or in a new state, so that the original model can be used directly to predict new samples. Migration can be applied across related fields rather than only within the same field, transferring and converting useful information between domains. The original model thus remains valid, or the original information is used to speed up modeling. This avoids re-sampling or re-modeling the target domain with a large number of target domain samples, improves model validity, greatly reduces cost, and accelerates modeling.
Existing calibration migration methods suffer from low prediction accuracy and limited applicability. Take PLS-based calibration migration as an example. Partial least squares (PLS) is one of the algorithms commonly used in data information extraction and process monitoring. By extracting the feature information with maximum correlation between the process variables and the quality variables, it splits these variables into a principal component subspace and a residual subspace, thereby compressing and extracting the data. However, the PLS algorithm first extracts the principal components of the process variables and of the quality variables separately using principal component analysis, with no correlation between the two sets of principal components. It assumes by default that all process variables act on the quality variables, ignoring the state information of internal variables. In many cases the process data lack excitation and there are many unmeasured process and quality disturbances; when the residual information of the quality variables changes, alarms fail and the PLS prediction output is poor. In fact, monitoring changes in quality variable information is more important than monitoring the process variables. In addition, the optimization objective of the PLS model is to maximize the principal component correlation between the process and quality variables, with no constraint on the residuals, so the residual variance of the process and quality variables cannot be guaranteed to be minimal, and a large amount of information may be left in the residuals. Moreover, near-infrared spectral modeling currently involves large data volumes, the serial partial least squares algorithm has high time complexity, and training and testing take a long time.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides the CPLS-based infrared spectrum measuring instrument calibration migration method, which can eliminate random noise measured by a main instrument, improve the data utilization rate and the modeling precision and reduce the time complexity.
The technical scheme of the invention is as follows:
A CPLS-based infrared spectrum measuring instrument calibration migration method, characterized by comprising the following steps:
Step 1: Let the infrared spectrum measurement master instrument correspond to the source domain and the infrared spectrum measurement slave instrument correspond to the target domain. Collect the spectrum of each sample with the master instrument and with the slave instrument to obtain a master spectrum and a slave spectrum, respectively. Extract the spectral data of the master spectrum and of the slave spectrum at intervals of $a$ nm within a wavelength range, and collect the values of the substance concentration variables of each sample, to obtain the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
Wherein $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data points of the $i$-th sample in the master spectrum and in the slave spectrum, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables;
Step 2: Center the source domain data set and the target domain data set to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: Based on the CPLS algorithm, perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$:
Step 3.1: Based on the PLS algorithm, use the data set $\{X_{m\_center}, Y_{center}\}$ to establish the calibration model $Y_{center} = X_{m\_center} B$, and calculate the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$; introduce a matrix $R$ such that $T = X_{m\_center} R$, and determine the number $l$ of latent variables;
step 3.2: calculating a predictable substance concentration variable of
Performing singular value decomposition on predictable substance concentration variables to obtain
Wherein, U c As a left singular matrix, D c As diagonal matrix of singular values, V c As a right singular matrix, V c Is an orthogonal matrix; q c =V c D c T Including l in descending order c A plurality of non-zero singular values and corresponding right singular vectors;
obtained by the formula (2)
To obtain
R c =RQ T V c D c -1 (4)
Step 3.3: calculating an unpredictable substance concentration variable as
Extracting main components from unpredictable substance concentration variables to obtain l y The main component number is
Step 3.4: by spatially R c Projection, obtaining variables independent of substance concentration Input variable of
Wherein R is c * =(R c T R c ) -1 R c T ;
Subjecting the input variable independent of the concentration variable of the substance to principal component extraction to obtain l x The main component number is
Step 3.5: from step 3.1 to step 3.4, X is obtained m_center 、Y center The main components extracted by the PLS algorithm are respectively X m_pre =TP T 、Y pre =UQ T ,X m_center 、Y center Respectively have a residual error of X m_res_c =X m_center -X m_pre 、Y res_c =Y center -Y pre That is to obtain
Step 4: Apply the same method as in step 3 to the matrix $X_{s\_center}$ to perform principal component analysis, obtaining the residual $X_{s\_res\_c}$ of $X_{s\_center}$;
Step 5: Calculate the scores of the source domain data set after the PLS algorithm extracts principal components from the master spectrum, $T_{m\_pre} = X_{m\_center} R$, and the scores of the target domain data set after the PLS algorithm extracts principal components from the slave spectrum, $T_{s\_pre} = X_{s\_center} R$; from $T_{m\_pre}$ and $T_{s\_pre}$, calculate the transfer matrix $M_{trans\_pre}$ by the least squares method. Calculate the scores of the source domain data set after principal components are extracted from the master spectrum residual, $T_m = X_{m\_res\_c} P$, and the scores of the target domain data set after principal components are extracted from the slave spectrum residual, $T_s = X_{s\_res\_c} P$; from $T_m$ and $T_s$, calculate the transfer matrix $M_{trans}$ by the least squares method;
Step 6: Predict the substance concentration variables of the object under test:
Step 6.1: Collect the spectrum of the object under test with the infrared spectrum measurement slave instrument, and extract the spectral data by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the $J$ spectral data points of the object under test;
Step 6.2: Based on the CPLS algorithm, perform principal component analysis on $X_{s\_test}$, obtaining the residual $X_{s\_res\_c\_test}$ of $X_{s\_test}$;
Step 6.3: Predict the matrix of substance concentration variables of the object under test as $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
Further, in step 1, the sample is grain, the spectral data is absorbance, and the substance concentration variables include moisture content, oil content, protein content, and starch content of the grain.
The invention has the beneficial effects that:
Based on the CPLS algorithm, the invention first extracts principal components from the source domain data set and the target domain data set, then extracts principal components once more from the residuals, and calculates the transfer matrices on the basis of these two extractions. This eliminates the random noise measured by the master instrument, improves data utilization and modeling accuracy, reduces time complexity, and speeds up training and testing.
Drawings
Fig. 1 is a flow chart of the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS of the present invention.
Fig. 2 is a flow chart of the CPLS-based principal component analysis of the source domain data set in the calibration migration method of the CPLS-based infrared spectroscopic measuring instrument of the present invention.
Fig. 3 is a flow chart of solving a transfer matrix in the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS.
Fig. 4 is a flowchart of predicting the substance concentration variable of the measured object in the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS according to the present invention.
FIG. 5 is a graphical representation of cross-validation error of oil on a corn data set as a function of principal component number in accordance with an embodiment.
FIG. 6 is a graph showing the fitting results of mp6spec to mp5spec in the embodiment.
FIG. 7 is a graph showing the fitting results of m5spec-mp5spec in the embodiment.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
The invention provides a CPLS-based infrared spectrum measuring instrument calibration migration method. In data processing, PLS extracts principal components from X and Y only once, but the residuals of X and Y usually still contain useful information; this insufficient extraction makes the error of the resulting model large. The concurrent partial least squares (CPLS) algorithm was therefore proposed: building on PLS, it extracts principal components from the residuals once more, so that the model error is smaller and the linear relationship is closer to the real situation. In practice, however, acquiring samples is expensive and time-consuming, so transfer learning is introduced on top of CPLS: the prediction of the target domain test set is completed by establishing a mapping between the standard sets of the source domain and the target domain.
The CPLS algorithm adopted by the invention is a further improvement on the PLS algorithm. Principal component analysis is applied separately to the process variable information that is unrelated to the quality variables and to the quality variable information that cannot be predicted, and the data are divided into 5 subspaces: the subspace of information relating process variables and quality variables (the related principal component subspace), the process variable principal component subspace, the process variable residual subspace, the quality variable principal component subspace, and the quality variable residual subspace.
The CPLS model achieves three goals: (1) extracting scores directly related to predictable changes in the output from the standard PLS projection, and these score vectors constitute a co-variational subspace (CVS); (2) further projecting the unpredicted output changes to an Output Principal Subspace (OPS) and an Output Residual Subspace (ORS) to monitor these subspaces for abnormal changes; (3) input changes that are not related to the prediction output are further projected into an input principal component subspace (IPS) and an Input Residual Subspace (IRS) to monitor for abnormal changes in these subspaces.
The CPLS algorithm divides the process variable data into two main parts: information related to the quality variables and information unrelated to the quality variables. The quality variable data are likewise divided into two main parts: information that is predictable from the process variables and information that is not. The CPLS-based monitoring method thus provides a complete monitoring framework that can monitor the process and quality variables as well as the other portions of information.
As shown in fig. 1, the calibration migration method of the infrared spectroscopic measuring instrument based on CPLS of the present invention includes the following steps:
Step 1: Let the infrared spectrum measurement master instrument correspond to the source domain and the infrared spectrum measurement slave instrument correspond to the target domain. Collect the spectrum of each sample with the master instrument and with the slave instrument to obtain a master spectrum and a slave spectrum, respectively. Extract the spectral data of the master spectrum and of the slave spectrum at intervals of $a$ nm within a wavelength range, and collect the values of the substance concentration variables of each sample, to obtain the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
Wherein $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the $j$-th spectral data points of the $i$-th sample in the master spectrum and in the slave spectrum, respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, $I$ is the total number of samples, and $J$ is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the $k$-th substance concentration variable of the $i$-th sample, $k = 1, 2, \ldots, K$, and $K$ is the total number of substance concentration variables.
In this embodiment, the samples are corn (a grain), the spectral data are absorbances, and the substance concentration variables are the moisture, oil, protein, and starch contents of the corn. The data measured on the same I = 80 samples by three spectroscopic instruments constitute the corn data set. The infrared spectra are measured by the infrared spectrum measuring instruments m5, mp5, and mp6 at intervals of a = 2 nm over the wavelength range 1100-2498 nm, giving J = 700 attributes. The master spectrum-slave spectrum pair of the first experiment is m5spec-mp6spec: the spectrum measured by m5 is taken as the master spectrum, and the corresponding spectral data set as the initial source domain data set; since the spectrum measured by mp6 differs significantly from that measured by m5, it is selected as the slave spectrum, and the corresponding spectral data set as the initial target domain data set. Five more experiments are then carried out in the order mp5spec-mp6spec, mp6spec-mp5spec, m5spec-mp5spec, mp5spec-m5spec, and mp6spec-m5spec.
In this embodiment, the Kennard-Stone (KS) algorithm is used to split the corn data set. First, 20% of the data in the initial source domain data set and in the initial target domain data set are extracted as test samples, 16 samples each; the test samples of the target domain are used to test the calibration migration model. The remaining 80% of the data in each initial data set are used as training samples, 64 samples each. A reference model is established on the training samples of the source domain and used to predict the migration samples of the target domain; a standard model of the target domain is established on the training samples of the target domain for comparison with the performance of the other migration models. Then, using the KS algorithm, 20% of the data are extracted from the training samples of the source domain and from the training samples of the target domain to form the standard sample set of the source domain and the standard sample set of the target domain. These serve as the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$ used in the method of the invention, to establish the transfer relationship between the source domain samples and the target domain samples.
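The Kennard-Stone selection used for the split can be sketched as follows. This is a minimal NumPy illustration, not the patent's own implementation: the function name `kennard_stone`, the synthetic `spectra` array, and the convention of selecting the 64 training samples with KS and leaving the rest as test samples are all assumptions made for the example.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample whose nearest selected neighbour is farthest away.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected)

# Synthetic stand-in for 80 spectra with 700 wavelength points (not real corn data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(80, 700))
train_idx = kennard_stone(spectra, 64)               # assumed: KS picks the training set
test_idx = np.setdiff1d(np.arange(80), train_idx)    # the remaining samples form the test set
```

The same function could be applied again to the training samples to pick the standard sample sets.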
Step 2: Center the source domain data set and the target domain data set: compute the mean of each column and subtract it from the original data in that column, obtaining the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$. This effectively avoids bias caused by large differences in numerical magnitude.
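As a small illustration of the column centering in step 2, the sketch below uses NumPy on synthetic data. The helper name `center_columns` and the array sizes are assumptions; returning the column means so they can be reused later is a design choice of this sketch, not something the text specifies.

```python
import numpy as np

def center_columns(A):
    """Subtract each column's mean; return the centered matrix and the means."""
    mean = A.mean(axis=0)
    return A - mean, mean

# Synthetic stand-ins for a standard sample set (sizes chosen for illustration only).
rng = np.random.default_rng(0)
X_m = rng.normal(loc=5.0, size=(13, 700))   # master spectra
Y = rng.normal(loc=10.0, size=(13, 4))      # four concentration variables

X_m_center, x_mean = center_columns(X_m)
Y_center, y_mean = center_columns(Y)
```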
Step 3: As shown in fig. 2, based on the CPLS algorithm, perform principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$:
Step 3.1: Based on the PLS algorithm, use the data set $\{X_{m\_center}, Y_{center}\}$ to establish the calibration model $Y_{center} = X_{m\_center} B$, and calculate the coefficient matrix $B$, the score matrix $T$ and loading matrix $P$ of $X_{m\_center}$, and the score matrix $U$ and loading matrix $Q$ of $Y_{center}$; introduce a matrix $R$ such that $T = X_{m\_center} R$, and determine the number $l$ of latent variables;
step 3.2: calculating a predictable substance concentration variable of
Performing Singular Value Decomposition (SVD) on predictable substance concentration variables to obtain
Wherein, U c As a left singular matrix, D c As diagonal matrix of singular values, V c As a right singular matrix, V c Is an orthogonal matrix; q c =V c D c T Including l in descending order c A plurality of non-zero singular values and corresponding right singular vectors;
obtained by the formula (2)
To obtain
R c =RQ T V c D c -1 (4)
Step 3.3: calculating an unpredictable substance concentration variable as
Principal component extraction (PCA) is performed on unpredictable substance concentration variables to obtain l y The main component number is
Step 3.4: by spatially R c Projection of an input variable independent of the material concentration variable as
Wherein R is c * =(R c T R c ) -1 R c T ;
Subjecting the input variable independent of the concentration variable of the substance to principal component extraction to obtain l x The main component number is
Step 3.5: from step 3.1 to step 3.4, X is obtained m_center 、Y center The main components extracted by the PLS algorithm are respectively X m_pre =TP T 、Y pre =UQ T ,X m_center 、Y center Respectively have a residual error of X m_res_c =X m_center -X m_pre 、Y res_c =Y center -Y pre That is to obtain
From the CPLS algorithm flow it is evident that $X_{m\_center}$ and $Y_{center}$ are each divided into three parts: the principal components extracted by the PLS algorithm, the principal components extracted from the residual, and the unpredictable error. Compared with the PLS algorithm, the CPLS algorithm additionally extracts principal components from the residual, which improves data utilization.
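The formulas for steps 3.3 and 3.4 are not reproduced in the text. As a hedged reconstruction, assuming the method follows the standard concurrent PLS formulation from the process monitoring literature, the unpredictable output, the output-independent input, and the resulting decomposition could be written as below. The symbols $\tilde{Y}_c$, $\tilde{X}_c$, $T_y$, $P_y$, $T_x$, $P_x$, $\tilde{Y}$, and $\tilde{X}$ are assumptions introduced here; $U_c$, $Q_c$, $R_c$, and $R_c^*$ are those of steps 3.2 and 3.4, with $U_c = X_{m\_center} R_c$.

```latex
\begin{aligned}
\tilde{Y}_c &= Y_{center} - U_c Q_c^{T}, &
\tilde{Y}_c &= T_y P_y^{T} + \tilde{Y}
  && \text{($l_y$ principal components)} \\
\tilde{X}_c &= X_{m\_center} - U_c R_c^{*}, &
\tilde{X}_c &= T_x P_x^{T} + \tilde{X}
  && \text{($l_x$ principal components)} \\
X_{m\_center} &= U_c R_c^{*} + T_x P_x^{T} + \tilde{X}, &
Y_{center} &= U_c Q_c^{T} + T_y P_y^{T} + \tilde{Y}
\end{aligned}
```

Step 3.5 defines its residuals through $T P^T$ and $U Q^T$ instead, so this sketch may not match the patent's exact formulas term by term.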
Step 4: Apply the same method as in step 3 to the matrix $X_{s\_center}$ to perform principal component analysis, obtaining the residual $X_{s\_res\_c}$ of $X_{s\_center}$.
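The decomposition in steps 3.1 to 3.4 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: PLS is computed with a plain NIPALS loop, the unpredictable output and output-independent input follow the standard concurrent PLS formulation, and the function names (`pls_nipals`, `pca`, `cpls`) and the synthetic data are hypothetical.

```python
import numpy as np

def pls_nipals(X, Y, n_comp, n_iter=500, tol=1e-10):
    """NIPALS PLS on column-centered X (n x J) and Y (n x K)."""
    Xk, Yk = X.copy(), Y.copy()
    W, P, Q, T = [], [], [], []
    for _ in range(n_comp):
        u = Yk[:, [int(np.argmax(Yk.var(axis=0)))]]   # initial Y score
        for _ in range(n_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)
            t = Xk @ w
            q = Yk.T @ t / (t.T @ t)
            u_new = Yk @ q / (q.T @ q)
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = Xk.T @ t / (t.T @ t)
        Xk, Yk = Xk - t @ p.T, Yk - t @ q.T           # deflate both blocks
        W.append(w); P.append(p); Q.append(q); T.append(t)
    W, P, Q, T = (np.hstack(m) for m in (W, P, Q, T))
    R = W @ np.linalg.inv(P.T @ W)                    # weights such that T = X @ R
    return T, P, Q, R, R @ Q.T                        # last item is B

def pca(A, n_pc):
    """PCA via SVD: scores, loadings and residual of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T, P = U[:, :n_pc] * s[:n_pc], Vt[:n_pc].T
    return T, P, A - T @ P.T

def cpls(X, Y, n_comp, l_y, l_x):
    """Concurrent PLS decomposition of centered X and Y (assumed standard form)."""
    T, P, Q, R, B = pls_nipals(X, Y, n_comp)
    Y_hat = X @ R @ Q.T                               # predictable output
    U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    l_c = int(np.sum(s > 1e-10 * s[0]))               # number of non-zero singular values
    U_c, D_c, V_c = U[:, :l_c], np.diag(s[:l_c]), Vt[:l_c].T
    Q_c = V_c @ D_c
    R_c = R @ Q.T @ V_c @ np.linalg.inv(D_c)          # formula (4)
    T_y, P_y, Y_res = pca(Y - U_c @ Q_c.T, l_y)       # unpredictable output
    T_x, P_x, X_res = pca(X - U_c @ np.linalg.pinv(R_c), l_x)  # output-independent input
    return dict(T=T, P=P, Q=Q, R=R, B=B, U_c=U_c, Q_c=Q_c, R_c=R_c,
                T_y=T_y, P_y=P_y, Y_res=Y_res, T_x=T_x, P_x=P_x, X_res=X_res)

# Synthetic centered data (not the corn data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50)); X -= X.mean(axis=0)
Y = X[:, :4] + 0.1 * rng.normal(size=(30, 4)); Y -= Y.mean(axis=0)
model = cpls(X, Y, n_comp=5, l_y=2, l_x=3)
```

Applying the same `cpls` call to the centered slave spectra would correspond to step 4.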
In this embodiment, the optimal number of principal components for the PLS algorithm is selected as follows. The number of principal components is chosen by 10-fold cross-validation. Taking oil as an example, FIG. 5 shows how the cross-validation error of the oil content model on the target domain training set of the corn data set varies with the number of principal components. As FIG. 5 shows, the cross-validation error for oil on the corn data set reaches its global minimum at 12 principal components, so the optimal number of principal components for oil is set to 12. The optimal number of principal components for the other three components is selected in the same way.
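The selection procedure described above can be illustrated with a short sketch. It assumes a single response (PLS1, for example oil content alone), uses synthetic data rather than the corn data set, and the helper names `pls1_coef` and `cv_rmse` are hypothetical; the patent does not state how its cross-validation was implemented.

```python
import numpy as np

def pls1_coef(X, y, n_comp):
    """PLS1 regression coefficients for centered X (n x J) and y (n,)."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        c = yk @ t / tt
        Xk -= np.outer(t, p)          # deflate X
        yk -= c * t                   # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def cv_rmse(X, y, n_comp, n_folds=10, seed=0):
    """Root mean squared error of k-fold cross-validation for a given n_comp."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        b = pls1_coef(X[train] - x_mean, y[train] - y_mean, n_comp)
        pred = (X[fold] - x_mean) @ b + y_mean
        sq_err += np.sum((pred - y[fold]) ** 2)
    return np.sqrt(sq_err / len(y))

# Synthetic data with three latent factors (illustration only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(64, 3))
X = Z @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(64, 50))
y = Z @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=64)

errors = {a: cv_rmse(X, y, a) for a in range(1, 11)}
best_n_comp = min(errors, key=errors.get)   # number of components with the lowest CV error
```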
Step 5: As shown in fig. 3, use the least squares method to establish transfer matrices that map the latent structure of the target domain to the latent structure of the source domain. Calculate the scores of the source domain data set after the PLS algorithm extracts principal components from the master spectrum, $T_{m\_pre} = X_{m\_center} R$, and the scores of the target domain data set after the PLS algorithm extracts principal components from the slave spectrum, $T_{s\_pre} = X_{s\_center} R$; from $T_{m\_pre}$ and $T_{s\_pre}$, calculate the transfer matrix $M_{trans\_pre}$ by the least squares method. Calculate the scores of the source domain data set after principal components are extracted from the master spectrum residual, $T_m = X_{m\_res\_c} P$, and the scores of the target domain data set after principal components are extracted from the slave spectrum residual, $T_s = X_{s\_res\_c} P$; from $T_m$ and $T_s$, calculate the transfer matrix $M_{trans}$ by the least squares method.
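A least-squares transfer matrix of this kind can be sketched as below. The direction of the mapping (slave scores multiplied by the matrix approximate master scores) is inferred from the prediction formula in step 6.3; the helper name `transfer_matrix` and the synthetic score matrices are assumptions for illustration.

```python
import numpy as np

def transfer_matrix(T_slave, T_master):
    """Least-squares M such that T_slave @ M approximates T_master."""
    M, *_ = np.linalg.lstsq(T_slave, T_master, rcond=None)
    return M

# Synthetic scores: 13 standard samples, 3 latent variables (illustration only).
rng = np.random.default_rng(1)
T_s_pre = rng.normal(size=(13, 3))     # stands in for X_s_center @ R
M_true = rng.normal(size=(3, 3))
T_m_pre = T_s_pre @ M_true             # stands in for X_m_center @ R, exactly linear here

M_trans_pre = transfer_matrix(T_s_pre, T_m_pre)
```

The second matrix, `M_trans`, would be obtained with the same call on the residual scores `T_s` and `T_m`.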
Step 6: As shown in fig. 4, predict the substance concentration variables of the object under test:
Step 6.1: Collect the spectrum of the object under test with the infrared spectrum measurement slave instrument, and extract the spectral data by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the $J$ spectral data points of the object under test;
Step 6.2: Based on the CPLS algorithm, perform principal component analysis on $X_{s\_test}$, obtaining the residual $X_{s\_res\_c\_test}$ of $X_{s\_test}$;
Step 6.3: Predict the matrix of substance concentration variables of the object under test as $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
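The prediction formula of step 6.3 translates directly into matrix products. The sketch below only checks that the shapes fit together on random placeholder matrices; the function name `predict_concentration` is hypothetical, and whether the column means removed in step 2 are added back to the prediction is not stated in the text, so it is left out here.

```python
import numpy as np

def predict_concentration(X_s_test, X_s_res_c_test, R, P, B, M_trans_pre, M_trans):
    """Y_test_predict = (X R M_trans_pre P^T + X_res R M_trans P^T) B, as in step 6.3."""
    main_part = X_s_test @ R @ M_trans_pre @ P.T
    residual_part = X_s_res_c_test @ R @ M_trans @ P.T
    return (main_part + residual_part) @ B

# Random placeholders: 5 test spectra, J = 50 points, l = 3 latent variables, K = 4 outputs.
rng = np.random.default_rng(2)
n, J, l, K = 5, 50, 3, 4
X_s_test = rng.normal(size=(n, J))
X_s_res_c_test = rng.normal(size=(n, J))
R, P = rng.normal(size=(J, l)), rng.normal(size=(J, l))
B = rng.normal(size=(J, K))
M_trans_pre, M_trans = rng.normal(size=(l, l)), rng.normal(size=(l, l))

Y_test_predict = predict_concentration(X_s_test, X_s_res_c_test, R, P, B,
                                       M_trans_pre, M_trans)
```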
In this example, the data is predicted using a model, and the prediction error RMSEP results for different master-slave instrument combinations in the corn data set are shown in table 1 below:
TABLE 1
Analysis of Table 1 shows that the method of the invention generally performs better between the spectra mp5spec and mp6spec than in the other two groups: mp5spec and mp6spec are highly similar to each other, whereas both differ considerably from m5spec, so transfer learning between the two is more meaningful and the resulting error is smaller. With mp6spec as the master spectrum and mp5spec as the slave spectrum, the errors for moisture, oil, protein, and starch are essentially the smallest of the six experiments, while the errors for migration between m5spec and mp5spec or mp6spec are the largest of the six.
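The RMSEP values compared above are root mean squared errors of prediction, computed per concentration variable over the test samples. A minimal sketch, with the function name `rmsep` and the toy numbers chosen purely for illustration:

```python
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction, one value per column (variable)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

# Toy example with two samples and two concentration variables.
errors = rmsep([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
```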
FIGS. 6 and 7 show the fitting results of mp6spec-mp5spec and m5spec-mp5spec in this embodiment. Comparing fig. 6 with fig. 7 makes the difference in fitting quality between the two groups clear. For transfer learning between spectrum mp6spec and spectrum mp5spec, the similarity is higher and the fit is better, with most points falling on or near the fitted line. For spectrum m5spec and spectrum mp5spec, all points fall below the fitted line. The transfer learning effect for the former pair is therefore clearly better than for the latter, and migration between m5spec and mp5spec is not worthwhile because the prediction effect is poor.
Since the spectrum pair mp6spec-mp5spec has the best migration effect, this pair is chosen for comparison with other algorithms, namely Multiplicative Scatter Correction (MSC), Canonical Correlation Analysis (CCA), Slope and Bias Correction (SBC) and Piecewise Direct Standardization (PDS). Table 2 shows the RMSEP comparison under each algorithm for mp6spec-m5spec in the corn data set. As can be seen from Table 2, the CPLS-based infrared spectrum measuring instrument calibration migration method of the invention generally migrates well: it is far superior to the MSC, CCA and PDS algorithms in predicting all four components; compared with the SBC algorithm, it predicts water and oil better and differs little in predicting protein and starch.
TABLE 2
In summary, six groups of experiments were carried out on the corn data set and the results were compared with the MSC, CCA, SBC and PDS algorithms. The prediction effect of the CPLS algorithm combined with transfer learning is similar to that of the SBC algorithm and far better than that of the MSC, CCA and PDS algorithms. The method therefore eliminates the random noise measured by the master instrument and improves the data utilization rate and the modeling precision.
It is to be understood that the above-described embodiments are only some, not all, embodiments of the present invention. The above examples only explain the present invention and do not limit its scope of protection. All other embodiments that those skilled in the art can derive from the above embodiments without creative effort, that is, all modifications, equivalents and improvements made within the spirit and principle of the present application, fall within the scope of protection claimed by the present invention.
Claims (2)
1. A CPLS-based infrared spectrum measuring instrument calibration migration method is characterized by comprising the following steps:
Step 1: making the infrared spectrum measurement master instrument correspond to the source domain and the infrared spectrum measurement slave instrument correspond to the target domain; collecting the spectrum of each sample with the master instrument and the slave instrument to obtain a master spectrum and a slave spectrum respectively; extracting spectral data of the master spectrum and the slave spectrum at intervals of a nm within a wavelength range, and collecting the substance concentration variable values of each sample, to obtain the source domain data set $\{X_m, Y\}$ and the target domain data set $\{X_s, Y\}$;
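wherein $X_m = (X_{m1}, X_{m2}, \ldots, X_{mi}, \ldots, X_{mI})^T$, $X_{mi} = (x_{mi1}, x_{mi2}, \ldots, x_{mij}, \ldots, x_{miJ})$, $X_s = (X_{s1}, X_{s2}, \ldots, X_{si}, \ldots, X_{sI})^T$, $X_{si} = (x_{si1}, x_{si2}, \ldots, x_{sij}, \ldots, x_{siJ})$; $x_{mij}$ and $x_{sij}$ are the j-th spectral data of the i-th sample in the master spectrum and the slave spectrum respectively, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, J$, I is the total number of samples, and J is the total number of extracted spectral data points; $Y = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_I)^T$, $Y_i = (y_{i1}, y_{i2}, \ldots, y_{ik}, \ldots, y_{iK})$, $y_{ik}$ is the value of the k-th substance concentration variable of the i-th sample, $k = 1, 2, \ldots, K$, and K is the total number of substance concentration variables;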
Step 2: centering the source domain data set and the target domain data set to obtain the centered source domain data set $\{X_{m\_center}, Y_{center}\}$ and the centered target domain data set $\{X_{s\_center}, Y_{center}\}$;
Step 3: performing principal component analysis on the matrices $X_{m\_center}$ and $Y_{center}$ based on the CPLS algorithm:
Step 3.1: based on the PLS algorithm, establishing the calibration model $Y_{center} = X_{m\_center} B$ on the data set $\{X_{m\_center}, Y_{center}\}$, and calculating the coefficient matrix B, the score matrix T of $X_{m\_center}$, the loading matrix P of $X_{m\_center}$, the score matrix U of $Y_{center}$ and the loading matrix Q of $Y_{center}$; introducing the matrix R such that $T = X_{m\_center} R$, and determining the number l of latent variables;
step 3.2: calculating a predictable substance concentration variable of
Performing singular value decomposition on predictable substance concentration variables to obtain
where $U_c$ is the left singular matrix, $D_c$ is the diagonal matrix of singular values, $V_c$ is the right singular matrix, and $V_c$ is an orthogonal matrix; $Q_c = V_c D_c^T$ contains the $l_c$ non-zero singular values in descending order and the corresponding right singular vectors;
obtained by the formula (2)
to obtain
$R_c = R Q^T V_c D_c^{-1}$ (4)
Step 3.3: calculating an unpredictable substance concentration variable as
performing principal component extraction on the unpredictable substance concentration variable, with $l_y$ principal components, to obtain
Step 3.4: by projection on the space $R_c$, the input variable independent of the substance concentration variable is obtained as
where $R_c^{*} = (R_c^T R_c)^{-1} R_c^T$;
performing principal component extraction on the input variable independent of the substance concentration variable, with $l_x$ principal components, to obtain
Step 3.5: from step 3.1 to step 3.4, the principal components of $X_{m\_center}$ and $Y_{center}$ extracted by the PLS algorithm are $X_{m\_pre} = T P^T$ and $Y_{pre} = U Q^T$ respectively, and the residuals of $X_{m\_center}$ and $Y_{center}$ are $X_{m\_res\_c} = X_{m\_center} - X_{m\_pre}$ and $Y_{res\_c} = Y_{center} - Y_{pre}$ respectively, that is,
Step 4: performing principal component analysis on the matrix $X_{s\_center}$ by the same method as in step 3 to obtain the residual $X_{s\_res\_c}$ of $X_{s\_center}$;
Step 5: calculating the score of the source domain data set after principal components are extracted from the master spectrum by the PLS algorithm, $T_{m\_pre} = X_{m\_center} R$, and the score of the target domain data set after principal components are extracted from the slave spectrum by the PLS algorithm, $T_{s\_pre} = X_{s\_center} R$, and calculating the transfer matrix $M_{trans\_pre}$ from $T_{m\_pre}$ and $T_{s\_pre}$ by the least squares method; calculating the score of the source domain data set after principal components are extracted from the master spectrum residual, $T_m = X_{m\_res\_c} P$, and the score of the target domain data set after principal components are extracted from the slave spectrum residual, $T_s = X_{s\_res\_c} P$, and calculating the transfer matrix $M_{trans}$ from $T_m$ and $T_s$ by the least squares method;
Step 6: predicting the substance concentration variable of the measured object:
Step 6.1: collecting the spectrum of the measured object with the slave instrument by infrared spectrometry, and extracting the spectral data by the same method as in step 1 to obtain the matrix $X_{s\_test}$ formed by the J spectral data points of the measured object;
Step 6.2: performing principal component analysis on $X_{s\_test}$ based on the CPLS algorithm to obtain the residual $X_{s\_res\_c\_test}$ of $X_{s\_test}$;
Step 6.3: predicting the matrix formed by the substance concentration variables of the measured object as $Y_{test\_predict} = (X_{s\_test} R M_{trans\_pre} P^T + X_{s\_res\_c\_test} R M_{trans} P^T) B$.
2. The CPLS-based Infrared Spectroscopy measurement instrument calibration migration method according to claim 1, wherein in the step 1, the sample is grain, the spectral data is absorbance, and the substance concentration variables comprise moisture content, oil content, protein content and starch content of grain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010045812.3A CN111220565B (en) | 2020-01-16 | 2020-01-16 | CPLS-based infrared spectrum measuring instrument calibration migration method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111220565A (en) | 2020-06-02 |
CN111220565B (en) | 2022-07-29 |
Family
ID=70827000
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010045812.3A Active CN111220565B (en) | 2020-01-16 | 2020-01-16 | CPLS-based infrared spectrum measuring instrument calibration migration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111220565B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113959979B (en) * | 2021-10-29 | 2022-07-29 | 燕山大学 | Near infrared spectrum model migration method based on deep Bi-LSTM network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5606164A (en) * | 1996-01-16 | 1997-02-25 | Boehringer Mannheim Corporation | Method and apparatus for biological fluid analyte concentration measurement using generalized distance outlier detection |
CN106596450A (en) * | 2017-01-06 | 2017-04-26 | 东北大学秦皇岛分校 | Incremental method for analysis of material component content based on infrared spectroscopy |
CN106680238A (en) * | 2017-01-06 | 2017-05-17 | 东北大学秦皇岛分校 | Method for analyzing material composition content on basis of infrared spectroscopy |
CN107064054A (en) * | 2017-02-28 | 2017-08-18 | 浙江大学 | A kind of near-infrared spectral analytical method based on CC PLS RBFNN Optimized models |
CN108152239A (en) * | 2017-12-13 | 2018-06-12 | 东北大学秦皇岛分校 | The sample composition content assaying method of feature based migration |
CN108960329A (en) * | 2018-07-06 | 2018-12-07 | 浙江科技学院 | A kind of chemical process fault detection method comprising missing data |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7098037B2 (en) * | 1998-10-13 | 2006-08-29 | Inlight Solutions, Inc. | Accommodating subject and instrument variations in spectroscopic determinations |
US6748251B2 (en) * | 2000-03-31 | 2004-06-08 | Japan, As Represented By President Of Kobe University | Method and apparatus for detecting mastitis by using visual light and/or near infrared lights |
Non-Patent Citations (6)
Title |
---|
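A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring;Zimmerman N;《Atmospheric Measurement Techniques》;20181231;Vol. 11 (No. 1);full text * |
Qualitative analysis of maize haploid kernels based on calibration transfer by near-infrared spectroscopy;Li J;《Analytical Letters》;20191231;Vol. 52 (No. 2);full text * |
Optimization of a near-infrared model for wheat seed germination rate based on Si-cPLS;吴静珠 et al.;《光谱学与光谱分析》;20170415 (No. 04);full text * |
Research on a calibration transfer method based on corrected distribution difference;赵煜辉;《东北大学学报(自然科学版)》;20210331;Vol. 42 (No. 3);full text * |
Research on an NIR calibration transfer method that minimizes the mean distribution difference;赵煜辉;《光谱学与光谱分析》;20211031;Vol. 41 (No. 10);full text * |
Application of transfer learning to spectral model transfer for edible oil;刘翠玲;《食品科学技术学报》;20190731;Vol. 37 (No. 4);full text * |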
Also Published As
Publication number | Publication date |
---|---|
CN111220565A (en) | 2020-06-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||