CN111563436B - Infrared spectrum measuring instrument calibration migration method based on CT-CDD

Info

Publication number
CN111563436B
Authority
CN
China
Prior art keywords
matrix
spectrum
instrument
data set
calibration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010348512.2A
Other languages
Chinese (zh)
Other versions
CN111563436A (en
Inventor
赵煜辉
刘晓东
芦鹏程
赵子恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University Qinhuangdao Branch filed Critical Northeastern University Qinhuangdao Branch
Priority to CN202010348512.2A priority Critical patent/CN111563436B/en
Publication of CN111563436A publication Critical patent/CN111563436A/en
Application granted granted Critical
Publication of CN111563436B publication Critical patent/CN111563436B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/27 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands using photo-electric detection; circuits for computing concentration
    • G01N21/274 Calibration, base line adjustment, drift correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2201/00 Features of devices classified in G01N21/00
    • G01N2201/12 Circuits of general importance; Signal processing
    • G01N2201/129 Using chemometrical methods

Abstract

The invention relates to the technical field of transfer learning in machine learning, and provides a CT-CDD-based infrared spectrum measuring instrument calibration migration method. First, a source domain data set $\{X_m, y_m\}$ and a target domain data set $\{X_s\}$ are collected, and the KS algorithm is used to divide out a source domain calibration set
Figure DDA0002471095940000011
which is then centered; next, on the centered source domain calibration set
Figure DDA0002471095940000012
a PLS calibration model is established; then, the characteristic spectrum $T_m$ of the master instrument and the pseudo characteristic spectrum of the slave instrument
Figure DDA0002471095940000013
are calculated; using OLS and the data set $\{T_m, y_m\}$, the cluster number K is determined by cross validation, and $\{T_m, y_m\}$ and
Figure DDA0002471095940000014
are clustered separately; on each sub-dataset
Figure DDA0002471095940000015
the k-th OLS model is established and the transformation matrix $M_k$ is calculated; finally, the substance concentration variable of the measured object set is predicted. The invention does not need standard samples to construct the migration model, and can greatly improve the precision and efficiency of the calibration migration of infrared spectrum measuring instruments.

Description

Infrared spectrum measuring instrument calibration migration method based on CT-CDD
Technical Field
The invention relates to the technical field of transfer learning in machine learning, and in particular to a CT-CDD-based infrared spectrum measuring instrument calibration migration method.
Background
The near infrared region lies between visible light and mid-infrared light in the electromagnetic spectrum, and was the first non-visible region to be discovered, by William Herschel in the 19th century. In October 1985, the American Society for Testing and Materials (ASTM) defined the near infrared region as the wavelength range 780-2526 nm, corresponding to wavenumbers 12820-3959 cm^-1. By the 1950s, near infrared spectroscopy was already being applied in some fields. In the 1960s, however, interest in the near infrared region gradually declined, apart from a few users with specific analytical applications, because novel analysis techniques kept emerging and near infrared spectroscopy had drawbacks of its own.
After that, research on near infrared spectroscopy entered a long quiet period. As research on chemometrics grew and the manufacturing of spectroscopic instruments continued to improve, near infrared spectroscopy advanced again in the mid-1980s. Unlike traditional analysis techniques, near infrared spectroscopy is an indirect analysis technique: information such as substance content cannot be obtained directly, and a calibration model must be built from known samples to predict the concentration information of unknown samples and thereby complete a quantitative or qualitative analysis. The analysis process of near infrared spectroscopy is shown in FIG. 1, and the main steps are as follows:
(1) selecting representative samples to form a calibration set and measuring the near infrared spectra of the calibration set samples; the collected calibration set samples must be representative;
(2) after collecting the calibration set samples, measuring the concentration of the substance of interest in each sample by standard analytical chemistry methods;
(3) selecting a suitable algorithm to model the relationship between the measured calibration set spectra and the corresponding substance concentrations. This is the core step of near infrared quantitative analysis: a calibration model is built from the preprocessed near infrared spectra and the concentration information, the model parameters are generally determined by cross validation, and finally the performance of the model is checked;
(4) after the multivariate calibration model is established, measuring the near infrared spectrum of a test sample and predicting its substance content using the established calibration model.
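The four steps above can be sketched on synthetic data. This is only an illustration under stated assumptions: the "spectra" are random numbers with four channels, the calibration model is plain least squares with an intercept (a stand-in for whatever algorithm is chosen in step (3)), and the function names `fit_calibration`, `predict` and `rmsecv_loo` are hypothetical.

```python
import numpy as np

def fit_calibration(X, y):
    """Least-squares calibration with an intercept; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    coef, *_ = np.linalg.lstsq(X - x_mean, y - y_mean, rcond=None)
    return coef, y_mean - x_mean @ coef

def predict(model, X):
    coef, intercept = model
    return X @ coef + intercept

def rmsecv_loo(X, y):
    """Leave-one-out root mean square error of cross validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out sample i
        model = fit_calibration(X[keep], y[keep])
        errors.append(predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

rng = np.random.default_rng(0)
# Step (1): spectra of the calibration samples (synthetic, 30 samples x 4 channels).
X_cal = rng.normal(size=(30, 4))
# Step (2): reference concentrations (synthetic, with small measurement noise).
true_coef = np.array([0.5, -1.0, 2.0, 0.3])
y_cal = X_cal @ true_coef + 10.0 + rng.normal(scale=0.01, size=30)
# Step (3): build the calibration model and check it by cross validation.
model = fit_calibration(X_cal, y_cal)
cv_error = rmsecv_loo(X_cal, y_cal)
# Step (4): predict the concentration of new test samples.
X_new = rng.normal(size=(5, 4))
y_new_pred = predict(model, X_new)
```

Real near infrared spectra have hundreds of strongly collinear channels, which is why a latent-variable method such as PLS is normally used in step (3) instead of plain least squares.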
Modern near infrared spectroscopy rests on a solid theoretical basis and extensive practical experience. Unlike other analytical techniques, it draws on the theory of several disciplines, including spectroscopy, chemometrics, and computer technology.
Near infrared spectroscopy has many advantages. It can measure the chemical composition and properties of a sample within minutes. A single acquisition of the near infrared spectrum of the sample is enough to analyze several components simultaneously, in some cases more than ten indicators. The sample can be analyzed directly after simple pretreatment and is not damaged, so the detection is nondestructive. No chemical reagents are needed, which greatly reduces the analysis cost and causes no environmental pollution, so the technique counts as "green analysis". The near infrared spectrum mainly reflects the chemical bonds of hydrogen-containing groups in organic molecules, such as C-H, O-H, N-H, and S-H, and is therefore well suited to the quantitative or qualitative analysis of hydrogen-containing organic matter. Its analysis range covers most organic mixtures and compounds, and these advantages make its field of application extremely wide. It plays an indispensable role in many industries: in agriculture it is used to measure component contents, such as the moisture or protein content of corn; in medicine it is used to measure the component contents of drugs; and it is also used in biological, food, and environmental testing.
Machine learning and data mining techniques have achieved significant success in many areas of knowledge engineering, including classification, regression, and clustering. Traditional machine learning methods assume that the training data and the test data follow the same distribution, so that a model built on the training data can predict the test data. In practical application scenarios, the two distributions often differ, and in some cases training data is expensive or impossible to collect. If the distributions of the training data and the test data differ significantly, the predictions for the test data will deviate considerably from the actual values, and most statistical models have to be rebuilt from newly collected training data. In such cases it is desirable to transfer knowledge between task domains, and this approach is called transfer learning. Transfer learning improves a learner in one domain by transferring information from a related domain.
Multivariate calibration is a very useful tool for extracting chemical information from spectral signals, and the multivariate calibration model is crucial for many analytical measurements. It has been applied to a variety of analytical techniques, and its importance is most evident in near infrared (NIR) spectroscopy. Constructing a robust calibration model usually requires a large investment of manpower and material resources. Problems arise when samples are measured on different instruments or under different environmental conditions. Even for the same samples, the spectral matrices measured by two different instruments differ, and so do the models built on them. A model built on one instrument generally cannot predict spectra measured by a second instrument. One way to solve this problem is to re-measure every sample and build a new model on the newly acquired spectra, but this is not practical. Since establishing a robust calibration model requires significant cost and time, an acceptable way to avoid this expense is to transfer the model. In machine learning this approach is called transfer learning and, more specifically, the case where the tasks are the same but the domains differ is called domain adaptation. In chemometrics it is called calibration migration (calibration transfer).
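The failure of a master-instrument model on slave-instrument spectra can be illustrated with a small simulation. Everything here is an assumption made for the illustration: the spectra are synthetic, and the slave instrument is modelled as a simple gain of 1.1 and offset of 0.05 applied to the master response, which is far simpler than a real instrument difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_channels = 40, 5
concentration = rng.uniform(8.0, 12.0, size=n_samples)
pure_spectrum = np.linspace(0.2, 1.0, n_channels)

# Master spectra: linear response to concentration plus small noise.
X_master = np.outer(concentration, pure_spectrum)
X_master += rng.normal(scale=0.01, size=(n_samples, n_channels))
# Slave spectra of the same samples: hypothetical gain and offset distortion.
X_slave = 1.1 * X_master + 0.05

# Calibration model (least squares with intercept) built on the master only.
design = np.column_stack([X_master, np.ones(n_samples)])
coef, *_ = np.linalg.lstsq(design, concentration, rcond=None)

def rmse(X):
    """Prediction error of the master model on spectra X of the same samples."""
    pred = np.column_stack([X, np.ones(len(X))]) @ coef
    return float(np.sqrt(np.mean((pred - concentration) ** 2)))

rmse_master = rmse(X_master)   # small: the model fits its own instrument
rmse_slave = rmse(X_slave)     # much larger: the distortion biases predictions
```

In this toy setting the slave error is dominated by the 10 % gain, which shifts every prediction by roughly a tenth of the concentration value.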
Most calibration migration methods construct the migration model from a set of standard samples, which must be measured on both the master instrument and the slave instrument, and various such standard-sample-based methods have been proposed. For example, direct standardization (DS) and piecewise direct standardization (PDS) correct the spectral differences between the master and slave instruments using a set of standard samples. In DS, each wavelength of the master is related to all wavelengths of the slave. In PDS, each wavelength of the master is related to a wavelength window of the slave, and a banded migration matrix is finally formed from the regression coefficients of each window. This is consistent with the experimental observation that, across migration methods, the spectral dependence between the master and the slave is confined to a small region. The key to PDS is the choice of the window size and of the number of standard samples; because it builds many regression models, it requires a large amount of computation. PDS is one of the most widely used migration methods and often serves as a baseline for new techniques. Slope and bias correction (SBC) assumes a linear relationship between the predicted values of different instruments: first, the regression coefficients between the spectra and the response values are calculated; then the predicted values of the master and slave instruments are calculated from the regression coefficients; finally, a linear fit is made between the predicted values. Liang et al. proposed a calibration migration method based on canonical correlation analysis that successfully corrects the differences between spectra. First, a PLS model is constructed from the calibration set of the master instrument; part of the calibration sets of the master and slave instruments is selected as standard samples; features are then extracted from each by canonical correlation analysis (CCA). The relationship between the master spectra and the slave spectra is established by ordinary least squares (OLS), and the spectral differences are thereby corrected. Other calibration migration methods have also been proposed, such as spectral regression (SR), transfer by orthogonal projection (TOP), single wavelength standardization (SWS), multi-spectral calibration migration based on independent component analysis, and generalized least squares weighting (GLSW), all of which require standard samples.
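As a point of reference for the standard-sample methods described above, the following is a minimal sketch of direct standardization in its common textbook form, where a full transformation matrix F is estimated so that the slave spectra of the standard samples, multiplied by F, approximate the master spectra. The data are synthetic, the function name `direct_standardization` is hypothetical, and the linear distortion between instruments is an assumption of the example.

```python
import numpy as np

def direct_standardization(X_slave_std, X_master_std):
    """Least-squares F such that X_slave_std @ F approximates X_master_std.

    Every master channel is regressed on all slave channels, which is what
    distinguishes DS from the windowed regressions used in PDS.
    """
    return np.linalg.pinv(X_slave_std) @ X_master_std

rng = np.random.default_rng(2)
X_master_std = rng.normal(size=(12, 4))                 # 12 standard samples, 4 channels
distortion = np.eye(4) + 0.1 * rng.normal(size=(4, 4))  # hypothetical instrument difference
X_slave_std = X_master_std @ distortion                 # same samples on the slave

F = direct_standardization(X_slave_std, X_master_std)
X_corrected = X_slave_std @ F                           # slave spectra mapped to master space
```

With real spectra there are usually far fewer standard samples than channels, so the pseudo-inverse returns the minimum-norm solution rather than an exact inverse of the distortion.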
As can be seen from the above, many methods exist in the prior art for developing a relatively stable calibration model, but changes in environmental conditions and adjustments to the measuring instrument degrade the prediction performance of the calibration model and can even make it fail. It is therefore necessary to transfer the knowledge in the established calibration model to the spectra to be measured, helping to predict them while saving considerable expense. Most existing calibration migration methods that significantly improve the predictive performance of the model need standard samples to construct the migration model. The standard samples should closely match the samples from which the calibration model was built and must show enough variability to account for the differences between the two instruments. The volatility and reactivity of the components make it very challenging to keep the standard samples intact. Moreover, in some practical applications it is difficult or even impossible to obtain standard samples, that is, to measure their spectra on both the master and slave instruments. Although a few calibration migration methods do not require standard samples, their prediction performance falls well short of that of the methods that use standard samples.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides the CT-CDD-based infrared spectrum measuring instrument calibration migration method, which does not need to use a standard sample to construct a migration model, and can greatly improve the precision and efficiency of infrared spectrum measuring instrument calibration migration.
The technical scheme of the invention is as follows:
a CT-CDD-based infrared spectrum measuring instrument calibration migration method is characterized by comprising the following steps:
Step 1: letting the infrared spectrum measurement master instrument correspond to the source domain and the infrared spectrum measurement slave instrument correspond to the target domain; collecting the spectrum of each sample with the master instrument and with the slave instrument to obtain a master spectrum and a slave spectrum respectively; extracting spectral data from the master spectrum and the slave spectrum at intervals of a nm within a wavelength range; and collecting the substance concentration variable value of each sample, to obtain a source domain data set $\{X_m, y_m\}$ and a target domain data set $\{X_s\}$;
wherein $X_m$ and $X_s$ are respectively the master spectral matrix and the slave spectral matrix,
Figure BDA0002471095920000041
Figure BDA0002471095920000042
where the master spectral vector and the slave spectral vector of the i-th sample are indexed by i = 1, 2, ..., I, and I is the total number of samples,
Figure BDA0002471095920000043
where the j-th master spectral data point and the j-th slave spectral data point of the i-th sample are indexed by j = 1, 2, ..., J, and J is the total number of extracted spectral data points; $y_m$ is the matrix of substance concentration variable values,
Figure BDA0002471095920000044
is the value of the substance concentration variable for the i-th sample;
step 2: source domain data set { X using KS algorithmm,ymDividing into a source domain calibration set
Figure BDA0002471095920000045
And source domain test set
Figure BDA0002471095920000046
And step 3: set of source domain calibrations
Figure BDA0002471095920000047
Performing centralization treatment to obtain a source domain calibration set after centralization treatment
Figure BDA0002471095920000048
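The centering in step 3 can be sketched as column-mean subtraction. The exact centering formula is an image in the source and is not reproduced here; subtracting the calibration-set mean from both the spectra and the concentrations is the usual convention and is assumed in this sketch. The function name `center` and the tiny arrays are hypothetical.

```python
import numpy as np

def center(X_cal, y_cal):
    """Subtract the calibration-set means; also return them for later reuse."""
    x_mean = X_cal.mean(axis=0)      # one mean per spectral channel
    y_mean = y_cal.mean()            # mean concentration
    return X_cal - x_mean, y_cal - y_mean, x_mean, y_mean

X_cal = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])   # 3 samples, 2 channels
y_cal = np.array([10.0, 20.0, 30.0])
X_centered, y_centered, x_mean, y_mean = center(X_cal, y_cal)
```

Keeping `x_mean` and `y_mean` matters because spectra measured later have to be shifted by the same calibration means before a model built on centered data is applied to them.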
Step 4: based on the PLS algorithm, using the data set
Figure BDA0002471095920000049
establishing the calibration model
Figure BDA00024710959200000410
and calculating, for
Figure BDA00024710959200000411
the weight matrix $W_m$, for
Figure BDA00024710959200000412
the load matrix $P_m$, and the regression coefficient matrix $\beta_m$;
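Step 4 names the PLS outputs but not the PLS variant. A minimal sketch, assuming the common PLS1 NIPALS algorithm for a single concentration variable and centered inputs, shows how a weight matrix, a load matrix and a regression coefficient vector of this kind can be obtained. The function name `pls1_nipals` and the synthetic data are hypothetical, and the weights and loads are stored as (channels x components) arrays.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """PLS1 by NIPALS on centered X (I x J) and y (I,). Returns W, P, beta."""
    X = np.array(X, dtype=float)     # work on copies, the inputs stay untouched
    y = np.array(y, dtype=float)
    n_channels = X.shape[1]
    W = np.zeros((n_channels, n_components))   # weights, one column per component
    P = np.zeros((n_channels, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = X @ w                    # score vector
        tt = t @ t
        p = X.T @ t / tt
        q[a] = (y @ t) / tt
        X -= np.outer(t, p)          # deflate X
        y -= q[a] * t                # deflate y
        W[:, a], P[:, a] = w, p
    beta = W @ np.linalg.solve(P.T @ W, q)     # coefficients in the original X space
    return W, P, beta

rng = np.random.default_rng(4)
X_raw = rng.normal(size=(20, 3))
X_c = X_raw - X_raw.mean(axis=0)               # centered spectra (synthetic)
true_beta = np.array([1.0, -2.0, 0.5])
y_c = X_c @ true_beta                          # noise-free centered response
W_m, P_m, beta_m = pls1_nipals(X_c, y_c, n_components=3)
```

With as many components as channels and noise-free data, PLS reproduces the least-squares solution, which is what the example relies on; in practice the number of components is much smaller than the number of channels and is chosen by cross validation.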
Step 5: constructing a migration model:
Step 5.1: calculating the characteristic spectrum matrix of the infrared spectrum measurement master instrument
$T_m = X_m W_m (P_m W_m)^{-1}$
and calculating the pseudo characteristic spectrum matrix of the infrared spectrum measurement slave instrument
Figure BDA0002471095920000051
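The projection in step 5.1 can be sketched as follows. Two assumptions are made and should be read as such: the weights and loads are stored as (channels x components) arrays, so the product inside the inverse is written with a transpose as in the usual PLS score formula; and the slave formula, which is an image in the source, is assumed to apply the same master-instrument projection to the slave spectra. The matrices below are small hypothetical values chosen so the result can be checked by hand.

```python
import numpy as np

def project_to_pls_subspace(X, W, P):
    """Scores T = X W (P^T W)^{-1} for W, P of shape (channels, components)."""
    return X @ W @ np.linalg.inv(P.T @ W)

# Hypothetical PLS weights and loads for J = 3 channels and A = 2 components.
W_m = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])
P_m = np.array([[2.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])          # P^T W = diag(2, 1)

X_m = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])     # master spectra (2 samples)
X_s = 1.1 * X_m + 0.05                # hypothetical slave spectra

T_m = project_to_pls_subspace(X_m, W_m, P_m)   # master characteristic spectra
T_s = project_to_pls_subspace(X_s, W_m, P_m)   # assumed slave pseudo characteristic spectra
```

Because the projection is linear, any distribution difference between the master and slave spectra carries over into the low-dimensional scores, which is where the later steps correct it.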
Step 5.2: for each cluster number L belongs to L*Using k-means clustering algorithm to data set { T }m,ymThe characteristic spectrum vectors of the data set are clustered, and the data set is subjected to the clustering of the characteristic spectrum vectors of the data set Tm,ymDivide into L sub-datasets
Figure BDA0002471095920000052
l=1,2,...,L;
On the basis of OLS algorithm, the first sub-data set
Figure BDA0002471095920000053
Establishing an initial least squares model
Figure BDA0002471095920000054
l=1,2,...,L;
Calculating the cross validation error RMSECV of L initial least square models under each cluster numberLDetermining min { RMSECV }L|L∈L*The corresponding cluster number is the final cluster number K;
wherein L is*To be a set of the number of clusters,
Figure BDA0002471095920000055
is the l-th initial sub-feature spectral matrix,
Figure BDA0002471095920000056
is composed of
Figure BDA0002471095920000057
Matrix of values of variables of the concentration of substance of the corresponding sample, beta0_lIs the first initial regression coefficient matrix;
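The selection of the cluster number in step 5.2 can be sketched as below. The source does not spell out how the errors of the L per-cluster models are pooled into one cross-validation error, so this sketch assumes a 5-fold split inside each cluster, least squares without an intercept, and a single pooled root mean square error. The k-means routine is a plain Lloyd iteration with a deterministic farthest-point start, and all names and data are hypothetical.

```python
import numpy as np

def assign(T, centers):
    """Index of the nearest center for every row of T."""
    return np.argmin(((T[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)

def kmeans(T, n_clusters, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [T[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((T - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(T[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(T, centers)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = T[labels == k].mean(axis=0)
    return assign(T, centers)

def rmsecv_for_cluster_number(T, y, n_clusters, n_folds=5):
    """Pooled cross-validation error of one OLS model per cluster (assumed scheme)."""
    labels = kmeans(T, n_clusters)
    sq_errors = []
    for k in range(n_clusters):
        Tk, yk = T[labels == k], y[labels == k]
        folds = np.arange(len(yk)) % n_folds
        for f in range(n_folds):
            test = folds == f
            if test.sum() == 0 or (~test).sum() <= T.shape[1]:
                continue                      # too few samples to fit and test
            beta, *_ = np.linalg.lstsq(Tk[~test], yk[~test], rcond=None)
            sq_errors.extend((Tk[test] @ beta - yk[test]) ** 2)
    return float(np.sqrt(np.mean(sq_errors)))

def choose_cluster_number(T, y, candidates):
    scores = {L: rmsecv_for_cluster_number(T, y, L) for L in candidates}
    return min(scores, key=scores.get), scores

# Synthetic scores: two well-separated groups that follow different linear models.
rng = np.random.default_rng(5)
T_a = rng.normal(size=(20, 2)) + np.array([-5.0, 0.0])
T_b = rng.normal(size=(20, 2)) + np.array([5.0, 0.0])
T = np.vstack([T_a, T_b])
y = np.concatenate([T_a @ np.array([1.0, 2.0]), T_b @ np.array([-1.0, 0.5])])
y += rng.normal(scale=0.01, size=40)

best_K, scores = choose_cluster_number(T, y, candidates=[1, 2, 3])
```

A single model cannot fit both groups, so the error for one cluster is large, while two or more clusters bring it down to the noise level.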
Step 5.3: according to the cluster number K, clustering the characteristic spectrum vectors of the data set $\{T_m, y_m\}$ using the k-means clustering algorithm, and dividing the data set $\{T_m, y_m\}$ into K sub-datasets
Figure BDA0002471095920000058
k = 1, 2, ..., K;
according to the cluster number K, clustering the pseudo characteristic spectrum vectors of the data set
Figure BDA0002471095920000059
using the k-means clustering algorithm, and dividing the data set
Figure BDA00024710959200000510
into K sub-datasets
Figure BDA00024710959200000511
k = 1, 2, ..., K;
wherein the characteristic spectrum vectors are the row vectors of $T_m$, and the pseudo characteristic spectrum vectors are the row vectors of
Figure BDA00024710959200000512
and wherein
Figure BDA00024710959200000513
are respectively the k-th sub characteristic spectrum matrix and the k-th sub pseudo characteristic spectrum matrix, and
Figure BDA00024710959200000514
is the matrix formed by the substance concentration variable values of the samples corresponding to
Figure BDA00024710959200000515
Step 5.4: based on the OLS algorithm, using the k-th sub-dataset
Figure BDA00024710959200000516
establishing the k-th least squares model
Figure BDA00024710959200000517
and calculating the k-th regression coefficient matrix $\beta_k$;
Step 5.5: calculating the k-th transformation matrix
Figure BDA00024710959200000518
wherein
Figure BDA00024710959200000519
are respectively the covariance matrices of
Figure BDA00024710959200000520
Step 6: predicting the substance concentration variable of the measured object set:
Step 6.1: collecting the spectrum of each measured object in the measured object set using the infrared spectrum measurement slave instrument, and extracting the spectral data by the same method as in step 1, to obtain the slave spectral matrix of the measured object set
Figure BDA00024710959200000521
Step 6.2: calculating the pseudo characteristic spectrum matrix of the measured object set under the infrared spectrum measurement slave instrument as
Figure BDA0002471095920000061
Step 6.3: according to the cluster number K, clustering the pseudo characteristic spectrum vectors of the data set
Figure BDA0002471095920000062
using the k-means clustering algorithm, and dividing the data set
Figure BDA0002471095920000063
into K sub-datasets
Figure BDA0002471095920000064
k = 1, 2, ..., K; wherein
Figure BDA0002471095920000065
is the k-th sub pseudo characteristic spectrum matrix of the measured object set;
Step 6.4: using the k-th transformation matrix $M_k$ to apply a transformation correction to
Figure BDA0002471095920000066
to obtain the k-th transformation-corrected sub pseudo characteristic spectrum matrix as
Figure BDA0002471095920000067
Step 6.5: calculating, for the k-th transformation-corrected sub pseudo characteristic spectrum matrix
Figure BDA0002471095920000068
the matrix of predicted substance concentration variable values of the corresponding measured objects as
Figure BDA0002471095920000069
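The data flow of steps 6.3 to 6.5 can be sketched as follows. The formulas for the correction and the prediction are images in the source, so this sketch makes explicit assumptions: the correction is taken to be a right-multiplication of the sub pseudo characteristic spectrum matrix by $M_k$, the prediction multiplies the corrected scores by $\beta_k$, and the calibration mean of the concentration is added back because the model was built on centered data. The cluster labels, the matrices in `M` and the vectors in `beta` are hypothetical inputs rather than values computed by the method.

```python
import numpy as np

def predict_by_cluster(T_test, labels, M, beta, y_mean=0.0):
    """Correct each cluster's scores with its M_k, then predict with its beta_k."""
    y_hat = np.empty(len(T_test))
    for k in np.unique(labels):
        rows = labels == k
        corrected = T_test[rows] @ M[k]          # assumed form of the correction
        y_hat[rows] = corrected @ beta[k] + y_mean
    return y_hat

# Hypothetical pseudo characteristic spectra of three measured objects (2 components).
T_test = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0]])
labels = np.array([0, 1, 0])                     # cluster of each object (K = 2)
M = {0: np.eye(2), 1: 2.0 * np.eye(2)}           # hypothetical transformation matrices
beta = {0: np.array([1.0, 1.0]),
        1: np.array([0.5, -1.0])}                # hypothetical regression coefficients

y_hat = predict_by_cluster(T_test, labels, M, beta, y_mean=10.0)
```

Writing the results back through the boolean mask keeps the predictions in the original order of the measured objects, whatever cluster each one falls into.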
The invention has the beneficial effects that:
The invention performs calibration migration by correcting the data distribution difference of the PLS subspace (CT-CDD). Specifically, a PLS model of the master instrument is established and the spectra of the master and slave instruments are projected into the PLS subspace; the latent variables of the different spectra are clustered separately; regression models between the latent variables and the concentration information of the master instrument are established by ordinary least squares; the characteristic spectra with the closest data distribution between the two instruments are found; and a transformation function is calculated for each so that, when the substance concentration variable of a measured object is predicted, the prediction can be corrected by the corresponding transformation function. No standard sample is needed to construct the migration model at any point in the process, and the precision and efficiency of infrared spectrum calibration migration are greatly improved. This solves the technical problems of the prior art that calibration migration methods capable of significantly improving the predictive performance of the model need standard samples to construct the migration model, that standard samples are difficult or even impossible to obtain and their integrity is hard to guarantee, and that the few calibration migration methods that need no standard samples have poor prediction performance.
Drawings
FIG. 1 is a schematic diagram of an analysis process of near infrared spectroscopy.
FIG. 2 is a flow chart of the CT-CDD-based infrared spectroscopy measurement apparatus calibration migration method of the present invention.
FIG. 3 is a diagram showing the spectral differences between different instruments in the first and second embodiments.
FIG. 4 is a schematic diagram of the prediction results of the CT-CDD-based infrared spectrum measuring instrument calibration migration method of the present invention and five other calibration migration methods on M5 × MP5 in the first embodiment.
FIG. 5 is a schematic diagram of the prediction results of the CT-CDD-based infrared spectrum measuring instrument calibration migration method of the present invention and five other calibration migration methods on M5 × MP6 in the first embodiment.
FIG. 6 is a schematic diagram of the prediction results of the CT-CDD-based infrared spectrum measuring instrument calibration migration method of the present invention and five other calibration migration methods on MP5 × MP6 in the first embodiment.
FIG. 7 is a schematic diagram of the prediction results of the CT-CDD-based infrared spectrum measuring instrument calibration migration method of the present invention and five other calibration migration methods on B1 × B2 in the second embodiment.
FIG. 8 is a schematic diagram of the prediction results of the CT-CDD-based infrared spectrum measuring instrument calibration migration method of the present invention and five other calibration migration methods on B1 × B3 in the second embodiment.
FIG. 9 is a schematic diagram of the prediction results of the CT-CDD-based infrared spectrum measuring instrument calibration migration method of the present invention and five other calibration migration methods on B3 × B2 in the second embodiment.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
Addressing the technical problems that, in the prior art, standard samples are required to construct the migration model, standard samples are difficult or even impossible to obtain and their integrity is hard to guarantee, and calibration migration methods that need no standard samples have poor prediction performance, and considering the high dimensionality and multicollinearity of spectral data, the invention uses transfer learning from machine learning to provide a calibration migration method that needs no standard samples. The prediction performance of the CT-CDD of the invention was compared with that of SBC, PDS, CCACT, TCR and CTAI on two near infrared spectral data sets. Without any standard sample, the prediction performance obtained by the method is superior to that of the classical calibration migration methods that use standard samples.
Example one
As shown in FIG. 2, the CT-CDD-based infrared spectroscopy measurement instrument calibration migration method of the present invention comprises the following steps:
Step 1: letting the infrared spectrum measurement master instrument correspond to the source domain and the infrared spectrum measurement slave instrument correspond to the target domain; collecting the spectrum of each sample with the master instrument and with the slave instrument to obtain a master spectrum and a slave spectrum respectively; extracting spectral data from the master spectrum and the slave spectrum at intervals of a nm within a wavelength range; and collecting the substance concentration variable value of each sample, to obtain a source domain data set $\{X_m, y_m\}$ and a target domain data set $\{X_s\}$;
wherein $X_m$ and $X_s$ are respectively the master spectral matrix and the slave spectral matrix,
Figure BDA0002471095920000071
Figure BDA0002471095920000072
where the master spectral vector and the slave spectral vector of the i-th sample are indexed by i = 1, 2, ..., I, and I is the total number of samples,
Figure BDA0002471095920000073
where the j-th master spectral data point and the j-th slave spectral data point of the i-th sample are indexed by j = 1, 2, ..., J, and J is the total number of extracted spectral data points; $y_m$ is the matrix of substance concentration variable values,
Figure BDA0002471095920000074
is the value of the substance concentration variable for the i-th sample.
In the first embodiment, the samples are corn and the spectral data are absorbance values. The substance concentration variable can be moisture, oil, protein or starch content; this embodiment uses moisture content to verify the method of the invention. The corn data set consists of data measured on the same I = 80 samples with three near-infrared spectrum measuring instruments (M5, MP5, MP6). Each instrument measures the spectrum at intervals of a = 2 nm over the wavelength range 1100-2498 nm, giving J = 700 channels. The spectral differences between M5-MP5, M5-MP6 and MP5-MP6 are shown in FIG. 3A, FIG. 3B and FIG. 3C, respectively.
Step 2: Use the KS algorithm to divide the source-domain data set {X_m, y_m} into a source-domain calibration set
Figure BDA0002471095920000081
and a source-domain test set
Figure BDA0002471095920000082
In the first embodiment, the Kennard-Stone (KS) algorithm divides the 80 corn samples into two groups: the source-domain data of the first group of 64 samples form the source-domain calibration set, and the source-domain data of the second group of 16 samples form the source-domain test set.
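The Kennard-Stone split used in Step 2 can be sketched as follows. This is a minimal illustration of the usual max-min distance selection, assuming Euclidean distance on the spectra; the function name `kennard_stone` and the toy data are hypothetical and are not taken from the patent.

```python
def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices using max-min distance selection.

    X is a list of equal-length numeric rows; n_select must be at least 2.
    """
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    n = len(X)
    # Seed with the two samples that are farthest apart.
    _, i0, j0 = max(
        (sq_dist(X[i], X[j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]

    while len(selected) < n_select and remaining:
        # Pick the candidate whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(sq_dist(X[i], X[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy one-dimensional "spectra" (hypothetical data).
spectra = [[0.0], [1.0], [2.0], [10.0]]
calibration_idx, test_idx = kennard_stone(spectra, 3)
```

For the corn data, `n_select` would be 64 of the 80 samples, with the remaining 16 forming the test set.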
Step 3: Center the source-domain calibration set
Figure BDA0002471095920000083
to obtain the centered source-domain calibration set
Figure BDA0002471095920000084
Step 4: Based on the PLS algorithm, use the data set
Figure BDA0002471095920000085
to establish the calibration model
Figure BDA0002471095920000086
and calculate for
Figure BDA0002471095920000087
the weight matrix W_m,
Figure BDA0002471095920000088
the loading matrix P_m and the regression coefficient matrix β_m.
Step 5: Construct the migration model:
Step 5.1: Calculate the characteristic spectrum matrix of the master infrared spectrum measuring instrument
T_m = X_m W_m (P_m W_m)^{-1}
and calculate the pseudo-characteristic spectrum matrix of the slave infrared spectrum measuring instrument
Figure BDA0002471095920000089
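The projection in Step 5.1 can be illustrated with a small NumPy sketch. It assumes a NIPALS PLS1 fit, loadings stored as a J x A matrix (so the inner product is written `P.T @ W`), and a slave matrix that is centered with its own mean and projected with the same matrix; none of these details is confirmed by the text, and the function name `pls1_nipals` and the synthetic data are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit PLS1 with NIPALS on centered X (n x J) and centered y (n,).

    Returns weights W (J x A), loadings P (J x A), scores T (n x A),
    the projection matrix R = W (P^T W)^-1 and the coefficients beta (J,).
    """
    Xr, yr = X.copy(), y.copy()
    W, P, T, q = [], [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q_a = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)   # deflate X
        yr = yr - q_a * t          # deflate y
        W.append(w)
        P.append(p)
        T.append(t)
        q.append(q_a)
    W, P, T = np.column_stack(W), np.column_stack(P), np.column_stack(T)
    R = W @ np.linalg.inv(P.T @ W)
    beta = R @ np.array(q)
    return W, P, T, R, beta


# Synthetic master and slave spectra (hypothetical data).
rng = np.random.default_rng(0)
X_m = rng.normal(size=(20, 6))
y_m = X_m @ rng.normal(size=6) + 0.01 * rng.normal(size=20)
X_s = 1.05 * X_m + 0.02 * rng.normal(size=X_m.shape)

Xc_m = X_m - X_m.mean(axis=0)
yc_m = y_m - y_m.mean()
W, P, T, R, beta = pls1_nipals(Xc_m, yc_m, n_components=3)

T_m = Xc_m @ R                        # master characteristic spectrum matrix
T_s = (X_s - X_s.mean(axis=0)) @ R    # slave pseudo-characteristic spectrum (assumed form)
```

Projecting the master spectra with `R` reproduces the NIPALS scores, which is why the same matrix can be reused to map slave spectra into the master PLS subspace.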
Step 5.2: for each cluster number L belongs to L*Using k-means clustering algorithm to data set { T }m,ymThe characteristic spectrum vectors of the data set are clustered, and the data set is subjected to the clustering of the characteristic spectrum vectors of the data set Tm,ymDivide into L sub-datasets
Figure BDA00024710959200000810
l=1,2,...,L;
On the basis of OLS algorithm, the first sub-data set
Figure BDA00024710959200000811
Establishing an initial least squares model
Figure BDA00024710959200000812
l=1,2,...,L;
Calculating the cross validation error RMSECV of L initial least square models under each cluster numberLDetermining min { RMSECV }L|L∈L*The corresponding cluster number is the final cluster number K;
wherein L is*To be a set of the number of clusters,
Figure BDA0002471095920000091
is the l-th initial sub-feature spectral matrix,
Figure BDA0002471095920000092
is composed of
Figure BDA0002471095920000093
Matrix of values of variables of the concentration of substance of the corresponding sample, beta0_lIs the first initial regression coefficient matrix;
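The cluster-number search in Step 5.2 can be sketched as follows. The sketch assumes a plain Lloyd k-means, OLS without intercept on centered scores, and leave-one-out cross-validation pooled over the clusters (the embodiments mention ten-fold cross-validation; leave-one-out is used here only to keep the example short). The text does not say how the errors of the L sub-models are combined, so the pooling is an assumption, and all function names and data are hypothetical.

```python
import numpy as np


def kmeans(T, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [T[0]]
    for _ in range(1, k):
        d = np.min([np.sum((T - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(T[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((T[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            T[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def loo_squared_errors(T, y):
    """Leave-one-out squared errors of an OLS model y = T @ beta (no intercept)."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(T[keep], y[keep], rcond=None)
        errors.append((T[i] @ beta - y[i]) ** 2)
    return errors


def select_cluster_number(T, y, candidates):
    """Return the candidate with the smallest pooled RMSECV, plus all scores."""
    scores = {}
    for n_clusters in candidates:
        labels = kmeans(T, n_clusters)
        sq = []
        for j in range(n_clusters):
            idx = labels == j
            if idx.sum() > 1:
                sq.extend(loo_squared_errors(T[idx], y[idx]))
        scores[n_clusters] = float(np.sqrt(np.mean(sq)))
    return min(scores, key=scores.get), scores


# Hypothetical scores: two well-separated groups with different regressions.
rng = np.random.default_rng(0)
T_a = rng.normal(loc=5.0, size=(15, 2))
T_b = rng.normal(loc=-5.0, size=(15, 2))
T_m = np.vstack([T_a, T_b])
y_m = np.concatenate([T_a @ np.array([1.0, 2.0]), T_b @ np.array([-3.0, 0.5])])
K, rmsecv = select_cluster_number(T_m, y_m, candidates=(1, 2))
```

On data like this, a single global model cannot fit both groups, so the two-cluster candidate has the smaller cross-validation error.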
Step 5.3: Using the cluster number K, apply the k-means clustering algorithm to the characteristic spectrum vectors of the data set {T_m, y_m}, dividing the data set {T_m, y_m} into K sub-data sets
Figure BDA0002471095920000094
k = 1, 2, ..., K;
Using the cluster number K, apply the k-means clustering algorithm to the pseudo-characteristic spectrum vectors of the data set
Figure BDA0002471095920000095
dividing the data set
Figure BDA0002471095920000096
into K sub-data sets
Figure BDA0002471095920000097
k = 1, 2, ..., K;
where the characteristic spectrum vectors and the pseudo-characteristic spectrum vectors are the row vectors of T_m and
Figure BDA0002471095920000098
respectively,
Figure BDA0002471095920000099
are the k-th sub-characteristic spectrum matrix and the k-th sub-pseudo-characteristic spectrum matrix, respectively, and
Figure BDA00024710959200000910
is the matrix formed by the substance concentration variable values of the samples corresponding to
Figure BDA00024710959200000911
Step 5.4: Based on the OLS algorithm, use the k-th sub-data set
Figure BDA00024710959200000912
to establish the k-th least squares model
Figure BDA00024710959200000913
and calculate the k-th regression coefficient matrix β_k;
Step 5.5: Calculate the k-th transformation matrix
Figure BDA00024710959200000914
where
Figure BDA00024710959200000915
are the covariance matrices of
Figure BDA00024710959200000916
respectively.
When the least squares models are constructed, the master-instrument model closest to each clustered characteristic spectrum of the slave instrument is found, and a transformation matrix is calculated for each. The details are as follows:
The master instrument and the slave instrument each correspond to one domain. A domain consists of two main parts: an input space X and its marginal probability distribution p(X). The relative entropy, or KL divergence, can represent the distance between the data distributions of the two domains and is given by equation (1):
Figure BDA00024710959200000917
where p and q are the probability density functions of the data distributions of the source domain and the target domain, respectively.
p(x) cannot be obtained directly, but suppose that a finite set of training points x_n, n = 1, ..., N, drawn from p(x) has been observed. The expectation with respect to p(x) can then be approximated by a finite sum over these points, as follows:
Figure BDA00024710959200000918
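The formula images for equations (1) and (2) are not reproduced here. The standard definition of relative entropy is KL(p||q) = ∫ p(x) ln(p(x)/q(x)) dx, and its usual sample approximation is (1/N) Σ_n [ln p(x_n) - ln q(x_n)] with x_n drawn from p; it is assumed, not confirmed, that the two equations take these forms. The sketch below compares the sample approximation with the closed form for two one-dimensional Gaussians; the function names and the chosen distributions are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def kl_sample_estimate(samples, log_p, log_q):
    """Approximate KL(p || q) by averaging ln p(x_n) - ln q(x_n) over x_n ~ p."""
    return sum(log_p(x) - log_q(x) for x in samples) / len(samples)


def kl_normal_closed_form(m1, s1, m2, s2):
    """Exact KL(N(m1, s1^2) || N(m2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # draws from p = N(0, 1)
estimate = kl_sample_estimate(
    samples,
    lambda x: log_normal_pdf(x, 0.0, 1.0),   # ln p
    lambda x: log_normal_pdf(x, 1.0, 1.0),   # ln q, with q = N(1, 1)
)
exact = kl_normal_closed_form(0.0, 1.0, 1.0, 1.0)
```

The estimate approaches the exact value as the number of observed points grows, which is the property the finite-sum approximation relies on.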
Given the labeled spectra {X_m, y_m} of the master instrument and the unlabeled spectra {X_s} of the slave instrument, the goal is to predict the output for the spectra to be measured on the slave instrument
Figure BDA0002471095920000101
The spectra measured by different instruments differ, so the data distributions of the two instruments differ. Equation (3) expresses the distance between the data distributions in absolute-value form, where X_m and X_s are random vectors that follow the respective data distributions:
KL(P||Q) ≈ |ln P(X_m) - ln Q(X_s)|   (3)
where P and Q are the probability density functions of the data distributions of the source domain and the target domain, respectively.
Because spectral data are multicollinear, all spectra are mapped into the PLS subspace of the master instrument, which reduces the dimensionality of the data and simplifies model (3). The characteristic spectrum of the master instrument and the pseudo-characteristic spectrum of the slave instrument are calculated using equation (4):
Figure BDA0002471095920000102
where T_m and
Figure BDA0002471095920000103
are matrices composed of the A extracted score vectors.
The KL distance between the data distributions of the two instruments then becomes:
KL(P||Q) ≈ |ln P(t_m) - ln Q(t_s)|   (5)
where t_m and t_s are random vectors that follow the data distributions of T_m and
Figure BDA0002471095920000104
respectively. The spectral data of the master and slave instruments follow mixture distributions. Assuming that each cluster follows a single Gaussian distribution after clustering, the data distributions of the characteristic spectra of the master and slave instruments are, respectively,
Figure BDA0002471095920000105
and
Figure BDA0002471095920000106
The distribution difference in equation (6) can be reduced by correcting the mean and covariance of each cluster separately. The data are first centered to correct the mean; the data distributions of the i-th characteristic spectrum of the two instruments are then, respectively,
Figure BDA0002471095920000107
and
Figure BDA0002471095920000108
and
Figure BDA0002471095920000109
are the i-th Gaussian distributions of the two instruments, respectively;
Figure BDA00024710959200001010
is the i-th transfer function; and
Figure BDA00024710959200001011
and
Figure BDA00024710959200001012
are two random vectors that follow the data distributions of the i-th characteristic spectrum of the two instruments, respectively. Equation (5) can then be written in the form of equation (6).
Figure BDA00024710959200001013
Assume that there exists a linear transformation matrix M_i that transforms
Figure BDA00024710959200001014
into
Figure BDA00024710959200001015
so that the distance between the corrected slave spectrum and the master spectrum is minimized. The relative entropy (6) can then be rewritten as follows:
Figure BDA00024710959200001016
The linear transformation matrix M_i in equation (7) is solved as follows:
Each group of clustered data is approximately normally distributed with zero mean. The probability density function of the master characteristic spectrum
Figure BDA00024710959200001017
is given by equation (8.1), and the probability density function of the slave instrument
Figure BDA0002471095920000111
is given by equation (8.2).
Figure BDA0002471095920000112
Figure BDA0002471095920000113
After transformation by the function
Figure BDA0002471095920000114
the random vector t_s of the slave instrument can be converted into the random vector t_m of the master instrument, as follows:
Figure BDA0002471095920000115
Assuming that M_i is a non-singular matrix, the random vector of the master instrument can be converted into the random vector of the slave instrument using equation (10):
Figure BDA0002471095920000116
By the properties of probability density functions, the probability density function of the master instrument
Figure BDA0002471095920000117
can be transformed using equation (9) as follows:
Figure BDA0002471095920000118
which can be rewritten as
Figure BDA0002471095920000119
Thus:
Figure BDA00024710959200001111
Equation (11) is the probability density function of the slave-instrument characteristic spectrum after it has been transformed to the master instrument by the transformation matrix M_i. Expanding it gives:
Figure BDA00024710959200001112
Equation (12) has the same form as equation (8.1), so the two covariances are equal, which gives
Figure BDA00024710959200001113
The solution for M_i is:
Figure BDA00024710959200001114
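The closed-form solution is given only as a formula image. As an illustration of the covariance-matching idea in the derivation, the sketch below assumes row-vector scores corrected as `T_s @ M`, so that equal covariances mean M^T Σ_s M = Σ_m, and uses the symmetric-square-root solution M = Σ_s^(-1/2) Σ_m^(1/2). This is one valid solution of that condition (it is not unique) and is not claimed to be the exact expression in the patent; the function names and data are hypothetical.

```python
import numpy as np


def sym_matrix_power(S, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals ** power) @ vecs.T


def covariance_matching_matrix(T_s, T_m):
    """Return M such that the rows of (T_s @ M) have the covariance of T_m.

    T_s and T_m are centered score matrices with samples in rows.
    """
    cov_s = np.cov(T_s, rowvar=False)
    cov_m = np.cov(T_m, rowvar=False)
    return sym_matrix_power(cov_s, -0.5) @ sym_matrix_power(cov_m, 0.5)


# Hypothetical centered scores for one cluster of each instrument.
rng = np.random.default_rng(0)
T_m = rng.normal(size=(40, 3)) @ np.diag([2.0, 1.0, 0.5])
T_s = rng.normal(size=(40, 3)) @ np.array(
    [[1.0, 0.3, 0.0], [0.0, 1.0, 0.2], [0.0, 0.0, 1.0]]
)
T_m -= T_m.mean(axis=0)
T_s -= T_s.mean(axis=0)

M = covariance_matching_matrix(T_s, T_m)
T_s_corrected = T_s @ M
```

The inverse square root requires a full-rank slave covariance, which is consistent with the remark in the embodiments that too many clusters on a small data set make the migration matrix rank-deficient.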
Step 6: Predict the substance concentration variable for the set of objects to be measured:
Step 6.1: Collect the spectrum of each object in the set with the slave infrared spectrum measuring instrument, and extract the spectral data by the same method as in Step 1 to obtain the slave spectral matrix of the set
Figure BDA0002471095920000121
Step 6.2: Calculate the pseudo-characteristic spectrum matrix of the set under the slave infrared spectrum measuring instrument as
Figure BDA0002471095920000122
Step 6.3: Using the cluster number K, apply the k-means clustering algorithm to the pseudo-characteristic spectrum vectors of the data set
Figure BDA0002471095920000123
dividing the data set
Figure BDA0002471095920000124
into K sub-data sets
Figure BDA0002471095920000125
k = 1, 2, ..., K, where
Figure BDA0002471095920000126
is the k-th sub-pseudo-characteristic spectrum matrix of the set;
Step 6.4: Use the k-th transformation matrix M_k to transform and correct
Figure BDA0002471095920000127
obtaining the k-th corrected sub-pseudo-characteristic spectrum matrix
Figure BDA0002471095920000128
Step 6.5: Calculate the matrix of predicted substance concentration variable values for the objects corresponding to the k-th corrected sub-pseudo-characteristic spectrum matrix
Figure BDA0002471095920000129
as
Figure BDA00024710959200001210
k = 1, 2, ..., K.
In this embodiment, the substance concentration variable of the set of objects to be measured is predicted with the CT-CDD-based calibration migration method of the invention and with the existing calibration migration methods based on SBC, PDS, CCACT, TCR and CTAI. The PLS model of the master instrument is built on the calibration set; for the migration methods that require migration standards, a number of standard samples are selected from the calibration set using the Kennard-Stone method. For SBC, PDS, CCACT and CTAI, PLS is used as the base algorithm: a multivariate calibration model is built from the spectral data of the master instrument, serves as the reference model, and is used to predict the samples to be measured on the slave instrument.
The parameter selection criteria for the different migration methods follow those of CTAI. The optimal number of latent variables of the PLS model is searched in the range [1, 15] by ten-fold cross-validation and selected according to the minimum cross-validation error.
For SBC and PDS, the PLS modeling method and parameter optimization of the master instrument are the same as for CTAI. The PDS window size is searched from 3 to 16 in increments of 2 using 5-fold cross-validation: the RMSECV of each window model is calculated, and the window with the smallest RMSECV is selected as the optimal parameter. PDS performs poorly on the wheat data set, where an F-test is used to determine the optimal window size.
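The search grids described above can be written out explicitly. In the sketch below, the helper `select_min_rmsecv` and the error curve `toy_rmsecv` are hypothetical; the curve merely stands in for a real cross-validation run and carries no experimental meaning.

```python
def select_min_rmsecv(candidates, rmsecv_of):
    """Return the candidate with the smallest cross-validation error and all scores."""
    scores = {c: rmsecv_of(c) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# PDS window sizes searched from 3 to 16 in increments of 2.
pds_windows = list(range(3, 17, 2))

# Candidate numbers of latent variables in [1, 15] for the PLS model.
latent_variable_counts = list(range(1, 16))


def toy_rmsecv(window):
    # Hypothetical error curve standing in for a real 5-fold cross-validation run.
    return 0.2 + 0.01 * (window - 9) ** 2


best_window, window_scores = select_min_rmsecv(pds_windows, toy_rmsecv)
```

With an upper bound of 16 and a step of 2 starting at 3, the largest window actually evaluated is 15.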
In this embodiment, the root mean square error (RMSE) is used as the indicator for parameter selection and model evaluation. RMSEC denotes the training error on the calibration set, RMSECV the cross-validation error, and RMSEP the prediction error on the test set. RMSE is calculated as
Figure BDA0002471095920000131
where
Figure BDA0002471095920000132
is the predicted value, y is the measured value, and n is the number of samples.
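Assuming the formula image shows the standard definition, RMSE = sqrt((1/n) Σ_i (ŷ_i - y_i)^2), the metric can be computed as in the following sketch. The function name and the sample values are hypothetical.

```python
import math


def rmse(y_pred, y_true):
    """Root mean square error between predicted and measured values."""
    if len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must have the same length")
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


# Hypothetical predicted and measured values.
error = rmse([1.0, 2.0, 5.0], [1.0, 2.0, 3.0])
```

The same function yields RMSEC, RMSECV or RMSEP depending on whether it is applied to calibration, cross-validation or test-set predictions.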
Table 1 lists the RMSEC, RMSEP, RMSECVmin and LV of the PLS models of the three instruments M5, MP5 and MP6 on the corn data set, where RMSECVmin is the minimum cross-validation error and LV is the number of latent variables at which it is obtained. For instrument M5, the RMSECVmin, RMSEC and RMSEP of the PLS model are 0.01066, 0.00599 and 0.00764, respectively. The three root mean square errors differ little, so the PLS model is relatively stable and shows neither over-fitting nor under-fitting. For instrument MP5, the RMSECVmin, RMSEC and RMSEP are 0.13035, 0.09458 and 0.12445, respectively; as with M5, there is no under-fitting or over-fitting, and the same conclusion holds for the MP6 PLS model. The parameters were selected by 10-fold cross-validation, and the optimal number of latent variables, determined by the lowest RMSECV, is 14, 15 and 10 for the PLS models of M5, MP5 and MP6, respectively. It is important that the master instrument yields a model with good prediction performance, so this embodiment selects the instrument with the better prediction performance as the master. As Table 1 shows, the prediction error of MP6 > the prediction error of MP5 > the prediction error of M5, so the model tests are performed with the three combinations M5 × MP5, M5 × MP6 and MP5 × MP6, where the first instrument in each pair is the master and the second is the slave.
TABLE 1
Instrument   Reference value    RMSEC     RMSEP     RMSECVmin   LV
M5           Moisture content   0.00599   0.00764   0.01066     14
MP5          Moisture content   0.09458   0.12445   0.13035     15
MP6          Moisture content   0.09991   0.15637   0.14775     10
CT-CDD was compared with five calibration migration methods: SBC, PDS, CCACT, TCR and CTAI. In CT-CDD, the number of clusters is determined by ten-fold cross-validation. The corn data set contains only 80 samples, so the maximum number of sub-models after clustering is set to 3; otherwise the calculated migration matrix becomes rank-deficient, which would make the final predictions infinite. Because the number of samples is limited, a large number of clusters leaves too few samples in each clustered characteristic spectrum to build a stable model. In the first embodiment, the minimum cross-validation error is obtained with 2 clusters.
The prediction errors of CT-CDD and the five other calibration migration methods are shown in Table 2, where N is the number of standard samples for the methods that require them, a is the optimal window size in PDS, and b is the dimension of the optimal subspace in TCR.
As can be seen from Table 2:
(1) For the spectral transfer from instrument MP5 to instrument M5: SBC reaches its lowest RMSEP (0.28872) with 35 standard samples, PDS reaches its lowest RMSEP (0.18828) with 5 standard samples, and CCACT reaches its lowest RMSEP (0.18699) with 25 standard samples. The RMSEP of CT-CDD (0.15024) is lower than the minimum RMSEP of PDS, SBC and CCACT, and also lower than that of TCR (0.47391) and CTAI (0.17511).
(2) For the spectral transfer from MP6 to M5, the lowest RMSEPs obtained by SBC, PDS and CCACT are 0.33240, 0.27901 and 0.17862, respectively, and CT-CDD has a lower RMSEP than the other five methods.
(3) For the spectral transfer from MP6 to MP5, the lowest RMSEPs of SBC, PDS and CCACT are 0.20481, 0.18409 and 0.13722, respectively, the prediction errors of TCR and CTAI are 0.46124 and 0.16563, respectively, and CT-CDD again reaches the smallest RMSEP (0.12357).
These three groups of experiments show that the CT-CDD model generally achieves the best prediction performance and generalizes well.
TABLE 2
Figure BDA0002471095920000141
FIGS. 4, 5 and 6 show the relationship between the predicted and measured values obtained by the six calibration migration methods on the combinations M5 × MP5, M5 × MP6 and MP5 × MP6, respectively. A sample point lies on the straight line when the difference between its predicted and measured concentration is zero. For the calibration migration methods that require standard samples, the experiment with the number of standard samples that gives the best prediction performance is selected for comparison, which makes the good prediction performance of CT-CDD more evident.
The prediction results of CT-CDD, CTAI, TCR, CCACT, SBC and PDS on M5 × MP5 are shown in FIGS. 4A, 4B, 4C, 4D, 4E and 4F, respectively; those on M5 × MP6 in FIGS. 5A, 5B, 5C, 5D, 5E and 5F, respectively; and those on MP5 × MP6 in FIGS. 6A, 6B, 6C, 6D, 6E and 6F, respectively. As can be seen from FIG. 4, the sample points of CT-CDD lie closer to the straight line, while TCR and SBC fit less well in this group of experiments. As can be seen from FIG. 5, in the spectral transfer from instrument MP6 to instrument M5, the CT-CDD points are closer to the straight line than those of the other five methods; SBC and TCR again have the worst prediction performance, and PDS, CCACT and CTAI have smaller prediction errors but still perform worse than CT-CDD. FIG. 6 leads to the same conclusion for the spectral transfer from instrument MP6 to instrument MP5 as FIGS. 4 and 5, confirming that CT-CDD achieves the best prediction performance. Compared with the other five models, CT-CDD therefore gives more satisfactory results and the best prediction performance.
Example two
In the second embodiment, the samples are wheat. The wheat data set is the "shootout" data set published by the International Diffuse Reflectance Conference (IDRC) in 2016. It contains data measured on the same I = 248 samples with three different NIR spectrometers (B1, B2, B3), and protein content is selected as the substance concentration variable. The NIR spectrometers B1, B2 and B3 measure the spectrum at intervals of a = 0.5 nm over the wavelength range 570-1100 nm. The spectral differences between B1-B2, B1-B3 and B3-B2 are shown in FIG. 3D, FIG. 3E and FIG. 3F, respectively.
In this embodiment, the Kennard-Stone (KS) algorithm divides the 248 wheat samples into two groups: the source-domain data of the first group of 198 samples form the source-domain calibration set, and the source-domain data of the second group of 50 samples form the source-domain test set.
Table 3 lists the RMSEC, RMSEP, RMSECVmin and LV of the PLS models of the three instruments B1, B2 and B3 on the wheat data set. The PLS model built on instrument B1 has RMSECVmin, RMSEC and RMSEP of 0.50337, 0.32880 and 0.33254, respectively, and shows neither over-fitting nor under-fitting. The same holds for instruments B2 and B3, which indicates that the numbers of latent variables were chosen reasonably. For the wheat data set, the parameter selection criteria of the PLS model are similar to those for the corn data set, with the maximum number of latent variables set to 15. As Table 3 shows, the prediction error of B1 < the prediction error of B3 < the prediction error of B2, so the model performance tests are performed with the three combinations B1 × B2, B1 × B3 and B3 × B2.
TABLE 3
Instrument   Reference value   RMSEC     RMSEP     RMSECVmin   LV
B1           Protein           0.32880   0.33254   0.50337     15
B2           Protein           0.21636   0.83755   0.32441     15
B3           Protein           0.30288   0.51567   0.43896     15
In CT-CDD, the number of clusters of the characteristic spectra is determined in the same way as for the corn data set. Because the wheat data set has a relatively sufficient number of samples, the rank deficiency that arises when calculating the migration matrix on the corn data set does not occur. The number of clusters is searched between 2 and 5. In the second embodiment, the minimum cross-validation error is obtained with 2 clusters.
The parameter selection criteria of the other calibration migration methods are similar to those for the corn data set. The optimal PDS window sizes are shown in Table 4. With B1 as master and B2 as slave, the optimal PDS window sizes are 11 and 15, respectively. With B1 as master and B3 as slave, they are 15, 11 and 5, respectively. With B3 as master and B2 as slave, they are 7 and 15, respectively. In TCR, the optimal subspace dimensions are 7, 12 and 21, respectively.
As can be seen from Table 4:
(1) With instrument B1 as master and instrument B2 as slave, SBC gives its lowest RMSEP (0.45225) with 5 standard samples. For PDS and CCACT, the RMSEP decreases markedly as the number of standard samples increases; both reach their lowest RMSEP with 35 standard samples, 0.47222 and 0.80448, respectively. CT-CDD achieves a lower RMSEP (0.43007) than SBC, PDS and CCACT. The prediction errors of TCR and CTAI are 0.86884 and 0.41419, respectively, so in this experimental group the prediction performance of CT-CDD is second only to that of CTAI.
(2) With instrument B1 as master and instrument B3 as slave, the lowest RMSEPs of SBC, PDS and CCACT over the different numbers of standard samples are 0.79919, 0.41235 and 0.83440, respectively. The RMSEPs of TCR and CTAI are 0.72987 and 0.68215, respectively. The RMSEP of CT-CDD (0.35160) is clearly lower than that of the other calibration migration methods, giving the best prediction performance.
(3) With instrument B3 as master and instrument B2 as slave, the lowest RMSEPs of SBC, PDS and CCACT are 0.47177, 0.33707 and 0.75119, respectively, and the prediction errors of TCR and CTAI are 0.63708 and 0.38446, respectively. CT-CDD again achieves the best prediction performance (RMSEP 0.31856). SBC, PDS and CCACT require standard samples and TCR requires reference values from the slave instrument, whereas CT-CDD achieves better prediction performance without standard samples, which makes CT-CDD the more practical approach.
TABLE 4
Figure BDA0002471095920000171
FIGS. 7, 8 and 9 show the predicted and measured values obtained by the six calibration migration methods on the combinations B1 × B2, B1 × B3 and B3 × B2, respectively.
The prediction results of CT-CDD, CTAI, TCR, CCACT, SBC and PDS on B1 × B2 are shown in FIGS. 7A, 7B, 7C, 7D, 7E and 7F, respectively; those on B1 × B3 in FIGS. 8A, 8B, 8C, 8D, 8E and 8F, respectively; and those on B3 × B2 in FIGS. 9A, 9B, 9C, 9D, 9E and 9F, respectively. FIGS. 7C and 7D clearly show that, for TCR and CCACT, the predicted and measured values correlate poorly and the prediction error of the model is large. FIG. 7A shows that CT-CDD fits better, with a correspondingly smaller prediction error. FIG. 8 shows that CT-CDD and PDS fit well, while the other four methods fit relatively poorly. FIG. 9 shows that, for TCR and CCACT, the measured substance concentration and the predicted results correlate relatively poorly; the other four methods fit better, and CT-CDD fits best. Compared with the other five migration methods, CT-CDD therefore gives more satisfactory results.
As the first and second embodiments show, when the CT-CDD method is tested on two NIR data sets against CTAI, TCR, CCACT, SBC and PDS, the CT-CDD-based calibration migration method of the invention achieves the lowest RMSEP on five of the six instrument pairs and is second only to CTAI on B1 × B2. The results show that CT-CDD successfully corrects the differences between spectra measured on different instruments. SBC, PDS and CCACT require standard samples to build the migration model, and TCR requires a small number of reference values for slave-instrument samples. Both requirements are costly in practice and sometimes cannot be met at all. The CT-CDD-based method of the invention is therefore an economical and efficient calibration migration method when standard samples are unavailable in practical applications.
The calibration migration method of the invention requires no standards and works by correcting data distribution differences in the PLS subspace (CT-CDD). It seeks a transfer function that reduces the distance between the data distributions of the master and slave instruments once the slave-instrument data are projected into this space. Because the data distribution of the characteristic spectra is a mixture distribution, the spectra are clustered and the distance between each pair of sub-distributions of the two instruments is minimized by its own transfer function. The invention retains the important properties of both instruments in the same PLS subspace and eliminates the multicollinearity of the spectra, so the differences between the master-instrument features and the slave-instrument pseudo-features can be reduced more accurately. The differences in data distribution are further corrected by correcting the mean and variance of each part of the latent variables from the different instruments.
It is to be understood that the above-described embodiments are only a few embodiments of the present invention, and not all embodiments. The above examples are only for explaining the present invention and do not constitute a limitation to the scope of protection of the present invention. All other embodiments, which can be derived by those skilled in the art from the above-described embodiments without any creative effort, namely all modifications, equivalents, improvements and the like made within the spirit and principle of the present application, fall within the protection scope of the present invention claimed.

Claims (1)

1. A CT-CDD-based infrared spectrum measuring instrument calibration migration method is characterized by comprising the following steps:
Step 1: Let the master infrared spectrum measuring instrument correspond to the source domain and the slave infrared spectrum measuring instrument correspond to the target domain. Collect the spectrum of each sample with both instruments to obtain a master spectrum and a slave spectrum, extract spectral data from each at intervals of a nm within the wavelength range, and collect the substance concentration variable value of each sample, yielding a source-domain data set {X_m, y_m} and a target-domain data set {X_s};
where X_m and X_s are the master and slave spectral matrices, respectively,
Figure FDA0002471095910000011
Figure FDA0002471095910000012
Figure FDA0002471095910000013
denote the master and slave spectral vectors of the i-th sample, i = 1, 2, ..., I, where I is the total number of samples,
Figure FDA0002471095910000014
Figure FDA0002471095910000015
denote the j-th master and slave spectral data of the i-th sample, j = 1, 2, ..., J, where J is the total number of extracted spectral data points; y_m is the matrix of substance concentration values,
Figure FDA0002471095910000016
Figure FDA0002471095910000017
is the substance concentration variable value of the i-th sample;
Step 2: Use the KS algorithm to divide the source-domain data set {X_m, y_m} into a source-domain calibration set
Figure FDA0002471095910000018
and a source-domain test set
Figure FDA0002471095910000019
Step 3: Center the source-domain calibration set
Figure FDA00024710959100000110
to obtain the centered source-domain calibration set
Figure FDA00024710959100000111
Step 4: Based on the PLS algorithm, use the data set
Figure FDA00024710959100000112
to establish the calibration model
Figure FDA00024710959100000113
and calculate for
Figure FDA00024710959100000114
the weight matrix W_m,
Figure FDA00024710959100000115
the loading matrix P_m and the regression coefficient matrix β_m;
Step 5: Construct the migration model:
Step 5.1: Calculate the characteristic spectrum matrix of the master infrared spectrum measuring instrument
T_m = X_m W_m (P_m W_m)^{-1}
and calculate the pseudo-characteristic spectrum matrix of the slave infrared spectrum measuring instrument
Figure FDA00024710959100000116
Step 5.2: for each cluster number $L \in L^*$, use the k-means clustering algorithm to cluster the characteristic spectrum vectors of the data set $\{T_m, y_m\}$, dividing the data set $\{T_m, y_m\}$ into $L$ sub-datasets [Figure FDA00024710959100000117], $l = 1, 2, \ldots, L$;
based on the OLS algorithm, establish on the $l$-th sub-dataset [Figure FDA0002471095910000021] an initial least squares model [Figure FDA0002471095910000022], $l = 1, 2, \ldots, L$;
calculate the cross-validation error $\mathrm{RMSECV}_L$ of the $L$ initial least squares models under each cluster number, and take the cluster number corresponding to $\min\{\mathrm{RMSECV}_L \mid L \in L^*\}$ as the final cluster number $K$;
wherein $L^*$ is the set of cluster numbers, [Figure FDA0002471095910000023] is the $l$-th initial sub characteristic spectrum matrix, [Figure FDA0002471095910000024] is the matrix of substance concentration variable values of the samples corresponding to [Figure FDA0002471095910000025], and $\beta_{0\_l}$ is the $l$-th initial regression coefficient matrix;
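The selection of the cluster number in step 5.2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: the claim does not say which cross-validation scheme is used or how the $L$ per-cluster errors are combined, so the sketch uses leave-one-out validation, pools the squared errors over all clusters, adds an intercept to each OLS model, and rejects a candidate $L$ if any cluster is too small to fit. The k-means routine is a plain Lloyd iteration with a deterministic farthest-point initialisation, and all function names are hypothetical.

```python
import numpy as np


def _assign(T, centers):
    """Index of the nearest centre for every row of T."""
    d = np.linalg.norm(T[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def kmeans(T, n_clusters, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    T = np.asarray(T, dtype=float)
    centers = [T[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(T - c, axis=1) for c in centers], axis=0)
        centers.append(T[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _assign(T, centers)
        new_centers = np.array([
            T[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(T, centers), centers


def loo_squared_errors(T, y):
    """Leave-one-out squared errors of an OLS model with intercept (assumption)."""
    A = np.column_stack([T, np.ones(len(T))])
    errs = []
    for i in range(len(T)):
        keep = np.arange(len(T)) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errs.append(float((A[i] @ coef - y[i]) ** 2))
    return errs


def select_cluster_number(T, y, candidates):
    """Return (K, {L: RMSECV_L}), where K minimises the pooled RMSECV."""
    T = np.asarray(T, dtype=float)
    y = np.asarray(y, dtype=float)
    min_size = T.shape[1] + 2  # smallest cluster that still allows LOO fitting
    rmsecv = {}
    for L in candidates:
        labels, _ = kmeans(T, L)
        errs = []
        for l in range(L):
            idx = labels == l
            if idx.sum() < min_size:
                errs = None  # candidate rejected: a cluster is too small
                break
            errs.extend(loo_squared_errors(T[idx], y[idx]))
        rmsecv[L] = float(np.sqrt(np.mean(errs))) if errs else float("inf")
    return min(rmsecv, key=rmsecv.get), rmsecv
```

With `candidates` playing the role of $L^*$, the returned `K` is the candidate with the smallest pooled error; ties are resolved in favour of the candidate listed first.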
Step 5.3: according to the cluster number $K$, use the k-means clustering algorithm to cluster the characteristic spectrum vectors of the data set $\{T_m, y_m\}$, dividing the data set $\{T_m, y_m\}$ into $K$ sub-datasets [Figure FDA0002471095910000026], $k = 1, 2, \ldots, K$;
according to the cluster number $K$, use the k-means clustering algorithm to cluster the pseudo characteristic spectrum vectors of the data set [Figure FDA0002471095910000027], dividing the data set [Figure FDA0002471095910000028] into $K$ sub-datasets [Figure FDA0002471095910000029], $k = 1, 2, \ldots, K$;
wherein the characteristic spectrum vectors and the pseudo characteristic spectrum vectors are the row vectors of $T_m$ and [Figure FDA00024710959100000210], respectively; [Figure FDA00024710959100000211] are the $k$-th sub characteristic spectrum matrix and the $k$-th sub pseudo characteristic spectrum matrix, respectively; and [Figure FDA00024710959100000212] is the matrix formed by the substance concentration variable values of the samples corresponding to [Figure FDA00024710959100000213];
Step 5.4: based on the OLS algorithm, establish on the $k$-th sub-dataset [Figure FDA00024710959100000214] the $k$-th least squares model [Figure FDA00024710959100000215], and calculate the $k$-th regression coefficient matrix $\beta_k$;
Step 5.5: calculate the $k$-th transformation matrix [Figure FDA00024710959100000216], wherein [Figure FDA00024710959100000217] are the covariance matrices of [Figure FDA00024710959100000218], respectively;
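The claimed formula for the transformation matrix in step 5.5 is not recoverable from this text; only its dependence on two covariance matrices is stated. Purely to illustrate how a covariance-based transformation can be built, the sketch below uses a CORAL-style whitening and re-colouring, which is a swapped-in technique and should not be read as the patent's formula. The function names and the eigenvalue floor `eps` are assumptions.

```python
import numpy as np


def _sym_power(C, power, eps=1e-12):
    """Power of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, eps, None)  # guard against zero or negative eigenvalues
    return (vecs * vals ** power) @ vecs.T


def covariance_alignment_matrix(T_master, T_slave):
    """CORAL-style M such that cov(T_slave @ M) matches cov(T_master).

    Illustrative stand-in only; not the transformation defined in the claim.
    """
    C_m = np.atleast_2d(np.cov(T_master, rowvar=False))
    C_s = np.atleast_2d(np.cov(T_slave, rowvar=False))
    # Whiten with the slave covariance, then re-colour with the master covariance.
    return _sym_power(C_s, -0.5) @ _sym_power(C_m, 0.5)
```

In a per-cluster setting, such a matrix would be computed once for each pair of $k$-th master and slave sub-matrices.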
Step 6: predict the substance concentration variable of the measured object set:
Step 6.1: collect the spectrum of each measured object in the measured object set with the infrared spectrum measuring slave instrument, and extract the spectral data by the same method as in step 1 to obtain the secondary spectrum matrix [Figure FDA00024710959100000219] of the measured object set;
Step 6.2: calculate the pseudo characteristic spectrum matrix of the measured object set under the infrared spectrum measuring slave instrument as [Figure FDA00024710959100000220];
Step 6.3: according to the cluster number $K$, use the k-means clustering algorithm to cluster the pseudo characteristic spectrum vectors of the data set [Figure FDA00024710959100000221], dividing the data set [Figure FDA00024710959100000222] into $K$ sub-datasets [Figure FDA00024710959100000223], $k = 1, 2, \ldots, K$, wherein [Figure FDA00024710959100000224] is the $k$-th sub pseudo characteristic spectrum matrix of the measured object set;
Step 6.4: use the $k$-th transformation matrix $M_k$ to transform and correct [Figure FDA0002471095910000031], obtaining the $k$-th transform-corrected sub pseudo characteristic spectrum matrix [Figure FDA0002471095910000032];
Step 6.5: calculate the matrix of predicted substance concentration variable values of the measured objects corresponding to the $k$-th transform-corrected sub pseudo characteristic spectrum matrix [Figure FDA0002471095910000033] as [Figure FDA0002471095910000034], $k = 1, 2, \ldots, K$.
CN202010348512.2A 2020-04-28 2020-04-28 Infrared spectrum measuring instrument calibration migration method based on CT-CDD Active CN111563436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010348512.2A CN111563436B (en) 2020-04-28 2020-04-28 Infrared spectrum measuring instrument calibration migration method based on CT-CDD

Publications (2)

Publication Number Publication Date
CN111563436A (en) 2020-08-21
CN111563436B (en) 2022-04-08

Family

ID=72074370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010348512.2A Active CN111563436B (en) 2020-04-28 2020-04-28 Infrared spectrum measuring instrument calibration migration method based on CT-CDD

Country Status (1)

Country Link
CN (1) CN111563436B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762208B (en) * 2021-09-22 2023-07-28 山东大学 Spectrum conversion method of near infrared spectrum and characteristic spectrum and application thereof
CN113959979B (en) * 2021-10-29 2022-07-29 燕山大学 Near infrared spectrum model migration method based on deep Bi-LSTM network
CN114112989B (en) * 2021-12-03 2023-07-11 四川启睿克科技有限公司 Near infrared detection method and system based on compound vision
CN115049025B (en) * 2022-08-16 2022-11-04 山东钢铁股份有限公司 Model migration method and system based on elastic segmentation standardization algorithm
CN117171566A (en) * 2022-08-17 2023-12-05 无锡迅杰光远科技有限公司 Sample content identification method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106596450A (en) * 2017-01-06 2017-04-26 东北大学秦皇岛分校 Incremental method for analysis of material component content based on infrared spectroscopy
CN106680238A (en) * 2017-01-06 2017-05-17 东北大学秦皇岛分校 Method for analyzing material composition content on basis of infrared spectroscopy
CN108152239A (en) * 2017-12-13 2018-06-12 东北大学秦皇岛分校 The sample composition content assaying method of feature based migration
CN109444066A (en) * 2018-10-29 2019-03-08 山东大学 Model transfer method based on spectroscopic data
CN110068543A (en) * 2019-03-26 2019-07-30 昆明理工大学 A kind of tera-hertz spectra recognition methods based on transfer learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
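Application of transfer learning in the spectral model transfer of edible oil; Liu Cuiling et al.; Journal of Food Science and Technology (《食品科学技术学报》); 2019-07-31; Vol. 37 (No. 4); full text *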

Also Published As

Publication number Publication date
CN111563436A (en) 2020-08-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant