WO2023123329A1 - Method and system for extracting net signals of near-infrared spectrum - Google Patents

Method and system for extracting net signals of near-infrared spectrum

Info

Publication number
WO2023123329A1
WO2023123329A1 PCT/CN2021/143614 CN2021143614W WO2023123329A1 WO 2023123329 A1 WO2023123329 A1 WO 2023123329A1 CN 2021143614 W CN2021143614 W CN 2021143614W WO 2023123329 A1 WO2023123329 A1 WO 2023123329A1
Authority
WO
WIPO (PCT)
Prior art keywords
infrared spectrum
net
spectral
signal
model
Prior art date
Application number
PCT/CN2021/143614
Other languages
English (en)
French (fr)
Inventor
潘天红
李孟虎
陈琦
陈山
樊渊
Original Assignee
安徽大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安徽大学
Priority to US18/109,439 (US20230204504A1)
Publication of WO2023123329A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light

Definitions

  • The invention belongs to the technical field of near-infrared spectroscopy, and in particular relates to a method and system for extracting net signals of near-infrared spectra.
  • Because its wavelengths lie close to the visible region, near-infrared light penetrates strongly, carries more sample information, and is well suited to material composition analysis. In recent years near-infrared spectroscopy has developed rapidly into a new analysis and research method because, on top of relatively accurate analysis, it is also fast and simple.
  • The most widely used method for building a quantitative analysis model in near-infrared spectroscopy is partial least squares. Like principal component regression, partial least squares is a factor analysis method: the spectral matrix is decomposed during modeling, and a few variables extracted in the decomposition can represent most of the information in the original spectrum. In partial least squares regression these variables are called principal components.
  • When extracting principal components, partial least squares regression takes the detection target vector into account and also maximizes the covariance between the extracted principal components and the detection target vector, which ensures the greatest correlation between the latent principal components and the detection target vector.
  • Before a calibration model is built with partial least squares, a preprocessing scheme must be applied to correct the raw near-infrared spectra.
  • The widely used near-infrared spectral processing methods mainly include standard normal variate transformation, multiplicative scatter correction, baseline correction, and smoothing.
  • The purpose of the present invention is to provide a method and system for extracting net signals of near-infrared spectra. It addresses the following technical problem: existing preprocessing can remove redundant information from the raw spectral data, highlight the differences between the spectral signals of different samples, simplify the subsequent model, and improve its prediction accuracy,
  • but these processing methods struggle to extract the net analyte signal in the near-infrared spectrum, that is, the signal that contains only the analyte of interest.
  • a net signal extraction method for near-infrared spectra, comprising the steps of:
  • the rank annihilation method is used to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components); the measured spectral signal is orthogonally projected onto the noise subspace, and the component perpendicular to the noise subspace is the net signal of the measured component;
  • rank annihilation is applied as follows: the original data are reconstructed by principal component analysis (PCA), and the reconstructed matrix is denoted $R$.
  • PCA: principal component analysis
  • the solution of the noise subspace is expressed as [equation], where [symbol] is the projection of $c_k$ onto the $A$-dimensional space, $d^T$ is the mean spectrum of the whole calibration set, and the scalar $a$ is calculated as [equation]. For the near-infrared spectral data $r_{k,un}$ of an unknown sample, the net analyte signal is calculated as [equation].
  • PLS: partial least squares
  • $R^2$: coefficient of determination
  • LASSO: wavelength selection method
  • the penalty coefficient in the wavelength selection method is determined by ten-fold cross-validation.
  • a net signal extraction system for near-infrared spectra, comprising:
  • a sampling module, which collects samples and acquires the raw near-infrared spectral data of the samples;
  • a prediction module, which uses a chemical detection method to measure the content of the analyte of interest and takes it as the response variable;
  • a processing module, which applies different spectral preprocessing methods (SNV, MSC, S-G, 1st derivative) and combinations of them to the raw spectral data, uses ten-fold cross-validation to find the optimal preprocessing scheme, and uses the LASSO algorithm to select the bands related to the response variable;
  • an extraction module, which uses the rank annihilation method to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components), and orthogonally projects the measured spectral signal onto the noise subspace; the component perpendicular to the noise subspace is the net signal of the measured component;
  • a detection module, which builds a prediction model, extracts the calibration data, and uses the calibration data to test model performance.
  • An embodiment of the present invention extracts the net analyte signal of the near-infrared spectrum and thereby reduces the number of principal components in the optimal partial least squares model, simplifying the model while improving its accuracy and robustness.
  • The introduction of a preprocessing scheme changes the direction of the near-infrared spectral disturbance, so that the projection of the disturbance onto the net-signal direction is reduced.
  • The introduction of LASSO reduces the norm of the disturbance vector and further removes the influence of interference on the extraction of the net analyte signal.
  • The introduction of the wavelength selection method resolves the multicollinearity in near-infrared spectral data and shrinks the norm of the spectral disturbance vector.
  • Together, these two spectral data processing schemes increase the signal-to-noise ratio of the net analyte signal, which improves model accuracy and robustness.
  • Fig. 1 shows the raw tea spectral data in the tea near-infrared spectral analysis of Embodiment 1 of the present invention;
  • Fig. 2 shows the raw tea spectral data in the tea near-infrared spectral analysis of Embodiment 1 of the present invention;
  • Fig. 3 shows the tea spectral data after S-G (9-point window) + SNV preprocessing in Embodiment 1 of the present invention;
  • Fig. 4 shows the net analyte signal of one tea spectrum after preprocessing in Embodiment 1 of the present invention;
  • Fig. 5 shows the near-infrared bands selected by LASSO in Embodiment 1 of the present invention;
  • Fig. 6 shows the net analyte signal extracted from the processed spectral data in Embodiment 1 of the present invention;
  • Fig. 7 shows the model prediction results based on the optimal preprocessing method and LASSO in Embodiment 1 of the present invention;
  • Fig. 8 shows the model prediction results based on the ordinary processing method and LASSO in Embodiment 1 of the present invention.
  • a net signal extraction method for near-infrared spectra, comprising the following steps:
  • the rank annihilation method is used to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components); the measured spectral signal is orthogonally projected onto the space spanned by the interference signals, and the component perpendicular to that space is the net signal of the measured component;
  • a net signal extraction system for near-infrared spectra, comprising:
  • a sampling module, which collects samples and acquires the raw near-infrared spectral data of the samples;
  • a prediction module, which uses a chemical detection method to measure the content of the analyte of interest and takes it as the response variable;
  • a processing module, which applies different spectral preprocessing methods (SNV, MSC, S-G, 1st derivative) and combinations of them to the raw spectral data, uses ten-fold cross-validation to find the optimal preprocessing scheme, and uses the LASSO algorithm to select the band spectral data related to the response variable as input;
  • an extraction module, which uses the rank annihilation method to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components), and orthogonally projects the measured spectral signal onto the noise subspace; the component perpendicular to the noise subspace is the net signal of the measured component;
  • a detection module, which builds a prediction model, extracts the calibration data, and uses the calibration data to test model performance.
  • One application of this embodiment is as follows: first, PLS calibration models are built from the net analyte signals extracted under different preprocessing methods and compared; the optimal preprocessing scheme is obtained by comparing the experimental results; finally, LASSO wavelength selection is applied to the preprocessed spectral data to obtain the final spectral calibration data, whose net signal is extracted, further raising the signal-to-noise ratio of the spectral signal and simplifying the model.
  • The introduction of the preprocessing scheme changes the direction of the near-infrared spectral disturbance, which reduces the projection of the disturbance onto the net-signal direction.
  • The introduction of LASSO reduces the norm of the disturbance vector and further removes the influence of interference on the extraction of the net analyte signal.
  • The introduction of the wavelength selection method resolves the multicollinearity in near-infrared spectral data and shrinks the norm of the spectral disturbance vector. Together, these two spectral data processing schemes increase the signal-to-noise ratio of the net analyte signal, which improves model accuracy and robustness.
  • each column of the matrix corresponds to one component in the spectrum other than the analyte of interest (that is, an interfering component), $c_k$ is the concentration vector, [symbol] is the pure spectrum containing only the $k$-th component, $I$ is the identity matrix, superscript $T$ denotes the matrix transpose, and superscript $+$ denotes the pseudo-inverse.
  • rank annihilation is applied as follows: the original data are reconstructed by principal component analysis (PCA), and the reconstructed matrix is denoted $R$; the purpose is to avoid $R^T R$ being rank-deficient, which would prevent the regression coefficients from being computed, and at the same time to remove random noise.
  • the solution of the noise subspace in this embodiment is expressed as [equation], where [symbol] is the projection of $c_k$ onto the $A$-dimensional space and $d^T$ is the mean spectrum of the whole calibration set.
  • the scalar $a$ is calculated as [equation].
  • the net analyte signal is calculated as [equation].
  • partial least squares (PLS) is used to build a prediction model, and the coefficient of determination ($R^2$) of the prediction set is used as the criterion for selecting the best preprocessing scheme, provided that neither underfitting nor overfitting occurs; the wavelength selection method (LASSO) is used to select the optimal bands, with its penalty coefficient determined by ten-fold cross-validation; the selected bands are taken as input and the net analyte signal is extracted to serve as the final calibration data; finally, partial least squares (PLS) is used to build a prediction model and test its performance.
  • LASSO: least absolute shrinkage and selection operator (the wavelength selection method)
  • This embodiment provides a method for extracting the net analyte signal in near-infrared spectral analysis of tea, together with the process of selecting a model optimization scheme for predicting the sugar content of tea (as shown in Fig. 1). The specific steps are as follows:
  • Step 1: first prepare the samples to be tested and collect the green tea spectral data as [equation] (as shown in Fig. 2); measure the sugar content of the samples by liquid chromatography as [equation]; randomly divide the samples into a calibration set and a prediction set in a 7:3 ratio;
  • Step 2: process the raw near-infrared spectral data with different preprocessing schemes, extract the net analyte signal related only to sugar content, build a PLS quantitative analysis model, and select the optimal preprocessing method using prediction-set accuracy as the criterion; the optimal preprocessing turns out to be 9-point S-G smoothing combined with SNV.
  • The preprocessed near-infrared spectra are shown in Fig. 3, and the net analyte signal extracted from them is shown in Fig. 4;
  • Step 3: apply LASSO wavelength selection to the preprocessed near-infrared spectra, using 10-fold cross-validation to determine the optimal penalty coefficient.
  • The selected bands are shown in Fig. 5; the net analyte signal of the processed spectral data is then extracted, as shown in Fig. 6, and used as the final modeling data;
  • Step 4: build a quantitative analysis model with PLS on the final spectral data and analyze its performance. With the optimal number of PLS principal components equal to 2, the results of 100 Monte Carlo simulation runs are shown in Fig. 7, and the median prediction-set $R^2$ is 0.91. For comparison, a PLS model under the ordinary processing method (S-G+SNV) gives the results of 100 Monte Carlo simulation runs shown in Fig. 8; with the optimal number of PLS principal components equal to 7, the median prediction-set $R^2$ is 0.89.
  • The method of the present invention can measure the sugar content of green tea with high accuracy from near-infrared spectral data, and the resulting model is more accurate than the conventional modeling approach.

Abstract

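A method and system for extracting net signals of near-infrared spectra, relating to the technical field of near-infrared spectroscopy. The net signal extraction method comprises the following steps: collecting samples and acquiring the raw near-infrared spectral data of the samples; measuring the content of the analyte of interest by a chemical detection method and taking it as the response variable; applying different spectral preprocessing methods and combinations of them to the raw spectral data, using ten-fold cross-validation to find the optimal preprocessing scheme, and using the LASSO algorithm to select the bands related to the response variable. Extracting the net analyte signal of the near-infrared spectrum reduces the number of principal components in the optimal partial least squares model, simplifying the model while improving its accuracy and robustness; the introduction of a preprocessing scheme changes the direction of the near-infrared spectral disturbance, so that the projection of the disturbance onto the net-signal direction is reduced.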

Description

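Method and system for extracting net signals of near-infrared spectrum
Technical Field
The present invention belongs to the technical field of near-infrared spectroscopy, and in particular relates to a method and system for extracting net signals of near-infrared spectra.
Background
Because its wavelengths lie close to the visible region, near-infrared light penetrates strongly and carries a large amount of sample information, which makes it well suited to material composition analysis. In recent years near-infrared spectroscopy has developed rapidly into a new means of analysis and research because, on top of relatively accurate analysis, it is also fast and simple. The most widely used method for building a quantitative analysis model in near-infrared spectral analysis is partial least squares. Like principal component regression, partial least squares is a factor analysis method: the spectral matrix is decomposed during modeling, and a few variables extracted in the decomposition can represent most of the information in the original spectra. In partial least squares regression these variables are called principal components. Partial least squares regression, however, takes the detection target vector into account when extracting principal components and also maximizes the covariance between the extracted principal components and the detection target vector, which ensures the greatest correlation between the latent principal components and the detection target vector. Before a calibration model is built with partial least squares, a preprocessing scheme must be applied to correct the raw near-infrared spectra. The near-infrared spectral processing methods in wide use today mainly include standard normal variate transformation, multiplicative scatter correction, baseline correction, and smoothing.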
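As an illustration of two of the preprocessing methods named above, the following is a minimal NumPy sketch of standard normal variate transformation (SNV) and multiplicative scatter correction (MSC). The function names `snv` and `msc`, the row-per-spectrum layout, and the use of the mean spectrum as the MSC reference are assumptions made for this example; the patent does not specify an implementation.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (assumed here to be the
    mean spectrum unless one is supplied), and the fitted offset and
    slope are removed.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ≈ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected
```

Both functions take a matrix with one spectrum per row and return a matrix of the same shape, so they can be chained, for example `snv(msc(X))`, when combinations of preprocessing methods are compared.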
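Although existing preprocessing can remove redundant information from the raw spectral data, highlight the differences between the spectral signals of different samples, simplify the subsequent model, and improve its prediction accuracy, these processing methods struggle to extract the net analyte signal in the near-infrared spectrum, that is, the signal that contains only the analyte of interest.
Summary of the Invention
The purpose of the present invention is to provide a method and system for extracting net signals of near-infrared spectra. It solves the technical problem that, although existing preprocessing can remove redundant information from the raw spectral data, highlight the differences between the spectral signals of different samples, simplify the subsequent model, and improve its prediction accuracy, these processing methods struggle to extract the net analyte signal in the near-infrared spectrum, that is, the signal that contains only the analyte of interest.
To achieve the above purpose, the present invention is implemented through the following technical solution:
A net signal extraction method for near-infrared spectra, comprising the following steps:
collecting samples and acquiring the raw near-infrared spectral data of the samples;
measuring the content of the analyte of interest by a chemical detection method and taking it as the response variable;
applying different spectral preprocessing methods (SNV, MSC, S-G, 1st derivative) and combinations of them to the raw spectral data, using ten-fold cross-validation to find the optimal preprocessing scheme, and using the LASSO algorithm to select the bands related to the response variable;
under inverse-model conditions (only the content of the analyte of interest is known), using the rank annihilation method to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components), and orthogonally projecting the measured spectral signal onto the noise subspace; the component perpendicular to the noise subspace is the net signal of the measured component;
building a prediction model, extracting the calibration data, and using the calibration data to test model performance.
Optionally, the rank annihilation procedure used in solving for the net signal is as follows: assume
Figure PCTCN2021143614-appb-000001
($H \times 1$) is one acquired spectral vector, $X$ ($N \times H$) contains $N$ near-infrared spectral samples, and $c_k$ ($N \times 1$) is the corresponding concentration vector of the analyte of interest; then $r$ decomposes into two parts, $r = r_{\parallel} + r_{\perp}$, where $r_{\parallel}$ is the projection of $r$ onto the space of the reconstructed matrix and $r_{\perp}$ is the part orthogonal to $r_{\parallel}$.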
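Optionally, the net signal of the near-infrared spectrum is computed by
Figure PCTCN2021143614-appb-000002
where $S_{-k} = \mathrm{span}\{s_1, s_2, \ldots, s_{k-1}, s_{k+1}, \ldots, s_m\}$, each column of the matrix corresponds to one component in the spectrum other than the analyte of interest (that is, an interfering component), $c_k$ is the concentration vector,
Figure PCTCN2021143614-appb-000003
is the pure spectrum containing only the $k$-th component, $I$ is the identity matrix, superscript $T$ denotes the matrix transpose, and superscript $+$ denotes the pseudo-inverse. Under inverse-model conditions there are no prior data from which to solve for the matrix $S_{-k}$, so rank annihilation is used instead, as follows: the original data are reconstructed by principal component analysis (PCA), and the reconstructed matrix is denoted $R$.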
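To make the projection concrete, the sketch below computes a net analyte signal for the idealised case in which the interferent spectra $S_{-k}$ are known. The equation itself is an image that is not present in this text, so the form used here, $r^{*} = (I - S_{-k} S_{-k}^{+})\, r$, is the standard net-analyte-signal projector and is an assumption, as is the function name `net_analyte_signal`.

```python
import numpy as np

def net_analyte_signal(r, S_other):
    """Part of spectrum r that is orthogonal to the interferent spectra.

    r        : (H,) measured spectrum
    S_other  : (H, m-1) matrix with one interferent pure spectrum per column
               (assumed known here; the patent treats the case where it is not)
    """
    r = np.asarray(r, dtype=float)
    S_other = np.asarray(S_other, dtype=float)
    # S_other @ pinv(S_other) is the orthogonal projector onto span(S_other);
    # subtracting the projection leaves the component perpendicular to it.
    return r - S_other @ (np.linalg.pinv(S_other) @ r)
```

Because the interferent contribution is removed entirely, the result scales linearly with the amount of the analyte of interest, which is what makes it usable as calibration data.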
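Optionally, the noise subspace is obtained as
Figure PCTCN2021143614-appb-000004
where
Figure PCTCN2021143614-appb-000005
is the projection of $c_k$ onto the $A$-dimensional space,
Figure PCTCN2021143614-appb-000006
$d^T$ is the mean spectrum of the whole calibration set, and the scalar $a$ is calculated as
Figure PCTCN2021143614-appb-000007
For the near-infrared spectral data $r_{k,un}$ of an unknown sample, the net analyte signal with respect to the analyte is calculated as
Figure PCTCN2021143614-appb-000008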
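The equations in this passage are images that are not present in the text, so the following is only a sketch of one common rank annihilation construction that matches the symbols described: $R$ is a rank-$A$ reconstruction of $X$, $\hat{c}_k = R R^{+} c_k$, $a = 1 / (d^T R^{+} \hat{c}_k)$, $R_{-k} = R - a\, \hat{c}_k d^T$, and the net analyte signal is the part of $r_{k,un}$ orthogonal to the row space of $R_{-k}$. These formulas, the uncentred truncated SVD used for the reconstruction, taking $d$ as the mean row of $R$, and the function name `nas_rank_annihilation` are all assumptions, not the patent's confirmed equations.

```python
import numpy as np

def nas_rank_annihilation(X, c, r_un, A):
    """Net analyte signal of r_un when only the analyte concentrations c are known.

    X    : (N, H) calibration spectra, one per row
    c    : (N,)  concentrations of the analyte of interest
    r_un : (H,)  spectrum of an unknown sample
    A    : number of components kept in the reconstruction
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(c, dtype=float)
    r_un = np.asarray(r_un, dtype=float)

    # PCA-style reconstruction: keep the first A singular triplets of X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :A], s[:A], Vt[:A]
    R = (U * s) @ Vt

    c_hat = U @ (U.T @ c)                    # projection of c onto the A-dim column space of R
    d = R.mean(axis=0)                       # mean calibration spectrum (taken from R, an assumption)
    R_pinv_c = Vt.T @ ((U.T @ c_hat) / s)    # R^+ c_hat, using the truncated SVD
    a = 1.0 / (d @ R_pinv_c)

    # Rank annihilation: removing the analyte's contribution drops the rank to A - 1.
    R_minus_k = R - a * np.outer(c_hat, d)

    # Orthonormal basis of the interferent (noise) subspace: top A - 1 right singular vectors.
    _, _, Wt = np.linalg.svd(R_minus_k, full_matrices=False)
    W = Wt[:A - 1].T
    return r_un - W @ (W.T @ r_un)
```

On noise-free synthetic mixtures this reproduces the projection that would be obtained if the interferent spectra were known directly; with real spectra the choice of `A` controls how much noise is carried into the estimated subspace.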
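Optionally, partial least squares (PLS) is used to build a prediction model, and the coefficient of determination ($R^2$) of the prediction set is used as the criterion for selecting the best preprocessing scheme, provided that neither underfitting nor overfitting occurs; the wavelength selection method (least absolute shrinkage and selection operator, LASSO) is used to select the optimal bands; the selected bands are taken as input and the net analyte signal is extracted to serve as the final calibration data; finally, partial least squares (PLS) is used to build a prediction model and test its performance.
Optionally, the penalty coefficient in the wavelength selection method (LASSO) is determined by ten-fold cross-validation.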
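The band selection step can be sketched as follows: a plain coordinate-descent LASSO, with the penalty chosen from a candidate grid by ten-fold cross-validation, and the bands with non-zero coefficients kept. The function names, the candidate grid `lambdas`, and the fixed iteration count are assumptions for this illustration; in practice a library routine would normally be used instead of this hand-written solver.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (X and y centred)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * b[j]              # take band j out of the fit
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * b[j]              # put band j back in
    return b

def select_bands_lasso(X, y, lambdas, n_folds=10, seed=0):
    """Pick the penalty by k-fold cross-validation, then return it with the selected band indices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_err = []
    for lam in lambdas:
        err = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            mx, my = X[train].mean(axis=0), y[train].mean()
            b = lasso_cd(X[train] - mx, y[train] - my, lam)
            err += (((X[test] - mx) @ b + my - y[test]) ** 2).sum()
        cv_err.append(err / len(y))
    best = lambdas[int(np.argmin(cv_err))]
    b = lasso_cd(X - X.mean(axis=0), y - y.mean(), best)
    return best, np.flatnonzero(b)
```

The returned indices can be used to slice the preprocessed spectra, for example `X[:, bands]`, before the net analyte signal is extracted.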
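A net signal extraction system for near-infrared spectra, comprising:
a sampling module, which collects samples and acquires the raw near-infrared spectral data of the samples;
a prediction module, which uses a chemical detection method to measure the content of the analyte of interest and takes it as the response variable;
a processing module, which applies different spectral preprocessing methods (SNV, MSC, S-G, 1st derivative) and combinations of them to the raw spectral data, uses ten-fold cross-validation to find the optimal preprocessing scheme, and uses the LASSO algorithm to select the bands related to the response variable;
an extraction module, which, under inverse-model conditions (only the content of the analyte of interest is known), uses the rank annihilation method to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components), and orthogonally projects the measured spectral signal onto the noise subspace; the component perpendicular to the noise subspace is the net signal of the measured component;
a detection module, which builds a prediction model, extracts the calibration data, and uses the calibration data to test model performance.
Embodiments of the present invention have the following beneficial effects:
An embodiment of the present invention extracts the net analyte signal of the near-infrared spectrum and thereby reduces the number of principal components in the optimal partial least squares model, simplifying the model while improving its accuracy and robustness. The introduction of a preprocessing scheme changes the direction of the near-infrared spectral disturbance, so that the projection of the disturbance onto the net-signal direction is reduced, and the introduction of LASSO reduces the norm of the disturbance vector, further removing the influence of interference on the extraction of the net analyte signal. In addition, the introduction of the wavelength selection method resolves the multicollinearity in near-infrared spectral data and shrinks the norm of the spectral disturbance vector. Together, these two spectral data processing schemes increase the signal-to-noise ratio of the net analyte signal, which improves model accuracy and robustness.
Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.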
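Brief Description of the Drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows the raw tea spectral data in the tea near-infrared spectral analysis of Embodiment 1 of the present invention;
Fig. 2 shows the raw tea spectral data in the tea near-infrared spectral analysis of Embodiment 1 of the present invention;
Fig. 3 shows the tea spectral data after S-G (9-point window) + SNV preprocessing in Embodiment 1 of the present invention;
Fig. 4 shows the net analyte signal of one tea spectrum after preprocessing in Embodiment 1 of the present invention;
Fig. 5 shows the near-infrared bands selected by LASSO in Embodiment 1 of the present invention;
[Rectified under Rule 91, 14.09.2022]
Fig. 6 shows the net analyte signal extracted from the processed spectral data in Embodiment 1 of the present invention;
[Rectified under Rule 91, 14.09.2022]
Fig. 7 shows the model prediction results based on the optimal preprocessing method and LASSO in Embodiment 1 of the present invention;
[Rectified under Rule 91, 14.09.2022]
Fig. 8 shows the model prediction results based on the ordinary processing method and LASSO in Embodiment 1 of the present invention.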
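Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention or its application or use.
To keep the following description of the embodiments clear and concise, detailed descriptions of known functions and known components are omitted.
This embodiment provides a net signal extraction method for near-infrared spectra, comprising the following steps:
collecting samples and acquiring the raw near-infrared spectral data of the samples;
measuring the content of the analyte of interest by a chemical detection method and taking it as the response variable;
applying different spectral preprocessing methods (SNV, MSC, S-G, 1st derivative) and combinations of them to the raw spectral data, using ten-fold cross-validation to find the optimal preprocessing scheme, and using the LASSO algorithm to select the bands related to the response variable;
using the LASSO algorithm to select the bands related to the response variable and taking them as input data;
under inverse-model conditions (only the content of the analyte of interest is known), using the rank annihilation method to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components), and orthogonally projecting the measured spectral signal onto the space spanned by the interference signals; the component perpendicular to that space is the net signal of the measured component;
building a prediction model, extracting the calibration data, and using the calibration data to test model performance.
A net signal extraction system for near-infrared spectra, comprising:
a sampling module, which collects samples and acquires the raw near-infrared spectral data of the samples;
a prediction module, which uses a chemical detection method to measure the content of the analyte of interest and takes it as the response variable;
a processing module, which applies different spectral preprocessing methods (SNV, MSC, S-G, 1st derivative) and combinations of them to the raw spectral data, uses ten-fold cross-validation to find the optimal preprocessing scheme, and uses the LASSO algorithm to select the band spectral data related to the response variable as input;
an extraction module, which, under inverse-model conditions (only the content of the analyte of interest is known), uses the rank annihilation method to obtain the noise subspace, that is, the subspace spanned by the interference signals (the vectors of the other chemical components), and orthogonally projects the measured spectral signal onto the noise subspace; the component perpendicular to the noise subspace is the net signal of the measured component;
a detection module, which builds a prediction model, extracts the calibration data, and uses the calibration data to test model performance.
One application of this embodiment is as follows: first, PLS calibration models are built from the net analyte signals extracted under different preprocessing methods and compared; the optimal preprocessing scheme is obtained by comparing the experimental results; finally, LASSO wavelength selection is applied to the preprocessed spectral data to obtain the final spectral calibration data, whose net signal is extracted, further raising the signal-to-noise ratio of the spectral signal and simplifying the model.
Extracting the net analyte signal of the near-infrared spectrum reduces the number of principal components in the optimal partial least squares model, simplifying the model while improving its accuracy and robustness. The introduction of a preprocessing scheme changes the direction of the near-infrared spectral disturbance, so that the projection of the disturbance onto the net-signal direction is reduced, and the introduction of LASSO reduces the norm of the disturbance vector, further removing the influence of interference on the extraction of the net analyte signal. In addition, the introduction of the wavelength selection method resolves the multicollinearity in near-infrared spectral data and shrinks the norm of the spectral disturbance vector. Together, these two spectral data processing schemes increase the signal-to-noise ratio of the net analyte signal, which improves model accuracy and robustness.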
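In this embodiment, the rank annihilation procedure used in solving for the net signal is as follows: assume
Figure PCTCN2021143614-appb-000009
($H \times 1$) is one acquired spectral vector, $X$ ($N \times H$) contains $N$ near-infrared spectral samples, and $c_k$ ($N \times 1$) is the corresponding concentration vector of the analyte of interest; then $r$ decomposes into two parts, $r = r_{\parallel} + r_{\perp}$, where $r_{\parallel}$ is the projection of $r$ onto the space of the reconstructed matrix and $r_{\perp}$ is the part orthogonal to $r_{\parallel}$. The concentration $c_k$ of the analyte of interest is related only to this part of the signal in the near-infrared spectrum.
In this embodiment, the net signal of the near-infrared spectrum is computed by
Figure PCTCN2021143614-appb-000010
where $S_{-k} = \mathrm{span}\{s_1, s_2, \ldots, s_{k-1}, s_{k+1}, \ldots, s_m\}$, each column of the matrix corresponds to one component in the spectrum other than the analyte of interest (that is, an interfering component), $c_k$ is the concentration vector,
Figure PCTCN2021143614-appb-000011
is the pure spectrum containing only the $k$-th component, $I$ is the identity matrix, superscript $T$ denotes the matrix transpose, and superscript $+$ denotes the pseudo-inverse.
In this embodiment, under inverse-model conditions there are no prior data from which to solve for the matrix $S_{-k}$, so rank annihilation is used instead, as follows: the original data are reconstructed by principal component analysis (PCA), and the reconstructed matrix is denoted $R$. The purpose is to avoid $R^T R$ being rank-deficient, which would prevent the regression coefficients from being computed, and at the same time to remove random noise.
In this embodiment, the noise subspace is obtained as
Figure PCTCN2021143614-appb-000012
where
Figure PCTCN2021143614-appb-000013
is the projection of $c_k$ onto the $A$-dimensional space,
Figure PCTCN2021143614-appb-000014
and $d^T$ is the mean spectrum of the whole calibration set. The scalar $a$ is calculated as
Figure PCTCN2021143614-appb-000015
In this embodiment, for the near-infrared spectral data $r_{k,un}$ of an unknown sample, the net analyte signal with respect to the analyte is calculated as
Figure PCTCN2021143614-appb-000016
In this embodiment, partial least squares (PLS) is used to build a prediction model, and the coefficient of determination ($R^2$) of the prediction set is used as the criterion for selecting the best preprocessing scheme, provided that neither underfitting nor overfitting occurs; the wavelength selection method (least absolute shrinkage and selection operator, LASSO) is used to select the optimal bands, with its penalty coefficient determined by ten-fold cross-validation; the selected bands are taken as input and the net analyte signal is extracted to serve as the final calibration data; finally, partial least squares (PLS) is used to build a prediction model and test its performance.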
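For readers who want to see what building the PLS prediction model involves, here is a minimal single-response PLS (PLS1) sketch using the NIPALS-style deflation scheme. It is an illustrative stand-in rather than the patent's implementation; the function names `pls1_fit` and `pls1_predict` and the returned tuple layout are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1; returns (x_mean, y_mean, coef) with y ≈ (x - x_mean) @ coef + y_mean."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                      # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                        # scores (the latent "principal component")
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        qa = f @ t / tt                  # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - qa * t                   # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    """Predict responses for new spectra with a model returned by pls1_fit."""
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

The number of components is the quantity the patent reports as the optimal number of PLS principal components; it would normally be chosen by validation rather than fixed in advance.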
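Embodiment 1:
This embodiment provides a method for extracting the net analyte signal in near-infrared spectral analysis of tea, together with the process of selecting a model optimization scheme for predicting the sugar content of tea (as shown in Fig. 1). The specific steps are as follows:
Step 1: first prepare the samples to be tested and collect the green tea spectral data as
Figure PCTCN2021143614-appb-000017
(as shown in Fig. 2); measure the sugar content of the samples by liquid chromatography as
Figure PCTCN2021143614-appb-000018
and randomly divide the samples into a calibration set and a prediction set in a 7:3 ratio;
Step 2: process the raw near-infrared spectral data with different preprocessing schemes, extract the net analyte signal related only to sugar content, build a PLS quantitative analysis model, and select the optimal preprocessing method using prediction-set accuracy as the criterion; the optimal preprocessing turns out to be 9-point S-G smoothing combined with SNV. The preprocessed near-infrared spectra are shown in Fig. 3, and the net analyte signal extracted from them is shown in Fig. 4;
Step 3: apply LASSO wavelength selection to the preprocessed near-infrared spectra, using 10-fold cross-validation to determine the optimal penalty coefficient; the selected bands are shown in Fig. 5. The net analyte signal of the processed spectral data is then extracted, as shown in Fig. 6, and used as the final modeling data;
Step 4: build a quantitative analysis model with PLS on the final spectral data and analyze its performance. With the optimal number of PLS principal components equal to 2, the results of 100 Monte Carlo simulation runs are shown in Fig. 7, and the median prediction-set $R^2$ is 0.91. For comparison, a PLS model under the ordinary processing method (S-G+SNV) gives the results of 100 Monte Carlo simulation runs shown in Fig. 8; with the optimal number of PLS principal components equal to 7, the median prediction-set $R^2$ is 0.89.
The comparison shows that the method of the present invention can measure the sugar content of green tea with high accuracy from near-infrared spectral data, and that the resulting model is more accurate than the conventional modeling approach.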
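The evaluation protocol in Steps 1 and 4 (a random 7:3 split repeated 100 times, summarised by the median prediction-set $R^2$) can be sketched as below. The model is passed in as a pair of `fit` and `predict` callables; the ordinary least squares pair shown here is only a stand-in so the example runs on its own, and all function names are assumptions.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def monte_carlo_r2(X, y, fit, predict, n_runs=100, train_frac=0.7, seed=0):
    """Repeat a random calibration/prediction split and return (median R^2, all R^2)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    scores = []
    for _ in range(n_runs):
        idx = rng.permutation(n)
        tr, te = idx[:n_train], idx[n_train:]
        model = fit(X[tr], y[tr])
        scores.append(r2(y[te], predict(model, X[te])))
    return float(np.median(scores)), np.array(scores)

# Stand-in model (ordinary least squares with an intercept), used only for the demo.
def ols_fit(X, y):
    Xa = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xa, y, rcond=None)[0]

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

Any model with the same two-function interface, such as a PLS fit with a fixed number of components, can be substituted for the stand-in when two processing pipelines are compared.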
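The above embodiments serve only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.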

Claims (10)

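  1. A net signal extraction method for near-infrared spectra, characterized by comprising the following steps:
    collecting samples and acquiring the raw near-infrared spectral data of the samples;
    measuring the content of the analyte of interest by a chemical detection method and taking it as the response variable;
    applying different spectral preprocessing methods and combinations of them to the raw spectral data, using ten-fold cross-validation to find the optimal preprocessing scheme, and using the LASSO algorithm to select the bands related to the response variable;
    under inverse-model conditions, using the rank annihilation method to obtain the noise subspace and orthogonally projecting the measured spectral signal onto the noise subspace, the component perpendicular to the noise subspace being the net signal of the measured component;
    building a prediction model, extracting the calibration data, and using the calibration data to test model performance.
  2. The net signal extraction method for near-infrared spectra according to claim 1, characterized in that the rank annihilation procedure used in solving for the net signal is as follows: assume
    Figure PCTCN2021143614-appb-100001
    is one acquired spectral vector, $X$ ($N \times H$) contains $N$ near-infrared spectral samples, and $c_k$ ($N \times 1$) is the corresponding concentration vector of the analyte of interest; then $r$ decomposes into two parts, $r = r_{\parallel} + r_{\perp}$, where $r_{\parallel}$ is the projection of $r$ onto the noise subspace and $r_{\perp}$ is the part orthogonal to $r_{\parallel}$.
  3. The net signal extraction method for near-infrared spectra according to claim 2, characterized in that the net signal of the near-infrared spectrum is computed by
    Figure PCTCN2021143614-appb-100002
    where $S_{-k} = \mathrm{span}\{s_1, s_2, \ldots, s_{k-1}, s_{k+1}, \ldots, s_m\}$, each column of the matrix corresponds to one component in the spectrum other than the analyte of interest, $c_k$ is the concentration vector,
    Figure PCTCN2021143614-appb-100003
    is the pure spectrum containing only the $k$-th component, $I$ is the identity matrix, superscript $T$ denotes the matrix transpose, and superscript $+$ denotes the pseudo-inverse.
  4. The net signal extraction method for near-infrared spectra according to claim 3, characterized in that, under inverse-model conditions, there are no prior data from which to solve for the matrix $S_{-k}$, so rank annihilation is used instead, as follows: the original data are reconstructed by principal component analysis, and the reconstructed matrix is denoted $R$.
  5. The net signal extraction method for near-infrared spectra according to claim 4, characterized in that the noise subspace is obtained as
    Figure PCTCN2021143614-appb-100004
    where
    Figure PCTCN2021143614-appb-100005
    is the projection of $c_k$ onto the space of the reconstructed matrix,
    Figure PCTCN2021143614-appb-100006
    and $d^T$ is the mean spectrum of the whole calibration set.
  6. The net signal extraction method for near-infrared spectra according to claim 5, characterized in that the scalar $a$ is calculated as
    Figure PCTCN2021143614-appb-100007
  7. The net signal extraction method for near-infrared spectra according to claim 6, characterized in that, for the near-infrared spectral data $r_{k,un}$ of an unknown sample, the net analyte signal with respect to the analyte is calculated as
    Figure PCTCN2021143614-appb-100008
  8. The net signal extraction method for near-infrared spectra according to claim 7, characterized in that partial least squares is used to build a prediction model, and the coefficient of determination of the prediction set is used as the criterion for selecting the best preprocessing scheme, provided that neither underfitting nor overfitting occurs; LASSO is used to select the optimal bands; the selected bands are taken as input and the net analyte signal is extracted to serve as the final calibration data; finally, partial least squares is used to build a prediction model and test its performance.
  9. The net signal extraction method for near-infrared spectra according to claim 8, characterized in that the penalty coefficient in the wavelength selection method is determined by ten-fold cross-validation.
  10. A net signal extraction system for near-infrared spectra, characterized by comprising:
    a sampling module, which collects samples and acquires the raw near-infrared spectral data of the samples;
    a prediction module, which uses a chemical detection method to measure the content of the analyte of interest and takes it as the response variable;
    a processing module, which applies different spectral preprocessing methods and combinations of them to the raw spectral data, uses ten-fold cross-validation to find the optimal preprocessing scheme, and uses the LASSO algorithm to select the bands related to the response variable;
    an extraction module, which, under inverse-model conditions, uses the rank annihilation method to obtain the noise subspace and orthogonally projects the measured spectral signal onto the noise subspace, the component perpendicular to the noise subspace being the net signal of the measured component;
    a detection module, which builds a prediction model, extracts the calibration data, and uses the calibration data to test model performance.
PCT/CN2021/143614 2021-12-29 2021-12-31 Method and system for extracting net signals of near-infrared spectrum WO2023123329A1 (zh)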

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/109,439 US20230204504A1 (en) 2021-12-29 2023-02-14 Method and system for extracting net signals of near infrared spectrum

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111634942.1 2021-12-29
CN202111634942.1A CN114298107A (zh) 2021-12-29 2021-12-29 Method and system for extracting net signals of near-infrared spectrum

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/109,439 Continuation US20230204504A1 (en) 2021-12-29 2023-02-14 Method and system for extracting net signals of near infrared spectrum

Publications (1)

Publication Number Publication Date
WO2023123329A1 (zh) 2023-07-06

Family

ID=80971159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143614 WO2023123329A1 (zh) 2021-12-29 2021-12-31 Method and system for extracting net signals of near-infrared spectrum

Country Status (2)

Country Link
CN (1) CN114298107A (zh)
WO (1) WO2023123329A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117288692A (zh) * 2023-11-23 2023-12-26 四川轻化工大学 A method for detecting tannin content in brewing grains

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6697654B2 (en) * 1999-07-22 2004-02-24 Sensys Medical, Inc. Targeted interference subtraction applied to near-infrared measurement of analytes
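CN103239239A (zh) * 2013-04-23 2013-08-14 天津大学 A fixed-amplitude dynamic spectrum data extraction method
CN104255118A (zh) * 2014-01-22 2015-01-07 南京农业大学 Rapid nondestructive testing method for rice seed germination rate based on near-infrared spectroscopy
CN105203498A (zh) * 2015-09-11 2015-12-30 天津工业大学 A LASSO-based near-infrared spectral variable selection method
CN110006844A (zh) * 2019-05-22 2019-07-12 安徽大学 Near-infrared spectral feature extraction method and system based on functional principal component analysis
CN111965137A (zh) * 2020-08-18 2020-11-20 山东金璋隆祥智能科技有限责任公司 A method for determining component content and moisture in Ganmao Qingre granules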

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BLANCO, M. CASTILLO, M. PEINADO, A. BENEYTO, R.: "Determination of low analyte concentrations by near-infrared spectroscopy: Effect of spectral pretreatments and estimation of multivariate detection limits", ANALYTICA CHIMICA ACTA, ELSEVIER, AMSTERDAM, NL, vol. 581, no. 2, 9 January 2007 (2007-01-09), AMSTERDAM, NL , pages 318 - 323, XP022209563, ISSN: 0003-2670, DOI: 10.1016/j.aca.2006.08.018 *
GENG YING, XIANG BING-REN, HE LAN: "Study on the Application of NAS-Based Algorithm in the NIR Model Optimization", SPECTROSCOPY AND SPECTRAL ANALYSIS, vol. 35, no. 10, 1 October 2015 (2015-10-01), pages 2730 - 2733, XP093074821 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117288692A (zh) * 2023-11-23 2023-12-26 四川轻化工大学 A method for detecting tannin content in brewing grains
CN117288692B (zh) * 2023-11-23 2024-04-02 四川轻化工大学 A method for detecting tannin content in brewing grains

Also Published As

Publication number Publication date
CN114298107A (zh) 2022-04-08
