# CN102854151B - Chemometrics method for classifying sample sets in spectrum analysis - Google Patents

## Info

- Publication number: CN102854151B
- Authority: CN
- Grant status: Grant
- Patent type
- Prior art keywords: chemometrics, method, spectrum, classifying, sample
- Prior art date

## Description

A chemometric method for sample set partitioning in spectral analysis

Technical Field

[0001] The present invention relates to the technical field of sample set partitioning in spectral analysis, and in particular to a chemometric method for partitioning samples into a calibration set and a prediction set.

Background

[0002] Spectral analysis determines the chemical composition of a substance and the content of its components by qualitative or quantitative analysis of its spectrum, and has the advantages of being rapid, sensitive, and non-invasive. The spectral analysis techniques in current use are mainly infrared, ultraviolet, and Raman spectroscopy. Near-infrared (NIR) spectroscopy in particular has application advantages in many fields, including environmental science, food, agriculture, and biomedicine, because it is simple, fast, non-destructive, usable online in real time, and able to detect multiple components simultaneously.

[0003] Spectral analysis requires dividing all samples to be analyzed into a calibration set and a prediction set. First, a calibration model is built from the reference chemical values (C) and spectral absorbances (A) of the calibration-set samples. The calibration model is then applied to the spectral absorbances of the prediction-set samples to compute predicted component contents, and the predictive performance of the model is evaluated by comparing the predicted values with the reference chemical values of the prediction-set samples. The calibration model is built, optimized, and evaluated on the basis of the spectral absorbances and reference chemical values of the samples. During spectral measurement, however, factors such as the experimental environment, operator skill, and instrument precision can introduce noise such as drift and tilt into the absorbances. Likewise, conventional chemical measurement methods unavoidably introduce system noise, environmental noise, and operating noise into the chemical values. The resulting measurement errors make it difficult for the calibration model to achieve ideal predictive performance.

[0004] Experiments show that, because of this noise, different partitions into calibration and prediction sets cause large variations in the predictive performance of the model, and that model parameters (such as the spectral waveband, the smoothing mode, and the number of PLS factors) are also affected. To find a good partition and improve the applicability of the model, a key research topic in spectral analysis is how to select wavelength points with a high signal-to-noise ratio during partitioning and to use them as the basis for a good partition into calibration and prediction sets.

Summary of the Invention

[0005] The technical problem addressed by the present invention is to provide a chemometric method of sample partitioning for spectral analysis that provides good data preparation for model optimization. The method is applicable to ultraviolet-visible (UV), near-infrared (NIR), mid-infrared (MIR), Raman, and other spectral analysis, and has been verified in NIR analysis of pomelo peel pectin, NIR analysis of soil organic matter and total nitrogen, MIR analysis of the chemical oxygen demand of wastewater, and MIR analysis of blood hemoglobin.

[0006] The specific steps are:

[0007] 1) Data normalization

[0008] a) Normalization of the reference chemical values

[0009] (Equations (1) and (2), which give the mean $C_m$ and the normalized value $C_n(j)$, are not reproduced in this text.)

[0011] b) Normalization of the spectral data

(Equations (3) to (5), which give the mean $A_{i,m}$, the normalized value $\mathrm{norm}(A_{ij})$, and the modulus $A_n(j)$, are not reproduced in this text.)

[0015] where $N$ is the number of samples and $P$ is the number of wavelength points; $C_j$ is the reference chemical value of sample $j$, $C_m$ is the mean of the reference chemical values of all samples, and $C_n(j) = \mathrm{norm}(C_j)$ is the chemical value of that sample after normalization; $A_{ij}$ is the absorbance of sample $j$ at the $i$-th wavelength, $A_{i,m}$ is the mean absorbance of all $N$ samples at the $i$-th wavelength, and $\mathrm{norm}(A_{ij})$ is the absorbance of that sample at the $i$-th wavelength after normalization; $A_n(j) = |A_j|$ is the modulus of the absorbance vector of sample $j$.

[0016] After the normalization of the reference chemical values and absorbances above, each sample has one $C_n(j)$ and one $A_n(j)$. Following the Lambert-Beer law, a regression over the $C_n(j)$ and $A_n(j)$ of all samples ($j = 1, 2, \ldots, N$) gives the predicted chemical value $C'_n(j)$ of each sample. The regression deviation of the normalized data (RDND) is then computed for each sample, followed by the mean RDND over all samples, $\mathrm{RDND}_{Ave}$:

[0017] $\mathrm{RDND}(j) = \left| C'_n(j) - C_n(j) \right|$, (6)
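The RDND computation described above can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: equations (2) and (4) are not available in this text, so the normalization used here (division by the mean) is an assumption, as are the function name `rdnd`, the samples-as-rows array layout, and the use of a straight-line least-squares fit as the Lambert-Beer regression.

```python
import numpy as np

def rdnd(C, A):
    """Return (Cn, An, dev, dev_ave) for N samples.

    C: reference chemical values, shape (N,)
    A: absorbances, shape (N, P), one row per sample (hypothetical layout).
    """
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    # ASSUMPTION: normalize by dividing by the mean; the patent's
    # equations (2) and (4) are not reproduced in the text.
    Cn = C / C.mean()
    A_norm = A / A.mean(axis=0)           # per-wavelength mean over samples
    An = np.linalg.norm(A_norm, axis=1)   # modulus of each sample's vector
    # ASSUMPTION: linear least-squares fit Cn ~ slope * An + intercept.
    slope, intercept = np.polyfit(An, Cn, 1)
    Cn_pred = slope * An + intercept
    dev = np.abs(Cn_pred - Cn)            # equation (6), one value per sample
    return Cn, An, dev, dev.mean()        # dev.mean() plays the role of RDND_Ave
```

Calling `rdnd(C, A)` on the full data set yields the per-sample deviations and their mean, which the partitioning step uses to decide where shared samples go.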

[0018] 2) Partitioning of the extreme-value and second-extreme-value samples

[0019] So that the calibration and prediction model maintains good correlation, the 2 samples with the maximum and minimum $C_n(j)$ and the 2 samples with the maximum and minimum $A_n(j)$ should in principle be placed in the calibration set, and the 2 samples with the second-largest and second-smallest $C_n(j)$ and the 2 samples with the second-largest and second-smallest $A_n(j)$ should be placed in the prediction set. Some of the samples selected in this way may coincide, however, and must be handled accordingly. The procedure is as follows:

[0020] The 2 samples with the maximum and minimum $C_n(j)$ and the 2 samples with the maximum and minimum $A_n(j)$ form the extreme-value set, denoted SZ. The 2 samples with the second-largest and second-smallest $C_n(j)$ and the 2 samples with the second-largest and second-smallest $A_n(j)$ form the second-extreme-value set, denoted SC. First assume that the samples within SZ and within SC are all distinct, so that each set contains 4 samples. The intersection of SZ and SC is then examined to determine how these samples are partitioned.

[0021] If $SZ \cap SC$ is empty, that is, SZ and SC have no sample in common, all samples in SZ are placed in the calibration set and all samples in SC are placed in the prediction set. The number of repeated samples within SZ, $s_1$, and the number of repeated samples within SC, $s_2$, are also recorded, with $s_1, s_2 \in \{0, 1, 2\}$.

[0022] If $SZ \cap SC$ is not empty, the number of samples in $SZ \cap SC$ is recorded as $s_3$ ($s_3 = 1, 2, 3, 4$). The RDND of each sample in $SZ \cap SC$ is compared with $\mathrm{RDND}_{Ave}$: a sample with $\mathrm{RDND} > \mathrm{RDND}_{Ave}$ is placed in the calibration set, and otherwise it is placed in the prediction set. Then all samples in $SZ \cap C_S(SC)$ are placed in the calibration set and all samples in $C_S(SZ) \cap SC$ are placed in the prediction set, and the numbers of repeated samples within $SZ \cap C_S(SC)$ and within $C_S(SZ) \cap SC$ are recorded as $s_1$ and $s_2$ respectively, with $s_1, s_2 \in \{0, 1, 2\}$; here $C_S$ denotes the set complement.
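The handling of the extreme-value and second-extreme-value samples can be sketched as below. This is a hedged illustration: the function name `split_extremes`, the zero-based sample indices, and the tie-breaking behaviour of `numpy.argsort` are assumptions, and the sketch presumes at least four samples with distinct $C_n$ and $A_n$ values.

```python
import numpy as np

def split_extremes(Cn, An, dev):
    """Assign extreme samples; return (cal, pred, s1, s2, s3).

    Cn, An: normalized chemical values and absorbance moduli, shape (N,)
    dev:    RDND of every sample, shape (N,); its mean is RDND_Ave.
    """
    Cn, An, dev = (np.asarray(x, dtype=float) for x in (Cn, An, dev))
    oc, oa = np.argsort(Cn), np.argsort(An)
    # Four picks each: max/min of Cn and An, then second max/min.
    sz_picks = [int(oc[-1]), int(oc[0]), int(oa[-1]), int(oa[0])]
    sc_picks = [int(oc[-2]), int(oc[1]), int(oa[-2]), int(oa[1])]
    SZ, SC = set(sz_picks), set(sc_picks)
    shared = SZ & SC
    s3 = len(shared)
    sz_only = [j for j in sz_picks if j not in shared]
    sc_only = [j for j in sc_picks if j not in shared]
    # Repeated picks inside each set (a sample chosen twice counts once).
    s1 = len(sz_only) - len(set(sz_only))
    s2 = len(sc_only) - len(set(sc_only))
    cal, pred = set(sz_only), set(sc_only)
    ave = dev.mean()
    for j in shared:
        # Shared samples: RDND above the mean goes to calibration.
        (cal if dev[j] > ave else pred).add(j)
    return cal, pred, s1, s2, s3
```

With these counts, the number of samples left for the random partitioning step is `N - 8 + s1 + s2 + s3`.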

[0023] 3) Partitioning principle for the remaining samples

[0024] After the extreme-value samples are partitioned, $N - 8 + s_1 + s_2 + s_3$ samples remain. The remaining samples are partitioned on the principle of highest correlation: for each wavelength point $i$, the correlation coefficient $R(i)$ between the spectral data and the reference chemical values is computed (equation (7), not reproduced in this text).

[0026] The largest value over all wavelength points, $R_{note} = \max\{R(i),\ i = 1, 2, \ldots, P\}$, is found, and the index $i_{note}$ of the wavelength point at which $R_{note}$ occurs is recorded.
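Finding the wavelength point of highest correlation can be sketched as follows. Equation (7) is not available in this text, so this sketch assumes that $R(i)$ is the ordinary Pearson correlation coefficient; the function name `best_wavelength` and the samples-as-rows layout are likewise assumptions. The returned index is zero-based, whereas the patent counts wavelength points from 1, and the sketch presumes that no wavelength column is constant.

```python
import numpy as np

def best_wavelength(A, C):
    """Return (R, i_note, R_note) for absorbances A (N, P) and values C (N,)."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    Ac = A - A.mean(axis=0)               # center each wavelength column
    Cc = C - C.mean()
    # ASSUMPTION: Pearson correlation between column i of A and C.
    denom = np.sqrt((Ac ** 2).sum(axis=0) * (Cc ** 2).sum())
    R = (Ac * Cc[:, None]).sum(axis=0) / denom
    i_note = int(np.argmax(R))            # follows the text: max of R, not |R|
    return R, i_note, float(R[i_note])
```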

[0027] The remaining samples are randomly partitioned a sufficient number of times. For each partition, the spectral data $\{A_{note}\}$ at the $i_{note}$-th wavelength point are taken together with the reference chemical values of the samples, and the correlation coefficients $R_{Cset}$ and $R_{Pset}$ are computed within the calibration set and within the prediction set respectively (equations (8) and (9), not reproduced in this text),

[0030] where $L$ and $K$ are the numbers of samples in the calibration set and the prediction set, so that $L + K = N$; $C_{Lm}$ and $C_{Km}$ are the mean chemical values of the calibration-set and prediction-set samples; $A_{note,L}(j)$ is the spectral datum of the $j$-th calibration-set sample at the $i_{note}$-th wavelength point, and $A_{note,Lm}$ is the mean of the calibration-set spectral data at that point; $A_{note,K}(j)$ is the spectral datum of the $j$-th prediction-set sample at the $i_{note}$-th wavelength point, and $A_{note,Km}$ is the mean of the prediction-set spectral data at that point.

[0031] The absolute deviation between $R_{Cset}$ and $R_{Pset}$, called the absolute offset of correlation coefficients (AOC), is computed:

[0032] $\mathrm{AOC} = \left| R_{Cset} - R_{Pset} \right|$, (10)

[0033] So that the calibration set and the prediction set are similar, a partition satisfying $\mathrm{AOC} < 10^{-5}$ is selected as the partition used to build the near-infrared spectral analysis model.

[0034] With this partitioning method, all samples to be analyzed are divided into a calibration set and a prediction set in a ratio of 2:1; the computation above is carried out and a partition satisfying $\mathrm{AOC} < 10^{-5}$ is selected.
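The repeated random partitioning with the AOC criterion can be sketched as below. This is an illustrative sketch under stated assumptions: equations (8) and (9) are taken to be Pearson correlation coefficients, the correlations are computed only over the samples passed in (the text does not say whether the already-assigned extreme samples are included), and the names `pearson`, `search_split`, `tol`, `max_tries`, and `seed` are hypothetical. The search returns the best partition found, which may not reach the threshold if too few tries are allowed.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def search_split(a_note, C, n_cal, tol=1e-5, max_tries=100_000, seed=0):
    """Randomly split samples until AOC < tol; return (aoc, cal_idx, pred_idx).

    a_note: absorbance of each sample at the best-correlated wavelength
    C:      reference chemical value of each sample
    n_cal:  number of samples to place in the calibration set
    """
    a_note = np.asarray(a_note, dtype=float)
    C = np.asarray(C, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(max_tries):
        perm = rng.permutation(len(C))
        cal, pred = perm[:n_cal], perm[n_cal:]
        # Equation (10): absolute offset of the two correlation coefficients.
        aoc = abs(pearson(a_note[cal], C[cal]) - pearson(a_note[pred], C[pred]))
        if best is None or aoc < best[0]:
            best = (aoc, cal, pred)
        if aoc < tol:
            break
    return best
```

The caller chooses `n_cal` so that the final calibration and prediction sets, including the extreme samples assigned earlier, come out at roughly 2:1.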

[0035] The chemometric method of the present invention applies noise reduction, normalization, and correlation processing to the spectral data and reference chemical values of all samples, and then partitions the samples. The normalized, noise-reduced absorbances and chemical values are combined in the computation, and the partition is based on the spectral data at the wavelength point of highest correlation, so that the calibration model has a high coefficient of determination. At the same time, comparing the internal correlation coefficients of the calibration set and the prediction set ensures that the two sets have a similar degree of correlation after partitioning. A calibration model built on such a partition can achieve good predictive performance. In this sense, the proposed chemometric method provides a good data basis for model optimization in spectral analysis. It is applicable to data modeling, optimization, and model validation systems for infrared, ultraviolet, Raman, and other spectral analysis, and provides good data preparation for model optimization procedures such as selecting continuous wavebands, selecting combinations of discrete wavelengths, and selecting peaks of the original and derivative spectra.

Brief Description of the Drawings

[0036] Fig. 1 is a flow chart of the chemometric method for sample set partitioning in spectral analysis according to the present invention.

[0037] In the figure, the detailed flow of the partitioning of the extreme-value and second-extreme-value samples is illustrated in Fig. 2.

[0038] Fig. 2 is a flow chart of the partitioning of the extreme-value and second-extreme-value samples; it is a sub-diagram of Fig. 1.

[0039] Fig. 3 is a plot of the correlation coefficients used to find the spectral data point of highest correlation in the embodiment of the present invention.

[0040] In the figure, the full spectral range is 10000 to 1000 cm⁻¹, covering the visible and near-infrared regions. The correlation coefficient is computed from the spectral data at each wavelength point together with the reference chemical values of the samples, and the spectral data point of highest correlation is found at 8058 cm⁻¹. The samples are partitioned on the basis of the spectral data at that point and the chemical values, which to some extent ensures that the calibration model has high correlation.

Detailed Description

[0041] Example:

[0042] Taking near-infrared analysis of pomelo peel pectin as an example, there are 118 pomelo peel samples ($N = 118$), and spectral measurement of each sample yields spectral values at 3114 wavelength points ($P = 3114$). In a ratio of approximately 2:1, 78 samples are assigned to the calibration set ($L = 78$) and 40 samples to the prediction set ($K = 40$). The samples are partitioned with the method of the present invention in the following steps:

[0043] 1) Data normalization

[0044] a) Normalization of the reference chemical values

[0045] From the known pectin chemical values of the 118 pomelo peel samples (numbered 1 to 118), the mean $C_m = 4.987$ (%) is first computed by equation (1); the normalized chemical value $C_n(j)$ of each sample is then computed from $C_m$ by equation (2).

[0046] b) Normalization of the spectral data

[0047] From the known spectral data of the 118 pomelo peel samples at the 3114 wavelength points, the mean spectral value $A_{i,m}$ at each wavelength is computed by equation (3), and the normalized spectral value $\mathrm{norm}(A_{ij})$ of each sample at each wavelength is then computed from $A_{i,m}$ by equation (4). The spectral values of each sample at all wavelength points are treated as that sample's spectral vector, and the modulus $A_n(j)$ of each sample's spectral vector is computed by equation (5).

[0048] After the normalization of the reference chemical values and absorbances above, each sample has one $C_n(j)$ and one $A_n(j)$. Following the Lambert-Beer law, a regression over the $C_n(j)$ and $A_n(j)$ of all samples ($j = 1, 2, \ldots, 118$) gives the predicted chemical value $C'_n(j)$ of each sample. The regression deviation RDND of each sample is then computed by equation (6), followed by the mean $\mathrm{RDND}_{Ave}$ over all samples.

[0049] 2) Partitioning of the extreme-value samples

[0050] So that the calibration and prediction model maintains good correlation, the 2 samples with the maximum and minimum $C_n(j)$ and the 2 samples with the maximum and minimum $A_n(j)$ should in principle be placed in the calibration set, and the 2 samples with the second-largest and second-smallest $C_n(j)$ and the 2 samples with the second-largest and second-smallest $A_n(j)$ should be placed in the prediction set. Some of the samples selected in this way may coincide, however, and must be handled accordingly. The procedure is as follows:

[0051] Assume that the samples within SZ and within SC are all distinct, so that each set contains 4 samples. The 2 samples with the maximum and minimum $C_n(j)$ and the 2 samples with the maximum and minimum $A_n(j)$ form the extreme-value set SZ, namely SZ = {98, 66, 98, 16}. The 2 samples with the second-largest and second-smallest $C_n(j)$ and the 2 samples with the second-largest and second-smallest $A_n(j)$ form the second-extreme-value set SC, namely SC = {85, 81, 85, 13}.

[0052] Clearly $SZ \cap SC$ is empty (SZ and SC have no sample in common), so all samples in SZ are placed in the calibration set and all samples in SC are placed in the prediction set. The number of repeated samples within SZ is recorded as $s_1 = 1$, and the number of repeated samples within SC as $s_2 = 1$.
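The counts in this example can be checked with a few lines of Python. The variable names are illustrative only; the sample numbers are those given for SZ and SC, and the remaining-sample formula is $N - 8 + s_1 + s_2 + s_3$ from the method description.

```python
# Sample numbers picked for each set in the example (duplicates kept).
sz_picks = [98, 66, 98, 16]
sc_picks = [85, 81, 85, 13]

SZ, SC = set(sz_picks), set(sc_picks)
s3 = len(SZ & SC)                 # samples shared by SZ and SC
s1 = len(sz_picks) - len(SZ)      # repeated samples inside SZ
s2 = len(sc_picks) - len(SC)      # repeated samples inside SC
remaining = 118 - 8 + s1 + s2 + s3

print(s1, s2, s3, remaining)      # prints: 1 1 0 112
```

The result of 112 remaining samples agrees with the count stated for the example.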

[0053] 3) Partitioning principle for the remaining samples

[0054] After the extreme-value samples are partitioned, 112 samples remain. For the remaining partition, on the principle of highest correlation, the correlation coefficient $R(i)$ between the spectral data and the reference chemical values is computed for each wavelength point $i$ by equation (7). The largest value over all wavelength points is $R_{note} = \max\{R(i),\ i = 1, 2, \ldots, 3114\} = 0.6332$; it occurs at 8058 cm⁻¹, which corresponds to wavelength point index $i_{note} = 1009$.

[0055] The remaining 112 samples are randomly partitioned a sufficient number of times. For each partition, the spectral data $\{A_{note}\}$ at the 1009th wavelength point are taken together with the reference chemical values of the samples, and the correlation coefficients $R_{Cset}$ and $R_{Pset}$ are computed within the calibration set and within the prediction set by equations (8) and (9). The absolute deviation (AOC) between $R_{Cset}$ and $R_{Pset}$ is then computed by equation (10), and a partition with $\mathrm{AOC} < 10^{-5}$ is selected for building the near-infrared spectral analysis model.

[0056] For comparison, the samples are also partitioned by a completely random method. Near-infrared calibration models are built by partial least squares (PLS) on the calibration-set and prediction-set data obtained with the partitioning method of the present invention and with completely random partitioning, and their predictive performance is compared (see Table 1). The results show that partitioning the calibration and prediction sets with the method of the present invention improves the predictions of the model and the detection capability of near-infrared analysis.

[0057] Table 1. Comparison of the two partitioning methods based on PLS models

[0058] PLS model prediction results (the values of Table 1 are not reproduced in this text)

[0059] Note: RMSEC is the prediction deviation for the calibration-set samples

[0060] RMSEP is the prediction deviation for the prediction-set samples

[0061] Rc is the prediction correlation coefficient for the calibration-set samples

[0062] Rp is the prediction correlation coefficient for the prediction-set samples
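The four indicators named in the note can be computed as sketched below. The text does not give their formulas, so this sketch assumes the conventional reading of the abbreviations: RMSEC and RMSEP as root mean square errors over the calibration and prediction sets, and Rc and Rp as Pearson correlation coefficients between predicted and reference values. The function names are hypothetical, and some references divide by a degrees-of-freedom term rather than the sample count.

```python
import numpy as np

def rmse(ref, pred):
    """Root mean square error between reference and predicted values."""
    ref = np.asarray(ref, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def corr(ref, pred):
    """Pearson correlation between reference and predicted values."""
    return float(np.corrcoef(ref, pred)[0, 1])
```

Applying `rmse` and `corr` to the calibration-set predictions gives RMSEC and Rc; applying them to the prediction-set predictions gives RMSEP and Rp.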

## Claims (1)

- 1. A chemometric method for sample set partitioning in spectral analysis, characterized by the following specific steps:

  1) Data normalization: a) normalization of the reference chemical values; b) normalization of the spectral data; in which $N$ is the number of samples and $P$ is the number of wavelength points; $C_j$ is the reference chemical value of sample $j$, $C_m$ is the mean of the reference chemical values of all samples, and $C_n(j) = \mathrm{norm}(C_j)$ is the chemical value of that sample after normalization; $A_{ij}$ is the absorbance of sample $j$ at the $i$-th wavelength, $A_{i,m}$ is the mean absorbance of all $N$ samples at the $i$-th wavelength, and $\mathrm{norm}(A_{ij})$ is the absorbance of that sample at the $i$-th wavelength after normalization; $A_n(j) = |A_j|$ is the modulus of the absorbance vector of sample $j$.

  After the normalization of the reference chemical values and absorbances above, each sample has one $C_n(j)$ and one $A_n(j)$. Following the Lambert-Beer law, a regression over the $C_n(j)$ and $A_n(j)$ of all samples ($j = 1, 2, \ldots, N$) gives the predicted chemical value $C'_n(j)$ of each sample; the regression deviation of the normalized data (RDND) is then computed for each sample, followed by the mean RDND over all samples, $\mathrm{RDND}_{Ave}$:

  $\mathrm{RDND}(j) = \left| C'_n(j) - C_n(j) \right|$, (6)

  2) Partitioning of the extreme-value and second-extreme-value samples: so that the calibration and prediction model maintains good correlation, the 2 samples with the maximum and minimum $C_n(j)$ and the 2 samples with the maximum and minimum $A_n(j)$ should in principle be placed in the calibration set, and the 2 samples with the second-largest and second-smallest $C_n(j)$ and the 2 samples with the second-largest and second-smallest $A_n(j)$ should be placed in the prediction set; some of the samples selected in this way may coincide, however, and must be handled accordingly, as follows.

  The 2 samples with the maximum and minimum $C_n(j)$ and the 2 samples with the maximum and minimum $A_n(j)$ form the extreme-value set, denoted SZ; the 2 samples with the second-largest and second-smallest $C_n(j)$ and the 2 samples with the second-largest and second-smallest $A_n(j)$ form the second-extreme-value set, denoted SC. First assume that the samples within SZ and within SC are all distinct, so that each set contains 4 samples; the intersection of SZ and SC is then examined to determine how these samples are partitioned.

  If $SZ \cap SC$ is empty, that is, SZ and SC have no sample in common, all samples in SZ are placed in the calibration set and all samples in SC are placed in the prediction set; the number of repeated samples within SZ, $s_1$, and the number of repeated samples within SC, $s_2$, are also recorded, with $s_1, s_2 \in \{0, 1, 2\}$.

  If $SZ \cap SC$ is not empty, the number of samples in $SZ \cap SC$ is recorded as $s_3$ ($s_3 = 1, 2, 3, 4$), and the RDND of each sample in $SZ \cap SC$ is compared with $\mathrm{RDND}_{Ave}$: a sample with $\mathrm{RDND} > \mathrm{RDND}_{Ave}$ is placed in the calibration set, and otherwise it is placed in the prediction set. Then all samples in $SZ \cap C_S(SC)$ are placed in the calibration set and all samples in $C_S(SZ) \cap SC$ are placed in the prediction set, and the numbers of repeated samples within $SZ \cap C_S(SC)$ and within $C_S(SZ) \cap SC$ are recorded as $s_1$ and $s_2$ respectively, with $s_1, s_2 \in \{0, 1, 2\}$; here $C_S$ denotes the set complement.

  3) Partitioning principle for the remaining samples: after the extreme-value samples are partitioned, $N - 8 + s_1 + s_2 + s_3$ samples remain. The remaining samples are partitioned on the principle of highest correlation: for each wavelength point $i$, the correlation coefficient $R(i)$ between the spectral data and the reference chemical values is computed; the largest value $R_{note} = \max\{R(i),\ i = 1, 2, \ldots, P\}$ over all wavelength points is found, and the index $i_{note}$ of the wavelength point at which $R_{note}$ occurs is recorded.

  The remaining samples are randomly partitioned a sufficient number of times. For each partition, the spectral data $\{A_{note}\}$ at the $i_{note}$-th wavelength point are taken together with the reference chemical values of the samples, and the correlation coefficients $R_{Cset}$ and $R_{Pset}$ are computed within the calibration set and within the prediction set respectively; in which $L$ and $K$ are the numbers of samples in the calibration set and the prediction set, so that $L + K = N$; $C_{Lm}$ and $C_{Km}$ are the mean chemical values of the calibration-set and prediction-set samples; $A_{note,L}(j)$ is the spectral datum of the $j$-th calibration-set sample at the $i_{note}$-th wavelength point, and $A_{note,Lm}$ is the mean of the calibration-set spectral data at that point; $A_{note,K}(j)$ is the spectral datum of the $j$-th prediction-set sample at the $i_{note}$-th wavelength point, and $A_{note,Km}$ is the mean of the prediction-set spectral data at that point.

  The absolute deviation between $R_{Cset}$ and $R_{Pset}$, called the absolute offset of correlation coefficients (AOC), is computed:

  $\mathrm{AOC} = \left| R_{Cset} - R_{Pset} \right|$, (10)

  So that the calibration set and the prediction set are similar, a partition satisfying $\mathrm{AOC} < 10^{-5}$ is selected as the partition used to build the near-infrared spectral analysis model. With this partitioning method, all samples to be analyzed are divided into a calibration set and a prediction set in a ratio of 2:1; the computation above is carried out and a partition satisfying $\mathrm{AOC} < 10^{-5}$ is selected.

## Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201210375066 CN102854151B (en) | 2012-10-06 | 2012-10-06 | Chemometrics method for classifying sample sets in spectrum analysis |

## Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201210375066 CN102854151B (en) | 2012-10-06 | 2012-10-06 | Chemometrics method for classifying sample sets in spectrum analysis |

## Publications (2)

Publication Number | Publication Date |
---|---|
CN102854151A (en) | 2013-01-02 |
CN102854151B (en) | 2014-07-30 |

# Family

## ID=47400928

## Family Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201210375066 CN102854151B (en) | 2012-10-06 | 2012-10-06 | Chemometrics method for classifying sample sets in spectrum analysis |

## Country Status (1)

Country | Link |
---|---|
CN (1) | CN102854151B (en) |

## Families Citing this family (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103267739A (en) * | 2013-05-07 | 2013-08-28 | Zhejiang University | Aquaculture wastewater organic matter concentration measurement method based on spectrum technology |
CN105092039B (en) * | 2015-08-04 | 2017-04-12 | Shenzhen China Star Optoelectronics Technology Co., Ltd. | Spectrophotometer method for obtaining multi-frequency correction value |

## Citations (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6687620B1 (en) | 2001-08-01 | 2004-02-03 | Sandia Corporation | Augmented classical least squares multivariate spectral analysis |
CN102305772A (en) | 2011-07-29 | 2012-01-04 | Jiangsu University | Method for screening characteristic wavelength of near infrared spectrum features based on heredity kernel partial least square method |

## Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6475800B1 (en) * | 1999-07-22 | 2002-11-05 | Instrumentation Metrics, Inc. | Intra-serum and intra-gel for modeling human skin tissue |

## Non-Patent Citations (6)

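Title |
---|
Huazhou Chen et al. Waveband selection for NIR spectroscopy analysis of soil organic matter based on SG smoothing and MWPLS methods. Chemometrics and Intelligent Laboratory Systems. 2011, Vol. 107, pp. 139-146. |
Feng Hua (冯华). Model optimization and stability for NIR spectroscopic analysis of soil organic matter. China Master's Theses Full-text Database, Agricultural Science and Technology series (《中国优秀硕士学位论文全文数据库 农业科技辑》). 2009, No. 9, pp. 8-22. |
Pan Tao (潘涛). Waveband optimization for near-infrared spectroscopic analysis of soil total nitrogen. 《分析化学研究》. 2012, Vol. 40, No. 6, pp. 1-5. |
Wang Xichang (王锡昌). Rapid non-destructive determination of moisture and protein content in Alaska pollock surimi by near-infrared spectroscopy. Food Science (《食品科学》). 2010, Vol. 31, No. 16, pp. 168-171. |
Chen Huazhou (陈华舟). Chemometric studies in spectral analysis and their application to near-infrared analysis of soil. China Doctoral Dissertations Full-text Database, Agricultural Science and Technology series (《中国博士学位论文全文数据库 农业科技辑》). 2012, No. 2, pp. 15-18. |
Huang Furong (黄富荣) et al. Rapid determination of soil zinc content using near-infrared diffuse reflectance spectroscopy. Optics and Precision Engineering (《光学精密工程》). 2010, Vol. 18, No. 3, pp. 586-592. |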

## Also Published As

Publication number | Publication date | Type |
---|---|---|
CN102854151A | 2013-01-02 | application |

## Similar Documents

Publication | Title |
---|---|
Downey | Food and food ingredient authentication by mid-infrared spectroscopy and chemometrics |
Peng et al. | Prediction of apple fruit firmness and soluble solids content using characteristics of multispectral scattering images |
Golic et al. | Robustness of calibration models based on near infrared spectroscopy for the in-line grading of stonefruit for total soluble solids content |
Gallardo-Velázquez et al. | Application of FTIR-HATR spectroscopy and multivariate analysis to the quantification of adulterants in Mexican honeys |
Leardi et al. | Variable selection for multivariate calibration using a genetic algorithm: prediction of additive concentrations in polymer films from Fourier transform-infrared spectral data |
Mouazen et al. | Comparison among principal component, partial least squares and back propagation neural network analyses for accuracy of measurement of selected soil properties with visible and near infrared spectroscopy |
Galtier et al. | Geographic origins and compositions of virgin olive oils determinated by chemometric analysis of NIR spectra |
Shao et al. | Visible/near infrared spectrometric technique for nondestructive assessment of tomato ‘Heatwave’ (Lycopersicum esculentum) quality characteristics |
Ying et al. | Experiments on predicting sugar content in apples by FT-NIR technique |
Wu et al. | Hybrid variable selection in visible and near-infrared spectral analysis for non-invasive quality determination of grape juice |
Rossel | Robust modelling of soil diffuse reflectance spectra by “bagging-partial least squares regression” |
Soares et al. | The successive projections algorithm |
Rudnitskaya et al. | Analysis of apples varieties–comparison of electronic tongue with different analytical techniques |
Cai et al. | A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra |
Dumarey et al. | Exploration of linear multivariate calibration techniques to predict the total antioxidant capacity of green tea from chromatographic fingerprints |
Alamprese et al. | Detection of minced beef adulteration with turkey meat by UV–vis, NIR and MIR spectroscopy |
Shao et al. | A method for near-infrared spectral calibration of complex plant samples with wavelet transform and elimination of uninformative variables |
Shao et al. | A new regression method based on independent component analysis |
Di Egidio et al. | NIR and MIR spectroscopy as rapid methods to monitor red wine fermentation |
Cen et al. | Measurement of soluble solids contents and pH in orange juice using chemometrics and Vis-NIRS |
Dos Santos et al. | A review on the applications of portable near-infrared spectrometers in the agro-food industry |
Liu et al. | Nondestructive determination of pear internal quality indices by visible and near-infrared spectrometry |
Flores et al. | Feasibility in NIRS instruments for predicting internal quality in intact tomato |
Rao et al. | Quantitative and qualitative determination of acid value of peanut oil using near-infrared spectrometry |
Casale et al. | Characterisation of PDO olive oil Chianti Classico by non-selective (UV–visible, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques |

## Legal Events

Code | Title |
---|---|
C06 | Publication |
C10 | Request of examination as to substance |
C14 | Granted |