WO2020248961A1 - Method for selecting spectral wavenumbers without reference values - Google Patents

Method for selecting spectral wavenumbers without reference values

Info

Publication number
WO2020248961A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
samples
wavenumber
value
spectra
Prior art date
Application number
PCT/CN2020/095035
Other languages
English (en)
French (fr)
Inventor
毕一鸣
李永生
帖金鑫
李石头
何文苗
廖付
张立立
田雨农
郝贤伟
吴继忠
王辉
许利平
Original Assignee
浙江中烟工业有限责任公司
Priority date
2019-06-11
Filing date
2020-06-09
Publication date
2020-12-17
Application filed by 浙江中烟工业有限责任公司
Publication of WO2020248961A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00: Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17: Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25: Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31: Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Definitions

  • The invention belongs to the field of data analysis and mathematical modeling, and specifically relates to a method for selecting spectral wavenumbers without reference values.
  • Spectroscopy is widely used in the chemical, food, pharmaceutical and other industries because it is fast, accurate and non-destructive.
  • Multivariate calibration of spectra can be used effectively for determining component contents and for online process monitoring.
  • Information such as the content of the substance of interest cannot be read directly from the spectrum.
  • Quantitative analysis using spectroscopy generally uses secondary modeling, that is, a calibration model is built from a set of known samples. This set of known samples is called the calibration set or the training set.
  • The spectra of these samples and their reference values are used to build a prediction model by multivariate calibration. For a sample to be tested, only the spectrum is measured, and the established model quickly gives the quantitative result.
  • Lu Jiangang (patent application No. 201510991505.3) discloses a method for selecting spectral wavenumbers. For the wavenumbers of the spectrum, the method repeatedly draws random subsets of the calibration samples, builds partial least squares regression models, calculates the variable importance in projection (VIP) coefficient of each wavenumber, and screens the wavenumbers according to specified rules.
  • That patent and other wavenumber screening methods rely on building a correlation model between spectral wavenumbers and substance content (such as a partial least squares model) and then screening according to how each wavenumber performs in the model.
  • The number of modeling samples depends on the complexity of the detection target and the content of the substance of interest. For example, using near-infrared spectroscopy to analyze the protein content of grain generally requires 200 modeling samples. In practice, every sample in the modeling data set needs both a collected spectrum and a measured content of the substance to be modeled. However, determining contents by traditional chemical methods demands considerable labor, instruments and consumables, so enterprises bear heavy time and cost pressure.
  • An urgent problem is therefore how to screen spectral wavenumbers effectively without building a model, that is, using spectra alone, so that a model can be built from fewer samples with measured contents. This saves enterprises substantial labor, consumables and time, and is of great significance.
  • The purpose of the present invention is to provide a method for selecting spectral wavenumbers without reference values, which uses spectra alone to screen the wavenumbers that are meaningful for modeling, thereby reducing the number of modeling samples and improving model accuracy.
  • Spectral detection is a secondary detection method.
  • On the one hand, the spectrum of the sample to be analyzed is scanned; on the other hand, the content of the substance of interest is determined by a traditional chemical method (such as flow analysis or GC-MS). A chemometric method is then used to build a mathematical model between the spectrum and the substance content, so that the content in a sample to be tested can be predicted from its spectrum.
  • In practice, spectral detection requires a large number of samples for modeling.
  • On the one hand, the samples establish the linear or nonlinear relationship between the spectral wavenumbers and the substance.
  • On the other hand, the samples are used to overcome the influence of baseline, scattering, instrument noise and the like in spectrum collection, so that the model remains stable.
  • A method for selecting spectral wavenumbers without reference values includes the following steps:
  • Step 1) Prepare K samples for spectral measurement;
  • Step 2) According to the application requirements, simulate the various measurement conditions under which the samples will be measured;
  • Step 3) Set each condition of interest from step 2) and collect spectra, N acquisitions in total;
  • Step 4) Preprocess the spectra obtained from the N repeated acquisitions of the K samples, or leave them unprocessed;
  • Step 5) For each wavenumber, X consists of the values at that wavenumber, that is, the values from the N repeated acquisitions of the K samples; y is the class label, where the repeated acquisitions of one sample form one class and different samples belong to different classes, giving K classes in total;
  • Step 6) The stability d is defined as the within-class distance Sw divided by the between-class distance Sb, d = Sw / Sb, where each class contains N samples, $x_i^{(k)}$ denotes the i-th sample in the k-th class, the mean of each class is $\mu^{(k)}$, and the overall mean is $\mu$;
  • Step 7) The larger the d value, the more dispersed the different spectra of the same sample are at that wavenumber, that is, the wavenumber is subject to more interfering factors and is unfavorable for modeling. The smaller the d value, the more tightly the spectra of the same sample cluster at that wavenumber, reflecting genuine substance information. On this basis, a threshold is set to determine the wavenumbers to be used for modeling.
  • The conditions in step 2) include the variable factors of temperature adjustment, humidity adjustment and instrument stability adjustment.
  • Temperature adjustment specifically means using an air conditioner to simulate the temperature environments encountered in actual use.
  • Humidity adjustment specifically means using a humidifier/dehumidifier to simulate the humidity environments encountered in actual use.
  • Instrument stability adjustment specifically means collecting spectra 15 min, 1 h, 2 h, … after power-on to simulate the actual operating conditions of the instrument.
  • The spectra obtained from the N repeated acquisitions of the K samples are preprocessed by first derivative, second derivative, standard normal variate (SNV) or multiplicative scatter correction (MSC).
  • By collecting spectra of the same sample at different times, the present invention estimates the various noise sources and influencing factors that affect this type of sample on the instrument and identifies the wavenumbers that do not benefit modeling, thereby greatly reducing the number of samples and chemical reference values required to build a model.
  • Compared with the prior art, the present invention has the following beneficial effects:
  • The method provided by the present invention requires no reference value measurement and screens wavenumbers using spectra alone.
  • The method provided by the present invention can build a relatively accurate model early in modeling, when the number of samples is still insufficient, and its results provide a useful reference for subsequent decision-making and rollout.
  • Figure 1 shows multiple scan spectra of different samples
  • Figure 2 shows the d values of different wavenumber points
  • Figure 3 shows the spectrum (black part) selected for subsequent modeling.
  • Embodiment 1 A method for selecting spectral wavenumber without reference value
  • This embodiment provides a method for selecting a spectral wavenumber without a reference value, which specifically includes the following steps:
  • The preprocessing method is first-derivative processing followed by standard normal variate (SNV) correction.
  • X consists of the values at each wavenumber, that is, the values from the 6 repeated acquisitions of the 15 samples;
  • y is the class label, where the repeated acquisitions of one sample form one class and different samples belong to different classes, giving 15 classes in total;
  • The stability d is the within-class distance Sw divided by the between-class distance Sb, d = Sw / Sb.
  • Each class contains N (= 6) samples, $x_i^{(k)}$ denotes the i-th sample in the k-th of the K (= 15) classes, the mean of each class is $\mu^{(k)}$, and the overall mean is $\mu$. The meaning of d is as follows: when the same sample is scanned under different environments or states, the absorbance values at the wavenumbers that characterize substance content should be close to one another and well separated from the absorbance values of other samples' spectra at the same wavenumber.
  • The d value therefore reflects how strong the substance information is at each wavenumber point. The smaller d is, the stronger the substance response at that point and the better different samples can be distinguished; the larger d is, the more that point is affected by non-substance factors, and the harder it is to use for distinguishing samples and predicting composition.
  • Although the spectral range covers 10000 cm⁻¹ to 3800 cm⁻¹, the d value is large in most bands, which are therefore unstable. Only the two regions 6200 cm⁻¹ to 5600 cm⁻¹ and 5100 cm⁻¹ to 4100 cm⁻¹ are relatively stable, so these two regions are chosen as the modeling wavenumber regions.
  • Test set 1 contains 35 samples.
  • Test set 2 contains 29 samples.
  • The spectra and total alkaloid values are obtained by the same methods as above.
  • Table 2 compares the modeling performance of this method and the comparison method.
  • Tables 3 and 4 give the total alkaloid contents and model predictions for the two test sets.
  • The root mean square error (RMSE) is used to measure the predictive ability of the model.
  • y_pre represents the predicted value of the content of the component to be tested.
  • y_true represents the true value of the content of the component to be tested.
  • N represents the number of samples.
  • The cross-validation error RMSECV is calculated from the training samples by cross-validation.
  • The root mean square error of prediction (RMSEP) is obtained from the test samples. With the method of the present invention, which uses no reference value information, the selected spectral bands have a positive effect on the model, and the prediction errors on the two data sets are reduced by 21% and 28%, respectively.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Algebra (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

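A method for selecting spectral wavenumbers without reference values. By collecting spectra of the same sample at different times, the method estimates the various noise sources and influencing factors that affect this type of sample on the instrument and identifies the wavenumbers that do not benefit modeling, thereby greatly reducing the number of samples and chemical reference values needed to build a model.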

Description

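Method for selecting spectral wavenumbers without reference values
Technical Field
The invention belongs to the field of data analysis and mathematical modeling, and specifically relates to a method for selecting spectral wavenumbers without reference values.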
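Background Art
Spectroscopic techniques, especially near-infrared, mid-infrared and Raman spectroscopy, are widely used in the chemical, food, pharmaceutical and other industries because they are fast, accurate and non-destructive. Multivariate calibration of spectra can be used effectively for determining component contents and for online process monitoring. Unlike traditional analytical chemistry methods such as continuous-flow analysis and gas chromatography, information such as the content of a substance of interest cannot be read directly from a spectrum. Quantitative analysis with spectra generally uses secondary modeling: a calibration model is built from a set of known samples. This set is called the calibration set or training set. From the spectra of these samples and their reference values (for example the content of the substance of interest in each sample, determined by another analytical chemistry method), a prediction model is built by multivariate calibration. For a sample to be tested, only its spectrum needs to be measured, and the model then quickly gives a quantitative result.
As spectroscopy has developed, more and more fields hope to replace traditional chemical testing with spectral testing. However, for a new problem (a new detection target or a new component), traditional spectral modeling requires a certain number of modeling samples together with reference values obtained by a standard method. For example, Lu Jiangang (patent application No. 201510991505.3) discloses a spectral wavenumber selection method that repeatedly draws random subsets of the calibration samples, builds partial least squares regression models, calculates the variable importance in projection (VIP) coefficient of each wavenumber, and screens the wavenumbers according to specified rules. That patent and other wavenumber screening methods all rely on building a correlation model between spectral wavenumbers and substance content (for example a partial least squares model) and then screening according to how each wavenumber performs in the model.
The number of modeling samples depends on the complexity of the detection target and the content of the substance of interest. For example, analyzing the protein content of grain by near-infrared spectroscopy generally requires 200 modeling samples. In practice, every sample in the modeling data set needs both a collected spectrum and a measured content of the substance to be modeled. Determining contents by traditional chemical methods demands considerable labor, instruments and consumables, so enterprises bear heavy time and cost pressure.
An urgent problem is therefore how to screen spectral wavenumbers effectively without building a model, that is, using spectra alone, so that a model can then be built from fewer samples with measured contents. This saves enterprises substantial labor, consumables and time, and is of great significance.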
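Summary of the Invention
The purpose of the invention is to provide a method for selecting spectral wavenumbers without reference values, which uses spectra alone to screen the wavenumbers that are meaningful for modeling, thereby reducing the number of modeling samples and improving model accuracy.
Spectral detection is a secondary detection method. For a sample to be analyzed, its spectrum is scanned on the one hand, and on the other hand the content of the substance of interest is determined by a traditional chemical method (for example flow analysis or GC-MS). A chemometric method is then used to build a mathematical model between spectrum and content, so that the content of the substance of interest in a sample to be tested can be predicted from its spectrum. In practice spectral detection needs a large number of samples for modeling: partly to establish the linear or nonlinear relationship between spectral wavenumbers and the substance, and partly so that the samples can overcome the effects of baseline, scattering, instrument noise and the like during spectrum acquisition and keep the model stable.
To achieve the above purpose, the invention adopts the following technical solution:
A method for selecting spectral wavenumbers without reference values, comprising the following steps: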
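Step 1) Prepare K samples for spectral measurement;
Step 2) According to the application requirements, simulate the various measurement conditions under which the samples will be measured;
Step 3) Set each condition of interest from step 2) and collect spectra, N acquisitions in total;
Step 4) Preprocess the spectra obtained from the N repeated acquisitions of the K samples, or leave them unprocessed;
Step 5) For each wavenumber, X consists of the values at that wavenumber, that is, the values from the N repeated acquisitions of the K samples; y is the class label, where the repeated acquisitions of one sample form one class and different samples belong to different classes, giving K classes in total;
Step 6) The stability d is defined as the within-class distance Sw divided by the between-class distance Sb,
Figure PCTCN2020095035-appb-000001
Figure PCTCN2020095035-appb-000002
where each class contains N samples, $x_i^{(k)}$ denotes the i-th sample in the k-th class, the mean of each class is $\mu^{(k)}$, and the overall mean is $\mu$;
Step 7) The larger the d value, the more dispersed the different spectra of the same sample are at that wavenumber, that is, the wavenumber is subject to more interfering factors and is unfavorable for modeling. The smaller the d value, the more tightly the spectra of the same sample cluster at that wavenumber, reflecting genuine substance information. On this basis, a threshold is set to determine the wavenumbers to be used for modeling.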
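The equation images referenced in step 6 are not reproduced in this text. One plausible reading, assuming the usual Fisher-criterion scatter definitions (the exact weighting or normalization in the original figures may differ), is:

```latex
d = \frac{S_w}{S_b}, \qquad
S_w = \sum_{k=1}^{K}\sum_{i=1}^{N}\left(x_i^{(k)} - \mu^{(k)}\right)^2, \qquad
S_b = \sum_{k=1}^{K} N\left(\mu^{(k)} - \mu\right)^2,
\qquad\text{with}\quad
\mu^{(k)} = \frac{1}{N}\sum_{i=1}^{N} x_i^{(k)}, \quad
\mu = \frac{1}{KN}\sum_{k=1}^{K}\sum_{i=1}^{N} x_i^{(k)}.
```

Under this reading, d is computed independently at every wavenumber: the numerator grows when repeated scans of one sample disagree, and the denominator grows when different samples are well separated.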
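Further preferably, the conditions in step 2) include the variable factors of temperature adjustment, humidity adjustment and instrument stability adjustment.
Further preferably, temperature adjustment specifically means using an air conditioner to simulate the temperature environments encountered in actual use; humidity adjustment specifically means using a humidifier/dehumidifier to simulate the humidity environments encountered in actual use; instrument stability adjustment specifically means collecting spectra 15 min, 1 h, 2 h, … after power-on to simulate the actual operating conditions of the instrument.
Further preferably, the spectra obtained from the N repeated acquisitions of the K samples are preprocessed by first derivative, second derivative, standard normal variate (SNV) or multiplicative scatter correction (MSC).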
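The preprocessing options named here can be illustrated with a small sketch. The Python below is a minimal illustration, not the patent's implementation: it assumes a plain adjacent-point difference as the first derivative (practical work often uses a smoothed derivative instead) and the conventional per-spectrum SNV with the sample standard deviation. The function names `first_derivative`, `snv` and `preprocess` are hypothetical.

```python
from statistics import fmean, stdev


def first_derivative(spectrum):
    """Approximate the first derivative by differences between adjacent points.

    Assumption: evenly spaced wavenumbers; the result is one point shorter.
    """
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    mean = fmean(spectrum)
    std = stdev(spectrum)  # sample standard deviation (n - 1)
    return [(value - mean) / std for value in spectrum]


def preprocess(spectrum):
    """First derivative followed by SNV, the combination used in Embodiment 1."""
    return snv(first_derivative(spectrum))
```

Each spectrum is processed on its own, so the same function can be applied to every one of the K × N scans before the stability d is calculated.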
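By collecting spectra of the same sample at different times, the invention estimates the various noise sources and influencing factors that affect this type of sample on the instrument and identifies the wavenumbers that do not benefit modeling, thereby greatly reducing the number of samples and chemical reference values needed to build a model.
Compared with the prior art, the invention has the following beneficial effects:
1. The method provided by the invention requires no reference value measurement and screens wavenumbers using spectra alone.
2. The method provided by the invention can build a relatively accurate model early in modeling, when the number of samples is still insufficient, and its results provide a useful reference for subsequent decision-making and rollout.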
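Brief Description of the Drawings
Figure 1 shows the spectra from repeated scans of different samples;
Figure 2 shows the d values at the different wavenumber points;
Figure 3 shows the spectral regions (black) selected for subsequent modeling.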
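Detailed Description
Embodiment 1: A method for selecting spectral wavenumbers without reference values
This embodiment provides a method for selecting spectral wavenumbers without reference values, specifically comprising the following steps:
(1) Select 15 tobacco leaf samples. After sampling, prepare them as powder samples in accordance with the tobacco industry standard YC/T 31-1996, "Tobacco and tobacco products: Preparation of test specimens and determination of moisture content, oven method" (dry the leaves in an oven at 40 °C for 4 h, grind them in a cyclone mill (FOSS) and pass them through a 40-mesh sieve), and store them in plastic bottles.
(2) As shown in Table 1, scan the spectra of the 15 samples 6 times, at different times and under different environments; the resulting spectra are shown in Figure 1;
Table 1
Figure PCTCN2020095035-appb-000004
(3) Preprocess the spectra obtained from the N repeated acquisitions of the K samples; the preprocessing is first-derivative processing followed by standard normal variate (SNV) correction.
(4) Calculate the stability d at each wavenumber of the spectra. The larger the d value, the more dispersed the different spectra of the same sample are at that wavenumber, that is, the wavenumber is subject to more interfering factors and is unfavorable for modeling. The smaller the d value, the more tightly the spectra of the same sample cluster at that wavenumber, reflecting genuine substance information.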
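For each wavenumber, X consists of the values at that wavenumber, that is, the values from the 6 repeated acquisitions of the 15 samples; y is the class label, where the repeated acquisitions of one sample form one class and different samples belong to different classes, giving 15 classes in total;
The stability d is the within-class distance Sw divided by the between-class distance Sb:
Figure PCTCN2020095035-appb-000005
Figure PCTCN2020095035-appb-000006
Figure PCTCN2020095035-appb-000007
where each class contains N (= 6) samples, $x_i^{(k)}$ denotes the i-th sample in the k-th of the K (= 15) classes, the mean of each class is $\mu^{(k)}$, and the overall mean is $\mu$. The meaning of d is as follows: when the same sample is scanned under different environments or states, the absorbance values at the wavenumbers that characterize substance content should be close to one another and well separated from the absorbance values of other samples' spectra at the same wavenumber. The d value therefore reflects how strong the substance information is at each wavenumber point. The smaller d is, the stronger the substance response at that point and the better different samples can be distinguished; the larger d is, the more that point is affected by non-substance factors, and the harder it is to use for distinguishing samples and predicting composition.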
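The stability calculation for every wavenumber can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the patent's own code: Sw is taken as the sum of squared deviations from each class mean, Sb as the class-size-weighted sum of squared deviations of class means from the overall mean, and the names `stability_d` and `stability_profile` are hypothetical.

```python
from statistics import fmean


def stability_d(groups):
    """Return d = Sw / Sb for one wavenumber.

    groups: K lists, each holding the N repeated readings of one sample
    at this wavenumber (after any preprocessing).
    """
    overall_mean = fmean(v for group in groups for v in group)
    sw = 0.0  # within-class scatter: spread of repeated scans of one sample
    sb = 0.0  # between-class scatter: spread of sample means (assumed weighting by N)
    for group in groups:
        class_mean = fmean(group)
        sw += sum((v - class_mean) ** 2 for v in group)
        sb += len(group) * (class_mean - overall_mean) ** 2
    # If all sample means coincide, the wavenumber separates nothing.
    return sw / sb if sb > 0 else float("inf")


def stability_profile(spectra):
    """spectra[k][i] is the i-th repeated spectrum (list of floats) of sample k.

    Returns one d value per wavenumber point.
    """
    n_points = len(spectra[0][0])
    return [
        stability_d([[scan[j] for scan in sample] for sample in spectra])
        for j in range(n_points)
    ]
```

For the embodiment, `spectra` would hold 15 samples with 6 scans each, and the returned list would have one d value per wavenumber point, which is the quantity plotted in Figure 2.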
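The results are shown in Figure 2. Although the spectral range covers 10000 cm⁻¹ to 3800 cm⁻¹, the d value is large in most bands, which are therefore unstable. Only the two regions 6200 cm⁻¹ to 5600 cm⁻¹ and 5100 cm⁻¹ to 4100 cm⁻¹ are relatively stable, so these two regions are chosen as the modeling wavenumber regions.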
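Turning a d profile into modeling regions amounts to thresholding and grouping adjacent points. The sketch below is one simple way to do that, offered as an assumption: the source gives no numeric threshold, so the `threshold` argument and the function name `select_regions` are hypothetical.

```python
def select_regions(wavenumbers, d_values, threshold):
    """Return (start, end) wavenumber pairs for contiguous runs with d <= threshold.

    wavenumbers and d_values are parallel sequences in acquisition order.
    """
    regions = []
    start = None
    previous = None
    for wavenumber, d in zip(wavenumbers, d_values):
        if d <= threshold:
            if start is None:
                start = wavenumber  # a stable run begins here
            previous = wavenumber
        elif start is not None:
            regions.append((start, previous))  # the run ended at the previous point
            start = None
    if start is not None:
        regions.append((start, previous))
    return regions
```

With a suitable threshold on the d values of Figure 2, a procedure of this kind would return a small number of stable bands, comparable to the two regions chosen in the embodiment.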
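Comparative Example: Comparison of wavenumber selection results
Select 40 tobacco leaf samples and grind them to powder; determine their total alkaloid content with a continuous-flow analyzer according to the standard test method YC/T 160, "Tobacco and tobacco products: Determination of total alkaloids";
Build models by partial least squares using the full spectrum and using the spectral regions given by the invention, respectively. Two further batches of data are taken as test sets: test set 1 contains 35 samples and test set 2 contains 29 samples. The spectra and total alkaloid values are obtained by the same methods as above.
Table 2 compares the modeling performance of this method and the comparison method; Tables 3 and 4 give the total alkaloid contents and model predictions for the two test sets.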
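We use the root mean square error (RMSE) to measure the predictive ability of the model. RMSE is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{\mathrm{pre},i} - y_{\mathrm{true},i}\right)^2}$$
where $y_{\mathrm{pre}}$ is the predicted value of the content of the component to be tested, $y_{\mathrm{true}}$ is its true value, and N is the number of samples. The cross-validation error RMSECV is obtained from the training samples by cross-validation. The root mean square error of prediction (RMSEP) is obtained from the test samples. With the method of the invention, which uses no reference value information, the selected spectral bands have a positive effect on the model, and the prediction errors on the two data sets are reduced by 21% and 28%, respectively.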
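The error metric can be written out in a few lines. This is a minimal sketch of the standard RMSE with division by N, matching the symbols defined above; the function name `rmse` is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference values and model predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = ((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))
```

Applied to the test-set reference values and predictions this gives RMSEP; applied to cross-validated predictions on the training samples it gives RMSECV.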
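Table 2. Model accuracy of this method versus the comparison method
Data set    Method        Latent variables  RMSEC  RMSECV  RMSEP
Test set 1  No screening  9                 0.05   0.13    0.29
Test set 1  This method   10                0.04   0.09    0.23
Test set 2  No screening  9                 0.05   0.13    0.21
Test set 2  This method   10                0.04   0.09    0.13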
Table 3. Total alkaloid values and model predictions for test set 1
Test set 1  Actual value  Comparison method  Absolute error  This method  Absolute error
Sample 1    1.65          1.77               0.11            1.93         0.28
Sample 2    1.34          1.70               0.36            1.56         0.21
Sample 3    2.12          2.68               0.56            2.51         0.39
Sample 4    1.79          1.91               0.12            1.90         0.11
Sample 5    1.96          2.04               0.08            2.24         0.28
Sample 6    2.69          2.71               0.02            2.83         0.13
Sample 7    1.66          1.75               0.09            1.84         0.18
Sample 8    2.25          2.28               0.03            2.49         0.24
Sample 9    2.20          2.18               0.02            2.34         0.14
Sample 10   2.22          2.45               0.23            2.52         0.30
Sample 11   2.77          3.04               0.27            3.09         0.33
Sample 12   1.48          1.65               0.17            1.80         0.33
Sample 13   2.83          3.13               0.30            3.05         0.22
Sample 14   2.31          2.68               0.37            2.80         0.49
Sample 15   2.03          2.31               0.28            2.12         0.09
Sample 16   2.54          2.65               0.11            2.83         0.29
Sample 17   3.08          3.22               0.14            3.55         0.47
Sample 18   2.18          2.56               0.38            2.62         0.44
Sample 19   2.12          2.41               0.29            2.49         0.37
Sample 20   1.98          2.24               0.26            2.25         0.26
Sample 21   1.95          2.04               0.09            2.19         0.25
Sample 22   2.11          2.34               0.23            2.40         0.29
Sample 23   1.99          2.38               0.40            2.25         0.26
Sample 24   2.78          3.01               0.23            3.15         0.37
Sample 25   1.63          1.74               0.11            1.78         0.15
Sample 26   1.97          1.94               0.03            2.00         0.03
Sample 27   1.85          1.84               0.01            2.09         0.25
Sample 28   1.80          1.77               0.03            2.06         0.26
Sample 29   2.42          2.41               0.01            2.65         0.23
Sample 30   2.47          2.82               0.35            3.00         0.53
Sample 31   2.56          2.81               0.25            2.77         0.21
Sample 32   2.25          2.44               0.18            2.42         0.16
Sample 33   2.07          2.34               0.27            2.28         0.21
Sample 34   2.60          2.90               0.30            2.89         0.29
Sample 35   1.88          1.93               0.06            2.19         0.31
Table 4. Total alkaloid values and model predictions for test set 2
Test set 2  Actual value  Comparison method  Absolute error  This method  Absolute error
Sample 1    1.93          1.76               0.18            1.96         0.03
Sample 2    2.17          1.97               0.20            2.21         0.03
Sample 3    2.99          2.87               0.12            3.13         0.14
Sample 4    3.00          2.86               0.14            3.14         0.14
Sample 5    2.76          2.80               0.04            3.09         0.33
Sample 6    2.67          2.62               0.05            2.93         0.26
Sample 7    2.52          2.57               0.05            2.88         0.36
Sample 8    2.62          2.50               0.12            2.72         0.10
Sample 9    2.24          2.02               0.22            2.34         0.10
Sample 10   2.12          1.91               0.22            2.18         0.06
Sample 11   2.76          2.72               0.05            2.97         0.21
Sample 12   2.77          2.64               0.13            3.00         0.23
Sample 13   3.12          2.97               0.15            3.31         0.19
Sample 14   3.10          2.95               0.16            3.32         0.22
Sample 15   2.27          2.14               0.12            2.27         0.01
Sample 16   3.00          2.93               0.07            3.03         0.02
Sample 17   2.53          2.31               0.22            2.64         0.11
Sample 18   2.19          2.02               0.17            2.31         0.12
Sample 19   2.46          2.31               0.15            2.60         0.14
Sample 20   2.45          2.42               0.02            2.67         0.23
Sample 21   1.88          1.85               0.03            2.10         0.22
Sample 22   2.39          2.32               0.07            2.63         0.24
Sample 23   2.22          2.16               0.06            2.43         0.21
Sample 24   2.51          2.50               0.01            2.73         0.22
Sample 25   2.10          1.98               0.12            2.33         0.23
Sample 26   2.02          1.87               0.15            2.15         0.14
Sample 27   2.22          2.18               0.03            2.55         0.33
Sample 28   2.31          2.33               0.01            2.66         0.35
Sample 29   2.50          2.51               0.01            2.88         0.38

Claims (4)

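  1. A method for selecting spectral wavenumbers without reference values, characterized in that the method comprises the following steps:
    Step 1) Prepare K samples for spectral measurement;
    Step 2) According to the application requirements, simulate the various measurement conditions under which the samples will be measured;
    Step 3) Set each condition of interest from step 2) and collect spectra, N acquisitions in total;
    Step 4) Preprocess the spectra obtained from the N repeated acquisitions of the K samples, or leave them unprocessed;
    Step 5) For each wavenumber, X consists of the values at that wavenumber, that is, the values from the N repeated acquisitions of the K samples; y is the class label, where the repeated acquisitions of one sample form one class and different samples belong to different classes, giving K classes in total;
    Step 6) Calculate the d value at each wavenumber point, where the stability d is defined as the within-class distance Sw divided by the between-class distance Sb,
    Figure PCTCN2020095035-appb-100001
    Figure PCTCN2020095035-appb-100002
    where each class contains N samples, $x_i^{(k)}$ denotes the i-th sample in the k-th class, the mean of each class is $\mu^{(k)}$, and the overall mean is $\mu$;
    Step 7) The larger the d value, the more dispersed the different spectra of the same sample are at that wavenumber, that is, the wavenumber is subject to more interfering factors and is unfavorable for modeling; the smaller the d value, the more tightly the spectra of the same sample cluster at that wavenumber, reflecting genuine substance information; on this basis, a threshold is set to determine the wavenumbers to be used for modeling.
  2. The method for selecting spectral wavenumbers without reference values according to claim 1, characterized in that the conditions in step 2) include the variable factors of temperature adjustment, humidity adjustment and instrument stability adjustment.
  3. The method for selecting spectral wavenumbers without reference values according to claim 2, characterized in that temperature adjustment specifically means using an air conditioner to simulate the temperature environments encountered in actual use; humidity adjustment specifically means using a humidifier/dehumidifier to simulate the humidity environments encountered in actual use; and instrument stability adjustment specifically means collecting spectra 15 min, 1 h, 2 h, … after power-on to simulate the actual operating conditions of the instrument.
  4. The method for selecting spectral wavenumbers without reference values according to claim 1, characterized in that the spectra obtained from the N repeated acquisitions of the K samples are preprocessed by first derivative, second derivative, standard normal variate (SNV) or multiplicative scatter correction (MSC).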
PCT/CN2020/095035 2019-06-11 2020-06-09 Method for selecting spectral wavenumbers without reference values WO2020248961A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910501214.X 2019-06-11
CN201910501214.XA CN110210005A (zh) 2019-06-11 2019-06-11 Method for selecting spectral wavenumbers without reference values

Publications (1)

Publication Number Publication Date
WO2020248961A1 (zh) 2020-12-17

Family

ID=67791994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095035 WO2020248961A1 (zh) 2019-06-11 2020-06-09 Method for selecting spectral wavenumbers without reference values

Country Status (2)

Country Link
CN (1) CN110210005A (zh)
WO (1) WO2020248961A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
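CN110210005A (zh) * 2019-06-11 2019-09-06 浙江中烟工业有限责任公司 Method for selecting spectral wavenumbers without reference values
CN113030010A (zh) * 2021-03-11 2021-06-25 贵州省生物技术研究所(贵州省生物技术重点实验室、贵州省马铃薯研究所、贵州省食品加工研究所) Method for screening characteristic near-infrared wavenumbers by progressively shortening the step size and selecting the best among the best
CN113295673B (zh) * 2021-04-29 2022-10-11 中国科学院沈阳自动化研究所 Weakly supervised feature extraction method for laser-induced breakdown spectroscopy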

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
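CN101498661A (zh) * 2008-01-30 2009-08-05 香港浸会大学 Infrared spectral feature extraction method for high-precision discrimination of the variety, origin and growth mode of Chinese medicinal materials
US20100297291A1 (en) * 2007-09-21 2010-11-25 Suntory Holdings Limited Visible/near-infrared spectrum analyzing method and grape fermenting method
CN102930533A (zh) * 2012-10-09 2013-02-13 河海大学 Semi-supervised hyperspectral image dimensionality reduction method based on improved K-means clustering
CN105630743A (zh) * 2015-12-24 2016-06-01 浙江大学 Spectral wavenumber selection method
CN106568724A (zh) * 2016-11-01 2017-04-19 清华大学 Method and device for spectral curve preprocessing and feature mining
CN110210005A (zh) * 2019-06-11 2019-09-06 浙江中烟工业有限责任公司 Method for selecting spectral wavenumbers without reference values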

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
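CN105486658B (zh) * 2015-11-19 2018-09-28 江南大学 Near-infrared physical property parameter measurement method with temperature compensation that requires no temperature measuring point
CN108181263B (zh) * 2017-12-29 2021-01-12 浙江中烟工业有限责任公司 Near-infrared-spectroscopy-based method for extracting features of and discriminating tobacco leaf positions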

Also Published As

Publication number Publication date
CN110210005A (zh) 2019-09-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20823018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20823018

Country of ref document: EP

Kind code of ref document: A1