CN105717066B

CN105717066B - A near-infrared spectrum identification model based on a weighted correlation coefficient

Info

Publication number: CN105717066B
Application number: CN201610064824.4A
Authority: CN
Inventors: 徐雪芹; 李小兰; 刘鸿; 黄善松; 贾海江; 周芸; 邵利民; 周艳枚; 潘玉灵
Original assignee: China Tobacco Guangxi Industrial Co Ltd
Current assignee: China Tobacco Guangxi Industrial Co Ltd
Priority date: 2016-01-29
Filing date: 2016-01-29
Publication date: 2018-09-18
Anticipated expiration: 2036-01-29
Also published as: CN105717066A

Abstract

The present invention establishes a near-infrared spectrum identification model. A near-infrared spectrometer is used to scan the spectra of a series of normal products of the same kind, and their average spectrum is taken as the reference spectrum. The weighted correlation coefficient between each product spectrum and the reference spectrum is then calculated, and the mean and standard deviation of these coefficients are used to calculate the identification interval, which completes the identification model. Compared with traditional instrumental analysis such as chromatography and mass spectrometry, the method is green, environmentally friendly, simple, fast, and easy to operate, and the resulting model offers high recognition accuracy, high detection efficiency, and low cost. It provides technical support for product authenticity identification.

Description

A Near-Infrared Spectral Identification Model Based on a Weighted Correlation Coefficient

Technical field

The invention relates to the technical field of near-infrared spectroscopy, and in particular to a near-infrared spectral identification model based on a weighted correlation coefficient, which can be used for authenticity identification and quality stability analysis of cigarette products.

Background art

Near-infrared (NIR) spectroscopy offers efficient, green, and environmentally friendly analysis, and has therefore become one of the fastest-growing and most prominent spectral analysis techniques in recent years. As defined by the American Society for Testing and Materials (ASTM), its wavelength range is 780-2526 nm. Molecular absorption in the NIR region consists mainly of combination bands and overtones of groups such as C-H, O-H, N-H, and C=O. Absorption in this region is weak, and the bands are complex and heavily overlapping, so classical qualitative and quantitative methods cannot be applied directly; calibration models must instead be built with chemometric methods such as multivariate statistics, curve fitting, and cluster analysis, combined with suitable models to achieve rapid multi-component analysis. Because it is fast, non-destructive, and suitable for real-time detection, NIR spectroscopy has become a powerful tool for industrial product analysis. However, NIR spectra have weak features and large data volumes, so visual inspection and traditional matching algorithms struggle to give reliable results. An effective, fast, and highly automated recognition algorithm is therefore urgently needed.

For a long time, the internal quality of cigarette products has mainly been characterized by sensory smoking evaluation, which lacks an intuitive, quantitative description. As market competition intensifies and industrial production becomes increasingly automated, the stability and control of product quality have become ever more important, and fast, efficient, and simple analytical methods are urgently needed to evaluate and control the quality stability of cigarette products.

To address these problems, the present invention proposes an identification model based on a weighted correlation coefficient. The model is built from the near-infrared spectra of a series of normal products of the same kind. When the model is applied, the spectrum of any product judged to be a normal product of the same kind can be added to the model, so that the model is updated continuously and becomes more adaptable and more accurate.

In the present invention, spectral similarity is measured by the weighted correlation coefficient, which makes effective use of spectral features in the similarity calculation and thereby improves the reliability of the results. The near-infrared spectra of normal products of the same kind are highly similar to the reference spectrum but not identical: they reflect both common characteristics and individual differences. A spectral-similarity interval is determined from them; products whose similarity falls within this interval are judged to be of the same kind, and all others are judged to be different or abnormal products.

Summary of the invention

To further improve recognition accuracy, the present invention proposes a method that extracts characteristic spectra based on a weighted correlation coefficient. The proposed identification model consists of a reference spectrum and an identification interval.

In the present invention, a near-infrared spectrometer is used to scan the spectra of a series of normal products of the same kind; characteristic data are then extracted from each spectrum, and the average spectrum is used as the reference spectrum. The weighted correlation coefficient between each product spectrum and the reference spectrum is calculated, and the identification interval is calculated from the mean and standard deviation of these coefficients. Together, the reference spectrum and the identification interval form the identification model.

The specific modeling steps are as follows:

(1) First, pretreat m samples of the same kind produced in different batches and scan their near-infrared spectra. The vector $s_i$ denotes the spectral data of the i-th sample (i = 1, 2, ..., m), and each spectrum contains n data points.

(2) Using the spectral vectors $s_i$ as row vectors, arrange them into the m × n data matrix S:

$$S = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{bmatrix}$$

and calculate the average spectrum $\bar{s}$ as

$$\bar{s} = \frac{1}{m}\sum_{i=1}^{m} s_i$$

(3) Calculate the weighted correlation coefficient wcc between every spectrum s and the average spectrum $\bar{s}$, where w is the weight vector:

[equation not reproduced]

Here $s_j$ and $\bar{s}_j$ denote the j-th data points of s and $\bar{s}$, respectively, and $w_j$ denotes the j-th element of the weight vector w.

(4) The weight vector w used in step (3) is calculated by the following formula:

[equation not reproduced]

where the vector d is calculated by the following formula:

[equation not reproduced]

in which the superscript T denotes matrix transpose.

(5) Calculate the mean and standard deviation of wcc, denoted $\overline{wcc}$ and d, respectively (this d is a scalar, distinct from the vector d in step (4)). Establish the identification interval $[\overline{wcc} - k \cdot d,\ 1]$, where k is a scaling coefficient set from the calibration-set data so that every wcc is greater than $\overline{wcc} - k \cdot d$, and take this interval as the identification interval for this type of product.
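The five steps above can be sketched in plain Python. This is a hedged sketch, not the patent's implementation: the exact formulas for wcc and for the weight vector w are not given here, so the sketch assumes the common weighted Pearson correlation (weighted means, then weighted covariance divided by the square root of the weighted variances) and simply takes w as an input. The helper names `mean_spectrum`, `weighted_corr`, and `build_model` are hypothetical, and the sample standard deviation is assumed for d.

```python
import math
from statistics import mean, stdev


def mean_spectrum(S):
    """Column-wise mean of the m x n matrix S (one spectrum per row), i.e. s_bar."""
    m = len(S)
    return [sum(column) / m for column in zip(*S)]


def weighted_corr(s, ref, w):
    """Weighted correlation between spectrum s and reference ref.

    ASSUMPTION: weighted Pearson form with weighted means; the patent's own
    wcc formula is not reproduced in this text.
    """
    total = sum(w)
    ms = sum(wj * sj for wj, sj in zip(w, s)) / total
    mr = sum(wj * rj for wj, rj in zip(w, ref)) / total
    cov = sum(wj * (sj - ms) * (rj - mr) for wj, sj, rj in zip(w, s, ref))
    var_s = sum(wj * (sj - ms) ** 2 for wj, sj in zip(w, s))
    var_r = sum(wj * (rj - mr) ** 2 for wj, rj in zip(w, ref))
    return cov / math.sqrt(var_s * var_r)


def build_model(S, w, k=3.0):
    """Return (reference spectrum, (lower, upper)) for calibration matrix S."""
    ref = mean_spectrum(S)
    wcc = [weighted_corr(s, ref, w) for s in S]
    # ASSUMPTION: sample standard deviation (n - 1); the text does not specify
    lower = mean(wcc) - k * stdev(wcc)
    # k must be large enough that every calibration wcc lies above the bound
    if min(wcc) <= lower:
        raise ValueError("some calibration wcc are not above the bound; increase k")
    return ref, (lower, 1.0)


# Usage with made-up numbers (not data from the patent)
S = [
    [0.10, 0.35, 0.80, 0.42, 0.15],
    [0.11, 0.36, 0.78, 0.43, 0.16],
    [0.09, 0.34, 0.81, 0.41, 0.14],
]
w = [1.0] * len(S[0])  # hypothetical uniform weights
ref, interval = build_model(S, w, k=3.0)
```

With uniform weights the weighted form reduces to the ordinary Pearson correlation, which is a convenient sanity check; a real model would substitute the weights defined in step (4).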

For an unknown product, first scan its near-infrared spectrum, denoted $s_x$, and then calculate its weighted correlation coefficient $wcc_x$ with the formula in step (3). Determine whether $wcc_x$ lies within the identification interval $[\overline{wcc} - k \cdot d,\ 1]$. If it does, the unknown product is considered the same as the calibration-set products; $s_x$ is added to the calibration set, steps (1) to (4) are repeated, and the identification interval is updated. If it does not, the unknown product is considered different from the calibration-set products.
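The identify-then-update loop described above can be sketched as follows, under the same assumptions as a plain reading of the steps allows: a weighted Pearson form for wcc (the patent's formula is not reproduced here), user-supplied weights, and the sample standard deviation for d. The function names are hypothetical. Because a correlation coefficient never exceeds 1, only the lower bound needs to be tested.

```python
import math
from statistics import mean, stdev


def weighted_corr(s, ref, w):
    # ASSUMED weighted Pearson form; not the patent's own (unreproduced) formula.
    total = sum(w)
    ms = sum(a * b for a, b in zip(w, s)) / total
    mr = sum(a * c for a, c in zip(w, ref)) / total
    cov = sum(a * (b - ms) * (c - mr) for a, b, c in zip(w, s, ref))
    vs = sum(a * (b - ms) ** 2 for a, b in zip(w, s))
    vr = sum(a * (c - mr) ** 2 for a, c in zip(w, ref))
    return cov / math.sqrt(vs * vr)


def fit(S, w, k):
    """Reference spectrum and lower bound of the interval [lower, 1]."""
    ref = [sum(col) / len(S) for col in zip(*S)]
    wcc = [weighted_corr(s, ref, w) for s in S]
    return ref, mean(wcc) - k * stdev(wcc)


def identify_and_update(S, s_x, w, k=3.0):
    """Judge the unknown spectrum s_x; if accepted, add it to the calibration set."""
    ref, lower = fit(S, w, k)
    wcc_x = weighted_corr(s_x, ref, w)
    # wcc <= 1 by the Cauchy-Schwarz inequality, so only the lower bound is checked
    accepted = wcc_x >= lower
    S_new = S + [s_x] if accepted else S  # the next call refits on the enlarged set
    return accepted, wcc_x, S_new


# Usage with made-up spectra (not data from the patent)
S = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.0], [0.9, 1.9, 3.1, 4.1]]
w = [1.0, 1.0, 1.0, 1.0]  # hypothetical uniform weights
ok, wcc_x, S = identify_and_update(S, [1.0, 2.0, 3.0, 4.0], w)
bad, _, S2 = identify_and_update(S, [4.0, 3.0, 2.0, 1.0], w)
```

Here the first spectrum is accepted and appended (the set grows to four rows), while the reversed spectrum is rejected and leaves the calibration set unchanged.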

Using the weighted correlation coefficient formula, the mean and standard deviation of the weighted correlation coefficients wcc of all spectra of the same kind are calculated, with the mean denoted $\overline{wcc}$ and the standard deviation denoted d, and the identification interval $[\overline{wcc} - k \cdot d,\ 1]$ is established, where k is the scaling coefficient.

Based on the identification interval calculated from the weighted correlation coefficients and on the calibration-set data, k is set so that the weighted correlation coefficients wcc of all spectra of the same kind are greater than $\overline{wcc} - k \cdot d$; the identification interval for this type of product is then $[\overline{wcc} - k \cdot d,\ 1]$.

When the model is applied, the spectrum of the sample to be analyzed is scanned and its weighted correlation coefficient $wcc_x$ is calculated. If this coefficient falls within the identification interval $[\overline{wcc} - k \cdot d,\ 1]$, the sample can be judged to be a normal product of the same kind.

The model is based on the near-infrared spectra of a series of normal products of the same kind, and the identification model is established through weighted correlation coefficients.

The near-infrared spectral identification model based on the weighted correlation coefficient further includes a step, before scanning, of grinding the sample to 40-80 mesh. The samples are cut tobacco, tobacco stems, and/or tobacco dust.

When the model is applied, the near-infrared spectrum of any product judged to be a normal product of the same kind can be added to the model, supplementing and updating it so that the model becomes more adaptable and its predictions more accurate.

Compared with the prior art, the present invention has the following significant advantages:

1. The invention proposes a method for building a product authenticity and stability identification model based on the weighted correlation coefficient. The weighted correlation coefficient makes effective use of spectral features in the similarity calculation, greatly improving the reliability of the identification model.

2. The weighted correlation coefficient defines a spectral-similarity interval that reflects both the common characteristics of the spectra and their individual differences. Because the near-infrared spectra of normal products of the same kind are highly similar to the reference spectrum, this interval serves as a decision region: a sample whose scanned spectrum falls within it is judged to be the same product, and one that falls outside it is judged to be a different or abnormal product. This effectively avoids inaccurate judgments, improves recognition accuracy, and provides technical support for quality-stability analysis and authenticity identification of cigarette manufacturers' products.

3. Compared with traditional instrumental analysis such as chromatography and mass spectrometry, the near-infrared spectroscopy used here requires no chemical reagents at any point in the analysis, making it green, environmentally friendly, simple, fast, and easy to operate. Chemometric tools such as data matrices and weighted correlation coefficients are used to build the model, which has high recognition accuracy, high detection efficiency, and low cost.

Description of drawings:

Fig. 1 is the modeling flowchart of the present invention;

Fig. 2 is an original near-infrared scanned spectrum of cut tobacco;

Fig. 3 is the identification model established from the near-infrared spectra of cut tobacco of brand A cigarettes;

Fig. 4 is the classification identification model for brands A and B.

Detailed description of the embodiments:

The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

The modeling process of the present invention and its embodiments is as follows (see Fig. 1). First, an experiment is designed and samples are collected according to the design; representative samples are pretreated, their spectra are acquired with a near-infrared spectrometer, and the acquisition parameters are optimized. Spectral preprocessing methods include Norris derivative smoothing filtering, derivative processing, multiplicative scatter correction, and standard normalization; the spectral bands are optimized by band-selection methods such as partial least squares, genetic algorithms, and uninformative variable elimination. After spectral optimization, a qualitative analysis model is established: the near-infrared identification model is built by extracting characteristic spectral data, building the data matrix, calculating the average spectrum, and calculating the weighted correlation coefficients. Once the model is built, samples to be tested are scanned and analyzed with the model.
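Two of the preprocessing options listed above, multiplicative scatter correction (MSC) and a per-spectrum standard normalization, can be sketched as below. This assumes that "standard normalization" means the standard normal variate (SNV) transform; the text does not say which options were combined, and the embodiments themselves use a Savitzky-Golay first derivative. The function names are illustrative, not part of any named software.

```python
from statistics import mean, stdev


def snv(x):
    """Standard normal variate: centre one spectrum on its mean, scale by its SD."""
    mu, sd = mean(x), stdev(x)  # sample SD assumed
    return [(xi - mu) / sd for xi in x]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    if reference is None:
        reference = [sum(col) / len(spectra) for col in zip(*spectra)]
    mr = mean(reference)
    srr = sum((r - mr) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        mx = mean(x)
        # least-squares fit x ~ a + b * reference, then undo offset a and slope b
        b = sum((r - mr) * (xi - mx) for r, xi in zip(reference, x)) / srr
        a = mx - b * mr
        corrected.append([(xi - a) / b for xi in x])
    return corrected


# Usage: the second spectrum is 1 + 2 * the first, i.e. a pure scatter-type change
spectra = [[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]]
out = msc(spectra)  # both rows map onto the mean spectrum [2.0, 3.5, 5.0, 6.5]
z = snv([1.0, 2.0, 3.0, 4.0])
```

MSC removes offset and slope differences relative to a shared reference, whereas SNV normalizes each spectrum on its own; which suits a given data set is a judgment call the text leaves open.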

Embodiment 1

In this embodiment, the identification model is used to assess the stability of products of the same kind.

1. Experimental equipment

An MPA Fourier-transform near-infrared spectrometer (BRUKER, Germany) and a 1095Cyclotec(XF-98B) cyclone precision mill.

2. Sample collection

To make the identification model broadly applicable to cigarettes produced on different machines at different times, 83 normal samples of the same brand (A) produced on three different machines between January and December 2015 were used to build the model, and 31 unknown samples were selected for external validation.

3. Sample preparation

The cut tobacco was removed from the cigarettes and dried in an oven at 40 °C so that the moisture content of the samples was essentially uniform, then ground thoroughly in a 1095Cyclotec(XF-98B) cyclone precision mill and passed through a 60-mesh sieve.

4. Spectral scanning and data processing

Spectra of the cut-tobacco samples were acquired with an MPA Fourier-transform near-infrared spectrometer (BRUKER, Germany) fitted with a large gold-coated diffuse-reflection integrating sphere for NIR quantitative analysis and a sample-rotator accessory, and processed with the qualitative analysis software QUANT 6.5 in Bruker OPUS. The procedure was as follows: the tobacco powder was loaded into the sample cup to a height of about 3 cm, a weight was pressed onto the sample for 10 s and then removed, the quartz glass at the bottom of the cup was wiped clean with gauze, and the cup was placed on the rotating platform for NIR scanning. Operating parameters: spectral range 12000-4000 cm⁻¹, resolution 8 cm⁻¹, 64 scans (about 30 s). Spectral data were collected in transmission mode and processed as the first derivative of the absorption spectrum. An original scanned spectrum of cut tobacco is shown in Fig. 2. During modeling, the raw spectra were preprocessed with a 9-point Savitzky-Golay first-derivative smoothing to remove the effects of noise and baseline. After scanning, the spectral data were processed with statistical software.
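The 9-point Savitzky-Golay first derivative mentioned above can be sketched with fixed convolution weights. The polynomial order is not stated; this sketch assumes a second-order (quadratic) fit, for which the first-derivative weights are j/60 for j = -4..4 (a linear fit gives the same weights). Edge points are simply dropped, and `spacing` is a hypothetical parameter for the spacing between adjacent data points; OPUS may treat edges and scaling differently.

```python
def sg_first_derivative_9pt(y, spacing=1.0):
    """First derivative by a 9-point Savitzky-Golay filter (quadratic fit assumed).

    Weights are j / sum(j**2) = j / 60 for j = -4..4. Only points with a full
    window are returned, so the output is 8 points shorter than the input.
    """
    half = 4
    offsets = range(-half, half + 1)
    norm = sum(j * j for j in offsets)  # 60 for a 9-point window
    return [
        sum(j * y[i + j] for j in offsets) / (norm * spacing)
        for i in range(half, len(y) - half)
    ]


# Usage: a quadratic fit differentiates x**2 exactly, giving 2x at x = 4..7
d_sq = sg_first_derivative_9pt([float(x * x) for x in range(12)])
d_lin = sg_first_derivative_9pt([2.0 * x + 1.0 for x in range(20)])
```

For the quadratic input the sketch returns the exact derivative 2x at every interior point, and for a straight line it returns the constant slope, which is a quick check that the weights are right.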

5. Identification model establishment

The model is built as follows:

5.1 Using the spectral vectors $s_i$ as row vectors, arrange them into the m × n data matrix S:

$$S = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{bmatrix}$$

and calculate the average spectrum as

$$\bar{s} = \frac{1}{m}\sum_{i=1}^{m} s_i$$

The average spectrum $\bar{s}$ is then calculated with the above formula.

5.2 Calculate the weighted correlation coefficient wcc between every spectrum s and $\bar{s}$, where w is the weight vector.

5.3 The weight vector w in the above step is calculated as described in step (4) of the modeling steps.

5.4 Calculate the mean and standard deviation of wcc, denoted $\overline{wcc}$ and d, respectively. Establish the identification interval $[\overline{wcc} - k \cdot d,\ 1]$, where k is a scaling coefficient set from the calibration-set data so that every wcc is greater than $\overline{wcc} - k \cdot d$, and take this interval as the identification interval for this type of product.

The average of the 83 scanned spectra of brand A cut tobacco was used as the reference spectrum, and the weighted correlation coefficients between each of the 83 spectra and the reference spectrum were calculated with the above formulas. Their mean and standard deviation were 0.9998 and 0.0002025, respectively. All weighted correlation coefficients were found to be greater than 0.9998 - 3 × 0.0002025 ≈ 0.9992 (that is, k = 3), so the identification interval for this type of product was set to [0.9992, 1]. The identification model established from the near-infrared spectra of brand A cut tobacco is shown in Fig. 3.

Samples from the same batch were pretreated, left exposed to air for 6 hours, and then scanned on the instrument. The weighted correlation coefficient between the product's near-infrared spectrum and the reference spectrum of the model was 0.9985. This value lies outside the model's identification interval [0.9992, 1], so the product was judged to differ from the normal products described by the model. In fact, the product had been left in air for a long time before scanning and had absorbed considerable moisture: although the cut tobacco itself was unchanged, its moisture content, and hence its quality, had changed. No abnormality is visible in the spectrum itself, yet its weighted correlation coefficient is well below those of products of the same kind, showing that the model can be used to analyze the quality stability of a single product.
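The interval arithmetic reported for brand A, and the decision for the moisture-exposed sample, can be checked directly. The numbers below are the ones stated in the text; the function name is illustrative.

```python
def identification_interval(mean_wcc, sd_wcc, k):
    """Interval [mean - k * sd, 1] as used for the brand A model."""
    return mean_wcc - k * sd_wcc, 1.0


lower, upper = identification_interval(0.9998, 0.0002025, k=3)
print(round(lower, 4))  # prints 0.9992

moisture_exposed_wcc = 0.9985  # sample left in air for 6 h before scanning
print(lower <= moisture_exposed_wcc <= upper)  # prints False: judged abnormal
```

The unrounded lower bound is 0.9991925, so the 0.9985 sample falls outside the interval whether or not the bound is rounded to 0.9992.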

6. Model validation

To further verify the model's recognition ability, external validation was used: 31 batches of brand A cigarettes from different batches and machines, none of which were used in modeling, were identified with the model. The results are shown in Table 1:

Table 1 Recognition results of the brand "A" product feature model

The results show that all 31 samples from different batches and machines were correctly identified (recognition rate 100%), indicating that the model predicts accurately and can be used to analyze the quality stability of cigarette products.

Embodiment 2

In this embodiment, the identification model is used to identify product authenticity.

1. Experimental equipment

2. Sample collection

The cigarette samples in this embodiment are of brands A and B, produced on 5 different machines between January and December 2015. In total, 83 normal samples of brand A were used to build the model, and 17 unknown samples of brand B were used to test the model's authenticity identification.

3. Sample preparation

The cut tobacco was removed from the cigarettes and dried in an oven at 40 °C so that the moisture content of the samples was essentially uniform, then ground thoroughly in a 1095Cyclotec(XF-98B) cyclone precision mill and passed through an 80-mesh sieve.

Spectral scanning, data processing, and model building in this embodiment are the same as in Embodiment 1. The resulting classification identification model for brands A and B is shown in Fig. 4.

The near-infrared spectra of 51 cut-tobacco samples of brand B, a different product, were scanned, and their weighted correlation coefficients with the reference spectrum of the identification model were calculated; the results are shown as solid points in Fig. 4. All of these points lie outside the identification interval (recognition rate 100%), so the samples are judged to be different products. This conclusion agrees completely with the actual situation, demonstrating the validity of the identification model.

4. Model verification

To further verify the model's recognition ability, external validation was used: 29 batches of cigarette samples of different brands not used in modeling, namely counterfeit brand A cigarettes collected from the market and cigarettes of brands B and C, were identified with the model. The results are shown in Table 1:

The results show that all 29 samples that are not genuine brand A cigarettes were correctly identified (recognition rate 100%), showing that the model can be used to identify product authenticity.

The above embodiments are merely examples given to illustrate the present invention clearly and do not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description; it is not possible to list every embodiment exhaustively here. Any obvious change or variation derived from the technical solution of the present invention remains within the scope of protection of the present invention.

Claims

1. A near-infrared spectrum identification model based on a weighted correlation coefficient, wherein a near-infrared spectrometer is used to scan the spectra of a series of normal products of the same kind, characteristic data are extracted from each spectrum, and the average spectrum is used as the reference spectrum; the weighted correlation coefficient between each product spectrum and the reference spectrum is calculated, the identification interval is calculated from the mean and standard deviation of the weighted correlation coefficients, and the identification model is established, the identification model consisting of the reference spectrum and the identification interval;

The model building includes the following steps:

(1) Spectrum scanning: scan the near-infrared spectrum of the sample to be tested, and extract the characteristic spectrum;

(2) Establishing a data matrix: the spectral data of the i-th sample (i = 1, 2, ..., m) are represented by a vector $s_i$, each spectrum containing n data points; using the spectral vectors $s_i$ as row vectors, the spectral data matrix S has the form

$$S = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{bmatrix}$$

(3) Calculating the average spectrum: on the basis of the matrix of step (2), calculate

$$\bar{s} = \frac{1}{m}\sum_{i=1}^{m} s_i$$

where $\bar{s}$ is the average spectrum;

(4) Calculate the weighted correlation coefficient, denoted wcc, between every spectrum s and $\bar{s}$, using the following formula:

[equation not reproduced]

where $s_j$ and $\bar{s}_j$ denote the j-th data points of the spectra s and $\bar{s}$, respectively, $w_j$ denotes the j-th element of the weight vector w, and w is the weight;

(5) The weight vector w in step (4) is calculated as follows:

[equation not reproduced]

where the vector d is calculated as follows:

[equation not reproduced]

in which the superscript T denotes matrix transpose.

2. The near-infrared spectrum identification model based on a weighted correlation coefficient according to claim 1, characterized in that: the weighted correlation coefficient formula is applied to calculate the mean and standard deviation of the weighted correlation coefficients wcc of all spectra of the same kind, the mean being denoted $\overline{wcc}$ and the standard deviation d, and the identification interval $[\overline{wcc} - k \cdot d,\ 1]$ is established, where k is the scaling coefficient.

3. The near-infrared spectrum identification model based on a weighted correlation coefficient according to claim 1 or 2, characterized in that: according to the identification interval calculated from the weighted correlation coefficients and the calibration-set data, k is set so that the weighted correlation coefficients wcc of all spectra of the same kind are greater than $\overline{wcc} - k \cdot d$, and the identification interval for this type of product is $[\overline{wcc} - k \cdot d,\ 1]$.

4. The near-infrared spectrum identification model based on a weighted correlation coefficient according to claim 3, characterized in that: when the model is applied, the spectrum of the sample to be analyzed is scanned and its weighted correlation coefficient $wcc_x$ is calculated; if this coefficient falls within the identification interval $[\overline{wcc} - k \cdot d,\ 1]$, the sample can be judged to be a normal product of the same kind.

5. The near-infrared spectrum identification model based on a weighted correlation coefficient according to claim 4, characterized in that it includes a step, before scanning, of grinding the sample to 40-80 mesh.

6. The near-infrared spectrum identification model based on a weighted correlation coefficient according to claim 5, characterized in that the samples are cut tobacco, tobacco stems, and/or tobacco dust.