CN102841072A - Method for identifying transgenic rice and non-transgenic rice based on NIR (Near Infrared Spectrum)

Info

Publication number: CN102841072A
Application number: CN2012102863872A
Authority: CN
Inventors: 朱诚; 张龙; 丁艳菲; 王珊珊
Original assignee: China Jiliang University
Current assignee: China Jiliang University
Priority date: 2012-08-13
Filing date: 2012-08-13
Publication date: 2012-12-26

Abstract

The invention discloses a method for identifying transgenic rice and non-transgenic rice based on near-infrared (NIR) spectroscopy, comprising the following steps: (1) irradiating rice seed samples with near-infrared light and collecting the diffuse reflectance spectral information of all the rice seed samples; (2) preprocessing the diffuse reflectance spectral information of each rice seed sample, extracting the spectral feature information in the characteristic spectral regions of the preprocessed spectra by principal component analysis, selecting principal components, and obtaining the score of each principal component; (3) building a model with the principal component scores of the rice seed sample spectra as input and the rice seed type set values of the corresponding samples as output; and (4) obtaining the principal component scores of the spectra of the rice seeds to be tested, substituting them into the model of step (3), and obtaining the types of the rice seeds to be tested. The method offers high identification accuracy, simple operation and low cost, and enables rapid, non-destructive identification of transgenic and non-transgenic rice.

Description

A method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy

Technical field

The invention belongs to the field of transgenic plant detection, and in particular relates to a method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy.

Background

Rice is China's largest grain crop; its annual output accounts for about 38% of total grain output, and its production bears on national food security. Transgenic technology overcomes the limitations of traditional rice breeding and offers a new way to safeguard China's food security. With the rapid development of rice genetic engineering, research on transgenic rice has yielded remarkable results. Over the past 20 years, transgenic technology has been used to develop insect-resistant, disease-resistant, herbicide-resistant, stress-tolerant, high-yield and high-quality transgenic rice, and a large number of transgenic rice lines with stably inherited and expressed target traits and excellent agronomic traits have been obtained.

As large numbers of transgenic crops gradually reach the market, the safety of transgenic crops and of foods processed from them has begun to attract public attention. In essence, transgenic crops do not differ from conventionally bred crop varieties. Conventional breeding is generally achieved through sexual hybridization, whereas plant genetic engineering introduces exogenous recombinant DNA into the plant genome using techniques such as Agrobacterium-mediated transformation, the gene gun, electroporation and microinjection. Although, in theory, the genetic characteristics and phenotypes of transgenes should be more precisely predictable and safer in application, safety testing of transgenic crops remains necessary.

Methods for detecting transgenic components can be divided into qualitative and quantitative methods. Commonly used detection methods for transgenic crops currently include PCR detection, histochemical detection, enzyme-linked immunosorbent assay (ELISA), identification of exogenous gene integration, Western blotting and bioassays. Some conventional detection methods for transgenic plants can no longer meet the current need for rapid and accurate detection: some require membrane transfer and hybridization and are cumbersome and expensive, while others are unsuitable for testing samples in batches, which severely limits their application. The trend in transgenic detection technology is toward simple operation, lower cost and broad applicability.

Invention patent application publication No. CN 102081075A discloses a method for identifying transgenic rice and non-transgenic rice, comprising: (1) separately measuring the contents of shikimic acid and galacturonic acid in the leaves of non-transgenic rice C418 and of the rice to be tested after 6 weeks of artificial cultivation under defined conditions; (2) if the shikimic acid and galacturonic acid contents of the rice to be tested are significantly lower than those of the non-transgenic rice, the rice to be tested is transgenic rice. Because the methods for determining shikimic acid and galacturonic acid are relatively complicated, the procedure of that invention is cumbersome and time-consuming.

Summary of the invention

The invention provides a method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy; the method identifies transgenic and non-transgenic rice quickly, non-destructively and simply.

A method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy, comprising the following steps:

(1) irradiating rice seed samples with near-infrared light in the wavenumber range of 4000 to 10000 cm^-1, and collecting the diffuse reflectance spectral information of all the rice seed samples;

(2) preprocessing the diffuse reflectance spectral information of each rice seed sample, extracting the spectral feature information in the characteristic spectral region by principal component analysis, selecting principal components, and obtaining the score of each principal component;

(3) building a model with the principal component scores corresponding to the spectral information of all the rice seed samples as input and the rice seed type set values corresponding to the rice seed samples as output;

(4) obtaining the principal component scores of the spectra of the rice seeds to be tested by following steps (1) and (2), substituting them into the model of step (3), and obtaining the types of the rice seeds to be tested.

Diffusely reflected light is near-infrared light that has entered the sample, undergone multiple reflections, refractions, diffractions and absorptions, interacted with the molecules inside the sample, and then returned to the detector; the diffuse reflectance spectrum therefore carries information on the structure and composition of the sample. Because the internal structure and composition of transgenic and non-transgenic rice seeds differ, spectral information processing can extract this difference, and effective data processing can further accentuate the small differences between spectra and convert them into a recognizable signal for identifying transgenic and non-transgenic rice.

In step (1), the rice seed samples are transgenic rice seed samples and non-transgenic rice seed samples, and the transgenic rice is rice transformed with a protein gene or rice transformed with a regulatory gene.

In step (2), the spectra need to be preprocessed to remove the effects of high-frequency random noise, baseline drift, sample inhomogeneity and similar factors.

The choice of spectral preprocessing method affects both the preprocessing result and the extraction of useful spectral information; based on comparative analysis, the preprocessing method is preferably the standard normal variate (SNV) transformation.
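As a rough illustration of this preprocessing step, the sketch below applies the transform to a single spectrum, assuming that the standard normal transformation named above is the standard normal variate (SNV) transform in its usual form: each spectrum is centered on its own mean and scaled by its own sample standard deviation. The function name `snv` and the sample values are hypothetical and are not taken from the patent.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate transform of one spectrum (assumed definition).

    Each value is centered on the spectrum's own mean and divided by the
    spectrum's own sample standard deviation (n - 1 in the denominator).
    """
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(value - m) / s for value in spectrum]


# Hypothetical reflectance readings, for illustration only.
print(snv([0.42, 0.45, 0.51, 0.48]))
```

Because the transform works on each spectrum separately, a constant offset or a multiplicative scaling of a whole spectrum leaves the result unchanged, which is why it is often used against baseline and scatter effects.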

The characteristic spectral region is a region in which the spectral information of transgenic and non-transgenic rice seeds differs appreciably; digitally processing the spectral information of this region makes it possible to distinguish transgenic from non-transgenic rice seeds. In the present invention, the characteristic spectral region is preferably the region with a wavenumber range of 4000 to 10000 cm^-1 or 4000 to 8000 cm^-1.
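For orientation, the following sketch shows how such a wavenumber window might be cut out of a measured spectrum and how the window limits translate to wavelengths. The helper names `wavenumber_to_nm` and `select_band` and the toy data are assumptions made for this illustration; the patent itself performs the extraction in the Unscrambler software.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1


def select_band(wavenumbers, intensities, low=4000.0, high=8000.0):
    """Keep only the (wavenumber, intensity) pairs with low <= wavenumber <= high."""
    return [(w, a) for w, a in zip(wavenumbers, intensities) if low <= w <= high]


# Hypothetical five-point spectrum, for illustration only.
wavenumbers = [3996.0, 4000.0, 6000.0, 8000.0, 8004.0]
intensities = [0.30, 0.31, 0.35, 0.33, 0.32]
print(select_band(wavenumbers, intensities))
print(wavenumber_to_nm(4000.0), wavenumber_to_nm(10000.0))
```

Under this conversion, 4000 to 10000 cm^-1 corresponds to wavelengths of 2500 nm down to 1000 nm, and the narrower 4000 to 8000 cm^-1 window ends at 1250 nm.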

Principal component analysis (PCA) is used to extract the spectral feature information. The purpose of PCA is to reduce the dimensionality of the spectral data: the original variables are transformed into a set of mutually orthogonal new variables, each a linear combination of the original ones, which eliminates the overlapping information present when many variables coexist, while the new variables represent the data structure of the original variables as fully as possible. Because the new variables are few and mutually uncorrelated, they are better suited to the analysis of spectral information. PCA can be carried out in the Unscrambler software (manufactured by CAMO, USA).

The selection of principal components is closely tied to the effective extraction of the original information. Principal components are selected so that their cumulative contribution rate is ≥ 85%, but not too many should be selected, since that would introduce unnecessary noise and cause overfitting. When the standard normal variate transformation is used for preprocessing, comparative analysis shows that the number of principal components is preferably 4 to 8.
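The selection rule can be sketched as follows: a minimal PCA via singular value decomposition that keeps the smallest number of components whose cumulative contribution rate reaches a threshold, capped at an upper limit. This is an illustration in NumPy, not the Unscrambler implementation used in the patent; the function name `select_components`, the default cap of 8 and the toy matrix are assumptions.

```python
import numpy as np


def select_components(X, min_cumulative=0.85, max_components=8):
    """Return PCA scores for the fewest components reaching min_cumulative.

    X holds one preprocessed spectrum per row. The cap max_components is a
    guard against keeping noisy components (assumed here, following the
    4 to 8 range mentioned in the text).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # column-center the data
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    contribution = s**2 / np.sum(s**2)           # variance share per component
    cumulative = np.cumsum(contribution)
    k = int(np.searchsorted(cumulative, min_cumulative)) + 1
    k = min(k, max_components)
    scores = U[:, :k] * s[:k]                    # principal component scores
    return scores, cumulative[:k]


# Toy data: 90% of the variance lies along the first column.
demo = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
scores, cumulative = select_components(demo)
print(scores.shape, cumulative)
```

In the toy matrix the first component alone explains 90% of the variance, so a threshold of 85% keeps one component while a threshold of 95% keeps both.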

Step (2) is preferably: preprocessing the diffuse reflectance spectra of all the rice seed samples by the standard normal variate transformation, extracting by principal component analysis the spectral feature information of the preprocessed spectra in the wavenumber range of 4000 to 8000 cm^-1, selecting 5 principal components, and obtaining the score of each principal component.

Alternatively, step (2) is preferably: preprocessing the diffuse reflectance spectra of all the rice seed samples by the standard normal variate transformation, extracting by principal component analysis the spectral feature information of the preprocessed spectra in the wavenumber range of 4000 to 10000 cm^-1, selecting 6 principal components, and obtaining the score of each principal component.

In step (3), the rice seed types corresponding to the rice seed samples are transgenic and non-transgenic; in the model, the set values for transgenic and non-transgenic can be set to -1 and 1, respectively.

The model is preferably a partial least squares discriminant analysis (PLS-DA) model. PLS-DA is a discriminant analysis method based on PLS regression; because the class membership information, supplied in coded form by an auxiliary matrix, is taken into account when the factors are constructed, the method discriminates efficiently and can improve the accuracy of rice type identification.
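To make the idea concrete, here is a minimal sketch of a two-class PLS-DA: a single-response PLS regression (NIPALS-style PLS1) fitted to class codes of -1 and 1, with the sign of the output used as the class. It is written in NumPy under stated assumptions and is not the Unscrambler routine used in the patent; the function names, the number of latent variables and the toy scores are all hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 regression; returns (coefficients, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weight: covariance of X with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # latent-variable scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    """Continuous model output for each row of X."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def classify(outputs):
    """Class code per sample: -1 (transgenic) below 0, otherwise 1."""
    return np.where(np.asarray(outputs) < 0, -1, 1)


# Hypothetical principal component scores for six seeds (not patent data).
X_demo = np.array([[2.0, 0.1], [2.2, -0.1], [1.8, 0.0],
                   [-2.0, 0.1], [-2.1, -0.1], [-1.9, 0.0]])
y_demo = np.array([1, 1, 1, -1, -1, -1])   # 1 = non-transgenic, -1 = transgenic
model = pls1_fit(X_demo, y_demo, n_components=1)
print(classify(pls1_predict(model, X_demo)))
```

Coding the two classes as -1 and 1 places the natural decision boundary at 0, which matches the set values described above.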

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention is simple to operate and saves time and labor; transgenic and non-transgenic rice can be identified merely by acquiring the near-infrared spectra of the rice seeds.

(2) The invention offers high identification accuracy and reliable results, and enables rapid, non-destructive identification of transgenic and non-transgenic rice.

Description of the drawings

Fig. 1 is the regression plot of predicted versus actual values for transgenic and non-transgenic rice in the validation set of Example 1.

Detailed description

Example 1

1. Build the model

(1) 80 transgenic rice seeds and 40 non-transgenic rice seeds were selected as rice seed samples. A Nicolet Nexus870 (Thermo Corporation, USA) Fourier-transform near-infrared spectrometer was used to irradiate the rice seed samples with near-infrared light in the wavenumber range of 4000 to 10000 cm^-1, and OMNIC 6.0 software was used to collect the diffuse reflectance spectra of all the rice seed samples. The spectrometer was set to 32 scans at a resolution of 4 cm^-1. During acquisition, the room temperature was kept at about 25°C and the humidity was kept stable.

The transgenic rice above is TCTP-transgenic rice and Osmi166-transgenic rice, which can be obtained as follows: mature rice embryos are used as explants to induce and culture callus; embryogenic callus is selected as the transformation recipient; Agrobacterium tumefaciens EHA105 carrying the plant expression vector p1301-TCTP or p1301-mi166 is used to transform the rice callus; and transgenic rice is obtained after a series of screening and differentiation steps. The non-transgenic rice is Zhonghua 11.

(2) The diffuse reflectance spectra of all the rice seed samples were preprocessed by the standard normal variate transformation to obtain the preprocessed spectral information. Principal component analysis in the Unscrambler software was used to extract the spectral feature information of each preprocessed spectrum in the wavenumber range of 4000 to 10000 cm^-1; 6 principal components were selected and the score of each principal component was obtained (see Table 1).

(3) A PLS-DA model was built with the principal component scores corresponding to the spectral information of all the rice seed samples as input and the rice seed type set values corresponding to the rice seed samples as output. The set values were as follows: TCTP-transgenic rice and mi166-transgenic rice were set to -1, and non-transgenic rice was set to 1. For reasons of space, only the data of 20 of the rice seed samples are listed here (see Table 1).

Table 1 Part of the database used for model building

NO         X₁       X₂       X₃       X₄       X₅       X₆        Y
1      1.3530  -0.7930   0.7700   1.0820   0.3880  -0.1170        1
2      1.1700   0.5890   0.7120  -0.8540   0.8270  -0.5790        1
3      2.0450   1.2560  -0.0391  -0.9710   0.5480  -0.7620        1
4     -0.4370  -0.1520   1.0850   0.4820   0.4580  -0.3600        1
5     -0.6610   0.1800   0.4670   0.6060   0.5070  -0.5080        1
6     -2.7930   0.7790  -0.2370   0.5790   0.4760   0.1470        1
7      2.0430  -0.8770  -0.3850   0.8900   0.1430   0.2080        1
8     -2.5240   0.3730  -0.0229   0.3550   0.6040   0.3070        1
9     -1.3080  -0.1710   0.5250   0.1620   0.6420   0.0927        1
10    -4.5830   1.6350  -0.3410   1.0520  -0.1640  -0.2080        1
11    -1.0420  -0.1430  -0.5020  -0.8720  -0.6650  -0.7770       -1
12    -2.1080  -0.8060   0.0816  -0.9740  -0.0098  -0.5140       -1
13     1.3600  -2.1210  -0.0541  -0.0623  -0.4610  -0.2890       -1
14    -4.7990  -0.3640   0.1410  -0.7450  -0.6880  -0.3830       -1
15     0.9920  -1.4920  -0.3650   1.3850  -0.7650  -0.1610       -1
16    -0.4150  -0.2960   0.2750  -1.3150  -0.0383  -0.7440       -1
17     0.6530  -1.8010   0.2070  -0.8100  -0.2300  -0.3810       -1
18    -2.5350  -1.3630   0.5500  -0.6420   0.0111  -0.0736       -1
19    -2.1330  -1.1560   0.6900  -0.5170  -0.1900  -0.7030       -1
20     1.3300  -3.3070   0.6430   0.1860  -0.1150  -0.5490       -1

Here, NO is the serial number of the rice seed sample; X₁, X₂, X₃, X₄, X₅ and X₆ are the scores of the respective principal components; Y is the output value.

2. Use the model to predict the rice seed types of the calibration-set rice seed samples

After the PLS-DA model was built, the 6 principal component scores of the calibration-set rice seed samples obtained in part 1 according to steps (1) and (2) were substituted into the PLS-DA model built in part 1 to obtain the output values (see Table 2). The rice seed type of each sample was then determined from its output value by the following rule: a predicted value less than 0 indicates transgenic rice; a predicted value greater than 0 indicates non-transgenic rice. For reasons of space, only the data of 20 of the rice seed samples are listed here (see Table 2).

Table 2 Predicted and actual rice seed types of the rice seed samples in the calibration set

NO         X₁       X₂       X₃       X₄       X₅       X₆        Y   S₁   S₂
1      1.3530  -0.7930   0.7700   1.0820   0.3880  -0.1170     0.88    1    1
2      1.1700   0.5890   0.7120  -0.8540   0.8270  -0.5790    1.213    1    1
3      2.0450   1.2560  -0.0391  -0.9710   0.5480  -0.7620    1.162    1    1
4     -0.4370  -0.1520   1.0850   0.4820   0.4580  -0.3600    0.976    1    1
5     -0.6610   0.1800   0.4670   0.6060   0.5070  -0.5080    0.793    1    1
6     -2.7930   0.7790  -0.2370   0.5790   0.4760   0.1470    0.665    1    1
7      2.0430  -0.8770  -0.3850   0.8900   0.1430   0.2080    0.222    1    1
8     -2.5240   0.3730  -0.0229   0.3550   0.6040   0.3070    0.604    1    1
9     -1.3080  -0.1710   0.5250   0.1620   0.6420   0.0927    0.635    1    1
10    -4.5830   1.6350  -0.3410   1.0520  -0.1640  -0.2080    0.653    1    1
11    -1.0420  -0.1430  -0.5020  -0.8720  -0.6650  -0.7770   -1.046   -1   -1
12    -2.1080  -0.8060   0.0816  -0.9740  -0.0098  -0.5140   -1.015   -1   -1
13     1.3600  -2.1210  -0.0541  -0.0623  -0.4610  -0.2890   -1.253   -1   -1
14    -4.7990  -0.3640   0.1410  -0.7450  -0.6880  -0.3830   -1.264   -1   -1
15     0.9920  -1.4920  -0.3650   1.3850  -0.7650  -0.1610   -0.717   -1   -1
16    -0.4150  -0.2960   0.2750  -1.3150  -0.0383  -0.7440   -0.456   -1   -1
17     0.6530  -1.8010   0.2070  -0.8100  -0.2300  -0.3810   -1.153   -1   -1
18    -2.5350  -1.3630   0.5500  -0.6420   0.0111  -0.0736   -0.919   -1   -1
19    -2.1330  -1.1560   0.6900  -0.5170  -0.1900  -0.7030   -0.909   -1   -1
20     1.3300  -3.3070   0.6430   0.1860  -0.1150  -0.5490   -1.566   -1   -1

Here, NO is the serial number of the rice seed sample; X₁, X₂, X₃, X₄, X₅ and X₆ are the scores of the respective principal components; Y is the output value, S₁ is the rice seed type predicted by the model, and S₂ is the actual rice seed type. In columns S₁ and S₂, 1 denotes non-transgenic rice and -1 denotes transgenic rice.
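The decision rule can be checked against the 20 rows listed in Table 2 with a few lines of Python. The output values and actual types below are copied from the Y and S₂ columns of the table; the function name `classify` is hypothetical, and returning 0 for an output of exactly 0 is an assumption, since the rule as stated does not cover that case.

```python
def classify(output):
    """Apply the stated rule: below 0 is transgenic (-1), above 0 is non-transgenic (1)."""
    if output < 0:
        return -1
    if output > 0:
        return 1
    return 0  # exactly 0 is not covered by the rule; treated here as undetermined


# Y column (model output) and S2 column (actual type) of Table 2, rows 1 to 20.
table2_outputs = [0.88, 1.213, 1.162, 0.976, 0.793, 0.665, 0.222, 0.604, 0.635, 0.653,
                  -1.046, -1.015, -1.253, -1.264, -0.717, -0.456, -1.153, -0.919, -0.909, -1.566]
table2_actual = [1] * 10 + [-1] * 10

predicted = [classify(value) for value in table2_outputs]
accuracy = sum(p == a for p, a in zip(predicted, table2_actual)) / len(table2_actual)
print(accuracy)  # 1.0 for the 20 listed rows
```

This covers only the 20 listed samples, not the full calibration set.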

Data analysis shows that the calibration-set coefficient of determination R²_c is 0.9183 and the calibration-set root mean square error RMSECV is 0.2695; the PLS-DA model built here predicts the rice seed samples of the calibration set with an accuracy of 100%.

3. Use the model to predict the rice seed types of the validation-set rice seeds to be tested

60 rice seeds to be tested were taken as the validation set. Their 6 principal component scores were obtained according to steps (1) and (2) of part 1 and substituted into the PLS-DA model built in part 1 to obtain the model output value Y; the type of each rice seed to be tested was then determined from Y using the decision rule of part 2. For reasons of space, only the data of 25 of the rice seeds to be tested are listed here (see Table 3).
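One way to obtain principal component scores for new seeds is to project their preprocessed spectra onto the loadings fitted on the calibration set, reusing the calibration mean. The patent does not spell out this detail, so the sketch below is an assumption about how the step could be carried out; `fit_pca`, `project` and the random stand-in data are hypothetical.

```python
import numpy as np


def fit_pca(X_cal, n_components):
    """Return the calibration mean and the first n_components loading vectors."""
    X_cal = np.asarray(X_cal, dtype=float)
    mean = X_cal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    return mean, Vt[:n_components].T       # loadings: one column per component


def project(X_new, mean, loadings):
    """Scores of new spectra in the calibration PCA space."""
    return (np.asarray(X_new, dtype=float) - mean) @ loadings


# Random stand-in data: 12 calibration spectra and 3 new spectra, 5 variables each.
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(12, 5))
X_new = rng.normal(size=(3, 5))
mean, loadings = fit_pca(X_cal, n_components=3)
print(project(X_new, mean, loadings).shape)
```

Reusing the calibration mean and loadings keeps the new scores on the same axes as the scores the model was trained on, so they can be fed to the model directly.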

Table 3 Predicted and actual rice seed types of the rice seeds to be tested in the validation set

NO         X₁       X₂       X₃       X₄       X₅       X₆        Y   S₁   S₂
1     -2.9310  -0.3970  -0.6650   0.5860  -0.1070   0.3520    0.968    1    1
2     -0.6670  -0.2310   0.1920   0.0401   0.0825   0.2030     0.73    1    1
3     -2.2690  -1.0310   0.8860  -0.3120  -0.0803   0.0724    1.188    1    1
4     -0.5720  -0.3310   0.3430  -0.1160  -0.2660  -0.0198    1.178    1    1
5     -0.7930  -0.4970   0.0681   0.2100  -0.1360   0.5340    0.846    1    1
6      6.3560   0.4730  -0.8840   0.2470  -0.5010   0.1160    0.986    1    1
7     -2.6240   0.2530  -0.0890  -0.3410   0.1510  -0.3470    0.505    1    1
8     -3.1970   0.5690  -0.6570  -0.2880  -0.2240  -0.6270    0.935    1    1
9     -1.0460   1.7940  -0.7750   0.2840  -0.2470  -0.2890    0.978    1    1
10    -0.8210   1.0490  -0.2000   0.4340  -0.1160  -0.1670    0.293    1    1
11    -3.7120   0.7550  -0.0027  -0.1010  -0.3900  -0.4650     0.28    1    1
12     0.2270   0.6260   0.0185  -0.3020   0.2860   0.5710    0.871    1    1
13    -2.8670  -0.9860   0.9000  -0.4200   0.1780   0.3570   -0.722   -1   -1
14    -2.5370  -0.0343   0.8240  -0.5960  -0.1770   0.2700   -0.771   -1   -1
15     2.9340   1.1510   1.5360  -0.7920  -0.3610   0.0080   -0.886   -1   -1
16     3.2440   1.3420   1.3470  -1.1520  -0.3610  -0.3780   -1.093   -1   -1
17    -1.3670   0.9700  -0.4390   0.0960   0.4480   0.0888   -0.959   -1   -1
18     0.0870   0.3080   0.8900  -0.0888   0.5510   0.1600   -1.023   -1   -1
19    -2.4050   0.3110  -0.2150   0.1330  -0.0730  -0.3800   -0.959   -1   -1
20     3.1820  -0.0256   1.6990   0.8590   0.5650  -0.5930   -1.554   -1   -1
21     0.2270  -2.7020  -0.3470  -0.1950   0.2150  -0.6300    -0.45   -1   -1
22    -3.4210  -0.4880  -0.1550   0.2390   0.2590  -0.3490   -0.878   -1   -1
23    -0.8670   0.3150   0.3190   0.0802  -0.1740  -0.7870   -1.316   -1   -1
24    -1.4390   0.1650   0.0678   0.4940  -0.4470  -1.2150   -0.944   -1   -1
25    -0.3240   1.0430   0.0300  -0.2890  -0.1670   0.4830   -0.877   -1   -1

Here, NO is the serial number of the rice seed to be tested; X₁, X₂, X₃, X₄, X₅ and X₆ are the scores of the respective principal components; Y is the output value, S₁ is the rice seed type predicted by the model, and S₂ is the actual rice seed type. In columns S₁ and S₂, 1 denotes non-transgenic rice and -1 denotes transgenic rice.

Data analysis shows that the validation-set coefficient of determination R²_p is 0.8979 and the validation-set root mean square error RMSEP is 0.2878; the PLS-DA model built here predicts the rice seeds of the validation set with an accuracy of 100%. The regression plot of predicted versus actual values for transgenic and non-transgenic rice in the validation set is shown in Fig. 1.
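For readers who want to reproduce this kind of summary statistic, the sketch below computes a root mean square error and a coefficient of determination from actual and predicted values. The patent does not state the exact formulas used by its software, so the common definitions (RMSE as the square root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares) are assumed here; the function names and the four-sample example are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical class codes and model outputs, for illustration only.
actual = [1, -1, 1, -1]
predicted = [0.5, -0.5, 1.5, -1.5]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

Applied only to the rows excerpted in the tables, these functions would not be expected to reproduce the reported values, which refer to the full sample sets.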

Example 2

1. Build the model

(1) Same as in Example 1.

(2) The diffuse reflectance spectra of all the rice seed samples were preprocessed by the standard normal variate transformation to obtain the preprocessed spectral information. Principal component analysis in the Unscrambler software was used to extract the spectral feature information of each preprocessed spectrum in the 4000 to 8000 cm^-1 spectral region; 5 principal components were selected and the score of each principal component was obtained (see Table 4).

(3) A PLS-DA model was built with the principal component scores corresponding to the spectral information of all the rice seed samples as input and the rice seed type set values corresponding to the rice seed samples as output. The set values were as follows: TCTP-transgenic rice and mi166-transgenic rice were set to -1, and non-transgenic rice was set to 1. For reasons of space, only the data of 20 of the rice seed samples are listed here (see Table 4).

Table 4 Part of the database used for model building

NO         X₁       X₂       X₃       X₄       X₅       X₆        Y
1      1.5760  -1.2890   0.2420   0.7620   0.1610   0.0249        1
2      0.7060   1.4910  -0.1330   0.4010   0.2350  -0.2130        1
3      2.2380   1.8980  -0.1730   0.3890   0.0969  -0.5140        1
4     -1.4210  -0.5080   0.3290   0.9520  -0.0171  -0.1600        1
5     -1.3610  -0.4000   0.2890   0.5180   0.0336  -0.3000        1
6     -3.8860  -0.1460   0.0375   0.2370   0.1880  -0.2860        1
7      3.0420  -1.4200   0.1070  -0.1660   0.1130   0.1830        1
8     -3.5180  -0.0658  -0.0167   0.2420   0.3000   0.0238        1
9     -2.0770   0.1050   0.1180  -0.2780   0.2160   0.0316        1
10    -6.2470   0.3450   0.8330   0.4870  -0.5970  -0.1830        1
11    -1.0590   0.2430  -0.6420  -0.0786  -0.2050   0.1910       -1
12    -3.0900   0.0187  -0.5990   0.2100  -0.0894   0.0323       -1
13     2.2590  -1.2940  -0.2860  -0.2610  -0.1850   0.2650       -1
14    -6.5120   0.3220  -0.6920  -0.2590   0.1780   0.4370       -1
15     2.2500  -2.4760  -0.0601  -0.2320  -0.3630   0.2610       -1
16    -0.9040   0.9580  -0.5490  -0.2850   0.0365  -0.1090       -1
17     1.1180  -0.1560  -0.3270  -0.7240   0.2460  -0.2080       -1
18    -3.4160  -0.1470  -0.2280  -0.2390  -0.0240  -0.0630       -1
19    -3.2530  -0.4290  -0.2160  -0.4260   0.0084  -0.0983       -1
20     2.0300  -1.7920  -0.1390  -0.9330  -0.0915  -0.5240       -1

After the PLS-DA model is established, the six principal component scores of the calibration-set rice seed samples, obtained in 1 according to steps (1) to (2), are substituted into the PLS-DA model established in 1 to obtain the output values (shown in Table 5). The rice seed type of each sample is then determined from its output value according to the following rule: a predicted value less than 0 indicates transgenic rice; a predicted value greater than 0 indicates non-transgenic rice. For reasons of space, only the data for 20 of the rice seed samples are listed here; see Table 5.

Table 5 Predicted and actual rice seed types of the rice seed samples in the calibration set

NO       X₁       X₂       X₃       X₄       X₅       X₆       Y   S₁   S₂
 1   1.5760  -1.2890   0.2420   0.7620   0.1610   0.0249   1.069    1    1
 2   0.7060   1.4910  -0.1330   0.4010   0.2350  -0.2130   1.235    1    1
 3   2.2380   1.8980  -0.1730   0.3890   0.0969  -0.5140   1.336    1    1
 4  -1.4210  -0.5080   0.3290   0.9520  -0.0171  -0.1600   1.159    1    1
 5  -1.3610  -0.4000   0.2890   0.5180   0.0336  -0.3000   0.895    1    1
 6  -3.8860  -0.1460   0.0375   0.2370   0.1880  -0.2860   0.29     1    1
 7   3.0420  -1.4200   0.1070  -0.1660   0.1130   0.1830   0.275    1    1
 8  -3.5180  -0.0658  -0.0167   0.2420   0.3000   0.0238   0.379    1    1
 9  -2.0770   0.1050   0.1180  -0.2780   0.2160   0.0316   0.369    1    1
10  -6.2470   0.3450   0.8330   0.4870  -0.5970  -0.1830   0.938    1    1
11  -1.0590   0.2430  -0.6420  -0.0786  -0.2050   0.1910  -0.951   -1   -1
12  -3.0900   0.0187  -0.5990   0.2100  -0.0894   0.0323  -0.862   -1   -1
13   2.2590  -1.2940  -0.2860  -0.2610  -0.1850   0.2650  -0.727   -1   -1
14  -6.5120   0.3220  -0.6920  -0.2590   0.1780   0.4370  -1.306   -1   -1
15   2.2500  -2.4760  -0.0601  -0.2320  -0.3630   0.2610  -0.951   -1   -1
16  -0.9040   0.9580  -0.5490  -0.2850   0.0365  -0.1090  -0.438   -1   -1
17   1.1180  -0.1560  -0.3270  -0.7240   0.2460  -0.2080  -0.395   -1   -1
18  -3.4160  -0.1470  -0.2280  -0.2390  -0.0240  -0.0630  -0.608   -1   -1
19  -3.2530  -0.4290  -0.2160  -0.4260   0.0084  -0.0983  -0.773   -1   -1
20   2.0300  -1.7920  -0.1390  -0.9330  -0.0915  -0.5240  -1.068   -1   -1

Here, NO is the serial number of the rice seed sample; X₁, X₂, X₃, X₄, X₅ and X₆ are the scores of the respective principal components; Y is the model output value; S₁ is the rice seed type predicted by the model; and S₂ is the actual rice seed type. In columns S₁ and S₂, 1 denotes non-transgenic rice and -1 denotes transgenic rice.

Data analysis shows that the calibration-set coefficient of determination Rc² is 0.8578 and the calibration-set root mean square error RMSECV is 0.3555; the established PLS-DA model achieves a prediction accuracy of 100% on the calibration-set rice samples.
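As an illustration of how such figures can be derived, the following minimal sketch computes a root mean square error and a coefficient of determination from the 20 calibration samples listed in Table 5. It is not the patent's own computation: the function names are hypothetical, the coefficient of determination is assumed to be 1 - SS_res/SS_tot (the patent does not state its definition), and no cross-validation is performed. Because only 20 of the calibration samples are listed, the results are not expected to reproduce the reported 0.8578 and 0.3555 exactly.

```python
import math

# Set values (1 = non-transgenic, -1 = transgenic) and model outputs Y
# for the 20 calibration samples listed in Table 5.
set_values = [1] * 10 + [-1] * 10
model_outputs = [
    1.069, 1.235, 1.336, 1.159, 0.895, 0.29, 0.275, 0.379, 0.369, 0.938,
    -0.951, -0.862, -0.727, -1.306, -0.951, -0.438, -0.395, -0.608, -0.773, -1.068,
]


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared difference between set value and output."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def coefficient_of_determination(y_true, y_pred):
    """Assumed definition: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rmse = root_mean_square_error(set_values, model_outputs)
r2 = coefficient_of_determination(set_values, model_outputs)
print(f"RMSE on listed samples: {rmse:.4f}")
print(f"R^2 on listed samples:  {r2:.4f}")
```

The same two functions can be applied to the validation-set outputs in Table 6 to obtain the corresponding prediction statistics for the listed seeds.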

Sixty rice seeds to be tested are taken as the validation set. Following steps (1) to (3) in 1, the six principal component scores of the seeds to be tested are obtained and substituted into the established PLS-DA model to give the model output value Y; the type of each seed is then determined from Y according to the decision rule in 2. For reasons of space, only the data for 25 of the seeds to be tested are listed here; see Table 6.

Table 6 Predicted and actual rice seed types of the rice seeds to be tested in the validation set

NO       X₁       X₂       X₃       X₄       X₅       X₆       Y   S₁   S₂
 1  -3.4170  -1.0060  -0.7280   0.6610   0.0677  -0.2440   0.858    1    1
 2  -0.9130  -0.0728  -0.0108   0.1660  -0.0995   0.0692   0.787    1    1
 3  -3.6300  -0.3270  -0.0092  -0.0052  -0.1700  -0.1260   0.994    1    1
 4  -0.9640  -0.0178  -0.1990  -0.2020  -0.0221  -0.0748   0.713    1    1
 5  -1.0860  -0.5690  -0.2700   0.6380  -0.3260   0.0469   0.616    1    1
 6   8.0500  -0.3310  -0.3900  -0.0776  -0.2580   0.0654   1.431    1    1
 7  -3.0820   0.2200   0.5230  -0.1130  -0.0029   0.4140   0.375    1    1
 8  -3.4340   0.1150   0.0832  -0.3410  -0.2630   0.2880   1.216    1    1
 9   0.0422   1.4380  -0.3540  -0.0101  -0.5670   0.4380   0.998    1    1
10  -0.2760   0.9760  -0.5780   0.1790  -0.1710   0.3150   0.953    1    1
11  -4.3810   0.9250  -0.0026  -0.9750  -0.4250   0.2840   0.880    1    1
12   0.5170   0.8000   0.6040   0.7250   0.0724   0.1790   0.324    1    1
13  -4.3450  -0.3570   0.4730   0.3960   0.1150  -0.0927  -0.537   -1   -1
14  -3.5570   0.6740   0.4930  -0.0930   0.0196  -0.3320  -1.026   -1   -1
15   3.2640   2.4540   0.1720   0.1120  -0.0863  -0.3330  -0.754   -1   -1
16   3.7650   2.4540   0.8320  -1.1410   0.1980  -0.5470  -1.203   -1   -1
17  -0.9060   0.8170   0.2280   1.0120   0.1660   0.5320  -1.083   -1   -1
18  -0.1260   1.1780   0.2870   0.9450   0.2910   0.4300  -1.299   -1   -1
19  -2.6800   0.2390  -0.3510   0.1930   0.1640   0.4000  -1.048   -1   -1
20   3.0620   1.2720  -1.0030   0.1120   0.6080   0.3440  -1.242   -1   -1
21  -0.6180  -3.1850  -0.0892  -0.3880   0.7230   0.4070   0.045    1    1
22  -4.3620  -0.6470  -0.0510  -0.2630   0.4750   0.3470  -0.709   -1   -1
23  -0.9680   0.6720  -0.3570  -0.7200   0.0012   0.5530  -0.550   -1   -1
24  -1.5620   0.3000  -1.0520  -1.3440   0.4130   0.6250  -0.768   -1   -1
25   0.1140   1.1700   0.6600  -0.1340   0.0222  -0.3180  -1.524   -1   -1

Here, NO is the serial number of the rice seed to be tested; X₁, X₂, X₃, X₄, X₅ and X₆ are the scores of the respective principal components; Y is the model output value; S₁ is the rice seed type predicted by the model; and S₂ is the actual rice seed type. In columns S₁ and S₂, 1 denotes non-transgenic rice and -1 denotes transgenic rice.
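The decision rule (output below 0: transgenic; output above 0: non-transgenic) can be sketched as follows, applied to the 25 validation outputs listed in Table 6. The function name `classify_seed` is hypothetical, and returning 0 for an output of exactly 0 is an assumption, since the rule as stated does not cover that case. The check covers only the 25 listed seeds, not the full set of 60.

```python
def classify_seed(output_value):
    """Map a model output Y to a seed type: 1 = non-transgenic, -1 = transgenic."""
    if output_value > 0:
        return 1
    if output_value < 0:
        return -1
    return 0  # assumption: an output of exactly 0 is left undetermined


# Model outputs Y and actual types S2 for the 25 seeds listed in Table 6.
validation_outputs = [
    0.858, 0.787, 0.994, 0.713, 0.616, 1.431, 0.375, 1.216, 0.998, 0.953,
    0.880, 0.324, -0.537, -1.026, -0.754, -1.203, -1.083, -1.299, -1.048, -1.242,
    0.045, -0.709, -0.550, -0.768, -1.524,
]
actual_types = [1] * 12 + [-1] * 8 + [1] + [-1] * 4

predicted_types = [classify_seed(y) for y in validation_outputs]
correct = sum(p == a for p, a in zip(predicted_types, actual_types))
accuracy = correct / len(actual_types)
print(f"{correct} of {len(actual_types)} listed seeds classified correctly")
```

Seed 21, with an output of 0.045, lies closest to the threshold among the listed seeds, yet it still falls on the non-transgenic side, in agreement with its actual type.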

Data analysis shows that the validation-set coefficient of determination Rp² is 0.8344 and the validation-set root mean square error RMSEP is 0.3543; the established PLS-DA model achieves a prediction accuracy of 100% on the validation-set rice seeds.

Claims

1. A method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy, characterized in that it comprises the following steps:

(1) emitting near-infrared light with a wavenumber range of 4000 to 10000 cm⁻¹ onto rice seed samples, and collecting the diffuse reflectance spectral information of all rice seed samples;

(2) preprocessing the diffuse reflectance spectral information of each rice seed sample, extracting the spectral feature information in the characteristic spectral region after preprocessing by principal component analysis, selecting principal components, and obtaining the score of each principal component;

(3) building a model with the principal component scores corresponding to the spectral information of all rice seed samples as input and the rice seed type set values corresponding to the rice seed samples as output;

(4) obtaining the principal component scores of the spectrum of the rice seeds to be tested according to steps (1) to (2), substituting them into the model of step (3), and obtaining the type of the rice seeds to be tested.

2. The method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy according to claim 1, characterized in that, in step (2), the preprocessing method is the standard normal variate transformation.

3. The method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy according to claim 1, characterized in that, in step (2), the characteristic spectral region is the spectral region with a wavenumber range of 4000 to 10000 cm⁻¹.

4. The method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy according to claim 1, characterized in that, in step (2), the characteristic spectral region is the spectral region with a wavenumber range of 4000 to 8000 cm⁻¹.

5. The method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy according to claim 1, characterized in that, in step (2), the number of principal components is 4 to 8.

6. The method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy according to claim 3 or 4, characterized in that, in step (2), the number of principal components is 5.

7. The method for identifying transgenic rice and non-transgenic rice based on near-infrared spectroscopy according to claim 1, characterized in that, in step (3), the model is a partial least squares discriminant analysis model.
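For illustration only, the preprocessing named in claim 2 (standard normal variate transformation, SNV) can be sketched as below. SNV centers each spectrum on its own mean and scales it by its own standard deviation. The function name `snv`, the use of the sample standard deviation (n - 1 denominator), and the example reflectance values are assumptions made for this sketch and are not taken from the patent.

```python
import math


def snv(spectrum):
    """Standard normal variate: (x - mean) / standard deviation, per spectrum."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("a spectrum needs at least two points")
    mean = sum(spectrum) / n
    # Assumption: sample standard deviation with an n - 1 denominator.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    if std == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return [(x - mean) / std for x in spectrum]


# Hypothetical diffuse reflectance readings at four wavenumbers.
raw_spectrum = [0.20, 0.40, 0.60, 0.80]
corrected = snv(raw_spectrum)
print(corrected)
```

Each seed's spectrum is transformed independently, so the correction needs no reference spectrum and can be applied to a seed under test in the same way as to the calibration samples.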