CN106442390A - Classification and identification method for transgenic soybeans based on the PCA-SVM algorithm - Google Patents
- Publication number
- CN106442390A (application number CN201610754442.4A)
- Authority
- CN
- China
- Prior art keywords
- genetically engineered
- engineered soybean
- identification method
- soybean
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/3581—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation
- G01N21/3586—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation by Terahertz time domain spectroscopy [THz-TDS]
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/3563—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
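The invention discloses a method for classifying and identifying transgenic soybeans based on the PCA-SVM algorithm. The method uses spectral data obtained from the absorption of terahertz time-domain spectroscopy (THz-TDS) radiation by the biological macromolecules of transgenic soybeans. Principal component analysis (PCA) is applied first, reducing the data to a single-digit number of principal component factors; a support vector machine (SVM) algorithm then classifies and identifies multiple samples, yielding a classification result plot for the soybean samples. The results show a classification accuracy of 100% for transgenic soybeans. By combining dimension-reduced spectral data with an SVM, the method can process a large number of soybean samples at once and classify and identify them quickly, efficiently, and accurately.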
Description
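Technical Field
The invention relates to a method for identifying transgenic soybeans, specifically a method for classifying and identifying transgenic soybeans using a PCA-SVM model based on THz-TDS.
Background Art
The toxicity and safety risks of transgenic soybeans have long been controversial. Edible soybean oil goes through complex processing steps during which DNA is severely degraded and damaged, which makes transgene detection very difficult and prone to false negatives. For this reason, before extraction begins, the edible oil is washed with a measured volume of TE solution, exploiting the solubility of DNA in aqueous solution; this step is critical to successful DNA extraction. Improving the DNA extraction method, designing quality-control primers against endogenous genes, controlling the length of amplified fragments, and using different detection techniques have eliminated false negatives and raised detection sensitivity, so that nucleic acid detection of transgenic components in soybean oil is steadily maturing. Real-time fluorescent quantitative PCR is highly sensitive and fast, and it allows transgene detection to be quantitative as well as qualitative, so it has become the trend in transgene detection. At present, however, there is still no internationally unified method for detecting transgenes in vegetable oils, a problem that urgently needs solving.
THz wave technology has become a powerful new tool for scientific research. THz imaging and THz spectroscopy in particular have important applications in physics, chemistry, biomedicine, astronomy, materials science, and environmental science.
The support vector machine (SVM) is a data mining method built on statistical learning theory. It handles regression problems (time series analysis) and pattern recognition problems (classification, discriminant analysis), among others, and can be extended to fields such as prediction and comprehensive evaluation. SVM uses structural risk minimization (SRM) in place of the commonly used empirical risk minimization (ERM) as its optimization criterion. Problems that SVM can handle fall into linearly separable and linearly non-separable problems.
Classification and identification of transgenic soybeans using an SVM algorithm based on THz-TDS remains an unexplored research area.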
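Summary of the Invention
The object of the invention is to provide a method for classifying and identifying transgenic soybeans using a PCA-SVM algorithm based on THz-TDS, so as to remedy the deficiencies of the prior art.
The invention exploits the marked absorption of transgenic soybean macromolecules in the THz band to obtain raw transmission spectrum data. Principal component analysis then reduces the dimensionality of the data while approximately preserving the information in the original spectra. The PCA-processed spectral data serve as the input to the SVM, a radial basis function is selected as the kernel, and the classification result is output.
To achieve the above object, the invention adopts the following technical solution:
A method for classifying and identifying transgenic soybeans based on the PCA-SVM algorithm, comprising the following steps:
1) pretreating the transgenic soybean samples;
2) irradiating multiple transgenic soybean samples using THz-TDS to obtain the raw transmission spectrum data of each sample;
3) performing principal component analysis on the raw spectrum data to reduce its dimensionality, yielding the input spectrum data;
4) performing classification analysis on the input spectrum data using a support vector machine (SVM) algorithm.
Further, the pretreatment in step 1) consists of removing the seed coating from the transgenic soybeans, then grinding, pressing into tablets, and drying, in that order.
Further, the waveband used in step 2) is 0.5 to 1.5 THz.
Further, in step 3) the dimensionality of the raw spectrum data is reduced from the thousands to a single digit while retaining the vast majority of the information in the raw spectrum data.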
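The patent does not say how the principal components are computed. As a minimal sketch of the reduction described in step 3), the following pure-Python code centers the spectra, builds the covariance matrix, and extracts the leading components by power iteration with deflation. The function names, the choice of power iteration, and the toy three-point "spectra" are illustrative assumptions; real THz spectra would have on the order of a thousand points per sample and would normally be handled with a numerical library.

```python
import math


def center_columns(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance (divisor n - 1) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    d = len(matrix)
    start = [1.0 / (k + 1) for k in range(d)]  # fixed, non-uniform start vector
    start_norm = math.sqrt(sum(v * v for v in start))
    vec = [v / start_norm for v in start]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # no variance left in the matrix
            break
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(d) for j in range(d))  # Rayleigh quotient
    return value, vec


def pca_project(rows, n_components):
    """Project rows onto their first n_components principal axes."""
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(row[j] * comp[j] for j in range(d)) for comp in components]
            for row in centered]


# Toy data (not from the patent): four 3-point "spectra" lying on one line.
spectra = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
scores = pca_project(spectra, n_components=1)
```

Each row of `scores` holds one sample's coordinates in the reduced space; the sign of a principal axis is arbitrary, so only the magnitudes are meaningful.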
Further, the support vector machine algorithm uses a radial basis kernel function.
Further, the radial basis kernel function is: [formula image not present in this text]
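The kernel formula itself is not present in this text. As an assumption, one widely used Gaussian form of the radial basis kernel is K(x, y) = exp(-||x - y||^2 / (2δ^2)), where δ is the kernel width. The sketch below implements that form; the function name, the 2δ^2 scaling, and the default value of `delta` are illustrative choices, and the patent's own convention may differ.

```python
import math


def rbf_kernel(x, y, delta=0.5):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * delta^2)).

    The 2 * delta^2 scaling is an assumed convention, not taken from the patent.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * delta ** 2))
```

The kernel equals 1 for identical inputs and decays toward 0 as the two points move apart, with δ controlling how fast it decays.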
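Beneficial effects of the invention: the invention uses spectral data obtained from the absorption of terahertz time-domain spectroscopy radiation by soybean macromolecules, then applies a support vector machine algorithm to classify and identify multiple samples, yielding a classification result plot for the soybean samples. The results show a classification accuracy of 100% for transgenic soybeans. By combining dimension-reduced spectral data with a support vector machine, the invention can process a large number of soybean samples at once and classify them quickly, efficiently, and accurately, thereby completing the classification and identification of transgenic soybean samples.
Brief Description of the Drawings
Fig. 1 is a topology diagram of the principle of the support vector machine (SVM) used in the invention.
Fig. 2 shows the classification and identification results of the embodiment.
Detailed Description
The invention is described in further detail below with reference to an embodiment, so that those skilled in the art can implement it by following the description.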
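Embodiment 1:
This embodiment uses 90 transgenic soybean samples from three transgenic soybean varieties, i.e., 30 samples per variety.
1) pretreating the transgenic soybean samples;
2) irradiating multiple transgenic soybean samples using THz-TDS to obtain the raw transmission spectrum data of each sample, and constructing a data table from the THz transmission spectrum feature vectors in the 0.5 to 1.5 THz band;
3) reducing the dimensionality of the high-dimensional raw spectrum data while retaining the main spectral information, and then performing cluster discrimination. When reducing the spectral variables, if the cumulative variance contribution rate of the first n principal components exceeds a chosen threshold (generally 85.00%), the first n principal components can approximately replace the original spectral variables, giving the input spectrum data. In this embodiment, the cumulative variance contribution rate of the first, second, and third principal components (Component1, Component2, Component3) after PCA reaches 90.252% of the original data, and the variance contribution rates decrease from Component1 to Component3, indicating that the first three principal components can approximately replace the original spectral information. That is, the dimensionality of the raw spectrum data is reduced from 1599 to 3 without losing the vast majority of its information, giving 3 principal component factors;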
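The selection rule in step 3) can be sketched as follows: sort the eigenvalues of the covariance matrix, accumulate their share of the total variance, and stop at the first n whose cumulative share exceeds the threshold. The function name and the eigenvalues below are made up for illustration and are not the patent's data.

```python
def components_needed(eigenvalues, threshold=0.85):
    """Return (n, ratio): the smallest n whose cumulative variance
    contribution rate exceeds the threshold, and that rate."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for n, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return n, cumulative / total
    # Threshold never exceeded (e.g. threshold >= 1): keep every component.
    return len(ordered), 1.0


# Illustrative eigenvalues only; the patent does not list its eigenvalues.
n, ratio = components_needed([6.0, 2.0, 1.0, 0.5, 0.5], threshold=0.85)
```

With these illustrative values the first two components cover 80% of the variance, so the rule keeps a third component to pass the 85% threshold.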
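4) performing classification analysis on the input spectrum data using the support vector machine (SVM) algorithm, whose principle is shown in Fig. 1. For multi-class classification with N classes, N binary classifiers are constructed: the i-th classifier takes the training samples of class i as positive samples and the training samples of all other classes as negative samples, and the final output is the class whose binary classifier gives the largest output. Within the 0.5 to 1.5 THz range, the two principal component factors with the largest variance contribution rates are selected from the three extracted. In general the training set should be much larger than the prediction set. From a total of 132 samples, 90 are selected as training samples; the 2 principal component scores of these 90 samples form a 90×2 matrix X that serves as the SVM input for training on the 3 classes of transgenic soybean seeds. After repeated training, the optimal coefficient C = 100 and the optimal kernel parameter δ = 0.5 are selected. Finally, with this PCA-SVM model, the 2 principal component scores of the remaining 42 samples form a 42×2 matrix X that serves as the SVM input for classifying and identifying the 3 classes of transgenic soybeans; the results are shown in Fig. 2.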
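The one-against-rest decision rule described in step 4) can be sketched as below: each class has its own binary scorer, and the predicted class is the one whose scorer returns the largest value. The scorers here are hypothetical stand-ins (negative squared distance to an invented class center in the two-component space), not trained SVMs, and the class labels and coordinates are not from the patent.

```python
def one_vs_rest_predict(sample, scorers):
    """Pick the class whose binary scorer gives the largest output.

    scorers maps a class label to a function returning a real-valued score.
    """
    scores = {label: scorer(sample) for label, scorer in scorers.items()}
    best = max(scores, key=scores.get)
    return best, scores


def make_center_scorer(center):
    """Hypothetical stand-in for a trained binary SVM decision function."""
    def score(sample):
        return -sum((s - c) ** 2 for s, c in zip(sample, center))
    return score


# Invented class centers in (Component1, Component2) space.
scorers = {
    "variety_A": make_center_scorer((0.0, 0.0)),
    "variety_B": make_center_scorer((3.0, 0.0)),
    "variety_C": make_center_scorer((0.0, 3.0)),
}
label, scores = one_vs_rest_predict((2.6, 0.4), scorers)
```

In a real model each scorer would be the decision function of a binary SVM trained with one class as positive and the other two as negative; only the arg-max step shown here carries over unchanged.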
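As Fig. 2 shows, the three classes of transgenic soybean seeds are clearly separated; the training time is 0.015625 s, the number of support vectors is 8, and the classification error rate is 0.11%.
In addition, a further test analysis was carried out using 42 transgenic soybean samples, whose 2 principal component scores form a 42×2 matrix X that serves as the input to the PCA-SVM model. The three classes of transgenic soybean seeds are clearly separated; the training time is 0.000805 s, the number of support vectors is 7, and the classification error rate is 0.00%. The classification accuracy for the three transgenic soybean seed samples reaches 100.00%, a very satisfactory result.
Although embodiments of the invention have been disclosed above, the invention is not limited to the applications listed in the description and embodiments. It can be applied in any field suited to it, and those skilled in the art can readily make further modifications. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the embodiment shown here.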
Claims (6)
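1. A method for classifying and identifying transgenic soybeans based on the PCA-SVM algorithm, characterized in that the method comprises the following steps:
1) pretreating the transgenic soybean samples;
2) irradiating multiple transgenic soybean samples using THz-TDS to obtain the raw transmission spectrum data of each sample;
3) performing principal component analysis on the raw spectrum data to reduce its dimensionality, yielding the input spectrum data;
4) performing classification analysis on the input spectrum data using a support vector machine (SVM) algorithm.
2. The method for classifying and identifying transgenic soybeans according to claim 1, characterized in that the pretreatment in step 1) consists of removing the seed coating from the transgenic soybeans, then grinding, pressing into tablets, and drying, in that order.
3. The method for classifying and identifying transgenic soybeans according to claim 1, characterized in that the waveband used in step 2) is 0.5 to 1.5 THz.
4. The method for classifying and identifying transgenic soybeans according to claim 1, characterized in that in step 3) the dimensionality of the raw spectrum data is reduced from the thousands to a single digit while retaining the basic information of the raw spectrum data.
5. The method for classifying and identifying transgenic soybeans according to claim 1, characterized in that the support vector machine algorithm uses a radial basis kernel function.
6. The method for classifying and identifying transgenic soybeans according to claim 5, characterized in that the radial basis kernel function is: [formula image not present in this text]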
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610754442.4A CN106442390A (zh) | 2016-08-29 | 2016-08-29 | Classification and identification method for transgenic soybeans based on the PCA-SVM algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610754442.4A CN106442390A (zh) | 2016-08-29 | 2016-08-29 | Classification and identification method for transgenic soybeans based on the PCA-SVM algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106442390A (zh) | 2017-02-22 |
Family
ID=58090033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610754442.4A Pending CN106442390A (zh) | Classification and identification method for transgenic soybeans based on the PCA-SVM algorithm | 2016-08-29 | 2016-08-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106442390A (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
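CN106959284A (zh) * | 2017-03-27 | 2017-07-18 | Jiangsu University | Detection method for distinguishing transgenic maize from non-transgenic maize |
CN109325551A (zh) * | 2018-11-21 | 2019-02-12 | Guangdong University of Technology | Terahertz spectrum recognition method combining radial basis functions and kernel principal component analysis |
CN109657731A (zh) * | 2018-12-28 | 2019-04-19 | Changsha University of Science and Technology | Anti-interference classification method for a droplet digital PCR instrument |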
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
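CN103389281A (zh) * | 2012-05-09 | 2013-11-13 | Yunnan Tasly Deepure Biological Tea Group Co., Ltd. | Cluster analysis method for Pu'er tea based on near-infrared spectroscopy |
CN104597052A (zh) * | 2015-02-09 | 2015-05-06 | Huaiyin Institute of Technology | High-speed non-destructive potato grading and detection method and system based on multi-feature fusion |
CN105372202A (zh) * | 2015-10-27 | 2016-03-02 | Jiujiang University | Method for identifying transgenic cotton varieties |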
Non-Patent Citations (4)
Title |
---|
TAO CHEN ET AL.: "Classification and recognition of genetically modified organisms by chemometrics methods using terahertz spectroscopy", 《INTERNATIONAL JOURNAL OF FOOD SCIENCE AND TECHNOLOGY》 * |
WENDAO XU ET AL.: "Discrimination of Transgenic Rice containing the Cry1Ab Protein using Terahertz Spectroscopy and Chemometrics", 《SCIENTIFIC REPORTS》 * |
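TU Shan: "Research on non-destructive identification methods for transgenic agricultural products based on terahertz spectroscopy", 《Wanfang Dissertations》 * |
NIE Junyang et al.: "Identification of transgenic soybeans based on terahertz time-domain spectroscopy and a PCA-BPN network", 《Acta Photonica Sinica》 * |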
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170222 |