WO2016150130A1 - Hybrid seed purity identification method based on near-infrared spectroscopy - Google Patents

Hybrid seed purity identification method based on near-infrared spectroscopy

Info

Publication number
WO2016150130A1
Authority
WO
WIPO (PCT)
Prior art keywords
seed
sample
spectral data
identified
purity
Prior art date
Application number
PCT/CN2015/090229
Other languages
English (en)
French (fr)
Inventor
安冬
李卫军
孙虎
董肖莉
Original Assignee
山东翰能高科科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东翰能高科科技有限公司
Publication of WO2016150130A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light

Definitions

  • The invention relates to the field of seed purity identification, in particular to a hybrid seed purity identification method based on near-infrared spectroscopy.
  • China is a major grain producer.
  • The crop seed industry is a strategic and fundamental core industry of the nation, and is the foundation for promoting the long-term, stable development of agriculture and ensuring national food security.
  • Corn is one of the important food and feed crops in China. In 2012 it became China's largest grain crop by output.
  • China's corn production depends mainly on hybrids.
  • Seed purity identification is central to ensuring seed quality.
  • The main cause of reduced purity in corn hybrids is female-parent seed mixed into the hybrid seed.
  • Field identification consists of sowing the samples in the field. It is the authoritative method for purity identification, but its main limitations are the long time, high cost, and large land area it requires.
  • Indoor identification includes morphological identification, electrophoretic band identification, and molecular marker identification, but these are complex to operate, costly, and time-consuming, and they do not permit real-time analysis.
  • The near-infrared region has a wavelength range of 780 to 2500 nm.
  • This region carries rich compositional and structural information from the C-H, N-H, and O-H groups of the sample's molecules, and near-infrared light from different parts of the region can collect sample information from different depths (10⁻¹ to 10² nm).
  • This gives the technique the advantages of fast analysis, non-destructiveness, low cost, and environmentally friendly ("green") analysis.
  • Near-infrared spectroscopy has been studied extensively for testing agricultural products, and studies on purity identification of corn hybrids are also common. However, most existing studies require the seeds to be ground before analysis. They therefore cannot test seeds non-destructively and in real time, cannot sort out individual female-parent seeds mixed into hybrid seed, and have not examined whether hybrids and female parents from different seed production sites can be separated.
  • The present invention proposes a low-cost, easy-to-operate, and highly reliable method for identifying hybrid seed purity based on near-infrared spectroscopy. It allows hybrid purity to be identified without specialist personnel, and it improves on the limited stability and adaptability of identification models built with existing methods.
  • A hybrid seed purity identification method based on near-infrared spectroscopy, comprising the following steps:
  • Step 1: Collect near-infrared spectral data of the seed samples.
  • Step 2: Preprocess the collected near-infrared spectral data, and select representative training-sample spectral data from the preprocessed data.
  • Step 3: Build a seed purity identification model from the selected training-sample spectral data using a feature extraction algorithm and a modeling method.
  • Step 4: Use the established identification model to identify the spectra of the seeds to be identified.
  • The near-infrared spectral data collected in step 1 come from a near-infrared spectrometer. If several near-infrared spectrometers of the same model are available, they are kept in the same external environment while the spectral data are collected.
  • The same seed sample is measured on the different near-infrared spectrometers at the same measurement time point, yielding a corresponding set of spectra.
  • The near-infrared spectral data are preprocessed as described in step 2.
  • The preprocessing methods used include data normalization, derivative processing, smoothing, centering, and standardization.
  • The representative training-sample spectral data in step 2 are sample data that can accommodate the effects on purity identification of uncertain factors, namely variation in the time and place at which the seed spectra are measured and variation in the samples' place of origin and seed production time.
  • Because the representative training-sample spectral data accommodate this uncertain information, they reduce its effect on the accuracy of spectral identification.
  • Building the seed purity identification model in step 3 includes reducing the dimensionality of the selected training-sample spectral data, using principal component analysis (PCA), partial least squares regression (PLS), or linear discriminant analysis (LDA).
  • PCA principal component analysis
  • PLS partial least squares regression
  • LDA linear discriminant analysis
  • The modeling method used in step 3 is chosen according to the model's scope of application and the analysis objective, and includes, but is not limited to, the bionic pattern recognition (BPR) method based on high-dimensional geometric analysis, support vector machines (SVM), or the nearest Euclidean distance method.
  • BPR bionic pattern recognition method
  • SVM Support vector machine
  • Identifying the spectra of the seeds to be identified with the established purity identification model in step 4 comprises: first acquiring the spectral data of the seeds to be identified, then applying preprocessing and feature extraction to those data, and finally performing identification with the established model and reporting the result.
  • The preprocessing and feature extraction applied to the spectral data of the seeds to be identified are the same as those used for the purity identification model.
  • The present invention has the following beneficial effects:
  • Because representative training-sample spectral data that accommodate most of the uncertain information are selected, the model copes better with changes in the time, place, environment, and other conditions of spectral collection.
  • The model also copes better with changes in the time and place of seed production, which makes it more robust.
  • The method can identify hybrid purity quickly, at low cost and in little time; it does not require the tester to have specialist knowledge and is convenient to apply.
  • FIG. 1 is a flow chart of the hybrid seed purity identification method based on near-infrared spectroscopy provided by the present invention.
  • FIG. 2 is a sample space distribution diagram after dimensionality reduction according to an embodiment of the present invention.
  • FIG. 3 is a two-dimensional spatial distribution diagram of the data after widening the range of sample sources in the modeling set according to an embodiment of the present invention.
  • FIG. 4 lists the time periods, number of measurement days, number of spectra, and abbreviations for the measured spectra in an embodiment of the present invention.
  • The preprocessing method, feature extraction method, and modeling method in the embodiment are not fixed. The experimenter may choose the method for each step according to the situation and prior experimental experience, and the algorithms used in the embodiment are not intended to limit the invention.
  • The instrument used in the following embodiment is an MPA Fourier-transform diffuse-reflectance near-infrared spectrometer from Bruker Optics (Germany), with a spectral range (wavenumber) of 12000 to 4000 cm⁻¹, 32 scans, and a resolution of 8 cm⁻¹. Single-kernel samples were measured with the small-sample accessory.
  • OPUS 6.5 was used for both spectral acquisition and data format conversion.
  • Spectra were measured on single kernels. For each measurement the seed was placed in the small sample cup with the embryo side facing down, and each variety was randomly sampled and measured multiple times.
  • The experiment was carried out in five time periods spanning up to 10 months, and each period comprised 4 (or 7) days.
  • On each day, experimental samples were drawn at random from the bulk seed bags and their spectra measured once.
  • The time periods, number of measurement days, number of spectra, and abbreviations are listed in FIG. 4.
  • Preprocessing used moving-window averaging (MWA, window of 9), first-order difference derivative (FD, difference width of 9), and vector normalization (VN). These methods filter noise from the spectral data, correct the baseline and improve resolution, remove spectral offsets, and to some extent remove random errors introduced during measurement.
  • MWA moving window averaging
  • FD first order differential derivative
  • VN vector normalization
  • For dimensionality reduction, principal component analysis (PCA) was applied first, reducing the spectral data to 10 dimensions (the cumulative contribution of the first 10 principal components exceeds 98%). Linear discriminant analysis (LDA) was then applied as a second reduction, down to 2 dimensions.
  • PCA principal component analysis
  • LDA linear discriminant analysis
  • The modeling method was bionic pattern recognition (BPR), used to model the NH101 female parent and hybrid. Two-weight neurons were used as the basic covering units, and the basic covering units were connected by the minimum spanning tree method to build the BPR model of the corn female parent and hybrid.
  • BPR bionic pattern recognition method
  • R_ij represents the relative distance between the samples of class i and class j.
  • D_ij represents the squared Euclidean distance between the centroids of class i and class j.
  • W_i represents the mean within-class sum of squared deviations for class i.
  • W_j represents the mean within-class sum of squared deviations for class j.
  • The larger the value of R_ij, the greater the difference between the class i and class j sample sets.
  • In qualitative analysis, the relative distance between two sample sets can be used to assess how well the two classes can be separated.
  • CAR Correct Acceptance Rate
  • CRR Correct Rejection Rate
  • Spectral data from period T1 were used to examine whether the NH101 female-parent and hybrid corn seeds are separable in their near-infrared spectra, and whether their relative distance meets the identification requirement.
  • FIG. 2 shows the distribution of the samples in feature space after the original T1 spectra were reduced with PCA and the first 10 dimensions were further reduced with LDA. The scatter caused by measuring the spectra at different times is hardly visible in the spatial distribution, and the sample set as a whole is more compact. The relative distance between MCB and MFP was calculated to be 2.0734, nearly 70 times larger than the relative distance obtained from the first two dimensions of PCA alone. The figure illustrates the importance of spectral preprocessing for correcting spectral distortion and scatter.
  • MCB and MFP measured in period T1, together with MCB, MFP, HCB, and HFP measured in periods T3 and T5, were used as the training set to build the model.
  • This widened the range of sample sources in the modeling set and improved the model's tolerance to changes in sample source.
  • The test set consisted of MCB and MFP measured in period T2 and MCB, MFP, HCB, and HFP measured in period T4. The experiment showed that, after the range of sample sources in the modeling set was widened, the identification model could accommodate differences in origin and year.
  • The two-dimensional spatial distribution of the data is shown in FIG. 3.
  • Taking NH101 corn hybrid and female-parent seeds from different years and origins as its subject, the invention investigated a near-infrared spectroscopic method for identifying the purity of corn hybrid seed.
  • Selecting representative samples improves the model's ability to cope with changes in the time, place, environment, and other conditions of spectral collection, as well as with changes in the time and place of seed production. This makes the model more robust and raises the correct acceptance rate and correct rejection rate for test samples.
  • The resulting model is practically feasible and can serve as a basis for developing practical equipment.

Landscapes

  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

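A method for identifying hybrid seed purity based on near-infrared spectroscopy, comprising the following steps. Step 1: collect near-infrared spectral data of the seed samples. Step 2: preprocess the collected near-infrared spectral data, and select representative training-sample spectral data from the preprocessed data. Step 3: build a seed purity identification model from the selected training-sample spectral data using a feature extraction algorithm and a modeling method. Step 4: use the established identification model to identify the spectra of the seeds to be identified. The method is low-cost, easy to operate, and highly reliable. It allows hybrid purity to be identified without specialist personnel, and it improves on the limited stability and adaptability of identification models built with existing methods.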

Description

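Hybrid seed purity identification method based on near-infrared spectroscopy
Technical Field
The present invention relates to the field of seed purity identification, and in particular to a method for identifying hybrid seed purity based on near-infrared spectroscopy.
Background Art
China is a major grain producer. The crop seed industry is a strategic and fundamental core industry of the nation, and is the foundation for promoting the long-term, stable development of agriculture and safeguarding national food security. Corn is one of China's important food and feed crops, and in 2012 it became the country's largest grain crop by output. Corn production in China relies mainly on hybrids, and seed purity identification is central to assuring seed quality. The main cause of reduced purity in corn hybrids is female-parent seed mixed into the hybrid seed.
Current methods for identifying corn hybrid purity fall into two broad categories: field identification and indoor identification. Field identification consists of sowing the samples in the field. It is the authoritative method for purity identification, but its main limitations are the long time, high cost, and large land area it requires. Indoor identification includes morphological identification, electrophoretic band identification, and molecular marker identification, but these are complex to operate, relatively costly, and time-consuming, and they do not permit real-time analysis.
The near-infrared region spans wavelengths of 780 to 2500 nm. This region carries very rich compositional and structural information from the C-H, N-H, and O-H groups of the sample's molecules, and near-infrared light from different parts of the region can collect sample information from different depths (10⁻¹ to 10² nm). This gives the technique the advantages of fast analysis, non-destructiveness, low cost, and environmentally friendly ("green") analysis. Near-infrared spectroscopy has been studied extensively for testing agricultural products, and studies on corn hybrid purity identification are also common. However, most existing studies require the seeds to be ground before analysis. They therefore cannot test seeds non-destructively and in real time, cannot sort out individual female-parent seeds mixed into hybrid seed, and have not examined whether hybrids and female parents from different seed production sites can be separated.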
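Summary of the Invention
In view of the current state and many shortcomings of existing seed purity identification methods, the present invention proposes a low-cost, easy-to-operate, and highly reliable method for identifying hybrid seed purity based on near-infrared spectroscopy. It allows hybrid purity to be identified without specialist personnel, and it improves on the limited stability and adaptability of identification models built with existing methods.
The technical solution of the present invention is implemented as follows:
A method for identifying hybrid seed purity based on near-infrared spectroscopy, comprising the following steps:
Step 1: Collect near-infrared spectral data of the seed samples.
Step 2: Preprocess the collected near-infrared spectral data, and select representative training-sample spectral data from the preprocessed data.
Step 3: Build a seed purity identification model from the selected training-sample spectral data using a feature extraction algorithm and a modeling method.
Step 4: Use the established identification model to identify the spectra of the seeds to be identified.
In the above technical solution, the near-infrared spectral data collected in step 1 come from a near-infrared spectrometer. If several near-infrared spectrometers of the same model are available, they are kept in the same external environment while the spectral data are collected, and the same seed sample is measured on the different spectrometers at the same measurement time point, yielding a corresponding set of spectra.
In the above technical solution, the preprocessing of the near-infrared spectral data in step 2 uses methods including data normalization, derivative processing, smoothing, centering, and standardization.
In the above technical solution, the representative training-sample spectral data in step 2 are sample data that can accommodate the effects on purity identification of uncertain factors, namely variation in the time and place at which the seed spectra are measured and variation in the samples' place of origin and seed production time.
Uncertain factors such as variation in the time and place of spectral measurement (including differences between instruments) and variation in the samples' origin and seed production time affect purity identification. The representative training-sample spectral data are therefore sample data that can accommodate this uncertain information, so as to reduce its effect on the accuracy of spectral identification.
In the above technical solution, building the seed purity identification model in step 3 includes reducing the dimensionality of the selected training-sample spectral data, using principal component analysis (PCA), partial least squares regression (PLS), or linear discriminant analysis (LDA).
In the above technical solution, the modeling method used in step 3 is chosen according to the model's scope of application and the analysis objective, and includes, but is not limited to, the bionic pattern recognition (BPR) method based on high-dimensional geometric analysis, support vector machines (SVM), or the nearest Euclidean distance method.
In the above technical solution, identifying the spectra of the seeds to be identified with the established purity identification model in step 4 includes: first acquiring the spectral data of the seeds to be identified, then applying preprocessing and feature extraction to those data, and finally performing identification with the established model and reporting the result.
In the above technical solution, the preprocessing and feature extraction applied to the spectral data of the seeds to be identified are the same as those used for the purity identification model.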
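The requirement that query spectra pass through exactly the same preprocessing and feature extraction as the training spectra can be sketched as a small model object that stores the fitted transform and reuses it at prediction time. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `NearestCentroidModel`, the simplified preprocessing (per-spectrum centering and scaling), the PCA step, and the synthetic "hybrid" and "parent" spectra are all hypothetical, and a nearest-centroid rule stands in for one reading of the "nearest Euclidean distance method".

```python
import numpy as np

class NearestCentroidModel:
    """Hypothetical sketch: one fitted transform shared by training and query data."""

    @staticmethod
    def _preprocess(X):
        # Simplified preprocessing: center each spectrum and scale it to unit length.
        X = np.asarray(X, dtype=float)
        X = X - X.mean(axis=1, keepdims=True)
        return X / np.linalg.norm(X, axis=1, keepdims=True)

    def fit(self, X, y, n_components=2):
        X, y = self._preprocess(X), np.asarray(y)
        self.mean_ = X.mean(axis=0)
        # PCA axes from the SVD of the centered training matrix.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.axes_ = vt[:n_components].T
        Z = (X - self.mean_) @ self.axes_
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([Z[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Reuse the stored mean and axes; nothing is re-fitted on the query data.
        Z = (self._preprocess(X) - self.mean_) @ self.axes_
        d2 = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.labels_[np.argmin(d2, axis=1)]

# Synthetic stand-in spectra with random gain and offset per measurement.
rng = np.random.default_rng(1)
grid = np.linspace(0, np.pi, 40)
shapes = {"hybrid": np.sin(grid), "parent": np.sin(2 * grid)}

def simulate(label, n):
    gain = rng.uniform(1.0, 2.0, size=(n, 1))
    offset = rng.uniform(-0.5, 0.5, size=(n, 1))
    noise = rng.normal(scale=0.05, size=(n, grid.size))
    return gain * shapes[label] + offset + noise

train_X = np.vstack([simulate("hybrid", 20), simulate("parent", 20)])
train_y = ["hybrid"] * 20 + ["parent"] * 20
model = NearestCentroidModel().fit(train_X, train_y)

query = np.vstack([simulate("hybrid", 5), simulate("parent", 5)])
predicted = model.predict(query)
print(predicted)
```

Keeping the fitted mean and projection axes inside the model makes it hard to apply a different transform to the seeds being identified by accident.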
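As can be seen from the above technical solution, the present invention has the following beneficial effects:
(1) Because the method selects representative training-sample spectral data that accommodate most of the uncertain information, it improves the model's ability to cope with changes in the time, place, environment, and other conditions of spectral collection, as well as with changes in the time and place of seed production, and so makes the model more robust.
(2) The method can identify hybrid purity quickly, at low cost and in little time. It does not require the tester to have specialist knowledge and is convenient to apply.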
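Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the hybrid seed purity identification method based on near-infrared spectroscopy provided by the present invention.
FIG. 2 is a sample space distribution diagram after dimensionality reduction according to an embodiment of the present invention.
FIG. 3 is a two-dimensional spatial distribution diagram of the data after widening the range of sample sources in the modeling set according to an embodiment of the present invention.
FIG. 4 lists the time periods, number of measurement days, number of spectra, and abbreviations for the measured spectra in an embodiment of the present invention.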
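Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that the preprocessing method, feature extraction method, and modeling method in the embodiment are not fixed. The experimenter may choose the method for each step according to the situation and prior experimental experience, and the algorithms used in the embodiment are not intended to limit the invention.
As shown in FIG. 1, the method for identifying hybrid seed purity based on near-infrared spectroscopy provided by the present invention comprises the following steps:
Step 1: Collect near-infrared spectral data of the seed samples.
Step 2: Preprocess the collected near-infrared spectral data, and select representative training-sample spectral data from the preprocessed data.
Step 3: Build a seed purity identification model from the selected training-sample spectral data using a feature extraction algorithm and a modeling method.
Step 4: Use the established identification model to identify the spectra of the seeds to be identified.
In the above technical solution, the near-infrared spectral data collected in step 1 come from a near-infrared spectrometer. If several near-infrared spectrometers of the same model are available, they are kept in the same external environment while the spectral data are collected, and the same seed sample is measured on the different spectrometers at the same measurement time point, yielding a corresponding set of spectra.
In the above technical solution, the preprocessing of the near-infrared spectral data in step 2 uses methods including data normalization, derivative processing, smoothing, centering, and standardization.
In the above technical solution, the representative training-sample spectral data in step 2 are sample data that can accommodate the effects on purity identification of uncertain factors, namely variation in the time and place at which the seed spectra are measured and variation in the samples' place of origin and seed production time.
Uncertain factors such as variation in the time and place of spectral measurement (including differences between instruments) and variation in the samples' origin and seed production time affect purity identification. The representative training-sample spectral data are therefore sample data that can accommodate this uncertain information, so as to reduce its effect on the accuracy of spectral identification.
In the above technical solution, building the seed purity identification model in step 3 includes reducing the dimensionality of the selected training-sample spectral data, using principal component analysis (PCA), partial least squares regression (PLS), or linear discriminant analysis (LDA).
In the above technical solution, the modeling method used in step 3 is chosen according to the model's scope of application and the analysis objective, and includes, but is not limited to, the bionic pattern recognition (BPR) method based on high-dimensional geometric analysis, support vector machines (SVM), or the nearest Euclidean distance method.
In the above technical solution, identifying the spectra of the seeds to be identified with the established purity identification model in step 4 includes: first acquiring the spectral data of the seeds to be identified, then applying preprocessing and feature extraction to those data, and finally performing identification with the established model and reporting the result.
In the above technical solution, the preprocessing and feature extraction applied to the spectral data of the seeds to be identified are the same as those used for the purity identification model.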
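The instrument used in the following embodiment is an MPA Fourier-transform diffuse-reflectance near-infrared spectrometer from Bruker Optics (Germany), with a spectral range (wavenumber) of 12000 to 4000 cm⁻¹, 32 scans, and a resolution of 8 cm⁻¹. Single-kernel samples were measured with the small-sample accessory. OPUS 6.5 was used for both spectral acquisition and data format conversion.
(1) Collecting near-infrared spectral data of the seed samples.
Spectra were measured on single kernels. For each measurement the seed was placed in the small sample cup with the embryo side facing down, and each variety was randomly sampled and measured multiple times.
The corn seed samples used to study the hybrid and its female parent were NH101 hybrid (CB) and female parent (FP) seed produced in Hainan in 2009, abbreviated HCB and HFP respectively, together with a second lot of NH101 hybrid and female parent (not produced in 2009 and not produced in Hainan), abbreviated MCB and MFP respectively.
The experiment was carried out in five time periods spanning up to 10 months. Each period comprised 4 (or 7) days. On each day, experimental samples were drawn at random from the bulk seed bags and their spectra measured once. The time periods, number of measurement days, number of spectra, and abbreviations are listed in FIG. 4.
(2) Preprocessing the collected near-infrared spectral data, and selecting representative training-sample spectral data from the preprocessed data.
Because the spectra were collected at different times, the ambient temperature varied considerably, while the reference was measured only once. As a result, some of the spectra are shifted, and appropriate preprocessing is needed to remove this effect.
Preprocessing used moving-window averaging (MWA, window of 9), first-order difference derivative (FD, difference width of 9), and vector normalization (VN). These methods filter noise from the spectral data, correct the baseline and improve resolution, remove spectral offsets, and to some extent remove random errors introduced during measurement.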
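The preprocessing chain described above (MWA, then FD, then VN) can be sketched in a few lines of NumPy. This is a minimal illustration, not the OPUS implementation. It assumes that a "difference width of 9" means subtracting points nine positions apart, and that vector normalization means centering each spectrum and scaling it to unit Euclidean length. The function names and the synthetic spectra are hypothetical.

```python
import numpy as np

def moving_window_average(spectra, window=9):
    """Smooth each spectrum (one per row) with a moving average."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only fully covered windows: n_points - window + 1 columns.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, spectra
    )

def first_difference(spectra, width=9):
    """First-order difference derivative (assumed form): (x[k + width] - x[k]) / width."""
    return (spectra[:, width:] - spectra[:, :-width]) / width

def vector_normalize(spectra):
    """Center each spectrum and scale it to unit Euclidean length (assumed definition)."""
    centered = spectra - spectra.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.where(norms == 0, 1.0, norms)

def preprocess(spectra, window=9, width=9):
    smoothed = moving_window_average(spectra, window)
    return vector_normalize(first_difference(smoothed, width))

# Synthetic stand-in data: 5 noisy spectra of 200 points on a sloping baseline.
rng = np.random.default_rng(0)
raw = rng.normal(scale=0.1, size=(5, 200)) + np.linspace(0.0, 1.0, 200)
processed = preprocess(raw)
print(processed.shape)
```

Because the derivative step subtracts neighboring points, a constant offset added to a spectrum drops out, which matches the stated goal of removing spectral shifts.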
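(3) Building a seed purity identification model from the selected training-sample spectral data using a feature extraction algorithm and a modeling method.
For dimensionality reduction, principal component analysis (PCA) was applied first, reducing the spectral data to 10 dimensions (the cumulative contribution of the first 10 principal components exceeds 98%). Linear discriminant analysis (LDA) was then applied as a second reduction, down to 2 dimensions.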
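The two-stage reduction described above (PCA up to a cumulative-variance threshold, then LDA down to 2 dimensions) can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the function names, the 98% threshold as a selection rule, and the synthetic data are hypothetical. Note that Fisher LDA yields at most one fewer dimension than the number of classes, so the example uses three synthetic classes to obtain two LDA dimensions.

```python
import numpy as np

def pca_fit(X, var_threshold=0.98):
    """Return the mean and the leading principal axes reaching var_threshold."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(ratio, var_threshold)) + 1
    return mean, vt[:k].T

def lda_fit(Z, y, n_components=2):
    """Fisher LDA: axes maximizing between-class over within-class scatter."""
    overall = Z.mean(axis=0)
    d = Z.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Zc = Z[y == label]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        diff = (mc - overall)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1][:n_components]
    return evecs.real[:, order]

# Synthetic stand-in data: 3 classes, 30 samples each, 50 variables.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
means = np.zeros((3, 50))
means[[0, 1, 2], [0, 1, 2]] = 10.0
X = means[y] + rng.normal(scale=0.5, size=(90, 50))

mean, pca_axes = pca_fit(X)
Z = (X - mean) @ pca_axes      # first reduction (PCA)
lda_axes = lda_fit(Z, y)
projected = Z @ lda_axes       # second reduction (LDA) to 2 dimensions
print(Z.shape, projected.shape)
```

Running PCA first keeps the within-class scatter matrix well conditioned before LDA, which is a common reason for combining the two when the number of wavelengths exceeds the number of samples.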
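The modeling method was bionic pattern recognition (BPR), used to model the NH101 female parent and hybrid. Two-weight neurons were used as the basic covering units, and the basic covering units were connected by the minimum spanning tree method to build the BPR model of the corn female parent and hybrid.
To express the size of the difference between seed sample sets, their relative distance R_ij in feature space is defined by the following formula:
[Formula image PCTCN2015090229-appb-000001, not reproduced in this text]
where R_ij is the relative distance between the samples of class i and class j, D_ij is the squared Euclidean distance between the centroids of class i and class j, W_i is the mean within-class sum of squared deviations for class i, and W_j is the mean within-class sum of squared deviations for class j. The larger the value of R_ij, the greater the difference between the class i and class j sample sets. In qualitative analysis, the relative distance between two sample sets can be used to assess how well the two classes can be separated.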
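The quantities that enter the relative distance can be computed directly from their definitions. The formula itself is an image that is not available in this text, so the sketch below computes D_ij, W_i, and W_j as defined and then combines them as the ratio D_ij / (W_i + W_j). That ratio is an assumption made purely for illustration, and the published formula may use a different combination (for example a square root). The function names and the toy data are hypothetical.

```python
import numpy as np

def class_statistics(A, B):
    """Return (D, W_a, W_b) for two sample sets with one sample per row."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    D = float(np.sum((ca - cb) ** 2))                     # squared distance between centroids
    Wa = float(np.mean(np.sum((A - ca) ** 2, axis=1)))    # mean squared deviation in A
    Wb = float(np.mean(np.sum((B - cb) ** 2, axis=1)))    # mean squared deviation in B
    return D, Wa, Wb

def relative_distance(A, B):
    D, Wa, Wb = class_statistics(A, B)
    # ASSUMED combination; the patent's formula image is not reproduced here.
    return D / (Wa + Wb)

# Toy data: two tight clusters far apart along the first axis.
A = np.array([[0.0, 0.0], [2.0, 0.0]])
B = np.array([[10.0, 0.0], [12.0, 0.0]])
D, Wa, Wb = class_statistics(A, B)
R = relative_distance(A, B)
print(D, Wa, Wb, R)
```

Whatever the exact combination, a measure of this kind grows when the class centroids move apart and shrinks when the classes become more scattered, which is the behavior the text describes.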
Test results were evaluated using the correct acceptance rate (CAR), the correct rejection rate (CRR), and the relative distance.
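The two rates named above can be computed from per-sample decisions. The text does not spell out their definitions, so this sketch assumes the usual ones: CAR is the fraction of samples that truly belong to the target class and are accepted by its model, and CRR is the fraction of samples that do not belong to the target class and are rejected. The function name and the example decisions are hypothetical.

```python
import numpy as np

def car_crr(is_target, accepted):
    """Return (CAR, CRR) under the assumed definitions described above."""
    is_target = np.asarray(is_target, dtype=bool)
    accepted = np.asarray(accepted, dtype=bool)
    car = accepted[is_target].mean()        # accepted among true target samples
    crr = (~accepted[~is_target]).mean()    # rejected among non-target samples
    return float(car), float(crr)

# Example: 4 target seeds (3 accepted) and 5 non-target seeds (4 rejected).
is_target = [1, 1, 1, 1, 0, 0, 0, 0, 0]
accepted = [1, 1, 1, 0, 0, 0, 0, 0, 1]
car, crr = car_crr(is_target, accepted)
print(car, crr)
```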
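(4) Identifying the spectra of the seeds to be identified with the established identification model.
Spectral data from period T1 were used to examine whether NH101 female-parent and hybrid corn seeds are separable in their near-infrared spectra, and whether their relative distance meets the identification requirement.
FIG. 2 shows the distribution of the samples in feature space after the original T1 spectra were reduced with PCA and the first 10 dimensions were further reduced with LDA. As the figure shows, the scatter caused by measuring the spectra at different times is hardly visible in the spatial distribution, and the sample set as a whole is more compact. The relative distance between MCB and MFP was calculated to be 2.0734, nearly 70 times larger than the relative distance obtained from the first two dimensions of PCA alone. The figure illustrates the importance of spectral preprocessing for correcting spectral distortion and scatter.
To allow the identification model to accommodate differences between NH101 female-parent and hybrid seed from different origins and years, the model was built with a training set consisting of MCB and MFP measured in period T1 and MCB, MFP, HCB, and HFP measured in periods T3 and T5. This widened the range of sample sources in the modeling set and improved the model's tolerance to changes in sample source. The test set consisted of MCB and MFP measured in period T2 and MCB, MFP, HCB, and HFP measured in period T4. The experiment showed that, after the range of sample sources in the modeling set was widened, the identification model could accommodate differences in origin and year. The two-dimensional spatial distribution of the data is shown in FIG. 3.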
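The split described above assigns whole measurement periods to either training or testing, so that the test spectra come from days the model has never seen. A minimal sketch of that bookkeeping follows. The period and sample abbreviations come from the text, while the record layout (`period`, `label`, `spectrum_id`) and the helper `select` are hypothetical.

```python
ALL_FOUR = {"MCB", "MFP", "HCB", "HFP"}

# Which sample sets are taken from which period, as stated in the text.
TRAIN_PLAN = {"T1": {"MCB", "MFP"}, "T3": ALL_FOUR, "T5": ALL_FOUR}
TEST_PLAN = {"T2": {"MCB", "MFP"}, "T4": ALL_FOUR}

def select(records, plan):
    """Keep the records whose (period, label) pair appears in the plan."""
    return [r for r in records if r["label"] in plan.get(r["period"], set())]

# Hypothetical catalogue with one placeholder record per period and sample set.
records = [
    {"period": p, "label": label, "spectrum_id": f"{p}-{label}"}
    for p in ("T1", "T2", "T3", "T4", "T5")
    for label in sorted(ALL_FOUR)
]

train = select(records, TRAIN_PLAN)
test = select(records, TEST_PLAN)
print(len(train), len(test))
```

Splitting by period rather than by random spectrum is what lets the test result say something about robustness to measurement time.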
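Taking NH101 corn hybrid and female-parent seeds from different years and origins as its subject, the present invention investigated a near-infrared spectroscopic method for identifying the purity of corn hybrid seed. Selecting representative samples improves the model's ability to cope with changes in the time, place, environment, and other conditions of spectral collection, as well as with changes in the time and place of seed production. This makes the model more robust and raises the correct acceptance rate and correct rejection rate for test samples. The resulting model is practically feasible and can serve as a basis for developing practical equipment.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that they are merely specific embodiments of the invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.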

Claims (8)

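  1. A method for identifying hybrid seed purity based on near-infrared spectroscopy, characterized by comprising the following steps:
    Step 1: collecting near-infrared spectral data of seed samples;
    Step 2: preprocessing the collected near-infrared spectral data, and selecting representative training-sample spectral data from the preprocessed data;
    Step 3: building a seed purity identification model from the selected training-sample spectral data using a feature extraction algorithm and a modeling method;
    Step 4: using the established identification model to identify the spectra of the seeds to be identified.
  2. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    the near-infrared spectral data of the seed samples come from a near-infrared spectrometer, and if several near-infrared spectrometers of the same model are available, they are kept in the same external environment while the spectral data are collected;
    the same seed sample is measured on the different near-infrared spectrometers at the same measurement time point, yielding a corresponding set of spectra.
  3. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    the preprocessing of the near-infrared spectral data in step 2 uses methods including data normalization, derivative processing, smoothing, centering, and standardization.
  4. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    the representative training-sample spectral data in step 2 are sample data that can accommodate the effects on purity identification of uncertain factors, namely variation in the time and place at which the seed spectra are measured and variation in the samples' place of origin and seed production time.
  5. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    building the seed purity identification model in step 3 includes reducing the dimensionality of the selected training-sample spectral data, using principal component analysis (PCA), partial least squares regression (PLS), or linear discriminant analysis (LDA).
  6. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    the modeling method used to build the seed purity identification model in step 3 includes, but is not limited to: the bionic pattern recognition (BPR) method based on high-dimensional geometric analysis, support vector machines (SVM), or the nearest Euclidean distance method.
  7. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    using the established identification model to identify the spectra of the seeds to be identified in step 4 includes:
    first acquiring the spectral data of the seeds to be identified, then applying preprocessing and feature extraction to those data, and finally performing identification with the established model and reporting the result.
  8. The method for identifying hybrid seed purity based on near-infrared spectroscopy according to claim 1, characterized in that:
    the preprocessing and feature extraction applied to the collected near-infrared spectral data of the seeds to be identified are the same as those used for the purity identification model.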
PCT/CN2015/090229 2015-03-25 2015-09-22 Hybrid seed purity identification method based on near-infrared spectroscopy WO2016150130A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510133847.1A CN105866056A (zh) 2015-03-25 2015-03-25 Hybrid seed purity identification method based on near-infrared spectroscopy
CN201510133847.1 2015-03-25

Publications (1)

Publication Number Publication Date
WO2016150130A1 (zh) 2016-09-29

Family

ID=56623611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/090229 WO2016150130A1 (zh) 2015-03-25 2015-09-22 Hybrid seed purity identification method based on near-infrared spectroscopy

Country Status (2)

Country Link
CN (1) CN105866056A (zh)
WO (1) WO2016150130A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004079347A1 (en) * 2003-03-07 2004-09-16 Pfizer Products Inc. Method of analysis of nir data
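CN1831515A (zh) * 2006-04-03 2006-09-13 浙江大学 Method for non-destructive identification of crop seed varieties using visible and near-infrared spectroscopy
EP2140749A1 (en) * 2008-07-04 2010-01-06 Aarhus Universitet Det Jordbrugsvidenskabelige Fakultet Classification of seeds
CN101738373A (zh) * 2008-11-24 2010-06-16 中国农业大学 Method for identifying crop seed varieties
CN103344602A (zh) * 2013-07-04 2013-10-09 中国科学院合肥物质科学研究院 Non-destructive method for detecting the authenticity of rice germplasm based on near-infrared spectroscopy
CN104062262A (zh) * 2014-07-09 2014-09-24 中国科学院半导体研究所 Method for identifying the authenticity of crop seed varieties based on near-infrared spectroscopy
CN104374739A (zh) * 2014-10-30 2015-02-25 中国科学院半导体研究所 Method for identifying the authenticity of seed varieties based on near-infrared qualitative analysis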

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
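KR100927144B1 (ko) * 2002-10-19 2009-11-18 삼성전자주식회사 Transmission apparatus of a digital broadcasting system having an inner interleaver with improved randomization characteristics, and transmission method thereof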

Also Published As

Publication number Publication date
CN105866056A (zh) 2016-08-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15886052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/05/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 15886052

Country of ref document: EP

Kind code of ref document: A1