CN108445035B

CN108445035B - A method for identification of maize haploid grains based on NMR CPMG decay curve

Info

Publication number: CN108445035B
Application number: CN201810377928.XA
Authority: CN
Inventors: 陈绍江; 李金龙; 李伟; 焦炎炎; 张俊稳; 陈琛; 陈明; 刘晨旭; 田小龙; 钟裕; 祁晓龙; 王鼎昌
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2021-02-02
Anticipated expiration: 2038-04-25
Also published as: CN108445035A

Abstract

The invention discloses a method for identifying maize haploid kernels based on a nuclear magnetic resonance (NMR) CPMG decay curve. The method comprises the following steps: (1) acquiring the NMR signal of each maize kernel in a training set and obtaining a mass-normalized CPMG decay curve for each kernel; (2) processing the 0-600 ms segment of the curves, performing principal component analysis, and then constructing a haploid identification model; (3) acquiring the NMR signal of the maize kernel to be tested and obtaining its mass-normalized CPMG decay curve; (4) processing the 0-600 ms segment, performing principal component analysis, substituting the results into the haploid identification model, and outputting the model's prediction. The method can be used for automated identification and plays an important role in advancing engineering-scale maize haploid breeding. It is simple, feasible, fast and efficient, broadly applicable, and of great value for application and extension.

Description

Method for identifying maize haploid kernels based on a nuclear magnetic resonance CPMG decay curve

Technical Field

The invention relates to the field of maize haploid kernel identification, and in particular to a method for identifying maize haploid kernels based on a nuclear magnetic resonance CPMG decay curve.

Background

Maize is native to Central and South America and was introduced into China more than 400 years ago. Owing to its high yield, wide range of uses, strong adaptability and rapidly expanding cultivation area, it has become China's leading crop. Maize is also the most highly commercialized crop, and this commercial model requires seed companies to keep pace with variety turnover so that market-ready varieties can be bred more quickly. Doubled haploid (DH) breeding technology reduces the time needed for line selection, shortens the breeding cycle and improves breeding efficiency.

Maize haploid technology is a breeding technology that readily lends itself to engineering-scale operation. It comprises four stages: preparation of source materials, haploid production, haploid doubling, and management and application of doubled haploid (DH) lines. Haploid production itself comprises two key steps: haploid induction and haploid identification. When an inducer line is used as the male parent in a cross with the source material, haploids arise among the progeny at a certain frequency. At present the induction rate of haploid inducer lines is only 2%-15%, so haploid kernels make up only a small fraction of the induced kernels and most are heterozygous diploids. Rapidly and accurately identifying haploids among large numbers of induced kernels is therefore critical.

Many identification methods currently exist. By identification period they can be grouped into the kernel development stage, the kernel stage and the post-kernel stage. Identification during kernel development relies mainly on tissue culture and is carried out after pollination based on color development or the presence of fluorescence. Kernel-stage methods are usually based on kernel color. Post-kernel-stage identification is performed after the induced kernels have been grown into plants and is usually based on plant morphology: because haploids and heterozygous diploids differ in ploidy, their plants differ in form, haploid plants being short and small, with long narrow leaves, and mostly sterile.

Although the methods differ in accuracy, efficiency and cost are the primary concerns in large-scale engineered haploid breeding. Identification during kernel development requires tissue culture, and hence an industrial-scale tissue culture laboratory, and it is time-sensitive (it can only be performed within a certain number of days after pollination). Post-kernel-stage identification requires planting the kernels in seedling trays or in the field, which occupies large amounts of land, and the subsequent transplanting (seedling trays) and roguing (field) are also laborious. By contrast, kernel-stage identification is more flexible in timing than development-stage identification and saves labor and materials compared with post-kernel-stage identification. The kernel stage is therefore a good time to identify haploids.

The R1-nj color marker system is currently the most widely used method for identifying maize haploids at the mature kernel stage. It was proposed by Nanda and Chase in 1966. Because a haploid embryo contains only maternal chromosomes, its embryo coloration differs from that of a diploid, so haploids can be identified by color alone at the kernel stage. The advantages of the R1-nj system are that it is technically undemanding, simple and fast. Its disadvantages are that some germplasm carries dominant inhibitor genes such as C1-I and the intensity of R1-nj expression in kernels varies greatly; moreover, because sorting is manual, visual fatigue sets in during long working periods and identification accuracy varies between operators.

Disclosure of Invention

The object of the invention is to provide a method for identifying maize haploid kernels based on a nuclear magnetic resonance (NMR) CPMG decay curve.

The first method for identifying maize haploids comprises the following steps:

(1) acquiring the NMR signal of each maize kernel in the training set to obtain its CPMG decay curve, and then normalizing the data by dividing the amplitude at each time point by the kernel weight, giving a mass-normalized CPMG decay curve for each kernel; the training set consists of a plurality of maize kernels, some of which are true haploids and the rest true diploids;

(2) performing data processing on the 0-600 ms segment of the mass-normalized CPMG decay curves obtained in step (1), then performing principal component analysis, and then constructing a haploid identification model;

(3) acquiring the NMR signal of the maize kernel to be tested to obtain its CPMG decay curve, and normalizing the data by dividing the amplitude at each time point by the kernel weight, giving a mass-normalized CPMG decay curve;

(4) performing data processing on the 0-600 ms segment of the mass-normalized CPMG decay curve obtained in step (3), then performing principal component analysis, substituting the result into the haploid identification model constructed in step (2), and outputting the model's prediction of whether the kernel to be tested is a haploid or a diploid.
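Steps (1) and (3) apply the same preprocessing to every kernel: divide each amplitude by the kernel weight, then keep the 0-600 ms segment. The following is a minimal NumPy sketch, assuming each curve is available as an array of echo amplitudes with matching echo times in ms; the function names and the synthetic single-exponential decay are hypothetical (the source performed the normalization in Excel), and the echo spacing of 0.6 ms is taken from the TE setting given below.

```python
import numpy as np

def mass_normalize(amplitudes, kernel_weight_g):
    """Divide every echo amplitude of one CPMG decay curve by the kernel weight."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    if kernel_weight_g <= 0:
        raise ValueError("kernel weight must be positive")
    return amplitudes / kernel_weight_g

def crop_window(times_ms, values, t_max_ms=600.0, eps=1e-9):
    """Keep only the points whose echo time lies in 0..t_max_ms (eps absorbs float error)."""
    times_ms = np.asarray(times_ms, dtype=float)
    values = np.asarray(values, dtype=float)
    mask = (times_ms >= 0.0) & (times_ms <= t_max_ms + eps)
    return times_ms[mask], values[mask]

# Hypothetical example: echoes every TE = 0.6 ms and a synthetic decay curve.
te_ms = 0.6
times = te_ms * np.arange(1, 2001)        # 2000 echoes, 0.6 .. 1200 ms
raw = 500.0 * np.exp(-times / 80.0)       # synthetic, not measured data
curve = mass_normalize(raw, kernel_weight_g=0.35)
t_win, y_win = crop_window(times, curve)
print(len(t_win))  # prints 1000
```

The same two functions would be applied to training kernels in step (1) and to the kernel under test in step (3), so both end up on an identical time grid.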

In steps (2) and (4), the data processing is smoothing; specifically, 10-point smoothing may be used.
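The text does not define "10-point smoothing" further. A common reading is an unweighted moving average over 10 consecutive points; the sketch below assumes that reading and uses `np.convolve` in `"valid"` mode (no edge padding), so a 1000-point window shrinks to 991 points. Centered windows with padding, or other smoothers, would give slightly different values.

```python
import numpy as np

def moving_average(curve, window=10):
    """Unweighted moving average; returns len(curve) - window + 1 points."""
    curve = np.asarray(curve, dtype=float)
    if window < 1 or window > curve.size:
        raise ValueError("window must be between 1 and len(curve)")
    kernel = np.ones(window) / window
    return np.convolve(curve, kernel, mode="valid")

curve = np.linspace(100.0, 1.0, 1000)   # hypothetical 1000-point 0-600 ms segment
smoothed = moving_average(curve, window=10)
print(smoothed.shape)  # (991,)
```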

In steps (2) and (4), the number of principal components used in the principal component analysis is 100.
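The source performs the PCA in R and does not say how the kernels in step (4) are projected. The sketch below assumes the common practice of learning the mean and loadings on the training curves and projecting new curves onto those same axes, and uses NumPy's SVD in place of R (with centering and no scaling, as in R's `prcomp` defaults). Retaining 100 components requires more than 100 training curves; the 295/74 split sizes are hypothetical.

```python
import numpy as np

def fit_pca(X_train, n_components=100):
    """Return the training mean and the first n_components principal axes (as rows)."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    if n_components > vt.shape[0]:
        raise ValueError("n_components exceeds the available number of components")
    return mean, vt[:n_components]

def transform_pca(X, mean, loadings):
    """Project curves onto the stored axes; the scores become the model inputs."""
    return (np.asarray(X, dtype=float) - mean) @ loadings.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(295, 991))   # hypothetical smoothed training curves
X_test = rng.normal(size=(74, 991))     # hypothetical curves to be tested
mean, loadings = fit_pca(X_train, n_components=100)
scores_train = transform_pca(X_train, mean, loadings)
scores_test = transform_pca(X_test, mean, loadings)
print(scores_train.shape, scores_test.shape)  # (295, 100) (74, 100)
```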

In step (2), the algorithm used to construct the haploid identification model is a support vector machine (SVM) algorithm.

The SVM parameters are: sigma = 0.004976874 and penalty factor C = 16.
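The text gives sigma and C without naming the kernel or the software. The parameter names match the Gaussian RBF kernel of R's kernlab package, where k(a, b) = exp(-sigma * ||a - b||^2); the sketch below assumes that convention to show what sigma controls. C is the soft-margin penalty used by the SVM solver and does not enter the kernel. Under this assumption, scikit-learn's `gamma` would play the role of sigma.

```python
import numpy as np

SIGMA = 0.004976874   # kernel width parameter reported in the text
C = 16                # penalty factor reported in the text (used by the solver, not the kernel)

def rbf_kernel(A, B, sigma=SIGMA):
    """Gaussian RBF kernel, assumed kernlab convention: exp(-sigma * ||a - b||^2)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dists = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sigma * np.maximum(sq_dists, 0.0))   # clip tiny negative round-off

# Two hypothetical score vectors at squared distance 25.
K = rbf_kernel([[0.0, 0.0]], [[3.0, 4.0]])
print(K[0, 0])   # exp(-0.004976874 * 25), approximately 0.883
```

With 100 principal-component scores per kernel, a small sigma like this keeps the kernel from decaying too quickly over the large squared distances typical of high-dimensional inputs.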

In steps (2) and (4), the principal component analysis is performed in the R language.

In steps (1) and (3), the NMR signals are acquired with an NMR instrument and its accompanying NMR analysis software. The NMR instrument may specifically be a MesoMR23-020H-I NMR analyzer manufactured by Shanghai Niumag Corporation. The acquisition may specifically use the CPMG pulse sequence of the analysis software, with the following parameters: TW (waiting time) = 800 ms, TE (echo time) = 0.600 ms, and NS (number of scans) = 16.

In steps (1) and (3), the mass normalization may specifically be performed with Microsoft Excel 2016 MSO (32-bit).

The second method for identifying maize haploids comprises steps (1) and (3) of the first method together with the following steps:

(2) performing data processing on the mass-normalized CPMG decay curves obtained in step (1), then performing principal component analysis, and then constructing a haploid identification model;

(4) performing data processing on the mass-normalized CPMG decay curve obtained in step (3), then performing principal component analysis, substituting the result into the haploid identification model constructed in step (2), and outputting the model's prediction of whether the kernel to be tested is a haploid or a diploid.

The third method for identifying maize haploids comprises steps (1) and (3) of the first method together with the following steps:

(2) performing principal component analysis on the 0-600 ms segment of the mass-normalized CPMG decay curves obtained in step (1), and then constructing a haploid identification model;

(4) performing principal component analysis on the 0-600 ms segment of the mass-normalized CPMG decay curve obtained in step (3), substituting the result into the haploid identification model constructed in step (2), and outputting the model's prediction of whether the kernel to be tested is a haploid or a diploid.

The fourth method for identifying maize haploids comprises steps (1) and (3) of the first method together with the following steps:

(2) performing principal component analysis on the mass-normalized CPMG decay curves obtained in step (1), and then constructing a haploid identification model;

(4) performing principal component analysis on the mass-normalized CPMG decay curve obtained in step (3), substituting the result into the haploid identification model constructed in step (2), and outputting the model's prediction of whether the kernel to be tested is a haploid or a diploid.

The invention also claims the use of any of the above methods for identifying maize haploids.

The invention also claims the use, for identifying maize haploids, of an NMR instrument together with a carrier on which any of the above methods is recorded. The NMR instrument may specifically be a MesoMR23-020H-I NMR analyzer manufactured by Shanghai Niumag Corporation.

The invention also provides a system for identifying maize haploids, comprising an NMR instrument and a carrier on which any of the above methods is recorded. The NMR instrument may specifically be a MesoMR23-020H-I NMR analyzer manufactured by Shanghai Niumag Corporation.

In all of the above, the diploids are heterozygous diploids.

The maize kernels are mature kernels.

The true haploids are confirmed by field identification.

The true diploids are confirmed by field identification.

In any of the above methods, the training-set kernels and the kernels to be tested belong to the same cross population.

In any of the above methods, the training-set kernels are sampled from the cross population to which the kernels to be tested belong.

The cross population may specifically be the hybrid progeny (kernels) obtained by crossing a haploid inducer line with hybrid maize, with the inducer line serving as the male parent. The haploid inducer line is a non-high-oil inducer line.

The cross population may also specifically be obtained by crossing each of n1 haploid inducer lines with each of m1 hybrid maize varieties and pooling the resulting hybrid progeny (kernels). In each cross, the haploid inducer line serves as the male parent and the hybrid maize as the female parent. The haploid inducer lines are non-high-oil inducer lines.

The cross population may specifically be formed as follows: maize Zhengdan 958 (female parent) is crossed with the maize haploid inducer line CAU3 (male parent), and the hybrid progeny (kernels) form cross population A1; Zhengdan 958 (female) is crossed with CAU4 (male) to form cross population A2; Zhengdan 958 (female) is crossed with CAU5 (male) to form cross population A3; maize BM (female) is crossed with CAU5 (male) to form cross population A4; maize Jingke 968 (female) is crossed with CAU5 (male) to form cross population A5; and the five cross populations are pooled into a single cross population.

The method provided by the invention can be used for automated identification and is important for advancing engineering-scale maize haploid breeding. It is simple, feasible, rapid and efficient, broadly applicable, and of considerable value for application and extension.

Drawings

Fig. 1 shows the mass-normalized CPMG decay curve of each kernel (full relaxation time).

Fig. 2 shows the mass-normalized CPMG decay curve of each kernel (0-600 ms).

Fig. 3 compares the mean mass-normalized CPMG decay curve of all true haploids with the mean mass-normalized CPMG decay curve of all true diploids.

Detailed Description

The following examples are provided for a better understanding of the invention and do not limit it. Experimental procedures in the following examples are conventional unless otherwise stated. Test materials used in the following examples were purchased from conventional biochemical reagent suppliers unless otherwise stated. All quantitative tests in the following examples were performed in triplicate and the results averaged.

Zhengdan 958, Jingke 968 and BM are hybrid maize. The maize inducer lines CAU3, CAU4 and CAU5 are maize haploid inducer lines (non-high-oil).

Maize inducer lines CAU3 (also called "Nongda high inducing No. 3"), CAU4 (also called "Nongda high inducing No. 4") and CAU5 (also called "Nongda high inducing No. 5") are conventional inducer lines bred by the National Maize Improvement Center of China Agricultural University.

Zhengdan 958: a product of Beijing agriculture species Limited; implementation standard: GB 4404.1-2008.

Jingke 968: a product of Beijing Tungyu species Co., Ltd.; national variety approval No. Guoshenyu 2011007.

Maize BM: the F1 obtained by crossing B73 (female parent) with Mo17 (male parent).

The NMR instrument used in the examples was a MesoMR23-020H-I NMR analyzer manufactured by Shanghai Niumag Corporation.

The principal component analysis in the examples was performed in the R language.

Model performance is evaluated by accuracy, miss rate and false selection rate. The confusion matrix used for model evaluation is shown in Table 1.

TABLE 1

Prediction \ Truth	Haploid (true)	Heterozygous diploid (true)
Haploid (predicted)	True Positive (TP)	False Positive (FP)
Heterozygous diploid (predicted)	False Negative (FN)	True Negative (TN)

Accuracy: the percentage of all kernels whose prediction (haploid or heterozygous diploid) agrees with the true type, i.e. (TP + TN) / (TP + FP + FN + TN) × 100%.

Miss rate: the percentage of all true haploids that are predicted to be heterozygous diploids, i.e. FN / (TP + FN) × 100%.

False selection rate: the percentage of heterozygous diploids among the kernels predicted to be haploid, i.e. FP / (TP + FP) × 100%.
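The three indexes can be computed directly from the four cells of Table 1. A small sketch follows; the counts in the example are hypothetical (sized like a 20% validation split of 40 haploids and 34 diploids) and are not results reported in the patent.

```python
def evaluate(tp, fp, fn, tn):
    """Return (accuracy, miss rate, false selection rate) as percentages.

    tp: true haploids predicted haploid     fp: true diploids predicted haploid
    fn: true haploids predicted diploid     tn: true diploids predicted diploid
    """
    total = tp + fp + fn + tn
    accuracy = 100.0 * (tp + tn) / total
    miss_rate = 100.0 * fn / (tp + fn)              # haploids lost as "diploid"
    false_selection_rate = 100.0 * fp / (tp + fp)   # diploids among predicted haploids
    return accuracy, miss_rate, false_selection_rate

# Hypothetical counts, for illustration only.
print([round(v, 2) for v in evaluate(tp=36, fp=3, fn=4, tn=31)])  # [90.54, 10.0, 7.69]
```

Note that the miss rate and the false selection rate have different denominators: the first is relative to all true haploids, the second to all kernels the model selects as haploid.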

Example 1 Preparation of cross populations

Identification of haploids and heterozygous diploids: after the ears matured, the ears obtained from the crosses were harvested and air-dried in a dry environment. Haploid kernels and heterozygous diploid (diploid for short) kernels were then sorted by R1-nj coloration: kernels with a purple endosperm and a colorless scutellum were classed as haploid, and kernels with a purple endosperm and a purple scutellum as heterozygous diploid. The haploids and heterozygous diploids screened in this way from the progeny in this example were taken as true haploids and true diploids, respectively.

Time: 2017. Location: Hainan Province.

Maize Zhengdan 958 (female parent) was crossed with the maize haploid inducer line CAU3 (male parent) to obtain hybrid progeny (kernels); 45 haploids and 45 heterozygous diploids were randomly selected from the progeny to form cross population A1.

Maize Zhengdan 958 (female parent) was crossed with the maize haploid inducer line CAU4 (male parent) to obtain hybrid progeny (kernels); 34 haploids and 35 heterozygous diploids were randomly selected from the progeny to form cross population A2.

Maize Zhengdan 958 (female parent) was crossed with the maize haploid inducer line CAU5 (male parent) to obtain hybrid progeny (kernels); 20 haploids and 20 heterozygous diploids were randomly selected from the progeny to form cross population A3.

Maize BM (female parent) was crossed with the maize haploid inducer line CAU5 (male parent) to obtain hybrid progeny (kernels); 50 haploids and 50 heterozygous diploids were randomly selected from the progeny to form cross population A4.

Maize Jingke 968 (female parent) was crossed with the maize haploid inducer line CAU5 (male parent) to obtain hybrid progeny (kernels); 50 haploids and 20 heterozygous diploids were randomly selected from the progeny to form cross population A5.

The five cross populations were pooled to form cross population B (369 kernels in total: 199 haploid kernels and 170 heterozygous diploid kernels). Kernel counts for cross population B are given in Table 2.

TABLE 2

Example 2 NMR signal acquisition

Each kernel of cross population B was processed as follows:

1. Weighing.

2. NMR signal acquisition with the NMR instrument, using the CPMG pulse sequence of the NMR analysis software. The parameters were: TW = 800 ms, TE = 0.600 ms, NS = 16. A CPMG decay curve was obtained for each kernel.

3. Mass normalization (to eliminate the effect of kernel weight on signal intensity)

The amplitude at each time point was divided by the kernel weight to normalize the data, giving a mass-normalized CPMG decay curve (one mass-normalized CPMG decay curve per kernel).

Mass normalization was performed with Microsoft Excel 2016 MSO (32-bit).

4. Selection of the time window

The mass-normalized CPMG decay curves (full relaxation time) of the individual kernels are shown in Fig. 1.

The mass-normalized CPMG decay curves (0-600 ms) of the individual kernels are shown in Fig. 2.

The mean mass-normalized CPMG decay curve of all true haploids and that of all true diploids are compared in Fig. 3.

The mass-normalized CPMG decay curves level off after 600 ms and vary considerably within 0-600 ms, so the 0-600 ms segment was selected for analysis. This segment contains 1000 points in total.
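The 1000-point count is consistent with the acquisition settings in this example, assuming one sampled amplitude per echo so that consecutive points are spaced by the echo time TE = 0.600 ms:

$$
N = \frac{600\ \text{ms}}{\mathrm{TE}} = \frac{600\ \text{ms}}{0.600\ \text{ms}} = 1000
$$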

Example 3 Selection of the data processing method

Model building was repeated 100 times and the results were averaged. In each run, 80% of the haploids and 80% of the heterozygous diploids in cross population B were randomly selected to form the training set, and the remaining 20% of the haploids and 20% of the heterozygous diploids formed the validation set.
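The repeated, class-stratified 80/20 split can be sketched with the standard library. The kernel IDs are hypothetical, and rounding 80% of each class to the nearest whole kernel is an assumption (the source does not state how fractional counts were handled); with 199 haploids and 170 diploids this gives 159 + 136 = 295 training kernels and 74 validation kernels per run. Each split would then be used to fit and evaluate one model, and the metrics averaged over the 100 runs.

```python
import random

def stratified_split(haploid_ids, diploid_ids, train_frac=0.8, rng=random):
    """Randomly assign train_frac of each class to training and the rest to validation."""
    train, valid = [], []
    for ids in (haploid_ids, diploid_ids):
        ids = list(ids)                        # copy so the caller's list is untouched
        rng.shuffle(ids)
        n_train = round(train_frac * len(ids))  # rounding rule is an assumption
        train += ids[:n_train]
        valid += ids[n_train:]
    return train, valid

# Cross population B: 199 haploid and 170 heterozygous diploid kernels (hypothetical IDs).
haploids = [f"H{i}" for i in range(199)]
diploids = [f"D{i}" for i in range(170)]
rng = random.Random(2018)
splits = [stratified_split(haploids, diploids, rng=rng) for _ in range(100)]
train, valid = splits[0]
print(len(train), len(valid))  # 295 74
```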

First, the training-set kernels were processed as follows:

The 0-600 ms segment of the mass-normalized CPMG decay curves obtained in Example 2 was processed, principal component analysis was then performed, and a haploid identification model was constructed with the support vector machine algorithm, using the first 100 principal components as variables.

The data processing methods compared were: 10-point smoothing (S), first derivative (D), vector normalization (V), 10-point smoothing followed by first derivative (SD), 10-point smoothing followed by vector normalization (SV), and 10-point smoothing followed by first derivative and vector normalization (SDV).
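The six variants are compositions of three basic operations. The sketch below shows one way to chain them; the exact definitions are assumptions, since the source does not specify them: 10-point smoothing as an unweighted moving average in `"valid"` mode, the first derivative as a forward difference divided by an assumed point spacing of 0.6 ms, and vector normalization as scaling each curve to unit Euclidean norm (some spectroscopy packages also mean-center first).

```python
import numpy as np

def smooth10(x):
    """Assumed 10-point unweighted moving average ('valid' mode, 9 points shorter)."""
    return np.convolve(x, np.ones(10) / 10, mode="valid")

def first_derivative(x, dt_ms=0.6):
    """Forward-difference first derivative; dt_ms is the assumed point spacing (TE)."""
    return np.diff(x) / dt_ms

def vector_normalize(x):
    """Assumed vector normalization: scale the curve to unit Euclidean norm."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

# Steps are applied left to right, matching the order given in the text.
PIPELINES = {
    "S":   [smooth10],
    "D":   [first_derivative],
    "V":   [vector_normalize],
    "SD":  [smooth10, first_derivative],
    "SV":  [smooth10, vector_normalize],
    "SDV": [smooth10, first_derivative, vector_normalize],
}

def preprocess(curve, code):
    out = np.asarray(curve, dtype=float)
    for step in PIPELINES[code]:
        out = step(out)
    return out

curve = np.exp(-0.6 * np.arange(1, 1001) / 80.0)   # hypothetical 1000-point segment
for code in PIPELINES:
    print(code, preprocess(curve, code).shape)
# S (991,)  D (999,)  V (1000,)  SD (990,)  SV (991,)  SDV (990,)
```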

Second, the validation-set kernels were processed as follows:

The 0-600 ms segment of the mass-normalized CPMG decay curves obtained in Example 2 was processed (using the same data processing method as in the first part), principal component analysis was then performed, and the results were substituted into the haploid identification model constructed in the first part to obtain predicted values.

The model was evaluated by comparing the predicted and true values of the validation-set kernels. The results are shown in Table 3: 10-point smoothing gave the best modeling performance.

TABLE 3

Example 4 Selection of the number of principal components

First, the training-set kernels were processed as follows:

The 0-600 ms segment of the mass-normalized CPMG decay curves obtained in Example 2 was processed (by 10-point smoothing), principal component analysis was then performed, and a haploid identification model was constructed with the support vector machine algorithm, using as variables the principal components that capture the kernel-trait differences between haploids and heterozygous diploids.

The number of principal components was set to 50, 100, 150 or 200.

Second, the validation-set kernels were processed as follows:

The 0-600 ms segment of the mass-normalized CPMG decay curves obtained in Example 2 was processed (by 10-point smoothing), principal component analysis was then performed (with the same number of principal components as in the first part), and the results were substituted into the haploid identification model constructed in the first part to obtain predicted values.

The model was evaluated by comparing the predicted and true values of the validation-set kernels. The results are shown in Table 4: modeling performance was best with 100 principal components.

TABLE 4

Example 5 Selection of the modeling method

First, the training-set kernels were processed as follows:

The 0-600 ms segment of the mass-normalized CPMG decay curves obtained in Example 2 was processed (by 10-point smoothing), and principal component analysis was then performed (100 principal components) to construct a haploid identification model.

The modeling algorithms compared were: support vector machine (SVM; sigma = 0.004976874, penalty coefficient C = 16), random forest (RF; number of randomly sampled variables mtry = 12), k-nearest neighbors (KNN; k = 39), decision tree (DT; number of independent decision trees trials = 35), and naive Bayes (NB; predictor variables assumed to be independently distributed).

Second, the validation-set kernels were processed as follows:

The 0-600 ms segment of the mass-normalized CPMG decay curves obtained in Example 2 was processed (by 10-point smoothing), principal component analysis was then performed (100 principal components), and the results were substituted into the haploid identification model constructed in the first part to obtain predicted values.

The model was evaluated by comparing the predicted and true values of the validation-set kernels. The results are shown in Table 5: the support vector machine algorithm gave the best modeling performance.

TABLE 5

Claims

1. A method for identifying maize haploids, comprising the following steps:

(1) collecting the NMR signal of each maize kernel in the training set to obtain the CPMG decay curve of each kernel, then dividing the amplitude at each time point by the kernel weight to normalize the data, thereby obtaining the mass-normalized CPMG decay curve of each kernel; the training set consists of a plurality of maize kernels, some of which are true haploids and the rest true diploids;

(2) performing data processing on the 0-600 ms segment of the mass-normalized CPMG decay curves obtained in step (1), then performing principal component analysis, and then constructing a haploid identification model; the algorithm for constructing the model is a support vector machine algorithm, with parameters sigma = 0.004976874 and penalty coefficient C = 16;

(3) taking the maize kernel to be tested, collecting its NMR signal to obtain a CPMG decay curve, then dividing the amplitude at each time point by the kernel weight to normalize the data, thereby obtaining a mass-normalized CPMG decay curve;

(4) performing data processing on the 0-600 ms segment of the mass-normalized CPMG decay curve obtained in step (3), then performing principal component analysis, and then substituting the results into the haploid identification model constructed in step (2), whereupon the model outputs whether the kernel to be tested is a predicted haploid or a predicted diploid;

In steps (2) and (4), the data processing is 10-point smoothing;

In steps (2) and (4), the number of principal components in the principal component analysis is 100;

In steps (2) and (4), the principal component analysis is performed in the R language.