CN116855596A

CN116855596A - Rice variety homogeneity evaluation method

Info

Publication number: CN116855596A
Application number: CN202310832471.8A
Authority: CN
Inventors: 樊龙江; 叶楚玉; 沈恩惠
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2023-07-08
Filing date: 2023-07-08
Publication date: 2023-10-10

Abstract

The invention discloses a method for evaluating the homogeneity of a rice variety, belonging to the technical field of crop genetics and breeding. The method comprises the following steps: 1. planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling a plurality of single plants and obtaining resequencing data for each; 2. using the resequencing data to carry out whole-genome variant detection, sample population structure analysis and phylogenetic tree construction, IBS (identity-by-state) distance and genetic distance analysis, and nucleotide sequence diversity analysis on the resequenced variety. From these data, the method uses whole-genome site information combined with genetic polymorphism indices to assess rice variety homogeneity, screen suspicious samples and even identify new varieties. Resequencing the genomes of a plurality of single plants of a rice variety yields genome-wide variation information for evaluating the homogeneity of the variety; calculating the nucleotide polymorphism among those single plants gives a quantitative measure of the variety's internal diversity.

Description

Rice variety homogeneity evaluation method

Technical Field

The invention relates to the technical field of crop genetic breeding, in particular to a rice variety homogeneity evaluation method.

Background

With the development of next-generation sequencing (NGS), which has replaced first-generation sequencing represented by Sanger sequencing, and with advances in genotyping and bioinformatics methods, molecular marker technologies based on genomic sequence differences have been developed. Molecular markers such as SSRs and SNPs provide a relatively stable and reliable genetic basis for crop variety identification.

At present, national standards for variety identification based on SSR and SNP molecular markers have been established. Taking the standard for the SSR marker method for rice varieties (NY/T 1433-2014) as an example, the standard relies on differences among rice varieties in the number of short tandem repeats in genomic DNA, uses PCR and gel electrophoresis to compare and distinguish varieties, and judges the similarity of two varieties according to whether the number of differing loci between samples is greater than 2. However, the SSR workflow is cumbersome: from DNA extraction and PCR amplification to electrophoresis and silver staining, it involves many experimental operations and much detail control, the final criterion is still coarse, and its use for evaluating varieties that differ only slightly is limited. The SNP marker standard (NY/T 2745-2015) detects single-nucleotide polymorphism differences between samples at 3,072 marker loci and makes variety judgments through a genetic similarity index; it is more accurate than the SSR method, and the experimental procedure is relatively simple. However, molecular probes are relatively expensive to produce, and because the genotype of each variety is represented by only a single sample, the possible influence of genetic polymorphism within a variety on marker-based discrimination is not considered.

Based on the above, the present invention provides a rice variety homogeneity evaluation method to solve these problems.

Disclosure of Invention

Aiming at the defects existing in the prior art, the invention provides a rice variety homogeneity evaluation method.

In order to achieve the above purpose, the invention is realized by the following technical scheme:

A rice variety homogeneity evaluation method comprises the following steps:

1. planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling a plurality of single plants and obtaining resequencing data for each;

2. using the resequencing data to carry out whole-genome variant detection, sample population structure analysis and phylogenetic tree construction, IBS distance and genetic distance analysis, and nucleotide sequence diversity analysis on the resequenced variety.

Further, the whole-genome variant detection comprises the following steps:

detecting whole-genome variant sites in the resequenced variety under test;

decompressing the raw data of the variety, filtering the decompressed high-throughput sequencing data, and removing adapter sequences, non-ATGC bases and low-quality reads;

aligning the filtered reads to the japonica reference genome Nipponbare;

merging and filtering the alignment results and converting them into BAM format files;

performing preliminary SNP calling on the BAM file of each single sample, then integrating the preliminary SNP calls of all samples and performing multi-sample SNP calling chromosome by chromosome;

integrating the variant detection results of all chromosomes and filtering the sites.

Further, the sample population structure analysis comprises the following steps: converting the filtered VCF variant file into BED format and analyzing the sample population structure, with K ranging from 2 to 4.

Still further, constructing the phylogenetic tree comprises the following steps: extracting the haplotype sequences of all samples using a local Perl script; and aligning the sequences and constructing a phylogenetic tree from the haplotype sequence information using FastTree, based on the maximum-likelihood method.

Further, the IBS distance and genetic distance analysis comprises the following steps: calculating the IBS distance matrix among all individuals and the distribution of genetic distances among individuals within the variety;

first converting the VCF file into PED and MAP format files, then calculating the pairwise IBS distance between samples using PLINK, and calculating the pairwise genetic distance from the IBS distance; the final results are imported into R (a language for statistical analysis and plotting) for data visualization.

Further, the nucleotide sequence diversity analysis comprises the following steps: calculating the π value at every whole-genome site among all individuals within the variety, and then using a Python script to calculate the average π value, that is, dividing the sum of the per-site π values by the total number of sites, to obtain the nucleotide sequence diversity index of the variety.

Further, the internal diversity of the variety is calculated from a plurality of single plants using the following formula:

$$\pi = \frac{n}{n-1} \sum_{ij} x_i x_j \pi_{ij}$$

where π is the average number of nucleotide differences per site between any two randomly selected DNA sequences; $x_i$ and $x_j$ denote the relative frequencies of the i-th and j-th sequences in the population; $\pi_{ij}$ denotes the number of nucleotide differences per site between the two sequences; and n denotes the number of single plants in the variety.

Further, in the first step, DNA is extracted from the material and checked for quality; for samples that pass, 48 small-fragment DNA libraries with insert sizes of 300-500 bp are constructed, and the libraries are sequenced on an Illumina HiSeq 4000 platform in paired-end 150 bp (PE150) mode, with whole lanes reserved for the run.

Further, the method also comprises a third step: rice variety homogeneity evaluation.

Further, the third step specifically comprises the following sub-steps:

3.1, processing and summarizing the rice variety sequencing data;

3.2, preliminary evaluation based on the phylogenetic tree;

3.3, evaluation of IBS distance and genetic distance;

3.4, quantification of rice variety homogeneity.

Advantageous effects

The method of the invention uses whole-genome site information combined with genetic polymorphism indices to assess rice variety homogeneity, screen suspicious samples and even identify new varieties;

resequencing the genomes of a plurality of single plants of a rice variety yields genome-wide variation information for evaluating the homogeneity of the variety;

the internal diversity of a variety (its π value) can be obtained quantitatively by calculating the nucleotide polymorphism among a plurality of its single plants;

a variety's own diversity can be compared with that of unknown or known materials to evaluate homogeneity or the degree of variation between varieties.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 shows the male and female parent origins of some of the selected varieties;

FIG. 2 is a phylogenetic tree of individuals within rice varieties, based on genome resequencing data;

FIG. 3 is an IBS distance heatmap; the darker the color of a region, the smaller the IBS distance and the more similar the samples;

FIG. 4 shows the distribution of genetic distances between individuals within rice varieties;

FIG. 5 shows a comparison of nucleotide polymorphism among rice varieties.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention is further described below with reference to examples.

Example 1

The embodiment provides a rice variety homogeneity evaluation method, which comprises the following steps:

1. planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling 5 single plants and obtaining resequencing data for each, at a sequencing depth of more than 10×;

specifically, 10 common cultivated rice (O. sativa) varieties were selected: 5 indica varieties (Minghui 63, 9311, IR64, Zhongjiazao 17 and Huazhan) and 5 japonica varieties (Nipponbare, Wuyunjing 7, Qiuguang, Zhonghua 11 and Shennong 265) (Table 1). The pedigree information retrieved for these varieties is shown in FIG. 1; there is little pedigree overlap or admixture among the selected varieties. For example, the female parent of Shennong 265 is Liaojing 326 and its male parent is a hybrid of 1308 and 02428, whereas the female parent of Zhonghua 11 is a hybrid of Beijing 5 and Tetpu and its male parent is Fujin; the pedigree relationship between the two is distant, so that interference arising from differences between varieties can be controlled. Under the same cultivation conditions, the growth stage, phenotypic characteristics and other traits of the field plants were observed. Finally, 5 single plants were randomly selected from each variety, giving 50 samples in total; for each sample, 1-2 cm was cut from the tip of a fresh leaf, labeled by variety name and sent for sequencing. The germplasm used in this study came from the national variety bank of the China National Rice Research Institute;

Table 1. Names and types of the rice varieties obtained from the national variety bank

DNA was extracted from the material and checked for quality; for samples that passed, 48 small-fragment DNA libraries with insert sizes of 300-500 bp were constructed, and the libraries were sequenced on an Illumina HiSeq 4000 platform in paired-end 150 bp (PE150) mode, with whole lanes reserved for the run;

2. carrying out whole-genome variant detection, sample population structure analysis and phylogenetic tree construction, IBS distance and genetic distance analysis, and nucleotide sequence diversity (π) analysis on the resequenced varieties, specifically comprising the following steps:

2.1, whole-genome variant detection

Whole-genome variant sites were detected for the 10 resequenced varieties;

the raw data of the 10 varieties were decompressed with fastq-dump (v2.8.2), and the decompressed high-throughput sequencing data were filtered with NGSQCToolkit (v2.3.3) under default settings to remove adapter sequences, non-ATGC bases and low-quality reads, thereby improving read reliability and reducing random errors introduced during sequencing;

the filtered reads were aligned to the japonica reference genome Nipponbare (IRGSP-1.0, Kawahara et al., 2013) using bowtie2 (v2.3.5.1);

the alignment results were merged, filtered and converted into BAM (Binary Alignment Map) format files using samtools (v1.3.1);

preliminary SNP calling was then performed on the BAM file of each single sample using GATK (v3.7), after which the preliminary SNP calls of all samples were integrated and multi-sample SNP calling was performed chromosome by chromosome;

in this embodiment, the variant detection results of the 12 chromosomes were merged and the sites were filtered using the following criteria: QUAL ≥ 30, DP ≥ 10, QD ≥ 2, a minimum minor allele frequency (MAF) of 0.05, and a max-missing threshold of 0.8;
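The site-filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the example: the function names are hypothetical, sites are assumed to be biallelic, DP and QD are assumed to be read from the VCF INFO column, and the 0.8 threshold is assumed to mean that at least 80% of samples must carry a genotype call (the vcftools reading of max-missing).

```python
def site_stats(vcf_line):
    """Extract QUAL, INFO/DP, INFO/QD, MAF and call rate from one biallelic VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    samples = fields[9:]
    alleles = []
    called = 0
    for sample in samples:
        # The genotype is the first colon-separated entry, e.g. "0/1" or "1|1".
        calls = sample.split(":")[0].replace("|", "/").split("/")
        if "." in calls:
            continue  # missing genotype
        called += 1
        alleles.extend(calls)
    alt_freq = alleles.count("1") / len(alleles) if alleles else 0.0
    return {
        "qual": float(fields[5]),
        "dp": int(info.get("DP", 0)),
        "qd": float(info.get("QD", 0.0)),
        "maf": min(alt_freq, 1.0 - alt_freq),
        "call_rate": called / len(samples) if samples else 0.0,
    }


def passes_site_filters(stats, min_qual=30.0, min_dp=10, min_qd=2.0,
                        min_maf=0.05, min_call_rate=0.8):
    """Apply the thresholds quoted in the text to one site's statistics."""
    return (stats["qual"] >= min_qual
            and stats["dp"] >= min_dp
            and stats["qd"] >= min_qd
            and stats["maf"] >= min_maf
            and stats["call_rate"] >= min_call_rate)


# Illustrative record with five samples, one of them uncalled (made-up data).
example = "Chr1\t1000\t.\tA\tG\t50\t.\tDP=40;QD=12.5\tGT\t0/0\t0/1\t1/1\t0/0\t./."
print(passes_site_filters(site_stats(example)))
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the retained site count is to each criterion.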

2.2, sample population structure analysis and phylogenetic tree construction

The filtered VCF variant file was converted into BED format (the PLINK binary genotype format) using vcftools (v0.1.17) and PLINK (v1.9), and the sample population structure was analyzed with fastStructure, with K ranging from 2 to 4;

the haplotype sequences of all samples were extracted using a local Perl script; the sequences were aligned and a phylogenetic tree was constructed from the haplotype sequence information using FastTree (v2.1.10), based on the approximately-maximum-likelihood method, with default parameters;

2.3, IBS distance and genetic distance analysis

The IBS distance matrix among all individuals and the distribution of genetic distances among individuals within each variety were calculated using vcftools and PLINK;

the VCF (Variant Call Format) file was first converted into PED and MAP format files, the pairwise IBS distance between samples was then calculated with PLINK (parameters: --genome --cluster --distance-matrix --allow-extra-chr --allow-no-sex), and the pairwise genetic distance was calculated from the IBS distance (genetic distance = 1 - IBS); the final results were imported into R and visualized with ggplot2;
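The conversion from IBS to genetic distance, and the grouping of pairs by variety, can be illustrated with a short Python sketch. It assumes an IBS similarity matrix has already been loaded into a nested list; the function name, the sample labels and the numbers are made up for illustration and are not taken from the example's results.

```python
from itertools import combinations


def within_variety_distances(varieties, ibs_matrix):
    """Return {variety: [genetic distances]} for every pair of plants of the same variety.

    varieties[i] is the variety label of sample i, and ibs_matrix[i][j] is the
    IBS similarity (0..1) between samples i and j. Genetic distance = 1 - IBS.
    """
    distances = {}
    for i, j in combinations(range(len(varieties)), 2):
        if varieties[i] == varieties[j]:
            distances.setdefault(varieties[i], []).append(1.0 - ibs_matrix[i][j])
    return distances


# Hypothetical data: two plants of variety A and two of variety B.
labels = ["A", "A", "B", "B"]
ibs = [
    [1.0000, 0.9995, 0.8000, 0.8100],
    [0.9995, 1.0000, 0.8050, 0.8000],
    [0.8000, 0.8050, 1.0000, 0.9930],
    [0.8100, 0.8000, 0.9930, 1.0000],
]
for variety, values in within_variety_distances(labels, ibs).items():
    print(variety, min(values), max(values))
```

The per-variety minimum and maximum are the quantities a table of within-variety genetic distances would summarize.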

2.4, nucleotide sequence diversity (π) analysis

The π value at every genome site among all individuals within a variety was calculated using vcftools, and a Python script was then used to calculate the average π value, that is, the sum of the per-site π values divided by the total number of sites, giving the nucleotide sequence diversity index of the variety;
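The averaging step can be sketched as below. The original Python script is not reproduced in this document, so this is only an illustration: it assumes the per-site values are in a tab-delimited table with a header row and a `PI` column (the layout vcftools writes for per-site π), and it assumes the "total number of sites" is the number of rows with a defined value.

```python
import csv
import io
import math


def mean_site_pi(handle):
    """Average the PI column of a tab-delimited per-site table, skipping undefined values."""
    total = 0.0
    n_sites = 0
    for row in csv.DictReader(handle, delimiter="\t"):
        value = float(row["PI"])
        if math.isnan(value):
            continue  # site with no usable genotypes
        total += value
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no sites with a defined PI value")
    return total / n_sites


# Illustrative input (made-up values), standing in for a real per-site output file.
demo = io.StringIO("CHROM\tPOS\tPI\nChr1\t100\t0.4\nChr1\t200\t0\nChr1\t300\t0.2\n")
print(mean_site_pi(demo))
```

If the intended denominator is instead the number of callable genome positions rather than the number of variant sites, only the `n_sites` bookkeeping would change.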

the homogeneity value of the variety under test was calculated; the internal diversity of the variety was calculated from its five single plants using the following formula:

$$\pi = \frac{n}{n-1} \sum_{ij} x_i x_j \pi_{ij}$$

where π is the average number of nucleotide differences per site between any two randomly selected DNA sequences; $x_i$ and $x_j$ denote the relative frequencies of the i-th and j-th sequences in the population; $\pi_{ij}$ denotes the number of nucleotide differences per site between the two sequences; and n denotes the number of single plants in the variety;
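A small worked example may help make the quantities x_i, x_j, π_ij and n concrete. The sketch below assumes the commonly used Nei and Li form π = n/(n-1) · Σ x_i x_j π_ij, which is an assumption about the exact expression intended here, and it treats each plant as a single sequence with frequency 1/n (a simplification for a largely homozygous inbred crop). The sequences are invented.

```python
def per_site_differences(seq_a, seq_b):
    """Number of nucleotide differences per site between two aligned sequences (pi_ij)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(sequences):
    """pi = n/(n-1) * sum over i != j of x_i * x_j * pi_ij, with x_i = 1/n for each plant."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    x = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # pi_ii is zero, so identical indices contribute nothing
                total += x * x * per_site_differences(sequences[i], sequences[j])
    return n / (n - 1) * total


# Five hypothetical plants of one variety: three identical, two carrying one SNP each.
plants = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC", "ACGTACGTAC"]
print(nucleotide_diversity(plants))
```

With equal frequencies this reduces to the mean of π_ij over the 10 unordered pairs of plants: six pairs differ at 1 of 10 sites and one pair at 2 of 10, so the result is 0.8 / 10 = 0.08 up to floating-point rounding.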

the genome-wide nucleotide sequence diversity within each variety was calculated in sliding windows of 100 kb using vcftools (--window-pi, --indv) and Python scripts; based on the sliding-window distribution of genome-wide heterozygosity of each individual within a variety, the distribution of individual differences within varieties was compared with the trend of genetic diversity among varieties; the results were visualized in R;
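The windowing step can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the example: per-site values are assumed to be available as (chromosome, 1-based position, value) tuples, windows are non-overlapping 100 kb bins, and each window's value is the sum of its per-site values divided by the window size, which presumes every base in the window was callable. The function name is hypothetical.

```python
from collections import defaultdict


def windowed_diversity(sites, window_size=100_000):
    """Return {(chrom, window_start): summed per-site value / window_size}.

    Windows are non-overlapping bins with 1-based starts at 1, window_size + 1, ...
    """
    sums = defaultdict(float)
    for chrom, pos, value in sites:
        start = (pos - 1) // window_size * window_size + 1
        sums[(chrom, start)] += value
    return {key: total / window_size for key, total in sums.items()}


# Made-up per-site values on one chromosome.
demo_sites = [("Chr1", 1, 0.4), ("Chr1", 100_000, 0.6), ("Chr1", 100_001, 0.5)]
for (chrom, start), value in sorted(windowed_diversity(demo_sites).items()):
    print(chrom, start, value)
```

Plotting these window values along each chromosome, per individual or per variety, gives the kind of distribution the text compares within and between varieties.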

the calculated internal homogeneity value can be compared with the values of other varieties or materials, and the resulting difference in homogeneity values can provide a basis for variety identification or for judging internal differences;

3. rice variety homogeneity evaluation, specifically comprising the following steps:

3.1, processing and summarizing the rice variety sequencing data

Paired-end sequencing of all individuals of the 10 rice varieties finally yielded sequencing data for 50 libraries, with a total data volume of ≥360 G; the average number of bases per library was 6.26 Gb (6.15 Gb after filtering), the average mapping rate was 0.95, and the average sequencing depth was about 16×;

the number of raw variant sites detected across all samples of the 10 resequenced varieties in this example was 4,025,683, and 3,461,005 high-confidence SNP sites were finally obtained after filtering; the subsequent evaluation was developed on the basis of these whole-genome variant sites;
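The summary figures above can be checked with simple arithmetic. The sketch below assumes a reference genome size of roughly 373 Mb for Nipponbare IRGSP-1.0; that figure is not stated in this document and is labeled as an assumption in the code. The function names are illustrative.

```python
# Assumption: approximate size of the Nipponbare IRGSP-1.0 assembly, not given in the text.
ASSUMED_GENOME_SIZE_BP = 373_000_000


def mean_depth(clean_bases, mapping_rate, genome_size=ASSUMED_GENOME_SIZE_BP):
    """Approximate sequencing depth: mapped bases divided by genome size."""
    return clean_bases * mapping_rate / genome_size


def retained_fraction(filtered_sites, raw_sites):
    """Fraction of raw variant sites kept after filtering."""
    return filtered_sites / raw_sites


depth = mean_depth(clean_bases=6.15e9, mapping_rate=0.95)
kept = retained_fraction(3_461_005, 4_025_683)
print(f"depth ~ {depth:.1f}x, sites retained ~ {kept:.1%}")
```

Under that genome-size assumption the depth comes out a little under 16×, in line with the reported average, and about 86% of the raw variant sites survive filtering.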

3.2, preliminary evaluation based on the phylogenetic tree

The 10 varieties measured in this example fall mainly into two branches, corresponding to indica and japonica rice respectively; each variety forms its own cluster, and the individuals within a variety cluster tightly together, with very small differences that are clearly smaller than the genetic distances between varieties. Compared with the indica varieties, the genetic distances between individuals within the japonica varieties are closer (FIG. 2) and are difficult to distinguish by eye; for example, all individuals of the variety Wuyunjing 7 (CX2-1 to CX2-5) gather tightly on one branch. The phylogenetic tree provides visual evidence for comparing indica-japonica type, relatedness and differences between varieties, so it can serve as an auxiliary means of judgment in rice variety homogeneity analysis; however, the tree only allows a qualitative judgment of genetic distance between varieties and between individuals within a variety, and it is difficult to quantify the specific differences between varieties and between individuals within a variety;

3.3, evaluation of IBS distance and genetic distance

Inter-individual IBS distances (FIG. 3) intuitively reflect the sequence similarity between different individuals;

as can be seen from the color depth in FIG. 3, the degree of IBS distance difference is ordered as: indica/japonica > indica/indica > japonica/japonica; the resequenced varieties measured in this example are relatively uniform in color within each variety and differ strongly in color between varieties, which indirectly reflects the good within-variety consistency of the resequenced varieties; the IBS distance heatmap shows fairly intuitively the uniformity within varieties and the degree of background difference at the subspecies, variety and individual levels;

statistics of genetic distance (GD) derived from IBS distance show that, among the 10 resequenced varieties measured in this example, the genetic distances between japonica and indica varieties differ markedly (FIG. 4), consistent with the IBS distance results, indicating that the genetic backgrounds of indica and japonica rice differ greatly;

in terms of genetic distances between individuals, the differences among individuals within the japonica resequenced varieties range from 0.0002 to 0.001, and those within the indica resequenced varieties range from about 0.0003 to 0.0072 (Table 2);

it should be noted that, because the 10 resequenced varieties selected in this example come from the national variety bank and were strictly controlled in variety selection, phenotype identification, material handling, sequencing and so on, the homogeneity of the varieties should in theory be very high; in other words, the differences between individuals within a variety should be small. However, the genetic distance between individuals within the indica variety Huazhan (CX10) reaches a maximum of 0.0254 and a minimum of 0.0182, indicating that the internal genetic variation of this indica variety is relatively large;

Table 2. Statistics of pairwise genetic distance (GD) between individuals within each variety

3.4, quantification of rice variety homogeneity

The genomic differences within each rice variety were analyzed on the basis of nucleotide sequence diversity (π);

the π calculation draws on SNP site diversity across the whole genome, so it integrates genome-wide variation information well and allows it to be quantified;

taking the 10 varieties measured in this example, the π values of the japonica and indica varieties each clearly fall within a certain range, that is, the nucleotide sequence diversity among multiple individuals of the same variety lies near a certain value (FIG. 5), and this value differs between japonica and indica varieties: the average π of the 5 japonica resequenced varieties is about 0.0016, and that of the 5 indica resequenced varieties is about 0.0045 (Table 3);

if individuals of different varieties are mixed, the nucleotide sequence diversity of the mixed sample increases markedly and far exceeds the single-variety threshold. For two japonica varieties, the nucleotide sequence diversity after mixing Nipponbare (CX1) and Zhonghua 11 (CX4) is about 0.016, whereas the π of each of the two varieties alone is about 0.0015. Likewise, for two indica varieties, the nucleotide diversity after mixing IR64 (CX8) and Huazhan (CX10) reaches 0.063. For a japonica variety mixed with an indica variety, Shennong 265 (CX5) and Zhongjiazao 17 (CX9), the nucleotide sequence diversity of the mixed sample is as high as 0.153, two orders of magnitude above the average π values of the indica and japonica varieties;

the results show that a certain degree of nucleotide sequence diversity exists among individuals within a variety, and that the diversity after different varieties are mixed far exceeds the inter-individual diversity within a variety; therefore, the genome-wide nucleotide polymorphism of a variety obtained by resequencing can be used to test variety homogeneity, that is, the same variety or line should fall within a certain degree of variation, while different varieties or lines differ much more;
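The screening logic described above can be sketched as a simple comparison of a sample's π with a single-variety reference value. The π values below are the ones quoted in the text; the 5-fold cutoff and the function names are hypothetical choices for illustration, since the document does not state a numerical threshold.

```python
def fold_increase(sample_pi, reference_pi):
    """How many times larger the sample's pi is than the single-variety reference."""
    return sample_pi / reference_pi


def is_suspect_mixture(sample_pi, reference_pi, max_fold=5.0):
    """Flag a sample whose pi exceeds the reference by more than max_fold (assumed cutoff)."""
    return fold_increase(sample_pi, reference_pi) > max_fold


JAPONICA_MEAN_PI = 0.0016  # average of the 5 japonica varieties, from the text
INDICA_MEAN_PI = 0.0045    # average of the 5 indica varieties, from the text

cases = {
    "Nipponbare + Zhonghua 11 (japonica mix)": (0.016, JAPONICA_MEAN_PI),
    "IR64 + Huazhan (indica mix)": (0.063, INDICA_MEAN_PI),
    "Shennong 265 + Zhongjiazao 17 (japonica/indica mix)": (0.153, JAPONICA_MEAN_PI),
    "single japonica variety": (0.0015, JAPONICA_MEAN_PI),
}
for name, (pi_value, reference) in cases.items():
    print(name, round(fold_increase(pi_value, reference), 1),
          is_suspect_mixture(pi_value, reference))
```

In practice the cutoff would need to be calibrated against a larger set of reference varieties, with separate references for indica and japonica material.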

Table 3. Average per-site nucleotide sequence diversity, and its mean, for the 10 resequenced varieties measured in this example

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A rice variety homogeneity evaluation method, characterized by comprising the following steps:

2. The method for evaluating the homogeneity of a rice variety according to claim 1, wherein the whole-genome variant detection comprises the following steps:

merging and filtering the alignment results and converting them into BAM format files;

3. The method for evaluating the homogeneity of a rice variety according to claim 2, wherein the sample population structure analysis comprises the following steps: converting the filtered VCF variant file into BED format and analyzing the sample population structure, with K ranging from 2 to 4.

4. The method for evaluating the homogeneity of a rice variety according to claim 3, wherein constructing the phylogenetic tree comprises the following steps: extracting the haplotype sequences of all samples using a local Perl script; and aligning the sequences and constructing a phylogenetic tree from the haplotype sequence information using FastTree, based on the maximum-likelihood method.

5. The method for evaluating the homogeneity of a rice variety according to claim 4, wherein the IBS distance and genetic distance analysis comprises the following steps: calculating the IBS distance matrix among all individuals and the distribution of genetic distances among individuals within the variety;

first converting the VCF file into PED and MAP format files, then calculating the pairwise IBS distance between samples using PLINK, and calculating the pairwise genetic distance from the IBS distance; and importing the final results into R for data visualization.

6. The method for evaluating the homogeneity of a rice variety according to claim 5, wherein the nucleotide sequence diversity analysis comprises the following steps: calculating the π value at every whole-genome site among all individuals within the variety, and then using a Python script to calculate the average π value, that is, dividing the sum of the per-site π values by the total number of sites, to obtain the nucleotide sequence diversity index of the variety.

7. The method for evaluating the homogeneity of a rice variety according to claim 6, wherein the variety internal diversity is calculated by combining a plurality of individual plants using the following formula:

8. The method for evaluating the homogeneity of a rice variety according to claim 7, wherein in the first step, DNA is extracted from the material and checked for quality; for samples that pass, small-fragment DNA libraries with insert sizes of 300-500 bp are constructed, and the libraries are sequenced on an Illumina HiSeq 4000 platform in paired-end 150 bp (PE150) mode, with whole lanes reserved for the run.

9. The method for evaluating the homogeneity of a rice variety according to claim 8, further comprising a third step of rice variety homogeneity evaluation.

10. The method for evaluating the homogeneity of a rice variety of claim 9, wherein the third step specifically comprises the steps of:

3.1, processing and counting rice variety sequencing data;

3.2, primarily evaluating the phylogenetic tree;

3.3, evaluating IBS distance and genetic distance;

3.4, quantifying the homogeneity of the rice variety.