CN116855596A - Rice variety homogeneity evaluation method - Google Patents
- Publication number
- CN116855596A CN202310832471.8A
- Authority
- CN
- China
- Prior art keywords
- variety
- homogeneity
- evaluating
- rice
- rice variety
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Abstract
The invention discloses a method for evaluating the homogeneity of rice varieties, belonging to the technical field of crop genetics and breeding. The method comprises the following steps: 1. planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling a plurality of individual plants and obtaining resequencing data for each; 2. using the resequencing data to carry out whole-genome variant detection, sample population structure analysis and phylogenetic tree construction, IBS distance and genetic distance analysis, and nucleotide sequence diversity analysis on the resequenced variety. With these data, the method uses whole-genome site information combined with genetic polymorphism indices to assess rice variety homogeneity, screen suspicious samples and even identify new varieties. Resequencing the genomes of a plurality of individual plants of a rice variety yields genome-wide variation information for evaluating the homogeneity of the variety; calculating the nucleotide polymorphism across those plants gives a quantitative measure of the diversity within the variety.
Description
Technical Field
The invention relates to the technical field of crop genetic breeding, in particular to a rice variety homogeneity evaluation method.
Background
As next-generation sequencing (NGS) has replaced first-generation technology represented by Sanger sequencing, and as genotyping and bioinformatics analysis methods have advanced, molecular marker technologies based on genomic sequence differences have developed. Molecular markers such as SSRs and SNPs provide a relatively stable and reliable genetic basis for crop variety identification.
At present, standards for variety identification with SSR and SNP molecular markers have been established. Taking the standard for the rice variety SSR marker method (NY/T 1433-2014) as an example, the standard relies on differences in the number of short tandem repeats in the genomic DNA of different rice varieties, combines PCR with gel electrophoresis to compare and distinguish varieties, and judges the similarity of two varieties according to whether the number of differing loci between samples is greater than 2. However, the SSR workflow is laborious: from DNA extraction and PCR amplification to electrophoresis detection and silver staining, it involves many experimental operations and much attention to detail, while the final criterion remains coarse, which limits its use for evaluating varieties that differ only slightly. The SNP molecular marker standard (NY/T 2745-2015) detects single-nucleotide polymorphism differences between a pair of samples based on 3072 markers and judges varieties by a genetic similarity index; it is more accurate than the SSR method and the experimental procedure is relatively simple. However, molecular probes are relatively expensive to produce, and because the genotype of each variety is represented by only a single sample, the possible influence of genetic polymorphism within the variety itself on marker-based discrimination is not considered.
Based on the above, the present invention devised a rice variety homogeneity evaluation method to solve the above problems.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a rice variety homogeneity evaluation method.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
A rice variety homogeneity evaluation method comprises the following steps:
1. planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling a plurality of individual plants and obtaining resequencing data for each;
2. using the resequencing data to carry out whole-genome variant detection, sample population structure analysis and phylogenetic tree construction, IBS distance and genetic distance analysis, and nucleotide sequence diversity analysis on the resequenced variety.
Further, the whole-genome variant detection comprises the following steps:
detecting whole-genome variant sites in the resequenced variety;
decompressing the raw data of the sequenced variety, filtering the decompressed high-throughput sequencing data, and removing adapter sequences, non-ATGC bases and low-quality reads;
aligning the filtered reads to the japonica reference genome Nipponbare;
merging and filtering the alignment results and converting them into BAM format files;
performing preliminary SNP detection on the BAM file of each sample, then integrating the preliminary SNP information of all samples and performing multi-sample SNP detection chromosome by chromosome;
merging the variant detection results of the chromosomes and filtering the sites.
Further, the sample population structure analysis comprises the following steps: converting the filtered VCF variant file into BED format and analyzing the population structure of the samples, with K ranging from 2 to 4.
Still further, constructing the phylogenetic tree comprises the following steps: extracting the haplotype sequences of all samples with an in-house Perl script; aligning the haplotype sequences and constructing a phylogenetic tree with FastTree based on a maximum likelihood method.
Further, the IBS distance and genetic distance analysis comprises the following steps: calculating the IBS distance matrix among all individuals and the distribution of genetic distances between individuals within the variety;
first, converting the VCF file into PED and MAP format files, then calculating the IBS distance between every pair of samples with PLINK, and calculating the genetic distance between every pair of samples from the IBS distance; the final results are loaded into R (a language for statistical analysis and plotting) for data visualization.
Further, the nucleotide sequence diversity analysis comprises the following steps: calculating the per-site π values (π_s) over all whole-genome sites among the individuals within the variety, and then using a Python script to compute the average π value, i.e., dividing the sum of π_s over all sites by the total number of sites, to obtain the nucleotide sequence diversity index of the variety.
Further, the diversity within the variety is calculated by combining a plurality of individuals using the following formula:
$$\pi = \sum_{ij} x_i x_j \pi_{ij} = 2 \sum_{i=1}^{n} \sum_{j=1}^{i-1} x_i x_j \pi_{ij}$$
wherein π is the average number of nucleotide differences per site between any two DNA sequences chosen at random; $x_i$ and $x_j$ denote the relative frequencies of the i-th and j-th sequences in the population; $\pi_{ij}$ denotes the number of nucleotide differences per site between the two sequences; and n denotes the number of individual plants in the variety.
In the first step, DNA is extracted from the material and quality-checked; from the samples that pass, 48 small-fragment DNA libraries with insert sizes of 300-500 bp are constructed, and the libraries are sequenced on an Illumina HiSeq 4000 platform in paired-end 150 bp (PE150) mode, by whole lane.
Further, the method also comprises a third step of rice variety homogeneity evaluation;
further, the third step specifically includes the following steps:
3.1, processing and counting rice variety sequencing data;
3.2, primarily evaluating the phylogenetic tree;
3.3, evaluating IBS distance and genetic distance;
3.4, quantifying the homogeneity of the rice variety.
Advantageous effects
The method of the invention completes rice variety homogeneity assessment, suspicious sample screening and even new variety identification by utilizing whole genome locus information and combining genetic polymorphism indexes;
genome resequencing is carried out on a plurality of single plants of the rice variety, so that variation information in a whole genome range can be obtained for evaluating the homogeneity of the rice variety;
the variety diversity (pi value) can be quantitatively obtained by calculating the nucleotide polymorphism of a plurality of single plants of the rice variety;
the variety's own diversity can be used to evaluate homogeneity or degree of variation between varieties as compared to unknown or known materials.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 shows the male and female parent sources of some of the selected varieties;
FIG. 2 is a phylogenetic tree between individuals within a rice variety based on genomic resequencing data;
FIG. 3 is an IBS distance heatmap; the darker the color of a region, the smaller the IBS distance and the more similar the samples;
FIG. 4 is a graph showing the distribution of genetic distances between individuals within a rice variety;
FIG. 5 shows the comparison of nucleotide polymorphisms of rice varieties.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is further described below with reference to examples.
Example 1
The embodiment provides a rice variety homogeneity evaluation method, which comprises the following steps:
1. planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling 5 individual plants and obtaining resequencing data for each, at a sequencing depth of more than 10×;
specifically, 10 common cultivated rice (O. sativa) varieties were selected: 5 indica varieties (Minghui 63, 9311, IR64, Zhongjiazao 17 and Huazhan) and 5 japonica varieties (Nipponbare, Wuyunjing 7, Qiuguang, Zhonghua 11 and Shennong 265) (Table 1). The pedigree information retrieved for these varieties is shown in FIG. 1, and there is little pedigree overlap or admixture among them; for example, the female parent of Shennong 265 is Liaojing 326 and its male parent is a hybrid of 1308 and 02428, whereas the female parent of Zhonghua 11 is a hybrid of Beijing 5 and Tetpu and its male parent is Fujin, so the two share little pedigree and interference from relatedness between varieties is controlled; under the same cultivation conditions, the growth stage, phenotypic characteristics and so on of the field plants were observed; finally, 5 individual plants were randomly selected from each variety, giving 50 samples in total; for each sample, 1-2 cm was cut from the tip of a fresh leaf, numbered according to the variety name and sent for sequencing; the germplasm used in this study came from the national variety bank of the China National Rice Research Institute;
Table 1. Names and types of the rice varieties from the national variety bank
DNA was extracted from the material and quality-checked; 48 small-fragment DNA libraries with insert sizes of 300-500 bp were constructed, and the libraries were sequenced on an Illumina HiSeq 4000 platform in paired-end 150 bp (PE150) mode, by whole lane;
2. carrying out whole genome variation detection, sample population structure analysis and construction of phylogenetic tree, IBS distance and genetic distance analysis and nucleotide sequence diversity (pi) analysis on the genome resequencing variety, and specifically comprising the following steps:
2.1, whole genome variation detection
Whole-genome variant sites were detected for the 10 resequenced varieties;
the raw data of the 10 varieties were decompressed with fastq-dump (v2.8.2), and the decompressed high-throughput sequencing data were filtered with NGSQCToolkit (v2.3.3) under default criteria to remove adapter sequences, non-ATGC bases and low-quality reads, thereby improving read reliability and reducing random errors introduced during sequencing;
the filtered reads were aligned to the japonica reference genome Nipponbare (IRGSP-1.0, Kawahara et al., 2013) using bowtie2 (v2.3.5.1);
the alignment results were merged, filtered and converted into BAM (Binary Alignment Map) format files with samtools (v1.3.1);
preliminary SNP detection was then performed on the BAM file of each sample with GATK (v3.7), after which the preliminary SNP information of all samples was integrated and multi-sample SNP detection was performed chromosome by chromosome;
in this embodiment, the variant detection results of the 12 chromosomes were merged, and sites were filtered by the following criteria: QUAL ≥ 30, DP ≥ 10, QD ≥ 2, minimum minor allele frequency (MAF) of 0.05, and maximum missing rate (max-missing) of 0.8;
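The site filter described above can be sketched in Python as follows. This is a minimal illustration rather than the scripts used in the embodiment: the `Site` record and the function names are hypothetical, sites are assumed to be biallelic, and the 0.8 threshold is interpreted (as an assumption) as requiring that at least 80% of samples have a called genotype.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# A diploid genotype as two allele indices (0 = reference); None = missing call.
Genotype = Optional[Tuple[int, int]]


@dataclass
class Site:
    """Hypothetical in-memory record for one variant site."""
    qual: float                     # site quality (QUAL)
    dp: int                         # read depth (DP)
    qd: float                       # quality by depth (QD)
    genotypes: Sequence[Genotype]   # one entry per sample


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called alleles (biallelic assumption)."""
    alleles = [a for g in genotypes if g is not None for a in g]
    if not alleles:
        return 0.0
    alt_freq = sum(1 for a in alleles if a != 0) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_filters(site: Site, min_qual: float = 30.0, min_dp: int = 10,
                   min_qd: float = 2.0, min_maf: float = 0.05,
                   min_call_rate: float = 0.8) -> bool:
    """Apply the QUAL, DP, QD, MAF and missing-rate thresholds to one site."""
    return (site.qual >= min_qual
            and site.dp >= min_dp
            and site.qd >= min_qd
            and minor_allele_frequency(site.genotypes) >= min_maf
            and call_rate(site.genotypes) >= min_call_rate)
```

A site that is monomorphic across the sampled plants fails the MAF threshold, so only positions that actually vary among the individuals contribute to the later distance and diversity calculations.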
2.2, sample population structure analysis and phylogenetic tree construction
the filtered VCF variant file was converted into PLINK binary BED format with vcftools (v0.1.17) and PLINK (v1.9), and the sample population structure was analyzed with fastStructure, with K ranging from 2 to 4;
the haplotype sequences of all samples were extracted with an in-house Perl script; the haplotype sequences were aligned and a phylogenetic tree was constructed with FastTree (v2.1.10), which uses an approximately-maximum-likelihood method, with default parameters;
2.3, IBS distance and genetic distance analysis
The IBS distance matrix among all individuals and the distribution of genetic distances between individuals within each variety were calculated with vcftools and PLINK;
first, the VCF (Variant Call Format) file was converted into PED and MAP format files; then the IBS distance between every pair of samples was calculated with PLINK (options: --genome --cluster --distance-matrix --allow-extra-chr --allow-no-sex), and the genetic distance between every pair of samples was calculated from the IBS distance (genetic distance = 1 - IBS); the final results were loaded into R and visualized with ggplot2;
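The relation genetic distance = 1 - IBS can be illustrated with a small Python sketch. It is not the PLINK implementation: genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2, with None for missing calls), IBS is taken as the proportion of alleles shared identical by state over the sites called in both samples, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Dosage = Optional[int]  # alternate-allele count 0, 1 or 2; None = missing


def ibs_similarity(g1: Sequence[Dosage], g2: Sequence[Dosage]) -> float:
    """Proportion of alleles shared identical by state over jointly called sites."""
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)   # 2, 1 or 0 alleles shared at this site
        compared += 2
    if compared == 0:
        raise ValueError("no sites called in both samples")
    return shared / compared


def genetic_distance_matrix(
    samples: Dict[str, Sequence[Dosage]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise genetic distance (1 - IBS), stored symmetrically."""
    names = list(samples)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = 1.0 - ibs_similarity(samples[a], samples[b])
        dist[(a, b)] = d
        dist[(b, a)] = d
    return dist
```

Two plants with identical genotypes at every jointly called site have IBS = 1 and therefore a genetic distance of 0, which is the behavior expected for individuals of a highly homogeneous variety.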
2.4 nucleotide sequence diversity (pi) analysis
The per-site π values (π_s) over all genome sites among the individuals within each variety were calculated with vcftools, and a Python script was then used to compute the average π value, i.e., the sum of π_s over all sites divided by the total number of sites, giving the nucleotide sequence diversity index of the variety;
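The averaging step (sum of per-site π divided by the number of sites) might look like the sketch below. The Python script used in the embodiment is not disclosed, so this is only an assumed equivalent: it expects a tab- or space-delimited table with a header line and three columns, CHROM, POS and PI, and the function name is hypothetical.

```python
import math
from typing import Iterable


def mean_site_pi(lines: Iterable[str]) -> float:
    """Average per-site pi over all sites in a CHROM/POS/PI table."""
    total = 0.0
    n_sites = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue                      # skip blank lines and the header
        value = float(fields[2])
        if math.isnan(value):
            continue                      # skip sites without a usable value
        total += value
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no sites with a pi value")
    return total / n_sites
```

The function accepts any iterable of lines, so an open file handle for the per-site table can be passed directly.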
the homogeneity value of the variety to be tested was calculated, with the diversity within the variety computed across the five individual plants using the following formula:
$$\pi = \sum_{ij} x_i x_j \pi_{ij} = 2 \sum_{i=1}^{n} \sum_{j=1}^{i-1} x_i x_j \pi_{ij}$$
wherein π is the average number of nucleotide differences per site between any two DNA sequences chosen at random; $x_i$ and $x_j$ denote the relative frequencies of the i-th and j-th sequences in the population; $\pi_{ij}$ denotes the number of nucleotide differences per site between the two sequences; and n denotes the number of individual plants in the variety;
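Using the symbols defined above (x_i, x_j, π_ij and n), and assuming the usual pairwise form of nucleotide diversity, π = 2 Σ_{i>j} x_i x_j π_ij, the calculation can be sketched in Python. The sketch treats each individual as one equally weighted sequence (x_i = 1/n) and requires aligned sequences of equal length; the function names are hypothetical, and this is not the script used in the embodiment.

```python
from itertools import combinations
from typing import Sequence


def per_site_differences(seq_a: str, seq_b: str) -> float:
    """pi_ij: number of nucleotide differences per site between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(sequences: Sequence[str]) -> float:
    """pi = 2 * sum over pairs i > j of x_i * x_j * pi_ij, with x_i = 1/n."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    x = 1.0 / n                       # equal relative frequency for every sequence
    total = 0.0
    for seq_a, seq_b in combinations(sequences, 2):
        total += 2.0 * x * x * per_site_differences(seq_a, seq_b)
    return total
```

With five plants per variety, the sum runs over the ten possible pairs of individuals; identical sequences give π = 0.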
the whole-genome nucleotide sequence diversity within each variety was calculated in sliding windows of 100 kb using vcftools (--window-pi, --indv) and Python scripts, and the distribution of individual differences within a variety was compared with the trend of genetic diversity between varieties, based on the sliding-window distribution of genome-wide heterozygosity for each individual in the variety; the results were visualized in R;
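A windowed summary of this kind can be sketched in Python as below. Several points are assumptions rather than details given in the text: windows are taken as non-overlapping (step equal to the 100 kb window size), the value of a window is the mean of the per-site π values falling inside it, positions are 1-based, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def windowed_pi(site_pi: Iterable[Tuple[str, int, float]],
                window_size: int = 100_000) -> Dict[Tuple[str, int], float]:
    """Mean per-site pi in each (chromosome, window start) bin."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for chrom, pos, pi in site_pi:
        # 1-based start coordinate of the window containing this position
        start = ((pos - 1) // window_size) * window_size + 1
        sums[(chrom, start)] += pi
        counts[(chrom, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The returned dictionary is keyed by chromosome and window start, which makes it straightforward to export as a table for plotting.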
the calculated within-variety homogeneity value can be compared with the values calculated for other varieties or materials, and the resulting difference in homogeneity values can provide a basis for variety identification or for judging within-variety differences;
3. the rice variety homogeneity evaluation specifically comprises the following steps:
3.1, processing and counting the sequencing data of rice varieties
Paired-end sequencing was carried out on all individuals of the 10 rice varieties, finally yielding sequencing data for 50 libraries, with a total data volume of at least 360 G, an average of 6.26 Gb of bases before filtering and 6.15 Gb after filtering, an average mapping rate of 0.95, and an average sequencing depth of about 16×;
a total of 4,025,683 raw variant sites were detected across all samples of the 10 resequenced varieties in this example, and 3,461,005 high-confidence SNP sites were obtained after filtering; the subsequent evaluation is based on these whole-genome variant sites;
3.2, phylogenetic tree preliminary evaluation
The 10 varieties measured in this example fall mainly into two branches, corresponding to indica and japonica rice; each variety clusters separately, and individuals within a variety cluster tightly together, with differences that are very small and clearly smaller than the genetic distances between varieties; compared with the indica varieties, individuals within the japonica varieties are genetically closer still (FIG. 2), to the point that the distances between individuals are hard to distinguish by eye; for example, all individuals of the variety Wuyunjing 7 (CX2-1 to CX2-5) cluster tightly on a single branch; the phylogenetic tree provides visual evidence for comparing indica/japonica type, relatedness and differences between varieties, so it can serve as an auxiliary means of judgment in rice variety homogeneity analysis; however, the tree only allows a qualitative judgment of genetic distance between varieties and between individuals within a variety, and the specific differences are difficult to quantify from it;
3.3 evaluation of IBS distance and genetic distance
The IBS distances between individuals (FIG. 3) directly reflect the sequence similarity between different individuals;
as the color shading in FIG. 3 shows, the IBS distances rank as follows: indica/japonica > indica/indica > japonica/japonica; within each resequenced variety measured in this example the color is relatively uniform, while the color differs strongly between varieties, which indirectly indicates good within-variety consistency; the IBS distance heatmap shows fairly directly both the uniformity within varieties and the degree of background difference at the subspecies, variety and individual levels;
statistics of the genetic distance (GD) derived from the IBS distances show that, among the 10 resequenced varieties measured in this example, the genetic distances between japonica and indica varieties differ markedly (FIG. 4), consistent with the IBS distance results, indicating that the genetic backgrounds of indica and japonica varieties are quite different;
in terms of genetic distance between individuals, the differences between individuals within a japonica resequenced variety are between 0.0002 and 0.001, and the differences between individuals within an indica resequenced variety are about 0.0003 to 0.0072 (Table 2);
it should be noted that, because the 10 resequenced varieties selected in this example come from the national variety bank and were strictly controlled in variety selection, phenotype identification, material handling, sequencing and so on, the homogeneity of the varieties is in theory very high; in other words, the differences between individuals within a variety are small; nevertheless, the genetic distance between individuals within the indica variety Huazhan (CX10) has a maximum of 0.0254 and a minimum of 0.0182, indicating that the genetic differences within this indica variety are larger;
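The per-variety minimum and maximum genetic distances discussed above can be summarized from a pairwise distance table with a short Python sketch. The data layout (a dictionary keyed by sample-name pairs, plus a sample-to-variety mapping) and the function name are hypothetical choices for illustration.

```python
from typing import Dict, Tuple


def within_variety_range(
    distances: Dict[Tuple[str, str], float],
    variety_of: Dict[str, str],
) -> Dict[str, Tuple[float, float]]:
    """Minimum and maximum pairwise genetic distance between plants of the same variety."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for (a, b), d in distances.items():
        # Ignore self-comparisons and pairs drawn from different varieties.
        if a == b or variety_of[a] != variety_of[b]:
            continue
        variety = variety_of[a]
        low, high = ranges.get(variety, (d, d))
        ranges[variety] = (min(low, d), max(high, d))
    return ranges
```

A variety whose range sits well above that of the other varieties of the same subspecies, as reported here for Huazhan (CX10), can then be singled out for closer inspection.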
Table 2. Pairwise genetic distance (GD) between individuals within each variety
3.4, quantitative determination of the homogeneity of Rice varieties
Analyzing the internal genome difference of the rice variety based on nucleotide sequence diversity (pi);
the calculation of π draws on the SNP site diversity of the whole genome, so it integrates genome-wide variation information well and allows that information to be quantified;
taking the 10 varieties measured in this example, the π values of the japonica and indica varieties each fall within a definite range, i.e., the nucleotide sequence diversity among individuals of the same variety lies near a certain value (FIG. 5), and this value differs between japonica and indica varieties: the average π of the 5 japonica resequenced varieties is about 0.0016, and that of the 5 indica resequenced varieties is about 0.0045 (Table 3);
if individuals of different varieties are mixed, the nucleotide sequence diversity of the mixed variety is obviously increased and is far greater than a single variety threshold value; also, a japonica type variety, the nucleotide sequence diversity of which is about 0.016 after the mixture of Japanese sunny (CX 1) and Zhonghua No. 11 (CX 4), and the variety pi corresponding to each of the two varieties is about 0.0015; the seeds are indica type varieties; sample nucleotide difference after IR64 (CX 8) and Huazhan (CX 10) are mixed reaches 0.063, sample mixed nucleotide sequence diversity of Shennong 265 (CX 5) and Zhongjia early 17 (CX 9) of indica type and japonica type varieties is as high as 0.153, and average pi value of Shennong type and japonica type varieties is different by two orders of magnitude;
these results show that a certain degree of nucleotide sequence diversity exists among individuals within a variety, and that the diversity of a mixture of different varieties far exceeds the inter-individual diversity within a variety. Therefore, the genome-wide nucleotide polymorphism of a variety obtained by resequencing can be used to test variety homogeneity: individuals of the same variety or line should fall within a certain degree of variation, whereas different varieties or lines differ much more;
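The effect described above, where mixing individuals of different varieties inflates nucleotide sequence diversity (π, written "pi" elsewhere), can be illustrated with a small sketch. It assumes biallelic SNPs coded as alternate-allele counts (0/1/2) in diploid plants and uses the standard unbiased per-site estimator, namely the fraction of chromosome pairs that differ at a site; the average is the sum of per-site values divided by the number of sites. The estimator choice, the function names, and the toy genotypes are assumptions made for illustration, not the exact procedure of the example.

```python
def site_pi(alt_count, n_chrom):
    """Per-site diversity: fraction of chromosome pairs that differ at a biallelic site."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    ref_count = n_chrom - alt_count
    return 2 * alt_count * ref_count / (n_chrom * (n_chrom - 1))


def average_pi(genotypes, n_sites=None):
    """Average pi over sites for a list of individuals.

    genotypes: one list per individual of alternate-allele counts (0/1/2) per site.
    n_sites: denominator; defaults to the number of sites in the matrix.
    """
    n_chrom = 2 * len(genotypes)  # diploid plants
    per_site = [site_pi(sum(column), n_chrom) for column in zip(*genotypes)]
    return sum(per_site) / (n_sites if n_sites is not None else len(per_site))


# Toy data (hypothetical): two nearly uniform varieties that differ from each other.
variety_a = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 2]]
variety_b = [[2, 2, 2, 0, 0, 0], [2, 2, 2, 0, 0, 0], [2, 2, 2, 0, 0, 0]]

pi_a = average_pi(variety_a)
pi_b = average_pi(variety_b)
pi_mixed = average_pi(variety_a + variety_b)  # pool individuals of both varieties
print(pi_a, pi_b, pi_mixed)
```

In this toy case the pooled sample has a clearly higher π than either variety alone, because sites that are fixed within each variety become polymorphic once the two are combined. The `n_sites` parameter is left explicit because the choice of denominator (variant sites only, or all callable genome positions) changes the absolute value of π.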
TABLE 3 Average nucleotide sequence diversity across sites, and the group averages, for the 10 resequenced varieties measured in this example
By using whole-genome site information combined with genetic polymorphism indices, the method of the invention accomplishes rice variety homogeneity assessment, screening of suspicious samples, and even identification of new varieties;
resequencing the genomes of multiple individual plants of a rice variety yields genome-wide variation information for evaluating the homogeneity of the variety;
the diversity of a variety (its π value) can be obtained quantitatively by calculating the nucleotide polymorphism among multiple individual plants of the variety;
by comparison with unknown or known materials, the variety's own diversity can be used to evaluate homogeneity or the degree of variation between varieties.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A rice variety homogeneity evaluation method, characterized by comprising the following steps:
step one, planting the variety to be tested in the field, observing the consistency of its field phenotype, and then sampling a plurality of individual plants and obtaining resequencing data for each;
step two, using the resequencing data to perform, on the resequenced variety, whole-genome variation detection, sample population structure analysis and phylogenetic tree construction, IBS distance and genetic distance analysis, and nucleotide sequence diversity analysis.
2. The method for evaluating the homogeneity of a rice variety according to claim 1, wherein the whole-genome variation detection comprises the following steps:
detecting whole-genome variation sites in the resequenced varieties;
decompressing the raw data of the sequenced varieties, filtering the decompressed high-throughput sequencing data, and removing adapter sequences, non-ATGC bases, and low-quality reads;
aligning the filtered reads to the japonica reference genome Nipponbare;
merging, filtering, and converting the alignment results into BAM format files;
performing preliminary SNP variant detection on the BAM file of each single sample, then integrating the preliminary SNP detection information of all samples and performing multi-sample SNP detection chromosome by chromosome;
integrating the variant detection results of all chromosomes and filtering the sites.
3. The method for evaluating the homogeneity of a rice variety according to claim 2, wherein the sample population structure analysis comprises the following steps: converting the filtered VCF variation file into BED format and analyzing the population structure of the samples, with K values ranging from 2 to 4.
4. The method for evaluating the homogeneity of a rice variety according to claim 3, wherein the construction of the phylogenetic tree comprises the following steps: extracting the haplotype sequences of all samples using a local Perl script; and aligning the haplotype sequences and constructing a phylogenetic tree from them using FastTree, based on the maximum likelihood method.
5. The method for evaluating the homogeneity of a rice variety according to claim 4, wherein the analysis of IBS distance and genetic distance comprises the following steps: calculating the IBS distance matrix among all individuals and the distribution of genetic distances between individuals within each variety;
first converting the VCF file into PED and MAP format files, then calculating the IBS distance between every pair of samples using PLINK, and calculating the genetic distance between every pair of samples from the IBS distance; and importing the final results into R for data visualization.
6. The method for evaluating the homogeneity of a rice variety according to claim 5, wherein the nucleotide sequence diversity analysis comprises the following steps: calculating the per-site π value (πs) at every whole-genome site across all individuals within the variety, and then using a Python script to calculate the average π value, that is, dividing the sum of πs over all sites by the total number of sites to obtain the nucleotide sequence diversity index of the variety.
7. The method for evaluating the homogeneity of a rice variety according to claim 6, wherein the variety internal diversity is calculated by combining a plurality of individual plants using the following formula:
wherein π is the average number of nucleotide differences per site between any two randomly selected DNA sequences; x_i and x_j denote the relative frequencies of the i-th and j-th sequences in the population; π_ij denotes the number of nucleotide differences per site between the two sequences; and n denotes the number of individual plants in the variety.
8. The method for evaluating the homogeneity of a rice variety according to claim 7, wherein in step one, DNA is extracted from the materials and tested, a small-fragment DNA library with an insert length of 300-500 bp is constructed for each sample that passes testing, and the library is sequenced on the Illumina HiSeq 4000 platform in paired-end 150 bp (PE150) whole-lane mode.
9. The method for evaluating the homogeneity of a rice variety according to claim 8, further comprising a third step of evaluating the homogeneity of the rice variety.
10. The method for evaluating the homogeneity of a rice variety of claim 9, wherein the third step specifically comprises the steps of:
3.1, processing and counting rice variety sequencing data;
3.2, preliminary evaluation using the phylogenetic tree;
3.3, evaluating IBS distance and genetic distance;
3.4, quantifying the homogeneity of the rice variety.
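The read-filtering step recited in claim 2 (removing adapter sequences, reads containing non-ATGC bases, and low-quality reads) can be sketched as follows. This is a minimal illustration only: the exact-match adapter trimming, the Phred+33 encoding, the thresholds `min_mean_q` and `min_len`, and all function names are assumptions, and a production pipeline would normally rely on a dedicated trimming tool.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    return (seq, qual) if pos == -1 else (seq[:pos], qual[:pos])


def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records, adapter, min_mean_q=20, min_len=30):
    """Yield (name, seq, qual) for reads that pass all three filters."""
    for name, seq, qual in records:
        seq, qual = trim_adapter(seq.upper(), qual, adapter)
        if not seq or len(seq) < min_len:
            continue  # empty or too short after adapter removal
        if set(seq) - set("ACGT"):
            continue  # contains N or another non-ATGC symbol
        if mean_phred(qual) < min_mean_q:
            continue  # low-quality read
        yield name, seq, qual


# Toy reads (hypothetical); the adapter is a commonly used Illumina adapter prefix.
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),      # clean read, Q40 throughout
    ("r2", "ACGTNCGTAC", "IIIIIIIIII"),      # contains an N
    ("r3", "ACGTACGTAC", "##########"),      # Q2 throughout
    ("r4", "ACGTACGT" + ADAPTER, "I" * 21),  # adapter read-through
]
kept = list(filter_reads(reads, ADAPTER, min_mean_q=20, min_len=8))
print([name for name, _, _ in kept])
```

Only the clean read and the adapter-trimmed read survive in this toy run; the surviving reads would then be aligned to the Nipponbare reference as described in claim 2.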
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310832471.8A CN116855596A (en) | 2023-07-08 | 2023-07-08 | Rice variety homogeneity evaluation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116855596A (en) | 2023-10-10 |
Family
ID=88228148
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310832471.8A Pending CN116855596A (en) | 2023-07-08 | 2023-07-08 | Rice variety homogeneity evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116855596A (en) |
- 2023-07-08: CN application CN202310832471.8A (CN116855596A), active, pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||