WO2022160700A1 - Genotype identification of multi-parent crop on basis of high-throughput whole genome sequencing - Google Patents
- Publication number
- WO2022160700A1 (PCT/CN2021/115146)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- genotype
- progeny
- parent
- snp
- parents
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Definitions
- The invention relates to the technical field of biological information processing, in particular to genotype identification of multi-parent crops based on high-throughput whole-genome sequencing. More specifically, the present invention provides a method and device for identifying the genotypes of multi-parent crops from high-throughput whole-genome sequencing data.
- Genome sequencing opened the door to high-throughput genotyping. Initially this was done with microarray chip technology, which detects single nucleotide polymorphisms (SNPs) by hybridizing genomic DNA to oligonucleotides on a gene chip. Because hundreds to thousands of markers can be detected in a single hybridization, this method greatly improves genotyping efficiency [1]. It has been applied to model organisms such as human, Arabidopsis and rice [2-4]. Although the goal of high throughput has been achieved, microarray-based approaches have serious limitations: designing, producing, and using microarrays is laborious, time-consuming, and costly.
- SNPs: single nucleotide polymorphisms
- Next-generation sequencing technology has brought a methodological leap forward in genotyping and genetic mapping.
- New sequencing technologies not only increase sequencing throughput by several orders of magnitude, but also allow parallel sequencing of many samples [5-6] . Advances in these technologies have paved the way for the development of sequencing-based high-throughput genotyping methods.
- The new genotyping method combines speed and low cost with high-density marker coverage, high accuracy and high resolution, and is applicable to more mapping populations and species for comparative genomics and genetic map construction.
- The object of the present invention is to provide a method and device for identifying the genotypes of multi-parent plants with rapid analysis and accurate results, so that the genotypes of populations constructed from multiple parents can be analyzed quickly, accurately and reliably.
- a method for identifying the genotype of a multi-parent plant comprising:
- in step (c), the analysis of recombination break sites is performed based on the SNP "word string".
- step (c) includes analyzing the recombination break sites, so as to obtain the analysis result of the recombination break sites,
- in step (s1), all gaps between the SNPs are removed, no matter how large the actual distance between two adjacent SNPs is.
- in step (s1), the SNPs constituting the word string are sites at which the parents are homozygous.
- in step (s1), the method further includes: first screening the SNP sites included in the analysis, thereby excluding any SNP site at which a parent is heterozygous.
- in step (s2), scoring is performed according to the scoring rules in Table A.
- in step (s3), for each chromosomal region of the progeny, the genotype of that region is determined based on the score value or score curve of each parent.
- in step (s3), the genotype of each chromosomal region is determined based on the score value and the standard deviation.
- the genotype of a chromosomal region is determined as follows: if a parent A has a high score value close to the full score (≥80% of the full score, preferably ≥80% of the full score) in the region, the score value of that parent in the region is stable without much numerical fluctuation, and the score values of the remaining parents are low (≤50% of the full score, preferably ≤30% of the full score) or fluctuate strongly, then the genotype of the chromosomal region is determined as the genotype of parent A.
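- The decision rule for a chromosomal region can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `call_region`, the data layout, and the stability cutoff `max_std` are hypothetical and do not come from the patent text; only the 80% and 50% thresholds are taken from it.

```python
from statistics import mean, pstdev

def call_region(curves, full_score, high=0.8, low=0.5, max_std=0.05):
    """Assign a region to one parent, or return "unknown".

    curves: dict mapping parent name -> list of window scores in the region.
    full_score: maximum possible window score.
    max_std: hypothetical cutoff for a "stable" curve (not given in the text).
    """
    # Express mean and standard deviation as fractions of the full score.
    stats = {
        name: (mean(values) / full_score, pstdev(values) / full_score)
        for name, values in curves.items()
    }
    # Candidate parents: score is high and stable.
    winners = [n for n, (m, s) in stats.items() if m >= high and s <= max_std]
    if len(winners) != 1:
        return "unknown"
    winner = winners[0]
    # Every other parent must score low or fluctuate strongly.
    for name, (m, s) in stats.items():
        if name != winner and not (m <= low or s > max_std):
            return "unknown"
    return winner
```

- For example, `call_region({"A": [195, 198, 196], "B": [40, 60, 50]}, 200)` assigns the region to parent A, while two parents that both score high yield "unknown".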
- in step (s3), the method includes: sliding the window across the SNP sites of the whole genome to obtain the score value of each parent on each chromosome, and drawing the score curve of each parent with the score value as the ordinate and the position of each sliding window on the chromosome as the abscissa.
- step (s3) includes the sub-step of evaluating heterozygous regions:
- in step (s3), the degree of similarity between the progeny and each parent in a segment is quantified, and the genotype of each segment is determined according to the numerical characteristics (value level and standard deviation) of the score curve of each parent.
- in step (s3), the genotype assessment is performed in the following manner:
- the method further includes: if the genotypes on both sides of an unknown region are the same, assigning that genotype to the unknown region; and if the genotypes on the two sides are different, treating the middle position of the unknown region as a recombination breakpoint and assigning each half of the unknown region the genotype of its adjacent side.
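- The handling of unknown regions can be sketched as below. The segment representation (sorted, contiguous tuples with inclusive integer coordinates), the label "unknown", and the name `resolve_unknown` are assumptions made for illustration; unknown regions at a chromosome end or next to another unknown region are simply left unchanged in this sketch.

```python
def resolve_unknown(segments):
    """Resolve "unknown" segments from their flanking genotypes.

    segments: list of (start, end, genotype) tuples, sorted and contiguous,
    with inclusive integer coordinates.
    """
    resolved = []
    last = len(segments) - 1
    for i, (start, end, genotype) in enumerate(segments):
        if genotype != "unknown" or i == 0 or i == last:
            resolved.append((start, end, genotype))
            continue
        left = segments[i - 1][2]
        right = segments[i + 1][2]
        if "unknown" in (left, right):
            # Adjacent unknown regions are not handled in this sketch.
            resolved.append((start, end, genotype))
        elif left == right:
            # Same genotype on both sides: fill the region with it.
            resolved.append((start, end, left))
        else:
            # Different genotypes: the midpoint is taken as the breakpoint.
            midpoint = (start + end) // 2
            resolved.append((start, midpoint, left))
            resolved.append((midpoint + 1, end, right))
    return resolved
```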
- the progeny is a multi-parent plant.
- n is 3-6, more preferably 3, 4 or 5.
- the sequencing data is selected from the group consisting of genome sequencing data, RNA sequencing data, or a combination thereof.
- sequencing data are files in fastq format.
- the sliding window size is 170-500 consecutive SNP sites, preferably 200-400 consecutive SNP sites;
- sequencing depth of the sequencing data is 0.1x-10x, preferably 0.2x-5x.
- the sequencing depth of the sequencing data is ≥1x, preferably 1-5x, more preferably 1.5-3x.
- each parental score curve is obtained.
- the SNP site is used to determine the genotype of the individual
- in step (b), the sequencing data (such as fastq files) are aligned and processed with the bwa and GATK software to obtain SNP information.
- the SNP site information includes location information and genotype information.
- the SNP site used for judging the genotype meets the following requirements:
- the SNP sites cover the whole genome as fully as possible, with no regions lacking SNP sites;
- the position information and genotype information of the SNP are known for the two corresponding parents and the offspring, and the locus should be deleted if the information is unknown for any of the three.
- in step (c), the evaluation result of each SNP of the progeny is recorded in the rlt file, which records the genotype determination for each SNP position;
- the distribution information of the recombination breakpoints on each chromosome of the whole genome of the progeny is recorded in a bin file; for rice, the bin file records the distribution of the recombination breakpoints on the 12 chromosomes of the whole genome.
- in step (c), genotype reading and recombination break site determination are performed by the SNPwindow script.
- in step (d), genotype map construction is performed on the m individuals of the progeny at the same time.
- in step (d), the recombination map is constructed by the SNPwindow script, and the genotype map of each progeny individual is drawn by the SNP2png script.
- step (d) also includes aligning the recombination maps of the individuals through the Bin2MCD script to generate a recombination bin map.
- the resolution of the recombination bin map is one bin per 5-200 kb, preferably one bin per 10-100 kb.
- the method further comprises: processing the recombination bin map to obtain the genetic map of the progeny.
- the method further comprises: performing QTL analysis on the genetic map.
- the method further includes: performing a visual analysis on the genotypes of the entire population of parents and progeny, generating genotype data, and constructing a linkage map based on the genotype data.
- the plants include crops, preferably grass crops.
- the crops include rice, wheat, soybean, and tobacco.
- a data analysis device for identifying the genotypes of multi-parent plants comprising:
- a data input module for inputting the data to be analyzed, the data including: the sequencing data Df of the progeny plant to be identified, and the sequencing data Dp of the parent plants corresponding to the progeny plant;
- a multi-parental plant genotype identification module is configured to perform the method described in the first aspect of the present invention, thereby obtaining the genotype identification result of the progeny;
- the described multi-parent plant genotype identification module includes:
- the SNP site information analysis submodule is configured to determine the SNP site information of the parent and progeny based on the sequencing data Df and the sequencing data Dp;
- a chromosomal recombination breakpoint analysis submodule, which is configured to judge the genotype of the progeny based on the SNP site information, so as to obtain the evaluation result of each SNP of the progeny and the position information of the recombination breakpoints across the whole genome of the progeny;
- a genotype map construction submodule, which is configured to construct and/or draw a genotype map of the progeny based on the SNP evaluation result information of the progeny and the position information of the whole-genome recombination breakpoints, so as to obtain the genotyping results of the multi-parent progeny plants.
- the plants include crops, preferably grass crops.
- the output module includes: a display, a printer, a pad, and the like.
- Figure 1 shows the simulated two-parental material genome-wide recombination breakpoints.
- Figure 2 shows the simulated four-parental material genome-wide recombination breakpoints.
- Figure 3 shows genotyping of two-parental mock progeny using a SNP-based sliding window approach.
- Figure 4 shows the genotyping of two-parental mock progeny using the SEG-Map software method.
- Figure 5 shows genotyping of four-parental mock progeny using a SNP-based sliding window approach.
- Figure 6 shows the effect of different sliding window sizes on the accuracy of genotype determination results.
- Figure 7 shows the effect of different sequencing depths on the accuracy of genotype determination results.
- Figure 8 shows the analysis framework flow of the SNP-based sliding window genotyping method.
- Figure 9 shows gene mapping using the SNP2png script.
- Figure 10 shows a genotype identification ensemble plot of the rice population.
- Figure 11 shows the genotype table of the recombinant inbred line individual recombination segment map in one example.
- Figure 12 shows the SNP "string" with window 15.
- Figure 13 shows the four parental score curves of rice chromosome 3 mock progeny.
- Figure 14 shows two parental score curves for rice chromosome 11 mock progeny.
- Figure 15 shows the scores of each parent when it is determined that the parents are three homozygous genotypes in one embodiment.
- Figure 16 shows the scores of each parent when it is determined that the two parents are heterozygous genotypes in one embodiment.
- Figure 17 shows the scores of each parent when the genotype is determined as unknown in one embodiment.
- Figure 18 shows subsequent genotype determination of unknown regions in one embodiment.
- Figure 19 shows a graph of the genotyping of single individuals in the DH population.
- Figure 20 shows a graph of the genotyping ensemble for the DH population.
- Figure 21 shows genotyping of three parental material in one example.
- Figure 22 shows SEG-Map identification of three parental material in one example.
- Figure 23 shows the genotyping of four parental mimics in one example.
- Figure 24 shows the true genotypes of the four-parent mock material in one example.
- After extensive and in-depth research, the present inventors have for the first time developed a method for faster and more accurate genotype identification, thereby enabling more effective genetic mapping and genome analysis.
- the method of the present invention is particularly suitable for genotyping and identification of low coverage sequenced multi-parent populations.
- the genotype information of the actual SNPs of the multiple parents and the progeny in a given segment is read directly, the degree of similarity between the progeny and each parent in that segment is then quantified, and the genotype is determined according to the numerical characteristics (value level and standard deviation) of the score curve of each parent, forming an efficient, simplified and accurate method for genotype identification of multi-parent plants (or multi-parent crops).
- the present invention has been completed on this basis.
- the present inventors developed a high-throughput method to identify genotypes of recombinant populations containing multiple parents based on whole-genome low-coverage sequencing data generated by second-generation sequencing technology.
- the inventors designed a "sliding window" method that determines the genotype of a segment by comprehensively analyzing the genotypes of multiple single nucleotide polymorphisms (SNPs) in a local region of the genome, and then determines the specific positions of the recombination break sites to construct a fine recombination map of the multi-parent population.
- the inventors constructed simulated whole-genome sequencing data of biparental and multi-parental populations, constructed a genetic linkage map using this method, and finally compared the genotypes obtained by identification with the true genotypes of the simulated data.
- the genotype identification accuracy for the biparental population reaches 89.61%, which is similar to the accuracy of the inventors' SEG-Map software method on the biparental population (89.32%).
- the genotype identification method newly developed by the present inventors has an identification accuracy of 92.10% for multi-parent populations, which cannot be achieved by SEG-Map software or methods.
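- The text does not state how the identification accuracy was computed. One plausible reading, shown here purely as an assumption, is the fraction of SNP sites whose called genotype agrees with the true genotype of the simulated data; the function name `identification_accuracy` is hypothetical.

```python
def identification_accuracy(called, truth):
    """Fraction of sites where the called genotype equals the true genotype.

    called, truth: equal-length sequences of genotype labels, one per SNP site.
    """
    if len(called) != len(truth):
        raise ValueError("called and truth must have the same length")
    if not truth:
        raise ValueError("at least one site is required")
    matches = sum(1 for c, t in zip(called, truth) if c == t)
    return matches / len(truth)
```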
- the method of the invention can effectively and quickly analyze the genotype of each individual in the population, plays a key guiding role in genome design and breeding, and can also provide fast and accurate genotype data for QTL mapping of multi-parent populations of different crops.
- the present inventors tested the method using the real rice RIL genetic population, used high-throughput sequencing-based genotype identification, and finally obtained a fairly good high-precision recombination map.
- this genotype identification method based on low-coverage genome sequencing can replace the traditional marker-based genotype identification method, and provides a powerful tool for large-scale gene discovery research and for solving more complex biological questions.
- the method of the invention is more suitable for genotype identification of multi-parent backcross populations that have undergone low coverage sequencing, provides accurate genotype support for QTL mapping, and is also helpful for molecular design breeding applications of multi-parent populations.
- the terms "containing" or "including" can be open, semi-closed, or closed. In other words, the terms also cover "consisting essentially of" and "consisting of."
- the term "biparental" indicates that two parents are involved.
- the term "multi-parent" indicates that 3 or more parents are involved.
- the term "multi-parent plant" refers to a plant involving 3 or more parents, e.g., progeny plants (e.g., crops) involving 3, 4, or 5 parents.
- the invention provides a method for identifying multi-parent crop genotypes.
- the method of the invention is a genotype identification method of the sliding window of the SNP site.
- the data processing is optimized.
- the optimized process can directly analyze and process the single-end or paired-end short-read sequencing results generated by next-generation sequencing technology, and finally construct the genetic map of the recombinant population.
- the genome-wide SNPs of both parents need to be identified before proceeding with the data analysis pipeline.
- the SNPs can be identified by high-coverage whole-genome deep sequencing, from existing genomic SNP information in the rice haplotype map, or by low-coverage whole-genome sequencing combined with imputation of missing genotypes (SNPs). Since SNPs between two parental varieties can be identified quickly and cost-effectively, sequencing-based genotype identification of a recombinant population mainly relies on the subsequent analysis, including reading genotypes, determining recombination breakpoints, and constructing genetic linkage maps.
- the first step consists of several tasks that can be processed simultaneously. Individuals and parental material in a certain number of recombinant populations are subjected to second-generation high-throughput sequencing simultaneously. The obtained fastq files were aligned and processed by bwa and GATK software to obtain high-quality SNP information.
- the SNP loci used for the final determination of genotype should meet the following requirements:
- SNP sites cover the whole genome as much as possible, and there will be no deletions in certain regions.
- the position information and genotype information of the SNP are known for the two corresponding parents and the offspring; the locus should be deleted if the information is unknown for any one of the three.
- the rice parent is an inbred homozygous line with essentially no heterozygous loci in its genome. Therefore, a heterozygous SNP locus found in a parent is generally considered unreliable, and SNP sites at which either parent is heterozygous can be deleted.
- a python script SNPwindow can be used to judge the genotype of the progeny.
- the script output will have two files, the rlt file and the bin file.
- the rlt file records the genotype determination of each SNP position
- the bin file records the distribution of recombination breakpoints on the 12 chromosomes in the whole genome.
- a genotype map can first be drawn from the rlt and bin files with the perl script SNP2png; the image is output in PNG format.
- the map is drawn based on the genotype information of the determined SNP loci and the position information of the whole-genome recombination breakpoint. Different colors in the figure represent different genotype types.
- a perl script can also be used to visualize the genotype profile of the entire population. Programs and scripts used in the analysis process are shown in italics and form a series of analysis steps. The genotype data generated at the end of the analysis process can be directly used in other software (including MapMaker and JoinMap) to construct linkage maps.
- the bins are reorganized when analyzing the final output data produced by the software, usually at a resolution of one bin per 100kb, or even one bin per 10kb.
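- As a rough illustration of expressing genotypes at a fixed bin resolution, the sketch below assigns each fixed-width bin the genotype of the segment covering the bin midpoint. This is a simplification under stated assumptions: the behavior of the Bin2MCD script is not described in the text, and the function name `bin_genotypes`, the 0-based half-open coordinates, and the midpoint rule are all hypothetical.

```python
def bin_genotypes(segments, chrom_length, bin_size=100_000):
    """Summarise genotype segments as fixed-width bins.

    segments: list of (start, end, genotype), 0-based half-open coordinates.
    Returns a list of (bin_start, bin_end, genotype) tuples.
    """
    bins = []
    for bin_start in range(0, chrom_length, bin_size):
        bin_end = min(bin_start + bin_size, chrom_length)
        midpoint = (bin_start + bin_end) // 2
        genotype = "unknown"  # used when no segment covers the midpoint
        for start, end, seg_genotype in segments:
            if start <= midpoint < end:
                genotype = seg_genotype
                break
        bins.append((bin_start, bin_end, genotype))
    return bins
```

- Passing `bin_size=10_000` gives the finer resolution of one bin per 10 kb mentioned above.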
- the genotype results of the mapping population can be imported into programs such as MapMaker [16] or JoinMap [32] for genetic map construction. With the genetic map available, QTL analysis is performed.
- the genetic map produced by the method of the present invention is much finer in scale than maps produced by most conventional molecular markers.
- the method of the present invention relates to judging the recombination break site, and its detailed process comprises the following steps:
- Step 1: Construct the SNP "string".
- the SNPs on the 12 chromosomes become 12 consecutive word strings (see Figure 12).
- the blue in the figure represents the genotype of parent 1, and the red represents the genotype of parent 2.
- color legend: blue, homozygous genotype of parent 1; red, homozygous genotype of parent 2; yellow, heterozygous genotype.
- the genome of cultivated rice parent material is highly homozygous, and in rice populations derived from multiple generations of selfing the genome is also largely homozygous, with heterozygous regions at only some chromosomal locations. Therefore, the SNP loci included in the analysis are first screened manually, and any SNP locus at which a parent is heterozygous is excluded, because such loci cannot be accurately judged and scored. In addition, if the sequencing depth of the progeny is not very high, the SNP loci that are heterozygous in the progeny can be filtered out, because heterozygous calls made at low depth are unreliable and are likely to be misjudgments caused by sequencing errors.
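- Step 1 can be sketched as follows, assuming VCF-style genotype strings ("0/0", "0/1", "1/1", "./." for missing). The record type `Snp` and the function `build_snp_string` are hypothetical names introduced for illustration and are not part of the SNPwindow script. Sites are ordered by position and physical gaps are ignored, so only the order of the retained SNPs matters.

```python
from typing import NamedTuple, Tuple

class Snp(NamedTuple):
    chrom: str
    pos: int
    parents: Tuple[str, ...]  # one genotype per parent, e.g. ("0/0", "1/1")
    progeny: str

HOMOZYGOUS = {"0/0", "1/1"}
MISSING = "./."

def build_snp_string(snps, drop_het_progeny=False):
    """Return the ordered SNP "string" for one progeny individual."""
    kept = []
    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        # Exclude sites where any parent is heterozygous or unknown.
        if any(g not in HOMOZYGOUS for g in snp.parents):
            continue
        # Exclude sites where the progeny genotype is unknown.
        if snp.progeny == MISSING:
            continue
        # Optionally exclude heterozygous progeny calls (low-depth data).
        if drop_het_progeny and snp.progeny not in HOMOZYGOUS:
            continue
        kept.append(snp)
    return kept
```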
- Step 2: Score each parent within one window
- the scores of all SNP sites in a sliding window are calculated, and the total score of each parent is calculated as the score of the parent in the chromosome position of the sliding window.
- the degree of conformity of the offspring with the parent is measured according to the typing of each parent.
- the scoring rules are as shown in Table A or similar scoring rules.
- the scoring rules are formulated according to the genetic laws of organisms.
- genotype scoring rules in the present invention are further described below.
- the score of the offspring for any parent consists of three parts: 1. loci at which the offspring has the same SNP genotype as the parent; 2. loci at which the offspring differs from the parent but conforms to Mendelian inheritance; 3. loci that do not conform to Mendel's law of inheritance or are misjudged for various possible reasons.
- the number of SNP loci in the offspring to be tested that are the same as in the parent is m;
- the number of loci that do not conform to Mendel's law of inheritance, plus the number of loci misjudged for various possible reasons, is e.
- $s_1$ is the score assigned to a SNP site at which the progeny and the parent are the same.
- $s_2$ is the score assigned to a locus that differs from the parent but conforms to Mendel's law of inheritance.
- in a continuous SNP window of size N, there are i parents to be determined.
- the genotypes of the progeny and the parent at locus k are $g_k$ and $g'_k$, respectively.
- the genotypes of the genes of the pure line parents of rice are generally 0/0, 0
- the genotypes of the offspring are generally 0/0, 0
- the allele coded 2 generally occurs at low frequency and is not considered for the time being.
- the probability that the offspring matches its genotype is:
- the genotype of a given region of the offspring is assigned to the parent genotype with the highest coincidence probability, that is, the maximum coincidence probability among the i parents is taken:
- $P_{\max} = \max\{P_1, P_2, \ldots, P_i\}$
- the following table is used for genotype scoring.
- the standard deviation std is calculated by sliding window on the continuous parent score value.
- the parent selected for a region is the one whose score S is the highest and whose standard deviation is the smallest.
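The window scoring of Step 2 can be illustrated with a minimal Python sketch. Table A is not available in this text, so the values `S1 = 2` and `S2 = 1`, the rule that "Mendelian-consistent" means sharing at least one allele with the parent, and the function names `site_score` and `window_scores` are all assumptions made for illustration only.

```python
# Hypothetical scoring values; the actual values come from Table A.
S1 = 2  # progeny genotype identical to the parent
S2 = 1  # differs from the parent but still shares an allele with it


def site_score(progeny, parent, s1=S1, s2=S2):
    """Score one SNP site of the progeny against one homozygous parent."""
    if progeny == parent:
        return s1
    # Assumption: a progeny call that carries the parental allele is treated
    # as compatible with Mendelian inheritance from this parent.
    if set(progeny.split("/")) & set(parent.split("/")):
        return s2
    return 0


def window_scores(progeny, parent, size, step=1):
    """Sum the site scores inside each window of `size` consecutive SNPs."""
    per_site = [site_score(g, p) for g, p in zip(progeny, parent)]
    return [sum(per_site[i:i + size])
            for i in range(0, len(per_site) - size + 1, step)]


progeny = ["0/0", "0/0", "0/1", "1/1", "1/1"]
parent1 = ["0/0"] * 5
parent2 = ["1/1"] * 5
print(window_scores(progeny, parent1, size=3))
print(window_scores(progeny, parent2, size=3))
```

In this toy input the score of parent 1 falls and the score of parent 2 rises as the window moves along the chromosome, which is the pattern a recombination breakpoint would be expected to produce.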
- Step 3 Determine the genotype of the chromosome region according to the score value
- the score value of each parent on each chromosome can be obtained. Take the score as the ordinate, and draw the score curve of each parent with the position of each sliding window on the chromosome as the abscissa.
- genotype judgment of each chromosome is based on the characteristics of different parental score curves.
- a sliding window score was performed on the progeny of the simulated four-parent source, and the score curves of the four parents were drawn according to the score values of the four parents.
- the yellow curve (parent 4) has a high score value, close to the full score, in this region, and the score value of this parent is quite stable in this region, without large fluctuations.
- the numerical fluctuation of the score is measured by the standard deviation in statistics.
- parent 4 has a high score value and a small standard deviation in this region, while the score values of the other three parents fluctuate in the range of 0-200 with a high standard deviation, so this region can be judged to be the homozygous genotype of parent 4.
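The judgment described here, a high and stable score for one parent against fluctuating scores for the others, can be sketched in Python as follows. The thresholds `min_mean` and `max_std` and the function names are hypothetical; the text does not state numeric cut-offs.

```python
from statistics import mean, pstdev


def secondary_window(scores, size):
    """Mean and standard deviation over `size` consecutive window scores."""
    return [(mean(scores[i:i + size]), pstdev(scores[i:i + size]))
            for i in range(len(scores) - size + 1)]


def call_region(score_curves, min_mean, max_std):
    """Return the parent whose score curve is both high and stable.

    score_curves maps a parent name to its window scores inside the region.
    min_mean and max_std are hypothetical thresholds chosen by the user.
    """
    best = None
    for parent, scores in score_curves.items():
        avg, spread = mean(scores), pstdev(scores)
        if avg >= min_mean and spread <= max_std:
            if best is None or avg > best[1]:
                best = (parent, avg)
    return best[0] if best else "unknown"


curves = {
    "parent1": [20, 180, 60, 150],     # fluctuates widely
    "parent4": [390, 395, 398, 392],   # high and stable
}
print(call_region(curves, min_mean=350, max_std=10))
```

A region in which no parent passes both thresholds is returned as "unknown" so that it can be resolved later from its neighbouring regions.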
- the offspring genotypes of different regions of the 12 chromosomes can be determined based on the score values.
- the rectangular bar corresponding to true in the figure corresponds to the real genotype information of the simulated offspring of each chromosome segment, while the rectangular bar corresponding to judge represents the offspring genotype determined by the method of the present invention, and the information of the two basically matches.
- the judgment of the heterozygous region is illustrated by the genotype judgment of the simulated progeny of the two parents. According to the principle of genetics, even if it is a hybrid progeny derived from multiple parents, its parental origin in a certain chromosomal region is at most two parents. Therefore, it can be judged whether this region is a heterozygous region according to the score curve of the two parents in this region.
- the genotype identification of the simulated offspring is performed; the rectangular bar corresponding to true shows the real genotype information of the simulated offspring for each chromosome segment, and the rectangular bar corresponding to judge shows the genotype of the offspring determined by the method of the present invention. Similarly, when the score of one parent is high with a small standard deviation, while the score of the other parent fluctuates considerably with a large standard deviation, the region is judged to be the homozygous genotype of the former (orange or blue area).
- one of the core ideas of the method of the present invention is to read directly the genotype information of the real SNPs of the parents and the progeny in a given segment, and then to quantify the degree of similarity between the progeny and each parent in this segment. The numerical characteristics of the score curves form a relatively simplified analysis model, from which the genotype of each segment is determined.
- the criteria for the judgment of the present invention mainly include the following situations:
- the region to be judged is first defined as "unknown", and the genotype of this region is determined by the genotypes of the regions on both sides.
- if the genotypes on both sides are the same, the region is assigned that genotype; if the genotypes on both sides are different, the middle position of the region is regarded as a recombination breakpoint, and the two halves of the region are assigned the genotypes of their respective sides.
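The rule for "unknown" regions can be written as a short Python sketch. The segment representation, a sorted list of `(start, end, genotype)` tuples with half-open coordinates, and the function name `resolve_unknown` are assumptions for illustration.

```python
def resolve_unknown(segments):
    """Assign 'unknown' segments from the genotypes on both sides.

    segments is a sorted, contiguous list of (start, end, genotype) tuples.
    """
    resolved = []

    def push(start, end, genotype):
        # Merge with the previous segment when the genotype is unchanged.
        if resolved and resolved[-1][2] == genotype:
            resolved[-1] = (resolved[-1][0], end, genotype)
        else:
            resolved.append((start, end, genotype))

    for i, (start, end, genotype) in enumerate(segments):
        if genotype != "unknown":
            push(start, end, genotype)
            continue
        left = resolved[-1][2] if resolved else None
        right = segments[i + 1][2] if i + 1 < len(segments) else None
        left = None if left == "unknown" else left
        right = None if right == "unknown" else right
        if left is None and right is None:
            push(start, end, "unknown")          # nothing to infer from
        elif left is None or right is None or left == right:
            push(start, end, left or right)      # same genotype on both sides
        else:
            middle = (start + end) // 2          # breakpoint at the midpoint
            push(start, middle, left)
            push(middle, end, right)
    return resolved


print(resolve_unknown([(0, 100, "A"), (100, 200, "unknown"), (200, 300, "B")]))
```

When the flanking genotypes differ, the output contains a single breakpoint at the midpoint of the former unknown region.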
- genotype determination is performed through a secondary sliding window.
- a sliding window was performed on the genotype of the SNP, and a parental score value in each window was counted.
- the determination of the final genotype depends on the score value and the size of the standard deviation obtained by the secondary sliding window, and the determination of the genotype is carried out by the highest probability that a certain segment of the offspring belongs to a certain parent.
- genotype determination can be performed faster and more accurately by using the secondary sliding window for genotype determination.
- a schematic example of a secondary sliding window is as follows:
- the present invention also provides an identification device or an analysis device for multi-parent crop genotypes for performing the method of the present invention.
- the device includes:
- a multi-parent plant genotype identification module is configured to perform the method of the present invention, thereby obtaining the genotype identification result of the progeny;
- the present invention provides, for the first time, a multi-parent crop genotype identification method based on high-throughput sequencing data. Before the present invention, there was no systematic method for identifying the genotypes of crops derived from multiple parents.
- the high-throughput genotype identification method of the present invention can greatly simplify and accelerate the genetic mapping of quantitative traits in crops [37-39,20] .
- the theoretical method of the present invention can better cooperate with the multi-parent population for genotype identification, improve the accuracy and efficiency of QTL mapping, and make full use of the abundant genetic variation existing in the multi-parent population. It also contributes to the improvement of crop genetic quality and the design of molecular breeding.
- the present invention can be used for the acquisition of molecular markers closely linked to important agronomic trait genes, the efficient screening of offspring in the breeding process, the fine identification of genotype maps of improved varieties, etc. It provides a fast and efficient means and platform for molecular marker-assisted screening and breeding, raising their efficiency and accuracy to a new level.
- the sequencing-based high-throughput genotyping method of the present invention will provide convenience for solving complex biological problems and improving crop breeding.
- the GenomicsDBImport program in the GATK package is used to merge all the intermediate variant files, and the GenotypeGVCFs program in the GATK package is then used to export the merged variant file. The SelectVariants program selects the required SNP site information, which is then passed through the VariantFiltration program (parameters are --cluster-size 3 --cluster-window-size 10 --filter-expression "QD<10.00" --filter-name lowQD --filter-expression "FS>15.000" --filter-name highFS --genotype-filter-expression "DP>50
- after the SNP information of the parents and progeny is obtained with the GATK software, a python script is used. Its principle is to analyze the SNPs identified in each individual region by region, sliding a fixed-length window along all SNP sites to read the genotype, and then to judge the recombination breakpoints and construct the recombination segment map.
- a perl script uses the intermediate file determined by the program to generate a PNG format recombination segment map for each individual, which is convenient for users to intuitively browse their overall genotype.
- the GD module in Perl needs to be used when drawing.
- another script, Bin2MCD, was next used to generate a high-density map consisting of recombinant bins [19] for subsequent QTL analysis.
- output files can be used directly to identify QTLs by several QTL analysis software packages, including Windows QTL Cartographer V2.5 [17] .
- the rice DH population used in this study was constructed by the laboratory of the National Genetic Research Center of the Chinese Academy of Sciences. Its two parents are Kasalath and japonica cv. Nipponbare. The DH population is the line produced by the F2 progeny after many years of self-recombination. The inventors selected dozens of strains for genotype identification and analysis.
- the three-parent rice plants used in this study were constructed by the laboratory of the National Genetic Research Center of the Chinese Academy of Sciences. The three parents are Wushan Simiao, 93-11 and Shuohui 70. The plants in this population were produced by selfing the hybrid progeny of the three parents, and their genomes carry abundant recombination information.
- the DH population of rice was genotyped using the method of the present invention, and a high-density map composed of recombinant bins was generated by Bin2MCD.
- genotype analysis and high-density bin map were also performed using the method published in 2010.
- the high-depth (20-30x) sequencing data of the two parents, Kasalath and Nipponbare, were aligned to the Nipponbare reference genome IRGSP 1.0 using the bwa software, the GATK software was then used to find the high-quality SNP information of the two parents, and a perl script was then used to replace the SNPs at the specified loci on the Nipponbare genome, thereby generating a pseudo reference for the two parents.
- the low-abundance sequencing data of the DH population was then aligned to the pseudo reference of the two parents for genotyping.
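The construction of a pseudo reference, in which the reference base is replaced by the parental allele at each SNP locus, can be sketched as below. The original step uses a perl script that is not shown in this text; this Python version, the function name `make_pseudo_reference`, and the input layout are illustrative assumptions.

```python
def make_pseudo_reference(reference, snps):
    """Return a copy of `reference` carrying the parental SNP alleles.

    snps maps a 1-based position to a (reference_base, parental_base) pair.
    """
    sequence = list(reference)
    for position, (ref_base, parental_base) in snps.items():
        # Guard against coordinates that do not match the reference used.
        if sequence[position - 1] != ref_base:
            raise ValueError(f"reference mismatch at position {position}")
        sequence[position - 1] = parental_base
    return "".join(sequence)


print(make_pseudo_reference("ACGTACGT", {2: ("C", "T"), 8: ("T", "G")}))
```

Because only substitutions are applied, the pseudo reference keeps the coordinate system of the original reference, so alignment positions remain comparable between the two parents.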
- the predicted genotype information of the progeny from the two parents simulated by the inventors should be consistent with it.
- the generated simulation data includes three cases: the homozygous region of Wushan Simiao, the homozygous region of 93-11, and the heterozygous region of Wushan Simiao and 93-11.
- the figure shows the expected length of each region and the location of the recombination breakpoint.
- the production of the simulated data is based on the real sequencing data of the two parents. First, the fastq data of the two parents are aligned to the rice Nipponbare genome, and then the required alignment information (chromosome and position information) in the obtained sam file is screened. The fastq information from the two parents was then reformatted to form the simulated hybrid progeny fastq data.
- the fastq data of the simulated progeny and the fastq data of the two parents, Wushan Simiao and 93-11, were aligned to the rice reference genome IRGSP 1.0. The genome-wide variation information of the two parents and the simulated progeny was then called with the GATK software and filtered to obtain high-quality SNP sites.
- the "sliding window" method is used to judge the SNPs of the whole genome, and the two parents are scored and compared in each sliding window. If the score of one parent is clearly higher, the segment is judged to be the homozygous genotype of that parent (indicated in red or blue in the figure). When the scores of the two parents are not significantly different, the segment is judged to be a heterozygous region of the two parents (indicated in yellow in the figure).
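A minimal Python sketch of the two-parent comparison in one window follows. What counts as "not significantly different" is not quantified in the text, so the parameter `min_gap` (a fraction of the full score) and the function name are hypothetical.

```python
def call_two_parent_window(score1, score2, full_score, min_gap=0.2):
    """Classify one window as 'P1', 'P2' or 'het' from two parental scores.

    min_gap is a hypothetical threshold: the two scores must differ by at
    least this fraction of the full score for a homozygous call.
    """
    if abs(score1 - score2) < min_gap * full_score:
        return "het"
    return "P1" if score1 > score2 else "P2"


print(call_two_parent_window(380, 40, full_score=400))
print(call_two_parent_window(210, 190, full_score=400))
```

A relative threshold is used here so that the same setting can be applied to different window sizes; other criteria, such as a statistical test on the two score curves, would fit the description equally well.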
- the inventors designed a quantitative method to measure the accuracy of judgment: the whole genome is divided into thousands of small regions of 100kb (or small regions of 20-200kb), and the degree of agreement between the results obtained by the method of the present invention and the standard map is then compared, so that the accuracy of the method of the present invention can be measured. According to this method, comparing the genotype information obtained from the simulated data with the real genotype of the simulated data, the accuracy of the identification of the two parents can reach 89.61%.
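The accuracy measure can be sketched in Python as the fraction of fixed-size regions in which the judged genotype agrees with the true genotype. Sampling each region at its midpoint, the segment layout `(start, end, genotype)`, and the function names are assumptions; the text only states that agreement is compared region by region.

```python
def genotype_at(segments, position):
    """Genotype of the segment covering `position` (half-open intervals)."""
    for start, end, genotype in segments:
        if start <= position < end:
            return genotype
    return None


def map_accuracy(judged, truth, genome_length, bin_size=100_000):
    """Fraction of bins whose judged genotype matches the true genotype."""
    bin_starts = range(0, genome_length, bin_size)
    hits = 0
    for start in bin_starts:
        # Assumption: each bin is represented by its midpoint.
        middle = min(start + bin_size // 2, genome_length - 1)
        if genotype_at(judged, middle) == genotype_at(truth, middle):
            hits += 1
    return hits / len(bin_starts)


truth = [(0, 500_000, "A"), (500_000, 1_000_000, "B")]
judged = [(0, 600_000, "A"), (600_000, 1_000_000, "B")]
print(map_accuracy(judged, truth, genome_length=1_000_000))
```

In this toy example the breakpoint is misplaced by 100kb, so one of the ten bins disagrees.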
- the inventors also used the published SEG-Map method to judge the genotype of the simulated progeny data, compared the fastq files of the simulated data to the pseudo reference of the two parents, and used the software to screen out the parent-specific fastq sequence, and then determine the information of the SNP site according to the position of the sequence alignment, and then use the sliding window method to determine the genotype information.
- the method has more detailed theoretical verification and data simulation in the published articles, and has high accuracy and feasibility.
- the accuracy obtained by the SEG-Map software results is 89.32%, which is not much different from the method of the present invention, indicating that the method of the present invention has high feasibility and accuracy.
- the SEG-Map method does have high reliability for the identification of the genotypes of the two parents, and the inventors have used this method for a long time in the genome analysis of rice materials. However, this method cannot perform genotype identification on materials derived from multiple parents, so the method of the present invention is also intended to solve the problem of multiple parent genotype identification.
- the inventors used the SNP-based sliding window method to identify the genotypes of the simulated progeny derived from the four parents, and scored the four parents in one window.
- when one parent has the highest score in a window, the region is determined as the homozygous region of that parent, and the homozygous regions of the four parents are represented by red, blue, green and yellow respectively in the figure.
- the fastq data of the four parents were simulated 100 times, and the method of dividing the genome into small regions was also used to quantify the accuracy.
- the average simulation accuracy of the method of the present invention for the genotype identification of the simulated data of the four parents is 92.10%.
- Genotypes are read by the scores of the two parental SNPs as the "window" slides along the chromosome. A genotype does not change until a recombination breakpoint is encountered.
- the inventors found that there are two types of breakpoints: one separates two homozygous genotypes, and the other separates a homozygous segment from a heterozygous segment. The former type predominates in RIL populations, while the latter is mostly found in F2 populations.
- when a sliding window hits a "homozygous/homozygous" breakpoint, the homozygous genotype briefly changes to a heterozygous genotype and then changes back from a heterozygous genotype into a homozygous genotype.
- when a sliding window hits a "homozygous/heterozygous" breakpoint, the homozygous genotype becomes a heterozygous genotype and then changes to a homozygous genotype again; in this way the boundary point between the homozygous genotype region and the heterozygous genotype region can be determined.
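The two breakpoint types can be distinguished in a sequence of per-window genotype calls, as in the following Python sketch. The parameter `max_transit` (the longest heterozygous run still treated as the window merely crossing a homozygous/homozygous breakpoint) and the function name are hypothetical.

```python
from itertools import groupby


def find_breakpoints(calls, max_transit=3):
    """Locate breakpoints in a list of per-window genotype calls.

    Returns (window_index, genotype_before, genotype_after) tuples. A short
    'het' run between two different homozygous calls is treated as the
    transit of the window across one homozygous/homozygous breakpoint.
    """
    runs, start = [], 0
    for genotype, group in groupby(calls):
        length = len(list(group))
        runs.append((genotype, start, length))
        start += length

    breakpoints, i = [], 0
    while i + 1 < len(runs):
        current = runs[i][0]
        nxt, nxt_start, nxt_len = runs[i + 1]
        transit = (nxt == "het" and nxt_len <= max_transit
                   and i + 2 < len(runs) and runs[i + 2][0] != current)
        if transit:
            # Place the breakpoint in the middle of the short 'het' run.
            breakpoints.append((nxt_start + nxt_len // 2, current, runs[i + 2][0]))
            i += 2
        else:
            breakpoints.append((nxt_start, current, nxt))
            i += 1
    return breakpoints


print(find_breakpoints(["P1"] * 5 + ["het"] * 2 + ["P2"] * 5))
print(find_breakpoints(["P1"] * 4 + ["het"] * 6))
```

The first call reports one homozygous/homozygous breakpoint, while the second reports a homozygous/heterozygous boundary because the heterozygous run is too long to be a transit.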
- the present invention adopts different window sizes to perform genotype analysis on the final SNP information screened by the four-parent simulation data, and it is found that the sliding window sizes of different sizes do have an impact on the final analysis accuracy.
- when the window size is small (less than 199), the final accuracy is below 90%. When the sliding window size is increased to 199, the genotype identification accuracy reaches 93.72%, and when the sliding window size continues to increase, the final accuracy does not change much, indicating that the accuracy of the judgment result does not always increase with the size of the sliding window.
- larger sliding window size requires more computing resources and computing time, and the time cost will be more prominent when large-scale groups need to be processed. Therefore, the inventor comprehensively considers the time cost and the accuracy rate, and the sliding window size of 199 (or the sliding window size of 180-220) is a more reasonable choice.
- genotype identification can be carried out with SEG-Map software at a lower sequencing depth, so the method of the present invention is deeply tested.
- the genome-wide SNPs of both parents need to be identified before proceeding with the data analysis pipeline.
- these SNPs can be identified by high-coverage whole-genome deep sequencing, from existing genomic SNP information in the rice haplotype map, or by low-coverage whole-genome sequencing combined with imputation of missing genotypes (SNPs). Since SNPs between two parental varieties can be identified in a fast and cost-effective way, sequencing-based genotype identification of a recombinant population mainly relies on the subsequent analysis, including reading genotypes, determining recombination breakpoints and constructing genetic linkage maps.
- the functions, steps, and software (scripts) in the data analysis are shown in Figure 8.
- the first step consists of several tasks that can be processed simultaneously. Individuals and parental material in a certain number of recombinant populations are subjected to second-generation high-throughput sequencing simultaneously. The obtained fastq files were aligned and processed by bwa and GATK software to obtain high-quality SNP information.
- the SNP loci used for the final determination of the genotype should meet the following requirements:
- 1. The SNP loci should cover the whole genome as far as possible, with no regions left without SNPs.
- 2. The SNP information (position information and genotype information) of the two parents and the simulated offspring must be known; if it is unknown for any one of the three, the locus should be deleted.
- 3. The rice parents are generally considered to be inbred homozygous lines with essentially no heterozygous loci in the genome. A heterozygous SNP locus found in a parent is therefore generally considered unreliable, so SNP sites where either parent is heterozygous are deleted.
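Requirements 2 and 3 amount to a per-site filter, sketched here in Python. The record layout (a dict with `pos`, `parents` and `progeny`), the use of the VCF-style `./.` for a missing genotype, and the optional `drop_progeny_het` switch for low-depth progeny are assumptions for illustration.

```python
MISSING = "./."  # VCF-style missing genotype


def is_het(genotype):
    """True when the two alleles of a genotype such as '0/1' differ."""
    first, second = genotype.split("/")
    return first != second


def filter_snps(sites, drop_progeny_het=False):
    """Keep only SNP sites that are usable for window scoring."""
    kept = []
    for site in sites:
        calls = site["parents"] + [site["progeny"]]
        if MISSING in calls:
            continue  # genotype unknown in a parent or in the progeny
        if any(is_het(genotype) for genotype in site["parents"]):
            continue  # parents are expected to be homozygous inbred lines
        if drop_progeny_het and is_het(site["progeny"]):
            continue  # optional: heterozygous calls at low depth are unreliable
        kept.append(site)
    return kept


sites = [
    {"pos": 100, "parents": ["0/0", "1/1"], "progeny": "0/0"},
    {"pos": 200, "parents": ["0/0", "1/1"], "progeny": "./."},
    {"pos": 300, "parents": ["0/1", "1/1"], "progeny": "1/1"},
    {"pos": 400, "parents": ["0/0", "1/1"], "progeny": "0/1"},
]
print([site["pos"] for site in filter_snps(sites)])
```

Requirement 1, genome-wide coverage, is a property of the whole SNP set and would need a separate check on the spacing of the retained positions.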
- the script output will have two files, the rlt file and the bin file.
- the rlt file records the genotype determination of each SNP position
- the bin file records the distribution of recombination breakpoints on 12 chromosomes in the whole genome.
- a genotype map is generally drawn first using the rlt and bin files through a perl script SNP2png, and the image format is in PNG format.
- the map is drawn according to the genotype information of SNP loci determined by the program and the position information of whole-genome recombination breakpoints. Different colors are used to represent different genotype types in the map.
- a perl script can also be used to visualize the genotype of the entire population. Programs and scripts used in the analysis process are shown in italics and form a series of analysis steps. The genotype data generated at the end of the analysis process can be directly used in other software (including MapMaker and JoinMap) to construct linkage maps.
- the bins are reorganized when analyzing the final output data produced by the software, usually at a resolution of one bin per 100kb, or even one bin per 10kb.
- the genotype results of the mapping population can be imported into programs such as MapMaker [16] or JoinMap [32] for genetic map construction. With the genetic map available, QTL analysis is performed.
- this genetic map is much finer in scale than maps produced by most traditional molecular markers.
- This package is compatible with multiple platforms (e.g. Unix, Linux and Windows).
- in addition to the perl environment itself, the GD module also needs to be installed, because the process includes drawing steps.
- Step 1 Construct the SNP "string".
- the SNPs on the 12 chromosomes become 12 consecutive word strings (Fig. 12).
- the blue in the figure represents the genotype of parent 1, and the red represents the genotype of parent 2.
- the homozygous genotype of parent 1: blue
- the homozygous genotype of parent 2: red
- the heterozygous genotype: yellow
- the genomes of cultivated rice parent materials are highly homozygous, and the genomes of rice populations that have been selfed for multiple generations are also relatively homozygous, with heterozygous regions at only some chromosomal locations. Therefore, the SNP loci included in the analysis were first screened manually, and any SNP locus at which a parent was heterozygous was excluded, because such loci cannot be accurately judged and scored. In addition, if the sequencing depth of the progeny is not very high, SNP loci that are heterozygous in the progeny can also be filtered out, because a heterozygous call made at low depth is unreliable and is likely to be a misjudgment caused by sequencing errors.
- Step 2 Score both parents in one window
- the scores of all SNP sites in a sliding window are calculated, and the total score of each parent is calculated as the score of the parent in the chromosome position of the sliding window.
- the degree of conformity of the offspring with the parent is measured according to the typing of each parent.
- the preferred scoring rules are shown in Table A, and the scoring rules of the present invention are formulated according to the genetic rules of organisms.
- Step 3 Determine the genotype of the chromosome region according to the score value
- the score value of each parent on each chromosome can be obtained. Take the score as the ordinate, and draw the score curve of each parent with the position of each sliding window on the chromosome as the abscissa.
- the genotype judgment of each chromosome is based on the characteristics of different parental score curves. As shown in Figure 13, the offspring of the simulated four-parent source were scored by sliding window, and the score curves of the four parents were drawn according to the score values of the four parents.
- the yellow curve (parent 4) has a high score value, close to the full score, in this region, and the score value of this parent is quite stable in this region, without large fluctuations.
- the numerical fluctuation of the score is measured by the standard deviation in statistics.
- parent 4 has a high score value and a small standard deviation in this region, while the score values of the other three parents fluctuate in the range of 0-200 with a high standard deviation, so this region is judged to be the homozygous genotype of parent 4.
- the offspring genotypes of different regions of the 12 chromosomes can be determined based on the score values.
- the rectangular bar corresponding to true in the figure corresponds to the real genotype information of the simulated offspring of each chromosome segment, while the rectangular bar corresponding to judge represents the offspring genotype determined by the method of the present invention, and the information of the two basically matches.
- the judgment of the heterozygous region is illustrated by the genotype judgment of the simulated progeny of the two parents.
- the genotype identification of the simulated progeny is carried out; the rectangular bar corresponding to true shows the real genotype information of the simulated progeny for each chromosome segment, and the rectangular bar corresponding to judge shows the offspring genotype judged by the method of the present invention.
- when the score of one parent is high with a small standard deviation, while the score of the other parent fluctuates considerably with a large standard deviation, the region is judged to be the homozygous genotype of the former (orange or blue area).
- Step 5 Description of Judgment Criteria
- the core idea of the method of the present invention is to read directly the genotype information of the real SNPs of the parents and the offspring in a given segment, and then to quantify the degree of similarity between the offspring and each parent in this segment. The numerical characteristics (value level and standard deviation) of the score curve of each parent form a relatively simplified analysis model, from which the genotype of each segment is determined.
- the criteria for judgment mainly include the following situations:
- the region to be judged is first defined as "unknown", and the genotype of this region is determined by the genotypes of the regions on both sides.
- if the genotypes on both sides are the same, the region is assigned that genotype; if the genotypes on both sides are different, the middle position of the region is regarded as a recombination breakpoint, and the two halves of the region are assigned the genotypes of their respective sides.
- the two parents of the rice DH population used were Kasalath and Nipponbare. A DH population is formed by inducing haploids from the F1 generation of a biparental cross and doubling their chromosomes. Its plants are homozygous, and their selfed progeny are pure lines, so trials can be replicated over multiple years and locations. It is an ideal material for studying the interaction of genotype and environment.
- the high-depth sequencing data (20x-30x) of the two parents, Kasalath and Nipponbare, were aligned to the rice reference genome IRGSP 1.0 with the bwa software, and the required high-quality SNP information was then called with the GATK software. The SNP information of the parents was combined with the SNP information of all progeny into the same vcf file, from which the required variation site information can be conveniently extracted.
- the average sequencing depth of each progeny in the DH population is about 0.02x, which belongs to the sequencing data of lower depth.
- the SNP information of each progeny and of the parents is extracted separately and then judged with the SNPwindow script, which produces an rlt file and a bin file for each progeny.
- the genotype identification results were visualized using the result file obtained in the previous step.
- the homozygous genotypes of the two parents are shown with Kasalath in red and Nipponbare in blue
- the reliability of the heterozygous region is low, which may be due to sequencing errors or program misjudgments caused by the low polymorphism of the two parents in this region.
- the Bin2MCD script takes the bin file of the entire population as input and calculates the map file of the overall genotype distribution.
- the map file divides the whole genome into many small bins, and the genotype of each bin is determined according to the genotype identification results of each individual.
- a perl script was used to visualize the genotype information of the entire population, and the proportion of different genotypes at each bin position was also calculated, which is an important parameter for population genetics research.
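The proportion of each genotype at every bin position can be computed from a population bin map as in the Python sketch below. The original step uses a perl script that is not shown; the input layout (one list of bin genotypes per individual) and the function name `genotype_proportions` are assumptions.

```python
from collections import Counter


def genotype_proportions(bin_map):
    """Proportion of each genotype at every bin position.

    bin_map holds one list of bin genotypes per individual; all lists are
    assumed to have the same length.
    """
    individuals = len(bin_map)
    return [{genotype: count / individuals
             for genotype, count in Counter(column).items()}
            for column in zip(*bin_map)]


population = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["A", "H", "B"],
    ["B", "A", "B"],
]
for index, proportions in enumerate(genotype_proportions(population)):
    print(index, proportions)
```

Bins whose proportions depart strongly from the expected segregation ratio can then be inspected for segregation distortion or genotyping problems.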
- the red and blue ratio map in Figure 20 represents the proportion of the three genotypes in different bins.
- the visualization of this step facilitates a quick, direct view of the population genotype.
- the output map file, together with the phenotype data, can be used directly in analysis software such as winQTL for QTL mapping analysis.
- multi-parent genotype identification was performed on the three-parent progeny materials grown in the laboratory.
- the amount of sequencing data of the progeny is about 0.2x
- the three parent materials are Wushan Simiao, 93-11 and Shuohui 70, respectively, and the sequencing depth of the three parents is about 20x-30x.
- the progeny and the three parental SNP information were integrated into the same vcf, and the final high-quality SNP was further screened from it. Then use the SNPwindow script to judge the genotype of the offspring. In a window, if a parent has the highest score, the region is judged as the homozygous genotype of the parent.
- the bin file for judging the recombination breakpoint is obtained by using the judgment degree of multiple parents, and a perl script is used to visualize the judgment result.
- the red area corresponds to parent 1, Wushan Simiao
- the blue area corresponds to parent 2, 93-11
- the green area corresponds to parent 3, Shuohui 70
- the yellow area is the heterozygous region.
- the SEG-Map method was previously used to determine the genotype of this material. The laboratory's previous genotype determination of these plants was therefore mainly based on the genotype identification of two parents, Wushan Simiao and 93-11, and three genotypes were determined: the Wushan Simiao homozygous genotype, the 93-11 homozygous genotype and the heterozygous genotype of the two. According to the judgment results for the three parents, the inventors found that the heterozygous segments obtained from the two-parent judgment are likely to correspond to the homozygous genotype of the third parent. Therefore, the method of the present invention can make up for the deficiencies of the previous SEG-Map software while maintaining accuracy, and solves the problem of multi-parent genotype judgment.
- the inventors used the real sequencing fastq data of four real rice materials in the laboratory, 93-11, Shuohui 70, Wushan Simiao, and Huang Huazhan, and extracted the reads of the corresponding regions according to the alignment results.
- the reads were artificially combined and screened to produce the data of a simulated progeny. The real genotype information and recombination breakpoints of this progeny are known, so the simulated data can be used to evaluate the feasibility and accuracy of the present invention.
- the inventors identified dozens of recombination breakpoints in the whole genome, and the determined different chromosomal regions were also roughly consistent with the real genotype results.
- the figure shows the real genotype information of the mock progeny of the present invention.
- the inventor checked the intermediate output rlt file of the judgment process, and checked the reasons for the difference in judgment.
- the possible reasons are as follows:
- 1. Because the depth of the sequencing data of the progeny is not very high, only part of the genome-wide variation information is captured, and some important sites that distinguish the parents may be missed, so that the real parent cannot be distinguished in some regions.
- 2. Two or more of the identified parents are very similar in certain regions, with no polymorphism between them. This is not caused by sequencing errors or sequencing depth. For such high-similarity regions, no judgment can be made for the time being.
- 3. The genotype judgment of a region depends on the genotype information on both sides of it. Some regions therefore cannot be judged accurately and are assigned the most likely parental genotype according to the genotypes on both sides.
- Multi-parent populations have great application prospects in genetic analysis. By selecting multiple parents, the genetic diversity of the population can be increased, and multiple parents can be fused into one population by means of hybridization and selfing (or inbreeding), which increases the number of recombination events. Multi-parent populations can not only increase the frequency of recombination and reveal the genetic basis behind complex traits, but also have great potential in breeding applications because of the rich genetic basis of the selected parents. Compared with biparental populations, multi-parent populations have more parents, which increases the richness of variation in the population, including allelic diversity and phenotypic diversity, improves mapping accuracy and precision, and improves the efficiency of QTL detection.
- the increased number of recombination events improves the resolution of QTL mapping. Because parental selection for multi-parent populations is more refined, that is, the criteria are more stringent, and multiple parents increase the diversity of the genetic basis, the QTL results can be applied to breeding research.
- multi-parent populations are constructed by mixing multiple parents evenly. Compared with natural populations, the pedigree relationships are known and detailed information on population construction is available, so population stratification is avoided at the experimental design stage, which in turn controls false positives in the mapping results.
- The present inventors developed a novel method for high-throughput genotyping that detects SNPs by low-coverage whole-genome resequencing.
- This type of SNP data differs from traditional genetic markers in two main ways. First, random sequencing generally does not yield information on a given SNP locus for every line in a recombinant population. Second, a single SNP locus is not a reliable marker for genotyping because of potential sequencing errors.
- To process SNP data with these unique properties generated by second-generation sequencing, the inventors further developed a new analysis framework, a "sliding window method", in which the genotype of a chromosome segment is determined from the genotypes of multiple SNPs at local positions.
- SEG-Map: Sequencing Enabled Genotyping for Mapping recombination populations
- GAII: Illumina Genome Analyzer II
- After research, the inventors of the present invention developed a set of novel processing and analysis procedures and the corresponding methods and devices. In addition to optimizing the steps of the previous SEG-Map program and being compatible with current mainstream bioinformatics analysis software and different types of high-throughput sequencing data, the process of the present invention can, most importantly, analyze the genotypes of populations constructed from multiple parents quickly, accurately, and reliably.
- The method of the invention can help multi-parent populations to be better applied in crop breeding. It can also accurately identify more QTL sites in multi-parent populations, and genomic prediction for multi-parent populations can provide a basis for using them as germplasm resources applied directly to varieties.
- The inventors used a high-throughput sequencing-based method to genotype recombinant inbred lines in rice, showing the advantages of this new genotyping method over the commonly used PCR-based method.
- The inventors used 287 insertion/deletion markers (including SSR markers) to genotype F8-generation individuals of this recombinant inbred line population. These markers were amplified by PCR and identified by agarose gel electrophoresis.
- Each marker covers an average genetic distance of about 5 cM, which is equivalent to a physical distance of about 1.4 Mb, which is larger than most previously reported rice genetic maps. Designing, screening, and collecting these PCR markers took three researchers more than a year of work. In the study of recombinant inbred lines in rice, the inventors used the Illumina GA to obtain an average marker density of one SNP per 40 kb in less than two weeks. Sequencing-based high-throughput genotyping is therefore much faster, more efficient, and less expensive than traditional PCR-based genotyping.
- The throughput of resequencing can be easily adjusted, which allows the inventors to obtain a suitable marker density and resolution of recombination breakpoints while minimizing the investment of time and resources.
- The inventors can increase the resequencing coverage for all or part of the mapping population. Notably, with this method recombination break sites can be determined very accurately; with sufficiently high resequencing coverage, they can theoretically be located to within 1 kb. Such fine resolution enables the detection of "double crossovers" that have not previously been identified with other types of genetic markers.
- This method can improve the accuracy of QTL detection and mapping and increase the efficiency and success rate of gene cloning.
- Precise identification of recombination breakpoints also enables the study of genomic regions with specific genetic properties, such as recombination hotspots.
- This high-throughput genotype identification method, combined with second-generation sequencing technology, will greatly simplify and accelerate the genetic mapping of quantitative traits in crops [37-39,20].
- The method proposed by the present inventors works well with multi-parent populations for genotype identification, improves the accuracy and efficiency of QTL mapping, and makes full use of the abundant genetic variation present in multi-parent populations. It also contributes to the improvement of crop genetic quality and to molecular breeding design.
- This method can be used to obtain molecular markers closely linked to genes for important agronomic traits, to screen progeny efficiently during breeding, and to finely identify the genotype maps of improved varieties. It provides fast and efficient means and platforms for these tasks and raises them to a new level of efficiency and accuracy.
- This sequencing-based high-throughput genotyping method will make it easier to solve complex biological problems and improve crop breeding.
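The sliding-window idea described above can be sketched in Python as follows. This is a minimal illustration and not the inventors' implementation: the function name `window_scores`, the `-` placeholder for SNPs without read coverage, and the scoring rule (the fraction of covered progeny SNPs that match each parent) are all assumptions made for the example.

```python
from typing import Dict, List, Optional

MISSING = "-"  # assumed placeholder: progeny SNP not covered by any read


def window_scores(
    progeny: str,
    parents: Dict[str, str],
    window: int,
    step: int = 1,
) -> List[Dict[str, Optional[float]]]:
    """Score every parent in each sliding window of consecutive SNPs.

    `progeny` and each parent string hold one allele character per SNP,
    in chromosome order. The score of a parent in a window is the fraction
    of covered progeny SNPs whose allele matches that parent (None when
    the window has no covered SNP at all).
    """
    n = len(progeny)
    for name, alleles in parents.items():
        if len(alleles) != n:
            raise ValueError(f"parent {name!r} has a different SNP count")

    rows: List[Dict[str, Optional[float]]] = []
    for start in range(0, n - window + 1, step):
        # Low-coverage data: only SNPs actually observed in the progeny count.
        covered = [i for i in range(start, start + window) if progeny[i] != MISSING]
        row: Dict[str, Optional[float]] = {}
        for name, alleles in parents.items():
            if not covered:
                row[name] = None
            else:
                matches = sum(progeny[i] == alleles[i] for i in covered)
                row[name] = matches / len(covered)
        rows.append(row)
    return rows


if __name__ == "__main__":
    parents = {"P1": "AAAAAAAA", "P2": "CCCCCCCC", "P3": "GGGGGGGG"}
    for row in window_scores("AA-ACC-C", parents, window=4, step=4):
        print(row)
```

Pooling many SNPs per window is what makes the call tolerant of both missing sites and isolated sequencing errors: a single wrong base lowers a parent's score slightly instead of flipping the genotype.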
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Description
Claims (10)
- A method for identifying the genotype of a multi-parent plant, characterized in that the method comprises: (a) for n parents and their progeny, providing sequencing data Df of the progeny plants to be identified and sequencing data Dp of the parent plants corresponding to the progeny plants, wherein n is a positive integer ≥ 3; (b) determining SNP site information of the parents and progeny based on the sequencing data Df and the sequencing data Dp; (c) judging the genotype of the progeny based on the SNP site information, thereby obtaining the evaluation result of each SNP of the progeny and the distribution information of the recombination breakpoints on each chromosome of the whole genome of the progeny; (d) constructing and/or drawing a genotype map of the progeny based on the SNP evaluation result information of the progeny and the position information of the whole-genome recombination breakpoints, thereby obtaining the genotype identification result of the multi-parent plant.
- The method of claim 1, characterized in that, in step (c), the recombination break sites are analyzed based on SNP "strings".
- The method of claim 1, characterized in that step (c) comprises analyzing the recombination break sites, thereby obtaining an analysis result of the recombination break sites, and the recombination break site analysis comprises: (s1) constructing SNP "strings", wherein the genotypes of all SNPs on each chromosome of the parents and progeny are compressed in order into one string; (s2) determining, according to a predetermined window size, the sliding windows corresponding to the SNP string, and scoring the SNP sites in each window, thereby obtaining a score value P for each parent within the window; (s3) determining the genotype corresponding to each chromosomal region of the progeny based on the score values P obtained in step (s2).
- The method of claim 1, characterized in that, in step (s3), for each chromosomal region of the progeny, the genotype corresponding to that region is determined based on the score values or score curves of the parents.
- The method of claim 1, characterized in that step (s3) comprises: sliding the sliding window across the genome-wide SNP sites to obtain the score value of each parent on each chromosome, and drawing a score curve for each parent with the score value as the ordinate and the position of each sliding window on the chromosome as the abscissa.
- The method of claim 1, characterized in that, in step (s3), the degree of similarity between the progeny and each parent in a given segment is quantified, and the genotype of each segment is judged according to the numerical features of each parent's score curve (the level of the values and their standard deviation).
- The method of claim 3, characterized in that the sliding window size is 170-500 consecutive SNP sites, preferably 200-400 consecutive SNP sites; and/or the sequencing depth of the sequencing data is 0.1x-10x, preferably 0.2x-5x.
- The method of claim 1, characterized in that the plant comprises a crop, preferably a grass (Poaceae) crop; more preferably, the crop comprises rice, wheat, soybean, and tobacco.
- A data analysis device for identifying the genotype of a multi-parent plant, the device comprising: a data input module for inputting the data to be processed, the data to be processed comprising sequencing data Df of the progeny plants to be identified and sequencing data Dp of the parent plants corresponding to the progeny plants; a multi-parent plant genotype identification module configured to perform the method of claim 1, thereby obtaining the genotype identification result of the progeny; and an output module for outputting the genotype identification result of the progeny.
- The device of claim 9, characterized in that the multi-parent plant genotype identification module comprises: an SNP site information analysis submodule configured to determine the SNP site information of the parents and progeny based on the sequencing data Df and the sequencing data Dp; a chromosomal recombination breakpoint analysis submodule configured to judge the genotype of the progeny based on the SNP site information, thereby obtaining the evaluation result of each SNP of the progeny and the distribution information of the recombination breakpoints on each chromosome of the whole genome of the progeny; and a genotype map construction submodule configured to construct and/or draw a genotype map of the progeny based on the SNP evaluation result information of the progeny and the position information of the whole-genome recombination breakpoints, thereby obtaining the genotype identification result of the multi-parent plant.
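As a rough illustration of how per-window parent scores might be turned into region genotypes and recombination breakpoints, the following Python sketch calls one parent per window and then merges consecutive windows into segments. It is not the claimed implementation: the helper names `call_window` and `segments_and_breakpoints` and the thresholds `min_score` and `min_margin` are hypothetical, and a simple score-and-margin rule stands in for the score-level and standard-deviation criteria mentioned in the claims.

```python
from typing import Dict, List, Optional, Tuple

# (first window index, last window index, parent label or None if uncalled)
Segment = Tuple[int, int, Optional[str]]


def call_window(
    row: Dict[str, Optional[float]],
    min_score: float = 0.9,   # hypothetical threshold
    min_margin: float = 0.1,  # hypothetical lead over the runner-up
) -> Optional[str]:
    """Return the best-scoring parent, or None when no parent clearly wins."""
    ranked = sorted(
        ((score, name) for name, score in row.items() if score is not None),
        reverse=True,
    )
    if not ranked:
        return None
    best_score, best_parent = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_score < min_score or best_score - runner_up < min_margin:
        return None  # e.g. parents too similar in this region
    return best_parent


def segments_and_breakpoints(
    calls: List[Optional[str]],
) -> Tuple[List[Segment], List[Tuple[int, int]]]:
    """Merge equal consecutive calls into segments and list breakpoint intervals.

    A breakpoint is reported as (last window of the left segment, first window
    of the right segment) whenever two successive *called* segments differ.
    """
    segments: List[Segment] = []
    for i, call in enumerate(calls):
        if segments and segments[-1][2] == call:
            segments[-1] = (segments[-1][0], i, call)
        else:
            segments.append((i, i, call))

    called = [s for s in segments if s[2] is not None]
    breakpoints = [
        (left[1], right[0])
        for left, right in zip(called, called[1:])
        if left[2] != right[2]
    ]
    return segments, breakpoints


if __name__ == "__main__":
    calls = ["A", "A", None, "A", "B", "B", "C"]
    print(segments_and_breakpoints(calls))
```

Uncalled windows flanked by the same parent produce no breakpoint, which loosely mirrors the description's approach of judging ambiguous regions from the genotypes on both sides.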
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021423830A AU2021423830A1 (en) | 2021-01-30 | 2021-08-27 | Genotype identification of multi-parent crop on basis of high-throughput whole genome sequencing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110131330.4 | 2021-01-30 | ||
CN202110131330.4A CN114842907A (en) | 2021-01-30 | 2021-01-30 | Multi-parent crop genotype identification based on high-throughput whole genome sequencing |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022160700A1 (en) | 2022-08-04 |
Family
ID=82561095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/115146 WO2022160700A1 (en) | 2021-01-30 | 2021-08-27 | Genotype identification of multi-parent crop on basis of high-throughput whole genome sequencing |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN114842907A (en) |
AU (1) | AU2021423830A1 (en) |
WO (1) | WO2022160700A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115798580B (en) * | 2023-02-10 | 2023-11-07 | 北京中仪康卫医疗器械有限公司 | Genotype filling and low-depth sequencing-based integrated genome analysis method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104328507A (en) * | 2014-10-11 | 2015-02-04 | 中国水稻研究所 | SNP chip used for identifying rice variety, preparation method and application |
CN111508560A (en) * | 2020-04-29 | 2020-08-07 | 上海师范大学 | Method for constructing high-density genotype map of outcrossing species |
-
2021
- 2021-01-30 CN CN202110131330.4A patent/CN114842907A/en active Pending
- 2021-08-27 WO PCT/CN2021/115146 patent/WO2022160700A1/en active Application Filing
- 2021-08-27 AU AU2021423830A patent/AU2021423830A1/en active Pending
Non-Patent Citations (5)
Title |
---|
LU WANG; AHONG WANG; XUEHUI HUANG; QIANG ZHAO; GUOJUN DONG; QIAN QIAN; TAO SANG; BIN HAN: "Mapping 49 quantitative trait loci at high resolution through sequencing-based genotyping of rice recombinant inbred lines", THEORETICAL AND APPLIED GENETICS ; INTERNATIONAL JOURNAL OF PLANT BREEDING RESEARCH, SPRINGER, BERLIN, DE, vol. 122, no. 2, 28 September 2010 (2010-09-28), Berlin, DE , pages 327 - 340, XP019873367, ISSN: 1432-2242, DOI: 10.1007/s00122-010-1449-8 * |
LUO LONGHAI, YUE GUIDONG, GAO QIANG, WANG JUNYI, XU JIAOHUI, YIN YE: "The Application of High-throughput Sequencing Technology in Plant and Animal Research", SCIENCE CHINA: CHINESE BULLETIN OF LIFE SCIENCE = SCIENTIA SINICA VITAE, vol. 42, no. 2, 1 February 2012 (2012-02-01), pages 107 - 124, XP055954237, ISSN: 1674-7232, DOI: 10.1360/052011-634 * |
QING-QING HOU, LI-ZHEN SI, XUE-HUI HUANG, BIN HAN: "Progress on genome-wide association study of important agronomic traits in rice", CHINESE BULLETIN OF LIFE SCIENCES, vol. 28, no. 10, 1 October 2016 (2016-10-01), pages 1 - 8, XP055954244, ISSN: 1004-0374, DOI: 10.13376/j.cbls/2016162 * |
X. HUANG, Q. FENG, Q. QIAN, Q. ZHAO, L. WANG, A. WANG, J. GUAN, D. FAN, Q. WENG, T. HUANG, G. DONG, T. SANG, B. HAN: "High-throughput genotyping by whole-genome resequencing", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, US, vol. 19, no. 6, 1 June 2009 (2009-06-01), US , pages 1068 - 1076, XP055577519, ISSN: 1088-9051, DOI: 10.1101/gr.089516.108 * |
XUE YONGBIAO, HAN BIN, CHONG KANG, WANG TAI, HE ZUHUA, FU XIANGDONG, CHU CHENGCAI, CHENG ZHUKUAN, XU YUNYUAN, LI MING: "Achievements and Prospect of Designer Breeding by Molecular Modules in Rice ", BULLETIN OF CHINESE ACADEMY OF SCIENCES, vol. 33, no. 9, 1 September 2018 (2018-09-01), pages 1 - 10, XP055954240, ISSN: 1000-3045, DOI: 10.16418/j.issn.1000-3045.2018.09.002 * |
Also Published As
Publication number | Publication date |
---|---|
AU2021423830A1 (en) | 2023-12-21 |
CN114842907A (en) | 2022-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mason et al. | A user guide to the Brassica 60K Illumina Infinium™ SNP genotyping array | |
CN105008599B (en) | Oryza sativa L. full-length genome breeding chip and application thereof | |
Huang et al. | High-throughput genotyping by whole-genome resequencing | |
AU2019101778A4 (en) | Method for constructing rice molecular marker map based on Kompetitive Allele Specific PCR and application in breeding Using the same | |
CN109196123B (en) | SNP molecular marker combination for rice genotyping and application thereof | |
CN108998550B (en) | SNP molecular marker for rice genotyping and application thereof | |
CN106868131A (en) | No. 6 chromosomes of upland cotton SNP marker related to fibre strength | |
Li et al. | Construction of high-density genetic map and mapping quantitative trait loci for growth habit-related traits of peanut (Arachis hypogaea L.) | |
US20140208449A1 (en) | Genetics of gender discrimination in date palm | |
Zhao et al. | SEG-Map: a novel software for genotype calling and genetic map construction from next-generation sequencing | |
Li et al. | Three representative inter and intra‐subspecific crosses reveal the genetic architecture of reproductive isolation in rice | |
CN112289384A (en) | Construction method and application of whole citrus genome KASP marker library | |
CN110846429A (en) | Corn whole genome InDel chip and application thereof | |
WO2022160700A1 (en) | Genotype identification of multi-parent crop on basis of high-throughput whole genome sequencing | |
Gardiner et al. | A framework for gene mapping in wheat demonstrated using the Yr7 yellow rust resistance gene | |
KR101539737B1 (en) | Methodology for improving efficiency of marker-assisted backcrossing using genome sequence and molecular marker | |
Li et al. | Genome-wide artificial introgressions of Gossypium barbadense into G. hirsutum reveal superior loci for simultaneous improvement of cotton fiber quality and yield traits | |
CN114574613B (en) | Wheat-goose-roegneria kamoji whole genome liquid chip and application | |
Su et al. | Fine‐mapping a fibre strength QTL QFS‐D 11‐1 on cotton chromosome 21 using introgressed lines | |
CN116004898A (en) | Peanut 40K liquid-phase SNP chip PeannitGBTS 40K and application thereof | |
Fletcher et al. | AFLAP: assembly-free linkage analysis pipeline using k-mers from genome sequencing data | |
CN103725675A (en) | Molecular tagging method for paddy rice | |
Agrawal et al. | Molecular marker tools for breeding program in crops | |
Wang et al. | A pangenome analysis pipeline (PSVCP) provides insights into rice functional gene identification | |
CN115992292B (en) | SNP molecular marker combination for brassica napus and application thereof |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21922295; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2021423830; Country of ref document: AU. Ref document number: AU2021423830; Country of ref document: AU |
| | ENP | Entry into the national phase | Ref document number: 2021423830; Country of ref document: AU; Date of ref document: 20210827; Kind code of ref document: A |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21922295; Country of ref document: EP; Kind code of ref document: A1 |