WO2018103037A1 - Rice whole genome breeding chip and application thereof - Google Patents

Rice whole genome breeding chip and application thereof

Info

Publication number
WO2018103037A1
Authority
WO
WIPO (PCT)
Prior art keywords
snp
chip
rice
site
sites
Prior art date
Application number
PCT/CN2016/109007
Other languages
English (en)
French (fr)
Inventor
周发松
喻辉辉
谢为博
雷昉
李菁
张小波
周莹
程丹
陆青
邱树青
韦懿
陈�光
张启发
Original Assignee
中国种子集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国种子集团有限公司
Priority to PCT/CN2016/109007 (WO2018103037A1)
Priority to CN201680091357.2A (CN110050092B)
Publication of WO2018103037A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof

Definitions

  • The present application relates to the fields of genomics, molecular biology, bioinformatics, and molecular plant breeding, and in particular to a rice whole genome breeding chip and its application.
  • Genomic breeding refers to applying molecular biology techniques to selection and breeding at the genomic level.
  • The main advantages are as follows. First, plant seeds or seedlings can be identified at the molecular level to judge whether they carry the expected favorable traits, so that selection can be made early, which speeds up the breeding process and improves breeding accuracy. Second, molecular biology testing and analysis can be organized into a standard workflow that different technicians can follow to obtain accurate results quickly, which greatly reduces the influence of personal experience on plant selection. Third, the marker technology used in genomic breeding can detect at the genome-wide level, which avoids segregation in the offspring caused by heterozygous sites in the material and ensures the stability of the material.
  • SNP: Single Nucleotide Polymorphism.
  • High-throughput SNP detection technologies mainly comprise detection platforms based on sequencing technology and detection platforms based on chip technology.
  • SNP chips have become an important tool in genomic breeding because of the controllability of the marker sites, the convenience of operation, and the reliability of the results.
  • The most mature SNP chip detection technologies are the Illumina Infinium chip and the Affymetrix Axiom chip.
  • Illumina Infinium chip technology is a high-density chip technology based on microbeads. It uses microbeads 3 μm in diameter that self-assemble in the microwells of a fiber bundle or planar substrate. Each microbead is covered with hundreds of thousands of copies of a specific oligonucleotide, and these copies serve as capture sequences for genotyping the sample during the assay.
  • The chips come in the following formats according to the number of oligonucleotide (bead) types: 24-sample format (3,000-90,000 bead types), 12-sample format (90,001-250,000 bead types), or 4-sample format (250,001-1,000,000 bead types).
  • The matching scanning system uses advanced laser and optical components that can handle high-density multi-sample chips, producing high-quality data while keeping operation fast.
  • Advanced analytical techniques give high sample call rates with repeatability of up to 99.9%. These high-quality data reduce the possibility of false positives and false negatives, making the genotyping results more accurate.
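The bead-type ranges quoted above determine how many samples fit on one Infinium chip. As a minimal sketch of that lookup (the function name `infinium_sample_format` is hypothetical and not part of any Illumina software):

```python
def infinium_sample_format(bead_types: int) -> int:
    """Return the number of samples per chip for a given number of bead types.

    The ranges are the ones quoted in the text; anything outside them is
    rejected rather than guessed.
    """
    if 3_000 <= bead_types <= 90_000:
        return 24
    if 90_001 <= bead_types <= 250_000:
        return 12
    if 250_001 <= bead_types <= 1_000_000:
        return 4
    raise ValueError(f"{bead_types} bead types is outside the quoted formats")


if __name__ == "__main__":
    for n in (50_000, 150_000, 600_000):
        print(n, "bead types ->", infinium_sample_format(n), "samples per chip")
```

The number of bead types is not necessarily equal to the number of SNP sites, so the mapping should be applied to the bead-type count of the final design.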
  • The Affymetrix Axiom chip is made by in-situ photolithographic synthesis.
  • The photomask design and rigorous process flow give the fabricated chips high quality, repeatability, and uniformity, and safeguard probe synthesis on the chip.
  • The Affymetrix GeneTitan system is a fully automated, highly integrated chip workstation that uses an array plate similar to a 96-well plate, in which each square chip occupies approximately one well.
  • One array plate can contain 16, 24, or 96 chips for multi-sample high-throughput detection.
  • The system integrates the hybridization oven, fluidics station, and CCD scanning and imaging equipment used from hybridization through scanning into one instrument. After the array plate is placed in the GeneTitan system, hybridization, washing, and scanning of the chips require almost no manual intervention and are completed automatically by the machine.
  • The rice whole genome breeding chip Rice60K, disclosed in PCT International Application Publication No. WO/2014/121419A1, has been successfully applied to rice genomic breeding and functional genomics research.
  • The application provides a SNP marker combination for rice genotyping, characterized by comprising the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781.
  • The SNP marker combination of the present application further comprises the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071.
  • The SNP marker combination of the present application includes the SNP markers in at least 37582 of the nucleotide sequences set forth in SEQ ID NO: 1-86071.
  • The application provides a rice chip comprising detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781.
  • The rice chip of the present application comprises detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781.
  • The rice chip of the present application further comprises detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071.
  • The rice chip of the present application comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences set forth in SEQ ID NO: 1-86071.
  • The detection sites in the rice chip of the present application are probe combinations designed for the SNP markers.
  • The rice chips of the present application are made using in-situ synthesis, off-chip synthesis, or the microbead method.
  • The rice chip of the present application is made by in-situ photolithographic synthesis, photoresist parallel synthesis, microfluidic channel on-chip synthesis, light-directed in-situ synthesis, soft lithography in-situ synthesis, inkjet printing synthesis, molecular stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method.
  • The rice chips of the present application are made by Illumina Infinium technology or Affymetrix Axiom technology.
  • The application provides the use of the above SNP marker combination or chip in detecting a biological sample.
  • The detection is used for breeding, identification, gene mapping and cloning, germplasm identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
  • The application provides a method of detecting a biological sample, the method comprising detecting information on the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781 in the biological sample.
  • The methods of the present application further comprise detecting information on the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071 in the biological sample.
  • The methods of the present application comprise detecting information on the SNP markers in at least 37582 of the nucleotide sequences set forth in SEQ ID NO: 1-86071 in the biological sample.
  • The method of the present application uses a gene chip for the detection.
  • The application provides a method of screening a SNP marker combination representative of germplasm resources, comprising the steps of:
  • scoring by allele type: a SNP site scores 0 points if its two alleles are A/T or C/G, and 20 points for any other allele difference;
  • scoring by location: a SNP site located in an intergenic region, intron, promoter, 5' untranslated region (5'-UTR), or 3' untranslated region (3'-UTR) scores 1, 1.5, 2, 2, and 2.5 points, respectively;
  • scoring by coding effect: a SNP that causes a synonymous mutation, a non-synonymous mutation, or a large-effect mutation in the coding region scores 2, 5, and 10 points, respectively;
  • dividing the whole rice genome into linkage disequilibrium blocks and selecting, in each block, the two sites with the highest comprehensive score, with at most 25 sites per block and at least 10 sites per 100 kb.
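The scoring rules above can be sketched in code. This is a minimal illustration only: the text does not say how the individual scores are combined into the "comprehensive score", so the sketch assumes a simple sum of the allele-type score and the location or coding-effect score. The intergenic value of 1 is taken from the fuller statement of the same rule later in the text, and all names (`allele_pair_score`, `composite_score`, the annotation labels) are hypothetical.

```python
# Location scores for non-coding SNP sites (values from the text).
LOCATION_SCORES = {
    "intergenic": 1.0,
    "intron": 1.5,
    "promoter": 2.0,
    "5utr": 2.0,
    "3utr": 2.5,
}

# Effect scores for SNP sites in the coding region (values from the text).
CODING_EFFECT_SCORES = {
    "synonymous": 2.0,
    "nonsynonymous": 5.0,
    "large_effect": 10.0,
}


def allele_pair_score(allele_1: str, allele_2: str) -> float:
    """0 points for A/T or C/G SNP sites, 20 points for any other pair."""
    pair = frozenset((allele_1.upper(), allele_2.upper()))
    if pair in (frozenset("AT"), frozenset("CG")):
        return 0.0
    return 20.0


def composite_score(allele_1: str, allele_2: str, annotation: str) -> float:
    """ASSUMPTION: comprehensive score = allele-type score + annotation score."""
    if annotation in LOCATION_SCORES:
        annotation_score = LOCATION_SCORES[annotation]
    elif annotation in CODING_EFFECT_SCORES:
        annotation_score = CODING_EFFECT_SCORES[annotation]
    else:
        raise ValueError(f"unknown annotation: {annotation}")
    return allele_pair_score(allele_1, allele_2) + annotation_score


if __name__ == "__main__":
    print(composite_score("A", "G", "nonsynonymous"))
    print(composite_score("C", "G", "intron"))
```

Under this assumed sum, the allele-type term dominates, so A/T and C/G sites are selected only when no other site is available in a block.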
  • The present application provides a method of screening a hybrid rice-specific SNP marker combination, comprising the steps of:
  • scoring by allele type: a SNP site scores 0 points if its two alleles are A/T or C/G, and 20 points for any other allele difference;
  • scoring by location: a SNP site located in an intergenic region, intron, promoter, 5' untranslated region (5'-UTR), or 3' untranslated region (3'-UTR) scores 1, 1.5, 2, 2, and 2.5 points, respectively;
  • scoring by coding effect: a SNP that causes a synonymous mutation, a non-synonymous mutation, or a large-effect mutation in the coding region scores 2, 5, and 10 points, respectively.
  • The application provides a method of screening a wild rice-derived SNP marker combination, comprising the steps of:
  • aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome and removing SNP sites whose flanking sequence matches another position in the genome at more than 70%;
  • dividing the rice genome into consecutive 40 kb segments by position and selecting, in each segment, the SNP site with the highest score.
  • The application provides a method of screening a functional gene region marker combination, comprising the steps of:
  • aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome and removing SNP sites whose flanking sequence matches another position in the genome at more than 70%;
  • selecting SNP sites in specific functional gene regions in which the Rice60K chip disclosed in WO/2014/121419A1 has no more than 10 SNP sites.
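The flank-uniqueness filter described above (55 bp flank, more than 70% match elsewhere in the genome leads to removal) can be illustrated with a toy version. The text does not name the alignment tool, so this sketch substitutes a brute-force, ungapped comparison against every non-overlapping window of a genome string. That is an assumption made for illustration, workable only on small sequences, and all function names are hypothetical.

```python
def max_offtarget_identity(flank: str, genome: str, own_start: int) -> float:
    """Highest ungapped identity between `flank` and any genome window that
    does not overlap the flank's own position (`own_start`, 0-based)."""
    n = len(flank)
    best = 0.0
    for start in range(len(genome) - n + 1):
        if abs(start - own_start) < n:
            continue  # skip the flank itself and windows overlapping it
        window = genome[start:start + n]
        matches = sum(a == b for a, b in zip(flank, window))
        best = max(best, matches / n)
    return best


def keep_site(flank: str, genome: str, own_start: int,
              max_identity: float = 0.70) -> bool:
    """Keep the SNP site only if its flank matches no other position at
    more than `max_identity` (70% in the text)."""
    return max_offtarget_identity(flank, genome, own_start) <= max_identity


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGAAAA"
    print(keep_site("AAAA", toy_genome, own_start=0))  # repeated elsewhere
    print(keep_site("CCCC", toy_genome, own_start=4))  # unique
```

In practice the flank would be the 55 bp on one side of the SNP site and the comparison would be done with a sequence aligner rather than this exhaustive scan.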
  • Figure 1 shows the distribution of SNP sites on the rice genome.
  • The ordinate numbers represent the 12 rice chromosomes and the abscissa is the physical position; the height of each vertical line indicates the number of SNP sites; the legend gives the correspondence between line height and number of SNP sites.
  • 1a: distribution of the functional gene region SNP sites among the newly added 30K SNP sites;
  • 1b: distribution of the wild rice-derived SNP sites among the newly added 30K SNP sites;
  • 1c: distribution of the hybrid rice-specific SNP sites among the newly added 30K SNP sites;
  • 1d: distribution of the SNP sites representative of germplasm resources among the newly added 30K SNP sites;
  • 1e: distribution of all newly added 30K SNP sites;
  • 1f: distribution of the newly added 30K and Rice60K SNP sites together.
  • Figure 2 shows the genetic background of the rice blast-resistant material A08-1 analyzed with the 90K chips.
  • 2a: test result of Rice60KAddon1;
  • 2b: test result of Os90Kv1.
  • The boxes indicated by the abscissa numbers represent the 12 rice chromosomes in order, and the ordinate is the physical position on the rice genome in megabases (Mb). White background indicates segments whose genotype is consistent with the recipient material.
  • Black lines indicate segments whose genotype is consistent with the donor material K22, and the lines at the black dot on chromosome 6 mark the target fragment.
  • Figure 3 shows the results of haplotype cluster analysis of the blast resistance gene Pi2/Pi9/Pigm region.
  • 3a: cluster analysis result using the newly added 30K SNP marker combination;
  • 3b: cluster analysis result using the Rice60K chip.
  • The ordinate represents the difference value between materials;
  • the abscissa lists the tested materials, and materials connected by a horizontal line are assigned to the same haplotype.
  • The term "single nucleotide polymorphism", "SNP", "SNP marker", or "SNP site" as used herein refers to a variation at a single nucleotide position (a change in A, T, C, or G) in the genomic sequence of a chromosome. Such variation creates diversity among genomes, which in turn allows different alleles (e.g., alleles from two different individuals) or different individuals to be distinguished from each other. The change may occur in the coding or non-coding region of a gene (e.g., in or near the promoter region, or in an intron) or in an intergenic region.
  • The term "allele" refers to one of the different forms of the same gene present at a given locus on homologous chromosomes.
  • The term "linkage disequilibrium" refers to a non-random association of alleles at two or more loci, which may be on the same chromosome or on different chromosomes. Linkage disequilibrium is also called gametic phase disequilibrium or gametic disequilibrium. In other words, linkage disequilibrium is present when a combination of alleles or genetic markers (a haplotype) occurs in the population at a frequency higher or lower than expected from the random combination of the individual allele frequencies. Linkage refers to the limited recombination between two or more loci on a chromosome, and linkage disequilibrium is not equivalent to linkage. The magnitude of linkage disequilibrium depends on the difference between the observed and expected haplotype frequencies.
  • A population in which the frequencies of allele combinations or genotypes equal the expected frequencies is said to be in linkage equilibrium.
  • The degree of linkage disequilibrium depends on a variety of factors, including genetic linkage, selection, recombination rate, genetic drift, assortative mating, and population structure.
  • The term "linkage disequilibrium block" refers to a haplotype block defined among whole-genome SNP markers by the LD value D', based on differences in linkage disequilibrium.
  • A haplotype block is a group of single nucleotide polymorphisms located in a particular region of a chromosome that are associated with each other and tend to be inherited together by the offspring.
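For reference, the LD value D' mentioned above is conventionally derived from the haplotype and allele frequencies at two biallelic loci: D = p(AB) - p(A)p(B), normalized by the largest magnitude D could take given the allele frequencies. The sketch below follows this standard definition; the text does not describe how the applicants computed D' or what threshold they used to delimit blocks, and the function name is hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalized linkage disequilibrium |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


if __name__ == "__main__":
    # Alleles A and B always travel together: complete disequilibrium.
    print(d_prime(p_ab=0.5, p_a=0.5, p_b=0.5))
    # Haplotype frequency equals the product of allele frequencies: equilibrium.
    print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))
```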
  • MAF is the Minor Allele Frequency, which refers to the frequency of the less common allele in a given population. Higher values indicate a greater likelihood of polymorphism between any two varieties.
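As a minimal sketch of the MAF definition above, the following counts alleles across diploid genotype calls at one SNP site. The genotype encoding (two-letter strings such as "AG", with "NN" for a missing call) and the function name are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def minor_allele_frequency(genotypes: Iterable[str], missing: str = "NN") -> float:
    """Frequency of the second most common allele among non-missing calls."""
    counts = Counter(
        allele for call in genotypes if call != missing for allele in call
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-missing genotype calls")
    if len(counts) < 2:
        return 0.0  # monomorphic site
    ranked = sorted(counts.values(), reverse=True)
    return ranked[1] / total


if __name__ == "__main__":
    calls = ["AA", "AG", "GG", "AA", "NN"]
    print(minor_allele_frequency(calls))  # 3 G alleles out of 8 -> 0.375
```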
  • The term "InDel" refers to an insertion or deletion, specifically a difference in the genome in which a certain number of nucleotides are inserted or deleted in the genome of an individual relative to a standard reference (Jander et al., 2002).
  • The term "SNP chip" refers to a biological microchip capable of analyzing the presence of SNPs contained in sample DNA. Hundreds to hundreds of thousands of biomolecules, such as DNA of known sequence, DNA fragments, cDNA, oligonucleotides, RNA, or RNA fragments, are arranged and attached as probes at regular intervals on a small solid substrate made of glass, silicon, or nylon. Hybridization occurs between the nucleic acid contained in the sample and the probes immobilized on the surface according to their degree of complementarity. By detecting and interpreting the hybridization, information about many substances contained in the sample can be obtained at the same time.
  • The current major types of DNA chips include the following. In-situ synthesis uses modified oligonucleotide monomers to synthesize spatially addressed probe sequences step by step, thereby directly synthesizing an oligonucleotide probe array on a hard surface to form a DNA chip.
  • Off-chip synthesis involves spotting pre-synthesized probe sequences onto specific sites, thereby forming a DNA probe array immobilized on a glass substrate.
  • The microbead method involves directly synthesizing DNA probes on encoded microbeads, or immobilizing prepared probe sequences on encoded microbeads, and randomly assembling the microbeads into a chip.
  • The application provides a SNP marker combination for rice genotyping, characterized by comprising the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781.
  • Each nucleotide sequence shown in SEQ ID NO: 1-27781 consists of a SNP site together with 70 bp upstream and 70 bp downstream of it; when actually designing a probe, the probe can be designed from either the upstream or the downstream sequence.
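To make the layout of these sequences concrete, the sketch below cuts a SNP site and its 70 bp flanks out of a chromosome sequence, giving a record of 70 + 1 + 70 = 141 nucleotides. The function name and the 0-based coordinate convention are assumptions; the text does not describe how the sequence listing was generated.

```python
from typing import Tuple


def snp_with_flanks(chrom_seq: str, pos: int, flank: int = 70) -> Tuple[str, str, str]:
    """Return (upstream flank, SNP base, downstream flank) for a 0-based position."""
    if pos - flank < 0 or pos + flank >= len(chrom_seq):
        raise ValueError("not enough sequence on one side of the SNP site")
    upstream = chrom_seq[pos - flank:pos]
    snp_base = chrom_seq[pos]
    downstream = chrom_seq[pos + 1:pos + 1 + flank]
    return upstream, snp_base, downstream


if __name__ == "__main__":
    toy_chromosome = "A" * 70 + "G" + "C" * 70
    up, base, down = snp_with_flanks(toy_chromosome, pos=70)
    print(len(up), base, len(down))
```

A probe would then be designed from either `upstream` or `downstream`, as the text notes.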
  • The SNP marker combination further comprises the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071.
  • The SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071 are the combination of 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT International Application No. WO 2014/121419 A1. Each sequence includes the SNP marker and its flanking sequence on one side and can be used to design chips.
  • The nucleotide sequences set forth in SEQ ID NO: 1-86071 are collectively referred to as 90K, wherein the SNP markers first published in the present application (i.e., the SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781) are referred to as the newly added 30K, and the SNP markers in the nucleotide sequences shown in SEQ ID NO: 27782-86071 are referred to as 60K.
  • The application provides a rice chip comprising detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781.
  • The chip further comprises detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071, i.e., the chip comprises detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-86071.
  • The chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences set forth in SEQ ID NO: 1-86071.
  • The detection sites are probe combinations designed for the SNP markers.
  • The chip is fabricated using in-situ synthesis, off-chip synthesis, or the microbead method.
  • The chip is made by in-situ photolithographic synthesis, photoresist parallel synthesis, microfluidic channel on-chip synthesis, light-directed in-situ synthesis, soft lithography in-situ synthesis, inkjet printing synthesis, molecular stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method.
  • The chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.
  • The application provides the use of the above SNP marker combination or chip in detecting a biological sample.
  • The detection is used for breeding, identification, gene mapping and cloning, germplasm identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
  • The application provides a method of detecting a biological sample, the method comprising detecting information on the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781 in the biological sample.
  • The method further comprises detecting information on the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071 in the biological sample. In certain embodiments, the method comprises detecting information on the SNP markers in at least 37582 of the nucleotide sequences set forth in SEQ ID NO: 1-86071 in the biological sample.
  • The detection is performed using a gene chip.
  • The chip comprises detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 1-27781.
  • The chip further comprises detection sites designed for the SNP markers in the nucleotide sequences set forth in SEQ ID NO: 27782-86071.
  • The chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences set forth in SEQ ID NO: 1-86071.
  • The detection sites are probe combinations designed for the SNP markers.
  • The chip is fabricated using in-situ synthesis, off-chip synthesis, or the microbead method.
  • The chip is made by in-situ photolithographic synthesis, photoresist parallel synthesis, microfluidic channel on-chip synthesis, light-directed in-situ synthesis, soft lithography in-situ synthesis, inkjet printing synthesis, molecular stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension chip method.
  • The chip is fabricated by Illumina Infinium technology or Affymetrix Axiom technology.
  • The application provides a method of screening a SNP marker combination representative of germplasm resources, comprising the steps of:
  • scoring by allele type: a SNP site scores 0 points if its two alleles are A/T or C/G, and 20 points for any other allele difference;
  • scoring by location: a SNP site located in an intergenic region, intron, promoter, 5' untranslated region (5'-UTR), or 3' untranslated region (3'-UTR) scores 1, 1.5, 2, 2, and 2.5 points, respectively;
  • scoring by coding effect: a SNP that causes a synonymous mutation, a non-synonymous mutation, or a large-effect mutation in the coding region scores 2, 5, and 10 points, respectively;
  • dividing the whole rice genome into linkage disequilibrium blocks and selecting, in each block, the two sites with the highest comprehensive score, with at most 25 sites per block and at least 10 sites per 100 kb.
  • The present application provides a method of screening a hybrid rice-specific SNP marker combination, comprising the steps of:
  • scoring by allele type: a SNP site scores 0 points if its two alleles are A/T or C/G, and 20 points for any other allele difference;
  • scoring by location: a SNP site located in an intergenic region, intron, promoter, 5' untranslated region (5'-UTR), or 3' untranslated region (3'-UTR) scores 1, 1.5, 2, 2, and 2.5 points, respectively;
  • scoring by coding effect: a SNP that causes a synonymous mutation, a non-synonymous mutation, or a large-effect mutation in the coding region scores 2, 5, and 10 points, respectively.
  • The application provides a method of screening a wild rice-derived SNP marker combination, comprising the steps of:
  • aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome and removing SNP sites whose flanking sequence matches another position in the genome at more than 70%;
  • dividing the rice genome into consecutive 40 kb segments by position and selecting, in each segment, the SNP site with the highest score.
  • The application provides a method of screening a functional gene region marker combination, comprising the steps of:
  • aligning the 55 bp sequence upstream or downstream of the SNP site against the rice genome and removing SNP sites whose flanking sequence matches another position in the genome at more than 70%;
  • selecting SNP sites in specific functional gene regions in which the Rice60K chip disclosed in WO/2014/121419A1 has no more than 10 SNP sites.
  • The SNP markers in the nucleotide sequences shown in SEQ ID NO: 1-27781 consist of five types of markers, and the corresponding SNP sites were screened by the following methods, respectively.
  • Scoring by allele type: a SNP site scores 0 points if its two alleles are A/T or C/G, and 20 points for any other allele difference;
  • because base mutations in the coding region are directly related to function, a SNP that causes a synonymous mutation, a non-synonymous mutation, or a large-effect mutation (such as a stop mutation) in the coding region scores 2, 5, and 10 points, respectively;
  • the rice genome is divided into linkage disequilibrium blocks. The general principle of site selection is that the SNP sites should be representative and evenly distributed: in each block the two sites with the highest comprehensive score are selected, ensuring that at least 10 sites are selected per 100 kb. When there are fewer than 5 blocks within 100 kb, i.e., fewer than 10 sites would be selected per 100 kb, 3 or more SNP sites are selected in each block, with a maximum of 25 sites per block.
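The block-wise selection rule above can be sketched for a single 100 kb window as follows. The text fixes the starting quota (2 per block), the target (at least 10 sites per 100 kb), and the cap (25 per block), but not how the quota is raised when a window has too few blocks. Raising it uniformly, one site per block at a time, is an assumption of this sketch, and the function name and data layout are hypothetical.

```python
from typing import List, Tuple

Site = Tuple[str, float]  # (site identifier, comprehensive score)


def select_sites_in_window(
    blocks: List[List[Site]],
    base_per_block: int = 2,
    min_per_window: int = 10,
    max_per_block: int = 25,
) -> List[Site]:
    """Pick top-scoring sites per LD block within one 100 kb window."""
    # Rank the sites of each block from highest to lowest score.
    ranked = [sorted(b, key=lambda s: s[1], reverse=True) for b in blocks]
    quota = base_per_block
    while True:
        chosen = [site for block in ranked for site in block[:quota]]
        enough = len(chosen) >= min_per_window
        exhausted = all(len(block) <= quota for block in ranked)
        if enough or exhausted or quota >= max_per_block:
            return chosen
        quota += 1  # ASSUMPTION: raise the per-block quota uniformly


if __name__ == "__main__":
    # Two blocks in the window: the quota rises from 2 to 5 sites per block.
    window = [
        [(f"a{i}", float(i)) for i in range(30)],
        [(f"b{i}", float(i)) for i in range(30)],
    ]
    picked = select_sites_in_window(window)
    print(len(picked), picked[0])
```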
  • Scoring by allele type: a SNP site scores 0 points if its two alleles are A/T or C/G, and 20 points for any other allele difference;
  • because base mutations in the coding region are directly related to function, a SNP that causes a synonymous mutation, a non-synonymous mutation, or a large-effect mutation (such as a stop mutation) in the coding region scores 2, 5, and 10 points, respectively;
  • the SNP sites are selected evenly across the rice genome.
  • The rice genome is divided into consecutive 40 kb segments by position, and in each segment the SNP site with the highest score is selected.
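The even selection described above amounts to binning SNP sites into 40 kb segments and keeping the top-scoring site in each. A minimal sketch follows, assuming sites arrive as `(chromosome, position, score)` tuples with 0-based positions; the function name is hypothetical, and ties are resolved in favor of the site seen first, which the text does not specify.

```python
from typing import Dict, List, Tuple

ScoredSite = Tuple[str, int, float]  # (chromosome, position, score)


def best_site_per_bin(sites: List[ScoredSite], bin_size: int = 40_000) -> List[ScoredSite]:
    """Keep the highest-scoring SNP site in every `bin_size` segment."""
    best: Dict[Tuple[str, int], ScoredSite] = {}
    for chrom, pos, score in sites:
        key = (chrom, pos // bin_size)
        if key not in best or score > best[key][2]:
            best[key] = (chrom, pos, score)
    # Return the winners ordered by chromosome name and segment index.
    return [best[key] for key in sorted(best)]


if __name__ == "__main__":
    candidates = [
        ("chr1", 100, 21.0),
        ("chr1", 39_999, 25.0),
        ("chr1", 40_000, 22.5),
        ("chr2", 5, 20.0),
    ]
    for site in best_site_per_bin(candidates):
        print(site)
```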
  • 8316 large-effect SNP sites were selected from 879 reported functional gene regions (Figure 1a shows the distribution of the functional gene region SNP sites among the newly added 30K SNP sites; the ordinate numbers represent the 12 rice chromosomes in order and the abscissa is the physical position; the height of each vertical line indicates the number of SNP sites; the legend gives the correspondence between line height and number of SNP sites).
  • 191 SNP markers covering the rice blast resistance genes, the brown planthopper resistance genes, the fertility restorer gene, and other gene regions can distinguish different allele types.
  • The design method is as follows. Rice materials with and without the target gene are selected. Based on the known position of the target gene in the genome and using the Nipponbare genome as the reference, primers are designed every 5-10 kb, and the sequence of the 250 kb interval around the target gene is obtained by Sanger sequencing. SNPs that differ between the two groups of materials are then identified and used to design markers. A total of 191 SNP markers were obtained from five gene regions (Pi1, Pi2, Bph14, Bph15, Rf-1).
  • The applicants combined all of the SNP markers obtained in Example 1 with the 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT International Application WO/2014/121419A1 and used Illumina Infinium chip technology to produce a rice 90K whole genome breeding chip, named Rice60KAddon1 (Figure 1f shows the distribution of the newly added 30K and Rice60K SNP sites; the ordinate numbers represent the 12 rice chromosomes and the abscissa is the physical position; the height of each vertical line indicates the number of SNP sites; the legend gives the correspondence between line height and number of SNP sites).
  • The markers detected by the chip comprise the 27,781 SNP markers of the present application and the 58,290 SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in PCT International Application WO/2014/121419 A1.
  • The chip probe sequences were designed and selected within the 70 bp regions on both sides of each SNP marker according to the requirements of Illumina Infinium chip technology.
  • The SNP marker combination in the nucleotide sequences set forth in SEQ ID NO: 1-27781 is referred to simply as the newly added 30K to distinguish it from the previously published SNP markers on the chip.
  • The rice genome breeding chips Rice6K and Rice60K (or RiceSNP50), developed by the applicant based on Illumina Infinium technology, have been shown to work well in rice molecular breeding and functional genomics research (Yu et al., A whole-genome SNP array (RICE6K) for genomic breeding in rice. Plant Biotechnol J. 2014, 12: 28-37; Chen et al., A high-density SNP genotyping array for rice biology and molecular breeding. Mol Plant.
  • The applicants submitted the 58,290 SNP markers detected by the Rice60K chip together with the 27,781 newly added SNP markers, a total of 86,071 SNP markers, to Affymetrix (http://www.affymetrix.com/) for chip fabrication.
  • Affymetrix designed two probe sets for each marker, one from the sequence on each side. The final chip has 131,631 probe sets that detect a total of 86,014 SNP sites and is named Os90Kv1.
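The marker counts quoted for the two 90K chips can be cross-checked with a few lines of arithmetic. The figures are the ones given in the text (27,781 newly added markers, 58,290 Rice60K markers, 86,014 sites detected by Os90Kv1, 131,631 probe sets); the variable names are illustrative only.

```python
new_markers = 27_781        # SEQ ID NO: 1-27781 (newly added 30K)
rice60k_markers = 58_290    # SEQ ID NO: 27782-86071 (60K)
total_submitted = new_markers + rice60k_markers

detected_on_os90kv1 = 86_014
probe_sets = 131_631

# Share of the submitted markers that the final Axiom design still detects.
detected_fraction = detected_on_os90kv1 / total_submitted
# Average number of retained probe sets per detected SNP site.
probe_sets_per_site = probe_sets / detected_on_os90kv1

print(f"submitted markers: {total_submitted}")
print(f"detected on Os90Kv1: {detected_fraction:.2%}")
print(f"probe sets per detected site: {probe_sets_per_site:.2f}")
```

The sum matches the 86,071 sequences in the sequence listing, and 57 of the submitted markers are not detected by the final Os90Kv1 design.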
  • The reference variety containing the Pi9 gene was 75-1-127 (The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. Genetics. 2006, 172: 1901-1914), and the reference variety containing the Pigm gene was Gumei 4 (GM4) (Deng et al., Genetic characterization and fine mapping of the blast resistance locus Pigm(t) tightly linked to Pi2 and Pi9 in a broad-spectrum resistant Chinese variety. Theor Appl Genet 113, 705-713). DNA was extracted from a total of 7 samples, comprising the test samples and the reference samples, and the whole-genome genotypes of the 7 samples were obtained with the rice whole genome breeding chip Rice60KAddon1 following the Illumina Infinium chip detection procedure.
  • The genotyping results of the 60K SNP markers detected by the rice whole genome breeding chip Rice60K disclosed in WO2014/121419A1 and of the newly added 30K SNP markers were each subjected to cluster analysis in the Pi2/Pi9/Pigm gene region (250 kb upstream and downstream).
  • The results are shown in Figure 3 (the ordinate indicates the difference value between materials; the abscissa lists the tested materials, and materials connected by a horizontal line are assigned to the same haplotype).
  • The clustering results of the two marker sets were consistent in this region: the haplotypes of R002, R005, R006, and C101A51 were identical, and the haplotype of R004 was consistent with that of GM4. This result indicates that R002, R005, and R006 contain the Pi2 gene and R004 contains the Pigm gene.
  • Sanger sequencing of the target genes of the above materials gave results consistent with the clustering results, indicating that the SNP markers designed according to functional gene region haplotypes perform as intended.
  • The Rice60K clustering result showed a difference value between 75-1-127 and C101A51 of less than 0.2, whereas the newly added 30K gave a value greater than 0.2 and close to 0.3. The larger the value, the better the discrimination.
  • These two materials have been confirmed to contain different resistance genes, so the newly added 30K discriminates better than Rice60K in this functional gene region.
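The text does not state how the "difference value" between two materials is computed. One plausible reading, used here purely as an illustrative assumption, is the fraction of markers in the region at which the two materials' genotype calls differ, ignoring markers with a missing call in either material. The function name, the genotype encoding, and the example calls are all hypothetical and are not data from the application.

```python
from typing import Sequence


def genotype_difference(calls_a: Sequence[str], calls_b: Sequence[str],
                        missing: str = "NN") -> float:
    """Fraction of jointly called markers at which two materials differ."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both materials must be typed at the same markers")
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a != missing and b != missing]
    if not pairs:
        raise ValueError("no marker is called in both materials")
    return sum(a != b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up genotype calls at five markers in a gene region.
    material_1 = ["AA", "GG", "CC", "TT", "AA"]
    material_2 = ["AA", "GG", "CT", "TT", "NN"]
    print(genotype_difference(material_1, material_2))  # 1 of 4 differ -> 0.25
```

With such a measure, two materials would share a haplotype when their difference is zero (or below a chosen tolerance), and a denser marker set in the region gives more chances to separate materials carrying different genes.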
  • the target genomic DNA fragment is homologously recombined, the target genomic DNA fragment is homozygous, and the target plant with complete background recovery.
  • The "high-density marker detection method" in step (3) can perform genotype detection using the SNP marker combination described in the present application and a chip designed for these SNP markers.
  • A method for identifying rice DNA identity is disclosed in Chinese Patent Application CN201610009053.9 (Publication No. CN 105550537A); it obtains standard genetic fingerprint data of rice by detecting the genotypes of a set of genetic diversity markers distributed throughout the rice genome, and thereby identifies the DNA identity of the rice.
  • The "set of genetic diversity markers distributed throughout the rice genome" in this method can be detected using the SNP marker combinations described herein and the chips designed for these SNP markers.
  • the rice genome-wide breeding chip Rice6K developed by the applicant has been applied to the rice grain size and yield-related QTL positioning (Sun et al., Identification of quantitative trait loci for grain size and the contributions of major grain-size QTLs to grain weight in rice, Mol Breeding DOI10.1007/s11032-012-9802-z; Tan et al, QTL Scanning for Rice Yield Using a Whole Genome SNP Array, Journal of Genetics and Genomics, 2013), SNP marker combinations described herein and designed for these SNP markers
  • the chip has a purposeful increase in the detected SNP sites, which can provide more accurate information for gene mapping and cloning.
  • SNP marker combinations described in the present application and the chips designed for these SNP markers add the following five types of markers: representative markers of germplasm resources, promotion of hybrid rice-specific markers, wild rice source markers, functional gene region markers, and functional gene regions. Type mark. It is apparent that the SNP marker combinations and chips designed for these SNP markers can be applied to germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, and functional gene haplotype analysis.
  • Example 6. Setting the minimum number of SNP markers needed to achieve the detection function
  • Rice60KAdd1 can accurately identify the rice blast resistance fragment contained in A08-1.
  • Rice60KAdd1 detected a total of 65,071 high-quality sites in A08-1. Within the target rice blast resistance fragment, 11 of these SNP markers distinguish A08-1 from the recipient parent Kongyu 131 (see the table below), where the genotype of the recipient parent Kongyu 131 is set to A and the genotype of the donor parent K22 is set to B.
  • in practice, three consecutive AA or BB calls at polymorphic sites are generally considered reliable; that is, a difference detected by three or more of the SNP markers in the above table is enough to establish that the material differs in the target segment.
  • standard random sampling was performed on the 65,071 high-quality sites: sites were drawn at random 100 times for each sample size, and the number of the 11 differential SNP markers in the table that were drawn was counted.
  • the results show that when the number of sampled sites is greater than 37582, the probability that fewer than 3 of the 11 differential SNP markers are drawn is less than 0.05, which is a small-probability event under a normal distribution. That is, among the 86,014 SNP markers contained in the Rice60KAdd1 chip, 37582 is the minimum number of SNP markers needed to achieve the detection function.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Plant Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A SNP marker combination for rice genotyping and a method for designing it, a chip designed for these SNP markers, and uses thereof.

Description

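Rice Whole-Genome Breeding Chip and Use Thereof
Technical Field
The present application relates to the fields of genomics, molecular biology, bioinformatics and molecular plant breeding, and in particular to a rice whole-genome breeding chip and use thereof.
Background
Genomic breeding refers to applying molecular biology techniques to breeding, so that breeding is carried out at the genome level. It has three main advantages. First, plant seeds or seedlings can be characterized at the molecular level to judge whether they carry the desired superior traits and then selected accordingly, which speeds up the breeding process and improves its accuracy. Second, molecular detection and analysis can be turned into a standard procedure; different technicians who strictly follow it can all obtain accurate results quickly, which greatly reduces the influence of personal experience on plant selection. Third, the marker techniques used in genomic breeding can survey the whole genome, which avoids segregation in the progeny caused by heterozygous loci in the material and ensures the stability of the material. Marker technology is an important tool in genomic breeding and has already contributed greatly to functional genomics research and the genetic improvement of crops. Among markers, SNPs (single nucleotide polymorphisms), the third generation of markers, are used more and more widely because they are widely distributed in the genome, dense, stable and accurate. High-throughput SNP detection relies mainly on sequencing-based platforms and chip-based platforms. Because the marker loci are controllable, operation is convenient and the results are reliable, SNP chips have become an important tool in genomic breeding. At present, the two most mature SNP chip platforms are the Illumina Infinium chip and the Affymetrix Axiom chip.
Illumina Infinium chip technology is a bead-based high-density chip technology. It uses beads 3 μm in diameter that self-assemble in microwells on a fiber-optic bundle or planar silicon substrate. Each bead is covered with hundreds of thousands of copies of a specific oligonucleotide, which serve as capture sequences for genotyping samples during detection. Chips come in the following formats according to the number of oligonucleotide types: a 24-sample format (3,000-90,000 bead types), a 12-sample format (90,001-250,000 bead types) or a 4-sample format (250,001-1,000,000 bead types). The scanning system supplied with the chip has advanced lasers and optics, can handle high-density multi-sample chips, and produces high-quality data while running fast. Advanced analysis techniques give a high average sample call rate and reproducibility of up to 99.9%. Such high-quality data reduce the likelihood of false positives and false negatives, making genotyping results more accurate.
The Affymetrix Axiom chip uses in situ photolithography. The photomask design and strict process control in this technology give the manufactured chips high quality, reproducibility and consistency, and also ensure an extremely high density of synthesized probes: more than 4 million probes are synthesized per square centimeter of substrate. The Affymetrix GeneTitan system is a fully automated, highly integrated chip workstation that uses array plates in a format similar to a 96-well plate, in which each square chip occupies roughly the area of one well; one array plate can hold 16, 24 or 96 chips, which allows high-throughput detection of many samples. The system integrates the hybridization oven, fluidics station and CCD scanning and imaging device used throughout the experiment, from hybridization to scanning, into one instrument. Once the array plate is loaded into the GeneTitan system, hybridization, washing and scanning need almost no manual intervention and are carried out automatically by the instrument.
In PCT international application publication WO/2014/121419A1, the applicant disclosed the rice whole-genome breeding chip Rice60K, which has been successfully applied in rice genomic breeding and functional genomics research.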
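Summary of the Invention
In one aspect, the present application provides a SNP marker combination for rice genotyping, characterized in that it comprises the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
In some embodiments, the SNP marker combination of the present application further comprises the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071. In some embodiments, the SNP marker combination of the present application comprises the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071.
In another aspect, the present application provides a rice chip comprising detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
In some embodiments, the rice chip of the present application comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
In some embodiments, the rice chip of the present application further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071. In some embodiments, the rice chip of the present application comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071. In some embodiments, the detection sites in the rice chip of the present application are a combination of probes designed for the SNP markers.
In some embodiments, the rice chip of the present application is made by on-chip in situ synthesis, off-chip synthesis, or a bead-based method. In some embodiments, the rice chip of the present application is made by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension array method. In some embodiments, the rice chip of the present application is made with Illumina Infinium technology or Affymetrix Axiom technology.
In another aspect, the present application provides use of the above SNP marker combination or chip in testing a biological sample. In certain specific embodiments, the testing is used for breeding, identity verification, gene mapping and cloning, germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
In another aspect, the present application provides a method for testing a biological sample, the method comprising detecting, in the biological sample, information on the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781. In some embodiments, the method of the present application further comprises detecting, in the biological sample, information on the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071. In some embodiments, the method of the present application comprises detecting, in the biological sample, information on the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071. In some embodiments, the method of the present application uses a gene chip to carry out the detection.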
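In another aspect, the present application provides a method for screening a combination of SNP markers representative of germplasm resources, comprising the following steps:
obtaining SNP sites from the sequencing results of a plurality of rice varieties;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
II. 1, 1.5, 2, 2 and 2.5 points when the SNP site is located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
III. 2, 5 and 10 points when the SNP causes a synonymous mutation, a nonsynonymous mutation and a large-effect mutation in a coding region, respectively;
IV. (MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica population × 25) + (MAF of the SNP site in the japonica population × 25) + (MAF of the SNP site in pooled sequencing × 25);
selecting a plurality of SNP sites evenly across the rice genome according to the composite score; and
dividing the whole rice genome into linkage disequilibrium blocks according to LD values, and selecting from each block the 2 sites with the highest composite scores and at most 25 sites, such that at least 10 sites are selected per 100 kb.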
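In another aspect, the present application provides a method for screening a combination of SNP markers specific to commercial hybrid rice, comprising the following steps:
performing whole-genome sequencing on a plurality of hybrid rice samples to obtain a plurality of SNP sites;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
II. 1, 1.5, 2, 2 and 2.5 points when the SNP site is located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
III. 2, 5 and 10 points when the SNP causes a synonymous mutation, a nonsynonymous mutation and a large-effect mutation in a coding region, respectively;
IV. MAF of the SNP site in pooled sequencing × 50;
selecting a plurality of SNP sites evenly across the rice genome according to the composite score.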
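In another aspect, the present application provides a method for screening a combination of wild-rice-derived SNP markers, comprising the following steps:
obtaining SNP sites derived from wild rice varieties from a rice SNP database;
removing sites that have another SNP or an Indel within 55 bp upstream or downstream of the SNP site;
selecting SNP sites that can be detected in at least 10% of the varieties;
aligning the 55 bp sequence upstream or downstream of each SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome at 70% or more;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
dividing the rice genome into segments of 40 kb by position, and selecting from each segment the one SNP site with the highest score.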
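In another aspect, the present application provides a method for screening a combination of functional gene region markers, comprising the following steps:
obtaining a plurality of SNP sites from a rice SNP database, wherein the SNP sites are located within the nucleotide sequences of a plurality of functional genes of a plurality of rice varieties and can be detected in three or more varieties;
removing sites that have another SNP or an Indel within 55 bp upstream or downstream of the SNP site;
aligning the 55 bp sequence upstream or downstream of each SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome at 70% or more;
removing SNP markers located more than 5 kb upstream or downstream of the functional genes;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
selecting SNP sites in specific functional gene regions, a specific functional gene region being one in which the Rice60K chip disclosed in WO/2014/121419A1 already has no more than 10 SNP sites.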
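Brief Description of the Drawings
Figure 1 shows the distribution of SNP sites on the rice genome. The numbers on the vertical axis denote the 12 rice chromosomes in order, and the horizontal axis shows physical position; the height of each vertical line indicates the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites. 1a: distribution of functional gene region SNP sites among the newly added 30K SNP sites; 1b: distribution of wild-rice-derived SNP sites among the newly added 30K SNP sites; 1c: distribution of commercial hybrid rice-specific SNP sites among the newly added 30K SNP sites; 1d: distribution of germplasm-representative SNP sites among the newly added 30K SNP sites; 1e: distribution of the newly added 30K SNP sites; 1f: distribution of the newly added 30K and Rice60K SNP sites.
Figure 2 shows the genetic background of the blast-resistance improved line A08-1 as detected with the 90K chips. 2a: Rice60KAddon1 result; 2b: Os90Kv1 result. The boxes indicated by the numbers on the horizontal axis denote the 12 rice chromosomes in order, and the numbers on the vertical axis are physical positions on the rice genome in megabases (Mb). White background indicates a genotype identical to that of the recipient Kongyu 131, black lines indicate a genotype identical to that of the donor K22, and the line marked with a black dot on chromosome 6 is the target fragment.
Figure 3 shows the haplotype cluster analysis of the blast resistance gene region Pi2/Pi9/Pigm. 3a: cluster analysis using the newly added 30K SNP marker combination; 3b: cluster analysis using the Rice60K chip. The vertical axis shows the difference value between materials; the tested materials are arranged horizontally, and those connected by a horizontal line are assigned to the same haplotype.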
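Detailed Description
As used herein, the term "single nucleotide polymorphism", "SNP", "SNP marker" or "SNP site" refers to a nucleotide sequence present in the genomic sequence of a chromosome in which a change in the polynucleotide sequence arising from a nucleotide difference (a change of a single nucleotide: A, T, C or G) produces diversity in the chromosomal genome and thereby allows different alleles (for example, alleles from two different individuals) or different individuals to be distinguished from one another. The change may occur within the coding or non-coding region of a gene (for example, in or near the promoter region, or in an intron) or in an intergenic region.
As used herein, the term "allele" refers to one of the different forms of the same gene present at a given locus on homologous chromosomes.
As used herein, the term "linkage disequilibrium" refers to the non-random association of two or more loci, which may lie on the same chromosome or on different chromosomes. Linkage disequilibrium is also called gametic phase disequilibrium or gametic disequilibrium. Put another way, linkage disequilibrium is the occurrence of combinations of alleles or genetic markers in a population more often or less often than would be expected from the random formation of haplotypes from alleles according to their frequencies. Linkage refers to the limited recombination between two or more loci on a chromosome, and linkage disequilibrium is not the same as linkage. The amount of linkage disequilibrium depends on the difference between the observed and expected allele frequencies. Populations in which combinations of alleles or genotypes occur at the expected frequencies are said to be in linkage equilibrium. The level of linkage disequilibrium is influenced by many factors, including genetic linkage, selection, the rate of recombination, genetic drift, assortative mating and population structure.
As used herein, the term "linkage disequilibrium block" refers to a haplotype block of genome-wide SNP markers defined according to differences in linkage disequilibrium, using the LD value D' as the criterion. A haplotype is a set of single nucleotide polymorphisms that lie in a specific region of one chromosome, are associated with one another, and tend to be inherited together by the progeny.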
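As a rough illustration of the D' value mentioned above, the sketch below computes the normalized linkage disequilibrium coefficient |D'| for two biallelic loci from one haplotype frequency and two allele frequencies. The function name and its inputs are illustrative assumptions; the application does not state how D' was computed or which threshold was used to delimit a block.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Return |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max
```

A value of 1 means that at least one of the four possible haplotypes is absent, and 0 means the two loci are in linkage equilibrium.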
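MAF is the minor allele frequency, i.e. the frequency at which the less common allele occurs in a given population. The higher its value, the more likely the site is to be polymorphic between any two varieties.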
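A minimal sketch of this definition follows. It assumes genotypes are given as two-letter strings such as `AA` or `AG`, with any non-ACGT character treated as a missing call; that input format is an assumption made for illustration and is not specified in the application.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP site from diploid genotype calls.

    genotypes: iterable of strings such as "AA", "AG", "GG".
    Characters other than A, C, G, T (e.g. "-" or "N") are skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        for allele in genotype:
            if allele in "ACGT":
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or the site is monomorphic
    # The minor allele is the second most common one.
    second_most_common = sorted(counts.values(), reverse=True)[1]
    return second_most_common / total
```

For example, the calls `["AA", "AG", "GG", "GG"]` contain three A alleles and five G alleles, so the MAF is 3/8.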
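As used herein, the term "Indel" refers to an insertion or deletion, specifically a difference in the genome in which, relative to a standard reference, a certain number of nucleotides are inserted into or deleted from the genome of an individual (Jander et al., 2002).
As used herein, the term "SNP chip" refers to a biological microchip that can analyze the presence of SNPs in sample DNA by arranging and attaching hundreds to hundreds of thousands of biomolecules as probes. The biomolecules, such as DNA of known sequence, DNA fragments, cDNA, oligonucleotides, RNA or RNA fragments, are immobilized at set intervals on a small solid substrate made of glass, silicon or nylon. Hybridization occurs between nucleic acids in the sample and the probes immobilized on the surface, depending on their degree of complementarity. By detecting and interpreting the hybridization, information on the substances contained in the sample can be obtained simultaneously.
The main types of DNA chip currently in use are as follows. In on-chip in situ synthesis, modified oligonucleotide monomers are used to synthesize spatially arranged probe sequences step by step in situ, so that an oligonucleotide probe array is synthesized directly on a hard surface. In off-chip synthesis, pre-synthesized probe sequences are spotted onto specific positions to form the DNA chip, giving a DNA probe array immobilized on a glass substrate. In the bead-based method, DNA probes are synthesized directly on encoded beads, or pre-made probe sequences are immobilized on encoded beads, and the beads are then assembled at random into a bead chip.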
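In one aspect, the present application provides a SNP marker combination for rice genotyping, characterized in that it comprises the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781. Each of the nucleotide sequences shown in SEQ ID NO:1-27781 consists of the SNP site and 70 bp on each side of it; when probes are actually designed, they may be designed from either the upstream or the downstream side.
In certain embodiments, the SNP marker combination further comprises the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071. These are the 58,290 SNP markers detected by the rice whole-genome breeding chip Rice60K disclosed in PCT international application WO2014/121419A1; the sequences comprise each SNP marker and its flanking sequence on one side and can be used for chip design.
In the context of this document, the nucleotide sequences shown in SEQ ID NO:1-86071 are collectively referred to as 90K. The SNP markers first published in the present application (i.e. the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781) are referred to as the newly added 30K, and the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071 are referred to as 60K.
In another aspect, the present application provides a rice chip comprising detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
In certain embodiments, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071, i.e. the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-86071. In certain embodiments, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071. In certain embodiments, the detection sites are a combination of probes designed for the SNP markers.
In certain embodiments, the chip is made by on-chip in situ synthesis, off-chip synthesis, or a bead-based method. In certain embodiments, the chip is made by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension array method. In certain embodiments, the chip is made with Illumina Infinium technology or Affymetrix Axiom technology.
In another aspect, the present application provides use of the above SNP marker combination or chip in testing a biological sample. In certain specific embodiments, the testing is used for breeding, identity verification, gene mapping and cloning, germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
In another aspect, the present application provides a method for testing a biological sample, the method comprising detecting, in the biological sample, information on the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
In certain embodiments, the method further comprises detecting, in the biological sample, information on the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071. In certain embodiments, the method comprises detecting, in the biological sample, information on the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071.
In certain embodiments, the detection is carried out with a gene chip. In certain embodiments, the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
In certain embodiments, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071. In certain embodiments, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071. In certain embodiments, the detection sites are a combination of probes designed for the SNP markers.
In certain embodiments, the chip is made by on-chip in situ synthesis, off-chip synthesis, or a bead-based method. In certain embodiments, the chip is made by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension array method. In certain embodiments, the chip is made with Illumina Infinium technology or Affymetrix Axiom technology.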
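In another aspect, the present application provides a method for screening a combination of SNP markers representative of germplasm resources, comprising the following steps:
obtaining SNP sites from the sequencing results of a plurality of rice varieties;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
II. 1, 1.5, 2, 2 and 2.5 points when the SNP site is located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
III. 2, 5 and 10 points when the SNP causes a synonymous mutation, a nonsynonymous mutation and a large-effect mutation in a coding region, respectively;
IV. (MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica population × 25) + (MAF of the SNP site in the japonica population × 25) + (MAF of the SNP site in pooled sequencing × 25);
selecting a plurality of SNP sites evenly across the rice genome according to the composite score; and
dividing the whole rice genome into linkage disequilibrium blocks according to LD values, and selecting from each block the 2 sites with the highest composite scores and at most 25 sites, such that at least 10 sites are selected per 100 kb.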
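In another aspect, the present application provides a method for screening a combination of SNP markers specific to commercial hybrid rice, comprising the following steps:
performing whole-genome sequencing on a plurality of hybrid rice samples to obtain a plurality of SNP sites;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
II. 1, 1.5, 2, 2 and 2.5 points when the SNP site is located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
III. 2, 5 and 10 points when the SNP causes a synonymous mutation, a nonsynonymous mutation and a large-effect mutation in a coding region, respectively;
IV. MAF of the SNP site in pooled sequencing × 50;
selecting a plurality of SNP sites evenly across the rice genome according to the composite score.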
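In another aspect, the present application provides a method for screening a combination of wild-rice-derived SNP markers, comprising the following steps:
obtaining SNP sites derived from wild rice varieties from a rice SNP database;
removing sites that have another SNP or an Indel within 55 bp upstream or downstream of the SNP site;
selecting SNP sites that can be detected in at least 10% of the varieties;
aligning the 55 bp sequence upstream or downstream of each SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome at 70% or more;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
dividing the rice genome into segments of 40 kb by position, and selecting from each segment the one SNP site with the highest score.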
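In another aspect, the present application provides a method for screening a combination of functional gene region markers, comprising the following steps:
obtaining a plurality of SNP sites from a rice SNP database, wherein the SNP sites are located within the nucleotide sequences of a plurality of functional genes of a plurality of rice varieties and can be detected in three or more varieties;
removing sites that have another SNP or an Indel within 55 bp upstream or downstream of the SNP site;
aligning the 55 bp sequence upstream or downstream of each SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome at 70% or more;
removing SNP markers located more than 5 kb upstream or downstream of the functional genes;
selecting sites with a score greater than 0.6 in the Illumina scoring system;
selecting SNP sites in specific functional gene regions, a specific functional gene region being one in which the Rice60K chip disclosed in WO/2014/121419A1 already has no more than 10 SNP sites.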
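Examples
Example 1. Method for selecting SNP markers
The SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781 consist of five types of markers, whose SNP sites were screened by the following methods.
1. SNP sites representative of germplasm resources:
(1) 6,428,770 SNP sites were obtained from the sequencing of 1491 rice varieties (from the RiceVarMap database, see http://ricevarmap.ncpgr.cn/);
(2) sites with a score greater than 0.6 in the Illumina scoring system were selected;
(3) a composite score was calculated for each SNP site as the simple sum of the following values:
I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
II. because different regions of the gene structure affect gene function to different degrees, 1, 1.5, 2, 2 and 2.5 points were given when the SNP site was located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
III. because base changes in coding regions are directly related to function, 2, 5 and 10 points were given when the SNP caused a synonymous mutation, a nonsynonymous mutation and a large-effect mutation (such as a stop mutation) in a coding region, respectively;
IV. (MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica population × 25) + (MAF of the SNP site in the japonica population × 25) + (MAF of the SNP site in pooled sequencing × 25);
(4) 4850 SNP sites were selected evenly across the rice genome according to the composite score;
(5) the whole rice genome was divided into linkage disequilibrium blocks according to LD values. The general principle for choosing sites was that the SNP sites should be representative and evenly distributed: the 2 sites with the highest composite scores were selected from each block, ensuring that at least 10 sites were selected per 100 kb. When a 100 kb interval contained fewer than 5 blocks, so that fewer than 10 sites would be selected per 100 kb, 3 or more SNP sites were selected from some blocks, with at most 25 sites selected per block.
Finally, based on LD selection and combining the sequencing results for the overall rice population, the indica and japonica subspecies, and the hybrid rice pool, 6108 SNP sites were selected (shown in Figure 1d, the distribution of germplasm-representative SNP sites among the newly added 30K SNP sites. The numbers on the vertical axis denote the 12 rice chromosomes in order, and the horizontal axis shows physical position; the height of each vertical line indicates the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).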
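The composite score of step (3) can be sketched as follows. The category labels (`intergenic`, `5utr` and so on) and the function signature are hypothetical names chosen for this illustration; only the point values and the MAF weights come from the text. The sketch also assumes that each SNP carries exactly one annotation, either a non-coding location (item II) or a coding effect (item III).

```python
# Item I: A/T and C/G allele pairs score 0, every other pair scores 20.
ZERO_SCORE_PAIRS = {frozenset("AT"), frozenset("CG")}

# Items II and III: points by annotation (labels are hypothetical).
ANNOTATION_SCORE = {
    "intergenic": 1.0,
    "intron": 1.5,
    "promoter": 2.0,
    "5utr": 2.0,
    "3utr": 2.5,
    "synonymous": 2.0,
    "nonsynonymous": 5.0,
    "large_effect": 10.0,  # e.g. a stop mutation
}


def composite_score(alleles, annotation,
                    maf_all, maf_indica, maf_japonica, maf_pooled):
    """Sum items I-IV for one SNP site of the germplasm-representative set."""
    allele_score = 0.0 if frozenset(alleles) in ZERO_SCORE_PAIRS else 20.0
    annotation_score = ANNOTATION_SCORE[annotation]
    # Item IV: each of the four MAF values is weighted by 25.
    maf_score = 25.0 * (maf_all + maf_indica + maf_japonica + maf_pooled)
    return allele_score + annotation_score + maf_score
```

An A/G SNP in an intron with a MAF of 0.5 in all four data sets would score 20 + 1.5 + 50 = 71.5. For the hybrid rice-specific markers described in this Example, item IV is instead the pooled-sequencing MAF × 50.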
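2. SNP markers specific to commercial hybrid rice:
(1) hybrid rice purchased on the market was pooled and subjected to whole-genome sequencing, giving 2,207,700 SNP sites, of which 13.8% were not detected in the sequencing data of the 1491 varieties (RiceVarMap database, see http://ricevarmap.ncpgr.cn/), indicating that it was necessary to add markers specific to commercial hybrid rice;
(2) sites with a score greater than 0.6 in the Illumina scoring system were selected;
(3) a composite score was calculated for each SNP site as the simple sum of the following values:
I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
II. because different regions of the gene structure affect gene function to different degrees, 1, 1.5, 2, 2 and 2.5 points were given when the SNP site was located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
III. because base changes in coding regions are directly related to function, 2, 5 and 10 points were given when the SNP caused a synonymous mutation, a nonsynonymous mutation and a large-effect mutation (such as a stop mutation) in a coding region, respectively;
IV. MAF of the SNP site in pooled sequencing × 50;
(4) SNP sites were selected evenly across the rice genome according to the composite score.
Finally, 4850 SNP sites were selected from the genome sequencing data of more than 100 hybrid rice varieties used in production (shown in Figure 1c, the distribution of commercial hybrid rice-specific SNP sites among the newly added 30K SNP sites. The numbers on the vertical axis denote the 12 rice chromosomes in order, and the horizontal axis shows physical position; the height of each vertical line indicates the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).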
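3. Wild-rice-derived SNP markers:
(1) 2,472,942 SNP sites derived from 446 wild rice varieties were obtained from a rice SNP database (http://202.127.18.221/RiceHap3/index.php);
(2) sites with another SNP or an Indel within 55 bp upstream or downstream were removed;
(3) SNP sites detectable in at least 10% of the varieties were selected;
(4) the 55 bp sequence upstream or downstream of each SNP site was aligned against the rice genome, and SNP sites whose sequence matched another position in the genome at 70% or more were removed;
(5) sites with a score greater than 0.6 in the Illumina scoring system were selected;
(6) the rice genome was divided into segments of 40 kb by position, and the one SNP site with the highest score was selected from each segment.
Finally, 8316 SNP sites evenly distributed across the genome were selected from the 446 published wild rice varieties (shown in Figure 1b, the distribution of wild-rice-derived SNP sites among the newly added 30K SNP sites. The numbers on the vertical axis denote the 12 rice chromosomes in order, and the horizontal axis shows physical position; the height of each vertical line indicates the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).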
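Step (6) above, keeping one top-scoring SNP per 40 kb segment, can be sketched as follows. The tuple layout `(chromosome, position, score)` and the use of 1-based positions are assumptions made for this illustration; ties are resolved in favour of the first site seen, which the application does not specify.

```python
def pick_best_per_bin(snps, bin_size=40_000):
    """Keep the highest-scoring SNP in each fixed-size genomic segment.

    snps: iterable of (chromosome, position, score) with 1-based positions.
    Returns the selected SNPs ordered by chromosome name and segment index.
    """
    best = {}
    for chrom, pos, score in snps:
        # Positions 1..bin_size fall in segment 0, the next bin_size in 1, ...
        key = (chrom, (pos - 1) // bin_size)
        if key not in best or score > best[key][2]:
            best[key] = (chrom, pos, score)
    return [best[key] for key in sorted(best)]
```

Because each segment contributes at most one site, the spacing of the selected markers is roughly even along each chromosome wherever candidate sites exist.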
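4. Functional gene region markers:
(1) 5,680,149 SNP sites in 879 functional gene regions (Xiao Jinghua et al., The progress and perspective of rice functional genomics research. Chinese Science Bulletin, 2015, 60: 1711-1722) from 590 rice varieties were obtained from the rice SNP database (http://ricevarmap.ncpgr.cn/); each of these SNP sites was detectable in three or more varieties;
(2) sites with another SNP or an Indel within 55 bp upstream or downstream were removed;
(3) the 55 bp sequence upstream or downstream of each SNP site was aligned against the rice genome, and sites whose sequence matched another position in the genome at 70% or more were removed;
(4) SNP sites within 5 kb upstream or downstream of the 879 cloned functional genes were selected;
(5) sites with a score greater than 0.6 in the Illumina scoring system were selected;
(6) SNP sites in specific functional gene regions were selected, a specific functional gene region being one in which the Rice60K chip disclosed in WO/2014/121419A1 already has no more than 10 SNP sites.
Finally, 8316 large-effect SNP sites were selected from the 879 reported functional gene regions (shown in Figure 1a, the distribution of functional gene region SNP sites among the newly added 30K SNP sites. The numbers on the vertical axis denote the 12 rice chromosomes in order, and the horizontal axis shows physical position; the height of each vertical line indicates the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites).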
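Steps (4) and (6) above can be sketched together as a positional filter. The data layout is hypothetical: genes are `(name, chromosome, start, end)` tuples, and candidate and existing SNPs are `(chromosome, position)` pairs. The sketch counts existing Rice60K sites over the gene plus its 5 kb flanks, which is one reading of "this region" in step (6).

```python
def select_functional_region_snps(candidates, genes, existing_positions,
                                  flank=5_000, max_existing=10):
    """Keep candidate SNPs near under-covered functional genes.

    candidates:         list of (chromosome, position) to choose from
    genes:              list of (name, chromosome, start, end)
    existing_positions: list of (chromosome, position) already on the
                        earlier chip
    """
    selected = []
    for name, chrom, start, end in genes:
        lo, hi = start - flank, end + flank
        # Step (6): skip gene regions the earlier chip already covers
        # with more than `max_existing` SNP sites.
        n_existing = sum(1 for c, p in existing_positions
                         if c == chrom and lo <= p <= hi)
        if n_existing > max_existing:
            continue
        # Step (4): keep candidates within the gene plus 5 kb flanks.
        for c, p in candidates:
            if c == chrom and lo <= p <= hi:
                selected.append((name, c, p))
    return selected
```

A production pipeline would index positions per chromosome rather than scanning every list for every gene; the nested loops are kept here for readability.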
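5. Functional gene region haplotype markers:
These are 191 SNP markers covering the regions of genes such as blast resistance genes, brown planthopper resistance genes and fertility restorer genes (Pi1, Pi2, Bph14, Bph15, Rf-1), which can distinguish different allelic types. They were designed as follows. Rice materials with and without the target gene were chosen. Based on the known position of the target gene in the genome and using the Nipponbare genome as the reference, primers were designed every 5-10 kb. Sanger sequencing was used to obtain the sequence within the 250 kb interval on either side of the target gene, and SNPs differing between the two groups of materials were identified and used to design markers. In total, 191 SNP markers were obtained for the 5 gene regions (Pi1, Pi2, Bph14, Bph15, Rf-1).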
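Example 2. Constructing the Rice60KAddon1 chip from the SNP marker combination
The applicant combined all the SNP markers obtained in Example 1 with the 58,290 SNP markers detected by the rice whole-genome breeding chip Rice60K disclosed in PCT international application WO/2014/121419A1, and used Illumina Infinium chip technology to make a rice 90K whole-genome breeding chip (shown in Figure 1f, the distribution of the newly added 30K and Rice60K SNP sites. The numbers on the vertical axis denote the 12 rice chromosomes in order, and the horizontal axis shows physical position; the height of each vertical line indicates the number of SNP sites, and the legend gives the correspondence between line height and number of SNP sites), named Rice60KAddon1. The markers detected by the chip comprise the 27781 SNP markers of the present application and the 58,290 SNP markers detected by the rice whole-genome breeding chip Rice60K disclosed in PCT international application WO/2014/121419A1. The chip probe sequences were designed and selected within the 70 bp regions on either side of each SNP marker, as required by Illumina Infinium chip technology. The SNP marker combination in the nucleotide sequences shown in SEQ ID NO:1-27781 is referred to as the newly added 30K for short, to distinguish it from the previously disclosed SNP markers on the chip.
The rice whole-genome breeding chips Rice6K and Rice60K (also called RiceSNP50) developed by the applicant on the basis of Illumina Infinium technology have been shown to work well in rice molecular breeding and functional genomics research (Yu et al., A whole-genome SNP array (RICE6K) for genomic breeding in rice. Plant Biotechnol J. 2014, 12: 28-37; Chen et al., A high-density SNP genotyping array for rice biology and molecular breeding. Mol Plant. 2014, 7: 541-553). The 30K markers newly added in the present application were also designed for the Illumina Infinium platform, so it can clearly be concluded that the SNP marker combination of the present application is suitable for designing gene chips on that platform. For this reason, Rice60KAddon1, which is based on Illumina Infinium technology, was not subjected to a separate simple validation in the present application but was used directly in subsequent analyses.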
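Example 3. Constructing the Os90Kv1 chip from the SNP marker combination
The applicant submitted a total of 86,071 SNP markers, namely the 58,290 SNP markers detected by the Rice60K chip and the 27,781 newly added SNP markers, to Affymetrix (http://www.affymetrix.com/) for chip manufacture. To adapt them to the Affymetrix Axiom chip platform, Affymetrix designed two probe sets for each marker from the sequences on its two sides, giving 131,631 probe sets in total, which together detect 86,014 SNP sites. This chip was named Os90Kv1.
After the Os90Kv1 chip had been manufactured, 192 rice samples, comprising 96 inbred parental lines and 96 F1 hybrids, were tested on the GeneTitan instrument (http://www.affymetrix.com/) following the Affymetrix Axiom 2.0 chip detection procedure. According to the analysis by Affymetrix data analysts, 190 samples (call rate > 99%) passed quality control (QC) and were considered to have been tested successfully. The applicant analyzed these data further and screened for high-quality SNP markers using the following criteria: (1) of the two probe sets detecting the same SNP site, the one with the better genotyping performance was kept; (2) the total number of heterozygous genotypes was ≤ 3 across the 89 inbred parental varieties (some of the 96 inbred samples were repeat tests of the same variety or were very closely related, and only one of each was used); (3) the conversion type was PolyHighResolution, MonoHighResolution or NoMinorHom (conversion types were provided by Affymetrix). In the end, 60,938 high-quality probe sets were obtained, detecting 60,938 SNP sites.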
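The three screening criteria above can be sketched as a filter over probe-set records. The record fields (`snp_id`, `call_rate`, `het_count`, `conversion_type`) are hypothetical names, and using call rate to decide which of two probe sets genotypes better is an assumption, since the text does not say how genotyping performance was ranked. The sketch applies criteria (2) and (3) first and then keeps the better remaining probe set per SNP; the text does not state the order in which the criteria were applied.

```python
GOOD_TYPES = {"PolyHighResolution", "MonoHighResolution", "NoMinorHom"}


def select_high_quality(probesets, max_het=3):
    """Return {snp_id: best probe-set record} after quality filtering.

    probesets: iterable of dicts with keys 'probeset_id', 'snp_id',
               'call_rate', 'het_count' (heterozygous calls among the
               inbred lines) and 'conversion_type'.
    """
    best = {}
    for ps in probesets:
        # Criterion (2): inbred lines should almost never be heterozygous.
        if ps["het_count"] > max_het:
            continue
        # Criterion (3): keep only the accepted conversion types.
        if ps["conversion_type"] not in GOOD_TYPES:
            continue
        # Criterion (1): one probe set per SNP, the better-performing one.
        current = best.get(ps["snp_id"])
        if current is None or ps["call_rate"] > current["call_rate"]:
            best[ps["snp_id"]] = ps
    return best
```

With this ordering, a SNP is retained whenever at least one of its probe sets passes criteria (2) and (3).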
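These high-quality SNP markers were used for background analysis of A08-1, a stable line produced by introgressing a blast resistance gene into Kongyu 131 (patent application No. CN201410532337.7, publication No. CN105567790A). The results show that, apart from the target fragment introduced on Chr6, the background had essentially recovered to that of Kongyu 131; see Figure 2b (the boxes indicated by the numbers on the horizontal axis denote the 12 rice chromosomes in order, and the numbers on the vertical axis are physical positions on the rice genome in megabases (Mb); white background indicates a genotype identical to that of the recipient Kongyu 131, black lines indicate a genotype identical to that of the donor K22 or experimental error, and the line marked with a black dot on chromosome 6 is the target fragment). When the same sample was tested with the 90K chip based on the Illumina Infinium platform (Rice60KAddon1), the background was completely clean; see Figure 2a (axes and shading as in Figure 2b, except that black lines indicate a genotype identical to that of the donor K22). In practice, the probability of frequent crossovers within a small neighboring region is very low. The black lines outside the target fragment in Figure 2b are therefore judged to be experimental error. In other words, within the permitted error range (reliability > 99%), the Os90Kv1 chip based on the Affymetrix Axiom platform also genotypes well.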
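Example 4. Functional gene haplotype analysis and comparison of the newly added 30K with the 60K
Many genes related to important agronomic traits in rice are not single-copy; for example, the great majority of blast resistance genes belong to the NBS-LRR gene family. For structurally complex genes of this kind, it is difficult to develop a single functional marker or to design linked markers within the gene, but gene function can be assessed with haplotype markers for the gene region.
To verify how well the breeding chip types haplotypes for such genes, the applicant analyzed the blast resistance gene cluster Pi2/Pi9/Pigm on rice chromosome 6. To determine whether the blast-resistant materials R002, R005, R004 and R006 contain a blast resistance gene from this region, reported materials carrying specific genes were used as references. The reference variety containing the Pi2 gene was C101A51 (Zhou et al., The eight amino-acid differences within three leucine-rich repeats between Pi2 and Piz-t resistance proteins determine the resistance specificity to Magnaporthe grisea. Mol Plant Microbe Interact. 2006, 19: 1216-1228), the reference variety containing the Pi9 gene was 75-1-127 (Qu et al., The broad-spectrum blast resistance gene Pi9 encodes a nucleotide-binding site-leucine-rich repeat protein and is a member of a multigene family in rice. Genetics. 2006, 172: 1901-1914), and the reference variety containing the Pigm gene was Gumei 4 (GM4) (Deng et al., Genetic characterization and fine mapping of the blast resistance locus Pigm(t) tightly linked to Pi2 and Pi9 in a broad-spectrum resistant Chinese variety. Theor Appl Genet 113, 705-713). DNA was extracted from a total of 7 samples (the test samples and the reference samples), and the whole-genome genotypes of the 7 samples were obtained with the rice whole-genome breeding chip Rice60KAddon1, following the Illumina Infinium chip detection procedure.
The results in the Pi2/Pi9/Pigm gene region (within 250 kb upstream and downstream) were extracted separately for the 60K SNP markers (those detected by the rice whole-genome breeding chip Rice60K disclosed in WO2014/121419A1) and for the newly added 30K SNP markers, and subjected to cluster analysis. The results are shown in Figure 3 (the vertical axis shows the difference value between materials; the tested materials are arranged horizontally, and those connected by a horizontal line are assigned to the same haplotype). The clustering results of the two marker sets were consistent in this region: the haplotypes of R002, R005, R006 and C101A51 were identical, and the haplotype of R004 was identical to that of GM4. This result indicates that R002, R005 and R006 contain the Pi2 gene and R004 contains the Pigm gene. The target genes of these materials were sequenced by the Sanger method for verification; the sequencing agreed with the clustering results, indicating that SNP markers designed from functional gene region haplotypes can fulfil their function. In addition, the Rice60K clustering results showed a difference value of less than 0.2 between 75-1-127 and C101A51, whereas the newly added 30K gave a value greater than 0.2 and close to 0.3. The larger the value, the better the classification. Since the two materials have been confirmed to contain different resistance genes, the newly added 30K classifies better than Rice60K in this functional gene region.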
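The application does not specify how the difference value plotted in Figure 3 was calculated. As one plausible reading, the sketch below takes the difference between two materials to be the fraction of SNP sites in the region at which their genotype calls disagree, ignoring sites with a missing call in either material. The function names and the toy genotypes in the usage note are made up for illustration and are not data from the application.

```python
from itertools import combinations


def difference_value(calls_a, calls_b):
    """Fraction of jointly called SNP sites at which two samples differ."""
    compared = mismatched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # skip sites with a missing call
        compared += 1
        if a != b:
            mismatched += 1
    return mismatched / compared if compared else float("nan")


def pairwise_differences(samples):
    """Return {(name1, name2): difference} for every pair of samples.

    samples: dict mapping a sample name to its list of genotype calls,
             with all lists in the same SNP order.
    """
    return {
        (s1, s2): difference_value(samples[s1], samples[s2])
        for s1, s2 in combinations(sorted(samples), 2)
    }
```

With toy input such as `{"X": ["AA", "BB"], "Y": ["AA", "BB"], "Z": ["AA", "AA"]}`, X and Y differ by 0.0 and would be assigned the same haplotype, while each differs from Z by 0.5. A hierarchical clustering routine could then be run on the resulting distances to draw a tree like the one described for Figure 3.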
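Example 5. Applications of the SNP marker combination and chip
1. Application in rice breeding
Chinese patent application CN201410532337.7 (publication No. CN105567790A) discloses a method for breeding plants containing a target genomic DNA fragment:
(1) a recipient parent plant that does not contain the target genomic DNA fragment is used as the recurrent parent and is crossed, backcrossed and selfed with a donor parent plant that contains the target genomic DNA fragment;
(2) foreground selection is carried out during breeding using foreground selection markers;
(3) whole-genome background selection is carried out during breeding using a high-density marker detection method;
(4) the above steps are repeated until a target plant is obtained in which homologous recombination has occurred on both sides of the target genomic DNA fragment, the target genomic DNA fragment is homozygous, and the background is fully recovered.
The "high-density marker detection method" in step (3) can be carried out by genotyping with the SNP marker combination described in the present application and a chip designed for these SNP markers.
2. Application in rice identity verification
Chinese patent application CN201610009053.9 (publication No. CN 105550537A) discloses a method for identifying the DNA identity of rice, which obtains standard genetic fingerprint data for a rice sample by detecting the genotypes of a set of genetic diversity markers distributed throughout the rice genome, and thereby identifies the DNA identity of the rice.
The "set of genetic diversity markers distributed throughout the rice genome" in this method can be detected using the SNP marker combination described in the present application and the chip designed for these SNP markers.
3. Application in rice gene mapping and cloning
The rice whole-genome breeding chip Rice6K developed by the applicant has been applied to the mapping of QTLs related to rice grain size and yield (Sun et al., Identification of quantitative trait loci for grain size and the contributions of major grain-size QTLs to grain weight in rice, Mol Breeding, DOI 10.1007/s11032-012-9802-z; Tan et al., QTL Scanning for Rice Yield Using a Whole Genome SNP Array, Journal of Genetics and Genomics, 2013). The SNP marker combination described in the present application and the chip designed for these SNP markers purposefully increase the number of SNP sites detected, and can therefore provide more accurate information for gene mapping and cloning.
4. Applications in other areas
The SNP marker combination described in the present application and the chip designed for these SNP markers add the following five types of markers: markers representative of germplasm resources, markers specific to commercial hybrid rice, wild-rice-derived markers, functional gene region markers, and functional gene region haplotype markers. It is apparent that the SNP marker combination and the chip designed for these SNP markers can be applied to germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, and functional gene haplotype analysis.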
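Example 6. Setting the minimum number of SNP markers needed to achieve the detection function
As described in Example 3, Rice60KAdd1 can accurately identify the rice blast resistance fragment contained in A08-1. Rice60KAdd1 detected a total of 65071 high-quality sites in A08-1. Within the target rice blast resistance fragment, 11 of these SNP markers distinguish A08-1 from the recipient parent Kongyu 131 (see the table below), where the genotype of the recipient parent Kongyu 131 is set to A and the genotype of the donor parent K22 is set to B.
Table 1. SNP markers differing between Kongyu 131 and A08-1 in the target rice blast resistance fragment
[Table 1 is provided as an image: Figure PCTCN2016109007-appb-000001]
In practice, three consecutive AA or BB calls at polymorphic sites are generally considered reliable; that is, a difference detected by three or more of the SNP markers in the above table is enough to establish that the material differs in the target segment. Standard random sampling was performed on the 65071 high-quality sites: sites were drawn at random 100 times for each sample size, and the number of the 11 differential SNP markers in the table that were drawn was counted. The results show that when the number of sampled sites is greater than 37582, the probability that fewer than 3 of the 11 differential SNP markers are drawn is less than 0.05, which is a small-probability event under a normal distribution. That is, among the 86,014 SNP markers contained in the Rice60KAdd1 chip, 37582 is the minimum number of SNP markers needed to achieve the detection function.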
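The sampling question in this Example can also be examined analytically. The sketch below is not the applicant's procedure, which used 100 repeated random draws; it swaps in the exact hypergeometric distribution to compute the probability that fewer than `k_min` of the diagnostic markers are included when `n_sampled` sites are drawn without replacement from `n_total` sites. The function names are illustrative, and because the two approaches differ, the threshold found this way need not equal the empirically derived 37582.

```python
from math import comb


def prob_fewer_than(k_min, n_sampled, n_total, n_diagnostic):
    """P(X < k_min), X = number of diagnostic markers in a random subset.

    X follows a hypergeometric distribution: n_sampled sites are drawn
    without replacement from n_total sites, n_diagnostic of which are
    diagnostic.
    """
    denominator = comb(n_total, n_sampled)
    numerator = sum(
        comb(n_diagnostic, k) * comb(n_total - n_diagnostic, n_sampled - k)
        for k in range(min(k_min, n_sampled + 1))  # k cannot exceed n_sampled
    )
    return numerator / denominator


def min_sample_size(k_min, n_total, n_diagnostic, alpha=0.05):
    """Smallest n_sampled with P(X < k_min) < alpha, by binary search.

    Relies on the probability being non-increasing as n_sampled grows.
    """
    lo, hi = 0, n_total
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_fewer_than(k_min, mid, n_total, n_diagnostic) < alpha:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For the setting described here, the call would be `min_sample_size(3, 65071, 11)`. Python's arbitrary-precision integers keep the binomial coefficients exact, at the cost of some computation time for totals of this size.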
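Although the present application has been described in detail above by way of general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on the basis of the present application. Such modifications or improvements, made without departing from the spirit of the present application, therefore fall within the scope of protection claimed by the present application.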

Claims (12)

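  1. A SNP marker combination for rice genotyping, characterized in that it comprises the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
  2. The SNP marker combination according to claim 1, further comprising the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071; optionally, comprising the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071.
  3. A rice chip, characterized in that it comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
  4. The chip according to claim 3, wherein the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071; optionally, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071; optionally, the detection sites are a combination of probes designed for the SNP markers.
  5. The chip according to claim 4, wherein the chip is made by on-chip in situ synthesis, off-chip synthesis, or a bead-based method; optionally, the chip is made by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension array method; optionally, the chip is made with Illumina Infinium technology or Affymetrix Axiom technology.
  6. Use of the SNP marker combination according to any one of claims 1-2, or of the chip according to any one of claims 3-5, in testing a biological sample; optionally, the testing is used for breeding, identity verification, gene mapping and cloning, germplasm resource identification, hybrid rice identification, wild rice identification, functional gene identification, or functional gene haplotype analysis.
  7. A method for testing a biological sample, the method comprising detecting, in the biological sample, information on the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781.
  8. The method according to claim 7, further comprising detecting, in the biological sample, information on the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071; optionally, the method comprises detecting, in the biological sample, information on the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071;
    optionally, the detection is carried out with a gene chip; optionally, the chip comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:1-27781;
    optionally, the chip further comprises detection sites designed for the SNP markers in the nucleotide sequences shown in SEQ ID NO:27782-86071; optionally, the chip comprises detection sites designed for the SNP markers in at least 37582 of the nucleotide sequences shown in SEQ ID NO:1-86071; optionally, the detection sites are a combination of probes designed for the SNP markers;
    optionally, the chip is made by on-chip in situ synthesis, off-chip synthesis, or a bead-based method; optionally, the chip is made by in situ photolithographic synthesis, photoresist-layer parallel synthesis, microfluidic-channel on-chip synthesis, light-directed in situ synthesis, soft-lithography in situ synthesis, inkjet printing synthesis, molecular-stamp on-chip synthesis, maskless chip synthesis, the BeadArray method, or the suspension array method; optionally, the chip is made with Illumina Infinium technology or Affymetrix Axiom technology.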
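  9. A method for screening a combination of SNP markers representative of germplasm resources, comprising the following steps:
    obtaining SNP sites from the sequencing results of a plurality of rice varieties;
    selecting sites with a score greater than 0.6 in the Illumina scoring system;
    calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
    I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
    II. 1, 1.5, 2, 2 and 2.5 points when the SNP site is located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
    III. 2, 5 and 10 points when the SNP causes a synonymous mutation, a nonsynonymous mutation and a large-effect mutation in a coding region, respectively;
    IV. (MAF of the SNP site in the whole population × 25) + (MAF of the SNP site in the indica population × 25) + (MAF of the SNP site in the japonica population × 25) + (MAF of the SNP site in pooled sequencing × 25);
    selecting a plurality of SNP sites evenly across the rice genome according to the composite score; and
    dividing the whole rice genome into linkage disequilibrium blocks according to LD values, and selecting from each block the 2 sites with the highest composite scores and at most 25 sites, such that at least 10 sites are selected per 100 kb.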
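  10. A method for screening a combination of SNP markers specific to commercial hybrid rice, comprising the following steps:
    performing whole-genome sequencing on a plurality of hybrid rice samples to obtain a plurality of SNP sites;
    selecting sites with a score greater than 0.6 in the Illumina scoring system;
    calculating a composite score for each SNP site, the composite score being the simple sum of the following values:
    I. 0 points if the SNP alleles are A/T or C/G, and 20 points for any other allele difference;
    II. 1, 1.5, 2, 2 and 2.5 points when the SNP site is located in an intergenic region, an intron, a promoter, the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), respectively;
    III. 2, 5 and 10 points when the SNP causes a synonymous mutation, a nonsynonymous mutation and a large-effect mutation in a coding region, respectively;
    IV. MAF of the SNP site in pooled sequencing × 50;
    selecting a plurality of SNP sites evenly across the rice genome according to the composite score.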
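  11. A method for screening a combination of wild-rice-derived SNP markers, comprising the following steps:
    obtaining SNP sites derived from wild rice varieties from a rice SNP database;
    removing sites that have another SNP or an Indel within 55 bp upstream or downstream of the SNP site;
    selecting SNP sites that can be detected in at least 10% of the varieties;
    aligning the 55 bp sequence upstream or downstream of each SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome at 70% or more;
    selecting sites with a score greater than 0.6 in the Illumina scoring system;
    dividing the rice genome into segments of 40 kb by position, and selecting from each segment the one SNP site with the highest score.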
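  12. A method for screening a combination of functional gene region markers, comprising the following steps:
    obtaining a plurality of SNP sites from a rice SNP database, wherein the SNP sites are located within the nucleotide sequences of a plurality of functional genes of a plurality of rice varieties and can be detected in three or more varieties;
    removing sites that have another SNP or an Indel within 55 bp upstream or downstream of the SNP site;
    aligning the 55 bp sequence upstream or downstream of each SNP site against the rice genome, and removing SNP sites whose sequence matches another position in the genome at 70% or more;
    removing SNP markers located more than 5 kb upstream or downstream of the functional genes;
    selecting sites with a score greater than 0.6 in the Illumina scoring system;
    selecting SNP sites in specific functional gene regions, a specific functional gene region being one in which the Rice60K chip disclosed in WO/2014/121419A1 already has no more than 10 SNP sites.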
PCT/CN2016/109007 2016-12-08 2016-12-08 水稻全基因组育种芯片及其应用 WO2018103037A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/109007 WO2018103037A1 (zh) 2016-12-08 2016-12-08 水稻全基因组育种芯片及其应用
CN201680091357.2A CN110050092B (zh) 2016-12-08 2016-12-08 水稻全基因组育种芯片及其应用

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/109007 WO2018103037A1 (zh) 2016-12-08 2016-12-08 水稻全基因组育种芯片及其应用

Publications (1)

Publication Number Publication Date
WO2018103037A1 true WO2018103037A1 (zh) 2018-06-14

Family

ID=62490633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/109007 WO2018103037A1 (zh) 2016-12-08 2016-12-08 水稻全基因组育种芯片及其应用

Country Status (2)

Country Link
CN (1) CN110050092B (zh)
WO (1) WO2018103037A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110257553A (zh) * 2019-08-05 2019-09-20 江苏省农业科学院 一种鉴定水稻稻瘟病抗性基因Pigm的KASP分子标记方法
CN110408719A (zh) * 2019-08-05 2019-11-05 江苏省农业科学院 一种鉴定水稻抗稻瘟病基因Pigm的四引物分子标记方法
WO2020082314A1 (zh) * 2018-10-25 2020-04-30 武汉双绿源创芯科技研究院有限公司 水稻绿色基因芯片与应用

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681709B (zh) * 2020-06-17 2023-04-28 深圳市早知道科技有限公司 一种设计高密度基因芯片上基因位点的方法
CN112941216A (zh) * 2020-12-29 2021-06-11 武汉基诺赛克科技有限公司 水稻1K SNP-Panel的开发方法与育种应用
CN113308562B (zh) * 2021-05-24 2022-08-23 浙江大学 棉花全基因组40k单核苷酸位点及其在棉花基因分型中的应用

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101583956A (zh) * 2007-01-17 2009-11-18 先正达参股股份有限公司 用于选择个体和设计育种程序的方法
WO2011008361A1 (en) * 2009-06-30 2011-01-20 Dow Agrosciences Llc Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules
CN102747138A (zh) * 2012-03-05 2012-10-24 中国种子集团有限公司 一种水稻全基因组snp芯片及其应用
WO2014048062A1 (zh) * 2012-09-28 2014-04-03 未名兴旺系统作物设计前沿实验室(北京)有限公司 Snp位点集合及其使用方法与应用
WO2014121419A1 (zh) * 2013-02-07 2014-08-14 中国种子集团有限公司 水稻全基因组育种芯片及其应用
CN104328507A (zh) * 2014-10-11 2015-02-04 中国水稻研究所 一种用于水稻品种鉴定的snp芯片、制备方法及用途
CN104789648A (zh) * 2014-12-25 2015-07-22 中国种子集团有限公司 鉴定水稻CMS恢复基因Rf-1区段单倍型的分子标记及其应用
CN105550537A (zh) * 2016-01-07 2016-05-04 中国种子集团有限公司 鉴定水稻dna身份的方法及其应用
CN105567790A (zh) * 2014-10-10 2016-05-11 中国种子集团有限公司 含目标基因组dna片段植株的选育方法

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101583956A (zh) * 2007-01-17 2009-11-18 先正达参股股份有限公司 用于选择个体和设计育种程序的方法
WO2011008361A1 (en) * 2009-06-30 2011-01-20 Dow Agrosciences Llc Application of machine learning methods for mining association rules in plant and animal data sets containing molecular genetic markers, followed by classification or prediction utilizing features created from these association rules
CN102747138A (zh) * 2012-03-05 2012-10-24 中国种子集团有限公司 一种水稻全基因组snp芯片及其应用
WO2014048062A1 (zh) * 2012-09-28 2014-04-03 未名兴旺系统作物设计前沿实验室(北京)有限公司 Snp位点集合及其使用方法与应用
WO2014121419A1 (zh) * 2013-02-07 2014-08-14 中国种子集团有限公司 水稻全基因组育种芯片及其应用
CN105567790A (zh) * 2014-10-10 2016-05-11 中国种子集团有限公司 含目标基因组dna片段植株的选育方法
CN104328507A (zh) * 2014-10-11 2015-02-04 中国水稻研究所 一种用于水稻品种鉴定的snp芯片、制备方法及用途
CN104789648A (zh) * 2014-12-25 2015-07-22 中国种子集团有限公司 鉴定水稻CMS恢复基因Rf-1区段单倍型的分子标记及其应用
CN105550537A (zh) * 2016-01-07 2016-05-04 中国种子集团有限公司 鉴定水稻dna身份的方法及其应用

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUO, LONGBIAO ET AL.: "Progress and Prospects of Breeding by Gene Design in Rice", CHINESE JOURNAL OF RICE SCIENCE, vol. 22, no. 6, 30 November 2008 (2008-11-30), pages 650 - 657 *
XIAO, JINGHUA ET AL.: "The Progress and Perspective of Rice Functional Genomics Research", CHINESE SCIENCE BULLETIN, vol. 60, no. 18, 30 June 2015 (2015-06-30), pages 1711 - 1722 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082314A1 (zh) * 2018-10-25 2020-04-30 武汉双绿源创芯科技研究院有限公司 水稻绿色基因芯片与应用
CN110257553A (zh) * 2019-08-05 2019-09-20 江苏省农业科学院 一种鉴定水稻稻瘟病抗性基因Pigm的KASP分子标记方法
CN110408719A (zh) * 2019-08-05 2019-11-05 江苏省农业科学院 一种鉴定水稻抗稻瘟病基因Pigm的四引物分子标记方法
CN110408719B (zh) * 2019-08-05 2022-07-08 江苏省农业科学院 一种鉴定水稻抗稻瘟病基因Pigm的四引物分子标记方法
CN110257553B (zh) * 2019-08-05 2022-07-08 江苏省农业科学院 一种鉴定水稻稻瘟病抗性基因Pigm的KASP分子标记方法

Also Published As

Publication number Publication date
CN110050092B (zh) 2023-01-03
CN110050092A (zh) 2019-07-23

Similar Documents

Publication Publication Date Title
WO2018103037A1 (zh) 水稻全基因组育种芯片及其应用
KR102015929B1 (ko) 논벼의 전체 게놈 육종 칩 및 이의 응용
CN109196123B (zh) 用于水稻基因分型的snp分子标记组合及其应用
CN108779459B (zh) 棉花全基因组snp芯片及其应用
CN108998550B (zh) 用于水稻基因分型的snp分子标记及其应用
CN104024438A (zh) Snp位点集合及其使用方法与应用
CN107090495B (zh) 与谷子脖长性状相关的分子标记及其检测引物和应用
CN107090494B (zh) 与谷子码粒数性状相关的分子标记及其检测引物和应用
CN115198023B (zh) 一种海南黄牛液相育种芯片及其应用
CN115029451B (zh) 一种绵羊液相芯片及其应用
WO2022165853A1 (zh) 一种大豆snp分型检测芯片及其在分子育种与基础研究中的应用
US20210285063A1 (en) Genome-wide maize snp array and use thereof
CN110675915B (zh) 一种同时定位两个性状相关基因的方法
CN110846429A (zh) 一种玉米全基因组InDel芯片及其应用
CN107090450B (zh) 与谷子穗长性状相关的分子标记及其检测引物和应用
CN112289384A (zh) 一种柑橘全基因组kasp标记库的构建方法及应用
CN108866233B (zh) 用于鉴定桃树对南方根结线虫的抗病/感病性状的标记位点、引物对、试剂盒及应用
CN113718052B (zh) 5000个snp位点组合的应用及小麦品种真实性身份鉴定的方法
CN112813186A (zh) 基于kasp的大豆核心snp标记及其应用
CN112538535A (zh) 一种与长毛兔产毛量相关的分子标记及其应用
Thomson et al. Development and application of 96-and 384-plex single nucleotide polymorphism (SNP) marker sets for diversity analysis, mapping and marker-assisted selection in rice
CN114457070A (zh) 一种小麦-二倍体长穗偃麦草45k液相芯片及应用
CN109913575A (zh) 一种鉴定辣椒cms雄性不育恢复基因的kasp分子标记、试剂盒及其应用
CN115976260A (zh) 用于水稻基因分型的snp分子标记及其应用
CN118043485A (zh) 玉米的snp芯片及应用

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16923497

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16923497

Country of ref document: EP

Kind code of ref document: A1