WO2020199127A1 - Design of sequencing primers and pcr-based method for sequencing whole genome - Google Patents

Design of sequencing primers and PCR-based method for sequencing whole genome

Info

Publication number
WO2020199127A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
primer
universal
amplification
pcr
Prior art date
Application number
PCT/CN2019/081029
Other languages
French (fr)
Chinese (zh)
Inventor
夏志强
邹枚伶
Original Assignee
中国热带农业科学院热带生物技术研究所
Priority date
Filing date
Publication date
Application filed by 中国热带农业科学院热带生物技术研究所
Priority to PCT/CN2019/081029
Publication of WO2020199127A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids

Definitions

  • the present invention relates to the field of gene sequencing, in particular to the design of sequencing primers and a PCR-based whole genome sequencing method.
  • Genotyping is a family of genetic analyses that includes the use of high-throughput sequencing for molecular marker discovery and genotyping. One of its most powerful applications is in plant breeding, where it has opened up new possibilities for plant breeding and plant genetics research. It provides a cost-effective platform for whole-genome scanning and multiplexed sequencing.
  • Simplified sequencing technology is a new application of high-throughput sequencing for mining single nucleotide polymorphisms (SNPs) in species populations and genotyping them. Bioinformatics tools are needed to analyze and parse simplified sequencing data sets. It is a low-cost technology and an excellent MAS tool.
  • Simplified sequencing has been successfully applied to genome-wide association analysis, molecular marker mining, genetic linkage map, genomic genetic selection, and population genomic diversity research in large-scale plant molecular breeding strategies.
  • Simplified genome sequencing was developed on the basis of second-generation sequencing. It is a sequencing technology that uses enzyme digestion, sequence capture chips, or other experimental methods to reduce the complexity of a species' genome and sequences specific regions of the genome, thereby reflecting the structural information of part of the genome sequence. Simplified genome sequencing methods developed so far include polymorphic sequence sequencing with reduced complexity, restriction-site-associated DNA sequencing, and genotyping by sequencing.
  • The most widely used of these is restriction-site-associated DNA sequencing, which uses restriction enzymes to digest the genome, generates fragments of a certain size, constructs a sequencing library, and performs high-throughput sequencing of the restriction-site tags produced by digestion. Because these tags are small DNA fragments located near specific restriction sites across the whole genome, they represent the sequence characteristics of the entire genome, so sequencing them can yield thousands of single nucleotide polymorphism markers in most organisms.
  • Sequencing-based genotyping is a new application of high-throughput sequencing solutions for the discovery and genotyping of single nucleotide polymorphisms for crop improvement.
  • The low cost of simplified sequencing makes it an attractive way to map breeding populations comprehensively with high-density single nucleotide polymorphism markers.
  • the continuous improvement of sequencing and base calling software will enable high-throughput sequencing technology to provide higher sequencing throughput for each run, thereby achieving deeper multiplexing to obtain a fixed average sequencing depth for each sample.
  • simplified sequencing has become a cost-competitive alternative to other whole-genome genotyping platforms.
  • Plant breeders will be able to sequence the genomes of large crops and build high-density genetic linkage maps from breeding populations.
  • the future application of simplified sequencing in crop improvement may allow plant breeders to mark new germplasm or species without first developing any previous molecular tools. Since sequence-based genotyping can be used for entire genome research, simplified sequencing will become one of the main components of plant genetics and breeding.
  • Genotyping chips use probes designed from the sequences flanking known SNPs. After the probes are fixed on the chip, the DNA of the sample to be tested is hybridized to the chip and the hybridization fluorescence signal is scanned to identify the genotype at these probe sites (the single nucleotide polymorphism sites). The most representative brands are Illumina and Affymetrix.
  • the chip has very high practical value in the rapid identification of breeding materials, genome selection, germplasm resource analysis, variety improvement, QTL mapping, genetic analysis, and variety authenticity identification, and has great cost advantages.
  • the current use of chips has some difficulties that limit the wide application of chips in crop genetic improvement.
  • The embodiment of the present invention provides a sequencing primer set and a PCR-based whole-genome sequencing method, which aim to solve the technical problems of slow speed, high requirements, and complicated operation when existing chips are used to screen genomes for SNP sites.
  • a sequencing primer set is provided, and the sequencing primer set includes a universal upstream primer, a universal downstream primer, and an enriched promoter downstream primer;
  • the sequence of the universal upstream primer includes: 5'-T[barcode]CAAAXXXXNNN-3';
  • the sequence of the universal downstream primer includes: 5'-GACTGCGTACGZZZZNNN-3';
  • the sequence of the enriched promoter downstream primer includes: 5'-GACTGCGTACYYNCTATA-3';
  • the length of "XXXX" is 4 bases, each selected from A, T, C and G; the length of "ZZZZ" is 4 bases, each selected from A, T, C and G; the length of the "barcode" is 4-6 bases, each selected from A, T, C and G; "N" is any one of the bases A, T, C and G, and "Y" is base C or T.
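The degenerate positions in these templates can be made concrete with a short sketch. It assumes only the two ambiguity codes defined in the text (N = A/C/G/T, Y = C/T); the function name `expand` is illustrative and does not come from the patent.

```python
from itertools import product

# Ambiguity codes as defined in the text: N = any base, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

def expand(primer: str) -> list:
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

# Enriched promoter downstream primer from the text.
promoter_primer = "GACTGCGTACYYNCTATA"
variants = expand(promoter_primer)

print(len(variants))  # 2 * 2 * 4 = 16 concrete primers
print(variants[0])    # GACTGCGTACCCACTATA
```

A universal primer ending in "NNN" expands the same way into 4^3 = 64 concrete sequences for each fixed choice of core bases.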
  • a PCR-based whole genome sequencing method which includes the following steps:
  • the sequencing primer set of any one of the above is provided; a first PCR amplification is performed on the DNA of the test sample using the universal upstream primer and the universal downstream primer to obtain a first amplification product, and a second PCR amplification is performed on the DNA of the test sample using the universal upstream primer and the enriched promoter downstream primer to obtain a second amplification product;
  • the first amplification product and the second amplification product are respectively subjected to sequencing analysis.
  • the sequencing primer set provided by the present invention includes a universal upstream primer with a unique sequence, a universal downstream primer, and an enriched promoter downstream primer.
  • The universal upstream primer and the universal downstream primer each include four core bases, "XXXX" and "ZZZZ" respectively (each base selected from A, T, C and G). These eight bases form the core base region of the universal primers and can be changed according to the genome of the sample to be tested.
  • The primer scheme can therefore be optimized according to the genome sequence and adjusted freely to the genome size and the number of samples. In addition, a primer with the characteristics of the promoter region is designed by simulating the base sequence of the upstream region of eukaryotic genes; this is the enriched promoter downstream primer, so that the universal upstream primer and the enriched promoter downstream primer can effectively amplify sequences near the promoter.
  • Such a primer set is used for whole-genome amplification and sequencing, not only has a good enrichment effect, but also makes library construction and sequencing more concise and accurate, and reduces sequencing costs.
  • the present invention provides a PCR-based whole-genome sequencing method, which amplifies the DNA of the test sample by using the above-mentioned sequencing primer set to obtain the PCR product of the test sample, and then performs sequencing analysis.
  • The above-mentioned PCR-based whole-genome sequencing method both "discovers" and "detects" single nucleotide polymorphism (SNP) sites, so it can detect and screen the whole genome while discovering new markers.
  • Figure 1 shows the results of agarose electrophoresis analysis of the PCR products of 5 samples with universal primers provided in Example 1 of the present invention.
  • Figure 2 shows the results of agarose electrophoresis analysis of the PCR products of 5 samples with the promoter-region enrichment primers provided in Example 1 of the present invention.
  • Figure 3 is a graph of the genome coverage analysis of sequencing reads obtained by the universal primer method.
  • Figure 4 is a graph of the genome coverage analysis of sequencing reads obtained with the promoter-region enrichment primers.
  • Figure 5 shows the enrichment of PCR products in the gene region.
  • An example of the present invention provides a sequencing primer set, which includes a universal upstream primer, a universal downstream primer, and an enriched promoter downstream primer;
  • the sequence of the universal upstream primer includes: 5'-T[barcode]CAAAXXXXNNN-3';
  • the sequence of the universal downstream primer includes: 5'-GACTGCGTACGZZZZNNN-3';
  • the sequence of the enriched promoter downstream primer includes: 5'-GACTGCGTACYYNCTATA-3';
  • the length of "XXXX" is 4 bases, each selected from A, T, C and G; the length of "ZZZZ" is 4 bases, each selected from A, T, C and G; the length of the "barcode" is 5 bases, each selected from A, T, C and G; "N" is any one of the bases A, T, C and G, and "Y" is base C or T.
  • the length of "XXXX" is 4 bases, each selected from A, T, C and G; the length of "ZZZZ" is 4 bases, each selected from A, T, C and G.
  • These eight bases are the core base region of the universal primer, which can be changed according to the changes in the genome of the sample to be tested, so it can amplify different sequence fragments on different genomes according to the difference of bases.
  • the "barcode” is designed on the universal upstream primer, and its length is 5 bases, and it is any combination of bases A, T, C, and G.
  • the "barcode” is preferably 5 bases, and based on any combination of bases A, T, C, and G, a total of 384 "barcodes" are available for selection.
  • any number of “barcode” sequences can be added according to different samples to further distinguish the samples.
  • the added “barcode” changes according to the change of the genome of the sample to be tested.
  • the purpose of adding "barcode” is to distinguish each different sample, that is, to distinguish each DNA sequence;
  • The enriched promoter downstream primer is 5'-GACTGCGTACYYNCTATA-3'.
  • The enriched promoter downstream primer is designed by simulating the base sequence of the upstream region of eukaryotic genes, so that the primer carries the characteristics of the promoter region. It can therefore effectively amplify sequences near the promoter region, further enriching genes and facilitating subsequent experiments.
  • "N" is any one of the bases A, T, C and G, and "Y" is base C or T.
  • Using mixed bases increases the coverage of the primers, so that they amplify the whole genomic DNA to a greater extent; it also further improves the binding of the primers to the sample to be tested, achieving gene enrichment.
  • "N" is a random base. Several random bases are placed at the end of the universal primer, which can eliminate the bias of PCR enrichment and helps restore the real genome.
  • The number of "N" bases can be set anywhere from 3 to 8 to simplify the genome further. This is useful for complex or large genomes, and the number can be adjusted freely to reduce cost (larger sample numbers, fewer SNPs).
  • cassava is used as the detection object.
  • the sequence of the universal upstream primer is: 5'-T[barcode]CAAACCGGNNN-3'
  • the sequence of the universal downstream primer is: 5'-GACTGCGTACGAATTNNN-3'; or,
  • the sequence of the universal upstream primer is: 5'-T[barcode]CAAACGCGNNN-3'
  • the sequence of the universal downstream primer is: 5'-GACTGCGTACGTATANNN-3'.
  • the core base "XXXX" of the universal upstream primer is "CCGG" or "CGCG"; the core base "ZZZZ" of the universal downstream primer is "AATT" or "ATAT". Choosing bases that occur at high frequency in cassava genes as the core bases of the primers further increases the coverage of the amplification and gives a good enrichment effect.
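Choosing core bases by their frequency in the target genome can be illustrated with a simple 4-mer count. This is a minimal sketch under stated assumptions: the input below is a made-up toy string, not cassava sequence, and `kmer_counts` is a hypothetical helper rather than anything described in the patent.

```python
from collections import Counter

def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set("ACGT"))

# Toy sequence for illustration only; a real run would stream gene sequences
# from the reference genome of the species being tested.
toy = "CCGGAATTCCGGATATCCGGAATT"
counts = kmer_counts(toy)

for core in ("CCGG", "AATT", "ATAT"):
    print(core, counts[core])
```

Ranking candidate cores with `counts.most_common()` on real gene sequence would be one way to shortlist "XXXX" and "ZZZZ" for a new species.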
  • cassava is used as the detection object, and the "barcode" is selected from at least one of CTTAT, GTAGA, CCTCG, GAACT and TTACT.
  • A 5-base "barcode" is designed into the universal upstream primers to distinguish each DNA sequence, while ensuring more uniform amplification and higher, more comprehensive coverage of the whole genome.
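As an illustration of how the template 5'-T[barcode]CAAAXXXXNNN-3' combines with the five cassava barcodes, the sketch below assembles one upstream primer per barcode. The function `upstream_primer` and its defaults (core "CCGG", three random bases) are assumptions chosen from the values given in the text; the output is not taken from the patent's sequence listing.

```python
# Barcodes listed in the text for the cassava example.
BARCODES = ["CTTAT", "GTAGA", "CCTCG", "GAACT", "TTACT"]

def upstream_primer(barcode: str, core: str = "CCGG", n_random: int = 3) -> str:
    """Fill the template 5'-T[barcode]CAAAXXXXNNN-3' (hypothetical helper)."""
    if not 4 <= len(barcode) <= 6:
        raise ValueError("barcode must be 4-6 bases long")
    if len(core) != 4:
        raise ValueError("core region XXXX must be exactly 4 bases")
    if not 3 <= n_random <= 8:
        raise ValueError("the text allows 3 to 8 random N bases")
    if set(barcode + core) - set("ACGT"):
        raise ValueError("barcode and core may only contain A, C, G, T")
    return f"T{barcode}CAAA{core}{'N' * n_random}"

primers = [upstream_primer(bc) for bc in BARCODES]
for p in primers:
    print(f"5'-{p}-3'")
```

Passing `core="CGCG"` gives the alternative upstream primer family described in the text.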
  • The sequencing primer set provided by the present invention includes a universal upstream primer with a unique sequence, a universal downstream primer, and an enriched promoter downstream primer. The universal upstream primer and the universal downstream primer each include four core bases, "XXXX" and "ZZZZ" (each base selected from A, T, C and G). These eight bases form the core base region of the universal primers and can be changed according to the genome of the sample, so that different sequence fragments can be amplified on different genomes depending on the bases chosen.
  • the primer scheme can be optimized according to the sequence of the genome, and the genome size and sample size can be adjusted arbitrarily; at the same time, by simulating eukaryotes
  • Such a primer set is used for whole-genome amplification and sequencing, not only has a good enrichment effect, but also makes library construction and sequencing more concise and accurate, and reduces sequencing costs.
  • an embodiment of the present invention also provides a PCR-based whole genome sequencing method, which includes the following steps:
  • S02. Provide the sequencing primer set; use the universal upstream primer and the universal downstream primer to perform a first PCR amplification on the DNA of the test sample to obtain a first amplification product, and use the universal upstream primer and the enriched promoter downstream primer to perform a second PCR amplification on the DNA of the test sample to obtain a second amplification product;
  • the first amplification product and the second amplification product are respectively subjected to sequencing analysis.
  • the DNA of the sample to be tested is prepared.
  • the method for preparing the DNA of the sample to be tested is selected from any one of a whole genome DNA extraction method, a PCR kit amplification method, and a lysis solution preparation method.
  • The ordinary whole-genome DNA extraction method mainly uses the cetyltrimethylammonium bromide method (CTAB method).
  • This method uses a cationic detergent, which precipitates nucleic acids and acidic polysaccharides from solutions of low ionic strength. In a solution of high ionic strength, cetyltrimethylammonium bromide forms complexes with proteins and polysaccharides but does not precipitate nucleic acids.
  • Ethanol is then added to precipitate and separate the nucleic acid, giving the whole-genome DNA of the sample. Alternatively, SDS (sodium dodecyl sulfate) is used to lyse the cells, dissociating the chromosomes and denaturing proteins; SDS binds proteins and polysaccharides to form complexes, releasing the nucleic acid.
  • it can also be directly prepared by the PCR kit amplification method; or the whole genomic DNA can be obtained by preparing a lysis solution and lysing tissue.
  • The direct PCR method described in the embodiments of the present invention can quickly enrich the whole genomic DNA of the sample.
  • The DNA quality requirements are low, the method is simple and fast, and the library preparation time is greatly shortened, which improves test efficiency.
  • The sequencing primer set is provided; a first PCR amplification is performed on the DNA of the test sample using the universal upstream primer and the universal downstream primer to obtain a first amplification product, and a second PCR amplification is performed on the DNA of the test sample using the universal upstream primer and the enriched promoter downstream primer to obtain a second amplification product;
  • the first PCR amplification step includes:
  • the general PCR system (20 µL system) is as follows:
  • the binding amplification consists of 94°C for 5 minutes followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes; and
  • the enrichment amplification consists of 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes.
  • the second PCR amplification step includes:
  • First, the universal upstream primer and the enriched promoter downstream primer are bound to the DNA of the sample to be tested in a binding amplification, and then enrichment amplification of the target region is performed.
  • the PCR system (20 µL system) for enriching the promoter region is as follows:
  • the binding amplification consists of 94°C for 5 minutes followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes; and
  • the enrichment amplification consists of 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes.
  • The main purpose of the binding amplification is to first bind the aforementioned primers to the sample DNA.
  • The binding amplification runs for 5 PCR cycles, which is enough for each primer to bind to the sample DNA strands.
  • The annealing temperature is 35°C. This lower annealing temperature in the pre-amplification step allows the "XXXX", "N" and "Y" positions designed into the primers to bind the sample DNA accurately and quickly, positioning the primers on the sample DNA for the subsequent enrichment amplification.
  • The main purpose of the enrichment amplification is to enrich genes, producing an enrichment effect that facilitates subsequent detection.
  • The enrichment amplification runs for 35 PCR cycles; the higher cycle number amplifies a large amount of sample DNA and increases the number of enriched fragments.
  • The annealing temperature is set to 50-58°C. A suitable temperature increases the rate of DNA amplification and the amount of enriched DNA.
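The two-phase program can be written down as data, which makes the cycle counts and hold times easy to check. The sketch assumes the 94°C, 5-minute hold is a single initial denaturation before the 5 binding cycles, and it picks 55°C as an arbitrary annealing temperature within the stated 50-58°C range; the class and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float

@dataclass(frozen=True)
class Stage:
    name: str
    cycles: int
    steps: tuple

def program(anneal_c: float = 55.0) -> list:
    """Thermocycler program as read from the text (anneal_c is an assumption)."""
    if not 50.0 <= anneal_c <= 58.0:
        raise ValueError("enrichment annealing temperature must be 50-58 C")
    return [
        Stage("initial denaturation", 1, (Step(94, 5),)),
        Stage("binding amplification", 5,
              (Step(94, 1), Step(35, 1), Step(72, 1.5))),
        Stage("enrichment amplification", 35,
              (Step(94, 1), Step(anneal_c, 1), Step(72, 1.5))),
    ]

def total_minutes(stages: list) -> float:
    """Sum of hold times only; ramp times between temperatures are ignored."""
    return sum(s.cycles * sum(st.minutes for st in s.steps) for s in stages)

print(total_minutes(program()))  # 145.0
```

The total is 5 + 5 × 3.5 + 35 × 3.5 = 145 minutes of hold time, so each amplification takes roughly two and a half hours before ramping is counted.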
  • the first amplification product and the second amplification product are respectively subjected to sequencing analysis.
  • agarose gel electrophoresis is used to detect the first amplification product and the second amplification product respectively to ensure that the product can be obtained by PCR amplification.
  • the first amplification product is quantified to a concentration of 180-200 ng/µL
  • the second amplification product is quantified to a concentration of 180-200 ng/µL
  • the concentration of each PCR product is kept consistent.
  • If the concentration is too low, the amount of PCR amplification product is small and is easily interfered with by impurities during subsequent testing; if the concentration is too high, there is too much PCR amplification product, and the analysis and the sequencing results become unclear.
  • the mixing method is selected from any one of suction mixing and centrifugal mixing.
  • Quantifying each sample individually and then pooling the samples for sequencing mitigates, to some extent, the uneven per-sample data output that is common in simplified sequencing.
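Bringing every product to the same concentration before pooling is a C1·V1 = C2·V2 calculation. The sketch below is one possible way to do it, assuming products are diluted down to a target of 190 ng/µL (the midpoint of the stated 180-200 ng/µL window); the measured values in the example are invented for illustration.

```python
def diluent_volume_ul(measured_ng_per_ul: float, sample_ul: float,
                      target_ng_per_ul: float = 190.0) -> float:
    """Volume of diluent to add so the sample reaches the target concentration.

    Uses C1 * V1 = C2 * V2. The default target is an assumed midpoint of the
    180-200 ng/uL window given in the text.
    """
    if measured_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is below target; it cannot be fixed by dilution")
    final_ul = measured_ng_per_ul * sample_ul / target_ng_per_ul
    return final_ul - sample_ul

# Hypothetical measurement: 10 uL of product at 380 ng/uL.
print(round(diluent_volume_ul(380.0, 10.0), 2))  # 10.0
```

Products that come out below the window would need to be concentrated or re-amplified instead, which this sketch deliberately rejects.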
  • the second-generation library construction sequencing is performed on the first mixture and the second mixture, and the sequencing results are analyzed.
  • Second-generation library construction and sequencing analysis usually uses Illumina sequencing.
  • The sequencing results are analyzed, and the [barcode] designed into the different primers is used to distinguish samples, so the analysis can both detect known sites and mine variation sites in the population.
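Distinguishing samples by barcode amounts to reading the bases that follow the leading "T" of the universal upstream primer. The sketch below assumes each read begins exactly at the 5' end of that primer (real reads may carry adapter sequence first) and requires an exact barcode match; the sample names are hypothetical.

```python
from typing import Optional

# Barcodes from the cassava example; sample names are placeholders.
SAMPLES = {
    "CTTAT": "sample_1",
    "GTAGA": "sample_2",
    "CCTCG": "sample_3",
    "GAACT": "sample_4",
    "TTACT": "sample_5",
}

def assign_sample(read: str) -> Optional[str]:
    """Return the sample for a read laid out as T + 5-base barcode + CAAA + insert."""
    read = read.upper()
    if len(read) < 10 or read[0] != "T" or read[6:10] != "CAAA":
        return None
    return SAMPLES.get(read[1:6])

reads = ["TCTTATCAAACCGGACGTTTGA", "TGAACTCAAACGCGTTAGC", "TAAAAACAAACCGGACG"]
for r in reads:
    print(r, assign_sample(r))
```

Tolerating one mismatch in the barcode is a common refinement, but whether the five barcodes are far enough apart for that is not stated in the text.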
  • the present invention provides a PCR-based whole-genome sequencing method that amplifies the DNA of the test sample by using the sequencing primer set to obtain the PCR product of the test sample, and then performs sequencing analysis.
  • The above-mentioned PCR-based whole-genome sequencing method both "discovers" and "detects" single nucleotide polymorphism (SNP) sites, so it can detect and screen the whole genome while discovering new markers.
  • Step 1 Prepare the DNA of the sample to be tested
  • Genomic DNA was extracted from different cassava varieties grown under the same conditions, sampled at the same growth stage and from the same plant part, and free of pests and diseases. Long-term storage of samples requires liquid nitrogen or a freezer below -70°C. The DNeasy 96 Plant Kit (QIAGEN) was used to extract genomic DNA.
  • For agarose gel detection, a λ marker is used as the size standard: take 1 µL of DNA, add 2 µL of 10× bromophenol blue loading buffer, mix well, and load onto a 0.8% agarose gel containing 0.5 µg/mL Goldview dye; run electrophoresis in 1× TAE buffer at 90 V for 40 min, and observe the DNA bands with a gel imaging analysis system (Tanon4100).
  • Step 2: Provide the sequencing primer set; use the universal upstream primer and the universal downstream primer to perform a first PCR amplification on the DNA of the test sample to obtain a first amplification product, and use the universal upstream primer and the enriched promoter downstream primer to perform a second PCR amplification on the DNA of the test sample to obtain a second amplification product;
  • 5 universal upstream primers are designed, namely
  • a universal downstream primer sequence was designed as:
  • SEQ ID NO. 7 (5'-GACTGCGTACYYNCTATA-3').
  • the first PCR amplification is performed on the DNA of the sample to be tested using the universal upstream primer and the universal downstream primer, and the first PCR amplification step includes:
  • the general PCR system (20 µL system) is as follows:
  • the binding amplification consists of 94°C for 5 minutes followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes; and
  • the enrichment amplification consists of 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes.
  • the second PCR amplification is performed on the DNA of the test sample by using the universal upstream primer and the downstream primer of the enriched promoter, and the second PCR amplification step includes:
  • First, the universal upstream primer and the enriched promoter downstream primer are bound to the DNA of the sample to be tested in a binding amplification, and then enrichment amplification of the target region is performed.
  • the PCR system for enriching the promoter region (20 µL system) is as follows:
  • the binding amplification consists of 94°C for 5 minutes followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes; and
  • the enrichment amplification consists of 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes.
  • Step 3 Perform sequencing analysis on the first amplification product and the second amplification product respectively.
  • Figure 1 is a graph of the detection results of the first amplification product of 5 samples
  • Figure 2 is a graph of the detection results of the second amplification product of 5 samples.
  • FIG. 3 is the genome coverage of the sequencing reads obtained by the universal primer method
  • Figure 4 is the genome coverage of the sequencing reads obtained by the promoter-region enrichment primer method
  • Figure 5 is the enrichment in the gene region.
  • 70% of the PCR products are enriched in the non-gene region
  • 17% of the PCR products are enriched in the intron region
  • 7% of the PCR products are enriched in the CDS region
  • 6% of the PCR products are enriched in the UP_2000 region.
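The four categories above can be reproduced for any aligned read position with a simple interval check. The sketch rests on several assumptions not stated in the text: "UP_2000" is read as the 2000 bp immediately upstream of a gene, the gene lies on the forward strand, and any position inside the gene but outside a CDS interval is counted as intron (UTRs are ignored). The coordinates are invented.

```python
def classify(pos: int, gene_start: int, gene_end: int,
             cds_intervals: list, upstream: int = 2000) -> str:
    """Assign a position to UP_2000, CDS, Intron or Non-gene (simplified model)."""
    if gene_start - upstream <= pos < gene_start:
        return "UP_2000"
    if gene_start <= pos <= gene_end:
        in_cds = any(start <= pos <= end for start, end in cds_intervals)
        return "CDS" if in_cds else "Intron"
    return "Non-gene"

# Hypothetical forward-strand gene with two coding exons.
gene = dict(gene_start=5000, gene_end=8000,
            cds_intervals=[(5000, 5600), (7000, 8000)])

for pos in (100, 3500, 5200, 6500):
    print(pos, classify(pos, **gene))
```

With these definitions, the reported shares sum to 100%, and 30% of the products (17% + 7% + 6%) fall in or immediately upstream of genes.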
  • the PCR method of the present invention is used to perform whole-genome sequencing, which has a very high coverage rate on the genome, and the result is extremely reliable.
  • This method makes library construction simpler and more convenient; it also has high coverage, a gene enrichment effect, and a short sequencing time, and can be widely used for genome sequencing of different samples.

Abstract

Disclosed are a sequencing primer set, comprising a universal upstream primer, a universal downstream primer, and a promoter-enriched downstream primer; and a PCR-based method for sequencing a whole genome, the method comprising: by using the universal upstream primer and the universal downstream primer, performing first PCR amplification on the DNA of a sample to be sequenced, so as to obtain a first amplification product, and by using the universal upstream primer and the promoter-enriched downstream primer, performing second PCR amplification on the DNA of the sample to be sequenced, so as to obtain a second amplification product; and simultaneously performing sequencing on the two amplification products. According to the above-mentioned method, the genome size and the sample amount are adjustable, and the requirements for DNA quality are not high; there is no need for known SNP markers or for pre-construction of chips; and the method can meet requirements for the detection and screening of the whole genome.

Description

Design of sequencing primers and PCR-based whole-genome sequencing method
Technical field
The present invention relates to the field of gene sequencing, in particular to the design of sequencing primers and a PCR-based whole-genome sequencing method.
Background art
Genotyping is a family of genetic analyses that includes the use of high-throughput sequencing for molecular marker discovery and genotyping. One of its most powerful applications is in plant breeding, where it has opened up new possibilities for plant breeding and plant genetics research. It provides a cost-effective platform for whole-genome scanning and multiplexed sequencing. Simplified sequencing is a new application of high-throughput sequencing for mining single nucleotide polymorphisms (SNPs) in species populations and genotyping them. Bioinformatics tools are needed to analyze and parse simplified sequencing data sets. It is a low-cost technology and an excellent MAS tool. Simplified sequencing has been successfully applied to genome-wide association analysis, molecular marker mining, genetic linkage mapping, genomic selection, and population genomic diversity research in large-scale plant molecular breeding strategies.
Simplified genome sequencing was developed on the basis of second-generation sequencing. It is a sequencing technology that uses enzyme digestion, sequence capture chips, or other experimental methods to reduce the complexity of a species' genome and sequences specific regions of the genome, thereby reflecting the structural information of part of the genome sequence. Simplified genome sequencing methods developed so far include polymorphic sequence sequencing with reduced complexity, restriction-site-associated DNA sequencing, and genotyping by sequencing.
A simple, fast, and cost-effective system has been used for sequencing in non-model organisms. The most widely used is restriction-site-associated DNA sequencing, which uses restriction enzymes to digest the genome, generates fragments of a certain size, constructs a sequencing library, and performs high-throughput sequencing of the restriction-site tags produced by digestion. Because these tags are small DNA fragments located near specific restriction sites across the whole genome, they represent the sequence characteristics of the entire genome, so sequencing them can yield thousands of single nucleotide polymorphism markers in most organisms.
Sequencing-based genotyping is a new application of high-throughput sequencing for discovering and genotyping single nucleotide polymorphisms for crop improvement. The low cost of simplified sequencing makes it an attractive way to map breeding populations comprehensively with high-density single nucleotide polymorphism markers. Continuous improvement of sequencing and base-calling software will allow high-throughput sequencing to deliver higher throughput per run, and therefore deeper multiplexing at a fixed average sequencing depth per sample. As the quantity and quality of sequence information produced per run keep increasing, more samples can be multiplexed per run at lower cost, and simplified sequencing has become a cost-competitive alternative to other whole-genome genotyping platforms. Plant breeders will be able to sequence large crop genomes and build high-density genetic linkage maps from breeding populations. Future applications of simplified sequencing in crop improvement may allow plant breeders to develop markers in new germplasm or species without first developing any molecular tools. Since sequence-based genotyping can be used for whole-genome research, simplified sequencing will become one of the main components of plant genetics and breeding.
通过简化测序鉴定高密度单核苷酸多态性标记以构建遗传谱图对于植物育种中的众多应用具有重要价值。The identification of high-density single-nucleotide polymorphism markers through simplified sequencing to construct a genetic map is of great value for many applications in plant breeding.
A genotyping chip uses probes designed from the sequences flanking known SNP sites. After the probes are fixed on the chip, the DNA of the sample is hybridized to the chip and the hybridization fluorescence signal is scanned to determine the genotype at each probe site (SNP site). The most representative brands are Illumina and Affymetrix.
SNP chips have become an important tool for the genetic improvement of crops. The era of molecular breeding has fully arrived. Traditional breeding has a long and unpredictable cycle and relies entirely on the breeder's experience and visual selection; molecular marker detection addresses these shortcomings. The field can in effect be moved into the laboratory for large-scale, precise screening that eliminates more than 95% of individual plants, so that only a small number are planted in the field, which greatly reduces field work. Where the original approach needs an average breeding cycle of 8 to 10 years per variety, this approach can finish in 3 to 5 years.
Chips currently have very high practical value and a large cost advantage in the rapid identification of breeding materials, genomic selection, germplasm resource analysis, variety improvement, QTL mapping, genetic analysis and variety authenticity testing. However, several difficulties limit the wide application of existing approaches in crop genetic improvement. The main difficulties are: (1) chips have a high up-front development cost and require a sequenced reference genome and known SNP markers; they can only detect known SNP sites, so the number of markers obtained is small and new SNP sites cannot be discovered; detection is highly instrument-dependent; and chips currently exist for only a few model breeding species; (2) reduced-representation sequencing is still not cheap enough, because library construction is cumbersome and inefficient, which makes it hard to use widely on the large sample sets typical of breeding; because all such methods rely on enzymatic digestion, they demand high-quality DNA and cannot use directly lysed DNA (direct PCR); sequencing yield is uneven between samples in a library, so supplementary sequencing is needed; and there is no enrichment of gene regions.
Technical problem
Embodiments of the present invention provide a sequencing primer set and a PCR-based whole-genome sequencing method, which aim to solve the technical problems that screening SNP sites in a genome with existing chips is slow, demanding and complicated to operate.
Technical solution
The embodiments of the present invention are realized as follows. In a first aspect, a sequencing primer set is provided, which includes a universal upstream primer, a universal downstream primer and a promoter-enrichment downstream primer;
the sequence of the universal upstream primer includes: 5'-T[barcode]CAAAXXXXNNN-3';
the sequence of the universal downstream primer includes: 5'-GACTGCGTACGZZZZNNN-3';
the sequence of the promoter-enrichment downstream primer includes: 5'-GACTGCGTACYYNCTATA-3';
wherein "XXXX" is 4 bases long, each base selected from A, T, C and G; "ZZZZ" is 4 bases long, each base selected from A, T, C and G; the "barcode" is 4-6 bases long, each base selected from A, T, C and G; "N" is any one of the bases A, T, C and G; and "Y" is the base C or T.
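The primer templates above can be filled in mechanically. The following is a minimal Python sketch, not part of the patent: the function and constant names are hypothetical, and the only facts taken from the text are the three sequence templates and the stated length rules. The example values (barcode CTTAT, cores CCGG and AATT) are the ones the specification later uses for cassava.

```python
# Hypothetical helpers; only the sequence templates come from the text.
UPSTREAM_TEMPLATE = "T{barcode}CAAA{core}NNN"
DOWNSTREAM_TEMPLATE = "GACTGCGTACG{core}NNN"
PROMOTER_PRIMER = "GACTGCGTACYYNCTATA"  # fixed sequence, Y = C/T, N = any base

BASES = set("ATCG")


def _check(seq, allowed_lengths, label):
    """Reject sequences with the wrong length or non-ATCG characters."""
    if len(seq) not in allowed_lengths or not set(seq) <= BASES:
        raise ValueError(f"invalid {label}: {seq!r}")


def upstream_primer(barcode, core):
    """Build 5'-T[barcode]CAAAXXXXNNN-3' for a 4-6 base barcode and 4-base core."""
    _check(barcode, range(4, 7), "barcode")
    _check(core, {4}, "XXXX core")
    return UPSTREAM_TEMPLATE.format(barcode=barcode, core=core)


def downstream_primer(core):
    """Build 5'-GACTGCGTACGZZZZNNN-3' for a 4-base core."""
    _check(core, {4}, "ZZZZ core")
    return DOWNSTREAM_TEMPLATE.format(core=core)


if __name__ == "__main__":
    print(upstream_primer("CTTAT", "CCGG"))
    print(downstream_primer("AATT"))
```

The trailing NNN is left as a literal degenerate suffix, because in synthesis those positions are ordered as mixed bases rather than chosen per primer.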
In a second aspect, a PCR-based whole-genome sequencing method is provided, which includes the following steps:
preparing the DNA of the sample to be tested;
providing the sequencing primer set described in any of the above, performing a first PCR amplification on the DNA of the sample with the universal upstream primer and the universal downstream primer to obtain a first amplification product, and performing a second PCR amplification on the DNA of the sample with the universal upstream primer and the promoter-enrichment downstream primer to obtain a second amplification product;
subjecting the first amplification product and the second amplification product to sequencing analysis separately.
Beneficial effects
The sequencing primer set provided by the present invention includes a universal upstream primer, a universal downstream primer and a promoter-enrichment downstream primer, each with a characteristic sequence. The universal upstream primer and the universal downstream primer each contain four core bases, "XXXX" and "ZZZZ" respectively (each base selected from A, T, C and G). These eight bases form the core base region of the universal primers and can be changed to suit the genome of the sample to be tested. With different core bases, the primers amplify different sequence fragments on different genomes, so the primer scheme can be optimized for the genome sequence and adjusted freely to the genome size and the number of samples. In addition, a primer with the characteristics of a promoter region, the promoter-enrichment downstream primer, is designed by mimicking the base sequence of the upstream region of eukaryotic genes, so that the universal upstream primer and the promoter-enrichment downstream primer together amplify sequences near promoters effectively. Used for whole-genome amplification and sequencing, such a primer set not only gives a good enrichment effect but also makes library construction and sequencing simpler and more accurate and lowers the sequencing cost.
The present invention also provides a PCR-based whole-genome sequencing method, in which the DNA of the sample to be tested is amplified with the above sequencing primer set to obtain PCR products, which are then sequenced and analyzed. This method both "discovers" and "detects" SNP sites: it can screen the whole genome while discovering new markers, so no chip has to be built in advance and no known SNP markers are needed. The method also places low demands on sample DNA quality, makes library construction simpler and more convenient, gives high coverage with enrichment of gene regions, takes little sequencing time, and can be widely applied to genome sequencing of different samples.
Description of the drawings
Figure 1 shows the agarose gel electrophoresis results for the universal-primer PCR products of the 5 samples provided in Example 1 of the present invention.
Figure 2 shows the agarose gel electrophoresis results for the promoter-region-enrichment primer PCR products of the 5 samples provided in Example 1 of the present invention.
Figure 3 shows the genome coverage of sequencing reads obtained with the universal primer method.
Figure 4 shows the genome coverage of sequencing reads obtained with the promoter-region-enrichment primer method.
Figure 5 shows the enrichment of PCR products in gene regions.
Embodiments of the invention
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
An embodiment of the present invention provides a sequencing primer set, which includes a universal upstream primer, a universal downstream primer and a promoter-enrichment downstream primer;
the sequence of the universal upstream primer includes: 5'-T[barcode]CAAAXXXXNNN-3';
the sequence of the universal downstream primer includes: 5'-GACTGCGTACGZZZZNNN-3';
the sequence of the promoter-enrichment downstream primer includes: 5'-GACTGCGTACYYNCTATA-3';
wherein "XXXX" is 4 bases long, each base selected from A, T, C and G; "ZZZZ" is 4 bases long, each base selected from A, T, C and G; the "barcode" is 5 bases long, each base selected from A, T, C and G; "N" is any one of the bases A, T, C and G; and "Y" is the base C or T.
Specifically, "XXXX" and "ZZZZ" are each 4 bases long, with each base selected from A, T, C and G. These eight bases form the core base region of the universal primers and can be changed to suit the genome of the sample to be tested; with different core bases, the primers amplify different sequence fragments on different genomes. The "barcode" is designed into the universal upstream primer; it is 5 bases long and can be any combination of A, T, C and G. A 5-base "barcode" is preferred, and a total of 384 "barcodes" are available to choose from. In a preferred embodiment of the present invention, any number of "barcode" sequences can be added for different samples to distinguish the samples further. The "barcodes" added are chosen according to the genome of the sample to be tested. The purpose of adding a "barcode" is to distinguish each sample, that is, to distinguish the DNA sequences that come from each sample;
Specifically, the promoter-enrichment downstream primer is 5'-GACTGCGTACYYNCTATA-3'. It is designed by mimicking the base sequence of the upstream region of eukaryotic genes, giving a primer with the characteristics of a promoter region. Such a primer effectively amplifies sequences near promoter regions, which enriches gene regions and benefits the subsequent experiments.
Preferably, in the universal upstream primer, universal downstream primer and promoter-enrichment downstream primer, "N" is any one of the bases A, T, C and G, and "Y" is the base C or T. Designing "N" and "Y" bases into the primers increases primer coverage, so that the primers amplify more of the genomic DNA, and the mixed-base (degenerate) design further improves binding of the primers to the sample DNA, which serves the purpose of gene-region enrichment. "N" is a random base. Placing several random bases at the end of the universal primers can remove the bias of PCR enrichment and better reflect the real genome, which is a clear advantage in analyses such as copy number. Preferably, the number of "N" positions can be set anywhere from 3 to 8 to reduce the genome further; this is useful for complex or large genomes and also lets the cost be tuned down freely (more samples, fewer SNPs).
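To make the mixed-base design concrete, the sketch below expands a degenerate primer into every concrete sequence it stands for. It is illustrative only and assumes nothing beyond the two codes defined in the text (N = A, T, C or G; Y = C or T); the function names are hypothetical.

```python
from itertools import product

# Degenerate codes as defined in the text: N is any base, Y is C or T.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "Y": "CT"}


def degeneracy(primer):
    """Number of concrete sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(DEGENERATE[base])
    return count


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    pools = [DEGENERATE[base] for base in primer]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    promoter_primer = "GACTGCGTACYYNCTATA"
    print(degeneracy(promoter_primer))   # YYN gives 2 * 2 * 4 variants
    print(expand(promoter_primer)[:3])
```

For the promoter-enrichment downstream primer the YYN positions give 2 × 2 × 4 = 16 variants, and a universal primer ending in NNN stands for 4³ = 64 variants; each additional N position multiplies the pool by four.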
In a preferred embodiment of the present invention, with cassava as the test organism, the sequence of the universal upstream primer in the sequencing primer set is 5'-T[barcode]CAAACCGGNNN-3' and the sequence of the universal downstream primer is 5'-GACTGCGTACGAATTNNN-3'; or,
the sequence of the universal upstream primer is 5'-T[barcode]CAAACGCGNNN-3' and the sequence of the universal downstream primer is 5'-GACTGCGTACGTATANNN-3'.
By analyzing the sequences that occur most frequently in the cassava genome, the core bases "XXXX" of the universal upstream primer were determined to be "CCGG" or "CGCG", and the core bases "ZZZZ" of the universal downstream primer to be "AATT" or "ATAT". Choosing bases that occur frequently in the cassava genome as the primer core bases further increases amplification coverage and gives a good enrichment effect.
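The text does not say how the frequency analysis was carried out. One plausible way to rank candidate core sequences, shown here purely as an assumption, is to count every 4-mer in the reference sequence and look at the most common ones. The toy sequence below is made up for illustration and is not cassava data.

```python
from collections import Counter


def kmer_counts(sequence, k=4):
    """Count every overlapping k-mer made only of A, C, G and T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other codes
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "CCGGAATTCCGGTTAACCGG"  # invented example, not real genome data
    counts = kmer_counts(toy_sequence)
    for kmer, n in counts.most_common(3):
        print(kmer, n)
```

On a real genome the same count would be run over the full reference sequence, and the choice of core would also have to take the resulting amplicon sizes into account, which this sketch ignores.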
In another preferred embodiment of the present invention, with cassava as the test organism, the "barcode" is selected from at least one of CTTAT, GTAGA, CCTCG, GAACT and TTACT. A 5-base "barcode" is designed into the universal upstream primer for each cassava test sample in order to distinguish the DNA sequences of each sample while keeping amplification more uniform and whole-genome coverage higher and more complete.
Therefore, compared with the prior art, the sequencing primer set provided by the present invention includes a universal upstream primer, a universal downstream primer and a promoter-enrichment downstream primer, each with a characteristic sequence. The universal upstream primer and the universal downstream primer each contain four core bases, "XXXX" and "ZZZZ" respectively (each base selected from A, T, C and G). These eight bases form the core base region of the universal primers and can be changed to suit the genome of the sample to be tested. With different core bases, the primers amplify different sequence fragments on different genomes, so the primer scheme can be optimized for the genome sequence and adjusted freely to the genome size and the number of samples. In addition, a primer with the characteristics of a promoter region, the promoter-enrichment downstream primer, is designed by mimicking the base sequence of the upstream region of eukaryotic genes, so that the universal upstream primer and the promoter-enrichment downstream primer together amplify sequences near promoters effectively. Used for whole-genome amplification and sequencing, such a primer set not only gives a good enrichment effect but also makes library construction and sequencing simpler and more accurate and lowers the sequencing cost.
Correspondingly, an embodiment of the present invention also provides a PCR-based whole-genome sequencing method, which includes the following steps:
S01. Prepare the DNA of the sample to be tested;
S02. Provide the sequencing primer set, perform a first PCR amplification on the DNA of the sample with the universal upstream primer and the universal downstream primer to obtain a first amplification product, and perform a second PCR amplification on the DNA of the sample with the universal upstream primer and the promoter-enrichment downstream primer to obtain a second amplification product;
S03. Subject the first amplification product and the second amplification product to sequencing analysis separately.
Specifically, in step S01 above, the DNA of the sample to be tested is prepared. In a preferred embodiment of the present invention, the DNA is prepared by any one of a whole-genome DNA extraction method, a PCR kit amplification method, or lysis of tissue with a prepared lysis buffer. Preferably, the ordinary whole-genome DNA extraction method is the cetyltrimethylammonium bromide (CTAB) method. CTAB is a cationic detergent that precipitates nucleic acids and acidic polysaccharides from solutions of low ionic strength; in solutions of high ionic strength, CTAB forms complexes with proteins and polysaccharides but does not precipitate nucleic acids. After extraction with organic solvents to remove proteins, polysaccharides, phenolics and other impurities, ethanol is added to precipitate the nucleic acid, which yields the whole-genome DNA of the sample. Alternatively, SDS (sodium dodecyl sulfate) is used to lyse the cells, dissociating the chromosomes and denaturing the proteins; SDS also forms complexes with proteins and polysaccharides, releasing the nucleic acid. The DNA can also be prepared directly with a PCR kit amplification method, or whole-genome DNA can be obtained by lysing tissue with a prepared lysis buffer. Because the downstream steps place low demands on DNA quality, the direct PCR approach described in the embodiments of the present invention can quickly enrich the whole-genome DNA of the sample; the method is simple and fast, greatly shortens library preparation time and improves experimental efficiency.
Specifically, in step S02 above, the sequencing primer set is provided, a first PCR amplification is performed on the DNA of the sample with the universal upstream primer and the universal downstream primer to obtain a first amplification product, and a second PCR amplification is performed on the DNA of the sample with the universal upstream primer and the promoter-enrichment downstream primer to obtain a second amplification product;
Preferably, the first PCR amplification includes:
first performing a binding amplification, in which the universal upstream primer and the universal downstream primer bind to the DNA of the sample, and then performing an enrichment amplification, which enriches the target regions.
The universal PCR system (20 μL) is as follows:
DNA sample (20 ng)                                               1.0 μL
2× NEB Taq Master Mix                                            10 μL
5 μM universal upstream primer (corresponding to the 5 samples)  0.6 μL
5 μM universal downstream primer                                 0.6 μL
ddH2O                                                            7.8 μL;
The binding amplification includes 94°C for 5 minutes, followed by 5 cycles of: 94°C for 1 minute; 35°C for 1 minute; 72°C for 1.5 minutes; and
the enrichment amplification includes 35 cycles of: 94°C for 1 minute; 50-58°C for 1 minute; 72°C for 1.5 minutes;
followed by a final extension at 72°C for 7 minutes.
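As a quick sanity check on the reaction setup above, the recipe and the thermal program can be written down as data and summed. This is only a bookkeeping sketch: the data layout and function names are hypothetical, while the volumes, concentrations, cycle counts and hold times are the ones quoted in the text.

```python
# Volumes (uL) of the 20 uL universal PCR system quoted in the text.
RECIPE_UL = {
    "DNA sample (20 ng)": 1.0,
    "2x NEB Taq Master Mix": 10.0,
    "5 uM universal upstream primer": 0.6,
    "5 uM universal downstream primer": 0.6,
    "ddH2O": 7.8,
}

# Thermal program as (stage, cycles, minutes for each step of one cycle).
PROGRAM = [
    ("initial denaturation, 94 C", 1, [5.0]),
    ("binding amplification, 35 C annealing", 5, [1.0, 1.0, 1.5]),
    ("enrichment amplification, 50-58 C annealing", 35, [1.0, 1.0, 1.5]),
    ("final extension, 72 C", 1, [7.0]),
]


def total_volume_ul(recipe):
    """Sum of all component volumes in microlitres."""
    return sum(recipe.values())


def final_primer_um(stock_um, primer_ul, reaction_ul):
    """Primer concentration in the finished reaction (C1 * V1 / V2)."""
    return stock_um * primer_ul / reaction_ul


def programmed_minutes(program):
    """Total programmed hold time; ramping between temperatures is not included."""
    return sum(cycles * sum(steps) for _, cycles, steps in program)


if __name__ == "__main__":
    print(round(total_volume_ul(RECIPE_UL), 2))
    print(round(final_primer_um(5.0, 0.6, 20.0), 3))
    print(programmed_minutes(PROGRAM))
```

With the quoted values the components add up to 20 μL, each primer ends up at 0.15 μM in the reaction, and the programmed hold times come to 152 minutes before ramp times are counted.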
Preferably, the second PCR amplification includes:
first performing a binding amplification, in which the universal upstream primer and the promoter-enrichment downstream primer bind to the DNA of the sample, and then performing an enrichment amplification, which enriches the target regions.
The promoter-region-enrichment PCR system (20 μL) is as follows:
DNA sample (20 ng)                                               1.0 μL
2× NEB Taq Master Mix                                            10 μL
5 μM universal upstream primer (corresponding to the 5 samples)  0.6 μL
5 μM promoter-enrichment downstream primer                       0.6 μL
ddH2O                                                            7.8 μL;
The binding amplification includes 94°C for 5 minutes, followed by 5 cycles of: 94°C for 1 minute; 35°C for 1 minute; 72°C for 1.5 minutes; and
the enrichment amplification includes 35 cycles of: 94°C for 1 minute; 50-58°C for 1 minute; 72°C for 1.5 minutes;
followed by a final extension at 72°C for 7 minutes.
The main purpose of the binding amplification is to bind the primers to the sample DNA first. It runs for 5 PCR cycles, which is enough for each primer to bind to the sample DNA strands. Given the "XXXX", "ZZZZ", "N" and "Y" bases designed into the primers, an annealing temperature of 35°C is used. This low-temperature pre-amplification step lets the designed "XXXX", "N" and "Y" positions bind the sample DNA quickly and accurately and positions the primers on the sample DNA, which benefits the subsequent enrichment amplification.
Preferably, the main purpose of the enrichment amplification is to enrich gene regions, producing an enrichment effect that facilitates subsequent detection. In a preferred embodiment of the present invention, the enrichment amplification runs for 35 PCR cycles; the higher cycle number amplifies a large amount of sample DNA and increases the number of enriched fragments. The annealing temperature is set to 50-58°C according to the length of each primer; a suitable temperature improves the amplification rate and increases the amount of enriched DNA.
Specifically, in step S03 above, the first amplification product and the second amplification product are subjected to sequencing analysis separately. In a preferred embodiment of the present invention, the first and second amplification products are each checked by agarose gel electrophoresis to confirm that the PCR amplification yielded product. Preferably, the first amplification product is adjusted to a concentration of 180-200 ng/μL and the second amplification product is adjusted to a concentration of 180-200 ng/μL, so that all PCR products have the same concentration. If the PCR product concentration is too low, too little amplification product is available and the subsequent detection is easily disturbed by impurities; if it is too high, there is too much amplification product, and the excessive concentration makes the subsequent analysis and the sequencing results unclear.
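Bringing each product into the 180-200 ng/μL window is a standard C1·V1 = C2·V2 dilution. The sketch below assumes a target of 190 ng/μL (the midpoint of the stated range, chosen here for illustration) and a hypothetical function name; the text does not describe how the adjustment is actually performed.

```python
def diluent_volume_ul(measured_ng_per_ul, volume_ul, target_ng_per_ul=190.0):
    """Volume of diluent to add so the product reaches the target concentration.

    Uses C1 * V1 = C2 * V2. The default target of 190 ng/uL is an assumed
    midpoint of the 180-200 ng/uL window given in the text.
    """
    if measured_ng_per_ul < target_ng_per_ul:
        # Dilution can only lower a concentration, never raise it.
        raise ValueError("product is already below the target concentration")
    final_volume_ul = measured_ng_per_ul * volume_ul / target_ng_per_ul
    return final_volume_ul - volume_ul


if __name__ == "__main__":
    # 10 uL of product measured at 380 ng/uL needs an equal volume of diluent.
    print(diluent_volume_ul(380.0, 10.0))
```

A product that measures below the window cannot be fixed by dilution; it would need to be concentrated or re-amplified, which the sketch only reports as an error.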
Preferably, the first amplification products of all samples are adjusted to the same concentration and then pooled to obtain a first mixture, and the second amplification products of all samples are adjusted to the same concentration and then pooled to obtain a second mixture. Preferably, mixing is done by either pipetting up and down or centrifugation. In the specific embodiments of the present invention, each sample is quantified individually before pooling and sequencing, which can to some extent overcome the uneven per-sample data yield that is common in reduced-representation sequencing.
Preferably, next-generation sequencing libraries are constructed from the first mixture and the second mixture separately, sequenced, and the results analyzed. Preferably, Illumina sequencing is used. In a preferred embodiment of the present invention, the sequencing results are analyzed by using the [barcode] of the different primers to assign reads to samples, so that the analysis both genotypes the samples and discovers variant sites in the population.
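Assigning pooled reads back to samples by barcode can be sketched as follows. This is a simplified illustration, not the patent's analysis pipeline: it assumes each read starts with the upstream primer (T, a 5-base barcode, then CAAA), requires exact matches, and uses the five cassava barcodes listed in the specification with made-up sample names.

```python
# Barcodes from the cassava embodiment; the sample names are hypothetical.
BARCODES = {
    "CTTAT": "sample_1",
    "GTAGA": "sample_2",
    "CCTCG": "sample_3",
    "GAACT": "sample_4",
    "TTACT": "sample_5",
}
ANCHOR = "CAAA"  # fixed bases that follow the barcode in the upstream primer


def assign_sample(read, barcodes=BARCODES):
    """Return the sample for a read that starts with T + barcode + CAAA, else None."""
    if len(read) < 10 or read[0] != "T" or read[6:10] != ANCHOR:
        return None
    return barcodes.get(read[1:6])


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by sample; reads that cannot be assigned go under the key None."""
    bins = {}
    for read in reads:
        bins.setdefault(assign_sample(read, barcodes), []).append(read)
    return bins


if __name__ == "__main__":
    reads = ["TCTTATCAAACCGGACGTTTT", "TGTAGACAAACCGGAAA", "ACGTACGTACGT"]
    for sample, group in demultiplex(reads).items():
        print(sample, len(group))
```

Real data would normally call for some tolerance to sequencing errors in the barcode, which this exact-match sketch leaves out.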
Compared with the prior art, the PCR-based whole-genome sequencing method provided by the present invention amplifies the DNA of the sample to be tested with the above sequencing primer set to obtain PCR products, which are then sequenced and analyzed. This method both "discovers" and "detects" SNP sites: it can screen the whole genome while discovering new markers, so no chip has to be built in advance and no known SNP markers are needed. The method also places low demands on sample DNA quality, makes library construction simpler and more convenient, gives high coverage with enrichment of gene regions, takes little sequencing time, and can be widely applied to genome sequencing of different samples. The invention is described below with reference to specific examples.
Example 1
A PCR-based whole-genome sequencing method, tested on different cassava varieties as an example,
includes the following steps:
Step 1: Prepare the DNA of the sample to be tested;
Genomic DNA was extracted from different cassava varieties that had grown uniformly under the same conditions, sampled at the same growth stage and from the same plant part, and free of pests and disease. Samples stored long-term must be kept in liquid nitrogen or in a freezer at -70°C or below. Genomic DNA was extracted with the DNeasy 96 Plant Kit (QIAGEN).
Quality check and quantification of the extracted genomic DNA: for agarose gel analysis, with a λ marker as the standard, 1 μL of DNA was mixed with 2 μL of 10× bromophenol blue loading buffer and loaded onto a 0.8% agarose gel containing 0.5 μg/ml Goldview dye; electrophoresis was run in 1× TAE buffer at 90 V for 40 min, and the DNA bands were observed with a gel imaging system (Tanon 4100).
A 1-2 μL DNA sample was measured on a NANODROP 2000C. The DNA concentration was calculated from the absorbance at 260 nm, and the OD260/OD280 and OD260/OD230 ratios were used to judge whether impurities such as polysaccharides, proteins or RNA were present, and hence the purity of the DNA.
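The spectrophotometric check described above follows a standard calculation, sketched here in Python. The conversion factor of 50 ng/μL per A260 unit for double-stranded DNA at a 1 cm path length is the usual convention; the cutoff values of 1.8 and 2.0 are common rules of thumb supplied here as assumptions, since the text gives no acceptance thresholds.

```python
def dsdna_ng_per_ul(a260, dilution_factor=1.0, path_cm=1.0):
    """dsDNA concentration from A260, using 50 ng/uL per absorbance unit at 1 cm."""
    return a260 * 50.0 * dilution_factor / path_cm


def purity_report(a260, a280, a230, min_260_280=1.8, min_260_230=2.0):
    """Absorbance ratios plus simple flags; the cutoffs are rule-of-thumb assumptions."""
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    return {
        "A260/A280": ratio_280,
        "A260/A230": ratio_230,
        # A low A260/A280 usually points to protein or phenol carry-over.
        "low_260_280": ratio_280 < min_260_280,
        # A low A260/A230 usually points to salts, polysaccharides or other organics.
        "low_260_230": ratio_230 < min_260_230,
    }


if __name__ == "__main__":
    print(dsdna_ng_per_ul(0.5))
    print(purity_report(1.0, 0.5, 0.4))
```

Instruments such as the one named in the text report these values directly; the sketch only shows what the reported numbers mean.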
Step 2: Provide the sequencing primer set. Use the universal upstream primer and the universal downstream primer to perform a first PCR amplification on the DNA of the sample to be tested, obtaining a first amplification product; use the universal upstream primer and the promoter-enrichment downstream primer to perform a second PCR amplification on the DNA of the sample to be tested, obtaining a second amplification product.
Five universal upstream primers were designed:
SEQ ID NO.1 (5’-TCTTATCAAACCGGNNN-3’),
SEQ ID NO.2 (5’-TGTAGACAAACCGGNNN-3’),
SEQ ID NO.3 (5’-TCCTCGCAAACCGGNNN-3’),
SEQ ID NO.4 (5’-TGAACTCAAACCGGNNN-3’),
SEQ ID NO.5 (5’-TTTACTCAAACCGGNNN-3’), where the [barcode] region differs from primer to primer and the "XXXX" sequence of the universal upstream primer is "CCGG", chosen because it occurs at high frequency in cassava.
One universal downstream primer was designed:
SEQ ID NO.6 (5’-GACTGCGTACGAATTNNN-3’), where the "ZZZZ" sequence is "AATT", chosen because "AATT" occurs at high frequency in cassava.
One promoter-enrichment downstream primer was designed:
SEQ ID NO.7 (5’-GACTGCGTACYYNCTATA-3’).
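The primer layout can be checked by assembling the sequences from their parts and expanding the degenerate positions. The sketch below builds the upstream primers from the template 5’-T[barcode]CAAAXXXXNNN-3’ with the five barcodes and the "CCGG" core that appear in the listed sequences; the function names are hypothetical, and the expansion uses the meanings the document gives for N (any base) and Y (C or T).

```python
from itertools import product

# Degenerate-base meanings as defined in the document: N = A/C/G/T, Y = C/T.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

BARCODES = ["CTTAT", "GTAGA", "CCTCG", "GAACT", "TTACT"]


def upstream_primer(barcode, core="CCGG"):
    """Assemble 5'-T[barcode]CAAA[core]NNN-3'."""
    return f"T{barcode}CAAA{core}NNN"


def downstream_primer(core="AATT"):
    """Assemble 5'-GACTGCGTACG[core]NNN-3'."""
    return f"GACTGCGTACG{core}NNN"


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (DEGENERATE[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]


UPSTREAM = [upstream_primer(b) for b in BARCODES]
PROMOTER_PRIMER = "GACTGCGTACYYNCTATA"

if __name__ == "__main__":
    for i, seq in enumerate(UPSTREAM, start=1):
        print(f"SEQ ID NO.{i}: {seq}")
    print("downstream:", downstream_primer())
    print("variants per upstream primer:", len(expand(UPSTREAM[0])))
    print("variants of promoter primer:", len(expand(PROMOTER_PRIMER)))
```

Each primer ending in NNN stands for 4 × 4 × 4 = 64 concrete oligonucleotides, and the promoter-enrichment primer (two Y positions and one N) for 2 × 2 × 4 = 16.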
The first PCR amplification is performed on the DNA of the sample to be tested using the universal upstream primer and the universal downstream primer, and comprises:
first a binding amplification, in which the universal upstream primer and the universal downstream primer bind to the DNA of the sample to be tested, and then an enrichment amplification, which enriches the target regions.
The universal PCR system (20 μL) is as follows:
DNA sample (20 ng)                                              1.0 μL
2× NEB Taq Master Mix                                           10 μL
5 μM universal upstream primer (corresponding to 5 samples)     0.6 μL
5 μM universal downstream primer                                0.6 μL
ddH2O                                                           7.8 μL
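When several samples are processed together, the shared components of the 20 μL system can be scaled into a master mix. The sketch below is one way to do that arithmetic. It assumes that the template DNA and the barcoded upstream primer are added to each tube individually (because they differ per sample) and that a 10% pipetting overage is wanted; both choices, and all names in the code, are assumptions for illustration.

```python
# Per-reaction volumes (uL) taken from the 20 uL system above.
REACTION_UL = {
    "DNA sample (20 ng)": 1.0,
    "2x NEB Taq Master Mix": 10.0,
    "5 uM universal upstream primer": 0.6,
    "5 uM universal downstream primer": 0.6,
    "ddH2O": 7.8,
}

# Components assumed to be sample-specific and therefore added per tube.
PER_TUBE = ("DNA sample (20 ng)", "5 uM universal upstream primer")


def master_mix(n_reactions, overage=0.1):
    """Return shared-component volumes (uL) for n reactions plus overage."""
    scale = n_reactions * (1 + overage)
    return {
        name: round(vol * scale, 2)
        for name, vol in REACTION_UL.items()
        if name not in PER_TUBE
    }


def final_primer_um(stock_um=5.0, primer_ul=0.6, total_ul=20.0):
    """Final primer concentration (uM) in the assembled reaction."""
    return stock_um * primer_ul / total_ul


if __name__ == "__main__":
    print("total per reaction:", sum(REACTION_UL.values()))
    print("master mix for 5 samples:", master_mix(5))
    print("final primer concentration (uM):", final_primer_um())
```

With the volumes listed, each primer ends up at 0.15 μM in the final 20 μL reaction.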
The binding amplification consists of 94°C for 5 minutes, followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes.
The enrichment amplification consists of 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes.
The program ends with a final step at 72°C for 7 minutes.
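The two-phase thermal program can be written down as a small data structure, which makes it easy to total the cycle count and the programmed hold time. In this sketch the enrichment annealing temperature is set to 55°C as an arbitrary point within the stated 50-58°C range, the phase labels are descriptive names added for the example, and ramp times between temperatures are ignored.

```python
# Each phase: (label, number of cycles, [(temperature in C, minutes), ...]).
# 55 C is an assumed annealing temperature inside the stated 50-58 C window.
PROGRAM = [
    ("initial denaturation", 1, [(94, 5.0)]),
    ("binding amplification", 5, [(94, 1.0), (35, 1.0), (72, 1.5)]),
    ("enrichment amplification", 35, [(94, 1.0), (55, 1.0), (72, 1.5)]),
    ("final step", 1, [(72, 7.0)]),
]


def hold_minutes(program):
    """Sum the programmed hold times, excluding ramping between steps."""
    return sum(
        cycles * sum(minutes for _, minutes in steps)
        for _, cycles, steps in program
    )


def amplification_cycles(program):
    """Count cycles in phases that contain more than one step."""
    return sum(cycles for _, cycles, steps in program if len(steps) > 1)


if __name__ == "__main__":
    print("amplification cycles:", amplification_cycles(PROGRAM))
    print("programmed hold time (min):", hold_minutes(PROGRAM))
```

The low 35°C annealing step in the first five cycles lets the short, partly degenerate primers bind; the later cycles at 50-58°C then amplify the products formed in that first phase.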
The second PCR amplification is performed on the DNA of the sample to be tested using the universal upstream primer and the promoter-enrichment downstream primer, and comprises:
first a binding amplification, in which the universal upstream primer and the promoter-enrichment downstream primer bind to the DNA of the sample to be tested, and then an enrichment amplification, which enriches the target regions.
The promoter-region enrichment PCR system (20 μL) is as follows:
DNA sample (20 ng)                                              1.0 μL
2× NEB Taq Master Mix                                           10 μL
5 μM universal upstream primer (corresponding to 5 samples)     0.6 μL
5 μM promoter-enrichment downstream primer                      0.6 μL
ddH2O                                                           7.8 μL
The binding amplification consists of 94°C for 5 minutes, followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes.
The enrichment amplification consists of 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes.
The program ends with a final step at 72°C for 7 minutes.
Step 3: Sequence and analyze the first amplification product and the second amplification product separately.
8 μL of the first amplification product and 8 μL of the second amplification product were each checked on a 2% agarose gel. The results are shown in Figures 1 and 2: Figure 1 shows the first amplification products of the 5 samples, and Figure 2 shows the second amplification products of the 5 samples. All PCR products were quantified individually and normalized to exactly 100 ng/μL. From each quantified sample, 2 μL of the first amplification product was pooled to give the first mixed product, and 2 μL of the second amplification product was pooled to give the second mixed product. Fragments of 200-700 bp were purified and recovered, then sent to a third-party sequencing company for paired-end 150 bp sequencing on a Hiseq2500. A total of 5 Gb of raw reads was obtained. The sequencing results are shown in Figures 3, 4, and 5. Figure 3 shows the genome coverage of the reads obtained with the universal primer method, Figure 4 shows the genome coverage of the reads obtained with the promoter-region enrichment primer method, and Figure 5 shows the distribution of products across gene regions.
According to Figure 5, 70% of the PCR products fall in non-gene regions, 17% in introns, 7% in CDS regions, and 6% in the UP_2000 region. Taken together, the three figures indicate that whole-genome sequencing with the PCR method of the present invention achieves very high coverage of the genome and that the results are highly reliable. Library construction is also simpler, coverage is high, gene regions are enriched, sequencing time is short, and the method can be applied broadly to genome sequencing of different samples.
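Because the samples are pooled before sequencing, reads have to be assigned back to samples by the barcode carried in the upstream primer. The sketch below shows one minimal way to do this, assuming that a read begins with the primer sequence 5’-T[barcode]CAAA... and that exact matching is sufficient; the sample labels, the function names, and the absence of mismatch tolerance are all assumptions for illustration, not part of the described method.

```python
# Barcodes from the five upstream primers; sample labels are hypothetical.
BARCODE_TO_SAMPLE = {
    "CTTAT": "sample1",
    "GTAGA": "sample2",
    "CCTCG": "sample3",
    "GAACT": "sample4",
    "TTACT": "sample5",
}
BARCODE_LEN = 5
LINKER = "CAAA"  # fixed bases that follow the barcode in the primer


def assign_sample(read):
    """Return the sample label for a read, or None if no barcode matches."""
    if not read.startswith("T"):
        return None
    barcode = read[1:1 + BARCODE_LEN]
    linker = read[1 + BARCODE_LEN:1 + BARCODE_LEN + len(LINKER)]
    if barcode in BARCODE_TO_SAMPLE and linker == LINKER:
        return BARCODE_TO_SAMPLE[barcode]
    return None


def demultiplex(reads):
    """Group reads by sample; unassigned reads are stored under None."""
    bins = {}
    for read in reads:
        bins.setdefault(assign_sample(read), []).append(read)
    return bins


if __name__ == "__main__":
    reads = [
        "TCTTATCAAACCGGACGTTTGACC",
        "TGTAGACAAACCGGTTTAGGCATG",
        "ACGTACGTACGTACGTACGTACGT",
    ]
    for sample, group in demultiplex(reads).items():
        print(sample, len(group))
```

A production pipeline would usually also tolerate sequencing errors in the barcode and trim the primer bases before alignment.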
The above embodiment is intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiment, a person of ordinary skill in the art should understand that the technical solutions described in the embodiments can still be modified, or some of their technical features replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and all of them fall within the scope of protection of the present invention.

Claims (15)

  1. A sequencing primer set, characterized in that the sequencing primer set comprises a universal upstream primer, a universal downstream primer, and a promoter-enrichment downstream primer;
    the sequence of the universal upstream primer comprises: 5’-T[barcode]CAAAXXXXNNN-3’;
    the sequence of the universal downstream primer comprises: 5’-GACTGCGTACGZZZZNNN-3’;
    the sequence of the promoter-enrichment downstream primer comprises: 5’-GACTGCGTACYYNCTATA-3’;
    wherein "XXXX" is 4 bases long, with bases selected from at least one of A, T, C, and G; "ZZZZ" is 4 bases long, with bases selected from at least one of A, T, C, and G; the "barcode" is 4-6 bases long, with bases selected from at least one of A, T, C, and G; "N" is any one of the bases A, T, C, and G; and "Y" is base C or T.
  2. The sequencing primer set according to claim 1, characterized in that the sequence of the universal upstream primer is 5’-T[barcode]CAAACCGGNNN-3’ and the sequence of the universal downstream primer is 5’-GACTGCGTACGAATTNNN-3’.
  3. The sequencing primer set according to claim 1, characterized in that the sequence of the universal upstream primer is 5’-T[barcode]CAAACGCGNNN-3’ and the sequence of the universal downstream primer is 5’-GACTGCGTACGTATANNN-3’.
  4. The sequencing primer set according to claim 1, characterized in that the "barcode" is selected from at least one of CTTAT, GTAGA, CCTCG, GAACT, and TTACT.
  5. The sequencing primer set according to claim 1, characterized in that the universal upstream primer is as shown in SEQ ID NO.1-5 and the universal downstream primer is as shown in SEQ ID NO.6.
  6. The sequencing primer set according to claim 1, characterized in that the universal upstream primer is as shown in SEQ ID NO.1-5 and the promoter-enrichment downstream primer is as shown in SEQ ID NO.7.
  7. A PCR-based whole-genome sequencing method, comprising the following steps:
    preparing the DNA of the sample to be tested;
    providing the sequencing primer set according to any one of claims 1-6, performing a first PCR amplification on the DNA of the sample to be tested with the universal upstream primer and the universal downstream primer to obtain a first amplification product, and performing a second PCR amplification on the DNA of the sample to be tested with the universal upstream primer and the promoter-enrichment downstream primer to obtain a second amplification product;
    mixing the first amplification product and the second amplification product, and performing sequencing analysis.
  8. The PCR-based whole-genome sequencing method according to claim 7, characterized in that the first PCR amplification comprises:
    first a binding amplification, in which the universal upstream primer and the universal downstream primer bind to the DNA of the sample to be tested, and then an enrichment amplification, which enriches the target regions.
  9. The PCR-based whole-genome sequencing method according to claim 8, characterized in that the binding amplification comprises 94°C for 5 minutes, followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes.
  10. The PCR-based whole-genome sequencing method according to claim 8, characterized in that the enrichment amplification comprises 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes; followed by 72°C for 7 minutes.
  11. The PCR-based whole-genome sequencing method according to claim 7, characterized in that the second PCR amplification comprises:
    first a binding amplification, in which the universal upstream primer and the promoter-enrichment downstream primer bind to the DNA of the sample to be tested, and then an enrichment amplification, which enriches the target regions.
  12. The PCR-based whole-genome sequencing method according to claim 11, characterized in that the binding amplification comprises 94°C for 5 minutes, followed by 5 cycles of: 94°C, 1 minute; 35°C, 1 minute; 72°C, 1.5 minutes.
  13. The PCR-based whole-genome sequencing method according to claim 11, characterized in that the enrichment amplification comprises 35 cycles of: 94°C, 1 minute; 50-58°C, 1 minute; 72°C, 1.5 minutes; followed by 72°C for 7 minutes.
  14. The PCR-based whole-genome sequencing method according to claim 7, characterized in that the method for preparing the DNA of the sample to be tested is any one selected from a whole-genome DNA extraction method, a PCR kit amplification method, and a method of lysing tissue with a prepared lysis buffer.
  15. The PCR-based whole-genome sequencing method according to claim 7, characterized in that the first amplification product is quantified to a concentration of 180-200 ng/μL and the second amplification product is quantified to a concentration of 180-200 ng/μL, after which each is subjected to sequencing analysis separately.
PCT/CN2019/081029 2019-04-02 2019-04-02 Design of sequencing primers and pcr-based method for sequencing whole genome WO2020199127A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/081029 WO2020199127A1 (en) 2019-04-02 2019-04-02 Design of sequencing primers and pcr-based method for sequencing whole genome

Publications (1)

Publication Number Publication Date
WO2020199127A1 true WO2020199127A1 (en) 2020-10-08

Family

ID=72664682

Country Status (1)

Country Link
WO (1) WO2020199127A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220098639A1 (en) * 2020-09-25 2022-03-31 Hainan University Simple, Cost-effective and Amplification-based Whole Genome Sequencing Approach
US11905554B2 (en) * 2020-09-25 2024-02-20 Hainan University Simple, cost-effective and amplification-based whole genome sequencing approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19922704

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19922704

Country of ref document: EP

Kind code of ref document: A1