WO2012068919A1 - DNA library and preparation method therefor, and method and device for detecting SNPs - Google Patents


Info

Publication number
WO2012068919A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
snps
library
sequencing
dna library
Application number
PCT/CN2011/079971
Other languages
English (en)
French (fr)
Inventor
杜野
赵美茹
陈颖
武靖华
田埂
王俊
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Application filed by 深圳华大基因科技有限公司 and 深圳华大基因研究院
Priority to US13/989,031 (patent US9493821B2)
Priority to EP11843141.0A (patent EP2631336B1)
Priority to DK11843141.0T (patent DK2631336T3)
Publication of WO2012068919A1


Classifications

    • C12Q1/683 Hybridisation assays for detection of mutation or polymorphism involving restriction enzymes, e.g. restriction fragment length polymorphism [RFLP]
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C12Q2521/313 Type II endonucleases, i.e. cutting outside recognition site
    • C12Q2525/191 Modifications characterised by incorporating an adaptor

Definitions

  • The present invention relates to the field of molecular biology. Specifically, the present invention provides a DNA library and a method for preparing the same, a method for determining DNA sequence information, a device and a kit for detecting SNPs, and a genotyping method.
  • A single nucleotide polymorphism (SNP) is a variation of a single nucleotide in the genome; SNPs are numerous and highly polymorphic. SNPs are considered the most ideal genetic markers in comparative genomics and evolutionary genomics research, and they are also used as effective molecular markers in disease-association genetics and pharmacogenomics. Whatever the application, SNPs must be detected and typed in a large number of samples. Although deep resequencing of genomes is the most direct and effective way to detect SNPs, genome sequencing is currently too expensive for this purpose. To meet the requirements of large-scale sample analysis, many high-throughput SNP typing methods and commercial platforms have been developed (Chunming Ding and Shengnan Jin ...).
  • The most commonly used high-throughput SNP typing platforms are the Illumina® BeadArray platform, based on single-base extension, and the Affymetrix SNP microarray, based on differential hybridization; both rely on existing SNP site information.
  • These platforms use synthetic probes to detect specific tagSNPs according to their respective principles, so different combinations of SNPs can be designed for the different traits to be analyzed, which makes the assay design more flexible and specific.
  • However, these methods also have certain limitations.
  • For example, the probes must pass strict screening during design, and not all tagSNPs can meet these design requirements.
  • Several existing methods combine restriction endonucleases with next-generation sequencing (NGS) to detect polymorphisms at specific sites within the genome [Nathan A. Baird, Paul D., et al. (2008). Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE, 3(10): 3376; Michael A. Gore, et al. (2009). A First-Generation Haplotype Map of Maize. Science, 326: 1115], which are incorporated herein by reference, such as sequencing of restriction-site-associated DNA (RAD) tags (Nathan A. Baird, Paul D., et al. ...
  • The present invention provides a method of preparing a DNA library that can be used for detecting SNPs.
  • The method comprises the steps of: digesting the sample genomic DNA with a restriction enzyme to obtain a digested product, wherein the restriction enzyme comprises at least one selected from the group consisting of Mbo II and Tsp45I; separating the digested product to obtain DNA fragments 100 bp - 1,000 bp in length; end-repairing the DNA fragments to obtain end-repaired DNA fragments; adding a base A to the 3' ends of the end-repaired DNA fragments to obtain DNA fragments with a terminal base A; and ligating the DNA fragments with a terminal base A to a sequencing adapter to obtain the DNA library.
  • With this method, a DNA library of a sample can be constructed efficiently; sequencing the library yields the sequence information of the sample DNA, and SNP data analysis of that sequence information yields the SNP information of the sample DNA. The inventors further found that the method is simple, very easy to operate, easy to standardize, and low in cost. In addition, the inventors surprisingly found that when DNA libraries are constructed from the same sample by this method using different candidate restriction enzymes, the sequencing results are stable and highly reproducible.
  • The present invention also provides a DNA library obtained by the method of preparing a DNA library according to an embodiment of the present invention.
  • The present invention also provides a method of determining DNA sequence information.
  • The method comprises the steps of: constructing a DNA library of the sample genomic DNA by the method of preparing a DNA library according to an embodiment of the present invention; and sequencing the DNA library to obtain the DNA sequence information. With this method, the sequence information of the sample DNA in the library can be obtained efficiently, so that SNP data analysis of the sequence information yields the SNP information of the sample DNA. Further, the inventors surprisingly found that determining sample DNA sequence information with the method according to an embodiment of the present invention effectively reduces bias in the data output and lowers cost.
  • The present invention also provides an apparatus for detecting SNPs. According to an embodiment of the present invention, the apparatus comprises: a DNA library preparation unit for preparing a DNA library; a sequencing unit, connected to the DNA library preparation unit, for sequencing the DNA library to obtain DNA sequence information; and an SNP data analysis unit, connected to the sequencing unit, for performing SNP data analysis on the DNA sequence information to obtain SNP information.
  • The present invention also provides a kit for detecting SNPs. According to an embodiment of the present invention, the kit comprises a restriction enzyme, the restriction enzyme comprising at least one selected from the group consisting of Mbo II and Tsp45I.
  • The present invention also provides a genotyping method, comprising: providing a sample genome; preparing a DNA library of the sample genome by the method of preparing a DNA library according to an embodiment of the present invention; sequencing the DNA library to obtain DNA sequence information; performing SNP data analysis on the DNA sequence information to obtain SNP information of the sample; and genotyping the sample based on the SNP information.
  • Figure 1 shows the flow of an SNP detection method according to an embodiment of the present invention.
  • Figure 2 shows electrophoresis results of genomic DNA after enzyme digestion, in constructing a DNA library according to an embodiment of the present invention.
  • Figure 3 shows Agilent® Bioanalyzer 2100 results for genomic DNA digested with four enzymes, in the method of constructing a DNA library according to an embodiment of the present invention.
  • Figure 4 shows a statistical plot of the insert size range of the DY library constructed using Tsp45I, according to an embodiment of the present invention.
  • Figure 5 shows a statistical plot of the insert size range of the YH library constructed using Tsp45I, according to the method of constructing a DNA library of an embodiment of the present invention.
  • Figure 6 shows a statistical curve of the sequencing depth of the DY library constructed using Tsp45I, according to an embodiment of the present invention.
  • Figure 7 shows a statistical curve of the sequencing depth of the YH library constructed using Tsp45I, according to the method of constructing a DNA library of an embodiment of the present invention.
  • Figure 8 shows, for the method of constructing a DNA library according to an embodiment of the present invention, libraries constructed using Tsp45I respectively ...
  • Figure 9 shows a comparison of the sequencing depth over target regions between two constructed YH libraries, according to the method of constructing a DNA library of an embodiment of the present invention.
  • Figure 10 shows a schematic diagram of an apparatus for detecting SNPs according to an embodiment of the present invention.
  • The present invention provides a method of preparing a DNA library that can be used for detecting SNPs. Specifically, according to an embodiment of the present invention and referring to FIG. 1, the method includes the following steps:
  • The sample genomic DNA is digested with a restriction endonuclease to obtain a digested product, wherein the restriction enzyme includes at least one selected from the group consisting of Mbo II and Tsp45I.
  • The restriction enzyme further comprises at least one selected from the group consisting of Hind III and Bcc I.
  • The source of the DNA sample is not particularly limited.
  • The sample genomic DNA may be derived from any species with existing genome-wide sequence data (for example, the species listed at http://www.ncbi.nlm.nih.gov/sites/genome).
  • The sample genomic DNA can be taken from an individual, a single cell, or a tissue of the species.
  • In one embodiment, the sample genomic DNA is human genomic DNA.
  • The method of extracting genomic DNA is not particularly limited. Those skilled in the art will appreciate that genomic DNA can be extracted by different methods depending on the species and sample, in particular by methods known in the art (including commercially available kits). For example, plant tissues or microorganisms can be extracted using the standard CTAB method, and human blood genomic DNA can be obtained using the QIAamp® DNA Mini Kit (QIAGEN).
  • The method for constructing a DNA library according to an embodiment of the present invention requires that the obtained genomic DNA be kept as intact as possible, that is, that excessive small DNA fragments caused by artificial breakage be avoided. DNA showing a band of 23 kb or larger on agarose gel electrophoresis is generally considered acceptable, and the DNA should be as pure as possible so as not to affect enzyme digestion.
  • The restriction enzymes used for digestion differ slightly depending on the species studied. The more commonly used ones are Type II restriction enzymes with 5- or 6-base recognition sequences; in addition, Type IIS restriction enzymes, whose cleavage site lies outside the recognition site, can also be used. In general, 1-2 restriction enzymes should be used, because too many enzymes are difficult to combine in a single-tube reaction system: this not only makes the operation more complex but can also lead to incomplete digestion or star activity.
  • Preferred restriction endonucleases according to an embodiment of the present invention (for example, from NEB or TaKaRa) are at least one selected from the group consisting of: (1) Mbo II; (2) Tsp45I; (3) Mbo II and Hind III; and (4) Mbo II and Bcc I.
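The candidate enzymes can be located in a reference sequence in silico before any wet-lab work. The sketch below is a minimal illustration, assuming the recognition sequences as commonly listed by enzyme suppliers (Mbo II GAAGA, Tsp45I GTSAC with S = C or G, Hind III AAGCTT, Bcc I CCATC); these sequences and the helper names `find_sites` and `revcomp` are assumptions for illustration and are not stated in this document. Because Mbo II and Bcc I recognize non-palindromic sequences, both strands are searched.

```python
import re

# Recognition sequences as commonly listed by enzyme suppliers (assumed here;
# this document does not state them). S = C or G.
SITES = {"Mbo II": "GAAGA", "Tsp45I": "GTSAC", "Hind III": "AAGCTT", "Bcc I": "CCATC"}

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]"}
COMPLEMENT = str.maketrans("ACGTS", "TGCAS")


def revcomp(site: str) -> str:
    """Reverse complement; IUPAC S is its own complement."""
    return site.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, enzyme: str) -> list[int]:
    """Sorted 0-based start positions of recognition-site matches on either strand.

    A bottom-strand site is reported at the start of its reverse-complement
    match on the given (top) strand.
    """
    site = SITES[enzyme]
    hits = set()
    for motif in {site, revcomp(site)}:  # a set, so palindromic sites are scanned once
        pattern = "".join(IUPAC[base] for base in motif)
        # Zero-width lookahead so overlapping matches are all reported.
        for m in re.finditer(f"(?={pattern})", seq.upper()):
            hits.add(m.start())
    return sorted(hits)
```

For example, `find_sites("AAGTGACTTGAAGACC", "Tsp45I")` reports the GTGAC site starting at index 2, and the same call with `"Mbo II"` reports the GAAGA site at index 9.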
  • The digested product is separated to obtain DNA fragments 100 bp - 1,000 bp in length.
  • The method of separating and recovering the digested product is not limited and can be carried out by methods well known in the art.
  • For example, the digested product can be separated by agarose gel electrophoresis at a suitable concentration.
  • According to an embodiment of the present invention, the digested product is separated by 2% agarose gel electrophoresis, the gel region of the target length (100 bp - 1,000 bp) is excised, and the DNA fragments within the target length range are then recovered with a commercially available gel recovery kit (such as the MinElute® PCR Purification Kit (QIAGEN)).
  • The target length of the DNA fragments ranges from 100 bp to 1,000 bp, and preferably from 200 bp to 700 bp.
  • End repair and "A" addition of the DNA fragments use the following standardized procedure: the DNA fragments, 10 mM dNTPs, T4 DNA Polymerase, Klenow Fragment, T4 Polynucleotide Kinase, and T4 DNA ligase buffer (with 10 mM ATP) are combined in one reaction system and incubated at 20 °C for 30 minutes; the DNA fragments are then recovered and combined, in another reaction system, with dATP and Klenow Fragment (3'-5' exo-), and reacted at 37 °C for 30 minutes.
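At the sequence level, the net effect of the two reactions can be modelled as follows: filling in 5' overhangs and trimming 3' overhangs gives blunt ends, and Klenow Fragment (3'-5' exo-) then adds a single A to each 3' end. This is a minimal sketch under an assumed representation (the top strand written 5'→3', the bottom strand written 3'→5' underneath it, with `bottom[j]` pairing with `top[j + offset]`); it models only the resulting sequences, not enzyme kinetics or 5' phosphorylation by T4 Polynucleotide Kinase, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-wise complement (no reversal), matching the 3'->5' bottom-strand layout."""
    return seq.translate(COMPLEMENT)


def end_repair(top: str, bottom: str, offset: int) -> tuple[str, str]:
    """Return a blunt-ended (top, bottom) pair.

    top:    upper strand, written 5'->3'.
    bottom: lower strand, written 3'->5', where bottom[j] pairs with top[j + offset].
    """
    # Left end
    if offset > 0:
        # Top strand has a 5' overhang: fill in by extending bottom's recessed 3' end.
        bottom = complement(top[:offset]) + bottom
    elif offset < 0:
        # Bottom strand has a 3' overhang: trimmed back by 3'->5' exonuclease activity.
        bottom = bottom[-offset:]
    # Both strands now start together: bottom[j] pairs with top[j].
    # Right end
    if len(top) > len(bottom):
        # Top strand has a 3' overhang: trim it.
        top = top[: len(bottom)]
    elif len(bottom) > len(top):
        # Bottom strand has a 5' overhang: fill in by extending top's recessed 3' end.
        top = top + complement(bottom[len(top):])
    return top, bottom


def a_tail(top: str, bottom: str) -> tuple[str, str, int]:
    """Add one A to each 3' end of a blunt duplex.

    The top strand's 3' end is on the right; the bottom strand's 3' end is on the
    left in this layout, so the bottom strand now starts one base earlier (offset -1).
    """
    return top + "A", "A" + bottom, -1
```

For a fragment with a 4-base 5' overhang at each end, `end_repair("GATCCAAAG", "GTTTCGATC", 4)` gives the blunt 13 bp duplex whose top strand is `GATCCAAAGCTAG`; `a_tail` then leaves a single-base 3' A overhang on each strand, ready for adapters carrying a complementary T overhang.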
  • The DNA fragments with a terminal base A are ligated to a sequencing adapter to obtain the DNA library.
  • The choice of sequencing adapter is not particularly limited; different adapters can be chosen depending on the sequencing technique (high-throughput sequencing platform) used.
  • According to an embodiment of the present invention, the Illumina® sequencing-by-synthesis method is used, so the corresponding Illumina® adapter is chosen. The adapter contains a sequence complementary to the oligonucleotides on the flow cell used for sequencing, so that the library fragments can attach to the flow cell and the subsequent sequencing process can proceed.
  • The sequencing adapter does not need to contain an amplification primer binding site (because the method of constructing a DNA library according to an embodiment of the invention involves no PCR amplification), but it must contain a binding site for the sequencing primer.
  • An 8 bp index tag sequence and an index sequencing primer sequence can also be incorporated into one side of the adapter, so that different libraries can be mixed directly and sequenced together, which makes the method applicable to sequencing large numbers of samples.
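When libraries carrying different 8 bp index tags are mixed and sequenced together, each read's index must be matched back to its sample. A minimal sketch, assuming a one-mismatch tolerance and the hypothetical index sequences shown; neither the tolerance nor the sequences are specified in this document, and `assign_sample` is an illustrative name:

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; any length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read: str, sample_indexes: dict[str, str], max_mismatches: int = 1):
    """Return the sample whose index is uniquely closest to index_read within
    max_mismatches, or None if no sample (or more than one) qualifies."""
    best_samples: list[str] = []
    best_distance = None
    for sample, index in sample_indexes.items():
        d = hamming(index_read, index)
        if d > max_mismatches:
            continue
        if best_distance is None or d < best_distance:
            best_samples, best_distance = [sample], d
        elif d == best_distance:
            best_samples.append(sample)
    return best_samples[0] if len(best_samples) == 1 else None


# Hypothetical 8 bp index sequences for two samples.
INDEXES = {"sample_A": "ACGTACGT", "sample_B": "TGCATGCA"}
```

Ambiguous reads are rejected rather than guessed, which limits cross-sample assignment errors. With a one-mismatch tolerance, any two indexes in the set should differ at three or more positions so that no read can lie within one mismatch of two indexes.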
  • The size distribution of the library fragments can be checked with an Agilent® Bioanalyzer 2100, and the library can be quantified by Q-PCR.
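Q-PCR quantification is commonly done against a dilution series of standards of known concentration: Cq is fitted as a linear function of log10(concentration), the library concentration is read off the fitted line, and the slope gives the amplification efficiency E = 10^(-1/slope) - 1. The sketch below assumes this standard-curve approach; the document does not specify the Q-PCR protocol, and the numbers in the usage note are illustrative, not data from the patent.

```python
import math


def fit_standard_curve(concs: list[float], cqs: list[float]) -> tuple[float, float]:
    """Least-squares fit of Cq = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(cq: float, slope: float, intercept: float) -> float:
    """Concentration of an unknown (same units as the standards) from its Cq."""
    return 10 ** ((cq - intercept) / slope)


def efficiency(slope: float) -> float:
    """Amplification efficiency; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1
```

With standards at 10, 1, 0.1 and 0.01 (arbitrary units) giving Cq values of 10.0, 13.32, 16.64 and 19.96, the fitted slope is -3.32, which corresponds to an efficiency of about 100%, and an unknown with Cq 13.32 is estimated at 1 unit.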
  • The method for constructing a DNA library according to an embodiment of the present invention can efficiently construct a DNA library of a sample. By performing SNP data analysis on the sequence information of the sample DNA obtained by sequencing the library, the SNP information of the sample DNA can be obtained accurately and then used in many related scientific studies. Further, the inventors found that the method is simple, very easy to operate, easy to standardize, and low in cost. In addition, the inventors surprisingly found that when DNA libraries are constructed from the same sample with different candidate restriction enzymes, the sequencing results are highly stable and reproducible; and when multiple parallel libraries are constructed from the same sample, the sequencing results are also stable, indicating that the method of constructing a DNA library according to the embodiment of the present invention has good parallelism and reproducibility.
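One way to check the parallelism and reproducibility described above is to compare per-region sequencing depth between libraries built in parallel from the same sample, as in the comparison of two YH libraries in Fig. 9. The sketch below uses the Pearson correlation coefficient; this metric is an assumption for illustration, not the inventors' stated analysis.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length per-region depth vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A value close to 1 means the two libraries cover the same target regions in the same proportions. A rank-based coefficient is a reasonable alternative when a few very deep regions dominate.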
  • The present invention provides a method of constructing a DNA library, comprising: 1) digesting the sample genomic DNA with at least one restriction endonuclease to obtain a digested product; the method further comprises steps 2) to 5), described below.
  • In step 1) of the above method for constructing a DNA tag library according to an embodiment of the present invention, the sample genomic DNA may be derived from any species with existing genome-wide sequence data (for example, the species listed at http://www.ncbi.nlm.nih.gov/sites/genome), and the genomic DNA can be taken from an individual, a single cell, or a tissue of the species. Preferably, it is human genomic DNA.
  • Genomic DNA can be extracted by methods known in the art (including commercially available kits) depending on the species and sample; for example, plant tissues or microorganisms can be extracted using the standard CTAB method.
  • Human blood genomic DNA can be extracted using the QIAamp® DNA Mini Kit (QIAGEN).
  • The obtained genomic DNA should be kept as intact as possible to avoid excessive small DNA fragments caused by artificial breakage.
  • DNA showing a band of 23 kb or larger on agarose gel electrophoresis is regarded as qualified, and the DNA should be as pure as possible so as not to affect enzyme digestion.
  • At least one restriction enzyme is chosen to digest the genomic DNA. The enzymes used differ slightly depending on the species studied; the more commonly used ones are Type II restriction enzymes with 5- or 6-base recognition sequences, and Type IIS restriction enzymes, whose cleavage site lies outside the recognition site, can also be used.
  • 1-2 restriction enzymes should be used, because too many enzymes are difficult to combine in a single-tube reaction system: this not only makes the operation more complex but can also lead to incomplete digestion or star activity.
  • The reaction conditions follow the instructions supplied with the restriction endonuclease to ensure optimal digestion.
  • The digestion is carried out to completion.
  • According to an embodiment of the present invention, different enzyme combinations were designed with the human genome as the main research object; the preferred enzyme combinations are shown in Table 1. The restriction enzyme names follow NEB nomenclature.
  • In step 2) of the above method for constructing a DNA tag library according to an embodiment of the present invention, the genomic fragments obtained after digestion are recovered by methods well known in the art, for example, by agarose gel electrophoresis at a suitable concentration.
  • A 2% agarose gel is a suitable choice for recovering DNA fragments of 1 kb or less.
  • The gel region within the target length range is excised.
  • A commercially available gel recovery kit (such as the MinElute® PCR Purification Kit (QIAGEN)) can then be used to recover the DNA fragments within the target length range.
  • The restriction endonuclease cleaves the human genome into fragments with a characteristic length distribution (e.g., 100 bp - 1,000 bp), and the fragments in this range represent a portion of the genome. Fragments of differing lengths within one library affect the quality of the sequencing data and can significantly increase cost.
  • According to an embodiment of the present invention, the obtained DNA fragments are 100 bp - 1,000 bp in length, and preferably 200 bp - 700 bp.
  • According to an embodiment of the present invention, the restriction enzyme in step 1) of the method is preferably at least one selected from the following groups (1) to (4) (as shown in Table 1): (1) Mbo II; (2) Tsp45I; (3) Mbo II and Hind III; and (4) Mbo II and Bcc I.
  • In steps 3) and 4) of the above method for constructing a DNA tag library according to an embodiment of the present invention, the recovered DNA fragments undergo end repair and "A" addition using a standardized procedure.
  • The specific procedure is as follows: the recovered DNA, 10 mM dNTPs, T4 DNA Polymerase, Klenow Fragment, T4 Polynucleotide Kinase, and T4 DNA ligase buffer (with 10 mM ATP) are incubated in one reaction system at 20 °C for 30 minutes, and the fragments are recovered. In another reaction system, the filled-in DNA, dATP, and Klenow Fragment (3'-5' exo-) are combined and reacted at 37 °C for 30 minutes.
  • In step 5) of the above method for constructing a DNA tag library according to an embodiment of the present invention, adapters are ligated to the restriction fragments; the choice of adapter depends on the sequencing technique (high-throughput sequencing platform) used.
  • According to an embodiment of the present invention, the Illumina® sequencing-by-synthesis method is used. Accordingly, the Illumina® adapter sequence comprises a sequence complementary to the oligonucleotides on the flow cell used for sequencing, which allows the library fragments to attach to the flow cell.
  • The added adapter does not need to contain an amplification primer binding site, but it must contain a binding site for the sequencing primer.
  • To sequence DNA libraries prepared from different samples together, an 8 bp index tag sequence and an index sequencing primer sequence can also be incorporated into one side of the adapter, which allows different libraries to be mixed directly and sequenced.
  • The size distribution of the library fragments is checked with an Agilent® Bioanalyzer 2100, and the library is quantified by Q-PCR.
  • In this way, a DNA library of a sample can be constructed efficiently; after the library is sequenced, the sequence information of the sample DNA can be obtained accurately, and SNP data analysis of that sequence information yields the SNP information of the sample DNA, which can then be applied in many downstream scientific studies.
  • The inventors found that the above method is simple, easy to standardize, convenient to operate, and low in cost.
  • The inventors surprisingly found that when DNA libraries are constructed from the same sample with different restriction enzymes by the above method, the sequencing data are stable and reproducible; and when libraries are constructed in parallel from the same sample, the sequencing results are also stable, indicating that the method of constructing a DNA library according to the embodiment of the present invention has good parallelism and reproducibility.
  • The present invention also provides a DNA library constructed by the method of constructing a DNA library of the present invention.
  • The DNA library can be used effectively with high-throughput sequencing technologies such as Solexa, so that the sequence information of the sample DNA, and from it the SNP information of the sample DNA, can be obtained in preparation for downstream scientific research.
  • The present invention also provides a method of determining DNA sequence information by sequencing a DNA library constructed by the method of constructing a DNA library according to an embodiment of the present invention.
  • The method comprises the steps of: constructing a DNA library of sample genomic DNA by the method of preparing a DNA library according to an embodiment of the present invention; and sequencing the DNA library to obtain DNA sequence information.
  • According to an embodiment of the present invention, the method further comprises performing SNP data analysis on the DNA sequence information to obtain SNP information of the DNA.
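SNP data analysis typically starts from reads aligned to a reference genome. As a minimal sketch of one step, the function below examines the bases observed at a single reference position and reports a candidate SNP when the depth and the fraction of the most frequent non-reference base pass thresholds. The thresholds and the name `call_variant` are hypothetical, real pipelines also weigh base and mapping qualities, and this is not presented as the inventors' analysis method.

```python
from collections import Counter


def call_variant(ref_base: str, observed: str, min_depth: int = 10,
                 min_alt_fraction: float = 0.2):
    """Return (alt_base, alt_fraction) for a candidate SNP at one position, or None.

    observed: the bases of all reads covering the position, one character per read.
    """
    depth = len(observed)
    if depth < min_depth:
        return None  # too little coverage to decide
    counts = Counter(base.upper() for base in observed)
    ref = ref_base.upper()
    alts = [(n, base) for base, n in counts.items() if base != ref]
    if not alts:
        return None
    n_alt, alt = max(alts)  # most frequent non-reference base
    fraction = n_alt / depth
    return (alt, fraction) if fraction >= min_alt_fraction else None
```

For example, ten reads showing five A and five G at a reference A yield a candidate A>G SNP with an alternative-allele fraction of 0.5.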
  • According to an embodiment of the present invention, the DNA library is sequenced using a sequencing platform selected from the GS sequencing platform, the GA sequencing platform, the HiSeq2000™ sequencing platform, and the SOLiD™ sequencing platform. With this method, the sequence information of the sample DNA in the library can be obtained effectively; SNP data analysis of the sequence information yields the SNP information of the sample DNA, which can then be used for scientific research such as genotyping of the samples.
  • The inventors surprisingly found that determining sample DNA sequence information with the method according to an embodiment of the present invention effectively reduces bias in the data output; moreover, the method is operable and parallelizable for sequencing large numbers of samples, which simplifies the process and reduces sequencing cost.
  • The present invention also provides an apparatus for detecting SNPs.
  • According to an embodiment of the present invention, the apparatus 1000 for detecting SNPs includes a DNA library preparation unit 100, a sequencing unit 200, and an SNP data analysis unit 300.
  • The DNA library preparation unit 100 is used to prepare a DNA library; for example, any device suitable for the library construction method described above can serve as the DNA library preparation unit 100.
  • The sequencing unit 200 is connected to the DNA library preparation unit 100; it receives the prepared DNA library from the DNA library preparation unit 100 and sequences it, thereby obtaining the DNA sequence information of the sample.
  • The SNP data analysis unit 300 is connected to the sequencing unit 200; it receives the DNA sequence information of the sample from the sequencing unit 200 and performs SNP data analysis on it, thereby obtaining SNP information. Those skilled in the art will understand that any device known in the art suitable for the above operations can be used as a component of each of these units.
  • The term "connected" as used herein is to be understood broadly: it may mean connected directly or indirectly through an intermediate medium. The specific meaning of the term can be understood by those of ordinary skill in the art.
  • The apparatus for detecting SNPs according to an embodiment of the present invention can be applied to SNP detection in large numbers of samples, which simplifies the sequencing process, saves sequencing time and cost, and yields more accurate SNP information. For this purpose, it is only necessary to add an index tag to each DNA library in the DNA library preparation unit and to mix the DNA libraries from multiple samples for sequencing.
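The three connected units, together with the pooling of index-tagged libraries just described, can be pictured as a simple pipeline. The sketch below is purely structural: each unit is modelled as a function supplied by the caller, the `(index_tag, sequence)` data representation and the class name are hypothetical, and the library preparation and sequencing units are of course physical devices rather than code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical representation: a tagged fragment or read is (index_tag, sequence).
Tagged = tuple[str, str]


@dataclass
class SNPDetectionApparatus:
    """Structural sketch of apparatus 1000 (units 100, 200, 300 connected in series)."""

    prepare_library: Callable[[str, str], list[Tagged]]  # unit 100: (sample DNA, index) -> tagged fragments
    sequence: Callable[[list[Tagged]], list[Tagged]]     # unit 200: pooled fragments -> tagged reads
    analyze_snps: Callable[[list[str]], dict]            # unit 300: one sample's reads -> SNP information

    def run_pooled(self, samples: dict[str, str], indexes: dict[str, str]) -> dict[str, dict]:
        # Unit 100: build one index-tagged library per sample and pool them.
        pooled: list[Tagged] = []
        for name, dna in samples.items():
            pooled.extend(self.prepare_library(dna, indexes[name]))
        # Unit 200: a single sequencing run over the pooled libraries.
        reads = self.sequence(pooled)
        # Split the reads back into samples by index tag.
        by_index: dict[str, list[str]] = defaultdict(list)
        for index, seq in reads:
            by_index[index].append(seq)
        # Unit 300: SNP analysis for each sample separately.
        return {name: self.analyze_snps(by_index[indexes[name]]) for name in samples}
```

Keeping each unit behind a plain function boundary mirrors the broad meaning of "connected" above: the units only need to pass their outputs along, whether directly or through an intermediate medium.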
  • the present invention also provides a kit for detecting SNPs, according to an embodiment of the present invention, the kit comprising: a restriction enzyme, the restriction enzyme comprising the selection At least one of M o II and 451.
  • A genotyping method comprises the following steps. First, a sample genome is provided. Next, a DNA library of the sample genome is prepared by the method of constructing a DNA library according to embodiments of the present invention. The DNA library is sequenced to obtain DNA sequence information, and SNP data analysis is performed on the DNA sequence information to obtain the SNP information of the sample. Finally, the sample is genotyped based on the SNP information.
  • With this method, the sequence information of DNA samples is obtained by sequencing high-quality DNA libraries. Accurate and effective SNP information is then obtained by SNP data analysis of the DNA sequence information.
  • Combined with existing genotype information, this SNP information allows the sample to be genotyped effectively. Further, the inventors found that the genotyping method is simple and easy to operate, can be applied to large numbers of samples simultaneously, and has a low cost.
  • Example 1 Determination of preferred restriction enzymes or enzyme combinations
  • The recognition sequences of the enzymes or enzyme combinations in Table 1 below were combined with known restriction-site information, using the hg18 genome sequence as the reference. The genome was divided at the restriction sites and the fragments were classified by length. Fragments in the range 200 bp - 700 bp were finally selected as the set of library fragments to be tested.
  • The hg18 genome sequence data can be downloaded from a known database, for example, from http://genome.ucsc.edu/.
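  • The reference sequence is free of interference from factors that arise in real experiments, such as DNA breakage or incomplete digestion. The result therefore represents the ideal case, that is, the optimal result.
  • Table 1 Preferred enzyme combinations for human genome restriction endonuclease digestion
  • Columns: recovered fragment range; enzyme or enzyme combination; number of detectable SNPs; dbSNP v128 coverage.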
  • Human genomic DNA was extracted from blood cells of Yanhuang No. 1 (YH1) using the QIAamp® DNA Mini Kit (QIAGEN), strictly following the instructions. The genomic DNA was finally dissolved in EB buffer and quantified by its absorbance at A260 on a NanoDrop® ND-1000, and 5 µg was taken for digestion.
  • All restriction enzymes were purchased from NEB, with buffers supplied with the enzymes. Four enzyme combinations were used in total.
  • Each digestion reaction contained 5 µg of genomic DNA and 20 U of restriction enzyme (NEB unit definition).
  • For each reaction, the most suitable buffer and reaction conditions were selected according to the enzyme combination. Details are given in Table 3 below.
  • All reaction buffers are 10× stocks. Each reaction was finally brought to 100 µL with ultrapure water and run under the optimal reaction conditions.
  • The digested genomic DNA was separated by 2% agarose gel electrophoresis in the TAE buffer system (Fig. 2). Fragments ranging from 200 bp to 700 bp were excised manually, recovered with the QIAquick® Gel Extraction Kit (QIAGEN), and dissolved in 30 µL of ultrapure water.
  • The end repair reaction was carried out as follows:
  • After end repair, the blunt-ended DNA fragments were recovered with the MinElute® PCR Purification Kit (QIAGEN). The sample was finally dissolved in 32 µL of EB buffer.
  • The adapter ligation reaction was as follows:
  • Ligation was carried out overnight at 16 °C.
  • The adapters were Illumina® PCR-free index adapters, and each of the four libraries carried a unique 8 bp index sequence.
  • The fragment distribution of the constructed libraries was analyzed on an Agilent® Bioanalyzer 2100 (Figure 3, A - D). As Figure 3 shows, the fragments excised for the libraries ranged from 200 bp to 700 bp, and fragment length increased by about 120 bp after adapter ligation. The fragment ranges of all four libraries basically met the requirements, and library quality met sequencing requirements.
  • The library constructed with Tsp45I was named the YH library (YH library trial 1).
  • Data analysis mainly followed the method described in Jun Wang et al., Nature (2008) (J Wang, et al. (2008). The diploid genome sequence of an Asian individual. Nature, 456:60). Because the reads are paired-end, the raw data were filtered by setting the orientation and separation-distance parameters of read pairs (50 bp - 2000 bp). Read pairs satisfying these conditions were aligned as pairs, and the remaining reads were aligned individually.
  • SOAP v2.20 can be used to align the reads to the reference sequence hg18.
  • The alignment allows up to two base mismatches, and the proportion of all reads that align to the reference sequence is calculated.
  • Finally, the proportion of aligned reads that fall within the target regions of the different enzyme combinations (shown in Table 1) was determined, together with the coverage and coverage depth of the target regions. The results are shown in Table 4.
  • The four enzyme combinations gave broadly consistent results. The MboII-BccI combination had a doubled sequencing load. The other three libraries each produced 3 Gb - 4 Gb of data, and 70% - 80% of these sequences aligned to the genome, with 57% - 73% of the data aligning within the target regions. Compared with the results in Table 1, 72% - 90% of the target regions were covered by sequencing, at an average depth of 3× - 5×. With a good enzyme combination, the method therefore recovers about 90% of the target regions. Compared with the results in Table 1, the different enzyme combinations are also highly consistent.
  • SNPs were detected with the SOAPsnp program and filtered with the parameters Q20, mean quality of best allele > 20, and copy number < 1.1. Finally, the number of SNPs actually obtained was counted, together with the proportion of these sites present in the dbSNP database.
  • Existing SNP information for the Yanhuang No. 1 whole genome was taken from Ruiqiang Li et al. (2010). SNP detection for massively parallel whole-genome resequencing. Genome Research, 19:1124.
  • The SNP sites within the target region of the digested library were compared with the SNP sites identified in this example. From this comparison, the proportion of existing SNP sites actually detected was calculated.
  • The sequencing data of the two libraries independently constructed with Tsp45I were aligned to the hg18 genome sequence as the reference. The insert-size distribution was computed from the reads that aligned correctly. For libraries constructed from both the DY genome (Fig. 4) and the YH genome (Fig. 5), the inserts were distributed as expected between 200 bp and 700 bp, consistent with the original experimental design and procedure. The proportion of sequencing data (Y axis) across this fragment-length range (X axis) was also consistent between the two libraries. Furthermore, the depth distribution of the sequencing data of the two libraries was computed. The average sequencing depth was about 11× for the DY library (Fig. 6) and reached 20× for the YH library (Fig. 7).
  • Of the false positives, 28K (65%) sites were not detected in the reported YH genome but are included in the dbSNP database. This suggests that they may have been filtered out of the YH reference SNP dataset for some reason but were correctly detected in this experiment. Excluding this cause, the false positive rate can be kept within a reasonable range.
  • About 21K (28%) of the false negatives arose because the SNP lies within the recognition site of the restriction enzyme. In those cases the enzyme could not recognize and cut the site, so the target-region fragment and its SNP information were lost. Most of the remainder resulted from insufficient sequencing depth or low sequencing quality at the site. This part is unrelated to the method and can be further optimized by increasing the sequencing amount in subsequent experiments.
  • The data obtained in this experiment were compared with YH genotyping information obtained with a current mainstream genotyping chip (Illumina 1M BeadChip). Of the approximately 1M SNP sites on the chip, 100K lie within the target regions of the method, and the method covers about 98K (90%) of them.
  • The concordance rate is over 99% for homozygous sites and 92% for heterozygous sites, showing good accuracy and coverage.
  • The methods of constructing a DNA library, determining DNA sequence information, and detecting SNP sites effectively recover more than 90% of the pre-simulated target-region fragments (Table 1). They also successfully and accurately detect most SNPs in these regions, and this SNP information can be used in subsequent genotyping or GWAS studies.
  • The following can be applied to DNA sequencing, and thus to SNP detection and genotyping: the DNA library of the present invention and its preparation method, the method for determining DNA sequence information, the device and kit for detecting SNPs, and the genotyping method. They can also effectively improve the sequencing throughput of platforms such as the Solexa sequencing platform.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Wood Science & Technology (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Medicinal Chemistry (AREA)
  • General Chemical & Material Sciences (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Description

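DNA library and preparation method thereof, and method and device for detecting SNPs
Priority Information
This application claims priority to and the benefit of Chinese patent application No. 201010555192.4, filed with the State Intellectual Property Office of China on November 23, 2010, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of molecular biology. Specifically, it provides a DNA library and a preparation method thereof, a method for determining DNA sequence information, a device and a kit for detecting SNPs, and a genotyping method.
Background
A single nucleotide polymorphism (SNP) is a variation of a single nucleotide in the genome. SNPs are extremely numerous and highly polymorphic. They are considered the most ideal genetic markers in comparative and evolutionary genomics research, and are also used as effective molecular markers in disease-related genetics and pharmacogenomics studies. Whatever the field of application, SNPs must be detected and genotyped in large numbers of samples. Deep whole-genome resequencing is the most direct and effective way to detect SNPs. However, genome sequencing is currently expensive and cannot meet the needs of large-scale sample sequencing. Consequently, many high-throughput SNP genotyping methods and commercial platforms have been vigorously developed (Chunming Ding and Shengnan Jin (2009). High-Throughput Methods for SNP Genotyping. In: Single Nucleotide Polymorphisms, Methods in Molecular Biology, A. A. Komar (ed.), Humana Press, p. 578, incorporated herein by reference in its entirety).
However, current methods for detecting SNPs in samples still need improvement.
Summary of the Invention
The present invention is based on the following findings of the inventors: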
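At present, the most widely used high-throughput SNP genotyping platforms are the Illumina® BeadArray platform, based on single-base extension, and the Affymetrix SNP microarray, based on differential hybridization. Both rely on existing SNP information: probes are designed and synthesized to detect specific tagSNPs according to each platform's principle. Different SNP panels can also be designed for different traits under association analysis, which makes assay design flexible and highly specific. These methods have limitations, however. Probes must undergo stringent screening and design, and not all tagSNPs satisfy the design requirements. Chip synthesis is demanding and difficult for ordinary laboratories. Commercial chips, on the other hand, are costly and require dedicated scanners and analysis software. An important further limitation is that probe design must be based on a database of known SNPs, so unknown SNP sites cannot be discovered (Chunming Ding and Shengnan Jin (2009). High-Throughput Methods for SNP Genotyping. In: Single Nucleotide Polymorphisms, Methods in Molecular Biology, A. A. Komar (ed.), Humana Press, p. 578, incorporated herein by reference in its entirety).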
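In addition, several existing methods combine restriction enzymes with next-generation sequencing (NGS) to detect polymorphisms at specific sites across the genome (Nathan A. Baird, Paul D. Etter, et al. (2008). Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE, 3(10):3376; Michael A. Gore, et al. (2009). A First-Generation Haplotype Map of Maize. Science, 326:1115, incorporated herein by reference in their entirety). These methods take one of two approaches. Some rely on sequencing restriction-site associated DNA (RAD) tags (Nathan A. Baird, Paul D. Etter, et al. (2008). Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE, 3(10):3376; Sánchez CC, Smith TPL, Wiedmann RT, et al. (2009). Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library. BMC Genomics, 10:559, incorporated herein by reference in their entirety). Others require at least one rare-cutting restriction enzyme. In either case the digestion and library construction procedures are cumbersome, and some methods include a PCR amplification step, which more easily introduces and amplifies sequence bias.

The present invention aims to solve at least one of the problems in the prior art. To this end, in one aspect, the present invention proposes a method for preparing a DNA library that can be used to detect SNPs. According to embodiments of the present invention, the method comprises the following steps:
- digesting sample genomic DNA with a restriction enzyme to obtain a digestion product, wherein the restriction enzyme comprises at least one selected from MboII and Tsp45I;
- separating the digestion product to obtain DNA fragments 100 bp - 1,000 bp in length;
- end-repairing the DNA fragments to obtain end-repaired DNA fragments;
- adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments with a terminal A; and
- ligating the DNA fragments having a terminal A to sequencing adapters to obtain the DNA library.

With this method, a DNA library of a sample can be constructed efficiently. Sequencing the library yields the sequence information of the sample DNA, and SNP data analysis of that sequence information finally yields the SNP information of the sample DNA. In addition, the inventors found that the method is simple, very easy to operate, readily standardized, and low in cost. Furthermore, the inventors surprisingly found that DNA libraries constructed from the same sample by this method gave highly stable and reproducible sequencing data even when different candidate restriction enzymes were used.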
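Further, the present invention also provides a DNA library obtained by the method for preparing a DNA library according to embodiments of the present invention.
According to yet another aspect, the present invention also provides a method for determining DNA sequence information. According to embodiments of the present invention, it comprises the following steps: constructing a DNA library of the sample genomic DNA by the method for preparing a DNA library according to embodiments of the present invention; and sequencing the DNA library to obtain the DNA sequence information. With this method, the sequence information of the DNA samples in the library can be obtained efficiently, so that SNP data analysis can be performed on it to obtain the SNP information of the sample DNA. In addition, the inventors surprisingly found that determining DNA sequence information with this method effectively reduces bias in data output and lowers cost.
According to yet another aspect, the present invention also provides a device for detecting SNPs. According to embodiments of the present invention, it comprises the following units:
- a DNA library preparation unit for preparing a DNA library;
- a sequencing unit, connected to the DNA library preparation unit, for sequencing the DNA library to obtain DNA sequence information; and
- an SNP data analysis unit, connected to the sequencing unit, for performing SNP data analysis on the DNA sequence information to obtain SNP information.
With this device, SNPs in samples can be detected conveniently and accurate SNP information can be obtained. The device can also be applied to SNP detection in large numbers of samples.
According to still another aspect, the present invention also provides a kit for detecting SNPs. According to embodiments of the present invention, the kit comprises a restriction enzyme, and the restriction enzyme comprises at least one selected from MboII and Tsp45I. This kit makes it convenient to detect SNPs in samples.
According to still another aspect, the present invention also provides a genotyping method. According to embodiments of the present invention, it comprises:
- providing a sample genome;
- preparing a DNA library of the sample genome by the method for constructing a DNA library according to embodiments of the present invention;
- sequencing the DNA library to obtain the DNA sequence information;
- performing SNP data analysis on the DNA sequence information to obtain the SNP information of the sample; and
- genotyping the sample based on the SNP information.
The method works by constructing a sample DNA library that meets the requirements of SNP detection and sequencing it to obtain the sequence information of the DNA sample. SNP data analysis of that sequence information then yields the SNP information of the sample DNA accurately and efficiently. Based on this SNP information, combined with the existing genotype information of the species, the sample can be genotyped effectively. In addition, the inventors found that the genotyping method is simple, easy to operate, and very low in cost.
Additional aspects and advantages of the present invention will be given in part in the following description. In part they will become apparent from that description, or will be learned through practice of the invention.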
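Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the drawings, in which:
Figure 1 shows the workflow of the SNP detection method according to an embodiment of the present invention;
Figure 2 shows the electrophoresis results of genomic DNA after digestion, when a DNA library is constructed by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 3 shows the Agilent® Bioanalyzer 2100 results of genomic DNA digested with each of the four enzyme combinations, when DNA libraries are constructed by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 4 shows the insert-size distribution curve of the DY library constructed with Tsp45I by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 5 shows the insert-size distribution curve of the YH library constructed with Tsp45I by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 6 shows the sequencing-depth distribution curve of the DY library constructed with Tsp45I by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 7 shows the sequencing-depth distribution curve of the YH library constructed with Tsp45I by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 8 compares the consistency of target-region coverage depth between the DY library and the YH library. Each was constructed with Tsp45I by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 9 compares the consistency of target-region coverage depth between two YH libraries constructed independently by the method for constructing a DNA library according to an embodiment of the present invention;
Figure 10 is a schematic diagram of a device for detecting SNPs according to an embodiment of the present invention.
Detailed Description of the Invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings. The embodiments described are exemplary, serve only to explain the present invention, and should not be construed as limiting it. It should be noted that the terms "first" and "second" are used for descriptive purposes only. They cannot be understood as indicating or implying relative importance, or as implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. Further, in the description of the present invention, unless otherwise specified, "a plurality of" means two or more.
DNA Library and Methods for Constructing and Sequencing It
According to one aspect, the present invention proposes a method for preparing a DNA library that can be used to detect SNPs. Specifically, according to embodiments of the present invention and referring to Figure 1, the method comprises the following steps: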
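First, sample genomic DNA is digested with a restriction enzyme to obtain a digestion product, wherein the restriction enzyme comprises at least one selected from MboII and Tsp45I. According to embodiments of the present invention, the restriction enzyme further comprises at least one selected from HindIII and BccI.

The source of the DNA sample is not particularly limited. According to specific examples, the sample genomic DNA may come from any species for which whole-genome sequence data are currently available (e.g., the species listed at http://www.ncbi.nlm.nih.gov/sites/genome). Specifically, it may be taken from an individual, a single cell, or a tissue of that species. Preferably, according to embodiments of the present invention, the sample genomic DNA is human genomic DNA.

The method of extracting genomic DNA is also not particularly limited. Those skilled in the art will understand that different extraction methods can be chosen for different species and samples, following methods known in the art (including commercial kits). For example, plant tissue or microorganisms can be extracted with the standard CTAB method, and human blood genomic DNA with the QIAamp® DNA Mini Kit (QIAGEN). The method for constructing a DNA library according to embodiments of the present invention requires the genomic DNA to remain as intact as possible, that is, to contain few small fragments generated by artificial breakage. Generally, DNA that runs above 23 kb on agarose gel electrophoresis is considered acceptable. The DNA should also be as pure as possible to avoid interfering with digestion.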
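In addition, the inventors of the present application found that constructing a DNA library requires choosing at least one restriction enzyme to digest the genomic DNA. The enzyme used varies somewhat with the species studied. Type II restriction enzymes with 5- or 6-base recognition sequences are commonly used. Type IIS restriction enzymes, whose cleavage site lies outside the recognition site, can also be used. In general, 1-2 enzymes should be used, because too many enzymes are hard to combine in a single-tube reaction. Using more not only complicates the procedure but also tends to cause incomplete digestion or star activity. Many commercial restriction enzymes are currently available, for example from NEB (New England BioLabs) and TaKaRa. The inventors of the present application therefore carried out extensive screening. They selected preferred restriction enzymes according to embodiments of the present invention, namely at least one group selected from the following: (1) MboII; (2) Tsp45I; (3) MboII and HindIII; and (4) MboII and BccI.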
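Next, the digestion product is separated to obtain DNA fragments 100 bp - 1,000 bp in length. According to embodiments of the present invention, the method of separating and recovering the digestion product is not limited and can follow methods well known in the art. According to specific examples, the digestion product can be separated by agarose gel electrophoresis at a suitable concentration. Specifically, according to embodiments of the present invention, the digestion product is separated on a 2% agarose gel and the gel in the target length range (100 bp - 1,000 bp) is excised. The DNA fragments in the target range can then be recovered with a commercial gel recovery kit (e.g., MinElute® PCR Purification Kit (QIAGEN)). According to embodiments of the present invention, the target length range of the DNA fragments is 100 bp - 1,000 bp, and further, the DNA fragments are 200 bp - 700 bp long.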
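Next, the DNA fragments are end-repaired to obtain end-repaired DNA fragments. A base A is then added to the ends of the end-repaired fragments to obtain DNA fragments with a terminal A. According to embodiments of the present invention, end repair and A-tailing follow a standardized procedure, as follows. The DNA fragments, 10 mM dNTP, T4 DNA Polymerase, Klenow Fragment, T4 Polynucleotide Kinase, and T4 DNA ligase buffer (with 10 mM ATP) are combined in one reaction and incubated at 20 °C for 30 minutes. The DNA fragments are then recovered. The blunt-ended DNA, dATP, Klenow Fragment, and Klenow (3'-5' exo-) are then combined in another reaction and incubated at 37 °C for 30 minutes.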
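Finally, the DNA fragments with a terminal A are ligated to sequencing adapters to obtain the DNA library. According to embodiments of the present invention, the choice of sequencing adapter is not particularly limited, and different adapters can be chosen depending on the sequencing technology (high-throughput sequencing platform) used. According to embodiments of the present invention, Illumina®'s sequencing-by-synthesis method is used, so the corresponding Illumina® adapters are chosen. The adapter sequence contains sequences complementary to the oligonucleotides on the sequencing flow cell. This allows library fragments to attach to the flow cell so that the subsequent sequencing workflow can proceed. According to embodiments of the present invention, the sequencing adapter need not contain amplification primer binding sites, because the method for constructing a DNA library according to embodiments of the present invention does not involve PCR amplification. It must, however, carry a sequencing primer binding site. Further, an 8 bp tag (index) sequence and an index sequencing primer sequence can also be incorporated into the adapter on one side. This distinguishes DNA libraries prepared from different samples after sequencing, so different libraries can be pooled directly and sequenced together. Library construction and sequencing can thus be applied to large numbers of samples. In addition, according to embodiments of the present invention, after library construction the fragment distribution can be checked with an Agilent® Bioanalyzer 2100 and the library quantified by Q-PCR.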
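The 8 bp index tags are what allow pooled libraries to be separated again after sequencing. The sketch below assigns each read to a library by comparing its index read with each library's index. It tolerates one mismatch only when the assignment stays unambiguous. It is a minimal illustration under stated assumptions, not the demultiplexing software used by the inventors. The function names, the one-mismatch tolerance, and the index sequences in the example are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_library(index_read: str, indexes: Dict[str, str],
                   max_mismatch: int = 1) -> Optional[str]:
    """Return the single library whose index is within max_mismatch, else None."""
    hits = [lib for lib, idx in indexes.items()
            if hamming(index_read, idx) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None  # ambiguous or no hit -> None


def demultiplex(reads: Iterable[Tuple[str, str]], indexes: Dict[str, str],
                max_mismatch: int = 1) -> Dict[str, List[str]]:
    """Split (read_id, index_read) pairs from a pooled lane into per-library bins."""
    bins: Dict[str, List[str]] = {lib: [] for lib in indexes}
    bins["undetermined"] = []
    for read_id, index_read in reads:
        lib = assign_library(index_read, indexes, max_mismatch)
        bins[lib if lib is not None else "undetermined"].append(read_id)
    return bins


if __name__ == "__main__":
    # Hypothetical 8 bp indexes; the real adapter index sequences are not given.
    indexes = {"MboII": "ACGTACGT", "Tsp45I": "TGCATGCA"}
    reads = [("r1", "ACGTACGT"), ("r2", "ACGTACGA"), ("r3", "GGGGGGGG")]
    print(demultiplex(reads, indexes))
```

Choosing indexes that differ from each other at three or more positions keeps the one-mismatch rule from producing ambiguous assignments.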
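The method for constructing a DNA library according to embodiments of the present invention constructs a DNA library of the sample efficiently. The library is sequenced to obtain the sequence information of the sample DNA, and SNP data analysis of that information yields the SNP information of the sample DNA accurately. This SNP information can then be used in many related scientific studies. In addition, the inventors found that the method is simple, very easy to operate, readily standardized, and low in cost. Furthermore, the inventors surprisingly found that the sequencing data were highly stable and reproducible when DNA libraries were constructed from the same sample with different candidate restriction enzymes by the above method. When libraries were constructed from the same sample several times in parallel, the sequencing results were also stable. This indicates that the method for constructing a DNA library according to embodiments of the present invention has good parallelism and reproducibility.
Further, according to embodiments of the present invention, the present invention provides a method for constructing a DNA library, comprising:
1) digesting sample genomic DNA with at least one restriction enzyme to obtain a digestion product;
2) separating the digestion product to obtain DNA fragments 100 bp - 1,000 bp in length; and
3) end-repairing the DNA fragments obtained in step 2);
Preferably, the method further comprises the following step:
4) adding a base A to the ends of the DNA fragments obtained in step 3);
Preferably, the method further comprises the following step:
5) ligating sequencing adapters to the DNA fragments obtained in step 4).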
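According to some specific examples, step 1) of the above method for constructing a DNA tag library according to embodiments of the present invention is carried out as follows.

The sample genomic DNA may come from any species for which whole-genome sequence data are currently available (e.g., the species listed at http://www.ncbi.nlm.nih.gov/sites/genome). It may be taken from an individual, a single cell, or a tissue of that species, and is preferably human genomic DNA. Those skilled in the art can extract genomic DNA by methods known in the art (including commercial kits), depending on the species and sample. For example, plant tissue or microorganisms can be extracted with the standard CTAB method, and human blood genomic DNA with the QIAamp® DNA Mini Kit (QIAGEN). The genomic DNA obtained should remain as intact as possible, with few small fragments from artificial breakage. Generally, DNA that runs above 23 kb on agarose gel electrophoresis is considered acceptable. Its purity should also be as high as possible, to exclude factors that interfere with digestion.

At least one restriction enzyme is chosen to digest the genomic DNA, and the enzyme varies somewhat with the species studied. Type II enzymes with 5- or 6-base recognition sequences are commonly used. Type IIS enzymes, whose cleavage site lies outside the recognition site, can also be used. Generally, 1-2 enzymes should be used, because too many enzymes are hard to combine in a single-tube reaction. Using more not only complicates the procedure but also tends to cause incomplete digestion or star activity. Many commercial restriction enzymes are currently available, for example from NEB (New England BioLabs) and TaKaRa. Reaction conditions follow the enzyme supplier's instructions to ensure optimal digestion, and the digestion is preferably complete. According to embodiments of the present invention, the human genome was the main object of study and different enzyme combinations were designed. The preferred combinations are shown in Table 1. Restriction enzyme names follow those published by NEB.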
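According to some specific examples, step 2) of the above method for constructing a DNA tag library according to embodiments of the present invention is carried out as follows.

The digested genomic fragments are recovered by methods well known in the art, for example by separating the digested DNA fragments on an agarose gel of suitable concentration. Generally, a 2% agarose gel is a suitable choice for recovering DNA fragments below 1 kb. After electrophoresis, the gel in the target length range is excised. The DNA fragments in that range can then be recovered with a commercial gel recovery kit (e.g., MinElute® PCR Purification Kit (QIAGEN)).

In addition, the restriction enzymes cut the human genome into fragments with a broadly similar length distribution (e.g., 100 bp - 1,000 bp). Fragments in this range are recovered in order to obtain a subset of the genome. Too large a spread of fragment lengths within one library would affect the quality of the final sequencing data and greatly increase cost. According to embodiments of the present invention, the DNA fragments obtained are 100 bp - 1,000 bp long, and further, according to embodiments of the present invention, 200 bp - 700 bp long. To obtain DNA fragments in this length range efficiently, the inventors carried out extensive research and sustained effort. They found that the restriction enzyme in step 1) of this method is preferably at least one group selected from (1) - (4) below, according to embodiments of the present invention (as shown in Table 1): (1) MboII; (2) Tsp45I; (3) MboII and HindIII; and (4) MboII and BccI.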
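According to some specific examples, steps 3) and 4) of the above method for constructing a DNA tag library according to embodiments of the present invention are carried out as follows. The recovered digested DNA fragments undergo end repair and A-tailing by a standardized procedure. The recovered DNA, 10 mM dNTP, T4 DNA Polymerase, Klenow Fragment, T4 Polynucleotide Kinase, and T4 DNA ligase buffer (with 10 mM ATP) are combined in one reaction and incubated at 20 °C for 30 minutes. The fragments are then recovered. The blunt-ended DNA, dATP, Klenow Fragment, and Klenow (3'-5' exo-) are combined in another reaction and incubated at 37 °C for 30 minutes.

According to some specific examples, step 5) of the above method is carried out as follows. Adapters are ligated to the restriction fragments, and the choice of adapter varies with the sequencing technology (high-throughput sequencing platform) used. Example 2 of the present invention uses Illumina®'s sequencing-by-synthesis method. The Illumina® adapter sequence therefore contains sequences complementary to the oligonucleotides attached to the flow cell used for sequencing, so that library fragments can attach to the flow cell. Because the present invention does not use PCR amplification, the adapters need not contain amplification primer binding sites, but they must carry sequencing primer binding sites. An 8 bp index sequence and an index sequencing primer sequence can also be incorporated into the adapter on one side. This distinguishes DNA libraries prepared from different samples after sequencing, so different libraries can be pooled directly for sequencing. After library construction, the fragment distribution is checked with an Agilent® Bioanalyzer 2100 and the library is quantified by Q-PCR.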
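The method for constructing a DNA library according to embodiments of the present invention constructs a DNA library of the sample efficiently. After the library is sequenced, the sequence information of the sample DNA can be obtained accurately. SNP data analysis of that information yields the SNP information of the sample DNA, which can be applied successfully in many downstream scientific studies. In addition, the inventors found that the method is simple, its workflow can be standardized, it is convenient to operate, and its cost is low. Furthermore, the inventors surprisingly found that the sequencing results were stable and reproducible when DNA libraries were constructed from the same sample with different restriction enzymes by the above method. When libraries were constructed from the same sample several times in parallel, the sequencing results were also highly stable. This indicates that the method for constructing a DNA library according to embodiments of the present invention has good parallelism and reproducibility.
According to yet another aspect, the present invention also provides a DNA library constructed by the method for constructing a DNA library of the present invention. The DNA library can be used effectively with high-throughput sequencing technologies such as Solexa to obtain the sequence information of the sample DNA. SNP data analysis of that information then yields the SNP information of the sample DNA, in preparation for downstream scientific studies.
According to a further aspect, the present invention also provides a method for determining DNA sequence information. It is carried out by sequencing a DNA library constructed by the method for constructing a DNA library according to embodiments of the present invention. According to specific examples, it comprises the following steps: constructing a DNA library of the sample genomic DNA by the method for preparing a DNA library according to embodiments of the present invention; and sequencing the DNA library to obtain DNA sequence information. Further, according to embodiments of the present invention, it also comprises a step of performing SNP data analysis on the DNA sequence information to obtain the SNP information of the DNA. According to embodiments of the present invention, the DNA library is sequenced on at least one platform selected from the GS, GA, HiSeq2000™, and SOLiD™ sequencing platforms. With this method, the sequence information of the DNA samples in the library can be obtained efficiently and subjected to SNP data analysis to obtain the SNP information of the sample DNA. Based on the SNP information obtained, the samples can then be genotyped or used in other scientific studies. In addition, the inventors surprisingly found that determining DNA sample sequence information by this method effectively reduces bias in data output. The method also has good operability and parallelism, and for large-scale sample sequencing it effectively simplifies the workflow and reduces sequencing cost.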
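Device and Kit for Detecting SNPs, and Genotyping Method
According to yet another aspect, the present invention also provides a device for detecting SNPs. Referring to Figure 10, according to embodiments of the present invention, the device 1000 for detecting SNPs comprises a DNA library preparation unit 100, a sequencing unit 200, and an SNP data analysis unit 300.

According to embodiments of the present invention, the DNA library preparation unit 100 is used to prepare a DNA library. For example, any device suitable for the library construction method described above can be employed as the DNA library preparation unit 100. The sequencing unit 200 is connected to the DNA library preparation unit 100. It can receive the prepared DNA library from unit 100 and sequence it to obtain the DNA sequence information of the sample. The SNP data analysis unit 300 is connected to the sequencing unit 200. It can receive the DNA sequence information of the sample from unit 200 and further perform SNP data analysis on it to obtain SNP information.

Those skilled in the art will understand that any device known in the art that is suitable for these operations can be employed as a component of each unit. In addition, the term "connected" as used herein should be understood broadly and may mean directly connected or indirectly connected through an intermediate medium. Those of ordinary skill in the art can understand the specific meaning of the term according to the specific situation.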
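The three-unit architecture can be pictured as a pipeline in which each unit consumes the output of the unit it is connected to. The sketch below models only that data flow. The class name, the callable signatures, and the toy units are hypothetical stand-ins for real instruments and analysis software. The toy units treat the sample as pre-cut fragments and report positions that differ from a reference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SnpDetectionDevice:
    """Hypothetical model of device 1000: units 100 -> 200 -> 300 in series."""
    prepare_library: Callable[[str], List[str]]          # unit 100
    sequence: Callable[[List[str]], List[str]]           # unit 200
    analyze_snps: Callable[[List[str]], Dict[int, str]]  # unit 300

    def run(self, sample: str) -> Dict[int, str]:
        # "Connected" is modelled as handing each unit's output to the next unit.
        library = self.prepare_library(sample)
        reads = self.sequence(library)
        return self.analyze_snps(reads)


def toy_prepare(sample: str) -> List[str]:
    # Toy library prep: the sample string is already split into fragments by "|".
    return [frag for frag in sample.split("|") if frag]


def toy_sequence(library: List[str]) -> List[str]:
    # Toy sequencer: returns every fragment as an error-free read.
    return list(library)


def make_toy_analyzer(reference: str) -> Callable[[List[str]], Dict[int, str]]:
    def analyze(reads: List[str]) -> Dict[int, str]:
        # Toy SNP analysis: join reads and report positions differing from the reference.
        sample = "".join(reads)
        return {i: b for i, (r, b) in enumerate(zip(reference, sample)) if r != b}
    return analyze


if __name__ == "__main__":
    device = SnpDetectionDevice(toy_prepare, toy_sequence, make_toy_analyzer("ACGTACGT"))
    print(device.run("ACGA|ACTT"))  # {3: 'A', 6: 'T'}
```

Keeping each unit behind a plain callable mirrors the text's point that any suitable device can serve as a unit, as long as it accepts the previous unit's output.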
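With the above device according to embodiments of the present invention, SNPs in samples can be detected conveniently and accurate SNP information can be obtained. In addition, the inventors found that the device for detecting SNPs according to embodiments of the present invention can be applied to SNP detection in large numbers of samples. This simplifies the sequencing workflow, saves sequencing time and cost, and yields more abundant and accurate SNP information. This use only requires adding index tags to the DNA libraries in the DNA library preparation unit and pooling the DNA libraries from multiple samples for sequencing.
According to still another aspect, the present invention also provides a kit for detecting SNPs. According to embodiments of the present invention, the kit comprises a restriction enzyme, and the restriction enzyme comprises at least one selected from MboII and Tsp45I. This kit makes it convenient to detect SNPs in samples.
According to still another aspect, the present invention also provides a genotyping method. According to embodiments of the present invention, it comprises:
- first, providing a sample genome;
- next, preparing a DNA library of the sample genome by the method for constructing a DNA library according to embodiments of the present invention;
- sequencing the DNA library to obtain DNA sequence information;
- performing SNP data analysis on the DNA sequence information to obtain the SNP information of the sample; and
- genotyping the sample based on the SNP information.
The method constructs a high-quality sample DNA library that meets the requirements of SNP detection. Sequencing this library yields the sequence information of the DNA sample, and SNP data analysis of that information yields accurate and effective SNP information. Combined with existing genotype information, this allows the sample to be genotyped effectively. In addition, the inventors found that the genotyping method is simple, easy to operate, applicable to large numbers of samples simultaneously, and very low in cost.
It should be noted that the method for determining DNA sample sequence information according to embodiments of the present invention was completed by the inventors through arduous creative work and optimization. The solutions of the present invention are explained below with reference to examples. Those skilled in the art will understand that the following examples serve only to illustrate the present invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of the field or the product instructions. An example of such literature is J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, Chinese translation by Huang Peitang et al., Science Press. Reagents or instruments whose manufacturer is not indicated are conventional, commercially available products, for example purchasable from Illumina.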
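Example 1: Determination of preferred restriction enzymes or enzyme combinations
The recognition sequences of the enzymes or enzyme combinations in Table 1 below were combined with known restriction-site information, using the hg18 genome sequence as the reference. The genome was divided at the restriction sites and the fragments were classified by length. Fragments in the range 200 bp - 700 bp were finally selected as the set of library fragments to be tested. Those skilled in the art can download hg18 genome sequence data from known databases, for example from http://genome.ucsc.edu/.
Data output was then simulated using the Illumina® HiSeq2000™ PE91 index sequencing parameters as the filter. Because 91-cycle paired-end (PE91) sequencing is used in practice, the 91 bp at each end of every fragment in the above library set were taken as the target region. In other words, following the PE91 read-length parameter, the 91 bp adjacent to the restriction sites at both ends of each fragment within the selected range form the target region. The number of SNP sites of the dbSNP v128 database (http://www.ncbi.nlm.nih.gov/projects/SNP/) covered by the target regions was counted, together with the proportion of this number to the total in dbSNP v128.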
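The in silico procedure of Example 1 can be sketched in four stages: digest the reference at the recognition sites, keep 200 bp - 700 bp fragments, take 91 bp at each end as the target region, and count known SNP positions inside. This is a minimal sketch, not the inventors' software. The cut positions are assumptions based on NEB's published specifications as we understand them: Tsp45I ^GTSAC, HindIII A^AGCTT, MboII GAAGA(8/7) and BccI CCATC(4/5). Verify them against the supplier's data before use. Fragment length is approximated by the distance between top-strand cuts, ignoring overhangs. Pieces at sequence ends, which lack a cut on one side, are excluded, and coordinates are 0-based. All function names are hypothetical.

```python
import re
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

# Assumed motif -> top-strand cut offset from the start of the match.
# Non-palindromic Type IIS sites are also searched as their reverse complement.
SITES: Dict[str, List[Tuple[str, int]]] = {
    "Tsp45I": [("GT[CG]AC", 0)],
    "HindIII": [("AAGCTT", 1)],
    "MboII": [("GAAGA", 13), ("TCTTC", -7)],
    "BccI": [("CCATC", 9), ("GATGG", -5)],
}


def cut_positions(seq: str, enzymes: Sequence[str]) -> List[int]:
    """Sorted top-strand cut coordinates (a cut at pos lies between seq[pos-1] and seq[pos])."""
    seq = seq.upper()
    cuts = set()
    for enzyme in enzymes:
        for motif, offset in SITES[enzyme]:
            # Zero-width lookahead so that overlapping sites are all reported.
            for m in re.finditer(f"(?={motif})", seq):
                pos = m.start() + offset
                if 0 < pos < len(seq):
                    cuts.add(pos)
    return sorted(cuts)


def select_fragments(seq: str, enzymes: Sequence[str],
                     min_len: int = 200, max_len: int = 700) -> List[Tuple[int, int]]:
    """Fragments bounded by cuts on both sides with length in [min_len, max_len]."""
    cuts = cut_positions(seq, enzymes)
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if min_len <= e - s <= max_len]


def target_regions(fragments: List[Tuple[int, int]], read_len: int = 91) -> List[Tuple[int, int]]:
    """The read_len bases at each end of every fragment (PE91 -> 91 bp)."""
    regions = []
    for s, e in fragments:
        regions.append((s, min(s + read_len, e)))
        regions.append((max(e - read_len, s), e))
    return regions


def snps_in_targets(regions: List[Tuple[int, int]], snp_positions: Sequence[int]) -> int:
    """Count distinct SNP positions that fall in any half-open region [s, e)."""
    snps = sorted(set(snp_positions))
    hit = set()
    for s, e in regions:
        i = bisect_left(snps, s)
        while i < len(snps) and snps[i] < e:
            hit.add(snps[i])
            i += 1
    return len(hit)
```

Applied chromosome by chromosome to hg18 and to the dbSNP v128 positions, `snps_in_targets` divided by the dbSNP total would give a coverage figure of the kind listed in Table 1. The published numbers may differ from this sketch's because of its simplifications.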
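The reference sequence used is internationally shared and, in particular, is free of interference from other factors that arise in real experiments, such as unavoidable DNA breakage or incomplete digestion. The results obtained therefore represent the ideal case, that is, the optimal results.
Table 1: Preferred enzyme combinations for restriction-digestion library construction of the human genome
Recovered fragment range | Enzyme or enzyme combination | Number of detectable SNPs | dbSNP v128 coverage
200 bp - 700 bp | MboII | 3338421 | 26.90%
200 bp - 700 bp | Tsp45I | 1579936 | 12.73%
200 bp - 700 bp | MboII and HindIII | 3597897 | 28.99%
200 bp - 700 bp | MboII and BccI | 4835970 | 38.97%
The inventors used a similar test to examine many other enzymes and enzyme combinations. The calculated dbSNP v128 coverage was generally below 10%. Results for some of them are shown in Table 2 below:
Table 2: Some other enzymes and enzyme combinations tested
[Image: Table 2]
As Table 2 shows, the enzymes and enzyme combinations in Table 2 give far fewer detectable SNPs and far lower dbSNP v128 coverage than those listed in Table 1.
Therefore, the enzymes and enzyme combinations in Table 1 are the preferred solution.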
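Example 2: Sequencing of Yanhuang No. 1 DNA libraries
For the human genome, libraries were constructed with four of the preferred enzyme combinations described in Table 1, namely those whose recovered fragments fall in the 200 bp - 700 bp range. The data analysis results were compared with those in Table 1. The specific procedure is as follows:
Human genomic DNA was extracted from blood cells of Yanhuang No. 1 (YH1) using the QIAamp® DNA Mini Kit (QIAGEN), strictly following the instructions. The genomic DNA was finally dissolved in EB buffer and quantified by its absorbance at A260 on a NanoDrop® ND-1000, and 5 µg was taken for digestion. All restriction enzymes were purchased from NEB, with buffers supplied with the enzymes, and four enzyme combinations were used in total.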
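Each digestion reaction contained 5 µg of genomic DNA, and the restriction enzyme was used at 20 U (NEB unit definition). For each reaction, the most suitable buffer and reaction conditions were selected according to the enzyme combination. Details are given in Table 3 below.
Table 3: Digestion reaction systems
[Image: Table 3]
All the reaction buffers above are 10× stocks. Each reaction was finally brought to 100 µL with ultrapure water and run under the optimal reaction conditions.
After digestion, the genomic DNA was separated by 2% agarose gel electrophoresis in the TAE buffer system (Figure 2). Fragments 200 bp - 700 bp in length were excised manually and recovered with the QIAquick® Gel Extraction Kit (QIAGEN). The recovered DNA was dissolved in 30 µL of ultrapure water.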
End repair was carried out in the following reaction (volumes in µL):
T4 DNA ligase buffer with 10 mM ATP   10
dNTPs   4
T4 DNA Polymerase   5
Klenow Fragment   1
T4 Polynucleotide Kinase   5
DNA   30
ddH2O   to 100
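After reaction at 20 °C for 30 minutes, the blunt-ended DNA fragments were recovered with the MinElute® PCR Purification Kit (QIAGEN). The sample was finally dissolved in 32 µL of EB buffer.
A-tailing was carried out in the following reaction:
Klenow buffer   5 µL
dATP   10 µL
Klenow (3'-5' exo-)   3 µL
DNA   32 µL
After incubation at 37 °C for 30 minutes, the product was purified with the MinElute® PCR Purification Kit (QIAGEN) and dissolved in 35 µL of EB.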
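The adapter ligation reaction was as follows:
10x T4 DNA Ligation buffer   5 µL
PCR-free Adapter oligo mix   5 µL
T4 DNA Ligase   5 µL
A-tailed sample DNA   35 µL
Ligation was carried out overnight at 16 °C. The adapters were Illumina® PCR-free index adapters, and each of the four libraries carried a unique 8 bp index sequence. The fragment distribution of the constructed libraries was checked with an Agilent® Bioanalyzer 2100 (Figure 3, A - D). As Figure 3 shows, the fragments excised for the libraries ranged from 200 bp to 700 bp, and fragment length increased by about 120 bp after adapter ligation. The fragment ranges of all four libraries basically met the requirements, and library quality met sequencing requirements. The library constructed with Tsp45I was named the YH library (YH library trial 1).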
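The four libraries were then quantified by Q-PCR. Based on these values, the three libraries other than the MboII+BccI library were mixed in equal amounts (1:1), while the MboII+BccI library was loaded at twice the amount of each of the others. The pooled library was sequenced using the capacity of one lane of a flow cell on the Illumina® HiSeq2000™ sequencing system, strictly following the corresponding operating instructions.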
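The pooling rule above (equal molar amounts for three libraries, twice as much for MboII+BccI) reduces to simple arithmetic once Q-PCR concentrations are known. This rests on the identity 1 nM = 1 fmol/µL. The sketch below computes the volume of each library to pipette for a chosen total amount. The function name, the concentrations, and the 50 fmol total in the example are hypothetical, since the measured values are not reported.

```python
from typing import Dict


def pooling_volumes(conc_nM: Dict[str, float], weights: Dict[str, float],
                    total_fmol: float) -> Dict[str, float]:
    """Volume (uL) of each library so that molar amounts follow `weights`.

    Uses 1 nM = 1 fmol/uL, so volume (uL) = amount (fmol) / concentration (nM).
    """
    total_weight = sum(weights[lib] for lib in conc_nM)
    return {
        lib: total_fmol * weights[lib] / total_weight / conc_nM[lib]
        for lib in conc_nM
    }


if __name__ == "__main__":
    # Hypothetical Q-PCR concentrations (nM); MboII+BccI gets twice the molar share.
    conc = {"MboII": 10.0, "Tsp45I": 5.0, "MboII+HindIII": 8.0, "MboII+BccI": 4.0}
    weights = {"MboII": 1, "Tsp45I": 1, "MboII+HindIII": 1, "MboII+BccI": 2}
    for lib, vol in pooling_volumes(conc, weights, total_fmol=50.0).items():
        print(f"{lib}: {vol:.2f} uL")
```

With these example inputs the volumes come out to 1.00, 2.00, 1.25 and 5.00 µL, and the MboII+BccI library contributes 20 fmol against 10 fmol for each of the others.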
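Data analysis mainly followed the method described in Jun Wang et al., Nature (2008) (J Wang, et al. (2008). The diploid genome sequence of an Asian individual. Nature, 456:60). Because the reads are paired-end, the raw data were filtered by setting parameters for the orientation and separation distance of read pairs (50 bp - 2000 bp). Reads meeting these conditions were aligned as pairs, and those not meeting them were aligned as single reads. Alignment used SOAP v2.20 to map the reads to the reference sequence hg18, allowing up to two base mismatches, and the proportion of all reads that aligned to the reference was calculated. Finally, the proportion of these aligned reads that fell within the target regions of the different enzyme combinations (shown in Table 1) was determined, together with the coverage and coverage depth of the target regions. The results are shown in Table 4.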
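The pair filter described above (correct orientation and a separation of 50 bp - 2000 bp) can be illustrated with a small classifier. It decides whether two placed mates should be treated as a pair or as single reads. This is a simplification: SOAP applies such constraints during alignment, whereas here they are checked after placement. The forward/reverse (FR) orientation is assumed as the standard for Illumina paired-end libraries. The `Hit` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical alignment of one read; start is the 0-based leftmost reference position."""
    chrom: str
    start: int
    strand: str  # "+" or "-"
    length: int


def insert_size(a: Hit, b: Hit) -> Optional[int]:
    """Outer distance of a forward/reverse (FR) pair on one chromosome, else None."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return None
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    if fwd.start > rev.start:
        return None  # mates point away from each other
    return rev.start + rev.length - fwd.start


def is_proper_pair(a: Hit, b: Hit, min_ins: int = 50, max_ins: int = 2000) -> bool:
    """Pairs passing this check are aligned as pairs; the rest are aligned as single reads."""
    size = insert_size(a, b)
    return size is not None and min_ins <= size <= max_ins


def mapping_ratio(mapped_reads: int, total_reads: int) -> float:
    """Proportion of reads that align to the reference."""
    return mapped_reads / total_reads if total_reads else 0.0
```

For example, two 91 bp mates at positions 100 (+) and 400 (-) on the same chromosome give an insert of 391 bp and are kept as a pair. A mate placed 3,000 bp away is handled as a single read instead.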
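Table 4: Data analysis results
Enzyme or combination used for library construction | MboII | Tsp45I | MboII-HindIII | MboII-BccI
Total sequencing reads | 20406253 | 16964596 | 19182040 | 35838376
Data yield (Mb) | 3673 | 3054 | 3453 | 6451
Bases aligned to the genome (proportion) | 2863709280 (78.0%) | 2137535730 (70.0%) | 2707424190 (78.4%) | 5241764970 (81.3%)
Bases aligned to the target regions (proportion) | 1867134613 (65.2%) | 1232551200 (57.7%) | 1717052782 (63.4%) | 3873866058 (73.9%)
Target-region coverage | 81.10% | 89.00% | 72.60% | 87.70%
Mean target-region coverage depth | 3.13 | 4.67 | 2.675 | 4.643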
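The final results show that the four selected enzyme combinations gave broadly consistent results. The MboII-BccI combination had a doubled sequencing load. The other three libraries each produced 3 Gb - 4 Gb of data, and 70% - 80% of these sequences aligned to the genome, with 57% - 73% of the data aligning within the target regions. Compared with the results in Table 1, 72% - 90% of the target regions were covered by sequencing, at an average coverage depth of 3× - 5×. With a good enzyme combination, the method therefore recovers about 90% of the target regions. Compared with the results in Table 1, the different enzyme combinations are also highly consistent.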
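The percentages in Table 4 follow from its raw counts. Bases aligned to the genome are expressed as a share of the data yield, and bases aligned to the target regions as a share of the genome-aligned bases. The sketch below reproduces the reported figures for the MboII and Tsp45I columns. It also shows one way to derive target-region coverage and mean depth from per-base depths. These denominators were inferred from the table's own numbers. The text does not define mean depth, so averaging over all target bases is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def alignment_ratios(yield_bases: float, genome_bases: int, target_bases: int) -> Dict[str, float]:
    """Genome-aligned / yield and target-aligned / genome-aligned, as in Table 4."""
    return {
        "genome_ratio": genome_bases / yield_bases,
        "target_ratio": target_bases / genome_bases,
    }


def target_coverage(depths: Sequence[int]) -> Dict[str, float]:
    """Share of target bases with depth >= 1, and mean depth over all target bases."""
    n = len(depths)
    covered = sum(1 for d in depths if d > 0)
    return {"coverage": covered / n, "mean_depth": sum(depths) / n}


if __name__ == "__main__":
    # MboII column of Table 4: 3673 Mb yield, 2863709280 and 1867134613 aligned bases.
    r = alignment_ratios(3673e6, 2863709280, 1867134613)
    print(f"{r['genome_ratio']:.1%} {r['target_ratio']:.1%}")  # 78.0% 65.2%
```

The same two ratios applied to the Tsp45I column give 70.0% and 57.7%, matching the table.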
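Example 3: SNP detection and genotyping with Tsp45I-digested libraries
This example tested parallelism across samples and actual SNP detection performance. In addition to the Yanhuang No. 1 genome (labeled YH), the genome of another healthy male (labeled DY) was used in a parallel experiment. Following a method similar to that of Example 2, two DNA libraries were constructed with Tsp45I: a YH library (YH library trial 2) and a DY library.
SNPs were detected with the SOAPsnp program and filtered with the parameters Q20, mean quality of best allele > 20, and copy number < 1.1. Finally, the number of SNPs actually obtained was counted, together with the proportion of these sites present in the dbSNP database. Meanwhile, existing SNP information was available for the Yanhuang No. 1 whole genome (Ruiqiang Li et al. (2010). SNP detection for massively parallel whole-genome resequencing. Genome Research, 19:1124). From it, the SNP sites within the target regions of the Tsp45I library were selected and compared with the SNP sites identified in this example. The proportion of existing SNP sites actually detected was then calculated.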
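The three filter thresholds can be expressed as a single predicate over per-site call records. The sketch below applies that predicate and reports the share of retained sites that appear in dbSNP. The `SnpCall` record is hypothetical and does not reproduce SOAPsnp's actual output columns. Reading "Q20" as a consensus quality of at least 20 is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class SnpCall:
    """Hypothetical per-site record; not SOAPsnp's actual output layout."""
    chrom: str
    pos: int
    consensus_quality: int          # "Q20" is interpreted here as consensus quality >= 20
    best_allele_mean_quality: float
    copy_number: float


def passes_filter(call: SnpCall) -> bool:
    # Q20, mean quality of best allele > 20, copy number < 1.1
    return (call.consensus_quality >= 20
            and call.best_allele_mean_quality > 20
            and call.copy_number < 1.1)


def filter_calls(calls: List[SnpCall],
                 dbsnp: Set[Tuple[str, int]]) -> Tuple[List[SnpCall], float]:
    """Keep passing calls and report the fraction of kept sites present in dbSNP."""
    kept = [c for c in calls if passes_filter(c)]
    in_dbsnp = sum(1 for c in kept if (c.chrom, c.pos) in dbsnp)
    return kept, (in_dbsnp / len(kept) if kept else 0.0)
```

The copy-number cut removes sites whose reads probably come from repeated sequence, where apparent SNPs are often differences between paralogous copies.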
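Specifically, the sequencing data of the two libraries independently constructed with Tsp45I were aligned to the hg18 genome sequence as the reference. The insert-size distribution was calculated from the reads that aligned correctly to the reference genome. For libraries constructed from both the DY genome (Figure 4) and the YH genome (Figure 5), the inserts were distributed as expected between 200 bp and 700 bp, consistent with the original experimental design and procedure. The proportion of sequencing data (Y axis) across this fragment-length range (X axis) was also fairly consistent between the two libraries. In addition, the depth distribution of the sequencing data of the two libraries was calculated. The average sequencing depth was about 11× for the DY library (Figure 6) and reached 20× for the YH library (Figure 7), and both depth distributions approximated a Poisson distribution. The DY library's depth was lower because it yielded less final sequencing data than the YH library.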
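The statement that both depth distributions approximate a Poisson distribution can be checked directly. Compare the observed fraction of bases at each depth with a Poisson distribution of the same mean. The sketch below does this and summarizes the gap as a total variation distance. The function names and the choice of distance are illustrative assumptions, not the analysis used in the patent. The Poisson probabilities are computed in log space so that depths of 100× or more do not overflow.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def depth_vs_poisson(depths: Sequence[int]) -> Tuple[float, Dict[int, Tuple[float, float]]]:
    """Mean depth and, for each depth k, (observed fraction of bases, Poisson expectation)."""
    n = len(depths)
    lam = sum(depths) / n
    observed = Counter(depths)
    table = {k: (observed[k] / n, poisson_pmf(k, lam)) for k in range(max(depths) + 1)}
    return lam, table


def total_variation(table: Dict[int, Tuple[float, float]]) -> float:
    """Half the summed absolute difference over the listed depths (Poisson tail ignored)."""
    return 0.5 * sum(abs(obs - exp) for obs, exp in table.values())
```

A value near 0 from `total_variation` means the library's coverage is close to the uniform-random ideal. A clearly larger value would point to over-dispersion, for example from GC bias.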
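Further statistical results are shown in Table 5 below. The two libraries yielded 4.5 Gb and 7.8 Gb of raw sequencing data, respectively, and 76.8% and 84.6% of these data aligned to the hg18 reference genome. Of the correctly aligned data, 80.9% and 78.5%, respectively, lay correctly within the target regions. The proportion of target regions covered by at least one read was 91.9% and 95.2% in the two libraries. These results show that library construction with this restriction enzyme consistently recovers more than 90% of the target regions, and that the alignment rates of the sequencing data are within the normal range.
Table 5: Preliminary data analysis results
[Image: Table 5]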
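To further assess the parallelism of the library construction method, three libraries constructed with Tsp45I were compared pairwise, using the coverage depth of individual bases in the target regions as the reference. Two comparisons were made. One was between the YH library and the DY library (Figure 8). The other was between the two independently constructed YH libraries (Figure 9), where "YH library trial 1" denotes the library constructed in Example 2 and "YH library trial 2" the library constructed in Example 3. The X and Y axes correspond to different samples or experimental batches, as labeled in Figures 8 and 9. Their coordinates divide coverage depth from low to high into interval levels, numbered 1 to 10. The Z axis gives the number of bases in each depth interval. Figures 8 and 9 show that library construction is highly parallel across different samples and different batches alike, and that most bases are covered at essentially the same depth in both libraries.
Analysis of the target regions jointly covered by each pair of compared libraries also showed good consistency among the three libraries from the two constructions. In each pair, 3% of the target regions were covered by no reads in either library, and 90% were covered consistently. About 7% of the target regions were covered in only one library, indicating that the parallelism of library construction by this method exceeds 93%.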
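Both comparisons above can be sketched in a few lines. The first bins each base's depth in two libraries into levels 1 to 10 and counts bases per (level A, level B) cell, which corresponds to the Z axis of Figures 8 and 9. The second classifies target regions as covered in both libraries, in neither, or in only one. Agreement is taken as "both" plus "neither", which matches the 90% + 3% that gives the stated 93%. The fixed bin width of 5× is an assumption because the actual interval edges are not given, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def depth_level(depth: int, width: int = 5, levels: int = 10) -> int:
    """Map a depth to level 1..levels using fixed-width bins (the width is an assumption)."""
    return min(depth // width, levels - 1) + 1


def depth_level_table(a: Sequence[int], b: Sequence[int], width: int = 5) -> Counter:
    """(level in library A, level in library B) -> number of bases."""
    return Counter((depth_level(x, width), depth_level(y, width)) for x, y in zip(a, b))


def shared_coverage(a_covered: Sequence[bool], b_covered: Sequence[bool]) -> Dict[str, float]:
    """Fractions of target regions covered in both, neither, or only one library."""
    n = len(a_covered)
    both = sum(1 for x, y in zip(a_covered, b_covered) if x and y)
    neither = sum(1 for x, y in zip(a_covered, b_covered) if not x and not y)
    return {
        "both": both / n,
        "neither": neither / n,
        "only_one": (n - both - neither) / n,
        "agreement": (both + neither) / n,
    }
```

A table concentrated on its diagonal, where level A equals level B, is what Figures 8 and 9 are described as showing.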
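The second YH library reached an average sequencing depth of 20×, so its data were used for SNP detection. SOAPsnp was run with the filtering parameters Q20, mean quality of best allele > 20, and copy number < 1.1, using hg18 as the reference genome, and a total of 264K SNP sites were obtained. Comparison with the published YH genome SNP information indicated that 294K SNP sites should lie within the target regions sequenced after Tsp45I digestion. Of these, 219K (74.6%) were consistent with the SNP sites obtained in this experiment. False positives numbered 44K (17%) and false negatives 74K (25%).

Analysis showed that 28K (65%) of the false-positive sites were not detected in the reported YH genome but are included in the dbSNP database. This suggests that they may have been filtered out of the YH reference SNP dataset for some reason but were correctly detected in this experiment. Excluding this cause, the false-positive rate can be kept within a reasonable range. About 21K (28%) of the false negatives arose because the SNP lies within the restriction enzyme recognition site. In those cases the enzyme could not recognize and cut the site, so the target-region fragment and its SNP information were lost. Most of the remainder resulted from insufficient sequencing depth or low sequencing quality at the site. This part is unrelated to the method and can be further optimized by increasing the sequencing amount in subsequent experiments.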
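The comparison above amounts to set arithmetic on (chromosome, position) pairs. It also motivates a check for the main false-negative cause: a SNP inside a recognition site that abolishes the cut. The sketch below covers both. Rates are computed as the stated percentages imply, with consistent and false-negative counts divided by the expected set and false positives divided by the detected set. The default motif GT[CG]AC is our reading of Tsp45I's degenerate site GTSAC. The function names and toy data are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 0-based position)


def compare_calls(detected: Set[Site], expected: Set[Site], dbsnp: Set[Site]) -> Dict[str, float]:
    """Concordance of detected SNPs with a reference set restricted to the target regions."""
    consistent = detected & expected
    false_pos = detected - expected
    false_neg = expected - detected
    return {
        "consistent": len(consistent),
        "consistent_rate": len(consistent) / len(expected) if expected else 0.0,
        "fp_rate": len(false_pos) / len(detected) if detected else 0.0,
        "fn_rate": len(false_neg) / len(expected) if expected else 0.0,
        "fp_in_dbsnp": len(false_pos & dbsnp) / len(false_pos) if false_pos else 0.0,
    }


def _sites_covering(seq: str, pos: int, motif: str, site_len: int) -> int:
    """Number of motif matches that include position pos."""
    lo = max(0, pos - site_len + 1)
    window = seq[lo:pos + site_len]
    return sum(1 for m in re.finditer(f"(?={motif})", window)
               if lo + m.start() <= pos < lo + m.start() + site_len)


def snp_breaks_site(ref: str, pos: int, alt: str,
                    motif: str = "GT[CG]AC", site_len: int = 5) -> bool:
    """True if the alternative allele destroys a recognition site overlapping pos."""
    mutated = ref[:pos] + alt + ref[pos + 1:]
    return _sites_covering(ref, pos, motif, site_len) > _sites_covering(mutated, pos, motif, site_len)
```

Because S stands for C or G, a C-to-G change at the middle base of GTCAC leaves the site intact, while a change at any fixed base destroys it. Only the latter kind of SNP would be lost in this way.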
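To further verify the accuracy of the SNP sites obtained by this method, the data were compared with YH genotyping information from a current mainstream genotyping chip (Illumina 1M BeadChip). Of the approximately 1M SNP sites on the chip, 100K lie within the target regions of this method, and this method covered about 98K (90%) of them. Within the jointly covered sites, the concordance rate exceeded 99% for homozygous sites and was 92% for heterozygous sites, showing good accuracy and coverage.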
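Concordance split by zygosity, as reported against the BeadChip, can be computed by grouping the jointly covered sites by the chip genotype. The sketch below does this. Genotypes are written as two allele letters, and allele order is ignored, so "AG" and "GA" count as the same genotype. The site identifiers and function names are hypothetical, and real chip exports would also need strand and allele-coding harmonization, which the sketch omits.

```python
from typing import Dict, Tuple


def is_homozygous(genotype: str) -> bool:
    """Diploid genotype written as two allele letters, e.g. "AA" or "AG"."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def concordance_by_zygosity(chip: Dict[str, str],
                            seq: Dict[str, str]) -> Dict[str, Tuple[int, int, float]]:
    """(matches, sites, rate) for chip-homozygous and chip-heterozygous sites covered by both."""
    counts = {"homozygous": [0, 0], "heterozygous": [0, 0]}
    for site in sorted(chip.keys() & seq.keys()):
        key = "homozygous" if is_homozygous(chip[site]) else "heterozygous"
        counts[key][1] += 1
        if sorted(chip[site]) == sorted(seq[site]):  # allele order is ignored
            counts[key][0] += 1
    return {k: (m, n, m / n if n else 0.0) for k, (m, n) in counts.items()}
```

Heterozygous sites need both alleles to be sampled, so at moderate depth they are the more likely to be miscalled. This is consistent with their lower reported concordance.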
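The above results show that the methods for constructing a DNA library, determining DNA sequence information, and detecting SNP sites according to embodiments of the present invention perform well. They effectively recover more than 90% of the pre-simulated target-region fragments (Table 1), and successfully and accurately detect most SNP sites in these regions. This SNP information can be used in subsequent genotyping or GWAS studies.
Industrial Applicability
The following aspects of the present invention can be applied to DNA sequencing, and thus to SNP detection and genotyping: the DNA library and its preparation method, the method for determining DNA sequence information, the device and kit for detecting SNPs, and the genotyping method. They can also effectively improve the sequencing throughput of sequencing platforms such as the Solexa sequencing platform.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and substitutions can be made to those details in light of all the teachings disclosed. Such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" have a specific meaning. They indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.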

Claims

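1. A method for preparing a DNA library, comprising the following steps:
digesting sample genomic DNA with a restriction enzyme to obtain a digestion product, wherein the restriction enzyme comprises at least one selected from MboII and Tsp45I;
separating the digestion product to obtain DNA fragments 100 bp - 1,000 bp in length;
end-repairing the DNA fragments to obtain end-repaired DNA fragments;
adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments having a terminal A; and
ligating the DNA fragments having a terminal A to sequencing adapters to obtain the DNA library.
2. The method according to claim 1, wherein the restriction enzyme further comprises at least one selected from HindIII and BccI.
3. The method according to claim 1, wherein the restriction enzyme is at least one group selected from:
(1) MboII;
(2) Tsp45I;
(3) MboII and HindIII; and
(4) MboII and BccI.
4. The method according to claim 1, wherein the digestion product is separated by agarose gel electrophoresis and gel recovery.
5. The method according to claim 1, wherein the DNA fragments are 200 bp - 700 bp in length.
6. A DNA library constructed by the method according to any one of claims 1 to 5.
7. A method for determining DNA sequence information, characterized by comprising the following steps:
constructing a DNA library of the DNA by the method according to any one of claims 1-5; and
sequencing the DNA library to obtain the DNA sequence information.
8. The method according to claim 7, characterized in that the DNA library is sequenced using at least one selected from the Illumina, Roche 454, and SOLiD sequencing platforms.
9. The method according to claim 7, characterized by further comprising a step of performing SNP data analysis on the DNA sequence information to obtain SNP information of the DNA.
10. A device for detecting SNPs, comprising the following units:
a DNA library preparation unit for preparing a DNA library;
a sequencing unit, connected to the DNA library preparation unit, for sequencing the DNA library to obtain DNA sequence information; and
an SNP data analysis unit, connected to the sequencing unit, for performing SNP data analysis on the DNA sequence information to obtain SNP information.
11. A kit for detecting SNPs, comprising:
a restriction enzyme, the restriction enzyme comprising at least one selected from MboII and Tsp45I.
12. The kit according to claim 11, wherein the restriction enzyme further comprises at least one selected from HindIII and BccI.
13. The kit according to claim 11, wherein the restriction enzyme is at least one group selected from:
(1) MboII;
(2) Tsp45I;
(3) MboII and HindIII; and
(4) MboII and BccI.
14. A genotyping method, comprising:
providing a sample genome;
preparing a DNA library of the sample genome by the method according to claims 1-5;
sequencing the DNA library to obtain the DNA sequence information;
performing SNP data analysis on the DNA sequence information to obtain SNP information of the sample; and
genotyping the sample based on the SNP information.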
PCT/CN2011/079971 2010-11-23 2011-09-21 DNA library and preparation method thereof, and method and device for detecting SNPs WO2012068919A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US13/989,031 US9493821B2 (en) 2010-11-23 2011-09-21 DNA library, preparation method thereof, and device for detecting SNPs
EP11843141.0A EP2631336B1 (en) 2010-11-23 2011-09-21 Dna library and preparation method thereof, and method and device for detecting snps
DK11843141.0T DK2631336T3 (en) 2010-11-23 2011-09-21 DNA library and the method for producing the same as well as method and apparatus for detecting the SNP

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010555192.4 2010-11-23
CN201010555192.4A CN102061526B (zh) 2010-11-23 2010-11-23 A DNA library and preparation method thereof, and a method and device for detecting SNPs

Publications (1)

Publication Number Publication Date
WO2012068919A1 true WO2012068919A1 (zh) 2012-05-31

Family

ID=43997025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079971 WO2012068919A1 (zh) 2010-11-23 2011-09-21 DNA library and preparation method thereof, and method and device for detecting SNPs

Country Status (5)

Country Link
US (1) US9493821B2 (zh)
EP (1) EP2631336B1 (zh)
CN (1) CN102061526B (zh)
DK (1) DK2631336T3 (zh)
WO (1) WO2012068919A1 (zh)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
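CN102061526B (zh) 2010-11-23 2014-04-30 Shenzhen BGI Technology Service Co., Ltd. A DNA library and preparation method thereof, and a method and device for detecting SNPs
CN102242222B (zh) * 2011-07-21 2013-05-15 Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences Grass carp classification method based on EST-SSR markers
CN102952854B (zh) * 2011-08-25 2015-01-14 BGI Shenzhen Co., Ltd. Single-cell classification and screening method and device therefor
CN103160937B (zh) * 2011-12-15 2015-02-18 Shenzhen BGI Technology Service Co., Ltd. Method for enrichment library construction and SNP analysis of genes in complex genomes of higher plants
CN102534814B (zh) * 2012-01-18 2013-08-28 Capital Medical University Method for efficiently capturing chromatin transcriptional regulatory regions at the whole-cell level, and use thereof
CN102691111B (zh) * 2012-03-29 2014-11-26 Capital Medical University Method for high-throughput, genome-wide capture of chromatin nucleosome-free regions
WO2014019180A1 (zh) * 2012-08-01 2014-02-06 BGI-Shenzhen Method and system for determining biomarkers of an abnormal state
CN103627710B (zh) * 2012-08-22 2016-08-03 Chinese PLA General Hospital SPG11 gene mutant and application thereof
CN106029899B (zh) * 2013-09-30 2021-08-03 BGI Genomics Co., Ltd. Method, system and computer-readable medium for determining SNP information in a predetermined region of a chromosome
CN103572378B (zh) * 2013-10-28 2015-12-02 CapitalBio Corporation Method for constructing small-fragment DNA libraries based on the Ion Proton™ sequencing platform, and application thereof
CN104005090B (zh) * 2014-05-28 2016-08-17 Beijing Novogene Bioinformatics Technology Co., Ltd. Method for constructing high-throughput sequencing libraries from low-quality sample DNA
CN105506748B (zh) * 2016-01-18 2018-11-27 Beijing Biomarker Technologies Co., Ltd. DNA high-throughput sequencing library construction method
CN105603535B (zh) * 2016-01-27 2018-11-27 Beijing Novogene Technology Co., Ltd. Kit and method for constructing DNA libraries
CN106148513A (zh) * 2016-06-22 2016-11-23 Hangzhou Jieyi Maite Medical Device Co., Ltd. Cell-free DNA library construction method and kit
CN108300773A (zh) * 2016-08-30 2018-07-20 Guangzhou Kangxinrui Gene Health Technology Co., Ltd. Library construction method and SNP genotyping method
CN108300764B (zh) * 2016-08-30 2021-11-09 Wuhan Kangxinrui Gene Health Technology Co., Ltd. Library construction method and SNP genotyping method
CN108179174A (zh) * 2018-01-15 2018-06-19 Wuhan Aijibaike Biotechnology Co., Ltd. Method for constructing a high-throughput reduced-representation genome sequencing library
CN109610011A (zh) * 2018-12-28 2019-04-12 Xiamen Shengji Technology Co., Ltd. NanoDNA ultra-long companion library construction kit and method of use thereof
CN111199773B (zh) * 2020-01-20 2023-03-28 Beijing Institute of Animal Science and Veterinary Medicine, Chinese Academy of Agricultural Sciences Method for evaluating the fine mapping of trait-associated homozygous genomic segments
CN113337590B (zh) * 2021-06-03 2024-07-09 BGI Genomics Co., Ltd. Next-generation sequencing method and library construction method
CN114356222B (zh) * 2021-12-13 2022-08-19 Shenzhen Institutes of Advanced Technology Data storage method and apparatus, terminal device, and computer-readable storage medium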

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
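CN1341750A (zh) * 2001-09-30 2002-03-27 Hubei University Efficient method for gene cloning
CN101230490A (zh) * 2008-02-26 2008-07-30 Beijing Forestry University Method for constructing a chromosome walking library
CN101343667A (zh) * 2008-07-11 2009-01-14 Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences Method for screening SNP markers in aquatic animals
KR20090033307A (ko) * 2007-09-29 2009-04-02 Seoul National University R&DB Foundation Kit for diagnosing liver disease caused by hepatitis B
WO2009126395A1 (en) * 2008-04-11 2009-10-15 Transgenomic, Inc. Method for identifying the sequence of one or more variant nucleotides in a nucleic acid molecule
WO2010091111A1 (en) * 2009-02-03 2010-08-12 Biohelix Corporation Endonuclease-enhanced helicase-dependent amplification
CN101845489A (zh) * 2009-12-03 2010-09-29 Ocean University of China Method for large-scale screening of scallop SNPs
CN102061526A (zh) * 2010-11-23 2011-05-18 BGI Shenzhen Co., Ltd. A DNA library and preparation method thereof, and a method and device for detecting SNPs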

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5710000A (en) * 1994-09-16 1998-01-20 Affymetrix, Inc. Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping
EP2789696B1 (en) * 2005-12-22 2015-12-16 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
WO2010085774A1 (en) * 2009-01-26 2010-07-29 Board Of Regents, The University Of Texas System Digital restriction enzyme analysis of methylation
EP2248914A1 (en) * 2009-05-05 2010-11-10 Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. The use of class IIB restriction endonucleases in 2nd generation sequencing applications

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
BAIRD N.B. ET AL.: "Rapid SNP discovery and genetic mapping using sequenced RAD markers, art. e3376", PLOS ONE, vol. 3, no. 10, 30 October 2008 (2008-10-30), pages 1 - 7, XP055005983 *
CHUNMING DING; SHENGNAN JIN.: "Methods in Molecular Biology", 2009, HUMANA PRESS, article "High-Throughput Methods for SNP Genotyping. Single Nucleotide Polymorphisms", pages: 578
CHUNMING DING; SHENGNAN JIN: "Methods in Molecular Biology", 2009, HUMANA PRESS, article "High-throughput methods for SNP genotyping. Single Nucleotide Polymorphisms", pages: 578
J WANG ET AL.: "The diploid genome sequence of an Asian individual", NATURE, vol. 456, 2008, pages 60
J. SAMBROOK; HUANG PT ET AL.: "Molecular Cloning Laboratory Manual", SCIENCE PRESS
J. SAMBROOK; HUANG PT: "Molecular Cloning Laboratory Manual", SCIENCE PRESS
JUN WANG ET AL., NATURE, 2008
MICHAEL A. GORE ET AL.: "A First-Generation Haplotype Map of Maize", SCIENCE, vol. 326, 2009, pages 1115
NATHAN A, BAIRD; PAUL D ET AL.: "Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers", PLOS ONE, vol. 3, no. 10, 2008, pages 3376, XP055005983, DOI: doi:10.1371/journal.pone.0003376
RUIQIANG LI ET AL.: "SNP detection for massively parallel whole-genome sequencing", GENOME RESEARCH, vol. 19, 2010, pages 1124
SÁNCHEZ CC; SMITH TPL; WIEDMANN RT: "Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library", BMC GENOMICS, vol. 10, 2009, pages 559, XP021062244, DOI: doi:10.1186/1471-2164-10-559
See also references of EP2631336A4

Also Published As

Publication number Publication date
DK2631336T3 (en) 2015-01-26
CN102061526B (zh) 2014-04-30
US20130288907A1 (en) 2013-10-31
CN102061526A (zh) 2011-05-18
EP2631336B1 (en) 2014-11-19
EP2631336A1 (en) 2013-08-28
EP2631336A4 (en) 2013-10-16
US9493821B2 (en) 2016-11-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11843141

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011843141

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 13989031

Country of ref document: US