WO2012037884A1 - DNA tags and uses thereof - Google Patents

DNA tags and uses thereof

Info

Publication number
WO2012037884A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
medip
seq
sample
strand
Prior art date
Application number
PCT/CN2011/079907
Other languages
English (en)
French (fr)
Inventor
孙继华
王君文
罗慧娟
闫淑静
章文蔚
王俊
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司, 深圳华大基因研究院
Publication of WO2012037884A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00: Libraries per se, e.g. arrays, mixtures
    • C40B40/04: Libraries containing only organic compounds
    • C40B40/06: Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1065: Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B70/00: Tags or labels specially adapted for combinatorial chemistry or libraries, e.g. fluorescent tags or bar codes

Definitions

  • the invention relates to the field of molecular biology, in particular to technology for enriching methylated genomic DNA.
  • the invention relates to DNA tags for methylated genomic DNA enrichment and uses thereof.
  • the present invention provides a DNA tag, an oligonucleotide, a MeDIP-seq library and a method for preparing it, a method for determining methylation information of a DNA sample, a method for determining methylation information of a plurality of DNA samples, and a kit for constructing a MeDIP-seq library of sample DNA.
  • genomic DNA methylation is one of the most active directions in epigenetics research, and it is gradually becoming an epigenetic marker for mammalian development and for many diseases such as cancer.
  • DNA methylation not only plays an important role in chromatin structure modification and genomic stability; in eukaryotes it is also involved in various biological processes such as embryonic development, genomic imprinting, X chromosome inactivation, regulation and silencing of gene expression, silencing of retrotransposons, and the development of various diseases such as mammalian tumors (see, for example, Brena RM, Huang TH, Plass C. Quantitative assessment of DNA methylation: Potential applications for disease diagnosis, classification, and prognosis in clinical settings.
  • methylation sequencing methods of this kind are currently the most popular approaches for studying genomic methylation, but they are limited to varying degrees by cost, throughput and resolution (see, for example, Li N, Ye M, Li Y et al. Whole genome DNA methylation analysis based on high throughput sequencing technology. Methods Apr. 27. [Epub ahead of print]; Down TA, Rakyan VK, Turner DJ, et al.
  • BS-seq is the most commonly used method for CpG methylation analysis. It provides methylation information at single-base resolution, but it requires sequencing the whole genome after bisulfite treatment, so the data volume is large and sequencing and analysis costs are high (see, for example, Serre D, Lee BH, Ting AH. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 2010 Jan;38(2):391-9; Lister R, Pelizzola M, Dowen RH, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature.
  • MeDIP-seq, MBD-seq, and RRBS each reduce, to varying degrees, the portion of the sample that has to be sequenced.
  • RRBS can only cover about 10-20% of the genome, mainly the CpG islands and a small part of the promoter regions, so it is difficult for RRBS to reflect the methylation characteristics of the genome as a whole (see, for example, Meissner A, Mikkelsen TS, Gu H, et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008 Aug 7;454(7205):766-70; Gu H, Bock C, Mikkelsen TS, et al.
  • MBD-seq and MeDIP-seq enrich methylated DNA by binding it with a methylation-specific binding protein (MBD2) and a methylation-specific antibody (5mC antibody), respectively.
  • MBD-seq mainly enriches highly methylated DNA in regions of high CpG density.
  • MeDIP-seq mainly enriches regions of high methylation and moderate CpG density.
  • a DNA tag (herein, simply referred to as a "tag") that can be used to construct a MeDIP-seq library is presented.
  • the invention proposes a set of isolated DNA tags.
  • the sample source of DNA can be accurately characterized by linking the DNA tag to the sample DNA or its equivalent.
  • MeDIP-seq libraries of a plurality of samples, each containing a different DNA tag, can be constructed simultaneously, so that MeDIP-seq libraries derived from different samples can be mixed and sequenced together, and the methylation information obtained from the MeDIP-seq libraries can be categorized by DNA tag to obtain methylation information for each of the DNA samples. This makes full use of high-throughput sequencing techniques, such as Solexa sequencing, because multiple MeDIP-seq libraries are sequenced simultaneously, which increases the sequencing efficiency and throughput for MeDIP-seq libraries.
  • the inventors have surprisingly found that with the DNA-tagged MeDIP-seq library according to an embodiment of the present invention, it is possible to accurately distinguish a plurality of MeDIP-seq libraries, and the resulting sequencing data are very stable and reproducible.
  • the invention also provides a set of isolated oligonucleotides for introducing the above DNA tag into a sample DNA or an equivalent thereof.
  • a set of isolated oligonucleotides according to an embodiment of the invention having a first strand and a second strand, each of said strands being composed of a nucleotide represented by SEQ ID NO: (3N-1), respectively
  • these oligonucleotides (sometimes also referred to in the present specification as "DNA tag linkers" or "tag linkers") carry the DNA tags of the embodiments of the present invention as described above and have a sticky end T, so the corresponding DNA tag can be introduced into DNA or its equivalent by a ligation reaction.
  • the sense strand (DNA Index-NF adapter) and its corresponding antisense strand (DNA Index-NR adapter) can be annealed in equimolar amounts to form the corresponding DNA tag linker, which has a Y-type structure.
  • Table 1. DNA tag (DNA Index-N) and DNA tag linker (DNA Index-N_adapter) sequences:
      DNA Index-14: ATGTCA (40)
      DNA Index-20R_adapter: 5'-Phos/CGTTGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (60)
  • using the oligonucleotides according to the examples of the present invention described above, it is possible to efficiently introduce a DNA tag into a sample.
  • the inventors have surprisingly found that when MeDIP-Seq libraries containing various DNA tags are constructed from the same sample with oligonucleotides carrying different tags, the resulting sequencing data are very stable and reproducible.
  • the human whole blood MeDIP-Seq libraries constructed using DNA Index 1-20 exhibit a correlation of at least 0.99 when the data are analyzed using the Pearson coefficient. For details on the calculation of the Pearson coefficient, see the literature, for example: 't Hoen, PA, Y. Ariyurek, et al. (2008).
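The Pearson coefficient mentioned above can be sketched as follows. This is a minimal illustration, not the inventors' analysis pipeline: the function name `pearson` and the per-window read counts are hypothetical, and the text does not say which quantity (for example, read depth per genomic window) was compared between libraries.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances (unnormalised; the factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts per genomic window for two tagged libraries
# built from the same sample (values invented for illustration).
lib_a = [12, 40, 7, 95, 33]
lib_b = [24, 80, 14, 190, 66]
r = pearson(lib_a, lib_b)
```

Two libraries whose coverage profiles differ only by a constant scale factor, as in this toy data, give a coefficient of essentially 1; the value of at least 0.99 reported in the text would correspond to near-identical enrichment profiles.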
  • the invention provides a method of preparing a MeDIP-seq library.
  • the method comprises: fragmenting the sample DNA to obtain DNA fragments; performing end repair of the DNA fragments to obtain end-repaired DNA fragments; adding base A to the ends of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A; ligating the DNA fragments having the sticky end A to one of a group of isolated oligonucleotides according to an embodiment of the present invention to obtain a tagged ligation product; capturing the tagged ligation product with a methylation-specific binding antibody to obtain a tagged ligation product containing methylated DNA; and isolating and amplifying the tagged ligation product containing methylated DNA, the amplified tagged ligation product containing methylated DNA constituting the MeDIP-seq library.
  • with the method of constructing the MeDIP-seq library of the present invention, it is possible to efficiently introduce the DNA tag of the present invention into the MeDIP-seq library constructed for the sample DNA.
  • the source of the DNA sample can be distinguished by sequencing the MeDIP-seq library to obtain methylation information of the DNA sample and information of the DNA tag.
  • the inventors have surprisingly found that when MeDIP-seq libraries containing various DNA tags are constructed from the same sample by the above method using oligonucleotides carrying different tags, the resulting sequencing data are very stable and reproducible.
  • the present invention also provides a MeDIP-seq library obtained by a method of constructing a MeDIP-seq library according to an embodiment of the present invention.
  • the invention also provides a method of determining methylation information for a DNA sample.
  • a method of determining methylation information for a DNA sample comprises: establishing a MeDIP-seq library of the DNA sample according to a method of constructing a MeDIP-seq library according to an embodiment of the present invention; and sequencing the MeDIP-seq library to determine the methylation information of the DNA sample. Based on this method, the methylation information of the DNA sample in the MeDIP-seq library and the sequence information of the DNA tag can be efficiently obtained, thereby distinguishing the source of the DNA sample.
  • the inventors have surprisingly found that using the method according to an embodiment of the present invention to determine the methylation information of a DNA sample can effectively reduce bias in the data output, and can accurately distinguish a plurality of MeDIP-seq libraries.
  • the present invention also provides a method of determining methylation information for a plurality of DNA samples.
  • the method comprises the steps of: for each of the plurality of samples, independently establishing a MeDIP-seq library of the DNA sample according to the method of constructing the MeDIP-seq library according to an embodiment of the present invention.
  • the method according to an embodiment of the present invention can make full use of high-throughput sequencing technology, for example by using Solexa sequencing technology to sequence the MeDIP-Seq libraries of various samples simultaneously, thereby improving the efficiency and throughput of MeDIP-Seq library sequencing while also improving the efficiency of determining methylation information for a variety of DNA samples.
  • kits for constructing a MeDIP-seq library of sample DNA comprising: a set of isolated oligonucleotides, according to an embodiment of the present invention,
  • the isolated oligonucleotide has a first strand consisting of a nucleotide represented by SEQ ID NO: (3N-1) and a second strand consisting of SEQ ID NO: (3N)
  • the DNA tag according to the embodiment of the present invention can be conveniently introduced into the constructed MeDIP-seq library.
  • Figure 1 Schematic diagram showing the construction of a mixed MeDIP-seq library of various samples according to an embodiment of the present invention
  • Figure 2 Correlation analysis of enriched fragments between two of the six DNA samples, obtained by the method for determining methylation information of various DNA samples according to an embodiment of the present invention.
  • first and second are used for descriptive purposes only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defining “first”, “second” may explicitly or implicitly include one or more of the features. Further, in the description of the present invention, “multiple” means two or more unless otherwise stated.
  • the present invention proposes some isolated DNA tags.
  • the DNA tags respectively consist of the nucleotides represented by SEQ ID NO: (3N-2), where N is any integer from 1 to 20.
  • DNA as used in the present invention may be any polymer comprising deoxyribonucleotides, including but not limited to modified or unmodified DNA.
  • with the DNA tag, a tagged MeDIP-seq library is obtained by ligating the DNA tag to the DNA of the sample or its equivalent. Sequencing the MeDIP-seq library yields both the methylation information of the DNA sample and the sequence of the tag, and the tag sequence in turn accurately identifies the source of the DNA sample.
  • MeDIP-seq libraries of a plurality of samples can be constructed simultaneously, whereby MeDIP-seq libraries derived from different samples can be mixed and sequenced together, and the methylation information can be categorized by DNA tag to obtain methylation information for each of the DNA samples. This makes it possible to use high-throughput sequencing technologies, such as Solexa sequencing, to sequence the MeDIP-seq libraries of multiple samples at once, thereby increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the methylation information of DNA samples.
  • the expression "DNA tag attached to the DNA of the sample or its equivalent" should be understood in a broad sense: the DNA tag can be linked directly to the DNA of the sample to construct the MeDIP-seq library, or it can be linked to nucleic acids having the same sequence as the DNA of the sample (for example, the corresponding RNA sequence or cDNA sequence).
  • the inventors of the present application found that, in order to design an effective DNA tag, it is first necessary to consider how well the tag sequences can be distinguished from one another and the resulting recognition rate. Second, when the tags of fewer than 12 samples are mixed, the GT content at each base position of the mixed tags must be considered. In Solexa sequencing the bases G and T are excited by the same light, and the bases A and C are excited by the same light, so the content of "GT" bases and the content of "AC" bases must be balanced. The optimal "GT" content is 50%, which guarantees the highest tag recognition rate and the lowest error rate. Finally, the repeatability and accuracy of the data output must be considered.
  • the inventors of the present application carried out extensive screening and selected a set of isolated DNA tags according to an embodiment of the present invention, which respectively consist of the nucleotides represented by SEQ ID NO: (3N-2).
  • the sequences are as shown in Table 1 above and will not be described again.
  • these tags can be applied to the construction of any MeDIP-seq library. There have been no previous reports of using these tags to construct libraries for methylated genomic DNA enrichment and Solexa sequencing.
  • the DNA tag used is a nucleic acid sequence of 6 bp in length, and the difference between the tags is more than 3 bases; the set of DNA tags consists of at least 5, or at least 10, or at least 15, or all 20 of the tags in Table 1.
  • the set preferably includes at least DNA index-1 to DNA index-5, or DNA index-6 to DNA index-10, or DNA index-11 to DNA index-15, or DNA index-16 to DNA index-20 of the 20 tags shown in Table 1, or a combination of any two or more of these groups.
  • the 1 base difference comprises a substitution, addition or deletion of 1 base in the sequence of the 20 tags shown in Table 1.
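The pairwise-difference requirement on the tags can be verified with a Hamming distance check. This is a sketch under stated assumptions: the helper names and the threshold argument are hypothetical, only ATGTCA is taken from Table 1, and Hamming distance covers substitutions only, so additions or deletions of a base would need an edit-distance measure instead.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def too_similar_pairs(tags, min_distance):
    """Return (tag, tag, distance) for every pair closer than min_distance."""
    return [
        (a, b, hamming(a, b))
        for a, b in combinations(tags, 2)
        if hamming(a, b) < min_distance
    ]


# Hypothetical 6 bp tags; only ATGTCA appears in Table 1.
tags = ["ATGTCA", "CGACTG", "GACAGT", "TCTGAC"]
conflicts = too_similar_pairs(tags, min_distance=4)
```

Keeping tags several substitutions apart means that a single sequencing error in the tag cannot turn one sample's tag into another's.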
  • the present invention also provides the use of a tag according to an embodiment of the present invention for the construction and sequencing of a MeDIP-seq library, wherein the DNA tag linker of the MeDIP-seq library comprises the embodiment of the present invention.
  • the invention provides a set of isolated oligonucleotides which can be used to introduce a DNA tag as described above into a DNA fragment of a sample, thereby constructing a tagged MeDIP-seq library.
  • the invention provides a set of isolated oligonucleotides, each of which has a sticky end T and has a first strand and a second strand, the sticky end T being formed on the first strand of each of the oligonucleotides.
  • the first strand is composed of the nucleotides represented by SEQ ID NO: (3N-1), and the second strand is composed of the nucleotides represented by SEQ ID NO: (3N), respectively.
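The SEQ ID NO numbering used for the strands follows simple arithmetic in the tag index N, and the text elsewhere numbers the tag itself as SEQ ID NO: (3N-2). A minimal sketch of that mapping, with a hypothetical function name and dictionary keys chosen for illustration:

```python
def seq_id_numbers(n, tag_count=20):
    """SEQ ID NO values for tag index n, following the 3N-2 / 3N-1 / 3N scheme."""
    if not 1 <= n <= tag_count:
        raise ValueError(f"tag index must be between 1 and {tag_count}")
    return {
        "tag": 3 * n - 2,            # the DNA tag itself
        "first_strand": 3 * n - 1,   # first strand of the tag linker
        "second_strand": 3 * n,      # second strand of the tag linker
    }


ids_for_index_14 = seq_id_numbers(14)
ids_for_index_20 = seq_id_numbers(20)
```

With 20 tags the scheme occupies SEQ ID NOs 1 to 60, which leaves the later numbers free for other sequences such as the PCR primers.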
  • the corresponding oligonucleotides can be formed by respectively annealing the first strand and the second strand constituting the corresponding oligonucleotide.
  • the above oligonucleotides each carry a DNA tag according to the embodiments of the present invention as described above and have sticky ends, so the corresponding DNA tag can be introduced into the DNA of the sample or its equivalent by a ligation reaction. Specifically, the sequences of these oligonucleotides are as shown in Table 1 above and are not repeated here.
  • the oligonucleotide sequences (DNA tag linkers) provided according to an embodiment of the present invention have high stability. This finding was primarily based on an analysis of the structural stability of these oligonucleotide sequences with Lasergene software (http://www.dnastar.com/) in accordance with some embodiments of the present invention. Using Lasergene's PrimerSelect software, the affinity between the two strands of a duplex can be assessed by analyzing the energy of the structures formed between the two sequences, thereby predicting the most stable overall dimer formed by the DNA tag linker and its energy value; the larger the absolute value of the energy (kcal/mol), the more stable the duplex. This structural stability and affinity analysis was performed on the 20 DNA tag adapters shown in Table 1 above, and the results showed that the "Y-type" structure formed by these DNA tag linkers was very stable.
  • the invention provides DNA tag adapters comprising a DNA tag of an embodiment of the invention, preferably serving simultaneously as the 5' and 3' linker; the set of tag linkers includes or consists of at least 5, or at least 10, or at least 15, or all 20 of the tag linkers in Table 1.
  • these tag linkers preferably include at least the DNA index-1F/R_adapter to DNA index-5F/R_adapter, or the DNA index-6F/R_adapter to DNA index-10F/R_adapter, or the DNA index-11F/R_adapter to DNA index-15F/R_adapter, or the DNA index-16F/R_adapter to DNA index-20F/R_adapter of the 20 tag linkers shown in Table 1, or a combination of any two or more of these groups.
  • a difference of 1 base includes substitution, addition or deletion of 1 base in the tag sequence.
  • a DNA tag linker for use in the construction and sequencing of a MeDIP-seq library is also provided, preferably the tag linker is used simultaneously as a 5' and 3' linker of the MeDIP-seq library.
  • a MeDIP-seq library constructed using the above DNA tag linker is also provided.
  • the present invention also provides a method of constructing a MeDIP-seq library of sample DNA using the above oligonucleotide (DNA tag linker). Specifically, according to an embodiment of the present invention, the method comprises: First, fragmenting a sample DNA to obtain a DNA fragment.
  • the source of the sample DNA is not particularly limited and may be derived from various plants, animals, microorganisms.
  • the DNA sample is derived from at least one of a mammal, a plant, and an insect.
  • the sample DNA is at least one selected from the group consisting of human and mouse genomic DNA.
  • a MeDIP-seq library of a plurality of common model organisms can be efficiently constructed.
  • fragmentation of the sample DNA is carried out by atomization, ultrasonic fragmentation, HydroShear or restriction enzyme treatment, preferably by ultrasonic fragmentation.
  • the length of the DNA fragment is 200-400 bp, whereby the efficiency of constructing the MeDIP-seq library and of subsequent sequencing can be further improved.
  • the DNA fragment is end-repaired to obtain a DNA fragment that has been repaired at the end.
  • end repair is carried out using T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase.
  • base A is added to the end of the end-repaired DNA fragment to obtain a DNA fragment having a sticky end A.
  • the addition of base A at the end of the end-repaired DNA fragment is carried out using the Klenow (3'-5' exo-) enzyme.
  • a DNA fragment having a sticky terminal A is ligated to one of a group of isolated oligonucleotides according to an embodiment of the present invention to obtain a linked product having a tag.
  • one of a set of isolated oligonucleotides selected according to an embodiment of the present invention is ligated at both ends of a DNA fragment having a sticky terminal A.
  • the tagged ligation product is then captured using a methylation-specific binding antibody to obtain a tagged ligation product containing methylated DNA.
  • the methylation-specific binding antibody is a 5mC antibody.
  • the tagged ligation product is denatured by high temperature or NaOH treatment before it is captured with the methylation-specific binding antibody.
  • the tagged ligation product containing methylated DNA is isolated and amplified, and the amplified tagged ligation product containing methylated DNA constitutes the MeDIP-seq library.
  • amplification of the tagged ligation product containing methylated DNA is carried out by a PCR reaction that uses oligonucleotides having the sequences shown in SEQ ID NOS: 63 and 64 as primers, together with a hot-start Taq enzyme.
  • the present invention provides a method of constructing a MeDIP-seq library, comprising:
  • the starting material under study may come from any species, including various plants, animals and microorganisms, such as humans, plants and insects, and especially mammalian genomic DNA including human and mouse genomic DNA. Fragmentation methods include atomization, ultrasonic fragmentation, HydroShear or enzymatic digestion, and break the genomic DNA into fragments preferably 200-400 bp in size; ultrasonic fragmentation is the preferred method;
  • the fragmented DNA is end-repaired by enzymes such as, but not limited to, T4 DNA polymerase, Klenow fragment, and T4 polynucleotide kinase to form blunt-ended random DNA fragments, to which an "A" base is then added at the 3' end by an enzyme including, but not limited to, Klenow (3'-5' exo-);
  • the random DNA fragments carrying the terminal "A" base are ligated to different tag linkers using a ligase including, but not limited to, T4 DNA ligase; preferably the 5' and 3' ends of the random DNA fragments are ligated to the tag linker simultaneously. The concentration of the ligation products is then measured by a method including, but not limited to, real-time quantitative PCR to determine the effective concentration of each sample;
  • Step 4 Sample mixing, quantification and immune response
  • equal amounts of the ligation products carrying different tag linkers are mixed, with the total amount controlled at 1-3 µg, preferably 1.5-2 µg. Preferably, an exogenous methylated positive control and a non-methylated negative control are added to the mixed sample to determine the capture efficiency. The mixed sample is then denatured by high temperature or NaOH, and a methylation-specific binding antibody, preferably a 5mC antibody, is added for the immunoprecipitation reaction (IP);
  • an exogenous methylated positive control is a known sequence (e.g., a DNA sequence of 200-300 bp) with a defined number of CG sites (e.g., 5 CG sites). In the positive control all of these sites are methylated (by pretreatment with a methyltransferase), whereas in the non-methylated negative control they are unmethylated, so the antibody enriches the methylated control but not the unmethylated one. Since primers have been designed for these 200-300 bp fragments, the enrichment can be checked by Q-PCR. Positive and negative controls are techniques well known to those skilled in the art;
  • Step 5 Capture DNA for Q-PCR detection
  • IP Immunoreactivity
  • Step 6 PCR amplification and library size selection
  • the IP-captured and purified DNA is amplified by low-cycle PCR, preferably 8-10 cycles, and the amplified product is a MeDIP-seq multi-sample mixed sequencing library. Fragment size selection is preferably performed on the PCR amplification product by 2% agarose gel electrophoresis; after the target band is excised and purified, it constitutes the MeDIP-seq library to be sequenced. PCR amplification preferably uses a hot-start Taq enzyme.
  • the above-described tag joint in the method of constructing the MeDIP-seq library according to an embodiment of the present invention is a DNA tag joint according to an embodiment of the present invention.
  • a DNA tag according to an embodiment of the present invention can be efficiently introduced into a MeDIP-seq library constructed for a DNA sample.
  • the MeDIP-seq library can be sequenced to obtain the methylation information of the DNA sample and the sequence information of the DNA tag, thereby making it possible to distinguish the source of the DNA sample.
  • constructing MeDIP-seq libraries for a plurality of DNA samples simultaneously by the method according to the embodiment of the present invention greatly reduces sample preparation time and reagent consumption, so that efficient, low-cost MeDIP-seq library preparation becomes a reality and MeDIP-seq population studies on large numbers of clinical samples become possible.
  • the inventors have surprisingly found that when MeDIP-seq libraries containing various DNA tags are constructed from the same sample by the above method using oligonucleotides carrying different tags, the resulting sequencing data are very stable and reproducible.
  • the present invention also provides a kit for constructing a MeDIP-seq library of sample DNA, the kit comprising: a set of isolated oligonucleotides, according to an embodiment of the present invention,
  • Each of the set of isolated oligonucleotides is separately disposed in a different container.
  • a DNA tag according to an embodiment of the present invention can be easily introduced into the constructed MeDIP-seq library.
  • components for constructing the MeDIP-seq library may be included in the kit, and details are not described herein.
  • the present invention also provides a MeDIP-seq library constructed according to the method of constructing a MeDIP-seq library of the present invention.
  • the tagged MeDIP-seq library can be effectively applied to high-throughput sequencing technologies such as Solexa technology, so that the tag sequences obtained can be used to accurately classify the methylation information of the sample DNA.
  • the invention also provides a method of determining methylation information for a DNA sample.
  • according to an embodiment of the present invention, the method comprises: establishing a MeDIP-seq library of a DNA sample according to the method of constructing a MeDIP-seq library according to an embodiment of the present invention; and sequencing the MeDIP-seq library to determine the methylation information of the DNA sample.
  • the methylation information of the DNA sample in the MeDIP-seq library and the sequence information of the DNA tag can be efficiently obtained, thereby distinguishing the source of the DNA sample.
  • the inventors have surprisingly found that using the method according to an embodiment of the present invention to determine the methylation information of a DNA sample can effectively reduce bias in the data output and can accurately distinguish a plurality of MeDIP-seq libraries.
  • the constructed MeDIP-seq library can be sequenced by any known method, and the type thereof is not particularly limited.
  • sequencing of the MeDIP-seq library is performed using at least one selected from the group consisting of Solexa, SOLiD, 454, True Single Molecule DNA sequencing technology, SMRT technology, and nanopore sequencing technology.
  • at least one of Solexa, SOLiD, 454, PacBio SMRT technology, and nanopore sequencing technology is used.
  • the above method of determining the methylation information of a DNA sample can be applied to a plurality of samples.
  • the present invention provides a method of determining methylation information for a plurality of DNA samples.
  • the method comprises the steps of: for each of the plurality of samples, independently establishing a MeDIP-seq library of the DNA sample according to a method of constructing a MeDIP-Seq library according to an embodiment of the invention, wherein different DNA samples carry different tags of known sequence.
  • the MeDIP-Seq libraries of the different DNA samples are mixed to obtain a MeDIP-Seq library mixture.
  • the MeDIP-Seq library mixture is then sequenced to obtain the tag sequences and the methylation information of the DNA samples; and the methylation information of the DNA samples is classified based on the tag sequences to obtain methylation information for each of the samples.
  • the expression "mixing MeDIP-Seq libraries of different DNA samples to obtain a MeDIP-Seq library mixture" as used herein should be understood in a broad sense: the MeDIP-Seq libraries may be constructed independently and then mixed, or intermediate products may be mixed during library preparation and a MeDIP-Seq library containing the various tags prepared afterwards, as long as the sequence of the DNA tag used for each sample is known.
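The classification of reads by tag sequence described above amounts to demultiplexing a pooled run. The sketch below makes several assumptions that the text does not state: each read arrives as a pair of (tag read, insert read), tags are matched by substitution mismatches only, and all names, sample labels, and read sequences are invented for illustration (only ATGTCA appears in Table 1).

```python
from collections import defaultdict


def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, tag_to_sample, max_mismatches=0):
    """Group insert reads by sample according to the tag read of each pair."""
    by_sample = defaultdict(list)
    for tag_read, insert_read in reads:
        hits = [
            sample
            for tag, sample in tag_to_sample.items()
            if len(tag) == len(tag_read)
            and mismatches(tag, tag_read) <= max_mismatches
        ]
        # Reads matching no tag, or more than one tag, are set aside.
        key = hits[0] if len(hits) == 1 else "undetermined"
        by_sample[key].append(insert_read)
    return dict(by_sample)


# Hypothetical tag assignment and reads.
tag_to_sample = {"ATGTCA": "sample_1", "CGACTG": "sample_2"}
reads = [
    ("ATGTCA", "ACGT"),
    ("CGACTG", "TTGA"),
    ("ATGTCA", "GGCC"),
    ("NNNNNN", "CCCC"),
]
result = demultiplex(reads, tag_to_sample)
```

Allowing `max_mismatches=1` trades a little stringency for recovering reads with a single sequencing error in the tag; this is only safe when every pair of tags in the pool differs at more positions than the tolerance could bridge.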
  • sequencing of the MeDIP-seq library mixture is performed using at least one selected from the group consisting of Solexa, Solid, 454, True Single Molecule DNA sequencing technology, SMRT.TM. technology, and nanopore sequencing technology.
  • at least one of SOLEXA, SOLID, 454, PacBi o SMRTTM technology, and nanopore sequencing technology is used.
  • the method according to an embodiment of the present invention can make full use of high-throughput sequencing technology, for example, using Solexa sequencing technology to simultaneously sequence MeDIP-Seq libraries of various samples, thereby improving the efficiency and throughput of MeDIP-Seq library sequencing. At the same time, the efficiency of determining the thiolation information of various DNA samples can be improved.
  • Using the method of constructing a MeDIP-Seq library according to an embodiment of the present invention, we constructed one mixed library of six samples, starting from six human peripheral blood genomic DNA samples (2 µg each). The quality of the library was checked by TA cloning, and high-throughput sequencing was then performed.
  • A base "A" was then added to the 3' end of the recovered fragments as follows: 2.3 µl of 10x Blue buffer, 0.5 µl of 5 mM dATP and 0.5 µl of Klenow polymerase (3'-5' exo-) were added to 19.7 µl of recovered DNA solution, and the mixture was incubated at 37 °C for 30 minutes and purified with Ampure Beads into 25 µl of Elution Buffer (EB).
  • 10 µl each of the synthesized 100 µM Index-NF_adapter and Index-NR_adapter were mixed, held at 94 °C for 5 minutes, placed in a 65 °C water bath for 15 minutes and then left to cool naturally, giving a 50 µM annealed Index Adapter product.
  • The adapter-ligated DNA fragments were quantified by Q-PCR [9]; the reaction system was as follows:
  • The DNA fragments recovered after MeDIP capture were amplified by PCR.
  • The PCR amplification product was gel-purified with the QIAquick Gel Extraction Kit (Qiagen) and dissolved in 30 µl of Elution Buffer (EB); 5 µl was taken for TA clone testing and the remaining library was used for sequencing.
  • Library yield was quantified by QPCR (see, for example, Bernd Buehler, Holly H. Hogrefe, Graham Scott et al. Rapid quantification of DNA libraries for next-generation sequencing. Methods. 2010.)
  • TA clone testing of the mixed library gave 51 valid sequences. A tag could be identified in 45 of the 51 (tag efficiency 88.24%), and 40 of those 45 could be aligned to the genome (88.89%). Both the tag efficiency and the genome alignment rate were above 85%, indicating good library quality.
  • In terms of tag distribution, the least frequently detected tags were index1, index4 and index6, with 6 sequences each (13% of all valid indexes); the most frequently detected was index3, with 10 sequences (22% of all valid indexes). Overall, the tags were detected with good randomness, indicating that the method is effective and feasible.
  • The overall data analysis of the library showed that all six tagged samples were identified effectively and fairly evenly, and that the unique genome alignment rate of the valid data was above 70% for each. This shows that the sequencing results agree with the TA clone results and that the sequencing data from libraries constructed by this method are usable.
  • Table 2.2-2: Average methylation level of the covered intervals for the 6 samples
  • Analysis of the average methylation level of the covered intervals showed very small differences between the six samples, with methylation levels of about 70% in each. This indicates that MeDIP library construction enriched highly methylated regions and that the differences between samples were very small.
  • Correlation analysis shows the relationship between the data of two samples, that is, whether they cover the same highly methylated intervals; the better the correlation in highly methylated intervals, the more successful the experiment.
  • The correlation analysis parameters for this experiment were as follows: the data volume was first normalized; then, taking 1 kb as the unit, a window counted as one valid coverage unit if more than 50% of it was covered and more than 5 sequences covered it. The two samples were then compared for their coverage of such 1 kb windows.
  • Figure 2 shows the correlation analysis of enriched fragments between two of the six DNA samples whose methylation information was obtained by the method for determining the methylation information of multiple DNA samples according to an embodiment of the present invention. Specifically, index1 and index2 were selected, 1 kb was taken as the window length, and the number of fragments in each window was counted. The results show that fragment coverage in high-coverage regions is well correlated between samples.
  • This indicates that when the methylation information of multiple DNA samples is determined by library construction and sequencing according to an embodiment of the present invention, all samples are effectively enriched for highly methylated regions, and no differences in enrichment between samples arise from the experimental method.
  • The DNA tags, oligonucleotides, MeDIP-seq library and its preparation method, the methods of determining methylation information, and the kit for constructing the MeDIP-seq library of sample DNA can be applied to the enrichment of methylated genomic DNA and can effectively improve the sequencing throughput of sequencing platforms such as the Solexa platform.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Medicinal Chemistry (AREA)
  • General Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Microbiology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Description

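DNA Tags and Uses Thereof
Priority information
This application claims priority to and the benefit of Chinese patent application No. 201010299246.5, filed with the State Intellectual Property Office of China on September 21, 2010, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of molecular biology, in particular to the enrichment of methylated genomic DNA. Specifically, the invention relates to DNA tags for the enrichment of methylated genomic DNA and to their use. More specifically, the invention provides DNA tags and oligonucleotides for constructing a MeDIP-seq library of sample DNA, a MeDIP-seq library and a method of preparing it, a method of determining the methylation information of a DNA sample, a method of determining the methylation information of a plurality of DNA samples, and a kit for constructing a MeDIP-seq library of sample DNA.
Background
Genomic DNA methylation is currently one of the most active areas of epigenetics research and is increasingly used as an epigenetic marker of mammalian development and of many diseases, including cancer. DNA methylation plays an important role in chromatin structure modification and genome stability, and in eukaryotes it participates in many biological processes, such as embryonic development, genomic imprinting, X-chromosome inactivation, regulation and silencing of gene expression, silencing of retrotransposons, and the onset of many diseases including mammalian tumours (see, for example: Brena RM, Huang TH, Plass C. Quantitative assessment of DNA methylation: Potential applications for disease diagnosis, classification, and prognosis in clinical settings. J Mol Med. 2006 May;84(5):365-77; Egger G, Liang G, Aparicio A et al. Epigenetics in human disease and prospects for epigenetic therapy. Nature 2004 May 27;429(6990):457-63, incorporated herein by reference in their entirety). DNA methylation biomarkers provide a useful tool not only for the early diagnosis of many diseases but also for detecting and assessing high-risk individuals.
However, current methods for enriching methylated DNA from a sample genome still need improvement.
Summary of the invention
The present invention is based on the following findings of the inventors: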
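BS-seq (bisulfite sequencing), MeDIP-seq (methylated DNA immunoprecipitation sequencing), MBD-seq (sequencing of methylated DNA enriched with methylation-specific binding proteins) and RRBS (reduced representation bisulfite sequencing) are currently popular sequencing methods for studying genome methylation, but each is limited to some degree by cost, throughput and resolution (see, for example, Li N, Ye M, Li Y et al. Whole genome DNA methylation analysis based on high throughput sequencing technology. Methods. 2010 Apr 27. [Epub ahead of print]; Down TA, Rakyan VK, Turner DJ, et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol. 2008 Jul;26(7):779-85; Serre D, Lee BH, Ting AH. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 2010 Jan;38(2):391-9, incorporated herein by reference in their entirety). Of these, BS-seq is the most commonly used method for CpG methylation analysis and provides methylation information at single-base resolution, but it requires the whole genome to be treated with bisulfite before sequencing, so the data volume is very large and sequencing and analysis costs are high (see, for example, Serre D, Lee BH, Ting AH. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 2010 Jan;38(2):391-9; Lister R, Pelizzola M, Dowen RH, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009 Nov 19;462(7271):315-22, incorporated herein by reference in their entirety). MeDIP-seq, MBD-seq and RRBS each selectively reduce, to different degrees, the amount of material that must be sequenced. RRBS covers only about 10-20% of the genome, mainly CpG islands and a small fraction of promoter regions, so it is difficult to capture genome-wide methylation characteristics (see, for example, Meissner A, Mikkelsen TS, Gu H, et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008 Aug 7;454(7205):766-70; Gu H, Bock C, Mikkelsen TS, et al. Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution. Nat Methods. 2010 Feb;7(2):133-6, incorporated herein by reference in their entirety). MBD-seq and MeDIP-seq enrich methylated DNA by binding it with a methylation-specific binding protein (MBD2) and a methylation-specific antibody (5mC antibody), respectively. MBD-seq mainly enriches highly methylated DNA in regions of high CpG density. MeDIP-seq mainly enriches highly methylated regions of moderate CpG density. Published BS-seq results show that most highly methylated regulatory regions have relatively low CpG density, so MeDIP-seq is better suited to analysing such regions (see, for example, Li N, Ye M, Li Y et al. Whole genome DNA methylation analysis based on high throughput sequencing technology. Methods. 2010 Apr 27. [Epub ahead of print], incorporated herein by reference in its entirety). In addition, the 5mC antibody is not specific to CpG sites, which makes it valuable for analysing cytosine methylation at non-CpG sites (see, for example, Li N, Ye M, Li Y et al., cited above).
For these reasons, MeDIP-seq is currently the technique most often used to enrich methylated DNA from a sample genome. However, existing MeDIP-seq protocols still have many shortcomings, chiefly that the procedure is laborious and not easily applied to large numbers of samples. Moreover, existing methods immunoprecipitate each sample separately and then subject the enriched DNA to high-throughput sequencing, so processing many samples is extremely costly in time, labour and money, which greatly reduces feasibility.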
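The present invention aims to solve at least one of the problems of the prior art. To this end, one aspect of the invention provides DNA tags (sometimes simply called "tags" herein) that can be used to construct MeDIP-seq libraries. According to one aspect, the invention provides a set of isolated DNA tags. According to some embodiments, these isolated DNA tags consist of the nucleotides shown in SEQ ID NO: (3N-2), where N is any integer from 1 to 20. In this specification the tags are named DNA Index-N, where N is any integer from 1 to 20, and their sequences are shown in Table 1 below. By ligating a DNA tag according to an embodiment of the invention to sample DNA or its equivalent, the sample of origin of the DNA can be identified precisely. MeDIP-seq libraries carrying different DNA tags can therefore be constructed for many samples at once, the libraries from different samples can be mixed and sequenced together, and the methylation information can be classified by DNA tag to obtain the methylation information of each DNA sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which can sequence multiple MeDIP-seq libraries simultaneously, and so increases the efficiency and throughput of MeDIP-seq library sequencing. The inventors surprisingly found that MeDIP-seq libraries constructed with DNA tags according to embodiments of the invention can be distinguished from one another precisely, and that the resulting sequencing data are highly stable and reproducible.
According to another aspect, the invention provides a set of isolated oligonucleotides for introducing the above DNA tags into sample DNA or its equivalent. Each oligonucleotide in the set has a first strand and a second strand; the first strand consists of the nucleotides shown in SEQ ID NO: (3N-1) and the second strand of the nucleotides shown in SEQ ID NO: (3N), where N takes the same value for both strands of the same oligonucleotide and is any integer from 1 to 20. According to embodiments, these oligonucleotides (sometimes called "DNA tag adapters" or "tag adapters" in this specification) each carry a DNA tag as described above and have a sticky end T, so that the corresponding DNA tag can be introduced into DNA or its equivalent by a ligation reaction. Following the naming of the tags, the oligonucleotide (DNA tag adapter) corresponding to DNA Index-N is named DNA Index-N adapter, where N is any integer from 1 to 20. The first strand (sometimes called the "sense sequence" herein) and the second strand (sometimes called the "antisense sequence" herein) of a DNA tag adapter are named DNA Index-NF adapter and DNA Index-NR adapter, respectively; their sequences are shown in Table 1 below (all sequences in the table are written 5' to 3'). According to embodiments, a DNA tag adapter with a Y-shaped structure can be formed by annealing equimolar amounts of the sense sequence DNA Index-NF adapter and its corresponding antisense sequence DNA Index-NR adapter.
Table 1: Sequences of the DNA tags (DNA Index-N) and DNA tag adapters (DNA Index-N_adapter)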
[Figure imgf000004_0001: first part of Table 1, an image in the original publication]
Name: sequence (SEQ ID NO)
DNA Index-14: ATGTCA (40)
DNA Index-14F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTATGTCAT (41)
DNA Index-14R_adapter: 5-Phos/TGACATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (42)
DNA Index-15: ATTCCT (43)
DNA Index-15F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTATTCCTT (44)
DNA Index-15R_adapter: 5-Phos/AGGAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (45)
DNA Index-16: CAACAC (46)
DNA Index-16F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTCAACACT (47)
DNA Index-16R_adapter: 5-Phos/GTGTTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (48)
DNA Index-17: CACAAG (49)
DNA Index-17F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTCACAAGT (50)
DNA Index-17R_adapter: 5-Phos/CTTGTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (51)
DNA Index-18: CACGGT (52)
DNA Index-18F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTCACGGTT (53)
DNA Index-18R_adapter: 5-Phos/ACCGTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (54)
DNA Index-19: CACTCA (55)
DNA Index-19F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTCACTCAT (56)
DNA Index-19R_adapter: 5-Phos/TGAGTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (57)
DNA Index-20: CCAACG (58)
DNA Index-20F_adapter: TACACTCTTTCCCTACACGACGCTCTTCCGATCTCCAACGT (59)
DNA Index-20R_adapter: 5-Phos/CGTTGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC (60)
With the oligonucleotides according to embodiments of the invention, DNA tags can be introduced efficiently into the DNA of a sample or its equivalent, so that a MeDIP-Seq library carrying DNA tags can be constructed. In addition, the inventors surprisingly found that when MeDIP-Seq libraries containing the various DNA tags are constructed from the same sample using oligonucleotides with different tags, the resulting sequencing data are highly stable and reproducible. According to embodiments, when the data were analysed using the Pearson coefficient, the MeDIP-Seq libraries constructed from human whole-blood samples with DNA Index 1-20 all showed a correlation of at least 0.99. For details of the Pearson coefficient calculation see the relevant literature, for example: 't Hoen, P. A., Y. Ariyurek, et al. (2008). "Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms." Nucleic Acids Res 36(21): e141, incorporated herein by reference in its entirety. The higher the reproducibility, the closer the Pearson coefficient is to 1.
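As an illustration of the reproducibility measure mentioned above, the following is a minimal sketch of the standard Pearson coefficient. The text does not say which quantities were correlated, so the per-window read counts `lib_a` and `lib_b` are hypothetical values chosen only to exercise the function; they are not data from this document.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical read counts per genomic window for two libraries built from
# the same sample with different tags (illustrative values only).
lib_a = [12, 30, 7, 55, 21]
lib_b = [11, 33, 8, 52, 20]
r = pearson(lib_a, lib_b)
print(f"Pearson r = {r:.4f}")
```

Values of `r` close to 1 correspond to the high reproducibility described in the text.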
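According to yet another aspect, the invention provides a method of preparing a MeDIP-seq library. According to embodiments, the method comprises: fragmenting the sample DNA to obtain DNA fragments; end-repairing the DNA fragments to obtain end-repaired DNA fragments; adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments with a sticky end A; ligating the DNA fragments with a sticky end A to one of the set of isolated oligonucleotides according to embodiments of the invention to obtain tagged ligation products; capturing the tagged ligation products with a methylation-specific antibody to obtain tagged ligation products containing methylated DNA; and isolating and amplifying the tagged ligation products containing methylated DNA, which constitute the MeDIP-seq library. With this method, the DNA tags according to embodiments of the invention can be introduced efficiently into the MeDIP-seq library constructed from the sample DNA. Sequencing the MeDIP-seq library then yields both the methylation information of the DNA sample and the tag information, so that the source of the DNA sample can be identified. In addition, the inventors surprisingly found that when MeDIP-seq libraries containing the various DNA tags are constructed from the same sample by this method using oligonucleotides with different tags, the resulting sequencing data are highly stable and reproducible.
The invention further provides a MeDIP-seq library obtained by the method of constructing a MeDIP-seq library according to embodiments of the invention.
According to yet another aspect, the invention provides a method of determining the methylation information of a DNA sample. According to embodiments, the method comprises: establishing a MeDIP-seq library of the DNA sample by the method of constructing a MeDIP-seq library according to embodiments of the invention; and sequencing the MeDIP-seq library to determine the methylation information of the DNA sample. This method efficiently yields the methylation information of the DNA sample in the MeDIP-seq library together with the sequence of the DNA tag, so that the source of the DNA sample can be identified. In addition, the inventors surprisingly found that determining the methylation information of a DNA sample in this way effectively reduces bias in data output and allows multiple MeDIP-seq libraries to be distinguished precisely.
According to a further aspect, the invention provides a method of determining the methylation information of a plurality of DNA samples. According to embodiments, the method comprises the following steps: for each of the plurality of samples, independently establishing a MeDIP-seq library of the DNA sample by the method of constructing a MeDIP-seq library according to embodiments of the invention, wherein different DNA samples use tags of mutually different, known sequences; mixing the MeDIP-Seq libraries of the different DNA samples to obtain a MeDIP-Seq library mixture; sequencing the MeDIP-Seq library mixture to obtain the tag sequences and the methylation information of the DNA samples; and classifying the methylation information of the DNA samples by tag sequence to obtain the methylation information of the plurality of samples. This method makes full use of high-throughput sequencing technologies such as Solexa sequencing to sequence the MeDIP-Seq libraries of many samples simultaneously, which improves the efficiency and throughput of MeDIP-Seq library sequencing and the efficiency of determining the methylation information of multiple DNA samples.
According to a further aspect, the invention provides a kit for constructing a MeDIP-seq library of sample DNA. According to embodiments, the kit comprises a set of isolated oligonucleotides, each having a first strand and a second strand, the first strand consisting of the nucleotides shown in SEQ ID NO: (3N-1) and the second strand of the nucleotides shown in SEQ ID NO: (3N), where N takes the same value for both strands of the same oligonucleotide and is any integer from 1 to 20, and where each oligonucleotide of the set is provided in a separate container. With this kit, the DNA tags according to embodiments of the invention can be conveniently introduced into the MeDIP-seq library being constructed.
Additional aspects and advantages of the invention are set out in part in the description that follows; in part they will become apparent from that description or be learned through practice of the invention.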
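Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken together with the drawings, in which:
Figure 1 is a schematic flow chart of the method of constructing a mixed MeDIP-seq library of multiple samples according to an embodiment of the invention;
Figure 2 shows the correlation analysis of enriched fragments between two of the 6 DNA samples whose methylation information was obtained by the method of determining the methylation information of multiple DNA samples according to an embodiment of the invention.
Detailed description of the invention
Embodiments of the invention are described in detail below, and examples of the embodiments are illustrated in the drawings. The embodiments described with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for description only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Further, in the description of the invention, "a plurality" means two or more unless otherwise stated.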
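DNA tags
According to one aspect of this application, the invention provides isolated DNA tags. According to embodiments, these isolated DNA tags consist of the nucleotide sequences shown in SEQ ID NO: (3N-2), where N is any integer from 1 to 20. In this specification the tags are named DNA Index-N, where N is any integer from 1 to 20; their sequences are shown in Table 1 above and are not repeated here.
The term "DNA" as used in the invention may be any polymer comprising deoxyribonucleotides, including but not limited to modified or unmodified DNA. By ligating a DNA tag according to embodiments of the invention to the DNA of a sample or its equivalent, a tagged MeDIP-seq library is obtained; sequencing this library yields the methylation information of the DNA sample together with the tag sequence, and the tag sequence precisely identifies the source of the DNA sample. MeDIP-seq libraries of many samples can therefore be constructed at once, the libraries from different samples can be mixed and sequenced together, and the methylation information of the DNA samples can be classified by DNA tag to obtain the methylation information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing to sequence the MeDIP-seq libraries of many samples simultaneously, which increases the efficiency and throughput of high-throughput sequencing and lowers the cost of determining the methylation information of DNA samples. The expression "a DNA tag is ligated to the DNA of a sample or its equivalent" as used here should be understood broadly: the DNA tag may be ligated directly to the DNA of the sample to construct a MeDIP-seq library, or to a nucleic acid having the same sequence as the DNA of the sample (for example, the corresponding RNA sequence or cDNA sequence, which has the same sequence as the DNA).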
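The inventors found that designing effective DNA tags requires several considerations. First, the tag sequences must be distinguishable from one another with a high recognition rate. Second, when fewer than 12 samples are pooled, the GT content at each base position across the pooled tags must be taken into account. In Solexa sequencing, bases G and T share one excitation light and bases A and C share another, so the "GT" content must be balanced against the "AC" content; the optimum "GT" content is 50%, which gives the highest tag recognition rate and the lowest error rate. Third, the data output must be reproducible and accurate: for a DNA tag library to be constructed and sequenced effectively, the set of DNA tags must give reliable, highly reproducible results, meaning that MeDIP-seq libraries constructed from the same DNA sample with different tags from the set yield consistent sequencing results. Finally, runs of 3 or more consecutive identical bases in a tag sequence must be avoided, because they increase the error rate during synthesis or sequencing, and hairpin formation within the DNA tag adapter itself should be avoided as far as possible.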
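Two of the design rules above can be checked mechanically. The sketch below, with hypothetical helper names, reports the longest run of identical bases in each tag and the G/T fraction at each position across a pool of tags. The seven tags are DNA Index-14 to Index-20 as listed in Table 1; pooling exactly these seven is an assumption made only for illustration.

```python
def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases in seq."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def gt_fraction_per_position(tags):
    """Fraction of tags carrying G or T at each position of the pooled tags."""
    length = len(tags[0])
    if any(len(t) != length for t in tags):
        raise ValueError("all tags must have the same length")
    return [sum(t[i] in "GT" for t in tags) / len(tags) for i in range(length)]

# DNA Index-14 .. Index-20 as listed in Table 1.
TAGS = ["ATGTCA", "ATTCCT", "CAACAC", "CACAAG", "CACGGT", "CACTCA", "CCAACG"]

for tag in TAGS:
    # The text asks for no run of 3 or more identical bases.
    print(tag, "longest run:", max_homopolymer_run(tag))

# A pool is best balanced when each position is close to 0.5.
print([round(f, 2) for f in gt_fraction_per_position(TAGS)])
```

A per-position value far from 0.5 suggests choosing a different subset of tags for that pool.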
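To this end, the inventors carried out extensive screening and selected the set of isolated DNA tags according to embodiments of the invention, which consist of the nucleotide sequences shown in SEQ ID NO: (3N-2), where N is any integer from 1 to 20. Their sequences are shown in Table 1 above and are not repeated here. The inventors also found that any two of these tags differ at no fewer than 3 base positions, and that a sequencing or synthesis error at any single one of a tag's 6 bases does not prevent the tag from being identified. These tags can be used to construct any MeDIP-seq library. To date there have been no reports of these tags being used to construct libraries of enriched methylated genomic DNA for Solexa sequencing.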
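The pairwise-difference property can be verified directly with a Hamming distance. The sketch below checks it for the seven tags that are legible in Table 1 (DNA Index-14 to Index-20); the remaining tags are not reproduced in this text and are therefore not checked. The function names are illustrative.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))

# DNA Index-14 .. Index-20 as listed in Table 1.
TAGS = ["ATGTCA", "ATTCCT", "CAACAC", "CACAAG", "CACGGT", "CACTCA", "CCAACG"]

print("minimum pairwise distance:", min_pairwise_distance(TAGS))
```

With a minimum distance of 3, a read carrying one erroneous tag base is still 1 mismatch from its true tag and at least 2 from every other tag, which is why a single error does not affect identification.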
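According to some embodiments, the DNA tags are 6-bp nucleic acid sequences that differ from one another by 3 or more bases, and the set of DNA tags consists of at least 5, at least 10, at least 15, or all 20 of the tags in Table 1. Specifically, according to embodiments, the tags preferably include at least DNA Index-1 to DNA Index-5, or DNA Index-6 to DNA Index-10, or DNA Index-11 to DNA Index-15, or DNA Index-16 to DNA Index-20 of the 20 tags shown in Table 1, or any combination of two or more of these groups. In some specific examples, a difference of 1 base includes the substitution, addition or deletion of 1 base in the sequence of any of the 20 tags shown in Table 1.
According to embodiments, the invention also provides the use of the tags according to embodiments of the invention for constructing and sequencing MeDIP-seq libraries, wherein the DNA tag adapters of the MeDIP-seq library contain the DNA tags according to embodiments of the invention, each tag forming its corresponding DNA tag adapter.
Oligonucleotides and construction of MeDIP-seq libraries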
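According to yet another aspect, the invention provides a set of isolated oligonucleotides that can be used to introduce the DNA tags described above into the DNA fragments of a sample and so construct a tagged MeDIP-seq library. According to embodiments, each oligonucleotide in the set has a sticky end T and has a first strand and a second strand, the sticky end T being formed on the first strand of each oligonucleotide. The first strand consists of the nucleotides shown in SEQ ID NO: (3N-1) and the second strand of the nucleotides shown in SEQ ID NO: (3N), where N takes the same value for both strands of the same oligonucleotide and is any integer from 1 to 20. Those skilled in the art will understand that each oligonucleotide can be formed by annealing its first strand to its second strand. According to embodiments, these oligonucleotides each carry a DNA tag as described above and have a sticky end, so that the corresponding DNA tag can be introduced into the DNA of a sample or its equivalent by a ligation reaction. The sequences of these oligonucleotides are shown in Table 1 above and are not repeated here.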
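The legible rows of Table 1 (DNA Index-14 to Index-20) follow a regular pattern: the first strand is a common prefix, then the tag, then the overhanging T; the second strand is a 5' phosphate, then the reverse complement of the tag, then a common suffix. The sketch below reproduces that pattern and the SEQ ID numbering (3N-2, 3N-1, 3N). The constants are read off those rows; that the rows not reproduced here follow the same pattern is an assumption, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Common parts read off the Index-14..20 rows of Table 1.
F_PREFIX = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
R_SUFFIX = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def build_adapter(n, tag):
    """Return (SEQ ID NO, sequence) pairs for tag N and its two adapter strands."""
    return {
        "tag": (3 * n - 2, tag),
        # First strand: prefix + tag + single T forming the sticky end.
        "forward": (3 * n - 1, F_PREFIX + tag + "T"),
        # Second strand: 5'-phosphorylated, starts with the tag's reverse complement.
        "reverse": (3 * n, "5-Phos/" + reverse_complement(tag) + R_SUFFIX),
    }

adapter_14 = build_adapter(14, "ATGTCA")
for strand, (seq_id, seq) in adapter_14.items():
    print(f"{strand:8s} SEQ ID NO: {seq_id}  {seq}")
```

Because the two strands are complementary only over the tag and the adjacent 12 bases, annealing them leaves the outer arms unpaired, which is consistent with the Y-shaped structure described in the text.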
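The inventors found that the oligonucleotide sequences (DNA tag adapters) provided according to embodiments of the invention are highly stable. This finding was obtained, according to some embodiments, by analysing the structural stability of these oligonucleotide sequences with the Lasergene software (http://www.dnastar.com/). Using Lasergene's PrimerSelect, the energy of the duplex formed between two sequences indicates the affinity between them, which allows the most stable dimer overall formed by a DNA tag adapter, and its energy value, to be predicted; the larger the absolute energy value (kcal/mol), the more stable the duplex. The 20 DNA tag adapters shown in Table 1 were each analysed for structural stability and affinity in this way, and the results show that the "Y-shaped" structures they form are very stable.
According to some embodiments, the invention provides DNA tag adapters that contain the DNA tags according to embodiments of the invention and are preferably used as both the 5' and the 3' adapter. The set of tag adapters comprises or consists of at least 5, at least 10, at least 15, or all 20 of the tag adapters in Table 1. According to specific examples, the tag adapters preferably include at least DNA Index-1F/R_adapter to DNA Index-5F/R_adapter, or DNA Index-6F/R_adapter to DNA Index-10F/R_adapter, or DNA Index-11F/R_adapter to DNA Index-15F/R_adapter, or DNA Index-16F/R_adapter to DNA Index-20F/R_adapter of the 20 tag adapters shown in Table 1, or any combination of two or more of these groups. According to specific examples, a difference of 1 base includes the substitution, addition or deletion of 1 base in the tag sequence. According to embodiments, the invention also provides the use of the DNA tag adapters for constructing and sequencing MeDIP-seq libraries, the tag adapters preferably being used as both the 5' and the 3' adapter of the MeDIP-seq library. Accordingly, embodiments of the invention also provide MeDIP-seq libraries constructed with the above DNA tag adapters.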
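According to another aspect, the invention provides a method of constructing a MeDIP-seq library of sample DNA using the above oligonucleotides (DNA tag adapters). Specifically, according to embodiments, the method comprises the following.
First, the sample DNA is fragmented to obtain DNA fragments. According to embodiments, the source of the sample DNA is not particularly limited and may be any plant, animal or microorganism. In specific examples, the DNA sample comes from at least one of a mammal, a plant and an insect. In one embodiment, the sample DNA is at least one selected from human and mouse genomic DNA. The inventors found that the method according to embodiments of the invention can be used to construct MeDIP-seq libraries efficiently for many common model organisms. According to embodiments, the sample DNA is fragmented by nebulization, sonication, HydroShear or enzymatic digestion, preferably by sonication. According to embodiments, the DNA fragments are 200-400 bp long, which further improves the efficiency of MeDIP-seq library construction and subsequent sequencing.
Second, the DNA fragments are end-repaired to obtain end-repaired DNA fragments. According to embodiments, end repair is carried out with T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase.
Next, a base A is added to the ends of the end-repaired DNA fragments to obtain DNA fragments with a sticky end A. According to embodiments, the base A is added using Klenow (3'-5' exo-) enzyme.
Next, the DNA fragments with a sticky end A are ligated to one of the set of isolated oligonucleotides according to embodiments of the invention to obtain tagged ligation products. According to embodiments, one oligonucleotide selected from the set is ligated to both ends of each DNA fragment with a sticky end A.
Then the tagged ligation products are captured with a methylation-specific antibody to obtain tagged ligation products containing methylated DNA. According to embodiments, the methylation-specific antibody is a 5mC antibody. According to some specific examples, the tagged ligation products must be denatured by heat or NaOH before capture with the methylation-specific antibody.
Finally, the tagged ligation products containing methylated DNA are isolated and amplified; these products constitute the MeDIP-seq library. According to embodiments, the amplification is carried out by PCR using oligonucleotides with the sequences shown in SEQ ID NO: 63 and 64 as primers and a hot-start Taq enzyme.
Further, according to embodiments, the invention provides a method of constructing a MeDIP-seq library comprising: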
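Step 1: Fragmentation of the sample DNA
The starting material may be from any species, including plants, animals and microorganisms, for example humans, plants and insects, and in particular genomic DNA of mammals including human and mouse. Fragmentation methods include nebulization, sonication, HydroShear and enzymatic digestion, which break the genomic DNA into fragments preferably 200-400 bp in size; sonication is the preferred fragmentation method.
Step 2: End repair of the DNA fragments and addition of a base "A" at the 3' end
The fragmented DNA is end-repaired by enzymes including but not limited to T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase, giving random blunt-ended DNA fragments. An "A" base is then added to the 3' end of the blunt-ended fragments by an enzyme including but not limited to Klenow (3'-5' exo-).
Step 3: Ligation and quantification of the tag adapters
Using an enzyme including but not limited to T4 DNA ligase, different tag adapters are ligated to the ends of the A-tailed random DNA fragments, preferably to the 5' and 3' ends simultaneously. The concentration of the ligation products is then measured by a method including but not limited to real-time quantitative PCR to determine the effective concentration of each sample.
Step 4: Sample pooling, quantification and immunoreaction
Equal amounts of the ligation products carrying different tag adapters are pooled, with the total amount kept at 1-3 µg, preferably 1.5-2 µg. An exogenous methylated positive control and an unmethylated negative control are preferably added to the pooled sample to determine capture efficiency. The pooled sample is then denatured by heat or NaOH, and a methylation-specific antibody, preferably a 5mC antibody, is added for the immunoreaction (IP).
The exogenous methylated positive control is a segment of known sequence (for example a 200-300 bp DNA sequence) containing a defined set of CG sites (for example 5 CG sites). In the positive control all of these sites are methylated (by prior treatment with a methyltransferase); in the unmethylated negative control none of them is methylated. The antibody therefore enriches the methylated control but not the unmethylated one. Because primers have been designed for these 200-300 bp fragments, the enrichment can be measured by QPCR. Positive and negative controls are well known to those skilled in the art.
Step 5: Q-PCR testing of the captured DNA
The DNA captured by the immunoreaction (IP) is purified and tested by Q-PCR for enrichment efficiency; the efficiency with which the antibody captured methylated DNA is assessed from the Ct values of the original pooled sample and the captured DNA.
Step 6: PCR amplification and library size selection
The purified IP-captured DNA is amplified by low-cycle PCR, preferably 8-10 cycles; the amplified product is the multi-sample pooled MeDIP-seq sequencing library. Fragment size is preferably selected by cutting the PCR product from a 2% agarose gel; the excised and purified target band is the MeDIP-seq library ready for sequencing. A hot-start Taq enzyme is preferably used for the PCR amplification.
According to one specific example, the tag adapters used in the above method of constructing a MeDIP-seq library are the DNA tag adapters according to embodiments of the invention.
With the method of constructing a MeDIP-seq library according to embodiments of the invention, the DNA tags according to embodiments of the invention can be introduced efficiently into the MeDIP-seq library constructed from a DNA sample. Sequencing the MeDIP-seq library then yields both the methylation information of the DNA sample and the sequence of the DNA tag, so that the source of the DNA sample can be identified. Constructing MeDIP-seq libraries for many DNA samples at once by this method greatly reduces sample preparation time and reagent consumption, makes efficient, low-cost MeDIP-seq library preparation a reality, and makes MeDIP-seq population studies of large numbers of clinical samples possible. In addition, the inventors surprisingly found that when MeDIP-seq libraries containing the various DNA tags are constructed from the same sample by this method using oligonucleotides with different tags, the resulting sequencing data are highly stable and reproducible.
According to a further aspect, the invention provides a kit for constructing a MeDIP-seq library of sample DNA. According to embodiments, the kit comprises a set of isolated oligonucleotides, each having a first strand and a second strand, the first strand consisting of the nucleotides shown in SEQ ID NO: (3N-1) and the second strand of the nucleotides shown in SEQ ID NO: (3N), where N takes the same value for both strands of the same oligonucleotide and is any integer from 1 to 20, and where each oligonucleotide of the set is provided in a separate container. With this kit, the DNA tags according to embodiments of the invention can be conveniently introduced into the MeDIP-seq library being constructed. Those skilled in the art will understand that the kit may also contain other conventional components for constructing MeDIP-seq libraries, which are not described further here.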
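MeDIP-seq library and method of determining the methylation information of a DNA sample
According to yet another aspect, the invention provides a MeDIP-seq library constructed by the method of constructing a MeDIP-seq library according to the invention. This tagged MeDIP-seq library can be used effectively with high-throughput sequencing technologies such as Solexa, so that the methylation information obtained for the sample DNA can be assigned precisely to its sample of origin by reading the tag sequence.
According to yet another aspect, the invention provides a method of determining the methylation information of a DNA sample. According to embodiments, the method comprises: establishing a MeDIP-seq library of the DNA sample by the method of constructing a MeDIP-seq library according to embodiments of the invention; and sequencing the MeDIP-seq library to determine the methylation information of the DNA sample. This method efficiently yields the methylation information of the DNA sample in the MeDIP-seq library together with the sequence of the DNA tag, so that the source of the DNA sample can be identified. In addition, the inventors surprisingly found that determining the methylation information of a DNA sample in this way effectively reduces bias in data output and allows multiple MeDIP-seq libraries to be distinguished precisely. According to embodiments, the constructed MeDIP-seq library can be sequenced by any known method; the type of method is not particularly limited. According to some specific examples, the MeDIP-seq library is sequenced using at least one technology selected from Solexa, SOLiD, 454, True Single Molecule DNA sequencing, SMRT™ and nanopore sequencing, for example at least one of Solexa, SOLiD, 454, PacBio SMRT™ and nanopore sequencing.
Further, the above method of determining the methylation information of a DNA sample can be applied to a plurality of samples. For example, according to embodiments, the invention provides a method of determining the methylation information of a plurality of DNA samples, comprising the following steps. For each of the plurality of samples, a MeDIP-seq library of the DNA sample is established independently by the method of constructing a MeDIP-Seq library according to embodiments of the invention, wherein different DNA samples use tags of mutually different, known sequences. The MeDIP-Seq libraries of the different DNA samples are then mixed to obtain a MeDIP-Seq library mixture. The mixture is then sequenced to obtain the tag sequences and the methylation information of the DNA samples, and the methylation information is classified by tag sequence to obtain the methylation information of the plurality of samples. The expression "mixing the MeDIP-Seq libraries of different DNA samples to obtain a MeDIP-Seq library mixture" as used herein should be understood broadly: the MeDIP-Seq libraries may be constructed independently and then mixed, or intermediate products may be mixed during library preparation and a MeDIP-Seq library containing the various tags prepared afterwards, as long as the sequence of the DNA tag used for each sample is known. According to embodiments, the MeDIP-seq library mixture is sequenced using at least one technology selected from Solexa, SOLiD, 454, True Single Molecule DNA sequencing, SMRT™ and nanopore sequencing, for example at least one of Solexa, SOLiD, 454, PacBio SMRT™ and nanopore sequencing. This method makes full use of high-throughput sequencing technologies such as Solexa sequencing to sequence the MeDIP-Seq libraries of many samples simultaneously, which improves the efficiency and throughput of MeDIP-Seq library sequencing and the efficiency of determining the methylation information of multiple DNA samples.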
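The classification step, sorting reads from the pooled library by tag sequence, can be sketched as follows. This is not the inventors' pipeline: it assumes that the 6-base tag is the first 6 bases of each read, it tolerates one mismatch (which the 3-base minimum difference between tags permits), and the sample names and reads are hypothetical.

```python
def hamming(a, b):
    """Number of mismatches between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(observed_tag, known_tags, max_mismatches=1):
    """Return the unique sample whose tag matches within max_mismatches, else None."""
    hits = [sample for sample, tag in known_tags.items()
            if len(tag) == len(observed_tag)
            and hamming(observed_tag, tag) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

def demultiplex(reads, known_tags, tag_len=6):
    """Group reads by sample; the tag is stripped, unassigned reads go under None."""
    bins = {sample: [] for sample in known_tags}
    bins[None] = []
    for read in reads:
        sample = assign_sample(read[:tag_len], known_tags)
        bins[sample].append(read[tag_len:])
    return bins

# Hypothetical sample names mapped to two tags from Table 1.
known_tags = {"sample_14": "ATGTCA", "sample_15": "ATTCCT"}
# Hypothetical reads: exact tag, tag with one error, unknown tag.
reads = ["ATGTCATGGGG", "ATGACATCCCC", "GGGGGGTAAAA"]

bins = demultiplex(reads, known_tags)
for sample, seqs in bins.items():
    print(sample, seqs)
```

Reads whose leading bases match no known tag, or match more than one, are kept under `None` rather than guessed, which corresponds to the invalid sequences counted in the example below.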
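It should be noted that the method of determining DNA sample sequence information according to embodiments of the invention was completed by the inventors only after arduous creative work and optimization.
The invention is explained below with reference to an example. Those skilled in the art will understand that the example serves only to illustrate the invention and should not be regarded as limiting its scope. Where no specific technique or condition is stated, the techniques or conditions described in the literature of the field (for example J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or in the product instructions were followed. Reagents and instruments for which no manufacturer is given are conventional, commercially available products, obtainable for example from Illumina.
Example 1
Using the method of constructing a MeDIP-Seq library according to embodiments of the invention, we constructed one mixed library of 6 samples, starting from 6 human peripheral blood genomic DNA samples (2 µg each). The quality of the library was checked by TA cloning, and high-throughput sequencing and comparative analysis were then performed.
1. Experimental section:
List of primers used for PCR and QPCR
[Figure imgf000011_0001: primer table, an image in the original publication]
1.1 DNA fragmentation
Using a Covaris (E-series) sonicator, 2 µg of each human genomic DNA sample described above was added to a well of a 96-well plate, diluted with TE to a volume of 80 µl, sealed with heat-seal film and centrifuged at 2000 rpm for 1 min. The DNA was sheared for 240 s (four rounds of 60 s) at 20% power, intensity 10 and 500 cycles, with the interval time set to 8. After shearing, the samples were checked by 2% agarose gel electrophoresis; samples that passed (DNA fragments distributed between 200 and 400 bp, no protein or RNA contamination) were purified with Ampure Beads (Agencourt) and recovered in 42 µl of Elution Buffer (EB).
1.2 End repair of the DNA fragments and addition of a base "A" at the 3' end
To 42 µl of DNA solution were added 5 µl of 10x T4 polynucleotide kinase buffer, 0.4 µl of 25 mM dNTP, 1.2 µl of T4 DNA polymerase, 0.2 µl of Klenow polymerase and 1.2 µl of T4 polynucleotide kinase (also called T4 PNK), and the mixture was incubated at 20 °C for 30 minutes to blunt the ends of the fragmented DNA. The blunt-ended DNA fragments were purified with Ampure Beads (Agencourt) into 22 µl of Elution Buffer (EB).
A base "A" was then added to the 3' end of the recovered fragments as follows: 2.3 µl of 10x Blue buffer, 0.5 µl of 5 mM dATP and 0.5 µl of Klenow polymerase (3'-5' exo-) were added to 19.7 µl of recovered DNA solution, and the mixture was incubated at 37 °C for 30 minutes and purified with Ampure Beads into 25 µl of Elution Buffer (EB).
1.3 Tag adapter ligation
10 µl each of the synthesized 100 µM Index-NF_adapter and Index-NR_adapter were mixed, held at 94 °C for 5 minutes, placed in a 65 °C water bath for 15 minutes and then left to cool naturally, giving a 50 µM annealed Index Adapter product.
To the A-tailed sample were added 25 µl of 2x Rapid ligation buffer, 1 µl of Index Adapter (50 µM) and 3 µl of T4 DNA ligase, and the mixture was incubated at 20 °C for 15 minutes. Equal amounts of each sample were pooled, purified with Ampure Beads and recovered in 32 µl of Elution Buffer (EB) for later use.
1.4 Q-PCR quantification
The adapter-ligated DNA fragments were quantified by Q-PCR [9]; the reaction system was as follows: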
DNA                  1 µl
QPCR primer 1.1      1 µl
QPCR primer 2.1      1 µl
dNTP (2.5 mM)        2 µl
EvaGreen             1.25 µl
Rox                  0.5 µl
Taq enzyme           0.25 µl
H2O                  18 µl
Total                25 µl
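1.5 Immunoprecipitation and elution of captured DNA
The 6 ligation products carrying different tag adapters were pooled 1:1 according to the concentrations measured by QPCR, incubated at 95 °C for 3 minutes and cooled quickly in an ice-water mixture. 1.5 µl of antibody and 15 µl of magnetic beads (Magnetic Methylated DNA Immunoprecipitation Kit) were added, and the immunoreaction was incubated overnight at 4 °C with rotation.
To each immunoreaction were added 100 µl of Buffer DIB and 2 µl of proteinase K (supplied with the Magnetic Methylated DNA Immunoprecipitation Kit). The mixture was reacted at 55 °C for 15 min and then at 100 °C for 15 min and centrifuged at 14,000 rpm for 5 minutes at 4 °C. The supernatants were transferred to new EP tubes, purified by phenol-chloroform extraction and finally dissolved in 32 µl of Elution Buffer (EB).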
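The 1:1 pooling by measured concentration amounts to taking, from each ligation product, the volume that contains an equal share of the target total. The sketch below shows that arithmetic. The concentrations are hypothetical values; the 1800 ng total is chosen to lie within the 1.5-2 µg range that the method description gives as preferred.

```python
def pooling_volumes(conc_ng_per_ul, total_ng):
    """Volume (µl) of each sample that contributes an equal mass to the pool."""
    if not conc_ng_per_ul:
        raise ValueError("no samples given")
    if any(c <= 0 for c in conc_ng_per_ul.values()):
        raise ValueError("concentrations must be positive")
    per_sample_ng = total_ng / len(conc_ng_per_ul)
    return {name: per_sample_ng / c for name, c in conc_ng_per_ul.items()}

# Hypothetical QPCR-derived concentrations (ng/µl) for six tagged samples.
concentrations = {
    "index1": 40.0, "index2": 50.0, "index3": 60.0,
    "index4": 30.0, "index5": 75.0, "index6": 100.0,
}
volumes = pooling_volumes(concentrations, total_ng=1800.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} µl")
```

Each sample here contributes 300 ng, so a more dilute sample simply requires a larger volume.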
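1.6 PCR amplification
The DNA fragments recovered after MeDIP capture were amplified by PCR in the following reaction system:
MeDIP-captured DNA              32 µl
10x PCR buffer                  5 µl
Index primer 1.1                2.5 µl
Index primer 2.1                2.5 µl
MgSO4 (50 mM)                   2 µl
dNTP (2.5 mM)                   5 µl
Platinum Pfx DNA Polymerase     1.0 µl
Total volume                    25 µl
PCR conditions: 10 cycles
[Figure imgf000013_0001: PCR cycling program, an image in the original publication]
Hold at 4 °C
The PCR amplification product was gel-purified with the QIAquick Gel Extraction Kit (Qiagen) and dissolved in 30 µl of Elution Buffer (EB); 5 µl was taken for TA clone testing and the remaining library was used for sequencing.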
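1.7 Library testing
1) Library yield was checked with an Agilent 2100 Bioanalyzer.
2) Library yield was quantified by QPCR (see, for example, Bernd Buehler, Holly H. Hogrefe, Graham Scott et al. Rapid quantification of DNA libraries for next-generation sequencing. Methods. 2010. 50:S15-S18, incorporated herein by reference in its entirety).
2. Results:
2.1 TA clone test results for the library
Table 2.1: TA clone test results for the library
[Figure imgf000013_0002: Table 2.1, an image in the original publication]
TA clone testing of the mixed library gave 51 valid sequences. A tag could be identified in 45 of the 51 (tag efficiency 88.24%), and 40 of those 45 could be aligned to the genome (88.89%). Both the tag efficiency and the genome alignment rate were above 85%, indicating good library quality. In terms of tag distribution, the least frequently detected tags were index1, index4 and index6, with 6 sequences each (13% of all valid indexes); the most frequently detected was index 3, with 10 sequences (22% of all valid indexes). Overall, the tags were detected with good randomness, indicating that the method is effective and feasible.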
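The percentages quoted for the TA clone test follow directly from the reported counts, as the short check below shows. Only the counts stated in the text are used; the helper name is illustrative.

```python
def percent(part, whole):
    """part as a percentage of whole."""
    return part / whole * 100

valid_sequences = 51   # valid TA clone sequences
with_tag = 45          # sequences in which a tag was identified
mapped = 40            # tagged sequences that aligned to the genome

print(f"tag efficiency:   {percent(with_tag, valid_sequences):.2f}%")
print(f"alignment rate:   {percent(mapped, with_tag):.2f}%")
print(f"share of 6 reads: {percent(6, with_tag):.0f}% of valid indexes")
print(f"share of 10 reads: {percent(10, with_tag):.0f}% of valid indexes")
```

With six tags, a perfectly even distribution would give each tag about 16.7% of the 45 tagged sequences, so the observed 13% to 22% range is a modest spread for such a small sample.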
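2.2 Advanced analysis and comparison of the sequencing (Illumina GA) data
1) Overall alignment results
Table 2.2-1: Overall data analysis of the library
[Figure imgf000013_0003: Table 2.2-1, an image in the original publication]
The overall data analysis of the library shows that all 6 tagged samples were identified effectively and fairly evenly, and that the unique genome alignment rate of the valid data was above 70% for each. This shows that the sequencing results agree with the TA clone results and that the sequencing data from libraries constructed by this method are usable.
2) Comparison of the average methylation level of the intervals covered by the sequencing data
Table 2.2-2: Average methylation level of the covered intervals for the 6 samples
[Figure imgf000014_0001: Table 2.2-2, an image in the original publication]
Analysis of the average methylation level of the covered intervals showed very small differences between the 6 samples, with methylation levels of about 70% in each. This indicates that MeDIP library construction enriched highly methylated regions and that the differences between samples were very small.
3) Correlation analysis between samples
Correlation analysis shows the relationship between the data of 2 samples, that is, whether they cover the same highly methylated intervals; the better the correlation in highly methylated intervals, the more successful the experiment. The correlation analysis parameters for this experiment were as follows: the data volume was first normalized; then, taking 1 kb as the unit, a window counted as one valid coverage unit if more than 50% of it was covered and more than 5 sequences covered it. The 2 samples were then compared for their coverage of such 1 kb windows.
Figure 2 shows the correlation analysis of enriched fragments between 2 of the 6 DNA samples whose methylation information was obtained by the method of determining the methylation information of multiple DNA samples according to an embodiment of the invention. Specifically, index1 and index2 were selected, 1 kb was taken as the window length, and the number of fragments in each window was counted. The results show that fragment coverage in high-coverage regions is well correlated between samples. This indicates that when the methylation information of multiple DNA samples is determined by library construction and sequencing according to embodiments of the invention, all samples are effectively enriched for highly methylated regions, and no differences in enrichment between samples arise from the experimental method.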
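The window rule described above (1 kb windows, more than 50% of the window covered, more than 5 covering sequences) can be sketched as follows. This is an illustrative reading of the stated parameters rather than the inventors' analysis code: reads are given as half-open `(start, end)` intervals on a single chromosome, a read is counted in every window it overlaps, the normalization step is not shown, and all coordinates are hypothetical.

```python
from collections import defaultdict

def valid_windows(reads, window=1000, min_fraction=0.5, min_reads=5):
    """Map window index -> read count for windows passing both thresholds."""
    counts = defaultdict(int)
    covered = defaultdict(set)
    for start, end in reads:
        # Visit every window that the read [start, end) overlaps.
        for w in range(start // window, (end - 1) // window + 1):
            counts[w] += 1
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            covered[w].update(range(lo, hi))
    return {
        w: counts[w]
        for w in counts
        if len(covered[w]) / window > min_fraction and counts[w] > min_reads
    }

def shared_windows(sample_a, sample_b):
    """Paired read counts for windows that are valid in both samples."""
    common = sorted(set(sample_a) & set(sample_b))
    return [(w, sample_a[w], sample_b[w]) for w in common]

# Hypothetical reads: window 0 passes, window 1 has too few reads,
# window 2 is less than 50% covered.
reads_a = [(0, 600)] * 6 + [(1000, 1900)] * 5 + [(2000, 2400)] * 6
reads_b = [(100, 800)] * 8

a = valid_windows(reads_a)
b = valid_windows(reads_b)
print(shared_windows(a, b))
```

The paired counts returned by `shared_windows` are the kind of per-window values that a scatter plot or a Pearson coefficient between two samples would be computed from.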
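Industrial applicability
The DNA tags and oligonucleotides of the invention for constructing a MeDIP-seq library of sample DNA, the MeDIP-seq library and its preparation method, the method of determining the methylation information of a DNA sample, the method of determining the methylation information of a plurality of DNA samples, and the kit for constructing a MeDIP-seq library of sample DNA can be applied to the enrichment of methylated genomic DNA and can effectively increase the sequencing throughput of sequencing platforms such as the Solexa platform.
Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions can be made to those details, and that all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In this specification, a description referring to "one embodiment", "some embodiments", "an illustrative embodiment", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.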

Claims

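1. A set of isolated DNA tags consisting of the nucleotides shown in SEQ ID NO: (3N-2), where N is any integer from 1 to 20.
2. A set of isolated oligonucleotides, the isolated oligonucleotides having a first strand and a second strand, the first strand consisting of the nucleotides shown in SEQ ID NO: (3N-1) and the second strand consisting of the nucleotides shown in SEQ ID NO: (3N), wherein N takes the same value for the first strand and the second strand of the same oligonucleotide and N is any integer from 1 to 20.
3. A method of constructing a MeDIP-seq library of sample DNA, characterized by comprising the following steps:
fragmenting the sample DNA to obtain DNA fragments;
end-repairing the DNA fragments to obtain end-repaired DNA fragments;
adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments with a sticky end A;
ligating the DNA fragments with a sticky end A to one oligonucleotide selected from the set of isolated oligonucleotides of claim 2 to obtain tagged ligation products;
capturing the tagged ligation products with a methylation-specific antibody to obtain tagged ligation products containing methylated DNA; and
isolating and amplifying the tagged ligation products containing methylated DNA, the tagged ligation products containing methylated DNA constituting the MeDIP-seq library.
4. The method of claim 3, characterized in that the sample DNA comes from at least one of a mammal, a plant and an insect.
5. The method of claim 4, characterized in that the sample DNA is at least one selected from human and mouse genomic DNA.
6. The method of claim 3, characterized in that the sample DNA is fragmented by nebulization, sonication, HydroShear or enzymatic digestion.
7. The method of claim 3, characterized in that the DNA fragments are 200-400 bp long.
8. The method of claim 3, characterized in that the end repair is carried out with T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase.
9. The method of claim 3, characterized in that the base A is added to the ends of the DNA fragments using Klenow (3'-5' exo-) enzyme.
10. The method of claim 3, characterized in that one oligonucleotide selected from the set of isolated oligonucleotides of claim 2 is ligated to each of the two ends of the DNA fragments with a sticky end A.
11. The method of claim 3, characterized in that the methylation-specific antibody is a 5mC antibody.
12. The method of claim 11, characterized in that the tagged ligation products are denatured by heat or NaOH before being captured with the methylation-specific antibody.
13. The method of claim 3, characterized in that the tagged ligation products containing methylated DNA are amplified by PCR, the PCR using oligonucleotides with the sequences shown in SEQ ID NO: 63 and 64 as primers and a hot-start Taq enzyme.
14. A MeDIP-seq library constructed by the method of any one of claims 3-13.
15. A method of determining the methylation information of a DNA sample, characterized by comprising the following steps:
establishing a MeDIP-seq library of the DNA sample by the method of any one of claims 4-9; and
sequencing the MeDIP-seq library to determine the methylation information of the DNA sample.
16. The method of claim 15, characterized in that the MeDIP-seq library is sequenced using at least one technology selected from Solexa, SOLiD, 454, PacBio SMRT™ and nanopore sequencing.
17. A method of determining the methylation information of a plurality of DNA samples, characterized by comprising the following steps:
for each of the plurality of samples, independently establishing a MeDIP-seq library of the DNA sample by the method of any one of claims 4-9, wherein different DNA samples use tags of mutually different, known sequences;
mixing the MeDIP-Seq libraries of the different DNA samples to obtain a MeDIP-Seq library mixture;
sequencing the MeDIP-Seq library mixture to obtain the tag sequences and the methylation information of the DNA samples; and
classifying the methylation information of the DNA samples by tag sequence to obtain the methylation information of the plurality of samples.
18. The method of claim 17, characterized in that the MeDIP-seq library mixture is sequenced using at least one technology selected from Solexa, SOLiD, 454, PacBio SMRT™ and nanopore sequencing.
19. A kit for constructing a MeDIP-seq library of sample DNA, characterized by comprising:
a set of isolated oligonucleotides, the isolated oligonucleotides having a first strand and a second strand, the first strand consisting of the nucleotides shown in SEQ ID NO: (3N-1) and the second strand consisting of the nucleotides shown in SEQ ID NO: (3N), wherein N takes the same value for the first strand and the second strand of the same oligonucleotide and N is any integer from 1 to 20,
wherein each oligonucleotide of the set of isolated oligonucleotides is provided in a separate container.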
PCT/CN2011/079907 2010-09-21 2011-09-21 Dna标签及其应用 WO2012037884A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010299246.5 2010-09-21
CN201010299246.5A CN102409042B (zh) 2010-09-21 2010-09-21 一种高通量基因组甲基化dna富集方法及其所使用标签和标签接头

Publications (1)

Publication Number Publication Date
WO2012037884A1 true WO2012037884A1 (zh) 2012-03-29

Family

ID=45873448

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079907 WO2012037884A1 (zh) 2010-09-21 2011-09-21 Dna标签及其应用

Country Status (2)

Country Link
CN (1) CN102409042B (zh)
WO (1) WO2012037884A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109182465A (zh) * 2018-08-03 2019-01-11 中山大学 一种高通量核酸表观遗传修饰定量分析方法

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
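WO2012019320A1 (zh) * 2010-08-11 2012-02-16 中国科学院心理研究所 A high-throughput sequencing method for methylated DNA and application thereof
US20150252359A1 (en) * 2012-11-21 2015-09-10 Berry Genomics Co., Ltd Method for tracking test sample by second-generation DNA sequencing technology and detection kit
CN104450872A (zh) * 2013-09-25 2015-03-25 上海市肿瘤研究所 A high-throughput, multi-sample, multi-target method for detecting methylation levels at single-base resolution
CN104005090B (zh) * 2014-05-28 2016-08-17 北京诺禾致源生物信息科技有限公司 Method for constructing a high-throughput sequencing library from low-quality sample DNA
CN104480214B (zh) * 2014-12-30 2018-01-16 深圳市易基因科技有限公司 Long sequence tag sequencing technique for hydroxymethylation and methylation
CN106048009B (zh) * 2016-06-03 2020-02-18 人和未来生物科技(长沙)有限公司 A tag adapter for detecting ultra-low-frequency gene mutations and application thereof
WO2018090373A1 (zh) * 2016-11-21 2018-05-24 深圳华大智造科技有限公司 A method for DNA end repair and A-tailing
CN108251504A (zh) * 2018-01-17 2018-07-06 翌圣生物科技(上海)有限公司 A method and kit for ultra-fast construction of genomic DNA sequencing libraries
CN108796057A (zh) * 2018-06-29 2018-11-13 上海交通大学 A method and kit for detecting whole-genome DNA methylation in small-quantity samples
WO2020135347A1 (zh) * 2018-12-29 2020-07-02 深圳华大生命科学研究院 A method, kit, device and application for DNA methylation detection
CN114381501A (zh) * 2021-12-30 2022-04-22 翌圣生物科技(上海)股份有限公司 A simple high-throughput DNA methylation detection method
CN117821575B (zh) * 2024-03-06 2024-06-07 纳昂达(南京)生物科技有限公司 Method for detecting DNA methylation level and application thereof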

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1103616A2 (en) * 1989-02-24 2001-05-30 Monsanto Company Synthetic plant genes and method for preparation
US20060282914A1 (en) * 2003-11-18 2006-12-14 D Halluin Kathleen Targeted dna insertion in plants

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2121983A2 (en) * 2007-02-02 2009-11-25 Illumina Cambridge Limited Methods for indexing samples and sequencing multiple nucleotide templates
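BRPI0807952A2 (pt) * 2007-02-20 2014-06-10 Anaptysbio Inc Somatic hypermutation systems
CN100564618C (zh) * 2007-06-13 2009-12-02 北京万达因生物医学技术有限责任公司 Parallel detection by molecular replacement tag sequencing, i.e. microsphere array analysis of oligonucleotide code tag molecular libraries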

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FILIPE, V. J. ET AL.: "Methyl-DNA immunoprecipitation (MeDIP): Hunting down the DNA methylome", BIOTECHNIQUES, vol. 44, no. 1, 31 January 2008 (2008-01-31), pages 35 - 43 *
TAN, JIANXIN ET AL.: "Progresses of methods for epigenomics study", HEREDITAS, vol. 31, no. 1, 15 January 2009 (2009-01-15), pages 3 - 12 *

Also Published As

Publication number Publication date
CN102409042A (zh) 2012-04-11
CN102409042B (zh) 2014-05-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11826411

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/08/2013)

122 Ep: pct application non-entry in european phase

Ref document number: 11826411

Country of ref document: EP

Kind code of ref document: A1