WO2013075629A1 - Method for detecting hydroxymethylation modification of nucleic acids and application thereof - Google Patents

Method for detecting hydroxymethylation modification of nucleic acids and application thereof

Info

Publication number
WO2013075629A1
Authority
WO
WIPO (PCT)
Prior art keywords
linker
nucleic acid
seq
sequencing
control
Prior art date
Application number
PCT/CN2012/084964
Other languages
English (en)
French (fr)
Inventor
高飞
王君文
张秀清
杨焕明
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司, 深圳华大基因研究院
Priority to US14/360,594 (US9567633B2)
Publication of WO2013075629A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/683 Hybridisation assays for detection of mutation or polymorphism involving restriction enzymes, e.g. restriction fragment length polymorphism [RFLP]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers

Definitions

  • the invention belongs to the technical field of genetic engineering, and in particular relates to a method for detecting hydroxymethylation modification of a nucleic acid and an application thereof.
  • Background art
  • 5-Hydroxymethylcytosine (5hmC) was first discovered in bacteriophage DNA in 1952 and has recently been found in mammalian genomes, for example in mouse neurons and embryonic stem cells. A large number of studies currently focus on revealing the possible role of 5hmC in genomic organization and stem cell differentiation, and have demonstrated that the TET family of enzymes can convert 5mC to 5hmC by oxidation.
  • Another object of the invention is to provide an application of the method.
  • a method of detecting a hydroxymethylation modification of a nucleic acid, comprising the steps of: (1) performing a glycosylation treatment on the nucleic acid to obtain a glycosylated nucleic acid in which the hydroxymethylated bases have been converted to glycosylated hydroxymethyl bases;
  • the first control nucleic acid fragment, the sample nucleic acid fragment and the second control nucleic acid fragment obtained in step (2) are each ligated to the biotin-labeled linker to obtain a first control ligation product, a sample ligation product and a second control ligation product, each carrying the biotin linker;
  • the first control final digestion product, the sample final digestion product and the second control final digestion product obtained in step (6) are each ligated to the sequencing linker, and the sequencing linker ligation products are amplified to obtain the first control sequencing library.
  • the nucleic acid of step (1) is genomic DNA.
  • the nucleic acid of step (1) is derived from an animal, a plant, a bacterium, a fungus, a virus, or a combination thereof.
  • the glycosylation treatment of step (1) is: under the action of the T4-BGT enzyme, with uridine diphosphate glucose (UDP-glucose) as substrate, a glucose moiety is transferred to the 5-hydroxymethylcytosine (5hmC) of the nucleic acid, forming β-glucosyl-5-hydroxymethylcytosine (5gmC).
  • the first restriction enzyme described in step (2) is MspI.
  • the second restriction enzyme described in step (2) is HpaII.
  • the sequences of the biotin-labeled linker of step (3) are as shown in SEQ ID NO: 1 and SEQ ID NO: 2.
  • step (4) further comprises: capturing the fragments produced by NlaIII digestion with streptavidin magnetic beads to obtain the first control NlaIII digestion product, the sample NlaIII digestion product and the second control NlaIII digestion product.
  • the product is digested with NlaIII to obtain a nucleic acid fragment having the biotin-labeled linker at one end and a sticky end at the other end.
  • the second linker of step (5) is formed by pairing two oligonucleotide strands; the two oligonucleotide strand sequences are SEQ ID NO: 3 and SEQ ID NO: 4, respectively; or SEQ ID NO: 5 and SEQ ID NO: 6, respectively; or SEQ ID NO: 7 and SEQ ID NO: 8, respectively.
  • the specific restriction enzyme described in step (6) is MmeI or EcoP15I.
  • in step (6), digestion with MmeI yields a fragment 20 bp in length having the second linker at one end and a sticky end at the other end.
  • in step (6), digestion with EcoP15I yields a fragment 25 bp in length having the second linker at one end and a sticky end at the other end.
  • the sequencing linker of step (7) is formed by pairing two oligonucleotide strands, whose sequences are SEQ ID NO: 9 and SEQ ID NO: 10, respectively.
  • the sequencing of step (8) is performed on any of the following sequencing platforms:
  • Illumina Solexa, Roche 454, ABI SOLiD, Helicos true single-molecule sequencing, PacBio single-molecule real-time sequencing, and Oxford Nanopore single-molecule nanopore sequencing.
  • analyzing the sequence information in step (8) comprises the steps of:
  • step (iii) calculating the methylation and hydroxymethylation levels of each CCGG site based on the normalized data obtained in step (ii);
  • the restriction endonucleases comprise MspI, HpaII, MmeI and NlaIII;
  • the restriction endonucleases comprise MspI, HpaII, EcoP15I and NlaIII;
  • the biotin-labeled linker preferably consists of a pair of two oligonucleotide strands; for example, the two oligonucleotide strand sequences are SEQ ID NO: 1 and SEQ ID NO: 2, respectively;
  • the second linker consists of a pair of two oligonucleotide strands; preferably, the two oligonucleotide strand sequences are SEQ ID NO: 3 and SEQ ID NO: 4; or SEQ ID NO: 5 and SEQ ID NO: 6; or SEQ ID NO: 7 and SEQ ID NO: 8;
  • the sequencing linker preferably consists of a pair of two oligonucleotide strands; the two oligonucleotide strand sequences are SEQ ID NO: 9 and SEQ ID NO: 10.
  • the kit further comprises: a reagent for performing magnetic bead capture, a reagent for nucleic acid purification, or a combination thereof.
  • Figure 1 shows a method for detecting hydroxymethylation modification in a preferred embodiment of the present invention.
  • Figure 2 shows the quality-control results for the different libraries: the three libraries, each ligated to Linker N and P7 and amplified by PCR, had a fragment size of 96 bp, consistent with the theoretical size.
  • Figure 2 (a) shows the fragment size distribution of the library prepared from the stem cell h9 genome by T4-BGT glycosylation modification followed by MspI digestion;
  • Figure 2 (b) shows the fragment size distribution of the library prepared from the stem cell h9 genome by direct MspI digestion;
  • Figure 2 (c) shows the fragment size distribution of the library prepared from the stem cell h9 genome by direct HpaII digestion.
  • Figure 3 shows the overall distribution of methylation and hydroxymethylation levels of CCGG sites in the sample. The abscissa is the modification level, and the ordinate is the density of CCGG sites at that modification level among all sites.
  • Figure 4 shows the methylation and hydroxymethylation levels of the detected CCGG sites on each chromosome.
  • Figure 5 shows the results of a comparison of the methylation and hydroxymethylation data with bisulfite analysis data.
  • a detection method for methylation and hydroxymethylation modification in nucleic acids; specifically, it includes the steps of: glycosylation modification of the nucleic acid and MspI digestion; ligation of the digested fragments to a biotin-labeled linker at both ends, followed by NlaIII digestion; capture with streptavidin magnetic beads, so that every captured fragment carries the biotin linker at one end and a four-base CATG overhang at the other end, and its sequence indicates the modification status of the adjacent CCGG site; ligation of a linker containing an MmeI or EcoP15I cleavage site to the CATG cohesive end and cleavage with the corresponding enzyme, so that the resulting short sequence fragment represents the modification information of the CCGG site; after sequence comparison, the methylation and hydroxymethylation information is obtained.
  • the term “containing” includes “comprise”, “consisting essentially of” and “consisting of.” As used herein, the terms “above” and “below” include the number itself; for example, “80% or more” means ≥80%, and “2% or less” means ≤2%.
  • 5-Hydroxymethylcytosine (5hmC) is a modified base produced by the oxidation of 5-methylcytosine (5mC) by the TET family of enzymes. Its ultraviolet absorption and chromatographic behavior resemble those of cytosine. 5hmC is present at low levels in a variety of cell types in mammals.
  • The 5hmC content also differs between genomes and between cells or tissues.
  • Immunoassays found that the percentage of 5hmC in brain, liver, kidney and colorectal tissues was relatively high, 0.40-0.65%; in lung tissue it was relatively low, 0.18%; and in heart, breast and placenta it was extremely low, only 0.05-0.06%. Compared with 0.46-0.57% in normal colorectal tissue, the content in cancerous colorectal tissue was only 0.02-0.06%.
  • 5hmC is mainly concentrated in the vicinity of exons and transcription initiation sites, especially in the promoter containing histone H3 lysine 27 trimethylation (H3K27me3) and histone H3 lysine 4 trimethylation (H3K4me3).
  • T4 phage β-glucosyltransferase (T4-BGT) can efficiently transfer the glucose unit of UDP-glucose to the 5-hydroxymethylcytosine residues of double-stranded DNA to form β-glucosyl-5-hydroxymethylcytosine (5gmC), and 5gmC cannot be cleaved by MspI.
  • the hydroxymethylation modification of a specific single CCGG site can be detected quantitatively by semi-quantitative PCR or Q-PCR.
  • the term "primer" is a generic term for an oligonucleotide that is complementary to a template and from which a DNA strand complementary to the template is synthesized under the action of a DNA polymerase.
  • the primers may be natural RNA or DNA and may contain any form of natural nucleotides; the primers may even contain non-natural nucleotides such as LNA or ZNA.
  • the primer is "substantially" complementary to a particular sequence on one strand of the template. The primer must be sufficiently complementary to a strand of the template to initiate extension, but the sequence of the primer does not have to be fully complementary to the sequence of the template.
  • for example, if a sequence that is not complementary to the template is added to the 5' end of a primer whose 3' end is complementary to the template, such a primer is still substantially complementary to the template.
  • primers that are not fully complementary can also form a primer-template complex with the template for amplification.
  • the "re-sequencing" of the genome enables humans to detect abnormal changes in disease-associated genes as early as possible, and to conduct in-depth research on the diagnosis and treatment of individual diseases.
  • Those skilled in the art can generally perform high-throughput sequencing using a variety of second-generation sequencing platforms: 454 FLX (Roche), Solexa Genome Analyzer (Illumina), and SOLiD from Applied Biosystems.
  • the common feature of these platforms is their extremely high sequencing throughput. Compared with the 96-capillary sequencing of traditional sequencing, high-throughput sequencing can read 400,000 to 4 million sequences in one experiment. Depending on the platform, the read length is from 25 bp.
  • Solexa high-throughput sequencing includes two steps, DNA cluster formation and on-machine sequencing: the mixture of PCR amplification products is hybridized with sequencing probes immobilized on a solid-phase carrier and subjected to solid-phase bridge PCR amplification to form sequencing clusters; the sequencing clusters are then sequenced by sequencing-by-synthesis to obtain the nucleotide sequence of the disease-associated nucleic acid molecules in the sample.
  • the DNA cluster is formed by using a flow cell with a single-stranded primer attached to the surface, and the DNA fragment of the single-stranded state is fixed by the principle of complementary pairing with the linker on the surface of the chip.
  • the fixed single-stranded DNA becomes double-stranded DNA by amplification reaction, and the double strand is denatured into a single strand, one end of which is anchored on the sequencing chip, and the other end is randomly complementary to another primer in the vicinity to be anchored.
  • the DNA clusters are sequenced on the Solexa sequencer by sequencing-by-synthesis.
  • the four bases are labeled with different fluorophores, and each base carries a blocking (protecting) group, so that only one base can be added in a single reaction. After the color of the reaction is read, the protecting group is removed and the next reaction can proceed. In this way the exact sequence of bases is obtained.
  • the Index (tag or barcode) is used to distinguish samples; after conventional sequencing is completed, an additional 7 cycles of sequencing are performed for the Index portion. Through Index identification, up to 12 different samples can be distinguished in one sequencing lane.
  • Detection method
  • the present invention provides a method for accurately detecting hydroxymethylation modification sites.
  • the method comprises the following steps (see Figure 1):
  • T4 β-glucosyltransferase (T4-BGT) glycosylation modification:
  • in the DNA of the glycosylation group, under the action of the T4-BGT enzyme and with UDP-glucose as substrate, a glucose moiety is transferred to the 5-hydroxymethylcytosine residues of double-stranded DNA, forming β-glucosyl-5-hydroxymethylcytosine (5gmC).
  • the reaction is independent of the DNA sequence, so all 5hmC residues can be glycosylated, while unmodified cytosine residues and methylated 5mC residues are not glycosylated; the control group receives no T4-BGT and undergoes no glycosylation modification.
  • the genomic DNA may be genomic DNA extracted from animal tissues, cellular genomic DNA, and the like; any sample can be analyzed by this technique as long as CCGG sites in its genomic sequence carry the C^hCGG hydroxymethylation modification.
  • MspI and HpaII differ in their sensitivity to modification: HpaII recognizes and cleaves only unmodified CCGG sites, whereas MspI recognizes and cleaves CCGG sites in several modification states (CCGG, C^mCGG and C^hCGG) but cannot cleave the C^gCGG site. In the DNA sequences of the present application, the superscript m (written here as ^m) indicates methylation and the superscript h indicates hydroxymethylation.
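The cleavage rules in the preceding paragraph can be written as a small lookup table. This is a minimal sketch for illustration, not the inventors' implementation; the state labels (`C`, `mC`, `hmC`, `gmC`) and the function names are assumptions introduced here.

```python
# Which CCGG modification states each enzyme cleaves, per the text above.
# State labels are illustrative: C = unmodified, mC = methylated,
# hmC = hydroxymethylated, gmC = glucosylated hydroxymethylcytosine.
CLEAVABLE = {
    "MspI": {"C", "mC", "hmC"},  # blocked only by 5gmC
    "HpaII": {"C"},              # cuts unmodified CCGG only
}


def is_cut(enzyme: str, state: str) -> bool:
    """Return True if `enzyme` cleaves a CCGG site in modification `state`."""
    return state in CLEAVABLE[enzyme]


def state_after_glucosylation(state: str) -> str:
    """T4-BGT converts 5hmC to 5gmC and leaves every other state unchanged."""
    return "gmC" if state == "hmC" else state
```

With this table, a hydroxymethylated site is cut by MspI before glucosylation but not after it, which is the contrast the three libraries rely on.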
  • the ends of each set of DNA fragments carry different modification information:
  • the ends of the DNA fragments digested with MspI after glycosylation carry the information of the CCGG and C^mCGG sites in the genome; the ends of the DNA fragments directly digested with HpaII carry only the information of the unmodified CCGG sites; and the ends of the control DNA fragments directly digested with MspI carry the information of all CCGG, C^mCGG and C^hCGG sites in the genome.
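Because the three libraries capture nested sets of sites, per-site levels can in principle be obtained by subtraction. The formulas below are an assumption inferred from that description; the text does not state them explicitly. The parameter names are hypothetical, and the depths are assumed to be already normalized across libraries.

```python
from typing import Optional, Tuple


def modification_levels(
    d_gluc_msp: float,  # depth in the glycosylation + MspI library (C + mC)
    d_msp: float,       # depth in the direct MspI library (C + mC + hmC)
    d_hpa: float,       # depth in the direct HpaII library (C only)
) -> Optional[Tuple[float, float]]:
    """Return (methylation, hydroxymethylation) for one CCGG site.

    Assumed formulas (not given in the source):
        hmC = (d_msp - d_gluc_msp) / d_msp
        mC  = (d_gluc_msp - d_hpa) / d_msp
    Returns None when the site has no coverage in the MspI library.
    """
    if d_msp <= 0:
        return None

    def clamp(x: float) -> float:
        # Sampling noise can push an estimate outside [0, 1].
        return max(0.0, min(1.0, x))

    hmc = clamp((d_msp - d_gluc_msp) / d_msp)
    mc = clamp((d_gluc_msp - d_hpa) / d_msp)
    return mc, hmc
```

For example, normalized depths of 90, 100 and 20 would give a methylation level of 0.7 and a hydroxymethylation level of 0.1 under these assumed formulas.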
  • Biotin-linker ligation: under the action of DNA ligase, the biotin-labeled linker (Biotin-linker) is ligated to both ends of the differently treated and digested DNA fragments.
  • Streptavidin magnetic bead capture: M-280 streptavidin-conjugated magnetic beads are used to capture the DNA fragments that carry the biotin-labeled linker at one end and a 4-base (CATG) sticky end at the other end; DNA sequences with sticky ends at both ends are removed by washing, and discarding these sequences has no effect on subsequent analysis.
  • Linker N ligation: under the action of DNA ligase, the DNA fragments captured on the streptavidin-coupled magnetic beads are ligated to the linker containing the MmeI restriction endonuclease recognition site (Linker N); one end of the resulting DNA fragment is bound to the magnetic beads through the affinity of biotin and streptavidin, and the other end is linked to Linker N containing the MmeI cleavage site.
  • the recognition site for MmeI is 5'-TCCRAC-3', where R is A or G. In another preferred embodiment, it is equally feasible to replace the MmeI cleavage site in Linker N with an EcoP15I cleavage site.
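As an illustration of the degenerate recognition site above, the IUPAC code R (A or G) expands to a character class in a regular expression. This is a minimal sketch; the function name is hypothetical and only the given strand is scanned.

```python
import re
from typing import List

# IUPAC code R means a purine (A or G), so TCCRAC expands to TCC[AG]AC.
MMEI_SITE = re.compile(r"TCC[AG]AC")


def find_mmei_sites(seq: str) -> List[int]:
    """Return 0-based start positions of MmeI recognition sites on this strand."""
    return [m.start() for m in MMEI_SITE.finditer(seq.upper())]
```

A linker design could use such a check to confirm that the intended site is present exactly once; scanning the reverse complement would be needed to cover the opposite strand.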
  • MmeI or EcoP15I digestion: digestion with the restriction endonuclease MmeI, whose recognition site is contained in Linker N, generates a 20 bp insert with Linker N attached at one end and a sticky end of two arbitrary protruding bases at the other end, together with the corresponding fragment that remains bound to the magnetic beads; each fragment linked to Linker N represents the modification information of the adjacent CCGG site.
  • alternatively, digestion with the restriction endonuclease EcoP15I, whose recognition site is contained in Linker N, generates a 25 bp insert with Linker N attached at one end and a sticky end of two arbitrary protruding bases at the other end, together with the corresponding fragment that remains bound to the magnetic beads; each fragment linked to Linker N represents the modification information of the adjacent CCGG site.
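The tag that represents a CCGG site can be approximated in silico. The sketch below makes several assumptions that the text does not settle: it looks only downstream on the given strand, it takes the tag as the bases immediately 5' of the first CATG after the CCGG site, and it excludes the CATG bases themselves. The function name and parameters are hypothetical.

```python
from typing import Optional


def tag_for_site(seq: str, ccgg_pos: int, tag_len: int = 20) -> Optional[str]:
    """Return the `tag_len` bases immediately 5' of the first CATG downstream
    of the CCGG site at `ccgg_pos` (0-based).

    Returns None if no CATG follows the site or if the CCGG-to-CATG fragment
    is shorter than `tag_len`. Use tag_len=20 for MmeI or 25 for EcoP15I.
    """
    seq = seq.upper()
    if seq[ccgg_pos:ccgg_pos + 4] != "CCGG":
        raise ValueError("no CCGG at the given position")
    catg = seq.find("CATG", ccgg_pos + 4)
    if catg == -1 or catg - ccgg_pos < tag_len:
        return None
    return seq[catg - tag_len:catg]
```

Applying the same function to the reverse complement would cover the fragment on the other side of each CCGG site.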
  • PCR amplification and purification: universal PCR amplification is performed using the Linker N and P7 linker sequences; the amplification products are recovered and purified on a 6% non-denaturing PAGE gel; the recovered product is subjected to Agilent 2100 fragment size detection and Q-PCR quantitative analysis, and is then sequenced on a HiSeq 2000 sequencer.
  • On-machine sequencing and data analysis: after the library passes quality control, it is sequenced on the HiSeq 2000 sequencer with single-end reads of 50 bases. After the sequencing data are normalized, the number of 20 bp short sequences sequenced for each CCGG site is compared across the different libraries to obtain the methylation and hydroxymethylation levels of each site.
  • the analysis and comparison of the sequence information comprises the steps of: (i) filtering the raw reads of each library obtained after sequencing to obtain high-quality fragment information, and performing a computer simulation of the digestion to obtain a virtual library consisting of the theoretical digestion fragments; (ii) comparing the high-quality fragment information obtained in step (i) with the virtual library, and normalizing the statistics to obtain sequencing depth data for the three libraries; (iii) calculating the methylation and hydroxymethylation levels of each CCGG site from the normalized data obtained in step (ii); (iv) from the per-site levels obtained in step (iii), computing the overall methylation and hydroxymethylation levels of the sample and the levels on each chromosome.
  • the filter conditions include: removing the linker sequence from the raw library reads; removing reads in which the number of N bases exceeds 10% of the total number of bases; and removing reads in which the number of bases with a quality value below 20 exceeds 10% of the total number of bases.
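The two numeric filter conditions can be sketched as a single predicate. This is an illustrative sketch, not the inventors' pipeline: linker trimming is assumed to have been done already, quality values are assumed to be decoded Phred integers, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def passes_filter(
    seq: str,
    quals: Sequence[int],
    max_n_frac: float = 0.10,     # reads with more than 10% N bases are dropped
    min_q: int = 20,              # a base below this quality counts as low quality
    max_lowq_frac: float = 0.10,  # reads with more than 10% low-quality bases are dropped
) -> bool:
    """Return True if a linker-trimmed read survives both filter conditions."""
    n = len(seq)
    if n == 0 or len(quals) != n:
        return False
    n_frac = seq.upper().count("N") / n
    lowq_frac = sum(q < min_q for q in quals) / n
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac
```

Because the text says "exceeds 10%", a read sitting exactly at the 10% boundary is kept in this sketch.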
  • Normalization includes the steps of: ranking the CCGG sites of each library by sequencing depth, so that each CCGG site receives a rank value in each library; calculating, for each site, the variance of its three rank values; removing the sites with larger variance over n iterations, and taking the last m sites as the normalization reference, where m and n are positive integers; in another preferred example, m is in the range 5000-15000 and n > 4; and normalizing the libraries according to the ratio between the total sequencing depths of the m rank-stable sites in each library.
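The rank-based normalization described above can be sketched as follows. This is a simplified illustration under stated assumptions: the n rounds of variance trimming are collapsed into a single pass that keeps the m sites with the smallest rank variance, ties are broken by site order, and the first library is used as the scaling baseline. All names are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def rank(values: List[float]) -> List[int]:
    """Rank sites by depth (0 = lowest); ties are broken by site order."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def scale_factors(depths: Dict[str, List[float]], m: int) -> Dict[str, float]:
    """Return one multiplicative scale factor per library.

    `depths` maps a library name to per-site depths, with sites in the same
    order in every library. The m sites whose ranks vary least across the
    libraries serve as the reference set.
    """
    libs = list(depths)
    n_sites = len(depths[libs[0]])
    ranks = {lib: rank(depths[lib]) for lib in libs}
    # Variance of each site's rank across the libraries.
    var = [pvariance([ranks[lib][i] for lib in libs]) for i in range(n_sites)]
    ref = sorted(range(n_sites), key=lambda i: (var[i], i))[:m]
    totals = {lib: sum(depths[lib][i] for i in ref) for lib in libs}
    if any(t <= 0 for t in totals.values()):
        raise ValueError("reference sites have zero total depth in a library")
    base = totals[libs[0]]
    return {lib: base / totals[lib] for lib in libs}
```

Multiplying each library's depths by its factor puts the three libraries on a common scale before per-site levels are compared.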
  • Kit
  • the invention also provides a kit for accurately detecting genomic hydroxymethylation modification, the kit comprising:
  • the restriction endonucleases comprise MspI, HpaII, MmeI and NlaIII; or comprise MspI, HpaII, EcoP15I and NlaIII;
  • the biotin-labeled linker preferably consists of a pair of two oligonucleotide strands; for example, the two oligonucleotide strand sequences are SEQ ID NO: 1 and SEQ ID NO: 2, respectively;
  • the second linker preferably consists of a pair of two oligonucleotide strands; for example, the two oligonucleotide strand sequences are SEQ ID NO: 3 and SEQ ID NO: 4; or SEQ ID NO: 5 and SEQ ID NO: 6; or SEQ ID NO: 7 and SEQ ID NO: 8;
  • the sequencing linker preferably consists of a pair of two oligonucleotide strands; for example, the two oligonucleotide strand sequences are SEQ ID NO: 9 and SEQ ID NO: 10;
  • the kit further comprises: a reagent for performing magnetic bead capture, a reagent for nucleic acid purification, or a combination thereof.
  • the main advantages of the invention include: (1) the method of the present invention detects hydroxymethylation modification at single-base resolution across the whole genome in combination with high-throughput sequencing, and can detect the modification state of a given CCGG site at single-base resolution.
  • the technique adopted by the present invention reflects the modification state of each site indirectly through a sequence tag and requires only single-end sequencing, so the amount of sequence data and the cost are greatly reduced.
  • the invention is further illustrated below in conjunction with specific embodiments. It is to be understood that the examples are merely illustrative of the invention and are not intended to limit the scope of the invention.
  • experimental methods in the following examples for which no specific conditions are given are usually carried out under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or under the conditions suggested by the manufacturer.
  • Main instruments and reagents
  • NanoDrop 1000 DNA concentration detector
  • Thermomixer (heating and mixing instrument): Thermomixer comfort, Eppendorf
  • Table 2 The main primer sequences and names used in the examples are listed in Table 3.
  • SEQ ID NO: 3 and SEQ ID NO: 4 are the Linker N sequences with an MmeI recognition site;
  • SEQ ID NO: 5 and SEQ ID NO: 6, and SEQ ID NO: 7 and SEQ ID NO: 8, are Linker N sequences with an EcoP15I recognition site.
  • Example 1 Genomic DNA glycosylation modification
  • the genomic DNA of the h9 cell line of 1 ⁇ ⁇ was separately subjected to glycosylation modification and control reaction:
  • the reaction system shown in Table 4 was placed in a 1.5 ml centrifuge tube, respectively.
  • the glycosylated DNA and the control DNA were separately digested with MspI.
  • the system shown in Table 5 was placed in a 1.5 ml centrifuge tube.
  • the enzyme-cut DNA was prepared in a 1.5 ml centrifuge tube, and the reaction system is shown in Table 7.
  • reaction solution was placed on a Thermomixer (Eppendorf), reacted at 37 ° C for 1 hour and 10 minutes, and resuspended once every 10 minutes.
  • the DNA recovery product obtained in Example 9 was prepared in accordance with Table 11 to form a linking reaction system.
  • the library was amplified by using 5 ⁇ l of the reaction product of Example 9 as a template, and the amplification system is shown in Table 12.
  • a gel band of about 86-90 bp was excised, and the target band was placed in a 0.5 ml centrifuge tube (with 6 small holes in its bottom) nested in a 2 ml centrifuge tube. This was centrifuged at 14,000 rpm for 2 min, so that the gel was crushed into the 2 ml centrifuge tube.
  • the Agilent 2100 Bioanalyzer was used to check the library prepared from the h9 genome by T4-BGT glycosylation modification followed by MspI digestion, and the libraries prepared from the h9 genome by direct MspI or HpaII digestion.
  • the library results were as follows: Figure 2 shows the quality-control results for the different libraries; the three libraries, each ligated to Linker N and P7 and amplified by PCR, had a fragment size of 96 bp, consistent with the theoretical size.
  • sequencing was performed on a HiSeq 2000 sequencer with single-end reads of 50 bases. After normalization of the sequence data, the methylation and hydroxymethylation levels of each site were obtained by comparing the number of 20 bp short sequences sequenced for each CCGG site in the three different libraries. The specific steps are as follows:
  • the raw sequence information of the library fragments is obtained; the linker sequence in each raw read is removed according to the sequence of the library's sequencing linker; the raw reads are then quality-filtered to remove low-quality reads. The filter conditions are: a read is filtered out if the number of N bases exceeds 10% of the total number of bases, or if the number of bases with a quality value below 20 exceeds 10% of the total number of bases.
  • 3) the human genome sequence hg19 was digested by computer simulation according to the experimental protocol, and the theoretical digestion fragments formed a virtual library; the sequencing reads obtained in the previous step were compared with the virtual library with no mismatches allowed; after the comparison, the comparison results were tallied;
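Exact matching with no mismatches allowed amounts to a dictionary lookup. The sketch below assumes a hypothetical `virtual_library` mapping from each theoretical tag sequence to a site identifier; how tags shared by several sites are handled is not described in the text and is left out here.

```python
from collections import Counter
from typing import Dict, Iterable


def count_tags(reads: Iterable[str], virtual_library: Dict[str, str]) -> Counter:
    """Count reads per CCGG site.

    A read is counted only if it is identical to a tag in the virtual
    library; reads with any mismatch are ignored.
    """
    counts: Counter = Counter()
    for read in reads:
        site = virtual_library.get(read.upper())
        if site is not None:
            counts[site] += 1
    return counts
```

Running this once per library gives the per-site depths that are then normalized and compared.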
  • Figure 3 shows the overall distribution of methylation and hydroxymethylation levels of CCGG sites in the sample. The abscissa is the modification level, and the ordinate is the density of CCGG sites at that modification level among all sites.
  • Figure 3 shows that the methylation detected by the scheme of the present invention exhibits both hypomethylated and hypermethylated trends, whereas hydroxymethylation is present only at low levels.
  • Figure 4 shows the methylation and hydroxymethylation results on each chromosome.
  • the methylation level is between 60% and 80%, mainly around 70%, which is consistent with the previously reported methylation level of approximately 70% at human genomic CG sites.
  • the inventors detected that in the human embryonic stem cell line h9 hydroxymethylation was at a low level, below 20%, consistent with the low level of hydroxymethylation reported in current studies, indicating that the detection technique of the present invention is very reliable.
  • the inventors downloaded published h9 cell genomic bisulfite sequencing data and compared bisulfite sequencing with the enzyme-digestion-based methylation and hydroxymethylation sequencing of the present invention.
  • Figure 5 shows the consistency of the methylation and hydroxymethylation data with the bisulfite analysis data.
  • at a difference threshold of 0.25, 87.9% of the methylation or hydroxymethylation sites detected by the two methods were consistent, a high level of agreement.
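The agreement figure above is a fraction of sites whose levels differ by less than a threshold. A minimal sketch of that calculation follows; whether the original comparison was strict or inclusive at 0.25 cannot be recovered from the text, so the strict `<` used here is an assumption, as are the names.

```python
from typing import Sequence


def concordance(
    levels_a: Sequence[float],
    levels_b: Sequence[float],
    max_diff: float = 0.25,
) -> float:
    """Fraction of paired sites whose levels differ by less than `max_diff`."""
    pairs = list(zip(levels_a, levels_b))
    if not pairs:
        raise ValueError("no sites to compare")
    agree = sum(abs(a - b) < max_diff for a, b in pairs)
    return agree / len(pairs)
```

The two inputs must list the same sites in the same order, for example levels from this method and from bisulfite sequencing restricted to shared CCGG sites.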
  • The small number of sites that differ may be explained by factors such as bisulfite conversion efficiency, differences in cell state, and enzymatic cleavage efficiency; this does not affect the overall application of the technique, and the differences are acceptable.
  • the present embodiment provides a kit for accurately detecting nucleic acid hydroxymethylation modification in a sample, comprising the components: (1) a first container and, in the container, a reagent for performing 5hmC glycosylation modification;
  • For high-throughput detection of genomic methylation at CCGG sites across the whole genome, NEB has designed a strategy with the following specific steps:
  • the whole genome is digested with MspI, so that, assuming 100% cleavage efficiency, all CCGG sites in the genome are cut (including methylated and hydroxymethylated sites);
  • with dCTP as substrate, the digested fragments are filled in with Klenow fragment to form sticky ends with a 5' overhang of one base C;
  • a 4% acrylamide gel is used to recover the Klenow-repaired DNA fragments of 40-300 bp in length;
  • the fragments are ligated to a double-stranded linker with a 5' overhang of one base G (the linker mediates subsequent PCR amplification and sequencing);
  • the recovered linker-ligated fragments are glycosylated with BGT; if a CCGG site in the original genomic sequence carries a hydroxymethylation modification, 5gmC is formed;
  • the glycosylated product is digested again with MspI; if the CCGG site is hydroxymethylated, the linker is not cut off;
  • in the seventh step, 1/3 of the above product is subjected directly to PCR amplification and sequencing, so that only sequences retaining a linker at both ends, that is, sequences with hydroxymethylation at both ends, are detected; the remaining 2/3 of the product is divided into two parts of 1/3 each, which are not amplified directly; instead, with dCTP as substrate, they are end-repaired again with Klenow fragment to form sticky ends protruding one base C at one or both ends, and are then ligated to another linker under the action of a ligase; one of these ligation products is subjected directly to PCR amplification and sequencing.
  • the other is digested with HpaII and then subjected to end repair, linker ligation, PCR amplification and sequencing.
  • in the first group, the CCGG sites at both ends of the detected sequences are hydroxymethylated; in the second group, one end of each detected sequence is hydroxymethylated and the other end is methylated or unmodified;
  • in the third group, one end of each detected sequence is hydroxymethylated and the other end is an unmodified CCGG site.
  • Although NEB's technique has made considerable progress in detecting the distribution of 5hmC in the genome, it also has some problems.
  • For example, the technique can only use dCTP as the substrate for end repair. At a site that was originally hydroxymethylated on both strands, only one strand remains hydroxymethylated after end repair, which strongly affects digestion at that site and introduces many errors.
  • The fragments analyzed by the technique are selected by gel excision (40-300 bp), so modification information for sites outside that fragment range is not obtained; that is, the detected sites are incomplete and their number is small.
  • Relying on the principle that glycosylated 5hmC cannot be cut by the MspI restriction endonuclease whereas 5mC and unmodified C can, the inventors designed a set of linkers comprising a biotin-modified linker and a linker containing an MmeI recognition site.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

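The present invention provides a method for detecting hydroxymethylation modification of nucleic acid. Specifically, the method comprises the steps of: subjecting the nucleic acid to glycosylation and MspI digestion; ligating biotin-labeled linkers to both ends of the digested fragments; digesting with NlaIII; and capturing with streptavidin magnetic beads. Every captured sequence, which carries the biotin linker at one end and a four-base CATG overhang at the other, reports the modification state of its neighboring CCGG site. A linker containing an MmeI or EcoP15I recognition site is ligated to the CATG sticky end and cut with the corresponding enzyme, producing short sequence tags that report the modification of the neighboring CCGG site. Sequence alignment then yields the methylation and hydroxymethylation information. The invention also provides uses of the method.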

Description

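Method for detecting hydroxymethylation modification of nucleic acid and use thereof
Technical Field
The present invention belongs to the field of genetic engineering and relates in particular to a method for detecting hydroxymethylation modification of nucleic acid and to uses of the method.
Background Art
5-Hydroxymethylcytosine (5hmC) was first discovered in 1952 among the cytosines of bacteriophages, and has recently also been found in mammalian genomes (for example in mouse neurons and embryonic stem cells). A large body of current research focuses on the roles 5hmC may play in genome organization and stem cell differentiation, and has shown that enzymes of the TET family can convert 5mC to 5hmC by oxidation.
However, although the 5hmC modified base was discovered early, there is still almost no effective enzymatic or chemical method that can specifically recognize 5hmC residues and resolve their exact distribution in the genome. For example, methylation-dependent endonucleases of the MspJI family, or McrBC, cannot distinguish 5mC from 5hmC, while for methylation-sensitive endonucleases such as MspI or HpaII, 5mC and 5hmC have the same effect in most cases. Bisulfite treatment, previously regarded as the gold standard for methylation detection, likewise cannot effectively distinguish 5mC from 5hmC. In addition, with the advent of 5hmC-specific antibodies, immunology-based techniques for detecting 5hmC, such as dot blot analysis, cellular immunofluorescence and immunohistochemistry, have been widely used in hydroxymethylation research, but these techniques are essentially limited to detecting the presence or abundance of 5hmC in tissues or cells and cannot map its distribution on the genome. At present, techniques for detecting the genome-wide distribution of 5hmC rely mainly on enrichment and capture combined with sequencing, such as hMeDIP, anti-CMS and JBP pull-down. None of these enrichment-based methods resolves the exact position of 5hmC within a DNA sequence at single-base resolution, and most such techniques, which depend on antibody or protein capture, are limited by nonspecific capture and capture bias.
The field therefore still lacks a large-scale technique capable of detecting the exact distribution of 5hmC in DNA. There is an urgent need for a method that can accurately detect hydroxymethylation, to provide a powerful tool for further study of the genomic distribution of 5-hydroxymethylcytosine and its associated epigenetic regulatory mechanisms, and to lay the groundwork for exploring its role in the onset and progression of related diseases and in individual development.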
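Summary of the Invention
An object of the present invention is to provide a method for detecting hydroxymethylation modification of nucleic acid.
Another object of the present invention is to provide uses of the method.
In a first aspect of the present invention, there is provided a method for detecting hydroxymethylation modification of nucleic acid, the method comprising the steps of:
(1) glycosylating the nucleic acid to obtain glycosylated nucleic acid in which hydroxymethylated bases have been converted to glycosyl-hydroxymethylated bases;
(2) separately digesting control nucleic acid that has not been glycosylated and the glycosylated nucleic acid obtained in step (1) with a first restriction endonuclease, to obtain first control nucleic acid fragments and sample nucleic acid fragments, respectively; and digesting the control nucleic acid or the glycosylated nucleic acid with a second restriction endonuclease, to obtain second control nucleic acid fragments;
(3) ligating a biotin-labeled linker to each of the first control nucleic acid fragments, sample nucleic acid fragments and second control nucleic acid fragments obtained in step (2), to obtain a first control ligation product, a sample ligation product and a second control ligation product, each carrying the biotin linker;
(4) digesting each of the first control ligation product, sample ligation product and second control ligation product obtained in step (3) with the restriction endonuclease NlaIII, to produce a first control NlaIII digestion product, a sample NlaIII digestion product and a second control NlaIII digestion product, each of the three products having the biotin-labeled linker at one end and a sticky end at the other;
(5) ligating a second linker to each of the first control NlaIII digestion product, sample NlaIII digestion product and second control NlaIII digestion product obtained in step (4), the second linker sequence containing a recognition site for a specific restriction endonuclease, to obtain a first control second-linker ligation product, a sample second-linker ligation product and a second control second-linker ligation product;
(6) digesting the first control second-linker ligation product, sample second-linker ligation product and second control second-linker ligation product obtained in step (5) with the specific restriction endonuclease, to obtain a first control final digestion product, a sample final digestion product and a second control final digestion product, each having the second linker at one end and a sticky end at the other;
(7) ligating the first control final digestion product, sample final digestion product and second control final digestion product obtained in step (6) to a sequencing adapter and amplifying the adapter-ligated products, to obtain a first control sequencing library, a sample sequencing library and a second control sequencing library;
(8) sequencing the sequencing libraries obtained in step (7) and analyzing and comparing the sequence information, to obtain information on the hydroxymethylation modification of the nucleic acid.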
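In another preferred embodiment, the nucleic acid of step (1) is genomic DNA.
In another preferred embodiment, the nucleic acid of step (1) is derived from an animal, a plant, a bacterium, a fungus, a virus, or a combination thereof.
In another preferred embodiment, the glycosylation of step (1) is as follows: under the action of the enzyme T4-BGT, with uridine diphosphate glucose as substrate, a glucose unit is transferred to the 5-hydroxymethylcytosine (5-hmC) of the nucleic acid to form β-glucosyl-5-hydroxymethylcytosine (5gmC).
In another preferred embodiment, the first restriction endonuclease of step (2) is MspI.
In another preferred embodiment, the second restriction endonuclease of step (2) is HpaII.
In another preferred embodiment, the sequences of the biotin-labeled linker of step (3) are as shown in SEQ ID NO: 1 and SEQ ID NO: 2.
In another preferred embodiment, step (4) further comprises: capturing the fragments produced by NlaIII digestion with streptavidin magnetic beads to obtain the first control NlaIII digestion product, sample NlaIII digestion product and second control NlaIII digestion product, that is, nucleic acid fragments having the biotin-labeled linker at one end and a sticky end at the other.
In another preferred embodiment, the second linker of step (5) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 3 and SEQ ID NO: 4; or SEQ ID NO: 5 and SEQ ID NO: 6; or SEQ ID NO: 7 and SEQ ID NO: 8.
In another preferred embodiment, the specific restriction endonuclease of step (6) is MmeI or EcoP15I.
In another preferred embodiment, MmeI is used in step (6), giving 20-bp fragments with the second linker at one end and a sticky end at the other.
Alternatively, in another preferred embodiment, EcoP15I is used in step (6), giving 25-bp fragments with the second linker at one end and a sticky end at the other.
In another preferred embodiment, the sequencing adapter of step (7) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 9 and SEQ ID NO: 10.
In another preferred embodiment, the sequencing of step (8) is performed on any sequencing platform selected from the following group:
Illumina Solexa, Roche 454, ABI SOLiD, Helicos true single-molecule sequencing, PacBio single-molecule real-time sequencing, and Oxford Nanopore single-molecule nanopore sequencing.
In another preferred embodiment, the analysis and comparison of sequence information in step (8) comprises the following steps:
(i) filtering the raw reads obtained from each library after sequencing to obtain high-quality library fragments; and simulating digestion of a reference sequence to obtain a virtual library of theoretical digestion fragments;
(ii) aligning the high-quality library fragments obtained in step (i) to the virtual library and normalizing the alignment statistics to obtain normalized sequencing depths for the three libraries;
(iii) calculating the methylation and hydroxymethylation level of each CCGG site from the normalized data obtained in step (ii);
(iv) from the per-site levels obtained in step (iii), summarizing the overall methylation and hydroxymethylation levels of the sample and the levels of methylation and hydroxymethylation on each chromosome.
In a second aspect of the present invention, there is provided a kit for accurately detecting hydroxymethylation modification of a genome, comprising the components: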
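(1) a first container and, in the container, reagents for glycosylation;
(2) a second container and, in the container, restriction endonuclease reaction reagents;
in another preferred embodiment, the restriction endonucleases comprise MspI, HpaII, MmeI and NlaIII;
in another preferred embodiment, the restriction endonucleases comprise MspI, HpaII, EcoP15I and NlaIII;
(3) a third container and, in the container, a biotin-labeled linker, which preferably consists of two paired oligonucleotide strands, for example with the sequences SEQ ID NO: 1 and SEQ ID NO: 2;
(4) a fourth container and, in the container, a second linker consisting of two paired oligonucleotide strands, preferably with the sequences SEQ ID NO: 3 and SEQ ID NO: 4; or SEQ ID NO: 5 and SEQ ID NO: 6; or SEQ ID NO: 7 and SEQ ID NO: 8;
(5) a fifth container and, in the container, a sequencing adapter, which preferably consists of two paired oligonucleotide strands, for example with the sequences SEQ ID NO: 9 and SEQ ID NO: 10.
In another preferred embodiment, the kit further comprises reagents for magnetic bead capture, reagents for nucleic acid purification, or a combination thereof.
It should be understood that, within the scope of the present invention, the technical features described above and those described in detail below (for example in the examples) may be combined with one another to form new or preferred technical solutions. For reasons of space, these combinations are not listed one by one.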
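Brief Description of the Drawings
The following drawings illustrate specific embodiments of the invention and do not limit the scope of the invention as defined by the claims.
Figure 1 shows the method for detecting hydroxymethylation in a preferred embodiment of the invention.
Figure 2 shows the library quality-control results: after PCR amplification, the three libraries carrying Linker N and the P7 adapter all had a fragment size of 96 bp, matching the theoretical size. Figure 2(a) shows the fragment size distribution of the MspI-digested library from the h9 stem cell genome after T4-BGT glycosylation; Figure 2(b) shows that of the library from the h9 genome digested directly with MspI; Figure 2(c) shows that of the library from the h9 genome digested directly with HpaII.
Figure 3 shows the overall distribution of methylation and hydroxymethylation levels across the CCGG sites of the sample. The x-axis is the modification level; the y-axis is the density of CCGG sites at that modification level relative to the total number of sites.
Figure 4 shows the methylation and hydroxymethylation levels of the detected CCGG sites on each chromosome.
Figure 5 shows the concordance between the methylation and hydroxymethylation data and the bisulfite analysis data.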
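Detailed Description
After extensive and in-depth research, the inventors have for the first time established a method for detecting methylation and hydroxymethylation in nucleic acid. Specifically, the method comprises the steps of: subjecting the nucleic acid to glycosylation and MspI digestion; ligating biotin-labeled linkers to both ends of the digested fragments; digesting with NlaIII; and capturing with streptavidin magnetic beads. Every captured sequence, which carries the biotin linker at one end and a four-base CATG overhang at the other, reports the modification state of its neighboring CCGG site. A linker containing an MmeI or EcoP15I recognition site is ligated to the CATG sticky end and cut with the corresponding enzyme, so that the resulting short sequence tags report the modification of the neighboring CCGG site. Sequence alignment then yields the methylation and hydroxymethylation information.
Terms
As used herein, the term "containing" includes "comprising", "consisting essentially of" and "consisting of". As used herein, the terms "or more" and "or less" include the stated number; for example, "80% or more" means ≥80% and "2% or less" means ≤2%.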
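5-Hydroxymethylcytosine (5hmC)
5-Hydroxymethylcytosine is a modified base produced by TET-family enzymes through oxidation of 5-methylcytosine (5-mC). Its ultraviolet absorption and chromatographic behavior are similar to those of cytosine. 5-Hydroxymethylcytosine is present at low levels in many mammalian cell types.
The content of 5hmC also varies across the genome and between cells or tissues. Immunoassays have found that the percentage of 5hmC is relatively high in brain, liver, kidney and colorectal tissue (0.40-0.65%), relatively low in lung tissue (0.18%), and very low in heart, breast and placenta (only 0.05-0.06%). Compared with 0.46-0.57% in normal colorectal tissue, the content in cancerous colorectal tissue is only 0.02-0.06%. 5hmC is concentrated mainly in exons and near transcription start sites, particularly at the start sites of genes whose promoters carry both the histone H3 lysine 27 trimethylation (H3K27me3) and histone H3 lysine 4 trimethylation (H3K4me3) marks. Studies indicate that 5-hydroxymethylcytosine may play a role in transcriptional regulation.
T4 phage β-glucosyltransferase (T4-BGT)
T4 phage β-glucosyltransferase (T4-BGT) efficiently transfers the glucose unit of uridine diphosphate glucose (UDP-Glucose) to the 5-hydroxymethylcytosine residues of double-stranded DNA, forming β-glucosyl-5-hydroxymethylcytosine (5gmC), and 5gmC cannot be cut by MspI. After the genome has been glycosylated with T4-BGT, hydroxymethylation at a specific individual CCGG site can therefore be assayed semi-quantitatively by PCR or quantitatively by Q-PCR.
Primer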
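As used herein, the term "primer" is a general term for an oligonucleotide that pairs complementarily with a template and from which a DNA polymerase synthesizes a DNA strand complementary to the template. A primer may be natural RNA or DNA and may contain any form of natural nucleotide; it may even contain non-natural nucleotides such as LNA or ZNA. A primer is "substantially" (or "essentially") complementary to a particular sequence on one strand of the template. A primer must be sufficiently complementary to one strand of the template for extension to begin, but its sequence need not be fully complementary to the template. For example, if a sequence that is not complementary to the template is added to the 5' end of a primer whose 3' end is complementary to the template, the primer is still substantially complementary to the template. As long as a sufficiently long stretch of the primer binds adequately to the template, a primer that is not fully complementary can still form a primer-template complex and support amplification.
High-throughput sequencing
"Resequencing" of genomes allows abnormal changes in disease-associated genes to be discovered as early as possible and supports in-depth research into the diagnosis and treatment of individual diseases. Those skilled in the art can typically use a variety of second-generation sequencing platforms for high-throughput sequencing, such as 454 FLX (Roche), Solexa Genome Analyzer (Illumina) and SOLiD (Applied Biosystems). What these platforms share is very high sequencing throughput: compared with the 96-capillary format of conventional sequencing, a single high-throughput run can read 400,000 to 4,000,000 sequences. Read length ranges from 25 bp to 450 bp depending on the platform, so a single run can yield from 1 G to 14 G bases. Solexa high-throughput sequencing comprises two steps, DNA cluster formation and on-instrument sequencing: the mixture of PCR amplification products is hybridized to sequencing probes immobilized on a solid support and amplified by solid-phase bridge PCR to form sequencing clusters; the clusters are then sequenced by sequencing-by-synthesis to obtain the nucleotide sequences of the disease-associated nucleic acid molecules in the sample.
DNA clusters are formed on a sequencing chip (flow cell) whose surface carries a layer of single-stranded primers. Single-stranded DNA fragments are immobilized on the chip surface by base pairing between their adapter sequences and the surface primers. An amplification reaction converts the immobilized single-stranded DNA into double-stranded DNA, which is denatured back into single strands; one end of each strand is anchored to the flow cell, and the other end pairs at random with a nearby primer and is anchored as well, forming a "bridge". Tens of millions of single DNA molecules undergo this reaction simultaneously on the flow cell. Each single-stranded bridge is amplified again on the chip surface using the surrounding primers, forming a duplex that is denatured into single strands, which form bridges again and serve as templates for the next round of amplification. After 30 rounds of amplification, each single molecule has been amplified 1000-fold, giving what is called a monoclonal DNA cluster.
The DNA clusters are sequenced by synthesis on the Solexa sequencer. In the sequencing reaction the four bases carry different fluorescent labels and the end of each base is blocked by a protecting group, so only one base can be added per reaction. After scanning to read the color of that reaction, the protecting group is removed and the next reaction can proceed; repeating this cycle yields the exact base sequence. In Solexa multiplexed sequencing, an index (tag or barcode) is used to distinguish samples. After the regular sequencing run, an additional 7 cycles are run on the index portion, and by reading the index up to 12 different samples can be distinguished in a single sequencing lane.
Detection method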
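The present invention provides a method for accurately detecting hydroxymethylation sites. In a preferred embodiment of the invention, the method comprises the following steps (see Figure 1):
1. Glycosylation of 5hmC in genomic DNA: intact genomic DNA free of protein, RNA and other contaminants is reacted with T4 β-glucosyltransferase (T4-BGT); in parallel, an equal amount of the same genomic DNA is left unglycosylated as the control group.
In the glycosylation group, under the action of T4-BGT and with uridine diphosphate glucose (UDP-Glucose) as substrate, a glucose unit is transferred to the 5-hydroxymethylcytosine residues of the double-stranded DNA to form β-glucosyl-5-hydroxymethylcytosine (5gmC). The reaction does not depend on DNA sequence, so all 5-hmC residues can be glycosylated, whereas unmodified cytosine residues and methylated 5-mC residues are not. No T4-BGT is added to the control group, so no glycosylation occurs there.
In the present invention, the genomic DNA may be genomic DNA extracted from animal tissue, cellular genomic DNA and so on; the technique can be applied whenever CCGG sites in the genomic sequence carry ChCGG hydroxymethylation.
2. Restriction endonuclease digestion: the glycosylated DNA and the control DNA are digested with MspI in parallel; in addition, intact genomic DNA free of protein, RNA and other contaminants is digested with HpaII.
MspI and HpaII differ in their sensitivity to methylation. HpaII recognizes and cuts only unmodified CCGG sites. MspI recognizes and cuts CCGG sites in several modification states (CCGG, CmCGG and ChCGG; in the DNA sequences of this application a superscript m denotes methylation and a superscript h denotes hydroxymethylation) but cannot cut CgCGG sites, in which the 5hmC has been glycosylated. The DNA fragment ends from each digestion therefore carry different modification information: fragment ends from the glycosylated, MspI-digested DNA contain the genomic CCGG and CmCGG information; fragment ends from DNA digested directly with HpaII contain only the genomic CCGG information; and fragment ends from the control DNA digested directly with MspI contain all of the genomic CCGG, CmCGG and ChCGG information.
3. Ligation of the Biotin-linker (biotin-labeled linker): with DNA ligase, the biotin-labeled linker (Biotin-linker) is ligated to both ends of the DNA fragments from each treatment and digestion.
4. NlaIII digestion: the differently treated, biotin-linker-ligated DNA fragments are each cut with the restriction endonuclease NlaIII. DNA fragments carrying the biotin-labeled linker at both ends are cut at the specific "CATG" sites within the sequence, producing sequences with the biotin-labeled linker at one end and a four-base (CATG) sticky overhang at the other, together with some sequences that have sticky ends at both ends.
5. Streptavidin magnetic bead capture: M-280 streptavidin-coupled magnetic beads are used to capture the DNA fragments that have the biotin-labeled linker at one end and the four-base (CATG) sticky overhang at the other. Sequences with sticky ends at both ends are washed away; discarding them has no effect on the subsequent analysis.
6. Ligation of Linker N: with DNA ligase, the DNA fragments captured on the streptavidin-coupled beads are ligated to a linker whose ligation end contains a recognition site for the restriction endonuclease MmeI (Linker N). One end of the resulting DNA fragment is bound to the bead through the biotin-streptavidin interaction, and the other end carries a Linker N containing the MmeI site. The MmeI recognition site is 5'-TCCRAC-3', where R is the base A or G. In another preferred embodiment, the MmeI site in Linker N is replaced by an EcoP15I site, which is equally workable.
7. MmeI or EcoP15I digestion: digestion with MmeI, the restriction endonuclease whose site is contained in Linker N, produces a 20-bp insert with Linker N at one end and a sticky end with a two-base overhang of arbitrary sequence at the other, together with a corresponding remnant that stays bound to the bead. Each Linker N-ligated fragment produced represents the modification information of its neighboring CCGG site. In another preferred embodiment, digestion with EcoP15I, whose site is contained in Linker N, produces a 25-bp insert with the same structure, again together with a bead-bound remnant, and each Linker N-ligated fragment likewise represents the modification information of its neighboring CCGG site.
8. Ligation of the P7 adapter: the supernatant of the MmeI or EcoP15I digestion (which contains the Linker N-ligated DNA fragments) is purified, the P7 adapter is ligated with DNA ligase, and the ligation product is purified.
9. PCR amplification and purification: PCR is performed with the Linker N and P7 adapter sequences as universal primers. The amplification product is recovered and purified from a 6% non-denaturing PAGE gel, checked for fragment size on an Agilent 2100 and quantified by Q-PCR, and then sequenced on a HiSeq 2000 sequencer.
10. Sequencing and data analysis: libraries that pass quality control are sequenced on a HiSeq 2000 sequencer with single-end 50-base reads. After the sequencing data have been normalized, the number of sequenced 20-bp short tags corresponding to each CCGG site is compared across the libraries to obtain the methylation and hydroxymethylation level of each site.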
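The comparison in step 10 can be sketched as follows. The patent text states which modification states each library captures but does not spell out a formula, so the subtraction scheme below is an assumption derived from step 2: the MspI control reports C + 5mC + 5hmC, the glycosylated MspI library reports C + 5mC, and the HpaII library reports unmodified C only. The function name `ccgg_levels` and its parameter names are hypothetical.

```python
def ccgg_levels(bgt_mspi, mspi, hpaii):
    """Estimate modification fractions at one CCGG site.

    All inputs are normalized tag depths for the same site:
      bgt_mspi: glycosylated + MspI library (captures C and 5mC)
      mspi:     MspI-only control library   (captures C, 5mC and 5hmC)
      hpaii:    HpaII control library       (captures unmodified C only)
    The subtraction scheme is an assumption, not a formula from the patent.
    """
    if mspi <= 0:
        return None  # site not covered in the reference library

    def clamp(x):
        # Sampling noise can push a raw estimate outside [0, 1].
        return max(0.0, min(1.0, x))

    return {
        "C": clamp(hpaii / mspi),
        "5mC": clamp((bgt_mspi - hpaii) / mspi),
        "5hmC": clamp((mspi - bgt_mspi) / mspi),
    }
```

For example, depths of 80, 100 and 10 give estimates of 0.7 for 5mC, 0.2 for 5hmC and 0.1 for unmodified C. Clamping is a design choice made here to keep noisy sites inside the valid range; a real analysis might instead flag such sites.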
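In a preferred embodiment, the analysis and comparison of sequence information is carried out by a method comprising the following steps: (i) filtering the raw reads obtained from each library after sequencing to obtain high-quality fragment information, and simulating digestion of a reference sequence to obtain a virtual library of theoretical digestion fragments; (ii) aligning the high-quality fragment information obtained in step (i) to the virtual library and normalizing the alignment statistics to obtain normalized sequencing depths for the three libraries; (iii) calculating the methylation and hydroxymethylation level of each CCGG site from the normalized data obtained in step (ii); (iv) from the per-site levels obtained in step (iii), summarizing the overall methylation and hydroxymethylation levels of the sample and the levels of methylation and hydroxymethylation on each chromosome.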
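The digestion simulation in step (i) can be illustrated with a minimal sketch. It assumes a single forward-strand sequence, exact CCGG (MspI/HpaII) and CATG (NlaIII) motifs, and a tag made of the 20 bases directly adjacent to the first CATG on either side of each CCGG site. The function name `virtual_tags` and the exact boundary rules are assumptions for illustration; the patent does not give coordinates.

```python
import re


def virtual_tags(seq, tag_len=20):
    """Return (ccgg_position, side, tag) tuples for a reference sequence.

    side "+" means the tag lies downstream of the CCGG site, "-" upstream.
    A tag is reported only if it fits between the CCGG site and the
    nearest CATG without running past the CCGG motif (an assumption).
    """
    seq = seq.upper()
    ccgg = [m.start() for m in re.finditer("CCGG", seq)]
    catg = [m.start() for m in re.finditer("CATG", seq)]
    tags = []
    for i, pos in enumerate(ccgg):
        prev_end = ccgg[i - 1] + 4 if i > 0 else 0
        next_start = ccgg[i + 1] if i + 1 < len(ccgg) else len(seq)

        # Downstream: first CATG before the next CCGG site.
        down = [q for q in catg if pos + 4 <= q < next_start]
        if down and down[0] - tag_len >= pos:
            q = down[0]
            tags.append((pos, "+", seq[q - tag_len:q]))

        # Upstream: last CATG after the previous CCGG site.
        up = [q for q in catg if prev_end <= q and q + 4 <= pos]
        if up and up[-1] + 4 + tag_len <= pos + 4:
            q = up[-1]
            tags.append((pos, "-", seq[q + 4:q + 4 + tag_len]))
    return tags
```

Sequenced tags can then be matched against this table by exact string lookup, in line with the no-mismatch alignment described in the examples. A full implementation would also handle both strands and the 25-bp EcoP15I tags.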
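The filtering conditions are: adapter sequences are removed from the raw library reads; reads in which the number of N bases exceeds 10% of the total number of bases are removed; and reads in which the number of bases with a quality value below 20 exceeds 10% of the total number of bases are removed.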
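A minimal sketch of these filtering conditions, assuming each read is available as a base string plus a list of integer Phred quality values. The helper names `trim_adapter` and `passes_filter` are hypothetical, and exact-match adapter trimming is a simplification; the adapter used in any call is a placeholder, not one of the SEQ ID sequences.

```python
def trim_adapter(seq, quals, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, quals
    return seq[:idx], quals[:idx]


def passes_filter(seq, quals, max_frac=0.10, min_q=20):
    """Keep a read unless N bases or low-quality bases exceed 10% of its length."""
    n = len(seq)
    if n == 0:
        return False
    n_frac = seq.upper().count("N") / n
    lowq_frac = sum(1 for q in quals if q < min_q) / n
    return n_frac <= max_frac and lowq_frac <= max_frac
```

A read would first be trimmed and then tested, for example `passes_filter(*trim_adapter(seq, quals, adapter))`.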
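Normalization comprises the steps of: ranking the sites of each library by CCGG site depth, so that every CCGG site obtains a rank in every library; calculating, for each site, the variance of its three ranks, and removing the sites with larger variance over n rounds until m sites remain as the normalization baseline, where m and n are positive integers (in another preferred embodiment, m is in the range 5000-15000 and n>4); and normalizing the libraries according to the ratio, between libraries, of the total sequencing depth over these m rank-stable sites.
Kit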
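The present invention also provides a kit for accurately detecting hydroxymethylation modification of a genome, the kit comprising:
(1) a first container and, in the container, reagents for glycosylation;
(2) a second container and, in the container, restriction endonuclease reaction reagents;
in a preferred embodiment, the restriction endonucleases comprise MspI, HpaII, MmeI and NlaIII, or comprise MspI, HpaII, EcoP15I and NlaIII;
(3) a third container and, in the container, a biotin-labeled linker, which preferably consists of two paired oligonucleotide strands, for example with the sequences SEQ ID NO: 1 and SEQ ID NO: 2;
(4) a fourth container and, in the container, a second linker, which preferably consists of two paired oligonucleotide strands, for example with the sequences SEQ ID NO: 3 and SEQ ID NO: 4; or SEQ ID NO: 5 and SEQ ID NO: 6; or SEQ ID NO: 7 and SEQ ID NO: 8;
(5) a fifth container and, in the container, a sequencing adapter, which preferably consists of two paired oligonucleotide strands, for example with the sequences SEQ ID NO: 9 and SEQ ID NO: 10.
In a preferred embodiment of the invention, the kit further comprises reagents for magnetic bead capture, reagents for nucleic acid purification, or a combination thereof.
The main advantages of the present invention include:
(1) the method detects hydroxymethylation at single-base resolution across the whole genome in combination with high-throughput sequencing, and can at the same time detect the methylation state of a given CCGG site at single-base resolution;
(2) the number of sites detected by the method is far higher than in the prior art, giving high coverage;
(3) the technique uses sequence tags to report the modification state of each site indirectly, so only single-end sequencing is needed, which greatly reduces the amount of sequencing data and the cost.
The invention is further described below with reference to specific examples. It should be understood that these examples serve only to illustrate the invention and not to limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer.
Main instruments and reagents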
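The main instruments used in the examples are listed in Table 1.
Instrument | Model | Manufacturer
Thermal cycler (PCR instrument) | Veriti Thermal Cycler | ABI
Agilent 2100 | 2100 Bioanalyzer | Agilent
NanoDrop (DNA concentration meter) | 1000 Spectrophotometer | Thermo Fisher Scientific
Gel imaging system | Tanon | Shanghai Tanon Science & Technology Co., Ltd.
DarkReader Transilluminator (gel-cutting instrument) | D195M | Clare Chemical Research
Thermomixer (heating mixer) | Thermomixer comfort 5355 | Eppendorf
Refrigerated centrifuge | 5417R | Eppendorf
Benchtop centrifuge | 5418 | Eppendorf
Benchtop centrifuge | SVC-75004334 | Heraeus
Vertical mixer | HS3 |
Mini vertical electrophoresis tank | mini PROTEAN Tetra Cell |
Vortex mixer | QL-901 |
Magnetic stand | 123-21D | Invitrogen
Electronic analytical balance | BS 124S | Sartorius
The main reagents used in the examples are listed in Table 2.
Table 2
(Table 2 is provided as an image in the original publication: Figure imgf000011_0001)
The main primer sequences used in the examples and their names are listed in Table 3.
Table 3
(Table 3 is provided as an image in the original publication: Figure imgf000012_0001)
SEQ ID NO: 3 and SEQ ID NO: 4 are the Linker N sequences carrying the MmeI recognition site;
SEQ ID NO: 5 and SEQ ID NO: 6, and SEQ ID NO: 7 and SEQ ID NO: 8, are Linker N sequences carrying the EcoP15I recognition site.
Example 1 Glycosylation of genomic DNA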
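Material: h9 cell line.
For the glycosylation reaction and the control reaction, 1 μg of h9 genomic DNA was used for each. The reaction systems shown in Table 4 were set up in 1.5 ml centrifuge tubes.
Table 4
Component | Glycosylation group | First control group
h9 genomic DNA | 1 μg | 1 μg
25× UDP-Glucose | 4 μl | 4 μl
10× NEB buffer 4 | 10 μl | 10 μl
T4-BGT | 30 units | 30 units (enzyme inactivated)
RNase-free water | to 100 μl | to 100 μl
After mixing and brief centrifugation, the reactions were incubated in a 37°C water bath for 16 h. The DNA was then recovered by ethanol precipitation and dissolved in 30 μl of EB.
Example 2 Digestion with the restriction endonuclease MspI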
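The glycosylated DNA and the control DNA were each digested with MspI.
The reaction systems shown in Table 5 were set up in 1.5 ml centrifuge tubes.
Table 5
Component | Glycosylation group | First control group
Recovered DNA | 30 μl | 30 μl
10× NEB buffer 4 | 10 μl | 10 μl
MspI | 500 units | 500 units
RNase-free water | to 50 μl | to 50 μl
The reactions were incubated in a 37°C water bath for 16-19 h, after which the products were heat-inactivated at 80°C for 20 min.
Example 3 Digestion with the restriction endonuclease HpaII
A further 1 μg of h9 genomic DNA was digested directly with HpaII. The digestion reaction shown in Table 6 was set up in a 1.5 ml centrifuge tube.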
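Table 6
Component | HpaII digestion group
h9 genomic DNA | 1 μg
10× NEB buffer 4 | 10 μl
HpaII | 500 units
RNase-free water | to 50 μl
The reaction was incubated in a 37°C water bath for 16-19 h, after which the product was heat-inactivated at 80°C for 20 min.
Example 4 Ligation of the biotin-linker to the digestion products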
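For the digested DNA, the reaction system shown in Table 7 was prepared in a 1.5 ml centrifuge tube.
Table 7
Digested DNA | 100 μl
Biotin-linker (10 μM) | 3 μl
ATP (10 mM) | 12 μl
T4 DNA ligase (NEB) | 2 μl
The reaction was incubated at 16°C for 5 h. The ligation product was then recovered and purified by ethanol precipitation, and the sample was finally dissolved in 172 μl of LoTE (3 mmol/L Tris-HCl pH 7.5, 0.12 mmol/L EDTA).
Example 5 NlaIII (NEB) digestion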
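For each biotin-linker ligation product from the previous step, a reaction system was prepared according to Table 8:
Table 8
DNA | 172 μl
100× BSA | 2 μl
10× NEB buffer 4 | 20 μl
NlaIII | 6 μl
Total | 200 μl
The reactions were incubated at 37°C for 1 h. Afterwards, 400 μl of Wash buffer D (Invitrogen) was added to each reaction.
Example 6 Capture of biotin-linker-ligated sequences with streptavidin magnetic beads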
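1. Preparation of the streptavidin-coupled magnetic beads
1) Resuspend the M-280 streptavidin-coupled magnetic beads and transfer 200 μl to a 1.5 ml low-binding tube; place the tube on a magnetic stand for 1 min and carefully remove the supernatant;
2) Add 400 μl of Wash buffer D to the tube and resuspend the beads; place the tube on the magnetic stand for 2 min and carefully remove the supernatant.
2. Capture of the biotin-linker-ligated sequences after NlaIII digestion
1) Add the mixture of 200 μl NlaIII digestion reaction and 400 μl Wash buffer D to the prepared beads, resuspend, and incubate at room temperature for 20 min, flicking the tube every 5 min to keep the beads from settling;
2) After the incubation, place each tube on the magnetic stand for 2 min and discard the supernatant; then wash twice with 600 μl of Wash buffer D;
3) Add 300 μl of 1× ligation buffer (Invitrogen) to each tube, resuspend, place on the magnetic stand for 1 min, and discard the supernatant.
Example 7 Ligation of Linker N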
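1) Add the reagents shown in Table 9, in order, to the bead-captured product.
Table 9
Linker N (50 μM) | 2.5 μl
LoTE buffer | 27 μl
5× ligation buffer | 8 μl
2) After resuspension, incubate in a 50°C water bath for 2 min, then leave at room temperature for 10 min;
3) Add 2.5 μl of T4 HC DNA ligase (Invitrogen, catalog no. 15224-041) to each tube, resuspend and mix, and incubate for 2 h on a Thermomixer (Eppendorf) set to 16°C, resuspending every 5 min;
4) After the reaction, add 600 μl of Wash buffer D, resuspend and mix, place the tube on the magnetic stand for 1-2 min, and remove the supernatant;
5) Repeat the wash once with 600 μl of Wash buffer D, place the tube on the magnetic stand for 1-2 min, and remove the supernatant;
6) Add 600 μl of Wash buffer D, resuspend and mix, and transfer to a new 1.5 ml low-binding tube; place the tube on the magnetic stand for 1-2 min, remove the supernatant, and resuspend in 200 μl of 1× NEB buffer 4.
Example 8 MmeI digestion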
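1) Place the low-binding tube on the magnetic stand, carefully remove the 1× NEB buffer 4, and set up the digestion shown in Table 10.
Table 10
LoTE buffer | 118 μl
10× NEB buffer 4 | 15 μl
500 μM SAM (S-adenosylmethionine, freshly diluted) | 15 μl
MmeI | 3 μl
2) Incubate the reaction on a Thermomixer (Eppendorf) at 37°C for 1 h 10 min, resuspending every 10 min.
3) After the reaction, centrifuge the tube at 15000 g for 2 min.
4) Place the tube on the magnetic stand for 2 min and collect the supernatant into a new 1.5 ml tube.
5) To this 1.5 ml tube add, in order, 150 μl of LoTE and 300 μl of 25:24 phenol-chloroform, mix, and centrifuge at 15000 g for 2 min.
6) Transfer the supernatant to a 2 ml centrifuge tube and add, in order, 4 μl of glycogen, 200 μl of 7.5 M ammonium acetate and 1.5 ml of pre-chilled absolute ethanol; mix, leave at -80°C for 30 min, centrifuge at 14000 rpm and 4°C for 10 min, and carefully remove the supernatant.
7) Wash the pellet with 70% ethanol and centrifuge at 14000 rpm and 4°C for 5 min.
8) Carefully remove the supernatant, air-dry the pellet at room temperature for 2 min, and dissolve it in 6 μl of LoTE.
Example 9 Ligation of the P7 adapter to the purified MmeI digestion product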
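A ligation reaction was set up with the recovered DNA from Example 8 according to Table 11.
Table 11
DNA | 6 μl
P7 adapter (10 μM) | 1 μl
5× ligation buffer | 2 μl
T4 DNA ligase | 1 μl
The tube was incubated for 3 h on a Thermomixer (Eppendorf) set to 16°C.
Example 10 PCR amplification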
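5 μl of the reaction product from Example 9 was used as template for library amplification; the amplification system is shown in Table 12.
Table 12
P7 adapter-ligated DNA | 5 μl
dNTP (2.5 mM) | 2 μl
5× Phusion PCR buffer (NEB) | 5 μl
Phusion® high-fidelity DNA polymerase | 1 μl
P5 primer (10 μM) | 1 μl
P7 primer (10 μM) | 1 μl
dH2O | 10 μl
Total | 25 μl
The PCR conditions are shown in Table 13.
Table 13
Temperature | Time | Cycles
98°C | 2 min | 1
98°C | 30 s |
60°C | 30 s | 9
72°C | 5 min |
12°C | hold |
Example 11 Recovery and purification of the PCR product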
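1) Run the PCR product on a 6% non-denaturing acrylamide gel: 180 V, 30 min.
2) Excise the library band of about 86-90 bp and place it in a 0.5 ml centrifuge tube nested inside a 2 ml centrifuge tube (the bottom of the 0.5 ml tube having been pierced with six small holes using a needle). Centrifuge at 14000 rpm for 2 min to crush the gel into the 2 ml tube.
3) Add 100 μl of 1× NEB buffer 2 to the 2 ml tube and rotate on a vertical mixer at room temperature for 2 h.
4) Transfer all of the liquid and gel particles to a Spin-X filter column (Spin-X Cellulose Acetate Filter) and centrifuge at 14,000 rpm for 2 min. To the collection tube add, in order, 1 μl of glycogen, 10 μl of 3 M sodium acetate and 325 μl of pre-chilled absolute ethanol; mix and leave at -80°C for 30 min.
5) Centrifuge at 14000 rpm and 4°C for 10 min and carefully remove the supernatant.
6) Wash the pellet once with 70% ethanol, centrifuge at 14000 rpm and 4°C for 5 min, and carefully remove the supernatant.
7) Air-dry the pellet at room temperature for 2 min and dissolve it in 15 μl of Elution Buffer (QIAGEN).
Example 12 Library quality control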
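The insert size and yield of the libraries were checked on an Agilent 2100 Bioanalyzer (Bioanalyzer analysis system, Agilent, Santa Clara, USA), and the library concentration was quantified precisely by Q-PCR.
The Agilent 2100 Bioanalyzer was used to measure the fragment size of the library from the h9 genome glycosylated with T4-BGT and then digested with MspI, and of the libraries from the h9 genome digested directly with MspI or with HpaII. The results were as follows:
Figure 2 shows that, after PCR amplification, the three libraries carrying Linker N and the P7 adapter all had a fragment size of 96 bp, matching the theoretical size. Figure 2(a) shows the fragment size distribution of the MspI-digested library from the h9 stem cell genome after T4-BGT glycosylation; Figure 2(b) shows that of the library from the h9 genome digested directly with MspI; Figure 2(c) shows that of the library from the h9 genome digested directly with HpaII.
Example 13 Sequencing and data analysis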
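Libraries that passed quality control were sequenced on a HiSeq 2000 sequencer with single-end 50-base reads. After the sequencing data had been normalized, the methylation and hydroxymethylation level of each site was obtained by comparing the number of sequenced 20-bp short tags corresponding to each CCGG site across the three libraries. The procedure was as follows:
1) After passing quality control, the libraries were sequenced on a HiSeq 2000 sequencer with single-end 50-base reads;
2) After sequencing, the raw sequences of the library fragments were obtained. Using the sequence of the sequencing adapter added during library construction, the adapter sequence was removed from each raw read. The raw reads were also quality-filtered to remove low-quality reads: a read was discarded if its number of N bases exceeded 10% of the total number of bases, or if its number of bases with a quality value below 20 exceeded 10% of the total number of bases;
3) The human genome sequence hg19 was digested in silico according to this experimental scheme to give the theoretical digestion fragments, which formed the virtual library. The filtered reads from the previous step were then aligned to the virtual library with no mismatches allowed, and the alignment results were tallied;
4) After alignment, the data from the three libraries were preprocessed to obtain the sequencing depth of each CCGG site in each of the three libraries, and the data were normalized as follows:
(a) The sites were ranked by CCGG site depth within each column, that is, within each library, so that every CCGG site obtained a rank in every library. (b) For each site, the variance of its ranks in the three libraries was calculated, and the sites with the largest variance were discarded; the number discarded was (total number of sites - 5000)/4. The remaining sites were ranked again so that each CCGG site obtained a rank in each column, the variance of the three ranks of each site was calculated, and the (total number of sites - 5000)/4 sites with the largest variance were removed. This was repeated for 4 rounds in total, leaving 5000 sites as the normalization baseline. The three libraries were then normalized according to the ratio, between the three libraries, of the total sequencing depth over these 5000 rank-stable sites, scaling the depth of each library to that of the library able to detect C, mC and 5hmC simultaneously;
5) The methylation level and hydroxymethylation level of each CCGG site were calculated from the normalized data;
6) Using Perl for programming and R for plotting, the modification information of the CCGG sites was used to summarize the overall distribution of methylation and hydroxymethylation levels in the sample and the levels of methylation and hydroxymethylation on the different chromosomes; see Figure 3 and Figure 4, respectively.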
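The rank-stability normalization of step 4) can be sketched in pure Python as below. This is an illustration under stated assumptions, not the inventors' Perl code: depths are given as one tuple per CCGG site with the libraries in a fixed column order, ties in depth are broken by input order, and the reference column (`ref=1`, standing for the MspI-only control) is a choice made here. The names `rank_column`, `stable_sites` and `scale_factors` are hypothetical.

```python
from statistics import pvariance


def rank_column(values):
    """Rank values from 0 (lowest) upward; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def stable_sites(depths, n_keep=5000, rounds=4):
    """Return indices of the n_keep sites whose ranks agree best across libraries."""
    keep = list(range(len(depths)))
    if len(keep) <= n_keep:
        return keep
    n_libs = len(depths[0])
    drop_per_round = (len(keep) - n_keep) // rounds
    for r in range(rounds):
        cols = [rank_column([depths[i][k] for i in keep]) for k in range(n_libs)]
        var = [pvariance([col[j] for col in cols]) for j in range(len(keep))]
        # The last round removes whatever is left above n_keep.
        n_drop = drop_per_round if r < rounds - 1 else len(keep) - n_keep
        order = sorted(range(len(keep)), key=lambda j: var[j])
        keep = sorted(keep[j] for j in order[:len(keep) - n_drop])
    return keep


def scale_factors(depths, baseline, ref=1):
    """Per-library factors that match total baseline depth to the ref library."""
    n_libs = len(depths[0])
    totals = [sum(depths[i][k] for i in baseline) for k in range(n_libs)]
    return [totals[ref] / t for t in totals]
```

Multiplying every depth in library k by factor k puts the three libraries on the scale of the reference library. The sketch assumes each library has nonzero total depth over the baseline sites.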
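Figure 3 shows the overall distribution of methylation and hydroxymethylation levels across the CCGG sites of the sample. The x-axis is the modification level; the y-axis is the density of CCGG sites at that modification level relative to the total number of sites. Figure 3 shows that the methylation detected by the method of the invention falls into two trends, low methylation and high methylation, whereas hydroxymethylation occurs only at relatively low levels.
Figure 4 shows the methylation and hydroxymethylation levels on each chromosome. On every chromosome the methylation level lies between 60% and 80% and is concentrated around 70%, in full agreement with the previously established figure of about 70% methylation at CG sites in the human genome. The inventors also found that hydroxymethylation in the human embryonic stem cell line h9 was at a low level throughout, below 20%, consistent with the low levels of hydroxymethylation reported in current studies, indicating that the detection technique of the present invention is highly reliable.
To further confirm the accuracy of methylation detection by the method of the invention, the inventors downloaded published genome-wide bisulfite sequencing data for h9 cells and compared bisulfite sequencing with the enzyme-digestion-based methylation and hydroxymethylation sequencing of the present invention.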
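A comparison of this kind can be sketched as a per-site agreement rate. The sketch assumes both data sets are dictionaries mapping a site identifier to a modification level between 0 and 1, and counts two values as consistent when they differ by less than a tolerance. The default of 0.25 mirrors the threshold reported for Figure 5; the function name `concordance` and the site keys are hypothetical.

```python
def concordance(levels_a, levels_b, tol=0.25):
    """Fraction of shared sites whose levels differ by less than tol."""
    shared = levels_a.keys() & levels_b.keys()
    if not shared:
        return None  # nothing to compare
    agree = sum(1 for s in shared if abs(levels_a[s] - levels_b[s]) < tol)
    return agree / len(shared)
```

Only sites present in both data sets are compared, so the result describes agreement where coverage overlaps rather than overall sensitivity.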
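Figure 5 shows the concordance between the methylation and hydroxymethylation data and the bisulfite analysis data. Within a difference range of ±0.25, 87.9% of the methylated or hydroxymethylated sites detected by the two methods were consistent, a relatively high level of agreement. For the very small fraction of sites whose difference falls outside (-0.25, 0.25), the cause may be bisulfite conversion efficiency, differences in cell state, or digestion efficiency; this does not affect the overall application of the technique, and the difference is acceptable.
Example 14 Kit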
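This example provides a kit for accurately detecting hydroxymethylation of nucleic acid in a sample, comprising the components:
(1) a first container and, in the container, reagents for glycosylating 5hmC;
(2) a second container and, in the container, restriction endonuclease reaction reagents; the second container is divided into separate compartments, and the restriction endonucleases MspI, HpaII, MmeI or EcoP15I, and NlaIII are each held in a separate compartment;
(3) a third container and, in the container, the biotin-labeled linker;
(4) a fourth container and, in the container, the second linker, with the linker sequences SEQ ID NO: 3 and SEQ ID NO: 4;
(5) a fifth container and, in the container, the P5 and P7 adapters;
(6) a sixth container and, in the container, reagents for magnetic bead capture;
(7) a seventh container and, in the container, reagents for nucleic acid purification;
(8) instructions.
Discussion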
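For high-throughput detection of hydroxymethylation at CCGG sites across the whole genome, NEB designed a strategy that works as follows:
First, the whole genome is digested with MspI, so that, assuming 100% digestion efficiency, every CCGG site in the genome is cut (including methylated and hydroxymethylated sites);
Second, with dCTP as substrate, the digested fragments are treated with Klenow fragment to form sticky ends with a single-base C overhang at the 5' end;
Third, DNA fragments of 40-300 bp repaired by Klenow fragment are recovered from a 4% acrylamide gel;
Fourth, the recovered fragments are ligated to a double-stranded linker with a 5' G overhang (the linker mediates the subsequent PCR amplification and sequencing);
Fifth, the linker-ligated fragments are glycosylated with BGT; if a CCGG site in the original genomic sequence is hydroxymethylated, 5gmC is formed;
Sixth, the glycosylated product is digested with MspI again; if the CCGG site is hydroxymethylated, the linker is not cut off;
Seventh, one third of the above product is PCR-amplified and sequenced; only sequences with a linker at both ends, that is, sequences hydroxymethylated at both ends, can be detected. The remaining two thirds is split into two portions of one third each, which are not PCR-amplified directly. Instead, each is end-repaired again with Klenow fragment using dCTP as substrate, giving products with a single-base C overhang at one or both ends, and a different linker is then ligated with ligase. One of the two ligation products is PCR-amplified and sequenced directly. The other is digested with HpaII and then end-repaired and linker-ligated again before PCR amplification and sequencing. Thus in the first group the CCGG sites at both ends of each detected sequence are hydroxymethylated; in the second group one end of each detected sequence is hydroxymethylated and the other is methylated or unmodified; and in the third group one end of each detected sequence is hydroxymethylated and the other is an unmodified CCGG site.
Although NEB's technique has made considerable progress in detecting the distribution of 5hmC in the genome, it also has some problems. For example, the technique can only use dCTP as the substrate for end repair. At a site that was originally hydroxymethylated on both strands, only one strand remains hydroxymethylated after end repair, which strongly affects digestion at that site and introduces many errors. In addition, the fragments analyzed by the technique are selected by gel excision (40-300 bp), so modification information for sites outside that fragment range is not obtained; that is, the detected sites are incomplete and their number is small.
Relying on the principle that glycosylated 5hmC cannot be cut by the MspI restriction endonuclease whereas 5mC and unmodified C can, the inventors designed a set of linkers comprising a biotin-modified linker and a linker containing an MmeI recognition site. Libraries are constructed by subjecting the whole genome, in order, to glycosylation, MspI digestion, ligation of the biotin-modified linker, NlaIII digestion, streptavidin magnetic bead capture, ligation of the MmeI-site linker and MmeI digestion. A high-throughput sequencer is then used to locate 5hmC precisely across the whole genome, establishing a technique that detects 5hmC accurately at single-base resolution.
All documents mentioned in the present invention are incorporated by reference in this application as if each were individually incorporated by reference. It should further be understood that, after reading the above teachings of the invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the claims appended to this application.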

Claims

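Claims
1. A method for detecting hydroxymethylation modification of nucleic acid, characterized by comprising the steps of:
(1) glycosylating the nucleic acid to obtain glycosylated nucleic acid in which hydroxymethylated bases have been converted to glycosyl-hydroxymethylated bases;
(2) separately digesting control nucleic acid that has not been glycosylated and the glycosylated nucleic acid obtained in step (1) with a first restriction endonuclease, to obtain first control nucleic acid fragments and sample nucleic acid fragments, respectively; and digesting the control nucleic acid or the glycosylated nucleic acid with a second restriction endonuclease, to obtain second control nucleic acid fragments;
(3) ligating a biotin-labeled linker to each of the first control nucleic acid fragments, sample nucleic acid fragments and second control nucleic acid fragments obtained in step (2), to obtain a first control ligation product, a sample ligation product and a second control ligation product, each carrying the biotin linker;
(4) digesting each of the first control ligation product, sample ligation product and second control ligation product obtained in step (3) with the restriction endonuclease NlaIII, to produce a first control NlaIII digestion product, a sample NlaIII digestion product and a second control NlaIII digestion product, each of the three products having the biotin-labeled linker at one end and a sticky end at the other;
(5) ligating a second linker to each of the first control NlaIII digestion product, sample NlaIII digestion product and second control NlaIII digestion product obtained in step (4), the second linker sequence containing a recognition site for a specific restriction endonuclease, to obtain a first control second-linker ligation product, a sample second-linker ligation product and a second control second-linker ligation product;
(6) digesting the first control second-linker ligation product, sample second-linker ligation product and second control second-linker ligation product obtained in step (5) with the specific restriction endonuclease, to obtain a first control final digestion product, a sample final digestion product and a second control final digestion product, each having the second linker at one end and a sticky end at the other;
(7) ligating the first control final digestion product, sample final digestion product and second control final digestion product obtained in step (6) to a sequencing adapter and amplifying the adapter-ligated products, to obtain a first control sequencing library, a sample sequencing library and a second control sequencing library;
(8) sequencing the sequencing libraries obtained in step (7) and analyzing and comparing the sequence information, to obtain information on the hydroxymethylation modification of the nucleic acid.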
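2. The method of claim 1, characterized in that the nucleic acid of step (1) is genomic DNA.
3. The method of claim 1 or 2, characterized in that the nucleic acid of step (1) is derived from an animal, a plant, a bacterium, a fungus, a virus, or a combination thereof.
4. The method of any one of claims 1-3, characterized in that the glycosylation of step (1) is as follows: under the action of the enzyme T4-BGT, with uridine diphosphate glucose as substrate, a glucose unit is transferred to the 5-hydroxymethylcytosine (5-hmC) of the nucleic acid to form β-glucosyl-5-hydroxymethylcytosine (5gmC).
5. The method of any one of claims 1-4, characterized in that the first restriction endonuclease of step (2) is MspI.
6. The method of any one of claims 1-5, characterized in that the second restriction endonuclease of step (2) is HpaII.
7. The method of any one of claims 1-6, characterized in that the biotin-labeled linker of step (3) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 1 and SEQ ID NO: 2, respectively.
8. The method of any one of claims 1-7, characterized in that step (4) further comprises: capturing the fragments produced by NlaIII digestion with streptavidin magnetic beads to obtain the first control NlaIII digestion product, sample NlaIII digestion product and second control NlaIII digestion product.
9. The method of any one of claims 1-8, characterized in that the second linker of step (5) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 3 and SEQ ID NO: 4, respectively.
10. The method of any one of claims 1-8, characterized in that the second linker of step (5) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 5 and SEQ ID NO: 6, respectively.
11. The method of any one of claims 1-8, characterized in that the second linker of step (5) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 7 and SEQ ID NO: 8, respectively.
12. The method of any one of claims 1-11, characterized in that the specific restriction endonuclease of step (6) is MmeI or EcoP15I.
13. The method of claim 12, characterized in that MmeI is used in step (6), giving 20-bp fragments with the second linker at one end and a sticky end at the other.
14. The method of claim 12, characterized in that EcoP15I is used in step (6), giving 25-bp fragments with the second linker at one end and a sticky end at the other.
15. The method of any one of claims 1-14, characterized in that the sequencing adapter of step (7) is formed by pairing two oligonucleotide strands whose sequences are SEQ ID NO: 9 and SEQ ID NO: 10, respectively.
16. The method of any one of claims 1-15, characterized in that the sequencing of step (8) is performed on any sequencing platform selected from the following group:
Illumina Solexa, Roche 454, ABI SOLiD, Helicos true single-molecule sequencing, PacBio single-molecule real-time sequencing, and Oxford Nanopore single-molecule nanopore sequencing.
17. The method of any one of claims 1-16, characterized in that the analysis and comparison of sequence information in step (8) comprises the following steps:
(i) filtering the raw reads obtained from each library after sequencing to obtain high-quality library fragments; and simulating digestion of a reference sequence to obtain a virtual library of theoretical digestion fragments;
(ii) aligning the high-quality library fragments obtained in step (i) to the virtual library and normalizing the alignment statistics to obtain normalized sequencing depths for the three libraries;
(iii) calculating the methylation and hydroxymethylation level of each CCGG site from the normalized data obtained in step (ii);
(iv) from the per-site levels obtained in step (iii), summarizing the overall methylation and hydroxymethylation levels of the sample and the levels of methylation and hydroxymethylation on each chromosome.
18. The method of claim 17, characterized in that the filtering of step (i) comprises:
(a) removing adapter sequences from the raw library reads; and/or
(b) removing reads in which the number of N bases exceeds 10% of the total number of bases; and/or
(c) removing reads in which the number of bases with a quality value below 20 exceeds 10% of the total number of bases.
19. The method of claim 17 or 18, characterized in that the reference sequence of step (i) is the human genome sequence hg18 or hg19.
20. The method of any one of claims 17-19, characterized in that the normalization of step (ii) comprises the steps of:
(A) ranking the sites of each library by CCGG site depth, so that every CCGG site obtains a rank in every library;
(B) obtaining the rank of each CCGG site in each column, calculating the variance of the three ranks of each site, and removing the sites with larger variance over n rounds until m sites remain as the normalization baseline, m and n being positive integers;
(C) normalizing the libraries according to the ratio, between libraries, of the total sequencing depth over these m rank-stable sites.
21. The method of claim 20, characterized in that in step (B) m is in the range 5000-15000 and n>4.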
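22. A kit for accurately detecting hydroxymethylation modification of a genome, characterized by comprising the components:
(1) a first container and, in the container, reagents for glycosylation;
(2) a second container and, in the container, restriction endonuclease reaction reagents;
(3) a third container and, in the container, a biotin-labeled linker consisting of two paired oligonucleotide strands whose sequences are SEQ ID NO: 1 and SEQ ID NO: 2, respectively;
(4) a fourth container and, in the container, a second linker consisting of two paired oligonucleotide strands whose sequences are SEQ ID NO: 3 and SEQ ID NO: 4; or SEQ ID NO: 5 and SEQ ID NO: 6; or SEQ ID NO: 7 and SEQ ID NO: 8;
(5) a fifth container and, in the container, a sequencing adapter consisting of two paired oligonucleotide strands whose sequences are SEQ ID NO: 9 and SEQ ID NO: 10.
23. The kit of claim 22, characterized in that the restriction endonucleases in the second container comprise MspI, HpaII, MmeI and NlaIII.
24. The kit of claim 22, characterized in that the restriction endonucleases in the second container comprise MspI, HpaII, EcoP15I and NlaIII.
25. The kit of any one of claims 22-24, characterized in that the kit further comprises reagents for magnetic bead capture, reagents for nucleic acid purification, or a combination thereof.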
PCT/CN2012/084964 2011-11-24 2012-11-21 Method for detecting hydroxymethylation modification of nucleic acid and use thereof WO2013075629A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/360,594 US9567633B2 (en) 2011-11-24 2012-11-21 Method for detecting hydroxylmethylation modification in nucleic acid and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110376589.1A CN103131754B (zh) 2011-11-24 2011-11-24 Method for detecting hydroxymethylation modification of nucleic acid and use thereof
CN201110376589.1 2011-11-24

Publications (1)

Publication Number Publication Date
WO2013075629A1 true WO2013075629A1 (zh) 2013-05-30

Family

ID=48469116

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/084964 WO2013075629A1 (zh) 2011-11-24 2012-11-21 Method for detecting hydroxymethylation modification of nucleic acid and use thereof

Country Status (3)

Country Link
US (1) US9567633B2 (zh)
CN (1) CN103131754B (zh)
WO (1) WO2013075629A1 (zh)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9115386B2 (en) 2008-09-26 2015-08-25 Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
ES2872073T3 (es) 2011-12-13 2021-11-02 Univ Oslo Hf Procedimientos y kits de detección de estado de metilación
EP3351644B1 (en) 2012-11-30 2020-01-29 Cambridge Epigenetix Limited Oxidising agent for modified nucleotides
CN103388024B (zh) * 2013-07-04 2015-06-17 徐州医学院 一种基于桥式pcr检测dna羟甲基化的方法
CN103409514B (zh) * 2013-07-23 2016-01-06 徐州医学院 一种基于芯片的高通量高灵敏检测5-羟甲基化胞嘧啶的方法
CN103911439A (zh) * 2014-03-13 2014-07-09 眭维国 系统性红斑狼疮羟甲基化状态的差异表达基因的分析方法和应用
CN104480214B (zh) * 2014-12-30 2018-01-16 深圳市易基因科技有限公司 羟甲基化暨甲基化长序列标签测序技术
EP3355939A4 (en) 2015-09-30 2019-04-17 Trustees of Boston University MICROBIAL DEADMAN AND PASS CODE EMERGENCY STOP SWITCHES
EP3170883B1 (en) * 2015-11-20 2021-08-11 The Procter & Gamble Company Cleaning product
CN105648537B (zh) * 2016-03-02 2018-06-29 上海易毕恩基因科技有限公司 Dna5-甲基胞嘧啶与5-羟甲基胞嘧啶基因图谱测序方法
US11162139B2 (en) 2016-03-02 2021-11-02 Shanghai Epican Genetech Co. Ltd. Method for genomic profiling of DNA 5-methylcytosine and 5-hydroxymethylcytosine
AU2017312953A1 (en) * 2016-08-16 2019-01-24 The Regents Of The University Of California Method for finding low abundance sequences by hybridization (flash)
CN107142320B (zh) * 2017-06-16 2021-03-09 上海易毕恩基因科技有限公司 用于检测肝癌的基因标志物及其用途
WO2019051484A1 (en) * 2017-09-11 2019-03-14 Ludwig Institute For Cancer Research Ltd SELECTIVE MARKING OF 5-METHYLCYTOSINE IN CIRCULATING ACELLULAR DNA
CN109097460A (zh) * 2018-08-30 2018-12-28 青岛大学 一种氧化修饰的含氮碱基的检测方法
CN109321647A (zh) * 2018-10-26 2019-02-12 苏州森苗生物科技有限公司 标记组合物及羟甲基化核酸文库的构建方法
CN109811037B (zh) * 2018-11-15 2022-02-11 华南师范大学 一种dna甲基化过程的连续在线检测方法
CN111254144B (zh) * 2020-01-23 2023-04-28 南开大学 一种分子尺发夹结构及其用于测量单分子磁镊空间尺度准确性的方法
CA3187549A1 (en) 2020-07-30 2022-02-03 Cambridge Epigenetix Limited Compositions and methods for nucleic acid analysis
CN115386966B (zh) * 2022-10-26 2023-03-21 北京寻因生物科技有限公司 Dna表观修饰的建库方法、测序方法及其建库试剂盒

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011025819A1 (en) * 2009-08-25 2011-03-03 New England Biolabs, Inc. Detection and quantification of hydroxymethylated nucleotides in a polynucleotide preparation
WO2011127136A1 (en) * 2010-04-06 2011-10-13 University Of Chicago Composition and methods related to modification of 5-hydroxymethylcytosine (5-hmc)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2289904A1 (en) * 2009-07-03 2011-03-02 Universita' degli Studi di Milano Inhibitors of microbial infections

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011025819A1 (en) * 2009-08-25 2011-03-03 New England Biolabs, Inc. Detection and quantification of hydroxymethylated nucleotides in a polynucleotide preparation
WO2011127136A1 (en) * 2010-04-06 2011-10-13 University Of Chicago Composition and methods related to modification of 5-hydroxymethylcytosine (5-hmc)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WILLIAM A. PASTOR. ET AL.: "Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells", NATURE, vol. 473, no. 7347, 19 May 2011 (2011-05-19), pages 394 - 397 *

Also Published As

Publication number Publication date
CN103131754A (zh) 2013-06-05
US20150031552A1 (en) 2015-01-29
CN103131754B (zh) 2014-07-30
US9567633B2 (en) 2017-02-14

Similar Documents

Publication Publication Date Title
WO2013075629A1 (zh) Method for detecting hydroxymethylation modification of nucleic acid and use thereof
AU2021200391B2 (en) Differential tagging of RNA for preparation of a cell-free DNA/RNA sequencing library
JP6571895B1 (ja) 核酸プローブ及びゲノム断片検出方法
US20220316010A1 (en) Methods for copy number determination
JP7407227B2 (ja) 遺伝子アリルを同定するための方法及びプローブ
US20230340590A1 (en) Method for verifying bioassay samples
WO2011127136A1 (en) Composition and methods related to modification of 5-hydroxymethylcytosine (5-hmc)
TW201321518A (zh) 微量核酸樣本的庫製備方法及其應用
CN112105626A (zh) 用于dna、特别是细胞游离dna的表观遗传学分析的方法
US10465241B2 (en) High resolution STR analysis using next generation sequencing
CA3168144A1 (en) Methods of targeted sequencing
TW202305143A (zh) 用於準確的平行定量核酸的高靈敏度方法
EP4172357B1 (en) Methods and compositions for analyzing nucleic acid
TW202302861A (zh) 用於準確的平行定量稀釋或未純化樣品中的核酸的方法
CN105603052B (zh) 探针及其用途
CN114787385A (zh) 用于检测核酸修饰的方法和系统
CN114774514B (zh) 一种适用于高通量靶向基因组甲基化检测的文库构建方法及其试剂盒
US20220307077A1 (en) Conservative concurrent evaluation of dna modifications
EP4215619A1 (en) Methods for sensitive and accurate parallel quantification of nucleic acids
WO2024056008A1 (zh) 鉴别癌症的甲基化标志物及应用
JP2024035110A (ja) 変異核酸の正確な並行定量するための高感度方法
CN117915922A (zh) 与假尿苷和5-羟甲基胞嘧啶的修饰和检测相关的组合物和方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12851694

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 14360594

Country of ref document: US

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/11/2014)

122 Ep: pct application non-entry in european phase

Ref document number: 12851694

Country of ref document: EP

Kind code of ref document: A1