WO2010085774A1 - Digital restriction enzyme analysis of methylation - Google Patents

Digital restriction enzyme analysis of methylation


Publication number
WO2010085774A1
Authority
WO
WIPO (PCT)
Prior art keywords
methylation
dna
methylated
smal
unmethylated
Application number
PCT/US2010/022027
Other languages
French (fr)
Inventor
Jaroslav Jelinek
Jean-Pierre J. Issa
Marcos R. H. Estecio
Shoudan Liang
Original Assignee
Board Of Regents, The University Of Texas System
Application filed by Board Of Regents, The University Of Texas System
Publication of WO2010085774A1


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism

Definitions

  • The disclosed methodologies include the step of sequentially digesting genomic DNA, in a single fraction, with a pair of enzymes (such as SmaI and XmaI) to create "unmethylated" and "methylated" signatures at the ends of the digested DNA fragments.
  • The methylated and unmethylated signatures are generated in a single tube, where the restriction digestion steps are performed serially rather than in parallel on two separate fractions. The first digestion creates blunt-ended fragments marking unmethylated CpG sites.
  • The second digestion leaves fragments with 5' overhangs that mark methylated CpG sites.
  • The overhangs are filled in a subsequent step and all DNA fragments are then ligated to sequencing adapters. Each end of a DNA fragment bears either an unmethylated or a methylated signature.
  • The DNA ends are then read by massively parallel sequencing, which provides the capability of analyzing millions of individual DNA molecules. Sequences are then mapped back to the genome. Methylation levels (percent methylation) at tens to hundreds of thousands of individual CpG sites are then calculated from the numbers of DNA molecules bearing methylated or unmethylated signatures.
  • The methodologies described herein are sometimes referred to as "Digital Restriction Enzyme Analysis of Methylation" or "DREAM." These methods are capable of quantitative high resolution mapping of DNA methylation without the need for bisulfite treatment. As noted immediately above, the methodology uses digital signatures of methylated or unmethylated CpG sites on individual DNA molecules, and the results of such readings are not biased towards methylated or unmethylated DNA. Furthermore, these methodologies, and devices and systems applying them, use massively parallel sequencing that can measure DNA methylation levels with an error of less than 10%.
  • Figure 1 provides certain data that resulted from sequenced clones in the feasibility study.
  • Figure 2 charts methylation status and the percent of methylated restriction versus the median size of fragment as part of the feasibility study.
  • Figure 3 is a chart depicting the presence in CpG islands where the percent of restriction sites in CpG islands per median size of fragments are shown.
  • Figure 4 is a chart depicting the presence in repeats where the percent of restriction sites in repeats per median size of fragment are shown.
  • Figure 5 is a bar chart of the methylation of adjacent SmaI sites in the white blood cell clones of the feasibility study.
  • Figure 6 depicts the example number of methylated SmaI sites by genic position.
  • Figure 7 depicts the amount of unmethylated and methylated CpG Islands versus
  • Figure 8 depicts the amount of unmethylated and methylated repeats versus no repeats of Example 1.
  • Figure 9 is a chart showing the number of restriction sites in the human genome by fragment size.
  • Figures 10A and 10B are depictions of cytosine methylation in normal and cancer cells.
  • CpG islands, or dense clusters of CG dinucleotides, are present at transcription start sites in about half of human genes. Almost all of these CpG islands are free from cytosine methylation, which gives a "green light" for transcription.
  • CpG islands in cancer cells are frequently methylated and affected genes are permanently silenced.
  • Figures 11A and 11B show that, in contrast to CpG islands, scattered CpG sites outside of dense islands (in gene bodies, intergenic regions and repeats) are generally methylated in normal tissues and may become hypomethylated in cancer.
  • The method is based on creating unmethylated and methylated signatures with two restriction enzymes, SmaI and XmaI, which are neoschizomers recognizing the same site.
  • The first enzyme cuts only unmethylated DNA and leaves a GGG signature.
  • The second enzyme cuts the remaining methylated DNA and leaves a CCGGG signature.
  • The signatures on individual DNA molecules are then resolved by massively parallel sequencing. Counting methylated and unmethylated signatures gives the methylation level at each sequenced end of the SmaI/XmaI fragment.
  • Figure 13A shows the number of CpG sites that can be analyzed in the genomic fraction containing SmaI/XmaI fragments 500 bp and smaller.
  • Figure 13B shows that the accuracy of measurement is statistically dependent on the number of sequenced DNA molecules or sequencing tags obtained for each particular site.
  • Figures 14A, 14B, 14C and 14D show that the methodology described herein is reproducible.
  • Figures 14A and 14B show replicate analyses of the same sample of normal DNA. These analyses were performed 9 months apart and on different sequencing machines.
  • Figures 14C and 14D show analyses of WBC DNA from 2 different normal individuals.
  • the Y axes in Figures 14A and 14C (the scatter plots) show differences in methylation
  • the X axes show the numbers of sequenced tags per each site.
  • The histograms on the right (Figures 14B and 14D) show that differences in methylation levels between replicates were less than 5% for more than 80% of sequenced CpG sites. Differences greater than 25% were observed in about 1% of sites.
  • FIG. 15A and 15B show what levels of methylation can lead to gene silencing.
  • TSS gene transcription start site
  • WBC total white blood cells
  • Figures 16A, 16B, 16C and 16D show the large disturbances of methylation in leukemia cell lines compared with methylation levels in normal WBC. Hypermethylated sites are seen above the zero line; hypomethylated sites are below the zero line. The histograms on the right (Figures 16B and 16D) summarize the frequencies of methylation changes. Marked hypomethylation can be seen in the K562 cell line, and about 10% of sites are hypermethylated in both cell lines.
  • Figures 17A and 17B show methylation changes in a primary sample from a patient with acute myeloid leukemia ("AML"). When we compared methylation levels in the bone marrow of an AML patient with methylation levels in WBC from a healthy donor, we also saw striking differences in leukemia: hypermethylation in 10% of sites and hypomethylation in 8% of sites.
  • AML acute myeloid leukemia
  • Figure 18A shows that methylation changes go up in leukemia as observed inside
  • Figure 18B shows that CpG sites outside of CpG islands were hypomethylated in leukemia. These included predominantly repeats.
  • Figure 19 shows certain genes that are differentially hypermethylated in leukemia.
  • BM normal bone marrow
  • WBC normal white blood cells
  • Fig. 20 shows the predicted error rate of methylation frequency measured by DREAM. Error rates were calculated for methylation frequencies of 0-100% determined by methylated and unmethylated signatures in 5-100 sequencing tags per SmaI site. The error rate shown was estimated using the formula described herein.
  • Fig.21 shows methylation frequencies analyzed by DREAM. The values follow binomial distribution and are different within CpG islands (CGI) and outside of CpG islands (NCGI). Numbers of CpG sites analyzed for methylation levels are shown for CGI and NCGI.
  • Fig. 22 shows median methylation by distance from transcription start site for CpG poor promoters (left) and CpG rich promoters (right). Dotted lines are 25% and 75% percentiles. Note a narrow protected region on the left (about 500 bp) and a large protected region on the right.
  • Fig. 23 provides another depiction of the Digital Restriction Enzyme Analysis of
  • Genomic DNA is sequentially digested at CCCGGG sites with SmaI and XmaI restriction endonucleases. SmaI is blocked by CpG methylation while XmaI is not. The enzymes leave distinct signatures at the ends of fragments of digested DNA.
  • Massively parallel sequencing of individual DNA molecules provides multiple sequencing tags mapping to each SmaI site. Methylation frequency at each site is calculated based on the numbers of sequencing tags with methylated or unmethylated signatures.
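The counting described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `(site_id, tag)` input format and the site identifiers are assumptions made for the example, not part of the disclosed method.

```python
from collections import defaultdict

METHYLATED_SIGNATURE = "CCGGG"   # tag start left by the XmaI cut plus fill-in
UNMETHYLATED_SIGNATURE = "GGG"   # tag start left by the blunt SmaI cut


def classify_tag(tag):
    """Return 'methylated', 'unmethylated', or None for one sequencing tag."""
    tag = tag.upper()
    # The two signatures cannot collide: one starts with C, the other with G.
    if tag.startswith(METHYLATED_SIGNATURE):
        return "methylated"
    if tag.startswith(UNMETHYLATED_SIGNATURE):
        return "unmethylated"
    return None


def methylation_frequencies(mapped_tags):
    """mapped_tags: iterable of (site_id, tag_sequence) pairs (hypothetical format).

    Returns {site_id: (n_methylated, n_unmethylated, frequency)}.
    Sites with no informative tag are omitted.
    """
    counts = defaultdict(lambda: [0, 0])
    for site_id, tag in mapped_tags:
        status = classify_tag(tag)
        if status == "methylated":
            counts[site_id][0] += 1
        elif status == "unmethylated":
            counts[site_id][1] += 1
    return {
        site_id: (n_m, n_u, n_m / (n_m + n_u))
        for site_id, (n_m, n_u) in counts.items()
    }
```

Tags that carry neither signature are ignored rather than counted as unmethylated, which mirrors the point that both states leave a positive signature.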
  • Fig. 24 shows a strategy for increasing the genomic coverage by reducing SmaI fragment sizes for Solexa sequencing.
  • Adapters labeled with biotin B are ligated to SmaI/XmaI digested DNA.
  • Small fragments A, B, C, D are directly processed for sequencing.
  • High molecular weight fraction H is digested by a frequently cutting enzyme, MvaI. Biotinylated ends are captured on magnetic beads and MvaI ends are religated. Resulting smaller fragments are then processed for sequencing. This approach is suggested to extend the genomic coverage of the method to restriction sites that are more than 500 bases apart.
  • Figure 25 shows unmethylated SmaI sites from a sequenced clone.
  • Figure 26 shows methylated SmaI sites from a sequenced clone.
  • Figure 27 shows unmethylated and methylated AvaI/BsoBI sites from a sequenced clone.
  • Figure 28 shows methylated HpaII sites from a sequenced clone.
  • DETAILED DESCRIPTION
  • Disclosed are reliable and cost-effective methods for quantitative high resolution mapping of DNA methylation status in the whole genome, and devices and systems that use the same.
  • The described methodologies are applicable in the detection of DNA methylation changes in cells, particularly diseased cells such as cancer cells.
  • The accuracy and genome-wide scope of this methodology will allow quantitative assessment of disturbances of epigenetic memory in disease such as cancer on an unprecedented scale.
  • Precise quantification of DNA methylation changes allows molecular definition of epigenetic response to treatments such as anti-cancer treatments.
  • the methods taught herein are useful to identify subtypes of disease including cancer with specific epigenetic profiles reflecting unique natural history, prognosis and responsiveness to treatment.
  • Cytosine methylation of DNA is a vital component of epigenetic memory and characterization of methylation changes in disease has important translational implications. Cytosine methylation of DNA undergoes complex changes in disease, particularly cancer.
  • methylation in neoplastic DNA can mark specific subsets of patients with unique natural history and/or responsiveness to treatment.
  • Therapies targeting DNA methylation have already shown efficacy in leukemias and a potential in other malignancies.
  • The methods described herein are useful to measure the frequencies of cytosine methylation by digitally counting millions of methylated and unmethylated signatures at hundreds of thousands of specific sites across the whole genome. These methodologies further provide optimized genome coverage, reproducibility, cost savings, greater quantitative accuracy, and a reduction in the amount of DNA needed.
  • This novel technology is useful to accurately map the genome-wide scale of DNA methylation changes in a single patient sample, and provides a tool for quantitative mapping of DNA methylation applicable in basic, translational and clinical research.
  • the methodology provides quantitative information on epigenome disturbances which is much needed for understanding of the molecular basis of diseases such as cancer and for development of biomarkers characterizing subsets of patients with unique natural history and/or responsiveness to specific treatments.
  • the disclosed methodology provides a technique for high resolution, quantitative mapping of DNA methylation in the human genome.
  • This approach is based on distinct signatures of methylated and unmethylated cytosines generated by treatment of genomic DNA with restriction enzymes.
  • The signatures of individual DNA fragments are resolved by massively parallel sequencing.
  • Quantitative data can be obtained for cancer-specific aberrations of DNA methylation at more than 80,000 CpG sites, including 50% of CpG islands in the human genome, without the need for bisulfite treatment.
  • Cytosine methylation of DNA is a vital component of epigenetic memory.
  • CIMP CpG island methylator phenotype
  • the disclosed methodologies and uses thereof will increase understanding of the role of epigenetic mechanisms in cancer, how the epigenetic changes can be used for better diagnosis, and how the epigenetic mechanisms can be modulated for therapeutic and preventive purposes.
  • This reliable methodology overcomes existing technological barriers related to the quantitative determination of methylation frequencies at individual CpG sites, with the maximum genome coverage, at minimum costs.
  • innovative therapeutic approaches can be designed to pharmacologically target the pathways. Hence, a technique for quantitative high resolution mapping of DNA methylation across the whole genome is provided.
  • the disclosed methodologies provide an unbiased assessment of methylation status of individual DNA molecules across the whole genome.
  • Digital quantitative measurement of DNA methylation genome-wide has not been achieved before. With respect to expected outcomes, this technology allows for accurate mapping of the genome-wide scale of DNA methylation changes in an individual cancer sample for less than $1000 (in the current dollar value).
  • Therapies targeting DNA methylation have shown efficacy in leukemias and a potential in other malignancies.
  • The methodology is a tool for basic, translational and clinical research for disease including cancer and is applicable to other conditions such as neurodevelopmental disorders, degenerative disorders, aging, and diseases with complex genetic and epigenetic components such as diabetes or cardiovascular disorders. Hence, the impact of the subject methodologies is vast.
  • Epigenetics refers to the study of clonally inherited changes in gene expression without accompanying genetic changes. There are three major general molecular mechanisms carrying epigenetic information: DNA methylation, histone modifications and RNA interference (Cedar H., DNA Methylation and Gene Activity, Cell 53(1): 3-4 (1988); Jenuwein T, et al., Translating the Histone Code, Science 293(5532): 1074-80 (2001); Zaratiegui M, et al., Noncoding RNAs and Gene Silencing, Cell 128(4): 763-76 (2007)). DNA methylation in mammals affects cytosines in CpG dinucleotides. There are approximately 30 million CpG sites in the human genome, and the majority of them are methylated.
  • Cancer is associated with complex changes in DNA methylation. For the most part, these changes involve simultaneous global demethylation and de novo methylation at previously unmethylated CpG islands. Demethylation was first discovered by studying overall 5-methyl-cytosine content in tumors, and appears to involve primarily satellite DNA, repetitive sequences, and CpG sites located in introns (Feinberg AP, et al., Hypomethylation Distinguishes Genes of Some Human Cancers From Their Normal Counterparts, Nature 301(5895): 89-92 (1983); Ji W, et al., DNA Demethylation and Pericentromeric Rearrangements of Chromosome 1, Mutat Res 379(1): 33-41 (1997)).
  • methylation profiling provides a bird's eye view of cancer mRNA
  • methylation studies add to that analysis by identifying (1) genes whose silencing is potentially selected for during carcinogenesis (as opposed to reflecting differentiation or proliferation) and (2) genes whose silencing is permanent, i.e. that cannot be activated in response to changing tumor microenvironment or exposure to conventional chemotherapeutic agents.
  • changes in mRNA levels for genes whose baseline expression is low are more difficult to identify using cDNA arrays, but should readily be detected using differential methylation (if the loci are targeted by aberrant methylation).
  • methylation profiling is less affected by cell selection for analysis than gene expression profiling because the DNA change is thought to mark the neoplastic stem cell as well as its progeny.
  • DNA methylation cannot be measured directly.
  • Methods for detection of DNA methylation rely on 3 main principles: (1) bisulfite conversion of unmethylated cytosines to uracil (2) capture of methylated DNA with methyl-binding proteins or an antibody against 5-methyl-cytosine and (3) distinction of methylated and unmethylated cytosines by methylation-sensitive restriction enzymes.
  • Microarrays are well suited for fast analysis of multiple samples; however, they can only detect genomic regions limited by the selection of the probes. Moreover, they suffer from a host of technical issues, such as variable efficiencies of probe hybridization and probe cross-reactivity (Lu R, et al., Assessing Probe-Specific Dye And Slide Biases In Two-Color Microarray Data, BMC Bioinformatics 9: 314 (2008)). As a result of these limitations, microarray-based techniques can provide DNA methylation data of only a qualitative or semiquantitative nature in pre-selected genomic regions.
  • RRBS Reduced representation bisulfite sequencing
  • Restriction endonucleases can accurately distinguish between sequences with methylated and unmethylated cytosines in DNA. As such, described herein are methods for quantitative detection of DNA methylation levels that can be based on massively parallel sequencing of whole genome libraries with distinct signatures of methylated and unmethylated DNA created by sequential digests with methylation-specific restriction enzymes.
  • The XmaI enzyme cuts the remaining methylated CCCGGG sites, leaving 5' CCGG overhangs. Next, these overhangs are filled in and blunted by Klenow DNA polymerase and T4 DNA polymerase. Then 3' A tails are added to the blunt-ended DNA fragments by Klenow (exo-minus) DNA polymerase and sequencing adapters are ligated. DNA fragments with ligated adapters are size selected and amplified by limited PCR, and the size selection of PCR products is repeated. Massively parallel sequencing follows. Unmethylated SmaI sites are characterized by the initial sequence GGG while methylated SmaI sites begin with CCGGG. Analysis of SmaI/XmaI fragments smaller than 500 bp can provide quantitative information on methylation of 28% of the total 378,855 SmaI sites in the human genome.
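The coverage figure quoted above depends on how many CCCGGG sites bound a fragment short enough to sequence. A minimal in-silico sketch of that count follows; the function names and the toy sequence in the usage note are assumptions for illustration, and a real analysis would run per chromosome on a reference genome.

```python
import re

SITE = "CCCGGG"  # shared SmaI/XmaI recognition sequence


def find_sites(sequence):
    """Return 0-based start positions of every CCCGGG site in the sequence."""
    # CCCGGG cannot overlap itself, so a plain non-overlapping scan finds all sites.
    return [m.start() for m in re.finditer(SITE, sequence.upper())]


def sites_on_short_fragments(positions, size_limit=500):
    """Return site positions that bound at least one fragment shorter than size_limit.

    The distance between consecutive site starts equals the fragment length,
    because both enzymes cut at a fixed offset within the site.
    """
    covered = set()
    for left, right in zip(positions, positions[1:]):
        if right - left < size_limit:
            covered.update((left, right))
    return sorted(covered)
```

For example, on a toy sequence with sites at positions 0, 100 and 700, only the first two sites bound a fragment under 500 bp, so two of three sites would be informative.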
  • Sequencing adapters ligated after SmaI and XmaI restriction digest were biotinylated at their ends.
  • DNA with ligated adapters was digested with MvaI, a frequently cutting enzyme (CCWGG recognition site).
  • Adapters ligated to SmaI/XmaI sites and genomic DNA extending to the nearest MvaI cutting site were recovered by purification on streptavidin magnetic beads. Internal sequences cut out by size reduction enzymes were removed by washing, since they do not bind to streptavidin beads.
  • The streptavidin-purified fraction, containing biotinylated adapters connected to short fragments of genomic DNA beginning at SmaI/XmaI sites, was religated at the sites exposed by size reduction enzymes.
  • This step created a library of short DNA fragments containing sequencing adapters at both ends and genomic DNA flanked by SmaI/XmaI sites in the middle.
  • The library was PCR amplified, size purified and cloned in a sequencing vector for validation. The presence of chimeric SmaI/XmaI fragments containing human DNA from different chromosomes joined at MvaI sites confirmed the feasibility of this approach.
  • Methylation-sensitive and methylation-insensitive restriction enzyme pairs, namely AvaI/BsoBI (CYCGRG recognition sequence) and HpaII/MspI (CCGG recognition sequence), were used analogously to the SmaI/XmaI approach.
  • The first step is a restriction digest with the methylation-sensitive enzyme. The 5' overhangs created by this first enzyme were removed by mung bean nuclease treatment. Restriction digest with the second, methylation-insensitive enzyme follows, and the 5' overhangs were filled in by Klenow and T4 DNA polymerases. 3' A tailing and ligation of adapters were the same as in the SmaI/XmaI approach.
  • The AvaI/BsoBI method creates G starts for unmethylated and YCGRG starts for methylated sequences.
  • The HpaII/MspI method creates G starts for unmethylated and CGG starts for methylated sequences. Digital reading of multiple sequences provides a quantitative measure of DNA methylation at individual restriction sites.
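The tag starts listed for the three enzyme pairs can be collected into one lookup, with the methylated AvaI/BsoBI start written in IUPAC codes as YCGRG (Y = C or T, R = A or G). The sketch below is an illustration only; the dictionary layout and function names are assumptions, not part of the described protocol.

```python
import re

# IUPAC codes needed for the signatures used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}

# (unmethylated start, methylated start) for each enzyme pair, as stated in the text.
SIGNATURES = {
    "SmaI/XmaI": ("GGG", "CCGGG"),
    "AvaI/BsoBI": ("G", "YCGRG"),
    "HpaII/MspI": ("G", "CGG"),
}


def to_regex(pattern):
    """Translate an IUPAC pattern into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern))


def classify(tag, pair):
    """Return 'methylated', 'unmethylated', or None for a tag under one enzyme pair."""
    unmethylated, methylated = SIGNATURES[pair]
    tag = tag.upper()
    # re.match anchors at the start of the tag, which is where the signature sits.
    if to_regex(methylated).match(tag):
        return "methylated"
    if to_regex(unmethylated).match(tag):
        return "unmethylated"
    return None
```

In every pair the methylated start begins with C or T and the unmethylated start with G, so the order of the two checks does not affect the result.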
  • the DREAM method thus is useful for epigenome-wide quantitative analysis of DNA methylation in normal and cancer cells.
  • The proposed concept of the DREAM method is well suited for massively parallel sequencing; however, it is not restricted to a particular method of DNA sequencing. We validated it by conventional Sanger sequencing with fluorescent dideoxynucleotide terminators. The method is not biased towards methylated or unmethylated sites. It has a potential application for genome-wide mapping of DNA methylation in health and disease. It is not restricted to human DNA; it can be used for mapping in other species that have DNA methylation. In human pathology, the DREAM method can be used for assessing prognosis of the disease, predicting response to treatment, and monitoring the course of the disease and the response to treatment.
  • Step 3 Perform 3' End Filling and A-Tailing with Klenow Exo-Minus
  • NEB2 buffer, 5 ul; dCTP, dGTP, dATP "CGA" mix (10 mM), 2 ul; Klenow exo- (3' to 5' exo minus), 3 ul
  • Step 4 Ligate Solexa Sequencing Adapters (PEA; Paired Ends Adapters)
  • PE adapters oligo mix PEA (25 mM), 1 ul; Ultra Pure DNA ligase, 5 ul
  • Step 5 Perform Size Selection in Agarose Gel
  • Photograph the gel. Cut out a window from 250 to 500 bp and divide it into 2 even slices of increasing size, A and B.
  • Step 6 PCR with Solexa From Gel Slices A, B
  • Plan 100 ul per reaction: iProof HF master mix 2x, 50 ul
  • Step 7 Perform Agencourt Ampure Purification, Elute with 50 ul EB
  • Step 8 Validation of the DREAM Library
  • Clone 2 ul in pZeroBlunt sequencing vector. Transform into bacteria. Pick 24 bacterial colonies for A and for B. Perform PCR and gel electrophoresis analysis of bacterial clones. Sequence the clones containing inserts. Calculate the fraction of inserts containing bona fide SmaI/XmaI fragments of genomic DNA.
  • Step 9 Sequence the Validated Library Using Illumina GAII
  • MCAM is a powerful method
  • one of its limitations relates to the microarray hybridization step, where multiple factors compromise data quality (hybridization kinetics, background, washing etc.) and to the fact that data are limited by what is present on the arrays.
  • We initially used a Solexa deep sequencer to test MCA/deep sequencing as an alternative method and obtained reliable data (not shown), but again faced an important limitation: the lack of quantification, and the fact that "unmethylated" and "PCR failures" were not distinguishable.
  • DREAM Digital Restriction Enzyme Analysis of Methylation
  • Solexa sequencing performs well for short DNA fragments. We found that a Solexa-compatible sequencing library containing fragments 400 bp or smaller would cover 19,079 SmaI sites or 48% of total
  • CGI means within a CpG island
  • The precision of the methylation frequency value obtained by DREAM analysis depends on the absolute numbers of methylated and unmethylated tags detected for each SmaI site.
  • N_methyl is the number of tags with a methylated signature.
  • N_unmethyl is the number of tags with an unmethylated signature.
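The precision formula itself is not reproduced in this excerpt. As a stand-in, and purely as an assumption, the sketch below uses the ordinary binomial standard error of a proportion to show how precision improves with the number of tags; it should not be read as the formula used in the source.

```python
import math


def methylation_estimate(n_methyl, n_unmethyl):
    """Return (frequency, standard_error) for one site.

    ASSUMPTION: the standard error is the textbook binomial one,
    sqrt(p * (1 - p) / n); the source's own formula is not shown here.
    """
    n = n_methyl + n_unmethyl
    if n == 0:
        raise ValueError("no informative tags for this site")
    p = n_methyl / n
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return p, standard_error
```

Under this assumption a site with 50 methylated and 50 unmethylated tags has a frequency of 0.5 with a standard error of 0.05, while the same frequency estimated from 10 tags has a standard error of roughly 0.16.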
  • The tags represented 5 times or more corresponded to 85,171 SmaI sites in total, including 16,238 in promoter CGIs, 14,306 in non-promoter CGIs and 3,936 in non-CGI promoters. These corresponded to 7929 promoter CGIs, 5514 non-promoter CGIs and 2877 non-CGI promoters (based on UCSC genes, with promoters defined as regions ±500 bases from transcription start). We were able to determine methylation frequencies at 22% of total genomic SmaI sites and at 77% of genomic SmaI sites mapping to CpG islands.
  • Methylation frequencies showed binomial distribution and stark differences in methylation patterns within CpG islands versus outside of them.
  • Within CpG islands, 84.5% of SmaI sites were unmethylated (0-5% methylation) and 5.6% of SmaI sites showed complete methylation (95-100%).
  • Outside of CpG islands, only 13.8% of SmaI sites were unmethylated while 40.2% were completely methylated (Fig. 21).
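The percentages above follow from binning per-site frequencies at the 0-5% and 95-100% thresholds. A minimal sketch of that summary is given here; the function name and the input list are hypothetical.

```python
def summarize_methylation(frequencies, low=0.05, high=0.95):
    """Summarize per-site methylation frequencies (values between 0 and 1).

    Sites at or below `low` are called unmethylated, sites at or above
    `high` completely methylated, and the rest intermediate.
    """
    total = len(frequencies)
    if total == 0:
        raise ValueError("no sites to summarize")
    unmethylated = sum(1 for f in frequencies if f <= low)
    complete = sum(1 for f in frequencies if f >= high)
    intermediate = total - unmethylated - complete
    return {
        "unmethylated_pct": 100.0 * unmethylated / total,
        "complete_pct": 100.0 * complete / total,
        "intermediate_pct": 100.0 * intermediate / total,
    }
```

Running it separately on the sites inside and outside CpG islands would give the two contrasting profiles described in the text.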
  • DREAM requires only 5 µg of DNA, an amount that can be obtained in nearly all cancer cases.
  • DNA is digested with SmaI restriction endonuclease, which cuts unmethylated CCCGGG sites leaving a blunt end, followed by XmaI endonuclease, which cuts methylated CCCGGG sites leaving an overhang. This is followed by a simple fill-in reaction, adapter ligation and Solexa sequencing. After sequencing, tags that start with GGG at SmaI sites represent the unmethylated state, while tags that start with CCGGG represent methylation. Methylation frequencies for individual SmaI sites are calculated as the number of tags with methylated signatures divided by the sum of methylated and unmethylated tags mapping to the particular SmaI site. The method is outlined in Figure 23.
  • Current Protocol for the DREAM Method
  • Five micrograms of genomic DNA are digested with 5 ul FastDigest SmaI endonuclease (Fermentas, Glen Burnie, MD) for 3 hours at 37 °C. Subsequently, 50 units (5 ul) of XmaI endonuclease (NEB, Ipswich, MA) are added and the digestion is continued for an additional 16 hours.
  • The digested DNA is purified using the QIAquick PCR purification kit (Qiagen, Valencia, CA). The next step (1) fills in the recessed 3' DNA ends created by XmaI digestion and (2) adds 3' dA tails to the blunt-ended DNA resulting from either the SmaI digest or the filled-in XmaI digest.
  • Solexa paired-end sequencing adapters are ligated using Rapid T4 DNA ligase (Enzymatics, Beverly, MA). The ligation mix is size selected by electrophoresis in 2% agarose. A slice corresponding to the 250-500 bp size window based on the DNA ladder is cut out and DNA is extracted from the agarose. Eluted DNA is amplified with Solexa paired-end PCR primers using iProof high-fidelity DNA polymerase (Bio-Rad Laboratories, Hercules, CA) and 18 cycles of amplification. The resulting sequencing library is cleaned with AMPure magnetic beads (Agencourt, Beverly, MA).
  • Sequencing on Illumina Genome Analyzer II
  • A Solexa core with the Illumina Genome Analyzer II machine can be used. Typically, more than 5 million sequences representing individual DNA molecules are collected from each sequencing lane. Sequencing tags are mapped to SmaI sites in the human genome and signatures corresponding to methylated and unmethylated CpG are enumerated for each SmaI site. Methylation frequencies for individual SmaI sites are then calculated.
  • Mapping of Sequencing Tags
  • To match a SmaI site, the tag must begin with either GGG or CCGGG. The rest of the tag, when a match is found, identifies the genomic location of the SmaI site. The match can be either upstream or downstream of the SmaI site.
  • The 45 nt of a Solexa read following the leading GGG or CCGGG were compared with all 45-mer SmaI sequences following the leading GGG or CCGGG. However, this required too much computation.
  • A tag is mapped to the SmaI site that has the lowest number of mismatches.
  • The SmaI site with the second lowest number of mismatches is also calculated to determine the quality of the match. This filtering approach significantly increased the speed of the computational analysis. For example, in our preliminary study, we found that 32 million reads could be analyzed in a few hours.
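The best-match and second-best-match bookkeeping described above can be sketched as follows. The filtering step that speeds up the search is not described in this excerpt, so the sketch falls back on a brute-force comparison that is only practical for small inputs. The function names, the `site_flanks` dictionary and its keys are assumptions made for the example.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def strip_signature(tag):
    """Split a tag into (methylation status, sequence after the signature)."""
    for signature, status in (("CCGGG", "methylated"), ("GGG", "unmethylated")):
        if tag.startswith(signature):
            return status, tag[len(signature):]
    return None, None


def map_tag(tag, site_flanks, flank_len=45):
    """Map one tag to the site flank with the fewest mismatches.

    site_flanks: {site_id: genomic sequence following the signature}, each at
    least flank_len long (hypothetical input; a real index would hold both the
    upstream and the downstream flank of every site).
    """
    status, body = strip_signature(tag.upper())
    if status is None or len(body) < flank_len or not site_flanks:
        return None
    body = body[:flank_len]
    scored = sorted(
        (hamming(body, flank[:flank_len]), site_id)
        for site_id, flank in site_flanks.items()
    )
    best_mismatches, best_site = scored[0]
    # The runner-up score indicates how unambiguous the best match is.
    second = scored[1][0] if len(scored) > 1 else None
    return {
        "site": best_site,
        "status": status,
        "mismatches": best_mismatches,
        "second_best_mismatches": second,
    }
```

A large gap between `mismatches` and `second_best_mismatches` suggests a unique placement; equal values flag a degenerate tag.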
  • A paired-end sequenced tag can be treated as two independent single-end tags. This is more economical, since the cost of a paired-end run is less than the cost of two single-end runs. Importantly, paired-end tags offer additional information. The requirement that the length of the DNA fragment cut by SmaI/XmaI be within the range for which the DNA was size selected should resolve some of the degenerate tags, where one or both ends have multiple matches in the genome. A biologically more interesting case arises when considering the methylation status of both ends of a SmaI fragment.
  • Both ends methylated; both ends unmethylated; and two possibilities of one end methylated and one unmethylated. This allows us to assess whether methylation at the two ends is independent.
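One simple way to look at independence is to tabulate the four end combinations and compare them with the counts expected if the two ends were methylated independently. The sketch below does only that tabulation; the `'M'`/`'U'` encoding and the function name are assumptions, and any formal test (for example a chi-square test) would be a further step not described in the source.

```python
from collections import Counter


def end_pair_table(pairs):
    """pairs: iterable of (left_status, right_status), each 'M' or 'U'.

    Returns (observed, expected), where expected holds the counts implied by
    independent methylation of the two fragment ends.
    """
    observed = Counter(pairs)
    total = sum(observed.values())
    if total == 0:
        raise ValueError("no paired-end fragments supplied")
    p_left = (observed[("M", "M")] + observed[("M", "U")]) / total
    p_right = (observed[("M", "M")] + observed[("U", "M")]) / total
    expected = {
        ("M", "M"): total * p_left * p_right,
        ("M", "U"): total * p_left * (1 - p_right),
        ("U", "M"): total * (1 - p_left) * p_right,
        ("U", "U"): total * (1 - p_left) * (1 - p_right),
    }
    return observed, expected
```

If concordant pairs (both methylated or both unmethylated) are observed much more often than expected, the two ends are unlikely to be independent.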
  • Calibrator Standards with Defined Methylation Levels
  • To ensure the accuracy of DNA methylation reading by the DREAM assay, we have constructed a set of 5 calibrators based on non-human DNA sequences (Taq polymerase, luciferase and green fluorescent protein), each containing two SmaI sites at a distance of 200-300 nt. The calibrators will be PCR amplified and left either untreated or in vitro methylated to 100% with the M.SssI CpG methylase (New England Biolabs). The completeness of methylation will be checked by resistance to SmaI digestion.
  • Graded proportions of unmethylated and methylated calibrators will be mixed to create a set of control sequences methylated to 0%, 25%, 50%, 75% and 100%.
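Measurements of such graded standards could be turned into a calibration curve. The sketch below fits a straight line by ordinary least squares and inverts it to correct a measured value. The linear model, the function names and the measured values in the usage note are all assumptions for illustration; the source does not specify how the curve is constructed.

```python
def fit_calibration(expected, measured):
    """Least-squares line measured = slope * expected + intercept."""
    n = len(expected)
    if n < 2 or n != len(measured):
        raise ValueError("need at least two matched calibrator points")
    mean_x = sum(expected) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct_measurement(measured_value, slope, intercept):
    """Invert the calibration line to estimate the true methylation level."""
    return (measured_value - intercept) / slope
```

With made-up readings of 2, 26, 50, 74 and 98% for the 0, 25, 50, 75 and 100% standards, the fitted slope is 0.96 and the intercept 2, so a sample reading of 74% would be corrected to 75%.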
  • This standard calibrator mix will be spiked into the samples of human DNA before processing for the DREAM analysis at a ratio of 1 ng of calibrators to 10,000 ng of gDNA. We expect to get 100- to 1000-fold coverage for each standard sequence in the DREAM library. Methylation data from these standards will be used for construction of calibration curves.
  • Quality Control of Sequencing Libraries
  • The libraries prepared for sequencing as described above will be examined by gel electrophoresis to check the size distribution of amplified DNA fragments and the absence of contamination with primer dimers. DNA quantity and quality will be measured by UV spectrophotometry using the NanoDrop machine. Aliquots of the libraries will be cloned in a sequencing vector using the Zero Blunt® TOPO® PCR cloning kit (Invitrogen). A representative number (10 or more) of individual bacterial clones will be sequenced by conventional Sanger sequencing at the M. D. Anderson core facility to evaluate the proportion of bona fide DNA fragments mapping to SmaI fragments and to check for the correct signatures at SmaI sites.
  • Pyrosequencing will be used to analyze SmaI residues in spiked-in calibrator standards to estimate proportions of methylated and unmethylated signatures. Further testing of sequencing libraries can be performed by real-time QPCR with specifically designed primers and TaqMan MGB probes. We can use primer/probe sets detecting (1) primer dimers, (2) correctly ligated sequencing adapters, and (3) sequencing adapters ligated to specific genomic sequences flanking SmaI sites representing SmaI fragments of several different sizes.
  • Validation of Results by Bisulfite Analysis
  • Bisulfite pyrosequencing quantitative assays can be used for independent validation of DNA methylation levels in selected genes.
  • The bisulfite pyrosequencing results can be compared with the DREAM data.
  • Optimization of DREAM to Minimize DNA Quantity and Maximize the Capture of Targeted CpG Sites. The DREAM method in its current configuration is limited to the analysis of SmaI sites that are within 500 bases of each other.
  • Biotinylated adapters with flanking genomic sequences starting with GGG or CCGGG methylation signatures can be captured on Dynabeads M270 Streptavidin magnetic beads (Invitrogen). Exposed MvaI sites can be religated using Rapid T4 DNA ligase (Enzymatics). Religated DNA with correct sequencing adapters can be amplified with Solexa paired-end PCR primers and the DREAM procedure will follow as described above.
  • Potential Problems and Alternative Approaches. A potential problem in cancer cells is the fragmentation of DNA and mutations.
  • DNA fragmentation would increase the background of non-informative sequences and decrease the coverage.
  • Potential approaches aim to increase the representation of bona fide SmaI fragments in sequencing libraries. As sequencing costs are expected to come down, a simple solution would be to increase the depth of sequencing, since methylation signatures are correct even in libraries with poor representation of SmaI fragments.
  • Sequencing limitations imposed by the size of SmaI fragments may be a problem.
  • DREAM genome coverage is limited to SmaI sites only.
  • DREAM has the advantages of reliability and quantification, but disadvantages include incomplete genome coverage and limited CpG sites sampled. Theoretically, a higher genome representation can be achieved by Me-DIP or ChIP using methylated CpG antibody, but our preliminary data and published studies suggest that the sensitivity of Me-DIP is limited. For example, the original report using this technology found only about 50 genes hypermethylated in the SW48 cell line, while other data show a 10-fold higher number (with >90% validation) in this same cell line.
  • PROPHETIC EXAMPLE VII FULL SCALE OF DNA METHYLATION CHANGES IN CANCER
  • Leukemia and cancer cell lines obtained from ATCC are available in the lab and their identity will be verified by DNA fingerprinting. We expect to perform DREAM analysis in 10 cell lines. Primary cells from 10 patients with acute myeloid leukemia or myelodysplastic syndrome before and after treatment with decitabine will be obtained for these demonstration studies from a leukemia tissue bank. Comparative Analysis of Methylation Results Obtained In Normal And Cancer Cells
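The calibrator standards listed above (mixes methylated to 0%, 25%, 50%, 75% and 100%) are intended for the construction of calibration curves. The source does not say how the curve is fitted, so the following is only a minimal sketch that assumes a straight-line least-squares fit. The function names and the observed values are hypothetical and are not data from this document.

```python
# Defined methylation levels of the calibrator mixes, in percent (from the text).
EXPECTED = [0.0, 25.0, 50.0, 75.0, 100.0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct(observed_percent, slope, intercept):
    """Map an observed methylation percentage back onto the calibrated scale."""
    value = (observed_percent - intercept) / slope
    return min(100.0, max(0.0, value))  # clamp to the valid 0-100% range


# Made-up readings for the five calibrators, for illustration only.
observed = [2.0, 26.0, 50.0, 74.0, 98.0]
slope, intercept = fit_line(EXPECTED, observed)
print(slope, intercept, correct(74.0, slope, intercept))
```

A linear model is the simplest choice; if the real calibrator readings were visibly curved, a different model would be needed.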

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method of analyzing the methylation of CpG sites where genomic DNA is sequentially digested with a pair of enzymes recognizing the same restriction site (CCCGGG) containing a CpG dinucleotide. The first enzyme, SmaI, cuts only at unmethylated CpG and leaves blunt ends. The second enzyme, XmaI, is not blocked by methylation and leaves a short 5' overhang. The enzymes thus create methylation-specific signatures at the ends of digested DNA fragments. These are deciphered by next generation sequencing. Methylation levels for each sequenced restriction site are calculated based on the numbers of DNA molecules with the methylated or unmethylated signatures. Using this method and by sequencing on a massively parallel sequencing device, DNA methylation can be analyzed in a single blood sample.

Description

DIGITAL RESTRICTION ENZYME ANALYSIS OF METHYLATION
CROSS REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to US Application Serial Number 61/147,376 which is incorporated by reference herein in its entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
None.
THE NAMES OF PARTIES TO A JOINT RESEARCH AGREEMENT
None.
INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON COMPACT DISC
None.
BACKGROUND OF THE INVENTION
Complex changes of DNA methylation in disease permanently disturb epigenetic regulation and promote neoplastic development. For example, DNA methylation in cancer undergoes complex changes that permanently disturb epigenetic memory and promote tumor development. However, knowledge of epigenetic alterations in cancer, both qualitative and quantitative, is lacking. Yet, quantitative information on epigenome disturbances is needed to understand the molecular basis of cancer and other diseases. The current genome-wide analysis of DNA methylation is limited to the detection of qualitative changes, or requires bisulfite conversion of DNA, which brings increased costs for the higher depth of sequencing needed and difficulties in mapping degenerate bisulfite-converted sequences. The lack of reliable methods for detailed mapping of accurate percentage levels of DNA methylation across the genome represents an important problem holding our knowledge in a pre-epigenomic era. Until we have reliable and cost-effective methods for quantitative mapping of DNA methylation, the full extent and significance of epigenomic disturbances in disease cannot be ascertained.
SUMMARY OF THE INVENTION
Methods of analyzing cytosine methylation in DNA and devices and systems that utilize these methods are provided. The described methodologies are useful in high resolution mapping of DNA methylation, and to determine a disease state such as cancer. As described herein, the disclosed methodologies include the step of sequentially digesting genomic DNA, in a single fraction, with a pair of enzymes (such as SmaI and XmaI) to create "unmethylated" and "methylated" signatures at the ends of the digested DNA fragments. The methylated and unmethylated signatures are generated in a single tube - where each restriction digestion step is done serially and not in parallel as when two fractions are used. The first digestion creates blunt ended fragments marking unmethylated CpG sites. The second digestion leaves fragments with 5' overhangs that mark methylated CpG sites. The overhangs are filled in a subsequent step and all DNA fragments are then ligated to sequencing adapters. Each end of a DNA fragment bears either an unmethylated or a methylated signature. The DNA ends are then read by massively parallel sequencing, which provides the capability of analyzing millions of individual DNA molecules. Sequences are then mapped back to the genome. Methylation levels (percentages of methylation) of tens to hundreds of thousands of individual CpG sites are then calculated based on the numbers of DNA molecules having methylated or unmethylated signatures.
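The final calculation described above amounts to dividing the number of methylated signatures by the total number of signatures at each site. A minimal sketch follows, assuming the tags have already been mapped and counted per site. The site identifiers, the counts and the `min_tags` cutoff are hypothetical illustrations, not values from the source.

```python
def methylation_levels(site_counts, min_tags=1):
    """Return {site: percent methylated} for sites with at least min_tags tags.

    site_counts maps a site identifier to a (methylated, unmethylated) pair
    of tag counts.
    """
    levels = {}
    for site, (methylated, unmethylated) in site_counts.items():
        total = methylated + unmethylated
        if total == 0 or total < min_tags:
            continue  # too few tags to report a level for this site
        levels[site] = 100.0 * methylated / total
    return levels


# Made-up counts for three hypothetical sites.
example_counts = {
    "chr1:1000": (30, 10),  # 30 methylated tags, 10 unmethylated tags
    "chr1:5000": (0, 25),
    "chr2:2000": (2, 1),
}
levels = methylation_levels(example_counts, min_tags=5)
print(levels)
```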
The methodologies described herein are sometimes referred to as "Digital Restriction Enzyme Analysis of Methylation" or "DREAM." These methods are capable of quantitative high resolution mapping of DNA methylation without the need for bisulfite treatment. As noted immediately above, the methodology utilizes digital signatures of methylated or unmethylated CpG sites on individual DNA molecules, and the results of such readings are not biased towards methylated or unmethylated DNA. Furthermore, these methodologies and devices and systems applying such methodologies use massively parallel sequencing that can measure DNA methylation levels with an error of less than 10%.
The methods described herein can be used on various systems and devices, including but not limited to Illumina GA II or Applied Biosystems SOLiD, and represent a substantial improvement over conventional approaches. As used alone or in connection with a device or system, the methods taught herein add qualitatively new research capabilities not provided by current technologies. With reduced costs, the methods provide for high throughput capability, improved specificity, and quantitative accuracy. This technology also provides a highly multiplexed platform for the discovery of cancer biomarkers and other research tools useful in the detection of epigenomic alterations in tumor tissue. The methods and devices described herein can also be used to predict a response to therapy and for therapy surveillance. This technology is also applicable for research in cancer etiology, epidemiology and cancer-related disparities. In short, this technology can and will create new insights into epigenetic mechanisms that lead to the development and progression of diseases such as cancer and neurodevelopmental disorders, degenerative diseases, cardiovascular diseases, diabetes and diseases with complex epigenetic and genetic multifactorial components.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 provides certain data that resulted from sequenced clones in the feasibility study.
Figure 2 charts methylation status and the percent of methylated restriction versus the median size of fragment as part of the feasibility study.
Figure 3 is a chart depicting the presence in CpG islands, where the percent of restriction sites in CpG islands per median size of fragment is shown.
Figure 4 is a chart depicting the presence in repeats, where the percent of restriction sites in repeats per median size of fragment is shown.
Figure 5 is a bar chart of the methylation of adjacent SmaI sites in the white blood cell clones of the feasibility study.
Figure 6 depicts the example number of methylated SmaI sites by genic position.
Figure 7 depicts the amount of unmethylated and methylated CpG Islands versus Non-CpG Islands as determined in the feasibility study of Example 1.
Figure 8 depicts the amount of unmethylated and methylated repeats versus no repeats of Example 1.
Figure 9 is a chart showing the amount of restriction sites in the human genome by fragment size.
Figures 10A and 10B are depictions of cytosine methylation in normal and cancer cells. CpG islands, or dense clusters of CG dinucleotides, are present at transcription start sites in about half of human genes. In normal cells, almost all of these CpG islands are free from cytosine methylation, which gives a "green light" for transcription. CpG islands in cancer cells are frequently methylated and affected genes are permanently silenced.
Figures 11A and 11B show that, in contrast to CpG islands, scattered CpG sites outside of dense islands, in gene bodies, intergenic regions and repeats, are generally methylated in normal tissues and may become hypomethylated in cancer.
Figure 12 is a depiction of the Digital Restriction Enzyme Analysis ("DREAM") methodology of DNA Methylation taught herein. The CpG methylation pattern is copied after DNA replication and therefore is not highly affected by differentiation. The method is based on creating unmethylated and methylated signatures with 2 restriction enzymes, SmaI and XmaI, neoschizomers recognizing the same site. The first enzyme cuts only unmethylated DNA and leaves a GGG signature. The second enzyme cuts the remaining methylated DNA and leaves a CCGGG signature. The signatures at individual DNA molecules are then resolved by massively parallel sequencing. Counting of methylated and unmethylated signatures tells us the methylation level at each sequenced end of the SmaI/XmaI fragment.
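The two signatures named above can be told apart by the first bases of each sequenced fragment end. The sketch below is illustrative only. It assumes each read has already had adapter sequence removed, so that it begins at the restriction site, and the read strings are invented for the example.

```python
from collections import Counter


def classify_read(read):
    """Classify a read by the signature at its start.

    CCGGG marks an XmaI cut (methylated site); GGG marks a SmaI cut
    (unmethylated site). Anything else is treated as non-informative.
    """
    read = read.upper()
    if read.startswith("CCGGG"):
        return "methylated"
    if read.startswith("GGG"):
        return "unmethylated"
    return "other"


# Invented reads, assumed to map to the same restriction site.
reads = ["GGGATTC", "CCGGGTTA", "ccgggaac", "ATGCGT"]
counts = Counter(classify_read(r) for r in reads)
print(counts)
```

The resulting tally of methylated and unmethylated reads is what feeds the per-site methylation level.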
Figure 13A shows the number of CpG sites that can be analyzed in the genomic fraction containing SmaI/XmaI fragments 500 bp and smaller.
Figure 13B shows that the accuracy of measurement is statistically dependent on the number of sequenced DNA molecules or sequencing tags obtained for each particular site.
Figures 14A, 14B, 14C and 14D show that the methodology described herein is reproducible. In particular, Figures 14A & 14B show replicate analysis of the same sample of normal DNA. These analyses were performed 9 months apart and on different sequencing machines. Figures 14C and 14D show analyses of WBC DNA from 2 different normal individuals. The Y axes in Figures 14A and 14C (the scatter plots) show differences in methylation; the X axes show the numbers of sequenced tags per each site. The histograms on the right (Figures 14B and 14D) show that differences in methylation levels between replicates were less than 5% for more than 80% of sequenced CpG sites. Differences greater than 25% were observed in about 1% of sites.
Figures 15A and 15B show what levels of methylation can lead to gene silencing. We found that as little as 1% of methylation close to a gene transcription start site (TSS) is associated with lower gene expression. Although we measured methylation levels in total white blood cells ("WBC"), there was a significant inverse correlation between expression in CD34+ bone marrow progenitors and methylation in their progenies.
Figures 16A, 16B, 16C and 16D show the huge disturbances of methylation in leukemia cell lines when the results were compared to methylation levels in normal WBC. Hypermethylated sites are seen above the zero line, hypomethylated sites are below the zero line. Histograms on the right, Figures 16B & 16D, summarize the frequencies of methylation changes. A remarkable hypomethylation in the K562 cell line can be seen, and about 10% hypermethylated sites in both cell lines.
Figures 17A and 17B show methylation changes in a primary sample from a patient with acute myeloid leukemia ("AML"). When we compared methylation levels in the bone marrow of an AML patient with methylation levels in WBC from a healthy donor, we also saw striking differences in leukemia: hypermethylation in 10% of sites and hypomethylation in 8% of sites.
Figure 18A shows that methylation changes go up in leukemia as observed inside CpG islands. Most of these sites are unmethylated in normal blood. We saw a significant deficit of unmethylated sites in leukemia. These sites became methylated to a varying extent.
In contrast, Figure 18B shows that CpG sites outside of CpG islands were hypomethylated in leukemia. These included predominantly repeats.
Figure 19 shows certain genes that are differentially hypermethylated in leukemia. We analyzed SmaI sites close to gene TSS and found differential hypermethylation in hundreds of genes, around 10% of the total analyzed, in leukemia cell lines and the primary cells. These genes were significantly enriched for developmental genes and polycomb targets reported in ES cells. Axonal guidance signaling was the top affected pathway in the cell lines; Wnt/β-catenin was the top pathway in primary cells. This pathway is important in cancer. It was also significantly enriched among methylation targets in the leukemia cell lines. Most of the methylated genes are not expressed in hematopoietic cells and are probably just passengers in the leukemia process. To look for potential drivers, we identified genes highly expressed in normal bone marrow ("BM") progenitors, unmethylated in normal white blood cells (sometimes referred to herein as "WBC") and methylated close to TSS in the AML sample. Out of 12 genes identified, 4 showed possible links to cancer. The first is a Rho-specific guanine nucleotide exchange factor, the second is the cyclin-dependent kinase inhibitor p27, the third is a pro-apoptotic transcription factor, and the fourth, GATA2, is an important hematopoietic developmental factor.
Fig. 20 shows the predicted error rate of methylation frequency measured by DREAM. Error rates were calculated for methylation frequencies of 0-100% determined by methylated and unmethylated signatures in 5-100 sequencing tags per SmaI site. The error rate shown was estimated using the formula described herein.
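The formula behind Fig. 20 is not reproduced in this section. As a stand-in, and only as an assumption, the sketch below uses the ordinary binomial standard error to show how sampling error shrinks as the number of sequencing tags per site grows. It should not be read as the source's own formula.

```python
import math


def binomial_standard_error(methylation_fraction, n_tags):
    """Standard error of a methylation fraction estimated from n_tags reads.

    Treats every tag as an independent draw that is methylated with
    probability methylation_fraction (a simplifying assumption).
    """
    if n_tags <= 0:
        raise ValueError("n_tags must be positive")
    p = methylation_fraction
    return math.sqrt(p * (1.0 - p) / n_tags)


# Tag counts spanning the 5-100 range mentioned for Fig. 20.
for n in (5, 25, 100):
    se = binomial_standard_error(0.5, n)
    print(f"{n} tags: about +/- {100 * se:.1f} percentage points at 50% methylation")
```

Under this model the error is largest at 50% methylation and falls with the square root of the tag count.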
Fig. 21 shows methylation frequencies analyzed by DREAM. The values follow a binomial distribution and are different within CpG islands (CGI) and outside of CpG islands (NCGI). Numbers of CpG sites analyzed for methylation levels are shown for CGI and NCGI.
Fig. 22 shows median methylation by distance from the transcription start site for CpG-poor promoters (left) and CpG-rich promoters (right). Dotted lines are the 25th and 75th percentiles. Note a narrow protected region on the left (about 500 bp) and a large protected region on the right.
Fig. 23 provides another depiction of the Digital Restriction Enzyme Analysis of Methylation (DREAM) technique provided herein. Genomic DNA is sequentially digested at CCCGGG sites with SmaI and XmaI restriction endonucleases. SmaI is blocked by CpG methylation while XmaI is not. The enzymes leave distinct signatures at the ends of fragments of digested DNA. Massively parallel sequencing of individual DNA molecules provides multiple sequencing tags mapping to each SmaI site. Methylation frequency at each site is calculated based on the numbers of sequencing tags with methylated or unmethylated signatures.
Fig. 24 shows a strategy for increasing genomic coverage by reducing SmaI fragment sizes for Solexa sequencing. Adapters labeled with biotin (B) are ligated to SmaI/XmaI digested DNA. Small fragments (A, B, C, D) are directly processed for sequencing. The high molecular weight fraction (H) is digested by a frequently cutting enzyme, MvaI. Biotinylated ends are captured on magnetic beads and MvaI ends are religated. The resulting smaller fragments are then processed for sequencing. This approach is suggested to increase the genomic coverage of the method to restriction sites that are more than 500 bases apart.
Figure 25 shows unmethylated SmaI sites from a sequenced clone.
Figure 26 shows methylated SmaI sites from a sequenced clone.
Figure 27 shows unmethylated and methylated AvaI/BsoBI sites from a sequenced clone.
Figure 28 shows methylated HpaII sites from a sequenced clone.
DETAILED DESCRIPTION
Disclosed are reliable and cost effective methods for quantitative high resolution mapping of DNA methylation status in the whole genome, and devices and systems that use the same. The described methodologies are applicable in the detection of DNA methylation changes in cells, particularly diseased cells such as cancer cells. The accuracy and genome-wide scope of this methodology will allow quantitative assessment of disturbances of epigenetic memory in disease such as cancer on an unprecedented scale. Precise quantification of DNA methylation changes allows molecular definition of epigenetic response to treatments such as anti-cancer treatments. As such, the methods taught herein are useful to identify subtypes of disease including cancer with specific epigenetic profiles reflecting unique natural history, prognosis and responsiveness to treatment.
Cytosine methylation of DNA is a vital component of epigenetic memory and characterization of methylation changes in disease has important translational implications. Cytosine methylation of DNA undergoes complex changes in disease, particularly cancer.
For example, methylation in neoplastic DNA can mark specific subsets of patients with unique natural history and/or responsiveness to treatment. Therapies targeting DNA methylation have already shown efficacy in leukemias and a potential in other malignancies.
Specifically, the methods described herein are useful to measure the frequencies of cytosine methylation by digitally counting millions of methylated and unmethylated signatures at hundreds of thousands of specific sites across the whole genome. These methodologies further provide optimized genome coverage, reproducibility, cost savings and greater quantitative accuracy, and minimize the quantity of DNA needed. This novel technology is useful to accurately map the genome-wide scale of DNA methylation changes in a single patient sample, and provides a tool for quantitative mapping of DNA methylation applicable in basic, translational and clinical research. The methodology provides quantitative information on epigenome disturbances which is much needed for understanding the molecular basis of diseases such as cancer and for development of biomarkers characterizing subsets of patients with unique natural history and/or responsiveness to specific treatments.
As provided herein, the disclosed methodology, sometimes referred to as "Digital Restriction Enzyme Analysis of Methylation" or "DREAM," provides a technique for high resolution, quantitative mapping of DNA methylation in the human genome. This approach is based on distinct signatures of methylated and unmethylated cytosines generated by treatment of genomic DNA with restriction enzymes. The signatures of individual DNA fragments are resolved by massively parallel sequencing. In this way, for example, quantitative data can be obtained for cancer specific aberrations of DNA methylation at more than 80,000 CpG sites, including 50% of CpG islands in the human genome, without the need for bisulfite treatment. Cytosine methylation of DNA is a vital component of epigenetic memory. Complex changes of DNA methylation in cancer permanently disturb epigenetic regulation and promote neoplastic development. These changes in cancer consist of global hypomethylation in repetitive sequences and gene bodies, which contrasts with focal hypermethylation in promoter-associated CpG islands. Hundreds to thousands of genes can be epigenetically silenced by CpG island hypermethylation in disease such as cancer, suggesting a general disturbance of epigenetic memory. For example, methylation affects individual cancer patients to a varying extent.
While some patients have minimal changes, others show concordant hypermethylation of multiple genes described as CpG island methylator phenotype (CIMP). Current knowledge of methylation changes in cancer is still superficial. Most current methods for genome-wide analysis of DNA methylation are limited to qualitative detection of changes, or require bisulfite conversion of DNA with the increased costs of a higher depth of sequencing.
On the other hand, the disclosed methodologies and uses thereof will increase understanding of the role of epigenetic mechanisms in cancer, how the epigenetic changes can be used for better diagnosis, and how the epigenetic mechanisms can be modulated for therapeutic and preventive purposes. This reliable methodology overcomes existing technological barriers related to the quantitative determination of methylation frequencies at individual CpG sites, with the maximum genome coverage, at minimum costs. With the knowledge of genes, pathways, and regions of the genome which are affected by aberrant DNA methylation in cancer, innovative therapeutic approaches can be designed to pharmacologically target the pathways. Hence, a technique for quantitative high resolution mapping of DNA methylation across the whole genome is provided. This methodology will generate specific signatures, distinct for methylated and unmethylated cytosines, by treatment of genomic DNA with restriction enzymes, and will digitally resolve these signatures on individual DNA fragments by massively parallel sequencing. As such, supportive data is provided herein. Quantitative high resolution mapping of DNA methylation builds on and coincides with the development and application of methods for genome-wide profiling of DNA methylation in disease such as cancer.
In addition, the disclosed methodologies provide an unbiased assessment of methylation status of individual DNA molecules across the whole genome. Digital quantitative measurement of DNA methylation genome-wide has not been achieved before. With respect to expected outcomes, this technology allows for accurate mapping of the genome-wide scale of DNA methylation changes in an individual cancer sample for less than $1000 (in the current dollar value). Therapies targeting DNA methylation have shown efficacy in leukemias and a potential in other malignancies. The methodology is a tool for basic, translational and clinical research for disease including cancer and is applicable to other diseases such as neurodevelopmental disorders, degenerative disorders, aging, and diseases with complex genetic and epigenetic components such as diabetes or cardiovascular disorders. Hence, the impact of the subject methodologies is vast.
Epigenetics refers to the study of clonally inherited changes in gene expression without accompanying genetic changes. There are three major general molecular mechanisms carrying epigenetic information - DNA methylation, histone modifications and RNA interference. Cedar H., DNA Methylation and Gene Activity, Cell 53(1): 3-4 (1988); Jenuwein T et al., Translating the Histone Code, Science 293(5532): 1074-80 (2001); Zaratiegui M, et al., Noncoding RNAs and Gene Silencing, Cell 128(4): 763-76 (2007). DNA methylation in mammals affects cytosines in CpG dinucleotides. There are approximately 30 million CpG sites in the human genome, and the majority of them are methylated. About 0.7% of DNA contains dense clusters of CpG dinucleotides, forming mostly unmethylated CpG islands. Rollins RA, et al., Large Scale Structure of Genomic Methylation Patterns, Genome Res 16(2): 157-63 (2006).
Cancer is associated with complex changes in DNA methylation. For the most part, these changes involve simultaneous global demethylation and de-novo methylation at previously unmethylated CpG islands. Demethylation was first discovered by studying overall 5-methyl-cytosine content in tumors, and appears to involve primarily satellite DNA, repetitive sequences, and CpG sites located in introns. Feinberg AP et al., Hypomethylation Distinguishes Genes of Some Human Cancers From Their Normal Counterparts, Nature 301(5895): 89-92 (1983); Ji W, et al., DNA Demethylation and Pericentromeric Rearrangements of Chromosome 1, Mutat Res 379(1): 33-41 (1997). In parallel to global hypomethylation, there also are distinct and frequent localized increases in methylation, often involving CpG islands. Baylin SB, et al., Alterations in DNA Methylation: A Fundamental Aspect of Neoplasia, Adv Cancer Res 72: 141-96 (1998); Jones PA et al., The Epigenomics of Cancer, Cell 128(4): 683-92 (2007). Because CpG island methylation is associated with repressed transcription that is stably inherited through mitosis, this de-novo methylation in transformed cells has been proposed to serve as an alternate mechanism for inactivating tumor-suppressor genes. Aberrant methylation is strongly correlated with gene silencing in neoplasia. In the face of reliable technology to detect gene expression changes in cancer, it is relevant to discuss differences between the two approaches. While expression profiling provides a bird's eye view of cancer mRNA, methylation studies add to that analysis by identifying (1) genes whose silencing is potentially selected for during carcinogenesis (as opposed to reflecting differentiation or proliferation) and (2) genes whose silencing is permanent, i.e. that cannot be activated in response to changing tumor microenvironment or exposure to conventional chemotherapeutic agents.
In addition, changes in mRNA levels for genes whose baseline expression is low are more difficult to identify using cDNA arrays, but should readily be detected using differential methylation (if the loci are targeted by aberrant methylation). Finally, specifically in a disease with various stages of differentiation, methylation profiling is less affected by cell selection for analysis than gene expression profiling because the DNA change is thought to mark the neoplastic stem cell as well as its progeny.
So far, DNA methylation cannot be measured directly. Methods for detection of DNA methylation rely on 3 main principles: (1) bisulfite conversion of unmethylated cytosines to uracil, (2) capture of methylated DNA with methyl-binding proteins or an antibody against 5-methyl-cytosine, and (3) distinction of methylated and unmethylated cytosines by methylation-sensitive restriction enzymes. Frommer M, et al., A Genomic Sequencing Protocol That Yields a Positive Display Of 5-Methylcytosine Residues In Individual DNA Strands, Proc Natl Acad Sci U S A 89(5): 1827-31 (1992); Cross SH, et al., Purification of CpG Islands Using a Methylated DNA Binding Column, Nat Genet 6(3): 236-44 (1994); Rougier N, et al., Chromosome Methylation Patterns During Mammalian Preimplantation Development, Genes Dev 12(14): 2108-13 (1998); Toyota M, et al., Identification of Differentially Methylated Sequences in Colorectal Cancer By Methylated CpG Island Amplification, Cancer Res 59(10): 2307-12 (1999); Khulan B, et al., Comparative Isoschizomer Profiling of Cytosine Methylation: The HELP Assay, Genome Res 16(8): 1046-55 (2006). All three principles of DNA methylation analysis have been applied to recent high throughput technologies, microarrays and massively parallel sequencing. Microarrays are well suited for fast analysis of multiple samples; however, they can only detect genomic regions limited by the selection of the probes. Moreover, they suffer from a host of technical issues, such as variable efficiencies of probe hybridization and probe cross-reactivity. Lu R, et al., Assessing Probe-Specific Dye And Slide Biases In Two-Color Microarray Data, BMC Bioinformatics 9: 314 (2008). As a result of these limitations, microarray-based techniques can provide DNA methylation data of qualitative or semiquantitative nature in pre-selected genomic regions.
On the other hand, massively parallel sequencing is not constrained by the shortcomings of the microarray technology. It has already shown its potential by sequencing the first cancer genomes. Ley TJ, et al., DNA Sequencing Of A Cytogenetically Normal Acute Myeloid Leukaemia Genome, Nature 456(7218): 66-72 (2008). As shown in Table 1, costs are coming down and this is bringing unprecedented possibilities for genome-wide epigenetic research.
Table 1
Figure imgf000013_0001
Sequencing of a whole bisulfite-converted genome would in theory map DNA methylation with single-base resolution. This has been achieved for the 120 Mb genome of Arabidopsis. Cokus SJ, et al., Shotgun Bisulphite Sequencing of The Arabidopsis Genome Reveals DNA Methylation Patterning, Nature 452(7184): 215-9 (2008); Lister R, et al., Highly integrated single-base resolution maps of the epigenome in Arabidopsis, Cell 133(3): 523-36 (2008). However, this task remains prohibitively expensive for the 25-fold larger human genome. Reduced representation bisulfite sequencing (RRBS) provided a high resolution quantitative analysis of a CpG-rich fraction of the mouse genome. Meissner A, et al., Genome-Scale DNA Methylation Maps of Pluripotent And Differentiated Cells, Nature 454(7205): 766-70 (2008). However, the costs of sequencing with a sufficient multiplicity of coverage for bisulfite converted DNA are still high. Additionally, bisulfite conversion is not 100% efficient. This affects the quantitative accuracy of the method.
Restriction endonucleases can accurately distinguish between sequences with methylated and unmethylated cytosines in DNA. As such, described herein are methods for quantitative detection of DNA methylation levels that can be based on massively parallel sequencing of whole genome libraries with distinct signatures of methylated and unmethylated DNA created by sequential digests with methylation-specific restriction enzymes.
EXAMPLE I INITIAL FEASIBILITY STUDY
Initially, we proposed a new quantitative method for digital reading of DNA methylation genome-wide. The method is based on parallel sequencing of genomic DNA digested with pairs of methylation-sensitive and methylation-insensitive enzymes recognizing the same restriction sites containing CG dinucleotides. Unmethylated and methylated restriction sites are distinguished by the nucleotides left at the flanks of DNA fragments after sequential digestion and end modification steps. We developed this approach using SmaI as the methylation-sensitive enzyme and XmaI as the methylation-insensitive enzyme. Genomic DNA is first digested with SmaI, which cuts unmethylated CCCGGG sites, leaving blunt-ended fragments starting with GGG at their 5' ends. Subsequently, XmaI cuts the remaining methylated CCCGGG sites, leaving 5' CCGG overhangs. Next, these overhangs are filled in and blunted by Klenow DNA polymerase and T4 DNA polymerase. Then 3' A tails are added to the blunt-ended DNA fragments by Klenow (exo-minus) DNA polymerase and sequencing adapters are ligated. DNA fragments with ligated adapters are size selected and amplified by limited PCR, and the size selection is repeated on the PCR products. Massively parallel sequencing follows. Unmethylated SmaI sites are characterized by the initial sequence GGG, while methylated SmaI sites begin with CCGGG. Analysis of SmaI/XmaI fragments smaller than 500 bp can provide quantitative information on methylation of 28% of the total 378,855 SmaI sites in the human genome.
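The digital reading described above can be illustrated with a minimal Python sketch. It assumes that the reads passed in have already been assigned to a single SmaI site; the function names classify_tag and methylation_frequency are hypothetical and are not part of the described method.

```python
from collections import Counter

# Signatures described in the text: reads from unmethylated sites start with
# GGG (blunt SmaI cut); reads from methylated sites start with CCGGG
# (XmaI cut followed by fill-in).
UNMETHYLATED_PREFIX = "GGG"
METHYLATED_PREFIX = "CCGGG"


def classify_tag(read):
    """Return 'methylated', 'unmethylated', or None for a read sequence."""
    read = read.upper()
    if read.startswith(METHYLATED_PREFIX):
        return "methylated"
    if read.startswith(UNMETHYLATED_PREFIX):
        return "unmethylated"
    return None  # no SmaI/XmaI signature at the start of the read


def methylation_frequency(reads):
    """Fraction of signature-bearing reads that carry the methylated signature."""
    counts = Counter(classify_tag(r) for r in reads)
    informative = counts["methylated"] + counts["unmethylated"]
    if informative == 0:
        return None  # no informative reads for this site
    return counts["methylated"] / informative
```

Reads without either signature are ignored rather than counted as unmethylated, so that background fragments do not bias the frequency.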
To increase the coverage further, we adapted the DREAM method to analyze all SmaI sites. To achieve this, sequencing adapters ligated after SmaI and XmaI restriction digest were biotinylated at their ends. DNA with ligated adapters was digested with MvaI, a frequently cutting enzyme (CCWGG recognition site). Adapters ligated to SmaI/XmaI sites and genomic DNA extending to the nearest MvaI cutting site were recovered by purification on streptavidin magnetic beads. Internal sequences cut out by size reduction enzymes were removed by washing, since they do not bind to streptavidin beads. Next, the streptavidin-purified fraction containing biotinylated adapters connected to short fragments of genomic DNA beginning at SmaI/XmaI sites was religated at the sites exposed by size reduction enzymes. This step created a library of short DNA fragments containing sequencing adapters at both ends and genomic DNA flanked by SmaI/XmaI sites in the middle. The library was PCR amplified, size purified and cloned in a sequencing vector for validation. The presence of chimeric SmaI/XmaI fragments containing human DNA from different chromosomes joined at MvaI sites confirmed the feasibility of this approach.
To increase the coverage of the epigenome even further, additional pairs of methylation-sensitive and methylation-insensitive restriction enzymes, namely AvaI/BsoBI (CYCGRG recognition sequence) and HpaII/MspI (CCGG recognition sequence), were used analogously to the SmaI/XmaI approach. In this case, the first step is a restriction digest with the methylation-sensitive enzyme. The 5' overhangs created by this first enzyme were removed by mung bean nuclease treatment. A restriction digest with the second, methylation-insensitive enzyme follows, and the resulting 5' overhangs were filled in by Klenow and T4 DNA polymerases. 3' A-tailing and ligation of adapters were the same as in the SmaI/XmaI approach. The AvaI/BsoBI method creates G starts for unmethylated and YCGRG starts for methylated sequences. The HpaII/MspI method creates G starts for unmethylated and CGG starts for methylated sequences. Digital reading of multiple sequences provides a quantitative measure of DNA methylation at individual restriction sites.
The DREAM method is thus useful for epigenome-wide quantitative analysis of DNA methylation in normal and cancer cells. The proposed concept of the DREAM method is well suited for massively parallel sequencing; however, it is not restricted to a particular method of DNA sequencing. We validated it by conventional Sanger sequencing with fluorescent dideoxynucleotide terminators. The method is not biased towards methylated or unmethylated sites. It has a potential application for genome-wide mapping of DNA methylation in health and disease. It is not restricted to human DNA; it can be used for mapping in other species that have DNA methylation. In human pathology, the DREAM method can be used for assessing prognosis, predicting response to treatment, and monitoring the course of the disease and the response to treatment.
With regard to in vitro and in vivo experiments, we assessed the feasibility of the proposed DREAM method by analyzing DNA from a control sample of adult peripheral blood leukocytes. See Figures 1 to 9. We prepared libraries with different insert sizes for Solexa sequencing and validated them by cloning into the pCR4 sequencing vector. Fluorescent Sanger sequencing of bacterial clones confirmed the potential of the method. Of 79 clones with inserts of genomic DNA, methylation status based on the initial 3-5 nucleotides was clearly distinguishable at both SmaI sites flanking the genomic inserts in 68 clones (86%); 8 clones (10%) had a single SmaI site only, and 3 clones (4%) had no SmaI site at their flanks. Seventy clones (89%) were present only once, 4 clones were duplicates, and 5 clones mapped to an identical 28S ribosomal DNA sequence.
Twenty-five clones (32%) mapped to repetitive sequences (SINE, LINE and ribosomal repeats), and 54 clones (68%) contained non-repetitive sequences. Methylation status of the two SmaI sites at the flanks was highly concordant: 20 clones (29%) had both sites unmethylated, 38 clones (56%) had both sites methylated, and only 10 clones (15%) had one site unmethylated and the other methylated. Methylation was more frequent outside of CpG islands (78%) than within CpG islands (37%). It was more pronounced in repeats (80%) than in non-repetitive sequences (56%). As expected, methylation was also more prevalent in sequences mapping to gene bodies (72%) and in intergenic regions (68%) than around gene starts (30%) or ends (0%). We also validated the use of AvaI/BsoBI enzymes and HpaII/MspI enzymes for digital reading of the DNA methylation status.
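As a rough check on the concordance reported above, the clone counts can be compared with what independent methylation of the two flanking sites would predict. The sketch below is illustrative only: the function name is hypothetical, and the independence model (discordant fraction 2p(1-p) for a pooled per-site methylation frequency p) is an assumption that is not stated in the text.

```python
def flank_concordance(both_methylated, both_unmethylated, discordant):
    """Compare observed discordance with the value expected under independence.

    Returns (pooled per-site methylation frequency,
             observed discordant fraction,
             discordant fraction expected if the two flanks were independent).
    """
    total = both_methylated + both_unmethylated + discordant
    # Each clone contributes two sites; a discordant clone has one methylated site.
    p = (2 * both_methylated + discordant) / (2 * total)
    observed = discordant / total
    expected = 2 * p * (1 - p)
    return p, observed, expected


# Clone counts taken from the paragraph above (68 clones with two readable sites).
p, observed, expected = flank_concordance(38, 20, 10)
```

With these counts the pooled frequency is about 0.63, the observed discordant fraction about 0.15, and the fraction expected under independence about 0.46, which is in line with the statement that neighboring sites are highly concordant.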
EXAMPLE II Experimental Protocol: Construction of DREAM Libraries for Sequencing
Step 1: SmaI digestion
Genomic DNA 5 ug x ul
10x NEB Buffer #4 10 ul
BSA (10 mg/ml) 1 ul
Water 85-x ul
SmaI FastDigest enzyme 5 ul
Total 100 ul
Incubate at 37C for 3 hours.
Step 2: XmaI digestion
Add 100 ul of mix {NEB Buffer #4 10 ul, BSA 1 ul, water 90 ul}
Add XmaI enzyme 5 ul
Continue digesting overnight at 37C. Purify by QIAquick PCR purification kit, elute in 40 ul EB (20+20 to improve elution).
Step 3: Perform 3' End Filling and A-Tailing with Klenow Exo-Minus
Mix on ice
Digested DNA sample 40 ul
NEB2 buffer 5 ul
dCTP, dGTP, dATP "CGA" mix 10 mM 2 ul
Klenow exo- (3' to 5' exo minus) 3 ul
Total 50 ul
Incubate for 30 minutes at 37C.
Follow the instructions in the MinElute PCR purification kit to purify on one QIAquick MinElute column, eluting in 20 ul (10+10) of EB.
Step 4: Ligate Solexa Sequencing Adapters (PEA; Paired End Adapters)
DNA sample with 3'A overhangs 20 ul
Ultra Pure ligase buffer 25 ul
PE adapters oligo mix; PEA 25 mM 1 ul
Ultra Pure DNA ligase 5 ul
Incubate at room temp for 20 minutes. Stop by adding 10 ul of cresol red in 60% glycerol & EDTA.
Step 5: Perform Size Selection in Agarose Gel
Separate gels for each sample.
Make a small (7x8 cm) 2% agarose gel in fresh sodium borate, about 8 mm thick (50 ml).
Add ethidium bromide 2 ul to 100 ml of the gel. Use a comb with 6 wide teeth and its 1.5 mm thick side to be able to load whole ligation in a single well. Use fresh sodium borate running buffer with EtBr 8 ul per 400 ml.
Load 20 ul of the 100 bp marker in wells 1 and 5.
Load the whole ligation volume very carefully to well #3
Run electrophoresis at 150V for 60 minutes.
Photograph the gel. Cut out a window from 250 to 500 bp, divide in 2 even slices of increasing size A, B.
Extract DNA from gel slices A, B; elute in 30 ul EB each.
Step 6: PCR with Solexa From Gel Slices A, B
Plan 100 ul per reaction
iProof HF master mix 2x 50 ul
Primer Solexa PE PCR1 50 uM 1 ul
Primer Solexa PE PCR2 50 uM 1 ul
Cresol red in 60% glycerol 10 ul
Water 8 ul
= PE PCR master mix 70 ul
Vortex and aliquot master mix in 0.65 ml tubes, 70 ul per tube.
Add 30 ul sample, aliquot in 2 tubes, 50 ul each.
Run PCR, program PHUSION
Initial denaturation 98C 30 sec
Cycling: 98C 10 sec
65C 30 sec
72C 30 sec
Total number of cycles 18
Final extension 72C 5 minutes
Step 7: Perform Agencourt AMPure Purification, Elute with 50 ul EB
Check 5 ul of purified PCR products in agarose gel.
Measure DNA by NanoDrop.
Step 8: Validation of the DREAM Library
Clone 2 ul in pZeroBlunt sequencing vector. Transform into bacteria. Pick 24 bacterial colonies for A and for B. Perform PCR and gel electrophoresis analysis of bacterial clones. Sequence the clones containing inserts. Calculate the fraction of inserts containing bona fide SmaI/XmaI fragments of genomic DNA.
Step 9: Sequence the Validated Library Using Illumina GAII
EXAMPLE III Inception of Digital Restriction Enzyme Analysis of Methylation (DREAM)
While MCAM is a powerful method, one of its limitations relates to the microarray hybridization step, where multiple factors compromise data quality (hybridization kinetics, background, washing etc.), and to the fact that data are limited by what is present on the arrays. With the advent of affordable deep-sequencing technology, it is now possible to bypass microarrays completely. We used a Solexa deep sequencer to initially test MCA/deep sequencing as an alternative method and obtained reliable data (not shown), but we again faced an important limitation: the lack of quantification and the fact that "unmethylated" and "PCR failures" were not distinguishable. These problems apply to all current whole genome methylation analysis methods except whole genome bisulfite sequencing, which has its own set of issues (representation, depth of sequencing required, difficulty in mapping tags to degenerate bisulfite-converted sequences with low complexity).
We then combined the power of restriction enzyme analysis with deep sequencing and developed our methodology, Digital Restriction Enzyme Analysis of Methylation ("DREAM"), which provides remarkably quantitative methylation data on hundreds of thousands of CpG sites simultaneously. In the method, DNA is digested with SmaI, which cuts unmethylated CCCGGG sites leaving a blunt end, followed by XmaI, which cuts methylated CCCGGG sites leaving an overhang. This is followed by a fill-in reaction, adapter ligation and Solexa sequencing. After sequencing, tags that start with GGG at SmaI sites represent unmethylated status, while tags that start with CCGGG represent methylation.
In Silico Analysis of the Human Genome for DREAM Analysis
We performed computational analysis of the entire human genome (hg18, March 2006, UCSC Genome Database) to estimate the number of SmaI sites captured in DREAM libraries of SmaI fragments with increasing size. See Table 2 immediately below, which lists the numbers and genomic coverage of SmaI sites in sequencing libraries with increasing sizes of fragments, where human genomic DNA (hg18) was digested in silico with SmaI/XmaI. Here, we used the UCSC definition of CpG islands. Gardiner-Garden M, et al., CpG Islands in Vertebrate Genomes, J Mol Biol 196(2): 261-82 (1987). Solexa sequencing performs well for short DNA fragments. We found that a Solexa-compatible sequencing library containing fragments of 400 bp or smaller would cover 19,079 SmaI sites, or 48% of total SmaI sites in CpG islands, and 72,969 SmaI sites, or 22% of total SmaI sites outside of CpG islands.
Table 2
Figure imgf000019_0001
Note: CGI means within a CpG island; Non CGI means outside of CpG islands
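The in silico digestion summarized in Table 2 can be sketched in Python as follows. This is a minimal illustration on an arbitrary sequence string, not the analysis actually run on hg18; the function names are hypothetical, CpG island annotation is not handled, and fragment lengths are computed for blunt SmaI cuts (filled-in XmaI ends would differ by a few bases).

```python
import re

SITE = "CCCGGG"  # SmaI/XmaI recognition sequence


def find_sites(sequence):
    """Return 0-based start positions of every CCCGGG site in the sequence."""
    return [m.start() for m in re.finditer(SITE, sequence.upper())]


def fragment_lengths(sequence):
    """Lengths of fragments bounded by two consecutive sites (cut as CCC^GGG)."""
    cuts = [pos + 3 for pos in find_sites(sequence)]  # SmaI cuts after CCC
    return [b - a for a, b in zip(cuts, cuts[1:])]


def sites_in_fragments_up_to(sequence, max_len):
    """Count sites that flank at least one site-to-site fragment <= max_len."""
    sites = find_sites(sequence)
    covered = set()
    for left, right in zip(sites, sites[1:]):
        if right - left <= max_len:
            covered.update((left, right))
    return len(covered)
```

Dividing the result of sites_in_fragments_up_to by the total number of sites gives the kind of coverage percentage reported per size cutoff.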
Precision of the DREAM analysis
Precision of the methylation frequency value obtained by the DREAM analysis depends on the absolute numbers of methylated and unmethylated tags detected for each SmaI site. The error can be estimated using the following formula:
$$\mathrm{error} = \frac{\sqrt{p(1-p)}}{\sqrt{2 + N_{\mathrm{methyl}} + N_{\mathrm{unmethyl}}}}, \qquad p = \frac{0.5 + N_{\mathrm{methyl}}}{1 + N_{\mathrm{methyl}} + N_{\mathrm{unmethyl}}},$$
where $N_{\mathrm{methyl}}$ is the number of tags with the methylated signature and $N_{\mathrm{unmethyl}}$ is the number of tags with the unmethylated signature.
We calculated theoretical errors for different numbers of tags sequenced per single SmaI site. With five sequenced tags per site, the expected errors are between 10% and 19%, while 100 tags per SmaI site would measure methylation frequencies with errors between 0.7% and 5% (Fig. 20).
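The error estimate above can be evaluated directly. The following Python sketch implements the stated formula; the function names methylation_error and error_range are hypothetical and introduced only for illustration.

```python
from math import sqrt


def methylation_error(n_methyl, n_unmethyl):
    """Estimated error of the methylation frequency for one site."""
    p = (0.5 + n_methyl) / (1 + n_methyl + n_unmethyl)
    return sqrt(p * (1 - p)) / sqrt(2 + n_methyl + n_unmethyl)


def error_range(total_tags):
    """Smallest and largest error over all methylated/unmethylated splits."""
    errors = [methylation_error(m, total_tags - m) for m in range(total_tags + 1)]
    return min(errors), max(errors)
```

Under this formula, error_range(5) spans roughly 0.10 to 0.19 and error_range(100) roughly 0.007 to 0.05, matching the ranges quoted in the text. The smallest error occurs when all tags share one signature and the largest when the tags are split evenly.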
EXAMPLE IV Pilot Experiment with DREAM Analysis of Human Genomic DNA
DREAM was applied to a sample of normal peripheral blood DNA. Table 3 immediately below provides the data from a pilot DREAM analysis (Solexa single read, 50 bp, 4 lanes, 32.5 million total reads, 16.9 million mapped to unique SmaI sites). Using four lanes in a Solexa sequencing cell, 32.5 million sequencing tags with a length of 50 bases were obtained, of which 23.4 million mapped to SmaI sites and 6.5 million were not unique sequences (most of which were LINE and Alu repeats). Among the 16.9 million uniquely mapped sequences, tags represented five or more times corresponded to 85,171 SmaI sites in total, including 16,238 in promoter CGIs, 14,306 in non-promoter CGIs and 3,936 in non-CGI promoters. These corresponded to 7,929 promoter CGIs, 5,514 non-promoter CGIs and 2,877 non-CGI promoters (based on UCSC genes, with promoters defined as regions ± 500 bases from the transcription start). We were able to determine methylation frequencies at 22% of total genomic SmaI sites and at 77% of genomic SmaI sites mapping to CpG islands.
TABLE 3
Figure imgf000020_0001
As expected, methylation frequencies showed a bimodal distribution and stark differences in methylation patterns within CpG islands versus outside of them. In CpG islands, 84.5% of SmaI sites were unmethylated (0-5% methylation) and 5.6% of SmaI sites showed complete methylation (95-100%). Outside of CpG islands, only 13.8% of SmaI sites were unmethylated while 40.2% were completely methylated (Fig. 21).
To confirm the quantitative aspect of DREAM, we focused on 22 imprinted regions (promoter CGIs or differentially methylated regions, DMRs), where mono-allelic methylation is often found. In these regions, 13/22 (59%) had evidence of partial methylation consistent with mono-allelic methylation (i.e. 40-60% methylation density). By contrast, 0/24 randomly selected genes (one per chromosome) showed this phenomenon, while 1/7 genes imprinted in mice but not known to be imprinted in humans had evidence of such partial methylation. These data are consistent with the fact that many (but not all) imprinted genes show partial methylation in promoter and/or imprinting control regions, while this is very rare across the genome. For additional validation, we examined genes previously reported to be hypermethylated in normal blood. Shen L, et al., Genome-Wide Profiling of DNA Methylation Reveals A Class of Normally Methylated CpG Island Promoters, PLoS Genet 3(10): 2023-36 (2007). Of 13 genes validated by bisulfite pyrosequencing, all showed hypermethylation (mean 89%) by the DREAM method.
Finally, we examined methylation patterns by distance from transcription start site in CGIs vs. non-CGI DNA. As shown in Figures 22A and 22B, based on analysis of over 50,000 CpG sites, CGI transcription start sites are associated with very low levels of DNA methylation across 100 kb upstream and 5 kb downstream; non-CGI transcription start sites show higher levels of methylation, and a much narrower region of methylation protection (0.5 kb around transcription start sites). Overall, these data demonstrate the unique power of quantitative methylation analysis by the DREAM method.
EXAMPLE V Validation of the DREAM Precision by Bisulfite Pyrosequencing of Spiked in Standards
To validate the quantitative accuracy of the DREAM method, we spiked the sample of human genomic DNA with plasmids containing parts of the luciferase and GFP genes. Each gene had 2 SmaI sites at a distance of 241 and 196 bp, respectively. The luciferase plasmid was unmethylated, while the GFP plasmid was partially methylated by the CpG methylase M.SssI. One part of the sample was processed by the DREAM protocol and methylation signatures of luciferase and GFP SmaI sites were counted. Another part of the sample was converted by bisulfite and methylation frequencies at luciferase and GFP were determined by standard bisulfite pyrosequencing. Results of both methods were in remarkable agreement, as shown in Table 4 immediately below - the validation of DREAM analysis of plasmid calibrators by bisulfite pyrosequencing.
TABLE 4
Figure imgf000021_0001
In summary, our preliminary data show that DREAM accurately and quantitatively mapped about 85,000 CpG sites corresponding to over 13,500 promoters. Importantly, the proposed method is not limited to promoters; its coverage of non-promoter sites (including repeats) is extensive, thus providing whole genome approximation.
PROPHETIC EXAMPLE VI RESEARCH, DESIGN AND ADDITIONAL METHODS
Because DREAM provides quantitative data on ~80,000 CpG sites, it can be used to study quantitative aberrations of DNA methylation in cancer. Epigenetic therapy with DNA methylation inhibitors has already demonstrated survival benefits in hematologic malignancies and is being evaluated in other cancers. A complete understanding of cancer biology, particularly as relevant to progression and predicting therapy results, therefore requires a comprehensive analysis of DNA methylation in the cancer epigenome. We propose to develop a cost-effective whole genome technology based on massively parallel next generation sequencing capable of mapping epigenetic differences between normal and cancer cells.
The method we propose is not bisulfite based but still provides absolute quantification for the methylation data. In our initial configuration, DREAM requires only 5 μg of DNA, an amount that can be obtained in nearly all cancer cases. Briefly, DNA is digested with SmaI restriction endonuclease, which cuts unmethylated CCCGGG sites leaving a blunt end, followed by XmaI endonuclease, which cuts methylated CCCGGG sites leaving an overhang. This is followed by a simple fill-in reaction, adapter ligation and Solexa sequencing. After sequencing, tags that start with GGG at SmaI sites represent the unmethylated state, while tags that start with CCGGG represent methylation. The methylation frequency for an individual SmaI site is calculated as the number of tags with the methylated signature divided by the sum of methylated and unmethylated tags mapping to that SmaI site. The method is outlined in Figure 23.
Current Protocol For The DREAM Method
Five micrograms of genomic DNA are digested with 5 ul FastDigest SmaI endonuclease (Fermentas, Glen Burnie, MD) for 3 hours at 37C. Subsequently, 50 units (5 ul) of XmaI endonuclease (NEB, Ipswich, MA) are added and the digestion is continued for an additional 16 hours. The digested DNA is purified using the QIAquick PCR purification kit (Qiagen, Valencia, CA). In the next step, we (1) fill in recesses at 3' DNA ends created by XmaI digestion and (2) add 3' dA tails to blunt-ended DNA resulting either from the SmaI digest or from the filled-in XmaI digest. Both filling in and A-tailing are achieved in a single reaction using Klenow DNA polymerase lacking 3'-5' exonuclease activity (NEB) and dNTP mix. Solexa paired end sequencing adapters are ligated using Rapid T4 DNA ligase (Enzymatics, Beverly, MA). The ligation mix is size selected by electrophoresis in 2% agarose. A slice corresponding to a 250-500 bp size window based on the DNA ladder is cut out and DNA is extracted from the agarose. Eluted DNA is amplified with Solexa paired end PCR primers using iProof high-fidelity DNA polymerase (Bio-Rad Laboratories, Hercules, CA) and 18 cycles of amplification. The resulting sequencing library is cleaned with AMPure magnetic beads (Agencourt, Beverly, MA).
Sequencing on Illumina Genome Analyzer II
A Solexa core with the Illumina Genome Analyzer II machine can be used. Typically, more than 5 million sequences representing individual DNA molecules are collected from each sequencing lane. Sequencing tags are mapped to SmaI sites in the human genome and signatures corresponding to methylated and unmethylated CpG are enumerated for each SmaI site. Methylation frequencies for individual SmaI sites are then calculated.
Mapping of Sequencing Tags
Normally, Solexa reads are mapped to the human genome by ELAND, software that finds any match to the genome within two substitutions. In the DREAM protocol, most tags should come from either side of a SmaI site, and the total number of such sites is much smaller than the total genome size. Therefore, we will map directly by comparing each tag with each SmaI site. The advantage of this approach is that the mapping quality is much higher, because our approach permits any number of sequencing errors, whereas ELAND allows only two mismatches.
To match a SmaI site, the tag must begin with either GGG or CCGGG. The rest of the tag, when a match is found, identifies the genomic location of the SmaI site. The match can be to either the upstream or the downstream side of the SmaI site. In our preliminary study, the 45 nt of a Solexa read after the leading GGG or CCGGG were compared with all 45-mer SmaI flanking sequences after the leading GGG or CCGGG. However, this proved to be too computationally expensive. To increase computational speed, we restricted the mapping to those tags whose 8 nt following the CCCGGG site match the corresponding 8 nt of a SmaI site within two substitutions. A tag is mapped to the SmaI site that has the lowest number of mismatches. The SmaI site with the second lowest number of mismatches is also determined, to assess the quality of the match. This filtering approach significantly increased the speed of the computational analysis. For example, in our preliminary study, we found that 32 million reads could be analyzed in a few hours.
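The mapping strategy described above (signature check, 8 nt prefilter within two substitutions, then selection of the lowest-mismatch site with the runner-up kept as a quality measure) might be sketched in Python as follows. All names are hypothetical, and site_flanks is assumed to be a dictionary holding, for each side of each SmaI site, the genomic sequence that follows the leading GGG as read away from the site. A linear scan is used for clarity; it is not meant to reproduce the performance of the actual implementation.

```python
def strip_signature(read):
    """Split a read into (methylation state, flanking sequence), or None."""
    if read.startswith("CCGGG"):
        return "methylated", read[5:]
    if read.startswith("GGG"):
        return "unmethylated", read[3:]
    return None


def hamming(a, b):
    """Number of mismatching positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


def map_tag(read, site_flanks, seed_len=8, max_seed_mismatches=2):
    """Map a read to the best-matching flank.

    Returns (site_id, state, best_mismatches, second_best_mismatches),
    or None when the read has no signature or no flank passes the prefilter.
    """
    parsed = strip_signature(read)
    if parsed is None:
        return None
    state, flank = parsed
    scored = []
    for site_id, ref in site_flanks.items():
        # Cheap prefilter on the first seed_len bases before the full comparison.
        if hamming(flank[:seed_len], ref[:seed_len]) > max_seed_mismatches:
            continue
        scored.append((hamming(flank, ref), site_id))
    if not scored:
        return None
    scored.sort()
    best_mismatches, best_site = scored[0]
    second_best = scored[1][0] if len(scored) > 1 else None
    return best_site, state, best_mismatches, second_best
```

A large gap between the best and second-best mismatch counts indicates an unambiguous assignment; tags for which the two counts are equal could be treated as non-unique.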
Analysis of Paired-End Sequenced Tags
Sequencing of paired ends enables reading of individual DNA molecules from both ends. A paired-end sequenced tag can be treated as two independent single-end tags. It is more economical, since the cost of a paired-end run is less than the cost of two single-end runs. Importantly, paired-end tags offer additional information. The requirement that the length of the DNA fragment cut by SmaI/XmaI be within the range for which the DNA was size selected should resolve some of the degenerate tags for which one or both ends have multiple matches in the genome. A biologically more interesting case arises when considering the methylation status of both ends of a SmaI fragment. There are four possibilities in total: both ends methylated; both ends unmethylated; and two possibilities with one end methylated and the other unmethylated. This allows us to assess whether the methylation states at the two ends are independent. In the extreme case, we may have 50% overall methylation at both ends, with every tag having either both ends methylated or both ends unmethylated. This may indicate that one chromosome is methylated and the other unmethylated in all cells. It could also mean that half of the cells are methylated and the other half unmethylated.
Calibrator Standards with Defined Methylation Levels
To ensure the accuracy of DNA methylation reading by the DREAM assay, we have constructed a set of 5 calibrators based on non-human DNA sequences (Taq polymerase, luciferase and green fluorescent protein), each containing two SmaI sites at a distance of 200-300 nt. The calibrators will be PCR amplified and either left untreated or methylated in vitro to 100% with the M.SssI CpG methylase (New England Biolabs). The completeness of methylation will be checked by resistance to SmaI digestion. Graded proportions of unmethylated and methylated calibrators will be mixed to create a set of control sequences methylated to 0%, 25%, 50%, 75% and 100%.
This standard calibrator mix will be spiked into the samples of human DNA before processing for the DREAM analysis at a ratio of 1 ng of calibrators to 10,000 ng of gDNA. We expect to get 100 to 1000-fold coverage for each standard sequence in the DREAM library. Methylation data from these standards will be used for construction of calibration curves.
Quality Control of Sequencing Libraries
The libraries prepared for sequencing as described above will be examined by gel electrophoresis to check the size distribution of amplified DNA fragments and the absence of contamination with primer dimers. DNA quantity and quality will be measured by UV spectrophotometry using the NanoDrop machine. Aliquots of the libraries will be cloned in a sequencing vector using the Zero Blunt® TOPO® PCR cloning kit (Invitrogen). A representative number (10 or more) of individual bacterial clones will be sequenced by conventional Sanger sequencing at the M. D. Anderson core facility to evaluate the proportion of bona fide DNA fragments mapping to SmaI fragments and to check for the correct signatures at SmaI sites. Pyrosequencing will be used to analyze SmaI residues in spiked-in calibrator standards to estimate the proportions of methylated and unmethylated signatures. Further testing of sequencing libraries can be performed by real time QPCR with specifically designed primers and TaqMan MGB probes. We can use primer/probe sets detecting (1) primer dimers, (2) correctly ligated sequencing adapters, and (3) sequencing adapters ligated to specific genomic sequences flanking SmaI sites representing SmaI fragments of several different sizes.
Validation of Results by Bisulfite Analysis
Bisulfite pyrosequencing quantitative assays can be used for independent validation of DNA methylation levels in selected genes. The bisulfite pyrosequencing results can be compared with the DREAM data.
Optimization of DREAM to Minimize DNA Quantity and Maximize the Capture of Targeted CpG Sites
We can determine the minimum amount of genomic DNA required for the DREAM analysis by serial dilutions of the starting material for preparation of sequencing libraries. With the current method, less than 1% of the library prepared for sequencing is actually used for Solexa sequencing. Since a single haploid genome is contained in 3.3 pg of DNA, it is reasonable to estimate that 10,000 haploid genomes contained in 33 ng of human gDNA will be sufficient for representative analysis of DNA methylation with median 100-fold coverage of SmaI sites by the DREAM method.
Procedures to Maximize the Capture of Bona Fide SmaI Fragments
Random DNA fragments resulting from broken DNA would create a background of non-informative sequences in the DREAM libraries. If these non-informative fragments were to prevail over the bona fide SmaI fragments, more sequencing runs would be needed to obtain the desired coverage of CpG sites. This would mean higher sequencing costs. We propose to explore several ways to increase the proportion of bona fide SmaI fragments in the sequencing libraries. (1) Limited and specific repair of DNA ends after digestion. Only 3' recessed ends exposed by XmaI digestion will be repaired using exonuclease-negative Klenow DNA polymerase. Fragments with 3' overhangs will not be repaired and integrated into sequencing libraries. (2) To further decrease the contamination with fragments of randomly broken DNA, dTTP will be omitted from the fill-in/A-tailing reaction (Smith et al. 2009). (3) Modifications of Solexa PCR primers enriching for unmethylated and methylated SmaI fragments will be tested. Calibrator standards with defined methylation and pyrosequencing analysis of their SmaI sites as described above will be useful for quick evaluation of these approaches.
Strategies to Improve the Genome Coverage
The DREAM method in its current configuration is limited to the analysis of SmaI sites that are within 500 bases of each other. To overcome this limitation, we propose to remove the inner non-informative parts of large SmaI fragments by restriction digest with an unrelated enzyme. In a modification of the DREAM procedure, outlined in Figure 24, we propose to ligate SmaI and XmaI digested DNA to sequencing adapters tagged with biotin. After ligation and size selection in agarose gel, the fraction of DNA fragments greater than 500 bp can be processed separately. The restriction enzyme FastDigest MvaI (Fermentas), which cuts at CC^WGG and is thus not affected by CpG methylation, will cut inside the SmaI fragments ligated to biotinylated sequencing adapters. Biotinylated adapters with flanking genomic sequences starting with GGG or CCGGG methylation signatures can be captured on Dynabeads M270 Streptavidin magnetic beads (Invitrogen). Exposed MvaI sites can be religated using Rapid T4 DNA ligase (Enzymatics). Religated DNA with correct sequencing adapters can be amplified with Solexa paired end PCR primers and the DREAM procedure will follow as described above.
Potential Problems and Alternative Approaches
Technically, all the proposed experiments are straightforward and feasible. Our approach is backed by a limited set of preliminary data demonstrating feasibility in normal blood cells.
A potential problem in cancer cells is represented by the fragmentation of DNA and by mutations. DNA fragmentation would increase the background of non-informative sequences and decrease the coverage. We have suggested potential approaches aimed at increasing the representation of bona fide SmaI fragments in sequencing libraries. As sequencing costs are expected to come down, a simple solution would be to increase the depth of sequencing, since methylation signatures are correct even in libraries with poor representation of SmaI fragments.
Further, sequencing limitations imposed by the size of SmaI fragments may be a problem. We propose to reduce the size of large fragments by digestion with a frequently cutting enzyme not affected by methylation.
Finally, genome coverage is limited to SmaI sites only. One could question whether DREAM is the best way to approach whole genome DNA methylation studies. In fact, there is no current consensus on the issue. DREAM has the advantages of reliability and quantification, but its disadvantages include incomplete genome coverage and the limited number of CpG sites sampled. Theoretically, a higher genome representation can be achieved by Me-DIP or ChIP using a methylated CpG antibody; however, our preliminary data and published studies suggest that the sensitivity of Me-DIP is limited. For example, the original report using this technology found only about 50 genes hypermethylated in the SW48 cell line, while other data show a 10-fold higher number (with >90% validation) in this same cell line. Weber M, et al., Chromosome-Wide And Promoter-Specific Analyses Identify Sites Of Differential DNA Methylation In Normal And Transformed Human Cells, Nat Genet 37(8): 853-62 (2005). A recent paper also reports a disappointingly low level of gene detection using Me-DIP in cancer. Jacinto FV, et al., Discovery of Epigenetically Silenced Genes By Methylated DNA Immunoprecipitation In Colon Cancer Cells, Cancer Res 67(24): 11481-6 (2007). True whole genome coverage can currently only be achieved by whole genome bisulfite sequencing, something that has not yet been done for a human genome (and would be prohibitively expensive).
PROPHETIC EXAMPLE VII FULL SCALE OF DNA METHYLATION CHANGES IN CANCER
Based on our preliminary data obtained with microarray analysis, we expect hundreds to thousands of alterations of DNA methylation profiles to be present in cancer epigenomes. Estecio MR, et al., High Throughput Methylation Profiling By MCA Coupled To CpG Island Microarray, Genome Res 17(10): 1529-36 (2007); Kroeger H, et al., Aberrant CpG Island Methylation in Acute Myeloid Leukemia is Accentuated at Relapse, Blood 112(4): 1366-73 (2008). By comparing results of DREAM analysis performed in samples of normal blood with results from leukemia cell lines and primary leukemia cells from patients, we will demonstrate the capacity of the DREAM method to map the full scale of cancer-related changes in DNA methylation. Epigenetic therapy with the DNA methylation inhibitors azacitidine and decitabine has already demonstrated survival benefits in hematologic malignancies; however, the molecular mechanisms involved in the response to treatment are poorly understood. We will test the potential of the DREAM technology to quantify and map changes of DNA methylation in patients treated with demethylating agents.
Biological Samples
Leukemia and cancer cell lines obtained from ATCC are available in the lab, and their identity will be verified by DNA fingerprinting. We expect to perform DREAM analysis in 10 cell lines. Primary cells from 10 patients with acute myeloid leukemia or myelodysplastic syndrome before and after treatment with decitabine will be obtained for these demonstration studies from a leukemia tissue bank.
Comparative Analysis of Methylation Results Obtained In Normal And Cancer Cells
We expect to obtain quantitative data on DNA methylation at 50,000 or more CpG sites for each normal or cancer sample. We will compare the differences between methylation levels at individual sites using nonparametric t-tests. We propose to average methylation over regions of different lengths and to explore the potential of this approach for the discovery of new biomarkers for cancer. We will also assign weights to SmaI sites based on the number of sequenced tags. SmaI sites with higher tag counts should be more heavily weighted in the average. A previously published statistical theory taking into account tag counts, as well as natural variations among nearby SmaI sites, can be adapted. Baggerly KA, et al., Differential Expression in SAGE: Accounting For Normal Between-Library Variation, Bioinformatics 19(12): 1477-83 (2003).
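
The tag-count weighting described above can be illustrated with a minimal sketch. It assumes that each site's weight is simply its total tag count (methylated plus unmethylated tags), which makes the regional value equal to the pooled fraction of methylated tags. This is a simplification for illustration and is not the statistical model of Baggerly et al.; the record layout, the names `SiteCounts` and `weighted_region_methylation`, and the half-open region convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SiteCounts:
    """Tag counts for one CCCGGG site (hypothetical record layout)."""
    position: int      # genomic coordinate of the site
    methylated: int    # tags carrying the methylated signature
    unmethylated: int  # tags carrying the unmethylated signature

    @property
    def depth(self) -> int:
        # Total number of sequenced tags at this site.
        return self.methylated + self.unmethylated

    @property
    def level(self) -> float:
        # Fraction of tags with the methylated signature.
        if self.depth == 0:
            raise ValueError("no tags sequenced at this site")
        return self.methylated / self.depth


def weighted_region_methylation(
    sites: Iterable[SiteCounts], start: int, end: int
) -> Optional[float]:
    """Depth-weighted mean methylation for sites with start <= position < end.

    Assumption: weight = tag count, so deeply sequenced sites dominate.
    Returns None when the region contains no sequenced tags.
    """
    in_region = [s for s in sites if start <= s.position < end and s.depth > 0]
    total_depth = sum(s.depth for s in in_region)
    if total_depth == 0:
        return None
    return sum(s.level * s.depth for s in in_region) / total_depth


# Illustrative (made-up) counts: one deep site, one shallow site, one distant site.
example_sites = [
    SiteCounts(position=100, methylated=90, unmethylated=10),
    SiteCounts(position=250, methylated=1, unmethylated=1),
    SiteCounts(position=400, methylated=0, unmethylated=50),
]
region_level = weighted_region_methylation(example_sites, 0, 300)
```

With these made-up counts, the shallow site (2 tags, level 0.5) barely moves the regional value away from the deep site (100 tags, level 0.9), whereas an unweighted mean of the two levels would give them equal influence.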

Claims

We Claim:
1. A method of analyzing methylation of CpG islands in a sample comprising the steps of obtaining genomic DNA fragments in a single sample wherein the fragments are produced by sequentially digesting unmethylated DNA and methylated DNA with a pair of enzymes which recognize the restriction site CCCGGG;
generating a methylated signature or an unmethylated signature of the DNA fragments digested in the sample; and
determining the amount of methylation levels for each sequenced restriction site by parallel sequencing of the signatures produced.
2. The method of claim 1, wherein the pair of enzymes comprises a first enzyme, SmaI, and a second enzyme, XmaI.
3. The method of claim 1, wherein said methylated signature begins with the nucleotide sequence CCCGGG.
4. The method of claim 1, wherein the unmethylated signature begins with the nucleotide sequence GGG.
5. The method of claim 2, wherein the first enzyme SmaI cuts only at unmethylated CpG and leaves blunt ends.
6. The method of claim 2, wherein XmaI is not blocked by methylation and leaves a short 5' overhang creating methylation specific signatures at ends of digested DNA fragments.
7. The method of claim 1, wherein the methylation levels for each sequenced restriction site are calculated based on the numbers of DNA fragments with the methylated or unmethylated signatures.
8. The method of claim 1, wherein the methylated signature is generated by filling in a 5' overhang produced when the methylated DNA is digested.
9. The method of claim 1, wherein the fragments are ligated to sequencing adapters.
10. The method of claim 1, wherein the signatures are read by massively parallel sequencing of each DNA fragment.
11. A method of determining a disease state comprising the steps of analyzing the methylation of CpG islands in a sample of claim 1, and further comprising the step of mapping the sequences of the DNA fragments produced to the DNA of the human genome to determine the changes in DNA methylation.
12. A biomarker useful in determining methylation levels of CpG Islands prepared by the method of claim 1, wherein said biomarker can further detect the potential for gene silencing during carcinogenesis.
13. A biomarker useful in determining disease prepared by the method of Claim 1, wherein the biomarker can further detect a gene whose silencing is permanent.
PCT/US2010/022027 2009-01-26 2010-01-26 Digital restriction enzyme analysis of methylation WO2010085774A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14737609P 2009-01-26 2009-01-26
US61/147,376 2009-01-26

Publications (1)

Publication Number Publication Date
WO2010085774A1 true WO2010085774A1 (en) 2010-07-29

Family

ID=42356241

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/022027 WO2010085774A1 (en) 2009-01-26 2010-01-26 Digital restriction enzyme analysis of methylation

Country Status (1)

Country Link
WO (1) WO2010085774A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102061526A (en) * 2010-11-23 2011-05-18 深圳华大基因科技有限公司 DNA (deoxyribonucleic acid) library and preparation method thereof as well as method and device for detecting single nucleotide polymorphisms (SNPs)
WO2012079490A1 (en) * 2010-12-15 2012-06-21 深圳华大基因科技有限公司 Method for constructing dna sequencing library and use thereof
WO2016061624A1 (en) 2014-10-20 2016-04-28 Commonwealth Scientific And Industrial Research Organisation Genome methylation analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050009053A1 (en) * 2003-04-25 2005-01-13 Sebastian Boecker Fragmentation-based methods and systems for de novo sequencing
US20050118721A1 (en) * 2002-01-30 2005-06-02 Sven Olek Identification of cell differentiation states

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHUNG ET AL.: "Identification of Novel Tumor Markers in Prostate, Colon and Breast Cancer by Unbiased Methylation Profiling.", PLOS ONE., vol. 3, no. 4, 2008, pages E2079, 1 - 10 *
DAHL ET AL.: "Multigene amplification and massively parallel sequencing for cancer mutation discovery.", PROC NATL ACAD SCI, vol. 104, no. 22, 29 May 2007 (2007-05-29), USA, pages 9387 - 9392 *
ESTECIO ET AL.: "Tackling the methylome: recent methodological advances in genome-wide methylation profiling.", GENOME MED., vol. 1, no. 11, 2009, pages 106.1 - 7 *
ZRIHAN-LICHT ET AL.: "DNA methylation status of the MUC1 gene coding for a breast-cancer- associated protein.", INT J CANCER., vol. 62, no. 3, 28 July 1995 (1995-07-28), pages 245 - 251 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102061526A (en) * 2010-11-23 2011-05-18 深圳华大基因科技有限公司 DNA (deoxyribonucleic acid) library and preparation method thereof as well as method and device for detecting single nucleotide polymorphisms (SNPs)
US9493821B2 (en) 2010-11-23 2016-11-15 Bgi Tech Solutions Co., Ltd. DNA library, preparation method thereof, and device for detecting SNPs
WO2012079490A1 (en) * 2010-12-15 2012-06-21 深圳华大基因科技有限公司 Method for constructing dna sequencing library and use thereof
WO2016061624A1 (en) 2014-10-20 2016-04-28 Commonwealth Scientific And Industrial Research Organisation Genome methylation analysis
EP3209801A4 (en) * 2014-10-20 2018-03-14 Commonwealth Scientific and Industrial Research Organisation Genome methylation analysis
US10889852B2 (en) 2014-10-20 2021-01-12 Commonwealth Scientific And Industrial Research Organisation Genome methylation analysis
AU2015336938B2 (en) * 2014-10-20 2022-01-27 Commonwealth Scientific And Industrial Research Organisation Genome methylation analysis

Similar Documents

Publication Publication Date Title
Soto et al. The impact of next-generation sequencing on the DNA methylation–based translational cancer research
Toiyama et al. DNA methylation and microRNA biomarkers for noninvasive detection of gastric and colorectal cancer
Kim et al. Deep sequencing reveals distinct patterns of DNA methylation in prostate cancer
Kron et al. Discovery of novel hypermethylated genes in prostate cancer using genomic CpG island microarrays
Carvalho et al. Genome-wide DNA methylation profiling of non-small cell lung carcinomas
Davalos et al. The epigenomic revolution in breast cancer: from single-gene to genome-wide next-generation approaches
WO2017201606A1 (en) Cell-free detection of methylated tumour dna
Mullapudi et al. Genome wide methylome alterations in lung cancer
Pfister et al. Array-based profiling of reference-independent methylation status (aPRIMES) identifies frequent promoter methylation and consecutive downregulation of ZIC2 in pediatric medulloblastoma
EP2885427B1 (en) Colorectal cancer methylation marker
CA2633203A1 (en) Use of roma for characterizing genomic rearrangements
JP2014519319A (en) Methods and compositions for detecting cancer through general loss of epigenetic domain stability
Tanas et al. Rapid and affordable genome-wide bisulfite DNA sequencing by XmaI-reduced representation bisulfite sequencing
Tost Current and emerging technologies for the analysis of the genome-wide and locus-specific DNA methylation patterns
Li et al. Identification of novel DNA methylation markers in colorectal cancer using MIRA-based microarrays
WO2010085774A1 (en) Digital restriction enzyme analysis of methylation
JP7399169B2 (en) Tumor marker STAMP-EP4 based on methylation modification
Aiba et al. Methylated site display (MSD)-AFLP, a sensitive and affordable method for analysis of CpG methylation profiles
IL302988A (en) Detecting methylation changes in dna samples using restriction enzymes and high throughput sequencing
Kumar et al. Methods in cancer epigenetics and epidemiology
Jia et al. RETRACTED ARTICLE: DNA methylome profiling at single-base resolution through bisulfite sequencing of 5mC-immunoprecipitated DNA
Sipos et al. Genome-wide screening for understanding the role of DNA methylation in colorectal cancer
CN111020034A (en) Novel marker for diagnosing tumor and application thereof
WO2023106415A1 (en) Post-chemotherapy prognosis prediction method for canines with lymphoma

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10733984

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10733984

Country of ref document: EP

Kind code of ref document: A1