US20160032362A1 - Method for Enriching Methylated CpG Sequences

Info

Publication number: US20160032362A1
Application number: US14/618,581
Authority: US
Inventors: George R. Feehery; Sriharsa Pradhan
Original assignee: New England Biolabs Inc
Current assignee: New England Biolabs Inc
Priority date: 2008-11-05
Filing date: 2015-02-10
Publication date: 2016-02-04
Also published as: US20100112585A1; US8367331B2; US20130116409A1

Abstract

Compositions and methods are provided for facilitating the enrichment of single-stranded DNA containing methylated CpG from a mixture containing methylated and unmethylated DNA. The compositions relate to methylation-binding protein domains that selectively bind methylated single-stranded DNA. In embodiments of the invention, the methylated DNA is eluted in 0.4 M-0.6 M NaCl, while the unmethylated single-stranded DNA is eluted in less than 0.4 M salt. The ability to readily enrich for methylated DNA permits high-throughput sequencing of the methylated DNA and identification of abnormal methylation patterns associated with disease.

Description

CROSS REFERENCE

This application is a continuation of U.S. Ser. No. 13/722,535 filed Dec. 20, 2012 which is a divisional of U.S. Ser. No. 12/608,489 filed Oct. 29, 2009, now U.S. Pat. No. 8,367,331, which claims priority from U.S. provisional application Ser. No. 61/111,499 filed Nov. 5, 2008, herein incorporated by reference.

BACKGROUND OF THE INVENTION

The task of epigenomic mapping is inherently more complex than genome sequencing because the epigenome is much more variable than the genome. While an individual has only one genome, the epigenome varies in time and space with age, tissue type and exposure to environmental factors, and shows aberrations in disease, especially cancer. Because methylated CpGs account for only ~2-6% of the genome (18), large-scale shotgun sequencing efforts will require some form of purification of short CpG-methylated sequences. Many current enrichment technologies lack the dynamic range necessary to capture minute changes in CpG methylation that can have large repercussions on gene expression.
In the mammalian genome, CpG dinucleotides are relatively infrequent (1 per 100 bp on average), and 60-80% of them are methylated at the carbon 5 position of cytosine (1). In contrast, dense clusters of unmethylated CpG sequences (~1 per 10 bp), known as CpG islands, are found at the transcription start sites of genes (2). In certain circumstances, these CpG islands are heavily methylated, with concomitant silencing of the promoter and of gene activity (3). These modifications are considered important for development (4), genomic imprinting (5), and X chromosome inactivation through gene silencing (6, 7). Aberrant DNA methylation of CpG islands has been frequently observed in cancer cells (8).
Many techniques exist for enriching heavily methylated CpG islands from genomic DNA. One protocol relies on methylation-sensitive restriction endonucleases such as HpaII (CCGG) and HhaI (GCGC), followed by PCR identification, Southern blot analysis or microarray profiling (9). Another approach exploits the ability of an immobilized methyl-CpG-binding domain (MBD) of the MeCP2 protein to bind selectively to methylated double-stranded DNA. Restriction endonuclease-digested genomic DNA is loaded onto the affinity column, and fractions enriched in methylated CpG islands are eluted with a linear sodium chloride gradient. PCR, microarray, DNA sequencing and Southern hybridization techniques are then used to detect specific sequences in these fractions (10). These techniques are limited by the sequence specificity of the restriction enzymes used and therefore will not reflect all combinations of bases flanking the methylated CpG dinucleotide.
There are several additional methods for analyzing methylation patterns. In the bisulfite method, single-stranded DNA (ssDNA) is exposed to a deamination reagent (bisulfite) that converts unmethylated cytosines to uracils while methylated cytosines remain relatively intact (11). After cleanup, the treated DNA of interest is PCR amplified (which converts the uracils to thymines) and analyzed by any of several techniques that distinguish methylated from unmethylated DNA. If the PCR products are cloned and sequenced, alignment of the untreated and treated nucleotide sequences can reveal the in vivo methylation status of the amplified region. The PCR products can also be analyzed by combined bisulfite-restriction analysis (COBRA assay) and methylation-specific PCR (MSP) (12, 13).
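The sequence logic of bisulfite conversion followed by PCR can be illustrated with a toy model. The sketch below assumes complete conversion, a single strand, and known methylated positions; real reactions show incomplete conversion and DNA degradation. The function names `bisulfite_convert` and `cpg_positions` are hypothetical and not part of any described protocol.

```python
def cpg_positions(seq: str) -> list[int]:
    """Return 0-based indices of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Toy model of bisulfite treatment followed by PCR on one strand.

    Assumes complete conversion: every unmethylated C becomes U, which PCR
    copies as T; a C listed in methylated_positions (5-methylcytosine) stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # C -> U (bisulfite) -> T (after PCR)
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTTCGACCGA"
    # Hypothetical example: only the first CpG is methylated.
    methylated = {cpg_positions(ref)[0]}
    print(bisulfite_convert(ref, methylated))
```

Comparing the converted read with the untreated reference then reveals which CpG cytosines were protected by methylation, which is the alignment step described above.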
Recently, direct shotgun ultra-high-throughput sequencing of bisulfite-converted DNA using the Illumina 1G Genome Analyzer and Solexa sequencing technology has yielded insights into the methylation state of the small (~120 Mbp) genome of the mustard plant Arabidopsis (14). This technology allowed exact identification and quantification of 5-methylcytosines in genes at single-nucleotide resolution. Although highly specific and reasonably sensitive, it required at least 20-fold coverage to theoretically cover all potential methylated cytosines. Currently, no method exists to enrich bisulfite-converted CpG-methylated DNA, which is single-stranded by the nature of the deamination reaction, from total genomic DNA.

SUMMARY

Methods and compositions are described herein that include the embodiments listed below.
In one embodiment, an isolated first polypeptide is provided that includes an amino acid sequence having at least 90% homology or identity with SEQ ID NO:3 and is capable of binding single-stranded methylated polynucleotides. The first polypeptide may be fused to a second polypeptide and may be immobilized on a solid substrate by means of the second polypeptide if the second polypeptide is a substrate-binding domain such as maltose-binding domain (MBP). A property of the isolated first polypeptide may include an ability to bind a methylated CpG in a single-stranded polynucleotide.
Examples of the first polypeptide are the SRA domains of human UHRF1 and mouse NP95. Either of these polypeptides may be used in series or in parallel with a methyl-binding domain (MBD), which binds double-stranded methylated DNA, to enhance recovery of methylated DNA. For example, the sample may be applied to an MBD column, eluted, denatured and then applied to an SRA column. In a parallel arrangement, one aliquot of a sample may be applied to an MBD column and another aliquot to an SRA column.
The above-described polypeptides either alone or as a fusion protein, either in solution or immobilized on a substrate, may be used for differentially binding a single-stranded methylated polynucleotide to a solid substrate, for example at a CpG site in a low salt solution.
In an embodiment of the invention, a method is provided for enriching for CpG methylated single-stranded polynucleotides from a mixture containing methylated and unmethylated polynucleotides. This method includes: binding the mixture to the first polypeptide described above; eluting the unmethylated polynucleotide from the isolated polypeptide in a solution containing a low concentration of a salt; and eluting the methylated polynucleotide from the isolated polypeptide in a solution containing a high concentration of a salt. The eluted methylated polynucleotide can then be sequenced and the methylation site analyzed.
In embodiments of the invention, a low concentration of the salt is less than 0.4 M salt and a high concentration of the salt is 0.4 M-0.6 M salt. The salt may be, for example, sodium chloride.
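The salt ranges above can be encoded as a simple lookup, for example to label fractions from a step elution. This is a minimal sketch: the function name `expected_fraction` is hypothetical, and treating exactly 0.4 M as part of the high-salt range is an assumption based on the "0.4 M-0.6 M" wording.

```python
# Thresholds from the Summary; counting 0.4 M itself as "high" is an assumption.
LOW_SALT_LIMIT_M = 0.4
HIGH_SALT_LIMIT_M = 0.6


def expected_fraction(nacl_molar: float) -> str:
    """Label an elution step by which ssDNA population it is expected to release."""
    if nacl_molar < 0:
        raise ValueError("NaCl concentration cannot be negative")
    if nacl_molar < LOW_SALT_LIMIT_M:
        return "low-salt (unmethylated)"
    if nacl_molar <= HIGH_SALT_LIMIT_M:
        return "high-salt (methylated)"
    return "outside described range"


if __name__ == "__main__":
    # The two elution steps used in the Examples (0.3 M and 0.5 M NaCl).
    for step in (0.3, 0.5):
        print(f"{step:.1f} M NaCl -> {expected_fraction(step)}")
```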
In an embodiment of the invention, a method is provided which can be applied to determining the existence of pre-cancerous cells. The method includes: (a) comparing the methylation pattern for selected polynucleotide sequences in both pre-identified transformed eukaryotic cells and non-transformed eukaryotic cells by differential binding of methylated polynucleotides to the first polypeptide of claim 1; (b) determining the presence of abnormal methylation patterns associated with alteration of tumor suppressor function; and (c) utilizing the abnormal methylation patterns as a diagnostic tool for determining whether any eukaryotic cells in a sample are transformed. (In this context “transformed” is intended to mean converted to a pre-cancerous state where the cell is immortalized.)

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C show that methylated and unmethylated dsDNA bound to a GST-SRA-domain resin elute at low NaCl, whereas methylated ssDNA elutes at high NaCl.

FIG. 1A is a chromatogram profile at A280 of human chromatin DNA spiked with a small amount of FAM-labeled methylated (M) and unmethylated (U) CpG-containing oligonucleotides. Both the unmethylated and methylated oligos co-eluted with the bulk of the chromatin DNA between 0.2 M and 0.3 M NaCl.

FIG. 1B shows a gel containing individual column fractions in each lane. At higher NaCl, a faint band (*) on the gel was observed corresponding to single-stranded methylated DNA.

FIG. 1C shows a side-by-side comparison of the methylated and unmethylated oligos confirming that the band (*) corresponded to methylated CpG-containing ssDNA.

FIGS. 2A-2B show that the method of DNA preparation significantly altered the elution characteristics of the GST-SRA-domain column.

FIG. 2A is a comparison of chromatogram profiles at A280 of 100 μg of MseI-digested HeLa DNA spiked with 3 μg of MseI-digested HeLa DNA labeled by M.SssI with ³H-AdoMet. The DNA mixture was heated to 98° C. for one minute and quickly chilled before loading onto the column. A large portion of the ³H-labeled DNA eluted from the column at 0.15 M NaCl; however, three distinct peaks were observed at 0.3 M, 0.35 M and 0.4 M NaCl, with a small peak of ³H-labeled DNA co-eluting with the 0.4 M NaCl peak. The gel shows the content of each fraction.

FIG. 2B shows the same DNA load preparation, which was sonicated for 1 minute, heated to 98° C. for 1 minute, chilled, and loaded onto the column. Three peaks were observed at 0.35 M, 0.4 M and 0.45 M NaCl, with the bulk of the ³H-labeled DNA co-eluting with the 0.4 M and 0.45 M peaks. The gel shows the content of each fraction.

FIG. 3 shows a flowchart of the procedures used to enrich single-stranded methylated CpG-containing DNA. Total genomic DNA was sonicated to 50-150 base fragments. The sample was either heated to 98° C., chilled and loaded onto the GST-SRA-domain column (or magnetic beads), or bisulfite-converted (which made the sample single-stranded and converted all non-methyl cytosines to uracils) before loading. The column/beads were washed with buffer containing 0.3 M NaCl, which eluted the active gene fraction. Methylated CpG-containing DNA remained on the column matrix and could either be eluted with 0.5 M NaCl or be re-equilibrated with low-NaCl buffer before addition of the “fourN” cloning/sequencing primer (SEQ ID NO:1). The sample was heated to 98° C., chilled to 4° C., and then slowly warmed to 37° C. Sequenase was added to extend the primed ssDNA fragments; the reaction was then heated and chilled, and more Sequenase was added to label the other end of each DNA fragment. The DNA with defined ends was further amplified with a complementary PCR primer lacking the random nucleotides, purified, digested with BamHI, purified again and cloned into a sequencing vector.

FIGS. 4A-4D show that a simplified step salt gradient on the GST-SRA-domain column yielded reproducible elution profiles.

FIGS. 4A-4B show a comparison of two chromatogram profiles at A280 of 100 μg of sonicated, heated HeLa genomic DNA (FIG. 4A) or 200 μg (initial amount) of sonicated, bisulfite-converted genomic DNA (FIG. 4B). The 0.3 M and 0.5 M fractions were characterized by qRT-PCR or cloned and sequenced.

FIG. 4C shows the bisulfite-converted fractions, which were labeled and extended with a random “fourN” oligonucleotide and PCR amplified. Analysis of the PCR products on an ethidium-stained 20% TBE polyacrylamide gel before (−) and after (+) BamHI treatment showed the size distribution of fragments from the two peaks.

FIG. 4D shows that GST-SRA-domain-coupled magnetic beads retained only methylated (M) single-stranded lambda DNA after extensive washing with 0.3 M NaCl, as assayed on an ethidium-stained 20% TBE polyacrylamide gel.

FIG. 5 shows active and inactive gene enrichment from the GST-SRA-domain column. Active genes showed at least a 2-fold enrichment over input DNA in the 0.3 M peak. Single-copy inactive genes showed a direct correlation between fold enrichment and CpG occupancy in the 0.5 M peak. As copy number increased, satellite and LINE elements showed an inverse correlation between CpG occupancy and enrichment.

FIG. 6 shows a schematic of UHRF1 illustrating the locations of the different domains in the protein. The inset shows an amino acid alignment of the SRA domains from mouse and human (SEQ ID NOS:2 and 3, respectively), revealing that the sequences are 90% identical.

FIG. 7 shows the DNA sequences of mouse and human (SEQ ID NOS:4 and 5, respectively).

FIG. 8 shows how the SRA domain can be used in sequencing platforms (e.g. the Helicos sequencing platform) to detect methylated CpG DNA: 1. Methylated ssDNA (SEQ ID NO:6) is annealed to poly(T) on a slide. 2. Methylated cytosine is detected by fluorescently labeled NP95 SRA domain. 3. The SRA is washed off and the DNA is sequenced.

Within the flow cells, billions of single molecules of ssDNA are captured on a solid surface. These captured strands serve as templates for the sequencing-by-synthesis process. Before the addition of polymerase and one fluorescently labeled nucleotide (C, G, A or T), the cell is flooded with MBP-SRA domain protein, which binds specifically to methylated CpG sequences. The cell is washed with a 100 mM NaCl wash buffer, and fluorescently labeled anti-MBP antibody then binds to the MBP-NP95 SRA domain/methylated CpG DNA complexes. After a wash step that removes free anti-MBP antibody, the cell is imaged and the positions of the methylated CpG-containing DNA strands are recorded. A high-salt wash (500 mM NaCl) removes the antibody-MBP-NP95 SRA domain complexes, and the sequencing process continues with a polymerase catalyzing the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates. Multiple cycles yield complementary strands more than 25 bases long on billions of templates, providing a sequence read on the methylated CpG templates.
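The ordering of the flow-cell steps (label at low salt, image, strip at high salt, then sequence) can be sketched as a small state model. This is an illustrative sketch only: `Template`, `label_methylated`, `strip_high_salt` and `ready_for_sequencing` are hypothetical names, and binding is modeled as all-or-nothing rather than as the salt-dependent equilibrium described in the text.

```python
from dataclasses import dataclass


@dataclass
class Template:
    position: int          # hypothetical flow-cell coordinate
    methylated_cpg: bool   # strand carries at least one methylated CpG
    sra_bound: bool = False


def label_methylated(templates: list[Template]) -> list[int]:
    """Flood with MBP-SRA, wash at 100 mM NaCl, add labeled anti-MBP, image.

    Modeled outcome: the complex persists only on methylated-CpG strands,
    so only their positions are recorded.
    """
    for t in templates:
        t.sra_bound = t.methylated_cpg
    return [t.position for t in templates if t.sra_bound]


def strip_high_salt(templates: list[Template]) -> None:
    """A 500 mM NaCl wash removes the antibody-MBP-SRA complexes."""
    for t in templates:
        t.sra_bound = False


def ready_for_sequencing(templates: list[Template]) -> bool:
    """Sequencing by synthesis proceeds once no template carries bound protein."""
    return not any(t.sra_bound for t in templates)


if __name__ == "__main__":
    cell = [Template(0, False), Template(1, True), Template(2, True), Template(3, False)]
    print(label_methylated(cell))
    strip_high_salt(cell)
    print(ready_for_sequencing(cell))
```

Recording positions before stripping is what allows each subsequent sequence read to be tagged as coming from a methylated or unmethylated template.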

FIG. 9 shows a flowchart of the procedure used to compare a commercially available methylated CpG DNA enrichment system (e.g. Invitrogen) with MBP-NP95 SRA domain. Total HeLa genomic DNA was sonicated to 50-150 base fragments. Half of the sample was heated to 95° C. for 5 minutes and chilled on ice; the other half was not heated. To 1 μg of unheated sample, 1 μg of biotinylated (bt) MBD and buffer were added. Similarly, to 1 μg of heated DNA, 1 μg of MBP-NP95 SRA domain and buffer were added. Both samples were incubated at room temperature for 20 minutes. Then 100 μl (1 mg) of Streptavidin Magnetic Beads was added to the bt-MBD sample, and 100 μl (1 mg) of Anti-MBP Magnetic Beads was added to the MBP-NP95 SRA domain sample. The samples were incubated overnight at 4° C. with rotation. The bound complexes were then washed three times with buffer containing 100 mM NaCl, 1% Triton and 0.1% Tween, using magnetic separation and aspiration of the buffer, and once with TE buffer containing 0.1% Tween. Finally, a small quantity of water was added to the aspirated samples, and the enriched methylated DNA was eluted from the magnetic beads by heat. The eluates were then assayed by qPCR using primer sets for known active and inactive genes in HeLa DNA.

FIG. 10 shows fold-enrichment values for known methylated (inactive) and unmethylated (active) genes, comparing a commercially available methyl-CpG enrichment system (e.g. Invitrogen) with MBP-NP95 SRA domain protein. Both techniques gave similar enrichment of the inactive genes rDNA and MYOD and no enrichment of the active gene RPL30.

DETAILED DESCRIPTION OF EMBODIMENTS

UHRF1 is a ubiquitin-like protein that improves the fidelity of methylation maintenance and has a histone methyltransferase function. It contains multiple domains (see FIG. 6), including the SRA (SET- and RING-associated) domain. The SRA domain sequence is shown in FIG. 7. The SRA domain binds methylated CpG in a salt-dependent manner. In an embodiment of the invention, the SRA is immobilized on a matrix and can be used to bind methylated and unmethylated ssDNA or bisulfite-converted genomic DNA under low-salt conditions (for example, 0.15 M NaCl). The unmethylated DNA can then be eluted from the SRA protein at an increased salt concentration such as 0.3 M NaCl, while methylated DNA can be eluted at 0.5 M NaCl.
Human UHRF1 is an example of a family of DNA-binding proteins associated with regulating gene expression via methylation. Other examples include DNMT1 and mouse NP95. This family of related proteins is shown here to be effective in differentiating methylated from unmethylated DNA.
These proteins can be produced in high yield and are relatively stable, which makes them suitable for attachment to solid substrates such as agarose resin, carbohydrate-coated beads or magnetic beads (NEB) without loss of binding activity. The immobilized protein can easily be integrated into a high-throughput bisulfite sequencing setup. Because only one wash step and mild elution conditions are needed, sensitivity and accuracy are enhanced. The reusable matrix can thus provide valuable information on the methylome, offering insights into aging and disease.
There are a variety of approaches by which the SRA-like proteins can be immobilized on a matrix. The matrix may include beads, 96-well plastic plates, columns or any other support material. Where beads are selected, they can be magnetic, colored and/or coated with a carbohydrate or other ligand suitable for binding the SRA. To facilitate binding of the SRA-like proteins to a matrix, the SRA-like protein can be synthesized as a fusion protein by standard molecular biology techniques in prokaryotic or eukaryotic host cells. For example, the SRA-like proteins may be synthesized as SRA-chitin-binding domain fusions for binding to chitin or as SRA-MBP fusions for binding to amylose. Suitable fusion proteins are described, for example, in U.S. Pat. No. 5,643,758.
Other examples of fusion proteins include SRA-AGT and SRA-ACT proteins (using the SNAP-Tag® or CLIP-Tag™ technology provided commercially by New England Biolabs). These fusion proteins can be labeled as required for detection or purification of polynucleotides, for example with fluorescent labels introduced by covalent binding of the AGT or ACT portion of the fusion protein to labeled substrates such as benzylguanine or benzylcytosine, leaving the SRA available to bind methylated DNA in vitro or in vivo.
The SRA may also be bound to a matrix or solid substrate such as beads, columns, or glass, plastic or polymer surfaces. Binding can be achieved by any ligand/ligand-binding molecule system, including antibody/antigen, biotin/streptavidin, chitin-binding domain or maltose-binding domain systems. SRA-like proteins may also be synthesized as intein fusions to facilitate certain separation methods (U.S. Pat. Nos. 5,496,714 and 5,834,247).
In an embodiment of the invention, a binding preference of SRA-like proteins for methylated single-stranded polynucleotides was demonstrated. This property can be exploited for detection, purification and analysis of the polynucleotides using SRA immobilized on a matrix. The methylated polynucleotides can then be sequenced to identify the location of the methylated CpG. In another embodiment, a double-stranded polynucleotide can be bound to SRA, and methylation, if present, can be detected on either strand.
Mammalian UHRF1 SRA domains (such as those of human UHRF1 or murine NP95) can be used to augment high-throughput sequencing methodologies, for example True Single Molecule Sequencing (tSMS)™ technology (Helicos Biosciences), by binding and identifying single-stranded methylated CpG-containing DNA before the series of nucleotide additions and detection cycles that determine the sequence of each fragment (FIG. 8). By integrating the UHRF1 SRA domain into this instrumentation setup, epigenetic information can be layered on top of rapid and inexpensive resequencing of genomes, facilitating the understanding of methylation states in complex organisms.
The mammalian UHRF1 SRA domains can be displaced from the polynucleotide by adding cations that neutralize the charge on the DNA and thereby release the electrostatically bound protein. In embodiments of the invention, binding of the protein to the polynucleotide is disrupted using NaCl; however, the use of this salt is not intended to be limiting. Moreover, the protein was found to bind more tightly to polynucleotides at methylated CpGs, so that a high salt concentration was required to release CpG-methylated polynucleotides, whereas a low salt concentration sufficed to release CpG-unmethylated polynucleotides. In an embodiment of the invention, the low salt concentration was 0.3 M NaCl and the high salt concentration was 0.5 M NaCl. Table 1 provides the results of a two-step salt gradient.
Table 1 shows a sequence analysis of the two NaCl peaks from the GST-SRA-domain column. Greater than 10-fold enrichment of methylated CpG-containing DNA was observed.

NaCl peak     | Reads with ≥1 methylated CpG | Average read length | Methylated CpG / bisulfite-converted bases | Methylated CpG (% of total)
High (0.5 M)  | 19/30                        | 63 bases            | 44/1900                                    | 2.32
Low (0.3 M)   | 3/22                         | 105 bases           | 5/2327                                     | 0.215
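The percentages in Table 1 and the "greater than 10-fold" statement follow directly from the base counts. A short check, using only the counts given in the table (the helper name `percent` is illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Counts from Table 1: methylated CpG bases / total bases sequenced
high_salt = percent(44, 1900)   # 0.5 M NaCl peak
low_salt = percent(5, 2327)     # 0.3 M NaCl peak

if __name__ == "__main__":
    print(f"0.5 M peak: {high_salt:.2f}% methylated CpG")
    print(f"0.3 M peak: {low_salt:.3f}% methylated CpG")
    print(f"ratio: {high_salt / low_salt:.1f}-fold")
```

The ratio comes out at roughly 10.8, consistent with the reported better than 10-fold enrichment; given the small number of reads, this figure carries considerable sampling uncertainty.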
All references cited herein, as well as U.S. provisional application Ser. No. 61/111,499 filed Nov. 5, 2008 and U.S. Ser. No. 12/608,489 filed Oct. 29, 2009 are incorporated by reference.

EXAMPLES

Example 1

SRA-Domain Protein Purification and the Covalent Coupling of the Protein to Solid-State Matrixes

The SRA domain (residues 386-618) was amplified from full-length human UHRF1 cDNA synthesized from total HeLa cell RNA. The product was cloned into pENTR-TEV (GST Tag Invitrogen) and recombined into pDEST15 (Invitrogen, Carlsbad, Calif.) to create the GST fusion. The construct was propagated in T7 Express E. coli (NEB) at 37° C. to an OD590 of 0.5 and induced with 0.1 mM IPTG overnight at 16° C. Cells were pelleted, lysed in a French press and centrifuged again, and the supernatant was layered over a 10 ml Glutathione Sepharose High Performance column (GE Healthcare). After a 10-column-volume wash, the protein was eluted with 10 mM L-glutathione (Sigma). The total yield was 12 mg of purified SRA domain from 8 liters of shake-flask culture.

GST-SRA Column

9 ml of 1.2 mg/ml (10.8 mg total) previously purified and dialyzed GST-SRA-domain protein in 10 mM Tris pH 7.5, 1 mM EDTA and 0.2 M NaCl was layered onto a 4.5 ml Glutathione Sepharose matrix equilibrated with the same buffer. Of the 10.8 mg loaded, 7.83 mg remained bound to the column. The resin was washed with 10 column volumes of the same buffer, then cycled twice with this buffer supplemented with 1 M NaCl before final equilibration at 0.05 M NaCl. The methylated oligonucleotides were FAM-GTAGG5GGTGCTACA5GGTTCCTGAAGTG (top strand, SEQ ID NO:7) and FAM-CACTTCAGGAAC5GTGTAGCAC5GCCTAC (bottom strand), where 5 = 5-methylcytosine. The unmethylated oligonucleotides were GTCACTGAAGCGGGAAGGGACTGGCTGCTCCCGGGCGAAGTGCCGGGGCAGGATCT-FAM (top strand, SEQ ID NO:8) and AGATCCTGCCCCGGCACTTCGCCCGGGAGCAGCCAGTCCCTTCCCGCTTCAGTGAC-FAM (bottom strand).
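A quick way to confirm that each oligonucleotide pair anneals into a fully complementary duplex, and that every 5-methylcytosine sits in CpG context, is to compare each top strand with the reverse complement of its bottom strand. The sketch below omits the FAM labels; helper names such as `decode` and `marks_in_cpg` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def decode(oligo: str) -> str:
    """Map the '5' symbol (5-methylcytosine) to C for base-pairing checks."""
    return oligo.replace("5", "C")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def marks_in_cpg(oligo: str) -> bool:
    """True if every 5-methylcytosine is immediately followed by G (CpG context)."""
    return all(i + 1 < len(oligo) and oligo[i + 1] == "G"
               for i, base in enumerate(oligo) if base == "5")


# Sequences as given in the text, FAM labels omitted.
METH_TOP = "GTAGG5GGTGCTACA5GGTTCCTGAAGTG"
METH_BOTTOM = "CACTTCAGGAAC5GTGTAGCAC5GCCTAC"
UNMETH_TOP = "GTCACTGAAGCGGGAAGGGACTGGCTGCTCCCGGGCGAAGTGCCGGGGCAGGATCT"
UNMETH_BOTTOM = "AGATCCTGCCCCGGCACTTCGCCCGGGAGCAGCCAGTCCCTTCCCGCTTCAGTGAC"

if __name__ == "__main__":
    print(reverse_complement(decode(METH_TOP)) == decode(METH_BOTTOM))
    print(reverse_complement(UNMETH_TOP) == UNMETH_BOTTOM)
    print(marks_in_cpg(METH_TOP), marks_in_cpg(METH_BOTTOM))
```

Because each methylated CpG in the top strand pairs with a methylated CpG in the bottom strand, the annealed methylated duplex is symmetrically (fully) methylated at both sites.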
qPCR Analysis of NaCl Fractions from GST-SRA-Column
DNA from the high- and low-salt fractions was characterized by real-time PCR on a Bio-Rad MyiQ iCycler using Bio-Rad iQ SYBR Green Supermix and the following primer sets:
hsALDOA: forward TCCTGGCAAGATAAGGAGTTGAC (SEQ ID NO:9), reverse ACACACGATAGCCCTAGCAGTTC (SEQ ID NO:10)
hsSERPINA: forward GGCTCAAGCTGGCATTCCT (SEQ ID NO:11), reverse GGCTTAATCACGCACTGAGCTTA (SEQ ID NO:12)
hsRPL30: forward CAAGGCAAAGCGAAATTGGT (SEQ ID NO:13), reverse GCCCGTTCAGTCTCTTCGATT (SEQ ID NO:14)
hsRASSF1: forward TCATCTGGGGCGTCGTG (SEQ ID NO:15), reverse CGTTCGTGTCCCGCTCC (SEQ ID NO:16)
hsMYO-D: forward CCGCCTGAGCAAAGTAAATGA (SEQ ID NO:17), reverse GGCAACCGCTGGTTTGG (SEQ ID NO:18)
hsMYT1: forward TGAAACCTTGGGTGTCGTTGGGAA (SEQ ID NO:19), reverse TTGCGGGCCATTGTTCCATGATGA (SEQ ID NO:20)
rDNA: forward CGTACTTTATCGGGGAAATAGGAGAAGTACG (SEQ ID NO:21), reverse GTGCTTAGAGAGGCCGAGAGGA (SEQ ID NO:22)
hsSAT: forward ATCGAATGGAAATGAAAGGAGTCA (SEQ ID NO:23), reverse GACCATTGGATGATTGCAGTCA (SEQ ID NO:24)
LINE: forward CGGAGGCCGAATAGGAACAGCTCCG (SEQ ID NO:25), reverse GAAATGCAGAAATCACCCGTCTT (SEQ ID NO:26)
The cycling program was: cycle 1 (1×): 95° C., 5 minutes; cycle 2 (40×): step 1, 95° C., 10 seconds; step 2, 61° C., 30 seconds; step 3, 72° C., 30 seconds.
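When adapting these primer sets, a quick sanity check of length and GC content can be useful. The sketch below computes GC percentage and a Wallace-rule melting temperature, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is a crude approximation intended for short oligonucleotides (nearest-neighbor models are more accurate), and the source reports neither value; the helper names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G + C bases in a primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace rule of thumb: Tm ~ 2*(A+T) + 4*(G+C) deg C (short oligos only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


# Two of the primer pairs listed above (forward, reverse)
PRIMERS = {
    "hsRPL30": ("CAAGGCAAAGCGAAATTGGT", "GCCCGTTCAGTCTCTTCGATT"),
    "hsRASSF1": ("TCATCTGGGGCGTCGTG", "CGTTCGTGTCCCGCTCC"),
}

if __name__ == "__main__":
    for name, pair in PRIMERS.items():
        for label, seq in zip(("fwd", "rev"), pair):
            print(f"{name} {label}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, "
                  f"Wallace Tm {wallace_tm(seq)} C")
```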
Cloning and Sequencing of NaCl DNA Fragments from GST-SRA-Column
Eluted and desalted DNA fragments were cloned into BamHI-cut, calf intestinal alkaline phosphatase (CIP)-treated LITMUS 28i cloning vector using the “fourN” procedure (17), except that the oligonucleotide GTTTCCCAGTCAGGATCCNNNN (SEQ ID NO:1) and the PCR primer GTTTCCCAGTCAGGATCC (SEQ ID NO:27) were used. PCR products were purified using Qiagen columns, cut with BamHI, purified again, ligated to the vector and cloned as stated.
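The design of the “fourN” oligonucleotide can be checked directly from the two sequences given: the PCR primer is the fixed 5' portion of the oligonucleotide, it carries the BamHI recognition site (GGATCC) used for cloning, and the four degenerate N positions encode 4^4 = 256 distinct random tetramers. A minimal sketch (the helper `expand_degenerate` is hypothetical):

```python
from itertools import product

FOURN_OLIGO = "GTTTCCCAGTCAGGATCCNNNN"  # SEQ ID NO:1
PCR_PRIMER = "GTTTCCCAGTCAGGATCC"      # SEQ ID NO:27
BAMHI_SITE = "GGATCC"                  # BamHI recognition sequence


def expand_degenerate(oligo: str) -> list[str]:
    """List every concrete sequence encoded by an oligo with N (any base) positions."""
    choices = [("A", "C", "G", "T") if base == "N" else (base,) for base in oligo]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    variants = expand_degenerate(FOURN_OLIGO)
    print(len(variants))                       # 4**4 random tetramers
    print(FOURN_OLIGO.startswith(PCR_PRIMER))  # primer = fixed 5' portion
    print(PCR_PRIMER.find(BAMHI_SITE))         # 0-based start of the BamHI site
```

Because every variant shares the fixed 5' portion, a single PCR primer lacking the random nucleotides can amplify all primed fragments.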

Results

GST-SRA-Domain of Human UHRF1 Coupled to a Solid Matrix Enriched Single-Stranded Methylated CpG-Containing DNA

To determine the preference of the SRA domain for unmethylated, fully methylated or hemimethylated double-stranded DNA or ssDNA on a solid-state matrix, the following experiment was performed. 7.83 mg of purified GST-SRA domain was bound to a 4.5 ml GST column. 1.68 mg of MNase-digested chromatin (~150-1000 bp) from human Jurkat cells, spiked with 1 μg each of fluorescein (FAM)-labeled double-stranded methylated CpG oligonucleotide and unmethylated CpG oligonucleotide of different sizes, was layered onto the column in buffer A (10 mM Tris pH 7.5, 1 mM EDTA, 0.05 M NaCl). After a 10-column-volume wash with buffer A, the column was developed with a 100 ml NaCl gradient to 1 M, and the fractions were assayed by gel electrophoresis (FIGS. 1A-1C). Both the methylated and unmethylated DNA oligos co-eluted with the bulk of the chromatin DNA between 0.2 M and 0.3 M NaCl. Interestingly, a faint fluorescent band smaller than the two annealed oligos eluted from the column at ~0.4 M NaCl. It was speculated that this band might contain unannealed methylated ssDNA.
To further investigate the binding preferences of the SRA-domain resin for ssDNA, 100 μg of MseI-digested HeLa DNA spiked with 3 μg of MseI-digested HeLa DNA labeled by M.SssI with ³H-AdoMet was applied to the equilibrated GST-SRA-domain column described above. After a column wash in buffer A, a 30 ml step gradient from 0.1 M to 0.6 M NaCl was run and fractions were collected. The double-stranded DNA and the ³H-labeled, fully methylated double-stranded DNA eluted from the column in the first two fractions at 0.15 M NaCl. Next, another DNA preparation of the same composition was heated to 98° C. for 1 minute and quickly chilled on ice for 5 minutes before loading on the equilibrated column. The same step gradient was used to elute the DNA, and the fractions were analyzed as before. A large portion of the ³H-labeled DNA eluted from the column at 0.15 M NaCl; however, three distinct peaks were observed at 0.3 M, 0.35 M and 0.4 M NaCl, with a small peak of ³H-labeled DNA co-eluting with the 0.4 M NaCl peak. Finally, a third DNA load preparation was sonicated for 1 minute, heated to 98° C. for 1 minute, chilled, and loaded onto the column. Three peaks were observed at 0.35 M, 0.4 M and 0.45 M NaCl, with the bulk of the ³H-labeled DNA co-eluting with the 0.4 M and 0.45 M peaks (FIGS. 2A and 2B). It was concluded that sonication plus heating fragmented the genomic DNA and converted it fully to single-stranded form, which facilitated binding of the DNA to the resin and greatly improved the ability of the matrix to discriminate between unmethylated and fully methylated CpG DNA.

Simplified Elution Profile Enriched Active and Inactive Genes

A new DNA preparation containing 100 μg of sonicated, heated HeLa genomic DNA was layered onto the equilibrated column in buffer A. To simplify the elution protocol, a 0.15 M NaCl wash step and 0.3 M and 0.5 M NaCl elution steps were employed. Fractions containing the 0.3 M and 0.5 M peaks were collected, desalted and concentrated using a Qiagen miniprep column (FIG. 3 flow chart and FIGS. 4A-4D). The products from the salt fractions were characterized by qPCR on a Bio-Rad iCycler using primers to known active and inactive genes in HeLa cells (FIG. 5). The actively transcribed genes aldolase A (ALDOA), serpin peptidase inhibitor (SERPINA) and 60S ribosomal protein L30 (RPL30) showed a consistent two-fold enrichment over input DNA in the 0.3 M peak. The high-salt peak, presumably containing the inactive gene fraction, showed little or no enrichment of these genes.
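The text does not state how fold enrichment over input was calculated from the qPCR data. One common approach, assumed here for illustration only, is the comparative Ct method: with equal DNA mass per reaction and amplification efficiency E, enrichment = (1 + E)^(Ct_input − Ct_fraction). The function name and the Ct values below are hypothetical.

```python
def fold_enrichment(ct_input: float, ct_fraction: float, efficiency: float = 1.0) -> float:
    """Fold enrichment of one locus in a salt fraction relative to input DNA.

    Assumes equal DNA mass in both reactions and per-cycle amplification
    efficiency `efficiency` (1.0 = perfect doubling), so that
    enrichment = (1 + efficiency) ** (ct_input - ct_fraction).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency should be in (0, 1]")
    return (1.0 + efficiency) ** (ct_input - ct_fraction)


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only (not data from the text)
    print(fold_enrichment(ct_input=25.0, ct_fraction=24.0))  # locus crosses one cycle earlier
    print(fold_enrichment(ct_input=25.0, ct_fraction=27.0))  # locus depleted in the fraction
```

Under these assumptions, the two-fold enrichment of active genes in the 0.3 M peak corresponds to those loci reaching threshold about one cycle earlier than in input DNA.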
Six known repressed regions of the HeLa genome were interrogated in a similar fashion. The single-copy genes RAS association domain family protein 1 (RASSF1), myogenic differentiation 1 (MYO-D) and myelin transcription factor 1 (MYT1), as well as tandemly repeated ribosomal DNA (rDNA), showed a direct correlation between fold enrichment and CpG occupancy in the 0.5 M peak. Highly repetitive satellite DNA (hsSAT) showed less enrichment in the high-salt peak. In spite of their high CpG content, long interspersed nuclear elements (LINEs), which are transcribed by RNA polymerase II into mRNA (16), showed little difference between the low- and high-salt fractions, suggesting that the SRA-domain column may accurately reflect the extent of methylation of these sequences in the genome.
Random Sequencing of Cloned Fragments Derived from NaCl Eluted Fractions
Although sodium bisulfite conversion substantially degrades genomic DNA, it can yield very high-resolution information about the methylation state of a given segment of DNA. Because the SRA-domain resin favored fragmented ssDNA, it was well suited to binding and resolving bisulfite-converted DNA. To explore the behavior of the SRA-domain column with bisulfite-converted DNA, 200 μg of HeLa genomic DNA converted with the EpiTect Bisulfite Kit (Qiagen) was applied to the equilibrated column, washed and eluted as before. As in previous runs, two peaks were observed at the 0.3 M and 0.5 M NaCl step elutions. Fractions were collected, concentrated and desalted on Qiagen columns. The fragments were cloned using a modification of the “fourN” procedure (17), in which a short oligonucleotide containing a BamHI restriction site and four random bases (SEQ ID NO:1) was annealed to both ends of the fragments and extended with Sequenase. Primers complementary to the known sequences introduced during the random-priming reaction were then added, and the products were amplified by PCR. After cleavage with BamHI, the DNA was cloned into BamHI-linearized LITMUS 28i vector and plated on ampicillin/IPTG/X-gal plates (FIG. 3 flow chart).
The DNA from 100 white colonies from the 0.5 M peak and 50 colonies from the 0.3 M peak was submitted for sequencing. Of the 100 reads from the 0.5 M peak, 30 were deemed suitable for analysis by the following criteria: 1) the read contained sequence that could be identified as human by NCBI BlastN; 2) it showed evidence of non-methyl cytosine conversion (C to T or G to A, depending on orientation); and 3) it contained an unconverted C followed by G or an unconverted G followed by C, again depending on forward or reverse sequencing orientation. Of these 30 reads (Table 1), with an average size of 63 bases, 19 contained at least one methylated CpG. Of the 1900 bases sequenced, 44 (2.32% of the total) were methylated CpG. Notably, of the 19 methylated CpG sequences, 10 mapped to known CpG methylation sites: nuclear receptor subfamily 4 (19), Fanconi anemia (20), von Willebrand factor (21), coagulation factor XIII and transglutaminase (22), chromodomain protein Y-like (23), spectrin repeat (24), HECTD1 (25), zinc finger and BTB domain containing 46 (26), and pumilio (27). Of 22 reads with an average size of 105 bases from the low-salt 0.3 M peak, 3 contained methylated CpG; of these 2327 bisulfite-converted bases, 5 (0.215% of the total) were identified as methylated CpG. Although limited in scope, these data showed a better than 10-fold enrichment of methylated CpG in the high-NaCl peak versus the low-NaCl peak. Additional sequencing will be required to fully determine the potential fold enrichment by the SRA-domain resin relative to random sequencing of genomic DNA or to CpG-methylated DNA enriched by other means, such as an MBD column.
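Criteria 2 and 3 above amount to comparing each bisulfite read against its reference position by position. The sketch below assumes a forward-orientation read already aligned to the reference without gaps; it does not model criterion 1 (BlastN identification) or the reverse (G to A) orientation. The function name and the example sequences are hypothetical.

```python
def call_cpg_methylation(reference: str, read: str) -> dict:
    """Compare a forward-orientation bisulfite read with its reference.

    Assumes a gapless alignment of equal length. A reference C read as T counts
    as conversion evidence (criterion 2); a reference C in a CpG still read as
    C is called a methylated CpG (criterion 3). Unconverted non-CpG C, which
    may indicate incomplete conversion, is tallied separately.
    """
    if len(reference) != len(read):
        raise ValueError("read must be aligned to the reference without gaps")
    ref, rd = reference.upper(), read.upper()
    converted, unconverted_non_cpg, methylated_cpg = 0, 0, []
    for i, (r, b) in enumerate(zip(ref, rd)):
        if r != "C":
            continue
        in_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        if b == "T":
            converted += 1
        elif b == "C" and in_cpg:
            methylated_cpg.append(i)
        elif b == "C":
            unconverted_non_cpg += 1
    return {"converted": converted,
            "unconverted_non_cpg": unconverted_non_cpg,
            "methylated_cpg": methylated_cpg}


if __name__ == "__main__":
    print(call_cpg_methylation("ACGTTCGACCGA", "ACGTTTGATTGA"))
```

A read with `converted > 0` satisfies criterion 2, and a non-empty `methylated_cpg` list marks it as containing at least one methylated CpG, the quantity tallied in Table 1.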

GST-SRA-Domain Protein Covalently Coupled to Magnetic Beads Showed Similar Binding and Elution Characteristics

As an alternative to column chromatography, GST-SRA-domain protein covalently coupled to nonporous paramagnetic particles was tested for its suitability as a high-throughput purification matrix for methylated CpG sequences. To compare the binding characteristics of the GST-SRA-domain magnetic beads, 5 μg of sonicated unmethylated lambda DNA or 5 μg of sonicated, fully enzymatically methylated (M.SssI) lambda DNA was added to 50 μl of a 50% slurry of 10 mg/ml SRA-domain magnetic beads in 150 mM NaCl, 0.1% Tween 20, 10 mM Tris pH 7.5 and 1 mM EDTA, and mixed end over end for 30 minutes at room temperature. The tubes were placed on a magnetic separation rack and the supernatant was aspirated. The beads were washed and magnetically separated three times with the same buffer supplemented with an additional 150 mM NaCl (0.3 M NaCl final). The beads were then loaded directly onto a 20% native TBE acrylamide gel for analysis. In parallel, sonicated methylated and unmethylated lambda DNA samples were heated to 98° C. and chilled before binding to the magnetic beads, followed by the same washes. The ethidium-stained gel showed that only the heated methylated lambda DNA remained on the beads after the 0.3 M NaCl washes (FIGS. 4A-4D). Additional work is needed to characterize the DNA fragments that remain bound to the beads by direct linker addition and DNA sequencing.
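The wash step raises the NaCl concentration of the binding buffer from 150 mM to 300 mM. As an illustration of preparing such a buffer, the sketch below solves the mixing balance (start × V_buffer + stock × V_add) / (V_buffer + V_add) = target for the volume of concentrated NaCl stock to add. The 5 M stock concentration and the function name `stock_volume_to_add` are assumptions, not values from the text.

```python
def stock_volume_to_add(buffer_ul: float, start_m: float, target_m: float, stock_m: float) -> float:
    """Volume of NaCl stock (ul) that brings buffer_ul of buffer from start_m to target_m.

    Solves (start_m * buffer_ul + stock_m * v) / (buffer_ul + v) = target_m for v.
    Other buffer components (Tris, EDTA, Tween 20) are diluted slightly by v.
    """
    if not start_m <= target_m < stock_m:
        raise ValueError("require start_m <= target_m < stock_m")
    return buffer_ul * (target_m - start_m) / (stock_m - target_m)


if __name__ == "__main__":
    # Hypothetical 5 M NaCl stock; 1 ml of binding buffer at 0.15 M raised to 0.3 M.
    v = stock_volume_to_add(1000.0, 0.15, 0.30, 5.0)
    print(f"add {v:.1f} ul of 5 M NaCl per 1 ml of buffer")  # prints add 31.9 ul ...
```

Using a concentrated stock keeps the dilution of the other buffer components under about 3%.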

Example 2

Common Properties Shared by SRA Domains from Different Sources

MBP-NP95 SRA-domain fusion protein effectively enriched single-stranded methylated CpG DNA from a small amount of input DNA, as demonstrated below.
The SRA domain of mouse NP95, which is 90% identical to human UHRF1, bound and enriched fragmented methylated ssDNA from 1 μg of input DNA. From 1 μg of fractionated ssDNA, the mouse NP95 SRA domain enriched methylated CpG-containing DNA 20- to 25-fold, comparable in yield and sensitivity to a methyl-binding domain (MBD).
As an alternative to column chromatography, an MBP-NP95 SRA-domain fusion protein used in conjunction with anti-MBP monoclonal antibody coupled to paramagnetic beads was tested for its suitability as a high-throughput purification matrix for methylated CpG sequences. To compare the binding and elution characteristics of the NP95 SRA domain with a commercially available methylated CpG enrichment system employing biotinylated MBD (MethylMiner™ Methylated DNA Enrichment Kit from Invitrogen), 1 μg of sonicated, heated HeLa DNA was added to 1 μg of MBP-NP95 SRA (15 μl), and 1 μg of sonicated HeLa DNA was added to 1 μg of biotinylated MBD (2 μl). Each 200 μl reaction contained 20 μl of 10× NEBuffer 4 (1×: 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, pH 7.9) and 2 μl of 100 μg/ml BSA, and was incubated for 30 minutes at room temperature. Anti-MBP magnetic beads (NEB; 100 μl, 1 mg) were added to the MBP-NP95 SRA reactions, and streptavidin magnetic beads (Invitrogen; 100 μl, ~1 mg) were added to the MBD reactions. Both reactions were mixed end over end overnight at 4° C. The tubes were placed on a magnetic separation rack and the supernatant was aspirated. The beads were washed and magnetically separated three times with 15 ml of wash buffer (20 mM Tris-HCl pH 7.5, 100 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Tween 20), followed by a final 15 ml wash in low-salt buffer (20 mM Tris-HCl, 1 mM EDTA, 0.1% Tween 20) (see FIG. 9). Water (140 μl) was added to the bead complexes, and the samples were heated to 98° C. to release the enriched methylated DNA. The released DNA was characterized by qPCR on a Bio-Rad iCycler using primers to known active and inactive genes in HeLa cells. The actively transcribed gene encoding ribosomal protein L30 (RPL30) showed no enrichment in either the MBP-NP95 SRA samples or the biotinylated MBD (bt-MBD) samples.
By contrast, the methylated loci myogenic differentiation 1 (MYO-D) and tandem repetitive ribosomal DNA (rDNA) showed 20- to 25-fold enrichment in the MBP-NP95 SRA samples, comparable to the enrichment in the bt-MBD samples (FIG. 8). Additional work is needed to characterize the DNA fragments that remain bound to the beads by direct linker addition and DNA sequencing.
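The text reports qPCR fold enrichment without stating how it was calculated. One common approach, shown here only as an assumption, is the comparative Ct (ΔΔCt) method: the Ct shift of each target between enriched and unenriched input DNA is normalized to the shift of a non-enriched reference locus such as RPL30, assuming near-perfect amplification efficiency. The function name `fold_enrichment` and the Ct values in the example are hypothetical, and an input-DNA measurement is assumed.

```python
def fold_enrichment(ct_target_enriched: float, ct_target_input: float,
                    ct_ref_enriched: float, ct_ref_input: float,
                    efficiency: float = 2.0) -> float:
    """Comparative-Ct fold enrichment of a target locus relative to a reference locus.

    `efficiency` is the per-cycle amplification factor (2.0 = perfect doubling).
    A lower Ct means more template, so a target that gains template in the
    enriched sample while the reference does not yields a value above 1.
    """
    delta_target = ct_target_enriched - ct_target_input
    delta_ref = ct_ref_enriched - ct_ref_input
    return efficiency ** -(delta_target - delta_ref)


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only, not data from this document.
    print(fold_enrichment(22.0, 25.0, 26.0, 24.0))  # prints 32.0
```

If primer efficiencies differ noticeably from 2.0, per-assay efficiencies from a standard curve would be substituted.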

TABLE 1

High Salt 0.5 M (enriched) peak, no CpG
1-33.5 TGTGGGGTTGTTGTTTTGAGAGGGTTTTTTTTTGGGGTTTTTATTAATGATG (SEQ ID NO: 79)
6-33.5 AAACATTGGGAATATAGTATTTATTTTTGGTGATTATGTGTTTAGTTAAGTATTAGAGGATATTTTTA (SEQ ID NO: 28)
7-33.5 AATTTTTGTAGTTTTAGTAGAGATGGAGTTTTATTATGTTGGTTAGGTTGG (SEQ ID NO: 29)
8-33.5 GAAACAGGAGAATTTTTTGAATTTGGGTGGTAGAGG (SEQ ID NO: 30)
9-33.5 AGAAAATATGGTTTGTTAATGAATGATAGGTTAATTTTAGTATGTTGGTTATTTTAATATTTTGTTATTAGTTGGTTTGG (SEQ ID NO: 31)
H19-33.5 CAGGTATAGTGGTAAGAATTTGTAGTTTTAGTTATTTGGGAGGTTGAGTTAGGA (SEQ ID NO: 32)
H76-33.5 AAACTTTTGGTTGGGGGTGGTGGTTTATGTTTGTAATTTTAGTATTTTGGGAGGTCAAGGTGAGTGGAT (SEQ ID NO: 33)
H2-33.5 AGGTAGTTTTATTTTGGGTTTTAGGGAATAGGAGGGAATTAGAAGGA (SEQ ID NO: 34)
H5-33.5 CAGTATTTTGGGAGGTTAAGGTAGGTGGATTATGAGGTTAGGAGATTGAGA (SEQ ID NO: 35)
H21-33.5 GATGGATTGTTTGAGTTTAGGAGTTTGAGATTAG (SEQ ID NO: 36)
H24-33.5 TGAGTTTAGTTTAAGTTGATTGGGTAGGTAAATGTTTGTTATGAATTTGGAAGTGAGAGA (SEQ ID NO: 37)

High Salt 0.5 M (enriched) peak, CpG
3-33.5 725439 bp at 3′ side: nuclear receptor subfamily 4, group A, member 2 isoform a
CAGGTGTTGAGTGGTGAGGGATGTGTAAATAAGTAAGTGTGGGGTTCGGTTATTGCGTATAGTTAGGTATATTGGTTGTTGTGGGGTGGGGTAGGTAATTTAAGTATTAGTATGGGTATTGGTTTTTTGTGAGGC (SEQ ID NO: 38)
4-33.5 Fanconi anemia, complementation group M
ACAAAAATTAGTTAGGTATAGTGGTATGTATTTGTAGTTTTAGTTAATCGGGATCCTGA (SEQ ID NO: 39)
5-33.5 GENE ID: 10692 RRH | retinal pigment epithelium-derived rhodopsin homolog
GAATGGCAAGTATTGGATTATTTACGGTCGTGGTTGTGGATCGATA (SEQ ID NO: 40)
10-33.5 transglutaminase 2 isoform b
AGTTTGTACGGTGAAGTTTAGGTTTTATTGTGGATACGGTTGAAATAGAAGAGTGATGGG (SEQ ID NO: 41)
H6-33.5 31781 bp at 5′ side: von Willebrand factor preproprotein; 46059 bp at 3′ side: CD9 antigen
TGAACGCGGGAGGCGGAGTTTGTAGTGAGTTAAGATCGCGTTATTGTATTTTAG (SEQ ID NO: 42)
H7-33.5 ref|NW_001838799.1|H52_WGA192_36
GGAAACGAATGAAATTATCGAATGGAATCGAATGGTGTTATCGAACGGA (SEQ ID NO: 43)
H12-33.5 coagulation factor XIII A1 subunit precursor
CGGATAGGAGGGGTTGTTATGAAG (SEQ ID NO: 44)
H15-33.5 545337 bp at 5′ side: EGF-like repeats and discoidin I-like domains-containing
TAGTTAATTATATGTGTTCGTTATTTGTGTATGTGG (SEQ ID NO: 45)
H45-33.5 114563 bp at 5′ side: similar to hCG2036843
ATGAAAGTGTTTTGGGGATGGATGGGGGATATGGTTGTATAATGTGGCGGACG (SEQ ID NO: 46)
H55-33.5 B-cell novel protein 1 isoform a
AGAATCGTTTGAGTTTAGGAGTTTAAGATTAGTTTGGGTAATATAGTGAGATTTTGTTGTTACGAAAATAAATAAAAAATTAGTTAGGTGTGGTGGTGTATGTTTGTGGT (SEQ ID NO: 47)
H64-33.5 17408 bp at 5′ side: musashi 2 isoform b
TGTTTGTTGAGTGTACGTNTNNNGTATTTGTGTTGGGTGTATGTGGATGTGTGNGNTGAG (SEQ ID NO: 48)
H74-33.5 Homo sapiens HECT domain containing 1 (HECTD1), mRNA
AGTTTGAAGTTTTTATAGAAGAAGGTTATGATTTATTTTCGGTAGGAAGTTTTGAAGAG (SEQ ID NO: 49)
H15a-33.5 62438 bp at 5′ side: D-amino acid oxidase activator
AGGAAAGTTGGAAGGATGAGGATAACGTAGTGTTTTGTTGAAGAAGGAAGAGANNNNGGATTAAATTGAAATTGATTGGGTTTYTAAAATGGATGGGAT (SEQ ID NO: 50)
H27-33.5 unc-51-like kinase 4
AGTTTGATTTTAGATTGTTGTGTTAGTAATGAGCGAGG (SEQ ID NO: 51)
H30-33.5 spectrin repeat containing, nuclear envelope 2 isoform 1
TTATTTTTATAAAAATAAAAAAATTAGTTGGGTGTAGTGGCGTATGTTTGTNGTTTTAGT (SEQ ID NO: 52)
H31-33.5 256834 bp at 5′ side: alpha 1 type IV collagen preproprotein
AACGATAAAGAAAATAAAAGGAGTGAGGGAGGATAGATGGG (SEQ ID NO: 53)
H35-33.5 pumilio 1 isoform 1
ATTAGTTAGGCGTGGGGGTGGGTGTTTGTAGTTTTAGTTATTTAGGAGGTTGAGGTAGGA (SEQ ID NO: 54)
H7a-33.5 zinc finger and BTB domain containing 46
AAGGTGGGGGTTGGGGGGNTNGTTTTTTCGGGNTGTTGTCGCGGNGGAGGAGCGTTTTAGAGTTTACGGCGTAGTTTTATTCGTCGGNATTTAGGTGGACGTTGATCGGGGGAGAGAATTGAGTATCGGGATC (SEQ ID NO: 55)
H9-33.5 259088 bp at 3′ side: chromodomain protein, Y-like 2
AGAGTAGAGAGATGATTAAATTTATGTTAATTTTATTATTTTGGTTTTGAGGTTGTTGTRYAAGTTTTTTAGAATGTGAGTCGGGTATTGTTTTTGAGGTTAACGTTATTTGGTTTGCGTTT (SEQ ID NO: 56)

Low Salt 0.3 M (control) peak, CpG
13-33.3 GGGAGGTAGTGATGAGAGTAATAGATAGGGTTTAGGTGTTTGTGTATGATATGTTTG (SEQ ID NO: 57)
L9-33.3
GATGTTATTAAATAATTAGATTATTTGTATTCGAATTGGGTAAGTAGTATAAAGGANAANGATATTATTAAATAATTAGACTATTTGTATTCGAATTGGGTAAGTAGTACAAAGGAGAAGTGGGGNAA (SEQ ID NO: 58)
3-2-33.3 19744 bp at 3′ side: Myc-binding protein-associated protein
TTTGTAGAAGGATGTGAGAGGAGAAGTGAGCGGTTTTATAGGTATGATGTTAGTTATAAGGGGTTGGTGAGTTGATGTGGGAGGATTATTTGGTTTAGGAGTTTAAGGTTGCGGTGAGT (SEQ ID NO: 59)
L-17.33 dihydrouridine synthase 3-like
TGAGGGTTGGGTTTAGGATAGAGTATAGAGAGGGAGATTTAGTTAGGAGTTTTTTTAAGGTATATAGTTTTTGATTTTTAGGTAGTTAGAATAGGAACGTGGATATAGTTGGTATTTAATAGACGTATATTAGATGGATAGATTTGTTATTGA (SEQ ID NO: 60)

Low Salt 0.3 M (control) peak, no CpG
3-5-33.3
TAGTAGTATGATGTTAGTTTTTTTTAAATTATAGATTCAATAAAATTCAGTTAAAATTTTATTAGTTTTATTTATTTATTGATTTAGTAGAGATGGATATAGTACTGT (SEQ ID NO: 61)
3-6-33.3
GTGTTATCGTATTGGGGTTATTTGTGTAATTAATATGTGTTATTTAGTTTTAGGGTGTATGTTTATTGTTTTAATTATGATGGAGGTGTAGTTTGGAGATTTTGTGTTAGGAGATTAGTAGAGTTTGGGGTTTTAAGGGGATTTTTTGTGGGGGAGAGGGATAGTTGTGTAGTAGAGTGATAATGAAGGTTTTTGATTTAATGTGTAGTTTTTAGGTTATGTGT (SEQ ID NO: 62)
3-8-33.3 TTTGGGAGGTTGAGGTGGGTAGATTATGATGTTAAGAGATTGAGATTAT (SEQ ID NO: 63)
L1-33.3
GATGAAAGGTTAAAAATTGAGATAGAAGATGTGATTTGGAAGGTTATAAGAGAAGTTGGATAAAGTTAAATAAGGAAAGGAATTTAGAAAAAAGTGTTTAATGTTGTAGAAGG (SEQ ID NO: 64)
L1-19.3
CTATTCTTCCCATTCTCAACATAACTCTAACCTTCCTTCATCCTCACACCCAACAATCATTCACTCATTTATCTA (SEQ ID NO: 65)
L-1.33
GATAAAGTTGTGNGTAGGGATTTTTGGTAGAGGGAATAGAAAGATGGAGGTGTTGAGGTAGGAGTGATGGGTAGGTTTGAAGAGTAGAGTTTAGTGTAGTGAGGGGGTTATTAGTAAGGG (SEQ ID NO: 66)
L-11.33
ATATTTTATGGAGGAGTAATTTTTAGAGTATATGAATTGGTTTTATGGAGGAAGATTGTTATTTATAGGTTGGTGTAAGTGATGGTAGTAGTGGTTTGTC (SEQ ID NO: 67)
L-12.33 AGAAGATAAGGAGAAGATAATTATTNTTTTGGTAGAGGTAATTGATTTGATTATTAGGA (SEQ ID NO: 68)
L-15.33 ATGTGTATTTAAAGTAAGGTTATGAGATTTTGGATTGTTTTTTGTTTAGGATGATATGTG (SEQ ID NO: 69)
L-16.33 AAGTAAAATAATTTTGTTTTTATTTATTTTANAGGATTGTT (SEQ ID NO: 70)
L-18.33
AAAATTTTAAGATTAGGTAAAAATATTGTGTAAAGTGAGAGGGATGTGATGGTTAAAAAGTGATTTAAGATTTTTGTAATTTTTAGTTATAATTTAAGA (SEQ ID NO: 71)
L-2.33
GAGATAATAGTGAGTATGATATTTTTTGTTTTTTTTATTATGTGTTAAGTATTGTTTAGGGATTAAGTGGGGTTGTGTTTATTGTAGATGTTGTAGGTATGGAGTTAGTA (SEQ ID NO: 72)
L-20.33
ATGTATTTAGTTGTTTATTGAATATTATTTTAATATTGTATTATGAATATTGTTATGTTATGGATTTTAGGTTTTATTAGATTGGTATTAGTATCATTTAGGAATATTTTATGATGTGTGTTGATAAATTTTTAAGATAAATGAATTTGAGATATGTGTGAGTATTTTATAAAATAAATTTTGTTGGA (SEQ ID NO: 73)
L-23.33 ATGGTTTGTTTGTTTTTGTGGAAAATGGTATGAAGATTGGGTTTGTATTGAATTTG (SEQ ID NO: 74)
L-24.33
TGTAGTTTTAGTTATTTAGGAGGTTGAGATATGAGAATTATTTGAATTTGGGGGGGGAAGGTTGTAGTGA (SEQ ID NO: 75)
L-27.33
TGAGAAGGGGGTAGTGGGGATGGTTTTGTGGGTTTATGTTGTTTTTGATTTTAGAAAATAAAGTTTTTTGTAGGAAGTAGGTGGGAAGTAATTTGTTGATAAGTGTAAAGATTTGGGAATTATATTAAGGGGTAAATGGAGGANAGGTGTTGGTGTTAANGAGGTAGACNTATGGGAGTTNGGTTTTAGGAANGGNNGTGGNTAGAAAGG (SEQ ID NO: 76)
L-28.33 GGTAGGTAGATTATTTGAGGTTAGGAGTTTAAG (SEQ ID NO: 77)
L-4.33
ATATTTTTTTATTGAAGAATGTAGTTTTTTAAAATTAAAATGTATTTTTAAAATTTATTTATTATTTTTT--GAGATAAGGTTTTGTTTTGTTGTTTAAGTTAGAGTATAGTATGTGATTATAGTTTATTGTAGTTTTGAATTTTTGGGTTTAAG (SEQ ID NO: 78)

Table 1 above shows the results of sequence analysis of the two NaCl peaks from the SRA-domain column, which showed a better than 10-fold enrichment of methylated CpG DNA in the high-salt peak. Of 30 reads with an average length of 63 bases in the high (0.5 M) NaCl fraction, 19 contained at least one methylated CpG; of the 1900 bases sequenced, 44 (2.32%) were methylated CpG. Of 22 reads with an average length of 105 bases in the low-salt (0.3 M) peak, 3 contained methylated CpG; of these 2327 bisulfite-converted bases, 5 (0.215%) were identified as methylated CpG.
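The percentages and the fold enrichment quoted above follow directly from the base counts. A minimal check in Python (the helper name `methylated_cpg_fraction` is illustrative):

```python
def methylated_cpg_fraction(methylated_cpg: int, total_bases: int) -> float:
    """Fraction of sequenced bisulfite-converted bases scored as methylated CpG."""
    return methylated_cpg / total_bases


high = methylated_cpg_fraction(44, 1900)  # 0.5 M NaCl (enriched) peak
low = methylated_cpg_fraction(5, 2327)    # 0.3 M NaCl (control) peak
print(f"high: {high:.2%}, low: {low:.3%}, enrichment: {high / low:.1f}-fold")
# prints high: 2.32%, low: 0.215%, enrichment: 10.8-fold
```

The ratio of about 10.8 is the basis for the "better than 10-fold" statement; with only 44 and 5 methylated CpG counted, the estimate carries substantial sampling uncertainty.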

Claims

1.-16. (canceled)

17. A composition comprising:

a first polypeptide comprising a sequence having at least 90% amino acid sequence homology with SEQ ID NO:3; and

a mixture containing methylated and unmethylated polynucleotides, wherein the polynucleotides are single-stranded.

18. The composition of claim 17, further comprising a second polypeptide fused to the first polypeptide.

19. The composition of claim 17, wherein the first polypeptide is immobilized on a solid substrate.

20. The composition of claim 18, wherein the second polypeptide is a substrate-binding domain.

21. The composition of claim 20, wherein the second polypeptide is maltose-binding protein.

22. The composition of claim 17, wherein the first polypeptide is selected from the group consisting of: human UHRF1 and mouse NP95 SRA.

23. The composition of claim 17, further comprising a low concentration of salt.

24. The composition of claim 23, wherein the low concentration of salt is less than 0.4 M.

25. The composition of claim 17, further comprising salt at a concentration of 0.4 M-0.6 M.

26. The composition of claim 25, wherein the salt is NaCl.

27. The composition of claim 17, wherein the methylated polynucleotides contain hemi-methylated CpG.

28. A method, comprising:

(a) comparing the methylation pattern for selected polynucleotide sequences in both pre-identified immortalized eukaryotic cells and non-immortalized eukaryotic cells by differential binding of methylated polynucleotides to the first polypeptide of claim 17;

(b) determining the presence of abnormal methylation patterns associated with alteration of tumor suppressor function; and

(c) utilizing the abnormal methylation patterns as a diagnostic tool for determining whether any eukaryotic cells in a sample are immortalized.

29. The method according to claim 28, wherein the methylated polynucleotide contains hemi-methylated CpG.

30. The method according to claim 28, wherein step (a) further comprises forming single-stranded DNA for differential binding of the hemi-methylated CpG-containing polynucleotide.