WO2001046470A1

WO2001046470A1 - Enrichment of nucleic acid

Info

Publication number: WO2001046470A1
Application number: PCT/SE2000/002638
Authority: WO
Inventors: Claes Wahlestedt; Jingfeng Li; Veronika Zabarovsky; Eugene Zabarovsky
Original assignee: Karolinska Innovations Ab
Priority date: 1999-12-21
Filing date: 2000-12-20
Publication date: 2001-06-28
Also published as: AU2567601A

Abstract

The present invention relates to a novel method for enrichment of specific nucleic acid segments, such as a DNA, e.g. single nucleotide polymorphisms (SNPs), sequences that have been deleted, sequences that are identical between two complex genomes, etc. The present method includes steps for providing a first sample A and a second sample B derived from different sources and digestion of both said samples; amplification of sample A with a suitable primer and dNTPs comprising one unconventional base and amplification of sample B with a labelled primer and all the conventional dNTPs, followed by combination of samples A and B; denaturation and hybridization; treatment with a nuclease specific for said unconventional base, such as uracil-DNA glycosylase (UDG), and isolation of the specific segment originally present in sample B by use of the primer label. In a second aspect, the present invention relates to a kit which comprises components suitable for working the above described method.

Description

ENRICHMENT OF NUCLEIC ACID

Technical field

The present invention relates to a novel method for enrichment and cloning of a nucleic acid, such as DNA. The invention also relates to a kit adapted to the various advantageous applications of the present method.

Background

The genetic information of living organisms is carried in the nucleotide sequence of their genome. Minor changes in the nucleotide sequence, even a single base substitution, may result in a changed expressed protein product, which in turn may change the phenotype of the organism. Knowledge of the exact molecular defects that cause e.g. inherited diseases, as well as predisposition to genetic disorders and cancer, is increasing rapidly. During the spring of year 2000, a primary result of the HUGO (Human Genome Organization) project was reported, wherein a majority of the sequence of the human genome was presented. The information provided by HUGO will provide a starting-point for a large number of novel methodologies aimed at identifying genomic defects, such as single nucleotide polymorphisms (SNPs), deletions etc., and thus provide the basis for new diagnostic methods and therapies.

Single nucleotide polymorphisms (SNPs) are the most frequent form of DNA polymorphism found in the human genome (Gu, W., Aguirre, G. D. and Ray, K. (1998) Biotechniques, 24, 836-837; and Brookes, A. J. (1999) Gene, 234, 177-186). Over the past years, microsatellite markers have almost completely replaced the use of restriction fragment length polymorphisms (RFLPs) (Landegren, U., Nilsson, M. and Kwok, P. Y. (1998) Genome Res., 8, 769-776), but with the identification of SNPs, the situation is now significantly different. SNPs, of which RFLPs represent a subclass, have a number of advantages over microsatellite markers (Landegren, U., Nilsson, M. and Kwok, P. Y. (1998) Genome Res., 8, 769-776; and Weiss, K. M. (1998) Genome Res., 8, 691-697). For example, they are more abundant, more evenly spaced and more stably inherited. There is increasing agreement that SNPs will be a key element in finding disease genes involved in complex traits. They will also be extremely useful for understanding human evolution, population genetics, pharmacogenetics, etc. Many methods have been described for the detection and analysis of SNPs (Gu, W., Aguirre, G. D. and Ray, K. (1998) Biotechniques, 24, 836-837; Brookes, A. J. (1999) Gene, 234, 177-186; Landegren, U., Nilsson, M. and Kwok, P. Y. (1998) Genome Res., 8, 769-776; Lyamichev, V., Mast, A. L., Hall, J. G., Prudent, J. R., Kaiser, M. W., Takova, T., Kwiatkowski, R. W., Sander, T. J., de Arruda, M., Arco, D. A., Neri, B. P. and Brow, M. A. (1999) Nat. Biotechnol., 17, 292-296; and Gilles, P. N., Wu, D. J., Foster, C. B., Dillon, P. J. and Chanock, S. J. (1999) Nat. Biotechnol., 17, 365-370), but an optimal method for the isolation of previously unknown SNPs has yet to be developed (Kwok, P. Y. and Chen, X. (1998) Genet. Eng. (N Y), 20, 125-134). The most useful and largely exploited method for SNP isolation uses the sequence data generated by the human genome and expressed sequence tag (EST) projects (Gu, W., Aguirre, G. D. and Ray, K. (1998) Biotechniques, 24, 836-837; Brookes, A. J. (1999) Gene, 234, 177-186; Kwok, P. Y. and Chen, X. (1998) Genet. Eng. (N Y), 20, 125-134; and Taillon-Miller, P., Gu, Z., Li, Q., Hillier, L. and Kwok, P. Y. (1998) Genome Res., 8, 748-754). However, this approach has some obvious limitations. For example, it cannot be applied to many non-human species. Thus, development of a simple, reliable and efficient method for isolation of unknown SNPs is highly desirable (Kwok, P. Y. and Chen, X. (1998) Genet. Eng. (N Y), 20, 125-134).

Further, deleted sequences are a common cause of genomic disorders. Although subtractive methods represent potentially powerful tools for the identification of deleted sequences, including tumor suppressor genes, they have not been applied to tumor suppressor gene isolation extensively, probably because of the great complexity of the human genome. This approach has given rewarding results, however, in less complex species such as yeast and E. coli (Parikh, V. S., Morgan, M. M., Scott, R., Clements, L. S., Butow, R. A., 1987. The mitochondrial genotype can influence nuclear gene expression in yeast. Science 235: 576-580; Espinosa-Urgel, M., Kolter, R., 1998. Escherichia coli genes expressed preferentially in an aquatic environment. Mol. Microbiol. 28, 325-332). Different and successful approaches for subtraction at a cDNA level have been suggested and used (Kaiser, C., Von Stein, O., Laux, G., Hoffmann M., 1999. Functional genomics in cancer research: identification of target genes of the Epstein-Barr virus nuclear antigen 2 by subtractive cDNA cloning and high-throughput differential screening using high-density agarose gels. Electrophoresis 20, 261-268; Lisitsyn, N., Lisitsyn, N., Wigler, M., 1993. Cloning the differences between two complex genomes. Science 259, 946-951). However, among all the hitherto known genomic subtraction methods, only a modified variant called representational difference analysis (RDA) has produced reproducibly successful results (Lisitsyn, N., Lisitsyn, N., Wigler, M., 1993. Cloning the differences between two complex genomes. Science 259, 946-951). The main idea behind said RDA approach was to use genomic subtraction for only a subset of genomic sequences (e.g. all BamHI fragments less than 1 kb). Since the complexity of the genome was greatly reduced, the results of this modification look promising. 
Another important point in RDA is that this method uses not only subtractive, but also polymerase chain reaction (PCR) kinetic enrichment to purify restriction endonuclease fragments present in one DNA population, but not in another. However, RDA still has some limitations. The technique is complicated and sensitive to minor impurities. The differential product is usually between 250 and 350 bp, and this is not convenient for many applications (Lisitsyn, N.A., Lisitsina, N.M., Dalbagni, G., Barker, P., Sanchez, C.A., Gnarra, J., Linehan, W.M., Reid, B.J., Wigler, M.H., 1995. Comparative genomic analysis of tumors: detection of DNA losses and amplification. Proc. Natl. Acad. Sci. U S A 92: 151-155). Another drawback of the method is the relatively low productivity of RDA: only a few probes can be generated per experiment (Lisitsyn et al., 1993, 1994, 1995, supra). The authors suggested that this limitation could be obviated by diminishing the number of rounds of hybridization/amplification or increasing the complexity of representation (Lisitsyn et al., 1993, supra). Increasing the complexity, however, brings new challenges, as RDA fails when the complexity of the amplicons is not sufficiently reduced (Lisitsyn et al., 1993, supra).

In addition, the identification of regions that are identical by descent (IBD) can be very important for isolation of genes responsible for genetic diseases, including cancer. IBD refers to the segments of the human genome shared by two individuals because they are inherited from a common ancestor. Regions that are IBD between individuals affected with disease conceivably can contain the disease gene(s). The identification of such regions using microsatellite markers can be difficult and expensive. The number of markers that must be tested to discover IBD regions depends on the number of generations that separate two individuals from their common ancestor (from hundreds of markers in close relatives to many thousands for distantly related individuals). Recently, a new method, called genomic mismatch scanning (GMS), was suggested to solve this problem (Nelson SF, McCusker JH, Sander MA, Kee Y, Modrich P, Brown PO (1993) Genomic mismatch scanning: a new approach to genetic linkage mapping. Nat Genet 1: 11-18). The advantage of this method is that large genomes can be compared and the IBD regions identified. Originally, this method worked well using yeast DNA (Nelson et al., 1993, supra), but application of this method to mammalian genomes initially confronted several challenges. However, recent studies (Mirzayans F, Mears AJ, Guo SW, Pearce WG, Walter MA (1997) Identification of the human chromosomal region containing the iridogoniodysgenesis anomaly locus by genomic-mismatch scanning. Am J Hum Genet 61: 111-119; Cheung VG, Nelson SF (1998) Genomic mismatch scanning identifies human genomic DNA shared identical by descent. Genomics 47: 1-6; Cheung VG, Gregg JP, Gogolin-Ewens KJ et al. (1998) Linkage-disequilibrium mapping without genotyping. Nat Genet 18:225-230; and McAllister L, Penland L, Brown PO (1998) Enrichment for loci identical-by-descent between pairs of mouse or human genomes by genomic mismatch scanning. 
Genomics 47:7-11) have shown that GMS works in complex genomes, such as human or mouse. It was suggested that this technique can be applied to localize hereditary disease genes. The fundamental technique of GMS (Nelson et al., 1993, supra; and Cheung et al., 1998, supra) uses two sets of DNA sequences, one from each of two individuals having a common ancestor. Each DNA preparation is digested with PstI to yield fragments with protruding 3' ends. The 3' protruding ends are protected from digestion by exonuclease III (ExoIII) in later steps. One of the DNA preparations is fully methylated at all GATC sites with E. coli Dam methylase (DAM+). The other DNA preparation remains unmethylated. The two DNA pools are then mixed in equal ratios, denatured, and allowed to reanneal. Digestion of the reannealed DNA with both DpnI and MboI, which cut at fully methylated and unmethylated GATC sites respectively, results in cleavage of the homohybrids to yield smaller duplexes with either blunt (DpnI) or 5' protruding ends (MboI). The heterohybrids are resistant to both DpnI and MboI digestion and survive this treatment. Discrimination between perfect, mismatch-free heterohybrids and those with base mismatches is done by three E. coli mismatch repair proteins mutH, mutL and mutS (MutHLS). Single-base pair differences and small insertion/deletions (up to 4 bp) are efficiently detected by the mismatch-repair proteins (Ellis LA, Taylor GR, Banks R and Baumberg S (1994) MutS binding protects heteroduplex DNA from exonuclease digestion in vitro: a simple method for detecting mutations. Nucleic Acids Research 22:2710-2711). They introduce a single strand nick in the unmethylated strand at the GATC sites, specifically in the mismatch-containing duplexes. Only perfect duplexes will escape nicking during this step. All DNA molecules, except mismatch-free ones, are degraded further with ExoIII, a 3' to 5' exonuclease specific for double-stranded DNA (dsDNA). 
After treatment with ExoIII, all of the DNA molecules with ssDNA regions are removed by adsorption to benzoylated naphthoylated DEAE cellulose (BNDC). This column allows efficient removal of mainly dsDNA molecules with approximately 100 bp ssDNA regions. Thus the full-length, unaltered heterohybrids are purified from the other DNA fragments. GMS DNA was subsequently labeled using Alu-repeat specific primers and hybridized to microarrays. However, the GMS method has serious disadvantages, the most serious being that the MutHLS enzymes are not commercially available. Accordingly, there is a need within this field for an alternative to GMS.

WO 98/42871 in the name of Boehringer Mannheim Corporation discloses one procedure for subtractive hybridization, wherein novel adapter oligonucleotides are used to obviate the necessity for repeated replacement of adapters by restriction enzyme digestion and ligation. Lambda exonuclease and mung bean nuclease are used to destroy the ends of heterohybrids and driver molecules and mainly to digest ssDNA and protruding ends. However, said procedure is cumbersome and time consuming due to the use of the many different adaptors and primers, and therefore there is a need in this field for improved methods avoiding these drawbacks.

Summary of the invention

The present invention relates to a generalized method of simple and robust enrichment and cloning of a nucleic acid, such as DNA, as defined by the claims. The method has been shown to overcome the problems with the prior art technologies described above. Thus, the novel method according to the invention is firstly useful for enrichment, cloning and testing of SNPs that represent RFLPs. In this simple, efficient and robust procedure, a combination of digestion with restriction enzymes, treatment with uracil-DNA glycosylase (UDG) and mung bean nuclease, PCR amplification and purification with streptavidin magnetic beads is used to isolate polymorphic sequences from the genomes of two samples.

Secondly, the novel method according to the invention is also useful to clone homozygously deleted sequences by rapid isolation of deleted genomic sequences. The present method is not prone to some of the limitations of RDA, as discussed in further detail in the experimental section under the headline "Results and discussion". The method is based on the same general concept as above and has been shown to be a simple and reproducible procedure, and to improve subtractive enrichment, thereby avoiding excessive PCR kinetic enrichment steps that often generate small DNA products.

Thirdly, the present invention also relates to the same general method specifically adapted for cloning of identical sequences between two DNA populations.

Brief description of the drawings

Figure 1 shows a flow chart diagram of the COP procedure described in detail in Example I below for the analysis of DNA A and DNA B from two individuals.

Figure 2 illustrates hybridization according to Example I of the isolated recombinant clones to the Southern blots with DNA isolated from individual A (lanes 1, 3, 5, 7, 9, 11, 13, 15, 17, 19) and B (lanes 2, 4, 6, 8, 10, 12, 14, 16, 18, 20).

Figure 3 shows the detection according to Example I of polymorphic sequences in DNA A and B using PCR.

Figure 4 shows general schemes (A and B) explaining PCR detection of polymorphic sequences in accordance with Example I.

Figure 5 shows a flow chart diagram of the CODE procedure described in detail in Example II below.

Figure 6 illustrates hybridization according to Example II of the isolated recombinant clones to the Southern blots with normal human DNA (lanes 1, 3, 5, 7, 9, 12, 14, 16, 18, 20) and tumor DNA (lanes 2, 4, 6, 8, 10, 11, 13, 15, 17, 19).

Figure 7 shows a general scheme of the experiment according to Example III below, wherein cloning of identical sequences between two complex genomes is shown (CIS).

Figure 8 is a flow chart diagram of the CIS procedure of Example III below.

Figure 9 illustrates a FISH analysis of the MCH429.11 DNA (A) and resulting CIS DNA products (C) detected with FITC-conjugated avidin.

Figure 10 illustrates a FISH analysis of the MCH939.2 DNA (A) and resulting CIS DNA products (C) detected with FITC or Cy3-conjugated avidin respectively.

Detailed description of the invention

In a first aspect, the present invention relates to a method of enriching a specific, desired nucleic acid segment, such as a specific DNA segment, that is present in at least one of two samples derived from different sources. The sources may be samples from different individuals or populations or originating from different sources within the same human or non-human individual.

More specifically, the present method comprises the steps of

(a) providing a first sample A and a second sample B and digestion of the nucleic acid therein with one or more suitable restriction enzyme(s) to provide a plurality of short segments;

(b) amplification of sample A with a suitable primer and a dNTP mixture comprising dUTP instead of dTTP (accordingly, dATP, dCTP, dGTP and dUTP), or wherein another base has been exchanged to a non-conventional base which was not present in the original sample;

(c) amplification of sample B with a suitable labeled primer and the four conventional dNTPs (accordingly, dATP, dCTP, dGTP and dTTP);

(d) combination of samples A and B in a suitable ratio;

(e) denaturation and hybridization in appropriate environments;

(f) treatment with uracil-DNA glycosylase, or another enzyme which is capable of degradation of sample A nucleic acid by use of the non-conventional base included in step (b), and a nuclease, such as mung bean nuclease; and, optionally, repetition of steps (a)-(e) one or more times;

(g) isolation of the specific segment by use of the labeled primer.
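The logic of steps (e) to (g) can be sketched as a small decision model. This is only an illustrative simplification, not part of the claimed method: the class Strand and the function fate are hypothetical names, and the model assumes that UDG treatment removes every dU-containing strand completely and that mung bean nuclease removes everything that is not a perfect duplex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    sample: str    # "A" or "B"
    has_dU: bool   # True if amplified with dUTP in place of dTTP, step (b)
    biotin: bool   # True if extended from the labelled primer, step (c)


# Hypothetical representatives of the two amplified samples.
A = Strand("A", has_dU=True, biotin=False)
B = Strand("B", has_dU=False, biotin=True)


def fate(s1, s2, perfect_match=True):
    """Idealised fate of a reannealed duplex s1:s2 after steps (f) and (g)."""
    # Assumption: UDG treatment leads to loss of every dU-containing strand.
    survivors = [s for s in (s1, s2) if not s.has_dU]
    if len(survivors) < 2 or not perfect_match:
        # What remains is single-stranded or imperfectly paired, i.e. the
        # substrate assigned to the single-strand specific nuclease.
        return "degraded"
    # Step (g): only duplexes carrying the primer label are isolated.
    return "captured" if any(s.biotin for s in survivors) else "not captured"


outcomes = {
    "A:A": fate(A, A),
    "A:B": fate(A, B),
    "B:B": fate(B, B),
    "B:B mismatched": fate(B, B, perfect_match=False),
}
```

Under these assumptions only perfectly matched B:B duplexes are recovered, which is consistent with the statement that the isolated segment is one originally present in sample B.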

The nucleic acid segment enriched by the present method may subsequently advantageously be cloned in a suitable vector. Advantageously, the present method also includes the steps of enzyme inactivation, e.g. by heat inactivation or any other suitable reagent depending on the enzyme used, and ligation of linkers to the short segments obtained from step (a). In one specific embodiment, specific restriction sites are included in said linker, which sites are then useful in an additional step for digestion after the combination of the samples.

The detection of the enriched specific segment and cloning thereof in a suitable vector are performed by standard methods, see e.g. Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152, Academic Press Inc, San Diego, CA. Detectable labels suitable for use in the present method include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Even though the label exemplified herein is biotin, any label is useful, such as fluorescent dyes, radiolabels, enzymes, colorimetric labels etc. Means for detecting such labels are well known to the skilled in this field.

Thus, the method according to the present invention is a highly efficient and robust method for the enrichment of a specific DNA segment, such as an SNP, a nucleic acid segment deleted in one of the samples or a nucleic acid segment that is identical in two samples. Accordingly, the present method will often be combined with a subsequent detection of such desired segment, such as a sequencing of an SNP. Compared to the above discussed WO 98/42871 (Boehringer Mannheim Corporation), the method according to the invention is much facilitated while serving an equivalent purpose, since the same adaptors are used throughout according to the invention, contrary to the many adaptors that in fact constitute the essence of the WO 98/42871 procedure. In addition, even though a labelled and a non-labelled primer are used in the present method, these may advantageously be the same one, contrary to the different primers used in the different steps of the WO 98/42871 procedure. Furthermore, an essential difference to said WO 98/42871 is that in the present method, mung bean nuclease is used as an endonuclease, i.e. to digest mainly the bodies of the DNA and not the ends, while in WO 98/42871, it is used to destroy protruding ends. Finally, there has in fact been doubt expressed on a scientific level as to whether or not the WO 98/42871 procedure can really work in practice.

The nucleic acid, preferably DNA, is extracted e.g. from blood samples by conventional techniques. Digestion by restriction enzymes, such as BamHI, BglII, BclI, Sau3A etc., is exemplified for different embodiments in the examples below, and is preferably followed by enzyme inactivation. Sau3A is preferably used for generation of relatively smaller fragments, such as of approximate sizes of about 250-300 bp, while BamHI, BglII and BclI are useful for the generation of comparatively larger fragments of approximate sizes of about 1500 bp. Preferably, the so digested nucleic acid is then ligated with suitable linkers, such as Blsubtr1/2 etc. The amplification step is conveniently performed by PCR technique (see e.g. Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor, N.Y., 1989; Berger & Kimmel, Methods in Enzymology, Vol. 152: Guide to Molecular Cloning Techniques, Academic Press, Inc., San Diego, California, 1987; and Co et al. (1992) J. Immunol., 148: 1149), even though other methods which are well known to the skilled in this field may be equally suitable.
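The fragment sizes quoted above can be compared with a back-of-the-envelope estimate. The sketch below assumes independent, equiprobable bases (real genomes deviate from this), a 4-bp recognition site for Sau3A and 6-bp sites for BamHI, BglII and BclI, and, as a further assumption not stated in the text, that the three 6-bp cutters are used together. The function name mean_fragment_length is hypothetical.

```python
def mean_fragment_length(site_length, n_enzymes=1):
    """Expected distance between cut sites, in bp.

    Assumes independent, equiprobable bases and n_enzymes enzymes with
    distinct recognition sites of the same length.
    """
    return 4 ** site_length / n_enzymes


# One 4-bp cutter (Sau3A).
four_cutter = mean_fragment_length(4)

# Three 6-bp cutters applied together (assumed here for BamHI, BglII, BclI).
three_six_cutters = mean_fragment_length(6, n_enzymes=3)
```

The estimates, 256 bp and roughly 1365 bp, are of the same order as the approximate sizes of 250-300 bp and 1500 bp given in the description.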

As regards the primer used for amplification of sample A, it may also include a functional group that enables immobilisation, e.g. biotin or a thiol group, or hydroxyl, carboxyl, aldehyde or amino groups. Preferably, said immobilisation is performed at a later stage, but one embodiment may also be contemplated wherein the primers are immobilised prior to the amplification. Illustrative examples of supports for the immobilisation are microtiter wells, dipsticks, particles, fibres etc. of e.g. agarose, cellulose, alginate etc. Preferably, said means for immobilisation is biotin while the desired segments are isolated in step (g) by use of streptavidin beads.

In step (f), the function of the nuclease, e.g. uracil-DNA glycosylase (UDG), will be to degrade sample A nucleic acid amplified according to step (b), and the skilled in this field will realise that other combinations of base and enzyme capable of degradation thereof may be contemplated within the scope of the present invention, depending e.g. on whether it is DNA or RNA that is enriched. In the present application, it is to be understood that the term "uracil-DNA glycosylase" refers to the same enzyme that is sometimes denoted "uracil-N glycosylase" due to its function to hydrolyze the N-glycosidic bond between uracil and the deoxyribose sugar. Thus, if dUTP is used to replace dTTP in the dNTP mixture used for the amplification, then any enzyme which is specific for removal of uracil in DNA may be utilized. More generally, any "unconventional nucleotide", i.e. a nucleotide which is not naturally occurring in a particular nucleic acid, may be used together with a corresponding degrading enzyme. Such unconventional nucleotides may be naturally occurring nucleotides, such as hypoxanthine, or they may be chemically modified derivatives or analogues of conventional nucleotides, such as N-7-methylguanine, deoxyuridine and deoxy-3'-methyladenosine. The present UDG may be of any origin, as the tertiary structure of UDG has been shown to be highly conserved (see e.g. U. Varshney et al., 1988, J. Biol. Chem. 263:7776-7784 for a description of the E. coli UDG gene, as well as US patent nos. US 5 536 649 and US 5 888 795 for discussions regarding UDG).

In the preferred embodiment of the present method, the nuclease used in step (f) is mung bean nuclease, even though other nucleases capable of digestion of single stranded nucleic acid, such as DNA, and non-perfect hybrids, may equally well be used. (For a review of nucleases, see e.g. "Nucleases", Ed. Linn & Roberts, Cold Spring Harbor, N.Y., 1982, in particular the article therein entitled "Single Strand Specific Nucleases" by Shishido and Ando.) Mung bean nuclease is commercially available, e.g. from Pharmacia P-L Biochemicals, Piscataway, N.J.

In a broad embodiment of the present method, in step (d), samples A and B are mixed in a ratio of about 50:1, preferably about 75:1 and most advantageously about 100:1, to enable a subsequent subtractive hybridization. This embodiment is advantageous e.g. for cloning of SNPs or deleted segments.
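The effect of the mixing ratio can be illustrated with a deliberately idealised pairing model. It is an assumption of this sketch, not a statement from the description, that hybridization goes to completion and that each B strand picks its complementary partner at random from the pool, so that for a sequence present in both samples the chance of forming a B:B homohybrid is 1/(r + 1) at a ratio A:B of r:1, while a sequence present only in B always pairs with B. The function names are hypothetical.

```python
def shared_b_homohybrid_fraction(ratio_a_to_b):
    """Fraction of B strands of a shared sequence that reanneal with B.

    Idealised model: partners are drawn at random from r parts A and
    1 part B of the complementary strand.
    """
    return 1.0 / (ratio_a_to_b + 1.0)


def ideal_enrichment_per_round(ratio_a_to_b):
    """Ratio of B:B formation for B-only sequences versus shared sequences."""
    b_only_fraction = 1.0  # no competing A strand exists for these sequences
    return b_only_fraction / shared_b_homohybrid_fraction(ratio_a_to_b)


# The three ratios named in the description.
enrichment = {r: ideal_enrichment_per_round(r) for r in (50, 75, 100)}
```

In this model a single round at 100:1 favours B-specific sequences by about two orders of magnitude; real enrichment will be lower because reannealing is incomplete and enzymatic steps are not fully efficient.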

In the detection of the nucleic acid or DNA segment, illustrating primers useful for detection of SNPs are shown in Table 1 in the experimental section below. However, the skilled in this field will be able to design suitable primers depending on the purpose of the method, e.g. by using the Primer 3 program (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). Further, suitable PCR conditions can be selected e.g. by using a program developed by Breslauer et al. (10, http://alces.med.umn.edu/rawtm.html).

An essential feature of the present embodiment is that polymorphic fragments are enriched because of their relatively small size. Thus, one great advantage with the method of the present invention is that it enables a significant reduction of the complexity of the sample DNA and a dramatic increase of the proportion of the desired, e.g. polymorphic (mainly shorter), fragments.

Thus, one advantageous embodiment of the present method is the detection of polymorphisms, which method is sometimes herein denoted COP (Cloning Of Polymorphisms), as described in more detail in Example I below. The COP procedure is e.g. applicable to the isolation of SNPs from particular regions of the genome, e.g. CpG islands, chromosomal bands, YAC or PAC contigs. A combination of digestion with restriction enzymes, treatment with uracil-DNA glycosylase and mung bean nuclease, PCR amplification and purification with streptavidin magnetic beads is described for isolation of polymorphic sequences from the genomes of two human samples. After only two cycles of enrichment, as much as 80% of the isolated clones were found to contain RFLPs, which polymorphisms were subsequently detected by a simple PCR method.
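How an SNP can present itself as an RFLP is easy to show in silico. The sketch below uses two invented toy alleles (they are not sequences from the application) that differ by a single base inside a GATC site, and a hypothetical helper digest that cuts immediately before each GATC, as Sau3A does.

```python
def digest(seq, site="GATC"):
    """Cut seq immediately 5' of every occurrence of site; return fragments."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:]) if j > i]


# Invented toy alleles: allele_2 carries a T>C change in the middle site.
allele_1 = "AC" + "GATC" + "TTTTT" + "GATC" + "CCCCC" + "GATC" + "AC"
allele_2 = "AC" + "GATC" + "TTTTT" + "GACC" + "CCCCC" + "GATC" + "AC"

lengths_1 = [len(f) for f in digest(allele_1)]
lengths_2 = [len(f) for f in digest(allele_2)]
```

The allele that retains the site yields two short internal fragments where the other allele yields one longer fragment. Shorter fragments of this kind are the ones the description says are favoured during PCR enrichment.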

Further uses of the present embodiment of the method are e.g. use thereof to clone markers from regions that have lost heterozygosity to isolate tumor suppressor genes, the detection of rearranged immunoglobulin loci and 'chromosome landing' to facilitate positional cloning in organisms for which high resolution maps have not been developed.

The present invention also relates to a novel PCR method for detection of the RFLPs enriched in accordance with the invention. This method is based on differential PCR amplification of RFLP DNA segments of different lengths and exploits agarose gel electrophoresis to detect polymorphisms and is described in detail in the experimental section below.

A second embodiment of the present method is to clone deleted sequences. The same basic strategy is used as in the COP procedure, as outlined in Fig 5. In this case, one sample will contain a normal sequence while the other has a specific sequence deleted, which is identified by the present method. A subtractive hybridization is performed as in the COP procedure, with the same ranges of advantageous ratios between sample A and B. An example of a CODE (Cloning Of DEleted sequences) procedure according to the invention will be disclosed in Example II below. This embodiment provides a simple, effective and robust procedure that can successfully isolate deleted genomic sequences. In contrast to the representational difference analysis (RDA) of the prior art, the advantages of the CODE procedure are for example that many different probes can be generated in one experiment. The method according to the invention is principally different from said prior art RDA in that it is easier to perform and requires only a limited amount of PCR enrichment. The CODE method does not exploit the enrichment stemming from the difference between exponential and linear amplification. This ensures that the biased influences generated by PCR are kept to a minimum, and the subtractive enrichment becomes the most critical step. Another important difference between the CODE procedure according to the invention and other genomic subtraction methods (e.g. RDA and the 'RFLP subtraction' method) is that CODE allows cloning of differently sized polymorphic fragments hemizygously deleted in the tumor DNA.

As described above, the general scheme of the CODE procedure corresponds to that of the COP. However, some differences exist, such as the mechanisms for the differential cloning. In the CODE procedure, the main force for the differential cloning is subtractive hybridization, while in the COP procedure, the PCR enrichment of polymorphic fragments due to their smaller size is an important aspect. In practice, in general the COP procedure is useful with fewer cycles of enrichment than the CODE procedure.

A third embodiment of the present method is to clone identical sequences between two complex genomes as exemplified in Example III below and outlined schematically in Figure 8. In summary, samples A and B are, after digestion, ligated to linkers comprising e.g. 2 or more recognition sites for MvnI, sample A is amplified in the presence of d^m5CTP to methylate essentially all cytosines and the combined sample mixture is digested with MvnI prior to step (f).

More specifically, the two samples of DNA A and B are herein digested with a suitable restriction enzyme, such as BamHI, and ligated to special linkers containing 2 recognition sites for MvnI. Thus all molecules contain at least 4 sites for MvnI. DNA A is then PCR amplified in the presence of dUTP and d^m5CTP, so that all, or essentially all, cytosines are methylated. DNA B is amplified, e.g. by PCR, in the presence of normal dCTP and biotinylated primers. Contrary to the first embodiment for SNPs (COP) and deletions (CODE), the present two DNA preparations are herein mixed in equal ratios, denatured and hybridized. Subsequently, the DNA is digested with MvnI. This enzyme can digest only dsDNA molecules without methylcytosine and will digest all homohybrids B, as they contain at least 4 sites for MvnI.

In a second aspect, the present invention relates to a kit for performing a method according to the invention. More specifically, such a kit comprises, inter alia, the following components:

(i) one or more restriction enzymes, such as BamHI, BglII, BclI, Sau3A etc.;

(ii) one or more linker oligonucleotides;

(iii) amplification buffer solution containing dATP, dCTP, dGTP and dUTP, primer for sample A and enzyme, such as Taq-polymerase;

(iv) amplification buffer solution containing dATP, dCTP, dGTP and dTTP, labeled primer for sample B and enzyme;

(v) hybridization buffer;

(vi) uracil-DNA glycosylase (UDG);

(vii) nuclease, such as mung bean nuclease;

(viii) means for identification of labeled primer from sample B;

said components being presented in a suitable package having separate compartments for separate reaction and method steps and optionally together with instructions for the use thereof, wherein each component is present in an appropriate buffer or solvent if required. Regarding further solvents and/or buffers, the skilled in this field can easily make suitable selections by reference to the detailed examples of various embodiments of the method according to the invention as presented in Examples I-III below.

In a preferred embodiment, the present kit comprises a labeled primer for sample B amplification which is biotinylated PBsub and said means for identification is then streptavidin. The primer for sample A can e.g. be non-biotinylated PBsub (Antiuniv).

In the most advantageous embodiment, when PCR is used for the initial amplification, the amplification buffers (iii) and/or (iv) of the present kit comprise Taq polymerase. Further reagents suitable for such a PCR are given below in the examples and may if required be included in a kit according to the invention. An oligonucleotide linker which is especially advantageously used in the method according to the invention is the linker denoted Blsubtr1/2, which comprises the following oligonucleotides:

Blsubtr1 5'-GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG-3'

Blsubtr2 5'-CCCAGCACTGACCCTTTTGGGACCGCGGCCGCG-3'

The invention also encompasses a primer or linker selected from the group presented in Table 1, i.e. Antiuniv, PBsub, SNP1F, SNP1R, SNP2F, SNP2R, SNP3F, and SNP3R, suitable for use in the present method and kit.

Detailed description of the drawings

Figure 1 shows a flow chart diagram of the COP procedure described in detail in Example I below for the analysis of DNA A and DNA B from two individuals. R - recognition site of a restriction endonuclease, b - biotin.

Figure 2 illustrates hybridization according to Example I of the isolated recombinant clones to Southern blots with DNA isolated from individual A (lanes 1, 3, 5, 7, 9, 11, 13, 15, 17, 19) and individual B (lanes 2, 4, 6, 8, 10, 12, 14, 16, 18, 20). DNA was digested with the restriction enzyme indicated in the Figure.

Figure 3 shows the detection according to Example I of polymorphic sequences in DNA A and B using PCR. PCR conditions are described in Materials and Methods (Example I). Electrophoresis was performed in 1% agarose gel.

Figure 4 shows general schemes (A and B) explaining PCR detection of polymorphic sequences in accordance with Example I. R - recognition site of a restriction endonuclease. Dashed lines denote sequences produced with reverse primer (see Materials and Methods, Example I below), small arrows indicate localization of primers used for the PCR amplification.

Figure 5 shows a flow chart diagram of the CODE procedure described in detail in Example II below. B - BamHI, BglII or BclI, b - biotin, u - dUTP.

Figure 6 illustrates hybridization according to Example II of the isolated recombinant clones to Southern blots of normal human DNA (lanes 1, 3, 5, 7, 9, 12, 14, 16, 18, 20) and tumor DNA (lanes 2, 4, 6, 8, 10, 11, 13, 15, 17, 19). DNA was digested with the restriction enzyme indicated in the Figure. Clones 1 to 5 contain DNA fragments deleted in ACC-LC5. Clones 6 to 8 represent polymorphisms. Clone 9 is present in both the tester and the driver DNA, and clone 10 contains repeated sequences.

Figure 7 shows a general scheme of the experiment according to Example III below, wherein cloning of identical sequences between two complex genomes (CIS) is shown.

Figure 8 is a flow chart diagram of the CIS procedure of Example III below, wherein B - BamHI, b - biotin.

Figure 9 illustrates a FISH analysis of the MCH429.11 DNA (A) and resulting CIS DNA products (C) detected with FITC-conjugated avidin. Corresponding DAPI banded chromosomes are shown below (B,D). In (A) entire microcell hybrid DNA was labelled (Kholodnyuk ID, Kost-Alimova M, Kashuba VI et al (1997) The region of 3p22-p21.3 is non-randomly eliminated from mouse human microcell hybrids during tumor growth in SCID mice. Genes Chromosomes Cancer 18:200-211).

Figure 10 illustrates a FISH analysis of the MCH939.2 DNA (A) and resulting CIS DNA products (C) detected with FITC or Cy3-conjugated avidin respectively. DAPI banded chromosomes are shown below (B,D). In (A) entire microcell hybrid DNA was labelled (Kholodnyuk ID, Kost-Alimova M, Kashuba VI et al (1997) The region of 3p22-p21.3 is non-randomly eliminated from mouse human microcell hybrids during tumor growth in SCID mice. Genes Chromosomes Cancer 18:200-211).

EXPERIMENTAL

The present examples are included to illustrate the present invention and shall not be construed as limiting the invention as defined by the appended claims. All references given below or elsewhere in the present application are hereby included herein by reference. Throughout the present specification and claims, the term "comprising" is to be interpreted as "comprising but not limited to".

Example I: Cloning of polymorphisms (COP)

Materials and Methods

General methods

The isolation of genomic DNA, restriction enzyme digestion and DNA ligation, as well as cloning, Southern blotting and hybridization, were performed as previously described (Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor, NY). Sequencing was undertaken using the ABI 310 (Perkin Elmer), according to the manufacturer's instructions. Oligonucleotides were from Life Technologies (Gibco BRL).

The COP procedure

DNA was extracted from blood samples taken from two individuals (A and B). Both DNA samples (2 μg) were digested with BamHI, BglII and BclI (20 U of each) at 37°C for 5 h, followed by enzyme inactivation at 65°C for 20 min. Ligation of 0.5 μg of the digested DNA with a 20-fold molar excess of the linker, Blsubtr1/2, was performed at room temperature (see Table 1).

Table 1: List of primers and linker used

Name          Size (nt)   Sequence

Blsubtr1/2    36          5'-GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG-3'
              33          3'-GCGCCGGCGCCAGGGTTTTCCCAGTCACGACCC-5'
Antiuniv      21          5'-CAGCACTGACCCTTTTGGGAC-3'
PBsub         21          5'-bio-CAGCACTGACCCTTTTGGGAC-3'
SNP1F         20          5'-TCCATGAAGTCAGGGGTTTG-3'
SNP1R         20          5'-TGAGGAACCTGGAGACCAAA-3'
SNP2F         20          5'-CCCATAAGGGTCAGTGCTGA-3'
SNP2R         20          5'-CCAAGTTGGTCATCCTCCCT-3'
SNP3F         20          5'-TTCTCACCTTGTCGAAAGCA-3'
SNP3R         20          5'-CGGTGTCGTGTCTGAAACAT-3'

Following ligation, polymerase chain reaction (PCR) of DNA B was performed in 100 μl of solution containing 67 mM Tris-HCl (pH 9.1), 16.6 mM (NH4)2SO4, 1.0 mM MgCl2, 0.1% Tween 20, 200 μM each of the four dNTPs, 100 ng DNA, 400 nM of biotinylated PBsub primer (Table 1), and 5 U of Taq polymerase. PCR of DNA A was performed in 20 tubes using the same PCR conditions as described for DNA B, with the following exceptions: the concentration of MgCl2 was increased to 2.5 mM, dUTP (300 μM) was used instead of dTTP, and the Antiuniv primer (the same as PBsub, but without biotin) was used in place of PBsub.

The PCR cycling conditions were 72°C for 5 min, followed by 30 cycles of 95°C for 40 sec, 60°C for 45 sec and 72°C for 1.5 min, with a final extension at 72°C for 5 min. PCR amplified DNA A (1,000 μl) was mixed with 10 μl of PCR amplified DNA B. This mixture was purified with the JETquick PCR Purification Spin Kit (GENOMED Inc.), concentrated with ethanol, and dissolved in 10 μl H2O.

After denaturation at 100°C for 8 min, the first hybridization was performed for 40 h in 18 μl buffer containing 0.4 M NaCl, 100 mM Tris-HCl (pH 8.5) and 1 mM EDTA. The mixture was then diluted to 200 μl and extracted with an equal volume of chloroform:isoamyl alcohol (24:1) to remove the mineral oil. The mixture was treated with 30 U UDG (Boehringer Mannheim) for 4 h at 37°C in a buffer containing 70 mM Hepes-KOH (pH 7.4), 1 mM EDTA and 1 mM dithiothreitol. The DNA was then concentrated with ethanol and dissolved in 25 μl TE buffer (10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA). Then, 3 μl 10X MBN buffer (30 mM sodium acetate pH 4.6, 50 mM NaCl, 1 mM zinc acetate and 0.001% Triton X-100) and 20 U of mung bean nuclease (Boehringer Mannheim) were added and the sample was incubated at 37°C for 30 min. The reaction was terminated by the addition of EDTA to a final concentration of 1 mM.

The resulting product was purified with streptavidin-coupled Dynabeads M-280 (Dynal A.S, Oslo, Norway), according to the manufacturer's instructions, and dissolved in 20 μl TE. PCR amplification of 0.5 μl of this DNA sample was performed as described above for DNA B. After PCR, 0.1 μg of this DNA was mixed with 10 tubes of PCR amplified DNA A, and the hybridization and the treatment with UDG and mung bean nuclease were repeated.

The final product was then PCR amplified, purified with JETquick PCR Purification Spin Kit, and cloned into pCR®4-TOPO using the TOPO™TA cloning kit (Invitrogen).

PCR detection of polymorphisms

Partial sequencing of clones 1, 2 and 3 was performed using the M13 reverse primer on the ABI 310 sequencer (Perkin Elmer) according to the manufacturer's protocols. Genomic DNA was completely digested with BamHI or BglII and self-ligated at a low concentration (4 μg/ml) overnight.

PCR amplification was performed using 100 ng of the ligation product. The sequences of the PCR primers used for detection of the polymorphisms are shown in Table 1 above. The primers were designed using the Primer 3 program (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi) and the PCR conditions were selected using a program developed by Breslauer et al. (10, http://alces.med.umn.edu/rawtm.html).

Results and Discussion, the COP procedure

An overview of the COP procedure is shown in Figure 1. DNA A and B obtained from two individuals were digested with BamHI, BglII and BclI, and Blsubtr1/2 linkers (Table 1) were ligated to the digested DNA fragments. DNA A was amplified using dUTP and an unmodified primer, and DNA B was amplified with a biotinylated primer, but with normal dNTPs. The PCR products (mainly in the range of 0.5 kb to 2 kb) were denatured, hybridized at a ratio of 1:100 of DNA B to A, and then treated with UDG to destroy all the DNA originating from sample A, and with mung bean nuclease to digest the single-stranded DNA and all non-perfect hybrids. The resulting sample B homohybrids were purified and concentrated with streptavidin beads. The cycle was then repeated. The final DNA product was then re-purified, amplified and cloned into an appropriate vector. This procedure was applied to DNA isolated from two different people of the same race, and a number of clones were obtained from the transformation. DNA from 10 random clones was isolated and the inserts were analyzed by Southern blot hybridization for the presence of polymorphisms (Fig. 2). Eight clones were clearly polymorphic and two clones were not. These results indicate a good selectivity of the COP procedure.
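The selection logic of one COP cycle can be sketched as a toy model. This is only an illustration under simplifying assumptions: each reannealed molecule is reduced to the origin of its two strands and a flag for perfect pairing, and the names `Duplex` and `survives_cop_round` are hypothetical, not part of the procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    """Toy description of one reannealed molecule (hypothetical model)."""
    strands: tuple   # sample of origin of each strand, e.g. ("A", "B")
    perfect: bool    # True if the two strands pair without mismatches


def survives_cop_round(duplex: Duplex) -> bool:
    """Return True if the molecule is recovered on streptavidin beads."""
    # UDG: strands made with dUTP (sample A) are destroyed. In an A/B
    # heterohybrid this leaves the B strand single-stranded.
    if "A" in duplex.strands:
        return False
    # Mung bean nuclease: single strands and imperfect hybrids are digested.
    if not duplex.perfect:
        return False
    # What remains are perfectly paired, biotinylated B/B homohybrids.
    return True


pool = [
    Duplex(("A", "A"), True),
    Duplex(("A", "B"), True),
    Duplex(("B", "B"), False),
    Duplex(("B", "B"), True),
]
recovered = [d for d in pool if survives_cop_round(d)]
print(recovered)
```

In this simplified picture only perfectly matched B homohybrids pass all three steps, which is the population the text describes as being purified with streptavidin beads.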

SNPs that represent restriction fragment length polymorphisms (RFLPs) can be identified by Southern blotting, which is a method well known to those skilled in this field (Southern, J. Mol Biol. 98:503, 1975). However, genomic DNA is not always available in quantities sufficient for Southern blotting analysis, and PCR is easier to perform. In order to develop a straightforward procedure for testing for the presence or absence of SNPs in the clones generated in this study, the procedure used was based on identification of the different sized polymorphic fragments generated by restriction enzymes. Three polymorphic clones were selected, the inserts in these clones were sequenced, and primers for detection of the polymorphisms were designed. The present primers worked in opposite directions.

Genomic DNA was digested with BamHI or BglII, and after heat inactivation of the enzyme, the solution was diluted twenty times and self-ligated. PCR amplification was performed as described in Materials and Methods above. As shown in Figure 3, all three primer pairs detected SNPs.

Figure 1 does not reflect all the complex reactions involved in COP that enrich for RFLP fragments. Importantly, polymorphic fragments are PCR enriched in COP because of their relatively small size. However, digestion with restriction enzymes generates two fragments from a single RFLP fragment. This may increase the probability of forming homodimers, partially from the replacement of a short plus strand with a longer plus strand in a duplex originally comprised of a long minus strand and a short plus strand. This duplex formed by two long strands will be more stable than the duplex formed by short and long strands. Two scenarios probably play a major role in RFLP enrichment. Firstly, enzyme digestion of one long DNA fragment of A (e.g. 15 kb) may generate a long fragment (A1) (e.g. 14.5 kb) and a very short fragment (A2) (e.g. 0.5 kb). After the first step of PCR amplification, only fragment A2 would be present in the amplified DNA (Fig. 4a). Clone 3 in Figure 3 demonstrates this case. Another situation would be if enzyme digestion generated two short fragments (e.g. 0.5 kb and 0.4 kb) from a medium DNA fragment (e.g. 0.9 kb). In this case, all three fragments would be PCR amplified in the first step and would therefore participate in the subsequent reactions (Fig. 4b). Clone 1 in Figure 3 is an example of this situation. Clone 2 in Figure 3 reflects the same situation, but the polymorphic fragment is longer.

In both scenarios, after the first PCR amplification, the complexity of the DNA would be significantly decreased and the proportion of polymorphic (mainly shorter) fragments would be dramatically increased.

In this study, three enzymes (BamHI, BglII and BclI) were used for the digestion of human DNA. These enzymes would generate fragments of an average size of approximately 1,500 bp. Sau3A, which generates restriction fragments with an average size of 250 bp to 300 bp, could have been used in place of these enzymes.
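The order of magnitude of these fragment sizes can be checked with a back-of-the-envelope calculation. The sketch below assumes a random sequence with equal base frequencies and distinct recognition sites (three 6 bp sites for BamHI, BglII and BclI, one 4 bp site for Sau3A); the function name is hypothetical and real genomes deviate from this idealization.

```python
def expected_fragment_size(site_lengths):
    """Mean spacing (bp) between cut sites for enzymes with the given
    recognition-site lengths, assuming a random sequence with equal base
    frequencies and distinct, non-overlapping sites."""
    sites_per_bp = sum(4.0 ** -n for n in site_lengths)
    return 1.0 / sites_per_bp


six_cutters = expected_fragment_size([6, 6, 6])  # BamHI + BglII + BclI
four_cutter = expected_fragment_size([4])        # Sau3A
print(round(six_cutters), round(four_cutter))
```

Under these assumptions the three 6 bp cutters together give a mean fragment size of roughly 1.4 kb and Sau3A gives about 256 bp, in line with the approximately 1,500 bp and 250 bp to 300 bp quoted in the text.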

If we assume that two equivalent chromosomes have on average one SNP per 1,000 bp of DNA (Brookes, A. J. (1999) Gene, 234, 177-186), then at least 10,000 polymorphic fragments could be cloned using the COP procedure described in this study. In the calculation we assume that the size of the human genome is 2.5 x 10^9 bp and that roughly every 1,500 bp of human DNA contains a recognition site for one of the three enzymes used in the study, i.e. BamHI, BclI or BglII. Using Sau3A (one site per 250 bp), 40,000 fragments could be cloned. Using all combinations of restriction enzymes with a 4 bp recognition site (4 bp cutters), including enzymes recognizing multiple and non-palindromic sequences, practically all SNPs could be identified and cloned by this method. Using only two 4 bp cutters containing CG pair(s) in the recognition site, cloning of at least 80,000 SNPs, located mostly in gene-rich regions, could be achieved. This effort would be comparable in productivity to SNP generation by EST cloning and sequencing programs. The importance of this type of SNP is that they may be located not only in expressed regions but also in promoter/enhancer regions.
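The text does not spell out how the figures of 10,000 and 40,000 are obtained. One reading that reproduces them, offered here only as an assumption, is to count SNPs that fall inside a recognition site: the number of sites in the genome, times the site length, times the SNP density. The sketch takes the genome size as 2.5 x 10^9 bp; the function and parameter names are hypothetical.

```python
def polymorphic_sites(genome_bp, site_spacing_bp, site_length_bp, snp_spacing_bp=1000):
    """Expected number of recognition sites hit by a SNP (assumed model):
    (genome / site spacing) sites, each site_length_bp long, with one SNP
    per snp_spacing_bp."""
    return genome_bp * site_length_bp / (site_spacing_bp * snp_spacing_bp)


GENOME_BP = 2_500_000_000  # assumed haploid genome size

six_cutter_estimate = polymorphic_sites(GENOME_BP, 1500, 6)  # BamHI/BglII/BclI
sau3a_estimate = polymorphic_sites(GENOME_BP, 250, 4)        # Sau3A
print(round(six_cutter_estimate), round(sau3a_estimate))
```

With these inputs the two estimates come out at 10,000 and 40,000, matching the numbers quoted above.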

It is worth mentioning that the COP procedure itself results in enrichment of polymorphic sequences. To actually detect a SNP, another step, sequencing of the polymorphic clone, is needed. Thus, the COP procedure is advantageously used as a complementary method for other approaches to SNP detection.

Thus, as mentioned above, the present procedure will be useful for isolating SNPs from particular regions of the genome. In this case, DNA B would originate from individual clones or contigs of YAC, PAC or BAC clones from the region of interest. Other obvious applications for the developed method would be generation of SNPs in CpG islands (Bird, A. P. (1987) Trends Genet., 3, 342-347) using CG-containing enzymes, generation of SNPs in different human populations, and use for non-human organisms. Use of the COP procedure to clone markers from regions that have lost heterozygosity could result in the isolation of tumor suppressor genes or could be used for the detection of rearranged immunoglobulin loci (Rosenberg, M., Przybylska, M. and Straus, D. (1994) Proc. Natl. Acad. Sci. U S A, 91: 6113-6117). Another application could be 'chromosome landing' (Corrette-Bennett, J., Rosenberg, M., Przybylska, M., Ananiev, E. and Straus, D. (1998) Nucleic Acids Res., 26: 1812-1818; and Young, N. D. and Phillips, R. L. (1994) Plant Cell, 6: 1193-1195) to facilitate positional cloning in organisms for which high resolution maps have not been developed. The efficiency of the COP procedure could be increased, if necessary, by adding one more cycle of enrichment or by omitting PCR amplification after the first cycle of enrichment.

The COP procedure resembles 'RFLP subtraction' in that it clones RFLPs using a subtractive procedure (Rosenberg, M., Przybylska, M. and Straus, D. (1994) Proc. Natl. Acad. Sci. U S A, 91: 6113-6117; and Corrette-Bennett, J., Rosenberg, M., Przybylska, M., Ananiev, E. and Straus, D. (1998) Nucleic Acids Res., 26: 1812-1818). Otherwise, however, these procedures are very different in the biochemical techniques used for cloning RFLPs and in the results. RFLP subtraction is a significantly more complicated and laborious method. In addition to the multiple (3-4 cycles) 'classical' subtractive hybridization steps, it uses gel purification, a reassociation step to remove poorly hybridizing DNA, subtraction based on representational difference analysis (Lisitsyn, N., Lisitsyn, N. and Wigler, M. (1993) Science, 259, 946-951) and multiple combinations of linkers and PCR primers. The most effective enrichment steps in the COP procedure (with UDG and mung bean nuclease) are not used in RFLP subtraction at all. RFLP subtraction results in cloning RFLP segments that are present in one DNA sample and absent in the other. The COP procedure, however, yields DNA fragments that are heterozygous in one DNA sample but homozygous in the other. Because of these distinctive properties, compared to the prior art, the COP procedure according to the present invention will find wider applications in studies of genetic diversity.

Example II: Cloning of deleted sequences (CODE)

Materials and Methods

Cell lines and general methods

For the experimental set-up, DNA isolated from the small cell lung carcinoma cell line ACC-LC5 was used (Yamakawa, K., Takahashi, T., Horio, Y., Murata, Y., Takahashi, E., Hibi, K., Yokoyama, S., Ueda, R., Takahashi, T., Nakamura, Y., 1993. Frequent homozygous deletions in lung cancer cell lines detected by a DNA marker located at 3p21.3-p22. Oncogene 8, 327-330). This cell line contains a homozygous 685-kb deletion in 3p21.3-p22 (Ishikawa, S., Kai, M., Tamari, M., Takei, Y., Takeuchi, K., Bandou, H., Yamane, Y., Ogawa, M., Nakamura, Y., 1997. Sequence analysis of a 685-kb genomic region on chromosome 3p22-p21.3 that is homozygously deleted in a lung carcinoma cell line. DNA Res. 4, 35-43) and was used as the source of DNA A (driver). DNA isolated from normal human lymphocytes served as the control DNA (DNA B, tester).

Isolation of the genomic DNA, restriction enzyme digestion and DNA ligation were performed according to standard methods previously described (Zabarovsky, E. R., Boldog, F., Thompson, T., Scanlon, D., Winberg, G., Marcsek, Z., Erlandsson, R., Stanbridge, E. J., Klein, G., Sumegi, J., 1990. Construction of a human chromosome 3 specific NotI linking library using a novel cloning procedure. Nucleic Acids Res. 18, 6319-6324).

Cloning, Southern blotting and hybridization were performed according to standard methods (Sambrook, J., Fritsch, E. F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY). Probes were 32P-labeled by PCR (Bicknell, D. C., Markie, D., Spurr, N. K., Bodmer, W. F., 1991. The human chromosome content in human x rodent somatic cell hybrids analyzed by a screening technique using Alu PCR. Genomics 10, 186-192).

The CODE procedure

Two oligonucleotides, Blsubtr1: 5'-GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG-3' and Blsubtr2: 5'-CCCAGCACTGACCCTTTTGGGACCGCGGCCGCG-3', were used to create the Blsubtr1/2 linker. Annealing was carried out in a final volume of 100 μl containing 20 μl of 100 μM Blsubtr1, 20 μl of 100 μM Blsubtr2, 10 μl of 10X M buffer (Boehringer Mannheim) and 50 μl of H2O. The reaction mixture was boiled for 8 min and allowed to cool slowly at room temperature.
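As a quick consistency check on the annealing step, the following sketch computes the final concentration of each oligonucleotide and verifies that the two strands pair as a duplex with a 5'-GATC overhang. The helper `reverse_complement` and the variable names are illustrative choices, not part of the protocol.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


BLSUBTR1 = "GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG"  # 36 nt
BLSUBTR2 = "CCCAGCACTGACCCTTTTGGGACCGCGGCCGCG"     # 33 nt

# Dilution: 20 ul of a 100 uM stock in a final volume of 100 ul.
stock_uM, oligo_ul, final_ul = 100, 20, 100
final_uM = stock_uM * oligo_ul / final_ul

# Blsubtr2 read along the Blsubtr1 strand. The first four bases of Blsubtr1
# stay unpaired (sticky end); the last base of the read-out corresponds to
# one extra 5' base on Blsubtr2.
bottom_as_top = reverse_complement(BLSUBTR2)
overhang = BLSUBTR1[:4]
paired_ok = BLSUBTR1[4:] == bottom_as_top[:-1]
print(final_uM, overhang, paired_ok)
```

Each oligonucleotide ends up at 20 μM in the annealing mix, and the GATC overhang is the sticky end shared by BamHI, BglII and BclI digests.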

Two micrograms of DNA A and DNA B at a DNA concentration of 50 μg/ml were digested with 20 U of BamHI, BglII and BclI (Boehringer Mannheim) at 37°C for 5 h. Upon completion of digestion, the enzymes were heat-inactivated for 20 min at 65°C.

Approximately 0.5 μg of the digested DNAs were ligated overnight in the presence of a 50-fold molar excess of Blsubtr1/2 linker at room temperature. PCR of the tester amplicon (DNA B with Blsubtr1/2) was performed in 100 μl of a solution containing 67 mM Tris-HCl, pH 9.1, 16.6 mM (NH4)2SO4, 1.0 mM MgCl2, 0.1% Tween 20, 200 μM dNTPs, 100 ng tester amplicon DNA, 400 nM of biotinylated primer PBsub 5'biotin-CAGCACTGACCCTTTTGGGACC-3', and 5 U of Taq polymerase. PCR of the driver amplicon (DNA A with Blsubtr1/2) was performed in 20 tubes using the Antiuniv primer (the same as above but without biotin) and the following modified conditions: dUTP (300 μM) was used instead of dTTP, and 2.5 mM MgCl2 was used rather than 1.0 mM MgCl2. The PCR cycling conditions were 72°C for 5 min, followed by 25 cycles of 95°C for 1 min, 60°C for 1 min and 72°C for 2.5 min, and a final extension at 72°C for 5 min.

All PCR amplified DNA A samples were pooled (2000 μl) and mixed with 20 μl of PCR amplified DNA B (for subtraction, a ratio of 1 : 100 of DNA B to DNA A was used). The pooled sample was concentrated with ethanol, purified using a JETquick PCR Purification Spin Kit (GENOMED Inc.), and dissolved in 100 μl H2O. This DNA mixture was further concentrated to 6 μl and boiled for 10 min under mineral oil.

Subtractive hybridization was performed for 40 h in 9 μl buffer containing 0.4 M NaCl, 100 mM Tris-HCl, pH 8.5 and 1 mM EDTA. After hybridization, the mixture was diluted to 200 μl and extracted with an equal volume of chloroform: isoamyl alcohol (24: 1) to remove the mineral oil. Treatment with UDG (Boehringer Mannheim) was performed in a buffer containing 70 mM Hepes-KOH, pH 7.4, 1 mM EDTA and 1 mM dithiothreitol with 30 U UDG at 37°C for 4 hrs. Then DNA was precipitated with ethanol and dissolved in 25 μl of TE buffer. To this 3 μl of 10X MBN buffer (30 mM sodium acetate, pH 4.6, 50 mM NaCl, 1 mM zinc acetate and 0.001% Triton X-100) and 20 U of mung bean nuclease (Boehringer Mannheim) were added and incubated at 37°C for 30 min. The reaction was stopped by the addition of EDTA to a final concentration of 1 mM.

The subtracted DNA was purified with streptavidin coupled Dynabeads M-280 (Dynal A.S, Oslo, Norway) according to the manufacturer's instructions and dissolved in 20 μl of TE buffer. Approximately 0.5 μl of this DNA preparation was PCR amplified as described above for DNA B but using only 15 cycles, before subjecting the amplified DNA to a second round of hybridization. The whole procedure was repeated three more times (four cycles altogether).

The final subtraction product was PCR amplified, purified with the JETquick PCR Purification Spin Kit (GENOMED Inc.) and digested with NotI. This DNA preparation was inserted into the pBC KS(+) vector (Stratagene), which had been digested with NotI and dephosphorylated with alkaline phosphatase (Boehringer Mannheim).
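The NotI digestion works because the linker itself carries a NotI recognition sequence (GCGGCCGC), so every amplified fragment is flanked by NotI sites. A minimal check on the Blsubtr1 strand is sketched below; the variable names are illustrative.

```python
NOTI_SITE = "GCGGCCGC"
BLSUBTR1 = "GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG"

# 0-based position of the NotI site within the linker strand (-1 if absent).
noti_offset = BLSUBTR1.find(NOTI_SITE)

# The site is palindromic: it reads the same on the complementary strand.
pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
site_revcomp = "".join(pairs[b] for b in reversed(NOTI_SITE))
print(noti_offset, site_revcomp == NOTI_SITE)
```

The site starts right after the GATCC at the ligation end of the linker, so NotI cleavage removes most of the linker sequence from each cloned insert.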

Results and Discussion, the CODE procedure

An overview of the CODE procedure is shown in Figure 5. To increase the complexity of the representation it was decided to use three enzymes having the same sticky ends. The tester and driver DNA were digested with BamHI, BglII and BclI before the ligation of oligonucleotide linkers to the DNA fragments. The driver DNA was amplified with dUTP and unmodified primers, and the tester DNA was amplified with biotinylated primers in the presence of normal dNTPs. The products of DNA amplification (on average 1-2 kb) were denatured and hybridized at a ratio of 1:100 tester to driver DNA, respectively. After hybridization was complete, the products were treated with UDG (which destroyed all the driver DNA) and mung bean nuclease (which digested single-stranded DNA and all the non-perfect hybrids). The resulting tester homohybrids were purified, concentrated with streptavidin beads, and subjected to three more rounds of subtraction. The final PCR product was amplified and cloned into a suitable vector.

In this study, a lung tumor cell line, ACC-LC5, that contains a 0.7 Mb homozygously deleted region in 3p21-p22 was compared with normal control DNA. It was not known whether this cell line contained homozygous deletions in other chromosomes. This normal DNA is not a completely appropriate control because it has been isolated from another individual. Thus, cloning of polymorphic as well as deleted sequences may be expected. Twenty-four random clones were tested by Southern blot analysis. Five of the clones were deleted in the tumor sample, five clones were not deleted, three clones failed to show a specific hybridization signal and eleven were polymorphic. Several examples of Southern hybridization are shown in Figure 6.

These experiments demonstrate that the CODE procedure according to the invention is a simple, effective and robust procedure that can successfully isolate deleted genomic sequences. In contrast to RDA, many different probes can be generated in one experiment, and all 24 tested clones were different. This method is principally different from the prior art RDA in that it is easier to perform and requires only a limited amount of PCR enrichment. The CODE method does not exploit the enrichment stemming from the difference between exponential and linear amplification. This ensures that the biased influences generated by PCR are kept to a minimum, and the subtractive enrichment becomes the most critical step. The sizes of the subtractive products were between 300 and 700 bp, but it is probable that this can be significantly increased (up to 1-2 kb). The present inventors have performed the COP procedure (Example I above) and the CIS procedure (Example III below) using only one enzyme, BamHI, and long-distance PCR with a decreased number of amplification cycles. Fragments larger than 1 kb were purified by agarose gel electrophoresis. As a result, both procedures yielded DNA fragments mainly in the range of 1 kb to 2.5 kb. The same approaches can be applied to the CODE procedure.

Another important difference between CODE and other genomic subtraction methods (e.g. RDA and the 'RFLP subtraction' method, Rosenberg et al., 1994) is that CODE allows cloning of differently sized polymorphic fragments hemizygously deleted in the tumor DNA. The RDA and 'RFLP subtraction' methods yielded probes that detected hemizygous loss of the smaller fragment in the driver DNA. Thus all probes detected two alleles, distinguished by a large (e.g. 7 kb) and a small (e.g. 0.6 kb) DNA fragment. The small allele was always present in the tester but absent in the driver DNA. In the case of the CODE method, polymorphic fragments of similar length can be differentially cloned (Fig. 6, clones 6 and 7). As demonstrated above, the COP procedure (and conceivably CODE) allowed differential cloning of very similar fragments (e.g. 2 kb and 2.3 kb) and detected the loss of a larger polymorphic fragment.

The general scheme of the CODE procedure is similar to that of COP. However, several important differences exist. First of all, the mechanisms used for differential cloning in these two methods are different. In the CODE procedure the main driving force for differential cloning is subtractive hybridization. In the COP procedure, a very important aspect is PCR enrichment of polymorphic fragments due to their smaller size. In the COP procedure only two cycles of enrichment were used, whereas four cycles were used in the CODE procedure.

If it is assumed that the homozygous deletion in the tumor is 0.7 Mb and the average size of the DNA fragments after simultaneous digestion with BamHI, BclI and BglII is 1.5 kb, then only approximately 470 DNA fragments will be located in the deleted region. These 470 fragments represent approximately 0.01% of all available DNA fragments in a diploid human genome (6 x 10^9 bp). At the same time it can be expected that, of 4 x 10^6 DNA fragments, approximately 20,000, or 0.5%, will be polymorphic. It was assumed in the calculations that two equivalent chromosomes have on average one single nucleotide polymorphism per 1,000 bp (Brookes et al., 1999). This means that in the CODE procedure the selectivity of the method must be increased at least 50 times, and all molecules available for differential cloning must be handled. That is why more purification steps and fewer cycles of PCR amplification of the subtracted products were used in the CODE procedure.
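The arithmetic behind these figures can be retraced as follows. This is a sketch that takes the diploid genome size as 6 x 10^9 bp and uses the 20,000 polymorphic fragments stated in the text as an input rather than deriving it; the variable names are illustrative.

```python
deletion_bp = 700_000               # 0.7 Mb homozygous deletion
fragment_bp = 1_500                 # average fragment size after digestion
diploid_genome_bp = 6_000_000_000   # assumed diploid genome size
polymorphic_fragments = 20_000      # value stated in the text

deleted_fragments = deletion_bp / fragment_bp        # about 467
total_fragments = diploid_genome_bp / fragment_bp    # 4 million

deleted_fraction = deleted_fragments / total_fragments
polymorphic_fraction = polymorphic_fragments / total_fragments
ratio = polymorphic_fraction / deleted_fraction

print(round(deleted_fragments), f"{deleted_fraction:.4%}",
      f"{polymorphic_fraction:.2%}", round(ratio, 1))
```

Without rounding, the deleted fragments make up about 0.012% of the total and the ratio of polymorphic to deleted fragments is about 43; the factor of 50 quoted in the text follows from rounding the deleted fraction to 0.01%.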

A cDNA subtraction method using PCR amplification with dUTP and UDG treatment has recently been published (Sugai, M., Kondo, S., Shimizu, A., Honjo, T., 1998. Isolation of differentially expressed genes upon immunoglobulin class switching by a subtractive hybridization method using uracil DNA glycosylase. Nucleic Acids Res. 26, 911-918). The present CODE procedure is significantly different from this cDNA subtraction method. Importantly, the subtraction efficiency of the Sugai et al. method was very low (more than 90% of the clones were present in both mRNA preparations), despite the lower complexity of mRNA compared to genomic DNA.

In summary, the present example describes a simple method for cloning deleted sequences, which will find many uses in applications where genomic sequence loss is suspected to contribute to the onset of specific diseases. The use of the present CODE procedure in combination with the CIS procedure, which allows cloning of sequences identical between two complex genomes, will provide a useful tool for localizing hereditary disease genes.

Example III: Cloning of identical sequences between two complex genomes (CIS)

Materials and Methods

Cell lines and general methods

MCH903.1 is a mouse-human microcell hybrid (MCH) containing a single copy of human chromosome 3, derived from a normal human diploid cell line HFDC, as its only human component (Zabarovsky ER, Kashuba VI, Porovskaya ES et al. (1993) Alu-PCR approach to isolate NotI linking clones from the 3p14-3p21 region frequently deleted in renal cell carcinoma. Genomics 16, 713-719; Zabarovsky ER, Kashuba VI, Kholodnyuk ID et al. (1994) Rapid mapping of NotI linking clones with differential hybridization and Alu-PCR. Genomics 21, 486-489; and Wang JY, Zabarovsky ER, Talmage C et al. (1994) Somatic cell hybrid panel and NotI linking clones for physical mapping of human chromosome 3. Genomics 20: 105-113). MCH429.11 is a rat/human microcell hybrid containing a part of human 3q originating from the same chromosome as in MCH903.1. The MCH939.2 cell line (Zabarovsky et al. 1993, 1994, supra; and Wang et al. 1994, supra) originally contained a cytogenetically normal chromosome 3 (derived from a normal human diploid cell line HHW1108) which later acquired a deletion in the short arm (3p21-p22).

Isolation of genomic DNA, ligation, restriction enzyme digestion and other molecular and microbiology methods were performed using standard procedures.

The CIS procedure and reverse chromosome painting

Mung bean nuclease can cut double-stranded DNA containing a gap several nucleotides in length (Sambrook, J., Fritsch, E. F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY). To check whether it can cut duplex DNA containing single base pair mismatches, the following experiment was performed. Normal and mutated P53 genes were amplified using p53H-A (5'-ATG GAT GAT TTG ATG CTG TC-3') and p53H-B (5'-GTG AAA TAT TCT CCA TCC AG-3') primers.

Mutated P53 gene contained His 273 coded by the triplet CAT instead of Arg 273 coded by the triplet CGT in the normal P53 gene. If mung bean nuclease can recognize this 1 bp mismatch it should cut approximately 700 bp from the 5' end of the P53 gene. Mutated P53 gene was mixed with normal P53 gene, denatured and annealed in 0.4 M NaCl, 50 mM Tris-HCl, pH 8.5, and 1 mM EDTA.
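The single-base difference tested here can be made explicit with a short sketch. It locates the changed position between the two codons named in the text and lists the two mispairs expected in the heteroduplexes formed after denaturation and annealing; the variable names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

normal_codon = "CGT"   # Arg 273 in the normal P53 gene
mutant_codon = "CAT"   # His 273 in the mutated P53 gene

# Positions (0-based within the codon) where the two alleles differ.
changed = [i for i, (n, m) in enumerate(zip(normal_codon, mutant_codon)) if n != m]

# In a heteroduplex each allele's base faces the complement of the other
# allele's base at the changed position.
i = changed[0]
mispairs = {
    (normal_codon[i], COMPLEMENT[mutant_codon[i]]),  # G opposite T
    (mutant_codon[i], COMPLEMENT[normal_codon[i]]),  # A opposite C
}
print(changed, sorted(mispairs))
```

The two heteroduplexes therefore carry a G/T and an A/C mispair at the middle position of codon 273, which is the 1 bp mismatch probed with mung bean nuclease.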

Subsequently, the duplex was treated (30 min, 37°C) with mung bean nuclease (from 0.01 U to 5 U per one μg of duplex) and ligated to the linker C1/C2 (without 5' phosphate):

5'-TAC CTA CTA AAC TAC GAC AGA A-3' (C2 oligonucleotide)

3'-ATG GAT GAT TTG ATG CTG CT-5' (C1 oligonucleotide)

The resulting product was purified with the JET quick PCR Purification Spin Kit (GENOMED Inc.) and PCR amplified using the p53H-A and C1 primers. In addition to the full-length P53 fragment (approximately 0.9 kb), a smaller 0.7 kb fragment was seen at concentrations of mung bean nuclease between 0.1 U and 1 U (data not shown). Based on these results we decided to use 1 U of mung bean nuclease per 1 μg of DNA in the following CIS experiments.

The synthetic linker CIS was prepared by annealing of two oligonucleotides:

Blsubtr1 5'-GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG-3' and Blsubtr2 5'-CCCAGCACTGACCCTTTTGGGACCGCGGCCGCG-3'.

Two micrograms of DNA from two microcell hybrid lines (MCH429.11 and MCH903.1 in one experiment, and MCH939.2 and MCH903.1 in another) were digested with BamHI and ligated to the CIS-linker (20 μM). Polymerase chain reaction (PCR) of MCH903.1 DNA was performed using the biotinylated primer PBsub: 5'biotin-CAGCACTGACCCTTTTGGGACC-3' in 100 μl containing 67 mM Tris-HCl, pH 9.1, 16.6 mM (NH4)2SO4, 1.0 mM MgCl2, 0.1% Tween 20, 200 μM each of the four dNTPs, 100 ng DNA, 400 nM primer and 3 U of Taq polymerase. PCR of MCH429.11 and MCH939.2 was performed using the Antiuniv primer (the same primer as above but without biotin) with modified conditions: 2.5 mM MgCl2, and dUTP (600 μM) and 5-methyl-dCTP (m5dCTP, 250 μM) instead of dTTP and dCTP. PCR cycling conditions were 95°C for 1 min, followed by 30 cycles of 90°C for 1 min, 60°C for 1 min and 72°C for 2 min. The sizes of the PCR products were between 200 bp and 5,000 bp, with the majority of the fragments between 1,000 and 2,000 bp.

Hybridization was done (65°C, 36 hrs) using 10 μg of each DNA (final concentration 0.5 μg/μl) in 0.4 M NaCl, 50 mM Tris-HCl, pH 8.5, and 1 mM EDTA. After hybridization, the DNA was digested with 50 U MvnI (3 hrs, 37°C), treated with 20 U of mung bean nuclease (30 min, 37°C) and 20 U UDG (4 hrs, 37°C), and heated for 15 min at 95°C. The DNA was purified and concentrated using streptavidin-coupled Dynabeads M-280 (Dynal A.S., Oslo, Norway) according to the manufacturer's instructions and PCR amplified using the Antiuniv primer.

The standard procedure of FISH analysis with metaphase chromosomes was performed as described previously (Pinkel D, Straume T, Gray JW (1986/1992) Cytogenetic analysis using quantitative, high-sensitivity, fluorescence hybridization. Proc Natl Acad Sci USA 83:2934-2938; and Fedorova L, Kost-Alimova M, Gizatullin RZ et al. (1996/7) Assignment and ordering of twenty-three unique NotI-linking clones containing expressed genes including the guanosine 5'-monophosphate synthetase gene to human chromosome 3. Eur J Hum Genet 5: 110-116).

Reverse chromosomal painting with total genomic DNA isolated from MCH429.11 and MCH939.2 was done as described previously (Kholodnyuk ID, Kost-Alimova M, Kashuba VI et al (1997) The region of 3p22-p21.3 is non-randomly eliminated from mouse human microcell hybrids during tumor growth in SCID mice. Genes Chromosomes Cancer 18:200-211).

A Bionick kit (Gibco BRL, Bethesda, MD) was used to label CIS DNA and total genomic DNA (in the cases of MCH429.11 and MCH939.2) with biotin-14-dATP. Labelled DNA (500 μg) was hybridized to normal human metaphases, prepared from PHA stimulated lymphocytes, and detected with FITC or Cy3-conjugated avidin. A fluorescent microscope (LEITZ-DMRB, Leica, Heidelberg) equipped with a Photometrics PXL-KAF-1400 CCD camera was used for capturing metaphase images.

Results and Discussion, the CIS procedure

The general design of the experiment is shown in Figure 7. The microcell mouse/human hybrid cell line MCH903.1 contains the entire human chromosome 3. The microcell rat/human hybrid cell line MCH429.11 contains only part of the long arm of human chromosome 3 (Wang JY, Zabarovsky ER, Talmage C et al. (1994) Somatic cell hybrid panel and NotI linking clones for physical mapping of human chromosome 3. Genomics 20: 105-113). The chromosome 3 portions in MCH429.11 and in MCH903.1 have the same origin (Zabarovsky et al. 1993, 1994, supra; and Wang et al., 1994, supra). All other genetic material is different between them. The objective of the experiment was to perform CIS using these two cell lines and to hybridize the resulting DNA fragments to metaphase human chromosomes. If the CIS procedure works, only part of the long arm of human chromosome 3 would be expected to hybridize.

The scheme of the CIS procedure is shown in Figure 8. DNA A and B are digested with BamHI and ligated to special linkers containing 2 recognition sites for MvnI. Thus all molecules contain at least 4 sites for MvnI. DNA A is PCR amplified in the presence of dUTP and m5dCTP, thus all cytosines will be methylated. DNA B is PCR amplified in the presence of normal dCTP and biotinylated primers. The two DNA preparations are mixed in equal ratios, denatured and hybridized. Subsequently, the DNA is digested with MvnI. This enzyme can digest only dsDNA molecules without methylcytosine and will digest all homohybrids B (they contain at least 4 sites for MvnI).
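The linker properties stated above can be checked directly on the oligonucleotide sequences given in the Materials and Methods. The following Python sketch is a minimal illustration that assumes the MvnI recognition sequence is CGCG; the helper and constant names are hypothetical.

```python
# Illustrative sketch only. Sequences are copied from the Materials and Methods;
# the MvnI recognition sequence (CGCG) is an assumption not stated in the text.
LINKER_STRAND_1 = "GATCCGCGGCCGCGGTCCCAAAAGGGTCAGTGCTGG"  # first linker oligonucleotide
LINKER_STRAND_2 = "CCCAGCACTGACCCTTTTGGGACCGCGGCCGCG"     # second linker oligonucleotide
PBSUB_PRIMER = "CAGCACTGACCCTTTTGGGACC"                   # PBsub / Antiuniv primer
MVNI_SITE = "CGCG"                                        # assumed recognition sequence


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq, site):
    """Count occurrences of `site` in `seq`, allowing overlaps."""
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site)


# The first four bases of strand 1 remain single-stranded after annealing,
# giving a GATC overhang that matches a BamHI-cut end.
overhang = LINKER_STRAND_1[:4]
strands_anneal = reverse_complement(LINKER_STRAND_2).startswith(LINKER_STRAND_1[4:])

sites_per_linker = count_sites(LINKER_STRAND_1, MVNI_SITE)
sites_per_fragment = 2 * sites_per_linker  # one linker at each end of a fragment
primer_within_linker = PBSUB_PRIMER in LINKER_STRAND_2

print(overhang, strands_anneal, sites_per_linker, sites_per_fragment, primer_within_linker)
```

Under these assumptions, each linker carries two sites, so a fragment with a linker at both ends carries at least four, and the primer sequence lies entirely within the second linker strand.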

The DNA mixture is next treated with mung bean nuclease. This nuclease destroys all imperfect hybrids and ssDNAs. Thus after this treatment there will remain only perfect homohybrids A and perfect (without any mismatches) heterohybrids. The DNA mixture is then treated with UDG (uracil-DNA glycosylase). This enzyme removes the uracil bases from the DNA and thus destroys all DNA from individual A. As a result, there will be only ssDNA from individual B which is identical to the DNA in individual A. Using magnetic beads, this DNA is concentrated and purified, then PCR amplified using the specific primer (Antiuniv).
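The selection logic of the three enzymatic steps can be summarised in a toy model. The Python sketch below is a simplified illustration, not a simulation of the chemistry: each molecule is reduced to the origin of its strands and a flag for perfect pairing, and the class and function names are hypothetical.

```python
# Illustrative toy model only; it encodes the rules stated in the text and nothing more.
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    strands: tuple        # origin of each strand, e.g. ("A", "B"); one element models ssDNA
    perfect: bool = True  # True if the strands pair without any mismatch


def survives_mvni(m):
    # Per the text, MvnI digests only dsDNA lacking methylcytosine, i.e. B/B duplexes.
    return m.strands != ("B", "B")


def survives_mung_bean(m):
    # Per the text, mung bean nuclease destroys ssDNA and imperfect hybrids.
    return len(m.strands) == 2 and m.perfect


def strands_after_udg(m):
    # UDG destroys the uracil-containing strands from A; only B strands remain.
    return [s for s in m.strands if s == "B"]


def cis_recovered(pool):
    """Names of molecules that contribute a biotinylated B strand to the final product."""
    return [
        m.name
        for m in pool
        if survives_mvni(m) and survives_mung_bean(m) and strands_after_udg(m)
    ]


pool = [
    Molecule("homohybrid A/A", ("A", "A")),
    Molecule("homohybrid B/B", ("B", "B")),
    Molecule("perfect heterohybrid A/B", ("A", "B")),
    Molecule("mismatched heterohybrid A/B", ("A", "B"), perfect=False),
    Molecule("single-stranded A", ("A",)),
    Molecule("single-stranded B", ("B",)),
]
result = cis_recovered(pool)
print(result)
```

In this model only the perfect heterohybrid yields a B strand that can be captured on streptavidin beads, which corresponds to the sequences identical between A and B.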

This procedure has been applied to DNA strands from MCH429.11 (DNA A) and MCH903.1 (DNA B). The resulting DNA product was labeled and hybridized to metaphase human chromosomes as described previously (Fedorova L, Kost-Alimova M, Gizatullin RZ et al. (1996/7) Assignment and ordering of twenty-three unique NotI-linking clones containing expressed genes including the guanosine 5'-monophosphate synthetase gene to human chromosome 3. Eur J Hum Genet 5:110-116).

The results of the FISH analysis are shown in Figure 9. The patterns of hybridization for MCH429.11 (Fig. 9A) and CIS DNA (Fig. 9C) were very similar. No hybridization to the short arm of human chromosome 3 was observed.

To check whether the CIS procedure results in cloning of only identical sequences, the present CIS procedure was applied to DNA from MCH939.2 (DNA A) and MCH903.1 (DNA B). These two cell lines contain human chromosome 3 from unrelated individuals. In this case the patterns of hybridization were completely different for MCH939.2 (Fig. 10A) and CIS DNA (Fig. 10C). CIS DNA gave only background hybridization. This experiment demonstrated that the CIS procedure eliminated even highly homologous but not identical sequences. In a direct experiment with wtP53 and mutant P53 genes, it was also shown that mung bean nuclease can recognize and cut single base mismatches (Materials and Methods).

Several features make the CIS procedure advantageous compared to genomic mismatch scanning (GMS). The replacement of the MutHLS enzyme with mung bean nuclease has several other consequences in addition to the above-mentioned absence of MutHLS on the market. Use of the MutHLS enzyme creates certain inherent problems. For instance, not all PstI fragments contain the GATC site. This introduces problems both for methylation with Dam methylase and for nicking with MutHLS (GATC is the recognition site for both enzymes). Moreover, it is difficult to estimate the completeness of the methylation of GATC sites in genomic DNA.

Another problem stems from the capacity of MutHLS to digest hemimethylated GATC sequences; this endonuclease activity does not depend on the presence of mismatched base pairs (Welsh KM, Lu AL, Clark S, Modrich P (1987) Isolation and characterization of the Escherichia coli mutH gene product. J Biol Chem 262:15624-15629).

The MutHLS enzyme does not recognize all mismatches and has different sensitivity to different mismatches (e.g. G-T greater than G-C, etc.; Nelson SF, McCusker JH, Sander MA, Kee Y, Modrich P, Brown PO (1993) Genomic mismatch scanning: a new approach to genetic linkage mapping. Nat Genet 1: 11-18; Au KG, Welsh K, Modrich P (1992) Initiation of methyl-directed mismatch repair. J Biol Chem 267:12142-12148).

It is important to note that MutHLS is not a robust enzyme. The distance between a mismatch and GATC, as well as the size of the DNA fragment, influences the activity of the enzyme (Au et al., 1992, supra).

The GMS technique does not work for heterohybrid molecules containing, for instance, 8-80 bp mismatches, since MutHLS proteins do not cut them and a BNDC column will not remove them. All these limitations result in an increase of background hybridization, which in turn decreases the efficiency of the procedure. The CIS procedure according to the present invention avoids all of these problems. For example, all DNA fragments contain at least four recognition sites for MvnI, all cytosines are methylated, BNDC columns are not used, etc. To sum up, this experiment showed that the CIS procedure is effective and useful for the generation of chromosome and region-specific probes.

The present scheme may also be contemplated for identification of hereditary disease genes. To test this, the CIS procedure is applied to a panel of DNA pools from families suffering from familial nasopharyngeal carcinoma (NPC), to identify potential NPC susceptibility genes. Comparing two great grandchildren (with different grandparents) resulted in detection of 0.3%-0.4% of hybridizing PAC clones (RPCI1, UK HGMP Resource Centre) that proved the selectivity of the CIS procedure. Only CIS fragments larger than 1 kb were used for labelling.

Claims

1. A method of enriching a specific nucleic acid segment, which method comprises the steps of

(a) providing a first sample A and a second sample B derived from different sources and digestion of both said samples with restriction enzyme to provide a plurality of short segments;

(b) amplification of sample A with a suitable primer and dATPs, dCTPs, dGTPs and dUTPs,

(c) amplification of sample B with a labelled primer and dATPs, dCTPs, dGTPs and dTTPs,

(d) combination of samples A and B;

(e) denaturation and hybridization;

(f) treatment with uracil-DNA glycosylase (UDG) and a nuclease; and, optionally, repeating steps (a)-(f);

(g) isolation of the specific segment originally present in sample B by use of the primer label.

2. A method according to claim 1, wherein the nuclease used in step (f) is mung bean nuclease.

3. A method according to claim 1 or 2, wherein the label is biotin and the specific segment is isolated from the sample mixture by use of streptavidin beads.

4. A method according to any one of the preceding claims, which further comprises a step for enzyme inactivation and ligation of linkers to the short segments obtained

5. A method according to any one of the preceding claims, which is followed by a step of detection of the enriched segment and cloning thereof in a suitable vector.

6. A method according to any one of claims 1-5, wherein in step (c), sample A and B are mixed in a ratio of at least about 50:1, such as about 75:1 and preferably about 100:1 to enable a subsequent subtractive hybridization.

7. A method according to any one of claims 1-6, wherein a single nucleotide polymorphism (SNP) originally present in sample B is enriched and cloned by repetition of steps (a)-(f) at least twice.

8. A method according to claims 1-6, wherein a specific segment present in sample B, but not in sample A, is enriched and cloned by repetition of steps (a)-(e) at least four times, samples A and B being comprised of nucleic acids identical except for said specific segment.

9. A method according to any one of claims 1-5, wherein a segment originally present in both sample A and sample B is enriched.

10. A method according to claim 9, wherein samples A and B after digestion thereof are ligated to linkers comprising recognition sites for MvnI, sample A is amplified in the presence of m5dCTP to methylate essentially all cytosines and the combined sample mixture is digested with MvnI prior to step (f).

11. Use of a method according to claim 6 in combination with a method according to any one of claims 7-10 for cloning of sequences identical between two complex genomes in order to localize a hereditary disease gene.

12. A kit for performing a method according to any one of claims 1-10.

13. A kit according to claim 12, which comprises

(i) one or more restriction enzymes;

(ii) one or more linker oligonucleotides;

(iii) amplification buffer solution containing dATPs, dCTPs, dGTPs and dUTPs, primer for sample A and enzyme;

(iv) amplification buffer solution containing dATPs, dCTPs, dGTPs and dTTPs, labeled primer for sample B and enzyme;

(v) hybridization buffer;

(vi) uracil-DNA glycosylase;

(vii) nuclease;

(viii) means for identification of labeled primer from sample B;

said components being presented in a suitable package having separate compartments for separate reaction and method steps and optionally together with instructions for the use thereof, wherein each component is present in an appropriate buffer or solvent if required.

14. A kit according to claim 12 or 13, wherein the nuclease is mung bean nuclease.

15. A kit according to any one of claims 12-14, wherein the labeled primer for sample B amplification is biotinylated PBsub and said means for identification is streptavidin.

16. A kit according to any one of claims 12-15, wherein the primer for sample A is non-biotinylated PBsub (Antiuniv).

17. A kit according to any one of claims 12-16, wherein the amplification buffers (iii) and (iv) comprise Taq polymerase.