WO2000055364A2 - Genetic analysis - Google Patents

Genetic analysis

Publication number
WO2000055364A2
WO2000055364A2 PCT/GB2000/000916 GB0000916W WO0055364A2 WO 2000055364 A2 WO2000055364 A2 WO 2000055364A2 GB 0000916 W GB0000916 W GB 0000916W WO 0055364 A2 WO0055364 A2 WO 0055364A2
Authority
WO
WIPO (PCT)
Prior art keywords
dna
fragments
fragment
interest
restriction
Application number
PCT/GB2000/000916
Other languages
French (fr)
Other versions
WO2000055364A3 (en)
Inventor
Michael Alan Reeve
Nicholas Ian Workman
Luis Martin-Parras
Original Assignee
Amersham Pharmacia Biotech Uk Limited
Application filed by Amersham Pharmacia Biotech Uk Limited
Priority to CA002362771A (CA2362771A1)
Priority to JP2000605780A (JP2002538837A)
Priority to EP00909502A (EP1173609A2)
Priority to AU31782/00A (AU3178200A)
Publication of WO2000055364A2
Publication of WO2000055364A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/683 Hybridisation assays for detection of mutation or polymorphism involving restriction enzymes, e.g. restriction fragment length polymorphism [RFLP]
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips

Definitions

  • markers are essentially used as a surrogate for sequencing - the more markers, the better
  • the logical endpoint of the above argument is to look at every base in the human genome - and carry out what could be termed a whole genome association study
  • the sequence at every base would be determined for the genome of each member of a phenotypically 'affected' and a phenotypically 'unaffected' population.
  • Statistical correlations could then be drawn between sequence differences and phenotype
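As a minimal sketch of what such a statistical correlation might look like at a single biallelic site, the Python below applies Pearson's chi-square test to a 2x2 table of allele counts (affected vs unaffected, alternative vs reference allele). The function names, the counts and the choice of test are illustrative assumptions, not the method of the source; the 1-degree-of-freedom p-value uses the identity p = erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_chi_square(aff_alt, aff_ref, unaff_alt, unaff_ref):
    """Pearson chi-square statistic (1 d.f., no continuity correction)
    for allele counts: rows = affected/unaffected, columns = alt/ref."""
    table = [[aff_alt, aff_ref], [unaff_alt, unaff_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: 60/40 alt/ref alleles in 'affected', 40/60 in 'unaffected'.
stat = allelic_chi_square(60, 40, 40, 60)
print(stat, chi_square_p_value_1df(stat))
```

A whole genome study would repeat such a test at every differing site and would need a multiple-testing correction, which this sketch does not attempt.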
  • Such associations would have predictive value for the phenotype of interest, given the genotype, and could be of great value in medicine and pharmacogenetics.
  • the current invention selectively enriches for DNA fragments that determine phenotype in the 'affected' population and thus makes the prospect of carrying out whole genome association studies for humans and other species a very real possibility.
  • Definition of terms used with the current invention:
  • the individuals chosen for whole genome analysis may be human, animal or plant and they may be eukaryotic, prokaryotic or archaebacterial in origin.
  • the terms 'affected' and 'unaffected' are used without limitation in order to categorise individuals into two groups - those that possess a defined phenotype of interest ('affected' individuals) and those that do not possess the phenotype of interest ('unaffected' individuals).
  • the phenotype common to the 'affected' individuals may be either beneficial (e.g. for these individuals, a particular pharmaceutical entity might show high efficacy in a phase II clinical trial) or detrimental (e.g. for these individuals, a particular pharmaceutical entity might show adverse toxicology in a phase I clinical trial).
  • the 'affected' population may comprise one or more individuals and the 'unaffected' population may similarly comprise one or more individuals according to the particular embodiment of the invention (see below).
  • DNA is used throughout for simplicity. Within the scope of the current invention, the term DNA may equally well apply to all or part of the haploid, diploid or polyploid genomic DNA content of one or more germ line or somatic cell(s).
  • the DNA may be extracted from cells taken directly from the individual(s), the DNA may be extracted from cells cultured or immortalised from the individual(s) or the DNA may be prepared from a library of clones - with inserts derived from the individual(s) and propagated in some appropriate host and cloning vector system.
  • DNA refers to the expressed part of the haploid, diploid or polyploid genomic DNA content of one or more somatic cells and the DNA is prepared from a library of clones - with inserts derived from the individual(s) and propagated in some appropriate host and cloning vector system
  • a cDNA library (normalised or otherwise) may be used.
  • DNA is compared in fragmented form. Fragmentation can be performed after DNA extraction, prior to cloning and/or after cloning. Restriction enzyme digestion is the preferred method for such fragmentation, though other methods (e.g. shearing or sonication) will be obvious to those skilled in the art.
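Restriction digestion can be mimicked in silico to see which fragment lengths a given enzyme panel would produce. The sketch below is a simplification that assumes a linear sequence, complete digestion and a single top-strand cut position per recognition site (overhangs ignored); the BamHI (G^GATCC) and HindIII (A^AGCTT) sites are standard, but the function name and the test sequence are hypothetical.

```python
def digest(seq, enzymes):
    """Fragment lengths after complete digestion of a linear sequence.

    enzymes maps a name to (recognition_site, cut_offset), where cut_offset is
    the number of bases of the site lying 5' of the top-strand cut.
    """
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


PANEL = {"BamHI": ("GGATCC", 1), "HindIII": ("AAGCTT", 1)}
print(digest("AAGGATCCTTTTAAGCTTAA", PANEL))  # -> [3, 10, 7]
```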
  • polymerase chain reaction amplification can be used to prepare the DNA for comparison in fragmented form. Priming sites within the vector sequence flanking the cloned, restriction-enzyme-fragmented inserts may be usefully employed for one or more cycles of polymerase chain reaction amplification of the fragmented DNA of interest.
  • the primers used for polymerase chain reaction amplification of the fragmented DNA of interest could again be used after the phenotype-determining fragment enrichment process to 'rescue' and clone the enriched fragments.
  • biotinylation and streptavidin capture are used both as an example and as the currently preferred embodiment for the invention
  • the streptavidin may be surface attached to inert particles (magnetic or otherwise) or to vessel walls (e.g. microtitre plate wells)
  • the biotin may be introduced via a deoxynucleotide triphosphate analogue using a polymerase; by using a biotin-conjugated primer and polymerase chain reaction amplification; or chemically or photochemically.
  • the use of biotin and streptavidin is not a limitation for the invention
  • the invention could equally be used with other high affinity capture systems well known to those skilled in the art (e.g. 'his tag' introduction and metal ion affinity capture).
  • 'abnormal' - used with respect to the term 'normal' - is used without limitation in order to denote a somatic cell (or somatic cells) with a discernible phenotypic characteristic (or characteristics) arising from the acquisition of a different somatic mutation (or set of somatic mutations) from that (or those) seen in the 'normal' counterpart.
  • Cells will most usually be considered 'abnormal' with respect to their 'normal' counterparts through the acquisition of a different somatic mutation (or set of somatic mutations) leading to one or more of the following phenotypic characteristics: altered marker gene expression, altered genomic organisation, growth under certain selective culture conditions, immortalised growth in culture, unrestrained growth in vivo or in vitro, failure of normal apoptotic control mechanisms in vivo or in vitro, induction of neovascularisation, escape of cells across epithelium, migratory cell survival or metastasis.
  • somatic mutation or set of somatic mutations
  • mismatch recognition protein is used without limitation to denote a protein of
  • Inter-population perfectly matched duplex depletion: In the inter-population perfectly matched duplex depletion approach, we compare (in fragmented form) the pooled DNA of 'affected' individuals with the pooled DNA of 'unaffected' individuals (both from populations as outbred and otherwise similar to each other as possible). We are only interested in those regions where differences occur between 'affected' and 'unaffected' DNA molecules.
  • Nucleotide diversity (defined as the expected number of nucleotide differences per site between a random pair of chromosomes drawn from the population) is 1/500 for DNA in general and 1/2,000 for coding sequence DNA. This means that, on average, any two DNA fragments annealed together from such a population will contain a mismatch every 500 bases.
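The arithmetic behind "a mismatch every 500 bases" can be made explicit. The sketch below assumes, beyond what the text states, that variable sites occur independently, so that the number of mismatches in a heteroduplex follows a Poisson distribution; under that assumption the chance that a fragment pair anneals as a perfect match falls quickly with fragment length.

```python
import math


def expected_mismatches(length_bp, diversity=1 / 500):
    """Mean number of mismatches in a heteroduplex of the given length."""
    return length_bp * diversity


def p_perfect_match(length_bp, diversity=1 / 500):
    """Poisson probability (an assumption) of zero mismatches in the duplex."""
    return math.exp(-expected_mismatches(length_bp, diversity))


for length in (256, 4096):  # mean 4 bp- and 6 bp-cutter fragment sizes
    print(length, expected_mismatches(length), round(p_perfect_match(length), 4))
```

Passing `diversity=1 / 2000` gives the corresponding figures for coding sequence DNA.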
  • DNA sequence variants are therefore very common. They are not, however, totally random - the variants that occur every 500 bases or so are limited; they are generally biallelic at just that single base. It is this fact that the inter-population perfectly matched duplex depletion approach selectively exploits
  • Each 6 bp cutter will cut DNA every 4,096 bp on average and each 4 bp cutter will cut DNA every 256 bp on average.
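These averages follow from treating DNA as a random sequence with equal base frequencies, so that a specific k-base site occurs with probability (1/4)^k per position. A minimal sketch generalising this to an assumed GC content (an illustration only; real genomes deviate from this model, for example through the under-representation of CpG):

```python
import math


def mean_cut_spacing(site, gc_fraction=0.5):
    """Expected bases between occurrences of `site` in random sequence
    with the given GC fraction (bases assumed independent)."""
    base_p = {
        "G": gc_fraction / 2, "C": gc_fraction / 2,
        "A": (1 - gc_fraction) / 2, "T": (1 - gc_fraction) / 2,
    }
    return 1 / math.prod(base_p[base] for base in site)


print(mean_cut_spacing("GGATCC"))  # 6 bp site (BamHI): 4096.0
print(mean_cut_spacing("GATC"))    # 4 bp site (MboI): 256.0
```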
  • Example sets of 6 bp cutters and 4 bp cutters that contain panels of six 6 bp cutters that are compatible with terminal restriction site profiling array (TRSPA) analysis are given in the following:
  • the invention provides a method of providing a mixture of DNA fragments enriched in fragments that are characteristic of a phenotype of interest, by providing affected DNA in fragmented form and providing unaffected DNA in fragmented form, which method comprises: a) mixing the fragments of the affected DNA and the fragments of the unaffected DNA under hybridising conditions; b) recovering a mixture of hybrids that contain mismatches; c) recovering fragments of the affected DNA from the mixture of hybrids that contain mismatches; and optionally repeating steps a), b) and c) one or more times.
  • mismatch-containing duplex selection can be achieved by attaching a mismatch-binding protein to a solid support (or using the mismatch-binding protein in solution followed by subsequent solid-phase capture), taking fragmented and denatured 'affected' DNA and hybridising this to an excess of fragmented, denatured and biotinylated 'unaffected' DNA with ensuing capture of mismatch-containing duplex molecules. Releasing the mismatch-containing duplex molecules without denaturation, streptavidin capture and then release of the non-biotinylated strands will give only the desired species as shown below.
  • Scheme: mismatch-binding protein selection; capture only the mismatch-containing duplexes; release without denaturation.
  • Inter-population mismatch-containing duplex selection as above ensures that all of the various phenotype-determining fragments (unique to the 'affected' population) are captured for subsequent analysis - but it also causes the co-purification of very many SNP-containing ('noise') fragments.
  • the non-polymorphic and SNP-containing ('noise') fragments will be depleted as described above.
  • allelic variants are deemed to be captured 'noise'.
  • mismatch-binding protein: for both the loss of general and specific SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles (described above) and where the SNP interferes with the pattern of restriction digestion, if the mismatch-binding protein also binds to duplex molecules with unequal lengths (e.g. from inter-population annealing around a site of restriction site polymorphism), then the above analysis still holds (with perfectly matched duplex being replaced by equal-length duplex and mismatch-containing duplex being replaced by unequal partner-length duplex).
  • TRSPA double terminal restriction site profiling array
  • the enriched DNA pool should contain many copies of all phenotype-determining fragments but also low numbers of copies of many different phenotype non-determining fragments.
  • the total number of 'noise' fragments may exceed the number of phenotype-determining fragments, despite the number of each individual 'noise' species being very small.
  • the 'noise' fragments would therefore increase the number of probes required for TRSPA analysis before a pattern emerges.
  • a further kinetic enrichment procedure is used. Either one or both of strategies A and B below can be employed to achieve 'kinetic' enrichment.
  • the enriched fragment pool from inter-population mismatch containing duplex depletion is rapidly self-hybridised - enabling the common phenotype-determining fragments to form perfectly matched duplexes with greater efficiency than the rare 'noise' fragments. Selection for perfectly matched duplexes then yields a selectively further enriched pool of fragments. Multiple cycles of subtraction could be carried out if necessary.
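Why the common fragments win the race in such a rapid self-hybridisation can be illustrated with idealised second-order reassociation kinetics, in which the single-stranded fraction of a species falls as C/C0 = 1/(1 + k·C0·t). The source does not give a kinetic model; this is the standard textbook approximation, and the rate constant, concentrations and time below are arbitrary illustrative values.

```python
def fraction_annealed(c0, t, k=1.0):
    """Fraction of a species in duplex after time t under ideal
    second-order reassociation (single-stranded fraction = 1/(1 + k*c0*t))."""
    return 1.0 - 1.0 / (1.0 + k * c0 * t)


# A common (phenotype-determining) species at 100x the concentration of a rare
# 'noise' species, both self-hybridised for the same short time.
common = fraction_annealed(c0=100.0, t=0.1)
rare = fraction_annealed(c0=1.0, t=0.1)
print(f"common: {common:.2f}, rare: {rare:.2f}")
```

Stopping the hybridisation early, while the rare species are still mostly single-stranded, is what makes the subsequent selection for perfectly matched duplexes enriching.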
  • the enriched fragment pool from inter-population mismatch containing duplex depletion is then hybridised to an excess of biotinylated DNA from the 'affected' pool.
  • This allows the common phenotype-determining fragments to form perfectly matched duplexes with greater efficiency than the rare 'noise' fragments.
  • Selection for perfectly matched duplexes followed by streptavidin capture and denaturation to release the non-biotinylated strands then yields a further enriched pool of fragments. Multiple such 'affected' pool back-hybridisations could be carried out if necessary.
  • each of us contains DNA sequence derived from our parents - our individuality resulting from precisely which parental alleles we receive. If one of the above small number of sequence changes results in a change in phenotype, then we can use inter-population perfectly matched duplex depletion to enrich for fragments encoding this change in phenotype. If we take 'unaffected total ancestral' cells (by which we mean cells derived from a complete set of 'unaffected' ancestors - e.g. both parents, or mother plus two paternal grandparents, or father plus two maternal grandparents, or two maternal grandparents and two paternal grandparents etc.
  • any fragments that have acquired phenotype-determining sequence changes between 'unaffected' ancestral generations and the 'affected' descendant generation will be unable to form perfectly matched duplexes with the biotinylated 'unaffected total ancestral' fragments. Successive cycles of such inter-population perfectly matched duplex depletion will thus lead to the enrichment of fragments carrying all such sequence changes - the degree of enrichment per cycle being as described below.
  • p and q are both 0.5.
  • the probability of obtaining a perfectly matched duplex between a biotinylated fragment and a non-biotinylated fragment containing the site is 0.5.
  • the probability of obtaining a mismatch-containing duplex between a biotinylated fragment and a non-biotinylated fragment containing the site is also 0.5.
  • the fraction of fragments carried through the first cycle of inter-population perfectly matched duplex depletion will therefore be
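The expression for this fraction is not reproduced in this text. Purely as a hypothetical illustration, suppose each depletion cycle independently retains a polymorphic 'noise' fragment with probability f (0.5 matches the mismatch probability in the bullet above) and always retains a phenotype-determining fragment. The sketch then tracks how the phenotype-determining share of the pool grows over cycles; both survival parameters and the starting share are assumptions, not values taken from the source.

```python
def signal_share(initial_share, cycles, noise_survival=0.5, signal_survival=1.0):
    """Fraction of the pool that is phenotype-determining after `cycles`
    rounds, assuming fixed per-cycle survival probabilities."""
    signal = initial_share * signal_survival ** cycles
    noise = (1.0 - initial_share) * noise_survival ** cycles
    return signal / (signal + noise)


# Starting from 1 phenotype-determining fragment per 1,000:
for cycles in (0, 5, 10):
    print(cycles, round(signal_share(0.001, cycles), 3))
```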
  • the invention provides a method of making a set of arrays of fragments of DNA of interest, which method comprises: a) selecting, from a set of n restriction endonuclease enzymes, a subset of r restriction endonuclease enzymes; b) digesting genomic DNA with the subset of r enzymes; c) ligating to the resulting fragments restriction-enzyme-cutting-site-specific adapters with unique polymerase chain reaction amplifiable sequences; d) splitting the resulting fragments into r² aliquots; e) amplifying each aliquot with two restriction-enzyme-specific primers; f) forming an array of the r² aliquots of the amplimers; and g) repeating steps a) to f) using a different subset of r restriction endonuclease enzymes.
  • the invention also includes sets of arrays obtained or obtainable by the method.
  • n restriction endonuclease enzymes may be selected from 4-cutters and 5-cutters and 6-cutters, and a set may include enzymes from one or two or three of these categories.
  • the value of n is preferably 3 to 10, for reasons discussed below.
  • the value of r is less than n and is preferably 2 to 4, chosen with reference to the frequency with which the chosen enzymes cut nucleic acids, and ease of fragment amplification by PCR.
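One reading of steps d) to f), consistent with the row/column language used for the TRSPA patterns, is that each array is an r x r grid whose element at row x, column y is amplified with a primer specific to enzyme x and a primer specific to enzyme y. A minimal sketch of that layout, using hypothetical enzyme labels A, B and C; the on-diagonal elements use the same enzyme at both ends.

```python
def aliquot_layout(subset):
    """Return the r x r grid of (row enzyme, column enzyme) primer pairs."""
    return [[(row, col) for col in subset] for row in subset]


grid = aliquot_layout(["A", "B", "C"])  # hypothetical subset with r = 3
for row in grid:
    print(row)
diagonal = [grid[i][i] for i in range(len(grid))]
```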
  • TRSPAs Terminal restriction site profiling arrays
  • restriction enzyme cutting sites are denoted A, B and C and the fragments after restriction digestion are denoted 1, 2, 3, 4 and 5. + and - denote the sense of the strands. Cut to completion with A, B and C to give
  • TRSPA-1 hybridisation pattern we should expect using a probe resulting from inter-population perfectly matched duplex depletion are
  • Hybridisation to an on-diagonal element e.g. the element at row z, column z.
  • A, B and C denote restriction enzyme cutting sites. 1, 2, 3, 4, 5, 6, 7, 8 and 9 denote the restriction fragments after digestion.
  • fragment complementary to fragment 4 will be any fragment complementary to fragment 4.
  • fragment complementary to fragment 5 will be identical to fragment 5
  • fragment complementary to fragment 6 will be identical to fragment 6
  • restriction enzyme cutting sites are denoted A, B and C and the fragments after restriction digestion are denoted 1, 2, 3, 4 and 5. + and - denote the sense of the strands. Cut to completion with A, B and C to give
  • TRSPA-2 hybridisation pattern we should expect using a probe resulting from inter-population perfectly matched duplex depletion are:
  • Hybridisation to an off-diagonal element e.g. row x, column y
  • Hybridisation to a whole row and column intersecting at an on-diagonal element e.g. all of row z and all of column z.
  • A, B and C denote restriction enzyme cutting sites. 1, 2, 3, 4, 5, 6, 7, 8 and 9 denote the restriction fragments after digestion.
  • fragment complementary to fragment 4 will be any fragment complementary to fragment 4.
  • fragment complementary to fragment 5 will be identical to fragment 5
  • fragment complementary to fragment 6 will be identical to fragment 6
  • the number of TRSPA spots arrayed: the total number of TRSPA spots is given by
  • the preferred scheme therefore employs six enzymes, tested in groups of three - giving 180 spots per experiment. In order to avoid restriction fragment length polymorphism problems, a duplicate analysis could be performed with a different set of enzymes sharing no cutting sites in common with the first set.
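The expression for the total number of spots is not reproduced above. One count consistent with the 180 figure, assuming one r x r array for every r-enzyme subset of the n-enzyme panel, is C(n, r) x r²; the sketch below checks it and shows how the spot count would change with r for a six-enzyme panel.

```python
from math import comb


def total_spots(n, r):
    """Spots if every r-subset of n enzymes gets its own r x r array."""
    return comb(n, r) * r * r


print(total_spots(6, 3))                         # 20 subsets x 9 = 180
print([total_spots(6, r) for r in range(2, 5)])  # r = 2, 3, 4: [60, 180, 240]
```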
  • the selection of suitable enzymes is an important factor. Ideally, two sets of different enzymes are required to eliminate the small possibility that a phenotype-determining polymorphism might fall within a chosen restriction site and therefore compromise the specificity of the resulting signature.
  • the selection of enzymes can be based upon a number of criteria: a) The enzymes should be 6 bp cutters. b) Cleavage by any selected enzyme should leave a 4 bp overhang at the 5' end. c) The selected enzymes in each set should all work efficiently under the same buffer conditions. d) The selected enzymes in each set should ideally work efficiently at a single incubation temperature.
  • e) The chosen enzymes should be commercially available, ideally at concentrations of 10 U/µl or more. f) The 5' overhangs left by any two enzymes in the same set should not be identical. g) No enzyme should appear in both sets for TRSPA fabrication. h) Enzymes should be selected to avoid or minimise the effects of mammalian methylation patterns. In particular, enzymes with CG dinucleotides in their recognition sites should be avoided unless the enzyme is known to be able to restrict m5CpG sites.
  • DNA is often methylated at the 5th position of cytosine in the sequence CpG, and this is the only chemical modification that DNA of vertebrates contains under physiological conditions.
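Criteria a), b), f) and h) can be checked mechanically from recognition sequences. The sketch below does so for the set 1 6 bp cutters named in Example 1a, using their standard recognition sites and top-strand cut positions (supplied here from general knowledge, not from this text). Criteria c), d), e) and g) need supplier data or a second set and are not modelled; BglII appears only as a hypothetical counter-example, since its GATC overhang duplicates that of BamHI.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


# name: (recognition site, bases 5' of the top-strand cut)
SET_1 = {
    "BamHI": ("GGATCC", 1), "BsrGI": ("TGTACA", 1), "HindIII": ("AAGCTT", 1),
    "NcoI": ("CCATGG", 1), "SpeI": ("ACTAGT", 1), "AflII": ("CTTAAG", 1),
}


def check_enzyme_set(enzymes):
    """Return a list of violations of criteria a), b), f) and h)."""
    problems = []
    seen = {}
    for name, (site, cut) in enzymes.items():
        if len(site) != 6:
            problems.append(f"{name}: not a 6 bp cutter (a)")
        # For a palindromic site cut symmetrically, len - 2*cut > 0 means a 5' overhang.
        if site != revcomp(site) or len(site) - 2 * cut != 4:
            problems.append(f"{name}: no 4 bp 5' overhang (b)")
        if "CG" in site:
            problems.append(f"{name}: CG in recognition site (h)")
        overhang = site[cut:len(site) - cut]
        if overhang in seen:
            problems.append(f"{name}: overhang {overhang} duplicates {seen[overhang]} (f)")
        seen.setdefault(overhang, name)
    return problems


print(check_enzyme_set(SET_1))  # -> []
```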
  • Enzyme selection method: Sixteen possible four base pair overhangs exist (excluding unusual enzymes with asymmetrical recognition sequences such as BssSI or BsiI), five of which contain CG in the sequence. A further four overhangs could potentially contain CG sequences within the restriction recognition site if preceded by a C and followed by a G. Enzymes are therefore preferentially selected from the remaining six groups.
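The count of sixteen follows because the 4 bp overhang left by a symmetric cut in a palindromic site is itself palindromic, so it is fixed by its first two bases (4² = 16 choices). A short enumeration confirming the sixteen overhangs and the five that contain CG; the further screening on flanking bases is not modelled here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_overhangs(length=4):
    """All palindromic sequences of the given even length."""
    return ["".join(half) + revcomp("".join(half))
            for half in product("ACGT", repeat=length // 2)]


overhangs = palindromic_overhangs()
with_cg = [o for o in overhangs if "CG" in o]
print(len(overhangs), with_cg)  # 16 ['ACGT', 'CCGG', 'CGCG', 'GCGC', 'TCGA']
```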
  • Enzymes to cleave sites with all the combinations of flanking bases are not available for all overhangs - hence the enzyme choice is more limited for some overhang groups than others.
  • TRSPA terminal restriction site profiling array
  • Pattern f (row 3 and column 3)
  • the invention provides a nucleic acid characterisation method which comprises presenting to the set of arrays as defined above a nucleic acid fragment of interest under hybridisation conditions, and observing a pattern of hybridisation.
  • a plurality of nucleic acid fragments of interest are separately presented to the set of arrays, and the resulting patterns of hybridisation are compared.
  • the plurality of nucleic acid fragments of interest are drawn from the mixture of DNA fragments, enriched in fragments that are characteristic of a phenotype of interest, as described under the invention (1) above.
  • the invention provides a method of identifying fragments of DNA that are characteristic of a phenotype of interest, which method comprises recovering, cloning and amplifying individual DNA fragments from the mixture of DNA fragments obtained under invention (1) above, presenting the individual DNA fragments to the set of arrays as defined above, under hybridisation conditions, observing a pattern of hybridisation generated by each individual DNA fragment, and subjecting to further investigation any two or more individual DNA fragments whose hybridisation patterns are similar or identical.
  • phenotype-determining fragments will be enriched but will not be entirely free from 'noise' fragments. Noise may result from unequal allelic frequencies for certain SNPs between the two populations. Noise will also result from the presence of somatic mutations in the cells used to prepare DNA fragments and from the use of polymerase chain reaction in some of the embodiments of the current invention.
  • DNA is prepared from a library of clones (either genomic clones or cDNA clones) - with inserts derived from the individual(s) and propagated in some appropriate host and cloning vector system. Restriction enzyme fragmentation is used prior to cloning and polymerase chain reaction amplification is used to prepare the DNA for comparison in fragmented form. Priming sites within the vector sequence flanking the cloned, restriction-enzyme-fragmented inserts are employed for one or more cycles of polymerase chain reaction amplification of the fragmented DNA of interest. The primers used for polymerase chain reaction amplification of the fragmented DNA of interest are again used after the phenotype-determining fragment enrichment process to 'rescue' and clone the enriched fragments.
  • associations can initially be drawn between TRSPA signatures and phenotype.
  • the clones giving rise to a particular TRSPA signature showing a useful association with a phenotype of interest can then be sequenced in order to determine at a DNA sequence level the association(s) with the phenotype of interest.
  • such associations have predictive value for the phenotype of interest, given the genotype, and will be of great value in medicine and pharmacogenetics.
  • the invention provides a double-stranded DNA molecule having the sequence a-A-b-B...X-y-Y-z, where A, B...X and Y are unique restriction sites for n different restriction endonuclease enzymes, and a, b...y, z denote distances in base pairs, characterised in that each fragment obtainable by cutting the DNA molecule by means of any one or more (up to n) of the restriction enzymes has a different length from every other fragment.
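The defining property can be checked directly. If every enzyme has exactly one site and any subset of enzymes may fail to cut, every stretch between two sites (or between a site and an end) is a possible product, so the requirement is that all pairwise distances between the sites and the two ends are distinct (a Golomb ruler, in combinatorial terms). A sketch, assuming cuts are treated as single points and overhangs are ignored; the distances in the usage lines are hypothetical.

```python
from itertools import combinations


def fragment_lengths(distances):
    """All fragment lengths obtainable from a linear molecule a-A-b-B-...-z,
    given the distances (a, b, ..., z) and assuming any subset of the single
    sites may be cut; the uncut full-length molecule is excluded."""
    marks = [0]
    for d in distances:
        marks.append(marks[-1] + d)
    lengths = [right - left for left, right in combinations(marks, 2)]
    lengths.remove(marks[-1])  # full-length molecule: nothing has cut
    return sorted(lengths)


def is_fully_diagnostic(distances):
    """True if no two obtainable fragments share a length."""
    lengths = fragment_lengths(distances)
    return len(lengths) == len(set(lengths))


print(is_fully_diagnostic([1, 3, 5, 2]))  # sites at 1, 4, 9 in an 11 bp molecule: True
print(is_fully_diagnostic([1, 1, 1]))     # equal spacing repeats lengths: False
```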
  • An example totally diagnostic internal control DNA which allows both the extent and exact nature of any example set 1 (or example set 2) 6 bp cutter partial digestion to be unambiguously determined for inter-population perfectly matched duplex depletion or TRSPA restriction:
  • A, B, C, D, E and F denote the sites for restriction enzyme cutting and t, u, v, w, x, y and z denote distances in base pairs.
  • This internal control DNA is either uniformly pre-labelled and added to the DNA of interest at an appropriate concentration prior to restriction or is Southern blot probed with a complementary sequence not found in the DNA of interest after restriction.
  • All six enzymes can cut in only one way.
  • Two enzymes can fail to cut in 6C2 = 15 ways; these are: AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF or EF failing to cut.
  • Three enzymes can fail to cut in 6C3 = 20 ways; these are: ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF or DEF failing to cut.
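These counts are binomial coefficients over the six-enzyme panel, and the failure patterns can be listed mechanically using the A to F labels above. Summing over the cases in which at least one enzyme still cuts gives 2⁶ - 1 = 63, which matches the number of analytical digests reported for pNW33.

```python
from itertools import combinations

ENZYMES = "ABCDEF"


def failure_patterns(k, enzymes=ENZYMES):
    """All ways in which exactly k enzymes of the panel fail to cut."""
    return ["".join(c) for c in combinations(enzymes, k)]


print(len(failure_patterns(2)), failure_patterns(2)[:3])  # 15 ['AB', 'AC', 'AD']
print(len(failure_patterns(3)))                            # 20
# Patterns in which 0 to 5 enzymes fail, i.e. at least one enzyme cuts:
print(sum(len(failure_patterns(k)) for k in range(0, 6)))  # 63
```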
  • Criteria for a successful outcome include the following
  • the inter-fragment spacing should be greater for larger fragments (so as to aid electrophoretic resolution).
  • Size gaps between bands comprising different numbers of inter-site units should be greater than the size gaps between bands comprising the same number of inter-site units.
  • the size gaps and size spread from largest to smallest fragment should be electrophoretically compatible.
  • An example totally diagnostic internal control DNA which allows both the extent and exact nature of any example set 1 (or example set 2) 4 bp cutter partial digestion to be unambiguously determined for inter-population perfectly matched duplex depletion:
  • A, B and C denote the sites for restriction enzyme cutting and t, u, v and w denote distances in base pairs.
  • This internal control DNA is uniformly pre-labelled and added to the DNA of interest at an appropriate concentration prior to restriction or is Southern blot probed with a complementary sequence not found in the DNA of interest after restriction.
  • each of the above possibilities will generate one or more fragments from the internal control DNA. If each possible fragment has a discernible size from any other (and from any of the fragments in simulation 7 above for the panel of up to 6 enzymes), then we can determine exactly which enzymes have cut and which have not from the size distribution of the fragments generated. The task is therefore to design such a DNA molecule.
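One brute-force way to approach this design task is sketched below under assumptions not in the source. Spacer lengths are drawn from a coarse hypothetical grid, every stretch between two sites or ends is treated as a possible product (as in the a-A-b-B...z definition), and "discernible" is approximated by a minimum size difference between neighbouring bands. A real design would also weigh gel resolution at different sizes and the other spacing criteria.

```python
from itertools import combinations, product


def fragment_lengths(distances):
    """Obtainable fragment lengths (uncut molecule excluded), sorted."""
    marks = [0]
    for d in distances:
        marks.append(marks[-1] + d)
    lengths = [right - left for left, right in combinations(marks, 2)]
    lengths.remove(marks[-1])
    return sorted(lengths)


def find_design(n_sites, spacer_choices, min_gap):
    """First spacer tuple (t, u, ...) whose fragments all differ by >= min_gap bp."""
    for distances in product(spacer_choices, repeat=n_sites + 1):
        lengths = fragment_lengths(distances)
        if all(b - a >= min_gap for a, b in zip(lengths, lengths[1:])):
            return distances
    return None


# Three sites (A, B, C) and four spacers (t, u, v, w) from a hypothetical 20-100 bp grid.
design = find_design(3, range(20, 101, 5), min_gap=10)
print(design, fragment_lengths(design) if design else None)
```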
  • Size gaps between bands comprising different numbers of inter-site units should ideally be greater than the size gaps between bands comprising the same number of inter-site units.
  • the size gaps and size spread from largest to smallest fragment should be electrophoretically compatible.
  • the largest fragment obtained should ideally be smaller than the smallest fragment obtained in simulation 7 above for the panel of up to six enzymes.
  • Simulation 1
  • simulation 6 clearly fulfils all of the requirements.
  • Three enzymes can fail to cut in 6C3 = 20 ways; these are: ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF or DEF failing to cut.
  • All six enzymes can fail to cut in only one way.
  • the entire set of internal control DNA limit and partial digestion patterns for a panel of up to three restriction enzymes can be determined as below.
  • Example 1a: The digestion internal control plasmid for the 6 bp cutter set 1 TRSPA enzymes BamHI, BsrGI, HindIII, NcoI, SpeI and AflII
  • the plasmid pNW33 (shown below) was constructed to contain an insert with all of the 6 bp cutter TRSPA enzyme sites.
  • BspEI sites define the outer ends of the 140 bp and the 200 bp fragments.
  • the full sequence for pNW33 is shown below:
  • the insert region of 1,190 bp and the short flanking regions to the vector junctions were sequenced twice in each direction in order to establish the plasmid sequence.
  • a total of 63 analytical restriction digests and one minus-enzyme control were performed as detailed in the following table:
  • the master mix was rapidly dispensed into PCR tubes. Thermal cycling was initiated using the following parameters: 30 cycles of 97°C for 1 min, 50°C for 2 min and 72°C for 3 min; then 72°C for 5 min; and then 4°C.
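For planning, the run time of such a cycling programme is simple arithmetic. The sketch encodes the programme above as (temperature °C, seconds) steps and sums the hold times; ramp times and the final 4°C hold are ignored, which is an assumption, so real block run times will be longer. The optional `initial` steps allow programmes with a pre-cycling phase to be timed the same way.

```python
# Programme from the text: 30 x (97°C 1 min, 50°C 2 min, 72°C 3 min), then 72°C 5 min.
CYCLED_STEPS = [(97, 60), (50, 120), (72, 180)]
N_CYCLES = 30
FINAL_STEPS = [(72, 300)]


def programme_minutes(cycled, n_cycles, final, initial=()):
    """Total hold time in minutes, excluding ramps and the indefinite 4°C hold."""
    seconds = sum(s for _, s in initial)
    seconds += n_cycles * sum(s for _, s in cycled)
    seconds += sum(s for _, s in final)
    return seconds / 60


print(programme_minutes(CYCLED_STEPS, N_CYCLES, FINAL_STEPS))  # 185.0
```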
  • BspEI digestion was carried out for 1 hour at 37°C in 2 ml of 0.5 U/µl BspEI in 1x NEB buffer #3.
  • the digest was then ethanol precipitated by the addition of 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol, chilling to -20°C and then centrifugation. Pellets were rinsed with 70 % ethanol prior to redissolution in 500 µl of 1x TE buffer. 1 µl of the BspEI-released internal control DNA was mixed with 10 µl of 50 % glycerol AGE loading dye and electrophoresed on a 1.5 % agarose gel to confirm that the size of the purified DNA was in accordance with that expected.
  • High molecular weight genomic DNA digestion in the presence of internal control DNA with a dilution series of a mixture of the six set 1 6 bp cutters
  • canine genomic DNA was first mixed with a dilution series of a mixture of the six set 1 6 bp cutters - each at the same number of units. Aliquots were then removed and mixed with the BspEI-released internal control DNA described above. 20 µg of canine genomic DNA was digested with 0.25, 0.025, 0.0025, 0.00025, 0.000025, 0.0000025 and 0 U/µl BamHI / BsrGI / HindIII / NcoI / SpeI / AflII in 200 µl.
  • a premix of restriction enzymes, buffer and BSA was prepared as detailed below:
  • the premix therefore contained each restriction enzyme at 2.5 U/µl in 1x NEB buffer #2 and 1x BSA. Serial 10-fold dilutions of this premix were then prepared in 1x NEB buffer #2 and 1x BSA.
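The seven enzyme concentrations listed for the canine DNA digests are consistent with a 10-fold serial dilution of the 2.5 U/µl premix, followed by a further 10-fold dilution into each digest, plus a no-enzyme control. That final 1:10 step is inferred from the numbers rather than stated, so the sketch below should be read as a consistency check only.

```python
import math


def serial_dilution(stock, fold, n_steps):
    """Concentrations of the stock and its successive `fold`-fold dilutions."""
    return [stock / fold ** i for i in range(n_steps)]


premixes = serial_dilution(2.5, 10, 6)          # U/µl of each enzyme in the premixes
in_digest = [c / 10 for c in premixes] + [0.0]  # assumed 1:10 into the digest, plus control
reported = [0.25, 0.025, 0.0025, 0.00025, 0.000025, 0.0000025, 0.0]
print(all(math.isclose(a, b, abs_tol=1e-15) for a, b in zip(in_digest, reported)))
```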
  • premixes therefore contain canine genomic DNA at 0.25 mg/ml in 1x NEB buffer #2 and 1x BSA.
  • Canine genomic DNA and restriction enzyme mixes were then set up as follows:
  • samples 1-7ic: 1 µl of BspEI-released internal control DNA and 1 µl of 2x NEB buffer #2 and 2x BSA to give samples 1-7ic. Samples 1-7ic were then overlaid with 50 µl of mineral oil in order to prevent evaporation.
  • digests 1-7: After digestion, 20 µl of digests 1-7 were mixed with 10 µl of 50 % glycerol AGE loading dye and 4 µl of digests 1-7ic were mixed with
  • Example 1b: the digestion internal control plasmid for the 4 bp cutter set 1 TRSPA enzymes HaeIII, MboI and MseI
  • the plasmid pNW35 (shown below) was constructed to contain an insert with all of the 4 bp cutter TRSPA enzyme sites.
  • HindIII and EcoRI sites define the outer ends of the 25 bp and the 40 bp fragments.
  • the sequence of pNW35 is shown below with the inserted region shown in bold type:
  • the 4 bp internal control plasmid for restriction enzymes HaeIII, MboI and MseI was prepared by the insertion of a synthetic 130 bp fragment into HindIII / EcoRI-digested pMOSblue (Amersham Pharmacia Biotech).
  • the insert region of 130 bp was sequenced twice in each direction in order to establish the plasmid sequence.
  • the presence of the restriction sites and the mobility of the fragments released were also checked by restriction digestion and polyacrylamide gel electrophoresis.
  • the PCR master mix was rapidly dispensed into 96 PCR tubes. Thermal cycling was initiated using the following parameters: 94°C for 2 min, 50°C for 2 min; 29 cycles of 72°C for 2 min, 94°C for 45 sec and 50°C for 1 min; 72°C for 8 min; and then 4°C.
  • the Dynabeads were then washed four times with 20 ml of 10 mM tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl. A fifth wash was performed in 20 ml of 1x buffer M.
  • EcoRI digestion was carried out for 1 hour at 37°C in 5 ml of 0.25 U/µl EcoRI in 1x buffer M. The digest was then divided into ten 500 µl aliquots. Each aliquot was ethanol precipitated by the addition of 1 µl of See DNA, 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol. The precipitations were mixed, chilled to 0°C on ice for 30 minutes and then centrifuged at 20,000 maxRCF for 10 minutes. The pellets were rinsed with 70 % ethanol before dissolving in a total of 500 µl of 1x TE buffer.
  • High molecular weight human placental DNA was mixed with a dilution series of a mixture of the three set 1 4 bp cutter restriction endonucleases, each at the same number of units. Aliquots were then removed and mixed with the EcoRI-released internal control DNA. 3.6 µg of human placental DNA (Sigma) was digested with 0.5 U/µl, 0.1 U/µl, 0.02 U/µl, 0.004 U/µl and 0 U/µl each of HaeIII, MboI and MseI in a total volume of 36 µl.
  • a premix of restriction enzymes (5 U/µl each enzyme), buffer and BSA was prepared as described below:
  • a 6x premix of human placental DNA, buffer and BSA was also prepared as described below: Component | 6x mix (µl) | per reaction (µl)
  • This premix contained placental DNA at 0.25 mg/ml in 1 x NEB buffer #2 and 1 x BSA.
  • restriction digests 1 -5ic were performed under the following conditions:
  • a Cambridge Electrophoresis Ltd. vertical protein electrophoresis unit was used with 1 mm plate spacing. The samples were electrophoresed at 30 mA for 2 hours in 1 x TBE. The gel was then stained for 30 minutes in 500 ml of 1 x TBE containing 50 ⁇ l of Vistra Green. The stained gel was imaged on a Fluorimager with the following settings: a 488 nm laser; a 570 DF 30 filter; a PMT setting of 700 V; 200 ⁇ m resolution; and low sensitivity.
  • the pNW33 BamHI, HindIII and AflII matrix was probed with a PCR product from the 140 bp BspEI (466) to BamHI (605) restriction fragment within pNW33.
  • the probe binds to a 204 bp HindIII to BamHI PCR product derived from the 160 bp HindIII (445) to BamHI (605) restriction fragment within pNW33 (see below).
  • to show that the probe hybridizes to an arrayed PCR fragment with the same restriction site at each end, the pNW33 HindIII, NcoI and SpeI matrix (matrix #17) was probed with a PCR product from the 140 bp BspEI (466) to BamHI (605) restriction fragment within pNW33.
  • the probe binds to a 514 bp HindIII to HindIII PCR product derived from the 470 bp HindIII (445) to HindIII (915) restriction fragment within pNW33 (see above).
  • HindIII adaptor: 5' pAGCTGTTATCAAGGAGCGAGAGTTATAT 3'
  • the samples were then vortex mixed, and incubated at 37°C overnight.
  • Short PCR primers and their cognate adaptors were annealed by adding 1 µl of 200 µM short PCR primer to 1 µl of 200 µM cognate adaptor in 20 µl of 50 mM NaCl, 1x TE buffer.
  • the mixed oligonucleotides were overlaid with 30 µl of light mineral oil and were then heated to 90°C for 5 minutes followed by slow cooling to room temperature.
  • the annealed short PCR primer / cognate adaptor complexes were then diluted with 1 ml of 1 x TE buffer and stored frozen at -20°C.
  • Ligations were performed in 100 µl of 1x ligase buffer containing 1 mM ATP and 10 µl (100 U) of T4 DNA ligase. 20 µl aliquots from the 1 ml of annealed short PCR primer / cognate adaptor complexes were added according to the following table.
  • Ligation reactions were carried out for 24 hours at 16°C. Samples were then diluted to 1 ml with TE buffer and stored at -20°C.
  • the diluted ligation reactions were then further diluted 1 in 10 and 10 µl was used as PCR template per 100 µl reaction.
  • the digest was ethanol precipitated by the addition of 1 µl of See DNA, 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol, chilling to -20°C, and then centrifugation.
  • the pellet was rinsed with 70 % ethanol prior to redissolution in 50 µl of 1x CIAP buffer containing 100 U of CIAP.
  • the CIAP digest was carried out for 5 hours at 37°C and was then made up to 400 µl with 1x TE buffer.
  • Phenol extraction of human placental DNA CIAP digest: the diluted CIAP digest was extracted with 400 µl of phenol and then ethanol precipitated as described above, again with a 100 % ethanol wash after the 70 % ethanol wash. The sample was finally redissolved in 10 µl of TE buffer.
  • Ligation to annealed short PCR primers and cognate adaptors:
  • Short PCR primers were annealed to their cognate adaptors as described above.
  • the ligation to annealed short PCR primers and cognate adaptors was carried out in 100 µl of 1x ligase buffer with 1 mM ATP, 100 U of T4 DNA ligase and 10 µl of each of the six short PCR primer / cognate adaptor complexes as above.
  • the ligation reaction was carried out for 24 hours at 16°C.
  • the sample was then diluted to 1 ml with 1x TE buffer and stored at -20°C. 0.2 µl was used as PCR template per 100 µl reaction.
  • An initial touch-down reaction was carried out in 50 µl of 1x PCR buffer with all four dNTPs at 200 µM and Taq DNA polymerase at 0.05 U/µl. Long PCR primers were used at 400 nM. 10 µl of pNW33 PCR template was used per reaction and 0.2 µl of human placental DNA PCR template was used per reaction. The samples were overlaid with 40 µl of light mineral oil and were touch-down thermocycled as described below:
  • 5 µl of 33P-labelled dCTP was added to the PCR tube. The reactions were then made up to 50 µl by the addition of 45 µl of the master mix to each tube. Each reaction was gently mixed.
  • 250 µl of the pooled PCR was mixed with an equal volume of 10 mg/ml streptavidin-coated colloidal Fe3O4 particles in 20 mM tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl.
  • the tube was incubated at room temperature for 1 hour with mixing on a Denley Orbital Mixer.
  • the streptavidin-coated colloidal Fe3O4 particles were then washed with 1 ml of 10 mM tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl. Three more identical washes were performed.
  • the washed streptavidin-coated colloidal Fe3O4 particles were incubated in 500 µl of 0.1 M NaOH for 10 minutes at room temperature. The supernatant was removed and added to 500 µl of 0.5 M HEPES.
  • the samples were then ethanol precipitated by the addition of 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol, chilling to 0°C (on ice for 30 minutes), and then centrifugation at 20,000 maxRCF for 10 minutes.
  • the pellets were rinsed with 70 % ethanol before dissolving in 100 µl of TE buffer.
  • Each membrane was placed in a 55 mm x 35 mm x 21 mm plastic box and 1.25 ml of pre-hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [Mw 500,000]; 0.3 % tetrasodium pyrophosphate; 100 µg/ml denatured, sonicated DNA; pre-warmed to 65°C) was added. Each box was closed and incubated at 65°C for 50 minutes on a rocking platform.
  • the pre-hybridization solution was removed and replaced with hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [Mw 500,000]; 0.3 % tetrasodium pyrophosphate; 100 µg/ml denatured, sonicated DNA; containing 5 µl of the appropriate 33P-labelled probe) and the box was incubated at 65°C for 3 hours on a rocking platform.
  • the membranes were drained and transferred to 200 ml of 2x SSC, 0.1 % SDS at 68°C for 30 minutes. A further wash was carried out in 0.2x SSC, 0.1 % SDS at 71°C for 30 minutes. The membranes were rinsed in 2x SSC at room temperature and laid out on blotting paper to remove excess liquid. Once dry, the membrane was covered in Saran
  • mismatch-containing duplex selection can be achieved by: attaching a mismatch-binding protein to a solid support (or using the mismatch-binding protein in solution followed by subsequent solid-phase capture); taking denatured 'affected' DNA fragments and hybridizing these to denatured and biotinylated 'unaffected' DNA fragments; and capturing mismatch-containing duplex molecules with the mismatch-binding protein. Releasing the mismatch-containing duplex molecules (without strand denaturation), streptavidin capture and then release of the non-biotinylated strands will
  • PCR fragments are prepared and used to demonstrate each of the individual steps for a single cycle of inter-population perfectly matched duplex depletion using E. coli MutS protein.
  • the clone inserts were constructed using standard cloning methodology well known to those skilled in the art and were inserted between the AvaI and EcoRI sites of pMOSBlue (Amersham Pharmacia Biotech).
  • the clone inserts contain a common 9 base pair internal core sequence in which a single nucleotide change or an insertion can be located.
  • the internal core sequence is derived from codons 272-274 of human p53. These codons (GTG CGT GGT) correspond to a mutational hotspot found in lung and other types of cancer (R273L).
  • the internal core sequence is flanked by a random sequence - allowing the independent detection of the clone #1 and the clone #7 insert sequence in a mixed population of clone inserts.
  • Taq DNA polymerase (5 U/µl) 100 µl
  • 96x 200 µl reactions were carried out for template #1C and 96x 200 µl reactions were carried out for template #7C on a 96-well Perkin Elmer Cetus GeneAmp PCR System 9600 machine as described below:
  • PCR products were pooled together and precipitated by adding 0.1 volumes of 3 M sodium acetate and 1 volume of isopropanol followed by centrifugation at 16,000 rpm for 30 minutes at 4°C (in a Centrikon T-2070 ultracentrifuge; swinging bucket Kontron rotor TST 41.14). Pellets were washed with 14 ml of ethanol and centrifuged at 20,000 rpm for 30 minutes. Finally, the pellets were air-dried and resuspended in a total volume of 0.6 ml of TE buffer.
  • the 0.6 ml PCR sample was purified in twelve Microspin S-300 HR columns (50 µl per column) following the manufacturer's protocol. Briefly, the resin in the columns was resuspended by vortexing. Columns were centrifuged at 735 x g (3000 rpm in a microfuge) for 1 minute. The sample was then applied to the centre of the resin, being careful not to disturb the bed. The columns were centrifuged at 735 x g for 2 minutes and the flow-through containing the PCR product was collected. The twelve eluted 50 µl volumes were pooled together (pool 1).
  • streptavidin-coated colloidal Fe3O4 particles were then washed with 20 ml of 10 mM tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl. Two more identical washes were performed. The washed streptavidin-coated colloidal Fe3O4 particles were finally incubated in 800 µl of 0.1 M NaOH for 10 minutes at room temperature. The supernatants were removed and added to 200 µl of 2 M HEPES (free acid). Samples were quantified by absorbance at 260 nm.
  • Reannealing between the upper strand of A and the single-stranded C will therefore give rise to an A/A mismatch-containing duplex, 5'-biotinylated on the upper strand, for clone insert #1.
  • Reannealing between the upper strand of B and the single-stranded D will therefore give rise to a perfectly matched duplex, 5'-biotinylated on the upper strand, for clone insert #7.
  • Samples were adjusted to 50 µl and were made 1x in PBS and 1 mg/ml in BSA ready for reaction with MutS protein-coated magnetic beads.
  • One 50 µl sample (the pre-enrichment control, sample 6) was used directly for capture of biotinylated PCR product strands and release of non-biotinylated strands.
  • 50 µl of the eluates from the MutS protein-coated magnetic beads and the pre-enrichment control were each mixed with an equal volume of 4 mg/ml streptavidin-coated colloidal Fe3O4 particles in 20 mM tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl.
  • the tubes were incubated at room temperature for 30 minutes with regular mixing.
  • streptavidin-coated colloidal Fe3O4 particles were then washed twice with 500 µl of 10 mM tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl at room temperature.
  • the washed streptavidin-coated colloidal Fe3O4 particles were incubated in 10 µl of 0.1 M NaOH for 10 minutes at room temperature. The supernatant was removed and added to 2.5 µl of 2 M HEPES (free acid).
  • #1 probe oligo and #7 probe oligo were radioactively 5' end-labelled using T4 polynucleotide kinase as described below (all volumes are in µl):
  • the reactions were incubated at 37°C for 30 minutes and then heated to 70°C for 5 minutes to denature the enzyme.
  • Each membrane was placed in a 55 mm x 35 mm x 21 mm plastic box and 2.5 ml of pre-hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [MW 500,000]; 0.3 % tetrasodium pyrophosphate; 100 µg/ml denatured, sonicated DNA; pre-warmed to 42°C) was added. Each box was closed and incubated at 42°C for 1 hour on a rocking platform.
  • the pre-hybridization solution was removed and replaced with hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [MW 500,000]; 0.3 % tetrasodium pyrophosphate; 100 µg/ml denatured, sonicated DNA; containing 2.5 µl of the appropriate 33P-labelled probe) and the box was incubated at 42°C overnight on a rocking platform.
  • the membranes were drained and transferred to 200 ml of 2x SSC, 0.1 % SDS at 42°C for 10 minutes. A further wash was carried out in 0.2x SSC, 0.1 % SDS at 42°C for 10 minutes. The membranes were rinsed in 2x SSC at room temperature and laid out on blotting paper to remove excess liquid. Once dry, the membrane was covered in Saran Wrap and exposed to a Kodak Phosphor Screen for 1 hour. The phosphor screen was subsequently imaged using a Molecular Dynamics Storm 860 Phosphorimager.
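The HindIII adaptor listed above begins with 5'-pAGCT. The sketch below shows how that end can be checked against the sticky end HindIII leaves. It computes the 5' overhang of a palindromic recognition site from its top-strand cut position and compares it with the adaptor's 5' end. It assumes the standard HindIII site A^AGCTT, which the protocol itself does not state. The helper names `reverse_complement` and `five_prime_overhang` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq))


def five_prime_overhang(site: str, cut: int) -> str:
    """5' overhang left by a palindromic site cut after `cut` top-strand bases.

    Because the site is palindromic, the bottom strand is cut at the mirror
    position, so the single-stranded overhang is the middle of the site.
    """
    if site != reverse_complement(site):
        raise ValueError("site is not palindromic")
    if 2 * cut >= len(site):
        raise ValueError("this cut gives a blunt end or a 3' overhang")
    return site[cut:len(site) - cut]


HINDIII_SITE, HINDIII_CUT = "AAGCTT", 1         # A^AGCTT (assumed standard site)
adaptor = "pAGCTGTTATCAAGGAGCGAGAGTTATAT"       # leading 'p' marks the 5' phosphate

overhang = five_prime_overhang(HINDIII_SITE, HINDIII_CUT)
compatible = adaptor.lstrip("p").startswith(overhang)
print(overhang, compatible)
```

The same function applies to the other palindromic 6 bp sites used in this section, given their cut positions.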

Abstract

A method is described for use in whole genome analysis. The method - termed inter-population perfectly matched duplex depletion - can overcome many of the limitations of current approaches based upon SNPs and linkage disequilibrium within isolated populations. Inter-population perfectly matched duplex depletion isolates a fragment (or fragments) containing differences between the 'affected' and 'unaffected' populations or cells. A convenient method - terminal restriction site profiling arrays (TRSPAs) - is described for the analysis of such fragments. A totally diagnostic internal control DNA is also described which allows both the extent and exact nature of any partial digestion to be unambiguously determined for inter-population perfectly matched duplex depletion or TRSPA restriction.

Description

GENETIC ANALYSIS
INTRODUCTION
Limitations of the current approaches
There are a number of limitations to carrying out association studies using single nucleotide polymorphisms (SNPs) and linkage disequilibrium within human populations (see Science, Vol. 278, p. 1580 (1997), for a review of such methods). We have no control over recombination frequency around a given locus or over past human genetic crossing. Some mutations will be closely correlated with nearby SNPs and others will not.
The need for whole genome analysis
With the SNP and linkage disequilibrium approach (and many others), markers are essentially used as a surrogate for sequencing: the more markers, the better. The logical endpoint of this argument is to look at every base in the human genome and carry out what could be termed a whole genome association study. In essence, the sequence at every base would be determined for the genome of each member of a phenotypically 'affected' and a phenotypically 'unaffected' population. Statistical correlations (associations) could then be drawn between sequence differences and phenotype. Such associations would have predictive value for the phenotype of interest, given the genotype, and could be of great value in medicine and pharmacogenetics.
The current invention selectively enriches for DNA fragments that determine phenotype in the 'affected' population and thus makes the prospect of carrying out whole genome association studies for humans and other species a very real possibility.
Definition of terms used with the current invention
Within the scope of the current invention, the individuals chosen for whole genome analysis may be human, animal or plant and they may be eukaryotic, prokaryotic or archaebacterial in origin. The terms 'affected' and 'unaffected' are used without limitation in order to categorise individuals into two groups: those that possess a defined phenotype of interest ('affected' individuals) and those that do not possess the phenotype of interest ('unaffected' individuals). The phenotype common to the 'affected' individuals may be either beneficial (e.g. for these individuals, a particular pharmaceutical entity might show high efficacy in a phase II clinical trial) or detrimental (e.g. for these individuals, a particular pharmaceutical entity might show adverse toxicology in a phase I clinical trial).
The 'affected' population may comprise one or more individuals and the 'unaffected' population may similarly comprise one or more individuals according to the particular embodiment of the invention (see below).
The term DNA is used throughout for simplicity. Within the scope of the current invention, the term DNA may equally well apply to all or part of the haploid, diploid or polyploid genomic DNA content of one or more germ line or somatic cell(s). The DNA may be extracted from cells taken directly from the individual(s), the DNA may be extracted from cells cultured or immortalised from the individual(s) or the DNA may be prepared from a library of clones, with inserts derived from the individual(s) and propagated in some appropriate host and cloning vector system. For the particular case wherein the term DNA refers to the expressed part of the haploid, diploid or polyploid genomic DNA content of one or more somatic cells and the DNA is prepared from a library of clones (with inserts derived from the individual(s) and propagated in some appropriate host and cloning vector system), a cDNA library (normalised or otherwise) may be used. In the current invention, DNA is compared in fragmented form. Fragmentation can be performed after DNA extraction, prior to cloning and/or after cloning. Restriction enzyme digestion is the preferred method for such fragmentation, though other methods (e.g. shearing or sonication) will be obvious to those skilled in the art.
For the particular case wherein the DNA is prepared from a library of clones (either genomic clones or cDNA clones), with inserts derived from the individual(s) and propagated in some appropriate host and cloning vector system, and wherein restriction enzyme fragmentation is used prior to cloning, polymerase chain reaction amplification can be used to prepare the DNA for comparison in fragmented form. Priming sites within the vector sequence flanking the cloned restriction enzyme fragmented inserts may be usefully employed for one or more cycles of polymerase chain reaction amplification of the fragmented DNA of interest. The primers used for polymerase chain reaction amplification of the fragmented DNA of interest could again be used after the phenotype-determining fragment enrichment process to 'rescue' and clone the enriched fragments.
Within the scope of the current invention, the terms biotinylation and streptavidin capture are used both as an example and as the currently preferred embodiment for the invention. The streptavidin may be surface attached to inert particles (magnetic or otherwise) or to vessel walls (e.g. microtitre plate wells). The biotin may be introduced via a deoxynucleotide triphosphate analogue using a polymerase, by using a biotin-conjugated primer and polymerase chain reaction amplification, or chemically or photochemically. The use of biotin and streptavidin is not a limitation for the invention. The invention could equally be used with other high affinity capture systems well known to those skilled in the art (e.g. 'his tag' introduction and metal ion affinity capture).
Within the scope of the current invention, the term 'abnormal' (used with respect to the term 'normal') is used without limitation in order to denote a somatic cell (or somatic cells) with a discernible phenotypic characteristic (or characteristics) arising from the acquisition of a different somatic mutation (or set of somatic mutations) from that (or those) seen in the 'normal' counterpart. Cells will most usually be considered 'abnormal' with respect to their 'normal' counterparts through the acquisition of a different somatic mutation (or set of somatic mutations) leading to one or more of the following phenotypic characteristics: altered marker gene expression, altered genomic organisation, growth under certain selective culture conditions, immortalised growth in culture, unrestrained growth in vivo or in vitro, failure of normal apoptotic control mechanisms in vivo or in vitro, induction of neovascularisation, escape of cells across epithelium, migratory cell survival or metastasis.
Within the scope of the current invention, the term mismatch recognition protein is used without limitation to denote a protein of eukaryotic, prokaryotic or archaebacterial origin capable of the selective recognition of (and binding to) a DNA duplex that is not perfectly matched along its entire length. Recognition and binding will preferably be for bases that are not engaged in correct Watson-Crick pairing and for small deletions or insertions. Many such proteins are known to those skilled in the art. Prokaryotic and eukaryotic mutS homologues, phage T4 endonuclease VII, phage T7 endonuclease I and the plant enzyme CEL-1 are just some examples.
Inter-population perfectly matched duplex depletion
In the inter-population perfectly matched duplex depletion approach, we compare (in fragmented form) the pooled DNA of 'affected' individuals with the pooled DNA of 'unaffected' individuals (both from populations as outbred and otherwise similar to each other as possible). We are only interested in those regions where differences occur between 'affected' and 'unaffected' DNA molecules. For populations as above, the only prevalent sequence differences within the 'affected' pool (compared to the 'unaffected' pool) should be somewhere within the gene(s) (using the term gene in its widest sense to include exons, introns and all associated upstream and downstream regulatory sequences) that actually determine(s) their common phenotype. This means that we are no longer tied into working with rare (and perhaps atypical) populations where there is high genetic homogeneity.
Pooling the DNA from entire phenotypically-defined populations massively reduces the amount of labour involved.
DNA sequence variation in populations
Nickerson et al., Nature Genetics, Vol. 19, p. 233 (1998), sequenced 9.7 kb of the lipoprotein lipase gene from 71 individuals (24 African-Americans, 24 Europeans and 23 European-Americans). This gene is fairly typical (90 % intron and 10 % exon; total size 30 kb with 10 exons). 88 sequence variants were found (i.e. one per 110 bp on average). Most variations were found in non-coding sequence. 90 % of these were SNPs (60 % of which were transitions and 40 % transversions). All of the SNPs were biallelic. 10 % of the sequence variants were insertions or deletions at repeat sequences. 58 % of the sequence variants were present in all three ethnic populations. Half of these were found in heterozygous form and half in homozygous form.
Nucleotide diversity (defined as the expected number of nucleotide differences per site between a random pair of chromosomes drawn from the population) is 1/500 for DNA in general and 1/2,000 for coding sequence DNA. This means that, on average, any two DNA fragments annealed together from such a population will contain a mismatch every 500 bases.
DNA sequence variants are therefore very common. They are not, however, totally random: the variants that occur every 500 bases or so are limited, and they are generally biallelic at just that single base. It is this fact that the inter-population perfectly matched duplex depletion approach selectively exploits.
Statistical analysis of the inter-population perfectly matched duplex depletion fragmentation process
If the length of the DNA fragments is F bases and the average length between sequence differences between any two DNA molecules is 500 bases, a hybrid duplex of F bases will contain on average F/500 mismatches. Treating these as randomly (Poisson) distributed, the probability that a hybrid duplex between any two random DNA fragments will contain no mismatches is given by
$\Pr(0) = e^{-F/500}$
and the probability that a hybrid duplex between any two random DNA molecules will contain one or more mismatches is given by
$\Pr(\geq 1) = 1 - e^{-F/500}$
Example values for Pr(0) and Pr(≥1) for different values of F are given below.
[Table: example values of Pr(0) and Pr(≥1) for different values of F]
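The sketch below recomputes example values of the two probabilities from the formulas above. It assumes one sequence difference per 500 bases on average, as in the text. The F values are illustrative choices and the output is not the patent's tabulated data. The function names are hypothetical.

```python
import math


def p_no_mismatch(F: float, spacing: float = 500.0) -> float:
    """Pr(0): Poisson probability of zero mismatches in an F-base duplex."""
    return math.exp(-F / spacing)


def p_any_mismatch(F: float, spacing: float = 500.0) -> float:
    """Pr(>=1): probability of one or more mismatches in an F-base duplex."""
    return 1.0 - p_no_mismatch(F, spacing)


for F in (50, 100, 200, 500, 1000):
    print(f"F = {F:5d}  Pr(0) = {p_no_mismatch(F):.3f}  Pr(>=1) = {p_any_mismatch(F):.3f}")
```

The `spacing` parameter is exposed so that the 1/2,000 nucleotide diversity quoted for coding sequence can be tried in the same way.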
Example average restriction fragment sizes for DNA digestion with six 6 bp cutters and up to four 4 bp cutters
Assuming random sequence with equal base frequencies, each 6 bp cutter will cut DNA every 4^6 = 4,096 bp on average and each 4 bp cutter will cut DNA every 4^4 = 256 bp on average.
For a given set of A 6 bp cutters and B 4 bp cutters (with no duplication of restriction enzyme cutting sequence), the average fragment length F will be
$F = \left(\frac{A}{4096} + \frac{B}{256}\right)^{-1}$
Example values for F as A and B are varied are given in the following table
[Table: values of F as A and B are varied]
We should note that the above situation assumes that none of the 4 bp cutter recognition sites lie within any of the 6 bp cutter recognition sites. If, for example, we have a 4 bp cutter recognition site nested within a 6 bp cutter recognition site (e.g. from the use of Mbol and BamHI in the fragmentation), then we should reduce the value of A from 6 to 5 in the above.
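Because cutting rates add, the average fragment length is the reciprocal of the summed rates. The sketch below rests on the same assumptions as the text: random sequence and no shared recognition sequences. `mean_fragment_length` is a hypothetical helper that takes (number of enzymes, mean spacing in bp) pairs. The last line applies the nested-site correction for MboI within BamHI, counting A as 5.

```python
from typing import Tuple


def mean_fragment_length(*groups: Tuple[int, float]) -> float:
    """F = 1 / sum(count / spacing) over groups of (count, mean spacing in bp)."""
    rate = sum(count / spacing for count, spacing in groups)
    if rate == 0:
        raise ValueError("at least one enzyme is required")
    return 1.0 / rate


SIX_CUTTER = 4 ** 6   # 4,096 bp between sites on average
FOUR_CUTTER = 4 ** 4  # 256 bp between sites on average

# Six 6 bp cutters (A = 6) plus B = 0..4 four bp cutters.
for B in range(5):
    F = mean_fragment_length((6, SIX_CUTTER), (B, FOUR_CUTTER))
    print(f"A = 6, B = {B}: F = {F:.1f} bp")

# A 4 bp site nested in a 6 bp site (e.g. MboI in BamHI): reduce A from 6 to 5.
print(f"A = 5, B = 1: F = {mean_fragment_length((5, SIX_CUTTER), (1, FOUR_CUTTER)):.1f} bp")
```

Passing further (count, spacing) pairs gives the general form for enzymes with other recognition-site lengths.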
In general, if we have a given set of A enzymes that cut DNA every a bp on average, B enzymes that cut DNA every b bp on average, ... , Z enzymes that cut DNA every z bp on average (with no duplication of restriction enzyme cutting sequence), the average fragment length F will be
$F = \left(\frac{A}{a} + \frac{B}{b} + \cdots + \frac{Z}{z}\right)^{-1}$
Example sets of 6 bp cutters and 4 bp cutters for fragmentation
Example sets of 6 bp cutters and 4 bp cutters that contain panels of six 6 bp cutters that are compatible with terminal restriction site profiling array (TRSPA) analysis (see below) are given in the following:
Example enzyme set 1
For restriction in M buffer + BSA
[Tables: example enzyme set 1]
Example enzyme set 2
For restriction in M buffer + BSA
[Tables: example enzyme set 2]
Aspect (I)
In one aspect the invention provides a method of providing a mixture of DNA fragments enriched in fragments that are characteristic of a phenotype of interest, by providing affected DNA in fragmented form and providing unaffected DNA in fragmented form, which method comprises: a) mixing the fragments of the affected DNA and the fragments of the unaffected DNA under hybridising conditions; b) recovering a mixture of hybrids that contain mismatches; c) recovering fragments of the affected DNA from the mixture of hybrids that contain mismatches; and optionally repeating steps a), b) and c) one or more times.
Inter-population mismatch-containing duplex selection
'Affected' versus 'unaffected' (i.e. inter-population) mismatch-containing duplex selection can be achieved by attaching a mismatch-binding protein to a solid support (or using the mismatch-binding protein in solution followed by subsequent solid-phase capture), taking fragmented and denatured 'affected' DNA and hybridising this to an excess of fragmented, denatured and biotinylated 'unaffected' DNA with ensuing capture of mismatch-containing duplex molecules. Releasing the mismatch-containing duplex molecules without denaturation, streptavidin capture and then release of the non-biotinylated strands will give only the desired species as shown below.
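Method
Fragment the 'affected' DNA.
Fragment the 'unaffected' DNA and derivatise it with biotin. Only DNA from this population will be streptavidin captured.
Melt and anneal to give 'affected'/'affected', 'affected'/'unaffected' and 'unaffected'/'unaffected' duplexes, each of which may be perfectly matched or mismatch-containing; only the 'unaffected' strands carry biotin.
Mismatch-binding protein select. Capture only the mismatch-containing duplexes. Release without denaturation to give the mismatch-containing duplexes of all three kinds.
Streptavidin capture to give only those mismatch-containing duplexes that carry at least one biotinylated ('unaffected') strand.
Release the non-biotinylated strands to give single-stranded 'affected' DNA from the mismatch-containing 'affected'/'unaffected' hybrids.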
Repeat as necessary.
Repetition of the above sequence of reactions will lead to inter-population perfectly matched duplex depletion.
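The selection steps above can be illustrated with a toy model. Each strand records its population of origin, whether it is biotinylated and an allele label. A duplex counts as mismatch-containing whenever its two strands carry different alleles. This is an idealisation that assumes perfectly specific mismatch binding and complete streptavidin capture. The `Strand` class and `one_cycle` function are hypothetical names, not part of the described method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Strand:
    population: str   # 'affected' or 'unaffected'
    allele: str       # allele label for this fragment
    biotin: bool      # only 'unaffected' strands are biotinylated


Duplex = Tuple[Strand, Strand]


def one_cycle(duplexes: List[Duplex]) -> List[Strand]:
    """Apply mismatch selection, streptavidin capture and strand release."""
    # Mismatch-binding protein select: keep duplexes whose alleles differ.
    mismatched = [d for d in duplexes if d[0].allele != d[1].allele]
    # Streptavidin capture: keep duplexes with at least one biotinylated strand.
    captured = [d for d in mismatched if d[0].biotin or d[1].biotin]
    # Release the non-biotinylated strands (these can only be 'affected').
    return [s for d in captured for s in d if not s.biotin]


aff_P = Strand("affected", "P", False)
aff_Q = Strand("affected", "Q", False)
unaff_P = Strand("unaffected", "P", True)
unaff_Q = Strand("unaffected", "Q", True)

annealed = [
    (aff_Q, unaff_P),    # affected/unaffected, mismatched: kept
    (aff_P, unaff_P),    # affected/unaffected, matched: lost at mismatch selection
    (aff_P, aff_Q),      # affected/affected, mismatched but no biotin: not captured
    (unaff_P, unaff_Q),  # unaffected/unaffected: captured, but nothing released
]
print(one_cycle(annealed))
```

Feeding the released strands back into a fresh annealing step with new biotinylated 'unaffected' strands corresponds to repeating the cycle.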
What will be purified?
Inter-population mismatch-containing duplex selection as above ensures that all of the various phenotype-determining fragments (unique to the 'affected' population) are captured for subsequent analysis - but it also causes the co-purification of very many SNP-containing ('noise') fragments.
We now need to consider the fate of the various types of fragment (i.e. those that determine the phenotype and those that do not) as we carry out inter-population perfectly matched duplex depletion cycles as above.
Recovery of 'affected' DNA molecules after streptavidin capture
For a particular fragment, if we have X molecules of 'affected' DNA and Y molecules of 'unaffected' biotinylated DNA, then after complete hybridisation the 'affected' molecules will be split in the ratio $Y/(X+Y)$ streptavidin-capturable (paired with a biotinylated strand) to $X/(X+Y)$ streptavidin-non-capturable. We can thus manipulate the yield of streptavidin-capturable hybrids by varying X and Y.
Example recovery and loss figures for various X and Y are shown in the following table
[Table: example recovery and loss figures for various X and Y]
After n cycles, the recovery for a phenotype-determining fragment will be given by $\{Y/(X+Y)\}^n$.
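A short sketch of the recovery expression follows. It assumes complete hybridisation and random pairing, as in the text. `recovery` is a hypothetical helper, and the Y/X ratios are illustrative rather than the values in the patent's table.

```python
def recovery(X: float, Y: float, n: int = 1) -> float:
    """Fraction of a phenotype-determining fragment's 'affected' molecules
    recovered after n cycles, {Y/(X+Y)}^n, given X 'affected' and Y
    biotinylated 'unaffected' molecules in each cycle."""
    if X < 0 or Y < 0 or X + Y == 0:
        raise ValueError("X and Y must be non-negative and not both zero")
    return (Y / (X + Y)) ** n


for excess in (1, 3, 9, 99):   # Y/X, the excess of 'unaffected' DNA
    print(f"Y/X = {excess:3d}: one cycle {recovery(1, excess):.3f}, "
          f"five cycles {recovery(1, excess, n=5):.3f}")
```

A larger excess of biotinylated 'unaffected' DNA pushes the per-cycle recovery towards 1. This is consistent with the hybridisation of 'affected' DNA to an excess of 'unaffected' DNA described earlier.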
Loss of general SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles
If we anneal fragmented DNA molecules together and capture only the mismatch-containing duplexes, then n repetitions of such a process will reduce the original number of fragments to the following fraction:
$$\left\{1-e^{-F/500}\right\}^{n}\left\{\frac{Y}{X+Y}\right\}^{n}$$
The enrichment for phenotype-determining fragments over SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles will therefore be given by
$$\left\{1-e^{-F/500}\right\}^{-n}$$
Example figures for enrichment are given below with F = 50, 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1,000 and n = 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
Figure imgf000015_0001
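The enrichment expression can be tabulated directly. This sketch assumes, as the formula implies, that F is the fragment length in base pairs. It also assumes that sequence variants fall at random with a mean spacing of 500 bp, so that $1-e^{-F/500}$ is the fraction of fragments carrying at least one variant. The capture yield {Y/(X+Y)}^n affects both fragment classes equally and so cancels out of the ratio.

```python
import math


def variant_fraction(f_bp: float, spacing_bp: float = 500.0) -> float:
    """Fraction of fragments of length f_bp carrying at least one variant,
    assuming variants are scattered at random with mean spacing spacing_bp."""
    return 1.0 - math.exp(-f_bp / spacing_bp)


def enrichment(f_bp: float, n: int, spacing_bp: float = 500.0) -> float:
    """{1 - e^(-F/500)}^(-n): phenotype-determining over 'noise' fragments."""
    return variant_fraction(f_bp, spacing_bp) ** (-n)


for f in (50, 100, 200, 500, 1000):
    row = "  ".join(f"{enrichment(f, n):10.3g}" for n in (1, 2, 5, 10))
    print(f"F={f:5d} bp, n=1/2/5/10: {row}")
```

Short fragments give the largest enrichment, because few of them carry a variant at all.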
Loss of specific SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles
The non-polymorphic and SNP-containing ('noise') fragments will be depleted as described above.
Not all fragments will, however, be depleted at the same rate. An individual SNP-containing ('noise') fragment will be depleted with every cycle of inter-population perfectly matched duplex depletion as outlined below.
Let us assume that there are two alleles for the polymorphism within a particular fragment. Let these be called P and Q. Let p be the fraction of the P allele in the (outbred) 'affected' and 'unaffected' populations and let q be the fraction of the Q allele, with p + q = 1, so that q = 1 - p.
After denaturation and annealing, the four possible events are PP, PQ, QP and QQ. PP and QQ will form perfectly matched duplexes and will therefore be lost - whereas PQ and QP will form mismatch-containing duplexes and will consequently be recovered.
After one cycle, the fraction of recovered molecules will be
$$\frac{2pq}{p^{2}+2pq+q^{2}} = \frac{2pq}{(p+q)^{2}} = 2pq = 2p(1-p)$$
Hence if we start out with M molecules of DNA from the population, there will be 2Mpq molecules remaining after the first round of inter-population mismatch-containing duplex selection. Let us denote the number of molecules entering the second round of inter-population mismatch-containing duplex selection by M', where
M' = 2Mpq
and the fraction of lost molecules will be
$$\frac{p^{2}+q^{2}}{p^{2}+2pq+q^{2}} = \frac{p^{2}+q^{2}}{(p+q)^{2}} = p^{2}+q^{2} = p^{2}+(1-2p+p^{2}) = 1-2p+2p^{2} = 1-2p(1-p) = 1-2pq$$
Expressed as fractions of all the molecules, the loss of P-allelic molecules will be $p^{2}$ and the loss of Q-allelic molecules will be $q^{2} = (1-p)^{2}$.
If we start out with M molecules of DNA from the population, there will be Mp of the P-allelic molecules and Mq = M(1 - p) of the Q-allelic molecules before inter-population mismatch-containing duplex selection. After inter-population mismatch-containing duplex selection, there will therefore be $Mp - Mp^{2} = Mp(1-p) = Mpq$ of the P-allelic molecules and $Mq - Mq^{2} = Mq(1-q) = Mqp$ of the Q-allelic molecules. In other words, after the first round of mismatch-containing duplex selection, there will be the same number of P-allelic molecules as Q-allelic molecules.
We can define new allelic frequencies p' and q' as follows
p' = q' = 0.5
If we now perform a second round of inter-population mismatch-containing duplex selection, we start out with M' molecules of DNA from the first round. There will be M'p' of the P-allelic molecules and M'q' of the Q-allelic molecules before inter-population mismatch-containing duplex selection. After inter-population mismatch-containing duplex selection, there will therefore be $M'p' - M'p'p = M'p'(1-p) = M'p'q$ of the P-allelic molecules and $M'q' - M'q'q = M'q'(1-q) = M'q'p$ of the Q-allelic molecules. The total number of molecules (M'') will be
$$M'' = M'p'q + M'q'p = M'(p'q + q'p)$$
but p' = q' = 0.5, hence
$$M'' = 0.5M'(p+q)$$
but p + q = 1, so
$$M'' = M'/2$$
In other words, the second round of inter-population mismatch-containing duplex selection halves the number of molecules, whatever the value of p. The allelic counts after this round are 0.5M'q P-allelic and 0.5M'p Q-allelic molecules, however, so they remain equal only when p = q = 0.5. The general pattern is as follows. After the first cycle of inter-population mismatch-containing duplex selection, the allelic counts are equalised and the number of molecules is reduced to 2Mpq (Mpq of each allelic molecule). In each later cycle, a P-allelic molecule is retained with probability q and a Q-allelic molecule with probability p, because the biotinylated 'unaffected' DNA keeps its original allelic frequencies. Consequently, after n cycles, the number of P-allelic molecules will be
$$Mpq\,q^{\,n-1}$$
and the number of Q-allelic molecules will be
$$Mqp\,p^{\,n-1}$$
For n ≥ 2, these are equal, and every cycle after the first halves the number of molecules, only when p = q = 0.5; both then reduce to $Mpq(0.5)^{n-1}$.
If we now take the capture yield (see above) into consideration, the SNP-containing ('noise') fragment yield will be given by
$$Mpq\left(p^{\,n-1}+q^{\,n-1}\right)\left(\frac{Y}{X+Y}\right)^{n}$$
which reduces to $2Mpq(0.5)^{n-1}\{Y/(X+Y)\}^{n}$ when p = q = 0.5. In both expressions, both allelic variants are deemed to be captured 'noise'.
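The allele bookkeeping above can be checked numerically. This minimal sketch uses the same assumptions as the derivation: complete random annealing, and fresh biotinylated 'unaffected' DNA with allele frequencies p and q in every cycle. It tracks the P- and Q-allelic counts separately, so its output can be compared with the closed-form expressions. M = 1000 and p = 0.2 or 0.5 are illustrative values, and the function name is hypothetical.

```python
def noise_counts(m: float, p: float, n: int) -> tuple[float, float]:
    """(P-allelic, Q-allelic) 'affected' molecules left after n cycles of
    inter-population mismatch-containing duplex selection, capture yield ignored.

    Assumes complete random annealing against fresh biotinylated 'unaffected'
    DNA whose allele frequencies stay at p and q = 1 - p in every cycle.
    """
    q = 1.0 - p
    n_p, n_q = m * p, m * q  # 'affected' molecules before cycle 1
    for _ in range(n):
        # A P strand survives only if it pairs with a Q strand (probability q),
        # and a Q strand only if it pairs with a P strand (probability p).
        n_p, n_q = n_p * q, n_q * p
    return n_p, n_q


for cycles in (1, 2, 3):
    print(cycles, noise_counts(1000, 0.2, cycles), noise_counts(1000, 0.5, cycles))
```

With p = 0.2, the totals are 320 after one cycle and 160 after two (M' = 2Mpq, M'' = M'/2). With p = 0.5, the two allelic counts stay equal in every cycle.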
Polymorphisms that interfere with the pattern of restriction digestion
For both the loss of general and specific SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles (described above) and where the SNP interferes with the pattern of restriction digestion, if the mismatch-binding protein also binds to duplex molecules with unequal lengths (e.g. from inter-population annealing around a site of restriction site polymorphism), then the above analysis still holds (with perfectly matched duplex being replaced by equal length duplex and mismatch-containing duplex being replaced by unequal partner-length duplex).
In the rare cases where a restriction site is lost due to a sequence change that actually determines the phenotype of interest, a double-length fragment will be obtained. This will give rise to a double terminal restriction site profiling array (TRSPA) signature (see below). Multiple isolates of the particular double signature will be indicative of an association between the fragment and the phenotype of interest.
Further 'kinetic' enrichment to enhance the selective removal of SNP-containing 'noise' from the pool of phenotype-determining fragments
After multiple cycles of enrichment by the above procedure, the enriched DNA pool should contain many copies of all phenotype-determining fragments but also low numbers of copies of many different phenotype non-determining fragments. The total number of 'noise' fragments may exceed the number of phenotype-determining fragments, despite the number of each individual 'noise' species being very small. The 'noise' fragments would therefore increase the number of probes required for TRSPA analysis before a pattern emerges. To largely eliminate this problem, a further kinetic enrichment procedure is used. Either one or both of strategies A and B below can be employed to achieve 'kinetic' enrichment.
Strategy A - Subtraction of the enriched DNA from inter-population perfectly matched duplex depletion
The enriched fragment pool from inter-population perfectly matched duplex depletion is rapidly self-hybridised. This enables the common phenotype-determining fragments to form perfectly matched duplexes with greater efficiency than the rare 'noise' fragments. Selection for perfectly matched duplexes then yields a selectively further enriched pool of fragments. Multiple cycles of subtraction could be carried out if necessary.
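The reason a rapid self-hybridisation favours the common fragments can be illustrated with the standard idealised second-order (C0t) model of reassociation. This model is our assumption here and is not given in the text. Under it, for a species at starting strand concentration C0, the fraction still single-stranded after time t is 1/(1 + kC0t). The sketch also assumes that every reannealed duplex is perfectly matched. The rate constant and concentrations are arbitrary illustrative values.

```python
def fraction_reannealed(k: float, c0: float, t: float) -> float:
    """Duplex fraction for one species under ideal second-order kinetics,
    where the single-stranded fraction is 1 / (1 + k*C0*t)."""
    x = k * c0 * t
    return x / (1.0 + x)


def kinetic_enrichment(c_common: float, c_noise: float, k: float, t: float) -> float:
    """Ratio of duplex fractions for a common and a rare species after time t."""
    return fraction_reannealed(k, c_common, t) / fraction_reannealed(k, c_noise, t)


# Arbitrary units: the common species is 100-fold more abundant than the noise.
for t in (0.1, 1.0, 10.0, 1000.0):
    print(f"t={t:g}: enrichment {kinetic_enrichment(10.0, 0.1, 1.0, t):.2f}")
```

The advantage shrinks as t grows, because both classes eventually reanneal almost completely. Under this model, that is why the self-hybridisation needs to be kept short.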
Strategy B - Hybridisation of the enriched DNA from inter-population perfectly matched duplex depletion against the 'affected' DNA pool
The enriched fragment pool from inter-population perfectly matched duplex depletion is then hybridised to an excess of biotinylated DNA from the 'affected' pool. This allows the common phenotype-determining fragments to form perfectly matched duplexes with greater efficiency than the rare 'noise' fragments. Selection for perfectly matched duplexes, followed by streptavidin capture and denaturation to release the non-biotinylated strands, then yields a further enriched pool of fragments. Multiple such 'affected' pool back-hybridisations could be carried out if necessary.
Extension of the invention to the case of single phenotypically 'affected' individuals within populations where the distinction between 'affected' and 'unaffected' is clear
The above has described inter-population perfectly matched duplex depletion between non-biotinylated DNA fragments from an 'affected' population and biotinylated DNA fragments from an 'unaffected' population. Provided the 'unaffected' population is sufficiently complex that it contains all the non-phenotype-determining sequence variants found in a single 'affected' individual, inter-population perfectly matched duplex depletion should be possible for single phenotypically 'affected' individuals against an 'unaffected' population where the distinction between 'affected' and 'unaffected' is clear. The latter proviso is needed to ensure that a small number of misdiagnosed 'affected' individuals in the 'unaffected' population do not cause the removal of phenotype-determining fragments during inter-population perfectly matched duplex molecular depletion.
Extension of the invention to the case of disease gene identification in cases where novel phenotype-determining mutations arise spontaneously within a family
Except for a small number of sequence changes, each of us contains DNA sequence derived from our parents, our individuality resulting from precisely which parental alleles we receive. If one of these few sequence changes results in a change in phenotype, then we can use inter-population perfectly matched duplex depletion to enrich for fragments encoding this change in phenotype. We take 'unaffected total ancestral' cells as the source of our biotinylated fragments. By this we mean cells derived from a complete set of 'unaffected' ancestors, e.g. both parents, or mother plus two paternal grandparents, or father plus two maternal grandparents, or two maternal grandparents and two paternal grandparents, etc. Cells from an 'affected' descendant serve as the source of our non-biotinylated fragments. Any fragments that have acquired phenotype-determining sequence changes between the 'unaffected' ancestral generations and the 'affected' descendant generation will then be unable to form perfectly matched duplexes with the biotinylated 'unaffected total ancestral' fragments. Successive cycles of such inter-population perfectly matched duplex depletion will thus lead to the enrichment of fragments carrying all such sequence changes, the degree of enrichment per cycle being as described below.
Statistical analysis
Let us assume that equal numbers of fragments are used from each of the 'unaffected' ancestors. Let the number of such ancestors be A.
{1/A} of the annealings in the inter-population perfectly matched duplex depletion will be self-against-ancestral-transmitted alleles - statistically equivalent to self-against-self inter-population perfectly matched duplex depletion (see below).
{(A-1)/A} of the annealings in the inter-population perfectly matched duplex depletion will be self-against-ancestral-nontransmitted alleles, which is statistically equivalent to inter-population perfectly matched duplex depletion between unrelated individuals.
Figure imgf000022_0001
From the data of Nickerson et al., a DNA sequence variation between unrelated individuals should occur every 500 base pairs. This is the figure we use for inter-population perfectly matched duplex depletion with self-against-ancestral-nontransmitted alleles and for inter-population perfectly matched duplex depletion between unrelated individuals. In addition, a given individual should be heterozygous once every 573 base pairs. This figure is used for inter-population perfectly matched duplex depletion with self-against-ancestral-transmitted alleles and for self-against-self inter-population perfectly matched duplex molecular depletion. Inter-population perfectly matched duplex depletion against the transmitted alleles and the nontransmitted alleles will now be considered separately.
Self-against-ancestral-transmitted alleles inter-population perfectly matched duplex depletion
If we anneal the two complementary strands for a fragment using DNA from an 'affected' descendant and DNA containing the transmitted alleles of 'unaffected' ancestors, then a fraction $1-e^{-F/573}$ of fragments will contain one or more sites of heterozygosity. A fraction 1/A of the annealings will be of this type. For such annealings, where a site of heterozygosity is present, the probability of obtaining a mismatch-containing duplex between a biotinylated fragment and a non-biotinylated fragment containing the site of heterozygosity is 0.5.
Self-against-ancestral-nontransmitted alleles inter-population perfectly matched duplex depletion
If we anneal the two complementary strands for a fragment using DNA from an 'affected' descendant and DNA containing the nontransmitted alleles of 'unaffected' ancestors, then a fraction $1-e^{-F/500}$ of fragments will contain one or more sites of DNA sequence variation. A fraction (A-1)/A of the annealings will be of this type.
Phenotype-determining fragment enrichment
The fraction of fragments carried through the first cycle of inter-population perfectly matched duplex depletion will therefore be
$$\left(\left[0.5\cdot\frac{1}{A}\cdot\left\{1-e^{-F/573}\right\}\right]+\left[\frac{A-1}{A}\cdot\left\{1-e^{-F/500}\right\}\right]\right)\cdot\frac{Y}{X+Y}$$
where {Y/(X+Y)} represents the yield of streptavidin-capturable molecules.
Hence n repetitions of such a process will reduce the original number of fragments to the following fraction
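$$\left(\left[0.5\cdot\frac{1}{A}\cdot\left\{1-e^{-F/573}\right\}\right]+\left[\frac{A-1}{A}\cdot\left\{1-e^{-F/500}\right\}\right]\right)^{n}\left\{\frac{Y}{X+Y}\right\}^{n}$$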
The enrichment for phenotype-determining fragments over SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles will therefore be given by
$$\left(\left[0.5\cdot\frac{1}{A}\cdot\left\{1-e^{-F/573}\right\}\right]+\left[\frac{A-1}{A}\cdot\left\{1-e^{-F/500}\right\}\right]\right)^{-n}$$
Example figures for enrichment are given for A = 2, 3 and 4 below with F = 50, 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1,000 and n = 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
For A=2
Figure imgf000025_0001
For A=3
Figure imgf000025_0002
For A=4
Figure imgf000026_0001
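The enrichment expression for A 'unaffected' ancestors can be evaluated with a short sketch. It takes F as the fragment length in base pairs and uses the 573 bp and 500 bp spacings quoted above from Nickerson et al. The function names are hypothetical, and the loop values are chosen only for illustration.

```python
import math


def carried_fraction(f_bp: float, a: int) -> float:
    """Fraction of 'noise' fragments of length f_bp carried through one cycle
    against A 'unaffected' ancestors (capture yield excluded)."""
    transmitted = 0.5 * (1.0 / a) * (1.0 - math.exp(-f_bp / 573.0))
    nontransmitted = ((a - 1) / a) * (1.0 - math.exp(-f_bp / 500.0))
    return transmitted + nontransmitted


def ancestral_enrichment(f_bp: float, n: int, a: int) -> float:
    """Enrichment after n cycles: the carried fraction raised to the power -n."""
    return carried_fraction(f_bp, a) ** (-n)


for a in (2, 3, 4):
    row = "  ".join(f"{ancestral_enrichment(f, 5, a):10.3g}" for f in (100, 500, 1000))
    print(f"A={a}, n=5, F=100/500/1000 bp: {row}")
```

Adding ancestors raises the weight (A-1)/A of the nontransmitted term. That term has both the denser variant spacing and the higher mismatch probability, so the enrichment per cycle falls as A grows.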
Extension of the invention to the case of fully comprehensive 'abnormal' cell mutational profiling within an individual
If we now take 'normal' cells as the source of our biotinylated fragments and 'abnormal' cells from the same individual as the source of our non-biotinylated fragments, any fragments that have acquired sequence changes on the way to becoming 'abnormal' will be unable to form perfectly matched duplexes with the biotinylated 'normal' fragments. Successive cycles of such inter-population perfectly matched duplex depletion will thus lead to the enrichment of fragments carrying all those sequence differences between the 'normal' cells and the 'abnormal' cells - the degree of enrichment per cycle being as described below.
From the data of Nickerson et al, a given individual should be heterozygous about once every 573 base pairs.
If we anneal the two complementary strands for a fragment using DNA from 'abnormal' cells and DNA from 'normal' cells, then a fraction $1-e^{-F/573}$ of fragments will contain one or more sites of heterozygosity.
For a heterozygous site, p and q are both 0.5. The probability of obtaining a perfectly matched duplex between a biotinylated fragment and a non-biotinylated fragment containing the site is 0.5. Similarly, the probability of obtaining a mismatch-containing duplex between a biotinylated fragment and a non-biotinylated fragment containing the site is also 0.5. The fraction of fragments carried through the first cycle of inter-population perfectly matched duplex depletion will therefore be
$$0.5\cdot\left\{1-e^{-F/573}\right\}\cdot\frac{Y}{X+Y}$$
Hence, n repetitions of such a process will reduce the original number of fragments to the following fraction
$$\left[0.5\cdot\left\{1-e^{-F/573}\right\}\right]^{n}\left\{\frac{Y}{X+Y}\right\}^{n}$$
The enrichment for phenotype-determining fragments over SNP-containing ('noise') fragments during inter-population perfectly matched duplex depletion cycles will therefore be given by
Figure imgf000027_0001
Example figures for enrichment are given below with F = 50, 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1,000 and n = 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
Figure imgf000028_0001
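The enrichment formula for this case appears only as an image in the source. By the same argument as in the earlier cases (recovery {Y/(X+Y)}^n divided by the fraction of 'noise' fragments carried through), we assume it is $[0.5\{1-e^{-F/573}\}]^{-n}$. The sketch below evaluates that assumed expression, and the function name is hypothetical.

```python
import math


def somatic_enrichment(f_bp: float, n: int, het_spacing_bp: float = 573.0) -> float:
    """Assumed enrichment [0.5 * {1 - e^(-F/573)}]^(-n) for 'abnormal' versus
    'normal' cells of one individual (heterozygous about every 573 bp)."""
    carried = 0.5 * (1.0 - math.exp(-f_bp / het_spacing_bp))
    return carried ** (-n)


for f in (50, 200, 573, 1000):
    print(f, [round(somatic_enrichment(f, n), 1) for n in (1, 2, 3)])
```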
This approach will enrich for fragments containing all sequence differences within the 'abnormal' cells. No prior knowledge of the genes that may need to be investigated (oncogenes, tumour suppressor genes etc.) is required.
Having thus isolated a fragment (or fragments) determining differences between the 'affected' and 'unaffected' populations, we can now proceed to their analysis.
Aspect (II)
In another aspect the invention provides a method of making a set of arrays of fragments of DNA of interest, which method comprises:
a) selecting, from a set of n restriction endonuclease enzymes, a subset of r restriction endonuclease enzymes;
b) digesting genomic DNA with the subset of r enzymes;
c) ligating to the resulting fragments restriction-enzyme-cutting-site-specific adapters with unique polymerase chain reaction amplifiable sequences;
d) splitting the resulting fragments into r² aliquots;
e) amplifying each aliquot with two restriction-enzyme-specific primers;
f) forming an array of the r² aliquots of the amplimers; and
g) repeating steps a) to f) using a different subset of r restriction endonuclease enzymes.
The invention also includes sets of arrays obtained or obtainable by the method.
The n restriction endonuclease enzymes may be selected from 4-cutters, 5-cutters and 6-cutters, and a set may include enzymes from one, two or three of these categories. The value of n is preferably 3 to 10, for reasons discussed below. The value of r is less than n and is preferably 2 to 4. It is chosen with reference to the frequency with which the chosen enzymes cut nucleic acids and to the ease of fragment amplification by PCR.
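The bookkeeping of steps a) to g) can be sketched as follows. Every r-enzyme subset is one round of the method, and within it each of the r² aliquots is defined by an ordered pair (j, k) of tag-specific primers. The helper name and the enzyme labels A, B and C are hypothetical placeholders.

```python
from itertools import combinations, product


def plan_arrays(enzymes: list[str], r: int) -> dict[tuple[str, ...], list[tuple[str, str]]]:
    """Hypothetical planning helper for steps a) to g): map each r-enzyme
    subset to its r*r aliquots, each labelled by an ordered primer pair (j, k)."""
    return {subset: list(product(subset, repeat=2))
            for subset in combinations(enzymes, r)}


plan = plan_arrays(["A", "B", "C"], 2)
for subset, aliquots in plan.items():
    print(subset, aliquots)
```

With n = 3 and r = 2, this gives the three subsets AB, AC and BC, each with four aliquots, as in the worked examples below.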
Terminal restriction site profiling arrays (TRSPAs)
If we use a total of n 6 bp cutter restriction enzymes within the total set of enzymes used for fragmentation, we use subsets of r 6 bp cutter enzymes (taken from the total set of n) to make (rxr) TRSPA test matrices as follows.
Version 1 - (TRSPA-1)
For each (rxr) TRSPA test matrix, digest DNA to completion with all r restriction enzymes. Ligate on restriction-enzyme-cutting-site-specific adaptors with unique polymerase chain reaction amplifiable tags.
Split into r² aliquots. For each aliquot, amplify with a biotinylated restriction enzyme_j tag-specific primer and a non-biotinylated restriction enzyme_k tag-specific primer, and array the non-biotinylated strands, for all values of j and k between 1 and r.
Example
Consider the following dsDNA
..[A]-1-[B]-2-[B]-3-[C]-4-[A]-5-[B].. (+)
..[A]-1-[B]-2-[B]-3-[C]-4-[A]-5-[B].. (-)
where the restriction enzyme cutting sites are denoted A, B and C and the fragments after restriction digestion are denoted 1, 2, 3, 4 and 5; + and - denote the sense of the strands.
Cut to completion with A, B and C to give
A--1--B  B--2--B  B--3--C  C--4--A  A--5--B  (+)
A--1--B  B--2--B  B--3--C  C--4--A  A--5--B  (-)
Polymerase chain reaction amplify according to the following primer matrix and streptavidin capture to give
Figure imgf000031_0001
Keep only the non-biotinylated strands to give
Figure imgf000031_0002
In simplified form
Figure imgf000032_0001
Repeat for all nCr combinations of restriction enzymes to generate the TRSPA.
TRSPA-1 hybridisation patterns
The two types of TRSPA-1 hybridisation pattern we should expect using a probe resulting from inter-population perfectly matched duplex depletion are
1. Hybridisation to an off-diagonal element (e.g. row x, column y - where x and y are different) and its complementary element reflected across the diagonal (i.e. row y, column x) and
2. Hybridisation to an on-diagonal element (e.g. the element at row z, column z).
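The two TRSPA-1 patterns follow from a simple rule, sketched below under our own conventions. Rows index the enzyme of the biotinylated primer (j) and columns index the enzyme of the non-biotinylated primer (k). A fragment flanked by sites x and y is amplified only in aliquots whose two primers match its two ends. Each such aliquot keeps one non-biotinylated strand, and the probe is assumed to contain both strands of the fragment. The function name is hypothetical.

```python
def trspa1_elements(ends: tuple[str, str],
                    enzymes: tuple[str, ...]) -> set[tuple[str, str]]:
    """Return the (row j, column k) elements of one TRSPA-1 test matrix that a
    probe hybridises to, for a fragment whose terminal sites are `ends`.

    Hypothetical convention: row j = enzyme of the biotinylated primer,
    column k = enzyme of the non-biotinylated primer.
    """
    x, y = ends
    if x not in enzymes or y not in enzymes:
        return set()  # fragment not flanked by this matrix's enzymes
    # Aliquots (x, y) and (y, x) both amplify the fragment, each keeping one
    # non-biotinylated strand. When x == y the set collapses to {(x, x)}.
    return {(x, y), (y, x)}


print(trspa1_elements(("A", "B"), ("A", "B")))  # off-diagonal pair
print(trspa1_elements(("C", "C"), ("B", "C")))  # single on-diagonal element
```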
TRSPA-1 analysis - a worked example for n=3 and r=2
Let us take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
where
A, B and C denote restriction enzyme cutting sites and 1, 2, 3, 4, 5, 6, 7, 8 and 9 denote the restriction fragments after digestion.
The TRSPA-1 test matrices
There are $^{3}C_{2} = 3$ TRSPA-1 test matrices: AB, BC and AC.
The test matrix hybridisation patterns
There are {2(2+1)/2} = 3 possible hybridisation patterns for each of the test matrices
Figure imgf000033_0001
The combinatorial diversity
If there are three possible TRSPA-1 test matrix hybridisation patterns and three possible test matrices, then there will be $3^{3} = 27$ possible TRSPA-1 signatures.
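The hand digests in the worked example below can be reproduced by treating the dsDNA as a list of tokens. This sketch uses our own representation, which is not part of the original method. It cuts only at the sites of the chosen test-matrix enzymes and reports the pair of sites flanking a given fragment, which is what fixes the fragment's position in each test matrix.

```python
from typing import Optional

# The worked example's dsDNA: capital letters are restriction sites,
# digits are the fragments produced by a complete digest with A, B and C.
SEQ = "C 1 B 2 B 3 A 4 A 5 C 6 C 7 B 8 A 9 A".split()


def terminal_sites(seq: list[str], enzymes: set[str],
                   fragment: str) -> tuple[Optional[str], Optional[str]]:
    """Digest `seq` with `enzymes` only and return the (left, right) sites that
    flank the piece containing `fragment`; None means it runs off the end."""
    i = seq.index(fragment)
    left = next((t for t in reversed(seq[:i]) if t in enzymes), None)
    right = next((t for t in seq[i + 1:] if t in enzymes), None)
    return left, right


for frag in ("4", "5", "6"):
    ends = {pair: terminal_sites(SEQ, set(pair), frag) for pair in ("AB", "BC", "AC")}
    print(f"fragment {frag}: {ends}")
# fragment 4: {'AB': ('A', 'A'), 'BC': ('B', 'C'), 'AC': ('A', 'A')}
# fragment 5: {'AB': ('A', 'B'), 'BC': ('B', 'C'), 'AC': ('A', 'C')}
# fragment 6: {'AB': ('A', 'B'), 'BC': ('C', 'C'), 'AC': ('C', 'C')}
```

These flanking pairs match the complementary fragments derived by hand in the analyses of fragments 4, 5 and 6.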
Fragment 4 analysis - the AB TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and B to give
-C-1-B
B-2-B
B-3-A
A-4-A
A-5-C-6-C-7-B
B-8-A
A-9-A-
The fragment complementary to fragment 4 will be
A-4-A
Polymerase chain reaction amplify and hybridise with just fragment 4 to give
Figure imgf000034_0001
Fragment 4 analysis - the BC TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with B and C to give
-C-1-B
B-2-B
B-3-A-4-A-5-C
C-6-C
C-7-B
B-8-A-9-A-
The fragment complementary to fragment 4 will be
B-3-A-4-A-5-C
Polymerase chain reaction amplify and hybridise with just fragment 4 to give
Figure imgf000035_0001
Fragment 4 analysis - the AC TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and C to give
-C
C-1-B-2-B-3-A
A-4-A
A-5-C
C-6-C
C-7-B-8-A
A-9-A
A-
The fragment complementary to fragment 4 will be
A-4-A
Polymerase chain reaction amplify and hybridise with just fragment 4 to give
Figure imgf000035_0002
Fragment 5 analysis - the AB TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and B to give
-C-1-B
B-2-B
B-3-A
A-4-A
A-5-C-6-C-7-B
B-8-A
A-9-A-
The fragment complementary to fragment 5 will be
A-5-C-6-C-7-B
Polymerase chain reaction amplify and hybridise with just fragment 5 to give
[Table: AB TRSPA-1 test matrix for fragment 5, with columns A and B and rows bio-A and bio-B]
Fragment 5 analysis - the BC TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with B and C to give
-C-1-B
B-2-B
B-3-A-4-A-5-C
C-6-C
C-7-B
B-8-A-9-A-
The fragment complementary to fragment 5 will be
B-3-A-4-A-5-C
Polymerase chain reaction amplify and hybridise with just fragment 5 to give
Figure imgf000037_0001
Fragment 5 analysis - the AC TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and C to give
-C
C-1-B-2-B-3-A
A-4-A
A-5-C
C-6-C
C-7-B-8-A
A-9-A
A-
The fragment complementary to fragment 5 will be
A-5-C
Polymerase chain reaction amplify and hybridise with just fragment 5 to give
Figure imgf000038_0001
Fragment 6 analysis - the AB TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and B to give
-C-1-B
B-2-B
B-3-A
A-4-A
A-5-C-6-C-7-B
B-8-A
A-9-A-
The fragment complementary to fragment 6 will be
A-5-C-6-C-7-B
Polymerase chain reaction amplify and hybridise with just fragment 6 to give
Figure imgf000039_0001
Fragment 6 analysis - the BC TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with B and C to give
-C-1-B
B-2-B
B-3-A-4-A-5-C
C-6-C
C-7-B
B-8-A-9-A-
The fragment complementary to fragment 6 will be
C-6-C
Polymerase chain reaction amplify and hybridise with just fragment 6 to give
Figure imgf000040_0001
Fragment 6 analysis - the AC TRSPA-1 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and C to give
-C
C-1-B-2-B-3-A
A-4-A
A-5-C
C-6-C
C-7-B-8-A
A-9-A
A-
The fragment complementary to fragment 6 will be
C-6-C
Polymerase chain reaction amplify and hybridise with just fragment 6 to give
Figure imgf000040_0002
Overall results
Fragment 4 TRSPA-1 analysis
Figure imgf000041_0001
Fragment 5 TRSPA-1 analysis
Figure imgf000041_0002
Fragment 6 TRSPA-1 analysis
Figure imgf000041_0003
Version 2 - (TRSPA-2)
For each (rxr) TRSPA test matrix, digest DNA to completion with all r restriction enzymes. Ligate on restriction-enzyme-cutting-site-specific adaptors with unique polymerase chain reaction amplifiable tags. Split into r² aliquots. For each aliquot, amplify with non-biotinylated restriction enzyme_j and non-biotinylated restriction enzyme_k tag-specific primers, and array the denatured strands, for all values of j and k between 1 and r.
Example
Consider the following dsDNA
..[A]-1-[B]-2-[B]-3-[C]-4-[A]-5-[B].. (+)
..[A]-1-[B]-2-[B]-3-[C]-4-[A]-5-[B].. (-)
where the restriction enzyme cutting sites are denoted A, B and C and the fragments after restriction digestion are denoted 1, 2, 3, 4 and 5; + and - denote the sense of the strands.
Cut to completion with A, B and C to give
A--1--B  B--2--B  B--3--C  C--4--A  A--5--B  (+)
A--1--B  B--2--B  B--3--C  C--4--A  A--5--B  (-)
Polymerase chain reaction amplify according to the following primer matrix to give
Figure imgf000043_0001
In simplified form
Figure imgf000043_0002
Repeat for all nCr combinations of restriction enzymes to generate the TRSPA.
TRSPA-2 hybridisation patterns
The two types of TRSPA-2 hybridisation pattern we should expect using a probe resulting from inter-population perfectly matched duplex depletion are
1. Hybridisation to an off-diagonal element (e.g. row x, column y - where x and y are different) and its complementary element reflected across the diagonal (i.e. row y, column x) and
2. Hybridisation to a whole row and column intersecting at an on-diagonal element (e.g. all of row z and all of column z).
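The TRSPA-2 patterns can be captured in the same way. In this sketch, rows and columns both index non-biotinylated primers, and a fragment with two different terminal sites behaves as in TRSPA-1. For a fragment flanked by the same site z, we assume the row-and-column pattern arises because the z-specific primer alone can amplify it, so the fragment appears in every aliquot containing that primer. That mechanism is our inference from the stated pattern, and the function name is hypothetical.

```python
def trspa2_elements(ends: tuple[str, str],
                    enzymes: tuple[str, ...]) -> set[tuple[str, str]]:
    """Elements (row j, column k) of one TRSPA-2 test matrix hit by a probe
    for a fragment with terminal sites `ends`; both primers non-biotinylated."""
    x, y = ends
    if x not in enzymes or y not in enzymes:
        return set()
    if x != y:
        return {(x, y), (y, x)}
    # Assumed mechanism: a fragment with the same tag at both ends is amplified
    # by that single primer, so it appears in all of row x and all of column x.
    return {(x, k) for k in enzymes} | {(j, x) for j in enzymes}


print(sorted(trspa2_elements(("C", "C"), ("B", "C"))))  # row C plus column C
```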
TRSPA-2 analysis - a worked example for n=3 and r=2
Let us take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
where
A, B and C denote restriction enzyme cutting sites and 1, 2, 3, 4, 5, 6, 7, 8 and 9 denote the restriction fragments after digestion.
The TRSPA-2 test matrices
There are $^{3}C_{2} = 3$ TRSPA-2 test matrices: AB, BC and AC.
The test matrix hybridisation patterns
There are {2(2+1)/2} = 3 possible hybridisation patterns for each of the test matrices
Figure imgf000044_0001
The combinatorial diversity
If there are three possible TRSPA-2 test matrix hybridisation patterns and three possible test matrices, then there will be $3^{3} = 27$ possible TRSPA-2 signatures.
Fragment 4 analysis - the AB TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and B to give
-C-1-B
B-2-B
B-3-A
A-4-A
A-5-C-6-C-7-B
B-8-A
A-9-A-
The fragment complementary to fragment 4 will be
A-4-A
Polymerase chain reaction amplify and hybridise with just fragment 4 to give
Figure imgf000045_0001
Fragment 4 analysis - the BC TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with B and C to give
-C-1-B
B-2-B
B-3-A-4-A-5-C
C-6-C
C-7-B
B-8-A-9-A-
The fragment complementary to fragment 4 will be
B-3-A-4-A-5-C
Polymerase chain reaction amplify and hybridise with just fragment 4 to give
Figure imgf000046_0001
Fragment 4 analysis - the AC TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and C to give
-C
C-1-B-2-B-3-A
A-4-A
A-5-C
C-6-C
C-7-B-8-A
A-9-A
A-
The fragment complementary to fragment 4 will be
A-4-A
Polymerase chain reaction amplify and hybridise with just fragment 4 to give
Figure imgf000047_0001
Fragment 5 analysis - the AB TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and B to give
-C-1-B
B-2-B
B-3-A
A-4-A
A-5-C-6-C-7-B
B-8-A
A-9-A-
The fragment complementary to fragment 5 will be
A-5-C-6-C-7-B
Polymerase chain reaction amplify and hybridise with just fragment 5 to give
Figure imgf000048_0001
Fragment 5 analysis - the BC TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with B and C to give
-C-1-B
B-2-B
B-3-A-4-A-5-C
C-6-C
C-7-B
B-8-A-9-A-
The fragment complementary to fragment 5 will be
B-3-A-4-A-5-C
Polymerase chain reaction amplify and hybridise with just fragment 5 to give
Figure imgf000049_0001
Fragment 5 analysis - the AC TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and C to give
-C
C-1-B-2-B-3-A
A-4-A
A-5-C
C-6-C
C-7-B-8-A
A-9-A
A-
The fragment complementary to fragment 5 will be
A-5-C
Polymerase chain reaction amplify and hybridise with just fragment 5 to give
Figure imgf000049_0002
Fragment 6 analysis - the AB TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and B to give
-C-1-B
B-2-B
B-3-A
A-4-A
A-5-C-6-C-7-B
B-8-A
A-9-A-
The fragment complementary to fragment 6 will be
A-5-C-6-C-7-B
Polymerase chain reaction amplify and hybridise with just fragment 6 to give
[Table: AB TRSPA-2 test matrix for fragment 6, with rows A and B and columns A and B]
Fragment 6 analysis - the BC TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with B and C to give
-C-1-B
B-2-B
B-3-A-4-A-5-C
C-6-C
C-7-B
B-8-A-9-A-
The fragment complementary to fragment 6 will be
C-6-C
Polymerase chain reaction amplify and hybridise with just fragment 6 to give
Figure imgf000051_0001
Fragment 6 analysis - the AC TRSPA-2 matrix
Take the dsDNA
-C-1-B-2-B-3-A-4-A-5-C-6-C-7-B-8-A-9-A-
Cut with A and C to give
-C
C-1-B-2-B-3-A
A-4-A
A-5-C
C-6-C
C-7-B-8-A
A-9-A
A-
The fragment complementary to fragment 6 will be
C-6-C
Polymerase chain reaction amplify and hybridise with just fragment 6 to give
Figure imgf000052_0001
Overall results
Fragment 4 TRSPA-2 analysis (AB, BC and AC matrices)
Figure imgf000052_0002
Fragment 5 TRSPA-2 analysis (AB, BC and AC matrices)
Figure imgf000052_0003
Fragment 6 TRSPA-2 analysis
Figure imgf000053_0001
The number of test matrices
For a total of n enzymes used for fragmentation and a panel of r enzymes per (rxr) test matrix, there will be nCr possible (rxr) test matrices, where
$$^{n}C_{r} = \frac{n!}{(n-r)!\,r!}$$
nCr for various n and r is given in the following table
Figure imgf000053_0002
The number of TRSPA spots arrayed
The total number of TRSPA spots is given by
$$r^{2}\cdot{}^{n}C_{r} = r^{2}\cdot\frac{n!}{(n-r)!\,r!}$$
Values of $r^{2}\cdot{}^{n}C_{r}$ for various n and r are given in the following table
Figure imgf000054_0002
Each test as above will give rise to a particular hybridisation 'signature'.
TRSPA-1 (for n=6 and r=3) and TRSPA-2 (for n=6 and r=3)
Figure imgf000054_0001
There will be {1+2+...+r} = {r(r+1)/2} patterns per (rxr) test matrix. If there are nCr (rxr) test matrices, the total possible number of signatures will be given by {r(r+1)/2} raised to the power nCr. We require that the number of such different signatures greatly exceeds the number of fragments generated upon fragmentation.
If we choose n=6 and r=3, there will be 6 possible hybridisation patterns per test and 20 possible tests, giving a combinatorial diversity of $6^{20} \approx 3.7\times10^{15}$. The preferred scheme therefore employs six enzymes, tested in groups of three, giving 180 spots per experiment. In order to avoid restriction fragment length polymorphism problems, a duplicate analysis could be performed with a different set of enzymes sharing no cutting sites with the first set.
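The counting above can be checked with a few lines of integer arithmetic. In this sketch, `trspa_counts` is a hypothetical helper that returns the number of test matrices (nCr), arrayed spots (r²·nCr) and possible signatures ({r(r+1)/2} raised to the power nCr).

```python
from math import comb


def trspa_counts(n: int, r: int) -> tuple[int, int, int]:
    """Return (test matrices, arrayed spots, possible signatures) for n
    enzymes tested in panels of r."""
    matrices = comb(n, r)           # nCr
    spots = r * r * matrices        # r^2 * nCr
    patterns = r * (r + 1) // 2     # 1 + 2 + ... + r
    return matrices, spots, patterns ** matrices


print(trspa_counts(3, 2))  # (3, 12, 27)
print(trspa_counts(6, 3))  # (20, 180, 3656158440062976)
```

The n = 3, r = 2 case reproduces the 27 signatures of the worked examples. The n = 6, r = 3 case gives the 180 spots and the diversity of about 3.7×10^15 quoted above.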
Example restriction enzyme sets for the preparation of test matrices for TRSPAs
Criteria for enzyme choice
In order to make TRSPAs, the selection of suitable enzymes is an important factor. Ideally, two sets of different enzymes are required to eliminate the small possibility that a phenotype-determining polymorphism might fall within a chosen restriction site and therefore compromise the specificity of the resulting signature. The selection of enzymes can be based upon a number of criteria:
a) The enzymes should be 6 bp cutters.
b) Cleavage by any selected enzyme should leave a 4 bp overhang at the 5' end.
c) The selected enzymes in each set should all work efficiently under the same buffer conditions.
d) The selected enzymes in each set should ideally work efficiently at a single incubation temperature.
e) The chosen enzymes should be commercially available, ideally at concentrations of 10 U/μl or more.
f) The 5' overhangs left by any two enzymes in the same set should not be identical.
g) No enzyme should appear in both sets for TRSPA fabrication.
h) Enzymes should be selected to avoid or minimise the effects of mammalian methylation patterns. In particular, enzymes with CG dinucleotides in their recognition sites should be avoided unless the enzyme is known to be able to restrict m5CpG sites.
DNA methylation
In vertebrates, DNA is often methylated at the 5th position of cytosine in the sequence CpG, and this is the only chemical modification that DNA of vertebrates contains under physiological conditions. The potentially adverse effects of DNA methylation can be removed from the TRSPA analysis by careful enzyme selection. Either choose enzymes which do not contain CpG sequences within the recognition site, or choose enzymes which freely restrict m5CpG-methylated sites. The 6 bp cutters which are known to restrict m5CpG-modified DNA efficiently to leave a 5' four-base overhang are BspEI and XmaI. These enzymes are therefore the only enzymes with restriction sites containing CpG dinucleotides that are potentially useful in a TRSPA analysis.
Enzyme selection method
Sixteen possible four-base overhangs exist (excluding unusual enzymes with asymmetrical recognition sequences such as BssSI or BsiI), five of which contain CG. A further four overhangs could potentially give rise to a CG within the restriction recognition site if preceded by a C and followed by a G. Enzymes are therefore preferentially selected from the remaining six groups.
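The count of sixteen overhangs, five of them containing CG, can be reproduced by enumerating every 4-base sequence that equals its own reverse complement. The Python sketch below assumes that "possible overhangs" means exactly these self-complementary 4-mers, which is the natural reading once asymmetric recognition sequences are excluded; it does not attempt to reproduce the further grouping by flanking bases.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def palindromic_overhangs(k=4):
    """All k-base sequences identical to their own reverse complement."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [s for s in kmers if s.translate(COMPLEMENT)[::-1] == s]


overhangs = palindromic_overhangs()
with_cg = [s for s in overhangs if "CG" in s]
print(len(overhangs))   # 16
print(with_cg)          # ['ACGT', 'CCGG', 'CGCG', 'GCGC', 'TCGA']
```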
Excluding isoschizomers, there are up to four possible enzymes that would leave a particular four-base 5' overhang. For example, the enzymes leaving a CTAG overhang are:
AvrII CCTAGG
NheI GCTAGC
SpeI ACTAGT
XbaI TCTAGA
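The "up to four" limit arises because a 6 bp site leaving a given palindromic overhang must have the form N-overhang-N', with N' the complement of N. The Python sketch below lists those four candidate sequences; it assumes a cut of the form N^XXXXN' and says nothing about whether a commercial enzyme with that cut position actually exists. The second line illustrates the CG-at-the-junction case mentioned above: a GATC overhang flanked by C and G gives the CpG-containing site CGATCG.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def candidate_sites(overhang):
    """6 bp palindromic sites N + overhang + N' (N' = complement of N).

    A palindromic overhang plus complementary flanks keeps the whole site
    palindromic; an enzyme cutting N^XXXXN' would leave `overhang`.
    """
    return [n + overhang + COMPLEMENT[n] for n in "ACGT"]


print(candidate_sites("CTAG"))   # ['ACTAGT', 'CCTAGG', 'GCTAGC', 'TCTAGA']
print([s for s in candidate_sites("GATC") if "CG" in s])   # ['CGATCG']
```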
Enzymes to cleave sites with all the combinations of flanking bases are not available for all overhangs - hence the enzyme choice is more limited for some overhang groups than others.
As a first step towards enzyme selection, the enzymes supplied by Amersham Pharmacia Biotech, New England Biolabs and Promega are ordered below by overhang sequence. Supplementary details such as the percentage activity in common buffers, the reaction temperature, the concentration and the supplier are also recorded. For three overhang sequences there are no available enzymes; the remaining 13 are described below. Enzymes considered unsuitable due to methylation sensitivity are shaded (darker shade). Enzymes considered unfavourable due to the presence of CpG sites, even though they do restrict methylated DNA to some degree, are also shaded (lighter shade).
Candidate restriction enzymes leaving a 5' overhang of GTAC
Figure imgf000057_0001
Candidate restriction enzymes leaving a 5' overhang of TTAA
Figure imgf000058_0001
Candidate restriction enzymes leaving a 5' overhang of AATT
Figure imgf000058_0002
Candidate restriction enzymes leaving a 5' overhang of CATG
Figure imgf000058_0003
Candidate restriction enzymes leaving a 5' overhang of GATC
Figure imgf000058_0004
Candidate restriction enzymes leaving a 5' overhang of CCGG
Figure imgf000059_0001
Candidate restriction enzymes leaving a 5' overhang of GCGC
Figure imgf000059_0002
Candidate restriction enzymes leaving a 5' overhang of TCGA
Figure imgf000059_0003
Candidate restriction enzymes leaving a 5' overhang of AGCT
Figure imgf000059_0004
Candidate restriction enzymes leaving a 5' overhang of CGCG
Figure imgf000060_0001
Candidate restriction enzymes leaving a 5' overhang of GGCC
Figure imgf000060_0002
Candidate restriction enzymes leaving a 5' overhang of TGCA
Figure imgf000060_0003
Candidate restriction enzymes leaving a 5' overhang of CTAG
Figure imgf000060_0004
From the above tables, a shortlist of the most useful enzymes is given below for each of the buffer conditions shown:
Figure imgf000061_0001
Taking into account all of the criteria explained above, an example selection of two six-enzyme sets is described below. Reserve enzymes, which could also be used, are shown as well. These reserve enzymes can be substituted (provided this does not cause overhang duplication) if practical problems regarding enzyme availability or performance should occur.
Example enzyme set 1 - for restriction in buffer M + BSA
Figure imgf000062_0001
Example enzyme set 2 - for restriction in buffer M + BSA
Figure imgf000062_0002
Example reserve enzymes
Figure imgf000063_0001
Example specific (3x3) terminal restriction site profiling array (TRSPA) test matrices for the above two sets of six enzymes
Triplet combinations from BamHI, BsrGI, HindIII, NcoI, SpeI and AflII
For example set 1 (BamHI, BsrGI, HindIII, NcoI, SpeI and AflII), the 20 (= 6C3) triplet combinations are:
BamHI BsrGI HindIII
BamHI BsrGI NcoI
BamHI BsrGI SpeI
BamHI BsrGI AflII
BamHI HindIII NcoI
BamHI HindIII SpeI
BamHI HindIII AflII
BamHI NcoI SpeI
BamHI NcoI AflII
BamHI SpeI AflII
BsrGI HindIII NcoI
BsrGI HindIII SpeI
BsrGI HindIII AflII
BsrGI NcoI SpeI
BsrGI NcoI AflII
BsrGI SpeI AflII
HindIII NcoI SpeI
HindIII NcoI AflII
HindIII SpeI AflII
NcoI SpeI AflII
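The triplets are simply the 3-element combinations of the six enzymes, so they can be generated rather than written out by hand. In the Python sketch below, `itertools.combinations` preserves the order of the input list; the same call applied to the example set 2 enzymes yields its 20 triplets.

```python
from itertools import combinations

SET_1 = ["BamHI", "BsrGI", "HindIII", "NcoI", "SpeI", "AflII"]

# combinations() preserves the order of SET_1, giving a lexicographic listing
triplets = list(combinations(SET_1, 3))
print(len(triplets))    # 20
print(triplets[0])      # ('BamHI', 'BsrGI', 'HindIII')
print(triplets[-1])     # ('NcoI', 'SpeI', 'AflII')
```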
Triplet combinations from EcoRI, BspHI, BglII, XbaI, Acc65I and ApaLI
For example set 2 (EcoRI, BspHI, BglII, XbaI, Acc65I and ApaLI), the 20 (= 6C3) triplet combinations are:
EcoRI BspHI BglII
EcoRI BspHI XbaI
EcoRI BspHI Acc65I
EcoRI BspHI ApaLI
EcoRI BglII XbaI
EcoRI BglII Acc65I
EcoRI BglII ApaLI
EcoRI XbaI Acc65I
EcoRI XbaI ApaLI
EcoRI Acc65I ApaLI
BspHI BglII XbaI
BspHI BglII Acc65I
BspHI BglII ApaLI
BspHI XbaI Acc65I
BspHI XbaI ApaLI
BspHI Acc65I ApaLI
BglII XbaI Acc65I
BglII XbaI ApaLI
BglII Acc65I ApaLI
XbaI Acc65I ApaLI
Example TRSPA-1 test matrices for set 1 - BamHI, BsrGI, HindIII, NcoI, SpeI and AflII
Figure imgf000064_0001
Figure imgf000065_0001
Figure imgf000066_0001
Example TRSPA-1 test matrices for set 2 - EcoRI, BspHI, BglII, XbaI, Acc65I and ApaLI
Figure imgf000066_0002
Figure imgf000067_0001
Figure imgf000068_0001
Figure imgf000069_0001
Example TRSPA-2 test matrices for set 1 - BamHI, BsrGI, HindIII, NcoI, SpeI and AflII
Figure imgf000069_0002
Figure imgf000070_0001
Figure imgf000071_0001
Example TRSPA-2 test matrices for set 2 - EcoRI, BspHI, BglII, XbaI, Acc65I and ApaLI
Figure imgf000072_0001
Figure imgf000073_0001
Figure imgf000074_0001
Hybridisation patterns
There are six possible hybridisation patterns for a given probe fragment from inter-population perfectly matched duplex depletion enrichment with a (3x3) test matrix in TRSPA-1 analysis.
There are also six possible hybridisation patterns for a given probe fragment from inter-population perfectly matched duplex depletion enrichment with a (3x3) test matrix in TRSPA-2 analysis.
We can denote these patterns as follows:
TRSPA-1 analysis
Pattern a (1 only)
Figure imgf000075_0001
Pattern b (2 and 4)
Figure imgf000075_0002
Pattern c (3 and 7)
Figure imgf000075_0003
Pattern d (5 only)
Figure imgf000075_0004
Pattern e (6 and 8)
Figure imgf000076_0001
Pattern f (9 only)
Figure imgf000076_0002
TRSPA-2 analysis
Pattern a (row 1 and column 1)
Matrix #          5'-HO-X-primer   5'-HO-Y-primer   5'-HO-Z-primer
5'-HO-X-primer    ■                ■                ■
5'-HO-Y-primer    ■                5                6
5'-HO-Z-primer    ■                8                9
Pattern b (2 and 4)
Figure imgf000076_0003
Pattern c (3 and 7)
Figure imgf000077_0001
Pattern d (row 2 and column 2)
Figure imgf000077_0002
Pattern e (6 and 8)
Figure imgf000077_0003
Pattern f (row 3 and column 3)
Matrix #          5'-HO-X-primer   5'-HO-Y-primer   5'-HO-Z-primer
5'-HO-X-primer    1                2                ■
5'-HO-Y-primer    4                5                ■
5'-HO-Z-primer    ■                ■                ■
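Reading a pattern off a (3x3) matrix amounts to matching the set of positive spots against six expected sets. The Python sketch below assumes that spots are numbered 1 to 9 row-wise, with rows and columns in X, Y, Z primer order, as in the numbered matrices above; the TRSPA-1 sets come directly from the pattern headings, and the TRSPA-2 "row n and column n" patterns are translated into spot sets under that numbering assumption. The `decode` helper is hypothetical.

```python
# Spots are numbered row-wise; rows and columns are the X, Y and Z primers.
#          X  Y  Z
#     X    1  2  3
#     Y    4  5  6
#     Z    7  8  9
TRSPA1 = {"a": {1}, "b": {2, 4}, "c": {3, 7},
          "d": {5}, "e": {6, 8}, "f": {9}}
TRSPA2 = {"a": {1, 2, 3, 4, 7},   # row 1 and column 1
          "b": {2, 4}, "c": {3, 7},
          "d": {2, 4, 5, 6, 8},   # row 2 and column 2
          "e": {6, 8},
          "f": {3, 6, 7, 8, 9}}   # row 3 and column 3


def decode(spots, patterns):
    """Return the pattern letter matching the set of positive spots, or None."""
    observed = set(spots)
    for letter, expected in patterns.items():
        if observed == expected:
            return letter
    return None  # not one of the six expected patterns (e.g. noise)


print(decode([2, 4], TRSPA1), decode([1, 2, 3, 4, 7], TRSPA2), decode([1, 5], TRSPA1))
# prints: b a None
```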
Signatures
If we have 40 such (3x3) test matrices and we denote the possible hybridisation patterns a, b, c, d, e and f, then we can write the overall TRSPA hybridisation signature as follows:
Figure imgf000078_0001
Aspect (III)
In another aspect the invention provides a nucleic acid characterisation method which comprises presenting a nucleic acid fragment of interest to the set of arrays as defined above under hybridisation conditions, and observing a pattern of hybridisation. Preferably, a plurality of nucleic acid fragments of interest are separately presented to the set of arrays, and the resulting patterns of hybridisation are compared. Preferably, the plurality of nucleic acid fragments of interest are drawn from the mixture of DNA fragments, enriched in fragments that are characteristic of a phenotype of interest, as described under the invention (I) above.
Thus in this aspect the invention provides a method of identifying fragments of DNA that are characteristic of a phenotype of interest, which method comprises recovering, cloning and amplifying individual DNA fragments from the mixture of DNA fragments obtained under the invention (I) above, presenting the individual DNA fragments to the set of arrays as defined above under hybridisation conditions, observing a pattern of hybridisation generated by each individual DNA fragment, and subjecting to further investigation any two or more individual DNA fragments whose hybridisation patterns are similar or identical.
TRSPA signatures and whole genome association studies
After a given number of cycles of inter-population perfectly matched duplex depletion, phenotype-determining fragments will be enriched but will not be entirely free from 'noise' fragments. Noise may result from unequal allelic frequencies for certain SNPs between the two populations. Noise will also result from the presence of somatic mutations in the cells used to prepare DNA fragments, and from the use of the polymerase chain reaction in some of the embodiments of the current invention.
For the preferred embodiment, DNA is prepared from a library of clones (either genomic clones or cDNA clones), with inserts derived from the individual(s) and propagated in an appropriate host and cloning vector system. Restriction enzyme fragmentation is used prior to cloning, and polymerase chain reaction amplification is used to prepare the DNA for comparison in fragmented form. Priming sites within the vector sequence flanking the cloned, restriction-fragmented inserts are employed for one or more cycles of polymerase chain reaction amplification of the fragmented DNA of interest. The same primers are used again after the phenotype-determining fragment enrichment process to 'rescue' and clone the enriched fragments. Cloned enriched fragments are colony purified, picked into appropriate storage containers, catalogued and archived. DNA probes are prepared from these single clones and are individually hybridised against 180-spot TRSPA arrays as above, and TRSPA signatures are determined for many colonies. Most noise fragments will be random in nature and will thus be randomly distributed amongst the $6^{20} \approx 3.7 \times 10^{15}$ possible TRSPA signatures. Phenotype-determining signal fragments, however, will be those for which repeat TRSPA signatures are obtained as more and more colonies are sampled. The more frequently a given TRSPA signature recurs per unit number of colonies sampled, the greater the signal-to-noise ratio and the more successful the enrichment has been. Many of the steps in this process are amenable to high-throughput automation, enabling very large numbers of single-colony TRSPA signatures to be determined with ease and extending the power of the current invention to cases where signal-to-noise ratios are too low for current approaches. Statistical correlations (associations) can initially be drawn between TRSPA signatures and phenotype.
The clones giving rise to a particular TRSPA signature showing a useful association with a phenotype of interest can then be sequenced in order to determine at a DNA sequence level the association(s) with the phenotype of interest. Such associations have future predictive values for the phenotype of interest, knowing the genotype and will be of great value in medicine and pharmacogenetics.
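The repeat-signature criterion lends itself to a simple tally. The Python sketch below uses hypothetical, shortened signature strings (one letter per test, in the a to f notation above) and counts signatures that recur across colonies. A second helper estimates how many coincidental repeats noise alone would produce, under the simplifying assumption that noise signatures are independent and uniformly spread over the $6^{20}$ possibilities; both function names are illustrative only.

```python
from collections import Counter
from math import comb


def repeated_signatures(signatures, min_count=2):
    """Return (signature, count) pairs seen at least min_count times,
    most frequent first."""
    counts = Counter(signatures)
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


def expected_chance_repeats(n_colonies, n_signatures=6 ** 20):
    """Expected number of coincidentally identical pairs among n_colonies
    noise clones, assuming uniform, independent noise signatures."""
    return comb(n_colonies, 2) / n_signatures


# Hypothetical, shortened 4-test signatures for six colonies
colonies = ["aabf", "cdde", "aabf", "fbca", "aabf", "cdde"]
print(repeated_signatures(colonies))          # [('aabf', 3), ('cdde', 2)]
print(f"{expected_chance_repeats(100_000):.1e}")   # 1.4e-06
```

Even with 100,000 colonies sampled, fewer than one chance repeat would be expected under this assumption, which is why recurring signatures point to enriched signal fragments.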
If the genome of interest is wholly or partially sequenced, we can also restrict the DNA in silico with all n enzymes, calculate the expected signature for each fragment and pattern-match these expected signatures against the observed signature (taking into account any loss or gain of restriction sites due to polymorphic variation compared with the reference sequence), so as to identify immediately the fragment of interest within a gene, genomic region, chromosome or whole genome. This latter method will be of great value in cases where a great many phenotype-determining fragments are obtained and repeat signatures are rare or absent. The clustering of phenotype-determining fragments in adjacent DNA regions thus gives an association between those genomic regions and the phenotype of interest.
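As a minimal sketch of the first step of this in silico approach, the Python function below locates the cut positions of a set of enzymes in a reference sequence and labels each resulting fragment with the enzymes at its two ends. It assumes palindromic recognition sites written with a caret at the top-strand cut (e.g. BamHI as G^GATCC) and uses a hypothetical 30 bp toy sequence. How terminal enzymes translate into a TRSPA hybridisation signature depends on the array chemistry and is not modelled.

```python
import re


def digest(seq, enzymes):
    """Cut seq in silico; return (start, end, left_enzyme, right_enzyme) tuples.

    enzymes maps a name to its site with "^" at the top-strand cut,
    e.g. "G^GATCC". Sites are assumed palindromic, so scanning the top
    strand finds every occurrence.
    """
    seq = seq.upper()
    cuts = []
    for name, site in enzymes.items():
        offset = site.index("^")
        motif = site.replace("^", "")
        # Zero-width lookahead so that overlapping occurrences are all found.
        for match in re.finditer(f"(?={motif})", seq):
            cuts.append((match.start() + offset, name))
    bounds = [(0, "end")] + sorted(cuts) + [(len(seq), "end")]
    return [(start, stop, left, right)
            for (start, left), (stop, right) in zip(bounds, bounds[1:])]


toy = "AAAAGGATCCTTTTTTTTTTAAGCTTCCCC"   # hypothetical 30 bp sequence
for fragment in digest(toy, {"BamHI": "G^GATCC", "HindIII": "A^AGCTT"}):
    print(fragment)
```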
Aspect (IV)
In yet another aspect the invention provides a double-stranded DNA molecule having the sequence a-A-b-B...X-y-Y-z, where A, B...X and Y are unique restriction sites for n different restriction endonuclease enzymes, and a, b...y, z denote distances in base pairs, characterised in that each fragment obtainable by cutting the DNA molecule with any one or more (up to n) of the restriction enzymes has a different length from every other fragment.
An example totally diagnostic internal control DNA which allows both the extent and exact nature of any example set 1 (or example set 2) 6 bp cutter partial digestion to be unambiguously determined for inter-population perfectly matched duplex depletion or TRSPA restriction
In both of the above schemes, it is important that limit digestion products are obtained. Monitoring the extent of partial digestion resulting from multi-enzymatic restriction and determining precisely which enzymes have failed to cut is a task of great importance.
Suppose we have up to six enzymes for DNA digestion, labelled A, B, C, D, E and F. We need to determine that these have all cut to completion during the fragmentation stage for inter-population perfectly matched duplex depletion, and also during the digestion step prior to adaptor ligation in TRSPA fabrication. If any of the enzymes have failed to cut to completion, we need to know which ones, and to what degree, in order to rectify the problem effectively.
The structure of the internal control DNA
Consider a double-stranded DNA molecule with the following structure:
end-t-A-u-B-v-C-w-D-x-E-y-F-z-end
where A, B, C, D, E and F denote the restriction enzyme cutting sites and t, u, v, w, x, y and z denote distances in base pairs.
This internal control DNA is either uniformly pre-labelled and added to the DNA of interest at an appropriate concentration prior to restriction, or is detected after restriction by Southern blot probing with a complementary sequence not found in the DNA of interest.
All six enzymes can cut in only one way. One enzyme can fail to cut in 6C1 = 6 ways: A, B, C, D, E or F failing to cut.
Two enzymes can fail to cut in 6C2 = 15 ways: AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF or EF failing to cut. Three enzymes can fail to cut in 6C3 = 20 ways: ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF or DEF failing to cut.
Four enzymes can fail to cut in 6C4 = 15 ways: ABCD, ABCE, ABCF, ABDE, ABDF, ABEF, ACDE, ACDF, ACEF, ADEF, BCDE, BCDF, BCEF, BDEF or CDEF failing to cut.
Five enzymes can fail to cut in 6C5 = 6 ways: ABCDE, ABCDF, ABCEF, ABDEF, ACDEF or BCDEF failing to cut. All six enzymes can fail to cut in only one way. Each of these 64 possibilities will generate one or more fragments from the internal control DNA. If each possible fragment is discernibly different in size from every other, then we can determine exactly which enzymes have cut and which have not from the size distribution of the fragments generated. The task is therefore to design such a DNA molecule.
Example simulations
Seven simulations are given below, varying the size of the inter-site fragments. Criteria for a successful outcome include the following:
1. The inter-fragment spacing should be greater for larger fragments (so as to aid electrophoretic resolution).
2. All possible fragments should be unambiguously resolvable in size from each other.
3. Size gaps between bands comprising different numbers of inter-site units should be greater than the size gaps between bands comprising the same number of inter-site units.
4. The size gaps and size spread from largest to smallest fragment should be electrophoretically compatible.
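Whether a candidate set of spacings meets criterion 2 can be checked by brute force: enumerate all 64 combinations of failed enzymes, compute the fragments each produces, and confirm that every possible fragment size (each a sum of consecutive inter-site distances) is unique. The Python sketch below does this for hypothetical spacings t to z of 140, 150, 160, 170, 180, 190 and 200 bp, in that order. These values are an assumption: they are consistent with the 1190 bp pNW33 insert of Example 1a and with the 1050 bp spread quoted for simulation 7, but the simulation tables themselves are not reproduced in the text. Only complete failure of an enzyme is modelled; a partial digest would show a superposition of these patterns.

```python
from itertools import combinations


def fragments(spacings, failed, names):
    """Sorted fragment sizes (bp) when the enzymes in `failed` do not cut.

    spacings are the distances t, u, ..., z, so len(spacings) == len(names) + 1.
    """
    sizes, run = [], spacings[0]
    for name, seg in zip(names, spacings[1:]):
        if name in failed:
            run += seg          # site left intact: segments stay joined
        else:
            sizes.append(run)   # site cut: close the current fragment
            run = seg
    sizes.append(run)
    return sorted(sizes)


def failure_table(spacings, names="ABCDEF"):
    """Map every combination of failed enzymes to its fragment pattern."""
    return {"".join(f): fragments(spacings, set(f), names)
            for k in range(len(names) + 1)
            for f in combinations(names, k)}


def contiguous_sums(spacings):
    n = len(spacings)
    return [sum(spacings[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


spacings = [140, 150, 160, 170, 180, 190, 200]   # hypothetical t..z (bp)
table = failure_table(spacings)
sizes = contiguous_sums(spacings)

print(len(table))                    # 64 failure states
print(len(sizes), len(set(sizes)))   # 28 28: every fragment size is unique
print(max(sizes) - min(sizes))       # spread: 1050
print(table["B"])                    # [140, 170, 180, 190, 200, 310]
```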
Simulation 1
Inter-site fragment sizes (in bp)
Figure imgf000083_0001
Possible digestion products obtained (in bp)
Figure imgf000084_0001
Spread = 690 bp
Simulation 2
Inter-site fragment sizes (in bp)
Figure imgf000085_0001
Possible digestion products obtained (in bp)
Figure imgf000086_0001
Spread = 750 bp
Simulation 3
Inter-site fragment sizes (in bp)
Figure imgf000087_0001
Possible digestion products obtained (in bp)
Figure imgf000088_0001
Spread = 810 bp
Simulation 4
Inter-site fragment sizes (in bp)
Figure imgf000089_0001
Possible digestion products obtained (in bp)
Figure imgf000090_0001
Spread = 870 bp
Simulation 5
Inter-site fragment sizes (in bp)
Figure imgf000091_0001
Possible digestion products obtained (in bp)
Figure imgf000092_0001
Spread = 930 bp
Simulation 6
Inter-site fragment sizes (in bp)
Figure imgf000093_0001
Possible digestion products obtained (in bp)
Figure imgf000094_0001
Spread = 990 bp
Simulation 7
Inter-site fragment sizes (in bp)
Figure imgf000095_0001
Possible digestion products obtained (in bp)
Figure imgf000096_0001
Spread = 1050 bp
According to the above criteria for success, simulation 7 above clearly fulfils all of the requirements.
An example totally diagnostic internal control DNA which allows both the extent and exact nature of any example set 1 (or example set 2) 4 bp cutter partial digestion to be unambiguously determined for inter-population perfectly matched duplex depletion
Suppose we have up to three enzymes for DNA digestion, labelled A, B and C. We need to determine that these have all cut to completion during the fragmentation stage for inter-population perfectly matched duplex depletion. If any of the enzymes have failed to cut to completion, we need to know which ones, and to what degree, in order to rectify the problem effectively.
The structure of the internal control DNA
Consider a double-stranded DNA molecule with the following structure:
end-t-A-u-B-v-C-w-end
where A, B and C denote the restriction enzyme cutting sites and t, u, v and w denote distances in base pairs.
This internal control DNA is either uniformly pre-labelled and added to the DNA of interest at an appropriate concentration prior to restriction, or is detected after restriction by Southern blot probing with a complementary sequence not found in the DNA of interest.
All three enzymes can cut in only one way.
One enzyme can fail to cut in 3C1 = 3 ways: A, B or C failing to cut. Two enzymes can fail to cut in 3C2 = 3 ways: AB, AC or BC failing to cut. All three enzymes can fail to cut in only one way.
Each of the above possibilities will generate one or more fragments from the internal control DNA. If each possible fragment is discernibly different in size from every other (and from all of the fragments in simulation 7 above for the panel of up to six enzymes), then we can determine exactly which enzymes have cut and which have not from the size distribution of the fragments generated. The task is therefore to design such a DNA molecule.
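For the three-enzyme control, the extra requirement is compatibility with the six-enzyme control used alongside it. The Python sketch below checks that all fragments of a three-enzyme control are distinct and all shorter than the shortest six-enzyme fragment (criteria 2 and 5 below). Both sets of spacings are hypothetical: 20, 25, 30 and 35 bp for t to w are illustrative only, and 140 to 200 bp stands in for simulation 7, whose table is not reproduced in the text.

```python
def contiguous_sums(spacings):
    """All fragment sizes obtainable from a linear control with these spacings."""
    n = len(spacings)
    return [sum(spacings[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def panels_compatible(small, large):
    """Small-panel fragments are all distinct and all shorter than the
    shortest large-panel fragment (which also makes the two sets disjoint)."""
    s, l = contiguous_sums(small), contiguous_sums(large)
    return len(set(s)) == len(s) and max(s) < min(l)


three_enzyme = [20, 25, 30, 35]                       # hypothetical t, u, v, w
six_enzyme = [140, 150, 160, 170, 180, 190, 200]      # hypothetical t..z
print(sorted(contiguous_sums(three_enzyme)))
print(panels_compatible(three_enzyme, six_enzyme))    # True
```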
Example simulations
Six simulations are given below, varying the size of the inter-site fragments. Criteria for a successful outcome include the following:
1. The inter-fragment spacing should be greater for larger fragments (so as to aid electrophoretic resolution).
2. All possible fragments should be unambiguously resolvable in size from each other.
3. Size gaps between bands comprising different numbers of inter-site units should ideally be greater than the size gaps between bands comprising the same number of inter-site units.
4. The size gaps and size spread from largest to smallest fragment should be electrophoretically compatible.
5. The largest fragment obtained should ideally be smaller than the smallest fragment obtained in simulation 7 above for the panel of up to six enzymes.
Simulation 1
Inter-site fragment sizes (in bp)
Figure imgf000099_0001
Possible digestion products obtained (in bp)
Figure imgf000099_0002
Spread = 90 bp
Simulation 2
Inter-site fragment sizes (in bp)
Figure imgf000099_0003
Possible digestion products obtained (in bp)
Figure imgf000100_0001
Spread = 105 bp
Simulation 3
Inter-site fragment sizes (in bp)
Figure imgf000100_0002
Possible digestion products obtained (in bp)
Figure imgf000101_0001
Spread = 120 bp
Simulation 4
Inter-site fragment sizes (in bp)
Figure imgf000101_0002
Possible digestion products obtained (in bp)
Figure imgf000102_0001
Spread = 135 bp
Simulation 5
Inter-site fragment sizes (in bp)
Figure imgf000102_0002
Possible digestion products obtained (in bp)
Figure imgf000103_0001
Spread = 90 bp
Simulation 6
Inter-site fragment sizes (in bp)
Figure imgf000103_0002
Possible digestion products obtained (in bp)
Figure imgf000104_0001
Spread = 105 bp
According to the above criteria for success, simulation 6 clearly fulfils all of the requirements.
Example determination of the entire set of internal control DNA limit and partial digestion patterns for a panel of up to six restriction enzymes
For example simulation 7, the entire set of internal control DNA limit and partial digestion patterns for a panel of up to six restriction enzymes can be determined as below.
Inter-site fragment sizes (in bp)
Figure imgf000105_0001
Possible digestion products obtained
All six enzymes can cut in only one way.
Figure imgf000105_0002
One enzyme can fail to cut in 6C1 = 6 ways: A, B, C, D, E or F failing to cut.
Figure imgf000105_0003
Two enzymes can fail to cut in 6C2 = 15 ways: AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF or EF failing to cut.
Figure imgf000106_0001
Three enzymes can fail to cut in 6C3 = 20 ways: ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF or DEF failing to cut.
Figure imgf000107_0001
Four enzymes can fail to cut in 6C4 = 15 ways: ABCD, ABCE, ABCF, ABDE, ABDF, ABEF, ACDE, ACDF, ACEF, ADEF, BCDE, BCDF, BCEF, BDEF or CDEF failing to cut.
Figure imgf000108_0001
Five enzymes can fail to cut in 6C5 = 6 ways: ABCDE, ABCDF, ABCEF, ABDEF, ACDEF or BCDEF failing to cut.
Figure imgf000109_0001
All six enzymes can fail to cut in only one way.
Figure imgf000109_0002
Example determination of the entire set of internal control DNA limit and partial digestion patterns for a panel of up to three restriction enzymes
For the example simulation 6, the entire set of internal control DNA limit and partial digestion patterns for a panel of up to three restriction enzymes can be determined as below.
Inter-site fragment sizes (in bp)
Possible digestion products obtained
All three enzymes can cut in only one way.
Figure imgf000109_0004
One enzyme can fail to cut in 3C1 = 3 ways: A, B or C failing to cut.
Figure imgf000110_0001
Two enzymes can fail to cut in 3C2 = 3 ways: AB, AC or BC failing to cut.
Figure imgf000110_0002
All three enzymes can fail to cut in only one way.
Figure imgf000110_0003
Example 1a - The digestion internal control plasmid for the 6 bp cutter set 1 TRSPA enzymes BamHI, BsrGI, HindIII, NcoI, SpeI and AflII
The plasmid pNW33 (shown below) was constructed to contain an insert with all of the 6 bp cutter TRSPA enzyme sites.
Figure imgf000111_0001 (map of plasmid pNW33; surviving label: HindIII (445))
BspEI sites define the outer ends of the 140 bp and the 200 bp fragments. The full sequence of pNW33 is shown below:
tcgcgcgtttcggtgatgacggtgaaaacctctgacacatgcagctcccggagacggtcacagcttgtctgtaagcggatgcc gggagcagacaagcccgtcagggcgcgtcagcgggtgttggcgggtgtcggggctggcttaactatgcggcatcagagca gattgtactgagagtgcaccatatgcggtgtgaaataccgcacagatgcgtaaggagaaaataccgcatcaggcgccattc gccattcaggctgcgcaactgttgggaagggcgatcggtgcgggcctcttcgctattacgccagctggcgaaagggggatgt gctgcaaggcgattaagttgggtaacgccagggttttcccagtcacgacgttgtaaaacgacggccagtgaattcgagctcgg taccgggccccccctcgaggtcgacggtatcgataagcttgatcgcagctggtaatccggacgcccgcgtcgaagatgttgg ggtgttgtaacaatatcgattccaattcagcgggggccacctgatatcctttgtatttaattaaagacttcaagcggtcaactatga agaagtgttcgtcttcgtcccagtaaggatccgcactttgaattttgtaatcctgaagggatcgtaaaaacagctcttcttcaaatct atacattaagacgactcgaaatccacatatcaaatatccgagtgtagtaaacattccaaaaccgtgatggaatggaacaaca cttaaatgtacaccctggtaatccgttttagaatccatgataataattttctggattattggtaattttttttgcacgttcaaaattttttgca acccctttttggaaacaaacactacggtaggctgcgaaatgttcatactgttgagcaattcacgttcattataagcttttcactgcat acgacgattctgtgatttgtattcagcccatatcgtttcatagcttctgccaaccgaacggacatttcgaagtattccgcgtacgtg atgttcacctcgatatgtgcatctgtaaaagcaattgttccaggaaccagggcgtatctcttcatagccatggaatacgcctttttc agtgttgcgatgctaatgccgttacaaatattccgagcaccaagaatggctgcgcgcttgcctggtacttgacgtcgtatttgacg gggtccttgagaaagtatttaaactggaacacaatctgaggaatgatcaaagcaaccaacgccaacgcataataactagtg caataccaagacctcccaataatagcacccagacttgtgtaataacctctggctctgatattgctccagatggaattggacgat atggctcattaattgcgtcgatatctctatcataccagtcgttgattgtctgtgtatagccagtaagacaaggaccagacatcatca tgcaaagaatcgcttaagcccttcttggcctttatgaggatctctctgatttttcttgcgtcgagttttccggtaagacctttcggtactt cgtccacaaacacaactcctccgcgcaactttttcgcggttgttacttgactggccacgtaatccacgatctctttttccgtcatcgt ctttccgtgctccaaaacaacaacggcggcgggtccggattaccagctgcgatcaagcttatcgataccgtcgacctcgacct gcaggcatgcaagcttggcgtaatcatggtcatagctgtttcctgtgtgaaattgttatccgctcacaattccacacaacatacga gccggaagcataaagtgtaaagcctggggtgcctaatgagtgagctaactcacattaattgcgttgcgctcactgcccgctttc cagtcgggaaacctgtcgtgccagctgcattaatgaatcggccaacgcgcggggagaggcggtttgcgtattgggcgctcttc
cgcttcctcgctcactgactcgctgcgctcggtcgttcggctgcggcgagcggtatcagctcactcaaaggcggtaatacggtt atccacagaatcaggggataacgcaggaaagaacatgtgagcaaaaggccagcaaaaggccaggaaccgtaaaaag gccgcgttgctggcgtttttccataggctccgcccccctgacgagcatcacaaaaatcgacgctcaagtcagaggtggcgaa acccgacaggactataaagataccaggcgtttccccctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccg gatacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctcacgctgtaggtatctcagttcggtgtaggtcgttcg ctccaagctgggctgtgtgcacgaaccccccgttcagcccgaccgctgcgccttatccggtaactatcgtcttgagtccaaccc ggtaagacacgacttatcgccactggcagcagccactggtaacaggattagcagagcgaggtatgtaggcggtgctacag agttcttgaagtggtggcctaactacggctacactagaaggacagtatttggtatctgcgctctgctgaagccagttaccttcgga aaaagagttggtagctcttgatccggcaaacaaaccaccgctggtagcggtggtttttttgtttgcaagcagcagattacgcgc agaaaaaaaggatctcaagaagatcctttgatcttttctacggggtctgacgctcagtggaacgaaaactcacgttaagggatt ttggtcatgagattatcaaaaaggatcttcacctagatccttttaaattaaaaatgaagttttaaatcaatctaaagtatatatgagt aaacttggtctgacagttaccaatgcttaatcagtgaggcacctatctcagcgatctgtctatttcgttcatccatagttgcctgact ccccgtcgtgtagataactacgatacgggagggcttaccatctggccccagtgctgcaatgataccgcgagacccacgctca ccggctccagatttatcagcaataaaccagccagccggaagggccgagcgcagaagtggtcctgcaactttatccgcctcc atccagtctattaattgttgccgggaagctagagtaagtagttcgccagttaatagtttgcgcaacgttgttgccattgctacaggc atcgtggtgtcacgctcgtcgtttggtatggcttcattcagctccggttcccaacgatcaaggcgagttacatgatcccccatgttgt gcaaaaaagcggttagctccttcggtcctccgatcgttgtcagaagtaagttggccgcagtgttatcactcatggttatggcagc actgcataattctcttactgtcatgccatccgtaagatgcttttctgtgactggtgagtactcaaccaagtcattctgagaatagtgta tgcggcgaccgagttgctcttgcccggcgtcaatacgggataataccgcgccacatagcagaactttaaaagtgctcatcatt ggaaaacgttcttcggggcgaaaactctcaaggatcttaccgctgttgagatccagttcgatgtaacccactcgtgcacccaac tgatcttcagcatcttttactttcaccagcgtttctgggtgagcaaaaacaggaaggcaaaatgccgcaaaaaagggaataag ggcgacacggaaatgttgaatactcatactcttcctttttcaatattattgaagcatttatcagggttattgtctcatgagcggataca tatttgaatgtatttagaaaaataaacaaataggggttccgcgcacatttccccgaaaagtgccacctgacgtctaagaaacc 
attattatcatgacattaacctataaaaataggcgtatcacgaggccctttcgtc
The insert was introduced as a four-fragment ligation of 140 (KpnI-BamHI), 150 / 160 (BamHI-HindIII), 170 / 180 (HindIII-SpeI) and 190 / 200 (SpeI-XhoI) bp fragments into KpnI / SalI-digested pUC19c DNA (GenBank X02514). The SalI and XhoI sites were lost as a result of the joining of their compatible sticky ends. The insert can be removed with BspEI or PvuII.
The insert region of 1190 bp and the short flanking regions to the vector junctions were sequenced twice in each direction in order to establish the plasmid sequence. In addition, a total of 63 analytical restriction digests and one minus-enzyme control were performed as detailed in the following table:
Figure imgf000113_0001
Figure imgf000114_0001
All of these digests produced the expected fragment patterns on agarose gel electrophoresis and a number of these, shaded in the table above, are shown in figure 1. The restriction digests illustrated in figure 1 were carried out using the following conditions:
Digest 1 (Minus enzyme control)
μl 100 μg/ml internal control plasmid DNA in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 1 x NEB buffer #2 + 100 μg/ml BSA
Digest 5
μl 0.33 U/μl NcoI in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 100 μg/ml internal control plasmid DNA in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 1 x NEB buffer #2 + 100 μg/ml BSA
Digest 9
Figure imgf000115_0001
Digest 37
Figure imgf000115_0002
Digest 49
5 μl 0.33 U/μl BamHI in 1 x NEB buffer #2 + 100 μg/ml BSA
5 μl 0.33 U/μl HindIII in 1 x NEB buffer #2 + 100 μg/ml BSA
5 μl 0.33 U/μl NcoI in 1 x NEB buffer #2 + 100 μg/ml BSA
5 μl 0.33 U/μl SpeI in 1 x NEB buffer #2 + 100 μg/ml BSA
0 μl 100 μg/ml internal control plasmid DNA in 1 x NEB buffer #2 + 100 μg/ml BSA
0 μl 1 x NEB buffer #2 + 100 μg/ml BSA
Digest 61
μl 0.33 U/μl BamHI in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 0.33 U/μl BsrGI in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 0.33 U/μl NcoI in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 0.33 U/μl SpeI in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 0.33 U/μl AflII in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 100 μg/ml internal control plasmid DNA in 1 x NEB buffer #2 + 100 μg/ml BSA
μl 1 x NEB buffer #2 + 100 μg/ml BSA
All 64 restriction digests were incubated at 37°C for 6 hours, and samples were then electrophoresed on 2.5 % FMC MetaPhor agarose gels as illustrated in figure 1. The lanes marked M contain mixed samples from all of digests 1-64; these lanes were used as size markers after confirmation of the fragment sizes against Stratagene kb ladder markers.
When spiked into a genomic DNA digest, the internal control restriction fragment pattern produced is indicative of both the degree of digestion and the nature of any partial restriction short of limit digestion. It is possible to deduce from the bands present which, if any, of the enzymes have failed to cut, and therefore to take action to correct this before the DNA is used in a subsequent analysis or enrichment procedure.
Preparation of BspEI-released internal control DNA on a large scale
The method used for generating internal control PCR product DNA (free from contaminating dNTPs) is depicted in the following figure:
Figure imgf000117_0001 (schematic; surviving labels: pNW33, dNTP)
Primers and PCR
20 μM BIO140UP
5' biotin-CGCAGCTGGTAATCCGGACGCCCGCGTCGAAGATGTT 3'
20 μM BIO200DOWN
5' biotin-CGCAGCTGGTAATCCGGACCCGCCGCCGTTGTTGTT 3'
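Both primers share the same 5' region, which carries a PvuII site (CAGCTG) followed by a BspEI site (TCCGGA); this is consistent with the statement that the insert can be removed with BspEI or PvuII, and with the BspEI release step described below. The Python sketch below simply locates those two recognition sequences (standard sites, not stated in the text) in the primer sequences given above, ignoring the 5' biotin.

```python
PRIMERS = {
    "BIO140UP": "CGCAGCTGGTAATCCGGACGCCCGCGTCGAAGATGTT",
    "BIO200DOWN": "CGCAGCTGGTAATCCGGACCCGCCGCCGTTGTTGTT",
}
SITES = {"PvuII": "CAGCTG", "BspEI": "TCCGGA"}

for primer, seq in PRIMERS.items():
    # str.find returns the 0-based start of the first match, or -1 if absent
    print(primer, {enzyme: seq.find(site) for enzyme, site in SITES.items()})
# prints:
# BIO140UP {'PvuII': 2, 'BspEI': 12}
# BIO200DOWN {'PvuII': 2, 'BspEI': 12}
```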
Bulk PCR amplification (192x 100 μl reactions) was carried out according to the conditions described below:
Figure imgf000117_0002
The master mix was rapidly dispensed into PCR tubes. Thermal cycling was initiated using the following parameters: 30 cycles of 97°C for 1 min, 50°C for 2 min and 72°C for 3 min; then 72°C for 5 min; then 4°C.
After PCR amplification, samples were pooled and subjected to capture of the biotinylated PCR product termini and BspEI release of the internal control DNA.
Capture of biotinylated PCR product termini and BspEI release of internal control DNA
All separations were carried out using a Dynal MPC-1 separator (Dynal, product #12001).
20 ml of pooled PCR reaction were mixed with 20 ml of Dynabeads M-280 in 20 mM Tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl. The tube was incubated at room temperature for 1 hour with rolling.
The Dynabeads were then washed four times with 20 ml of 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl.
BspEI digestion was carried out for 1 hour at 37°C in 2 ml of 0.5 U/μl BspEI in 1 x NEB buffer #3. The digest was then ethanol precipitated by the addition of 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol, chilling to -20°C and then centrifugation. Pellets were rinsed with 70 % ethanol prior to redissolution in 500 μl of 1x TE buffer. 1 μl of the BspEI-released internal control DNA was mixed with 10 μl of 50 % glycerol AGE loading dye and electrophoresed on a 1.5 % agarose gel to confirm that the size of the purified DNA was as expected.
High molecular weight genomic DNA digestion in the presence of internal control DNA with a dilution series of a mixture of the six set 1 6 bp cutters
High molecular weight canine genomic DNA was first mixed with a dilution series of a mixture of the six set 1 6 bp cutters, each at the same number of units. Aliquots were then removed and mixed with the BspEI-released internal control DNA described above. 20 μg of canine genomic DNA was digested with 0.25, 0.025, 0.0025, 0.00025, 0.000025, 0.0000025 and 0 U/μl BamHI / BsrGI / HindIII / NcoI / SpeI / AflII in 200 μl. 0.4 μg of canine genomic DNA and 1 μl of BspEI-released internal control DNA were digested with 0.25, 0.025, 0.0025, 0.00025, 0.000025, 0.0000025 and 0 U/μl BamHI / BsrGI / HindIII / NcoI / SpeI / AflII in 4 μl. Vistra Green staining was used to monitor the extent of internal control DNA digestion and of high molecular weight DNA digestion. The fragment pattern from the internal control DNA is diagnostic of both the degree of digestion and the nature of any partial restriction short of limit digestion.
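The quoted concentrations can be cross-checked arithmetically. The Python sketch below regenerates the 10-fold enzyme series and shows that the 200 μl and 4 μl digests share the same genomic DNA concentration (0.1 μg/μl), and hence the same units of each enzyme per μg of DNA at every step of the series. It uses only figures stated above; the variable names are illustrative.

```python
from math import isclose

top = 0.25                                   # U/ul of each enzyme, top of the series
series = [top / 10 ** k for k in range(6)] + [0.0]

# Genomic DNA concentration (ug/ul) in the two sets of digests
dna_bulk = 20 / 200                          # digests 1-7: 20 ug in 200 ul
dna_ic = 0.4 / 4                             # digests 1-7ic: 0.4 ug in 4 ul

# Units of each enzyme per ug of DNA at the top of the series
ratio_bulk = top * 200 / 20
ratio_ic = top * 4 / 0.4

print([f"{c:g}" for c in series])
print(isclose(dna_bulk, dna_ic), round(ratio_bulk, 3), round(ratio_ic, 3))
# second line prints: True 2.5 2.5
```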
A premix of restriction enzymes, buffer and BSA was prepared as detailed below:
[Table: premix of restriction enzymes, buffer and BSA]
Total volume = 40 μl
The premix therefore contained each restriction enzyme at 2.5 U/μl in 1x NEB buffer #2 and 1x BSA. Serial 10-fold dilutions of this premix were then prepared in 1x NEB buffer #2 and 1x BSA.
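As a quick check on the series above, the sketch below lists the final per-enzyme concentrations and the enzyme units available per microgram of DNA. It assumes the concentrations quoted for digests 1-7 (a 10-fold series from 0.25 U/μl plus a no-enzyme tube) and ignores the mass of internal control DNA in digests 1-7ic; the helper names are illustrative, not part of the original protocol. Because both sets of digests hold canine DNA at 0.1 μg/μl (20 μg in 200 μl and 0.4 μg in 4 μl), each tube should show the same units-per-microgram ratio in both sets.

```python
def serial_dilution(top: float, fold: float, n_nonzero: int,
                    include_zero: bool = True) -> list[float]:
    """Final concentrations of an n-step serial dilution, optionally with a no-enzyme control."""
    series = [top / fold ** i for i in range(n_nonzero)]
    if include_zero:
        series.append(0.0)
    return series


def units_per_ug(conc_u_per_ul: float, volume_ul: float, dna_ug: float) -> float:
    """Units of each enzyme available per microgram of substrate DNA."""
    return conc_u_per_ul * volume_ul / dna_ug


# Digests 1-7: 0.25 U/ul down to 0.0000025 U/ul, plus 0 U/ul
digest_series = serial_dilution(0.25, 10, 6)

for tube, conc in enumerate(digest_series, start=1):
    bulk = units_per_ug(conc, volume_ul=200, dna_ug=20)   # digests 1-7
    ic = units_per_ug(conc, volume_ul=4, dna_ug=0.4)      # digests 1-7ic (IC DNA mass ignored)
    print(f"tube {tube}: {conc:g} U/ul -> {bulk:g} U/ug (1-7), {ic:g} U/ug (1-7ic)")
```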
Premixes of canine genomic DNA, buffer and BSA were also prepared as detailed below:
[Table: premixes of canine genomic DNA, buffer and BSA]
These premixes therefore contain canine genomic DNA at 0.25 mg/ml in 1 x NEB buffer #2 and 1 x BSA.
Canine genomic DNA and restriction enzyme mixes were then set up as follows:
[Table: set-up of canine genomic DNA and restriction enzyme mixes, tubes 1-7]
2 μl aliquots were then removed from tubes 1-7 and added to 1 μl of BspEI-released internal control DNA and 1 μl of 2x NEB buffer #2 and 2x BSA to give samples 1-7ic. Samples 1-7ic were then overlaid with 50 μl of mineral oil in order to prevent evaporation.
The remaining volume from tubes 1-7 was then added to 100 μl of 1x NEB buffer #2 and 1x BSA.
All samples were finally incubated at 37°C overnight.
After digestion, 20 μl of digests 1-7 were mixed with 10 μl of 50 % glycerol AGE loading dye, and 4 μl of digests 1-7ic were mixed with 2 μl of 50 % glycerol AGE loading dye. Digests in loading dye were then electrophoresed on a 2.5 % MetaPhor agarose gel in 1x TBE. The gel was stained for 60 min in 500 ml of 1x TBE containing 50 μl of Vistra Green. The stained gel was finally imaged on a Fluorimager with the following settings: a 488 nm laser; a 570 DF 30 filter; a PMT setting of 700 V; 200 μm resolution; and low sensitivity.
Restriction digests - final conditions
Restriction digests 1-7 were therefore performed under the following conditions:
[Table: final conditions for restriction digests 1-7]
Total volume = 200 μl
Likewise, restriction digests 1-7ic were performed under the following conditions:
[Table: final conditions for restriction digests 1-7ic]
IC = BspEI-released internal control DNA
Total volume = 4 μl
The results are shown in figure 2.
Example 1b - The digestion internal control plasmid for the set 1 4 bp cutter TRSPA enzymes HaeIII, MboI, and MseI
The plasmid pNW35 (shown below) was constructed to contain an insert with all of the 4 bp cutter TRSPA enzyme sites.
[Figure: map of plasmid pNW35]
HindIII and EcoRI sites define the outer ends of the 25 bp and the 40 bp fragments. The sequence of pNW35 is shown below with the inserted region shown in bold type:
atgaccatgattacgccaagctctaatacgactcactatagggaaagcttccggacgtctcaggctaatgttggcccacc gacgttccacgatggggcgctcttaagggcttagaccctcgtcgggagtatttctgtgatctggcgacactcacgcg agaagtcattaccggcgatatgaattcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaactta atcgccttgcagcacatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcg cagcctgaatggcgaatgggaaattgtaaacgttaatattttgttaaaattcgcgttaaatttttgttaaatcagctcattttttaacca ataggccgaaatcggcaaaatcccttataaatcaaaagaatagaccgagatagggttgagtgttgttccagtttggaacaag agtccactattaaagaacgtggactccaacgtcaaagggcgaaaaaccgtctatcagggcgatggcccactacgtgaacc atcaccctaatcaagttttttggggtcgaggtgccgtaaagcactaaatcggaaccctaaagggagcccccgatttagagcttg acggggaaagccggcgaacgtggcgagaaaggaagggaagaaagcgaaaggagcgggcgctagggcgctggcaa gtgtagcggtcacgctgcgcgtaaccaccacacccgccgcgcttaatgcgccgctacagggcgcgtcaggtggcacttttcg gggaaatgtgcgcggaacccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctgataaat gcttcaataatattgaaaaaggaagagtatgagtattcaacatttccgtgtcgcccttattcccttttttgcggcattttgccttcctgttt ttgctcacccagaaacgctggtgaaagtaaaagatgctgaagatcagttgggtgcacgagtgggttacatcgaactggatctc aacagcggtaagatccttgagagttttcgccccgaagaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggt attatcccgtattgacgccgggcaagagcaactcggtcgccgcatacactattctcagaatgacttggttgagtactcaccagtc acagaaaagcatcttacggatggcatgacagtaagagaattatgcagtgctgccataaccatgagtgataacactgcggcc aacttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgcacaacatgggggatcatgtaactcgccttgat cgttgggaaccggagctgaatgaagccataccaaacgacgagcgtgacaccacgatgcctgtagcaatggcaacaacgtt gcgcaaactattaactggcgaactacttactctagcttcccggcaacaattaatagactggatggaggcggataaagttgcag gaccacttctgcgctcggcccttccggctggctggtttattgctgataaatctggagccggtgagcgtgggtctcgcggtatcatt gcagcactggggccagatggtaagccctcccgtatcgtagttatctacacgacggggagtcaggcaactatggatgaacga aatagacagatcgctgagataggtgcctcactgattaagcattggtaactgtcagaccaagtttactcatatatactttagattgat ttaaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgataatctcatgaccaaaatcccttaacgtgagttttcgttcc
actgagcgtcagaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcgcgtaatctgctgcttgcaaacaaaa aaaccaccgctaccagcggtggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcagagcg cagataccaaatactgtccttctagtgtagccgtagttaggccaccacttcaagaactctgtagcaccgcctacatacctcgctc tgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataa ggcgcagcggtcgggctgaacggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaactgagatac
ctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcg gaacaggagagcgcacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcgggtttcgccacctctgactt gagcgtcgatttttgtgatgctcgtcaggggggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcctggcc ttttgctggccttttgctcacatgttctttcctgcgttatcccctgattctgtggataaccgtattaccgcctttgagtgagctgataccgc tcgccgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaagagcgcccaatacgcaaaccgcctc tccccgcgcgttggccgattcattaatgcagctggcacgacaggtttcccgactggaaagcgggcagtgagcgcaacgcaat taatgtgagttagctcactcattaggcaccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataa caatttcacacaggaaacagct
The 4 bp internal control plasmid for restriction enzymes HaeIII, MboI, and MseI was prepared by the insertion of a synthetic 130 bp fragment into HindIII / EcoRI-digested pMOSBlue (Amersham Pharmacia Biotech).
The 130 bp insert region was sequenced twice in each direction in order to confirm the plasmid sequence. The presence of the restriction sites and the mobility of the fragments released were also checked by restriction digestion and polyacrylamide gel electrophoresis.
Preparation of the internal control spike DNA from the plasmid
The following illustration describes the strategy used for the preparation of EcoRI-released internal control DNA:
[Figure: strategy for preparing the EcoRI-released internal control DNA: crude PCR product (+ dNTP) → streptavidin (SA)-captured biotinylated (bio) PCR product → clean PCR product → restriction enzyme (RE) digest → 4 bp cutter internal control for RE digests]
Primers and PCR
U-19mer bio primer 5' bio-GTTTTCCCAGTCACGACGT 3'
ICPCR(F) primer 5' TCCGGACGTCTCAGGCTAATGTT 3'
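The expected internal control fragments can be sketched in silico. The block below is a minimal illustration that assumes the PCR product runs from the ICPCR(F) primer to the binding site of the biotinylated U-19mer primer on the pNW35 sequence given above (the 172 bp string was assembled by hand from that sequence), that EcoRI release leaves the biotinylated end on the beads, and standard cut positions for EcoRI (G^AATTC), HaeIII (GG^CC), MboI (^GATC) and MseI (T^TAA). It reports top-strand lengths only and ignores 5' overhangs, so its numbers need not match fragment sizes read from a gel, such as the 25 bp and 40 bp outer fragments mentioned in the text.

```python
# AMPLICON is assembled by hand from the pNW35 sequence: it runs from the
# ICPCR(F) primer to the binding site of the biotinylated U-19mer primer.
AMPLICON = (
    "tccggacgtctcaggctaatgttggcccaccgacgttccacgatggggcgctctt"
    "aagggcttagaccctcgtcgggagtatttctgtgatctggcgacactcacgcgagaagtc"
    "attaccggcgatatgaattcactggccgtcgttttacaacgtcgtgactgggaaaac"
).upper()

ICPCR_F = "TCCGGACGTCTCAGGCTAATGTT"
U19_BIO = "GTTTTCCCAGTCACGACGT"  # carries the 5' biotin in the text

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


# Recognition site and top-strand cut offset; all four sites are palindromic,
# so scanning the top strand alone finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "HaeIII": ("GGCC", 2),    # GG^CC
    "MboI": ("GATC", 0),      # ^GATC
    "MseI": ("TTAA", 1),      # T^TAA
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut coordinates (0-based, between seq[i-1] and seq[i])."""
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def top_strand_fragments(seq: str, enzymes: list[str]) -> list[int]:
    cuts = sorted({c for name in enzymes for c in cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


assert AMPLICON.startswith(ICPCR_F) and AMPLICON.endswith(revcomp(U19_BIO))

# EcoRI leaves the biotinylated (U-19mer) end on the beads and releases the rest
(ecori_cut,) = cut_positions(AMPLICON, *ENZYMES["EcoRI"])
released = AMPLICON[:ecori_cut]
print(len(AMPLICON), len(released))
print(top_strand_fragments(released, ["HaeIII", "MboI", "MseI"]))
```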
Bulk PCR amplification (96x 100 μl reactions, repeated to give 192 reactions in total) was carried out according to the conditions described below (all volumes are in μl):
[Table: bulk PCR reaction set-up]
The PCR master mix was rapidly dispensed into 96 PCR tubes. Thermal cycling was initiated using the following parameters: 94°C for 2 min, 50°C for 2 min, 29 cycles of 72°C for 2 min, 94°C for 45 sec, and 50°C for 1 min, then 72°C for 8 min, and then 4°C.
Capture of biotinylated PCR product termini and EcoRI release of internal control DNA
All separations were carried out using a Dynal MPC-1 separator (Dynal, product #12001). 20 ml of pooled PCR reaction were mixed with 20 ml of Dynabeads M-280 in 20 mM Tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl. The tube was incubated at room temperature for 1 hour with mixing on a Denley Spiramix 5.
The Dynabeads were then washed four times with 20 ml of 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl. A fifth wash was performed in 20 ml of 1x buffer M.
EcoRI digestion was carried out for 1 hour at 37°C in 5 ml of 0.25 U/μl EcoRI in 1x buffer M. The digest was then divided into ten 500 μl aliquots. Each aliquot was ethanol precipitated by the addition of 1 μl of See DNA, 0.1 volume of 3 M sodium acetate (pH 5.2), and 2.5 volumes of ethanol. The precipitation mixtures were mixed, chilled to 0°C on ice for 30 minutes and then centrifuged at 20,000 maxRCF for 10 minutes. The pellets were rinsed with 70 % ethanol before dissolving in a total of 500 μl of 1x TE buffer. A 1 μl aliquot of the EcoRI-released internal control DNA was electrophoresed on an 8 % polyacrylamide gel to confirm that the size of the purified DNA was in accordance with that expected.
High molecular weight genomic DNA digestion in the presence of the 4 bp cutter internal control DNA with a dilution series of the enzymes HaeIII, MseI, and MboI
High molecular weight human placental DNA was mixed with a dilution series of a mixture of the three set 1 4 bp cutter restriction endonucleases, each at the same number of units. Aliquots were then removed and mixed with the EcoRI-released internal control DNA. 3.6 μg of human placental DNA (Sigma) was digested with 0.5 U/μl, 0.1 U/μl, 0.02 U/μl, 0.004 U/μl, and 0 U/μl each of HaeIII, MboI, and MseI in a total volume of 36 μl. 0.4 μg of placental DNA and 1 μl of EcoRI-released internal control DNA were digested with 0.5 U/μl, 0.1 U/μl, 0.02 U/μl, 0.004 U/μl, and 0 U/μl each of HaeIII, MboI, and MseI in a total volume of 4 μl. Vistra Green (Amersham Pharmacia Biotech) staining was used to monitor the extent of internal control DNA digestion and high molecular weight DNA digestion.
A premix of restriction enzymes (5 U/μl each enzyme), buffer, and BSA was prepared as described below:
[Table: premix of restriction enzymes (5 U/μl each enzyme), buffer and BSA]
Serial 5-fold dilutions of the premix were prepared in 1x NEB buffer #2 and 1x BSA.
A 6x premix of human placental DNA, buffer, and BSA was also prepared as described below:
Component | 6x mix (μl) | per reaction (μl)
1 mg/ml human placental DNA | 24 | 4
10x NEB buffer #2 | 9.6 | 1.6
10x BSA | 9.6 | 1.6
Water | 52.8 | 8.8
This premix contained placental DNA at 0.25 mg/ml in 1 x NEB buffer #2 and 1 x BSA.
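The 6x premix follows simple C1V1 = C2V2 arithmetic. This sketch, with hypothetical helper names, assumes the per-reaction volumes in the table and checks that they give 0.25 mg/ml DNA and 1x buffer and BSA in a 16 μl per-reaction volume, then scales the recipe to six reactions.

```python
# (stock concentration, volume per reaction in ul); DNA stock in mg/ml, buffer and BSA in "x"
PER_REACTION = {
    "1 mg/ml human placental DNA": (1.0, 4.0),
    "10x NEB buffer #2": (10.0, 1.6),
    "10x BSA": (10.0, 1.6),
    "Water": (0.0, 8.8),
}


def scale(recipe: dict[str, tuple[float, float]], n_reactions: int) -> dict[str, float]:
    """Volumes (ul) needed for a premix covering n_reactions."""
    return {name: vol * n_reactions for name, (_, vol) in recipe.items()}


def final_concentrations(recipe: dict[str, tuple[float, float]]) -> tuple[float, dict[str, float]]:
    """C_final = C_stock * V / V_total for every component (C1V1 = C2V2)."""
    total_ul = sum(vol for _, vol in recipe.values())
    return total_ul, {name: stock * vol / total_ul for name, (stock, vol) in recipe.items()}


total_ul, finals = final_concentrations(PER_REACTION)
print(f"per reaction: {total_ul:g} ul", {k: round(v, 3) for k, v in finals.items()})
print("6x premix:", {k: round(v, 2) for k, v in scale(PER_REACTION, 6).items()})
```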
Human placental DNA and restriction mixes were then set up as follows:
[Table: set-up of human placental DNA and restriction enzyme mixes, tubes 1-5]
2 μl aliquots were then removed from tubes 1-5 and added to 1 μl of EcoRI-released internal control DNA and 1 μl of 2x NEB buffer #2 and 2x BSA to give samples 1-5ic. Samples 1-5ic were then overlaid with 50 μl of mineral oil in order to prevent evaporation.
The remaining volume from tubes 1-5 was then added to 18 μl of 1x NEB buffer #2 and 1x BSA.
All samples were finally incubated at 37°C overnight.
Restriction digests - final conditions
Restriction digests 1-5 were therefore performed under the following conditions:
[Table: final conditions for restriction digests 1-5]
Total volume = 36 μl
Likewise, restriction digests 1-5ic were performed under the following conditions:
[Table: final conditions for restriction digests 1-5ic]
IC = EcoRI-released internal control DNA
Total volume = 4 μl
10 μl of digests 1-5 were each mixed with 3 μl of loading dye and 4 μl of digests 1-5ic were mixed with 1 μl of loading dye. To sample 5ic, 180 ng of PCR molecular weight markers were added to serve as size standards. The band sizes for these markers are 50 bp, 100 bp, 200 bp, 300 bp, 400 bp, 500 bp, 525 bp, 700 bp, and 1000 bp. The digests were then electrophoresed on an 8 % polyacrylamide gel in 1x TBE as described below:
10 ml of 10x TBE
19.5 ml of 40 % acrylamide
19 ml of 2 % methylene bisacrylamide
50.4 ml of water
1 ml of freshly prepared 10 % (w/v) ammonium persulphate
100 μl of TEMED
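The text describes this as an 8 % gel. A minimal sketch of the final composition, assuming the acrylamide and bisacrylamide stock percentages are w/v, shows about 7.8 % acrylamide plus 0.38 % bisacrylamide in 100 ml (roughly 8.2 %T with about 4.6 %C), together with 1x TBE and 0.1 % ammonium persulphate.

```python
# Final composition of the polyacrylamide gel mix (sketch; stock % assumed w/v)
RECIPE_ML = {
    "10x TBE": 10.0,
    "40 % acrylamide": 19.5,
    "2 % methylene bisacrylamide": 19.0,
    "water": 50.4,
    "10 % ammonium persulphate": 1.0,
    "TEMED": 0.1,
}

total_ml = sum(RECIPE_ML.values())
acrylamide_pct = 40 * RECIPE_ML["40 % acrylamide"] / total_ml
bis_pct = 2 * RECIPE_ML["2 % methylene bisacrylamide"] / total_ml
total_monomer_pct = acrylamide_pct + bis_pct           # %T
crosslinker_pct = 100 * bis_pct / total_monomer_pct    # %C
tbe_fold = 10 * RECIPE_ML["10x TBE"] / total_ml
aps_pct = 10 * RECIPE_ML["10 % ammonium persulphate"] / total_ml

print(f"volume {total_ml:.1f} ml")
print(f"acrylamide {acrylamide_pct:.2f} %, bis {bis_pct:.2f} %")
print(f"%T {total_monomer_pct:.2f}, %C {crosslinker_pct:.2f}")
print(f"TBE {tbe_fold:.1f}x, APS {aps_pct:.2f} %")
```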
A Cambridge Electrophoresis Ltd. vertical protein electrophoresis unit was used with 1 mm plate spacing. The samples were electrophoresed at 30 mA for 2 hours in 1 x TBE. The gel was then stained for 30 minutes in 500 ml of 1 x TBE containing 50 μl of Vistra Green. The stained gel was imaged on a Fluorimager with the following settings: a 488 nm laser; a 570 DF 30 filter; a PMT setting of 700 V; 200 μm resolution; and low sensitivity.
The results are shown in figure 3.
Example 2
TRSPA-2 analysis with the pNW33 BamHI, HindIII, and AflII matrix and the pNW33 HindIII, NcoI, and SpeI matrix probed with the 140 bp BspEI to BamHI fragment from pNW33
As an example of the case where the probe hybridizes to an arrayed PCR fragment with a different restriction site at each end, the pNW33 BamHI, HindIII, and AflII matrix (matrix #7) was probed with a PCR product from the 140 bp BspEI (466) to BamHI (605) restriction fragment within pNW33. In this example, the probe binds to a 204 bp HindIII to BamHI PCR product derived from the 160 bp HindIII (445) to BamHI (605) restriction fragment within pNW33 (see below).
[Figure: pNW33 restriction fragments referred to in this example]
As an example of the case where the probe hybridizes to an arrayed PCR fragment with the same restriction site at each end, the pNW33 HindIII, NcoI, and SpeI matrix (matrix #17) was probed with a PCR product from the 140 bp BspEI (466) to BamHI (605) restriction fragment within pNW33. In this example, the probe binds to a 514 bp HindIII to HindIII PCR product derived from the 470 bp HindIII (445) to HindIII (915) restriction fragment within pNW33 (see above).
Oligonucleotides
BamHI short PCR primer 5' TGTAACGACACATTGCTGGATACC 3'
HindIII short PCR primer 5' ATATAACTCTCGCTCCTTGATAAC 3'
NcoI short PCR primer 5' AGGCGTCTGAGGCTGCGGCTATGG 3'
SpeI short PCR primer 5' AACCCGTCGCGACGAGAGTCTAAG 3'
AflII short PCR primer 5' GATATACGTGATATATTTTGATTG 3'
BamHI adaptor 5' pGATCGGTATCCAGCAATGTGTCGTTACA 3'
HindIII adaptor 5' pAGCTGTTATCAAGGAGCGAGAGTTATAT 3'
NcoI adaptor 5' pCATGCCATAGCCGCAGCCTCAGACGCCT 3'
SpeI adaptor 5' pCTAGCTTAGACTCTCGTCGCGACGGGTT 3'
AflII adaptor 5' pTTAACAATCAAAATATATCACGTATATC 3'
BamHI long PCR primer 5' TGTAACGACACATTGCTGGATACCGATCC 3'
HindIII long PCR primer 5' ATATAACTCTCGCTCCTTGATAACAGCTT 3'
NcoI long PCR primer 5' AGGCGTCTGAGGCTGCGGCTATGGCATGG 3'
SpeI long PCR primer 5' AACCCGTCGCGACGAGAGTCTAAGCTAGT 3'
AflII long PCR primer 5' GATATACGTGATATATTTTGATTGTTAAG 3'
Luc140down primer 5' GCGCTAGGGATCCTTACTGGGACGAAGACGAA 3'
Luc140up-bio primer 5' biotin-CGCAGCTGGTAATCCGGACGCCCGCGTCGAAGATGTT 3'
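On our reading of the oligonucleotide list, each adaptor is the reverse complement of its short PCR primer plus the enzyme's 4-nt 5' overhang, and each long PCR primer is the short primer extended by five bases that read into the ligated restriction-site remnant. The sketch below checks that interpretation for all five enzymes; it assumes each enzyme cuts after the first base of its 6 bp site (true for BamHI, HindIII, NcoI, SpeI and AflII), and the function name is illustrative.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


# enzyme: (recognition site, short primer, adaptor without 5' phosphate, long primer)
OLIGOS = {
    "BamHI": ("GGATCC", "TGTAACGACACATTGCTGGATACC",
              "GATCGGTATCCAGCAATGTGTCGTTACA", "TGTAACGACACATTGCTGGATACCGATCC"),
    "HindIII": ("AAGCTT", "ATATAACTCTCGCTCCTTGATAAC",
                "AGCTGTTATCAAGGAGCGAGAGTTATAT", "ATATAACTCTCGCTCCTTGATAACAGCTT"),
    "NcoI": ("CCATGG", "AGGCGTCTGAGGCTGCGGCTATGG",
             "CATGCCATAGCCGCAGCCTCAGACGCCT", "AGGCGTCTGAGGCTGCGGCTATGGCATGG"),
    "SpeI": ("ACTAGT", "AACCCGTCGCGACGAGAGTCTAAG",
             "CTAGCTTAGACTCTCGTCGCGACGGGTT", "AACCCGTCGCGACGAGAGTCTAAGCTAGT"),
    "AflII": ("CTTAAG", "GATATACGTGATATATTTTGATTG",
              "TTAACAATCAAAATATATCACGTATATC", "GATATACGTGATATATTTTGATTGTTAAG"),
}


def check_design(site: str, short: str, adaptor: str, long_primer: str,
                 cut: int = 1, overhang: int = 4) -> dict[str, bool]:
    """Check one enzyme's oligos against the interpretation described above."""
    sticky = site[cut:cut + overhang]
    return {
        "adaptor starts with the sticky end": adaptor[:overhang] == sticky,
        "adaptor duplex pairs with short primer": revcomp(adaptor[overhang:]) == short,
        "long primer = short + 5 site-derived bases":
            long_primer == short + revcomp(site[:cut + overhang]),
    }


for enzyme, oligos in OLIGOS.items():
    print(enzyme, check_design(*oligos))
```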
Restriction digestion of pNW33
Restriction digests of pNW33 were set up by combining the following:
Matrix #7
[Table: restriction digest set-up for pNW33]
The samples were then vortex mixed and incubated at 37°C overnight.
Calf intestinal alkaline phosphatase (CIAP) digestion of pNW33 restriction digests
400 μl fractions of the 20 restriction digests (each containing 6.6 μg of digested pNW33) were ethanol precipitated by the addition of 1 μl of See DNA, 0.1 volume of 3 M sodium acetate (pH 5.2), and 2.5 volumes of ethanol, chilling to -20°C, and then centrifugation. Pellets were rinsed with 70 % ethanol prior to re-dissolution in 20 μl of 1x CIAP buffer containing 40 U of CIAP. The CIAP digests were carried out for 5 hours at 37°C and were then made up to 400 μl with 1x TE buffer.
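The precipitation volumes used throughout can be computed as below. This is a sketch with a hypothetical helper name; the text does not say whether the 2.5 volumes of ethanol are measured against the sample alone or the sample plus sodium acetate, so both readings are exposed through a flag. For a 400 μl digest fraction plus 1 μl of See DNA, the salt-inclusive reading gives a total of just over 1.5 ml.

```python
def precipitation_volumes(sample_ul: float, carrier_ul: float = 0.0,
                          salt_in_basis: bool = True) -> dict[str, float]:
    """0.1 volume of 3 M sodium acetate (pH 5.2), then 2.5 volumes of ethanol.

    salt_in_basis=True measures the 2.5 volumes against sample + carrier + salt;
    False measures them against sample + carrier only (the text does not say which).
    """
    aqueous = sample_ul + carrier_ul
    naoac = 0.1 * aqueous
    basis = aqueous + naoac if salt_in_basis else aqueous
    ethanol = 2.5 * basis
    return {"3 M NaOAc": naoac, "ethanol": ethanol, "total": aqueous + naoac + ethanol}


# One 400 ul pNW33 digest fraction plus 1 ul of See DNA carrier
print(precipitation_volumes(400, carrier_ul=1))
print(precipitation_volumes(400, carrier_ul=1, salt_in_basis=False))
```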
Phenol extraction of pNW33 CIAP digests
The diluted digests were extracted with 400 μl of phenol and then ethanol precipitated as described above but with a 100 % ethanol wash after the 70 % ethanol wash. Samples were finally re-dissolved in 20 μl of TE buffer.
Annealing of short PCR primers to cognate adaptors
Short PCR primers and their cognate adaptors were annealed by adding 1 μl of 200 μM short PCR primer to 1 μl of 200 μM cognate adaptor in 20 μl of 50 mM NaCl, 1x TE buffer. The mixed oligonucleotides were overlaid with 30 μl of light mineral oil and were then heated to 90°C for 5 minutes followed by slow cooling to room temperature. The annealed short PCR primer / cognate adaptor complexes were then diluted with 1 ml of 1x TE buffer and stored frozen at -20°C.
Ligation to annealed short PCR primers and cognate adaptors
1 μl of phenol extracted pNW33 digest was used per ligation.
Ligations were performed in 100 μl of 1 x ligase buffer containing 1 mM ATP and 10 μl (100 U) of T4 DNA ligase. 20 μl aliquots from the 1 ml of annealed short PCR primer / cognate adaptor complexes were added according to the following table.
#7 BamHI / HindIII / AflII
#17 HindIII / NcoI / SpeI
Ligation reactions were carried out for 24 hours at 16°C. Samples were then diluted to 1 ml with TE buffer and stored at -20°C.
The diluted ligation reactions were then further diluted 1 in 10 and 10 μl was used as PCR template per 100 μl reaction.
Restriction digestion of human placental DNA
1 A260 unit (~50 μg) of human placental DNA was digested overnight at 37°C with 100 U each of BamHI, BsrGI, HindIII, NcoI, SpeI, and AflII in 400 μl of 1x NEB buffer #2 containing 100 μg/ml BSA.
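The "~50 μg" figure corresponds to the conventional A260 conversion for double-stranded DNA. The sketch below applies that convention (and the usual approximation of 33 μg per A260 unit for single-stranded DNA, an assumption not stated in the text) to express the digest as enzyme units per microgram; the helper names are illustrative.

```python
# Conventional approximations: ug of nucleic acid per A260 unit (1 cm path, 1 ml)
UG_PER_A260_UNIT = {"dsDNA": 50.0, "ssDNA": 33.0}


def ug_from_a260_units(a260_units: float, kind: str = "dsDNA") -> float:
    return a260_units * UG_PER_A260_UNIT[kind]


def enzyme_units_per_ug(enzyme_units: float, dna_ug: float) -> float:
    return enzyme_units / dna_ug


dna_ug = ug_from_a260_units(1)              # 1 A260 unit of placental DNA
ratio = enzyme_units_per_ug(100, dna_ug)    # 100 U of each of the six enzymes
print(f"{dna_ug:g} ug DNA, {ratio:g} U of each enzyme per ug")
```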
CIAP digestion of human placental DNA restriction digest
The digest was ethanol precipitated by the addition of 1 μl of See DNA, 0.1 volume of 3 M sodium acetate (pH 5.2), and 2.5 volumes of ethanol, chilling to -20°C, and then centrifugation. The pellet was rinsed with 70 % ethanol prior to re-dissolution in 50 μl of 1 x CIAP buffer containing 100 U of CIAP. The CIAP digest was carried out for 5 hours at 37°C and was then made up to 400 μl with 1 x TE buffer.
Phenol extraction of human placental DNA CIAP digest
The diluted CIAP digest was extracted with 400 μl of phenol and then ethanol precipitated as described above, again with a 100 % ethanol wash after the 70 % ethanol wash. The sample was finally re-dissolved in 10 μl of TE buffer.
Ligation to annealed short PCR primers and cognate adaptors
Short PCR primers were annealed to their cognate adaptors as described above.
The ligation to annealed short PCR primers and cognate adaptors was carried out in 100 μl of 1x ligase buffer with 1 mM ATP, 100 U of T4 DNA ligase and 10 μl of each of the six short PCR primer / cognate adaptor complexes described above.
The ligation reaction was carried out for 24 hours at 16°C. The sample was then diluted to 1 ml with 1 x TE buffer and stored at -20°C. 0.2 μl was used as PCR template per 100 μl reaction.
PCR amplification conditions
An initial touch-down reaction was carried out in 50 μl of 1x PCR buffer with all four dNTPs at 200 μM and Taq DNA polymerase at 0.05 U/μl. Long PCR primers were used at 400 nM. 10 μl of pNW33 PCR template was used per reaction and 0.2 μl of human placental DNA PCR template was used per reaction. The samples were overlaid with 40 μl of light mineral oil and were touch-down thermocycled as described below:
98°C for 1 min, 72°C for 2 min, 72°C for 5 min
98°C for 1 min, 69°C for 2 min, 72°C for 5 min
98°C for 1 min, 66°C for 2 min, 72°C for 5 min
98°C for 1 min, 63°C for 2 min, 72°C for 5 min
98°C for 1 min, 60°C for 2 min, 72°C for 5 min
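The touch-down program lowers the annealing temperature by 3°C per cycle from 72°C to 60°C. A minimal sketch that generates those five cycles and sums the programmed hold times (ramp times excluded); the Step type and function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


def touchdown_cycles(start_c: int, stop_c: int, decrement_c: int,
                     denature: Step = Step(98, 60), anneal_s: int = 120,
                     extend: Step = Step(72, 300)) -> list[list[Step]]:
    """One three-step cycle per annealing temperature, stepping down from start_c.

    The last cycle anneals at stop_c when (start_c - stop_c) is a multiple of decrement_c.
    """
    return [[denature, Step(temp, anneal_s), extend]
            for temp in range(start_c, stop_c - 1, -decrement_c)]


program = touchdown_cycles(72, 60, 3)
anneal_temps = [cycle[1].temp_c for cycle in program]
hold_minutes = sum(step.seconds for cycle in program for step in cycle) / 60  # ramps excluded

for n, cycle in enumerate(program, start=1):
    print(n, ", ".join(f"{s.temp_c}°C for {s.seconds // 60} min" for s in cycle))
print(anneal_temps, hold_minutes)
```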
The main thermal cycling was then carried out in 100 μl of 1x PCR buffer with all four dNTPs at 200 μM and Taq DNA polymerase at 0.05 U/μl. Long PCR primers were used at 400 nM. The samples were subjected to 20 cycles of 98°C for 1 min, 60°C for 2 min, 72°C for 5 min and then 72°C for 10 min, followed by chilling at 10°C and recovery from under oil.
Arraying onto nylon membranes
1 μl aliquots from the PCR amplified samples were spotted onto Hybond N+ nylon membranes. The membranes were then transferred to a stack of three sheets of 3MM paper, saturated with 0.4 M NaOH, and incubated for 10 minutes. The NaOH was rinsed away in 2x SSC and the membranes were used directly for the pre-hybridization.
Probe synthesis
A PCR master mix was prepared as follows (all volumes are in μl):
[Table: probe synthesis PCR master mix]
Five PCR reactions were performed. For each reaction, 5 μl of 33P-labelled dCTP was added to the PCR tube. The reactions were then made up to 50 μl by the addition of 45 μl of the master mix to each tube. Each reaction was gently mixed.
PCR cycling parameters
94°C 2 min
50 °C 2 min
25 cycles
[Table: cycling steps]
50 °C 1 min
72°C 8 min
4°C hold
After thermal cycling, the five PCR amplifications were pooled. The 250 μl of labelled DNA was then used for the following magnetic bead purification procedure.
Capture of biotinylated PCR product and release of non-biotinylated strands
All separations were carried out using a Dynal MPC-4 separator (Dynal, product #12004).
250 μl of the pooled PCR was mixed with an equal volume of 10 mg/ml streptavidin-coated colloidal Fe3O4 particles in 20 mM Tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl. The tube was incubated at room temperature for 1 hour with mixing on a Denley Orbital Mixer.
The streptavidin-coated colloidal Fe3O4 particles were then washed with 1 ml of 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl. Three more identical washes were performed. The washed streptavidin-coated colloidal Fe3O4 particles were incubated in 500 μl of 0.1 M NaOH for 10 minutes at room temperature. The supernatant was removed and added to 500 μl of 0.5 M HEPES. The samples were then ethanol precipitated by the addition of 0.1 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of ethanol, chilling to 0°C (on ice for 30 minutes), and then centrifugation at 20,000 maxRCF for 10 minutes. The pellets were rinsed with 70 % ethanol before dissolving in 100 μl of TE buffer.
Hybridization
Each membrane was placed in a 55 mm x 35 mm x 21 mm plastic box and 1.25 ml of pre-hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [Mw 500,000]; 0.3 % tetrasodium pyrophosphate; 100 μg/ml denatured, sonicated DNA - pre- warmed to 65°C) was added. Each box was closed and incubated at 65°C for 50 minutes on a rocking platform. The pre-hybridization solution was removed and replaced with hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [Mw 500,000]; 0.3 % tetrasodium pyrophosphate; 100 μg/ml denatured, sonicated DNA - containing 5 μl of the appropriate 33P-labelled probe) and the box was incubated at 65°C for 3 hours on a rocking platform.
Washing
The membranes were drained and transferred to 200 ml of 2x SSC, 0.1 % SDS at 68°C for 30 minutes. A further wash was carried out in 0.2x SSC, 0.1 % SDS at 71°C for 30 minutes. The membranes were rinsed in 2x SSC at room temperature and laid out on blotting paper to remove excess liquid. Once dry, the membrane was covered in Saran Wrap and exposed to a Kodak Phosphor Screen for 1 hour. The phosphor screen was subsequently imaged using a Molecular Dynamics Storm 860 Phosphorimager.
Results are shown in figures 4 and 5.
Example #3a
A single cycle of inter-population perfectly matched duplex depletion wherein E. coli MutS protein is used to capture an A/A mismatch-containing duplex
'Affected' versus 'unaffected' (i.e. inter-population) mismatch-containing duplex selection can be achieved by: attaching a mismatch-binding protein to a solid support (or using the mismatch-binding protein in solution followed by subsequent solid-phase capture); taking denatured 'affected' DNA fragments and hybridizing these to denatured and biotinylated 'unaffected' DNA fragments; and capturing mismatch-containing duplex molecules with the mismatch-binding protein. Releasing the mismatch-containing duplex molecules (without strand denaturation), followed by streptavidin capture and then release of the non-biotinylated strands, will give only the desired species.
In this example, PCR fragments are prepared and used to demonstrate each of the individual steps for a single cycle of inter-population perfectly matched duplex depletion using E. coli MutS protein.
Clone insert design
The clone inserts were constructed using standard cloning methodology well known to those skilled in the art and were inserted between the AvaI and EcoRI sites of pMOSBlue (Amersham Pharmacia Biotech). The clone inserts contain a common 9 base pair internal core sequence in which a single nucleotide change or an insertion can be located. The internal core sequence is derived from codons 272-274 of human p53. These codons (GTG CGT GGT) correspond to a mutational hotspot found in lung and other types of cancer (R273L). For the design of the clone inserts containing a mismatch (only one of a complete series is shown below), the nucleotide in position 5 of this core sequence is modified (GTGCXTGGT).
The internal core sequence is flanked by a random sequence - allowing the independent detection of the clone #1 and the clone #7 insert sequence in a mixed population of clone inserts.
Clone #1
Mutant sequence (#1M)
5' CCCGGGGGATCCTCGTTTTATTGGGCCGAGTTTTGGTCCGTAGTGCTTGGTTAGATATGCTTAT
GTTCACAAAATCATCCTTGTACAGAATTC 3'
CAAGTGTTTTAGTAGGAACATGTCTTAAG 5'
Control sequence (#1C)
5' CCCGGGGGATCCTCGTTTTATTGGGCCGAGTTTTGGTCCGTAGTGCATGGTTAGATATGCTTAT
3' GGGCCCCCTAGGAGCAAAATAACCCGGCTCAAAACCAGGCATCACGTACCAATCTATACGAATA
GTTCACAAAATCATCCTTGTACAGAATTC 3'
CAAGTGTTTTAGTAGGAACATGTCTTAAG 5'
Clone #7
Control sequence (#7C)
5' CCCGGGTGTACACAAAAGTTTACCTGAAGAACGTGGGGGGTCGTGCCTGGTCTTGCGTCACCTG
3' GGGCCCACATGTGTTTTCAAATGGACTTCTTGCACCCCCCAGCACGGACCAGAACGCAGTGGAC
GTCTCAGGAGAGGGTCCCCATGGGAATTC 3'
CAGAGTCCTCTCCCAGGGGTACCCTTAAG 5'
Preparation of upper strand 5'-biotinylated and lower strand 5'-biotinylated #1C double-stranded PCR product, and upper strand 5'-biotinylated and lower strand 5'-biotinylated #7C double-stranded PCR product
Oligonucleotides
BIOUPST2 5' bio-CTACTGATCGGATCCCCG 3'
BIODOWN3 5' bio-AAACGACGGCCAGTGAAT 3'
PCR reaction set-up (#1C)
#1C in pMOSBlue at 2.50 μg/ml 100 μl
Water 8400 μl
10x PCR buffer 1000 μl
50 μM BIOUPST2 100 μl
50 μM BIODOWN3 100 μl
10 mM dNTPs 200 μl
Taq DNA polymerase (5 U/μl) 100 μl
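The final reaction concentrations follow from the listed volumes. This sketch assumes the #1C volumes above (the #7C set-up differs only in template concentration) and simply carries each stock figure through, so "10 mM dNTPs" keeps whatever meaning the stock had. The listed volumes total 10 ml, i.e. 50 reactions of 200 μl, and Taq DNA polymerase works out at 0.05 U/μl, the same concentration used in Example 2.

```python
WATER_UL = 8400
# (component, stock concentration, unit, volume added in ul) for the #1C set-up
COMPONENTS = [
    ("#1C plasmid template", 2.50, "ug/ml", 100),
    ("10x PCR buffer", 10.0, "x", 1000),
    ("BIOUPST2", 50.0, "uM", 100),
    ("BIODOWN3", 50.0, "uM", 100),
    ("dNTPs", 10.0, "mM", 200),
    ("Taq DNA polymerase", 5.0, "U/ul", 100),
]

total_ul = WATER_UL + sum(vol for _, _, _, vol in COMPONENTS)
# C_final = C_stock * V_added / V_total
final = {name: (stock * vol / total_ul, unit) for name, stock, unit, vol in COMPONENTS}

for name, (conc, unit) in final.items():
    print(f"{name}: {conc:g} {unit}")
print(f"total {total_ul} ul = {total_ul // 200} reactions of 200 ul")
```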
PCR reaction set-up (#7C)
#7C in pMOSBlue at 3.32 μg/ml 100 μl
Water 8400 μl
10x PCR buffer 1000 μl
50 μM BIOUPST2 100 μl
50 μM BIODOWN3 100 μl
10 mM dNTPs 200 μl
Taq DNA polymerase (5 U/μl) 100 μl
96x 200 μl reactions were carried out for template #1C and 96x 200 μl reactions were carried out for template #7C on a 96-well Perkin Elmer Cetus GeneAmp PCR System 9600 machine as described below:
95 °C, 5 minutes: 1 cycle
95 °C, 1 minute; 50 °C, 1 minute; 72 °C, 1 minute: 30 cycles
72 °C, 5 minutes: 1 cycle
4 °C, hold
The PCR products were pooled together and precipitated by adding 0.1 volumes of 3 M sodium acetate and 1 volume of isopropanol followed by centrifugation at 16,000 rpm for 30 minutes at 4°C (in a Centrikon T-2070 ultracentrifuge; swinging bucket Kontron rotor TST 41.14). Pellets were washed with 14 ml of ethanol and centrifuged at 20,000 rpm for 30 minutes. Finally, the pellets were air-dried and resuspended in a total volume of 0.6 ml of TE buffer.
The 0.6 ml PCR sample was purified in twelve MicroSpin S-300 HR columns (50 μl per column) following the manufacturer's protocol. Briefly, the resin in the columns was resuspended by vortexing. Columns were centrifuged at 735 x g (3000 rpm in a microfuge) for 1 minute. The sample was then applied to the centre of the resin, being careful not to disturb the bed. The columns were centrifuged at 735 x g for 2 minutes and the flow-through containing the PCR product was collected. The twelve eluted 50 μl volumes were pooled together (pool 1). Columns were washed with 50 μl of TE buffer and the eluted fractions were loaded onto a fresh S-300 HR column. Product yield and removal efficiency of the PCR primers were analysed on a 1.5 % agarose gel.
Preparation of non-biotinylated #1M single-stranded DNA and non-biotinylated #7C single-stranded DNA
Oligonucleotides
BIOUPST2 5' bio-CTACTGATCGGATCCCCG 3'
DOWN3 5' AAACGACGGCCAGTGAAT 3'
PCR reaction set-up (#1M)
#1M in pMOSBlue at 3.56 μg/ml 100 μl
Water 8400 μl
10x PCR buffer 1000 μl
50 μM BIOUPST2 100 μl
50 μM DOWN3 100 μl
10 mM dNTPs 200 μl
Taq DNA polymerase (5 U/μl) 100 μl
PCR reaction set-up (#7C)
#7C in pMOSBlue at 3.32 μg/ml 100 μl
Water 8400 μl
10x PCR buffer 1000 μl
50 μM BIOUPST2 100 μl
50 μM DOWN3 100 μl
10 mM dNTPs 200 μl
Taq DNA polymerase (5 U/μl) 100 μl
48x 200 μl reactions were carried out for template #1M and 48x 200 μl reactions were carried out for template #7C on a 96-well Perkin Elmer Cetus GeneAmp PCR System 9600 machine as described below:
95 °C, 5 minutes: 1 cycle
95 °C, 1 minute; 50 °C, 1 minute; 72 °C, 1 minute: 30 cycles
72 °C, 5 minutes: 1 cycle
4 °C, hold
Capture of biotinylated PCR product strands and release of non-biotinylated strands for the preparation of non-biotinylated #1M single-stranded DNA and non-biotinylated #7C single-stranded DNA
10 ml of pooled #1M and #7C PCR amplifications were each mixed with an equal volume of 4 mg/ml streptavidin-coated colloidal Fe3O4 particles taken up in 20 mM Tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl. The tubes were incubated at room temperature for 60 minutes with mixing.
The streptavidin-coated colloidal Fe3O4 particles were then washed with 20 ml of 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl. Two more identical washes were performed. The washed streptavidin-coated colloidal Fe3O4 particles were finally incubated in 800 μl of 0.1 M NaOH for 10 minutes at room temperature. The supernatants were removed and added to 200 μl of 2 M HEPES (free acid). Samples were quantified by absorbance at 260 nm.
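Strand release uses 0.1 M NaOH that is then neutralized with HEPES free acid. This sketch, with an illustrative helper, computes the final molarities of the NaOH/HEPES mixes used in this document; it shows that the 800 μl + 200 μl step here and the 10 μl + 2.5 μl step used after mismatch capture have the same composition (0.08 M NaOH, 0.4 M HEPES). It does not model the resulting pH.

```python
def mix(*parts: tuple[str, float, float]) -> tuple[float, dict[str, float]]:
    """Combine (name, molarity in M, volume in ul) parts; return total ul and final molarities."""
    total_ul = sum(vol for _, _, vol in parts)
    return total_ul, {name: molar * vol / total_ul for name, molar, vol in parts}


strand_release = mix(("NaOH", 0.1, 800), ("HEPES", 2.0, 200))   # this step
small_scale = mix(("NaOH", 0.1, 10), ("HEPES", 2.0, 2.5))       # after mismatch capture
probe_release = mix(("NaOH", 0.1, 500), ("HEPES", 0.5, 500))    # Example 2 probe step

for label, (vol, conc) in [("strand release", strand_release),
                           ("small scale", small_scale),
                           ("probe release", probe_release)]:
    print(label, vol, {k: round(v, 3) for k, v in conc.items()})
```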
Denaturation and annealing
Denaturation and annealing reactions were prepared by mixing:
A = 400 ng of upper strand 5'-biotinylated and lower strand 5'-biotinylated #1C double-stranded PCR product
B = 400 ng of upper strand 5'-biotinylated and lower strand 5'-biotinylated #7C double-stranded PCR product
C = 200 ng of non-biotinylated #1M single-stranded DNA (the lower of the two #1M strands shown above)
D = 200 ng of non-biotinylated #7C single-stranded DNA (the lower of the two #7C strands shown above)
Reannealing between the upper strand of A and the single-stranded C will therefore give rise to an A/A mismatch-containing duplex, 5'-biotinylated on the upper strand, for clone insert #1. Reannealing between the upper strand of B and the single-stranded D will therefore give rise to a perfectly matched duplex, 5'-biotinylated on the upper strand, for clone insert #7.
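The mismatch expected from reannealing can be predicted base by base. In this sketch the upper strands of clones #1C and #1M are transcribed from the sequences above, and the lower strand of #1M is taken as the complement of its upper strand (the text shows only part of it). It assumes an equal-length, gap-free alignment, so insertion variants are out of scope; the function name is hypothetical. Pairing the #1C upper strand with the #1M lower strand should report a single A/A mispair at position 5 of the core sequence, and a homoduplex should report none.

```python
COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}
TAIL = "GTTCACAAAATCATCCTTGTACAGAATTC"
UPPER_1C = "CCCGGGGGATCCTCGTTTTATTGGGCCGAGTTTTGGTCCGTAGTGCATGGTTAGATATGCTTAT" + TAIL
UPPER_1M = "CCCGGGGGATCCTCGTTTTATTGGGCCGAGTTTTGGTCCGTAGTGCTTGGTTAGATATGCTTAT" + TAIL


def heteroduplex_mispairs(upper_a: str, upper_b: str) -> list[tuple[int, str, str]]:
    """Pair the upper strand of duplex A with the lower strand of duplex B.

    The lower strand of B is taken as the base-by-base complement of B's upper
    strand, so position i pairs upper_a[i] with COMP[upper_b[i]].
    Returns (0-based position, upper base, lower base) for each non-Watson-Crick pair.
    """
    if len(upper_a) != len(upper_b):
        raise ValueError("this sketch handles equal-length, gap-free alignments only")
    mispairs = []
    for i, (a, b) in enumerate(zip(upper_a, upper_b)):
        lower = COMP[b]
        if COMP[a] != lower:
            mispairs.append((i, a, lower))
    return mispairs


print(heteroduplex_mispairs(UPPER_1C, UPPER_1M))   # #1C upper strand with #1M lower strand
print(heteroduplex_mispairs(UPPER_1C, UPPER_1C))   # homoduplex control
```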
0.1 volumes of 1 M NaOH were then added, followed by incubation at room temperature for 10 minutes. 0.25 volumes of 2 M HEPES (free acid) were finally added followed by incubation at 42°C for 1 hour.
Samples were adjusted to 50 μl and were made 1 x in PBS and 1 mg/ml in BSA ready for reaction with MutS protein-coated magnetic beads. One 50 μl sample (the pre-enrichment control, sample 6) was used directly for capture of biotinylated PCR product strands and release of non-biotinylated strands.
Mismatch capture
20 μl of M2B2 MutS protein-coated magnetic particles (Genecheck, lot #20) were added to the annealed DNA above. Samples were incubated for 1 hour at room temperature with shaking.
Samples were then washed twice with 200 μl of ice-cold PBS. Samples were finally eluted from the magnetic beads for 10 minutes at room temperature in 50 μl of the following:
[Table: elution conditions]
Capture of biotinylated PCR product strands and release of non-biotinylated strands
All separations were carried out using an Amersham magnetic separator (RPN1682, batch #1).
50 μl of the eluates from the MutS protein-coated magnetic beads and the pre-enrichment control (sample 6) were each mixed with an equal volume of 4 mg/ml streptavidin-coated colloidal Fe3O4 particles in 20 mM Tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl. The tubes were incubated at room temperature for 30 minutes with regular mixing.
The streptavidin-coated colloidal Fe3O4 particles were then washed twice with 500 μl of 10 mM Tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl at room temperature.
The washed streptavidin-coated colloidal Fe3O4 particles were incubated in 10 μl of 0.1 M NaOH for 10 minutes at room temperature. The supernatant was removed and added to 2.5 μl of 2 M HEPES (free acid).
5 μl fractions were finally spotted onto Hybond N+ nylon membranes along with 200 ng, 20 ng, 2 ng, and 200 pg amounts of the following:
C = non-biotinylated #1M single-stranded DNA (the lower of the two #1M strands shown previously)
D = non-biotinylated #7C single-stranded DNA (the lower of the two #7C strands shown previously)
Probe labelling
Oligonucleotides
#1 probe oligo 5' GGCCGAGTTTTGGTCCGTAG 3'
#7 probe oligo 5' GTCTTGCGTCACCTGGTCTCAG 3'
Preparation of probes
#1 probe oligo and #7 probe oligo were radioactively 5' end- labelled using T4 polynucleotide kinase as described below (all volumes are in μl):
[Table: T4 polynucleotide kinase labelling reactions]
The reactions were incubated at 37°C for 30 minutes and then heated to 70°C for 5 minutes to denature the enzyme.
MicroSpin G-25 column purification
Two G-25 columns (APB 27-5325-01, lot #9015325011) were resuspended by vortexing, and the bottom closures were snapped off as described in the manufacturer's instructions. A pre-spin of the columns was carried out for 1 minute at 730 maxRCF (2670 rpm in a Hettich Zentrifugen EBA 12 benchtop centrifuge) and the eluates were discarded. Each 25 μl reaction was applied to a column, and the columns were centrifuged for 2 minutes at 730 maxRCF. The eluates from the second spin were stored and used as probes.
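Spin conditions in this document are quoted in different forms (735 x g at 3000 rpm in a microfuge; 730 maxRCF at 2670 rpm in the Hettich centrifuge). Using the standard relation RCF = 1.118 × 10^-5 × r × rpm², with r in cm, the sketch below back-calculates the rotor radius each pair implies and converts between force and speed. The helper names are illustrative, and the radii are inferred, not stated in the text.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * r(cm) * rpm^2


def rcf(rpm: float, radius_cm: float) -> float:
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for(target_rcf: float, radius_cm: float) -> float:
    return math.sqrt(target_rcf / (RCF_CONSTANT * radius_cm))


def implied_radius_cm(rcf_value: float, rpm: float) -> float:
    return rcf_value / (RCF_CONSTANT * rpm ** 2)


r_microfuge = implied_radius_cm(735, 3000)   # S-300 HR columns: 735 x g at 3000 rpm
r_hettich = implied_radius_cm(730, 2670)     # G-25 columns: 730 maxRCF at 2670 rpm
print(f"microfuge r = {r_microfuge:.2f} cm, Hettich r = {r_hettich:.2f} cm")
print(f"735 x g on the Hettich needs about {rpm_for(735, r_hettich):.0f} rpm")
```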
Hybridization
Each membrane was placed in a 55 mm x 35 mm x 21 mm plastic box and 2.5 ml of pre-hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [MW 500,000]; 0.3 % tetrasodium pyrophosphate; 100 μg/ml denatured, sonicated DNA), pre-warmed to 42°C, was added. Each box was closed and incubated at 42°C for 1 hour on a rocking platform. The pre-hybridization solution was removed and replaced with hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [MW 500,000]; 0.3 % tetrasodium pyrophosphate; 100 μg/ml denatured, sonicated DNA) containing 2.5 μl of the appropriate ³³P-labelled probe, and the box was incubated at 42°C overnight on a rocking platform.
Washing
The membranes were drained and transferred to 200 ml of 2x SSC, 0.1 % SDS at 42°C for 10 minutes. A further wash was carried out in 0.2x SSC, 0.1 % SDS at 42°C for 10 minutes. The membranes were rinsed in 2x SSC at room temperature and laid out on blotting paper to remove excess liquid. Once dry, each membrane was covered in Saran Wrap and exposed to a Kodak Phosphor Screen for 1 hour. The phosphor screen was subsequently imaged using a Molecular Dynamics Storm 860 Phosphorimager.
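The washes step down from the 5x SSC hybridisation buffer to 2x and then 0.2x SSC at a constant 42°C, so stringency is raised by lowering the salt concentration. Using the 16.6 × log10[Na+] salt term from the common empirical Tm formula for DNA duplexes, the sketch estimates how far each step lowers the duplex Tm. Treat the numbers as rough: that formula is only approximate at the high sodium concentration of 5x SSC.

```python
import math

# 1x SSC = 150 mM NaCl + 15 mM trisodium citrate, i.e. ~0.195 M Na+.
NA_PER_1X_SSC_M = 0.150 + 3 * 0.015


def sodium_molar(ssc_x: float) -> float:
    """Approximate Na+ concentration (M) of an SSC solution of strength ssc_x."""
    return ssc_x * NA_PER_1X_SSC_M


def tm_shift_c(ssc_from: float, ssc_to: float) -> float:
    """Estimated change in duplex Tm (°C) when moving between SSC strengths.

    Uses only the empirical 16.6 * log10([Na+]) salt term; a negative value
    means the duplex becomes less stable (higher stringency).
    """
    return 16.6 * math.log10(sodium_molar(ssc_to) / sodium_molar(ssc_from))


for hyb, wash in [(5, 2), (2, 0.2)]:
    print(f"{hyb}x -> {wash}x SSC: Tm shifts by {tm_shift_c(hyb, wash):+.1f} °C")
```

Because only the salt ratio enters, each tenfold dilution of SSC lowers the estimated Tm by about 16.6 °C, whatever the starting concentration.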
Dot blot layout
[Figure: dot blot layout]
ssDNA = single-stranded DNA
Probe hybridisation results
[Figure: dot blots hybridised with the #1 probe and with the #7 probe]
Signal intensities from the spots above were quantified using ImageQuant 5.0 software (Molecular Dynamics), with SumAboveBG values taken after drawing a 6x3 grid over the array of spots.
#1 probe SumAboveBG signals
[Table: SumAboveBG signal per spot]
#7 probe SumAboveBG signals
[Table: SumAboveBG signal per spot]
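The SumAboveBG values come from ImageQuant's grid quantification. A simplified stand-in is sketched below: it divides the image into equal grid cells and sums the pixel intensity above a single global background in each cell. This is not ImageQuant's algorithm; the global background value and the clipping of sub-background pixels are assumptions, and the image is a toy example.

```python
def grid_sum_above_bg(image, rows, cols, background):
    """Split a 2-D image into rows x cols equal cells and, for each cell,
    sum (pixel - background) over pixels brighter than background."""
    cell_h = len(image) // rows
    cell_w = len(image[0]) // cols
    sums = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for y in range(r * cell_h, (r + 1) * cell_h):
                for x in range(c * cell_w, (c + 1) * cell_w):
                    sums[r][c] += max(image[y][x] - background, 0.0)
    return sums


# Toy 6 x 3 image with one pixel per grid cell (hypothetical values).
image = [[10, 10, 10] for _ in range(6)]
image[0] = [110, 60, 10]
signals = grid_sum_above_bg(image, rows=6, cols=3, background=10)
print(signals[0])  # -> [100.0, 50.0, 0.0]
```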
Recovery
[Table: recovery of input ssDNA]
Recovery figures for the different elution conditions are plotted below.
[Figure: #1 signal and #7 signal (% of input ssDNA) for each elution condition]
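The source does not state how the recovery figures were derived from the SumAboveBG values. One plausible approach, sketched here as an assumption, is to fit a log-log standard curve to the 200 ng to 200 pg ssDNA standards spotted on each membrane, read the amount in the sample spot off that curve, and scale by the fraction of the eluate that was spotted (5 μl of the 12.5 μl neutralised eluate). The signal values and the 100 ng input are hypothetical.

```python
import math


def fit_loglog(amounts_ng, signals):
    """Least-squares fit of log10(signal) = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts_ng]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to get ng in a spot."""
    return 10 ** ((math.log10(signal) - intercept) / slope)


def percent_recovery(spot_ng, spotted_ul, eluate_ul, input_ng):
    """Scale the spotted amount up to the whole eluate, relative to input."""
    return 100 * spot_ng * (eluate_ul / spotted_ul) / input_ng


standards_ng = [200, 20, 2, 0.2]          # dilution series on each membrane
standard_signal = [2e5, 2e4, 2e3, 2e2]    # HYPOTHETICAL SumAboveBG values
slope, intercept = fit_loglog(standards_ng, standard_signal)
spot_ng = amount_from_signal(8e3, slope, intercept)  # hypothetical sample spot
pr = percent_recovery(spot_ng, spotted_ul=5, eluate_ul=12.5, input_ng=100)
print(f"recovery: {pr:.1f} % of input")
```

A log-log fit is used because the standards span three orders of magnitude; if the phosphorimager response were clearly linear over that range, a linear fit would serve equally well.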
Example #3b
A single cycle of inter-population perfectly matched duplex depletion wherein bacteriophage T4 endonuclease VII protein containing a cleavage-inactivating N62D point mutation is used to capture an A/A mismatch-containing duplex
In this example, PCR fragments are again prepared and used to demonstrate each of the individual steps for a single cycle of inter-population perfectly matched duplex depletion using bacteriophage T4 endonuclease VII protein containing a cleavage-inactivating N62D point mutation.
Clone design and DNA preparation
Clone insert design was exactly as described in example #3a (see clone #1: mutant sequence (#1M) and control sequence (#1C); clone #7: control sequence (#7C)). Upper-strand 5'-biotinylated and lower-strand 5'-biotinylated #1C double-stranded PCR products, and upper-strand 5'-biotinylated and lower-strand 5'-biotinylated #7C double-stranded PCR products, were prepared as described in example #3a. Non-biotinylated #1M single-stranded DNA and non-biotinylated #7C single-stranded DNA were again prepared as described in example #3a.
Denaturation and annealing
Denaturation and annealing reactions were prepared as described in example #3a.
Reannealing will therefore again give rise to an A/A mismatch-containing duplex, 5'-biotinylated on the upper strand, for clone insert #1 and a perfectly matched duplex, 5'-biotinylated on the upper strand, for clone insert #7.
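The strand pairing described above can be illustrated with a small mismatch finder. The sequences are short hypothetical stand-ins, not the clone #1 or #7 inserts; they only show that annealing a control upper strand to a mutant lower strand carrying a T to A change leaves an A/A mismatch, whereas annealing it to the control lower strand does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatches(upper: str, lower_3to5: str):
    """Return (position, upper_base, lower_base) for non-Watson-Crick pairs.

    `upper` is written 5'->3'; `lower_3to5` is the opposite strand written
    3'->5', so position i of each string forms one base pair.
    """
    return [(i, u, l) for i, (u, l) in enumerate(zip(upper, lower_3to5))
            if u.translate(COMPLEMENT) != l]


upper_1C = "GCTAGC"   # hypothetical control upper strand (biotinylated), 5'->3'
lower_1C = "CGATCG"   # its perfect complement, written 3'->5'
lower_1M = "CGAACG"   # hypothetical mutant lower strand: T -> A at position 3
print(mismatches(upper_1C, lower_1C))  # -> []
print(mismatches(upper_1C, lower_1M))  # -> [(3, 'A', 'A')]
```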
One 50 μl sample (the pre-enrichment control) was adjusted to 100 μl with TE buffer and was used directly for capture of biotinylated PCR product strands and release of non-biotinylated strands.
Mismatch capture
50 μl samples of the annealing reaction were mixed with an equal volume of 200 mM sodium phosphate (pH 6.5), 100 mM KCl. 10 μg of GST-tagged T4 endonuclease VII N62D mutant (obtained from Prof. Börries Kemper, Univ. Cologne) were added to the annealed DNA, and the samples were incubated for 15 minutes at 16°C.
Samples were then mixed with 200 μl of a 50 % slurry of Glutathione Sepharose 4B (Amersham Pharmacia Biotech, lot #279991) in 100 mM sodium phosphate (pH 6.5), 50 mM KCl. The tubes were incubated at 16°C for 30 minutes with regular mixing. The mixture was then transferred to a spin column to separate solid from liquid phases.
Samples were finally eluted from the Glutathione Sepharose 4B matrix for 10 minutes at room temperature in 100 μl of 10 mM reduced glutathione in 50 mM tris-HCl (pH 8.0).
Capture of biotinylated PCR product strands and release of non-biotinylated strands
All separations were carried out using an Amersham magnetic separator (RPN1682, batch #1).
100 μl of the eluate from the N62D T4 endonuclease VII mismatch-capture reaction and the pre-enrichment control were each mixed with an equal volume of 4 mg/ml streptavidin-coated colloidal Fe3O4 particles in 20 mM tris-HCl (pH 7.4), 2 mM EDTA (pH 8.0), 2 M NaCl. The tubes were incubated at room temperature for 30 minutes with regular mixing.
The streptavidin-coated colloidal Fe3O4 particles were then washed twice with 500 μl of 10 mM tris-HCl (pH 7.4), 1 mM EDTA (pH 8.0), 1 M NaCl at room temperature.
The washed streptavidin-coated colloidal Fe3O4 particles were incubated in 10 μl of 0.1 M NaOH for 10 minutes at room temperature. The supernatant was removed and added to 2.5 μl of 2 M HEPES (free acid).
5 μl fractions were finally spotted onto Hybond N+ nylon membranes along with 200 ng, 20 ng, 2 ng, and 200 pg amounts of the following:
C = non-biotinylated #1M single-stranded DNA (the lower of the two #1M strands shown previously)
D = non-biotinylated #7C single-stranded DNA (the lower of the two #7C strands shown previously)
Probe labelling
Probe labelling was exactly as described in example #3a.
Hybridization
Each membrane was placed in a 55 mm x 35 mm x 21 mm plastic box and 2.5 ml of pre-hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [MW 500,000]; 0.3 % tetrasodium pyrophosphate; 100 μg/ml denatured, sonicated DNA), pre-warmed to 42°C, was added. Each box was closed and incubated at 42°C for 10 minutes on a rocking platform. The pre-hybridization solution was removed and replaced with hybridization solution (5x SSC; Denhardt's solution; 1 % SDS; 10 % dextran sulphate [MW 500,000]; 0.3 % tetrasodium pyrophosphate; 100 μg/ml denatured, sonicated DNA) containing 2.5 μl of the appropriate ³³P-labelled probe, and the box was incubated at 42°C overnight on a rocking platform.
Washing
The membranes were drained and transferred to 200 ml of 2x SSC, 0.1 % SDS at 42°C for 10 minutes. A further wash was carried out in 0.2x SSC, 0.1 % SDS at 42°C for 10 minutes. The membranes were rinsed in 2x SSC at room temperature and laid out on blotting paper to remove excess liquid. Once dry, each membrane was covered in Saran Wrap and exposed to a Kodak Phosphor Screen for 1 hour. The phosphor screen was subsequently imaged using a Molecular Dynamics Storm 860 Phosphorimager.
Signal intensities from the spots above were again quantified using ImageQuant 5.0 software (Molecular Dynamics), with SumAboveBG values taken after drawing a grid over the array of spots.
#1 probe SumAboveBG signals
[Table: SumAboveBG signal per spot]
#7 probe SumAboveBG signals
[Table: SumAboveBG signal per spot]
Recovery
[Table: recovery of input ssDNA]
The enrichment results are presented graphically below.
[Figure: #1 recovery and #7 recovery (% of input ssDNA)]
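Claim 1 allows steps a) to c) to be repeated. How enrichment might compound over several cycles can be shown with a toy model in which each cycle recovers a fixed fraction of characteristic (mismatch-forming) fragments and carries over a smaller fixed fraction of other fragments. Both fractions, and the 0.1 % starting abundance, are assumptions for illustration, not values measured in these examples.

```python
def enrich(frac_target: float, keep_mismatch: float, keep_matched: float,
           cycles: int) -> list[float]:
    """Toy model: fraction of characteristic fragments after each cycle.

    keep_mismatch: assumed fraction of characteristic fragments recovered
                   per cycle (they form mismatched hybrids and are captured).
    keep_matched:  assumed fraction of other fragments carried over per cycle.
    """
    history = []
    target, other = frac_target, 1.0 - frac_target
    for _ in range(cycles):
        target *= keep_mismatch
        other *= keep_matched
        history.append(target / (target + other))
    return history


# Hypothetical: 0.1 % characteristic fragments; 50 % of them and 1 % of the
# rest are recovered in each cycle.
h = enrich(0.001, 0.5, 0.01, 3)
for cycle, frac in enumerate(h, start=1):
    print(f"cycle {cycle}: {frac:.1%} of recovered fragments are characteristic")
```

Under these assumptions the ratio of characteristic to other fragments grows by keep_mismatch/keep_matched (here 50-fold) per cycle, so a few cycles can take a rare class to a majority.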
Dedicated to the memory of Chris Griffin and Richard Beer

Claims

1. A method of providing a mixture of DNA fragments enriched in fragments that are characteristic of a phenotype of interest, by providing affected DNA in fragmented form and providing unaffected DNA in fragmented form, which method comprises: a) mixing the fragments of the affected DNA and the fragments of the unaffected DNA under hybridising conditions; b) recovering a mixture of hybrids that contain mismatches; c) recovering fragments of the affected DNA from the mixture of hybrids that contain mismatches; and optionally repeating steps a), b) and c) one or more times.
2. The method of claim 1 wherein the affected DNA is pooled DNA of individuals who show the phenotype of interest, and the unaffected DNA is pooled DNA of individuals who do not show the phenotype of interest.
3. The method of claim 1, wherein the affected DNA is DNA of one individual who shows the phenotype of interest, and the unaffected DNA is pooled DNA of individuals who do not show the phenotype of interest.
4. The method of claim 1, wherein the affected DNA is DNA of one individual who shows the phenotype of interest, and the unaffected DNA is pooled DNA of a complete set of ancestors who do not show the phenotype of interest.
5. The method of claim 1, wherein the affected DNA is DNA from cells of an individual that show the phenotype of interest, and the unaffected DNA is DNA from cells of the individual that do not show the phenotype of interest.
6. The method of any one of claims 1 to 5, wherein step b) is performed by use of a mismatch-binding protein.
7. The method of any one of claims 1 to 6, wherein either the fragments of the affected DNA or the fragments of the unaffected DNA are tagged by one member of a specific binding pair, and step c) is performed by using the other member of the specific binding pair.
8. The method of claim 7, wherein the fragments of the unaffected DNA are tagged with biotin, and step c) is performed by use of immobilised streptavidin.
9. The method of any one of claims 1 to 8, wherein the mixture of DNA fragments enriched in fragments that are characteristic of the phenotype of interest, is subjected to self-hybridisation followed by recovery of perfectly matched duplexes.
10. The method of any one of claims 1 to 9, wherein the mixture of DNA fragments enriched in fragments that are characteristic of the phenotype of interest, is mixed with an excess of the fragments of the affected DNA under hybridisation conditions, followed by recovery of perfectly matched duplexes.
11. The method of any one of claims 1 to 10, wherein each of the affected DNA and the unaffected DNA is provided in fragmented form by digestion with from 4 to 7 six-cutter restriction endonuclease enzymes together with from 0 to 5 four-cutter restriction endonuclease enzymes.
12. A mixture of DNA fragments enriched in fragments that are characteristic of the phenotype of interest, provided by the method of any one of claims 1 to 11.
13. A method of making a set of arrays of fragments of DNA of interest, which method comprises: a) selecting, from a set of n restriction endonuclease enzymes, a subset of r restriction endonuclease enzymes; b) digesting genomic DNA with the subset of r enzymes; c) ligating to the resulting fragments restriction-enzyme-cutting-site-specific adapters with unique polymerase chain reaction amplifiable sequences; d) splitting the resulting fragments into r² aliquots; e) amplifying each aliquot with two restriction-enzyme-specific primers of which one is tagged; f) forming an array of the r² aliquots of non-tagged amplimer strands; and g) repeating steps a) to f) using one or more different subsets of r restriction endonuclease enzymes.
14. A method of making a set of arrays of fragments of DNA of interest, which method comprises: a) selecting, from a set of n restriction endonuclease enzymes, a subset of r restriction endonuclease enzymes; b) digesting genomic DNA with the subset of r enzymes; c) ligating to the resulting fragments restriction-enzyme-cutting-site-specific adapters with unique polymerase chain reaction amplifiable sequences; d) splitting the resulting fragments into r² aliquots; e) amplifying each aliquot with two restriction-enzyme-specific primers; f) forming an array of the r² aliquots of the amplimer strands; and g) repeating steps a) to f) using one or more different subsets of r restriction endonuclease enzymes.
15. The method of claim 13 or claim 14, wherein steps a) to f) are repeated using each different subset of r restriction endonuclease enzymes to give (n!)/[(n-r)!r!] different arrays.
16. The method of any one of claims 13 to 15, wherein the n restriction endonuclease enzymes are selected from 4-cutters and 5-cutters and 6-cutters.
17. The method of any one of claims 13 to 16, wherein n is 3 to 10 and r is 2 to 4.
18. The method of claim 17, where n = 6 and r = 3.
19. A set of arrays of fragments of DNA of interest, which set results from performance of the method of any one of claims 13 to 18.
20. The set of arrays of claim 19, which set results from performance of the method of claim 13 and claim 14 and claim 15.
21. The set of arrays of claim 19 or claim 20, derived from a set of n = 6 six-cutter restriction endonuclease enzymes which are BamHI; BsrGI; HindIII; NcoI; SpeI; and AflII.
22. The set of arrays of claim 19 or claim 20, derived from the set of n = 6 six-cutter restriction endonuclease enzymes which are EcoRI; BspHI; BglII; XbaI; Acc65I; and ApaLI.
23. A nucleic acid characterisation method which comprises presenting to the set of arrays of any one of claims 19 to 22 a nucleic acid fragment of interest under hybridisation conditions, and observing a pattern of hybridisation.
24. The method of claim 23, wherein a plurality of nucleic acid fragments of interest are separately presented to the set of arrays, and the resulting patterns of hybridisation are compared.
25. The method of claim 24, wherein the plurality of nucleic acid fragments of interest are drawn from the mixture of DNA fragments, enriched in fragments that are characteristic of a phenotype of interest, of claim 13.
26. A method of identifying fragments of DNA that are characteristic of a phenotype of interest, which method comprises recovering, cloning and amplifying individual DNA fragments from the mixture of DNA fragments of claim 12, presenting the individual DNA fragments to the set of arrays of any one of claims 19 to 22 under hybridisation conditions, observing a pattern of hybridisation generated by each individual DNA fragment, and subjecting to further investigation any two or more individual DNA fragments whose hybridisation patterns are similar or identical, or near to each other in a genome of interest.
27. A double-stranded DNA molecule having the sequence a-A-b-B...X-y-Y-z where A, B...X and Y are unique restriction sites for n different restriction endonuclease enzymes, and a, b...y, z denote distances in base pairs, characterised in that each fragment, obtainable by cutting the DNA molecule by means of any one or more up to n of the restriction enzymes, has a different length from every other fragment.
28. The double-stranded DNA molecule of claim 27, wherein the following criteria are satisfied: a) inter-fragment length differences are greater for larger fragments; b) all possible fragments are unambiguously resolvable by electrophoresis from one another; c) size gaps between bands comprising different numbers of inter-restriction-site units are larger than size gaps between bands comprising the same number of inter-restriction-site units; d) the size gaps and size spread from the largest to the smallest fragment are electrophoretically compatible.
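Claim 11 fragments the DNA with 4 to 7 six-cutters plus 0 to 5 four-cutters. For random sequence with equal base frequencies, a six-base site occurs on average once per 4^6 = 4096 bp and a four-base site once per 4^4 = 256 bp, so the expected mean fragment size of a combined digest follows from summing the cut rates. Real genomes deviate from this (for example, CpG-containing sites are under-represented in mammalian DNA), so the figures are only indicative.

```python
def mean_fragment_length(n_six: int, n_four: int) -> float:
    """Expected mean fragment length (bp) for a combined digest of random
    sequence: each six-cutter cuts once per 4**6 bp and each four-cutter
    once per 4**4 bp on average (sites assumed independent)."""
    cuts_per_bp = n_six / 4 ** 6 + n_four / 4 ** 4
    return 1 / cuts_per_bp


# Extremes of the ranges given in claim 11.
for n_six, n_four in [(4, 0), (7, 0), (4, 5), (7, 5)]:
    print(f"{n_six} six-cutters + {n_four} four-cutters: "
          f"~{mean_fragment_length(n_six, n_four):.0f} bp")
```

On this model the claimed range spans mean fragment sizes from about 1 kb down to under 50 bp, with the four-cutters dominating as soon as any are added.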
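Claims 13 to 18 build one array per r-enzyme subset of n enzymes, each array holding r² amplified aliquots. The sketch enumerates these for n = 6 and r = 3 (claim 18) using the enzymes of claim 21. Reading r² as ordered primer pairs, including same-enzyme pairs, with the first primer taken as the tagged one, is an interpretation rather than something the claims spell out.

```python
from itertools import combinations, product
from math import comb

# Six-cutters listed in claim 21; used here only as labels.
ENZYMES = ["BamHI", "BsrGI", "HindIII", "NcoI", "SpeI", "AflII"]


def array_layouts(enzymes, r):
    """Map each r-enzyme subset to its r**2 ordered primer pairs
    (first primer assumed to carry the tag)."""
    return {subset: list(product(subset, repeat=2))
            for subset in combinations(enzymes, r)}


layouts = array_layouts(ENZYMES, r=3)
print(len(layouts), "arrays; n!/[(n-r)!r!] =", comb(len(ENZYMES), 3))
first = next(iter(layouts))
print(first, "->", len(layouts[first]), "aliquots")
```

For n = 6 and r = 3 this gives 20 arrays of 9 aliquots each, matching the (n!)/[(n-r)!r!] count of claim 15.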
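The defining property of the molecule in claim 27 (every fragment from any single or combined digest has a distinct length) can be checked directly from the spacings a, b, ..., z. Because each site belongs to a different enzyme, any stretch between two boundaries (the molecule ends or sites) can be released by cutting only at its flanking sites; only the uncut full-length molecule is excluded. The sketch tests just that property; criteria a) to d) of claim 28 depend on gel conditions and are not modelled, and the spacings used are hypothetical.

```python
from itertools import combinations


def fragment_lengths(distances):
    """All fragment lengths obtainable by cutting at any non-empty set of sites.

    `distances` are the spacings a, b, ..., z (one more than the number of
    sites). Boundaries are the two molecule ends plus every site.
    """
    bounds = [0]
    for d in distances:
        bounds.append(bounds[-1] + d)
    full = (0, len(bounds) - 1)          # uncut molecule, not a digest product
    return sorted(bounds[j] - bounds[i]
                  for i, j in combinations(range(len(bounds)), 2)
                  if (i, j) != full)


def all_lengths_unique(distances):
    lengths = fragment_lengths(distances)
    return len(lengths) == len(set(lengths))


print(fragment_lengths([100, 200, 400, 800]))
print(all_lengths_unique([100, 200, 400, 800]))  # True
print(all_lengths_unique([100, 200, 300]))       # False: 300 bp occurs twice
```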
PCT/GB2000/000916 1999-03-12 2000-03-10 Genetic analysis WO2000055364A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CA002362771A CA2362771A1 (en) 1999-03-12 2000-03-10 Genetic analysis
JP2000605780A JP2002538837A (en) 1999-03-12 2000-03-10 Genetic analysis
EP00909502A EP1173609A2 (en) 1999-03-12 2000-03-10 Genetic analysis
AU31782/00A AU3178200A (en) 1999-03-12 2000-03-10 Genetic analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP99301933.0 1999-03-12
EP99301933 1999-03-12

Publications (2)

Publication Number Publication Date
WO2000055364A2 true WO2000055364A2 (en) 2000-09-21
WO2000055364A3 WO2000055364A3 (en) 2001-10-11

Family

ID=8241266

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2000/000916 WO2000055364A2 (en) 1999-03-12 2000-03-10 Genetic analysis

Country Status (5)

Country Link
EP (1) EP1173609A2 (en)
JP (1) JP2002538837A (en)
AU (1) AU3178200A (en)
CA (1) CA2362771A1 (en)
WO (1) WO2000055364A2 (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0226288A2 (en) * 1985-10-09 1987-06-24 Collaborative Research Inc. Means and method of testing for cystic fibrosis based on genetic linkage
US4771384A (en) * 1986-07-24 1988-09-13 Dnastar, Inc. System and method for fragmentation mapping
WO1989001526A1 (en) * 1987-08-07 1989-02-23 Genelabs Incorporated Coincidence cloning method and library
EP0466404A1 (en) * 1990-07-13 1992-01-15 Life Technologies, Inc. Size markers for electrophoretic analysis of DNA
WO1993017126A1 (en) * 1992-02-19 1993-09-02 The Public Health Research Institute Of The City Of New York, Inc. Novel oligonucleotide arrays and their use for sorting, isolating, sequencing, and manipulating nucleic acids
US5750335A (en) * 1992-04-24 1998-05-12 Massachusetts Institute Of Technology Screening for genetic variation
US5376526A (en) * 1992-05-06 1994-12-27 The Board Of Trustees Of The Leland Stanford Junior University Genomic mismatch scanning
WO1994011383A1 (en) * 1992-11-12 1994-05-26 Cold Spring Harbor Laboratory A representational approach to dna analysis
WO1995012688A1 (en) * 1993-01-11 1995-05-11 United States Biochemical Corporation Methods of analysis and manipulation of dna utilizing mismatch repair systems
WO1995011971A1 (en) * 1993-10-28 1995-05-04 Life Technologies, Inc. Nucleic acid marker ladder for estimating mass
US5710000A (en) * 1994-09-16 1998-01-20 Affymetrix, Inc. Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping
WO1997029211A1 (en) * 1996-02-09 1997-08-14 The Government Of The United States Of America, Represented By The Secretary, Department Of Health And Human Services RESTRICTION DISPLAY (RD-PCR) OF DIFFERENTIALLY EXPRESSED mRNAs

Non-Patent Citations (7)

Title
ELLIS L A ET AL: "MUTS BINDING PROTECTS HETERODUPLEX DNA FROM EXONUCLEASE DIGESTION IN VITRO: A SIMPLE METHOD FOR DETECTING MUTATIONS" NUCLEIC ACIDS RESEARCH, vol. 22, no. 13, 11 July 1994 (1994-07-11), page 2710/2711 XP002021164 *
ESPOSITO J J AND KNIGHT J C: "Orthopoxvirus DNA: A comparison of restriction profiles and maps" VIROLOGY, vol. 143, 1985, pages 230-251, XP000951520 *
KNEHR M ET AL.: "Isolation and characterization of a cDNA encoding rat liver cytosolic epoxide hydrolase and its functional expression in Escherichia coli" THE JOURNAL OF BIOLOGICAL CHEMISTRY, vol. 268, no. 23, 1993, pages 17623-17627, XP002165725 *
KOROLIK V ET AL.: "Differentiation of Campylobacter jejuni and Campylobacter coli strains by using restriction endonuclease DNA profiles and DNA fragment polymorphisms" JOURNAL OF CLINICAL MICROBIOLOGY, vol. 33, no. 5, 1995, pages 1136-1140, XP000951519 *
MASATO ORITA ET AL: "DETECTION OF POLYMORPHISMS OF HUMAN DNA BY GEL ELECTROPHORESIS AS SINGLE-STRAND CONFORMATION POLYMORPHISMS" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, vol. 86, no. 8, 1 April 1989 (1989-04-01), pages 2766-2770, XP000310584 *
SAMBROOK J ET AL.: "Molecular cloning: A laboratory Manual. Vol.2: Partial digestion of high-molecular-weight eukaryotic DNA with restriction enzymes" 1989 , COLD SPRING HARBOUR LABORATORY PRESS , COLD SPRING HARBOUR XP002113808 page 9.24 -page 9.28, paragraph 3 *
SMITH J ET AL: "MUTATION DETECTION WITH MUTH, MUTL, AND MUTS MISMATCH REPAIR PROTEINS" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,US,NATIONAL ACADEMY OF SCIENCE. WASHINGTON, vol. 93, 1 April 1996 (1996-04-01), pages 4374-4379, XP002030021 ISSN: 0027-8424 *

Cited By (22)

Publication number Priority date Publication date Assignee Title
US7378245B2 (en) 2002-09-06 2008-05-27 State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon Methods for detecting and localizing DNA mutations by microarray
US7563581B2 (en) 2002-09-06 2009-07-21 State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon Methods for detecting and localizing DNA mutations by extension of differentially fragmented DNA
WO2006122215A3 (en) * 2005-05-10 2007-03-22 State Of Oregon Acting By & Th Methods of mapping polymorphisms and polymorphism microarrays
US9365893B2 (en) 2005-05-10 2016-06-14 State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of The University Of Oregon Methods of mapping polymorphisms and polymorphism microarrays
WO2011053987A1 (en) * 2009-11-02 2011-05-05 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence selection and amplification
GB2487341A (en) * 2009-11-02 2012-07-18 Nugen Technologies Inc Compositions and methods for targeted nucleic acid sequence selection and amplification
US9206418B2 (en) 2011-10-19 2015-12-08 Nugen Technologies, Inc. Compositions and methods for directional nucleic acid amplification and sequencing
US9650628B2 (en) 2012-01-26 2017-05-16 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library regeneration
US10876108B2 (en) 2012-01-26 2020-12-29 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation
US10036012B2 (en) 2012-01-26 2018-07-31 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation
US9957549B2 (en) 2012-06-18 2018-05-01 Nugen Technologies, Inc. Compositions and methods for negative selection of non-desired nucleic acid sequences
US11028430B2 (en) 2012-07-09 2021-06-08 Nugen Technologies, Inc. Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing
US11697843B2 (en) 2012-07-09 2023-07-11 Tecan Genomics, Inc. Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing
US9822408B2 (en) 2013-03-15 2017-11-21 Nugen Technologies, Inc. Sequential sequencing
US10619206B2 (en) 2013-03-15 2020-04-14 Tecan Genomics Sequential sequencing
US10760123B2 (en) 2013-03-15 2020-09-01 Nugen Technologies, Inc. Sequential sequencing
US10570448B2 (en) 2013-11-13 2020-02-25 Tecan Genomics Compositions and methods for identification of a duplicate sequencing read
US11098357B2 (en) 2013-11-13 2021-08-24 Tecan Genomics, Inc. Compositions and methods for identification of a duplicate sequencing read
US11725241B2 (en) 2013-11-13 2023-08-15 Tecan Genomics, Inc. Compositions and methods for identification of a duplicate sequencing read
US9745614B2 (en) 2014-02-28 2017-08-29 Nugen Technologies, Inc. Reduced representation bisulfite sequencing with diversity adaptors
US10102337B2 (en) 2014-08-06 2018-10-16 Nugen Technologies, Inc. Digital measurements from targeted sequencing
US11099202B2 (en) 2017-10-20 2021-08-24 Tecan Genomics, Inc. Reagent delivery system

Also Published As

Publication number Publication date
AU3178200A (en) 2000-10-04
CA2362771A1 (en) 2000-09-21
EP1173609A2 (en) 2002-01-23
WO2000055364A3 (en) 2001-10-11
JP2002538837A (en) 2002-11-19

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2000909502

Country of ref document: EP

ENP Entry into the national phase in:

Ref country code: CA

Ref document number: 2362771

Kind code of ref document: A

Format of ref document f/p: F

Ref document number: 2362771

Country of ref document: CA

ENP Entry into the national phase in:

Ref country code: JP

Ref document number: 2000 605780

Kind code of ref document: A

Format of ref document f/p: F

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 2000909502

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09914604

Country of ref document: US

WWW Wipo information: withdrawn in national office

Ref document number: 2000909502

Country of ref document: EP