NUCLEOTIDE SEQUENCES AND MATERIALS AND METHODS
FOR THE DIAGNOSIS OF MYOTONIC DYSTROPHY
Myotonic dystrophy (DM) is the commonest adult form of muscular dystrophy. It is an inherited neuromuscular condition found in most of the world's populations, with an incidence thought to be around 1 case per 8000 people (Harper PS, "Myotonic Dystrophy", 2nd Edn., Saunders, 1989).
The onset and degree of severity of the disease are highly variable, ranging from very mildly affected adult cases whose only symptom is a type of lens cataract occurring in later life, to a life-threatening neonatal form of the disease found in some affected infants who have inherited the genetic defect from their mother. The type of symptoms is also very variable and can involve a wide range of tissues including muscle, brain, heart, and endocrine organs. In general DM is characterised by myotonia and progressive muscle weakness and wasting. To date this disease has been untreatable and its biochemical basis is not understood.
The gene responsible for DM has been mapped by family studies to a small region on the long arm of chromosome 19 (Harley HG et al., American Journal of Human Genetics 49, pp68-75 (1991)). Further studies have shown that the gene lies between the markers D19S63 and D19S95. We have now isolated a DNA sequence (SEQ ID NO: 1) that identifies the precise site in the DNA of DM patients that causes their disease (the "mutation"). The sequence was obtained from a collection of DNA fragments that represent the genes expressed in human brain. It is just over 2700 bases in length and is unique. Its biochemical function is the subject of further investigation.
The sequence includes, at a particular point along its length, a run of repeats of a 3-base unit (CTG). The number of times this triplet is repeated is variable between individuals. In normal unaffected people, the number is between 5 and about 40. In DM patients the number is between 50 and at least 2000. Generally speaking, the greater the number, the more severe the disease. We have not yet found any individual with between 40 and 50 repeats and this suggests that the population divides into two distinct groups: those having fewer than 50 repeats being unaffected and those having 50 or more repeats being affected. The number of repeats of the 3-base sequence appears to be the basic underlying genetic difference between DM patients and normal individuals.
Although it is usually the case in genetic inheritance that DNA sequences are passed from parents to offspring in an essentially unchanged form, this is not true of this particular sequence in DM families. In most cases the number of repeats increases on transmission from an affected parent to their affected offspring. This increase correlates with an increasing severity of symptoms, and earlier age-at-onset, in successive generations of a DM family. These observations are based on an extensive study done in our laboratory, involving 100 DM families and 200 normal control samples.
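By way of illustration only, the ranges reported above may be expressed as a simple rule. The following Python sketch is not part of the study; the function name and the returned labels are hypothetical, and the cut-offs are taken directly from the figures given above (about 40 as the upper normal limit, 50 as the lower limit seen in DM patients).

```python
def classify_ctg_repeats(n_repeats: int) -> str:
    """Place a CTG repeat count in one of the ranges reported in the text.

    The labels are illustrative (hypothetical) names, not clinical categories.
    """
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats >= 50:
        # 50 to at least 2000 repeats were seen in DM patients.
        return "DM range"
    if n_repeats > 40:
        # No individual with between 40 and 50 repeats had been found.
        return "unobserved 40-50 gap"
    # About 5 to 40 repeats were seen in normal unaffected people.
    return "normal range"


print(classify_ctg_repeats(11))   # the count present in SEQ ID NO: 1
print(classify_ctg_repeats(500))
```

Because the upper normal limit is stated only as "about 40", the boundary at 40 should be read as approximate.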
SUMMARY OF THE INVENTION
In one aspect, this invention provides a nucleotide sequence, comprising:-
a DNA sequence derived from human chromosome 19q including a variable number of repeats of the three-base unit CTG or its complement, wherein said number is greater than about 50 in individuals affected by DM;
a DNA sequence which hybridises under standard conditions to said first mentioned sequence over a region containing said variable number of repeats;
an RNA sequence transcribed from or corresponding to either of said aforementioned DNA sequences, or a fragment containing one of said sequences.
Preferably the nucleotide sequence is substantially as given in SEQ ID NO: 1.
In another aspect, this invention provides a nucleic acid hybridisation probe useful for determining the number of repeats of said three-base unit in a sample nucleotide sequence as defined above, said probe including a nucleotide sequence capable of hybridising to said sample sequence, or its complementary sequence or to a fragment of either of these, said probe having associated therewith a detectable label.
In a further aspect, this invention provides a method of DM risk diagnosis which comprises directly or indirectly observing, monitoring or determining the number of repeats of the base sequence CTG or its complement in the DNA from chromosome 19q, or the number of repeats of the equivalent three-base unit in RNA transcribed from or corresponding to said DNA sequence, or observing, monitoring or determining the length of the region containing said repeats.
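Where the sequence of the repeat region is itself available, the number of repeats can be determined directly by counting. The following Python sketch is given as an illustration under stated assumptions and is not a method described in this specification: the function names are hypothetical, the input is assumed to be a plain DNA string, and the count is taken as the longest uninterrupted run of CTG or of its complement CAG. The example string corresponds to residues 29 to 80 of SEQ ID NO: 2.

```python
import re


def longest_repeat_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit.upper())})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ctg_repeat_number(seq: str) -> int:
    """Count the longest CTG run, reading either strand (CTG or its complement CAG)."""
    return max(longest_repeat_run(seq, "CTG"), longest_repeat_run(seq, "CAG"))


# Residues 29..80 of SEQ ID NO: 2: short flanks around the repeat region.
example = "GGGAATG" + "CTG" * 11 + "GGGGGATCACAG"
print(ctg_repeat_number(example))
print(ctg_repeat_number(reverse_complement(example)))
```

Taking the maximum over both units means the same count is obtained whichever strand was sequenced.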
Said method of DM risk diagnosis preferably involves hybridising a sample of genomic DNA or RNA from an individual with one or more probes as defined above, said sample preferably initially being exposed to a restriction enzyme before hybridisation with said probe or probes.
Suitable restriction enzymes are EcoRI, EcoRV, PstI and PvuII, although many other enzymes which, with the appropriate probe, provide fragments which differ in length between DM patients and unaffected adults, may be used.
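The reason such enzymes yield fragments of differing length can be illustrated with a toy digest. In the following Python sketch the recognition sequences and cut positions are the standard ones for the four named enzymes, but everything else is an assumption made for illustration: the function names are hypothetical and the "locus" is an invented sequence, not the genomic sequence around the DM repeat.

```python
# Recognition site and top-strand cut offset for each enzyme named above.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "EcoRV": ("GATATC", 3),   # GAT^ATC
    "PstI": ("CTGCAG", 5),    # CTGCA^G
    "PvuII": ("CAGCTG", 3),   # CAG^CTG
}


def fragment_lengths(seq: str, enzyme: str) -> list[int]:
    """Lengths of the fragments produced by a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def toy_locus(n_repeats: int) -> str:
    """Invented sequence: two EcoRI sites bracketing a CTG run (illustration only)."""
    return "A" * 20 + "GAATTC" + "T" * 10 + "CTG" * n_repeats + "T" * 10 + "GAATTC" + "A" * 20


print(fragment_lengths(toy_locus(11), "EcoRI"))
print(fragment_lengths(toy_locus(61), "EcoRI"))
```

Only the fragment containing the repeat changes length, and it grows by three bases for each added CTG unit.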
An important feature of this aspect of the invention is that the number of repeats or the length of the repeat region may be used to predict the severity of DM of the individual.
The invention also extends to primers for use in a nucleic acid amplification technique (for example the PCR or polymerase chain reaction) for amplifying at least the variable repeat region of a nucleotide sequence as defined above. The primers preferably comprise first and second oligonucleotides closely flanking said repeat region (e.g. each spaced between about 5 and 75 bases therefrom). Said first and second oligonucleotides preferably each comprise respective sequences of from 8 to 32 bases and in one embodiment are substantially as identified by primer references 101 and 102 in Figure 1 (SEQ ID NO: 2), or complements thereof.
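The stated preferences can be checked against the feature coordinates given for SEQ ID NO: 2 in the Sequence Listing (primer binding sites at 1..28 and 119..145, repeat region at 36..68). The following Python sketch is illustrative only; the class and function names are hypothetical, and "spaced ... therefrom" is assumed here to mean the number of bases lying between a primer and the repeat region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A feature location, 1-based and inclusive, as in the Sequence Listing."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def gap(left: Span, right: Span) -> int:
    """Number of bases lying strictly between two non-overlapping spans."""
    return right.start - left.end - 1


def primers_meet_preferences(fwd: Span, repeat: Span, rev: Span) -> bool:
    """Check primer length (8 to 32 bases) and spacing (about 5 to 75 bases)."""
    lengths_ok = all(8 <= p.length <= 32 for p in (fwd, rev))
    gaps_ok = all(5 <= g <= 75 for g in (gap(fwd, repeat), gap(repeat, rev)))
    return lengths_ok and gaps_ok


# Coordinates from the SEQ ID NO: 2 feature table.
fwd, repeat, rev = Span(1, 28), Span(36, 68), Span(119, 145)
print(fwd.length, rev.length, gap(fwd, repeat), gap(repeat, rev))
print(primers_meet_preferences(fwd, repeat, rev))
```

On these coordinates the primers are 28 and 27 bases long and lie 7 and 50 bases from the repeat region, which is within the preferred ranges.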
In a further aspect this invention provides a diagnostic kit for carrying out a method of DM risk diagnosis which involves hybridising a sample of genomic DNA or RNA from an individual with one or more hybridisation probes, wherein the kit includes a hybridisation probe and one or more other components for carrying out the method, characterised in that said hybridisation probe is as defined above, and optionally including PCR primers.
The techniques disclosed herein have the capability to distinguish between normal unaffected people and carriers of the DM gene, even when these cannot be diagnosed by clinical examination alone. This can be done by standard methods of DNA or RNA analysis, including Southern or Northern blotting and hybridisation, and/or PCR (polymerase chain reaction). There is always a risk to gene carriers that their offspring may be severely affected, and therefore there is a demand from DM families for carrier detection by DNA or RNA testing. When a pregnancy at risk for DM is already under way, the family will often request prenatal diagnosis of the foetus by means of chorionic villus sampling or amniocentesis and DNA analysis, following which a decision may be made regarding termination or continuation.
At present the DNA diagnosis is done indirectly using DNA sequences that are not themselves part of the DM gene; this is often technically unsatisfactory and requires the cooperation of family members other than those directly involved. Furthermore it gives no indication of the severity of the symptoms, only that the disease gene is or is not present. The techniques disclosed herein will overcome these limitations since they allow diagnosis of the presence of the disease gene in a sample of DNA or RNA from just the person in question, without the need for a full family analysis. This increases the speed of the procedure (an important consideration when a pregnancy is ongoing) and reduces the risk of misdiagnosis considerably. More significantly, these techniques will enable a prediction to be made concerning the severity of the disease in the person or pregnancy at risk. Because DM is a highly variable condition, ranging from a trivial adult complaint to a potentially lethal congenital illness, it is important for the family and their counsellors to know how severe a form is involved, so that informed choices may be made.
Whilst the invention has been described above, it extends to any inventive combination of the features set out above or in the following description.
The invention may be performed in various ways and certain experimental data will now be disclosed to illustrate examples in accordance with the invention.
LIST OF FIGURES
Figure 1 is a restriction map of the 10 kb EcoRI fragment which undergoes expansion in DM patients;
Figures 2(a) and (b) are autoradiographs showing the variation of fragment lengths for a number of individuals; and
Figure 2(c) is a family tree for the individuals of Figure 2(b).
Experimental details
Isolation of the DNA sequence
A series of fragments of human chromosome 19 have been isolated from the region known to contain the DM gene.
The DM gene has been mapped to a 200 kilobase (kb) interval at 19q, presently believed to be at 19q13.3. Construction of a radiation reduced hybrid 2F5, containing 2 megabases of human chromosome 19 including the closest flanking markers, facilitated cloning of this entire region in phage λ. Clones were further localised using radiation reduced hybrids and pulsed-field gel electrophoresis. Clone λM9C showed strong inter-species conservation, and a sub-clone (pBB0.7) identified an RFLP with EcoRV and EcoRI.
A series of phage clones derived from libraries of radiation-reduced hybrid 2F5 were used to span the interval between D19S63 and D19S95, the loci in linkage disequilibrium with DM. Intensive screening of this interval led to the identification of clones designated as λM10M, λM8L and λSM2 (which contain sub-clone pBB0.7). These clones span the 10 kb EcoRI fragment that is increased in size in DM patients. Clone λM10M was one such clone, and was found to have the ability to distinguish between DM patients and normal individuals, as follows.
A fragment of the DNA from clone λM10M was radioactively labelled to make a "DNA probe". A series of DNA samples from DM patients and normal individuals were digested into specific small fragments using a bacterial enzyme called EcoRI. The fragments were separated according to their length by electrophoresis in an agarose gel, and transferred to a nylon membrane to which they adhere (Southern blotting). The radioactive probe from λM10M was then incubated with the membrane, allowing it to find its corresponding sequence in each of the samples of human DNA (DNA hybridisation). The positions of these fragments on the membrane were then visualised by exposure to X-ray film (autoradiography). This experiment revealed that the fragments corresponding to M10M are always larger (by varying amounts) in DM patient DNAs than in DNAs from normal individuals. Following confirmation on a larger number of samples, it was apparent that λM10M contained a copy of the DNA sequence responsible for myotonic dystrophy.
The original λM10M clone was derived from DNA of complete human chromosomes. Because only part of this DNA is "expressed" (i.e. functions as genes by making proteins), a second cloned sequence was obtained using M10M as a probe. This second clone came from a collection of cDNAs representing most of the genes expressed in human brain. The clone, which we have designated C28, contains 2726 bases of human DNA. The entire DNA sequence was determined and is shown in SEQ ID NO: 1. The position of the 3-base repeat that undergoes expansion in DM patients is indicated and lies at a position approximately 500 bp from the poly(A) tract of a mRNA expressed in many of the tissues affected in DM. The RNA in which the repeat resides encodes a polypeptide with strong amino acid homology to members of the protein kinase gene family.
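The position of the repeat within clone C28 can be cross-checked against the feature table for SEQ ID NO: 1 in the Sequence Listing (length 2726, repeat region at 2225..2257). The short Python sketch below is illustrative only; the function name is hypothetical, and the poly(A) tract is assumed to begin just beyond the 3' end of the listed sequence.

```python
def repeat_feature_summary(seq_length: int, start: int, end: int, unit_bp: int = 3) -> dict:
    """Summarise a repeat feature given 1-based, inclusive coordinates."""
    span_bp = end - start + 1
    return {
        "span_bp": span_bp,                      # bases occupied by the repeat region
        "units": span_bp // unit_bp,             # number of 3-base units
        "bases_to_3prime_end": seq_length - end  # bases downstream of the repeat
    }


# Coordinates from the SEQ ID NO: 1 feature table.
summary = repeat_feature_summary(seq_length=2726, start=2225, end=2257)
print(summary)
```

On these coordinates the repeat region spans 33 bases (11 CTG units) and ends 469 bases before the 3' end of the clone, in line with the figure of approximately 500 bp given above.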
Analysis of human DNA samples for the DM mutation
Two different but complementary methods are used to determine the presence of the DM mutation and the size of the expanded sequence. These procedures may be performed on DNA samples from human blood, mouthwashes, or chorion villus biopsies. All of the methodology is based on standard molecular genetic laboratory procedures (Sambrook J, Fritsch EF, Maniatis T, "Molecular cloning - a laboratory manual", Cold Spring Harbor Press, 1989).
The first method is based on Southern blotting and hybridisation, and is most effective in detecting expanded sequences towards the upper end of the size range. Samples (5-10 μg) of DNA from people to be tested, together with normal controls, are digested by incubation with the restriction enzyme EcoRI or PstI for 2-4 hours at 37°C. The samples are then separated by electrophoresis in 0.8% agarose gels for 16-18 hours at 45 volts, and the DNA transferred to a nylon membrane by overnight capillary action (Southern blotting). The membrane is removed from the agarose gel, dried, and the DNA fixed to it by ultraviolet radiation. A probe consisting of a part of the C28 sequence (SEQ ID NO: 1) is made by incorporation of a radioactive tracer into the DNA sequence. This is then incubated overnight at 65°C with the membrane in an aqueous buffer solution, allowing the probe to hybridise to the DNA samples on the membrane. The excess unbound probe is then washed off with dilute salt solution at 65°C, and the membrane exposed to X-ray film in the dark at -70°C for 1 to 4 days. The film is developed and aligned with the original membrane, allowing identification of the various samples and the size of the DNA fragments containing the 3-base repeat sequence.
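Fragment sizes are conventionally read from a gel or autoradiograph by comparison with size standards run alongside the samples. The following Python sketch shows one common way of doing this and is not a procedure taken from this specification: it assumes that the logarithm of fragment size varies approximately linearly with migration distance between neighbouring standards. The function name and the ladder values are hypothetical and were invented for the example.

```python
import math


def estimate_size_bp(distance_mm: float, standards: list[tuple[float, float]]) -> float:
    """Interpolate a fragment size from its migration distance.

    `standards` holds (migration distance in mm, size in bp) pairs with distinct
    distances; interpolation is linear in log10(size) between the two standards
    that bracket the band.
    """
    points = sorted(standards)
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the standards")


# Invented ladder for illustration: larger fragments migrate a shorter distance.
LADDER = [(10.0, 20000.0), (30.0, 10000.0), (50.0, 5000.0)]
print(round(estimate_size_bp(40.0, LADDER)))
```

Bands outside the range of the standards are rejected rather than extrapolated, since the log-linear approximation is least reliable there.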
The second procedure is based on PCR (polymerase chain reaction) and is best suited to the detection of DM mutations that are only slightly larger than normal. A pair of small, unique DNA sequences called "primers", which are derived from the C28 sequence (SEQ ID NO: 1) and closely flank the site of the 3-base repeat, are used. Small samples (0.1-0.5 μg) of DNA from individuals to be tested, along with normal controls, are mixed with 20 pmoles of each primer, 1 unit of a bacterial enzyme (Taq polymerase), individual bases for DNA synthesis, and buffer salts, in a volume of 20 μL. The mixtures are then subjected to a cyclical, 3-phase incubation protocol. In the first phase the mixture is heated to 94°C for 90 seconds to separate the two strands of the DNA sample. The second phase is for 60 seconds at 62°C and allows the primers to bind to their complementary sites on the sample DNA. During the third phase (2 minutes at 72°C) the Taq polymerase enzyme synthesises a new complementary DNA strand on each of the sample strands, starting from the primer. The whole 3-phase procedure is repeated 30 times, using an automatic programmable heating/cooling device. Because each cycle causes a doubling of the number of molecules, the net result is to specifically amplify the sequence delimited by the two primers. In our procedure this represents the 3-base repeat region which is expanded in DM patients. The products of the PCR reactions are separated by agarose gel electrophoresis (3% agarose gel, 2 hours at 80 volts) and visualised by staining the DNA with a fluorescent dye. The sizes of the amplified fragments are estimated by comparison with known standards, separated on the same electrophoresis gel.
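The arithmetic behind this procedure can be sketched as follows. The Python below is illustrative only and the function names are hypothetical. It assumes that the amplified product corresponds to SEQ ID NO: 2 (145 bp including 11 CTG units, with primer binding sites at its two ends), so that the non-repeat portion is 112 bp, and it treats the doubling at each cycle as ideal, which real reactions only approximate.

```python
SEQ2_LENGTH_BP = 145      # length of SEQ ID NO: 2
SEQ2_REPEAT_UNITS = 11    # CTG units in SEQ ID NO: 2 (repeat region 36..68)
FLANK_BP = SEQ2_LENGTH_BP - 3 * SEQ2_REPEAT_UNITS  # non-repeat bases in the product


def product_length_bp(n_repeats: int) -> int:
    """Expected PCR product length for a given number of CTG units."""
    return FLANK_BP + 3 * n_repeats


def repeats_from_product(length_bp: float) -> float:
    """Number of CTG units implied by an estimated product length."""
    return (length_bp - FLANK_BP) / 3


def ideal_copies(cycles: int, starting_molecules: int = 1) -> int:
    """Copies after `cycles` rounds if every cycle exactly doubled the template."""
    return starting_molecules * 2 ** cycles


print(product_length_bp(11), product_length_bp(50))
print(repeats_from_product(262))
print(ideal_copies(30))
```

Under these assumptions an allele at the lower limit of the DM range (50 repeats) gives a product of 262 bp against 145 bp for the 11-repeat allele, and 30 ideal cycles give roughly a thousand million copies of each starting molecule.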
It will be understood that variations of the above methods are possible. For example, SEQ ID NO: 1 is a cDNA (RNA-derived) sequence, and there are flanking and intervening sequences mixed in with this in genomic DNA. Suitable PCR primers which flank the CTG repeat region in genomic DNA may also be used which may differ from those described above.
Example
Using the Southern blotting technique genomic DNA was digested with EcoRI and then probed with pBB0.7, a single copy sub-clone of λM9C.
Results
From Figure 2 it is evident that all individuals have a constant ~15 kb band (C). Normal individuals are either homozygous or heterozygous for bands of 10 and 9 kb (alleles 1 and 2). Affected individuals have one of these two bands plus a second band >10 kb, indicated by ► in Figures 2(a) and (b).
In Figure 2(a), lanes 2, 4 and 8 are normal, unrelated individuals; lanes 1, 5 and 7 are unrelated affected individuals; lanes 3, 6 and 9 are affected offspring of individuals 1 and 2, 4 and 5, 7 and 8 respectively. Lane 1 shows one of the smallest size changes detectable, and lane 6 one of the largest. Two distinct bands can clearly be seen on the autoradiograph. Lanes 5 and 7 illustrate the smearing of bands seen in some individuals.
In Figure 2(b), individual 4 is classified as late onset and has a novel fragment minimally larger than the normal 10 kb band. His two affected offspring are classified as adult onset (individual 1 has a minimally increased fragment) and early adult onset (individual 7 with a novel fragment ~1 kb larger than his father's). The affected grandchildren (individuals 3 and 8) are both classified as early onset and can be seen to have a much larger fragment than their respective parents and their grandparent. Individual 8 had the earliest age at onset and is the most severely affected and also has the largest fragment in this family.
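A size difference seen on a Southern blot can be converted into an approximate difference in repeat number. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes that the whole of any size increase is due to additional 3-base CTG units.

```python
CTG_UNIT_BP = 3  # each repeat unit adds three bases to the fragment


def extra_repeats_from_size_increase(increase_bp: float) -> int:
    """Approximate number of additional CTG units for a given size increase."""
    return round(increase_bp / CTG_UNIT_BP)


def relative_size_change(normal_fragment_bp: float, extra_repeats: int) -> float:
    """Fractional change in fragment length caused by `extra_repeats` units."""
    return extra_repeats * CTG_UNIT_BP / normal_fragment_bp


# A fragment about 1 kb larger, as described for individual 7 relative to his father.
print(extra_repeats_from_size_increase(1000))

# Ten extra units on the normal 10 kb EcoRI fragment.
print(relative_size_change(10000, 10))
```

On this basis an increase of about 1 kb corresponds to some 330 additional repeats, whereas ten extra repeats alter a 10 kb fragment by only 0.3%, which is why expansions near the lower end of the range are hard to resolve on a Southern blot.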
The restriction map shown in Figure 1 was derived from overlapping genomic phage clones from unaffected individuals. Restriction sites for BamHI(B), EcoRI(E), HincII(C), HindIII(H), PstI(P), and SacI(S) are indicated. The subcloned 1.4 kb BamHI fragment (pM10M-6) is shown enlarged with the PstI and HincII sites, which flank the expanded region. The positions of PCR primers 96, 98, 100, 101, 102 and 103 and the sequence between primers 101 and 102 are shown.
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: UNIVERSITY OF WALES COLLEGE OF MEDICINE
(B) STREET: THE HEATH
(C) CITY: CARDIFF
(D) STATE: WALES
(E) COUNTRY: U.K.
(F) POSTAL CODE (ZIP): CF4 4XN
(G) TELEPHONE: 0222-747747
(H) TELEFAX: 0222-742914
(ii) TITLE OF INVENTION: DNA SEQUENCES AND MATERIALS AND METHODS FOR THE DIAGNOSIS OF MYOTONIC DYSTROPHY
(iii) NUMBER OF SEQUENCES: 2
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)
(v) CURRENT APPLICATION DATA:
APPLICATION NUMBER: WO PCT/GB/93***********
(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: GB 9202485.0
(B) FILING DATE: 06-FEB-1992
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2726 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: repeat_region
(B) LOCATION: 2225..2257
(ix) FEATURE:
(A) NAME/KEY: primer_bind
(B) LOCATION: 2190..2218
(ix) FEATURE:
(A) NAME/KEY: primer_bind
(B) LOCATION: 2308..2334
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GGGGACAGCC AGGGACAGGC AGACATGCAG CCAGGGCTCC AGGGCCTGGA CAGGGGCTGC 60
CAGGCCCTGT GACAGGAGGA CCCCGAGCCC CCGGCCCGGG GAGGGGCCAT GGTGCTGCCT 120
GTCCAACATG TCAGCCGAGG TGCGGCTGAG GCGGCTCCAG CAGCTGGTGT TGGACCCGGG 180
CTTCCTGGGG CTGGAGCCCC TGCTCGACCT TCTCCTGGGC GTCCACCAGG AGCTGGGCGC 240
CTCCGAACTG GCCCAGGACA AGTACGTGGC CGACTTCTTG CAGTGGGCGG AGCCCATCGT 300
GGTGAGGCTT AAGGAGGTCC GACTGCAGAG GGACGACTTC GAGATTCTGA AGGTGATCGG 360
ACGCGGGGCG TTCAGCGAGG TAGCGGTAGT GAAGATGAAG CAGACGGGCC AGGTGTATGC 420
CATGAAGATC ATGAACAAGT GGGACATGCT GAAGAGGGGC GAGGTGTCGT GCTTCCGTGA 480
GGAGAGGGAC GTGTTGGTGA ATGGGGACCG GCGGTGGATC ACGCAGCTGC ACTTCGCCTT 540
CCAGGATGAG AACTACCTGT ACCTGGTCAT GGAGTATTAC GTGGGCGGGG ACCTGCTGAC 600
ACTGCTGAGC AAGTTTGGGG AGCGGATTCC GGCCGAGATG GCGCGCTTCT ACCTGGCGGA 660
GATTGTCATG GCCATAGACT CGGTGCACCG GCTTGGCTAC GTGCACAGGG ACATCAAACC 720
CGACAACATC CTGCTGGACC GCTGTGGCCA CATCCGCCTG GCCGACTTCG GCTCTTGCCT 780
CAAGCTGCGG GCAGATGGAA CGGTGCGGTC GCTGGTGGCT GTGGGCACCC CAGACTACCT 840
GTCCCCCGAG ATCCTGCAGG CTGTGGGCGG TGGGCCTGGG ACAGGCAGCT ACGGGCCCGA 900
GTGTGACTGG TGGGCGCTGG GTGTATTCGC CTATGAAATG TTCTATGGGC AGACGCCCTT 960
CTACGCGGAT TCCACGGCGG AGACCTATGG CAAGATCGTC CACTACAAGG AGCACCTCTC 1020
TCTGCCGCTG GTGGACGAAG GGGTCCCTGA GGAGGCTCGA GACTTCATTC AGCGGTTGCT 1080
GTGTCCCCCG GAGACACGGC TGGGCCGGGG TGGAGCAGGC GACTTCCGGA CACATCCCTT 1140
CTTCTTTGGC CTCGACTGGG ATGGTCTCCG GGACAGCGTG CCCCCCTTTA CACCGGATTT 1200
CGAAGGTGCC ACCGACACAT GCAACTTCGA CTTGGTGGAG GACGGGCTCA CTGCCATGGA 1260
GACACTGTCG GACATTCGGG AAGGTGCGCC GCTAGGGGTC CACCTGCCTT TTGTGGGCTA 1320
CTCCTACTCC TGCATGGCCC TCAGGGACAG TGAGGTCCCA GGCCCCACAC CCATGGAAGT 1380
GGAGGCCGAG CAGCTGCTTG AGCCACACGT GCAAGCGCCC AGCCTGGAGC CCTCGGTGTC 1440
CCCACAGGAT GAAACAGCTG AAGTGGCAGT TCCAGCGGCT GTCCCTGCGG CAGAGGCTGA 1500
GGCCGAGGTG ACGCTGCGGG AGCTCCAGGA AGCCCTGGAG GAGGAGGTGC TCACCCGGCA 1560
GAGCCTGAGC CGGGAGATGG AGGCCATCCG CACGGACAAC CAGAACTTCG CCAGTCAACT 1620
ACGCGAGGCA GAGGCTCGGA ACCGGGACCT AGAGGCACAC GTCCGGCAGT TGCAGGAGCG 1680
GATGGAGTTG CTGCAGGCAG AGGGAGCCAC AGCTGTCACG GGGGTCCCCA GTCCCCGGGC 1740
CACGGATCCA CCTTCCCATC TAGATGGCCC CCCGGCCGTG GCTGTGGGCC AGTGCCCGCT 1800
GGTGGGGCCA GGCCCCATGC ACCGCCGCCA CCTGCTGCTC CCTGCCAGGG TCCCTAGGCC 1860
TGGCCTATCG GAGGCGCTTT CCCTGCTCCT GTTCGCCGTT GTTCTGTCTC GTGCCGCCGC 1920
CCTGGGCTGC ATTGGGTTGG TGGCCCACGC CGGCCAACTC ACCGCAGTCT GGCGCCGCCC 1980
AGGAGCCGCC CGCGCTCCCT GAACCCTAGA ACTGTCTTCG ACTCCGGGGC CCCGTTGGAA 2040
GACTGAGTGC CCGGGGCCAG CACAGAAGCC GCGCCCACCG CCTGCCAGTT CACAACCGCT 2100
CCGAGCGTGG GTCTCCGCCC AGCTCCAGTC CTGTGATCCG GGCCCGCCCC CTAGCGGCCG 2160
GGGAGGGAGG GGCCGGGTCC GCGGCCGGCG AACGGGGCTC GAAGGGTCCT TGTAGCCGGG 2220
AATGCTGCTG CTGCTGCTGC TGCTGCTGCT GCTGCTGGGG GGATCACAGA CCATTTCTTT 2280
CTTTCGGCCA GGCTGAGGCC CTGACGTGGA TGGGCAAACT GCAGGCCTGG GAAGGCAGCA 2340
AGCCGGGCCG TCCGTGTTCC ATCCTCCACG CACCCCCACC TATCGTTGGT TCGCAAAGTG 2400
CAAAGCTTTC TTGTGCATGA CGCCCTGCTC TGGGGAGCGT CTGGCGCGAT CTCTGCCTGC 2460
TTACTCGGGA AATTTGCTTT TGCCAAACCC GCTTTTTCGG GGATCCCGCG CCCCCCTCCT 2520
CACTTGCGCT GCTCTCGGAG CCCCAGCCGG CTCCGCCGCC TTCGGCGGTT TGGATATTTA 2580
TTGACCTCGT CCTCCGACTC GCTGACAGGC TACAGGACCC CCAACAACCC CAATCCACGT 2640
TTTGGATGCA CTGAGACCCC GACATTCCTC GGTATTTATT GTCTGTCCCC ACCTAGGACC 2700
CCCACCCCCG ACCCTCGCGA ATAAAA 2726
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 145 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
(A) NAME/KEY: primer_bind
(B) LOCATION: 1..28
(ix) FEATURE:
(A) NAME/KEY: repeat_region
(B) LOCATION: 36..68
(ix) FEATURE:
(A) NAME/KEY: primer_bind
(B) LOCATION: complement (119..145)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GAACGGGGCT CGAAGGGTCC TTGTAGCCGG GAATGCTGCT GCTGCTGCTG CTGCTGCTGC 60
TGCTGCTGGG GGGATCACAG ACCATTTCTT TCTTTCGGCC AGGCTGAGGC CCTGACGTGG 120
ATGGGCAAAC TGCAGGCCTG GGAAG 145