WO2000018787A1 - Gene mutated in Wolfram syndrome

Publication number
WO2000018787A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
sequence
gene
polypeptide
wfs1
Prior art date
Application number
PCT/US1999/022429
Other languages
French (fr)
Inventor
M. Alan Permutt
Hiroshi Inoue
Mike Mueckler
Original Assignee
Washington University
Application filed by Washington University
Priority to AU62701/99A
Publication of WO2000018787A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07H: SUGARS; DERIVATIVES THEREOF; NUCLEOSIDES; NUCLEOTIDES; NUCLEIC ACIDS
    • C07H21/00: Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids
    • C07H21/04: Compounds containing two or more mononucleotide units having separate phosphate or polyphosphate groups linked by saccharide radicals of nucleoside groups, e.g. nucleic acids with deoxyribosyl as saccharide radical
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/172: Haplotypes

Definitions

  • This invention relates to the field of diagnosis and treatment of certain forms of diabetes.
  • This invention provides a novel gene, the disruption of which is associated with Wolfram Syndrome, and methods of using the gene and specific mutants thereof as a diagnostic tool for prediction and early detection of the syndrome.
  • Wolfram syndrome (WFS) is also referred to as the DIDMOAD (diabetes insipidus, diabetes mellitus, optic atrophy, and deafness) syndrome. Most patients with this progressive disorder eventually develop all four cardinal manifestations, and die prematurely with widespread atrophic changes throughout the brain. Insulin-requiring diabetes mellitus occurs with a mean age of onset of 6-8 years. When examined, pancreatic islets are atrophic and insulin-producing β-cells are selectively absent. The disease is believed to account for 1/150 patients with young-onset insulin-requiring diabetes mellitus. The pathogenesis of Wolfram syndrome is unknown. Diagnosis is usually made in offspring of unaffected, often related parents, suggesting autosomal recessive inheritance.
  • Isolation and characterization of the gene or genes associated with Wolfram Syndrome is vital for the prediction, diagnosis and, ultimately, treatment of the disease.
  • A gene is provided whose mutation is highly correlated with Wolfram Syndrome in humans.
  • The symptoms of Wolfram Syndrome consist of diabetes insipidus, diabetes mellitus, optic atrophy and deafness.
  • This gene is found in the human genome between markers D4S500 and D4S431 on chromosome 4p, and is referred to herein as WFS1.
  • WFS1 is found in SEQ ID NO:1.
  • Another aspect of the invention comprises several isolated nucleic acids from the human and mouse WFS1 genes.
  • These nucleic acids are the genomic sequence of the WFS1 gene from human (SEQ ID NO:1), a cDNA sequence from the human WFS1 gene (SEQ ID NO:2), and a cDNA from the mouse WFS1 gene (SEQ ID NO:4).
  • The isolated nucleic acids are substantially the same as, or 60% homologous to, SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:4.
  • The nucleic acids encode SEQ ID NO:3 and SEQ ID NO:5, substantially the same variants of SEQ ID NO:3 and SEQ ID NO:5, variants with at least 60% homology to SEQ ID NO:3 and SEQ ID NO:5, and variants of SEQ ID NO:3 with the mutations and polymorphisms listed in Table 1.
  • The isolated nucleic acids are oligonucleotides (SEQ ID NOs:6-41) that have been designed based on SEQ ID NO:1.
  • Polypeptides are provided, which result from the expression of part or all of the human and mouse WFS1 genes.
  • The polypeptides are encoded by SEQ ID NO:3 and SEQ ID NO:5.
  • The polypeptides are substantially the same as SEQ ID NO:3 and SEQ ID NO:5, or at least 60% homologous to SEQ ID NO:3 and SEQ ID NO:5.
  • A series of methods is provided that use the WFS1 gene to genetically characterize mammalian subjects.
  • The WFS1 gene is used to make labeled probes, which are used to detect WFS1 nucleic acids.
  • Variant forms of WFS1 are sequenced and compared to the sequence of WFS1, and mutations and polymorphisms are determined.
  • Human genomic DNA is sequenced and compared to SEQ ID NO:1.
  • The primers in Table 2 are used to sequence the variant human gene.
  • Restriction enzymes are selected that differentially digest the wild type and variant gene, WFS1 nucleic acids are digested and separated by size, and the WFS1 nucleic acids are detected.
  • The human DNA is digested, and the primers in Table 2 are used to amplify the DNA before digestion.
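The comparison step described above (sequencing a variant and comparing it to the reference) can be sketched as follows. This is a minimal illustration, assuming the two sequences are already aligned and of equal length; the function name and the toy sequences are hypothetical and are not taken from SEQ ID NO:1.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes the two sequences are already aligned and have equal length;
    insertions and deletions are not handled in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Toy sequences for illustration only (not from the sequence listing).
reference = "ATGGCCTACGTTGA"
sample = "ATGGCTTACGTTGA"
print(find_substitutions(reference, sample))
```

A real screen would first align the reads to the reference with standard alignment software, since deletions such as the 15 bp deletion in exon 8 shift all downstream positions.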
  • Figure 1: Pedigrees of Wolfram syndrome families, with individuals designated by disease status (bold-lined symbols affected, thin-lined symbols unaffected), and with derived haplotypes of chromosome 4p markers.
  • Bold, underlined numbers represent disease-associated chromosomes. Italicized, underlined numbers refer to disease-associated chromosomes with historical recombinants.
  • Consanguineous Japanese families: Fig. 1a, WS-1; Fig. 1b, WS-2; Fig. 1c, WS-3 (an affected daughter, not shown, was deceased before initiation of the study).
  • Fig. 2A: The horizontal line at the top of the figure represents a portion of chromosome 4p, with the centromere to the right and pter to the left.
  • The dashed line represents the interval D4S500-D4S431, the critical region of the Wolfram syndrome gene.
  • P1 and BAC clones are represented as lines. Their length reflects the number of STSs and not the actual size. The name of each clone is given to the right of the line. Marker names are noted above the line and correspond to the symbols on the line.
  • Fig. 2B: An expanded schematic of the genomic structure of the Wolfram gene, WFS1, with exons indicated by boxes. The entire gene is encompassed within a 33.4 kb region.
  • Fig. 3A: Northern analysis in which human adult polyA+ RNA (5 μg) derived from various tissues was hybridized with a 32P-labeled 854-bp genomic fragment of the WFS1 gene.
  • Fig. 3B: Re-hybridization of the blot with 32P-labeled β-actin cDNA as a loading control.
  • Fig. 3C: Northern analysis with human adult total RNA (20 μg) hybridized with 32P-labeled WFS1 cDNA.
  • Fig. 3D: Re-hybridization of the blot in Fig. 3C with 32P-labeled ribosomal cDNA as a loading control.
  • Fig. 6a: The 1685del(N) in family WS-2, showing that each affected child (III-1, -2, -3, and -4) is homozygous for the mutation.
  • Fig. 6b: Sequence chromatograms of the region of exon 8 showing the 15 bp deletion in a patient homozygous for the mutation, along with a normal control.
  • Fig. 6c: Segregation of the 1681C to T and the microscopic deletion in family WS-5. The 1681C to T mutation destroys a BsmFI site in a 766 bp PCR (polymerase chain reaction) fragment.
  • The hemizygous T(-) affected children (II-1, -2, and -4) have only a 766 bp uncut fragment, while the hemizygous C(-) mother (I-4) and unaffected daughter (II-3) have 686 bp and 80 bp bands, and the heterozygous CT father (I-3) has 766 bp, 686 bp and 80 bp bands.
  • The symbols "-" and "+" refer to the absence or presence of enzyme.
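The banding pattern described for Fig. 6c follows directly from whether the BsmFI site is intact on each allele. The sketch below reproduces that logic using only the fragment sizes given in the figure description; the function name and the single-letter allele encoding are illustrative assumptions.

```python
FRAGMENT_BP = 766             # uncut PCR product, from the Fig. 6c description
CUT_PRODUCTS_BP = (686, 80)   # bands produced when the BsmFI site is intact


def expected_bands(alleles: str) -> list[int]:
    """Predict gel bands (in bp) after BsmFI digestion.

    `alleles` lists the base(s) present at position 1681, e.g. "C", "T" or
    "CT". Hemizygous individuals carry a single allele.
    """
    bands: set[int] = set()
    for allele in alleles.upper():
        if allele == "C":      # wild type: site present, fragment is cut
            bands.update(CUT_PRODUCTS_BP)
        elif allele == "T":    # mutant: site destroyed, fragment stays intact
            bands.add(FRAGMENT_BP)
        else:
            raise ValueError(f"unexpected allele: {allele!r}")
    return sorted(bands, reverse=True)


for genotype in ("T", "C", "CT"):
    print(genotype, expected_bands(genotype))
```

The three genotypes correspond to the affected children, the mother and unaffected daughter, and the heterozygous father, respectively.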
  • With reference to DNA, the term "isolated nucleic acid" refers to a DNA molecule that is separated from sequences with which it is immediately contiguous (in the 5' and 3' directions) in the naturally occurring genome of the organism from which it was derived.
  • The "isolated nucleic acid" may comprise a DNA molecule inserted into a vector, such as a plasmid or virus vector, or integrated into the genomic DNA of a procaryote or eucaryote.
  • An "isolated nucleic acid molecule" may also comprise a cDNA molecule.
  • With respect to RNA molecules, the term "isolated nucleic acid" primarily refers to an RNA molecule encoded by an isolated DNA molecule as defined above.
  • Alternatively, the term may refer to an RNA molecule that has been sufficiently separated from RNA molecules with which it would be associated in its natural state (i.e., in cells or tissues), such that it exists in a "substantially pure" form (the term "substantially pure" is defined below).
  • "Isolated protein or peptide" or "isolated and purified protein or peptide": this term refers primarily to a protein produced by expression of an isolated nucleic acid molecule of the invention. Alternatively, this term may refer to a protein which has been sufficiently separated from other proteins with which it would naturally be associated, so as to exist in "substantially pure" form.
  • The term "substantially pure" refers to a preparation comprising at least 50-60% by weight the compound of interest (e.g., nucleic acid, oligonucleotide, protein, etc.). More preferably, the preparation comprises at least 75% by weight, and most preferably 90-99% by weight, the compound of interest. Purity is measured by methods appropriate for the compound of interest (e.g. chromatographic methods, agarose or polyacrylamide gel electrophoresis, HPLC analysis, and the like).
  • Nucleic acid sequences and amino acid sequences can be compared using computer programs that align the similar sequences of the nucleic or amino acids and thus define the differences.
  • The BLAST programs (NCBI)
  • The DNAstar system (Madison, WI)
  • Equivalent alignments and similarity/identity assessments can be obtained through the use of any standard alignment software. For instance, the GCG Wisconsin Package version 9.1, available from the Genetics Computer Group in Madison,
  • The term "substantially the same" refers to nucleic acid or amino acid sequences having sequence variations that do not materially affect the nature of the protein (i.e. the structure, stability characteristics, substrate specificity and/or biological activity of the protein).
  • For nucleic acid sequences, the term "substantially the same" is intended to refer to the coding region and to conserved sequences governing expression, and refers primarily to degenerate codons encoding the same amino acid, or alternate codons encoding conservative substitute amino acids in the encoded polypeptide.
  • For amino acid sequences, the term "substantially the same" refers generally to conservative substitutions and/or variations in regions of the polypeptide not involved in determination of structure or function.
  • For amino acid sequences, "percent identical" refers to the percent of the amino acids of the subject amino acid sequence that have been matched to identical amino acids in the compared amino acid sequence by a sequence analysis program.
  • "Percent similar" refers to the percent of the amino acids of the subject amino acid sequence that have been matched to identical or conserved amino acids. Conserved amino acids are those which differ in structure but are similar in physical properties such that the exchange of one for another would not appreciably change the tertiary structure of the resulting protein.
  • For nucleic acid sequences, "percent identical" refers to the percent of the nucleotides of the subject nucleic acid sequence that have been matched to identical nucleotides by a sequence analysis program.
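The "percent identical" definition can be made concrete with a short sketch. It assumes the two sequences have already been aligned by a sequence analysis program, with gaps written as "-"; the function name and toy sequences are hypothetical. Following the definition above, the denominator is the number of residues in the subject sequence.

```python
def percent_identity(subject: str, compared: str) -> float:
    """Percent of subject residues matched to identical residues.

    Both inputs are rows of one pairwise alignment (equal length, "-" for
    gaps). Gap positions in the subject do not count as subject residues.
    """
    if len(subject) != len(compared):
        raise ValueError("aligned sequences must have the same length")
    subject_residues = sum(1 for a in subject if a != "-")
    if subject_residues == 0:
        raise ValueError("subject sequence contains no residues")
    matches = sum(1 for a, b in zip(subject, compared) if a != "-" and a == b)
    return 100.0 * matches / subject_residues


# Toy alignment for illustration only.
print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))
```

Different alignment packages choose different denominators (alignment length, shorter sequence, and so on), so reported identities are only comparable when the same convention is used.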
  • The term "immunologically specific" refers to antibodies that bind to one or more epitopes of a protein of interest, but which do not substantially recognize and bind other molecules in a sample containing a mixed population of antigenic biological molecules.
  • The term "specifically hybridizing" refers to the association between two single-stranded nucleic acid molecules of sufficiently complementary sequence to permit such hybridization under pre-determined conditions generally used in the art (sometimes termed "substantially complementary").
  • In particular, the term refers to hybridization of an oligonucleotide with a substantially complementary sequence contained within a single-stranded DNA or RNA molecule, to the substantial exclusion of hybridization of the oligonucleotide with single-stranded nucleic acids of non-complementary sequence.
  • A "coding sequence" or "coding region" refers to a nucleic acid molecule having sequence information necessary to produce a gene product, when the sequence is expressed.
  • The term "operably linked" means that the regulatory sequences necessary for expression of the coding sequence are placed in a nucleic acid molecule in the appropriate positions relative to the coding sequence so as to enable expression of the coding sequence.
  • This same definition is sometimes applied to the arrangement of other transcription control elements (e.g. enhancers) in an expression vector.
  • Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for the expression of a coding sequence in a host cell.
  • The term "promoter" refers generally to transcriptional regulatory regions of a gene, which may be found at the 5' or 3' side of the coding region, or within the coding region, or within introns.
  • A promoter is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence.
  • The typical 5' promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background.
  • Within the promoter sequence are found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase.
  • A "vector" is a replicon, such as a plasmid, phage, cosmid, or virus, to which another nucleic acid segment may be operably inserted so as to bring about the replication or expression of the segment.
  • The term "nucleic acid construct" or "DNA construct" is sometimes used to refer to a coding sequence or sequences operably linked to appropriate regulatory sequences and inserted into a vector for transforming a cell. This term may be used interchangeably with the term "transforming DNA".
  • Such a nucleic acid construct may contain a coding sequence for a gene product of interest, along with a selectable marker gene and/or a reporter gene.
  • The term "selectable marker gene" refers to a gene encoding a product that, when expressed, confers a selectable phenotype such as antibiotic resistance on a transformed cell.
  • The term "reporter gene" refers to a gene that encodes a product which is easily detectable by standard methods, either directly or indirectly.
  • A "heterologous" region of a nucleic acid construct is an identifiable segment (or segments) of the nucleic acid molecule within a larger molecule that is not found in association with the larger molecule in nature.
  • When the heterologous region encodes a mammalian gene, the gene will usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism.
  • Another example of a heterologous region is a construct where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different than the native gene).
  • The term "DNA construct", as defined above, is also used to refer to a heterologous region, particularly one constructed for use in transformation of a cell.
  • A cell has been "transformed" or "transfected" by exogenous or heterologous DNA when such DNA has been introduced inside the cell.
  • The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
  • The transforming DNA may be maintained on an episomal element such as a plasmid.
  • A stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the transforming DNA.
  • A "clone" is a population of cells derived from a single cell or common ancestor by mitosis.
  • A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.
  • A novel human gene, WFS1, is provided.
  • The inheritance of a mutated WFS1 gene is highly correlated with the development of Wolfram Syndrome in the human population. Normal function of WFS1 appears to be essential for survival of pancreatic islet β-cells and neurons. Included in the invention is a method for using the gene sequence to genetically screen for the presence of potentially mutated forms of the gene for diagnosis and prognosis of the disease in patients.
  • The WFS1 gene in humans spans 33.4 kb on chromosome 4p and is composed of 8 exons (Fig. 2B; SEQ ID NO:1).
  • The ~5 kb of sequence upstream of the start of the WFS1 open reading frame is expected to contain one or more transcriptional or translational regulatory elements.
  • The WFS1 human cDNA is 3.688 kb long (SEQ ID NO:2) and encodes a predicted protein 890 residues long with a predicted molecular mass of 100.29 kDa (SEQ ID NO:3).
  • The mouse cDNA (SEQ ID NO:4) is 3511 nucleotides long and has 83.9% nucleotide identity to the human gene.
  • The predicted protein sequence encoded by the mouse cDNA (SEQ ID NO:5) has 86.1% amino acid similarity to the predicted human WFS1 protein.
  • The inventors have isolated the WFS1 gene represented by SEQ ID NO:1 from the human genome by positional cloning. Previous work had localized the Wolfram gene between markers D4S432 and D4S431 on chromosome 4p, 5.5 cM (~5500 kb) apart. In the development of this invention, five families with individuals having typical Wolfram syndrome phenotypes were genotyped with genetic markers shown to locate between D4S432 and D4S431 by physical and/or radiation hybrid mapping (see Example 1). The region containing the Wolfram gene was thus narrowed further to a region between D4S500 and D4S431. The critical region between D4S500 and D4S431 was estimated to be ~250 kb as determined by contig mapping of BAC and P1 genomic clones. Three clones were sequenced and the sequence of much of the contig region was determined.
  • Exon trapping of two BAC clones was employed to generate expressed sequence tags (ESTs) of the region and then to determine areas with open reading frames that would be likely to contain a gene.
  • A genomic fragment of this region hybridized to a 3.7 kb RNA on a Northern blot.
  • The gene was determined to be expressed in all tissues but, surprisingly, was more abundant in pancreatic islets than in the exocrine pancreas.
  • The abundance of WFS1 RNA in pancreatic islets was particularly revealing because one of the outcomes of Wolfram Syndrome is atrophy of pancreatic islets.
  • A full length cDNA clone of WFS1 was obtained by screening a human infant brain cDNA library (SEQ ID NO:2). This clone was 3,688 nucleotides long and contained an appropriate start methionine, open reading frame, and polyadenylation signal. Comparison of the cDNA sequence of WFS1 with those in public databases revealed no related genes. Translation of the cDNA sequence predicts a polypeptide of 890 amino acid residues with a molecular mass of 100.29 kDa.
  • The protein is distinguished grossly by the presence of 3 structural domains: a hydrophilic N-terminal region of ~300 residues, a hydrophilic C-terminal region of ~240 residues, and a central hydrophobic core of ~350 residues. Inspection of the hydrophobicity curve suggests the presence of ~10 transmembrane segments.
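A hydrophobicity curve of the kind mentioned above is commonly computed as a sliding-window average of per-residue hydropathy values. The text does not say which scale or window the inventors used; the sketch below assumes the Kyte-Doolittle scale and a 19-residue window, both of which are illustrative choices, and the test sequence is a toy peptide rather than SEQ ID NO:3.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or len(sequence) < window:
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptide: a 19-residue leucine stretch flanked by charged residues.
toy_peptide = "D" * 5 + "L" * 19 + "D" * 5
profile = hydropathy_profile(toy_peptide)
print(len(profile), round(max(profile), 2))
```

Peaks in such a profile that span roughly 20 residues are the usual candidates for transmembrane segments, which is how a curve can suggest a count such as ~10.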
  • Nucleic acid molecules comprising part or all of the WFS1 gene of the invention may be prepared by two general methods: (1) they may be synthesized from appropriate nucleotide triphosphates, or (2) they may be isolated from biological sources. Both methods utilize protocols well known in the art.
  • Nucleotide sequence information, such as Sequence I.D. Nos. 1, 2 and 4
  • Synthetic oligonucleotides may be prepared by the phosphoramidite method employed in the Applied Biosystems 38A DNA Synthesizer or similar devices.
  • The resultant construct may be purified by high performance liquid chromatography (HPLC).
  • A double-stranded DNA molecule several kilobases in length may be synthesized as multiple smaller segments of appropriate complementarity.
  • Complementary segments thus produced may be annealed such that each segment possesses appropriate cohesive termini for attachment of an adjacent segment.
  • Adjacent segments may be ligated by annealing cohesive termini in the presence of DNA ligase to construct an entire double-stranded molecule.
  • A synthetic DNA molecule so constructed may then be cloned and amplified in an appropriate vector.
  • WFS1 nucleic acid sequences may be isolated from appropriate biological sources using methods known in the art.
  • A human genomic clone is isolated from a human genomic P1 library.
  • A cDNA clone is isolated from the Marathon-Ready human fetal brain λgt10 cDNA library (Clontech).
  • A mouse cDNA is isolated from a mouse pancreatic β-cell line (MIN6) cDNA library.
  • The isolation of human and mouse clones is not limited to the aforementioned libraries, and other commercially available human and mouse libraries may be used. Alternatively, cDNA or genomic clones from other species may be obtained.
  • Nucleic acids having the appropriate sequence homology with part or all of Sequence I.D. Nos. 1, 2 or 4 may be identified by using hybridization and washing conditions of appropriate stringency.
  • Hybridizations may be performed, according to the method of Sambrook et al., using a hybridization solution comprising: 5X SSC, 5X Denhardt's reagent, 1.0% SDS, 100 μg/ml denatured, fragmented salmon sperm DNA, 0.05% sodium pyrophosphate and up to 50% formamide.
  • Hybridization is carried out at 37-42°C for at least six hours.
  • Alternatively, hybridizations are performed in hybridization solution comprising 0.5 M NaPO4, 2 mM EDTA, 7% SDS and 0.1% sodium pyrophosphate (pH 7.1) at about 65°C for 20 hours.
  • Membranes are subsequently washed sequentially for 1 hour each in: (1) 2X SSC, 0.5X SET, 0.1% sodium pyrophosphate; and (2) 0.1X SSC, 0.5X SET, 0.1% sodium pyrophosphate.
  • Membranes are washed at 50°C for 30 minutes in 2X SSC, 0.5X SET, 0.1% sodium pyrophosphate.
  • One common formula for calculating the stringency conditions required to achieve hybridization between nucleic acid molecules of a specified sequence homology is (Sambrook et al., 1989):
  • $T_m = 81.5^{\circ}\mathrm{C} + 16.6\,\log[\mathrm{Na}^{+}] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\,\text{formamide}) - \dfrac{600}{\#\text{bp in duplex}}$
  • The stringency of the hybridization and wash depends primarily on the salt concentration and temperature of the solutions. In general, to maximize the rate of annealing of the probe with its target, the hybridization is usually carried out at salt and temperature conditions that are 20-25°C below the calculated T_m of the hybrid. Wash conditions should be as stringent as possible for the degree of identity of the probe for the target. In general, wash conditions are selected to be approximately 12-20°C below the T_m of the hybrid.
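The T_m formula and the temperature offsets above translate directly into a small calculation. The sketch below is illustrative only: it assumes the logarithm is base 10 and that [Na+] is expressed in mol/L (the conventional reading of this formula), and the function names and example inputs are hypothetical rather than conditions taken from the examples.

```python
import math


def melting_temperature(na_molar: float, gc_percent: float,
                        formamide_percent: float, duplex_bp: int) -> float:
    """T_m in degrees C from the formula quoted above.

    Assumes a base-10 logarithm and [Na+] in mol/L.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)


def hybridization_range(tm: float) -> tuple[float, float]:
    """Hybridization temperatures 20-25 degrees C below T_m."""
    return (tm - 25.0, tm - 20.0)


def wash_range(tm: float) -> tuple[float, float]:
    """Wash temperatures approximately 12-20 degrees C below T_m."""
    return (tm - 20.0, tm - 12.0)


# Hypothetical inputs: 1 M Na+, 50% G+C, 50% formamide, 600 bp duplex.
tm = melting_temperature(1.0, 50.0, 50.0, 600)
print(round(tm, 1), hybridization_range(tm), wash_range(tm))
```

With these inputs the salt term vanishes (log of 1 is 0), so the result is driven by the G+C, formamide and length terms.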
  • A moderate stringency hybridization is defined as hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42°C, and wash in 2X SSC and 0.5% SDS at 55°C for 15 minutes.
  • A high stringency hybridization is defined as hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42°C, and wash in 1X SSC and 0.5% SDS at 65°C for 15 minutes.
  • A very high stringency hybridization is defined as hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42°C, and wash in 0.1X SSC and 0.5% SDS at 65°C for 15 minutes.
  • Nucleic acids of the present invention may be maintained as DNA in any convenient cloning vector.
  • Genomic clones are maintained in COS-7 cells in the vector pSPL3 (Life Technologies, Inc.).
  • PCR products are subcloned into pAMP10 using the UDG cloning kit (GIBCO BRL), and propagated in a suitable E. coli host cell.
  • Clones are maintained in a plasmid cloning/expression vector, such as pGEMT (PROMEGA), and propagated in E. coli.
  • This invention provides oligonucleotides (sense or antisense strands of DNA or RNA) having sequences capable of hybridizing with at least one sequence of a nucleic acid molecule of the present invention, such as selected segments of Sequence I.D. Nos. 1, 2 and 4. Such oligonucleotides are useful as probes for detecting WFS1 genes (and specific mutations) in test samples, e.g. by PCR amplification, or as potential regulators of gene expression.
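An antisense oligonucleotide is the reverse complement of the sense-strand segment it hybridizes to. A minimal sketch of that relationship for DNA follows; the function name and the six-base example are illustrative and are not taken from SEQ ID NOs:6-41.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the antisense strand, written 5' to 3', for a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


# Toy sense-strand segment for illustration only.
print(reverse_complement("ATGGCC"))
```

Applying the function twice returns the original sequence, which is a quick sanity check when designing primer pairs.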
  • A full-length WFS1-encoded protein of the present invention may be prepared in a variety of ways, according to known methods.
  • The protein may be purified from appropriate sources, e.g., human or animal cultured cells or tissues, by immunoaffinity purification.
  • Due to the limited amount of such a protein that may be present in a sample at any given time, particularly in tumors or tumor cell lines, conventional purification techniques are not preferred in the present invention.
  • The availability of the isolated WFS1 coding sequence enables production of protein using in vitro expression methods known in the art.
  • A cDNA or gene may be cloned into an appropriate in vitro transcription vector, such as pSP64 or pSP65 for in vitro transcription, followed by cell-free translation in a suitable cell-free translation system, such as wheat germ or rabbit reticulocytes.
  • In vitro transcription and translation systems are commercially available, e.g., from Promega Biotech, Madison, Wisconsin or BRL, Rockville, Maryland.
  • The recombinant protein may be produced by expression in a suitable procaryotic or eukaryotic system.
  • A DNA molecule, such as the cDNA having SEQ ID NO:2 or NO:4, may be inserted into a plasmid vector adapted for expression in a bacterial cell, such as E. coli, or into a baculovirus vector for expression in an insect cell.
  • Such vectors comprise the regulatory elements necessary for expression of the DNA in the bacterial host cell, positioned in such a manner as to permit expression of the DNA in the host cell.
  • Regulatory elements required for expression include promoter sequences, transcription initiation sequences and, optionally, enhancer sequences.
  • The protein produced by WFS1 gene expression in a recombinant procaryotic or eukaryotic system may be purified according to methods known in the art.
  • A commercially available expression/secretion system can be used, whereby the recombinant protein is expressed and thereafter secreted from the host cell, to be easily purified from the surrounding medium.
  • If expression/secretion vectors are not used, an alternative approach involves purifying the recombinant protein by affinity separation, such as by immunological interaction with antibodies that bind specifically to the recombinant protein. Such methods are commonly used by skilled practitioners.
  • Proteins prepared by the aforementioned methods may be analyzed according to standard procedures. For example, such proteins may be subjected to amino acid sequence analysis, according to known methods.
  • The present invention also provides antibodies capable of immunospecifically binding to proteins of the invention.
  • Polyclonal antibodies directed toward WFS1-encoded proteins may be prepared according to standard methods.
  • Monoclonal antibodies may be prepared, which react immunospecifically with various epitopes of the proteins.
  • Monoclonal antibodies may be prepared according to general methods of Köhler and
  • Polyclonal or monoclonal antibodies that immunospecifically interact with WFS1-encoded proteins can be utilized for identifying and purifying such proteins.
  • For example, antibodies may be utilized for affinity separation of proteins with which they immunospecifically interact.
  • Antibodies may also be used to immunoprecipitate proteins from a sample containing a mixture of proteins and other biological molecules. Other uses of antibodies are described below.
  • Nucleic acids comprising part or all of the WFS1 gene may be used for a variety of purposes in accordance with the present invention.
  • Selected WFS1 sequences (DNA, RNA or fragments thereof)
  • WFS1 mutations are associated with the occurrence of Wolfram Syndrome, a disease of autosomal recessive inheritance. Early identification of patients destined to develop Wolfram Syndrome may lead to preventive therapies. Identification of heterozygous individuals that are at risk of having a child with Wolfram Syndrome will be very useful in genetic counseling.
  • WFS1 sequences may be utilized as probes in a variety of assays known in the art, including but not limited to: (1) in situ hybridization; (2) Southern hybridization; (3) northern hybridization; and (4) assorted amplification reactions, such as polymerase chain reaction (PCR).
  • large deletion and premature termination mutations are detected by separation on acrylamide or agarose gel electrophoresis and Southern blotting with probes made from WFS1 gene sequences.
  • Knowledge of the wildtype sequence allows the identification of point mutations in non-functional WFS1 genes.
  • mutated genes are differentiated from wildtype genes by using restriction enzyme sites that appear or disappear as the result of the mutation. WFS1 nucleic acids are digested and the fragments are then separated and probed as described above.
  • Example 1 The human genomic WFS1 sequence was used to design screening procedures to quickly screen all the individuals in a Wolfram Syndrome patient's extended family.
  • the screen entailed amplifying the region of the gene encompassing the deletion, then separating the products by agarose gel electrophoresis and performing Southern blot hybridization using a labelled wild type PCR fragment.
  • the PCR primers were designed as above, the mutant and wildtype products were digested with a restriction enzyme that was specific to either the sequence at the point mutation or the corresponding wildtype sequence. The products of the digestion, along with undigested control nucleic acids, were separated and detected as above.
  • the WFS1 nucleic acids of the invention may also be utilized as probes to identify related genes either from humans or from other species.
  • a cDNA has been isolated from mouse insulinoma cDNA library by standard screening methods and RACE PCR.
  • the mouse cDNA was 3,511 nucleotides long with 83.9% nucleotide identity to the coding sequence of the human gene and 86.1% predicted amino acid similarity. While the cloning of the mouse cDNA is illustrated in Example 1, those skilled in the art will appreciate that the isolation of WFSl from other species is by no means limited to mouse. Other mammalian species of interest include, but are not limited to, cow, cat, dog, horse, pig and rat.
  • hybridization stringency may be adjusted so as to allow hybridization of nucleic acid probes with complementary sequences of varying degrees of homology.
  • the cDNA from the mouse wfs1 gene is very useful because it allows further study of the gene and of Wolfram Syndrome in a system more conducive to research than humans.
  • the mouse clone may be used to create a targeting construct, which can be used for the targeted mutagenesis of the endogenous mouse gene.
  • a mouse model system for Wolfram Syndrome can be generated. This mouse system can subsequently be used to elucidate the cellular function of the Wolfram gene product, as well as assist in developing therapies for the syndrome.
  • Systems other than mouse can also be used to advantage. These systems include, but are not limited to animal models developed in mouse, various cultured human and mammalian cell systems (e.g., mouse and rat insulinoma cells) and frog oocyte expression systems.
  • the coding region of WFS1 may also be used to advantage to produce substantially pure WFS1-encoded proteins or selected portions thereof. As described below, these proteins may also be used in diagnosis and therapy of Wolfram Syndrome.
  • the WFS1-encoded protein may be used to produce polyclonal or monoclonal antibodies, which also may serve as sensitive detection reagents for the presence and accumulation of the WFS1-encoded polypeptide in cultured cells or tissues from living patients (the term "patient" refers to both humans and animals). Because the WFS1-encoded protein has not yet been isolated from natural sources, such antibodies will greatly accelerate the identification, isolation and characterization of this protein in mammalian cells and tissues. Recombinant techniques enable expression of fusion proteins containing part or all of the WFS1-encoded protein.
  • the full-length protein or fragments of the protein may be used to advantage to generate an array of monoclonal antibodies specific for various epitopes of the protein, thereby potentially providing even greater sensitivity for detection of the protein in cells or tissues.
  • Monoclonal antibodies specific to variant portions of the WFS1 polypeptide may also be used to advantage in diagnosing the presence of a variant form of the gene.
  • Polyclonal or monoclonal antibodies immunologically specific for the WFS1-encoded protein may be used in a variety of assays designed to localize and/or quantitate the protein. Such assays include, but are not limited to: (1) flow cytometric analysis; (2) immunochemical localization of the protein in cultured cells or tissues; and (3) immunoblot analysis (e.g., dot blot, Western blot) of extracts from cells and tissues. Additionally, as described above, such antibodies can be used for the purification of WFS1-encoded proteins (e.g., affinity column purification, immunoprecipitation).
  • the STANFORD CHR 4 YAC MAP project and the CHROMOSOME 4 SUMMARY MAP were viewed at WEB sites (http://shgc.stanford.edu/, http://cedar.genetics.soton.ac.uk/).
  • Primer sequences were obtained from the Genome Database (GDB - http://www.hgmp.mrc.ac.uk/gdb/gdbtop.html).
  • P1/BAC library screening. A human genomic P1 library (HD-K) was screened for clones containing D4S500 and D4S431 by PCR, using primers developed to specifically hybridize with those markers. Sequence-tagged sites (STSs) (SP6 and T7) from P1s 102C5, 89C1 and 77B6, as well as D4S500 and D4S431, were used for BAC library screening by either PCR or by direct hybridization of library grid blots (Research Genetics, Inc., Huntsville, Alabama).
  • RNA Northern blot analysis was performed with 20 μg of RNA by hybridization with WFS1 cDNA labeled with [α-32P]dCTP as previously described (Ferrer et al., Diabetes 46: 386-392, 1997).
  • RNA quality and loading was checked by staining the gel for ribosomal RNA and by hybridization with either β-actin or ribosomal RNA.
  • the same 854-bp fragment was used for screening an infant brain λgt10 cDNA library.
  • Six overlapping clones containing the WFS1 gene were isolated. 5' RACE analysis was performed using Marathon-Ready human fetal brain cDNA (Clontech, Palo Alto, CA).
  • the RACE products were subcloned into pGEMT vector (Promega Biotech,
  • a mouse pancreatic β-cell line (MIN6) cDNA library was screened with human WFS1 cDNA using standard reduced stringency conditions, and a 3.0 kb clone was isolated.
  • the 2812del(TC) was detected with a labeled PCR fragment on a denaturing Long Ranger polyacrylamide gel (FMC Bioproducts).
  • the 1685del(N15) was detected by electrophoresis of PCR fragments on 4% agarose gels.
  • the 2341C to T was detected by PCR, digestion with EcoNI (New England Biolabs (NEB)) and polyacrylamide-gel electrophoresis. The mutated allele was observed as 209 bp and 35 bp fragments.
  • the 2254G to T mutation was detected by PCR, digestion with AvaII (NEB) and agarose gel electrophoresis, with two bands (240 bp, 137 bp) in the wild-type allele, and three (240 bp, 120 bp, 17 bp) in the mutant.
  • the 2114G to A mutation was detected by PCR, digestion with TfiI (NEB) and agarose gel electrophoresis.
  • the wild-type allele has one band of 528 bp, and the mutated allele has two bands (396 bp, 132 bp).
  • the 1681C to T mutation destroys a BsmFI restriction site, and primers are set 8a (Table 2), with resulting fragments described in Fig. 6c.
  • Multipoint analysis with GENEHUNTER (version 1.1) (Kruglyak et al., Am J Hum Genet 59: 1347-1363, 1996) gave a lod >6.0 for the region encompassed by markers D4S827 and D4S394.
  • Haplotype Analysis and Mapping by Recombination. Haplotypes were constructed by inspection. The boundaries for the WFS gene had been defined by a telomeric recombinant at D4S432 (Collier et al., 1996), and centromeric recombinants at D4S431 (Polymeropoulos et al., 1994; Collier et al., 1996). Recombinants in families WS-1 - WS-6 were identified by genotyping with genetic markers shown to locate between D4S432 and D4S431 by physical and/or radiation hybrid mapping. In family WS-1, subject III-2, recombination of the telomeric region from D4S827 was observed (Fig. 1A).
  • D4S827 and D4S431 are within two overlapping YACs encompassing the region (204H9 and 420E4). Further,
  • D4S500 and D4S431 were located on the same YAC (420E4).
  • a contig was constructed with genomic human P1/BAC clones, and a contiguous genomic map encompassing the WFS gene was obtained (Figure 2).
  • STS content mapping confirmed overlapping clones.
  • the critical region was estimated to be ⁇ 250 kb.
  • the size of the full-length mRNA and pattern of tissue expression was determined by PCR amplification of an 854-bp genomic fragment of the region that was hybridized to a multiple-tissue Northern blot with polyA+ RNA.
  • a major transcript of ~3.7 kb was expressed in all tissues including pancreas (Fig. 3A).
  • Northern analysis of total RNA (20 μg) revealed the gene most abundantly expressed in pancreatic islets compared to that in exocrine pancreas (Fig. 3C).
  • a full-length clone was obtained by screening a human infant brain cDNA library. Six clones were isolated and subsequently 5' RACE analysis was performed. These analyses yielded a composite cDNA sequence of 3.688 kb. The longest open reading frame extended from nt 171 to 2843. The methionine at position 171 was chosen as the translation initiation codon primarily because it conforms to Kozak's rule (Kozak, Mamm Genome 7: 563-574, 1996). A consensus polyadenylation site (aataa) was located at position 3615-20, 19 bases upstream from the polyA tail. The gene was named WFS1.
  • the protein is distinguished grossly by the presence of 3 structural domains, a hydrophilic N-terminal region of ~300 residues, a hydrophilic C-terminal region of ~240 residues, and a central hydrophobic core of ~350 residues. Inspection of the hydrophobicity curve suggests the presence of ~10 transmembrane segments, if it is assumed that this region of the protein consists of α-helical segments. Comparison of the predicted amino acid sequence with entries in the Prosite database produced a single match to the prenyltransferase α-subunit repeat structure.
  • a mouse wfs1 cDNA was isolated from a mouse insulinoma (MIN6) (Ishihara et al., Diabetologia 36: 1139-1145) cDNA library, and completed by RT-PCR.
  • the mouse wfs1 cDNA was 3511 nucleotides with 83.9% nucleotide identity to the coding sequence of the human gene, and 86.1% amino acid similarity (Figure 5). Genomic structure of the WFS1 gene and mutations in WFS patients.
  • the genomic structure of WFS1 was determined by comparison of cDNA and genomic sequences obtained by shotgun sequencing of BAC460K9 and 33H22 and sequences in the Stanford Human Genome Center database (http://www.shgc.stanford.edu). The gene was found to be composed of eight exons (Fig. 2B) in 33.4 kb of genomic DNA.
  • exons were amplified and sequenced from patients' genomic DNA.
  • a TC deletion at position 2812 for subject WS-1 III-2 predicted a frameshift at codon 882, designated del882fs/ter937 (Table 1), with absence of the normal stop codon at 891 and the introduction of a new downstream termination codon.
  • the predicted WFS1 protein contains 937 amino acids, 47 more than the normal protein.
  • All 3 affected sibs (WS-1 III-1, -2, and -4) were homozygous for this mutation, while the unaffected sib and the parents were heterozygous, indicating a disease-specific mutation.
  • the 2812delTC mutation was not found in 80 healthy control Japanese subjects (160 chromosomes, see Table 1).
  • the Australian Caucasian family WS-5 was particularly interesting, as each affected child appeared to be homozygous for a P504L mutant allele inherited from the heterozygous father (Fig. 6C). Repeat sampling and analysis confirmed these results. Analysis of the mother's DNA with new markers between D4S500 and D4S431 suggested that the deletion was confined to a region of ⁇ 170 kb. Recently a patient with another autosomal recessive disorder was observed to be heterozygous for a missense mutation in combination with a partial deletion of a gene (Ries et al., Human Mutation 12: 44-51, 1998). The expression pattern of WFS1 appeared ubiquitous by Northern analysis of polyA+ RNA (Fig. 3A).
  • WFS1 may represent a new therapeutic target for treatment and prevention of diabetes mellitus and for neurodegenerative disorders.
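
Several of the figures quoted in the excerpts above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original disclosure. It assumes that cDNA coordinates are 1-based and inclusive and that the open reading frame ending at nt 2843 includes the stop codon; the function names are illustrative only.

```python
# Cross-checks on numbers quoted in the text. Coordinates are assumed to be
# 1-based and inclusive, and the ORF end is assumed to include the stop codon.

def orf_summary(start_nt, end_nt):
    """Return (nucleotides, codons including stop, amino-acid residues)."""
    length = end_nt - start_nt + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return length, codons, codons - 1


def same_amplicon(*band_patterns):
    """True if every list of restriction fragments sums to the same length."""
    return len({sum(pattern) for pattern in band_patterns}) == 1


nt, codons, residues = orf_summary(171, 2843)
print(nt, codons, residues)                        # ORF size from the quoted coordinates
print(937 - residues)                              # extra residues in the frameshift product
print(same_amplicon([240, 137], [240, 120, 17]))   # AvaII band patterns
print(same_amplicon([528], [396, 132]))            # TfiI band patterns
```

Under these assumptions the open reading frame spans 2673 nucleotides, that is 891 codons including the stop codon, or 890 residues. This agrees with the stop codon quoted at position 891 and with the 937-residue frameshift product being 47 residues longer than the normal protein.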


Abstract

This invention provides a novel gene, WFS1, isolated from human chromosome 4p. Mutation of the WFS1 gene is associated with the development of Wolfram Syndrome. The WFS1 gene, along with cDNAs, encoded protein and antibodies immunologically specific for the protein, provide a biological marker for early diagnosis of the syndrome, and for predicting predisposition of an individual for the syndrome. The gene also will be useful in gene replacement therapy, or for development of new methods and agents for treating Wolfram Syndrome.

Description

GENE MUTATED IN WOLFRAM SYNDROME
FIELD OF THE INVENTION
This invention relates to the field of diagnosis and treatment of certain forms of diabetes. In particular, this invention provides a novel gene, the disruption of which is associated with Wolfram Syndrome, and methods of using the gene and specific mutants thereof as diagnostic tools for prediction and early detection of the syndrome.
BACKGROUND OF THE INVENTION
Various scientific and scholarly articles are referred to in brackets throughout the specification. These articles are incorporated by reference herein to describe the state of the art to which this invention pertains.
Wolfram syndrome (WFS) (OMIM 222300) was first described in 1938 as a combination of familial juvenile-onset diabetes mellitus and optic atrophy. Other clinical features subsequently emerged and accordingly, WFS is also referred to as the DIDMOAD (diabetes insipidus, diabetes mellitus, optic atrophy, and deafness) syndrome. Most patients with this progressive disorder eventually develop all four cardinal manifestations, and die prematurely with widespread atrophic changes throughout the brain. Insulin-requiring diabetes mellitus occurs with mean age of onset at 6-8 years. When examined, pancreatic islets appear atrophic, with insulin-producing β-cells selectively absent. The disease is believed to account for 1/150 patients with young-onset insulin-requiring diabetes mellitus. The pathogenesis of Wolfram syndrome is unknown. Diagnosis is usually made in offspring of unaffected, often-related parents, suggesting autosomal recessive inheritance. Linkage of the gene to markers on chromosome 4p (Polymeropoulos et al., Nature Genetics 8: 95-97, 1994) was reported, and later confirmed (Collier et al., Am J Hum Genet 59: 855-863, 1996). Recombinants placed the gene in an interval between markers 5.5 cM apart.
Isolation and characterization of the gene or genes associated with Wolfram Syndrome is vital for the prediction, diagnosis and, ultimately, treatment of the disease. Currently there is no way of knowing for sure if an individual is predisposed to Wolfram Syndrome, particularly children too young to have developed the characteristic symptoms. Early diagnosis may lead to effective methods of treatment. Identification and isolation of the gene or genes associated with Wolfram Syndrome would further enable the development of screening procedures to assist in genetic counseling, as well as enabling detailed study of its function and subsequent development of therapeutic methods or agents.
SUMMARY OF THE INVENTION
In accordance with the present invention, a gene is provided, whose mutation is highly correlated with Wolfram Syndrome in humans. The symptoms of Wolfram Syndrome consist of diabetes insipidus, diabetes mellitus, optic atrophy and deafness. This gene is found in the human genome between markers D4S500 and D4S431 on chromosome 4p, and is referred to herein as WFS1. One variant of WFS1 is found in SEQ ID NO: 1.
Another aspect of the invention comprises several isolated nucleic acids from the human and mouse WFS1 genes. In a preferred embodiment, these nucleic acids are the genomic sequence of the WFS1 gene from human (SEQ ID NO: 1), a cDNA sequence from the human WFS1 gene (SEQ ID NO: 2), and a cDNA from the mouse WFS1 gene (SEQ ID NO: 4). In another preferred embodiment, the isolated nucleic acids are substantially the same or 60% homologous to SEQ ID NO: 1, SEQ ID NO: 2, and SEQ ID NO: 4. In yet another preferred embodiment, the nucleic acids encode SEQ ID NO: 3 and SEQ ID NO: 5, substantially the same variants of SEQ ID NO: 3 and SEQ ID NO: 5, variants with at least 60% homology to SEQ ID NO: 3 and SEQ ID NO: 5, and variants of SEQ ID NO: 3 with the mutations and polymorphisms listed in Table 1. In a final preferred embodiment, the isolated nucleic acids are oligonucleotides (SEQ ID NOs: 6-41) that have been designed based on SEQ ID NO: 1.
In accordance with another aspect of the invention, a selection of isolated polypeptides is provided, which result from the expression of part or all of the human and mouse WFS1 genes. In one preferred embodiment, the polypeptides are encoded by SEQ ID NO: 3 and SEQ ID NO: 5. In another preferred embodiment, the polypeptides are substantially the same as SEQ ID NO: 3 and SEQ ID NO: 5, or at least 60% homologous to SEQ ID NO: 3 and SEQ ID NO: 5.
In accordance with another aspect of the invention, a selection of antibodies that are immunologically specific to the aforementioned polypeptides is provided.
In accordance with another aspect of the invention, a series of methods is provided, which use the WFS1 gene to genetically characterize mammalian subjects. In a preferred embodiment, the WFS1 gene is used to make labeled probes which are used to detect WFS1 nucleic acids. In another preferred embodiment, variant forms of WFS1 are sequenced, compared to the sequence of WFS1, and mutations and polymorphisms determined. In a particularly preferred embodiment, human genomic DNA is sequenced and compared to SEQ ID NO: 1. In a very particularly preferred embodiment, the primers in Table 2 are used to sequence the variant human gene. In yet another preferred embodiment, restriction enzymes are selected that differentially digest the wild type and variant gene, WFS1 nucleic acids are digested and separated by size, and the WFS1 nucleic acids are detected. In a particularly preferred embodiment, the human DNA is digested, and the primers in Table 2 are used to amplify the DNA before digestion.
Other features and advantages of the present invention will be better understood by reference to the drawings, detailed descriptions and examples that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1. Pedigrees of Wolfram syndrome families, with individuals designated by disease status (bold-lined symbols affected, thin-lined symbols unaffected), and with derived haplotypes of chromosome 4p markers. Bold, underlined numbers represent disease-associated chromosomes. Italicized, underlined numbers refer to disease-associated chromosomes with historical recombinants. Consanguineous Japanese families: Fig. 1A, WS-1; Fig. 1B, WS-2; Fig. 1C, WS-3 (an affected daughter, not shown, was deceased before initiation of the study). Caucasian families: Fig. 1D, WS-4; and Fig. 1E, WS-5. Markers were ordered from telomeric to centromeric as described in Example 1.
Figure 2. Physical map of the Wolfram syndrome critical region. Fig. 2A: The horizontal line at the top of the figure represents a portion of chromosome 4p, with the centromere to the right and pter to the left. The dashed line represents the interval of D4S500-D4S431, the critical region of the Wolfram syndrome gene. P1 and BAC clones are represented as lines. Their length reflects the number of STSs and not the actual size. The name of each clone is given to the right of the line. Marker names are noted above the line and correspond to the symbols on the line. Fig. 2B: An expanded schematic of the genomic structure of the Wolfram gene, WFS1, with exons indicated by boxes. The entire gene is encompassed within a 33.4 kb region.
Figure 3. Expression of WFS1 mRNA in adult tissues. Fig. 3A: Northern analysis with human adult polyA+ RNA (5 μg) derived from various tissues was hybridized with a 32P-labeled 854-bp genomic fragment of the WFS1 gene. Fig. 3B: Re-hybridization of the blot with 32P-labeled β-actin cDNA as a loading control. Fig. 3C: Northern analysis with human adult total RNA (20 μg) hybridized with 32P-labeled WFS1 cDNA. Fig. 3D: Re-hybridization of the blot in Fig. 3C with 32P-labeled ribosomal cDNA as a loading control.
Figure 4. Hydrophobicity analysis was conducted by the method of Kyte and Doolittle (J Molec Biol 157: 105-132, 1982) using a window size of 9 amino acid residues. Average hydrophobicity values are plotted as a function of position along the polypeptide chain.
Figure 5. Comparison of human and mouse WFS1 protein sequences. Plain text indicates identical residues. Amino acid gaps between human and mouse proteins are shown by dashes. The locations of the mutations found in Wolfram syndrome patients, 3 missense, the premature stop codon (X), 2 deletions, and the 7 bp repeat insertion, are indicated below the sequences. A predicted prenyltransferase α-subunit repeat structure (A493 to L502) is underlined.
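The sliding-window averaging described for Figure 4 can be sketched as follows. This is an illustrative Python example, not the program used to prepare the figure. The table holds the published Kyte-Doolittle hydropathy index; the function name and the demonstration peptide are hypothetical, and the peptide is not taken from the WFS1 sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Average hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical peptide used only to exercise the function.
demo = "MDSNKRELLVVLIALLAGIVFWRKDE"
for start, value in enumerate(hydropathy_profile(demo), start=1):
    print(start, round(value, 2))  # window start position, mean hydropathy
```

Runs of consecutive windows with high positive averages are the kind of feature that the text interprets as candidate transmembrane segments; any threshold applied to such a profile would be a further assumption.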
Figure 6. Co-segregation of WFS1 mutations with the disease phenotype in Wolfram syndrome families. Fig. 6a: The 1685del(N) in family WS-2, showing that each affected child (III-1, -2, -3, and -4) is homozygous for the mutation. Fig. 6b: Sequence chromatograms of the region of exon 8 showing the 15 bp deletion in a patient homozygous for the mutation, along with a normal control. Fig. 6c: Segregation of the 1681C to T and the microscopic deletion in family WS-5. The 1681C to T mutation destroys a BsmFI site in a 766 bp PCR (polymerase chain reaction) fragment. The hemizygous T(-) affected children (II-1, -2, and -4) have only a 766 bp uncut fragment, while the hemizygous C(-) mother (I-4) and unaffected daughter (II-3) have 686 bp and 80 bp bands, and the heterozygous CT father (I-3) has 766 bp, 686 bp and 80 bp bands. The symbols "-" and "+" refer to the absence or presence of enzyme.
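The band patterns described for Fig. 6c follow directly from which alleles carry the BsmFI site. The sketch below, in Python, reproduces that reasoning using the fragment sizes given in the legend. The allele symbols ("C", "T", and "-" for the deleted allele) and the function name are illustrative assumptions, and the model assumes that a deleted allele yields no PCR product.

```python
AMPLICON = 766            # uncut PCR fragment, bp
CUT_FRAGMENTS = (686, 80)  # BsmFI products of the wild-type (C) allele, bp


def expected_bands(alleles, enzyme=True):
    """Return the band sizes expected on the gel, largest first.

    alleles: two-character genotype, e.g. "CT", "T-", "C-"
    enzyme:  False models the undigested ("-") control lane
    """
    sizes = set()
    for allele in alleles:
        if allele == "-":
            continue                      # deleted allele: assumed no product
        if allele == "C" and enzyme:
            sizes.update(CUT_FRAGMENTS)   # site present, fragment is cut
        elif allele in ("C", "T"):
            sizes.add(AMPLICON)           # site destroyed or no enzyme added
        else:
            raise ValueError(f"unknown allele: {allele!r}")
    return sorted(sizes, reverse=True)


for genotype in ("T-", "C-", "CT"):
    print(genotype, expected_bands(genotype))
```

With these assumptions the hemizygous T(-) genotype gives only the 766 bp band, the hemizygous C(-) genotype gives 686 bp and 80 bp, and the heterozygous CT genotype gives all three bands, matching the legend.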
DETAILED DESCRIPTION OF THE INVENTION
I. Definitions
Various terms relating to the biological molecules of the present invention are used hereinabove and also throughout the specification and claims.
With reference to nucleic acid molecules, the term "isolated nucleic acid" is sometimes used. This term, when applied to DNA, refers to a DNA molecule that is separated from sequences with which it is immediately contiguous (in the 5' and 3' directions) in the naturally occurring genome of the organism from which it was derived. For example, the "isolated nucleic acid" may comprise a DNA molecule inserted into a vector, such as a plasmid or virus vector, or integrated into the genomic DNA of a procaryote or eucaryote. An "isolated nucleic acid molecule" may also comprise a cDNA molecule. With respect to RNA molecules, the term "isolated nucleic acid" primarily refers to an RNA molecule encoded by an isolated DNA molecule as defined above. Alternatively, the term may refer to an RNA molecule that has been sufficiently separated from RNA molecules with which it would be associated in its natural state (i.e., in cells or tissues), such that it exists in a "substantially pure" form (the term "substantially pure" is defined below).
With respect to proteins or peptides, the term "isolated protein (or peptide)" or "isolated and purified protein (or peptide)" is sometimes used herein. This term refers primarily to a protein produced by expression of an isolated nucleic acid molecule of the invention. Alternatively, this term may refer to a protein which has been sufficiently separated from other proteins with which it would naturally be associated, so as to exist in "substantially pure" form.
The term "substantially pure" refers to a preparation comprising at least 50-60% by weight the compound of interest (e.g., nucleic acid, oligonucleotide, protein, etc.). More preferably, the preparation comprises at least 75% by weight, and most preferably 90-99% by weight, the compound of interest. Purity is measured by methods appropriate for the compound of interest (e.g. chromatographic methods, agarose or polyacrylamide gel electrophoresis, HPLC analysis, and the like) .
Nucleic acid sequences and amino acid sequences can be compared using computer programs that align the similar sequences of the nucleic or amino acids and thus define the differences. In the comparisons made in the present invention, the BLAST programs (NCBI) and parameters used therein were employed, and the DNAstar system (Madison, WI) was used to align sequence fragments of genomic DNA sequences. However, equivalent alignments and similarity/identity assessments can be obtained through the use of any standard alignment software. For instance, the GCG Wisconsin Package version 9.1, available from the Genetics Computer Group in Madison, Wisconsin, and the default parameters used (gap creation penalty=12, gap extension penalty=4) by that program may also be used to compare sequence identity and similarity.
The term "substantially the same" refers to nucleic acid or amino acid sequences having sequence variations that do not materially affect the nature of the protein (i.e. the structure, stability characteristics, substrate specificity and/or biological activity of the protein). With particular reference to nucleic acid sequences, the term "substantially the same" is intended to refer to the coding region and to conserved sequences governing expression, and refers primarily to degenerate codons encoding the same amino acid, or alternate codons encoding conservative substitute amino acids in the encoded polypeptide. With reference to amino acid sequences, the term "substantially the same" refers generally to conservative substitutions and/or variations in regions of the polypeptide not involved in determination of structure or function.
The terms "percent identical" and "percent similar" are also used herein in comparisons among amino acid and nucleic acid sequences. When referring to amino acid sequences, "percent identical" refers to the percent of the amino acids of the subject amino acid sequence that have been matched to identical amino acids in the compared amino acid sequence by a sequence analysis program. "Percent similar" refers to the percent of the amino acids of the subject amino acid sequence that have been matched to identical or conserved amino acids. Conserved amino acids are those which differ in structure but are similar in physical properties such that the exchange of one for another would not appreciably change the tertiary structure of the resulting protein.
Conservative substitutions are defined in Taylor (1986, J. Theor. Biol. 119:205). When referring to nucleic acid molecules, "percent identical" refers to the percent of the nucleotides of the subject nucleic acid sequence that have been matched to identical nucleotides by a sequence analysis program.
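The definitions of "percent identical" and "percent similar" given above can be illustrated with a short Python sketch operating on two sequences that have already been aligned. It is not the BLAST or GCG implementation. Two points are assumptions made here for illustration: the denominator is taken to be the number of residues in the subject sequence, and the similarity classes are a simplified, hypothetical grouping rather than the scheme of Taylor (1986).

```python
# Illustrative similarity classes only; the text defers to Taylor (1986)
# for the actual definition of conservative substitutions.
SIMILAR_GROUPS = [
    set("KRH"), set("DE"), set("ILVM"), set("FWY"), set("ST"), set("NQ"),
]


def conserved(a, b):
    """True if two residues are identical or share a similarity class."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identical_and_similar(subject, compared, gap="-"):
    """Return (percent identical, percent similar) for two aligned sequences."""
    if len(subject) != len(compared):
        raise ValueError("aligned sequences must have equal length")
    residues = identical = similar = 0
    for s, c in zip(subject.upper(), compared.upper()):
        if s == gap:
            continue              # not a residue of the subject sequence
        residues += 1
        if c == gap:
            continue              # subject residue left unmatched
        if s == c:
            identical += 1
        if conserved(s, c):
            similar += 1
    if residues == 0:
        raise ValueError("subject sequence contains no residues")
    return 100.0 * identical / residues, 100.0 * similar / residues


# Hypothetical aligned fragments, not taken from SEQ ID NO: 3 or SEQ ID NO: 5.
print(percent_identical_and_similar("MKLV-A", "MRLIGA"))
```

In this toy alignment three of the five subject residues are identical and the remaining two are conservative replacements under the illustrative classes, so percent similar exceeds percent identical, as it also does for the human and mouse proteins.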
With respect to antibodies, the term "immunologically specific" refers to antibodies that bind to one or more epitopes of a protein of interest, but which do not substantially recognize and bind other molecules in a sample containing a mixed population of antigenic biological molecules.
With respect to oligonucleotides or other single-stranded nucleic acid molecules, the term "specifically hybridizing" refers to the association between two single-stranded nucleic acid molecules of sufficiently complementary sequence to permit such hybridization under pre-determined conditions generally used in the art (sometimes termed "substantially complementary"). In particular, the term refers to hybridization of an oligonucleotide with a substantially complementary sequence contained within a single-stranded DNA or RNA molecule, to the substantial exclusion of hybridization of the oligonucleotide with single-stranded nucleic acids of non-complementary sequence.
A "coding sequence" or "coding region" refers to a nucleic acid molecule having sequence information necessary to produce a gene product, when the sequence is expressed.
The term "operably linked" or "operably inserted" means that the regulatory sequences necessary for expression of the coding sequence are placed in a nucleic acid molecule in the appropriate positions relative to the coding sequence so as to enable expression of the coding sequence. This same definition is sometimes applied to the arrangement of other transcription control elements (e.g. enhancers) in an expression vector.
Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for the expression of a coding sequence in a host cell.
The terms "promoter", "promoter region" or "promoter sequence" refer generally to transcriptional regulatory regions of a gene, which may be found at the 5' or 3' side of the coding region, or within the coding region, or within introns. Typically, a promoter is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. The typical 5' promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence is a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase.
A "vector" is a replicon, such as plasmid, phage, cosmid, or virus to which another nucleic acid segment may be operably inserted so as to bring about the replication or expression of the segment.
The term "nucleic acid construct" or "DNA construct" is sometimes used to refer to a coding sequence or sequences operably linked to appropriate regulatory sequences and inserted into a vector for transforming a cell. This term may be used interchangeably with the term "transforming DNA" . Such a nucleic acid construct may contain a coding sequence for a gene product of interest, along with a selectable marker gene and/or a reporter gene .
The term "selectable marker gene" refers to a gene encoding a product that, when expressed, confers a selectable phenotype such as antibiotic resistance on a transformed cell.
The term "reporter gene" refers to a gene that encodes a product which is easily detectable by standard methods, either directly or indirectly.
A "heterologous" region of a nucleic acid construct is an identifiable segment (or segments) of the nucleic acid molecule within a larger molecule that is not found in association with the larger molecule in nature. Thus, when the heterologous region encodes a mammalian gene, the gene will usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism. In another example, a heterologous region is a construct where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different than the native gene). Allelic variations or naturally-occurring mutational events do not give rise to a heterologous region of DNA as defined herein. The term "DNA construct", as defined above, is also used to refer to a heterologous region, particularly one constructed for use in transformation of a cell.
A cell has been "transformed" or "transfected" by exogenous or heterologous DNA when such DNA has been introduced inside the cell. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells, for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the transforming DNA. A "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.
Description
In accordance with the present invention, a novel human gene, WFS1, has been isolated. The inheritance of a mutated WFS1 gene is highly correlated with the development of Wolfram Syndrome in the human population. Normal function of WFS1 appears to be essential for survival of pancreatic islet β-cells and neurons. Included in the invention is the method for using the gene sequence to genetically screen for the presence of potentially mutated forms of the gene for diagnosis and prognosis of the disease in patients. The WFS1 gene in humans spans 33.4 kb on chromosome 4p and is composed of 8 exons (Fig. 2B; SEQ ID NO: 1). The ~5 kb of sequence upstream of the start of the WFS1 open reading frame is expected to contain one or more transcriptional or translational regulatory elements. The WFS1 human cDNA is 3.688 kb long (SEQ ID NO: 2) and encodes a predicted protein 890 residues long with a predicted molecular mass of 100.29 kDa (SEQ ID NO: 3). Comparison of the WFS1 cDNA sequence with those in public databases found no related genes. A mouse cDNA of WFS1 has also been isolated. The mouse cDNA (SEQ ID NO: 4) is 3511 nucleotides long and has 83.9% nucleotide identity to the human gene. The predicted protein sequence encoded by the mouse cDNA (SEQ ID NO: 5) has 86.1% amino acid similarity to the predicted human WFS1 protein.
The inventors have isolated the WFS1 gene represented by SEQ ID NO: 1 from the human genome by positional cloning. Previous work had localized the Wolfram gene between markers D4S432 and D4S431 on chromosome 4p, 5.5 cM (~5500 kb) apart. In the development of this invention, five families with individuals having typical Wolfram syndrome phenotypes were genotyped with genetic markers shown to locate between D4S432 and D4S431 by physical and/or radiation hybrid mapping (see Example 1). The region containing the Wolfram gene was thus narrowed further to a region between D4S500 and D4S431. The critical region between D4S500 and D4S431 was estimated to be <250 kb as determined by contig mapping of BAC and P1 genomic clones. Three clones were sequenced and the sequence of much of the contig region was determined.
Exon trapping of two BAC clones was employed to generate expressed sequence tags (ESTs) of the region and then to determine areas with open reading frames that would be likely to contain a gene. Among a number of ESTs isolated was one predicted to be the 3' end 1.8 kb exon of a gene. A genomic fragment of this region hybridized to a 3.7 kb RNA on a Northern blot. The gene was determined to be expressed in all tissues, but surprisingly was most abundant in pancreatic islets compared to that in the exocrine pancreas. The abundance of WFS1 RNA in pancreatic islets was particularly revealing because one of the outcomes of Wolfram Syndrome is atrophy of pancreatic islets.
A full-length cDNA clone of WFS1 was obtained by screening a human infant brain cDNA library (SEQ ID NO: 2). This clone was 3,688 nucleotides long and contained an appropriate start methionine, open reading frame, and polyadenylation signal. Comparison of the cDNA sequence of WFS1 with those in public databases revealed no related genes. Translation of the cDNA sequence predicts a polypeptide of 890 amino acid residues with a molecular mass of 100.29 kDa. The protein is distinguished grossly by the presence of 3 structural domains: a hydrophilic N-terminal region of ~300 residues, a hydrophilic C-terminal region of ~240 residues, and a central hydrophobic core of ~350 residues. Inspection of the hydrophobicity curve suggests the presence of ~10 transmembrane segments. When patients with Wolfram Syndrome were screened for mutations in the WFS1 gene, seven mutations were found in the full-length clones derived from the original EST. PCR (polymerase chain reaction) was used to amplify exons of WFS1 from subjects with Wolfram Syndrome, and the products were sequenced. Comparison of these sequences with the wild type gene revealed probable loss of function mutations in all cases, as described in greater detail below.
The following description sets forth the general procedures involved in practicing the present invention. To the extent that specific materials are mentioned, it is merely for purposes of illustration and is not intended to limit the invention. Unless otherwise specified, general cloning procedures, such as those set forth in Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory (1989) (herein "Sambrook et al.") or Ausubel et al. (eds) Current Protocols in Molecular Biology, John Wiley & Sons (1999) (herein "Ausubel et al.") are used. Unless otherwise specified, general genome analysis procedures were used, such as set forth in Genome Analysis: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1997).
I. Preparation of WFS1 nucleic acid molecules, encoded proteins and immunologically specific antibodies
A. Nucleic Acid Molecules
Nucleic acid molecules comprising part or all of the WFS1 gene of the invention may be prepared by two general methods: (1) they may be synthesized from appropriate nucleotide triphosphates, or (2) they may be isolated from biological sources. Both methods utilize protocols well known in the art.
The availability of nucleotide sequence information, such as Sequence I.D. Nos. 1, 2 and 4, enables preparation of an isolated nucleic acid molecule of the invention by oligonucleotide synthesis. Synthetic oligonucleotides may be prepared by the phosphoramidite method employed in the Applied Biosystems 38A DNA Synthesizer or similar devices. The resultant construct may be purified by high performance liquid chromatography (HPLC). Long, double-stranded polynucleotides, such as a DNA molecule of the present invention, must be synthesized in stages, due to the size limitations inherent in current oligonucleotide synthetic methods. Thus, for example, a double-stranded DNA molecule several kilobases in length may be synthesized as multiple smaller segments of appropriate complementarity. Complementary segments thus produced may be annealed such that each segment possesses appropriate cohesive termini for attachment of an adjacent segment. Adjacent segments may be ligated by annealing cohesive termini in the presence of DNA ligase to construct an entire double-stranded molecule. A synthetic DNA molecule so constructed may then be cloned and amplified in an appropriate vector.
WFS1 nucleic acid sequences may be isolated from appropriate biological sources using methods known in the art. In a preferred embodiment, a human genomic clone is isolated from a human genomic P1 library. In another preferred embodiment, a cDNA clone is isolated from the Marathon-Ready human fetal brain λ gt10 cDNA library (Clontech). In yet another preferred embodiment, a mouse cDNA is isolated from a mouse pancreatic β-cell line (MIN6) cDNA library. The isolation of human and mouse clones is not limited to the aforementioned libraries, and other commercially available human and mouse libraries may be used. Alternatively, cDNA or genomic clones from other species may be obtained. In accordance with the present invention, nucleic acids having the appropriate sequence homology with part or all of Sequence I.D. Nos. 1, 2 or 4 may be identified by using hybridization and washing conditions of appropriate stringency. For example, hybridizations may be performed, according to the method of Sambrook et al., using a hybridization solution comprising: 5X SSC, 5X Denhardt's reagent, 1.0% SDS, 100 μg/ml denatured, fragmented salmon sperm DNA, 0.05% sodium pyrophosphate and up to 50% formamide. Hybridization is carried out at 37-42°C for at least six hours. Following hybridization, filters are washed as follows: (1) 5 minutes at room temperature in 2X SSC and 1% SDS; (2) 15 minutes at room temperature in 2X SSC and 0.1% SDS; (3) 30 minutes to 1 hour at 37°C in 1X SSC and 1% SDS; (4) 2 hours at 42-65°C in 1X SSC and 1% SDS, changing the solution every 30 minutes. In a preferred embodiment, hybridizations are performed in hybridization solution comprising 0.5 M NaPO4, 2 mM EDTA, 7% SDS and 0.1% sodium pyrophosphate (pH 7.1) at about 65°C for 20 hours. For high-stringency conditions, membranes are subsequently washed sequentially for 1 hour each in: (1) 2X SSC, 0.5X SET, 0.1% sodium pyrophosphate; and (2) 0.1X SSC, 0.5X SET, 0.1% sodium pyrophosphate.
For low-stringency conditions, membranes are washed at 50°C for 30 minutes in 2X SSC, 0.5X SET, 0.1% sodium pyrophosphate. One common formula for calculating the stringency conditions required to achieve hybridization between nucleic acid molecules of a specified sequence homology is (Sambrook et al., 1989):
Tm = 81.5°C + 16.6 log[Na+] + 0.41(% G+C) - 0.63(% formamide) - 600/(#bp in duplex)
As an illustration of the above formula, using [Na+] = [0.368] and 50% formamide, with GC content of 42% and an average probe size of 200 bases, the Tm is 57°C. The Tm of a DNA duplex decreases by 1 - 1.5°C with every 1% decrease in homology. Thus, targets with greater than about 75% sequence identity would be observed using a hybridization temperature of 42°C.
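The worked illustration above can be reproduced numerically. The following is a minimal sketch in Python; the function name and parameter names are illustrative choices and not part of the specification, and the logarithm is assumed to be base 10 with the sodium concentration expressed in mol/L.

```python
import math

def melting_temperature(na_molar, gc_percent, formamide_percent, duplex_bp):
    """Tm in degrees C from the formula quoted above.

    Assumes log is base 10 and [Na+] is given in mol/L.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)

# Values taken from the illustration in the text.
tm = melting_temperature(na_molar=0.368, gc_percent=42,
                         formamide_percent=50, duplex_bp=200)
print(round(tm))  # 57
```

Substituting other salt, formamide or probe-length values shows how each term shifts the calculated Tm.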
The stringency of the hybridization and wash depends primarily on the salt concentration and temperature of the solutions. In general, to maximize the rate of annealing of the probe with its target, the hybridization is usually carried out at salt and temperature conditions that are 20-25°C below the calculated Tm of the hybrid. Wash conditions should be as stringent as possible for the degree of identity of the probe for the target. In general, wash conditions are selected to be approximately 12-20°C below the Tm of the hybrid. With regard to the nucleic acids of the current invention, a moderate stringency hybridization is defined as hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42°C, and wash in 2X SSC and 0.5% SDS at 55°C for 15 minutes. A high stringency hybridization is defined as hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42°C, and wash in 1X SSC and 0.5% SDS at 65°C for 15 minutes. A very high stringency hybridization is defined as hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42°C, and wash in 0.1X SSC and 0.5% SDS at 65°C for 15 minutes.
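The rules of thumb and the three named stringency levels in the preceding paragraph can be written down compactly. This is a hedged sketch only: the function names and the dictionary layout are hypothetical, and the Tm of 57°C used in the example is simply the value from the earlier illustration.

```python
def hybridization_window(tm):
    """Hybridization temperature range: 20-25 degrees C below the calculated Tm."""
    return (tm - 25.0, tm - 20.0)

def wash_window(tm):
    """Wash temperature range: approximately 12-20 degrees C below the Tm."""
    return (tm - 20.0, tm - 12.0)

# Wash conditions for the stringency levels defined in the text.
# All three share the same hybridization step (6X SSC, 42 degrees C).
STRINGENCY_WASH = {
    "moderate":  {"ssc_x": 2.0, "sds_percent": 0.5, "temp_c": 55, "minutes": 15},
    "high":      {"ssc_x": 1.0, "sds_percent": 0.5, "temp_c": 65, "minutes": 15},
    "very high": {"ssc_x": 0.1, "sds_percent": 0.5, "temp_c": 65, "minutes": 15},
}

tm = 57.0  # from the worked Tm illustration
print(hybridization_window(tm))  # (32.0, 37.0)
print(wash_window(tm))           # (37.0, 45.0)
```

Lower salt (a smaller SSC multiple) and higher temperature both raise stringency, which is why the three definitions differ only in the wash step.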
Nucleic acids of the present invention may be maintained as DNA in any convenient cloning vector. In a preferred embodiment, genomic clones are maintained in COS-7 cells in the vector pSPL3 (Life Technologies, Inc.). In another preferred embodiment, PCR products are subcloned into pAMP10 using the UDG cloning kit (GIBCO BRL), and propagated in a suitable E. coli host cell. In another preferred embodiment, clones are maintained in a plasmid cloning/expression vector, such as pGEMT (Promega), and propagated in E. coli.
WFS1 nucleic acid molecules of the invention (including those containing known polymorphisms and mutations) include cDNA, genomic DNA, RNA, and fragments thereof which may be single- or double-stranded. Thus, this invention provides oligonucleotides (sense or antisense strands of DNA or RNA) having sequences capable of hybridizing with at least one sequence of a nucleic acid molecule of the present invention, such as selected segments of Sequence I.D. Nos. 1, 2 and 4. Such oligonucleotides are useful as probes for detecting WFS1 genes (and specific mutations) in test samples, e.g. by PCR amplification, or as potential regulators of gene expression.
B. Proteins and Antibodies
A full-length WFS1-encoded protein of the present invention may be prepared in a variety of ways, according to known methods. The protein may be purified from appropriate sources, e.g., human or animal cultured cells or tissues, by immunoaffinity purification. However, due to the limited amount of such a protein that may be present in a sample at any given time, particularly in tumors or tumor cell lines, conventional purification techniques are not preferred in the present invention. The availability of the isolated WFS1 coding sequence enables production of protein using in vitro expression methods known in the art. For example, a cDNA or gene may be cloned into an appropriate in vitro transcription vector, such as pSP64 or pSP65 for in vitro transcription, followed by cell-free translation in a suitable cell-free translation system, such as wheat germ or rabbit reticulocytes. In vitro transcription and translation systems are commercially available, e.g., from Promega Biotech, Madison, Wisconsin or BRL, Rockville, Maryland.
Alternatively, the recombinant protein may be produced by expression in a suitable procaryotic or eukaryotic system. For example, part or all of a DNA molecule, such as the cDNA having SEQ ID NO: 2 or NO: 4, may be inserted into a plasmid vector adapted for expression in a bacterial cell, such as E. coli, or into a baculovirus vector for expression in an insect cell. Such vectors comprise the regulatory elements necessary for expression of the DNA in the bacterial host cell, positioned in such a manner as to permit expression of the DNA in the host cell. Such regulatory elements required for expression include promoter sequences, transcription initiation sequences and, optionally, enhancer sequences.
The protein produced by WFS1 gene expression in a recombinant procaryotic or eukaryotic system may be purified according to methods known in the art. A commercially available expression/secretion system can be used, whereby the recombinant protein is expressed and thereafter secreted from the host cell, to be easily purified from the surrounding medium. If expression/secretion vectors are not used, an alternative approach involves purifying the recombinant protein by affinity separation, such as by immunological interaction with antibodies that bind specifically to the recombinant protein. Such methods are commonly used by skilled practitioners.
Proteins prepared by the aforementioned methods may be analyzed according to standard procedures. For example, such proteins may be subjected to amino acid sequence analysis, according to known methods.
Included in the present invention are antibodies capable of immunospecifically binding to proteins of the invention. Polyclonal antibodies directed toward WFS1-encoded proteins may be prepared according to standard methods. Monoclonal antibodies may be prepared, which react immunospecifically with various epitopes of the proteins. Monoclonal antibodies may be prepared according to general methods of Köhler and Milstein, following standard protocols. Polyclonal or monoclonal antibodies that immunospecifically interact with WFS1-encoded proteins can be utilized for identifying and purifying such proteins. For example, antibodies may be utilized for affinity separation of proteins with which they immunospecifically interact. Antibodies may also be used to immunoprecipitate proteins from a sample containing a mixture of proteins and other biological molecules. Other uses of antibodies are described below.
II. Uses of WFS1 Nucleic Acids, Encoded Proteins and Immunologically Specific Antibodies
A. WFS1 Nucleic Acids
Nucleic acids comprising part or all of the WFS1 gene may be used for a variety of purposes in accordance with the present invention. As illustrated in Example 1, selected WFS1 sequences (DNA, RNA or fragments thereof) may be used as probes to identify mutations or rearrangements in a patient's DNA, and/or monitor the level of WFS1 transcripts in tissues. As discussed earlier, WFS1 mutations are associated with the occurrence of Wolfram Syndrome, a disease of autosomal recessive inheritance. Early identification of patients destined to develop Wolfram Syndrome may lead to preventive therapies. Identification of heterozygous individuals that are at risk of having a child with Wolfram Syndrome will be very useful in genetic counseling.
WFS1 sequences may be utilized as probes in a variety of assays known in the art, including but not limited to: (1) in situ hybridization; (2) Southern hybridization; (3) northern hybridization; and (4) assorted amplification reactions, such as polymerase chain reaction (PCR). In a preferred embodiment, large deletion and premature termination mutations are detected by separation on acrylamide or agarose gel electrophoresis and Southern blotting with probes made from WFS1 gene sequences. Knowledge of the wildtype sequence allows the identification of point mutations in non-functional WFS1 genes. In another preferred embodiment, mutated genes are differentiated from wildtype genes by using restriction enzyme sites that appear or disappear as the result of the mutation. WFS1 nucleic acids are digested and the fragments are then separated and probed as described above.
Both of the above-mentioned preferred embodiments are illustrated in Example 1. The human genomic WFS1 sequence was used to design screening procedures to quickly screen all the individuals in a Wolfram Syndrome patient's extended family. In the case of large deletion mutations, the screen entailed amplifying the region of the gene encompassing the deletion, then separating the products by agarose gel electrophoresis and performing Southern blot hybridization using a labelled wild type PCR fragment. In the case of point mutations, the PCR primers were designed as above, and the mutant and wildtype products were digested with a restriction enzyme that was specific to either the sequence at the point mutation or the corresponding wildtype sequence. The products of the digestion, along with undigested control nucleic acids, were separated and detected as above. In all six of the extended families studied, inheritance of a homozygous complement of a mutated WFS1 gene was consistent with the development of the disease and the pedigree of the family. Of particular interest are the three mis-sense mutations in the human WFS1 predicted polypeptide, whose wild type sequence is found to be conserved in the predicted mouse polypeptide. This sequence conservation, together with the mutational effect of non-conservative amino acid substitutions at these sites, suggests that these amino acids are critical for gene function. A primary screen of these sequences is a useful way to expedite the identification of mutations. Other critical mis-sense mutations may also be useful in connection with this invention. These other mutations can be found using procedures detailed in Example 1 and others well known in the art.
The WFS1 nucleic acids of the invention may also be utilized as probes to identify related genes either from humans or from other species. In a preferred embodiment, a cDNA has been isolated from a mouse insulinoma cDNA library by standard screening methods and RACE PCR. The mouse cDNA was 3,511 nucleotides long with 83.9% nucleotide identity to the coding sequence of the human gene and 86.1% predicted amino acid similarity. While the cloning of the mouse cDNA is illustrated in Example 1, those skilled in the art will appreciate that the isolation of WFS1 from other species is by no means limited to mouse. Other mammalian species of interest include, but are not limited to, cow, cat, dog, horse, pig and rat. As is well known in the art, hybridization stringency may be adjusted so as to allow hybridization of nucleic acid probes with complementary sequences of varying degrees of homology.
The cDNA from the mouse wfs1 gene is very useful because it allows further study of the gene and Wolfram Syndrome in a system more conducive to research than humans. The mouse clone may be used to create a targeting construct, which can be used for the targeted mutagenesis of the endogenous mouse gene. By creating both null and site-directed mutants, a mouse model system for Wolfram Syndrome can be generated. This mouse system can subsequently be used to elucidate the cellular function of the Wolfram gene product, as well as assist in developing therapies for the syndrome. Systems other than mouse can also be used to advantage. These systems include, but are not limited to, animal models developed in mouse, various cultured human and mammalian cell systems (e.g., mouse and rat insulinoma cells) and frog oocyte expression systems.
As described above, the coding region of WFS1 may also be used to advantage to produce substantially pure WFS1-encoded proteins or selected portions thereof. As described below, these proteins may also be used in diagnosis and therapy of Wolfram Syndrome.
B. Proteins and Antibodies
The WFS1-encoded protein, or fragments thereof, may be used to produce polyclonal or monoclonal antibodies, which also may serve as sensitive detection reagents for the presence and accumulation of the WFS1-encoded polypeptide in cultured cells or tissues from living patients (the term "patient" refers to both humans and animals). Because the WFS1-encoded protein has not yet been isolated from natural sources, such antibodies will greatly accelerate the identification, isolation and characterization of this protein in mammalian cells and tissues. Recombinant techniques enable expression of fusion proteins containing part or all of the WFS1-encoded protein. The full-length protein or fragments of the protein may be used to advantage to generate an array of monoclonal antibodies specific for various epitopes of the protein, thereby potentially providing even greater sensitivity for detection of the protein in cells or tissues. Monoclonal antibodies specific to variant portions of the WFS1 polypeptide may also be used to advantage in diagnosing the presence of a variant form of the gene.
Polyclonal or monoclonal antibodies immunologically specific for the WFS1-encoded protein may be used in a variety of assays designed to localize and/or quantitate the protein. Such assays include, but are not limited to: (1) flow cytometric analysis; (2) immunochemical localization of the protein in cultured cells or tissues; and (3) immunoblot analysis (e.g., dot blot, Western blot) of extracts from cells and tissues. Additionally, as described above, such antibodies can be used for the purification of WFS1-encoded proteins (e.g., affinity column purification, immunoprecipitation).
The following example is provided to describe the invention in greater detail. It is intended to illustrate, not to limit, the invention.
EXAMPLE 1
Isolation and Identification of Wolfram Gene and Mutants
METHODS
Patients and families. Three families originated from Japan: WS-1 (Nanko et al., Brit J Psychiatry 161: 282, 1992), WS-2 (Higashi, Am J Otology 12: 57-60, 1991), and WS-3 (Maruta et al., Clin Neurol 27: 725-732, 1987). Cell lines for the Caucasian family WS-4 were obtained from the NIGMS Human Genetic Mutant Cell Repository, Camden, NJ (family #1157). Family WS-5 is Caucasian Australian and WS-6 Saudi Arabian. Minimum criteria for diagnosis were young-onset insulin-dependent diabetes mellitus and progressive optic atrophy.
Microsatellite analysis. Initial genotyping was carried out as described previously (Nestorowicz et al., Hum Molec Genet 7: 1119-1128, 1998; Inoue et al., Diabetes 45: 789-794, 1996) with markers reported in the 1996 Genethon Microsatellite Map (D4S127, Hox7, D4S412, D4S3023, D4S2925, D4S431, D4S2935, D4S3007, D4S394, D4S2983, D4S2923). Second genotyping was performed with markers (D4S2957, D4S2375, A348XA5, D4S827, D4S500, and D4S2366) shown to locate in the interval of D4S3023 - D4S431 by physical and radiation hybrid mapping projects of chromosome 4. The STANFORD CHR 4 YAC MAP project and the CHROMOSOME 4 SUMMARY MAP were viewed at web sites (http://shgc.stanford.edu/, http://cedar.genetics.soton.ac.uk/). Primer sequences were obtained from the Genome Database (GDB - http://www.hgmp.mrc.ac.uk/gdb/gdbtop.html).
P1/BAC library screening. A human genomic P1 library (HD-K) was screened for clones containing D4S500 and D4S431 by PCR, using primers developed to specifically hybridize with those markers. Sequence-tagged sites (STSs) (SP6 and T7) from P1s 102C5, 89C1 and 77B6, as well as D4S500 and D4S431, were used for BAC library screening by either PCR or by direct hybridization of library grid blots (Research Genetics, Inc., Huntsville, Alabama).
Exon-trapping, sample and shotgun megabase genomic sequencing. Restriction fragments from BAC 460K9 and 33H22 were cloned into the BamHI site of pSPL3 and transfected into COS-7 cells, and spliced products obtained by RT-PCR were subcloned into pAMP10 using the UDG cloning kit (GIBCO BRL) and sequenced.
Sample and shotgun sequencing of BACs 460K9 and 33H22 was accomplished as described (Wilson and Mardis, in Genome Analysis: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, NY, 1997). Similarity searches with known genes and ESTs were performed using BLAST programs (NCBI). The sequencing project was compiled using LaserGene software (DNAStar, Madison, WI). Sequencing was performed on an ABI 373 with Prism dye terminator kits.
Northern blot analysis, cDNA screening, and 5' RACE analysis. A multiple adult human tissue polyA+ RNA Northern blot (Clontech MTN human I) was probed with an 854-bp fragment (nt 2133-2986 of the WFS1 cDNA) as described by the manufacturer. Total cellular RNA was isolated (Chomczynski and Sacchi, Anal Biochem 162: 156-159, 1987) and Northern analysis performed with 20 μg of RNA by hybridization with WFS1 cDNA labeled with [α-32P]dCTP as previously described (Ferrer et al., Diabetes 46: 386-392, 1997). RNA quality and loading was checked by staining the gel for ribosomal RNA and by hybridization with either β-actin or ribosomal RNA. The same 854-bp fragment was used for screening an infant brain λ gt10 cDNA library. Six overlapping clones containing the WFS1 gene were isolated. 5' RACE analysis was performed using Marathon-Ready human fetal brain cDNA (Clontech, Palo Alto, CA). The RACE products were subcloned into pGEMT vector (Promega Biotech, Madison, WI) and sequenced. A mouse pancreatic β-cell line (MIN6) cDNA library was screened with human WFS1 cDNA using standard reduced stringency conditions, and a 3.0 kb clone was isolated. A primer corresponding to the predicted 5'-untranslated region, based on the mouse EST sequence (GenBank AA021827 and AA692227) that was homologous to the human cDNA, was synthesized and used for RT-PCR with MIN6 RNA (5'-CGGTTTCGGAGCAACTTCGC-3', SEQ ID NO: 42 and 5'-CACCTCAGCCTCGTTCTCAG-3', SEQ ID NO: 43).
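As a quick sanity check on the two RT-PCR primers listed above (SEQ ID NO: 42 and 43), their length, GC content and reverse complement can be computed directly from the sequences. This is an illustrative sketch only; the helper names are hypothetical and the specification does not describe any such calculation.

```python
# Primer sequences copied from the text (SEQ ID NO: 42 and SEQ ID NO: 43).
PRIMERS = {
    "SEQ ID NO: 42": "CGGTTTCGGAGCAACTTCGC",
    "SEQ ID NO: 43": "CACCTCAGCCTCGTTCTCAG",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def reverse_complement(seq):
    """Sequence of the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

for name, seq in PRIMERS.items():
    print(name, len(seq), "nt,", gc_percent(seq), "% GC,",
          "reverse complement:", reverse_complement(seq))
```

The reverse complement is the sequence a primer would specifically hybridize to on the opposite strand, in the sense defined earlier in the specification.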
Mutation detection. M13 universal primer sequence was incorporated into the 5' terminus of primers for direct sequence analysis using an ABI automated DNA sequencer Model 373 (Chadwick et al., Biotechniques 20: 676-683, 1996). Primers used for exon amplification, genomic sequencing and mutation detection are as in Table 2.
The 2812del(TC) was detected with a labeled PCR fragment on a denaturing Long Ranger polyacrylamide gel (FMC Bioproducts). The 1685del(N15) was detected by electrophoresis of PCR fragments on 4% agarose gels. The 2341C to T was detected by PCR, digestion with EcoNI (New England Biolabs (NEB)) and polyacrylamide-gel electrophoresis. The mutated allele was observed as 209 bp and 35 bp fragments.
The 2254G to T mutation was detected by PCR, digestion with AvaII (NEB) and agarose gel electrophoresis, with two bands (240 bp, 137 bp) in the wild-type allele, and three (240 bp, 120 bp, 17 bp) in the mutant. The 2114G to A mutation was detected by PCR, digestion with TfiI (NEB) and agarose gel electrophoresis. The wild-type allele has one band of 528 bp, and the mutated allele has two bands (396 bp, 132 bp). The 1681C to T mutation destroys a BsmFI restriction site, and primers are set 8a (Table 2), with resulting fragments described in Fig. 6c.
GenBank Accession Numbers. The human WFS1 and mouse wfs1 cDNA sequences are deposited in GenBank: #AF084481 and #AF084482.
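The restriction-digest assays above lend themselves to a simple lookup of expected band patterns per genotype. The sketch below is a hypothetical illustration using only the fragment sizes stated in the text for the AvaII (2254G to T) and TfiI (2114G to A) assays; the names `ASSAYS`, `expected_bands` and `call_genotype` are assumptions, and the sketch assumes every fragment is resolved on the gel, which may not hold for very small fragments such as the 17 bp band.

```python
# Fragment sizes (bp) per allele, as stated in the text.
ASSAYS = {
    "2254G>T / AvaII": {"wild_type": (240, 137), "mutant": (240, 120, 17)},
    "2114G>A / TfiI":  {"wild_type": (528,),     "mutant": (396, 132)},
}

GENOTYPES = (
    ("wild_type", "wild_type"),  # homozygous wild type
    ("wild_type", "mutant"),     # heterozygous carrier
    ("mutant", "mutant"),        # homozygous mutant
)

def expected_bands(assay, genotype):
    """Distinct band sizes expected for a pair of alleles, largest first."""
    bands = set()
    for allele in genotype:
        bands.update(ASSAYS[assay][allele])
    return sorted(bands, reverse=True)

def call_genotype(assay, observed):
    """Return the genotype whose expected bands match the observed ones, else None."""
    observed = sorted(set(observed), reverse=True)
    for genotype in GENOTYPES:
        if expected_bands(assay, genotype) == observed:
            return genotype
    return None

# Fragments from one allele should add up to the undigested amplicon length.
for assay, alleles in ASSAYS.items():
    assert sum(alleles["wild_type"]) == sum(alleles["mutant"])

print(expected_bands("2254G>T / AvaII", ("wild_type", "mutant")))
print(call_genotype("2114G>A / TfiI", [396, 132]))
```

A heterozygous carrier shows the union of both band patterns, which is what allows carriers in an extended family to be distinguished from affected homozygotes.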
RESULTS
Linkage Analysis. Linkage studies were conducted on three Japanese families (WS-1, -2, and -3) and two Caucasian families (WS-4, -5) (Figs. 1A-1E), each with at least two individuals having typical Wolfram syndrome phenotypes. When genotyping with chromosome 4p markers used previously (Polymeropoulos et al., Nature Genetics 8: 95-97, 1994; Collier et al., Am J Hum Genet 59: 855-863, 1996), estimates for recombination fractions (θ) between WFS and the markers confirmed close linkage (lod = 3.99 for D4S431 at θ = 0.05). Multipoint analysis with GENEHUNTER (version 1.1) (Kruglyak et al., Am J Hum Genet 59: 1347-1363, 1996) gave a lod >6.0 for the region encompassed by markers D4S827 and D4S394.
Haplotype Analysis and Mapping by Recombination. Haplotypes were constructed by inspection. The boundaries for the WFS gene had been defined by a telomeric recombinant at D4S432 (Collier et al., 1996), and centromeric recombinants at D4S431 (Polymeropoulos et al., 1994; Collier et al., 1996). Recombinants in families WS-1 to WS-6 were identified by genotyping with genetic markers shown to locate between D4S432 and D4S431 by physical and/or radiation hybrid mapping. In family WS-1, subject III-2, recombination of the telomeric region from D4S827 was observed (Fig. 1A). In the Japanese family WS-2 (Fig. 1B), all affected subjects (III-1, -2, -3, and -4) were haploidentical centromeric to D4S500, suggesting the presence of a historical recombinant in the unrelated father (II-5). The centromeric boundary was confirmed as D4S431 by a recombinant in WS-5, subject III-2 (Fig. 1E). We concluded that the WFS gene likely lies within the interval D4S500-D4S431.
Construction of a genomic contig across the WFS critical region. In the STANFORD CHR 4 YAC MAP project, D4S827 and D4S431 are within two overlapping YACs encompassing the region (204H9 and 420E4). Further, D4S500 and D4S431 were located on the same YAC (420E4). A contig was constructed with genomic human P1/BAC clones, and a contiguous genomic map encompassing the WFS gene was obtained (Figure 2). STS content mapping confirmed overlapping clones. As the interval between D4S500 and D4S431 was covered by a P1 (102C5) and a BAC clone (33H22), the critical region was estimated to be <250 kb.
Identification of candidate genes within the WFS region, and cloning the WFS1 gene. By exon trapping of BACs 33H22 and 460K9, an expressed sequence tag (EST) was found that resulted in the cloning of the γ-isoform of the B regulatory subunit of the human protein phosphatase 2A (PP2ABRγ). The genomic structure of PP2ABRγ was determined, and direct sequence analysis of probands excluded this gene. In parallel, we initiated large-scale genomic sequencing from BACs 460K9 and 33H22 and P1 102C5. A total of ~180 kb of sequence was analyzed. Among a number of EST matches, one was predicted to be the 3'-end exon containing a 1.8 kb open reading frame (exon 8, Fig. 2B). This EST was evaluated further because it was unambiguously within the critical region.
The size of the full-length mRNA and the pattern of tissue expression were determined by PCR amplification of an 854-bp genomic fragment of the region, which was then hybridized to a multiple-tissue Northern blot with polyA+ RNA. A major transcript of ~3.7 kb was expressed in all tissues, including pancreas (Fig. 3A). Northern analysis of total RNA (20 μg) showed that the gene was most abundantly expressed in pancreatic islets compared with exocrine pancreas (Fig. 3C).
A full-length clone was obtained by screening a human infant brain cDNA library. Six clones were isolated, and 5' RACE analysis was subsequently performed. These analyses yielded a composite cDNA sequence of 3.688 kb. The longest open reading frame extended from nt 171 to 2843. The methionine codon at position 171 was chosen as the translation initiation codon primarily because it conforms to Kozak's rule (Kozak, Mamm Genome 7: 563-574, 1996). A consensus polyadenylation site (aataaa) was located at position 3615-20, 19 bases upstream from the polyA tail. The gene was named WFS1.
Predicted characteristics of the WFS1 protein, and cloning of the mouse cDNA. Comparison of the cDNA sequence of WFS1 with those in public databases revealed no related genes. Translation of the cDNA sequence predicts a polypeptide of 890 amino acid residues with a molecular mass of 100.29 kDa. Hydrophobicity analysis of the deduced amino acid sequence is presented in Figure 4.
The protein is distinguished grossly by the presence of 3 structural domains: a hydrophilic N-terminal region of ~300 residues, a hydrophilic C-terminal region of ~240 residues, and a central hydrophobic core of ~350 residues. Inspection of the hydrophobicity curve suggests the presence of ~10 transmembrane segments, if it is assumed that this region of the protein consists of α-helical segments. Comparison of the predicted amino acid sequence with entries in the Prosite database produced a single match to the prenyltransferase α-subunit repeat structure.
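The open reading frame coordinates can be checked against the stated protein length with simple arithmetic. This sketch assumes the nt 171 to 2843 range is inclusive and ends with the last base of the stop codon; that reading is an assumption, though it agrees with the 890-residue polypeptide reported for the gene. The function name is illustrative.

```python
ORF_START = 171  # first nucleotide of the initiation codon (from the text)
ORF_END = 2843   # assumed to be the last nucleotide of the stop codon


def orf_summary(start, end):
    """Return nucleotide, codon, and residue counts for an inclusive ORF range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = length // 3
    return {
        "nucleotides": length,
        "codons": codons,        # includes the stop codon
        "residues": codons - 1,  # stop codon encodes no amino acid
    }
```

Under that assumption, `orf_summary(ORF_START, ORF_END)` gives 2673 nucleotides, 891 codons, and 890 residues.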
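A hydrophobicity curve of the kind described can be computed as a sliding-window average over the amino acid sequence. The sketch below uses the Kyte-Doolittle scale and a 19-residue default window, a common choice when looking for membrane-spanning helices; the text does not state which scale or window was used for Figure 4, so both are assumptions, and the function name is illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Stretches where the windowed mean stays strongly positive are candidate transmembrane segments; counting such stretches in the central core is the kind of inspection the paragraph above describes.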
A mouse wfs1 cDNA was isolated from a mouse insulinoma (MIN6) (Ishihara et al., Diabetologia 36: 1139-1145) cDNA library and completed by RT-PCR. The mouse wfs1 cDNA was 3511 nucleotides long, with 83.9% nucleotide identity to the coding sequence of the human gene and 86.1% amino acid similarity (Figure 5).
Genomic structure of the WFS1 gene and mutations in WFS patients. The genomic structure of WFS1 was determined by comparison of cDNA and genomic sequences obtained by shotgun sequencing of BACs 460K9 and 33H22 and sequences in the Stanford Human Genome Center database (http://www.shgc.stanford.edu). The gene was found to be composed of eight exons (Fig. 2B) spanning 33.4 kb of genomic DNA.
For mutation screening, exons were amplified and sequenced from patients' genomic DNA. A TC deletion at position 2812 in subject WS-1 III-2 predicted a frameshift at codon 882, designated del882fs/ter937 (Table 1), with loss of the normal stop codon at 891 and introduction of a new downstream termination codon. The predicted mutant WFS1 protein contains 937 amino acids, 47 more than the normal protein. All 3 affected sibs (WS-1 III-1, -2, and -4) were homozygous for this mutation, while the unaffected sib and the parents were heterozygous, indicating a disease-specific mutation. The 2812delTC mutation was not found in 80 healthy control Japanese subjects (160 chromosomes, see Table 1).
In other WFS families, six additional mutations were found in exon 8 (Table 1). In family WS-2, affected offspring (III-1, -2, -3, and -4) inherited a 15 bp deletion resulting in del508YVYLL, homozygous by descent from related heterozygous parents. Co-segregation of this deletion with the WFS phenotype is shown in Fig. 6A. A sequence chromatogram from an affected child homozygous for the 15 bp deletion is shown in Fig. 6B. In family WS-3, both affected offspring (II-1 and -2) were homozygous for a 2341 C to T transition resulting in a P724L mutation. In the Caucasian family WS-4, all affected offspring (II-1, -3, and -4) were found to be compound heterozygotes for a 2254 G to T transversion resulting in a G695V (paternal) mutation, and a 2114 G to A transition resulting in a W648X (maternal) mutation. The W648X mutation predicts premature termination and loss of 242 amino acids from the C-terminus.
In each of these 4 families the mutations were shown to co-segregate with the disease phenotype, both by sequencing and by either a size change or alteration of a restriction endonuclease site. None of the mutations were found in Japanese or Caucasian control subjects. No other coding variants were found on sequencing the entire gene in each proband.
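The relationship between a cDNA nucleotide position and its codon number follows from the initiation codon at nt 171, and the class of a base substitution follows from purine/pyrimidine chemistry (purine to purine or pyrimidine to pyrimidine is a transition; any purine/pyrimidine swap is a transversion). The sketch below illustrates both; the function names are hypothetical, and the numbering assumes positions are counted on the composite cDNA with the first base of the initiation codon at 171.

```python
ORF_START = 171  # first nucleotide of the initiation codon (from the text)
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def codon_number(nt_position, orf_start=ORF_START):
    """Map a cDNA nucleotide position to its 1-based codon number."""
    if nt_position < orf_start:
        raise ValueError("position lies upstream of the initiation codon")
    return (nt_position - orf_start) // 3 + 1


def substitution_class(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"
```

For example, `codon_number(2341)` returns 724 and `codon_number(2254)` returns 695, matching P724L and G695V, while `substitution_class("C", "T")` is a transition and `substitution_class("G", "T")` is a transversion.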
In the Australian family WS-5, a 1681C to T transition (P504L) was observed. This mutation destroys a BsmFI restriction site. In Fig. 6C, the father (I-3) is shown to be heterozygous for 1681C to T, and the mother (I-4) is homozygous 1681C. Yet surprisingly, all affected offspring (II-1, -2, and -4) appeared to be homozygous 1681T. The unaffected child (II-3) appeared to be homozygous for 1681C. The most likely explanation for these findings is that the mother's chromosome, inherited by each child (see haplotypes in Fig. 1E), harbored a microscopic deletion of the WFS1 gene, and that the affected offspring were hemizygous for the P504L mutation.
A sixth family (WS-6), with one 10-year-old affected child and two apparently unaffected younger sisters, became available for analysis. The parents were Saudi Arabian first cousins. The affected child was homozygous for a 7-bp repeat insertion at 1610 (CTGAAGG), resulting in a predicted frameshift and premature termination of the protein at codon 544. The parents were heterozygous, while the unaffected sisters were heterozygous and homozygous normal, respectively.
Sequence analysis also revealed a number of silent and intronic variants (polymorphisms) in various families (see Table 1).
Table 1. Mutations and polymorphisms in WFS1
Mutation                      Amino acid        Exon/intron   Family                       Control chromosomes
2812del(TC)                   del882fs/ter937   Exon 8        WS-1                         160 (a)
1685del(CCTGCTCTATGTCTA)      del508YVYLL       Exon 8        WS-2                         160 (a)
2341C to T                    P724L             Exon 8        WS-3                         160 (a)
2254G to T                    G695V             Exon 8        WS-4                         160 (b)
2114G to A                    W648X             Exon 8        WS-4                         160 (b)
1681C to T                    P504L             Exon 8        WS-5                         160 (b)
del WFS1 (c)                                                  WS-5                         ND
1610insCTGAAGG                ins483fs/ter544   Exon 8        WS-6                         160
Changes of uncertain effect
1167G to A                    I333V             Exon 8        WS-4 (same chr. as W648X)    ND
Polymorphisms
854G to C                     D268D             Exon 6        WS-4                         ND
1355T to C                    V395V             Exon 8        WS-4                         ND
1457C to T                    C429C             Exon 8        WS-3                         ND
1537A to G                    R456H             Exon 8        WS-1, -2, -3, -4, normals    ND
1545C to T                    L459L             Exon 8        WS-3                         ND
1570T to C                    L507L             Exon 8        WS-4                         ND
1815C to T                    549               Exon 8        WS-4                         ND
1925C to T                    F585F             Exon 8        WS-4                         ND
2002G to A                    H611R             Exon 8        WS-1, -2, -3, normals        ND
2603A to G                    K811K             Exon 8        WS-4                         ND
2735G to A                    S855S             Exon 8        WS-4                         ND
1032-5C to G                                    Intron 7      WS-1, -2, -3, -4             ND
(a) Japanese; (b) Caucasian; (c) Not confirmed; (d) Palestinian Arabs
Table 2. Primers for amplification, sequencing and mutation detection (SEQ ID NOS: 6-41, consecutively)
DNA fragment     Primers (5' to 3')                                   Product size
Exon 1*          TGTAAAACGACGGCCAGTCTCGTGCAGAAGGCCGCGCT
                 CAGGAAACAGCTATGACCGCCCACAGCCACCGCGCCAC               247 bp
Exon 2           TGTAAAACGACGGCCAGTCTGTCTCCAGCAGACACTAA
                 CAGGAAACAGCTATGACCCACAATGCTGAACTGCAGAG               276 bp
Exon 3           TGTAAAACGACGGCCAGTCTGAAGACCCTCATGCCTTG
                 CAGGAAACAGCTATGACCACACTTCTCTGTGGGCTGTG               276 bp
Exon 4           TGTAAAACGACGGCCAGTTCGGAGAATCTGGAGGCTGA
                 CAGGAAACAGCTATGACCCATTACAAGCTGCTCAACCC               253 bp
Exon 5           TGTAAAACGACGGCCAGTCGAAAGCCTTCCAGGCAGAG
                 CAGGAAACAGCTATGACCCTATGGGAAGGTCCTGGCTC               353 bp
Exon 6           TGTAAAACGACGGCCAGTCTAGGAACAGTGCGCCAGTT
                 CAGGAAACAGCTATGACCATGGAGTCGCACAGGAAGGA               268 bp
Exon 7           TGTAAAACGACGGCCAGTGCCCATGCTGTTTTCTCTCA
                 CAGGAAACAGCTATGACCCCGAGGACACATCCTTATGA               371 bp
Exon 8a          TGTAAAACGACGGCCAGTCCTCGTTCCCACGTACCATC
                 CAGGAAACAGCTATGACCGTAGCAGTAGGTGCCCTTGA               766 bp
Exon 8b          TGTAAAACGACGGCCAGTCCTGGTCGTCCTCAATGTCA
                 CAGGAAACAGCTATGACCCATAGAACCAGCAGAACAGC               503 bp
Exon 8c          TGTAAAACGACGGCCAGTTGGTTCACGTCTCTGGAGCT
                 CAGGAAACAGCTATGACCGAGTTGTAGACCTTCATGCC               240 bp
Exon 8d*         TGTAAAACGACGGCCAGTGGGCATGAAGGTCTACAACT
                 CAGGAAACAGCTATGACCGAACTTCTTGATGTGGCAGG               362 bp
Exon 8e          TGTAAAACGACGGCCAGTCTGGATGCGCTGCCTCTACG
                 CAGGAAACAGCTATGACCTCAGGCCGCCGACAGGAATG               523 bp
Exon 8f          TGTAAAACGACGGCCAGTTCGCCTTCGACTTCTTTTTC
                 CAGGAAACAGCTATGACCCCAAACAAATAAGAAATGCT               499 bp
2812del(TC)      GCCCAGCTCTCGCCCACCAG
                 TCAGGCCGCCGACAGGAATG                                 120 bp
1685del(N)15     CCTGGTCGTCCTCAATGTCA
                 GGTAGGGCACAAGGTAGCAG                                 119 bp
2341C to T       EcoNI-F: GGGCATGAAGGTCTACAACTCCA
                 EcoNI-R: CCGTAGAGGCAGCGCATCCAGTCGCCGAcctAGAAC **     244 bp
2254G to T       GAGGGCATGAAGGTCTACAA
                 CCCACGGTAATCTCAAACTT                                 377 bp
2114G to A       TAGTGTGCCCCTGCTGTTGC
                 CCCACGGTAATCTCAAACTT                                 528 bp
* Because these PCR products could not be sequenced directly, the fragments were subcloned into pGEM-T Easy Vector (Promega) as described by the manufacturer, and several colonies were sequenced for each individual.
** Because there was originally no appropriate restriction enzyme site to distinguish the mutated allele, the EcoNI-R primer was modified (3-base change, tga to cct, shown in lowercase in the primer sequence) so that a new EcoNI site is introduced into the mutated allele (CC(C/T)NNNNNAGG, where (C/T) is nucleotide 2341).
DISCUSSION
Consanguineous families from isolated regions of Japan provided the genetic material that led to the discovery of mutations in WFS1 in WFS patients of diverse genetic backgrounds. We believe that mutant alleles at WFS1 are responsible for the disease for several reasons, beyond the fact that the gene maps to the critical region. In each of the six pedigrees, mutant alleles of WFS1 co-segregated with the disease phenotype. WFS1 was shown to be expressed in brain, pancreatic islets, and a β-cell insulinoma cell line, consistent with the disease phenotype. Seven different mutations were found, as well as a presumed microscopic deletion. The residues altered by the three missense mutations are evolutionarily conserved between mouse and human (Figure 5), further suggesting their biological significance. None of the mutations were observed in normal chromosomes.
The Australian Caucasian family WS-5 was particularly interesting, as each affected child appeared to be homozygous for a P504L mutant allele inherited from the heterozygous father (Fig. 6C). Repeat sampling and analysis confirmed these results. Analysis of the mother's DNA with new markers between D4S500 and D4S431 suggested that the deletion was confined to a region of <170 kb. Recently a patient with another autosomal recessive disorder was observed to be heterozygous for a missense mutation in combination with a partial deletion of a gene (Ries et al., Human Mutation 12: 44-51, 1998).
The expression pattern of WFS1 appeared ubiquitous by Northern analysis of polyA+ RNA (Fig. 3A). Yet interestingly, the most prominent mRNA observed in total RNA was that in pancreatic islets (Fig. 3C). This high level of expression of WFS1 in islets might explain why the earliest manifestation of WFS is insulin-deficient diabetes mellitus (Barrett and Bundey, J Med Genet 34: 838-841, 1997). Further analysis of the cell biology of WFS1 will be accomplished through generation of specific antibodies, monitoring expression in cultured cells, and gene targeting to define genotype/phenotype relationships.
Swift et al. hypothesized that heterozygous carriers of the gene for WFS were 26-fold more likely to require psychiatric hospitalization than non-carriers (Swift et al., Molecular Psychiatry 3: 86-91, 1998). Blackwood et al. (Nature Genetics 12: 427-430, 1996) reported their highest lod scores with markers mapping to the region D4S431-D4S403 in a genome scan of a large family with bipolar affective disorder. These findings suggest that mutations in WFS1 might be implicated in patients with psychiatric diseases.
Mutations in WFS1 appear to result in premature death of pancreatic islet β-cells, leading to juvenile-onset insulin-requiring diabetes mellitus (Karasik et al., Diabetes Care 12: 135-138, 1989). The β-cell damage in autoimmune Type I diabetes likely results from the interaction of the HLA locus as a major susceptibility gene, along with multiple minor gene defects. In contrast, islet β-cell loss in WFS is monogenic in origin. Importantly, the WFS1 gene appears to play a major role in maintaining normal islet β-cell function, as mutations in this gene alone can result in loss of islet β-cells. Genome scans for both Type I and Type II diabetes mellitus have not implicated major genes in the 4p region. Yet since these are complex diseases, mutations in WFS1 might play a minor role in these more common forms of diabetes. In addition, WFS1 may represent a new therapeutic target for treatment and prevention of diabetes mellitus and for neurodegenerative disorders.
The present invention is not limited to the embodiments described and exemplified above, but is capable of variation and modification without departure from the scope of the appended claims.

Claims

What is claimed:
1. A recombinant DNA molecule comprising a vector into which is inserted a heterologous DNA segment from human chromosome 4p, the segment being located between markers D4S500 and D4S431, the segment comprising a gene, mutations of which are associated with Wolfram Syndrome.
2. The recombinant DNA molecule of claim 1, wherein the gene is composed of exons that form an open reading frame having a sequence that encodes a polypeptide about 880 to 900 amino acids in length.
3. The recombinant DNA molecule of claim 2, wherein the open reading frame encodes an amino acid sequence having greater than 60% identity with SEQ ID NO: 3 or SEQ ID NO: 5.
4. The recombinant DNA molecule of claim 4, wherein said open reading frame comprises a sequence having greater than 60% homology with SEQ ID NO: 2 or SEQ ID NO: 4.
5. The recombinant DNA molecule of claim 1, wherein the gene is composed of exons having sequences greater than 60% homologous with the sequences of the corresponding exons in SEQ ID NO: 1.
6. An oligonucleotide between about 10 and 100 nucleotides in length, which specifically hybridizes with a portion of the recombinant DNA molecule of claim 1.
7. An isolated nucleic acid molecule having a sequence that is part or all of a sequence selected from the group consisting of:
a) SEQ ID NO: 1;
b) a variant of SEQ ID NO: 1 that is substantially the same as SEQ ID NO: 1 within the exons of SEQ ID NO: 1;
c) a sequence having at least 60% homology to SEQ ID NO: 1 within the exons of SEQ ID NO: 1;
d) SEQ ID NO: 2;
e) a variant of SEQ ID NO: 2 that is substantially the same as SEQ ID NO: 2;
f) a sequence having at least 60% homology to SEQ ID NO: 2;
g) a sequence encoding a polypeptide substantially the same as SEQ ID NO: 3;
h) a sequence encoding a polypeptide at least 60% homologous to SEQ ID NO: 3;
i) a sequence encoding a polypeptide substantially the same as SEQ ID NO: 3, that additionally comprises one or more of the sequence variants set forth in Table 1;
j) SEQ ID NO: 4;
k) a variant of SEQ ID NO: 4 that is substantially the same as SEQ ID NO: 4;
l) a sequence having at least 60% homology to SEQ ID NO: 4;
m) a sequence encoding a polypeptide substantially the same as SEQ ID NO: 5; and
n) a sequence encoding a polypeptide at least 60% homologous to SEQ ID NO: 5.
8. An oligonucleotide between 10 and 100 bases in length, that specifically hybridizes with a portion of the nucleic acid molecule of claim 7.
9. A polypeptide, which is produced by the expression of the nucleic acid molecule of claim 7.
10. Antibodies immunologically specific for the polypeptide of claim 9.
11. A polypeptide produced by expression of an isolated nucleic acid molecule comprising part or all of an open reading frame of a gene located on human chromosome 4p between markers D4S500 and D4S431, mutations of which are associated with Wolfram Syndrome.
12. The polypeptide of claim 11, which comprises a hydrophilic N-terminal region of about 300 amino acid residues, a hydrophilic C-terminal region of about 240 residues, and a central hydrophobic core of about 350 residues.
13. The polypeptide of claim 12, having an amino acid sequence substantially the same as part or all of SEQ ID NO: 3 or SEQ ID NO: 5.
14. Antibodies immunologically specific for part or all of the polypeptide of claim 11.
15. A method for determining the predisposition of an individual to develop Wolfram Syndrome, which comprises examining a WFS1 gene sequence of the individual for mutations resulting in expression of no gene product or a non-functional gene product, the mutations being indicative of the predisposition of the individual to develop Wolfram Syndrome.
16. The method of claim 15, wherein the mutations comprise sequences selected from the group consisting of 2812del(TC), 1685del(CCTGCTCTATGTCTA), 2341C to T, 2254G to T, 2114G to A, 1681C to T, and 1610ins(CTGAAGG).
17. The method of claim 16, wherein the mutations are detected by PCR amplification using primers selected from the group consisting of SEQ ID NOS: 32, 33, 34, 35, 36, 37, 38, 39, 40 and 41.
PCT/US1999/022429 1998-09-28 1999-09-28 Gene mutated in wolfram syndrome WO2000018787A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU62701/99A AU6270199A (en) 1998-09-28 1999-09-28 Gene mutated in wolfram syndrome

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10203198P 1998-09-28 1998-09-28
US60/102,031 1998-09-28

Publications (1)

Publication Number Publication Date
WO2000018787A1 true WO2000018787A1 (en) 2000-04-06

Family

ID=22287745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/022429 WO2000018787A1 (en) 1998-09-28 1999-09-28 Gene mutated in wolfram syndrome

Country Status (2)

Country Link
AU (1) AU6270199A (en)
WO (1) WO2000018787A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001091548A2 (en) * 2000-06-01 2001-12-06 Pharmacia & Upjohn Company Mice heterozygous for wfs1 gene as mouse models for depression
WO2017162798A1 (en) * 2016-03-23 2017-09-28 INSERM (Institut National de la Santé et de la Recherche Médicale) Targeting the neuronal calcium sensor 1 for treating wolfram syndrome

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5578444A (en) * 1991-06-27 1996-11-26 Genelabs Technologies, Inc. Sequence-directed DNA-binding molecules compositions and methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
COLLIER ET AL.: "Linkage of Wolfram syndrome to chromosome 4p16.1 and evidence for heterogeneity", AMERICAN JOURNAL OF HUMAN GENETICS, vol. 59, no. 4, October 1996 (1996-10-01), pages 855 - 863, XP002926969 *
HARDY ET AL.: "Clinical and molecular genetic analysis of 19 Wolfram syndrome kindreds demonstrating a wide spectrum of mutations in WFS1", AMERICAN JOURNAL OF HUMAN GENETICS, vol. 65, no. 5, November 1999 (1999-11-01), pages 1279 - 1290, XP002926968 *
INOUE ET AL.: "A gene encoding a transmembrane protein is mutated in patients with diabetes mellitus and optic atrophy (Wolfram syndrome)", NATURE GENETICS, vol. 20, no. 2, October 1998 (1998-10-01), pages 143 - 148, XP002926971 *
STROM ET AL.: "Diabetes insipidus, diabetes mellitus, optic atrophy and deafness (DIDMOAD) caused by mutations in a novel gene (wolframin) coding for a predicted transmembrane protein", HUMAN MOLECULAR GENETICS, vol. 7, no. 13, December 1998 (1998-12-01), pages 2021 - 2028, XP002926970 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001091548A2 (en) * 2000-06-01 2001-12-06 Pharmacia & Upjohn Company Mice heterozygous for wfs1 gene as mouse models for depression
WO2001091548A3 (en) * 2000-06-01 2003-07-24 Upjohn Co Mice heterozygous for wfs1 gene as mouse models for depression
US6984771B2 (en) 2000-06-01 2006-01-10 Pharmacia & Upjohn Company Mice heterozygous for WFS1 gene as mouse models for depression
WO2017162798A1 (en) * 2016-03-23 2017-09-28 INSERM (Institut National de la Santé et de la Recherche Médicale) Targeting the neuronal calcium sensor 1 for treating wolfram syndrome
US10639384B2 (en) 2016-03-23 2020-05-05 Inserm (Institut National De La Sante Et De La Recherche Medicale) Targeting the neuronal calcium sensor 1 for treating wolfram syndrome

Also Published As

Publication number Publication date
AU6270199A (en) 2000-04-17

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase