EP1222264A1

EP1222264A1 - Human sit4 associated proteins like (sapl) proteins and encoding genes; uses thereof

Info

Publication number: EP1222264A1
Application number: EP00969687A
Authority: EP
Inventors: John Andrew Todd; Rebecca Christina Joan Twells; John Wilfred Hess; Patricia Hey; Charles Thomas Caskey; Holly Hammond; Michael Lee Metzker
Original assignee: Wellcome Trust Ltd; Merck and Co Inc
Current assignee: Wellcome Trust Ltd; Merck and Co Inc
Priority date: 1999-10-19
Filing date: 2000-10-19
Publication date: 2002-07-17
Also published as: AU7934300A; CA2387041A1; WO2001029213A1; JP2003512055A

Abstract

Nucleic acids, polypeptides, oligonucleotide probes and primers, methods of diagnosis or prognosis, and other methods relating to and based on the cloning and characterisation of a gene which the present inventors have termed SAPL (SIT4-(sporulation-induced transcript 4)-like), found in at least two isoforms termed SAPLa and SAPLb.

Description

HUMAN SIT4 ASSOCIATED PROTEINS LIKE (SAPL) PROTEINS AND ENCODING GENES; USES THEREOF

The present invention relates to nucleic acids, polypeptides, oligonucleotide probes and primers, methods of diagnosis or prognosis, and other methods relating to and based on the cloning and characterisation of a gene which the present inventors have termed SAPL, found in at least two isoforms termed SAPLa and SAPLb.

Mammals and yeast use a similar mechanism that relies upon cyclins and cyclin-dependent kinases (CDKs) to regulate the cell cycle. In yeast the components of this mechanism include the cyclins CLN1 and CLN2 and the cyclin-dependent kinase CDC28 (cell division control) (Dynlacht, 1997. Nature 389, 149-152). The activity of the serine/threonine kinase CDC28, also known as CDK1, is essential for the completion of G₁ START, the controlling event in the yeast cell cycle. CDC28 activity is modulated by the level of the cyclins CLN1 and CLN2. The level of expression of CDC28 remains relatively constant throughout the cell cycle. In contrast, the mRNA expression level of the genes CLN1 and CLN2 increases dramatically during late G₁. This expression of CLN1 and CLN2 is dependent upon the SIT4 Ppase (protein phosphatase) (Fernandez-Sarabia et al. 1992. Genes Dev. 6, 2417-2428). The SIT4 Ppase is a type 2A phosphatase which is encoded by the sit4 (sporulation-induced transcript 4) gene (Sutton et al. 1991. Mol. Cell. Biol. 11, 2133-2148). The SIT4 protein is 55% identical to the catalytic subunit of mammalian type 2A phosphatase and 40% identical to mammalian type 1 phosphatase. A human cDNA clone, protein phosphatase 6, has been obtained that encodes a protein that, when expressed in yeast, has the ability to complement a sit4 mutant (Bastians and Ponstingl, 1996. J. Cell Sci. 109, 2865-2874). Therefore it is likely that protein phosphatase 6 or a related phosphatase is the mammalian ortholog of SIT4.

Genetic analysis in yeast of sit4 mutations demonstrated that the SIT4 Ppase is necessary for progression of the cell cycle from late G₁ to S phase with a temporal point of action, or execution point, at or similar to that of CDC28 (Sutton et al. 1991. Mol. Cell. Biol. 11, 2133-2148). The SIT4 protein was found to be associated with two proteins with apparent molecular weights of 190 and 155 kD. The cloning of the genes encoding the SIT4 Associated Proteins, SAP155 and SAP190, resulted in the identification of two additional related genes encoding the proteins SAP185 and SAP4 (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Alignment of the members of the SAP family revealed a number of conserved residues (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755), some of which are also present in SAPL. The SAP proteins appear to specifically interact with the SIT4 phosphatase (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Deletion of all four SAP genes results in a phenotype that is equivalent to a deletion of sit4, thus their association with SIT4 is essential for its function (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Since overexpression of SAP genes can suppress certain sit4 temperature-sensitive mutants, it is thought that the SAP proteins act as positive modulators of SIT4 Ppase. The mechanism by which the SAP proteins modulate SIT4 activity is unknown. One possibility that has been suggested is that the SAP proteins increase the substrate specificity of SIT4 Ppase, in a fashion analogous to that found for the glycogen-targeting subunit of type 1 phosphatases (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). In this case SIT4 would be a SAP-dependent phosphatase in a manner similar to CDC28 being a cyclin-dependent kinase (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Regardless of the mode of action, the importance of the SAP proteins in the yeast cell cycle in regulating the activity of a critical enzyme, SIT4 Ppase, is well established.

The present inventors now disclose for the first time two isoforms of a novel gene, arising from alternative splicing and encoding highly related proteins, from the IDDM4 locus on human chromosome 11q13. The isoforms have been termed by the inventors "SAPLa" (SAP like) and "SAPLb".

BRIEF DESCRIPTION OF THE FIGURES

Figure 1(a) shows the nucleotide sequence of SAPLa cDNA. Nucleotide numbering herein is by reference to this sequence.

Figure 1 (b) shows the longest open reading frame of the SAPLa cDNA.

Figure 1 (c) shows the amino acid sequence translation of the open reading frame in the SAPLa cDNA producing SAPL isoform a. Amino acid residue numbering herein is by reference to this sequence.

Figure 1 (d) shows the amino acid sequence translation of an alternative open reading frame in the SAPLa cDNA which starts with a sequence that conforms with the Kozak consensus sequence for efficient initiation of translation.

Figure 2 (a) shows the nucleotide sequence of SAPLb cDNA.

Figure 2 (b) shows the longest open reading frame of the SAPLb cDNA.

Figure 2 (c) shows the amino acid sequence translation of the open reading frame in the SAPLb cDNA producing SAPL isoform b.

Figure 3 shows a multiple sequence alignment of the amino acid sequence of yeast SAP190, yeast SAP185 and human SAPL isoform a. The consensus sequence of the alignment is shown, capital letters indicate identity at that position. The amino acid residues that are underlined are conserved within the yeast SAP family as well as SAPL.

Figure 4 shows the sequence of DNA found immediately adjacent to SAPL exon 1 in the genome and identified as a putative promoter. Sequences that match the consensus binding sites for the Sp1 and NF kappa B transcription factors are shown in capital letters. Sequences that are conserved in the syntenic region of mouse genomic DNA sequence are underlined.

Figure 5 shows a multipoint linkage curve of the IDDM4 region.

Figure 6 shows the LOD score of the Tsp value obtained in analysis described below. The x-axis is not to scale.

Characteristics of SAPL cDNA and SAPL protein

Two full length cDNA sequences, that arise from alternative splicing, were isolated from the IDDM4 locus on chromosome 11q13 and termed SAPLa (SAP like) and SAPLb. The SAPL gene is also known to the inventors as DM4E. The longest cDNA of 4793 nucleotides (Figure 1(a)) contains an open reading frame (Figure 1(b)) that encodes a protein, SAPLa, of 793 amino acids (Figure 1(c)). The putative initiator methionine codon at nucleotide 278, AGCATGT, conforms to the Kozak consensus sequence for efficient initiation of translation at the -3 position (purine, preferably A) but not at the +4 position (Kozak, (1996) Mamm. Genome 7, 563-574). The predicted molecular weight of this protein is 89 kDa with an isoelectric point of 4.31. The protein does not contain any stretches of hydrophobic amino acids that would have a high probability of serving as a transmembrane spanning domain, nor does the protein contain a signal peptide for protein export. Therefore SAPLa is most likely localized to the cytoplasmic portion of the cell. A nonoptimal initiation can lead to multiple start sites (Kozak, (1996) Mamm. Genome 7, 563-574). The first ATG codon that conforms to the Kozak consensus sequence is at nucleotide 482, GACATGG, resulting in a protein of 725 amino acids (Figure 1(d)). The SAPLa cDNA contains consensus signals for polyadenylation at nucleotides 3592-3597 and 4115-4120.
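The start-codon reasoning above can be sketched in a few lines of Python. This is only an illustrative check, assuming the commonly cited Kozak rule of a purine at -3 and a G at +4; the function names are hypothetical, and the two 7-nucleotide windows and the positions 278 and 482 are those quoted in the text.

```python
def kozak_check(context):
    """Evaluate a 7-nt window laid out as NNN-ATG-N.

    Returns (minus3_ok, plus4_ok): a purine (A or G) at -3 and a G at +4.
    The G-at-+4 rule is the commonly cited consensus and is an assumption here.
    """
    context = context.upper()
    if len(context) != 7 or context[3:6] != "ATG":
        raise ValueError("expected a window of the form NNNATGN")
    minus3_ok = context[0] in "AG"
    plus4_ok = context[6] == "G"
    return minus3_ok, plus4_ok


def length_from_downstream_start(full_length_aa, first_atg_nt, later_atg_nt):
    """Protein length when translation starts at a later in-frame ATG."""
    offset = later_atg_nt - first_atg_nt
    if offset < 0 or offset % 3 != 0:
        raise ValueError("the later ATG must be downstream and in frame")
    return full_length_aa - offset // 3


# Windows quoted in the description for nucleotides 278 and 482
print(kozak_check("AGCATGT"))   # -3 matches, +4 does not
print(kozak_check("GACATGG"))   # both positions match
print(length_from_downstream_start(793, 278, 482))
```

The second function reproduces the arithmetic implied by the text: the ATG at nucleotide 482 lies 204 nucleotides (68 codons) downstream of the one at nucleotide 278, so the 793 residue protein shortens to 725 residues.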

The second cDNA (Figure 2(a)), SAPLb, of 3228 nucleotides contains an open reading frame (Figure 2(b)) that encodes a protein (Figure 2(c)), SAPLb, of 791 amino acids. The two proteins, SAPLa and SAPLb, are 100% identical for the first 776 amino acids. SAPLb has a predicted molecular weight of 89 kDa with an isoelectric point of 4.30. Like SAPLa, it is predicted to be expressed in the cytoplasm. Both SAPLa and SAPLb contain a tandem repeat of the amino acid sequence Ser-Thr-Asp-Ser-Glu-Glu (STDSEE) from amino acids 562-573.

Comparison of SAPL with the protein database using the Smith-Waterman algorithm reveals a significant degree of similarity, p = 6.01e-13 and p = 2.02e-12, to two members of a SAP family of yeast proteins, SAP190 and SAP185 respectively. A lesser degree of similarity, p = 2.05e-2, is found to a third member of this family, SAP155. The amino acid sequence identity between amino acids 94-724 of SAPL and SAP190 is 19%. Over a similar region SAPL is 18% identical to SAP185. Using the algorithm tFASTA (which translates all the nucleotide sequences in the database in the 6 possible frames and compares them with the amino acid sequence of the input protein using the FASTA algorithm) to search for additional mammalian genes with similarity to SAPL resulted in the identification of EST sequences but no full length cDNA sequences. Therefore the full length SAPL cDNA identified in this application provides for the first time the determination of the amino acid sequence of a mammalian homolog of the yeast SAP family.
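The six-frame translation step that tFASTA performs before comparison can be illustrated as follows. This is a minimal sketch, not the tFASTA implementation: it assumes the standard genetic code, uppercase DNA input, and hypothetical function names, and it leaves out the FASTA comparison itself.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Translate a DNA sequence in the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


# Toy input (not a SAPL sequence)
for frame, peptide in sorted(six_frame_translations("ATGGCCTAA").items()):
    print(frame, peptide)
```

Each of the six translated peptides would then be compared with the query protein; a database search simply repeats this for every nucleotide entry.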

A multiple sequence alignment of SAPL, SAP190 and SAP185 using the GCG program pileup with the GapWeight set at 10 and the GapLengthWeight set at 1 yields the alignment shown in Figure 3. This alignment reveals several conserved motifs, a number of which are conserved within four members of the SAP family (SAP4, SAP155, SAP185 and SAP190). The most strikingly conserved motifs are located in SAPL at residues 333-338, WNNFLH, and from 403-414, R(x)GYMGHLT(xx)A. There are also a number of other conserved regions of note from 102 to 108, LL(x)(K/R)L(aromatic)S, and from 163 to 168, MD(hydrophobic)LL(K/R). Although the SAPL STDSEE repeats are not conserved there are a number of conserved acidic residues in this portion of the protein, i.e. residues 539 to 591. Several of these conserved motifs are found in all members of the SAP family, not only SAP190 and SAP185 which are the most similar to SAPL (Figure 3). These include portions of the previously noted motifs: the motif from 333-338, WNNF(hydrophobic)H, and the motif from 403-414, GYMG. A number of other residues which are identical between human SAPL and the members of the SAP family are indicated in Figure 3. The SAP proteins are not that similar to each other, e.g. SAP185 exhibits only 14% and 42% identity to SAP155 and SAP190, respectively. The finding that the protein contains motifs that are conserved within this family provides a strong indication that it is related to the yeast SAP family.
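The motif notation used above maps naturally onto regular expressions, which gives one way to scan a candidate protein for the same motifs. The sketch below is illustrative only: the residue sets chosen for "hydrophobic" and "aromatic" are assumptions, the function name is hypothetical, and the test sequence is an invented toy string rather than the SAPL sequence of Figure 1(c).

```python
import re

# Assumed residue classes; the description does not define them explicitly
HYDROPHOBIC = "AVILMFWC"
AROMATIC = "FWY"

# Motifs as written in the description, with (x) read as "any residue"
MOTIFS = {
    "LL(x)(K/R)L(aromatic)S": "LL.[KR]L[%s]S" % AROMATIC,
    "MD(hydrophobic)LL(K/R)": "MD[%s]LL[KR]" % HYDROPHOBIC,
    "WNNFLH": "WNNFLH",
    "R(x)GYMGHLT(xx)A": "R.GYMGHLT..A",
}


def find_motifs(protein):
    """Return (motif name, first residue, last residue), numbered from 1."""
    protein = protein.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, protein):
            hits.append((name, match.start() + 1, match.end()))
    return hits


# Invented toy sequence containing two of the motifs
toy = "AAAWNNFLHGGGRQGYMGHLTSTAKK"
for hit in find_motifs(toy):
    print(hit)
```

Applied to the actual SAPLa sequence, a scan of this kind would be expected to report the motifs at the residue ranges stated in the text, which offers a quick consistency check on the numbering.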

A number of potential protein phosphorylation sites are found in the SAPL protein (Table 4). These include sites for the cAMP dependent protein kinase, protein kinase C, and casein kinase 2. Protein phosphorylation is a reversible modification of proteins and an important mechanism for modulating protein function. Furthermore in yeast the deletion of the sit4 gene results in hyperphosphorylation of the SAP proteins (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755), therefore there is direct evidence that this family of proteins is subject to protein phosphorylation. Thus it is likely that SAPL is phosphorylated at least at some of the sites listed in Table 4. Furthermore, it is likely that SAPL function is modulated by protein phosphorylation. Protein phosphorylation of SAPL may be used in assays for compounds that modulate the level of SAPL phosphorylation. These compounds may inhibit either kinases or phosphatases that act on SAPL. Compounds isolated in such a fashion may have therapeutic utility in modifying the function of SAPL.
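Potential phosphorylation sites of the kind referred to above are often predicted by matching short consensus patterns. The following sketch assumes the widely used PROSITE-style consensus patterns for the three kinases named in the text; it is not known from this description how Table 4 was actually derived, so the patterns, the function name and the output format are all assumptions. The example input is the tandem STDSEE repeat mentioned earlier.

```python
import re

# Assumed PROSITE-style consensus patterns: (regex, offset of the S/T acceptor)
SITE_PATTERNS = {
    "cAMP-dependent protein kinase": ("[RK]{2}.[ST]", 3),  # [RK](2)-x-[ST]
    "protein kinase C": ("[ST].[RK]", 0),                  # [ST]-x-[RK]
    "casein kinase 2": ("[ST].{2}[DE]", 0),                # [ST]-x(2)-[DE]
}


def predict_sites(protein):
    """Return (kinase, 1-based position of the S/T residue, matched motif)."""
    protein = protein.upper()
    sites = []
    for kinase, (pattern, offset) in SITE_PATTERNS.items():
        # A lookahead lets overlapping candidate sites be reported
        for match in re.finditer("(?=(%s))" % pattern, protein):
            sites.append((kinase, match.start() + 1 + offset, match.group(1)))
    return sites


# Two copies of the STDSEE repeat described for SAPL residues 562-573
for site in predict_sites("STDSEESTDSEE"):
    print(site)
```

Such pattern matches are only candidates: short consensus motifs occur frequently by chance, which is consistent with the cautious wording above that SAPL is likely phosphorylated at some, not all, of the listed sites.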

The cloning of the SAPL cDNA permits overexpression of the SAPL protein, and various isoforms, and testing of ability to complement SAP mutants in yeast. Similarly, expression of the human SAPL in yeast allows for the testing of a physical association between SAPL and SIT4. The cloning of the SAPL cDNA also allows the testing of the ability of SAPL to interact and modulate the activity of human protein phosphatase 6 and related phosphatases. Usefulness of SAPL in screening for molecules of pharmaceutical potential is discussed further below.

Since the activity of phosphatases, such as a SIT4 ortholog, may be necessary for progression of the cell cycle, compounds that inhibit the activity of the phosphatase may be useful in the treatment of cancer and other proliferative disorders. The SAPL protein may act in a manner analogous to the SAP proteins in yeast either to activate the phosphatase or modify its specificity. This too indicates usefulness of SAPL in assays for compounds that modulate the activity of the phosphatase, e.g. inhibit it.

There is evidence that the cyclin/CDK system is used to monitor environmental factors that influence not only cell division but apoptosis in terminally differentiated cells (Gao and Zelenka, (1997). BioEssays 19, 307-315). Since certain cyclins are expressed in T-cells this mechanism may be important in mediating T-cell apoptosis. Apoptosis of selected T-cell populations is a critical element in the control of the immune system and the prevention of autoimmunity. Therefore the location of SAPL within the IDDM4 locus and its proposed biological function of modulating either the activity or specificity of a phosphatase may indicate that this protein is important in maintaining immune self-tolerance. Compounds that modify the activity of SAPL may be tested in assays of T-cell proliferation or apoptosis. Compounds able to modify SAPL activity may be identified by ability to stimulate or inhibit SAPL complementation in mutant yeast deleted for all four yeast SAP genes (Luke et al. (1996) Mol. Cell. Biol. 16: 2744-2755).

The presence of polymorphisms within the SAPL gene and the location of this gene within the IDDM4 locus allow for use of certain of the polymorphisms as diagnostic markers. These polymorphisms may be used to assay for the presence of a chromosomal region that confers susceptibility to type 1 diabetes. This susceptibility may be due to functional polymorphisms within the SAPL gene itself or may be due to a functional polymorphism within a neighboring gene that is in linkage disequilibrium with a SAPL polymorphism.

According to one aspect of the present invention there is provided a nucleic acid molecule encoding a SAPL polypeptide, which may be any of the SAPLa polypeptide isoforms of which the amino acid sequences are shown in Figure 1 (c) and Figure 1 (d) and the SAPLb polypeptide isoform of which the amino acid sequence is shown in Figure 2 (c) .

Thus, individual aspects of the present invention provide nucleic acid encoding a polypeptide including the amino acid sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c). Furthermore, an additional aspect of the present invention provides nucleic acid encoding a polypeptide which includes the first 776 amino acids shown in Figure 1(c) and Figure 2(c) which are identical for the respective SAPL isoforms a and b.

A coding sequence of the present invention may be that shown in Figure 1(a), Figure 1(b), Figure 2(a) or Figure 2(b), or it may be a mutant, variant, derivative or allele of one of the sequences shown. The sequence may differ from that shown in a said figure by a change which is one or more of addition, insertion, deletion and substitution of one or more nucleotides of the sequence shown. Changes to a nucleotide sequence may result in an amino acid change at the protein level, or not, as determined by the genetic code.

Thus, nucleic acid according to the present invention may include a sequence different from the sequence shown in a figure herein yet encode a polypeptide with the same amino acid sequence.

On the other hand the encoded polypeptide may comprise an amino acid sequence which differs by one or more amino acid residues from the amino acid sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c). Nucleic acid encoding a polypeptide which is an amino acid sequence mutant, variant, derivative or allele of the sequence shown in one of these figures is further provided by the present invention. Such polypeptides are discussed below. Nucleic acid encoding such a polypeptide may show at the nucleotide sequence and/or encoded amino acid level greater than about 60% homology with the coding sequence shown in the relevant figure and/or the amino acid sequence shown in the relevant figure, greater than about 70% homology, greater than about 80% homology, greater than about 90% homology or greater than about 95% homology. For amino acid "homology", this may be understood to be similarity (according to the established principles of amino acid similarity, e.g. as determined using the algorithm GAP (Genetics Computer Group, Madison, WI)) or identity. GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally, the default parameters are used, with a gap creation penalty = 12 and gap extension penalty = 4. Use of GAP may be preferred but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448), or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol. Biol. 147: 195-197), generally employing default parameters.
Use of either of the terms "homology" and "homologous" herein does not imply any necessary evolutionary relationship between compared sequences, in keeping for example with standard use of terms such as "homologous recombination" which merely requires that two nucleotide sequences are sufficiently similar to recombine under the appropriate conditions. Further discussion of polypeptides according to the present invention, which may be encoded by nucleic acid according to the present invention, is found below. The present invention extends to nucleic acid that hybridizes with any one or more of the specific sequences disclosed herein under stringent conditions. For detection of sequences that are about 80-90% identical, suitable conditions include hybridization overnight at 42°C in 0.25M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 55°C in 0.1X SSC, 0.1% SDS. For detection of sequences that are greater than about 90% identical, suitable conditions include hybridization overnight at 65°C in 0.25M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 60°C in 0.1X SSC, 0.1% SDS.
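The percentage thresholds used above, both for homology and for choosing hybridisation stringency, can be made concrete with a short sketch. It assumes that the two sequences have already been aligned (for instance by GAP) and that identity is counted over alignment columns in which neither sequence has a gap; that counting convention and the function names are assumptions, since the description does not fix them. The temperatures are those given in the text.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must be the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def suggested_conditions(identity):
    """Map percent identity to the hybridisation and wash temperatures quoted above."""
    if identity > 90:
        return {"hybridisation_C": 65, "final_wash_C": 60}
    if 80 <= identity <= 90:
        return {"hybridisation_C": 42, "final_wash_C": 55}
    return None  # the description gives no conditions below about 80%


# Toy alignment (not SAPL sequence): 9 of 10 columns match
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, suggested_conditions(identity))
```

The buffer composition (0.25M Na₂HPO₄, pH 7.2, 6.5% SDS, 10% dextran sulfate; wash in 0.1X SSC, 0.1% SDS) is the same for both tiers, so only the two temperatures are returned.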

The coding sequence may be included within a nucleic acid molecule which has the sequence shown in Figure 1(a) or Figure 2(a) and encode the full polypeptide of Figure 1(c) or Figure 2(c). Mutants, variants, derivatives and alleles of these sequences are included within the scope of the present invention in terms analogous to those set out in the preceding paragraph and in the following disclosure. The same applies for the second isoform of SAPLa, of which the amino acid sequence is shown in Figure 1(d).

Alterations in a sequence according to the present invention which are associated with IDDM or other disease may be preferred in accordance with embodiments of the present invention. Implications for screening, e.g. for diagnostic or prognostic purposes, are discussed below. Particular nucleotide sequence alleles according to the present invention have sequences with a variation indicated in Table 2. One or more of these may be associated with susceptibility to IDDM or other disease.

Generally, nucleic acid according to the present invention is provided as an isolate, in isolated and/or purified form, or free or substantially free of material with which it is naturally associated, such as free or substantially free of nucleic acid flanking the gene in the human genome, except possibly one or more regulatory sequence (s) for expression. Nucleic acid may be wholly or partially synthetic and may include genomic DNA, cDNA or RNA. The coding sequence shown herein is a DNA sequence. Where nucleic acid according to the invention includes RNA, reference to the sequence shown should be construed as encompassing reference to the RNA equivalent, with U substituted for T.

Nucleic acid may be provided as part of a replicable vector, and also provided by the present invention are a vector including nucleic acid as set out above, particularly any expression vector from which the encoded polypeptide can be expressed under appropriate conditions, and a host cell containing any such vector or nucleic acid. An expression vector in this context is a nucleic acid molecule including nucleic acid encoding a polypeptide of interest and appropriate regulatory sequences for expression of the polypeptide, in an in vitro expression system, e.g. reticulocyte lysate, or in vivo, e.g. in eukaryotic cells such as COS or CHO cells or in prokaryotic cells such as E. coli. This is discussed further below.

The nucleic acid sequence provided in accordance with the present invention is useful for identifying nucleic acid of interest (and which may be according to the present invention) in a test sample. The present invention provides a method of obtaining nucleic acid of interest, the method including hybridisation of a probe having a sequence shown herein, or a complementary sequence, to target nucleic acid. Hybridisation is generally followed by identification of successful hybridisation and isolation of nucleic acid which has hybridised to the probe, which may involve one or more steps of PCR. It will not usually be necessary to use a probe with the complete sequence shown in any of these figures. Shorter fragments, particularly fragments with a sequence encoding the conserved motifs may be used.

Nucleic acid according to the present invention is obtainable using one or more oligonucleotide probes or primers designed to hybridise with one or more fragments of the nucleic acid sequence shown in any of the figures, particularly fragments of relatively rare sequence, based on codon usage or statistical analysis. A primer designed to hybridise with a fragment of the nucleic acid sequence shown in any of the figures may be used in conjunction with one or more oligonucleotides designed to hybridise to a sequence in a cloning vector within which target nucleic acid has been cloned, or in so-called "RACE" (rapid amplification of cDNA ends) in which cDNAs in a library are ligated to an oligonucleotide linker and PCR is performed using a primer which hybridises with a sequence shown and a primer which hybridises to the oligonucleotide linker.

Such oligonucleotide probes or primers, as well as the full-length sequence (and mutants, alleles, variants and derivatives) are also useful in screening a test sample containing nucleic acid for the presence of alleles, mutants and variants, with diagnostic and/or prognostic implications as discussed in more detail below.

Nucleic acid isolated and/or purified from one or more cells (e.g. human) or a nucleic acid library derived from nucleic acid isolated and/or purified from cells (e.g. a cDNA library derived from mRNA isolated from the cells), may be probed under conditions for selective hybridisation and/or subjected to a specific nucleic acid amplification reaction such as the polymerase chain reaction (PCR) (reviewed for instance in "PCR protocols; A Guide to Methods and Applications", Eds. Innis et al, 1990, Academic Press, New York, Mullis et al, Cold Spring Harbor Symp. Quant. Biol., 51:263, (1987), Ehrlich (ed), PCR technology, Stockton Press, NY, 1989, and Ehrlich et al, Science, 252:1643-1650, (1991)). PCR comprises steps of denaturation of template nucleic acid (if double-stranded), annealing of primer to target, and polymerisation. The nucleic acid probed or used as template in the amplification reaction may be genomic DNA, cDNA or RNA. Other specific nucleic acid amplification techniques include strand displacement activation, the Qβ replicase system, the repair chain reaction, the ligase chain reaction and ligation activated transcription. For convenience, and because it is generally preferred, the term PCR is used herein in contexts where other nucleic acid amplification techniques may be applied by those skilled in the art. Unless the context requires otherwise, reference to PCR should be taken to cover use of any suitable nucleic acid amplification reaction available in the art.

In the context of cloning, it may be necessary for one or more gene fragments to be ligated to generate a full-length coding sequence. Also, where a full-length encoding nucleic acid molecule has not been obtained, a smaller molecule representing part of the full molecule, may be used to obtain full-length clones. Inserts may be prepared from partial cDNA clones and used to screen cDNA libraries. The full-length clones isolated may be subcloned into expression vectors and activity assayed by transfection into suitable host cells, e.g. with a reporter plasmid.

A method may include hybridisation of one or more (e.g. two) probes or primers to target nucleic acid. Where the nucleic acid is double-stranded DNA, hybridisation will generally be preceded by denaturation to produce single-stranded DNA. The hybridisation may be as part of a PCR procedure, or as part of a probing procedure not involving PCR. An example procedure would be a combination of PCR and low stringency hybridisation. A screening procedure, chosen from the many available to those skilled in the art, is used to identify successful hybridisation events and isolate hybridised nucleic acid.

Binding of a probe to target nucleic acid (e.g. DNA) may be measured using any of a variety of techniques at the disposal of those skilled in the art. For instance, probes may be radioactively, fluorescently or enzymatically labelled. Other methods not employing labelling of probe include examination of restriction fragment length polymorphisms, amplification using PCR, RNase cleavage and allele specific oligonucleotide probing. Probing may employ the standard Southern blotting technique. For instance DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel, before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the DNA fragments on the filter and binding determined. DNA for probing may be prepared from RNA preparations from cells.

Preliminary experiments may be performed by hybridising under low stringency conditions various probes to Southern blots of DNA digested with restriction enzymes. Suitable conditions would be achieved when a large number of hybridising fragments were obtained while the background hybridisation was low. Using these conditions nucleic acid libraries, e.g. cDNA libraries representative of expressed sequences, may be searched. Those skilled in the art are well able to employ suitable conditions of the desired stringency for selective hybridisation, taking into account factors such as oligonucleotide length and base composition, temperature and so on. On the basis of amino acid sequence information, oligonucleotide probes or primers may be designed, taking into account the degeneracy of the genetic code, and, where appropriate, codon usage of the organism from which the candidate nucleic acid is derived. An oligonucleotide for use in nucleic acid amplification may have about 10 or fewer codons (e.g. 6, 7 or 8), i.e. be about 30 or fewer nucleotides in length (e.g. 18, 21 or 24). Generally specific primers are upwards of 14 nucleotides in length, but need not be more than 18-20. Those skilled in the art are well versed in the design of primers for use in processes such as PCR. Various techniques for synthesizing oligonucleotide primers are well known in the art, including phosphotriester and phosphodiester synthesis methods.
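The effect of genetic code degeneracy on primer design can be quantified by counting how many distinct nucleotide sequences could encode a given peptide. The sketch below assumes the standard genetic code and uses a hypothetical helper name; the example peptide is the WNNFLH motif mentioned earlier in the description, chosen only because it is six codons (18 nucleotides) long, in line with the primer lengths discussed above.

```python
from collections import Counter

# Standard genetic code as a 64-letter string (codon order T, C, A, G);
# counting letters gives the number of codons per amino acid.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS_PER_AA = Counter(AMINO)


def primer_degeneracy(peptide):
    """Return (primer length in nt, number of possible coding sequences)."""
    degeneracy = 1
    for residue in peptide.upper():
        if residue == "*" or residue not in CODONS_PER_AA:
            raise ValueError("unexpected residue: %r" % residue)
        degeneracy *= CODONS_PER_AA[residue]
    return 3 * len(peptide), degeneracy


length, degeneracy = primer_degeneracy("WNNFLH")
print(length, degeneracy)
```

Peptides rich in Trp or Met (one codon each) give the least degenerate primers, whereas Leu, Ser and Arg (six codons each) inflate the pool; taking codon usage of the source organism into account, as suggested above, is one way to reduce the pool further.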

Preferred amino acid sequences suitable for use in the design of probes or PCR primers may include conserved sequences (completely, substantially or partly conserved), such as the motifs highlighted in Figure 3.

A further aspect of the present invention provides an oligonucleotide or polynucleotide fragment of the nucleotide sequence shown in any of the figures herein providing nucleic acid according to the present invention, or a complementary sequence, in particular for use in a method of obtaining and/or screening nucleic acid. Some preferred oligonucleotides have a sequence shown in Table 1, or a sequence which differs from any of the sequences shown by addition, substitution, insertion or deletion of one or more nucleotides, but preferably without abolition of ability to hybridise selectively with nucleic acid in accordance with the present invention, that is wherein the degree of similarity of the oligonucleotide or polynucleotide with one of the sequences given is sufficiently high. In some preferred embodiments, oligonucleotides according to the present invention that are fragments of any of the sequences shown, or any allele associated with IDDM or other disease susceptibility, are at least about 10 nucleotides in length, more preferably at least about 15 nucleotides in length, more preferably at least about 20 nucleotides in length. Such fragments themselves individually represent aspects of the present invention. Fragments and other oligonucleotides may be used as primers or probes as discussed but may also be generated (e.g. by PCR) in methods concerned with determining the presence in a test sample of a sequence indicative of IDDM or other disease susceptibility.

Methods involving use of nucleic acid in diagnostic and/or prognostic contexts, for instance in determining susceptibility to IDDM or other disease, and other methods concerned with determining the presence of sequences indicative of IDDM or other disease susceptibility are discussed below.

Further embodiments of oligonucleotides according to the present invention are anti-sense oligonucleotide sequences based on the nucleic acid sequences described herein. Anti-sense oligonucleotides may be designed to hybridise to the complementary sequence of nucleic acid, pre-mRNA or mature mRNA, interfering with the production of polypeptide encoded by a given DNA sequence (e.g. either native polypeptide or a mutant form thereof), so that its expression is reduced or prevented altogether. Anti-sense techniques may be used to target a coding sequence or a control sequence of a gene, e.g. in the 5' flanking sequence, whereby the antisense oligonucleotides can interfere with control sequences. Anti-sense oligonucleotides may be DNA or RNA and may be of around 14-23 nucleotides, particularly around 15-18 nucleotides, in length. The construction of antisense sequences and their use is described in Peyman and Ulman, Chemical Reviews, 90:543-584, (1990), and Crooke, Ann. Rev. Pharmacol. Toxicol., 32:329-376, (1992).
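Candidate anti-sense oligonucleotides of the lengths mentioned above can be enumerated by taking the reverse complement of successive windows of the target sequence. This is a minimal sketch assuming a target written in the DNA alphabet; the function names are hypothetical, the example target is an invented toy sequence, and no selection criteria (accessibility, specificity, base composition) are applied.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def antisense_oligos(target, length=15):
    """List every anti-sense DNA oligo of the given length against the target.

    The 14-23 nt bound follows the range given in the description.
    """
    if not 14 <= length <= 23:
        raise ValueError("length outside the 14-23 nucleotide range")
    target = target.upper()
    return [reverse_complement(target[i:i + length])
            for i in range(len(target) - length + 1)]


# Invented 17-nt toy target (not a SAPL sequence)
oligos = antisense_oligos("ATGGCCATTGTAATGGG", length=15)
print(len(oligos), oligos[0])
```

For an RNA oligonucleotide, T would be replaced by U in the output; each oligo in the list is by construction complementary to one window of the target.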

Nucleic acid according to the present invention may be used in methods of gene therapy, for instance in treatment of individuals with the aim of preventing or curing (wholly or partially) IDDM or other disease. This may ease one or more symptoms of the disease. This is discussed below.

Nucleic acid according to the present invention, such as a full-length coding sequence or oligonucleotide probe or primer, may be provided as part of a kit, e.g. in a suitable container such as a vial in which the contents are protected from the external environment. The kit may include instructions for use of the nucleic acid, e.g. in PCR and/or a method for determining the presence of nucleic acid of interest in a test sample. A kit wherein the nucleic acid is intended for use in PCR may include one or more other reagents required for the reaction, such as polymerase, nucleosides, buffer solution etc. The nucleic acid may be labelled. A kit for use in determining the presence or absence of nucleic acid of interest may include one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile) .

According to a further aspect, the present invention provides a nucleic acid molecule including a SAPL gene promoter.

The promoter may comprise or consist essentially of a sequence of nucleotides 5' to the SAPL gene in the human chromosome, or an equivalent sequence in another species, such as the mouse.

Any of the sequences disclosed in the figures herein may be used to construct a probe for use in identification and isolation of a promoter from a genomic library containing a genomic SAPL gene. Techniques and conditions for such probing are well known in the art and are discussed elsewhere herein. To find minimal elements or motifs responsible for tissue and/or developmental regulation, restriction enzyme or nucleases may be used to digest a nucleic acid molecule, followed by an appropriate assay (for example using a reporter gene such as luciferase) to determine the sequence required. A preferred embodiment of the present invention provides a nucleic acid isolate with the minimal nucleotide sequence required for SAPL promoter activity.

Figure 4 shows a sequence for the putative SAPL promoter. Underlined sequences exhibit similarity to the syntenic mouse DNA sequence. Sequence in bold is found in the SAPL cDNA sequence of Figure 1(a). Capital letters indicate bases that match the pattern for Sp1 transcription factor binding sites. GGGGGTCC matches an NF kappa B transcription factor binding site. GCCAAT matches the CAAT site. The sequence was identified as a putative promoter by the computer algorithm PROMOTERSCAN (Prestridge (1995) J. Mol. Biol. 249: 923-932) and corresponds to a CpG island.
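By way of illustration only, locating the two literal motifs named above in a candidate promoter sequence may be sketched as follows. This is a plain string search on one strand and is not the PROMOTERSCAN algorithm; the function name, the motif labels and the test sequence are assumptions introduced for the example, and the test sequence is not the sequence of Figure 4.

```python
# Illustrative sketch only: exact-match search for the two motifs named
# in the text. Labels and the example sequence are assumptions.
MOTIFS = {
    "NF-kappa-B": "GGGGGTCC",
    "CAAT box": "GCCAAT",
}


def find_motifs(seq: str, motifs=MOTIFS):
    """Return a list of (label, 0-based position) sorted by position.

    Only the strand as given is searched; overlapping hits are reported.
    """
    seq = seq.upper()
    hits = []
    for label, motif in motifs.items():
        pos = seq.find(motif)
        while pos != -1:
            hits.append((label, pos))
            pos = seq.find(motif, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical sequence containing one copy of each motif.
example_hits = find_motifs("ttGCCAATccGGGGGTCCaa")
```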

As noted, the promoter may comprise one or more sequence motifs or elements conferring developmental and/or tissue-specific regulatory control of expression. Other regulatory sequences may be included, for instance as identified by mutation or digest assay in an appropriate expression system or by sequence comparison with available information, e.g. using a computer to search on-line databases.

By "promoter" is meant a sequence of nucleotides from which transcription may be initiated of DNA operably linked downstream (i.e. in the 3' direction on the sense strand of double-stranded DNA).

"Operably linked" means joined as part of the same nucleic acid molecule, suitably positioned and oriented for transcription to be initiated from the promoter. DNA operably linked to a promoter is "under transcriptional initiation regulation" of the promoter.

The present invention extends to a promoter which has a nucleotide sequence which is an allele, mutant, variant or derivative, by way of nucleotide addition, insertion, substitution or deletion, of a promoter sequence as provided herein. Preferred levels of sequence homology with a provided sequence may be analogous to those set out above for encoding nucleic acid and polypeptides according to the present invention. Systematic or random mutagenesis of nucleic acid to make an alteration to the nucleotide sequence may be performed using any technique known to those skilled in the art. One or more alterations to a promoter sequence according to the present invention may increase or decrease promoter activity, or increase or decrease the magnitude of the effect of a substance able to modulate the promoter activity.

"Promoter activity" is used to refer to ability to initiate transcription. The level of promoter activity is quantifiable for instance by assessment of the amount of mRNA produced by transcription from the promoter or by assessment of the amount of protein product produced by translation of mRNA produced by transcription from the promoter. The amount of a specific mRNA present in an expression system may be determined for example using specific oligonucleotides which are able to hybridise with the mRNA and which are labelled or may be used in a specific amplification reaction such as the polymerase chain reaction. Use of a reporter gene facilitates determination of promoter activity by reference to protein production .

Further provided by the present invention is a nucleic acid construct comprising a SAPL promoter region or a fragment, mutant, allele, derivative or variant thereof able to promote transcription, operably linked to a heterologous gene, e.g. a coding sequence. A "heterologous" or "exogenous" gene is generally not a modified form of SAPL. Generally, the gene may be transcribed into mRNA which may be translated into a peptide or polypeptide product which may be detected and preferably quantitated following expression. A gene whose encoded product may be assayed following expression is termed a "reporter gene", i.e. a gene which "reports" on promoter activity. The reporter gene preferably encodes an enzyme which catalyses a reaction which produces a detectable signal, preferably a visually detectable signal, such as a coloured product. Many examples are known, including β-galactosidase and luciferase. β-galactosidase activity may be assayed by production of blue colour on substrate, the assay being by eye or by use of a spectrophotometer to measure absorbance. Fluorescence, for example that produced as a result of luciferase activity, may be quantitated using a spectrophotometer. Radioactive assays may be used, for instance using chloramphenicol acetyltransferase, which may also be used in non-radioactive assays. The presence and/or amount of gene product resulting from expression from the reporter gene may be determined using a molecule able to bind the product, such as an antibody or fragment thereof. The binding molecule may be labelled directly or indirectly using any standard technique.

Those skilled in the art are well aware of a multitude of possible reporter genes and assay techniques which may be used to determine gene activity. Any suitable reporter/assay may be used and it should be appreciated that no particular choice is essential to or a limitation of the present invention.

Nucleic acid constructs comprising a promoter (as disclosed herein) and a heterologous gene (reporter) may be employed in screening for a substance able to modulate activity of the promoter. For therapeutic purposes, e.g. for treatment of IDDM or other disease, a substance able to up-regulate expression of the promoter may be sought. A method of screening for ability of a substance to modulate activity of a promoter may comprise contacting an expression system, such as a host cell, containing a nucleic acid construct as herein disclosed with a test or candidate substance and determining expression of the heterologous gene.

The level of expression in the presence of the test substance may be compared with the level of expression in the absence of the test substance. A difference in expression in the presence of the test substance indicates ability of the substance to modulate gene expression. An increase in expression of the heterologous gene compared with expression of another gene not linked to a promoter as disclosed herein indicates specificity of the substance for modulation of the promoter.
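By way of illustration only, the comparison described above may be expressed numerically as a fold change in reporter signal, together with a ratio against an unlinked control gene as a rough indication of specificity. The function names and the numerical readings are hypothetical; no particular threshold is implied by the present disclosure.

```python
# Illustrative sketch only: names and readings are assumptions.
def fold_change(with_substance: float, without_substance: float) -> float:
    """Reporter signal with the test substance divided by signal without it."""
    if without_substance <= 0:
        raise ValueError("baseline signal must be positive")
    return with_substance / without_substance


def specificity_ratio(reporter_with: float, reporter_without: float,
                      control_with: float, control_without: float) -> float:
    """Fold change of the promoter-linked reporter relative to the fold
    change of a control gene not linked to the promoter.

    A value well above 1 suggests the effect is specific to the promoter;
    a value near 1 suggests a general effect on expression.
    """
    return (fold_change(reporter_with, reporter_without)
            / fold_change(control_with, control_without))


# Hypothetical readings in arbitrary units.
reporter_fc = fold_change(300.0, 100.0)
ratio = specificity_ratio(300.0, 100.0, 150.0, 100.0)
```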

A promoter construct may be introduced into a cell line using any technique previously described to produce a stable cell line containing the reporter construct integrated into the genome. The cells may be grown and incubated with test compounds for varying times. The cells may be grown in 96 well plates to facilitate the analysis of large numbers of compounds. The cells may then be washed and the reporter gene expression analysed. For some reporters, such as luciferase the cells will be lysed then analysed.

Following identification of a substance which modulates or affects promoter activity, the substance may be investigated further. Furthermore, it may be manufactured and/or used in preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug. These may be administered to individuals.

Thus, the present invention extends in various aspects not only to a substance identified using a nucleic acid molecule as a modulator of promoter activity, in accordance with what is disclosed herein, but also a pharmaceutical composition, medicament, drug or other composition comprising such a substance, a method comprising administration of such a composition to a patient, e.g. for increasing SAPL expression for instance in treatment (which may include preventative treatment) of IDDM or other disease, use of such a substance in manufacture of a composition for administration, e.g. for increasing SAPL expression for instance in treatment of IDDM or other disease, and a method of making a pharmaceutical composition comprising admixing such a substance with a pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.

A further aspect of the present invention provides a polypeptide which has the amino acid sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c), or includes the first 776 amino acids of Figure 1(c) and Figure 2(c), which are identical between SAPLa and SAPLb, which may be in isolated and/or purified form, free or substantially free of material with which it is naturally associated, such as other polypeptides or such as human polypeptides other than that for which the amino acid sequence is shown in a said figure, or (for example if produced by expression in a prokaryotic cell) lacking in native glycosylation, e.g. unglycosylated.

Polypeptides which are amino acid sequence variants, alleles, derivatives or mutants are also provided by the present invention. A polypeptide which is a variant, allele, derivative or mutant may have an amino acid sequence which differs from that given in a figure herein by one or more of addition, substitution, deletion and insertion of one or more amino acids. Preferred such polypeptides have SAPL function, that is to say have one or more of the following properties: immunological cross-reactivity with an antibody reactive with the polypeptide for which the sequence is given in a figure herein; sharing an epitope with the polypeptide for which the amino acid sequence is shown in a figure herein (as determined for example by immunological cross-reactivity between the two polypeptides); a biological activity which is inhibited by an antibody raised against the polypeptide whose sequence is shown in a figure herein; ability to complement the yeast mutation; containing one or more of the conserved sequences identified in Figure 3; containing the STDSEE repeat. Alteration of sequence may change the nature and/or level of activity and/or stability of the SAPL protein.

A polypeptide which is an amino acid sequence variant, allele, derivative or mutant of the amino acid sequence shown in a figure herein may comprise an amino acid sequence which shares greater than about 35% sequence identity with the sequence shown, greater than about 40%, greater than about 50%, greater than about 60%, greater than about 70%, greater than about 80%, greater than about 90% or greater than about 95%. The sequence may share greater than about 60% similarity, greater than about 70% similarity, greater than about 80% similarity or greater than about 90% similarity with the amino acid sequence shown in the relevant figure. Amino acid similarity is generally defined with reference to the algorithm GAP (Genetics Computer Group, Madison, WI) as noted above, or the TBLASTN program, of Altschul et al. (1990) J. Mol. Biol. 215: 403-10, or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol. Biol. 147: 195-197). Similarity allows for "conservative variation", i.e. substitution of one hydrophobic residue such as isoleucine, valine, leucine or methionine for another, or the substitution of one polar residue for another, such as arginine for lysine, glutamic for aspartic acid, or glutamine for asparagine. Particular amino acid sequence variants may differ from that shown in a figure herein by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5-10, 10-20, 20-30, 30-50, 50-100, 100-150, or more than 150 amino acids.
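By way of illustration only, the distinction between identity and similarity may be sketched for two sequences that have already been aligned to equal length without gaps. This is a simplification and is not the GAP, TBLASTN or Smith-Waterman algorithm; the conservative groups are limited to the examples given above, and the function name and the two short test sequences are hypothetical.

```python
# Illustrative sketch only. Conservative groups follow the examples in the
# text: I/V/L/M, R/K, E/D and Q/N. Names and test sequences are assumptions.
CONSERVATIVE_GROUPS = [set("IVLM"), set("RK"), set("DE"), set("QN")]


def _is_conservative(x: str, y: str) -> bool:
    """True if both residues fall in the same conservative group."""
    return any(x in group and y in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(a: str, b: str):
    """Return (percent identity, percent similarity) for two ungapped,
    pre-aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    a, b = a.upper(), b.upper()
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(x == y or _is_conservative(x, y) for x, y in zip(a, b))
    return 100 * identical / len(a), 100 * similar / len(a)


# Hypothetical pair: one identical position, three conservative changes.
pct_identity, pct_similarity = identity_and_similarity("MKDQA", "LRDNG")
```

A real comparison would first compute an alignment, with gap penalties, using one of the programs cited above.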

Sequence comparison may be made over the full-length of the relevant sequence shown herein, or may more preferably be over a contiguous sequence of about or greater than about 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 133, 150, 167, 200, 233, 250, 267, 300, 333, 350, 400, 450, 500, 550, 600, 650, 700, 750, 760, 770, 776, 780, or 790 amino acids or nucleotide triplets, compared with the relevant amino acid sequence or nucleotide sequence as the case may be.

The present invention also includes peptides which include or consist of fragments of a polypeptide of the invention.

The skilled person can use the techniques described herein and others well known in the art to produce large amounts of peptides, for instance by expression from encoding nucleic acid.

Peptides can also be generated wholly or partly by chemical synthesis. The compounds of the present invention can be readily prepared according to well-established, standard liquid or, preferably, solid-phase peptide synthesis methods, general descriptions of which are broadly available (see, for example, J.M. Stewart and J.D. Young, Solid Phase Peptide Synthesis, 2nd edition, Pierce Chemical Company, Rockford, Illinois (1984); M. Bodanszky and A. Bodanszky, The Practice of Peptide Synthesis, Springer Verlag, New York (1984); and Applied Biosystems 430A Users Manual, ABI Inc., Foster City, California), or they may be prepared in solution, by the liquid phase method or by any combination of solid-phase, liquid phase and solution chemistry, e.g. by first completing the respective peptide portion and then, if desired and appropriate, after removal of any protecting groups being present, by introduction of the residue X by reaction of the respective carbonic or sulfonic acid or a reactive derivative thereof.

The present invention also includes active portions, fragments, derivatives and functional mimetics of the polypeptides of the invention. An "active portion" of a polypeptide means a peptide which is less than said full length polypeptide, but which retains a biological activity such as disclosed herein.

A "fragment" of a polypeptide generally means a stretch of amino acid residues of at least about five contiguous amino acids, often at least about seven contiguous amino acids, typically at least about nine contiguous amino acids, more preferably at least about 13 contiguous amino acids, and, more preferably, at least about 20 to 30 or more contiguous amino acids. Fragments of the SAPL polypeptide sequence may include antigenic determinants or epitopes useful for raising antibodies to a portion of the amino acid sequence. Alanine scans are commonly used to find and refine peptide motifs within polypeptides, this involving the systematic replacement of each residue in turn with the amino acid alanine, followed by an assessment of biological activity.
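By way of illustration only, the set of variants to be tested in an alanine scan may be generated as sketched below. The function name is hypothetical; the assessment of biological activity of each variant is an experimental step and is not modelled.

```python
# Illustrative sketch only: generates the single-substitution variants
# used in an alanine scan. The function name is an assumption.
def alanine_scan(peptide: str):
    """Return a list of (position, original residue, variant sequence).

    Positions are 1-based. Residues that are already alanine are skipped,
    since replacing them would give back the unchanged peptide.
    """
    peptide = peptide.upper()
    variants = []
    for index, residue in enumerate(peptide):
        if residue == "A":
            continue
        variant = peptide[:index] + "A" + peptide[index + 1:]
        variants.append((index + 1, residue, variant))
    return variants


# The STDSEE repeat mentioned herein, used as an example input.
stdsee_variants = alanine_scan("STDSEE")
```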

Preferred fragments of SAPL include those with any of the following amino acid sequences:

HPSQEEDRHSNASQ

RIQQFDDGGSDEEDI

PESQRRSSSGSTDSE

PSSSPEQRTGQPSAPGDTS

which may be used for instance in raising or isolating antibodies. Variant and derivative peptides, peptides which have an amino acid sequence which differs from one of these sequences by way of addition, insertion, deletion or substitution of one or more amino acids are also provided by the present invention, generally with the proviso that the variant or derivative peptide is bound by an antibody or other specific binding member which binds one of the peptides whose sequence is shown. A peptide which is a variant or derivative of one of the shown peptides may compete with the shown peptide for binding to a specific binding member, such as an antibody or antigen-binding fragment thereof.

Where additional amino acids are included in a peptide, these may be heterologous or foreign to the polypeptide of the invention, and the peptide may be about 20, 25, 30 or 35 amino acids in length. A peptide according to this aspect may be included within a larger fusion protein, particularly where the peptide is fused to a non-SAPL (i.e. heterologous or foreign) sequence, such as a polypeptide or protein domain.

A "derivative" of a polypeptide or a fragment thereof may include a polypeptide modified by varying the amino acid sequence of the protein, e.g. by manipulation of the nucleic acid encoding the protein or by altering the protein itself. Such derivatives of the natural amino acid sequence may involve one or more of insertion, addition, deletion or substitution of one or more amino acids, which may be without fundamentally altering the qualitative nature of biological activity of the wild type polypeptide. Also encompassed within the scope of the present invention are functional mimetics of active fragments of the SAPL polypeptides provided (including alleles, mutants, derivatives and variants) . The term "functional mimetic" means a substance which may not contain an active portion of the relevant amino acid sequence, and probably is not a peptide at all, but which retains in qualitative terms biological activity of natural SAPL polypeptide. The design and screening of candidate mimetics is described in detail below.

Other fragments of the polypeptides for which sequence information is provided herein are provided as aspects of the present invention, for instance corresponding to functional domains.

A polypeptide according to the present invention may be isolated and/or purified (e.g. using an antibody) for instance after production by expression from encoding nucleic acid (for which see below) . Thus, a polypeptide may be provided free or substantially free from contaminants with which it is naturally associated (if it is a naturally-occurring polypeptide) . A polypeptide may be provided free or substantially free of other polypeptides. Polypeptides according to the present invention may be generated wholly or partly by chemical synthesis. The isolated and/or purified polypeptide may be used in formulation of a composition, which may include at least one additional component, for example a pharmaceutical composition including a pharmaceutically acceptable excipient, vehicle or carrier. A composition including a polypeptide according to the invention may be used in prophylactic and/or therapeutic treatment as discussed below.

A polypeptide, peptide, allele, mutant, derivative or variant according to the present invention may be used as an immunogen or otherwise in obtaining specific antibodies. Antibodies are useful in purification and other manipulation of polypeptides and peptides, diagnostic screening and therapeutic contexts. This is discussed further below.

A polypeptide according to the present invention may be used in screening for molecules which affect or modulate its activity or function, e.g. binding to or modulating the activity of a protein phosphatase, or ability to complement SAP mutant yeast. Such molecules may interact with SAPL or with one or more accessory molecules, and may be useful in a therapeutic (possibly including prophylactic) context.

It is well known that pharmaceutical research leading to the identification of a new drug may involve the screening of very large numbers of candidate substances, both before and even after a lead compound has been found. This is one factor which makes pharmaceutical research very expensive and time-consuming. Means for assisting in the screening process can have considerable commercial importance and utility. Such means for screening for substances potentially useful in treating or preventing IDDM or other disease are provided by polypeptides according to the present invention. Substances identified as modulators of the polypeptide represent an advance in the fight against IDDM and other diseases since they provide a basis for design and investigation of therapeutics for in vivo use. Furthermore, they may be useful in any of a number of conditions, including autoimmune diseases, such as glomerulonephritis, diseases and disorders involving cellular proliferation, such as psoriasis, tumors and cancer, given the functional indications for SAPL, discussed elsewhere herein. As noted elsewhere, SAPL, fragments thereof, and nucleic acid according to the invention may also be useful in combatting any of these diseases and disorders.

In various further aspects the present invention relates to screening and assay methods and means, and substances identified thereby.

Thus, further aspects of the present invention provide the use of a polypeptide or peptide (particularly a fragment of a polypeptide of the invention as disclosed), and/or encoding nucleic acid therefor, in screening or searching for and/or obtaining/identifying a substance, e.g. peptide or chemical compound, which interacts and/or binds with the polypeptide or peptide and/or interferes with its function or activity or that of another substance, e.g. polypeptide or peptide, which interacts and/or binds with the polypeptide or peptide of the invention. For instance, a method according to one aspect of the invention includes providing a polypeptide or peptide of the invention and bringing it into contact with a substance, which contact may result in binding between the polypeptide or peptide and the substance. Binding may be determined by any of a number of techniques available in the art, both qualitative and quantitative.

In various aspects the present invention is concerned with provision of assays for substances which inhibit interaction between a polypeptide of the invention and one or more protein phosphatases, particularly those similar to SIT4 such as human protein phosphatase 6 (Bastians and Ponstingl (1996) J. Cell Sci. 109: 2865-2874).

Further assays are for substances which interact with or bind a polypeptide of the invention and/or modulate one or more of its activities.

One aspect of the present invention provides an assay which includes:

(a) bringing into contact a polypeptide or peptide according to the invention and a putative binding molecule or other test substance; and

(b) determining interaction or binding between the polypeptide or peptide and the test substance.

A substance which interacts with the polypeptide or peptide of the invention may be isolated and/or purified, manufactured and/or used to modulate its activity as discussed.

A further aspect of the present invention provides an assay method which includes:

(a) bringing into contact a substance including a SAPL polypeptide or fragment, mutant, variant or derivative thereof, a substance including a fragment of a second polypeptide or a fragment, mutant, variant or derivative of said second polypeptide, which is able to bind the SAPL polypeptide; and a test compound, under conditions in which in the absence of the test compound being an inhibitor, the two said substances interact; and

(b) determining interaction between said substances.

It is not necessary to use the entire proteins for assays of the invention which test for binding between two molecules. Fragments may be generated and used in any suitable way known to those of skill in the art. Suitable ways of generating fragments include, but are not limited to, recombinant expression of a fragment from encoding DNA. Such fragments may be generated by taking encoding DNA, identifying suitable restriction enzyme recognition sites either side of the portion to be expressed, and cutting out said portion from the DNA. The portion may then be operably linked to a suitable promoter in a standard commercially available expression system. Another recombinant approach is to amplify the relevant portion of the DNA with suitable PCR primers. Small fragments (e.g. up to about 20 or 30 amino acids) may also be generated using peptide synthesis methods which are well known in the art .

The precise format of the assay of the invention may be varied by those of skill in the art using routine skill and knowledge. For example, the interaction between the polypeptides may be studied in vitro by labelling one with a detectable label and bringing it into contact with the other which has been immobilised on a solid support. Suitable detectable labels include ³⁵S-methionine which may be incorporated into recombinantly produced peptides and polypeptides. Recombinantly produced peptides and polypeptides may also be expressed as a fusion protein containing an epitope which can be labelled with an antibody.

Fusion proteins may be generated that incorporate six histidine residues at either the N-terminus or C-terminus of the recombinant protein. Such a histidine tag may be used for purification of the protein by using commercially available columns which contain a metal ion, either nickel or cobalt (Clontech, Palo Alto, CA, USA) . These tags also serve for detecting the protein using commercially available monoclonal antibodies directed against the six histidine residues (Clontech, Palo Alto, CA, USA) .

The protein which is immobilized on a solid support may be immobilized using an antibody against that protein bound to a solid support or via other technologies which are known per se. A preferred in vitro interaction may utilise a fusion protein including glutathione-S-transferase (GST). This may be immobilized on glutathione agarose beads. In an in vitro assay format of the type described above a test compound can be assayed by determining its ability to diminish the amount of labelled peptide or polypeptide which binds to the immobilized GST-fusion polypeptide. This may be determined by fractionating the glutathione-agarose beads by SDS-polyacrylamide gel electrophoresis. Alternatively, the beads may be rinsed to remove unbound protein and the amount of protein which has bound can be determined by counting the amount of label present in, for example, a suitable scintillation counter.
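By way of illustration only, the ability of a test compound to diminish binding may be expressed as a percentage inhibition calculated from scintillation counts. The function name, the background-subtraction step and the counts shown are assumptions introduced for the example and are not experimental data.

```python
# Illustrative sketch only: names, the background correction and the
# example counts are assumptions, not data from the present disclosure.
def percent_inhibition(bound_with_compound: float,
                       bound_without_compound: float,
                       background: float = 0.0) -> float:
    """Percentage reduction in bound label caused by the test compound.

    All arguments are counts (e.g. counts per minute) measured on the
    rinsed beads; `background` is the count obtained with no labelled
    polypeptide and is subtracted from both readings.
    """
    control = bound_without_compound - background
    treated = bound_with_compound - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - treated / control)


# Hypothetical counts per minute.
inhibition = percent_inhibition(bound_with_compound=600.0,
                                bound_without_compound=1100.0,
                                background=100.0)
```

A value of 0 would indicate no effect on binding and a value of 100 complete inhibition; negative values would indicate enhanced binding.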

An assay according to the present invention may also take the form of an in vivo assay. The in vivo assay may be performed in a cell line such as a yeast strain in which the relevant polypeptides or peptides are expressed from one or more vectors introduced into the cell.

A method of screening for a substance which modulates activity of a polypeptide may include contacting one or more test substances with the polypeptide in a suitable reaction medium, testing the activity of the treated polypeptide and comparing that activity with the activity of the polypeptide in comparable reaction medium untreated with the test substance or substances. A difference in activity between the treated and untreated polypeptides is indicative of a modulating effect of the relevant test substance or substances.

In a further aspect of the invention there is provided an assay method which includes:

(a) bringing into contact a substance including a fragment of a polypeptide according to the invention including a putative phosphorylation site, e.g. as identified in Table 4, or a mutant, variant or derivative thereof and a test compound in the presence of a kinase under conditions in which the kinase normally phosphorylates said fragment, mutant, variant or derivative; and

(b) determining phosphorylation of said fragment, mutant, variant or derivative.

The kinase may be, for example, cAMP dependent protein kinase, protein kinase C, or casein kinase 2.

Phosphorylation may be determined for example by immobilising a polypeptide of the invention, or fragment, mutant, variant or derivative thereof, e.g. on a bead or plate, and detecting phosphorylation using an antibody or other binding molecule which binds the relevant site of phosphorylation with a different affinity when the site is phosphorylated from when the site is not phosphorylated. Such antibodies may be obtained by means of any standard technique as discussed elsewhere herein, e.g. using a phosphorylated peptide (such as a fragment of a SAPL polypeptide). Binding of a binding molecule which discriminates between the phosphorylated and non-phosphorylated form of the polypeptide or relevant fragment, mutant, variant or derivative thereof may be assessed using any technique available to those skilled in the art, which may involve determination of the presence of a suitable label, such as fluorescence. Phosphorylation may be determined by immobilisation of the polypeptide or a fragment, mutant, variant or derivative thereof, on a suitable substrate such as a bead or plate, wherein the substrate is impregnated with scintillant, such as in a standard scintillation proximity assay, with phosphorylation being determined via measurement of the incorporation of radioactive phosphate. Phosphate incorporation into a polypeptide or a fragment, mutant, variant or derivative thereof, may be determined by precipitation with acid, such as trichloroacetic acid, and collection of the precipitate on a suitable material such as nitrocellulose filter paper, followed by measurement of incorporation of radiolabeled phosphate. SDS-PAGE separation of substrate may be employed followed by detection of radiolabel.

Combinatorial library technology (Schultz, JS (1996) Biotechnol. Prog. 12:729-743) provides an efficient way of testing a potentially vast number of different substances for ability to modulate activity of a polypeptide. Prior to or as well as being screened for modulation of activity, test substances may be screened for ability to interact with the polypeptide, e.g. in a yeast two-hybrid system (which requires that both the polypeptide and the test substance can be expressed in yeast from encoding nucleic acid) . This may be used as a coarse screen prior to testing a substance for actual ability to modulate activity of the polypeptide.

The amount of test substance or compound which may be added to an assay of the invention will normally be determined by trial and error depending upon the type of compound used. Typically, from about 0.01 to 100 nM concentrations of putative inhibitor compound may be used, for example from 0.1 to 10 nM. Greater concentrations may be used when a peptide is the test substance. Compounds which may be used may be natural or synthetic chemical compounds used in drug screening programmes. Extracts of plants which contain several characterised or uncharacterised components may also be used. A further class of putative inhibitor compounds can be derived from the SAPL polypeptide and/or a ligand which binds. Peptide fragments of from 5 to 40 amino acids, for example from 6 to 10 amino acids from the region of the relevant polypeptide responsible for interaction, may be tested for their ability to disrupt such interaction.
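By way of illustration only, two routine preparatory steps implied above may be sketched: laying out a logarithmic concentration series across the stated 0.01 to 100 nM range, and enumerating overlapping peptide fragments of 6 to 10 amino acids from a region of interest. The function names and the spacing of one point per decade are assumptions; the peptide used as input is one of the preferred fragments listed herein and is used only as an example, not because it is known to be a region responsible for interaction.

```python
import math


# Illustrative sketch only: names and default spacing are assumptions.
def log_concentration_series(low_nm: float = 0.01, high_nm: float = 100.0,
                             points_per_decade: int = 1):
    """Return concentrations (nM) spaced evenly on a log10 scale from
    `low_nm` to approximately `high_nm` inclusive."""
    decades = math.log10(high_nm / low_nm)
    count = round(decades * points_per_decade) + 1
    return [low_nm * 10 ** (i / points_per_decade) for i in range(count)]


def peptide_windows(region: str, min_len: int = 6, max_len: int = 10):
    """Return every contiguous fragment of `region` whose length lies
    between `min_len` and `max_len` inclusive, shortest first."""
    return [region[start:start + size]
            for size in range(min_len, max_len + 1)
            for start in range(len(region) - size + 1)]


series = log_concentration_series()
fragments = peptide_windows("PESQRRSSSGSTDSE")
```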

Other candidate inhibitor compounds may be based on modelling the 3-dimensional structure of a polypeptide or peptide fragment and using rational drug design to provide potential inhibitor compounds with particular molecular shape, size and charge characteristics.

Following identification of a substance which modulates or affects polypeptide activity, the substance may be investigated further. Furthermore, it may be manufactured and/or used in preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug. These may be administered to individuals.

Thus, the present invention extends in various aspects not only to a substance identified as a modulator of polypeptide activity, in accordance with what is disclosed herein, but also a pharmaceutical composition, medicament, drug or other composition comprising such a substance, a method comprising administration of such a composition to a patient, e.g. for treatment (which may include preventative treatment) of IDDM or other disease, use of such a substance in manufacture of a composition for administration, e.g. for treatment of IDDM or other disease, and a method of making a pharmaceutical composition comprising admixing such a substance with a pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.

A substance identified as a modulator of polypeptide or promoter function may be peptide or non-peptide in nature. Non-peptide "small molecules" are often preferred for many in vivo pharmaceutical uses. Accordingly, a mimetic or mimic of the substance (particularly if a peptide) may be designed for pharmaceutical use. The designing of mimetics to a known pharmaceutically active compound is a known approach to the development of pharmaceuticals based on a "lead" compound. This might be desirable where the active compound is difficult or expensive to synthesise or where it is unsuitable for a particular method of administration, e.g. peptides are not well suited as active agents for oral compositions as they tend to be quickly degraded by proteases in the alimentary canal. Mimetic design, synthesis and testing may be used to avoid randomly screening large numbers of molecules for a target property.

There are several steps commonly taken in the design of a mimetic from a compound having a given target property. Firstly, the particular parts of the compound that are critical and/or important in determining the target property are determined. In the case of a peptide, this can be done by systematically varying the amino acid residues in the peptide, e.g. by substituting each residue in turn. These parts or residues constituting the active region of the compound are known as its "pharmacophore".
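
As a minimal sketch of substituting each residue in turn, the following Python generates every single-residue variant of a peptide. The choice of alanine as the replacement residue is an assumption of this sketch (the text does not specify the substituting residue), and the peptide HPSQEEDRHSNASQ, which appears elsewhere herein as an immunising peptide, is used only as a convenient input string.

```python
def substitution_scan(peptide, replacement="A"):
    """Return (position, original_residue, variant_peptide) for each
    single-residue substitution; positions are 1-based."""
    variants = []
    for i, residue in enumerate(peptide):
        if residue == replacement:
            continue  # substituting would leave the peptide unchanged
        variant = peptide[:i] + replacement + peptide[i + 1:]
        variants.append((i + 1, residue, variant))
    return variants


# Input used for illustration only; alanine replacement is an assumption.
SCAN = substitution_scan("HPSQEEDRHSNASQ")
```

Each variant would then be tested for the target property; positions at which substitution abolishes the property are candidates for inclusion in the pharmacophore.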

Once the pharmacophore has been found, its structure is modelled according to its physical properties, e.g. stereochemistry, bonding, size and/or charge, using data from a range of sources, e.g. spectroscopic techniques, X-ray diffraction data and NMR. Computational analysis, similarity mapping (which models the charge and/or volume of a pharmacophore, rather than the bonding between atoms) and other techniques can be used in this modelling process.

In a variant of this approach, the three-dimensional structure of the ligand and its binding partner are modelled. This can be especially useful where the ligand and/or binding partner change conformation on binding, allowing the model to take account of this in the design of the mimetic.

A template molecule is then selected onto which chemical groups which mimic the pharmacophore can be grafted. The template molecule and the chemical groups grafted on to it can conveniently be selected so that the mimetic is easy to synthesise, is likely to be pharmacologically acceptable, and does not degrade in vivo, while retaining the biological activity of the lead compound. The mimetic or mimetics found by this approach can then be screened to see whether they have the target property, or to what extent they exhibit it. Further optimisation or modification can then be carried out to arrive at one or more final mimetics for in vivo or clinical testing.

Mimetics of substances identified as having ability to modulate SAPL polypeptide or promoter activity using a screening method as disclosed herein are included within the scope of the present invention. A polypeptide, peptide or substance able to modulate activity of a polypeptide according to the present invention may be provided in a kit, e.g. sealed in a suitable container which protects its contents from the external environment. Such a kit may include instructions for use.

A convenient way of producing a polypeptide according to the present invention is to express nucleic acid encoding it, by use of the nucleic acid in an expression system. Accordingly, the present invention also encompasses a method of making a polypeptide (as disclosed), the method including expression from nucleic acid encoding the polypeptide (generally nucleic acid according to the invention). This may conveniently be achieved by growing a host cell in culture, containing such a vector, under appropriate conditions which cause or allow expression of the polypeptide. Polypeptides may also be expressed in in vitro systems, such as reticulocyte lysate.

Systems for cloning and expression of a polypeptide in a variety of different host cells are well known. Suitable host cells include bacteria, eukaryotic cells such as mammalian and yeast, and baculovirus systems. Mammalian cell lines available in the art for expression of a heterologous polypeptide include Chinese hamster ovary cells, HeLa cells, baby hamster kidney cells, COS cells and many others. A common, preferred bacterial host is E. coli. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. Vectors may be plasmids, viral e.g. 'phage, or phagemid, as appropriate. For further details see, for example, Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press. Many known techniques and protocols for manipulation of nucleic acid, for example in preparation of nucleic acid constructs, mutagenesis, sequencing, introduction of DNA into cells and gene expression, and analysis of proteins, are described in detail in Current Protocols in Molecular Biology, Ausubel et al. eds., John Wiley & Sons, 1992.

Thus, a further aspect of the present invention provides a host cell containing nucleic acid as disclosed herein. The nucleic acid of the invention may be integrated into the genome (e.g. chromosome) of the host cell. Integration may be promoted by inclusion of sequences which promote recombination with the genome, in accordance with standard techniques. The nucleic acid may be on an extra-chromosomal vector within the cell.

A still further aspect provides a method which includes introducing the nucleic acid into a host cell. The introduction, which may (particularly for in vitro introduction) be generally referred to without limitation as "transformation", may employ any available technique. For eukaryotic cells, suitable techniques may include calcium phosphate transfection, DEAE-Dextran, electroporation, liposome-mediated transfection and transduction using retrovirus or other virus, e.g. adenovirus, vaccinia or, for insect cells, baculovirus. For bacterial cells, suitable techniques may include calcium chloride transformation, electroporation and transfection using bacteriophage.

Marker genes such as antibiotic resistance or sensitivity genes may be used in identifying clones containing nucleic acid of interest, as is well known in the art.

The introduction may be followed by causing or allowing expression from the nucleic acid, e.g. by culturing host cells (which may include cells actually transformed although more likely the cells will be descendants of the transformed cells) under conditions for expression of the gene, so that the encoded polypeptide is produced. If the polypeptide is expressed coupled to an appropriate signal leader peptide it may be secreted from the cell into the culture medium. Following production by expression, a polypeptide may be isolated and/or purified from the host cell and/or culture medium, as the case may be, and subsequently used as desired, e.g. in the formulation of a composition which may include one or more additional components, such as a pharmaceutical composition which includes one or more pharmaceutically acceptable excipients, vehicles or carriers (e.g. see below).

Introduction of nucleic acid may take place in vivo by way of gene therapy, as discussed below. A host cell containing nucleic acid according to the present invention, e.g. as a result of introduction of the nucleic acid into the cell or into an ancestor of the cell and/or genetic alteration of the sequence endogenous to the cell or ancestor (which introduction or alteration may take place in vivo or ex vivo), may be comprised (e.g. in the soma) within an organism which is an animal, particularly a mammal, which may be human or non-human, such as rabbit, guinea pig, rat, mouse or other rodent, cat, dog, pig, sheep, goat, cattle or horse, or which is a bird, such as a chicken. Genetically modified or transgenic animals or birds comprising such a cell are also provided as further aspects of the present invention.

Thus, in various further aspects, the present invention provides a non-human animal with a human SAPL transgene within its genome. The transgene may have the sequence of any of the isoforms identified herein or a mutant, derivative, allele or variant thereof as disclosed. In one preferred embodiment, the heterologous human SAPL sequence replaces the endogenous animal sequence. In other preferred embodiments, one or more copies of the human SAPL sequence are added to the animal genome. Preferably the animal is a rodent, and most preferably mouse or rat.

This may have a therapeutic aim. (Gene therapy is discussed below.) The presence of a mutant, allele or variant sequence within cells of an organism, particularly when in place of a homologous endogenous sequence, may allow the organism to be used as a model in testing and/or studying the role of the SAPL gene or substances which modulate activity of the encoded polypeptide and/or promoter in vitro or are otherwise indicated to be of therapeutic potential.

An animal model for SAPL deficiency may be constructed using standard techniques for introducing mutations into an animal germ-line. In one example of this approach, using a mouse, a vector carrying an insertional mutation within the SAPL gene may be transfected into embryonic stem cells. A selectable marker, for example an antibiotic resistance gene such as neoR, may be included to facilitate selection of clones in which the mutant gene has replaced the endogenous wild type homologue. Such clones may also be identified or further investigated by Southern blot hybridisation. The clones may then be expanded and cells injected into mouse blastocyst stage embryos. Mice in which the injected cells have contributed to the development of the mouse may be identified by Southern blotting. These chimeric mice may then be bred to produce mice which carry one copy of the mutation in the germ line. These heterozygous mutant animals may then be bred to produce mice carrying mutations in the gene homozygously. The mice having a heterozygous mutation in the SAPL gene may be a suitable model for human individuals having one copy of the gene mutated in the germ line who are at risk of developing IDDM or other disease.
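
The breeding of heterozygous animals to obtain homozygotes can be illustrated by a simple single-locus cross. The Python sketch below assumes ordinary Mendelian segregation with no effect of the mutation on viability, which is an assumption of the sketch and is not established by the text; the allele symbols "+" (wild type) and "m" (mutant) are likewise illustrative.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Expected offspring genotype fractions for a single-locus cross.
    Each parent is a two-character string of allele symbols."""
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# "+" = wild-type allele, "m" = mutant allele (illustrative symbols only).
HET_CROSS = cross("+m", "+m")   # heterozygote x heterozygote
BACK_CROSS = cross("+m", "++")  # heterozygote x wild type
```

Under these assumptions about one quarter of the offspring of two heterozygotes would be expected to carry the mutation homozygously, and such animals could be identified by genotyping, for example by Southern blotting as described above.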

Animal models may also be useful for any of the various diseases discussed elsewhere herein.

Instead of or as well as being used for the production of a polypeptide encoded by a transgene, host cells may be used as a nucleic acid factory to replicate the nucleic acid of interest in order to generate large amounts of it. Multiple copies of nucleic acid of interest may be made within a cell when coupled to an amplifiable gene such as dihydrofolate reductase (DHFR), as is well known. Host cells transformed with nucleic acid of interest, or which are descended from host cells into which nucleic acid was introduced, may be cultured under suitable conditions, e.g. in a fermentor, taken from the culture and subjected to processing to purify the nucleic acid. Following purification, the nucleic acid or one or more fragments thereof may be used as desired, for instance in a diagnostic or prognostic assay as discussed elsewhere herein.

The provision of the novel SAPL polypeptide isoforms and mutants, alleles, variants and derivatives enables for the first time the production of antibodies able to bind these molecules specifically.

Accordingly, a further aspect of the present invention provides an antibody able to bind specifically to the polypeptide whose sequence is given in a figure herein. Such an antibody may be specific in the sense of being able to distinguish between the polypeptide it is able to bind and other human polypeptides for which it has no or substantially no binding affinity (e.g. a binding affinity of about 1000x less). Specific antibodies bind an epitope on the molecule which is either not present or is not accessible on other molecules. Antibodies according to the present invention may be specific for the wild-type polypeptide. Antibodies according to the invention may be specific for a particular mutant, variant, allele or derivative polypeptide as between that molecule and the wild-type polypeptide, so as to be useful in diagnostic and prognostic methods as discussed below. Antibodies are also useful in purifying the polypeptide or polypeptides to which they bind, e.g. following production by recombinant expression from encoding nucleic acid.

Preferred antibodies according to the invention are isolated, in the sense of being free from contaminants such as antibodies able to bind other polypeptides and/or free of serum components. Monoclonal antibodies are preferred for some purposes, though polyclonal antibodies are within the scope of the present invention.

Antibodies may be obtained using techniques which are standard in the art. Methods of producing antibodies include immunising a mammal (e.g. mouse, rat, rabbit, horse, goat, sheep or monkey) with the protein or a fragment thereof. Antibodies may be obtained from immunised animals using any of a variety of techniques known in the art, and screened, preferably using binding of antibody to antigen of interest. For instance, Western blotting techniques or immunoprecipitation may be used (Armitage et al., 1992, Nature 357: 80-82). Isolation of antibodies and/or antibody-producing cells from an animal may be accompanied by a step of sacrificing the animal.

As an alternative or supplement to immunising a mammal with a peptide, an antibody specific for a protein may be obtained from a recombinantly produced library of expressed immunoglobulin variable domains, e.g. using lambda bacteriophage or filamentous bacteriophage which display functional immunoglobulin binding domains on their surfaces; for instance see WO92/01047. The library may be naive, that is constructed from sequences obtained from an organism which has not been immunised with any of the proteins (or fragments), or may be one constructed using sequences obtained from an organism which has been exposed to the antigen of interest.

Suitable peptides for use in immunising an animal and/or isolating anti-SAPL antibody include any of the following amino acid sequences:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS

Antibodies according to the present invention may be modified in a number of ways. Indeed the term "antibody" should be construed as covering any binding substance having a binding domain with the required specificity. Thus the invention covers antibody fragments, derivatives, functional equivalents and homologues of antibodies, including synthetic molecules and molecules whose shape mimics that of an antibody enabling it to bind an antigen or epitope.

Example antibody fragments, capable of binding an antigen or other binding partner, are the Fab fragment consisting of the VL, VH, CL and CH1 domains; the Fd fragment consisting of the VH and CH1 domains; the Fv fragment consisting of the VL and VH domains of a single arm of an antibody; the dAb fragment which consists of a VH domain; isolated CDR regions and F(ab')2 fragments, a bivalent fragment including two Fab fragments linked by a disulphide bridge at the hinge region. Single chain Fv fragments are also included.

A hybridoma producing a monoclonal antibody according to the present invention may be subject to genetic mutation or other changes. It will further be understood by those skilled in the art that a monoclonal antibody can be subjected to the techniques of recombinant DNA technology to produce other antibodies or chimeric molecules which retain the specificity of the original antibody. Such techniques may involve introducing DNA encoding the immunoglobulin variable region, or the complementarity determining regions (CDRs), of an antibody to the constant regions, or constant regions plus framework regions, of a different immunoglobulin. See, for instance, EP184187A, GB 2188638A or EP-A-0239400. Cloning and expression of chimeric antibodies are described in EP-A-0120694 and EP-A-0125023.

Hybridomas capable of producing antibody with desired binding characteristics are within the scope of the present invention, as are host cells, eukaryotic or prokaryotic, containing nucleic acid encoding antibodies (including antibody fragments) and capable of their expression. The invention also provides methods of production of the antibodies including growing a cell capable of producing the antibody under conditions in which the antibody is produced, and preferably secreted.

The reactivities of antibodies on a sample may be determined by any appropriate means. Tagging with individual reporter molecules is one possibility. The reporter molecules may directly or indirectly generate detectable, and preferably measurable, signals. The linkage of reporter molecules may be direct or indirect, and covalent, e.g. via a peptide bond, or non-covalent. Linkage via a peptide bond may be as a result of recombinant expression of a gene fusion encoding antibody and reporter molecule.

One favoured mode is by covalent linkage of each antibody with an individual fluorochrome, phosphor or laser dye with spectrally isolated absorption or emission characteristics. Suitable fluorochromes include fluorescein, rhodamine, phycoerythrin and Texas Red. Suitable chromogenic dyes include diaminobenzidine.

Other reporters include macromolecular colloidal particles or particulate material such as latex beads that are coloured, magnetic or paramagnetic, and biologically or chemically active agents that can directly or indirectly cause detectable signals to be visually observed, electronically detected or otherwise recorded. These molecules may be enzymes which catalyse reactions that develop or change colours or cause changes in electrical properties, for example. They may be molecularly excitable, such that electronic transitions between energy states result in characteristic spectral absorptions or emissions. They may include chemical entities used in conjunction with biosensors. Biotin/avidin or biotin/streptavidin and alkaline phosphatase detection systems may be employed.

The mode of determining binding is not a feature of the present invention and those skilled in the art are able to choose a suitable mode according to their preference and general knowledge. Particular embodiments of antibodies according to the present invention include antibodies able to bind and/or which bind specifically, e.g. with an affinity of at least 10^-7 M, to one of the following peptides:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS

Antibodies according to the present invention may be used in screening for the presence of a polypeptide, for example in a test sample containing cells or cell lysate as discussed, and may be used in purifying and/or isolating a polypeptide according to the present invention, for instance following production of the polypeptide by expression from encoding nucleic acid therefor. Antibodies may modulate the activity of the polypeptide to which they bind and so, if that polypeptide has a deleterious effect in an individual, may be useful in a therapeutic context (which may include prophylaxis).

An antibody may be provided in a kit, which may include instructions for use of the antibody, e.g. in determining the presence of a particular substance in a test sample. One or more other reagents may be included, such as labelling molecules, buffer solutions, eluants and so on. Reagents may be provided within containers which protect them from the external environment, such as a sealed vial.

The identification of the SAPL gene and indications of its association with IDDM and other diseases pave the way for aspects of the present invention to provide the use of materials and methods, such as are disclosed and discussed above, for establishing the presence or absence in a test sample of a variant form of the gene, in particular an allele or variant specifically associated with IDDM or other disease. This may be for diagnosing a predisposition of an individual to IDDM or other disease. It may be for diagnosing IDDM in a patient with the disease as being associated with the SAPL gene. This allows for planning of appropriate therapeutic and/or prophylactic treatment, permitting stream-lining of treatment by targeting those most likely to benefit.

A variant form of the gene may contain one or more insertions, deletions, substitutions and/or additions of one or more nucleotides compared with the wild-type sequence (such as shown in Table 2) which may or may not disrupt the gene function. Differences at the nucleic acid level are not necessarily reflected by a difference in the amino acid sequence of the encoded polypeptide. However, a mutation or other difference in a gene may result in a frame-shift or stop codon, which could seriously affect the nature of the polypeptide produced (if any), or a point mutation or gross mutational change to the encoded polypeptide, including insertion, deletion, substitution and/or addition of one or more amino acids or regions in the polypeptide. A mutation in a promoter sequence or other regulatory region may prevent or reduce expression from the gene or affect the processing or stability of the mRNA transcript. For instance, a sequence alteration may affect alternative splicing of mRNA. As discussed, various SAPL isoforms resulting from alternative splicing are provided by the present invention.
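
The different consequences of a silent substitution, a nonsense mutation and a frame-shift can be illustrated with the following Python sketch, which translates short coding sequences using the standard genetic code. The 18-nucleotide sequence is a hypothetical toy example and is not taken from the SAPL gene or from Table 2; the function and variable names are likewise assumptions of the sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy coding sequence (NOT a SAPL sequence).
WILD_TYPE = "ATGGCTGAAACCTGGTAA"
SILENT = WILD_TYPE[:5] + "C" + WILD_TYPE[6:]       # GCT -> GCC, still Ala
NONSENSE = WILD_TYPE[:14] + "A" + WILD_TYPE[15:]   # TGG -> TGA, new stop codon
FRAMESHIFT = WILD_TYPE[:6] + "T" + WILD_TYPE[6:]   # one-base insertion
```

In this toy case the silent change leaves the encoded peptide unaltered, whereas the nonsense and frame-shift changes both shorten it.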

There are various methods for determining the presence or absence in a test sample of a particular nucleic acid sequence, such as the sequence shown in any figure herein, or a mutant, variant or allele thereof, e.g. including an alteration shown in Table 2.

Tests may be carried out on preparations containing genomic DNA, cDNA and/or mRNA. Testing cDNA or mRNA has the advantage of the complexity of the nucleic acid being reduced by the absence of intron sequences, but the possible disadvantage of extra time and effort being required in making the preparations. RNA is more difficult to manipulate than DNA because of the wide-spread occurrence of RNases. Nucleic acid in a test sample may be sequenced and the sequence compared with the sequence shown in any of the figures herein, to determine whether or not a difference is present. If so, the difference can be compared with known susceptibility alleles (e.g. as shown in Table 2) to determine whether the test nucleic acid contains one or more of the variations indicated, or the difference can be investigated for association with IDDM or other disease.
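
A minimal Python sketch of such a comparison is given below. It handles only substitutions between two sequences of equal length, and the reference sequence and the set of "known" alterations are hypothetical placeholders that do not reproduce any figure herein or Table 2.

```python
def sequence_differences(reference, test):
    """List (position, reference_base, test_base) for each substitution;
    positions are 1-based. Sequences must already be aligned and equal in length."""
    if len(reference) != len(test):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, r, t)
            for i, (r, t) in enumerate(zip(reference, test)) if r != t]


# Hypothetical placeholders: not SAPL sequence, not the content of Table 2.
REFERENCE = "ATGGCTGAAACC"
TEST = "ATGGCTAAAACC"
KNOWN_ALTERATIONS = {(7, "G", "A")}

FOUND = sequence_differences(REFERENCE, TEST)
KNOWN = [d for d in FOUND if d in KNOWN_ALTERATIONS]       # match listed alleles
NOVEL = [d for d in FOUND if d not in KNOWN_ALTERATIONS]   # candidates for study
```

Differences falling in the second list would be those to investigate for association with disease; insertions and deletions would require a proper alignment step, which this sketch omits.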

Since it will not generally be time- or labour-efficient to sequence all nucleic acid in a test sample or even the whole SAPL gene, a specific amplification reaction such as PCR using one or more pairs of primers may be employed to amplify the region of interest in the nucleic acid, for instance the SAPL gene or a particular region in which polymorphisms associated with IDDM or other disease susceptibility occur. The amplified nucleic acid may then be sequenced as above, and/or tested in any other way to determine the presence or absence of a particular feature. Nucleic acid for testing may be prepared from nucleic acid removed from cells or in a library using a variety of other techniques such as restriction enzyme digest and electrophoresis.

Nucleic acid may be screened using a variant- or allele-specific probe. Such a probe corresponds in sequence to a region of the SAPL gene, or its complement, containing a sequence alteration known to be associated with IDDM or other disease susceptibility. Under suitably stringent conditions, specific hybridisation of such a probe to test nucleic acid is indicative of the presence of the sequence alteration in the test nucleic acid. For efficient screening purposes, more than one probe may be used on the same test sample.
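
The derivation of a pair of probes, one matching the normal sequence and one matching the altered sequence, can be sketched in Python as follows. The reference sequence, the position and nature of the alteration and the flank length are hypothetical values chosen for illustration; real probe design would also consider melting temperature and specificity, which the sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_probes(reference, position, alt_base, flank=5):
    """Return (normal_probe, variant_probe) spanning `flank` bases either
    side of the 1-based `position` at which `alt_base` replaces the reference base."""
    i = position - 1
    start = max(0, i - flank)
    end = min(len(reference), i + flank + 1)
    normal = reference[start:end]
    variant = reference[start:i] + alt_base + reference[i + 1:end]
    return normal, variant


# Hypothetical reference and alteration (C -> T at position 10).
REFERENCE = "GATTACAGGCTTACGATCCA"
NORMAL_PROBE, VARIANT_PROBE = allele_specific_probes(REFERENCE, 10, "T")
VARIANT_PROBE_ANTISENSE = reverse_complement(VARIANT_PROBE)
```

Either the sense probe or its reverse complement may be used, according to which strand of the test nucleic acid is to be detected.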

Allele- or variant-specific oligonucleotides may similarly be used in PCR to specifically amplify particular sequences if present in a test sample. Assessment of whether a PCR band contains a gene variant may be carried out in a number of ways familiar to those skilled in the art. The PCR product may for instance be treated in a way that enables one to display the polymorphism on a denaturing polyacrylamide DNA sequencing gel, with specific bands that are linked to the gene variants being selected.

SSCP heteroduplex analysis may be used for screening DNA fragments for sequence variants/mutations. It generally involves amplifying radiolabelled 100-300 bp fragments of the gene, diluting these products and denaturing at 95°C. The fragments are quick-cooled on ice so that the DNA remains in single stranded form. These single stranded fragments are run through acrylamide based gels. Differences in the sequence composition will cause the single stranded molecules to adopt different conformations in this gel matrix, making their mobility different from wild type fragments and thus allowing detection of mutations in the fragments being analysed relative to a control fragment upon exposure of the gel to X-ray film. Fragments with altered mobility/conformations may be directly excised from the gel and directly sequenced for mutation.

Sequencing of a PCR product may involve precipitation with isopropanol, resuspension and sequencing using a TaqFS+ Dye terminator sequencing kit. Extension products may be electrophoresed on an ABI 377 DNA sequencer and data analysed using Sequence Navigator software.

A further possible screening approach employs a PTT (protein truncation test) assay in which fragments are amplified with primers that contain the consensus Kozak initiation sequences and a T7 RNA polymerase promoter. These extra sequences are incorporated into the 5' primer such that they are in frame with the native coding sequence of the fragment being analysed. These PCR products are introduced into a coupled transcription/translation system. This reaction allows the production of RNA from the fragment and translation of this RNA into a protein fragment. PCR products from controls make a protein product of a wild type size relative to the size of the fragment being analysed. If the PCR product analysed has a frame-shift or nonsense mutation, the assay will yield a truncated protein product relative to controls. The size of the truncated product is related to the position of the mutation, and the relevant region of the gene from this patient may be sequenced to identify the truncating mutation.
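
The relationship between the position of a truncating mutation and the size of the product can be illustrated by counting codons up to the first in-frame stop codon. In the Python sketch below the coding sequences are hypothetical toy examples, not SAPL sequence, and the function name is an assumption of the sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def product_length(coding_seq):
    """Number of codons translated before the first in-frame stop codon,
    reading from the first base of coding_seq."""
    for i in range(0, len(coding_seq) - 2, 3):
        if coding_seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return len(coding_seq) // 3  # no stop codon found in frame


# Hypothetical toy fragments (NOT SAPL sequence).
CONTROL = "ATGGCTGAAACCTGGGGCTAA"
PATIENT = CONTROL[:14] + "A" + CONTROL[15:]  # TGG -> TGA at the fifth codon

CONTROL_LENGTH = product_length(CONTROL)
PATIENT_LENGTH = product_length(PATIENT)
```

Here the control fragment encodes six residues and the altered fragment four, so the shortened product points to a truncating change at about the fifth codon, which is the region that would then be sequenced.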

An alternative or supplement to looking for the presence of variant sequences in a test sample is to look for the presence of the normal sequence, e.g. using a suitably specific oligonucleotide probe or primer. Use of oligonucleotide probes and primers has been discussed in more detail above.

Allele- or variant-specific oligonucleotide probes or primers according to embodiments of the present invention may be selected from those shown in Table 1 and modified versions thereof. Approaches which rely on hybridisation between a probe and test nucleic acid and subsequent detection of a mismatch may be employed. Under appropriate conditions (temperature, pH etc.), an oligonucleotide probe will hybridise with a sequence which is not entirely complementary. The degree of base-pairing between the two molecules will be sufficient for them to anneal despite a mis-match. Various approaches are well known in the art for detecting the presence of a mis-match between two annealing nucleic acid molecules.

For instance, RNase A cleaves at the site of a mis-match. Cleavage can be detected by electrophoresing test nucleic acid to which the relevant probe or probes have annealed and looking for smaller molecules (i.e. molecules with higher electrophoretic mobility) than the full length probe/test hybrid.

Thus, an oligonucleotide probe that has the sequence of a region of the normal SAPL gene (either sense or anti-sense strand) in which mutations associated with IDDM or other disease susceptibility are known to occur (e.g. see Table 2) may be annealed to test nucleic acid and the presence or absence of a mis-match determined. Detection of the presence of a mis-match may indicate the presence in the test nucleic acid of a mutation associated with IDDM or other disease susceptibility. On the other hand, an oligonucleotide probe that has the sequence of a region of the gene including a mutation associated with IDDM or other disease susceptibility may be annealed to test nucleic acid and the presence or absence of a mis-match determined. The presence of a mis-match may indicate that the nucleic acid in the test sample has the normal sequence (the absence of a mis-match indicating that the test nucleic acid has the mutation). In either case, a battery of probes to different regions of the gene may be employed.
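
The two ways of reading a mis-match result, according to whether the probe carries the normal or the mutant sequence, can be summarised in the following Python sketch. For simplicity the probe and the test region are written on the same strand and compared base by base, which is a simplification of actual annealing between complementary strands; the sequences and the wording of the returned results are hypothetical.

```python
def mismatch_positions(probe, target):
    """1-based positions at which probe and target differ (equal lengths)."""
    if len(probe) != len(target):
        raise ValueError("probe and target region must be the same length")
    return [i + 1 for i, (p, t) in enumerate(zip(probe, target)) if p != t]


def interpret(probe_is_normal, probe, target):
    """Interpret presence or absence of a mis-match for either probe type."""
    mismatch = bool(mismatch_positions(probe, target))
    if probe_is_normal:
        return "possible mutation" if mismatch else "normal sequence"
    return "normal sequence" if mismatch else "mutation present"


# Hypothetical sequences, same strand for simplicity (NOT SAPL sequence).
NORMAL_PROBE = "ACAGGCTTACG"
MUTANT_PROBE = "ACAGGTTTACG"
SAMPLE = "ACAGGTTTACG"  # test region carrying the alteration

RESULT_NORMAL_PROBE = interpret(True, NORMAL_PROBE, SAMPLE)
RESULT_MUTANT_PROBE = interpret(False, MUTANT_PROBE, SAMPLE)
```

A mis-match against the normal probe only indicates that some difference is present; identifying it as a particular disease-associated mutation would require further analysis such as sequencing.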

The presence of differences in sequence of nucleic acid molecules may be detected by means of restriction enzyme digestion, such as in a method of DNA fingerprinting where the restriction pattern produced when one or more restriction enzymes are used to cut a sample of nucleic acid is compared with the pattern obtained when a sample containing the normal gene shown in a figure herein or a variant or allele, e.g. as containing an alteration shown in Table 2, is digested with the same enzyme or enzymes.
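
As an illustrative sketch only, the comparison of restriction patterns can be modelled in Python by computing the fragment lengths produced when a linear sequence is cut at each occurrence of a recognition site. EcoRI (recognition site GAATTC, cutting after the first G) is used as the example enzyme; the sequences are hypothetical and are not taken from the SAPL gene.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths (largest first) after cutting a linear sequence
    `cut_offset` bases into every occurrence of `site`."""
    fragments = []
    last = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - last)
        last = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - last)
    return sorted(fragments, reverse=True)


# Hypothetical sequences (NOT SAPL sequence).
NORMAL = "AAAAGAATTCCCCCCCGGGG"
VARIANT = NORMAL[:6] + "G" + NORMAL[7:]  # substitution destroys the EcoRI site

NORMAL_PATTERN = digest(NORMAL)
VARIANT_PATTERN = digest(VARIANT)
```

A difference between the two patterns, here two fragments against a single uncut fragment, is what would be observed as a difference in banding after electrophoresis.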

The presence or absence of a lesion in a promoter or other regulatory sequence may also be assessed by determining the level of mRNA production by transcription or the level of polypeptide production by translation from the mRNA. Determination of promoter activity has been discussed above. A test sample of nucleic acid may be provided for example by extracting nucleic acid from cells or biological tissues or fluids, urine, saliva, faeces, a buccal swab, biopsy or preferably blood, or for pre-natal testing from the amnion, placenta or foetus itself.

Screening for the presence of one or more amino acid sequence variants in a test sample has a diagnostic and/or prognostic use, for instance in determining IDDM or other disease susceptibility.

There are various methods for determining the presence or absence in a test sample of a particular polypeptide, such as the polypeptide with the amino acid sequence shown in any figure herein or an amino acid sequence mutant, variant or allele thereof.

A sample may be tested for the presence of a binding partner for a specific binding member such as an antibody (or mixture of antibodies), specific for one or more particular variants of the polypeptide shown in a figure herein. A sample may be tested for the presence of a binding partner for a specific binding member such as an antibody (or mixture of antibodies), specific for the polypeptide shown in a figure herein. In such cases, the sample may be tested by being contacted with a specific binding member such as an antibody under appropriate conditions for specific binding, before binding is determined, for instance using a reporter system as discussed. Where a panel of antibodies is used, different reporting labels may be employed for each antibody so that binding of each can be determined.

A specific binding member such as an antibody may be used to isolate and/or purify its binding partner polypeptide from a test sample, to allow for sequence and/or biochemical analysis of the polypeptide to determine whether it has the sequence and/or properties of the polypeptide whose sequence is disclosed herein, or if it is a mutant or variant form. Amino acid sequencing is routine in the art using automated sequencing machines.

A test sample containing one or more polypeptides may be provided for example as a crude or partially purified cell or cell lysate preparation, e.g. using tissues or cells, such as from saliva, faeces, or preferably blood, or for pre-natal testing from the amnion, placenta or foetus itself.

Whether it is a polypeptide, antibody, peptide, nucleic acid molecule, small molecule or other pharmaceutically useful compound according to the present invention that is to be given to an individual, administration is preferably in a "prophylactically effective amount" or a "therapeutically effective amount" (as the case may be, although prophylaxis may be considered therapy), this being sufficient to show benefit to the individual. The actual amount administered, and rate and time-course of administration, will depend on the nature and severity of what is being treated. Prescription of treatment, e.g. decisions on dosage etc, is within the responsibility of general practitioners and other medical doctors.

A composition may be administered alone or in combination with other treatments, either simultaneously or sequentially dependent upon the condition to be treated.

Pharmaceutical compositions according to the present invention, and for use in accordance with the present invention, may include, in addition to active ingredient, a pharmaceutically acceptable excipient, carrier, buffer, stabiliser or other materials well known to those skilled in the art. Such materials should be non-toxic and should not interfere with the efficacy of the active ingredient. The precise nature of the carrier or other material will depend on the route of administration, which may be oral, or by injection, e.g. cutaneous, subcutaneous or intravenous.

Pharmaceutical compositions for oral administration may be in tablet, capsule, powder or liquid form. A tablet may include a solid carrier such as gelatin or an adjuvant. Liquid pharmaceutical compositions generally include a liquid carrier such as water, petroleum, animal or vegetable oils, mineral oil or synthetic oil. Physiological saline solution, dextrose or other saccharide solution or glycols such as ethylene glycol, propylene glycol or polyethylene glycol may be included.

For intravenous, cutaneous or subcutaneous injection, or injection at the site of affliction, the active ingredient will be in the form of a parenterally acceptable aqueous solution which is pyrogen-free and has suitable pH, isotonicity and stability. Those of relevant skill in the art are well able to prepare suitable solutions using, for example, isotonic vehicles such as Sodium Chloride Injection, Ringer's Injection, or Lactated Ringer's Injection. Preservatives, stabilisers, buffers, antioxidants and/or other additives may be included, as required.

Targeting therapies may be used to deliver the active agent more specifically to certain types of cell, by the use of targeting systems such as antibody or cell specific ligands. Targeting may be desirable for a variety of reasons; for example if the agent is unacceptably toxic, or if it would otherwise require too high a dosage, or if it would not otherwise be able to enter the target cells. Instead of administering an agent directly, it may be produced in target cells by expression from an encoding gene introduced into the cells, e.g. in a viral vector (see below). The vector may be targeted to the specific cells to be treated, or it may contain regulatory elements which are switched on more or less selectively by the target cells. Viral vectors may be targeted using specific binding molecules, such as a sugar, glycolipid or protein such as an antibody or binding fragment thereof. Nucleic acid may be targeted by means of linkage to a protein ligand (such as an antibody or binding fragment thereof) via polylysine, with the ligand being specific for a receptor present on the surface of the target cells.

An agent may be administered in a precursor form, for conversion to an active form by an activating agent produced in, or targeted to, the cells to be treated. This type of approach is sometimes known as ADEPT or VDEPT; the former involving targeting the activating agent to the cells by conjugation to a cell-specific antibody, while the latter involves producing the activating agent, e.g. an enzyme, in a vector by expression from encoding DNA in a viral vector (see for example, EP-A-415731 and WO 90/07936).

Nucleic acid according to the present invention, e.g. encoding the authentic biologically active SAPL polypeptide or a functional fragment thereof, may be used in a method of gene therapy, to treat a patient who is unable to synthesize the active polypeptide or unable to synthesize it at the normal level, thereby providing the effect provided by the wild-type with the aim of treating and/or preventing one or more symptoms of IDDM and/or one or more other diseases.

Vectors such as viral vectors have been used to introduce genes into a wide variety of different target cells. Typically the vectors are exposed to the target cells so that transfection can take place in a sufficient proportion of the cells to provide a useful therapeutic or prophylactic effect from the expression of the desired polypeptide. The transfected nucleic acid may be permanently incorporated into the genome of each of the targeted cells, providing long lasting effect, or alternatively the treatment may have to be repeated periodically.

A variety of vectors, both viral vectors and plasmid vectors, are known in the art, see e.g. US Patent No. 5,252,479 and WO 93/07282. In particular, a number of viruses have been used as gene transfer vectors, including adenovirus, papovaviruses, such as SV40, vaccinia virus, herpesviruses, including HSV and EBV, and retroviruses, including gibbon ape leukaemia virus, Rous Sarcoma Virus, Venezuelan equine encephalitis virus, Moloney murine leukaemia virus and murine mammary tumour virus. Many gene therapy protocols in the prior art have used disabled murine retroviruses.

Disabled virus vectors are produced in helper cell lines in which genes required for production of infectious viral particles are expressed. Helper cell lines are generally missing a sequence which is recognised by the mechanism which packages the viral genome and produce virions which contain no nucleic acid. A viral vector which contains an intact packaging signal along with the gene or other sequence to be delivered (e.g. encoding the SAPL polypeptide or a fragment thereof) is packaged in the helper cells into infectious virion particles, which may then be used for the gene delivery.

Other known methods of introducing nucleic acid into cells include electroporation, calcium phosphate co-precipitation, mechanical techniques such as microinjection, transfer mediated by liposomes and direct DNA uptake and receptor-mediated DNA transfer. Liposomes can encapsulate RNA, DNA and virions for delivery to cells. Depending on factors such as pH, ionic strength and divalent cations being present, the composition of liposomes may be tailored for targeting of particular cells or tissues. Liposomes include phospholipids and may include lipids and steroids and the composition of each such component may be altered. Targeting of liposomes may also be achieved using a specific binding pair member such as an antibody or binding fragment thereof, a sugar or a glycolipid.

The aim of gene therapy using nucleic acid encoding the polypeptide, or an active portion thereof, is to increase the amount of the expression product of the nucleic acid in cells in which the wild-type polypeptide is absent or present only at reduced levels. Such treatment may be therapeutic or prophylactic, particularly in the treatment of individuals known through screening or testing to have an IDDM4 susceptibility allele and hence a predisposition to the disease.

Similar techniques may be used for anti-sense regulation of gene expression, e.g. targeting an antisense nucleic acid molecule to cells in which a mutant form of the gene is expressed, the aim being to reduce production of the mutant gene product. Other approaches to specific down-regulation of genes are well known, including the use of ribozymes designed to cleave specific nucleic acid sequences. Ribozymes are nucleic acid molecules, actually RNA, which specifically cleave single-stranded RNA, such as mRNA, at defined sequences, and their specificity can be engineered. Hammerhead ribozymes may be preferred because they recognise base sequences of about 11-18 bases in length, and so have greater specificity than ribozymes of the Tetrahymena type which recognise sequences of about 4 bases in length, though the latter type of ribozymes are useful in certain circumstances. References on the use of ribozymes include Marschall, et al. Cellular and Molecular Neurobiology, 1994. 14(5): 523; Haseloff, Nature 334: 585 (1988) and Cech, J. Amer. Med. Assn., 260: 3030 (1988).

Aspects of the present invention will now be illustrated with reference to the accompanying figures described already above and experimental exemplification, by way of example and not limitation. Further aspects and embodiments will be apparent to those of ordinary skill in the art. All documents mentioned in this specification are hereby incorporated herein by reference.

EXAMPLE 1

IDENTIFICATION OF IDDM4 EST4 (SAPL)

Construction of Libraries for Shotgun Sequencing

DNA was prepared from BAC (Bacterial Artificial Chromosome) clones 14-1-15 and 25-e-5. Cells containing either BAC vector were streaked on Luria-Bertani (LB) agar plates supplemented with the appropriate antibiotic. A single colony was used to inoculate 200 ml of LB media supplemented with the appropriate antibiotic and grown overnight at 37°C. The cells were pelleted by centrifugation and plasmid DNA was prepared by following the QIAGEN (Chatsworth, CA) Tip500 Maxi plasmid/cosmid purification protocol with the following modifications: the cells from 100 ml of culture were used for each Tip500 column, the NaCl concentration of the elution buffer was increased from 1.25M to 1.7M, and the elution buffer was heated to 65°C.

Purified BAC and PAC DNA was digested with Not I restriction endonuclease and then subjected to pulse field gel electrophoresis using a CHEF Mapper system (BioRad, Richmond, CA). The digested DNA was electrophoresed overnight in a 1% low melting temperature agarose (BioRad, Richmond, CA) gel that was prepared with 0.5X Tris Borate EDTA (10X stock solution, Fisher, Pittsburgh, PA). The CHEF Mapper autoalgorithm default settings were used for switching times and voltages. Following electrophoresis the gel was stained with ethidium bromide (Sigma, St. Louis, MO) and visualized with an ultraviolet transilluminator. The insert band(s) was excised from the gel. The DNA was eluted from the gel slice by beta-Agarase (New England Biolabs, Beverly, MA) digestion according to the manufacturer's instructions. The solution containing the DNA and digested agarose was brought to 50 mM Tris pH 8.0, 15 mM MgCl2, and 25% glycerol in a volume of 2 ml and placed in an AERO-MIST nebulizer (CIS-US, Bedford, MA). The nebulizer was attached to a nitrogen gas source and the DNA was randomly sheared at 10 psi for 30 sec. The sheared DNA was ethanol precipitated and resuspended in TE (10 mM Tris, 1 mM EDTA). The ends were made blunt by treatment with Mung Bean Nuclease (Promega, Madison, WI) at 30°C for 30 min, followed by phenol/chloroform extraction, and treatment with T4 DNA polymerase (GIBCO/BRL, Gaithersburg, MD) in multicore buffer (Promega, Madison, WI) in the presence of 40 uM dNTPs at 16°C. To facilitate subcloning of the DNA fragments, BstX I adapters (Invitrogen, Carlsbad, CA) were ligated to the fragments at 14°C overnight with T4 DNA ligase (Promega, Madison, WI). Adapters and DNA fragments less than 500 bp were removed by column chromatography using a cDNA sizing column (GIBCO/BRL, Gaithersburg, MD) according to the instructions provided by the manufacturer. Fractions containing DNA greater than 1 kb were pooled and concentrated by ethanol precipitation.
The DNA fragments containing BstX I adapters were ligated into the BstX I sites of pSHOT II, which was constructed by subcloning the BstX I sites from pcDNA II (Invitrogen, Carlsbad, CA) into the BssH II sites of pBlueScript (Stratagene, La Jolla, CA). pSHOT II was prepared by digestion with BstX I restriction endonuclease and purified by agarose gel electrophoresis. The gel purified vector DNA was extracted from the agarose by following the Prep-A-Gene (BioRad, Richmond, CA) protocol. To reduce ligation of the vector to itself, the digested vector was treated with calf intestinal phosphatase (GIBCO/BRL, Gaithersburg, MD). Ligation reactions of the DNA fragments with the cloning vector were transformed into ultra-competent XL-2 Blue cells (Stratagene, La Jolla, CA), and plated on LB agar plates supplemented with 100 ug/ml ampicillin. Individual colonies were picked into a 96 well plate containing 100 ul/well of LB broth supplemented with ampicillin and grown overnight at 37°C. Approximately 25 ul of 80% sterile glycerol was added to each well and the cultures stored at -80°C.

Preparation of plasmid DNA

Glycerol stocks were used to inoculate 5 ml of LB broth supplemented with 100 ug/ml ampicillin either manually or by using a Tecan Genesis RSP 150 robot (Tecan AG, Hombrechtikon, Switzerland) programmed to inoculate 96 tubes containing 5 ml broth from the 96 wells. The cultures were grown overnight at 37°C with shaking to provide aeration. Bacterial cells were pelleted by centrifugation, the supernatant decanted, and the cell pellet stored at -20°C. Plasmid DNA was prepared with a QIAGEN Bio Robot 9600 (QIAGEN, Chatsworth, CA) according to the Qiawell Ultra protocol. To test the frequency and size of inserts, plasmid DNA was digested with the restriction endonuclease Pvu II. The size of the restriction endonuclease products was examined by agarose gel electrophoresis, with the average insert size being 1 to 2 kb.

DNA Sequence Analysis of Shotgun clones

DNA sequence analysis was performed using the ABI PRISM™ dye terminator cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Perkin Elmer, Norwalk, CT). DNA sequence analysis was performed with M13 forward and reverse primers. Following amplification in a Perkin-Elmer 9600 the extension products were purified and analyzed on an ABI PRISM 377 automated sequencer (Perkin Elmer, Norwalk, CT).

Approximately 12 to 15 sequencing reactions were performed per kb of DNA to be examined; e.g. up to 1500 reactions would be performed for a BAC insert of 100 kb.
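The arithmetic behind the number of sequencing reactions can be sketched as follows. The function names are hypothetical, and the 500 bp average read length used for the coverage estimate is an assumption made only for this example; no read length is stated herein.

```python
def reactions_needed(insert_kb, per_kb=(12, 15)):
    """Return the (low, high) number of sequencing reactions for an insert."""
    low, high = per_kb
    return round(insert_kb * low), round(insert_kb * high)


def approximate_coverage(reactions_per_kb, read_length_bp):
    """Average fold coverage implied by a reaction density and a read length."""
    return reactions_per_kb * read_length_bp / 1000.0


low, high = reactions_needed(100)  # 100 kb BAC insert, as in the text
print(f"{low} to {high} reactions for a 100 kb insert")

# Assumption: 500 bp of usable sequence per reaction (not stated in the text).
print("approximate fold coverage:",
      approximate_coverage(12, 500), "to", approximate_coverage(15, 500))
```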

Assembly of DNA sequences

Phred/Phrap was used for DNA sequence assembly. This program was developed by Dr. Phil Green and licensed from the University of Washington (Seattle, WA). Phred/Phrap consists of the following programs: Phred for base-calling, Phrap for sequence assembly, Crossmatch for sequence comparisons, Consed and Phrapview for visualization of data, and Repeatmasker for screening repetitive sequences. Vector and E. coli DNA sequences were identified by Crossmatch and removed from the DNA sequence assembly process. DNA sequence assembly was on a SUN Enterprise 4000 server running Solaris 2.51 operating system (Sun Microsystems Inc., Mountain View, CA). The sequence assemblies were further analyzed using Consed and Phrapview.

Bioinformatic Analysis of Assembled DNA Sequences

The DNA sequences at various stages of assembly were queried against the DNA sequences in the GenBank database (subject) using the BLAST algorithm (S.F. Altschul, et al. (1990) J. Mol. Biol. 215, 403-410). When examining large contiguous sequences of DNA, repetitive elements were masked following identification by Crossmatch with a database of mammalian repetitive elements. Following BLAST analysis the results were compiled by a parser program. The parser provided the following information from the database for each DNA sequence having a similarity with a P value greater than 10⁻⁶: the annotated name of the sequence, the database from which it was derived, the length and percent identity of the region of similarity, and the location of the similarity in both the query and the subject.

Analysis of DNA sequences from BAC 14-1-15 revealed an EST, aa194169, which was 91% identical over 60 nucleotides. Several lines of evidence indicated that this was an authentic mRNA transcript and that this EST represented the 5' most portion of that mRNA transcript. The first piece of evidence was revealed by comparing sequences obtained from a mouse BAC clone 53-d-8 that is syntenic with BAC 14-1-15 (Figure 4). The human genomic DNA corresponding to EST aa194169 was conserved 88% over 43 bp with the mouse genomic DNA. This region of human genomic DNA exhibited a relatively high score in the promoter prediction algorithm PROMOTERSCAN (Prestridge (1995) J. Mol. Biol. 249: 923-932) and the presence of a cluster of DNA sequences that are predicted to serve as transcription factor binding sites (Figure 4). This region of human genomic DNA was predicted to be a CpG island, which is often associated with the 5' end of genes. These sequences lie approximately 11 kb downstream of the 3' end of a gene, LRP5, that we have previously characterized. Finally, DNA sequences obtained from BAC clone 25-e-5 revealed additional genomic sequences that were represented by EST aa194169, indicating the presence of an intron located between two exons. Together these data support the conclusion that the 5' portion of IDDM4 EST4 corresponds to aa194169 and that this EST sequence is derived from an authentic mRNA transcript. To isolate the open reading frame for this gene, RCCA analysis was focused on extending the sequence in the 3' direction.

Extension of IDDM4 EST4 by RCCA

The full length cDNA of one aspect of the present invention was generated by a method of cDNA screening called Reduced Complexity cDNA Analysis (RCCA). Briefly, the extension of partial cDNA sequences has historically been achieved with one or both of the two commonly used methods: filter screening of cDNA libraries by hybridization with labeled probes, and 5'- and 3'-RACE with total cellular mRNA by PCR. The first method is effective but laborious and slow while the latter method is fast but limited in efficiency. The RACE protocol is hindered by limited length of extension due to the use of the entire cellular mRNA population in a single reaction. Since smaller fragments are amplified much more efficiently than larger fragments by PCR in the same reaction, PCR products obtained using the second method are often quite small.

The RCCA method improves upon known methods of cDNA library screening by initially constructing and subdividing cDNA libraries followed by isolating 5'- and 3'- flanking fragments by PCR. Since each pool is unlikely to contain more than one clone for a given gene which is low to moderately expressed, competition between large and small PCR products in one pool does not exist, making it possible to isolate fragments of various sizes. One definite advantage of the method as described herein is the efficiency, throughput, and its potential to isolate alternatively spliced cDNA forms.

The RCCA process provides for rapid extension of a partial cDNA sequence based on subdividing a primary cDNA library and DNA amplification by polymerase chain reaction (PCR). A cDNA library is constructed with cDNA primed by random, oligo-dT or a combination of both random and oligo-dT primers and then subdivided into pools at approximately 10,000-20,000 clones per pool that are stored in a 96-well plate. Each pool (well) is amplified separately and therefore represents an independent portion of the cDNA molecules from the original mRNA source.

The fundamental principle of the RCCA process is to subdivide a complex library into superpools of about 10,000 to about 20,000 clones. A library of two million primary clones, a number large enough to cover most mRNA transcripts expressed in the tissue, can be subdivided into 188 pools and stored in two 96-well plates. Since the number of transcripts for most genes is fewer than one copy per ~10,000 transcripts in total cellular mRNA, each pool is unlikely to contain more than one clone for a given cDNA sequence. Such reduced complexity makes it possible to use PCR to isolate flanking fragments of partial cDNA sequences larger than those obtained by known methods.
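The pooling arithmetic above can be sketched in Python. The Poisson model and the example abundance of one transcript per 100,000 are assumptions introduced only for illustration; the specification states only that most genes are represented at fewer than one copy per approximately 10,000 transcripts.

```python
import math


def clones_per_pool(library_size, n_pools):
    """Average number of primary clones placed in each pool."""
    return library_size / n_pools


def prob_more_than_one(pool_size, transcript_frequency):
    """Poisson estimate of P(pool holds two or more clones of one transcript).

    Assumption: clones of a given transcript fall into pools independently,
    so the count per pool is approximately Poisson with mean
    pool_size * transcript_frequency.
    """
    lam = pool_size * transcript_frequency
    return 1.0 - math.exp(-lam) * (1.0 + lam)


pool_size = clones_per_pool(2_000_000, 188)  # figures taken from the text
print(f"clones per pool: {pool_size:.0f}")

# Hypothetical abundance: one copy per 100,000 transcripts.
p = prob_more_than_one(pool_size, 1e-5)
print(f"P(two or more clones in one pool) = {p:.4f}")
```

Under this model the chance of competing clones within a pool rises with transcript abundance, which is consistent with the statement that the approach is best suited to genes expressed at low to moderate levels.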

The skilled artisan, aided with this specification, will understand the far reaching cDNA cloning process disclosed herein: multiple primer combinations from an EST or other partial cDNA sequence, in combination with flanking vector primer oligonucleotides, can be used to "walk" in both directions away from the internal, gene specific, sequence, and respective primers, such that a contig representing a full length cDNA can be constructed. In this particular case, the 5' end of the cDNA was represented by an EST with GenBank accession number aa194169. Therefore the RCCA procedure was only employed to obtain sequences 3' of EST aa194169. This procedure relies on the ability to screen multiple pools which comprise a representative portion of the total cDNA library. This procedure is not dependent upon using a cDNA library with directionally cloned inserts. Instead, both 5' and 3' vector and gene specific primers are added and a contig map is constructed from additional screening of positive pools using both vector primers and gene specific primers. Of course, these gene specific primers are initially constructed from a known nucleic acid fragment such as an expressed sequence tag. However, as the walk continues, gene specific primers are utilized from the 5' and 3' boundaries of the newly identified regions of the cDNA. As the walk continues, there is still no requirement that the vector orientation of an as yet unidentified fragment be known. Instead, all combinations are tested on a positive pool and the actual vector orientation is determined by the ability of certain vector/gene specific primers to generate the predicted PCR fragment. A full-length cDNA may then be constructed by known subcloning procedures.

RCCA was used to extend the partial cDNA sequence originally identified by similarity between EST aa194169 and BAC 14-1-15 genomic DNA sequences. Positive pools containing the cDNA sequence were identified by PCR using a pair of primers, 4dest4 3f and 4dest4 1r, at a final concentration of 0.15 uM, which generate a PCR product of 377 nucleotides. This product was obtained by 40 amplification cycles of denaturation at 94°C for 30 sec, primer annealing at 60°C for 30 sec, and product extension at 68°C for 1 min. The DNA polymerase used for PCR amplification was the enzyme TaqGold (Perkin-Elmer, Norwalk, CT) and the reaction volume was 10 ul. The PCR template was RCCA pools from size selected libraries >2.5 kb from prostate and testis. Positive pools were identified by detection of the 377 bp product by agarose (2%) gel electrophoresis. Each positive pool in the library contains an independent clone of the cDNA sequence; within each clone are embedded the partial cDNA sequence and its flanking fragments. The flanking fragments are isolated by PCR with primers complementary to the known vector and cDNA sequences and then sequenced directly. To extend the cDNA clone in the 3' direction the primers 4dest4 3f and 4dest4 6f were used in combination with the vector primers, 543R and 873F, in a primary reaction using Taqara LA (Panvera, Madison, WI). The amplification conditions were 20 cycles of denaturation at 94°C for 30 sec, primer annealing at 60°C for 30 sec and extension for 4 min at 68°C in a 10 ul reaction volume. The primary reaction was diluted by adding 9 parts water and an aliquot was removed for a second PCR reaction containing primer 4dest 7f (for 4dest4 6f primary reactions) and 4dest4f (for 4dest4 3f primary reactions). The secondary reactions were amplified for 25 cycles using Taqara LA as described above. The DNA sequences from these fragments were assembled with the original partial cDNA sequence to generate a continuous cDNA fragment of 2482 nucleotides.
The longest clone obtained by RCCA provided an extension of 1.8 kb. The cDNA sequence of 2482 nucleotides was used to search the GenBank database using the BLAST algorithm. This resulted in the identification of a Unigene EST cluster represented by GenBank Accession number aa193106. A number of EST sequences that were present in this Unigene cluster were assembled to produce approximately 1.4 kb of cDNA sequence. PCR primers were then designed based on the Unigene cluster to link these sequences to the DNA sequences identified by RCCA. This resulted in the identification of 4.8 kb of cDNA sequence that contains an open reading frame of 2382 nucleotides which encodes a protein of 794 amino acids.
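As a rough illustration of the cycling programmes described above, the following Python sketch records each programme and the time spent in the stated cycle steps. The class and attribute names are hypothetical; the secondary reaction is assumed to use the same step times as the primary reaction, and ramp times, initial enzyme activation and final holds are not stated herein and are therefore excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    cycles: int
    denature_s: int  # seconds at 94 deg C
    anneal_s: int    # seconds at 60 deg C
    extend_s: int    # seconds at 68 deg C

    def cycling_minutes(self):
        """Time spent in the cycle steps only (no ramping or holds)."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.cycles * per_cycle / 60


screening = CyclingProgram(cycles=40, denature_s=30, anneal_s=30, extend_s=60)
primary = CyclingProgram(cycles=20, denature_s=30, anneal_s=30, extend_s=240)
# Assumption: secondary reactions reuse the primary step times for 25 cycles.
secondary = CyclingProgram(cycles=25, denature_s=30, anneal_s=30, extend_s=240)

for name, prog in [("screening", screening),
                   ("primary", primary),
                   ("secondary", secondary)]:
    print(f"{name}: {prog.cycling_minutes():.0f} min in cycle steps")
```

The longer extension step in the primary and secondary reactions reflects the aim of recovering flanking fragments of several kilobases rather than the 377 nucleotide screening product.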

One of the RCCA clones, clone 33, diverged from the other sequences after nucleotide 2608 to form isoform (b). The divergent sequence in isoform (b) is identical to isoform (a) from nucleotide 4172 to 4682. Therefore this sequence likely represents an alternatively spliced mRNA transcript in which isoform (a) nucleotides 2609-4171 are missing. Isoform (b) contains an open reading frame which encodes a protein of 791 amino acids of which the first 776 amino acids are identical to isoform (a).
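The relationship between the two isoforms can be pictured as the removal of a coordinate range from the longer sequence. The sketch below uses a hypothetical helper, `skip_region`, and a toy sequence; only the coordinates 2609 and 4171 are taken from the text, and the actual cDNA sequences are those of the figures herein.

```python
def skip_region(seq, first, last):
    """Remove positions first..last (1-based, inclusive) from a sequence."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("region must lie within the sequence")
    return seq[:first - 1] + seq[last:]


# Length of the region stated to be absent from isoform (b).
missing = 4171 - 2609 + 1
print("nucleotides missing from isoform (b):", missing)

# Toy demonstration on a 16-base sequence (not SAPL sequence).
toy_isoform_a = "AAAACCCCGGGGTTTT"
toy_isoform_b = skip_region(toy_isoform_a, 5, 12)
print(toy_isoform_b)
```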

Identification of polymorphisms in SAPL

The process of RCCA generates clones that may differ in origin, i.e. the mRNA used to synthesize the cDNA (e.g. testis mRNA) may be derived from an individual heterozygous for the SAPL locus or may be from a pool of different individuals. Therefore polymorphisms between different RCCA clones may represent true differences; alternatively these differences may arise from PCR mistakes or from errors that are made by the DNA polymerase during the propagation of vectors containing SAPL inserts in E. coli. To discriminate against these types of errors, polymorphisms were only noted when detected in more than one clone and where the sequence quality was excellent. All the candidate polymorphisms that were detected lie in the putative 3' untranslated portion of the cDNA and thus have no effect on the encoded protein.
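The filtering rule described above (a difference is noted only when seen in more than one clone with excellent sequence quality) can be sketched as follows. The function name, the tuple layout, the example observations and the quality cut-off of 30 are all assumptions for illustration; no numerical quality threshold is given herein.

```python
from collections import defaultdict


def candidate_polymorphisms(observations, min_clones=2, min_quality=30):
    """Keep variants supported by several independent high-quality clones.

    observations: iterable of (clone_id, position, base, quality) tuples,
    one per difference from the consensus seen in a clone.
    """
    support = defaultdict(set)
    for clone_id, position, base, quality in observations:
        if quality >= min_quality:  # discard low-quality base calls
            support[(position, base)].add(clone_id)
    return sorted(key for key, clones in support.items()
                  if len(clones) >= min_clones)


# Hypothetical observations (not data from the specification).
observations = [
    ("clone_1", 3100, "T", 40),
    ("clone_2", 3100, "T", 38),  # second clone confirms 3100 T
    ("clone_3", 3500, "G", 45),  # seen once only: likely PCR or cloning error
    ("clone_4", 4000, "A", 12),  # low quality, ignored
    ("clone_5", 4000, "A", 41),  # only one high-quality observation remains
]
print(candidate_polymorphisms(observations))
```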

Northern Blot Analysis

Primers 4dest4 2f and 4dest4 2r (Table 2) were used to amplify a PCR product of 957 bp from placenta, testis, thymus or lymph node cDNA. These products were purified on an agarose gel, the DNA extracted, and subcloned into pCR2.1 (Invitrogen, Carlsbad, CA). The 957 bp probe was labeled by random priming with the Amersham Rediprime kit (Arlington Heights, IL) in the presence of 50-100 uCi of 3000 Ci/mmole [alpha ³²P]dCTP (Dupont/NEN, Boston, MA). Unincorporated nucleotides were removed with a ProbeQuant G-50 spin column (Pharmacia/Biotech, Piscataway, NJ). The radiolabeled probe at a concentration of greater than 1 x 10⁶ cpm/ml in rapid hybridization buffer (Clontech, Palo Alto, CA) was incubated overnight at 65°C with human multiple tissue Northern blots I and II (Clontech, Palo Alto, CA). The blots were washed by two 15 min incubations in 2X SSC, 0.1% SDS (prepared from 20X SSC and 20% SDS stock solutions, Fisher, Pittsburgh, PA) at room temperature, followed by two 15 min incubations in 1X SSC, 0.1% SDS at room temperature, and two 30 min incubations in 0.1X SSC, 0.1% SDS at 60°C. Autoradiography of the blots was done to visualize the bands that specifically hybridized to the radiolabeled probe.

The expression pattern in a number of tissues was examined by Northern blot analysis. Two distinct bands were detected, one of approximately 4.9 kb and the second of approximately 4.1 kb. In most tissues the predominant band is the larger 4.9 kb band; however, in the testis the lower band of approximately 4.1 kb is the predominant one. This lower band may indicate an alternatively spliced form that differs from SAPLb which may be investigated in the testis. The first band likely corresponds to the SAPLa cDNA for which the sequence of 4793 nucleotides has been determined. The second band may correspond to SAPLb for which the sequence of 3228 nucleotides has been determined. Alternatively, the approximately 4.1 kb band may correspond to an as yet unidentified alternatively spliced form, in which case SAPLb would be a rare alternatively spliced transcript. The highest level of SAPL expression is seen in skeletal muscle, placenta, heart, pancreas and testis. Detectable expression is also observed in brain, lung, liver, kidney, spleen, thymus, prostate, small intestine, colon, and leukocytes. No detectable expression is seen in ovary.

Identification of intron/exon boundaries for SAPL

The program Crossmatch, which uses the Smith-Waterman algorithm, was used to compare SAPL cDNA sequences with BAC 14-1-15 and BAC 25-e-5 genomic sequences. This identifies the boundaries for the first five exons of SAPL, which correspond to the first 865 nucleotides of the cDNA sequence (Table 3).
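For orientation, a textbook Smith-Waterman local alignment is sketched below in Python. This is not the Crossmatch implementation; the scoring values (match +2, mismatch -1, linear gap -2), the function name and the toy sequences are assumptions for the example. Aligning a cDNA segment against genomic sequence in this way returns the genomic interval of the best-matching exon.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, (a_start, a_end), (b_start, b_end)) of the best local
    alignment, with 0-based half-open intervals."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_a), (j, end_b)


# Toy example: an 8-base "exon" embedded in flanking genomic sequence.
cdna = "ACGTACGT"
genomic = "TTTTACGTACGTTTTT"
print(smith_waterman(cdna, genomic))
```

Repeating the alignment after masking each exon found would walk along the cDNA exon by exon; the positions at which consecutive local alignments end and begin in the genomic sequence mark candidate intron/exon boundaries.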

Isolation of other species homologs of SAPL gene

The SAPL genes from different species, e.g. rat, dog, are isolated by screening of a cDNA library with portions of the gene that have been obtained from cDNA of the species of interest using PCR primers designed from the human sequence. Degenerate PCR is performed by designing primers of 17-20 nucleotides with 32-128 fold degeneracy by selecting regions that code for amino acids that have low codon degeneracy, e.g. Met and Trp. When selecting these primers preference is given to regions that are conserved in the protein, e.g. the motifs shown herein. PCR products are analyzed by DNA sequence analysis to confirm their similarity to the human sequence. The correct product is used to screen cDNA libraries by colony or plaque hybridization at high stringency. Alternatively probes derived directly from the human gene are utilized to isolate the cDNA sequence of SAPL from different species by hybridization at reduced stringency.
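The selection of low-degeneracy regions for degenerate primers can be sketched as follows. The codon counts are those of the standard genetic code; the function names and the toy peptide are hypothetical, and degeneracy is approximated here as the product of codon counts per residue, which slightly understates the degeneracy of a primer written with IUPAC codes for Leu, Arg and Ser.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def fold_degeneracy(peptide):
    """Number of distinct coding sequences for a peptide."""
    fold = 1
    for residue in peptide:
        fold *= CODON_COUNT[residue]
    return fold


def primer_windows(protein, residues=6, max_fold=128):
    """List (offset, peptide, fold) for windows at or below max_fold.

    Six residues correspond to 18 nucleotides, within the 17-20 nucleotide
    range given in the text.
    """
    hits = []
    for start in range(len(protein) - residues + 1):
        window = protein[start:start + residues]
        fold = fold_degeneracy(window)
        if fold <= max_fold:
            hits.append((start, window, fold))
    return hits


toy_protein = "LSRMWDEFKL"  # hypothetical sequence, not SAPL
for hit in primer_windows(toy_protein):
    print(hit)
```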

Use of the SAPL cDNA sequence to search the GenBank database using the FASTA algorithm revealed mouse EST AA684416, which is 93% identical to the SAPL cDNA sequence from 590 to 1080. This is likely the mouse ortholog of human SAPL. It and other mouse ESTs such as aa435418, which is 86% identical from 2888 to 3348, are used in the isolation of the mouse SAPL cDNA either by a PCR based or nucleic acid hybridization based strategy.

EXAMPLE 2

Association with diabetes

Type 1 diabetes is a multifactorial disorder, with the genetic component being oligo- or polygenic. Two loci have been identified as conferring susceptibility to type 1 diabetes by candidate gene approaches. The main locus is encoded by the major histocompatibility complex (MHC) on chromosome 6p (IDDM1) (Morton, N., et al. (1983) Am J Hum Genet 35, 201-213; Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448), with the second locus, IDDM2, being the insulin minisatellite or variable number of tandem repeats (VNTR) on chromosome 11p (Bennett, S., et al (1995) Nature Genet 9, 284-292). These two loci alone, however, cannot account for the observed degree of familial clustering of disease, where λ_s=15 (λ_s = sibling risk/population prevalence); IDDM1 and IDDM2 have λ_s=3 and 1.25 respectively, accounting for 50% of familial clustering (Morton, N., et al. (1983) Am J Hum Genet 35, 201-213; Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448; Bennett, S., et al (1995) Nature Genet 9, 284-292; Risch, N. (1987) Am J Hum Genet 40, 1-14). A positional cloning approach was therefore undertaken to identify the other loci: a genome wide scan for linkage suggested another 18 possible regions (Davies, J. et al. (1994) Nature 371, 130-136), including IDDM4 on chromosome 11q13 (MLS 3.4, p<0.0001 at FGF3). This locus was subsequently confirmed at levels of genome-wide significance (p<2 x 10^-5) (Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448; Luo, D-F., et al. (1996) Hum Mol Genet 5, 693-698).
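The statement that IDDM1 and IDDM2 together account for about 50% of familial clustering can be reproduced by a short calculation, assuming that the loci combine multiplicatively, so that locus-specific λ_s values multiply and their contributions add on a logarithmic scale. That model is an assumption made here for illustration (the passage does not state which model was used), and the function name is hypothetical.

```python
from math import log

LAMBDA_TOTAL = 15.0                            # overall sibling risk ratio from the text
LAMBDA_LOCI = {"IDDM1": 3.0, "IDDM2": 1.25}    # locus-specific values from the text


def fraction_explained(loci, total):
    """Share of log(total lambda_s) explained by the listed loci.

    Assumes a multiplicative model: lambda values multiply across loci,
    so each locus contributes log(lambda_locus) / log(lambda_total).
    """
    return sum(log(value) for value in loci.values()) / log(total)


SHARE = fraction_explained(LAMBDA_LOCI, LAMBDA_TOTAL)  # roughly 0.49
```

Under this assumption the two loci explain a little under half of the clustering, which is in line with the 50% figure and leaves the remainder to loci such as IDDM4.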

To investigate the extent of linkage within this region, 704 multiplex families (426 UK, 236 US, 32 Norway, 39 Italy) were analysed with 19 microsatellite markers in a 25cM interval spanning FGF3. A multipoint linkage curve was produced (MAPMAKER/SIBS; Kruglyak, L and Lander, E. (1995) Am J Hum Genet 57, 439-454) with a peak MLS=2.8 (p<0.0003) at D11S1889 (Figure 5), indicating that IDDM4 was localised to within the 18cM interval D11S903 to D11S534 (Nakagawa, Y., et al (1997) Fine mapping of a Type 1 Diabetes Susceptibility Gene (IDDM4) on Chromosome 11q13. Hum. Mol. Genet. Submitted).

Multipoint linkage analysis cannot localise the gene to a small region. Instead, association mapping, which has been used for rare single gene traits, can narrow the interval to less than 2cM or 2Mb. In theory, associations of a particular allele very close to the founder mutation will be detected in populations descended from that founder. The transmission disequilibrium test (TDT; Spielman, R., et al (1993) Am J Hum Genet 52, 506-516) assesses the deviation from 50% of the transmission of alleles of a marker locus from parents to affected children. A strategy was undertaken with the IDDM4 linkage region, using TDT, to detect linkage in the presence of association; the same strategy had previously been used to fine map the putative IDDM6 locus on chromosome 18q21 (Merriman, T. et al. (1997) Hum. Mol. Genet. 6, 1003-1010). TDT analysis of 658 UK and US families showed a deviation in transmission of alleles of four loci. Analysis of the three most common alleles, with p_uncorrected<0.05, gave: D11S4205, 54% transmission, p=0.03; D11S1783, 58% transmission, p=0.0005; D11S1189, 46% transmission, p=0.05; H0570POLYA, 54% transmission, p=0.01. The multiallelic T_sp test, which is a test for association of loci with multiple alleles (Martin, E. et al. (1997) Am. J. Hum. Genet. 61, 439-448), was undertaken on these loci. This confirmed the results with D11S4205 (T_sp=17.5, p=0.01), D11S1783 (T_sp=23.6, p=0.0001) and H0570POLYA (T_sp=12.4, p=0.03). D11S4205 (proximal) and D11S1783 (distal) are approximately 1Mb apart, and so may be showing association with one locus. H0570POLYA is approximately 3Mb distal to D11S1783 and therefore may be showing association with a second locus. Figure 6 shows the LOD score of the T_sp analysis (-log of the p value).

Further analysis of H0570POLYA in 2042 families with type 1 diabetes confirmed the association observed between this marker and type 1 diabetes (2x2 test of heterogeneity for affected versus unaffected siblings, p_corrected<4.8 x 10^-5) (Nakagawa, Y., et al (1997) Hum. Mol. Genet. Submitted).
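For a single allele, the TDT reduces to a McNemar-type chi-square on the numbers of times heterozygous parents transmit or do not transmit that allele to affected children. The sketch below is illustrative only: the counts are hypothetical (the text reports percentages and p values, not raw counts), the function names are not taken from the source, and the multiallelic T_sp statistic is not implemented.

```python
from math import erfc, log10, sqrt


def tdt(transmitted, untransmitted):
    """Return (chi-square, p value) for the single-allele TDT.

    chi-square = (b - c)^2 / (b + c) with 1 degree of freedom, where b and c
    are transmissions and non-transmissions from heterozygous parents.
    """
    total = transmitted + untransmitted
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a 1-df chi-square equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


def neg_log10(p_value):
    """Express a p value on the -log scale, as done for the Figure 6 style of plot."""
    return -log10(p_value)


# Hypothetical counts giving 58% transmission (290 of 500).
CHI2, P_VALUE = tdt(290, 210)
```

With equal transmission and non-transmission counts the statistic is zero and the p value is one, which corresponds to the 50% expectation under no linkage or association.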

As association of a particular allele of a marker with the disease is likely to occur when marker and disease mutation are close (within 2Mb), genes within this interval are candidates. The SAPL gene is within 200kb of H0570POLYA, in a region showing strong association with IDDM; hence single nucleotide polymorphisms within this gene and its regulatory regions are candidates for the aetiological mutation IDDM4.

TABLE 1 OLIGONUCLEOTIDE PRIMERS

4dest4 1f    (-26)    CCGCCTGAGCGCAACTAG
4dest4 2f    (0)      TCGTGGGCACCTCCAGATAAG
4dest4 5f    (24)     ACAAGCTCAGAGAGATGTGGTG
4dest4 3f    (66)     AACTTCCTCGGCCATATGG
4dest4 4F    (120)    GGGAGAGCTTGTTTCATATCC
4dest4 3r    (216)    TCTTCTTTGTGGCTCCTTGC
4dest4 1r    (443)    CGGTTCTGAGCTTTACATTCC
4dest4 6f    (565)    GGGAGAAGATGAATCCTTGC
4dest4 7f    (619)    CCCTTTGAATCCACTACTTGC
4dest4 2r    (957)    ATTTGTTGCTCAGGCTCCTG
4dest4 8f    (1065)   CAGCCATAGTCAGTGCAATCC
4dest4 4r    (1067)   TGGATTGCACTGACTATGGC
4dest4 11f   (1497)   TGGGACACCTAACGAGGATAGC
4dest4 9F    (1582)   AGATCCTCCGACGAAGTCAG
4dest4 5r    (1602)   CTGACTTCGTCGGAGGATCT
4dest4 6r    (1765)   GCCAATGTCATCTTGATCTGC
4dest4 10f   (2012)   CAAGACTTGTTTGAACCCAGC (PAIR WITH 8R)
4dest4 7r    (2189)   TCTCTTTAGTTGGCATCGGC
4dest4 8r    (2391)   CTTTCTGCATCCTCCTCTCC
4dest4 12f   (2515)   AGATGCTGCTTGTAAAGACGC
4dest4 12r   (2643)   ACTGAAGTGTCACCTGGTGC
4dest4 14f   (2909)   GCCTGTGAAATAAGATCTTGCC
4dest4 14r   (2930)   GGCAAGATCTTATTTCACAGGC
4dest4 15r   (3376)   CAAGCAAACAAGACTTGAACAG
4dest4A 13R  (3876)   TGAGCTGTTTGAGAAGGCTG
4dest4A 11R  (4193)   AGTGCTGGAATCTCCACACC
4dest4A 13F  (4301)   TGAAGAGACTGTCCTTGGGC
4dest4A 10R  (4691)   CCCATTGTCATATCCTTTCCC
4dest4A 9R   (4786)   TTCAGTATGGCCAACACACAG

Vector Primers for RCCA

PBS.543R   GGGGATGTGCTGCAAGGCGA
PBS.578R   CCAGGGTTTTCCCAGTCACGAC
PBS.838F   TTGTGTGGAATTGTGAGCGGATAAC
PBS.873F   CCCAGGCTTTACACTTTATGCTTCC

TABLE 2

DM4E4 POLYMORPHISMS

Location   Polymorphism        5' Context    3' Context
3297       delete AAGTA        AGATTAAGTA    TTTATTGCTA
3488       G to A transition   TΠTIU ΓI C    TΠGGTAU IT
3680       G to A transition   TATTTTAAAA    TAGAAATCAA
4143       delete TTA          GTCTAATGCC    TTATTTCTGA

Nucleotide location numbers are based on DM4E4a sequence (Figure XX).

TABLE 3 DM4E4 Intron/Exon Boundaries

Exon   Size   5'           3'           Intron   Size
1      60+    -            TCCAGgtaa    1        unknown
2      106    tacagATAAG   TTGAGgtacc   2        > 13,000
3      150    tacagGAGCT   GAAAGgtaag   3        18,014
4      233    tttagACCAG   TACAAgtaag   4        6,964
5      187    tctagGTATC   AACAGgtaaa   5        3,043
6      129    tctagATTGT   AAGATgtgct   6        unknown

Exons 1-6 account for the first 820 nucleotides of DM4E4a cDNA
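The boundaries in Table 3 can be checked against the usual GT-AG rule, under which an intron begins with gt and ends with ag. The following Python sketch is illustrative only. It assumes, as the typography of the table suggests, that lower case denotes intronic sequence and upper case exonic sequence; the dictionary and function names are hypothetical.

```python
# Exon number -> (sequence at 5' end of exon, sequence at 3' end of exon), from Table 3.
# Exon 1 has no upstream boundary listed.
BOUNDARIES = {
    1: (None, "TCCAGgtaa"),
    2: ("tacagATAAG", "TTGAGgtacc"),
    3: ("tacagGAGCT", "GAAAGgtaag"),
    4: ("tttagACCAG", "TACAAgtaag"),
    5: ("tctagGTATC", "AACAGgtaaa"),
    6: ("tctagATTGT", "AAGATgtgct"),
}


def intron_bases(boundary):
    """Keep only the lower-case (assumed intronic) part of a boundary string."""
    return "".join(ch for ch in boundary if ch.islower())


def obeys_gt_ag(five_prime, three_prime):
    """True if the upstream intron ends in 'ag' and the downstream intron starts with 'gt'."""
    acceptor_ok = five_prime is None or intron_bases(five_prime).endswith("ag")
    donor_ok = intron_bases(three_prime).startswith("gt")
    return acceptor_ok and donor_ok


RESULTS = {exon: obeys_gt_ag(*ends) for exon, ends in BOUNDARIES.items()}
```

Every boundary listed in the table satisfies the rule under this reading of the case convention.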

TABLE 4 Prosite Motifs in SAPL

Residue Number   Motif
279->282         CAMP_PHOSPHO_SITE
458->461         CAMP_PHOSPHO_SITE
556->559         CAMP_PHOSPHO_SITE
23->25           PKC_PHOSPHO_SITE
133->135         PKC_PHOSPHO_SITE
278->280         PKC_PHOSPHO_SITE
421->423         PKC_PHOSPHO_SITE
456->458         PKC_PHOSPHO_SITE
554->556         PKC_PHOSPHO_SITE
651->653         PKC_PHOSPHO_SITE
655->657         PKC_PHOSPHO_SITE
706->708         PKC_PHOSPHO_SITE
11->14           CK2_PHOSPHO_SITE
15->18           CK2_PHOSPHO_SITE
23->26           CK2_PHOSPHO_SITE
171->174         CK2_PHOSPHO_SITE
202->205         CK2_PHOSPHO_SITE
214->217         CK2_PHOSPHO_SITE
233->236         CK2_PHOSPHO_SITE
274->277         CK2_PHOSPHO_SITE
304->307         CK2_PHOSPHO_SITE
315->318         CK2_PHOSPHO_SITE
339->342         CK2_PHOSPHO_SITE
351->354         CK2_PHOSPHO_SITE
360->363         CK2_PHOSPHO_SITE
362->365         CK2_PHOSPHO_SITE
366->369         CK2_PHOSPHO_SITE
452->455         CK2_PHOSPHO_SITE
537->540         CK2_PHOSPHO_SITE
563->566         CK2_PHOSPHO_SITE
569->572         CK2_PHOSPHO_SITE
571->574         CK2_PHOSPHO_SITE
628->631         CK2_PHOSPHO_SITE
642->645         CK2_PHOSPHO_SITE
651->654         CK2_PHOSPHO_SITE
660->663         CK2_PHOSPHO_SITE
666->669         CK2_PHOSPHO_SITE
697->700         CK2_PHOSPHO_SITE
744->747         CK2_PHOSPHO_SITE
772->775         CK2_PHOSPHO_SITE
293->298         MYRISTYL
561->566         MYRISTYL
717->722         MYRISTYL
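Motif listings of the kind shown in Table 4 can be produced by scanning a protein sequence with the corresponding PROSITE consensus patterns. The Python sketch below is illustrative only. The regular expressions are transcriptions of the published PROSITE patterns for these four motifs as understood here, the function name `scan` is hypothetical, and the input is one of the SAPL peptides recited in the claims rather than the full-length sequence, which is given only in the figures.

```python
import re

# PROSITE patterns rewritten as regular expressions (assumed transcriptions):
#   CAMP_PHOSPHO_SITE  [RK](2)-x-[ST]
#   PKC_PHOSPHO_SITE   [ST]-x-[RK]
#   CK2_PHOSPHO_SITE   [ST]-x(2)-[DE]
#   MYRISTYL           G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan(sequence):
    """Return sorted (first residue, last residue, motif) hits, numbered from 1.

    A lookahead is used so that overlapping sites are all reported.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            first = match.start() + 1
            last = first + len(match.group(1)) - 1
            hits.append((first, last, name))
    return sorted(hits)


# Illustrative input only: a peptide taken from the claims.
HITS = scan("PESQRRSSSGSTDSE")
```

For this peptide the scan reports sites at 3->5 (PKC), 5->8 (cAMP), 10->15 (myristyl) and 12->15 (CK2). Shifted by 551 residues, these coincide with the Table 4 entries 554->556, 556->559, 561->566 and 563->566, which would be consistent with the peptide lying at residues 552 to 566, although this placement is an inference and not a statement of the source.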

Claims

1. An isolated nucleic acid encoding a polypeptide which comprises the first 776 amino acids shown in Figure 1(c) and Figure 2(c).

2. An isolated nucleic acid molecule encoding a polypeptide and which hybridizes under stringent conditions to nucleic acid according to claim 1.

3. An isolated nucleic acid encoding a SAPL polypeptide, which SAPL polypeptide is selected from the group consisting of the SAPLa polypeptide isoforms of which the amino acid sequences are shown in Figure 1(c) and Figure 1(d) and the SAPLb polypeptide isoform of which the amino acid sequence is shown in Figure 2(c).

4. An isolated nucleic acid according to claim 3 comprising a coding sequence selected from the group consisting of the coding sequences shown in Figure 1(a), Figure 1(b), Figure 2(a) or Figure 2(b).

5. An isolated nucleic acid according to claim 4 comprising a coding sequence encoding said SAPL polypeptide selected from the group consisting of the SAPLa polypeptide isoforms of which the amino acid sequences are shown in Figure 1(c) and Figure 1(d) and the SAPLb polypeptide isoform of which the amino acid sequence is shown in Figure 2(c), wherein the coding sequence differs from the coding sequences shown in Figure 1(a), Figure 1(b), Figure 2(a) or Figure 2(b).

6. An isolated nucleic acid encoding a polypeptide, which polypeptide has at least 80% amino acid sequence similarity with a SAPL polypeptide encoded by nucleic acid according to claim 3.

7. An isolated nucleic acid according to claim 6 encoding a polypeptide, which polypeptide has at least 90% amino acid sequence similarity with a SAPL polypeptide encoded by nucleic acid according to claim 3.

8. An isolated nucleic acid that corresponds to nucleic acid according to claim 4 containing an alteration at a polymorphic site associated with disease.

9. An isolated nucleic acid that corresponds to nucleic acid according to claim 4 containing an alteration shown in Table 2.

10. A replicable nucleic acid vector comprising nucleic acid according to any one of claims 1 to 9.

11. A replicable nucleic acid vector according to claim 10 wherein said nucleic acid is under control of regulatory sequences for expression.

12. A host cell transformed with nucleic acid according to any one of claims 1 to 10 or a replicable nucleic acid vector according to claim 9 or claim 10.

13. An oligonucleotide fragment of a nucleic acid molecule according to claim 4 of at least about 14 nucleotides.

14. An oligonucleotide with a nucleotide sequence shown in Table 1.

15. An isolated nucleic acid encoding a promoter of which the sequence is shown within Figure 4.

16. An isolated nucleic acid according to claim 15 operably linked to a heterologous coding sequence.

17. A replicable nucleic acid vector comprising nucleic acid according to claim 15 or claim 16.

18. A replicable nucleic acid vector according to claim 17 wherein said nucleic acid is under control of regulatory sequences for expression.

19. A host cell transformed with nucleic acid according to claim 15 or claim 16 or a replicable nucleic acid vector according to claim 17 or claim 18.

20. An isolated polypeptide comprising an amino acid sequence selected from the group consisting of those encoded by nucleic acid according to any one of claims 1 to 9.

21. A fragment of a polypeptide including at least 5 contiguous amino acids of an amino acid sequence selected from the group consisting of the amino acid sequences shown in Figure 1(c), Figure 1(d) and Figure 2(c).

22. A fragment according to claim 21 which has an amino acid sequence selected from the group consisting of:

HPSQEEDRHSNASQ, RIQQFDDGGSDEEDI, PESQRRSSSGSTDSE, and PSSSPEQRTGQPSAPGDTS.

23. A method of production of a polypeptide which comprises culturing a host cell according to claim 19 under conditions for production of said polypeptide.

24. A method according to claim 23 further comprising isolating and/or purifying the polypeptide.

25. A method according to claim 24 further comprising formulating the polypeptide into a composition which comprises at least one additional component.

26. A composition comprising a polypeptide according to claim 20 or fragment according to claim 22, or nucleic acid encoding said polypeptide or fragment, and a pharmaceutically acceptable excipient.

27. An isolated antibody specific for a polypeptide according to claim 20.

28. An isolated antibody according to claim 27 which binds an amino acid sequence selected from:

HPSQEEDRHSNASQ,

RIQQFDDGGSDEEDI,

PESQRRSSSGSTDSE, and

PSSSPEQRTGQPSAPGDTS.

29. A composition comprising an antibody according to claim 27 or claim 28 and a pharmaceutically acceptable excipient.

30. A method which comprises determining in a sample the presence or absence of nucleic acid with the nucleotide sequence of nucleic acid according to any one of claims 1 to 9, an oligonucleotide with the nucleotide sequence of an oligonucleotide according to claim 13 or 14, or a polypeptide with the amino acid sequence of a polypeptide according to claim 20 or comprising the amino acid sequence of a fragment according to claim 21 or claim 22.