CA2387041A1

CA2387041A1 - Human sit4 associated proteins like (sapl) proteins and encoding genes; uses thereof

Info

Publication number: CA2387041A1
Application number: CA002387041A
Authority: CA
Inventors: John Andrew Todd; Rebecca Christina Joan Twells; John Wilfred Hess; Patricia Hey; Charles Thomas Caskey; Holly Hammond; Michael Lee Metzker
Original assignee: Individual
Current assignee: Wellcome Trust Ltd; Merck and Co Inc
Priority date: 1999-10-19
Filing date: 2000-10-19
Publication date: 2001-04-26
Also published as: EP1222264A1; AU7934300A; JP2003512055A; WO2001029213A1

Abstract

Nucleic acids, polypeptides, oligonucleotide probes and primers, methods of diagnosis or prognosis, and other methods relating to and based on the cloning and characterisation of a gene which the present inventors have termed SAPL (SIT4-(sporulation-induced transcript 4)-like), found in at least two isoforms termed SAPLa and SAPLb.

Description

HUMAN SIT4 ASSOCIATED PROTEINS LIKE (SAPL) PROTEINS AND ENCODING GENES; USES THEREOF
The present invention relates to nucleic acids, polypeptides, oligonucleotide probes and primers, methods of diagnosis or prognosis, and other methods relating to and based on the cloning and characterisation of a gene which the present inventors have termed SAPL, found in at least two isoforms termed SAPLa and SAPLb.
Mammals and yeast use a similar mechanism that relies upon cyclins and cyclin-dependent kinases (CDKs) to regulate the cell cycle. In yeast the components of this mechanism include the cyclins CLN1 and CLN2 and the cyclin-dependent kinase CDC28 (cell division control) (Dynlacht, 1997. Nature 389, 149-152). The activity of the serine/threonine kinase CDC28, also known as CDK1, is essential for the completion of G1 START, the controlling event in the yeast cell cycle. CDC28 activity is modulated by the level of the cyclins CLN1 and CLN2. The level of expression of CDC28 remains relatively constant throughout the cell cycle. In contrast, the mRNA expression level of the genes CLN1 and CLN2 increases dramatically during late G1. This expression of CLN1 and CLN2 is dependent upon the SIT4 Ppase (protein phosphatase) (Fernandez-Sarabia et al. 1992. Genes Dev. 6, 2417-2428). The SIT4 Ppase is a type 2A phosphatase encoded by the sit4 (sporulation-induced transcript 4) gene (Sutton et al. 1991. Mol. Cell. Biol. 11, 2133-2148). The SIT4 protein is 55% identical to the catalytic subunit of mammalian type 2A phosphatase and 40% identical to mammalian type 1 phosphatase.
A human cDNA clone, protein phosphatase 6, has been obtained that encodes a protein which, when expressed in yeast, is able to complement a sit4 mutant (Bastians and Ponstingl, 1996. J. Cell Sci. 109, 2865-2874). It is therefore likely that protein phosphatase 6 or a related phosphatase is the mammalian ortholog of SIT4.
Genetic analysis of sit4 mutations in yeast demonstrated that the SIT4 Ppase is necessary for progression of the cell cycle from late G1 to S phase, with a temporal point of action, or execution point, at or close to that of CDC28 (Sutton et al. 1991. Mol. Cell. Biol. 11, 2133-2148). The SIT4 protein was found to be associated with two proteins with apparent molecular weights of 190 and 155 kD. The cloning of the genes encoding these SIT4 Associated Proteins, SAP155 and SAP190, resulted in the identification of two additional related genes encoding the proteins SAP185 and SAP4 (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Alignment of the members of the SAP family revealed a number of conserved residues (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755), some of which are also present in SAPL. The SAP proteins appear to interact specifically with the SIT4 phosphatase (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Deletion of all four SAP genes results in a phenotype equivalent to a deletion of sit4; their association with SIT4 is thus essential for its function (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755).
Since overexpression of SAP genes can suppress certain temperature-sensitive sit4 mutants, the SAP proteins are thought to act as positive modulators of SIT4 Ppase. The mechanism by which the SAP proteins modulate SIT4 activity is unknown. One suggested possibility is that the SAP proteins increase the substrate specificity of SIT4 Ppase, in a fashion analogous to that found for the glycogen-targeting subunit of type 1 phosphatases (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). In this case SIT4 would be a SAP-dependent phosphatase, in the same way that CDC28 is a cyclin-dependent kinase (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755). Regardless of the mode of action, the importance of the SAP proteins in the yeast cell cycle, in regulating the activity of a critical enzyme, SIT4 Ppase, is well established.
The present inventors now disclose for the first time two isoforms of a novel gene, arising from alternative splicing and encoding highly related proteins, from the IDDM4 locus on human chromosome 11q13. The inventors have termed the isoforms "SAPLa" (SAP like) and "SAPLb".

Figure 1(a) shows the nucleotide sequence of SAPLa cDNA.

Nucleotide numbering herein is by reference to this sequence.
Figure 1(b) shows the longest open reading frame of the SAPLa cDNA.
Figure 1(c) shows the amino acid sequence translation of the open reading frame in the SAPLa cDNA producing SAPL isoform a.
Amino acid residue numbering herein is by reference to this sequence.
Figure 1(d) shows the amino acid sequence translation of an alternative open reading frame in the SAPLa cDNA which starts with a sequence that conforms with the Kozak consensus sequence for efficient initiation of translation.
Figure 2(a) shows the nucleotide sequence of SAPLb cDNA.
Figure 2(b) shows the longest open reading frame of the SAPLb cDNA.
Figure 2(c) shows the amino acid sequence translation of the open reading frame in the SAPLb cDNA producing SAPL isoform b.
Figure 3 shows a multiple sequence alignment of the amino acid sequences of yeast SAP190, yeast SAP185 and human SAPL isoform a. The consensus sequence of the alignment is shown; capital letters indicate identity at that position. The amino acid residues that are underlined are conserved within the yeast SAP family as well as in SAPL.

Figure 4 shows the sequence of DNA found immediately adjacent to SAPL exon 1 in the genome and identified as a putative promoter. Sequences that match the consensus binding sites for the Sp1 and NF kappa B transcription factors are shown in capital letters. Sequences that are conserved in the syntenic region of mouse genomic DNA sequence are underlined.
Figure 5 shows a multipoint linkage curve of the IDDM4 region.
Figure 6 shows the LOD score of the Tsp value obtained in the analysis described below. The x-axis is not to scale.
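Figure 4 marks matches to the Sp1 and NF kappa B consensus binding sites in the putative promoter. The sketch below scans both strands of a DNA sequence for a consensus written in IUPAC ambiguity codes. The consensus strings used in the usage note (GGGCGG for an Sp1 GC box and GGGRNNYYCC for NF kappa B) are commonly cited forms and are assumptions here; the document does not state which consensus definitions were used for Figure 4. The function name scan_sites is illustrative.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_sites(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (1-based forward-strand start, strand) for every consensus match.

    The pattern is wrapped in a lookahead so overlapping matches are all reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in consensus.upper()) + ")")
    n, length = len(seq), len(consensus)
    hits = [(m.start() + 1, "+") for m in pattern.finditer(seq)]
    # Matches on the reverse complement are converted back to forward coordinates.
    rc = reverse_complement(seq)
    hits += [(n - m.start() - length + 1, "-") for m in pattern.finditer(rc)]
    return sorted(hits)
```

For example, scan_sites("TTGGGCGGTT", "GGGCGG") reports a forward-strand match starting at position 3, and scan_sites("GGGACTTTCC", "GGGRNNYYCC") reports a forward-strand match at position 1.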
Characteristics of SAPL cDNA and SAPL protein

Two full-length cDNA sequences that arise from alternative splicing were isolated from the IDDM4 locus on chromosome 11q13 and termed SAPLa (SAP like) and SAPLb. The SAPL gene is also known to the inventors as DM4E4. The longer cDNA, of 4793 nucleotides (Figure 1(a)), contains an open reading frame (Figure 1(b)) that encodes a protein, SAPLa, of 793 amino acids (Figure 1(c)). The context of the putative initiator methionine codon at nucleotide 278, AGCATGT, conforms to the Kozak consensus sequence for efficient initiation of translation at the -3 position (purine, preferably A) but not at the +4 position (G) (Kozak, (1996) Mamm. Genome 7, 563-574). The predicted molecular weight of this protein is 89 kDa, with an isoelectric point of 4.31.
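The two steps described above are locating the longest open reading frame and checking the sequence context of its initiator ATG. Both can be sketched in a few lines. This is a minimal illustration that assumes a forward-strand-only search and an ORF defined as ATG through the first in-frame stop codon. It is not necessarily how the inventors defined the ORFs in Figures 1(b) and 2(b). Positions follow the Kozak convention: the A of ATG is +1, the three bases upstream are -1 to -3, and the base after ATG is +4.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Longest forward-strand ATG...stop ORF as (start, end), 1-based inclusive
    with the stop codon included, or None if no terminated ORF exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop: the ORF runs off the end, ignore it
                if best is None or (j + 3 - i) > (best[1] - best[0] + 1):
                    best = (i + 1, j + 3)
                i = j + 3  # resume scanning this frame after the stop codon
            else:
                i += 3
    return best


def kozak_context(seq: str, atg_pos: int) -> tuple[bool, bool]:
    """(purine at -3, G at +4) for the ATG whose A is at 1-based atg_pos."""
    seq = seq.upper()
    i = atg_pos - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    if i < 3 or i + 3 >= len(seq):
        raise ValueError("not enough flanking sequence to read -3 and +4")
    return seq[i - 3] in "AG", seq[i + 3] == "G"
```

Applied to the two contexts quoted in the text, kozak_context("AGCATGT", 4) gives (True, False), a purine at -3 but no G at +4. kozak_context("GACATGG", 4) gives (True, True).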
The protein does not contain any stretches of hydrophobic amino acids that would have a high probability of serving as a transmembrane domain, nor does it contain a signal peptide for protein export. SAPLa is therefore most likely localized in the cytoplasm. Because a nonoptimal initiation context can lead to the use of multiple start sites (Kozak, (1996) Mamm. Genome 7, 563-574), it is noted that the first ATG codon that conforms to the Kozak consensus sequence is at nucleotide 482, GACATGG; initiation there would result in a protein of 725 amino acids (Figure 1(d)). The SAPLa cDNA contains consensus signals for polyadenylation at nucleotides 3592-3597 and 4115-4120.
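The polyadenylation signals above are reported as 6-nucleotide spans (e.g. 3592-3597). A scan for such hexamers could look like the sketch below. It assumes the canonical AATAAA hexamer and its common ATTAAA variant. The document does not say which hexamer occurs at either SAPLa position.

```python
import re


def find_polya_signals(seq: str, signals=("AATAAA", "ATTAAA")):
    """Return (start, end, hexamer) for each signal, 1-based and inclusive,
    matching the style of spans such as 3592-3597 in the text."""
    seq = seq.upper()
    hits = []
    for sig in signals:
        # Lookahead so overlapping occurrences are all found.
        for m in re.finditer("(?=" + sig + ")", seq):
            start = m.start() + 1
            hits.append((start, start + len(sig) - 1, sig))
    return sorted(hits)
```

For example, find_polya_signals("GCAATAAAGC") returns [(3, 8, "AATAAA")].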
The second cDNA (Figure 2(a)), SAPLb, of 3228 nucleotides, contains an open reading frame (Figure 2(b)) that encodes a protein (Figure 2(c)), SAPLb, of 791 amino acids. The two proteins, SAPLa and SAPLb, are 100% identical for the first 776 amino acids. SAPLb has a predicted molecular weight of 89 kDa and an isoelectric point of 4.30; like SAPLa, it is predicted to be expressed in the cytoplasm. Both SAPLa and SAPLb contain a tandem repeat of the amino acid sequence Ser-Thr-Asp-Ser-Glu-Glu (STDSEE) at amino acids 562-573.
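The two sequence features just described can be checked directly from the amino acid sequences. The first is the length of the region over which two isoforms are identical from the N-terminus. The second is the extent of a tandem repeat: two copies of a 6-residue unit starting at 562 end at 573. The helper names below are illustrative only. The repeat finder counts exact consecutive copies only.

```python
def identical_prefix_length(a: str, b: str) -> int:
    """Number of leading residues shared by two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tandem_repeat_runs(protein: str, unit: str):
    """(start, end, copies) for runs of at least two exact consecutive copies
    of unit; start and end are 1-based and inclusive."""
    runs = []
    i = protein.find(unit)
    while i != -1:
        copies = 1
        while protein.startswith(unit, i + copies * len(unit)):
            copies += 1
        if copies >= 2:
            runs.append((i + 1, i + copies * len(unit), copies))
        i = protein.find(unit, i + copies * len(unit))
    return runs
```

For example, tandem_repeat_runs("MASTDSEESTDSEEK", "STDSEE") returns [(3, 14, 2)], a 12-residue span, just as 562-573 spans 12 residues.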
Comparison of SAPL with the protein database using the Smith-Waterman algorithm reveals a significant degree of similarity (p = 6.01e-13 and p = 2.02e-12) to two members of the SAP family of yeast proteins, SAP190 and SAP185 respectively. A lesser degree of similarity (p = 2.05e-2) is found to a third member of this family, SAP155. The amino acid sequence identity between SAPL and SAP190 over amino acids 94-724 of SAPL is 19%. Over a similar region SAPL is 18% identical to SAP185. A search for additional mammalian genes with similarity to SAPL using the tFASTA algorithm (which translates all the nucleotide sequences in the database in the six possible reading frames and compares them with the input protein sequence using the FASTA algorithm) identified EST sequences but no full-length cDNA sequences. The full-length SAPL cDNA identified in this application therefore provides, for the first time, the amino acid sequence of a mammalian homolog of the yeast SAP family.
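Percent identity figures such as the 19% quoted above depend on how an aligned region is counted. The sketch below uses one common convention: identical residue pairs divided by alignment columns, with gap columns counted in the denominator. This convention is an assumption. Different programs normalise differently, and the text does not state which convention was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding identical residues.

    Columns that are gaps in both sequences are ignored; a column with a gap
    in only one sequence counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)
```

For example, percent_identity("ACGT", "A-GT") returns 75.0.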
A multiple sequence alignment of SAPL, SAP190 and SAP185 using the GCG program PileUp, with the GapWeight set at 10 and the GapLengthWeight set at 1, yields the alignment shown in Figure 3. This alignment reveals several conserved motifs, a number of which are conserved within four members of the SAP family (SAP4, SAP155, SAP185 and SAP190). The most strikingly conserved motifs are located in SAPL at residues 333-338, WNNFLH, and 403-414, R(x)GYMGHLT(xx)A. There are also a number of other notable conserved regions, from 102 to 108, LL(x)(K/R)L(aromatic)S, and from 163 to 168, MD(hydrophobic)LL(K/R). Although the SAPL STDSEE repeats are not conserved, there are a number of conserved acidic residues in this portion of the protein, i.e. residues 539 to 591.
Several of these conserved motifs are found in all members of the SAP family, not only in SAP190 and SAP185, which are the most similar to SAPL (Figure 3). These include portions of the previously noted motifs: the motif from 333-338, WNNF(hydrophobic)H, and the motif from 403-414, GYMG. A number of other residues which are identical between human SAPL and the members of the SAP family are indicated in Figure 3. The SAP proteins are not highly similar to one another; e.g. SAP185 exhibits only 14% and 42% identity to SAP155 and SAP190, respectively. The finding that SAPL contains motifs that are conserved within this family provides a strong indication that it is related to the yeast SAP family.
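The motif notation above, with (x) for any residue and bracketed classes such as (K/R), maps directly onto regular expressions. The sketch below searches a protein sequence for the four SAPL motifs and reports 1-based start positions. The residue sets chosen for "hydrophobic" and "aromatic" are assumptions for illustration. The document does not define them.

```python
import re

# Assumed residue classes; the source does not define them.
HYDROPHOBIC = "[AVLIMFWC]"
AROMATIC = "[FWY]"

# Motif notation from the text, translated to regex ('.' = any residue).
MOTIFS = {
    "WNNFLH": "WNNFLH",
    "R(x)GYMGHLT(xx)A": "R.GYMGHLT..A",
    "LL(x)(K/R)L(aromatic)S": "LL.[KR]L" + AROMATIC + "S",
    "MD(hydrophobic)LL(K/R)": "MD" + HYDROPHOBIC + "LL[KR]",
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """1-based start positions of each motif (overlapping matches included)."""
    protein = protein.upper()
    return {name: [m.start() + 1 for m in re.finditer("(?=" + pat + ")", protein)]
            for name, pat in MOTIFS.items()}
```

Applied to the full SAPLa sequence of Figure 1(c), the text implies starts at 333, 403, 102 and 163 respectively. This sketch has been checked only on synthetic input, not on that sequence.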
A number of potential protein phosphorylation sites are found in the SAPL protein (Table 4). These include sites for cAMP-dependent protein kinase, protein kinase C and casein kinase 2. Protein phosphorylation is a reversible modification of proteins and an important mechanism for modulating protein function. Furthermore, in yeast the deletion of the sit4 gene results in hyperphosphorylation of the SAP proteins (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755), so there is direct evidence that this family of proteins is subject to protein phosphorylation.
Thus it is likely that SAPL is phosphorylated at least at some of the sites listed in Table 4. Furthermore, it is likely that SAPL function is modulated by protein phosphorylation.
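Candidate sites of the three kinds listed in Table 4 are usually found by matching short consensus patterns. The sketch below uses the PROSITE patterns [RK](2)-x-[ST] for cAMP-dependent protein kinase (PS00004), [ST]-x-[RK] for protein kinase C (PS00005) and [ST]-x(2)-[DE] for casein kinase 2 (PS00006). The document does not say how Table 4 was generated, so these patterns are an assumption. The output of this sketch need not reproduce that table.

```python
import re

# PROSITE-style consensus patterns as regexes; the captured group is the
# phosphoacceptor Ser/Thr. Lookaheads allow overlapping sites.
SITE_PATTERNS = {
    "PKA": r"(?=[RK]{2}.([ST]))",
    "PKC": r"(?=([ST]).[RK])",
    "CK2": r"(?=([ST])..[DE])",
}


def phosphosites(protein: str) -> dict[str, list[int]]:
    """1-based positions of candidate phosphorylated Ser/Thr per kinase."""
    protein = protein.upper()
    return {kinase: [m.start(1) + 1 for m in re.finditer(pat, protein)]
            for kinase, pat in SITE_PATTERNS.items()}
```

For example, phosphosites("SAKTAAE") reports a protein kinase C site at Ser1 and a casein kinase 2 site at Thr4.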
Protein phosphorylation of SAPL may be used in assays for compounds that modulate the level of SAPL phosphorylation.
These compounds may inhibit either kinases or phosphatases that act on SAPL. Compounds isolated in such a fashion may have therapeutic utility in modifying the function of SAPL.
The cloning of the SAPL cDNA permits overexpression of the SAPL protein and its various isoforms, and testing of their ability to complement SAP mutants in yeast. Similarly, expression of human SAPL in yeast allows testing for a physical association between SAPL and SIT4. The cloning of the SAPL cDNA also allows testing of the ability of SAPL to interact with, and modulate the activity of, human protein phosphatase 6 and related phosphatases. The usefulness of SAPL in screening for molecules of pharmaceutical potential is discussed further below.
Since the activity of phosphatases such as a SIT4 ortholog may be necessary for progression of the cell cycle, compounds that inhibit the activity of the phosphatase may be useful in the treatment of cancer and other proliferative disorders.

The SAPL protein may act in a manner analogous to the SAP proteins in yeast, either to activate the phosphatase or to modify its specificity. This too indicates the usefulness of SAPL in assays for compounds that modulate the activity of the phosphatase, e.g. inhibit it.
There is evidence that the cyclin/CDK system is used to monitor environmental factors that influence not only cell division but also apoptosis in terminally differentiated cells (Gao and Zelenka, (1997). BioEssays 19, 307-315). Since certain cyclins are expressed in T-cells, this mechanism may be important in mediating T-cell apoptosis. Apoptosis of selected T-cell populations is a critical element in the control of the immune system and the prevention of autoimmunity. Therefore the location of SAPL within the IDDM4 locus, and its proposed biological function of modulating either the activity or the specificity of a phosphatase, may indicate that this protein is important in maintaining immune self-tolerance. Compounds that modify the activity of SAPL may be tested in assays of T-cell proliferation or apoptosis.
Compounds able to modify SAPL activity may be identified by ability to stimulate or inhibit SAPL complementation in mutant yeast deleted for all four yeast SAP genes (Luke et al. (1996) Mol. Cell. Biol. 16: 2744-2755).
The presence of polymorphisms within the SAPL gene and the location of this gene within the IDDM4 locus allow for use of certain of the polymorphisms as diagnostic markers. These polymorphisms may be used to assay for the presence of a chromosomal region that confers susceptibility to type 1 diabetes. This susceptibility may be due to functional polymorphisms within the SAPL gene itself or may be due to a functional polymorphism within a neighboring gene that is in linkage disequilibrium with a SAPL polymorphism.
According to one aspect of the present invention there is provided a nucleic acid molecule encoding a SAPL polypeptide, which may be any of the SAPLa polypeptide isoforms, of which the amino acid sequences are shown in Figure 1(c) and Figure 1(d), or the SAPLb polypeptide isoform, of which the amino acid sequence is shown in Figure 2(c).
Thus, individual aspects of the present invention provide nucleic acid encoding a polypeptide including the amino acid sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c). Furthermore, an additional aspect of the present invention provides nucleic acid encoding a polypeptide which includes the first 776 amino acids shown in Figure 1(c) and Figure 2(c), which are identical for the respective SAPL isoforms a and b.
A coding sequence of the present invention may be that included in Figure 1(a), Figure 1(b), Figure 2(a) or Figure 2(b), or it may be a mutant, variant, derivative or allele of one of the sequences shown. The sequence may differ from that shown in a said figure by a change which is one or more of addition, insertion, deletion and substitution of one or more nucleotides of the sequence shown. Changes to a nucleotide sequence may or may not result in an amino acid change at the protein level, as determined by the genetic code.
Thus, nucleic acid according to the present invention may include a sequence different from the sequence shown in a figure herein yet encode a polypeptide with the same amino acid sequence.
On the other hand, the encoded polypeptide may comprise an amino acid sequence which differs by one or more amino acid residues from the amino acid sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c). Nucleic acid encoding a polypeptide which is an amino acid sequence mutant, variant, derivative or allele of the sequence shown in one of these figures is further provided by the present invention. Such polypeptides are discussed below. Nucleic acid encoding such a polypeptide may show, at the nucleotide sequence and/or encoded amino acid level, greater than about 60% homology with the coding sequence shown in the relevant figure and/or the amino acid sequence shown in the relevant figure, greater than about 70% homology, greater than about 80% homology, greater than about 90% homology or greater than about 95% homology.
For amino acid "homology", this may be understood to be similarity (according to the established principles of amino acid similarity, e.g. as determined using the algorithm GAP (Genetics Computer Group, Madison, WI)) or identity. GAP uses the Needleman and Wunsch algorithm to align two complete sequences in a way that maximizes the number of matches and minimizes the number of gaps. Generally the default parameters are used, with a gap creation penalty = 12 and gap extension penalty = 4. Use of GAP may be preferred, but other algorithms may be used, e.g. BLAST (which uses the method of Altschul et al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448) or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol. Biol. 147: 195-197), generally employing default parameters. Use of either of the terms "homology" and "homologous" herein does not imply any necessary evolutionary relationship between compared sequences, in keeping for example with standard use of terms such as "homologous recombination", which merely requires that two nucleotide sequences are sufficiently similar to recombine under the appropriate conditions. Further discussion of polypeptides according to the present invention, which may be encoded by nucleic acid according to the present invention, is found below.
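The Needleman and Wunsch global alignment on which GAP is based can be illustrated with a short dynamic-programming sketch. This is a simplification, not GAP itself. GAP scores residue pairs with a substitution matrix and uses affine gap penalties (the creation and extension penalties of 12 and 4 above). This sketch uses a single match/mismatch score and a linear gap penalty, and its parameter values are illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For example, needleman_wunsch("ACGT", "AGT") returns ("ACGT", "A-GT", 1): three matches and one gap.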

The present invention extends to nucleic acid that hybridizes with any one or more of the specific sequences disclosed herein under stringent conditions. For detection of sequences that are about 80-90% identical, suitable conditions include hybridization overnight at 42°C in 0.25M Na2HPO4, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 55°C in 0.1X SSC, 0.1% SDS. For detection of sequences that are greater than about 90% identical, suitable conditions include hybridization overnight at 65°C in 0.25M Na2HPO4, pH 7.2, 6.5% SDS, 10% dextran sulfate and a final wash at 60°C in 0.1X SSC, 0.1% SDS.
The coding sequence may be included within a nucleic acid molecule which has the sequence shown in Figure 1(a) or Figure 2(a) and encodes the full polypeptide of Figure 1(c) or Figure 2(c). Mutants, variants, derivatives and alleles of these sequences are included within the scope of the present invention in terms analogous to those set out in the preceding paragraph and in the following disclosure. The same applies for the second isoform of SAPLa, of which the amino acid sequence is shown in Figure 1(d).
Alterations in a sequence according to the present invention which are associated with IDDM or other disease may be preferred in accordance with embodiments of the present invention. Implications for screening, e.g. for diagnostic or prognostic purposes, are discussed below. Particular nucleotide sequence alleles according to the present invention have sequences with a variation indicated in Table 2. One or more of these may be associated with susceptibility to IDDM or other disease.
Generally, nucleic acid according to the present invention is provided as an isolate, in isolated and/or purified form, or free or substantially free of material with which it is naturally associated, such as free or substantially free of nucleic acid flanking the gene in the human genome, except possibly one or more regulatory sequences for expression.
Nucleic acid may be wholly or partially synthetic and may include genomic DNA, cDNA or RNA. The coding sequence shown herein is a DNA sequence. Where nucleic acid according to the invention includes RNA, reference to the sequence shown should be construed as encompassing reference to the RNA equivalent, with U substituted for T.
Nucleic acid may be provided as part of a replicable vector. Also provided by the present invention are a vector including nucleic acid as set out above, particularly any expression vector from which the encoded polypeptide can be expressed under appropriate conditions, and a host cell containing any such vector or nucleic acid. An expression vector in this context is a nucleic acid molecule including nucleic acid encoding a polypeptide of interest and appropriate regulatory sequences for expression of the polypeptide, in an in vitro expression system, e.g. reticulocyte lysate, or in vivo, e.g. in eukaryotic cells such as COS or CHO cells or in prokaryotic cells such as E. coli.
This is discussed further below.
The nucleic acid sequence provided in accordance with the present invention is useful for identifying nucleic acid of interest (which may be according to the present invention) in a test sample. The present invention provides a method of obtaining nucleic acid of interest, the method including hybridisation of a probe having a sequence shown herein, or a complementary sequence, to target nucleic acid. Hybridisation is generally followed by identification of successful hybridisation and isolation of nucleic acid which has hybridised to the probe, which may involve one or more steps of PCR. It will not usually be necessary to use a probe with the complete sequence shown in any of these figures. Shorter fragments may be used, particularly fragments with a sequence encoding the conserved motifs.
Nucleic acid according to the present invention is obtainable using one or more oligonucleotide probes or primers designed to hybridise with one or more fragments of the nucleic acid sequence shown in any of the figures, particularly fragments of relatively rare sequence, based on codon usage or statistical analysis. A primer designed to hybridise with a fragment of the nucleic acid sequence shown in any of the figures may be used in conjunction with one or more oligonucleotides designed to hybridise to a sequence in a cloning vector within which target nucleic acid has been cloned, or in so-called "RACE" (rapid amplification of cDNA ends), in which cDNAs in a library are ligated to an oligonucleotide linker and PCR is performed using a primer which hybridises with a sequence shown and a primer which hybridises to the oligonucleotide linker.
Such oligonucleotide probes or primers, as well as the full-length sequence (and mutants, alleles, variants and derivatives), are also useful in screening a test sample containing nucleic acid for the presence of alleles, mutants and variants, with diagnostic and/or prognostic implications as discussed in more detail below.
Nucleic acid isolated and/or purified from one or more cells (e.g. human), or a nucleic acid library derived from nucleic acid isolated and/or purified from cells (e.g. a cDNA library derived from mRNA isolated from the cells), may be probed under conditions for selective hybridisation and/or subjected to a specific nucleic acid amplification reaction such as the polymerase chain reaction (PCR) (reviewed for instance in "PCR Protocols: A Guide to Methods and Applications", Eds. Innis et al, 1990, Academic Press, New York; Mullis et al, Cold Spring Harbor Symp. Quant. Biol., 51:263 (1987); Ehrlich (ed), PCR Technology, Stockton Press, NY, 1989; and Ehrlich et al, Science, 252:1643-1650 (1991)). PCR comprises steps of denaturation of template nucleic acid (if double-stranded), annealing of primer to target, and polymerisation. The nucleic acid probed or used as template in the amplification reaction may be genomic DNA, cDNA or RNA. Other specific nucleic acid amplification techniques include strand displacement activation, the Qβ replicase system, the repair chain reaction, the ligase chain reaction and ligation activated transcription. For convenience, and because it is generally preferred, the term PCR is used herein in contexts where other nucleic acid amplification techniques may be applied by those skilled in the art. Unless the context requires otherwise, reference to PCR should be taken to cover use of any suitable nucleic acid amplification reaction available in the art.
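An "in silico PCR" check is often run before a pair of primers is ordered. It locates the forward primer on the template's top strand and the reverse primer's binding site, which is the reverse complement of the reverse primer, further downstream. The sketch below assumes exact matches and a linear, top-strand template. It ignores mismatches, primer Tm and secondary structure, which real primer-design software would consider. The function name in_silico_pcr is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Predicted amplicon (top strand, primers included), or None.

    The forward primer must match the template exactly; the reverse primer
    anneals to the bottom strand, so its reverse complement must appear on
    the top strand downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]
```

For example, the template GGGATGCGTACCGGTTAGCACCC with primers ATGCGT and TGCTAA gives the 17-nucleotide product ATGCGTACCGGTTAGCA.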
In the context of cloning, it may be necessary for one or more gene fragments to be ligated to generate a full-length coding sequence. Also, where a full-length encoding nucleic acid molecule has not been obtained, a smaller molecule representing part of the full molecule may be used to obtain full-length clones. Inserts may be prepared from partial cDNA clones and used to screen cDNA libraries. The full-length clones isolated may be subcloned into expression vectors and activity assayed by transfection into suitable host cells, e.g. with a reporter plasmid.
A method may include hybridisation of one or more (e.g. two) probes or primers to target nucleic acid. Where the nucleic acid is double-stranded DNA, hybridisation will generally be preceded by denaturation to produce single-stranded DNA. The hybridisation may be part of a PCR procedure, or part of a probing procedure not involving PCR. An example procedure would be a combination of PCR and low stringency hybridisation. A screening procedure, chosen from the many available to those skilled in the art, is used to identify successful hybridisation events and to isolate hybridised nucleic acid.
Binding of a probe to target nucleic acid (e.g. DNA) may be measured using any of a variety of techniques at the disposal of those skilled in the art. For instance, probes may be radioactively, fluorescently or enzymatically labelled. Other methods not employing labelling of probe include examination of restriction fragment length polymorphisms, amplification using PCR, RNase cleavage and allele specific oligonucleotide probing. Probing may employ the standard Southern blotting technique. For instance, DNA may be extracted from cells and digested with different restriction enzymes. Restriction fragments may then be separated by electrophoresis on an agarose gel, before denaturation and transfer to a nitrocellulose filter. Labelled probe may be hybridised to the DNA fragments on the filter and binding determined. DNA for probing may be prepared from RNA preparations from cells.
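The fragment pattern expected on a Southern blot, or the change caused by a restriction fragment length polymorphism, can be predicted by digesting a sequence in silico. The sketch below cuts a linear top-strand sequence at every exact occurrence of a recognition site. It uses EcoRI (G^AATTC, cut after the first base) only as an illustrative enzyme. It ignores overhangs, methylation sensitivity and ambiguous recognition sites.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting linear DNA at every occurrence of site.

    cut_offset is the top-strand cut position measured from the first base
    of the site (EcoRI, G^AATTC, has cut_offset=1).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

With this function, a single-base change that destroys the site converts two fragments into one. For example, digest("AAAGAATTCTTT", "GAATTC", 1) gives [4, 8], while digest("AAAGAATACTTT", "GAATTC", 1) gives [12].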
Preliminary experiments may be performed by hybridising various probes under low stringency conditions to Southern blots of DNA digested with restriction enzymes. Suitable conditions would be achieved when a large number of hybridising fragments were obtained while the background hybridisation was low.
Using these conditions, nucleic acid libraries, e.g. cDNA libraries representative of expressed sequences, may be searched. Those skilled in the art are well able to employ suitable conditions of the desired stringency for selective hybridisation, taking into account factors such as oligonucleotide length and base composition, temperature and so on. On the basis of amino acid sequence information, oligonucleotide probes or primers may be designed, taking into account the degeneracy of the genetic code and, where appropriate, the codon usage of the organism from which the candidate nucleic acid is derived. An oligonucleotide for use in nucleic acid amplification may have about 10 or fewer codons (e.g. 6, 7 or 8), i.e. be about 30 or fewer nucleotides in length (e.g. 18, 21 or 24). Generally, specific primers are upwards of 14 nucleotides in length, but need not be longer than 18-20 nucleotides. Those skilled in the art are well versed in the design of primers for use in processes such as PCR. Various techniques for synthesizing oligonucleotide primers are well known in the art, including phosphotriester and phosphodiester synthesis methods.
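Base composition and length affect the hybridisation temperature of a short oligonucleotide. A quick first estimate is often made with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The sketch below implements that rule together with GC content. The Wallace rule is a rough approximation intended for short oligonucleotides. It does not account for salt concentration or nearest-neighbour effects, and the document does not specify any Tm formula.

```python
def gc_content(oligo: str) -> float:
    """Percent G + C in an oligonucleotide."""
    oligo = oligo.upper()
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Wallace-rule melting temperature estimate in degrees C: 2(A+T) + 4(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc
```

For example, a 20-mer with 10 A/T and 10 G/C bases, such as ATGCATGCATGCATGCATGC, has 50% GC and a Wallace estimate of 60 °C.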
Preferred amino acid sequences suitable for use in the design of probes or PCR primers may include sequences encoding the motifs highlighted in Figure 3, whether those motifs are completely, substantially or partly conserved.
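Conserved motifs differ in how suitable they are as primer templates because of codon degeneracy. A degenerate primer covering a peptide must include every codon combination, and the number of combinations is the product of the number of codons for each residue in the standard genetic code. The sketch below computes that product. It is offered only as an illustration of why some motifs make better primer templates than others. It does not weight combinations by codon usage.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2,
    "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1,
    "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())
```

For the SAPL motif WNNFLH, an 18-nucleotide region, the degeneracy is 1 × 2 × 2 × 2 × 6 × 2 = 96. For GYMG it is 32. In both cases the single-codon residues W and M help keep the degeneracy low.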
A further aspect of the present invention provides an oligonucleotide or polynucleotide fragment of the nucleotide sequence shown in any of the figures herein providing nucleic acid according to the present invention, or a complementary sequence, in particular for use in a method of obtaining and/or screening nucleic acid. Some preferred oligonucleotides have a sequence shown in Table 1, or a sequence which differs from any of the sequences shown by addition, substitution, insertion or deletion of one or more nucleotides, but preferably without abolition of ability to hybridise selectively with nucleic acid in accordance with the present invention, that is wherein the degree of similarity of the oligonucleotide or polynucleotide with one of the sequences given is sufficiently high.

In some preferred embodiments, oligonucleotides according to the present invention that are fragments of any of the sequences shown, or any allele associated with IDDM or other disease susceptibility, are at least about 10 nucleotides in length, more preferably at least about 15 nucleotides in length, more preferably at least about 20 nucleotides in length. Such fragments themselves individually represent aspects of the present invention. Fragments and other oligonucleotides may be used as primers or probes as discussed but may also be generated (e.g. by PCR) in methods concerned with determining the presence in a test sample of a sequence indicative of IDDM or other disease susceptibility.
Methods involving use of nucleic acid in diagnostic and/or prognostic contexts, for instance in determining susceptibility to IDDM or other disease, and other methods concerned with determining the presence of sequences indicative of IDDM or other disease susceptibility are discussed below.
Further embodiments of oligonucleotides according to the present invention are anti-sense oligonucleotide sequences based on the nucleic acid sequences described herein. Anti-sense oligonucleotides may be designed to hybridise to the complementary sequence of nucleic acid, pre-mRNA or mature mRNA, interfering with the production of polypeptide encoded by a given DNA sequence (e.g. either native polypeptide or a mutant form thereof), so that its expression is reduced or prevented altogether. Anti-sense techniques may be used to target a coding sequence or a control sequence of a gene, e.g. in the 5' flanking sequence, whereby the antisense oligonucleotides can interfere with control sequences. Anti-sense oligonucleotides may be DNA or RNA and may be around 14-23 nucleotides, particularly around 15-18 nucleotides, in length. The construction of antisense sequences and their use is described in Peyman and Ulman, Chemical Reviews, 90:543-584, (1990), and Crooke, Ann. Rev. Pharmacol. Toxicol., 32:329-376, (1992).
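An antisense oligonucleotide is the reverse complement of the target segment, written 5' to 3'. The sketch below tiles a target mRNA or DNA sequence into overlapping windows of a chosen length, defaulting to 18 nucleotides, the upper end of the 15-18 range above. It returns each window's DNA antisense sequence. It is an illustrative starting point only. Practical antisense design also weighs target accessibility, GC content and off-target matches, none of which this sketch considers.

```python
# Complement table accepting RNA (U) or DNA (T) input; output is DNA.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense(target: str) -> str:
    """DNA antisense oligo (5'->3') complementary to a target segment."""
    return target.upper().translate(COMPLEMENT)[::-1]


def antisense_candidates(mrna: str, length: int = 18, step: int = 1):
    """(1-based target start, antisense oligo) for each window of the target."""
    mrna = mrna.upper()
    return [(i + 1, antisense(mrna[i:i + length]))
            for i in range(0, len(mrna) - length + 1, step)]
```

For example, antisense("AUGC") returns "GCAT", and antisense_candidates("AUGCAU", length=4) yields oligos for the windows starting at positions 1, 2 and 3.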
Nucleic acid according to the present invention may be used in methods of gene therapy, for instance in treatment of individuals with the aim of preventing or curing (wholly or partially) IDDM or other disease. This may ease one or more symptoms of the disease. This is discussed below.
Nucleic acid according to the present invention, such as a full-length coding sequence or oligonucleotide probe or primer, may be provided as part of a kit, e.g. in a suitable container such as a vial in which the contents are protected from the external environment. The kit may include instructions for use of the nucleic acid, e.g. in PCR and/or a method for determining the presence of nucleic acid of interest in a test sample. A kit wherein the nucleic acid is intended for use in PCR may include one or more other reagents required for the reaction, such as polymerase, deoxynucleoside triphosphates, buffer solution etc. The nucleic acid may be labelled. A kit for use in determining the presence or absence of nucleic acid of interest may include one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, e.g. a swab for removing cells from the buccal cavity or a syringe for removing a blood sample (such components generally being sterile).
According to a further aspect, the present invention provides a nucleic acid molecule including a SAPL gene promoter.
The promoter may comprise or consist essentially of a sequence of nucleotides 5' to the SAPL gene in the human chromosome, or an equivalent sequence in another species, such as the mouse.
Any of the sequences disclosed in the figures herein may be used to construct a probe for use in identification and isolation of a promoter from a genomic library containing a genomic SAPL gene. Techniques and conditions for such probing are well known in the art and are discussed elsewhere herein.
To find minimal elements or motifs responsible for tissue and/or developmental regulation, restriction enzymes or nucleases may be used to digest a nucleic acid molecule, followed by an appropriate assay (for example using a reporter gene such as luciferase) to determine the sequence required.
A preferred embodiment of the present invention provides a nucleic acid isolate with the minimal nucleotide sequence required for SAPL promoter activity.
Figure 4 shows a sequence for the putative SAPL promoter.
Underlined sequences exhibit similarity to the syntenic mouse DNA sequence. Sequence in bold is found in the SAPL cDNA sequence of Figure 1(a). Capital letters indicate bases that match the pattern for Sp1 transcription factor binding sites.
GGGGGTCC matches an NF-kappa B transcription factor binding site. GCCAAT matches the CAAT box. The sequence was identified as a putative promoter by the computer algorithm PROMOTERSCAN (Prestridge (1995) J. Mol. Biol. 249: 923-932) and corresponds to a CpG island.
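The motif annotations and the CpG-island call can be illustrated with a short Python sketch. It is not PROMOTERSCAN: it only searches the forward strand for the literal motifs named above, plus GGGCGG, a commonly cited Sp1 GC-box core that is an assumption here, and it applies the widely used Gardiner-Garden and Frommer style thresholds (roughly 200 bp or longer, GC content above 50%, observed/expected CpG ratio above 0.6), which are likewise an assumption rather than the criteria used for Figure 4.

```python
import re

# Motifs named in the text, plus an assumed Sp1 GC-box core (GGGCGG).
MOTIFS = {
    "NF-kappaB": "GGGGGTCC",
    "CAAT": "GCCAAT",
    "Sp1 (assumed GC box)": "GGGCGG",
}


def cpg_stats(seq: str) -> dict:
    """Length, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len=200, min_gc=0.5, min_obs_exp=0.6) -> bool:
    """Apply the assumed CpG-island thresholds to a whole sequence."""
    s = cpg_stats(seq)
    return (s["length"] >= min_len and s["gc_fraction"] > min_gc
            and s["cpg_obs_exp"] > min_obs_exp)


def find_motifs(seq: str) -> dict:
    """0-based start positions of each motif on the forward strand."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping occurrences be reported.
    return {name: [m.start() for m in re.finditer(f"(?={motif})", seq)]
            for name, motif in MOTIFS.items()}


if __name__ == "__main__":
    print(find_motifs("AAGGGGGTCCAAGCCAATGGGCGG"))
    print(cpg_stats("CGCG"))
```

A full analysis would also scan the reverse strand and use sliding windows rather than scoring the whole sequence at once.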
As noted, the promoter may comprise one or more sequence motifs or elements conferring developmental and/or tissue-specific regulatory control of expression. Other regulatory sequences may be included, for instance as identified by mutation or digest assay in an appropriate expression system or by sequence comparison with available information, e.g. using a computer to search on-line databases.
By "promoter" is meant a sequence of nucleotides from which transcription of operably linked DNA downstream (i.e. in the 3' direction on the sense strand of double-stranded DNA) may be initiated.
"Operably linked" means joined as part of the same nucleic acid molecule, suitably positioned and oriented for transcription to be initiated from the promoter. DNA operably linked to a promoter is "under transcriptional initiation regulation" of the promoter.
The present invention extends to a promoter which has a nucleotide sequence which is an allele, mutant, variant or derivative, by way of nucleotide addition, insertion, substitution or deletion, of a promoter sequence as provided herein. Preferred levels of sequence homology with a provided sequence may be analogous to those set out above for encoding nucleic acid and polypeptides according to the present invention. Systematic or random mutagenesis of nucleic acid to make an alteration to the nucleotide sequence may be performed using any technique known to those skilled in the art. One or more alterations to a promoter sequence according to the present invention may increase or decrease promoter activity, or increase or decrease the magnitude of the effect of a substance able to modulate the promoter activity.
"Promoter activity" refers to the ability to initiate transcription. The level of promoter activity is quantifiable, for instance, by assessment of the amount of mRNA produced by transcription from the promoter or by assessment of the amount of protein product produced by translation of mRNA produced by transcription from the promoter. The amount of a specific mRNA present in an expression system may be determined for example using specific oligonucleotides which are able to hybridise with the mRNA and which are labelled or may be used in a specific amplification reaction such as the polymerase chain reaction. Use of a reporter gene facilitates determination of promoter activity by reference to protein production.
Further provided by the present invention is a nucleic acid construct comprising a SAPL promoter region or a fragment, mutant, allele, derivative or variant thereof able to promote transcription, operably linked to a heterologous gene, e.g. a coding sequence. A "heterologous" or "exogenous" gene is generally not a modified form of SAPL. Generally, the gene may be transcribed into mRNA which may be translated into a peptide or polypeptide product which may be detected and preferably quantitated following expression. A gene whose encoded product may be assayed following expression is termed a "reporter gene", i.e. a gene which "reports" on promoter activity.

The reporter gene preferably encodes an enzyme which catalyses a reaction which produces a detectable signal, preferably a visually detectable signal, such as a coloured product. Many examples are known, including β-galactosidase and luciferase.
β-galactosidase activity may be assayed by production of blue colour on substrate, the assay being by eye or by use of a spectrophotometer to measure absorbance. Light emission, for example the luminescence produced as a result of luciferase activity, may be quantitated using a luminometer. Radioactive assays may be used, for instance using chloramphenicol acetyltransferase, which may also be used in non-radioactive assays. The presence and/or amount of gene product resulting from expression from the reporter gene may be determined using a molecule able to bind the product, such as an antibody or fragment thereof. The binding molecule may be labelled directly or indirectly using any standard technique.
Those skilled in the art are well aware of a multitude of possible reporter genes and assay techniques which may be used to determine gene activity. Any suitable reporter/assay may be used and it should be appreciated that no particular choice is essential to or a limitation of the present invention.
Nucleic acid constructs comprising a promoter (as disclosed herein) and a heterologous gene (reporter) may be employed in screening for a substance able to modulate activity of the promoter. For therapeutic purposes, e.g. for treatment of IDDM or other disease, a substance able to up-regulate expression from the promoter may be sought. A method of screening for ability of a substance to modulate activity of a promoter may comprise contacting an expression system, such as a host cell, containing a nucleic acid construct as herein disclosed with a test or candidate substance and determining expression of the heterologous gene.
The level of expression in the presence of the test substance may be compared with the level of expression in the absence of the test substance. A difference in expression in the presence of the test substance indicates ability of the substance to modulate gene expression. An increase in expression of the heterologous gene compared with expression of another gene not linked to a promoter as disclosed herein indicates specificity of the substance for modulation of the promoter.
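The two comparisons above reduce to simple ratios. This minimal Python sketch assumes replicate reporter readings (for example luciferase light units) are available with and without the test substance, and that a second reporter driven by an unrelated promoter serves as the control gene. The function names are hypothetical, and no hit threshold is implied.

```python
from statistics import mean


def fold_change(treated, untreated):
    """Mean signal with the test substance divided by mean signal without it."""
    return mean(treated) / mean(untreated)


def promoter_specific_change(reporter_treated, reporter_untreated,
                             control_treated, control_untreated):
    """Fold change of the promoter-driven reporter, normalised to the fold
    change of a control gene not linked to the promoter.

    A value above 1 suggests the substance acts preferentially on the promoter
    rather than increasing expression in general.
    """
    return (fold_change(reporter_treated, reporter_untreated)
            / fold_change(control_treated, control_untreated))


if __name__ == "__main__":
    # Hypothetical duplicate readings, for illustration only.
    print(fold_change([290, 310], [95, 105]))
    print(promoter_specific_change([290, 310], [95, 105], [108, 112], [98, 102]))
```

With these placeholder numbers the reporter rises threefold while the control rises 1.1-fold, so the normalised change is about 2.7.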
A promoter construct may be introduced into a cell line using any technique previously described to produce a stable cell line containing the reporter construct integrated into the genome. The cells may be grown and incubated with test compounds for varying times. The cells may be grown in 96-well plates to facilitate the analysis of large numbers of compounds. The cells may then be washed and the reporter gene expression analysed. For some reporters, such as luciferase, the cells are lysed and then analysed.
Following identification of a substance which modulates or affects promoter activity, the substance may be investigated further. Furthermore, it may be manufactured and/or used in preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug.
These may be administered to individuals.
Thus, the present invention extends in various aspects not only to a substance identified using a nucleic acid molecule as a modulator of promoter activity, in accordance with what is disclosed herein, but also to a pharmaceutical composition, medicament, drug or other composition comprising such a substance, a method comprising administration of such a composition to a patient, e.g. for increasing SAPL expression for instance in treatment (which may include preventative treatment) of IDDM or other disease, use of such a substance in manufacture of a composition for administration, e.g. for increasing SAPL expression for instance in treatment of IDDM or other disease, and a method of making a pharmaceutical composition comprising admixing such a substance with a pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.

A further aspect of the present invention provides a polypeptide which has the amino acid sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c), or includes the first 776 amino acids of Figure 1(c) and Figure 2(c), which are identical between SAPLa and SAPLb, which may be in isolated and/or purified form, free or substantially free of material with which it is naturally associated, such as other polypeptides or such as human polypeptides other than that for which the amino acid sequence is shown in a said figure, or (for example if produced by expression in a prokaryotic cell) lacking in native glycosylation, e.g. unglycosylated.
Polypeptides which are amino acid sequence variants, alleles, derivatives or mutants are also provided by the present invention. A polypeptide which is a variant, allele, derivative or mutant may have an amino acid sequence which differs from that given in a figure herein by one or more of addition, substitution, deletion and insertion of one or more amino acids. Preferred such polypeptides have SAPL function, that is to say have one or more of the following properties:
immunological cross-reactivity with an antibody reactive with the polypeptide for which the sequence is given in a figure herein; sharing an epitope with the polypeptide for which the amino acid sequence is shown in a figure herein (as determined for example by immunological cross-reactivity between the two polypeptides); a biological activity which is inhibited by an antibody raised against the polypeptide whose sequence is shown in a figure herein; ability to complement the yeast mutation; containing one or more of the conserved sequences identified in Figure 3; containing the STDSEE repeat.
Alteration of sequence may change the nature and/or level of activity and/or stability of the SAPL protein.
A polypeptide which is an amino acid sequence variant, allele, derivative or mutant of the amino acid sequence shown in a figure herein may comprise an amino acid sequence which shares greater than about 35% sequence identity with the sequence shown, greater than about 40%, greater than about 50%, greater than about 60%, greater than about 70%, greater than about 80%, greater than about 90% or greater than about 95%. The sequence may share greater than about 60% similarity, greater than about 70% similarity, greater than about 80% similarity or greater than about 90% similarity with the amino acid sequence shown in the relevant figure. Amino acid similarity is generally defined with reference to the algorithm GAP (Genetics Computer Group, Madison, WI) as noted above, or the TBLASTN program of Altschul et al. (1990) J. Mol. Biol. 215: 403-10, or the Smith-Waterman algorithm (Smith and Waterman (1981) J. Mol. Biol. 147: 195-197). Similarity allows for "conservative variation", i.e. substitution of one hydrophobic residue such as isoleucine, valine, leucine or methionine for another, or the substitution of one polar residue for another, such as arginine for lysine, glutamic acid for aspartic acid, or glutamine for asparagine. Particular amino acid sequence variants may differ from that shown in a figure herein by insertion, addition, substitution or deletion of 1 amino acid, 2, 3, 4, 5-10, 10-20, 20-30, 30-50, 50-100, 100-150, or more than 150 amino acids.
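Once two sequences have been aligned (by GAP, TBLASTN, Smith-Waterman or similar), percent identity and similarity are counted column by column. The Python sketch below is not any of those programs; it assumes the alignment is already given as two equal-length strings with "-" for gaps, uses only the conservative groups listed above (I/V/L/M, R/K, E/D, Q/N), and takes as denominator the columns where neither sequence has a gap. Real tools use fuller substitution matrices and differ in their choice of denominator. The second sequence in the example is a hypothetical variant.

```python
# Conservative groups taken from the examples in the text (an incomplete set).
CONSERVATIVE_GROUPS = [set("IVLM"), set("RK"), set("ED"), set("QN")]


def _conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in the same conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Percent identity and percent similarity over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(a == b or _conservative(a, b) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


if __name__ == "__main__":
    # First nine residues of a SAPL peptide shown later, against a made-up variant.
    print(identity_and_similarity("RIQQFDDGG", "KIQNFEDGG"))
```

In this example six of nine columns are identical (about 66.7% identity), and the remaining three (R/K, Q/N, D/E) are conservative, giving 100% similarity.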
Sequence comparison may be made over the full length of the relevant sequence shown herein, or may more preferably be over a contiguous sequence of about or greater than about 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 133, 150, 167, 200, 233, 250, 267, 300, 333, 350, 400, 450, 500, 550, 600, 650, 700, 750, 760, 770, 776, 780, or 790 amino acids or nucleotide triplets, compared with the relevant amino acid sequence or nucleotide sequence as the case may be.
The present invention also includes peptides which include or consist of fragments of a polypeptide of the invention.
The skilled person can use the techniques described herein and others well known in the art to produce large amounts of peptides, for instance by expression from encoding nucleic acid.
Peptides can also be generated wholly or partly by chemical synthesis. The compounds of the present invention can be readily prepared according to well-established, standard liquid or, preferably, solid-phase peptide synthesis methods, general descriptions of which are broadly available (see, for example, J.M. Stewart and J.D. Young, Solid Phase Peptide Synthesis, 2nd edition, Pierce Chemical Company, Rockford, Illinois (1984); M. Bodanszky and A. Bodanszky, The Practice of Peptide Synthesis, Springer Verlag, New York (1984); and Applied Biosystems 430A Users Manual, ABI Inc., Foster City, California), or they may be prepared in solution, by the liquid phase method or by any combination of solid-phase, liquid phase and solution chemistry, e.g. by first completing the respective peptide portion and then, if desired and appropriate, after removal of any protecting groups being present, by introduction of the residue X by reaction of the respective carbonic or sulfonic acid or a reactive derivative thereof.
The present invention also includes active portions, fragments, derivatives and functional mimetics of the polypeptides of the invention. An "active portion" of a polypeptide means a peptide which is less than said full length polypeptide, but which retains a biological activity such as disclosed herein.
A "fragment" of a polypeptide generally means a stretch of amino acid residues of at least about five contiguous amino acids, often at least about seven contiguous amino acids, typically at least about nine contiguous amino acids, more preferably at least about 13 contiguous amino acids, and, more preferably, at least about 20 to 30 or more contiguous amino acids. Fragments of the SAPL polypeptide sequence may include antigenic determinants or epitopes useful for raising antibodies to a portion of the amino acid sequence. Alanine scans are commonly used to find and refine peptide motifs within polypeptides, this involving the systematic replacement of each residue in turn with the amino acid alanine, followed by an assessment of biological activity.
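The alanine scan described above can be sketched in a few lines of Python. This is an illustrative helper (the function name is hypothetical) that generates the panel of single-substitution variants to be synthesised and assayed; it skips positions that are already alanine, since substituting them would not change the peptide.

```python
def alanine_scan(peptide: str):
    """Return (position, variant) pairs in which each non-alanine residue is
    replaced in turn by alanine. Positions are 1-based."""
    peptide = peptide.upper()
    return [(i + 1, peptide[:i] + "A" + peptide[i + 1:])
            for i, residue in enumerate(peptide) if residue != "A"]


if __name__ == "__main__":
    # Uses one of the SAPL peptides listed below as the input.
    for position, variant in alanine_scan("PESQRRSSSGSTDSE")[:3]:
        print(position, variant)
```

Each variant would then be tested for biological activity; positions where the alanine substitution abolishes activity mark residues likely to belong to the functional motif.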
Preferred fragments of SAPL include those with any of the following amino acid sequences:

RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS
which may be used for instance in raising or isolating antibodies. Variant and derivative peptides, i.e. peptides which have an amino acid sequence which differs from one of these sequences by way of addition, insertion, deletion or substitution of one or more amino acids, are also provided by the present invention, generally with the proviso that the variant or derivative peptide is bound by an antibody or other specific binding member which binds one of the peptides whose sequence is shown. A peptide which is a variant or derivative of one of the shown peptides may compete with the shown peptide for binding to a specific binding member, such as an antibody or antigen-binding fragment thereof.
Where additional amino acids are included in a peptide, these may be heterologous or foreign to the polypeptide of the invention, and the peptide may be about 20, 25, 30 or 35 amino acids in length. A peptide according to this aspect may be included within a larger fusion protein, particularly where the peptide is fused to a non-SAPL (i.e. heterologous or foreign) sequence, such as a polypeptide or protein domain.
A "derivative" of a polypeptide or a fragment thereof may include a polypeptide modified by varying the amino acid sequence of the protein, e.g. by manipulation of the nucleic acid encoding the protein or by altering the protein itself.
Such derivatives of the natural amino acid sequence may involve one or more of insertion, addition, deletion or substitution of one or more amino acids, which may be without fundamentally altering the qualitative nature of biological activity of the wild type polypeptide. Also encompassed within the scope of the present invention are functional mimetics of active fragments of the SAPL polypeptides provided (including alleles, mutants, derivatives and variants). The term "functional mimetic" means a substance which may not contain an active portion of the relevant amino acid sequence, and probably is not a peptide at all, but which retains in qualitative terms biological activity of natural SAPL polypeptide. The design and screening of candidate mimetics is described in detail below.
Other fragments of the polypeptides for which sequence information is provided herein are provided as aspects of the present invention, for instance corresponding to functional domains.
A polypeptide according to the present invention may be isolated and/or purified (e.g. using an antibody) for instance after production by expression from encoding nucleic acid (for which see below). Thus, a polypeptide may be provided free or substantially free from contaminants with which it is naturally associated (if it is a naturally-occurring polypeptide). A polypeptide may be provided free or substantially free of other polypeptides. Polypeptides according to the present invention may be generated wholly or partly by chemical synthesis. The isolated and/or purified polypeptide may be used in formulation of a composition, which may include at least one additional component, for example a pharmaceutical composition including a pharmaceutically acceptable excipient, vehicle or carrier. A composition including a polypeptide according to the invention may be used in prophylactic and/or therapeutic treatment as discussed below.
A polypeptide, peptide, allele, mutant, derivative or variant according to the present invention may be used as an immunogen or otherwise in obtaining specific antibodies. Antibodies are useful in purification and other manipulation of polypeptides and peptides, diagnostic screening and therapeutic contexts.
This is discussed further below.
A polypeptide according to the present invention may be used in screening for molecules which affect or modulate its activity or function, e.g. binding to or modulating the activity of a protein phosphatase, or ability to complement SAP mutant yeast. Such molecules may interact with SAPL or with one or more accessory molecules, and may be useful in a therapeutic (possibly including prophylactic) context.
It is well known that pharmaceutical research leading to the identification of a new drug may involve the screening of very large numbers of candidate substances, both before and even after a lead compound has been found. This is one factor which makes pharmaceutical research very expensive and time-consuming. Means for assisting in the screening process can have considerable commercial importance and utility. Such means for screening for substances potentially useful in treating or preventing IDDM or other disease are provided by polypeptides according to the present invention. Substances identified as modulators of the polypeptide represent an advance in the fight against IDDM and other diseases since they provide a basis for design and investigation of therapeutics for in vivo use. Furthermore, they may be useful in any of a number of conditions, including autoimmune diseases, such as glomerulonephritis, and diseases and disorders involving cellular proliferation, such as psoriasis, tumors and cancer, given the functional indications for SAPL discussed elsewhere herein. As noted elsewhere, SAPL, fragments thereof, and nucleic acid according to the invention may also be useful in combatting any of these diseases and disorders.
In various further aspects the present invention relates to screening and assay methods and means, and substances identified thereby.
Thus, further aspects of the present invention provide the use of a polypeptide or peptide (particularly a fragment of a polypeptide of the invention as disclosed), and/or encoding nucleic acid therefor, in screening or searching for and/or obtaining/identifying a substance, e.g. peptide or chemical compound, which interacts and/or binds with the polypeptide or peptide and/or interferes with its function or activity or that of another substance, e.g. polypeptide or peptide, which interacts and/or binds with the polypeptide or peptide of the invention. For instance, a method according to one aspect of the invention includes providing a polypeptide or peptide of the invention and bringing it into contact with a substance, which contact may result in binding between the polypeptide or peptide and the substance. Binding may be determined by any of a number of techniques available in the art, both qualitative and quantitative.
In various aspects the present invention is concerned with provision of assays for substances which inhibit interaction between a polypeptide of the invention and one or more protein phosphatases, particularly those similar to SIT4 such as human protein phosphatase 6 (Bastians and Ponstingl (1996) J. Cell Sci. 109: 2865-2874).
Further assays are for substances which interact with or bind a polypeptide of the invention and/or modulate one or more of its activities.
One aspect of the present invention provides an assay which includes:
(a) bringing into contact a polypeptide or peptide according to the invention and a putative binding molecule or other test substance; and (b) determining interaction or binding between the polypeptide or peptide and the test substance.
A substance which interacts with the polypeptide or peptide of the invention may be isolated and/or purified, manufactured and/or used to modulate its activity as discussed.
A further aspect of the present invention provides an assay method which includes:
(a) bringing into contact a substance including a SAPL polypeptide or fragment, mutant, variant or derivative thereof; a substance including a second polypeptide or a fragment, mutant, variant or derivative of said second polypeptide, which is able to bind the SAPL polypeptide; and a test compound, under conditions in which, in the absence of the test compound being an inhibitor, the two said substances interact; and
(b) determining interaction between said substances.
It is not necessary to use the entire proteins for assays of the invention which test for binding between two molecules.
Fragments may be generated and used in any suitable way known to those of skill in the art. Suitable ways of generating fragments include, but are not limited to, recombinant expression of a fragment from encoding DNA. Such fragments may be generated by taking encoding DNA, identifying suitable restriction enzyme recognition sites either side of the portion to be expressed, and cutting out said portion from the DNA. The portion may then be operably linked to a suitable promoter in a standard commercially available expression system. Another recombinant approach is to amplify the relevant portion of the DNA with suitable PCR primers. Small fragments (e.g. up to about 20 or 30 amino acids) may also be generated using peptide synthesis methods which are well known in the art.
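Identifying restriction sites either side of the portion to be expressed is a simple string search. The following Python sketch assumes a small, illustrative table of four common enzymes (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT, XhoI CTCGAG); because these recognition sequences are palindromic, searching one strand finds every site. The function names and the example sequence are hypothetical, and a real design would also check that the chosen enzymes do not cut inside the portion itself.

```python
# Illustrative subset of recognition sequences (all palindromic).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def site_positions(seq: str):
    """Sorted list of (0-based start, enzyme) for every recognition site."""
    seq = seq.upper()
    hits = []
    for enzyme, site in RECOGNITION_SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((start, enzyme))
            start = seq.find(site, start + 1)
    return sorted(hits)


def flanking_sites(seq: str, region_start: int, region_end: int):
    """Nearest site lying wholly upstream of region_start and nearest site
    starting at or after region_end (0-based, end-exclusive); None if absent."""
    hits = site_positions(seq)
    upstream = [(p, e) for p, e in hits
                if p + len(RECOGNITION_SITES[e]) <= region_start]
    downstream = [(p, e) for p, e in hits if p >= region_end]
    return (upstream[-1] if upstream else None,
            downstream[0] if downstream else None)


if __name__ == "__main__":
    # Hypothetical construct: EcoRI site, a 26-nt "portion", then a BamHI site.
    seq = "TTGAATTC" + "ATG" + "C" * 20 + "TAA" + "GGATCCTT"
    print(flanking_sites(seq, 8, 34))
```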
The precise format of the assay of the invention may be varied by those of skill in the art using routine skill and knowledge. For example, the interaction between the polypeptides may be studied in vitro by labelling one with a detectable label and bringing it into contact with the other which has been immobilised on a solid support. Suitable detectable labels include 35S-methionine which may be incorporated into recombinantly produced peptides and polypeptides. Recombinantly produced peptides and polypeptides may also be expressed as a fusion protein containing an epitope which can be labelled with an antibody.
Fusion proteins may be generated that incorporate six histidine residues at either the N-terminus or C-terminus of the recombinant protein. Such a histidine tag may be used for purification of the protein by using commercially available columns which contain a metal ion, either nickel or cobalt (Clontech, Palo Alto, CA, USA). These tags also serve for detecting the protein using commercially available monoclonal antibodies directed against the six histidine residues (Clontech, Palo Alto, CA, USA).
The protein which is immobilized on a solid support may be immobilized using an antibody against that protein bound to a solid support or via other technologies which are known per se. A preferred in vitro interaction may utilise a fusion protein including glutathione-S-transferase (GST). This may be immobilized on glutathione agarose beads. In an in vitro assay format of the type described above a test compound can be assayed by determining its ability to diminish the amount of labelled peptide or polypeptide which binds to the immobilized GST-fusion polypeptide. This may be determined by fractionating the glutathione-agarose beads by SDS-polyacrylamide gel electrophoresis. Alternatively, the beads may be rinsed to remove unbound protein and the amount of protein which has bound can be determined by counting the amount of label present in, for example, a suitable scintillation counter.
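When bound label is counted, a compound's effect is usually expressed as percent inhibition of binding. This Python sketch assumes counts per minute (cpm) from beads incubated with and without the test compound, plus an assumed background reading (for example beads carrying GST alone) that is subtracted from both; the function name and numbers are hypothetical.

```python
def percent_inhibition(cpm_with_compound: float, cpm_without_compound: float,
                       cpm_background: float = 0.0) -> float:
    """Percent reduction in specifically bound label caused by a test compound.

    Background (non-specific) counts are subtracted from both readings first.
    """
    specific_without = cpm_without_compound - cpm_background
    if specific_without <= 0:
        raise ValueError("no specific binding above background in the control")
    specific_with = cpm_with_compound - cpm_background
    return 100.0 * (1.0 - specific_with / specific_without)


if __name__ == "__main__":
    # Hypothetical counts: 10,500 cpm without compound, 3,000 with, 500 background.
    print(percent_inhibition(3000, 10500, 500))
```

With these placeholder counts, specific binding falls from 10,000 to 2,500 cpm, i.e. 75% inhibition.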
An assay according to the present invention may also take the form of an in vivo assay. The in vivo assay may be performed in a cell line such as a yeast strain in which the relevant polypeptides or peptides are expressed from one or more vectors introduced into the cell.
A method of screening for a substance which modulates activity of a polypeptide may include contacting one or more test substances with the polypeptide in a suitable reaction medium, testing the activity of the treated polypeptide and comparing that activity with the activity of the polypeptide in comparable reaction medium untreated with the test substance or substances. A difference in activity between the treated and untreated polypeptides is indicative of a modulating effect of the relevant test substance or substances.
In a further aspect of the invention there is provided an assay method which includes:
(a) bringing into contact a substance including a fragment of a polypeptide according to the invention including a putative phosphorylation site, e.g. as identified in Table 4, or a mutant, variant or derivative thereof and a test compound in the presence of a kinase under conditions in which the kinase normally phosphorylates said fragment, mutant, variant or derivative; and (b) determining phosphorylation of said fragment, mutant, variant or derivative.
The kinase may be, for example, cAMP-dependent protein kinase, protein kinase C, or casein kinase 2. Phosphorylation may be determined for example by immobilising a polypeptide of the invention, or fragment, mutant, variant or derivative thereof, e.g. on a bead or plate, and detecting phosphorylation using an antibody or other binding molecule which binds the relevant site of phosphorylation with a different affinity when the site is phosphorylated from when the site is not phosphorylated. Such antibodies may be obtained by means of any standard technique as discussed elsewhere herein, e.g. using a phosphorylated peptide (such as a fragment of a SAPL polypeptide). Binding of a binding molecule which discriminates between the phosphorylated and non-phosphorylated form of the polypeptide or relevant fragment, mutant, variant or derivative thereof may be assessed using any technique available to those skilled in the art, which may involve determination of the presence of a suitable label, such as fluorescence. Phosphorylation may also be determined by immobilisation of the polypeptide or a fragment, mutant, variant or derivative thereof on a suitable substrate such as a bead or plate, wherein the substrate is impregnated with scintillant, as in a standard scintillation proximity assay, with phosphorylation being determined via measurement of the incorporation of radioactive phosphate.
Phosphate incorporation into a polypeptide or a fragment, mutant, variant or derivative thereof may be determined by precipitation with acid, such as trichloroacetic acid, and collection of the precipitate on a suitable material such as nitrocellulose filter paper, followed by measurement of incorporation of radiolabelled phosphate. SDS-PAGE separation of substrate may be employed followed by detection of radiolabel.
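Candidate fragments carrying putative phosphorylation sites for the three kinases named above can be located with short consensus patterns. The Python sketch below assumes the PROSITE-style patterns [RK](2)-x-[ST] (cAMP-dependent protein kinase), [ST]-x-[RK] (protein kinase C) and [ST]-x(2)-[DE] (casein kinase 2); these are an assumption, not necessarily the criteria behind Table 4, and they over-predict, so hits are only starting points for the assay. The example input is one of the SAPL peptides listed earlier.

```python
import re

# Assumed PROSITE-style consensus patterns, each with the offset of the
# phosphoacceptor Ser/Thr within the match. Lookaheads allow overlapping hits.
PATTERNS = {
    "cAMP-dependent protein kinase": (r"(?=([RK]{2}.[ST]))", 3),
    "protein kinase C": (r"(?=([ST].[RK]))", 0),
    "casein kinase 2": (r"(?=([ST]..[DE]))", 0),
}


def candidate_sites(protein: str) -> dict:
    """1-based positions of putative phosphoacceptor residues per kinase."""
    protein = protein.upper()
    return {kinase: [m.start() + offset + 1 for m in re.finditer(pattern, protein)]
            for kinase, (pattern, offset) in PATTERNS.items()}


if __name__ == "__main__":
    print(candidate_sites("PESQRRSSSGSTDSE"))
```

For this peptide the patterns flag Ser8 (cAMP-dependent kinase, after the RR pair), Ser3 (protein kinase C) and Thr12 (casein kinase 2).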
Combinatorial library technology (Schultz, JS (1996) Biotechnol. Prog. 12:729-743) provides an efficient way of testing a potentially vast number of different substances for ability to modulate activity of a polypeptide. Prior to or as well as being screened for modulation of activity, test substances may be screened for ability to interact with the polypeptide, e.g. in a yeast two-hybrid system (which requires that both the polypeptide and the test substance can be expressed in yeast from encoding nucleic acid). This may be used as a coarse screen prior to testing a substance for actual ability to modulate activity of the polypeptide.
The amount of test substance or compound which may be added to an assay of the invention will normally be determined by trial and error depending upon the type of compound used.
Typically, from about 0.01 to 100 nM concentrations of putative inhibitor compound may be used, for example from 0.1 to 10 nM. Greater concentrations may be used when a peptide is the test substance.
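Because such a range spans several orders of magnitude, test concentrations are commonly spaced logarithmically. This Python sketch assumes half-log steps (two points per decade), which is a common convention rather than something stated above; the function name is hypothetical. Over 0.01 to 100 nM it gives nine concentrations.

```python
import math


def log_dilution_series(low_nM: float, high_nM: float,
                        points_per_decade: int = 2) -> list:
    """Logarithmically spaced concentrations from low_nM to high_nM inclusive."""
    if not 0 < low_nM < high_nM:
        raise ValueError("require 0 < low_nM < high_nM")
    decades = math.log10(high_nM / low_nM)
    steps = round(decades * points_per_decade)
    return [low_nM * 10 ** (i / points_per_decade) for i in range(steps + 1)]


if __name__ == "__main__":
    for conc in log_dilution_series(0.01, 100):
        print(f"{conc:.3g} nM")
```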

Compounds which may be used may be natural or synthetic chemical compounds used in drug screening programmes.
Extracts of plants which contain several characterised or uncharacterised components may also be used. A further class of putative inhibitor compounds can be derived from the SAPL polypeptide and/or a ligand which binds it. Peptide fragments of from 5 to 40 amino acids, for example from 6 to 10 amino acids, from the region of the relevant polypeptide responsible for interaction may be tested for their ability to disrupt such interaction.
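Peptide fragments of this kind are often generated as an overlapping set tiled across the interacting region. This minimal Python sketch assumes a fixed peptide length and step size (both hypothetical parameters); the region used in the example is one of the SAPL peptides listed earlier and stands in for an interaction region, which is not identified here.

```python
def overlapping_peptides(region: str, length: int = 8, step: int = 1) -> list:
    """Peptides of a fixed length tiled across a region with the given step."""
    if not 1 <= length <= len(region):
        raise ValueError("length must be between 1 and len(region)")
    if step < 1:
        raise ValueError("step must be at least 1")
    return [region[i:i + length] for i in range(0, len(region) - length + 1, step)]


if __name__ == "__main__":
    for peptide in overlapping_peptides("RIQQFDDGGSDEEDI", length=8, step=2):
        print(peptide)
```

With a step greater than 1 the last residues may not be covered (here the C-terminal isoleucine is missed), so a final window ending at the last residue can be added if full coverage is needed.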
Other candidate inhibitor compounds may be based on modelling the 3-dimensional structure of a polypeptide or peptide fragment and using rational drug design to provide potential inhibitor compounds with particular molecular shape, size and charge characteristics.
Following identification of a substance which modulates or affects polypeptide activity, the substance may be investigated further. Furthermore, it may be manufactured and/or used in the preparation, i.e. manufacture or formulation, of a composition such as a medicament, pharmaceutical composition or drug. These may be administered to individuals.
Thus, the present invention extends in various aspects not only to a substance identified as a modulator of polypeptide activity, in accordance with what is disclosed herein, but also to a pharmaceutical composition, medicament, drug or other composition comprising such a substance; a method comprising administration of such a composition to a patient, e.g. for treatment (which may include preventative treatment) of IDDM or other disease; use of such a substance in manufacture of a composition for administration, e.g. for treatment of IDDM or other disease; and a method of making a pharmaceutical composition comprising admixing such a substance with a pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.
A substance identified as a modulator of polypeptide or promoter function may be peptide or non-peptide in nature.
Non-peptide "small molecules" are often preferred for many in vivo pharmaceutical uses. Accordingly, a mimetic or mimic of the substance (particularly if a peptide) may be designed for pharmaceutical use. The designing of mimetics to a known pharmaceutically active compound is a known approach to the development of pharmaceuticals based on a "lead" compound.
This might be desirable where the active compound is difficult or expensive to synthesise or where it is unsuitable for a particular method of administration, e.g. peptides are not well suited as active agents for oral compositions as they tend to be quickly degraded by proteases in the alimentary canal. Mimetic design, synthesis and testing may be used to avoid randomly screening large numbers of molecules for a target property.
There are several steps commonly taken in the design of a mimetic from a compound having a given target property.
Firstly, the particular parts of the compound that are critical and/or important in determining the target property are determined. In the case of a peptide, this can be done by systematically varying the amino acid residues in the peptide, e.g. by substituting each residue in turn. These parts or residues constituting the active region of the compound are known as its "pharmacophore".
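One widely used way of substituting each residue in turn is alanine scanning, in which every position is replaced by alanine and each variant is assayed for the target property. The sketch below assumes alanine as the replacement residue (the document does not specify one) and uses a hypothetical function name; it generates the single-substitution series for a peptide.

```python
def alanine_scan(peptide: str, replacement: str = "A") -> list[tuple[int, str]]:
    """Return (1-based position, variant) for each single-residue substitution.

    Positions that already carry the replacement residue are skipped; a
    different substitute (e.g. glycine) could be used there instead.
    """
    variants = []
    for i, residue in enumerate(peptide):
        if residue == replacement:
            continue
        variants.append((i + 1, peptide[:i] + replacement + peptide[i + 1:]))
    return variants


for pos, seq in alanine_scan("PESQRRS"):
    print(pos, seq)
```

Positions where substitution abolishes the target property would be candidate members of the pharmacophore.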
Once the pharmacophore has been found, its structure is modelled according to its physical properties, e.g. stereochemistry, bonding, size and/or charge, using data from a range of sources, e.g. spectroscopic techniques, X-ray diffraction data and NMR. Computational analysis, similarity mapping (which models the charge and/or volume of a pharmacophore, rather than the bonding between atoms) and other techniques can be used in this modelling process.
In a variant of this approach, the three-dimensional structures of the ligand and its binding partner are modelled. This can be especially useful where the ligand and/or binding partner change conformation on binding, allowing the model to take account of this in the design of the mimetic.
A template molecule is then selected onto which chemical groups which mimic the pharmacophore can be grafted. The template molecule and the chemical groups grafted on to it can conveniently be selected so that the mimetic is easy to synthesise, is likely to be pharmacologically acceptable, and does not degrade in vivo, while retaining the biological activity of the lead compound. The mimetic or mimetics found by this approach can then be screened to see whether they have the target property, or to what extent they exhibit it.
Further optimisation or modification can then be carried out to arrive at one or more final mimetics for in vivo or clinical testing.
Mimetics of substances identified as having ability to modulate SAPL polypeptide or promoter activity using a screening method as disclosed herein are included within the scope of the present invention. A polypeptide, peptide or substance able to modulate activity of a polypeptide according to the present invention may be provided in a kit, e.g. sealed in a suitable container which protects its contents from the external environment. Such a kit may include instructions for use.

A convenient way of producing a polypeptide according to the present invention is to express nucleic acid encoding it, by use of the nucleic acid in an expression system. Accordingly, the present invention also encompasses a method of making a polypeptide (as disclosed), the method including expression from nucleic acid encoding the polypeptide (generally nucleic acid according to the invention). This may conveniently be achieved by growing in culture a host cell containing such a vector, under appropriate conditions which cause or allow expression of the polypeptide. Polypeptides may also be expressed in in vitro systems, such as reticulocyte lysate.
Systems for cloning and expression of a polypeptide in a variety of different host cells are well known. Suitable host cells include bacteria, eukaryotic cells such as mammalian and yeast cells, and baculovirus systems. Mammalian cell lines available in the art for expression of a heterologous polypeptide include Chinese hamster ovary cells, HeLa cells, baby hamster kidney cells, COS cells and many others. A common, preferred bacterial host is E. coli. Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. Vectors may be plasmids, viral e.g. 'phage, or phagemid, as appropriate.
For further details see, for example, Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press. Many known techniques and protocols for manipulation of nucleic acid, for example in preparation of nucleic acid constructs, mutagenesis, sequencing, introduction of DNA into cells and gene expression, and analysis of proteins, are described in detail in Current Protocols in Molecular Biology, Ausubel et al. eds., John Wiley & Sons, 1992.
Thus, a further aspect of the present invention provides a host cell containing nucleic acid as disclosed herein. The nucleic acid of the invention may be integrated into the genome (e.g. chromosome) of the host cell. Integration may be promoted by inclusion of sequences which promote recombination with the genome, in accordance with standard techniques. The nucleic acid may be on an extra-chromosomal vector within the cell.
A still further aspect provides a method which includes introducing the nucleic acid into a host cell. The introduction, which may (particularly for in vitro introduction) be generally referred to without limitation as "transformation", may employ any available technique. For eukaryotic cells, suitable techniques may include calcium phosphate transfection, DEAE-Dextran, electroporation, liposome-mediated transfection and transduction using retrovirus or other virus, e.g. adenovirus, vaccinia or, for insect cells, baculovirus. For bacterial cells, suitable techniques may include calcium chloride transformation, electroporation and transfection using bacteriophage.
Marker genes such as antibiotic resistance or sensitivity genes may be used in identifying clones containing nucleic acid of interest, as is well known in the art.
The introduction may be followed by causing or allowing expression from the nucleic acid, e.g. by culturing host cells (which may include cells actually transformed although more likely the cells will be descendants of the transformed cells) under conditions for expression of the gene, so that the encoded polypeptide is produced. If the polypeptide is expressed coupled to an appropriate signal leader peptide it may be secreted from the cell into the culture medium.
Following production by expression, a polypeptide may be isolated and/or purified from the host cell and/or culture medium, as the case may be, and subsequently used as desired, e.g. in the formulation of a composition which may include one or more additional components, such as a pharmaceutical composition which includes one or more pharmaceutically acceptable excipients, vehicles or carriers (e.g. see below).
Introduction of nucleic acid may take place in vivo by way of gene therapy, as discussed below. A host cell containing nucleic acid according to the present invention, e.g. as a result of introduction of the nucleic acid into the cell or into an ancestor of the cell and/or genetic alteration of the sequence endogenous to the cell or ancestor (which introduction or alteration may take place in vivo or ex vivo), may be comprised (e.g. in the soma) within an organism which is an animal, particularly a mammal, which may be human or non-human, such as rabbit, guinea pig, rat, mouse or other rodent, cat, dog, pig, sheep, goat, cattle or horse, or which is a bird, such as a chicken. Genetically modified or transgenic animals or birds comprising such a cell are also provided as further aspects of the present invention.
Thus, in various further aspects, the present invention provides a non-human animal with a human SAPL transgene within its genome. The transgene may have the sequence of any of the isoforms identified herein or a mutant, derivative, allele or variant thereof as disclosed. In one preferred embodiment, the heterologous human SAPL sequence replaces the endogenous animal sequence. In other preferred embodiments, one or more copies of the human SAPL sequence are added to the animal genome. Preferably the animal is a rodent, and most preferably mouse or rat.
This may have a therapeutic aim. (Gene therapy is discussed below.) The presence of a mutant, allele or variant sequence within cells of an organism, particularly when in place of a homologous endogenous sequence, may allow the organism to be used as a model in testing and/or studying the role of the SAPL gene, or substances which modulate activity of the encoded polypeptide and/or promoter in vitro or are otherwise indicated to be of therapeutic potential.
An animal model for SAPL deficiency may be constructed using standard techniques for introducing mutations into an animal germ-line. In one example of this approach, using a mouse, a vector carrying an insertional mutation within the SAPL gene may be transfected into embryonic stem cells. A selectable marker, for example an antibiotic resistance gene such as neoR, may be included to facilitate selection of clones in which the mutant gene has replaced the endogenous wild-type homologue. Such clones may also be identified or further investigated by Southern blot hybridisation. The clones may then be expanded and cells injected into mouse blastocyst-stage embryos. Mice in which the injected cells have contributed to the development of the mouse may be identified by Southern blotting. These chimeric mice may then be bred to produce mice which carry one copy of the mutation in the germ line. These heterozygous mutant animals may then be bred to produce mice carrying mutations in the gene homozygously. The mice having a heterozygous mutation in the SAPL gene may be a suitable model for human individuals having one copy of the gene mutated in the germ line who are at risk of developing IDDM or other disease.
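The breeding scheme in the preceding paragraph follows Mendelian segregation at a single autosomal locus. As a minimal sketch, assuming one locus, equal viability of all genotypes, and a hypothetical allele notation ("+" for wild type, "m" for the targeted allele), the expected offspring proportions of each cross can be enumerated. A heterozygote intercross is expected to give about one quarter homozygous mutants.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Expected offspring genotype frequencies for one autosomal locus.

    Genotypes are two-character strings such as "++" (wild type), "+m"
    (heterozygote) or "mm" (homozygous mutant). Alleles are sorted so that
    "m+" and "+m" count as the same genotype.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in sorted(counts.items())}


print(cross("+m", "++"))  # germ-line heterozygote crossed with wild type
print(cross("+m", "+m"))  # heterozygote intercross
```

Deviation from these ratios in real litters may itself be informative, e.g. about reduced viability of homozygous mutants.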
Animal models may also be useful for any of the various diseases discussed elsewhere herein.
Instead of or as well as being used for the production of a polypeptide encoded by a transgene, host cells may be used as a nucleic acid factory to replicate the nucleic acid of interest in order to generate large amounts of it. Multiple copies of nucleic acid of interest may be made within a cell when coupled to an amplifiable gene such as dihydrofolate reductase (DHFR), as is well known. Host cells transformed with nucleic acid of interest, or which are descended from host cells into which nucleic acid was introduced, may be cultured under suitable conditions, e.g. in a fermentor, taken from the culture and subjected to processing to purify the nucleic acid. Following purification, the nucleic acid or one or more fragments thereof may be used as desired, for instance in a diagnostic or prognostic assay as discussed elsewhere herein.
The provision of the novel SAPL polypeptide isoforms and mutants, alleles, variants and derivatives enables for the first time the production of antibodies able to bind these molecules specifically.
Accordingly, a further aspect of the present invention provides an antibody able to bind specifically to the polypeptide whose sequence is given in a figure herein. Such an antibody may be specific in the sense of being able to distinguish between the polypeptide it is able to bind and other human polypeptides for which it has no or substantially no binding affinity (e.g. a binding affinity about 1000-fold lower). Specific antibodies bind an epitope on the molecule which is either not present or is not accessible on other molecules. Antibodies according to the present invention may be specific for the wild-type polypeptide. Antibodies according to the invention may be specific for a particular mutant, variant, allele or derivative polypeptide as between that molecule and the wild-type polypeptide, so as to be useful in diagnostic and prognostic methods as discussed below. Antibodies are also useful in purifying the polypeptide or polypeptides to which they bind, e.g. following production by recombinant expression from encoding nucleic acid.
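Because affinity is inversely related to the dissociation constant Kd, a binding affinity about 1000-fold lower corresponds to a Kd about 1000-fold higher. The sketch below expresses that comparison; the 1000-fold threshold mirrors the example figure given above, while the function names and example Kd values are hypothetical.

```python
def fold_selectivity(kd_target_m: float, kd_off_target_m: float) -> float:
    """Ratio of dissociation constants; values above 1 mean tighter target binding."""
    if kd_target_m <= 0 or kd_off_target_m <= 0:
        raise ValueError("Kd values must be positive")
    return kd_off_target_m / kd_target_m


def is_specific(kd_target_m: float, kd_off_target_m: float,
                threshold: float = 1000.0) -> bool:
    """True if affinity for the target is at least `threshold`-fold higher."""
    return fold_selectivity(kd_target_m, kd_off_target_m) >= threshold


# Hypothetical antibody: Kd 1 nM for the target, 2 uM for an unrelated protein.
print(is_specific(1e-9, 2e-6))  # True (about 2000-fold)
```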
Preferred antibodies according to the invention are isolated, in the sense of being free from contaminants such as antibodies able to bind other polypeptides and/or free of serum components. Monoclonal antibodies are preferred for some purposes, though polyclonal antibodies are within the scope of the present invention.
Antibodies may be obtained using techniques which are standard in the art. Methods of producing antibodies include immunising a mammal (e.g. mouse, rat, rabbit, horse, goat, sheep or monkey) with the protein or a fragment thereof.
Antibodies may be obtained from immunised animals using any of a variety of techniques known in the art, and screened, preferably using binding of antibody to antigen of interest.
For instance, Western blotting techniques or immunoprecipitation may be used (Armitage et al., 1992, Nature 357: 80-82). Isolation of antibodies and/or antibody-producing cells from an animal may be accompanied by a step of sacrificing the animal.
As an alternative or supplement to immunising a mammal with a peptide, an antibody specific for a protein may be obtained from a recombinantly produced library of expressed immunoglobulin variable domains, e.g. using lambda bacteriophage or filamentous bacteriophage which display functional immunoglobulin binding domains on their surfaces; for instance see WO92/01047. The library may be naive, that is constructed from sequences obtained from an organism which has not been immunised with any of the proteins (or fragments), or may be one constructed using sequences obtained from an organism which has been exposed to the antigen of interest.
Suitable peptides for use in immunising an animal and/or isolating anti-SAPL antibody include any of the following amino acid sequences:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS
Antibodies according to the present invention may be modified in a number of ways. Indeed the term "antibody" should be construed as covering any binding substance having a binding domain with the required specificity. Thus the invention covers antibody fragments, derivatives, functional equivalents and homologues of antibodies, including synthetic molecules and molecules whose shape mimics that of an antibody enabling it to bind an antigen or epitope.
Example antibody fragments capable of binding an antigen or other binding partner are the Fab fragment consisting of the VL, VH, CL and CH1 domains; the Fd fragment consisting of the VH and CH1 domains; the Fv fragment consisting of the VL and VH domains of a single arm of an antibody; the dAb fragment which consists of a VH domain; isolated CDR regions; and F(ab')2 fragments, a bivalent fragment including two Fab fragments linked by a disulphide bridge at the hinge region.
Single chain Fv fragments are also included.
A hybridoma producing a monoclonal antibody according to the present invention may be subject to genetic mutation or other changes. It will further be understood by those skilled in the art that a monoclonal antibody can be subjected to the techniques of recombinant DNA technology to produce other antibodies or chimeric molecules which retain the specificity of the original antibody. Such techniques may involve introducing DNA encoding the immunoglobulin variable region, or the complementarity determining regions (CDRs), of an antibody to the constant regions, or constant regions plus framework regions, of a different immunoglobulin. See, for instance, EP184187A, GB 2188638A or EP-A-0239400. Cloning and expression of chimeric antibodies are described in EP-A-0120694 and EP-A-0125023.
Hybridomas capable of producing antibody with desired binding characteristics are within the scope of the present invention, as are host cells, eukaryotic or prokaryotic, containing nucleic acid encoding antibodies (including antibody fragments) and capable of their expression. The invention also provides methods of production of the antibodies including growing a cell capable of producing the antibody under conditions in which the antibody is produced, and preferably secreted.
The reactivities of antibodies on a sample may be determined by any appropriate means. Tagging with individual reporter molecules is one possibility. The reporter molecules may directly or indirectly generate detectable, and preferably measurable, signals. Reporter molecules may be linked directly or indirectly, and covalently (e.g. via a peptide bond) or non-covalently. Linkage via a peptide bond may be as a result of recombinant expression of a gene fusion encoding antibody and reporter molecule.
One favoured mode is by covalent linkage of each antibody with an individual fluorochrome, phosphor or laser dye with spectrally isolated absorption or emission characteristics.
Suitable fluorochromes include fluorescein, rhodamine, phycoerythrin and Texas Red. Suitable chromogenic dyes include diaminobenzidine.
Other reporters include macromolecular colloidal particles or particulate material such as latex beads that are coloured, magnetic or paramagnetic, and biologically or chemically active agents that can directly or indirectly cause detectable signals to be visually observed, electronically detected or otherwise recorded. These molecules may be enzymes which catalyse reactions that develop or change colours or cause changes in electrical properties, for example. They may be molecularly excitable, such that electronic transitions between energy states result in characteristic spectral absorptions or emissions. They may include chemical entities used in conjunction with biosensors. Biotin/avidin or biotin/streptavidin and alkaline phosphatase detection systems may be employed.
The mode of determining binding is not a feature of the present invention, and those skilled in the art are able to choose a suitable mode according to their preference and general knowledge. Particular embodiments of antibodies according to the present invention include antibodies able to bind and/or which bind specifically, e.g. with an affinity of at least 10-' M, to one of the following peptides:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE

Antibodies according to the present invention may be used in screening for the presence of a polypeptide, for example in a test sample containing cells or cell lysate as discussed, and may be used in purifying and/or isolating a polypeptide according to the present invention, for instance following production of the polypeptide by expression from encoding nucleic acid therefor. Antibodies may modulate the activity of the polypeptide to which they bind and so, if that polypeptide has a deleterious effect in an individual, may be useful in a therapeutic context (which may include prophylaxis).
An antibody may be provided in a kit, which may include instructions for use of the antibody, e.g. in determining the presence of a particular substance in a test sample. One or more other reagents may be included, such as labelling molecules, buffer solutions, elutants and so on. Reagents may be provided within containers which protect them from the external environment, such as a sealed vial.
The identification of the SAPL gene and indications of its association with IDDM and other diseases pave the way for aspects of the present invention that provide the use of materials and methods, such as are disclosed and discussed above, for establishing the presence or absence in a test sample of a variant form of the gene, in particular an allele or variant specifically associated with IDDM or other disease.
This may be for diagnosing a predisposition of an individual to IDDM or other disease. It may be for diagnosing IDDM in a patient with the disease as being associated with the SAPL gene.

This allows for planning of appropriate therapeutic and/or prophylactic treatment, permitting stream-lining of treatment by targeting those most likely to benefit.
A variant form of the gene may contain one or more insertions, deletions, substitutions and/or additions of one or more nucleotides compared with the wild-type sequence (such as shown in Table 2) which may or may not disrupt the gene function. Differences at the nucleic acid level are not necessarily reflected by a difference in the amino acid sequence of the encoded polypeptide. However, a mutation or other difference in a gene may result in a frame-shift or stop codon, which could seriously affect the nature of the polypeptide produced (if any), or a point mutation or gross mutational change to the encoded polypeptide, including insertion, deletion, substitution and/or addition of one or more amino acids or regions in the polypeptide. A mutation in a promoter sequence or other regulatory region may prevent or reduce expression from the gene or affect the processing or stability of the mRNA transcript. For instance, a sequence alteration may affect alternative splicing of mRNA. As discussed, various SAPL isoforms resulting from alternative splicing are provided by the present invention.
There are various methods for determining the presence or absence in a test sample of a particular nucleic acid sequence, such as the sequence shown in any figure herein, or a mutant, variant or allele thereof, e.g. including an alteration shown in Table 2.
Tests may be carried out on preparations containing genomic DNA, cDNA and/or mRNA. Testing cDNA or mRNA has the advantage of the complexity of the nucleic acid being reduced by the absence of intron sequences, but the possible disadvantage of extra time and effort being required in making the preparations. RNA is more difficult to manipulate than DNA because of the widespread occurrence of RNases. Nucleic acid in a test sample may be sequenced and the sequence compared with the sequence shown in any of the figures herein, to determine whether or not a difference is present. If so, the difference can be compared with known susceptibility alleles (e.g. as shown in Table 2) to determine whether the test nucleic acid contains one or more of the variations indicated, or the difference can be investigated for association with IDDM or other disease.
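The comparison of a test sequence with a reference can be sketched as a position-by-position scan, assuming the two sequences have already been aligned without gaps (insertions and deletions would need a proper alignment step first). The known-alteration set in the usage lines is a hypothetical placeholder standing in for the entries of Table 2, which are not reproduced here; the function names are also hypothetical.

```python
def find_differences(reference: str, test: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, test base) for aligned sequences."""
    if len(reference) != len(test):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [(i + 1, r, t)
            for i, (r, t) in enumerate(zip(reference.upper(), test.upper()))
            if r != t]


def flag_known(differences, known_alterations):
    """Split differences into those already in a known set and novel ones."""
    known = [d for d in differences if d in known_alterations]
    novel = [d for d in differences if d not in known_alterations]
    return known, novel


# Hypothetical reference, test sequence and known-alteration set.
diffs = find_differences("ACGTGCAACT", "ACGTACAATT")
print(flag_known(diffs, {(5, "G", "A")}))
```

Novel differences would be the ones to investigate further for association with IDDM or other disease.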
Since it will not generally be time- or labour-efficient to sequence all nucleic acid in a test sample or even the whole SAPL gene, a specific amplification reaction such as PCR using one or more pairs of primers may be employed to amplify the region of interest in the nucleic acid, for instance the SAPL gene or a particular region in which polymorphisms associated with IDDM or other disease susceptibility occur. The amplified nucleic acid may then be sequenced as above, and/or tested in any other way to determine the presence or absence of a particular feature. Nucleic acid for testing may be prepared from nucleic acid removed from cells or in a library using a variety of other techniques such as restriction enzyme digest and electrophoresis.
Nucleic acid may be screened using a variant- or allele-specific probe. Such a probe corresponds in sequence to a region of the SAPL gene, or its complement, containing a sequence alteration known to be associated with IDDM or other disease susceptibility. Under suitably stringent conditions, specific hybridisation of such a probe to test nucleic acid is indicative of the presence of the sequence alteration in the test nucleic acid. For efficient screening purposes, more than one probe may be used on the same test sample.
Allele- or variant-specific oligonucleotides may similarly be used in PCR to specifically amplify particular sequences if present in a test sample. Assessment of whether a PCR band contains a gene variant may be carried out in a number of ways familiar to those skilled in the art. The PCR product may for instance be treated in a way that enables one to display the polymorphism on a denaturing polyacrylamide DNA sequencing gel, with specific bands that are linked to the gene variants being selected.
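In allele-specific PCR of this kind, discrimination usually comes from placing the variant base at the 3' terminus of the primer, because a primer with a 3'-terminal mismatch extends poorly. The sketch below derives such forward and reverse primers from a sense-strand template. The template, variant position, primer length and function names are hypothetical, and practical designs also check melting temperature and often add a deliberate extra mismatch near the 3' end, which is not done here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_specific_forward_primer(template: str, snp_index: int,
                                   allele: str, length: int = 20) -> str:
    """Sense-strand primer whose 3'-terminal base is the tested allele.

    template is the sense strand 5'->3'; snp_index is the 0-based variant position.
    """
    if length < 2 or snp_index + 1 < length:
        raise ValueError("not enough upstream sequence for this primer length")
    return template[snp_index + 1 - length:snp_index] + allele


def allele_specific_reverse_primer(template: str, snp_index: int,
                                   allele: str, length: int = 20) -> str:
    """Antisense-strand primer whose 3'-terminal base pairs with the allele."""
    if length < 2 or snp_index + length > len(template):
        raise ValueError("not enough downstream sequence for this primer length")
    return reverse_complement(allele + template[snp_index + 1:snp_index + length])


template = "GATTACAGGC"  # hypothetical; variant at index 6 (A in the reference)
for allele in "AG":
    print(allele, allele_specific_forward_primer(template, 6, allele, 5))
```

Two reactions, one per allele-specific primer, are typically run in parallel so that each allele gives its own band.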
SSCP heteroduplex analysis may be used for screening DNA fragments for sequence variants/mutations. It generally involves amplifying radiolabelled 100-300 bp fragments of the gene, diluting these products and denaturing them at 95°C. The fragments are quick-cooled on ice so that the DNA remains in single-stranded form. These single-stranded fragments are run through acrylamide-based gels. Differences in sequence composition cause the single-stranded molecules to adopt different conformations in the gel matrix, making their mobility different from that of wild-type fragments and thus allowing mutations in the analysed fragments to be detected relative to a control fragment when the gel is exposed to X-ray film. Fragments with altered mobility/conformation may be excised directly from the gel and sequenced to identify the mutation.
Sequencing of a PCR product may involve precipitation with isopropanol, resuspension and sequencing using a TaqFS+ Dye terminator sequencing kit. Extension products may be electrophoresed on an ABI 377 DNA sequencer and data analysed using Sequence Navigator software.
A further possible screening approach employs a PTT (protein truncation test) assay in which fragments are amplified with primers that contain the consensus Kozak initiation sequence and a T7 RNA polymerase promoter. These extra sequences are incorporated into the 5' primer such that they are in frame with the native coding sequence of the fragment being analysed. These PCR products are introduced into a coupled transcription/translation system. This reaction allows the production of RNA from the fragment and translation of this RNA into a protein fragment.
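The 5' PTT primer described above can be assembled from a promoter tail, a Kozak-type start context and a gene-specific annealing region that begins on a codon boundary, so that translation from the added ATG continues in the native reading frame. This is a simplified sketch: the promoter and Kozak strings are commonly used versions rather than sequences given in this document, spacer bases often placed between the elements are omitted, and the function name is hypothetical.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # widely used T7 promoter sequence (assumed)
KOZAK_START = "GCCACCATG"             # Kozak-type context ending in the ATG start codon


def ptt_forward_primer(coding_seq: str, start: int, anneal_len: int = 20) -> str:
    """5' PTT primer: T7 promoter + Kozak/ATG + in-frame gene-specific region.

    coding_seq is the native coding sequence (first base = codon position 1);
    start is the 0-based position where the amplified fragment begins and
    must be a codon boundary so that the added ATG is in frame.
    """
    if start % 3 != 0:
        raise ValueError("start must be a multiple of 3 to stay in frame")
    anneal = coding_seq[start:start + anneal_len]
    if len(anneal) < anneal_len:
        raise ValueError("coding sequence too short for the annealing length")
    return T7_PROMOTER + KOZAK_START + anneal


# Hypothetical coding sequence; fragment starts at codon 2.
print(ptt_forward_primer("ATGAAACCCGGGTTTAAACCCGGGTTT", 3, 6))
```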
PCR products from controls make a protein product of a wild type size relative to the size of the fragment being analysed.
If the PCR product analysed has a frame-shift or nonsense mutation, the assay will yield a truncated protein product relative to controls. The size of the truncated product is related to the position of the mutation, and the relevant region of the gene from the patient may be sequenced to identify the truncating mutation.
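Relating the size of a truncated product to the position of a mutation amounts to translating the amplified coding sequence until the first in-frame stop codon. The sketch below uses the standard genetic code; the example sequences are hypothetical, and the kDa estimate relies on an assumed average residue mass of about 110 Da, so it is only a rough guide to gel mobility.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
MEAN_RESIDUE_DA = 110.0  # rough average residue mass; an approximation only


def translate_until_stop(cds: str) -> str:
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")  # X = ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def approx_kda(protein: str) -> float:
    """Very rough molecular mass estimate in kDa."""
    return len(protein) * MEAN_RESIDUE_DA / 1000.0


wild_type = "ATGAAACCCGGGTTTTAA"  # hypothetical control fragment
nonsense = "ATGAAATAGGGGTTTTAA"   # hypothetical C->T change creating TAG
for name, seq in (("control", wild_type), ("mutant", nonsense)):
    p = translate_until_stop(seq)
    print(name, p, len(p), round(approx_kda(p), 2))
```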
An alternative or supplement to looking for the presence of variant sequences in a test sample is to look for the presence of the normal sequence, e.g. using a suitably specific oligonucleotide probe or primer. Use of oligonucleotide probes and primers has been discussed in more detail above.
Allele- or variant-specific oligonucleotide probes or primers according to embodiments of the present invention may be selected from those shown in Table 1 and modified versions thereof.

Approaches which rely on hybridisation between a probe and test nucleic acid and subsequent detection of a mis-match may be employed. Under appropriate conditions (temperature, pH etc.), an oligonucleotide probe will hybridise with a sequence which is not entirely complementary. The degree of base-pairing between the two molecules will be sufficient for them to anneal despite a mis-match. Various approaches are well known in the art for detecting the presence of a mis-match between two annealing nucleic acid molecules.
For instance, RNase A cleaves at the site of a mis-match.
Cleavage can be detected by electrophoresing test nucleic acid to which the relevant probe or probes have annealed and looking for smaller molecules (i.e. molecules with higher electrophoretic mobility) than the full-length probe/test hybrid.
Thus, an oligonucleotide probe that has the sequence of a region of the normal SAPL gene (either sense or anti-sense strand) in which mutations associated with IDDM or other disease susceptibility are known to occur (e.g. see Table 2) may be annealed to test nucleic acid and the presence or absence of a mis-match determined. Detection of the presence of a mis-match may indicate the presence in the test nucleic acid of a mutation associated with IDDM or other disease susceptibility. On the other hand, an oligonucleotide probe that has the sequence of a region of the gene including a mutation associated with IDDM or other disease susceptibility may be annealed to test nucleic acid and the presence or absence of a mis-match determined. The presence of a mis-match may indicate that the nucleic acid in the test sample has the normal sequence (the absence of a mis-match indicating that the test nucleic acid has the mutation). In either case, a battery of probes to different regions of the gene may be employed.
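The logic of the mis-match approach can be sketched in an idealised form: locate the positions where an aligned probe region and test region differ, then predict the fragments expected if the duplex were cut exactly at each mis-match. The sketch ignores the fact that RNase A does not cleave every mis-match equally well, writes both sequences in DNA letters for convenience, and uses hypothetical function names and sequences.

```python
def mismatch_positions(probe_region: str, test_region: str) -> list[int]:
    """0-based positions where an aligned, equal-length probe and test region differ."""
    if len(probe_region) != len(test_region):
        raise ValueError("regions must be aligned and of equal length")
    return [i for i, (p, t) in enumerate(zip(probe_region.upper(), test_region.upper()))
            if p != t]


def predicted_fragments(length: int, cut_sites: list[int]) -> list[int]:
    """Fragment lengths if the duplex is cut immediately before each mis-matched base."""
    edges = [0] + sorted(set(cut_sites)) + [length]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


probe = "ACGTACGTAC"  # hypothetical normal-sequence probe region
test = "ACGTTCGTAC"   # hypothetical test region with one substitution
sites = mismatch_positions(probe, test)
print(sites, predicted_fragments(len(probe), sites))
```

With no mis-match, only the full-length hybrid is predicted; any shorter fragment corresponds to the higher-mobility species described above.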
The presence of differences in sequence of nucleic acid molecules may be detected by means of restriction enzyme digestion, such as in a method of DNA fingerprinting where the restriction pattern produced when one or more restriction enzymes are used to cut a sample of nucleic acid is compared with the pattern obtained when a sample containing the normal gene shown in a figure herein or a variant or allele, e.g. as containing an alteration shown in Table 2, is digested with the same enzyme or enzymes.
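Restriction fingerprinting of this kind can be predicted in silico by locating recognition sites and computing fragment lengths. The sketch below uses EcoRI (G^AATTC) as an example enzyme and hypothetical sequences in which a single base change destroys one site; the enzyme actually informative for any SAPL alteration is not specified in this section, and the function name is hypothetical.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from cutting a linear DNA at every occurrence of `site`.

    cut_offset is where the top strand is cut within the site, e.g. 1 for
    EcoRI (G^AATTC). A zero-width lookahead also finds overlapping sites.
    """
    seq, site = seq.upper(), site.upper()
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


normal = "AAAGAATTCAAAAAAGAATTCAA"   # hypothetical: two EcoRI sites
variant = "AAAGAATTCAAAAAAGAGTTCAA"  # hypothetical: A->G abolishes the second site
print(sorted(digest(normal, "GAATTC", 1)), sorted(digest(variant, "GAATTC", 1)))
```

Comparing the sorted fragment lists mirrors comparing band patterns on a gel.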
The presence or absence of a lesion in a promoter or other regulatory sequence may also be assessed by determining the level of mRNA production by transcription or the level of polypeptide production by translation from the mRNA.
Determination of promoter activity has been discussed above.

A test sample of nucleic acid may be provided for example by extracting nucleic acid from cells or biological tissues or fluids, urine, saliva, faeces, a buccal swab, biopsy or preferably blood, or for pre-natal testing from the amnion, placenta or foetus itself.
Screening for the presence of one or more amino acid sequence variants in a test sample has a diagnostic and/or prognostic use, for instance in determining IDDM or other disease susceptibility.
There are various methods for determining the presence or absence in a test sample of a particular polypeptide, such as the polypeptide with the amino acid sequence shown in any figure herein or an amino acid sequence mutant, variant or allele thereof.
A sample may be tested for the presence of a binding partner for a specific binding member such as an antibody (or mixture of antibodies), specific for one or more particular variants of the polypeptide shown in a figure herein. A sample may be tested for the presence of a binding partner for a specific binding member such as an antibody (or mixture of antibodies), specific for the polypeptide shown in a figure herein. In such cases, the sample may be tested by being contacted with a specific binding member such as an antibody under appropriate conditions for specific binding, before binding is determined, for instance using a reporter system as discussed. Where a panel of antibodies is used, different reporting labels may be employed for each antibody so that binding of each can be determined.
A specific binding member such as an antibody may be used to isolate and/or purify its binding partner polypeptide from a test sample, to allow for sequence and/or biochemical analysis of the polypeptide to determine whether it has the sequence and/or properties of the polypeptide whose sequence is disclosed herein, or if it is a mutant or variant form. Amino acid sequencing is routine in the art using automated sequencing machines.
A test sample containing one or more polypeptides may be provided for example as a crude or partially purified cell or cell lysate preparation, e.g. using tissues or cells, such as from saliva, faeces, or preferably blood, or for pre-natal testing from the amnion, placenta or foetus itself.
Whether it is a polypeptide, antibody, peptide, nucleic acid molecule, small molecule or other pharmaceutically useful compound according to the present invention that is to be given to an individual, administration is preferably in a "prophylactically effective amount" or a "therapeutically effective amount" (as the case may be, although prophylaxis may be considered therapy), this being sufficient to show benefit to the individual. The actual amount administered, and rate and time-course of administration, will depend on the nature and severity of what is being treated. Prescription of treatment, e.g. decisions on dosage etc., is within the responsibility of general practitioners and other medical doctors.
A composition may be administered alone or in combination with other treatments, either simultaneously or sequentially dependent upon the condition to be treated.
Pharmaceutical compositions according to the present invention, and for use in accordance with the present invention, may include, in addition to active ingredient, a pharmaceutically acceptable excipient, carrier, buffer, stabiliser or other materials well known to those skilled in the art. Such materials should be non-toxic and should not interfere with the efficacy of the active ingredient. The precise nature of the carrier or other material will depend on the route of administration, which may be oral, or by injection, e.g. cutaneous, subcutaneous or intravenous.
Pharmaceutical compositions for oral administration may be in tablet, capsule, powder or liquid form. A tablet may include a solid carrier such as gelatin or an adjuvant. Liquid pharmaceutical compositions generally include a liquid carrier such as water, petroleum, animal or vegetable oils, mineral oil or synthetic oil. Physiological saline solution, dextrose or other saccharide solution or glycols such as ethylene glycol, propylene glycol or polyethylene glycol may be included.
For intravenous, cutaneous or subcutaneous injection, or injection at the site of affliction, the active ingredient will be in the form of a parenterally acceptable aqueous solution which is pyrogen-free and has suitable pH, isotonicity and stability. Those of relevant skill in the art are well able to prepare suitable solutions using, for example, isotonic vehicles such as Sodium Chloride Injection, Ringer's Injection, or Lactated Ringer's Injection.
Preservatives, stabilisers, buffers, antioxidants and/or other additives may be included, as required.
Targeting therapies may be used to deliver the active agent more specifically to certain types of cell, by the use of targeting systems such as antibody or cell specific ligands.
Targeting may be desirable for a variety of reasons; for example if the agent is unacceptably toxic, or if it would otherwise require too high a dosage, or if it would not otherwise be able to enter the target cells.

Instead of administering an agent directly, it may be produced in target cells by expression from an encoding gene introduced into the cells, e.g. in a viral vector (see below).
The vector may be targeted to the specific cells to be treated, or it may contain regulatory elements which are switched on more or less selectively by the target cells.
Viral vectors may be targeted using specific binding molecules, such as a sugar, glycolipid or protein such as an antibody or binding fragment thereof. Nucleic acid may be targeted by means of linkage to a protein ligand (such as an antibody or binding fragment thereof) via polylysine, with the ligand being specific for a receptor present on the surface of the target cells.
An agent may be administered in a precursor form, for conversion to an active form by an activating agent produced in, or targeted to, the cells to be treated. This type of approach is sometimes known as ADEPT or VDEPT; the former involving targeting the activating agent to the cells by conjugation to a cell-specific antibody, while the latter involves producing the activating agent, e.g. an enzyme, in a vector by expression from encoding DNA in a viral vector (see for example, EP-A-415731 and WO 90/07936).
Nucleic acid according to the present invention, e.g. encoding the authentic biologically active SAPL polypeptide or a functional fragment thereof, may be used in a method of gene therapy, to treat a patient who is unable to synthesize the active polypeptide or unable to synthesize it at the normal level, thereby providing the effect provided by the wild-type polypeptide, with the aim of treating and/or preventing one or more symptoms of IDDM and/or one or more other diseases.
Vectors such as viral vectors have been used to introduce genes into a wide variety of different target cells.
Typically the vectors are exposed to the target cells so that transfection can take place in a sufficient proportion of the cells to provide a useful therapeutic or prophylactic effect from the expression of the desired polypeptide. The transfected nucleic acid may be permanently incorporated into the genome of each of the targeted cells, providing long lasting effect, or alternatively the treatment may have to be repeated periodically.
A variety of vectors, both viral vectors and plasmid vectors, are known in the art; see e.g. US Patent No. 5,252,479 and WO 93/07282. In particular, a number of viruses have been used as gene transfer vectors, including adenovirus, papovaviruses such as SV40, vaccinia virus, herpesviruses including HSV and EBV, and retroviruses including gibbon ape leukaemia virus, Rous Sarcoma Virus, Venezuelan equine encephalitis virus, Moloney murine leukaemia virus and murine mammary tumour virus.

Many gene therapy protocols in the prior art have used disabled murine retroviruses.
Disabled virus vectors are produced in helper cell lines in which genes required for production of infectious viral particles are expressed. Helper cell lines are generally missing a sequence which is recognised by the mechanism which packages the viral genome, and produce virions which contain no nucleic acid. A viral vector which contains an intact packaging signal along with the gene or other sequence to be delivered (e.g. encoding the SAPL polypeptide or a fragment thereof) is packaged in the helper cells into infectious virion particles, which may then be used for the gene delivery.
Other known methods of introducing nucleic acid into cells include electroporation, calcium phosphate co-precipitation, mechanical techniques such as microinjection, transfer mediated by liposomes, and direct DNA uptake and receptor-mediated DNA transfer. Liposomes can encapsulate RNA, DNA and virions for delivery to cells. Depending on factors such as pH, ionic strength and divalent cations being present, the composition of liposomes may be tailored for targeting of particular cells or tissues. Liposomes include phospholipids and may include lipids and steroids, and the composition of each such component may be altered. Targeting of liposomes may also be achieved using a specific binding pair member such as an antibody or binding fragment thereof, a sugar or a glycolipid.
The aim of gene therapy using nucleic acid encoding the polypeptide, or an active portion thereof, is to increase the amount of the expression product of the nucleic acid in cells in which the level of the wild-type polypeptide is absent or present only at reduced levels. Such treatment may be therapeutic or prophylactic, particularly in the treatment of individuals known through screening or testing to have an IDDM4 susceptibility allele and hence a predisposition to the disease.
Similar techniques may be used for anti-sense regulation of gene expression, e.g. targeting an antisense nucleic acid molecule to cells in which a mutant form of the gene is expressed, the aim being to reduce production of the mutant gene product. Other approaches to specific down-regulation of genes are well known, including the use of ribozymes designed to cleave specific nucleic acid sequences. Ribozymes are nucleic acid molecules, actually RNA, which specifically cleave single-stranded RNA, such as mRNA, at defined sequences, and their specificity can be engineered. Hammerhead ribozymes may be preferred because they recognise base sequences of about 11-18 bases in length, and so have greater specificity than ribozymes of the Tetrahymena type, which recognise sequences of about 4 bases in length, though the latter type of ribozymes are useful in certain circumstances. References on the use of ribozymes include Marschall, et al. Cellular and Molecular Neurobiology, 1994. 14(5): 523; Haseloff, Nature 334: 585 (1988) and Cech, J. Amer. Med. Assn., 260: 3030 (1988).
Aspects of the present invention will now be illustrated with reference to the accompanying figures described already above and experimental exemplification, by way of example and not limitation. Further aspects and embodiments will be apparent to those of ordinary skill in the art. All documents mentioned in this specification are hereby incorporated herein by reference.

IDENTIFICATION OF IDDM4 EST4 (SAPL)
Construction of libraries for shotgun sequencing
DNA was prepared from BAC (Bacterial Artificial Chromosome) clones 14-1-15 and 25-e-5. Cells containing either BAC vector were streaked on Luria-Bertani (LB) agar plates supplemented with the appropriate antibiotic. A single colony was used to inoculate 200 ml of LB medium supplemented with the appropriate antibiotic and grown overnight at 37 °C. The cells were pelleted by centrifugation and plasmid DNA was prepared by following the QIAGEN (Chatsworth, CA) Tip500 Maxi plasmid/cosmid purification protocol with the following modifications: the cells from 100 ml of culture were used for each Tip500 column, the NaCl concentration of the elution buffer was increased from 1.25 M to 1.7 M, and the elution buffer was heated to 65 °C.
Purified BAC and PAC DNA was digested with Not I restriction endonuclease and then subjected to pulsed field gel electrophoresis using a BioRad CHEF Mapper system (Richmond, CA). The digested DNA was electrophoresed overnight in a low melting temperature agarose (BioRad, Richmond, CA) gel that was prepared with 0.5X Tris Borate EDTA (10X stock solution, Fisher, Pittsburgh, PA). The CHEF Mapper auto-algorithm default settings were used for switching times and voltages.
Following electrophoresis the gel was stained with ethidium bromide (Sigma, St. Louis, MO) and visualized with an ultraviolet transilluminator. The insert band(s) were excised from the gel. The DNA was eluted from the gel slice by beta-Agarase (New England Biolabs, Beverly, MA) digestion according to the manufacturer's instructions. The solution containing the DNA and digested agarose was brought to 50 mM Tris pH 8.0, 15 mM MgCl2, and 25% glycerol in a volume of 2 ml and placed in an AERO-MIST nebulizer (CIS-US, Bedford, MA). The nebulizer was attached to a nitrogen gas source and the DNA was randomly sheared at 10 psi for 30 sec.

The sheared DNA was ethanol precipitated and resuspended in TE (10 mM Tris, 1 mM EDTA). The ends were made blunt by treatment with Mung Bean Nuclease (Promega, Madison, WI) at 30 °C for 30 min, followed by phenol/chloroform extraction, and treatment with T4 DNA polymerase (GIBCO/BRL, Gaithersburg, MD) in multicore buffer (Promega, Madison, WI) in the presence of 40 µM dNTPs at 16 °C. To facilitate subcloning of the DNA fragments, BstX I adapters (Invitrogen, Carlsbad, CA) were ligated to the fragments at 14 °C overnight with T4 DNA ligase (Promega, Madison, WI). Adapters and DNA fragments less than 500 bp were removed by column chromatography using a cDNA sizing column (GIBCO/BRL, Gaithersburg, MD) according to the instructions provided by the manufacturer. Fractions containing DNA greater than 1 kb were pooled and concentrated by ethanol precipitation. The DNA fragments containing BstX I adapters were ligated into the BstX I sites of pSHOT II, which was constructed by subcloning the BstX I sites from pcDNA II (Invitrogen, Carlsbad, CA) into the BssH II sites of pBlueScript (Stratagene, La Jolla, CA). pSHOT II was prepared by digestion with BstX I restriction endonuclease and purified by agarose gel electrophoresis. The gel purified vector DNA was extracted from the agarose by following the Prep-A-Gene (BioRad, Richmond, CA) protocol. To reduce ligation of the vector to itself, the digested vector was treated with calf intestinal phosphatase (GIBCO/BRL, Gaithersburg, MD). Ligation reactions of the DNA fragments with the cloning vector were transformed into ultra-competent XL-2 Blue cells (Stratagene, La Jolla, CA) and plated on LB agar plates supplemented with 100 µg/ml ampicillin. Individual colonies were picked into a 96-well plate containing 100 µl/well of LB broth supplemented with ampicillin and grown overnight at 37 °C. Approximately 25 µl of 80% sterile glycerol was added to each well and the cultures stored at -80 °C.
Preparation of plasmid DNA
Glycerol stocks were used to inoculate 5 ml of LB broth supplemented with 100 µg/ml ampicillin either manually or by using a Tecan Genesis RSP 150 robot (Tecan AG, Hombrechtikon, Switzerland) programmed to inoculate 96 tubes containing 5 ml broth from the 96 wells. The cultures were grown overnight at 37 °C with shaking to provide aeration. Bacterial cells were pelleted by centrifugation, the supernatant decanted, and the cell pellet stored at -20 °C. Plasmid DNA was prepared with a QIAGEN Bio Robot 9600 (QIAGEN, Chatsworth, CA) according to the Qiawell Ultra protocol. To test the frequency and size of inserts, plasmid DNA was digested with the restriction endonuclease Pvu II. The size of the restriction endonuclease products was examined by agarose gel electrophoresis, with the average insert size being 1 to 2 kb.
DNA sequence analysis of shotgun clones
DNA sequence analysis was performed using the ABI PRISM™ dye terminator cycle sequencing ready reaction kit with AmpliTaq DNA polymerase, FS (Perkin Elmer, Norwalk, CT). DNA sequence analysis was performed with M13 forward and reverse primers.
Following amplification in a Perkin-Elmer 9600 the extension products were purified and analyzed on an ABI PRISM 377 automated sequencer (Perkin Elmer, Norwalk, CT).
Approximately 12 to 15 sequencing reactions were performed per kb of DNA to be examined, e.g. 1500 reactions would be performed for a BAC insert of 100 kb.
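The rule of thumb above scales linearly with insert size. The short sketch below reproduces the 1500-reaction figure for a 100 kb insert. It also estimates the resulting sequence redundancy under an assumed average usable read length of 500 bases, a value not stated in the text. Both helper names are hypothetical.

```python
import math


def sequencing_reactions(insert_kb: float, reactions_per_kb: int = 15) -> int:
    """Number of shotgun sequencing reactions for an insert of the given size."""
    return math.ceil(insert_kb * reactions_per_kb)


def approximate_redundancy(reactions: int, insert_kb: float, read_length: int = 500) -> float:
    """Average fold coverage, assuming every reaction yields read_length usable bases.

    read_length = 500 is an assumption for illustration, not a value from the text.
    """
    return reactions * read_length / (insert_kb * 1000)


n = sequencing_reactions(100)            # upper end of the 12-15 per kb range
print(n)                                 # 1500
print(sequencing_reactions(100, 12))     # 1200
print(approximate_redundancy(n, 100))    # 7.5
```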
Assembly of DNA sequences
Phred/Phrap was used for DNA sequence assembly. This program was developed by Dr. Phil Green and licensed from the University of Washington (Seattle, WA). Phred/Phrap consists of the following programs: Phred for base-calling, Phrap for sequence assembly, Crossmatch for sequence comparisons, Consed and Phrapview for visualization of data, and Repeatmasker for screening repetitive sequences. Vector and E. coli DNA sequences were identified by Crossmatch and removed from the DNA sequence assembly process. DNA sequence assembly was performed on a SUN Enterprise 4000 server running the Solaris 2.51 operating system (Sun Microsystems Inc., Mountain View, CA). The sequence assemblies were further analyzed using Consed and Phrapview.

Bioinformatic analysis of assembled DNA sequences
The DNA sequences at various stages of assembly were queried against the DNA sequences in the GenBank database (subject) using the BLAST algorithm (S.F. Altschul, et al. (1990) J. Mol. Biol. 215, 403-410). When examining large contiguous sequences of DNA, repetitive elements were masked following identification by Crossmatch with a database of mammalian repetitive elements. Following BLAST analysis the results were compiled by a parser program. The parser provided the following information from the database for each DNA sequence having a similarity with a P value greater than 10^-6: the annotated name of the sequence, the database from which it was derived, the length and percent identity of the region of similarity, and the location of the similarity in both the query and the subject.
Analysis of DNA sequences from BAC 14-1-15 revealed an EST, aa194169, which was 91% identical over 60 nucleotides. Several lines of evidence indicated that this was an authentic mRNA transcript and that this EST represented the 5'-most portion of that mRNA transcript. The first piece of evidence was revealed by comparing sequences obtained from a mouse BAC clone, 53-d-8, that is syntenic with BAC 14-1-15 (Figure 4). The human genomic DNA corresponding to EST aa194169 was 88% conserved over 43 bp with the mouse genomic DNA. This region of human genomic DNA exhibited a relatively high score in the promoter prediction algorithm PROMOTERSCAN (Prestridge (1995) J. Mol. Biol. 249: 923-932) and contained a cluster of DNA sequences that are predicted to serve as transcription factor binding sites (Figure 4). This region of human genomic DNA was also predicted to be a CpG island, a feature often associated with the 5' ends of genes. These sequences lie approximately 11 kb downstream of the 3' end of a gene, LRP5, that we have previously characterized. Finally, DNA sequences obtained from BAC clone 25-e-5 revealed additional genomic sequences that were represented by EST aa194169, indicating the presence of an intron located between two exons. Together these data support the conclusion that the 5' portion of IDDM4 EST4 corresponds to aa194169 and that this EST sequence is derived from an authentic mRNA transcript. To isolate the open reading frame for this gene, RCCA analysis was therefore focused on extension in the 3' direction.
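The text does not say which program or criteria were used to predict the CpG island. As an illustration only, the sketch below applies the widely used Gardiner-Garden and Frommer style thresholds to a candidate sequence: length of at least 200 bp, G+C content of at least 50%, and an observed/expected CpG ratio of at least 0.6. These thresholds are defaults of this hypothetical helper, not values taken from the source.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently is (c * g) / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


print(cpg_stats("CG" * 100))                  # (1.0, 2.0)
print(looks_like_cpg_island("CCGG" * 50))     # True
print(looks_like_cpg_island("AT" * 150))      # False
```

In practice such statistics are computed in a sliding window along genomic sequence rather than over a single fixed fragment.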
Extension of IDDM4 EST4 by RCCA
The full length cDNA of one aspect of the present invention was generated by a method of cDNA screening called Reduced Complexity cDNA Analysis (RCCA). Briefly, the extension of partial cDNA sequences has historically been achieved with one or both of two commonly used methods: filter screening of cDNA libraries by hybridization with labeled probes, and 5'- and 3'-RACE with total cellular mRNA by PCR. The first method is effective but laborious and slow, while the latter method is fast but limited in efficiency. The RACE protocol is hindered by the limited length of extension, due to the use of the entire cellular mRNA population in a single reaction. Since smaller fragments are amplified much more efficiently than larger fragments by PCR in the same reaction, PCR products obtained using the second method are often quite small.
The RCCA method improves upon known methods of cDNA library screening by initially constructing and subdividing cDNA libraries, followed by isolating 5'- and 3'-flanking fragments by PCR. Since each pool is unlikely to contain more than one clone for a given gene which is expressed at a low to moderate level, competition between large and small PCR products in one pool does not exist, making it possible to isolate fragments of various sizes. Definite advantages of the method as described herein are its efficiency, its throughput, and its potential to isolate alternatively spliced cDNA forms.
The RCCA process provides for rapid extension of a partial cDNA sequence based on subdividing a primary cDNA library and DNA amplification by polymerase chain reaction (PCR). A cDNA library is constructed with cDNA primed by random, oligo-dT or a combination of both random and oligo-dT primers and then subdivided into pools of approximately 10,000-20,000 clones per pool that are stored in a 96-well plate. Each pool (well) is amplified separately and therefore represents an independent portion of the cDNA molecules from the original mRNA source.
The fundamental principle of the RCCA process is to subdivide a complex library into superpools of about 10,000 to about 20,000 clones. A library of two million primary clones, a number large enough to cover most mRNA transcripts expressed in the tissue, can be subdivided into 188 pools and stored in two 96-well plates. Since the number of transcripts for most genes is fewer than one copy per 10,000 transcripts in total cellular mRNA, each pool is unlikely to contain more than one clone for a given cDNA sequence. Such reduced complexity makes it possible to use PCR to isolate flanking fragments of partial cDNA sequences larger than those obtained by known methods.
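The pool arithmetic can be made concrete with a short sketch. It assumes that clones are distributed among pools at random, so that the number of clones of one transcript in a pool follows a Poisson distribution with mean equal to pool size times transcript frequency. This model and the helper names are illustrative assumptions rather than part of the described method.

```python
import math


def clones_per_pool(library_size: int, n_pools: int) -> float:
    return library_size / n_pools


def poisson_pool_probabilities(pool_size: float, transcript_freq: float) -> tuple[float, float]:
    """Return (P(pool has >= 1 clone), P(pool has >= 2 clones)) for one transcript."""
    lam = pool_size * transcript_freq          # expected clones of this transcript per pool
    p_zero = math.exp(-lam)
    p_one = lam * math.exp(-lam)
    return 1 - p_zero, 1 - p_zero - p_one


size = clones_per_pool(2_000_000, 188)         # about 10,600 clones per pool
for freq in (1e-4, 1e-5):                      # 1 in 10,000 and 1 in 100,000 transcripts
    p_pos, p_multi = poisson_pool_probabilities(size, freq)
    print(f"freq={freq:g}: positive pools ~ {188 * p_pos:.0f}, P(>=2 clones) = {p_multi:.3f}")
```

Under these assumptions, a transcript present at 1 in 100,000 would be expected in roughly 19 of the 188 pools, with about 0.5% of pools holding two or more clones of it. At exactly 1 in 10,000 the multiple-clone probability rises to roughly 0.29. The reduced-complexity argument is therefore strongest for transcripts rarer than that.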
The skilled artisan, aided with this specification, will understand the far reaching cDNA cloning process disclosed herein: multiple primer combinations from an EST or other partial cDNA sequence, in combination with flanking vector primer oligonucleotides, can be used to "walk" in both directions away from the internal, gene specific sequence and respective primers, such that a contig representing a full length cDNA can be constructed. In this particular case, the 5' end of the cDNA was represented by an EST with GenBank accession number aa194169. Therefore the RCCA procedure was only employed to obtain sequences 3' of EST aa194169. This procedure relies on the ability to screen multiple pools which comprise a representative portion of the total cDNA library.
This procedure is not dependent upon using a cDNA library with directionally cloned inserts. Instead, both 5' and 3' vector and gene specific primers are added and a contig map is constructed from additional screening of positive pools using both vector primers and gene specific primers. Of course, these gene specific primers are initially constructed from a known nucleic acid fragment such as an expressed sequence tag.
However, as the walk continues, gene specific primers are utilized from the 5' and 3' boundaries of the newly identified regions of the cDNA. As the walk continues, there is still no requirement that the vector orientation of a yet unidentified fragment be known. Instead, all combinations are tested on a positive pool and the actual vector orientation is determined by the ability of certain vector/gene specific primers to generate the predicted PCR fragment. A full-length cDNA may then be constructed by known subcloning procedures.
RCCA was used to extend the partial cDNA sequence originally identified by similarity between EST aa194169 and BAC 14-1-15 genomic DNA sequences. Positive pools containing the cDNA sequence were identified by PCR using a pair of primers, 4dest4 3f and 4dest4 1r, at a final concentration of 0.15 µM, which generate a PCR product of 377 nucleotides. This product was obtained by 40 amplification cycles of denaturation at 94 °C for 30 sec, primer annealing at 60 °C for 30 sec, and product extension at 68 °C for 1 min. The DNA polymerase used for PCR amplification was the enzyme TaqGold (Perkin-Elmer, Norwalk, CT) and the reaction volume was 10 µl. The PCR template was RCCA pools from size selected libraries (>2.5 kb) from prostate and testis. Positive pools were identified by detection of the 377 bp product by 2% agarose gel electrophoresis. Each positive pool in the library contains an independent clone of the cDNA sequence; within each clone are embedded the partial cDNA sequence and its flanking fragments. The flanking fragments are isolated by PCR with primers complementary to the known vector and cDNA sequences and then sequenced directly. To extend the cDNA clone in the 3' direction, the primers 4dest4 3f and 4dest4 6f were used in combination with the vector primers, 5438 and 873F, in a primary reaction using Taqara LA (Panvera, Madison, WI). The amplification conditions were 20 cycles of denaturation at 94 °C for 30 sec, primer annealing at 60 °C for 30 sec and extension for 4 min at 68 °C in a 10 µl reaction volume. The primary reaction was diluted by adding 9 parts water, and an aliquot was removed for a second PCR reaction containing primer 4dest 7f (for 4dest4 6f primary reactions) or 4dest4f (for 4dest4 3f primary reactions). The secondary reactions were amplified for 25 cycles using Taqara LA as described above. The DNA sequences from these fragments were assembled with the original partial cDNA sequence to generate a continuous cDNA fragment of 2482 nucleotides. The longest clone obtained by RCCA provided an extension of 1.8 kb. The cDNA sequence of 2482 nucleotides was used to search the GenBank database using the BLAST algorithm. This resulted in the identification of a Unigene EST cluster represented by GenBank Accession number aa193106. A number of EST sequences that were present in this Unigene cluster were assembled to produce approximately 1.4 kb of cDNA sequence. PCR primers were then designed based on the Unigene cluster to link these sequences to the DNA sequences identified by RCCA. This resulted in the identification of 4.8 kb of cDNA sequence that contains an open reading frame of 2382 nucleotides which encodes a protein of 794 amino acids.
One of the RCCA clones, clone 33, diverged from the other sequences after nucleotide 2608 to form isoform (b). The divergent sequence in isoform (b) is identical to isoform (a) from nucleotide 4172 to 4682. Therefore this sequence likely represents an alternatively spliced mRNA transcript in which isoform (a) nucleotides 2609-4171 are missing. Isoform (b) contains an open reading frame which encodes a protein of 791 amino acids, of which the first 776 amino acids are identical to isoform (a).
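The relationship between an open reading frame and the encoded protein length, and the size of the segment missing from isoform (b), can be checked with the sketch below. It scans only the forward strand and takes the first ATG in each frame as the start. It counts the stop codon as part of the ORF but not of the protein. These conventions and the helper names are assumptions; the source does not say how its ORF lengths were counted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based with an exclusive end; the stop codon is included.
    Returns (0, 0) if no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None                   # look for the next ORF in this frame
    return best


def protein_length(orf: tuple[int, int]) -> int:
    """Amino acids encoded by an ORF that includes its stop codon."""
    return max((orf[1] - orf[0]) // 3 - 1, 0)


def inclusive_span(first: int, last: int) -> int:
    """Length of a segment given 1-based inclusive coordinates."""
    return last - first + 1


orf = longest_orf("CCATGAAATTTTAGCC")
print(orf, protein_length(orf))        # (2, 14) 3
print(inclusive_span(2609, 4171))      # 1563 nucleotides absent from isoform (b)
```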
Identification of polymorphisms in SAPL
The process of RCCA generates clones that may differ in origin, i.e. in the mRNA used to synthesize the cDNA, e.g. testis. The mRNA may be derived from an individual heterozygous for the SAPL locus or from a pool of different individuals. Therefore polymorphisms between different RCCA clones may represent true differences; alternatively, these differences may arise from PCR mistakes or from errors that are made by the DNA polymerase during the propagation of vectors containing SAPL inserts in E. coli. To discriminate against these types of errors, polymorphisms were only noted when detected in more than one clone and where the sequence quality was excellent. All the candidate polymorphisms that were detected lie in the putative 3' untranslated portion of the cDNA and thus have no effect on the encoded protein.
Northern blot analysis
Primers 4dest4 2f and 4dest4 2r (Table 2) were used to amplify a PCR product of 957 bp from placenta, testis, thymus or lymph node cDNA. These products were purified on an agarose gel, the DNA extracted, and subcloned into pCR2.1 (Invitrogen, Carlsbad, CA). The 957 bp probe was labeled by random priming with the Amersham Rediprime kit (Arlington Heights, IL) in the presence of 50-100 µCi of 3000 Ci/mmole [α-32P]dCTP (Dupont/NEN, Boston, MA). Unincorporated nucleotides were removed with a ProbeQuant G-50 spin column (Pharmacia/Biotech, Piscataway, NJ). The radiolabeled probe, at a concentration of greater than 1 x 10^6 cpm/ml in rapid hybridization buffer (Clontech, Palo Alto, CA), was incubated overnight at 65 °C with human multiple tissue Northern blots I and II (Clontech, Palo Alto, CA). The blots were washed by two 15 min incubations in 2X SSC, 0.1% SDS (prepared from 20X SSC and 20% SDS stock solutions, Fisher, Pittsburgh, PA) at room temperature, followed by two 15 min incubations in 1X SSC, 0.1% SDS at room temperature, and two 30 min incubations in 0.1X SSC, 0.1% SDS at 60 °C. Autoradiography of the blots was done to visualize the bands that specifically hybridized to the radiolabeled probe.
The expression pattern in a number of tissues was examined by Northern blot analysis. Two distinct bands were detected, one of approximately 4.9 kb and the second of approximately 4.1 kb. In most tissues the predominant band is the larger 4.9 kb band; however, in the testis the lower band of approximately 4.1 kb is the predominant one. This lower band may indicate an alternatively spliced form, differing from SAPLb, which may be investigated further in the testis. The first band likely corresponds to the SAPLa cDNA, for which the sequence of 4793 nucleotides has been determined. The second band may correspond to SAPLb, for which the sequence of 3228 nucleotides has been determined. Alternatively, the approximately 4.1 kb band may correspond to an as yet unidentified alternatively spliced form, in which case SAPLb would be a rare alternatively spliced transcript. The highest level of SAPL expression is seen in skeletal muscle, placenta, heart, pancreas and testis. Detectable expression is also observed in brain, lung, liver, kidney, spleen, thymus, prostate, small intestine, colon, and leukocytes. No detectable expression is seen in ovary.
Identification of intron/exon boundaries for SAPL
The program Crossmatch, which uses the Smith-Waterman algorithm, was used to compare SAPL cDNA sequences with BAC 14-1-15 and BAC 25-e-5 genomic sequences. This identified the boundaries for the first five exons of SAPL, which correspond to the first 865 nucleotides of the cDNA sequence (Table 3).
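Crossmatch's scoring parameters are not given in the text. To show the idea behind locating an exon within genomic sequence by local alignment, the sketch below implements the textbook Smith-Waterman recurrence with a linear gap penalty. The match, mismatch and gap scores are arbitrary assumptions. A fuller exon search could also check the GT/AG splice consensus at each boundary.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, tuple[int, int]]:
    """Best local alignment score of a vs b and the (i, j) cell where it ends.

    i and j are 1-based end positions in a and b (equivalently, exclusive
    0-based end indices), so b[:j] ends exactly where the aligned block ends.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero so an alignment can restart.
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


# An "exon" ACGT embedded in "genomic" sequence ends at position 6 of b.
print(smith_waterman("ACGT", "TTACGTTT"))   # (8, (4, 6))
```

A full implementation would also trace back from the best cell to recover the start coordinates and the alignment itself.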
Isolation of other species homologs of SAPL gene
The SAPL genes from different species, e.g. rat, dog, are isolated by screening of a cDNA library with portions of the gene that have been obtained from cDNA of the species of interest using PCR primers designed from the human sequence.
Degenerate PCR is performed by designing primers of 17-20 nucleotides with 32-128 fold degeneracy by selecting regions that code for amino acids that have low codon degeneracy, e.g. Met and Trp. When selecting these primers preference is given to regions that are conserved in the protein, e.g. the motifs shown herein. PCR products are analyzed by DNA sequence analysis to confirm their similarity to the human sequence.
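Primer degeneracy is the product of the number of codons for each amino acid covered, which is why Met and Trp (one codon each) are favoured. The sketch below computes that product from the standard genetic code and scans a protein for low-degeneracy windows. For simplicity it counts every codon in full, whereas real primers often end within a codon. The example protein fragment and the helper names are hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())


def candidate_windows(protein: str, n_residues: int,
                      max_degeneracy: int = 128) -> list[tuple[int, int, str]]:
    """(degeneracy, start, window) for windows at or below max_degeneracy, best first."""
    hits = []
    for i in range(len(protein) - n_residues + 1):
        window = protein[i:i + n_residues]
        d = primer_degeneracy(window)
        if d <= max_degeneracy:
            hits.append((d, i, window))
    return sorted(hits)


# Six residues correspond to an 18-nucleotide primer.
for d, start, window in candidate_windows("LLMWDNKYLL", 6):
    print(start, window, d)
```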
The correct product is used to screen cDNA libraries by colony or plaque hybridization at high stringency. Alternatively, probes derived directly from the human gene are utilized to isolate the cDNA sequence of SAPL from different species by hybridization at reduced stringency.
Use of the SAPL cDNA sequence to search the GenBank database using the FASTA algorithm revealed mouse EST AA684416, which is 93% identical to the SAPL cDNA sequence from nucleotide 590 to 1080. This is likely the mouse ortholog of human SAPL. It and other mouse ESTs, such as aa435418, which is 86% identical from nucleotide 2888 to 3348, are used in the isolation of the mouse SAPL cDNA either by a PCR based or nucleic acid hybridization based strategy.
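Percent identity figures such as those above depend on how the alignment is scored. As a minimal sketch, assuming a pairwise alignment is already available with gaps written as "-", identity can be computed as the number of identical aligned positions divided by alignment length. FASTA and BLAST report identity over their own alignments, so this hypothetical helper only illustrates the definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))   # 90.0
print(percent_identity("ACGT-ACG", "ACGTTACG"))       # 87.5
```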

Association with diabetes.
Type 1 diabetes is a multifactorial disorder, with the genetic component being oligo- or polygenic. Two loci have been identified as conferring susceptibility to type 1 diabetes by candidate gene approaches. The main locus, IDDM1, lies in the major histocompatibility complex (MHC) on chromosome 6p (Morton, N., et al. (1983) Am J Hum Genet 35, 201-213; Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448); the second locus, IDDM2, is the insulin minisatellite or variable number of tandem repeats (VNTR) on chromosome 11p (Bennett, S., et al (1995) Nature Genet 9, 284-292). These two loci alone, however, cannot account for the observed degree of familial clustering of disease, where λS = 15 (λS = sibling risk/population prevalence); IDDM1 and IDDM2 have λS = 3 and 1.25 respectively, accounting for 50% of familial clustering (Morton, N., et al. (1983) Am J Hum Genet 35, 201-213; Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448; Bennett, S., et al (1995) Nature Genet 9, 284-292; Risch, N. (1987) Am J Hum Genet 40, 1-14).
A positional cloning approach was therefore undertaken to identify the other loci: a genome-wide scan for linkage suggested another 18 possible regions (Davies, J. et al. (1994) Nature 371, 130-136), including IDDM4 on chromosome 11q13 (maximum LOD score, MLS = 3.4, p < 0.0001 at FGF3). This locus was subsequently confirmed at genome-wide significance (p < 2 × 10⁻⁵) (Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448; Luo, D-F., et al. (1996) Hum Mol Genet 5, 693-698).
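The 50% figure for IDDM1 and IDDM2 follows if, as in Risch (1987), locus effects are assumed to combine multiplicatively, so that ln λS is additive across loci. A short worked check of that arithmetic under this assumption; the function name is illustrative.

```python
import math

def fraction_of_clustering(lambda_total, locus_lambdas):
    """Share of familial clustering explained by the given loci, assuming a
    multiplicative model in which ln(lambda_S) adds across loci."""
    return sum(math.log(lam) for lam in locus_lambdas) / math.log(lambda_total)

# lambda_S = 15 overall; IDDM1 = 3 and IDDM2 = 1.25, as quoted in the text.
share = fraction_of_clustering(15, [3, 1.25])
print(f"{share:.2f}")  # 0.49
```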
To investigate the extent of linkage within this region, 704 multiplex families (426 UK, 236 US, 32 Norway, 39 Italy) were analysed with 19 microsatellite markers in a 25cM interval spanning FGF3. A multipoint linkage curve was produced (MAPMAKER/SIBS; Kruglyak, L and Lander, S. (1995) Am J Hum Genet 57, 439-454) with a peak MLS = 2.8 (p < 0.0003) at D11S1889 (Figure 5), indicating that IDDM4 was localised to within the 18cM interval D11S903 to D11S534 (Nakagawa, Y., et al (1997) Fine mapping of a Type 1 Diabetes Susceptibility Gene (IDDM4) on Chromosome 11q13. Hum. Mol. Genet. Submitted).
Multipoint linkage analysis cannot localise the gene to a small region. Instead, association mapping, which has been used for rare single-gene traits, can narrow the interval to less than 2cM (roughly 2Mb). In theory, association with a particular allele very close to the founder mutation will be detected in populations descended from that founder. The transmission disequilibrium test (TDT; Spielman, R., et al (1993) Am J Hum Genet 52, 506-516) assesses the deviation from 50% in the transmission of alleles at a marker locus from heterozygous parents to affected children. The TDT, which detects linkage in the presence of association, was therefore applied across the IDDM4 linkage region; the same strategy had previously been used to fine map the putative IDDM6 locus on chromosome 18q21 (Merriman, T. et al. (1997) Hum. Mol. Genet. 6, 1003-1010).
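For a single marker allele, the TDT compares b, the number of times heterozygous parents transmit the allele to an affected child, with c, the number of times they do not: χ² = (b − c)² / (b + c), with one degree of freedom under the null hypothesis of 50% transmission. A minimal sketch; the counts below are hypothetical and are not data from this study.

```python
import math

def tdt(b, c):
    """TDT for one allele: b = transmissions, c = non-transmissions from
    heterozygous parents to affected children. Returns (chi2, p, share)."""
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p, b / (b + c)

chi2, p, share = tdt(116, 84)  # hypothetical counts
print(f"transmission {share:.0%}, chi2 = {chi2:.2f}, p = {p:.3f}")
# transmission 58%, chi2 = 5.12, p = 0.024
```

Only heterozygous parents are informative, which is why b and c, rather than total allele counts, enter the statistic.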
TDT analysis of 658 UK and US families showed a deviation in transmission of alleles at four loci. Analysis of the three most common alleles (p(uncorrected) < 0.05): D11S4205, 54% transmission, p = 0.03; D11S1783, 58% transmission, p = 0.0005; D11S1189, 46% transmission, p = 0.05; H0570POLYA, 54% transmission, p = 0.01. The multiallelic Tsp test, a test for association at loci with multiple alleles (Martin, E. et al. (1997) Am. J. Hum. Genet. 61, 439-448), was then applied to these loci. This confirmed the results with D11S4205 (Tsp = 17.5, p = 0.01), D11S1783 (Tsp = 23.6, p = 0.0001) and H0570POLYA (Tsp = 12.4, p = 0.03). D11S4205 (proximal) and D11S1783 (distal) are approximately 1Mb apart, and so may be showing association with one locus. H0570POLYA is approximately 3Mb distal to D11S1783 and therefore may be showing association with a second locus. Figure 6 shows the LOD score of the Tsp analysis (-log of the p value). Further analysis of H0570POLYA in 2042 families with type 1 diabetes confirmed the association between this marker and type 1 diabetes (2×2 test of heterogeneity for affected versus unaffected siblings, p(corrected) < 4.8 × 10⁻⁵) (Nakagawa, Y., et al (1997) Hum. Mol. Genet. Submitted).
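A 2×2 test of heterogeneity compares transmitted and untransmitted allele counts between affected and unaffected siblings. A minimal Pearson chi-square sketch without continuity correction, using hypothetical counts; the study's exact procedure and its multiple-testing correction are not reproduced here.

```python
import math

def heterogeneity_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the table
        affected:   a transmitted, b untransmitted
        unaffected: c transmitted, d untransmitted
    Returns (chi2, p)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # 1-df upper tail

chi2, p = heterogeneity_2x2(60, 40, 45, 55)  # hypothetical counts
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 4.51, p = 0.034
```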
As association of a particular marker allele with the disease is likely to occur when the marker and the disease mutation are close (within 2Mb), genes within this interval are candidates. The SAPL gene is within 200kb of H0570POLYA, in a region showing strong association with IDDM; hence single nucleotide polymorphisms within this gene and its regulatory regions are candidates for the aetiological mutation at IDDM4.

OLIGONUCLEOTIDE PRIMERS
4dest4 1f (-26) 4dest4 2f (0) TCGTGGGCACCTCCAGATAAG
4dest4 5f (24) ACAAGCTCAGAGAGATGTGGTG
4dest4 3f (66) AACTTCCTCGGCCATATGG
4dest4 4F (120) GGGAGAGCTTGTTTCATATCC
4dest4 3r (216) TCTTCTTTGTGGCTCCTTGC
4dest4 1r (443) CGGTTCTGAGCTTTACATTCC
4dest4 6f ( 565 ) GGGAGAAGATGAATCCTTGC
4dest4 7f (619) CCCTTTGAATCCACTACTTGC
4dest4 2r (957) ATTTGTTGCTCAGGCTCCTG
4dest4 8f (1065) 4dest4 4r (1067) TGGATTGCACTGACTATGGC
4dest4 11f (1497) TGGGACACCTAACGAGGATAGC
4dest4 9F (1582) AGATCCTCCGACGAAGTCAG
4dest4 5r (1602) CTGACTTCGTCGGAGGATCT
4dest4 6r (1765) 4dest4 10f (2012) PAIR WITH 8R
CAAGACTTGTTTGAACCCAGC
4dest4 7r (2189) TCTCTTTAGTTGGCATCGGC
4dest4 8r (2391) CTTTCTGCATCCTCCTCTCC
4dest4 12f (2515) AGATGCTGCTTGTAAAGACGC

4dest4 12r (2643) ACTGAAGTGTCACCTGGTGC
4dest4 14f (2909) GCCTGTGAAATAAGATCTTGCC
4dest4 14r (2930) GGCAAGATCTTATTTCACAGGC
4dest4 15r (3376) CAAGCAAACAAGACTTGAACAG
4dest4A 13R (3876) TGAGCTGTTTGAGAAGGCTG
4dest4A 11R (4193) AGTGCTGGAATCTCCACACC
4dest4A 13F (4301) 4dest4A 10R (4691) CCCATTGTCATATCCTTTCCC
4dest4A 9R (4786) TTCAGTATGGCCAACACACAG
Vector Primers for RCCA
PBS.543R
GGGGATGTGCTGCAAGGCGA
PBS.578R
CCAGGGTTTTCCCAGTCACGAC
PBS.838F
TTGTGTGGAATTGTGAGCGGATAAC
PBS.873F
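Some forward/reverse pairs in the primer table are exact reverse complements of one another, for example 4dest4 9F (1582) and 4dest4 5r (1602). A small helper sketch for checking such pairs and estimating a rough melting temperature. The Wallace rule used here, Tm ≈ 2(A+T) + 4(G+C) °C, is only a first approximation for short oligonucleotides and is an assumption, not the design rule used in the source.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C (short oligos only)."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))

f9, r5 = "AGATCCTCCGACGAAGTCAG", "CTGACTTCGTCGGAGGATCT"  # 4dest4 9F and 5r
print(reverse_complement(f9) == r5, round(gc_fraction(f9), 2), wallace_tm(f9))
# True 0.55 62
```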

DM4E4 POLYMORPHISMS
Location  Polymorphism       5' Context      3' Context
3297      delete AAGTA       AGATTAAGTA      TTTATTGCTA
3488      G to A transition  TI'1'I'I'GTTTC  TITI'GGTAGTT
3680      G to A transition  TATTTTAAAA      TAGAAATCAA
4 I       delete TTA         GTCTAATGCC      TTATTTC'I GA

Nucleotide location numbers are based on DM4E4a sequence (Figure XX).

DM4E4 Intron/Exon Boundaries
Exon  Size  5'           3'            Intron  Size
1     60+                TCCAGgtaa     1       unknown
2     106   tacagATAAG   T'PGAGgtacc   2       >13,000
3     150   tacagGAGCT   GAAAGgtaag    3       18,014
4     233   tttagACCAG   TACAAgtaag    4       6,964
5     187   tctagGTATC   AACAGgtaaa    5       3,043
6     129   tctagATTGT   AAGATgtgct    6       unknown
Exons 1-6 account for the first 820 nucleotides of DM4E4a cDNA.
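The boundary strings use uppercase for exon and lowercase for intron sequence, so the canonical GT-AG rule (introns begin with gt and end with ag) can be checked mechanically. A minimal sketch; the exon 2 donor is left out because its exonic characters are not legible, and the helper names are illustrative.

```python
import re

# Boundary strings from the DM4E4 table (uppercase = exon, lowercase = intron).
DONORS = ["TCCAGgtaa", "GAAAGgtaag", "TACAAgtaag", "AACAGgtaaa", "AAGATgtgct"]
ACCEPTORS = ["tacagATAAG", "tacagGAGCT", "tttagACCAG", "tctagGTATC", "tctagATTGT"]

def runs(boundary):
    """Split a boundary into case runs, e.g. 'TCCAGgtaa' -> ['TCCAG', 'gtaa']."""
    return re.findall(r"[A-Z]+|[a-z]+", boundary)

def canonical_donor(boundary):
    """5' splice site: the trailing intron run should begin with 'gt'."""
    parts = runs(boundary)
    return parts[-1].islower() and parts[-1].startswith("gt")

def canonical_acceptor(boundary):
    """3' splice site: the leading intron run should end with 'ag'."""
    parts = runs(boundary)
    return parts[0].islower() and parts[0].endswith("ag")

print(all(map(canonical_donor, DONORS)), all(map(canonical_acceptor, ACCEPTORS)))
# True True
```

Every legible boundary in the table satisfies the GT-AG rule, which is consistent with the listed exon/intron assignments.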

Prosite Motifs in SAPL
Residue Number  Motif
279->282  CAMP_PHOSPHO_SITE
458->461  CAMP_PHOSPHO_SITE
556->559  CAMP_PHOSPHO_SITE
23->25    PKC_PHOSPHO_SITE
133->135  PKC_PHOSPHO_SITE
278->280  PKC_PHOSPHO_SITE
421->423  PKC_PHOSPHO_SITE
456->458  PKC_PHOSPHO_SITE
554->556  PKC_PHOSPHO_SITE
651->653  PKC_PHOSPHO_SITE
655->657  PKC_PHOSPHO_SITE
706->708  PKC_PHOSPHO_SITE
11->14    CK2_PHOSPHO_SITE
15->18    CK2_PHOSPHO_SITE
23->26    CK2_PHOSPHO_SITE
171->174  CK2_PHOSPHO_SITE
202->205  CK2_PHOSPHO_SITE
214->217  CK2_PHOSPHO_SITE
233->236  CK2_PHOSPHO_SITE
274->277  CK2_PHOSPHO_SITE
304->307  CK2_PHOSPHO_SITE
315->318  CK2_PHOSPHO_SITE
339->342  CK2_PHOSPHO_SITE
351->354  CK2_PHOSPHO_SITE
360->363  CK2_PHOSPHO_SITE
362->365  CK2_PHOSPHO_SITE
366->369  CK2_PHOSPHO_SITE
452->455  CK2_PHOSPHO_SITE
537->540  CK2_PHOSPHO_SITE
563->566  CK2_PHOSPHO_SITE
569->572  CK2_PHOSPHO_SITE
571->574  CK2_PHOSPHO_SITE
628->631  CK2_PHOSPHO_SITE
642->645  CK2_PHOSPHO_SITE
651->654  CK2_PHOSPHO_SITE
660->663  CK2_PHOSPHO_SITE
666->669  CK2_PHOSPHO_SITE
697->700  CK2_PHOSPHO_SITE
744->747  CK2_PHOSPHO_SITE
772->775  CK2_PHOSPHO_SITE
293->298  MYRISTYL
561->566  MYRISTYL
717->722  MYRISTYL
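The motif names above correspond to PROSITE consensus patterns (PS00004 cAMP-dependent kinase, PS00005 protein kinase C, PS00006 casein kinase II, PS00008 N-myristoylation). A simplified regular-expression scan, assuming these are the patterns that produced the table; PROSITE's own scanner may filter or report matches differently. The example peptide is one of those listed in claim 22, so positions are relative to that peptide, not to SAPL residue numbering.

```python
import re

# PROSITE consensus patterns rewritten as Python regular expressions.
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",          # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",           # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}

def scan(seq):
    """Return {motif: [(start, end), ...]} with 1-based inclusive positions."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # A zero-width lookahead reports overlapping sites, as in the table
        # above (e.g. 569->572 and 571->574).
        hits[name] = [(m.start() + 1, m.start() + len(m.group(1)))
                      for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits

print(scan("PESQRRSSSGSTDSE"))
# {'CAMP_PHOSPHO_SITE': [(5, 8)], 'PKC_PHOSPHO_SITE': [(3, 5)],
#  'CK2_PHOSPHO_SITE': [(12, 15)], 'MYRISTYL': [(10, 15)]}
```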

Claims

1. An isolated nucleic acid encoding a polypeptide which comprises the first 776 amino acids shown in Figure 1(c) and Figure 2(c).

2. An isolated nucleic acid molecule encoding a polypeptide and which hybridizes under stringent conditions to nucleic acid according to claim 1.

3. An isolated nucleic acid encoding a SAPL polypeptide, which SAPL polypeptide is selected from the group consisting of the SAPLa polypeptide isoforms of which the amino acid sequences are shown in Figure 1(c) and Figure 1(d) and the SAPLb polypeptide isoform of which the amino acid sequence is shown in Figure 2(c).

4. An isolated nucleic acid according to claim 3 comprising a coding sequence selected from the group consisting of the coding sequences shown in Figure 1(a), Figure 1(b), Figure 2(a) or Figure 2(b).

5. An isolated nucleic acid according to claim 4 comprising a coding sequence encoding said SAPL polypeptide selected from the group consisting of the SAPLa polypeptide isoforms of which the amino acid sequences are shown in Figure 1(c) and Figure 1(d) and the SAPLb polypeptide isoform of which the amino acid sequence is shown in Figure 2(c), wherein the coding sequence differs from the coding sequences shown in Figure 1(a), Figure 1(b), Figure 2(a) or Figure 2(b).

6. An isolated nucleic acid encoding a polypeptide, which polypeptide has at least 80% amino acid sequence similarity with a SAPL polypeptide encoded by nucleic acid according to claim 3.

7. An isolated nucleic acid according to claim 6 encoding a polypeptide, which polypeptide has at least 90% amino acid sequence similarity with a SAPL polypeptide encoded by nucleic acid according to claim 3.

8. An isolated nucleic acid that corresponds to nucleic acid according to claim 4 containing an alteration at a polymorphic site associated with disease.

9. An isolated nucleic acid that corresponds to nucleic acid according to claim 4 containing an alteration shown in Table 2.

10. A replicable nucleic acid vector comprising nucleic acid according to any one of claims 1 to 9.

11. A replicable nucleic acid vector according to claim 10 wherein said nucleic acid is under control of regulatory sequences for expression.

12. A host cell transformed with nucleic acid according to any one of claims 1 to 10 or a replicable nucleic acid vector according to claim 9 or claim 10.

13. An oligonucleotide fragment of a nucleic acid molecule according to claim 4 of at least about 14 nucleotides.

14. An oligonucleotide with a nucleotide sequence shown in Table 1.

15. An isolated nucleic acid encoding a promoter of which the sequence is shown within Figure 4.

16. An isolated nucleic acid according to claim 15 operably linked to a heterologous coding sequence.

17. A replicable nucleic acid vector comprising nucleic acid according to claim 15 or claim 16.

18. A replicable nucleic acid vector according to claim 17 wherein said nucleic acid is under control of regulatory sequences for expression.

19. A host cell transformed with nucleic acid according to claim 15 or claim 16 or a replicable nucleic acid vector according to claim 17 or claim 18.

20. An isolated polypeptide comprising an amino acid sequence selected from the group consisting of those encoded by nucleic acid according to any one of claims 1 to 9.

21. A fragment of a polypeptide including at least 5 contiguous amino acids of an amino acid sequence selected from the group consisting of the amino acid sequences shown in Figure 1(c), Figure 1(d) and Figure 2(c).

22. A fragment according to claim 21 which has an amino acid sequence selected from the group consisting of:

HPSQEEDRHSNASQ, RIQQFDDGGSDEEDI, PESQRRSSSGSTDSE, and PSSSPEQRTGQPSAPGDTS.

23. A method of production of a polypeptide which comprises culturing a host cell according to claim 19 under conditions for production of said polypeptide.

24. A method according to claim 23 further comprising isolating and/or purifying the polypeptide.

25. A method according to claim 24 further comprising formulating the polypeptide into a composition which comprises at least one additional component.

26. A composition comprising a polypeptide according to claim 20 or fragment according to claim 22, or nucleic acid encoding said polypeptide or fragment, and a pharmaceutically acceptable excipient.

27. An isolated antibody specific for a polypeptide according to claim 20.

28. An isolated antibody according to claim 27 which binds an amino acid sequence selected from:
HPSQEEDRHSNASQ, RIQQFDDGGSDEEDI, PESQRRSSSGSTDSE, and PSSSPEQRTGQPSAPGDTS.

29. A composition comprising an antibody according to claim 27 or claim 28 and a pharmaceutically acceptable excipient.

30. A method which comprises determining in a sample the presence or absence of nucleic acid with the nucleotide sequence of nucleic acid according to any one of claims 1 to 9, an oligonucleotide with the nucleotide sequence of an oligonucleotide according to claim 13 or 14, or a polypeptide with the amino acid sequence of a polypeptide according to claim 20 or comprising the amino acid sequence of a fragment according to claim 21 or claim 22.