AU752969B2

AU752969B2 - Patched genes and their use

Info

Publication number: AU752969B2
Application number: AU59408/99A
Authority: AU
Inventors: Lisa V Goodrich; Ronald L Johnson; Matthew P Scott
Original assignee: Leland Stanford Junior University
Current assignee: Leland Stanford Junior University
Priority date: 1994-10-07
Filing date: 1999-11-12
Publication date: 2002-10-03
Anticipated expiration: 2015-10-06
Also published as: AU5940899A

Description

S & F Ref: 374791D1

AUSTRALIA

PATENTS ACT 1990 COMPLETE SPECIFICATION FOR A STANDARD PATENT

ORIGINAL

Name and Address of Applicant: The Board of Trustees of the Leland Stanford Junior University, Suite 300, 900 Welch Road, Palo Alto, California 94304, UNITED STATES OF AMERICA

Actual Inventor(s): Matthew P. Scott, Lisa V. Goodrich and Ronald L. Johnson

Address for Service: Spruson & Ferguson, Patent Attorneys, Level 33 St Martins Tower, 31 Market Street, Sydney, New South Wales, 2000, Australia

Invention Title: Patched Genes and Their Use

The following statement is a full description of this invention, including the best method of performing it known to me/us:-

IP Australia. Documents received on: 12 NOV 1999. Batch No: 5845

PATCHED GENES AND THEIR USE

INTRODUCTION

Technical Field

The field of this invention concerns segment polarity genes and their uses.

Background

Segment polarity genes were discovered in flies as mutations which change the pattern of structures of the body segments. Mutations in the genes cause animals to develop the changed patterns on the surfaces of body segments, the changes affecting the pattern along the head-to-tail axis. For example, mutations in the gene patched cause each body segment to develop without the normal structures in the center of each segment. In their stead is a mirror image of the pattern normally found in the anterior segment. Thus cells in the center of the segment make the wrong structures, and point them in the wrong direction with reference to the overall head-to-tail polarity of the animal. About sixteen genes in the class are known.

The encoded proteins include kinases, transcription factors, a cell junction protein, two secreted proteins called wingless (WG) and hedgehog (HH), a single transmembrane protein called patched (PTC), and some novel proteins not related to any known protein. All of these proteins are believed to work together in signaling pathways that inform cells about their neighbors in order to set cell fates and polarities.

Many of the segment polarity proteins of Drosophila and other invertebrates are closely related to vertebrate proteins, implying that the molecular mechanisms involved are ancient. Among the vertebrate proteins related to the fly genes are En-1 and En-2, which act in vertebrate brain development, and WNT-1, which is also involved in brain development, but was first found as the oncogene implicated in many cases of mouse breast cancer. In flies, the patched gene is transcribed into RNA in a complex and dynamic pattern in embryos, including fine transverse stripes in each body segment primordium. The encoded protein is predicted to contain many transmembrane domains. It has no significant similarity to any other known protein. Other proteins having large numbers of transmembrane domains include a variety of membrane receptors, channels through membranes and transporters through membranes.

The hedgehog (HH) protein of flies has been shown to have at least three vertebrate relatives: Sonic hedgehog (Shh); Indian hedgehog, and Desert hedgehog.

Shh is expressed in a group of cells at the posterior of each developing limb bud. This is exactly the same group of cells found to have an important role in signaling polarity to the developing limb. The signal appears to be graded, with cells close to the posterior source of the signal forming posterior digits and other limb structures and cells farther from the signal source forming more anterior structures. It has been known for many years that transplantation of the signaling cells, a region of the limb bud known as the "zone of polarizing activity (ZPA)", has dramatic effects on limb patterning. Implanting a second ZPA anterior to the limb bud causes a limb to develop with posterior features replacing the anterior ones (in essence little fingers instead of thumbs). Shh has been found to be the long sought ZPA signal. Cultured cells making Shh protein (SHH), when implanted into the anterior limb bud region, have the same effect as an implanted ZPA. This establishes that Shh is clearly a critical trigger of posterior limb development.

The factor in the ZPA has been thought for some time to be related to another important developmental signal that polarizes the developing spinal cord.

The notochord, a rod of mesoderm that runs along the dorsal side of early vertebrate embryos, is a signal source that polarizes the neural tube along the dorsal-ventral axis. The signal causes the part of the neural tube nearest to the notochord to form floor plate, a morphologically distinct part of the neural tube. The floor plate, in turn, sends out signals to the more dorsal parts of the neural tube to further determine cell fates. The ZPA was reported to have the same signaling effect as the notochord when transplanted to be adjacent to the neural tube, suggesting the ZPA makes the same signal as the notochord. In keeping with this view, Shh was found to be produced by notochord cells and floor plate cells. Tests of extra expression of Shh in mice led to the finding of extra expression of floor plate genes in cells which would not normally turn them on. Therefore Shh appears to be a component of the signal from notochord to floor plate and from floor plate to more dorsal parts of the neural tube. Besides limb and neural tubes, vertebrate hedgehog genes are also expressed in many other tissues including, but not limited to the peripheral nervous system, brain, lung, liver, kidney, tooth primordia, genitalia, and hindgut and foregut endoderm.

PTC has been proposed as a receptor for HH protein based on genetic experiments in flies. A model for the relationship is that PTC acts through a largely unknown pathway to inactivate both its own transcription and the transcription of the wingless segment polarity gene. This model proposes that HH protein, secreted from adjacent cells, binds to the PTC receptor, inactivates it, and thereby prevents PTC from turning off its own transcription or that of wingless. A number of experiments have shown coordinate events between PTC and HH.

Relevant Literature

Descriptions of patched, by itself or its role with hedgehog, may be found in Hooper and Scott, Cell 59, 751-765 (1989); Nakano et al., Nature 341, 508-513 (1989) (both of which also describe the sequence for Drosophila patched); Simcox et al., Development 107, 715-722 (1989); Hidalgo and Ingham, Development 110, 291-301 (1990); Phillips et al., Development 110, 105-114 (1990); Sampedro and Guerrero, Nature 353, 187-190 (1991); Ingham et al., Nature 353, 184-187 (1991); and Taylor et al., Mechanisms of Development 42, 89-96 (1993). Discussions of the role of hedgehog include Riddle et al., Cell 75, 1401-1416 (1993); Echelard et al., Cell 75, 1417-1430 (1993); Krauss et al., Cell 75, 1431-1444 (1993); Tabata and Kornberg, Cell 76, 89-102 (1994); Heemskerk and DiNardo, Cell 76, 449-460 (1994); Roelink et al., Cell 76, 761-775 (1994); and a short review article by Ingham, Current Biology 4, 347-350 (1994). The sequence for the Drosophila 5' noncoding region was reported to GenBank, accession number M28418, referred to in Hooper and Scott (1989), supra. See also Forbes et al., Development 1993 Supplement, 115-124.

Summary of the Invention

Methods for isolating patched genes, particularly mammalian patched genes, including the mouse and human patched genes, as well as invertebrate patched genes and sequences, are provided. The methods include identification of patched genes from other species, as well as members of the same family of proteins. The subject genes provide methods for producing the patched protein, where the genes and proteins may be used as probes for research, diagnosis, binding of hedgehog protein for its isolation and purification, gene therapy, as well as other utilities.

A first aspect of the present invention provides an isolated nucleic acid sequence encoding a patched gene other than Drosophila patched gene or fragment thereof of at least about 12 bp different from the sequence of the Drosophila patched gene, wherein the DNA sequence hybridizes under stringent conditions to any of SEQ ID No: 1, 3, 5, 7, 9, 18, and fragments thereof, and encodes a patched polypeptide which binds a hedgehog polypeptide.

A second aspect of the present invention provides an expression cassette including a transcriptional initiation region functional in an expression host, a nucleic acid sequence according to the first aspect of the present invention described above, under the transcriptional regulation of said transcriptional initiation region, and a transcriptional termination region functional in said expression host.

A third aspect of the present invention provides a cell including an expression cassette according to the second aspect of the present invention described above as part of an extrachromosomal element or integrated into the genome of a host cell as a result of introduction of said expression cassette into said host cell and the cellular progeny of said host cell.

A fourth aspect of the present invention provides a cell including an expression cassette including a transcriptional initiation region functional in an expression host, said transcriptional initiation region consisting of a 5' non-coding region regulating the transcription of patched polypeptide encoded by the isolated nucleic acid of the first aspect of the present invention including the promoter and enhancer, a marker gene, and a transcriptional termination region, as part of an extrachromosomal element or integrated into the genome of a host cell as a result of introduction of said expression cassette into said host, and the cellular progeny thereof.

A fifth aspect of the present invention provides a method for following embryonic development employing the patched polypeptide, said method including: integrating an expression cassette including a transcriptional initiation region functional in embryonic host cells, said transcriptional initiation region consisting of a 5' non-coding region regulating the transcription of a patched polypeptide encoded by the nucleic acid of the first aspect of the present invention, a marker gene and a transcriptional termination region, wherein said embryonic host cells are capable of developing into a fetus; growing said embryonic host cells, whereby proliferation and differentiation occur; and locating cells expressing the patched polypeptide by means of expression of said marker gene.

A sixth aspect of the present invention provides a method for producing patched polypeptide, said method including: growing a cell according to the third or fourth aspects of the present invention, whereby said patched polypeptide is expressed; and isolating said patched polypeptide free of other proteins.

A seventh aspect of the present invention provides a method for screening candidate compounds for binding affinity to the patched polypeptide, said method including: combining said candidate compound with a vertebrate or invertebrate cell including said patched protein in the membrane of said cell and an expression cassette including a transcriptional initiation region functional in said cell, a nucleic acid sequence according to the first aspect of the present invention including the entire coding sequence under the transcriptional regulation of said transcriptional initiation region, and a transcriptional termination region functional in said cell, expressing said patched polypeptide in said cell; and assaying for the binding of said candidate compound to said patched polypeptide.

An eighth aspect of the present invention provides a method for screening candidate compounds for agonist activity with the patched polypeptide, said method including: combining said candidate compound with a vertebrate or invertebrate cell including said patched polypeptide in the membrane of said cell and an expression cassette including a transcriptional initiation region functional in an expression host, said transcriptional initiation region consisting of a 5' non-coding region regulating the transcription of a patched polypeptide encoded by the nucleic acid of the first aspect of the present invention, a marker gene, and a transcriptional termination region, as part of an extrachromosomal element or integrated into the genome of a host cell; and assaying for the expression of said marker gene.

A ninth aspect of the present invention provides a monoclonal antibody binding specifically to a patched polypeptide, other than the Drosophila patched polypeptide.

A tenth aspect of the present invention provides an isolated nucleic acid including a ptc coding sequence for a naturally occurring vertebrate patched polypeptide, or allelic variant thereof, which binds to a hedgehog polypeptide, wherein the ptc coding sequence hybridizes to the complement of the coding sequence of SEQ ID Nos. 9 or 18 under stringency conditions equivalent to 5 x SSC at 60°C.

An eleventh aspect of the present invention provides an isolated nucleic acid including a coding sequence for a polypeptide including a hedgehog-binding sequence that binds hedgehog polypeptide and which is encoded by a nucleotide sequence that hybridizes to the complement of the coding sequence of SEQ ID Nos. 9 or 18 under stringency conditions equivalent to 5 x SSC at 60°C, wherein the polypeptide retains hedgehog-binding activity.

A twelfth aspect of the present invention provides a nucleic acid including: (i) a coding sequence from a naturally occurring vertebrate ptc gene or allelic variant thereof, which coding sequence hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 5 x SSC at 60°C and encodes a polypeptide that binds hedgehog proteins, and (ii) a heterologous transcriptional initiation region controlling the transcription of the coding sequence.

A thirteenth aspect of the present invention provides a nucleic acid including: (i) a coding sequence for a polypeptide including a hedgehog-binding sequence that binds hedgehog polypeptide and is encoded by a nucleotide sequence that hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 5 x SSC at 60°C, and (ii) a heterologous transcriptional initiation region controlling the transcription of the coding sequence, wherein the polypeptide retains hedgehog-binding activity.

A fourteenth aspect of the present invention provides a method for assessing a genetic predisposition of an animal for a basal cell carcinoma, the method including: detecting, from a sample of nucleic acid isolated from the animal, a loss-of-function mutation in a patched gene in the germline of said animal, wherein the presence of said loss-of-function mutation indicates that said animal has a genetic predisposition for basal cell carcinoma.

A fifteenth aspect of the present invention provides a method for determining a patched phenotype of cells of a tumor including detecting, from a sample of nucleic acid isolated from the cells, the presence or absence of a loss-of-function mutation of a patched gene in the cells.

A sixteenth aspect of the present invention provides an assay for determining the patched phenotype of a cell, including providing a nucleic acid sample isolated from mammalian cells, detecting the presence or absence of a patched gene sequence or allelic variant thereof by hybridization of the nucleic acid sample with one or more nucleic acid probes which hybridize to a mammalian patched gene.

A seventeenth aspect of the present invention provides an assay for detecting mutations in a patched gene, including detecting, in a sample of isolated mammalian cells or nucleic acid isolated therefrom, the presence or absence of a deletion of one or more nucleotides from the patched gene, an addition of one or more nucleotides to the patched gene, a substitution of one or more nucleotides of the patched gene, a chromosomal rearrangement of all or a portion of the patched gene, an alteration in the level of an mRNA transcript of the patched gene, or alteration of the splicing pattern of an mRNA transcript of the patched gene.

An eighteenth aspect of the present invention provides an assay for phenotyping the patched status of a cell, including detecting, in a sample of isolated mammalian cells, the presence or absence of a genetic lesion of a patched gene characterized by at least one of (i) an aberrant mutation of a patched gene resulting in loss of function, and (ii) misexpression of the patched gene resulting in loss of function.

A nineteenth aspect of the present invention provides a method for assessing a genetic predisposition of an animal for developing basal cell nevus syndrome, the method including: detecting, from a sample of nucleic acid isolated from the animal, a loss-of-function mutation in a patched gene in the germline of said animal, wherein the presence of said loss-of-function mutation indicates that said animal has a genetic predisposition for developing basal cell nevus syndrome.

A twentieth aspect of the present invention provides a method for diagnosing a genetic predisposition of an animal for at least one of a developmental abnormality or a proliferative disorder marked by aberrant expression or activity of a patched gene or gene product, the method including detecting the presence of a predisposing mutation in a patched gene in cells of said animal, wherein the presence of said predisposing mutation indicates that said individual has a genetic predisposition for at least one of developmental abnormalities or a proliferative disorder.

A twenty-first aspect of the present invention provides a method for characterizing the phenotype of a tumor, including detecting the presence of an oncogenic patched mutation in cells of the tumor, wherein the presence of said oncogenic mutation indicates that said tumor has a patched-associated phenotype.

A twenty-second aspect of the present invention provides a genetically engineered mammalian cell predisposed to develop a proliferative phenotype as a result of transfection of said mammalian cell with at least one nucleic acid construct which inhibits expression of an endogenous patched gene or alters the signal transduction activity of a wild-type patched protein.

A twenty-third aspect of the present invention provides a method for treating an animal having a disorder characterized by loss-of-function of a patched gene, including transfecting cells of the animal with an expression construct encoding a patched polypeptide.

A twenty-fourth aspect of the present invention provides a method for treating an animal having a disorder characterized by loss-of-function of a patched gene, including administering to the animal an agent which inhibits derepression of one or more patched-dependent genes.

A twenty-fifth aspect of the present invention provides a genetically engineered non-human mammal predisposed to develop a proliferative phenotype as a result of transfection of said mammalian cell with at least one nucleic acid construct which inhibits expression of an endogenous patched gene or alters the signal transduction activity of a wild-type patched protein.

A twenty-sixth aspect of the present invention provides a method for treating an animal having a disorder characterized by loss-of-function of a wild-type patched gene, including transfecting cells of the animal with an expression construct encoding a patched polypeptide, which construct functionally replaces the wild-type patched gene.

A twenty-seventh aspect of the present invention provides a method for treating an animal having a disorder characterized by loss-of-function of a patched gene, including administering to the animal an agent which inhibits derepression of one or more patched-dependent genes.

A twenty-eighth aspect of the present invention provides a transgenic non-human mammal predisposed to develop a proliferative phenotype as a result of transfection of cells of said mammal with at least one nucleic acid construct which inhibits expression of an endogenous patched gene or alters the signal transduction activity of a wild-type patched protein.

A twenty-ninth aspect of the present invention provides a method for screening for a patched agonist or antagonist including the steps of: a) combining a patched polypeptide or bioactive fragment thereof, a patched binding partner and a test compound under conditions wherein, but for the test compound, the patched polypeptide and patched binding partner are able to interact; and b) detecting the extent to which, in the presence of the test compound, a patched polypeptide/patched binding partner complex is formed, wherein an increase or decrease in the amount of complex formed in the presence of the compound relative to complex formed in the absence of the compound indicates that the compound interacts with a patched polypeptide or patched binding partner.

A thirtieth aspect of the present invention provides a method for identifying a compound that modulates hedgehog-dependent signal transduction, including the steps of: contacting an appropriate amount of the compound with a cell or cellular extract, which expresses a patched gene; and determining the resulting patched bioactivity, wherein an increase or decrease in the patched bioactivity in the presence of the compound as compared to the bioactivity in the absence of the compound indicates that the compound is a modulator of a patched bioactivity.

A thirty-first aspect of the present invention provides a method for identifying whether a test compound is a patched binding partner or measuring the strength of an interaction between a patched polypeptide and said patched binding partner including: (a) allowing (i) a first molecule including a patched polypeptide operably linked to a heterologous DNA binding domain to interact with (ii) a second molecule comprising a test compound operably linked to a polypeptide transcriptional activation domain and (iii) a hybrid reporter gene including a nucleic acid encoding a reporter operably linked to a DNA sequence comprising a binding site for said heterologous DNA binding domain; and (b) detecting or measuring the expression of the hybrid reporter gene as an indication of the existence or strength of an interaction between the first molecule and the second molecule wherein high levels of hybrid reporter expression indicate a strong interaction between patched and said test molecule thereby identifying a test compound which is a patched binding partner.

A thirty-second aspect of the present invention provides a method for identifying a molecule which is a downstream or an upstream component of a patched biochemical pathway or for measuring the strength of the interaction between a patched biochemical pathway component and a patched binding partner including: (a) allowing (i) a first molecule comprising a patched binding partner polypeptide operably linked to a heterologous DNA binding domain to interact with (ii) a second molecule including a test molecule operably linked to a polypeptide transcriptional activation domain and (iii) a hybrid reporter gene including a nucleic acid encoding a reporter operably linked to a DNA sequence comprising a binding site for said heterologous DNA binding domain; and (b) detecting or measuring the expression of the hybrid reporter gene as an indication of the existence or strength of an interaction between the first molecule and the second molecule wherein high levels of hybrid reporter expression indicate a strong interaction between a patched binding pair and said test molecule thereby identifying a test molecule which is a downstream or an upstream component of the patched biochemical pathway.

A thirty-third aspect of the present invention provides a use of an expression construct encoding a patched polypeptide for transfecting cells in the treatment of an animal having a disorder characterised by loss-of-function of a patched gene.

A thirty-fourth aspect of the present invention provides an expression construct encoding a patched polypeptide when used for transfecting cells of an animal having a disorder characterised by loss-of-function of a patched gene.

A thirty-fifth aspect of the present invention provides use of an agent which inhibits derepression of one or more patched-dependent genes in the treatment of an animal having a disorder characterised by loss-of-function of a patched gene.

A thirty-sixth aspect of the present invention provides an agent which inhibits derepression of one or more patched-dependent genes when used in the treatment of an animal having a disorder characterized by loss-of-function of a patched gene.

A thirty-seventh aspect of the present invention provides use of an expression construct encoding a patched polypeptide which construct functionally replaces the wild-type patched gene in the treatment of an animal having a disorder characterized by loss-of-function of a wild-type patched gene.

A thirty-eighth aspect of the present invention provides an expression construct encoding a patched polypeptide which construct functionally replaces the wild-type patched gene when used in the treatment of an animal having a disorder characterized by loss-of-function of a wild-type patched gene.

A thirty-ninth aspect of the present invention provides a nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 75% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 10, 19, and fragments thereof.

A fortieth aspect of the present invention provides a nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 85% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 10, 19, and fragments thereof.

A forty-first aspect of the present invention provides a nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 90% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 10, 19, and fragments thereof.

A forty-second aspect of the present invention provides a nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 95% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 10, 19, and fragments thereof.

A forty-third aspect of the present invention provides a nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 98% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 10, 19, and fragments thereof.

A forty-fourth aspect of the invention provides a nucleic acid including a nucleic acid sequence that hybridizes under stringent conditions to a sequence selected from SEQ ID Nos. 1, 3, 5, 7, 9, 18, and fragments thereof, wherein the nucleic acid sequence encodes an amino acid sequence that binds a naturally occurring hedgehog polypeptide, and wherein the nucleic acid sequence is not identical to SEQ ID No.

Brief Description of the Drawings

Fig. 1 is a graph having a restriction map of about 10 kbp of the 5' region upstream from the initiation codon of Drosophila patched gene and bar graphs of constructs of truncated portions of the 5' region joined to β-galactosidase, where the constructs are introduced into fly cell lines for the production of embryos. The expression of β-gal in the embryos is indicated in the right-hand table during early and late development of the embryo. The greater the number of +'s, the more intense the staining.

Description of the Specific Embodiments

Methods are provided for identifying members of the patched (ptc) gene family from invertebrate and vertebrate, e.g. mammalian, species, as well as the entire cDNA sequence of the mouse and human patched gene. Also, sequences for invertebrate patched genes are provided. The patched gene encodes a transmembrane protein having a large number of transmembrane sequences.

In identifying the mouse and human patched genes, primers were employed to move through the evolutionary tree from the known Drosophila ptc sequence. Two primers are employed from the Drosophila sequence with appropriate restriction enzyme linkers to amplify portions of genomic DNA of a related invertebrate, such as mosquito. The sequences are selected from regions which are not likely to diverge over evolutionary time and are of low degeneracy.

Conveniently, the regions are the N-terminal proximal sequence, generally within the first 1.5kb, usually within the first 1kb, of the coding portion of the cDNA, conveniently in the first hydrophilic loop of the protein. Employing the polymerase chain reaction (PCR) with the primers, a band can be obtained from mosquito genomic DNA. The band may then be amplified and used in turn as a probe. One may use this probe to probe a cDNA library from an organism in a different branch of the evolutionary tree, such as a butterfly. By screening the library and identifying sequences which hybridize to the probe, a portion of the butterfly patched gene may be obtained. One or more of the resulting clones may then be used to rescreen the library to obtain an extended sequence, up to and including the entire coding region, as well as the non-coding and 3'-sequences. As appropriate, one may sequence all or a portion of the resulting cDNA coding sequence.

One may then screen a genomic or cDNA library of a species higher in the evolutionary scale with appropriate probes from one or both of the prior sequences.

Of particular interest is screening a genomic library of a distantly related invertebrate, e.g. beetle, where one may use a combination of the sequences obtained from the previous two species, in this case, the Drosophila and the butterfly. By appropriate techniques, one may identify specific clones which bind to the probes, which may then be screened for cross hybridization with each of the probes individually. The resulting fragments may then be amplified, e.g. by subcloning.

By having all or parts of the 4 different patched genes, in the presently illustrated example, Drosophila (fly), mosquito, butterfly and beetle, one can now compare the patched genes for conserved sequences. Cells from an appropriate mammalian limb bud or other cells expressing patched, such as notochord, neural tube, gut, lung buds, or other tissue, particularly fetal tissue, may be employed for screening. Alternatively, adult tissue which produces patched may be employed for screening. Based on the consensus sequence available from the 4 other species, one can develop probes where at each site at least 2 of the sequences have the same nucleotide and, where the site varies such that each species has a unique nucleotide, inosine may be used, which binds to all 4 nucleotides.
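The probe-design rule just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure actually used: it assumes the four sequences are already aligned and of equal length, writes inosine as the symbol `I`, and reads the rule as "use the most common nucleotide at a site when at least two sequences share it, otherwise use inosine". The function name `consensus_probe` and the sample fragments are hypothetical.

```python
from collections import Counter

def consensus_probe(aligned, wildcard="I"):
    """Build a probe from pre-aligned, equal-length nucleotide sequences.

    At each site the most common nucleotide is used if at least two
    sequences share it (ties go to the nucleotide seen first); if every
    sequence differs at that site, the wildcard (inosine) is used.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    probe = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        probe.append(base if count >= 2 else wildcard)
    return "".join(probe)

# Hypothetical aligned fragments standing in for the fly, mosquito,
# butterfly and beetle sequences (not taken from the specification).
example = ["ATGGCA", "ATGGCT", "ATCGCG", "ATAGCC"]
print(consensus_probe(example))  # prints ATGGCI
```

In the sample, the third site is shared by two of the four sequences and so keeps `G`, while the last site differs in all four and is replaced by inosine.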

Either PCR may be employed using primers or, if desired, a genomic library from an appropriate source may be probed. With PCR, one may use a cDNA library or use reverse transcriptase-PCR (RT-PCR), where mRNA is available from the tissue. Usually, where fetal tissue is employed, one will employ tissue from the first or second trimester, preferably the latter half of the first trimester or the second trimester, depending upon the particular host. The age and source of tissue will depend to a significant degree on the ability to surgically isolate the tissue based on its size, the level of expression of patched in the cells of the tissue, the accessibility of the tissue, the number of cells expressing patched and the like. The amount of tissue available should be large enough so as to provide for a sufficient amount of mRNA to be usefully transcribed and amplified. With mouse tissue, limb bud of from about 10 to 15 dpc (days post conception) may be employed.

In the primers, the complementary binding sequence will usually be at least 14 nucleotides, preferably at least about 17 nucleotides and usually not more than about 30 nucleotides. The primers may also include a restriction enzyme sequence for isolation and cloning. With RT-PCR, the mRNA may be enriched in accordance with known ways, reverse transcribed, followed by amplification with the appropriate primers. (Procedures employed for molecular cloning may be found in Molecular Cloning: A Laboratory Manual, Sambrook et al., eds., Cold Spring Harbor Laboratories, Cold Spring Harbor, NY, 1988). Particularly, the primers may conveniently come from the N-terminal proximal sequence or other conserved region, such as those sequences where at least five amino acids are conserved out of eight amino acids in three of the four sequences. This is illustrated by the sequences (SEQ ID NO:11) IITPLDCFWEG, (SEQ ID NO:12) LIVGG, and (SEQ ID NO:13) PFFWEQY. Resulting PCR products of expected size are subcloned and may be sequenced if desired.
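One possible reading of the conservation criterion above (at least five of eight amino acids conserved in three of the four sequences) can be sketched as a sliding-window scan. The function names, the toy sequences, and the handling of gap characters are assumptions made for illustration; the specification does not prescribe a particular algorithm.

```python
from collections import Counter

def column_is_conserved(column, min_agree=3):
    """True if at least min_agree sequences share the same residue (gaps ignored)."""
    counts = Counter(r for r in column if r != "-")
    return bool(counts) and counts.most_common(1)[0][1] >= min_agree

def conserved_windows(aligned, window=8, min_conserved=5, min_agree=3):
    """Start indices of windows with at least min_conserved conserved columns."""
    flags = [column_is_conserved(c, min_agree) for c in zip(*aligned)]
    return [i for i in range(len(flags) - window + 1)
            if sum(flags[i:i + window]) >= min_conserved]

# Hypothetical aligned protein fragments, invented for illustration only.
toy = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIGKQR", "WLCHDPNSEF"]
print(conserved_windows(toy))  # [0, 1, 2]
```

Candidate primer sites found this way would still need to be checked for codon degeneracy before primers are synthesized.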

The cloned PCR fragment may then be used as a probe to screen a cDNA library of mammalian tissue cells expressing patched, where hybridizing clones may be isolated under appropriate conditions of stringency. Again, the cDNA library should come from tissue which expresses patched, which tissue will come within the limitations previously described. Clones which hybridize may be subcloned and rescreened. The hybridizing subclones may then be isolated and sequenced or may be further analyzed by employing RNA blots and in situ hybridizations in whole and sectioned embryos. Conveniently, a fragment of from about 0.5 to 1 kbp of the N-terminal coding region may be employed for the Northern blot.

The mammalian gene may be sequenced and, as described above, conserved regions identified and used as primers for investigating other species. The N-terminal proximal region, the C-terminal region or an intermediate region may be employed for the sequences, where the sequences will be selected having minimum degeneracy and the desired level of conservation over the probe sequence.

The DNA sequence encoding PTC may be cDNA or genomic DNA or fragment thereof, particularly complete exons from the genomic DNA, may be isolated as the sequence substantially free of wild-type sequence from the chromosome, may be a 50 kbp fragment or smaller fragment, may be joined to heterologous or foreign DNA, which may be a single nucleotide, oligonucleotide of up to 50 bp, which may be a restriction site or other identifying DNA for use as a primer, probe or the like, or a nucleic acid of greater than 50 bp, where the nucleic acid may be a portion of a cloning or expression vector, comprise the regulatory regions of an expression cassette, or the like. The DNA may be isolated, purified being substantially free of proteins and other nucleic acids, be in solution, or the like.

The subject gene may be employed for producing all or portions of the patched protein. The subject gene or fragment thereof, generally a fragment of at least 12 bp, usually at least 18 bp, may be introduced into an appropriate vector for extrachromosomal maintenance or for integration into the host. Fragments will usually be immediately joined at the 5' and/or 3' terminus to a nucleotide or sequence not found in the natural or wild-type gene, or joined to a label other than a nucleic acid sequence. For expression, an expression cassette may be employed, providing for a transcriptional and translational initiation region, which may be inducible or constitutive, the coding region under the transcriptional control of the transcriptional initiation region, and a transcriptional and translational termination region. Various transcriptional initiation regions may be employed which are functional in the expression host. The peptide may be expressed in prokaryotes or eukaryotes in accordance with conventional ways, depending upon the purpose for expression. For large production of the protein, a unicellular organism or cells of a higher organism, e.g. eukaryotes such as vertebrates, particularly mammals, may be used as the expression host, such as E. coli, B. subtilis, S. cerevisiae, and the like.

In many situations, it may be desirable to express the patched gene in a mammalian host, whereby the patched gene will be transported to the cellular membrane for various studies. The protein has two parts which provide for a total of six transmembrane regions, with a total of six extracellular loops, three for each part.

The character of the protein has similarity to a transporter protein. The protein has two conserved glycosylation signal triads.

The subject nucleic acid sequences may be modified for a number of purposes, particularly where they will be used intracellularly, for example, by being joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or chromium for cleavage of the gene; as an antisense sequence; or the like.

Modifications may include replacing oxygen of the phosphate esters with sulfur or nitrogen, replacing the phosphate with phosphoramide, etc.

With the availability of the protein in large amounts by employing an expression host, the protein may be isolated and purified in accordance with conventional ways. A lysate may be prepared of the expression host and the lysate purified using HPLC, exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification technique. The purified protein will generally be at least about 80% pure, preferably at least about 90% pure, and may be up to 100% pure. By pure is intended free of other proteins, as well as cellular debris.

The polypeptide may be used for the production of antibodies, where short fragments provide for antibodies specific for the particular polypeptide, whereas larger fragments or the entire gene allow for the production of antibodies over the surface of the polypeptide or protein, where the protein may be in its natural conformation.

Antibodies may be prepared in accordance with conventional ways, where the expressed polypeptide or protein may be used as an immunogen, by itself or conjugated to known immunogenic carriers, e.g. KLH, pre-S HBsAg, other viral or eukaryotic proteins, or the like. Various adjuvants may be employed, with a series of injections, as appropriate. For monoclonal antibodies, after one or more booster injections, the spleen may be isolated, the splenocytes immortalized, and then screened for high affinity antibody binding. The immortalized cells, e.g. hybridomas, producing the desired antibodies may then be expanded. For further description, see Monoclonal Antibodies: A Laboratory Manual, Harlow and Lane eds., Cold Spring Harbor Laboratories, Cold Spring Harbor, New York, 1988. If desired, the mRNA encoding the heavy and light chains may be isolated and mutagenized by cloning in E. coli, and the heavy and light chains may be mixed to further enhance the affinity of the antibody. The antibodies may find use in diagnostic assays for detection of the presence of the PTC protein on the surface of cells or to inhibit the transduction of signal by the PTC protein ligand by competing for the binding site.

The mouse patched gene (SEQ ID NO:09) encodes a protein (SEQ ID NO:10) which has about 38% identical amino acids to fly PTC (SEQ ID NO:6) over about 1,200 amino acids. This amount of conservation is dispersed through much of the protein excepting the C-terminal region. The mouse protein also has an amino acid insert relative to the fly protein. The human patched gene (SEQ ID NO:18) contains an open reading frame of about 1450 amino acids (SEQ ID NO:19) that is about 96% identical (98% similar) to mouse ptc (SEQ ID NO:09). The human patched gene (SEQ ID NO:18), including coding and non-coding sequences, is about 89% identical to the mouse patched gene (SEQ ID NO:09).

The butterfly PTC homolog (SEQ ID NO:4) is 1,300 amino acids long and overall has a 50% amino acid identity (72% similarity) to fly PTC (SEQ ID NO:6).

With the exception of a divergent C-terminus, this homology is evenly spread across the coding sequence. A 267 bp exon from the beetle patched gene encodes an 89 amino acid protein fragment which was found to be 44% and 51% identical to the corresponding regions of fly and butterfly PTC respectively.
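Identity figures of the kind quoted above are computed from pairwise alignments. The following minimal sketch assumes two pre-aligned sequences and counts only columns in which neither sequence has a gap; other conventions for the denominator exist, and the specification does not state which one was used. The function name and the toy sequences are hypothetical. The sketch also checks the codon arithmetic for the beetle exon.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned columns in which the two residues match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

# Hypothetical aligned fragments, invented for illustration only.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # 75.0

# A 267 bp exon read in frame gives 267 / 3 = 89 codons, i.e. 89 amino acids.
codons, remainder = divmod(267, 3)
print(codons, remainder)  # 89 0
```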

The mouse ptc message is about 8 kb long and the message is present in low levels as early as 7 dpc, the abundancy increasing by 11 and 15 dpc. Northern blot indicates a clear decrease in the amount of message at 17 dpc. In the adult, PTC RNA is present in high amounts in the brain and lung, as well as in moderate amounts in the kidney and liver. Weak signals are detected in heart, spleen, skeletal muscle and testes.

In mouse embryos, ptc mRNA is present at 7 dpc, using in situ hybridization. ptc is present at high levels along the neural axis of 8.5 dpc embryos.

By 11.5 dpc, ptc can be detected in developing lung buds and gut, consistent with its Northern profile. In addition, the gene is present at high levels in the ventricular zone of the central nervous system as well as in the zona limitans of the prosencephalon. ptc is also strongly transcribed in the perichondrium condensing cartilage of 11.5 and 13.5 dpc limb buds, as well as in the ventral portion of the somites, a region which is prospective sclerotome and eventually forms bone in the vertebral column. PTC is present in a wide range of tissues from endodermal, mesodermal, as well as ectodermal origin, evidencing the fundamental role in many aspects of embryonic development, including the condensation of cartilage, the patterning of limbs, the differentiation of lung tissue, and the generation of neurons.

The patched nucleic acid may be used for isolating the gene from various mammalian sources of interest, particularly primate, more particularly human, or from domestic animals, both pet and farm, e.g. lagomorpha, rodentiae, porcine, bovine, feline, canine, ovine, equine, etc. By using probes, particularly labeled probes of DNA sequences, of the patched gene, one may be able to isolate mRNA or genomic DNA, which may be then used for identifying mutations, particularly associated with genetic diseases, such as spina bifida, limb defects, lung defects, problems with tooth development, liver and kidney development, peripheral nervous system development, and other sites where a patched gene is involved in regulation.

The subject probes can also be used for identifying the level of expression in cells associated with the testis to determine the relationship with the level of expression and sperm production.

The gene or fragments thereof may be used as probes for identifying the non-coding region comprising the transcriptional initiation region, particularly the enhancer regulating the transcription of patched. By probing a genomic library, particularly with a probe comprising the 5' coding region, one can obtain fragments comprising the 5' non-coding region. If necessary, one may walk the fragment to obtain further 5' sequence to ensure that one has at least a functional portion of the enhancer. It is found that the enhancer is proximal to the 5' coding region, a portion being in the transcribed sequence and downstream from the promoter sequences. The transcriptional initiation region may be used for many purposes, studying embryonic development, providing for regulated expression of patched protein or other protein of interest during embryonic development or thereafter, and in gene therapy.

The gene may also be used for gene therapy, by transfection of the normal gene into embryonic stem cells or into mature cells. A wide variety of viral vectors can be employed for transfection and stable integration of the gene into the genome of the cells. Alternatively, micro-injection may be employed, fusion, or the like for introduction of genes into a suitable host cell. See, for example, Dhawan et al., Science 254, 1509-1512 (1991) and Smith et al., Molecular and Cellular Biology (1990) 3268-3271.

By providing for the production of large amounts of PTC protein, one can use the protein for identifying ligands which bind to the PTC protein. Particularly, one may produce the protein in cells and employ the polysomes in columns for isolating ligands for the PTC protein. One may incorporate the PTC protein into liposomes by combining the protein with appropriate lipid surfactants, e.g. phospholipids, cholesterol, etc., and sonicate the mixture of the PTC protein and the surfactants in an aqueous medium. With one or more established ligands, e.g. hedgehog, one may use the PTC protein to screen for antagonists which inhibit the binding of the ligand. In this way, drugs may be identified which can prevent the transduction of signals by the PTC protein in normal or abnormal cells.

The PTC protein, particularly binding fragments thereof, the gene encoding the protein, or fragments thereof, particularly fragments of at least about 18 nucleotides, frequently of at least about 30 nucleotides and up to the entire gene, more particularly sequences associated with the hydrophilic loops, may be employed in a wide variety of assays. In these situations, the particular molecules will normally be joined to another molecule, serving as a label, where the label can directly or indirectly provide a detectable signal. Various labels include radioisotopes, fluorescers, chemiluminescers, enzymes, specific binding molecules, particles, e.g. magnetic particles, and the like. Specific binding molecules include pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the complementary member would normally be labeled with a molecule which provides for detection, in accordance with known procedures. The assays may be used for detecting the presence of molecules which bind to the patched gene or PTC protein, in isolating molecules which bind to the patched gene, for measuring the amount of patched, either as the protein or the message, for identifying molecules which may serve as agonists or antagonists, or the like.

Various formats may be used in the assays. For example, mammalian or invertebrate cells may be designed where the cells respond when an agonist binds to PTC in the membrane of the cell. An expression cassette may be introduced into the cell, where the transcriptional initiation region of patched is joined to a marker gene, such as β-galactosidase, for which a substrate forming a blue dye is available.

A 1.5 kb fragment that responds to PTC signaling has been identified and shown to regulate expression of a heterologous gene during embryonic development. When an agonist binds to the PTC protein, the cell will turn blue. By employing a competition between an agonist and a compound of interest, absence of blue color formation will indicate the presence of an antagonist. These assays are well known in the literature. Instead of cells, one may use the protein in a membrane environment and determine binding affinities of compounds. The PTC may be bound to a surface and a labeled ligand for PTC employed. A number of labels have been indicated previously. The candidate compound is added with the labeled ligand in an appropriate buffered medium to the surface bound PTC. After an incubation to ensure that binding has occurred, the surface may be washed free of any non-specifically bound components of the assay medium, particularly any non-specifically bound labeled ligand, and any label bound to the surface determined. Where the label is an enzyme, substrate producing a detectable product may be used. The label may be detected and measured. By using standards, the binding affinity of the candidate compound may be determined.
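The specification does not state how the binding affinity is calculated from such a competition assay. As a hedged illustration only, the sketch below assumes the simplest textbook model: reversible one-site competitive binding at equilibrium with negligible ligand depletion, in which the IC50 of the candidate compound is converted to an inhibition constant by the Cheng-Prusoff relation. All names and numerical values are hypothetical.

```python
def bound_fraction(ligand, kd, competitor=0.0, ki=float("inf")):
    """Fraction of PTC sites occupied by labeled ligand under one-site competition."""
    return ligand / (ligand + kd * (1.0 + competitor / ki))

def ki_from_ic50(ic50, ligand, kd):
    """Cheng-Prusoff conversion: Ki = IC50 / (1 + [L] / Kd)."""
    return ic50 / (1.0 + ligand / kd)

# Hypothetical values in arbitrary but consistent concentration units.
ligand, kd, ki = 1.0, 1.0, 5.0
no_competitor = bound_fraction(ligand, kd)            # 0.5
at_ic50 = bound_fraction(ligand, kd, 10.0, ki)        # 0.25, half of the signal
print(no_competitor, at_ic50, ki_from_ic50(10.0, ligand, kd))
```

In this toy case the competitor concentration that halves the bound label is 10 units, and the conversion recovers the assumed Ki of 5 units.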

The availability of the gene and the protein allows for investigation of the development of the fetus and the role patched and other molecules play in such development. By employing antisense sequences of the patched gene, where the sequences may be introduced in cells in culture, or a vector providing for transcription of the antisense of the patched gene introduced into the cells, one can investigate the role the PTC protein plays in the cellular development. By providing for the PTC protein or fragment thereof in a soluble form which can compete with the normal cellular PTC protein for ligand, one can inhibit the binding of ligands to the cellular PTC protein to see the effect of variation in concentration of ligands for the PTC protein on the cellular development of the host. Antibodies against PTC can also be used to block function, since PTC is exposed on the cell surface.

The subject gene may also be used for preparing transgenic laboratory animals, which may serve to investigate embryonic development and the role the PTC protein plays in such development. By providing for variation in the expression of the PTC protein, employing different transcriptional initiation regions which may be constitutive or inducible, one can determine the developmental effect of the differences in PTC protein levels. Alternatively, one can use the DNA to knock out the PTC protein in embryonic stem cells, so as to produce hosts with only a single functional patched gene or where the host lacks a functional patched gene.

By employing homologous recombination, one can introduce a patched gene which is differentially regulated, for example, is expressed during the development of the fetus, but not in the adult. One may also provide for expression of the patched gene in cells or tissues where it is not normally expressed or at abnormal times of development. One may provide for mis-expression or failure of expression in certain tissue to mimic a human disease. Thus, mouse models of spina bifida or abnormal motor neuron differentiation in the developing spinal cord are made available. In addition, by providing expression of PTC protein in cells in which it is otherwise not normally produced, one can induce changes in cell behavior upon binding of ligand to the PTC protein.

Areas of investigation may include the development of cancer treatments. The wingless gene, whose transcription is regulated in flies by PTC, is closely related to a mammalian oncogene, Wnt-1, a key factor in many cases of mouse breast cancer. Other Wnt family members, which are secreted signaling proteins, are implicated in many aspects of development. In flies, the signaling factor decapentaplegic, a member of the TGF-beta family of signaling proteins, known to affect growth and development in mammals, is also controlled by PTC. Since members of both the TGF-beta and Wnt families are expressed in mice in places close to or overlapping with patched, the common regulation provides an opportunity in treating cancer. Also, for repair and regeneration, proliferation competent cells making PTC protein can find use to promote regeneration and healing for damaged tissue, which tissue may be regenerated by transfecting cells of damaged tissue with the ptc gene and its normal transcription initiation region or a modified transcription initiation region. For example, PTC may be useful to stimulate growth of new teeth by engineering cells of the gums or other tissues where PTC protein was expressed during an earlier developmental stage or is expressed.

Since Northern blot analysis indicates that ptc is present at high levels in adult lung tissue, the regulation of ptc expression or binding to its natural ligand may serve to inhibit proliferation of cancerous lung cells. The availability of the gene encoding PTC and the expression of the gene allows for the development of agonists and antagonists. In addition, PTC is central to the ability of neurons to differentiate early in development. The availability of the gene allows for the introduction of PTC into host diseased tissue, stimulating the fetal program of division and/or differentiation. This could be done in conjunction with other genes which provide for the ligands which regulate PTC activity or by providing for agonists other than the natural ligand.

The availability of the coding region for various ptc genes from various species allows for the isolation of the 5' non-coding region comprising the promoter and enhancer associated with the ptc genes, so as to provide transcriptional and posttranscriptional regulation of the ptc gene or other genes, which allow for regulation of genes in relation to the regulation of the ptc gene. Since the ptc gene is autoregulated, activation of the ptc gene will result in activation of transcription of a gene under the transcriptional control of the transcriptional initiation region of the ptc gene. The transcriptional initiation region may be obtained from any host species and introduced into a heterologous host species, where such initiation region is functional to the desired degree in the foreign host. For example, a fragment of from about 1.5 kb upstream from the initiation codon, up to about 10 kb, preferably up to about 5 kb, may be used to provide for transcriptional initiation regulated by the PTC protein, particularly the Drosophila 5'-non-coding region (GenBank accession no. M28418).

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

Methods and Materials

I. PCR on Mosquito (Anopheles gambiae) Genomic DNA

PCR primers were based on amino acid stretches of fly PTC that were not likely to diverge over evolutionary time and were of low degeneracy. Two such primers, P2R1 (SEQ ID NO:14): GGACGAATTCAARGTNCAYCARYTNTGG and P4R1 (SEQ ID NO:15): GGACGAATTCCYTCCCARAARCANTC (the underlined sequences are Eco RI linkers), amplified an appropriately sized band from mosquito genomic DNA using the PCR. The program conditions were as follows:

94 °C 4 min.; 72 °C Add Taq;
[49 °C 30 sec.; 72 °C 90 sec.; 94 °C 15 sec.] 3 times
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

This band was subcloned into the EcoRV site of pBluescript II and sequenced using the USB Sequenase kit.
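The degeneracy of the two primers quoted above can be checked by counting the nucleotides allowed at each position under the standard IUPAC ambiguity codes. The sketch below is illustrative; the helper name `degeneracy` is hypothetical, while the primer sequences are copied from the text.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())

P2R1 = "GGACGAATTCAARGTNCAYCARYTNTGG"  # SEQ ID NO:14
P4R1 = "GGACGAATTCCYTCCCARAARCANTC"    # SEQ ID NO:15

print(degeneracy(P2R1))  # 256
print(degeneracy(P4R1))  # 32
print("GAATTC" in P2R1 and "GAATTC" in P4R1)  # True: EcoRI site in both linkers
```

Both primers carry the EcoRI recognition sequence GAATTC in their 5' linker, and the 18 nucleotides following the linker in P2R1 (AAR GTN CAY CAR YTN TGG) correspond to six codons.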

II. Screen of a Butterfly cDNA Library with Mosquito PCR Product

Using the mosquito PCR product (SEQ ID NO:7) as a probe, a 3 day embryonic Precis coenia λgt10 cDNA library (generously provided by Sean Carroll) was screened. Filters were hybridized at 65 °C overnight in a solution containing 5x SSC, 10% dextran sulfate, 5x Denhardt's, 200 µg/ml sonicated salmon sperm DNA, and 0.5% SDS. Filters were washed in 0.1X SSC, 0.1% SDS at room temperature several times to remove nonspecific hybridization. Of the 100,000 plaques initially screened, 2 overlapping clones, L1 and L2, were isolated, which corresponded to the N terminus of butterfly PTC. Using L2 as a probe, the library filters were rescreened and 3 additional clones (L5, L7, L8) were isolated which encompassed the remainder of the ptc coding sequence. The full length sequence of butterfly ptc (SEQ ID NO:3) was determined by ABI automated sequencing.
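The hybridization and wash buffers above differ mainly in salt concentration, which is one determinant of stringency. The sketch below converts an SSC strength into molar concentrations, assuming the usual laboratory definition of 1x SSC as 0.15 M NaCl and 0.015 M sodium citrate; that definition is not stated in the specification, and the function name is hypothetical.

```python
# Assumed standard composition of 1x SSC (not given in the text).
NACL_M_PER_X = 0.15
CITRATE_M_PER_X = 0.015

def ssc_molarity(strength):
    """Return (NaCl, sodium citrate) molarity for a given SSC strength."""
    return strength * NACL_M_PER_X, strength * CITRATE_M_PER_X

for label, strength in [("hybridization, 5x SSC", 5.0), ("wash, 0.1x SSC", 0.1)]:
    nacl, citrate = ssc_molarity(strength)
    print(f"{label}: {nacl:.4f} M NaCl, {citrate:.4f} M citrate")
```

Under this assumption the wash buffer carries one fiftieth of the salt of the hybridization buffer, which is consistent with its use to remove nonspecific hybridization.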

III. Screen of a Tribolium (beetle) Genomic Library with Mosquito PCR Product and 900 bp Fragment from the Butterfly Clone

A λgem11 genomic library from Tribolium castaneum (gift of Rob Dennell) was probed with a mixture of the mosquito PCR product (SEQ ID NO:7) and a BstXI/EcoRI fragment of L2. Filters were hybridized at 55 °C overnight and washed as above. Of the 75,000 plaques screened, 14 clones were identified, and the SacI fragment of T8 (SEQ ID NO: ), which cross-hybridized with the mosquito and butterfly probes, was subcloned into pBluescript.

IV. PCR on Mouse cDNA Using Degenerate Primers Derived from Regions Conserved in the Four Insect Homologues

Two degenerate PCR primers (P4REV (SEQ ID NO:16): GGACGAATTCYTNGANTGYTTYTGGGA; P22 (SEQ ID NO:17): CATACCAGCCAAGCTTGTCIGGCCARTGCAT) were designed based on a comparison of PTC amino acid sequences from fly (Drosophila melanogaster) (SEQ ID NO:6), mosquito (Anopheles gambiae) (SEQ ID NO:8), butterfly (Precis coenia) (SEQ ID NO:4), and beetle (Tribolium castaneum) (SEQ ID NO:2). I represents inosine, which can form base pairs with all four nucleotides. P22 was used to reverse transcribe RNA from 12.5 dpc mouse limb bud (gift from David Kingsley) for 90 min at 37 °C. PCR using P4REV (SEQ ID NO:16) and P22 (SEQ ID NO:17) was then performed on 1 µl of the resultant cDNA under the following conditions:

94 °C 4 min.; 72 °C Add Taq;
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

PCR products of the expected size were subcloned into the TA vector (Invitrogen) and sequenced with the Sequenase Version 2.0 DNA Sequencing Kit.

Using the cloned mouse PCR fragment as a probe, 300,000 plaques of a mouse 8.5 dpc λgt10 cDNA library (a gift from Brigid Hogan) were screened as above and washed in 2x SSC, 0.1% SDS at room temperature. Seven clones were isolated, and three (M2, M4, and M8) were subcloned into pBluescript II.

200,000 plaques of this library were rescreened using first, a 1.1 kb EcoRI fragment from M2 to identify 6 clones (M9-M16) and secondly a mixed probe containing the most N terminal (XhoI fragment from M2) and most C terminal sequences (BamHI/BglII fragment from M9) to isolate 5 clones (M17-M21). M9, M10, M14, and M17-21 were subcloned into the EcoRI site of pBluescript II (Stratagene).
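The cycling program given for the mouse RT-PCR in this section can be written down as data and its programmed hold time added up. This is a rough sketch only: it ignores ramp times between temperatures, the manual "Add Taq" pause at 72 °C, and the final 4 °C hold, none of which have a stated duration. The function name is hypothetical.

```python
def programmed_seconds(initial, cycle_blocks, final):
    """Sum of the programmed hold times, in seconds.

    cycle_blocks is a list of (repeat_count, [step durations in seconds]).
    """
    cycling = sum(repeats * sum(steps) for repeats, steps in cycle_blocks)
    return initial + cycling + final

# 94 °C 4 min; [94 °C 15 s; 50 °C 30 s; 72 °C 90 s] x 35; 72 °C 10 min
mouse_rt_pcr = programmed_seconds(4 * 60, [(35, [15, 30, 90])], 10 * 60)
print(mouse_rt_pcr, mouse_rt_pcr / 60)  # 5565 92.75
```

The 35 cycles account for 4725 of the 5565 programmed seconds, so the run takes a little over an hour and a half before ramping is considered.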

V. RNA Blots and in situ Hybridizations in Whole and Sectioned Mouse Embryos

Northerns: A mouse embryonic Northern blot and an adult multiple tissue Northern blot (obtained from Clontech) were probed with a 900 bp EcoRI fragment from an N-terminal coding region of mouse ptc. Hybridization was performed at 65 °C in SSPE, 10x Denhardt's, 100 µg/ml sonicated salmon sperm DNA, and 2% SDS. After several short room temperature washes in 2x SSC, 0.05% SDS, the blots were washed at high stringency in 0.1X SSC, 0.1% SDS at

In situ hybridization of sections: 7.75, 8.5, 11.5, and 13.5 dpc mouse embryos were dissected in PBS and frozen in Tissue-Tek medium at -80 °C. 12-16 µm frozen sections were cut, collected onto VectaBond (Vector Laboratories) coated slides, and dried for 30-60 minutes at room temperature. After a 10 minute fixation in 4% paraformaldehyde in PBS, the slides were washed 3 times for 3 minutes in PBS, acetylated for 10 minutes in 0.25% acetic anhydride in triethanolamine, and washed three more times for minutes in PBS. Prehybridization (50% formamide, 5X SSC, 250 µg/ml yeast tRNA, 500 µg/ml sonicated salmon sperm DNA, and 5x Denhardt's) was carried out for 6 hours at room temperature in 50% formamide/5x SSC humidified chambers. The probe, which consisted of 1 kb from the N-terminus of ptc, was added at a concentration of 200-1000 ng/ml into the same solution used for prehybridization, and then denatured for five minutes at 80 °C. Approximately 30 µl of probe were added to each slide and covered with Parafilm. The slides were incubated overnight at 65 °C in the same humidified chamber used previously. The following day, the probe was washed successively in 5X SSC (5 minutes, 65 °C), 0.2X SSC (1 hour, 65 °C), and 0.2X SSC (10 minutes, room temperature). After five minutes in buffer B1 (0.1 M maleic acid, 0.15 M NaCl, pH ), the slides were blocked for 1 hour at room temperature in 1% blocking reagent (Boehringer-Mannheim) in buffer B1, and then incubated for 4 hours in buffer B1 containing the DIG-AP conjugated antibody (Boehringer-Mannheim) at a 1:5000 dilution. Excess antibody was removed during two 15 minute washes in buffer B1, followed by five minutes in buffer B3 (100 mM Tris, 100 mM NaCl, 5 mM MgCl2, pH ). The antibody was detected by adding an alkaline phosphatase substrate (350 µl of 75 mg/ml X-phosphate in DMF and 450 µl of 50 mg/ml NBT in 70% DMF in 100 ml of buffer B3) and allowing the reaction to proceed overnight in the dark. After a brief rinse in mM Tris, 1 mM EDTA, pH 8.0, the slides were mounted with Aquamount (Lerner Laboratories).
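The working concentrations in the alkaline phosphatase substrate solution follow from simple dilution arithmetic. The sketch below assumes that the garbled volume units in the source are microliters (350 µl and 450 µl of stock added to 100 ml of buffer B3); that reading, like the function name, is an assumption made for illustration.

```python
def final_mg_per_ml(stock_mg_per_ml, stock_ul, total_ml):
    """Final concentration after adding stock_ul of stock to a total volume."""
    return stock_mg_per_ml * (stock_ul / 1000.0) / total_ml

# Total volume: 100 ml buffer plus both stock additions (assumed to be in µl).
total_ml = 100.0 + 0.350 + 0.450

x_phosphate = final_mg_per_ml(75.0, 350.0, total_ml)  # 26.25 mg in 100.8 ml
nbt = final_mg_per_ml(50.0, 450.0, total_ml)          # 22.5 mg in 100.8 ml
print(round(x_phosphate, 3), round(nbt, 3))
```

Under these assumptions both reagents end up at roughly a quarter of a milligram per milliliter.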

VI. Drosophila 5'-transcriptional initiation region β-gal constructs.

A series of constructs were designed that link different regions of the ptc promoter from Drosophila to a LacZ reporter gene in order to study the cis regulation of the ptc expression pattern. See Fig. 1. A 10.8 kb BamHI/BspMI fragment comprising the 5'-non-coding region of the mRNA at its 3'-terminus was obtained and truncated by restriction enzyme digestion as shown in Fig. 1. These expression cassettes were introduced into Drosophila lines using a P-element vector (Thummel et al., Gene 74, 445-456 (1988)), which were injected into embryos, providing flies which could be grown to produce embryos. (See Spradling and Rubin, Science (1982) 218, 341-347 for a description of the procedure.) The vector used a pUC8 background into which was introduced the white gene to provide for yellow eyes, portions of the P-element for integration, and the constructs were inserted into a polylinker upstream from the LacZ gene. The resulting embryos were stained using antibodies to LacZ protein conjugated to HRP and the embryos developed with OPD dye to identify the expression of the LacZ gene. The staining pattern is described in Fig. 1, indicating whether there was staining during the early and late development of the embryo.

VII. Isolation of a Mouse ptc Gene

Homologues of fly PTC (SEQ ID NO:6) were isolated from three insects: mosquito, butterfly and beetle, using either PCR or low stringency library screens.

PCR primers to six amino acid stretches of PTC of low mutability and degeneracy were designed. One primer pair, P2 and P4, amplified a homologous fragment of ptc from mosquito genomic DNA that corresponded to the first hydrophilic loop of the protein. The 345 bp PCR product (SEQ ID NO:7) was subcloned and sequenced and, when aligned to fly PTC, showed 67% amino acid identity.

The cloned mosquito fragment was used to screen a butterfly λgt10 cDNA library. Of 100,000 plaques screened, five overlapping clones were isolated and used to obtain the full length coding sequence. The butterfly PTC homologue (SEQ ID NO:4) is 1,311 amino acids long and overall has 50% amino acid identity (72% similarity) to fly PTC. With the exception of a divergent C-terminus, this homology is evenly spread across the coding sequence. The mosquito PCR clone (SEQ ID NO:7) and a corresponding fragment of butterfly cDNA were used to screen a beetle λgem11 genomic library. Of the plaques screened, 14 clones were identified. A fragment of one clone, which hybridized with the original probes, was subcloned and sequenced. This 3 kb piece contains an 89 amino acid exon (SEQ ID NO:2) which is 44% and 51% identical to the corresponding regions of fly and butterfly PTC respectively.

Using an alignment of the four insect homologues in the first hydrophilic loop of the PTC, two PCR primers were designed to a five and a six amino acid stretch which were identical and of low degeneracy. These primers were used to isolate the mouse homologue using RT-PCR on embryonic limb bud RNA. An appropriately sized band was amplified and, upon cloning and sequencing, it was found to encode a protein 65% identical to fly PTC. Using the cloned PCR product and, subsequently, fragments of mouse ptc cDNA, a mouse embryonic λ cDNA library was screened.

From about 300,000 plaques, 17 clones were identified and of these, 7 form overlapping cDNAs which comprise most of the protein-coding sequence (SEQ ID NO:9).

WO 96/11260 PCT/US95/13233

VIIa. Developmental and Tissue Distribution of Mouse PTC RNA

In both the embryonic and adult Northern blots, the ptc probe detects a single 8 kb message. Further exposure does not reveal any additional minor bands.

Developmentally, ptc mRNA is present at low levels as early as 7 dpc and becomes quite abundant by 11 and 15 dpc. While the message is still present at 17 dpc, the Northern blot indicates a clear decrease in its amount at this stage. In the adult, ptc RNA is present in high amounts in the brain and lung, as well as in moderate amounts in the kidney and liver. Weak signals are detected in heart, spleen, skeletal muscle, and testes.

VIIb. In situ Hybridization of Mouse PTC in Whole and Section Embryos

Northern analysis indicates that ptc mRNA is present at 7 dpc, while there is no detectable signal in sections from 7.75 dpc embryos. This discrepancy is explained by the low level of transcription. In contrast, ptc is present at high levels along the neural axis of 8.5 dpc embryos. By 11.5 dpc, ptc can be detected in the developing lung buds and gut, consistent with its adult Northern profile. In addition, the gene is expressed at high levels in the ventricular zone of the central nervous system, as well as in the zona limitans of the prosencephalon. ptc is also strongly transcribed in the condensing cartilage of 11.5 and 13.5 dpc limb buds, as well as in the ventral portion of the somites, a region which is prospective sclerotome and eventually forms bone in the vertebral column. ptc is present in a wide range of tissues of endodermal, mesodermal and ectodermal origin, supporting its fundamental role in embryonic development.

VIII. Isolation of the Human ptc Gene

To isolate human ptc (hptc), 2 x 10' plaques from a human lung cDNA library (HL3022a, Clonetech) were screened with a 1 kbp mouse ptc fragment, M2-2. Filters were hybridized overnight at reduced stringency (60 °C in 5X SSC, dextran sulfate, 5X Denhardt's, 0.2 mg/ml sonicated salmon sperm DNA, and SDS). Two positive plaques (H1 and H2) were isolated, the inserts cloned into pBluescript, and upon sequencing, both contained sequence highly similar to the mouse ptc homolog. To isolate the 5' end, an additional 6 x 10 plaques were screened in duplicate with M2-3 EcoR I and M2-3 Xho I (containing 5' untranslated sequence of mouse ptc) probes. Ten plaques were purified and of these, 6 inserts were subcloned into pBluescript. To obtain the full coding sequence, H2 was fully sequenced and H14, H20, and H21 were partially sequenced. The 5.1 kbp of human ptc sequence (SEQ ID NO:18) contains an open reading frame of 1447 amino acids (SEQ ID NO:19) that is 96% identical and 98% similar to mouse ptc. The 5' and 3' untranslated sequences of human ptc (SEQ ID NO:18) are also highly similar to mouse ptc (SEQ ID NO:9), suggesting conserved regulatory sequence.
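The relationship between the length of the open reading frame and the amount of untranslated sequence in the human cDNA can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the total of 5100 bp is an assumed round figure standing in for the "5.1 kbp" stated above, so the untranslated remainder is approximate.

```python
def coding_nucleotides(n_residues, include_stop=True):
    """Nucleotides needed to encode `n_residues` amino acids,
    optionally counting one stop codon."""
    return 3 * (n_residues + (1 if include_stop else 0))


def noncoding_nucleotides(total_bp, n_residues):
    """Nucleotides left over for 5' and 3' untranslated sequence."""
    return total_bp - coding_nucleotides(n_residues)


if __name__ == "__main__":
    orf = coding_nucleotides(1447)            # 1447-residue open reading frame
    utr = noncoding_nucleotides(5100, 1447)   # 5100 bp is an assumed rounding of 5.1 kbp
    print(orf, utr)
```

Under these assumptions, the 1447-residue reading frame occupies roughly 4.3 kb of the cloned sequence, leaving on the order of several hundred base pairs of 5' and 3' untranslated sequence.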

IX. Comparison of Mouse, Human, Fly and Butterfly Sequences

The deduced mouse PTC protein sequence (SEQ ID NO:10) has about 38% identical amino acids to fly PTC over about 1,200 amino acids. This amount of conservation is dispersed through much of the protein excepting the C-terminal region. The mouse protein also has a 50 amino acid insert relative to the fly protein. Based on the sequence conservation of PTC and the functional conservation of hedgehog between fly and mouse, one concludes that ptc functions similarly in the two organisms. A comparison of the amino acid sequences of mouse (mptc) (SEQ ID NO:10), human (hptc) (SEQ ID NO:19), butterfly (bptc) (SEQ ID NO:4) and Drosophila (ptc) (SEQ ID NO:6) is shown in Table 1.

TABLE 1: Alignment of human, mouse, fly, and butterfly PTC homologs

HPTC  MASAGNAAEPQDR--GGGGSGCIGAPGRPAGGGRRRRTGGLRRAAAPDRDYLHRPSYCDA
MPTC  MASAGNAA----------------GALGRQAGGGRRRRTGGPHRA-APDRDYLHRPSYCDA
PTC   M-----DRDSLPRVPDTHGD--VVDE---------KLFSDL---------YI-RTSWVDA
BPTC  MVAPDSEAPSNPRITAAHESPCATEA---------RHSADL---------YI-RTSWVDA

HPTC  AFALEQISKGKATGRKAPLWLRAKFQRLLFKLGCYIQKNCGKFLVVGLLIFGAFAVGLKA
MPTC  AFALEQISKGKATGRKAPLWLRAKFQRLLFKLGCYIQKNCGKFLVVGLLIFGAFAVGLKA
PTC   QVALDQIDKGKARGSRTAIYLRSVFQSHLETLGSSVQKHAGKVLFVAILVLSTFCVGLKS
BPTC  ALALSELEKGNIEGGRTSLWIRAWLQEQLFILGCFLQGDAGKVLFVAILVLSTFCVGLKS

HPTC  ANLETNVEELWVEVGGRVSRELNYTRQKIGEEAMFNPQLMIQTPKEEGANVLTTEALLQH
MPTC  ANLETNVEELWVEVGGRVSRELNYTRQKIGEEAMFNPQLMIQTPKEEGANVLTTEALLQH
PTC   AQIHSKVHQLWIQEGGRLEAELAYTQKTIGEDESATHQLLIQTTHDPNASVLHPQALLAH
BPTC  AQIHTRVDQLWVQEGGRLEAELKYTAQALGEADSSTHQLVIQTAKDPDVSLLHPGALLEH

HPTC  LDSALQASRVHVYMYNRQWKLEHLCYKSGELITET-GYMDQIIEYLYPCLIITPLDCFWE


LDSALQASRVHVYMYl4RQWKLEHLCYKSGELITET-GYMDQI IEYLY15CLI ITPLDCFWE LEVLVKATAVKVHLYDTEWGLRDMCNMPSTPS FEGIYYIEQILRHLI PCSI IT-PLDCFWE LKVVHAATRVTVHMYDIEWRLKDLCYSPSI PDFEGYHHIES IIDNVI PCAI ITPLDCFWE GALSTYLGPL WTFPEFEL GAKLQS GTAYLLGKPPLR- WTNFD PLE FLEE LK KINYQVDSWEEMLNKAEV GSQLL-GPESAVVI PGLNQRLLWTTLNPASVMQYMKQKMSEEKI SFDFETVEQYMKRAAI GSKLL-GPDYPIYVPHLKHKLQW'rHLNPLEVVEEVK-KL KFQFPLSTIEAYMKRAGI

*MLLLNGCGLRKYHWEEIVGT

GHGYMDRPCLNPADPDCPATAPNKNSTKPLDMALVLNGGCHGLSRKYMHWQEE LIVGGTV GSGYMEKPCLNPLNPNCPDTAPNKNSTQPPDVGAI LSGGCY GYAAXHMHWPEE LIVGGRK TSAYMKKPCLDPTDPHCPATAPNKKSGHI PDVAAELSHGCYGFAAAYMHWPEQLIVGGAT KN*T *TM*Q *FKG*EV*H*NNE*K KNSTGKLVSAHALQTMFQLMTPKQMYEHFKGYEYVSHINWNEDKAAAI LEAWQRTYVEVV

RNRSGHLRKAQALQSVVQLMTEKEMYDQWQDNYKVHHLGWTQEKAAEVLNAWQRNFSREV

RNSTSALRSARALQTVVQLMGEREMYEYWADHYKVHQI GWNQEKAAAVLDAWQRKFAAEV *QVQSQ LFTTLD ****MLRW- HQSVAQNSTQK VLFTTTTLDDILKSFSDVSVIRVASGYLLMLAYACLTMLRW-DC EQLLRKQSRIATNYDIYVFSSAALDDI LAKFSHPSALSIVIGVAVTVLYAFCTLLRWRDP

RKI-TTSGSVSSAYSFYPFSTSTLNDILGKFSEVSLKIILGYMFMLIYVAVTLIQWRDP

SKQGVGAGLLALVAGILC IG S* VDVFLAH SKSQGAVGLAGVLLVALSVAAGLGLCS LI GI SFNAATTQVLPFLALGVGVDDVFLLAHAF VRGQSSVGVAGVLLM4CFSTAAGLGLSALLGIVFNAASTQVVPFLALGLGVDHI FMLTAAY I RSQAGVGIAGVLLLS ITVAAGLGFCALLGI P FNASSTQIVP FLALGLGVQDMFLLTHTY *PI *FSLAAVVV SETGQNKRI PFEDRTGECLKRTGASVALTS ISNVTAFFMAALI P1PALRAFS LQAAVVVV AESN RREQTKLI LKKVGPS I LFSACS TAGS FFAAAFI PVPALKVFCLQAAIVMC VEQAGD--VPREERTGLVLKKSGLSVLLASLCNVMAFLAAALLPI PAFRVFCLQAAI LLL *NAVLFALSDYRDRD ****RYS FNFAMVLLI FPAI LSMDLYRREDRRLDI FCC FTS PCVSRVI QVE PQAYTDTPHDNTRYSP P SNLAAALLVFPPAMI SLDLRRRTAGRADI FCCCF- PVWKEQPKVAPPVLPLNNNNGR FNLGSILLVFPAMISLDLRRRSAARADLLCCLM-P LPKKKI PER PPYSHSAHT*** VQRT*DPT *YTA *SE*VQV**QD *SQ* PPYSSHSFAHETQITMQSTVQLRTEYDPHTHVYYTTAEPRSEISVQPVTVTQDT LSCQSP

GARHPKSCNNNRVPLPAQNPLLEQRA

AKTRKNDKTHRID-TTRQPLDPDVS

ESTSSTRDLLSQFSDSSLHCLEPPCTKWTLSS FAEKHYAPFLLKPKAKVVVI FLFLGLLG ESTSSTRDLLSQFSDSSLHCLEPPCTKWTLSS FAEKHYAPFLLKPKAKVVVI LLFLGLLG HSLASF SLAT FAFQHYT PFLMRSWVKFLTVMGFLAALI CCL-SV---- -SLTKWAKNQYAPFIMRPAVKVTSMLALIAVI L VSYGTRRDLDTDVPETEYFI* *YSFNM* **-DPNQH*Y VSLYGTTRVRDGLDLTDIVPRETREYDFIAPQFCY FSFYNMY IVTQKA- DY PNIQHLLYD SSLYASTRLQDGLDI IDLVPKDSNEHKFLDAQTRLFGFYSMYAvTQGNFEY PTQQQLLRD TSVWGP.TKVKDGLDLTDIVPENTDEHE FLSRQEKYFGFYNMYAVTQGNFEY PTNQKLLYE a


LHRS FSNVKYVMLEER4KQLPKMWLHY FRDWLQGLQDAFDSDWETGKIMPNN-YKNGSDDG LHKS FSNVKYVMLEENKQLPQMWLHY FRDWLQGLQDAFDSDWETGRIMPNN-YKNGSDDG YHDS FVRVPHVIKNDNGGLPDFWLLLFSEWLGNLQKI FDEEYRDGRLTKECWFPNASSDA YHDQFVRIPNI IKNDNGGLTKFWLSLFRDWLLDLQVAFDKEVASGCITQEYWCKNASDEG VLAK*L *GR *PIDS *ADGIN* AFYYLAWV* *AYAS* VLAYKLLVQTGSRDKPIDISQLTK-QRLVDADGI INPSAFYIYLTAWVSN'DPVAYAASQA ILAYKLIVQTGHVDNPVDKELVLT-NRLVNSDGI INQRAFYNYLSAWATNDVFAYGASQG I LAYKLMVQTGHVDNPIDKS LITAGHRLVDKDGI INPKAFYNYLSAWATNDAAAYGASQG

***EKVRTIC

NIRPHRPEWVHDKADYMPETRLRI PAAE PIEYAQFPFYLNGLRDTSDFVEAIEKVRTI Cs KLYPEPRQYFHQPNEY---DLKI PKSLPLVYAQMPFYLHGLTDTSQIKTLIGHIRDLSV NLKPQPQRWIHSPEDV HLEIKKSS PLIYTQLPFYLSGLSDTDSIKTLIRSVRDLCL

*CTFVCVFLNPWAGIVM

NYTSLGLSSYPNGYPFLFWEQYISLRHWLLLFISVVLACTFLVCAVFLLNPWTAGIIVMV

KYEGFGLPNY PSGI PFI FWEQYMTLRSS LANI LACVLLAALVLVSLLLLSVWAAVLVI LS KYEAKGLPNFPSGI PFLFWEQYLYLRTS LLIALACALGAVFIAVM4VLLLNAWAAVLVTLA LALMTVELFGMMGLI GIKLSAVPVVI LIASVGI GVE FTVHVALAFLTAI GDKNRRAVLAL LALMTVELFGMMGLI GIKLSAVPVVI LIASVGI GVE FTVHVAIAFLTAI GDKNHRP.MLAL VLASLAQI FGAMTLLGIKLSAI PAVI LI LSVGMMLCFNVLI SLGFMTSVGNRQRRVQLSM LATLVLQLLGVMALLGVKLSAMPPVLLVLAI GRGVHFTVHLCLGFVTS IGCKRRRAS LAL EHMFAVL *ST* *GVLN *LLPLL*F* EHMFAPVLDGAVSTLLGVLMLAGSE FDFI VRY FFAVIAI LTVILGVLNGLVLLPVLLS FFG QMSLGPLVHGMLTS GVAVFMLSTS PFE FVIRH FCWLLLVVLCVGACNS LLVFPI LLSMVG ESLPHU MASFGFVARLFLRLLLALVFLGLIDGLLFFPIVLS

ILG

PYPEVS PANGLNRLPTPSPEPPPSVVRFAMPPGHTHSGSDSSDSEYSSQTTVSGLSE-EL PCPEVS PANGLNRLPTPSPEPPPSVVRFAVPPGHTNNGSDSSDSEYSSQTTVSGISE-EL PEAELVPLEHPDRISTPS PLPVRSSKRSGKSYVVQGSRSSRGSCQKSHHHHHKDLNDPSL PAAEVRPIEHPERLSTPSPKCSPIHPRKSSSSSGGGDKSSRTS--KSAPRPC

APSL

*VVPERHPPNPQQHLSGLPGR

RHYEAQQGAGGPAHQVIVEATENPVFAHSTVVH PDSRHHPPSNPRQQPHLDSGSLSPGRQ TTITEE PQSWKSSNSS IQMPHDWTYQPREQ--R PASYAAPP PAYHKAAAQQHHQHQGPPT TTITEEPSSWHSSAHSVQSSMQS IVVQPEVVVETTTYNGSDSASGRSTPTKS SHGGAITT GQQPRRDPPREGLWPPLYRPRRDAFEI STEGHSGPSNRARWGPRGARSHNPRNPASTIMG GQQPRRDPPREGLRP PPYRPRRDAFEI STEGHSGPSNRDRSGPRGARSHNPRNPTSTAMG YPPELQSIVVQPEVTVETTHS------------ DS TKVTATANI KVEVVT PSDRKSRRSYHYYDRRRDRDEDRDRDRERDRDRDRDRDRDRDRDR SSVPGYCQPITTVTASASVTVAVHPPPVPGPGRNPRGGLCPGY-

PETDHGLFEDPHVP

SSVPSYCQPITTVTASASVTVAVHPP-- PGPGRNPRGGPCPGYESYPETDHGVFEDPHVP NT TKVTATPINIKVELAMP--GRAVRS DR DRERSRERDRRDRYRD--ERDHRA SPRENGRDSGHE FHVCERDSKEVILQDECERP *SS FHVRCERRDSKVEVI ELQDVECEERPRGSSSN

SDSSRH


The identity of ten other clones recovered from the mouse library has not been determined. These cDNAs cross-hybridize with mouse ptc sequence, while differing as to their restriction maps. These genes encode a family of proteins related to the patched protein. Alignment of the human and mouse nucleotide sequences, which includes coding and noncoding sequence, reveals 89% identity.
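Percent identity figures of the kind quoted in this specification can be computed from two rows of an alignment such as Table 1. The following is a minimal sketch under one common convention, in which columns containing a gap in either row are excluded from the comparison; the specification does not state which convention was used, and the function name is hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of aligned (non-gap) columns in which two alignment rows
    carry the same residue or base."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap in either row are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from the sequence listing.
    print(percent_identity("ACGT-A", "ACCTTA"))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length including gaps, and would give somewhat different values for the same pair of rows.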

In accordance with the subject invention, mammalian patched genes, including the mouse and human genes, are provided which allow for high level production of the patched protein, which can serve many purposes. The patched protein may be used in screening for agonists and antagonists, for isolation of its ligand, particularly hedgehog, more particularly Sonic hedgehog, and for assaying for the transcription of ptc mRNA. The protein or fragments thereof may be used to produce antibodies specific for the protein or specific epitopes of the protein. In addition, the gene may be employed for investigating embryonic development, by screening fetal tissue, preparing transgenic animals to serve as models, and the like.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

SEQUENCE LISTING

GENERAL INFORMATION:

APPLICANT: THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY

(ii) TITLE OF INVENTION: Patched Genes and their Use

(iii) NUMBER OF SEQUENCES: 19

(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert
STREET: Four Embarcadero Center, Suite 3400
CITY: San Francisco
STATE: CA
COUNTRY: US
ZIP: 94111

COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER:
FILING DATE: 06-OCT-1995
CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
NAME: Rowland, Bertram I
REGISTRATION NUMBER: 20015
REFERENCE/DOCKET NUMBER: a60190-1

(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: 415-781-1989
TELEFAX: 415-398-3249

INFORMATION FOR SEQ ID NO:1:

SEQUENCE CHARACTERISTICS:
LENGTH: 736 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AACNNCNNTN NATGGCACCC CCNCCCAACC TTTNNNCCNN NTAANCAAAA NNCCCCNTTT

NATACCCCCT

TTTTTNAACC

AAAATTNANA

CATGCACTGG

AAACAAGCCN

ATTCTGGTCT

CGTACTGAAC

TGAATGGTGG

CGTCGAATTA

CGTCGAANAC

GGGTGGCCCA

NTAANANTTT

CCCCCCACCC

NAATTGGTCC

CCCGAACACT

AAAGCTTTAC

GGACATTACA

GCCTGGCAGA

TAATTTTTGG

CATCTTCGTG

GGACGTGGTG

GTCGGGGCTG

TCCACCNNNC

GGAATTCCNA

TAACCTAACC

TGATCGTTGC

AAACTGTTGT

AAGTGCACCA

AGAAGTTCGC

TTGTTCCAGG

ACGTTCTCCA

AAGCTGGGGG

GCTGCCTTGG

NNAAANNCCN

NTNNCCNCCC

NATNGTTGTT

CGTTCCAATA

ACAATTAATG

CATCGGATGG

ACAGGTTGGT

AGGTGGATCG

CCGCCAATTT

TGGTGCTGGG

GAGTGCTGGT

CTGNANACNA

CCAAATTACA

ACGGTTTCCC

AGAATAAATC

GGCGAACACG

AACCAGGAGA

GGTTGGCGCA

TCTGACGAAG

GAACAAGATG

GGTGGCGGCG

CTTNGCGNGC

NGNAAhNCCN

ACTCCAGNCC

CCCCCAAATA

TGGTCATATT

AACTGTTCGA

AGGCCACAAC

AGGACTAGAG

AGCAAGAAGT

TTGAAGGAGG

GTGTACGGGT

TNCNATTCGC

120 180 240 300 360 420 480 540 600 660 720 736 CCTATAGTNA GNCGTA INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 107 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein 9.

(xi) Xaa 1 SEQUENCE DESCRIPTION: SEQ ID NO:2: Pro Pro Pro Asn Tyr Asn Ser Xaa Pro 5 10 Lys Xaa Xaa Xaa Leu Val Leu Thr Pro Xaa Val Val Thr Val Pro Glu His Leu Ile Val Ala Val 35 Leu Asn Lys Pro Lys Ala Leu Gln Ser 25 Pro Pro Pro Lys Tyr Met His Trp Leu Val Ile Ile Arg Ile 9*99** 9* 9 *9999 Thr Val Val Leu Met Gly Glu His 65 Glu Leu Phe Glu Phe Trp Ser Gly His Tyr 70 75 Val His His I le Lys Gly Trp Asn Gln Glu Lys Ala Thr Thr Val 90 Lys Phe Ala Gln Val Gly Gly Trp Arg Lys Leu Asn Ala Trp Gln Glu 100 105 INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 5187 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: GGGTCTGTCA CCCGGAGCCG GAGTCCCCGG CGGCCAGCAG

CCAGGCGCGC

GGGGCCCTGG

GCCGCGCCGG

GAGCAGATTT

TTTCAGAGAC

GTTGTGGGTC

ACCAACGTGG

ACCCGTCAGA

AAAGAAGAAG

CTCCAGGCCA

TGCTACAAAT

CTTTACCCTT

TCCGGGACAG

GAATTCCTAG

AATAAAGCCG

GATTGCCCTG

TTGAATGGTG

GTGGGTGGTA

ATGTTCCAGT

TCTCACATCA

CCGGAGCCCG

GCAGGCAGGC

ACCGGGACTA

CCAAGGGGAA

TCTTATTTAA

TCCTCATATT

AGGAGCTGTG

AGATAGGAGA

GCGCTAATGT

GTCGTGTGCA

CAGGGGAACT

GCTTAATCAT

CATACCTCCT

AAGAGTTAAA

AAGTTGGCCA

CCACAGCCCC

GATGTCAAGG

CCGTCAAGAA

TAATGACTCC

ACTGGAATGA

CGGCGGCGGC

CGGCGGCGGG

TCTGCACCGG

GGCTACTGGC

ACTGGGTTGT

TGGGGCCTTC

GGTGGAAGTT

AGAGGCTATG

TCTGACCACA

CGTCTACATG

TATCACGGAG

TACACCTTTG

AGGTAAGCCT

GAAAATAAAC

TGGGTACATG

TAACAAAAAT

TTTATCCAGG

TGCCACTGGA

CAAGCAAATG

AGACAGGGCA

GGCAACATGG

AGGCGCAGAC

CCCAGCTACT

CGGAAAGCGC

TACATTCAAA

GCTGTGGGAT

GGTGGACGAG

TTTAATCCTC

GAGGCTCTCC

TATAACAGGC

ACAGGTTACA

GACTGCTTCT

CCTTTACGGT

TACCAAGTGG

GACCGGCCTT

TCAACCAAAC

AAGTATATGC

AAACTTGTCA

TATGAACACT

GCCGCCATCC

CGTCCTCGCG

CCTCGGCTGG

GGACCGGGGG

GCGACGCCGC

CGCTGTGGCT

AGAACTGCGG

TAAAGGCAGC

TGAGTCGAGA

AACTCATGAT

TGCAACACCT

AATGGAAGTT

TGGATCAGAT

GGGAAGGGGC

GGACAAACTT

ACAGCTGGGA

GCCTCAACCC

CTCTTGATGT

ATTGGCAGGA

GCGCTCACGC

TCAGGGGCTA

TGGAGGCCTG

AGCCGAGCGC

TAACGCCGCC

ACCGCACCGC

CTTCGCTCTG

GAGAGCGAAG

CAAGTTTTTG

TAATCTCGAG

ATTAAATTAT

ACAGACTCCA

GGACTCAGCA

GGAACATTTG

AATAGAATAC

AAAGCTACAG

TGACCCCTTG

GGAAATGCTG

AGCCGACCCA

GGCCCTTGTT

GGAGTTGATT

CCTGCAAACC

CGACTATGTC

GCAGAGGACT

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260

TACGTGGAGG

ACAACCACGA

GCCAGCGGCT

TCCAAGTCCC

GCAGGATTGG

TTGCCGTTTC

AGTGAAACAG

CGCACCGGAG

GCATTGATCC

TTCAATTTTG

CGTGAGGACA

ATTCAAGTTG

CCCCCATACA

CAGCTCCGCA

TCTGAGATCT

GAGAGCACCA

CTCGAGCCCC

TTCCTCCTGA

GTCAGCCTTT

CGGGAAACCA

ATGTATATAG

CATAAGAGTT

ATGTGGCTGC

TGGGAAACTG

GCTTACAAAC

ACTAAACAGC

CTGACCGCTT

CCTCACCGGC

ATCCCAGCAG

TGGTTCATCA

CCCTGGACGA

ACCTACTGAT

AGGGTGCCGT

GCCTCTGCTC

TTGCTCTTGG

GACAGAATAA

CCAGCGTGGC

CTATCCCTGC

CTATGGTTCT

GAAGATTGGA

AGCCACAGGC

CCAGCCACAG

CAGAGTATGA

CTGTACAGCC

GCTCTACCAG

CCTGCACCAA

AACCCAAAGC

ATGGGACCAC

GAGAATATGA

TCACCCAGAA

TCAGCAATGT

ACTACTTTAG

GGAGGATCAT

TCCTGGTGCA

GTCTGGTAGA

GGGTCAGCAA

CGGAGTGGGT

CAGAGCCCAT

AAGTGTCGCC

CATCCTAAAA

GCTTGCCTAT

GGGGCTGGCT

CTTGATTGGC

TGTTGGTGTG

GAGGATTCCA

CCTCACCTCC

CCTGCGAGCG

GCTCATTTTT

TATTTTCTGC

CTACACAGAG

CTTCGCCCAC

CCCTCACACG

TGTTACCGTC

GGACCTGCTC

GTGGACACTC

CAAGGTTGTG

CCGAGTGAGA

CTTCATAGCT

AGCAGACTAC

GAAGTATGTC

AGACTGGCTT

GCCAAACAAT

GACTGGCAGC

CGCAGATGGC

CGACCCTGTA

CCATGACAALA

CGAGTACGCT

CCAAACTCCA

TCCTTCTCTG

GCCTGTTTAA

GGCGTCCTGT

ATTTCTTTTA

GATGATGTCT

TTTGAGGACA

ATCAGCAATG

TTCTCCCTCC

CCTGCAATTC

TGTTTCACAA

CCTCACAGTA

GAAACCCATA

CACGTGTACT

ACCCAGGACA

TCCCAGTTCT

TCTTCGTTTG

GTAATCCTTC

GACGGGCTGG

GCCCAGTTCA

CCGAATATCC

ATGCTGGAGG

CAAGGACTTC

TATAAAAATG

CGAGACAAGC

ATCATTAATC

GCTTACGCTG

GCCGACTACA

CAGTTCCCTT

CTCAAAAGGT

ATGTCAGTGT

CCATGCTGCG

TGGTTGCGCT

ATGCTGCGAC

TCCTCCTGGC

GGACTGGGGA

TCACCGCCTT

AGGCTGCTGT

TCAGCATGGA

GCCCCTGTGT

ACACCCGGTA

TCACTATGCA

ACACCACCGC

ACCTCAGCTG

CAGACTCCAG

CAGAGAAGCA

TTTTCCTGGG

ACCTCACGGA

AGTACTTCTC

AGCACCTACT

AGAACAAGCA

AGGATGCATT

GATCAGATGA

CCATCGACAT

CGAGCGCTTT

CCTCCCAGGC

TGCCAGAGAC

TCTACCTCAA

GCTTCCCTTC

CATCCGAGTG

CTGGGACTGC

GTCAGTGGCT

AACTCAGGTT

CCATGCATTC

GTGCCTCAAG

CTTCATGGCC

GGTGGTGGTA

TTTATACAGA

CACCAGGGTG

CAGCCCCCCA

GTCCACCGTT

CGAGCCACGC

TCAGAGTCCC

CCTCCACTGC

CTATGCTCCT

CTTGCTGGGG

CATTGTTCCC

TTTCTACAAC

TTACGACCTT

ACTTCCCCAA

TGACAGTGAC

CGGGGTCCTC

TAGTCAGTTG

CTACATCTAC

CAACATCCGG

CAGGCTGAGA

CGGCCTACGA

1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 GACACCTCAG ACTTTGTGGA AGCCATAGAA AAAGTGAGAG TCATCTGTAA CAACTATACG

AGCCTGGGAC

AGCCTGCGCC

TGCGCAGTCT

ATGACCGTTG

GTGGTCATCC

GCCTTTCTGA

TTTGCTCCCG

TCCGAATTTG

GGGGTTCTCA

GAGGTGTCTC

AGTGTCGTCC

TCGGAGTACA

GCACAGCAGG

GTCTTTGCCC

CGGCAACAGC

CGAAGGGATC

TTTGAAATTT

GGGGCCCGTT

AGCTACTGCC

CCCCCGCCTG

CCTGAGACTG

AGGAGGGACT

TGGGGGAGCA

AAGCCCCGCC

AARAGGTGTA

CCACTCCTGC

TGTGCCACAA

TGTCCAGCTA

ACTGGCTGCT

TCCTCCTGAA

AGCTCTTTGG

TGATTGCATC

CAGCCATTGG

TTCTGGACGG

ATTTCATTGT

ATGGACTGGT

CAGCCAATGG

GGTTTGCCGT

GCTCTCAGAC

GTGCCGGAGG

GGTCCACTGT

CCCACCTGGA

CCCCTAGAGA

CTACTGAAGG

CTCACAACCC

AGCCCATCAC

GACCTGGGCG

ATCACGGGGT

CAAAGGTGGAI

GCTCCAACTG

CCCACCTCTT

TGTTACTGTA

CACATGTAAT

CCCAGAGTGG

CCAAGCTTAA

CCCCAATGGC

GCTATCCATC

CCCCTGGACG

CATGATGGGC

TGTTGGCATC

GGACAAGAAC

TGCTGTGTCC

CAGATACTTC

TCTGCTGCCT

CCTAAACCGA

GCCTCCTGGT

CACGGTGTCT

CCCTGCCCAC

GGTCCATCCG

CTCTGGCTCC

AGGCTTGCGG

GCATTCTGGC

TCGGAACCCA

CACTGTGACG

CAACCCCCGA

ATTTGAGGAT

GGTCATAGAG

AGGGTAATTA

TCCAGAACTG

ACTGATTGTA

ATACATGGAA

GGAGACCACA

CTTAGTTTTA

TACCCCTTCC

AGCGTGGTGC

GCCGGGATCA

CTCATTGGGA

GGAGTGGAGT

CACAGGGCTA

ACTCTGCTGG

TTTGCCGTCC

GTCCTCTTAT

CTGCCCACTC

CACACGAACA

GGCATCAGTG

CAAGTGATTG

GACTCCAGAC

TTGTCCCCTG

CCACCCCCCT

CCTAGCAATA

ACGTCCACCG

GCTTCTGCTT

GGGGGGCCCT

CCTCATGTGC

CTACAGGACG

AAATCTGAAG

CTTGAAGAGA

TTATTKKGTG

ATGCTGTACA

GGGGCCCTTT

AAAAAAATCT

TGTTCTGGGA

TGGCCTGCAC

TTGTCATGGT

TCAAGCTGAG

TCACCGTCCA

TGCTCGCTCT

GTGTACTGAT

TGGCCATTCT

CCTTCTTTGG

CTTCGCCTGA

ATGGGTCTGA

AGGAGCTCAG

TGGAAGCCAC

ATCAGCCTCC

GACGGCAAGG

ACAGACCGCG

GGGACCGCTC

CCATGGGCAG

CGGTGACTGT

GTCCAGGCTA

CTTTTCATGT

TGGAATGTGA

CAAAGAGGCC

ACTGCTTGGA

AAATATTTCT

GTCTATTTCC

CCCCTGTGTA

CCCAGCATAT

GCAATACATC

GTTTCTAGTG

CCTGGCTCTG

TGCTGTGCCT

CGTGGCTTTG

GGAACACATG

GCTTGCAGGG

CACCGTCTTG

ACCGTGTCCT

GCCGCCTCCA

TTCCTCCGAC

GCAATACGAA

AGAAAACCCT

CTTGACCCCT

CCAGCAGCCT

CAGAGACGCT

AGGGCCCCGT

CTCTGTGCCC

TGCTGTGCAT

TGAGAGCTAC

CAGGTGTGAG

GGAGAGGCCG

AAAGATTGGA

ATTATGGGAA

ATAAATATTT

TGGGGCCTCT

CATTGGTCTC

GTCGCTGCTG

3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 CTTAAATATT GTATAATTTA CTTGTATAAT TCTATGCAAA TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG ATGAAGAAAA CAGGTTAATC CCAGTGGCTT CTCTAGGGGT GGTGGATGTG TGTGTGCATG TGACTTTCCA ATGTACTGTA TGCTGTTGTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA GTGGGCTGGG AAGGTCCAGG TCTTTTTCTG TCGTGATGCT CATCTGTCCT ATTCTCTGGG ACTATTC INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 1311 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein

TATTGCTTAT

TATCACAACC

TAATTGTTTA

AGTTGTATAT

TTGTGGTTTG

TGATCTTAGC

GGTGGAAAGG

GTAATACGAT

CTGTGGTAGG

ACGAGCAGAC

GGTTCGCATG

TTGTTGTTGT

TCTGGCCTAG

TGACCCCAAT

4800 4860 4920 4980 5040 5100 5160 5187 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

S

SS

5S5555

S

5* 5 S5

S.

Met 1 Ala Tyr Val Ala Pro His Glu Ser Ile Arg Thr Asp 5 Ser Glu Ala Pro Asn Pro Arg Ile Thr Ala Pro Cys Ala Thr G lu 25 Ala Arg His Ser Ser Trp Val Glu Lys Gly Asp 40 Gly Ala Leu Ala Leu Trp Ala Asp Leu Ser Glu Leu Ile Arg Ala Asn Ile Glu Arg Thr Ser Trp Ala Leu Gln Glu Gln Leu 70 Phe Phe Ile Leu Gly Cys 75 Phe Leu Gln Gly Asp Gly Lys Val Val Ala Ile Leu Thr Val Leu Ser Thr Phe Cys Val Gly Leu Val Gln Glu 115 Lys 100 Ala Gln Ile Arg Val Asp Gln Leu Trp 110 Thr Ala Gln Gly Gly Arg Leu Glu 120 Glu Leu Lys Ala Leu Gly Glu Ala Asp Ser Ser Thr His Gln Leu Val Ile Gln Thr Ala Lys Asp 145 Pro Asp Val Ser Leu Leu His Pro Gly Ala Leu Leu Glu 150 155 His Asp Asp Pro Leu 225 Leu Lys Arg Thr Ile 305 Ala Arg Val Tyr Leu 385 Thr Ser Asn Leu Leu Ile Phe Cys 210 Leu Gin Leu Ala Asp 290 Pro Ala Asn Gin Lys 370 Asp Ser Thr Ile lie Lys Glu Glu 195 Ala Gly Trp Lys Gly 275 Pro Asp Tyr Ser Leu 355 Val Ala Gly Leu Ile 435 Gin Val Trp 180 Gly Ile Pro Thr Phe 260 Ile His Val Met Thr 340 Met His Trp Ser Asn 420 Leu Trp Val 165 Arg Tyr Ile Asp His 245 Gin Thr Cys Ala His 325 Ser Gly Gin Gin Val 405 Asp Gly Arg His Leu His Thr Tyr 230 Leu Phe Ser Pro Ala 310 Trp Ala Glu Ile Arg 390 Ser Ile Tyr Asp Ala Asp Ile 200 Leu Ile Pro Leu Tyr 280 Thr Leu Glu Arg Glu 360 Trp Phe Ala Gly Phe 440 Ile Thr Leu 185 Glu Asp Tyr Leu Ser 265 Met Ala Ser Gin Lys 345 Met Aen Ala Tyr Lys 425 Met Arg 31 Arg 170 Cys Ser Cys Val Glu 250 Thr Lys Pro His Leu 330 Ala Tyr Gin Ala Ser 410 Phe Leu Ser Val Tyr Ile Phe Pro 235 Val Ile Lys Asn Gly 315 Ile Arg Glu Glu Glu 395 Phe Ser Ile Gin Val Pro Asp 205 Glu Leu Glu Ala Cys 285 Lys Tyr Gly Leu Trp 365 Ala Arg Pro Val Val 445 Gly His Ser 190 Asn Gly Lys Glu Tyr 270 Leu Ser Gly Gly Gin 350 Ala Ala Lys Phe Ser 430 Ala Val Met 175 Ile Val Ser His Val 255 Met Asp Gly Phe Ala 335 Thr Asp Ala Ile Ser 415 Leu Val Gly 160 Tyr Pro Ile Lys Lys 240 Lys Lys Pro His Ala 320 Thr Val His Val Thr 400 Thr Lys Thr Ile 9* .9 a. a *9 a a a a.

at.aa a 450 455 460- Ala Gly Val Leu Leu Leu Ser Ile Thr Val Ala Ala Gly Leu Gly Phe 465 470 475 480 Cys Ala Leu Leu Gly Ile Pro Phe Asn Ala Ser Ser Thr Gin Ile Val 485 490 495 Pro Phe Leu Ala Leu Gly Leu Giy Val Gln Asp Met Phe Leu Leu Thr 500 505 510 His Thr Tyr Val Giu Gln Ala Gly Asp Val Pro Arg Giu Giu Arg Thr 515 520 525 Gly Leu Val Leu Lys Lys Ser Gly Leu Ser Val Leu Leu Ala Ser Leu 530 535 540 Cys Asn Val Met Ala Phe Leu Ala Ala Ala Leu Leu Pro Ile Pro Ala 545 550 555 560 Phe Arg Val Phe Cys Leu Gin Ala Ala Ile Leu Leu Leu Phe Asn Leu 565 570 575 Gly Ser Ile Leu Leu Val Phe Pro Ala Met Ile Ser Leu Asp Leu Arg 580 585 590 Arg Arg Ser Ala Ala Arg Ala Asp Leu Leu Cys Cys Leu Met Pro Glu 595 600 605 Ser Pro Leu Pro Lys Lys Lye Ile Pro Glu Arg Ala Lys Thr Arg Lys 610 615 620 Asn Asp Lys Thr His Arg Ile Asp Thr Thr Arg Gin Pro Leu Asp Pro *625 630 635 640 Asp Val Ser Giu Aen Val Thr Lye Thr Cys Cys Leu Ser Val Ser Leu 645 650 655 *Thr Lys Trp Ala Lys Aen Gin Tyr Ala Pro Phe Ile Met Arg Pro Ala :.660 665 670 Val Lys Val Thr Ser Met Leu Ala Leu Ile Ala Val Ile Leu Thr Ser 675 680 685 *.Val Trp Gly Ala Thr Lys Val Lye Asp Gly Leu Asp Leu Thr Asp Ile 690 695 700 Val Pro Giu Aen Thr Asp Glu His Glu Phe Leu Ser Arg Gin Giu Lye 705 710 715 720 Tyr Phe Gly Phe Tyr Aen Met Tyr Ala Val Thr Gin Gly Asn Phe Glu :725 730 735 s Tyr Pro Thr Aen Gin Lye Leu Leu Tyr Glu Tyr His Asp Gin Phe Val 740 745 750 Arg Ile Pro Aen Ile Ile Lye Aen Asp Aen Gly Gly Leu Thr Lye Phe 755 760 765 Trp Leu Ser 770 Asp Lys Glu 785 Asn Ala Ser Gly His Val.

Arg Leu Val 835 Tyr Leu Ser 850 Gin Gly Asn 865 Asp Val His Leu Pro Phe Leu Ile Arg 915 Leu Pro Asn 930 Leu Tyr Leu 945 Ala Val Phe Val Leu Val Val Met Ala 995 Leu Val Leu 1010 Leu Gly Phe 1025 Ala Leu Glu Ala Leu Ala Leu Val1 Asp Asp 820 Asp Ala Leu Leu Tyr 900 Ser Phe Arg Ile Thr 980 Leu Ala Val1 Ser Ala Phe Arg Asp Trp Leu 775 Ala Ser Gly Cys Ile 790 Glu Gly Ile Leu Ala 805 Asn Pro Ile Asp Lys 825 Lys Asp Gly Ile Ile 840 Trp Ala Thr Asn Asp 855 Lys Pro Gin Pro Gin 870 Glu Ile Lys Lys Ser 885 Leu Ser Gly Leu Ser 905 Val Arg Asp Leu Cys 920 Pro Ser Gly Ile Pro 935 Thr Ser Leu Leu Leu 950 Ala Val Met Vai Leu 965 Leu Ala Leu Ala Thr 985 Leu Gly Val Lys Leu 1000 Ile Gly Arg Giy Val 1015 Thr Ser Ile Gly Cys 1030 Val Leu Ala Pro Val 1045 Ser Met Leu Ala Ala Leu Asp Leu Gin Val 780 Thr Gin Giu Tyr Trp 795 Tyr Lye Leu Met Val 810 Ser Leu Ile Thr Ala 830 Asn Pro Lys Ala Phe 845 Ala Leu Ala Tyr Gly 860 Arg Trp Ile His Ser 875 Ser Pro Leu Ile Tyr 890 Asp Thr Xaa Ser Ile 910 Leu Lye Tyr Glu Ala 925 Phe Leu Phe Trp Glu 940 Ala Leu Ala Cys Ala 955 Leu Leu Asn Ala Trp 970 Leu Val Leu Gin Leu 990 Ser Ala Met Pro Ala 1005 His Phe Thr Val His 1020 Lye Arg Arg Arg Aia 1035 Vai His Gly Ala Leu 1050 Ser Glu Cys Gly Phe ?rla Phe Cys Lye 800 Gin Thr 815 Gly His Tyr Aen Ala Ser Pro Glu 880 Thr Gin 895 Lye Thr Lye Gly Gin Tyr Leu Ala 960 Ala Ala 975 Leu Gly Val Leu Leu Cys Ser Leu 1040 Ala Ala 1055 Val Ala *se *0 S 0

S

*0000 0 0 6 0 1060 Arg Leu Phe Leu Arg Leu 1065 Leu Leu Asp Ile Val 33 1070 Phe Leu Gly Leu Ile 1075 1080 1085 Asp Gly Leu Leu Phe Phe Pro Ile Val Leu Ser Ile Leu Gly Pro Ala 1090 1095 1100 Ala Glu Val Arg Pro Ile Glu His Pro Glu Arg Leu Ser Thr Pro Ser 1105 1110 1115 1120 Pro Lys Cys Ser Pro Ile His Pro Arg Lys Ser Ser Ser Ser Ser Gly 1125 1130 1135 Gly Gly Asp Lys Ser Ser Arg Thr Ser Lys Ser Ala Pro Arg Pro Cys 1140 1145 1150 Ala Pro Ser Leu Thr Thr Ile Thr Glu Glu Pro Ser Ser Trp His Ser 1155 1160 1165 Ser Ala His Ser Val Gln Ser Ser Met Gln Ser Ile Val Val Gln Pro 1170 1175 1180 Glu Val Val Val Glu Thr Thr Thr Tyr Asn Gly Ser Asp Ser Ala Ser 1185 1190 1195 1200 Gly Arg Ser Thr Pro Thr Lys Ser Ser His Gly Gly Ala Ile Thr Thr 1205 1210 1215 Thr Lys Val Thr Ala Thr Ala Asn Ile Lys Val Glu Val Val Thr Pro 1220 1225 1230 Ser Asp Arg Lys Ser Arg Arg Ser Tyr His Tyr Tyr Asp Arg Arg Arg 1235 1240 1245 Asp Arg Asp Glu Asp Arg Asp Arg Asp Arg Glu Arg Asp Arg Asp Arg 1250 1255 1260 Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg 1265 1270 1275 1280 Glu Arg Ser Arg Glu Arg Asp Arg Arg Asp Arg Tyr Arg Asp Glu Arg 1285 1290 1295 Asp His Arg Ala Ser Pro Arg Glu Lys Arg Gln Arg Phe Trp Thr 1300 1305 1310 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 4434 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID 34

CGAAACAAGA

ACGCACACAG

AGAAACAGCG

CGAGATACAG

GGCGATGTGG

GCCCAAGTGG

TATCTGCGAT

GCGGGCAAGG

AGCGCCCAGA

GCGGAACTGG

CTCATTCAGA

CACCTGGAGG

GGGCTGCGCG

GAGCAGATCC

GAGGGAAGCC

CTCCTGTGGA

GAGGAAAAGA

GGCAGTGGCT

GCACCGAACA

TACGGTTATG

AGGAACCGCA

ACCGAGAAGG

ACGCAGGAGA

GAACAGCTGC

TCGGCTGCAC

,ATCGGCGTGG

GTCCGTGGCC

GCCGGATTGG

GCGGAGAGCA

GAGCGAGTGA

GCGCAAAACA

GCGCGCGCTC

ATACATCTCT

TCGATGAGAA

CGCTCGATCA

CAGTATTCCA

TGCTATTCGT

TCCACTCCAA

CCTACACACA

CGACCCACGA

TCCTGGTCAA

ACATGTGCAA

TGCGCCACCT

AGCTGTTGGG

CCACCCTGAA

TCAGCTTCGA

ACATGGAGAA

AGAACAGCAC

CCGCGAAGCA

GCGGACACTT

AAATGTACGA

AGGCAGCGGA

TACGTAAACA

TGGATGACAT

CCGTCACCGT

AGAGCAGTGT

GATTGTCAGC

ATCGGCGGGA

GAGTAGGGAG

GTGCACACAG

GCCTAATGAA

CATGGACCGC

ATTATTCTCG

GATAGATAAG

GTCCCACCTC

GGCTATCCTG

GGTGCACCAG

GAAGACGATC

CCCGAACGCC

GGCCACCGCC

CATGCCGAGC

CATTCCGTGC

TCCGGAATCA

TCCCGCCTCT

CTTCGAGACC

GCCCTGCCTG

CCAGCCGCCG

CATGCACTGG

GAGGAAGGCC

CCAGTGGCAG

GGTTTTGAAC

GTCGAGAATT

CCTGGCCAAG

TTTGTATGCC

GGGCGTGGCC

CCTGCTCGGT

GCAGACCAAG

AGCGTCTGTG

ACGCCCGCTG

GTTGTTGGCC

GACAGCCTCC

GATCTTTACA

GGCAAAGCGC

GAAACCCTCG

GTGCTGAGCA

CTGTGGATCC

GGCGAGGACG

TCCGTCCTGC

GTCAAGGTGC

ACGCCCTCCT

TCGATCATCA

GCGGTCGTTA

GTGATGCAGT

GTGGAGCAGT

AACCCACTGA

GATGTGGGAG

CCGGAGGAGC

CAGGCCCTGC

GACAACTACA

GCCTGGCAGC

GCCACCAACT

TTCTCCCATC

TTTTGCACGC

GGAGTTCTGC

ATCGTTTTCA

CTGATTCTCA

TTGTGTGTTG

GGCAAGAGAG

TGGCTGGCGT

CACGCGTTCC

TACGCACCAG

GTGGCAGCCG

GCAGCTCCGT

CCTTCTGCGT

AGGAGGGCGG

AGTCGGCCAC

ATCCGCAGGC

ACCTCTACGA

TCGAGGGCAT

CGCCGCTGGA

TACCAGGCCT

ATATGAAACA

ACATGAAGCG

ATCCCAATTG

CCATCCTGTC

TGATTGTGGG

AGTCGGTGGT

AGGTGCACCA

GCAACTTTTC

ACGATATCTA

CCAGCGCCTT

TCCTCCGCTG

TCATGTGCTT

ATGCGCTGAC

AGAACGCCAG

AGTGTCGCCC

AGTGAGAGAG

GCCGCATCCA

GGACACACAC

CTGGGTGGAC

CACGGCGATC

GCAAAAGCAC

CGGCCTGAAG

CCGGCTGGAG

GCATCAGCTG

GCTGCTTGCC

CACCGAATGG

CTACTACATC

CTGTTTCTGG

CAACCAACGA

AAAGATGTCC

TGCGGCCATT

CCCGGACACG

CGGAGGCTGC

CGGACGGAAG

GCAGCTGATG

TCTTGGATGG

GCGGGAGGTG

CGTGTTCAGC

GTCCATTGTC

GAGGGACCCC

CAGTACCGCC

CGCTGCCTAT

CACCCAGGTG

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 GTTCCGTTTT TGGCCCTTGG TCTGGGCGTC GATCACATCT TCATAGTGGG ACCGAGGATC

CTGTTCAGTG

GCTTTGAAGG

CTATTGGTTT

GACATCTTCT

CTGCCGCTGA

AGGGTGCCGC

AGTCACTCAC

CTCATGCGCA

AGCTTGTATG

GACAGCAACG

TATGCGGTTA

CATGATTCCT

TTCTGGCTGC

TACCGCGACG

CTGGCCTACA

GTGCTCACCA

TATCTGTCGG

TATCCGGAAC

AGTCTGCCAT

CAGATCAAGA

CTGCCCAACT

TCCTCACTGG

CTCCTGCTCT

CAGATCTTTG

CTCATCCTCA

ACATCCGTTG

CTTGTCCACG

GAGTTTGTGA

CCTGCAGCAC

TATTCTGTCT

TTCCGGCCAT

GCTGCTGTTT

ACAACAACAA

TGCCCGCCCA

TGGCGTCCTT

GCTGGGTGAA

CCTCCACGCG

AGCACAAGTT

CCCAGGGCAA

TTGTGCGGGT

TGCTCTTCAG

GACGGCTGAC

AGCTAATCGT

ATCGCCTGGT

CATGGGCCAC

CGCGCCAGTA

TGGTCTACGC

CCCTGATAGG

ATCCATCGGG

CCATGATCCT

CCGTTTGGGC

GGGCCATGAC

GCGTGGGCAT

GCAACCGACA

GCATGCTGAC

TCCGGCACTT

CGCAGGATCC

GCAGGCTGCC

GATTTCGTTG

TCCGGTGTGG

CGGGCGCGGG

GAATCCTCTG

CTCCCTGGCA

GTTCCTGACC

CCTTCAGGAT

CCTGGATGCT

CTTTGAATAT

GCCACATGTG

CGAGTGGCTG

CAAGGAGTGC

GCAAACCGGC

CAACAGCGAT

CAACGACGTC

TTTTCACCAA

TCAGATGCCC

TCATATTCGC

CATTCCCTTC

GGCCTGCGTG

CGCCGTTCTC

TCTGCTGGGC

GATGCTGTGC

GCGCCGCGTC

CTCCGGAGTG

CTGCTGGCTI

TTCTTTGCGG

ATCGTAATGT

GATCTACGCA

AAGGAACAGC

GCCCGGCATC

CTGGAACAGA

ACCTTCGCCT

GTTATGGGTT

GGCCTGGACA

CAAACTCGGC

CCCACCCAGC

ATCAAGAATG

GGTAATCTGC

TGGTTCCCAA

CATGTGGACA

GGCATCATCA

TTCGCCTACG

CCCAACGAGT

TTTTACCTCC

GACCTGAGCG

ATCTTCTGGG

CTACTCGCCG

GTGATCCTCA

ATCAAACTCT

TTCAATGTGC

CAGCTGAGCA

GCCGTGTTCA

CTGCTGGTGG

CCGCCTTTAT

GCTCCAATTT

GACGTACCGC

CGAAGGTGGC

CGAAGAGCTG

GGGCAGACAT

TTCAGCACTA

TCCTGGCGGC

TTATTGATCT

TCTTTGGCTT

AGCAGTTGCT

ATAACGGTGG

AAAAGATATT

ACGCCAGCAG

ACCCCGTGGA

ACCAACGCGC

GAGCTTCTCA

ACGATCTTAA

ACGGACTAAC

TCAAGTACGA

AGCAGTACAT

CCCTGGTGCT

GCGTTCTGGC

CGGCCATTCC

TGATATCACT

TGCAGATGTC

TGCTCTCCAC

TCTTATGCGT

TCCGGTGCCG

GGCAGCGGCT

CGGCAGGGCG

ACCTCCGGTG

CAACAACAAC

CCCTGGGAGC

CACTCCCTTC

CCTCATATCC

GGTGCCCAAG

CTACAGCATG

CAGGGACTAC

ACTGCCGGAC

CGACGAGGAA

CGATGCCATC

CAAGGAACTG

CTTCTACAAC

GGGCAAATTG

GATACCCAAG

AGATACCTCG

GGGCTTCGGC

GACCCTGCGC

GGTCTCCCTG

CTCGCTGGCC

GGCAGTCATA

GGGCTTCATG

CCTGGGACCA

GTCGCCCTTT

TGGCGCCTGC

1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 AACAGCCTTT TGGTGTTCCC CATCCTACTG AGCATGGTGG GACCGGAGGC GGAGCTGGTG

CCGCTGGAGC

AGATCGGGCA

TCGCATCACC

CCGCAGTCGT

CCGCGGGAAC

GCCCAGCAGC

GCCTATCCGC

CACTCGGACA

ATGCCCGGCA

TAGCTATTAG

AATCGATTTG

ATGGATGCTT

CATTAGCTTA

AATCCAAAAT

ATCCAGACCG

AATCCTATGT

ACCACCACAA

GGAAGTCCAG

AGCGACCCGC

ACCACCAGCA

CGGAGCTGCA

GCAACACCAC

GGGCGGTGCG

GACGTATCTT

TCCAGCGGGT

AAATGGCATG

TGGTTTCAAG

CGACGTATCC

CATATCCACG

GGTGCAGGGA

AGACCTTAAT

CAACTCGTCC

CTCCTACGCG

TCAGGGCCCG

GAGCATCGTG

CAAGGTGACG

CAGCTATAAC

TAGACTCTAG

CTGCTGAGGA

GTAATTGGCA

ATACATTTTT

ATGAAAATTG

CCCTCTCCGC

TCGCGATCCT

GATCCATCGC

ATCCAGATGC

GCCCCGCCCC

CCCACAACGC

GTGCAGCCGG

GCCACGGCCA

TTTACGAGTT

CCTAAGCCGT

TTTCGTTCTC

AAATATCAAT

AAAGAGTCCG

AAAAGCTAAG

TCCGGTATTT

TGCCCGTGCG

CGCGAGGCAG

TGACGACGAT

CCAATGATTG

CCGCCTATCA

CCCCGCCTCC

AGGTGACGGT

ACATCAAGGT

AGCACTAGCA

AACCCTATTT

ATGGATTCTC

TTTTGTGTCT

CCAGATATTT

CAGACCCGTA

ATAGCAGCTG

CAGCAGCAAG

CTGCCAGAAG

CACCGAGGAG

GACCTACCAG

CAAGGCCGCC

CTTCCCGACG

GGAGACGACG

GGAGCTGGCC

CTAGTTCCTG

GTATCTGTAA

ATGGATTCTC

CAAAAAGATG

ATATAAAAAA

TGTATGTATA

CCTT

3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4434 TGTGTATGCA TGTTAGTTAA TTTCCCGAAG INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTICS: LENGTH: 1285 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: Met Asp Arg Asp Ser Leu Pro Arg Val Pro 1 5 10 Val Asp Glu Lys Leu Phe Ser Asp Leu Tyr 20 25 Asp Ala Gln Val Ala Leu Asp Gln Ile Asp 40 Ser Arg Thr Ala Ile Tyr Leu Arg Ser Val 50 55 Asp Thr His Gly Asp Val Ile Arg Thr Lys Gly Lys Phe Gln Ser Ser Trp Val Ala Arg Gly His Leu Glu Thr Leu Gly Ser Ser Val Gin Lys His Ala Gly Lys Val Lei! Phe Val 70 75 Ala Ile Giu Ala Val1 145 Ala Asp Ile Leu Val 225 Pro Ile Ile Asn Val 305 Met Ser Met Ile His Ala Thr 130 Leu Thr Met Giu Asp 210 Val Ala Ser Gly Cys 290 Gly His Gly rhr Leu Ser Giu 115 His His Ala Cys Gin 195 Cys Ile Ser Phe Ser 275 Pro Ala Trp His Glu Val Lys 100 Leu Gin Pro Val1 Asn 180 Ile Phe Pro Vai Asp 260 Gly Asp Ile Pro Leu 340 Lys Leu Val Ala Leu Gin Lys 165 Met Leu Trp Gly Met 245 Phe Tyr Thr Leu Glu 325 Arg Glu Ser His Tyr Leu Ala 150 Val Pro Arg Glu Leu 230 Gin Giu Met Ala Ser 310 Giu Lys Met Thr Gin Thr Ile 135 Leu His Ser His Gly 215 Asn Tyr Thr Giu Pro 295 Gly Leu Ala Tyr Phe Leu Gin 120 Gin Leu Leu Thr Leu 200 Ser Gin Met Val1 Lys 280 Asn Gly Ile Gin Asp 360 Cys Trp 105 Lys Thr Ala Tyr Pro 185 Ile Gin Arg Lye G iu 265 Pro Lye Cys Val Ala 345 Gin Lys 38 Val1 90 Ile Thr Thr His Asp 170 Ser Pro Leu Leu Gin 250 Gin CYs Asn Tyr Gly 330 Leu Trp Ala Gly Gin Ile His Leu 155 Thr Phe Cys Leu Leu 235 Lys Tyr Leu Ser Gly 315 Gly Gin Gin Ala Leu Giu Gly Asp 140 Giu Giu Giu Ser Gly 220 Trp Met Met Thr 300 Tyr Arg Ser ksp ilu Lys Gly Giu 125 Pro Val1 Trp Gly Ile 205 Pro Thr Ser Lys Pro 285 Gin Ala Lye Val1 Asn 365 Val1 Ser Gly 110 Asp Asn Leu Gly Ile 190 Ile Glu Thr Giu Arg 270 Leu Pro Ala Arg Val 350 Tyr Leu Ala Arg G iu Ala Val Leu 175 Tyr Thr Ser Leu Glu 255 Ala Aen Pro Lye Asn 335 Gin Lye Asn Gin Leu Ser Ser 
Lys 160 Arg Tyr Pro Ala Asn 240 Lye Ala Pro Asp His 320 Arg Leu Val kla


355 His His Leu Giy Trp Thr Gin Glu 370 375 380 Gin Arg Asn Phe Ser Arg Glu Val Glu Gin Leu Leu Trp 385 Ser Leu Val1 Arg Val 465 Leu Asn Val Val Phe 545 Gin Phe Ala Val Arg 625 Asn Leu Arg Asp Ile Trp 450 Leu Leu Arg Val1 Gly 530 Ala Ala Pro Asp Ala 610 His Pro Ala Ile Asp Gly 435 Arg Leu Gly Arg Pro 515 Pro Ala Ala Ala Ile 595 Pro Pro Leu Ser Ala Ile 420 Val Asp Met Ile Glu 500 Phe Ser Ala Ile Met 580 Phe Pro Lys Leu Phe 660 Thr 405 Leu Ala Pro Cys Val1 485 Gin Leu Ile Phe Val 565 Ile Cys Val Ser Glu 645 Ser 390 Asn Ala Val Val1 Phe 470 Phe Thr Ala Leu Ile 550 Met Ser Cys Leu Cys 630 Gin Leu Tyr Lys Thr Arg 455 Ser Asn Lys Leu Phe 535 Pro Cys Leu Cys Pro 615 Asn Arg Ala Asp Phe Val 440 Gly Thr Ala Leu Gly 520 Ser Val1 Ser Asp Phe 600 Leu Asn Ala Thr Lys 680 Ile Ser 425 Leu Gin Ala Leu Ile 505 Leu Ala Pro Asn Leu 585 Pro Asn Asn Asp Phe 665 Phe Tyr 410 His Tyr Ser Ala Thr 490 Leu Gly Cy 5 Al a Leu 570 Arg Vai Asn Arg Ilie 650 Ala 395 Val Pro Ala Ser Gly 475 Ala Lys Val1 Ser Leu 555 Ala Arg Trp Asn Val 635 Pro Phe Phe Ser Phe Val 460 Leu Ala Asn Asp Thr 540 Lys Ala Arg Lys Asn 620 Pro Gly Gin Ser Ala Cys 445 Gly Gly Tyr Ala His 525 Ala Val Ala Thr Glu 605 Gly Leu Ser His Arg Ser Leu 430 Thr Val Leu Ala Ser 510 Ile Gly Phe Leu Ala 590 Gin Arg pro Ser Tyr 670 Lys Ala 415 Ser Leu Ala Ser Giu 495 Thr Phe Ser Cys Leu 575 Gly Pro Gly Ala His 655 Thr Gin 400 Ala Ile Leu Gly Ala 480 Ser Gin Ile Phe Leu 560 Val Arg Lys Ala Gin 640 Ser Pro


Phe Leu Met Arg Ser Trp Vai 675 Leu Thr Val Met Gly Phe Leu 685 Ala Ala Leu Ile Ser Ser Leu Tyr Ala Ser Thr Arg Leu Gin Asp Gly 690 695 700 Le Let Thr Tyr Gly Leu 785 Glu Leu Val Ala Tyr 865 His Val Gin Glu Trp 945 Cys Val a Asp Asp Gin His Leu 770 Gin Cys Ile Leu Phe 850 Gly Gin Tyr I Ile I 9 Gly P 930 Glu G Val L Trp A Ile Al Gly Asp 755 Pro Lys Trp Val Thr 835 Tyr Ala ?ro Ala Lys P15 he in eu la I1 Gi Aer 74( Sez Asp Ile Phe Gin 820 Asn Asn Ser Asn Gin 900 Thr Gly Tyr Leu Ala 3 Asp n Thr 725 Phe Phe Phe Phe Pro 805 Thr Arg Tyr Gin Glu 885 Met Leu Leu I Met 1 9 Ala A 965 Val L Le 71 Ar Glt Arc Trp Asp 790 Asn Gly Leu Leu Gly 870 ryr ?ro Ile Pro hr 50 la eu u Val Leu Tyr Val Leu 775 Glu Ala His Val Ser 855 Lys Asp Phe Gly Asn 1 935 Leu I Leu I Val I Pr Ph Pr Prc 76C Leu Glu Ser Val Asn 840 Ala Leu Leu ryr Hise )20 Tyr Arg al le o Lys e Gly D Thr 745 His Leu Tyr Ser Asp 825 Ser Trp Tyr Lys Leu 905 Ile Pro I Ser I Leu 9 Leu S 985 As Phe 73C Gin Val Phe Arg Asp 810 Asn Asp Ala Pro Ile 890 iis Arg er er al 70 er Ser 715 Tyr Gin Ile Ser Asp 795 Ala Pro Gly Thr Glu 875 Pro Gly Asp Gly Leu 955 Ser I Val I As Se Gir Lys Glu 780 Gly Ile Val Ile Asn 860 Pro Lys Leu Leu Ile 940 Ala Leu Leu n Gl Me Let Asr 765 Trp Arg Leu Asp Ile 845 Asp Arg Ser Thr Ser 925 Pro Met Leu Ala u Hi t Ty Le 75 As Le Lei.

Ala Lys 830 Asn Val Gin Leu Asp 910 Val Phe Ile Leu Ser 990 Lys r Ala 735 Arg 0 p Asn a Gly Thr Tyr 815 Glu Gin Phe Tyr Pro 895 Thr I Lys I Ile P Leu A 9 Leu S 975 Leu A Phe 720 Val Asp Gly Asn Lys 800 Lys Leu Arg Ala Phe 380 Leu Ser Tyr he la er la a a a 980 Gin Ile Phe Gly Ala Met Thr Leu Leu Gly Ile Lys Leu Ser Ala Ile 995 1000 1005 Pro Ala Val Ile Leu Ile Leu Ser Val Gly Met Met Leu Cys Phe Asn 1010 1015 1020 Val Leu Ile Ser Leu Gly Phe Met Thr Ser Val Gly Asn Arg Gin Arg 1025 1030 1035 1040 Arg Val Gln Leu Ser Met Gln Met Ser Leu Gly Pro Leu Val His Gly 1045 1050 1055 Met Leu Thr Ser Gly Val Ala Val Phe Met Leu Ser Thr Ser Pro Phe 1060 1065 1070 Glu Phe Val Ile Arg His Phe Cys Trp Leu Leu Leu Val Val Leu Cys 1075 1080 1085 Val Gly Ala Cys Asn Ser Leu Leu Val Phe Pro Ile Leu Leu Ser Met 1090 1095 1100 Val Gly Pro Glu Ala Glu Leu Val Pro Leu Glu His Pro Asp Arg Ile 1105 1110 1115 1120 Ser Thr Pro Ser Pro Leu Pro Val Arg Ser Ser Lys Arg Ser Gly Lys 1125 1130 1135 Ser Tyr Val Val Gln Gly Ser Arg Ser Ser Arg Gly Ser Cys Gln Lys 1140 1145 1150 Ser His His His His His Lys Asp Leu Asn Asp Pro Ser Leu Thr Thr 1155 1160 1165 Ile Thr Glu Glu Pro Gln Ser Trp Lys Ser Ser Asn Ser Ser Ile Gin 1170 1175 1180 Met Pro Asn Asp Trp Thr Tyr Gln Pro Arg Glu Gln Arg Pro Ala Ser 1185 1190 1195 1200 Tyr Ala Ala Pro Pro Pro Ala Tyr His Lys Ala Ala Ala Gin Gin His 1205 1210 1215 His Gin His Gin Gly Pro Pro Thr Thr Pro Pro Pro Pro Phe Pro Thr 1220 1225 1230 Ala Tyr Pro Pro Glu Leu Gln Ser Ile Val Val Gln Pro Glu Val Thr 1235 1240 1245 Val Glu Thr Thr His Ser Asp Ser Asn Thr Thr Lys Val Thr Ala Thr 1250 1255 1260 Ala Asn Ile Lys Val Glu Leu Ala Met Pro Gly Arg Ala Val Arg Ser 1265 1270 1275 1280 Tyr Asn Phe Thr Ser 1285 INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: LENGTH: 345 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: AAGGTCCATC AGCTTTGGAT ACAGGAAGGT GGTTCGCTCG CAGAAATCGC 
TCGGCGAGAT GGACTCCTCC ACGCACCAGC GATATGGACG CCTCGATACT GCACCCGAAC GCGCTACTGA AAAGCGATCT CGGTGACGGT GCACATGTAC GACATCACGT TACTCGCCCA GCATACCGAG NTTCGATACG CACTTTATCG ATACCGTGCG CGATCATCAC GCCGCTGGAT TGCTTTTGGG INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 115 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide AGCATGAGCT AGCCTACACG TGCTAATCCA AACNCCCAAA CGCACCTGGA CGTGGTGAAG GGAGNCTCAA GGACATGTGC AGCAGATCTT CGAGAACATC

AGGGA

120 180 240 300 345

(xi) Lys 1 Leu Gln SEQUENCE DESCRIPTION: SEQ ID NO:8: Val His Gln Leu Trp Ile Gln Glu Gly 5 10 Ala Tyr Thr Gln Lys Ser Leu Gly Glu 20 25 Leu Leu Ile Gln Thr Pro Lys Asp Met Gly Ser Leu Glu His Glu Met Asp Ser Asp Ala Ser Val Lys Lys Ser Thr His Ile Leu His Ala Ile Ser 35 Pro Asn Ala 40 Leu Leu Leu Thr Val Thr His 55 Asp Asp Val Leu Val His Met 65 Tyr Tyr Pro Ile Thr Trp Lys Asp Met Ser Pro Ser Ile 85 Xaa Phe Asp Thr 90 Phe Ile Glu Gln Phe Glu Asn Ile Ile Pro Cys Ala Ile Ile Thr Pro Leu Asp Cys Phe 100 105 110 Trp Glu Gly 115 INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 5187 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: GGGTCTGTCA CCCGGAGCCG GAGTCCCCGG CGGCCAGCAG CGTCCTCGCG AGCCGAGCGC

CCAGGCGCGC

GGGGCCCTGG

GCCGCGCCGG

GAGCAGATTT

TTTCAGAGAC

GTTGTGGGTC

ACCAACGTGG

ACCCGTCAGA

AAAGAAGAAG

CTCCAGGCCA

TGCTACAAAT

CTTTACCCTT

TCCGGGACAG

GAATTCCTAG

AATAAAGCCG

GATTGCCCTG

TTGAATGGTG

GTGGGTGGTA

CCGGAGCCCG

GCAGGCAGGC

ACCGGGACTA

CCAAGGGGAA

TCTTATTTAA

TCCTCATATT

AGGAGCTGTG

AGATAGGAGA

GCGCTAATGT

GTCGTGTGCA

CAGGGGAACT

GCTTAATCAT

CATACCTCCT

AAGAGTTAAA

AAGTTGGCCA

CCACAGCCCC

GATGTCAAGG

CCGTCAAGAA

CGGCGGCGGC

CGGCGGCGGG

TCTGCACCGG

GGCTACTGGC

ACTGGGTTGT

TGGGGCCTTC

GGTGGAAGTT

AGAGGCTATG

TCTGACCACA

CGTCTACATG

TATCACGGAG

TACACCTTTG

AGGTAAGCCT

GAAAATAAAC

TGGGTACATG

TAACAAAAAT

TTTATCCAGG

TGCCACTGGA

GGCAACATGG

AGGCGCAGAC

CCCAGCTACT

CGGAAAGCGC

TACATTCAAA

GCTGTGGGAT

GGTGGACGAG

TTTAATCCTC

GAGGCTCTCC

TATAACAGGC

ACAGGTTACA

GACTGCTTCT

CCTTTACGGT

TACCAAGTGG

GACCGGCCTT

TCAACCAAAC

AAGTATATGC

AAACTTGTCA

CCTCGGCTCG

GGACCGGGGG

GCGACGCCGC

CGCTGTGGCT

AGAACTGCGG

TAAAGGCAGC

TGAGTCGAGA

AACTCATGAT

TGCAACACCT

AATGGAAGTT

TGGATCAGAT

GGGAAGGGGC

GGACAAACTT

ACAGCTGGGA

GCCTCAACCC

CTCTTGATGT

ATTGGCAGGA

GCGCTCACGC

TAACGCCGCC

ACCGCACCGC

CTTCGCTCTG

GAGAGCGAAG

CAAGTTTTTG

TAATCTCGAG

ATTAAATTAT

ACAGACTCCA

GGACTCAGCA

GGAACATTTG

AATAGAATAC

AAAGCTACAG

TGACCCCTTG

GGAAATGCTG

AGCCGACCCA

GGCCCTTGTT

GGAGTTGATT

CCTGCAAACC

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140

ATGTTCCAGT

TCTCACATCA

TACGTGGAGG

ACAACCACGA

GCCAGCGGCT

TCCAAGTCCC

GCAGGATTGG

TTGCCGTTTC

AGTGAAACAG

CGCACCGGAG

GCATTGATCC

TTCAATTTTG

CGTGAGGACA

ATTCAAGTTG

TAATGACTCC

ACTGGAATGA

TGGTTCATCA

CCCTGGACGA

ACCTACTGAT

AGGGTGCCGT

GCCTCTGCTC

TTGCTCTTGG

GACAGAATAA

CCAGCGTGGC

CTATCCCTGC

CTATGGTTCT

GAAGATTGGA

AGCCACAGGC

CCCCCATACA CCAGCCACAG

CAAGCAAATG

AGACAGGGCA

AAGTGTCGCC

CATCCTAAAA

GCTTGCCTAT

GGGGCTGGCT

CTTGATTGGC

TGTTGGTGTG

GAGGATTCCA

CCTCACCTCC

CCTGCGAGCG

GCTCATTTTT

TATTTTCTGC

CTACACAGAG

CTTCGCCCAC

CCCTCACACG

TGTTACCGTC

GGACCTGCTC

GTGGACACTC

CAAGGTTGTG

CCGAGTGAGA

CTTCATAGCT

AGCAGACTAC

GAAGTATGTC

AGACTGGCTT

GCCAAACAAT

GACTGGCAGC

CGCAGATGGC

CGACCCTGTA

TATGAACACT

GCCGCCATCC

CCAAACTCCA

TCCTTCTCTG

GCCTGTTTAA

GGCGTCCTGT

ATTTCTTTTA

GATGATGTCT

TTTGAGGACA

ATCAGCAATG

TTCTCCCTCC

CCTGCAATTC

TGTTTCACAA

CCTCACAGTA

GAAACCCATA

CACGTGTACT

ACCCAGGACA

TCCCAGTTCT

TCTTCGTTTG

GTAATCCTTC

GACGGGCTGG

GCCCAGTTCA

CCGAATATCC

ATGCTGGAGG

CAAGGACTTC

TATAAAAATG

CGAGACAAGC

ATCATTAATC

GCTTACGCTG

TCAGGGGCTA

TGGAGGCCTG

CTCAAAAGGT

ATGTCAGTGT

CCATGCTGCG

TGGTTGCGCT

ATGCTGCGAC

TCCTCCTGGC

GGACTGGGGA

TCACCGCCTT

AGGCTGCTGT

TCAGCATGGA

GCCCCTGTGT

ACACCCGGTA

TCACTATGCA

ACACCACCGC

ACCTCAGCTG

CAGACTCCAG

CAGAGAAGCA

TTTTCCTGGG

ACCTCACGGA

AGTACTTCTC

AGCACCTACT

AGAACAAGCA

AGGATGCATT

GATCAGATGA

CCATCGACAT

CGAGCGCTTT

CCTCCCAGGC

CGACTATGTC

GCAGAGGACT

GCTTCCCTTC

CATCCGAGTG

CTGGGACTGC

GTCAGTGGCT

AACTCAGGTT

CCATGCATTC

GTGCCTCAAG

CTTCATGGCC

GGTGGTGGTA

TTTATACAGA

CAGCAGGGTG

CAGCCCCCCA

GTCCACCGTT

CGAGCCACGC

TCAGAGTCCC

CCTCCACTGC

CTATGCTCCT

CTTGCTGGGG

CATTGTTCCC

TTTCTACAAC

TTACGACCTT

ACTTCCCCAA

TGACAGTGAC

CGGGGTCCTC

TAGTCAGTTG

CTACATCTAC

CAACATCCGG

1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880

CAGCTCCGCA

TCTGAGATCT

GAGAGCACCA

CTCGAGCCCC

TTCCTCCTGA

GTCAGCCTTT

CGGGAAACCA

ATGTATATAG

CATAAGAGTT

ATGTGGCTGC

TGGGAAACTG

GCTTACAAAC

ACTAAACAGC

CTGACCGCTT

CAGAGTATGA

CTGTACAGCC

GCTCTACCAG

CCTGCACCAA

AACCCAAAGC

ATGGGACCAC

GAGAATATGA

TCACCCAGAA

TCAGCAATGT

ACTACTTTAG

GGAGGATCAT

TCCTGGTGCA

GTCTGGTAGA

GGGTCAGCAA


CCTCACCGGC

ATCCCAGCAG

GACACCTCAG

AGCCTGGGAC

AGCCTGCGCC

TGCGCAGTCT

ATGACCGTTG

GTGGTCATCC

GCCTTTCTGA

TTTGCTCCCG

TCCGAATTTG

GGGGTTCTCA

GAGGTGTCTC

AGTGTCGTCC

TCGGAGTACA

GCACAGCAGG

GTCTTTGCCC

CGGCAACAGC

CGAAGGGATC

TTTGAAATTT

GGGGCCCGTT

AGCTACTGCC

CCCCCGCCTG

CCTGAGACTG

AGGAGGGACT

TGGGGGAGCA

AAGCCCCGCC

GGCAGTTCAT

CGGAGTGGGT

ACTTTGTGGA

TGTCCAGCTA

ACTGGCTGCT

TCCTCCTGAA

AGCTCTTTGG

TGATTGCATC

CAGCCATTGG

TTCTGGACGG

ATTTCATTGT

ATGGACTGGT

CAGCCAATGG

GGTTTGCCGT

GCTCTCAGAC

GTGCCGGAGG

GGTCCACTGT

CCCACCTGGA

CCCCTAGAGA

CTACTGAAGG

CTCACAACCC

AGCCCATCAC

GACCTGGGCG

ATCACGGGGT

CAAAGGTGGA

GCTCCAACTG

CCCACCTCTT

TGTTACTGTA

CCATGACAAA

CGAGTACGCT

AGCCATAGAA

CCCCAATGGC

GCTATCCATC

CCCCTGGACG

CATGATGGGC

TGTTGGCATC

GGACAAGAAC

TGCTGTGTCC

CAGATACTTC

TCTGCTGCCT

CCTAAACCGA

GCCTCCTGGT

CACGGTGTCT

CCCTGCCCAC

GGTCCATCCG

CTCTGGCTCC

AGGCTTGCGG

GCATTCTGGC

TCGGAACCCA

CACTGTGACG

CAACCCCCGA

ATTTGAGGAT

GGTCATAGAG

P.GGGTAATTA

TCCAGAACTG

ACTGATTGTA

GCCGACTACA

CAGTTCCCTT

AAAGTGAGAG

TACCCCTTCC

AGCGTGGTGC

GCCGGGATCA

CTCATTGGGA

GGAGTGGAGT

CACAGGGCTA

ACTCTGCTGG

TTTGCCGTCC

GTCCTCTTAT

CTGCCCACTC

CACACGAACA

GGCATCAGTG

CAAGTGATTG

GACTCCAGAC

TTGTCCCCTG

CCACCCCCCT

CCTAGCAATA

ACGTCCACCG

GCTTCTGCTT

GGGGGGCCCT

CCTCATGTGC

CTACAGGACG

AAATCTGAAG

CTTGAAGAGA

rTATTKKGTG

TGCCAGAGAC

TCTACCTCAA

TCATCTGTAA

TGTTCTGGGA

TGGCCTGCAC

TTGTCATGGT

TCAAGCTGAG

TCACCGTCCA

TGCTCGCTCT

GTGTACTGAT

TGGCCATTCT

CCTTCTTTGG

CTTCGCCTGA

ATGGGTCTGA

AGGAGCTCAG

TGGAAGCCAC

ATCAGCCTCC

GACGGCAAGG

A.CAGACCGCG

GGGACCGCTC

CCATGGGCAG

CGGTGACTGT

3TCCAGGCTA CTTTTCATGT4 1GGAATGTGA

::AAAGAGGCC

kCTGCTTGGA kAATATTTCT

CAGGCTGAGA

CGGCCTACGA

CAACTATACG

GCAATACATC

GTTTCTAGTG

CCTGGCTCTG

TGCTGTGCCT

CGTGGCTTTG

GGAACACATG

GCTTGCAGGG

CACCGTCTTG

ACCGTGTCCT

GCCGCCTCCA

TTCCTCCGAC

GCAATACGAA

AGAAAACCCT

CTTGACCCCT

CCAGCAGCCT

CAGAGACGCT

AGGGCCCCGT

CTCTGTGCCC

TGCTGTGCAT

TGAGAGCTAC

CAGGTGTGAG

GGAGAGGCCG

kAAGATTGGA kTTATGGGAA

%TAAATATTT

2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 AARAGGTGTA CACATGTAAT ATACATGGAA ATGCTGTACA GTCTATTTCC TGGGGCCTCT 42 4620 CCACTCCTGC CCCAGAGTGG GGAGACCACA GGGGCCCTTT TGTGCCACAA CCAAGCTTAA CTTAGTTTTA AAAAAAATCT CTTAAATATT GTATAATTTA CTTGTATAAT TCTATGCAAA TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG ATGAAGAAAA CAGGTTAATC CCAGTGGCTT CTCTAGGGGT GGTGGATGTG TGTGTGCATG TGACTTTCCA ATGTACTGTA TGCTGTTGTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA GTGGGCTGGG AAGGTCCAGG TCTTTTTCTG TCGTGATGCT CATCTGTCCT ATTCTCTGGG ACTATTC INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 1434 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein

CCCCTGTGTA

CCCAGCATAT

TATTGCTTAT

TATCACAACC

TAATTGTTTA

AGTTGTATAT

TTGTGGTTTG

TGATCTTAGC

GGTGGAAAGG

CATTc3GTCTC

GTCGCTGCTG

GTAATAGGAT

CTGTGGTAGG

ACGAGCAGAC

GGTTCGCATG

TTGTTGTTGT

TCTGGCCTAG

TGACCCCAAT

*4680 4740 4800 4860 4920 4980 5040 5100 5160 5187 (xi) SEQUENCE DESCRIPTION: SEQ ID Met Ala Ser Ala Gly Asn Ala Ala Gly Al a 10 Leu Gly Arg Gln 1 Gly Arg Ala Gly Pro Asp Gly Arg Arg Asp Tyr Leu Arg Arg Thr Gly Gly 25 Tyr Pro His Arg Ala Ala His Arg Pro Glu Gln Ile Ser 40 Cys Asp Ala Ala Phe Ala Leu Ser Lys Gly Leu Arg Lys 55 Ala Thr Gly Arg Lys Ala Pro Leu Trp Ala Lys Phe Gln 70 Arg Leu Leu Phe Lys 75 Leu Gly Cys Tyr I le Gln Lys Asn Cys Gly Gly Lys Phe Leu Val Val 90 Gly Leu Leu Ile Phe Gly Val Glu Ala Phe Ala Glu Leu Trp 115 Leu Lys Ala Asn Leu Glu Thr Glu Val Gly Gly 120 Arg Val Ser Arg Glu 125 Leu Asn Tyr Thr Arg Gin Lys Ile Gly Giu Giu Ala Met Phe Asn Pro Gin Leu Met 130 135 140 Ile Gin Thr Pro Lys Glu Giu Gly Ala Asn Val Leu Thr Thr Giu Ala 145 150 155 160 Leu Leu Gin His Leu Asp Ser Ala Leu Gin Ala Ser Arg Val His Val 165 170 175 Tyr Met Tyr Asn Arg Gin Trp Lys Leu Giu His Leu Cys Tyr Lys Ser 180 185 190 Gly Giu Leu Ile Thr Giu Thr Giy Tyr Met Asp Gin Ile Ile Giu Tyr 195 200 205 Leu Tyr Pro Cys Leu Ile Ile Thr Pro Leu Asp Cys Phe Trp Giu Gly 210 215 220 Ala Lys Leu Gin Ser Gly Thr Ala Tyr Leu Leu Gly Lys Pro Pro Leu 225 230 235 240 Arg Trp Thr Asn Phe Asp Pro Leu Giu Phe Leu Giu Giu Leu Lys Lys 245 250 255 Ile Asn Tyr Gin Vai Asp Ser Trp Glu Glu Met Leu Asn Lys Ala Glu 260 265 270 Vai Gly His Gly Tyr Met Asp Arg Pro Cys Leu Asn Pro Ala Asp Pro 275 280 285 Asp Cys Pro Ala Thr Ala Pro Asn Lys Asn Ser Thr Lys Pro Leu Asp 290 295 300 Val Ala Leu Val Leu Asn Gly Gly Cys Gin Gly Leu Ser Arg Lys Tyr 305 310 315 320 Met His Trp Gin Giu Giu Leu Ile Vai Gly Gly Thr Vai Lys Asn Ala *325 330 335 Thr Gly Lys Leu Val Ser Ala His Ala Leu Gin Thr Met Phe Gin Leu 340 345 350 *Met Thr Pro Lys Gin Met Tyr Giu His Phe Arg Gly Tyr Asp Tyr Val *355 360 365 Ser His Ile Asn Trp Asn Giu Asp Arg Ala Ala Ala Ile Leu Giu Ala *370 375 380 *Trp Gin Arg Thr Tyr Vai Giu Val Val His Gin Ser Val Ala Pro Asn 385 390 395 400 *Ser Thr Gin Lys Val Leu Pro Plie Thr Thr Thr Thr Leu Asp Asp Ile 
405 410 415 *.:Leu Lys Ser Phe Ser Asp Val Ser Vai Ile Arg Val Ala Ser Gly Tyr *420 425 430 *Leu Leu Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys 47 435 440 445 Ser Lys Ser Gin Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala 450 455 460 Leu Ser Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser 465 470 475 480 Phe Asn Ala Ala Thr Thr Gin Val Leu Pro Phe Leu Ala Leu Gly Val 485 490 495 Giy Val Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Giu Thr Giy 500 505 510 Gin Asn Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Giu Cys Leu Lys 515 520 525 Arg Thr Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala 530 535 540 Phe Phe Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser 545 550 555 560 Leu Gin Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu 565 570 575 Ile Phe Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Giu Asp Arg 580 585 590 Arg Leu Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val 595 600 605 Ile Gin Val Giu Pro Gin Ala Tyr Thr Glu Pro His Ser Asn Thr Arg 610 615 620 Tyr Ser Pro Pro Pro Pro Tyr Thr Ser His Ser Phe Ala His Giu Thr 625 630 635 640 His Ile Thr Met Gin Ser Thr Val Gin Leu Arg Thr Giu Tyr Asp Pro 645 650 655 His Thr His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Giu Ile Ser 660 665 670 Val Gin Pro Val Thr Val Thr Gin Asp Asn Leu Ser Cys Gin Ser Pro 675 680 685 *Giu Ser Thr Ser Ser Thr Arg Asp Leu Leu Ser Gin Phe Ser Asp Ser 690 695 700 Ser Leu His Cys Leu Giu Pro Pro Cys Thr Lye Trp Thr Leu Ser Ser **705 710 715 720 Phe Ala Giu Lys His Tyr Ala Pro Phe Leu Leu Lye Pro Lys Ala Lys 725 730 735 Val Val Val Ile Leu Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr 740 745 750 48 Gly Thr Thr 755 Arg Giu Thr 770 Ser Phe Tyr 785 Ile Gin His Tyr Val Met Tyr Phe Arg 835 Trp Glu Thr 850 Asp Gly Val 865 Lys Pro Ile Asp Giy Ile Val Ser Asn 915 Pro His Arg 930 Thr Arg Leu *945 Pro Phe Tyr Ile Giu Lys *Ser Ser. 
Tyr 995 *Ser Leu Arg 1010 .Thr Phe Leu *eaa..1025 Ile Ile Val a Met Gly Leu Arg Val Arg Asp Gly Leu 760 Arg Giu Tyr Asp The Ile 775 Asn Met Tyr Ile Val Thr 790 Leu Leu Tyr Asp Leu His 805 Leu Glu Giu Asn Lys Gin 820 825 Asp Trp Leu Gin Gly Leu 840 Gly Arg Ile Met Pro Asn 855 Leu Ala Tyr Lys Leu Leu 870 Asp Ile Ser Gin Leu Thr 885 Ile Asn Pro Ser Ala Phe 900 905 Asp Pro Val Ala Tyr Ala 920 Pro Glu Trp, Vai His Asp 935 Arg Ile Pro Ala Ala Giu 950 Leu Asn Gly Leu Arg Asp 965 Val Arg Val Ile Cys Asn 980 985 Pro Asn Gly Tyr Pro Phe 1000 His Trp Leu Leu Leu Ser 1015 Val Cys Ala Vai Phe Leu 1030 Met Val Leu Ala Leu Met 1045 Ile Gly Ile Lys Leu Ser 49 Asp Leu Thr Asp Ile-.Val Pro 765 Ala Ala Gin Phe Lys Tyr Phe 780 Gin Lys Ala Asp Tyr Pro Asn 795 800 Lys Ser Phe Ser Asn Val Lys 810 815 Leu Pro Gin Met Trp Leu His 830 Gin Asp Ala Phe Asp Ser Asp 845 Asn Tyr Lys Asn Gly Ser Asp 860 Val Gin Thr Gly Ser Arg Asp 875 880 Lys Gin Arg Leu Vai Asp Ala 890 895 Tyr Ile Tyr Leu Thr Ala Trp 910 Ala Ser Gin Ala Asn Ile Arg 925 Lys Ala Asp Tyr Met Pro Giu 940 Pro Ile Giu Tyr Ala Gin Phe 955 960 Thr Ser Asp Phe Val Giu Ala 970 975 Asn Tyr Thr Ser Leu Gly Leu 990 Leu Phe Trp Giu Gin Tyr Ile 1005 Ile Ser Val Val Leu Ala Cys 1020 Leu Aen Pro Trp Thr Ala Gly 1035 1040 Thr Vai Glu Leu Phe Gly Met 1050 1055 Ala Val Pro Val Val Ile Leu 1060 1065 1070 .le Ala Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu 1075 1080 1085 Ala Phe Leu Thr Ala Ile Gly Asp Lys ken His Arg Ala Met Leu Ala 1090 1095 1100 Leu Glu His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu 1105 1110 1115 1120 Leu Gly Val Leu Met Leu Ala Gly Ser Giu Phe Asp Phe Ile Val Arg 1125 1130 1135 Tyr Phe Phe Ala Val Leu Ala Ile Leu Thr Val Leu Gly Val Leu ken 1140 1145 1150 Gly Leu Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Cys Pro 1155 1160 1165 Giu Vai Ser Pro Ala ken Gly Leu ken krg Leu Pro Thr Pro Ser Pro 1170 1175 1180 Giu Pro Pro Pro Ser Val Val Arg Phe Ala Val Pro Pro Gly His Thr 1185 1190 1195 1200 ken ken Gly Ser 
Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gin Thr Thr 1205 1210 1215 Val Ser Gly Ile Ser Glu Giu Leu Arg Gin Tyr Glu Ala Gin Gin Gly 1220 1225 1230 Ala Gly Giy Pro Ala His Gin Val Ile Val Giu Ala Thr Giu Asn Pro 1235 1240 1245 Val Phe Ala Arg Ser Thr Val Val His Pro -Asp Ser krg His Gin Pro 1250 1255 1260 Pro Leu Thr Pro Arg Gin Gin Pro His Leu Asp Ser Gly Ser Leu Ser 1265 1270 1275 1280 *Pro Gly Arg Gin Gly Gin Gin Pro Arg Arg Asp Pro Pro Arg Glu Gly :1285 1290 1295 Leu Arg Pro Pro Pro Tyr Arg Pro krg Arg Asp Ala Phe Giu Ile Ser 1300 1305 1310 *Thr Giu Gly His Ser Giy Pro Ser ken Arg Asp Arg Ser Gly Pro krg 1315 1320 1325 Giy Ala Arg Ser His ken Pro Arg ken Pro Thr Ser Thr Ala Met Gly 1330 1335 1340 Ser Ser Val Pro Ser Tyr Cys Gin Pro Ile Thr Thr Val Thr Ala Ser .:1345 1350 1355 1360 *Ala Ser Val Thr Val Ala Val His Pro Pro Pro Gly Pro Gly Arg ken 1365 1370 1375 Pro Arg Gly Gly Pro Cys Pro Gly Tyr Glu 1380 1385 His Gly Val Phe Glu Asp Pro His Val Pro 1395 1400 Arg Arg Asp Ser Lys Val Glu Val Ile Glu 1410 1415 Glu Glu Arg Pro. 
Trp Gly Ser Ser Ser Asn 1425 1430 INFORMATION FOR SEQ ID NO:11: SEQUENCE CHARACTERISTICS: LENGTH: 11 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide Ser Tyr Pro Glu Thr Asp 1390 Phe His Val Arg Cys Glu 1405 Leu Gln Asp Val Glu Cys 1420 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly 1 5 INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 5 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: Leu Ile Val Gly Gly 1 INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 7 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: peptide (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: Pro Phe Phe Trp Glu Gln Tyr 1 INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 28 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: GGACGAATTC AARGTNCAYC ARYTNTGG 28 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 26 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID GGACGAATTC CYTCCCARAA RCANTC 26 INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 27 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer"


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: GGACGAATTC YTNGANTGYT TYTGGGA INFORMATION FOR SEQ ID NO:17: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: other nucleic acid DESCRIPTION: /desc "primer" (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: CATACCAGCC AAGCTTGTCN GGCCARTGCA T INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 5288 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

GAATTCCGGG

GCCGGCTCTC

CCCCCGCGCA

GCAGCGGCAG

CGAGCCCGAG

GAGCCCGCAG

GCGGCGGCGG

GCAGACGGAC

CCAGCTACTG

GGAAAGCGCC

ACATTCAAAA

GCTCTTCCGC

ATGTGGCAAT

CAGCGCCCGC

CAGCCTGCGG

CAGCGGCAGC

CGGCGGCGGC

CGGCAGCGGC

GGGGGGGCTG

CGACGCCGCC

ACTGTGGCTG

AAACTGCGGC

GAACTGGATG

GGAAGGCGCA

CGTGTGAGCA

CCAGCAGCGT

AGCGCGCCGG

AACATGGCCT

TGTATCGGTG

CGCCGTGCTG

TTCGCTCTGG

AGAGCGAAGT

AAGTTCTTGG

TGGGCAGCGG

GGGTCTGACT

GCAGCAGCGG

CCTCGCAAGC

GCCGCCCGGG

CGGCTGGTAA

CCCCGGGACG

CCGCGCCGGA

AGCAGATTTC

TTCAGAGACT

TTGTGGGCCT

GACCGCAAGG AGTGCCGCGG AAGCGCCCGA AGGACAGGCT

CGGCCGCAGA

CCCCGGCAGC

CTGGTCTGTC

CGAGCGCCCA

AAGCCTCCGT

CGCCGCCGAG

GCCGGCTGGA

CCGGGACTAT

CAAGGGGAAG

CTTATTTAAA

CCTCATATTT

CGCTCGGCGC

GACCTCGGGA

GGCCGCGGCC

AACCGGAGCC

GGCGCGCCAG

CCCCGCGGCG

CCCCAGGACC

GGCGGGAGGC

CTGCACCGGC

GCTACTGGCC

CTGGGTTGTT

GGGGCCTTCG

CGGTGGGATT

GAGGACGAGT

TTAATCCTCA

AAGCGCTCCT

ACAACAGGCA

CAGGTTACAT

ACTGCTTCTG

CTTTGCGGTG

ATCAAGTGGA

ACCGCCCCTG

CAACCAAACC

AGTATATGCA

AACTCGTCAG

ACGAGCACTT

CAGCCATCCT

AGAACTCCAC

CCTTCTCTGA

CCTGTCTAAC

GCGTCCTGCT

TTTCCTTTAA

ATGATGTTTT

TTGAGGACAG

TCAGCAATGT

TCTCCCTCCA

CTGCAATTCT

GTTTTACAAG

CACACGACAA

AAACGCAGAT

ACGTGTACTA

AAAAGCAGCG

AAGTCGTGAA

ACTCATGATA

ACAACACCTG

GTGGAAATTG

GGATCAGATA

GGAAGGGGCG

GACAAACTTC

CAGCTGGGAG

CCTCAATCCG

TCTTGATATG

CTGGCAGGAG

CGCCCATGCC

CAAGGGGTAC

GGAGGCCTGG

TCAAAAGGTG

CGTCAGTGTC

CATGCTGCGC

GGTTGCACTG

CGCTGCAACA

TCTTCTGGCC

GACCGGGGAG

CACAGCCTTC

GGCAGCGGTA

CAGCATGGAT

CCCCTGCGTC

TACCCGCTAC

TACCATGCAG

CACCACCGCT

AACCTCGAGA

TTARATTATA

CAGACCCCTA

GACTCGGCAC

GAACATTTGT

ATAGAATATC

AAATTACAGT

GACCCTTTGG

GAAATGCTGA

GCCGATCCAG

GCCCTTGTTT

GAGTTGATTG

CTGCAGACCA

GAGTATGTCT

CAGAGGACAT

CTTTCCTTCA

ATCCGCGTGG

TGGGACTGCT

TCAGTGGCTG

ACTCAGGTTT

CACGCCTTCA

TGCCTGAAGC

TTCATGGCCG

GTAGTGGTGT

TTATATCGAC

AGCAGAGTGA

AGCCCCCCAC

TCCACTGTCC

GAGCCGCGCT

CCAACGTGGA

CTCGCCAGAA

AAGAAGAAGG

TCCAGGCCAG

GTTACAAATC

TTTACCCTTG

CTGGGACAGC

AATTCCTGGA

ATAAGGCTGA

ACTGCCCCGC

TGAATGGTGG

TGGGTGGCAC

TGTTCCAGTT

CACACATCAA

ATGTGGAGGT

CCACCACGAC

CCAGCGGCTA

CCAAGTCCCA

CAGGACTGGG

TGCCATTTCT

GTGAAACAGG

GCACAGGAGC

CGTTAATCCC

TCAATTTTGC

GCGAGGACAG

TTCAGGTTGA

CTCCCTACAG

AGCTCCGCAC

CCGAGATCTC

GGAGCTGTGG

GATTGGAGAA

TGCTAATGTC

CCGTGTCCAT

AGGAGAGCTT

TTTGATTATT

ATACCTCCTA

AGAGTTAAAG

GGTTGGTCAT

CACAGCCCCC

ATGTCATGGC

AGTCAAGAAC

AATGACTCCC

CTGGAACGAG

GGTTCATCAG

CCTGGACGAC

CTTACTCATG

GGGTGCCGTG

CCTGTGCTCA

CGCTCTTGGT

ACAGAATAAA

CAGCGTGGCC

AATTCCCGCT

CATGGTTCTG

GAGACTGGAT

ACCTCAGGCC

CAGCCACAGC

GGAGTACGAC

TGTGCAGCCC

GTGGAAGTTG

GAGGCTATGT

CTGACCACAG

GTATACATGT

ATCACAGAAA

ACACCTTTGG

GGTAAACCTC

AAAATAAACT

GGTTACATGG

AACAAAAATT

TTATCCAGAA

AGCACTGGAA

AAGCAAATGT

GACAAAGCGG

AGTGTCGCAC

ATCCTGAAAT

CTCGCCTATG

GGGCTGGCTG

TTGATCGGAA

GTTGGTGTGG

AGAATCCCTT

CTCACGTCCA

CTGCGGGCGT

CTCATTTTTC

ATTTTCTGCT

TACACCGACA

TTTGCCCATG

CCCCACACGC

GTCACCGTGA

780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460

CACAGGACAC

CCCAGTTCTC

CATCTTTTGC

TGATCTTCCT

ACGGGCTGGA

CACAATTCAA

CGAATATCCA

TGTTGGAAGA

AGGGACTTCA

ACAAGAATGG

GCGATAAGCC

TCATTAATCC

CGTATGCTGC

CCGACTACAT

AGTTCCCTTT

AAGTAAGGAC

ACCCCTTCCT

GCGTGGTGTT

CCGGGATCAT

TCATCGGAAT

GAGTGGAGTT

GCAGGGCTGT

CTCTGCTGGG

TTGCTGTGCT

TGCTTTTGTC

TGCCCACACC

ACACGCACAG

GCCTCAGCGA

AAGTGATCGT

CCTCAGCTGC

CGACTCCAGC

TGAGARGCAC

TTTTCTGGGC

CCTTACGGAC

ATACTTTTCT

GCACTTACTT

AAACAAACAG

GGATGCATTT

ATCAGACGAT

CATCGACATC

CAGCGCTTTC

CTCCCAGGCC

GCCTGAAACA

CTACCTCAAC

CATCTGCAGC

CTTCTGGGAG

GGCCTGCACA

TGTGATGGTC

CAAGCTCAGT

CACCGTTCAC

GCTTGCCCTG

AGTGCTGATG

GGCGATCCTC

TTTCTTTGGA

CTCCCCTGAG

CGGGTCTGAT

GGAGCTTCGG

GGAAGCCACA

CAGAGCCCAG

CTCCACTGCC

TATGCTCCTT

TTGCTGGGGG

ATTGTACCTC

TTCTACAACA

TACGACCTAC

CTTCCCAAAA

GACAGTGACT

GGAGTCCTTG

AGCCAGTTGA

TACATCTACC

AACATCCGGC

AGGCTGAGAA

GGGTTGCGGG

AACTATACGA

CAGTACATCG

TTCCTCGTGT

CTGGCGCTGA

GCCGTGCCCG

GTTGCTTTGG

GAGCACATGT

CTGGCGGGAT

ACCATCCTCG

CCATATCCTG

CCACCCCCCA

TCCTCCGACT

CACTACGAGG

GAPLAACCCCG

AGAGCACCAG

TCGAGCCCCC

TCCTCTTGAA

TCAGCCTTTA

GGGAAACCAG

,TGTATATAGT

ACAGGAGTTT

TGTGGCTGCA

GGGAAACCGG

CCTACAAACT

CTAAACAGCG

TGACGGCTTG

CACACCGACC

TCCCGGCAGC

ACACCTCAGA

GCCTGGGGCT

GCCTCCGCCA

GCGCTGTCTT

TGACGGTCGA

TGGTCATCCT

CCTTTCTGAC

TTGCACCCGT

CTGAGTTCGA

GCGTTCTCAA

AGGTGTCTCC

GCGTGGTCCG

CGGAGTATAG

CCCAGCAGGG

TCTTCGCCCA

CTCCACAAGG

CTGTACGAAG

ACCAAAAGCC

TGGCACCACC

AGAATATGAC

CACCCAGAAA

CAGTAACGTG

CTACTTCAGA

GAAAATCATG

CCTGGTGCAA

TCTGGTGGAT

GGTCAGCAAC

AGAATGGGTC

AGAGCCCATC

CTTTGTGGAG

GTCCAGTTAC

CTGGCTGCTG

CCTTCTGAAC

GCTGTTCGGC

GATCGCTTCT

GGCCATCGGC

CCTGGATGGC

CTTCATTGTC

TGGGCTGGTT

AGCCAACGGC

CTTCG' 'G

TTCCC,-

CGCGGC-

CTCCACT-

GACCTGCTCT

TGGACACTCT

AAGGTAGTGG

CGAGTGAGAG

TTTATTGCTG

GCAGACTACC

AAGTATGTCA

GACTGGCTTC

CCAAACAATT

ACCGGCAGCC

GCAGATGGCA

GACCCCGTCG

CACGACAAAG

GAGTATGCCC

GCAATTGAAA

CCCAACGGCT

CTGTTCATCA

CCCTGGACGG

ATGATGGGCC

GTTGGCATAG

GACAAGAACC

GCCGTGTCCA

AGGTATTTCT

TTGCTTCCCG

TTGAACCGCC

CCGCCCGGCC

ACAGTGTCAG

CCTGCCCACC

GTCCATCCCG

2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200

AATCCAGGCA TCACCCACCC TCGAACCCGA GACAGCAGCC CCACCTGGAC TCAGGGTCCC

TGCCTCCCGG

CACCCCTCTA

CTAGCAATAG

CGTCCACTGC

CTTCTGCCTC

CCCGAGGGGG

ACGTGCCTTT

AGGACGTGGA

CTGAAGCAAA

GAAGAGAACT

ACTGTAACCG

TGTGTAATAT

CCAGAGTGTG

ACCAAGCTTC

TTACTTGTAT

TGTTTAAAAT

AACTTTCAAA

CCGGAATT

ACGGCAAGGC

CAGACCGCGC

GGCCCGCTGG

CATGGGCAGC

CGTGACTGTC

ACTCTGCCCA

CCACGTCCGG

ATGCGAGGAG

GAGGCCAAAG

GGTTGGAGTT

ATTGTATTAT

AGGAAGGAAG

GAGGCCACAG

ATTAGTCTTA

AATTCTATGC

ATTTTAAATT

CACGCTATGC

CAGCAGCCCC

AGAGACGCTT

GGCCCTCGCG

TCCGTGCCCG

GCCGTGCACC

GGCTACCCTG

TGTGAGAGGA

AGGCCCCGGG

ATTGGAAACC

ATGGAAAAGA

TTTGTTAAAT

GATGTAAAGT

TGGGGCCTCT

AATTTCAGCA

AAATATTGCT

TGCATATCAC

GTGATAATTT

GCAGGGACCC

TTGAAATTTC

GGGCCCGTTC

GCTACTGCCA

CGCCGCCTGT

AGACTGACCA

GGGATTCGAA

GAAGCAGCTC

CCCCACCCCC

TGCCCTGTGC

ATTTCTATAA

GGTATGATCT

CCGTATTTGT

TATGTTGCTG

TATGTAATAG

AACCCTGTGG

TTTTGTTTAA

CCCCAGAGAA

TACTGAAGGG

TCACAACCCT

GCCCATCACC

CCCTGGGCCT

CGGCCTGTTT

GGTGGAAGTC

CAACTGAGGG

ACCTCTTTCC

CAGGACAGCA

ATATTTAAGA

GGGGCTTCTC

GCATTGGGCT

CTGCTTAAAT

GATTATTTTG

TAGTATGAAA

TGAGCAGATA

GGCTTGTGGC

CATTCTGGCC

CGGAACCCAG

ACTGTGACGG

GGGCGGAACC

GAGGACCCCC

ATTGAGCTGC

TGATTAAAAT

AGAACTGCTT

GTTCATTGTT

GATGTACACA

CACTCCTGCC

CCGTGCCACA

ATTGTATAAT

TAAAGGTTTC

TGTTACTGTT

TGAAGAAAGC

4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040 5100 5160 5220 5280 5288 INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 1447 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: Met Ala Ser Ala Gly Asn Ala Ala Glu Pro Gln Asp Arg Gly Gly Gly 1 5 10 Gly Ser Gly Cys Ile Gly Ala Pro Gly Arg Pro Ala Gly Gly Gly Arg 20 25

Arg Arg Arg Thr Gly Gly Leu Arg Arg Ala Ala Ala Pro Asp Arg Asp 40 Tyr Ile Ala Asn Ala Trp Gln 145 Thr Gin Tyr Leu Pro 225 Leu Thr Tyr His Pro 305 Leu Trp Leu Ser Lye Cys Val Val 130 Lye Pro His Asn Ile 210 Cys Gin Asn Gin Gly 290 Ala Val Gin Arg Gly Gin Lye 100 Leu Val Gly Giu Asp 180 Gin Giu Ile Giy Asp 260 Asp Met Ala Aen Glu Pro Lys Arg Phe Lys Giy Giu Giu 165 Sor Trp Thr Ile Thr 245 Pro Ser Asp Pro Giy 325 Lou Ser Ala 70 Leu Leu Ala Giy Glu 150 Gly Ala Lye Gly Thr 230 Ala Lou Trp Arg Aen 310 Giy Ile Tyr 55 Thr Lou Vai Ala Arg 135 Ala Ala Lou Lou Tyr 215 Pro Tyr Glu Glu Pro 295 Lye Cys Val Cys Gly Phe Val Aen 120 Val Met Asn Gin Giu 200 Met Lou Leu Phe Giu 280 Cys Asn His Gly Asp Arg Lys Gly 105 Leu Ser Phe Val Ala 185 His Asp Asp Lou Lou 265 Met Lou Ser Gly Gly 57 Ala Lye Lou 90 Leu Glu Arg Asn Lou 170 Ser Lou Gin Cys Gly 250 Giu Lou Asn Thr Lou 330 Thr Ala Aia 75 Gly Lou Thr Glu Pro 155 Thr Arg Cyr.

Ile Phe 235 Lye Glu Asn Pro Lye 315 Ser Val Phe Pro Cys Ile Aen Lou 140 Gin Thr Val1 Tyr Ile 220 Trp Pro Lou Lye Ala 300 Pro Arg Lye Ala Lou Tyr Phe Val1 125 Asn Lou Giu His Lys 205 Giu Giu Pro Lye Ala 285 Asp Lou Lye Asn Lou Trp Ile Gly 110 Giu Tyr Met Ala Val Ser Tyr Gly Lou Lye 270 Giu Pro Asp Tyr Ser Giu Lou Gin Ala Giu Thr Ile Lou 175 Tyr Gly Lou Ala Arg 255 Ile Val1 Asp Met Met 335 Thr Gin Arg Lye Phe Leu Arg Gin 160 Lou Met Giu Tyr Lye 240 Trp Asn Gly Cys Ala 320 His Gly

Lys Pro Ile 385 Arg Gin Ser Met Ser 465 Val Ala Asp Lye Gly 545 Met Aia Pro Asp Val :625 Pro Leu Lye 370 Asn Thr Lys Phe Leu 450 Gin Ala Ala Asp Arg 530 Ala Ala Ala Ala Ile 610 Glu Pro Val 355 Gin Trp Tyr Val Ser 435 Ala Gly Ala Thr Val 515 Ile Ser Ala Val Ile 595 Phe Pro Pro 340 Ser Met Asn Val Leu 420 Asp Tyr Ala Gly Thr 500 Phe Pro Val Leu Val 580 Leu Cys Gin Pro Ala Tyr Glu Glu 405 Ser Val Ala Val Leu 485 Gin Leu Phe Ala Ile 565 Val Ser Cys Ala Tyr 645 His Giu Asp 390 Val Phe Ser Cys Gly 470 Gly Vai Leu Giu Leu 550 Pro Val Met Phe Tyr 630 Ser Ala His 375 Lye Val Thr Val Leu 455 Leu Leu Leu Ala Asp 535 Thr Ile Phe Asp Thr 615 Thr Ser Leu 360 Phe Ala His Thr Ile 440 Thr Ala Cys Pro His 520 Arg Ser Pro Asn Leu 600 Ser Asp His Thr Gly Ala Ser 410 Thr Val Leu Val Leu 490 Leu Phe Gly Ser Leu 570 Ala Arg CyB His Phe 650 Met Tyr Ile 395 Val Leu Ala Arg Leu 475 Ile Ala Ser Giu Aen 555 Arg Met Arg Val Asp 635 Ala Phe Giu 380 Leu Ala Asp Ser Trp, 460 Leu Gly Leu Giu Cys 540 Val Ala Val Glu Ser 620 Asn His Gin 365 Tyr G iu Gin Asp Gly 445 Asp Vai Ile Gly Thr 525 Leu Thr Phe Leu Asp 605 Arg Thr Glu 350 Leu Val Ala Asn Ile 430 Tyr Cys Ala Ser Val 510 Gly Lye Ala Ser Leu 590 Arg Val Arg Thr Met Ser Trp Ser 415 Leu Leu Ser Leu Phe 495 Gly Gin Arg Phe Leu 575 Ile Arg Ile Tyr Gin 655 Thr His Gin 400 Thr Lys Leu Lye Ser 480 Asn Val1 Asn Thr Phe 560 Gin Phe Leu Gin Ser 640 Ile Thr Met Gin Ser Thr Val Gin Leu Arg Thr Giu Tyr Asp Pro His Thr 660 665 670 His Pro Thr 705 His Giu Val Thr Thr 785 Tyr His Met Arg Thr 865 Val Ile Ile Asn Arg 945 Leu Vai Val 690 Ser Cys Lys Ile Arg 770 Arg Asn Leu Leu Asp 850 Gly Leu Asp Ile Asp 930 Pro Arg Tyr 675 Thr Ser Leu His Phe 755 Val Giu Met Leu Giu 835 Trp, Lys Ala Ile Asn 915 Pro Giu Ile Tyr Val Thr Giu Tyr 740 Leu Arg Tyr Tyr Tyr 820 Giu Leu Ile Tyr Ser 900 Pro Val Trp, Pro Thr Thr Arg Pro 725 Ala Phe Asp Asp Ile 805 Asp Asn Gin Met Lys 885 Gin Ser Ala Val Ala Thr Gin Asp 710 Pro Pro Leu Gly Phe 790 Val Leu Lys Gly Pro 870 Leu Leu Ala Tyr His 
950 Ala Ala Asp 695 Leu Cys Phe Gly Leu 775 Ile Thr His Gln Leu 855 Asn Leu Thr Phe Ala 935 Asp Glu Glu 680 Thr Leu Thr Leu Leu 760 Asp Ala Gln Arg Leu 840 Gln Asn Val Lys Tyr 920 Ala Lys Pro Pro Leu Ser Lys Leu 745 Leu Leu Ala Lys Ser 825 Pro Asp Tyr Gln Gln 905 Ile Ser Ala Ile Arg Ser Gln Trp 730 Lys Gly Thr Gln Ala 810 Phe Lys Ala Lys Thr 890 Arg Tyr Gln Asp Glu Ser Cys Phe 715 Thr Pro Val Asp Phe 795 Asp Ser Met Phe Asn 875 Gly Leu Leu Ala Tyr 955 Tyr Glu Gln 700 Ser Leu Lys Ser Ile 780 Lys Tyr Asn Trp Asp 860 Gly Ser Val Thr Asn 940 Met Ala Ile 685 Ser Asp Ser Ala Leu 765 Val Tyr Pro Val Leu 845 Ser Ser Arg Asp Ala 925 Ile Pro Gln Ser Pro Ser Ser Lys 750 Tyr Pro Phe Asn Lys 830 His Asp Asp Asp Ala 910 Trp Arg Glu Phe Val Glu Ser Phe 735 Val Gly Arg Ser Ile 815 Tyr Tyr Trp Asp Lys 895 Asp Val Pro Thr Pro Gln Ser Leu 720 Ala Val Thr Glu Phe 800 Gln Val Phe Glu Gly 880 Pro Gly Ser His Arg 960 Phe 965 970 975
Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala Ile Glu
            980                 985                 990
Lys Val Arg Thr Ile Cys Ser Asn Tyr Thr Ser Leu Gly Leu Ser Ser
        995                 1000                1005
Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile Gly Leu
    1010                1015                1020
Arg His Trp Leu Leu Leu Phe Ile Ser Val Val Leu Ala Cys Thr Phe
1025                1030                1035                1040
Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly Ile Ile
                1045                1050                1055
Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met Met Gly
            1060                1065                1070
Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu Ile Ala
        1075                1080                1085
Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu Ala Phe
    1090                1095                1100
Leu Thr Ala Ile Gly Asp Lys Asn Arg Arg Ala Val Leu Ala Leu Glu
1105                1110                1115                1120
His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu Leu Gly
                1125                1130                1135
Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg Tyr Phe
            1140                1145                1150
Phe Ala Val Leu Ala Ile Leu Thr Ile Leu Gly Val Leu Asn Gly Leu
        1155                1160                1165
Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Tyr Pro Glu Val
    1170                1175                1180
Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro Glu Pro
Pro Pro Ser Val Val Arg Phe Ala Met Pro Pro Gly His Thr His Ser
                1205                1210                1215
Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr Val Ser
            1220                1225                1230
Gly Leu Ser Glu Glu Leu Arg His Tyr Glu Ala Gln Gln Gly Ala Gly
        1235                1240                1245
Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro Val Phe
    1250                1255                1260
Ala His Ser Thr Val Val His Pro Glu Ser Arg His His Pro Pro Ser
1265                1270                1275                1280
Asn Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Pro Pro Gly
                1285                1290                1295
Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly Leu Trp
            1300                1305                1310
Pro Pro Leu Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser Thr Glu
        1315                1320                1325
Gly His Ser Gly Pro Ser Asn Arg Ala Arg Trp Gly Pro Arg Gly Ala
    1330                1335                1340
Arg Ser His Asn Pro Arg Asn Pro Ala Ser Thr Ala Met Gly Ser Ser
1345                1350                1355                1360
Val Pro Gly Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser Ala Ser
                1365                1370                1375
Val Thr Val Ala Val His Pro Pro Pro Val Pro Gly Pro Gly Arg Asn
            1380                1385                1390
Pro Arg Gly Gly Leu Cys Pro Gly Tyr Pro Glu Thr Asp His Gly Leu
        1395                1400                1405
Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu Arg Arg Asp
    1410                1415                1420
Ser Lys Val Glu Val Ile Glu Leu Gln Asp Val Glu Cys Glu Glu Arg
1425                1430                1435                1440
Pro Arg Gly Ser Ser Ser Asn
                1445

Claims

1. An isolated nucleic acid sequence encoding a patched gene other than Drosophila patched gene or fragment thereof of at least about 12bp different from the sequence of the Drosophila patched gene, wherein the DNA sequence hybridizes under stringent conditions to any of SEQ ID No: 1, 3, 5, 7, 9, 18, and fragments thereof, and encodes a patched polypeptide which binds a hedgehog polypeptide.

2. A nucleic acid sequence according to claim 1, wherein said patched gene is a mammalian gene.

3. A nucleic acid sequence according to claim 1 for human, mouse, mosquito, butterfly or beetle patched gene.

4. A nucleic acid sequence according to claim 3, wherein said nucleic acid sequence is a human sequence.

5. A nucleic acid sequence according to claim 3, wherein said nucleic acid sequence is a mouse sequence.

6. A nucleic acid sequence according to any one of claims 1 to 5, wherein said nucleic acid sequence is a fragment of at least about 18bp.

7. A nucleic acid sequence according to claim 1 joined to a nucleic acid sequence including a restriction enzyme recognition sequence.

8. An isolated nucleic acid sequence encoding a patched gene other than the Drosophila patched gene or fragment thereof, substantially as hereinbefore described with reference to any one of the Examples.

9. An expression cassette including a transcriptional initiation region functional in an expression host, a nucleic acid sequence according to any one of claims 1 to 8 under the transcriptional regulation of said transcriptional initiation region, and a transcriptional termination region functional in said expression host.

10. An expression cassette according to claim 9, wherein said transcriptional initiation region is heterologous to said nucleic acid sequence according to any one of claims 1 to 8.

11. An expression cassette according to claim 9, wherein said transcriptional initiation region is homologous to said nucleic acid sequence according to any one of claims 1 to 8 and includes the enhancer region.

12. An expression cassette, substantially as hereinbefore described with reference to any one of the Examples.

13. A cell including an expression cassette according to any one of claims 9 to 12 as part of an extrachromosomal element or integrated into the genome of a host cell as a result of introduction of said expression cassette into said host cell and the cellular progeny of said host cell.

14. A cell according to claim 13, further including the patched polypeptide in the cellular membrane of said cell.

15. A cell according to claim 13 or 14, wherein said patched polypeptide is a mouse patched polypeptide.

16. A cell according to claim 13 or 14, wherein said patched gene encodes a human patched polypeptide.

17. A cell according to any one of claims 13 to 16, wherein said transcriptional initiation region is a Drosophila patched gene transcriptional initiation region including the promoter and enhancer joined to a heterologous gene.

18. A cell including an expression cassette including a transcriptional initiation region functional in an expression host, said transcriptional initiation region consisting of a 5' non-coding region regulating the transcription of patched polypeptide encoded by the isolated nucleic acid of claim 1 including the promoter and enhancer, a marker gene, and a transcriptional termination region, as part of an extrachromosomal element or integrated into the genome of a host cell as a result of introduction of said expression cassette into said host, and the cellular progeny thereof.

19. A cell according to claim 18, wherein said transcriptional initiation region is the Drosophila region.

20. A cell including an expression cassette, substantially as hereinbefore described with reference to any one of the Examples.

21. A method for following embryonic development employing the patched polypeptide, said method including: integrating an expression cassette including a transcriptional initiation region functional in embryonic host cells, said transcriptional initiation region consisting of a non-coding region regulating the transcription of a patched polypeptide encoded by the nucleic acid of claim 1, a marker gene, and a transcriptional termination region, wherein said embryonic host cells are capable of developing into a fetus; growing said embryonic host cells, whereby proliferation and differentiation occur; and locating cells expressing the patched polypeptide by means of expression of said marker gene.

22. A method for following embryonic development employing the patched polypeptide, substantially as hereinbefore described with reference to any one of the Examples.

23. A method for producing patched polypeptide, said method including: growing a cell according to any one of claims 13 to 20, whereby said patched polypeptide is expressed; and isolating said patched polypeptide free of other proteins.

24. A method for producing patched polypeptide, substantially as hereinbefore described with reference to any one of the Examples.

25. A method for screening candidate compounds for binding affinity to the patched polypeptide, said method including: combining said candidate compound with a vertebrate or invertebrate cell including said patched protein in the membrane of said cell and an expression cassette including a transcriptional initiation region functional in said cell, a nucleic acid sequence according to any one of claims 1 to 8 including the entire coding sequence under the transcriptional regulation of said transcriptional initiation region, and a transcriptional termination region functional in said cell, expressing said patched polypeptide in said cell; and assaying for the binding of said candidate compound to said patched polypeptide.

26. A method for screening candidate compounds for binding affinity to the patched polypeptide, substantially as hereinbefore described with reference to any one of the Examples.

27. A method for screening candidate compounds for agonist activity with the patched polypeptide, said method including: combining said candidate compound with a vertebrate or invertebrate cell including said patched polypeptide in the membrane of said cell and an expression cassette including a transcriptional initiation region functional in an expression host, said transcriptional initiation region consisting of a 5' non-coding region regulating the transcription of a patched polypeptide encoded by the nucleic acid of claim 1, a marker gene, and a transcriptional termination region, as part of an extrachromosomal element or integrated into the genome of a host cell; and assaying for the expression of said marker gene.

28. A method for screening candidate compounds for agonist activity with the patched polypeptide, substantially as hereinbefore described with reference to any one of the Examples.

29. A monoclonal antibody binding specifically to a patched polypeptide, other than the Drosophila patched polypeptide.

30. A monoclonal antibody binding specifically to a patched polypeptide, substantially as hereinbefore described with reference to any one of the Examples.

31. An isolated nucleic acid including a ptc coding sequence for a naturally occurring vertebrate patched polypeptide, or allelic variant thereof, which binds to a hedgehog polypeptide, wherein the ptc coding sequence hybridizes to the complement of the coding sequence of SEQ ID Nos. 9 or 18 under stringency conditions equivalent to 5 x SSC at

32. The nucleic acid of claim 31, wherein the ptc coding sequence hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 0.1 x SSC at 50°C.

33. The nucleic acid of claim 31 or 32, wherein the ptc coding sequence encodes a mammalian patched polypeptide.

34. The nucleic acid of claim 33, wherein the coding sequence encodes a primate patched polypeptide.

35. The nucleic acid of claim 34, wherein the coding sequence encodes a human patched polypeptide.

36. The nucleic acid of claim 31, wherein the amino acid sequence of the ptc polypeptide is identical to SEQ ID No. 10 or 19.

37. The nucleic acid of claim 31, wherein the ptc coding sequence is identical to SEQ ID No. 9 or 18.

38. An isolated nucleic acid including a coding sequence for a polypeptide including a hedgehog-binding sequence that binds hedgehog polypeptide and which is encoded by a nucleotide sequence that hybridizes to the complement of the coding sequence of SEQ ID Nos. 9 or 18 under stringency conditions equivalent to 5 x SSC at 0 C, wherein the polypeptide retains hedgehog-binding activity.

39. The nucleic acid of claim 38, wherein the hedgehog-binding sequence is encoded by a nucleotide sequence that hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 0.1 x SSC at 0 C.

40. The nucleic acid of claim 38 or 39, wherein the hedgehog-binding sequence is from a mammalian patched polypeptide.

41. The nucleic acid of claim 35, wherein the hedgehog-binding sequence is from a primate patched polypeptide.

42. The nucleic acid of claim 41, wherein the hedgehog-binding sequence is from a human patched polypeptide.

43. The nucleic acid of claim 38, wherein the nucleotide sequence encoding the hedgehog-binding sequence is at least 267 base pairs in length.

44. The nucleic acid of claim 38, wherein the nucleotide sequence encoding the hedgehog-binding sequence is at least 345 base pairs in length.

45. The nucleic acid of claim 44, wherein the hedgehog-binding sequence includes at least 3 extracellular loops.

46. The nucleic acid of claim 38, wherein the hedgehog polypeptide is selected from the group consisting of Sonic hedgehog, Indian hedgehog, and Desert hedgehog.

47. A nucleic acid including: (i) a coding sequence from a naturally occurring vertebrate ptc gene or allelic variant thereof, which coding sequence hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 5 x SSC at 60°C and encodes a polypeptide that binds hedgehog proteins, and (ii) a heterologous transcriptional initiation region controlling the transcription of the coding sequence.

48. The nucleic acid of claim 31 or 38, further including a heterologous transcriptional initiation region that controls transcription of the coding sequence.

49. The nucleic acid of claim 48, wherein the transcription initiation region is inducible.

50. The nucleic acid of claim 48, wherein the transcription initiation region includes a promoter.

51. A cell including and expressing the nucleic acid of any one of claims 23, 32, 47 or

52. The cell of claim 51, which cell is a eukaryotic cell.

53. The cell of claim 52, which cell is a mammalian cell.

54. The nucleic acid of claim 43, wherein the nucleotide sequence encoding the hedgehog-binding sequence is at least 1 Kbp in length.

55. A nucleic acid including: (i) a coding sequence for a polypeptide including a hedgehog-binding sequence that binds hedgehog polypeptide and is encoded by a nucleotide sequence that hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 5 x SSC at 60°C, and (ii) a heterologous transcriptional initiation region controlling the transcription of the coding sequence, wherein the polypeptide retains hedgehog-binding activity.

56. The nucleic acid of claim 47, wherein the coding sequence hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 0.1 x SSC at 50°C.

57. The nucleic acid of claim 47, wherein the amino acid sequence of the ptc polypeptide is identical to SEQ ID No. 10 or 19.

58. The nucleic acid of claim 47, wherein the coding sequence is identical to SEQ ID No. 9 or 18.

59. The nucleic acid of claim 55, wherein the hedgehog-binding sequence is encoded by a nucleotide sequence that hybridizes to the complement of the coding sequence of SEQ ID No. 9 or 18 under stringency conditions equivalent to 0.1 x SSC at 60°C.

60. The nucleic acid of claim 55, wherein the coding sequence encodes a mammalian patched polypeptide.

61. The nucleic acid of claim 60, wherein the coding sequence encodes a primate patched polypeptide.

62. The nucleic acid of claim 61, wherein the coding sequence encodes a human patched polypeptide.

63. A method for assessing a genetic predisposition of an animal for a basal cell carcinoma, the method including: detecting, from a sample of nucleic acid isolated from the animal, a loss-of-function mutation in a patched gene in the germline of said animal, wherein the presence of said loss-of-function mutation indicates that said animal has a genetic predisposition for basal cell carcinoma.

64. The method according to claim 63, wherein said sample of nucleic acid is from a biopsy of cells isolated from the animal.

65. A method for determining a patched phenotype of cells of a tumor including detecting, from a sample of nucleic acid isolated from the cells, the presence or absence of a loss-of-function mutation of a patched gene in the cells.

66. The method according to claim 65, wherein the tumor is a carcinoma.

67. The method according to claim 66, wherein said carcinoma is a basal cell carcinoma.

68. The method according to claim 65, wherein nucleic acid is from a biopsy of cells of said tumor.

69. An assay for determining the patched phenotype of a cell, including providing a nucleic acid sample isolated from mammalian cells, detecting the presence or absence of a patched gene sequence or allelic variant thereof by hybridization of the nucleic acid sample with one or more nucleic acid probes which hybridize to a mammalian patched gene.

70. The assay of claim 69, wherein hybridization of the probe(s) further includes subjecting the probe(s) and nucleic acid sample to an amplification process and detecting abnormalities in an amplified product.

71. The assay of claim 70, wherein the amplification process is polymerase chain reaction (PCR).

72. The assay of claim 69, wherein the probe(s) hybridizes to SEQ ID No. 9 or 18 under stringency conditions equivalent to 10 x SSC at 50°C to SEQ ID No. 9 or 18.

73. The assay of claim 69, wherein the probe(s) hybridizes to SEQ ID No. 9 or 18 under stringency conditions equivalent to 5 x SSC at

74. The assay of claim 69, wherein the probe(s) hybridizes to SEQ ID No. 9 or 18 under stringency conditions equivalent to 0.1 x SSC at

75. The assay of claim 69, wherein the probe(s) further includes a label group attached to the nucleic acid which is capable of being detected.

76. The assay of claim 69, wherein the probe(s) is at least 15 nucleotides in length.

77. The assay of claim 69, wherein the probe(s) is at least 50 nucleotides in length.

78. The assay of claim 69, wherein the probe(s) is 15-100 nucleotides in length.

79. An assay for detecting mutations in a patched gene, including detecting, in a sample of isolated mammalian cells or nucleic acid isolated therefrom, the presence or absence of a deletion of one or more nucleotides from the patched gene, an addition of one or more nucleotides to the patched gene, a substitution of one or more nucleotides of the patched gene, a chromosomal rearrangement of all or a portion of the patched gene, an alteration in the level of an mRNA transcript of the patched gene, or alteration of the splicing pattern of an mRNA transcript of the patched gene.

80. The assay of claim 79, wherein all or a portion of the patched gene is amplified by an amplification process and abnormalities in an amplified product, if any, are detected.

81. The assay of claim 80, wherein the amplification process is polymerase chain reaction (PCR).

82. The assay of claim 79, wherein mutations of the patched gene are detected by single strand conformational polymorphism analysis.

83. The assay of claim 79, wherein mutations of the patched gene are detected by gel electrophoresis.

84. The assay of claim 79, wherein mutations of the patched gene are detected by digestion with one or more endonucleases.

85. An assay for phenotyping the patched status of a cell, including detecting, in a sample of isolated mammalian cells, the presence or absence of a genetic lesion of a patched gene characterized by at least one of (i) an aberrant mutation of a patched gene resulting in loss of function, and (ii) mis-expression of the patched gene resulting in loss of function.

86. The assay of claim 85, which assay includes:
i. providing one or more nucleic acid probes including a region of nucleotide sequence which hybridizes to a sense or antisense sequence of the patched gene, or naturally occurring mutants thereof or 5' or 3' flanking sequence naturally associated with the gene;
ii. combining the probe(s) with a nucleic acid sample from the cells; and
iii. detecting, by hybridization of the probe(s) to the nucleic acid, the presence or absence of a deletion of one or more nucleotides from the patched gene, an addition of one or more nucleotides to the patched gene, a substitution of one or more nucleotides of the patched gene, a chromosomal rearrangement of all or a portion of the patched gene, an alteration in the level of an mRNA transcript of the patched gene, or an alteration of the wild type splicing pattern of an mRNA transcript of the patched gene.

87. The assay of claim 86, wherein the probe(s) hybridizes to a sequence designated by SEQ ID No. 9 or 18 under stringency conditions equivalent to 10 x SSC at

88. The assay of claim 86, wherein the probe(s) hybridizes to a sequence designated by SEQ ID No. 9 or 18 under stringency conditions equivalent to 5 x SSC at 0 C.

89. The assay of claim 86, wherein the probe(s) further includes a label group attached to the nucleic acid which is capable of being detected.

90. The assay of claim 86, wherein all or a portion of the patched gene is amplified by an amplification process and abnormalities in an amplified product, if any, are detected.

91. The assay of claim 90, wherein the amplification process is polymerase chain reaction (PCR).

92. The assay of claim 85, wherein mutations of the patched gene are detected by single strand conformational polymorphism analysis.

93. The assay of claim 85, wherein mutations of the patched gene are detected by gel electrophoresis.

94. The assay of claim 85, wherein mutations of the patched gene are detected by digestions with one or more endonucleases.

95. The assay of claim 85, wherein detecting the lesion includes ascertaining, relative to a wild-type level of hedgehog-dependent patched signal transduction, the ability of cells in the cell sample to respond to hedgehog induction.

96. The assay of claim 63, 65, 69, 79 or 85, wherein the cell sample is obtained from a human patient.

97. The assay of claim 96, wherein the cell sample is obtained from a biopsy.

98. The assay of claim 96, wherein the biopsy is obtained from a carcinoma, meningioma, medulloma or fibroma.

99. The assay of claim 69, wherein the nucleic acid sample is an mRNA sample from the mammalian cells.

100. The assay of claim 69, wherein the nucleic acid sample is a cDNA sample reverse transcribed from mRNA of the mammalian cells.

101. The assay of claim 69, wherein the nucleic acid sample is genomic DNA from the mammalian cells.

102. The method of claim 69, which detects loss of heterozygosity in a patched gene of the mammalian cells.

103. A method for assessing a genetic predisposition of an animal for developing basal cell nevus syndrome, the method including: detecting, from a sample of nucleic acid isolated from the animal, a loss-of-function mutation in a patched gene in the germline of said animal, wherein the presence of said loss-of-function mutation indicates that said animal has a genetic predisposition for developing basal cell nevus syndrome.

104. The method according to claim 103, wherein said sample of nucleic acid is from a biopsy of cells isolated from the animal.

105. The assay of claim 69, wherein the sequence of the detected patched gene is determined.

106. The assay of claim 105, wherein the presence or absence of a deletion of one or more nucleotides from the patched gene, or addition of one or more nucleotides to the patched gene, or a substitution of one or more nucleotides of the patched gene is determined from the sequence.

107. The assay of claim 85, wherein detecting said lesion includes ascertaining, from a methylation pattern of said patched gene, the presence or absence of aberrant methylation of said patched gene.

108. The assay of claim 107, wherein the methylation pattern of said patched gene is determined by combining nucleic acid of said cell sample with one or more methylation-sensitive restriction endonucleases and determining the restriction digest pattern of at least a portion of said patched gene.

109. The assay of claim 86, wherein detecting said lesion includes detecting the presence or absence of a non-wild type level of a patched protein product of said patched gene in cells of said cell sample.

110. The assay of claim 109, wherein the level of said patched protein is detected in an immunoassay.

111. A method for diagnosing a genetic predisposition of an animal for at least one of a developmental abnormality or a proliferative disorder marked by aberrant expression or activity of a patched gene or gene product, the method including detecting the presence of a predisposing mutation in a patched gene in cells of said animal, wherein the presence of said predisposing mutation indicates that said individual has a genetic predisposition for at least one of developmental abnormalities or a proliferative disorder.

112. The method of claim 111, wherein said genetic predisposition is a basal cell nevus syndrome.

113. The method of claim 111, wherein said genetic predisposition is a predisposition for developing a carcinoma.

114. The method of claim 111, wherein said genetic predisposition is a predisposition for developing a meningioma.

115. The method of claim 111, wherein said genetic predisposition is a predisposition for developing a medulloma.

116. The method of claim 111, wherein said genetic predisposition is a predisposition for developing a fibroma.

117. The method of claim 111, wherein said detecting step includes analyzing a nucleic acid sample obtained from said animal.

118. The method of claim 111, wherein said detecting step includes functional analysis of patched protein function.

119. The method of claim 111, wherein said detecting step includes detecting antibody binding to abnormal patched protein.

120. A method for characterizing the phenotype of a tumor, including detecting the presence of an oncogenic patched mutation in cells of the tumor, wherein the presence of said oncogenic mutation indicates that said tumor has a patched-associated phenotype.

121. The method of claim 120, wherein said tumor is a carcinoma.

122. The method of claim 121, wherein said carcinoma is a basal cell carcinoma.

123. The method of claim 120, wherein said tumor is a meningioma.

124. The method of claim 120, wherein said tumor is a medulloma.

125. The method of claim 120, wherein said tumor is a fibroma.

126. The method of claim 120, wherein said oncogenic patched mutation is detected by analyzing DNA of said tumor.

127. The method of claim 120, wherein said oncogenic patched mutation is detected by analyzing mRNA of said tumor.

128. The method of claim 120, wherein said detecting step includes functional analysis of patched protein function.

129. The method of claim 120, wherein said detecting step includes detecting antibody binding to abnormal patched protein.

130. A genetically engineered mammalian cell predisposed to develop a proliferative phenotype as a result of transfection of said mammalian cell with at least one nucleic acid construct which inhibits expression of an endogenous patched gene or alters the signal transduction activity of a wild-type patched protein.

131. The cell of claim 130, wherein the cell develops a carcinoma phenotype.

132. The cell of claim 130, wherein the cell develops a basal cell carcinoma phenotype.

133. The cell of claim 130, wherein the cell develops a meningioma phenotype.

134. The cell of claim 130, wherein the cell develops a medulloma phenotype.

135. The cell of claim 130, wherein the cell develops a fibroma phenotype.

136. A method for treating an animal having a disorder characterized by loss-of-function of a patched gene, including transfecting cells of the animal with an expression construct encoding a patched polypeptide.

137. The method of claim 136, wherein the cells are transfected in vivo.

138. The method of claim 136, wherein the cells are transfected in vitro.

139. The method of claim 136, wherein the expression construct is a viral vector.

140. The method of claim 136, wherein the transfected cells include epithelial cells.

141. The method of claim 136, wherein the transfected cells include neuronal cells.

142. The method of claim 136, wherein the transfected cells include carcinoma cells.

143. The method of claim 142, wherein the carcinoma cells are basal cell carcinoma cells.

144. The method of claim 136, wherein the transfected cells include meningioma cells.

145. The method of claim 136, wherein the transfected cells include medulloma cells.

146. The method of claim 141, wherein the transfected cells include fibroma cells.

147. A method for treating an animal having a disorder characterized by loss-of-function of a patched gene, including administering to the animal an agent which inhibits derepression of one or more patched-dependent genes.

148. A genetically engineered non-human mammal predisposed to develop a proliferative phenotype as a result of transfection of said mammalian cell with at least one nucleic acid construct which inhibits expression of an endogenous patched gene or alters the signal transduction activity of a wild-type patched protein.

149. The mammal of claim 148, which is predisposed to develop a carcinoma phenotype.

150. The mammal of claim 148, which is predisposed to develop a basal cell carcinoma phenotype.

151. The mammal of claim 148, which is predisposed to develop a meningioma phenotype.

152. The mammal of claim 148, which is predisposed to develop a medulloma phenotype.

153. The mammal of claim 148, which is predisposed to develop a fibroma phenotype.

154. A method for treating an animal having a disorder characterized by loss-of-function of a wild-type patched gene, including transfecting cells of the animal with an expression construct encoding a patched polypeptide, which construct functionally replaces the wild-type patched gene.

155. The method of claim 154, wherein the cells are transfected in vivo.

156. The method of claim 154, wherein the cells are transfected in vitro.

157. The method of claim 155, wherein the expression construct is a viral vector.

158. The method of claim 155, wherein the transfected cells include epithelial cells.

159. The method of claim 155, wherein the transfected cells include neuronal cells.

160. The method of claim 155, wherein the transfected cells include carcinoma cells.

161. The method of claim 160, wherein the carcinoma cells are basal cell carcinoma cells.

162. The method of claim 155, wherein the transfected cells include meningioma cells.

163. The method of claim 155, wherein the transfected cells include medulloma cells.

164. The method of claim 155, wherein the transfected cells include fibroma cells.

165. A method for treating an animal having a disorder characterized by loss-of-function of a patched gene, including administering to the animal an agent which inhibits derepression of one or more patched-dependent genes.

166. A transgenic non-human mammal predisposed to develop a proliferative phenotype as a result of transfection of cells of said mammal with at least one nucleic acid construct which inhibits expression of an endogenous patched gene or alters the signal transduction activity of a wild-type patched protein.

167. The animal of claim 166, wherein the cell develops a carcinoma phenotype.

168. The cell of claim 167, wherein the cell develops a basal cell carcinoma phenotype.

169. The cell of claim 166, wherein the cell develops a meningioma phenotype.

170. The cell of claim 166, wherein the cell develops a medulloma phenotype.

171. The cell of claim 166, wherein the cell develops a fibroma phenotype.

172. A method for screening for a patched agonist or antagonist including the steps of: a) combining a patched polypeptide or bioactive fragment thereof, a patched binding partner and a test compound under conditions wherein, but for the test compound, the patched polypeptide and patched binding partner are able to interact; and b) detecting the extent to which, in the presence of the test compound, a patched polypeptide/patched binding partner complex is formed, wherein an increase or decrease in the amount of complex formed in the presence of the compound relative to complex formed in the absence of the compound indicates that the compound interacts with a patched polypeptide or patched binding partner.

173. A method of claim 172, wherein the patched polypeptide is a human patched polypeptide.

174. A method of claim 172, wherein the patched polypeptide is encoded by a wildtype patched gene.

175. A method of claim 172, wherein the patched polypeptide is encoded by a mutant patched gene.

176. A method of claim 172, wherein the agonist or antagonist is selected from the group consisting of: a protein, peptide, peptidomimetic, small molecule or nucleic acid.

177. A method of claim 176, wherein the nucleic acid is selected from the group consisting of: a patched gene, an antisense, ribozyme and triplex nucleic acid.

178. A method of claim 172, wherein the compound inhibits hedgehog dependent patched signaling.

179. A method of claim 172, which additionally includes the step of preparing a pharmaceutical composition from the compound.

180. A method for identifying a compound that modulates hedgehog-dependent signal transduction, including the steps of: contacting an appropriate amount of the compound with a cell or cellular extract, which expresses a patched gene; and determining the resulting patched bioactivity, wherein an increase or decrease in the patched bioactivity in the presence of the compound as compared to the bioactivity in the absence of the compound indicates that the compound is a modulator of a patched bioactivity.

181. A method of claim 180, wherein the patched gene is a human patched gene.

182. A method of claim 180, wherein the patched gene is a wildtype gene.

183. A method of claim 180, wherein the patched gene is a mutant gene.

184. A method of claim 180, wherein the modulator is an agonist of a patched bioactivity.

185. A method of claim 180, wherein the modulator is an antagonist of a patched bioactivity.

186. A method of claim 180, wherein in step the patched bioactivity is determined by determining the expression level of a patched gene.

187. A method of claim 186, wherein the expression level is determined by detecting the amount of mRNA transcribed from a patched gene.

188. A method of claim 186, wherein the expression level is determined by detecting the amount of patched gene product produced.

189. A method of claim 188, wherein the expression level is determined using an anti-patched antibody in an immunodetection assay.

190. A method of claim 180, which additionally includes the step of preparing a pharmaceutical composition from the compound.

191. A method of claim 180, wherein said cell is contained in an animal.

192. A method of claim 191, wherein the animal is transgenic.

193. A method of claim 192, wherein the transgenic animal contains a human patched gene.

194. A method for identifying whether a test compound is a patched binding partner or measuring the strength of an interaction between a patched polypeptide and said patched binding partner including: allowing (i) a first molecule including a patched polypeptide operably linked to a heterologous DNA binding domain to interact with (ii) a second molecule comprising a test compound operably linked to a polypeptide transcriptional activation domain and (iii) a hybrid reporter gene including a nucleic acid encoding a reporter operably linked to a DNA sequence comprising a binding site for said heterologous DNA binding domain; and detecting or measuring the expression of the hybrid reporter gene as an indication of the existence or strength of an interaction between the first molecule and the second molecule, wherein high levels of hybrid reporter expression indicate a strong interaction between patched and said test molecule, thereby identifying a test compound which is a patched binding partner.

195. The method of claim 194, wherein said second molecule is encoded by a nucleic acid and includes a test polypeptide operably linked to a polypeptide transcriptional activation domain, and which further includes the step of isolating the nucleic acid encoding said second molecule from a cell expressing the hybrid reporter gene.

196. A method for identifying a molecule which is a downstream or an upstream component of a patched biochemical pathway or for measuring the strength of the interaction between a patched biochemical pathway component and a patched binding partner including: allowing (i) a first molecule comprising a patched binding partner polypeptide operably linked to a heterologous DNA binding domain to interact with (ii) a second molecule including a test molecule operably linked to a polypeptide transcriptional activation domain and (iii) a hybrid reporter gene including a nucleic acid encoding a reporter operably linked to a DNA sequence comprising a binding site for said heterologous DNA binding domain; and detecting or measuring the expression of the hybrid reporter gene as an indication of the existence or strength of an interaction between the first molecule and the second molecule, wherein high levels of hybrid reporter expression indicate a strong interaction between a patched binding pair and said test molecule, thereby identifying a test molecule which is a downstream or an upstream component of the patched biochemical pathway.

197. The method of claim 196, wherein said second molecule is encoded by a nucleic acid and includes a test polypeptide operably linked to a polypeptide transcriptional activation domain, and which further includes the step of isolating the nucleic acid encoding said second molecule from a cell expressing the hybrid reporter gene.

198. Use of an expression construct encoding a patched polypeptide for transfecting cells in the treatment of an animal having a disorder characterised by loss-of-function of a patched gene.

199. An expression construct encoding a patched polypeptide when used for transfecting cells of an animal having a disorder characterised by loss-of-function of a patched gene.

200. The use or construct of claim 198 or 199 respectively wherein the cells are transfected in vivo.

201. The use or construct of claim 198 or 199 respectively wherein the cells are transfected in vitro.

202. The use or construct of claim 198 or 199 respectively wherein the expression construct is a viral vector.

203. The use or construct of claim 198 or 199 respectively wherein the transfected cells include epithelial cells.

204. The use or construct of claim 198 or 199 respectively wherein the transfected cells include neuronal cells.

205. The use or construct of claim 198 or 199 respectively wherein the transfected cells include carcinoma cells.

206. The use or construct of claim 205 wherein the carcinoma cells are basal cell carcinoma cells.

207. The use or construct of claim 198 or 199 respectively wherein the transfected cells include meningioma cells.

208. The use or construct of claim 198 or 199 respectively wherein the transfected cells include medulloma cells.

209. The use or construct of claim 198 or 199 respectively wherein the transfected cells include fibroma cells.

210. Use of an agent which inhibits derepression of one or more patched-dependent genes in the treatment of an animal having a disorder characterised by loss-of-function of a patched gene.

211. An agent which inhibits derepression of one or more patched-dependent genes when used in the treatment of an animal having a disorder characterized by loss-of-function of a patched gene.

212. The use of an expression construct encoding a patched polypeptide which construct functionally replaces the wild-type patched gene in the treatment of an animal having a disorder characterized by loss-of-function of a wild-type patched gene.

213. An expression construct encoding a patched polypeptide which construct functionally replaces the wild-type patched gene when used in the treatment of an animal having a disorder characterized by loss-of-function of a wild-type patched gene.

214. The use or construct of claim 212 or claim 213 respectively wherein the cells are transfected in vivo.

215. The use or construct of claim 212 or claim 213 respectively wherein the cells are transfected in vivo.

216. The use or construct of claim 212 or claim 213 respectively wherein the expression construct is a viral vector.

217. The use or construct of claim 212 or claim 213 respectively wherein the transfected cells include epithelial cells.

218. The use or construct of claim 212 or claim 213 respectively wherein the transfected cells include neuronal cells.

219. The use or construct of claim 212 or claim 213 respectively wherein the transfected cells include carcinoma cells.

220. The use or construct of claim 219 wherein the carcinoma cells are basal cell carcinoma cells.

221. The use or construct of claim 212 or claim 213 respectively wherein the transfected cells include meningioma cells.

222. The use or construct of claim 212 or claim 213 respectively wherein the transfected cells include medulloma cells.

223. The use or construct of claim 212 or claim 213 respectively wherein the transfected cells include fibroma cells.

224. A nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 75% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 19, and fragments thereof.

225. A nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 85% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 19, and fragments thereof.

226. A nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 90% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 19, and fragments thereof.

227. A nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 95% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 19, and fragments thereof.

228. A nucleic acid including a nucleic acid sequence encoding an amino acid sequence that binds a naturally occurring hedgehog polypeptide, wherein the amino acid sequence is at least 98% identical to a sequence selected from SEQ ID No. 2, 4, 6, 8, 19, and fragments thereof.

229. A nucleic acid including a nucleic acid sequence that hybridizes under stringent conditions to a sequence selected from SEQ ID Nos. 1, 3, 5, 7, 9, 18, and fragments thereof, wherein the nucleic acid sequence encodes an amino acid sequence that binds a naturally occurring hedgehog polypeptide, and wherein the nucleic acid sequence is not identical to SEQ ID No.

230. The nucleic acid sequence according to claim 229, wherein the nucleic acid sequence is a mammalian nucleic acid sequence.

231. The nucleic acid sequence according to claim 229, wherein the nucleic acid sequence is a human, mouse, mosquito, butterfly or beetle nucleic acid sequence.

232. The nucleic acid sequence according to claim 229, wherein said nucleic acid sequence is a human nucleic acid.

233. The nucleic acid sequence according to claim 229, wherein said nucleic acid sequence is a mouse nucleic acid.

234. The nucleic acid sequence according to claim 229, wherein said nucleic acid sequence is a fragment of at least about 18 bp.

235. The nucleic acid sequence according to claim 229 joined to a nucleic acid sequence comprising a restriction enzyme recognition sequence.

Dated 26 June, 2002
The Board of Trustees of the Leland Stanford Junior University
Patent Attorneys for the Applicant/Nominated Person
SPRUSON FERGUSON