WO1998046744A1 - Intracellular interactors and eh domain binding specificity - Google Patents

Info

Publication number
WO1998046744A1
Authority
WO
WIPO (PCT)
Application number
PCT/IT1998/000077
Other languages
French (fr)
Inventor
Anna Elisabetta Salcini
Margherita Doria
Pier Giuseppe Pelicci
Pier Paolo Di Fiore
Original Assignee
Istituto Europeo Di Oncologia S.R.L.
Application filed by Istituto Europeo Di Oncologia S.R.L.
Priority to AU70775/98A (AU7077598A)
Publication of WO1998046744A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C07K14/82: Translation products from oncogenes
    • C07K2319/00: Fusion polypeptide
    • C07K2319/01: Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02: Fusion polypeptide containing a localisation/targetting motif containing a signal sequence

Definitions

  • The present invention relates to proteins and peptides that are able to bind to proteins containing the EH domain, in particular to the signal transducers eps15 and eps15R, which contain this domain; it also relates to polynucleotides encoding such proteins.
  • Cellular functions, including proliferation, differentiation, cytoskeleton organization and apoptosis, are regulated through a complex intracellular network of signal transducers.
  • The discovery of specific pathways that transduce proliferation signals from the cell surface to the nucleus is critical to understanding, and thus controlling, cellular growth and oncogenesis.
  • Growth factors such as epidermal growth factor (EGF) and platelet-derived growth factor (PDGF) play an important role in cellular proliferation. Growth factors bind to receptors on the cell surface, many of which have tyrosine kinase activity (receptor tyrosine kinases, RTK).
  • EGF epidermal growth factor
  • PDGF platelet derived growth factor
  • eps15: This protein, termed eps15, was found to be phosphorylated on tyrosine upon activation of EGFR and PDGFR (Fazioli et al., Mol. Cell. Biol. 13, 5814-5828, 1993). Eps15 was shown to be involved in the control of cellular proliferation, based on the observation that its overexpression induces transformation of NIH-3T3 cells, albeit with low efficiency (Fazioli et al., supra).
  • Eps15R is a subsequently identified protein that displays sequence and structural homology to eps15 but is a distinct protein (Wong et al., Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995). Eps15 and eps15R were recently found to contain, at their N-terminus, three copies of a novel functional domain called EH (for Eps15 Homology), which functions as a protein:protein interaction surface (Fig. 1) (Wong et al., Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995).
  • h-NUMB, which has the amino acid sequence reported as SEQ ID NO: 1
  • h-NUMB-R, which has the amino acid sequence reported as SEQ ID NO: 2
  • h-RAB-R, which has the amino acid sequence reported as SEQ ID NO: 4; ehb3, which consists in part or entirely of the amino acid sequence reported as SEQ ID NO: 5; ehb10, which consists in part or entirely of the amino acid sequence reported as SEQ ID NO: 6; ehb21, which consists in part or entirely of the amino acid sequence reported as SEQ ID NO: 7; or their functional derivatives.
  • The invention further comprises isolated or purified polynucleotides that operably encode the above-mentioned proteins h-NUMB, h-NUMB-R, h-RAB-R, ehb3, ehb10 and ehb21, or their functional derivatives.
  • Such polynucleotides include the following nucleotide sequences: SEQ ID NO: 8, which encodes the h-NUMB protein; SEQ ID NO: 9, which encodes the h-NUMB-R protein; and SEQ ID NO: 11, which encodes the h-RAB-R protein.
  • A polynucleotide encoding the ehb3 protein, which contains the partial nucleotide sequence reported as SEQ ID NO: 12
  • Polynucleotides operably encoding the ehb10 protein, which contain the partial nucleotide sequence reported as SEQ ID NO: 13
  • A polynucleotide operably encoding the ehb21 protein, which contains the partial sequence reported as SEQ ID NO: 14.
  • The present invention includes mRNA transcripts encoding the amino acid sequences of the proteins that are the objects of the present invention.
  • The invention further comprises expression vectors for producing the proteins of the present invention in both eukaryotic and prokaryotic cells, and cell lines containing vectors that express these proteins.
  • The term "fragment" refers to any portion of the proteins of the present invention.
  • The term "variants" refers to molecules that are similar overall to the complete protein of the invention or to a fragment of it.
  • Variants include deletions, insertions and/or substitutions of residues in the amino acid sequence.
  • The term "analogue" refers to a molecule that does not occur in nature and is substantially similar to the native protein or to a fragment of it.
  • The term "chemical derivative" refers to proteins and peptides of the present invention containing chemical groups that are not normally part of the protein.
  • Such "chemical derivatives" are obtained by chemically modifying specific amino acid residues with an organic derivatizing agent, known to the person skilled in the art, that is capable of reacting with selected side chains or terminal residues. Such modifications may improve the solubility, absorption, biological half-life and the like of the proteins and peptides of the invention. Such modifications are reported, for example, in Remington's Pharmaceutical Sciences, 16th ed., Mack Publishing Co., Easton, PA (1980).
  • Cysteinyl residues most commonly are reacted with alpha-haloacetates (and corresponding amines), such as chloroacetic acid or chloroacetamide, to give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues also are derivatized by reaction with bromotrifluoroacetone, alpha-bromo-beta-(5-imidazolyl)propionic acid, chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide, p-chloromercuribenzoate, 2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole.
  • Histidyl residues are derivatized by reaction with diethyl pyrocarbonate at pH 5.5-7.0, because this agent is relatively specific for the histidyl side chain.
  • Para-bromophenacyl bromide also is useful; the reaction is preferably performed in 0.1 M sodium cacodylate at pH 6.0.
  • Lysinyl and amino-terminal residues are reacted with succinic or other carboxylic acid anhydrides. Derivatization with these agents has the effect of reversing the charge of the lysinyl residues.
  • Other suitable reagents for derivatizing alpha-amino-containing residues include imidoesters such as methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride; trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and transaminase-catalyzed reaction with glyoxylate.
  • Arginyl residues are modified by reaction with one or several conventional reagents, among them phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione and ninhydrin. Derivatization of arginine residues requires that the reaction be performed under alkaline conditions because of the high pKa of the guanidine functional group. Furthermore, these reagents may react with the groups of lysine as well as the arginine epsilon-amino group.
  • Carboxyl side groups are selectively modified by reaction with carbodiimides (R'-N=C=N-R'), such as 1-cyclohexyl-3-(2-morpholinyl-(4-ethyl))carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl)carbodiimide.
  • carbodiimides R'-N=C=N-R'
  • Aspartyl and glutamyl residues are converted to asparaginyl and glutaminyl residues by reaction with ammonium ions.
  • Glutaminyl and asparaginyl residues are frequently deamidated to the corresponding glutamyl and aspartyl residues. Alternatively, these residues are deamidated under mildly acidic conditions. Either form of these residues falls within the scope of this invention.
  • Derivatization with bifunctional agents is useful for cross-linking the peptide to a water insoluble support matrix or to other macromolecular carriers.
  • Commonly used cross-linking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters (for example, esters with 4-azidosalicylic acid), homobifunctional imidoesters, including disuccinimidyl esters such as 3,3'-dithiobis(succinimidylpropionate), and bifunctional maleimides such as bis-N-maleimido-1,2-octane.
  • Derivatizing agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate yield photoactivatable intermediates that are capable of forming crosslinks in the presence of light.
  • Reactive water-insoluble matrices, such as cyanogen bromide-activated carbohydrates and the reactive substrates described in US-A-3,969,287; 3,691,016; 4,195,128; 4,247,642; 4,229,537; and 4,330,440, are employed for protein immobilization.
  • isolated and/or purified: The proteins of the present invention are referred to as isolated and/or purified, where the term "isolated" denotes that the material has been removed from its original environment and the term "purified" is relative to the material in its natural state and does not mean absolute purity.
  • The term "protein" denotes a polypeptide with a molecular weight between about 5,000 and more than 150,000 daltons.
  • The term "peptide" denotes a polymer of amino acids with a sequence of fewer than 50 amino acids.
  • The concentration of isolated and/or purified proteins is preferably at least 1 µg/ml.
  • The present invention includes the discovery that proteins able to bind to proteins containing at least one EH domain, in particular eps15 and eps15R, possess an Asparagine-Proline-Phenylalanine (NPF) motif. It was also shown that the already known h-RAB protein, whose amino acid sequence is reported as SEQ ID NO: 3 and which contains an NPF motif, is able to bind to proteins containing the EH domain. Furthermore, peptides containing the NPF motif were shown to bind to proteins containing the EH domain.
  • NPF Asparagine-Proline-Phenylalanine
  • The present invention therefore further comprises the complex between a protein containing at least one EH domain, in particular eps15 and eps15R, and a protein containing an NPF motif, in particular the proteins of this invention and their functional derivatives.
  • Another aspect of the invention consists of peptides containing at least one NPF motif that are able to bind to a protein containing at least one EH domain, in particular to eps15 and eps15R.
  • Such peptides preferably belong to the proteins of the invention. More preferably, such peptides have the sequences reported as SEQ ID NO: 15-79.
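Locating NPF motifs in a candidate sequence is a simple string search. The following is a minimal Python sketch, not a method described in the patent; the function name `find_npf_motifs` and the `flank` parameter are hypothetical, positions are reported 1-based, and the example peptide in the usage note is made up.

```python
import re

def find_npf_motifs(seq, flank=3):
    """Return (1-based position of N, local context) for each NPF motif in a protein sequence."""
    seq = seq.upper()
    hits = []
    # NPF cannot overlap itself (N != F), so a plain finditer misses no occurrence.
    for m in re.finditer("NPF", seq):
        start = m.start()
        context = seq[max(0, start - flank):start + 3 + flank]
        hits.append((start + 1, context))
    return hits
```

For example, `find_npf_motifs("GSSNPFRSTNPFL")` reports two motifs, at positions 4 and 10, each with up to three flanking residues on either side.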
  • the invention further comprises chemical derivatives of such peptides, the term "chemical derivative" having the meaning previously reported.
  • Another object of the invention consists of the fusion proteins encompassing a peptide containing an NPF motif, in particular GST-fusion proteins.
  • a further object of the invention is a method for identifying and purifying proteins containing at least one EH domain based on the use of a fusion protein encompassing a peptide containing an NPF motif.
  • Another object of the present invention is a method for purifying an EH-containing protein from a complex mixture, which comprises: a) incubating the complex mixture with a solid-phase support to which a peptide containing an NPF motif is bound, allowing the protein to form a complex with the peptide bound to the solid support; b) removing substances not complexed to the peptide bound on the solid support; c) eluting the protein complexed to the solid support.
  • Another aspect of the invention consists in purified antibodies against the proteins of the invention, including both monoclonal and polyclonal antibodies.
  • the invention further includes antisense RNA obtained from the polynucleotides of the invention.
  • The present invention includes homopurine and homopyrimidine sequences of the polynucleotides of the invention or of their fragments, and their use as triple helix probes.
  • EH novel protein:protein interaction domain
  • GST Glutathione S-transferase
  • Position +1 (with respect to NPF) exhibited a strong preference for basic residues in peptides selected by GST-EHR and for basic/hydrophobic residues in peptides selected by GST-EH. Of note, negatively charged residues were never present in this position. Positions -1 and -2 displayed a weaker preference for serine or threonine (Fig. 2).
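The positional preferences above can be checked on a candidate peptide with a short script. This is a hypothetical sketch: it assumes position +1 is the residue immediately after the F of NPF and positions -1/-2 are the two residues before the N, and the residue classes (basic = K, R, H; acidic = D, E; hydrophobic = A, V, L, I, M, F, W, C) are a common textbook grouping, not one taken from the source.

```python
import re

BASIC = set("KRH")
ACIDIC = set("DE")
HYDROPHOBIC = set("AVLIMFWC")  # assumed grouping, not from the source

def classify(residue):
    """Assign a single residue to a coarse physicochemical class."""
    if residue is None:
        return "none"  # NPF at the very C-terminus: no +1 residue
    if residue in BASIC:
        return "basic"
    if residue in ACIDIC:
        return "acidic"
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    return "other"

def npf_context(seq):
    """For each NPF, report the +1 residue, its class, and whether S/T occupies -1 or -2."""
    seq = seq.upper()
    report = []
    for m in re.finditer("NPF", seq):
        i = m.start()
        plus1 = seq[i + 3] if i + 3 < len(seq) else None
        upstream = seq[max(0, i - 2):i]  # residues at positions -2 and -1
        report.append({
            "position": i + 1,
            "plus1": plus1,
            "plus1_class": classify(plus1),
            "ser_thr_upstream": any(r in "ST" for r in upstream),
        })
    return report
```

A peptide whose +1 residue comes back as "acidic" would run against the observation that negatively charged residues never occupied this position in the selected peptides.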
  • h-NUMB: the human homologue of NUMB, a developmentally regulated gene of Drosophila (Uemura et al., Cell 58, 349-360, 1989); h-NUMB-R, related to h-NUMB but distinct; h-RAB, the gene encoding the cellular cofactor of the HIV REV protein; h-RAB-R, a gene related to h-RAB; and three novel genes designated ehb3, ehb10 and ehb21 (for EH-Binding, followed by the original plaque identifier).
  • The partial cDNAs reported above did not contain entire open reading frames (ORF).
  • ORF open reading frames
  • Several cDNAs were isolated and the longest ones, representative of each gene, were sequenced.
  • A schematic of the cDNAs containing the entire ORFs of human h-NUMB and h-NUMB-R, and of human h-RAB and h-RAB-R, is presented in Figs. 4 and 6, respectively, while the entire sequences are reported as SEQ ID NO: 8, 9, 10 and 11, respectively.
  • SEQ ID NO: 12, 13 and 14 correspond to the partial cDNA nucleotide sequences of ehb3, ehb10 and ehb21, respectively.
  • The sequence of h-RAB corresponds to that already reported by Bogerd et al. (Cell 82, 485-494, 1995) and Fritz et al. (Nature 376, 530-550, 1995).
  • The ORFs of h-NUMB and h-NUMB-R can encode proteins of 603 and 609 amino acids, respectively, with predicted molecular weights of 66 and 65 kDa.
  • The two predicted proteins have the amino acid sequences indicated as SEQ ID NO: 1 and 2, respectively, and display an overall relatedness of 74% with 57% identity (Fig. 4).
  • Both h-NUMB and h-NUMB-R contain a phosphotyrosine interaction domain (PID/PTB; van der Geer and Pawson, Trends Biochem. Sci. 20, 277-280, 1995) in their N-terminus (Fig. 4), and putative SH3-binding sites in their C-terminus.
  • The ORFs of h-RAB and h-RAB-R can encode proteins of 562 and 481 amino acids, respectively, with predicted molecular weights of 58 and 49 kDa.
  • The two predicted proteins display an overall relatedness of 71% with 46% identity (Fig. 7).
  • Features conserved between h-RAB and h-RAB-R include a zinc-finger region in the N-terminus of the proteins and several FG (Phenylalanine-Glycine) motifs, characteristic of nucleoporin-like proteins. In addition, both contain four NPF motifs, located in the C-terminal half of the molecule.
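Identity and relatedness (similarity) percentages such as the 74%/57% and 71%/46% values above depend on the alignment and on how conservative substitutions and gaps are counted. The sketch below assumes the two sequences are already aligned (gaps written as "-"), counts only columns where neither sequence has a gap, and uses a hypothetical set of conservative-substitution groups; it is an illustration of the calculation, not the program used for the published figures, and will not necessarily reproduce them.

```python
# Conservative-substitution groups: an illustrative assumption, not the scheme used in the source.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")

def _similar(a, b):
    """True for identical residues or residues sharing one of the groups above."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)

def identity_and_similarity(aln1, aln2):
    """Percent identity and similarity over ungapped columns of a pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln1.upper(), aln2.upper()) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)
```

For the toy alignment `MKV-L` / `MRVAI`, the gap column is skipped, two of the four remaining columns are identical and all four are similar, giving (50.0, 100.0).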
  • FG Phenylalanine-Glycine
  • Eps15 can physically interact with some of the EH-binding proteins in vivo.
  • The expression vectors obtained were called HA-NUMB and HA-RAB, respectively.
  • C33A cells were then transiently transfected with HA-NUMB or HA-RAB vectors, and cellular lysates were prepared under mild lysis conditions to preserve protein:protein interactions.
  • Eps15 was recoverable by immunoprecipitation with an anti-HA antibody, but not with control antibodies. Together, these results demonstrate that eps15 interacts with h-NUMB and h-RAB in vivo.
  • Eps15 and eps15R are the only EH-containing proteins known in mammals.
  • The identification of peptide sequences that bind to EH allowed the identification of putative EH-containing proteins in mammalian cells.
  • GST-fusion proteins encompassing NPF-containing peptides from h-NUMB, h-RAB and h-RAB-R were challenged with 35S-labeled lysates from NIH-3T3 cells, and several cellular proteins were specifically recovered (Fig. 15).
  • The finding that mutation of NPF to DPF, NGF or NPY completely abolished recognition strongly argues that the identified proteins represent EH-containing species (Fig. 15).
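The control peptides used to establish this requirement carry single substitutions in the motif (N→D, P→G or F→Y). A hypothetical helper that derives the three mutants from an NPF-containing peptide might look like the sketch below; only the first NPF is mutated, and the example peptide is made up.

```python
# (offset within NPF, replacement residue) for each mutant named in the text
MUTATIONS = {"DPF": (0, "D"), "NGF": (1, "G"), "NPY": (2, "Y")}

def npf_mutants(peptide):
    """Return DPF, NGF and NPY variants of the first NPF motif in `peptide`."""
    start = peptide.upper().find("NPF")
    if start < 0:
        raise ValueError("peptide contains no NPF motif")
    variants = {}
    for name, (offset, residue) in MUTATIONS.items():
        residues = list(peptide)
        residues[start + offset] = residue  # single point substitution
        variants[name] = "".join(residues)
    return variants
```

For a peptide with a single motif, none of the returned variants still contains NPF, which is the property the binding controls rely on.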
  • A partial cDNA may be used as a probe to identify a clone containing the full-length cDNA sequence.
  • The partial sequences, or portions thereof, can be nick-translated or end-labeled with 32P using polynucleotide kinase and labeling methods known to those with skill in the art (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner and J.F. Battey, ed., Elsevier
  • a lambda library can be directly screened with the labeled cDNA probe, or the library can be converted en masse to pBluescript® (Stratagene, La Jolla, Calif.) to facilitate bacterial colony screening. Both methods are well known in the art.
  • Filters with bacterial colonies containing the library in pBluescript®, or bacterial lawns containing lambda plaques, are denatured and the DNA is fixed to the filters.
  • The filters are hybridized with the labeled probe using the hybridization conditions described by Davis et al. (supra).
  • The partial sequence, cloned into lambda or pBluescript®, can be used as a positive control to assess background binding and to adjust the hybridization and washing stringencies necessary for accurate clone identification.
  • The resulting autoradiograms are compared to duplicate plates of colonies or plaques; each exposed spot corresponds to a positive colony or plaque.
  • The colonies or plaques are selected and expanded, and the DNA is isolated from the colonies for further analysis and sequencing.
  • Positive cDNA clones in phage lambda may be analyzed to determine the amount of additional sequence they contain using PCR with one primer from the partial sequences and the other primer from the vector.
  • Clones with a larger vector-insert PCR product than the original clone are analyzed by restriction digestion and DNA sequencing to determine whether they contain an insert similar in size to the mRNA detected on a Northern blot.
  • the complete sequence of the clones can be determined.
  • The preferred method is to use exonuclease III digestion (McCombie et al., Methods 3: 33-40, 1991).
  • A series of deletion clones is generated, each of which is sequenced.
  • The resulting overlapping sequences are assembled into a single contiguous sequence of high redundancy (usually three to five overlapping sequences at each nucleotide position), resulting in a highly accurate final sequence.
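Assembling overlapping deletion-clone reads into one contig can be illustrated with a greedy overlap merge. This is a simplified sketch, not the procedure of McCombie et al.: it assumes error-free reads from a single strand (real assemblers also handle sequencing errors, both strands and consensus calling), and the function names are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def drop_contained(reads):
    """Remove duplicate reads and reads lying entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for i, r in enumerate(unique)
            if not any(i != j and r in other for j, other in enumerate(unique))]

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap until none remain."""
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no sufficient overlap left: return separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs
```

With three toy reads, `greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"])` yields the single contig `ATGGCGTACGTTA`; reads that share no sufficient overlap are returned as separate contigs rather than forced together.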
  • The gene can be expressed in a recombinant organism to obtain significant amounts of protein.
  • the DNA encoding for the protein can be inserted into other conventional host organisms and expressed.
  • the organism can be a bacterium, yeast, cell line or multicellular plant or animal.
  • The literature is replete with examples of suitable host organisms and expression techniques.
  • Naked polynucleotide can be injected directly into the muscle tissue of mammals, where it is expressed.
  • This methodology can be used to deliver the polynucleotide and, therefore, the resulting polypeptide translation product to the animal, or to generate an immune response against a foreign polypeptide (Wolff et al., Science 247: 1465, 1990; Felgner et al., Nature 349: 351, 1991).
  • the coding sequence, together with appropriate regulatory regions i.e., a construct
  • the cell (which may or may not be part of a larger organism) then expresses the polypeptide.
  • Antibodies against the proteins of the present invention can be obtained by direct injection of the naked polynucleotide into an animal (Wolff, supra) or by administering the polypeptide to an animal. The antibody so obtained will then bind the polypeptide itself. In this manner, even partial DNA sequences can be used to generate antibodies that bind the whole native polypeptide.
  • Antibodies can be used in standard immunoassays to detect the presence and/or amount of the proteins of the invention in a sample. Such assays can be competitive or non-competitive. Radioimmunoassays, ELISAs, Western blot assays, immunohistochemical assays, immunochromatographic assays and other conventional assays are expressly contemplated. Furthermore, monoclonal and polyclonal antibodies can be generated using well-known methods.
  • Antibodies against the proteins of this invention can be used to determine the quantity of these proteins in a sample. This can be particularly useful in clinical research, as well as in detecting abnormalities in mitogenic signal transduction in malignant tissue.
  • tumor markers are released in the blood stream at levels which correlate with the size of the tumor and its clinical stage.
  • Determination of the levels of a marker can be advantageous in aiding the diagnostic procedures and in monitoring the effectiveness of therapy.
  • the antibody against a protein is immobilized to an agarose column.
  • Sample is then directed through the column where the protein in the sample is bound by the immobilized antibody.
  • a known quantity of radiolabeled antibody is directed through the column.
  • the quantity of labeled antibody which is not retained on the column is measured, and bears a relationship to the quantity of protein in the sample.
  • Another exemplary technique is liquid phase radioimmunoassay.
  • A standard measurement is made. Specifically, a small, known amount of purified protein, radiolabeled in a conventional manner, is challenged against a known amount of antibody.
  • The resulting immunocomplex is recovered by centrifugation, and the radioactivity of the pellet is determined. This value is used as a standard against which later measurements are compared.
  • a sample, containing unknown amounts of protein is challenged against the same known amount (used in making the standard measurement) of antibody.
  • the same amount of labeled protein used in making the standard measurement is added to the reaction mixture, followed by centrifugation and measurement of radioactivity as explained above.
  • the decrease in the immunoprecipitated radioactivity is proportional to the amount of protein in the sample.
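Taking the stated proportionality at face value, converting counts into an amount needs one calibration point. The sketch below is illustrative only: it assumes a linear relationship over the working range (real RIA standard curves are usually nonlinear and fitted to several standards), and the function names and numbers are hypothetical.

```python
def calibration_factor(standard_cpm, reference_cpm, reference_amount):
    """Amount of protein per cpm of lost signal, from one reference sample of known amount."""
    drop = standard_cpm - reference_cpm
    if drop <= 0:
        raise ValueError("reference sample must reduce the precipitated counts")
    return reference_amount / drop

def estimate_amount(standard_cpm, sample_cpm, factor):
    """Protein in the sample, taken as proportional to the decrease in precipitated cpm."""
    return factor * max(0.0, standard_cpm - sample_cpm)
```

For example, if the standard precipitates 10,000 cpm and 5 ng of unlabeled protein lowers this to 8,000 cpm, the factor is 0.0025 ng/cpm, and a sample giving 9,000 cpm corresponds to about 2.5 ng.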
  • other well known conventional immunoassay methods may be used.
  • Antisense RNA molecules are known to be useful for regulating translation within the cell. Antisense RNA molecules can be produced from the sequences of the present invention. These antisense molecules can be used as therapeutic agents to regulate gene expression.
  • the antisense molecules are obtained from a nucleotide sequence by reversing the orientation of the coding region with regard to the promoter.
  • the antisense RNA is complementary to the corresponding mRNA.
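Because the antisense RNA is complementary to the mRNA and runs antiparallel to it, its 5'→3' sequence is the reverse complement of the mRNA. A minimal Python sketch (the function name is hypothetical); it also accepts a DNA coding-strand sequence by treating T as U:

```python
# Watson-Crick complements for RNA bases
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_rna(sense):
    """Return the antisense RNA (5'->3') for a sense mRNA or coding-strand DNA sequence."""
    rna = sense.upper().replace("T", "U")
    invalid = set(rna) - set("ACGU")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return rna.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sense sequence, a quick sanity check on the reverse-complement logic.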
  • the antisense sequences can contain modified sugar phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of the modifications are described by Rossi et al., Pharmacol. Ther. 50: 245-254,
  • Antisense molecules are introduced into cells that express the protein gene.
  • the effectiveness of antisense inhibition on translation can be monitored using techniques that include, but are not limited to, antibody-mediated tests such as RIAs and ELISA, functional assays, or radiolabeling.
  • the antisense molecule is introduced into the cells by diffusion or by transfection procedures known in the art.
  • The molecules are introduced into cell samples at a number of different concentrations, preferably between 1×10⁻¹⁰ M and 1×10⁻⁴ M. Once the minimum concentration that can adequately control translation is identified, the optimized dose is translated into a dosage suitable for use in vivo.
  • For example, an inhibiting concentration in culture of 1×10⁻⁷ M translates into a dose of approximately 0.6 mg/kg bodyweight.
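The single conversion quoted above (1×10⁻⁷ M in culture ≈ 0.6 mg/kg) can be extended to the rest of the tested range only by assuming that the dose scales linearly with the effective culture concentration; that assumption and the helper names are hypothetical. The sketch also lists the decade steps of the preferred 1×10⁻¹⁰ to 1×10⁻⁴ M range and flags doses at or above the 100 mg/kg level that the text ties to prior toxicity testing.

```python
REF_CONCENTRATION_M = 1e-7       # inhibiting concentration in culture (from the text)
REF_DOSE_MG_PER_KG = 0.6         # corresponding in vivo dose (from the text)
TOXICITY_REVIEW_MG_PER_KG = 100.0

def concentration_series(low_exponent=-10, high_exponent=-4):
    """Decade steps of molar concentration for the in vitro titration."""
    return [10.0 ** e for e in range(low_exponent, high_exponent + 1)]

def dose_mg_per_kg(concentration_m):
    """Scale the reference conversion linearly (an assumption) to another concentration."""
    return REF_DOSE_MG_PER_KG * concentration_m / REF_CONCENTRATION_M

for c in concentration_series():
    dose = dose_mg_per_kg(c)
    flag = "  <- toxicity testing first" if dose >= TOXICITY_REVIEW_MG_PER_KG else ""
    print(f"{c:.0e} M -> {dose:.2g} mg/kg{flag}")
```

Under this linear assumption only the top of the range (1×10⁻⁴ M) lands above 100 mg/kg.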
  • levels of oligonucleotide approaching 100 mg/kg bodyweight or higher may be possible after testing the toxicity of the oligonucleotide in laboratory animals.
  • the antisense molecule can be introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as oligonucleotide contained in an expression vector.
  • the antisense oligonucleotide is preferably introduced into the vertebrate by injection. Alternatively, cells from the vertebrate are removed, treated with the antisense oligonucleotide, and reintroduced into the vertebrate. It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme sequence to enable the antisense to bind and cleave its target. For technical applications of ribozyme and antisense oligonucleotides, see Rossi et al., supra.
  • Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity as it is associated with a particular gene.
  • the gene sequence or, more preferably, a portion thereof can be used to inhibit gene expression in individuals suffering from disorders associated with this gene.
  • A portion of a gene sequence, or the entirety thereof, can be used to study the effect of inhibiting transcription of the gene within the cell.
  • homopurine sequences were considered the most useful.
  • homopyrimidine sequences can also inhibit gene expression.
  • both types of sequences corresponding to the claimed nucleotide sequences are contemplated within the scope of this invention.
  • Homopyrimidine oligonucleotides bind to the major groove at homopurine : homopyrimidine sequences.
  • 10-mer to 20-mer homopyrimidine sequences from the claimed nucleotide sequences can be used to inhibit expression from homopurine sequences.
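Candidate triplex target sites can be located by scanning for homopurine runs of at least 10 nt. The sketch below rests on stated assumptions: it scans only the strand given, truncates long runs to their first 20 nt, and designs the third strand with the pyrimidine-motif pairing rules (T recognizes an A·T pair, protonated C recognizes a G·C pair), written 5'→3' parallel to the purine strand. Function names are hypothetical.

```python
import re

_PURINE_TO_TFO = str.maketrans("AG", "TC")  # pyrimidine motif: A -> T, G -> C

def find_triplex_targets(dna, min_len=10, max_len=20):
    """Return (1-based start, purine tract, parallel homopyrimidine TFO) for each run."""
    dna = dna.upper()
    targets = []
    for m in re.finditer(r"[AG]{%d,}" % min_len, dna):
        tract = m.group()[:max_len]  # use at most max_len nt of a long run
        targets.append((m.start() + 1, tract, tract.translate(_PURINE_TO_TFO)))
    return targets
```

Homopyrimidine runs on the given strand are homopurine runs on the complementary strand; scanning the reverse complement as well would catch those sites.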
  • the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases.
  • an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix.
  • the oligonucleotides may be prepared on an oligonucleotide synthesizer or they may be purchased commercially from a company specializing in custom oligonucleotide synthesis.
  • The sequences are introduced into cells in culture using techniques known in the art that include, but are not limited to, calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake. Treated cells are monitored for altered cell function. Alternatively, cells from the organism are extracted, treated with the triple helix oligonucleotide, and reimplanted into the organism.
  • FIG. 1 Schematic of the eps15 and eps15R proteins with their EH domains. Amino acid positions are indicated.
  • FIG. 2 Predicted amino acid sequences of peptides selected by screening a random phage-displayed peptide library with GST-EH and GST-EHR, from eps15 and eps15R, respectively. NPFs are in boldface.
  • FIG. 4 Human NUMB and NUMB-R cDNAs and proteins.
  • FIG. 4 The structure of the human NUMB (hNUMB) and NUMB-R (hNUMB-R) cDNAs is depicted.
  • The ORFs are indicated by a solid box. Positions are indicated in kb.
  • The nucleotide positions of initiator and terminator codons are also indicated.
  • Canonical polyadenylation sites (AATAAA) were at positions 2649, 2823, 2958 and 3025 for h-NUMB and h-NUMB-R, respectively (not shown).
  • FIG. 5 Predicted protein sequences and alignment of human NUMB and NUMB-R.
  • FIG. 6 Human RAB and RAB-R cDNAs and proteins.
  • the structure of the human RAB (h-RAB) and RAB-R (h-RAB-R) cDNAs is depicted.
  • the ORFs are indicated by a solid box. Positions are indicated in kb. The nucleotide positions of initiator and terminator codons are also indicated.
  • Canonical polyadenylation sites (AATAAA) were at positions 2499, 2542 and 2556 of the h-RAB sequence; no polyadenylation site was found in the isolated h-RAB-R cDNA (not shown).
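Positions of canonical AATAAA signals, such as those listed for the h-NUMB, h-NUMB-R and h-RAB cDNAs, can be found with a simple scan. A minimal sketch (the function name is hypothetical); it reports 1-based positions, includes overlapping occurrences, and accepts RNA by treating U as T:

```python
def polyadenylation_signals(seq, signal="AATAAA"):
    """1-based start positions of every (possibly overlapping) occurrence of `signal`."""
    seq = seq.upper().replace("U", "T")
    positions = []
    start = seq.find(signal)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(signal, start + 1)  # step by one so overlaps are not skipped
    return positions
```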
  • FIG. 7 Predicted protein sequences and alignment of human RAB and RAB-R. The sequence of h-RAB is identical to that reported by Bogerd et al. (Cell.
  • FIG. 8 In vitro binding of NPF-containing proteins and peptides to eps15.
  • FIG. 8 In vitro binding of eps15 to GST-fusions of EH-binding proteins: Total cellular lysates from NIH-3T3 cells (1 mg/lane) were incubated with the indicated GST-fusion proteins (10 µg) for 1 h at 4 °C. Specifically bound eps15 was detected by immunoblot with an anti-eps15 antibody.
  • FIG. 10 Coimmunoprecipitation of eps15 with h-NUMB and h-RAB proteins.
  • Two expression vectors with a cytomegalovirus promoter (HA-RAB and HA-NUMB) were engineered by inserting the sequence encoding the HA epitope (YDVPDYASLP).
  • C33A cells were transfected with the vectors, and total cellular proteins were obtained under mild lysis conditions to preserve protein:protein interactions. 5 mg of total cellular proteins were immunoprecipitated with an anti-HA antibody (HA) or with a control serum (C).
  • HA anti-HA antibody
  • C control serum
  • FIG. 11 Requirement for the NPF motif for binding to eps15. Binding to eps15 of peptides containing mutations in the NPF sequence. Peptides engineered in GST-fusion proteins are indicated on the left. GST-NPF corresponds to the sequence of an NPF-containing peptide derived from the sequence of RAB (shown in Fig. 3). Mutant peptides (GST-NGF, GST-DPF, GST-NPY) are also indicated. The in vitro bindings to eps15, obtained as described in Figs. 8, 9 and 10, are shown on the right. The lane marked "RAB" represents an in vitro binding obtained with the GST-RAB protein (Fig. 8), serving as a positive control.
  • FIG. 12 Mapping of the minimal region of eps15 required for binding to NPF-containing proteins.
  • FIG. 13 Schematic representation of the eps15 N-terminal domain and of the GST-fusion proteins engineered, with predicted turns indicated by solid boxes. The indicated fragments of eps15 were engineered into GST-fusion proteins and used for in vitro binding experiments.
  • the EH construct contains all three EH domains.
  • the M2 construct contains the region encompassing the second EH domain flanked by the natural regions predicting the turns shown in FIG. 12.
  • LEH-2 construct contains the same region without the turn.
  • the KT- LEH2 construct contains the same region preceded by an artificial turn region composed of poly-glycine . Amino acid positions are also indicated in parentheses.
  • FIG. 14 In vitro bindings. The GST-fusions shown in FIG. 13 were used to bind 35S-labeled h-RAB protein, obtained by in vitro transcription/translation of the RAB cDNA. Detection was by autoradiography. The lane marked T/T was loaded with the primary product of the in vitro transcription/translation, to serve as a reference.
  • FIG. 15 Identification of a family of EH-containing proteins in mammals. 35S-labeled lysates from NIH-3T3 cells (50 × 10⁶ TCA-precipitable counts) were incubated with the indicated GST-fusion proteins displaying NPF-containing peptides (NPF-RAB, NPF-RAB-R and NPF-NUMB, underlined in Fig. 3) or mutant peptides (NGF-RAB, DPF-RAB and NPY-RAB, shown in Fig. 11), or with GST alone. The first two lanes represent the same lysate immunoprecipitated with anti-eps15 antibody (EPS15) or with a control serum (PRE). The position of eps15 is indicated by an arrowhead. Molecular weight markers are indicated in kDa.
EXAMPLES
  • Example 1 Screening of phage-display libraries.
  • The peptide library and the panning conditions were essentially as described by Felici et al. (J. Mol. Biol. 222, 301-310, 1991), the main difference being that the target EH domains (3 µg) were immobilized by binding to 20 µl of glutathione-Sepharose matrix (Pharmacia).
  • The peptide library was screened with either the GST-EH or the GST-EHR fusion protein, designed to encompass amino acid positions 2-230 of mouse eps15 and 15-368 of mouse eps15R, respectively. Two panning cycles were carried out for each fusion protein before the clones enriched during selection were sequenced.
  • Example 2 Isolation of cDNAs encoding EH-binding proteins.
  • A GST-fusion protein containing the three EH domains of eps15 (GST-EH; Wong et al., Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995) was used to screen a pCEV-LAC-based prokaryotic expression library (a kind gift of T. Miki) from M426 human fibroblasts.
  • Recombinant plaques (5 x 10⁇) were screened, after induction with IPTG (isopropyl β-D-thiogalactopyranoside), using a modification of the Far-Western assay developed to identify proteins interacting with GST-fusions (Matoskova et al., Oncogene 12, 2679-2688, 1996; ibid. 12, 2563-2571, 1996).
  • Blots were then incubated with GST-EH (10 nM) in TTBS containing reduced glutathione (3 µM) and BSA (0.5% w/v) for 1 h at room temperature. After extensive washing in TTBS, bound GST-EH was detected with an affinity-purified anti-GST antibody, as described in Matoskova et al. (supra).
  • Phages that reacted specifically with GST-EH but not with control GST were subjected to cross-hybridization experiments and assigned to seven groups, corresponding to distinct cDNAs (not shown).
  • Phages containing the longest inserts were sequenced on both strands by the dideoxy chain-termination (Sanger) method, using a commercial kit (Sequenase).
  • To obtain the complete open reading frames, a pCEV-29-based eukaryotic expression library from M426 human fibroblasts was screened using standard techniques. Peptide sequences were aligned with the CLUSTAL4 algorithm in the MacDNASIS software.
  • Example 3 Production of recombinant proteins.
  • GST-fusion proteins containing large fragments of the proteins of interest were obtained by recombinant PCR of the appropriate fragment from the murine eps15 and eps15R cDNAs, or from the cDNAs of the EH-binding proteins, followed by cloning into the pGEX expression vector, in frame with the GST moiety.
  • The vector used was pGEX-4T1 (Pharmacia).
  • The KT-LEH2 construct (Fig. 13) was created using the pGEX-KT vector, which contains a poly-glycine sequence in frame with the GST moiety, upstream of the cloning sites.
  • GST-fusions containing short peptides from the proteins of interest were obtained by annealing complementary oligonucleotides of the appropriate sequence in vitro, followed by cloning in frame with the GST moiety in a pGEX-KT vector. Purification of the GST-fusion proteins on glutathione-agarose and in vitro binding experiments were performed as described in Wong et al. (Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995) and Matoskova et al. (Oncogene 12, 2679-2688 and 2563-2571, 1996).
  • Expression vectors for the h-RAB and h-NUMB proteins were engineered in the pMT2 eukaryotic expression vector by inserting (by overlap-extension polymerase chain reaction) the sequence encoding the hemagglutinin epitope (YDVPDYASLP) between codons 1 and 2 of the open reading frame of the RAB and NUMB cDNAs, to obtain pMT2-HA-RAB and pMT2-HA-NUMB, respectively.
  • Transient transfection of C33A cells by calcium phosphate was performed as previously described for NIH-3T3 cells (Fazioli et al., Mol. Cell. Biol., 13: 5814-5828, 1993).
  • 35S-labeled h-RAB protein was synthesized by in vitro transcription-translation using a commercial kit (Promega) and the full-length h-RAB cDNA. Metabolic labeling of NIH-3T3 cells with 35S-methionine was performed as previously described (Fazioli et al., supra).
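In Example 3, the HA epitope is inserted between codons 1 and 2 of each ORF. The Python sketch below models only the resulting sequences; it does not simulate the overlap PCR itself. The function names and the short test sequences are hypothetical, and the DNA helper assumes that the ORF starts with ATG and that the tag-coding insert is a whole number of codons, so the reading frame is preserved.

```python
HA_EPITOPE = "YDVPDYASLP"  # epitope sequence as given in the text


def tag_protein(orf_protein: str, tag: str = HA_EPITOPE) -> str:
    """Return the protein with `tag` inserted between residue 1 (Met) and residue 2."""
    if not orf_protein.startswith("M"):
        raise ValueError("ORF should begin with the initiator methionine")
    return orf_protein[:1] + tag + orf_protein[1:]


def tag_cds(cds: str, tag_dna: str) -> str:
    """Insert tag-coding DNA between codons 1 and 2, refusing frame-shifting inserts."""
    cds, tag_dna = cds.upper(), tag_dna.upper()
    if not cds.startswith("ATG"):
        raise ValueError("CDS should begin with an ATG start codon")
    if len(tag_dna) % 3 != 0:
        raise ValueError("insert length must be a multiple of 3 to stay in frame")
    return cds[:3] + tag_dna + cds[3:]


if __name__ == "__main__":
    # Hypothetical 5-residue ORF, for illustration only.
    print(tag_protein("MSKLP"))  # MYDVPDYASLPSKLP
```

Checking the insert length against the codon size is the one design choice that matters here: an insert that is not a multiple of 3 would shift the frame of everything downstream of the tag.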


Abstract

The present invention relates to proteins and peptides which are able to bind to proteins containing the EH domain, in particular to those which bind to the signal transducers eps15 and eps15R containing such domain, and it also relates to polynucleotides coding for such proteins.

Description

INTRACELLULAR INTERACTORS AND EH DOMAIN BINDING SPECIFICITY
The present invention relates to proteins and peptides which are able to bind to proteins containing the EH domain, in particular to those which bind to the signal transducers eps15 and eps15R containing such domain, and it also relates to polynucleotides coding for such proteins.
Many cellular functions, including proliferation, differentiation, cytoskeleton organization and apoptosis, are regulated through a complex intracellular network of signal transducers. The discovery of specific pathways that transduce proliferation signals from the cell surface to the nucleus is critical to understanding, and thus controlling, cellular growth and oncogenesis. Growth factors, such as epidermal growth factor (EGF) and platelet-derived growth factor (PDGF), play an important role in cellular proliferation. Growth factors bind to receptors present on the cell surface, many of which are endowed with tyrosine kinase activity (receptor tyrosine kinases, RTKs). The interaction between growth factors and their receptors results in enhanced receptor catalytic activity and tyrosine phosphorylation of intracellular substrates (Ullrich et al., Cell 61: 203-212, 1990; Aaronson, Science 254: 1146-1153, 1991). The molecular characterization of such intracellular pathways has led to the identification of a heterogeneous class of cytoplasmic proteins that associate with, and/or are tyrosine-phosphorylated by, RTKs.
Among these substrates, a protein of 140-150 kDa was identified. This protein, termed eps15, was found to be phosphorylated on tyrosine upon activation of EGFR and PDGFR (Fazioli et al., Mol. Cell. Biol. 13, 5814-5828, 1993). Eps15 was shown to be involved in the control of cellular proliferation, based on the observation that its overexpression induces transformation of NIH-3T3 cells, albeit with low efficiency (Fazioli et al., supra).
Eps15R is a subsequently identified protein that displays sequence and structural homology to eps15 but is a distinct protein (Wong et al., Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995). It was recently found that eps15 and eps15R contain at their N-terminus three copies of a novel functional domain, called EH (for Eps15 Homology), which functions as a protein:protein interaction surface (Fig. 1) (Wong et al., supra). We have isolated and/or purified a group of human cytoplasmic proteins that bind to EH-containing proteins. This group includes: h-NUMB, which has the amino acid sequence reported as SEQ ID NO: 1; h-NUMB-R, which has the amino acid sequence reported as SEQ ID NO: 2; h-RAB-R, which has the amino acid sequence reported as SEQ ID NO: 4; ehb3, which consists, in part or entirely, of the amino acid sequence reported as SEQ ID NO: 5; ehb10, which consists, in part or entirely, of the amino acid sequence reported as SEQ ID NO: 6; ehb21, which consists, in part or entirely, of the amino acid sequence reported as SEQ ID NO: 7; or their functional derivatives. The invention further comprises isolated or purified polynucleotides that operably encode the above-mentioned proteins h-NUMB, h-NUMB-R, h-RAB-R, ehb3, ehb10 and ehb21, or their functional derivatives.
In particular, such polynucleotides include the following nucleotide sequences: SEQ ID NO: 8, which encodes the h-NUMB protein; SEQ ID NO: 9, which encodes the h-NUMB-R protein; and SEQ ID NO: 11, which encodes the h-RAB-R protein. Also encompassed within the scope of the invention are the polynucleotide operably encoding the ehb3 protein, which contains the partial nucleotide sequence reported as SEQ ID NO: 12, the polynucleotide operably encoding the ehb10 protein, which contains the partial nucleotide sequence reported as SEQ ID NO: 13, and the polynucleotide operably encoding the ehb21 protein, which contains the partial sequence reported as SEQ ID NO: 14.
The present invention includes mRNA transcripts encoding the amino acid sequences of the proteins of the present invention.
These include, in particular, mRNA transcripts of the polynucleotide sequences set forth as SEQ ID NO: 8, SEQ ID NO: 9 and SEQ ID NO: 11, and mRNA transcripts of polynucleotides that contain, in whole or in part, the partial sequences SEQ ID NO: 12, SEQ ID NO: 13 and SEQ ID NO: 14.
The invention further comprises expression vectors for producing the proteins of the present invention in both eukaryotic and prokaryotic cells, and cell lines containing vectors that express these proteins.
The term "functional derivative" refers to a "fragment", "variant", "analogue" or "chemical derivative" of the proteins of the present invention that retains the function of the native proteins described.
The term "fragment" refers to any portion of the proteins of the present invention.
The term "variant" refers to molecules that are similar overall to the complete protein of the invention or to a fragment of such protein. For example, such variants include deletions, insertions and/or substitutions of residues in the amino acid sequence.
The term "analogue" refers to a molecule that does not occur in nature and is substantially similar to the native protein or to a fragment of it.
The term "chemical derivative", as used herein, refers to proteins and peptides of the present invention containing chemical groups that are not normally part of the protein. Such chemical derivatives are obtained by chemical modification of specific amino acid residues with an organic derivatizing agent, known to the person skilled in the art, that is capable of reacting with selected side chains or terminal residues. Such modifications may improve the solubility, absorption, biological half-life and the like of the proteins and peptides of the invention, and are reported, for example, in Remington's Pharmaceutical Sciences, 16th ed., Mack Publishing Co., Easton, PA (1980). Cysteinyl residues most commonly are reacted with alpha-haloacetates (and corresponding amines), such as chloroacetic acid or chloroacetamide, to give carboxymethyl or carboxyamidomethyl derivatives. Cysteinyl residues also are derivatized by reaction with bromotrifluoroacetone, alpha-bromo-beta-(5-imidazolyl)propionic acid, chloroacetyl phosphate, N-alkylmaleimides, 3-nitro-2-pyridyl disulfide, p-chloromercuribenzoate, 2-chloromercuri-4-nitrophenol, or chloro-7-nitrobenzo-2-oxa-1,3-diazole.
Histidyl residues are derivatized by reaction with diethyl pyrocarbonate at pH 5.5-7.0, because this agent is relatively specific for the histidyl side chain. p-Bromophenacyl bromide also is useful; the reaction is preferably performed in 0.1 M sodium cacodylate at pH 6.0.
Lysinyl and amino-terminal residues are reacted with succinic or other carboxylic acid anhydrides. Derivatization with these agents has the effect of reversing the charge of the lysinyl residues. Other suitable reagents for derivatizing alpha-amino-containing residues include imidoesters such as methyl picolinimidate; pyridoxal phosphate; pyridoxal; chloroborohydride; trinitrobenzenesulfonic acid; O-methylisourea; 2,4-pentanedione; and transaminase-catalyzed reaction with glyoxylate.
Arginyl residues are modified by reaction with one or several conventional reagents, among them phenylglyoxal, 2,3-butanedione, 1,2-cyclohexanedione, and ninhydrin. Derivatization of arginine residues requires that the reaction be performed under alkaline conditions because of the high pKa of the guanidine functional group. Furthermore, these reagents may react with the epsilon-amino group of lysine as well as with the arginine guanidino group.
The specific modification of tyrosyl residues per se has been studied extensively, with particular interest in introducing spectral labels into tyrosyl residues by reaction with aromatic diazonium compounds or tetranitromethane. Most commonly, N-acetylimidazole and tetranitromethane are used to form O-acetyl tyrosyl species and 3-nitro derivatives, respectively.
Carboxyl side groups (aspartyl or glutamyl) are selectively modified by reaction with carbodiimides (R-N=C=N-R'), such as 1-cyclohexyl-3-(2-morpholinyl-4-ethyl) carbodiimide or 1-ethyl-3-(4-azonia-4,4-dimethylpentyl) carbodiimide. Furthermore, aspartyl and glutamyl residues are converted to asparaginyl and glutaminyl residues by reaction with ammonium ions. Glutaminyl and asparaginyl residues are frequently deamidated to the corresponding glutamyl and aspartyl residues. Alternatively, these residues are deamidated under mildly acidic conditions. Either form of these residues falls within the scope of this invention.
Derivatization with bifunctional agents is useful for cross-linking the peptide to a water-insoluble support matrix or to other macromolecular carriers. Commonly used cross-linking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters (for example, esters with 4-azidosalicylic acid), homobifunctional imidoesters, including disuccinimidyl esters such as 3,3'-dithiobis(succinimidylpropionate), and bifunctional maleimides such as bis-N-maleimido-1,8-octane. Derivatizing agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate yield photoactivatable intermediates that are capable of forming crosslinks in the presence of light. Alternatively, reactive water-insoluble matrices such as cyanogen bromide-activated carbohydrates and the reactive substrates described in U.S. Pat. Nos. 3,969,287; 3,691,016; 4,195,128; 4,247,642; 4,229,537; and 4,330,440 are employed for protein immobilization.
Other modifications include hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl, threonyl or tyrosyl residues, methylation of the alpha-amino groups of lysine, arginine, and histidine side chains (T. E. Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San Francisco, pp. 79-86 (1983)), acetylation of the N-terminal amine and, in some instances, amidation of the C-terminal carboxyl group. The proteins of the present invention are referred to as isolated and/or purified, where the term "isolated" denotes that the material has been removed from its original environment and the term "purified" is relative to the material in its natural state and does not imply absolute purity.
The term "protein" denotes a polypeptide with a molecular weight between about 5,000 and more than 150,000 daltons. The term "peptide" denotes a polymer of fewer than 50 amino acids. The concentration of the isolated and/or purified proteins is preferably at least 1 μg/ml.
The present invention includes the discovery that the proteins able to bind to proteins containing at least one EH domain, in particular eps15 and eps15R, possess an asparagine-proline-phenylalanine (NPF) motif. It was also shown that the already known h-RAB protein, whose amino acid sequence is reported as SEQ ID NO: 3 and which contains an NPF motif, is able to bind to proteins containing the EH domain. Furthermore, it was shown that peptides containing the NPF motif bind to proteins containing the EH domain. The present invention therefore further comprises the complex between a protein containing at least one EH domain, in particular eps15 and eps15R, and a protein containing an NPF motif, in particular the proteins of this invention and their functional derivatives. Another aspect of the invention consists of peptides containing at least one NPF motif that are able to bind to a protein containing at least one EH domain, in particular to eps15 and eps15R. Preferably, such peptides are derived from the proteins of the invention. More preferably, such peptides have the sequences reported as SEQ ID NO: 15-79. The invention further comprises chemical derivatives of such peptides, the term "chemical derivative" having the meaning previously reported.
Another object of the invention is a fusion protein encompassing a peptide containing an NPF motif, in particular a GST-fusion protein.
A further object of the invention is a method for identifying and purifying proteins containing at least one EH domain, based on the use of a fusion protein encompassing a peptide containing an NPF motif. Another object of the present invention is a method for purifying an EH-containing protein from a complex mixture, which comprises: a) incubating the complex mixture with a solid-phase support to which a peptide containing an NPF motif is bound, allowing the protein to form a complex with the bound peptide; b) removing substances not complexed to the bound peptide; c) eluting the protein complexed to the solid support.
Another aspect of the invention consists of purified antibodies against the proteins of the invention, including both monoclonal and polyclonal antibodies.
The invention further includes antisense RNA obtained from the polynucleotides of the invention.
In addition, the present invention includes homopurine and homopyrimidine sequences of the polynucleotides of the invention or of their fragments, and their use as triple-helix probes.
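Homopurine and homopyrimidine stretches suitable as triple-helix targets can be located by a simple scan. The following Python sketch is an illustration only: the function names, the default minimum run length of 10 nt and the example sequence are assumptions, not values given in the text. Because the complementary strand of a purine-only run is pyrimidine-only, each hit marks a homopurine/homopyrimidine duplex segment.

```python
import re


def _runs(seq: str, alphabet: str, min_len: int):
    """Maximal runs made only of `alphabet` letters, as (start, end, run); end is exclusive."""
    pattern = re.compile("[%s]{%d,}" % (alphabet, min_len))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def homopurine_runs(seq: str, min_len: int = 10):
    """Stretches of A/G only (the opposite strand is then C/T only)."""
    return _runs(seq, "AG", min_len)


def homopyrimidine_runs(seq: str, min_len: int = 10):
    """Stretches of C/T only (the opposite strand is then A/G only)."""
    return _runs(seq, "CT", min_len)


if __name__ == "__main__":
    seq = "CTAGGAGAAGGGTCCCTTCTCCTTTGA"  # hypothetical sequence
    print(homopurine_runs(seq))      # [(2, 12, 'AGGAGAAGGG')]
    print(homopyrimidine_runs(seq))  # [(12, 25, 'TCCCTTCTCCTTT')]
```

Because the regular expression is greedy and `finditer` resumes after each match, every reported run is maximal. Shorter runs can be obtained by lowering `min_len`.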
As previously reported, eps15 and eps15R contain, in their N-terminus, three copies of a novel protein:protein interaction domain, which we named EH (Fig. 1). To understand the binding specificity of EH domains, we engineered GST (glutathione S-transferase) fusion proteins containing the three EH domains of eps15 (GST-EH) or of eps15R (GST-EHR). These fusion proteins were used to screen a random phage-displayed peptide library, and the selected phages were sequenced in the region corresponding to the random inserts. The conceptually translated peptides are displayed in Fig. 2. Forty-six of the 48 selected peptides contained the motif NPF. Position +1 (with respect to NPF) exhibited a strong preference for basic residues in peptides selected by GST-EHR, and for basic or hydrophobic residues in peptides selected by GST-EH. Notably, negatively charged residues were never present at this position. Positions -1 and -2 displayed a weaker preference for serine or threonine (Fig. 2). We then used the GST-EH fusion protein to screen a prokaryotic expression library from M426 human fibroblasts.
We identified several positive plaques that reacted specifically with GST-EH but not with control GST. By cross-hybridization experiments, the positive phages could be assigned to seven groups, corresponding to distinct cDNAs. The nucleotide sequences of the longest phage inserts of these cDNAs revealed that they represented partial cDNAs identified as: h-NUMB, the human homologue of NUMB, a developmentally regulated gene of Drosophila (Uemura et al., Cell 58, 349-360, 1989); h-NUMB-R, related to but distinct from h-NUMB; h-RAB, the gene encoding the cellular cofactor of the HIV Rev protein; h-RAB-R, a gene related to h-RAB; and three novel genes designated ehb3, ehb10 and ehb21 (for EH-binding, followed by the original plaque identifier).
There were no homologies among the seven partial cDNAs (apart from those between h-NUMB and h-NUMB-R, and between h-RAB and h-RAB-R), except for the presence of NPF motifs, frequently present in multiple copies (Fig. 3). Alignment of all NPF-containing stretches revealed a preference for hydrophobic residues at position +1 (relative to NPF) and for serine or threonine at positions -1 and -2.
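The motif analysis described above (locating every NPF copy and recording the residues at positions -2, -1 and +1) can be sketched in Python as follows. The residue classes are a simplified grouping assumed for illustration, the helper names are hypothetical, and apart from SSSTNPFL (the h-RAB peptide used for Fig. 11) the example sequence is invented.

```python
from collections import Counter
from typing import List, NamedTuple, Optional

# Simplified residue classes, assumed for illustration only.
BASIC = frozenset("KRH")
ACIDIC = frozenset("DE")
HYDROPHOBIC = frozenset("AVLIMFWY")


class NpfSite(NamedTuple):
    pos: int               # 0-based index of the N of NPF
    minus2: Optional[str]  # residue two positions before N (None if absent)
    minus1: Optional[str]  # residue immediately before N
    plus1: Optional[str]   # residue immediately after F


def _residue(seq: str, i: int) -> Optional[str]:
    return seq[i] if 0 <= i < len(seq) else None


def find_npf_sites(seq: str) -> List[NpfSite]:
    """Return every NPF occurrence with its -2, -1 and +1 flanking residues."""
    seq = seq.upper()
    sites = []
    i = seq.find("NPF")
    while i != -1:
        sites.append(NpfSite(i, _residue(seq, i - 2), _residue(seq, i - 1), _residue(seq, i + 3)))
        i = seq.find("NPF", i + 3)  # NPF copies cannot overlap
    return sites


def residue_class(aa: Optional[str]) -> str:
    if aa is None:
        return "none"
    if aa in BASIC:
        return "basic"
    if aa in ACIDIC:
        return "acidic"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    return "other"


def plus1_profile(seqs: List[str]) -> Counter:
    """Tally the class of the +1 residue over all NPF copies in `seqs`."""
    return Counter(residue_class(site.plus1) for s in seqs for site in find_npf_sites(s))


if __name__ == "__main__":
    print(find_npf_sites("SSSTNPFL"))
    print(plus1_profile(["SSSTNPFL", "AGSNPFKQTNPF"]))
```

A copy at the very end of a sequence has no +1 residue and is counted as "none" rather than silently skipped, so truncated peptides remain visible in the tally.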
The partial cDNAs reported above did not contain entire open reading frames (ORFs). We therefore screened a human fibroblast cDNA library to obtain the entire ORFs of h-RAB, h-RAB-R, h-NUMB and h-NUMB-R. Several cDNAs were isolated, and the longest ones, representative of each gene, were sequenced. Schematics of the cDNAs containing the entire ORFs of h-NUMB and h-NUMB-R, and of h-RAB and h-RAB-R, are presented in Figs. 4 and 6, respectively, while the entire sequences are reported as SEQ ID NO: 8, 9, 10 and 11, respectively. SEQ ID NO: 12, 13 and 14 correspond to the partial cDNA nucleotide sequences of ehb3, ehb10 and ehb21, respectively. The sequence of h-RAB corresponds to that already reported by Bogerd et al. (Cell 82, 485-494, 1995) and Fritz et al. (Nature 376, 530-550, 1995).
The ORFs of h-NUMB and h-NUMB-R encode polypeptides of 603 and 609 amino acids, respectively, with predicted molecular weights of 66 and 65 kDa. The two predicted proteins, whose amino acid sequences are indicated as SEQ ID NO: 1 and 2, respectively, display an overall relatedness of 74%, with 57% identity (Fig. 4). As already reported for the murine and Drosophila proteins, both h-NUMB and h-NUMB-R contain a phosphotyrosine interaction domain (PID/PTB; van der Geer and Pawson, Trends Biochem. Sci. 20: 277-280, 1995) in their N-terminus (Fig. 4) and putative SH3-binding sites in their C-terminus. In addition, each contains a single NPF motif, located a few amino acids before the C-terminal end of the protein.
The ORFs of h-RAB and h-RAB-R encode polypeptides of 562 and 481 amino acids, respectively, with predicted molecular weights of 58 and 49 kDa. The two predicted proteins display an overall relatedness of 71%, with 46% identity (Fig. 7).
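Identity and relatedness percentages such as those above are computed from a pairwise alignment. A minimal Python sketch follows, assuming that the two sequences are already aligned (gaps written as "-") and that only gap-free columns are counted. The "similar residue" groups are a hypothetical stand-in for whatever scoring scheme was actually used, so relatedness values from this sketch will not necessarily reproduce the figures reported here.

```python
from typing import Tuple

# Hypothetical "similar residue" groups; the original analysis may have used a
# substitution matrix instead, which would give different relatedness values.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG")
GROUP_OF = {aa: g for g in SIMILAR_GROUPS for aa in g}


def identity_and_similarity(a: str, b: str) -> Tuple[float, float]:
    """Percent identity and similarity over the gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not cols:
        return 0.0, 0.0
    identical = sum(x == y for x, y in cols)
    similar = sum(x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y)) for x, y in cols)
    n = len(cols)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # Invented aligned fragments, for illustration only.
    print(identity_and_similarity("MKVLS-TNPF", "MRILSATNPY"))
```

The choice of denominator is a convention: dividing by gap-free columns, by the alignment length or by the shorter sequence gives different percentages for the same alignment.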
Conserved features between h-RAB and h-RAB-R include a zinc-finger region in the N-terminus of the proteins and several FG (phenylalanine-glycine) motifs, characteristic of nucleoporin-like proteins. In addition, both contain four NPF motifs, located in the C-terminal half of the molecule. To characterize the binding of the EH-interactors to eps15, we engineered GST-fusion proteins encompassing fragments derived from the seven identified EH-binding proteins. In all cases the original partial cDNAs were used (see Examples), since they should contain all of the determinants necessary and sufficient for EH binding. As shown in Fig. 8, all of the GST-fusion proteins were able to specifically recover native eps15 from cell lysates in in vitro binding experiments.
To address whether the NPF-containing peptides present in the EH-binding proteins were actually responsible for binding to eps15, we engineered GST-fusion proteins encompassing short NPF-containing peptides from the proteins of interest. The peptides engineered as GST-fusions are underlined in Fig. 3. In this case as well (Fig. 9), we observed specific binding of the GST-peptide fusions to native eps15 from total cellular lysates. Together, these results indicate that NPF-containing peptides are sufficient for binding to eps15.
We then tested whether eps15 can physically interact with some of the EH-binding proteins in vivo. To this aim, we engineered eukaryotic expression vectors encoding either h-NUMB or h-RAB fused at their N-terminus to a hemagglutinin (HA) epitope. The expression vectors obtained were called HA-NUMB and HA-RAB, respectively. C33A cells were then transiently transfected with the HA-NUMB or HA-RAB vector, and cellular lysates were prepared under mild lysis conditions to preserve protein:protein interactions. As shown in Fig. 10, eps15 was recovered by immunoprecipitation with an anti-HA antibody, but not with control antibodies. Together, these results demonstrate that eps15 interacts with h-NUMB and h-RAB in vivo.
To assess the relevance of the NPF motif for binding to the EH domain, we selected the peptide SSSTNPFL, which is present in h-RAB and binds strongly to eps15. We also generated GST-fusion proteins containing mutants of the same peptide in which individual amino acid positions had been mutated. The GST-fusion proteins depicted in Fig. 11 were then tested in in vitro binding assays for their ability to recover native eps15 from cellular lysates.
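The mutant peptides can be derived systematically from the parental sequence. The hypothetical Python helper below substitutes one residue of the NPF core at a time, producing the NGF, DPF and NPY variants of SSSTNPFL described for Fig. 11. It models the sequences only, not their binding behavior.

```python
from typing import Dict, Iterable, Tuple


def motif_point_mutants(peptide: str, motif: str,
                        substitutions: Iterable[Tuple[int, str]]) -> Dict[str, str]:
    """Map each mutated motif (e.g. 'NGF') to the full mutant peptide.

    Each (offset, residue) pair replaces one position of the first copy of `motif`.
    """
    start = peptide.find(motif)
    if start == -1:
        raise ValueError(f"{motif!r} not found in {peptide!r}")
    end = start + len(motif)
    mutants = {}
    for offset, residue in substitutions:
        new_motif = motif[:offset] + residue + motif[offset + 1:]
        mutants[new_motif] = peptide[:start] + new_motif + peptide[end:]
    return mutants


if __name__ == "__main__":
    # The three substitutions tested in Fig. 11: N->D, P->G, F->Y.
    print(motif_point_mutants("SSSTNPFL", "NPF", [(0, "D"), (1, "G"), (2, "Y")]))
    # {'DPF': 'SSSTDPFL', 'NGF': 'SSSTNGFL', 'NPY': 'SSSTNPYL'}
```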
As shown in Fig. 11, mutation of the NPF motif to NGF, DPF or NPY completely abolished binding. These results show that the NPF motif is necessary for binding to the EH domain.
Many EH-containing proteins display repeated copies of the binding module (Wong et al., supra); this observation raises the question of whether a single EH domain is endowed with binding ability, or whether multiple copies are required. Previous experiments were performed with GST-fusion proteins containing the three EH domains of eps15 and eps15R, in order to maximize the chance of detecting protein:protein interactions. Designing GST-fusions that contain single EH domains requires taking into account the fact that the EH domains of eps15 are contiguous. Thus, the exact domain boundaries, and the structural consequences of extracting sequences from their natural context, were difficult to predict. We therefore took advantage of secondary structure predictions of the N-terminus of eps15. As shown in Fig. 12, a Chou-Fasman-Rose algorithm predicted that the three EH domains of eps15 are flanked by regions with a high propensity for turns, possibly reflecting a requirement for domain exposure in order to achieve binding. We thus engineered a GST-fusion protein containing residues 106-216 of eps15, which encompass the second EH domain flanked by the predicted N- and C-terminal turns (M2 protein, Fig. 13). This protein displayed strong binding to the h-RAB protein obtained by in vitro transcription/translation (Fig. 14). Removal of the N-terminal region that can form a turn, as in the GST-fusion protein 122-216 (LEH2 protein, Fig. 13), drastically reduced binding capacity (Fig. 14). However, replacing this region with an artificial polyglycine turn region, as in the construct KT-LEH2 (Fig. 13), completely restored binding activity (Fig. 14).
Thus, these results establish that an individual EH domain is endowed with binding ability.
Eps15 and eps15R are the only EH-containing proteins known in mammals. The identification of peptide sequences that bind to EH domains allowed the identification of putative EH-containing proteins in mammalian cells. GST-fusion proteins encompassing NPF-containing peptides from h-NUMB, h-RAB and h-RAB-R were challenged with 35S-labeled lysates from NIH-3T3 cells, and several cellular proteins were specifically recovered (Fig. 15). In the case of the RAB peptide, mutation of NPF to DPF, NGF or NPY totally abolished recognition, strongly arguing that the identified proteins represent EH-containing species (Fig. 15).
The isolation of the partial cDNA clones ehb3, ehb10 and ehb21 was reported above. Several techniques can be used to obtain the corresponding full-length cDNA clones.
Conventional biochemical techniques permit the use of a partial cDNA clone as a probe to identify a cDNA clone corresponding to a full-length transcript, or a genomic clone containing the complete ehb3, ehb10 or ehb21 gene. For instance, the following approach can be used to obtain a complete cDNA or genomic DNA sequence corresponding to ehb3, ehb10 or ehb21:
1. Label the partial sequences of ehb3, ehb10 and ehb21 and use them as probes to screen a lambda phage or plasmid human cDNA library.
2. Identify the colonies containing clones related to the probe cDNA and purify them by known purification methods.
3. Sequence the ends of the newly purified clones to identify full-length sequences.
4. Perform complete sequencing of full-length clones by exonuclease III digestion or primer walking.
Northern blots of mRNA from various tissues, using at least part of the clone as a probe, can optionally be performed to compare the size of the mRNA with that of the purported full-length cDNA.
More particularly, all or part of the cDNA may be used as a probe to identify a cDNA clone containing the full length of cDNA sequence. The partial sequences, or portion thereof, can be nick-translated or end-labeled with 32P using polynucleotide kinase and labeling methods known to those with skill in the art (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner and J.F. Battey, ed., Elsevier
Press, N.Y. 1986). A lambda library can be directly screened with the labeled cDNA probe, or the library can be converted en masse to pBluescript® (Stratagene, La Jolla, Calif.) to facilitate bacterial colony screening. Both methods are well known in the art.
Filters with bacterial colonies containing the library in pBluescript® or bacterial lawns containing lambda plaques are denaturated and the DNA is fixed to the filters. The filters are hybridized with the labeled probe using hybridization conditions described by Davis et al. (supra). The partial sequence, cloned into lambda or pBluescript®, can be used as a positive control to assess background binding and to adjust the hybridization and washing stringencies necessary for accurate clone identification. The resulting autoradiograms are compared to duplicate plates of colonies or plaques; each exposed spot corresponds to a positive colony or plaque. The colonies or plaques are selected, expanded, ant the DNA is isolated from the colonies for further analysis and sequencing.
Positive cDNA clones in phage lambda may be analyzed to determine how much additional sequence they contain, by PCR with one primer from the partial sequence and the other primer from the vector. Clones yielding a larger vector-insert PCR product than the original clone are analyzed by restriction digestion and DNA sequencing to determine whether they contain an insert of the same or similar size as the mRNA detected on a Northern blot.
Once one or more overlapping cDNA clones are identified, the complete sequence of the clones can be determined. The preferred method is exonuclease III digestion (McCombie et al., Methods 3: 33-40, 1991). A series of deletion clones is generated and each is sequenced. The resulting overlapping sequences are assembled into a single contiguous sequence of high redundancy (usually three to five overlapping sequences at each nucleotide position), resulting in a highly accurate final sequence.
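The assembly step can be illustrated with a short Python sketch. It assumes that each deletion-clone read has already been placed at a known offset on the contig (as in a nested deletion series; a real assembler must first detect the overlaps), builds a majority-vote consensus, and reports per-position coverage so that positions below the three-fold redundancy mentioned above can be flagged. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def assemble_placed_reads(placed_reads, length):
    """Majority-vote consensus from reads whose start offsets are already known.

    placed_reads: list of (offset, sequence) tuples, offsets 0-based on the contig.
    Returns (consensus, coverage); coverage[i] is the number of reads covering
    position i, and uncovered positions are reported as 'N'.
    """
    columns = [Counter() for _ in range(length)]
    for offset, read in placed_reads:
        for i, base in enumerate(read.upper()):
            pos = offset + i
            if 0 <= pos < length:
                columns[pos][base] += 1
    consensus = "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )
    coverage = [sum(col.values()) for col in columns]
    return consensus, coverage


def low_coverage_positions(coverage, minimum=3):
    """Positions covered by fewer reads than the target redundancy."""
    return [i for i, depth in enumerate(coverage) if depth < minimum]


if __name__ == "__main__":
    reads = [(0, "ACGTAC"), (2, "GTACGT"), (0, "ACGAAC"), (4, "ACGT"), (1, "CGTACG")]
    seq, cov = assemble_placed_reads(reads, 8)
    print(seq, cov)                     # ACGTACGT [2, 3, 4, 4, 5, 5, 3, 2]
    print(low_coverage_positions(cov))  # [0, 7]
```

In the toy data one read carries an A at position 3, but the three other reads covering that position outvote it; this is the sense in which redundancy yields a more accurate final sequence.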
Once isolated and characterized, the gene can be expressed in a recombinant organism to obtain significant amounts of protein.
The DNA encoding the protein can be inserted into other conventional host organisms and expressed. The organism can be a bacterium, yeast, cell line or multicellular plant or animal. The literature is replete with examples of suitable host organisms and expression techniques.
For example, naked polynucleotide (DNA or mRNA) can be injected directly into the muscle tissue of mammals, where it is expressed. This methodology can be used to deliver the polynucleotide, and therefore the resulting polypeptide translation product, to the animal, or to generate an immune response against a foreign polypeptide (Wolff et al., Science 247: 1465, 1990; Felgner et al., Nature 349: 351, 1991). Alternatively, the coding sequence, together with appropriate regulatory regions (i.e., a construct), can be inserted into a vector, which is then used to transfect a cell. The cell (which may or may not be part of a larger organism) then expresses the polypeptide. Antibodies against the proteins of the present invention can be obtained by direct injection of the naked polynucleotide into an animal (Wolff, supra) or by administering the polypeptide to an animal. The antibody so obtained will then bind the polypeptide itself. In this manner, even partial DNA sequences can be used to generate antibodies that bind the whole native polypeptide. Antibodies can be used in standard immunoassays to detect the presence and/or amount of the proteins of the invention in a sample. Such assays can be competitive or noncompetitive. Radioimmunoassays, ELISAs, Western blot assays, immunohistochemical assays, immunochromatographic assays and other conventional assays are expressly contemplated. Furthermore, monoclonal and polyclonal antibodies can be generated using well-known methods.
Antibodies against the proteins of this invention can be used to determine the quantity of these proteins in a sample. This can be particularly useful in clinical research, as well as in detecting abnormalities in mitogenic signal transduction in malignant tissue.
In addition, in many human tumors, tumor markers are released into the bloodstream at levels that correlate with the size of the tumor and its clinical stage.
Determining the levels of a marker (such as the proteins of this invention) can aid diagnosis and help monitor the effectiveness of therapy.
For example, in one technique the antibody against a protein is immobilized on an agarose column. The sample is then passed through the column, where the protein in the sample is bound by the immobilized antibody. Next, a known quantity of radiolabeled antibody is passed through the column. The quantity of labeled antibody that is not retained on the column is measured; it bears a relationship to the quantity of protein in the sample.
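One way to convert the measured quantity of unretained labeled antibody into an amount of protein is to run known amounts of purified protein through the same procedure and interpolate between them. The sketch below assumes such calibrators are available (the text does not describe them) and uses piecewise-linear interpolation for simplicity; the function name and the counts are hypothetical.

```python
def interpolate_from_standards(standards, signal):
    """Estimate a protein amount from an assay reading by linear interpolation.

    standards: (known_amount, measured_signal) pairs from calibrators.
    The signal may rise or fall with amount; pairs are ordered by signal.
    Raises ValueError outside the calibrated range (no extrapolation).
    """
    points = sorted(standards, key=lambda p: p[1])
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return a0
            # Fraction of the way between the two bracketing calibrators.
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal outside the calibrated range")


if __name__ == "__main__":
    # Hypothetical calibrators: (amount of protein, unretained labeled-antibody cpm).
    calibrators = [(0.0, 10000.0), (10.0, 8000.0), (20.0, 6000.0)]
    print(interpolate_from_standards(calibrators, 7000.0))
```

With these hypothetical calibrators, a reading of 7000 cpm falls halfway between the 10- and 20-unit standards and interpolates to 15 units. The curve must be monotonic over the calibrated range for the reading to map to a single amount.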
Another exemplary technique is liquid-phase radioimmunoassay. First, a standard measurement is made: a small, known amount of purified protein, radiolabeled in a conventional manner, is challenged against a known amount of antibody.
The resulting immunocomplex is recovered by centrifugation, and its radioactivity is determined. This value is used as a standard against which later measurements are compared. Next, a sample containing an unknown amount of protein is challenged against the same known amount of antibody used for the standard measurement. The same amount of labeled protein used in the standard measurement is then added to the reaction mixture, followed by centrifugation and measurement of radioactivity as described above. The decrease in immunoprecipitated radioactivity (relative to the standard) is proportional to the amount of protein in the sample. Other well-known conventional immunoassay methods may also be used.
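The liquid-phase assay can be worked through numerically under idealized assumptions. Suppose the antibody is limiting and fully occupied, and that labeled and unlabeled protein bind it with equal affinity. The precipitated label then falls from B0 to B = B0·L/(L + U), where L is the amount of labeled protein and U the unlabeled amount in the sample, so U = L·(B0/B - 1). When U is small relative to L, the decrease B0 - B ≈ B0·U/L is approximately proportional to U, as stated above; at larger U the response curves, which is why readings are normally taken from a standard curve. The function below is a hypothetical illustration of this model, not a validated assay calculation.

```python
def competitive_ria_estimate(b0_cpm, b_cpm, labeled_amount):
    """Unlabeled protein in a sample from a competitive (isotope-dilution) RIA.

    b0_cpm: precipitated counts with labeled protein alone (the standard).
    b_cpm: precipitated counts when the sample competes with the label.
    labeled_amount: amount of labeled protein added (any consistent unit).

    Assumes a limiting, saturated antibody and equal affinity for labeled and
    unlabeled protein, so bound label scales as L / (L + U).
    Solving b / b0 = L / (L + U) for U gives U = L * (b0 / b - 1).
    """
    if b0_cpm <= 0 or not 0 < b_cpm <= b0_cpm:
        raise ValueError("need 0 < b_cpm <= b0_cpm")
    return labeled_amount * (b0_cpm / b_cpm - 1.0)


if __name__ == "__main__":
    # Hypothetical counts: halving the precipitated label means U equals L.
    print(competitive_ria_estimate(10000, 5000, 2.0))
```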
Antisense RNA molecules are known to be useful for regulating translation within the cell. Antisense RNA molecules can be produced from the sequences of the present invention. These antisense molecules can be used as therapeutic agents to regulate gene expression.
The antisense molecules are obtained from a nucleotide sequence by reversing the orientation of the coding region with regard to the promoter.
Thus, the antisense RNA is complementary to the corresponding mRNA.
For a review of antisense design, see Green et al., Ann. Rev. Biochem. 55: 569-597 (1986), which is hereby incorporated by reference.
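Reversing the coding region with respect to the promoter yields a transcript that is the reverse complement of the mRNA. A minimal Python sketch of that sequence transformation follows. The function name is hypothetical, and the example input ATGAACAAGCTG is only an illustrative coding sequence for Met-Asn-Lys-Leu (the N-terminus of SEQ ID NO: 1), not the actual cDNA, which is not given in this section.

```python
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense_dna, as_rna=False):
    """Return the antisense strand, written 5'->3', for a sense (coding) sequence.

    The antisense sequence is the reverse complement of the sense strand;
    with as_rna=True, T is written as U to give the antisense RNA.
    """
    anti = sense_dna.translate(DNA_COMPLEMENT)[::-1]
    return anti.replace("T", "U").replace("t", "u") if as_rna else anti


if __name__ == "__main__":
    print(antisense("ATGAACAAGCTG"))               # CAGCTTGTTCAT
    print(antisense("ATGAACAAGCTG", as_rna=True))  # CAGCUUGUUCAU
```

Applying the function twice returns the original sequence, which is a quick check that the complement table and the reversal are consistent.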
The antisense sequences can contain modified sugar-phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of such modifications are described by Rossi et al., Pharmacol. Ther. 50: 245-254 (1991).
Antisense molecules are introduced into cells that express the gene encoding the protein. In a preferred application of the invention, the effectiveness of antisense inhibition of translation can be monitored using techniques that include, but are not limited to, antibody-mediated tests such as RIAs and ELISAs, functional assays, or radiolabeling. The antisense molecule is introduced into the cells by diffusion or by transfection procedures known in the art. The molecules are added to cell samples at a number of different concentrations, preferably between 1×10⁻¹⁰ M and 1×10⁻⁴ M. Once the minimum concentration that can adequately control translation is identified, the optimized dose is translated into a dosage suitable for use in vivo. For example, an inhibiting concentration in culture of 1×10⁻⁷ M translates into a dose of approximately 0.6 mg/kg body weight. Levels of oligonucleotide approaching 100 mg/kg body weight or higher may be possible after the toxicity of the oligonucleotide has been tested in laboratory animals.
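The concentration screen described above can be planned and read out with a few lines of Python. This sketch assumes one test concentration per decade across the stated 1×10⁻¹⁰ M to 1×10⁻⁴ M range, and that inhibition is recorded as a fraction relative to untreated cells (for example from an ELISA); the function names and the inhibition values are hypothetical. The conversion from an in vitro concentration to an in vivo dose is not modeled, because it depends on factors the text does not give.

```python
import math


def log_dilution_series(low=1e-10, high=1e-4, points_per_decade=1):
    """Molar concentrations spaced evenly on a log scale from low to high."""
    decades = round(math.log10(high / low))
    steps = decades * points_per_decade
    return [low * 10 ** (i / points_per_decade) for i in range(steps + 1)]


def minimum_effective_concentration(inhibition_by_conc, threshold):
    """Lowest tested concentration whose fractional inhibition meets threshold.

    Returns None if no tested concentration reaches the threshold.
    """
    for conc in sorted(inhibition_by_conc):
        if inhibition_by_conc[conc] >= threshold:
            return conc
    return None


if __name__ == "__main__":
    print(log_dilution_series())
    # Hypothetical read-out: fraction of translation inhibited at each dose.
    results = {1e-9: 0.05, 1e-8: 0.2, 1e-7: 0.6, 1e-6: 0.9}
    print(minimum_effective_concentration(results, threshold=0.5))  # 1e-07
```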
The antisense molecule can be introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as oligonucleotide contained in an expression vector. The antisense oligonucleotide is preferably introduced into the vertebrate by injection. Alternatively, cells from the vertebrate are removed, treated with the antisense oligonucleotide, and reintroduced into the vertebrate. It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme sequence to enable the antisense to bind and cleave its target. For technical applications of ribozyme and antisense oligonucleotides, see Rossi et al., supra.
Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity associated with a particular gene. The gene sequence or, more preferably, a portion thereof, can be used to inhibit gene expression in individuals suffering from disorders associated with this gene. Similarly, a portion of a gene sequence, or the entirety thereof, can be used to study the effect of inhibiting transcription of the gene within the cell. Traditionally, homopurine sequences were considered the most useful. However, homopyrimidine sequences can also inhibit gene expression. Thus, both types of sequences corresponding to the claimed nucleotide sequences are contemplated within the scope of this invention. Homopyrimidine oligonucleotides bind in the major groove at homopurine:homopyrimidine sequences. As an example, 10-mer to 20-mer homopyrimidine sequences derived from the claimed nucleotide sequences can be used to inhibit expression from homopurine sequences. Moreover, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation, see Griffin et al., Science 245: 967-971 (1989), which is hereby incorporated by reference.
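Candidate triplex target sites can be located by scanning a sequence for homopurine:homopyrimidine stretches at least 10 nucleotides long, the lower bound of the 10-mer to 20-mer range above. The sketch below is a hypothetical illustration; longer stretches would be tiled into windows of up to 20 nucleotides (not shown). The third-strand rule used here (a homopyrimidine oligonucleotide running parallel to the purine strand, pairing T with A and protonated C with G) is standard pyrimidine-motif triplex chemistry rather than something stated in the text.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def find_homopurine_tracts(seq, min_len=10):
    """Locate homopurine:homopyrimidine stretches usable as triplex targets.

    Returns (start, end, strand) tuples with 0-based, end-exclusive coordinates.
    strand '+' means the purine run lies on the given strand; '-' means the
    given strand is pyrimidine-only there, so the purines are on its complement.
    """
    seq = seq.upper()
    tracts = []
    i = 0
    while i < len(seq):
        if seq[i] in PURINES:
            cls, strand = PURINES, "+"
        elif seq[i] in PYRIMIDINES:
            cls, strand = PYRIMIDINES, "-"
        else:  # ambiguous base such as N breaks any run
            i += 1
            continue
        j = i
        while j < len(seq) and seq[j] in cls:
            j += 1
        if j - i >= min_len:
            tracts.append((i, j, strand))
        i = j
    return tracts


def pyrimidine_tfo(purine_run):
    """Homopyrimidine third strand, 5'->3' parallel to the purine strand (A->T, G->C)."""
    return purine_run.upper().translate(str.maketrans("AG", "TC"))


if __name__ == "__main__":
    target = "CC" + "AGAAGGAGAAGGA" + "CC"
    print(find_homopurine_tracts(target))     # [(2, 15, '+')]
    print(pyrimidine_tfo("AGAAGGAGAAGGA"))    # TCTTCCTCTTCCT
```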
The oligonucleotides may be prepared on an oligonucleotide synthesizer or purchased commercially from a company specializing in custom oligonucleotide synthesis. The sequences are introduced into cells in culture using techniques known in the art, including but not limited to calcium phosphate precipitation, DEAE-dextran, electroporation, liposome-mediated transfection or native uptake. Treated cells are monitored for altered cell function. Alternatively, cells from the organism are extracted, treated with the triple helix oligonucleotide, and reimplanted into the organism.
FIGURES DESCRIPTION
Figures 1, 2, 3: EH-mediated binding to peptides and proteins in vitro and in vivo. FIG. 1. Schematic of the eps15 and eps15R proteins with their EH domains. Amino acid positions are indicated. FIG. 2. Predicted amino acid sequences of peptides selected by screening a random phage-displayed peptide library with GST-EH and GST-EHR, derived from eps15 and eps15R, respectively. NPFs are in bold face. FIG. 3. NPF-containing motifs present in the cDNAs identified by screening a human fibroblast expression library using GST-EH as a probe. Underlined peptides were used for the in vitro binding experiments described in Figs. 4, 5 and 7.
Figures 4, 5: Human NUMB and NUMB-R cDNAs and proteins. FIG. 4. The structure of the human NUMB (h-NUMB) and NUMB-R (h-NUMB-R) cDNAs is depicted. The ORFs are indicated by a solid box. Positions are indicated in kb. The nucleotide positions of initiator and terminator codons are also indicated. Canonical polyadenylation sites (AATAAA) were at positions 2649, 2823, 2958 and 3025 for h-NUMB and h-NUMB-R, respectively (not shown). FIG. 5. Predicted protein sequences and alignment of human NUMB and NUMB-R. In the h-NUMB-R sequence, only non-identical amino acids are reported, except for the NPF motifs. Dashes indicate gaps introduced to maximize the alignment. Accepted conservations, employed to calculate relatedness, were: (D, E, N, Q); (L, I, V, M); (K, R, H); (F, Y, W); (A, G, P, S, T). The PTB domains and the NPF motifs are indicated in reverse print.
Figures 6, 7: Human RAB and RAB-R cDNAs and proteins. FIG. 6. The structure of the human RAB (h-RAB) and RAB-R (h-RAB-R) cDNAs is depicted. The ORFs are indicated by a solid box. Positions are indicated in kb. The nucleotide positions of initiator and terminator codons are also indicated. Canonical polyadenylation sites (AATAAA) were at positions 2499, 2542 and 2556 of the h-RAB sequence; no polyadenylation site was found in the isolated h-RAB-R cDNA (not shown). FIG. 7. Predicted protein sequences and alignment of human RAB and RAB-R. The sequence of h-RAB is identical to that reported by Bogerd et al. (Cell 82, 485-494, 1995) and Fritz et al. (Nature 376, 530-533, 1995). In the h-RAB-R sequence, only non-identical amino acids are reported, except for the FG and NPF motifs. Dashes indicate gaps introduced to maximize the alignment. Accepted conservations, employed to calculate relatedness, were: (D, E, N, Q); (L, I, V, M); (K, R, H); (F, Y, W); (A, G, P, S, T). The FG, zinc-finger and NPF motifs are indicated in reverse print.
Figures 8, 9, 10: In vitro binding of NPF-containing proteins and peptides to eps15. FIG. 8. In vitro binding of eps15 to GST fusions of EH-binding proteins: total cellular lysates from NIH-3T3 cells (1 mg/lane) were incubated with the indicated GST-fusion proteins (10 μg) for 1 h at 4 °C. Specifically bound eps15 was detected by immunoblot with an anti-eps15 antibody. FIG. 9. Binding of eps15 to GST fusions encompassing NPF-containing peptides: GST-fusion proteins were engineered to encompass the NPF-containing peptides underlined in Fig. 3 and are indicated with the names of the proteins from which the peptides were derived. In vitro binding experiments were performed as described for FIG. 8. FIG. 10. Coimmunoprecipitation of eps15 with h-NUMB and h-RAB proteins. Two expression vectors with the cytomegalovirus promoter (HA-RAB and HA-NUMB) were engineered by inserting the sequence encoding the HA epitope (YDVPDYASLP). C33A cells were transfected with the vectors, and total cellular proteins were obtained under mild lysis conditions to preserve protein:protein interactions. 5 mg of total cellular proteins were immunoprecipitated with an anti-HA antibody (HA) or with a control serum (C). Coimmunoprecipitation was revealed by immunoblotting (WB) with an anti-eps15 antibody. The lane marked "LYS" was loaded with 50 μg of total cellular proteins, to serve as a reference for the position of eps15 (also indicated by arrowheads).
Figure 11: Requirement for the NPF motif for binding to eps15. Binding to eps15 of peptides containing mutations in the NPF sequence. Peptides engineered as GST-fusion proteins are indicated on the left. GST-NPF corresponds to the sequence of an NPF-containing peptide derived from RAB (shown in Fig. 3). Mutant peptides (GST-NGF, GST-DPF, GST-NPY) are also indicated. The in vitro bindings to eps15, obtained as described for Figs. 8-10, are shown on the right. The lane marked "RAB" represents an in vitro binding obtained with the GST-RAB protein (Fig. 8), serving as a positive control.
Figures 12, 13, 14: Mapping of the minimal region of eps15 required for binding to NPF-containing proteins. FIG. 12. Secondary structure prediction of the N-terminal region of eps15 (containing the three EH domains) by a Chou-Fasman-Rose algorithm; alpha helices, beta sheets, coils and turns are indicated by distinct symbols. Amino acid positions are also indicated. FIG. 13. Schematic representation of the eps15 N-terminal domain and of the GST-fusion proteins engineered, with predicted turns indicated by solid boxes. The indicated fragments of eps15 were engineered into GST-fusion proteins and used for in vitro binding experiments. The EH construct contains all three EH domains. The M2 construct contains the region encompassing the second EH domain, flanked by the natural regions predicted to form the turns shown in FIG. 12. The LEH-2 construct contains the same region without the turns. The KT-LEH2 construct contains the same region preceded by an artificial turn region composed of poly-glycine. Amino acid positions are also indicated in parentheses. FIG. 14. In vitro bindings. The GST fusions shown in FIG. 13 were used to bind ³⁵S-labeled h-RAB protein, obtained by in vitro transcription/translation of the RAB cDNA. Detection was by autoradiography. The lane marked T/T was loaded with the primary product of the in vitro transcription/translation, to serve as a reference.
Fig. 15: Identification of a family of EH-containing proteins in mammals. ³⁵S-labeled lysates from NIH-3T3 cells (50 × 10⁶ TCA-precipitable counts) were incubated with the indicated GST-fusion proteins displaying NPF-containing peptides (NPF-RAB, NPF-RAB-R and NPF-NUMB, underlined in Fig. 3) or mutant peptides (NGF-RAB, DPF-RAB and NPY-RAB, shown in Fig. 11), or with GST. The first two lanes represent the same lysate immunoprecipitated with anti-eps15 antibody (EPS15) or with a control serum (PRE). The position of eps15 is indicated by an arrowhead. Molecular weight markers are indicated in kDa.
EXAMPLES
Example 1 - Screening of phage-display libraries. The peptide library used and the panning conditions were essentially as described by Felici et al. (J. Mol. Biol. 222, 301-310, 1991), the main difference being that the target EH domains (3 μg) were immobilized by binding to 20 μl of GST-sepharose matrix (Pharmacia). The peptide library was screened with either the GST-EH or the GST-EHR fusion protein, designed to encompass amino acid positions 2-230 and 15-368 of mouse eps15 and mouse eps15R, respectively. Two panning cycles were carried out for each domain before the clones enriched during the selection procedure were sequenced.
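The NPF tripeptide highlighted in the selected peptides (FIG. 2) can be located automatically in any sequence. This hypothetical Python sketch converts three-letter residues, as used in the Sequence Listing, to one-letter code and reports the 1-based start position of each motif occurrence; position numbers mixed into a pasted row are ignored.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # unidentified residue, as used in the Sequence Listing
}


def to_one_letter(three_letter):
    """Convert space-separated three-letter residues; numeric tokens are skipped."""
    return "".join(
        THREE_TO_ONE[token] for token in three_letter.split() if not token.isdigit()
    )


def find_motif(protein, motif="NPF"):
    """1-based start positions of every (possibly overlapping) motif occurrence."""
    return [
        i + 1
        for i in range(len(protein) - len(motif) + 1)
        if protein.startswith(motif, i)
    ]


if __name__ == "__main__":
    row = "Asn Lys Ser Lys Gln Arg Thr Asn Pro Ser Pro Thr Asn Pro Phe Ser 580 585 590"
    fragment = to_one_letter(row)
    print(fragment, find_motif(fragment))  # NKSKQRTNPSPTNPFS [13]
```

Applied to residues 577-592 of SEQ ID NO: 1, the sketch reports a motif at fragment position 13, i.e. an NPF starting at residue 589 of the full protein.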
Example 2 - Isolation of cDNAs encoding EH-binding proteins.
A GST-fusion protein containing the three EH domains of eps15 (GST-EH; Wong et al., Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995) was used to screen a pCEV-LAC-based prokaryotic expression library (a kind gift of T. Miki) from M426 human fibroblasts. Recombinant plaques (5 x 10^) were screened, after induction with IPTG (isopropyl β-D-thiogalactopyranoside), using a modification of the Far-Western assay developed to identify proteins interacting with GST fusions (Matoskova et al., Oncogene 12, 2679-2688, 1996; ibidem 12, 2563-2571, 1996). Briefly, filters were blocked in 2% bovine serum albumin (BSA) in TTBS (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.05% Tween 20) for at least 2 h at room temperature (RT) and then in reduced glutathione (Sigma, 3 μM) in TTBS with 0.5% w/v BSA for 1 h at RT.
Blots were then incubated with GST-EH (10 nM) in TTBS in the presence of reduced glutathione (3 μM) and BSA (0.5% w/v) for 1 h at RT. After extensive washing in TTBS, bound GST-EH was detected with an affinity-purified anti-GST antibody, as described in Matoskova et al. (supra).
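As a worked example of the buffer arithmetic, the sketch below computes the reagents for TTBS as defined above. It assumes that Tris is weighed as Tris base (121.14 g/mol) and titrated to pH 7.5 with HCl, that NaCl is 58.44 g/mol, and that the Tween 20 and BSA percentages are v/v and w/v respectively; the function names are hypothetical.

```python
# Molecular weights (g/mol) assumed for the solid reagents.
MW_G_PER_MOL = {"Tris base": 121.14, "NaCl": 58.44}


def grams_for_molarity(molar, volume_l, mw):
    """Mass in grams of a solute for a target molarity (mol/L) and volume (L)."""
    return molar * volume_l * mw


def w_v_percent_to_grams(percent, volume_l):
    """Grams for a % w/v solution (1% w/v = 1 g per 100 mL)."""
    return percent / 100 * volume_l * 1000


def ttbs_recipe(volume_l=1.0):
    """TTBS as used above: 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.05% Tween 20.

    Tris is weighed as Tris base and titrated to pH 7.5 with HCl (assumed);
    Tween 20 is treated as % v/v (assumed).
    """
    return {
        "Tris base (g)": grams_for_molarity(0.020, volume_l, MW_G_PER_MOL["Tris base"]),
        "NaCl (g)": grams_for_molarity(0.150, volume_l, MW_G_PER_MOL["NaCl"]),
        "Tween 20 (mL)": 0.05 / 100 * volume_l * 1000,
    }


if __name__ == "__main__":
    for reagent, amount in ttbs_recipe(1.0).items():
        print(f"{reagent}: {amount:.3f}")
    print(f"BSA for 2% blocking, 1 L (g): {w_v_percent_to_grams(2.0, 1.0):.1f}")
```

For 1 L this gives about 2.42 g Tris base, 8.77 g NaCl and 0.5 mL Tween 20, and 20 g BSA for the 2% blocking solution (5 g for the 0.5% w/v incubation buffer).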
Positive phages that specifically reacted with GST-EH but not with control GST were subjected to cross-hybridization experiments and assigned to seven groups, corresponding to distinct cDNAs (not shown). For each group, the phages containing the longest inserts were sequenced on both strands by the dideoxy chain-termination (Sanger) method, using a commercial kit (Sequenase).
To obtain cDNA clones containing the entire open reading frames of human NUMB, NUMB-R, RAB and RAB-R, a pCEV-29-based eukaryotic expression library from M426 human fibroblasts (Miki, T. et al. 1991 Science 251: 12-15) was screened with standard techniques. Peptide sequences were aligned with the CLUSTAL4 algorithm in the MacDNASIS software.
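The relatedness values in FIGS. 5 and 7 rest on the accepted conservation groups (D, E, N, Q); (L, I, V, M); (K, R, H); (F, Y, W); (A, G, P, S, T). A minimal Python sketch of that calculation for two already-aligned sequences follows. How the original calculation treated gaps is not stated, so skipping gapped columns and normalizing by the number of compared columns is an assumption, as is the function name.

```python
CONSERVATIVE_GROUPS = ["DENQ", "LIVM", "KRH", "FYW", "AGPST"]
GROUP_OF = {aa: group for group in CONSERVATIVE_GROUPS for aa in group}


def relatedness(aligned_a, aligned_b, gap="-"):
    """Percent identity and percent similarity of two aligned sequences.

    Columns where either sequence has a gap are skipped (assumed convention).
    A column counts as similar if the residues are identical or belong to the
    same conservative group.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # D/E, F/W and A/G are conservative substitutions; the gap column is skipped.
    print(relatedness("DLKF-A", "ELKW-G"))  # (40.0, 100.0)
```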
Example 3 - Production of recombinant proteins. GST-fusion proteins containing large fragments of the proteins of interest were obtained by recombinant PCR of the appropriate fragment from the murine eps15 and eps15R cDNAs, or from the cDNAs of the EH-binding proteins, followed by cloning into the pGEX expression vector, in frame with the GST moiety.
Except where indicated, we used the pGEX 4T1 vector (Pharmacia). The KT-LEH2 vector (FIG. 13) was created using the pGEX-KT vector, which contains a poly-glycine sequence in frame with the GST moiety upstream of the cloning sites.
GST fusions containing short peptides from the proteins of interest were obtained by annealing complementary oligonucleotides of the appropriate sequence in vitro, followed by cloning in frame with the GST moiety in a pGEX-KT vector. Purification of the GST-fusion proteins on glutathione-agarose and in vitro binding experiments were performed as described in Wong et al. (Proc. Natl. Acad. Sci. USA 92, 9530-9534, 1995) and Matoskova et al. (Oncogene 12, 2679-2688 and 2563-2571, 1996).
Example 4 - Protein studies.
Expression vectors for the h-RAB and h-NUMB proteins were engineered in the pMT2 eukaryotic expression vector by inserting, by insertional overlapping polymerase chain reaction, the sequence encoding the hemagglutinin epitope (YDVPDYASLP) between codons 1 and 2 of the open reading frames of the RAB and NUMB cDNAs, to obtain pMT2-HA-RAB and pMT2-HA-NUMB, respectively. Transient transfection of C33A cells by calcium phosphate was performed as previously described for NIH-3T3 cells (Fazioli et al., Mol. Cell. Biol. 13: 5814-5828, 1993). Immunoprecipitation and immunoblotting experiments were performed as previously described in Fazioli et al. (supra) and Matoskova et al. (Oncogene 12, 2679-2688, 1996). Typically, we employed 50-100 μg of total cellular proteins for direct immunoblot analysis and 3-5 mg of total cellular proteins for immunoprecipitation/immunoblotting experiments. For co-immunoprecipitation experiments, total cellular proteins were obtained under mild lysis conditions, in the absence of ionic detergents, to preserve protein-protein interactions, as described in Fazioli et al. (supra). ³⁵S-methionine-labeled h-RAB protein, employed in the experiments in FIG. 14, was synthesized by in vitro transcription-translation using a commercial kit (Promega) and the full-length h-RAB cDNA. Metabolic labeling of NIH-3T3 cells with ³⁵S-methionine was performed as previously described (Fazioli et al., supra).
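Inserting the epitope-coding sequence between codons 1 and 2 must keep the reading frame intact. A hypothetical Python sketch of that sequence manipulation, with an in-frame stop-codon check, follows. The toy ORF and the six-nucleotide placeholder tag stand in for the real cDNA and HA-coding sequence, neither of which is given here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_tag_after_codon(orf, tag_dna, codon_number=1):
    """Insert an in-frame tag between codon `codon_number` and the next codon.

    With codon_number=1 the tag lands between the initiator ATG and codon 2.
    """
    if len(orf) % 3 or len(tag_dna) % 3:
        raise ValueError("ORF and tag lengths must be multiples of 3 to keep the frame")
    cut = 3 * codon_number
    return orf[:cut] + tag_dna + orf[cut:]


def internal_stops(orf):
    """1-based numbers of in-frame stop codons before the final codon."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    return [n + 1 for n, codon in enumerate(codons[:-1]) if codon.upper() in STOP_CODONS]


if __name__ == "__main__":
    toy_orf = "ATGAACAAGCTGTAA"  # hypothetical: Met-Asn-Lys-Leu-stop
    tagged = insert_tag_after_codon(toy_orf, "GCTGCT")  # placeholder tag
    print(tagged, internal_stops(tagged))  # ATGGCTGCTAACAAGCTGTAA []
```

A non-empty result from `internal_stops` would indicate that the chosen tag sequence introduces a premature stop codon into the fusion.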
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: Istituto Europeo di Oncologia
(B) STREET: Via Filodrammatici, 8
(C) CITY: Milan
(E) COUNTRY: Italy
(F) POSTAL CODE (ZIP): 20141
(ii) TITLE OF INVENTION: Intracellular Interactors and EH domain binding specificity
(iii) NUMBER OF SEQUENCES: 79
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 603 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
Met Asn Lys Leu Arg Gln Ser Phe Arg Arg Lys Lys Asp Val Tyr Val
1 5 10 15
Pro Glu Ala Ser Arg Pro His Gln Trp Gln Thr Asp Glu Glu Gly Val
20 25 30
Arg Thr Gly Lys Cys Ser Phe Pro Val Lys Tyr Leu Gly His Val Glu
35 40 45
Val Asp Glu Ser Arg Gly Met His Ile Cys Glu Asp Ala Val Lys Arg
50 55 60
Leu Lys Ala Glu Arg Lys Phe Phe Lys Gly Phe Phe Gly Lys Thr Gly
65 70 75 80
Lys Lys Ala Val Lys Ala Val Leu Trp Val Ser Ala Asp Gly Leu Arg
85 90 95
Val Val Asp Glu Lys Thr Lys Asp Leu Ile Val Asp Gln Thr Ile Glu
100 105 110
Lys Val Ser Phe Cys Ala Pro Asp Arg Asn Phe Asp Arg Ala Phe Ser
115 120 125
Tyr Ile Cys Arg Asp Gly Thr Thr Arg Arg Trp Ile Cys His Cys Phe
130 135 140
Met Ala Val Lys Asp Thr Gly Glu Arg Leu Ser His Ala Val Gly Cys
145 150 155 160
Ala Phe Ala Ala Cys Leu Glu Arg Lys Gln Lys Arg Glu Lys Glu Cys
165 170 175
Gly Val Thr Ala Thr Phe Asp Ala Ser Arg Thr Thr Phe Thr Arg Glu
180 185 190
Gly Ser Phe Arg Val Thr Thr Ala Thr Glu Gln Ala Glu Arg Glu Glu
195 200 205
Ile Met Lys Gln Met Gln Asp Ala Lys Lys Ala Glu Thr Asp Lys Ile
210 215 220
Val Val Gly Ser Ser Val Ala Pro Gly Asn Thr Ala Pro Ser Pro Ser
225 230 235 240
Ser Pro Thr Ser Pro Thr Ser Asp Ala Thr Thr Ser Leu Glu Met Asn
245 250 255
Asn Pro His Ala Ile Pro Arg Arg His Ala Pro Ile Glu Gln Leu Ala
260 265 270
Arg Gln Gly Ser Phe Arg Gly Phe Pro Ala Leu Ser Gln Lys Met Ser
275 280 285
Pro Phe Lys Arg Gln Leu Ser Leu Arg Ile Asn Glu Leu Pro Ser Thr
290 295 300
Met Gln Arg Lys Thr Asp Phe Pro Ile Lys Asn Ala Val Pro Glu Val
305 310 315 320
Glu Gly Glu Ala Glu Ser Ile Ser Ser Leu Cys Ser Gln Ile Thr Asn
325 330 335
Ala Phe Ser Thr Pro Glu Asp Pro Phe Ser Ser Ala Pro Met Thr Lys
340 345 350
Pro Val Thr Val Val Ala Pro Gln Ser Pro Thr Phe Gln Gly Thr Glu
355 360 365
Trp Gly Gln Ser Ser Gly Ala Ala Ser Pro Gly Leu Phe Gln Ala Gly
370 375 380
His Arg Arg Thr Pro Ser Glu Ala Asp Arg Trp Leu Glu Glu Val Ser
385 390 395 400
Lys Ser Val Arg Ala Gln Gln Pro Gln Ala Ser Ala Ala Pro Leu Gln
405 410 415
Pro Val Leu Gln Pro Pro Pro Pro Thr Ala Ile Ser Gln Pro Ala Ser
420 425 430
Pro Phe Gln Gly Asn Ala Phe Leu Thr Ser Gln Pro Val Pro Val Gly
435 440 445
Val Val Pro Ala Leu Gln Pro Ala Phe Val Pro Ala Gln Ser Tyr Pro
450 455 460
Val Ala Asn Gly Met Pro Tyr Pro Ala Pro Asn Val Pro Val Val Gly
465 470 475 480
Ile Thr Pro Ser Gln Met Val Ala Asn Val Phe Gly Thr Ala Gly His
485 490 495
Pro Gln Ala Ala His Pro His Gln Ser Pro Ser Leu Val Arg Gln Gln
500 505 510
Thr Phe Pro His Tyr Glu Ala Ser Ser Ala Thr Thr Ser Pro Phe Phe
515 520 525
Lys Pro Pro Ala Gln His Leu Asn Gly Ser Ala Ala Phe Asn Gly Val
530 535 540
Asp Asp Gly Arg Leu Ala Ser Ala Asp Arg His Thr Glu Val Pro Thr
545 550 555 560
Gly Thr Cys Pro Val Asp Pro Phe Glu Ala Gln Trp Ala Ala Leu Glu
565 570 575
Asn Lys Ser Lys Gln Arg Thr Asn Pro Ser Pro Thr Asn Pro Phe Ser
580 585 590
Ser Asp Leu Gln Lys Thr Phe Glu Ile Glu Leu
595 600
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 609 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Met Ser Arg Ser Ala Ala Ala Ser Gly Gly Pro Arg Arg Pro Glu Arg
1 5 10 15
His Leu Pro Pro Ala Pro Cys Gly Ala Pro Gly Pro Pro Glu Thr Cys
20 25 30
Arg Thr Glu Pro Asp Gly Ala Gly Thr Met Asn Lys Leu Arg Gln Ser
35 40 45
Leu Arg Arg Arg Lys Pro Ala Tyr Val Pro Glu Ala Ser Arg Pro His
50 55 60
Gln Trp Gln Ala Asp Glu Asp Ala Val Arg Lys Gly Thr Cys Ser Phe
65 70 75 80
Pro Val Arg Tyr Leu Gly His Val Glu Val Glu Glu Ser Arg Gly Met
85 90 95
His Val Cys Glu Asp Ala Val Lys Lys Leu Lys Ala Met Gly Arg Lys
100 105 110
Ser Val Lys Ser Val Leu Trp Val Ser Ala Asp Gly Leu Arg Val Val
115 120 125
Asp Asp Lys Thr Lys Asp Leu Leu Val Asp Gln Thr Ile Glu Lys Val
130 135 140
Ser Phe Cys Ala Pro Asp Arg Asn Leu Asp Lys Ala Phe Ser Tyr Ile
145 150 155 160
Cys Arg Asp Gly Thr Thr Arg Arg Trp Ile Cys His Cys Phe Leu Ala
165 170 175
Leu Lys Asp Ser Gly Glu Arg Leu Ser His Ala Val Gly Cys Ala Phe
180 185 190
Ala Ala Cys Leu Glu Arg Lys Gln Arg Arg Glu Lys Glu Cys Gly Val
195 200 205
Thr Ala Ala Phe Asp Ala Ser Arg Thr Ser Phe Ala Arg Glu Gly Ser
210 215 220
Phe Arg Leu Ser Gly Gly Gly Arg Pro Ala Glu Arg Glu Ala Pro Asp
225 230 235 240
Lys Lys Lys Ala Glu Ala Ala Ala Ala Pro Thr Val Ala Pro Gly Pro
245 250 255
Ala Gln Pro Gly His Val Ser Pro Thr Pro Ala Thr Thr Ser Pro Gly
260 265 270
Glu Lys Gly Glu Ala Gly Thr Pro Val Ala Ala Gly Thr Thr Ala Ala
275 280 285
Ala Ile Pro Arg Arg His Ala Pro Leu Glu Gln Leu Val Arg Gln Gly
290 295 300
Ser Phe Arg Gly Phe Pro Ala Leu Ser Gln Lys Asn Ser Pro Phe Lys
305 310 315 320
Arg Gln Leu Ser Leu Arg Leu Asn Glu Leu Pro Ser Thr Leu Gln Arg
325 330 335
Arg Thr Asp Phe Gln Val Lys Gly Thr Val Pro Glu Met Glu Pro Pro
340 345 350
Gly Ala Gly Asp Ser Asp Ser Ile Asn Ala Leu Cys Thr Gln Ile Ser
355 360 365
Ser Ser Phe Ala Ser Ala Gly Ala Pro Ala Pro Gly Pro Pro Pro Ala
370 375 380
Thr Thr Gly Thr Ser Ala Trp Gly Glu Pro Ser Val Pro Pro Ala Ala
385 390 395 400
Ala Phe Gln Pro Gly His Lys Arg Thr Pro Ser Glu Ala Glu Arg Trp
405 410 415
Leu Glu Glu Val Ser Gln Val Ala Lys Ala Gln Gln Gln Gln Gln Gln
420 425 430
Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Gln Ala Ala
435 440 445
Ser Val Ala Pro Val Pro Thr Met Pro Pro Ala Leu Gln Pro Phe Pro
450 455 460
Ala Pro Val Gly Pro Phe Asp Ala Ala Pro Ala Gln Val Ala Val Phe
465 470 475 480
Leu Pro Pro Pro His Met Gln Pro Pro Phe Val Pro Ala Tyr Pro Gly
485 490 495
Leu Gly Tyr Pro Pro Met Pro Arg Val Pro Val Val Gly Ile Thr Pro
500 505 510
Ser Gln Met Val Ala Asn Ala Phe Cys Ser Ala Ala Gln Leu Gln Pro
515 520 525
Gln Pro Ala Thr Leu Leu Gly Lys Ala Gly Ala Phe Pro Pro Pro Ala
530 535 540
Ile Pro Ser Ala Pro Gly Ser Gln Ala Arg Pro Arg Pro Asn Gly Ala
545 550 555 560
Pro Trp Pro Pro Glu Pro Ala Pro Ala Pro Ala Pro Glu Leu Asp Pro
565 570 575
Phe Glu Ala Gln Trp Ala Ala Leu Glu Gly Lys Ala Thr Val Glu Lys
580 585 590
Pro Ser Asn Pro Phe Ser Gly Asp Leu Gln Lys Thr Phe Glu Ile Glu
595 600 605
Leu
(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 562 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
Met Ala Ala Ser Ala Lys Arg Lys Gln Glu Glu Lys His Leu Lys Met
1 5 10 15
Leu Arg Asp Met Thr Gly Leu Pro His Asn Arg Lys Cys Phe Asp Cys
20 25 30
Asp Gln Arg Gly Pro Thr Tyr Val Asn Met Thr Val Gly Ser Phe Val
35 40 45
Cys Thr Ser Cys Ser Gly Ser Leu Arg Gly Leu Asn Pro Pro His Arg
50 55 60
Val Lys Ser Ile Ser Met Thr Thr Phe Thr Gln Gln Glu Ile Glu Phe
65 70 75 80
Leu Gln Lys His Gly Asn Glu Val Cys Lys Gln Ile Trp Leu Gly Leu
85 90 95
Phe Asp Asp Arg Ser Ser Ala Ile Pro Asp Phe Arg Asp Pro Gln Lys
100 105 110
Val Lys Glu Phe Leu Gln Glu Lys Tyr Glu Lys Lys Arg Trp Tyr Val
115 120 125
Pro Pro Glu Gln Ala Lys Val Val Ala Ser Val His Ala Ser Ile Ser
130 135 140
Gly Ser Ser Ala Ser Ser Thr Ser Ser Thr Pro Glu Val Lys Pro Leu
145 150 155 160
Lys Ser Leu Leu Gly Asp Ser Ala Pro Thr Leu His Leu Asn Lys Gly
165 170 175
Thr Pro Ser Gln Ser Pro Val Val Gly Arg Ser Gln Gly Gln Gln Gln
180 185 190
Glu Lys Lys Gln Phe Asp Leu Leu Ser Asp Leu Gly Ser Asp Ile Phe
195 200 205
Ala Ala Pro Ala Pro Gln Ser Thr Ala Thr Ala Asn Phe Ala Asn Phe
210 215 220
Ala His Phe Asn Ser His Ala Ala Gln Asn Ser Ala Asn Ala Asp Phe
225 230 235 240
Ala Asn Phe Asp Ala Phe Gly Gln Ser Ser Gly Ser Ser Asn Phe Gly
245 250 255
Gly Phe Pro Thr Ala Ser His Ser Pro Phe Gln Pro Gln Thr Thr Gly
260 265 270
Gly Ser Ala Ala Ser Val Asn Ala Asn Phe Ala His Phe Asp Asn Phe
275 280 285
Pro Lys Ser Ser Ser Ala Asp Phe Gly Thr Phe Asn Thr Ser Gln Ser
290 295 300
His Gln Thr Ala Ser Ala Val Ser Lys Val Ser Thr Asn Lys Ala Gly
305 310 315 320
Leu Gln Thr Ala Asp Lys Tyr Ala Ala Leu Ala Asn Leu Asp Asn Ile
325 330 335
Phe Ser Ala Gly Gln Gly Gly Asp Gln Gly Ser Gly Phe Gly Thr Thr
340 345 350
Gly Lys Ala Pro Val Gly Ser Val Val Ser Val Pro Ser Gln Ser Ser
355 360 365
Ala Ser Ser Asp Lys Tyr Ala Ala Leu Ala Glu Leu Asp Ser Val Phe
370 375 380
Ser Ser Ala Ala Thr Ser Ser Asn Ala Tyr Thr Ser Thr Ser Asn Ala
385 390 395 400
Ser Ser Asn Val Phe Gly Thr Val Pro Val Val Ala Ser Ala Gln Thr
405 410 415
Gln Pro Ala Ser Ser Ser Val Pro Ala Pro Phe Gly Ala Thr Pro Ser
420 425 430
Thr Asn Pro Phe Val Ala Ala Ala Gly Pro Ser Val Ala Ser Ser Thr
435 440 445
Asn Pro Phe Gln Thr Asn Ala Arg Gly Ala Thr Ala Ala Thr Phe Gly
450 455 460
Thr Ala Ser Met Ser Met Pro Thr Gly Phe Gly Thr Pro Ala Pro Tyr
465 470 475 480
Ser Leu Pro Thr Ser Phe Ser Gly Ser Phe Gln Gln Pro Ala Phe Pro
485 490 495
Ala Gln Ala Ala Phe Pro Gln Gln Thr Ala Phe Ser Gln Gln Pro Asn
500 505 510
Gly Ala Gly Phe Ala Ala Phe Gly Gln Thr Lys Pro Val Val Thr Pro
515 520 525
Phe Gly Gln Val Ala Ala Ala Gly Val Ser Ser Asn Pro Phe Met Thr
530 535 540
Gly Ala Pro Thr Gly Gln Phe Pro Thr Gly Ser Ser Ser Thr Asn Pro
545 550 555 560
Phe Leu
(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 481 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Met Val Met Ala Ala Xaa Lys Gly Pro Gly Pro Gly Gly Gly Val Ser
1 5 10 15
Gly Gly Lys Ala Glu Ala Glu Ala Ala Ser Glu Val Trp Cys Arg Arg
20 25 30
Val Arg Glu Leu Gly Gly Cys Ser Gln Ala Gly Asn Arg His Cys Phe
35 40 45
Glu Cys Ala Gln Arg Gly Val Thr Tyr Val Asp Ile Thr Val Gly Ser
50 55 60
Phe Val Cys Thr Thr Cys Ser Gly Leu Leu Arg Gly Leu Asn Pro Pro
65 70 75 80
His Arg Val Lys Ser Ile Ser Met Thr Thr Phe Thr Xaa Pro Glu Leu
85 90 95
Val Phe Leu Gln Ser Arg Gly Asn Glu Val Cys Arg Lys Ile Trp Leu
100 105 110
Gly Leu Phe Asp Ala Arg Thr Ser Leu Val Pro Asn Ser Arg Asp Pro
115 120 125
Gln Lys Val Lys Glu Phe Leu Gln Glu Lys Tyr Glu Lys Lys Arg Trp
130 135 140
Tyr Val Pro Pro Asp Gln Val Lys Gly Pro Thr Tyr Thr Lys Gly Ser
145 150 155 160
Ala Ser Thr Pro Val Gln Gly Ser Ile Pro Glu Gly Lys Pro Leu Arg
165 170 175
Thr Leu Leu Gly Asp Pro Ala Pro Ser Leu Ser Val Ala Ala Ser Thr
180 185 190
Ser Ser Gln Pro Val Ser Gln Ser His Ala Arg Thr Ser Gln Ala Arg
195 200 205
Ser Thr Gln Pro Pro Pro His Ser Ser Val Lys Lys Ala Ser Thr Asp
210 215 220
Leu Leu Ala Asp Ile Gly Gly Asp Pro Phe Ala Ala Pro Gln Met Ala
225 230 235 240
Pro Ala Phe Ala Ala Phe Pro Ala Phe Gly Gly Gln Thr Pro Ser Gln
245 250 255
Gly Gly Phe Ala Asn Phe Asp Ala Phe Ser Ser Gly Pro Ser Ser Ser
260 265 270
Val Phe Gly Ser Leu Pro Pro Ala Gly Gln Ala Ser Phe Gln Ala Gln
275 280 285
Pro Thr Pro Ala Gly Ser Ser Gln Gly Thr Pro Phe Gly Ala Thr Pro
290 295 300
Leu Ala Pro Ala Ser Gln Pro Asn Ser Leu Ala Asp Val Gly Ser Phe
305 310 315 320
Leu Gly Pro Gly Val Pro Ala Ala Gly Val Pro Ser Ser Leu Phe Gly
325 330 335
Met Ala Gly Gln Val Pro Pro Leu Gln Ser Val Thr Met Gly Gly Gly
340 345 350
Gly Gly Ser Ser Thr Gly Leu Ala Phe Gly Ala Phe Thr Asn Pro Phe
355 360 365
Thr Ala Pro Ala Ala Gln Ser Pro Leu Pro Ser Thr Asn Pro Phe Gln
370 375 380
Pro Asn Gly Leu Ala Pro Gly Pro Gly Phe Gly Met Ser Ser Ala Gly
385 390 395 400
Pro Gly Phe Pro Gln Ala Val Pro Pro Thr Gly Ala Phe Ala Ser Ser
405 410 415
Phe Pro Ala Pro Leu Phe Pro Pro Gln Thr Pro Leu Val Gln Gln Gln
420 425 430
Asn Gly Ser Ser Phe Gly Asp Leu Gly Ser Ala Lys Leu Gly Gln Arg
435 440 445
Pro Leu Ser Gln Pro Ala Gly Ile Ser Thr Asn Pro Phe Met Thr Gly
450 455 460
Pro Ser Ser Ser Pro Phe Ala Ser Lys Pro Pro Thr Thr Asn Pro Phe
465 470 475 480
Leu
(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 170 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
Ala Ile Ala Val Leu Ala Ala Ser Tyr Gly Ser Gly Ser Gly Ser Glu
1 5 10 15
Ser Asp Ser Asp Ser Glu Ser Ser Arg Cys Pro Leu Pro Ala Ala Asp
20 25 30
Ser Leu Met His Leu Thr Lys Ser Pro Ser Ser Lys Pro Ser Leu Ala
35 40 45
Val Ala Val Asp Ser Ala Pro Glu Val Ala Val Lys Glu Asp Leu Glu
50 55 60
Thr Gly Val His Leu Asp Pro Ala Val Lys Glu Val Gln Tyr Asn Pro
65 70 75 80
Thr Tyr Glu Thr Met Phe Ala Pro Glu Phe Gly Pro Glu Asn Pro Phe
85 90 95
Arg Thr Gln Gln Met Ala Ala Pro Arg Asn Met Leu Ser Gly Tyr Ala
100 105 110
Glu Pro Ala His Ile Asn Asp Phe Met Phe Glu Gln Gln Arg Arg Thr
115 120 125
Phe Ala Thr Tyr Gly Tyr Ala Leu Asp Pro Ser Leu Asp Asn His Gln
130 135 140
Val Ser Ala Lys Tyr Ile Gly Ser Val Glu Glu Ala Glu Lys Asn Gln
145 150 155 160
Gly Leu Thr Val Phe Glu Thr Gly Gln Lys
165 170
(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 479 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Leu Lys Phe Ser Lys Gly Asp His Leu Tyr Val Leu Asp Thr Ser Gly
1 5 10 15
Gly Glu Trp Trp Tyr Ala His Asn Thr Thr Glu Met Gly Tyr Ile Pro
20 25 30
Ser Ser Tyr Val Gln Pro Leu Asn Tyr Arg Asn Ser Thr Leu Ser Asp
35 40 45
Ser Gly Met Ile Asp Asn Leu Pro Asp Ser Pro Asp Glu Val Ala Lys
50 55 60
Glu Leu Glu Leu Leu Gly Gly Trp Thr Asp Asp Lys Lys Val Pro Gly
65 70 75 80
Arg Met Tyr Ser Asn Asn Pro Phe Trp Asn Gly Val Gln Thr Asn Pro
85 90 95
Phe Leu Asn Gly Asn Val Pro Val Met Pro Ser Leu Asp Glu Leu Asn
100 105 110
Pro Lys Ser Thr Val Asp Leu Leu Leu Phe Asp Ala Gly Thr Ser Ser
115 120 125
Phe Thr Glu Ser Ser Ser Ala Thr Thr Asn Ser Thr Gly Asn Ile Phe
130 135 140
Asp Glu Leu Pro Val Thr Asn Gly Leu His Ala Glu Pro Pro Val Arg
145 150 155 160
Arg Asp Asn Pro Phe Phe Arg Ser Lys Arg Ser Tyr Ser Leu Ser Glu
165 170 175
Leu Ser Val Leu Gln Ala Lys Ser Asp Ala Pro Thr Ser Ser Ser Phe
180 185 190
Phe Thr Gly Leu Lys Ser Pro Ala Pro Glu Gln Phe Gln Ser Arg Glu
195 200 205
Asp Phe Arg Thr Ala Trp Leu Asn His Arg Lys Leu Ala Arg Ser Cys
210 215 220
His Asp Leu Asp Leu Leu Gly Gln Ser Pro Gly Trp Gly Gln Thr Gln
225 230 235 240
Ala Val Glu Thr Asn Ile Val Cys Lys Leu Asp Ser Ser Gly Gly Ala
245 250 255
Val Gln Leu Pro Asp Thr Ser Ile Ser Ile His Val Pro Glu Gly His
260 265 270
Val Ala Pro Gly Glu Thr Gln Gln Ile Ser Met Lys Ala Leu Leu Asp
275 280 285
Pro Pro Leu Glu Leu Asn Ser Asp Arg Ser Cys Ser Ile Ser Pro Val
290 295 300
Leu Glu Val Lys Leu Ser Asn Leu Glu Val Lys Thr Ser Ile Ile Leu
305 310 315 320
Glu Met Lys Val Ser Ala Glu Ile Lys Asn Asp Leu Phe Ser Lys Ser
325 330 335
Thr Val Gly Leu Gln Cys Leu Arg Ser Asp Ser Lys Glu Gly Pro Tyr
340 345 350
Val Ser Val Pro Leu Asn Cys Ser Cys Gly Asp Thr Val Gln Ala Gln
355 360 365
Leu His Asn Leu Glu Pro Cys Met Tyr Val Ala Val Val Ala His Gly
370 375 380
Pro Ser Ile Leu Tyr Pro Ser Thr Val Trp Asp Phe Ile Asn Lys Lys
385 390 395 400
Val Thr Val Gly Leu Tyr Gly Pro Lys His Ile His Pro Ser Phe Lys
405 410 415
Thr Val Val Thr Ile Phe Gly His Asp Cys Ala Pro Lys Thr Leu Leu
420 425 430
Val Ser Glu Val Thr Arg Gln Ala Pro Asn Pro Ile Thr Lys Arg Trp
435 440 445
Lys His Leu Thr Gly Thr Leu Ile Leu Val Asn Ser Leu Asp Val Leu
450 455 460
Arg Ala Ala Ala Phe Ser Pro Ala Asp Gln Asp Asp Phe Val Ile
465 470 475
(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 122 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
Val Asn Leu Asp Ser Leu Val Thr Arg Pro Ala Pro Pro Ala Gln Ser
1 5 10 15
Leu Asn Pro Phe Leu Ala Pro Gly Ala Pro Ala Thr Ser Ala Pro Val
20 25 30
Asn Pro Phe Gln Val Asn Gln Pro Gln Pro Leu Thr Leu Asn Gln Leu
35 40 45
Arg Gly Ser Pro Val Leu Gly Thr Ser Thr Ser Phe Gly Pro Gly Pro
50 55 60
Gly Val Glu Ser Met Ala Val Ala Ser Met Thr Ser Ala Ala Pro Gln
65 70 75 80
Pro Ala Leu Gly Ala Thr Gly Ser Ser Leu Thr Pro Leu Gly Pro Ala
85 90 95
Met Met Asn Met Val Gly Ser Val Gly Ile Pro Pro Ser Ala Ala Gln
100 105 110
Ala Thr Gly Thr Thr Asn Pro Phe Leu Leu
115 120
(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2995 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CCGAGGTAGA GGCAGTGGCG CTTGAGTTGG TCGGGGGCAG CGGCAGATTT GAGGCTTAAG 60
CAACTTCTTC CGGGGAAGAG TGCCAGTGCA GCCACTGTTA CAATTCAAGA TCTTGATCTA 120
TATCCATAGA TTGGAATATT GGTGGGCCAG CAATCCTCAG ACGCCTCACT TAGGACAAAT 180
GAGGAAACTG AGGCTTGGTG AAGTTACGAA ACTTGTCCAA AATCACACAA CTTGTAAAGG 240
GCACAGCCAA GATTCAGAGC CAGGCTGTAA AAATTAAAAT GAACAAATTA CGGCAAAGTT 300
TTAGGAGAAA GAAGGATGTT TATGTTCCAG AGGCCAGTCG TCCACATCAG TGGCAGACAG 360
ATGAAGAAGG CGTTCGCACC GGAAAATGTA GCTTCCCGGT TAAGTACCTT GGCCATGTAG 420
AAGTTGATGA ATCAAGAGGA ATGCACATCT GTGAAGATGC TGTAAAAAGA TTGAAAGCTG 480
AAAGGAAGTT CTTCAAAGGC TTCTTTGGAA AAACTGGAAA GAAAGCAGTT AAAGCAGTTC 540
TGTGGGTCTC AGCAGATGGA CTCAGAGTTG TGGATGAAAA AACTAAGGAC CTCATAGTTG 600
ACCAGACGAT AGAGAAAGTT TCTTTCTGTG CCCCAGACAG GAACTTTGAT AGAGCCTTTT 660
CTTACATATG CCGTGATGGC ACCACTCGTC GCTGGATCTG TCACTGCTTC ATGGCTGTCA 720
AGGACACAGG TGAAAGGTTG AGCCATGCAG TAGGCTGTGC TTTTGCAGCC TGTTTAGAGC 780
GCAAGCAGAA GCGGGAGAAG GAATGTGGAG TGACTGCTAC TTTTGATGCT AGTCGGACCA 840
CTTTTACAAG AGAAGGATCA TTCCGTGTCA CAACAGCCAC TGAACAAGCA GAAAGAGAGG 900
AGATCATGAA ACAAATGCAA GATGCCAAGA AAGCTGAAAC AGATAAGATA GTCGTTGGTT 960
CATCAGTTGC CCCTGGCAAC ACTGCCCCAT CCCCATCCTC TCCCACCTCT CCTACTTCTG 1020
ATGCCACGAC CTCTCTGGAG ATGAACAATC CTCATGCCAT CCCACGCCGG CATGCTCCAA 1080
TTGAACAGCT TGCTCGCCAA GGCTCTTTCC GAGGTTTTCC TGCTCTTAGC CAGAAGATGT 1140
CACCCTTTAA ACGCCAACTA TCCCTACGCA TCAATGAGTT GCCTTCCACT ATGCAGAGGA 1200
AGACTGATTT CCCCATTAAA AATGCAGTGC CAGAAGTAGA AGGGGAGGCA GAGAGCATCA 1260
GCTCCCTGTG CTCACAGATC ACCAATGCCT TCAGCACACC TGAGGACCCC TTCTCATCTG 1320
CTCCGATGAC CAAACCAGTG ACAGTGGTGG CACCACAATC TCCTACCTTC CAAGGGACCG 1380
AGTGGGGTCA ATCTTCTGGT GCTGCCTCTC CAGGTCTCTT CCAGGCCGGT CATAGACGTA 1440
CTCCCTCTGA GGCCGACCGA TGGTTAGAAG AGGTGTCTAA GAGCGTCCGG GCTCAGCAGC 1500
CCCAGGCCTC AGCTGCTCCT CTGCAGCCAG TTCTCCAGCC TCCTCCACCC ACTGCCATCT 1560
CCCAGCCAGC ATCACCTTTC CAAGGGAATG CATTCCTCAC CTCTCAGCCT GTGCCAGTGG 1620
GTGTGGTCCC AGCCCTGCAA CCAGCCTTTG TCCCTGCCCA GTCCTATCCT GTGGCCAATG 1680
GAATGCCCTA TCCAGCCCCT AATGTGCCTG TGGTGGGCAT CACTCCCTCC CAGATGGTGG 1740
CCAACGTATT TGGCACTGCA GGCCACCCTC AGGCTGCCCA TCCCCATCAG TCACCCAGCC 1800
TGGTCAGGCA GCAGACATTC CCTCACTACG AGGCAAGCAG TGCTACCACC AGTCCCTTCT 1860
TTAAGCCTCC TGCTCAGCAC CTCAACGGTT CTGCAGCTTT CAATGGTGTA GATGATGGCA 1920
GGTTGGCCTC AGCAGACAGG CATACAGAGG TTCCTACAGG CACCTGCCCA GTGGATCCTT 1980
TTGAAGCCCA GTGGGCTGCA TTAGAAAATA AGTCCAAGCA GCGTACTAAT CCCTCCCCTA 2040
CCAACCCTTT CTCCAGTGAC TTACAGAAGA CGTTTGAAAT TGAACTTTAA GCAATCATTA 2100
TGGCTATGTA TCTTGTCCAT ACCAGACAGG GAGCAGGGGG TAGCGGTCAA AGGAGCAAAA 2160
CAGACTTTGT GTCCTGATTA GTACTCTTTT CACTAATCCC AAAGGTCCCA AGGAACAAGT 2220
CCAGGCCCAG AGTACTGTGA GGGGTGATTT TGAAAGACAT GGGAAAAAGC ATTCCTAGAG 2280
AAAAGCTGCC TTGCAATTAG GCTAAAGAAG TCAAGGAAAT GTTGCTTTCT GTACTCCCTC 2340
TTCCCTTACC CCCTTACAAA TCTCTGGCAA CAGAGAGGCA AAGTATCTGA ACAAGAATCT 2400
ATATTCCAAG CACATTTACT GAAATGTAAA ACACAACAGG AAGCAAAGCA ATCTCCCTTT 2460
GTTTTTCAGG CCATTCACCT GCCTCCTGTC AGTAGTGGCC TGTATTAGAG ATCAAGAAGA 2520
GTGGTTTGTG CTCAGGCTGG GGAACAGAGA GGCACGCTAT GCTGCCAGAA TTCCCAGGAG 2580
GGCATATCAG CAACTGCCCA GCAGAGCTAT ATTTTGGGGG AGAAGTTGAG CTTCCATTTT 2640
GAGTAACAGA ATAAATATTA TATATATCAA AAGCCAAAAT CTTTATTTTT ATGCATTTAG 2700
AATATTTTAA ATAGTTCTCA GATATTAAGA AGTTGTATGA GTTGTAAGTA ATCTTGCCAA 2760
AGGTAAAGGG GCTAGTTGTA AGAAATTGTA CATAAGATTG ATTTATCATT GATGCCTACT 2820
GAAATAAAAA GAGGAAAGGC TGGAAGCTGC AGACAGGATC CCTAGCTTGT TTTCTGTCAG 2880
TCATTCATTG TAAGTAGCAC ATTGCAACAA CAATCATGCT TATGACCAAT ACAGTCACTA 2940
GGTTGTAGTT TTTTTTAAAT AAAGGAAAAG CAGTATTGTC CTGGTTTTAA ACCTA 2995
(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3104 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GGAACACCCG GCCGCAACCC TGGGAGACGT CCCAGGGACT TCGGGGGCCG TTTTTGTGGC 60
CCGACCTGAG TCCAAAAATC CCGATCGTTT TGGACTCTTT GGTGCACCCC CCTTAGAGGA 120
GGGATATGTG GTTCTGGTAG GAGACGAGAA CCTAAAACAG TTCCCGCCTC CGTCTGAATT 180
TTTGCTTTCG GTTTGGGACC GAAGCCGCGC CGCGCGTCTT GTCTGCCGGA TCATTTAGGT 240
GACACTATAG AATACGGATC CGTCGACGGC CATTACCAAT CGCGAAACCG CTCGCGCGCG 300
TGCATCCCGC ACCATCCCCC GGCCCCGGGC CCTGGCCGGC GTCAGACCGA GCTGCCGCCG 360
CCACCACCGC AGCAGCAGCA GTCGGGGAGC CAGGCCCAGC CAGGGCCGCG GGAGGCGGGG 420
GCGCCCGGGC CCTGGATGTC CCGCAGCGCG GCGGCCAGCG GCGGACCCCG GAGGCCTGAG 480
CGGCACCTGC CCCCAGCCCC CTGTGGGGCC CCGGGGCCCC CAGAAACCTG CAGGACGGAG 540
CCAGACGGGG CGGGCACCAT GAACAAGTTA CGGCAGAGCC TGCGGCGGAG GAAGCCAGCC 600
TACGTGCCCG AGGCGTCGCG CCCGCACCAG TGGCAGGCAG ACGAGGACGC GGTGCGGAAG 660
GGCACGTGCA GCTTCCCGGT CAGGTACCTG GGTCACGTGG AGGTAGAGGA GTCCCGGGGA 720
ATGCACGTGT GTGAAGATGC GGTGAAGAAG CTGAAGGCGA TGGGCCGAAA GTCCGTGAAG 780
TCTGTCCTGT GGGTGTCAGC CGATGGGCTC CGAGTGGTGG ACGACAAAAC CAAGGATCTT 840
CTGGTCGACC AGACCATCGA AAAGGTCTCC TTTTGTGCTC CTGACCGCAA CCTGGACAAG 900
GCTTTCTCCT ATATCTGTCG TGACGGGACT ACCCGCCGCT GGATCTGCCA CTGTTTTCTG 960
GCACTGAAGG ACTCCGGCGA GAGGCTGAGC CACGCTGTGG GCTGTGCTTT TGCCGCCTGC 1020
CTGGAGCGAA AACAGCGACG GGAGAAGGAA TGTGGGGTCA CGGCCGCCTT CGATGCCAGC 1080
CGCACCAGCT TCGCCCGCGA GGGCTCCTTC CGCCTGTCTG GGGGTGGGCG GCCTGCTGAG 1140
CGAGAGGCCC CGGACAAGAA GAAAGCAGAG GCAGCAGCTG CCCCCACTGT GGCTCCTGGC 1200
CCTGCCCAGC CTGGGCACGT GTCCCCGACA CCAGCCACCA CATCCCCTGG TGAGAAGGGT 1260
GAGGCAGGCA CCCCTGTGGC TGCAGGCACC ACTGCGGCCG CCATCCCCCG GCGCCATGCA 1320
CCCCTGGAGC AGCTGGTTCG CCAGGGCTCC TTCCGTGGGT TCCCAGCACT CAGCCAGAAG 1380
AACTCGCCTT TCAAACGGCA GCTGAGCCTA CGGCTGAATG AGCTGCCATC CACGCTGCAG 1440
CGCCGCACTG ACTTCCAGGT GAAGGGCACA GTGCCTGAGA TGGAGCCTCC TGGTGCCGGC 1500
GACAGTGACA GCATCAACGC TCTGTGCACA CAGATCAGTT CATCTTTTGC CAGTGCTGGA 1560
GCGCCAGCAC CAGGGCCACC ACCTGCCACA ACAGGGACTT CTGCCTGGGG TGAGCCCTCC 1620
GTGCCCCCTG CAGCTGCCTT CCAGCCTGGG CACAAGCGGA CACCTTCAGA GGCTGAGCGA 1680
TGGCTGGAGG AGGTGTCACA GGTGGCCAAG GCCCAGCAGC AGCAGCAGCA GCAACAGCAA 1740
CAGCAGCAGC AGCAGCAGCA GCAACAGCAG CAAGCAGCCT CAGTGGCCCC AGTGCCCACC 1800
ATGCCTCCTG CCCTGCAGCC TTTCCCCGCC CCCGTGGGGC CCTTTGACGC TGCACCTGCC 1860
CAAGTGGCCG TGTTCCTGCC ACCCCCACAC ATGCAGCCCC CTTTTGTGCC CGCCTACCCG 1920
GGCTTGGGCT ACCCACCGAT GCCCCGGGTG CCCGTGGTGG GCATCACACC CTCACAGATG 1980
GTGGCAAACG CCTTCTGCTC AGCCGCCCAG CTCCAGCCTC AGCCTGCCAC TCTGCTTGGG 2040
AAAGCTGGGG CCTTCCCGCC CCCTGCCATA CCCAGTGCCC CTGGGAGCCA GGCCCGCCCT 2100
CGCCCCAATG GGGCCCCCTG GCCCCCTGAG CCAGCGCCTG CCCCAGCTCC AGAGTTGGAC 2160
CCCTTTGAGG CCCAGTGGGC GGCATTAGAA GGCAAAGCCA CTGTAGAGAA ACCCTCCAAC 2220
CCCTTTTCTG GTGACCTGCA AAAGACATTC GAGATTGAAC TGTAGTCCGA GCCGCCCCAC 2280
CCACTCCATC ATCTCCAGGT GCCCCACGCC TGGGGGTGGA GGCACAACCT CTCCCCTAAC 2340
CCTGCTCCCT GGGGCTGCGC CCCTCAACAC CCTCTCAACA CCCCCCTCCC TCAACCACCC 2400
CGACAACCAC TACAGAACCA ACATTGTGAC GCCCAGGTTG CAACAGGATG GAATTCAGGG 2460
ACGGACCCAG CCTGGCTAAG GGAACCATTT CACTGCCGGA CTTAGGCTGG CAATGCCCCC 2520
TTCCCCAACC CCAGACACAG GGGTTGGCCA CAATCCCACT GAATGCCCTT GGTTCACACT 2580
CCATTTCCCA GTTTCTGTTG ACCCCCACCT TCCAGTGTTG GACAGGATGG AGGGGGGACA 2640
CTTGCTTAGG GGCTCTCCTG GGCCCCACAC CAGTGCCCAC CCCAAATCTG GTCGTCTCCT 2700
CCCCCCATGC ACAGCACAAG CTAAGGGCTG CCCTCTGACC ACACGCTGCG TTCACTGCCA 2760
ATGCTGTACT CACCTCCATC ACCCTCCAAC TTTGGGGCCC ATGTCTTCCT TGGGCCAAGG 2820
TCTCATGGGG GCTAGCGCCA AGTTGGGGGC CCAGGAGGCG GGGAGGGAGG AGGAGGAGAA 2880
GATGCGCAGT TACCTCATGT CGGTGCCCGC TGGGGAGGGG TCCGGGAAGA AGGGGAAGGG 2940
GTGCCTGGCG GGTACTTTTC TATCTTTTAT TTCCAGATTT TTTTTGTATC TAAACTTGAA 3000
GATTTGTATT ATACAAGGAC AGCCAATAAA GGAAGAATAT AAAAAAAAAA AAAAAAAAAA 3060
AAAAAAAAAA ACAAAAAAAG GCCGCCTCGG CCCGTCGACG TCAA 3104
(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2583 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TGGCGGCGGC GGCGGCGGTT GTCCCGGCTG TGCCGGTTGG TGTGGCCCGT CAGCCCGCGT 60
ACCACAGCGC CCGGGCCGCG TCGAGCCCAG TACAGCCAAG CCGCTGCGGC CGGGTCCGGC 120
GCGGGCGGCG CGCGCAGACG GAGGGCGGCG GCCGCGGCCA GGGCGGCCCG TGGGACCGCG 180
GGCCCCCGGC GCAGCGCTGC CCGGCTCCCG GCCCTGCCGG CCTCCTCCCT TGGCGCCGCG 240
GCCATGGCGG CCAGCGCGAA GCGGAAGCAG GAGGAGAAGC ACCTGAAGAT GCTGCGGGAC 300
ATGACCGGCC TCCCGCACAA CCGAAAGTGC TTCGACTGCG ACCAGCGCGG CCCCACCTAC 360
GTTAACATGA CGGTCGGCTC CTTCGTGTGT ACCTCCTGCT CCGGCAGCCT GCGAGGATTA 420
AATCCACCAC ACAGGGTGAA ATCTATCTCC ATGACAACAT TCACACAACA GGAAATTGAA 480
TTCTTACAAA AACATGGAAA TGAAGTCTGT AAACAGATTT GGCTAGGATT ATTTGATGAT 540
AGATCTTCAG CAATTCCAGA CTTCAGGGAT CCACAAAAAG TGAAAGAGTT TCTACAAGAA 600
AAGTATGAAA AGAAAAGATG GTATGTCCCG CCAGAACAAG CCAAAGTCGT GGCATCAGTT 660
CATGCATCTA TTTCAGGGTC CTCTGCCAGT AGCACAAGCA GCACACCTGA GGTCAAACCA 720
CTGAAATCTC TTTTAGGGGA TTCTGCACCA ACACTGCACT TAAATAAGGG CACACCTAGT 780
CAGTCCCCAG TTGTAGGTCG TTCTCAAGGG CAGCAGCAGG AGAAGAAGCA ATTTGACCTT 840
TTAAGTGATC TCGGCTCAGA CATCTTTGCT GCTCCAGCTC CTCAGTCAAC AGCTACAGCC 900
AATTTTGCTA ACTTTGCACA TTTCAACAGT CATGCAGCTC AGAATTCTGC AAATGCAGAT 960
TTTGCAAACT TTGATGCATT TGGACAGTCT AGTGGTTCGA GTAATTTTGG AGGTTTCCCC 1020
ACAGCAAGTC ACTCTCCTTT TCAGCCCCAA ACTACAGGTG GAAGTGCTGC ATCAGTAAAT 1080
GCTAATTTTG CTCATTTTGA TAACTTCCCC AAATCCTCCA GTGCTGATTT TGGAACCTTC 1140
AATACTTCCC AGAGTCATCA AACAGCATCA GCTGTTAGTA AAGTTTCAAC GAACAAAGCT 1200
GGTTTACAGA CTGCAGACAA ATATGCAGCA CTTGCTAATT TAGACAATAT CTTCAGTGCC 1260
GGGCAAGGTG GTGATCAGGG AAGTGGCTTT GGGACCACAG GTAAAGCTCC TGTTGGTTCT 1320
GTGGTTTCAG TTCCCAGTCA GTCAAGTGCA TCTTCAGACA AGTATGCAGC TCTGGCAGAA 1380
CTAGACAGCG TTTTCAGTTC TGCAGCCACC TCCAGTAATG CGTATACTTC CACAAGTAAT 1440
GCTAGCAGCA ATGTTTTTGG AACAGTGCCA GTGGTTGCTT CTGCACAGAC ACAGCCTGCT 1500
TCATCAAGTG TGCCTGCTCC ATTTGGAGCT ACGCCTTCCA CAAATCCATT TGTTGCTGCT 1560
GCTGGTCCTT CTGTGGCATC TTCTACAAAC CCATTTCAGA CCAATGCCAG AGGAGCAACA 1620
GCGGCAACCT TTGGCACTGC ATCCATGAGC ATGCCCACGG GATTCGGCAC TCCTGCTCCC 1680
TACAGTCTTC CCACCAGCTT TAGTGGCAGC TTTCAGCAGC CTGCCTTTCC AGCCCAAGCA 1740
GCTTTCCCTC AACAGACAGC TTTTTCTCAA CAGCCCAATG GTGCAGGTTT TGCAGCATTT 1800
GGACAAACAA AGCCAGTAGT AACCCCTTTT GGTCAAGTTG CAGCTGCTGG AGTATCTAGT 1860
AATCCTTTTA TGACTGGTGC ACCAACAGGA CAATTTCCAA CAGGAAGCTC ATCAACCAAT 1920
CCTTTCTTAT AGCCTTATAT AGACAATTTA CTGGAACGAA CTTTTATGTG GTCACATTAC 1980
ATCTCTCCAC CTCTTGCACT GTTGTCTTGT TTCACTGATC TTAGCTTTAA ACACAAGAGA 2040
AGTCTTTAAA AAGCCTGCAT TGTGTATTAA ACACCAGGTA ATATGTGCAA AACAGAGGGC 2100
TCCAGTAACA CCTTCTAACC TGTGAATTGG CAGAAAAGGG TAGCGGTATC ATGTATATTA 2160
AAATTGGCTA ATATTAAGTT ATTGCAGATA CCACATTCAT TATGCTGCAG TACTGTACAT 2220
ATTTTTCTTA GAAATTAGCT ATTTGTGCAT ATCAGTATTT GTAACTTTAA CACATTGTTA 2280
TGTGAGAAAT GTTACTGGGG AAATAGATCA GCCACTTTTA AGGTGCTGTC ATATATCTTT 2340
GGAATGAATG ACCTAAAATC ATTTTAACCA TGCTACTGGA AAGTAACAGA GTCAAAATTG 2400
GAAGGTTTTA TTCATTCTTG AATTTTTCCT TTCTAAAGAG CTCTTCTATT TATACATGCC 2460
TAAATTCTTT TAAAATGTAG AGGGATACCT GTCTGCATAA TAAAGCTGAT CATGTTTTGC 2520
TACAGTTTGC AGGTGAAAAA AAATAAATAT TATAAAATAA AAAAAAAAAA AAAAAAAAAA 2580
AAA 2583
(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1549 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
AGCGCCATTA CCAATCGCGA ACCCGCCTTT TTTTTTTTTT TTTTATTAGG AAAATGGGCG 60
ATGGTGATGG CGGCAAANAA GGGCCCGGGC CCGGGCGGCG GGGTCAGCGG GGGCAAGGCG 120
GAGGCGGAGG CGGCCTCGGA GGTGTGGTGC CGTCGGGTGC GGGAGCTGGG TGGCTGCAGC 180
CAGGCCGGGA ACCGCCACTG CTTCGAGTGC GCCCAGCGCG GGGTCACCTA CGTGGATATC 240
ACCGTGGGCA GCTTCGTGTG CACCACCTGC TCCGGCCTCC TGAGAGGGCT GAACCCCCCT 300
CATCGTGTCA AGTCAATCTC CATGACAACT TTCACTGANC CTGAATTAGT ATTCCTGCAA 360
TCCCGTGGAA ATGAGGTTTG CCGGAAGATT TGGTTGGGTC TGTTTGATGC TCGGACATCT 420
TTAGTACCAA ATTCCAGGGA TCCTCAGAAA GTGAAGGAGT TTCTCCAGGA AAAATATGAG 480
AAGAAGAGAT GGTATGTCCC CCCAGACCAA GTCAAGGGGC CCACTTATAC CAAAGGCAGT 540
GCCTCCACCC CTGTGCAGGG CTCCATCCCA GAAGGGAAGC CCCTTCGGAC ACTTCTGGGT 600
GATCCTGCAC CGTCTCTCTC AGTTGCTGCC TCCACCTCGA GCCAGCCCGT CAGTCAGTCT 660
CACGCTCGGA CATCCCAGGC CCGGAGCACT CAGCCACCTC CCCACTCCTC TGTCAAAAAA 720
GCCAGTACTG ACCTGCTGGC TGACATCGGT GGAGACCCCT TTGCTGCACC CCAGATGGCA 780
CCAGCTTTTG CTGCATTCCC TGCCTTTGGG GGCCAGACAC CTTCCCAAGG AGGCTTTGCC 840
AACTTTGATG CCTTTAGCAG TGGCCCCAGC TCTTCTGTGT TTGGAAGCCT CCCTCCAGCT 900
GGTCAAGCCT CGTTCCAGGC CCAGCCAACT CCTGCAGGGA GCAGCCAGGG GACTCCATTT 960
GGTGCCACTC CCCTGGCACC CGCCAGTCAG CCAAACAGCC TCGCAGACGT GGGCAGCTTC 1020
CTGGGACCCG GGGTGCCCGC TGCAGGTGTT CCTAGCAGCC TCTTCGGGAT GGCTGGCCAG 1080
GTCCCCCCGC TCCAGTCTGT CACGATGGGC GGCGGCGGCG GCAGCAGCAC AGGGCTGGCC 1140
TTTGGAGCCT TCACTAACCC TTTCACAGCT CCCGCCGCCC AGTCCCCGCT GCCTTCCACC 1200
AACCCGTTCC AGCCCAATGG CTTGGCGCCA GGGCCCGGCT TTGGGATGAG CAGTGCTGGG 1260
CCTGGCTTCC CCCAGGCAGT GCCACCCACT GGGGCCTTTG CCAGCTCCTT CCCAGCACCG 1320
CTGTTCCCCC CGCAGACCCC GCTTGTTCAG CAGCAGAATG GCTCTTCCTT CGGGGACTTA 1380
GGATCAGCCA AGTTGGGGCA GAGGCCACTG AGCCAGCCAG CTGGGATCTC CACCAACCCC 1440
TTCATGACTG GACCCTCATC AAGCCCATTC GCCTCCAAAC CTCCAACCAC CAACCCCTTC 1500
TTGTAGCACT GTGTTTTTGG GGGGCCTCTT CCCTGCCTTC TGGGGCCCC 1549
(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 534 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
ACGGCCATTA CCAATCGCGA AACCGCGATT GCAGTTCTGG CCGCTTCCTA TGGTTCGGGT 60
TCAGGGTCCG AATCGGACTC GGACAGTGAG AGCAGTCGGT GTCCGCTGCC AGCCGCCGAC 120
TCCCTCATGC ACTTGACTAA ATCGCCTTCA TCAAAGCCGT CTCTAGCAGT GGCAGTGGAC 180
TCGGCTCCGG AGGTGGCAGT TAAGGAAGAT TTGGAGACTG GAGTTCACCT TGACCCTGCC 240
GTCAAAGAAG TTCAGTATAA TCCTACCTAT GAGACCATGT TTGCTCCTGA GTTTGGACCA 300
GAAAATCCCT TTAGGACACA GCAAATGGCT GCCCCTAGAA ATATGCTTTC TGGATATGCC 360
GAACCAGCTC ATATCAATGA TTTCATGTTT GAGCAGCAAA GGAGAACTTT TGCAACATAT 420
GGTTATGCAT TAGACCCTTC ATTAGATAAT CATCAAGTGT CTGCTAAATA TATTGGTTCT 480
GTAGAAGAAG CTGAAAAAAA TCAAGGTTTA ACTGTATTTG AAACTGGTCA GAAG 534
(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3514 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
TACAGTCAAC AGCAACTGAT GGAAACCAGC CATCGCCATC TGCTGCACGC GGAAGAAGGC 60
ACATGGCTGA ATATCGACGG TTTCCATATG GGGATTGGTG GCGACGACTC CTGGAGCCCG 120
TCAGTATCGG CGGAATTGTC GACGGCCATT ACAAATCGCG AAACCCTGAA GTTCTCCAAG 180
GGCGACCATC TCTACGTCTT GGACACATCT GGCGGTGAGT GGTGGTACGC ACACAACACC 240
ACCGAAATGG GCTACATCCC CTCCTCCTAT GTGCAGCCCT TGAACTACCG GAACTCAACA 300
CTGAGTGACA GCGGTATGAT TGATAATCTT CCAGACAGCC CAGACGAGGT AGCCAAGGAG 360
CTGGAGCTGC TCGGGGGATG GACAGATGAC AAAAAAGTAC CAGGCAGAAT GTACAGTAAT 420
AACCCTTTCT GGAATGGGGT CCAGACCAAT CCATTTCTGA ATGGGAACGT GCCCGTCATG 480
CCCAGCCTGG ATGAGCTGAA TCCCAAAAGT ACTGTGGATT TGCTCCTTTT TGACGCAGGT 540
ACATCCTCCT TCACCGAATC CAGCTCAGCC ACCACGAATA GCACTGGCAA CATCTTCGAT 600
GAGCTTCCAG TCACAAACGG ACTCCACGCA GAGCCGCCGG TCAGGCGGGA CAACCCCTTC 660
TTCAGAAGCA AGCGCTCCTA CAGTCTCTCG GAACTCTCCG TCCTCCAAGC CAAGTCCGAT 720
GCTCCCACAT CGTCGAGTTT CTTCACCGGC TTGAAATCAC CTGCCCCCGA GCAATTTCAG 780
AGCCGGGAGG ATTTTCGAAC TGCCTGGCTA AACCACAGGA AGCTGGCCCG GTCTTGCCAC 840
GACCTGGACT TGCTTGGCCA AAGCCCTGGT TGGGGCCAGA CCCAAGCCGT GGAGACAAAC 900
ATCGTGTGCA AGCTGGATAG CTCCGGGGGT GCTGTCCAGC TTCCTGACAC CAGCATCAGC 960
ATCCACGTGC CCGAGGGCCA CGTCGCCCCT GGGGAGACCC AGCAGATCTC CATGAAAGCC 1020
CTGCTGGACC CCCCGCTGGA GCTCAACAGT GACAGGTCCT GCAGCATCAG CCCTGTGCTG 1080
GAGGTCAAGC TGAGCAACCT GGAGGTGAAA ACCTCTATCA TCTTGGAGAT GAAAGTGTCA 1140
GCCGAGATAA AAAATGACCT TTTTAGCAAA AGCACAGTGG GCCTCCAGTG CCTGAGGAGC 1200
GACTCGAAGG AAGGGCCATA TGTCTCCGTC CCGCTCAACT GCAGCTGTGG GGACACGGTC 1260
CAGGCACAGC TGCACAACCT GGAGCCCTGT ATGTACGTGG CTGTCGTGGC CCATGGCCCA 1320
AGCATCCTCT ACCCTTCCAC CGTGTGGGAC TTCATCAATA AAAAAGTCAC AGTGGGTCTC 1380
TACGGGCCTA AACACATCCA CCCATCCTTC AAGACGGTAG TGACCATTTT TGGGCATGAC 1440
TGTGCCCCAA AGACGCTCCT GGTCAGCGAG GTCACACGCC AGGCACCCAA CCCCATCACC 1500
AAGCGCTGGA AGCACCTCAC TGGGACTCTG ATCTTGGTGA ACTCCCTGGA CGTTCTGAGA 1560
GCAGCCGCCT TCAGCCCTGC GGACCAGGAC GACTTCGTGA TTTGAATGGG TCCCCTCCCC 1620
TCCTGCTGCT CTGGAGTGCA AGCCCTCTTC TGCCCTGCGT GCCCTGCTGT CACCGCGGAG 1680
CTGAAGAGGG AGGAAGGGGC GGCTGCTCAG ACAGATTTAG GCCCGCCAGC TAGGCTACAC 1740
CCATCATGCG CCGCCCTCCT CCATCGAGGG AGAGGCCTGA AGGGACTGCC TACTGCAGCT 1800
CGTTGCCAAT CACATAGCTT TCTATTTGTT AAGTATAAAT TTAAATTTAA AATCACTTTT 1860
TTAACGAATG GGGGGAAGGG ATCTATGAGA AAGGTGGTAT CTAATTTTTT TATGGACCAT 1920
AAAGGTTTAA AAGAAAATAG GGGCACAGGC TGTTGAGGTT TTTATGTTGT TATAGACCTT 1980
TTTAAATTAT GTTAGAGATG TATATAGGTA TTTAAAGGTC ACTGGGAGCG TTTCTGATTC 2040
CCGGCCACAC TTTGCATTTC AACACTCAGC CCGGAAAGAT GCTCGTTCGG TTGTTGGACC 2100
TCTTTCACTC CCTGCGTGTA AGAAGGTGAA TCACGTGGGA AAAAGTGGCT TTTCAGTAAA 2160
CGGGTACAGC TCATTCTTTC TGAGAAGGCC CCAGGTCCTG CTCCCTCCTC GGATTTGATT 2220
GTCTTCCGTG CTTTGCCTCA CTCGTAGTAA ATGACCATCC ATAGAATATG TGAATCTTTG 2280
GTGAGCTTCA GTGGGCAGAG TGAAGTCCCG CATTAGCATT TAGGTGCCCT GAGCTGTTTC 2340
TGCCAATAGA TTAGAAAGCA GCCATGAGTT GACAGTCTTT AGGGCCCCTG CCAGTGTGCA 2400
ATTAGTCATT GACAAGAACA ATGCCATTTG AGAGTGAGGT GGTCCCTGCT GCTACGAGGC 2460
CATTGTACTG TTTTTTCCTT GAGGTCAAAG CAGTGCTTCC CATAGAGTTT GCTGCCTCTT 2520
CTGTGGACAG GAAGAAAACT TCATGACCGA ATCAGAGCCT TGGTGGCCAC TGACTCTCGT 2580
GCTTATTGCA GATGCTGTGG TTGGCCTCAC AAGCAACGCC TTATGCTGAT GTGCAGAGGT 2640
GCCAGCTGCC ATTTGCCAAA CTCTGCATTT CATTTCATCT AAGGCTTAAC CCCTCTTCCT 2700
TCCTGGTGTA CCTGTGTCTC CTCGGAAGGA AGTCATAGTT TAGATGAAAC CATTTTTTGT 2760
ACAATGTAAA GATCATCTGA GCAAGATGAG CATTTTGTAA AAATGAAAAT GTGACTCACA 2820
TAAAATCAGG AACTTGACAC AGTGTTGCAT TAATAACTTT AGGGTGCAGA CATGCTGTGT 2880
GAATCTCACA ATGCGTCGTA GATGTCGCGT GTTGGAAGGG AGCAGGAGGA AGGACTGATA 2940
CTGGCAAATC AGTAGAGTGA GGTGATCCTT AGCAACGTGC CAGGACACTT CCTGTGTGCC 3000
TGCAGTTGTC AGGGACCATT TGGGATCCCG AATCTCATTC TCTAAAACTG CTTTCTTGAA 3060
ACATGTTACT TCCTTAGTAT AATCAATGTA TACTCCCTTA CTGGCCTGAA ACGTTGTATA 3120
GCTACTTATT CAGATACTGA AGACCAACGG ACTGAAAAAA AGAACAAACA TTAGCTATTT 3180
TATGCTGCAA GAACCAGGAC ACACAATTCG CCAATCATCC CACCATATAA CCTTCGATTG 3240
TGCTTCTCAA CTCCACCCCA TAATTTCTCC CAGAGACCAT CTATCACCTT TTCCCCAAAG 3300
AAGAAACAAA ACCAGTTGCA CCTTAAACCA TGGATATTTT TTCCTCAGGG GCTTTAAATA 3360
GTTTCCTATG CAACGTGTCT TGTAGCACAA ATAAAATTCT ACAAAAGTTG CAGTAAATTT 3420
TATTTGGATA TTTTAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGCCGC 3480
CTCGGCCCGT CGACGTCAAT TCCAGCTGAG CGCC 3514
(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2981 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GGTTTCCATA TGGGGATTGG TGGCGACGAC TCCTGGAACC CGTCAGTATC GGCGGAATTG 60
TCGACGGCCA TTACCAATCG CGAAACCGTG AACCTGGACT CACTGGTGAC CAGGCCTGCC 120
CCACCAGCCC AGTCCCTCAA CCCTTTCCTG GCACCAGGTG CTCCCGCCAC CTCGGCCCCT 180
GTTAACCCTT TCCAGGTGAA CCAGCCCCAG CCGCTGACAC TGAACCAGCT TCGGGGGAGC 240
CCAGTCCTGG GGACCAGCAC ATCCTTTGGG CCTGGCCCAG GAGTGGAGTC CATGGCTGTG 300
GCCTCGATGA CCTCCGCGGC CCCACAGCCA GCTCTGGGGG CCACTGGTTC CTCTCTGACA 360
CCACTGGGCC CTGCAATGAT GAACATGGTG GGCAGTGTGG GTATACCCCC ATCAGCAGCC 420
CAGGCCACTG GCACAACCAA CCCTTTCCTT CTCTAGTGCC TGGGCCTGGG ACCCACCCAG 480
AGCACCTGTG CTGGAGGATG CCGAGCAGGG ACTCTCGTCT GTGGGACGGG ATCCAAGAGT 540
TTGGGGATTA GGGGTATTAG GGCTTTTCAG CTCCAGCTTC CTGATGAAAG GGCTGTCTTT 600
ACAGCCCCAA CCCTCAGACC CTCGCCTTCC AAGGCAGGCC CCTCAGCCTG GCCTGCTCTC 660
ACCACCTCCT CCAAAGCACT GAGGTCCTGG CAGGGCTCCT CTGAGGCCTT GGACGAGGAC 720
GTGGAGTCAT CAGTGTTGCC CTTGCCCCTA ACCTCAGCCC GAGGGTCTCC AGGATGTTGC 780
CTGGCCCAGG ACTTGGGACA GTGGCCTTGT CTTTGTCCTC CCCACCCCCC AGCCCTAGGG 840
ACACCCCAGG CAGTCCTGGG TGTGGACACG ATGAAGCACC GGCTCCATAA GACACCTTTT 900
GGGGAAGTGG TTGTGCATAT TTTATTTTTC TTTGACTCGT GTGAGTTCAA AGTAAACACC 960
ACCACCGTGG ACAACTCTTG AATTAAATCC ACTAGAGCGA GCTTTAAAAC TAATCTGAGC 1020
ATAACCCCCA ACGTGGCTCT ATGCTTGTGT ACCTTTGCAC ATATTTTGTT TAGTACAGTT 1080
TCATATTTGA GTTTGCAGAA TTATCTAATA GTCTTTTTTT GGCTAATATT TTTATAACGT 1140
GGTTCTTATT TAACTGTCTA GTTTTGATAG AATTTACCAG GTCTGGCTGA ATGAAGATGT 1200
TGGCACGTGT CATTTTAGTG GTTTAAATCC TCTTATTTAT GGTTTTAACT CTAAGAAAAT 1260
TTTAAAAGGA AGAGATGTTT GGATGACAAA AAAATGCCTT ATTGAAAAAC TAAAGCAAAT 1320
TCTTTATAGG AAGAAAATAT GGGAATTTGA TTACACATAG ATGATGATGT TTATTTAAAA 1380
AAAAAACAAC AGCAACAAAA AATCCTAGCC TCCAGTTGCC TTTTCCTTTC TTTAGCTCCT 1440
GTTTTTGGTG AATTACTAAA CTGATTGACT TTCAGCCTTT TTGGCTAGAT CCTGAGAGGC 1500
TATTTTTCTT ACGAATATAC CAACATCCTG AAAGTTAAAG AAAAAAATCT AATGTATGAA 1560
TGTGACTCAC CAATTTTTAT CAACTAATTC CTTTTTTTTA TTAAAGGCAT GCAGGGATTA 1620
ACAGGACTTC TGTTTACAAT GGAAATCTGA AATGGAAGAA ACATCTTTAA CCTTGTGTGT 1680
CTGTGATCTC CTCTGTCTTA ATCCACGCTC AGGCTAAAGA TGGGGATAAT GTGGAAATGG 1740
CAGTTGTCCC GAGGGCGTGG GGTGGGGGGT GCTTCTGTGC CCACGGGTCC CTGGGCAACA 1800
GTCCCTAGGC TAAGACAGGG GTGGGGGGCT AAGGGACCAG GGCTGGCCCT GATCCACCTA 1860
CCTGCTAACT CCAGATATTA TTTTTAAGTT GGAGACCTAA AAATAATTCT CTTGTATTTT 1920
GGAGATGAAG AAAAAAGTCA TTCACCTGGG AGGGATTTTT AACAAACATG CACTAATATT 1980
ACCTCCCTCT TCCAAACTCC TACTGTCCTC ACCTCACCCA CCACTGTAAA AGTATTACGT 2040
CGTGCATTAA AGACCAGAAA AGACTTGCTT TTACCCAGGG AGGGGACATT CCCCCAATCC 2100
TGTGTCTGGG CAGCTTGCCT GGTGGATGTT TCTGGCCCTT TTTCCACCAT GGGAATTTCT 2160
CACCTCTAGC TCTAAGGGCA GCTGACCACC TGCCAGCCCC TCTCTGGTTC CAGAAGATTA 2220
CCCTTCGCAC CGGCACAGGA GAGATCTTAA ATGGTCTCTG GGCTGCTGGG ACATGCAGGC 2280
TCTGAGTGGG GACTGCACCC ACTCATGCTG TTCCTGAGGC TGCCCTCTCC AGTCCCAGGA 2340
GCCGTGTCTA GAGTTCCTGA CCAGCCACCT TCTGCCAGAA CTGCAGGCCT GCGGCTGGGT 2400
CTTAGGCTCC CGCTGCCATT TGGGTAAGCC GGTGGCTGGT CTCGTCTGCC GGGGGAAGGG 2460
TGGGGAGGTG CAATGGGATG GGAGGCAGGA TGCTTGCCCA GAAAGGTGCT TTCCTTCAGA 2520
CGGTGCAGGG CCTGGGCCAG CCTTACAGGT CAGTAGACTA GACTCGACTA CTTGGGGTCA 2580
GTGATTCTTT TGTAACGAGT CTTTCATGAT GTGACTTTGA GGCCCCAACA TGACAGCCAC 2640
TGGGCCACCG GGACCCAAGA AGACAGCCCC CTGCCCCATC TTCTGGCACA GGCCATCCCG 2700
AGTGCATGTC CCTGTTGCCC ACTGCACTGA TCATGAAGCC ACCGGCCACT GCCACGCATG 2760
TTGCACCTGT GCCATCGTGC TCCCTCCTGA TGGGCACCGT GGTGGAGGAA GTCACCCAGC 2820
TGTTTCTCAG TCCCAGAGGC CGGTGGCTGG TTTTGAACTT GTGTTGACTG TTGATACTTA 2880
TTTACTGTAT AAATATAATT TATCATTTGT ACCATGAAAA AAAAAAAAAA AAAAAAAAAA 2940
AAAAAAACAA AAAAAAGGCC GCCTCGGCCC GTCGACGTCA A 2981
(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Arg Asn Pro Phe Lys Pro Pro Leu Ser 1 5
(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Asn Pro Phe Ser Arg Ser Asn Asn Pro Gln 1 5 10
(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
Asn Pro Phe Val Gln Glu Pro Ile Lys 1 5
(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
Asn Pro Phe Arg Leu Val Gly Lys 1 5
(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
Asn Pro Phe Ser Ser Glu Pro Pro Gly 1 5
(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
Arg Asn Phe Ser Pro Asn Pro Phe Leu 1 5
(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
Gln Gly Ser Asn Pro Phe Met Arg Ala Val 1 5 10
(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
Arg Lys Lys Asn Pro Phe Arg Leu Ala 1 5
(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
Arg Gln Thr Asn Pro Phe Arg Met Pro 1 5
(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
Ala Lys Asn Pro Phe Trp Asn Ala Pro 1 5
(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
Arg Leu Arg Thr Asn Pro Phe Arg Leu 1 5
(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
Asn Asn Pro Phe His Pro Pro Asn Gln 1 5
(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
Gly Pro Arg Asn Pro Phe Asn Ser Thr 1 5
(2) INFORMATION FOR SEQ ID NO: 28:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
Leu Asn Pro Phe Val Thr Tyr Asp Leu 1 5
(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
Gly Pro Arg Asn Pro Phe Asn Ser Thr 1 5
(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
Pro Ser Ser Asn Pro Phe Gln Thr Leu 1 5
(2) INFORMATION FOR SEQ ID NO: 31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: Asn Pro Phe Met Phe Ser Arg Ser Gly 1 5
(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
Gly Pro Asn Pro Phe His Pro Lys Ala
1 5
(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
Ser Asn Pro Phe Lys Arg Ala Gln Ala
1 5
(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
Ile Gly Ser Ala Asn Pro Phe Gln Thr
1 5
(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
Gly Asn Pro Phe Lys Val Ser Pro Thr
1 5
(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
Lys Asn Pro Phe Gly Ser Ser Ser Gln
1 5
(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
Thr Asn Tyr Asn Pro Phe Arg Ile Ser
1 5
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
Arg Lys Asn Pro Phe Gly Val Thr Ser
1 5
(2) INFORMATION FOR SEQ ID NO: 39:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
Glu Pro Phe Asn Pro Phe Thr Pro Arg
1 5
(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
Tyr Asn Pro Phe Ser Ala Gln Leu Val
1 5
(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
Ser Gln Arg Asn Pro Phe Gln Ala Arg Leu
1 5 10
(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
Ala Lys Thr Asn His Phe Pro Leu Gly
1 5
(2) INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
Gln Ser Ser His Pro Phe Arg Lys Val
1 5
(2) INFORMATION FOR SEQ ID NO: 44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
Arg Ser Ala Asn Pro Phe Gln Glu Ser
1 5
(2) INFORMATION FOR SEQ ID NO: 45:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
Lys Lys Gln Asn Pro Phe Arg Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 46:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
Asn Pro Phe Arg Ile Gln Thr Thr Gly
1 5
(2) INFORMATION FOR SEQ ID NO: 47:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
Ser Arg Ser Asn Pro Phe Arg Ser Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 48:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
Ser Phe Leu Asn Pro Phe Arg Lys Pro
1 5
(2) INFORMATION FOR SEQ ID NO: 49:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
Ser Asn Pro Phe Arg Gln Arg Thr Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 50:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
Arg Lys Pro Thr Asn Pro Phe Lys Gln
1 5
(2) INFORMATION FOR SEQ ID NO: 51:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
Asn Pro Phe Arg Pro Asp Met Ser Ser
1 5
(2) INFORMATION FOR SEQ ID NO: 52:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
Asn Pro Phe Arg Thr Thr Asn Pro Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 53:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
Thr Asn Pro Phe Ile Gly Arg Asp Pro
1 5
(2) INFORMATION FOR SEQ ID NO: 54:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
Asn Pro Phe Arg Thr Lys Lys Ala
1 5
(2) INFORMATION FOR SEQ ID NO: 55:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
Asn Pro Phe Arg Leu Lys Asp Ala Ala
1 5
(2) INFORMATION FOR SEQ ID NO: 56:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
Asn Pro Phe Arg Thr Thr Asn Pro Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 57:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
Thr Arg Asn Pro Phe Arg Gln Leu Pro
1 5
(2) INFORMATION FOR SEQ ID NO: 58:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
Leu Asn Pro Phe Arg Gln Arg Thr Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 59:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
Gly Lys Ser Ala Asn Pro Phe Arg Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 60:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
Asn Pro Phe Arg Leu Val Ser Lys Ser
1 5
(2) INFORMATION FOR SEQ ID NO: 61:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
Asn Pro Phe Arg Arg Pro Gln Gly Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 62:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
Ser Asn Pro Phe Phe Thr Ser Gln Thr Lys Glu
1 5 10
(2) INFORMATION FOR SEQ ID NO: 63:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
Thr Pro Ser Thr Asn Pro Phe Val Ala Ala Ala
1 5 10
(2) INFORMATION FOR SEQ ID NO: 64:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
Ala Ser Ser Thr Asn Pro Phe Gln Thr Asn Ala
1 5 10
(2) INFORMATION FOR SEQ ID NO: 65:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
Gly Val Ser Ser Asn Pro Phe Met Thr Gly Ala
1 5 10
(2) INFORMATION FOR SEQ ID NO: 66:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
Ser Ser Ser Thr Asn Pro Phe Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 67:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
Gly Ala Phe Thr Asn Pro Phe Thr Ala Pro Ala
1 5 10
(2) INFORMATION FOR SEQ ID NO: 68:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
Leu Pro Ser Thr Asn Pro Phe Gln Pro Asn Gly
1 5 10
(2) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
Gly Ile Ser Thr Asn Pro Phe Met Thr Gly Pro
1 5 10
(2) INFORMATION FOR SEQ ID NO: 70:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
Pro Pro Thr Thr Asn Pro Phe Leu
1 5
(2) INFORMATION FOR SEQ ID NO: 71:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
Pro Ser Pro Thr Asn Pro Phe Ser Ser Asp Leu
1 5 10
(2) INFORMATION FOR SEQ ID NO: 72:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
Phe Gly Pro Glu Asn Pro Phe Arg Thr Gln Gln
1 5 10
(2) INFORMATION FOR SEQ ID NO: 73:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
Glu Lys Pro Ser Asn Pro Phe Ser Gly Asp Leu
1 5 10
(2) INFORMATION FOR SEQ ID NO: 74:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
Met Tyr Ser Asn Asn Pro Phe Trp Asn Gly Val
1 5 10
(2) INFORMATION FOR SEQ ID NO: 75:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:
Gly Val Gln Thr Asn Pro Phe Leu Asn Gly Asn
1 5 10
(2) INFORMATION FOR SEQ ID NO: 76:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
Val Arg Arg Asp Asn Pro Phe Phe Arg Ser Lys
1 5 10
(2) INFORMATION FOR SEQ ID NO: 77:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:
Ala Gln Ser Leu Asn Pro Phe Leu Ala Pro Gly
1 5 10
(2) INFORMATION FOR SEQ ID NO: 78:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
Ser Ala Pro Val Asn Pro Phe Gln Val Asn Gln
1 5 10
(2) INFORMATION FOR SEQ ID NO: 79:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
Thr Gly Thr Thr Asn Pro Phe Leu Leu
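The listing gives residues in three-letter code, and claims 8 and 11 depend on the NPF (Asn-Pro-Phe) motif. The Python below is a minimal sketch and is not part of the original filing. It converts a few transcribed entries to one-letter code, checks each declared LENGTH against the actual residue count, and reports where an exact NPF occurs. The `LISTING` dictionary and the function names are illustrative assumptions. Only five entries are transcribed here; a full check would cover every entry in SEQ ID NO: 15-79.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Illustrative subset transcribed from the listing (hypothetical data structure):
# SEQ ID NO -> (declared LENGTH, three-letter sequence).
LISTING = {
    24: (9, "Ala Lys Asn Pro Phe Trp Asn Ala Pro"),
    41: (10, "Ser Gln Arg Asn Pro Phe Gln Ala Arg Leu"),
    42: (9, "Ala Lys Thr Asn His Phe Pro Leu Gly"),
    45: (8, "Lys Lys Gln Asn Pro Phe Arg Leu"),
    62: (11, "Ser Asn Pro Phe Phe Thr Ser Gln Thr Lys Glu"),
}


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


def npf_positions(peptide: str) -> list[int]:
    """Return the 1-based start position of each exact NPF occurrence."""
    return [match.start() + 1 for match in re.finditer("NPF", peptide)]


def check_listing(listing: dict[int, tuple[int, str]]) -> dict[int, tuple[str, list[int]]]:
    """Verify each declared length and locate NPF motifs, keyed by SEQ ID NO."""
    report = {}
    for seq_id, (declared_length, three_letter) in listing.items():
        peptide = to_one_letter(three_letter)
        if len(peptide) != declared_length:
            raise ValueError(
                f"SEQ ID NO: {seq_id}: LENGTH says {declared_length}, "
                f"sequence has {len(peptide)} residues"
            )
        report[seq_id] = (peptide, npf_positions(peptide))
    return report


if __name__ == "__main__":
    for seq_id, (peptide, hits) in check_listing(LISTING).items():
        status = f"NPF at {hits}" if hits else "no exact NPF"
        print(f"SEQ ID NO: {seq_id}  {peptide}  {status}")
```

In this text, SEQ ID NO: 42 contains Asn-His-Phe rather than NPF, so an exact-match scan finds no hit for it. Capturing such variants would need a looser pattern.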

Claims

1. A protein selected from the group consisting of: h-NUMB (SEQ ID NO: 1), h-NUMB-R (SEQ ID NO: 2), h-RAB-R (SEQ ID NO: 4), ehb3 having all or part of the amino acid sequence reported as SEQ ID NO: 5, ehb10 having all or part of the amino acid sequence reported as SEQ ID NO: 6, ehb21 having all or part of the amino acid sequence reported as SEQ ID NO: 7, or one of their functional derivatives.
2. A polynucleotide operably encoding one of the proteins h-NUMB, h-NUMB-R, h-RAB-R, ehb3, ehb10, ehb21 according to claim 1 or one of their functional derivatives.
3. A polynucleotide according to claim 2, that comprises one of the following polynucleotide sequences: SEQ ID NO: 8, which encodes the h-NUMB protein, SEQ ID NO: 9, which encodes the h-NUMB-R protein, SEQ ID NO: 11, which encodes the h-RAB-R protein.
4. A polynucleotide according to claim 2, encoding ehb3, ehb10 or ehb21 protein, having the partial nucleotide sequence reported as SEQ ID NO: 12, SEQ ID NO: 13, or SEQ ID NO: 14, respectively.
5. A polynucleotide according to claim 2, wherein the polynucleotide is mRNA.
6. A recombinant vector comprising a polynucleotide according to claim 2, 3 or .
7. A host cell transfected with the recombinant vector according to claim 6.
8. A peptide containing at least one NPF motif, able to bind to a protein with at least one EH domain.
9. A peptide according to claim 8, able to bind to eps15 or eps15R.
10. A peptide according to claim 8 or 9, from h-NUMB, h-NUMB-R, h-RAB, h-RAB-R, ehb3, ehb10, ehb21 proteins or one of their chemical derivatives.
11. A peptide having a sequence selected from SEQ ID NO: 15-79 or one of their chemical derivatives.
12. The complex between a protein containing at least one EH domain and a protein containing at least one NPF motif.
13. The complex according to claim 12, wherein the protein containing an EH domain is eps15 or eps15R.
14. The complex according to claim 12 or 13 wherein the NPF containing protein is a protein selected from the following group: h-NUMB, h-NUMB-R, h-RAB, h-RAB-R, ehb3, ehb10, ehb21 or one of their functional derivatives.
15. The complex between an EH containing protein and an NPF containing peptide.
16. The complex according to claim 15 wherein the EH containing protein is eps15 or eps15R.
17. The complex according to claim 15 or 16 wherein the NPF containing peptide is from h-NUMB, h-NUMB-R, h-RAB, h-RAB-R, ehb3, ehb10, ehb21 or one of their chemical derivatives.
18. The complex according to claim 15 or 16 wherein the peptide is a peptide selected from SEQ ID NO: 15-79 or one of their chemical derivatives.
19. A fusion protein with an NPF-containing peptide.
20. A fusion protein according to claim 19 wherein the fusion protein is a GST fusion protein.
21. A fusion protein according to claim 19 or 20 wherein the NPF containing peptide comes from: h-NUMB, h-NUMB-R, h-RAB, h-RAB-R, ehb3, ehb10 or ehb21.
22. A fusion protein according to claim 19 or 20 wherein the NPF containing peptide has a sequence selected from SEQ ID NO: 15-79, or one of their chemical derivatives.
23. A method to identify and purify EH containing proteins using a protein according to claims 19 to 22.
24. The use of a peptide, or of one of its functional derivatives, as defined in claims 8 through 11, as an inhibitor of a complex between a protein containing at least one EH domain and a protein containing at least one NPF motif.
25. A peptide, as defined in claims 8 through 11, wherein the peptide is bound to a solid support.
26. A method to purify an EH-containing protein from a complex mixture, which consists of: a) incubating such complex mixture with a solid-phase support to which a peptide, as defined in claims 8 through 11, is bound, allowing such a protein to form a complex with the peptide bound to the solid support; b) removing substances not complexed to such peptide bound on the solid support; c) eluting such protein complexed to the solid support.
27. A peptide, as defined in claims 8 through 11, for therapeutical purposes.
28. A purified antibody against h-NUMB, h-NUMB-R, h-RAB-R, ehb3, ehb10 or ehb21.
29. The antibody of claim 28, wherein the antibody is conjugated to a cytotoxic agent.
30. The antibody of claim 28 wherein the antibody is bound to a detectable group.
31. The antibody of claim 28, wherein the antibody is bound to a solid support.
32. An antisense RNA, or one of its chemical derivatives, complementary to an mRNA encoding a protein from the following group: h-NUMB, h-NUMB-R, h-RAB-R, ehb3, ehb10 or ehb21.
33. An antisense RNA according to claim 32 for therapeutic or diagnostic use.
34. Homopurine or homopyrimidine sequences of the polynucleotide sequences reported as SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, or of their fragments.
35. The use of a homopurine or homopyrimidine sequence according to claim 34 as a triple helix probe.
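Claims 32 to 35 cover two things: antisense RNA complementary to an mRNA, and homopurine or homopyrimidine stretches used as triple helix probes. The nucleotide sequences SEQ ID NO: 8-14 are not reproduced in this section, so the Python sketch below runs on a short hypothetical input. It shows the two basic sequence operations these claims rely on:

- taking the reverse complement of a sense strand to obtain the antisense RNA;
- scanning for uninterrupted purine (A/G) or pyrimidine (C/T) runs.

The function names and the default minimum run length of 10 bases are assumptions made for illustration. The claims do not specify a length, and practical triplex-probe design involves further criteria that are not modeled here.

```python
# Watson-Crick partners for building an antisense RNA. "T" is accepted so that a
# coding-strand DNA sequence can be supplied in place of the mRNA.
RNA_COMPLEMENT = {"A": "U", "U": "A", "T": "A", "G": "C", "C": "G"}
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def antisense_rna(sense: str) -> str:
    """Return the antisense RNA, written 5'->3', for a sense strand written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(sense.upper()))


def homo_runs(dna: str, min_len: int = 10) -> list[tuple[int, int, str]]:
    """Find maximal homopurine or homopyrimidine runs of at least min_len bases.

    Returns (1-based start, 1-based inclusive end, "purine" or "pyrimidine").
    Any character outside ACGT breaks a run.
    """
    seq = dna.upper()
    runs = []
    i = 0
    while i < len(seq):
        if seq[i] in PURINES:
            kind, alphabet = "purine", PURINES
        elif seq[i] in PYRIMIDINES:
            kind, alphabet = "pyrimidine", PYRIMIDINES
        else:
            i += 1
            continue
        # Extend the run while bases stay in the same class.
        j = i
        while j < len(seq) and seq[j] in alphabet:
            j += 1
        if j - i >= min_len:
            runs.append((i + 1, j, kind))
        i = j
    return runs


if __name__ == "__main__":
    toy = "CCAGAGGAAGTTCTCCTTCCA"  # hypothetical input, not a sequence from this patent
    print(antisense_rna("AUGGCC"))
    print(homo_runs(toy, min_len=5))
```

On the toy input, the scan returns one 8-base purine run at positions 3 to 10 and one 10-base pyrimidine run at positions 11 to 20.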
PCT/IT1998/000077 1997-04-15 1998-04-06 Intracellular interactors and eh domain binding specificity WO1998046744A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU70775/98A AU7077598A (en) 1997-04-15 1998-04-06 Intracellular interactors and eh domain binding specificity

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITMI97A000868 1997-04-15
ITMI970868 IT1291110B1 (en) 1997-04-15 1997-04-15 INTRACELLULAR INTERACTORS AND BINDING SPECIFICITY OF THE DOMAIN EH

Publications (1)

Publication Number Publication Date
WO1998046744A1 true WO1998046744A1 (en) 1998-10-22

Family

ID=11376890

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IT1998/000077 WO1998046744A1 (en) 1997-04-15 1998-04-06 Intracellular interactors and eh domain binding specificity

Country Status (3)

Country Link
AU (1) AU7077598A (en)
IT (1) IT1291110B1 (en)
WO (1) WO1998046744A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5487979A (en) * 1992-08-25 1996-01-30 The United States Of America As Represented By The Department Of Health And Human Services DNA encoding human and murine eps15, a substrate for the epidermal growth factor receptor
WO1996003649A1 (en) * 1994-07-22 1996-02-08 The University Of North Carolina At Chapel Hill Src SH3 BINDING PEPTIDES AND METHODS OF ISOLATING AND USING SAME

Non-Patent Citations (11)

Title
BOGERD H.P. ET AL.: "Identification of a novel cellular cofactor for the Rev/Rex class of retroviral regulatory proteins", CELL, vol. 82, 11 August 1995 (1995-08-11), pages 485 - 494, XP002073493 *
DATABASE EMBL SEQUENCES EMBL, Heidelberg, FRG; 28 August 1996 (1996-08-28), HILLIER L. ET AL.: "EST; H. sapiens pregnant uterus cDNA clone 484533", XP002073502 *
DATABASE GENBANK 18 May 1995 (1995-05-18), HILLIER L. ET AL.: "H. sapiens cDNA 152976", XP002073504 *
DATABASE GENBANK 18 May 1995 (1995-05-18), SHERRINGTON R. ET AL.: "Clone S171", XP002073503 *
DATABASE GENBANK 21 February 1995 (1995-02-21), AUFFRAY C. ET AL.: "H. sapiens brain cDNA c-2fa01", XP002073501 *
DATABASE GENBANK 6 September 1995 (1995-09-06), ADAMS M.D. ET AL.: "EST; Human brain EST59062", XP002073505 *
SALCINI A.E. ET AL.: "Binding specificity and in vivo targets of the EH domain, a novel protein-protein interaction module", GENES & DEVELOPMENT, vol. 11, no. 17, 1 September 1997 (1997-09-01), pages 2239 - 2249, XP002073498 *
SHERRINGTON R. ET AL.: "Cloning of a gene bearing missense mutations in early-onset familial Alzheimer's disease", NATURE, vol. 375, 29 June 1995 (1995-06-29), pages 754 - 760, XP002073495 *
TAN P.K. ET AL.: "The sequence NPFXD defines a new class of endocytosis signal in S. cerevisiae", J. CELL BIOL., vol. 135, no. 6, December 1996 (1996-12-01), pages 1789 - 1800, XP002073497 *
WONG W.T. ET AL.: "A protein-binding domain, EH, identified in the receptor tyrosine kinase substrate Eps15 and conserved in evolution", PROC. NATL. ACAD. SCI. USA, vol. 92, October 1995 (1995-10-01), pages 9530 - 9534, XP002073496 *
ZHONG W. ET AL.: "Asymmetric localization of a mammalian numb homolog during mouse cortical neurogenesis.", NEURON, vol. 17, no. 1, July 1996 (1996-07-01), pages 43 - 53, XP002073494 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2001046256A2 (en) * 1999-12-21 2001-06-28 Incyte Genomics, Inc. Vesicle trafficking proteins
WO2001046256A3 (en) * 1999-12-21 2001-12-27 Incyte Genomics Inc Vesicle trafficking proteins
WO2001055448A1 (en) * 2000-01-31 2001-08-02 Human Genome Sciences, Inc. Nucleic acids, proteins, and antibodies

Also Published As

Publication number Publication date
IT1291110B1 (en) 1998-12-29
AU7077598A (en) 1998-11-11
ITMI970868A1 (en) 1998-10-15

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM GW HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 1998543692

Format of ref document f/p: F

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: CA