CA2354369A1 - Complementary dna's encoding proteins with signal peptides - Google Patents


Info

Publication number
CA2354369A1
Authority
CA
Canada
Prior art keywords
sequence
sequences
protein
nos
cdna
Legal status
Abandoned
Application number
CA002354369A
Other languages
French (fr)
Inventor
Lydie Bougueleret
Jean-Baptiste Dumas
Aymeric Duclert
Catherine Clusel
Current Assignee
Merck Biodevelopment SAS
Original Assignee
Individual


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C12N15/625 DNA sequences coding for fusion proteins containing a sequence coding for a signal sequence
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/02 Fusion polypeptide containing a localisation/targetting motif containing a signal sequence
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/20 Fusion polypeptide containing a tag with affinity for a non-protein ligand
    • C07K2319/21 Fusion polypeptide containing a tag with affinity for a non-protein ligand containing a His-tag
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/32 Fusion polypeptide fusions with soluble part of a cell surface receptor, "decoy receptors"

Landscapes

  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Toxicology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Medicinal Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Peptides Or Proteins (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

The sequences of cDNAs encoding secreted proteins are disclosed. The cDNAs can be used to express secreted proteins or fragments thereof or to obtain antibodies capable of specifically binding to the secreted proteins. The cDNAs may also be used in diagnostic, forensic, gene therapy, and chromosome mapping procedures. The cDNAs may also be used to design expression vectors and secretion vectors.

Description

CA 02354369 2001-06-13

NOTE: This application or patent is large and comprises more than one volume. For additional volumes, please contact the Canadian Patent Office.

COMPLEMENTARY DNA'S ENCODING PROTEINS WITH SIGNAL PEPTIDES
Background of the Invention

The estimated 50,000-100,000 genes scattered along the human chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of specifically hybridizing to loci distributed throughout the human genome find applications in the construction of high-resolution chromosome maps and in the identification of individuals.
In the past, the characterization of even a single human gene was a painstaking process, requiring years of effort. Recent developments in cloning vectors, DNA sequencing, and computer technology have together greatly accelerated the rate at which human genes can be isolated, sequenced, mapped, and characterized.
Currently, two different approaches are being pursued for identifying and characterizing the genes distributed along the human genome. In one approach, large fragments of genomic DNA are isolated, cloned, and sequenced. Potential open reading frames in these genomic sequences are identified using bioinformatics software. However, this approach entails sequencing large stretches of human DNA which do not encode proteins in order to find the protein-encoding sequences scattered throughout the genome. In addition to requiring extensive sequencing, the bioinformatics software may mischaracterize the genomic sequences obtained, for example by labeling non-coding DNA as coding DNA and vice versa.
An alternative approach takes a more direct route to identifying and characterizing human genes. In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA which is derived from protein-coding fragments of the genome. Often, only short stretches of the cDNAs are sequenced to obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to isolate or purify cDNAs which include sequences adjacent to the EST sequences. The cDNAs may contain all of the sequence of the EST which was used to obtain them or only a fragment of that sequence. In addition, the cDNAs may contain the full coding sequence of the gene from which the EST was derived or, alternatively, only fragments of that coding sequence. It will be appreciated that several cDNAs may include the EST sequence as a result of alternative splicing or the activity of alternative promoters.
In the past, these short EST sequences were often obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3' end of the mRNA results from the fact that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996). In addition, in those reported instances where longer cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences and do not include the full 5' untranslated region (5'UTR) of the mRNA from which the cDNA is derived. This omission matters because 5'UTRs have been shown to affect either the stability or the translation of mRNAs. Thus, regulation of gene expression may be achieved through the use of alternative 5'UTRs, as shown, for instance, for the translation of the tissue inhibitor of metalloprotease mRNA in mitogenically activated cells (Waterhouse et al., J. Biol. Chem. 265:5585-9, 1990).
Furthermore, modification of the 5'UTR through mutation, insertion, or translocation events may even be implicated in pathogenesis. For instance, fragile X syndrome, the most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG trinucleotides in the 5'UTR of the fragile X mRNA, resulting in the inhibition of protein synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995). An aberrant mutation in regions of the 5'UTR known to inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc protein levels in cells derived from patients with multiple myeloma (Willis et al., Curr Top Microbiol Immunol 224:269-76, 1997). In addition, oligo-dT primed cDNA libraries do not allow the isolation of complete 5'UTRs, since the incomplete sequences obtained by this process may lack the first exon of the mRNA, particularly when the first exon is short. They may also lack some exons, often short ones, which are located upstream of splicing sites. Thus, there is a need to obtain sequences derived from the 5' ends of mRNAs.
Moreover, despite the great amount of EST data that large-scale sequencing projects have yielded (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996), information concerning the biological function of the mRNAs corresponding to the cDNAs so obtained has proven limited. Indeed, whereas knowledge of the complete coding sequence is essential to investigate the biological function of an mRNA, ESTs yield only partial coding sequences. So far, large-scale full-length cDNA cloning has achieved only limited success because of the poor efficiency of methods for constructing full-length cDNA libraries. Such methods either require a large amount of mRNA (Edery et al., 1995), resulting in non-representative full-length libraries when only small amounts of tissue are available, or require PCR amplification (Maruyama et al., 1994; CLONTECHniques, 1996) to obtain a reasonable number of clones, yielding strongly biased cDNA libraries in which rare and long cDNAs are lost. Thus, there is a need to obtain full-length cDNAs, i.e., cDNAs containing the full coding sequence of their corresponding mRNAs.
While many sequences derived from human chromosomes have practical applications, approaches based on the identification and characterization of those chromosomal sequences which encode a protein product are particularly relevant to diagnostic and therapeutic uses. Of the 50,000-100,000 protein-coding genes, those genes encoding proteins which are secreted from the cell in which they are synthesized, as well as the secreted proteins themselves, are particularly valuable as potential therapeutic agents. Such proteins are often involved in cell-to-cell communication and may be responsible for producing a clinically relevant response in their target cells. In fact, several secretory proteins, including tissue plasminogen activator, G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-α, interferon-β, interferon-γ, and interleukin-2, are currently in clinical use. These proteins are used to treat a wide range of conditions, including acute myocardial infarction, acute ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney carcinoma, chemotherapy-induced neutropenia, and multiple sclerosis. For these reasons, cDNAs encoding secreted proteins or fragments thereof represent a particularly valuable source of therapeutic agents. Thus, there is a need for the identification and characterization of secreted proteins and the nucleic acids encoding them.
In addition to being therapeutically useful themselves, secretory proteins include short peptides, called signal peptides, at their amino termini which direct their secretion.
These signal peptides are encoded by the signal sequences located at the 5' ends of the coding sequences of genes encoding secreted proteins. Because these signal peptides will direct the extracellular secretion of any protein to which they are operably linked, the signal sequences may be exploited to direct the efficient secretion of any protein by operably linking the signal sequences to a gene encoding the protein for which secretion is desired. In addition, fragments of the signal peptides, called membrane-translocating sequences, may also be used to direct the intracellular import of a peptide or protein of interest. This may prove beneficial in gene therapy strategies in which it is desired to deliver a particular gene product to cells other than the cells in which it is produced. Signal sequences encoding signal peptides also find application in simplifying protein purification techniques. In such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing the number of undesired proteins from which the desired protein must be separated.
Thus, there exists a need to identify and characterize the 5' fragments of the genes for secretory proteins which encode signal peptides.
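As a rough illustration of what "operably linking" a signal sequence to a coding sequence means at the DNA level, the following Python sketch builds an in-frame fusion. It is a toy under stated assumptions, not the method of this application: the function name `fuse_signal_sequence` is hypothetical, the signal sequence is assumed to start with ATG and to consist of whole codons with no stop codon, and the target's own initiator ATG is dropped so the signal peptide sits at the new N-terminus. Real secretion constructs also depend on promoter, cleavage-site, and vector context that this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_signal_sequence(signal_seq: str, target_cds: str) -> str:
    """Return a hypothetical in-frame fusion: signal sequence + target coding sequence.

    Assumptions (illustrative only):
    - signal_seq starts with ATG, is a whole number of codons, and has no stop codon;
    - target_cds is a complete ORF (ATG ... stop) whose initiator ATG is dropped,
      so translation starts at the signal peptide and continues into the target.
    """
    signal_seq = signal_seq.upper()
    target_cds = target_cds.upper()
    if not signal_seq.startswith("ATG") or len(signal_seq) % 3:
        raise ValueError("signal sequence must start with ATG and contain whole codons")
    codons = [signal_seq[i:i + 3] for i in range(0, len(signal_seq), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("signal sequence contains an in-frame stop codon")
    if (not target_cds.startswith("ATG") or len(target_cds) % 3
            or target_cds[-3:] not in STOP_CODONS):
        raise ValueError("target must be a complete ORF (ATG ... stop)")
    # Dropping the target's ATG keeps the reading frame because 3 bases = 1 codon.
    return signal_seq + target_cds[3:]


# Toy sequences, not taken from the SEQ ID NOs of this application.
print(fuse_signal_sequence("ATGAAA", "ATGGGCTAA"))
```

Keeping the whole-codon check on both inputs is what guarantees the fusion stays in frame; without it, a single extra base would silently shift every downstream codon.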
Sequences coding for secreted proteins may also find application as therapeutics or diagnostics. In particular, such sequences may be used to determine whether an individual is likely to express a detectable phenotype, such as a disease, as a consequence of a mutation in the coding sequence for a secreted protein. In instances where the individual is at risk of suffering from a disease or other undesirable phenotype as a result of a mutation in such a coding sequence, the undesirable phenotype may be corrected by introducing a normal coding sequence using gene therapy. Alternatively, if the undesirable phenotype results from overexpression of the protein encoded by the coding sequence, expression of the protein may be reduced using antisense or triple helix based strategies.
The secreted human polypeptides encoded by the coding sequences may also be used as therapeutics by administering them directly to an individual having a condition, such as a disease, resulting from a mutation in the sequence encoding the polypeptide. In such an instance, the condition can be cured or ameliorated by administering the polypeptide to the individual.
In addition, the secreted human polypeptides or fragments thereof may be used to generate antibodies useful in determining the tissue type or species of origin of a biological sample. The antibodies may also be used to determine the cellular localization of the secreted human polypeptides or the cellular localization of polypeptides which have been fused to the human polypeptides.
In addition, the antibodies may also be used in immunoaffinity chromatography techniques to isolate, purify, or enrich the human polypeptide or a target polypeptide which has been fused to the human polypeptide.
Public information on the number of human genes for which the promoters and upstream regulatory regions have been identified and characterized is quite limited. In part, this may be due to the difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as transcription factor binding sites are typically too short to be utilized as probes for isolating promoters from human genomic libraries. Recently, some approaches have been developed to isolate human promoters. One of them consists of making a CpG island library (Cross et al., Nature Genetics 6:236-244, 1994). The second consists of isolating human genomic DNA sequences containing Spel binding sites by the use of Spel binding protein (Mortlock et al., Genome Res. 6:327-335, 1996). Both of these approaches are limited by a lack of specificity and of comprehensiveness. Thus, there exists a need to identify and systematically characterize the 5' fragments of the genes.
cDNAs including the 5' ends of their corresponding mRNAs may be used to efficiently identify and isolate 5'UTRs and upstream regulatory regions which control the location, developmental stage, rate, and quantity of protein synthesis, as well as the stability of the mRNA (Theil et al., BioFactors 4:87-93, 1993).
Once identified and characterized, these regulatory regions may be utilized in gene therapy or protein purification schemes to obtain the desired amount and locations of protein synthesis or to inhibit, reduce, or prevent the synthesis of undesirable gene products.
In addition, cDNAs containing the 5' ends of secretory protein genes may include sequences useful as probes for chromosome mapping and the identification of individuals. Thus, there is a need to identify and characterize the sequences upstream of the 5' coding sequences of genes encoding secretory proteins.
Summary of the Invention

The present invention relates to purified, isolated, or recombinant cDNAs which encode secreted proteins or fragments thereof. Preferably, the purified, isolated, or recombinant cDNAs contain the entire open reading frame of their corresponding mRNAs, including a start codon and a stop codon. For example, the cDNAs may include nucleic acids encoding the signal peptide as well as the mature protein. Such cDNAs will be referred to herein as "full-length" cDNAs. Alternatively, the cDNAs may contain a fragment of the open reading frame. Such cDNAs will be referred to herein as "ESTs" or "5'ESTs". In some embodiments, the fragment may encode only the sequence of the mature protein.
Alternatively, the fragment may encode only a fragment of the mature protein. A further aspect of the present invention is a nucleic acid which encodes the signal peptide of a secreted protein.
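The distinction between "full-length" cDNAs and ESTs turns on whether an open reading frame runs from a start codon to a stop codon. A minimal Python sketch of that check follows, assuming the standard start codon (ATG) and stop codons (TAA, TAG, TGA) and scanning only the three forward frames of the strand given; the helper names `longest_orf` and `has_complete_orf` are hypothetical, and real annotation pipelines apply further criteria (minimum length, Kozak context, both strands).

```python
from typing import Optional, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF in the three forward frames.

    `end` is exclusive and includes the stop codon. Returns None when no
    complete ORF (start codon followed in frame by a stop codon) exists.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i  # first ATG opens the ORF; later ATGs are internal Met
            elif codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def has_complete_orf(seq: str) -> bool:
    """True if the strand carries at least one start-to-stop ORF (a 'full-length' candidate)."""
    return longest_orf(seq) is not None


print(longest_orf("CCATGAAATTTTAGGG"))
```

A sequence that yields `None` here lacks either the start or the stop codon in frame, which is the situation the text describes for cDNAs containing only a fragment of the open reading frame.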
The term "corresponding mRNA" refers to the mRNA which was the template for the cDNA synthesis which produced the cDNA of the present invention. As used herein, the term "purified" does not require absolute purity; rather, it is intended as a relative definition.
Individual cDNA clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA), and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10⁴- to 10⁶-fold purification of the native message.

Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
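The purification figures above are ratios, so "orders of magnitude" is simply the base-10 logarithm of the fold purification. A small Python sketch with hypothetical function names; the starting abundance of one message in 100,000 is an illustrative assumption, not a measured value from this application.

```python
import math


def fold_purification(initial_fraction: float, final_fraction: float) -> float:
    """Ratio of a message's relative abundance after purification to its abundance before."""
    if not (0 < initial_fraction <= 1 and 0 < final_fraction <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    return final_fraction / initial_fraction


def orders_of_magnitude(fold: float) -> float:
    """Express a fold purification as orders of magnitude (log10 of the fold)."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return math.log10(fold)


# Illustrative only: a message present at 1 in 100,000 of the mRNA population
# and recovered as a pure clone (fraction 1.0 of that clone's inserts).
fold = fold_purification(1e-5, 1.0)
print(round(orders_of_magnitude(fold), 6))
```

On this scale, the 10⁴- to 10⁶-fold range cited above corresponds to four to six orders of magnitude.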
As used herein, the term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated.
As used herein, the term "recombinant" means that the cDNA is adjacent to "backbone" nucleic acid to which it is not adjacent in its natural environment. Additionally, to be "enriched" the cDNAs will represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. Preferably, the enriched cDNAs represent 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. More preferably, the enriched cDNAs represent 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred embodiment, the enriched cDNAs represent 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules.
Thus, cDNAs encoding secreted polypeptides or fragments thereof which are present in cDNA libraries in which one or more cDNAs encoding secreted polypeptides or fragments thereof make up 5% or more of the number of nucleic acid inserts in the backbone molecules are "enriched recombinant cDNAs" as defined herein. Likewise, cDNAs encoding secreted polypeptides or fragments thereof which are in a population of plasmids in which one or more cDNAs of the present invention have been inserted such that they represent 5% or more of the number of inserts in the plasmid backbone are "enriched recombinant cDNAs" as defined herein. However, cDNAs encoding secreted polypeptides or fragments thereof which are in cDNA libraries in which the cDNAs encoding secreted polypeptides or fragments thereof constitute less than 5% of the number of nucleic acid inserts in the population of backbone molecules, such as libraries in which backbone molecules having a cDNA insert encoding a secreted polypeptide are extremely rare, are not "enriched recombinant cDNAs."
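The enrichment definition reduces to comparing an insert fraction against the 5%, 15%, 50%, and 90% tiers recited above. A minimal Python sketch, with hypothetical function names, that reports the highest tier met; it uses integer cross-multiplication so a population sitting exactly on a threshold is classified without floating-point rounding.

```python
from typing import Optional

# Tiers recited in the definition: 5% (enriched), 15% (preferred),
# 50% (more preferred), 90% (highly preferred). Checked from highest down.
THRESHOLDS_PERCENT = (90, 50, 15, 5)


def enrichment_level(target_inserts: int, total_inserts: int) -> Optional[int]:
    """Return the highest percentage tier met by the target inserts, or None if below 5%."""
    if total_inserts <= 0 or not 0 <= target_inserts <= total_inserts:
        raise ValueError("need total_inserts > 0 and 0 <= target_inserts <= total_inserts")
    for pct in THRESHOLDS_PERCENT:
        # target/total >= pct/100, rewritten to avoid floating-point division
        if target_inserts * 100 >= pct * total_inserts:
            return pct
    return None


def is_enriched(target_inserts: int, total_inserts: int) -> bool:
    """True if the inserts meet the 5% floor for 'enriched recombinant cDNAs'."""
    return enrichment_level(target_inserts, total_inserts) is not None


print(enrichment_level(5, 100), enrichment_level(4, 100))
```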
The term "polypeptide" refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides, for example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide.
Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
As used interchangeably herein, the terms "nucleic acids," "oligonucleotides," and "polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single-chain or duplex form. The term "nucleotide" is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term "nucleotide" also encompasses "modified nucleotides," which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see PCT publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, or ex vivo generation, or a combination thereof, as well as by utilizing any purification methods known in the art.
The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA, with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (see Stryer, L., Biochemistry, 4th edition, 1995).
The terms "complementary" or "complement thereof" are used herein to refer to the sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base.
Complementary bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym for "complementary polynucleotide," "complementary nucleic acid," and "complementary nucleotide sequence." These terms are applied to pairs of polynucleotides based solely upon their sequences and not upon any particular set of conditions under which the two polynucleotides would actually bind. Preferably, a "complementary" sequence is a sequence which has an A at each position where there is a T on the opposite strand, a T at each position where there is an A on the opposite strand, a G at each position where there is a C on the opposite strand, and a C at each position where there is a G on the opposite strand.
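The pairing rules above translate directly into code. The following Python sketch handles DNA strands only (A, C, G, T; RNA's U is not handled, and any other character raises `KeyError`) and assumes both strands are written 5' to 3', so the fully complementary partner of a strand is its reverse complement. It also tallies the hydrogen bonds of the resulting duplex using the two-per-A/T and three-per-G/C counts stated above. The function names are hypothetical.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # per base pair, as stated above


def reverse_complement(seq: str) -> str:
    """Complementary DNA strand, written 5' to 3' (i.e. complemented and reversed)."""
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def is_fully_complementary(strand: str, other: str) -> bool:
    """True if every base of `strand` pairs with `other` when aligned antiparallel.

    Both arguments are assumed to be written 5' to 3'.
    """
    return len(strand) == len(other) and reverse_complement(strand) == other.upper()


def duplex_hydrogen_bonds(strand: str) -> int:
    """Hydrogen bonds in the duplex formed by `strand` and its full complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(reverse_complement("ATGC"), duplex_hydrogen_bonds("ATGC"))
```

Like the definition in the text, this check is purely sequence-based: it says nothing about whether the two strands would actually anneal under a given set of hybridization conditions.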
"Stringent," "moderate," and "low" hybridization conditions are as defined below.
In particular, the present invention relates to cDNAs which were derived from genes encoding secreted proteins. As used herein, a "secreted" protein is one which, when expressed in a suitable host cell, is transported across or through a membrane, including transport as a result of signal peptides in its amino acid sequence. "Secreted" proteins include without limitation proteins secreted wholly (e.g. soluble proteins), or partially (e.g. receptors) from the cell in which they are expressed. "Secreted" proteins also include without limitation proteins which are transported across the membrane of the endoplasmic reticulum.
cDNAs encoding secreted proteins may include nucleic acid sequences, called signal sequences, which encode signal peptides which direct the extracellular secretion of the proteins encoded by the cDNAs.
Generally, the signal peptides are located at the amino termini of secreted proteins.
Secreted proteins are translated by ribosomes associated with the "rough" endoplasmic reticulum.
Generally, secreted proteins are co-translationally transferred to the membrane of the endoplasmic reticulum. Association of the ribosome with the endoplasmic reticulum during translation of secreted proteins is mediated by the signal peptide. The signal peptide is typically cleaved following its co-translational entry into the endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted proteins may proceed through the Golgi apparatus. In the Golgi apparatus, the proteins may undergo post-translational modification before entering secretory vesicles which transport them across the cell membrane.
The cDNAs of the present invention have several important applications. For example, they may be used to express the entire secreted protein which they encode.
Alternatively, they may be used to express fragments of the secreted protein. The fragments may comprise the signal peptides encoded by the cDNAs or the mature proteins encoded by the cDNAs (i.e. the proteins generated when the signal peptide is cleaved off). The fragments may also comprise polypeptides having at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids encoded by the cDNAs.
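The relationship between a precursor, its signal peptide, and the mature protein, together with the "at least N consecutive amino acids" fragments, can be sketched as simple string operations. In this Python sketch the cleavage position is an input, because this section does not say how it is determined (in practice it would come from experimental data or a prediction tool); the toy sequence and the function names are hypothetical.

```python
from typing import List, Tuple


def split_precursor(precursor: str, signal_length: int) -> Tuple[str, str]:
    """Split a precursor into (signal_peptide, mature_protein).

    `signal_length` is the number of N-terminal residues removed at the
    (assumed, externally supplied) signal peptidase cleavage site.
    """
    if not 0 < signal_length < len(precursor):
        raise ValueError("signal_length must fall strictly inside the precursor")
    return precursor[:signal_length], precursor[signal_length:]


def consecutive_fragments(protein: str, length: int) -> List[str]:
    """All fragments of exactly `length` consecutive residues (empty if too short)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Toy precursor, not one of the SEQ ID NOs of this application.
signal, mature = split_precursor("MKLLVAQDEF", 5)
print(signal, mature, consecutive_fragments(mature, 3))
```

A protein of length L has L - N + 1 distinct windows of N consecutive residues, which is why the shorter fragment sizes recited above describe far more candidate polypeptides than the longer ones.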
Antibodies which specifically recognize the entire secreted proteins encoded by the cDNAs or fragments thereof having at least 10 consecutive amino acids, at least 15 consecutive amino acids, at least consecutive amino acids, or at least 40 consecutive amino acids may also be obtained as described below. Antibodies which specifically recognize the mature protein generated when the signal peptide is cleaved may also be obtained as described below. Similarly, antibodies which specifically recognize the signal peptides encoded by the cDNAs may also be obtained.
In some embodiments, the cDNAs include the signal sequence. In other embodiments, the cDNAs may include the full coding sequence for the mature protein (i.e. the protein generated when the signal polypeptide is cleaved off). In addition, the cDNAs may include regulatory regions upstream of the translation start site or downstream of the stop codon which control the amount, location, or developmental stage of gene expression. As discussed above, secreted proteins are therapeutically important. Thus, the proteins expressed from the cDNAs may be useful in treating or controlling a variety of human conditions.
The cDNAs may also be used to obtain the corresponding genomic DNA. The term "corresponding genomic DNA" refers to the genomic DNA which encodes an mRNA that includes the sequence of one of the strands of the cDNA, with the thymidine residues of the cDNA sequence replaced by uracil residues in the mRNA.
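Because the corresponding mRNA has the sequence of one cDNA strand with thymidine replaced by uracil, the conversion between the two is a one-to-one substitution. A trivial Python sketch, assuming the input is the sense (mRNA-like) strand written 5' to 3'; the function names are hypothetical.

```python
def cdna_to_mrna(sense_strand: str) -> str:
    """mRNA sequence implied by the cDNA sense strand (T replaced by U)."""
    seq = sense_strand.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected only A, C, G, T in a DNA strand")
    return seq.replace("T", "U")


def mrna_to_cdna_sense(mrna: str) -> str:
    """cDNA sense-strand sequence implied by an mRNA (U replaced by T)."""
    seq = mrna.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("expected only A, C, G, U in an RNA strand")
    return seq.replace("U", "T")


print(cdna_to_mrna("ATGGCTTAA"))
```

If the antisense (first-strand) cDNA is the starting point instead, it would first need to be reverse-complemented to obtain the sense strand.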
The cDNAs or genomic DNAs obtained therefrom may be used in forensic procedures to identify individuals or in diagnostic procedures to identify individuals having genetic diseases resulting from abnormal expression of the genes corresponding to the cDNAs. In addition, the present invention is useful for constructing a high resolution map of the human chromosomes.
The present invention also relates to secretion vectors capable of directing the secretion of a protein of interest. Such vectors may be used in gene therapy strategies in which it is desired to produce a gene product in one cell which is to be delivered to another location in the body. Secretion vectors may also facilitate the purification of desired proteins.
The present invention also relates to expression vectors capable of directing the expression of an inserted gene in a desired spatial or temporal manner or at a desired level.
Such vectors may include sequences upstream of the cDNAs such as promoters or upstream regulatory sequences.
In addition, the present invention may also be used for gene therapy to control or treat genetic diseases. Signal peptides may also be fused to heterologous proteins to direct their extracellular secretion.
One embodiment of the present invention is a purified or isolated nucleic acid comprising the sequence of one of SEQ ID NOs: 24-73 or a sequence complementary thereto. In one aspect of this embodiment, the nucleic acid is recombinant.
Another embodiment of the present invention is a purified or isolated nucleic acid comprising at least 8 consecutive bases of the sequence of one of SEQ ID NOs: 24-73 or one of the sequences complementary thereto. In one aspect of this embodiment, the nucleic acid comprises at least 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive bases of one of the sequences of SEQ ID NOs: 24-73 or one of the sequences complementary thereto. The nucleic acid may be a recombinant nucleic acid.
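Whether a candidate nucleic acid comprises at least a given number of consecutive bases of a listed sequence, or of its complementary strand, can be checked mechanically. The sketch below assumes that "complementary" means the reverse complement written 5' to 3' and uses a hypothetical reference sequence; the `min_len` parameter corresponds to the 8, 10, 12, ... base thresholds recited above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def shares_consecutive_bases(candidate: str, reference: str, min_len: int = 8) -> bool:
    """True if candidate contains at least min_len consecutive bases of the
    reference or of the strand complementary to it."""
    strands = (reference, reverse_complement(reference))
    # Any shared run of >= min_len bases contains a shared window of exactly min_len.
    for start in range(len(candidate) - min_len + 1):
        window = candidate[start:start + min_len]
        if any(window in strand for strand in strands):
            return True
    return False


reference = "ATGAAACGCATTAGCACCACCATTACC"  # hypothetical, not from the listing
print(shares_consecutive_bases("GGGGCGCATTAGCGGG", reference))  # True
```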
Another embodiment of the present invention is a purified or isolated nucleic acid of at least 15 bases capable of hybridizing under stringent conditions to the sequence of one of SEQ ID NOs: 24-73 or a sequence complementary to one of the sequences of SEQ ID NOs: 24-73. In one aspect of this embodiment, the nucleic acid is recombinant.
Another embodiment of the present invention is a purified or isolated nucleic acid comprising the full coding sequence of one of SEQ ID NOs: 24-73, wherein the full coding sequence optionally comprises the sequence encoding the signal peptide as well as the sequence encoding the mature protein. In one aspect of this embodiment, the nucleic acid is recombinant.
A further embodiment of the present invention is a purified or isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 24-73 which encode a mature protein. In one aspect of this embodiment, the nucleic acid is recombinant.
Yet another embodiment of the present invention is a purified or isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 24-73 which encode the signal peptide. In one aspect of this embodiment, the nucleic acid is recombinant.
Another embodiment of the present invention is a purified or isolated nucleic acid encoding a polypeptide having the sequence of one of the sequences of SEQ ID NOs: 74-123.
Another embodiment of the present invention is a purified or isolated nucleic acid encoding a polypeptide having the sequence of a mature protein included in one of the sequences of SEQ ID NOs: 74-123.
Another embodiment of the present invention is a purified or isolated nucleic acid encoding a polypeptide having the sequence of a signal peptide included in one of the sequences of SEQ ID NOs: 74-123.
Yet another embodiment of the present invention is a purified or isolated protein comprising the sequence of one of SEQ ID NOs: 74-123.
Another embodiment of the present invention is a purified or isolated polypeptide comprising at least 5 or 8 consecutive amino acids of one of the sequences of SEQ ID NOs: 74-123. In one aspect of this embodiment, the purified or isolated polypeptide comprises at least 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 consecutive amino acids of one of the sequences of SEQ ID NOs: 74-123.
Another embodiment of the present invention is an isolated or purified polypeptide comprising a signal peptide of one of the polypeptides of SEQ ID NOs: 74-123.
Yet another embodiment of the present invention is an isolated or purified polypeptide comprising a mature protein of one of the polypeptides of SEQ ID NOs: 74-123.
A further embodiment of the present invention is a method of making a protein comprising one of the sequences of SEQ ID NOs: 74-123, comprising the steps of obtaining a cDNA comprising one of the sequences of SEQ ID NOs: 24-73, inserting the cDNA in an expression vector such that the cDNA is operably linked to a promoter, and introducing the expression vector into a host cell whereby the host cell produces the protein encoded by said cDNA. In one aspect of this embodiment, the method further comprises the step of isolating the protein.
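The host cell in the method above reads the cDNA coding sequence according to the genetic code. The same relationship between a cDNA of SEQ ID NOs: 24-73 and its polypeptide can be checked in silico with the sketch below, which assumes the standard genetic code and an input that begins at the start codon. The cleavage position passed to the hypothetical `split_precursor` helper is an assumed input, not a value from the sequence listing.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence from its first codon up to the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for ambiguous codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def split_precursor(precursor: str, cleavage_after: int) -> Tuple[str, str]:
    """Split a precursor into (signal peptide, mature protein).

    cleavage_after is the number of signal peptide residues (hypothetical input).
    """
    return precursor[:cleavage_after], precursor[cleavage_after:]


print(translate("ATGGCTTCAGGTTAA"))  # MASG
```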
Another embodiment of the present invention is a protein obtainable by the method described in the preceding paragraph.
Another embodiment of the present invention is a method of making a protein comprising the amino acid sequence of the mature protein contained in one of the sequences of SEQ ID NOs: 74-123, comprising the steps of obtaining a cDNA comprising the nucleotides of one of the sequences of SEQ ID NOs: 24-73 which encode the mature protein, inserting the cDNA in an expression vector such that the cDNA is operably linked to a promoter, and introducing the expression vector into a host cell whereby the host cell produces the mature protein encoded by the cDNA. In one aspect of this embodiment, the method further comprises the step of isolating the protein.
Another embodiment of the present invention is a mature protein obtainable by the method described in the preceding paragraph.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids described herein comprising the sequence of one of SEQ ID NOs: 24-73 or a sequence complementary thereto.

Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids described herein comprising the full coding sequence of one of SEQ ID NOs: 24-73, wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids described herein comprising the nucleotides of one of SEQ ID NOs: 24-73 which encode a mature protein.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids described herein comprising the nucleotides of one of SEQ ID NOs: 24-73 which encode the signal peptide.
Another embodiment of the present invention is a purified or isolated antibody capable of specifically binding to a protein having the sequence of one of SEQ ID NOs: 74-123. In one aspect of this embodiment, the antibody is capable of binding to a polypeptide comprising at least 10 consecutive amino acids of the sequence of one of SEQ ID NOs: 74-123.
Another embodiment of the present invention is an array of cDNAs or fragments thereof of at least 15 nucleotides in length which includes at least one of the sequences of SEQ ID NOs: 24-73, or one of the sequences complementary to the sequences of SEQ ID NOs: 24-73, or a fragment thereof of at least 15 consecutive nucleotides. In one aspect of this embodiment, the array includes at least two of the sequences of SEQ ID NOs: 24-73, the sequences complementary to the sequences of SEQ ID NOs: 24-73, or fragments thereof of at least 15 consecutive nucleotides. In another aspect of this embodiment, the array includes at least five of the sequences of SEQ ID NOs: 24-73, the sequences complementary to the sequences of SEQ ID NOs: 24-73, or fragments thereof of at least 15 consecutive nucleotides.
A further embodiment of the invention encompasses purified polynucleotides comprising an insert from a clone deposited in an ECACC deposit having accession No. 99061735 and named SignalTag 15061999, which contains the sequences of SEQ ID NOs: 25-40 and 42-46, or deposited in an ECACC deposit having accession No. 98121805 and named SignalTag 166-191, which contains SEQ ID NOs: 47-73, or a fragment of these nucleic acids comprising a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 nucleotides of said insert. An additional embodiment of the invention encompasses purified polypeptides which comprise, consist of, or consist essentially of an amino acid sequence encoded by the insert from a clone deposited in an ECACC deposit having accession No. 99061735 and named SignalTag 15061999, which contains the sequences of SEQ ID NOs: 25-40 and 42-46, or deposited in an ECACC deposit having accession No. 98121805 and named SignalTag 166-191, which contains SEQ ID NOs: 47-73, as well as polypeptides which comprise a fragment of said amino acid sequence consisting of a signal peptide, a mature protein, or a contiguous span of at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 amino acids encoded by said insert.
An additional embodiment of the invention encompasses purified polypeptides which comprise a contiguous span of at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 amino acids of SEQ ID NOs: 74-123, wherein said contiguous span comprises at least one of the amino acid positions which was not shown to be identical to a public sequence in any of Figures 10 to 13. Also encompassed by the invention are purified polynucleotides encoding said polypeptides.
Another embodiment of the present invention is a computer readable medium having stored thereon a sequence selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123.
Another embodiment of the present invention is a computer system comprising a processor and a data storage device wherein the data storage device has stored thereon a sequence selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123. In some embodiments the computer system further comprises a sequence comparer and a data storage device having reference sequences stored thereon. For example, the sequence comparer may comprise a computer program which indicates polymorphisms. In other aspects of the computer system, the system further comprises an identifier which identifies features in said sequence.
Another embodiment of the present invention is a method for comparing a first sequence to a reference sequence wherein the first sequence is selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123, comprising the steps of reading the first sequence and the reference sequence through use of a computer program which compares sequences and determining differences between the first sequence and the reference sequence with the computer program.
In some aspects of this embodiment, said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.
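A minimal sketch of the "determining differences" step follows. It assumes the first sequence and the reference sequence have already been aligned to the same length (for example by a FASTA or BLAST alignment, which is not reproduced here), with gaps written as "-". Each reported difference is a candidate polymorphism; the function name is hypothetical.

```python
from typing import List, Tuple


def find_differences(first: str, reference: str) -> List[Tuple[int, str, str]]:
    """List (position, reference residue, first-sequence residue) for each mismatch.

    Positions are 1-based alignment columns; '-' marks a gap, so insertions
    and deletions are reported as well as substitutions.
    """
    if len(first) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (column, ref, obs)
        for column, (obs, ref) in enumerate(zip(first, reference), start=1)
        if obs != ref
    ]


print(find_differences("ATGACG-A", "ATGCCGTA"))  # [(4, 'C', 'A'), (7, 'T', '-')]
```

The same function works on aligned polypeptide codes, since it compares characters without assuming a nucleotide alphabet.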
Another embodiment of the present invention is a method for identifying a feature in a sequence selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123, comprising the steps of reading the sequence through the use of a computer program which identifies features in sequences and identifying features in the sequence with said computer program.
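The feature identifier can be sketched as a scan for sequence motifs. The three motifs below (a start codon, the polyadenylation signal AATAAA, and the EcoRI site GAATTC) are illustrative assumptions; a practical identifier would draw on curated motif and domain libraries. Overlapping occurrences are found with a zero-width lookahead.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical feature definitions used only for illustration.
FEATURES: Dict[str, str] = {
    "start_codon": "ATG",
    "polyadenylation_signal": "AATAAA",
    "EcoRI_site": "GAATTC",
}


def identify_features(sequence: str,
                      features: Dict[str, str] = FEATURES) -> List[Tuple[str, int, int]]:
    """Return (feature name, start, end) hits, 1-based and inclusive, sorted by start."""
    seq = sequence.upper()
    hits = []
    for name, motif in features.items():
        # The lookahead makes finditer report overlapping occurrences too.
        for match in re.finditer(f"(?={re.escape(motif)})", seq):
            start = match.start()
            hits.append((name, start + 1, start + len(motif)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


print(identify_features("GAATTCATGAATAAA"))
```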
Brief Description of the Drawings
Figure 1 is a table with all of the parameters that can be used for each step of cDNA analysis.
Figure 2 is an analysis of the 43 amino terminal amino acids of all human SwissProt proteins to determine the frequency of false positives and false negatives using the techniques for signal peptide identification described herein.
Figure 3 provides a diagram of an RT-PCR-based method to isolate cDNAs containing sequences adjacent to the 5' ESTs used to obtain them.
Figure 4 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5' tags.
Figure 5 describes the transcription factor binding sites present in each of these promoters.
Figure 6 is a block diagram of an exemplary computer system.
Figure 7 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database.
Figure 8 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining whether two sequences are homologous.
Figure 9 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting the presence of a feature in a sequence.
Figure 10 illustrates an alignment of the protein of SEQ ID NO: 76, encoded by the cDNA of SEQ ID NO: 26, with the parotid HPSP protein (SEQ ID NO: 124).
Figure 11 illustrates an alignment of the protein of SEQ ID NO: 93, encoded by the cDNA of SEQ ID NO: 43, with a human transmembrane protein (SEQ ID NO: 125). The conserved cysteines are in bold. The conserved region around the second cysteine is underlined. The potential active site QxVxG is in italics.
Figure 12 illustrates an alignment of the protein of SEQ ID NO: 75, encoded by the cDNA of SEQ ID NO: 25, with a human putative sialyltransferase (SEQ ID NO: 126), displaying 89.4% identical residues in a 301 amino acid overlap. The sialylmotif S is in bold. The sialylmotif L is in italics. The potential transmembrane segments are underlined.
Figure 13 illustrates an alignment of the protein of SEQ ID NO: 104, encoded by the extended cDNA of SEQ ID NO: 54, with the murine recombination activating gene 1 inducing protein (SEQ ID NO: 177).
Detailed Description of the Preferred Embodiment
I. Obtaining cDNA libraries including the 5' Ends of their Corresponding mRNAs
The cDNAs of the present invention may include the entire coding sequence of the protein encoded by the corresponding mRNA, including the authentic translation start site, the signal sequence, and the sequence encoding the mature protein remaining after cleavage of the signal peptide. Such cDNAs are referred to herein as "full length cDNAs." Alternatively, the cDNAs may include only the sequence encoding the mature protein remaining after cleavage of the signal peptide, or only the sequence encoding the signal peptide.
The methods explained herein can also be used to obtain cDNAs which encode less than the entire coding sequence of the secreted proteins encoded by the genes corresponding to the cDNAs. In some embodiments, the cDNAs isolated using these methods encode at least 5 amino acids of one of the proteins encoded by the sequences of SEQ ID NOs: 24-73. In further embodiments, the cDNAs encode at least 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 consecutive amino acids of the proteins encoded by the sequences of SEQ ID NOs: 24-73. In a preferred embodiment, the cDNAs encode a full length protein sequence, which includes the protein coding sequences of SEQ ID NOs: 24-73.
The cDNAs of the present invention were obtained from cDNA libraries derived from mRNAs having intact 5' ends as described in Examples 1 to 5 using either a chemical or enzymatic approach.

Preparation of mRNA
Total human RNAs or polyA+ RNAs derived from different tissues were purchased from LABIMO and CLONTECH, respectively, and used to generate cDNA libraries as described below. The purchased RNA had been isolated from cells or tissues using acid guanidinium thiocyanate-phenol-chloroform extraction (Chomczynski and Sacchi, Analytical Biochemistry 162:156-159, 1987). PolyA+ RNA was isolated from total RNA (LABIMO) by two passes of oligo-dT chromatography, as described by Aviv and Leder (Proc. Natl. Acad. Sci. USA 69:1408-1412, 1972), in order to eliminate ribosomal RNA.
The quality and integrity of the polyA+ RNAs were checked. Northern blots hybridized with a probe corresponding to a ubiquitous mRNA, such as elongation factor 1 or elongation factor 2, were used to confirm that the mRNAs were not degraded. Contamination of the polyA+ mRNAs by ribosomal sequences was checked using Northern blots and a probe derived from the sequence of the 28S rRNA.
Preparations of mRNAs with less than 5% of rRNAs were used in library construction. To avoid constructing libraries with RNAs contaminated by exogenous sequences (prokaryotic or fungal), the presence of bacterial 16S ribosomal sequences or of two highly expressed fungal mRNAs was examined using PCR.

Methods for Obtaining mRNAs having Intact 5' Ends
Following preparation of the mRNAs from various tissues as described above, selection of mRNAs with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNAs is performed using either a chemical or an enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which characterizes the 5' end of intact mRNAs and which comprises a guanosine generally methylated once, at the 7 position.
The chemical modification approach involves the optional elimination of the 2',3'-cis diol of the 3' terminal ribose, the oxidation of the 2',3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde to a derivatized oligonucleotide tag. Further details regarding the chemical approaches for obtaining mRNAs having intact 5' ends are disclosed in International Application No. WO 96/34981, published November 7, 1996.
The enzymatic approach for ligating the oligonucleotide tag to the 5' ends of mRNAs with intact 5' ends involves the removal of the phosphate groups present on the 5' ends of uncapped, incomplete mRNAs, the subsequent decapping of mRNAs with intact 5' ends, and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further details regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends are disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat [Cloning full-length cDNAs: difficulties and new perspectives. Contributions to the study of the regulation of rat tryptophan hydroxylase expression], 20 Dec. 1993), EP 0 625 572 and Kato et al., Gene 150:243-250 (1994).
In either the chemical or the enzymatic approach, the oligonucleotide tag contains a restriction enzyme site (e.g. an EcoRI site) to facilitate later cloning procedures.
Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA was then examined by performing a Northern blot using a probe complementary to the oligonucleotide tag.
cDNA Synthesis Using mRNA Templates Having Intact 5' Ends
For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first-strand cDNA synthesis was performed using reverse transcriptase with an oligo-dT primer or a random nonamer. In some instances, this oligo-dT primer contained an internal tag of at least 4 nucleotides which differed from one tissue to another. In order to protect internal EcoRI sites in the cDNA from digestion at later steps in the procedure, methylated dCTP was used for first-strand synthesis. After removal of RNA by alkaline hydrolysis, the first strand of cDNA was precipitated using isopropanol in order to eliminate residual primers.
The second strand of the cDNA was then synthesized with a Klenow fragment using a primer corresponding to the 5' end of the ligated oligonucleotide. Preferably, the primer is 20-25 bases in length.
Methylated dCTP was also used for second-strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during the cloning process.

Cloning of cDNAs derived from mRNA with intact 5' ends into BlueScript
Following second-strand synthesis, the cDNAs were cloned into the phagemid vector pBlueScript II SK- (Stratagene). The ends of the cDNAs were blunted with T4 DNA polymerase (Biolabs) and the cDNA was digested with EcoRI. Since methylated dCTP was used during cDNA synthesis, the EcoRI site present in the tag was the only hemi-methylated site, and hence the only site susceptible to EcoRI digestion. In some instances, to facilitate subcloning, a HindIII adaptor was added to the 3' end of the cDNAs.
The cDNAs were then size-fractionated using either exclusion chromatography (AcA, Biosepra) or electrophoretic separation, which yields 3 or 6 different fractions. The cDNAs were then directionally cloned into pBlueScript using either the EcoRI and SmaI restriction sites or, when the HindIII adaptor was present in the cDNAs, the EcoRI and HindIII restriction sites. The ligation mixture was electroporated into bacteria and propagated under appropriate antibiotic selection.

Selection of Clones Having the Oligonucleotide Tag Attached Thereto
Clones containing the oligonucleotide tag attached to cDNAs were then selected as follows. The plasmid DNAs containing cDNA libraries made as described above were purified (Qiagen). A positive selection of the tagged clones was performed as follows. Briefly, in this selection procedure, the plasmid DNA was converted to single-stranded DNA using gene II endonuclease of the phage f1 in combination with an exonuclease such as exonuclease III or T7 gene 6 exonuclease (Chang et al., Gene 127:95-98, 1993). The resulting single-stranded DNA was then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124-131, 1992. In this procedure, the single-stranded DNA was hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the oligonucleotide tag described in Example 2. Preferably, the primer has a length of 20-25 bases.
Clones including a sequence complementary to the biotinylated oligonucleotide were captured by incubation with streptavidin-coated magnetic beads followed by magnetic selection. After capture of the positive clones, the plasmid DNA was released from the magnetic beads and converted into double-stranded DNA using a DNA polymerase such as ThermoSequenase (Amersham Pharmacia Biotech).
Alternatively, protocols such as the Gene Trapper kit (Gibco BRL) may be used. The double-stranded DNA was then electroporated into bacteria. Using dot blot analysis, the percentage of positive clones having the 5' tag oligonucleotide was estimated to typically range between 90% and 98%.
Following electroporation, the libraries were ordered in 384-well microtiter plates (MTPs). A copy of each MTP was stored for future needs. The libraries were then transferred into 96-well MTPs.
II. Characterization of the 5' Ends of Clones
In order to sequence only cDNAs which contain the 5' ends of their corresponding mRNAs, a first round of sequencing was performed on the 5' ends of clones as described in Example 6. In some instances, only a partial sequence of the clone, herein referred to as a "5' EST", was obtained. In other instances, the complete sequence of the clone, herein referred to as a "cDNA", was obtained. A computer analysis was then performed on the 5' ESTs or cDNAs as described in Examples 7 and 8 in order to evaluate the quality of the cDNA libraries and to select clones containing sequences of interest among cDNAs which contain the 5' ends of their corresponding mRNAs.

Sequencing of the 5' End of cDNA Clones
The 5' ends of cloned cDNAs were then sequenced as follows. Plasmid inserts were first amplified by PCR on PE 9600 thermocyclers (Perkin-Elmer, Applied Biosystems Division, Foster City, CA) using standard SETA-A and SETA-B primers (Genset SA), AmpliTaq Gold (Perkin-Elmer), dNTPs (Boehringer), and buffer and cycling conditions as recommended by the Perkin-Elmer Corporation.
PCR products were then sequenced using automatic ABI Prism 377 sequencers (Perkin Elmer).
Sequencing reactions were performed using PE 9600 thermocyclers with standard dye-primer chemistry and ThermoSequenase (Amersham Pharmacia Biotech). The primers used were either T7 or 21 M13 (available from Genset SA), as appropriate. The primers were labeled with the JOE, FAM, ROX and TAMRA dyes. The dNTPs and ddNTPs used in the sequencing reactions were purchased from Boehringer. Sequencing buffer, reagent concentrations and cycling conditions were as recommended by Amersham.
Following the sequencing reaction, the samples were precipitated with ethanol, resuspended in formamide loading buffer, and loaded on a standard 4% acrylamide gel.
Electrophoresis was performed for 2.5 hours at 3000V on an ABI 377 sequencer, and the sequence data were collected and analyzed using the ABI Prism DNA Sequencing Analysis Software, version 2.1.2.
The sequence data obtained from the sequencing of the 5' ends of all cDNA libraries made as described above were transferred to a proprietary database, where quality control and validation steps were performed. A proprietary base-caller, running on a Unix system, automatically flagged suspect peaks, taking into account the shape of the peaks, the inter-peak resolution, and the noise level. The proprietary base-caller also performed automatic trimming. Any stretch of 25 or fewer bases having more than 4 suspect peaks was considered unreliable and was discarded. Sequences corresponding to cloning vector or ligation oligonucleotides were automatically removed from the sequences.
However, the resulting sequences may contain 1 to 5 nucleotides belonging to the above-mentioned sequences at their 5' end. If needed, these can easily be removed on a case-by-case basis.
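The rule that any stretch of 25 or fewer bases with more than 4 suspect peaks is unreliable can be read as in the sketch below. The per-base `suspect` flags stand in for the peak-shape, resolution and noise checks of the proprietary base-caller, which are not disclosed; the function names are hypothetical, and masking unreliable bases with "N" (to preserve coordinates) rather than deleting them is a design assumption.

```python
from typing import List


def flag_unreliable(suspect: List[bool], max_len: int = 25,
                    max_suspect: int = 4) -> List[bool]:
    """Flag each base lying in a stretch of <= max_len bases with > max_suspect suspect peaks."""
    positions = [i for i, is_suspect in enumerate(suspect) if is_suspect]
    needed = max_suspect + 1  # minimum number of suspect peaks in an unreliable stretch
    flags = [False] * len(suspect)
    # Any qualifying stretch contains `needed` successive suspect peaks whose
    # span is <= max_len, so it suffices to test each such run of peaks.
    for k in range(len(positions) - needed + 1):
        first, last = positions[k], positions[k + needed - 1]
        if last - first + 1 <= max_len:
            for i in range(first, last + 1):
                flags[i] = True
    return flags


def mask_unreliable(sequence: str, suspect: List[bool]) -> str:
    """Replace bases in unreliable stretches with 'N'."""
    if len(sequence) != len(suspect):
        raise ValueError("one suspect flag is needed per base")
    flags = flag_unreliable(suspect)
    return "".join("N" if bad else base for base, bad in zip(sequence, flags))
```

Discarding the flagged bases instead of masking them would only require filtering on the same flags.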
Following sequencing as described above, the sequences of the cDNA clones were entered in a database for storage and manipulation as described below. Before searching the cDNA clones in the database for sequences of interest, cDNAs derived from mRNAs which were not of interest were identified and eliminated, namely endogenous contaminants (ribosomal RNAs, transfer RNAs, mitochondrial RNAs) and exogenous contaminants (prokaryotic RNAs and fungal RNAs), using the software and parameters described in Figure 1. In addition, cDNA sequences showing homology to repeated sequences (Alu, L1, THE and MER repeats, SSTR sequences, or satellite, micro-satellite, or telomeric repeats) were identified and masked in further processing.

Determination of Efficiency of 5' End Selection
To determine the efficiency with which the above selection procedures isolated cDNAs which include the 5' ends of their corresponding mRNAs, the sequences of 5' ESTs or cDNAs were aligned with a reference pool of complete mRNA/cDNA sequences extracted from EMBL release 57 using the FASTA algorithm.
The reference mRNA/cDNA starting at the most 5' transcription start site was obtained, and then compared to the 5' transcription start site position of the 5' EST or cDNA. More than 75% of 5' ESTs or cDNAs had their 5' ends close to the 5' ends of the known sequence. As some of the mRNA sequences available in the EMBL database are deduced from genomic sequences, a 5' end matching these sequences will be counted as an internal match. Thus, the method used here underestimates the yield of 5' ESTs or cDNAs including the authentic 5' ends of their corresponding mRNAs.

Identification of Open Reading Frames Coding For Potential Signal Peptides
The obtained nucleic acid sequences were then screened with proprietary software to identify those having uninterrupted open reading frames (ORFs) with a good coding probability. When the full-length cDNA was obtained, only complete ORFs longer than 150 nucleotides, namely nucleic acid sequences beginning with a start codon and ending with a stop codon, were considered. When only 5' EST sequences were obtained, both complete ORFs longer than 150 nucleotides and incomplete ORFs longer than 60 nucleotides, namely nucleic acid sequences beginning with a start codon and extending up to the end of the 5' EST, were considered.
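The ORF length rules above can be sketched as follows. The sketch scans a single strand, treats every ATG as a potential start codon, reports each ORF once from its most upstream in-frame ATG, and does not model the coding-probability score computed by the proprietary software; the function and parameter names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, full_length: bool, min_complete: int = 150,
              min_incomplete: int = 60) -> List[Tuple[int, int, bool]]:
    """Return (start, end, complete) tuples, 0-based and end-exclusive.

    Complete ORFs (start codon through stop codon) must exceed min_complete
    nucleotides. For 5' EST reads (full_length=False), incomplete ORFs that
    run off the 3' end of the read are kept if they exceed min_incomplete.
    """
    seq = seq.upper()
    orfs = []
    seen = set()  # (frame, stop position) pairs already examined
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        end = None
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        key = (start % 3, end)
        if key in seen:  # internal ATG of an ORF already examined
            continue
        seen.add(key)
        if end is not None and end - start > min_complete:
            orfs.append((start, end, True))
        elif end is None and not full_length and len(seq) - start > min_incomplete:
            orfs.append((start, len(seq), False))
    return orfs
```

For example, `find_orfs("ATG" + "GCT" * 60 + "TAA", full_length=True)` reports a single complete 186-nucleotide ORF.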
The retrieved ORFs were then searched to identify potential signal motifs using slight modifications of the procedures disclosed in Von Heijne, Nucleic Acids Res. 14:4683-4690, 1986. Those 5' EST or cDNA sequences encoding a polypeptide with a score of at least 3.5 in the Von Heijne signal peptide identification matrix were considered to possess a signal sequence. Those 5' ESTs or cDNAs which matched a known human mRNA or EST sequence and had a 5' end more than 30 nucleotides downstream of the known 5' end were excluded from further analysis.
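A weight-matrix score of the kind described above sums position-specific weights over a window spanning positions -13 to +2 around each candidate cleavage site and keeps the best window. The sketch below shows only that mechanism. Its weights are illustrative placeholders (hydrophobic residues favored in the core, small residues at -3 and -1), not the published Von Heijne matrix and not the modified matrix used here, so the 3.5 cutoff applied to its scores is purely illustrative. Restricting the search to the first 43 residues mirrors the evaluation described below and is likewise an assumption.

```python
# Positions relative to a candidate cleavage site between -1 and +1.
WINDOW = list(range(-13, 0)) + [1, 2]
HYDROPHOBIC = set("AFILMVW")
SMALL = set("ACGST")


def toy_weight(position: int, residue: str) -> float:
    """Hypothetical position-specific weight (illustrative values only)."""
    if position in (-3, -1):
        return 1.0 if residue in SMALL else -1.0
    if -13 <= position <= -6:
        return 0.5 if residue in HYDROPHOBIC else -0.25
    return 0.0


def signal_score(protein: str, search_len: int = 43) -> float:
    """Best window score over candidate cleavage sites in the N-terminal region."""
    region = protein[:search_len]
    best = float("-inf")
    # `site` indexes the +1 residue; position -k maps to site - k, +k to site + k - 1.
    for site in range(13, len(region) - 1):
        score = sum(
            toy_weight(pos, region[site + pos if pos < 0 else site + pos - 1])
            for pos in WINDOW
        )
        best = max(best, score)
    return best


def has_signal_peptide(protein: str, threshold: float = 3.5) -> bool:
    """Apply the score cutoff; sequences too short to score return False."""
    return signal_score(protein) >= threshold
```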

Confirmation of Accuracy of Identification of Potential Signal Sequences in 5' ESTs
The accuracy of the above procedure for identifying signal sequences encoding signal peptides was evaluated by applying the method to the 43 amino acids located at the N-terminus of all human SwissProt proteins. The computed Von Heijne score for each protein was compared with the known characterization of the protein as being a secreted protein or a non-secreted protein. In this manner, the number of non-secreted proteins having a score higher than 3.5 (false positives) and the number of secreted proteins having a score lower than 3.5 (false negatives) could be calculated.
Using the results of the above analysis, the probability that a peptide encoded by the 5' region of the mRNA is in fact a genuine signal peptide, given its Von Heijne score, was calculated under the assumption that either 10% or 20% of human proteins are secreted. The results of this analysis are shown in Figure 2.
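The probability calculation described above follows Bayes' rule. If a fraction p of secreted SwissProt proteins and a fraction q of non-secreted ones score at or above the threshold, and a proportion π of human proteins is assumed to be secreted, then P(genuine signal peptide | score at or above threshold) = pπ / (pπ + q(1 - π)). The sketch below implements that formula with hypothetical function names; the rates in the example are made up and do not reproduce Figure 2.

```python
from typing import Sequence, Tuple


def threshold_rates(secreted: Sequence[float], non_secreted: Sequence[float],
                    threshold: float = 3.5) -> Tuple[float, float]:
    """Fraction of secreted / non-secreted proteins scoring at or above threshold."""
    p = sum(score >= threshold for score in secreted) / len(secreted)
    q = sum(score >= threshold for score in non_secreted) / len(non_secreted)
    return p, q  # 1 - p is the false-negative rate, q the false-positive rate


def prob_genuine_signal(p: float, q: float, prior_secreted: float) -> float:
    """Posterior probability that an above-threshold score marks a true signal peptide."""
    true_hits = p * prior_secreted
    false_hits = q * (1.0 - prior_secreted)
    return true_hits / (true_hits + false_hits)


for prior in (0.10, 0.20):  # the two assumptions considered in the text
    print(prior, round(prob_genuine_signal(0.9, 0.05, prior), 3))
```

Raising the assumed share of secreted proteins from 10% to 20% raises the posterior, which is why the text reports results under both assumptions.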
Using the above method of identification of secretory proteins, 5' ESTs of the following polypeptides known to be secreted were obtained: human glucagon, gamma interferon induced monokine precursor, secreted cyclophilin-like protein, human pleiotropin, and human biotinidase precursor. Thus, the above method successfully identified those 5' ESTs which encode a signal peptide.
To confirm that the signal peptide encoded by the 5' ESTs or cDNAs actually functions as a signal peptide, the signal sequences from the 5' ESTs or cDNAs may be cloned into a vector designed for the identification of signal peptides. Such vectors are designed to confer the ability to grow in selective medium only to host cells containing a vector with an operably linked signal sequence. For example, to confirm that a 5' EST or cDNA encodes a genuine signal peptide, the signal sequence of the 5' EST or cDNA may be inserted upstream of and in frame with a non-secreted form of the yeast invertase gene in signal peptide selection vectors such as those described in U.S. Patent No. 5,536,637. Growth of host cells containing signal sequence selection vectors with the correctly inserted 5' EST or cDNA signal sequence confirms that the 5' EST or cDNA encodes a genuine signal peptide.
Alternatively, the presence of a signal peptide may be confirmed by cloning the 5'ESTs or cDNAs into expression vectors such as pXT1 as described below, or by constructing promoter-signal sequence-reporter gene vectors which encode fusion proteins between the signal peptide and an assayable reporter protein. After introduction of these vectors into a suitable host cell, such as COS cells or NIH 3T3 cells, the growth medium may be harvested and analyzed for the presence of the secreted protein. The medium from these cells is compared to the medium from control cells containing vectors lacking the signal sequence or cDNA insert to identify vectors which encode a functional signal peptide or an authentic secreted protein.

Evaluation of Expression Levels and Patterns of mRNAs Corresponding to 5' ESTs or cDNAs
The spatial and temporal expression patterns of the mRNAs corresponding to the 5' ESTs or cDNAs, as well as their expression levels, may be determined. Characterization of the spatial and temporal expression patterns and expression levels of these mRNAs is useful for constructing expression vectors capable of producing a desired level of gene product in a desired spatial or temporal manner, as will be discussed in more detail below.
In addition, cDNAs or 5' ESTs whose corresponding mRNAs are associated with disease states may also be identified. For example, a particular disease may result from lack of expression, overexpression, or underexpression of an mRNA corresponding to a cDNA or 5' EST.
By comparing mRNA expression patterns and quantities in samples taken from healthy individuals with those from individuals suffering from a particular disease, cDNAs and 5' ESTs responsible for the disease may be identified.
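A simple way to carry out the comparison described above is to compute, for each cDNA or 5' EST, the log2 ratio of its mean expression in disease samples to that in healthy samples and to flag large changes. The two-fold cutoff and the pseudocount used to avoid division by zero are assumptions, and a real study would also test statistical significance; the function name and identifiers are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def flag_differential(healthy: Dict[str, List[float]],
                      disease: Dict[str, List[float]],
                      min_log2_fold: float = 1.0,
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """Return log2(disease / healthy) mean-expression ratios of at least min_log2_fold in magnitude."""
    flagged = {}
    for gene in sorted(healthy.keys() & disease.keys()):
        ratio = (mean(disease[gene]) + pseudocount) / (mean(healthy[gene]) + pseudocount)
        log2_fold = math.log2(ratio)
        if abs(log2_fold) >= min_log2_fold:
            flagged[gene] = log2_fold
    return flagged


healthy = {"est_A": [10, 10], "est_B": [5, 7], "est_C": [0, 0]}
disease = {"est_A": [1, 1], "est_B": [6, 6], "est_C": [3, 3]}
print(flag_differential(healthy, disease))
```

Negative values mark under-expression in disease and positive values over-expression, matching the two disease mechanisms mentioned above.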
Expression levels and patterns of mRNAs corresponding to 5' ESTs or cDNAs may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277.
Briefly, a 5' EST, cDNA, or fragment thereof corresponding to the gene encoding the mRNA to be characterized is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA. Preferably, the 5' EST or cDNA is 100 or more nucleotides in length. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (i.e. biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest. The hybridizations are performed under standard stringent conditions (40-50°C for 16 hours in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (e.g. RNases CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.
The 5' ESTs, cDNAs, or fragments thereof may also be tagged with nucleotide sequences for the serial analysis of gene expression (SAGE) as disclosed in UK Patent Application No. 2 305 241 A. In this method, cDNAs are prepared from a cell, tissue, organism or other source of nucleic acid for which it is desired to determine gene expression patterns. The resulting cDNAs are separated into two pools. The cDNAs in each pool are cleaved with a first restriction endonuclease, called an "anchoring enzyme," having a recognition site which is likely to be present at least once in most cDNAs.
The fragments which contain the 5'-most or 3'-most region of the cleaved cDNA are isolated by binding to a capture medium such as streptavidin-coated beads. A first oligonucleotide linker having a first sequence for hybridization of an amplification primer and an internal restriction site for a "tagging endonuclease" is ligated to the digested cDNAs in the first pool. Digestion with the tagging endonuclease produces short "tag" fragments from the cDNAs.
A second oligonucleotide linker having a second sequence for hybridization of an amplification primer and an internal restriction site is ligated to the digested cDNAs in the second pool. The cDNA fragments in the second pool are also digested with the "tagging endonuclease" to generate short "tag" fragments derived from the cDNAs in the second pool. The "tags" resulting from digestion of the first and second pools with the anchoring enzyme and the tagging endonuclease are ligated to one another to produce "ditags." In some embodiments, the ditags are concatemerized to produce ligation products containing from 2 to 200 ditags.
The tag sequences are then determined and compared to the sequences of the 5' ESTs or cDNAs to determine which 5' ESTs or cDNAs are expressed in the cell, tissue, organism, or other source of nucleic acids from which the tags were derived. In this way, the expression pattern of the 5' ESTs or cDNAs in the cell, tissue, organism, or other source of nucleic acids is obtained.
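The comparison of sequenced tags with the 5' ESTs or cDNAs can be sketched in Python as follows. This is a minimal illustration, not the procedure of UK 2 305 241 A. It assumes the anchoring enzyme recognizes CATG (as NlaIII does) and that each tag is the 10 bases immediately 3' of the 3'-most anchoring site; the text names neither the enzyme nor the tag length, and the helper names `expected_tag` and `expression_profile` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumptions, not taken from the text: anchoring site and tag length.
ANCHOR = "CATG"  # hypothetical anchoring-enzyme site (NlaIII)
TAG_LEN = 10     # hypothetical tag length in nucleotides


def expected_tag(seq: str) -> Optional[str]:
    """Tag predicted for a reference 5' EST or cDNA: the TAG_LEN bases just 3'
    of its 3'-most anchoring site, or None if no complete tag exists."""
    seq = seq.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def expression_profile(observed_tags: Iterable[str],
                       references: Dict[str, str]) -> Dict[str, int]:
    """Count how often each reference sequence's predicted tag was sequenced."""
    counts = Counter(tag.upper() for tag in observed_tags)
    profile = {}
    for name, seq in references.items():
        tag = expected_tag(seq)
        profile[name] = counts[tag] if tag else 0  # Counter returns 0 if absent
    return profile
```

A reference whose predicted tag never appears among the sequenced tags gets a count of zero, which is how absence of expression shows up in such a profile.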
Quantitative analysis of gene expression may also be performed using arrays.
As used herein, the term array means a one dimensional, two dimensional, or multidimensional arrangement of full length cDNAs (i.e. cDNAs which include the coding sequence for the signal peptide, the coding sequence for the mature protein, and a stop codon), cDNAs, 5' ESTs or fragments of the full length cDNAs, cDNAs, or 5' ESTs of sufficient length to permit specific detection of gene expression.
Preferably, the fragments are at least 15 nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. More preferably, the fragments are more than 100 nucleotides in length. In some embodiments the fragments may be more than 500 nucleotides in length.
For example, quantitative analysis of gene expression may be performed with full length cDNAs, cDNAs, 5' ESTs, or fragments thereof in a complementary DNA microarray as described by Schena et al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci. U.S.A. 93:10614-10619, 1996). Full length cDNAs, cDNAs, 5' ESTs or fragments thereof are amplified by PCR and arrayed from 96-well microtiter plates onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C.
Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1x SSC/0.2% SDS).
Arrays are scanned in 0.1 x SSC using a fluorescence laser scanning device fitted with a custom filter set.
Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
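The averaging step can be written out as below. This is a sketch assuming that background-corrected intensities for the test and reference channels of each array element are already available; the dictionary layout and the function name `average_ratios` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: gene -> (test intensity, reference intensity)
Signals = Dict[str, Tuple[float, float]]


def average_ratios(hyb1: Signals, hyb2: Signals) -> Dict[str, float]:
    """Average the test/reference ratios of two independent hybridizations,
    for every gene measured in both."""
    result = {}
    for gene in sorted(hyb1.keys() & hyb2.keys()):
        ratios = []
        for test, ref in (hyb1[gene], hyb2[gene]):
            if ref <= 0:
                raise ValueError(f"non-positive reference signal for {gene}")
            ratios.append(test / ref)
        result[gene] = sum(ratios) / len(ratios)
    return result
```

Genes measured in only one of the two hybridizations are left out rather than reported from a single ratio.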
Quantitative analysis of the expression of genes may also be performed with full length cDNAs, cDNAs, 5' ESTs, or fragments thereof in complementary DNA arrays as described by Pietu et al. (Genome Research 6:492-503, 1996). The full length cDNAs, cDNAs, 5' ESTs or fragments thereof are PCR amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization and washing in controlled conditions, the hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative analysis of differentially expressed mRNAs is then performed.

Alternatively, expression analysis of the 5' ESTs or cDNAs can be done through high density nucleotide arrays as described by Lockhart et al. (Nature Biotechnology 14:1675-1680, 1996) and Sosnowski et al. (Proc. Natl. Acad. Sci. 94:1119-1123, 1997). Oligonucleotides of 15-50 nucleotides corresponding to sequences of the 5' ESTs or cDNAs are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. The probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowski et al., supra), the dyes or labeling compounds are detected and quantified.
Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential expression of the mRNA corresponding to the 5' EST or cDNA from which the oligonucleotide sequence has been designed.
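Choosing target oligonucleotides from a 5' EST or cDNA can be sketched as simple tiling, using the preferred 20-nt length. The step size of 10 nt and the function name `tile_oligos` are illustrative assumptions; actual array design would also screen candidates for cross-hybridization and base composition, which this sketch does not do.

```python
from typing import List, Tuple


def tile_oligos(seq: str, length: int = 20, step: int = 10) -> List[Tuple[int, str]]:
    """Hypothetical helper: return (1-based start, oligo) pairs tiled along seq."""
    if not 15 <= length <= 50:
        raise ValueError("oligonucleotides are expected to be 15-50 nt long")
    seq = seq.upper()
    return [(i + 1, seq[i:i + length])
            for i in range(0, len(seq) - length + 1, step)]
```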
III. Characterization of cDNAs including the 5' End of their Corresponding mRNA

Characterization of the Complete Sequence of cDNA Clones

Clones which include the 5' end of their corresponding mRNA and which encode a new protein with a signal peptide as determined in the aforementioned procedure were then fully sequenced as follows.
First, both 5' and 3' ends of cloned cDNAs were sequenced twice in order to confirm the identity of the clone, using a Dye Terminator approach with the AmpliTaq DNA polymerase FS kit available from Perkin Elmer. Second, if the full coding region had not yet been obtained, primer walking was performed, using software such as OSP to choose primers and automated computer software such as ASMG (Sutton et al., Genome Sci. Technol. 1:9-19, 1995) to construct contigs of walking sequences including the initial 5' tag. Contigation was then performed using the 5' and 3' sequences and, where applicable, primer walking sequences.
The sequence was considered complete when the resulting contigs included the full coding region as well as overlapping sequences with vector DNA on both ends. In addition, clones were entirely sequenced in order to obtain at least two sequences per clone. Preferably, the sequences were obtained from both sense and antisense strands. All the contigated sequences for each clone were then used to obtain a consensus sequence which was then submitted to the computer analysis described below.
Alternatively, clones which include the 5' end of their corresponding mRNA and which encode a new protein with a signal peptide, as determined in the aforementioned procedure, may be subcloned into an appropriate vector such as pED6dpc2 (DiscoverEase, Genetics Institute, Cambridge, MA) before full sequencing.

Determination of Structural and Functional Features

Following identification of contaminants and masking of repeats, structural features of the cDNA sequences, e.g. the polyA tail and polyadenylation signal, were determined using the algorithm, parameters and criteria defined in Figure 1. Briefly, a polyA tail was defined as a homopolymeric stretch of at least 11 A's with at most one alternative base within it. The polyA tail search was restricted to the last 100 nt of the sequence and limited to stretches of 11 consecutive A's because sequencing reactions are often not readable after such a polyA stretch. To search for a polyadenylation signal, the polyA tail was clipped from the full-length sequence. The 50 bp preceding the polyA tail were searched for the canonical polyadenylation signal AAUAAA, allowing one mismatch to account for possible sequencing errors as well as known variation in the canonical sequence of the polyadenylation signal.
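The polyA tail and polyadenylation signal rules above can be sketched as follows. This is not the Figure 1 algorithm itself, only a reading of the stated criteria: at least 11 A's with at most one other base, searched in the last 100 nt, then an AATAAA hexamer (the DNA form of AAUAAA) with at most one mismatch in the 50 bp upstream of the clipped tail. Returning the 5'-most qualifying run is a design assumption, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

SIGNAL = "AATAAA"  # DNA form of the canonical AAUAAA polyadenylation signal


def find_polya_tail(seq: str, min_a: int = 11, window: int = 100) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: 0-based half-open (start, end) of the 5'-most run
    in the last `window` nt with >= min_a A's and at most one other base."""
    seq = seq.upper()
    offset = max(0, len(seq) - window)
    region = seq[offset:]
    for i, base in enumerate(region):
        if base != "A":
            continue
        mismatches, last_a = 0, i
        for j in range(i, len(region)):
            if region[j] == "A":
                last_a = j
            else:
                mismatches += 1
                if mismatches > 1:  # a second non-A base ends the run
                    break
        if region[i:last_a + 1].count("A") >= min_a:
            return offset + i, offset + last_a + 1
    return None


def find_polya_signal(seq: str, tail_start: int, upstream: int = 50,
                      max_mismatches: int = 1) -> List[int]:
    """Hypothetical helper: 0-based starts of AATAAA-like hexamers in the
    `upstream` nt preceding the clipped polyA tail."""
    seq = seq.upper()
    lo = max(0, tail_start - upstream)
    hits = []
    for i in range(lo, tail_start - len(SIGNAL) + 1):
        hexamer = seq[i:i + len(SIGNAL)]
        if sum(a != b for a, b in zip(hexamer, SIGNAL)) <= max_mismatches:
            hits.append(i)
    return hits
```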
Functional features of the cDNA sequences, e.g. ORFs and signal sequences, were then determined as follows. The three upper-strand frames of the cDNAs were searched for ORFs, defined as the maximum-length fragments beginning with a translation initiation codon and ending with a stop codon.
ORFs encoding at least 80 amino acids were preferred. Each ORF found was then scanned for the presence of a signal peptide using the matrix method described in Example 10.
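The ORF definition above translates into a short scan of the three upper-strand frames. In this sketch (function name hypothetical), each ORF starts at the first ATG following a stop codon or the sequence start, which makes it maximal in length; ORFs lacking an in-frame stop codon are ignored, consistent with the definition. The 80-amino-acid minimum is the preferred value from the text.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 80) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: (0-based start, end including the stop codon, frame)
    for ORFs in the three upper-strand frames encoding >= min_aa residues."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop: maximal ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:  # residues, stop codon excluded
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```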
The cDNA sequences were then compared, at the nucleotide or protein level, to public sequences available at the time of filing.

Selection of Full Length Sequences

cDNAs that had already been characterized by the aforementioned computer analysis were then submitted to an automatic procedure in order to preselect cDNAs containing sequences of interest.
a) Automatic sequence preselection

All cDNAs clipped for vector on both ends were considered. First, a negative selection was performed in order to eliminate sequences resulting from either contaminants or artifacts, as follows.
Sequences matching contaminant sequences were discarded, as were those encoding ORF sequences exhibiting extensive homology to repeats. Sequences lacking a polyA tail were also discarded. Those cDNAs which matched a known human mRNA or EST sequence and had a 5' end more than 30 nucleotides downstream of the known 5' end were also excluded from further analysis. Only ORFs ending before the polyA tail were kept.
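The negative-selection rules can be expressed as a single predicate. The sketch assumes each clone has already been summarized by earlier analysis steps into the hypothetical `CdnaSummary` record shown; the field names are illustrative, while the 30-nucleotide limit comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CdnaSummary:
    """Hypothetical per-clone summary produced by earlier analysis steps."""
    matches_contaminant: bool
    orf_matches_repeat: bool
    polya_start: Optional[int]               # 0-based start of polyA tail; None if absent
    offset_from_known_5prime: Optional[int]  # nt downstream of a known mRNA/EST 5' end; None if no match
    orf_end: int                             # 0-based exclusive end of the ORF, stop codon included


def passes_negative_selection(c: CdnaSummary, max_offset: int = 30) -> bool:
    """Hypothetical helper applying the discard rules described above."""
    if c.matches_contaminant or c.orf_matches_repeat:
        return False
    if c.polya_start is None:  # no polyA tail
        return False
    if c.offset_from_known_5prime is not None and c.offset_from_known_5prime > max_offset:
        return False           # 5' end too far downstream of the known 5' end
    return c.orf_end <= c.polya_start  # ORF must end before the polyA tail
```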
Then, for each remaining cDNA containing several ORFs, a preselection of ORFs was performed using the following criteria. The longest ORF was preferred. If the ORF sizes were similar, the chosen ORF was the one whose signal peptide had the highest score according to the von Heijne method as defined in Example 10.
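The ORF preselection rule can be sketched as below. The text does not define "similar" sizes, so the 10% length tolerance used here is a hypothetical choice, as are the names `OrfCandidate` and `choose_orf`.

```python
from typing import List, NamedTuple


class OrfCandidate(NamedTuple):
    name: str
    length_aa: int
    signal_score: float  # von Heijne score of the predicted signal peptide


def choose_orf(candidates: List[OrfCandidate], tolerance: float = 0.10) -> OrfCandidate:
    """Prefer the longest ORF; among ORFs within `tolerance` of the longest
    (hypothetical definition of 'similar'), take the best signal-peptide score."""
    longest = max(c.length_aa for c in candidates)
    similar = [c for c in candidates if c.length_aa >= (1 - tolerance) * longest]
    return max(similar, key=lambda c: c.signal_score)
```

With candidates of 300, 290 and 150 residues, the 150-residue ORF is never chosen however high its score, because it is not of similar size to the longest.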
Sequences of cDNA clones were then compared pairwise with BLAST after masking of the repeat sequences. Sequences sharing at least 90% homology over 30 nucleotides were clustered in the same class. Each cluster was then subjected to a Clustal analysis that detects sequences resulting from internal priming or from alternative splicing, identical sequences, or sequences with several frameshifts. This automatic analysis served as a basis for manual selection of the sequences.
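Grouping clones into classes from pairwise hits can be sketched with a union-find structure. The sketch assumes the BLAST hits have already been parsed into (query, subject, percent identity, aligned length) tuples, and it treats the clustering as single linkage, which is an assumption about how classes were built; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, int]  # (query, subject, percent identity, aligned length)


def cluster_clones(names: Iterable[str], hits: Iterable[Hit],
                   min_identity: float = 90.0, min_length: int = 30) -> List[List[str]]:
    """Hypothetical helper: single-linkage classes from pairwise hits reaching
    min_identity over at least min_length nucleotides."""
    parent: Dict[str, str] = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity, length in hits:
        if identity >= min_identity and length >= min_length:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups: Dict[str, List[str]] = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```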
b) Manual sequence selection

Manual selection was carried out using automatically generated reports for each sequenced cDNA clone. During the manual selection procedure, a selection was performed between clones belonging to the same class as follows. ORF sequences encoded by clones belonging to the same class were aligned and compared. If the homology between the nucleotide sequences of clones belonging to the same class was more than 90% over 30 nucleotide stretches, or if the homology between the amino acid sequences of clones belonging to the same class was more than 80% over 20 amino acid stretches, then the clones were considered identical. The chosen ORF was either the one exhibiting matches with known amino acid sequences or the best one according to the criteria mentioned in the automatic sequence preselection section. If the nucleotide and amino acid homologies were less than 90% and 80% respectively, the clones were said to encode distinct proteins, both of which could be selected if they contained sequences of interest.
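The identity decision can be stated as a small predicate. The thresholds come from the text ("more than" is read as a strict inequality); the identity percentages and stretch lengths are assumed to come from a prior alignment, and the function name is hypothetical.

```python
def considered_identical(nt_identity: float, nt_span: int,
                         aa_identity: float, aa_span: int) -> bool:
    """Hypothetical helper: two clones of the same class are identical if
    nucleotide identity exceeds 90% over at least 30 nt, or amino acid
    identity exceeds 80% over at least 20 residues."""
    nucleotide_rule = nt_identity > 90.0 and nt_span >= 30
    protein_rule = aa_identity > 80.0 and aa_span >= 20
    return nucleotide_rule or protein_rule
```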
Selection of full length cDNA clones encoding sequences of interest was performed using the following criteria. Structural parameters (initial tag, polyadenylation site and signal, and, where available, matches with public ESTs at the 5' or 3' end of the sequence) were first checked in order to confirm that the cDNA was complete at both its 5' and 3' ends. Then, homologies with known nucleic acids and proteins were examined in order to determine whether the clone sequence matched a known nucleic acid or protein sequence and, if so, its coverage and the date at which the sequence became public. If there was no extensive match with sequences other than ESTs or genomic DNA, or if the clone sequence included substantial new information, such as encoding a protein resulting from alternative splicing of an mRNA coding for an already known protein, the sequence was kept. Examples of such cloned full length cDNAs containing sequences of interest are described in Example 14. Sequences resulting from chimeras or double inserts, as assessed by homology to other sequences, were discarded during this procedure.

Characterization of Full-length cDNAs

The procedure described above was used to obtain full length cDNAs derived from a variety of tissues. The following list provides a few examples of cDNAs thus obtained.
Using this procedure, the full length cDNA of SEQ ID NO:1 (internal identification number 108-005-5-0-F9-FLC) was obtained. This cDNA encodes a potentially secreted protein (SEQ ID NO:2) with a signal peptide having a von Heijne score of 4.1.
Using this procedure, the full length cDNA of SEQ ID NO:3 (internal identification number 108-004-5-0-G10-FLC) was obtained. This cDNA encodes a potentially secreted protein (SEQ ID NO:4) with a signal peptide having a von Heijne score of 5.3.
Using this procedure, the full length cDNA of SEQ ID NO:5 (internal identification number 108-004-5-0-B12-FLC) was obtained. This cDNA encodes a potentially secreted protein (SEQ ID NO:6) with a signal peptide having a von Heijne score of 7.0.
Using this procedure, the full length cDNA of SEQ ID NO:7 (internal identification number 108-013-5-0-G5-FLC) was obtained. This cDNA encodes a potentially secreted protein (SEQ ID NO:8) with a signal peptide having a von Heijne score of 9.4.
Furthermore, the polypeptides encoded by the extended or full-length cDNAs may be screened for the presence of known structural or functional motifs or for the presence of signatures, small amino acid sequences which are well conserved amongst the members of a protein family.
Some of the results obtained for the polypeptides encoded by full-length cDNAs that were screened for the presence of known protein signatures and motifs using the Proscan software from the GCG package and the Prosite database are provided below.
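Signature screening of this kind can be illustrated with a small converter from PROSITE pattern notation to Python regular expressions. This is a simplified sketch, not the Proscan program or the PROSITE database: it covers only the common syntax (x, [..], {..}, (n) and (n,m) repeats, and the < and > terminal anchors) and does not handle '>' inside brackets, for example. The GXSXG carboxylesterase motif mentioned for SEQ ID NO:10 would be written G-x-S-x-G in this notation. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Hypothetical helper: convert a basic PROSITE pattern to a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # N-terminal anchor
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # C-terminal anchor
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"       # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(protein: str, pattern: str) -> List[Tuple[int, int]]:
    """1-based (start, end) of every match; the lookahead allows overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1))) for m in regex.finditer(protein)]
```

Reporting 1-based inclusive positions matches the way motif positions are given below (e.g. "position 54 to 58").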
The protein of SEQ ID NO:10 encoded by the full-length cDNA SEQ ID NO:9 (internal designation 108-013-5-0-H9-FLC) shows homologies with a family of lysophospholipases conserved among eukaryotes (yeast, rabbit, rodents and human). In addition, some members of this family exhibit a calcium-independent phospholipase A2 activity (Portilla et al., J. Am. Soc. Nephrol., 9:1178-1186 (1998)). All members of this family exhibit the active site consensus GXSXG motif of carboxylesterases that is also found in the protein of SEQ ID NO:10 (positions 54 to 58). In addition, this protein may be a membrane protein with one transmembrane domain as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes, 10:685-686 (1994)). Taken together, these data suggest that the protein of SEQ ID NO:10 may play a role in fatty acid metabolism, probably as a phospholipase. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer, diabetes, and neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. It may also be useful in modulating inflammatory responses to infectious agents and/or in suppressing graft rejection.
The protein of SEQ ID NO:12 encoded by the full-length cDNA SEQ ID NO:11 (internal designation 108-004-5-0-D10-FLC) shows remote homology to a subfamily of beta4-galactosyltransferases widely conserved in animals (human, rodents, cow and chicken). Such enzymes, usually type II membrane proteins located in the endoplasmic reticulum or in the Golgi apparatus, catalyze the biosynthesis of glycoproteins, glycolipid glycans and lactose. Their characteristic features, defined as those of subfamily A in Breton et al., J. Biochem., 123:1000-1009 (1998), are well conserved in the protein of SEQ ID NO:12, especially region I containing the DVD motif (positions 163-165), thought to be involved either in UDP binding or in the catalytic process itself. In addition, the protein of SEQ ID NO:12 has the typical structure of a type II protein. Indeed, it contains a short 28-amino-acid-long N-terminal tail, a transmembrane segment from positions 29 to 49 and a large 278-amino-acid-long C-terminal tail as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes, 10:685-686 (1994)).
Taken together, these data suggest that the protein of SEQ ID NO:12 may play a role in the biosynthesis of polysaccharides and of the carbohydrate moieties of glycoproteins and glycolipids, and/or in cell-cell recognition. Thus, this protein may be useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer, atherosclerosis, cardiovascular disorders, autoimmune disorders and rheumatic diseases including rheumatoid arthritis.
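A rough idea of how hydrophobicity-based transmembrane prediction works can be given with a sliding-window hydropathy scan. This sketch does not reproduce TopPred II; it swaps in the Kyte-Doolittle scale with the 19-residue window and 1.6 mean-hydropathy threshold that Kyte and Doolittle suggested for membrane-spanning segments. The function name is hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (substituted here for the TopPred II method)
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Hypothetical helper: merge all windows whose mean hydropathy exceeds
    `threshold`; return 1-based inclusive spans. Unknown residues score 0."""
    protein = protein.upper()
    segments: List[Tuple[int, int]] = []
    for i in range(len(protein) - window + 1):
        mean = sum(KD_SCALE.get(aa, 0.0) for aa in protein[i:i + window]) / window
        if mean > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # overlapping window: extend
            else:
                segments.append((start, end))
    return segments
```

For a test protein of 10 aspartates, 21 leucines and 10 aspartates, the scan reports positions 6 to 36 rather than the true hydrophobic stretch at 11 to 31, so window-based boundaries should be read as approximate.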
The protein of SEQ ID NO:14 encoded by the extended cDNA SEQ ID NO:13 (internal designation 108-004-5-0-E8-FLC) exhibits the typical PROSITE signature for amino acid permeases (positions 5 to 66), which are integral membrane proteins involved in the transport of amino acids into the cell. In addition, the protein of SEQ ID NO:14 has a transmembrane segment from positions 9 to 29 as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes, 10:685-686 (1994)). Taken together, these data suggest that the protein of SEQ ID NO:14 may be involved in amino acid transport. Thus, this protein may be useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer, aminoacidurias, neurodegenerative diseases, anorexia, chronic fatigue, coronary vascular disease, diphtheria, hypoglycemia, male infertility, muscular and myopathies.
Bacterial clones containing plasmids containing the full length cDNAs described above are presently stored in the inventors' laboratories under the internal identification numbers provided above. The inserts may be recovered from the deposited materials by growing an aliquot of the appropriate bacterial clone in the appropriate medium. The plasmid DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with primers designed at both ends of the cDNA insertion. The PCR product which corresponds to the cDNA can then be manipulated using standard cloning techniques familiar to those skilled in the art.
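A minimal sketch of deriving such end primers from a known insert sequence is shown below. The 20-nt primer length and the helper names are hypothetical; the sketch performs no melting-temperature, GC-content or specificity checks, and in practice primers may also be placed in the flanking vector sequence.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def end_primers(insert: str, length: int = 20) -> Tuple[str, str]:
    """Hypothetical helper: forward primer = first `length` nt of the insert;
    reverse primer = reverse complement of its last `length` nt."""
    insert = insert.upper()
    if len(insert) < 2 * length:
        raise ValueError("insert is shorter than the two primers combined")
    return insert[:length], reverse_complement(insert[-length:])
```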
The above procedure was also used to obtain the cDNAs of the invention having the sequences of SEQ ID NOs: 24-73. Table I provides the sequence identification numbers of the cDNAs of the present invention, the locations of the first and last nucleotides of the full coding sequences in SEQ ID NOs: 24-73 (i.e. the nucleotides encoding both the signal peptide and the mature protein, listed under the heading FCS Location in Table I), the locations of the first and last nucleotides in SEQ ID NOs: 24-73 which encode the signal peptides (listed under the heading SigPep Location in Table I), the locations of the first and last nucleotides in SEQ ID NOs: 24-73 which encode the mature proteins generated by cleavage of the signal peptides (listed under the heading Mature Polypeptide Location in Table I), the locations in SEQ ID NOs: 24-73 of stop codons (listed under the heading Stop Codon Location in Table I), the locations of the first and last nucleotides in SEQ ID NOs: 24-73 of the polyA signals (listed under the heading Poly A Signal Location in Table I) and the locations of the first and last nucleotides of the polyA sites (listed under the heading Poly A Site Location in Table I).
Table II lists the sequence identification numbers of the polypeptides of SEQ ID NOs: 74-123, the locations of the first and last amino acid residues of SEQ ID NOs: 74-123 in the full length polypeptide (second column), the locations of the first and last amino acid residues of SEQ ID NOs: 74-123 in the signal peptides (third column), and the locations of the first and last amino acid residues of SEQ ID NOs: 74-123 in the mature polypeptide created by cleaving the signal peptide from the full length polypeptide (fourth column).
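The relation between the nucleotide coordinates of Table I and the residue coordinates of Table II can be sketched as follows. The sketch assumes that the FCS excludes the stop codon (Table I lists stop codons separately) and that residues are numbered from 1 at the initiator methionine; Table II may use a different convention (for example, negative numbering of signal-peptide residues), so this is illustrative only. The coordinates in the test are invented, and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def nt_to_aa(nt_pos: int, fcs_start: int) -> int:
    """Hypothetical helper: 1-based residue encoded by 1-based nucleotide nt_pos."""
    if nt_pos < fcs_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nt_pos - fcs_start) // 3 + 1


def table_ii_row(fcs_start: int, fcs_end: int, sigpep_end: int) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: residue ranges from Table I-style coordinates,
    assuming the FCS excludes the stop codon."""
    length_nt = fcs_end - fcs_start + 1
    if length_nt % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    full_length = length_nt // 3
    last_signal = nt_to_aa(sigpep_end, fcs_start)
    return {
        "full": (1, full_length),
        "signal": (1, last_signal),
        "mature": (last_signal + 1, full_length),
    }
```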
The nucleotide sequences of SEQ ID NOs: 24-73 and the amino acid sequences encoded by SEQ ID NOs: 24-73 (i.e. the amino acid sequences of SEQ ID NOs: 74-123) are provided in the appended sequence listing. In some instances, the sequences are preliminary and may include some incorrect or ambiguous sequences or amino acids. All instances of the symbol "n" in the nucleic acid sequences mean that the nucleotide can be adenine, guanine, cytosine or thymine. For each amino acid sequence, Applicants have identified what they have determined to be the reading frame best identifiable with sequence information available at the time of filing. In some instances the polypeptide sequences in the Sequence Listing contain the symbol "Xaa." These "Xaa" symbols indicate either (1) a residue which cannot be identified because of nucleotide sequence ambiguity or (2) a stop codon in the determined sequence where applicants believe one should not exist (if the sequence were determined more accurately). Thus, "Xaa" indicates that a residue may be any of the twenty amino acids. In some instances, several possible identities of the unknown amino acids may be suggested by the genetic code.
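The way the genetic code can narrow down an ambiguous codon can be sketched as a translation that expands every "n" to all four bases. The codon is reported as X (Xaa) only when the expansions encode different residues. The sketch uses the standard genetic code, translates the first frame only, shows stop codons as '*', and does not handle case (2) above, which requires judgment about the sequence. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_ambiguous(seq: str) -> str:
    """Hypothetical helper: translate frame 1; a codon containing 'n' (or any
    non-ACGT symbol) becomes X unless all expansions encode the same residue."""
    seq = seq.upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        choices = [base if base in "ACGT" else "ACGT" for base in seq[i:i + 3]]
        encoded = {CODON_TABLE["".join(codon)] for codon in product(*choices)}
        residues.append(encoded.pop() if len(encoded) == 1 else "X")
    return "".join(residues)
```

For example, GCN always encodes alanine, so it is resolved, whereas TTN may encode phenylalanine or leucine and is reported as X.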
The sequences of SEQ ID NOs: 24-73 can readily be screened for any errors therein, and any sequence ambiguities can be resolved by resequencing a fragment containing such errors or ambiguities on both strands. Nucleic acid fragments for resolving sequencing errors or ambiguities may be obtained from the deposited clones or can be isolated using the techniques described herein.
Resolution of any such ambiguities or errors may be facilitated by using primers which hybridize to sequences located close to the ambiguous or erroneous sequences. For example, the primers may hybridize to sequences within 50-75 bases of the ambiguity or error. Upon resolution of an error or ambiguity, the corresponding corrections can be made in the protein sequences encoded by the DNA containing the error or ambiguity. The amino acid sequence of the protein encoded by a particular clone can also be determined by expression of the clone in a suitable host cell, collecting the protein, and determining its sequence.
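Locating candidate primer sites around each ambiguous base can be sketched as below. "Within 50-75 bases" is read here as hybridization regions lying 50 to 75 bases on either side of the ambiguity (a sense primer upstream, an antisense primer downstream); that reading and the function name are assumptions.

```python
from typing import List, Optional, Tuple

Window = Optional[Tuple[int, int]]


def primer_windows(seq: str, min_dist: int = 50,
                   max_dist: int = 75) -> List[Tuple[int, Window, Window]]:
    """Hypothetical helper: for each ambiguous base (anything other than A, C,
    G, T), return its 1-based position plus the upstream and downstream
    regions (1-based inclusive) lying min_dist-max_dist bases away, or None
    where such a region falls off the sequence."""
    results = []
    for i, base in enumerate(seq.upper()):
        if base in "ACGT":
            continue
        pos = i + 1
        up = (max(1, pos - max_dist), pos - min_dist)
        down = (pos + min_dist, min(len(seq), pos + max_dist))
        results.append((pos,
                        up if up[0] <= up[1] else None,
                        down if down[0] <= down[1] else None))
    return results
```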
Categorization of cDNAs of the Present Invention

The nucleic acid sequences of the present invention (SEQ ID NOs: 24-73) were grouped based on their homology to known sequences as follows. All sequences were compared to EMBL release 58 and daily releases available at the time of filing using BLASTN.
In some instances, the cDNAs did not match any known vertebrate sequence nor any publicly available EST sequence, thus being completely new.
All sequences exhibiting more than 90% homology to known sequences over at least 30 nucleotides were retrieved and further analyzed. Table III gives the sequence identification numbers of these cDNAs (first column) and the positions of preferred fragments within these sequences (second column, entitled "Positions of preferred fragments"). Each fragment is represented by x-y, where x and y are the start and end positions, respectively, of a given preferred fragment. Preferred fragments are separated from each other by a comma. As used herein, the term "polynucleotide described in Table III" refers to all of the preferred polynucleotide fragments defined in Table III in this manner. The present invention encompasses isolated, purified, or recombinant nucleic acids which consist of, consist essentially of, or comprise a contiguous span of one of the sequences of SEQ ID NOs: 24-73 or a sequence complementary thereto, said contiguous span comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 nucleotides of the sequence of SEQ ID NOs: 24-73 or a sequence complementary thereto, to the extent that a contiguous span of these lengths is consistent with the length of the particular sequence, wherein the contiguous span comprises at least 1, 2, 3, 5, 10, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 nucleotides of a polynucleotide described in Table III, or a sequence complementary thereto. The present invention also encompasses isolated, purified, or recombinant nucleic acids comprising, consisting essentially of, or consisting of a contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 nucleotides of a polynucleotide described in Table III or a sequence complementary thereto, to the extent that a contiguous span of these lengths is consistent with the length of the particular sequence described in Table III.
The present invention also encompasses isolated, purified, or recombinant nucleic acids which comprise, consist of or consist essentially of a polynucleotide described in Table III, or a sequence complementary thereto. The present invention further encompasses any combination of the nucleic acids listed in this paragraph.
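The "x-y" notation of Table III and the overlap condition on contiguous spans can be illustrated as follows. The fragment string in the test is invented and is not taken from Table III; the helper names are hypothetical.

```python
from typing import List, Tuple


def parse_fragments(field: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: parse a 'Positions of preferred fragments' entry
    such as '1-120, 300-450' into (start, end) tuples."""
    fragments = []
    for item in field.split(","):
        item = item.strip()
        if item:
            start, end = (int(v) for v in item.split("-"))
            fragments.append((start, end))
    return fragments


def max_overlap(span: Tuple[int, int], fragments: List[Tuple[int, int]]) -> int:
    """Largest number of nucleotides the 1-based inclusive span shares with
    any single preferred fragment (0 if none)."""
    a, b = span
    return max((max(0, min(b, y) - max(a, x) + 1) for x, y in fragments), default=0)
```

A span meets the "comprises at least N nucleotides of a polynucleotide described in Table III" condition when `max_overlap` returns N or more.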
Cells containing the cDNAs (SEQ ID NOs: 24-73) of the present invention in the vector pBluescript II SK- (Stratagene) are maintained in permanent deposit by the inventors at Genset, S.A., 24 Rue Royale, 75008 Paris, France.
A pool of the cells containing the cDNAs (SEQ ID NOs: 24-73), from which the cells containing a particular polynucleotide are obtainable, was deposited on June 17, 1999, with the European Collection of Cell Cultures (ECACC), Vaccine Research and Production Laboratory, Public Health Laboratory Service, Centre for Applied Microbiology and Research, Porton Down, Salisbury, Wiltshire SP4 0JG, United Kingdom. In addition, a pool of the cells containing the extended cDNAs (SEQ ID NOs: 47-73), from which the cells containing a particular polynucleotide are obtainable, was deposited on December 18, 1998, with the European Collection of Cell Cultures (ECACC), Vaccine Research and Production Laboratory, Public Health Laboratory Service, Centre for Applied Microbiology and Research, Porton Down, Salisbury, Wiltshire SP4 0JG, United Kingdom. Each cDNA clone has been transformed into separate bacterial cells (E. coli) for these composite deposits. In particular, cells containing the sequences of SEQ ID NOs: 25-40 and 42-46 were deposited on June 17, 1999 in the pool having ECACC Accession No. 99061735 and designated SignalTag 15061999. In addition, cells containing the sequences of SEQ ID NOs: 47-73 were deposited on December 18, 1998, in the pool having ECACC Accession No. 98121805 and designated SignalTag 166-191. Table IV provides the internal designation number assigned to each SEQ ID NO. and indicates whether the sequence is a nucleic acid sequence or a protein sequence.
Each cDNA can be removed from the Bluescript vector in which it was deposited by performing a BsH II double digestion to produce the appropriate fragment for each clone, provided the cDNA clone sequence does not contain this restriction site. Alternatively, other restriction enzymes of the multicloning site of the vector may be used to recover the desired insert as indicated by the manufacturer.

Bacterial cells containing a particular clone can be obtained from the composite deposit as follows:

An oligonucleotide probe or probes should be designed to the sequence that is known for that particular clone. This sequence can be derived from the sequences provided herein, or from a combination of those sequences. The design of the oligonucleotide probe should preferably follow these parameters:

(a) It should be designed to an area of the sequence which has the fewest ambiguous bases ("N's"), if any;

(b) Preferably, the probe is designed to have a Tm of approx. 80°C (assuming 2 degrees for each A or T and 4 degrees for each G or C). However, probes having melting temperatures between 40°C and 80°C may also be used provided that specificity is not lost.
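Parameters (a) and (b) can be combined into a simple window search: prefer windows with the fewest N's, then the Tm closest to the target, with Tm computed by the 2°C per A/T and 4°C per G/C rule stated above. The probe length and helper names are hypothetical; for a 25-mer, for instance, 15 G/C and 10 A/T bases give exactly 80°C under this rule.

```python
from typing import Tuple


def wallace_tm(oligo: str) -> int:
    """Tm in degrees C using 2 per A/T and 4 per G/C, as stated in (b)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def pick_probe(seq: str, length: int = 25, target_tm: int = 80) -> Tuple[int, str, int]:
    """Hypothetical helper: (1-based start, probe, Tm) of the window with the
    fewest N's, ties broken by Tm closest to target_tm, then by position."""
    seq = seq.upper()
    best = None
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        key = (window.count("N"), abs(wallace_tm(window) - target_tm))
        if best is None or key < best[0]:
            best = (key, i + 1, window)
    if best is None:
        raise ValueError("sequence is shorter than the probe")
    return best[1], best[2], wallace_tm(best[2])
```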

The oligonucleotide should preferably be labeled with γ-[32P]ATP (specific activity 6000 Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling oligonucleotides. Other labeling techniques can also be used. Unincorporated label should preferably be removed by gel filtration chromatography or other established methods. The amount of radioactivity incorporated into the probe should be quantified by measurement in a scintillation counter. Preferably, the specific activity of the resulting probe should be approximately 4x10^6 dpm/pmole.

The bacterial culture containing the pool of full-length clones should preferably be thawed and 100 µl of the stock used to inoculate a sterile culture flask containing 25 ml of sterile L-broth containing ampicillin at 100 µg/ml. The culture should preferably be grown to saturation at 37°C, and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of these dilutions should preferably be plated to determine the dilution and volume which will yield approximately 5000 distinct and well-separated colonies on solid bacteriological media containing L-broth containing ampicillin at 100 µg/ml and agar at 1.5% in a 150 mm petri dish when grown overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can also be employed.
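Finding the dilution and volume that give about 5000 colonies reduces to a titer calculation from a test plating. The colony counts and volumes in the test are invented for illustration, and the helper names are hypothetical.

```python
def stock_titer(colonies: int, volume_plated_ul: float, dilution: float) -> float:
    """Colony-forming units per microliter of undiluted culture, from a test
    plating of `volume_plated_ul` of a dilution (e.g. 1e-6 for a 10^-6 dilution)."""
    return colonies / (volume_plated_ul * dilution)


def plating_volume_ul(titer: float, dilution: float, target_colonies: int = 5000) -> float:
    """Volume of a given dilution expected to yield `target_colonies` colonies."""
    return target_colonies / (titer * dilution)
```

For example, 250 colonies from 100 µl of a 10^-6 dilution imply 2.5x10^6 CFU/µl in the saturated culture, so about 200 µl of a 10^-5 dilution would be plated per 150 mm dish.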

Standard colony hybridization procedures should then be used to transfer the colonies to nitrocellulose filters and lyse, denature and bake them. The filter is then preferably incubated at 65°C for 1 hour with gentle agitation in 6X SSC (20X stock is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100 µg/ml of yeast RNA, and 10 mM EDTA (approximately … ml per 150 mm filter). Preferably, the probe is then added to the hybridization mix at a concentration greater than or equal to 1×10^6 dpm/ml. The filter is then preferably incubated at 65°C with gentle agitation overnight. The filter is then preferably washed in 500 ml of 2X SSC/0.1% SDS at room temperature with gentle shaking for 15 minutes. An additional wash with 0.1X SSC/0.5% SDS at 65°C for 30 minutes to 1 hour is optional.
The filter is then preferably dried and subjected to autoradiography for sufficient time to visualize the positives on the X-ray film. Other known hybridization methods can also be employed.
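The 20X SSC recipe above fixes the Na+ concentration of any SSC working strength, a quantity that enters common melting-temperature formulas. The sketch below assumes the citrate is trisodium citrate dihydrate (294.10 g/mol), the usual SSC reagent; the function names are illustrative.

```python
NACL_G_PER_MOL = 58.44
NA3_CITRATE_2H2O_G_PER_MOL = 294.10   # assumption: trisodium citrate dihydrate

def ssc_20x_molarities(nacl_g_per_l: float = 175.3, citrate_g_per_l: float = 88.2):
    """Molar NaCl and sodium citrate in the 20X stock."""
    return nacl_g_per_l / NACL_G_PER_MOL, citrate_g_per_l / NA3_CITRATE_2H2O_G_PER_MOL

def ssc_sodium_molarity(strength_x: float) -> float:
    """Total Na+ (mol/l) in SSC of the given X strength, diluted from the 20X stock.

    Each NaCl contributes one Na+ and each trisodium citrate contributes three.
    """
    nacl_m, citrate_m = ssc_20x_molarities()
    return (strength_x / 20.0) * (nacl_m + 3 * citrate_m)


for x in (6, 2, 0.1):
    print(f"{x}X SSC: [Na+] = {ssc_sodium_molarity(x):.3f} M")
```

The 20X stock works out to about 3.0 M NaCl and 0.3 M sodium citrate, so 6X SSC carries roughly 1.17 M Na+.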
The positive colonies are picked, grown in culture, and plasmid DNA is isolated using standard procedures. The clones can then be verified by restriction analysis, hybridization analysis, or DNA sequencing.
The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, PCR can be performed with primers designed at both ends of the cDNA insert. The PCR product which corresponds to the cDNA can then be manipulated using standard cloning techniques familiar to those skilled in the art.
In some cases, the cDNA clones obtained by the process described in Examples 1 through 13 may not include the entire coding sequence of the protein encoded by the corresponding mRNA, although they do include sequences derived from the 5' ends of their corresponding mRNAs. Such 5' ESTs can be used to isolate extended cDNAs which contain sequences adjacent to the 5' ESTs. The extended cDNAs obtained in this way may include the entire coding sequence of the protein encoded by the corresponding mRNA, including the authentic translation start site. Examples 16 and 17 below describe methods for obtaining extended cDNAs using 5' ESTs. Example 17 also describes methods to obtain cDNAs, mRNAs or genomic DNAs homologous to cDNAs, 5' ESTs, or fragments thereof.
The methods of Examples 16 and 17 can also be used to obtain cDNAs which encode less than the entire coding sequence of proteins encoded by the genes corresponding to the 5' ESTs. In some embodiments, the cDNAs isolated using these methods encode at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 consecutive amino acids of one of the proteins encoded by the sequences of SEQ ID NOs. 24-73.

General Method for Using 5' ESTs to Clone and Sequence cDNAs which Include the Entire Coding Region and the Authentic 5' End of the Corresponding mRNA
The following general method may be used to quickly and efficiently isolate cDNAs including sequence adjacent to the sequences of the 5' ESTs used to obtain them. This method, illustrated in Figure 3, may be applied to obtain cDNAs for any 5' EST.
The method takes advantage of the known 5' sequence of the mRNA. A reverse transcription reaction is conducted on purified mRNA with a poly dT primer containing a nucleotide sequence at its 5' end allowing the addition of a known sequence at the end of the cDNA which corresponds to the 3' end of the mRNA. Such a primer and a commercially-available reverse transcriptase enzyme are added to a buffered mRNA sample yielding a reverse transcript anchored at the 3' polyA site of the RNAs. Nucleotide monomers are then added to complete the first strand synthesis. After removal of the mRNA hybridized to the first cDNA strand by alkaline hydrolysis, the products of the alkaline hydrolysis and the residual poly dT primer can be eliminated with an exclusion column.
Subsequently, a pair of nested primers on each end is designed based on the known 5' sequence from the 5' EST and the known 3' end added by the poly dT primer used in the first strand synthesis.
Software used to design primers is either based on GC content and melting temperatures of oligonucleotides, such as OSP (Hillier and Green, PCR Meth. Appl. 1:124-128, 1991), or based on the octamer frequency disparity method (Griffais et al., Nucleic Acids Res. 19:3887-3891, 1991), such as PC-Rare (http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html).
Preferably, the nested primers at the 5' end and the nested primers at the 3' end are separated from one another by four to nine bases. These primer sequences may be selected to have melting temperatures and specificities suitable for use in PCR.
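The preference for nested primers offset by four to nine bases can be checked alongside simple GC-content and melting-temperature screens. This sketch is not OSP or PC-Rare: it swaps in the Wallace rule (2°C per A or T, 4°C per G or C), a rough approximation for short oligonucleotides, and it reads "separated by four to nine bases" as the offset between the 5' ends of the outer and inner primers of the 5' pair, which is an assumption. The template sequence and primer positions are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer: str) -> int:
    """Rough Tm in °C for short oligos: 2 per A/T plus 4 per G/C (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def nested_offset_ok(outer_start: int, inner_start: int, lo: int = 4, hi: int = 9) -> bool:
    """True if the inner primer's 5' end lies lo..hi bases downstream of the outer one's."""
    return lo <= inner_start - outer_start <= hi


template = "CAGGAAGCTCTGGACCTCAGCGCCATGGCGAAGCTGCTGCTG"  # hypothetical 5' EST sequence
outer, inner = template[0:20], template[6:26]
for name, p in (("outer", outer), ("inner", inner)):
    print(name, p, f"GC={gc_fraction(p):.2f}", f"Tm~{wallace_tm(p)}C")
print("offset acceptable:", nested_offset_ok(0, 6))
```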
A first PCR run is performed using the outer primer from each of the nested pairs. A second PCR run using the inner primer from each of the nested pairs is then performed on a small aliquot of the first PCR product. Thereafter, the primers and remaining nucleotide monomers are removed.
Because the OSP software places no positional constraints on the design of PCR-compatible 5' nested primers, amplicons of two types are obtained. Preferably, the second 5' primer is located upstream of the translation initiation codon, thus yielding a nested PCR product containing the entire coding sequence. Such a cDNA may be used in a direct cloning procedure such as the one described in Example 4.
However, in some cases, the second 5' primer is located downstream of the translation initiation codon, thereby yielding a PCR product containing only part of the ORF. For such amplicons, which do not contain the complete coding sequence, intermediate steps are necessary both to determine the complete coding sequence and to obtain a PCR product containing it. The complete coding sequence can be assembled from several partial sequences determined directly from different PCR products. Once the full coding sequence has been determined, new PCR-compatible primers are designed to obtain amplicons containing the whole coding region. In such cases, however, the PCR-compatible 3' primers are located inside the 3' UTR of the corresponding mRNA, thus yielding amplicons which lack part of this region, i.e. the polyA tract and sometimes the polyadenylation signal, as illustrated in Figure 3. The cDNAs thus obtained are then cloned into an appropriate vector using a procedure essentially similar to the one described in Example 4. Full-length PCR products are then sequenced using a procedure similar to the one described in Example 11. Completion of the sequencing of a given cDNA fragment may be assessed by comparing the sequence length to the size of the corresponding nested PCR product. When Northern blot data are available, the size of the mRNA detected for a given PCR product may also be used to confirm that the sequence is complete. Sequences which do not fulfill these criteria are discarded and undergo a new isolation procedure.
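Assembling the complete coding sequence from partial sequences of different PCR products amounts, in the simplest case, to joining reads through their overlaps. The sketch below uses exact suffix-prefix matching on fragments already ordered 5' to 3'. It is a simplification (practical assembly tolerates sequencing errors and considers both strands), and the minimum overlap, function names and example reads are hypothetical.

```python
from typing import List, Optional

def merge_overlap(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join two reads when a suffix of `left` exactly equals a prefix of `right`.

    Tries the longest possible overlap first; returns None if none reaches min_overlap.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None

def assemble(fragments: List[str], min_overlap: int = 20) -> str:
    """Greedy left-to-right assembly of fragments ordered 5' to 3'."""
    contig = fragments[0]
    for frag in fragments[1:]:
        merged = merge_overlap(contig, frag, min_overlap)
        if merged is None:
            raise ValueError("adjacent fragments do not overlap; a gap remains")
        contig = merged
    return contig


reads = ["ATGCCGTA", "CGTAGGA", "AGGATTTGA"]   # hypothetical partial sequences
print(assemble(reads, min_overlap=3))          # ATGCCGTAGGATTTGA
```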
Full-length PCR products are then cloned in an appropriate vector. For example, the cDNAs can be cloned into a vector using a procedure similar to the one described in Example 4. Such full-length cDNA clones are then double-sequenced and submitted to computer analyses using procedures essentially similar to the ones described in Examples 11 through 13. However, it will be appreciated that full-length cDNA clones obtained from amplicons lacking part of the 3' UTR may lack polyadenylation sites and signals.

Methods for Obtaining cDNAs or Nucleic Acids Homologous to cDNAs or Fragments Thereof
In addition to PCR-based methods for obtaining cDNAs, traditional hybridization-based methods may also be employed. These methods may also be used to obtain the genomic DNAs which encode the mRNAs from which the cDNA is derived, mRNAs corresponding to the cDNAs, or nucleic acids which are homologous to cDNAs or fragments thereof. Indeed, cDNAs of the present invention or fragments thereof, including 5' ESTs, may also be used to isolate cDNAs or nucleic acids homologous to cDNAs from a cDNA library or a genomic DNA library as follows. Such cDNA libraries or genomic DNA libraries may be obtained from a commercial source or made using techniques familiar to those skilled in the art such as the one described in Examples 1 through 5. An example of such hybridization-based methods is provided below.
Techniques for identifying cDNA clones in a cDNA library which hybridize to a given probe sequence are disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989. The same techniques may be used to isolate genomic DNAs.
Briefly, cDNA or genomic DNA clones which hybridize to the detectable probe are identified and isolated for further manipulation as follows. A probe comprising at least 10 consecutive nucleotides from the cDNA or fragment thereof is labeled with a detectable label such as a radioisotope or a fluorescent molecule.
Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the cDNA or fragment thereof. More preferably, the probe comprises 20 to 30 consecutive nucleotides from the cDNA or fragment thereof. In some embodiments, the probe comprises more than 30 nucleotides from the cDNA or fragment thereof.
Techniques for labeling the probe are well known and include phosphorylation with polynucleotide kinase, nick translation, in vitro transcription, and non-radioactive techniques. The cDNAs or genomic DNAs in the library are transferred to a nitrocellulose or nylon filter and denatured. After blocking of nonspecific sites, the filter is incubated with the labeled probe for an amount of time sufficient to allow binding of the probe to cDNAs or genomic DNAs containing a sequence capable of hybridizing thereto.
By varying the stringency of the hybridization conditions used to identify cDNAs or genomic DNAs which hybridize to the detectable probe, cDNAs or genomic DNAs having different levels of homology to the probe can be identified and isolated as described below.
1. Isolation of cDNA or Genomic DNA Sequences Having a High Degree of Homology to the Labeled Probe
To identify cDNAs or genomic DNAs having a high degree of homology to the probe sequence, the melting temperature of the probe may be calculated using the following formulas:
For probes between 14 and 70 nucleotides in length, the melting temperature (Tm) is calculated using the formula:
$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na^+}] + 0.41\,(\%\,\mathrm{G{+}C}) - \frac{600}{N}$$
where N is the length of the probe.
If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation:
$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na^+}] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\,\mathrm{formamide}) - \frac{600}{N}$$
where N is the length of the probe.
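The two melting-temperature formulas above can be evaluated directly. This minimal sketch assumes [Na+] in mol/l with a base-10 logarithm, and takes the G+C term as percent G+C, the usual form of this empirical formula (with a fraction, the 0.41 coefficient would contribute less than half a degree). The 14 to 70 nucleotide range stated for the first formula is enforced for both; the function name and the example probe are illustrative.

```python
import math

def probe_tm(length_nt: int, gc_percent: float, na_molar: float,
             formamide_percent: float = 0.0) -> float:
    """Melting temperature (°C) of a 14-70 nt probe.

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 0.63*(%formamide) - 600/N
    With formamide_percent = 0 this reduces to the formamide-free formula.
    """
    if not 14 <= length_nt <= 70:
        raise ValueError("formula stated for probes of 14 to 70 nucleotides")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.63 * formamide_percent - 600.0 / length_nt)


# Hypothetical 20-mer, 50% G+C, 1 M Na+: 81.5 + 0 + 20.5 - 30 = 72.0 °C
print(probe_tm(20, 50, 1.0))
# Same probe in 50% formamide: 72.0 - 31.5 = 40.5 °C
print(probe_tm(20, 50, 1.0, formamide_percent=50))
```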
Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg/ml denatured fragmented salmon sperm DNA, or in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg/ml denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.
Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above. Where the probe comprises double stranded DNA, it is denatured before addition to the hybridization solution. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25°C below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10°C below the Tm. Preferably, for hybridizations in 6X SSC, the hybridization is conducted at approximately 68°C. Preferably, for hybridizations in 50% formamide containing solutions, the hybridization is conducted at approximately 42°C.
All of the foregoing hybridizations would be considered to be under "stringent" conditions.
Following hybridization, the filter is washed in 2X SSC, 0.1% SDS at room temperature for 15 minutes. The filter is then washed with 0.1X SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour. Thereafter, the filter is washed at the hybridization temperature in 0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X SSC at room temperature.
cDNAs or genomic DNAs which have hybridized to the probe are identified by autoradiography or other conventional techniques.
2. Isolation of cDNA or Genomic DNA Sequences Having Lower Degrees of Homology to the Labeled Probe
The above procedure may be modified to identify cDNAs or genomic DNAs having decreasing levels of homology to the probe sequence. For example, to obtain cDNAs or genomic DNAs of decreasing homology to the detectable probe, less stringent conditions may be used. For example, the hybridization temperature may be decreased in increments of 5°C from 68°C to 42°C in a hybridization buffer having a sodium concentration of approximately 1 M. Following hybridization, the filter may be washed with 2X SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be "moderate" conditions above 50°C and "low" conditions below 50°C.
Alternatively, the hybridization may be carried out in buffers, such as 6X SSC, containing formamide at a temperature of 42°C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6X SSC, 0.5% SDS at 50°C. These conditions are considered to be "moderate" conditions above 25% formamide and "low" conditions below 25% formamide. cDNAs or genomic DNAs which have hybridized to the probe are identified by autoradiography or other conventional techniques.
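The two reduced-stringency series just described can be enumerated mechanically. This sketch lists the hybridization temperatures (5°C steps from 68°C, finishing at 42°C because 5°C steps from 68°C do not land on 42°C exactly) and the formamide concentrations (5% steps from 50% to 0%), and labels each with the "moderate" or "low" term used in the text. The text does not classify exactly 50°C or exactly 25% formamide, so those are reported as unspecified; the function names are illustrative.

```python
def temperature_series(start: int = 68, stop: int = 42, step: int = 5) -> list:
    temps = list(range(start, stop - 1, -step))     # 68, 63, 58, 53, 48, 43
    if temps[-1] != stop:
        temps.append(stop)                          # finish at the stated lower bound
    return temps

def formamide_series(start: int = 50, stop: int = 0, step: int = 5) -> list:
    return list(range(start, stop - 1, -step))      # 50, 45, ..., 5, 0

def label(value: float, boundary: float) -> str:
    """'moderate' above the boundary, 'low' below it, 'unspecified' exactly on it."""
    if value > boundary:
        return "moderate"
    if value < boundary:
        return "low"
    return "unspecified"


for t in temperature_series():
    print(f"{t} °C in ~1 M Na+: {label(t, 50)}")
for f in formamide_series():
    print(f"{f}% formamide at 42 °C: {label(f, 25)}")
```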
3. Determination of the Degree of Homology between the Obtained cDNAs or Genomic DNAs and the cDNAs or Fragments Thereof Used as the Labeled Probe, or between the Polypeptides Encoded by the Obtained cDNAs or Genomic DNAs and the Polypeptides Encoded by the cDNAs or Fragments Thereof Used as the Labeled Probe
To determine the level of homology between the hybridized cDNA or genomic DNA and the cDNA or fragment thereof from which the probe was derived, the nucleotide sequences of the hybridized nucleic acid and the cDNA or fragment thereof from which the probe was derived are compared. The sequences of the cDNA or fragment thereof from which the probe was derived and the sequences of the cDNA or genomic DNA which hybridized to the detectable probe may be stored on a computer readable medium as described below and compared to one another using any of a variety of algorithms familiar to those skilled in the art such as those described below.
To determine the level of homology between the polypeptide encoded by the hybridizing cDNA or genomic DNA and the polypeptide encoded by the cDNA or fragment thereof from which the probe was derived, the polypeptide sequence encoded by the hybridized nucleic acid and the polypeptide sequence encoded by the cDNA or fragment thereof from which the probe was derived are compared. The sequences of the polypeptide encoded by the cDNA or fragment thereof from which the probe was derived and the polypeptide sequence encoded by the cDNA or genomic DNA which hybridized to the detectable probe may be stored on a computer readable medium as described below and compared to one another using any of a variety of algorithms familiar to those skilled in the art such as those described below.
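Once two sequences have been aligned by one of the algorithms discussed in this section, the homology level is often reported as percent identity. The sketch counts identical positions over aligned columns in which neither sequence has a gap. This is one of several conventions (others divide by the full alignment length or by the length of one sequence), so the choice here is an assumption; the function name and aligned strings are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# 7 gap-free columns, 6 identical: about 85.7%
print(percent_identity("ACGT-ACG", "ACGTTTCG"))
```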
Protein and/or nucleic acid sequence homologies may be evaluated using any of a variety of sequence comparison algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(22):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1993, Nature Genetics 3:266-272).
In a particularly preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool ("BLAST"), which is well known in the art (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-2268; Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1993, Nature Genetics 3:266-272; Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402). In particular, five specific BLAST programs are used to perform the following tasks:
(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein sequence database;
(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence database;
(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide sequence (both strands) against a protein sequence database;
(4) TBLASTN compares a query protein sequence against a nucleotide sequence database translated in all six reading frames (both strands); and
(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
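The five-way choice in the list above depends only on whether the query and the database are nucleotide or protein, and on whether both sides are to be translated. The lookup below restates that mapping using the lower-case program names of the NCBI BLAST+ command-line tools (blastp, blastn, blastx, tblastn, tblastx); BLAST3 is omitted, and the helper itself is illustrative.

```python
BLAST_PROGRAM = {
    ("protein", "protein"): "blastp",          # (1) protein query vs protein database
    ("nucleotide", "nucleotide"): "blastn",    # (2) nucleotide vs nucleotide
    ("nucleotide", "protein"): "blastx",       # (3) translated query vs protein database
    ("protein", "nucleotide"): "tblastn",      # (4) protein query vs translated database
}

def choose_blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """Pick the BLAST program for a query/database pair of sequence types."""
    if translate_both:                         # (5) translated query vs translated database
        if (query, database) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAM[(query, database)]


print(choose_blast_program("protein", "nucleotide"))   # tblastn
```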
The BLAST programs identify homologous sequences by identifying similar segments, which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix (Gonnet et al., 1992, Science 256:1443-1445; Henikoff and Henikoff, 1993, Proteins 17:49-61). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978, Matrices for Detecting Distant Relationships: Atlas of Protein Sequence and Structure, Washington: National Biomedical Research Foundation). The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified, and preferably select those segments which satisfy a user-specified threshold of significance, such as a user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-2268).
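The Karlin-Altschul statistics cited above relate a raw alignment score S to the number of chance alignments expected with at least that score, E = K·m·n·e^(−λS), and to a normalized bit score S' = (λS − ln K)/ln 2. The sketch implements those two relations. The λ and K values (0.267 and 0.041, often quoted for BLOSUM62 with gap costs 11/1), the query and database lengths, and the raw score are assumptions for illustration, and the BLAST programs themselves also apply effective-length corrections that are not modeled here.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance alignments with score >= S: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * raw_score)


LAM, K = 0.267, 0.041          # assumed BLOSUM62 gapped parameters
m, n = 300, 100_000_000        # hypothetical query length and database size (residues)
S = 100                        # hypothetical raw score
print(f"bits = {bit_score(S, LAM, K):.1f}, E = {expect_value(S, m, n, LAM, K):.2g}")
```

Equivalently, E = m·n·2^(−S'), which is why bit scores can be compared across scoring systems.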
The parameters used with the above algorithms may be adapted depending on the sequence length and degree of homology studied. In some embodiments, the parameters may be the default parameters used by the algorithms in the absence of instructions from the user.
In some embodiments, the level of homology between the hybridized nucleic acid and the cDNA or fragment thereof from which the probe was derived may be determined using the FASTDB algorithm described in Brutlag et al. Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may be selected as follows: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30, Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the sequence which hybridizes to the probe, whichever is shorter. Because the FASTDB program does not consider 5' or 3' truncations when calculating homology levels, if the sequence which hybridizes to the probe is truncated relative to the sequence of the cDNA or fragment thereof from which the probe was derived, the homology level is manually adjusted as follows: the number of nucleotides of the cDNA or fragment thereof which are not matched or aligned with the hybridizing sequence is determined, the percentage of the total length of the cDNA or fragment thereof which these non-matched or non-aligned nucleotides represent is calculated, and this percentage is subtracted from the homology level. For example, if the hybridizing sequence is 700 nucleotides in length and the cDNA or fragment thereof is 1000 nucleotides in length, wherein the first 300 bases at the 5' end of the cDNA or fragment thereof are absent from the hybridizing sequence and the overlapping 700 nucleotides are identical, the homology level would be adjusted as follows. The non-matched, non-aligned 300 bases represent 30% of the length of the cDNA or fragment thereof. Because the overlapping 700 nucleotides are 100% identical, the adjusted homology level would be 100% - 30% = 70% homology. It should be noted that the preceding adjustments are only made when the non-matched or non-aligned nucleotides are at the 5' or 3' ends. No adjustments are made if the non-matched or non-aligned sequences are internal, or under any other conditions.
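The manual correction for terminal truncation is a single subtraction. Following the worked example above, which divides the unaligned terminal length by the length of the cDNA or fragment thereof from which the probe was derived, a minimal sketch is below. The same arithmetic applies to the polypeptide comparison described later in this section, with residues in place of nucleotides. The function name is illustrative; this is not part of FASTDB itself.

```python
def adjusted_homology(raw_homology_pct: float, unaligned_terminal: int,
                      reference_length: int) -> float:
    """Subtract the percentage of the reference left unaligned at the 5'/3' (or N/C) ends.

    Internal mismatches or gaps are not penalized here, matching the rule that
    adjustments are made only for terminal non-matched, non-aligned positions.
    """
    if not 0 <= unaligned_terminal <= reference_length:
        raise ValueError("unaligned length must lie between 0 and the reference length")
    return raw_homology_pct - 100.0 * unaligned_terminal / reference_length


# 1000-nt cDNA, first 300 nt absent from a 700-nt hybridizing sequence, overlap identical.
print(adjusted_homology(100.0, 300, 1000))   # 70.0
```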
For example, using the above methods, nucleic acids having at least 95% nucleic acid homology, at least 96% nucleic acid homology, at least 97% nucleic acid homology, at least 98% nucleic acid homology, at least 99% nucleic acid homology, or more than 99% nucleic acid homology to the cDNA or fragment thereof from which the probe was derived may be obtained and identified. Such nucleic acids may be allelic variants or related nucleic acids from other species. Similarly, by using progressively less stringent hybridization conditions one can obtain and identify nucleic acids having at least 90%, at least 85%, at least 80% or at least 75% homology to the cDNA or fragment thereof from which the probe was derived.
Using the above methods and algorithms such as FASTA with parameters depending on the sequence length and degree of homology studied, for example the default parameters used by the algorithms in the absence of instructions from the user, one can obtain nucleic acids encoding proteins having at least 99%, at least 98%, at least 97%, at least 96%, at least 95%, at least 90%, at least 85%, at least 80% or at least 75% homology to the protein encoded by the cDNA or fragment thereof from which the probe was derived. In some embodiments, the homology levels can be determined using the "default" opening penalty and the "default" gap penalty, and a scoring matrix such as PAM 250 (a standard scoring matrix; see Dayhoff et al., in: Atlas of Protein Sequence and Structure, Vol. 5, Supp. 3 (1978)).
Alternatively, the level of polypeptide homology may be determined using the FASTDB algorithm described by Brutlag et al. Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may be selected as follows: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1, Window Size=Sequence Length, Gap Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the length of the homologous sequence, whichever is shorter. If the homologous amino acid sequence is shorter than the amino acid sequence encoded by the cDNA or fragment thereof as a result of an N-terminal and/or C-terminal deletion, the results may be manually corrected as follows. First, the number of amino acid residues of the amino acid sequence encoded by the cDNA or fragment thereof which are not matched or aligned with the homologous sequence is determined. Then, the percentage of the length of the sequence encoded by the cDNA or fragment thereof which the non-matched or non-aligned amino acids represent is calculated. This percentage is subtracted from the homology level. For example, where the amino acid sequence encoded by the cDNA or fragment thereof is 100 amino acids in length, the homologous sequence is 80 amino acids in length, and the homologous sequence is truncated at the N-terminal end with respect to the amino acid sequence encoded by the cDNA or fragment thereof, the homology level is calculated as follows. In this scenario there are 20 non-matched, non-aligned amino acids in the sequence encoded by the cDNA or fragment thereof. This represents 20% of the length of the amino acid sequence encoded by the cDNA or fragment thereof. If the remaining amino acids are 100% identical between the two sequences, the homology level would be 100% - 20% = 80% homology. No adjustments are made if the non-matched or non-aligned sequences are internal, or under any other conditions.
In addition to the above described methods, other protocols are available to obtain homologous cDNAs using cDNA of the present invention or fragment thereof as outlined in the following paragraphs.

WO 00/3?491 PCT/IB99/02058
cDNAs may be prepared by obtaining mRNA from the tissue, cell, or organism of interest using mRNA preparation procedures utilizing polyA selection procedures or other techniques known to those skilled in the art. A first primer capable of hybridizing to the polyA tail of the mRNA is hybridized to the mRNA and a reverse transcription reaction is performed to generate a first cDNA strand.
The first cDNA strand is hybridized to a second primer containing at least 10 consecutive nucleotides of the sequences of SEQ ID NOs 24-73. Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides from the sequences of SEQ ID NOs 24-73. In some embodiments, the primer comprises more than 30 nucleotides from the sequences of SEQ ID NOs 24-73. If it is desired to obtain cDNAs containing the full protein coding sequence, including the authentic translation initiation site, the second primer used contains sequences located upstream of the translation initiation site.
The second primer is extended to generate a second cDNA strand complementary to the first cDNA strand.
Alternatively, RT-PCR may be performed as described above using primers from both ends of the cDNA to be obtained.
cDNAs containing 5' fragments of the mRNA may be prepared by hybridizing an mRNA comprising the sequences of SEQ ID NOs. 24-73 with a primer comprising a sequence complementary to a fragment of the known cDNA, genomic DNA or fragment thereof, and reverse transcribing the hybridized primer to make a first cDNA strand from the mRNAs. Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides of the sequences complementary to SEQ ID NOs. 24-73.
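A primer that anneals to the mRNA, and so primes first-strand synthesis, has the reverse complement of the sense-strand sequence it targets. The sketch below derives such a primer from a window of a sense sequence; the sequence, window position, length and function names are hypothetical, and IUPAC ambiguity codes are not handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def first_strand_primer(sense_seq: str, start: int, length: int = 20) -> str:
    """Primer complementary to sense_seq[start:start+length], written 5' to 3'."""
    window = sense_seq[start:start + length]
    if len(window) < length:
        raise ValueError("window extends past the end of the sequence")
    return reverse_complement(window)


cdna = "GCCACCATGAAACTGCTGGTGCTGGCCTTC"   # hypothetical sense-strand sequence
print(first_strand_primer(cdna, 6, 18))
```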
Thereafter, a second cDNA strand complementary to the first cDNA strand is synthesized. The second cDNA strand may be made by hybridizing a primer complementary to sequences in the first cDNA strand to the first cDNA strand and extending the primer to generate the second cDNA strand.
The double stranded cDNAs made using the methods described above are isolated and cloned.
The cDNAs may be cloned into vectors such as plasmids or viral vectors capable of replicating in an appropriate host cell. For example, the host cell may be a bacterial, mammalian, avian, or insect cell.
Techniques for isolating mRNA, reverse transcribing a primer hybridized to mRNA to generate a first cDNA strand, extending a primer to make a second cDNA strand complementary to the first cDNA strand, isolating the double stranded cDNA and cloning the double stranded cDNA are well known to those skilled in the art and are described in Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989.
Alternatively, other procedures may be used for obtaining full-length cDNAs or homologous cDNAs.
In one approach, cDNAs are prepared from mRNA and cloned into double stranded phagemids as follows.
The cDNA library in the double stranded phagemids is then rendered single stranded by treatment with an endonuclease, such as the Gene II product of the phage F1, and an exonuclease (Chang et al., Gene 127:95-8, 1993). A biotinylated oligonucleotide comprising the sequence of a fragment of a known cDNA, genomic DNA or fragment thereof is hybridized to the single stranded phagemids. Preferably, the fragment comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides of the sequences of SEQ ID NOs. 24-73.
Hybrids between the biotinylated oligonucleotide and phagemids are isolated by incubating the hybrids with streptavidin coated paramagnetic beads and retrieving the beads with a magnet (Fry et al., Biotechniques 13:124-131, 1992). Thereafter, the resulting phagemids are released from the beads and converted into double stranded DNA using a primer specific for the cDNA or fragment thereof used to design the biotinylated oligonucleotide. Alternatively, protocols such as the Gene Trapper kit (Gibco BRL) may be used. The resulting double stranded DNA is transformed into bacteria.
Homologous cDNAs or full length cDNAs containing the cDNA or fragment thereof sequence are identified by colony PCR or colony hybridization.
Using any of the above described methods, a plurality of cDNAs containing full-length protein coding sequences or fragments of the protein coding sequences may be provided as cDNA libraries for subsequent evaluation of the encoded proteins or use in diagnostic assays as described below.
cDNAs prepared by any method described herein may be subsequently engineered to obtain nucleic acids which include desired fragments of the cDNA using conventional techniques such as subcloning, PCR, or in vitro oligonucleotide synthesis. For example, nucleic acids which include only the full coding sequences (i.e. the sequences encoding the signal peptide and the mature protein remaining after the signal peptide is cleaved off) may be obtained using techniques known to those skilled in the art.
Alternatively, conventional techniques may be applied to obtain nucleic acids which contain only the coding sequence for the mature protein remaining after the signal peptide is cleaved off or nucleic acids which contain only the coding sequences for the signal peptides.
Similarly, nucleic acids containing any other desired fragment of the coding sequences for the encoded protein may be obtained. For example, the nucleic acid may contain at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive bases of a cDNA.
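Separating the signal-peptide-coding part of a full coding sequence from the mature-protein-coding part is a matter of cutting at three nucleotides per signal-peptide residue. The sketch assumes the coding sequence starts at the initiation codon and is in frame, and that the signal peptide length in amino acids is known (in this document such boundaries are given in Table I); the function name, example sequence and length are hypothetical.

```python
def split_coding_sequence(cds: str, signal_peptide_aa: int):
    """Return (signal peptide coding part, mature protein coding part).

    cds is assumed to begin with the ATG of the full coding sequence and to be
    in frame; the mature part keeps the stop codon if one is present.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    cut = 3 * signal_peptide_aa
    if not 0 < cut < len(cds):
        raise ValueError("signal peptide length is out of range")
    return cds[:cut], cds[cut:]


signal, mature = split_coding_sequence("ATGAAAGCTTGGTAA", 2)   # hypothetical
print(signal, mature)   # ATGAAA GCTTGGTAA
```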
Once a cDNA has been obtained, it can be sequenced to determine the amino acid sequence it encodes. Once the encoded amino acid sequence has been determined, one can create and identify any of the many conceivable cDNAs that will encode that protein by simply using the degeneracy of the genetic code. For example, allelic variants or other homologous nucleic acids can be identified as described below.
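The number of distinct coding sequences for a given protein follows directly from the degeneracy of the genetic code: it is the product, over all residues, of the number of codons for each amino acid, multiplied by the number of stop codons if a stop is appended. The sketch assumes the standard genetic code; the peptide in the example and the function name are hypothetical.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code (61 in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}
assert sum(CODONS_PER_AA.values()) == 61

def count_coding_sequences(protein: str, include_stop: bool = True) -> int:
    """How many DNA sequences encode `protein` (one-letter codes) in the standard code."""
    n = prod(CODONS_PER_AA[aa] for aa in protein.upper())
    return n * 3 if include_stop else n   # three stop codons: TAA, TAG, TGA


print(count_coding_sequences("MKLLVLA"))   # 20736
```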
Alternatively, nucleic acids encoding the desired amino acid sequence can be synthesized in vitro.
In a preferred embodiment, the coding sequence may be selected using the known codon or codon pair preferences for the host organism in which the cDNA is to be expressed.
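Selecting one codon per residue from a host preference table turns a protein sequence into a synthetic coding sequence. The table below is a hypothetical placeholder: every entry is a valid codon for its amino acid in the standard code, but the choices are not taken from measured codon-usage data for any host, and codon-pair preferences are not modeled.

```python
# Hypothetical one-codon-per-residue preference table ('*' = stop).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC", "Q": "CAG",
    "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT", "L": "CTG", "K": "AAA",
    "M": "ATG", "F": "TTT", "P": "CCG", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAT", "V": "GTG", "*": "TAA",
}

def back_translate(protein: str, table: dict = PREFERRED_CODON) -> str:
    """Build a coding sequence by taking the preferred codon for each residue."""
    return "".join(table[aa] for aa in protein.upper())


print(back_translate("MKLLVLA*"))   # ATGAAACTGCTGGTGCTGGCGTAA
```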
IV. Use of cDNA or Fragments Thereof to Express Proteins and Uses of Those Expressed Proteins
Using any of the above described methods, cDNAs containing the full protein coding sequences of their corresponding mRNAs or portions thereof, such as cDNAs encoding the mature protein, may be used to express the secreted proteins or portions thereof which they encode as described below. If desired, the cDNAs may contain the sequences encoding the signal peptide to facilitate secretion of the expressed protein. It will be appreciated that a plurality of extended cDNAs containing the full protein coding sequences or portions thereof may be simultaneously cloned into expression vectors to create an expression library for analysis of the encoded proteins as described below.

Expression of the Proteins Encoded by cDNAs or Fragments Thereof
To express the proteins encoded by the cDNAs or fragments thereof, nucleic acids containing the coding sequence for the proteins or fragments thereof to be expressed are obtained as described above and cloned into a suitable expression vector. If desired, the nucleic acids may contain the sequences encoding the signal peptide to facilitate secretion of the expressed protein. For example, the nucleic acid may comprise the sequence of one of SEQ ID NOs: 24-73 listed in Table I and in the accompanying sequence listing. Alternatively, the nucleic acid may comprise those nucleotides which make up the full coding sequence of one of the sequences of SEQ ID NOs: 24-73 as defined in Table I above.
It will be appreciated that should the extent of the full coding sequence (i.e. the sequence encoding the signal peptide and the mature protein resulting from cleavage of the signal peptide) differ from that listed in Table I as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the extent of the full coding sequences in the sequences of SEQ ID NOs. 24-73. Accordingly, the scope of any claims herein relating to nucleic acids containing the full coding sequence of one of SEQ ID NOs. 24-73 is not to be construed as excluding any readily identifiable variations from or equivalents to the full coding sequences listed in Table I.
Similarly, should the extent of the full length polypeptides differ from those indicated in Table II as a result of any of the preceding factors, the scope of claims relating to polypeptides comprising the amino acid sequence of the full length polypeptides is not to be construed as excluding any readily identifiable variations from or equivalents to the sequences listed in Table II.
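The statement that a skilled person could readily locate the full coding sequence can be illustrated with a minimal open reading frame (ORF) scan. This is a sketch under stated assumptions, not the procedure used to compile Table I: it uses the standard genetic code, searches only the forward strand, ignores ORFs that run off the end without a stop codon, and simply reports the longest ATG-initiated ORF. Real annotation would also weigh initiation context, signal peptide prediction and homology. The names `translate` and `longest_orf` are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")  # 'X' = ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf(seq: str) -> tuple[int, int]:
    """(start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (-1, -1) means none found.
    """
    seq = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if pos + 3 - start > best[1] - best[0]:
                    best = (start, pos + 3)
                start = None  # look for the next ATG in this frame
    return best


# Example on a toy sequence (not one of SEQ ID NOs: 24-73):
# longest_orf("CCATGGCTTGGTAAGG") -> (2, 14); translate("ATGGCTTGGTAA") -> "MAW"
```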
Alternatively, the nucleic acid used to express the protein or fragment thereof may comprise those nucleotides which encode the mature protein (i.e. the protein created by cleaving the signal peptide off) encoded by one of the sequences of SEQ ID NOs: 24-73 as defined in Table I above.
It will be appreciated that should the extent of the sequence encoding the mature protein differ from that listed in Table I as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the extent of the sequence encoding the mature protein in the sequences of SEQ ID NOs: 24-73.
Accordingly, the scope of any claims herein relating to nucleic acids containing the sequence encoding the mature protein encoded by one of SEQ ID NOs: 24-73 is not to be construed as excluding any readily identifiable variations from or equivalents to the sequences listed in Table I. Thus, claims relating to nucleic acids containing the sequence encoding the mature protein encompass equivalents to the sequences listed in Table I, such as sequences encoding biologically active proteins resulting from post-translational modification, enzymatic cleavage, or other readily identifiable variations from or equivalents to the secreted proteins in addition to cleavage of the signal peptide. Similarly, should the extent of the mature polypeptides differ from those indicated in Table II as a result of any of the preceding factors, the scope of claims relating to polypeptides comprising the sequence of a mature protein included in the sequence of one of SEQ ID NOs: 74-123 is not to be construed as excluding any readily identifiable variations from or equivalents to the sequences listed in Table II. Thus, claims relating to polypeptides comprising the sequence of the mature protein encompass equivalents to the sequences listed in Table II, such as biologically active proteins resulting from post-translational modification, enzymatic cleavage, or other readily identifiable variations from or equivalents to the secreted proteins in addition to cleavage of the signal peptide. It will also be appreciated that should the biologically active form of the polypeptides included in the sequence of one of SEQ ID NOs: 74-123 or the nucleic acids encoding the biologically active form of the polypeptides differ from those identified as the mature polypeptide in Table II or the nucleotides encoding the mature polypeptide in Table I as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the amino acids in the biologically active form of the polypeptides and the nucleic acids encoding the biologically active form of the polypeptides.
In such instances, the claims relating to polypeptides comprising the mature protein included in one of SEQ ID NOs: 74-123 or nucleic acids comprising the nucleotides of one of SEQ ID NOs: 24-73 encoding the mature protein shall not be construed to exclude any readily identifiable variations from the sequences listed in Table I and Table II.
In some embodiments, the nucleic acid used to express the protein or fragment thereof may comprise those nucleotides which encode the signal peptide encoded by one of the sequences of SEQ ID NOs: 24-73 as defined in Table I above.
It will be appreciated that should the extent of the sequence encoding the signal peptide differ from that listed in Table I as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the extent of the sequence encoding the signal peptide in the sequences of SEQ ID NOs: 24-73.
Accordingly, the scope of any claims herein relating to nucleic acids containing the sequence encoding the signal peptide encoded by one of SEQ ID NOs: 24-73 is not to be construed as excluding any readily identifiable variations from the sequences listed in Table I. Similarly, should the extent of the signal peptides differ from those indicated in Table II as a result of any of the preceding factors, the scope of claims relating to polypeptides comprising the sequence of a signal peptide included in the sequence of one of SEQ ID NOs: 74-123 is not to be construed as excluding any readily identifiable variations from the sequences listed in Table II.
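Once the full coding sequence and the signal peptide length are known, the nucleotides encoding the signal peptide and those encoding the mature protein follow from codon arithmetic: a signal peptide of n amino acids occupies the first 3n nucleotides of the coding sequence. The following minimal sketch uses a hypothetical function name and a toy sequence rather than any of SEQ ID NOs: 24-73, and it assumes the signal peptide length is supplied from an annotation such as Table I rather than predicted.

```python
def split_coding_sequence(cds: str, signal_aa: int) -> tuple[str, str]:
    """Split a full coding sequence into (signal-peptide part, mature-protein part).

    `signal_aa` is the signal peptide length in amino acids; it is an input assumed
    to come from an annotation such as Table I, not something computed here.
    """
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    cut = 3 * signal_aa  # one codon (3 nt) per amino acid
    if not 0 < cut < len(cds):
        raise ValueError("signal peptide length must be positive and shorter than the CDS")
    return cds[:cut], cds[cut:]


# Toy example: a 2-residue "signal peptide" followed by the mature-protein codons.
signal_part, mature_part = split_coding_sequence("ATGAAACTGGCTTGGTAA", 2)
# signal_part == "ATGAAA", mature_part == "CTGGCTTGGTAA"
```

The mature-protein portion returned here begins without an initiating methionine, which is the situation the exemplary expression method in this section addresses by introducing an initiating methionine next to the first codon.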
Alternatively, the nucleic acid may encode a polypeptide comprising at least 5 consecutive amino acids of one of the sequences of SEQ ID NOs: 74-123. In some embodiments, the nucleic acid may encode a polypeptide comprising at least 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 consecutive amino acids of one of the sequences of SEQ ID NOs: 74-123.
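To make the notion of "at least k consecutive amino acids" concrete, the sketch below enumerates every fragment of exactly k residues and counts all fragments of length k or more. For a polypeptide of n residues there are n - k + 1 fragments of length k, so summing over k from the minimum length up to n gives m(m + 1)/2 with m = n - k_min + 1; for example, a 100-residue polypeptide has 4,656 fragments of at least 5 residues. The function names and the toy sequence are hypothetical illustrations.

```python
from typing import Iterator


def fragments(protein: str, k: int) -> Iterator[str]:
    """Yield every run of exactly k consecutive residues, left to right."""
    for start in range(len(protein) - k + 1):
        yield protein[start:start + k]


def count_fragments(n: int, min_k: int) -> int:
    """Number of fragments of length >= min_k in an n-residue polypeptide.

    Equals the sum of (n - k + 1) for k = min_k..n, i.e. m * (m + 1) / 2
    with m = n - min_k + 1.
    """
    if min_k > n:
        return 0
    m = n - min_k + 1
    return m * (m + 1) // 2


# list(fragments("MKLAW", 3)) -> ["MKL", "KLA", "LAW"]
```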
The nucleic acids inserted into the expression vectors may also contain sequences upstream of the sequences encoding the signal peptide, such as sequences which regulate expression levels or sequences which confer tissue specific expression.
The nucleic acid encoding the protein or polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional cloning technology. The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism into which the expression vector is introduced, as explained by Hatfield et al., U.S. Patent No. 5,082,767.
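The simplest form of adapting a coding sequence to an expression host is to re-encode each amino acid with a codon the host uses frequently. The sketch below shows only that per-codon substitution; it is deliberately simpler than the codon context and codon pairing optimization of Hatfield et al., which it does not attempt to reproduce. The `PREFERRED_CODON` table is a hypothetical placeholder covering only the residues in the example; a real table would be derived from the chosen host's measured codon usage.

```python
# Hypothetical preferred codon per residue for an unspecified expression host.
# Only the residues used in the example are listed; real tables cover all 20
# amino acids and come from the host's measured codon usage.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAA", "L": "CTG", "A": "GCG", "W": "TGG", "*": "TAA",
}


def back_translate(protein: str, table: dict[str, str]) -> str:
    """Encode a protein (one-letter codes, '*' = stop) with one chosen codon per residue."""
    missing = set(protein) - set(table)
    if missing:
        raise KeyError(f"no codon assigned for: {''.join(sorted(missing))}")
    return "".join(table[aa] for aa in protein)


# back_translate("MKLAW*", PREFERRED_CODON) -> "ATGAAACTGGCGTGGTAA"
```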
The following is provided as one exemplary method to express the proteins encoded by the cDNAs or the nucleic acids described above. First, the methionine initiation codon for the gene and the poly A signal of the gene are identified. If the nucleic acid encoding the polypeptide to be expressed lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the cDNA lacks a poly A signal, this sequence can be added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a fragment of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene. The cDNA or fragment thereof encoding the polypeptide to be expressed is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the cDNA or fragment thereof, with a PstI restriction site incorporated into the 5' primer and a BglII site at the 5' end of the corresponding cDNA 3' primer, taking care to ensure that the cDNA is positioned in frame with the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A signal and digested with BglII. The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri). Preferably the expressed protein is released into the culture medium, thereby facilitating purification.
Alternatively, the cDNAs may be cloned into pED6dpc2 (DiscoverEase, Genetics Institute, Cambridge, MA). The resulting pED6dpc2 constructs may be transfected into a suitable host cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded.
Preferably, the protein expressed from the cDNA is released into the culture medium thereby facilitating purification.
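The PCR step of the exemplary pXT1 protocol, in which restriction sites are built into the primers, can be sketched as follows. The recognition sequences are the standard ones (PstI: CTGCAG, cut CTGCA^G; BglII: AGATCT, cut A^GATCT). The annealing length, the GCGC "clamp" placed 5' of each site and all function names are illustrative assumptions; a practical design would also check melting temperature and primer secondary structure.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence (cuts CTGCA^G)
BGLII_SITE = "AGATCT"  # BglII recognition sequence (cuts A^GATCT)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def internal_sites(insert: str) -> list[str]:
    """Enzymes whose site also occurs inside the insert (both sites are palindromic,
    so checking one strand suffices); such an insert would be cut internally."""
    insert = insert.upper()
    return [name for name, site in (("PstI", PSTI_SITE), ("BglII", BGLII_SITE))
            if site in insert]


def design_primers(insert: str, anneal_len: int = 20,
                   clamp: str = "GCGC") -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    forward: clamp + PstI site + first `anneal_len` nt of the insert
    reverse: clamp + BglII site + reverse complement of the last `anneal_len` nt
    """
    insert = insert.upper()
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing region")
    forward = clamp + PSTI_SITE + insert[:anneal_len]
    reverse = clamp + BGLII_SITE + reverse_complement(insert[-anneal_len:])
    return forward, reverse


# Toy insert with a short annealing length for readability:
# design_primers("ATGGCTTGGAAACTGTAA", anneal_len=6)
#   -> ("GCGCCTGCAGATGGCT", "GCGCAGATCTTTACAG")
```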

Proteins in the culture medium are separated by gel electrophoresis. If desired, the proteins may be ammonium sulfate precipitated or separated based on size or charge prior to electrophoresis.
As a control, the expression vector lacking a cDNA insert is introduced into host cells or organisms and the proteins in the medium are harvested. The secreted proteins present in the medium are detected using techniques such as Coomassie or silver staining or using antibodies against the protein encoded by the cDNA. Coomassie and silver staining techniques are familiar to those skilled in the art.
Antibodies capable of specifically recognizing the protein of interest may be generated using synthetic 15-mer peptides having a sequence encoded by the appropriate 5' EST, cDNA, or fragment thereof. The synthetic peptides are injected into mice to generate antibody to the polypeptide encoded by the 5' EST, cDNA, or fragment thereof.
Secreted proteins from the host cells or organisms containing an expression vector which contains the cDNA or a fragment thereof are compared to those from the control cells or organism. The presence of a band in the medium from the cells containing the expression vector which is absent in the medium from the control cells indicates that the cDNA encodes a secreted protein. Generally, the band corresponding to the protein encoded by the cDNA will have a mobility near that expected based on the number of amino acids in the open reading frame of the cDNA. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
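The two comparisons in the preceding paragraph, finding bands present only in the lane from cells carrying the cDNA and checking that such a band lies near the size expected from the open reading frame, can be sketched as below. The 110 Da average residue mass is a common rule of thumb; the band tolerance and the function names are hypothetical choices. As the text notes, glycosylation, ubiquitination or cleavage can legitimately shift a band away from this estimate.

```python
AVG_RESIDUE_MASS_DA = 110.0  # common rule-of-thumb mass of one amino acid residue


def expected_kda(orf_residues: int, signal_residues: int = 0) -> float:
    """Rough expected size (kDa) of the protein after signal peptide removal."""
    mature = orf_residues - signal_residues
    if mature <= 0:
        raise ValueError("mature protein must contain at least one residue")
    return mature * AVG_RESIDUE_MASS_DA / 1000.0


def novel_bands(test_lane: list[float], control_lane: list[float],
                tol_kda: float = 0.5) -> list[float]:
    """Bands (kDa) in the cDNA-containing lane with no control band within tol_kda."""
    return [band for band in test_lane
            if all(abs(band - ctrl) > tol_kda for ctrl in control_lane)]


# A 250-residue ORF with a 25-residue signal peptide: expected_kda(250, 25) -> 24.75
# novel_bands([66.0, 45.0, 24.5], [66.2, 45.1]) -> [24.5]
```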
Alternatively, if the protein expressed from the above expression vectors does not contain sequences directing its secretion, the proteins expressed from host cells containing an expression vector containing an insert encoding a secreted protein or fragment thereof can be compared to the proteins expressed in host cells containing the expression vector without an insert.
The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the desired protein or fragment thereof is being expressed. Generally, the band will have the mobility expected for the secreted protein or fragment thereof. However, the band may have a mobility different from that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
The protein encoded by the cDNA may be purified using standard immunochromatography techniques. In such procedures, a solution containing the secreted protein, such as the culture medium or a cell extract, is applied to a column having antibodies against the secreted protein attached to the chromatography matrix. The secreted protein is allowed to bind the immunochromatography column.
Thereafter, the column is washed to remove non-specifically bound proteins.
The specifically bound secreted protein is then released from the column and recovered using standard techniques.
If antibody production is not possible, the cDNA sequence or fragment thereof may be incorporated into expression vectors designed for use in purification schemes employing chimeric polypeptides. In such strategies the coding sequence of the cDNA or fragment thereof is inserted in frame with the gene encoding the other half of the chimera. The other half of the chimera may be β-globin or a nickel binding polypeptide encoding sequence. A chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel binding polypeptide and the cDNA or fragment thereof. Thus, the two polypeptides of the chimera may be separated from one another by protease digestion.
One useful expression vector for generating β-globin chimeras is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques as described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, eds., Elsevier Press, NY, 1986), and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptides may additionally be produced from the construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).
Following expression and purification of the secreted proteins encoded by the 5' ESTs, cDNAs, or fragments thereof, the purified proteins may be tested for the ability to bind to the surface of various cell types as described below. It will be appreciated that a plurality of proteins expressed from these cDNAs may be included in a panel of proteins to be simultaneously evaluated for the activities specifically described below, as well as other biological roles for which assays for determining activity are available.
Alternatively, the polypeptide to be expressed may also be a product of transgenic animals, i.e., as a component of the milk of transgenic cows, goats, pigs or sheep which are characterized by somatic or germ cells containing a nucleotide sequence encoding the protein of interest.

Analysis of Secreted Proteins to Determine Whether they Bind to the Cell Surface
The proteins encoded by the cDNAs, or fragments thereof, are cloned into expression vectors such as those described in the previous example. The proteins are purified by size, charge, immunochromatography or other techniques familiar to those skilled in the art.
Following purification, the proteins are labeled using techniques known to those skilled in the art. The labeled proteins are incubated with cells or cell lines derived from a variety of organs or tissues to allow the proteins to bind to any receptor present on the cell surface. Following the incubation, the cells are washed to remove non-specifically bound protein. The labeled proteins are detected by autoradiography. Alternatively, unlabeled proteins may be incubated with the cells and detected with antibodies having a detectable label, such as a fluorescent molecule, attached thereto.
Specificity of cell surface binding may be analyzed by conducting a competition analysis in which various amounts of unlabeled protein are incubated along with the labeled protein. The amount of labeled protein bound to the cell surface decreases as the amount of competitive unlabeled protein increases. As a control, various amounts of an unlabeled protein unrelated to the labeled protein are included in some binding reactions. The amount of labeled protein bound to the cell surface does not decrease in binding reactions containing increasing amounts of unrelated unlabeled protein, indicating that the protein encoded by the cDNA binds specifically to the cell surface.
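The expected outcome of the competition analysis can be illustrated with a standard single-site competitive binding model, chosen here as an assumption since the text does not specify one: specific binding of the labeled protein is B = Bmax · L / (L + Kd · (1 + I/Ki)), where L is the labeled protein concentration, I the competitor concentration, and Kd and Ki the dissociation constants of the labeled protein and the competitor. Unlabeled copies of the same protein lower B as I rises, while an unrelated protein (modeled with Ki = infinity) leaves B unchanged. Under this model, half of the labeled binding is displaced at I = Ki(1 + L/Kd), the Cheng-Prusoff relation. All parameter values below are arbitrary.

```python
import math


def labeled_bound(bmax: float, ligand: float, kd: float,
                  competitor: float = 0.0, ki: float = math.inf) -> float:
    """Specific binding of labeled protein under a single-site competitive model.

    B = Bmax * L / (L + Kd * (1 + I / Ki)). An unrelated protein is modeled with
    ki = inf, so I / Ki is 0 and adding it leaves B unchanged.
    """
    return bmax * ligand / (ligand + kd * (1.0 + competitor / ki))


def ic50(ligand: float, kd: float, ki: float) -> float:
    """Competitor concentration that halves labeled binding (Cheng-Prusoff)."""
    return ki * (1.0 + ligand / kd)


# Arbitrary units: Bmax = 100, L = Kd = 1, unlabeled protein with Ki = Kd = 1.
# No competitor: 50.0; competitor = 1: ~33.3; competitor = 9: ~9.1.
# Unrelated protein (ki left at inf), competitor = 9: still 50.0.
```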

As discussed above, secreted proteins have been shown to have a number of important physiological effects and, consequently, represent a valuable therapeutic resource. The secreted proteins encoded by the cDNAs or fragments thereof made using any of the methods described herein may be evaluated to determine their physiological activities as described below.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Cytokine, Cell Proliferation or Cell Differentiation Activity
As discussed above, secreted proteins may act as cytokines or may affect cellular proliferation or differentiation. Many protein factors discovered to date, including all known cytokines, have exhibited activity in one or more factor dependent cell proliferation assays, and hence the assays serve as a convenient confirmation of cytokine activity. The activity of a protein of the present invention is evidenced by any one of a number of routine factor dependent cell proliferation assays for cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and CMK. The proteins encoded by the above cDNAs or fragments thereof may be evaluated for their ability to regulate T cell or thymocyte proliferation in assays such as those described above or in the following references: Current Protocols in Immunology, Ed. by J.E. Coligan et al., Greene Publishing Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986; Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 152:1756-1761, 1994.
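Results from factor dependent proliferation assays are commonly summarized as a stimulation index: the mean proliferation readout (for example, incorporated radioactivity or a colorimetric signal) in wells receiving the test protein divided by that in control wells. The sketch below computes this index and applies a deliberately crude dose-response screen. The threshold of 2.0, the requirement that the index never fall with dose, and the function names are all illustrative assumptions, not criteria taken from the cited assays.

```python
from statistics import mean


def stimulation_index(treated: list[float], control: list[float]) -> float:
    """Mean readout of treated wells divided by mean readout of control wells."""
    baseline = mean(control)
    if baseline <= 0:
        raise ValueError("control readout must be positive")
    return mean(treated) / baseline


def dose_responsive(readouts_by_dose: dict[float, list[float]],
                    control: list[float], threshold: float = 2.0) -> bool:
    """Crude screen: the index never drops as dose rises and reaches `threshold`.

    The threshold is an arbitrary illustrative cutoff, not a published criterion.
    """
    if not readouts_by_dose:
        return False
    indices = [stimulation_index(readouts_by_dose[dose], control)
               for dose in sorted(readouts_by_dose)]
    rising = all(a <= b for a, b in zip(indices, indices[1:]))
    return rising and indices[-1] >= threshold


# Hypothetical replicate readouts: control wells at 100, treated wells rising with dose.
# dose_responsive({0.1: [150, 150], 1.0: [300, 300], 10.0: [500, 500]}, [100, 100]) -> True
```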
In addition, numerous assays for cytokine production and/or the proliferation of spleen cells, lymph node cells and thymocytes are known. These include the techniques disclosed in Current Protocols in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto, 1994; and Schreiber, R.D., Current Protocols in Immunology, supra, Vol 1 pp. 6.8.1-6.8.8, John Wiley and Sons, Toronto, 1994.
The proteins encoded by the cDNAs may also be assayed for the ability to regulate the proliferation and differentiation of hematopoietic or lymphopoietic cells. Many assays for such activity are familiar to those skilled in the art, including the assays in the following references:
Bottomly, K., Davis, L.S. and Lipsky, P.E., Measurement of Human and Murine Interleukin 2 and Interleukin 4, Current Protocols in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 6.3.1-6.3.12, John Wiley and Sons, Toronto, 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al., Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983; Nordan, R., Measurement of Mouse and Human Interleukin 6, Current Protocols in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 6.6.1-6.6.5, John Wiley and Sons, Toronto, 1991; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; Bennett, F., Giannotti, J., Clark, S.C. and Turner, K.J., Measurement of Human Interleukin 11, Current Protocols in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 6.15.1, John Wiley and Sons, Toronto, 1991; Ciarletta, A., Giannotti, J., Clark, S.C. and Turner, K.J., Measurement of Mouse and Human Interleukin 9, Current Protocols in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto, 1991.

The proteins encoded by the cDNAs may also be assayed for their ability to regulate T-cell responses to antigens. Many assays for such activity are familiar to those skilled in the art, including the assays described in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function), Chapter 6 (Cytokines and Their Cellular Receptors) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al., Eur. J. Immunol. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988.
Those proteins which exhibit cytokine, cell proliferation, or cell differentiation activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which induction of cell proliferation or differentiation is beneficial. Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Activity as Immune System Regulators
The proteins encoded by the cDNAs may also be evaluated for their effects as immune regulators.
For example, the proteins may be evaluated for their activity to influence thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are familiar to those skilled in the art, including the assays described in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., J. Virology 61:1992-1998; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Brown et al., J. Immunol. 153:3079-3092, 1994.
The proteins encoded by the cDNAs may also be evaluated for their effects on T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Maliszewski, J. Immunol. 144:3028-3033, 1990; Mond, J.J. and Brunswick, M., Assays for B Cell Function: In vitro Antibody Production, Vol 1 pp. 3.8.1-3.8.16 in Current Protocols in Immunology, J.E. Coligan et al. Eds., John Wiley and Sons, Toronto, 1994.
WO 00/37491 PCT/IB99/02058
The proteins encoded by the cDNAs may also be evaluated for their effect on immune effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992.
The proteins encoded by the cDNAs may also be evaluated for their effect on dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of Experimental Medicine 173:549-559, 1991; Macatonia et al., Journal of Immunology 154:5071-5079, 1995; Porgador et al., Journal of Experimental Medicine 182:255-260, 1995; Nair et al., Journal of Virology 67:4062-4069, 1993; Huang et al., Science 264:961-965, 1994; Macatonia et al., Journal of Experimental Medicine 169:1255-1264, 1989; Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al., Journal of Experimental Medicine 172:631-640, 1990.
The proteins encoded by the cDNAs may also be evaluated for their influence on the lifetime of lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et al., Cancer Research 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, Journal of Immunology 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; Gorczyca et al., International Journal of Oncology 1:639-648, 1992.
Assays for proteins that influence early steps of T-cell commitment and development include, without limitation, those described in: Antica et al., Blood 84:111-117, 1994; Fine et al., Cellular Immunology 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991.
Those proteins which exhibit activity as immune system regulators may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of immune activity is beneficial.
For example, the protein may be useful in the treatment of various immune deficiencies and disorders (including severe combined immunodeficiency (SCID)), e.g., in regulating (up or down) growth and proliferation of T and/or B lymphocytes, as well as affecting the cytolytic activity of NK cells and other cell populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may result from autoimmune disorders. More specifically, infectious diseases caused by viral, bacterial, fungal or other infection may be treatable using a protein of the present invention, including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania spp., malaria parasites (Plasmodium spp.) and various fungal infections such as candidiasis. Of course, in this regard, a protein of the present invention may also be useful where a boost to the immune system generally may be desirable, e.g., in the treatment of cancer.
Autoimmune disorders which may be treated using a protein of the present invention include, for example, connective tissue disease, multiple sclerosis, systemic lupus erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis, graft-versus-host disease and autoimmune inflammatory eye disease.
Such a protein of the present invention may also be useful in the treatment of allergic reactions and conditions, such as asthma (particularly allergic asthma) or other respiratory problems. Other conditions in which immune suppression is desired (including, for example, organ transplantation) may also be treatable using a protein of the present invention.
Using the proteins of the invention it may also be possible to regulate immune responses in a number of ways. Down regulation may be in the form of inhibiting or blocking an immune response already in progress or may involve preventing the induction of an immune response. The functions of activated T-cells may be inhibited by suppressing T cell responses or by inducing specific tolerance in T cells, or both.
Immunosuppression of T cell responses is generally an active, non-antigen-specific process which requires continuous exposure of the T cells to the suppressive agent. Tolerance, which involves inducing non-responsiveness or anergy in T cells, is distinguishable from immunosuppression in that it is generally antigen-specific and persists after exposure to the tolerizing agent has ceased. Operationally, tolerance can be demonstrated by the lack of a T cell response upon reexposure to specific antigen in the absence of the tolerizing agent.
Down regulating or preventing one or more antigen functions (including without limitation B lymphocyte antigen functions (such as, for example, B7)), e.g., preventing high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, skin and organ transplantation and in graft-versus-host disease (GVHD). For example, blockage of T cell function should result in reduced tissue destruction in tissue transplantation. Typically, in tissue transplants, rejection of the transplant is initiated through its recognition as foreign by T cells, followed by an immune reaction that destroys the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, monomeric form of a peptide having B7-2 activity alone or in conjunction with a monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to transplantation can lead to the binding of the molecule to the natural ligand(s) on the immune cells without transmitting the corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of these blocking reagents.
To achieve sufficient immunosuppression or tolerance in a subject, it may also be necessary to block the function of a combination of B lymphocyte antigens.
The efficacy of particular blocking reagents in preventing organ transplant rejection or GVHD can be assessed using animal models that are predictive of efficacy in humans. Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA, 89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the effect of blocking B lymphocyte antigen function in vivo on the development of that disease.
Blocking antigen function may also be therapeutically useful for treating autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation of T cells that are reactive against self tissue and which promote the production of cytokines and autoantibodies involved in the pathology of the diseases. Preventing the activation of autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents which block costimulation of T cells by disrupting receptor ligand interactions of B lymphocyte antigens can be used to inhibit T cell activation and prevent production of autoantibodies or T cell-derived cytokines which may be involved in the disease process.
Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T cells which could lead to long-term relief from the disease. The efficacy of blocking reagents in preventing or alleviating autoimmune disorders can be determined using a number of well-characterized animal models of human autoimmune diseases. Examples include murine experimental autoimmune encephalitis, systemic lupus erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856).
Upregulation of an antigen function (preferably a B lymphocyte antigen function), as a means of upregulating immune responses, may also be useful in therapy. Upregulation of immune responses may take the form of enhancing an existing immune response or eliciting an initial immune response. For example, enhancing an immune response through stimulating B lymphocyte antigen function may be useful in cases of viral infection. In addition, systemic viral diseases such as influenza, the common cold, and encephalitis might be alleviated by the systemic administration of a stimulatory form of B lymphocyte antigens.
Alternatively, anti-viral immune responses may be enhanced in an infected patient by removing T cells from the patient, costimulating the T cells in vitro with viral antigen-pulsed APCs either expressing a peptide of the present invention or together with a stimulatory form of a soluble peptide of the present invention, and reintroducing the in vitro activated T cells into the patient.
The infected cells would now be capable of delivering a costimulatory signal to T cells in vivo, thereby activating the T cells.
In another application, upregulation or enhancement of antigen function (preferably B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) transfected with a nucleic acid encoding at least one peptide of the present invention can be administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the tumor cell can be transfected to express a combination of peptides. For example, tumor cells obtained from a patient can be transfected ex vivo with an expression vector directing the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned to the patient to result in expression of the peptides on the surface of the transfected cell.
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in vivo.
The presence of the peptide of the present invention having the activity of a B lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation signal to T cells to induce a T cell-mediated immune response against the transfected tumor cells. In addition, tumor cells which lack MHC class I or MHC class II molecules, or which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules, can be transfected with nucleic acids encoding all or a fragment (e.g., a cytoplasmic-domain truncated fragment) of an MHC class I α chain protein and β2 microglobulin protein, or an MHC class II α chain protein and an MHC class II β chain protein, to thereby express MHC class I or MHC class II proteins on the cell surface. Expression of the appropriate class I or class II MHC in conjunction with a peptide having the activity of a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell-mediated immune response against the transfected tumor cell. Optionally, a gene encoding an antisense construct which blocks expression of an MHC class II associated protein, such as the invariant chain, can also be cotransfected with a DNA encoding a peptide having the activity of a B lymphocyte antigen to promote presentation of tumor associated antigens and induce tumor specific immunity.
Thus, the induction of a T cell-mediated immune response in a human subject may be sufficient to overcome tumor-specific tolerance in the subject. Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE 22
Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Hematopoiesis Regulating Activity
The proteins encoded by the cDNAs or fragments thereof may also be evaluated for their hematopoiesis regulating activity. For example, the effect of the proteins on embryonic stem cell differentiation may be evaluated. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Johansson et al., Cellular Biology 15:141-151, 1995; Keller et al., Molecular and Cellular Biology 13:473-486, 1993; McClanahan et al., Blood 81:2903-2915, 1993.
The proteins encoded by the cDNAs or fragments thereof may also be evaluated for their influence on the lifetime of stem cells and stem cell differentiation. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Freshney, M.G. Methylcellulose Colony Forming Assays, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 265-268, Wiley-Liss, Inc., New York, NY. 1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; McNiece, I.K. and Briddell, R.A. Primitive Hematopoietic Colony Forming Cells with High Proliferative Potential, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, NY. 1994; Neben et al., Experimental Hematology 22:353-359, 1994; Ploemacher, R.E. Cobblestone Area Forming Cell Assay, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., New York, NY. 1994; Spooncer, E., Dexter, M. and Allen, T. Long Term Bone Marrow Cultures in the Presence of Stromal Cells, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 163-179, Wiley-Liss, Inc., New York, NY. 1994; and Sutherland, H.J. Long Term Culture Initiating Cell Assay, in Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 139-162, Wiley-Liss, Inc., New York, NY. 1994.
Those proteins which exhibit hematopoiesis regulatory activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of hematopoiesis is beneficial. For example, a protein of the present invention may be useful in regulation of hematopoiesis and, consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal biological activity in support of colony forming cells or of factor-dependent cell lines indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and proliferation of erythroid progenitor cells alone or in combination with other cytokines, thereby indicating utility, for example, in treating various anemias or for use in conjunction with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or erythroid cells; in supporting the growth and proliferation of myeloid cells such as granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for example, in conjunction with chemotherapy to prevent or treat consequent myelo-suppression; in supporting the growth and proliferation of megakaryocytes and consequently of platelets, thereby allowing prevention or treatment of various platelet disorders such as thrombocytopenia, and generally for use in place of or complementary to platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic stem cells which are capable of maturing to any and all of the above-mentioned hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such as those usually treated with transplantation, including, without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell compartment post irradiation/chemotherapy, either in vivo or ex vivo (i.e., in conjunction with bone marrow transplantation or with peripheral progenitor cell transplantation (homologous or heterologous)) as normal cells or genetically manipulated for gene therapy. Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Regulation of Tissue Growth
The proteins encoded by the cDNAs or fragments thereof may also be evaluated for their effect on tissue growth. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in International Patent Publication No. WO95/16035, International Patent Publication No. WO95/05846 and International Patent Publication No. WO91/07491.
Assays for wound healing activity include, without limitation, those described in: Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H.I. and Rovee, D.T., eds.), Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. Invest. Dermatol. 71:382-84 (1978).

Those proteins which are involved in the regulation of tissue growth may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of tissue growth is beneficial. For example, a protein of the present invention also may have utility in compositions used for bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for wound healing and tissue repair and replacement, and in the treatment of burns, incisions and ulcers.
A protein of the present invention, which induces cartilage and/or bone growth in circumstances where bone is not normally formed, has application in the healing of bone fractures and cartilage damage or defects in humans and other animals. Such a preparation employing a protein of the invention may have prophylactic use in closed as well as open fracture reduction and also in the improved fixation of artificial joints. De novo bone formation induced by an osteogenic agent contributes to the repair of congenital, trauma-induced, or oncologic resection-induced craniofacial defects, and also is useful in cosmetic plastic surgery.
A protein of this invention may also be used in the treatment of periodontal disease, and in other tooth repair processes. Such agents may provide an environment to attract bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of progenitors of bone-forming cells. A protein of the invention may also be useful in the treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase activity, osteoclast activity, etc.) mediated by inflammatory processes.
Another category of tissue regeneration activity that may be attributable to the protein of the present invention is tendon/ligament formation. A protein of the present invention, which induces tendon/ligament-like tissue or other tissue formation in circumstances where such tissue is not normally formed, has application in the healing of tendon or ligament tears, deformities and other tendon or ligament defects in humans and other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein may have prophylactic use in preventing damage to tendon or ligament tissue, as well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced by a composition of the present invention contributes to the repair of congenital, trauma-induced, or other tendon or ligament defects of other origin, and is also useful in cosmetic plastic surgery for attachment or repair of tendons or ligaments. The compositions of the present invention may provide an environment to attract tendon- or ligament-forming cells, stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the invention may also be useful in the treatment of tendinitis, carpal tunnel syndrome and other tendon or ligament defects. The compositions may also include an appropriate matrix and/or sequestering agent as a carrier as is well known in the art.
The protein of the present invention may also be useful for proliferation of neural cells and for regeneration of nerve and brain tissue, i.e., for the treatment of central and peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. More specifically, a protein may be used in the treatment of diseases of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome.
Further conditions which may be treated in accordance with the present invention include mechanical and traumatic disorders, such as spinal cord disorders, head trauma and cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from chemotherapy or other medical therapies may also be treatable using a protein of the invention.
Proteins of the invention may also be useful to promote better or faster closure of non-healing wounds, including without limitation pressure ulcers, ulcers associated with vascular insufficiency, surgical and traumatic wounds, and the like.
It is expected that a protein of the present invention may also exhibit activity for generation or regeneration of other tissues, such as organs (including, for example, pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) and vascular (including vascular endothelium) tissue, or for promoting the growth of cells comprising such tissues. Part of the desired effects may be achieved by inhibition or modulation of fibrotic scarring to allow normal tissue to regenerate. A protein of the invention may also exhibit angiogenic activity.
A protein of the present invention may also be useful for gut protection or regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, and conditions resulting from systemic cytokine damage.
A protein of the present invention may also be useful for promoting or inhibiting differentiation of tissues described above from precursor tissues or cells; or for inhibiting the growth of tissues described above.
Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Regulation of Reproductive Hormones or Cell Movement
The proteins encoded by the cDNAs or fragments thereof may also be evaluated for their ability to regulate reproductive hormones, such as follicle stimulating hormone. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Vale et al., Endocrinology 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986; Chapter 6.12 (Measurement of Alpha and Beta Chemokines), Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates and Wiley-Interscience; Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of Immunol. 152:5860-5867, 1994; Johnston et al., J. of Immunol. 153:1762-1768, 1994.
Those proteins which exhibit activity as reproductive hormones or regulators of cell movement may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of reproductive hormones or cell movement is beneficial. For example, a protein of the present invention may also exhibit activin- or inhibin-related activities. Inhibins are characterized by their ability to inhibit the release of follicle stimulating hormone (FSH), while activins are characterized by their ability to stimulate the release of FSH. Thus, a protein of the present invention, alone or in heterodimers with a member of the inhibin α family, may be useful as a contraceptive based on the ability of inhibins to decrease fertility in female mammals and decrease spermatogenesis in male mammals.
Administration of sufficient amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein of the invention, as a homodimer or as a heterodimer with other protein subunits of the inhibin-β group, may be useful as a fertility inducing therapeutic, based upon the ability of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for example, United States Patent 4,798,885. A protein of the invention may also be useful for advancement of the onset of fertility in sexually immature mammals, so as to increase the lifetime reproductive performance of domestic animals such as cows, sheep and pigs.
Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Chemotactic/Chemokinetic Activity
The proteins encoded by the cDNAs or fragments thereof may also be evaluated for chemotactic/chemokinetic activity. For example, a protein of the present invention may have chemotactic or chemokinetic activity (e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell population to a desired site of action.
Chemotactic or chemokinetic proteins provide particular advantages in treatment of wounds and other trauma to tissues, as well as in treatment of localized infections. For example, attraction of lymphocytes, monocytes or neutrophils to tumors or sites of infection may result in improved immune responses against the tumor or infecting agent.
A protein or peptide has chemotactic activity for a particular cell population if it can stimulate, directly or indirectly, the directed orientation or movement of such cell population. Preferably, the protein or peptide has the ability to directly stimulate directed movement of cells.
Whether a particular protein has chemotactic activity for a population of cells can be readily determined by employing such protein or peptide in any known assay for cell chemotaxis.

The activity of a protein of the invention may, among other means, be measured by the following methods:
Assays for chemotactic activity (which will identify proteins that induce or prevent chemotaxis) consist of assays that measure the ability of a protein to induce the migration of cells across a membrane as well as the ability of a protein to induce the adhesion of one cell population to another cell population.
Suitable assays for movement and adhesion include, without limitation, those described in: Current Protocols in Immunology, Ed. by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, E.M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha and beta Chemokines 6.12.1-6.12.28); Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995; Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of Immunol. 152:5860-5867, 1994; Johnston et al., J. of Immunol. 153:1762-1768, 1994.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Regulation of Blood Clotting
The proteins encoded by the cDNAs or fragments thereof may also be evaluated for their effects on blood clotting. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, Prostaglandins 35:467-474, 1988.
Those proteins which are involved in the regulation of blood clotting may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of blood clotting is beneficial. For example, a protein of the invention may also exhibit hemostatic or thrombolytic activity. As a result, such a protein is expected to be useful in treatment of various coagulation disorders (including hereditary disorders, such as hemophilias) or to enhance coagulation and other hemostatic events in treating wounds resulting from trauma, surgery or other causes. A protein of the invention may also be useful for dissolving or inhibiting formation of thromboses and for treatment and prevention of conditions resulting therefrom (such as, for example, infarction of cardiac and central nervous system vessels (e.g., stroke)). Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Involvement in Receptor/Ligand Interactions
The proteins encoded by the cDNAs or a fragment thereof may also be evaluated for their involvement in receptor/ligand interactions. Numerous assays for such involvement are familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 7.28 (Measurement of Cellular Adhesion under Static Conditions 7.28.1-7.28.22) in Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates and Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995; Gyuris et al., Cell 75:791-803, 1993.
For example, the proteins of the present invention may also demonstrate activity as receptors, receptor ligands or inhibitors or agonists of receptor/ligand interactions.
Examples of such receptors and ligands include, without limitation, cytokine receptors and their ligands, receptor kinases and their ligands, receptor phosphatases and their ligands, receptors involved in cell-cell interactions and their ligands (including without limitation, cellular adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs involved in antigen presentation, antigen recognition and development of cellular and humoral immune responses). Receptors and ligands are also useful for screening of potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. Proteins of the present invention (including, without limitation, fragments of receptors and ligands) may themselves be useful as inhibitors of receptor/ligand interactions.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Anti-Inflammatory Activity
The proteins encoded by the cDNAs or a fragment thereof may also be evaluated for anti-inflammatory activity. The anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the inflammatory process, by inhibiting or promoting cell extravasation, or by stimulating or suppressing production of other factors which more directly inhibit or promote an inflammatory response. Proteins exhibiting such activities can be used to treat inflammatory conditions (including chronic or acute conditions), including without limitation inflammation associated with infection (such as septic shock, sepsis or systemic inflammatory response syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement-mediated hyperacute rejection, nephritis, cytokine- or chemokine-induced lung injury, inflammatory bowel disease, Crohn's disease, or inflammation resulting from overproduction of cytokines such as TNF or IL-1. Proteins of the invention may also be useful to treat anaphylaxis and hypersensitivity to an antigenic substance or material.

Assaying the Proteins Expressed from cDNAs or Fragments Thereof for Tumor Inhibition Activity
The proteins encoded by the cDNAs or a fragment thereof may also be evaluated for tumor inhibition activity. In addition to the activities described above for immunological treatment or prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A protein may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). A protein may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such as, for example, by inhibiting angiogenesis), by causing production of other factors, agents or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting factors, agents or cell types which promote tumor growth.
A protein of the invention may also exhibit one or more of the following additional activities or effects: inhibiting the growth, infection or function of, or killing, infectious agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting (suppressing or enhancing) bodily characteristics, including, without limitation, height, weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ or body part size or shape (such as, for example, breast augmentation or diminution, change in bone form or shape); affecting biorhythms or circadian cycles or rhythms; affecting the fertility of male or female subjects; affecting the metabolism, catabolism, anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s); affecting behavioral characteristics, including, without limitation, appetite, libido, stress, cognition (including cognitive disorders), depression (including depressive disorders) and violent behaviors;
providing analgesic effects or other pain reducing effects; promoting differentiation and growth of embryonic stem cells in lineages other than hematopoietic lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of the enzyme and treating deficiency-related diseases; treatment of hyperproliferative disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for example, the ability to bind antigens or complement); and the ability to act as an antigen in a vaccine composition to raise an immune response against such protein or another material or entity which is cross-reactive with such protein.

Identification of Proteins which Interact with Polypeptides Encoded by cDNAs
Proteins which interact with the polypeptides encoded by cDNAs or fragments thereof, such as receptor proteins, may be identified using two hybrid systems such as the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), the cDNAs or fragments thereof are inserted into an expression vector such that they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. cDNAs in a cDNA library which encode proteins which might interact with the polypeptides encoded by the cDNAs or fragments thereof are inserted into a second expression vector such that they are in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed into yeast and the yeast are plated on selection medium which selects for expression of selectable markers on each of the expression vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4 dependent lacZ expression. Those cells which are positive in both the histidine selection and the lacZ assay contain plasmids encoding proteins which interact with the polypeptide encoded by the cDNAs or fragments thereof.
Alternatively, the system described in Lustig et al., Methods in Enzymology 283:83-99 (1997), may be used for identifying molecules which interact with the polypeptides encoded by cDNAs. In such systems, in vitro transcription reactions are performed on a pool of vectors containing cDNA inserts cloned downstream of a promoter which drives in vitro transcription. The resulting pools of mRNAs are introduced into Xenopus laevis oocytes. The oocytes are then assayed for a desired activity.
Alternatively, the pooled in vitro transcription products produced as described above may be translated in vitro. The pooled in vitro translation products can be assayed for a desired activity or for interaction with a known polypeptide.
Proteins or other molecules interacting with polypeptides encoded by cDNAs can be found by a variety of additional techniques. In one method, affinity columns containing the polypeptide encoded by the cDNA or a fragment thereof can be constructed. In some versions of this method, the affinity column contains chimeric proteins in which the protein encoded by the cDNA or a fragment thereof is fused to glutathione S-transferase. A mixture of cellular proteins or a pool of expressed proteins as described above is applied to the affinity column. Proteins interacting with the polypeptide attached to the column can then be isolated and analyzed by 2-D gel electrophoresis as described in Ramunsen et al., Electrophoresis, 18, 588-598 (1997). Alternatively, the proteins retained on the affinity column can be purified by electrophoresis-based methods and sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to screen phage display human antibodies.
Proteins interacting with polypeptides encoded by cDNAs or fragments thereof can also be screened by using an Optical Biosensor as described in Edwards & Leatherbarrow, Analytical Biochemistry, 246, 1-6 (1997). The main advantage of the method is that it allows the determination of the association rate between the protein and other interacting molecules. Thus, it is possible to specifically select interacting molecules with a high or low association rate. Typically a target molecule is linked to the sensor surface (through a carboxymethyl dextran matrix) and a sample of test molecules is placed in contact with the target molecules. The binding of a test molecule to the target molecule causes a change in the refractive index and/or thickness. This change is detected by the Biosensor provided it occurs in the evanescent field (which extends a few hundred nanometers from the sensor surface). In these screening assays, the target molecule can be one of the polypeptides encoded by cDNAs or a fragment thereof and the test sample can be a collection of proteins extracted from tissues or cells, a pool of expressed proteins, combinatorial peptide and/or chemical libraries, or phage displayed peptides. The tissues or cells from which the test proteins are extracted can originate from any species.
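To make the association-rate determination concrete, the following is a minimal sketch assuming the simplest 1:1 (Langmuir) binding model, in which the sensor response during the association phase follows R(t) = Req(1 - exp(-kobs t)) with kobs = ka C + kd. Fitting kobs at several analyte concentrations C and then regressing kobs against C gives the association rate constant ka as the slope and the dissociation rate constant kd as the intercept. The function names are hypothetical, the data are noise-free and simulated, and the plateau Req is assumed to be known; real sensorgrams are normally analyzed by nonlinear fitting and may not obey a 1:1 model.

```python
import math


def simulate_association(ka, kd, conc, rmax, times):
    """Noise-free 1:1 association phase: R(t) = Req * (1 - exp(-kobs * t))."""
    kobs = ka * conc + kd
    req = rmax * ka * conc / kobs  # equilibrium response at this concentration
    return [req * (1.0 - math.exp(-kobs * t)) for t in times], req


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def observed_rate(times, responses, req):
    """kobs from the slope of ln(Req - R) versus t (Req assumed known)."""
    ys = [math.log(req - r) for r in responses]
    slope, _ = linear_fit(times, ys)
    return -slope


def estimate_ka_kd(concs, kobs_values):
    """Since kobs = ka * C + kd, the slope is ka and the intercept is kd."""
    ka, kd = linear_fit(concs, kobs_values)
    return ka, kd


if __name__ == "__main__":
    times = [10.0 * i for i in range(13)]  # seconds (illustrative)
    concs = [10e-9, 25e-9, 50e-9, 100e-9]  # molar (illustrative)
    kobs_values = []
    for c in concs:
        resp, req = simulate_association(1e5, 1e-3, c, 100.0, times)
        kobs_values.append(observed_rate(times, resp, req))
    ka, kd = estimate_ka_kd(concs, kobs_values)
    print(f"ka = {ka:.3g} 1/(M*s), kd = {kd:.3g} 1/s, KD = {kd / ka:.3g} M")
```

Regressing kobs on concentration, rather than reading a rate from a single sensorgram, is what separates the association rate from the dissociation rate; ranking test molecules by the fitted ka is one way to select binders with a high or low association rate.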
In other methods, a target protein is immobilized and the test population is a collection of unique polypeptides encoded by the cDNAs or fragments thereof.
To study the interaction of the proteins encoded by the cDNAs or fragments thereof with drugs, the microdialysis coupled to HPLC method described by Wang et al., Chromatographia, 44, 205-208 (1997) or the affinity capillary electrophoresis method described by Busch et al., J. Chromatogr. 777:311-328 (1997) may be used.
The system described in U.S. Patent No. 5,654,150 may also be used to identify molecules which interact with the polypeptides encoded by the cDNAs. In this system, pools of cDNAs are transcribed and translated in vitro and the reaction products are assayed for interaction with a known polypeptide or antibody.

It will be appreciated by those skilled in the art that the proteins expressed from the cDNAs or fragments may be assayed for numerous activities in addition to those specifically enumerated above. For example, the expressed proteins may be evaluated for applications involving control and regulation of inflammation, tumor proliferation or metastasis, infection, or other clinical conditions. In addition, the proteins expressed from the cDNAs or fragments thereof may be useful as nutritional agents or cosmetic agents.
The proteins expressed from the cDNAs or fragments thereof may be used to generate antibodies capable of specifically binding to the expressed protein or fragments thereof as described below. The antibodies may be capable of binding a full length protein encoded by one of the sequences of SEQ ID NOs: 24-73, a mature protein encoded by one of the sequences of SEQ ID NOs: 24-73, or a signal peptide encoded by one of the sequences of SEQ ID NOs: 24-73. Alternatively, the antibodies may be capable of binding fragments of the proteins expressed from the cDNAs which comprise at least 10 amino acids of the sequences of SEQ ID NOs: 74-123. In some embodiments, the antibodies may be capable of binding fragments of the proteins expressed from the cDNAs which comprise at least 15 amino acids of the sequences of SEQ ID NOs: 74-123. In other embodiments, the antibodies may be capable of binding fragments of the proteins expressed from the cDNAs which comprise at least 25 amino acids of the sequences of SEQ ID NOs: 74-123. In further embodiments, the antibodies may be capable of binding fragments of the proteins expressed from the cDNAs which comprise at least 40 amino acids of the sequences of SEQ ID NOs: 74-123.

Production of an Antibody to a Human Protein
Substantially pure protein or polypeptide is isolated from the transfected or transformed cells as described in example 18. The concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml.
Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
A. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein or peptides derived therefrom over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated.
The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2.
B. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein or peptides derived therefrom described above, which can be unmodified or modified to enhance immunogenicity.
Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than larger ones and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculation and dose, with both inadequate and excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, D. Weir (ed.), Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 1-2 µM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.), Amer. Soc. For Microbiol., Washington, D.C. (1980).
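The plateau figure can be cross-checked by converting mass concentration to molar concentration. The sketch below assumes an average IgG molar mass of about 150 kDa (an assumption; the text does not state one). With that value, 0.1 to 0.2 mg/ml corresponds to roughly 0.7 to 1.3 µM, i.e. the low-micromolar range.

```python
# Assumption: typical IgG molar mass of about 150 kDa (not given in the text).
IGG_MOLAR_MASS_G_PER_MOL = 150_000.0


def mg_per_ml_to_micromolar(conc_mg_per_ml,
                            molar_mass_g_per_mol=IGG_MOLAR_MASS_G_PER_MOL):
    """Convert a mass concentration (mg/ml) to a molar concentration (µM)."""
    grams_per_litre = conc_mg_per_ml          # 1 mg/ml is numerically 1 g/L
    moles_per_litre = grams_per_litre / molar_mass_g_per_mol
    return moles_per_litre * 1e6              # mol/L -> µmol/L


for c in (0.1, 0.2):
    print(f"{c} mg/ml ~ {mg_per_ml_to_micromolar(c):.2f} µM")
```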
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample.
The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.
V. Use of cDNAs or Fragments Thereof as Reagents
The cDNAs of the present invention may be used as reagents in isolation procedures, diagnostic assays, and forensic procedures. For example, sequences from the cDNAs (or genomic DNAs obtainable therefrom) may be detectably labeled and used as probes to isolate other sequences capable of hybridizing to them. In addition, sequences from the cDNAs (or genomic DNAs obtainable therefrom) may be used to design PCR primers to be used in isolation, diagnostic, or forensic procedures.

Preparation of PCR Primers and Amplification of DNA
The cDNAs (or genomic DNAs obtainable therefrom) may be used to prepare PCR primers for a variety of applications, including isolation procedures for cloning nucleic acids capable of hybridizing to such sequences, diagnostic techniques and forensic techniques. The PCR primers are at least 10 bases, and preferably at least 12, 15, or 17 bases in length. More preferably, the PCR primers are at least 20-30 bases in length. In some embodiments, the PCR primers may be more than 30 bases in length. It is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. A variety of PCR techniques are familiar to those skilled in the art.
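Matching the G/C ratio of a primer pair can be checked with a few lines of code. The Python sketch below computes GC content and a rough melting temperature using two common textbook approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, for short oligonucleotides, and Tm = 64.9 + 41(nGC - 16.4)/N °C for longer ones. These formulas, the 14-base switch-over, the tolerance values and the example sequences are assumptions for illustration; nearest-neighbor thermodynamic models give more accurate values.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq: str) -> float:
    """Rough melting temperature (°C) from base composition only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    n = len(seq)
    if n < 14:
        return 2.0 * at + 4.0 * gc             # Wallace rule, short oligos
    return 64.9 + 41.0 * (gc - 16.4) / n       # basic formula, longer primers


def pair_is_balanced(fwd: str, rev: str,
                     max_gc_diff: float = 0.10, max_tm_diff: float = 5.0) -> bool:
    """True if both primers have similar GC content and similar Tm.

    The default tolerances are illustrative assumptions, not fixed rules.
    """
    return (abs(gc_fraction(fwd) - gc_fraction(rev)) <= max_gc_diff
            and abs(approx_tm(fwd) - approx_tm(rev)) <= max_tm_diff)


# Hypothetical 20-mer primers (not taken from the sequence listing).
forward = "ACGTACGTACGTACGTACGT"
reverse = "TGCATGCATGCATGCATGCA"
print(gc_fraction(forward), round(approx_tm(forward), 2),
      pair_is_balanced(forward, reverse))
```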
For a review of PCR technology, see Molecular Cloning to Genetic Engineering, White, B.A., Ed., in Methods in Molecular Biology 67, Humana Press, Totowa, 1997. In each of these PCR procedures, PCR primers on either side of the nucleic acid sequence to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are then extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites.
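Because each cycle can at most double the number of target molecules, the yield after n cycles is often approximated as N0(1 + E)^n, where N0 is the starting number of templates and E the per-cycle efficiency (1.0 meaning perfect doubling). The sketch below implements only that idealized model; it ignores the plateau reached when primers, dNTPs or polymerase become limiting, and the efficiency values shown are hypothetical.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield N0 * (1 + E) ** n, ignoring the late-cycle plateau."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# 100 starting templates, 30 cycles: perfect vs. 90 % efficiency (hypothetical).
print(f"{expected_copies(100, 30):.3e}")
print(f"{expected_copies(100, 30, 0.9):.3e}")
```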

Use of cDNAs as Probes
Probes derived from cDNAs or fragments thereof (or genomic DNAs obtainable therefrom) may be labeled with detectable labels familiar to those skilled in the art, including radioisotopes and non-radioactive labels, to provide a detectable probe. The detectable probe may be single stranded or double stranded and may be made using techniques known in the art, including in vitro transcription, nick translation, or kinase reactions. A nucleic acid sample containing a sequence capable of hybridizing to the labeled probe is contacted with the labeled probe. If the nucleic acid in the sample is double stranded, it may be denatured prior to contacting the probe. In some applications, the nucleic acid sample may be immobilized on a surface such as a nitrocellulose or nylon membrane. The nucleic acid sample may comprise nucleic acids obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or tissue samples.
Procedures used to detect the presence of nucleic acids capable of hybridizing to the detectable probe include well known techniques such as Southern blotting, Northern blotting, dot blotting, colony hybridization, and plaque hybridization. In some applications, the nucleic acid capable of hybridizing to the labeled probe may be cloned into vectors such as expression vectors, sequencing vectors, or in vitro transcription vectors to facilitate the characterization and expression of the hybridizing nucleic acids in the sample. For example, such techniques may be used to isolate and clone sequences in a genomic library or cDNA library which are capable of hybridizing to the detectable probe as described in example 17 above.
PCR primers made as described in example 32 above may be used in forensic analyses, such as the DNA fingerprinting techniques described in Examples 34-38 below. Such analyses may utilize detectable probes or primers based on the sequences of the cDNAs or fragments thereof (or genomic DNAs obtainable therefrom).

Forensic Matching by DNA Sequencing
In one exemplary method, DNA samples are isolated from forensic specimens of, for example, hair, semen, blood or skin cells by conventional methods. A panel of PCR primers based on a number of the cDNAs (or genomic DNAs obtainable therefrom) is then utilized in accordance with example 32 to amplify DNA of approximately 100-200 bases in length from the forensic specimen.
Corresponding sequences are obtained from a test subject. Each of these identification DNAs is then sequenced using standard techniques, and a simple database comparison determines the differences, if any, between the sequences from the subject and those from the sample. Statistically significant differences between the suspect's DNA sequences and those from the sample conclusively prove a lack of identity.
This lack of identity can be proven, for example, with only one sequence. Identity, on the other hand, should be demonstrated with a large number of sequences, all matching. Preferably, a minimum of 50 statistically identical sequences of 100 bases in length are used to prove identity between the suspect and the sample.
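The database comparison described above can be sketched as a per-locus, position-by-position comparison of the subject's and the specimen's sequences. The Python below is a simplified illustration: it assumes the sequences for each locus are already aligned and of equal length, treats any mismatch as a difference (a real analysis would have to allow for sequencing error and population polymorphism frequencies), and uses the minimum of 50 matching loci stated above as its identity threshold. Locus names and sequences are hypothetical.

```python
def compare_loci(subject: dict, specimen: dict) -> dict:
    """Map each locus present in both samples to its mismatching positions."""
    report = {}
    for locus in sorted(subject.keys() & specimen.keys()):
        a, b = subject[locus].upper(), specimen[locus].upper()
        if len(a) != len(b):
            raise ValueError(f"{locus}: sequences must be aligned to equal length")
        report[locus] = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return report


def interpret(report: dict, min_matching_loci: int = 50) -> str:
    """Any differing locus excludes identity; identity needs many matching loci."""
    if any(report.values()):
        return "exclusion"
    if len(report) >= min_matching_loci:
        return "consistent with identity"
    return "inconclusive"


subject = {"locus_1": "ACGTTGCA", "locus_2": "GGATCCAA"}
specimen = {"locus_1": "ACGTTGCA", "locus_2": "GGATCGAA"}
result = compare_loci(subject, specimen)
print(result, interpret(result))
```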

Positive Identification by DNA Sequencing
The technique outlined in the previous example may also be used on a larger scale to provide a unique fingerprint-type identification of any individual. In this technique, primers are prepared from a large number of sequences from Table I and the appended sequence listing.
Preferably, 20 to 50 different primers are used. These primers are used to obtain a corresponding number of PCR-generated DNA segments from the individual in question in accordance with example 32. Each of these DNA segments is sequenced, using the methods set forth in example 34. The database of sequences generated through this procedure uniquely identifies the individual from whom the sequences were obtained. The same panel of primers may then be used at any later time to absolutely correlate tissue or other biological specimen with that individual.

Southern Blot Forensic Identification
The procedure of example 35 is repeated to obtain a panel of at least 10 amplified sequences from an individual and a specimen. Preferably, the panel contains at least 50 amplified sequences. More preferably, the panel contains 100 amplified sequences. In some embodiments, the panel contains 200 amplified sequences. This PCR-generated DNA is then digested with one or a combination of, preferably, four-base-specific restriction enzymes. Such enzymes are commercially available and known to those of skill in the art. After digestion, the resultant gene fragments are size separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose using Southern blotting techniques well known to those with skill in the art. For a review of Southern blotting, see Davis et al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp. 62-65).
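Where a four-base recognition site falls within an amplified segment determines the sizes of the resulting fragments. The Python sketch below predicts fragment lengths for a linear DNA digested with one or more enzymes. It assumes palindromic recognition sites (so scanning one strand is enough) and uses MboI (^GATC) and HaeIII (GG^CC) purely as examples of four-base cutters; the sequence is hypothetical.

```python
# Enzyme name -> (recognition site, cut offset within the site on the top strand).
FOUR_BASE_CUTTERS = {
    "MboI": ("GATC", 0),    # ^GATC
    "HaeIII": ("GGCC", 2),  # GG^CC
}


def digest(seq: str, enzymes: dict) -> list:
    """Return fragment lengths (5' to 3' order) of a linear DNA after digestion."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    # Cuts at the very ends do not create new fragments.
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


amplicon = "AAGATCAAGGCCAA"  # hypothetical 14-bp amplified segment
print(digest(amplicon, FOUR_BASE_CUTTERS))
```

A sequence difference between individuals that creates or destroys such a site changes the list of fragment sizes, which is what the Southern blot band pattern reveals.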
A panel of probes based on the sequences of the cDNAs (or genomic DNAs obtainable therefrom), or fragments thereof of at least 10 bases, are radioactively or colorimetrically labeled using methods known in the art, such as nick translation or end labeling, and hybridized to the Southern blot using techniques known in the art (Davis et al., supra). Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the cDNA (or genomic DNAs obtainable therefrom). More preferably, the probe comprises at least 20-30 consecutive nucleotides from the cDNA (or genomic DNAs obtainable therefrom). In some embodiments, the probe comprises more than 30 nucleotides from the cDNA (or genomic DNAs obtainable therefrom). In other embodiments, the probe comprises at least 40, at least 50, at least 75, at least 100, at least 150, or at least 200 consecutive nucleotides from the cDNA (or genomic DNAs obtainable therefrom).
Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 or 30 are used to provide a unique pattern. The resultant bands appearing from the hybridization of a large sample of cDNAs (or genomic DNAs obtainable therefrom) will be a unique identifier. Since the restriction enzyme cleavage will be different for every individual, the band pattern on the Southern blot will also be unique. Increasing the number of cDNA probes will provide a statistically higher level of confidence in the identification since there will be an increased number of sets of bands used for identification.
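The gain in confidence from adding probes can be quantified if one assumes (an assumption not made in the text) that the band patterns detected by different probes are statistically independent. The probability that an unrelated person matches by chance is then the product of the per-probe pattern frequencies. The sketch below works in log10 to avoid floating-point underflow; the frequencies are hypothetical.

```python
import math


def log10_random_match_probability(pattern_frequencies):
    """Sum of log10 frequencies = log10 of their product (independence assumed)."""
    freqs = list(pattern_frequencies)
    if not freqs or any(not 0.0 < f <= 1.0 for f in freqs):
        raise ValueError("frequencies must be in (0, 1]")
    return math.fsum(math.log10(f) for f in freqs)


# Hypothetical: each of 20 probes shows a pattern carried by 10 % of people.
print(log10_random_match_probability([0.1] * 20))
```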

Dot Blot Identification Procedure
Another technique for identifying individuals using the cDNA sequences disclosed herein utilizes a dot blot hybridization technique.
Genomic DNA is isolated from nuclei of the subject to be identified.
Oligonucleotide probes of approximately 30 bp in length are synthesized that correspond to at least 10, preferably 50, sequences from the cDNAs or genomic DNAs obtainable therefrom. The probes are used to hybridize to the genomic DNA through conditions known to those in the art. The oligonucleotides are end labeled with 32P using polynucleotide kinase (Pharmacia). Dot blots are created by spotting the genomic DNA onto nitrocellulose or the like using a vacuum dot blot manifold (BioRad, Richmond, California).
The genomic DNA is baked or UV cross-linked to the nitrocellulose filter, which is then prehybridized and hybridized with labeled probe using techniques known in the art (Davis et al., supra). The 32P-labeled DNA fragments are sequentially hybridized under successively more stringent conditions to detect minimal differences between the 30 bp sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985)). A unique pattern of dots distinguishes one individual from another individual.
cDNAs or oligonucleotides containing at least 10 consecutive bases from these sequences can be used as probes in the following alternative fingerprinting technique.
Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the cDNA (or genomic DNAs obtainable therefrom). More preferably, the probe comprises at least 20-30 consecutive nucleotides from the cDNA (or genomic DNAs obtainable therefrom). In some embodiments, the probe comprises more than 30 nucleotides from the cDNA (or genomic DNAs obtainable therefrom). In other embodiments, the probe comprises at least 40, at least 50, at least 75, at least 100, at least 150, or at least 200 consecutive nucleotides from the cDNA (or genomic DNAs obtainable therefrom).
Preferably, a plurality of probes having sequences from different genes are used in the alternative fingerprinting technique. Example 38 below provides a representative alternative fingerprinting procedure in which the probes are derived from cDNAs.

Alternative 'Fingerprint' Identification Technique
20-mer oligonucleotides are prepared from a large number, e.g. 50, 100, or 200, of cDNA sequences (or genomic DNAs obtainable therefrom) using commercially available oligonucleotide services such as Genset, Paris, France. Cell samples from the test subject are processed for DNA using techniques well known to those with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and XbaI. Following digestion, samples are applied to wells for electrophoresis. The procedure, as known in the art, may be modified to accommodate polyacrylamide electrophoresis; however, in this example, samples containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels. The gels are transferred onto nitrocellulose using standard Southern blotting techniques.
ng of each of the oligonucleotides are pooled and end-labeled with 32P. The nitrocellulose is prehybridized with blocking solution and hybridized with the labeled probes. Following hybridization and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for each individual.
It is additionally contemplated within this example that the number of probe sequences used can be varied for additional accuracy or clarity.
The antibodies generated in Examples 18 and 31 above may be used to identify the tissue type or cell species from which a sample is derived as described above.

Identification of Tissue Types or Cell Species by Means of Labeled Tissue Specific Antibodies
Identification of specific tissues is accomplished by the visualization of tissue specific antigens by means of antibody preparations according to Examples 18 and 31 which are conjugated, directly or indirectly, to a detectable marker. Selected labeled antibody species bind to their specific antigen binding partner in tissue sections, cell suspensions, or in extracts of soluble proteins from a tissue sample to provide a pattern for qualitative or semi-quantitative interpretation.
Antisera for these procedures must have a potency exceeding that of the native preparation, and for that reason, antibodies are concentrated to a mg/ml level by isolation of the gamma globulin fraction, for example, by ion-exchange chromatography or by ammonium sulfate fractionation.
Also, to provide the most specific antisera, unwanted antibodies, for example to common proteins, must be removed from the gamma globulin fraction, for example by means of insoluble immunoadsorbents, before the antibodies are labeled with the marker. Either monoclonal or heterologous antisera are suitable for either procedure.
A. Immunohistochemical Techniques
Purified, high-titer antibodies, prepared as described above, are conjugated to a detectable marker, as described, for example, by Fudenberg, H., Chap. 26 in: Basic & Clinical Immunology, 3rd Ed., Lange, Los Altos, California (1980) or Rose, N. et al., Chap. 12 in: Methods in Immunodiagnosis, 2d Ed., John Wiley & Sons, New York (1980).
A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be labeled with an enzyme that supports a color-producing reaction with a substrate, such as horseradish peroxidase. Markers can be added to tissue-bound antibody in a second step, as described below.

Alternatively, the specific anti-tissue antibodies can be labeled with ferritin or other electron-dense particles, and localization of the ferritin-coupled antigen-antibody complexes achieved by means of an electron microscope. In yet another approach, the antibodies are radiolabeled, with, for example, 125I, and detected by overlaying the antibody-treated preparation with photographic emulsion.
Preparations to carry out the procedures can comprise monoclonal or polyclonal antibodies to a single protein or peptide identified as specific to a tissue type, for example, brain tissue, or antibody preparations to several antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as required.
Tissue sections and cell suspensions are prepared for immunohistochemical examination according to common histological techniques. Multiple cryostat sections (about 4 µm, unfixed) of the unknown tissue and known control are mounted and each slide covered with different dilutions of the antibody preparation.
Sections of known and unknown tissues should also be treated with preparations to provide a positive control, a negative control, for example, pre-immune sera, and a control for non-specific staining, for example, buffer.
Treated sections are incubated in a humid chamber for 30 min at room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted away, and the marker developed.
If the tissue specific antibody was not labeled in the first incubation, it can be labeled at this time in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme-conjugated antibody against the immunoglobulin class of the antiserum-producing species, for example, fluorescein labeled antibody to mouse IgG. Such labeled sera are commercially available.
The antigen found in the tissues by the above procedure can be quantified by measuring the intensity of color or fluorescence on the tissue section, and calibrating that signal using appropriate standards.
B. Identification of Tissue Specific Soluble Proteins
The visualization of tissue specific proteins and identification of unknown tissues from that procedure is carried out using the labeled antibody reagents and detection strategy as described for immunohistochemistry; however, the sample is prepared according to an electrophoretic technique to distribute the proteins extracted from the tissue in an orderly array on the basis of molecular weight for detection.
A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by Dounce homogenization or osmotic lysis, using detergents in either case as required to disrupt cell membranes, as is the practice in the art. Insoluble cell components such as nuclei, microsomes, and membrane fragments are removed by ultracentrifugation, and the soluble protein-containing fraction concentrated if necessary and reserved for analysis.
A sample of the soluble protein solution is resolved into individual protein species by conventional SDS polyacrylamide electrophoresis as described, for example, by Davis, L. et al., Section 19-2 in: Basic Methods in Molecular Biology (P. Leder, ed.), Elsevier, New York (1986), using a range of amounts of polyacrylamide in a set of gels to resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run in parallel for purposes of estimating molecular weights of the constituent proteins. Sample size for analysis is a convenient volume of from 5 to 55 µl, containing from about 1 to 100 µg protein. An aliquot of each of the resolved proteins is transferred by blotting to a nitrocellulose filter paper, a process that maintains the pattern of resolution. Multiple copies are prepared. The procedure, known as Western blot analysis, is well described in Davis, L. et al. (above), Section 19-3. One set of nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with the antibody-bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more specific antisera to tissue specific proteins prepared as described in Examples 18 and 31. In this procedure, as in procedure A above, appropriate positive and negative sample and reagent controls are run.
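The size-marker lane is typically used by plotting the logarithm of the marker molecular weights against their relative mobility (Rf, the band migration distance divided by the dye-front distance), which is approximately linear over the resolving range of a given gel. The Python sketch below fits that standard curve by least squares and interpolates an unknown band. The linear log(MW) versus Rf relationship is the usual approximation, and the marker Rf values are hypothetical; since the text uses gels of different polyacrylamide concentrations, one curve would be fitted per gel.

```python
import math


def fit_standard_curve(markers):
    """Fit log10(MW) = intercept + slope * Rf from (Rf, MW) marker pairs."""
    if len(markers) < 2:
        raise ValueError("need at least two size markers")
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(rf, slope, intercept):
    """Interpolate the molecular weight (same units as the markers) of a band."""
    return 10 ** (intercept + slope * rf)


# Hypothetical marker lane: (relative mobility, molecular weight in Da).
markers = [(0.15, 97_400), (0.30, 66_200), (0.50, 45_000), (0.70, 31_000), (0.85, 21_500)]
slope, intercept = fit_standard_curve(markers)
print(round(estimate_mw(0.60, slope, intercept)))
```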
In either procedure A or B, a detectable label can be attached to the primary tissue antigen-primary antibody complex according to various strategies and permutations thereof. In a straightforward approach, the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound by a labeled secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody is conjugated to a biotin molecule, which can, in a subsequent step, bind an avidin conjugated marker. According to yet another strategy, enzyme labeled or radioactive protein A, which has the property of binding to any IgG, is bound in a final step to either the primary or secondary antibody.
The visualization of tissue specific antigen binding at levels above those seen in control tissues to one or more tissue specific antibodies, prepared from the gene sequences identified from cDNA sequences, can identify tissues of unknown origin, for example, forensic samples, or differentiated tumor tissue that has metastasized to foreign bodily sites.
In addition to their applications in forensics and identification, cDNAs (or genomic DNAs obtainable therefrom) may be mapped to their chromosomal locations. Example 40 below describes radiation hybrid (RH) mapping of human chromosomal regions using cDNAs. Example 41 below describes a representative procedure for mapping a cDNA (or a genomic DNA obtainable therefrom) to its location on a human chromosome. Example 42 below describes mapping of cDNAs (or genomic DNAs obtainable therefrom) on metaphase chromosomes by Fluorescence In Situ Hybridization (FISH).

Radiation hybrid mapping of cDNAs to the human genome
Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high resolution mapping of the human genome. In this approach, cell lines containing one or more human chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends on the radiation dose. These fragments are rescued by fusion with cultured rodent cells, yielding subclones containing different fragments of the human genome. This technique is described by Benham et al. (Genomics 4:509-517, 1989) and Cox et al. (Science 250:245-250, 1990). The random and independent nature of the subclones permits efficient mapping of any human genome marker.
Human DNA isolated from a panel of 80-100 cell lines provides a mapping reagent for ordering cDNAs (or genomic DNAs obtainable therefrom). In this approach, the frequency of breakage between markers is used to measure distance, allowing construction of fine resolution maps as has been done using conventional ESTs (Schuler et al., Science 274:540-546, 1996).
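The breakage frequency between two markers is commonly estimated from each marker's pattern of retention across the hybrid panel, along the lines of Cox et al. cited above: the observed number of discordant hybrids (one marker retained, the other not) is divided by the number expected if the markers were unlinked, giving the breakage probability θ, and the distance is D = -ln(1 - θ), expressed in centirays (cR) for the radiation dose used. The Python sketch below assumes error-free 0/1 retention calls; the panel data are hypothetical.

```python
import math


def rh_breakage_and_distance(marker_a, marker_b):
    """Estimate breakage probability theta and distance in cR between two markers.

    marker_a, marker_b: 0/1 retention calls for the same hybrids, in the same order.
    """
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("retention vectors must be non-empty and equal length")
    r_a = sum(marker_a) / n                 # retention frequency of marker A
    r_b = sum(marker_b) / n                 # retention frequency of marker B
    discordant = sum(1 for a, b in zip(marker_a, marker_b) if a != b)
    expected_if_unlinked = n * (r_a * (1 - r_b) + r_b * (1 - r_a))
    if expected_if_unlinked == 0:
        raise ValueError("uninformative: both markers retained in every hybrid or in none")
    theta = discordant / expected_if_unlinked
    if theta >= 1.0:
        return theta, math.inf              # no evidence of linkage
    return theta, -100.0 * math.log1p(-theta)  # D = -ln(1 - theta), in centirays


# Hypothetical retention calls for two markers across eight hybrids.
a = [1, 1, 1, 0, 0, 0, 1, 0]
b = [1, 1, 0, 0, 0, 0, 1, 0]
print(rh_breakage_and_distance(a, b))
```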
RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) (Foster et al., Genomics 33:185-192, 1996), the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245, 1996), 60 loci covering the entire short arm of chromosome 12 (Raeymaekers et al., Genomics 29:170-178, 1995), the region of human chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584, 1992) and 13 loci on the long arm of chromosome 5 (Warrington et al., Genomics 11:701-708, 1991).

Mapping of cDNAs to Human Chromosomes using PCR techniques
cDNAs (or genomic DNAs obtainable therefrom) may be assigned to human chromosomes using PCR-based methodologies. In such approaches, oligonucleotide primer pairs are designed from the cDNA sequence (or the sequence of a genomic DNA obtainable therefrom) to minimize the chance of amplifying through an intron. Preferably, the oligonucleotide primers are 18-23 bp in length and are designed for PCR amplification. The creation of PCR primers from known sequences is well known to those with skill in the art.
For a review of PCR technology, see Erlich, H.A., PCR Technology: Principles and Applications for DNA Amplification, 1992, W.H. Freeman and Co., New York.
The primers are used in polymerase chain reactions (PCR) to amplify templates from total human genomic DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR with 80 ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled deoxycytidine triphosphate. The PCR is performed in a microplate thermocycler (Techne) under the following conditions:
30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min. The amplified products are analyzed on a 6% polyacrylamide sequencing gel and visualized by autoradiography.
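The cycling program above can be written down as data, which makes it easy to check or reuse. The Python sketch below encodes the stated steps and computes the total hold time; the `Step` layout is a hypothetical illustration, not a thermocycler file format, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Program as stated in the text: 30 cycles, then a final extension.
CYCLE = (Step("denature", 94.0, 1.4), Step("anneal", 55.0, 2.0), Step("extend", 72.0, 2.0))
N_CYCLES = 30
FINAL = (Step("final extension", 72.0, 10.0),)


def total_hold_minutes(cycle, n_cycles, final_steps):
    """Total time spent at hold temperatures, excluding ramping."""
    return n_cycles * sum(s.minutes for s in cycle) + sum(s.minutes for s in final_steps)


print(f"{total_hold_minutes(CYCLE, N_CYCLES, FINAL):.1f} min")
```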
If the length of the resulting PCR product is identical to the distance between the ends of the primer sequences in the cDNA from which the primers are derived, then the PCR reaction is repeated with DNA templates from two panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).
PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for the presence of a given cDNA (or genomic DNA obtainable therefrom). DNA is isolated from the somatic hybrids and used as starting templates for PCR reactions using the primer pairs from the cDNAs (or genomic DNAs obtainable therefrom). Only those somatic cell hybrids with chromosomes containing the human gene corresponding to the cDNA (or genomic DNA obtainable therefrom) will yield an amplified fragment. The cDNAs (or genomic DNAs obtainable therefrom) are assigned to a chromosome by analysis of the segregation pattern of PCR products from the somatic hybrid DNA templates. The single human chromosome present in all cell hybrids that give rise to an amplified fragment is the chromosome containing that cDNA (or genomic DNA obtainable therefrom). For a review of techniques and analysis of results from somatic cell gene mapping experiments, see Ledbetter et al., Genomics 6:475-481 (1990). Alternatively, the cDNAs (or genomic DNAs obtainable therefrom) may be mapped to individual chromosomes using FISH as described in example 42 below.
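The segregation analysis can be illustrated with set operations: the candidate chromosome is the one shared by every hybrid that gives an amplification product, and chromosomes carried by negative hybrids can additionally be excluded. The Python sketch below assumes error-free PCR calls and that each hybrid retains whole chromosomes; the hybrid names and chromosome contents are hypothetical and do not describe the BIOS or NIGMS panels.

```python
def assign_chromosome(panel: dict, pcr_positive: dict) -> set:
    """Return the human chromosomes consistent with the PCR results.

    panel: hybrid name -> set of human chromosomes it contains.
    pcr_positive: hybrid name -> True if the cDNA primers gave a product.
    """
    positives = [panel[h] for h, hit in pcr_positive.items() if hit]
    if not positives:
        return set()
    # Chromosomes present in every positive hybrid.
    candidates = set.intersection(*map(set, positives))
    for hybrid, hit in pcr_positive.items():
        if not hit:
            candidates -= panel[hybrid]   # a negative hybrid rules out its chromosomes
    return candidates


panel = {"H1": {"1", "5", "X"}, "H2": {"5", "12"}, "H3": {"1", "12"}}
results = {"H1": True, "H2": True, "H3": False}
print(assign_chromosome(panel, results))
```

An empty or multi-chromosome result would indicate that more hybrids, or an analysis tolerant of partial chromosomes and PCR errors, are needed.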

Mapping of cDNAs to Chromosomes Using Fluorescence in situ Hybridization
Fluorescence in situ hybridization allows the cDNA (or genomic DNA obtainable therefrom) to be mapped to a particular location on a given chromosome. The chromosomes to be used for fluorescence in situ hybridization techniques may be obtained from a variety of sources including cell cultures, tissues, or whole blood.
In a preferred embodiment, chromosomal localization of a cDNA (or genomic DNA obtainable therefrom) is obtained by FISH as described by Cherif et al. (Proc. Natl. Acad. Sci. U.S.A., 87:6639-6643, 1990). Metaphase chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cells of donors. PHA-stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 medium. For synchronization, methotrexate (10 µM) is added for 17 h, followed by addition of 5-bromodeoxyuridine (5-BudR, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15 min before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid (3:1). The cell suspension is dropped onto a glass slide and air dried. The cDNA (or genomic DNA obtainable therefrom) is labeled with biotin-16 dUTP by nick translation according to the manufacturer's instructions (Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and precipitated.
Just prior to hybridization, the DNA pellet is dissolved in hybridization buffer (50% formamide, 2 X SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.
Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three times in 2 X SSC and dehydrated in an ethanol series. Chromosome preparations are denatured in 70% formamide, 2 X SSC for 2 min at 70°C, then dehydrated at 4°C. The slides are treated with proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the slide, covered with a coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization and post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified with additional layers of biotinylated goat anti-avidin and avidin-FITC.
For chromosomal localization, fluorescent R-bands are obtained as previously described (Cherif et al., supra). The slides are observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are counterstained with propidium iodide and the fluorescent signal of the probe appears as two symmetrical yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus, a particular cDNA (or genomic DNA obtainable therefrom) may be localized to a particular cytogenetic R-band on a given chromosome.

Use of cDNAs to Construct or Expand Chromosome Maps
Once the cDNAs (or genomic DNAs obtainable therefrom) have been assigned to particular chromosomes using the techniques described in Examples 40-42 above, they may be utilized to construct a high resolution map of the chromosomes on which they are located or to identify the chromosomes in a sample.
Chromosome mapping involves assigning a given unique sequence to a particular chromosome as described above. Once the unique sequence has been mapped to a given chromosome, it is ordered relative to other unique sequences located on the same chromosome. One approach to chromosome mapping utilizes a series of yeast artificial chromosomes (YACs) bearing several thousand long inserts derived from the chromosomes of the organism from which the cDNAs (or genomic DNAs obtainable therefrom) are obtained. This approach is described in Ramaiah Nagaraja et al., Genome Research 7:210-222, March 1997. Briefly, in this approach each chromosome is broken into overlapping pieces which are inserted into the YAC vector. The YAC inserts are screened using PCR or other methods to determine whether they include the cDNA (or genomic DNA obtainable therefrom) whose position is to be determined.
Once an insert has been found which includes the cDNA (or genomic DNA obtainable therefrom), the insert can be analyzed by PCR or other methods to determine whether the insert also contains other sequences known to be on the chromosome or in the region from which the cDNA (or genomic DNA obtainable therefrom) was derived. This process can be repeated for each insert in the YAC library to determine the location of each of the cDNAs (or genomic DNAs obtainable therefrom) relative to one another and to other known chromosomal markers. In this way, a high resolution map of the distribution of numerous unique markers along each of the organism's chromosomes may be obtained.
As described in example 44 below, cDNAs (or genomic DNAs obtainable therefrom) may also be used to identify genes associated with a particular phenotype, such as hereditary disease or drug response.

Identification of genes associated with hereditary diseases or drug response
This example illustrates an approach useful for the association of cDNAs (or genomic DNAs obtainable therefrom) with particular phenotypic characteristics. In this example, a particular cDNA (or genomic DNA obtainable therefrom) is used as a test probe to associate that cDNA (or genomic DNA obtainable therefrom) with a particular phenotypic characteristic.
cDNAs (or genomic DNAs obtainable therefrom) are mapped to a particular location on a human chromosome using techniques such as those described in Examples 40 and 41 or other techniques known in the art. A search of Mendelian Inheritance in Man (V. McKusick, Mendelian Inheritance in Man, available online through the Johns Hopkins University Welch Medical Library) reveals the region of the human chromosome which contains the cDNA (or genomic DNA obtainable therefrom) to be a very gene-rich region containing several known genes and several diseases or phenotypes for which genes have not been identified. The gene corresponding to this cDNA (or genomic DNA obtainable therefrom) thus becomes an immediate candidate for each of these genetic diseases.
Cells from patients with these diseases or phenotypes are isolated and expanded in culture. PCR primers from the cDNA (or genomic DNA obtainable therefrom) are used to screen genomic DNA, mRNA or cDNA obtained from the patients. cDNAs (or genomic DNAs obtainable therefrom) that are not amplified in the patients can be positively associated with a particular disease by further analysis. Alternatively, the PCR analysis may yield fragments of different lengths when the samples are derived from an individual having the phenotype associated with the disease than when the sample is derived from a healthy individual, indicating that the gene containing the cDNA may be responsible for the genetic disease.
VI. Use of cDNAs (or Genomic DNAs Obtainable Therefrom) to Construct Vectors
The present cDNAs (or genomic DNAs obtainable therefrom) may also be used to construct secretion vectors capable of directing the secretion of the proteins encoded by genes inserted in the vectors.
Such secretion vectors may facilitate the purification or enrichment of the proteins encoded by genes inserted therein by reducing the number of background proteins from which the desired protein must be purified or enriched. Exemplary secretion vectors are described below.

Construction of Secretion Vectors
The secretion vectors of the present invention include a promoter capable of directing gene expression in the host cell, tissue, or organism of interest. Such promoters include the Rous Sarcoma Virus promoter, the SV40 promoter, the human cytomegalovirus promoter, and other promoters familiar to those skilled in the art.
A signal sequence from a cDNA (or genomic DNA obtainable therefrom), such as one of the signal sequences in SEQ ID NOs: 24-73 as defined in Table I above, is operably linked to the promoter such that the mRNA transcribed from the promoter will direct the translation of the signal peptide. The host cell, tissue, or organism may be any cell, tissue, or organism which recognizes the signal peptide encoded by the signal sequence in the cDNA (or genomic DNA obtainable therefrom). Suitable hosts include mammalian cells, tissues or organisms, avian cells, tissues, or organisms, insect cells, tissues or organisms, or yeast.
In addition, the secretion vector contains cloning sites for inserting genes encoding the proteins which are to be secreted. The cloning sites facilitate the cloning of the insert gene in frame with the signal sequence such that a fusion protein in which the signal peptide is fused to the protein encoded by the inserted gene is expressed from the mRNA transcribed from the promoter. The signal peptide directs the extracellular secretion of the fusion protein.
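Whether an inserted gene is cloned in frame with the signal sequence can be checked in silico before the vector is built. A minimal sketch, assuming coding-strand sequences written 5' to 3', an insert that lacks its own initiator ATG and ends in its stop codon, and a hypothetical linker standing for the cloning site; it checks the reading frame only and does not predict whether the host recognizes the fused signal peptide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_in_frame_fusion(signal_seq: str, linker: str, insert: str) -> bool:
    """True if signal sequence + cloning-site linker + insert form a single
    open reading frame: starts with ATG, keeps the frame across the
    junction, and has a stop codon only at the very end."""
    signal_seq, linker, insert = signal_seq.upper(), linker.upper(), insert.upper()
    # The insert stays in frame only if signal + linker is a whole number of codons.
    if (len(signal_seq) + len(linker)) % 3 != 0:
        return False
    orf = signal_seq + linker + insert
    if not orf.startswith("ATG") or len(orf) % 3 != 0:
        return False
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    # A premature stop codon would truncate the fusion protein.
    return (all(c not in STOP_CODONS for c in codons[:-1])
            and codons[-1] in STOP_CODONS)
```

For example, a hypothetical 9-nt signal fragment `ATGAAACTG` joined through a 6-nt `GGATCC` linker to `GCTGCTTAA` passes, while a 5-nt linker shifts the frame and fails.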
The secretion vector may be DNA or RNA and may integrate into the chromosome of the host, be stably maintained as an extrachromosomal replicon in the host, be an artificial chromosome, or be transiently present in the host. Preferably, the secretion vector is maintained in multiple copies in each host cell. As used herein, multiple copies means at least 2, 5, 10, 20, 25, 50 or more than 50 copies per cell. In some embodiments, the multiple copies are maintained extrachromosomally. In other embodiments, the multiple copies result from amplification of a chromosomal sequence.
Many nucleic acid backbones suitable for use as secretion vectors are known to those skilled in the art, including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast integrating plasmids, yeast episomal plasmids, yeast artificial chromosomes, human artificial chromosomes, P element vectors, baculovirus vectors, or bacterial plasmids capable of being transiently introduced into the host.
The secretion vector may also contain a polyA signal such that the polyA signal is located downstream of the gene inserted into the secretion vector.
After the gene encoding the protein for which secretion is desired is inserted into the secretion vector, the secretion vector is introduced into the host cell, tissue, or organism using calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection, viral particles or as naked DNA. The protein encoded by the inserted gene is then purified or enriched from the supernatant using conventional techniques such as ammonium sulfate precipitation, immunoprecipitation, immunochromatography, size exclusion chromatography, ion exchange chromatography, and HPLC.
Alternatively, the secreted protein may be in a sufficiently enriched or pure state in the supernatant or growth media of the host to permit it to be used for its intended purpose without further enrichment.
The signal sequences may also be inserted into vectors designed for gene therapy. In such vectors, the signal sequence is operably linked to a promoter such that mRNA transcribed from the promoter encodes the signal peptide. A cloning site is located downstream of the signal sequence such that a gene encoding a protein whose secretion is desired may readily be inserted into the vector and fused to the signal sequence. The vector is introduced into an appropriate host cell. The protein expressed from the promoter is secreted extracellularly, thereby producing a therapeutic effect.
The cDNAs or 5' ESTs may also be used to clone sequences located upstream of the cDNAs or 5' ESTs which are capable of regulating gene expression, including promoter sequences, enhancer sequences, and other upstream sequences which influence transcription or translation levels. Once identified and cloned, these upstream regulatory sequences may be used in expression vectors designed to direct the expression of an inserted gene in a desired spatial, temporal, developmental, or quantitative fashion. The next example describes a method for cloning sequences upstream of the cDNAs or 5' ESTs.

Use of cDNAs or Fragments Thereof to Clone Upstream Sequences from Genomic DNA
Sequences derived from cDNAs or 5' ESTs may be used to isolate the promoters of the corresponding genes using chromosome walking techniques. In one chromosome walking technique, which utilizes the GenomeWalker™ kit available from Clontech, five complete genomic DNA samples are each digested with a different restriction enzyme which has a 6-base recognition site and leaves a blunt end.
Following digestion, oligonucleotide adapters are ligated to each end of the resulting genomic DNA fragments.

For each of the five genomic DNA libraries, a first PCR reaction is performed according to the manufacturer's instructions using an outer adaptor primer provided in the kit and an outer gene-specific primer. The gene-specific primer should be selected to be specific for the cDNA or 5' EST of interest and should have a melting temperature, length, and location in the cDNA or 5' EST which are consistent with its use in PCR reactions. Each first PCR reaction contains 5 ng of genomic DNA, 5 µl of 10X Tth reaction buffer, 0.2 mM of each dNTP, 0.2 µM each of outer adaptor primer and outer gene-specific primer, 1.1 mM of Mg(OAc)2, and 1 µl of the Tth polymerase 50X mix in a total volume of 50 µl.
The reaction cycle for the first PCR reaction is as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (7 cycles) / 2 sec at 94°C, 3 min at 67°C (32 cycles) / 5 min at 67°C.
The product of the first PCR reaction is diluted and used as a template for a second PCR reaction according to the manufacturer's instructions using a pair of nested primers which are located internally on the amplicon resulting from the first PCR reaction. For example, 5 µl of the reaction product of the first PCR reaction mixture may be diluted 180 times. Reactions are made in a 50 µl volume having a composition identical to that of the first PCR reaction except that the nested primers are used. The first nested primer is specific for the adaptor, and is provided with the GenomeWalker™ kit. The second nested primer is specific for the particular cDNA or 5' EST for which the promoter is to be cloned and should have a melting temperature, length, and location in the cDNA or 5' EST which are consistent with its use in PCR reactions.
The reaction parameters of the second PCR reaction are as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at 72°C (6 cycles) / 2 sec at 94°C, 3 min at 67°C (25 cycles) / 5 min at 67°C.
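The two cycling programs above can be written down as data, which makes their structure explicit: an initial denaturation, a block of two-step cycles at 72°C, a larger block at 67°C, and a final 5-minute hold. The sketch below totals the programmed hold times; the representation is an assumption, and ramp times, which depend on the thermocycler, are ignored.

```python
from typing import List, Tuple

# Each stage: (number of cycles, [(hold time in s, temperature in deg C), ...])
Program = List[Tuple[int, List[Tuple[int, float]]]]

PRIMARY: Program = [
    (1, [(60, 94.0)]),               # initial denaturation
    (7, [(2, 94.0), (180, 72.0)]),   # two-step cycles at 72 deg C
    (32, [(2, 94.0), (180, 67.0)]),  # two-step cycles at 67 deg C
    (1, [(300, 67.0)]),              # final hold
]

SECONDARY: Program = [
    (1, [(60, 94.0)]),
    (6, [(2, 94.0), (180, 72.0)]),
    (25, [(2, 94.0), (180, 67.0)]),
    (1, [(300, 67.0)]),
]


def total_hold_seconds(program: Program) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(cycles * sum(seconds for seconds, _ in steps)
               for cycles, steps in program)
```

For these values the first program totals 7458 s (about 124 min) of hold time and the nested program 6002 s (about 100 min).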
The product of the second PCR reaction is purified, cloned, and sequenced using standard techniques. Alternatively, two or more human genomic DNA libraries can be constructed by using two or more restriction enzymes. The digested genomic DNA is cloned into vectors which can be converted into single stranded, circular, or linear DNA. A biotinylated oligonucleotide comprising at least 15 nucleotides from the cDNA or 5' EST sequence is hybridized to the single stranded DNA.
Hybrids between the biotinylated oligonucleotide and the single stranded DNA containing the cDNA or EST sequence are isolated as described in example 17 above. Thereafter, the single stranded DNA containing the cDNA or EST sequence is released from the beads and converted into double stranded DNA using a primer specific for the cDNA or 5' EST sequence or a primer corresponding to a sequence included in the cloning vector. The resulting double stranded DNA is transformed into bacteria. DNAs containing the 5' EST or cDNA sequences are identified by colony PCR or colony hybridization.
Once the upstream genomic sequences have been cloned and sequenced as described above, prospective promoters and transcription start sites within the upstream sequences may be identified by comparing the sequences upstream of the cDNAs or 5' ESTs with databases containing known transcription start sites, transcription factor binding sites, or promoter sequences.
In addition, promoters in the upstream sequences may be identified using promoter reporter vectors as described below.

Identification of Promoters in Cloned Upstream Sequences
The genomic sequences upstream of the cDNAs or fragments thereof are cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, each of these promoter reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable protein such as secreted alkaline phosphatase, β-galactosidase, or green fluorescent protein. The sequences upstream of the cDNAs or 5' ESTs are inserted into the cloning sites upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The level of reporter protein is assayed and compared to the level obtained from a vector which lacks an insert in the cloning site. The presence of an elevated expression level in the vector containing the insert with respect to the control vector indicates the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned into vectors which contain an enhancer for augmenting transcription levels from weak promoter sequences. A significant level of expression above that observed with the vector lacking an insert indicates that a promoter sequence is present in the inserted upstream sequence.
Appropriate host cells for the promoter reporter vectors may be chosen based on the results of the above described determination of expression patterns of the cDNAs and ESTs.
For example, if the expression pattern analysis indicates that the mRNA corresponding to a particular cDNA or fragment thereof is expressed in fibroblasts, the promoter reporter vector may be introduced into a human fibroblast cell line.
Promoter sequences within the upstream genomic DNA may be further defined by constructing nested deletions in the upstream DNA using conventional techniques such as Exonuclease III digestion. The resulting deletion fragments can be inserted into the promoter reporter vector to determine whether the deletion has reduced or obliterated promoter activity. In this way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate potential transcription factor binding sites within the promoter individually or in combination. The effects of these mutations on transcription levels may be determined by inserting the mutations into the cloning sites in the promoter reporter vectors.

Cloning and Identification of Promoters
Using the method described in example 47 above with 5' ESTs, sequences upstream of several genes were obtained. Using the primer pair GGG AAG ATG GAG ATA GTA TTG CCT G (SEQ ID NO:15) and CTG CCA TGT ACA TGA TAG AGA GAT TC (SEQ ID NO:16), the promoter having the internal designation P13H2 (SEQ ID NO:17) was obtained.
Using the primer pair GTA CCA GGGG ACT GTG ACC ATT GC (SEQ ID NO:18) and CTG TGA CCA TTG CTC CCA AGA GAG (SEQ ID NO:19), the promoter having the internal designation P15B4 (SEQ ID NO:20) was obtained.

Using the primer pair CTG GGA TGG AAG GCA CGG TA (SEQ ID NO:21) and GAG ACC ACA CAG CTA GAC AA (SEQ ID NO:22), the promoter having the internal designation P29B6 (SEQ ID NO:23) was obtained.
Figure 4 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5' tags. The upstream sequences were screened for the presence of motifs resembling transcription factor binding sites or known transcription start sites using the computer program MatInspector release 2.0, August 1996.
Figure 5 describes the transcription factor binding sites present in each of these promoters. The column labeled "matrice" provides the name of the MatInspector matrix used. The column labeled "position" provides the 5' position of the promoter site. Numeration of the sequence starts from the transcription start site as determined by matching the genomic sequence with the 5' EST sequence. The column labeled "orientation" indicates the DNA strand on which the site is found, with the + strand being the coding strand as determined by matching the genomic sequence with the sequence of the 5' EST. The column labeled "score" provides the MatInspector score found for this site. The column labeled "length" provides the length of the site in nucleotides. The column labeled "sequence" provides the sequence of the site found.
The promoters and other regulatory sequences located upstream of the cDNAs or 5' ESTs may be used to design expression vectors capable of directing the expression of an inserted gene in a desired spatial, temporal, developmental, or quantitative manner. A promoter capable of directing the desired spatial, temporal, developmental, and quantitative patterns may be selected using the results of the expression analysis described in example 10 above. For example, if a promoter which confers a high level of expression in muscle is desired, the promoter sequence upstream of a cDNA or 5' EST derived from an mRNA which is expressed at a high level in muscle, as determined by the method of example 10, may be used in the expression vector.
Preferably, the desired promoter is placed near multiple restriction sites to facilitate the cloning of the desired insert downstream of the promoter, such that the promoter is able to drive expression of the inserted gene. The promoter may be inserted in conventional nucleic acid backbones designed for extrachromosomal replication, integration into the host chromosomes or transient expression. Suitable backbones for the present expression vectors include retroviral backbones, backbones from eukaryotic episomes such as SV40 or Bovine Papilloma Virus, backbones from bacterial episomes, or artificial chromosomes.
Preferably, the expression vectors also include a polyA signal downstream of the multiple restriction sites for directing the polyadenylation of mRNA transcribed from the gene inserted into the expression vector.
Following the identification of promoter sequences using the procedures of Examples 46-48, proteins which interact with the promoter may be identified as described in example 49 below.

Identification of Proteins Which Interact with Promoter Sequences, Upstream Regulatory Sequences, or mRNA
Sequences within the promoter region which are likely to bind transcription factors may be identified by homology to known transcription factor binding sites or through conventional mutagenesis or deletion analyses of reporter plasmids containing the promoter sequence. For example, deletions may be made in a reporter plasmid containing the promoter sequence of interest operably linked to an assayable reporter gene. The reporter plasmids carrying various deletions within the promoter region are transfected into an appropriate host cell and the effects of the deletions on expression levels are assessed. Transcription factor binding sites within the regions in which deletions reduce expression levels may be further localized using site directed mutagenesis, linker scanning analysis, or other techniques familiar to those skilled in the art.
Nucleic acids encoding proteins which interact with sequences in the promoter may be identified using one-hybrid systems such as those described in the manual accompanying the Matchmaker One-Hybrid System kit available from Clontech (Catalog No. K1603-1). Briefly, the Matchmaker One-Hybrid System is used as follows. The target sequence for which it is desired to identify binding proteins is cloned upstream of a selectable reporter gene and integrated into the yeast genome. Preferably, multiple copies of the target sequences are inserted into the reporter plasmid in tandem.
A library comprised of fusions between cDNAs to be evaluated for the ability to bind to the promoter and the activation domain of a yeast transcription factor, such as GAL4, is transformed into the yeast strain containing the integrated reporter sequence. The yeast are plated on selective media to select cells expressing the selectable marker linked to the promoter sequence. The colonies which grow on the selective media contain genes encoding proteins which bind the target sequence. The inserts in the genes encoding the fusion proteins are further characterized by sequencing. In addition, the inserts may be inserted into expression vectors or in vitro transcription vectors. Binding of the polypeptides encoded by the inserts to the promoter DNA may be confirmed by techniques familiar to those skilled in the art, such as gel shift analysis or DNase protection analysis.
VII. Use of cDNAs (or Genomic DNAs Obtainable Therefrom) in Gene Therapy
The present invention also comprises the use of cDNAs (or genomic DNAs obtainable therefrom) in gene therapy strategies, including antisense and triple helix strategies as described in Examples 50 and 51 below. In antisense approaches, nucleic acid sequences complementary to an mRNA are hybridized to the mRNA intracellularly, thereby blocking the expression of the protein encoded by the mRNA. The antisense sequences may prevent gene expression through a variety of mechanisms. For example, the antisense sequences may inhibit the ability of ribosomes to translate the mRNA.
Alternatively, the antisense sequences may block transport of the mRNA from the nucleus to the cytoplasm, thereby limiting the amount of mRNA available for translation. Another mechanism through which antisense sequences may inhibit gene expression is by interfering with mRNA splicing. In yet another strategy, the antisense nucleic acid may be incorporated in a ribozyme capable of specifically cleaving the target mRNA.

Preparation and Use of Antisense Oligonucleotides
The antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA sequences. They may comprise a sequence complementary to the sequence of the cDNA (or genomic DNA obtainable therefrom). The antisense nucleic acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al., Ann. Rev. Biochem., 55:569-597 (1986) and Izant and Weintraub, Cell, 36:1007-1015 (1984).
In some strategies, antisense molecules are obtained from a nucleotide sequence encoding a protein by reversing the orientation of the coding region with respect to a promoter so as to transcribe the opposite strand from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another approach involves transcription of the antisense nucleic acids in vivo by operably linking DNA containing the antisense sequence to a promoter in an expression vector.
Alternatively, oligonucleotides which are complementary to the strand normally transcribed in the cell may be synthesized in vitro. Thus, the antisense nucleic acids are complementary to the corresponding mRNA and are capable of hybridizing to the mRNA to create a duplex. In some embodiments, the antisense sequences may contain modified sugar phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of modifications suitable for use in antisense strategies include 2'-O-methyl RNA oligonucleotides and peptide nucleic acid (PNA) oligonucleotides. Further examples are described by Rossi et al., Pharmacol. Ther., 50(2):245-254 (1991).
Various types of antisense oligonucleotides complementary to the sequence of the cDNA (or genomic DNA obtainable therefrom) may be used. In one preferred embodiment, stable and semi-stable antisense oligonucleotides described in International Application No. PCT WO 94/23026 are used. In these molecules, the 3' end or both the 3' and 5' ends are engaged in intramolecular hydrogen bonding between complementary base pairs. These molecules are better able to withstand exonuclease attacks and exhibit increased stability compared to conventional antisense oligonucleotides.
In another preferred embodiment, the antisense oligodeoxynucleotides against herpes simplex virus types 1 and 2 described in International Application No. WO 95/04141 are used.
In yet another preferred embodiment, the covalently cross-linked antisense oligonucleotides described in International Application No. WO 96/31523 are used. These double- or single-stranded oligonucleotides comprise one or more, respectively, inter- or intra-oligonucleotide covalent cross-linkages, wherein the linkage consists of an amide bond between a primary amine group of one strand and a carboxyl group of the other strand or of the same strand, respectively, the primary amine group being directly substituted in the 2' position of the strand nucleotide monosaccharide ring, and the carboxyl group being carried by an aliphatic spacer group substituted on a nucleotide or nucleotide analog of the other strand or the same strand, respectively.

The antisense oligodeoxynucleotides and oligonucleotides disclosed in International Application No. WO 92/18522 may also be used. These molecules are stable to degradation and contain at least one transcription control recognition sequence which binds to control proteins and are effective as decoys therefor. These molecules may contain "hairpin" structures, "dumbbell" structures, "modified dumbbell" structures, "cross-linked" decoy structures and "loop" structures.
In another preferred embodiment, the cyclic double-stranded oligonucleotides described in European Patent Application No. 0 572 287 A2 are used. These ligated oligonucleotide "dumbbells" contain the binding site for a transcription factor and inhibit expression of the gene under control of the transcription factor by sequestering the factor.
Use of the closed antisense oligonucleotides disclosed in International Application No. WO 92/19732 is also contemplated. Because these molecules have no free ends, they are more resistant to degradation by exonucleases than are conventional oligonucleotides. These oligonucleotides may be multifunctional, interacting with several regions which are not adjacent to the target mRNA.
The appropriate level of antisense nucleic acids required to inhibit gene expression may be determined using in vitro expression analysis. The antisense molecule may be introduced into the cells by diffusion, injection, infection or transfection using procedures known in the art. For example, the antisense nucleic acids can be introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as an oligonucleotide operably linked to a promoter contained in an expression vector. The expression vector may be any of a variety of expression vectors known in the art, including retroviral or viral vectors, vectors capable of extrachromosomal replication, or integrating vectors. The vectors may be DNA or RNA.
The antisense molecules are introduced onto cell samples at a number of different concentrations, preferably between 1 x 10^-10 M and 1 x 10^-4 M. Once the minimum concentration that can adequately control gene expression is identified, the optimized dose is translated into a dosage suitable for use in vivo. For example, an inhibiting concentration in culture of 1 x 10^-7 M translates into a dose of approximately 0.6 mg/kg bodyweight. Levels of oligonucleotide approaching 100 mg/kg bodyweight or higher may be possible after testing the toxicity of the oligonucleotide in laboratory animals. It is additionally contemplated that cells from the vertebrate are removed, treated with the antisense oligonucleotide, and reintroduced into the vertebrate.
It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme sequence to enable the antisense to specifically bind and cleave its target mRNA. For technical applications of ribozyme and antisense oligonucleotides see Rossi et al., supra.
In a preferred application of this invention, the polypeptide encoded by the gene is first identified, so that the effectiveness of antisense inhibition on translation can be monitored using techniques that include but are not limited to antibody-mediated tests such as RIAs and ELISA, functional assays, or radiolabeling.
The cDNAs of the present invention (or genomic DNAs obtainable therefrom) may also be used in gene therapy approaches based on intracellular triple helix formation. Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity as it is associated with a particular gene. The cDNAs (or genomic DNAs obtainable therefrom) of the present invention or, more preferably, a fragment of those sequences, can be used to inhibit gene expression in individuals having diseases associated with expression of a particular gene.
Similarly, a fragment of the cDNA (or genomic DNA obtainable therefrom) can be used to study the effect of inhibiting transcription of a particular gene within a cell. Traditionally, homopurine sequences were considered the most useful for triple helix strategies. However, homopyrimidine sequences can also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at homopurine:homopyrimidine sequences. Thus, both types of sequences from the cDNA or from the gene corresponding to the cDNA are contemplated within the scope of this invention.

Preparation and use of Triple Helix Probes
The sequences of the cDNAs (or genomic DNAs obtainable therefrom) are scanned to identify 10-mer to 20-mer homopyrimidine or homopurine stretches which could be used in triple-helix based strategies for inhibiting gene expression. Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in inhibiting gene expression is assessed by introducing varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells which normally express the target gene. The oligonucleotides may be prepared on an oligonucleotide synthesizer or they may be purchased commercially from a company specializing in custom oligonucleotide synthesis, such as GENSET, Paris, France.
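The scanning step can be sketched in Python as below. Because a homopurine stretch on one strand is a homopyrimidine stretch on the complementary strand, scanning one strand for maximal runs of A/G and of C/T covers both. The function name is hypothetical, and clipping runs longer than 20 nucleotides to their first 20-mer is an illustrative assumption rather than a rule from this specification.

```python
import re


def find_triplex_candidates(seq: str, min_len: int = 10, max_len: int = 20):
    """Return (kind, start, oligo) for each maximal homopurine or
    homopyrimidine run of at least min_len nucleotides on the given strand.
    Runs longer than max_len are clipped to their first max_len nucleotides."""
    seq = seq.upper()
    classes = {"homopurine": "AG", "homopyrimidine": "CT"}
    hits = []
    for kind, alphabet in classes.items():
        # Greedy "[AG]{10,}"-style patterns match each maximal run once.
        for match in re.finditer(f"[{alphabet}]{{{min_len},}}", seq):
            hits.append((kind, match.start(), match.group()[:max_len]))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence containing one run of each class.
print(find_triplex_candidates("GGAAGGAAGGAA" + "TC" + "A" + "CCTTCCTTCCTT"))
```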
The oligonucleotides may be introduced into the cells using a variety of methods known to those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake.
Treated cells are monitored for altered cell function or reduced gene expression using techniques such as Northern blotting, RNase protection assays, or PCR-based strategies to monitor the transcription levels of the target gene in cells which have been treated with the oligonucleotide. The cell functions to be monitored are predicted from homologies between the target gene (corresponding to the cDNA from which the oligonucleotide was derived) and known gene sequences that have been associated with a particular function. The cell functions can also be predicted based on the presence of abnormal physiologies within cells derived from individuals with a particular inherited disease, particularly when the cDNA is associated with the disease using techniques described in example 44.
The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may then be introduced in vivo using the techniques described above and in example 50, at a dosage calculated from the in vitro results as described in that example.
In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases.
Further, an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation see Griffin et al., Science, 245:967-971 (1989).

Use of cDNAs to Express an Encoded Protein in a Host Organism
The cDNAs of the present invention may also be used to express an encoded protein in a host organism to produce a beneficial effect. In such procedures, the encoded protein may be transiently expressed in the host organism or stably expressed in the host organism. The encoded protein may have any of the activities described above. The encoded protein may be a protein which the host organism lacks or, alternatively, the encoded protein may augment the existing levels of the protein in the host organism.
A full-length cDNA encoding the signal peptide and the mature protein, or a cDNA encoding only the mature protein, is introduced into the host organism. The cDNA may be introduced into the host organism using a variety of techniques known to those of skill in the art. For example, the cDNA may be injected into the host organism as naked DNA such that the encoded protein is expressed in the host organism, thereby producing a beneficial effect.
Alternatively, the cDNA may be cloned into an expression vector downstream of a promoter which is active in the host organism. The expression vector may be any of the expression vectors designed for use in gene therapy, including viral or retroviral vectors.
The expression vector may be directly introduced into the host organism such that the encoded protein is expressed in the host organism to produce a beneficial effect. In another approach, the expression vector may be introduced into cells in vitro. Cells containing the expression vector are thereafter selected and introduced into the host organism, where they express the encoded protein to produce a beneficial effect.

Use Of Signal Peptides To Import Proteins Into Cells
The short core hydrophobic region (h) of signal peptides encoded by the cDNAs of the present invention or fragments thereof may also be used as a carrier to import a peptide or a protein of interest, so-called cargo, into tissue culture cells (Lin et al., J. Biol. Chem., 270: 14255-14258 (1995); Du et al., J. Peptide Res., 51: 235-243 (1998); Rojas et al., Nature Biotech., 16: 370-375 (1998)).
When cell permeable peptides of limited size (approximately up to 25 amino acids) are to be translocated across the cell membrane, chemical synthesis may be used in order to add the h region to either the C-terminus or the N-terminus of the cargo peptide of interest.
Alternatively, when longer peptides or proteins are to be imported into cells, nucleic acids can be genetically engineered, using techniques familiar to those skilled in the art, in order to link the cDNA sequence or fragment thereof encoding the h region to the 5' or the 3' end of a DNA sequence coding for a cargo polypeptide. Such genetically engineered nucleic acids are then translated either in vitro or in vivo after transfection into appropriate cells, using conventional techniques to produce the resulting cell permeable polypeptide. Suitable host cells are then simply incubated with the cell permeable polypeptide, which is then translocated across the membrane.
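The in-frame fusion of the h-region coding sequence to a cargo coding sequence can be sketched as follows, assuming both inputs are supplied without start or stop codons. The helper names and toy sequences are hypothetical, and a real construct would also need the promoter and other elements of an expression vector; the sketch only checks that one reading frame is preserved and that no premature stop codon is created.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive codons (reading frame 0)."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def build_fusion(h_region_cds: str, cargo_cds: str, h_at_5prime: bool = True) -> str:
    """Join the h-region coding sequence to the 5' (N-terminal) or 3'
    (C-terminal) end of a cargo coding sequence, keeping one reading frame.

    Both inputs are assumed to lack start and stop codons; an ATG start and a
    TAA stop are added around the fused open reading frame."""
    parts = [h_region_cds.upper(), cargo_cds.upper()]
    if not h_at_5prime:
        parts.reverse()
    for part in parts:
        if len(part) % 3 != 0:
            raise ValueError("each coding sequence must be a multiple of 3 nt")
    body = "".join(parts)
    if STOP_CODONS.intersection(split_codons(body)):
        raise ValueError("the fused sequence contains an in-frame stop codon")
    return "ATG" + body + "TAA"


# Hypothetical toy sequences: h region (Ala-Ala-Val-Leu) and cargo (Lys-Gly).
print(build_fusion("GCTGCTGTTCTG", "AAAGGC"))
print(build_fusion("GCTGCTGTTCTG", "AAAGGC", h_at_5prime=False))
```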

This method may be applied to study diverse intracellular functions and cellular processes. For instance, it has been used to probe functionally relevant domains of intracellular proteins and to examine protein-protein interactions involved in signal transduction pathways (Lin et al., supra; Lin et al., J. Biol. Chem., 271: 5305-5308 (1996); Rojas et al., J. Biol. Chem., 271: 27456-27461 (1996); Liu et al., Proc. Natl. Acad. Sci. USA, 93: 11819-11824 (1996); Rojas et al., Biochem. Biophys. Res. Commun., 234: 675-680 (1997)).
Such techniques may be used in cellular therapy to import proteins producing therapeutic effects.
For instance, cells isolated from a patient may be treated with imported therapeutic proteins and then re-introduced into the host organism.
Alternatively, the h region of signal peptides of the present invention could be used in combination with a nuclear localization signal to deliver nucleic acids into the cell nucleus.
Such oligonucleotides may be antisense oligonucleotides or oligonucleotides designed to form triple helices, as described in examples 50 and 51 respectively, in order to inhibit processing and maturation of a target cellular RNA.

Computer Embodiments
As used herein the term "cDNA codes of SEQ ID NOs. 24-73" encompasses the nucleotide sequences of SEQ ID NOs. 24-73, fragments of SEQ ID NOs. 24-73, nucleotide sequences homologous to SEQ ID NOs. 24-73 or homologous to fragments of SEQ ID NOs. 24-73, and sequences complementary to all of the preceding sequences. The fragments include fragments of SEQ ID NOs. 24-73 comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of SEQ ID NOs. 24-73. Preferably, the fragments are novel fragments. Preferably the fragments include polynucleotides described in Table III or fragments thereof comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of the polynucleotides described in Table III. Homologous sequences and fragments of SEQ ID NOs. 24-73 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these sequences. Homology may be determined using any of the computer programs and parameters described in example 17, including BLAST2N with the default parameters or with any modified parameters.
Homologous sequences also include RNA sequences in which uridines replace the thymines in the cDNA codes of SEQ ID NOs. 24-73. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error as described above. Preferably the homologous sequences and fragments of SEQ ID NOs. 24-73 include polynucleotides described in Table III or fragments comprising at least 8, 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, 1000 or 2000 consecutive nucleotides of the polynucleotides described in Table III. It will be appreciated that the cDNA codes of SEQ ID NOs. 24-73 can be represented in the traditional single character format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which records the identity of the nucleotides in a sequence.

WO 00/37491 PCT/IB99/02058
As used herein the term "polypeptide codes of SEQ ID NOS. 74-123" encompasses the polypeptide sequences of SEQ ID NOs. 74-123 which are encoded by the cDNAs of SEQ ID NOs. 24-73, polypeptide sequences homologous to the polypeptides of SEQ ID NOS. 74-123, or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to one of the polypeptide sequences of SEQ ID NOS. 74-123. Homology may be determined using any of the computer programs and parameters described herein, including FASTA with the default parameters or with any modified parameters.
The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error as described above. The polypeptide fragments comprise at least 5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 50, 60, 75, 100, 150 or 200 consecutive amino acids of the polypeptides of SEQ ID NOS. 74-123.
Preferably, the fragments are novel fragments. Preferably, the fragments include polypeptides encoded by the polynucleotides described in Table III, or fragments thereof comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of the polypeptides encoded by the polynucleotides described in Table III. It will be appreciated that the polypeptide codes of SEQ ID NOS. 74-123 can be represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format which records the identity of the amino acids in a sequence.
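Converting between the single character and three letter amino acid formats is a simple table lookup, sketched below for the 20 standard amino acids. The hyphen separator and the function names are illustrative choices, not a format prescribed by this specification.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def to_three_letter(seq: str) -> str:
    """Single character code to hyphen-separated three letter code."""
    return "-".join(THREE_LETTER[residue] for residue in seq.upper())


def to_one_letter(seq: str) -> str:
    """Hyphen-separated three letter code back to single character code."""
    return "".join(ONE_LETTER[residue.capitalize()] for residue in seq.split("-"))


print(to_three_letter("MKV"))  # hypothetical toy peptide
```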
It will be appreciated by those skilled in the art that the cDNA codes of SEQ ID NOs. 24-73 and polypeptide codes of SEQ ID NOS. 74-123 can be stored, recorded, and manipulated on any medium which can be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the cDNA codes of SEQ ID NOs. 24-73, or one or more of the polypeptide codes of SEQ ID NOS. 74-123. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 cDNA codes of SEQ ID NOs. 24-73.
Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of SEQ ID NOS. 74-123.
Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM), as well as other types of media known to those skilled in the art.
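One common plain-text way to record sequence codes on such media is the FASTA layout: a ">" header line naming each sequence, followed by the sequence itself. FASTA is used here only as an illustrative choice; the specification does not name it, and the function and record names below are hypothetical. The resulting text can be written to any file on any of the media listed above.

```python
def format_fasta(records: dict[str, str], width: int = 60) -> str:
    """Serialize {name: sequence} as FASTA text: one '>' header line per
    record followed by the sequence wrapped at `width` characters."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text back into {name: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


# Hypothetical records. To store them on disk, for example:
#     with open("codes.fasta", "w", encoding="ascii") as handle:
#         handle.write(format_fasta(codes))
codes = {"toy_cdna": "ATGC" * 40, "toy_polypeptide": "MKV"}
print(parse_fasta(format_fasta(codes)) == codes)
```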
Embodiments of the present invention include systems, particularly computer systems which store and manipulate the sequence information described herein. One example of a computer system 100 is illustrated in block diagram form in Figure 6. As used herein, "a computer system" refers to the hardware components, software components, and data storage components used to analyze the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-73, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 74-123. In one embodiment, the computer system 100 is a Sun Enterprise 1000 server (Sun Microsystems, Palo Alto, CA). The computer system 100 preferably includes a processor 105 for processing, accessing and manipulating the sequence data. The processor 105 can be any well-known type of central processing unit, such as the Pentium III from Intel Corporation, or a similar processor from Sun, Motorola, Compaq or International Business Machines.
Preferably, the computer system 100 is a general purpose system that comprises the processor 105 and one or more internal data storage components 110 for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.
In one particular embodiment, the computer system 100 includes a processor 105 connected to a bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more internal data storage devices 110, such as a hard drive and/or other computer readable media having data recorded thereon. In some embodiments, the computer system 100 further includes one or more data retrieving devices 118 for reading the data stored on the internal data storage devices 110.
The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is a removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer system 100 may advantageously include or be programmed by appropriate software for reading the control logic and/or the data from the data storage component once inserted in the data retrieving device.
The computer system 100 includes a display 120 which is used to display output to a computer user. It should also be noted that the computer system 100 can be linked to other computer systems 125a-c in a network or wide area network to provide centralized access to the computer system 100.
Software for accessing and processing the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-73, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 74-123 (such as search tools, compare tools, and modeling tools etc.) may reside in main memory 115 during execution.
In some embodiments, the computer system 100 may further comprise a sequence comparer for comparing the above-described cDNA codes of SEQ ID NOs. 24-73 or polypeptide codes of SEQ ID NOS. 74-123 stored on a computer readable medium to reference nucleotide or polypeptide sequences stored on a computer readable medium. A "sequence comparer" refers to one or more programs which are implemented on the computer system 100 to compare a nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or compounds including but not limited to peptides, peptidomimetics, and chemicals stored within the data storage means. For example, the sequence comparer may compare the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-73, or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 74-123 stored on a computer readable medium to reference sequences stored on a computer readable medium to identify homologies, motifs implicated in biological function, or structural motifs. The various sequence comparer programs identified elsewhere in this patent specification are particularly contemplated for use in this aspect of the invention.
Figure 7 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or protein sequence with a database of sequences in order to determine the homology levels between the new sequence and the sequences in the database. The database of sequences can be a private database stored within the computer system 100, or a public database such as GENBANK, PIR or SWISSPROT that is available through the Internet.
The process 200 begins at a start state 201 and then moves to a state 202 wherein the new sequence to be compared is stored to a memory in a computer system 100. As discussed above, the memory could be any type of memory, including RAM or an internal storage device.
The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the database is read into a memory on the computer. A comparison is then performed at a state 210 to determine if the new sequence is the same as the first sequence in the database. It is important to note that this step is not limited to performing an exact comparison between the new sequence and the first sequence in the database. Methods for comparing two nucleotide or protein sequences, even if they are not identical, are well known to those of skill in the art. For example, gaps can be introduced into one sequence in order to raise the homology level between the two tested sequences. The parameters that control whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the computer system.
Once a comparison of the two sequences has been performed at the state 210, a determination is made at a decision state 212 whether the two sequences are the same. Of course, the term "same" is not limited to sequences that are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as "same" in the process 200.
If a determination is made that the two sequences are the same, the process 200 moves to a state 214 wherein the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state 218 wherein a determination is made whether more sequences exist in the database.
If no more sequences exist in the database, then the process 200 terminates at an end state 220.
However, if more sequences do exist in the database, then the process 200 moves to a state 222 wherein a pointer is moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new sequence is aligned and compared with every sequence in the database.
It should be noted that if a determination had been made at the decision state 212 that the sequences were not homologous, then the process 200 would move immediately to the decision state 218 in order to determine if any other sequences were available in the database for comparison.
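The loop of process 200 can be sketched in Python as follows. As a stand-in for a real comparer such as BLAST or FASTA, each pair is scored by the length of the longest common subsequence (a gapped alignment that maximizes matched characters with no mismatch or gap penalties), expressed as a percentage of the new sequence's length. That scoring rule, the function names and the toy database are all assumptions made for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b: a gapped
    alignment that maximizes matched characters, with no mismatch or gap
    penalties (a simplification of real scoring schemes)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(query: str, subject: str) -> float:
    """Matched characters in the gapped alignment, as % of the query length."""
    return 100.0 * lcs_length(query, subject) / len(query) if query else 0.0


def search_database(query: str, database: dict[str, str],
                    min_identity: float = 90.0) -> list[str]:
    """Compare the new sequence with every stored sequence in turn and
    return the names of those within the user's homology threshold."""
    hits = []
    for name, subject in database.items():                    # read next stored sequence
        if percent_identity(query, subject) >= min_identity:  # states 210 and 212
            hits.append(name)                                 # state 214: report the name
    return hits


# Hypothetical toy database.
toy_db = {"seqA": "ACGTACGTAC", "seqB": "ACGTTCGTAC", "seqC": "GGGGGGGGGG"}
print(search_database("ACGTACGTAC", toy_db))
```

Here `min_identity` plays the role of the user-entered homology parameters: lowering it marks more distant sequences as "same".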

Accordingly, one aspect of the present invention is a computer system comprising a processor, a data storage device having stored thereon a nucleic acid code of SEQ ID NOs. 24-73 or a polypeptide code of SEQ ID NOS. 74-123, a data storage device having retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of SEQ ID NOs. 24-73 or polypeptide code of SEQ ID NOS. 74-123, and a sequence comparer for conducting the comparison. The sequence comparer may indicate a homology level between the sequences compared or identify structural motifs in the above described nucleic acid code of SEQ ID NOs. 24-73 and polypeptide codes of SEQ ID NOS. 74-123, or it may identify structural motifs in sequences which are compared to these cDNA codes and polypeptide codes. In some embodiments, the data storage device may have stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 24-73 or polypeptide codes of SEQ ID NOS. 74-123.
Another aspect of the present invention is a method for determining the level of homology between a nucleic acid code of SEQ ID NOs. 24-73 and a reference nucleotide sequence, comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a computer program which determines homology levels and determining homology between the nucleic acid code and the reference nucleotide sequence with the computer program. The computer program may be any of a number of computer programs for determining homology levels, including those specifically enumerated herein, such as BLAST2N with the default parameters or with any modified parameters. The method may be implemented using the computer systems described above. The method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described cDNA codes of SEQ ID NOs. 24-73 through use of the computer program and determining homology between the cDNA codes and reference nucleotide sequences.
Figure 8 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining whether two sequences are homologous. The process 250 begins at a start state 252 and then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored to a memory at a state 256. The process 250 then moves to a state 260 wherein the first character in the first sequence is read and then to a state 262 wherein the first character of the second sequence is read. It should be understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If the sequence is a protein sequence, then it should be in the single letter amino acid code so that the first and second sequences can be easily compared.
A determination is then made at a decision state 264 whether the two characters are the same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in the first and second sequences are read. A determination is then made whether the next characters are the same. If they are, then the process 250 continues this loop until two characters are not the same. If a determination is made that the next two characters are not the same, the process 250 moves to a decision state 274 to determine whether there are any more characters in either sequence to read.

If there aren't any more characters to read, then the process 250 moves to a state 276 wherein the level of homology between the first and second sequences is displayed to the user. The level of homology is determined by calculating the proportion of characters between the sequences that were the same out of the total number of characters in the first sequence. Thus, if every character in a first 100-nucleotide sequence aligned with every character in a second sequence, the homology level would be 100%.
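Process 250 amounts to an ungapped, position-by-position comparison. A minimal Python sketch, assuming the two sequences are already in register and treating positions of the first sequence beyond the end of the second as non-identical, is shown below; the function name is hypothetical.

```python
def positional_identity(first: str, second: str) -> float:
    """Process 250 without gaps: compare the sequences character by
    character and return the number of identical positions as a
    percentage of the length of the first sequence."""
    if not first:
        raise ValueError("the first sequence is empty")
    same = sum(1 for a, b in zip(first.upper(), second.upper()) if a == b)
    return 100.0 * same / len(first)


print(positional_identity("ACGTACGTAC", "ACGTTCGTAC"))  # hypothetical pair
```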
Alternatively, the computer program may be a computer program which compares the nucleotide sequences of the cDNA codes of the present invention to reference nucleotide sequences in order to determine whether the nucleic acid code of SEQ ID NOs. 24-73 differs from a reference nucleic acid sequence at one or more positions. Optionally such a program records the length and identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID NOs. 24-73. In one embodiment, the computer program may be a program which determines whether the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-73 contain a biallelic marker or single nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence.
This single nucleotide polymorphism may comprise a single base substitution, insertion, or deletion, while this biallelic marker may comprise about one to ten consecutive bases substituted, inserted or deleted.
Another aspect of the present invention is a method for determining the level of homology between a polypeptide code of SEQ ID NOS. 74-123 and a reference polypeptide sequence, comprising the steps of reading the polypeptide code of SEQ ID NOS. 74-123 and the reference polypeptide sequence through use of a computer program which determines homology levels and determining homology between the polypeptide code and the reference polypeptide sequence using the computer program.
Accordingly, another aspect of the present invention is a method for determining whether a nucleic acid code of SEQ ID NOs. 24-73 differs at one or more nucleotides from a reference nucleotide sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence through use of a computer program which identifies differences between nucleic acid sequences and identifying differences between the nucleic acid code and the reference nucleotide sequence with the computer program. In some embodiments, the computer program is a program which identifies single nucleotide polymorphisms. The method may be implemented by the computer systems described above and the method illustrated in Figure 8. The method may also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 24-73 and the reference nucleotide sequences through the use of the computer program and identifying differences between the cDNA codes and the reference nucleotide sequences with the computer program.
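Once a comparer has aligned a nucleic acid code with a reference sequence, the differences can be read off column by column. The Python sketch below assumes the pair is supplied already aligned, with "-" marking gaps. It records the position, type and bases of each substitution, insertion or deletion, and merges consecutive differing columns of the same type so that multi-base events (such as a biallelic marker spanning several bases) are reported once. The function name and output layout are illustrative.

```python
def call_differences(ref_aln: str, query_aln: str, gap: str = "-"):
    """List differences between two pre-aligned, equal-length sequences.

    Returns (ref_position, kind, ref_bases, query_bases) tuples, where
    ref_position is the 0-based position in the ungapped reference and kind
    is "substitution", "insertion" or "deletion". Consecutive differing
    columns of the same kind are merged into one multi-base event."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    ref_pos = 0
    in_event = False
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == gap:
            kind = "insertion"
        elif q == gap:
            kind = "deletion"
        elif r != q:
            kind = "substitution"
        else:
            kind = None
        if kind is None:
            in_event = False
        elif in_event and events[-1][1] == kind:
            events[-1][2] += r.replace(gap, "")   # extend the current event
            events[-1][3] += q.replace(gap, "")
        else:
            events.append([ref_pos, kind, r.replace(gap, ""), q.replace(gap, "")])
            in_event = True
        if r != gap:
            ref_pos += 1                          # advance only on reference bases
    return [tuple(event) for event in events]


print(call_differences("ACGTACGTAC", "ACGTTCGTAC"))  # hypothetical aligned pair
```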
In other embodiments, the computer-based system may further comprise an identifier for identifying features within the nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-73 or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 74-123.
An "identifier" refers to one or more programs which identify certain features within the above-described nucleotide sequences of the cDNA codes of SEQ ID NOs. 24-73 or the amino acid sequences of the polypeptide codes of SEQ ID NOS. 74-123. In one embodiment, the identifier may comprise a program which identifies an open reading frame in the cDNA codes of SEQ ID NOs. 24-73.
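An open reading frame identifier of the kind mentioned above can be sketched as a scan of the three forward reading frames for an ATG followed by the next in-frame stop codon. The function name, the 30-codon default minimum and the handling of nested ATGs are assumptions for illustration; the reverse strand can be scanned by applying the same function to its reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each open reading frame on the given
    strand that begins with ATG and ends at the next in-frame stop codon.

    `end` is the 0-based index just past the stop codon, and min_codons
    counts codons from the ATG through the stop codon. Internal in-frame
    ATGs are treated as part of the ORF opened by the first ATG, and
    reading frames that never reach a stop codon are not reported."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return sorted(orfs, key=lambda orf: orf[1])


# Hypothetical toy sequence with one short ORF in reading frame 2.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))
```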
Figure 9 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a database of sequence features is opened. Such a database would include a list of each feature's attributes along with the name of the feature.
For example, a feature name could be "Initiation Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA Box" and the feature attribute would be "TAATAA". An example of such a database is produced by the University of Wisconsin Genetics Computer Group (www.gcg.com).
Once the database of features is opened at the state 306, the process 300 moves to a state 308 wherein the first feature is read from the database. A comparison of the attribute of the first feature with the first sequence is then made at a state 310. A determination is then made at a decision state 316 whether the attribute of the feature was found in the first sequence. If the attribute was found, then the process 300 moves to a state 318 wherein the name of the found feature is displayed to the user.
The process 300 then moves to a decision state 320 wherein a determination is made whether more features exist in the database. If no more features exist, then the process 300 terminates at an end state 324. However, if more features do exist in the database, then the process 300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the attribute of the next feature is compared against the first sequence.
It should be noted that if the feature attribute is not found in the first sequence at the decision state 316, the process 300 moves directly to the decision state 320 in order to determine if any more features exist in the database.
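Process 300 can be sketched in Python with a small dictionary standing in for the feature database, using the two example features given in the text ("Initiation Codon" with attribute "ATG", and "TAATAA Box" with attribute "TAATAA"). The dictionary stand-in and the function names are assumptions; a production identifier would query a curated feature database.

```python
def find_all(sequence: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def identify_features(sequence: str, feature_db: dict[str, str]) -> dict[str, list[int]]:
    """Process 300: read each feature in turn, search the sequence for its
    attribute and report the names (with positions) of the features found."""
    sequence = sequence.upper()
    found = {}
    for name, attribute in feature_db.items():             # states 308 and 326
        positions = find_all(sequence, attribute.upper())  # state 310
        if positions:                                      # decision state 316
            found[name] = positions                        # state 318
    return found


# Feature names and attributes taken from the examples in the text.
FEATURES = {"Initiation Codon": "ATG", "TAATAA Box": "TAATAA"}
print(identify_features("ccATGtaataaATG", FEATURES))
```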
In another embodiment, the identifier may comprise a molecular modeling program which determines the 3-dimensional structure of the polypeptide codes of SEQ ID NOS. 74-123. In some embodiments, the molecular modeling program identifies target sequences that are most compatible with profiles representing the structural environments of the residues in known three-dimensional protein structures. (See, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 1995). In another technique, the known three-dimensional structures of proteins in a given family are superimposed to define the structurally conserved regions in that family. This protein modeling technique also uses the known three-dimensional structure of a homologous protein to approximate the structure of the polypeptide codes of SEQ ID NOS. 74-123. (See e.g., Srinivasan, et al., U.S. Patent No. 5,557,535 issued September 17, 1996).
Conventional homology modeling techniques have been used routinely to build models of proteases and antibodies (Sowdhamini et al., Protein Engineering 10:207-215 (1997)).
Comparative approaches can also be used to develop three-dimensional protein models when the protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into similar three-dimensional structures despite having very weak sequence identities. For example, a number of helical cytokines fold into a similar three-dimensional topology in spite of weak sequence homology.
The recent development of threading methods now enables the identification of likely folding patterns in a number of situations where the structural relatedness between target and template(s) is not detectable at the sequence level. Hybrid methods have been described in which fold recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are deduced from the threading output using the distance geometry program DRAGON to construct a low resolution model, and a full-atom representation is constructed using a molecular modeling package such as QUANTA.
According to this 3-step approach, candidate templates are first identified by using the novel fold recognition algorithm MST, which is capable of performing simultaneous threading of multiple aligned sequences onto one or more 3-D structures. In a second step, the structural equivalencies obtained from the MST output are converted into inter-residue distance restraints and fed into the distance geometry program DRAGON, together with auxiliary information obtained from secondary structure predictions. The program combines the restraints in an unbiased manner and rapidly generates a large number of low resolution model conformations. In a third step, these low resolution model conformations are converted into full-atom models and subjected to energy minimization using the molecular modeling package QUANTA.
(See, e.g., Aszodi et al., Proteins: Structure, Function, and Genetics, Supplement 1:38-42 (1997)).
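The second step of this approach, turning structural equivalencies into inter-residue distance restraints, can be illustrated with a minimal Python sketch. It is not the MST or DRAGON code: the function name `distance_restraints`, the C-alpha-only template representation, the 12 Å cutoff and the ±1 Å tolerance are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

def distance_restraints(
    alignment: Dict[int, int],      # target residue index -> equivalent template residue index
    template_ca: Dict[int, Coord],  # template residue index -> C-alpha coordinates (Angstrom)
    cutoff: float = 12.0,           # assumption: keep only pairs closer than this in the template
    tolerance: float = 1.0,         # assumption: +/- slack around each template distance
) -> List[Tuple[int, int, float, float]]:
    """Return (target_i, target_j, lower_bound, upper_bound) restraints
    copied from the template geometry of structurally equivalent residues."""
    restraints = []
    targets = sorted(alignment)
    for a, i in enumerate(targets):
        for j in targets[a + 1:]:
            d = math.dist(template_ca[alignment[i]], template_ca[alignment[j]])
            if d <= cutoff:
                restraints.append((i, j, max(0.0, d - tolerance), d + tolerance))
    return restraints

example = distance_restraints(
    {1: 10, 2: 11, 5: 14},
    {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 14: (0.0, 20.0, 0.0)},
)
print(example)  # only the (1, 2) pair is within the cutoff
```

Restraints of this form are the kind of input a distance geometry program works from; the secondary-structure information mentioned above would be added as further restraints, which this sketch omits.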
The results of the molecular modeling analysis may then be used in rational drug design techniques to identify agents which modulate the activity of the polypeptide codes of SEQ ID NOs. 74-123.
Accordingly, another aspect of the present invention is a method of identifying a feature within the cDNA codes of SEQ ID NOs. 24-73 or the polypeptide codes of SEQ ID NOs. 74-123, comprising reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program which identifies features therein, and identifying features within the nucleic acid code(s) or polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a computer program which identifies open reading frames. In a further embodiment, the computer program identifies structural motifs in a polypeptide sequence. In another embodiment, the computer program comprises a molecular modeling program. The method may be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the cDNA codes of SEQ ID NOs. 24-73 or the polypeptide codes of SEQ ID NOs. 74-123 through the use of the computer program and identifying features within the cDNA codes or polypeptide codes with the computer program.
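As an illustration of the open-reading-frame embodiment, a feature-identifying program can be as simple as a scan for ATG...stop stretches in each forward reading frame. The sketch below is a minimal Python example under stated assumptions: it ignores the reverse strand and alternative start codons, and the function name `find_orfs` and the `min_codons` cut-off are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start and end are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the stop codon as well (hypothetical cut-off)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame ATG opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ATG in this frame
    return orfs

print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 2, 14)]
```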
The cDNA codes of SEQ ID NOs. 24-73 or the polypeptide codes of SEQ ID NOs. 74-123 may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, the cDNA codes of SEQ ID NOs. 24-73 or the polypeptide codes of SEQ ID NOs. 74-123 may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT, or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to the cDNA codes of SEQ ID NOs. 24-73 or the polypeptide codes of SEQ ID NOs. 74-123. The following list is intended not to limit the invention but to provide guidance to programs and databases which are useful with the cDNA codes of SEQ ID NOs. 24-73 or the polypeptide codes of SEQ ID NOs. 74-123. The programs and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215:403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245 (1990)), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), Delphi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByte MasterFile database, the Genbank database, and the Genseq database.
Many other programs and data bases would be apparent to one of skill in the art given the present disclosure.
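The storage step can be sketched with Python's built-in `sqlite3` module, used here only as a stand-in for the relational database systems named above. The table layout, the function names and the toy sequences are hypothetical (they are not entries from the sequence listing); the FASTA export shows a common plain-text (ASCII) form for such records.

```python
import sqlite3

def store_sequences(conn: sqlite3.Connection, records) -> None:
    """Insert (seq_id, kind, residues) tuples, creating the table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "seq_id TEXT PRIMARY KEY, kind TEXT NOT NULL, residues TEXT NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()

def export_fasta(conn: sqlite3.Connection, kind: str) -> str:
    """Return every stored sequence of the given kind as FASTA text."""
    rows = conn.execute(
        "SELECT seq_id, residues FROM sequences WHERE kind = ? ORDER BY seq_id",
        (kind,),
    )
    return "".join(f">{seq_id}\n{residues}\n" for seq_id, residues in rows)

# Toy records for illustration only.
conn = sqlite3.connect(":memory:")
store_sequences(conn, [
    ("demo_cdna_1", "cDNA", "ATGGCTGCTTAA"),
    ("demo_prot_1", "protein", "MAA"),
])
print(export_fasta(conn, "protein"), end="")
```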
Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices and beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
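Two of the motif classes listed above can be illustrated with regular expressions. The sketch assumes hand-translated forms of the PROSITE N-glycosylation pattern (N-{P}-[ST]-{P}) and leucine zipper pattern (L-x(6)-L-x(6)-L-x(6)-L); it is a simplified stand-in for the dedicated programs listed above, and the function name `scan_motifs` is hypothetical.

```python
import re
from typing import List, Tuple

# Hand-translated regular expressions for two PROSITE-style patterns.
MOTIFS = {
    "N-glycosylation site": r"N[^P][ST][^P]",
    "leucine zipper": r"L.{6}L.{6}L.{6}L",
}

def scan_motifs(protein: str) -> List[Tuple[str, int, int]]:
    """Return (motif name, start, end) with 1-based inclusive positions."""
    hits = []
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported too.
        for m in re.finditer(f"(?=({pattern}))", protein.upper()):
            start = m.start() + 1
            hits.append((name, start, start + len(m.group(1)) - 1))
    return hits

print(scan_motifs("MNGTAK"))  # [('N-glycosylation site', 2, 5)]
```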

Methods of Making Nucleic Acids
The present invention also comprises methods of making the cDNA of SEQ ID Nos. 24-73, genomic DNA obtainable therefrom, or fragments thereof. The methods comprise sequentially linking together nucleotides to produce the nucleic acids having the preceding sequences. A variety of methods of synthesizing nucleic acids are known to those skilled in the art.
In many of these methods, synthesis is conducted on a solid support. These include the 3' phosphoramidite methods, in which the 3'-terminal base of the desired oligonucleotide is immobilized on an insoluble carrier. The nucleotide base to be added is blocked at the 5' hydroxyl and activated at the 3' hydroxyl so as to cause coupling with the immobilized nucleotide base.
Deblocking of the new immobilized nucleotide compound and repetition of the cycle will produce the desired polynucleotide. Alternatively, polynucleotides may be prepared as described in U.S. Patent No. 5,049,656. In some embodiments, several polynucleotides prepared as described above are ligated together to generate longer polynucleotides having a desired sequence.
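The direction of the cycle described above is easy to get backwards: the chain is anchored by its 3'-terminal base and grows toward the 5' end, so bases are coupled in the reverse of the usual 5'-to-3' written order. A minimal Python sketch of the resulting step list follows; the step wording and the function name are illustrative, and capping, oxidation and cleavage from the support are deliberately not modelled.

```python
from typing import List

def phosphoramidite_steps(target_5_to_3: str) -> List[str]:
    """Return the solid-phase steps needed to build `target_5_to_3`."""
    seq = target_5_to_3.upper()
    if not seq:
        raise ValueError("empty target sequence")
    # The 3'-terminal base is the one attached to the insoluble support.
    steps = [f"immobilize {seq[-1]} (3'-terminal base) on support"]
    # Remaining bases are added one per cycle, walking from 3' toward 5'.
    for base in reversed(seq[:-1]):
        steps.append(f"deblock 5'-OH; couple {base} (5'-blocked, 3'-activated)")
    steps.append("final 5'-deblock")
    return steps

for step in phosphoramidite_steps("ACGT"):
    print(step)
```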

Methods of Making Polypeptides
The present invention also comprises methods of making the polypeptides encoded by the cDNA of SEQ ID Nos. 24-73, genomic DNA obtainable therefrom, or fragments thereof, and methods of making the polypeptides of SEQ ID Nos. 74-123 or fragments thereof. The methods comprise sequentially linking together amino acids to produce the polypeptides having the preceding sequences. In some embodiments, the polypeptides made by these methods are 150 amino acids or less in length. In other embodiments, the polypeptides made by these methods are 120 amino acids or less in length.
A variety of methods of making polypeptides are known to those skilled in the art, including methods in which the carboxyl terminal amino acid is bound to polyvinyl benzene or another suitable resin.
The amino acid to be added possesses blocking groups on its amino moiety and any side chain reactive groups so that only its carboxyl moiety can react. The carboxyl group is activated with carbodiimide or another activating agent and allowed to couple to the immobilized amino acid.
After removal of the blocking group, the cycle is repeated to generate a polypeptide having the desired sequence. Alternatively, the methods described in U.S. Patent No. 5,049,656 may be used.

Functional Analysis of Predicted Protein Sequences
Following double-sequencing, contigs were assembled for each of the cDNAs of the present invention, and each was compared to known sequences available at the time of filing. These sequences originate from the following databases: Genbank (release 108), EMBL (release 58 and daily releases), Genseq (release 35.3), Swissprot (release 37), Genbank (release 108 and daily releases up to October 15, 1998), Genseq (release 32), PIR (release 53) and Swissprot (release 35). In some cases, based on homology with other proteins, open reading frames other than those previously selected were chosen. For example, the new open reading frame of SEQ ID NO: 27 no longer contains a signal peptide.
Then, the predicted proteins of the present invention matching known proteins were further classified into 3 categories depending on the level of homology.
The first category contains proteins of the present invention exhibiting at least 80% identical amino acid residues over the whole length of the matched protein. These are clearly close homologues, which most probably have the same or a very similar function as the matched protein.
The second category contains proteins of the present invention exhibiting more remote homologies (35 to 80% over the whole protein) indicating that the protein of the present invention is likely to have functions similar to those of the matched protein.
The third category contains proteins exhibiting homology to a domain of a known protein indicating that the matched protein and the protein of the invention may share similar features such as functional domains.
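The three-way split can be expressed as a small decision function. This Python sketch assumes identity is measured as identical residues over the ungapped length of the matched protein, places exactly 80% in the first category, and returns 0 when no category applies; these boundary choices and the function names are assumptions, not the software used for the analysis.

```python
def percent_identity(aligned_query: str, aligned_match: str) -> float:
    """Percent of the matched protein's residues that are identical in the query.

    Both strings come from one pairwise alignment ('-' marks a gap); the
    denominator is the ungapped length of the matched protein."""
    if len(aligned_query) != len(aligned_match):
        raise ValueError("aligned strings must have equal length")
    match_length = sum(1 for m in aligned_match if m != "-")
    identical = sum(
        1 for q, m in zip(aligned_query, aligned_match) if m != "-" and q == m
    )
    return 100.0 * identical / match_length

def homology_category(identity: float, domain_only: bool = False) -> int:
    """Return 1, 2 or 3 following the three categories; 0 if none applies."""
    if domain_only:
        return 3            # homology restricted to a domain of the known protein
    if identity >= 80.0:
        return 1            # close homologue over the whole matched protein
    if identity >= 35.0:
        return 2            # more remote homology over the whole protein
    return 0

pid = percent_identity("MKT-LA", "MKTALV")   # 4 of 6 residues identical
print(round(pid, 1), homology_category(pid))
```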

It should be noted that, in the numbering of amino acids in the protein sequences discussed below, in figures 10 to 13 and in Table V, the first methionine encountered is designated as amino acid number 1.
In the appended sequence listing, the first amino acid of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1, and the first amino acid of the signal peptide is designated with the appropriate negative number, in accordance with the regulations governing sequence listings.
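Because the two numbering conventions differ by the signal-peptide length, and the sequence-listing convention has no residue 0, converting between them is a common source of off-by-one errors. A minimal Python sketch, assuming the signal-peptide length is known; the function name and the 20-residue signal peptide in the usage line are hypothetical.

```python
def to_listing_number(pos_from_met: int, signal_len: int) -> int:
    """Convert a position counted from the first Met (Met = 1) to listing numbering."""
    if pos_from_met < 1:
        raise ValueError("positions counted from Met start at 1")
    if pos_from_met > signal_len:
        # Mature protein: the first residue after the signal peptide becomes 1.
        return pos_from_met - signal_len
    # Signal peptide: its first residue is -signal_len, its last is -1.
    return pos_from_met - signal_len - 1

# Hypothetical 20-residue signal peptide.
print([to_listing_number(p, 20) for p in (1, 20, 21, 54)])  # [-20, -1, 1, 34]
```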
In addition, all amino acid sequences (SEQ ID NOs: 74-123) were scanned for the presence of known protein signatures and motifs. This search was performed against the Prosite 15.0 database, using the Proscan software from the GCG package as follows.
The polypeptides encoded by the cDNAs were screened for the presence of known structural or functional motifs or for the presence of signatures, i.e. short amino acid sequences that are well conserved amongst the members of a protein family. The conserved regions have been used to derive consensus patterns or matrices included in the PROSITE data bank, in particular in the file prosite.dat located at http://expasy.hcuge.ch/sprot/prosite.html. The prosite_convert and prosite_scan programs (http://ulrec3.unil.ch/ftpserveur/prosite_scan) were used to find signatures in the polypeptides encoded by the cDNAs.
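The pattern-to-regular-expression step performed by a PROSITE pattern converter can be approximated as follows. This is a minimal Python re-implementation of the basic PROSITE pattern syntax (x, [..], {..}, repeat counts, and the < and > anchors), not the program cited above; it does not handle rarer forms such as '>' inside square brackets, and the function name is hypothetical. The usage lines apply it to the GXSXG carboxylesterase active-site motif.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a Python regex."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""     # C-terminal anchor
        element = element.strip("<>")
        # Split an element such as 'x(2,4)' into its residue part and repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        residues, low, high = m.groups()
        if residues.lower() == "x":
            residues = "."                              # any amino acid
        elif residues.startswith("{"):
            residues = "[^" + residues[1:-1] + "]"      # any amino acid except these
        # Single letters and [..] classes are already valid regex syntax.
        if low is not None:
            residues += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + residues + suffix)
    return "".join(regex_parts)

motif = re.compile(prosite_to_regex("G-x-S-x-G."))
hit = motif.search("MKTAGHSLGAV")
print(motif.pattern, hit.start() + 1)   # G.S.G 5  (1-based start position)
```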
For each pattern obtained with the prosite_convert program from the prosite.dat file, the accuracy of the detection on a new protein sequence has been tested by evaluating the frequency of irrelevant hits on the population of human secreted proteins included in the data bank SWISSPROT.
The ratio between the number of hits on shuffled proteins (with a window size of 20 amino acids) and the number of hits on native (unshuffled) proteins was used as an index. Every pattern for which this ratio was greater than 20% (one hit on shuffled proteins for 5 hits on native proteins) was skipped during the search with prosite_scan. The program used to shuffle protein sequences (db_shuffled) and the program used to determine the statistics for each pattern in the protein data banks (prosite_statistics) are available on the ftp site http://ulrec3.unil.ch/ftpserveur/prosite_scan.
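The false-positive filter described above can be sketched in Python as follows. The windowed shuffle keeps the amino acid composition of each 20-residue block while destroying residue order; the fixed random seed, the function names and the decision to skip patterns with no native hits are assumptions, and this is not the shuffling or statistics code cited above.

```python
import random
import re
from typing import Iterable, List, Optional

def window_shuffle(seq: str, window: int = 20, rng: Optional[random.Random] = None) -> str:
    """Shuffle residues independently inside consecutive blocks of `window` residues."""
    rng = rng or random.Random()
    shuffled: List[str] = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        shuffled.extend(block)
    return "".join(shuffled)

def count_hits(regex: str, proteins: Iterable[str]) -> int:
    """Count (possibly overlapping) matches of `regex` across all proteins."""
    finder = re.compile(f"(?={regex})")
    return sum(len(finder.findall(p)) for p in proteins)

def keep_pattern(regex: str, proteins: List[str], max_ratio: float = 0.20, seed: int = 0) -> bool:
    """True if hits on shuffled proteins stay at or below `max_ratio` of native hits."""
    rng = random.Random(seed)
    native = count_hits(regex, proteins)
    if native == 0:
        return False  # assumption: a pattern with no native hits gives no usable ratio
    shuffled = count_hits(regex, [window_shuffle(p, 20, rng) for p in proteins])
    return shuffled / native <= max_ratio
```

A single-residue pattern such as "A" hits shuffled sequences as often as native ones (ratio 1.0) and is skipped, whereas a long, order-dependent pattern essentially never reappears after shuffling and is kept.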
A) Proteins which are closely related to known proteins
Protein of SEQ ID NO: 76 (internal designation 105-095-1-0-D10-FLT)
The protein of SEQ ID NO: 76 encoded by the cDNA of SEQ ID NO: 26 exhibits identity to the human parotid secretory protein HPSP (Genseq accession number W60682 and SEQ ID NO: 124) as shown by the alignment of figure 10. Antagonists of this protein may be used to treat cancer and autoimmune diseases, particularly of secretory or gastrointestinal tissue.
Taken together, these data suggest that the protein of SEQ ID NO: 76 or part thereof may play a role in cell differentiation and/or proliferation. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer and autoimmune diseases.
Protein of SEQ ID NO: 93 (internal designation 117-007-2-0-C4-FLC)
The protein of SEQ ID NO: 93 encoded by the cDNA of SEQ ID NO: 43 exhibits identity to a human protein thought to be a transmembrane protein (Genseq accession number W88491 and SEQ ID NO: 125) as shown by the alignment of figure 11. This protein displays homology to the alpha-2-HS glycoprotein precursors (fetuins) of human and pig, which belong to the cystatin superfamily. The 382-amino-acid-long protein of SEQ ID NO: 93, which is similar in size to fetuins, displays a cystatin-like domain with 12 conserved cysteines (positions 36, 93, 104, 117, 137, 151, 154, 216, 224, 237, 254 and 368, in bold in figure 11) and a conserved region around the second cysteine (positions 89 to 96, underlined in figure 11), although the typical PROSITE signature for fetuins is not present. In addition, the potential active site QxVxG is also present in the protein of the invention (positions 198 to 202, in italics in figure 11). The cystatin superfamily contains evolutionarily related proteins with diverse functions, such as cysteine protease inhibitors, stefins, fetuins and kininogens (see the review by Brown and Dziegielewska, Prot. Science 6:5-12 (1997)).
Taken together, these data suggest that the protein of SEQ ID NO: 93 or part thereof may play a role in cellular proteolysis, possibly as a protease inhibitor. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer (especially tumor progression and metastasis), chronic inflammation, neurodegenerative diseases such as Alzheimer's disease, diabetes, hypertension and immune disorders. It may also be useful in treating patients with cardiovascular disorders by modulating their blood coagulation properties.
Protein of SEQ ID NO: 75 (internal designation 105-031-3-0-D6-FLC)
The protein of SEQ ID NO: 75 encoded by the cDNA of SEQ ID NO: 25 exhibits homology to a murine putative sialyltransferase protein (TREMBL accession number 088725 and SEQ ID NO: 126) as shown by the alignment of figure 12. Sialyltransferases are type II transmembrane proteins involved in the biosynthesis of sialosides, which are important in a large variety of biological processes such as cell-cell communication, cell-matrix interactions and maintenance of serum glycoproteins in the circulation (Sjoberg et al., J. Biol. Chem. 271:7450-7459 (1996); Tsuji, J. Biochem. 120:1-13 (1996)). The protein of SEQ ID NO: 75 displays the two conserved motifs of the sialyltransferase protein family, namely the centrally located sialylmotif L (positions 73 to 120, in bold in figure 12), thought to be involved in the recognition of the sugar nucleotide donor common to all sialyltransferases, and the sialylmotif S (positions 211 to 233, in italics in figure 12), located in the C-terminus of the protein and thought to be the catalytic site. Furthermore, the 302-amino-acid-long protein of SEQ ID NO: 75 has a size similar to that of the members of the sialyltransferase family. In addition, the protein of the invention has a predicted transmembrane structure: it contains 2 potential transmembrane segments (positions 7 to 27 and 206 to 226, underlined in figure 12) as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes 10:685-686 (1994)).
Taken together, these data suggest that the protein of SEQ ID NO: 75 or part thereof may play a role in the biosynthesis of sialyl-glycoconjugates, probably as a sialyltransferase. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer, cystic fibrosis and hypothyroidism.

Protein of SEQ ID NO: 104 (internal designation 108-008-5-0-C5-FL)
The protein of SEQ ID NO: 104 encoded by the cDNA of SEQ ID NO: 54 exhibits extensive homology over the whole length of the murine recombination activating gene 1 inducing protein (Genbank accession number X96618 and SEQ ID NO: 177). As shown by the alignment of figure 13, the amino acid residues are identical except at positions 6, 7, 10-13, 17, 25, 34-35, 42, 51, 56, 62, 68, 71, 74, 78, 91, 93, 95-96, 106, 121-122, 151-152, 159, 162-163, 170-171, 176-177, 188, 190, 192, 196, 199, 202-203, 206, 210, 215 and 217 of the 221-amino-acid-long matched protein. The matched protein, which has 4 potential transmembrane segments, is involved in the induction of recombination of V(D)J segments in T cells (Muraguchi et al., Leuk. Lymphoma 30:73-85 (1998)).
Taken together, these data suggest that the protein of SEQ ID NO: 104 may play a role in lymphocyte repertoire formation. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer, immunological disorders and inflammatory disorders. It may also be useful to modulate the inflammatory or immune response to infectious agents, such as HIV.
B) Proteins which are remotely related to proteins with known functions
Protein of SEQ ID NO: 87 (internal designation 116-073-4-0-C8-FLC)
Part of the protein of SEQ ID NO: 87 encoded by the cDNA of SEQ ID NO: 37 shows homology over the whole length of the widely conserved family of lysozyme C precursors (fish, bird and mammals). In addition, this protein displays the characteristic alpha-lactalbumin/lysozyme C PROSITE signature of this family of glycosyl hydrolases, family 22 (positions 162 to 180, see Table V).
Lysozymes C are bacteriolytic defensive enzymes, and alpha-lactalbumin is the regulatory subunit of lactose synthetase. Lysozymes C and alpha-lactalbumin appear to be evolutionarily related (Qasba and Kumar, Crit. Rev. Biochem. Mol. Biol. 32:255-306 (1997)).
Taken together, these data suggest that the protein of SEQ ID NO: 87 or part thereof, especially the domain matching the above-mentioned lysozyme C precursors, may play a role in glycoprotein and/or peptidoglycan metabolism, probably as a glycosyl hydrolase. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer and amyloidosis. It may also be useful in modulating defensive responses to infectious agents such as bacteria.
Protein of SEQ ID NO: 86 (internal designation 116-054-3-0-G12-FLC)
The protein of SEQ ID NO: 86 encoded by the cDNA of SEQ ID NO: 36, found in liver, shows homology to the MLRQ subunit of NADH-ubiquinone oxidoreductase (complex I) of bovine, murine and human species (Genbank accession numbers X64897 and U59509 and EMBL accession number U94586, respectively). In addition, the 83-amino-acid-long protein of SEQ ID NO: 86 has a size similar to those of known MLRQ subunits. Complex I is part of the mitochondrial electron transport chain and is involved in the dehydrogenation of NADH and the transfer of electrons to coenzyme Q. It is also thought to play a role in the regulation of apoptosis and necrosis. Mitochondriocytopathies due to complex I deficiency are frequently encountered and affect tissues with a high energy demand, such as brain (mental retardation, convulsions, movement disorders), heart (cardiomyopathy, conduction disorders), kidney (Fanconi syndrome), skeletal muscle (exercise intolerance, muscle weakness, hypotonia) and/or eye (ophthalmoplegia, ptosis, cataract and retinopathy). For a review on complex I, see Smeitink et al., Hum. Mol. Genet. 7:1573-1579 (1998).
Taken together, these data suggest that the protein of SEQ ID NO: 86 may be an NADH-ubiquinone oxidoreductase MLRQ-like protein. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, brain disorders (mental retardation, convulsions, movement disorders), heart disorders (cardiomyopathy, conduction disorders), kidney disorders (Fanconi syndrome), skeletal muscle disorders (exercise intolerance, muscle weakness, hypotonia) and/or eye disorders (ophthalmoplegia, ptosis, cataract and retinopathy).
Protein of SEQ ID NO: 91 (internal designation 117-005-4-0-E5-FLC)
The protein of SEQ ID NO: 91 encoded by the cDNA of SEQ ID NO: 41, found in liver, shows homology over domains of a family of mitochondrial substrate carrier proteins found in the inner mitochondrial membrane. These carrier proteins are evolutionarily related and consist of three tandem repeats of a domain of approximately one hundred residues, each of these domains containing two transmembrane regions. The 308-amino-acid-long protein of SEQ ID NO: 91 has a size similar to that of mitochondrial carrier proteins and displays the characteristic PROSITE signature of this protein family three times (positions 19 to 28, 115 to 124 and 237 to 246, see Table V). In addition, the protein of SEQ ID NO: 91 has 6 potential transmembrane segments of 21 amino acids, 4 predicted with a high level of confidence (positions 1-21, 54-74, 135-155 and 217-237) and 2 with a lower level of confidence (positions 96-116 and 191-211), using the TopPred II software (Claros and von Heijne, CABIOS Applic. Notes 10:685-686 (1994)).
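Transmembrane-segment predictions of this kind rest on hydrophobicity profiles. TopPred II itself is not reproduced here; as a substitute technique, the minimal Python sketch below slides a 21-residue window along the sequence using the Kyte-Doolittle hydropathy scale and flags windows whose mean exceeds 1.6. The window length, threshold, function name and merging of overlapping windows are assumptions, and the reported spans are wider than the hydrophobic core because they cover whole windows.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def hydrophobic_segments(protein: str, window: int = 21,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) spans covered by windows whose mean
    hydropathy exceeds `threshold`; overlapping windows are merged."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]  # unknown residues score 0
    spans: List[Tuple[int, int]] = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)            # extend the current span
            else:
                spans.append((start, end))
    return spans

print(hydrophobic_segments("D" * 10 + "L" * 21 + "D" * 10))  # [(5, 37)]
```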
Taken together, these data suggest that the protein of SEQ ID NO: 91 or part thereof may play a role in energy transfer, probably as a mitochondrial substrate carrier protein.
Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, mitochondriocytopathies and obesity.
In particular, the protein of SEQ ID NO: 91 encoded by the cDNA of SEQ ID NO: 41 exhibits homology to apolipoprotein A-IV-related protein. Lipoproteins such as HDL and LDL contain characteristic apolipoproteins that are responsible for targeting them to certain tissues and for activating enzymes required for the trafficking of the lipid fraction of the lipoprotein (including cholesterol). Apolipoprotein A-IV-related protein (AA4RP) is a member of the apolipoprotein family; it is 52% similar (29% identical) to apolipoprotein A-IV (ApoA-IV) and is therefore likely to have a similar function. ApoA-IV is found associated with the chylomicron and HDL fractions of blood. Its specific function is currently unknown; however, it is expressed in the liver and intestine and is regulated by high-fat meals (upregulated) and by leptin (downregulated). Levels of ApoA-IV are correlated with glycemic control in young type I diabetes (IDDM) patients. Over-expression of the protein is protective against atherosclerosis in ApoE-knockout mice.
Finally, ApoA-IV is responsible for part of the inter-individual variability in the blood cholesterol response to changes in dietary fat/cholesterol intake.
AA4RP circulates in the blood, and is therefore easily amenable to therapeutic intervention, by direct administration into the blood of synthetic peptide analogs that mimic its activity or function as competitive antagonists (dominant negatives). Since this protein is involved in fat transport and in cholesterol trafficking within the body and mediates the changes in blood cholesterol in response to dietary changes, interventions targeted at this protein will be useful for cholesterol lowering and anti-atherosclerosis therapeutics, and in the control of diabetes and obesity.
Protein of SEQ ID NO: 74 (internal designation 105-016-3-0-E3-FLC)
The 325-amino-acid-long protein of SEQ ID NO: 74 encoded by the cDNA of SEQ ID NO: 24 shows homology over the whole length of the 332-amino-acid-long murine neural proliferation differentiation and control 1 protein, or NPDC-1 (Genbank accession number X67209), which is thought to play an important role in the control of neural cell proliferation and differentiation, as well as in cell survival, probably by interacting directly or indirectly with cell cycle regulators such as E2F-1 (Galiana et al., Proc. Natl. Acad. Sci. USA 92:1560-1564 (1995); Dupont et al., J. Neurosci. Res. 51:257-267 (1998)).
Taken together, these data suggest that the protein of SEQ ID NO: 74 or part thereof may play a role in cell proliferation and differentiation. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer and neurodegenerative disorders.
Protein of SEQ ID NO: 111 (internal designation 108-013-5-0-H9-FL)
The protein of SEQ ID NO: 111 encoded by the extended cDNA of SEQ ID NO: 61 shows homology with a family of lysophospholipases conserved among eukaryotes (yeast, rabbit, rodents and human). In addition, some members of this family (rat, Genbank accession number 097146; rabbit, Genbank accession number 097147) exhibit a calcium-independent phospholipase A2 activity (Portilla et al., J. Am. Soc. Nephrol. 9:1178-1186 (1998)). All members of this family exhibit the active-site consensus GXSXG motif of carboxylesterases, which is also found in the protein of the invention (positions 54 to 58). In addition, this protein may be a membrane protein with one transmembrane domain, as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes 10:685-686 (1994)).
Taken together, these data suggest that the protein of SEQ ID NO: 111 may play a role in fatty acid metabolism, probably as a phospholipase. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, cancer, neurodegenerative disorders such as Parkinson's and Alzheimer's diseases, and diabetes. It may also be useful in modulating inflammatory responses to infectious agents and/or in suppressing graft rejection.
Protein of SEQ ID NO: 101 (internal designation 108-005-5-0-F9-FL)
The protein of SEQ ID NO: 101 encoded by the extended cDNA of SEQ ID NO: 51 shows homology with the Drosophila rhythmically expressed gene 2 protein (Genbank accession number 065492).
Expression of the mRNA coding for the matched protein depends on the interplay between the light-dark cycle, feeding conditions and expression of the per gene, which is essential to the function of the endogenous circadian pacemaker (Van Gelder et al., Curr. Biol. 5:1424-1436 (1995)).
Taken together, these data suggest that the protein of SEQ ID NO: 101 may play a role in circadian control. Thus, this protein or part thereof may be useful in diagnosing and/or treating several disorders including, but not limited to, insomnia, depression, stress and other disorders of the circadian rhythm. In addition, such a protein may be useful in modulating the physiological response to night work or to jet lag.
C) Proteins homologous to a domain of a protein with known function
Protein of SEQ ID NO: 94 (internal designation 121-004-3-0-F6-FLC)
The protein of SEQ ID NO: 94 encoded by the cDNA of SEQ ID NO: 44, found in brain, shows homology to a ganglioside-induced differentiation associated protein 1 found in both human (EMBL accession number 075786) and murine species (EMBL accession number 088741).
Gangliosides are believed to be involved in neural cell development, differentiation, survival and pathology, possibly as modulators of membrane properties (Brigande and Seyfried, Ann. N. Y. Acad. Sci. 845:215-218 (1998); Schengrund and Mummert, Ann. N. Y. Acad. Sci. 845:278-284 (1998)).
Taken together, these data suggest that the protein of SEQ ID NO: 94 or part thereof may play a role in central nervous system development and differentiation. Thus, this protein or part thereof may be useful in diagnosing and treating several disorders including, but not limited to, cancer and neuronal disorders.
Protein of SEQ ID NO: 89 (internal designation 117-005-2-0-E10-FLC)
The protein of SEQ ID NO: 89 encoded by the cDNA of SEQ ID NO: 39 shows remote homology to domains of apolipoprotein A-IV of human, murine and chicken species (Genbank accession numbers M13654 and M13966 and EMBL accession number 093601, respectively). These apolipoproteins are thought to play a role in chylomicron and VLDL secretion and catabolism and may also be involved in reverse cholesterol transport. In addition, the 366-amino-acid-long protein of SEQ ID NO: 89 has a size similar to those of the above-mentioned apolipoproteins A-IV.
The protein of SEQ ID NO: 89 encoded by the cDNA of SEQ ID NO: 39 exhibits homology to the carnitine carrier-related protein. The carnitine carrier-related protein (CCRP) is 45% similar (30% identical) to the acyl-carnitine/carnitine carrier and is therefore likely to have a similar function. The acyl-carnitine/carnitine carrier is a mitochondrial carrier protein that is responsible for transporting fatty acids into the mitochondrion, where they may be oxidized to produce energy. CCRP also shares underlying structural similarities with the uncoupling protein UCP-1, another mitochondrial transporter protein, which is involved in weight regulation and temperature homeostasis. UCP activity is regulated by nucleotides via a 9-amino-acid protein domain that is relatively well conserved in the predicted CCRP (6 of 9 residues identical, 9 of 9 similar), compared to only 4 of 9 for the acyl-carnitine/carnitine carrier itself. Therefore, the function of CCRP may be amenable to direct activation or inhibition via small-molecule nucleotide analogs.
The acyl-carnitine/carnitine carrier is required for transport of fatty acids into mitochondria before they can be oxidized for energy; however, genetic mutations of this gene do not result in disturbances of weight.

This indicates that another protein must also be available for fatty acid transport, and CCRP is likely to be this transporter.
The rate of lipid burning by the mitochondrion is dependent upon the rate of delivery of fatty acids into the mitochondrion by these transporters. Regulation of the activity of CCRP, via its nucleotide binding domain or by other interventions to increase its availability or activity in the mitochondria, would increase the fat burning capacity of tissues. Since elevated plasma free fatty acids have been implicated in the causation of type II diabetes (NIDDM), such interventions could be designed to increase net clearance of lipids from the blood. Other effects of therapeutics targeted at CCRP could be to increase fat burning by liver and muscle at the expense of fat storage by adipose tissue, with the result of decreasing weight.
Taken together, these data suggest that the protein of SEQ ID NO: 89 may play a role in lipid metabolism. Thus, this protein or part thereof may be useful in diagnosing and treating several disorders including, but not limited to, hyperlipidemia, hypercholesterolemia, atherosclerosis, cardiovascular disorders such as coronary heart disease, neurodegenerative disorders such as Alzheimer's disease or dementia, and obesity.
Protein of SEQ ID NO: 95 (internal designation 122-005-2-0-F11-FLC)
The protein of SEQ ID NO: 95 encoded by the cDNA of SEQ ID NO: 45 exhibits homology with domains of a family of reductases, especially the NADH-cytochrome b5 reductase of rat, bovine and human species (Genbank accession numbers J03867, M83104 and Y09501, respectively). The homology includes the flavin-adenine dinucleotide-binding domain of the NADH-cytochrome b5 reductase proteins, which belong to a flavoenzyme family whose members are involved in photosynthesis, in the assimilation of nitrogen and sulfur, in fatty-acid oxidation, in the reduction of methemoglobin and in the metabolism of many pesticides, drugs and carcinogens.
Taken together, these data suggest that the protein of SEQ ID NO: 95 may play a role in cellular oxidoreduction reactions, possibly as a flavoenzyme reductase. Thus, this protein or part thereof may be useful in diagnosing and treating several disorders including, but not limited to, cancer, methemoglobinemia, hyperlipidemia, obesity and cardiovascular disorders. It may also be useful in regulating the metabolism of pesticides, drugs and carcinogens.
Protein of SEQ ID NO: 106 (internal designation 108-011-5-0-B12-FL)
The protein of SEQ ID NO: 106 encoded by the extended cDNA of SEQ ID NO: 56 shows homology to the predicted extracellular domain and part of the transmembrane domain of the interleukin-17 receptor of both human and murine species (Genbank accession numbers W04185 and W04184). These IL-17R proteins are thought to belong to a new family of receptors for cytokines which induce T cell proliferation, I-CAM expression and preferential maturation of haematopoietic precursors into neutrophils (Yao et al., Cytokine 9:794-800 (1997)). IL-17 signalling is also thought to play a proinflammatory role and to induce nitric oxide production. The protein of the invention has a 21-amino-acid transmembrane domain (positions 172 to 192), as predicted by the software TopPred II (Claros and von Heijne, CABIOS Applic. Notes 10:685-686 (1994)), matching the 21-amino-acid putative transmembrane domain of the human interleukin-17 receptor.

Taken together, these data suggest that the protein of SEQ ID NO: 106 may play a role in regulating immune and/or inflammatory responses. Thus, this protein, or part thereof, may be useful in diagnosing and treating several disorders including, but not limited to, cancer, immunological disorders, septic shock and impotence. In addition, this protein may be useful to modulate immune and/or inflammatory responses to infectious agents and/or to suppress graft rejection.
Protein of SEQ ID NO: 114 (internal designation 108-014-5-0-D12-FL)
The protein of SEQ ID NO: 114, encoded by the extended cDNA of SEQ ID NO: 64, possesses a cysteine-rich C3H2C3 region also found in the G1 protein of Drosophila melanogaster (Swiss-Prot accession number O06003). This cysteine-rich region is similar to a RING-type zinc finger, a domain that binds two zinc atoms and is probably involved in mediating protein-protein interactions.
Taken together, these data suggest that the protein of SEQ ID NO: 114 may play a role in protein-protein interactions.
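The C3H2C3 arrangement mentioned above can be screened for with a simple pattern. The sketch below encodes a commonly cited RING-H2 consensus, C-x2-C-x(9-39)-C-x(1-3)-H-x(2-3)-H-x2-C-x(4-48)-C-x2-C, as a Python regular expression. The consensus spacing, the function name `find_ring_h2` and the test sequence are assumptions, and a pattern hit only suggests, rather than establishes, a RING-type zinc finger.

```python
import re

# Assumed RING-H2 (C3H2C3) consensus, written as a regular expression:
# C-x2-C-x(9-39)-C-x(1-3)-H-x(2-3)-H-x2-C-x(4-48)-C-x2-C
RING_H2 = re.compile(r"C.{2}C.{9,39}C.{1,3}H.{2,3}H.{2}C.{4,48}C.{2}C")


def find_ring_h2(seq: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans matching the consensus."""
    return [(m.start() + 1, m.end()) for m in RING_H2.finditer(seq.upper())]


# Hypothetical sequence built to contain one C3H2C3 arrangement.
motif = "CAAC" + "A" * 10 + "CAHAAHAAC" + "A" * 5 + "CAAC"
demo = "AAAA" + motif + "AAAA"
print(find_ring_h2(demo))  # [(5, 36)]
```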
The nucleic acid sequences of SEQ ID NOs: 24-73 or fragments thereof may also be used to construct fusion proteins in which the polypeptide sequences of SEQ ID NOs: 74-123 or fragments thereof are fused to heterologous polypeptides. For example, the fragments of the polypeptides of SEQ ID NOs: 74-123 included in the fusion proteins may comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of the polypeptides of SEQ ID NOs: 74-123, or may be of any length suitable for the intended purpose of the fusion protein. Nucleic acids encoding the desired fusion protein are produced by cloning a nucleic acid of SEQ ID NOs: 24-73 in frame with a nucleic acid encoding the heterologous polypeptide. The nucleic acid encoding the desired fusion protein is operably linked to a promoter in an appropriate vector, such as any of the vectors described above, and introduced into a host capable of expressing the fusion protein.
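Cloning in frame, as described above, requires that the cDNA-derived coding sequence (plus any linker) contain a whole number of codons and that no stop codon appear before the end of the heterologous partner. The following minimal sketch checks those two conditions and translates the fusion with the standard genetic code. The function names, the optional linker and the short sequences are hypothetical and are not taken from SEQ ID NOs: 24-73.

```python
# Standard genetic code; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def fuse_in_frame(insert_orf: str, partner_orf: str, linker: str = "") -> str:
    """Join insert + linker + partner and check that the fusion stays in frame.

    insert_orf: coding sequence WITHOUT its own stop codon.
    partner_orf: heterologous coding sequence; may end with a stop codon.
    Returns the translated fusion protein.
    """
    upstream = insert_orf + linker
    if len(upstream) % 3 != 0:
        raise ValueError("insert + linker is not a whole number of codons")
    protein = translate(upstream + partner_orf)
    if "*" in protein.rstrip("*"):
        raise ValueError("internal stop codon in the fused reading frame")
    return protein


# Hypothetical sequences: M-A-H fused to G-K followed by a stop codon.
print(fuse_in_frame("ATGGCTCAT", "GGCAAATAA"))  # MAHGK*
```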
Antibodies against the polypeptides of SEQ ID NOs: 74-123 or fragments thereof may be used in immunoaffinity chromatography to isolate the polypeptides of SEQ ID NOs: 74-123 or fragments thereof, or to isolate fusion proteins containing the polypeptides of SEQ ID NOs: 74-123 or fragments thereof.

Immunoaffinity Chromatography
Antibodies prepared as described above are coupled to a support. Preferably, the antibodies are monoclonal antibodies, but polyclonal antibodies may also be used. The support may be any of those typically employed in immunoaffinity chromatography, including Sepharose CL-4B (Pharmacia, Piscataway, NJ), Sepharose CL-2B (Pharmacia, Piscataway, NJ), Affi-Gel 10 (Bio-Rad, Richmond, CA), or glass beads.
The antibodies may be coupled to the support using any of the coupling reagents typically used in immunoaffinity chromatography, including cyanogen bromide. After the antibody is coupled to the support, the support is contacted with a sample containing a target polypeptide whose isolation, purification or enrichment is desired. The target polypeptide may be a polypeptide of SEQ ID NOs: 74-123, a fragment thereof, or a fusion protein comprising a polypeptide of SEQ ID NOs: 74-123 or a fragment thereof.

Preferably, the sample is placed in contact with the support for a sufficient amount of time and under appropriate conditions to allow at least 50% of the target polypeptide to specifically bind to the antibody coupled to the support.
Thereafter, the support is washed with an appropriate wash solution to remove polypeptides that have adhered non-specifically to the support. The wash solution may be any of those typically employed in immunoaffinity chromatography, including PBS, Tris-lithium chloride buffer (0.1 M lysine base and 0.5 M lithium chloride, pH 8.0), Tris-hydrochloride buffer (0.05 M Tris-hydrochloride, pH 8.0), or Tris/Triton/NaCl buffer (50 mM Tris-HCl, pH 8.0 or 9.0, 0.1% Triton X-100, and 0.5 M NaCl).
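As a worked example of the arithmetic behind the Tris/Triton/NaCl wash buffer listed above, the sketch below converts the stated concentrations into amounts for a chosen volume (mass = molarity × volume × molar mass). The molar masses (Tris base 121.14 g/mol, NaCl 58.44 g/mol) are standard values supplied here. Preparing the Tris-HCl component from Tris base titrated with HCl is an assumed workflow, not part of the description, and the function names are hypothetical.

```python
# Molar masses in g/mol (standard values supplied for this sketch).
MOLAR_MASS = {"Tris base": 121.14, "NaCl": 58.44}


def grams_needed(compound: str, molarity: float, volume_l: float) -> float:
    """Mass of solute (g) for a target molarity (mol/L) in a final volume (L)."""
    return molarity * volume_l * MOLAR_MASS[compound]


def tris_triton_nacl(volume_l: float = 1.0) -> dict[str, float]:
    """Amounts for 50 mM Tris, 0.1% (v/v) Triton X-100, 0.5 M NaCl.

    Assumes the Tris is weighed as Tris base and titrated with HCl to
    pH 8.0 or 9.0 before the solution is brought to final volume.
    """
    return {
        "Tris base (g)": round(grams_needed("Tris base", 0.050, volume_l), 3),
        "NaCl (g)": round(grams_needed("NaCl", 0.5, volume_l), 3),
        # 0.1% (v/v) of the final volume, expressed in millilitres.
        "Triton X-100 (mL)": round(0.1 / 100 * volume_l * 1000, 3),
    }


print(tris_triton_nacl(1.0))
# {'Tris base (g)': 6.057, 'NaCl (g)': 29.22, 'Triton X-100 (mL)': 1.0}
```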
After washing, the specifically bound target polypeptide is eluted from the support using the high-pH or low-pH elution solutions typically employed in immunoaffinity chromatography. In particular, the elution solutions may contain an eluant such as triethanolamine, diethylamine, calcium chloride, sodium thiocyanate, potassium bromide, acetic acid, or glycine. In some embodiments, the elution solution may also contain a detergent such as Triton X-100 or octyl-β-D-glucoside.
As discussed above, the cDNAs of the present invention or fragments thereof can be used for various purposes. The polynucleotides can be used to express recombinant protein for analysis, characterization or therapeutic use; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in disease states); as molecular weight markers on Southern gels; as chromosome markers or tags (when labeled) to identify chromosomes or to map related gene positions; to compare with endogenous DNA sequences in patients to identify potential genetic disorders; as probes to hybridize and thus discover novel, related DNA sequences; as a source of information to derive PCR primers for genetic fingerprinting; for selecting and making oligomers for attachment to a "gene chip" or other support, including for examination of expression patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an antigen to raise anti-DNA antibodies or elicit another immune response. Where the polynucleotide encodes a protein that binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in interaction trap assays (such as, for example, that described in Gyuris et al., Cell 75:791-803 (1993)) to identify polynucleotides encoding the other protein with which binding occurs or to identify inhibitors of the binding interaction.
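At its simplest, deriving PCR primers from one of the cDNAs, one of the uses listed above, means taking the 5' end of the template as the forward primer and the reverse complement of its 3' end as the reverse primer. The sketch below shows only that step plus a GC-content check. The function names and the template are hypothetical, and practical primer design also weighs melting temperature, self-complementarity and specificity.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def end_primers(template: str, length: int = 20) -> tuple[str, str]:
    """Forward primer = first `length` nt of the template (5'->3');
    reverse primer = reverse complement of the last `length` nt."""
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    return template[:length], reverse_complement(template[-length:])


# Hypothetical 21-nt template; real primers are usually longer.
fwd, rev = end_primers("ATGGCCAAAGGGTTTCCCTAG", length=6)
print(fwd, rev, round(gc_fraction(fwd), 2))  # ATGGCC CTAGGG 0.67
```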
The proteins or polypeptides provided by the present invention can similarly be used in assays to determine biological activity, including in a panel of multiple proteins for high-throughput screening; to raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in assays designed to quantitatively determine levels of the protein (or its receptor) in biological fluids; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in a disease state); and, of course, to isolate correlative receptors or ligands. Where the protein binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), the protein can be used to identify the other protein with which binding occurs or to identify inhibitors of the binding interaction. Proteins involved in these binding interactions can also be used to screen for peptide or small molecule inhibitors or agonists of the binding interaction.
Any or all of these research utilities are capable of being developed into reagent grade or kit format for commercialization as research products.
Methods for performing the uses listed above are well known to those skilled in the art. References disclosing such methods include without limitation "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E.F. Fritsch and T. Maniatis eds., 1989, and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press, Berger, S.L. and A.R. Kimmel eds., 1987.
Polynucleotides and proteins of the present invention can also be used as nutritional sources or supplements. Such uses include without limitation use as a protein or amino acid supplement, use as a carbon source, use as a nitrogen source and use as a source of carbohydrate.
In such cases the protein or polynucleotide of the invention can be added to the feed of a particular organism or can be administered as a separate solid or liquid preparation, such as in the form of powder, pills, solutions, suspensions or capsules. In the case of microorganisms, the protein or polynucleotide of the invention can be added to the medium in or on which the microorganism is cultured.
Although this invention has been described in terms of certain preferred embodiments, other embodiments which will be apparent to those of ordinary skill in the art in view of the disclosure herein are also within the scope of this invention. Accordingly, the scope of the invention is intended to be defined only by reference to the appended claims.

WO 00/37491 PCT/IB99/02058
TABLE I
Id | FCS location | SigPep location | Mature polypeptide location | Stop codon location | PolyA signal location | PolyA site location
25 | 261/1166 | 261/314 | 315/1166 | 1167 | | 1524/1556
29 | 144/440 | 144/287 | 288/440 | 441 | 457/462 | 500/515
36 | 192/440 | 192/278 | 279/440 | 441 | 590/595 | 622/636
38 | 139/1389 | 139/198 | 199/1389 | 1390 | 1854/1859 | 1873/1888
47 | 165/842 | 165/251 | 252/842 | 843 | 1474/1479 | 1500/1515
48 | 31/1248 | 31/135 | 136/1248 | 1249 | 1580/1585 | 1607/1622
53 | 280/678 | 280/411 | 412/678 | 679 | 1606/1611 | 1628/1643
64 | 238/1152 | 238/339 | 340/1152 | 1153 | 1298/1303 | 1324/1355
TABLE II
Id | Full-length polypeptide location | Signal peptide location | Mature polypeptide location
74 | -26 through | -26 through -1 | 1 through
75 | -18 through | -18 through -1 | 1 through
76 | -15 through | -15 through -1 | 1 through
77 | 1 through | | 1 through
78 | -13 through | -13 through -1 | 1 through
79 | -48 through | -48 through -1 | 1 through
80 | -32 through | -32 through -1 | 1 through
81 | -46 through | -46 through -1 | 1 through
82 | -19 through | -19 through -1 | 1 through
83 | -21 through | -21 through -1 | 1 through
84 | -70 through | -70 through -1 | 1 through
85 | -32 through | -32 through -1 | 1 through
86 | -29 through | -29 through -1 | 1 through
87 | -41 through | -41 through -1 | 1 through
88 | -20 through | -20 through -1 | 1 through
89 | -23 through | -23 through -1 | 1 through
90 | -45 through | -45 through -1 | 1 through
91 | -68 through | -68 through -1 | 1 through
92 | -49 through | -49 through -1 | 1 through
93 | -15 through | -15 through -1 | 1 through
94 | -197 through | -197 through -1 | 1 through
95 | -26 through | -26 through -1 | 1 through
96 | -25 through | -25 through -1 | 1 through
97 | -29 through | -29 through -1 | 1 through
98 | -35 through | -35 through -1 | 1 through
99 | -57 through | -57 through -1 | 1 through
100 | -36 through | -36 through -1 | 1 through
101 | -243 through | -243 through -1 | 1 through
102 | -24 through | -24 through -1 | 1 through
103 | -44 through | -44 through -1 | 1 through
104 | -28 through | -28 through -1 | 1 through
105 | -23 through | -23 through -1 | 1 through
106 | -184 through | -184 through -1 | 1 through
107 | -23 through | -23 through -1 | 1 through
108 | -49 through | -49 through -1 | 1 through
109 | -28 through | -28 through -1 | 1 through
110 | -37 through | -37 through -1 | 1 through
111 | -88 through | -88 through -1 | 1 through
112 | -56 through | -56 through -1 | 1 through
113 | -20 through | -20 through -1 | 1 through
114 | -34 through | -34 through -1 | 1 through
115 | -42 through | -42 through -1 | 1 through
116 | -15 through | -15 through -1 | 1 through
117 | -30 through | -30 through -1 | 1 through
118 | -90 through | -90 through -1 | 1 through
119 | -25 through | -25 through -1 | 1 through
120 | -101 through | -101 through -1 | 1 through
121 | -86 through | -86 through -1 | 1 through
122 | -21 through | -21 through -1 | 1 through
123 | -19 through | -19 through -1 | 1 through
TABLE III
Id | Positions of preferred fragments
24 | 1-126, 164-259, 420-432,
25 | 32-44, 4199-1556
26 | 1-19, 1011-1058
27 | 1-16, 108-159, 595-648
28 | 1-119, 486-665, 1968-2009,
29 | 424-435, 500-515
30 | 1-122, 242-661
31 | 1-16, 649-694
32 | 1-663, 1070-110
33 | 1-129, 541-623
34 | 1-200, 614-657
35 | 1-419, 1094-1137
36 | 1-127, 323-331, 595-636
38 | 1-47, 438-611, 1005-1133,
39 | 1-430, 527-1894
40 | 1-119, 1743-1792, 1866-1913
41 | 1-70, 133-1235, 1729-1744
42 | 575-615, 896-946
43 | 513-526, 950-960, 1577-1622
44 | 1-2, 210-265, 674-715
45 | 1400-1441, 1508-1549
46 | 1-4, 1284,1328
TABLE IV
Internal designation | Id | Type of sequence
121-004-3-0-F6-FL | 44 | DNA
108-011-5-0-B12-FL | 56 | DNA
108-008-5-0-G5-FL | 10~ | PRT
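Tables I and II and the sequence listing describe the same features in two coordinate systems: nucleotide positions in the cDNA (for example, CDS 131..490 and sig_peptide 131..301 for SEQ ID NO: 3) and amino-acid positions in the protein, with signal-peptide residues numbered negatively up to -1 (SIGNAL -57..-1 for SEQ ID NO: 4). The sketch below converts one into the other by codon counting. It assumes that the CDS range excludes the stop codon and that the signal peptide begins at the CDS start, as in the entries shown; the function names are hypothetical.

```python
def codon_count(start: int, end: int) -> int:
    """Codons in a 1-based, inclusive nucleotide interval such as 131..301."""
    length = end - start + 1
    if length <= 0 or length % 3:
        raise ValueError(f"{start}..{end} is not a whole number of codons")
    return length // 3


def protein_layout(cds: tuple[int, int], sig: tuple[int, int]) -> dict:
    """Derive protein numbering from CDS and signal-peptide coordinates.

    Assumes the CDS range excludes the stop codon, the signal peptide starts
    at the CDS start, signal residues are numbered -n..-1 and the mature
    chain starts at +1.
    """
    if sig[0] != cds[0]:
        raise ValueError("signal peptide is expected to start at the CDS start")
    total = codon_count(*cds)
    signal = codon_count(*sig)
    return {"length": total, "signal": (-signal, -1), "mature": (1, total - signal)}


# SEQ ID NO: 3 (CDS 131..490, sig_peptide 131..301) -> SEQ ID NO: 4.
print(protein_layout((131, 490), (131, 301)))
# {'length': 120, 'signal': (-57, -1), 'mature': (1, 63)}
```

Applied to Table I, the same arithmetic gives a 34-residue signal peptide for Id 64 (238/339). This agrees with the -34 start listed in Table II for Id 114, the protein encoded by the cDNA of SEQ ID NO: 64.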

TABLE V
Id | Locations | PROSITE signature name
87 | 162-180 | Alpha-lactalbumin / lysozyme C
91 | 19-28 | Mitochondrial energy transfer proteins
91 | 143-152 | Mitochondrial energy transfer proteins
91 | 389-398 | Mitochondrial energy transfer proteins
FREE TEXT OF SEQUENCE LISTING
Von Heijne matrix
Score
oligonucleotide used as a primer
matinspector prediction
name
complement
<110> Genset <120> Complementary DNA's Encoding Proteins with Signal Peptides <130> 10488-37 LAB
<150> 60/113,686 <151> 1998-12-22 <150> 60/141,032 <151> 1999-06-25 <160> 176 <170> PatentIn Ver. 2.0 <210> 1 <211> 1447 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 501..1253 <220>
<221> sig_peptide <222> 501..1229 <223> Von Heijne matrix score 4.1 seq LPSLAHLLPALDC/LE
<220>
<221> polyA_signal <222> 1392..1397 <220>
<221> polyA_site <222> 1432..1447 <220>

<221> misc feature <222> 243,252;278,285,387,1429 <223> n=a, g, c or t <400> 1 gtgagtcagg tgggtcctgg gcccaggaaccggcccggag ccgtggacgc cctacagctg60 agaaggggac ccaaggggtc ggccgcggccaaggccccta ggaccgccgc cccagctcac120 gctgccgacg gcagctatag acattctgcgtcaggtccgg gctcctggac tttgcctttc180 ccgagccctg gaggtgggga gaaaaggttcaccaattttt aaaatccaaa tatatctcat240 ggntacagtg gnaagaactg gccagagagtctggaagntt tgggnttctg gtcctggctg300 tgccactgac tcactgtgac cttgggatcttgtgctgtga agacatttcc caagtgcttc360 atgttagcca gcaaatctga cccacanggcctggaaagag gtgattgtta ggttgcgcag420 aggtggtctt atccagctca gcttcccctgggacccaccg tgggacctga ggcagaactg480 gggtggactt ggcctcctcc atg c cgg ctg g ata cga ctg ctg 533 gca ca ca acg Met Ala Hi s Arg Leu n Ile Arg Leu Leu Gl Thr tgg gat gtg aag gac acg agg ctc cac ccc tta ggg 581 ctg ctc cgc gag Trp Asp Val Lys Asp Thr Arg Leu His Pro Leu Gly Leu Leu Arg Glu gcc tat gcc acc aag gcc cat ggg gag gtg gag ccc 629 cgg gcc ctg tca Ala Tyr Ala Thr Lys Ala His Gly Glu Val Glu Pro Arg Ala Leu Ser gcc ctg gaa caa ggc ttc gca tac get cag agc cac 677 agg cag agg agc Ala Leu Glu Gln Gly Phe Ala Tyr Ala Gln Ser His Arg Gln Arg Ser ttc ccc aac tac ggc ctg ggc cta tcc cgc cag tgg 725 agc cac acc tgg Phe Pro Asn Tyr Gly Leu Gly Leu Ser Arg Gln Trp Ser His Thr Trp ctg gat gtg gtc ctg cag cac ctg ggt gtc cag gat 773 acc ttc gcg get Leu Asp Val Val Leu Gln His Leu Gly Val Gln Asp Thr Phe Ala Ala cag get gta gcc ccc atc cag ctt aaa gac ttc agc 821 get gaa tat cac Gln Ala Val Ala Pro Ile Gln Leu Lys Asp Phe Ser Ala Glu Tyr His ccc tgc acc tgg cag gtg ggg get gac acc ctg agg 869 ttg gat gag gag Pro Cys Thr Trp Gln Val Gly Ala Asp Thr Leu Arg Leu Asp Glu Glu tgc cgc aca cgg ggt ctg gca gtg tcc aac ttt gac 917 aga ctg atc cga Cys Arg Thr Arg Gly Leu Ala Val Ser Asn Phe Asp Arg Leu Ile Arg cgg cta gag ggc atc ctg ctt ggc cgt gaa cac ttc 965 gag ggc ctg gac Arg Leu Glu Gly Ile Leu Leu Gly Arg Glu His Phe Glu Gly Leu Asp tttgtgctgacc tccgag getgetggc tggccc aagccggacccc cgc 1013 PheValLeuThr 
SerGlu AlaAlaGly TrpPro LysProAspPro Arg attttccaggag gccttg cggcttget catatg gaaccagtagtg gca 1061 IlePheGlnGlu AlaLeu ArgLeuAla HisMet GluProValVal Ala gcccatgttggg gataat tacctctgc gattac caggggcctcgg get 1109 AlaHisValGly AspAsn TyrLeuCys AspTyr GlnGlyProArg Ala gtgggcatgcac agcttc ctggtggtt ggccca caggcactggac ccc 1157 ValGlyMetHis SerPhe LeuValVal GlyPro GlnAlaLeuAsp Pro gtggtcagggat tctgta cctaaagaa cacatc ctcccctctctg gcc 1205 ValValArgAsp SerVal ProLysGlu HisIle LeuProSerLeu Ala catctcctgcct gccctt gactgccta gagggc tcaactccaggg ctt 1253 HisLeuLeuPro AlaLeu AspCysLeu GluGly SerThrProGly Leu tgaggccagt gagggaagtg gaaaacctta aacaaaccct gctgggccct aggccatgga ggagacaggg agccccttct ctggacct ttccccctct ccctgcggcc 1373 ttctccacag ct tttgtcacct actgtgataa tgctgagc tctcaccctt cccccnccaa 1433 taaagcagtg ag aaaaaaaaaa aaaa 1447 <210> 2 <211> 251 <212> PRT
<213> Homo sapiens <220>
<221> SIGNAL
<222> -243..-1 <400> 2 Met Ala His Arg Leu Gln Ile Arg Leu Leu Thr Trp Asp Val Lys Asp Thr Leu Leu Arg Leu Arg His Pro Leu Gly Glu Ala Tyr Ala Thr Lys Ala Arg Ala His Gly Leu Glu Val Glu Pro Ser Ala Leu Glu Gln Gly Phe Arg Gln Ala Tyr Arg Ala Gln Ser His Ser Phe Pro Asn Tyr Gly Leu Ser His Gly Leu Thr Ser Arg Gln Trp Trp Leu Asp Val Val Leu Gln Thr Phe His Leu Ala Gly Val Gln Asp Ala Gln Ala Val Ala Pro Ile Ala Glu Gln Leu Tyr Lys Asp Phe Ser His Pro Cys Thr Trp Gln Val Leu Asp Gly Ala Glu Asp Thr Leu Arg Glu Cys Arg Thr Arg Gly Leu Arg Leu Ala Val Ile Ser Asn Phe Asp Arg Arg Leu Glu Gly Ile Leu Glu Gly Leu Gly Leu Arg Glu His Phe Asp Phe Val Leu Thr Ser Glu Ala Ala Gly Trp Pro Lys Pro Asp Pro Arg Ile Phe Gln Glu Ala Leu Arg Leu Ala His Met Glu Pro Val Val Ala Ala His Val Gly Asp Asn Tyr Leu Cys Asp Tyr Gln Gly Pro Arg Ala Val Gly Met His Ser Phe Leu Val Val Gly Pro Gln Ala Leu Asp Pro Val Val Arg Asp Ser Val Pro Lys Glu His Ile Leu Pro Ser Leu Ala His Leu Leu Pro Ala Leu Asp Cys Leu Glu Gly Ser Thr Pro Gly Leu <210> 3 <211> 1448 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 131..490 <220>
<221> sig_peptide <222> 131..301 <223> Von Heijne matrix score 5.31 seq AIALATVLFLIGA/FL
<220>

<221> polyA_signal <222> 1411..1416 <220>
<221> polyA-site <222> 1434..1448 <400> 3 ctgatcccgc ccatgccatg 60 ctggggccgg caaccttggg ctgagtggca cttaagcggg cgctgccaac tcgcgcggcg 120 cgtgggcgag ctccgctgtg ctctgggtgt gcgggcggcc tcagcgtgtt 169 atg atg ccg tcc cgt acc aac ctg get act gga atc ccc Met Met Ile Pro Pro Ser Arg Thr Asn Leu Ala Thr Gly agt agt gtg aaa tat tca agg ctc tcc aca gac ggc tac 217 aaa agc gat Ser Ser Val Lys Tyr Ser Arg Leu Ser Thr Asp Gly Tyr Lys Ser Asp att gac cag ttt aag aaa acc cct cct atc cct aag gcc 265 ctt aag tat Ile Asp Gln Phe Lys Lys Thr Pro Pro Ile Pro Lys Ala Leu Lys Tyr atc gca gcc act gtg ctg ttt ttg att gcc ttt att att 313 ctt ggc ctc Ile Ala Ala Thr Val Leu Phe Leu Ile Ala Phe Ile Ile Leu Gly Leu ata ggc ctc ctg ctg tca ggc tac atc aaa ggg gca gac 361 tcc agc ggg Ile Gly Leu Leu Leu Ser Gly Tyr Ile Lys Gly Ala Asp Ser Ser Gly cgg gcc cca gtg ctg atc att ggc att gtg ttc ccc gga 409 gtt ctg cta Arg Ala Pro Val Leu Ile Ile Gly Ile Val Phe Pro Gly Val Leu Leu ttt tac ctg cgc atc get tac tat gca aaa ggc cgt ggt 457 cac tcc tac Phe Tyr Leu Arg Ile Ala Tyr Tyr Ala Lys Gly Arg Gly His Ser Tyr tac tcc gat gac att cca gac ttt gat tagcacccaccccatagctg510 tat gac Tyr Ser Asp Asp Ile Pro Asp Phe Asp Tyr Asp aggaggagtcacagtggaac tgtcccagct ttaagatatctagcagaaactatagctgag570 gactaaggaattctgcagct tgcagatgtt taagaaaataatggccagattttttgggtc630 cttcccaaagatgttaagtg aacctacagt tagctaattaggacaagctctatttttcat690 ccctgggccctgacaagttt ttccacagga atatgtatcatggaagaatagaggttattc750 tgtaatggaaaagtgttgcc tgccaccacc ctctgtagagctgagcatttcttttaaata810 gtcttcattgccaatttgtt cttgtagcaa atggaacaatgtggtatggctaatttctta870 ~

ttattaagtaatttatttta aaaatatctg agtatattatcctgtacacttatccctacc930 ttcatgttccagtggaagac cttagtaaaa tcaaagatcagtgagttcatctgtaatatt990 ttttttacttgctttcttactgacagcaaccaggaatttttttatcctgcagagcaagtt1050 ttcaaaatgtaaatacttcctctgtttaacagtccttggaccattctgatccagttcacc1110 agtaggttggacagcatataatttgcatcattttgtcccttgtaaatcaagatgttctgc1170 agattattcctttaacggccggacttttggctgtttcctaatgaaacatgtagtggttat1230 tatttagagtttatagccgtattgctagcaccttgtagtatgtcatcattctgctcatga1290 ttccaaggatcagcctggatgcctagaggactagatcaccttagtttgattctatttttt1350 agcttgcaaaaagtgacttatattccaaagaaattaaaatgttgaaatccaaatcctaga1410 aataaaatgagttaacttcaaacaaaaaaaaaaaaaaa 1448 <210> 4 <211> 120 <212> PRT
<213> Homo sapiens <220>
<221> SIGNAL
<222> -57..-1 <400> 4 Met Met Pro Ser Arg Thr Asn Leu Ala Thr Gly Ile Pro Ser Ser Lys Val Lys Tyr Ser Arg Leu Ser Ser Thr Asp Asp Gly Tyr Ile Asp Leu Gln Phe Lys Lys Thr Pro Pro Lys Ile Pro Tyr Lys Ala Ile Ala Leu Ala Thr Val Leu Phe Leu Ile Gly Ala Phe Leu Ile Ile Ile Gly Ser Leu Leu Leu Ser Gly Tyr Ile Ser Lys Gly Gly Ala Asp Arg Ala Val Pro Val Leu Ile Ile Gly Ile Leu Val Phe Leu Pro Gly Phe Tyr His Leu Arg Ile Ala Tyr Tyr Ala Ser Lys Gly Tyr Arg Gly Tyr Ser Tyr Asp Asp Ile Pro Asp Phe Asp Asp <210> 5 <211> 1515 <212> DNA
<213> Homo sapiens <220>

<221> CDS
<222> 165..842 <220>
<221> sig_peptide <222> 165..251 <223> Von Heijne matrix score 7.01 seq LASFAALVLVCRQ/RY
<220>
<221> polyA_signal <222> 1474..1479 <220>
<221> polyA_site <222> 1500..1515 <400>

agtcgcggga tgaggccctc aggtctctgc ggtgtcgtg 60 tgcgcccggg a agccacagcc gaggaaccta caatttgcca cttccagcag tttagccca gcacctgcca c tcctcttccc tgaggaggat agccctctgg aagcatg actgtg 176 gtgaccggga gag ctgagtcagg MetGlu ThrVal gtgatt gttgcc ataggtgtg ctg gccaccatc tttctgget tcgttt 224 ValIle ValAla IleGlyVal Leu AlaThrIle PheLeuAla SerPhe gcagcc ttggtg ctggtttgc agg cagcgctac tgccggccg cgagac 272 AlaAla LeuVal LeuValCys Arg GlnArgTyr CysArgPro ArgAsp ctgctg cagcgc tatgattct aag cccattgtg gacctcatt ggtgcc 320 LeuLeu GlnArg TyrAspSer Lys ProIleVal AspLeuIle GlyAla atggag acccag tctgagccc tct gagttagaa ctggacgat gtcgtt 368 MetGlu ThrGln SerGluPro Ser GluLeuGlu LeuAspAsp ValVal atcacc aacccc cacattgag gcc attctggag aatgaagac tggatc 416 IleThr AsnPro HisIleGlu Ala IleLeuGlu AsnGluAsp TrpIle gaagat gcctcg ggtctcatg tcc cactgcatt gccatcttg aagatt 464 GluAsp AlaSer GlyLeuMet Ser HisCysIle AlaIleLeu LysIle tgtcac actctg acagagaag ctt gttgccatg acaatgggc tctggg 512 Cys His Thr Leu Thr Glu Lys Leu Val Ala Met Thr Met Gly Ser Gly gcc aag atg aag act tca gcc agt gtc agc gac atc att gtg gtg gcc 560 Ala Lys Met Lys Thr Ser Ala Ser Val Ser Asp Ile Ile Val Val Ala aag cgg agc ccc agg gtg gat gtt aag tcg tac cct 608 atc gat gtg atg Lys Arg Ser Pro Arg Val Asp Val Lys Ser Tyr Pro Ile Asp Val Met ccg ttg ccc aaa ctc ctg gca cgg act gcc ctc ctg 656 gac gac acg ctg Pro Leu Pro Lys Leu Leu Ala Arg Thr Ala Leu Leu Asp Asp Thr Leu tct gtc cac ctg gtg ctg aca agg gcc tgc ctg acg 704 agt gtg aat cat Ser Val His Leu Val Leu Thr Arg Ala Cys Leu Thr Ser Val Asn His gga ggc gac tgg att gac tct ctg get get gag cat 752 ctg cag tcg gag Gly Gly Asp Trp Ile Asp Ser Leu Ala Ala Glu His Leu Gln Ser Glu ttg gaa ctt cga gaa gca cta get gag cca aaa ggc 800 gtc gcc tct gat Leu Glu Leu Arg Glu Ala Leu Ala Glu Pro Lys Gly Val Ala Ser Asp ctc cca cct gaa ggc ttc cag gag tct gca 842 ggc ctg cag att Leu Pro Pro Glu Gly Phe Gln Glu Ser Ala Gly Leu Gln Ile tagtgcctacaggccagcag ctagccatgaaggcccctgccgccatccctggatggctca902 
gcttagccttctactttttc ctatagagttagttgttctccacggctggagagttcagct962 gtgtgtgcatagtaaagcag gagatccccgtcagtttatgcctcttttgcagttgcaaac1022 tgtggctggtgagtggcagt ctaatactacagttaggggagatgccattcactctctgca1082 agaggagtattgaaaactgg tggactgtcagctttatttagctcacctagtgttttcaag1142 aaaattgagccaccgtctaa gaaatcaagaggtttcacattaaaattagaatttctggcc1202 tctctcgatcggtcagaatg tgtggcaattctgatctgcattttcagaagaggacaatca1262 attgaaactaagtaggggtt tcttcttttggcaagacttgtactctctcacctggcctgt1322 ttcatttatttgtattatct gcctggtccctgaggcgtctgggtctctcctctcccttgc1382 aggtttgggtttgaagctga ggaactacaaagttgatgatttcttttttatctttatgcc1442 tgcaattttacctagctacc actaggtggatagtaaatttatacttatgtttcccccaaa1502 aaaaaaaaaaaaa 1515 <210>6 <211>226 <212>PRT

<213> Homo sapiens <220>

<221> SIGNAL
<222> -29..-1 <400> 6 Met Glu Thr Val Val Ile Val Ala Ile Gly Val Leu Ala Thr Ile Phe Leu Ala Ser Phe Ala Ala Leu Val Leu Val Cys Arg Gln Arg Tyr Cys Arg Pro Arg Asp Leu Leu Gln Arg Tyr Asp Ser Lys Pro Ile Val Asp Leu Ile Gly Ala Met Glu Thr Gln Ser Glu Pro Ser Glu Leu Glu Leu Asp Asp Val Val Ile Thr Asn Pro His Ile Glu Ala Ile Leu Glu Asn Glu Asp Trp Ile Glu Asp Ala Ser Gly Leu Met Ser His Cys Ile Ala Ile Leu Lys Ile Cys His Thr Leu Thr Glu Lys Leu Val Ala Met Thr Met Gly Ser Gly Ala Lys Met Lys Thr Ser Ala Ser Val Ser Asp Ile Ile Val Val Ala Lys Arg Ile Ser Pro Arg Val Asp Asp Val Val Lys Ser Met Tyr Pro Pro Leu Asp Pro Lys Leu Leu Asp Ala Arg Thr Thr Ala Leu Leu Leu Ser Val Ser His Leu Val Leu Val Thr Arg Asn Ala Cys His Leu Thr Gly Gly Leu Asp Trp Ile Asp Gln Ser Leu Ser Ala Ala Glu Glu His Leu Glu Val Leu Arg Glu Ala Ala Leu Ala Ser Glu Pro Asp Lys Gly Leu Pro Gly Pro Glu Gly Phe Leu Gln Glu Gln Ser Ala Ile <210>7 <211>1918 <212>DNA

<213> Homo sapiens <220>
<221> CDS
<222> 238..612 <220>
<221> sig_peptide <222> 238..348 <223> Von Heijne matrix score 9.4 seq LLCCVLSASQLSS/QD
<220>
<221> polyA_signal <222> 1885..1890 <220>
<221> polyA_site <222> 1905..1918 <220>
<221> misc_feature <222> 945,1624 <223> n=a, g, c or t <400>

aaaaatctaa ggaagttgtgt aaa tgtgcacgcg 60 gcgacttcga ctacaccaca tgccaa cccagggtgg gtcattaaacaatc aattgtttgt aaaccacagt ttaacatctg tgcaga tgataggcag gatacctacg aaaatcaaaa ctttccttct taaatgcaag tttcaacagt ctgaggtttt caaccccaga aggccgacac gtgctcactg aaaaaaa aaagggctgt atggtatgt gaagatgca ccgtct tttcaaatg gcctgggag agtcaa 285 MetValCys GluAspAla ProSer PheGlnMet AlaTrpGlu SerGln atggcctgg gagaggggg cctgcc cttctctgc tgtgtcctt tcgget 333 MetAlaTrp GluArgGly ProAla LeuLeuCys CysValLeu SerAla tcctagttg agctcccaa gactag gacccactg gggcatata aaatct 381 SerGlnLeu SerSerGln AspGln AspProLeu GlyHisIle LysSer ctgctgtat cctttcggc ttccca gttgagctc ccaagacca ggaccc 429 LeuLeuTyr ProPheGly PhePro ValGluLeu ProArgPro GlyPro actggggca tataaaaaa gtcaaa aatcaaaat caaacaaca agttct 477 ThrGlyAla TyrLysLys ValLys AsnGlnAsn GlnThrThr SerSer gagttactt aggaaatag acttcg catttcaat tagagaggc cataga 525 Glu Leu Leu Arg Lys Gln Thr Ser His Phe Asn Gln Arg Gly His Arg gca agg tct aaa ctt ctg get tct aga caa att cct gat aga aca ttt 573 Ala Arg Ser Lys Leu Leu Ala Ser Arg Gln Ile Pro Asp Arg Thr Phe aaa tgt ggg aag tgg ctt ccc cag gtc cca tcc cct gtt tagggataga 622 Lys Cys Gly Lys Trp Leu Pro Gln Val Pro Ser Pro Val gttgatatcatttttatagttgccatgtatgcctctgcctgaatttttttaattgacttt682 tgagcttttgagattgcacgagggagaacaaggcctttgctgttgtggataggaaagact742 taacctaaaattaaaccagcaagaaagcattagtaaaaatctaacaatatgaagggctct802 tatgagtcatttttttcaaaagatgaaaactccagaaacgcacaggaacgaaatacctcc862 cagaaacatgaagcaatcatcgaagactcactggtaatatttttaaaaagtatacagatc922 aaagcaaaaagaagccatgtgtnaacaaagagaaatgtgcaaatattttttaaggcagta982 ttaagtgcaagaggagtaacatgaaataaacattctttcacatggctactgggaatataa1042 atttcgctccagaaaggccgtagcagtttgacgataggtggcaaaaccttaagattgtgt1102 actggggcccagaatttttatttctaggaatgtatcctgaggaaattatccgagatcccc1162 acaaactgcaatgtttaggaattgtccttatagcattgcatacacaagaaaaacagagaa1222 aagcctgatccctgtcagtggaaaaggggttcaatgaattacggtgtgtctgcatgaggc1282 ttttatgacattaaaaattgttgaacaacggccaggcacagtggctcatgcctgtaatcc1342 
taacactttgggaggccaaggtgggaagattgcctgagctcaggagtttgagaccagcct1402 gggcaacacggtgaaaccccgtctctactaaaatacaaaaaattagccgggcgtcgcagc1462 atgcgcctgtagtcccagctgctcaggaggctgaggcaggagaattgattgaacccggga1522 ggcagaggttgcactgagctgagattaagccaccgcactccagcctgggcgacagagcaa1582 gattccgttcccaagaaaaaaaaattgttcaacaataagggncaaagggagagaatcata1642 acatctgattaaacagaaaaagcaagatttttaaaactaactatataaggatggtcccag1702 ctgtgtcaaaaggaagcttgtttgtaatacgtgtgcataaaaattaaatagaggtgaaca1762 caattattttaaggcagttaaattatctctgtattgtgaactaagactttctagaatttt1822 acttattcattctgtacttaaattttttctaatgaacacatatacttttgtaatcagaaa1882 atattaaatgcatgtatttttcaaaaaaaaaaaaaa 1918 <210> 8 <211> 125 <212> PRT
<213> Homo sapiens <220>
<221> SIGNAL
<222> -37..-1 <400> 8 Met Val Cys Glu Asp Ala Pro Ser Phe Gln Met Ala Trp Glu Ser Gln Met Ala Trp Glu Arg Gly Pro Ala Leu Leu Cys Cys Val Leu Ser Ala Ser Gln Leu Ser Ser Gln Asp Gln Asp Pro Leu Gly His Ile Lys Ser Leu Leu Tyr Pro Phe Gly Phe Pro Val Glu Leu Pro Arg Pro Gly Pro Thr Gly Ala Tyr Lys Lys Val Lys Asn Gln Asn Gln Thr Thr Ser Ser Glu Leu Leu Arg Lys Gln Thr Ser His Phe Asn Gln Arg Gly His Arg Ala Arg Ser Lys Leu Leu Ala Ser Arg Gln Ile Pro Asp Arg Thr Phe Lys Cys Gly Lys Trp Leu Pro Gln Val Pro Ser Pro Val <210>9 <211>852 <212>DNA

<213> Homo sapiens <220>
<221> CDS
<222> 229..735 <220>
<221> sig_peptide <222> 229..492 <223> Von Heijne matrix score 6.7 seq VFALSSFLNKASA/VY
<220>
<221> polyA_signal <222> 816..821 <220>
<221> polyA site <222> 841..852 <400> 9 aatgactggc agtggcatca gcgatggcgg ctgcgtcggg gtcggttctg cagcgctgta 60 tcgtgtcgcc ctctgatctt cctgcatggc ggcagggagg tcaggtgatt catagcgcct ctggacaagg aggtttttaa atcaagattt acattccaa attaagaatg a tggatcaagc cacataaaaa cagctcct cccagatcat atactcct atgaaagga ttatttatcc aa MetLysGly ggaatctccaat gtatgg tttgacaga tttaaa ataacc aatgactgc 285 GlyIleSerAsn ValTrp PheAspArg PheLys IleThr AsnAspCys ccagaacacctt gaatca attgatgtc atgtgt caagtg cttactgat 333 ProGluHisLeu GluSer IleAspVal MetCys GlnVal LeuThrAsp ttgattgatgaa gaagta aaaagtggc atcaag aagaac aggatatta 381 LeuIleAspGlu GluVal LysSerGly IleLys LysAsn ArgIleLeu ataggaggattc tctatg ggaggatgc atggca atgcat ttagcatat 429 IleGlyGlyPhe SerMet GlyGlyCys MetAla MetHis LeuAlaTyr agaaatcatcaa gatgtg gcaggagta tttget ctttct agttttctg 477 ArgAsnHisGln AspVal AlaGlyVal PheAla LeuSer SerPheLeu aataaagcatct getgtt taccagget cttcag aagagt aatggtgta 525 AsnLysAlaSer AlaVal TyrGlnAla LeuGln LysSer AsnGlyVal cttcctgaatta tttcag tgtcatggt actgca gatgag ttagttctt 573 LeuProGluLeu PheGln CysHisGly ThrAla AspGlu LeuValLeu cattcttgggca gaagag acaaactca atgtta aaatct ctaggagtg 621 HisSerTrpAla GluGlu ThrAsnSer MetLeu LysSer LeuGlyVal accacgaagttt catagt tttccaaat gtttac catgag ctaagcaaa 669 ThrThrLysPhe HisSer PheProAsn ValTyr HisGlu LeuSerLys actgagttagac atattg aagttatgg attctt acaaag ctgccagga 717 ThrGluLeuAsp IleLeu LysLeuTrp IleLeu ThrLys LeuProGly gaaatggaaaaa caaaaa tgaatgaatc aagagtgatt 765 tgttaatgta GluMetGluLys GlnLys BO

agtgtaatgt ctttgtgaaa ttttt tgccaaat tataatgata agtga ac attaaaatat taagaaatag caaaaaaaaa aa 852 aaaaa <210> 10 <211> 169 <212> PRT
<213> Homo sapiens <220>
<221> SIGNAL
<222> -88..-1 <400> 10 Met Lys Gly Gly Ile Ser Asn Val Trp Phe Asp Arg Phe Lys Ile Thr Asn Asp Cys Pro Glu His Leu Glu Ser Ile Asp Val Met Cys Gln Val Leu Thr Asp Leu Ile Asp Glu Glu Val Lys Ser Gly Ile Lys Lys Asn Arg Ile Leu Ile Gly Gly Phe Ser Met Gly Gly Cys Met Ala Met His Leu Ala Tyr Arg Asn His Gln Asp Val Ala Gly Val Phe Ala Leu Ser Ser Phe Leu Asn Lys Ala Ser Ala Val Tyr Gln Ala Leu Gln Lys Ser Asn Gly Val Leu Pro Glu Leu Phe Gln Cys His Gly Thr Ala Asp Glu Leu Val Leu His Ser Trp Ala Glu Glu Thr Asn Ser Met Leu Lys Ser Leu Gly Val Thr Thr Lys Phe His Ser Phe Pro Asn Val Tyr His Glu Leu Ser Lys Thr Glu Leu Asp Ile Leu Lys Leu Trp Ile Leu Thr Lys Leu Pro Gly Glu Met Glu Lys Gln Lys <210> 11 <211> 1602 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 24..1004 <220>
<221> sig_peptide <222> 24..170 <223> Von Heijne matrix score 5.6 seq ACLSLGFFSLLWL/QL
<220>
<221> polyA_site <222> 1586..1602 <400>

atgcgccg cc 53 gcctctccgc acg atg ttc ccc tcg cgg agg aaa gcg gcg cag Met Phe Pro Ser Arg Arg Lys Ala Ala Gln ctgccctgg gaggacggc aggtccggg ttgctc tccggc ggcctccct 101 LeuProTrp GluAspGly ArgSerGly LeuLeu SerGly GlyLeuPro cggaagtgt tccgtcttc cacctgttc gtggcc tgcctc tcgctgggc 149 ArgLysCys SerValPhe HisLeuPhe ValAla CysLeu SerLeuGly ttcttctcc ctactctgg ctgcagctc agctgc tctggg gacgtggcc 197 PhePheSer LeuLeuTrp LeuGlnLeu SerCys SerGly AspValAla cgggcagtc aggggacaa gggcaggag acctcg ggccct ccccgtgcc 245 ArgAlaVal ArgGlyGln GlyGlnGlu ThrSer GlyPro ProArgAla tgcccccca gagccgccc cctgagcac tgggaa gaagac gcatcctgg 293 CysProPro GluProPro ProGluHis TrpGlu GluAsp AlaSerTrp ggcccccac cgcctggca gtgctggtg cccttc cgcgaa cgcttcgag 341 GlyProHis ArgLeuAla VaILeuVal ProPhe ArgGlu ArgPheGlu gagctcctg gtcttcgtg ccccacatg cgccgc ttcctg agcaggaag 389 GluLeuLeu ValPheVal ProHisMet ArgArg PheLeu SerArgLys aagatccgg caccacatc tacgtgctc aaccag gtggac cacttcagg 437 LysIleArg HisHisIle TyrValLeu AsnGln ValAsp HisPheArg ttcaaccgg gcagcgctc atcaacgtg ggcttc ctggag agcagcaac 485 PheAsnArg AlaAlaLeu IleAsnVal GlyPhe LeuGlu SerSerAsn agcacggac tacattgcc atgcacgac gttgac ctgctc cctctcaac 533 SerThrAsp TyrIleAla MetHisAsp ValAsp LeuLeu ProLeuAsn gag gag gac tat ggc ttt cct getggg ccc ttc gtg gcc 581 ctg gag cac Glu Glu Asp Tyr Gly Phe Pro AlaGly Pro Phe Val Ala Leu Glu His tcc ccg ctc cac cct ctc tac tacaag acc tat ggc ggc 629 gag cac gtc Ser Pro Leu His Pro Leu Tyr TyrLys Thr Tyr Gly Gly Glu His Val atc ctg ctc tcc aag cag cac cggctg tgc aat atg tcc 677 ctg tac ggg Ile Leu Leu Ser Lys Gln His ArgLeu Cys Asn Met Ser Leu Tyr Gly aac cgc tgg ggc tgg ggc cgc gacgac gag ttc cgg cgc 725 ttc gag tac Asn Arg Trp Gly Trp Gly Arg AspAsp Glu Phe Arg Arg Phe Glu Tyr att aag get ggg ctc cag ctt cgcccc tcg gga aca act 773 gga ttc atc Ile Lys Ala Gly Leu Gln Leu ArgPro Ser Gly Thr Thr Gly Phe Ile ggg tac aca ttt cgc cac ctg gaccca gcc tgg aag agg 821 aag cat cgg Gly Tyr Thr Phe Arg His Leu AspPro Ala Trp Lys Arg 
Lys His Arg gac cag cgc atc gca get caa caggag cag ttc gtg gac 869 aag aaa aag Asp Gln Arg Ile Ala Ala Gln GlnGlu Gln Phe Val Asp Lys Lys Lys agg gag ggc ctg aac act gtg taccat gtg get cgc act 917 gga aag tcc Arg Glu Gly Leu Asn Thr Val TyrHis Val Ala Arg Thr Gly Lys Ser gcc ctg gtg ggc ggg gcc ccc actgtc ctc aac atg ttg 965 tct tgc atc Ala Leu Val Gly Gly Ala Pro ThrVal Leu Asn Met Leu Ser Cys Ile gac tgt aag acc gcc aca ccc tgcaca ttc agc gctggat 1014 gac tgg tga Asp Cys Lys Thr Ala Thr Pro CysThr Phe Ser Asp Trp ggacagtgaggaagcctgta cctacaggccattgctcaggctcaggacaaggcctcag1074 at gtcgtgggcccagctctgac aggatgtggaggccaggaccaagacagcaagctacgca1134 gt attgcagccacccggccgcc aaggcaggctggctgggccaggacacgtggggtgcctg1194 tg ggacgctgcttgccatgcac agtgatcagagaggctggggtgtgtcctgtccgggacc1254 ga ccccctgccttcctgctcac cctactctgatccttcacgtgcccaggcctgtgggtag1314 cc tggggagggctgaacaggac aacctctcatcccccact tttgttccttcctgctgggc1374 ca tgcctcgtgcagagacacag tgtaggggccgcagctgg cgtaggtggcagttgggcct1434 at ggtgagggttaggacttcag aaaccagagcaagcccca cagagggggaacagccagca1494 ac ccgctctagctggttgttgc catgccggaa gtgttgccagatcttctgat1554 tgtgggccta ttttcgaaagaaactagaat gctggattct aaaaaaaa 1602 caaaaaaaaa <210> 12 <211> 327 <212> PRT
<213> Homo sapiens <220>
<221> SIGNAL
<222> -49..-1 <400> 12 Met Phe Pro Ser Arg Arg Lys Ala Ala Gln Leu Pro Trp Glu Asp Gly Arg Ser Gly Leu Leu Ser Gly Gly Leu Pro Arg Lys Cys Ser Val Phe His Leu Phe Val Ala Cys Leu Ser Leu Gly Phe Phe Ser Leu Leu Trp Leu Gln Leu Ser Cys Ser Gly Asp Val Ala Arg Ala Val Arg Gly Gln Gly Gln Glu Thr Ser Gly Pro Pro Arg Ala Cys Pro Pro Glu Pro Pro Pro Glu His Trp Glu Glu Asp Ala Ser Trp Gly Pro His Arg Leu Ala Val Leu Val Pro Phe Arg Glu Arg Phe Glu Glu Leu Leu Val Phe Val Pro His Met Arg Arg Phe Leu Ser Arg Lys Lys Ile Arg His His Ile Tyr Val Leu Asn Gln Val Asp His Phe Arg Phe Asn Arg Ala Ala Leu Ile Asn Val Gly Phe Leu Glu Ser Ser Asn Ser Thr Asp Tyr Ile Ala Met His Asp Val Asp Leu Leu Pro Leu Asn Glu Glu Leu Asp Tyr Gly Phe Pro Glu Ala Gly Pro Phe His Val Ala Ser Pro Glu Leu His Pro Leu Tyr His Tyr Lys Thr Tyr Val Gly Gly Ile Leu Leu Leu Ser Lys Gln His Tyr Arg Leu Cys Asn Gly Met Ser Asn Arg Phe Trp Gly Trp Gly Arg Glu Asp Asp Glu Phe Tyr Arg Arg Ile Lys Gly Ala Gly Leu Gln Leu Phe Arg Pro Ser Gly Ile Thr Thr Gly Tyr Lys Thr Phe Arg WO 00/37491 PCT/iB99/02058 His Leu His Asp Pro Ala Trp Arg Lys Arg Asp Gln Lys Arg Ile Ala Ala Gln Lys Gln Glu Gln Phe Lys Val Asp Arg Glu Gly Gly Leu Asn Thr Val Lys Tyr His Val Ala Ser Arg Thr Ala Leu Ser Val Gly Gly Ala Pro Cys Thr Val Leu Asn Ile Met Leu Asp Cys Asp Lys Thr Ala Thr Pro Trp Cys Thr Phe Ser <210>13 <211>948 <212>DNA

<213> Homo sapiens <220>
<221> CDS
<222> 80..784 <220>
<221> sig_peptide <222> 80..139 <223> Von Heijne matrix score 4 seq LLKVVFVVFASLC/AW
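The CDS and sig_peptide coordinates of SEQ ID NO: 13 can be checked against the header of the protein they encode, SEQ ID NO: 14 (<211> 235, SIGNAL -20..-1). The sketch below is a minimal consistency check, assuming 1-based inclusive coordinates and a CDS range that stops before the stop codon, which holds here because the TGA follows position 784. `protein_layout` is a hypothetical helper written for illustration, not part of any sequence-listing tool.

```python
def protein_layout(cds_start: int, cds_end: int, sig_start: int, sig_end: int):
    """Hypothetical helper: (precursor length, SIGNAL range, mature length).

    Assumes 1-based inclusive nucleotide coordinates and a CDS range that
    excludes the stop codon (true for SEQ ID NO: 13, where TGA follows 784).
    """
    cds_len = cds_end - cds_start + 1
    sig_len = sig_end - sig_start + 1
    if sig_start != cds_start or cds_len % 3 or sig_len % 3:
        raise ValueError("sig_peptide must start the CDS and both must be codon-aligned")
    precursor = cds_len // 3
    signal = sig_len // 3
    # Signal residues are numbered -signal..-1; the mature chain starts at +1.
    return precursor, (-signal, -1), precursor - signal


print(protein_layout(80, 784, 80, 139))  # (235, (-20, -1), 215)
```

Applied to SEQ ID NO: 24 (CDS 153..1127, sig_peptide 153..230), the same arithmetic gives a 325-residue precursor with SIGNAL -26..-1; that protein entry lies outside this section, so the figure is derived from the coordinates rather than quoted.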
<220>
<221> polyA_signal <222> 910..915 <220>
<221> polyA site <222> 933..948 <400> 13 cttcctgacc caggggctcc gctggctgcg gtcgcctggg agctgccgcc agggccagga 60 ggggagcggc acctggaag atg cgc cca ttg get ggt ggc ctg ctc aag gtg 112 Met Arg Pro Leu Ala Gly Gly Leu Leu Lys Val gtg ttc gtg gtc ttc gcc tcc ttg tgt gcc tgg tat tcg ggg tac ctg 160 Val Phe Val Val Phe Ala Ser Leu Cys Ala Trp Tyr Ser Gly Tyr Leu ctcgcagag ctcattcca gatgca cccctgtcc agtgetgcc tatagc 208 LeuAlaGlu LeuIlePro AspAla ProLeuSer SerAlaAla TyrSer atccgcagc atcggggag aggcct gtcctcaaa getccagtc cccaaa 256 IleArgSer IleGlyGlu ArgPro ValLeuLys AlaProVal ProLys aggcaaaaa tgtgaccac tggact ccctgccca tctgacacc tatgcc 304 ArgGlnLys CysAspHis TrpThr ProCysPro SerAspThr TyrAla tacaggtta ctcagcgga ggtggc agaagcaag tacgccaaa atctgc 352 TyrArgLeu LeuSerGly GlyGly ArgSerLys TyrAlaLys IleCys tttgaggat aacctactt atggga gaacagctg ggaaatgtt gccaga 400 PheGluAsp AsnLeuLeu MetGly GluGlnLeu GlyAsnVal AlaArg ggaataaac attgccatt gtcaac tatgtaact gggaatgtg acagca 448 GlyIleAsn IleAlaIle ValAsn TyrValThr GlyAsnVal ThrAla acacgatgt tttgatatg tatgaa ggcgataac tctggaccg atgaca 496 ThrArgCys PheAspMet TyrGlu GlyAspAsn SerGlyPro MetThr aagtttatt cagagtget getcca aaatccctg ctcttcatg.gtgacc 544 LysPheIle GlnSerAla AlaPro LysSerLeu LeuPheMet ValThr tatgacgac ggaagcaca agactg aataacgat gccaagaat gccata 592 TyrAspAsp GlySerThr ArgLeu AsnAsnAsp AlaLysAsn AlaIle gaagcactt ggaagtaaa gaaatc aggaacatg aaattcagg tctagc 640 GluAlaLeu GlySerLys GluIle ArgAsnMet LysPheArg SerSer tgggtattt attgcagca aaaggc ttggaactc ccttccgaa attcag 688 TrpValPhe IleAlaAla LysGly LeuGluLeu ProSerGlu IleGln agagaaaag atcaaccac tctgat getaagaac aacagatat tctggc 736 ArgGluLys IleAsnHis SerAsp AlaLysAsn AsnArgTyr SerGly tggcctgca gagatccag atagaa ggctgcata cccaaagaa cgaagc 784 .

TrpProAla GluIleGln IleGlu GlyCysIle ProLysGlu ArgSer tgacact gca gggtcctgag aaatgtgtt gtataaacaaatgcagct ggaatcgctc 844 t ct aagaatc tta tttttctaaa ccaacagcc tatttgat gagtatt ttg ggtttgttgt 904 t ca aaaccaatga acatttgcta gttgtaccaa aaaaaaaaaa aaaa 948 <210> 14 <211> 235 <212> PRT
<213> Homo sapiens <220>
<221> SIGNAL
<222> -20..-1 <400> 14 Met Arg Pro Leu Ala Gly Gly Leu Leu Lys Val Val Phe Val Val Phe Ala Ser Leu Cys Ala Trp Tyr Ser Gly Tyr Leu Leu Ala Glu Leu Ile Pro Asp Ala Pro Leu Ser Ser Ala Ala Tyr Ser Ile Arg Ser Ile Gly Glu Arg Pro Val Leu Lys Ala Pro Val Pro Lys Arg Gln Lys Cys Asp His Trp Thr Pro Cys Pro Ser Asp Thr Tyr Ala Tyr Arg Leu Leu Ser Gly Gly Gly Arg Ser Lys Tyr Ala Lys Ile Cys Phe Glu Asp Asn Leu Leu Met Gly Glu Gln Leu Gly Asn Val Ala Arg Gly Ile Asn Ile Ala Ile Val Asn Tyr Val Thr Gly Asn Val Thr Ala Thr Arg Cys Phe Asp Met Tyr Glu Gly Asp Asn Ser Gly Pro Met Thr Lys Phe Ile Gln Ser Ala Ala Pro Lys Ser Leu Leu Phe Met Val Thr Tyr Asp Asp Gly Ser Thr Arg Leu Asn Asn Asp Ala Lys Asn Ala Ile Glu Ala Leu Gly Ser Lys Glu Ile Arg Asn Met Lys Phe Arg Ser Ser Trp Val Phe Ile Ala Ala Lys Gly Leu Glu Leu Pro Ser Glu Ile Gln Arg Glu Lys Ile Asn His Ser Asp Ala Lys Asn Asn Arg Tyr Ser Gly Trp Pro Ala Glu Ile Gln Ile Glu Gly Cys Ile Pro Lys Glu Arg Ser <210> 15 <211> 25 <212> DNA
<213> Artificial Sequence <220>
<223> oligonucleotide used as a primer <400> 15 gggaagatgg agatagtatt gcctg 25 <210> 16 <211> 26 <212> DNA
<213> Artificial Sequence <220>
<223> oligonucleotide used as a primer <400> 16 ctgccatgta catgatagag agattc 26 <210> 17 <211> 546 <212> DNA
<213> Homo sapiens <220>
<221> promoter <222> 1..517 <220>
<221> transcription start site <222> 518 <220>
<221> protein_bind <222> 17..25 <223> matinspector prediction name CMYB_01 score 0.983 sequence tgtcagttg <220>
<221> protein_bind <222> complement(18..27) <223> matinspector prediction name MYOD_Q6 score 0.961 sequence cccaactgac <220>
<221> protein_bind <222> complement(75..85) <223> matinspector prediction name S8_01 score 0.960 sequence aatagaattag <220>
<221> protein_bind <222> 94..104 <223> matinspector prediction name S8_01 score 0.966 sequence aactaaattag <220>
<221> protein_bind <222> complement(129..139) <223> matinspector prediction name DELTAEF1_01 score 0.960 sequence gcacacctcag <220>
<221> protein_bind <222> complement(155..165) <223> matinspector prediction name GATA_C score 0.964 sequence agataaatcca <220>
<221> protein_bind <222> 170..178 <223> matinspector prediction name CMYB_01 score 0.958 sequence cttcagttg <220>
<221> protein_bind <222> 176..189 <223> matinspector prediction name GATA1_02 score 0.959 sequence ttgtagataggaca <220>
<221> protein_bind <222> 180..190 <223> matinspector prediction name GATA_C score 0.953 sequence agataggacat <220>
<221> protein_bind <222> 284..299 <223> matinspector prediction name TAL1ALPHAE47_01 score 0.973 sequence cataacagatggtaag <220>
<221> protein_bind <222> 284..299 <223> matinspector prediction name TAL1BETAE47_01 score 0.983 sequence cataacagatggtaag <220>
<221> protein_bind <222> 284..299 <223> matinspector prediction name TAL1BETAITF2_01 score 0.978 sequence cataacagatggtaag <220>
<221> protein_bind <222> complement(287..296) <223> matinspector prediction name MYOD_Q6 score 0.954 sequence accatctgtt <220>
<221> protein_bind <222> complement(302..314) <223> matinspector prediction name GATA1_04 score 0.953 sequence tcaagataaagta <220>
<221> protein_bind <222> 393..405 <223> matinspector prediction name IK1_01 score 0.963 sequence agttgggaattcc <220>
<221> protein_bind <222> 393..404 <223> matinspector prediction name IK2_01 score 0.985 sequence agttgggaattc <220>
<221> protein_bind <222> 396..405 <223> matinspector prediction name CREL_01 score 0.962 sequence tgggaattcc <220>
<221> protein_bind <222> 423..436 <223> matinspector prediction name GATA1_02 score 0.950 sequence tcagtgatatggca <220>
<221> protein_bind <222> complement(478..489) <223> matinspector prediction name SRY_02 score 0.951 sequence taaaacaaaaca <220>
<221> protein_bind <222> 486..493 <223> matinspector prediction name E2F_02 score 0.957 sequence tttagcgc <220>
<221> protein_bind <222> complement(514..521) <223> matinspector prediction name MZF1_01 score 0.975 sequence tgagggga <400> 17 tgagtgcagt gttacatgtc agttgggtta agtttgttaa tgtcattcaa atcttctatg 60 tcttgatttg cctgctaatt ctattatttc tggaactaaa ttagtttgat ggttctatta 120 gttattgact gaggtgtgct aatctcccat tatgtggatt tatctatttc ttcagttgta 180 gataggacat tgatagatac ataagtacca ggacaaaagc agggagatct tttttccaaa 240 atcaggagaa aaaaatgaca tctggaaaac ctatagggaa aggcataaca gatggtaagg 300 atactttatcttgagtaggagagccttcctgtggcaacgtggagaagggaagaggtcgta 360 gaattgaggagtcagctcagttagaagcagggagttgggaattccgttcatgtgatttag 420 catcagtgatatggcaaatgtgggactaagggtagtgatcagagggttaaaattgtgtgt 480 tttgttttagcgctgctggggcatcgccttgggtcccctcaaacagattcccatgaatct 540 cttcat <210> 18 <211> 23 <212> DNA
<213> Artificial Sequence <220>
<223> oligonucleotide used as a primer <400> 18 gtaccaggga ctgtgaccat tgc 23 <210> 19 <211> 24 <212> DNA
<213> Artificial Sequence <220>
<223> oligonucleotide used as a primer <400> 19 ctgtgaccat tgctcccaag agag 24 <210> 20 <211> 861 <212> DNA
<213> Homo sapiens <220>
<221> promoter <222> 1..806 <220>
<221> transcription start site <222> 807 <220>

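Each protein_bind feature pairs a location with the sequence MatInspector matched, and a complement(a..b) location reports the reverse complement of the forward strand at a..b. The sketch below re-derives the first SEQ ID NO: 20 hits from positions 1..180 of that sequence. It assumes 1-based inclusive coordinates and supports only the two location forms used in this listing; `extract` and `revcomp` are hypothetical helper names.

```python
import re

# Forward-strand positions 1..180 of SEQ ID NO: 20, copied from the listing.
SEQ20_1_180 = (
    "tactatagggcacgcgtggtcgacggccgggctgttctggagcagagggcatgtcagtaa"
    "tgattggtccctggggaaggtctggctggctccagcacagtgaggcatttaggtatctct"
    "cggtgaccgttggattcctggaagcagtagctgttctgtttggatctggtagggacaggg"
)

_COMPLEMENT = str.maketrans("acgt", "tgca")
_PLAIN = re.compile(r"(\d+)\.\.(\d+)")
_COMP = re.compile(r"complement\((\d+)\.\.(\d+)\)")


def revcomp(seq: str) -> str:
    """Hypothetical helper: reverse complement of a lower-case a/c/g/t string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(seq: str, location: str) -> str:
    """Hypothetical helper: bases a location refers to (1-based, inclusive)."""
    if m := _COMP.fullmatch(location):
        start, end = map(int, m.groups())
        return revcomp(seq[start - 1:end])
    if m := _PLAIN.fullmatch(location):
        start, end = map(int, m.groups())
        return seq[start - 1:end]
    raise ValueError(f"unsupported location: {location!r}")


# (location, sequence reported in the MatInspector note)
HITS = [
    ("complement(60..70)", "ggaccaatcat"),   # NFY_Q6
    ("70..77", "cctgggga"),                  # MZF1_01; the sequence fixes the start at 70
    ("124..132", "tgaccgttg"),               # CMYB_01
    ("complement(126..134)", "tccaacggt"),   # VMYB_02
    ("135..143", "ttcctggaa"),               # STAT_01
    ("complement(135..143)", "ttccaggaa"),   # STAT_01
]

for loc, reported in HITS:
    print(loc, extract(SEQ20_1_180, loc) == reported)
```

The same check reproduces the SEQ ID NO: 17 entries; for example, complement(18..27) turns the forward bases gtcagttggg into cccaactgac. It is a cheap way to confirm OCR-damaged coordinates before relying on them.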
<221> protein_bind <222> complement(60..70) <223> matinspector prediction name NFY_Q6 score 0.956 sequence ggaccaatcat <220>
<221> protein_bind <222> 70..77 <223> matinspector prediction name MZF1_01 score 0.962 sequence cctgggga <220>
<221> protein_bind <222> 124..132 <223> matinspector prediction name CMYB_01 score 0.994 sequence tgaccgttg <220>
<221> protein_bind <222> complement(126..134) <223> matinspector prediction name VMYB_02 score 0.985 sequence tccaacggt <220>
<221> protein_bind <222> 135..143 <223> matinspector prediction name STAT_01 score 0.968 sequence ttcctggaa <220>
<221> protein_bind <222> complement(135..143) <223> matinspector prediction name STAT_01 score 0.951 sequence ttccaggaa <220>
<221> protein_bind <222> complement(252..259) <223> matinspector prediction name MZF1_01 score 0.956 sequence ttggggga <220>
<221> protein_bind <222> 357..368 <223> matinspector prediction name IK2_01 score 0.965 sequence gaatgggatttc <220>
<221> protein_bind <222> 384..391 <223> matinspector prediction name MZF1_01 score 0.986 sequence agagggga <220>
<221> protein_bind <222> complement(410..421) <223> matinspector prediction name SRY_02 score 0.955 sequence gaaaacaaaaca <220>
<221> protein_bind <222> 592..599 <223> matinspector prediction name MZF1_01 score 0.960 sequence gaagggga <220>
<221> protein_bind <222> 618..627 <223> matinspector prediction name MYOD_Q6 score 0.981 sequence agcatctgcc <220>
<221> protein_bind <222> 632..642 <223> matinspector prediction name DELTAEF1_01 score 0.958 sequence tcccaccttcc <220>
<221> protein_bind <222> complement(813..823) <223> matinspector prediction name S8_01 score 0.992 sequence gaggcaattat <220>
<221> protein_bind <222> complement(824..831) <223> matinspector prediction name MZF1_01 score 0.986 sequence agagggga <220>
<221> misc feature <222> 335,376 <223> n=a, g, c or t <400> 20 tactataggg cacgcgtggt cgacggccgg gctgttctgg agcagagggc atgtcagtaa 60 tgattggtccctggggaaggtctggctggctccagcacagtgaggcatttaggtatctct 120 cggtgaccgttggattcctggaagcagtagctgttctgtttggatctggtagggacaggg 180 ctcagagggctaggcacgagggaaggtcagaggagaaggsaggsarggcccagtgagarg 240 ggagcatgccttcccccaaccctggcttscycttggymamagggcgkttytgggmacttr 300 aaytcagggcccaascagaascacaggcccaktcntggctsmaagcacaatagcctgaat 360 gggatttcaggttagncagggtgagaggggaggctctctggcttagttttgttttgtttt 420 ccaaatcaaggtaacttgctcccttctgctacgggccttggtcttggcttgtcctcaccc 480 agtcggaactccctaccactttcaggagagtggttttaggcccgtggggctgttctgttc 540 caagcagtgtgagaacatggctggtagaggctctagctgtgtgcggggcctgaaggggag 600 tgggttctcgcccaaagagcatctgcccatttcccaccttcccttctcccaccagaagct 660 tgcctgagctgtttggacaaaaatccaaaccccacttggctactctggcctggcttcagc 720 ttggaacccaatacctaggcttacaggccatcctgagccaggggcctctggaaattctct 780 tcctgatggtcctttaggtttgggcacaaaatataattgcctctcccctctcccattttc 840 tctcttgggagcaatggtcac 861 <210> 21 <211> 20 <212> DNA
<213> Artificial Sequence <220>
<223> oligonucleotide used as a primer <400> 21 ctgggatgga aggcacggta 20 <210> 22 <211> 20 <212> DNA
<213> Artificial Sequence <220>
<223> oligonucleotide used as a primer <400> 22 gagaccacac agctagacaa 20 <210> 23 <211> 555 <212> DNA
<213> Homo sapiens <220>
<221> promoter <222> 1..500 <220>
<221> transcription start site <222> 501 <220>
<221> protein_bind <222> 191..206 <223> matinspector prediction name ARNT_01 score 0.964 sequence ggactcacgtgctgct <220>
<221> protein_bind <222> 193..204 <223> matinspector prediction name NMYC_01 score 0.965 sequence actcacgtgctg <220>
<221> protein_bind <222> 193..204 <223> matinspector prediction name USF_01 score 0.985 sequence actcacgtgctg <220>
<221> protein_bind <222> complement(193..204) <223> matinspector prediction name USF_01 score 0.985 sequence cagcacgtgagt <220>
<221> protein_bind <222> complement(193..204) <223> matinspector prediction name NMYC_01 score 0.956 sequence cagcacgtgagt <220>
<221> protein_bind <222> complement(193..204) <223> matinspector prediction name MYCMAX_02 score 0.972 sequence cagcacgtgagt <220>
<221> protein_bind <222> 195..202 <223> matinspector prediction name USF_C score 0.997 sequence tcacgtgc <220>
<221> protein_bind <222> complement(195..202) <223> matinspector prediction name USF_C score 0.991 sequence gcacgtga <220>
<221> protein_bind <222> complement(210..217) <223> matinspector prediction name MZF1_01 score 0.968 sequence catgggga <220>
<221> protein_bind <222> 397..410 <223> matinspector prediction name ELK1_02 score 0.963 sequence ctctccggaagcct <220>
<221> protein_bind <222> 400..409 <223> matinspector prediction name CETS1P54_01 score 0.974 sequence tccggaagcc <220>
<221> protein_bind <222> complement(460..470) <223> matinspector prediction name AP1_Q4 score 0.963 sequence agtgactgaac <220>
<221> protein_bind <222> complement(460..470) <223> matinspector prediction name AP1FJ_Q2 score 0.961 sequence agtgactgaac <220>
<221> protein_bind <222> 547..555 <223> matinspector prediction name PADS_C score 1.000 sequence tgtggtctc <400> 23 ctatagggcacgcktggtcgacggcccgggctggtctggtctgtkgtggagtcgggttga 60 aggacagcatttgtkacatctggtctactgcaccttccctctgccgtgcacttggccttt 120 kawaagctcagcaccggtgcccatcacagggccggcagcacacacatcccattactcaga 180 aggaactgacggactcacgtgctgctccgtccccatgagctcagtggacctgtctatgta 240 gagcagtcagacagtgcctgggatagagtgagagttcagccagtaaatccaagtgattgt 300 cattcctgtctgcattagtaactcccaacctagatgtgaaaacttagttctttctcatag 360 gttgctctgcccatggtcccactgcagacccaggcactctccggaagcctggaaatcacc 420 cgtgtcttctgcctgctcccgctcacatcccacacttgtgttcagtcactgagttacaga 480 ttttgcctcctcaatttctcttgtcttagtcccatcctctgttcccctggccagtttgtc 540 tagctgtgtggtctc 555 <210> 24 <211> 1450 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 153..1127 <220>
<221> sig_peptide <222> 153..230 <223> Von Heijne matrix score 8.40 seq RLLRLLLSGLVLG/AA
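The seq string in each sig_peptide note appears to give residues -13..-1 of the precursor, a slash at the predicted cleavage site, and then residues +1 and +2. Assuming that layout (it matches the SEQ ID NO: 24, 26, 28 and 31 notes exactly), the window can be rebuilt from the precursor and the signal length. `cleavage_window` is a hypothetical helper; it only extracts the window and does not compute the von Heijne score.

```python
def cleavage_window(precursor: str, signal_len: int) -> str:
    """Hypothetical helper: residues -13..-1, '/', then +1..+2."""
    if signal_len < 13 or signal_len + 2 > len(precursor):
        raise ValueError("precursor too short for a -13..+2 window")
    return precursor[signal_len - 13:signal_len] + "/" + precursor[signal_len:signal_len + 2]


# N-terminal residues transcribed from the listing (one-letter codes).
SEQ24_NTERM = "MATPPPLPSPRHLRLLRLLLSGLVLGAA"   # sig_peptide 153..230 -> 26 residues
SEQ14_NTERM = "MRPLAGGLLKVVFVVFASLCAWYSGYLL"   # SIGNAL -20..-1

print(cleavage_window(SEQ24_NTERM, 26))  # RLLRLLLSGLVLG/AA
print(cleavage_window(SEQ14_NTERM, 20))  # LLKVVFVVFASLC/AW
```

Comparing the rebuilt window with the printed seq string is a quick way to spot scanning damage in these notes, such as a doubled V read as W.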
<220>
<221> polyA_signal <222> 1415..1420 <220>
<221> polyA_site <222> 1434..1450 <220>
<221> mist feature <222> 88 <223> n=a, g, c or t <400> 24 ctttcctctt cctcctcctc ctccttggcatccgcctcttcttcctcctgcgtcctcccc60 cgctgcctcc gctgctcccg acgcggancccggagcccgcgccgagcccctggcctcgcg120 gtgccatgct gccccggcgg cggcgctgaagg atg acg ccg cct ccg 173 gcg ctg Met Ala Thr Pro Pro Pro Leu ccc tcc ccg cgg cat ctg ctg cgg ctg ctc ggc ctc 221 cgg ctg ctg tcc ProSer ProArgHis LeuArgLeu LeuArg LeuLeuLeu SerGly Leu gtcctc ggcgccgcc ctgcgtgga gccgcc gccggccac ccggat gta 269 ValLeu GlyAlaAla LeuArgGly AlaAla AlaGlyHis ProAsp Val gccgcc tgtcccggg agcctggac tgtgcc ctgaagagg cgggca agg 317 AlaAla CysProGly SerLeuAsp CysAla LeuLysArg ArgAla Arg tgtcct cctggtgca catgcctgt gggccc tgccttcag cccttc cag 365 CysPro ProGlyAla HisAlaCys GlyPro CysLeuGln ProPhe Gln gaggac cagcaaggg ctctgtgtg cccagg atgcgccgg cctcca ggc 413 GluAsp GlnGlnGly LeuCysVal ProArg MetArgArg ProPro Gly gggggc cggccccag cccagactg gaagat gagattgac ttcctg gcc 461 GlyGly ArgProGln ProArgLeu GluAsp GluIleAsp PheLeu Ala cag gag ctt gcc cgg aag gag tct gga cac tca act ccg ccc cta ccc 509 Gln Glu Leu Ala Arg Lys Glu Ser Gly His Ser Thr Pro Pro Leu Pro aag gac cga cag cgg ctc ccg gag cct gcc acc ctg ggc ttc tcg gca 557 Lys Asp Arg Gln Arg Leu Pro Glu Pro Ala Thr Leu Gly Phe Ser Ala cgg ggg cag ggg ctg gag ctg ggc ctc ccc tcc act cca gga acc ccc 605 Arg Gly Gln Gly Leu Glu Leu Gly Leu Pro Ser Thr Pro Gly Thr Pro acg ccc acg ccc cac acc tcc ctg ggc tcc cct gtg tca tcc gac ccg 653 Thr Pro Thr Pro His Thr Ser Leu Gly Ser Pro Val Ser Ser Asp Pro gtg cac atg tcg ccc ctg gag ccc cgg gga ggg caa ggc gac ggc ctc 701 Val His Met Ser Pro Leu Glu Pro Arg Gly Gly Gln Gly Asp Gly Leu gcc ctt gtg ctg atc ctg gcg ttc tgt gtg gcc ggt gca gcc gcc ctc 749 Ala Leu Val Leu Ile Leu Ala Phe Cys Val Ala Gly Ala Ala Ala Leu tcc gta gcc tcc ctc tgc tgg tgc agg ctg cag cgt gag atc cgc ctg 797 Ser Val Ala Ser Leu Cys Trp Cys Arg Leu Gln Arg Glu Ile Arg Leu act cag aag gcc gac tac gcc act gcg aag gcc cct ggc tca cct gca 845 Thr Gln Lys Ala Asp Tyr Ala 
Thr Ala Lys Ala Pro Gly Ser Pro Ala get ccc cgg atc tcg cct ggg gac cag cgg ctg gca cag agc gcg gag 893 WO 00/3749( PCT/IB99/02058 Ala Pro Arg Ile Ser Pro Gly Asp Gln Arg Leu Ala Gln Ser Ala Glu atg tac cac tac cag cac caa cgg caa cag atg ctg tgc ctg gag cgg 941 Met Tyr His Tyr Gln His Gln Arg Gln Gln Met Leu Cys Leu Glu Arg cat aaa gag cca ccc aag gag ctg gac acg gcc tcc tcg gat gag gag 989 His Lys Glu Pro Pro Lys Glu Leu Asp Thr Ala Ser Ser Asp Glu Glu aat gag gac gga gac ttc acg gtg tac gag tgc ccg ggc ctg gcc ccg 1037 Asn Glu Asp Gly Asp Phe Thr Val Tyr Glu Cys Pro Gly Leu Ala Pro acc ggg gaa atg gag gtg cgc aac cct ctg ttc gac cac gcc gca ctg 1085 Thr Gly Glu Met Glu Val Arg Asn Pro Leu Phe Asp His Ala Ala Leu tcc gcg ccc ctg ccg gcc ccc agc tca ccg cct gca ctg cca 1127 Ser Ala Pro Leu Pro Ala Pro Ser Ser Pro Pro Ala Leu Pro tgacctggaggcagacagacgcccacctgctccccgacctcgaggcccccggggaggggc1187 agggcctggagcttcccactaaaaacatgttttgatgctgtgtgcttttggctgggcctt1247 gggctccaggccctgggaccccttgccagggagacccccgaacctttgtgccaggacacc1307 tcctggtcccctgcacctctcctgtttggtttagacccccaaactggagggggcatggag1367 aaccgtagagcgcaggaacgggtgggtaattctagagacaaaagccaattaaagtccatt1427 tcagacaaaaaaaaaaaaaaaaa 1450 <210>25 <211>1556 <212>DNA

<213> Homo sapiens <220>
<221> CDS
<222> 261..1166 <220>
<221> sig_peptide <222> 261..314 <223> Von Heijne matrix score 8.80 seq RLVLIILCSVVFS/AV
<220>
<221> polyA_site <222> 1524..1556 <400>

cagcccagtc gagctccgag cggcgatcgc 60 ggcccggccc gagcctcctg gggggccatg cgaaccccag tcggccggga gatgcggcag 120 cctgcacgcc tggaatctgg cggttagcat aagggcgg tg cgcccggcct ctccattcgt 180 aaaaacctac cccccgggta gtcctgccct gagaggtgcc ccccagccct ggagacagca 240 cggctcccac gcccctagac cccttcccag tactgaggga t 293 cagcgacagc ccg atg ggt aag cgg ge ctc gtg ctc atc atc Met a Lys Pro Al Gly Arg Leu Val Leu Ile Ile ctgtgctcc gtggtcttc tctgcc gtctacatcctc ctgtgc tgctgg 341 LeuCysSer ValValPhe SerAla ValTyrIleLeu LeuCys CysTrp gccggcctg cccctctgc ctggcc acctgcctggac caccac ttcccc 389 AlaGlyLeu ProLeuCys LeuAla ThrCysLeuAsp HisHis PhePro acaggctcc aggcccact gtgccg ggacccctgcac ttcagt ggatat 437 ThrGlySer ArgProThr ValPro GlyProLeuHis PheSer GlyTyr agcagtgtg ccagatggg aagccg ctggtccgcgag ccctgc cgcagc 485 SerSerVal ProAspGly LysPro LeuValArgGlu ProCys ArgSer tgtgccgtg gtgtccagc tccggc caaatgctgggc tcaggc ctgggt 533 CysAlaVal ValSerSer SerGly GlnMetLeuGly SerGly LeuGly getgagatc gacagtgcc gagtgc gtgttccgcatg aaccag gcgccc 581 AlaGluIle AspSerAla GluCys ValPheArgMet AsnGln AlaPro accgtgggc tttgaggcg gatgtg ggccagcgcagc accc.tgcgtgtc 629 ThrValGly PheGluAla AspVal GlyGlnArgSer ThrLeu ArgVal gtctcacac acaagcgtg ccgctg ctgctgcgcaac tattca cactac 677 ValSerHis ThrSerVal ProLeu LeuLeuArgAsn TyrSer HisTyr ttccagaag gcccgagac acgctc tacatggtgtgg ggccag ggcagg 725 PheGlnLys AlaArgAsp ThrLeu TyrMetValTrp GlyGln GlyArg cacatggac cgggtgctc ggcggc cgcacctaccgc acgctg ctgcag 773 HisMetAsp ArgValLeu GlyGly ArgThrTyrArg ThrLeu LeuGln lA0 145 150 ctcaccagg atgtacccc ggcctg caggtgtacacc ttcacg gagcgc 821 LeuThrArg MetTyrPro GlyLeu GlnValTyrThr PheThr GluArg atgatg gcctac gaccag atc caggacgag acg aag aac 869 tgc ttc ggc MetMet AlaTyr AspGln Ile GlnAspGlu Thr Lys Asn Cys Phe Gly cggagg cagtcg tccttc ctc accggctgg ttc atg atc 917 ggc agc acc ArgArg GlnSer SerPhe Leu ThrGlyTrp Phe Met Ile Gly Ser Thr ctcgcg ctggag tgtgag gag gtggtctat ggg gtc agc 965 ctg atc atg LeuAla LeuGlu CysGlu Glu ValValTyr Gly Val Ser Leu 
Ile Met gacagc tactgc gagaag agc ccctcagtg cct cac tac 1013 agg cac tac AspSer TyrCys GluLys Ser ProSerVal Pro His Tyr Arg His Tyr tttgag aagggc ctagat gag cagatgtac ctg cac gag 1061 cgg tgt gca PheGlu LysGly LeuAsp Glu GlnMetTyr Leu His Glu Arg Cys Ala caggcg ccccga gcccac cgc atcactgag aag gtc ttc 1109 agc ttc gcg GlnAla ProArg AlaHis Arg IleThrGlu Lys Val Phe Ser Phe Ala tcccgc tgggcc aagagg ccc gtgttcgcc cat tcc tgg 1157 aag atc ccg SerArg TrpAla LysArg Pro ValPheAla His Ser Trp Lys Ile Pro aggact gagtagcttccgt cgtcctgcca 1206 gccgccatgc cgttgcgagg ArgThr Glu cctccgggat gtcccatccc agccatcac ctgagtaattcatggcattt 1266 a actccactcc gggggctcac cacctccagg ctgtcaagt cctggggctgatggccccca 1326 t ggcctttgtc actcaccagc atcatgacct gtgccagtc ctccccagccgcccctacca 1386 t ctggtcctcc ccttttggtg ccacacttct aggctggcc gggcagccgagagcctgggg 1446 c gccctggttg ttcattggtg aaggggcctt gagttgtga cgtatcaggaacgtacgggt 1506 g ctgccggggc aaacgtgtgt tttctggaaa aaaaaaaaa aaaaaaaaaa 1556 a aacaaaaaaa <210> 26 <211> 1058 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 67..813 <220>
<221> sig_peptide <222> 67..111 <223> Von Heijne matrix score 5.20 seq QLWKLVLLCGVLT/GT
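The SEQ ID NO: 26 sig_peptide window can also be recovered from the nucleotide sequence by translating positions 67..117 with the standard genetic code. The start codon is partly scrambled in the scan, so the sketch assumes ATG at 67..69, as the Met in the CDS annotation implies. `translate` and `CODON_TABLE` are hypothetical names built from the standard table, not a library API.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Hypothetical helper: translate in-frame DNA; unknown codons become 'X'."""
    cds = cds.lower()
    if len(cds) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


# SEQ ID NO: 26, positions 67..117 (ATG at 67..69 assumed from the CDS annotation).
SEQ26_67_117 = "atg ctt cag ctt tgg aaa ctt gtt ctc ctg tgc ggc gtg ctc act ggg acc".replace(" ", "")

protein = translate(SEQ26_67_117)
print(protein)                        # MLQLWKLVLLCGVLTGT
print(protein[:15], protein[15:17])   # MLQLWKLVLLCGVLT GT
```

Because unknown codons map to X rather than raising an error, scanning residue such as the get triplets elsewhere in the listing (most likely gct, as the aligned Ala indicates) shows up immediately.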
<220>
<221> polyA_signal <222> 1023..1028 <220>
<221> polyA_site <222> 1042..1058 <400>

agcagact gt tgagcatcct cctctaaacg 60 gcagtggggc cgtgacaaga aaggatttca caaaag tg 108 a ctt cag ctt tgg aaa ctt gtt ctc ctg tgc ggc gtg ctc M et Leu Gln Leu Trp Lys Leu Val Leu Leu Cys Gly Val Leu actggg acctcagag tctcttctt gacaat cttggcaat gaccta agc 156 ThrGly ThrSerGlu SerLeuLeu AspAsn LeuGlyAsn AspLeu Ser aatgtc gtggataag ctggaacct gttctt cacgaggga cttgag aca 204 AsnVal ValAspLys LeuGluPro ValLeu HisGluGly LeuGlu Thr gttgac aatactctt aaaggcatc cttgag aaactgaag gtcgac cta 252 Va2Asp AsnThrLeu LysGlyIle LeuGlu LysLeuLys ValAsp Leu ggagtg cttcagaaa tccagtget tggcaa ctggccaag cagaag gcc 300 GlyVal LeuGlnLys SerSerAla TrpGln LeuAlaLys GlnLys Ala caggaa getgagaaa ttgctgaac aatgtc atttctaag ctgctt cca 348 GlnGlu AlaGluLys LeuLeuAsn AsnVal IleSerLys LeuLeu Pro actaac acggacatt tttgggttg aaaatc agcaactcc ctcatc ctg 396 ThrAsn ThrAspIle PheGlyLeu LysIle SerAsnSer LeuIle Leu gatgtc aaagetgaa ccgatcgat gatggc aaaggcctt aacctg agc 444 AspVal LysAlaGlu PY~oIleAsp AspGly LysGlyLeu AsnLeu Ser ttccct gtcaccgcg aatgtcact gtggcc gggcccatc attggc cag 492 PhePro ValThrAla AsnValThr ValAla GlyProIle IleGly Gln attatcaac ctgaaa gcctcc ttggacctc ctgacc gcagtcaca att 540 IleIleAsn LeuLys AlaSer LeuAspLeu LeuThr AlaValThr Ile gaaactgat ccccag acacac cagcctgtt gccgtc ctgggagaa tgc 588 GluThrAsp ProGln ThrHis GlnProVal AlaVal LeuGlyGlu Cys gccagtgac ccaacc agcatc tcactttcc ttgctg gacaaacac agc 636 AlaSerAsp ProThr SerIle SerLeuSer LeuLeu AspLysHis Ser caaatcatc aacaag ttcgtg aatagcgtg atcaac acgctgaaa agc 684 GlnIleIle AsnLys PheVal AsnSerVal IleAsn ThrLeuLys Ser actgtatcc tccctg ctgcag aaggagata tgtcca ctgatccgc atc 732 ThrValSer SerLeu LeuGln LysGluIle CysPro LeuIleArg Ile ttcatccac tccctg gatgtg aatgtcatt cagcag gtcgtcgat aat 780 PheIleHis SerLeu AspVal AsnValIle GlnGln ValValAsp Asn cctcagcac aaaacc cagctg caaaccctc atctgaagaggac gaatgaggag 833 ProGlnHis LysThr GlnLeu GlnThrLeu Ile gaccactgtg gtgcatgctg agtggcttgc cccaccccct tatagcatct 893 attggttccc ccctccagga agctgctgcc cagcgtgaaa 
gcctgagtcc caccagaagg 953 accacctaac accttcccag ataccccttc cagaacagca gcctctacac atgttgtcct 1013 tcctcacagt gcccctggca ataaaggccc aaaaaaaaaa aaaaa atttctgcaa <210>27 <211>648 <212>DNA

<213> Homo sapiens <220>
<221> CDS
<222> 187..438 <220>
<221> polyA_signal <222> 612..617 <220>
<221> polyA site <222> 632..648 <400> 27 agtgcgcact ggagtcgggc cgcgactgtg60 ggcgtgcgag actcggcggg cgctgttgag gtcgttttta acggaagggc ggagacggag120 taccttcccg cgcggacgcc ggcgctgcca tttcgtcatg tatcctcaac gtgaggctct180 ttggccaggc ccatttgaga tctttgaaga gctgcc 228 atg aag gtg aag att aag tgc tgg aac ggc gtg gcc act tgg Met Lys Val Lys Ile Lys Cys Trp Asn Gly Val Ala Thr Trp ctc tgg gcc aac gat gag aac tgt tgc agg atg gca ttt 276 gtg ggc atc Leu Trp Ala Asn Asp Glu Asn Cys Cys Arg Met Ala Phe Val Gly Ile aac gga tgc cct gac tgc aag gtg gac gac tgc ccg ctg 324 tgc ccc ggc Asn Gly Cys Pro Asp Cys Lys Val Asp Asp Cys Pro Leu Cys Pro Gly gtg tgg cag tgc tcc cac tgc ttc cat tgc atc ctc aag 372 ggc cac atg Val Trp Gln Cys Ser His Cys Phe His Cys Ile Leu Lys Gly His Met tgg ctg gca cag cag gtg cag cag ccc atg tgc cgc cag 420 cac cac tgc Trp Leu Ala Gln Gln Val Gln Gln Pro Met Cys Arg Gln His His Cys gaa tgg ttc aag gag tgaggcccga cctggctctc 468 aag gctggagggg Glu Trp Phe Lys Glu Lys catcctgagactccttcctc atgctggcgc cgatggctgctggggacagc gcccctgagc528 tgcaacaaggtggaaacaag ggctggagct gcgtttgttttgccatcact atgttgacac588 ttttatccaataagtgaaaa ctcattaaac tactcaaatctcgaaaaaaa aaaaaaaaaa648 <210>28 <211>2104 <212>DNA

<213> Homo sapiens <220>
<221> CDS
<222> 92..1753 <220>
<221> sig_peptide <222> 92..130 <223> Von Heijne matrix score 3.90 seq MLYLQGWSMPAVA/EV
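SEQ ID NO: 28 carries an unknown base at nucleotide 905 (misc_feature) and an unsure residue 259 with Xaa = Asp, His, Asn or Tyr. The CDS start at 92 and the 13-residue signal peptide connect the two: residues after the signal are numbered from +1, and signal residues run from -13 to -1. The sketch below assumes that numbering and that the codon at 905..907 reads nac as printed. `residue_number` and `possible_residues` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_number(nt_pos: int, cds_start: int, signal_len: int):
    """Hypothetical helper: (listing residue number, position within codon).

    Residues after the signal peptide are numbered 1, 2, ...; signal
    residues are numbered -signal_len..-1 (there is no residue 0).
    """
    codon_index, frame = divmod(nt_pos - cds_start, 3)
    precursor_no = codon_index + 1
    if precursor_no > signal_len:
        return precursor_no - signal_len, frame
    return precursor_no - signal_len - 1, frame


def possible_residues(codon: str) -> set:
    """Hypothetical helper: amino acids a codon can encode when 'n' is any base."""
    choices = [BASES if base == "n" else base for base in codon.lower()]
    return {CODON_TABLE["".join(c)] for c in product(*choices)}


print(residue_number(905, 92, 13))        # (259, 0): first base of residue 259
print(sorted(possible_residues("nac")))   # ['D', 'H', 'N', 'Y']
```

The four one-letter codes D, H, N and Y are Asp, His, Asn and Tyr, which is the Xaa set given for residue 259.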

<220>
<221> polyA_signal <222> 2070..2075 <220>
<221> polyA_site <222> 2090..2104 <220>
<221> misc_feature <222> 905 <223> n=a, g, c or t <220>
<221> unsure <222> 259 <223> Xaa = Asp, His,Asn,Tyr <400> 28 atagacttta tatgttttct ttgctaagat tcatacttcg tattgatttt tagcatccag gtattgaagg a 112 gtcccatgtc atg catcgttttc ctt tat ctc cag ggt tgg Met Leu Tyr Leu Gln Gly Trp agcatg cctgetgtg gcagag gtaaaactt cgagatgat caatat aca 160 SerMet ProAlaVal AlaGlu ValLysLeu ArgAspAsp GlnTyr Thr ctggaa cacatgcat getttt ggaatgtat aattacctg cactgt gat 208 LeuGlu HisMetHis AlaPhe GlyMetTyr AsnTyrLeu HisCys Asp tcatgg tatcaagac agtgtc tactatatt gataccctt ggaaga att 256 SerTrp TyrGlnAsp SerVal TyrTyrIle AspThrLeu GlyArg Ile atgaat ttaacagta atgctg gacactgcc ttaggaaaa ccacga gag 304 MetAsn LeuThrVal MetLeu AspThrAla LeuGlyLys ProArg Glu gtgttt cgacttcct acagat ttgacagca tgtgacaac cgtctt tgt 352 ValPhe ArgLeuPro ThrAsp LeuThrAla CysAspAsn ArgLeu Cys gcatct atccatttc tcatct tctacctgg gttaccttg tcagat gga 400 AlaSer IleHisPhe SerSer SerThrTrp ValThrLeu SerAsp Gly actggaagattg tatgtcatt ggaaca ggtgaacgt ggaaatagc get 448 ThrGlyArgLeu TyrValIle GlyThr GlyGluArg GlyAsnSer Ala tctgaaaaatgg gagattatg tttaat gaagaactt ggggatcct ttt 496 SerGluLysTrp GluIleMet PheAsn GluGluLeu GlyAspPro Phe attataattcac agtatctca ctgcta aatgetgaa gaacattct ata 544 IleIleIleHis SerIleSer LeuLeu AsnAlaGlu GluHisSer Ile getaccctactt cttcgaata gagaaa gaggaattg gatatgaaa gga 592 AlaThrLeuLeu LeuArgIle GluLys GluGluLeu AspMetLys Gly agtggtttc tatgtttct ctggagtgg gtcactatc agtaagaaa aat 640 SerGlyPhe TyrValSer LeuGluTrp ValThrIle SerLysLys Asn caagataat aaaaaatat gaaattatt aagcgtgat attctccgt gga 688 GlnAspAsn LysLysTyr GluIleIle LysArgAsp IleLeuArg Gly aagtcagtg ccacattat getgetatt aagcctgat ggaaatggt cta 736 LysSerVal ProHisTyr AlaAlaIle LysProAsp GlyAsnGly Leu atgattgta tcctacaag tctttaaca tttgttcag getggtcaa gat 784 MetIleVal SerTyrLys SerLeuThr PheValGln AlaGlyGln Asp cttgaagaa aatatggat gaagacata tcagagaaa atcaaagaa cct 832 LeuGluGlu AsnMetAsp GluAspIle SerGluLys IleLysGlu Pro ctgtattac tggcaacag actgaagat gatttgaca gtaaccata cgg 880 LeuTyrTyr TrpGlnGln 
ThrGluAsp AspLeuThr ValThrIle Arg cttccagaa gacagtact aaggagnac attcaaata cagtttttg cct 928 LeuProGlu AspSerThr LysGluXaa IleGlnIle GlnPheLeu Pro gatcacatc aacattgta ctgaaggat caccagttt ttagaagga aaa 976 AspHisIle AsnIleVal LeuLysAsp HisGlnPhe LeuGluGly Lys ctctattca tctattgat catgaaagc agtacatgg ataattaaa gag 1024 LeuTyrSer SerIleAsp HisGluSer SerThrTrp IleIleLys Glu agtaatagc ttggagatt tccttgatt aagaagaat gaaggactg acc 1072 SerAsnSer LeuGluIle SerLeuIle LysLysAsn GluGlyLeu Thr tggcca gagctagta attggagat aaacaa ggggaactt ataaga gat 1120 TrpPro GluLeuVal IleGlyAsp LysGln GlyGluLeu IleArg Asp tcagcc cagtgtget gcaataget gaacgt ttgatgcat ttgacc tct 1168 SerAla GlnCysAla AlaIleAla GluArg LeuMetHis LeuThr Ser gaagaa ctgaatcca aatccagat aaagaa aaaccac_cttgcaat get 1216 GluGlu LeuAsnPro AsnProAsp LysGlu LysProPro CysAsn Ala caagag ttagaagaa tgtgatatt ttcttt gaagagagc tccagt tta 1264 GlnGlu LeuGluGlu CysAspIle PhePhe GluGluSer SerSer Leu tgcaga tttgatggc aatacatta aaaact actcatgtg gtgaat ctt 1312 CysArg PheAspGly AsnThrLeu LysThr ThrHisVal ValAsn Leu ggaagc aaccagtac cttttctct gtcata gtggatcct aaagaa atg 1360 GlySer AsnGlnTyr LeuPheSer ValIle ValAspPro LysGlu Met ccctgc ttctgtttg cgccatgat gttgat gccctactc tggcaa cca 1408 ProCys PheCysLeu ArgHisAsp ValAsp AlaLeuLeu TrpGln Pro cactcc agcaaacaa gatgatatg tgggag cacatcgca actttc aat 1456 HisSer SerLysGln AspAspMet TrpGlu HisIleAla ThrPhe Asn gettta ggctatgtc caagcatca aagaga gacaaaaaa tttttt gcc 1504 AlaLeu GlyTyrVal GlnAlaSer LysArg AspLysLys PhePhe Ala tgtget ccaaattac tcgtatgca gccctt tgtgagtgc cttcgt cga 1552 CysAla ProAsnTyr SerTyrAla AlaLeu CysGluCys LeuArg Arg gtattc atctatcgt cagcctget cccatg tccactgta ctttac aac 1600 ValPhe IleTyrArg GlnProAla ProMet SerThrVal LeuTyr Asn agaaag gaaggcagg caagtagga caggtt getaagcag caagta gca 1648 ArgLys GluGlyArg GlnValGly GlnVal AlaLysGln GlnVal Ala agccta gaaaccaat gatcctatt ttagga tttcaggca acaaat gag 1696 SerLeu GluThrAsn AspProIle LeuGly PheGlnAla ThrAsn Glu agatta tttgttctt 
actaccaaa aacctc tttttaata aaagta aat 1744 ArgLeu PheValLeu ThrThrLys AsnLeu PheLeuIle LysVal Asn aca gag taattattctaacatattggcctctttgtactggaaaagt 1793 aat Thr Glu Asn attcagtggtacctggaggtctggacagttatactgtaacctcttaagttttaatgtgct1853 aaatatatcttgtatgattttttattttttaataacattggaaatatattcaagagatta1913 tgattctgtaaagctgtggaatgaagctgcagatttagagaacattggcttctgaaaaaa1973 aaaaagagtgaagatagtactagcaagtatacttattttttaaaacaggctagaatctca2033 tgttttatatgaaagatgtacaattcagtgtttaaaaataaaaatatttattgtgtaaaa2093 aaaaaaaaaaa 2104 <210> 29 <211> 515 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 144..440 <220>
<221> sig_peptide <222> 144..287 <223> Von Heijne matrix score 4.10 seq VFMLIVSVLALIP/ET
<220>
<221> polyA_signal <222> 457..462 <220>
<221> polyA_site <222> 500..515 <220>
<221> mist feature <222> 60 <223> n=a, g, c or t <400> 29 agagagcggg aagccgagct gggcgagaag taggggaggg cggtgctccg cgcggtggcn 60 gttgctatcg cttcgcagaa cctactcagg cagccagctg agaagagttg agggaaagtg 120 ctgctgctgg gtctgcagac gcg atg gat aac gtg cag ccg aaa ata aaa cat 173 Met Asp Asn Val Gln Pro Lys Ile Lys His cgcccc ttctgcttc agtgtgaaa ggccac gtgaagatg ctgcgg ctg 221 ArgPro PheCysPhe SerValLys GlyHis ValLysMet LeuArg Leu gatatt atcaactca ctggtaaca acagta ttcatgctc atcgta tet 269 AspIle IleAsnSer LeuValThr ThrVal PheMetLeu IleVal Ser gtgttg gcactgata ccagaaacc acaaca ttgacagtt ggtgga ggg 317 ValLeu AlaLeuIle ProGluThr ThrThr LeuThrVal GlyGly Gly gtgttt gcacttgtg acagcagta tgctgt cttgccgac ggggcc ctt 365 ValPhe AlaLeuVal ThrAlaVal CysCys LeuAlaAsp GlyAla Leu atttac cggaagctt ctgttcaat cccagc ggtccttac cagaaa aag 413 IleTyr ArgLysLeu LeuPheAsn ProSer GlyProTyr GlnLys Lys cctgtg catgaaaaa aaagaagtt ttgtaattttata ttactttt ta 460 ProVal HisGluLys LysGluVal Leu gtttgatact aagtattaaa aaaaaaaaaa aaaat catatttctg tattcttcca <210> 30 <211> 661 <212> DNA
<213> Homo sapiens <220>
<221> CDS
<222> 174..443 <220>
<221> sig_peptide <222> 174..269 <223> Von Heijne matrix score 4.10 seq SSLAFCQVGFLTA/QP
<220>
<221> polyA_signal <222> 623..628 WO 00/37491 PCT/IB99/02058 <220>
<221> polyA_site <222> 647..661 <400> 30 aaaaaggaac actcaggagctatgtggatg 60 tttcagtgat acaggagcac aatgaacaaa ctagatgacc ctaccttgaccctagcactctctccaccct120 gactttaccc acttcaaatg gcatcctcac ggccaacagctcaccatcaattc atg 176 ctcagaccat cagttggtta Met ccc tgc gac cag ctc gtt cat cta ccc cct gcc 224 cta caa act gcc tgc Pro Cys Asp Gln Leu Val His Leu Pro Pro Ala Leu Gln Thr Ala Cys cag ccc tct gcc ttc caa gtg ttc tta gca cag 272 tcc ctg tgc ggg aca Gln Pro Ser Ala Phe Gln Val Phe Leu Ala Gln Ser Leu Cys Gly Thr cct tca ccg agg cgc ggg aaa aga tac ttg gtt 320 cct aga aat gac acg Pro Ser Pro Arg Arg Gly Lys Arg Tyr Leu Val Pro Arg Asn Asp Thr ctg caa cag tgc cag gat tta acc tcc ctt gtc 368 cac gaa gat gcc tca Leu Gln Gln Cys Gln Asp Leu Thr Ser Leu Val His Glu Asp Ala Ser tac ctt ctc tgc ttc gac ttg cga tcg cac caa 416 tcc ccc aaa ggt aag Tyr Leu Leu Cys Phe Asp Leu Arg Ser His Gln Ser Pro Lys Gly Lys agc atc gtt gac act aag tagtgccaag tt 463 act get aac ggattgcc Ser Ile Val Asp Thr Lys Thr Ala Asn taaggaagatcaggagcgga tctttctaatagccccattc523 acatctggtg gcaaagaaaa tagtgaccaccttcaacctc gagtaggggacttaggatgt583 ctcatagcag gagagtttgg tttgttcttttaatcaattc ataaaaataaaaatacttga643 agaaaatatg tatgtttgaa gccaaaaaaaaaaaaaaa 661 <210> 31 <211> 694 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 55..399 <220>
<221> sig_peptide <222> 55..192 <223> Von Heijne matrix score 4.70 seq ILTGLTVGSAADA/GE
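The `seq` string after each Von Heijne score appears to show the 13 residues just before the predicted cleavage site and the 2 just after it. The slash falls at the end of the sig_peptide span. For SEQ ID NO: 31, the span 55..192 is 46 codons, and residues 34..46 / 47..48 of the printed translation read ILTGLTVGSAADA/GE.

That 13/2 layout is inferred from the records in this listing; the listing does not state it. The minimal Python sketch below works under that assumption. The names `signal_peptide_length`, `cleavage_window`, and `SEQ31_N_TERM` are hypothetical.

```python
def signal_peptide_length(cds_start: int, sig_end: int) -> int:
    """Residues in a signal peptide that begins at the CDS start codon (1-based, inclusive)."""
    span = sig_end - cds_start + 1
    if span % 3:
        raise ValueError("sig_peptide span is not a whole number of codons")
    return span // 3


def cleavage_window(protein: str, sig_len: int, before: int = 13, after: int = 2) -> str:
    """Residues -before..-1 and +1..+after around the predicted cleavage site.

    Assumption: the 13-before / 2-after layout is inferred from this listing's seq fields.
    """
    if sig_len < before or sig_len + after > len(protein):
        raise ValueError("protein is too short for the requested window")
    return protein[sig_len - before:sig_len] + "/" + protein[sig_len:sig_len + after]


# First 48 residues of the SEQ ID NO: 31 translation, in one-letter code.
SEQ31_N_TERM = "MKTLFNPAPAIADLDPQFYTLSDVFCCNESEAEILTGLTVGSAADAGE"

if __name__ == "__main__":
    n = signal_peptide_length(55, 192)
    print(n)  # 46
    print(cleavage_window(SEQ31_N_TERM, n))  # ILTGLTVGSAADA/GE
```

The same two calls can be applied to any record here whose sig_peptide begins at the CDS start.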
<220>
<221> polyA_signal <222> 654..659 <220>
<221> polyA_site <222> 680..694 <400>

aatgcttgag gaaaactggg tgttctg aaaacctaaaaag tttaatg 57 aacagtatat Met aaaacc ttgttcaat ccagcccct gccatt getgacctg gatccc cag 105 LysThr LeuPheAsn ProAlaPro AlaIle AlaAspLeu AspPro Gln ttctac accctctca gatgtgttc tgctgc aatgaaagt gagget gag 153 PheTyr ThrLeuSer AspValPhe CysCys AsnGluSer GluAla Glu atttta actggcctc acggtgggc agcget gcagatget ggggag get 201 IleLeu ThrGlyLeu ThrValGly SerAla AlaAspAla GlyGlu Ala gcatta gtgctcttg aaaaggggc tgccag gtggtaatc attacc tta 249 AlaLeu ValLeuLeu LysArgGly CysGln ValValIle IleThr Leu gggget gaaggatgt gtggtgctg tcacag acagaacct gagcca aag 297 GlyAla GluGlyCys ValValLeu SerGln ThrGluPro GluPro Lys cacatt cccacagag aaagtcaag getgtg gataccacg tgtaga cct 345 HisIle ProThrGlu LysValLys AlaVal AspThrThr CysArg Pro ggctca agacccaag agtgaagca gcaagt gtgaagaag cagaaa cat 393 GlySer ArgProLys SerGluAla AlaSer ValLysLys GlnLys His tataaa taacccagag tgcctactga ttttgtggcc 449 aatcctttta taacagcaac TyrLys taacagctcg agcaaaaatg caacattgtg caatgactaa ttactcaaaa 509 aatataaata WO 00/37491 PC'T/IB99/02058 ttttgtgcat cagcagaagt ggaacctgtg gttggtgcta atattatgaa atgcctttgc 569 tgtttaataa tctggtagct ctgtattatt tagcatgcat ttttcttgga gaacaatgat 629 tttatttcaa gtacctctca ctgaaataaa aaagcagctg ttagaagacg aaaaaaaaaa 689 aaaaa 694 <210> 32 <211> 1110 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 90..287 <220>
<221> sig_peptide <222> 90..146 <223> Von Heijne matrix score 9.30 seq VFVFLFLWDPVLA/GI
<220>
<221> polyA_signal <222> 1078..1083 <220>
<221> polyA_site <222> 1096..1110 <400> 32 atcatcttacatcagcacaa caccgatgtc 60 gaagaagagt agaccctgcc gagcatagca actagcctccttaacagaag atgaag cctctc cttgttgtgttt 113 ttcccagcc MetLys ProLeu LeuValValPhe gtc ttt ttc ctttgggat ccagtgctg gcaggt ataaattcatta 161 ctt Val Phe Phe LeuTrpAsp ProValLeu AlaGly IleAsnSerLeu Leu tca tca atg cacaagaaa tgctataaa aatggc atctgcagactt 209 gaa Ser Ser Met HisLysLys CysTyrLys AsnGly IleCysArgLeu Glu gaa tgc gag agtgaaatg ttagttgcc tactgt atgtttcagctg 257 tat Glu Cys Glu SerGluMet LeuValAla TyrCys MetPheGlnLeu Tyr

gag tgc tgt gtc aaa gga aat cct gca ccc tgacataaga aaccaatgaa 307 Glu Cys Cys Val Lys Gly Asn Pro Ala Pro tggccactatcctgtaggcccttgattctgccatctttcacaaaaccagggaatttagat367 caaactgtgacaccatgatgtgtccatgactactggtttttagcatttttataggccagc427 agactcttgtggtcttaaatttaaagagctgagctgtagccttctttaaaagagctcggt487 ttttcacaaaaacaatgtagaagatattttctcacctcaacgtgatgtccagtgtgctca547 tcagcacctgtttctccctctaatcatagaggatat~ttattatttagaaaggcttcaa607 gggaaacaacttttggcacctaagtcgtgtcctaccttcgcttcagcttcgcatttccca667 tttctgtgaaattcccaactttagagaagcagatttgccatggccttctgacaaccttgt727 acatctctcacataaaccgcataggcagggcttaactacaggctggcccgagtctggact787 gagtctgaccctgaagttcctttggaacaggagaggccatcttgtgatgggctggaacaa847 ggtaatttctcatccacctccctagtttcagttgagcaatggaacttcccacctgagccc907 ctagggttcagctacaggctataagactgccgtcctgtggtttagtgttggttccttagc967 agcagagtgatgccacctctgctgcccgtcatctgactcctctggatgggtgttatcctg1027 tggcttaagagctaacaccatgctgatcttgctttgctatatgtgtaactaataaactgc1087 ctaaatgcaaaaaaaaaaaaaaa 1110 <210> 33 <211> 623 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 49..447 <220>
<221> sig_peptide <222> 49..111 <223> Von Heijne matrix score 5.00 seq LIVIFFYCWLSSS/HE
<220>
<221> polyA_signal <222> 579..584 <220>
<221> polyA_site <222> 602..623 <400>

attagaattt tgagaaattc gtgatgag tgt 57 tctttctcaa atg attaaaggtt tcc Me t Cys Ser tcc aagttt actttgatt gtaatt tttttttac tgttggctt tca 105 cta Ser LysPhe ThrLeuIle ValIle PhePheTyr CysTrpLeu Ser Leu tcc catgag gagttagaa ggtggt acatcgaag tcttttgac ctc 153 agc Ser HisGlu GluLeuGlu GlyGly ThrSerLys SerPheAsp Leu Ser cat gtgatt atgcttgtc atcget ggtggtatc ctggcggcc ttg 201 aca His ValIle MetLeuVal IleAla GlyGlyIle LeuAlaAla Leu Thr ctc ctgata gttgtcgtg ctctgt ctttacttc aaaatacac aac 249 ctg Leu LeuIle ValValVal LeuCys LeuTyrPhe LysIleHis Asn Leu gcg aaaget gcaaaggaa cctgaa getgtgget gtaaaaaat cac 297 cta Ala LysAla AlaLysGlu ProGlu AlaValAla ValLysAsn His Leu aac gacaag gtgtggtgg gccaag aacagccag gccaaaacc att 345 cca Asn AspLys ValTrpTrp AlaLys AsnSerGln AlaLysThr Ile Pro gcc gagtct tgtcctgcc ctgcag tgctgtgaa ggatataga atg 393 acg Ala GluSer CysProAla LeuGln CysCysGlu GlyTyrArg Met Thr tgt agtttt gattccctg ccacct tgctgttgc gacataaat gag 441 gcc Cys SerPhe AspSerLeu ProPro CysCysCys AspIleAsn Glu Ala ggc tgagttagga aaggtgggca catgagcaat acttcttagt 497 ctc caaaaatctt Gly Leu agattgtttt gttattca aa gtgagattat ataatttaca 557 tcaagttcta gtgtttttat gtgttgtttt atatactt tt acactattaa aaataaaaaa aaaaaaaaat gaataaatgt gccaaa 623 <210>
34 <211> 657 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 199..618 <220>
<221> sig_peptide <222> 199..408 <223> Von Heijne matrix score 3.90 seq FKVLTQPLSLLWG/CD
<220>
<221> polyA_signal <222> 626..631 <220>
<221> polyA_site <222> 643..657 <400>

aactggat ag atggagaaag gcaaatgcct 60 agtactgccc ccttcagagt ccttcagccc ctacctaa tg tgaagaaaag tcaaagtcca 120 ctttctcaga ttctagctct taaataagca aaaataag ga gattttttgt tttcatctga 180 atgaaatgtt taataatttt ttcctgatat atatatca ca atg gttctt actaaacctctt caaaga aatggc 231 gaaacagc Met ValLeu ThrLysProLeu GlnArg AsnGly agcatg atgagctttgaa aatgtg aaagaaaagagc agagaa ggaggg 279 SerMet MetSerPheGlu AsnVal LysGluLysSer ArgGlu GlyGly ccccat gcacacacaccc gaagaa gaattgtgtttc gtggta acacac 327 ProHis AlaHisThrPro GluGlu GluLeuCysPhe ValVal ThrHis taccct caggttcagacc acactc aacctgtttttc catata ttcaag 375 TyrPro GlnValGlnThr ThrLeu AsnLeuPhePhe HisIle PheLys gttctt actcaaccactt tccctt ctgtggggttgt gatcag aagcct 423 ValLeu ThrGlnProLeu SerLeu LeuTrpGlyCys AspGln LysPro cgtact gttcctaccctt ggaaac ggcgcatgggat acctgc caacaa 471 ArgThr ValProThrLeu GlyAsn GlyAlaTrpAsp ThrCys GlnGln cacata cgcacttcatca tggaca gcaaacacactc gtcatt caaaac 519 HisIle ArgThrSerSer TrpThr AlaAsnThrLeu ValIle GlnAsn WO 00/37491 PCT/lB99/02058 cag cat tca cgg gaa agc act gtt tct gtt tgc ctt ttt atg tta atc 567 Gln His Ser Arg Glu Ser Thr Val Ser Val Cys Leu Phe Met Leu Ile cgc atg caa cat att ttg aaa aca gat aca ctt caa cag ttc aga ata 615 Arg Met Gln His Ile Leu Lys Thr Asp Thr Leu Gln Gln Phe Arg Ile tgc tagtactaat aaaaccaaca tgttaaaaaa aaaaaaaaa 657 Cys <210> 35 <211> 1137 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 271..969 <220>
<221> sig_peptide <222> 271..366 <223> Von Heijne matrix score 5.60 seq WMGLACFRSLAAS/SP
<220>
<221> polyA_signal <222> 1092..1097 <220>
<221> polyA_site <222> 1123..1137 <400> 35 aaaaacctttcaagtgccccctcctttccttaaagtcttttataggggtccccttcttgg60 ccatctccatcctgtgagtcaggactgaaagggcacagacaggtcactgccagcattgtt120 ggggcaagcctgcaagcacgcatcactggggatctgacatgacaatggccgcctgccccc180 tctgagggctacaggacttaccccagtgggaagcagctaagcaggtctgaccagccgacc240 tggacctggccaagggtcctgtcatccctcatg gcc ccg cca 294 acc ttc cgg ctg Met Ala Pro Pro Thr Phe Arg Leu ata aggaagatg ttttccttc aaggtg agcagatgg atggggctt gcc 342 Ile ArgLysMet PheSerPhe LysVal SerArgTrp MetGlyLeu Ala tgc ttccggtcc ctggcggca tcctct cccagtatt cgccagaag aaa 390 Cys PheArgSer LeuAlaAla SerSer ProSerIle ArgGlnLys Lys cta atgcacaag ctgcaggag gaaaag gettttcgc gaagagatg aaa 438 Leu MetHisLys LeuGlnGlu GluLys AlaPheArg GluGluMet Lys att tttcgtgaa aaaatagag gacttc agggaagag atgtggact ttc 486 Ile PheArgGlu LysIleGlu AspPhe ArgGluGlu MetTrpThr Phe cga ggcaagatc catgetttc cggggc cagatcctg ggtttttgg gaa 534 Arg GlyLysIle HisAlaPhe ArgGly GlnIleLeu GlyPheTrp Glu gag gagagacct ttctgggaa gaggag aaaaccttc tggaaagag gaa 582 Glu GluArgPro PheTrpGlu GluGlu LysThrPhe TrpLysGlu Glu aaa tccttctgg gaaatggaa aagtct ttcagggag gaagagaaa act 630 Lys SerPheTrp GluMetGlu LysSer PheArgGlu GluGluLys Thr ttc tggaaaaag taccgcact ttctgg aaggaggat aaggccttc tgg 678 Phe TrpLysLys TyrArgThr PheTrp LysGluAsp LysAlaPhe Trp 9p 95 100 aaa gaggacaat gccttatgg gaaaga gaccggaac cttcttcag gag 726 Lys GluAspAsn AlaLeuTrp GluArg AspArgAsn LeuLeuGln Glu gac aaggccctg tgggaggaa gaaaag gccctgtgg gtagaggaa aga 774 Asp LysAlaLeu TrpGluGlu GluLys AlaLeuTrp ValGluGlu Arg gcc ctccttgag ggggagaaa gccctg tgggaagat aaaacgtcc ctc 822 Ala LeuLeuGlu GlyGluLys AlaLeu TrpGluAsp LysThrSer Leu tgg gaggaagag aatgccctc tgggag gaagagagg gccttctgg atg 870 Trp GluGluGlu AsnAlaLeu TrpGlu GluGluArg AlaPheTrp Met gag aacaatggc cacattgcc ggagag cagatgctc gaagatggg ccc 918 Glu AsnAsnGly HisIleAla GlyGlu GlnMetLeu GluAspGly Pro cac aacgccaac agagggcag cgcttg ctggccttc tcccgaggc agg 966 His AsnAlaAsn ArgGlyGln ArgLeu LeuAlaPhe SerArgGly 
Arg gcg tagccagcat gcaggtgcag ggccctgtgg tccagactcc cctgggttgg 1019 Ala gattcaagtc cagggtgagc ccatgtgctg gagaaaatac acactcattg gtctccttgc 1079 tttgaaagat ccaataaagt cctgaggcaa ggtttggaaa accaaaaaaa aaaaaaaa 1137 <210> 36 <211> 636 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 192..440 <220>
<221> sig_peptide <222> 192..278 <223> Von Heijne matrix score 5.20 seq VVFMTVAAGGASS/FA
<220>
<221> polyA_signal <222> 590..595 <220>
<221> polyA_site <222> 622..636 <400> 36 aaaagcgagtcaggtccctcgcgctcccgccccacgcgcgtgaccagagc gcgctggCCC60 ggcccacccggggcggttgtggtcgctatatataaggtggggaggccgcc ggcccgttcg120 gttccgggcgttaccatcgtccgtgcgcaccgcccggcgtccagatttgg caattcttcg180 ctgaagtcatc atg ttt ttc ctc ctg aaa agg aag gaa 230 agc caa atg ctc Met Ser Phe Phe Leu Leu Lys Arg Lys Glu Gln Met Leu att ccc ttg gtg gtg ttc atg act gtg gcg gcg ggt gga gcc tca tct 278 Ile Pro Leu Val Val Phe Met Thr Val Ala Ala Gly Gly Ala Ser Ser ttc get gtg tat tct ctt tgg aaa acc gat gtg atc ctt gat cga aaa 326 Phe Ala Val Tyr Ser Leu Trp Lys Thr Asp Val Ile Leu Asp Arg Lys aaa aat cca gaa cct tgg gaa act gtg gac cct act gta cct caa aag 374 Lys Asn Pro Glu Pro Trp Glu Thr Val Asp Pro Thr Val Pro Gln Lys ctt ata aca atc aac caa caa tgg aaa ccc att gaa gag ttg caa aat 422 Leu Ile Thr Ile Asn Gln Gln Trp Lys Pro Ile Glu Glu Leu Gln Asn gtc caa agg gtg acc aaa tgacgagccc tcgcctcttt cttctgaaga 470 Val Gln Arg Val Thr Lys gtactctata aatctagtgg aaacatttct gcacaaacta gattctggac accagtgtgc 530 ggaaatgctt ctgctacatt tttagggttt gtctacattt tttgggctct ggataaggaa 590 ttaaaggagt gcagcaataa ctgcactgtc caaaaaaaaa aaaaaa 636 <210> 37 <211> 818 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 59..703 <220>
<221> sig_peptide <222> 59..181 <223> Von Heijne matrix score 6.80 seq LVSCLSSQSSALS/QS
<220>
<221> polyA_signal <222> 783..788 <220>
<221> polyA-site <222> 804..818 <400> 37 gacatcttga gctgaagcag ggttttgagc cactgctgct gctgctgcca ttgtcacc 58 atg gtc tca get ctg cgg gga gca ccc ctg atc agg gtg cac tca agc 106 Met Val Ser Ala Leu Arg Gly Ala Pro Leu Ile Arg Val His Ser Ser cctgtttct tctccttctgtg agtgga ccacggagg ctggtgagc tgc 154 ProValSer SerProSerVal SerGly ProArgArg LeuValSer Cys ctgtcatcc caaagctcaget ctgagc cagagtggt ggtggctcc acc 202 LeuSerSer GlnSerSerAla LeuSer GlnSerGly GlyGlySer Thr tctgccgcc ggcatagaagcc aggagc agggetctc agaaggcgg tgg 250 SerAlaAla GlyIleGluAla ArgSer ArgAlaLeu ArgArgArg Trp tgcccaget gggatcatgttg ttggcc ctggtctgt ctgctcagc tgc 298 CysProAla GlyIleMetLeu LeuA1a LeuValCys LeuLeuSer Cys ctgctaccc tccagtgaggcc aagctc tacggtcgt tgtgaactg gcc 346 LeuLeuPro SerSerGluAla LysLeu TyrGlyArg CysGluLeu Ala agagtgcta catgacttcggg ctggac ggataccgg ggatacagc ctg 394 ArgValLeu HisAspPheGly LeuAsp GlyTyrArg GlyTyrSer Leu getgactgg gtctgccttget tatttc acaagcggt ttcaacgca get 442 AlaAspTrp ValCysLeuAla TyrPhe ThrSerGly PheAsnAla Ala getttggac tacgaggetgat gggagc accaacaac gggatcttc cag 490 AlaLeuAsp TyrGluAlaAsp GlySer ThrAsnAsn GlyIlePhe Gln atcaacagc cggaggtggtgc agcaac ctcaccccg aacgtcccc aac 538 IleAsnSer ArgArgTrpCys SerAsn LeuThrPro AsnValPro Asn gtgtgccgg atgtactgctca gatttg ttgaatcct aatctcaag gat 586 ValCysArg MetTyrCysSer AspLeu LeuAsnPro AsnLeuLys Asp accgttatc tgtgccatgaag ataacc caagagcct cagggtctg ggt 634 ThrValIle CysAlaMetLys IleThr GlnGluPro GlnGlyLeu Gly tactgggag gcctggaggcat cactgc cagggaaaa gacctcact gaa 682 TyrTrpGlu AlaTrpArgHis HisCys GlnGlyLys AspLeuThr Glu tgggtggat ggctgtgacttc taggatggac 733 ggaaccatgc acagcaggct TrpValAsp GlyCysAspPhe gggaaatgtg ttgggaagac aagccagcga 793 gtttggttcc ataaaggatg tgacctaggc gttgaacgtg gl8 aaaaaaaaaa aaaaa WO 00/37491 PC'T/IB99/02058 <210> 3B
<211> 1888 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 139..1389 <220>
<221> sig_peptide <222> 139..198 <223> Von Heijne matrix score 5.00 seq HLLAGFCVWVVLG/WV
<220>
<221> polyA_signal <222> 1854..1859 <220>
<221> polyA_site <222> 1873..1888 <400>

ccccccca gc tggaaccaag cccccttcct ctgggtgtcc aaggttgtgt ttgtctcctg ctatcagg gc acagtcctca gggagaatag gagccagaac ggatgtttcg ctgagcccct aagccatt cc cctcacca atgatg gggtcccca gtgagtcat ctgctggcc 171 MetMet GlySerPro ValSerHis LeuLeuAla ggcttc tgtgtgtgg gtcgtc ttgggctgg gtagggggc tcagtcccc 219 GlyPhe CysValTrp ValVal LeuGlyTrp ValGlyGly SerValPro aacctg ggccctget gagcag gagcagaac cattacctg gcccagctg 267 AsnLeu GlyProAla GluGln GluGlnAsn HisTyrLeu AlaGlnLeu tttggc ctgtacggc gagaat gggacgctg actgcaggg ggcttggcg 315 PheGly LeuTyrGly GluAsn GlyThrLeu ThrAlaGly GlyLeuAla cggctt ctccacagc ctgggg ctaggccga gttcagggg cttcgcctg 363 ArgLeu LeuHisSer LeuGly LeuGlyArg ValGlnGly LeuArgLeu ggacagcatggg cctctgact ggacggget gcatcc ccaget gcagac 411 GlyGlnHisGly ProLeuThr GlyArgAla AlaSer ProAla AlaAsp aattccacacac aggccacag aaccctgag ctgagt gtggat gtctgg 459 AsnSerThrHis ArgProGln AsnProGlu LeuSer ValAsp ValTrp gcagggatgcct ctgggtccc tcagggtgg ggtgac ctggaa gagtca 507 AlaGlyMetPro LeuGlyPro SerGlyTrp GlyAsp LeuGlu GluSer aaggcccctcac ctaccccgt gggccagcc ccctcg ggcctg gacctc 555 LysAlaProHis LeuProArg GlyProAla ProSer GlyLeu AspLeu cttcacaggctt ctgttgctg gaccactca ttgget gaccac ctgaat 603 LeuHisArgLeu LeuLeuLeu AspHisSer LeuAla AspHis LeuAsn gaggattgtctg aacggctcc cagctgctg gtcaat tttggc ttgagc 651 GluAspCysLeu AsnGlySer GlnLeuLeu ValAsn PheGly LeuSer cccgetgetcct ctgacccct cgtcagttt getctg ctgtgc ccagcc 699 ProAlaAlaPro LeuThrPro ArgGlnPhe AlaLeu LeuCys ProAla ctgctttatcag atcgacagc cgcgtctgc atcggc getccg gcccct 747 LeuLeuTyrGln IleAspSer ArgValCys IleGly AlaPro AlaPro gcacccccaggg gatctacta tctgccctg cttcag agtgcc ctggca 795 AlaProProGly AspLeuLeu SerAlaLeu LeuGln SerAla LeuAla gtcctgttgctc agcctccct tctccccta tccctg ctgctgctg cgg 843 ValLeuLeuLeu SerLeuPro SerProLeu SerLeu LeuLeuLeu Arg ctcctgggacct cgtctacta cggcccttg ctgggc ttcctgggg gcc 891 LeuLeuGlyPro ArgLeuLeu ArgProLeu LeuGly PheLeuGly Ala ctggcggtgggc actctttgt ggggatgca ctgcta catctgcta ccg 939 LeuAlaValGly ThrLeuCys 
GlyAspAla LeuLeu HisLeuLeu Pro catgcacaagaa gggcggcac gcaggacct ggcgga ctaccagag aag 987 HisAlaGlnGlu GlyArgHis AlaGlyPro GlyGly LeuProGlu Lys gacctgggcccg gggctgtca gtgctcgga ggcctc ttcctgctc ttt 1035 AspLeuGlyPro GlyLeuSer ValLeuGly GlyLeu PheLeuLeu Phe gtg ctg aac atgctg ggg ttg cgg cga ggg ctc agg 1083 gag ctt cac cca Val Leu Asn MetLeu Gly Leu Arg Arg Gly Leu Arg Glu Leu His Pro aga tgc agg cgaaaa cga aat ctc aca cgc aac ttg 1131 tgc agg gaa gat Arg Cys Arg ArgLys Arg Asn Leu Thr Arg Asn Leu Cys Arg Glu Asp ccg gag ggc agtggg atg ctt cag cta cag gca get 1179 aat gcc ccc cca Pro Glu Gly SerGly Met Leu Gln Leu Gln Ala Ala Asn Ala Pro Pro gag cca get cagggc cag gag aag agc cag cac cca 1227 ggg agg aac cca Glu Pro Ala GlnGly Gln Glu Lys Ser Gln His Pro Gly Arg Asn Pro get ctg cct cctggg cac ggc cac cat ggg cac cag 1275 gcc caa agt ggt Ala Leu Pro ProGly His Gly His His Gly His Gln Ala Gln Ser Gly ggc act atc acgtgg atg ctc ctg gat ggt cta cac 1323 gat gtc gga aac Gly Thr Ile ThrTrp Met Leu Leu Asp Gly Leu His Asp Val Gly Asn ctc act ggg ctggcc ata get gcc tct gat ggc ttc 1371 gat ggt ttc tcc Leu Thr Gly LeuAla Ile Ala Ala Ser Asp Gly Phe Asp Gly Phe Ser gcg gcc gta ccacct tagcggtctt ctgccatgag 1419 tca ctgccccacg Ala Ala Val ProPro Ser aactgggtgactttgccatg cagggctgtcctttcggcgg ctgctgctgc1479 ctgctccagt tgagcctcgt ggggtgcagtcctgggggtg gggctcagcc1539 gtctggagcc ctgggattgg tgggccctgtccccctcact ttggggtcactgctggggtc ttcctctatg1599 ccctgggtgt tggcccttgt ttcgtcctccggagcccctg cctacgcccc1659 ggacatgcta ccagccctgc atgtgctcct tggggggcggcctcatgctt gccataaccc1719 gcaggggctg gggctgctgc tgctggagga ctgagggctgatggggccag tggaaagggg1779 gcggctactg cccgtgacca tcgggttgcc ggaatggaggcgggacacag ggccagtagg1839 cttccttccc cccaaccaca agcaataggattttaataaa cccaaaaaaaaaaaaaaaa 1888 cagaacccat <210> 39 <211> 1894 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 21..1118 <220>
<221> sig_peptide <222> 21..89 <223> Von Heijne matrix score 10.80 seq ALALLSAFSATQA/RK
<220>
<221> polyA_signal <222> 1858..1863 <220>
<221> polyA_site <222> 1879..1894 <220>
<221> misc_feature <222> 1695 <223> n=a, g, c or t <400>

agacgtga gc g a g 53 agagcagata gc agc get at at gcc gtg ctc act tgg get Met a a Al Ser Ala Met Val Al Leu Thr Trp Ala ctgget cttctttca gcgttttcg gccacttag gcacggaaa ggcttc 101 LeuAla LeuLeuSer AlaPheSer AlaThrGln AlaArgLys GlyPhe tgggac tatttcagc tagactagc ggggacaaa ggcagggtg gagtag 149 TrpAsp TyrPheSer GlnThrSer GlyAspLys GlyArgVal GluGln atccat tagtagaag atggetcgc gagcccgcg actctgaaa gacagc 197 IleHis GlnGlnLys MetAlaArg GluProAla ThrLeuLys AspSer cttgag caagacctc aacaatatg aacaagttc ctggaaaag ctgagg 245 LeuGlu GlnAspLeu AsnAsnMet AsnLysPhe LeuGluLys LeuArg cctctg agtgggagc gaggetcct cggctccca taggacccg gtgggc 293 ProLeu SerGlySer GluAlaPro ArgLeuPro GlnAspPro ValGly atgcgg cggtagctg taggaggag ttggaggag gtgaagget cgcctc 341 MetArg ArgGlnLeu GlnGluGlu LeuGluGlu ValLysAla ArgLeu cagccctac atggcagag gcgcacgag ctggtg ggctggaat ttggag 389 GlnProTyr MetAlaGlu AlaHisGlu LeuVal GlyTrpAsn LeuGlu ggcttgcgg cagcaactg aagccctac acgatg gatctgatg gagcag 437 GlyLeuArg GlnGlnLeu LysProTyr ThrMet AspLeuMet GluGln gtggccctg cgcgtgcag gagctgcag gagcag ttgcgcgtg gtgggg 485 ValAlaLeu ArgValGln GluLeuGln GluGln LeuArgVal ValGly gaagacacc aaggcccag ttgctgggg ggcgtg gacgagget tggget 533 GluAspThr LysAlaGln LeuLeuGly GlyVal AspGluAla TrpAla ttgctgcag ggactgcag agccgc gtggtgcac cacacc ggccgcttc 581 LeuLeuGln GlyLeuGln SerArg ValValHis HisThr GlyArgPhe aaagagctc ttccaccca tacgcc gagagcctg gtgagc ggcatcggg 629 LysGluLeu PheHisPro TyrAla GluSerLeu ValSer GlyIleGly cgccacgtg caggagctg caccgc agtgtgget ccgcac gcccccgcc 677 ArgHisVal GlnGluLeu HisArg SerValAla ProHis AlaProAla agccccgcg cgcctcagt cgctgc gtgcaggtg ctctcc cggaagctc 725 SerProAla ArgLeuSer ArgCys ValGlnVal LeuSer ArgLysLeu acgctcaag gccaaggcc ctgcac gcacgcatc cagcag aacctggac 773 ThrLeuLys AlaLysAla LeuHis AlaArgIle GlnGln AsnLeuAsp cagctgcgc gaagagctc agcaga gcctttgca ggcact gggactgag 821 GlnLeuArg GluGluLeu SerArg AlaPheAla GlyThr GlyThrGlu gaaggggcc ggcccggac ccccag atgctctcc gaggag gtgcgccag 869 GluGlyAla GlyProAsp ProGln 
MetLeuSer GluGlu ValArgGln cgacttcag getttccgc caggac acctacctg cagata getgccttc 917 ArgLeuGln AlaPheArg GlnAsp ThrTyrLeu GlnIle AlaAlaPhe actcgcgcc atcgaccag gagact gaggaggtc cagcag cagctggcg 965 ThrArgAla IleAspGln GluThr GluGluVal GlnGln GlnLeuAla ccacctcca ccaggccac agtgcc ttcgcccca gagttt caacaaaca 1013 ProProPro ProGlyHis SerAla PheAlaPro GluPhe GlnGlnThr gac agt ggc aag gtt ctg agc aag ctg cag gcc cgt ctg gat gac ctg 1061 Asp Ser Gly Lys Val Leu Ser Lys Leu Gln Ala Arg Leu Asp Asp Leu tgg gaa gac atc act cac agc ctt cat gac cag ggc cac agc cat ctg 1109 Trp Glu Asp Ile Thr His Ser Leu His Asp Gln Gly His Ser His Leu ggg gac tgaggatctacctgcccaggcccattcccagctccttgtc 1158 ccc Gly Asp Pro tggggagccttggctctgagcctctagcatggttcagtccttgaaagtggcctgttgggt1218 ggagggtggaaggtcctgtgcaggacagggaggccaccaaaggggctgctgtctcctgca1278 tatccagcctcctgcgactccccaatctggatgcattacattcaccaggctttgcaaacc1338 cagcctcccagtgctcatttgggaatgctcatgagttactccattcaagggtgagggagt1398 agggagggagaggcaccatgcatgtgggtgattatctgcaagcctgtttgccgtgatgct1458 ggaagcctgtgccactacatcctggagtttggctctagtcacttctggctgcctggtggc1518 cactgctacagctggtccacagagaggagcacttgtctccccagggctgccatggcagct1578 atcaggggaatagaagggagaaagagaatatcatggggagaacatgtgatggtgtgtgaa1638 tatccctgctggctctgatgctggtgggtacgaaaggtgtgggctgtgataggaganggc1698 agagcccatgtttcctgacatagctctacacctaaataagggactgaaccctcccaactg1758 tgggagctccttaaaccctctggggagcatactgtgtgctctccccatctccagcccctc1818 cctctgggttcccaagttgaagcctagacttctggctcaaatgaaatagatgtttatgat1878 aaaaaaaaaaaaaaaa 1894 <210> 40 <211> 1913 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 143..592 <220>
<221> sig_peptide <222> 143..277 <223> Von Heijne matrix score 5.90 seq VLVDLAILGQAYA/FA
<220>
<221> polyA_signal <222> 1877..1882 <220>
<221> polyA-site <222> 1899..1913 <400> 40 atttttttgt gcctaagatg cccagtgcgt tgctgggttt ttctgctgtc ctcgggctct 60 ggacatgagg ccagaccttg tgaccttgtt ggcagtgggc agtggcttga tgtgaggtcc 120 cagagacggc aggttcatca ag atg gtg ctc atg tgg acc agt ggt gac gcc 172 Met Val Leu Met Trp Thr Ser Gly Asp Ala ttc aag acg gcc tac ttc ctg ctg aag ggt gcc cct ctg cag ttc tcc 220 Phe Lys Thr Ala Tyr Phe Leu Leu Lys Gly Ala Pro Leu Gln Phe Ser gtg tgc ggc ctg ctg cag gtg ctg gtg gac ctg gcc atc ctg ggg cag 268 Val Cys Gly Leu Leu Gln Val Leu Val Asp Leu Ala Ile Leu Gly Gln gcc tac gcc ttc gcc cca ccc cca gaa gcc ggc gcc cca cgc cgt gca 316 Ala Tyr Ala Phe Ala Pro Pro Pro Glu Ala Gly Ala Pro Arg Arg Ala ccccac tggcaccaa ggccct ctgacagtg gggagg acg atg tgg 364 agg ProHis TrpHisGln GlyPro LeuThrVal GlyArg Thr Met Trp Arg gaccgc cagccgcgg gcactg gtgggccct gacctc ccc ggg agg 412 gcg AspArg GlnProArg AlaLeu ValGlyPro AspLeu Pro Gly Arg Ala gtgggt gccgtggcc cctgca ggtgtggca gagatg ggg ggg cat 460 cac ValGly AlaValAla ProAla GlyValAla GluMet Gly Gly His His tggggt ctccatcag cctctg tggggtgtc tcaggg tgg gtg ggg 508 gca TrpGly LeuHisGln ProLeu TrpGlyVal SerGly Trp Val Gly Ala gtgggg ctgggacgc tgtttg tgctcagcg gggaca gcc gtt gat 556 agg ValGly LeuGlyArg CysLeu CysSerAla GlyThr Ala Val Asp Arg ctggcc ccgagggtt ttggat gtttttagg atgaca taaaaagcaa 602 LeuAla ProArgVal LeuAsp ValPheArg MetThr gtgttttccc catttcctct caaggtacac attgggcggc tatgaaacac cgtctgagcc ctgcaggaac ctgctccagg cgcgaacctt gaagctgggg722 tggacacacg ggccagcagc tgaccgcagg agaccctgta cga ccccgtgacaccctggc782 aggcctgtga gcggagccct cagacaccct gcttggactg ctacccagggg tctggcacgggggaggg842 gggtggcctc tg ctggggcttt ctctgcctgg tgc ggacgcagggtcaccgt902 tacacacgga aaggcggctg gctccgggttttctgacagtcggtgtttcctgggcctttggagtggctgcgaggcctgaa962 cgccttgtggatccgctgtgtccagcccggctgagcatcgccagggctagctcatgctgc1022 tcttgtcagcctctggttctcctcgagtccttggggacgtggcagatgccagcgaccatc1082 agacaacgtggaggccctcatgggcaatggctgagggggccgggctgaggctgtgcacat1142 
gcagtctgcacgccactcttgggctctgctggcggagatccccttccttctgggtgcaga1202 ctgcacctccggatgcagttttgatgtccatcttccaggagagagacggtctcgggtcca1262 gggagtggagggggctgcccctgccgtgcaggtcctggccgatggcgccttaccctgctg1322 ccctgggcttttggcctgaagcaaattcctgagtggggggtactggggcctgccgcatcc1382 tgtcctgtccactgcccacccccgtgtgctggctccctcacttctggctgcagtgggagc1442 cgccagtctgacccttgtcaccgcacgctctgcccccaccccgttgcaagaggtcacacc1502 atgtcagcagccttgcactgaccgcagccggcccccaggcctcagagttctggatgcttc1562 cgtgcggctccaacaggcatcgtcttcccttccgcaggtggaggggccgcttcccgcagg1622 catctgagctctgtgccggggccgtggccatgggaagatgttccacgctgcctcctcctc1682 gagttttcctcggaaacactcttgaatgtctgagtgagggtcctgcttagctctttggcc1742 tgtgagatgctttgaaaatttttatttttttaagatgaagcaagatgtctgtagcggtaa1802 ttgcctcacattaaactgtcgccgactgcaggcgcagtgactgctgaatgtaccctgtgt1862 ggcgacttggaatcaataaaccatttgtggatcctaaaaaaaaaaaaaaaa 1913 <210> 41 <211> 1744 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 76..999 <220>
<221> sig_peptide <222> 76..279 <223> Von Heijne matrix score 5.10 seq LSLPVCTVSLVSS/VS
<220>
<221> polyA_signal <222> 1711..1716 <220>
<221> polyA_site <222> 1729..1744 <220>
<221> misc_feature <222> 336 <223> n=a, g, c or t <400>

aagttgaggc tggtg accaaagccct ctcaggc aggcagaccc gggcctccc caccc gc a cgccacacct atggatttt gtcget ggagccatc ggaggc gtctgc 111 tgttc MetAspPhe ValAla GlyAlaIle GlyGly ValCys ggt gttgetgtg ggctatccc ctggac acggtgaag gtcagg atctag 159 Gly ValAlaVal GlyTyrPro LeuAsp ThrValLys ValArg IleGln acg gagccaaag tatacaggc atctgg cattgcgtc cgggat acgtat 207 Thr GluProLys TyrThrGly IleTrp HisCysVal ArgAsp ThrTyr cat cgagagcgc gtgtggggc ttctat cggggcctc tcgctg cccgtg 255 His ArgGluArg ValTrpGly PheTyr ArgGlyLeu SerLeu ProVal tgc acggtgtcc ctggtatct tccgtg tcttttggc acttat cgccat 303 Cys ThrValSer LeuValSer SerVal SerPheGly ThrTyr ArgHis tgc ctggcgcat atctgccgg ctccgg tatggnaac cctgac gccaag 351 Cys LeuAlaHis IleCysArg LeuArg TyrGlyAsn ProAsp AlaLys ccc actaaggcc gacatcacg ctctcg ggatgcgcc tccggc ctcgtc 399 Pro ThrLysAla AspIleThr LeuSer GlyCysAla SerGly LeuVal cgc gtgttcctg acgtcgccc actgag gtggccaaa gtccgc ttgtag 447 Arg ValPheLeu ThrSerPro ThrGlu ValAlaLys ValArg LeuGln acg tagacatag gcgtagaag tagtag cggctgctt tcggcc tcgggg 495 Thr GlnThrGln AlaGlnLys GlnGln ArgLeuLeu SerAla SerGly ccg ttggetgtg ccccccatg tgtcct gtgccccca gcctgc ccagag 543 Pro LeuAlaVal ProProMet CysPro ValProPro AlaCys ProGlu ~5 80 85 ccc aagtatcgc gggccactg cattgc ctggccacg gtagcc cgtgag 591 Pro LysTyrArg GlyProLeu HisCys LeuAlaThr ValAla ArgGlu gp 95 100 gag gggctgtgc ggcctctat aagggc agctcggcc ctggtc ttacgg 639 Glu GlyLeuCys GlyLeuTyr LysGly SerSerAla LeuVal LeuArg WO 00/37491 PCT/lB99/02058 gac ggc tcc ttt gcc acc tac ttc tac gcg ctc tgc 687 cac ctt tcc gtc Asp Gly Ser Phe Ala Thr Tyr Phe Tyr Ala Leu Cys His Leu Ser Val gag tgg agc ccc get ggc cac agc gat gtc ggc gtg 735 ctc cgg cca ccg Glu Trp Ser Pro Ala Gly His Ser Asp Val Gly Val Leu Arg Pro Pro ctg gtg ggg ggc tgt gca gga gtc tgg get gcc acc 783 gcc ctg gcc gtg Leu Val Gly Gly Cys Ala Gly Val Trp Ala Ala Thr Ala Leu Ala Val ccc atg gtg atc aag tcg aga ctg gac ggg ggc cag 831 gac cag gca cag Pro Met Val Ile Lys Ser Arg Leu Asp Gly Gly Gln Asp Gln Ala Gln 
agg cgc cgg ggt ctc ctg cac tgt acc agc cga gag 879 tac atg gtg gtt Arg Arg Arg Gly Leu Leu His Cys Thr Ser Arg Glu Tyr Met Val Val gag gga cgg gtc ctt ttc aag ggg ctc aat tgc cgc 927 ccc ctg gta tgc Glu Gly Arg Val Leu Phe Lys Gly Leu Asn Cys Arg Pro Leu Val Cys gcc ttc gtc aac atg gtg gtc ttc tat gag gtg ctg 975 cct gtc gcc gca Ala Phe Val Asn Met Val Val Phe Tyr Glu Val Leu Pro Val Ala Ala agg ctc cgg ggt ctg ctc aca tagccggtcc ggcccaccc 1029 gcc ccacgcccag c Arg Leu Arg Gly Leu Leu Thr Ala accagcagctgctggaggtc gtagtggctg gaggaggcaaggggtagtgtggctgggttc1089 gggaccccacagggccattg cccaggagaa tgaggagcctccctgcagtgttgtcggccg1149 aggcctaagctcgccctgcc cagctactga cctcaggtcgaggggcccgccagccatcag1209 ccagggttggcctagggtgg caggagccag ggaggagtgggcctctttgatgagagcgtt1269 gagttgcatggagtcggttg ttcatcccag cctccccatggccctcgcctcccatgtctt1329 tgaagcacccctccagggag tcaggtgtgt gctcagccaccctctgccccattcctagac1389 cctcacccccaccactgttc ctgtgtcttc atgagctgtcccttacaggcaggggcttcc1449 cacaggctgggggcctcggg gcggggagca tgagctgggctggcaccacgactgagggct1509 cccggcccggcttcttcccc acagcaggct gctcagagggggtgctgccgggactgccat1569 gcccacctgagaggggcctg gggtggccgt cctcggccggttagggaatttggggtgagg1629 ttcctcaggagccctcactc tgcctgtgga cgctgcacctgccacttaaagaccccaaag1689 actctgttgggaactgttgt caataaaatg tttctgaggaaaaaaaaaaaaaaaa 1744 <210>
42 <211> 946 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 123..464 <220>
<221> sig_peptide <222> 123..269 <223> Von Heijne matrix score 4.90 seq PSLAAGLLFGSLA/GL
<220>
<221> polyA_signal <222> 908..913 <220>
<221> polyA_site <222> 931..946 <400>

aaatcgcgtt gtgtcccgcg gcttgcgctc 60 tccggagaga cgtagtggac cctggctgct tccgcgggcc ggtagtctcc tttctggact ttcggcagat gagaagagaa gcaggcctgg ga ta 167 atg gtg gag cct aag ttg ccc cat ctc tgg ttc ttt cca ggc t Met eu Glu Val Lys Pro Pro Leu Leu His Phe Trp Pro Phe L Gly tttggc aca gcactggtt gtt tctggtgggatc gttggc tatgta 215 tac PheGly Thr AlaLeuVal Val SerGlyGlyIle ValGly TyrVal Tyr aaaaca agc gtgccgtcc ctg getgcagggctg ctcttc ggcagt 263 ggc LysThr Ser ValProSer Leu AlaAlaGlyLeu LeuPhe GlySer Gly ctagcc ctg ggtgettac cag ctgtatcaggat ccaagg aacgtt 311 ggc LeuAla Leu GlyAlaTyr Gln LeuTyrGlnAsp ProArg AsnVal Gly tggggt cta gccgetaca tct gttacttttgtt ggtgtt atggga 359 ttc TrpGly Leu AlaAlaThr Ser ValThrPheVal GlyVal MetGly Phe atgaga tac tactatgga aaa ttcatgcctgta ggttta attgca 407 tcc MetArg Tyr TyrTyrGly Lys PheMetProVal GlyLeu IleAla Ser ggtgcc ttg ctgatggcc gcc aaagttggagtt cgtatg ttgatg 455 agt Gly Ala Ser Leu Leu Met Ala Ala Lys Val Gly Val Arg Met Leu Met aca tct tagcagaagtcatgttccagcttggactcatgaaggatta 504 gat Thr Ser Asp aaaatctgcatcttccactattttcaatgtattaagagaaataagtgcagcatttttgca564 tctgacattttacctaaaaaaaaaaagacaccaaatttggcggaggggtggaaaatcagt624 tgttaccattataaccctacagaggtggtgagcatgtaacatgagcttattgagaccatc684 atagagatcgattcttgtatattgattttatctctttctgtatctataggtaaatctcaa744 gggtaaaatgttaggtgttgacattgagaaccctgaaaccccattccctgctcagaggaa804 cagtgtgaaaaaaaatctcttgagagatttagaatatcttttcttttgctcatcttagac864 cacagactgactttgaaattatgttaagtgaaatatcaatgaaaataaagtttactataa924 ataattaaaaaaaaaaaaaaas 946 <210> 43 <211> 1622 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 85..1230 <220>
<221> sig_peptide <222> 85..129 <223> Von Heijne matrix score 10.10 seq LLLPLALCILVLC/CG
<220>
<221> polyA_signal <222> 1589..1594 <220>
<221> polyA site <222> 1607..1622 <400> 43 aaagtctgcc ttaaagagcc ttacaagcca gccagtccct gcagctccac aaactgaccc 60 atcctgggcc ttgttctcca caga atg ggt ctg ctc ctt ccc ctg gca ctc 111 Met Gly Leu Leu Leu Pro Leu Ala Leu -is -to tgcatccta gtcctgtgc tgcgga gcaatgtct ccaccc cagctggcc 159 CysIleLeu ValLeuCys CysGly AlaMetSer ProPro GlnLeuAla ctcaacccc tcggetctg ctctcc cggggctgc aatgac tcagatgtg 207 LeuAsnPro SerAlaLeu LeuSer ArgGlyCys AsnAsp SerAspVal ctggcagtt gcaggcttt gccctg cgggatatt aacaaa gacagaaag 255 LeuAlaVal AlaGlyPhe AlaLeu ArgAspIle AsnLys AspArgLys gatggctat gtgctgaga ctcaac cgagtgaac gacgcc caggaatac 303 AspGlyTyr ValLeuArg LeuAsn ArgValAsn AspAla GlnGluTyr agacggggt ggcctggga tctctg ttctatctt acactg gatgtgcta 351 ArgArgGly GlyLeuGly SerLeu PheTyrLeu ThrLeu AspValLeu gagactgac tgccatgtg ctcaga aagaaggca tggcaa gactgtgga 399 GluThrAsp CysHisVal LeuArg LysLysAla TrpGln AspCysGly atgaggata ttttttgaa tcagtt tatggtcaa tgcaaa gcaatattt 447 MetArgIle PhePheGlu SerVal TyrGlyGln CysLys AlaIlePhe tatatgaac aacccaagt agagtt ctctattta getget tataactgt 495 TyrMetAsn AsnProSer ArgVal LeuTyrLeu AlaAla TyrAsnCys actcttcgc ccagtttca aaaaaa aagatttac atgacg tgccctgac 543 ThrLeuArg ProValSer LysLys LysIleTyr MetThr CysProAsp tgcccaagc tccataccc actgac tcttccaat caccaa gtgctggag 591 CysProSer SerIlePro ThrAsp SerSerAsn HisGln ValLeuGlu getgccacc gagtctctt gcgaaa tacaacaat gagaac acatccaag 639 AlaAlaThr GluSerLeu AlaLys TyrAsnAsn GluAsn ThrSerLys cagtattct ctcttcaaa gtcacc agggettct agccag tgggtggtc 687 GlnTyrSer LeuPheLys ValThr ArgAlaSer SerGln TrpValVal ggcccttct tactttgtg gaatac ttaattaaa gaatca ccatgtact 735 GlyProSer TyrPheVal GluTyr LeuIleLys GluSer ProCysThr aaatcccag gccagcagc tgttca cttcagtcc tccgac tctgtgcct 783 LysSerGln AlaSerSer CysSer LeuGlnSer SerAsp SerValPro gtt ggt tgc aaaggt tctctgact cgaaca cactgggaa aagttt 831 ctt Val Gly Cys LysGly SerLeuThr ArgThr HisTrpGlu LysPhe Leu gtc tct act tgtgac ttctttgaa tcacag getccagcc actgga 879 gtg Val Ser Thr CysAsp PhePheGlu 
SerGln AlaProAla ThrGly Val agt gaa tct gctgtt aaccagaaa cctaca aaccttccc aaggtg 927 aac Ser Glu Ser AlaVal AsnGlnLys ProThr AsnLeuPro LysVal Asn gaa gaa cag cagaaa aacaccccc ccaaca gactccccc tccaaa 975 tcc Glu Glu Gln GlnLys AsnThrPro ProThr AspSerPro SerLys Ser gct ggg aga ggatct gtccaatat cttcct gacttggat gataaa 1023 cca Ala Gly Arg GlySer ValGlnTyr LeuPro AspLeuAsp AspLys Pro aat tcc gaa aagggc cctcaggag gccttt cctgtgcat ctggac 1071 cag Asn Ser Glu LysGly ProGlnGlu AlaPhe ProValHis LeuAsp Gln cta acc aat ccccag ggagaaacc ctggat atttccttc ctcttc 1119 acg Leu Thr Asn ProGln GlyGluThr LeuAsp IleSerPhe LeuPhe Thr ctg gag atg gaggag aagctggtg gtcctg cctttcccc aaagaa 1167 cct Leu Glu Met GluGlu LysLeuVal ValLeu ProPhePro LysGlu Pro aaa gca act gctgag tgcccaggg ccagcc cagaatgcc agccct 1215 cgc Lys Ala Thr AlaGlu CysProGly ProAla GlnAsnAla SerPro Arg ctt gtc ccg ccatgagaatcac acagagtctt ggtgcgcc 1270 ctt ctgtaggggt at Leu Val Pro Pro Leu gcatgacatgggaggcga tg g agagacag agcgtgcaca cgtagagtgg gggacgatg ac ctagtgaaggacgccttttt t tctcagca tgttgactgg gattggaaat 1390 gactcttct gg aatgagactgagccctcggc tgggctgca ctaccctgtacactgcct tgtaccctga1450 t ct gctgcatcacctcctaaact agcagtctc accatggagagatgcctc tcttatgtct1510 g at tcagccactcacttataaag tacttatct tcagcagtatatatgtgc tgaaatctca1570 a tt gcatgaaagcattgcatgag aaagatact ccctaaaaaaaaaaaaaa aa 1622 t tt <210> 44

<211> 715 <212> DNA

<213> Homo Sapiens <220>
<221> CDS
<222> 29..664 <220>
<221> sig_peptide <222> 29..619 <223> Von Heijne matrix score 4.80 seq SFFGASFLMGSLG/GM
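Each `<222>` range in these feature tables is a 1-based, inclusive span of the cDNA. For SEQ ID NO: 44 the CDS 29..664 covers 636 nt, i.e. 212 codons (the TAG stop at 665..667 lies just outside it), and the sig_peptide 29..619 covers 591 nt, or 197 residues, so the predicted mature protein would begin at residue 198, the G that follows the slash in SFFGASFLMGSLG/GM. A minimal Python sketch of that arithmetic follows; the helper names are illustrative and not part of the listing.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive <222> span such as 29..664."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Codons in a coding span; raises if the span is not a whole number of codons."""
    length = span_length(start, end)
    if length % 3:
        raise ValueError(f"{start}..{end} is {length} nt, not a multiple of 3")
    return length // 3


def residue_number(cds_start: int, position: int) -> int:
    """1-based residue number (initiator Met = 1) of the codon containing `position`."""
    return (position - cds_start) // 3 + 1


# SEQ ID NO: 44 as annotated: CDS 29..664, sig_peptide 29..619.
print(codon_count(29, 664))     # 212
print(codon_count(29, 619))     # 197
print(residue_number(29, 620))  # 198, the +1 residue of the predicted mature protein
```

Raising on spans that are not a multiple of three is deliberate: in an OCR-damaged listing, a coordinate that does not divide evenly is a cheap signal that a digit was misread.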
<220>
<221> polyA_signal <222> 657..662 <220>
<221> polyA_site <222> 699..715 <220>
<221> misc_feature <222> 295,357 <223> n=a, g, c or t <220>
<221> unsure <222> -88 <223> Xaa = Ala,Asp,Gly,Val <220>
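The n positions and the Xaa annotations for SEQ ID NO: 44 are linked. The n at 295 sits in the codon gan (nt 293..295), which can only encode Asp or Glu, and the n at 357 sits in gnc (nt 356..358), which can encode Ala, Asp, Gly or Val, matching the unsure entries at residues -109 and -88. The sketch below, assuming the standard genetic code (NCBI translation table 1), expands each n to every base and reports the residues that remain possible. It accepts only a, c, g, t and n, so a stray character such as the "e" in an OCR-damaged codon raises a KeyError.

```python
from itertools import product
from typing import Set

BASES = "tcag"
# NCBI translation table 1 (standard code); codons are enumerated with the
# first base varying slowest, in t, c, a, g order. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def possible_residues(codon: str) -> Set[str]:
    """One-letter residues a codon can encode, expanding each 'n' to a, c, g, t."""
    choices = [BASES if base == "n" else base for base in codon.lower()]
    return {CODON_TABLE["".join(bases)] for bases in product(*choices)}


def translate(cds: str) -> str:
    """Translate in frame 1; ambiguous codons become 'X', a trailing partial codon is ignored."""
    residues = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        options = possible_residues(cds[i:i + 3])
        residues.append(options.pop() if len(options) == 1 else "X")
    return "".join(residues)


print(sorted(possible_residues("gan")))       # ['D', 'E']
print(sorted(possible_residues("gnc")))       # ['A', 'D', 'G', 'V']
print(translate("atggcgacccccaacaatctgacc"))  # MATPNNLT
```

Run over a whole CDS, `translate` also returns "*" for any codon printed as tag; in SEQ ID NO: 52, for example, tag appears opposite Gln, so such positions are worth checking against the original filing.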
<221> unsure <222> -109 <223> Xaa = Asp,Glu <400> 44 cttttcctgc ctctgattcc gggctgtc atg gcg acc ccc aac aat ctg acc 52 Met Ala Thr Pro Asn Asn Leu Thr ccc acc aac tgc agc tgg tgg ccc atc tcc gcg ctg gag agc gat gcg 100 Pro Thr Asn Cys Ser Trp Trp Pro Ile Ser Ala Leu Glu Ser Asp Ala WO 00/37491 PCT/IB99/02058 gccaagcca gcggag gcccccgac gctcccgag gcggccagc cccgcc 148 AlaLysPro AlaGlu AlaProAsp AlaProGlu AlaAlaSer ProAla cattggccc agggag agcctggtt ctgtaccac tggacccag tccttc 196 HisTrpPro ArgGlu SerLeuVal LeuTyrHis TrpThrGln SerPhe agctcgcag aaggcc aagatcttg gagcatgat gatgtgagc tacctg 244 SerSerGln LysAla LysIleLeu GluHisAsp AspValSer TyrLeu aagaagatc ctcggg gaactggcc atggtgctg gaccagatt gaggcg 292 LysLysIle LeuGly GluLeuAla MetValLeu AspGlnIle GluAla ganctggag aagagg aagctggag aacgagggg cagaaatgc gagctg 340 XaaLeuGlu LysArg LysLeuGlu AsnGluGly GlnLysCys GluLeu tggctctgt ggctgt gncttcacc ctcgctgat gtcctcctg ggagcc 388

TrpLeuCys GlyCys XaaPheThr LeuAlaAsp ValLeuLeu GlyAla accctgcac cgcctc aagttcctg ggactgtcc aagaaatac tgggaa 436 ThrLeuHis ArgLeu LysPheLeu GlyLeuSer LysLysTyr TrpGlu gatggcagc cggccc aacctgcag tccttcttt gagagggtc cagaga 484 AspGlySer ArgPro AsnLeuGln SerPhePhe GluArgVal GlnArg cgctttgcc ttccgg aaagtcctg ggtgacatc cacaccacc ctgctg 532 ArgPheAla PheArg LysValLeu GlyAspIle HisThrThr LeuLeu tcggccgtc atcccc aatgctttc cggctggtc aagaggaaa ccccca 580 SerAlaVal IlePro AsnAlaPhe ArgLeuVal LysArgLys ProPro tccttcttc ggggcg tccttcctc atgggctcc ctgggtggg atgggc 628 SerPhePhe GlyAla SerPheLeu MetGlySer LeuGlyGly MetGly tactttgcc tactgg tacctcaag aaaaaatac atctagggccagg 674 TyrPheAla TyrTrp TyrLeuLys LysLysTyr Ile cctggggctt a 715 ggtgtctgac tgccaaaaaa aaaaaaaaaa <210> 45 <211> 1549 <212> DNA

<213> Homo Sapiens <220>
<221> CDS
<222> 18..878 <220>
<221> sig_peptide <222> 18..95 <223> Von Heijne matrix score 6.30 seq GVGLVTLLGLAVG/SY
<220>
<221> polyA-signal <222> 1500..1505 <220>
<221> polyA_site <222> 1533..1549 <220>
<221> misc_feature <222> 944 <223> n=a, g, c or t <400> 45 ggaaaaggcg ctccgtc atg ggg atc cag acg agc ccc gtc ctg ctg gcc Met Gly Ile Gln Thr Ser Pro Val Leu Leu Ala tccctg ggggtgggg ctg act ctc ggc ctg gtgggc tcc 98 gtc ctg gct SerLeu GlyValGly Leu Thr Leu Gly Leu ValGly Ser Val Leu Ala tacttg gttcggagg tcc cgg cag gtc act ctggac ccc 146 cgc cct ctc TyrLeu ValArgArg Ser Arg Gln Val Thr LeuAsp Pro Arg Pro Leu aat gaa aag tac ctg cta cga ctg cta gac aag acg ctc tct gca cgg 194 Asn Glu Lys Tyr Leu Leu Arg Leu Leu Asp Lys Thr Leu Ser Ala Arg tcc cca ggc aaa cat atc tac ctc tcc acc cga att gat ggc agc ctg 242 Ser Pro Gly Lys His Ile Tyr Leu Ser Thr Arg Ile Asp Gly Ser Leu gtc atc agg cca tac act cct gtc acc agt gat gag gat caa ggc tat 290 Val Ile Arg Pro Tyr Thr Pro Val Thr Ser Asp Glu Asp Gln Gly Tyr gtggatctt gtcatcaag gtctac ctgaagggt gtgcacccc aaattt 338 ValAspLeu ValIleLys ValTyr LeuLysGly ValHisPro LysPhe cctgaggga gggaagatg tctcag tacctggat agcctgaag gttggg 386 ProGluGly GlyLysMet SerGln TyrLeuAsp SerLeuLys ValGly gatgtggtg gagtttcgg gggcca agcgggttg ctcacttac actgga 434 AspValVal GluPheArg GlyPro SerGlyLeu LeuThrTyr ThrGly aaagggcat tttaacatt cagccc aacaagaaa tctccacca gaaccc 482 LysGlyHis PheAsnIle GlnPro AsnLysLys SerProPro GluPro cgagtggcg aagaaactg ggaatg attgccggc gggacagga atcacc 530 ArgValAla LysLysLeu GlyMet IleAlaGly GlyThrGly IleThr ccaatgcta cagctgatc cgggcc atcctgaaa gtccctgaa gatcca 578 ProMetLeu GlnLeuIle ArgAla IleLeuLys ValProGlu AspPro acccagtgc tttctgctt tttgcc aaccagaca gaaaaggat atcatc 626 ThrGlnCys PheLeuLeu PheAla AsnGlnThr GluLysAsp IleIle ttgcgggag gacttagag gaactg caggcccgc tatcccaat cgcttt 674 LeuArgGlu AspLeuGlu GluLeu GlnAlaArg TyrProAsn ArgPhe aagctctgg ttcactctg gatcat cccccaaaa gattgggcc tacagc 722 LysLeuTrp PheThrLeu AspHis ProProLys AspTrpAla TyrSer aagggcttt gtgactgcc gacatg atccgggaa cacctgccc gctcca 770 LysGlyPhe ValThrAla AspMet IleArgGlu HisLeuPro AlaPro ggggatgat gtgctggta ctgctt tgtgggcca
cccccaatg gtgcag 818 GlyAspAsp ValLeuVal LeuLeu CysGlyPro ProProMet ValGln ctggcctgc catcccaac ttggac aaactgggc tactcacaa aagatg 866 LeuAlaCys HisProAsn LeuAsp LysLeuGly TyrSerGln LysMet cgattcacc tactgagcatcct cgctgcagtt 918 ccagcttccc tggtgctgtt ArgPheThr Tyr gttccccatc cttagattcc tttcctcaga 978 agtactcaag gtttcaggtt cactanaagc ttttcagttacatctagagctgaaatctggatagtacctgcaggaacaatattcctgtag1038 ccatggaagagggccaaggctcagtcactccttggatggcctcctaaatctccccgtggc1098 aacaggtccaggagaggcccatggagcagtctcttccatggagtaagaaggaagggagca1158 tgtacgcttggtccaagattggctagttccttgatagcatcttactctcaccttctttgt1218 gtctgtgatgaaaggaacagtctgtgcaatgggttttacttaaacttcactgttcaacct1278 atgagcaaatctgtatgtgtgagtataagttgagcatagcatacttccagaggtggtctt1338 atggagatggcaagaaaggaggaaatgatttcttcagatctcaaaggagtctgaaatatc1398 atatttctgtgtgtgtctctctcagcccctgcccaggctagagggaaacagctactgata1458 atcgaaaactgctgtttgtggcaggaacccctggctgtgcaaataaatggggctgaggcc1518 cctgtgtgatattcaaaaaaaaaaaaaaaaa 1549 <210> 46 <211> 1328 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 73..1008 <220>
<221> sig_peptide <222> 73..147 <223> Von Heijne matrix score 14.10 seq LTLLLLLTLLAFA/GY
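Each "seq" string after a Von Heijne matrix score lists the 13 residues before the predicted cleavage site and the 2 after it (positions -13 to +2), the window used by von Heijne's weight-matrix method, whose score in its usual form is a sum of one position-specific weight per residue. The sketch below shows only that arithmetic. The weight table is a toy placeholder that rewards small residues at -3 and -1 (the familiar (-3,-1) rule); it is not the published matrix, so it will not reproduce values such as 14.10.

```python
from typing import Dict, List, Tuple


def split_site(seq_field: str) -> Tuple[str, str]:
    """Split a field such as 'LTLLLLLTLLAFA/GY' into positions -13..-1 and +1..+2."""
    upstream, downstream = seq_field.split("/")
    if len(upstream) != 13 or len(downstream) != 2:
        raise ValueError(f"expected 13 + 2 residues, got {seq_field!r}")
    return upstream, downstream


def window_score(seq_field: str, weights: List[Dict[str, float]], default: float = 0.0) -> float:
    """Sum one weight per window position; residues missing from a table score `default`."""
    upstream, downstream = split_site(seq_field)
    window = upstream + downstream
    if len(weights) != len(window):
        raise ValueError("need one weight table per window position")
    return sum(table.get(residue, default) for residue, table in zip(window, weights))


# Toy matrix (NOT von Heijne's published weights): reward small residues at -3 and -1.
SMALL = {"A": 1.0, "G": 1.0, "S": 1.0}
toy_weights: List[Dict[str, float]] = [{} for _ in range(15)]
toy_weights[10] = SMALL  # index 10 is position -3
toy_weights[12] = SMALL  # index 12 is position -1

print(window_score("LTLLLLLTLLAFA/GY", toy_weights))  # 2.0
print(window_score("GTVVLVAGTLCFA/WW", toy_weights))  # 1.0
```

The shape check in `split_site` also catches annotation strings that do not contain exactly 13 + 2 residues, which in an OCR-derived listing usually means characters were merged or dropped.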
<220>
<221> polyA_signal <222> 1286..1291 <220>
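For SEQ ID NO: 46 the polyA_signal is annotated at 1286..1291 and the polyA_site at 1312..1328. In the last line of that sequence, which starts at 1269 and ends at 1328, the hexamer aataaa begins at 1286. A small Python sketch can locate such hexamers and report them in listing coordinates; limiting the search to AATAAA and its common ATTAAA variant is an assumption of the sketch, not a description of how the listing's annotations were produced.

```python
import re
from typing import List, Tuple

# Assumed motif set: canonical AATAAA plus the common ATTAAA variant.
POLYA_SIGNALS = ("aataaa", "attaaa")


def find_polya_signals(seq: str, offset: int = 1) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for every signal hexamer, 1-based and inclusive.

    `offset` is the listing coordinate of seq[0], so a fragment cut from the
    middle of a record is reported in that record's numbering.
    """
    seq = seq.lower()
    hits = []
    for motif in POLYA_SIGNALS:
        # A zero-width lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?={motif})", seq):
            start = match.start() + offset
            hits.append((start, start + len(motif) - 1, motif))
    return sorted(hits)


# First 30 nt of the last sequence line of SEQ ID NO: 46, which begins at 1269.
print(find_polya_signals("gagcttccaggacccagaataaagccaatg", offset=1269))
# [(1286, 1291, 'aataaa')]
```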
<221> polyA_site <222> 1312..1328 <400> 46 actgcgcgga tcggcgtccg cagcgggcgg ctgctgagct gccttgaggt gcagtgttgg 60 ggatccagag cc atg tcg gac ctg cta cta ctg ggc ctg att ggg ggc ctg 111 Met Ser Asp Leu Leu Leu Leu Gly Leu Ile Gly Gly Leu -25 -20 -15 act ctc tta ctg ctg ctg acg ctg cta gcc ttt gcc ggg tac tca ggg 159 Thr Leu Leu Leu Leu Leu Thr Leu Leu Ala Phe Ala Gly Tyr Ser Gly -10 -5 1 ctactg gctggggtg gaagtgagt gctgggtca cccccc atccgcaac 207 LeuLeu AlaGlyVal GluValSer AlaGlySer ProPro IleArgAsn gtcact gtggcctac aagttccac atggggctc tatggt gagactggg 255 ValThr ValAlaTyr LysPheHis MetGlyLeu TyrGly GluThrGly cggctt ttcactgag agctgcatc tctcccaag ctccgc tccatcgct 303 ArgLeu PheThrGlu SerCysIle SerProLys LeuArg SerIleAla gtctac tatgacaac ccccacatg gtgccccct gataag tgccgatgt 351 ValTyr TyrAspAsn ProHisMet ValProPro AspLys CysArgCys gccgtg ggcagcatc ctgagtgaa ggtgaggaa tcgccc tcccctgag 399 AlaVal GlySerIle LeuSerGlu GlyGluGlu SerPro SerProGlu ctcatc gacctctac cagaaattt ggcttcaag gtgttc tccttcccg 447 LeuIle AspLeuTyr GlnLysPhe GlyPheLys ValPhe SerPhePro gcaccc agccatgtg gtgacagcc accttcccc tacacc accattctg 495 AlaPro SerHisVal ValThrAla ThrPhePro TyrThr ThrIleLeu tccatc tggctggct acccgccgt gtccatcct gccttg gacacctac 543 SerIle TrpLeuAla ThrArgArg ValHisPro AlaLeu AspThrTyr atcaag gagcggaag ctgtgtgcc tatcctcgg ctggag atctaccag 591 IleLys GluArgLys LeuCysAla TyrProArg LeuGlu IleTyrGln gaagac cagatccat ttcatgtgc ccactggca cggcag ggagacttc 639 GluAsp GlnIleHis PheMetCys ProLeuAla ArgGln GlyAspPhe tatgtg cctgagatg aaggagaca gagtggaaa tggcgg gggcttgtg 687 TyrVal ProGluMet LysGluThr GluTrpLys TrpArg GlyLeuVal gaggcc attgacacc caggtggat ggcacagga gctgac acaatgagt 735 GluAla IleAspThr GlnValAsp GlyThrGly AlaAsp ThrMetSer gacacg agttctgta agcttggaa gtgagccct ggcagc cgggagact 783 AspThr SerSerVal SerLeuGlu ValSerPro GlySer ArgGluThr 200 205 210 tcagct gccacactg tcacctggg gcgagcagc cgtggc tgggatgac 831 SerAla AlaThrLeu SerProGly AlaSerSer ArgGly TrpAspAsp
ggt gac cgc agc gag cac tac agc tca ggt agc ggc 879 acc agc gag gcc Gly Asp Arg Ser Glu His Tyr Ser Ser Gly Ser Gly Thr Ser Glu Ala tcc tct gag gag ctg gac gag ggc ggg ccc ggg gag 927 ttt ttg gag tta Ser Ser Glu Glu Leu Asp Glu Gly Gly Pro Gly Glu Phe Leu Glu Leu tca cgg gac cct ggg act ccc ctg act acc tgg ctc 975 ctg gag ggg aag Ser Arg Asp Pro Gly Thr Pro Leu Thr Thr Trp Leu Leu Glu Gly Lys tgg gag act gcc cct gag ggc aag taacccatggcctgcaccct1028 ccc aag gag Trp Glu Thr Ala Pro Glu Gly Lys Pro Lys Glu cctgcagtgcagttgctgag gaactgagcagactctccagcagactctccagccctcttc1088 ctccttcctctgggggagga ggggttcctgagggacctgacttcccctgctccaggcctc1148 ttgctaagccttctcctcac tgccctttaggctcccagggccagaggagccagggactat1208 tttctgcaccagcccccagg gctgccacccctgttgtgtctttttttcagactcacagtg1268 gagcttccaggacccagaat aaagccaatgatttacttgtttcaaaaaaaaaaaaaaaaa1328 <210> 47 <211> 1515 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 165..842 <220>
<221> sig_peptide <222> 165..251 <223> Von Heijne matrix score 7.00 seq LASFAALVLVCRQ/RY
<220>
<221> polyA_signal <222> 1474..1479 <220>
<221> polyA_site <222> 1500..1515 <400> 47

agtcgcgg ga gcgcccggg tgaggcc ctcaggtctctgc tcgtg 60 t agccacagcc aggtg gaggaacc ta tccccaatttgcca cttccagcag gccca 120 gcacctgcca cttta tcctct tgaggagg at caggagccctc tggaagcatggag gtg 176 gtgaccggga act ctgagt MetGlu Val Thr gtgattgtt gccataggtgtg ctggcc accatcttt ctggct tcgttt 224 ValIleVal AlaIleGlyVal LeuAla ThrIlePhe LeuAla SerPhe gcagccttg gtgctggtttgc aggcag cgctactgc cggccg cgagac 272 AlaAlaLeu ValLeuValCys ArgGln ArgTyrCys ArgPro ArgAsp ctgctgcag cgctatgattct aagccc attgtggac ctcatt ggtgcc 320 LeuLeuGln ArgTyrAspSer LysPro IleValAsp LeuIle GlyAla atggagacc cagtctgagccc tctgag ttagaactg gacgat gtcgtt 368 MetGluThr GlnSerGluPro SerGlu LeuGluLeu AspAsp ValVal atcaccaac ccccacattgag gccatt ctggagaat gaagac tggatc 416 IleThrAsn ProHisIleGlu AlaIle LeuGluAsn GluAsp TrpIle gaagatgcc tcgggtctcatg tcccac tgcattgcc atcttg aagatt 464 GluAspAla SerGlyLeuMet SerHis CysIleAla IleLeu LysIle tgtcacact ctgacagagaag cttgtt gccatgaca atgggc tctggg 512 CysHisThr LeuThrGluLys LeuVal AlaMetThr MetGly SerGly gccaagatg aagacttcagcc agtgtc agcgacatc attgtg gtggcc 560 AlaLysMet LysThrSerAla SerVal SerAspIle IleVal ValAla aagcggatc agccccagggtg gatgat gttgtgaag tcgatg taccct 608 LysArgIle SerProArgVal AspAsp ValValLys SerMet TyrPro ccgttggac cccaaactcctg gacgca cggacgact gccctg ctcctg 656 ProLeuAsp ProLysLeuLeu AspAla ArgThrThr AlaLeu LeuLeu tctgtcagt cacctggtgctg gtgaca aggaatgcc tgccat ctgacg 704 SerValSer HisLeuValLeu ValThr ArgAsnAla CysHis LeuThr ggaggcctg gactggattgac cagtct ctgtcggct gctgag gagcat 752 GlyGlyLeu AspTrpIleAsp GlnSer LeuSerAla AlaGlu GluHis ttg gaa gtc ctt cga gaa gca gcc cta gct tct gag cca gat aaa ggc 800 Leu Glu Val Leu Arg Glu Ala Ala Leu Ala Ser Glu Pro Asp Lys Gly ctc cca ggc cct gaa ggc ttc ctg cag gag cag tct gca att 842 Leu Pro Gly Pro Glu Gly Phe Leu Gln Glu Gln Ser Ala Ile tagtgcctacaggccagcagctagccatgaaggcccctgccgccatccctggatggctca902 gcttagccttctactttttcctatagagttagttgttctccacggctggagagttcagct962
gtgtgtgcatagtaaagcaggagatccccgtcagtttatgcctcttttgcagttgcaaac1022 tgtggctggtgagtggcagtctaatactacagttaggggagatgccattcactctctgca1082 agaggagtattgaaaactggtggactgtcagctttatttagctcacctagtgttttcaag1142 aaaattgagccaccgtctaagaaatcaagaggtttcacattaaaattagaatttctggcc1202 tctctcgatcggtcagaatgtgtggcaattctgatctgcattttcagaagaggacaatca1262 attgaaactaagtaggggtttcttcttttggcaagacttgtactctctcacctggcctgt1322 ttcatttatttgtattatctgcctggtccctgaggcgtctgggtctctcctctcccttgc1382 aggtttgggtttgaagctgaggaactacaaagttgatgatttcttttttatctttatgcc1442 tgcaattttacctagctaccactaggtggatagtaaatttatacttatgtttcccccaaa1502 aaaaaaaaaaaaa 1515 <210> 48 <211> 1622 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 31..1248 <220>
<221> sig_peptide <222> 31..135 <223> Von Heijne matrix score 6.30 seq TLLLFAAPFGLLG/EK
<220>
<221> polyA_signal <222> 1580..1585 <220>
<221> polyA_site <222> 1607..1622 <400> 48 aacctcttcc gtcggctgaa ttgcggccgt atg cgc ggc tct gtg gag tgc acc 54 Met Arg Gly Ser Val Glu Cys Thr tgg ggt tgg ggg cac tgt gcc ccc agc ccc ctg ctc ctt tgg act cta 102 Trp Gly Trp Gly His Cys Ala Pro Ser Pro Leu Leu Leu Trp Thr Leu ctt ctg ttt gca gcc cca ttt ggc ctg ctg ggg gag aag acc cgc cag 150 Leu Leu Phe Ala Ala Pro Phe Gly Leu Leu Gly Glu Lys Thr Arg Gln gtg tctctggag gtcatccct aactggctg ggccccctg cagaacctg 198 Val SerLeuGlu ValIlePro AsnTrpLeu GlyProLeu GlnAsnLeu ctt catatacgg gcagtgggc accaattcc acactgcac tatgtgtgg 246 Leu HisIleArg AlaValGly ThrAsnSer ThrLeuHis TyrValTrp agc agcctgggg cctctggca gtggtaatg gtggccacc aacaccccc 294 Ser SerLeuGly ProLeuAla ValValMet ValAlaThr AsnThrPro cac agcaccctg agcgtcaac tggagcctc ctgctatcc cctgagccc 342 His SerThrLeu SerValAsn TrpSerLeu LeuLeuSer ProGluPro gat gggggcctg atggtgctc cctaaggac agcattcag ttttcttct 390 Asp GlyGlyLeu MetValLeu ProLysAsp SerIleGln PheSerSer gcc cttgttttt accaggctg cttgagttt gacagcacc aacgtgtcc 438 Ala LeuValPhe ThrArgLeu LeuGluPhe AspSerThr AsnValSer 90 95 100 gat acggcagca aagcctttg ggaagacca tatcctcca tactccttg 486 Asp ThrAlaAla LysProLeu GlyArgPro TyrProPro TyrSerLeu gcc gatttctct tggaacaac atcactgat tcattggat cctgccacc 534 Ala AspPheSer TrpAsnAsn IleThrAsp SerLeuAsp ProAlaThr ctg agtgccaca tttcaaggc caccccatg aacgaccct accaggact 582 Leu SerAlaThr PheGlnGly HisProMet AsnAspPro ThrArgThr ttt gccaatggc agcctggcc ttcagggtc caggccttt tccaggtcc 630 Phe AlaAsnGly SerLeuAla PheArgVal GlnAlaPhe SerArgSer agc cgaccagcc caaccccct cgcctcctg cacacagca gacacctgt 678 Ser Arg Pro Ala Gln Pro Pro Arg Leu Leu His Thr Ala Asp Thr Cys cag cta gag gtg gcc ctg att gga gcc tct ccc cgg gga aac cgt tcc 726 Gln Leu Glu Val Ala Leu Ile Gly Ala Ser Pro Arg Gly Asn Arg Ser ctg tttgggctggag gtagcc acattgggc cagggc cctgactgc ccc 774 Leu PheGlyLeuGlu ValAla ThrLeuGly GlnGly ProAspCys Pro tca atgcaggagcag cactcc atcgacgat gaatat gcaccggcc gtc 822 Ser MetGlnGluGln HisSer IleAspAsp
GluTyr AlaProAla Val ttc cagttggaccag ctactg tggggctcc ctccca tcaggcttt gca 870 Phe GlnLeuAspGln LeuLeu TrpGlySer LeuPro SerGlyPhe Ala cag tggcgaccagtg gcttac tcccagaag ccgggg ggccgagaa tca 918 Gln TrpArgProVal AlaTyr SerGlnLys ProGly GlyArgGlu Ser gcc ctgccctgccaa gcttcc cctcttcat cctgcc ttagcatac tct 966 Ala LeuProCysGln AlaSer ProLeuHis ProAla LeuAlaTyr Ser ctt ccccagtcaccc attgtc cgagccttc tttggg tcccagaat aac 1014 Leu ProGlnSerPro IleVal ArgAlaPhe PheGly SerGlnAsn Asn ttc tgtgccttcaat ctgacg ttcggggct tccaca ggccctggc tat 1062 Phe CysAlaPheAsn LeuThr PheGlyAla SerThr GlyProGly Tyr tgg gaccaacactac ctcagc tggtcgatg ctcctg ggtgtgggc ttc 1110 Trp AspGlnHisTyr LeuSer TrpSerMet LeuLeu GlyValGly Phe cct ccagtggacggc ttgtcc ccactagtc ctgggc atcatggca gtg 1158 Pro ProValAspGly LeuSer ProLeuVal LeuGly IleMetAla Val gcc ctgggtgcccca gggctc atgctgcta gggggc ggcttggtt ctg 1206 Ala LeuGlyAlaPro GlyLeu MetLeuLeu GlyGly GlyLeuVal Leu ctg ctgcaccacaag aagtac tcagagtac cagtcc ataaat 1248 Leu LeuHisHisLys LysTyr SerGluTyr GlnSer IleAsn taaggcc cgctctctggagg aaggacat t tgaacctg tcttgct gtgcctcgaaact1308 g ac ctggaggttg gagcatcaag tccagccc c tcactccccca tcttgcttttctgtgga1368 t ct acc tcagagg ccagcctcga ttcctggag ccccaggtggggctt ccttcatactttg1428 c ac ttgggggact ttggaggcgg caggggaca gctattgataaggtc cccttggtgttgc1488 g gg cttcttgcat ctccacacat ttcccttgga tgggacttgc aggcctaaat gagaggcatt 1548 ctgactggtt ggctgccctg gaaggcaaga aaatagattt attttttttt cacagggcaa 1608 aaaaaaaaaa aaaa 1622 <210> 49 <211> 1448 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 131..490 <220>
<221> sig_peptide <222> 131..301 <223> Von Heijne matrix score 5.30 seq AIALATVLFLIGA/FL
<220>
<221> polyA_signal <222> 1411..1416 <220>
<221> polyA_site <222> 1434..1448 <400> 49 ctgatcccgc ctggggccgg ctgagtggca cttaagcggg ccatgccatg caaccttggg 60 cgctgccaac cgtgggcgag ctctgggtgt gcgggcggcc tcgcgcggcg ctccgctgtg 120 tcagcgtgtt atg atg ccg tcc cgt acc aac ctg gct act gga atc ccc 169 Met Met Pro Ser Arg Thr Asn Leu Ala Thr Gly Ile Pro agt agt aaa gtg aaa tat tca agg ctc tcc agc aca gac gat ggc tac 217 Ser Ser Lys Val Lys Tyr Ser Arg Leu Ser Ser Thr Asp Asp Gly Tyr att gac ctt cag ttt aag aaa acc cct cct aag atc cct tat aag gcc 265 Ile Asp Leu Gln Phe Lys Lys Thr Pro Pro Lys Ile Pro Tyr Lys Ala atc gca ctt gcc act gtg ctg ttt ttg att ggc gcc ttt ctc att att 313 Ile Ala Leu Ala Thr Val Leu Phe Leu Ile Gly Ala Phe Leu Ile Ile ata ggc tcc ctc ctg ctg tca ggc tac atc agc aaa ggg ggg gca gac 361 Ile Gly Ser Leu Leu Leu Ser Gly Tyr Ile Ser Lys Gly Gly Ala Asp cgg gcc gtt cca gtg ctg atc att ggc att ctg gtg ttc cta ccc gga 409 Arg Ala Val Pro Val Leu Ile Ile Gly Ile Leu Val Phe Leu Pro Gly ttt tac cac ctg cgc atc gct tac tat gca tcc aaa ggc tac cgt ggt 457 Phe Tyr His Leu Arg Ile Ala Tyr Tyr Ala Ser Lys Gly Tyr Arg Gly tac tcc tat gat gac att cca gac ttt gat gac tagcacccac cccatagctg 510 Tyr Ser Tyr Asp Asp Ile Pro Asp Phe Asp Asp aggaggagtcacagtggaactgtcccagctttaagatatctagcagaaactatagctgag570 gactaaggaattctgcagcttgcagatgtttaagaaaataatggccagattttttgggtc630 cttcccaaagatgttaagtgaacctacagttagctaattaggacaagctctatttttcat690 ccctgggccctgacaagtttttccacaggaatatgtatcatggaagaatagaggttattc750 tgtaatggaaaagtgttgcctgccaccaccctctgtagagctgagcatttcttttaaata810 gtcttcattgccaatttgttcttgtagcaaatggaacaatgtggtatggctaatttctta870 ttattaagtaatttattttaaaaatatctgagtatattatcctgtacacttatccctacc930 ttcatgttccagtggaagaccttagtaaaatcaaagatcagtgagttcatctgtaatatt990 ttttttacttgctttcttactgacagcaaccaggaatttttttatcctgcagagcaagtt1050 ttcaaaatgtaaatacttcctctgtttaacagtccttggaccattctgatccagttcacc1110 agtaggttggacagcatataatttgcatcattttgtcccttgtaaatcaagatgttctgc1170 agattattcctttaacggccggacttttggctgtttcctaatgaaacatgtagtggttat1230
tatttagagtttatagccgtattgctagcaccttgtagtatgtcatcattctgctcatga1290 ttccaaggatcagcctggatgcctagaggactagatcaccttagtttgattctatttttt1350 agcttgcaaaaagtgacttatattccaaagaaattaaaatgttgaaatccaaatcctaga1410 aataaaatgagttaacttcaaacaaaaaaaaaaaaaaa 1448 <210> 50 <211> 894 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 61..690 <220>
<221> sig_peptide <222> 61..168 <223> Von Heijne matrix score 4.60 seq GTVVLVAGTLCFA/WW
<220>
<221> polyA_signal <222> 858..863 <220>
<221> polyA_site <222> 879..894 <400>

acaccttcac gcctggacag cgcctgctgc ctgcgcccag ccgcctcccg ctccctgcgc atggcc ctgccccagatg tgtgac gggagccac ttggcctcc accctc 108 MetAla LeuProGlnMet CysAsp GlySerHis LeuAlaSer ThrLeu cgctat tgcatgacagtc agcggc acagtggtt ctggtggcc gggacg 156 ArgTyr CysMetThrVal SerGly ThrValVal LeuValAla GlyThr ctctgc ttcgcttggtgg agcgaa ggggatgca accgcccag cctggc 204 LeuCys PheAlaTrpTrp SerGlu GlyAspAla ThrAlaGln ProGly cagctg gccccacccacg gagtat ccggtgcct gagggcccc agcccc 252 GlnLeu AlaProProThr GluTyr ProValPro GluGlyPro SerPro ctgctc aggtccgtcagc ttcgtc tgctgcggt gcaggtggc ctgctg 300 LeuLeu ArgSerValSer PheVal CysCysGly AlaGlyGly LeuLeu ctgctc attggcctgctg tggtcc gtcaaggcc agcatccca gggcca 348 LeuLeu IleGlyLeuLeu TrpSer ValLysAla SerIlePro GlyPro cctcga tgggacccctat cacctc tccagagac ctgtactac ctcact 396 ProArg TrpAspProTyr HisLeu SerArgAsp LeuTyrTyr LeuThr gtggag tcctcagagaag gagagc tgcaggacc cccaaagtg gttgac 444 ValGlu SerSerGluLys GluSer CysArgThr ProLysVal ValAsp atcccc acttacgaggaa gccgtg agcttccca gtggccgag gggccc 492 IlePro ThrTyrGluGlu AlaVal SerPhePro ValAlaGlu GlyPro ccaaca ccacctgcatac cctacg gaggaagcc ctggagcca agtgga 540 ProThrProPro Tyr Pro Thr Glu Ala Leu ProSer Gly Ala Glu Glu tcgagggatgcc ctc agc acc cag gcc tgg ccaccc agc 588 ctg ccc cct SerArgAspAla Leu Ser Thr Gln Ala Trp ProPro Ser Leu Pro Pro tatgagagcatc ctt gct ctt gat gtt tct gagacg aca 636 agc gcc gca TyrGluSerIle Leu Ala Leu Asp Val Ser GluThr Thr Ser Ala Ala ccgagtgccaca tcc tgc tca ggc gtt cag gcacgg gga 684 cgc ctg act ProSerAlaThr Ser Cys Ser Gly Val Gln AlaArg Gly Arg Leu Thr ggaagttaaaggctcctagcaggtcc tgaatccaga ctgtgccttc 740 gacaaaaatg GlySer tccagagtct tatgcagtgc agcaaacgtt cgttgttgaa ctgggacaca gtaggcactc ggctgttcta tttatctatt gctgtataac agaatttagt ggcttaaaat 860 aaaccacccc aaatcccatt ttattacgaa aaaaaaaaaa 894 aaaa <210> 51 <211> 1447 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 501..1253 <220>
<221> sig_peptide <222> 501..1229 <223> Von Heijne matrix score 4.10 seq LPSLAHLLPALDC/LE
<220>
<221> polyA_signal <222> 1392..1397 <220>
<221> polyA_site <222> 1432..1447 <220>

<221> misc_feature <222> 243,252,278,285,387,1429 <223> n=a, g, c or t <400> 51 gtgagtcagg tgggtcctgg gcccaggaaccggcccggagccgtggacgc cctacagctg60 agaaggggac ccaaggggtc ggccgcggccaaggcccctaggaccgccgc cccagctcac120 gctgccgacg gcagctatag acattctgcgtcaggtccgggctcctggac tttgcctttc180 ccgagccctg gaggtgggga gaaaaggttcaccaatttttaaaatccaaa tatatctcat240 ggntacagtg gnaagaactg gccagagagtctggaagntttgggnttctg gtcctggctg300 tgccactgac tcactgtgac cttgggatcttgtgctgtgaagacatttcc caagtgcttc360 atgttagcca gcaaatctga cccacanggcctggaaagaggtgattgtta ggttgcgcag420 aggtggtctt atccagctca gcttcccctgggacccaccgtgggacctga ggcagaactg480 gggtggactt ggcctcctcc atg t cgg ctg 533 gca ca tag ata cga ctg ctg acg Met Ala His Arg Leu Gln Ile Arg Leu Leu Thr tgg gat gtg aag gac acg agg ctc cat ccc tta ggg 581 ctg ctc cgc gag Trp Asp Val Lys Asp Thr Arg Leu His Pro Leu Gly Leu Leu Arg Glu gcc tat gcc act aag gcc cat ggg gag gtg gag ccc 629 cgg gcc ctg tca Ala Tyr Ala Thr Lys Ala His Gly Glu Val Glu Pro Arg Ala Leu Ser gcc ctg gaa caa ggc ttc gca tat gct tag agc cat 677 agg tag agg agc Ala Leu Glu Gln Gly Phe Ala Tyr Ala Gln Ser His Arg Gln Arg Ser ttc ccc aac tat ggc ctg ggc cta tcc cgc tag tgg 725 agc cat act tgg Phe Pro Asn Tyr Gly Leu Gly Leu Ser Arg Gln Trp Ser His Thr Trp ctg gat gtg gtc ctg tag cat ctg ggt gtc tag gat 773 act ttc gcg gct Leu Asp Val Val Leu Gln His Leu Gly Val Gln Asp Thr Phe Ala Ala tag gct gta gcc ccc atc tag ctt aaa gac ttc agc 821 gct gaa tat cat Gln Ala Val Ala Pro Ile Gln Leu Lys Asp Phe Ser Ala Glu Tyr His ccc tgc act tgg tag gtg ggg gct gac act ctg agg 869 ttg gat gag gag Pro Cys Thr Trp Gln Val Gly Ala Asp Thr Leu Arg Leu Asp Glu Glu tgc cgc aca cgg ggt ctg gca gtg tcc aac ttt gac 917 aga ctg atc cga Cys Arg Thr Arg Gly Leu Ala Val Ser Asn Phe Asp Arg Leu Ile Arg cgg cta gag ggc atc ctg ctt ggc cgt gaa cat ttc 965 gag ggc ctg gac Arg Leu Glu Gly Ile Leu Leu Gly Arg Glu His Phe Glu Gly Leu Asp tttgtg ctgacc tcc gaggctgct ggctggccc aagccggac
ccccgc 1013 PheVal LeuThr Ser GluAlaAla GlyTrpPro LysProAsp ProArg attttc caggag gcc ttgcggctt gctcatatg gaaccagta gtggca 1061 IlePhe GlnGlu Ala LeuArgLeu AlaHisMet GluProVal ValAla gcccat gttggg gat aattacctc tgcgattac caggggcct cgggct 1109 AlaHis ValGly Asp AsnTyrLeu CysAspTyr GlnGlyPro ArgAla gtgggc atgcac agc ttcctggtg gttggccca caggcactg gacccc 1157 ValGly MetHis Ser PheLeuVal ValGlyPro GlnAlaLeu AspPro gtggtc agggat tct gtacctaaa gaacacatc ctcccctct ctggcc 1205 ValVal ArgAsp Ser ValProLys GluHisIle LeuProSer LeuAla catctc ctgcct gcc cttgactgc ctagagggc tcaactcca gggctt 1253 HisLeu LeuPro Ala LeuAspCys LeuGluGly SerThrPro GlyLeu tgaggccagt gagggaagtg aggccatgga gaaaacctta aacaaaccct 1313 gctgggccct ggagacaggg agccccttct ttccccctct ccctgcggcc ttctccacag ctctggacct tttgtcacct actgtgataa tctcaccctt cccccnccaa taaagcagtg agtgctgagc aaaaaaaaaa aaaa 1447 <210> 52 <211> 1540 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 25..402 <220>
<221> sig_peptide <222> 25..96 <223> Von Heijne matrix score 7.00 seq LLCCFRALSGSLS/MR
<220>
<221> polyA_signal <222> 1500..1505 <220>
<221> polyA_site <222> 1525..1540 <220>
<221> misc_feature <222> 625,1411,1432,1440,1450,1506 <223> n=a, g, c or t <400> 52

agcctggccc 51 tccctctttc caaa atg gac aag tcc ctc ttg ctg gaa ctc Met Asp Lys Ser Leu Leu Leu Glu Leu cccatc ctgctctgc tgctttagg gcatta tctggatca ctttcaatg 99 ProIle LeuLeuCys CysPheArg AlaLeu SerGlySer LeuSerMet agaaat gatgcagtc aatgaaata gttgct gtgaaaaac aattttcct 147 ArgAsn AspAlaVal AsnGluIle ValAla ValLysAsn AsnPhePro gtgata gaaattatt tagtgtagg atgtgc catctctag ttcccagga 195 ValIle GluIleIle GlnCysArg MetCys HisLeuGln PheProGly gaaaag tgctccaga ggaagagga atatgc acagcaaca acagaagag 243 GluLys CysSerArg GlyArgGly IleCys ThrAlaThr ThrGluGlu gcctgc atggttgga aggatgttc aaaagg gatggtaat ccctggtta 291 AlaCys MetValGly ArgMetPhe LysArg AspGlyAsn ProTrpLeu actttc atgggctgc ctaaagaac tgtgct gatgtgaaa ggcataagg 339 ThrPhe MetGlyCys LeuLysAsn CysAla AspValLys GlyIleArg tggagt gtctatttg gtgaacttc aggtgc tgcaggagc catgacctg 387 TrpSer ValTyrLeu ValAsnPhe ArgCys CysArgSer HisAspLeu tgcaat gaagacctt tagaagttaa tggttcttct tctgggtg gtgactccaa tt CysAsn GluAspLeu aggttgttgc ctcagcctct c aaatcacaca cacacacaca tcacaatga tttctaaaaa cacactacag aagaggattg tcttctgcacacga aaggaaagtc 562 caaacacatg gctcca cctctccttt tctacagtct c agtaaataaa taaccttgag ctgtcacgc ccttaaaata agnaaagaacaagatcaatatatcctgcaggttgctacaaacccttgtgctttcactgta682 tagccagttcattcagaaaaggaggaaagggtagtttaatttcaaaaaagaatcccttcc742 tctttcctctgctgctttccttccttctgtggcagggtattttaatatatttttcaaatt802 tttttcctttctgtgttatccttcttatcccactccaaagaaagcacataactgtggcct862 gaagggatggggagtagcaacataaaaagaagtggctcaagtcttcttggagtttgttca922 tgaatgctgatcccagggtgaggagaagattgggacatagaaaggaaactgcatcagaaa982 catgaacagagaaagattgtctaccttctagaatcagatctgtttggggctgggggttgg1042 agaataaaagcaggagaagtctatgggattctagaaatagtacctgcatccagcttccct1102 gccaaactcacaaggagacatcaacctctagacagggaacagcttcaggatacttccagg1162 agacagagccaccagcagcaaaacaaatattcccatgcctggagcatggcatagaggaag1222 ctgagaaatgtggggtctgaggaagccatttgagtctggccactagacatctcatcagcc1282 acttgtgtgaagagatgccccatgaccccagatgcctctcccacccttacctccatctca1342
cacacttgagcttgccactctgtataattctaacatcctggagaaaaatggcagtttgac1402 cgaacctgnttcacaagggtagaggctganttctaacngaaacttgtnagaatgaagcct1462 ggaaagagtgatgaattatattatattatataaaaataataatnaaaaatataaagaaag1522 ctaaaaaaaaaaaaaaaa 1540 <210> 53 <211> 1643 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 280..678 <220>
<221> sig_peptide <222> 280..411 <223> Von Heijne matrix score 3.90 seq LSDSLWSPHCSWS/ER
<220>
<221> polyA_signal <222> 1606..1611 <220>
<221> polyA_site <222> 1628..1643 <400> 53 cctaagttttctcaaaaatgtctttttacagttagtttaagtcaggatctaaacaaagtt60 catacattacatttgcttgatgtctctcaactgtcttataacctataacaattgctccca120 atccatttttcatgccattactttatttaaaaacctgggccaacccagttctcaaaaggt180 attggacatcctcagaaaagatgactgctctatgttgaaccaaacaactgattcttacag240 gtttcttcctcacttgtcctctggctgtggcagccagatatg gac aga gct 294 agg Met Asp Arg Ala Arg aca tcc ttc cct cca ctc cct gcc aaa gaa agg aga gct ggg ata agc 342 Thr Ser Phe Pro Pro Leu Pro Ala Lys Glu Arg Arg Ala Gly Ile Ser agt gcc ctc ccc tgc cca ccc act atg tca ctt tct gac tcc ctt tgg 390 Ser Ala Leu Pro Cys Pro Pro Thr Met Ser Leu Ser Asp Ser Leu Trp tcc cct cat tgc tct tgg agt gag aga cct cat tcc ttc tct cac tgg 438 Ser Pro His Cys Ser Trp Ser Glu Arg Pro His Ser Phe Ser His Trp agg cag aga atg gga tcc tct ggt ttg gat gta agt 486 cca ggg tct tat Arg Gln Arg Met Gly Ser Ser Gly Leu Asp Val Ser Pro Gly Ser Tyr ttc aaa tgg ata cac agc tcc aga ggc aag gct gct 534 cac tct aaa att Phe Lys Trp Ile His Ser Ser Arg Gly Lys Ala Ala His Ser Lys Ile cta gag gga ctg ttc att tcc tgc gat gca aga ggc 582 gca ctt ggg ccc Leu Glu Gly Leu Phe Ile Ser Cys Asp Ala Arg Gly Ala Leu Gly Pro ctg aat tcc caa gga aac caa aga atg gtc ttc aga 630 gct aag aac tgt Leu Asn Ser Gln Gly Asn Gln Arg Met Val Phe Arg Ala Lys Asn Cys ggt gga gcc agt cta gct ctg cca act cct tgc ctt 678 gtg tct ctc tcc Gly Gly Ala Ser Leu Ala Leu Pro Thr Pro Cys Leu Val Ser Leu Ser tagggtaccactgaggtgga aagcctgaac tgctgtctctgctctggcttgtgctcaagc738 tgtgtgtccttggactggcc atctcctctc tgcaaccctcggtcttctcatttgtaaaat798 ggaagtgatcctctctgccc atacttcctt acagggctgcttggagacaatcaatcaaga858 tgagggaaattgagattcta caaagagtgt gatgcctacataacaaagtattgtttttct918 cacagttggtggtatttgag gagaaggtga agattttggttggaagagggaccagcagac978 aaacttgttctcttgtgtat aaaaagccat aacacgccccacatccctcaagctaggaag1038 aaacctgggctggatggtga cccactggag aagctgtgacatcctagcatggggaagagt1098 accaggatgcccactcctct tccccaggaa ccaccaaggagcctggagcctggctttatc1158 tcagccctgagtccccctct cccggtgcgc
acacccctaacttttttttttttagatgga1218 atcttgctctgtcgcccagg ctggagtgca acggcagctcactgtaacctccacctccca1278 ggttcaagcgattctcctgc ctcagcctcc cgagtagctgggattacaggcgcgtgactc1338 catgcctggctaatttttgtatttttagtagaggtagggtttcaccatgttgaccagggt 1398 ggtctggaactcctgatctcaggtgatctgcctgcctccacctcccaaagtgctggaatt 1458 acaggtgtgagctaccgcgcccggccaatctggggctcctagctttggtgcaccaactac 1518 tcaaatccccaacttctctccaagaggaatttcaagaaacactgaccaatctggttacag 1578 aagctgaaggggccccaaccaggctgcaataaacctgctttacccttccaaaaaaaaaaa 1638 aaaaa 1643 <210> 54 <211> 1314 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 64..726 <220>
<221> sig_peptide <222> 64..147 <223> Von Heijne matrix score 3.70 seq VVFTLGMFSAGLS/DL
<220>
<221> polyA_signal <222> 1279..1284 <220>
<221> polyA_site <222> 1300..1314 <400> 54 agtaggtccc ggcaaccgca ggctcgcggc gggcgctggg cgcgggatcc gactctagtc 60 gta atg gag gcg ggc ggc ttt ctg gac tcg ctc att tac gga gca tgc 108 Met Glu Ala Gly Gly Phe Leu Asp Ser Leu Ile Tyr Gly Ala Cys gtg gtc ttc acc ctt ggc atg ttc tcc gcc ggc ctc tcg gac ctc agg 156 Val Val Phe Thr Leu Gly Met Phe Ser Ala Gly Leu Ser Asp Leu Arg cac atg cga atg acc cgg agt gtg gac aac gtc cag ttc ctg ccc ttt 204 His Met Arg Met Thr Arg Ser Val Asp Asn Val Gln Phe Leu Pro Phe ctc accacggaagtc aacaac ctgggctgg ctgagttat ggggctttg 252 Leu ThrThrGluVal AsnAsn LeuGlyTrp LeuSerTyr GlyAlaLeu aag ggagacgggatc ctcatc gtcgtcaac acagtgggt gctgcgctt 300 Lys GlyAspGlyIle LeuIle ValValAsn ThrValGly AlaAlaLeu cag accctgtatatc ttggca tatctgcat tactgccct cggaagcgt 348 Gln ThrLeuTyrIle LeuAla TyrLeuHis TyrCysPro ArgLysArg gtt gtgctcctacag actgca accctgcta ggggtcctt ctcctgggt 396 Val ValLeuLeuGln ThrAla ThrLeuLeu GlyValLeu LeuLeuGly tat ggctacttttgg ctcctg gtacccaac cctgaggcc cggcttcag 444 Tyr GlyTyrPheTrp LeuLeu ValProAsn ProGluAla ArgLeuGln cag ttgggcctcttc tgcagt gtcttcacc atcagcatg tacctctca 492 Gln LeuGlyLeuPhe CysSer ValPheThr IleSerMet TyrLeuSer cca ctggctgacttg gctaag gtgattcaa actaaatca acccaatgt 540 Pro LeuAlaAspLeu AlaLys ValIleGln ThrLysSer ThrGlnCys ctc tcctacccactc accatt gctaccctt ctcacctct gcctcctgg 588 Leu SerTyrProLeu ThrIle AlaThrLeu LeuThrSer AlaSerTrp tgc ctc ggg ttt cga ctc aga gat ccc atc atg tcc aac 636 tat tat gtg Cys Leu Gly Phe Arg Leu Arg Asp Pro Ile Met Ser Asn Tyr Tyr Val ttt cca atc gtc acc agc ttt atc cgc tgg ctt tgg aag 684 gga ttc ttc Phe Pro Ile Val Thr Ser Phe Ile Arg Trp Leu Trp Lys Gly Phe Phe tac ccc gag caa gac agg aac tac tgg ctg caa 726 cag ctc acc Tyr Pro Glu Gln Asp Arg Asn Tyr Trp Leu Gln Gln Leu Thr tgaggctgctcatctgacca ctgggcacct tagtgccaacctgaaccaaagagacctcct786 tgtttcagctgggcctgctg tccagcttcc caggtgcagtgggttgtgggaacaagagat846 gactttgaggataaaaggac caaagaaaaa
gctttacttagatgattgattggggcctag906 gagatgaaatcactttttat tttttagaga ttttttttttttaattttggaggttggggt966 gcaatctttagaatatgcct taaaaggccg ggcgcggtggctcacgcctgtaatcccagc1026 actttgggaggccaaggtgg gcggatcgcc tgaggtcaggagttcaagaccaacctgact1086 aacatggtgaaaccccatct ctactaaaaa tacaaaattagccaggcatgatggcacatg1146 cctgtaatcccagatacttg ggaggctgag gcaggagaattgcttgaacccaggaggtgg1206 aggttgcagtgagctgagat cgtgccattg tgatatgaatatgccttatatgctgatatg1266 aatatgcctt aaaataaagt gttccccacc cctaaaaaaa aaaaaaaa 1314 <210> 55 <211> 2356 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 42..1097 <220>
<221> sig_peptide <222> 42..110 <223> Von Heijne matrix score 4.40 seq QFILLGTTSVVTA/AL
<220>
<221> polyA_signal <222> 2323..2328 <220>
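Feature coordinates in these entries are 1-based and inclusive, so they can be checked directly against the printed sequence. Below is a minimal Python sketch. It assumes the final printed row of SEQ ID NO 55 starts at position 2318, since the row before it ends at 2317. It also rebuilds the poly(A) tail length from the printed end coordinate, 2356. `feature_slice` is a hypothetical helper written for illustration, not part of any sequence-listing tool.

```python
def feature_slice(fragment: str, fragment_start: int, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) from a fragment
    whose first base sits at coordinate fragment_start."""
    if start < fragment_start or end < start:
        raise ValueError("feature does not lie inside the fragment")
    i, j = start - fragment_start, end - fragment_start + 1
    if j > len(fragment):
        raise ValueError("feature runs past the end of the fragment")
    return fragment[i:j]


# Final row of SEQ ID NO 55 (assumed to start at 2318); the tail of 16 a's
# is reconstructed from the printed end coordinate 2356.
tail = "cagaaattaaacatgttgcaacc" + "a" * 16
assert len(tail) == 2356 - 2318 + 1

signal = feature_slice(tail, 2318, 2323, 2328)  # polyA_signal 2323..2328
site = feature_slice(tail, 2318, 2341, 2356)    # polyA_site 2341..2356
print(signal, signal in {"aataaa", "attaaa"})   # attaaa True
print(set(site))                                # {'a'}
```

The hexamer at 2323..2328 reads ATTAAA, a common variant of the canonical AATAAA polyadenylation signal. The polyA_site span that follows covers only the poly(A) tail.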
<221> polyA_site <222> 2341..2356 <400>

atccttggcg ccacagtcgg c 56 ccaccggggc atg tcgccgccgt gag agc gga ggg Met Glu Ser Gly Gly cggccc tcgctgtgc cagttcatc ctcctgggc accacctct gtggtc 104 ArgPro SerLeuCys GlnPheIle LeuLeuGly ThrThrSer ValVal accgcc gccctgtac tccgtgtac cggcagaag gcccgggtc tcccaa 152 ThrAla AlaLeuTyr SerValTyr ArgGlnLys AlaArgVal SerGln gagctc aagggaget aaaaaagtt catttgggt gaagattta aagagt 200 GluLeu LysGlyAla LysLysVal HisLeuGly GluAspLeu LysSer attctt tcagaaget ccaggaaaa tgcgtgcct tatgetgtt atagaa 248 IleLeu SerGluAla ProGlyLys CysValPro TyrAlaVal IleGlu gga get gtg cgg tct gtt aaa gaa acg ctt aac agc cag ttt gtg gaa 296 Gly Ala Val Arg Ser Val Lys Glu Thr Leu Asn Ser Gln Phe Val Glu aactgcaagggg gtaattcag cggctg acacttcag gagcac aagatg 344 AsnCysLysGly ValIleGln ArgLeu ThrLeuGln GluHis LysMet gtgtggaatcga accacccac ctttgg aatgattgc tcaaag atcatt 392 ValTrpAsnArg ThrThrHis LeuTrp AsnAspCys SerLys IleIle catcagaggacc aacacagtg cccttt gacctggtg ccccac gaggat 440 HisGlnArgThr AsnThrVal ProPhe AspLeuVal ProHis GluAsp ggcgtggatgtg getgtgcga gtgctg aagcccctg gactca gtggat 488 GlyValAspVal AlaValArg ValLeu LysProLeu AspSer ValAsp ctgggtctagag actgtgtat gagaag ttccacccc tcgatt cagtcc 536 LeuGlyLeuGlu ThrValTyr GluLys PheHisPro SerIle GlnSer ttcaccgatgtc atcggccac tacatc agcggtgag cggccc aaaggc 584 PheThrAspVal IleGlyHis TyrIle SerGlyGlu ArgPro LysGly atccaagagacc gaggagatg ctgaag gtgggggcc accctc acaggg 632 IleGlnGluThr GluGluMet LeuLys ValGlyAla ThrLeu ThrGly gttggcgaactg gtcctggac aacaac tctgtccgc ctgcag ccgccc 680 ValGlyGluLeu ValLeuAsp AsnAsn SerValArg LeuGIn ProPro aaacaaggcatg cagtactat ctaagc agccaggac ttcgac agcctg 728 LysGlnGlyMet GlnTyrTyr LeuSer SerGlnAsp PheAsp SerLeu ctgcagaggcag gagtcgagc gtcagg ctctggaag gtgctg gcgctg 776 LeuGlnArgGln GluSerSer ValArg LeuTrpLys ValLeu AlaLeu gtttttggcttt gccacatgt gccacc ctcttcttc attctc cggaag 824 ValPheGlyPhe AlaThrCys AlaThr LeuPhePhe IleLeu ArgLys cagtatctgcag cggcaggag cgcctg cgcctcaag cagatg caggag 872 GlnTyrLeuGln ArgGlnGlu ArgLeu ArgLeuLys 
GlnMet GlnGlu gagttccaggag catgaggcc cagctg ctgagccga gccaag cctgag 920 GluPheGlnGlu HisGluAla GlnLeu LeuSerArg AlaLys ProGlu gacagggag agtctgaag agcgcc tgtgtagtg tgtctgagcagc ttc 968 AspArgGlu SerLeuLys SerAla CysValVal CysLeuSerSer Phe aagtcctgc gtctttctg gagtgt gggcacgtt tgttcctgcacc gag 1016 LysSerCys ValPheLeu GluCys GlyHisVal CysSerCysThr Glu tgctaccgc gccttgcca gagccc aagaagtgc cctatctgcaga cag 1064 CysTyrArg AlaLeuPro GluPro LysLysCys ProIleCysArg Gln gcgatcacc cgggtgata cccctg tacaacagc taatagtttg gaagccgcac 1117 AlaIleThr ArgValIle ProLeu TyrAsnSer agcttgacctggaagcacccctgcccccttttcagggatttttatctcgaggcctttgga1177 ggagcagtggtgggggtagctgtcacctccaggtatgattgagggaggaattgggtagaa1237 actctccagacccatgcctccaatggcaggatgctgcctttcccacctgagaggggaccc1297 tgtccatgtgcagcctcatcagagcctcaccctgggaggatgccgtggcgtctcctccca1357 ggagccagatcagtgcgagtgtgactgaaaatgcctcatcacttaagcaccaaagccagt1417 gatcagcagctcttctgttcctgtgtcttctgtttttttctggtgaatcgttgcttgctg1477 tggacttggtggaggactcagaggggaggaaaggctgggccccgagtacaacggatgcct1537 tgggtgctgcctccgaagagactctgccgcagcttttcttctttttcctcatgccccggg1597 aaacagtctttcttcagaattgtcaggctgggcaggtcaacttgtgttcctttcccctca1657 cctgcttgcctccttaacgcctgcacgtgtgtgtagaggacaaaagaaagtgaagtcagc1717 acatccgcttctgcccagatggtcggggccccgggcaacagattgaagagagatcatgtg1777 aagggcagttggtcaggcaggcctcctggtttcgccactggccctgatttgaactcctgc1837 cacttgggagagctcggggtggtccctggttttccctcctggagaatgaggcgcagaggc1897 ctcgcctcctgaaggacgcagtgtggatgccactggcctagtgtcctggcctcacagctt1957 ccttgcaaggctgtcacaaggaaaagcagccggctggcaccctgagcatatgccctcttg2017 gggctccctcatccagcccgtcgcagctttgacatcttggtgtactcatgtcgcttctcc2077 ttgtgttaccccctcccagtattaccatttgcccctcacctgcccttggtgagcctttta2137 gtgcaagacagatggggctgttttcccccacctctgagtagttggaggtcacatacacag2197 ctctttttttattgcccttttctgcctctgaatgttcatctctcgtcctcctttgtgcag2257 gcgaggaaggggtgccctcaggggccgacactagtatgatgcagtgtccagtgtgaacag2317 cagaaattaaacatgttgcaaccaaaaaaaaaaaaaaaa 2356 <210> 56 <211> 1701 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 245..1399 <220>
<221> sig_peptide <222> 245..796 <223> Von Heijne matrix score 5.10 seq GWLPLLLLSLLVA/TW
<220>
<221> polyA_signal <222> 1669..1674 <220>
<221> polyA_site <222> 1687..1701 <400>

atcccgcgcagtggcccggc gatgtcgctc gtgctgctaagcctggccgcgctgtgcagg60 agcgccgtaccccgagagcc gaccgttcaa tgtggctctgaaactgggccatctccagag120 tggatgctacaacatgatct, aatcccggga gacttgagggacctccgagtagaacctgtt180 acaactagtgttgcaacagg ggactattca attttgatgaatgtaagctgggtactccgg240 gcag atg g aca ttt tcc tac atc ggc g aac aca 289 tg ttc cct gta gag ct Met Trp u Asn Thr Thr Phe Ser Tyr Ile Gly Phe Pro Val Glu Le gtc tat att ggg gcc cat aaa att cct gca aat aat gaa 337 ttc aat atg Val Tyr Ile Gly Ala His Lys Ile Pro Ala Asn Asn Glu Phe Asn Met gat ggc tcc atg tct gtg aat ttc acc cca ggc cta gac 385 cct tca tgc Asp Gly Ser Met Ser Val Asn Phe Thr Pro Gly Leu Asp Pro Ser Cys cac ata aaa tat aaa aaa aag tgt gtc gcc gga ctg tgg 433 atg aag agc His Ile Lys Tyr Lys Lys Lys Cys Val Ala Gly Leu Trp Met Lys Ser gat ccg atc act get tgt aag aag aat gag aca gaa gtg 481 aac gag gta Asp Pro Ile Thr Ala Cys Lys Lys Asn Glu Thr Glu Val Asn Glu Val aac ttc acc act ccc ctg gga aac aga atg get atc caa 529 aca tac ctt Asn Phe Thr Thr Pro Leu Gly Asn Arg Met Ala Ile Gln Thr Tyr Leu cac agc atc atc ggg ttt tct cag gtg gag cca cag aag 577 act ttt cac His Ser Ile Ile Gly Phe Ser Gln Val Glu Pro Gln Lys Thr Phe His aaa caa acg cga get tca gtg gtg att cca gtg act ggg gat agt gaa 625 Lys Gln Thr Arg Ala Ser Val Val Ile Pro Val Thr Gly Asp Ser Glu ggt get acg gtg cag ctg act cca tat ttt cct act tgt ggc agc gac 673 Gly Ala Thr Val Gln Leu Thr Pro Tyr Phe Pro Thr Cys Gly Ser Asp _55 -50 -45 tgc atc cga cat aaa gga aca gtt gtg ctc tgc cca caa aca ggc gtc 721 Cys Ile Arg His Lys Gly Thr Val Val Leu Cys Pro Gln Thr Gly Val cct ttc cct ctg gat aac aac aaa agc aag ccg gga ggc tgg ctg cct 769 Pro Phe Pro Leu Asp Asn Asn Lys Ser Lys Pro Gly Gly Trp Leu Pro ctc ctc ctg ctg tct ctg ctg gtg gcc aca tgg gtg ctg gtg gca ggg 817 Leu Leu Leu Leu Ser Leu Leu Val Ala Thr Trp Val Leu Val Ala Gly atctat ctaatgtgg aggcac gaaaggatc aagaagact tccttt tct 865 IleTyr LeuMetTrp ArgHis GluArgIle LysLysThr SerPhe Ser accacc acactactg cccccc 
attaaggtt cttgtggtt taccca tct 913 ThrThr ThrLeuLeu ProPro IleLysVal LeuValVal TyrPro Ser gaaata tgtttccat cacaca atttgttac ttcactgaa tttctt caa 961 GluIle CysPheHis HisThr IleCysTyr PheThrGlu PheLeu Gln aaccat tgcagaagt gaggtc atccttgaa aagtggcag aaaaag aaa 1009 AsnHis CysArgSer GluVal IleLeuGlu LysTrpGln LysLys Lys atagca gagatgggt ccagtg cagtggctt gccactcaa aagaag gca 1057 IleAla GluMetGly ProVal GlnTrpLeu AlaThrGln LysLys Ala gcagac aaagtcgtc ttcctt ctttccaat gacgtcaac agtgtg tgc 1105 AlaAsp LysValVal PheLeu LeuSerAsn AspValAsn SerVal Cys gatggt acctgtggc aagagc gagggcagt cccagtgag aactct caa 1153 AspGly ThrCysGly LysSer GluGlySer ProSerGlu AsnSer Gln gacctc ttccccctt gccttt aaccttttc tgcagtgat ctaaga agc 1201 AspLeu PheProLeu AlaPhe AsnLeuPhe CysSerAsp LeuArg Ser cagatt catctgcac aaatac gtggtggtc tactttaga gagatt gat 1249 GlnIle HisLeuHis LysTyr ValValVal TyrPheArg GluIle Asp WO 00/37491 PC'T/1B99/02058 aca aaa gat tat aat get ctc agt gtc ccc aag cat ctc 1297 gac tgc tat Thr Lys Asp Tyr Asn Ala Leu Ser Val Pro Lys His Leu Asp Cys Tyr atg aag gcc act get ttc tgt gca gaa ctc cat aag tag 1395 gat ctt gtc Met Lys Ala Thr Ala Phe Cys Ala Glu Leu His Lys Gln Asp Leu Val tag gtg gca gga aaa aga tca caa gcc cat gat tgc tgc 1393 tca tgc ggc Gln Val Ala Gly Lys Arg Ser Gln Ala His Asp Cys Cys Ser Cys Gly tcc ttg cccaccc atgagaagca agagacctta ccaccaa 1449 tag aaggcttcct atc Ser Leu ttacagggaaaaaacgtgtg atgatcctga agcttactatgcagcctacaaacagcctta1509 gtaattaaaacattttatac caataaaatt ttcaaatattgctaactaatgtagcattaa1569 ctaacgattggaaactacat ttacaacttc aaagctgttttatacatagaaatcaattac1629 agttttaattgaaaactata accattttga taatgcaacaataaagcatcttcagccaaa1689 aaaaaaaaaaas 1701 <210> 57 <211> 772 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 235..441 <220>
<221> sig_peptide <222> 235..303 <223> Von Heijne matrix score 5.30 seq LLLDVTVFIPALP/FS
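The `seq` field of each Von Heijne entry appears to show the 13 residues before the predicted cleavage site, marked `/`, and the 2 residues after it. This reading is an assumption inferred from the entries; the listing does not state it. The Python sketch below translates the sig_peptide span 235..303 of SEQ ID NO 57 with the standard genetic code and rebuilds that string. The codon table and `translate` function are written here for illustration only.

```python
from itertools import product

# Standard genetic code with bases in TCAG order; "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (spaces allowed) with the standard code."""
    dna = dna.lower().replace(" ", "")
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    # A damaged codon that is not a key in CODON_TABLE raises KeyError.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# SEQ ID NO 57: sig_peptide 235..303, then the first two mature codons.
sig_dna = ("atg aat ctg atg tgg acc ctc ctc ctt ttc ctc ctt ttg gac gta act gtc"
           " ttc att cca gcc ctg ccc")
mature_dna = "ttc tca"

assert len(sig_dna.replace(" ", "")) == 303 - 235 + 1  # 69 nt -> 23 residues
signal = translate(sig_dna)
print(signal)                                     # MNLMWTLLLFLLLDVTVFIPALP
print(f"{signal[-13:]}/{translate(mature_dna)}")  # LLLDVTVFIPALP/FS
```

The OCR text `get`, which stands in for `gct` (Ala) throughout this listing, is not a key in the table. `translate` therefore fails loudly on a damaged row instead of guessing.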
<220>
<221> polyA_site <222> 758..772 <220>
<221> misc_feature <222> 573 <223> n=a, g, c or t <400> 57 aatacctggc aatctgttta agatcattga caggcctgag agttttccat acggcctgca 60 ccctaacctc tgggaagaaa atatccacaa tgaaatttct acaagattag aggaaggaga 120 gaggcaacgg ggattccatt tctactagga gtatcaacct ctgagaggga tatatccatc 180 tctgtggatg tcatctgctc tgcagaaaac cctttcttgg aactaccagg aaac atg 237 Met aat ctg atg tgg acc ctc ctc ctt ttc ctc ctt ttg gac gta act gtc 285 Asn Leu Met Trp Thr Leu Leu Leu Phe Leu Leu Leu Asp Val Thr Val -20 -15 -10 ttc att cca gcc ctg ccc ttc tca aca cga cat ata gac aac ccc agg 333 Phe Ile Pro Ala Leu Pro Phe Ser Thr Arg His Ile Asp Asn Pro Arg -5 1 5 10 tcg tgg gtc cct aga gga cac cac cga tac tgt gat gtg atg atg cgg 381 Ser Trp Val Pro Arg Gly His His Arg Tyr Cys Asp Val Met Met Arg cgc cgt tgg ctg atc tat agg ggt aaa tgc gag cag atc cac aca ttc 429 Arg Arg Trp Leu Ile Tyr Arg Gly Lys Cys Glu Gln Ile His Thr Phe att cat atc aga tgaccaccat agcagatttc tgcagaactc caccactgcc 481 Ile His Ile Arg 45

ctgtaccaacagcccctccatgtgcagctgccacaacagtactcatgatgtcaatgtcac541 tgactgctttgccagcacagggacccgacctnttcactgccactaccaaaaataaggagt601 ccaccaggcccatgcgagtgggctgcaagaagggggcatctgttcacctggatggctagg661 ttcctcctgacaacggcacctgaatgacttgcaccctacgccttcaaatctgtgcagcac721 tgtcaaggtcttctttgtaaatgcttcgtcctttgcaaaaaaaaaaaaaaa 772 <210> 58 <211> 987 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 88..411 <220>
<221> sig_peptide <222> 88..234 <223> Von Heijne matrix score 4.70 seq LLLVSTWSADLMS/YR
<220>
<221> polyA_signal <222> 938..943 <220>
<221> polyA_site <222> 964..987 <220>
<221> misc_feature <222> 828,832 <223> n=a, g, c or t <400>

ttttttcttt ggca aggctgaaaactgcagggga 60 gacatgttca tctggttgtg gatgtt ataatcca gg t atg aac 114 cctgaatata aag acc cac tacaaa aag gac tgc tca Met Asn Lys Thr His Lys Asp Cys Ser tcaccc cagtattcc atttac aac atc ctg gaa ctc accagg 162 aat ccg SerPro GlnTyrSer IleTyr Asn Ile Leu Glu Leu ThrArg Asn Pro cctata attctctct tgcagc caa ata tcc tta ctc gtatct 210 tgc ctg ProIle IleLeuSer CysSer Gln Ile Ser Leu Leu ValSer Cys Leu acctgg tcagcagac ctcatg agt tat cgc gtg aca ccatcc 258 cca aaa ThrTrp SerAlaAsp LeuMet Ser Tyr Arg Val Thr ProSer Pro Lys caaaga tgcaccagt ccagca caa agt atg gtc aat acaaaa 306 act ctc GlnArg CysThrSer ProAla Gln Ser Met Val Asn ThrLys Thr Leu gatgta gggttctac gaggat act cag agt aga att ctaagt 354 ata acg AspVal GlyPheTyr GluAsp Thr Gln Ser Arg Ile LeuSer Ile Thr gaaata agccaagcc cagaaa gac aca tac att att tgtatc 402 ttt tca GluIle SerGlnAla GlnLys Asp Thr Tyr Ile Ile CysIle Phe Ser tgtgga atctaaaagagtc aaattcatgg 451 cagcagggag agggctgaag CysGly Ile aagggggaga tgttgatcaa gtttctatg ccaaaccatcacattatgcc a tatacaaaga tcataaatat atacaattat atttgctaa agcaatacaagaagaaaaaa t ttacaagtaa aggaatcataagtaaatccatgacaagtgaaaacgcaatggagagaagggaatcaatgat 631 tgaagaagagaaaggacagtggatttacaactgcttcgaaagagtgatttgactggcaaa 691 ggactggggagaggtcctttgggaaatggacaaaaccctcgaatggttaggaaagacaat 751 ctctttataaatgcggggcataagctgagcacaaggtgaagtttggcatgtactgccgtg 811 ggatgttgtaaaaattnatgntcaaaagcaaagcaattcttggttcatctgtgttcactg 871 tgagactagcctattattggggttaaacttataaacaaacttctgttcatcatttttttt 931 ctccaaaataaagtgatcaaattgtcccacagaaaaaaaaaaaaaaaaaaaaaaaa 987 <210> 59 <211> 1324 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 129..452 <220>
<221> sig_peptide <222> 129..212 <223> Von Heijne matrix score 5.20 seq LDIVISFVGAVSS/ST
<220>
<221> polyA_signal <222> 1290..1295 <220>
<221> polyA_site <222> 1309..1324 <220>
<221> misc feature <222> 888,1080 <223> n=a, g, c or t <400> 59 gatttttttc acaagcaata gtttagtagt tcaactttca ttaattattt ctagtaatta 60 ctttcagtat tgaaaatact tactgttaat attcatgtaa gtaacaaaca tttaaataag 120 aaaaataa atg tat ttt cat ttt cta ggt gcc gga gca att ctt att cct 170 Met Tyr Phe His Phe Leu Gly Ala Gly Ala Ile Leu Ile Pro cgt tta gac att gtg att tcc ttc gtt gga get gtg agc agc agc aca 218 Arg Leu Asp Ile Val Ile Ser Phe Val Gly Ala Val Ser Ser Ser Thr ttg gcc cta atc ctg cca cct ttg gtt gaa att ctt aca ttt tcg aag 266 Leu Ala Leu Ile Leu Pro Pro Leu Val Glu Ile Leu Thr Phe Ser Lys gaa cat tat aat ata tgg atg gtc ctg aaa aat att tct ata gca ttc 314 Glu His Tyr Asn Ile Trp Met Val Leu Lys Asn Ile Ser Ile Ala Phe act gga gtt gtt ggc ttc tta tta ggt aca tat ata act gtt gaa gaa 362 Thr Gly Val Val Gly Phe Leu Leu Gly Thr Tyr Ile Thr Val Glu Glu att att tat cct act ccc aaa gtt gta get ggc act cca cag agt cct 410 Ile Ile Tyr Pro Thr Pro Lys Val Val Ala Gly Thr Pro Gln Ser Pro ttt cta aat ttg aat tca aca tgc tta aca tct ggt ttg aaa 452 Phe Leu Asn Leu Asn Ser Thr Cys Leu Thr Ser Gly Leu Lys ~n 75 80 tagtaaaagcagaatcatgagtcttctatttttgtcccatttctgaaaattatcaagata 512 actagtaaaatacattgctatatacataaaaatggtaacaaactctgttttctttggcac 572 gatattaatattttggaagtaatcataactctttaccagtagtggtaaacctatgaaaaa 632 tccttgcttttaagtgttagcaatagttcaaaaaattaagttctgaaaattgaaaaaatt 692 aaaatgtaaaaaaattaaagaataaaaatacttctattattcttttatctcagtaagaaa 752 taccttaaccaagatatctctcttttatgctactcttttgccactcacttgagaacagaa 812 taggatttcaacaataagagaataaaataagaacatgtataacaaaaagctctctccaga 872 tcatccctgtgaatgnccaaagtaaactttatgtacagtgtaaaaaaaaaaaaatctcag 932 ttatgtttttattagccaaattctaatgattggctcctggaagtatagaaaactcccatt 992 aacataatataagcatcagaaaattgcaaacactagaatfaattttacactctaatggta 1052 gttgatcttcatagtcaagaggcactgntcaagatcatgacttagtgtttcaatgaaatt 1112 tgacaagggactttaaaacttatccagtgcaactcccttgtttttcgtcagaggaaaagg 1172 aggcctagaaaggttaagtaacttggtcgagaccactcagccttgagatcaagaaaacct 1232 
aatcttctgactcccaggccaggatgttttatttctcacatcatgtccaagaaaaagaat 1292 aaattatgttcagctcaaaaaaaaaaaaaaaa 1324 <210> 60 <211> 1918 <212> DNA
<213> Homo Sapiens <220>
<221> CDS

<222> 238..612 <220>
<221> sig_peptide <222> 238..348 <223> Von Heijne matrix score 9.40 seq LLCCVLSASQLSS/QD
<220>
<221> polyA_signal <222> 1885..1890 <220>
<221> polyA_site <222> 1905..1918 <220>
<221> misc_feature <222> 945,1624 <223> n=a, g, c or t <400>

aaaaatctaa gttgtgtaaa tgtgcacgcg gcgacttcga ctacaccaca tgccaaggaa cccagggtgg ttaaacaatc aattgtttgt aaaccacagt ttaacatctg tgcagagtca tgataggcag gatacctacg aaaatcaaaa ctttccttct taaatgcaag tttcaacagt ctgaggtttt caaccccaga aggccgacac gtgctcactg aaaaaaa aaagggctgt atggta tgtgaagat gcaccg tcttttcaa atggcc tgggagagt caa 285 MetVal CysGluAsp AlaPro SerPheGln MetAla TrpGluSer Gln atggcc tgggagagg gggcct gcccttctc tgctgt gtcctttcg get 333 MetAla TrpGluArg GlyPro AlaLeuLeu CysCys ValLeuSer Ala tcccag ttgagctcc caagac caggaccca ctgggg catataaaa tct 381 SerGln LeuSerSer GlnAsp GlnAspPro LeuGly HisIleLys Ser ctgctg tatcctttc ggcttc ccagttgag ctccca agaccagga ccc 429 LeuLeu TyrProPhe GlyPhe ProValGlu LeuPro ArgProGly Pro actggg gcatataaa aaagtc aaaaatcaa aatcaa acaacaagt tct 477 ThrGly AlaTyrLys LysVal LysAsnGln AsnGln ThrThrSer Ser gag tta ctt agg aaa cag act tcg cat ttc aat cag aga ggc cac aga 525 Glu Leu Leu Arg Lys Gln Thr Ser His Phe Asn Gln Arg Gly His Arg gca agg tct aaa ctt ctg get tct aga caa att cct gat aga aca ttt 573 Ala Arg Ser Lys Leu Leu Ala Ser Arg Gln Ile Pro Asp Arg Thr Phe aaa tgt ggg aag tgg ctt ccc cag gtc cca tcc cct gtt tagggataga 622 Lys Cys Gly Lys Trp Leu Pro Gln Val Pro Ser Pro Val gttgatatcatttttatagttgccatgtatgcctctgcctgaatttttttaattgacttt682 tgagcttttgagattgcacgagggagaacaaggcctttgctgttgtggataggaaagact742 taacctaaaattaaaccagcaagaaagcattagtaaaaatctaacaatatgaagggctct802 tatgagtcatttttttcaaaagatgaaaactccagaaacgcacaggaacgaaatacctcc862 cagaaacatgaagcaatcatcgaagactcactggtaatatttttaaaaagtatacagatc922 aaagcaaaaagaagccatgtgtnaacaaagagaaatgtgcaaatattttttaaggcagta982 ttaagtgcaagaggagtaacatgaaataaacattctttcacatggctactgggaatataa1042 atttcgctccagaaaggccgtagcagtttgacgataggtggcaaaaccttaagattgtgt1102 actggggcccagaatttttatttctaggaatgtatcctga.ggaaattatccgagatcccc1162 acaaactgcaatgtttaggaattgtccttatagcattgcatacacaagaaaaacagagaa1222 aagcctgatccctgtcagtggaaaaggggttcaatgaattacggtgtgtctgcatgaggc1282 ttttatgacattaaaaattgttgaacaacggccaggcacagtggctcatgcctgtaatcc1342 
taacactttgggaggccaaggtgggaagattgcctgagctcaggagtttgagaccagcct1402 gggcaacacggtgaaaccccgtctctactaaaatacaaaaaattagccgggcgtcgcagc1462 atgcgcctgtagtcccagctgctcaggaggctgaggcaggagaattgattgaacccggga1522 ggcagaggttgcactgagctgagattaagccaccgcactccagcctgggcgacagagcaa1582 gattccgttcccaagaaaaaaaaattgttcaacaataagggncaaagggagagaatcata1642 acatctgattaaacagaaaaagcaagatttttaaaactaactatataaggatggtcccag1702 ctgtgtcaaaaggaagcttgtttgtaatacgtgtgcataaaaattaaatagaggtgaaca1762 caattattttaaggcagttaaattatctctgtattgtgaactaagactttctagaatttt1822 acttattcattctgtacttaaattttttctaatgaacacatatacttttgtaatcagaaa1882 atattaaatgcatgtatttttcaaaaaaaaaaaaaa 1918 <210> 61 <211> 852 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 229..735 <220>

<221> sig_peptide <222> 229..492 <223> Von Heijne matrix score 6.70 seq VFALSSFLNKASA/VY
<220>
<221> polyA_signal <222> 816..821 <220>
<221> polyA_site <222> 841..852 <400> 61 aatgactggc ctgcgtcggg gtcggttctg agtggcatca cagcgctgta gcgatggcgg tcgtgtcgcc ctctgatctt cctgcatggc caggtgatt ggcagggagg t catagcgcct ctggacaagg aggtttttaa atcaagattt acattccaa attaagaatg a tggatcaagc cacataaaaa cccagatcat atactcct atgaaagga 237 ttatttatcc aacagctcct MetLysGly ggaatctcc aatgtatgg tttgac agatttaaa ataacc aatgactgc 285 GlyIleSer AsnValTrp PheAsp ArgPheLys IleThr AsnAspCys ccagaacac cttgaatca attgat gtcatgtgt caagtg cttactgat 333 ProGluHis LeuGluSer IleAsp ValMetCys GlnVal LeuThrAsp ttgattgat gaagaagta aaaagt ggcatcaag aagaac aggatatta 381 LeuIleAsp GluGluVal LysSer GlyIleLys LysAsn ArgIleLeu ataggagga ttctctatg ggagga tgcatggca atgcat ttagcatat 429 IleGlyGly PheSerMet GlyGly CysMetAla MetHis LeuAlaTyr agaaatcat caagatgtg gcagga gtatttget ctttct agttttctg 477 ArgAsnHis GlnAspVal AlaGly ValPheAla LeuSer SerPheLeu aataaagca tctgetgtt taccag getcttcag aagagt aatggtgta 525 AsnLysAla SerAlaVal TyrGln AlaLeuGln LysSer AsnGlyVal cttcctgaa ttatttcag tgtcat ggtactgca gatgag ttagttctt 573 LeuProGlu LeuPheGln CysHis GlyThrAla AspGlu LeuValLeu 15 20 . 25 cattcttgg gcagaagag acaaac tcaatgtta aaatct ctaggagtg 621 His Ser Trp Ala Glu Glu Thr Rsn Ser Met Leu Lys Ser Leu Gly Val accacgaagttt cat agtttt aatgtt tac cat gag cta aaa 669 cca agc ThrThrLysPhe His SerPhe AsnVal Tyr His Glu Leu Lys Pro Ser actgagttagac ata ttgaag tggatt ctt aca aag ctg gga 717 tta cca ThrGluLeuAsp Ile LeuLys TrpIle Leu Thr Lys Leu Gly Leu Pro gaaatggaaaaa caa aaatgaatgaatcaagagtgatt 765 tgttaatgta GluMetGluLys Gln Lys agtgtaatgt ctttgtgaaa actgccaaattataatgata 825 agtgattttt attaaaatat taagaaatag caaaaaaaaa 852 aaaaaaa <210> 62 <211> 726 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 168..413 <220>
<221> sig_peptide <222> 168..335 <223> Von Heijne matrix score 3.80 seq QMIMLVCFNLSRG/CL
<220>
<221> polyA_signal <222> 684..689 <220>
<221> polyA_site <222> 708..726 <220>
<221> misc_feature <222> 723 <223> n=a, g, c or t <400>

cagcaaaatggcagggaagg cagctctaagctcccatccttccataggaatgttgaataa 60 acaaccagacactgtcagaa ccaactttgtgagaaccgggaaaataatcaaaggtgtacg 120 gcaactaaaagaatgctgga tcaacacaaaggaaacttaaaaatgat aaa get 176 atg Met Lys Ala gtg tgg ttt tgc ttg tcc aag tcc ttg gtg gtc ttg 224 cat cac agc ata Val Trp Phe Cys Leu Ser Lys Ser Leu Val Val Leu His His Ser Ile aag acg ggc tgg att ccc get ggg ctt atc ggt tcc 272 gca cag acc cct Lys Thr Gly Trp Ile Pro Ala Gly Leu Ile Gly Ser Ala Gln Thr Pro aga gag agc aga tct gat caa atg atg ctt tgt ttt 320 gag tca att gtc Arg Glu Ser Arg Ser Asp Gln Met Met Leu Cys Phe Glu Ser Ile Val aat ctt aga ggc tgt ctg aag gta atc atc gtt tta 368 tcc aag ttc tct Asn Leu Arg Gly Cys Leu Lys Val Ile Ile Val Leu Ser Lys Phe Ser cct gac gaa acc att ctg gga aaa gtg ggc get 413 cca cta aca att Pro Asp Glu Thr Ile Leu Gly Lys Val Gly Ala Pro Leu Thr Ile tgaaaacagtgttctgtggt tgaaaaacccacagtcaccttgggctggtgggaatgtaaa 473 atggcgcctcttctggatca tcgtttggcagtttctcaaaaggtcaaacgtagaatcact 533 atttgatccaacaattctac tcctaggtatatccccaaaagaattgaaaacaaggatgca 593 aacatatgcgtgtacactaa tgtttatagaaaaaatattcacaataatcaaaaggcagaa 653 acaacccaagtgtccaataa cagaagaatgaataaacagtgtgatataaacataaaaaaa 713 aaaaaaaaanaaa 726 <210> 63 <211> 1039 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 100..852 <220>
<221> sig_peptide <222> 100..159 <223> Von Heijne matrix score 6.10 seq FLILFLFLMECQL/HL
<220>
<221> polyA_signal <222> 998..1003 <220>
<221> polyA-site <222> 1019..1039 <400> 63 agaacttctt gattcctcag ataaatagag gacagatgct ggactgtagc taagtatttc 60 ctttcatcta cgggataaaa tactgataat ttgagagtg atg gac aag gtt cag 114 Met Asp Lys Val Gln agtggtttc ctcattttg tttttg tttttaatggaa tgccaactt cat 162 SerGlyPhe LeuIleLeu PheLeu PheLeuMetGlu CysGlnLeu His ttatgcttg ccgtatgca gatgga ctccatcccact ggaaacata aca 210 LeuCysLeu ProTyrAla AspGly LeuHisProThr GlyAsnIle Thr ggcttacca ggtagcttc aaccac tggttttatgtg actcaggga gaa 258 GlyLeuPro GlySerPhe AsnHis TrpPheTyrVal ThrGlnGly Glu ttgaaaagc tgtttcagg ggagat aaaaagaaggta attacattt cac 306 LeuLysSer CysPheArg GlyAsp LysLysLysVal IleThrPhe His cgcaaaaag ttttctttt caaggc agtaaacggtca caaccaccc aga 354 ArgLysLys PheSerPhe GlnGly SerLysArgSer GlnProPro Arg aacatcacc aaagagccc aaagtg ttctttcataaa acccagttg cct 402 AsnIleThr LysGluPro LysVal PhePheHisLys ThrGlnLeu Pro gggattcaa ggggetgcc tcgaga tccacggetgca tcccctacg aac 450 GlyIleGln GlyAlaAla SerArg SerThrAlaAla SerProThr Asn cccatgaaa ttcctgagg aataaa gcaataattcgg catagacct get 498 ProMetLys PheLeuArg AsnLys AlaIleIleArg HisArgPro Ala cttgttaaa gtaatttta atttcg agcgtagccttc agcattgcc ctg 546 Leu Val Lys Val Ile Leu Ile Ser Ser Val Ala Phe Ser Ile Ala Leu atatgtgggatg gcaatctcc tatatgata tatcga ctggca cagget 594 IleCysGlyMet AlaIleSer TyrMetIle TyrArg LeuAla GlnAla gaggaaagacaa cagctcgag tcactttat aagaac ctcagg ataccg 642 GluGluArgGln GlnLeuGlu SerLeuTyr LysAsn LeuArg IlePro ttattaggagat gaagaagag ggctcagag gacgag ggtgag tccacg 690 LeuLeuGlyAsp GluGluGlu GlySerGlu AspGlu GlyGlu SerThr cacctacttcca aagaacgaa aatgagctg gaaaag ttcatc cactca 738 HisLeuLeuPro LysAsnGlu AsnGluLeu GluLys PheIle HisSer gttattatatca aaaagaagc aaaaatatt aagaag aaactg aaggaa 786 ValIleIleSer LysArgSer LysAsnIle LysLys LysLeu LysGlu gagcaaaactca gtaacagaa aacaaaaca aagaat gcgtca cataat 834 GluGlnAsnSer ValThrGlu AsnLysThr LysAsn AlaSer HisAsn ggaaaaatggaa gacttgtgaacgca ga cgacagaggt 882 gccggctgag GlyLysMetGlu AspLeu gcagaggaga aactatgggg actgagc 
ctgtgggcgtggc gtgctgggag ttgctcccag agaaccttat ggaagaggac gaaatgc cagacctgtatcc atcaaagaaa cagaaaataa agccacatga tatagcaaaa aaaaaaa 1039 aaaaaaaaaa <210> 64 <211>1355 <212>DNA

<213> Homo Sapiens <220>

<221> CDS

<222> 238..1152 <220>
<221> sig_peptide <222> 238..339 <223> Von Heijne matrix score 8.50 seq SIFLLLSFPDSNG/KA

<220>
<221> polyA_signal <222> 1298..1303 <220>
<221> polyA_site <222> 1324..1355 <400>

aattttcttg aaatcacatg tattttgttt cattatgaga gtaccaatca caagtcttgt aagataatct actaaatatt atagctttga tccagggaga aaaatactgg aaggagcaag ccttttccat ttatgtgctt gctatcttct ttatgttctt tagtaatctg ccgccaacaa ctacaactga tgttgttttg aatagacaaa tggaggc ttttctcatg tttgtctctt atgagcttc cttaga attacccct tcgacgcat agttctgtttca tct 285 MetSerPhe LeuArg IleThrPro SerThrHis SerSerValSer Ser ggacttttg aggctt agtatcttt ctactactt agctttcctgac tca 333 GlyLeuLeu ArgLeu SerIlePhe LeuLeuLeu SerPheProAsp Ser aacggaaaa gccatt tggacaget cacctgaat ataacatttcag gtt 381 AsnGlyLys AlaIle TrpThrAla HisLeuAsn IleThrPheGln Val ggaaatgag atcaca tcggaatta ggagagagt ggagtgttcggg aat 429 GlyAsnGlu IleThr SerGluLeu GlyGluSer GlyValPheGly Asn cattctcct ctggaa agggtgtct ggtgtggtg gcacttcctgaa gaa 477 HisSerPro LeuGlu ArgValSer GlyValVal AlaLeuProGlu Glu tggaatcag aatgcc tgtcatcct ttgaccaat ttcagcaggccc aaa 525 TrpAsnGln AsnAla CysHisPro LeuThrAsn PheSerArgPro Lys caggcagac tcatgg ctggccctc atcgaacgt ggaggctgtact ttt 573 GlnAlaAsp SerTrp LeuAlaLeu IleGluArg GlyGlyCysThr Phe acacataaa atcaac gtggcagca gagaaggga gcaaatggggtg atc 621 ThrHisLys IleAsn ValAlaAla GluLysGly AlaAsnGlyVal Ile atctacaac tatcaa ggtacgggc agtaaagta tttcccatgtct cac 669 IleTyrAsn TyrGln GlyThrGly SerLysVal PheProMetSer His caggggacg gaaaat atagtcgcg gtgatgata agcaacctgaaa ggc 717 Gln Gly Thr Glu Asn Ile Val Ala Val Met Ile Ser Asn Leu Lys Gly atggaa ttg cactcg attcagaaagga gtctat gtgaca gtcatc 765 att MetGlu Leu HisSer IleGlnLysGly ValTyr ValThr ValIle Ile attgaa ggg agaatg cacatgcagtgg gtgagc cattac atcatg 813 gtg IleGlu Gly ArgMet HisMetGlnTrp ValSer HisTyr IleMet Val tatcta acc ttcctg getgccacaatt gcctac ttttac ttagat 861 ttt TyrLeu Thr PheLeu AlaAlaThrIle AlaTyr PheTyr LeuAsp Phe tgcgtc aga cttaca cctagagtgccc aattct ttcacc aggagg 909 tgg CysVal Arg LeuThr ProArgValPro AsnSer PheThr ArgArg Trp cgaagt ata aagaca gatgtgaagaaa getatt gaccag cttcaa 957 caa ArgSer Ile LysThr AspValLysLys AlaIle AspGln LeuGln Gln ctgcga ctc aaagaa 
ggggatgaggaa ttagac ctaaat gaagac 1005 gtt LeuArg Leu LysGlu GlyAspGluGlu LeuAsp LeuAsn GluAsp Val aactgt gtt tgcttt gacacatacaaa ccccaa gatgta gtacgc 1053 gtt AsnCys Val CysPhe AspThrTyrLys ProGln AspVal ValArg Val atttta tgc aaacat tttttccataag gcatgc attgac ccctgg 1101 act IleLeu Cys LysHis PhePheHisLys AlaCys IleAsp ProTrp Thr ctttta cat aggaca tgtcccatgtgc aagtgt gacatc ctgaaa 1149 gcc LeuLeu His ArgThr CysProMetCys LysCys AspIle LeuLys Ala acttaagaaatct ttccaaatac 1202 ggagaatttt ctgaagatgt aaccagatct Thr aaagattaga taaattgtct tatgtagaga gaaaatttca tattgtactt gcttctctac ccaagtatga acaagggtga ttaaaaataa aactccttat aatttgtgtt catgcccagc taaaaaaaaa aaaaaaaaaa aaa 1355 aaaaaaaaaa <210>

<211> 572 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 187..369

<220>
<221> sig_peptide <222> 187..312 <223> Von Heijne matrix score 7.10 seq LLPCSSVLTCGQA/SQ
<220>
<221> polyA_signal <222> 489..494 <220>
<221> polyA_site <222> 558..572 <220>
<221> misc_feature <222> 94,527,537..538 <223> n=a, g, c or t <400>

cttcttcagtcagtggctgg ataatctaattataatgttataatccatca tttctctttt60 tgaacagtcaatttagttta acatttgcttaacnagccattatgtatgcc aggtaatgtg120 ctagatgctggtggttcaaa gaaaggaacgatgtggacctgacctcaaag aaatccattg180 gagaat aca gat tta gat tg atc 228 atg tta a aac ttt act ttt cct ata Met Thr Asp Leu Asp et Ile Leu M Asn Phe Thr Phe Pro Ile cag tgg aac caa aac cgc gcg tac tct ctg aag cct cta 276 gtc atg tac Gln Trp Asn Gln Asn Arg Ala Tyr Ser Leu Lys Pro Leu Val Met Tyr cta ccc tcc tcc gtg ttg tgt ggt gca agc cag gac tta 324 tgc aca cag Leu Pro Ser Ser Val Leu Cys Gly Ala Ser Gln Asp Leu Cys Thr Gln ctc aca get aca tca gtt ggg atg aaa att gaa gcc 369 tca act gag Leu Thr Ala Thr Ser Val Gly Met Lys Ile Glu Ala Ser Thr Glu tagaaagatcaagaaacttt ctccaggccataaatagaggaatcaggatt caaatcagat429 agaccccagggcttgttctc ttcaacaccacattaccctacattattatt caattattaa489 ataaaaccttgcattagtgg catttccaaatgcataancaaaaaaatnna aaaaaaagta549 acactggcaaaaaaaaaaaa aaa 572 WO 00/37491 PCTlIB99/02058 <210> 66 <211> 535 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 121..459 <220>
<221> sig_peptide <222> 121..165 <223> Von Heijne matrix score 4.20 seq FYLLLASSILCAL/IV
<220>
<221> polyA_signal <222> 497..502 <220>
<221> polyA_site <222> 521..535 <220>
<221> misc_feature <222> 486,489 <223> n=a, g, c or t <400>

agttacacca cccaaatcca ggcggctaga ggcatcctgg ggcccactgc cccaaagttt ttcccaacta cgagaaggga gaagaggccg ccagctgagg aagaggaaac gggtccgtcc atgaacttctat ttactccta gcgagcagc attctgtgt gccttg att 168 MetAsnPheTyr LeuLeuLeu AlaSerSer IleLeuCys AlaLeu Ile gtcttctggaaa tatcgccgc tttcagaga aacactggc gaaatg tca 216 ValPheTrpLys TyrArgArg PheGlnArg AsnThrGly GluMet Ser tcaaattcaact getcttgca ctagtgaga ccctcttct tctggg tta 264 SerAsnSerThr AlaLeuAla LeuValArg ProSerSer SerGly Leu attaacagcaat acagacaac aatcttgca gtctacgac ctctct cgg 312 IleAsnSerAsnThr AspAsn AsnLeuAla ValTyr AspLeuSer Arg gatattttaaataat ttccca cactcaata gccagg cagaagcga ata 360 AspIleLeuAsnAsn PhePro HisSerIle AlaArg GlnLysArg Ile ttggtaaacctcagt atggtg gaaaacaag ctggtt gaactggaa cat 408 LeuValAsnLeuSer MetVal GluAsnLys LeuVal GluLeuGlu His actctacttagcaag ggtttc agaggtgca tcacct caccggaaa tcc 456 ThrLeuLeuSerLys GlyPhe ArgGlyAla SerPro HisArgLys Ser acctaaaagcgta caggatgtaa aaagacactt 509 tgccagnggn ggaaatcatt Thr tgagtagatt caaaaaaaaa 535 aaaaaa <210> 67 <211> 572 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 34..336 <220>
<221> sig_peptide <222> 34..123 <223> Von Heijne matrix score 7.80 seq SVTLAQLLQLVQQ/GQ
<220>
<221> polyA_signal <222> 536..541 <220>
<221> polyA_site <222> 556..572 <220>
<221> misc_feature <222> 545 <223> n=a, g, c or t <400>

gcattacacg ccggtcagga ttcgcgaccc gac atg gag cgt ccc cgg agt ccc 54 Met Glu Arg Pro Arg Ser Pro caa tgc tcg gcc ccg gcc tct gcc tca gct tcg gtt acc ctg gcg cag 102 Gln Cys Ser Ala Pro Ala Ser Ala Ser Ala Ser Val Thr Leu Ala Gln ctc ctg cag ctg gtc cag cag ggc cag gaa ctc ccg ggc ctg gag aaa 150 Leu Leu Gln Leu Val Gln Gln Gly Gln Glu Leu Pro Gly Leu Glu Lys cgc cac atc gcg gcg atc cac ggc gaa ccc aca gcg tcc cgg ctg ccg 198
Arg His Ile Ala Ala Ile His Gly Glu Pro Thr Ala Ser Arg Leu Pro cgg agg ccc aag ccc tgg gag gcc gcg gct ttg gct gag tcc ctt ccc 246 Arg Arg Pro Lys Pro Trp Glu Ala Ala Ala Leu Ala Glu Ser Leu Pro cct ccg acc ctc agg ata gga acg gcc ccg gcg gag cct ggc ttg gtt 294 Pro Pro Thr Leu Arg Ile Gly Thr Ala Pro Ala Glu Pro Gly Leu Val gag gca gcg act gcg cct tct tca tgg cat aca gtg ggc ccc 336 Glu Ala Ala Thr Ala Pro Ser Ser Trp His Thr Val Gly Pro tgaggttcca ggtcctttgc tggagggcgt ggctacagga cccgggatgc396 ggcggcgatc cattcagtta ctcatctttt cctgacctgt ctcaactaga cttgctcctg456 atgctttcgt caaccaccat gggggttttg tgtggaccat gttacagtta agaaaaatcc516 catttacatt tgtttcagtc cttatatgta ttatgat gcaaaaaaaaaaa aaaaaa ataaaatgnt <210> 68 <211> 804 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 119..409 <220>
<221> sig_peptide <222> 119..388 <223> Von Heijne matrix score 4.30 seq TCLTACWTALCCC/CL
<220>
<221> polyA_signal <222> 769..774 <220>
<221> polyA_site <222> 789..804 <220>
<221> misc_feature <222> 274 <223> n=a, g, c or t <220>
<221> unsure <222> -39 <223> Xaa = His,Gln <400>

acttgctc tg ctgcgggctg gtccgggctc 60 agacaggtgc ctcaggttca ggcaagtcta gacccgac cg gaggagaggt gcactttaca 118 ttatccagtc ggtcccca ggttcgtgga atgaac caagagaac cctcca ccatatcca ggccctggt ccaacg gcc 166 MetAsn GlnGluAsn ProPro ProTyrPro GlyProGly ProThr Ala ccatac ccaccttat ccacca caaccaatg ggtccagga cctatg ggg 214 ProTyr ProProTyr ProPro GlnProMet GlyProGly ProMet Gly ggaccc tacccacct cctcaa gggtacccc taccaagga taccta cag 262 GlyPro TyrProPro ProGln GlyTyrPro TyrGlnGly TyrLeu Gln tacggc tggcanggt ggacct caggagcct cctaaaacc acagtg tat 310 TyrGly TrpXaaGly GlyPro GlnGluPro ProLysThr ThrVal Tyr gtggta gaagaccaa agaaga gatgagcta ggaccatcc acctgc ctc 358 ValVal GluAspGln ArgArg AspGluLeu GlyProSer ThrCys Leu acagcc tgctggacg getctc tgttgctgc tgtctctgg gacatg ctc 406 ThrAla CysTrpThr AlaLeu CysCysCys CysLeuTrp AspMet Leu acc tgaccagacc agcccagccg tcctgtcctg ccagctctgc tgccacctct 459 Thr gacaggtgtgcctgcccccatctcttctgattgctgttaacaaatgactagctttgcaca 519 gacacctctaccttcagcactatgggattctagattaatgggggttgctactgtttaatt 579 cagtgacttgatctttttaatgtccaaaatccatttcttattgatctttaaagatgtgct 639 aaatgacttttttggccaaaggcttagttgtgaaaaatataatttttaaattatacattc 699 aaggtagtggccaaatgtaacacatcaatcatggaatgatttctctgctaacagccgcct 759 gtatgtttcaataaatttgtccaaagctcaaaaaaaaaaaaaaaa 804 <210> 69 <211> 629 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 232..534 <220>
<221> sig_peptide <222> 232..306 <223> Von Heijne matrix score 3.70 seq AKTCLVLCSRVLS/VI
<220>
<221> polyA_signal <222> 595..600 <220>
<221> polyA site <222> 615..629 <400> 69 tatcactgtt acgaaccaag gatttacaga tcactggcaa aaattctgag aactttcaca 60 ccagtatact gtccaagccc attaagtggc atcacacctc tcttttatgt agctcagaca 120 agacagtcta atatcttcaa aatactactg caatatggaa tcttagaaag agaaaaaaac 180 cctatcaaca ttgtcttaac aatagtactc tacccttcga gagtaagagt a atg gtt 237 Met Val gat cgt gaa ttg get gac atc cat gaa gat gcc aaa aca tgt ttg gta 285 Asp Arg Glu Leu Ala Asp Ile His Glu Asp Ala Lys Thr Cys Leu Val ctatgttccaga gtgctttct gtcatt tcagtcaag gaaataaag aca 333 LeuCysSerArg ValLeuSer ValIle SerValLys GluIleLys Thr cagctgagttta ggaagacat ccaatt atttcaaat tggtttgat tac 381 GlnLeuSerLeu GlyArgHis ProIle IleSerAsn TrpPheAsp Tyr attccttcaaca agatacaaa gatcca tgtgaacta ttacatctt tgc 429 IleProSerThr ArgTyrLys AspPro CysGluLeu LeuHisLeu Cys agactaaccatc aggaatcaa ctatta accaacaat atgctccca gat 477 ArgLeuThrIle ArgAsnGln LeuLeu ThrAsnAsn MetLeuPro Asp ggaatattttca cttctaatt cctget cgtctacaa aactatctg aat 525 GlyIlePheSer LeuLeuIle ProAla ArgLeuGln AsnTyrLeu Asn ttagaaatctaacatacgt cagtgtccta 574 agttccttaa caatgcttac LeuGluIle caatgtatgg cttagaagtt ataaaaatt aaaaaaaaaa aaaaa a cacttcatgc <210> 70 <211> 669 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 140..595 <220>
<221> sig_peptide <222> 140..442 <223> Von Heijne matrix score 4.10 seq VFMLIVSVLALIP/ET
<220>
<221> polyA_signal <222> 630..635 <220>

<221> polyA_site <222> 655..669 <400> 70 gagcgggaag gggagggcgg tgctccgccg 60 ccgagctggg cggtggcggt cgagaagtag tgctatcgct c ctcaggcagccagct gagaagagttgag gctgc 120 tcgcagaac ta ggatt tgctgggtct atggat aacgtg ccgaaa ataaaacat cgc 172 gcagacgcg cag MetAsp AsnVal GlnProLys IleLysHis Arg ccc ttctgcttc agtgtgaaa ggccac gtgaagatg ctgcggctg gca 220 Pro PheCysPhe SerValLys GlyHis ValLysMet LeuArgLeu Ala cta actgtgaca tctatgacc tttttt atcatcgca caagcccct gaa 268 Leu ThrValThr SerMetThr PhePhe IleIleAla GlnAlaPro Glu cca tatattgtt atcactgga tttgaa gtcaccgtt atcttattt ttc 316 Pro TyrIleVal IleThrGly PheGlu ValThrVal IleLeuPhe Phe ata cttttatat gtactcaga cttgat cgattaatg aagtggtta ttt 364 Ile LeuLeuTyr ValLeuArg LeuAsp ArgLeuMet LysTrpLeu Phe tgg cctttgctt gatattatc aactca ctggtaaca acagtattc atg 412 Trp ProLeuLeu AspIleIle AsnSer LeuValThr ThrValPhe Met ctc atcgtatct gtgttggca ctgata ccagaaacc acaacattg aca 460 Leu IleValSer ValLeuAla LeuIle ProGluThr ThrThrLeu Thr gtt ggtggaggg gtgtttgca cttgtg acagcagta tgctgtctt gcc 508 Val GlyGlyGly ValPheAla LeuVal ThrAlaVal CysCysLeu Ala gac ggggccctt atttaccgg aagctt ctgttcaat cccagcggt cct 556 Asp GlyAlaLeu IleTyrArg LysLeu LeuPheAsn ProSerGly Pro tac cagaaaaag cctgtgcat gaaaaa aaagaagtt ttgtaattttata 605 Tyr GlnLysLys ProValHis GluLys LysGluVal Leu ttacttttta gtttgatact tattcttcca 665 aagtattaaa aaaaaaaaaa catatttctg aaaa 669 <210> 71 <211> 973 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 32..658 <220>
<221> sig_peptide <222> 32..289 <223> Von Heijne matrix score 4.00 seq KLWKLLFLMKSQG/WI
<220>
<221> polyA_signal <222> 936..941 <220>
<221> polyA_site <222> 959..973 <220>
<221> misc_feature <222> 934 <223> n=a, g, c or t <400>

agggagag gg c 52 atggctagtg atg aggtttagat ttg agc cct acc ttt gtt Met Leu Ser Pro Thr Phe Val ttgtgg gatgttgga tatccctta tacacctat ggatcc atctgcatt 100 LeuTrp AspValGly TyrProLeu TyrThrTyr GlySer IleCysIle attgca ttaattatt tggcaagtg aaaaagagc tgccaa aaattaagc 148 IleAla LeuIleIle TrpGlnVal LysLysSer CysGln LysLeuSer ttggta cctaacagg agctgttgc cggtgtcac cgaaga gtccaacaa 196 LeuVal ProAsnArg SerCysCys ArgCysHis ArgArg ValGlnGln aagtct ggagataga acatcaaga gctaggaga acttca caggaagaa 244 LysSer GlyAspArg ThrSerArg AlaArgArg ThrSer GlnGluGlu gcc gag aag ttg tgg aag ctg ctg ttt ctc atg aaa agc cag ggc tgg 292 Ala Glu Lys Leu Trp Lys Leu Leu Phe Leu Met Lys Ser Gln Gly Trp att cct cag gaa gga agt gtg cgg cga atc ctg tgt gca gac ccc tgc 340 Ile Pro Gln Glu Gly Ser Val Arg Arg Ile Leu Cys Ala Asp Pro Cys tgc caa atc tgc aat gtt atg gct ctg gag att aag caa ttg ctg gca 388 Cys Gln Ile Cys Asn Val Met Ala Leu Glu Ile Lys Gln Leu Leu Ala gaagct ccagaa gttggc ttggataac aagatgaag ctg ctg cac 436 ttt GluAla ProGlu ValGly LeuAspAsn LysMetLys Leu Leu His Phe tggatt aaccct gaaatg aaagatcga aggcatgag gaa att ctc 484 tcc TrpIle AsnPro GluMet LysAspArg ArgHisGlu Glu Ile Leu Ser ctttct aaggct gagaca gtgacccaa gacaggaca aaa att gag 532 aac LeuSer LysAla GluThr ValThrGln AspArgThr Lys Ile Glu Asn aagagt ccaact gtcacc aaagatcat gtgtgggga gct aca cag 580 aca LysSer ProThr ValThr LysAspHis ValTrpGly Ala Thr Gln Thr aagaca acagag gaccct gaggctcag cctccttct act gag gaa 628 gag LysThr ThrGlu AspPro GluAlaGln ProProSer Thr Glu Glu Glu ggcctg atcttc tgtgat gcccccagt gcctaaataatct tagcaa 678 gctc GlyLeu IlePhe CysAsp AlaProSer Ala cactcccttc agtccagcca cctacaaatgctccaaactc738 atcctgggtc ctgtgccact tgtcctcaaa tgacttgtgc cccaggtctaactcacctca798 cactcaacca ggaaatctat gcagaaggca ctgttttatg aaaaggagttcataggttcc858 caagaatacc catcacaaga tgaacctctg caatcccctg ttccattaacatgcaggtga918 aaaaaggctt tcattgccat agcagggcat tctccnaaat aaaaaaaaaaaaaaa 973 atactttgta cctttaagct <210> 72 <211> 791 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 14..280 <220>
<221> sig_peptide <222> 14..76 <223> Von Heijne matrix score 9.50 seq ALWLCAFQLVAA/LE
<220>
<221> polyA_site <222> 776..791 <220>
<221> misc_feature <222> 607 <223> n=a, g, c or t <400> 72 ataggcgcgc 49 act atg ggc tcc tgc tcc ggc cgc tgc gcg ctc gtc gtc Met Gly Ser Cys Ser Gly Arg Cys Ala Leu Val Val ctctgcgct tttcagctg gtcgccgcc ctggagagg caggtgttt gac 97 LeuCysAla PheGlnLeu ValAlaAla LeuGluArg GlnValPhe Asp ttcctgggc tatcagtgg gcgcccatc ctggccaac tttgtccat atc 145 PheLeuGly TyrGlnTrp AlaProIle LeuAlaAsn PheValHis Ile atcatcgtc atcctggga ctcttcggc actatccag tatcggctg cgc 193 IleIleVal IleLeuGly LeuPheGly ThrIleGln TyrArgLeu Arg tatgtcatg tgtacacgc tgtgggcag ccgtctggg tcacctgga acg 241 TyrValMet CysThrArg CysGlyGln ProSerGly SerProGly Thr tcttcatca tctgcttct acttggaag tcggtggcc tcttaaaggacag 290 SerSerSer SerAlaSer ThrTrpLys SerValAla Ser cgagctactgaccttcagcctctcccggcatcgctcctggtggcgtgagcgctggccagg350 ctgtctgcatgaggaggtgccagcagtgggcctcggggccccccatggccaggccctggt410 gtcaggtgctggctgtgccatggagcccagctatgtggaggccctacacagttgcctgca470 gatcctgatcgcgcttctgggctttgtctgtggctgccaggtggtcagcgtgtttacgga530 ggaagaggacagctgcctgcgtaagtgaggaaacagctgatcctgctcctgtggcctcca590 gcctcagcgaccgaccnagtgacaatgacaggagctcccaggccttgggacgcgccccca650 cccagcaccccccaggcggccggcagcacctgccctgggttctaagtactggacaccagc710 cagggcggca gggcagtgcc acggctggct gcagcgtcaa gagagtttgt aatttccttt 770 ctcttaaaaa aaaaaaaaaa a 791 <210> 73 <211> 1110 <212> DNA
<213> Homo Sapiens <220>
<221> CDS
<222> 93..290 <220>
<221> sig_peptide <222> 93..149 <223> Von Heijne matrix score 9.30 seq VFVFLFLWDPVLA/GI
<220>
<221> polyA_signal <222> 1078..1083 <220>
<221> polyA_site <222> 1096..1110 <400>

agtatagg ac tgtgtgctca tctgttccct gacagccgat gtcagaccct 60 acctcttctc gccactagcc tccttaacag cc atgaagcct ctccttgtt gtg 113 aagttcccag MetLysPro LeuLeuVal Val tttgtctttctt ttcctttgg gatcca gtgctggca ggtataaat tca 161 PheValPheLeu PheLeuTrp AspPro ValLeuAla GlyIleAsn Ser ttatcatcagaa atgcacaag aaatgc tataaaaat ggcatctgc aga 209 LeuSerSerGlu MetHisLys LysCys TyrLysAsn GlyIleCys Arg cttgaatgctat gagagtgaa atgtta gttgcctac tgtatgttt cag 257 LeuGluCysTyr GluSerGlu MetLeu ValAlaTyr CysMetPhe Gln ctggagtgctgt gtcaaagga aatcct gcaccctgacataaga aaccaatgaa 310 Leu Glu Cys Cys Val Lys Gly Asn Pro Ala Pro tggccactatcctgtaggcccttgattctgccatctttcacaaaaccagggaatttagat370 caaactgtgacaccatgatgtgtccatgactactggtttttagcatttttataggccagc430 agactcttgtggtcttaaatttaaagagctgagctgtagccttctttaaaagagctcggt490 ttttcacaaaaacaatgtagaagatattttctcacctcaacgtgatgtccagtgtgctca550 tcagcacctgtttctccctctaatcatagaggatattcttattatttagaaaggcttcaa610 gggaaacaacttttgacacctaagtcgtgtcctaccttcgcttcagcttcgcatttccca670 tttctgtgaaattcccaacagagaagcagatttgccatggccttctgacaaccttgtaca730 tctctcacataaaccgcataggcagggcttgactacaggctggcccgagtctgcactgag790 tctgaccctgaagttcctttggaacaggagaggccatcttgtgatgggctggaacaaggt850 aatttctcatccacctccctagtttcagttgagcaatggaacttcccacctgagccccta910 gggttcagctacaggctataagactgccgtcctgtggtttagtgttggttccttagcagc970 agagtgatgccacctctgctgcccgtcatctgactcctctggatgggtgttatcctgtgg1030 cttaagagctaacaccatgctgatcttgctttgctatatgtgtaactaataaactgccta1090 aatccaaaaaaaaaaaaaaa 1110 <210> 74 <211> 325 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -26..-1 <400> 74 Met Ala Thr Pro Leu Pro Pro Pro Ser Pro Arg His Leu Arg Leu Leu Arg Leu Leu Leu Ser Gly Leu Val Leu Gly Ala Ala Leu Arg Gly Ala Ala Ala Gly His Pro Asp Val Ala Ala Cys Pro Gly Ser Leu Asp Cys Ala Leu Lys Arg Arg Ala Arg Cys Pro Pro Gly Ala His Ala Cys Gly Pro Cys Leu Gln Pro Phe Gln Glu Asp Gln Gln Gly Leu Cys Val Pro Arg Met Arg Arg Pro Pro Gly Gly Gly Arg Pro Gln Pro Arg Leu Glu Asp Glu Ile Asp Phe Leu Ala Gln Glu Leu Ala Arg Lys Glu Ser Gly His Ser Thr Pro Pro Leu Pro Lys Asp Arg Gln Arg Leu Pro Glu Pro Ala Thr Leu Gly Phe Ser Ala Arg Gly Gln Gly Leu Glu Leu Gly Leu Pro Ser Thr Pro Gly Thr Pro Thr Pro Thr Pro His Thr Ser Leu Gly Ser Pro Val Ser Ser Asp Pro Val His Met Ser Pro Leu Glu Pro Arg Gly Gly Gln Gly Asp Gly Leu Ala Leu Val Leu Ile Leu Ala Phe Cys Val Ala Gly Ala Ala Ala Leu Ser Val Ala Ser Leu Cys Trp Cys Arg Leu Gln Arg Glu Ile Arg Leu Thr Gln Lys Ala Asp Tyr Ala Thr Ala Lys Ala Pro Gly Ser Pro Ala Ala Pro Arg Ile Ser Pro Gly Asp Gln Arg Leu Ala Gln Ser Ala Glu Met Tyr His Tyr Gln His Gln Arg Gln Gln Met Leu Cys Leu Glu Arg His Lys Glu Pro Pro Lys Glu Leu Asp Thr Ala Ser Ser Asp Glu Glu Asn Glu Asp Gly Asp Phe Thr Val Tyr Glu Cys Pro Gly Leu Ala Pro Thr Gly Glu Met Glu Val Arg Asn Pro Leu Phe Asp His Ala Ala Leu Ser Ala Pro Leu Pro Ala Pro Ser Ser Pro Pro Ala Leu Pro <210> 75 <211> 302 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -18..-1 <400> 75 Met Lys Ala Pro Gly Arg Leu Val Leu Ile Ile Leu Cys Ser Val Val Phe Ser Ala Val Tyr Ile Leu Leu Cys Cys Trp Ala Gly Leu Pro Leu Cys Leu Ala Thr Cys Leu Asp His His Phe Pro Thr Gly Ser Arg Pro Thr Val Pro Gly Pro Leu His Phe Ser Gly Tyr Ser Ser Val Pro Asp Gly Lys Pro Leu Val Arg Glu Pro Cys Arg Ser Cys Ala Val Val Ser Ser Ser Gly Gln Met Leu Gly Ser Gly Leu Gly Ala Glu Ile Asp Ser Ala Glu Cys Val Phe Arg Met Asn Gln Ala Pro Thr Val Gly Phe Glu Ala Asp Val Gly Gln Arg Ser Thr Leu Arg Val Val Ser His Thr Ser Val Pro Leu Leu Leu Arg Asn Tyr Ser His Tyr Phe Gln Lys Ala Arg Asp Thr Leu Tyr Met Val Trp Gly Gln Gly Arg His Met Asp Arg Val Leu Gly Gly Arg Thr Tyr Arg Thr Leu Leu Gln Leu Thr Arg Met Tyr Pro Gly Leu Gln Val Tyr Thr Phe Thr Glu Arg Met Met Ala Tyr Cys Asp Gln Ile Phe Gln Asp Glu Thr Gly Lys Asn Arg Arg Gln Ser Gly Ser Phe Leu Ser Thr Gly Trp Phe Thr Met Ile Leu Ala Leu Glu Leu Cys Glu Glu Ile Val Val Tyr Gly Met Val Ser Asp Ser Tyr Cys Arg Glu Lys Ser His Pro Ser Val Pro Tyr His Tyr Phe Glu Lys Gly Arg Leu Asp Glu Cys Gln Met Tyr Leu Ala His Glu Gln Ala Pro Arg Ser Ala His Arg Phe Ile Thr Glu Lys Ala Val Phe Ser Arg Trp Ala Lys Lys Arg Pro Ile Val Phe Ala His Pro Ser Trp Arg Thr Glu <210> 76 <211> 249 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL

<222> -15..-1 <400> 76 Met Leu Gln Leu Trp Lys Leu Val Leu Leu Cys Gly Val Leu Thr Gly Thr Ser Glu Ser Leu Leu Asp Asn Leu Gly Asn Asp Leu Ser Asn Val Val Asp Lys Leu Glu Pro Val Leu His Glu Gly Leu Glu Thr Val Asp Asn Thr Leu Lys Gly Ile Leu Glu Lys Leu Lys Val Asp Leu Gly Val Leu Gln Lys Ser Ser Ala Trp Gln Leu Ala Lys Gln Lys Ala Gln Glu Ala Glu Lys Leu Leu Asn Asn Val Ile Ser Lys Leu Leu Pro Thr Asn Thr Asp Ile Phe Gly Leu Lys Ile Ser Asn Ser Leu Ile Leu Asp Val Lys Ala Glu Pro Ile Asp Asp Gly Lys Gly Leu Asn Leu Ser Phe Pro Val Thr Ala Asn Val Thr Val Ala Gly Pro Ile Ile Gly Gln Ile Ile Asn Leu Lys Ala Ser Leu Asp Leu Leu Thr Ala Val Thr Ile Glu Thr Asp Pro Gln Thr His Gln Pro Val Ala Val Leu Gly Glu Cys Ala Ser Asp Pro Thr Ser Ile Ser Leu Ser Leu Leu Asp Lys His Ser Gln Ile Ile Asn Lys Phe Val Asn Ser Val Ile Asn Thr Leu Lys Ser Thr Val Ser Ser Leu Leu Gln Lys Glu Ile Cys Pro Leu Ile Arg Ile Phe Ile His Ser Leu Asp Val Asn Val Ile Gln Gln Val Val Asp Asn Pro Gln His Lys Thr Gln Leu Gln Thr Leu Ile <210>77 <211>84 <212>PRT

<213>Homo Sapiens <400> 77 Met Lys Val Lys Ile Lys Cys Trp Asn Gly Val Ala Thr Trp Leu Trp Val Ala Asn Asp Glu Asn Cys Gly Ile Cys Arg Met Ala Phe Asn Gly Cys Cys Pro Asp Cys Lys Val Pro Gly Asp Asp Cys Pro Leu Val Trp Gly Gln Cys Ser His Cys Phe His Met His Cys Ile Leu Lys Trp Leu His Ala Gln Gln Val Gln Gln His Cys Pro Met Cys Arg Gln Glu Trp Lys Phe Lys Glu <210> 78 <211> 554 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -13..-1 <220>
<221> UNSURE
<222> 259 <223> Xaa = Asp, His,Asn,Tyr <400> 78 Met Leu Tyr Leu Gln Gly Trp Ser Met Pro Ala Val Ala Glu Val Lys Leu Arg Asp Asp Gln Tyr Thr Leu Glu His Met His Ala Phe Gly Met Tyr Asn Tyr Leu His Cys Asp Ser Trp Tyr Gln Asp Ser Val Tyr Tyr Ile Asp Thr Leu Gly Arg Ile Met Asn Leu Thr Val Met Leu Asp Thr Ala Leu Gly Lys Pro Arg Glu Val Phe Arg Leu Pro Thr Asp Leu Thr Ala Cys Asp Asn Arg Leu Cys Ala Ser Ile His Phe Ser Ser Ser Thr Trp Val Thr Leu Ser Asp Gly Thr Gly Arg Leu Tyr Val Ile Gly Thr Gly Glu Arg Gly Asn Ser Ala Ser Glu Lys Trp Glu Ile Met Phe Asn Glu Glu Leu Gly Asp Pro Phe Ile Ile Ile His Ser Ile Ser Leu Leu Asn Ala Glu Glu His Ser Ile Ala Thr Leu Leu Leu Arg Ile Glu Lys Glu Glu Leu Asp Met Lys Gly Ser Gly Phe Tyr Val Ser Leu Glu Trp Val Thr Ile Ser Lys Lys Asn Gln Asp Asn Lys Lys Tyr Glu Ile Ile Lys Arg Asp Ile Leu Arg Gly Lys Ser Val Pro His Tyr Ala Ala Ile Lys Pro Asp Gly Asn Gly Leu Met Ile Val Ser Tyr Lys Ser Leu Thr Phe Val Gln Ala Gly Gln Asp Leu Glu Glu Asn Met Asp Glu Asp Ile Ser Glu Lys Ile Lys Glu Pro Leu Tyr Tyr Trp Gln Gln Thr Glu Asp Asp Leu Thr Val Thr Ile Arg Leu Pro Glu Asp Ser Thr Lys Glu Xaa Ile Gln Ile Gln Phe Leu Pro Asp His Ile Asn Ile Val Leu Lys Asp His Gln Phe Leu Glu Gly Lys Leu Tyr Ser Ser Ile Asp His Glu Ser Ser Thr Trp Ile Ile Lys Glu Ser Asn Ser Leu Glu Ile Ser Leu Ile Lys Lys Asn Glu Gly Leu Thr Trp Pro Glu Leu Val Ile Gly Asp Lys Gln Gly Glu Leu Ile Arg Asp Ser Ala Gln Cys Ala Ala Ile Ala Glu Arg Leu Met His Leu Thr Ser Glu Glu Leu Asn Pro Asn Pro Asp Lys Glu Lys Pro Pro Cys Asn Ala Gln Glu Leu Glu Glu Cys Asp Ile Phe Phe Glu Glu Ser Ser Ser Leu Cys Arg Phe Asp Gly Asn Thr Leu Lys Thr Thr His Val Val Asn Leu Gly Ser Asn Gln Tyr Leu Phe Ser Val Ile Val Asp Pro Lys Glu Met Pro Cys Phe Cys Leu Arg His Asp Val Asp Ala Leu Leu Trp Gln Pro His Ser Ser Lys Gln Asp Asp Met Trp Glu His Ile Ala Thr Phe Asn Ala Leu Gly Tyr Val Gln Ala Ser Lys Arg Asp Lys Lys Phe Phe Ala Cys Ala Pro Asn Tyr Ser Tyr Ala Ala Leu Cys Glu Cys Leu Arg Arg Val 
Phe Ile Tyr Arg Gln Pro Ala Pro Met Ser Thr Val Leu Tyr Asn Arg Lys Glu Gly Arg Gln Val Gly Gln Val Ala Lys Gln Gln Val Ala Ser Leu Glu Thr Asn Asp Pro Ile Leu Gly Phe Gln Ala Thr Asn Glu Arg Leu Phe Val Leu Thr Thr Lys Asn Leu Phe Leu Ile Lys Val Asn Thr Glu Asn <210> 79 <211> 99 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -48..-1 <400> 79 Met Asp Asn Val Gln Pro Lys Ile Lys His Arg Pro Phe Cys Phe Ser Val Lys Gly His Val Lys Met Leu Arg Leu Asp Ile Ile Asn Ser Leu Val Thr Thr Val Phe Met Leu Ile Val Ser Val Leu Ala Leu Ile Pro Glu Thr Thr Thr Leu Thr Val Gly Gly Gly Val Phe Ala Leu Val Thr Ala Val Cys Cys Leu Ala Asp Gly Ala Leu Ile Tyr Arg Lys Leu Leu Phe Asn Pro Ser Gly Pro Tyr Gln Lys Lys Pro Val His Glu Lys Lys Glu Val Leu <210> 80 <211> 90 <212> PRT

<213> Homo Sapiens <220>
<221> SIGNAL
<222> -32..-1 <400> 80
Met Pro Cys Leu Asp Gln Gln Leu Thr Val His Ala Leu Pro Cys Pro Ala Gln Pro Ser Ser Leu Ala Phe Cys Gln Val Gly Phe Leu Thr Ala Gln Pro Ser Pro Pro Arg Arg Arg Asn Gly Lys Asp Arg Tyr Thr Leu Val Leu Gln His Gln Glu Cys Gln Asp Asp Leu Ala Thr Ser Ser Leu Val Tyr Leu Ser Leu Pro Cys Phe Lys Asp Leu Gly Arg Ser Lys His Gln Ser Ile Thr Val Ala Asp Thr Asn Lys <210> 81 <211> 115 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -46..-1 <400> 81 Met Lys Thr Leu Phe Asn Pro Ala Pro Ala Ile Ala Asp Leu Asp Pro Gln Phe Tyr Thr Leu Ser Asp Val Phe Cys Cys Asn Glu Ser Glu Ala Glu Ile Leu Thr Gly Leu Thr Val Gly Ser Ala Ala Asp Ala Gly Glu Ala Ala Leu Val Leu Leu Lys Arg Gly Cys Gln Val Val Ile Ile Thr Leu Gly Ala Glu Gly Cys Val Val Leu Ser Gln Thr Glu Pro Glu Pro Lys His Ile Pro Thr Glu Lys Val Lys Ala Val Asp Thr Thr Cys Arg Pro Gly Ser Arg Pro Lys Ser Glu Ala Ala Ser Val Lys Lys Gln Lys His Tyr Lys <210> 82 <211> 66 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -19..-1 <400> 82 Met Lys Pro Leu Leu Val Val Phe Val Phe Leu Phe Leu Trp Asp Pro Val Leu Ala Gly Ile Asn Ser Leu Ser Ser Glu Met His Lys Lys Cys Tyr Lys Asn Gly Ile Cys Arg Leu Glu Cys Tyr Glu Ser Glu Met Leu Val Ala Tyr Cys Met Phe Gln Leu Glu Cys Cys Val Lys Gly Asn Pro Ala Pro <210> 83 <211> 133 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -21..-1 <400> 83 Met Ser Cys Ser Leu Lys Phe Thr Leu Ile Val Ile Phe Phe Tyr Cys Trp Leu Ser Ser Ser His Glu Glu Leu Glu Gly Gly Thr Ser Lys Ser Phe Asp Leu His Thr Val Ile Met Leu Val Ile Ala Gly Gly Ile Leu Ala Ala Leu Leu Leu Leu Ile Val Val Val Leu Cys Leu Tyr Phe Lys Ile His Asn Ala Leu Lys Ala Ala Lys Glu Pro Glu Ala Val Ala Val Lys Asn His Asn Pro Asp Lys Val Trp Trp Ala Lys Asn Ser Gln Ala Lys Thr Ile Ala Thr Glu Ser Cys Pro Ala Leu Gln Cys Cys Glu Gly Tyr Arg Met Cys Ala Ser Phe Asp Ser Leu Pro Pro Cys Cys Cys Asp Ile Asn Glu Gly Leu <210> 84 <211> 140 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -70..-1 <400> 84 Met Val Leu Thr Lys Pro Leu Gln Arg Asn Gly Ser Met Met Ser Phe Glu Asn Val Lys Glu Lys Ser Arg Glu Gly Gly Pro His Ala His Thr Pro Glu Glu Glu Leu Cys Phe Val Val Thr His Tyr Pro Gln Val Gln Thr Thr Leu Asn Leu Phe Phe His Ile Phe Lys Val Leu Thr Gln Pro Leu Ser Leu Leu Trp Gly Cys Asp Gln Lys Pro Arg Thr Val Pro Thr Leu Gly Asn Gly Ala Trp Asp Thr Cys Gln Gln His Ile Arg Thr Ser Ser Trp Thr Ala Asn Thr Leu Val Ile Gln Asn Gln His Ser Arg Glu Ser Thr Val Ser Val Cys Leu Phe Met Leu Ile Arg Met Gln His Ile Leu Lys Thr Asp Thr Leu Gln Gln Phe Arg Ile Cys <210> 85 <211> 233 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -32..-1 <400> 85 Met Ala Thr Pro Pro Phe Arg Leu Ile Arg Lys Met Phe Ser Phe Lys Val Ser Arg Trp Met Gly Leu Ala Cys Phe Arg Ser Leu Ala Ala Ser Ser Pro Ser Ile Arg Gln Lys Lys Leu Met His Lys Leu Gln Glu Glu Lys Ala Phe Arg Glu Glu Met Lys Ile Phe Arg Glu Lys Ile Glu Asp Phe Arg Glu Glu Met Trp Thr Phe Arg Gly Lys Ile His Ala Phe Arg Gly Gln Ile Leu Gly Phe Trp Glu Glu Glu Arg Pro Phe Trp Glu Glu Glu Lys Thr Phe Trp Lys Glu Glu Lys Ser Phe Trp Glu Met Glu Lys Ser Phe Arg Glu Glu Glu Lys Thr Phe Trp Lys Lys Tyr Arg Thr Phe Trp Lys Glu Asp Lys Ala Phe Trp Lys Glu Asp Asn Ala Leu Trp Glu Arg Asp Arg Asn Leu Leu Gln Glu Asp Lys Ala Leu Trp Glu Glu Glu Lys Ala Leu Trp Val Glu Glu Arg Ala Leu Leu Glu Gly Glu Lys Ala Leu Trp Glu Asp Lys Thr Ser Leu Trp Glu Glu Glu Asn Ala Leu Trp Glu Glu Glu Arg Ala Phe Trp Met Glu Asn Asn Gly His Ile Ala Gly Glu Gln Met Leu Glu Asp Gly Pro His Asn Ala Asn Arg Gly Gln Arg Leu Leu Ala Phe Ser Arg Gly Arg Ala <210> 86 <211> 83 <212> PRT

<213> Homo Sapiens <220>
<221> SIGNAL
<222> -29..-1 <400> 86 Met Ser Phe Phe Gln Leu Leu Met Lys Arg Lys Glu Leu Ile Pro Leu Val Val Phe Met Thr Val Ala Ala Gly Gly Ala Ser Ser Phe Ala Val Tyr Ser Leu Trp Lys Thr Asp Val Ile Leu Asp Arg Lys Lys Asn Pro Glu Pro Trp Glu Thr Val Asp Pro Thr Val Pro Gln Lys Leu Ile Thr Ile Asn Gln Gln Trp Lys Pro Ile Glu Glu Leu Gln Asn Val Gln Arg Val Thr Lys <210> 87 <211> 215 <212> PRT

<213> Homo Sapiens <220>
<221> SIGNAL
<222> -41..-1 <400> 87 Met Val Ser Ala Leu Arg Gly Ala Pro Leu Ile Arg Val His Ser Ser Pro Val Ser Ser Pro Ser Val Ser Gly Pro Arg Arg Leu Val Ser Cys Leu Ser Ser Gln Ser Ser Ala Leu Ser Gln Ser Gly Gly Gly Ser Thr Ser Ala Ala Gly Ile Glu Ala Arg Ser Arg Ala Leu Arg Arg Arg Trp Cys Pro Ala Gly Ile Met Leu Leu Ala Leu Val Cys Leu Leu Ser Cys Leu Leu Pro Ser Ser Glu Ala Lys Leu Tyr Gly Arg Cys Glu Leu Ala Arg Val Leu His Asp Phe Gly Leu Asp Gly Tyr Arg Gly Tyr Ser Leu Ala Asp Trp Val Cys Leu Ala Tyr Phe Thr Ser Gly Phe Asn Ala Ala Ala Leu Asp Tyr Glu Ala Asp Gly Ser Thr Asn Asn Gly Ile Phe Gln Ile Asn Ser Arg Arg Trp Cys Ser Asn Leu Thr Pro Asn Val Pro Asn Val Cys Arg Met Tyr Cys Ser Asp Leu Leu Asn Pro Asn Leu Lys Asp Thr Val Ile Cys Ala Met Lys Ile Thr Gln Glu Pro Gln Gly Leu Gly Tyr Trp Glu Ala Trp Arg His His Cys Gln Gly Lys Asp Leu Thr Glu Trp Val Asp Gly Cys Asp Phe <210> 88 <211> 417 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -20..-1 <400> 88 Met Met Gly Ser Pro Val Ser His Leu Leu Ala Gly Phe Cys Val Trp Val Val Leu Gly Trp Val Gly Gly Ser Val Pro Asn Leu Gly Pro Ala Glu Gln Glu Gln Asn His Tyr Leu Ala Gln Leu Phe Gly Leu Tyr Gly Glu Asn Gly Thr Leu Thr Ala Gly Gly Leu Ala Arg Leu Leu His Ser Leu Gly Leu Gly Arg Val Gln Gly Leu Arg Leu Gly Gln His Gly Pro Leu Thr Gly Arg Ala Ala Ser Pro Ala Ala Asp Asn Ser Thr His Arg Pro Gln Asn Pro Glu Leu Ser Val Asp Val Trp Ala Gly Met Pro Leu Gly Pro Ser Gly Trp Gly Asp Leu Glu Glu Ser Lys Ala Pro His Leu Pro Arg Gly Pro Ala Pro Ser Gly Leu Asp Leu Leu His Arg Leu Leu Leu Leu Asp His Ser Leu Ala Asp His Leu Asn Glu Asp Cys Leu Asn Gly Ser Gln Leu Leu Val Asn Phe Gly Leu Ser Pro Ala Ala Pro Leu Thr Pro Arg Gln Phe Ala Leu Leu Cys Pro Ala Leu Leu Tyr Gln Ile Asp Ser Arg Val Cys Ile Gly Ala Pro Ala Pro Ala Pro Pro Gly Asp Leu Leu Ser Ala Leu Leu Gln Ser Ala Leu Ala Val Leu Leu Leu Ser Leu Pro Ser Pro Leu Ser Leu Leu Leu Leu Arg Leu Leu Gly Pro Arg Leu Leu Arg Pro Leu Leu Gly Phe Leu Gly Ala Leu Ala Val Gly Thr Leu Cys Gly Asp Ala Leu Leu His Leu Leu Pro His Ala Gln Glu Gly Arg His Ala Gly Pro Gly Gly Leu Pro Glu Lys Asp Leu Gly Pro Gly Leu Ser Val Leu Gly Gly Leu Phe Leu Leu Phe Val Leu Glu Asn Met Leu Gly Leu Leu Arg His Arg Gly Leu Arg Pro Arg Cys Cys Arg Arg Lys Arg Arg Asn Leu Glu Thr Arg Asn Leu Asp Pro Glu Asn Gly Ser Gly Met Ala Leu Gln Pro Leu Gln Ala Ala Pro Glu Pro Gly Ala Gln Gly Gln Arg Glu Lys Asn Ser Gln His Pro Pro Ala Leu Ala Pro Pro Gly His Gln Gly His Ser His Gly His Gln Gly Gly Thr Asp Ile Thr Trp Met Val Leu Leu Gly Asp Gly Leu His Asn Leu Thr Asp Gly Leu Ala Ile Gly Ala Ala Phe Ser Asp Gly Phe Ser Ala Ala Ser Val Pro Pro <210> 89 <211> 366 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -23..-1 <400> 89 Met Ala Ser Met Ala Ala Val Leu Thr Trp Ala Leu Ala Leu Leu Ser Ala Phe Ser Ala Thr Gln Ala Arg Lys Gly Phe Trp Asp Tyr Phe Ser Gln Thr Ser Gly Asp Lys Gly Arg Val Glu Gln Ile His Gln Gln Lys Met Ala Arg Glu Pro Ala Thr Leu Lys Asp Ser Leu Glu Gln Asp Leu Asn Asn Met Asn Lys Phe Leu Glu Lys Leu Arg Pro Leu Ser Gly Ser Glu Ala Pro Arg Leu Pro Gln Asp Pro Val Gly Met Arg Arg Gln Leu Gln Glu Glu Leu Glu Glu Val Lys Ala Arg Leu Gln Pro Tyr Met Ala Glu Ala His Glu Leu Val Gly Trp Asn Leu Glu Gly Leu Arg Gln Gln Leu Lys Pro Tyr Thr Met Asp Leu Met Glu Gln Val Ala Leu Arg Val Gln Glu Leu Gln Glu Gln Leu Arg Val Val Gly Glu Asp Thr Lys Ala Gln Leu Leu Gly Gly Val Asp Glu Ala Trp Ala Leu Leu Gln Gly Leu Gln Ser Arg Val Val His His Thr Gly Arg Phe Lys Glu Leu Phe His Pro Tyr Ala Glu Ser Leu Val Ser Gly Ile Gly Arg His Val Gln Glu Leu His Arg Ser Val Ala Pro His Ala Pro Ala Ser Pro Ala Arg Leu Ser Arg Cys Val Gln Val Leu Ser Arg Lys Leu Thr Leu Lys Ala Lys Ala Leu His Ala Arg Ile Gln Gln Asn Leu Asp Gln Leu Arg Glu Glu Leu Ser Arg Ala Phe Ala Gly Thr Gly Thr Glu Glu Gly Ala Gly Pro Asp Pro Gln Met Leu Ser Glu Glu Val Arg Gln Arg Leu Gln Ala Phe Arg Gln Asp Thr Tyr Leu Gln Ile Ala Ala Phe Thr Arg Ala Ile Asp Gln Glu Thr Glu Glu Val Gln Gln Gln Leu Ala Pro Pro Pro Pro Gly His Ser Ala Phe Ala Pro Glu Phe Gln Gln Thr Asp Ser Gly Lys Val Leu Ser Lys Leu Gln Ala Arg Leu Asp Asp Leu Trp Glu Asp Ile Thr His Ser Leu His Asp Gln Gly His Ser His Leu Gly Asp Pro <210> 90 <211> 150 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -45..-1 <400> 90 Met Val Leu Met Trp Thr Ser Gly Asp Ala Phe Lys Thr Ala Tyr Phe Leu Leu Lys Gly Ala Pro Leu Gln Phe Ser Val Cys Gly Leu Leu Gln Val Leu Val Asp Leu Ala Ile Leu Gly Gln Ala Tyr Ala Phe Ala Pro Pro Pro Glu Ala Gly Ala Pro Arg Arg Ala Pro His Trp His Gln Gly Pro Leu Thr Val Gly Arg Thr Arg Met Trp Asp Arg Gln Pro Arg Ala Leu Val Gly Pro Asp Leu Pro Ala Gly Arg Val Gly Ala Val Ala Pro Ala Gly Val Ala Glu Met Gly His Gly His Trp Gly Leu His Gln Pro Leu Trp Gly Val Ser Gly Trp Ala Val Gly Val Gly Leu Gly Arg Cys Leu Cys Ser Ala Gly Thr Ala Arg Val Asp Leu Ala Pro Arg Val Leu Asp Val Phe Arg Met Thr <210> 91 <211> 308 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -68..-1 <400> 91 Met Asp Phe Val Ala Gly Ala Ile Gly Gly Val Cys Gly Val Ala Val Gly Tyr Pro Leu Asp Thr Val Lys Val Arg Ile Gln Thr Glu Pro Lys Tyr Thr Gly Ile Trp His Cys Val Arg Asp Thr Tyr His Arg Glu Arg Val Trp Gly Phe Tyr Arg Gly Leu Ser Leu Pro Val Cys Thr Val Ser Leu Val Ser Ser Val Ser Phe Gly Thr Tyr Arg His Cys Leu Ala His Ile Cys Arg Leu Arg Tyr Gly Asn Pro Asp Ala Lys Pro Thr Lys Ala Asp Ile Thr Leu Ser Gly Cys Ala Ser Gly Leu Val Arg Val Phe Leu Thr Ser Pro Thr Glu Val Ala Lys Val Arg Leu Gln Thr Gln Thr Gln Ala Gln Lys Gln Gln Arg Leu Leu Ser Ala Ser Gly Pro Leu Ala Val Pro Pro Met Cys Pro Val Pro Pro Ala Cys Pro Glu Pro Lys Tyr Arg Gly Pro Leu His Cys Leu Ala Thr Val Ala Arg Glu Glu Gly Leu Cys Gly Leu Tyr Lys Gly Ser Ser Ala Leu Val Leu Arg Asp Gly His Ser Phe Ala Thr Tyr Phe Leu Ser Tyr Ala Val Leu Cys Glu Trp Leu Ser Pro Ala Gly His Ser Arg Pro Asp Val Pro Gly Val Leu Val Ala Gly Gly Cys Ala Gly Val Leu Ala Trp Ala Val Ala Thr Pro Met Asp Val Ile Lys Ser Arg Leu Gln Ala Asp Gly Gln Gly Gln Arg Arg Tyr Arg Gly Leu Leu His Cys Met Val Thr Ser Val Arg Glu Glu Gly Pro Arg Val Leu Phe Lys Gly Leu Val Leu Asn Cys Cys Arg Ala Phe Pro Val Asn Met Val Val Phe Val Ala Tyr Glu Ala Val Leu Arg Leu Ala Arg Gly Leu Leu Thr <210> 92 <211> 114 <212> PRT
<213> Homo Sapiens <220>
<221> SIGNAL
<222> -49..-1 <400> 92 Met Glu Lys Pro Leu Phe Pro Leu Val Pro Leu His Trp Phe Gly Phe Gly Tyr Thr Ala Leu Val Val Ser Gly Gly Ile Val Gly Tyr Val Lys Thr Gly Ser Val Pro Ser Leu Ala Ala Gly Leu Leu Phe Gly Ser Leu Ala Gly Leu Gly Ala Tyr Gln Leu Tyr Gln Asp Pro Arg Asn Val Trp Gly Phe Leu Ala Ala Thr Ser Val Thr Phe Val Gly Val Met Gly Met Arg Ser Tyr Tyr Tyr Gly Lys Phe Met Pro Val Gly Leu Ile Ala Gly Ala Ser Leu Leu Met Ala Ala Lys Val Gly Val Arg Met Leu Met Thr Ser Asp <210>93 <211>382 <212>PRT

<213> Homo Sapiens <220>

CA 02354369 2001-06-13

JUMBO APPLICATIONS/PATENTS
THIS SECTION OF THE APPLICATION/PATENT CONTAINS MORE THAN ONE VOLUME. THIS IS VOLUME 1.
NOTE: For additional volumes please contact the Canadian Patent Office.
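The CDS and sig_peptide features in the sequence listing above give 1-based, inclusive nucleotide ranges, and the PRT entries give the translated residues. The Python sketch below relates the two. It assumes the standard genetic code (NCBI translation table 1) and a CDS range that ends just before the stop codon, which is how SEQ ID NO: 73 reads. The helper names (`translate`, `to_three_letter`, `feature_lengths`) are illustrative, not taken from the source. For SEQ ID NO: 73, CDS 93..290 spans 198 nucleotides (66 codons) and sig_peptide 93..149 spans 57 nucleotides (19 codons). Those counts agree with the 66-residue protein listed as SEQ ID NO: 82 and its SIGNAL range -19..-1.

```python
# Standard genetic code (NCBI translation table 1) in the usual TCAG order:
# the first codon position varies slowest, the third position fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Three-letter codes in the style of the listing ("Xaa" = unknown residue).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Xaa", "*": "Ter",
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon into one-letter amino acid codes.

    Codons containing anything other than A/C/G/T (such as the listing's
    'n') become 'X', stop codons become '*', and a trailing partial codon
    is ignored. Spaces are stripped first.
    """
    seq = dna.upper().replace(" ", "")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, usable, 3))


def to_three_letter(protein: str) -> str:
    """Render one-letter codes as concatenated three-letter codes."""
    return "".join(THREE_LETTER[aa] for aa in protein)


def feature_lengths(cds: tuple, sig_peptide: tuple) -> tuple:
    """Residue counts (total, signal, mature) from 1-based inclusive
    nucleotide ranges, assuming the CDS range excludes the stop codon."""
    cds_nt = cds[1] - cds[0] + 1
    sig_nt = sig_peptide[1] - sig_peptide[0] + 1
    if cds_nt % 3 or sig_nt % 3:
        raise ValueError("range is not a whole number of codons")
    total, signal = cds_nt // 3, sig_nt // 3
    return total, signal, total - signal


if __name__ == "__main__":
    # First 21 bases of the SEQ ID NO: 73 CDS (positions 93..113).
    print(to_three_letter(translate("atgaagcctctccttgttgtg")))  # MetLysProLeuLeuValVal
    # Codon with an ambiguous base, from SEQ ID NO: 68 ("TrpXaaGly").
    print(to_three_letter(translate("tggcanggt")))  # TrpXaaGly
    # SEQ ID NO: 73: CDS 93..290, sig_peptide 93..149.
    print(feature_lengths((93, 290), (93, 149)))  # (66, 19, 47)
```

Mapping unknown codons to `X` keeps translations aligned codon for codon with the listing, where an `n` in the DNA appears as `Xaa` in the protein row.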

Claims (30)

WHAT IS CLAIMED IS:
1. A purified or isolated nucleic acid comprising the sequence of one of SEQ ID NOs: 24-73 or a sequence complementary thereto.
2. A purified or isolated nucleic acid comprising at least 12 consecutive bases of the sequence of one of SEQ ID NOs: 24-73 or one of the sequences complementary thereto.
3. A purified or isolated nucleic acid comprising the full coding sequence of one of SEQ ID NOs: 24-73, wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.
4. A purified or isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 24-73 which encode a mature protein.
5. A purified or isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 24-73 which encode the signal peptide.
6. A purified or isolated nucleic acid encoding a polypeptide having the sequence of one of the sequences of SEQ ID NOs: 74-123.
7. A purified or isolated nucleic acid encoding a polypeptide having the sequence of a mature protein included in one of the sequences of SEQ ID NOs: 74-123.
8. A purified or isolated nucleic acid encoding a polypeptide having the sequence of a signal peptide included in one of the sequences of SEQ ID NOs: 74-123.
9. A purified or isolated protein comprising the sequence of one of SEQ ID NOs: 74-123.
10. A purified or isolated polypeptide comprising at least 10 consecutive amino acids of one of the sequences of SEQ ID NOs: 74-123.
11. An isolated or purified polypeptide comprising a signal peptide of one of the polypeptides of SEQ ID NOs: 74-123.
12. An isolated or purified polypeptide comprising a mature protein of one of the polypeptides of SEQ ID NOs: 74-123.
13. A method of making a protein comprising one of the sequences of SEQ ID NOs: 74-123, comprising the steps of:
obtaining a cDNA comprising one of the sequences of SEQ ID NOs: 24-73;
inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter, and introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA.
14. The method of Claim 13, further comprising the step of isolating said protein.
15. A protein obtainable by the method of Claim 14.
16. A host cell containing a recombinant nucleic acid of Claim 1.
17. A purified or isolated antibody capable of specifically binding to a protein having the sequence of one of SEQ ID NOs: 74-123.
18. In an array of polynucleotides of at least 15 nucleotides in length, the improvement comprising inclusion in said array of at least one of the sequences of SEQ ID NOs: 24-73, or one of the sequences complementary to the sequences of SEQ ID NOs: 24-73, or a fragment thereof of at least 15 consecutive nucleotides.
19. A purified or isolated nucleic acid of at least 15 bases capable of hybridizing under stringent conditions to the sequence of one of SEQ ID NOs: 24-73 or a sequence complementary to one of the sequences of SEQ ID NOs: 24-73.
20. A purified or isolated antibody capable of binding to a polypeptide comprising at least 10 consecutive amino acids of the sequence of one of SEQ ID NOs: 74-123.
21. A computer readable medium having stored thereon a sequence selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123.
22. A computer system comprising a processor and a data storage device wherein said data storage device has stored thereon a sequence selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123.
23. The computer system of Claim 22 further comprising a sequence comparer and a data storage device having reference sequences stored thereon.
24. The computer system of Claim 23 wherein said sequence comparer comprises a computer program which indicates polymorphisms.
25. The computer system of Claim 22 further comprising an identifier which identifies features in said sequence.
26. A method for comparing a first sequence to a reference sequence wherein said first sequence is selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123 comprising the steps of:
reading said first sequence and said reference sequence through use of a computer program which compares sequences; and determining differences between said first sequence and said reference sequence with said computer program.
27. The method of Claim 26, wherein said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.
28. A method for identifying a feature in a sequence selected from the group consisting of a cDNA code of SEQ ID NOs: 24-73 and a polypeptide code of SEQ ID NOs: 74-123 comprising the steps of:
reading said sequence through the use of a computer program which identifies features in sequences; and identifying features in said sequence with said computer program.
29. A purified or isolated nucleic acid comprising a contiguous span of at least 12 nucleotides of the sequence of one of SEQ ID NOs: 24-73 or one of the sequences complementary thereto, wherein said contiguous span comprises at least 1 of the nucleotide positions of polynucleotides described in Table III.
30. A purified or isolated nucleic acid comprising a contiguous span of at least 12 nucleotides of the sequence of one of the polynucleotides described in Table III or one of the sequences complementary thereto.
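Claims 26 to 28 recite comparing a listed sequence with a reference to find differences (including polymorphisms) and identifying features with a computer program, but they do not name any particular program. The Python sketch below assumes the two sequences are already aligned base for base, with equal length and no gaps. A real comparison tool would normally run a gapped alignment first. The function names are hypothetical. `find_substitutions` reports mismatches as candidate single-base polymorphisms. `find_motif` locates a sequence feature, here the canonical AATAAA polyadenylation signal hexamer. On the 60-base row of SEQ ID NO: 73 that starts at position 1031, the motif search returns 1078..1083, which matches that entry's polyA_signal feature.

```python
import re


def find_substitutions(query: str, reference: str) -> list:
    """Return (position, reference_base, query_base) for each mismatch.

    Assumes the two sequences are already aligned base for base (same
    length, no gaps); positions are 1-based. Insertions and deletions
    would need a gapped alignment step first, which is out of scope here.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = zip(query.lower(), reference.lower())
    return [
        (pos, ref_base, q_base)
        for pos, (q_base, ref_base) in enumerate(pairs, start=1)
        if q_base != ref_base
    ]


def find_motif(sequence: str, motif: str, offset: int = 1) -> list:
    """Return 1-based inclusive (start, end) spans of every occurrence of
    `motif` in `sequence`, overlapping hits included. `offset` is the listing
    coordinate of the first base of `sequence`."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = re.compile("(?=" + re.escape(motif.lower()) + ")")
    return [
        (m.start() + offset, m.start() + offset + len(motif) - 1)
        for m in pattern.finditer(sequence.lower())
    ]


if __name__ == "__main__":
    # Hypothetical pair of sequences differing at one base.
    print(find_substitutions("atgaagcct", "atgaggcct"))  # [(5, 'g', 'a')]
    # The 60-base row of SEQ ID NO: 73 that begins at position 1031.
    row = "cttaagagctaacaccatgctgatcttgctttgctatatgtgtaactaataaactgccta"
    print(find_motif(row, "aataaa", offset=1031))  # [(1078, 1083)]
```

Reporting coordinates in the listing's own 1-based numbering lets the output be checked directly against the feature tables.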
CA002354369A 1998-12-22 1999-12-20 Complementary dna's encoding proteins with signal peptides Abandoned CA2354369A1 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US11368698P 1998-12-22 1998-12-22
US60/113,686 1998-12-22
US14103299P 1999-06-25 1999-06-25
US60/141,032 1999-06-25
PCT/IB1999/002058 WO2000037491A2 (en) 1998-12-22 1999-12-20 Dnas encoding proteins with signal sequences

Publications (1)

Publication Number Publication Date
CA2354369A1 true CA2354369A1 (en) 2000-06-29

Family

ID=26811346

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002354369A Abandoned CA2354369A1 (en) 1998-12-22 1999-12-20 Complementary dna's encoding proteins with signal peptides

Country Status (5)

Country Link
EP (1) EP1144444A3 (en)
JP (1) JP2002539767A (en)
AU (1) AU1675900A (en)
CA (1) CA2354369A1 (en)
WO (1) WO2000037491A2 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001000803A2 (en) * 1999-06-25 2001-01-04 Genset Apolipoprotein a-iv-related protein: polypeptide, polynucleotide sequences and biallelic markers thereof
US6787647B1 (en) 1998-12-22 2004-09-07 Genset S.A. Carnitine carrier related protein-1
CN1244584A (en) * 1999-05-14 2000-02-16 北京医科大学 Chemotarix factor with immunocyte chemotaxis and hemopoinesis stimulating activity
US7135559B2 (en) * 2000-02-21 2006-11-14 Kureha Chemical Industry Co., Ltd. Proteins and novel genes encoding the same
US7094566B2 (en) 2000-03-16 2006-08-22 Amgen Inc., IL-17 receptor like molecules and uses thereof
EP1276754A4 (en) * 2000-04-14 2005-04-06 Nuvelo Inc Materials and methods relating to lipid metabolism
AU2002218091A1 (en) * 2000-11-17 2002-05-27 Xenon Genetics Inc Fat regulated genes, uses thereof, and compounds for modulating same
WO2002102985A2 (en) * 2001-06-15 2002-12-27 The Government Of The United States Of America, As Represented By The Secretary Of The Department Of Health And Human Services Pate, a gene expressed in prostate cancer, prostate and testis, and uses thereof
WO2003002737A1 (en) * 2001-06-27 2003-01-09 Riken Novel human topoisomerase 2α inhibitory protein and utilization thereof
FR2843395A1 (en) * 2002-08-12 2004-02-13 Genfit S A New synthetic peptide from apolipoprotein AIV related protein, useful for raising antibodies, used for diagnosis and treatment of disorders of lipid metabolism
FR2829581A1 (en) * 2001-09-07 2003-03-14 Genfit S A New synthetic peptide from apolipoprotein AIV related protein, useful for raising antibodies, used for diagnosis and treatment of disorders of lipid metabolism
DE60216482T2 (en) * 2001-09-07 2007-08-23 Genfit COMPOSITIONS AND METHODS FOR DETERMINING AA4RP
DE10344799A1 (en) * 2003-09-26 2005-04-14 Ganymed Pharmaceuticals Ag Identification of surface-associated antigens for tumor diagnosis and therapy
US7892733B1 (en) 2004-04-22 2011-02-22 Amgen Inc. Response element regions
DE102005013846A1 (en) 2005-03-24 2006-10-05 Ganymed Pharmaceuticals Ag Identification of surface-associated antigens for tumor diagnosis and therapy
EP1951281B1 (en) 2005-10-17 2015-04-15 Sloan Kettering Institute For Cancer Research Wt1 hla class ii-binding peptides and compositions and methods comprising same
US9265816B2 (en) 2006-04-10 2016-02-23 Sloan Kettering Institute For Cancer Research Immunogenic WT-1 peptides and methods of use thereof
US20150104413A1 (en) 2012-01-13 2015-04-16 Memorial Sloan Kettering Cancer Center Immunogenic wt-1 peptides and methods of use thereof
HUE052541T2 (en) 2013-01-15 2021-05-28 Memorial Sloan Kettering Cancer Center Immunogenic wt-1 peptides and methods of use thereof
US10815273B2 (en) 2013-01-15 2020-10-27 Memorial Sloan Kettering Cancer Center Immunogenic WT-1 peptides and methods of use thereof

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
WO1998031818A2 (en) * 1997-01-21 1998-07-23 Human Genome Sciences, Inc. Tace-like and matrilysin-like polypeptides
JP2001518793A (en) * 1997-04-10 2001-10-16 ジェネティックス・インスチチュート・インコーポレーテッド Secretory expression sequence tags (sESTs)
DE19816395A1 (en) * 1998-04-03 1999-10-07 Metagen Gesellschaft Fuer Genomforschung Mbh Human nucleic acid sequences from ovarian normal tissue
EP0976824A1 (en) * 1998-07-10 2000-02-02 Amsterdam Molecular Therapeutics Gene and protein involved in liver regeneration

Also Published As

Publication number Publication date
WO2000037491A3 (en) 2001-09-20
EP1144444A3 (en) 2002-01-02
JP2002539767A (en) 2002-11-26
EP1144444A2 (en) 2001-10-17
AU1675900A (en) 2000-07-12
WO2000037491A2 (en) 2000-06-29

Similar Documents

Publication Publication Date Title
CA2311572A1 (en) Extended cdnas for secreted proteins
US6936692B2 (en) Complementary DNAs
CA2319089A1 (en) 5' ests and encoded human proteins
US7413875B2 (en) ESTs and encoded human proteins
CA2354369A1 (en) Complementary dna's encoding proteins with signal peptides
EP1033401A2 (en) Expressed sequence tags and encoded human proteins
CA2296667A1 (en) 5' ests for secreted proteins expressed in brain
CA2297306A1 (en) 5'ests for non tissue specific secreted proteins
CA2302644A1 (en) Extended cdnas for secreted proteins
CA2297157A1 (en) 5' ests for secreted proteins expressed in testis and other tissues
EP1000149B1 (en) 5' ESTs FOR SECRETED PROTEINS IDENTIFIED FROM BRAIN TISSUES
CA2297109A1 (en) 5' ests for secreted proteins expressed in muscle and other mesodermal tissues
EP1000151B1 (en) 5' ESTs FOR SECRETED PROTEINS EXPRESSED IN VARIOUS TISSUES
CA2296398A1 (en) 5' ests for secreted proteins expressed in endoderm
AU753099B2 (en) Extended cDNAs for secreted proteins
AU2002301051B9 (en) Extended cDNAs for secreted proteins
AU2003204659B2 (en) Extended cDNAs for secreted proteins
AU2003262114B8 (en) cDNAs encoding secreted proteins
EP1757699A1 (en) cDNAs encoding secreted proteins
CA2294569A1 (en) Secreted proteins and polynucleotides encoding them
EP1903111A2 (en) Extended cDNAs for secreted proteins
AU2006202884A1 (en) Extended cDNAs for secreted proteins
AU2006225329A1 (en) cDNAs encoding secreted proteins
AU2003262217A1 (en) 5' ESTs and encoded human proteins

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued