WO2004039943A2 - Human genes and gene expression products isolated from human prostate

Info

Publication number: WO2004039943A2
Application number: PCT/US2003/015465
Authority: WO
Inventors: Elizabeth M. Scott; George Lamson; Altaf Kassam; Guozhong Zhang; Doreen Sakamoto; Pablo Dominguez Garcia
Original assignee: Chiron Corporation
Priority date: 2002-05-17
Filing date: 2003-05-16
Publication date: 2004-05-13
Also published as: AU2003299506A1; WO2004039943A3; AU2003299506A8

Abstract

This invention relates to novel human polynucleotides and variants thereof, their encoded polypeptides and variants thereof, to genes corresponding to these polynucleotides and to proteins expressed by the genes. The invention also relates to diagnostics and therapeutics comprising such novel human polynucleotides, their corresponding genes or gene products, including probes, antisense nucleotides, and antibodies. The polynucleotides of the invention correspond to a polynucleotide comprising the sequence information of at least one of SEQ ID NOS:1-1485. The polypeptides of the invention correspond to a polypeptide comprising the amino acid sequence information of at least one of SEQ ID NOS:1486-1542.

Description

HUMAN GENES AND GENE EXPRESSION PRODUCTS ISOLATED FROM HUMAN PROSTATE

Field of the Invention

The present invention relates to polynucleotides of human origin, particularly in human prostate, and the encoded gene products.

Background of the Invention

Identification of novel polynucleotides, particularly those that encode an expressed gene product, is important in the advancement of drug discovery, diagnostic technologies, and the understanding of the progression and nature of complex diseases such as cancer. Identification of genes expressed in different cell types isolated from sources that differ in disease state or stage, developmental stage, exposure to various environmental factors, the tissue of origin, the species from which the tissue was isolated, and the like is key to identifying the genetic factors that are responsible for the phenotypes associated with these various differences. This invention provides novel human polynucleotides, the polypeptides encoded by these polynucleotides, and the genes and proteins corresponding to these novel polynucleotides.

Summary of the Invention

This invention relates to novel human polynucleotides and variants thereof, their encoded polypeptides and variants thereof, to genes corresponding to these polynucleotides and to proteins expressed by the genes. The invention also relates to diagnostics and therapeutics comprising such novel human polynucleotides, their corresponding genes or gene products, including probes, antisense nucleotides, and antibodies. The polynucleotides of the invention correspond to a polynucleotide comprising the sequence information of at least one of SEQ ID NOS: 1-1485. The polypeptides of the invention correspond to a polypeptide comprising the amino acid sequence information of at least one of SEQ ID NOS: 1486-1542.

Accordingly, in one aspect, the invention provides an isolated polynucleotide comprising a nucleotide sequence which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOS: 1-1485.

In another aspect, the invention provides an isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide sequence having at least 90% sequence identity to a sequence selected from the group consisting of SEQ ID NOS: 1-1485, a degenerate variant of SEQ ID NOS: 1-1485, an antisense of SEQ ID NOS: 1-1485, and a complement of SEQ ID NOS: 1-1485.

In another aspect, the invention provides an isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1-1485, a degenerate variant of SEQ ID NOS: 1-1485, an antisense of SEQ ID NOS: 1-1485, and a complement of SEQ ID NOS: 1-1485. In specific embodiments, the polynucleotide comprises at least 100 contiguous nucleotides of the nucleotide sequence. In other specific embodiments, the polynucleotide comprises at least 200 contiguous nucleotides of the nucleotide sequence.

In another aspect, the invention provides an isolated polynucleotide comprising a nucleotide sequence of at least 90% sequence identity to a sequence selected from the group consisting of SEQ ID NOS: 1-1485, a degenerate variant of SEQ ID NOS: 1-1485, an antisense of SEQ ID NOS: 1-1485, and a complement of SEQ ID NOS: 1-1485. In specific embodiments, the polynucleotide comprises a nucleotide sequence of at least 95% sequence identity to the selected nucleotide sequence. In other specific embodiments, the polynucleotide comprises a nucleotide sequence that is identical to the selected nucleotide sequence.

In another aspect, the invention provides a polynucleotide comprising a nucleotide sequence of an insert contained in a clone deposited as NRRL Accession No. B-30523, B-30524, B-30525, B-30526, B-30527, B-30528, B-30529, or B-30581.

In another aspect, the invention provides an isolated cDNA obtained by the process of amplification using a polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1-1485. In specific embodiments, the polynucleotide comprises at least 25 contiguous nucleotides of the selected nucleotide sequence. In other specific embodiments, the polynucleotide comprises at least 100 contiguous nucleotides of the selected nucleotide sequence. In some embodiments, the amplification is by polymerase chain reaction (PCR) amplification.

In another aspect, the invention provides an isolated recombinant host cell containing a polynucleotide of the invention.

In another aspect, the invention provides an isolated vector comprising a polynucleotide of the invention.

In another aspect, the invention provides a method for producing a polypeptide, the method comprising the steps of culturing a recombinant host cell containing a polynucleotide of the invention under conditions suitable for the expression of an encoded polypeptide and recovering the polypeptide from the host cell culture.

In another aspect, the invention provides an isolated polypeptide encoded by a polynucleotide of the invention.

In another aspect, the invention provides an isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 1486-1542.

In another aspect, the invention provides an antibody that specifically binds a polypeptide of the invention.

In another aspect, the invention provides a method of detecting differentially expressed genes correlated with a cancerous state of a mammalian cell, the method comprising the step of detecting at least one differentially expressed gene product in a test sample derived from a cell suspected of being cancerous, where the gene product is encoded by a gene comprising an identifying sequence of at least one of SEQ ID NOS: 1-1485. Detection of the differentially expressed gene product is correlated with a cancerous state of the cell from which the test sample was derived.

In another aspect, the invention provides a method of detecting differentially expressed genes correlated with a cancerous state of a mammalian cell, the method comprising the step of detecting at least one differentially expressed gene product in a test sample derived from a cell suspected of being cancerous, where the gene product comprises an amino acid sequence selected from the group consisting of SEQ ID NOS: 1486-1542. Detection of the differentially expressed gene product is correlated with a cancerous state of the cell from which the test sample was derived.

In another aspect, the invention provides a library of polynucleotides, wherein at least one of the polynucleotides comprises the sequence information of a polynucleotide of the invention. In specific embodiments, the library is provided on a nucleic acid array. In some embodiments, the library is provided in a computer-readable format.

In another aspect, the invention provides a method of inhibiting tumor growth by modulating expression of a gene product, the gene product being encoded by a gene identified by a sequence selected from the group consisting of SEQ ID NOS: 1-1485.

In another aspect, the invention provides a method of inhibiting tumor growth by modulating expression of a gene product, the gene product comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 1486-1542.

These and other objects, advantages, and features of the invention will become apparent to those persons skilled in the art upon reading the details of the invention as more fully described below.

Detailed Description of the Invention

Before the present invention is described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention.

It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of such polynucleotides and reference to "the colon cancer cell" includes reference to one or more cells and equivalents thereof known to those skilled in the art, and so forth.

The publications and applications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

Definitions

The terms "polynucleotide" and "nucleic acid," used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides. Thus, these terms include, but are not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, branched nucleic acid (see, e.g., U.S. Pat. Nos. 5,124,246; 5,710,264; and 5,849,481), or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. These terms further include, but are not limited to, mRNA or cDNA that comprise intronic sequences (see, e.g., Niwa et al. (1999) Cell 99(7): 691-702). The backbone of the polynucleotide can comprise sugars and phosphate groups (as may typically be found in RNA or DNA), or modified or substituted sugar or phosphate groups.

Alternatively, the backbone of the polynucleotide can comprise a polymer of synthetic subunits such as phosphoramidites and thus can be an oligodeoxynucleoside phosphoramidate or a mixed phosphoramidate-phosphodiester oligomer. Peyrottes et al. (1996) Nucl. Acids Res. 24:1841-1848; Chaturvedi et al. (1996) Nucl. Acids Res. 24:2318-2323. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs, uracil, other sugars, and linking groups such as fluororibose and thioate, and nucleotide branches. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. Other types of modifications included in this definition are caps, substitution of one or more of the naturally occurring nucleotides with an analog, and introduction of means for attaching the polynucleotide to proteins, metal ions, labeling components, other polynucleotides, or a solid support.

The terms "polypeptide" and "protein," used interchangeably herein, refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including, but not limited to, fusion proteins with a heterologous amino acid sequence, fusions with heterologous and homologous leader sequences, with or without N-terminal methionine residues; immunologically tagged proteins; and the like.

"Diagnosis" as used herein generally includes determination of a subject's susceptibility to a disease or disorder, determination as to whether a subject is presently affected by a disease or disorder, prognosis of a subject affected by a disease or disorder (e.g., identification of pre-metastatic or metastatic cancerous states, stages of cancer, or responsiveness of cancer to therapy), and therametrics (e.g., monitoring a subject's condition to provide information as to the effect or efficacy of therapy).

"Sample" or "biological sample" as used herein encompass a variety of sample types, and are generally meant to refer to samples of biological fluids or tissues, particularly samples obtained from tissues, especially from cells of the type associated with a disease or condition for which a diagnostic application is designed (e.g., ductal adenocarcinoma), and the like. "Sample" or "biological sample" are meant to encompass blood and other liquid samples of biological origin, solid tissue samples, such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny thereof. These terms encompass samples that have been manipulated in any way after their procurement as well as derivatives and fractions of samples, where the samples may be manipulated by, for example, treatment with reagents, solubilization, or enrichment for certain components. The terms also encompass clinical samples, and also include cells in cell culture, cell supernatants, cell lysates, serum, plasma, biological fluids, and tissue samples. Where the sample is solid tissue, the cells of the tissue can be dissociated or tissue sections can be analyzed.

The terms "treatment," "treating," "treat" and the like are used herein to generally refer to obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete stabilization or cure for a disease and/or adverse effect attributable to the disease. "Treatment" as used herein covers any treatment of a disease in a mammal, particularly a human, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease symptom, i.e., arresting its development; or (c) relieving the disease symptom, i.e., causing regression of the disease or symptom.

The terms "individual," "subject," "host," and "patient," used interchangeably herein, refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans. Other subjects may include cattle, dogs, cats, guinea pigs, rabbits, rats, mice, horses, and so on.

As used herein the term "isolated" refers to a polynucleotide, a polypeptide, an antibody, or a host cell that is in an environment different from that in which the polynucleotide, the polypeptide, the antibody, or the host cell naturally occurs. A polynucleotide, a polypeptide, an antibody, or a host cell which is isolated is generally substantially purified. As used herein, the term "substantially purified" refers to a compound (e.g., either a polynucleotide or a polypeptide or an antibody) that is removed from its natural environment and is at least 60% free, preferably 75% free, and most preferably 90% free from other components with which it is naturally associated. Thus, for example, a composition containing A is "substantially free of" B when at least 85% by weight of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in the composition, more preferably at least about 95% or even 99% by weight.

A "host cell," as used herein, refers to a microorganism or a eukaryotic cell or cell line cultured as a unicellular entity which can be, or has been, used as a recipient for a recombinant vector or other transfer polynucleotides, and includes the progeny of the original cell which has been transfected. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement to the original parent, due to natural, accidental, or deliberate mutation.

The terms "cancer," "neoplasm," "tumor," and "carcinoma," are used interchangeably herein to refer to cells which exhibit relatively autonomous growth, so that they exhibit an aberrant growth phenotype characterized by a significant loss of control of cell proliferation. In general, cells of interest for detection or treatment in the present application include precancerous (e.g., benign), malignant, metastatic, and non-metastatic cells. Detection of cancerous cells is of particular interest.

The use of "e", as in 10e-3, indicates that the number to the left of "e" is raised to the power of the number to the right of "e" (thus, 10e-3 is 10^-3).
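Note that this definition differs from the E notation used by most programming languages, in which 10e-3 means 10 x 10^-3 (that is, 0.01). The following is a minimal sketch of the convention as defined here; the helper name `patent_e_value` is hypothetical and is not part of the original disclosure.

```python
def patent_e_value(text: str) -> float:
    """Interpret 'XeY' as X raised to the power Y, per the definition above.

    Hypothetical helper for illustration only.
    """
    base, exponent = text.lower().split("e", 1)
    return float(base) ** float(exponent)


# The same string read two ways: the definition in this document versus
# the conventional floating-point E notation.
print(patent_e_value("10e-3"))
print(float("10e-3"))
```

The two readings differ by a factor of ten here, so values written in this style should be converted explicitly before any numeric comparison.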

The term "heterologous" as used herein in the context of, for example, heterologous nucleic acid or amino acid sequences, heterologous polypeptides, or heterologous nucleic acid, is meant to refer to material that originates from a source different from that with which it is joined or associated. For example, two DNA sequences are heterologous to one another if the sequences are from different genes or from different species. A recombinant host cell containing a sequence that is heterologous to the host cell can be, for example, a bacterial cell containing a sequence encoding a human polypeptide.

The invention relates to polynucleotides comprising the disclosed nucleotide sequences, to full length cDNA, mRNA, genomic sequences, and genes corresponding to these sequences and degenerate variants thereof, and to polypeptides encoded by the polynucleotides of the invention and polypeptide variants. The following detailed description describes the polynucleotide compositions encompassed by the invention, methods for obtaining cDNA or genomic DNA encoding a full-length gene product, expression of these polynucleotides and genes, identification of structural motifs of the polynucleotides and genes, identification of the function of a gene product encoded by a gene corresponding to a polynucleotide of the invention, use of the provided polynucleotides as probes and in mapping and in tissue profiling, use of the corresponding polypeptides and other gene products to raise antibodies, and use of the polynucleotides and their encoded gene products for therapeutic and diagnostic purposes.

Polynucleotide Compositions

The present invention provides isolated polynucleotides that represent genes that are differentially expressed in human cancer cells. The polynucleotides, as well as polypeptides encoded thereby, find use in a variety of therapeutic and diagnostic methods. The scope of the invention with respect to compositions containing the isolated polynucleotides useful in the methods described herein includes, but is not necessarily limited to, polynucleotides having a sequence set forth in any one of the polynucleotide sequences provided herein; polynucleotides obtained from the biological materials described herein or other biological sources (particularly human sources) by hybridization under stringent conditions (particularly conditions of high stringency); genes corresponding to the provided polynucleotides; cDNAs corresponding to the provided polynucleotides; variants of the provided polynucleotides and their corresponding genes, particularly those variants that retain a biological activity of the encoded gene product (e.g., a biological activity ascribed to a gene product corresponding to the provided polynucleotides as a result of the assignment of the gene product to a protein family(ies) and/or identification of a functional domain present in the gene product). Other nucleic acid compositions contemplated by and within the scope of the present invention will be readily apparent to one of ordinary skill in the art when provided with the disclosure here. "Polynucleotide" and "nucleic acid" as used herein with reference to nucleic acids of the composition is not intended to be limiting as to the length or structure of the nucleic acid unless specifically indicated. The invention features polynucleotides that represent genes that are expressed in human tissue, specifically human breast tissue, particularly polynucleotides that are differentially expressed in cancerous breast cells. 
Nucleic acid compositions described herein of particular interest are at least about 15 bp in length, at least about 30 bp in length, at least about 50 bp in length, at least about 100 bp, at least about 200 bp in length, at least about 300 bp in length, at least about 500 bp in length, at least about 800 bp in length, at least about 1 kb in length, at least about 2.0 kb in length, at least about 3.0 kb in length, at least about 5 kb in length, at least about 10 kb in length, at least about 50 kb in length and are usually less than about 200 kb in length. These polynucleotides (or polynucleotide fragments) have uses that include, but are not limited to, diagnostic probes and primers, and as starting materials for probes and primers, as discussed herein. The subject polynucleotides usually comprise a sequence set forth in any one of the polynucleotide sequences provided herein, for example, in the sequence listing, incorporated by reference in a table (e.g. by an NCBI accession number), a cDNA deposited at the A.T.C.C., or a fragment or variant thereof. A "fragment" or "portion" of a polynucleotide is a contiguous sequence of residues at least about 10 nt to about 12 nt, 15 nt, 16 nt, 18 nt or 20 nt in length, usually at least about 22 nt, 24 nt, 25 nt, 30 nt, 40 nt, 50 nt, 60 nt, 70 nt, 80 nt, 90 nt, 100 nt to at least about 150 nt, 200 nt, 250 nt, 300 nt, 350 nt, 400 nt, 500 nt, 800 nt or up to about 1000 nt, 1500 or 2000 nt in length. In some embodiments, a fragment of a polynucleotide is the coding sequence of a polynucleotide. A fragment of a polynucleotide may start at position 1 (i.e. the first nucleotide) of a nucleotide sequence provided herein, or may start at about position 10, 20, 30, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 1500 or 2000, or an ATG translational initiation codon of a nucleotide sequence provided herein. 
In this context "about" includes the particularly recited value or a value larger or smaller by several (5, 4, 3, 2, or 1) nucleotides. The described polynucleotides and fragments thereof find use as hybridization probes, PCR primers, BLAST probes, or as an identifying sequence, for example.

The subject nucleic acids may be variants or degenerate variants of a sequence provided herein. In general, variants of a polynucleotide provided herein have a fragment of sequence identity that is greater than at least about 65%, greater than at least about 70%, greater than at least about 75%, greater than at least about 80%, greater than at least about 85%, or greater than at least about 90%, 95%, 96%, 97%, 98%, 99% or more (i.e. 100%) as compared to an identically sized fragment of a provided sequence, as determined by the Smith-Waterman homology search algorithm as implemented in MPSRCH program (Oxford Molecular). For the purposes of this invention, a preferred method of calculating percent identity is the Smith-Waterman algorithm. Global DNA sequence identity should be greater than 65% as determined by the Smith-Waterman homology search algorithm as implemented in MPSRCH program (Oxford Molecular) using an affine gap search with the following search parameters: gap open penalty, 12; and gap extension penalty, 1. The subject nucleic acid compositions include full-length cDNAs or mRNAs that encompass an identifying sequence of contiguous nucleotides from any one of the polynucleotide sequences provided herein.
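To illustrate how a percent identity could be derived from a Smith-Waterman alignment with separate gap open and gap extension penalties (12 and 1, as stated above), the following is a minimal sketch. It is not the MPSRCH implementation. The match and mismatch scores (+5 and -4), the way a gap of length k is charged (open penalty once, extension penalty for each further position), and the definition of percent identity as identical columns over all alignment columns are assumptions made for illustration.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Best local alignment of a and b with affine gap penalties.

    Assumptions (not from the source text): match/mismatch scores, and a gap
    of length k costing gap_open + (k - 1) * gap_extend.
    Returns (score, aligned_a, aligned_b).
    """
    neg = -10**9  # stands in for minus infinity
    n, m = len(a), len(b)
    # M: last column pairs two residues; X: gap in b; Y: gap in a.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(0, M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend)
            if M[i][j] > best:
                best, bi, bj = M[i][j], i, j
    # Trace back from the best-scoring cell to the start of the local alignment.
    out_a, out_b, i, j, state = [], [], bi, bj, "M"
    while best > 0:
        if state == "M":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            prev = {"M": M[i - 1][j - 1], "X": X[i - 1][j - 1], "Y": Y[i - 1][j - 1]}
            i, j = i - 1, j - 1
            state = max(prev, key=prev.get)
            if prev[state] <= 0:
                break  # the alignment started at this column
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            state = "M" if X[i][j] == M[i - 1][j] - gap_open else "X"
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            state = "M" if Y[i][j] == M[i][j - 1] - gap_open else "Y"
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of all alignment columns (assumed definition)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


score, top, bottom = smith_waterman_affine("ACGTACGT", "ACGTTCGT")
print(score, top, bottom, percent_identity(top, bottom))
```

A threshold such as "greater than 65%" would then be applied to the value returned by `percent_identity`. Other programs may normalize identity differently (for example, over the shorter sequence), so results are only comparable when the same definition and scoring parameters are used.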

As discussed above, the polynucleotides useful in the methods described herein also include polynucleotide variants having sequence similarity or sequence identity. Nucleic acids having sequence similarity are detected by hybridization under low stringency conditions, for example, at 50°C and 10XSSC (0.9 M saline/0.09 M sodium citrate) and remain bound when subjected to washing at 55°C in 1XSSC. Sequence identity can be determined by hybridization under high stringency conditions, for example, at 50°C or higher and 0.1XSSC (9 mM saline/0.9 mM sodium citrate). Hybridization methods and conditions are well known in the art, see, e.g., USPN 5,707,829. Nucleic acids that are substantially identical to the provided polynucleotide sequences, e.g. allelic variants, genetically altered versions of the gene, etc., bind to the provided polynucleotide sequences under stringent hybridization conditions. By using probes, particularly labeled probes of DNA sequences, one can isolate homologous or related genes. The source of homologous genes can be any species, e.g. primate species, particularly human; rodents, such as rats and mice; canines, felines, bovines, ovines, equines, yeast, nematodes, etc. In one embodiment, hybridization is performed using a fragment of at least 15 contiguous nucleotides (nt) of at least one of the polynucleotide sequences provided herein. That is, when at least 15 contiguous nt of one of the disclosed polynucleotide sequences is used as a probe, the probe will preferentially hybridize with a nucleic acid comprising the complementary sequence, allowing the identification and retrieval of the nucleic acids that uniquely hybridize to the selected probe. Probes from more than one polynucleotide sequence provided herein can hybridize with the same nucleic acid if the cDNA from which they were derived corresponds to one mRNA.
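As a purely computational analogue of a probe of at least 15 contiguous nt finding its complementary sequence, the sketch below locates perfect reverse-complement matches of a probe within a target strand. It models exact base pairing only and says nothing about stringency, temperature, or salt concentration. The function names and the example sequences are hypothetical.

```python
# Translation table for Watson-Crick complements of DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_sites(probe: str, target: str, min_len: int = 15):
    """Return 0-based start positions in `target` that are perfectly
    complementary to `probe` (hypothetical helper, exact matches only)."""
    if len(probe) < min_len:
        raise ValueError(f"probe must be at least {min_len} nt long")
    site = reverse_complement(probe).upper()
    target = target.upper()
    hits = []
    start = target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)
    return hits


# Made-up sequences: the probe is the reverse complement of a 15-nt site.
site = "ACGTTGCAAGGCTTA"
probe = reverse_complement(site)
target = "TTTT" + site + "GGGG"
print(probe_sites(probe, target))
```

A probe with exactly one hit in the searched sequence set corresponds loosely to the "uniquely hybridize" case described above; multiple hits suggest the probe is not specific.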

Polynucleotides contemplated for use in the invention also include those having a sequence of naturally occurring variants of the nucleotide sequences (e.g., degenerate variants (e.g., sequences that encode the same polypeptides but, due to the degenerate nature of the genetic code, different in nucleotide sequence), allelic variants, etc.). Variants of the polynucleotides contemplated by the invention are identified by hybridization of putative variants with nucleotide sequences disclosed herein, preferably by hybridization under stringent conditions. For example, by using appropriate wash conditions, variants of the polynucleotides described herein can be identified where the allelic variant exhibits at most about 25-30% base pair (bp) mismatches relative to the selected polynucleotide probe. In general, allelic variants contain 15-25% bp mismatches, and can contain as little as even 5-15%, or 2-5%, or 1-2% bp mismatches, as well as a single bp mismatch.
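The notion of a degenerate variant (same encoded polypeptide, different nucleotide sequence) and of a percentage of bp mismatches can be made concrete with a short sketch. It uses the standard genetic code and assumes equal-length, in-frame coding sequences; the function names and example sequences are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatch_percent(a: str, b: str) -> float:
    """Percentage of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x != y for x, y in zip(a.upper(), b.upper())) / len(a)


def is_degenerate_variant(a: str, b: str) -> bool:
    """True if the sequences differ but encode the same polypeptide."""
    return a.upper() != b.upper() and translate(a) == translate(b)


# Made-up example: both sequences encode Met-Ala-Leu.
ref, var = "ATGGCTTTA", "ATGGCCCTG"
print(translate(ref), translate(var), is_degenerate_variant(ref, var))
print(mismatch_percent(ref, var))
```

In this example three of nine positions differ while the encoded peptide is unchanged, which shows that a degenerate variant can carry a substantial mismatch percentage at the nucleotide level.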

The invention also encompasses homologs corresponding to any one of the polynucleotide sequences provided herein, where the source of homologous genes can be any mammalian species, e.g., primate species, particularly human; rodents, such as rats; canines, felines, bovines, ovines, equines, yeast, nematodes, etc. Between mammalian species, e.g., human and mouse, homologs generally have substantial sequence similarity, e.g., at least 75% sequence identity, usually at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or even 100% identity between nucleotide sequences. Sequence similarity is calculated based on a reference sequence, which may be a subset of a larger sequence, such as a conserved motif, coding region, flanking region, etc. A reference sequence will usually be at least about a fragment of a polynucleotide sequence and may extend to the complete sequence that is being compared. Algorithms for sequence analysis are known in the art, such as gapped BLAST, described in Altschul, et al. Nucleic Acids Res. (1997) 25:3389-3402, or TeraBLAST available from TimeLogic Corp. (Crystal Bay, Nevada).

Moreover, representative examples of polynucleotide fragments of the invention (useful, for example, as probes), include, for example, fragments comprising, or alternatively consisting of, a sequence from about nucleotide number 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600, 651-700, 701-750, 751-800, 800-850, 851-900, 901-950, 951-1000, 1001-1050, 1051-1100, 1101-1150, 1151-1200, 1201-1250, 1251-1300, 1301-1350, 1351-1400, 1401-1450, 1451-1500, 1501-1550, 1551-1600, 1601-1650, 1651-1700, 1701-1750, 1751-1800, 1801-1850, 1851-1900, 1901-1950, 1951-2000, 2001-2050, 2051-2100, 2101-2150, 2151-2200, 2201-2250, 2251-2300, 2301-2350, 2351-2400, 2401-2450, 2451-2500, 2501-2550, 2551-2600, 2601-2650, 2651-2700, 2701-2750, 2751-2800, 2801-2850, 2851-2900, 2901-2950, 2951-3000, 3001-3050, 3051-3100, 3101-3150, 3151-3200, 3201-3250, 3251-3300, 3301-3350, 3351-3400, 3401-3450, 3451-3500, 3501-3550, 3551-3600, 3601-3650, 3651-3700, 3701-3750, 3751-3800, 3801-3850, 3851-3900, 3901-3950, 3951-4000, 4001-4050, 4051-4100, 4101-4150, 4151-4200, 4201-4250, 4251-4300, 4301-4350, 4351-4400, 4401-4450, 4451-4500, 4501-4550, 4551-4600, 4601-4650, 4651-4700, 4701-4750, 4751-4800, 4801-4850, 4851-4900, 4901-4950, 4951-5000, 5001-5050, 5051-5100, 5101-5150, 5151-5200, 5201-5250, 5251-5300, 5301-5350, 5351-5400, 5401-5450, 5451-5500, 5501-5550, 5551-5600, 5601-5650, 5651-5700, 5701-5750, 5751-5800, 5801-5850, 5851-5900, 5901-5950, 5951-6000, 6001-6050, 6051-6100, 6101-6150, and 6151 of a subject nucleic acid, or the complementary strand thereto. In this context "about" includes the particularly recited range or a range larger or smaller by several (5, 4, 3, 2, or 1) nucleotides, at either terminus or at both termini. In some embodiments, these fragments encode a polypeptide which has a functional activity (e.g., biological activity) whereas in other embodiments, these fragments are probes, or starting materials for probes. Polynucleotides which hybridize to one or more of these nucleic acid molecules under stringent hybridization conditions or alternatively, under lower stringency conditions, are also encompassed by the invention, as are polypeptides encoded by these polynucleotides or fragments.

The subject nucleic acids can be cDNAs or genomic DNAs, as well as fragments thereof, particularly fragments that encode a biologically active gene product and/or are useful in the methods disclosed herein (e.g., in diagnosis, as a unique identifier of a differentially expressed gene of interest, etc.). The term "cDNA" as used herein is intended to include all nucleic acids that share the arrangement of sequence elements found in native mature mRNA species, where sequence elements are exons and 3' and 5' non-coding regions. Normally mRNA species have contiguous exons, with the intervening introns, when present, being removed by nuclear RNA splicing, to create a continuous open reading frame encoding a polypeptide. mRNA species can also exist with both exons and introns, where the introns may be removed by alternative splicing. Furthermore it should be noted that different species of mRNAs encoded by the same genomic sequence can exist at varying levels in a cell, and detection of these various levels of mRNA species can be indicative of differential expression of the encoded gene product in the cell.
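The consecutive 50-nucleotide fragment ranges enumerated earlier in this passage (1-50, 51-100, and so on) can be generated programmatically. The sketch below yields such windows with 1-based inclusive coordinates; the function name and the window size parameter are hypothetical, and the "about" tolerance of up to 5 nucleotides at either terminus is not modeled.

```python
def fragment_windows(seq: str, size: int = 50):
    """Yield (start, end, fragment) for consecutive windows of `size` nt.

    Coordinates are 1-based and inclusive; the final window may be shorter
    when the sequence length is not a multiple of `size`.
    """
    for offset in range(0, len(seq), size):
        fragment = seq[offset:offset + size]
        yield offset + 1, offset + len(fragment), fragment


# Made-up 120-nt sequence to show the coordinate ranges produced.
for start, end, fragment in fragment_windows("ACGT" * 30):
    print(f"{start}-{end}", len(fragment))
```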

A genomic sequence of interest comprises the nucleic acid present between the initiation codon and the stop codon, as defined in the listed sequences, including all of the introns that are normally present in a native chromosome. It can further include the 3' and 5' untranslated regions found in the mature mRNA. It can further include specific transcriptional and translational regulatory sequences, such as promoters, enhancers, etc., including about 1 kb, but possibly more, of flanking genomic DNA at either the 5' or 3' end of the transcribed region. The genomic DNA can be isolated as a fragment of 100 kbp or smaller, substantially free of flanking chromosomal sequence. The genomic DNA flanking the coding region, either 3' or 5', or internal regulatory sequences as sometimes found in introns, contains sequences required for proper tissue-specific, stage-specific, or disease-state-specific expression.

The nucleic acid compositions of the subject invention can encode all or a part of the naturally-occurring polypeptides. Double or single stranded fragments can be obtained from the DNA sequence by chemically synthesizing oligonucleotides in accordance with conventional methods, by restriction enzyme digestion, by PCR amplification, etc. Probes specific to the polynucleotides described herein can be generated using the polynucleotide sequences disclosed herein. The probes are usually a fragment of a polynucleotide sequence provided herein. The probes can be synthesized chemically or can be generated from longer polynucleotides using restriction enzymes. The probes can be labeled, for example, with a radioactive, biotinylated, or fluorescent tag. Preferably, probes are designed based upon an identifying sequence of any one of the polynucleotide sequences provided herein. More preferably, probes are designed based on a contiguous sequence of one of the subject polynucleotides that remains unmasked following application of a masking program for masking low complexity (e.g., XBLAST, RepeatMasker, etc.) to the sequence, i.e., one would select an unmasked region, as indicated by the polynucleotides outside the poly-n stretches of the masked sequence produced by the masking program.
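The selection of an unmasked region for probe design can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: masked positions are assumed to appear as runs of N or X in the masking program's output, the 20 nt minimum length is a hypothetical cutoff, and the function names and the example sequence are invented here rather than taken from any masking tool.

```python
import re

def unmasked_regions(masked_seq, min_len=20):
    """Return (start, end, subsequence) for each run free of masking characters.

    Coordinates are 1-based and inclusive. Masked positions are assumed to be
    written as N or X; min_len is a hypothetical cutoff, not a value from the text.
    """
    regions = []
    for m in re.finditer(r"[ACGT]+", masked_seq.upper()):
        if m.end() - m.start() >= min_len:
            regions.append((m.start() + 1, m.end(), m.group()))
    return regions

def best_probe_region(masked_seq, min_len=20):
    """Pick the longest unmasked run, or None if no run reaches min_len."""
    regions = unmasked_regions(masked_seq, min_len)
    return max(regions, key=lambda r: r[1] - r[0], default=None)

# Invented example: 10 unmasked nt, 15 masked nt, 24 unmasked nt, 4 masked nt, 5 unmasked nt.
masked = "ACGTACGTAC" + "N" * 15 + "GATTACAGATTACAGATTACAGGC" + "NNNN" + "TTGCA"
region = best_probe_region(masked, min_len=20)
print(region)
```

Only the 24 nt stretch between the two masked runs is long enough here, so it is the one returned as a candidate probe template.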

The polynucleotides of interest in the subject invention are isolated and obtained in substantial purity, generally as other than an intact chromosome. Usually, the polynucleotides, either as DNA or RNA, will be obtained substantially free of other naturally-occurring nucleic acid sequences that they are usually associated with, generally being at least about 50%, usually at least about 90% pure, and are typically "recombinant", e.g., flanked by one or more nucleotides with which they are not normally associated on a naturally occurring chromosome.

The polynucleotides described herein can be provided as a linear molecule or within a circular molecule, and can be provided within autonomously replicating molecules (vectors) or within molecules without replication sequences. Expression of the polynucleotides can be regulated by their own or by other regulatory sequences known in the art. The polynucleotides can be introduced into suitable host cells using a variety of techniques available in the art, such as transferrin polycation-mediated DNA transfer, transfection with naked or encapsulated nucleic acids, liposome-mediated DNA transfer, intracellular transportation of DNA-coated latex beads, protoplast fusion, viral infection, electroporation, gene gun, calcium phosphate-mediated transfection, and the like.

The nucleic acid compositions described herein can be used to, for example, produce polypeptides, as probes for the detection of mRNA in biological samples (e.g., extracts of human cells) or cDNA produced from such samples, to generate additional copies of the polynucleotides, to generate ribozymes or antisense oligonucleotides, and as single stranded DNA probes or as triple-strand forming oligonucleotides. The probes described herein can be used to, for example, determine the presence or absence of any one of the polynucleotides provided herein or variants thereof in a sample. These and other uses are described in more detail below.

The subject nucleic acid compositions can be used, for example, to produce polypeptides, as probes for the detection of mRNA of the invention in biological samples (e.g., extracts of human cells), to generate additional copies of the polynucleotides, to generate ribozymes or antisense oligonucleotides, and as single stranded DNA probes or as triple-strand forming oligonucleotides. The probes described herein can be used to, for example, determine the presence or absence of the polynucleotide sequences as shown in SEQ ID NOS: 1-1485 or variants thereof in a sample. These and other uses are described in more detail below.

Use of Polynucleotides to Obtain Full-Length cDNA, Gene, and Promoter Region

In one embodiment, the polynucleotides are useful as starting materials to construct larger molecules. In one example, the polynucleotides of the invention are used to construct polynucleotides that encode a larger polypeptide (e.g., up to the full-length native polypeptide as well as fusion proteins comprising all or a portion of the native polypeptide) or may be used to produce haptens of the polypeptide (e.g., polypeptides useful to generate antibodies). In one particular example, the polynucleotides of the invention are used to make or isolate cDNA molecules encoding all or a portion of a naturally-occurring polypeptide.

Full-length cDNA molecules comprising the disclosed polynucleotides are obtained as follows. A polynucleotide having a sequence of one of SEQ ID NOS: 1-1485, or a portion thereof comprising at least 12, 15, 18, or 20 nt, is used as a hybridization probe to detect hybridizing members of a cDNA library using probe design methods, cloning methods, and clone selection techniques such as those described in USPN 5,654,173. Libraries of cDNA are made from selected tissues, such as normal or tumor tissue, or from tissues of a mammal treated with, for example, a pharmaceutical agent. Preferably, the tissue is the same as the tissue from which the polynucleotides of the invention were isolated, as both the polynucleotides described herein and the cDNA represent expressed genes. Most preferably, the cDNA library is made from the biological material described herein in the Examples. The choice of cell type for library construction can be made after the identity of the protein encoded by the gene corresponding to the polynucleotide of the invention is known.
This will indicate which tissue and cell types are likely to express the related gene, and thus represent a suitable source for the mRNA for generating the cDNA. Where the provided polynucleotides are isolated from cDNA libraries, the libraries are prepared from mRNA of human prostate cells, more preferably, human prostate cancer cells.

Techniques for producing and probing nucleic acid sequence libraries are described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY. The cDNA can be prepared by using primers based on polynucleotides comprising a sequence of SEQ ID NOS: 1-1485. In one embodiment, the cDNA library can be made from only poly-adenylated mRNA. Thus, poly-T primers can be used to prepare cDNA from the mRNA. Members of the library that are larger than the provided polynucleotides, and preferably that encompass the complete coding sequence of the native message, are obtained.

In order to confirm that the entire cDNA has been obtained, RNA protection experiments are performed as follows. Hybridization of a full-length cDNA to an mRNA will protect the RNA from RNase degradation. If the cDNA is not full length, then the portions of the mRNA that are not hybridized will be subject to RNase degradation. This is assayed, as is known in the art, by changes in electrophoretic mobility on polyacrylamide gels, or by detection of released monoribonucleotides. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY. In order to obtain additional sequences 5' to the end of a partial cDNA, 5' RACE (PCR Protocols: A Guide to Methods and Applications, (1990) Academic Press, Inc.) can be performed.

Genomic DNA is isolated using the provided polynucleotides in a manner similar to the isolation of full-length cDNAs. Briefly, the provided polynucleotides, or portions thereof, are used as probes to libraries of genomic DNA. Preferably, the library is obtained from the cell type that was used to generate the polynucleotides of the invention, but this is not essential. Most preferably, the genomic DNA is obtained from the biological material described herein in the Examples. Such libraries can be in vectors suitable for carrying large segments of a genome, such as P1 or YAC, as described in detail in Sambrook et al., supra, 9.4-9.30. In addition, genomic sequences can be isolated from human BAC libraries, which are commercially available from Research Genetics, Inc., Huntsville, Alabama, USA, for example. In order to obtain additional 5' or 3' sequences, chromosome walking is performed, as described in Sambrook et al., such that adjacent and overlapping fragments of genomic DNA are isolated. These are mapped and pieced together, as is known in the art, using restriction digestion enzymes and DNA ligase.

Using the polynucleotide sequences of the invention, corresponding full-length genes can be isolated using both classical and PCR methods to construct and probe cDNA libraries. Using either method, Northern blots, preferably, are performed on a number of cell types to determine which cell lines express the gene of interest at the highest level. Classical methods of constructing cDNA libraries are taught in Sambrook et al., supra. With these methods, cDNA can be produced from mRNA and inserted into viral or expression vectors. Typically, libraries from mRNA comprising poly(A) tails can be produced with poly(T) primers. Similarly, cDNA libraries can be produced using the instant sequences as primers.

PCR methods are used to amplify the members of a cDNA library that comprise the desired insert. In this case, the desired insert will contain sequence from the full length cDNA that corresponds to the instant polynucleotides. Such PCR methods include gene trapping and RACE methods. Gene trapping entails inserting a member of a cDNA library into a vector. The vector then is denatured to produce single stranded molecules. Next, a substrate-bound probe, such as a biotinylated oligo, is used to trap cDNA inserts of interest. Biotinylated probes can be linked to an avidin-bound solid substrate. PCR methods can be used to amplify the trapped cDNA. To trap sequences corresponding to the full length genes, the labeled probe sequence is based on the polynucleotide sequences of the invention. Random primers or primers specific to the library vector can be used to amplify the trapped cDNA. Such gene trapping techniques are described in Gruber et al., WO 95/04745 and Gruber et al., USPN 5,500,356. Kits are commercially available to perform gene trapping experiments from, for example, Life Technologies, Gaithersburg, Maryland, USA.

"Rapid amplification of cDNA ends," or RACE, is a PCR method of amplifying cDNAs from a number of different RNAs. The cDNAs are ligated to an oligonucleotide linker, and amplified by PCR using two primers. One primer is based on sequence from the instant polynucleotides, for which full length sequence is desired, and a second primer comprises sequence that hybridizes to the oligonucleotide linker to amplify the cDNA. A description of this method is reported in WO

97/19110. In preferred embodiments of RACE, a common primer is designed to anneal to an arbitrary adaptor sequence ligated to cDNA ends (Apte and Siebert, Biotechniques (1993) 15:890-893; Edwards et al., Nuc. Acids Res. (1991) 19:5227-5232). When a single gene-specific RACE primer is paired with the common primer, preferential amplification of sequences between the single gene-specific primer and the common primer occurs. Commercial cDNA pools modified for use in RACE are available.

Another PCR-based method generates a full-length cDNA library with anchored ends without needing specific knowledge of the cDNA sequence. The method uses lock-docking primers (I-VI), where one primer, poly TV (I-III), locks over the polyA tail of eukaryotic mRNA, producing first strand synthesis, and a second primer, polyGH (IV-VI), locks onto the polyC tail added by terminal deoxynucleotidyl transferase (TdT) (see, e.g., WO 96/40998).

The promoter region of a gene generally is located 5' to the initiation site for RNA polymerase II. Hundreds of promoter regions contain the "TATA" box, a sequence such as TATTA or TATAA, which is sensitive to mutations. The promoter region can be obtained by performing 5' RACE using a primer from the coding region of the gene. Alternatively, the cDNA can be used as a probe for the genomic sequence, and the region 5' to the coding region is identified by "walking up." If the gene is highly expressed or differentially expressed, the promoter from the gene can be of use in a regulatory construct for a heterologous gene.
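As a rough illustration of locating a candidate "TATA" box in sequence recovered 5' to a coding region, the following Python sketch scans for the two example motifs named above (TATAA and TATTA). The function name and the input sequence are hypothetical, and a motif hit alone is of course not evidence of promoter function.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
# TAT[AT]A matches exactly the two example motifs TATAA and TATTA.
TATA_PATTERN = re.compile(r"(?=(TAT[AT]A))")

def find_tata_boxes(upstream_seq):
    """Return (1-based start position, motif) for each TATAA/TATTA occurrence."""
    return [(m.start() + 1, m.group(1))
            for m in TATA_PATTERN.finditer(upstream_seq.upper())]

# Invented upstream sequence containing one copy of each motif.
hits = find_tata_boxes("GGCCTATAAAGGCGCTATTAGC")
print(hits)
```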

Once the full-length cDNA or gene is obtained, DNA encoding variants can be prepared by site-directed mutagenesis, described in detail in Sambrook et al., 15.3-15.63. The choice of codon or nucleotide to be replaced can be based on disclosure herein on optional changes in amino acids to achieve altered protein structure and/or function.

As an alternative method to obtaining DNA or RNA from a biological material, nucleic acid comprising nucleotides having the sequence of one or more polynucleotides of the invention can be synthesized. Thus, the invention encompasses nucleic acid molecules ranging in length from 15 nt (corresponding to at least 15 contiguous nt of one of SEQ ID NOS: 1-1485) up to a maximum length suitable for one or more biological manipulations, including replication and expression, of the nucleic acid molecule. The invention includes but is not limited to (a) nucleic acid having the size of a full gene, and comprising at least one of SEQ ID NOS: 1-1485; (b) the nucleic acid of (a) also comprising at least one additional gene, operably linked to permit expression of a fusion protein; (c) an expression vector comprising (a) or (b); (d) a plasmid comprising (a) or (b); and (e) a recombinant viral particle comprising (a) or (b). Once provided with the polynucleotides disclosed herein, construction or preparation of (a) - (e) is well within the skill in the art.

The sequence of a nucleic acid comprising at least 15 contiguous nt of at least any one of SEQ ID NOS: 1-1485, preferably the entire sequence of at least any one of SEQ ID NOS: 1-1485, is not limited and can be any sequence of A, T, G, and/or C (for DNA) and A, U, G, and/or C (for RNA) or modified bases thereof, including inosine and pseudouridine. The choice of sequence will depend on the desired function and can be dictated by coding regions desired, the intron-like regions desired, and the regulatory regions desired. Where the entire sequence of any one of SEQ ID NOS: 1-1485 is within the nucleic acid, the nucleic acid obtained is referred to herein as a polynucleotide comprising the sequence of any one of SEQ ID NOS: 1-1485.

Expression of Polypeptide Encoded by Full-Length cDNA or Full-Length Gene

The provided polynucleotides (e.g., a polynucleotide having a sequence of one of SEQ ID NOS: 1-1485), the corresponding cDNA, or the full-length gene is used to express a partial or complete gene product. Constructs of polynucleotides having sequences of SEQ ID NOS: 1-1485 can also be generated synthetically. Alternatively, single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides is described by, e.g., Stemmer et al., Gene (Amsterdam) (1995) 164(1):49-53. This method, assembly PCR, synthesizes long DNA sequences from large numbers of oligodeoxyribonucleotides (oligos). The method is derived from DNA shuffling (Stemmer, Nature (1994) 370:389-391), and does not rely on DNA ligase, but instead relies on DNA polymerase to build increasingly longer DNA fragments during the assembly process.

Appropriate polynucleotide constructs are purified using standard recombinant DNA techniques as described in, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY, and under current regulations described in United States Dept. of HHS, National Institutes of Health (NIH) Guidelines for Recombinant DNA Research. The gene product encoded by a polynucleotide of the invention is expressed in any expression system, including, for example, bacterial, yeast, insect, amphibian and mammalian systems. Vectors, host cells and methods for obtaining expression in same are well known in the art. Suitable vectors and host cells are described in USPN 5,654,173.

Polynucleotide molecules comprising a polynucleotide sequence provided herein are generally propagated by placing the molecule in a vector. Viral and non-viral vectors are used, including plasmids. The choice of plasmid will depend on the type of cell in which propagation is desired and the purpose of propagation. Certain vectors are useful for amplifying and making large amounts of the desired DNA sequence. Other vectors are suitable for expression in cells in culture. Still other vectors are suitable for transfer and expression in cells in a whole animal or person. The choice of appropriate vector is well within the skill of the art. Many such vectors are available commercially. Methods for preparation of vectors comprising a desired sequence are well known in the art.

The polynucleotides set forth in SEQ ID NOS: 1-1485 or their corresponding full-length polynucleotides are linked to regulatory sequences as appropriate to obtain the desired expression properties. These can include promoters (attached either at the 5' end of the sense strand or at the 3' end of the antisense strand), enhancers, terminators, operators, repressors, and inducers. The promoters can be regulated or constitutive. In some situations it may be desirable to use conditionally active promoters, such as tissue-specific or developmental stage-specific promoters. These are linked to the desired nucleotide sequence using the techniques described above for linkage to vectors. Any techniques known in the art can be used.

When any of the above host cells, or other appropriate host cells or organisms, are used to replicate and/or express the polynucleotides or nucleic acids of the invention, the resulting replicated nucleic acid, RNA, expressed protein or polypeptide, is within the scope of the invention as a product of the host cell or organism. The product is recovered by any appropriate means known in the art. Once the gene corresponding to a selected polynucleotide is identified, its expression can be regulated in the cell to which the gene is native. For example, an endogenous gene of a cell can be regulated by an exogenous regulatory sequence as disclosed in USPN 5,641,670.

Identification of Functional and Structural Motifs

Translations of the nucleotide sequence of the provided polynucleotides, cDNAs or full genes can be aligned with individual known sequences. Similarity with individual sequences can be used to determine the activity of the polypeptides encoded by the polynucleotides of the invention. Also, sequences exhibiting similarity with more than one individual sequence can exhibit activities that are characteristic of either or both individual sequences.

The full length sequences and fragments of the polynucleotide sequences of the nearest neighbors as identified through, for example, BLAST-based searching, can be used as probes and primers to identify and isolate the full length sequence corresponding to provided polynucleotides. The nearest neighbors can indicate a tissue or cell type to be used to construct a library for the full-length sequences corresponding to the provided polynucleotides.

Typically, a selected polynucleotide is translated in all six frames to determine the best alignment with the individual sequences. The sequences disclosed herein in the Sequence Listing are in a 5' to 3' orientation and translation in three frames can be sufficient (with a few specific exceptions as described in the Examples). These amino acid sequences are referred to, generally, as query sequences, which will be aligned with the individual sequences. Databases with individual sequences are described in "Computer Methods for Macromolecular Sequence Analysis" Methods in Enzymology (1996) 266, Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Databases include GenBank, EMBL, and DNA Database of Japan (DDBJ).

Query and individual sequences can be aligned using the methods and computer programs described above, and include BLAST 2.0, available over the world wide web at a site supported by the National Center for Biotechnology Information, which is supported by the National Library of Medicine and the National Institutes of Health, or TeraBLAST available from TimeLogic Corp. (Crystal Bay, Nevada). See also Altschul, et al. Nucleic Acids Res. (1997) 25:3389-3402. Another alignment algorithm is Fasta, available in the Genetics Computer Group (GCG) package, Madison, Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for alignment are described in Doolittle, supra. Preferably, an alignment program that permits gaps in the sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. (1997) 70:173-187. Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to identify distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Amino acid sequences encoded by the provided polynucleotides can be used to search both protein and DNA databases.

Incorporated herein by reference are all sequences that have been made public as of the filing date of this application by any of the DNA or protein sequence databases, including the patent databases (e.g., GeneSeq). Also incorporated by reference are those sequences that have been submitted to these databases as of the filing date of the present application but not made public until after the filing date of the present application.
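The translation of a selected polynucleotide in all six frames, or in the three forward frames only, can be sketched in Python as follows. This is a minimal illustration rather than the procedure of any particular alignment program: it assumes the standard genetic code, writes stop codons as "*", writes codons containing ambiguous bases as "X", and uses function names and an example sequence invented for this sketch.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame_translations(seq, forward_only=False):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    if not forward_only:
        rc = reverse_complement(seq)
        frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames

frames = six_frame_translations("ATGGCCAAGTAA")
for label, protein in frames.items():
    print(label, protein)
```

Passing `forward_only=True` corresponds to the three-frame case noted above for sequences already listed in a 5' to 3' orientation; each resulting string can then serve as a query sequence.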

Results of individual and query sequence alignments can be divided into three categories: high similarity, weak similarity, and no similarity. Individual alignment results ranging from high similarity to weak similarity provide a basis for determining polypeptide activity and/or structure. Parameters for categorizing individual results include: percentage of the alignment region length where the strongest alignment is found, percent sequence identity, and p value. The percentage of the alignment region length is calculated by counting the number of residues of the individual sequence found in the region of strongest alignment, i.e., the contiguous region of the individual sequence that contains the greatest number of residues that are identical to the residues of the corresponding region of the aligned query sequence. This number is divided by the total residue length of the query sequence to calculate a percentage. For example, a query sequence of 20 amino acid residues might be aligned with a 20 amino acid region of an individual sequence. The individual sequence might be identical to amino acid residues 5, 9-15, and 17-19 of the query sequence. The region of strongest alignment is thus the region stretching from residue 9 to residue 19, an 11 amino acid stretch. The percentage of the alignment region length is: 11 (length of the region of strongest alignment) divided by 20 (query sequence length), or 55%.

Percent sequence identity is calculated by counting the number of amino acid matches between the query and individual sequence and dividing the total number of matches by the number of residues of the individual sequence found in the region of strongest alignment. Thus, the percent identity in the example above would be 10 matches divided by 11 amino acids, or approximately 90.9%.
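The worked example above (a 20-residue query identical to the individual sequence at residues 5, 9-15, and 17-19) can be reproduced with a short Python sketch. One assumption is made loudly here: the text does not give an algorithm for finding the region of strongest alignment, so the sketch uses a simple maximal-scoring segment with +1 per identical residue and -1 per non-identical residue, which happens to select residues 9-19 for this example. The function names are hypothetical.

```python
def strongest_region(match_flags):
    """Return (start, end), 1-based inclusive, of the maximal-scoring segment.

    Scoring (+1 identical, -1 otherwise) is an assumption of this sketch,
    not a rule stated in the text.
    """
    best_score, best_span = 0, None
    score, start = 0, 0
    for i, is_match in enumerate(match_flags):
        if score <= 0:              # restart the candidate segment here
            score, start = 0, i
        score += 1 if is_match else -1
        if score > best_score:
            best_score, best_span = score, (start + 1, i + 1)
    if best_span is None:
        raise ValueError("no identical residues in the alignment")
    return best_span

def alignment_metrics(match_flags):
    """Return the region, percent alignment region length, and percent identity."""
    start, end = strongest_region(match_flags)
    region_len = end - start + 1
    matches = sum(match_flags[start - 1:end])
    pct_region = 100.0 * region_len / len(match_flags)   # region length / query length
    pct_identity = 100.0 * matches / region_len          # matches / region length
    return (start, end), pct_region, pct_identity

# Residues of the 20-residue query that are identical in the individual sequence.
identical = {5, *range(9, 16), *range(17, 20)}
flags = [pos in identical for pos in range(1, 21)]
span, pct_region, pct_identity = alignment_metrics(flags)
print(span, round(pct_region, 1), round(pct_identity, 1))
```

For this input the region is residues 9-19, the percentage of the alignment region length is 55%, and the percent identity is about 90.9%, matching the figures in the example.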

P value is the probability that the alignment was produced by chance. For a single alignment, the p value can be calculated according to Karlin et al., Proc. Natl. Acad. Sci. (1990) 87:2264 and Karlin et al., Proc. Natl. Acad. Sci. (1993) 90. The p value of multiple alignments using the same query sequence can be calculated using an heuristic approach described in Altschul et al., Nat. Genet. (1994) 6:119. Alignment programs, such as BLAST or TeraBLAST, can calculate the p value. See also Altschul et al., Nucleic Acids Res. (1997) 25:3389-3402.

Another factor to consider for determining identity or similarity is the location of the similarity or identity. Strong local alignment can indicate similarity even if the length of alignment is short. Sequence identity scattered throughout the length of the query sequence also can indicate a similarity between the query and profile sequences. The boundaries of the region where the sequences align can be determined according to Doolittle, supra; BLAST 2.0 (see, e.g., Altschul, et al. Nucleic Acids Res. (1997) 25:3389-3402), TeraBLAST (available from TimeLogic Corp., Crystal Bay, Nevada), or FAST programs; or by determining the area where sequence identity is highest.

High Similarity. In general, in alignment results considered to be of high similarity, the percent of the alignment region length is typically at least about 55% of the total length of the query sequence; more typically, at least about 58%; even more typically, at least about 60% of the total residue length of the query sequence. Usually, percent length of the alignment region can be as much as about 62%; more usually, as much as about 64%; even more usually, as much as about 66%. Further, for high similarity, the region of alignment typically exhibits at least about 75% sequence identity; more typically, at least about 78%; even more typically, at least about 80% sequence identity. Usually, percent sequence identity can be as much as about 82%; more usually, as much as about 84%; even more usually, as much as about 86%.

The p value is used in conjunction with these methods. If high similarity is found, the query sequence is considered to have high similarity with a profile sequence when the p value is less than or equal to about 10e-2; more usually, less than or equal to about 10e-3; even more usually, less than or equal to about 10e-4. More typically, the p value is no more than about 10e-5; more typically, no more than about 10e-10; even more typically, no more than about 10e-15 for the query sequence to be considered high similarity.

Weak Similarity. In general, where alignment results are considered to be of weak similarity, there is no minimum percent length of the alignment region nor minimum length of alignment. A better showing of weak similarity is considered when the region of alignment is, typically, at least about 15 amino acid residues in length; more typically, at least about 20; even more typically, at least about 25 amino acid residues in length. Usually, length of the alignment region can be as much as about 30 amino acid residues; more usually, as much as about 40; even more usually, as much as about 60 amino acid residues. Further, for weak similarity, the region of alignment typically exhibits at least about 35% sequence identity; more typically, at least about 40%; even more typically, at least about 45% sequence identity. Usually, percent sequence identity can be as much as about 50%; more usually, as much as about 55%; even more usually, as much as about 60%.
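The way the three parameters combine can be sketched as a simple classifier in Python. This is an illustration only, built from the loosest "at least about" bounds stated for each category; real results would be judged against the graded ranges rather than hard cutoffs. Several choices are assumptions of the sketch: "10e-2" is read as 10^-2, the same p value cutoff is applied to both categories, the 15-residue length that the text calls "a better showing" is treated as a requirement for weak similarity, and the function and constant names are hypothetical.

```python
# Loosest bounds taken from the text; treating them as hard cutoffs is an assumption.
P_VALUE_MAX = 1e-2           # "10e-2" read as 10^-2 (assumption)
HIGH_MIN_PCT_REGION = 55.0   # percent of query length covered by strongest region
HIGH_MIN_PCT_IDENTITY = 75.0
WEAK_MIN_REGION_LEN = 15     # residues; "a better showing", used here as a floor
WEAK_MIN_PCT_IDENTITY = 35.0

def classify_similarity(pct_region_length, pct_identity, region_length, p_value):
    """Return 'high', 'weak', or 'none' for one query/individual alignment."""
    if p_value > P_VALUE_MAX:
        return "none"
    if (pct_region_length >= HIGH_MIN_PCT_REGION
            and pct_identity >= HIGH_MIN_PCT_IDENTITY):
        return "high"
    if (region_length >= WEAK_MIN_REGION_LEN
            and pct_identity >= WEAK_MIN_PCT_IDENTITY):
        return "weak"
    return "none"

# The 20-residue worked example (55% region length, 90.9% identity, 11 residues),
# paired with an invented p value.
print(classify_similarity(55.0, 90.9, 11, 1e-5))
print(classify_similarity(30.0, 40.0, 25, 1e-3))
```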

If weak similarity is found, the query sequence is considered to have weak similarity with a profile sequence when the p value is usually less than or equal to about 10e-2; more usually, less than or equal to about 10e-3; even more usually, less than or equal to about 10e-4. More typically, the p value is no more than about 10e-5; more usually, no more than about 10e-10; even more usually, no more than about 10e-15 for the query sequence to be considered weak similarity.

Similarity Determined by Sequence Identity Alone. Sequence identity alone can be used to determine similarity of a query sequence to an individual sequence and can indicate the activity of the sequence. Such an alignment, preferably, permits gaps to align sequences. Typically, the query sequence is related to the profile sequence if the sequence identity over the entire query sequence is at least about 15%; more typically, at least about 20%; even more typically, at least about 25%; even more typically, at least about 50%. Sequence identity alone as a measure of similarity is most useful when the query sequence is usually at least 80 residues in length; more usually, at least 90 residues in length; even more usually, at least 95 amino acid residues in length. More typically, similarity can be concluded based on sequence identity alone when the query sequence is preferably 100 residues in length; more preferably, 120 residues in length; even more preferably, 150 amino acid residues in length.

Alignments with Profile and Multiple Aligned Sequences. Translations of the provided polynucleotides can be aligned with amino acid profiles that define either protein families or common motifs. Also, translations of the provided polynucleotides can be aligned to multiple sequence alignments (MSA) comprising the polypeptide sequences of members of protein families or motifs. Similarity or identity with profile sequences or MSAs can be used to determine the activity of the gene products (e.g., polypeptides) encoded by the provided polynucleotides or corresponding cDNA or genes. For example, sequences that show an identity or similarity with a chemokine profile or MSA can exhibit chemokine activities.

Profiles can be designed manually by (1) creating an MSA, which is an alignment of the amino acid sequence of members that belong to the family and (2) constructing a statistical representation of the alignment. Such methods are described, for example, in Birney et al., Nucl. Acid Res. (1996) 24(14): 2730-2739. MSAs of some protein families and motifs are publicly available. For example, the Genome Sequencing Center at the Washington University School of Medicine provides a web site (Pfam) which provides MSAs of 547 different families and motifs. These MSAs are described also in Sonnhammer et al., Proteins (1997) 28: 405-420. Other sources over the world wide web include the site supported by the European Molecular Biology Laboratories in Heidelberg, Germany. A brief description of these MSAs is reported in Pascarella et al., Prot. Eng. (1996) 9(3):249-251. Techniques for building profiles from MSAs are described in Sonnhammer et al., supra; Birney et al., supra; and "Computer Methods for Macromolecular Sequence Analysis," Methods in Enzymology (1996) 266, Doolittle, Academic Press, Inc., San Diego, California, USA.

Similarity between a query sequence and a protein family or motif can be determined by (a) comparing the query sequence against the profile and/or (b) aligning the query sequence with the members of the family or motif. Typically, a program such as Searchwise is used to compare the query sequence to the statistical representation of the multiple alignment, also known as a profile (see Birney et al., supra). Other techniques to compare the sequence and profile are described in Sonnhammer et al., supra and Doolittle, supra.

Next, methods described by Feng et al., J. Mol. Evol. (1987) 25:351 and Higgins et al., CABIOS (1989) 5:151 can be used to align the query sequence with the members of a family or motif, also known as an MSA. Sequence alignments can be generated using any of a variety of software tools. Examples include PileUp, which creates a multiple sequence alignment, and is described in Feng et al., J. Mol. Evol. (1987) 25:351. Another method, GAP, uses the alignment method of Needleman et al., J. Mol. Biol. (1970) 48:443. GAP is best suited for global alignment of sequences. A third method, BestFit, functions by inserting gaps to maximize the number of matches using the local homology algorithm of Smith et al., Adv. Appl. Math. (1981) 2:482. In general, the following factors are used to determine if a similarity between a query sequence and a profile or MSA exists: (1) number of conserved residues found in the query sequence, (2) percentage of conserved residues found in the query sequence, (3) number of frameshifts, and (4) spacing between conserved residues.

Some alignment programs that both translate and align sequences can make any number of frameshifts when translating the nucleotide sequence to produce the best alignment. The fewer frameshifts needed to produce an alignment, the stronger the similarity or identity between the query and profile or MSAs. For example, a weak similarity resulting from no frameshifts can be a better indication of activity or structure of a query sequence, than a strong similarity resulting from two frameshifts. Preferably, three or fewer frameshifts are found in an alignment; more preferably two or fewer frameshifts; even more preferably, one or fewer frameshifts; even more preferably, no frameshifts are found in an alignment of query and profile or MSAs.

Conserved residues are those amino acids found at a particular position in all or some of the family or motif members. Alternatively, a position is considered conserved if only a certain class of amino acids is found in a particular position in all or some of the family members. For example, the N-terminal position can contain a positively charged amino acid, such as lysine, arginine, or histidine. Typically, a residue of a polypeptide is conserved when a class of amino acids or a single amino acid is found at a particular position in at least about 40% of all class members; more typically, at least about 50%; even more typically, at least about 60% of the members. Usually, a residue is conserved when a class or single amino acid is found in at least about 70% of the members of a family or motif; more usually, at least about 80%; even more usually, at least about 90%; even more usually, at least about 95%.
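
The conservation test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, gaps are written as "-", and the fraction is computed over all members of the alignment. Only the positively charged class (lysine, arginine, histidine) is taken from the text; the other classes in `AA_CLASSES` are illustrative assumptions, as is the choice to test a single amino acid before testing classes.

```python
from collections import Counter

AA_CLASSES = {
    "positive": frozenset("KRH"),  # class named in the text
    "negative": frozenset("DE"),   # assumption, for illustration only
    "aromatic": frozenset("FWY"),  # assumption, for illustration only
}


def conserved_positions(msa, min_fraction=0.40, classes=AA_CLASSES):
    """Map each conserved column index to the set of residues allowed there.

    A column is conserved when a single amino acid, or failing that a
    single class of amino acids, occurs in at least `min_fraction` of
    all members of the alignment.
    """
    n_members = len(msa)
    conserved = {}
    for index, column in enumerate(zip(*msa)):
        counts = Counter(c for c in column if c != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count / n_members >= min_fraction:
            conserved[index] = {residue}
            continue
        for members in classes.values():
            if sum(counts[aa] for aa in members) / n_members >= min_fraction:
                conserved[index] = set(members)
                break
    return conserved
```

Raising `min_fraction` to 0.70, 0.80, 0.90 or 0.95 corresponds to the progressively stricter thresholds recited above.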

A residue is also considered conserved when three unrelated amino acids are found at a particular position in some or all of the members; more usually, two unrelated amino acids. These residues are conserved when the unrelated amino acids are found at particular positions in at least about 40% of all class members; more typically, at least about 50%; even more typically, at least about 60% of the members. Usually, a residue is conserved when a class or single amino acid is found in at least about 70% of the members of a family or motif; more usually, at least about 80%; even more usually, at least about 90%; even more usually, at least about 95%.

A query sequence has similarity to a profile or MSA when the query sequence comprises at least about 25% of the conserved residues of the profile or MSA; more usually, at least about 30%; even more usually, at least about 40%. Typically, the query sequence has a stronger similarity to a profile sequence or MSA when the query sequence comprises at least about 45% of the conserved residues of the profile or MSA; more typically, at least about 50%; even more typically, at least about 55%.
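
As a hedged sketch of the criterion above, the fraction of conserved residues present in a query can be computed once the query has been aligned to the columns of the profile or MSA. The function names and the input format (a mapping from column index to the set of residues accepted at that column) are assumptions made for illustration. The cut-offs of 25% and 45% are the lowest values recited above for similarity and for stronger similarity, respectively.

```python
def conserved_fraction(aligned_query: str, conserved: dict) -> float:
    """Fraction of conserved positions at which the query carries an accepted residue.

    `aligned_query` is the query as aligned to the profile/MSA columns.
    `conserved` maps a column index to the set of residues accepted there.
    """
    if not conserved:
        raise ValueError("no conserved positions supplied")
    hits = sum(
        1
        for index, allowed in conserved.items()
        if index < len(aligned_query) and aligned_query[index] in allowed
    )
    return hits / len(conserved)


def classify_similarity(aligned_query, conserved, similar=0.25, stronger=0.45):
    """Label the query 'stronger', 'similar' or 'none' from its conserved fraction."""
    fraction = conserved_fraction(aligned_query, conserved)
    if fraction >= stronger:
        return "stronger"
    if fraction >= similar:
        return "similar"
    return "none"
```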

Identification of Secreted & Membrane-Bound Polypeptides. Both secreted and membrane-bound polypeptides of the present invention are of particular interest. For example, levels of secreted polypeptides can be assayed in body fluids that are convenient, such as blood, plasma, serum, and other body fluids such as urine, prostatic fluid and semen. Membrane-bound polypeptides are useful for constructing vaccine antigens or inducing an immune response. Such antigens would comprise all or part of the extracellular region of the membrane-bound polypeptides. Because both secreted and membrane-bound polypeptides comprise a fragment of contiguous hydrophobic amino acids, hydrophobicity predicting algorithms can be used to identify such polypeptides.
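
One common form of hydrophobicity-predicting algorithm is a sliding-window average over a per-residue hydropathy scale. The sketch below is an illustration only and assumes the Kyte-Doolittle hydropathy scale. The function names, the 19-residue window and the 1.6 threshold are illustrative assumptions and are not values recited in this specification.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence.

    Returns an empty list when the sequence is shorter than the window.
    Raises KeyError if the sequence contains a non-standard residue code.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_hydrophobic_segment(sequence: str, window: int = 19,
                            threshold: float = 1.6) -> bool:
    """True when at least one window reaches the (assumed) threshold."""
    return any(score >= threshold for score in hydropathy_windows(sequence, window))
```

A polypeptide flagged by such a scan is only a candidate secreted or membrane-bound polypeptide; the window and threshold would need tuning for any real screen.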

A signal sequence is usually encoded by both secreted and membrane-bound polypeptide genes to direct a polypeptide to the surface of the cell. The signal sequence usually comprises a stretch of hydrophobic residues. Such signal sequences can fold into helical structures. Membrane-bound polypeptides typically comprise at least one transmembrane region that possesses a stretch of hydrophobic amino acids that can traverse the membrane. Some transmembrane regions also exhibit a helical structure. Hydrophobic fragments within a polypeptide can be identified by using computer algorithms. Such algorithms include Hopp & Woods, Proc. Natl. Acad. Sci. USA (1981) 78:3824-3828; Kyte & Doolittle, J. Mol. Biol. (1982) 157: 105-132; and the RAOAR algorithm, Degli Esposti et al., Eur. J. Biochem. (1990) 190: 207-219. Another method of identifying secreted and membrane-bound polypeptides is to translate the polynucleotides of the invention in all six frames and determine if at least 8 contiguous hydrophobic amino acids are present. Those translated polypeptides with at least 8; more typically, 10; even more typically, 12 contiguous hydrophobic amino acids are considered to be either a putative secreted or membrane-bound polypeptide. Hydrophobic amino acids include alanine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, threonine, tryptophan, tyrosine, and valine.

Identification of the Function of an Expression Product of a Full-Length Gene

Ribozymes, antisense constructs, and dominant negative mutants can be used to determine function of the expression product of a gene corresponding to a polynucleotide provided herein. These methods and compositions are particularly useful where the provided novel polynucleotide exhibits no significant or substantial homology to a sequence encoding a gene of known function. Antisense molecules and ribozymes can be constructed from synthetic polynucleotides. Typically, the phosphoramidite method of oligonucleotide synthesis is used. See Beaucage et al., Tet. Lett. (1981) 22:1859 and USPN 4,668,777. Automated devices for synthesis are available to create oligonucleotides using this chemistry. Examples of such devices include Biosearch 8600, Models 392 and 394 by Applied Biosystems, a division of Perkin-Elmer Corp., Foster City, California, USA; and Expedite by Perceptive Biosystems, Framingham, Massachusetts, USA. Synthetic RNA, phosphate analog oligonucleotides, and chemically derivatized oligonucleotides can also be produced, and can be covalently attached to other molecules. RNA oligonucleotides can be synthesized, for example, using RNA phosphoramidites. This method can be performed on an automated synthesizer, such as Applied Biosystems, Models 392 and 394, Foster City, California, USA. Phosphorothioate oligonucleotides can also be synthesized for antisense construction. A sulfurizing reagent, such as tetraethylthiuram disulfide (TETD) in acetonitrile, can be used to convert the internucleotide cyanoethyl phosphite to the phosphorothioate triester within 15 minutes at room temperature. TETD replaces the iodine reagent, while all other reagents used for standard phosphoramidite chemistry remain the same. Such a synthesis method can be automated using Models 392 and 394 by Applied Biosystems, for example.

Oligonucleotides of up to 200 nt can be synthesized, more typically, 100 nt; more typically 50 nt; even more typically, 30 to 40 nt. These synthetic fragments can be annealed and ligated together to construct larger fragments. See, for example, Sambrook et al., supra. Trans-cleaving catalytic RNAs (ribozymes) are RNA molecules possessing endoribonuclease activity. Ribozymes are specifically designed for a particular target, and the target message must contain a specific nucleotide sequence. They are engineered to cleave any RNA species site-specifically in the background of cellular RNA. The cleavage event renders the mRNA unstable and prevents protein expression. Importantly, ribozymes can be used to inhibit expression of a gene of unknown function for the purpose of determining its function in an in vitro or in vivo context, by detecting the phenotypic effect. One commonly used ribozyme motif is the hammerhead, for which the substrate sequence requirements are minimal. Design of the hammerhead ribozyme, as well as therapeutic uses of ribozymes, are disclosed in Usman et al., Current Opin. Struct. Biol. (1996) 6:527. Methods for production of ribozymes, including hairpin structure ribozyme fragments, methods of increasing ribozyme specificity, and the like are known in the art.

The hybridizing region of the ribozyme can be modified or can be prepared as a branched structure as described in Horn and Urdea, Nucleic Acids Res. (1989) 17:6959. The basic structure of the ribozymes can also be chemically altered in ways familiar to those skilled in the art, and chemically synthesized ribozymes can be administered as synthetic oligonucleotide derivatives modified by monomeric units. In a therapeutic context, liposome mediated delivery of ribozymes improves cellular uptake, as described in Birikh et al., Eur. J. Biochem. (1997) 245:1.

Antisense nucleic acids are designed to specifically bind to RNA, resulting in the formation of RNA-DNA or RNA-RNA hybrids, with an arrest of DNA replication, reverse transcription or messenger RNA translation. Antisense polynucleotides based on a selected polynucleotide sequence can interfere with expression of the corresponding gene. Antisense polynucleotides are typically generated within the cell by expression from antisense constructs that contain the antisense strand as the transcribed strand. Antisense polynucleotides based on the disclosed polynucleotides will bind and/or interfere with the translation of mRNA comprising a sequence complementary to the antisense polynucleotide. The expression products of control cells and cells treated with the antisense construct are compared to detect the protein product of the gene corresponding to the polynucleotide upon which the antisense construct is based. The protein is isolated and identified using routine biochemical methods.
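
Because an antisense polynucleotide is complementary to its target, the sequence of a DNA antisense oligonucleotide can be derived from a sense-strand segment by taking its reverse complement. The following minimal sketch assumes the input is a sense (mRNA or coding-strand) segment written 5' to 3'; the function name is hypothetical and the input sequence in the usage line is an arbitrary placeholder, not a sequence of the invention.

```python
# Complement table: accepts DNA (T) or RNA (U) input, emits DNA.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_dna(sense_segment: str) -> str:
    """Return the DNA reverse complement (5' to 3') of a sense-strand segment."""
    segment = sense_segment.upper()
    unknown = set(segment) - set(_COMPLEMENT)
    if unknown:
        raise ValueError(f"unsupported characters: {sorted(unknown)}")
    return "".join(_COMPLEMENT[base] for base in reversed(segment))


# Placeholder input for illustration only.
oligo = antisense_dna("AUGGCC")
```

The resulting oligonucleotide hybridizes to the sense segment over its full length, which is the property the antisense constructs described above rely on.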

Given the extensive background literature and clinical experience in antisense therapy, one skilled in the art can use selected polynucleotides of the invention as additional potential therapeutics. The choice of polynucleotide can be narrowed by first testing them for binding to "hot spot" regions of the genome of cancerous cells. If a polynucleotide is identified as binding to a "hot spot," testing the polynucleotide as an antisense compound in the corresponding cancer cells is warranted.

As an alternative method for identifying function of the gene corresponding to a polynucleotide disclosed herein, dominant negative mutations are readily generated for corresponding proteins that are active as homomultimers. A mutant polypeptide will interact with wild-type polypeptides (made from the other allele) and form a non-functional multimer. Thus, a mutation is in a substrate-binding domain, a catalytic domain, or a cellular localization domain. Preferably, the mutant polypeptide will be overproduced. Point mutations are made that have such an effect. In addition, fusion of different polypeptides of various lengths to the terminus of a protein can yield dominant negative mutants. General strategies are available for making dominant negative mutants (see, e.g., Herskowitz, Nature (1987) 329:219). Such techniques can be used to create loss of function mutations, which are useful for determining protein function.

Polypeptides and Variants Thereof

The polypeptides of the invention include those encoded by the disclosed polynucleotides, as well as those encoded by nucleic acids that, by virtue of the degeneracy of the genetic code, are not identical in sequence to the disclosed polynucleotides. Thus, the invention includes within its scope a polypeptide encoded by a polynucleotide having the sequence of any one of SEQ ID NOS: 1-1485 or a variant thereof. Also included in the invention are the polypeptides comprising the amino acid sequences of SEQ ID NOS:1486-1542. In general, the term "polypeptide" as used herein refers to the full length polypeptide encoded by the recited polynucleotide, the polypeptide encoded by the gene represented by the recited polynucleotide, as well as portions or fragments thereof. "Polypeptides" also includes variants of the naturally occurring proteins, where such variants are homologous or substantially similar to the naturally occurring protein, and can be of an origin of the same or different species as the naturally occurring protein (e.g., human, murine, or some other species that naturally expresses the recited polypeptide, usually a mammalian species). In general, variant polypeptides have a sequence that has at least about 80%, usually at least about 90%, and more usually at least about 98% sequence identity with a differentially expressed polypeptide of the invention, as measured by BLAST 2.0 or TeraBLAST using the parameters described above. The variant polypeptides can be naturally or non-naturally glycosylated, i.e., the polypeptide has a glycosylation pattern that differs from the glycosylation pattern found in the corresponding naturally occurring protein.
The invention also encompasses homologs of the disclosed polypeptides (or fragments thereof) where the homologs are isolated from other species, i.e., other animal or plant species, where such homologs are usually from mammalian species, e.g., rodents, such as mice and rats; domestic animals, e.g., horse, cow, dog, cat; and humans. By "homolog" is meant a polypeptide having at least about 35%, usually at least about 40% and more usually at least about 60% amino acid sequence identity to a particular differentially expressed protein as identified above, where sequence identity is determined using the BLAST 2.0 or TeraBLAST algorithm, with the parameters described supra. In general, the polypeptides of the subject invention are provided in a non-naturally occurring environment, e.g., are separated from their naturally occurring environment. In certain embodiments, the subject protein is present in a composition that is enriched for the protein as compared to a control. As such, purified polypeptide is provided, where by purified is meant that the protein is present in a composition that is substantially free of non-differentially expressed polypeptides, where by substantially free is meant that less than 90%, usually less than 60% and more usually less than 50% of the composition is made up of non-differentially expressed polypeptides. Also within the scope of the invention are variants; variants of polypeptides include mutants, fragments, and fusions. Mutants can include amino acid substitutions, additions or deletions. The amino acid substitutions can be conservative amino acid substitutions or substitutions to eliminate non-essential amino acids, such as to alter a glycosylation site, a phosphorylation site or an acetylation site, or to minimize misfolding by substitution or deletion of one or more cysteine residues that are not necessary for function.
Conservative amino acid substitutions are those that preserve the general charge, hydrophobicity/hydrophilicity, and/or steric bulk of the amino acid substituted. Variants can be designed so as to retain or have enhanced biological activity of a particular region of the protein (e.g., a functional domain and/or, where the polypeptide is a member of a protein family, a region associated with a consensus sequence). Selection of amino acid alterations for production of variants can be based upon the accessibility (interior vs. exterior) of the amino acid (see, e.g., Go et al., Int. J. Peptide Protein Res. (1980) 15:211), the thermostability of the variant polypeptide (see, e.g., Querol et al., Prot. Eng. (1996) 9:265), desired glycosylation sites (see, e.g., Olsen and Thomsen, J. Gen. Microbiol. (1991) 137:579), desired disulfide bridges (see, e.g., Clarke et al., Biochemistry (1993) 32:4322; and Wakarchuk et al., Protein Eng. (1994) 7:1379), desired metal binding sites (see, e.g., Toma et al., Biochemistry (1991) 30:97, and Haezerbrouck et al., Protein Eng. (1993) 6:643), and desired substitutions within proline loops (see, e.g., Masul et al., Appl. Env. Microbiol. (1994) 60:3579). Cysteine-depleted muteins can be produced as disclosed in USPN 4,959,314.

Variants also include fragments of the polypeptides disclosed herein, particularly haptens, biologically active fragments, and/or fragments corresponding to functional domains. Fragments of interest will typically be at least about 10 aa to at least about 15 aa in length, usually at least about 50 aa in length, and can be as long as 300 aa in length or longer, but will usually not exceed about 1000 aa in length, where the fragment will have a stretch of amino acids that is identical to a polypeptide encoded by a polynucleotide having a sequence of any of SEQ ID NOS:1-1485, a polypeptide comprising a sequence of at least one of SEQ ID NOS:1486-1542, or a homolog thereof. The protein variants described herein are encoded by polynucleotides that are within the scope of the invention. The genetic code can be used to select the appropriate codons to construct the corresponding variants. A fragment of a subject polypeptide is, for example, a polypeptide having an amino acid sequence which is a portion of a subject polypeptide, e.g., a polypeptide encoded by a subject polynucleotide that is identified by any one of the sequences of the sequence listing or its complement. The polypeptide fragments of the invention are preferably at least about 9 aa, at least about 15 aa, and more preferably at least about 20 aa, still more preferably at least about 30 aa, and even more preferably, at least about 40 aa, at least about 50 aa, at least about 75 aa, at least about 100 aa, at least about 125 aa or at least about 150 aa in length. A fragment "at least 20 aa in length," for example, is intended to include 20 or more contiguous amino acids from, for example, the polypeptide encoded by a cDNA, in a cDNA clone contained in a deposited library, or a nucleotide sequence shown in the sequence listing or the complementary strand thereof. In this context "about" includes the particularly recited value or a value larger or smaller by several (5, 4, 3, 2, or 1) amino acids.
These polypeptide fragments have uses that include, but are not limited to, production of antibodies as discussed herein. Of course, larger fragments (e.g., at least 150, 175, 200, 250, 500, 600, 1000, or 2000 amino acids in length) are also encompassed by the invention. Moreover, representative examples of polypeptide fragments of the invention (useful, for example, as antigens for antibody production) include fragments comprising, or alternatively consisting of, a sequence from about amino acid number 1-10, 5-10, 10-20, 21-31, 31-40, 41-61, 61-81, 91-120, 121-140, 141-162, 162-200, 201-240, 241-280, 281-320, 321-360, 360-400, 400-450, 451-500, 500-600, 600-700, 700-800, 800-900 and the like. In this context "about" includes the particularly recited range or a range larger or smaller by several (5, 4, 3, 2, or 1) amino acids, at either terminus or at both termini. In some embodiments, these fragments have a functional activity (e.g., biological activity) whereas in other embodiments, these fragments may be used to make an antibody.

Further polypeptide variants are described in PCT publications WO/00-55173, WO/01-07611 and WO/02-16429.

Computer-Related Embodiments

In general, a library of polynucleotides is a collection of sequence information, which information is provided in either biochemical form (e.g., as a collection of polynucleotide molecules), or in electronic form (e.g., as a collection of polynucleotide sequences stored in a computer-readable form, as in a computer system and/or as part of a computer program). The sequence information of the polynucleotides can be used in a variety of ways, e.g., as a resource for gene discovery, as a representation of sequences expressed in a selected cell type (e.g., cell type markers), and/or as markers of a given disease or disease state. In general, a disease marker is a representation of a gene product that is present in all cells affected by disease either at an increased or decreased level relative to a normal cell (e.g., a cell of the same or similar type that is not substantially affected by disease). For example, a polynucleotide sequence in a library can be a polynucleotide that represents an mRNA, polypeptide, or other gene product encoded by the polynucleotide, that is either overexpressed or underexpressed in a breast ductal cell affected by cancer relative to a normal (i.e., substantially disease-free) breast cell.

The nucleotide sequence information of the library can be embodied in any suitable form, e.g., electronic or biochemical forms. For example, a library of sequence information embodied in electronic form comprises an accessible computer data file (or, in biochemical form, a collection of nucleic acid molecules) that contains the representative nucleotide sequences of genes that are differentially expressed (e.g., overexpressed or underexpressed) as between, for example, i) a cancerous cell and a normal cell; ii) a cancerous cell and a dysplastic cell; iii) a cancerous cell and a cell affected by a disease or condition other than cancer; iv) a metastatic cancerous cell and a normal cell and/or non-metastatic cancerous cell; v) a malignant cancerous cell and a non-malignant cancerous cell (or a normal cell) and/or vi) a dysplastic cell relative to a normal cell. Other combinations and comparisons of cells affected by various diseases or stages of disease will be readily apparent to the ordinarily skilled artisan. Biochemical embodiments of the library include a collection of nucleic acids that have the sequences of the genes in the library, where the nucleic acids can correspond to the entire gene in the library or to a fragment thereof, as described in greater detail below.

The polynucleotide libraries of the subject invention generally comprise sequence information of a plurality of polynucleotide sequences, where at least one of the polynucleotides has a sequence of any of SEQ ID NOS: 1-1485. By plurality is meant at least 2, usually at least 3 and can include up to all of SEQ ID NOS: 1-1485. The length and number of polynucleotides in the library will vary with the nature of the library, e.g., if the library is an oligonucleotide array, a cDNA array, a computer database of the sequence information, etc.

Where the library is an electronic library, the nucleic acid sequence information can be present in a variety of media. "Media" refers to a manufacture, other than an isolated nucleic acid molecule, that contains the sequence information of the present invention. Such a manufacture provides the genome sequence or a subset thereof in a form that can be examined by means not directly applicable to the sequence as it exists in a nucleic acid. For example, the nucleotide sequence of the present invention, e.g. the nucleic acid sequences of any of the polynucleotides of SEQ ID NOS: 1-1485, can be recorded on computer readable media, e.g. any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as a floppy disc, a hard disc storage medium, and a magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media. One of skill in the art can readily appreciate how any of the presently known computer readable mediums can be used to create a manufacture comprising a recording of the present sequence information. "Recorded" refers to a process for storing information on computer readable medium, using any such methods as known in the art. Any convenient data storage structure can be chosen, based on the means used to access the stored information. A variety of data processor programs and formats can be used for storage, e.g. word processing text file, database format, etc. In addition to the sequence information, electronic versions of the libraries of the invention can be provided in conjunction or connection with other computer-readable information and/or other types of computer-readable files (e.g., searchable files, executable files, etc, including, but not limited to, for example, search program software, etc.).
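
As one hedged illustration of "recording" sequence information on a computer readable medium, the sketch below writes a small library to a plain text file in FASTA format and reads it back. FASTA is chosen here only as a familiar text format and is not a format required by the specification. The function names, the file name and the record names are hypothetical, and the sequences shown are arbitrary placeholders rather than any of SEQ ID NOS: 1-1485.

```python
def write_library(path, records, width=60):
    """Record a {name: sequence} library as a FASTA text file."""
    with open(path, "w", encoding="ascii") as handle:
        for name, sequence in records.items():
            handle.write(f">{name}\n")
            # Wrap long sequences so the file stays readable in a text editor.
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


def read_library(path):
    """Read a FASTA text file back into a {name: sequence} dictionary."""
    chunks = {}
    current = None
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:]
                chunks[current] = []
            elif current is None:
                raise ValueError("sequence data found before any header line")
            else:
                chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


if __name__ == "__main__":
    # Placeholder records for illustration only.
    library = {"placeholder_record_1": "ACGT" * 20, "placeholder_record_2": "GGCC"}
    write_library("library.fasta", library)
    assert read_library("library.fasta") == library
```

A database table or binary format would serve equally well; the choice depends on the means used to access the stored information, as noted above.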

By providing the nucleotide sequence in computer readable form, the information can be accessed for a variety of purposes. Computer software to access sequence information is publicly available. For example, the gapped BLAST (Altschul et al. Nucleic Acids Res. (1997) 25:3389-3402) and BLAZE (Brutlag et al. Comp. Chem. (1993) 17:203) search algorithms on a Sybase system, or the TeraBLAST (TimeLogic, Crystal Bay, Nevada) program optionally running on a specialized computer platform available from TimeLogic, can be used to identify open reading frames (ORFs) within the genome that contain homology to ORFs from other organisms. As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention. The data storage means can comprise any manufacture comprising a recording of the present sequence information as described above, or a memory access means that can access such a manufacture.

"Search means" refers to one or more programs implemented on the computer-based system, to compare a target sequence or target structural motif, or expression levels of a polynucleotide in a sample, with the stored sequence information. Search means can be used to identify fragments or regions of the genome that match a particular target sequence or target motif. A variety of algorithms are publicly known and commercially available, e.g. MacPattern (EMBL), BLASTN and BLASTX (NCBI), TeraBLAST (TimeLogic, Crystal Bay, Nevada). A "target sequence" can be any polynucleotide or amino acid sequence of six or more contiguous nucleotides or two or more amino acids, preferably from about 10 to 100 amino acids or from about 30 to 300 nt. A variety of comparing means can be used to accomplish comparison of sequence information from a sample (e.g., to analyze target sequences, target motifs, or relative expression levels) with the data storage means. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer based systems of the present invention to accomplish comparison of target sequences and motifs. Computer programs to analyze expression levels in a sample and in controls are also known in the art.

A "target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration that is formed upon the folding of the target motif, or on consensus sequences of regulatory or active sites. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, hairpin structures, promoter sequences and other expression elements such as binding sites for transcription factors.
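
A very simple search means can be sketched as exact matching of a target sequence, and regular-expression matching of a consensus-style target motif, against stored sequence information. This illustration is far less sensitive than the homology search programs named above and is not a substitute for them. The function names and the dictionary-based library are assumptions made for the sketch, the six-nucleotide minimum follows the definition of a target sequence given above, and the records and motif pattern in the usage lines are arbitrary placeholders.

```python
import re


def find_target(library, target, min_len=6):
    """Return (record name, 0-based start) for every exact match of `target`.

    Overlapping matches are reported. Matching is case-insensitive.
    """
    if len(target) < min_len:
        raise ValueError(f"target must be at least {min_len} residues long")
    needle = target.upper()
    hits = []
    for name, sequence in library.items():
        haystack = sequence.upper()
        start = haystack.find(needle)
        while start != -1:
            hits.append((name, start))
            start = haystack.find(needle, start + 1)
    return hits


def find_motif(library, pattern):
    """Return (record name, 0-based start, matched text) for a regex motif.

    Uses re.finditer, so matches within one record do not overlap.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (name, match.start(), match.group())
        for name, sequence in library.items()
        for match in regex.finditer(sequence)
    ]


# Placeholder records and patterns for illustration only.
stored = {"rec1": "ACGTACGTACGT", "rec2": "GGTATAAAGG"}
exact_hits = find_target(stored, "ACGTAC")
motif_hits = find_motif(stored, "TATA[AT]A")
```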

A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. One format for an output means ranks the relative expression levels of different polynucleotides. Such presentation provides a skilled artisan with a ranking of relative expression levels to determine a gene expression profile.
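
The ranked output format described above can be sketched as follows. This is an illustration under stated assumptions: expression levels are taken to be positive numbers keyed by a polynucleotide identifier, relative expression is taken to be the simple ratio of sample level to control level, and the function name is hypothetical. Polynucleotides with a missing or zero control level are skipped rather than guessed at.

```python
def rank_relative_expression(sample_levels, control_levels):
    """Return (identifier, sample/control ratio) pairs, highest ratio first."""
    ratios = {}
    for identifier, level in sample_levels.items():
        reference = control_levels.get(identifier)
        if not reference:
            # No control measurement (or a zero level): ratio is undefined.
            continue
        ratios[identifier] = level / reference
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```

Entries at the top of the ranking are overexpressed in the sample relative to the control and entries at the bottom are underexpressed, which together give one view of a gene expression profile.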

As discussed above, the "library" of the invention also encompasses biochemical libraries of the polynucleotides of SEQ ID NOS: 1-1485, e.g., collections of nucleic acids representing the provided polynucleotides. The biochemical libraries can take a variety of forms, e.g., a solution of cDNAs, a pattern of probe nucleic acids stably associated with a surface of a solid support (i.e., an array) and the like. Of particular interest are nucleic acid arrays in which one or more of SEQ ID NOS: 1-1485 is represented on the array. By array is meant an article of manufacture that has at least a substrate with at least two distinct nucleic acid targets on one of its surfaces, where the number of distinct nucleic acids can be considerably higher, typically being at least 10, usually at least 20, and often at least 25 distinct nucleic acid molecules. A variety of different array formats have been developed and are known to those of skill in the art. The arrays of the subject invention find use in a variety of applications, including gene expression analysis, drug screening, mutation analysis and the like, as disclosed in the above-listed exemplary patent documents.

In addition to the above nucleic acid libraries, analogous libraries of polypeptides are also provided, where the polypeptides of the library will represent at least a portion of the polypeptides encoded by a gene corresponding to one or more of SEQ ID NOS: 1-1485.

Utilities

The polynucleotides of the invention are useful in a variety of applications. Exemplary utilities of the polynucleotides of the invention are described below.

Construction of Larger Molecules: Recombinant DNAs and Nucleic Acid Multimers. In one embodiment of particular interest, the polynucleotides described herein are useful as the building blocks for larger molecules. In one example, the polynucleotide is a component of a larger cDNA molecule which in turn can be adapted for expression in a host cell (e.g., a bacterial or eukaryotic (e.g., yeast or mammalian) host cell). The cDNA can include, in addition to the polypeptide encoded by the starting material polynucleotide (i.e., a polynucleotide described herein), an amino acid sequence that is heterologous to the polypeptide encoded by the polynucleotide described herein (e.g., as in a sequence encoding a fusion protein). In some embodiments, the polynucleotides described herein are used as starting material polynucleotides for synthesizing all or a portion of the gene to which the described polynucleotide corresponds. For example, a DNA molecule encoding a full-length human polypeptide can be constructed using a polynucleotide described herein as starting material. In another embodiment, the polynucleotides of the invention are used in nucleic acid multimers. Nucleic acid multimers can be linear or branched polymers of the same repeating single-stranded oligonucleotide unit or different single-stranded oligonucleotide units. Where the molecules are branched, the multimers are generally described as either "fork" or "comb" structures. The oligonucleotide units of the multimer may be composed of RNA, DNA, modified nucleotides or combinations thereof. At least one of the units has a sequence, length, and composition that permits it to bind specifically to a first single-stranded nucleotide sequence of interest, typically an analyte or an oligonucleotide bound to the analyte.
In order to achieve such specificity and stability, this unit will normally be 15 to 50 nt, preferably 15 to 30 nt, in length and have a GC content in the range of 40% to 60%. In addition to such unit(s), the multimer includes a multiplicity of units that are capable of hybridizing specifically and stably to a second single-stranded nucleotide of interest, typically a labeled oligonucleotide or another multimer. These units will also normally be 15 to 50 nt, preferably 15 to 30 nt, in length and have a GC content in the range of 40% to 60%. When a multimer is designed to be hybridized to another multimer, the first and second oligonucleotide units are heterogeneous (different). One or more of the polynucleotides described herein, or a portion of a polynucleotide described herein, can be used as a repeating unit of such nucleic acid multimers.
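
The length and GC-content ranges stated above can be checked for a candidate oligonucleotide unit with a few lines of code. The following is a minimal illustrative sketch, not part of the disclosed methods; the function names and the example sequence are hypothetical, and the default limits simply restate the "normally 15 to 50 nt" and "40% to 60%" ranges.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def unit_meets_criteria(seq: str, min_len: int = 15, max_len: int = 50,
                        min_gc: float = 0.40, max_gc: float = 0.60) -> bool:
    """Check a candidate unit against the length and GC ranges given in the text.

    The defaults restate the 'normally 15 to 50 nt' and '40% to 60% GC' ranges;
    pass max_len=30 to apply the preferred 15 to 30 nt range instead.
    """
    return (min_len <= len(seq) <= max_len
            and min_gc <= gc_fraction(seq) <= max_gc)


# Hypothetical 20-nt unit with 50% GC content.
candidate = "ATGCATGCATGCATGCATGC"
print(len(candidate), gc_fraction(candidate), unit_meets_criteria(candidate))
```

Such a check addresses only length and base composition; it says nothing about whether the unit hybridizes specifically to its intended target.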

The total number of oligonucleotide units in the multimer will usually be in the range of 3 to 50, more usually 10 to 20. In multimers in which the unit that hybridizes to the nucleotide sequence of interest is different from the unit that hybridizes to the labeled oligonucleotide, the number ratio of the latter to the former will usually be 2:1 to 30:1, more usually 5:1 to 20:1, and preferably 10:1 to 15:1.
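
As a worked illustration of these ranges, the sketch below tests whether a proposed multimer composition falls within the "usual" total unit count and label-to-target unit ratio. It assumes, for simplicity, a multimer made up of only these two kinds of unit; the function and parameter names are hypothetical.

```python
def multimer_in_usual_ranges(target_units: int, label_units: int,
                             total_range=(3, 50),
                             ratio_range=(2.0, 30.0)) -> bool:
    """Check total unit count and label:target ratio against the usual ranges.

    target_units: units that hybridize to the nucleotide sequence of interest.
    label_units:  units that hybridize to the labeled oligonucleotide.
    Assumption: the multimer contains only these two kinds of unit.
    """
    if target_units <= 0:
        raise ValueError("at least one target-binding unit is required")
    total = target_units + label_units
    ratio = label_units / target_units
    return (total_range[0] <= total <= total_range[1]
            and ratio_range[0] <= ratio <= ratio_range[1])


# One target-binding unit and fifteen label-binding units: 16 units, ratio 15:1.
print(multimer_in_usual_ranges(1, 15))
```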

The oligonucleotide units of the multimer may be covalently linked directly to each other through phosphodiester bonds or through interposed linking agents such as nucleic acid, amino acid, carbohydrate or polyol bridges, or through other cross-linking agents that are capable of cross-linking nucleic acid or modified nucleic acid strands. The site(s) of linkage may be at the ends of the unit (in either normal 3'-5' orientation or randomly oriented) and/or at one or more internal nucleotides in the strand. In linear multimers the individual units are linked end-to-end to form a linear polymer. In one type of branched multimer three or more oligonucleotide units emanate from a point of origin to form a branched structure. The point of origin may be another oligonucleotide unit or a multifunctional molecule to which at least three units can be covalently bound. In another type, there is an oligonucleotide unit backbone with one or more pendant oligonucleotide units. These latter-type multimers are "fork-like", "comb-like" or combination "fork-" and "comb-like" in structure. The pendant units will normally depend from a modified nucleotide or other organic moiety having appropriate functional groups to which oligonucleotides may be conjugated or otherwise attached. The multimer may be totally linear, totally branched, or a combination of linear and branched portions. Preferably there will be at least two branch points in the multimer, more preferably at least 3, preferably 5 to 10. The multimer may include one or more segments of double-stranded sequences. Multimeric nucleic acid molecules are useful in amplifying the signal that results from hybridization of the first sequence of the multimeric molecule to a target sequence. The amplification is theoretically proportional to the number of iterations of the second segment.

Without being held to theory, forked structures of greater than about eight branches exhibited steric hindrance which inhibited binding of labeled probes to the multimer. On the other hand, comb structures exhibit little or no steric problems and are thus a preferred type of branched multimer. For a description of branched nucleic acid multimers of both the fork and comb types, as well as methods of use and synthesis, see, e.g., U.S. Pat. Nos. 5,124,246 (fork-type structures); 5,710,264 (synthesis of comb structures); and 5,849,481. Use of Polynucleotide Probes in Mapping, and in Tissue Profiling. Polynucleotide probes, generally comprising at least 12 contiguous nt of a polynucleotide as shown in the Sequence Listing, are used for a variety of purposes, such as chromosome mapping of the polynucleotide and detection of transcription levels. Additional disclosure about preferred regions of the disclosed polynucleotide sequences is found in the Examples. A probe that hybridizes specifically to a polynucleotide disclosed herein should provide a detection signal at least 5-, 10-, or 20-fold higher than the background hybridization provided with other unrelated sequences.
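
The signal-over-background criterion for a specific probe can be expressed as a simple ratio test. The sketch below is illustrative only; the function name and the example signal values are hypothetical, and the tiers restate the 5-, 10-, and 20-fold levels mentioned above.

```python
def specificity_tier(signal: float, background: float, tiers=(5, 10, 20)):
    """Return the highest fold-over-background tier met, or None if none is met.

    signal:     detection signal from the probe on its intended target.
    background: hybridization signal obtained with unrelated sequences.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    ratio = signal / background
    met = [tier for tier in tiers if ratio >= tier]
    return max(met) if met else None


# Hypothetical readings in arbitrary units: 1200 on target, 100 background.
print(specificity_tier(1200, 100))
```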

Detection of Expression Levels. Nucleotide probes are used to detect expression of a gene corresponding to the provided polynucleotide. In Northern blots, mRNA is separated electrophoretically and contacted with a probe. A probe is detected as hybridizing to an mRNA species of a particular size. The amount of hybridization is quantitated to determine relative amounts of expression, for example under a particular condition. Probes are used for in situ hybridization to cells to detect expression. Probes can also be used in vivo for diagnostic detection of hybridizing sequences. Probes are typically labeled with a radioactive isotope. Other types of detectable labels can be used such as chromophores, fluors, and enzymes. Other examples of nucleotide hybridization assays are described in WO92/02526 and USPN 5,124,246. Alternatively, the Polymerase Chain Reaction (PCR) is another means for detecting small amounts of target nucleic acids (see, e.g., Mullis et al., Meth. Enzymol. (1987) 155:335; USPN 4,683,195; and USPN 4,683,202). Two primer polynucleotides that hybridize with the target nucleic acids are used to prime the reaction. The primers can be composed of sequence within or 3' and 5' to the polynucleotides of the Sequence Listing. Alternatively, if the primers are 3' and 5' to these polynucleotides, they need not hybridize to them or the complements. After amplification of the target with a thermostable polymerase, the amplified target nucleic acids can be detected by methods known in the art, e.g., Southern blot. mRNA or cDNA can also be detected by traditional blotting techniques (e.g., Southern blot, Northern blot, etc.)
described in Sambrook et al., "Molecular Cloning: A Laboratory Manual" (New York, Cold Spring Harbor Laboratory, 1989) (e.g., without PCR amplification). In general, mRNA or cDNA generated from mRNA using a polymerase enzyme can be purified and separated using gel electrophoresis, and transferred to a solid support, such as nitrocellulose. The solid support is exposed to a labeled probe, washed to remove any unhybridized probe, and duplexes containing the labeled probe are detected. Mapping. Polynucleotides of the present invention can be used to identify a chromosome on which the corresponding gene resides. Such mapping can be useful in identifying the function of the polynucleotide-related gene by its proximity to other genes with known function. Function can also be assigned to the polynucleotide-related gene when particular syndromes or diseases map to the same chromosome. For example, use of polynucleotide probes in identification and quantification of nucleic acid sequence aberrations is described in USPN 5,783,387. An exemplary mapping method is fluorescence in situ hybridization (FISH), which facilitates comparative genomic hybridization to allow total genome assessment of changes in relative copy number of DNA sequences (see, e.g., Valdes et al., Methods in Molecular Biology (1997) 68:1). Polynucleotides can also be mapped to particular chromosomes using, for example, radiation hybrids or chromosome-specific hybrid panels. See Leach et al., Advances in Genetics (1995) 33:63-99; Walter et al., Nature Genetics (1994) 7:22; Walter and Goodfellow, Trends in Genetics (1992) 9:352. Panels for radiation hybrid mapping are available from Research Genetics, Inc., Huntsville, Alabama, USA. Databases for markers using various panels are available via the world wide web at sites supported by the Stanford Human Genome Center (Stanford University) and the Whitehead Institute for Biomedical Research/MIT Center for Genome Research.
The statistical program RHMAP can be used to construct a map based on the data from radiation hybridization with a measure of the relative likelihood of one order versus another. RHMAP is available via the world wide web at a site supported by the University of Michigan. In addition, commercial programs are available for identifying regions of chromosomes commonly associated with disease, such as cancer. Tissue Typing or Profiling. Expression of specific mRNA corresponding to the provided polynucleotides can vary in different cell types and can be tissue-specific. This variation of mRNA levels in different cell types can be exploited with nucleic acid probe assays to determine tissue types. For example, PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes substantially identical or complementary to polynucleotides listed in the Sequence Listing can determine the presence or absence of the corresponding cDNA or mRNA. Tissue typing can be used to identify the developmental organ or tissue source of a metastatic lesion by identifying the expression of a particular marker of that organ or tissue. If a polynucleotide is expressed only in a specific tissue type, and a metastatic lesion is found to express that polynucleotide, then the developmental source of the lesion has been identified. Expression of a particular polynucleotide can be assayed by detection of either the corresponding mRNA or the protein product. As would be readily apparent to any forensic scientist, the sequences disclosed herein are useful in differentiating human tissue from non-human tissue. In particular, these sequences are useful to differentiate human tissue from bird, reptile, and amphibian tissue, for example.

Use of Polymorphisms. A polynucleotide of the invention can be used in forensics, genetic analysis, mapping, and diagnostic applications where the corresponding region of a gene is polymorphic in the human population. Any means for detecting a polymorphism in a gene can be used, including, but not limited to electrophoresis of protein polymorphic variants, differential sensitivity to restriction enzyme cleavage, and hybridization to allele-specific probes.

Antibody Production. The present invention further provides antibodies, which may be isolated antibodies, that are specific for a polypeptide encoded by a polynucleotide described herein (e.g., a polypeptide encoded by a sequence corresponding to SEQ ID NOS:1-1485, a polypeptide comprising an amino acid sequence of SEQ ID NOS:1486-1542). Antibodies can be provided in a composition comprising the antibody and a buffer and/or a pharmaceutically acceptable excipient. Antibodies specific for a polypeptide associated with prostate cancer are useful in a variety of diagnostic and therapeutic methods, as discussed in detail herein. Expression products of a polynucleotide of the invention, as well as the corresponding mRNA, cDNA, or complete gene, can be prepared and used for raising antibodies for experimental, diagnostic, and therapeutic purposes. For polynucleotides to which a corresponding gene has not been assigned, this provides an additional method of identifying the corresponding gene. The polynucleotide or related cDNA is expressed as described above, and antibodies are prepared. These antibodies are specific to an epitope on the polypeptide encoded by the polynucleotide, and can precipitate or bind to the corresponding native protein in a cell or tissue preparation or in a cell-free extract of an in vitro expression system.

Methods for production of antibodies that specifically bind a selected antigen are well known in the art. Immunogens for raising antibodies can be prepared by mixing a polypeptide encoded by a polynucleotide of the invention with an adjuvant, and/or by making fusion proteins with larger immunogenic proteins. Polypeptides can also be covalently linked to other larger immunogenic proteins, such as keyhole limpet hemocyanin. Immunogens are typically administered intradermally, subcutaneously, or intramuscularly to experimental animals such as rabbits, sheep, and mice, to generate antibodies. Monoclonal antibodies can be generated by isolating spleen cells and fusing them with myeloma cells to form hybridomas. Alternatively, the selected polynucleotide is administered directly, such as by intramuscular injection, and expressed in vivo. The expressed protein generates a variety of protein-specific immune responses, including production of antibodies, comparable to administration of the protein.

Preparations of polyclonal and monoclonal antibodies specific for polypeptides encoded by a selected polynucleotide are made using standard methods known in the art. The antibodies specifically bind to epitopes present in the polypeptides encoded by polynucleotides disclosed in the Sequence Listing. Typically, at least 6, 8, 10, or 12 contiguous amino acids are required to form an epitope. Epitopes that involve non-contiguous amino acids may require a longer polypeptide, e.g., at least 15, 25, or 50 amino acids. Antibodies that specifically bind to human polypeptides encoded by the provided polynucleotides should provide a detection signal at least 5-, 10-, or 20-fold higher than a detection signal provided with other proteins when used in Western blots or other immunochemical assays. Preferably, antibodies that specifically bind polypeptides contemplated by the invention do not bind to other proteins in immunochemical assays at detectable levels and can immunoprecipitate the specific polypeptide from solution.
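
To illustrate what "at least 6, 8, 10, or 12 contiguous amino acids" means in practice, the sketch below enumerates every contiguous peptide of a chosen length from a polypeptide sequence. It is a purely illustrative enumeration, not an epitope-prediction method; the function name and the short example sequence are hypothetical and are not taken from the Sequence Listing.

```python
def contiguous_peptides(protein: str, length: int = 8) -> list:
    """List every window of `length` contiguous residues in `protein`.

    Returns an empty list when the protein is shorter than `length`.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


# Hypothetical 8-residue sequence scanned with a 6-residue window.
for peptide in contiguous_peptides("MKTAYIAK", 6):
    print(peptide)
```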

The invention also contemplates naturally occurring antibodies specific for a polypeptide of the invention. For example, serum antibodies to a polypeptide of the invention in a human population can be purified by methods well known in the art, e.g., by passing antiserum over a column to which the corresponding selected polypeptide or fusion protein is bound. The bound antibodies can then be eluted from the column, for example, using a buffer with a high salt concentration. In addition to the antibodies discussed above, the invention also contemplates genetically engineered antibodies (e.g., chimeric antibodies, humanized antibodies, human antibodies produced by a transgenic animal (e.g., a transgenic mouse such as the Xenomouse™)) and antibody derivatives (e.g., single chain antibodies, antibody fragments (e.g., Fab, etc.)), according to methods well known in the art.

The invention also contemplates other molecules that can specifically bind a polynucleotide or polypeptide of the invention. Examples of such molecules include, but are not necessarily limited to, single-chain binding proteins (e.g., mono- and multi-valent single chain antigen binding proteins (see, e.g., U.S. Patent Nos. 4,704,692; 4,946,778; 6,027,725; 6,121,424)), oligonucleotide-based synthetic antibodies (e.g., oligobodies (see, e.g., Radrizzani et al., Medicina (B Aires) (1999) 59:753-8; Radrizzani et al., Medicina (B Aires) (2000) 60(Suppl 2):55-60)), aptamers (see, e.g., Gening et al., Biotechniques (2001) 3:828, 830, 832, 834; Cox and Ellington, Bioorg. Med. Chem. (2001) 9:2525-31), and the like.

Polynucleotides or Arrays for Diagnostics.

Polynucleotide arrays provide a high throughput technique that can assay a large number of polynucleotides in a sample. This technology can be used as a diagnostic and as a tool to test for differential expression, e.g., to determine function of an encoded protein. A variety of methods of producing arrays, as well as variations of these methods, are known in the art and contemplated for use in the invention. For example, arrays can be created by spotting polynucleotide probes onto a substrate (e.g., glass, nitrocellulose, etc.) in a two-dimensional matrix or array having bound probes. The probes can be bound to the substrate by either covalent bonds or by non-specific interactions, such as hydrophobic interactions. Samples of polynucleotides can be detectably labeled (e.g., using radioactive or fluorescent labels) and then hybridized to the probes. Double stranded polynucleotides, comprising the labeled sample polynucleotides bound to probe polynucleotides, can be detected once the unbound portion of the sample is washed away. Alternatively, the polynucleotides of the test sample can be immobilized on the array, and the probes detectably labeled. Techniques for constructing arrays and methods of using these arrays are described in, for example, Schena et al. (1996) Proc Natl Acad Sci U S A. 93(20):10614-9; Schena et al. (1995) Science 270(5235):467-70; Shalon et al. (1996) Genome Res. 6(7):639-45; USPN 5,807,522; EP 799 897; WO 97/29212; WO 97/27317; EP 785 280; WO 97/02357; USPN 5,593,839; USPN 5,578,832; EP 728 520; USPN 5,599,695; EP 721 016; USPN 5,556,752; WO 95/22058; and USPN 5,631,734.

Arrays can be used to, for example, examine differential expression of genes and can be used to determine gene function. For example, arrays can be used to detect differential expression of a gene corresponding to a polynucleotide of the invention, where expression is compared between a test cell and control cell (e.g., cancer cells and normal cells). For example, high expression of a particular message in a cancer cell, which is not observed in a corresponding normal cell, can indicate a cancer specific gene product. Exemplary uses of arrays are further described in, for example, Pappalarado et al., Sem. Radiation Oncol. (1998) 8:217; and Ramsay Nature Biotechnol. (1998) 16:40. Furthermore, many variations on methods of detection using arrays are well within the skill in the art and within the scope of the present invention. For example, rather than immobilizing the probe to a solid support, the test sample can be immobilized on a solid support which is then contacted with the probe.

Differential Expression in Diagnosis

The polynucleotides of the invention can also be used to detect differences in expression levels between two cells, e.g., as a method to identify abnormal or diseased tissue in a human. For polynucleotides corresponding to profiles of protein families, the choice of tissue can be selected according to the putative biological function. In general, the expression of a gene corresponding to a specific polynucleotide is compared between a first tissue that is suspected of being diseased and a second, normal tissue of the human. The tissue suspected of being abnormal or diseased can be derived from a different tissue type of the human, but preferably it is derived from the same tissue type; for example, an intestinal polyp or other abnormal growth should be compared with normal intestinal tissue.
The normal tissue can be the same tissue as that of the test sample, or any normal tissue of the patient, especially those that express the polynucleotide-related gene of interest (e.g., brain, thymus, testis, heart, prostate, placenta, spleen, small intestine, skeletal muscle, pancreas, and the mucosal lining of the colon). A difference between the polynucleotide-related gene, mRNA, or protein in the two tissues which are compared, for example, in molecular weight, amino acid or nucleotide sequence, or relative abundance, indicates a change in the gene, or a gene which regulates it, in the tissue of the human that was suspected of being diseased. Examples of detection of differential expression and its use in diagnosis of cancer are described in USPNs 5,688,641 and 5,677,125.

A genetic predisposition to disease in a human can also be detected by comparing expression levels of an mRNA or protein corresponding to a polynucleotide of the invention in a fetal tissue with levels associated in normal fetal tissue. Fetal tissues that are used for this purpose include, but are not limited to, amniotic fluid, chorionic villi, blood, and the blastomere of an in vitro-fertilized embryo. The comparable normal polynucleotide-related gene is obtained from any tissue. The mRNA or protein is obtained from a normal tissue of a human in which the polynucleotide-related gene is expressed. Differences such as alterations in the nucleotide sequence or size of the same product of the fetal polynucleotide-related gene or mRNA, or alterations in the molecular weight, amino acid sequence, or relative abundance of fetal protein, can indicate a germline mutation in the polynucleotide-related gene of the fetus, which indicates a genetic predisposition to disease. In general, diagnostic, prognostic, and other methods of the invention based on differential expression involve detection of a level or amount of a gene product, particularly a differentially expressed gene product, in a test sample obtained from a patient suspected of having or being susceptible to a disease (e.g., breast cancer, prostate cancer, lung cancer, colon cancer and/or metastatic forms thereof), and comparing the detected levels to those levels found in normal cells (e.g., cells substantially unaffected by cancer) and/or other control cells (e.g., to differentiate a cancerous cell from a cell affected by dysplasia). Furthermore, the severity of the disease can be assessed by comparing the detected levels of a differentially expressed gene product with those levels detected in samples representing the levels of differentially expressed gene product associated with varying degrees of severity of disease. 
It should be noted that use of the term "diagnostic" herein is not necessarily meant to exclude "prognostic" or "prognosis," but rather is used as a matter of convenience.

The term "differentially expressed gene" is generally intended to encompass a polynucleotide that can, for example, include an open reading frame encoding a gene product (e.g., a polypeptide), and/or introns of such genes and adjacent 5' and 3' non-coding nucleotide sequences involved in the regulation of expression, up to about 20 kb beyond the coding region, but possibly further in either direction. The gene can be introduced into an appropriate vector for extrachromosomal maintenance or for integration into a host genome. In general, a difference in expression level associated with a decrease in expression level of at least about 25%, usually at least about 50% to 75%, more usually at least about 90% or more is indicative of a differentially expressed gene of interest, i.e., a gene that is underexpressed or down-regulated in the test sample relative to a control sample. Furthermore, a difference in expression level associated with an increase in expression of at least about 25%, usually at least about 50% to 75%, more usually at least about 90%, and which can be at least about 1½-fold, usually at least about 2-fold to about 10-fold, and can be about 100-fold to about 1,000-fold relative to a control sample, is indicative of a differentially expressed gene of interest, i.e., an overexpressed or up-regulated gene. "Differentially expressed polynucleotide" as used herein means a nucleic acid molecule (RNA or DNA) comprising a sequence that represents a differentially expressed gene, e.g., the differentially expressed polynucleotide comprises a sequence (e.g., an open reading frame encoding a gene product) that uniquely identifies a differentially expressed gene so that detection of the differentially expressed polynucleotide in a sample is correlated with the presence of a differentially expressed gene in a sample.
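
The minimum thresholds in this definition can be illustrated with a short classification routine. This is a minimal sketch under stated assumptions: expression levels are taken to be positive numbers on a common scale, only the lowest ("at least about 25%") threshold is applied, and the function name and labels are hypothetical.

```python
def classify_expression(test_level: float, control_level: float,
                        min_change: float = 0.25) -> str:
    """Classify a gene by its fractional change in the test sample vs. control.

    min_change restates the 'at least about 25%' threshold; stricter cut-offs
    (e.g., 0.5 or 0.9) can be passed to apply the 'usually' ranges instead.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    change = (test_level - control_level) / control_level
    if change >= min_change:
        return "overexpressed"
    if change <= -min_change:
        return "underexpressed"
    return "unchanged"


# Hypothetical levels in arbitrary units.
print(classify_expression(200, 100))  # two-fold increase
print(classify_expression(50, 100))   # 50% decrease
print(classify_expression(110, 100))  # 10% increase, below threshold
```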
"Differentially expressed polynucleotide" is also meant to encompass fragments of the disclosed polynucleotides, e.g., fragments retaining biological activity, as well as nucleic acids homologous, substantially similar, or substantially identical (e.g., having about 90% sequence identity) to the disclosed polynucleotides.

Methods of the subject invention useful in diagnosis or prognosis typically involve comparison of the abundance of a selected differentially expressed gene product in a sample of interest with that of a control to determine any relative differences in the expression of the gene product, where the difference can be measured qualitatively and/or quantitatively. Quantitation can be accomplished, for example, by comparing the level of expression product detected in the sample with the amounts of product present in a standard curve. A comparison can be made visually; by using a technique such as densitometry, with or without computerized assistance; by preparing a representative library of cDNA clones of mRNA isolated from a test sample, sequencing the clones in the library to determine that number of cDNA clones corresponding to the same gene product, and analyzing the number of clones corresponding to that same gene product relative to the number of clones of the same gene product in a control sample; or by using an array to detect relative levels of hybridization to a selected sequence or set of sequences, and comparing the hybridization pattern to that of a control. The differences in expression are then correlated with the presence or absence of an abnormal expression pattern. A variety of different methods for determining the nucleic acid abundance in a sample are known to those of skill in the art (see, e.g., WO 97/27317).
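
Quantitation against a standard curve, as mentioned above, can be sketched as linear interpolation between calibration points. The example below is an assumption-laden illustration: it presumes a monotonic curve supplied as (amount, signal) pairs and interpolates linearly between neighboring standards; the function name and the calibration values are hypothetical.

```python
def quantify_from_standard_curve(signal: float, standards) -> float:
    """Estimate the amount of product from a measured signal.

    standards: iterable of (known_amount, measured_signal) pairs, assumed to
    form a monotonically increasing curve. Linear interpolation is used
    between the two standards that bracket the measured signal.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return a0
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal outside the range of the standard curve")


# Hypothetical standards: (amount, signal) in arbitrary units.
curve = [(0, 0), (10, 100), (20, 200)]
print(quantify_from_standard_curve(150, curve))
```

Signals outside the calibrated range raise an error rather than being extrapolated, since extrapolation beyond the standards is generally unreliable.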

In general, diagnostic assays of the invention involve detection of a gene product of a polynucleotide sequence (e.g., mRNA or polypeptide) that corresponds to a sequence of SEQ ID NOS:1-1485. The patient from whom the sample is obtained can be apparently healthy, susceptible to disease (e.g., as determined by family history or exposure to certain environmental factors), or can already be identified as having a condition in which altered expression of a gene product of the invention is implicated.

Diagnosis can be determined based on detected gene product expression levels of a gene product encoded by at least one, preferably at least two or more, at least 3 or more, or at least 4 or more of the polynucleotides having a sequence set forth in SEQ ID NOS:1-1485, and can involve detection of expression of genes corresponding to all of SEQ ID NOS:1-1485 and/or additional sequences that can serve as additional diagnostic markers and/or reference sequences. Where the diagnostic method is designed to detect the presence or susceptibility of a patient to cancer, the assay preferably involves detection of a gene product encoded by a gene corresponding to a polynucleotide that is differentially expressed in cancer. Examples of such differentially expressed polynucleotides are described in the Examples below. Given the provided polynucleotides and information regarding their relative expression levels provided herein, assays using such polynucleotides and detection of their expression levels in diagnosis and prognosis will be readily apparent to the ordinarily skilled artisan.

Any of a variety of detectable labels can be used in connection with the various embodiments of the diagnostic methods of the invention. Suitable detectable labels include fluorochromes (e.g., fluorescein isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein, 6-carboxy-X-rhodamine (ROX), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA)), radioactive labels (e.g., 32P, 35S, 3H, etc.), and the like. The detectable label can involve a two-stage system (e.g., biotin-avidin, hapten-anti-hapten antibody, etc.).

Reagents specific for the polynucleotides and polypeptides of the invention, such as antibodies and nucleotide probes, can be supplied in a kit for detecting the presence of an expression product in a biological sample. The kit can also contain buffers or labeling components, as well as instructions for using the reagents to detect and quantify expression products in the biological sample. Exemplary embodiments of the diagnostic methods of the invention are described below in more detail. Polypeptide detection in diagnosis. In one embodiment, the test sample is assayed for the level of a differentially expressed polypeptide, such as a polypeptide of a gene corresponding to SEQ ID NOS:1-1485 and/or a polypeptide comprising a sequence of SEQ ID NOS:1486-1542. Diagnosis can be accomplished using any of a number of methods to determine the absence or presence or altered amounts of the differentially expressed polypeptide in the test sample. For example, detection can utilize staining of cells or histological sections with labeled antibodies, performed in accordance with conventional methods. Cells can be permeabilized to stain cytoplasmic molecules. In general, antibodies that specifically bind a differentially expressed polypeptide of the invention are added to a sample, and incubated for a period of time sufficient to allow binding to the epitope, usually at least about 10 minutes. The antibody can be detectably labeled for direct detection (e.g., using radioisotopes, enzymes, fluorescers, chemiluminescers, and the like), or can be used in conjunction with a second stage antibody or reagent to detect binding (e.g., biotin with horseradish peroxidase-conjugated avidin, a secondary antibody conjugated to a fluorescent compound, e.g., fluorescein, rhodamine, Texas red, etc.). The absence or presence of antibody binding can be determined by various methods, including flow cytometry of dissociated cells, microscopy, radiography, scintillation counting, etc.
Any suitable alternative methods of qualitative or quantitative detection of levels or amounts of differentially expressed polypeptide can be used, for example, ELISA, western blot, immunoprecipitation, radioimmunoassay, etc. mRNA detection. The diagnostic methods of the invention can also or alternatively involve detection of mRNA encoded by a gene corresponding to a differentially expressed polynucleotide of the invention. Any suitable qualitative or quantitative methods known in the art for detecting specific mRNAs can be used. mRNA can be detected by, for example, in situ hybridization in tissue sections, by reverse transcriptase-PCR, or in Northern blots containing poly A+ mRNA. One of skill in the art can readily use these methods to determine differences in the size or amount of mRNA transcripts between two samples. mRNA expression levels in a sample can also be determined by generation of a library of expressed sequence tags (ESTs) from the sample, where the EST library is representative of sequences present in the sample (Adams et al., (1991) Science 252:1651). Enumeration of the relative representation of ESTs within the library can be used to approximate the relative representation of the gene transcript within the starting sample. The results of EST analysis of a test sample can then be compared to EST analysis of a reference sample to determine the relative expression levels of a selected polynucleotide, particularly a polynucleotide corresponding to one or more of the differentially expressed genes described herein. Alternatively, gene expression in a test sample can be performed using serial analysis of gene expression (SAGE) methodology (e.g., Velculescu et al., Science (1995) 270:484) or differential display (DD) methodology (see, e.g., USPN 5,776,683 and USPN 5,807,680).
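
The EST enumeration described above amounts to counting how often each gene is represented in a library and comparing that fraction between a test and a reference library. The following sketch illustrates the arithmetic only; it assumes each EST has already been assigned a gene identifier, and the function names, identifiers, and counts are hypothetical.

```python
from collections import Counter


def relative_representation(est_gene_ids) -> dict:
    """Map each gene to its fraction of all ESTs in one library."""
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def expression_ratio(gene: str, test_ests, reference_ests):
    """Ratio of a gene's relative representation in test vs. reference library.

    Returns None when the gene is absent from the reference library, since
    the ratio is then undefined.
    """
    test = relative_representation(test_ests).get(gene, 0.0)
    reference = relative_representation(reference_ests).get(gene, 0.0)
    if reference == 0:
        return None
    return test / reference


# Hypothetical libraries of eight ESTs each, already assigned to genes.
test_library = ["geneA"] * 4 + ["geneB"] * 4
reference_library = ["geneA"] * 2 + ["geneB"] * 6
print(expression_ratio("geneA", test_library, reference_library))
```

With small libraries such ratios are only rough approximations, because a gene represented by a handful of ESTs is subject to large sampling variation.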

Alternatively, gene expression can be analyzed using hybridization analysis. Oligonucleotides or cDNA can be used to selectively identify or capture DNA or RNA of specific sequence composition, and the amount of RNA or cDNA hybridized to a known capture sequence determined qualitatively or quantitatively, to provide information about the relative representation of a particular message within the pool of cellular messages in a sample. Hybridization analysis can be designed to allow for concurrent screening of the relative expression of hundreds to thousands of genes by using, for example, array-based technologies having high density formats, including filters, microscope slides, or microchips, or solution-based technologies that use spectroscopic analysis (e.g., mass spectrometry). One exemplary use of arrays in the diagnostic methods of the invention is described below in more detail.

Use of a single gene in diagnostic applications. The diagnostic methods of the invention can focus on the expression of a single differentially expressed gene. For example, the diagnostic method can involve detecting a differentially expressed gene, or a polymorphism of such a gene (e.g., a polymorphism in a coding region or control region), that is associated with disease. Disease-associated polymorphisms can include deletion or truncation of the gene, mutations that alter expression level and/or affect activity of the encoded protein, etc.

A number of methods are available for analyzing nucleic acids for the presence of a specific sequence, e.g. a disease associated polymorphism. Where large amounts of DNA are available, genomic DNA is used directly. Alternatively, the region of interest is cloned into a suitable vector and grown in sufficient quantity for analysis. Cells that express a differentially expressed gene can be used as a source of mRNA, which can be assayed directly or reverse transcribed into cDNA for analysis. The nucleic acid can be amplified by conventional techniques, such as the polymerase chain reaction (PCR), to provide sufficient amounts for analysis, and a detectable label can be included in the amplification reaction (e.g., using a detectably labeled primer or detectably labeled oligonucleotides) to facilitate detection. Alternatively, various methods are also known in the art that utilize oligonucleotide ligation as a means of detecting polymorphisms, see, e.g., Riley et al., Nucl. Acids Res. (1990) 18:2887; and Delahunty et al., Am. J. Hum. Genet. (1996) 58:1239. The amplified or cloned sample nucleic acid can be analyzed by one of a number of methods known in the art. The nucleic acid can be sequenced by dideoxy or other methods, and the sequence of bases compared to a selected sequence, e.g., to a wild-type sequence. Hybridization with the polymorphic or variant sequence can also be used to determine its presence in a sample (e.g., by Southern blot, dot blot, etc.). The hybridization pattern of a polymorphic or variant sequence and a control sequence to an array of oligonucleotide probes immobilized on a solid support, as described in US 5,445,934, or in WO 95/35505, can also be used as a means of identifying polymorphic or variant sequences associated with disease. 
Single strand conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and heteroduplex analysis in gel matrices are used to detect conformational changes created by DNA sequence variation as alterations in electrophoretic mobility. Alternatively, where a polymorphism creates or destroys a recognition site for a restriction endonuclease, the sample is digested with that endonuclease, and the products size fractionated to determine whether the fragment was digested. Fractionation is performed by gel or capillary electrophoresis, particularly acrylamide or agarose gels.

Screening for mutations in a gene can be based on the functional or antigenic characteristics of the protein. Protein truncation assays are useful in detecting deletions that can affect the biological activity of the protein. Various immunoassays designed to detect polymorphisms in proteins can be used in screening. Where many diverse genetic mutations lead to a particular disease phenotype, functional protein assays have proven to be effective screening tools. The activity of the encoded protein can be determined by comparison with the wild-type protein.

Diagnosis, Prognosis, Assessment of Therapy (Therametrics), and Management of Cancer

The polynucleotides of the invention, as well as their gene products, are of particular interest as genetic or biochemical markers (e.g., in blood or tissues) that can detect the earliest changes along the carcinogenesis pathway and/or monitor the efficacy of various therapies and preventive interventions. For example, the level of expression of certain polynucleotides can be indicative of a poorer prognosis, and therefore warrant more aggressive chemo- or radio-therapy for a patient or vice versa. The correlation of novel surrogate tumor specific features with response to treatment and outcome in patients can define prognostic indicators that allow the design of tailored therapy based on the molecular profile of the tumor. These therapies include antibody targeting, antagonists (e.g., small molecules), and gene therapy. Determining expression of certain polynucleotides and comparison of a patient's profile with known expression in normal tissue and variants of the disease allows a determination of the best possible treatment for a patient, both in terms of specificity of treatment and in terms of comfort level of the patient. Surrogate tumor markers, such as polynucleotide expression, can also be used to better classify, and thus diagnose and treat, different forms and disease states of cancer. Two classifications widely used in oncology that can benefit from identification of the expression levels of the genes corresponding to the polynucleotides of the invention are staging of the cancerous disorder, and grading the nature of the cancerous tissue.

The polynucleotides that correspond to differentially expressed genes, as well as their encoded gene products, can be useful to monitor patients having or susceptible to cancer to detect potentially malignant events at a molecular level before they are detectable at a gross morphological level. In addition, the polynucleotides of the invention, as well as the genes corresponding to such polynucleotides, can be useful as therametrics, e.g., to assess the effectiveness of therapy by using the polynucleotides or their encoded gene products, to assess, for example, tumor burden in the patient before, during, and after therapy.

Furthermore, a polynucleotide identified as corresponding to a gene that is differentially expressed in, and thus is important for, one type of cancer can also have implications for development or risk of development of other types of cancer, e.g., where a polynucleotide represents a gene differentially expressed across various cancer types. Thus, for example, expression of a polynucleotide corresponding to a gene that has clinical implications for metastatic colon cancer can also have clinical implications for stomach cancer or endometrial cancer.

Staging. Staging is a process used by physicians to describe how advanced the cancerous state is in a patient. Staging assists the physician in determining a prognosis, planning treatment and evaluating the results of such treatment. Staging systems vary with the types of cancer, but generally involve the following "TNM" system: the type of tumor, indicated by T; whether the cancer has metastasized to nearby lymph nodes, indicated by N; and whether the cancer has metastasized to more distant parts of the body, indicated by M. Generally, if a cancer is only detectable in the area of the primary lesion without having spread to any lymph nodes, it is called Stage I or Stage II, depending on the degree of invasiveness as indicated by the tumor grade of the primary lesion. If the primary lesion is of tumor grade I or II and the patient does not have any regional or distant metastasis, the cancer is classified as Stage I. If the primary lesion is of tumor grade III or IV and the patient does not have any regional or distant metastasis, the cancer is classified as Stage II. If the cancer has spread only to the regional lymph nodes, it is classified as Stage III. Cancers that have spread to a distant part of the body, such as liver, bone, brain or other sites, are Stage IV, the most advanced stage.
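The generalized staging rules in the preceding paragraph can be written out as a small decision function. This is only a sketch of that simplified scheme, not a clinical staging tool; the function name and its parameters are hypothetical, and real staging systems vary with the type of cancer.

```python
def simplified_stage(tumor_grade, regional_nodes, distant_metastasis):
    """Return the stage ("I" to "IV") under the generalized rules above.

    tumor_grade: grade of the primary lesion, an integer from 1 to 4.
    regional_nodes: True if the cancer has spread to regional lymph nodes.
    distant_metastasis: True if the cancer has spread to distant sites.
    """
    if tumor_grade not in (1, 2, 3, 4):
        raise ValueError("tumor_grade must be 1, 2, 3 or 4")
    if distant_metastasis:
        return "IV"   # spread to a distant part of the body
    if regional_nodes:
        return "III"  # spread only to the regional lymph nodes
    # No regional or distant metastasis: the tumor grade decides.
    return "I" if tumor_grade <= 2 else "II"
```

For example, a grade 3 primary lesion with no nodal or distant spread falls in Stage II under these rules, whereas any distant spread gives Stage IV regardless of grade.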

The polynucleotides of the invention can facilitate fine-tuning of the staging process by identifying markers for the aggressiveness of a cancer, e.g., the metastatic potential, as well as the presence in different areas of the body. Thus, detection in a Stage II cancer of a polynucleotide signifying high metastatic potential can be used to reclassify a borderline Stage II tumor as a Stage III tumor, justifying more aggressive therapy. Conversely, the presence of a polynucleotide signifying a lower metastatic potential allows more conservative staging of a tumor.

Grading of cancers. Grade is a term used to describe how closely a tumor resembles normal tissue of its same type. The microscopic appearance of a tumor is used to identify tumor grade based on parameters such as cell morphology, cellular organization, and other markers of differentiation. As a general rule, the grade of a tumor corresponds to its rate of growth or aggressiveness, with undifferentiated or high-grade tumors being more aggressive than well-differentiated or low-grade tumors. The following guidelines are generally used for grading tumors: 1) GX Grade cannot be assessed; 2) G1 Well differentiated; 3) G2 Moderately well differentiated; 4) G3 Poorly differentiated; 5) G4 Undifferentiated. The polynucleotides of the invention can be especially valuable in determining the grade of the tumor, as they not only can aid in determining the differentiation status of the cells of a tumor, they can also identify factors other than differentiation that are valuable in determining the aggressiveness of a tumor, such as metastatic potential.

For prostate cancer, the Gleason Grading/Scoring system is most commonly used. A prostate biopsy tissue sample is examined under a microscope and a grade is assigned to the tissue based on: 1) the appearance of the cells, and 2) the arrangement of the cells. Each parameter is assessed on a scale of one (cells are almost normal) to five (abnormal), and the individual Gleason Grades are presented separated by a "+" sign. Alternatively, the two grades are combined to give a Gleason Score of 2-10. Thus, for a tissue sample that received a grade of 3 for each parameter, the Gleason Grade would be 3+3 and the Gleason Score would be 6. A lower Gleason Score indicates a well-differentiated tumor, while a higher Gleason Score indicates a poorly differentiated cancer that is more likely to spread. The majority of biopsies in general are Gleason Scores 5, 6 and 7.
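The arithmetic behind the Gleason Grade and Gleason Score described above is shown in the short sketch below. The function name and argument names are hypothetical; the two arguments are the grades assigned for cell appearance and cell arrangement, each on the one-to-five scale.

```python
def gleason(appearance_grade, arrangement_grade):
    """Return (Gleason Grade string, Gleason Score) for two grades of 1 to 5."""
    for grade in (appearance_grade, arrangement_grade):
        if grade not in (1, 2, 3, 4, 5):
            raise ValueError("each grade must be an integer from 1 to 5")
    grade_text = f"{appearance_grade}+{arrangement_grade}"
    score = appearance_grade + arrangement_grade  # ranges from 2 to 10
    return grade_text, score


grade_text, score = gleason(3, 3)
```

With a grade of 3 for each parameter, as in the example in the text, the Gleason Grade is 3+3 and the Gleason Score is 6.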

The polynucleotides of the Sequence Listing, and their corresponding genes and gene products, can be especially valuable in determining the grade of the tumor, as they not only can aid in determining the differentiation status of the cells of a tumor, they can also identify factors other than differentiation that are valuable in determining the aggressiveness of a tumor, such as metastatic potential.

Assessment of proliferation of cells in tumor. The differential expression level of the polynucleotides described herein can facilitate assessment of the rate of proliferation of tumor cells, and thus provide an indicator of the aggressiveness of the rate of tumor growth. For example, assessment of the relative expression levels of genes involved in the cell cycle can provide an indication of cellular proliferation, and thus serve as a marker of proliferation.

Detection of colon cancer. The polynucleotides corresponding to genes that exhibit the appropriate expression pattern can be used to detect colon cancer in a subject. Colorectal cancer is one of the most common neoplasms in humans and perhaps the most frequent form of hereditary neoplasia. Prevention and early detection are key factors in controlling and curing colorectal cancer. Colorectal cancer begins as polyps, which are small, benign growths of cells that form on the inner lining of the colon. Over a period of several years, some of these polyps accumulate additional mutations and become cancerous. Multiple familial colorectal cancer disorders have been identified, which are summarized as follows: 1) Familial adenomatous polyposis (FAP); 2) Gardner's syndrome; 3) Hereditary nonpolyposis colon cancer (HNPCC); and 4) Familial colorectal cancer in Ashkenazi Jews. The expression of appropriate polynucleotides of the invention can be used in the diagnosis, prognosis and management of colorectal cancer. Detection of colon cancer can be determined using expression levels of any of these sequences alone or in combination with the levels of expression. Determination of the aggressive nature and/or the metastatic potential of a colon cancer can be determined by comparing levels of one or more polynucleotides of the invention and comparing total levels of another sequence known to vary in cancerous tissue, e.g., expression of p53, DCC, ras, or FAP (see, e.g., Fearon ER, et al., Cell (1990) 61(5):759; Hamilton SR et al., Cancer (1993) 72:957; Bodmer W, et al., Nat Genet. (1994) 4(3):217; Fearon ER, Ann N Y Acad Sci. (1995) 768:101). For example, development of colon cancer can be detected by examining the ratio of any of the polynucleotides of the invention to the levels of oncogenes (e.g., ras) or tumor suppressor genes (e.g., FAP or p53).
Thus, expression of specific marker polynucleotides can be used to discriminate between normal and cancerous colon tissue, to discriminate between colon cancers with different cells of origin, to discriminate between colon cancers with different potential metastatic rates, etc. For a review of markers of cancer, see, e.g., Hanahan et al. (2000) Cell 100:57-70.

Detection of prostate cancer. The polynucleotides and their corresponding genes and gene products exhibiting the appropriate differential expression pattern can be used to detect prostate cancer in a subject. Prostate cancer is quite common in humans, with one out of every six men at a lifetime risk for prostate cancer, and can be relatively harmless or extremely aggressive. Some prostate tumors are slow growing, causing few clinical symptoms, while aggressive tumors spread rapidly to the lymph nodes, other organs and especially bone. Over 95% of primary prostate cancers are adenocarcinomas. Signs and symptoms may include: frequent urination, especially at night; inability to urinate; trouble starting or holding back urination; a weak or interrupted urine flow; and frequent pain or stiffness in the lower back, hips or upper thighs.

The prostate is divided into three areas - the peripheral zone, the transition zone, and the central zone - with a layer of tissue surrounding all three. Most prostate tumors form in the peripheral zone, the larger, glandular portion of the organ. Prostate cancer can also form in the tissue of the central zone. Surrounding the prostate is the prostate capsule, a tissue that separates the prostate from the rest of the body. When prostate cancer remains inside the prostate capsule, it is considered localized and treatable with surgery. Once the cancer punctures the capsule and spreads outside, treatment options are more limited. Prevention and early detection are key factors in controlling and curing prostate cancer. While the Gleason Grade or Score of a prostate cancer can provide information useful in determining the appropriate treatment of a prostate cancer, the majority of prostate cancers are Gleason Scores 5, 6, and 7, which exhibit unpredictable behavior. These cancers may behave like less dangerous low-grade cancers or like extremely dangerous high-grade cancers. As a result, a patient living with a medium-grade prostate cancer is at constant risk of developing high-grade cancer. The expression of appropriate polynucleotides can be used in the diagnosis, prognosis and management of prostate cancer. Detection of prostate cancer can be determined using expression levels of any of these sequences alone or in combination with the levels of expression of any other nucleotide sequences. Determination of the aggressive nature and/or the metastatic potential of a prostate cancer can be determined by comparing levels of one or more gene products of the genes corresponding to the polynucleotides described herein, and comparing total levels of another sequence known to vary in cancerous tissue, e.g., expression of p53, DCC, ras, FAP (see, e.g., Fearon ER, et al., Cell (1990) 61(5):759; Hamilton SR et al., Cancer (1993) 72:957; Bodmer W, et al., Nat Genet. (1994) 4(3):217; Fearon ER, Ann N Y Acad Sci. (1995) 768:101).

For example, development of prostate cancer can be detected by examining the ratio of the level of expression of a gene corresponding to a polynucleotide described herein to the levels of oncogenes (e.g., ras) or tumor suppressor genes (e.g., FAP or p53). Thus, expression of specific marker polynucleotides can be used to discriminate between normal and cancerous prostate tissue, to discriminate between prostate cancers with different cells of origin, to discriminate between prostate cancers with different potential metastatic rates, etc. For a review of markers of cancer, see, e.g., Hanahan et al. (2000) Cell 100:57-70.

In addition, many of the signs and symptoms of prostate cancer can be caused by a variety of other non-cancerous conditions. For example, one common cause of many of these signs and symptoms is a condition called benign prostatic hypertrophy, or BPH. In BPH, the prostate gets bigger and may block the flow of urine or interfere with sexual function. The methods and compositions of the invention can be used to distinguish between prostate cancer and such non-cancerous conditions. The methods of the invention can be used in conjunction with conventional methods of diagnosis, e.g., digital rectal exam and/or detection of the level of prostate specific antigen (PSA), a substance produced and secreted by the prostate, and/or prostatic acid phosphatase (PAP).

Detection of breast cancer. The majority of breast cancers are adenocarcinoma subtypes, which can be summarized as follows: 1) ductal carcinoma in situ (DCIS), including comedocarcinoma; 2) infiltrating (or invasive) ductal carcinoma (IDC); 3) lobular carcinoma in situ (LCIS); 4) infiltrating (or invasive) lobular carcinoma (ILC); 5) inflammatory breast cancer; 6) medullary carcinoma; 7) mucinous carcinoma; 8) Paget's disease of the nipple; 9) Phyllodes tumor; and 10) tubular carcinoma.

The expression of polynucleotides of the invention can be used in the diagnosis and management of breast cancer, as well as to distinguish between types of breast cancer. Detection of breast cancer can be determined using expression levels of any of the appropriate polynucleotides of the invention, either alone or in combination. Determination of the aggressive nature and/or the metastatic potential of a breast cancer can also be determined by comparing levels of one or more polynucleotides of the invention and comparing levels of another sequence known to vary in cancerous tissue, e.g., ER expression. In addition, development of breast cancer can be detected by examining the ratio of expression of a differentially expressed polynucleotide to the levels of steroid hormones (e.g., testosterone or estrogen) or to other hormones (e.g., growth hormone, insulin). Thus, expression of specific marker polynucleotides can be used to discriminate between normal and cancerous breast tissue, to discriminate between breast cancers with different cells of origin, to discriminate between breast cancers with different potential metastatic rates, etc.

Detection of lung cancer. The polynucleotides of the invention can be used to detect lung cancer in a subject. Although there are more than a dozen different kinds of lung cancer, the two main types of lung cancer are small cell and nonsmall cell, which encompass about 90% of all lung cancer cases. Small cell carcinoma (also called oat cell carcinoma) usually starts in one of the larger bronchial tubes, grows fairly rapidly, and is likely to be large by the time of diagnosis. Nonsmall cell lung cancer (NSCLC) is made up of three general subtypes of lung cancer. Epidermoid carcinoma (also called squamous cell carcinoma) usually starts in one of the larger bronchial tubes and grows relatively slowly. The size of these tumors can range from very small to quite large. Adenocarcinoma starts growing near the outside surface of the lung and can vary in both size and growth rate. Some slowly growing adenocarcinomas are described as alveolar cell cancer. Large cell carcinoma starts near the surface of the lung, grows rapidly, and the growth is usually fairly large when diagnosed. Other less common forms of lung cancer are carcinoid, cylindroma, mucoepidermoid, and malignant mesothelioma.

The polynucleotides of the invention, e.g., polynucleotides differentially expressed in normal cells versus cancerous lung cells (e.g., tumor cells of high or low metastatic potential) or between types of cancerous lung cells (e.g., high metastatic versus low metastatic), can be used to distinguish types of lung cancer, to identify traits specific to a certain patient's cancer, and to select an appropriate therapy. For example, if the patient's biopsy expresses a polynucleotide that is associated with a low metastatic potential, it may justify leaving a larger portion of the patient's lung in surgery to remove the lesion. Alternatively, a smaller lesion with expression of a polynucleotide that is associated with high metastatic potential may justify a more radical removal of lung tissue and/or the surrounding lymph nodes, even if no metastasis can be identified through pathological examination.

Tumor classification and patient stratification

The invention further provides for methods of classifying tumors, and thus grouping or "stratifying" patients, according to the expression profile of selected differentially expressed genes in a tumor. Differentially expressed genes can be analyzed for correlation with other differentially expressed genes in a single tumor type (e.g., a prostate tumor) or between tumor types (e.g., between prostate and colon tumors). Genes that demonstrate consistent correlation in expression profile in a given cancer cell type (e.g., in a prostate cancer cell or type of prostate cancer) can be grouped together, e.g., when one gene is overexpressed in a tumor, a second gene is also usually overexpressed. Tumors can then be classified according to the expression profile of one or more genes selected from one or more groups.

The tumor of each patient in a pool of potential patients can be classified as described above. Patients having similarly classified tumors can then be selected for participation in an investigative or clinical trial of a cancer therapeutic where a homogeneous population is desired. The tumor classification of a patient can also be used in assessing the efficacy of a cancer therapeutic in a heterogeneous patient population. In addition, therapy for a patient having a tumor of a given expression profile can then be selected accordingly.

Treatment of cancer

The invention further provides methods for reducing growth of cancer cells. In general, the methods comprise contacting a cancer cell with a substance that modulates (1) expression of a polynucleotide corresponding to a gene that is differentially expressed in cancer; or (2) a level of and/or an activity of a cancer-associated polypeptide. In general, the methods provide for decreasing the expression of a gene that is differentially expressed in a cancer cell (e.g., overexpressed) or decreasing the level of and/or decreasing an activity of a cancer-associated polypeptide. The methods also provide for increasing expression of a gene that is underexpressed in a cancer cell or increasing the level of and/or increasing an activity of a cancer-associated polypeptide.

"Reducing growth of cancer cells" includes, but is not limited to, reducing proliferation of cancer cells (e.g., prostate, colon, lung, breast, etc. cancer cells), and reducing the incidence of a non-cancerous cell becoming a cancerous cell. Whether a reduction in cancer cell growth has been achieved can be readily determined using any known assay, including, but not limited to, [³H]-thymidine incorporation; counting cell number over a period of time; detecting and/or measuring a marker associated with the cancer type (e.g., CEA, CA19-9, LASA, PSA, PAP, CA15-3, CA27-29, NSE, LDH, etc.). The present invention provides methods for treating cancer, generally comprising administering to an individual in need thereof a substance that reduces cancer cell growth, in an amount sufficient to reduce cancer cell growth and treat the cancer. Whether a substance, or a specific amount of the substance, is effective in treating cancer can be assessed using any of a variety of known diagnostic assays for the particular type of cancer being treated. The substance can be administered systemically or locally. Thus, in some embodiments, the substance is administered locally, and cancer growth is decreased at the site of administration. Local administration may be useful in treating, e.g., a solid tumor.

A substance that reduces cancer cell growth can be targeted to a cancer cell. Thus, in some embodiments, the invention provides a method of delivering a drug to a cancer cell, comprising administering a drug-antibody complex to a subject, wherein the antibody is specific for a particular cancer-associated polypeptide, and the drug is one that reduces cancer cell growth, a variety of which are known in the art. Targeting can be accomplished by coupling (e.g., linking, directly or via a linker molecule, either covalently or non-covalently, so as to form a drug-antibody complex) a drug to an antibody specific for a particular cancer-associated polypeptide. Methods of coupling a drug to an antibody are well known in the art and need not be elaborated upon herein.

In another embodiment, differentially expressed gene products (e.g., polypeptides or polynucleotides encoding such polypeptides) may be effectively used in treatment through vaccination. The growth of cancer cells is naturally limited in part due to immune surveillance. Stimulation of the immune system using a particular tumor-specific antigen enhances the effect towards the tumor expressing the antigen. An active vaccine comprising a polypeptide encoded by the cDNA of this invention would be appropriately administered to subjects having overabundance of the corresponding RNA, or those predisposed for developing cancer cells with overabundance of the same RNA. Polypeptide antigens are typically combined with an adjuvant as part of a vaccine composition. The vaccine is preferably administered first as a priming dose, and then again as a boosting dose, usually at least four weeks later. Further boosting doses may be given to enhance the effect. The dose and its timing are usually determined by the person responsible for the treatment.

The invention also encompasses the selection of a therapeutic regimen based upon the expression profile of differentially expressed genes in the patient's tumor. For example, a tumor can be analyzed for its expression profile of the genes corresponding to SEQ ID NOS:1-1542 as described herein, e.g., the tumor is analyzed to determine which genes are expressed at elevated levels or at decreased levels relative to normal cells of the same tissue type. The expression patterns of the tumor are then compared to the expression patterns of tumors that respond to a selected therapy. Where the expression profile of the test tumor cell and the expression profile of a tumor cell of known drug responsivity at least substantially match (e.g., selected sets of genes that are at elevated levels in the tumor of known drug responsivity are also at elevated levels in the test tumor cell), then the drug selected for therapy is the drug to which tumors with that expression pattern respond.
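The profile comparison described above can be sketched as follows. The text does not define "substantially match" numerically, so the 0.8 threshold, the "up"/"down" encoding of expression relative to normal tissue, the function names, and the gene and drug labels are all assumptions made purely for illustration.

```python
def match_fraction(test_profile, reference_profile, genes):
    """Fraction of the selected genes whose direction of change agrees."""
    if not genes:
        raise ValueError("at least one gene must be selected")
    matches = sum(
        1
        for gene in genes
        if gene in test_profile
        and test_profile[gene] == reference_profile.get(gene)
    )
    return matches / len(genes)


def select_therapy(test_profile, responder_profiles, genes, threshold=0.8):
    """Return the drug whose responder profile best matches the test tumor.

    Returns None if no profile reaches the (assumed) match threshold.
    """
    best_drug, best_score = None, 0.0
    for drug, profile in responder_profiles.items():
        score = match_fraction(test_profile, profile, genes)
        if score >= threshold and score > best_score:
            best_drug, best_score = drug, score
    return best_drug


# Hypothetical data: "up"/"down" is expression relative to normal tissue.
genes = ["g1", "g2", "g3", "g4", "g5"]
test_tumor = {"g1": "up", "g2": "up", "g3": "down", "g4": "up", "g5": "down"}
responders = {
    "drugX": {"g1": "up", "g2": "up", "g3": "down", "g4": "up", "g5": "up"},
    "drugY": {"g1": "down", "g2": "down", "g3": "up", "g4": "up", "g5": "down"},
}
choice = select_therapy(test_tumor, responders, genes)
```

Here the test tumor agrees with the drugX responder profile for four of five genes and with the drugY profile for two of five, so only drugX reaches the assumed threshold. A quantitative comparison (for example, of measured expression levels rather than directions) could be substituted without changing the overall logic.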

Identification of Therapeutic Targets and Anti-Cancer Therapeutic Agents

The present invention also encompasses methods for identification of agents having the ability to modulate activity of a differentially expressed gene product, as well as methods for identifying a differentially expressed gene product as a therapeutic target for treatment of cancer, especially prostate cancer.

Candidate agents

Identification of compounds that modulate activity of a differentially expressed gene product can be accomplished using any of a variety of drug screening techniques. Such agents are candidates for development of cancer therapies. Of particular interest are screening assays for agents that have tolerable toxicity for normal, non-cancerous human cells. The screening assays of the invention are generally based upon the ability of the agent to modulate an activity of a differentially expressed gene product and/or to inhibit or suppress phenomena associated with cancer (e.g., cell proliferation, colony formation, cell cycle arrest, metastasis, and the like). The term "agent" as used herein describes any molecule, e.g., protein or pharmaceutical, with the capability of modulating a biological activity of a gene product of a differentially expressed gene. Generally, a plurality of assay mixtures are run in parallel with different agent concentrations to obtain a differential response to the various concentrations. Typically, one of these concentrations serves as a negative control, i.e., at zero concentration or below the level of detection.
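The parallel assay design described above, in which one mixture at zero concentration serves as the negative control, can be summarized by expressing each reading relative to that control. The sketch below is illustrative only; the function name, the concentration units, and the readings are hypothetical.

```python
def relative_response(readings):
    """Express each assay reading as a fraction of the negative control.

    readings: dict mapping agent concentration to a measured activity,
    which must include the zero-concentration negative control (key 0.0).
    """
    if 0.0 not in readings:
        raise ValueError("a zero-concentration negative control is required")
    control = readings[0.0]
    if control == 0:
        raise ValueError("the control reading must be non-zero")
    return {conc: value / control for conc, value in sorted(readings.items())}


# Hypothetical readings of gene product activity at three concentrations.
assay = {0.0: 200.0, 1.0: 150.0, 10.0: 50.0}
response = relative_response(assay)
```

In this toy data set, activity falls to 75% of control at the lower concentration and to 25% at the higher one, which is the kind of differential response across concentrations that the assay design is meant to reveal.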

Candidate agents encompass numerous chemical classes, though typically they are organic molecules, preferably small organic compounds having a molecular weight of more than 50 and less than about 2,500 daltons. Candidate agents comprise functional groups necessary for structural interaction with proteins, particularly hydrogen bonding, and typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the functional chemical groups. The candidate agents often comprise cyclical carbon or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one or more of the above functional groups. Candidate agents are also found among biomolecules including, but not limited to: peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, structural analogs or combinations thereof.

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural compounds. For example, numerous means are available for random and directed synthesis of a wide variety of organic compounds and biomolecules, including expression of randomized oligonucleotides and oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts (including extracts from human tissue to identify endogenous factors affecting differentially expressed gene products) are available or readily produced. Additionally, natural or synthetically produced libraries and compounds are readily modified through conventional chemical, physical and biochemical means, and may be used to produce combinatorial libraries. Known pharmacological agents may be subjected to directed or random chemical modifications, such as acylation, alkylation, esterification, amidification, etc. to produce structural analogs.

Exemplary candidate agents of particular interest include, but are not limited to, antisense polynucleotides, antibodies, soluble receptors, and the like. Antibodies and soluble receptors are of particular interest as candidate agents where the target differentially expressed gene product is secreted or accessible at the cell surface (e.g., receptors and other molecules stably associated with the outer cell membrane).

Screening of candidate agents

Screening assays can be based upon any of a variety of techniques readily available and known to one of ordinary skill in the art. In general, the screening assays involve contacting a cancerous cell (preferably a cancerous prostate cell) with a candidate agent, and assessing the effect upon biological activity of a differentially expressed gene product. The effect upon a biological activity can be detected by, for example, detection of expression of a gene product of a differentially expressed gene (e.g., a decrease in mRNA or polypeptide levels would in turn cause a decrease in biological activity of the gene product). Alternatively or in addition, the effect of the candidate agent can be assessed by examining the effect of the candidate agent in a functional assay. For example, where the differentially expressed gene product is an enzyme, then the effect upon biological activity can be assessed by detecting a level of enzymatic activity associated with the differentially expressed gene product. The functional assay will be selected according to the differentially expressed gene product. In general, where the differentially expressed gene is increased in expression in a cancerous cell, agents of interest are those that decrease activity of the differentially expressed gene product. Assays described infra can be readily adapted in the screening assay embodiments of the invention. Exemplary assays useful in screening candidate agents include, but are not limited to, hybridization-based assays (e.g., use of nucleic acid probes or primers to assess expression levels), antibody-based assays (e.g., to assess levels of polypeptide gene products), binding assays (e.g., to detect interaction of a candidate agent with a differentially expressed polypeptide, which assays may be competitive assays where a natural or synthetic ligand for the polypeptide is available), and the like. Additional exemplary assays include, but are not necessarily limited to, cell proliferation assays, antisense knockout assays, assays to detect inhibition of cell cycle, assays of induction of cell death/apoptosis, and the like. Generally such assays are conducted in vitro, but many assays can be adapted for in vivo analyses, e.g., in an animal model of the cancer.

Identification of therapeutic targets

In another embodiment, the invention contemplates identification of differentially expressed genes and gene products as therapeutic targets. In some respects, this is the converse of the assays described above for identification of agents having activity in modulating (e.g., decreasing or increasing) activity of a differentially expressed gene product.

In this embodiment, therapeutic targets are identified by examining the effect(s) of an agent that can be demonstrated or has been demonstrated to modulate a cancerous phenotype (e.g., inhibit or suppress or prevent development of a cancerous phenotype). Such an agent is generally referred to herein as an "anti-cancer agent", a term that encompasses chemotherapeutic agents. For example, the agent can be an antisense oligonucleotide that is specific for a selected gene transcript. For example, the antisense oligonucleotide may have a sequence corresponding to a sequence of a differentially expressed gene described herein, e.g., a sequence of one of SEQ ID NOS:1-2164.

Assays for identification of therapeutic targets can be conducted in a variety of ways using methods that are well known to one of ordinary skill in the art. For example, a test cancerous cell that expresses or overexpresses a differentially expressed gene is contacted with an anti-cancer agent, and the effect upon a cancerous phenotype and a biological activity of the candidate gene product is assessed. The biological activity of the candidate gene product can be assayed by examining, for example, modulation of expression of a gene encoding the candidate gene product (e.g., as detected by an increase or decrease in transcript levels or polypeptide levels), or modulation of an enzymatic or other activity of the gene product. The cancerous phenotype can be, for example, cellular proliferation, loss of contact inhibition of growth (e.g., colony formation), tumor growth (in vitro or in vivo), and the like. Alternatively or in addition, the effect of modulation of a biological activity of the candidate target gene upon cell death/apoptosis or cell cycle regulation can be assessed.

Inhibition or suppression of a cancerous phenotype, or an increase in cell death/apoptosis as a result of modulation of biological activity of a candidate gene product, indicates that the candidate gene product is a suitable target for cancer therapy. Assays described infra can be readily adapted for identification of therapeutic targets. Generally such assays are conducted in vitro, but many assays can be adapted for in vivo analyses, e.g., in an appropriate, art-accepted animal model of the cancer.

Use of Polynucleotides to Screen for Peptide Analogs and Antagonists

Polypeptides encoded by the instant polynucleotides and corresponding full-length genes can be used to screen peptide libraries to identify binding partners, such as receptors, from among the encoded polypeptides. Peptide libraries can be synthesized according to methods known in the art (see, e.g., USPN 5,010,175, and WO 91/17823).

Agonists or antagonists of the polypeptides of the invention can be screened using any available method known in the art, such as signal transduction, antibody binding, receptor binding, mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should resemble the conditions under which the native activity is exhibited in vivo, that is, under physiologic pH, temperature, and ionic strength. Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the native activity at concentrations that do not cause toxic side effects in the subject. Agonists or antagonists that compete for binding to the native polypeptide can require concentrations equal to or greater than the native concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added in concentrations on the order of the native concentration.

Such screening and experimentation can lead to identification of a novel polypeptide binding partner, such as a receptor, encoded by a gene or a cDNA corresponding to a polynucleotide of the invention, and at least one peptide agonist or antagonist of the novel binding partner. Such agonists and antagonists can be used to modulate, enhance, or inhibit receptor function in cells to which the receptor is native, or in cells that possess the receptor as a result of genetic engineering. Further, if the novel receptor shares biologically important characteristics with a known receptor, information about agonist/antagonist binding can facilitate development of improved agonists/antagonists of the known receptor.

Vaccines and Uses

The differentially expressed nucleic acids and polypeptides produced by the nucleic acids of the invention can also be used to modulate primary immune response to prevent or treat cancer. Every immune response is a complex and intricately regulated sequence of events involving several cell types. It is triggered when an antigen enters the body and encounters a specialized class of cells called antigen-presenting cells (APCs). These APCs capture a minute amount of the antigen and display it in a form that can be recognized by antigen-specific helper T lymphocytes. The helper (Th) cells become activated and, in turn, promote the activation of other classes of lymphocytes, such as B cells or cytotoxic T cells. The activated lymphocytes then proliferate and carry out their specific effector functions, which in many cases successfully inactivate or eliminate the antigen. Thus, activating the immune response to a particular antigen associated with a cancer cell can protect the patient from developing cancer or result in lymphocytes eliminating cancer cells expressing the antigen.

Gene products, including polypeptides, mRNA (particularly mRNAs having distinct secondary and/or tertiary structures), cDNA, or complete gene, can be prepared and used in vaccines for the treatment or prevention of hyperproliferative disorders and cancers. The nucleic acids and polypeptides can be utilized to enhance the immune response, prevent tumor progression, prevent hyperproliferative cell growth, and the like. Methods for selecting nucleic acids and polypeptides that are capable of enhancing the immune response are known in the art. Preferably, the gene products for use in a vaccine are gene products which are present on the surface of a cell and are recognizable by lymphocytes and antibodies.

The gene products may be formulated with pharmaceutically acceptable carriers into pharmaceutical compositions by methods known in the art. The composition is useful as a vaccine to prevent or treat cancer. The composition may further comprise at least one co-immunostimulatory molecule, including but not limited to one or more major histocompatibility complex (MHC) molecules, such as a class I or class II molecule, preferably a class I molecule. The composition may further comprise other stimulator molecules including B7.1, B7.2, ICAM-1, ICAM-2, LFA-1, LFA-3, CD72 and the like, immunostimulatory polynucleotides (which comprise a 5'-CG-3' wherein the cytosine is unmethylated), and cytokines which include but are not limited to IL-1 through IL-15, TNF-α, IFN-γ, RANTES, G-CSF, M-CSF, IFN-α, CTAP III, ENA-78, GRO, I-309, PF-4, IP-10, LD-78, MGSA, MIP-1α, MIP-1β, or combination thereof, and the like for immunopotentiation. In one embodiment, the immunopotentiators of particular interest are those which facilitate a Th1 immune response.

The gene products may also be prepared with a carrier that will protect the gene products against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems. Biodegradable polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, polylactic acid, and the like. Methods for preparation of such formulations are known in the art.

In the methods of preventing or treating cancer, the gene products may be administered via one of several routes including but not limited to transdermal, transmucosal, intravenous, intramuscular, subcutaneous, intradermal, intraperitoneal, intrathecal, intrapleural, intrauterine, rectal, vaginal, topical, intratumor, and the like. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, bile salts and fusidic acid derivatives. In addition, detergents may be used to facilitate permeation. Transmucosal administration may be by nasal sprays or suppositories. For oral administration, the gene products are formulated into conventional oral administration forms such as capsules, tablets and tonics.

The gene product is administered to a patient in an amount effective to prevent or treat cancer. In general, it is desirable to provide the patient with a dosage of gene product of at least about 1 pg per Kg body weight, preferably at least about 1 ng per Kg body weight, more preferably at least about 1 μg or greater per Kg body weight of the recipient. A range of from about 1 ng per Kg body weight to about 100 mg per Kg body weight is preferred although a lower or higher dose may be administered. The dose is effective to prime, stimulate and/or cause the clonal expansion of antigen-specific T lymphocytes, preferably cytotoxic T lymphocytes, which in turn are capable of preventing or treating cancer in the recipient. The dose is administered at least once and may be provided as a bolus or a continuous administration. Multiple administrations of the dose over a period of several weeks to months may be preferable. Subsequent doses may be administered as indicated.

In another method of treatment, autologous cytotoxic lymphocytes or tumor infiltrating lymphocytes may be obtained from a patient with cancer. The lymphocytes are grown in culture, and antigen-specific lymphocytes are expanded by culturing in the presence of the specific gene products alone or in combination with at least one co-immunostimulatory molecule with cytokines. The antigen-specific lymphocytes are then infused back into the patient in an amount effective to reduce or eliminate the tumors in the patient. Cancer vaccines and their uses are further described in USPN 5,961,978; USPN 5,993,829; USPN 6,132,980; and WO 00/38706.

Pharmaceutical Compositions and Uses

Pharmaceutical compositions can comprise polypeptides, receptors that specifically bind a polypeptide produced by a differentially expressed gene (e.g., antibodies), or polynucleotides (including antisense nucleotides and ribozymes) of the claimed invention in a therapeutically effective amount. The compositions can be used to treat primary tumors as well as metastases of primary tumors. In addition, the pharmaceutical compositions can be used in conjunction with conventional methods of cancer treatment, e.g., to sensitize tumors to radiation or conventional chemotherapy. Where the pharmaceutical composition comprises a receptor (such as an antibody) that specifically binds to a gene product encoded by a differentially expressed gene, the receptor can be coupled to a drug for delivery to a treatment site or coupled to a detectable label to facilitate imaging of a site comprising colon cancer cells. Methods for coupling antibodies to drugs and detectable labels are well known in the art, as are methods for imaging using detectable labels.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased body temperature.

The precise effective amount for a subject will depend upon the subject's size and health, the nature and extent of the condition, and the therapeutics or combination of therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount in advance. However, the effective amount for a given situation is determined by routine experimentation and is within the judgment of the clinician. For purposes of the present invention, an effective dose will generally be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
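The weight-based ranges above reduce to simple arithmetic. The following minimal Python sketch is purely illustrative: the function name `dose_range_mg` and the 70 kg example weight are assumptions introduced here, not part of the specification, and the defaults merely restate the "about 0.01 mg/kg to 50 mg/kg" range of DNA construct quoted in the text.

```python
def dose_range_mg(body_weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Convert a per-kilogram dose range into total milligrams for one subject.

    The default bounds restate the 0.01-50 mg/kg range quoted in the text;
    the function itself is a hypothetical helper for illustration only.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


# Hypothetical 70 kg subject, first using the broad range and then the
# narrower 0.05-10 mg/kg range also mentioned in the text.
broad = dose_range_mg(70.0)
narrow = dose_range_mg(70.0, low_mg_per_kg=0.05, high_mg_per_kg=10.0)
print(broad, narrow)
```

As the text stresses, any actual amount is established by routine experimentation and clinical judgment; the sketch only shows how the stated per-kilogram bounds scale with body weight.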

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term "pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical carrier that does not itself induce the production of antibodies harmful to the individual receiving the composition, and which can be administered without undue toxicity. Suitable carriers can be large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in the art. Pharmaceutically acceptable carriers in therapeutic compositions can include liquids such as water, saline, glycerol and ethanol. Auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, can also be present in such vehicles.

Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection can also be prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier. Pharmaceutically acceptable salts can also be present in the pharmaceutical composition, e.g., mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991).

Delivery Methods

Once formulated, the compositions of the invention can be (1) administered directly to the subject (e.g., as polynucleotides or polypeptides); or (2) delivered ex vivo, to cells derived from the subject (e.g., as in ex vivo gene therapy). Direct delivery of the compositions will generally be accomplished by parenteral injection, e.g., subcutaneously, intraperitoneally, intravenously or intramuscularly, intratumorally or to the interstitial space of a tissue. Other modes of administration include oral and pulmonary administration, suppositories, and transdermal applications, needles, and gene guns or hyposprays. Dosage treatment can be a single dose schedule or a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art and described in, e.g., WO 93/14778. Examples of cells useful in ex vivo applications include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic cells, or tumor cells. Generally delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by, for example, dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.

Once differential expression of a gene corresponding to a polynucleotide of the invention has been found to correlate with a proliferative disorder, such as neoplasia, dysplasia, and hyperplasia, the disorder can be amenable to treatment by administration of a therapeutic agent based on the provided polynucleotide, corresponding polypeptide or other corresponding molecule (e.g., antisense, ribozyme, etc.). In other embodiments, the disorder can be amenable to treatment by administration of a small molecule drug that, for example, serves as an inhibitor (antagonist) of the function of the encoded gene product of a gene having increased expression in cancerous cells relative to normal cells or as an agonist for gene products that are decreased in expression in cancerous cells (e.g., to promote the activity of gene products that act as tumor suppressors).

The dose and the means of administration of the inventive pharmaceutical compositions are determined based on the specific qualities of the therapeutic composition, the condition, age, and weight of the patient, the progression of the disease, and other relevant factors. For example, administration of polynucleotide therapeutic composition agents of the invention includes local or systemic administration, including injection, oral administration, particle gun or catheterized administration, and topical administration. Preferably, the therapeutic polynucleotide composition contains an expression construct comprising a promoter operably linked to a polynucleotide of at least 12, 22, 25, 30, or 35 contiguous nt of the polynucleotide of the invention. Various methods can be used to administer the therapeutic composition directly to a specific site in the body. For example, a small metastatic lesion is located and the therapeutic composition injected several times in several different locations within the body of the tumor. Alternatively, arteries that serve a tumor are identified, and the therapeutic composition injected into such an artery, in order to deliver the composition directly into the tumor. A tumor that has a necrotic center is aspirated and the composition injected directly into the now empty center of the tumor. The antisense composition is directly administered to the surface of the tumor, for example, by topical application of the composition. X-ray imaging is used to assist in certain of the above delivery methods.

Targeted delivery of therapeutic compositions containing an antisense polynucleotide, subgenomic polynucleotides, or antibodies to specific tissues can also be used. Receptor-mediated DNA delivery techniques are described in, for example, Findeis et al., Trends Biotechnol. (1993) 11:202; Chiou et al., Gene Therapeutics: Methods And Applications Of Direct Gene Transfer (J.A. Wolff, ed.) (1994); Wu et al., J. Biol. Chem. (1988) 263:621; Wu et al., J. Biol. Chem. (1994) 269:542; Zenke et al., Proc. Natl. Acad. Sci. (USA) (1990) 87:3655; Wu et al., J. Biol. Chem. (1991) 266:338.

Therapeutic compositions containing a polynucleotide are administered in a range of about 100 ng to about 200 mg of DNA for local administration in a gene therapy protocol. Concentration ranges of about 500 ng to about 50 mg, about 1 microgram to about 2 mg, about 5 micrograms to about 500 micrograms, and about 20 micrograms to about 100 micrograms of DNA can also be used during a gene therapy protocol. Factors such as method of action (e.g., for enhancing or inhibiting levels of the encoded gene product) and efficacy of transformation and expression are considerations which will affect the dosage required for ultimate efficacy of the antisense subgenomic polynucleotides. Where greater expression is desired over a larger area of tissue, larger amounts of antisense subgenomic polynucleotides or the same amounts readministered in a successive protocol of administrations, or several administrations to different adjacent or close tissue portions of, for example, a tumor site, may be required to effect a positive therapeutic outcome. In all cases, routine experimentation in clinical trials will determine specific ranges for optimal therapeutic effect. For polynucleotide related genes encoding polypeptides or proteins with anti-inflammatory activity, suitable use, doses, and administration are described in USPN 5,654,173.

The therapeutic polynucleotides and polypeptides of the present invention can be delivered using gene delivery vehicles. The gene delivery vehicle can be of viral or non-viral origin (see generally, Jolly, Cancer Gene Therapy (1994) 1:51; Kimura, Human Gene Therapy (1994) 5:845; Connelly, Human Gene Therapy (1995) 1:185; and Kaplitt, Nature Genetics (1994) 6:148).

Expression of such coding sequences can be induced using endogenous mammalian or heterologous promoters. Expression of the coding sequence can be either constitutive or regulated.

Viral-based vectors for delivery of a desired polynucleotide and expression in a desired cell are well known in the art. Exemplary viral-based vehicles include, but are not limited to, recombinant retroviruses (see, e.g., WO 90/07936; WO 94/03622; WO 93/25698; WO 93/25234; USPN 5,219,740; WO 93/11230; WO 93/10218; USPN 4,777,127; GB Patent No. 2,200,651; EP 0 345 242; and WO 91/02805), alphavirus-based vectors (e.g., Sindbis virus vectors, Semliki forest virus (ATCC VR-67; ATCC VR-1247), Ross River virus (ATCC VR-373; ATCC VR-1246) and Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532)), and adeno-associated virus (AAV) vectors (see, e.g., WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655). Administration of DNA linked to killed adenovirus, as described in Curiel, Hum. Gene Ther. (1992) 3:147, can also be employed.

Non-viral delivery vehicles and methods can also be employed, including, but not limited to, polycationic condensed DNA linked or unlinked to killed adenovirus alone (see, e.g., Curiel, Hum. Gene Ther. (1992) 3:147); ligand-linked DNA (see, e.g., Wu, J. Biol. Chem. (1989) 264:16985); eukaryotic cell delivery vehicles (see, e.g., USPN 5,814,482; WO 95/07994; WO 96/17072; WO 95/30763; and WO 97/42338); and nucleic charge neutralization or fusion with cell membranes. Naked DNA can also be employed. Exemplary naked DNA introduction methods are described in WO 90/11092 and USPN 5,580,859. Liposomes that can act as gene delivery vehicles are described in USPN 5,422,120; WO 95/13796; WO 94/23697; WO 91/14445; and EP 0524968. Additional approaches are described in Philip, Mol. Cell Biol. (1994) 14:2411, and in Woffendin, Proc. Natl. Acad. Sci. (1994) 91:1581.

Further non-viral delivery suitable for use includes mechanical delivery systems such as the approach described in Woffendin et al., Proc. Natl. Acad. Sci. USA (1994) 91(24):11581. Moreover, the coding sequence and the product of expression of such can be delivered through deposition of photopolymerized hydrogel materials or use of ionizing radiation (see, e.g., USPN 5,206,152 and WO 92/11033). Other conventional methods for gene delivery that can be used for delivery of the coding sequence include, for example, use of hand-held gene transfer particle gun (see, e.g., USPN 5,149,655); use of ionizing radiation for activating transferred gene (see, e.g., USPN 5,206,152 and WO 92/11033).

The present invention will now be illustrated by reference to the following examples which set forth particularly advantageous embodiments. However, it should be noted that these embodiments are illustrative and are not to be construed as restricting the invention in any way.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. It will be readily apparent to those skilled in the art that the formulations, dosages, methods of administration, and other parameters of this invention may be further modified or substituted in various ways without departing from the spirit and scope of the invention. Efforts have been made to ensure accuracy with respect to numbers used (e.g. amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.

Example 1: Source of Biological Materials and Overview of Novel Polynucleotides Expressed by the Biological Materials

Candidate polynucleotides that may represent novel polynucleotides were obtained from cDNA libraries generated from selected cell lines and patient tissues. In order to obtain the candidate polynucleotides, mRNA was isolated from several selected cell lines and patient tissues, and used to construct cDNA libraries. The cells and tissues that served as sources for these cDNA libraries are summarized in Table 1 below.

Human colon cancer cell line KM12L4-A (Morikawa, et al., Cancer Research (1988) 48:6863) is derived from the KM12C cell line. The KM12C cell line (Morikawa et al. Cancer Res. (1988) 48:1943-1948), which is poorly metastatic (low metastatic), was established in culture from a Dukes' stage B2 surgical specimen (Morikawa et al. Cancer Res. (1988) 48:6863). The KM12L4-A is a highly metastatic subline derived from KM12C (Yeatman et al. Nucl. Acids. Res. (1995) 23:4007; Bao-Ling et al. Proc. Annu. Meet. Am. Assoc. Cancer. Res. (1995) 21:3269). The KM12C and KM12C-derived cell lines (e.g., KM12L4, KM12L4-A, etc.) are well-recognized in the art as a model cell line for the study of colon cancer (see, e.g., Morikawa et al., supra; Radinsky et al. Clin. Cancer Res. (1995) 1:19; Yeatman et al., (1995) supra; Yeatman et al. Clin. Exp. Metastasis (1996) 14:246).

The MDA-MB-231 cell line (Brinkley et al. Cancer Res. (1980) 40:3118-3129) was originally isolated from pleural effusions (Cailleau, J. Natl. Cancer. Inst. (1974) 53:661), is of high metastatic potential, and forms poorly differentiated adenocarcinoma grade II in nude mice consistent with breast carcinoma. The MCF7 cell line was derived from a pleural effusion of a breast adenocarcinoma and is non-metastatic. The MV-522 cell line is derived from a human lung carcinoma and is of high metastatic potential. The UCP-3 cell line is a low metastatic human lung carcinoma cell line; the MV-522 is a high metastatic variant of UCP-3. These cell lines are well-recognized in the art as models for the study of human breast and lung cancer (see, e.g., Chandrasekaran et al., Cancer Res. (1979) 39:870 (MDA-MB-231 and MCF-7); Gastpar et al., J Med Chem (1998) 41:4965 (MDA-MB-231 and MCF-7); Ranson et al., Br J Cancer (1998) 77:1586 (MDA-MB-231 and MCF-7); Kuang et al., Nucleic Acids Res (1998) 26:1116 (MDA-MB-231 and MCF-7); Varki et al., Int J Cancer (1987) 40:46 (UCP-3); Varki et al., Tumour Biol. (1990) 11:327 (MV-522 and UCP-3); Varki et al., Anticancer Res. (1990) 10:637 (MV-522); Kelner et al., Anticancer Res (1995) 15:867 (MV-522); and Zhang et al., Anticancer Drugs (1997) 8:696 (MV-522)).

The samples of libraries 15-20 are derived from two different patients (UC#2 and UC#3). The bFGF-treated HMVEC were prepared by incubation with bFGF at 10 ng/ml for 2 hrs; the VEGF-treated HMVEC were prepared by incubation with 20 ng/ml VEGF for 2 hrs. Following incubation with the respective growth factor, the cells were washed and lysis buffer added for RNA preparation. GRRpz was derived from normal prostate epithelium. The WOca cell line is a Gleason Grade 4 cell line.

The source materials for generating the normalized prostate libraries of libraries 25 and 26 were cryopreserved prostate tumor tissue from a patient with Gleason grade 3+3 adenocarcinoma and matched normal prostate biopsies from a pool of at-risk subjects under medical surveillance. The source materials for generating the normalized prostate libraries of libraries 30 and 31 were cryopreserved prostate tumor tissue from a patient with Gleason grade 4+4 adenocarcinoma and matched normal prostate biopsies from a pool of at-risk subjects under medical surveillance.

The source materials for generating the normalized breast libraries of libraries 27, 28 and 29 were cryopreserved breast tissue from a primary breast tumor (infiltrating ductal carcinoma) (library 28), from a lymph node metastasis (library 29), or matched normal breast biopsies from a pool of at-risk subjects under medical surveillance. In each case, prostate or breast epithelia were harvested directly from frozen sections of tissue by laser capture microdissection (LCM, Arcturus Engineering Inc., Mountain View, CA), carried out according to methods well known in the art (see, Simone et al. Am J Pathol. 156(2):445-52 (2000)), to provide substantially homogenous cell samples.

Table 1. Description of cDNA Libraries

Characterization of sequences in the libraries

After using the software program Phred (ver 0.000925.C, Green and Ewing, ©1993-2000) to select those polynucleotides having the best quality sequence, the polynucleotides were compared against the public databases to identify any homologous sequences. The sequences of the isolated polynucleotides were first masked to eliminate low complexity sequences using the RepeatMasker masking program, publicly available through a web site supported by the University of Washington (See also Smit, A.F.A. and Green, P., unpublished results). Generally, masking does not influence the final search results, except to eliminate sequences of relatively little interest due to their low complexity, and to eliminate multiple "hits" based on similarity to repetitive regions common to multiple sequences, e.g., Alu repeats.
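The quality-selection step can be illustrated with the standard Phred score definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The Python sketch below is an assumption-laden illustration, not the selection rule actually used: the specification does not state its criterion, so the mean-quality cut-off of 20 and the helper names (`error_probability`, `select_high_quality`) are hypothetical.

```python
def error_probability(q):
    """Base-call error probability implied by a Phred quality score Q."""
    return 10 ** (-q / 10)


def mean_quality(quals):
    """Arithmetic mean of per-base Phred scores for one read."""
    return sum(quals) / len(quals)


def select_high_quality(reads, min_mean_q=20.0):
    """Keep read names whose mean Phred score meets a threshold.

    `reads` maps a read name to its list of per-base scores. The threshold
    of 20 is a hypothetical value chosen for illustration only.
    """
    return [
        name
        for name, quals in reads.items()
        if quals and mean_quality(quals) >= min_mean_q
    ]


# Toy data: one good read, one poor read, one empty read.
reads = {"r1": [30, 30, 40], "r2": [10, 12, 8], "r3": []}
print(select_high_quality(reads))
```

A mean over per-base scores is only one possible summary; trimming low-quality ends or counting bases above a cut-off would be equally plausible readings of "best quality sequence".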

The remaining sequences were then used in a homology search of the GenBank database using the TeraBLAST program (TimeLogic, Crystal Bay, Nevada). TeraBLAST is a version of the publicly available BLAST search algorithm developed by the National Center for Biotechnology Information, modified to operate at an accelerated speed with increased sensitivity on a specialized computer hardware platform. The program was run with the default parameters recommended by TimeLogic to provide the best sensitivity and speed for searching DNA and protein sequences. Sequences that exhibited greater than 70% overlap, 99% identity, and a p value of less than 1 x 10e-40 were discarded. Sequences from this search also were discarded if the inclusive parameters were met, but the sequence was ribosomal or vector-derived.
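The GenBank discard rule amounts to a three-way threshold test on each hit. A minimal Python sketch follows, under stated assumptions: the `Hit` record and function names are hypothetical, "1 x 10e-40" is read as 1e-40, and "99% identity" is read as "at least 99%". It does not call TeraBLAST or any real search tool, and it leaves out the ribosomal/vector rule because the text does not define it precisely.

```python
from dataclasses import dataclass

# Assumption: the text's "1 x 10e-40" is interpreted as 1e-40.
P_CUTOFF = 1e-40


@dataclass(frozen=True)
class Hit:
    """One database hit for a query sequence (hypothetical record layout)."""
    query_id: str
    overlap_pct: float
    identity_pct: float
    p_value: float


def is_discarded(hit):
    """True when a hit meets all three cut-offs, i.e. the query is already known."""
    return (
        hit.overlap_pct > 70.0
        and hit.identity_pct >= 99.0  # assumption: "99% identity" means >= 99
        and hit.p_value < P_CUTOFF
    )


def retain_candidates(query_ids, hits):
    """Return the queries that have no hit meeting the discard criteria."""
    discarded = {h.query_id for h in hits if is_discarded(h)}
    return [q for q in query_ids if q not in discarded]


hits = [
    Hit("a", 95.0, 99.5, 1e-80),  # meets all three cut-offs
    Hit("b", 50.0, 99.9, 1e-80),  # overlap too low
    Hit("c", 90.0, 85.0, 1e-60),  # identity too low
]
print(retain_candidates(["a", "b", "c", "d"], hits))
```

Requiring all three conditions together means that a short but perfect match, or a long but divergent one, does not by itself remove a candidate.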

The resulting sequences from the previous search were classified into three groups (1, 2 and 3 below) and searched in a TeraBLASTX vs. NRP (non-redundant proteins) database search: (1) unknown (no hits in the GenBank search), (2) weak similarity (greater than 45% identity and p value of less than 1 x 10e-5), and (3) high similarity (greater than 60% overlap, greater than 80% identity, and p value less than 1 x 10e-5). Sequences having greater than 70% overlap, greater than 99% identity, and p value of less than 1 x 10e-40 were discarded.
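The three-group classification can be written as an ordered set of threshold checks. The Python sketch below is one possible reading, with several assumptions: the best hit is passed as a hypothetical `(overlap_pct, identity_pct, p_value)` tuple, "1 x 10e-5" is read as 1e-5, the high-similarity test is applied first because any such hit also satisfies the weak-similarity thresholds, and the "unclassified" label for hits that meet neither set of thresholds is introduced here since the text does not say how they were handled.

```python
P_CUTOFF = 1e-5  # assumption: the text's "1 x 10e-5" is read as 1e-5


def classify(best_hit):
    """Assign a sequence to a similarity group from its best hit.

    `best_hit` is None (no hit) or a tuple (overlap_pct, identity_pct, p_value).
    """
    if best_hit is None:
        return "unknown"
    overlap_pct, identity_pct, p_value = best_hit
    # Check the stricter group first; its members also pass the weaker test.
    if overlap_pct > 60.0 and identity_pct > 80.0 and p_value < P_CUTOFF:
        return "high similarity"
    if identity_pct > 45.0 and p_value < P_CUTOFF:
        return "weak similarity"
    # Not covered by the text; this label is a placeholder assumption.
    return "unclassified"


for hit in (None, (75.0, 90.0, 1e-20), (30.0, 50.0, 1e-8), (30.0, 40.0, 1e-8)):
    print(hit, "->", classify(hit))
```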

The remaining sequences were classified as unknown (no hits), weak similarity, and high similarity (parameters as above). Two searches were performed on these sequences. First, a TeraBLAST vs. EST database search was performed and sequences with greater than 99% overlap, greater than 99% similarity and a p value of less than 1 x 10e-40 were discarded. Sequences with a p value of less than 1 x 10e-65 when compared to a database sequence of human origin were also excluded. Second, a TeraBLASTN vs. Patent GeneSeq database was performed and sequences having greater than 99% identity, p value less than 1 x 10e-40, and greater than 99% overlap were discarded.
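Because the EST and GeneSeq searches apply the same kind of test with database-specific thresholds, the two rules can be held in a small table. The Python sketch below is illustrative only: the `RULES` layout and the `exclude` function are hypothetical, the p values are read as 1e-40 and 1e-65, the EST "similarity" and GeneSeq "identity" percentages are treated as a single match percentage, and the human-origin cut-off is assumed to apply to the EST search because that is where the text mentions it.

```python
# Per-database thresholds restated from the text (layout is an assumption).
RULES = {
    "EST": {"min_overlap": 99.0, "min_match": 99.0, "max_p": 1e-40},
    "GeneSeq": {"min_overlap": 99.0, "min_match": 99.0, "max_p": 1e-40},
}

# Assumption: this human-origin cut-off belongs to the EST search.
HUMAN_P_CUTOFF = 1e-65


def exclude(database, overlap_pct, match_pct, p_value, human=False):
    """Return True when a hit causes its query sequence to be discarded."""
    rule = RULES[database]
    meets_rule = (
        overlap_pct > rule["min_overlap"]
        and match_pct > rule["min_match"]
        and p_value < rule["max_p"]
    )
    human_match = database == "EST" and human and p_value < HUMAN_P_CUTOFF
    return meets_rule or human_match


print(exclude("EST", 99.5, 99.8, 1e-50))
print(exclude("EST", 80.0, 95.0, 1e-70, human=True))
print(exclude("GeneSeq", 80.0, 95.0, 1e-70, human=True))
```

Keeping the thresholds in data rather than in code makes it straightforward to add a further database-specific rule without changing the test itself.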

The remaining sequences were subjected to screening using other rules and redundancies in the dataset. Sequences with a p value of less than 1 x 10e-111 in relation to a database sequence of human origin were specifically excluded. The final result provided the sequences listed as SEQ ID NOS: 1-1219 in the accompanying Sequence Listing and summarized in Table 2 (inserted prior to claims). Each identified polynucleotide represents sequence from at least a partial mRNA transcript.

Summary of polynucleotides of the invention

Table 2 (inserted prior to claims) provides a summary of polynucleotides isolated as described. Specifically, Table 2 provides: 1) the SEQ ID NO ("SEQ ID") assigned to each sequence for use in the present specification; 2) the Cluster Identification No. ("CLUSTER"); 3) the sequence name ("SEQ NAME") assigned to each sequence and used as an internal identifier of the sequence; 4) the name assigned to the clone from which the sequence was isolated ("CLONE ID"); and 5) the name of the library from which the sequence was isolated ("LIBRARY"). Because at least some of the provided polynucleotides represent partial mRNA transcripts, two or more polynucleotides may represent different regions of the same mRNA transcript and the same gene and/or may be contained within the same clone. Thus, for example, if two or more SEQ ID NOS are identified as belonging to the same clone, then any of those sequences can be used to obtain the full-length mRNA or gene. Clones which comprise the sequences described herein were deposited as set out in the tables indicated below (see Example entitled "Deposit Information").

Example 2: Contig Assembly

The sequences of the polynucleotides provided in the present invention can be used to extend the sequence information of the gene to which the polynucleotides correspond (e.g., a gene, or mRNA encoded by the gene, having a sequence of the polynucleotide described herein). This expanded sequence information can in turn be used to further characterize the corresponding gene, which in turn provides additional information about the nature of the gene product (e.g., the normal function of the gene product). The additional information can serve to provide additional evidence of the gene product's use as a therapeutic target, and provide further guidance as to the types of agents that can modulate its activity.

For example, a contig was assembled using the sequence of a polynucleotide described herein. A "contig" is a contiguous sequence of nucleotides that is assembled from nucleic acid sequences having overlapping (e.g., shared or substantially similar) sequence information. The sequences of publicly-available ESTs (Expressed Sequence Tags) and the sequences of various of the above-described polynucleotides were used in the contig assembly. The contig was assembled using the software program Sequencher, version 4.05, according to the manufacturer's instructions. The sequence information obtained in the contig assembly was then used to obtain a consensus sequence derived from the contig using the Sequencher program. The resulting consensus sequence was used to search both the public databases as well as databases internal to the applicants to match the consensus polynucleotide with homology data and/or differential gene expression data.
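The idea of joining overlapping sequences into a contig can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: it requires an exact suffix-to-prefix match and joins just two sequences, and it is not the algorithm used by Sequencher. The function name `merge_overlap` and both example sequences are made up for illustration.

```python
from typing import Optional


def merge_overlap(left: str, right: str, min_overlap: int = 4) -> Optional[str]:
    """Join two sequences if a suffix of `left` exactly equals a prefix of `right`.

    Returns the merged sequence, or None when no overlap of at least
    `min_overlap` bases exists. Exact matching is a simplification.
    """
    longest = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


est = "ATGGCGTACGTTAGC"    # made-up EST fragment
clone = "CGTTAGCTTGACCA"   # made-up clone sequence
contig = merge_overlap(est, clone)
print(contig)  # ATGGCGTACGTTAGCTTGACCA
```

Here the two fragments share the seven bases CGTTAGC, so the merged contig extends the first sequence by the non-overlapping tail of the second. Assembly programs additionally tolerate mismatches and derive a consensus base at each position.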

The final result provided the sequences listed as SEQ ID NOS: 1220-1428 in the accompanying Sequence Listing and summarized in Tables 3 and 4 (inserted prior to claims). Table 3 provides a summary of the consensus sequences assembled as described. Specifically, Table 3 provides: 1) the SEQ ID NO ("SEQ ID") assigned to each consensus sequence for use in the present specification; 2) the Cluster Identification No. ("CLUSTER"); and 3) the consensus sequence name ("CONSENSUS SEQ NAME") used as an internal identifier of the sequence.

A correlation between the polynucleotide used in consensus sequence assembly as described above and the corresponding consensus sequence is contained in Table 4. Specifically, Table 4 provides: 1) the SEQ ID NO of the consensus sequence ("CONSENSUS SEQ ID"); 2) the consensus sequence name ("CONSENSUS SEQ NAME") used as an internal identifier of the sequence; 3) the SEQ ID NO of the polynucleotide ("POLYNTD SEQ ID") of SEQ ID NOS: 1-1219 used in assembly of the consensus sequence; and 4) the sequence name ("POLYNTD SEQ NAME") of the polynucleotide of SEQ ID NOS: 1-1219 used in assembly of the consensus sequence.

Example 3: Additional Gene Characterization

Sequences of the polynucleotides of SEQ ID NOS: 1-1219 were used as a query sequence in a TeraBLASTN search of the DoubleTwist Human Genome Sequence Database (DoubleTwist, Inc., Oakland, CA), which contains all the human genomic sequences that have been assembled into a contiguous model of the human genome. Predicted cDNA and protein sequences were obtained where a polynucleotide of the invention was homologous to a predicted full-length gene sequence. Alternatively, a sequence of a contig or consensus sequence described herein could be used directly as a query sequence in a TeraBLASTN search of the DoubleTwist Human Genome Sequence Database. The final results of the search provided the predicted cDNA sequences listed as SEQ ID NOS: 1429-1485 in the accompanying Sequence Listing and summarized in Table 5 (inserted prior to claims), and the predicted protein sequences listed as SEQ ID NOS: 1486-1542 in the accompanying Sequence Listing and summarized in Table 6 (inserted prior to claims). Specifically, Table 5 provides: 1) the SEQ ID NO ("SEQ ID") assigned to each cDNA sequence for use in the present specification; 2) the cDNA sequence name ("cDNA SEQ NAME") used as an internal identifier of the sequence; 3) the chromosome ("CHROM") containing the gene corresponding to the cDNA sequence; and 4) the exon ("EXON") of the gene corresponding to the cDNA sequence to which the polynucleotide of SEQ ID NOS: 1-1219 maps. Table 6 provides: 1) the SEQ ID NO ("SEQ ID") assigned to each protein sequence for use in the present specification; 2) the protein sequence name ("PROTEIN SEQ NAME") used as an internal identifier of the sequence; 3) the chromosome ("CHROM") containing the gene corresponding to the cDNA sequence; and 4) the exon ("EXON") of the gene corresponding to the cDNA and protein sequence to which the polynucleotide of SEQ ID NOS: 1-1219 maps.

A correlation between the polynucleotide used as a query sequence as described above and the corresponding predicted cDNA and protein sequences is contained in Table 7. Specifically, Table 7 provides: 1) the SEQ ID NO of the cDNA ("cDNA SEQ ID"); 2) the cDNA sequence name ("cDNA SEQ NAME") used as an internal identifier of the sequence; 3) the SEQ ID NO of the protein ("PROTEIN SEQ ID") encoded by the cDNA sequence; 4) the sequence name of the protein ("PROTEIN SEQ NAME") encoded by the cDNA sequence; 5) the SEQ ID NO of the polynucleotide ("POLYNTD SEQ ID") of SEQ ID NOS: 1-1219 that maps to the cDNA and protein; and 6) the sequence name ("POLYNTD SEQ NAME") of the polynucleotide of SEQ ID NOS: 1-1219 that maps to the cDNA and protein.

Through contig and consensus sequence assembly and the use of homology searching software programs, the sequence information provided herein can be readily extended to identify a gene, or confirm a predicted gene, having the sequence of the polynucleotides described in the present invention. Further, the information obtained can be used to identify the function of the gene product of the gene corresponding to the polynucleotides described herein. While not necessary to the practice of the invention, identification of the function of the corresponding gene can provide guidance in the design of therapeutics that target the gene to modulate its activity and modulate the cancerous phenotype (e.g., inhibit metastasis, proliferation, and the like).

Example 4: Results of Public Database Search to Identify Function of Gene Products

SEQ ID NOS: 1-1485 were translated in all three reading frames, and the nucleotide sequences and translated amino acid sequences were used as query sequences to search for homologous sequences in the GenBank (nucleotide sequences) database. Query and individual sequences were aligned using the TeraBLAST program available from TimeLogic, Crystal Bay, Nevada. The sequences were masked to various extents to prevent searching of repetitive sequences or poly-A sequences, using the RepeatMasker masking program for masking low complexity as described above. Table 8 (inserted prior to claims) provides the alignment summaries having a p value of 1 x 10e-2 or less, indicating substantial homology between the sequences of the present invention and those of the indicated public databases. Specifically, Table 8 provides: 1) the SEQ ID NO ("SEQ ID") of the query sequence; 2) the sequence name ("SEQ NAME") used as an internal identifier of the query sequence; 3) the accession number ("ACCESSION") of the GenBank database entry of the homologous sequence; 4) a description of the GenBank sequence ("GENBANK DESCRIPTION"); and 5) the score of the similarity of the polynucleotide sequence and the GenBank sequence ("GENBANK SCORE"). The alignments provided in Table 8 are the best available alignments to a DNA sequence at a time just prior to filing of the present specification. Incorporated by reference is all publicly available information regarding the sequences listed in Table 8 and their related sequences. The search program and database used for the alignment, as well as the calculation of the p value, are also indicated. Full length sequences or fragments of the polynucleotide sequences can be used as probes and primers to identify and isolate the full length sequence of the corresponding polynucleotide.

Example 5: Members of Protein Families

SEQ ID NOS:1-1219 were used to conduct a profile search as described in the specification above. Several of the polynucleotides of the invention were found to encode polypeptides having characteristics of a polypeptide belonging to a known protein family (and thus represent members of these protein families) and/or comprising a known functional domain. Table 9 (inserted prior to claims) provides: 1) the SEQ ID NO ("SEQ ID") of the query polynucleotide sequence; 2) the sequence name ("SEQ NAME") used as an internal identifier of the query sequence; 3) the name ("PFAM NAME") of the profile hit; 4) a brief description of the profile hit ("PFAM DESCRIPTION"); 5) the score ("SCORE") of the profile hit; 6) the starting nucleotide of the profile hit ("START"); and 7) the ending nucleotide of the profile hit ("END"). In addition, SEQ ID NOS:1486-1542 were also used to conduct a profile search as described above. Several of the polypeptides of the invention were found to have characteristics of a polypeptide belonging to a known protein family (and thus represent members of these protein families) and/or comprising a known functional domain. Table 10 (inserted prior to claims) provides: 1) the SEQ ID NO ("SEQ ID") of the query protein sequence; 2) the sequence name ("PROTEIN SEQ NAME") used as an internal identifier of the query sequence; 3) the name ("PFAM NAME") of the profile hit; 4) a brief description of the profile hit ("PFAM DESCRIPTION"); 5) the score ("SCORE") of the profile hit; 6) the starting residue of the profile hit ("START"); and 7) the ending residue of the profile hit ("END").

Some SEQ ID NOS exhibited multiple profile hits where the query sequence contains overlapping profile regions, and/or where the sequence contains two different functional domains. Each of the profile hits of Tables 9 and 10 is described in more detail below. The acronyms for the profiles (provided in parentheses) are those used to identify the profile in the Pfam, Prosite, and InterPro databases. The Pfam database can be accessed through web sites supported by the Genome Sequencing Center at the Washington University School of Medicine or by the European Molecular Biology Laboratories in Heidelberg, Germany. The Prosite database can be accessed at the ExPASy Molecular Biology Server on the internet. The InterPro database can be accessed at a web site supported by the EMBL European Bioinformatics Institute. The public information available on the Pfam, Prosite, and InterPro databases regarding the various profiles, including but not limited to the activities, function, and consensus sequences of various protein families and protein domains, is incorporated herein by reference.

Epidermal Growth Factor (EGF: Pfam Accession No. PF00008). SEQ ID NOS:417 and 418 represent polynucleotides encoding a member of the EGF family of proteins. The distinguishing characteristic of this family is the presence of a sequence of about thirty to forty amino acid residues found in epidermal growth factor (EGF) which has been shown to be present, in a more or less conserved form, in a large number of other proteins (Davis, New Biol. (1990) 2:410-419; Blomquist et al., Proc. Natl. Acad. Sci. U.S.A. (1984) 81:7363-7367; Barker et al., Protein Nucl. Acid Enz. (1986) 29:54-86; Doolittle et al., Nature (1984) 307:558-560; Appella et al., FEBS Lett. (1988) 237:1-4; Campbell and Bork, Curr. Opin. Struct. Biol. (1993) 3:385-392). A common feature of the domain is that the conserved pattern is generally found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted. The EGF domain includes six cysteine residues which have been shown to be involved in disulfide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary strongly in length. These consensus patterns are used to identify members of this family: C-x-C-x(5)-G-x(2)-C and C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C.
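Consensus patterns written in this notation can be checked against a protein sequence by translating them into regular expressions. In the notation, `x` stands for any residue, square brackets list the residues allowed at a position, curly braces list the residues excluded, and a number or range in parentheses gives the repeat count. The following Python sketch assumes only those elements; the function name `prosite_to_regex` and the test peptide are hypothetical, and anchors or other less common syntax are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as C-x-C-x(5)-G-x(2)-C into a regex."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Split off an optional repeat count such as (5) or (4,8).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unrecognized element: {element!r}")
        residue, count = match.groups()
        if residue == "x":
            regex = "."                         # any residue
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            regex = residue                     # a literal residue or a [..] class
        if count:
            regex += "{" + count + "}"
        parts.append(regex)
    return "".join(parts)


egf_regex = prosite_to_regex("C-x-C-x(5)-G-x(2)-C")
print(egf_regex)  # C.C.{5}G.{2}C

peptide = "AACACDEFGHGKLCAA"  # made-up peptide, not a real protein
hit = re.search(egf_regex, peptide)
print(hit.span())  # (2, 14)
```

The same function can be applied to the other consensus patterns listed in this example, since they use the same notation.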

Seven Transmembrane Integral Membrane Proteins - Rhodopsin Family (7tm_1: Pfam Accession No. PF00001). SEQ ID NO:321 corresponds to a sequence encoding a polypeptide that is a member of the seven transmembrane (7tm) receptor rhodopsin family. G-protein coupled receptors of the (7tm) rhodopsin family (also called R7G) are an extensive group of hormone, neurotransmitter, and light receptors which transduce extracellular signals by interaction with guanine nucleotide-binding (G) proteins (Strosberg, Eur. J. Biochem. (1991) 196:1; Kerlavage, Curr. Opin. Struct. Biol. (1991) 1:394; Probst et al., DNA Cell Biol. (1992) 11:1; Savarese et al., Biochem. J. (1992) 283:1). The consensus pattern that contains the conserved triplet and that also spans the major part of the third transmembrane helix is used to detect this widespread family of proteins: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].

Basic Region Plus Leucine Zipper Transcription Factors (bZIP: Pfam Accession No. PF00170). SEQ ID NO: 638 represents a polynucleotide encoding a novel member of the family of basic region plus leucine zipper transcription factors. The bZIP superfamily (Hurst, Protein Prof. (1995) 2:105; and Ellenberger, Curr. Opin. Struct. Biol. (1994) : 12) of eukaryotic DNA-binding transcription factors encompasses proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization. The consensus pattern for this protein family is: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

Reverse Transcriptase (rvt: Pfam Accession No. PF00078). SEQ ID NO: 137 represents a polynucleotide encoding a reverse transcriptase, which occurs in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses (Xiong and Eickbush, EMBO J. (1990) 9:3353-3362). Reverse transcriptases catalyze RNA-template-directed extension of the 3'-end of a DNA strand by one deoxynucleotide at a time and require an RNA or DNA primer.

KRAB box (KRAB: Pfam Accession No. PF01352). SEQ ID NO: 1012 represents a polypeptide having a Krueppel-associated box (KRAB). A KRAB box is a domain of around 75 amino acids that is found in the N-terminal part of about one third of eukaryotic Krueppel-type C2H2 zinc finger proteins (ZFPs). It is enriched in charged amino acids and can be divided into subregions A and B, which are predicted to fold into two amphipathic alpha-helices. The KRAB A and B boxes can be separated by variable spacer segments and many KRAB proteins contain only the A box.

The KRAB domain functions as a transcriptional repressor when tethered to the template DNA by a DNA-binding domain. A sequence of 45 amino acids in the KRAB A subdomain has been shown to be necessary and sufficient for transcriptional repression. The B box does not repress by itself but does potentiate the repression exerted by the KRAB A subdomain. Gene silencing requires the binding of the KRAB domain to the RING-B box-coiled coil (RBCC) domain of the KAP-1/TIF1-beta corepressor. As KAP-1 binds to the heterochromatin proteins HP1, it has been proposed that the KRAB-ZFP-bound target gene could be silenced following recruitment to heterochromatin. KRAB-ZFPs constitute one of the largest classes of transcription factors within the human genome, and appear to play important roles during cell differentiation and development. The KRAB domain is generally encoded by two exons. The regions coded by the two exons are known as KRAB-A and KRAB-B.

Armadillo/beta-catenin-like repeat (Armadillo seg; Pfam Accession No. PF00514). SEQ ID NO: 1486 represents a polypeptide having sequence similarity with the armadillo/beta-catenin-like repeat (armadillo). The armadillo repeat is an approximately 40 amino acid long tandemly repeated sequence motif first identified in the Drosophila segment polarity gene armadillo. Similar repeats were later found in the mammalian armadillo homolog beta-catenin, the junctional plaque protein plakoglobin, the adenomatous polyposis coli (APC) tumor suppressor protein, and a number of other proteins (Peifer et al., Cell 76(2):786-791 (1994)). The three-dimensional fold of an armadillo repeat is known from the crystal structure of beta-catenin (Rojas et al., Cell 95: 105-130 (1998)). There, the 12 repeats form a superhelix of alpha-helices, with three helices per unit. The cylindrical structure features a positively charged groove which presumably interacts with the acidic surfaces of the known interaction partners of beta-catenin.

Cadherin domain (cadherin; Pfam Accession No. PF00028). SEQ ID NO: 1523 represents a polypeptide having sequence similarity to a cadherin domain. Cadherins are a family of animal glycoproteins responsible for calcium-dependent cell-cell adhesion (Takeichi, Annu. Rev. Biochem. 59:237-252 (1990); Takeichi, Trends Genet. 3:213-217 (1987)). Cadherins preferentially interact with themselves in a homophilic manner in connecting cells, thus acting as both receptor and ligand. A wide number of tissue-specific forms of cadherins are known, for example: Epithelial (E-cadherin) (CDH1); Neural (N-cadherin) (CDH2); Placental (P-cadherin) (CDH3); Retinal (R-cadherin) (CDH4); Vascular endothelial (VE-cadherin) (CDH5); Kidney (K-cadherin) (CDH6); Cadherin-8 (CDH8); Cadherin-9 (CDH9); Osteoblast (OB-cadherin) (CDH11); Brain (BR-cadherin) (CDH12); T-cadherin (truncated cadherin) (CDH13); Muscle (M-cadherin) (CDH15); Kidney (Ksp-cadherin) (CDH16); and Liver-intestine (LI-cadherin) (CDH17).

Structurally, cadherins are built of the following domains: a signal sequence, followed by a propeptide of about 130 residues, then an extracellular domain of around 600 residues, then a transmembrane region, and finally a C-terminal cytoplasmic domain of about 150 residues. The extracellular domain can be sub-divided into five parts: there are four repeats of about 110 residues followed by a region that contains four conserved cysteines. The calcium-binding region of cadherins may be located in the extracellular repeats. The signature pattern for the repeated domain is located in the C-terminal extremity, which is its best conserved region. The pattern includes two conserved aspartic acid residues and two asparagines; these residues could be implicated in the binding of calcium. The consensus pattern is: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

CBS domain (CBS: Pfam Accession No. PF00571). SEQ ID NOS:1510 and 1511 represent polypeptides having sequence similarity to CBS domains, which are present in all three forms of cellular life, including two copies in inosine monophosphate dehydrogenase, of which one is disordered in the crystal structure. A number of disease states are associated with CBS-containing proteins, including homocystinuria, Becker's and Thomsen disease. CBS domains are small intracellular modules of unknown function. They are mostly found in two or four copies within a protein. Pairs of CBS domains dimerise to form a stable globular domain (Zhang et al., Biochemistry 38:4691-4700 (1999)). Two CBS domains are found in inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. CBS domains are found attached to a wide range of other protein domains, suggesting that CBS domains may play a regulatory role. The region containing the CBS domains in cystathionine-beta synthase is involved in regulation by S-AdoMet (Zhang et al., Biochemistry 38:4691-4700 (1999)). The 3D structure is found as a sub-domain in the TIM barrel of inosine-monophosphate dehydrogenase.

Phorbol esters/diacylglycerol binding domain (C1 domain) (DAG PE-bind; Pfam Accession No. PF00130). SEQ ID NO: 1514 represents a polypeptide having sequence similarity to the phorbol esters/diacylglycerol binding domain (C1 domain). Diacylglycerol (DAG) is an important second messenger. Phorbol esters (PE) are analogues of DAG and potent tumor promoters that cause a variety of physiological changes when administered to both cells and tissues. DAG activates a family of serine/threonine protein kinases, collectively known as protein kinase C (PKC) (Azzi et al., Eur. J. Biochem. 208:547-557 (1992)). Phorbol esters can also directly stimulate PKC. The N-terminal region of PKC, known as C1, has been shown to bind PE and DAG in a phospholipid and zinc-dependent fashion (Ono et al., Proc. Natl. Acad. Sci. U.S.A. 86:4868-4871 (1989)). The C1 region contains one or two copies (depending on the isozyme of PKC) of a cysteine-rich domain about 50 amino-acid residues long and essential for DAG/PE-binding. The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are probably the six cysteines and two histidines that are conserved in the C1 domain. The consensus sequence for the C1 domain is: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-x(2)-C-x(5,9)-C [the C and H are involved in binding zinc].

GATA zinc finger (GATA: Pfam Accession No. PF00320). SEQ ID NO: 1520 represents a polypeptide having sequence similarity to the GATA zinc finger. A number of transcription factors, including erythroid-specific transcription factor and nitrogen regulatory proteins, specifically bind the DNA sequence (A/T)GATA(A/G) in the regulatory regions of genes (Yamamoto et al., Genes Dev. 4:1650-1662 (1990)) and are consequently termed GATA-binding transcription factors. The interactions occur via highly-conserved zinc finger domains in which the zinc ion is coordinated by four cysteine residues (Evans and Felsenfeld, Cell 58:877-885 (1989); Omichinski et al., Science 261:438-446 (1993)).
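As an illustration of the (A/T)GATA(A/G) recognition sequence, the following Python sketch scans both strands of a DNA sequence for matching sites. The function names and the example sequence are made up for illustration, and a sequence match alone does not establish that a site is bound or functional.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# A lookahead is used so that overlapping sites are also reported.
GATA_SITE = re.compile(r"(?=([AT]GATA[AG]))")
SITE_LENGTH = 6


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gata_sites(seq: str):
    """Return (strand, start, site) tuples; start is 0-based on the forward strand."""
    seq = seq.upper()
    sites = [("+", m.start(), m.group(1)) for m in GATA_SITE.finditer(seq)]
    for m in GATA_SITE.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + SITE_LENGTH)
        sites.append(("-", start, m.group(1)))
    return sites


promoter = "CCAGATAAGGCTTATCTGG"  # made-up sequence
print(find_gata_sites(promoter))
# [('+', 2, 'AGATAA'), ('-', 11, 'AGATAA')]
```

The second hit lies on the reverse strand: the forward-strand bases TTATCT at positions 11 to 16 read AGATAA on the complementary strand.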

NMR studies have shown the core of the zinc finger to comprise two irregular anti-parallel beta-sheets and an alpha-helix, followed by a long loop to the C-terminal end of the finger. The N-terminal part, which includes the helix, is similar in structure, but not sequence, to the N-terminal zinc module of the glucocorticoid receptor DNA-binding domain. The helix and the loop connecting the two beta-sheets interact with the major groove of the DNA, while the C-terminal tail wraps around into the minor groove. It is this tail that is the essential determinant of specific binding. Interactions between the zinc finger and DNA are mainly hydrophobic, explaining the preponderance of thymines in the binding site; a large number of interactions with the phosphate backbone have also been observed (Omichinski et al., Science 261:438-446 (1993)). Two GATA zinc fingers are found in the GATA transcription factors; however, there are several proteins which contain only a single copy of the domain. The consensus sequence of the domain is: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [the four C's are zinc ligands].

Glutathione S-transferase, N-terminal domain (GST N: Pfam Accession No. PF02798). SEQ ID NO: 1507 represents a polypeptide having sequence similarity to Glutathione S-transferase, N-terminal domain. In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification of reactive electrophilic compounds by catalysing their conjugation to glutathione. The GST domain is also found in S-crystallins from squid, and proteins with no known GST activity, such as eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include auxin-regulated proteins in plants and stringent starvation proteins in E. coli. The major lens polypeptide of Cephalopoda is also a GST.

Bacterial GSTs of known function often have a specific, growth-supporting role in biodegradative metabolism: epoxide ring opening and tetrachlorohydroquinone reductive dehalogenation are two examples of the reactions catalysed by these bacterial GSTs. Some regulatory proteins, like the stringent starvation proteins, also belong to the GST family. GST seems to be absent from Archaea, in which gamma-glutamylcysteine substitutes for glutathione as the major thiol.

Glutathione S-transferases form homodimers, but in eukaryotes can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural fold. Each monomer is composed of a distinct N-terminal sub-domain, which adopts the thioredoxin fold, and a C-terminal all-helical sub-domain.

GTF2I-like repeat (GTF2I: Pfam Accession No. PF02946). SEQ ID NOS: 1500, 1501, and 1542 represent polypeptides having sequence similarity to proteins having a GTF2I-like repeat. This region of sequence similarity is found up to six times in a variety of proteins including GTF2I. It has been suggested that this may be a DNA binding domain (O'Mahoney et al., Mol. Cell. Biol. 18:6641-6652 (1998); Osborne et al., Genomics 57:279-284 (1999)).

Core histone H2A/H2B/H3/H4 (histone: Pfam Accession No. PF00125). SEQ ID NO: 1497 represents a polypeptide having sequence similarity to core histone H2A/H2B/H3/H4 family polypeptides. Histone H2A is one of the four histones, along with H2B, H3 and H4, which form the eukaryotic nucleosome core. Using alignments of histone H2A sequences (Wells and Brown, Nucleic Acids Res. 19:2173-2188 (1991); Thatcher and Gorovsky, Nucleic Acids Res. 22:174-179 (1994)), a conserved region in the N-terminal part of H2A was used to develop a signature pattern. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's which are synthesized throughout the cell cycle. The consensus pattern is: [AC]-G-L-x-F-P-V.

Histone H4, along with H3, plays a central role in nucleosome formation. The sequence of histone H4 has remained almost invariant in more than 2 billion years of evolution (Thatcher and Gorovsky, Nucleic Acids Res. 22:174-179 (1994)). The region used as a signature pattern is a pentapeptide found in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated (Doenecke and Gallwitz, Mol. Cell Biochem. 44:113-128 (1982)) and a histidine residue which is implicated in DNA-binding (Ebralidse et al., Nature 331:365-367 (1988)). The consensus pattern is: G-A-K-R-H.

Histone H3 is a highly conserved protein of 135 amino acid residues (Wells and Brown, Nucleic Acids Res. 19:2173-2188 (1991); Thatcher and Gorovsky, Nucleic Acids Res. 22:174-179 (1994)). Two signature patterns have been developed: the first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3, while the second one is derived from a conserved region in the central section of H3. The consensus patterns are: K-A-P-R-K-Q-L and P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]. The signature pattern of histone H2B corresponds to a conserved region in the C-terminal part of the protein. The consensus pattern is: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G.

HMG (high mobility group) box (HMG box: Pfam Accession No. PF00505). SEQ ID NO: 1525 corresponds to a polypeptide having sequence similarity to high mobility group proteins, a family of relatively low molecular weight non-histone components in chromatin. HMG1 (also called HMG-T in fish) and HMG2 (Bustin et al., Biochim. Biophys. Acta 1049:231-243 (1990)) are two highly related proteins that bind single-stranded DNA preferentially and unwind double-stranded DNA. HMG1/2 have about 200 amino acid residues with a highly acidic C-terminal section which is composed of an uninterrupted stretch of from 20 to 30 aspartic and glutamic acid residues; the rest of the protein sequence is very basic. In addition to the HMG1 and HMG2 proteins, HMG-domains occur in single or multiple copies in the following protein classes: the SOX family of transcription factors; SRY sex determining region Y protein and related proteins; LEF1 lymphoid enhancer binding factor 1; SSRP recombination signal recognition protein; MTF1 mitochondrial transcription factor 1; UBF1/2 nucleolar transcription factors; Abf2 yeast ARS-binding factor; and yeast transcription factors Ixr1, Rox1, Nhp6a, Nhp6b and Spp41.

Importin beta binding domain (IBB: Pfam Accession No. PF01749). SEQ ID NO: 1486 represents a polypeptide having sequence similarity to importin beta binding domain family polypeptides. This family consists of the importin beta (karyopherin beta) binding domain of importin alpha (karyopherin alpha). The domain mediates formation of the importin alpha beta complex, which is required for classical NLS import of proteins into the nucleus, through the nuclear pore complex and across the nuclear envelope. Also in the alignment is the NLS of importin alpha, which overlaps with the IBB domain (Moroianu et al., Proc. Natl. Acad. Sci. U.S.A. 93:6572-6576 (1996)).

T-box domain (T-box: Pfam Accession No. PF00907). SEQ ID NO: 1518 represents a polypeptide having sequence similarity to proteins having a T-box domain. The T-box gene family is an ancient group of putative transcription factors that appear to play a critical role in the development of all animal species. These genes were uncovered on the basis of similarity to the DNA binding domain (Papaioannou and Silver, Bioessays 20:9-19 (1998)) of the murine Brachyury (T) gene product, which similarity is the defining feature of the family. The Brachyury gene is named for its phenotype, which was identified 70 years ago as a mutant mouse strain with a short blunted tail. The gene, and its paralogues, have become a well-studied model for the family, and hence much of what is known about the T-box family is derived from the murine Brachyury gene.

Consistent with its nuclear location, Brachyury protein has a sequence-specific DNA-binding activity and can act as a transcriptional regulator (Wattler et al., Genomics 48:24-33 (1998)). Homozygous mutants for the gene undergo extensive developmental anomalies, thus rendering the mutation lethal (Kavka and Green, Biochim. Biophys. Acta 1333(2) (1997)). The postulated role of Brachyury is as a transcription factor, regulating the specification and differentiation of posterior mesoderm during gastrulation in a dose-dependent manner (Papaioannou and Silver, Bioessays 20:9-19 (1998)).

Common features shared by T-box family members are DNA-binding and transcriptional regulatory activity, a role in development, and conserved expression patterns. Most of the known genes in all species are expressed in mesoderm or mesoderm precursors (Papaioannou, Trends Genet. 13:212-213 (1997)). Members of the T-box family contain a domain of about 170 to 190 amino acids known as the T-box domain (Papaioannou, Trends Genet. 13:212-213 (1997); Bollag et al., Nat. Genet. 7:383-389 (1994); Agulnik et al., Genetics 144:249-254 (1996)), which probably binds DNA. As signature patterns for the T-domain, we selected two conserved regions. The first region corresponds to the N-terminal of the domain and the second one to the central part. The consensus sequences are: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ] and [LIVMFYW]-H-[PADH]-[DENQ]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F.

60s Acidic ribosomal protein (60s ribosomal: Pfam Accession No. PF00428). SEQ ID NO: 905 represents a polynucleotide encoding a member of the 60s acidic ribosomal protein family. The 60S acidic ribosomal protein plays an important role in the elongation step of protein synthesis. This family includes archaebacterial L12, eukaryotic P0, P1 and P2 (Remacha et al., Biochem. Cell Biol. 73:959-968(1995)).

Some of the proteins in this family are allergens. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans (WHO/IUIS Allergen Nomenclature Subcommittee, King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W., Bull. World Health Organ. 72:797-806(1994)). This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space; and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation. The allergens in this family include allergens with the following designations: Alt a 6, Alt a 12, Cla h 3, Cla h 4, and Cla h 12.

AP endonuclease family 1 (AP endonucleas1: Pfam Accession No. PF01260). SEQ ID NOS:358 and 836 correspond to polynucleotides encoding a member of the family of polypeptides designated AP endonuclease family 1. DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin, or those that generate oxygen radicals, produce a variety of lesions in DNA. Amongst these is base loss, which forms apurinic/apyrimidinic (AP) sites or strand breaks with atypical 3'-termini. DNA repair at the AP sites is initiated by specific endonuclease cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking groups from the 3'-terminus of DNA strand breaks.

AP endonucleases can be classified into two families on the basis of sequence similarity. This family contains members of AP endonuclease family 1. Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp). The proteins contain a glutamate which has been shown, in the Escherichia coli enzyme, to bind a divalent metal ion such as magnesium or manganese (Mol et al., Nature 374:381-386(1995)). The consensus sequences for this family of polypeptides are: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]; D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2); and N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S.

Bowman-Birk serine protease inhibitor family (Bowman-Birk leg: Pfam Accession No. PF00228). SEQ ID NO: 321 represents a polynucleotide encoding a polypeptide having sequence similarity to a member of the Bowman-Birk serine protease inhibitor family. The Bowman-Birk inhibitor family (Laskowski and Kato, Annu. Rev. Biochem. 49:593-626(1980)) is one of the numerous families of serine proteinase inhibitors. It has a duplicated structure and generally possesses two distinct inhibitory sites.

These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they exist in two forms, one of which is a duplication of the basic structure (Tashiro et al., J. Biochem. 102:297-306(1987)). The signature pattern for sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines. The consensus pattern is: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDEKS]-[DEKRHSTA]-C [The four C's are involved in disulfide bonds]. Note that this pattern can be found twice in some duplicated cereal inhibitors.

Cation efflux family (Cation efflux: Pfam Accession No. PF01545). SEQ ID NO: 321 encodes a polypeptide having sequence similarity to members of the cation efflux family of proteins. Members of this family are integral membrane proteins that are found to increase tolerance to divalent metal ions such as cadmium, zinc, and cobalt. These proteins are thought to be efflux pumps that remove these ions from cells (Xiong and Jayaswal, J. Bacteriol. 180:4024-4029(1998); Kunito et al., Biosci. Biotechnol. Biochem. 60:699-704(1996)).

DC1 domain (DC1: Pfam Accession No. PF03107). SEQ ID NO: 89 corresponds to a polypeptide having sequence similarity to a DC1 domain. This short domain is rich in cysteines and histidines. The pattern of conservation is similar to that found in DAG_PE-bind (Pfam Accession No. PF00130); therefore this domain has been termed DC1 for divergent C1 domain. Like the DAG_PE-bind domain, this domain probably also binds two zinc ions. The function of proteins with this domain is uncertain; however, this domain may bind to molecules such as diacylglycerol. Members of this family are found in plant proteins.

Pneumovirus attachment glycoprotein G (Glycoprotein G: Pfam Accession No. PF00802). SEQ ID NO:995 represents a polypeptide having sequence similarity to members of the Pneumovirus attachment glycoprotein G protein family. This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown to have any neuraminidase or hemagglutinin activity. The amino terminus is thought to be cytoplasmic, and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.

NADH-Ubiquinone/plastoquinone (complex I), various chains (oxidored q1: Pfam Accession No. PF00361). SEQ ID NO:413 represents a polypeptide having sequence similarity to the NADH-Ubiquinone/plastoquinone (complex I), various chains protein family. This family is part of the NADH:ubiquinone oxidoreductase (complex I), which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction that is associated with proton translocation across the membrane (Walker, Q. Rev. Biophys. 25:253-324(1992)). Sub-families within this protein family include NADH-ubiquinone oxidoreductase chain 5; NADH-ubiquinone oxidoreductase chain 2; NADH-ubiquinone oxidoreductase chain 4; and Multicomponent K+:H+ antiporter.

Protamine P1 (protamine P1: Pfam Accession No. PF00260). SEQ ID NOS:645 and 1217 represent polypeptides having sequence similarity to the Protamine P1 protein family. Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is sometimes absent. There also seems to be a single type of avian protamine whose sequence is closely related to that of mammalian P1 (Oliva et al., J. Biol. Chem. 264:17627-17630(1989)). A conserved region at the N-terminal extremity of the sequence is used as a signature pattern for this family of proteins. The consensus pattern is: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.
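The consensus patterns quoted throughout this section use a PROSITE-style notation in which elements are separated by hyphens, "x" stands for any residue, square brackets list the residues allowed at a position, and a parenthesized number or range gives a repeat count. As an illustration only, the following Python sketch converts such a pattern into a regular expression and scans a sequence with it. The function name prosite_to_regex and the test sequence are assumptions introduced for this example and are not part of the original disclosure; anchors and other less common pattern features are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles plain residues, x (any residue), [..] (allowed residues),
    {..} (excluded residues) and repeat counts such as (2) or (2,3).
    """
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        # Separate an optional trailing repeat count from the residue part.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # excluded residues
        if low and high:
            residue += "{%s,%s}" % (low, high)
        elif low:
            residue += "{%s}" % low
        parts.append(residue)
    return "".join(parts)


# Protamine P1 signature pattern as quoted in the text above.
P1_PATTERN = "[AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S"
P1_REGEX = prosite_to_regex(P1_PATTERN)

# Hypothetical test sequence, chosen only to exercise the pattern.
hit = re.search(P1_REGEX, "MARYRCCRSQSRSRY")
print(P1_REGEX, hit.group(0) if hit else None)
```

The same helper can be applied to the other consensus patterns in this section, provided they have first been restored to well-formed notation.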

Squash family serine protease inhibitor (squash: Pfam Accession No. PF00299). SEQ ID NO:995 represents a polypeptide having sequence similarity to Squash family serine protease inhibitor proteins. The squash inhibitors form one of a number of serine protease inhibitor families. The proteins, found in the seeds of cucurbitaceae plants (squash, cucumber, balsam pear, etc.), are approximately 30 residues in length and contain 6 Cys residues, which form 3 disulfide bonds (Bode et al., FEBS Lett. 242:285-292(1989)). The inhibitors function by being taken up by a serine protease (such as trypsin), which cleaves the peptide bond between Arg/Lys and Ile residues in the N-terminal portion of the protein (Bode et al., FEBS Lett. 242:285-292(1989); Krishnamoorthi et al., Biochemistry 31:898-904(1992)). Structural studies have shown that the inhibitor has an ellipsoidal shape and is largely composed of beta-turns (Bode et al., FEBS Lett. 242:285-292(1989)). The fold and Cys connectivity of the proteins resemble those of potato carboxypeptidase A inhibitor (Krishnamoorthi et al., Biochemistry 31:898-904(1992)). The pattern used to detect this family of proteins spans the major part of the sequence and includes five of the six cysteines involved in disulfide bonds. The consensus pattern is: C-P-x(5)-C-x(2)-[DN]-x-D-C-x(3)-C-x-C [The five C's are involved in disulfide bonds].

Metallothionein family 5 (Metallothio 5: Pfam Accession No. PF02067). SEQ ID NO:995 represents a polypeptide having sequence similarity to metallothionein family 5 proteins. Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds (Kagi, Meth. Enzymol. 205:613-626(1991); Kagi and Kojima, Experientia Suppl. 52:25-61(1987); Kagi and Schaffer, Biochemistry 27:8509-8515(1988)).

Due to limitations in the original classification system of MTs, which did not allow clear differentiation of patterns of structural similarities, either between or within classes, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically-related and thus alignable sequences. Diptera (Drosophila, family 5) MTs are 40-43 residue proteins that contain 10 conserved cysteines arranged in five Cys-X-Cys groups. In particular, the consensus pattern C-G-x(2)-C-x-C-x(2)-Q-x(5)-C-x-C-x(2)-D-C-x-C has been found to be diagnostic of family 5 MTs. The protein is found primarily in the alimentary canal, and its induction is stimulated by ingestion of cadmium or copper (Lastowski et al., J. Biol. Chem. 260:1527-1530(1985)). Mercury, silver and zinc induce the protein to a lesser extent.

Caenorhabditis elegans Sre G protein-coupled chemoreceptor (Sre: Pfam Accession No. PF03125). SEQ ID NO:591 represents a polypeptide having sequence similarity to C. elegans Sre G protein-coupled chemoreceptor family proteins. C. elegans Sre proteins are candidate chemosensory receptors. There are four main recognized groups of such receptors: Odr-10, Sra, Sro, and Srg. Sre (this family), Sra, and Srb comprise the Sra group. All of the above receptors are thought to be G protein-coupled seven transmembrane domain proteins (Troemel, Bioessays 21:1011-1020 (1999); Troemel et al., Cell 83:207-218 (1995)).

Syndecan domain (Syndecan: Pfam Accession No. PF01034). SEQ ID NO:995 corresponds to a polypeptide having a syndecan domain. Syndecans (Bernfield et al., Annu. Rev. Cell Biol. 8:365-393(1992); David, FASEB J. 7:1023-1030(1993)) are a family of transmembrane heparan sulfate proteoglycans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally, these proteins consist of four separate domains: a) a signal sequence; b) an extracellular domain (ectodomain) of variable length containing the sites of attachment of the heparan sulfate glycosaminoglycan side chains and whose sequence is not evolutionarily conserved in the various forms of syndecans; c) a transmembrane region; and d) a highly conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins.

The signature pattern for syndecans starts with the last residue of the transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four basic residues, may act as a stop transfer site. The consensus pattern is: [FY]-R-[EVI]-[KR]-K(2)-D-E-G-S-Y.

L1 transposable element (Transposase 22: Pfam Accession No. PF02994). SEQ ID NO:774 represents a polypeptide having an L1 transposable element. Many human L1 elements are capable of retrotransposition and some of these have been shown to exhibit reverse transcriptase (RT) activity (Sassaman et al., Nat Genet 16(1):37-43(1997)), although the function of many is, as yet, unknown. An estimated 30-60 active L1 elements reside in the average diploid genome.

WW domain (WW: Pfam Accession No. PF00397). SEQ ID NO:431 represents a polypeptide having a WW domain. The WW domain (also known as rsp5 or WWP) is a short conserved region in a number of unrelated proteins, among them dystrophin, the protein responsible for Duchenne muscular dystrophy. This short domain may be repeated up to four times in some proteins (Bork and Sudol, Trends Biochem. Sci. 19:531-533(1994); Andre and Springael, Biochem. Biophys. Res. Commun. 205:1201-1205(1994); Hofmann and Bucher, FEBS Lett. 358:153-157(1995); Sudol et al., FEBS Lett. 369:67-71(1995)). The WW domain binds to proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and has four conserved aromatic positions that are generally Trp (Chen and Sudol, Proc. Natl. Acad. Sci. U.S.A. 92:7819-7823(1995)). The name WW or WWP derives from the presence of these Trp as well as that of a conserved Pro. The WW domain is frequently associated with other domains typical for proteins in signal transduction processes.

A large variety of proteins containing the WW domain are known. These include: dystrophin, a multidomain cytoskeletal protein; utrophin, a dystrophin-like protein of unknown function; vertebrate YAP protein, substrate of an unknown serine kinase; mouse NEDD-4, involved in the embryonic development and differentiation of the central nervous system; yeast RSP5, similar to NEDD-4 in its molecular organization; rat FE65, a transcription-factor activator expressed preferentially in liver; tobacco DB10 protein; and others. The consensus pattern is: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.

Example 6: Detection of Differential Expression Using Arrays and source of patient tissue samples

mRNA isolated from samples of cancerous and normal breast and colon tissue obtained from patients was analyzed to identify genes differentially expressed in cancerous and normal cells. Normal and cancerous tissues were collected from patients using laser capture microdissection (LCM) techniques, which techniques are well known in the art (see, e.g., Ohyama et al. (2000) Biotechniques 29:530-6; Curran et al. (2000) Mol. Pathol 53:64-8; Suarez-Quian et al. (1999) Biotechniques 26:328-35; Simone et al. (1998) Trends Genet 14:272-6; Conia et al. (1997) J. Clin. Lab. Anal. 11:28-38; Emmert-Buck et al. (1996) Science 274:998-1001).

Table 11 (inserted prior to claims) provides information about each patient from which colon tissue samples were isolated, including: the Patient ID ("PT ID") and Path Report ID ("Path ID"), which are numbers assigned to the patient and the pathology reports for identification purposes; the group ("Grp") to which the patients have been assigned; the anatomical location of the tumor ("Anatom Loc"); the primary tumor size ("Size"); the primary tumor grade ("Grade"); the identification of the histopathological grade ("Histo Grade"); a description of local sites to which the tumor had invaded ("Local Invasion"); the presence of lymph node metastases ("Lymph Met"); the incidence of lymph node metastases (provided as a number of lymph nodes positive for metastasis over the number of lymph nodes examined) ("Lymph Met Incid"); the regional lymph node grade ("Reg Lymph Grade"); the identification or detection of metastases to sites distant to the tumor and their location ("Dist Met & Loc"); the grade of distant metastasis ("Dist Met Grade"); and general comments about the patient or the tumor ("Comments"). Histopathology of all primary tumors indicated the tumor was adenocarcinoma except for Patient ID Nos. 130 (for which no information was provided), 392 (in which greater than 50% of the cells were mucinous carcinoma), and 784 (adenosquamous carcinoma). Extranodal extensions were described in three patients, Patient ID Nos. 784, 789, and 791. Lymphovascular invasion was described in Patient ID Nos. 128, 278, 517, 534, 784, 786, 789, 791, 890, and 892. Crohn's-like infiltrates were described in seven patients, Patient ID Nos. 52, 264, 268, 392, 393, 784, and 791.

Table 12 (below) provides information about each patient from which the breast tissue samples were isolated, including: 1) the "Pat Num", a number assigned to the patient for identification purposes; 2) the "Histology", which indicates whether the tumor was characterized as an intraductal carcinoma (IDC) or ductal carcinoma in situ (DCIS); 3) the incidence of lymph node metastases (LMF), represented as the number of lymph nodes positive for metastases out of the total number examined in the patient; 4) the "Tumor Size"; 5) the "TNM Stage", which provides the tumor grade (T#), where the number indicates the grade and "p" indicates that the tumor grade is a pathological classification; regional lymph node metastasis (N#), where "0" indicates no lymph node metastases were found, "1" indicates lymph node metastases were found, and "X" means information not available; and the identification or detection of metastases to sites distant to the tumor and their location (M#), with "X" indicating that no distant metastases were reported; and 6) the stage of the tumor ("Stage Grouping"). "nr" indicates "not reported".

Table 12. Breast cancer patient data.

Identification of differentially expressed genes

cDNA probes were prepared from total RNA isolated from the patient cells described above.

Since LCM provides for the isolation of specific cell types to provide a substantially homogenous cell sample, this provided for a similarly pure RNA sample.

Total RNA was first reverse transcribed into cDNA using a primer containing a T7 RNA polymerase promoter, followed by second strand DNA synthesis. cDNA was then transcribed in vitro to produce antisense RNA using T7 promoter-mediated expression (see, e.g., Luo et al. (1999) Nature Med 5:117-122), and the antisense RNA was then converted into cDNA. The second set of cDNAs was again transcribed in vitro, using the T7 promoter, to provide antisense RNA. Optionally, the RNA was again converted into cDNA, allowing for up to a third round of T7-mediated amplification to produce more antisense RNA. Thus the procedure provided for two or three rounds of in vitro transcription to produce the final RNA used for fluorescent labeling.

Fluorescent probes were generated by first adding control RNA to the antisense RNA mix, and producing fluorescently labeled cDNA from the RNA starting material. Fluorescently labeled cDNAs prepared from the tumor RNA sample were compared to fluorescently labeled cDNAs prepared from normal cell RNA sample. For example, the cDNA probes from the normal cells were labeled with Cy3 fluorescent dye (green) and the cDNA probes prepared from the tumor cells were labeled with Cy5 fluorescent dye (red), and vice versa.

Each array used had an identical spatial layout and control spot set. Each microarray was divided into two areas, each area containing twelve groupings of 32 x 12 spots, for a total of about 9,216 spots on each array. The two areas are spotted identically, which provides for at least two duplicates of each clone per array.
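The spot count stated above follows directly from the layout. As a brief worked check, the Python sketch below multiplies out the figures given in the text; the variable names are introduced here for illustration only.

```python
# Layout figures taken from the description above.
AREAS_PER_ARRAY = 2          # two identically spotted areas
GROUPINGS_PER_AREA = 12      # twelve groupings in each area
GROUPING_DIMENSIONS = (32, 12)  # each grouping is 32 x 12 spots

spots_per_grouping = GROUPING_DIMENSIONS[0] * GROUPING_DIMENSIONS[1]
spots_per_area = GROUPINGS_PER_AREA * spots_per_grouping
spots_per_array = AREAS_PER_ARRAY * spots_per_area
regions_per_array = AREAS_PER_ARRAY * GROUPINGS_PER_AREA

print(spots_per_grouping, spots_per_area, spots_per_array, regions_per_array)
```

Because the two areas are spotted identically, each clone occupies at least one spot in each area, which is the source of the two duplicates per array.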

Polynucleotides for use on the arrays were obtained from both publicly available sources and from cDNA libraries generated from selected cell lines and patient tissues. PCR products of from about 0.5 kb to 2.0 kb amplified from these sources were spotted onto the array using a Molecular Dynamics Gen III spotter according to the manufacturer's recommendations. The first row of each of the 24 regions on the array had about 32 control spots, including 4 negative control spots and 8 test polynucleotides. The test polynucleotides were spiked into each sample before the labeling reaction with a range of concentrations from 2-600 pg/slide and ratios of 1:1. For each array design, two slides were hybridized with the test samples reverse-labeled in the labeling reaction. This provided for about four duplicate measurements for each clone, two of one color and two of the other, for each sample. The differential expression assay was performed by mixing equal amounts of probes from tumor cells and normal cells of the same patient ("matched") or from tumor cells and normal cells of different patients ("unmatched") (i.e., the tumor cells are from one patient and the normal cells are from a different patient). The arrays were prehybridized by incubation for about 2 hrs at 60°C in 5X SSC/0.2% SDS/1 mM EDTA, and then washed three times in water and twice in isopropanol.

Following prehybridization of the array, the probe mixture was then hybridized to the array under conditions of high stringency (overnight at 42°C in 50% formamide, 5X SSC, and 0.2% SDS). After hybridization, the array was washed at 55°C three times as follows: 1) first wash in 1X SSC/0.2% SDS; 2) second wash in 0.1X SSC/0.2% SDS; and 3) third wash in 0.1X SSC.

The arrays were then scanned for green and red fluorescence using a Molecular Dynamics Generation III dual color laser-scanner/detector. The images were processed using BioDiscovery Autogene software, and the data from each scan set normalized to provide for a ratio of expression relative to normal. Data from the microarray experiments was analyzed according to the algorithms described in U.S. application serial no. 60/252,358, filed November 20, 2000, by E.J. Moler, M.A. Boyle, and F.M. Randazzo, and entitled "Precision and accuracy in cDNA microarray data," which application is specifically incorporated herein by reference.

The experiment was repeated, this time labeling the two probes with the opposite color in order to perform the assay in both "color directions." Each experiment was sometimes repeated with two more slides (one in each color direction). The level of fluorescence for each sequence on the array was expressed as a ratio of the geometric mean of 8 replicate spots/gene from the four arrays or 4 replicate spots/gene from 2 arrays or some other permutation. The data were normalized using the spiked positive controls present in each duplicated area, and the precision of this normalization was included in the final determination of the significance of each differential. The fluorescent intensity of each spot was also compared to the negative controls in each duplicated area to determine which spots have detected significant expression levels in each sample. A statistical analysis of the fluorescent intensities was applied to each set of duplicate spots to assess the precision and significance of each differential measurement, resulting in a p-value testing the null hypothesis that there is no differential in the expression level between the tumor and normal samples of each patient in matched samples or between tumor and normal samples of tissue from different patients in unmatched samples. During initial analysis of the microarrays, the hypothesis was accepted if p > 10⁻³, and the differential ratio was set to 1.000 for those spots. All other spots have a significant difference in expression between the tumor and normal sample. If the tumor sample has detectable expression and the normal does not, the ratio is truncated at 1000, since the value for expression in the normal sample would be zero and the ratio would not be a mathematically useful value (e.g., infinity). If the normal sample has detectable expression and the tumor does not, the ratio is truncated to 0.001, since the value for expression in the tumor sample would be zero and the ratio would not be a mathematically useful value. These latter two situations are referred to herein as "on/off." Database tables were populated using a 95% confidence level (p>0.05).
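The reporting rules described above (geometric mean of replicate ratios, a ratio of 1.000 when the null hypothesis is accepted, and truncation at 1000 or 0.001 in the "on/off" cases) can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions and is not the algorithm of the incorporated application: the p-value is taken as an input rather than computed, the order in which the rules are applied is an assumption, and the function and constant names are hypothetical.

```python
import math

P_THRESHOLD = 1e-3   # null hypothesis accepted when p exceeds this value
ON_OFF_HIGH = 1000.0  # tumor detectable, normal not detectable
ON_OFF_LOW = 0.001    # normal detectable, tumor not detectable


def geometric_mean(values):
    """Geometric mean of positive replicate ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def reported_ratio(replicate_ratios, p_value, tumor_detected, normal_detected):
    """Return the tumor/normal ratio reported for one clone.

    Assumption: the significance test is applied first, then the
    on/off truncation, then the geometric mean of the replicates.
    """
    if p_value > P_THRESHOLD:
        return 1.0                      # no significant differential
    if tumor_detected and not normal_detected:
        return ON_OFF_HIGH
    if normal_detected and not tumor_detected:
        return ON_OFF_LOW
    return geometric_mean(replicate_ratios)


# Hypothetical replicate ratios for a single clone.
print(reported_ratio([2.0, 8.0, 4.0, 4.0], 1e-5, True, True))
```

The geometric mean is used rather than the arithmetic mean because ratios are multiplicative: a 2-fold increase and a 2-fold decrease average to 1 on this scale.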

Table 13 (inserted prior to claims) provides the results for gene products expressed at levels at least 2-fold greater in cancerous prostate, colon, or breast tissue samples relative to normal tissue samples in at least 20% of the patients tested. Table 13 includes: 1) the SEQ ID NO ("SEQ ID") assigned to each sequence for use in the present specification; 2) the sequence name ("SEQ NAME") used as an internal identifier of the sequence; 3) the name assigned to the clone from which the sequence was isolated ("CLONE ID"); 4) the percentage of patients tested in which expression levels (e.g., as message level) of the gene were at least 2-fold greater in cancerous breast tissue than in matched normal tissue ("BREAST PATIENTS >=2x"); 5) the breast number ratios, indicating the number of patients upon which the provided ratio using matched breast tissue was based ("BREAST NUM RATIOS"); 6) the percentage of patients tested in which expression levels (e.g., as message level) of the gene were at least 2-fold greater in cancerous colon tissue than in matched normal tissue ("COLON PATIENTS >=2x"); 7) the colon number ratios, indicating the number of patients upon which the provided ratio using matched colon tissue was based ("COLON NUM RATIOS"); 8) the percentage of patients tested in which expression levels (e.g., as message level) of the gene were at least 2-fold greater in cancerous colon tissue than in unmatched normal tissue ("COLON UM >=2x"); and 9) the unmatched colon number ratios, indicating the number of patients upon which the provided ratio using unmatched colon tissue was based ("COLON UM NUM RATIOS").

Table 16 (inserted prior to claims) provides the results for other gene products expressed at levels at least 2-fold greater in cancerous prostate, colon, or breast tissue samples, which may be metastasized cancer samples, relative to normal tissue samples in at least 20% of the patients tested. For each set of data (i.e., the percentage of patients in which a particular sequence is up-regulated in a cancer tissue) the number of patients (Colon Cancer Patients; Colon Unmatched Met Patients and Colon Match Met Patients) is shown. If a sample is matched, it is matched to a sample from the same patient; if a sample is unmatched, the results obtained from that sample are compared to a pooled sample of an appropriate tissue type from the patients. If a sample is not from a metastasized tissue, it is from a primary tumor.
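The inclusion criterion for these tables (a ratio of at least 2-fold in at least 20% of the patients tested) can be expressed as a simple calculation over the per-patient ratios. The Python sketch below is illustrative only; the function names and the example ratios are hypothetical and do not reproduce any values from the tables.

```python
def percent_patients_at_or_above(ratios, fold=2.0):
    """Percentage of patients whose tumor/normal ratio is >= fold."""
    if not ratios:
        raise ValueError("at least one patient ratio is required")
    hits = sum(1 for ratio in ratios if ratio >= fold)
    return 100.0 * hits / len(ratios)


def meets_table_criterion(ratios, fold=2.0, min_percent=20.0):
    """True when the gene would qualify under the stated thresholds."""
    return percent_patients_at_or_above(ratios, fold) >= min_percent


# Hypothetical per-patient ratios for one sequence in one tissue type.
example_ratios = [1.0, 2.5, 1.0, 1000.0, 0.5]
print(percent_patients_at_or_above(example_ratios),  # percentage column
      len(example_ratios),                           # "NUM RATIOS" column
      meets_table_criterion(example_ratios))
```

In this sketch the number of ratios supplied plays the role of the "NUM RATIOS" columns, i.e., the number of patients on which the percentage is based.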

These data provide evidence that the genes represented by the polynucleotides having the indicated sequences are differentially expressed in breast and prostate cancer as compared to normal non-cancerous breast tissue and are differentially expressed in colon cancer as compared to normal non-cancerous colon tissue.

The above methods can be performed to identify genes differentially expressed in cancerous and normal cells of any type of tissue, such as prostate, lung, colon, breast, and the like.

Example 7: Antisense Regulation of Gene Expression

The expression of the differentially expressed genes represented by the polynucleotides in the cancerous cells can be further analyzed using antisense knockout technology to confirm the role and function of the gene product in tumorigenesis, e.g., in promoting a metastatic phenotype.

Methods for analysis using antisense technology are well known in the art.

A number of different oligonucleotides complementary to the mRNA generated by the differentially expressed genes identified herein can be designed as potential antisense oligonucleotides, and tested for their ability to suppress expression of the genes. Sets of antisense oligomers specific to each candidate target are designed using the sequences of the polynucleotides corresponding to a differentially expressed gene and the software program HYBsimulator Version 4 (available for Windows 95/Windows NT or for Power Macintosh, RNAture, Inc. 1003 Health Sciences Road, West, Irvine, CA 92612 USA). Factors that are considered when designing antisense oligonucleotides include: 1) the secondary structure of oligonucleotides; 2) the secondary structure of the target gene; 3) the specificity with no or minimum cross-hybridization to other expressed genes; 4) stability; 5) length; and 6) terminal GC content. The antisense oligonucleotide is designed so that it will hybridize to its target sequence under conditions of high stringency at physiological temperatures (e.g., an optimal temperature for the cells in culture to provide for hybridization in the cell, e.g., about 37°C), but with minimal formation of homodimers.
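Two of the simpler design factors listed above, length and terminal GC content, can be computed directly from a candidate sequence. The Python sketch below is illustrative only and is not the HYBsimulator method: it derives an antisense DNA oligomer as the reverse complement of an mRNA target segment, and it assumes that a "reverse control" is the antisense sequence written in reverse order and that "terminal GC content" means the number of G or C bases among the last few 3' positions. The target segment and all function names are hypothetical.

```python
# Complement table mapping RNA target bases to DNA oligomer bases.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna_segment: str) -> str:
    """DNA oligomer (5'->3') complementary to an mRNA segment (5'->3')."""
    return mrna_segment.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def reverse_control(oligo: str) -> str:
    """Assumed definition: same bases as the antisense oligo, reversed."""
    return oligo[::-1]


def gc_fraction(oligo: str) -> float:
    """Fraction of G and C bases over the whole oligomer."""
    return sum(base in "GC" for base in oligo) / len(oligo)


def terminal_gc_count(oligo: str, window: int = 3) -> int:
    """Assumed definition: G/C bases within the last `window` 3' positions."""
    return sum(base in "GC" for base in oligo[-window:])


# Hypothetical 10-base target segment, far shorter than a real design.
target = "AUGGCCAAGU"
oligo = antisense_oligo(target)
print(oligo, reverse_control(oligo), len(oligo),
      gc_fraction(oligo), terminal_gc_count(oligo))
```

Secondary structure, cross-hybridization, and homodimer formation require thermodynamic modeling and database searches, and are outside the scope of this sketch.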

Using the sets of oligomers and the HYBsimulator program, three to ten antisense oligonucleotides and their reverse controls are designed and synthesized for each candidate mRNA transcript, which transcript is obtained from the gene corresponding to the target polynucleotide sequence of interest. Once synthesized and quantitated, the oligomers are screened for efficiency of a transcript knock-out in a panel of cancer cell lines. The efficiency of the knock-out is determined by analyzing mRNA levels using LightCycler quantification. The oligomers that resulted in the highest level of transcript knock-out, wherein the level was at least about 50%, preferably about 80-90%, up to 95% or more up to undetectable message, are selected for use in a cell-based proliferation assay, an anchorage independent growth assay, and an apoptosis assay.

The ability of each designed antisense oligonucleotide to inhibit gene expression is tested through transfection into LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate carcinoma cells. For each transfection mixture, a carrier molecule (such as a lipid, lipid derivative, lipid-like molecule, cholesterol, cholesterol derivative, or cholesterol-like molecule) is prepared to a working concentration of 0.5 mM in water, sonicated to yield a uniform solution, and filtered through a 0.45 μm PVDF membrane. The antisense or control oligonucleotide is then prepared to a working concentration of 100 μM in sterile Millipore water. The oligonucleotide is further diluted in OptiMEM™ (Gibco/BRL), in a microfuge tube, to 2 μM, or approximately 20 μg oligo/ml of OptiMEM™. In a separate microfuge tube, the carrier molecule, typically in the amount of about 1.5-2 nmol carrier/μg antisense oligonucleotide, is diluted into the same volume of OptiMEM™ used to dilute the oligonucleotide. The diluted antisense oligonucleotide is immediately added to the diluted carrier and mixed by pipetting up and down. Oligonucleotide is added to the cells to a final concentration of 30 nM.
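The dilutions described above follow from the relation C1 x V1 = C2 x V2. As a worked illustration, the Python sketch below computes the volume of 100 μM oligonucleotide stock needed to reach 2 μM and the volume of 0.5 mM carrier stock needed for a given mass of oligonucleotide. The 500 μl working volume is an assumption chosen for the example and is not specified in the text; the function names are likewise hypothetical.

```python
def stock_volume_ul(stock_conc, final_conc, final_volume_ul):
    """Volume of stock needed, from C1*V1 = C2*V2 (same concentration units)."""
    return final_conc * final_volume_ul / stock_conc


def carrier_stock_volume_ul(oligo_ug, nmol_carrier_per_ug, carrier_stock_mm=0.5):
    """Volume of carrier stock; 1 mM is equal to 1 nmol per microliter."""
    carrier_nmol = oligo_ug * nmol_carrier_per_ug
    return carrier_nmol / carrier_stock_mm


# Assumed working volume of OptiMEM for the diluted oligonucleotide.
working_volume_ul = 500.0

# 100 uM stock diluted to 2 uM.
oligo_stock_ul = stock_volume_ul(100.0, 2.0, working_volume_ul)

# About 20 ug oligo per ml, as stated in the text, gives the oligo mass.
oligo_ug = 20.0 * working_volume_ul / 1000.0

# Carrier at the low and high ends of the stated 1.5-2 nmol/ug range.
carrier_low_ul = carrier_stock_volume_ul(oligo_ug, 1.5)
carrier_high_ul = carrier_stock_volume_ul(oligo_ug, 2.0)

print(oligo_stock_ul, oligo_ug, carrier_low_ul, carrier_high_ul)
```

The calculation treats the final volume as including the added stock, which is the usual convention for a dilution of this kind.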

The level of target mRNA that corresponds to a target gene of interest in the transfected cells is quantitated in the cancer cell lines using the Roche LightCycler™ real-time PCR machine. Values for the target mRNA are normalized versus an internal control (e.g., beta-actin). For each 20 μl reaction, extracted RNA (generally 0.2-1 μg total) is placed into a sterile 0.5 or 1.5 ml microcentrifuge tube, and water is added to a total volume of 12.5 μl. To each tube is added 7.5 μl of a buffer/enzyme mixture, prepared by mixing (in the order listed) 2.5 μl H₂O, 2.0 μl 10X reaction buffer, 1.0 μl oligo dT (20 pmol), 1.0 μl dNTP mix (10 mM each), 0.5 μl RNAsin® (20u) (Ambion, Inc., Hialeah, FL), and 0.5 μl MMLV reverse transcriptase (50u) (Ambion, Inc.). The contents are mixed by pipetting up and down, and the reaction mixture is incubated at 42°C for 1 hour. The contents of each tube are centrifuged prior to amplification.

An amplification mixture is prepared by mixing in the following order: 1X PCR buffer II, 3 mM MgCl₂, 140 μM each dNTP, 0.175 pmol each oligo, 1:50,000 dilution of SYBR® Green, 0.25 mg/ml BSA, 1 unit Taq polymerase, and H₂O to 20 μl. (PCR buffer II is available in 10X concentration from Perkin-Elmer, Norwalk, CT). In 1X concentration it contains 10 mM Tris pH 8.3 and 50 mM KCl.

SYBR® Green (Molecular Probes, Eugene, OR) is a dye which fluoresces when bound to double stranded DNA. As double stranded PCR product is produced during amplification, the fluorescence from SYBR® Green increases. To each 20 μl aliquot of amplification mixture, 2 μl of template RT is added, and amplification is carried out according to standard protocols. The results are expressed as the percent decrease in expression of the corresponding gene product relative to non-transfected cells, vehicle-only transfected (mock-transfected) cells, or cells transfected with reverse control oligonucleotides.
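The way results are expressed can be illustrated with a short calculation. This is a sketch under stated assumptions: it presumes the instrument output has already been converted to relative quantities for the target and for the internal control (e.g., beta-actin), and the function names, the example values, and the use of the 50% selection threshold as a default are illustrative only.

```python
def normalized_level(target_quantity, control_quantity):
    """Target mRNA level normalized to the internal control (e.g., beta-actin)."""
    if control_quantity <= 0:
        raise ValueError("internal control quantity must be positive")
    return target_quantity / control_quantity


def percent_decrease(treated_level, reference_level):
    """Percent decrease in expression relative to a reference sample.

    The reference may be non-transfected cells, mock-transfected cells,
    or cells transfected with the reverse control oligonucleotide.
    """
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return (1.0 - treated_level / reference_level) * 100.0


def select_oligos(percent_by_oligo, threshold=50.0):
    """Names of oligos meeting the knock-out threshold, best first."""
    passing = [(pct, name) for name, pct in percent_by_oligo.items()
               if pct >= threshold]
    return [name for pct, name in sorted(passing, reverse=True)]


if __name__ == "__main__":
    # Hypothetical quantities: (target, beta-actin)
    antisense = normalized_level(0.2, 1.0)
    reverse_control = normalized_level(2.0, 1.0)
    print(f"{percent_decrease(antisense, reverse_control):.0f}% decrease")
```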

Example 8: Effect of Expression on Proliferation

The effect of gene expression on the inhibition of cell proliferation can be assessed in metastatic breast cancer cell lines (MDA-MB-231 ("231")); SW620 colon colorectal carcinoma cells; SKOV3 cells (a human ovarian carcinoma cell line); or LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate cancer cells.

Cells are plated to approximately 60-80% confluency in 96-well dishes. Antisense or reverse control oligonucleotide is diluted to 2 μM in OptiMEM™. The oligonucleotide-OptiMEM™ can then be added to a delivery vehicle, which delivery vehicle can be selected so as to be optimized for the particular cell type to be used in the assay. The oligo/delivery vehicle mixture is then further diluted into medium with serum on the cells. The final concentration of oligonucleotide for all experiments can be about 300 nM. Antisense oligonucleotides are prepared as described above (see Example 3). Cells are transfected overnight at 37°C and the transfection mixture is replaced with fresh medium the next morning. Transfection is carried out as described above in Example 8.

Those antisense oligonucleotides that result in inhibition of proliferation of SW620 cells indicate that the corresponding gene plays a role in production or maintenance of the cancerous phenotype in cancerous colon cells. Those antisense oligonucleotides that inhibit proliferation in SKOV3 cells represent genes that play a role in production or maintenance of the cancerous phenotype in cancerous ovarian cells. Those antisense oligonucleotides that result in inhibition of proliferation of MDA-MB-231 cells indicate that the corresponding gene plays a role in production or maintenance of the cancerous phenotype in cancerous breast cells. Those antisense oligonucleotides that inhibit proliferation in LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 cells represent genes that play a role in production or maintenance of the cancerous phenotype in cancerous prostate cells.

The following antisense oligonucleotides, corresponding to a glutathione transferase omega identified by SEQ ID NOS:1377 and 1541 (Chiron Candidate Id 21), were used to inhibit proliferation of SW620 colon colorectal carcinoma cells: TTGGTTCCCAAGACAAGCCGTGAC (SEQ ID NO:1543); TCTCAACGCTACCAGGCACTCCTTG (SEQ ID NO:1544); GCACAGCCCAAAGTCAAAGGCATTA (SEQ ID NO:1545); CAGGCACTCCTTGGTCAAATGTGGG (SEQ ID NO:1546); GGACAGGGAAAGGAGAGGCTAGTCA (SEQ ID NO:1547); and TGCATTCTCTCCCACATCTCAACGC (SEQ ID NO:1548). These antisense molecules reduced glutathione transferase omega RNA expression by approximately 90%.

Example 9: Effect of Gene Expression on Cell Migration

The effect of gene expression on the inhibition of cell migration can be assessed in LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate cancer cells using static endothelial cell binding assays, non-static endothelial cell binding assays, and transmigration assays.

For the static endothelial cell binding assay, antisense oligonucleotides are prepared as described above (see Example 8). Two days prior to use, prostate cancer cells (CaP) are plated and transfected with antisense oligonucleotide as described above (see Examples 3 and 4). On the day before use, the medium is replaced with fresh medium, and on the day of use, the medium is replaced with fresh medium containing 2 μM CellTracker green CMFDA (Molecular Probes, Inc.) and cells are incubated for 30 min. Following incubation, CaP medium is replaced with fresh medium (no CMFDA) and cells are incubated for an additional 30-60 min. CaP cells are detached using CMF PBS/2.5 mM EDTA or trypsin, spun and resuspended in DMEM/1% BSA/10 mM HEPES pH 7.0. Finally, CaP cells are counted and resuspended at a concentration of 1x10⁶ cells/ml.

Endothelial cells (EC) are plated onto 96-well plates at 40-50% confluence 3 days prior to use. On the day of use, EC are washed 1X with PBS and 50 λ DMEM/1% BSA/10 mM HEPES pH 7 is added to each well. To each well is then added 50K (50 λ) CaP cells in DMEM/1% BSA/10 mM HEPES pH 7. The plates are incubated for an additional 30 min and washed 5X with PBS containing Ca²⁺ and Mg²⁺. After the final wash, 100 μL PBS is added to each well and fluorescence is read on a fluorescent plate reader (Ab492/Em 516 nm).
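The cell numbers in the static binding assay follow directly from the stated suspension density, as the sketch below shows. It takes λ to denote microliters, so that 50K cells at 1x10⁶ cells/ml correspond to 50 μl per well; the function names and the counted cell total in the example are hypothetical.

```python
def suspension_volume_ul(cells_needed, cells_per_ml):
    """Volume (ul) of suspension that contains the requested number of cells."""
    if cells_per_ml <= 0:
        raise ValueError("cell density must be positive")
    return cells_needed * 1000.0 / cells_per_ml


def resuspension_volume_ml(total_cells, target_cells_per_ml):
    """Volume (ml) in which to resuspend a counted pellet to reach a density."""
    if target_cells_per_ml <= 0:
        raise ValueError("target density must be positive")
    return total_cells / target_cells_per_ml


if __name__ == "__main__":
    # 50K cells per well from a 1x10^6 cells/ml suspension
    print(suspension_volume_ul(50_000, 1e6), "ul per well")
    # Hypothetical count of 4.2 million cells after detachment
    print(resuspension_volume_ml(4.2e6, 1e6), "ml to reach 1x10^6 cells/ml")
```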

For the non-static endothelial cell binding assay, CaP are prepared as described above. EC are plated onto 24-well plates at 30-40% confluence 3 days prior to use. On the day of use, a subset of EC are treated with cytokine for 6 hours then washed 2X with PBS. To each well is then added 150-200K CaP cells in DMEM/1% BSA/10 mM HEPES pH 7. Plates are placed on a rotating shaker (70 RPM) for 30 min and then washed 3X with PBS containing Ca²⁺ and Mg²⁺. After the final wash, 500 μL PBS is added to each well and fluorescence is read on a fluorescent plate reader (Ab492/Em 516 nm).

For the transmigration assay, CaP are prepared as described above with the following changes. On the day of use, CaP medium is replaced with fresh medium containing 5 μM CellTracker green CMFDA (Molecular Probes, Inc.) and cells are incubated for 30 min. Following incubation, CaP medium is replaced with fresh medium (no CMFDA) and cells are incubated for an additional 30-60 min. CaP cells are detached using CMF PBS/2.5 mM EDTA or trypsin, spun and resuspended in EGM-2-MV medium. Finally, CaP cells are counted and resuspended at a concentration of 1x10⁶ cells/ml.

EC are plated onto FluorBlok transwells (BD Biosciences) at 30-40% confluence 5-7 days before use. Medium is replaced with fresh medium 3 days before use and on the day of use. To each transwell is then added 50K labeled CaP. 30 min prior to the first fluorescence reading, 10 μg of FITC-dextran (10K MW) is added to the EC plated filter. Fluorescence is then read at multiple time points on a fluorescent plate reader (Ab492/Em 516 nm).

Those antisense oligonucleotides that result in inhibition of binding of LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate cancer cells to endothelial cells indicate that the corresponding gene plays a role in the production or maintenance of the cancerous phenotype in cancerous prostate cells. Those antisense oligonucleotides that result in inhibition of endothelial cell transmigration by LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 prostate cancer cells indicate that the corresponding gene plays a role in the production or maintenance of the cancerous phenotype in cancerous prostate cells.

Example 10: Effect of Gene Expression on Colony Formation

The effect of gene expression upon colony formation of SW620 cells, SKOV3 cells, MDA-MB-231 cells, LNCaP cells, PC3 cells, 22Rv1 cells, MDA-PCA-2b cells, and DU145 cells can be tested in a soft agar assay. Soft agar assays are conducted by first establishing a bottom layer of 2 ml of 0.6% agar in media plated fresh within a few hours of layering on the cells. The cell layer is formed on the bottom layer by removing cells transfected as described above from plates using 0.05% trypsin and washing twice in media. The cells are counted in a Coulter counter, and resuspended to 10⁶ per ml in media. 10 μl aliquots are placed with media in 96-well plates (to check counting with WST-1), or diluted further for the soft agar assay. 2000 cells are plated in 800 μl 0.4% agar in duplicate wells above the 0.6% agar bottom layer. After the cell layer agar solidifies, 2 ml of media is dribbled on top and antisense or reverse control oligo (produced as described in Example 8) is added without delivery vehicles. Fresh media and oligos are added every 3-4 days. Colonies form in 10 days to 3 weeks. Fields of colonies are counted by eye. WST-1 metabolism values can be used to compensate for small differences in starting cell number. Larger fields can be scanned for visual record of differences.

Those antisense oligonucleotides that result in inhibition of colony formation of SW620 cells indicate that the corresponding gene plays a role in production or maintenance of the cancerous phenotype in cancerous colon cells. Those antisense oligonucleotides that inhibit colony formation in SKOV3 cells represent genes that play a role in production or maintenance of the cancerous phenotype in cancerous ovarian cells. Those antisense oligonucleotides that result in inhibition of colony formation of MDA-MB-231 cells indicate that the corresponding gene plays a role in production or maintenance of the cancerous phenotype in cancerous breast cells. Those antisense oligonucleotides that inhibit colony formation in LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 cells represent genes that play a role in production or maintenance of the cancerous phenotype in cancerous prostate cells.
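One possible way to apply the WST-1 compensation mentioned for the soft agar assay is sketched below. The text does not specify the calculation, so the proportional correction, the function names, and the example values are all assumptions made for illustration.

```python
def normalize_colonies(colony_count, wst1_value, wst1_reference):
    """Scale a colony count by the relative starting cell number.

    Assumes (hypothetically) that the WST-1 reading is proportional to the
    number of cells plated, so a well seeded with more cells than the
    reference has its colony count scaled down accordingly.
    """
    if wst1_value <= 0 or wst1_reference <= 0:
        raise ValueError("WST-1 readings must be positive")
    return colony_count * wst1_reference / wst1_value


def percent_inhibition(antisense_colonies, control_colonies):
    """Percent inhibition of colony formation relative to reverse control."""
    if control_colonies <= 0:
        raise ValueError("control colony count must be positive")
    return (1.0 - antisense_colonies / control_colonies) * 100.0


if __name__ == "__main__":
    # Hypothetical counts and WST-1 absorbance readings
    antisense = normalize_colonies(30, wst1_value=1.2, wst1_reference=1.0)
    control = normalize_colonies(50, wst1_value=1.0, wst1_reference=1.0)
    print(f"{percent_inhibition(antisense, control):.0f}% inhibition")
```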

Example 11: Induction of Cell Death upon Depletion of Polypeptides by Depletion of mRNA ("Antisense Knockout")

In order to assess the effect of depletion of a target message upon cell death, LNCaP, PC3, 22Rv1, MDA-PCA-2b, or DU145 cells, or other cells derived from a cancer of interest, can be transfected for proliferation assays. For cytotoxic effect in the presence of cisplatin (cis), the same protocol is followed but cells are left in the presence of 2 μM drug. Each day, cytotoxicity is monitored by measuring the amount of LDH enzyme released in the medium due to membrane damage. The activity of LDH is measured using the Cytotoxicity Detection Kit from Roche Molecular Biochemicals. The data is provided as a ratio of LDH released in the medium vs. the total LDH present in the well at the same time point and treatment (rLDH/tLDH). A positive control using antisense and reverse control oligonucleotides for BCL2 (a known anti-apoptotic gene) is included; loss of message for BCL2 leads to an increase in cell death compared with treatment with the control oligonucleotide (background cytotoxicity due to transfection).
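The rLDH/tLDH ratio described above can be computed per time point as in the following sketch. The function names and the readings are hypothetical, and any background subtraction required by the detection kit is assumed to have been applied before these values are passed in.

```python
def ldh_ratio(released_ldh, total_ldh):
    """Ratio of LDH released into the medium to total LDH in the well."""
    if total_ldh <= 0:
        raise ValueError("total LDH must be positive")
    return released_ldh / total_ldh


def daily_ratios(released_by_day, total_by_day):
    """rLDH/tLDH for each day, pairing readings from the same time point."""
    if len(released_by_day) != len(total_by_day):
        raise ValueError("need one total reading per released reading")
    return [ldh_ratio(r, t) for r, t in zip(released_by_day, total_by_day)]


if __name__ == "__main__":
    # Hypothetical readings over three days, same treatment and wells
    antisense = daily_ratios([0.10, 0.30, 0.60], [1.0, 1.2, 1.5])
    control = daily_ratios([0.08, 0.12, 0.15], [1.0, 1.2, 1.5])
    for day, (a, c) in enumerate(zip(antisense, control), start=1):
        print(f"day {day}: antisense {a:.2f}, reverse control {c:.2f}")
```

Comparing the antisense ratio with the reverse control ratio at the same time point separates target-specific cell death from background cytotoxicity due to transfection.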

Example 12: Functional Analysis of Gene Products Differentially Expressed in Cancer

The gene products of sequences of a gene differentially expressed in cancerous cells can be further analyzed to confirm the role and function of the gene product in tumorigenesis, e.g., in promoting or inhibiting development of a metastatic phenotype. For example, the function of gene products corresponding to genes identified herein can be assessed by blocking function of the gene products in the cell. For example, where the gene product is secreted or associated with a cell surface membrane, blocking antibodies can be generated and added to cells to examine the effect upon the cell phenotype in the context of, for example, the transformation of the cell to a cancerous, particularly a metastatic, phenotype. In order to generate antibodies, a clone corresponding to a selected gene product is selected, and a sequence that represents a partial or complete coding sequence is obtained. The resulting clone is expressed, the polypeptide produced isolated, and antibodies generated. The antibodies are then combined with cells and the effect upon tumorigenesis assessed.

Where the gene product of the differentially expressed genes identified herein exhibits sequence homology to a protein of known function (e.g., to a specific kinase or protease) and/or to a protein family of known function (e.g., contains a domain or other consensus sequence present in a protease family or in a kinase family), then the role of the gene product in tumorigenesis, as well as the activity of the gene product, can be examined using small molecules that inhibit or enhance function of the corresponding protein or protein family.

Additional functional assays include, but are not necessarily limited to, those that analyze the effect of expression of the corresponding gene upon cell cycle and cell migration. Methods for performing such assays are well known in the art.

Example 13: Deposit Information

Deposits of the biological materials in the tables referenced below were made with either the Agricultural Research Service Culture Collection (NRRL), 1815 North University Street, Peoria, Illinois 61604, or with the American Type Culture Collection (ATCC), 10801 University Blvd., Manassas, VA 20110-2209, under the provisions of the Budapest Treaty, on or before the filing date of the present application. The accession number indicated is assigned after successful viability testing, and the requisite fees were paid. Access to said cultures will be available during pendency of the patent application to one determined by the Commissioner to be entitled to such under 37 C.F.R. §1.14 and 35 U.S.C. §122. All restriction on availability of said cultures to the public will be irrevocably removed upon the granting of a patent based upon the application. Moreover, the designated deposits will be maintained for a period of thirty (30) years from the date of deposit, or for five (5) years after the last request for the deposit; or for the enforceable life of the U.S. patent, whichever is longer. Should a culture become nonviable or be inadvertently destroyed, or, in the case of plasmid-containing strains, lose its plasmid, it will be replaced with a viable culture(s) of the same taxonomic description.

These deposits are provided merely as a convenience to those of skill in the art, and are not an admission that a deposit is required. A license may be required to make, use, or sell the deposited materials, and no such license is hereby granted. The deposit below was received by the ATCC on or before the filing date of the present application.

Table 14. Cell Lines Deposited with ATCC

In addition, pools of selected clones, as well as libraries containing specific clones, were assigned an "ES" number and a "CMCC" number (both internal references) and deposited with the NRRL. Table 15 (inserted before the claims) provides the NRRL Accession Nos. of the clones deposited as libraries named ES219-ES225 (CMCC5471-CMCC5477, respectively) on November 1, 2001, and of the clones deposited as a library named ES226 (CMCC5478) on November 7, 2001.

Retrieval of Individual Clones from Deposit of Pooled Clones. Where the biological deposit is composed of a pool of cDNA clones or a library of cDNA clones, the deposit was prepared by first transfecting each of the clones into separate bacterial cells. The clones in the pool or library were then deposited as a pool of equal mixtures in the composite deposit. Particular clones can be obtained from the composite deposit using methods well known in the art. For example, a bacterial cell containing a particular clone can be identified by isolating single colonies, and identifying colonies containing the specific clone through standard colony hybridization techniques, using an oligonucleotide probe or probes designed to specifically hybridize to a sequence of the clone insert (e.g., a probe based upon unmasked sequence of the encoded polynucleotide having the indicated SEQ ID NO). The probe should be designed to have a T_m of approximately 80°C (assuming 2°C for each A or T and 4°C for each G or C). Positive colonies can then be picked, grown in culture, and the recombinant clone isolated. Alternatively, probes designed in this manner can be used in PCR to isolate a nucleic acid molecule from the pooled clones according to methods well known in the art, e.g., by purifying the cDNA from the deposited culture pool, and using the probes in PCR reactions to produce an amplified product having the corresponding desired polynucleotide sequence.
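The probe T_m rule stated above (2°C for each A or T and 4°C for each G or C) can be applied as in the following sketch. The function names and the prefix-extension strategy for reaching the approximately 80°C target are illustrative assumptions; the example input is a hypothetical sequence, not one of the deposited clones.

```python
def estimated_tm_c(sequence):
    """Estimated T_m (deg C): 2 for each A or T plus 4 for each G or C."""
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at_count + 4 * gc_count


def shortest_probe(sequence, target_tm_c=80):
    """Shortest 5' prefix of `sequence` whose estimated T_m reaches the target.

    Returns None if the full sequence falls short of the target.
    """
    for end in range(1, len(sequence) + 1):
        if estimated_tm_c(sequence[:end]) >= target_tm_c:
            return sequence[:end]
    return None


if __name__ == "__main__":
    insert_region = "GCATTGCCAGGTCACGTTAGCCGGATCCATGC"  # hypothetical sequence
    probe = shortest_probe(insert_region)
    if probe is None:
        print("region too short to reach the target T_m")
    else:
        print(probe, len(probe), estimated_tm_c(probe))
```

In practice a probe chosen this way would still need to be checked for specificity against the other clones in the pool.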

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims. Those skilled in the art will recognize, or be able to ascertain, using not more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such specific embodiments and equivalents are intended to be encompassed by the following claims.

Table 8

Table 9

Table 11

Table 13

Table 15

Table 16

Claims

We Claim:

1. An isolated polynucleotide comprising a nucleotide sequence which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOS: 1-1485, or complement thereof.

2. An isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide sequence having at least 90% sequence identity to a sequence selected from the group consisting of SEQ ID NOS: 1-1485, or complement thereof.

3. An isolated polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1-1485, or complement thereof.

4. The isolated polynucleotide of any one of claims 1-3, wherein the polynucleotide comprises at least 100 contiguous nucleotides of the nucleotide sequence or complement thereof.

5. The isolated polynucleotide of any one of claims 1-4, wherein the polynucleotide comprises at least 200 contiguous nucleotides of the selected nucleotide sequence or complement thereof.

6. An isolated polynucleotide comprising a nucleotide sequence of at least 90% sequence identity to a sequence selected from the group consisting of: SEQ ID NOS: 1-1485, or complement thereof.

7. The isolated polynucleotide of claim 6, wherein the polynucleotide comprises a nucleotide sequence of at least 95% sequence identity to the selected nucleotide sequence.

8. The isolated polynucleotide of claim 6, wherein the polynucleotide comprises a nucleotide sequence that is identical to the selected nucleotide sequence.

9. A polynucleotide comprising a nucleotide sequence of an insert contained in a clone deposited as NRRL Accession No. B-30523, B-30524, B-30525, B-30526, B-30527, B-30528, B-30529, or B-30581.

10. An isolated cDNA obtained by the process of amplification using a polynucleotide comprising at least 15 contiguous nucleotides of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1-1485.

11. The isolated cDNA of claim 10, wherein the polynucleotide comprises at least 25 contiguous nucleotides of the selected nucleotide sequence.

12. The isolated cDNA of claim 10, wherein the polynucleotide comprises at least 100 contiguous nucleotides of the selected nucleotide sequence.

13. The isolated cDNA of claims 10, 11, or 12, wherein amplification is by polymerase chain reaction (PCR) amplification.

14. An isolated recombinant host cell containing the polynucleotide according to claims 1, 2, 3, 6, 9, or 10.

15. An isolated vector comprising the polynucleotide according to claims 1, 2, 3, 6, 9, or 10.

16. A method for producing a polypeptide, the method comprising the steps of: culturing a recombinant host cell containing the polynucleotide according to claims 1, 2, 3, 6, 9, or 10, said culturing being under conditions suitable for the expression of an encoded polypeptide; and recovering the polypeptide from the host cell culture.

17. An isolated polypeptide encoded by the polynucleotide according to claims 1, 2, 3, 6, 9, or 10.

18. An isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 1486-1542.

19. An antibody that specifically binds the polypeptide of claim 17 or 18.

20. A library of polynucleotides, wherein at least one of the polynucleotides comprises the sequence information of the polynucleotide according to claims 1, 2, 3, 6, 9, or 10.

21. The library of claim 20, wherein the library is provided on a nucleic acid array.

22. The library of claim 20, wherein the library is provided in a computer-readable format.

23. A method for detecting a cancerous cell, said method comprising: detecting a level of a product of a gene in a test sample obtained from a cell of a subject, wherein said gene is identified by a sequence having at least 80% sequence identity to a sequence selected from a group consisting of SEQ ID NOS: 1-1485, or a fragment thereof; and, comparing the level of said product to a control level of said gene product, wherein the presence of a cancerous cell is indicated by detection of said level and comparison to a control level of said gene product.

24. The method of claim 23, wherein said gene product is nucleic acid.

25. The method of claim 23, wherein said detecting step uses a polymerase chain reaction.

26. The method of claim 23, wherein said detecting step uses hybridization.

27. The method of claim 23, wherein said sample is a sample of prostate, colon or breast tissue.

28. A method for inhibiting a cancerous phenotype of a cell, said method comprising: contacting a mammalian cell with an agent for inhibition of a product of a gene, wherein said gene is identified by a sequence having at least 80% sequence identity to a sequence selected from a group consisting of SEQ ID NOS: 1-1485, or a fragment thereof.

29. The method of claim 28, wherein said cancerous phenotype is aberrant cellular proliferation relative to a normal cell.

30. A method of treating a subject with cancer, said method comprising: administering to a subject a pharmaceutically effective amount of an agent, wherein said agent modulates the activity of a product of a gene identified by a sequence having at least 80% sequence identity to a sequence selected from a group consisting of SEQ ID NOS: 1-1485, or a fragment thereof.

31. A method for identifying an agent that modulates a biological activity of a gene product differentially expressed in a cancerous cell as compared to a normal cell, said method comprising: contacting a candidate agent with a product of a gene encoded by a gene defined by a sequence having at least 80% sequence identity to a sequence selected from a group consisting of SEQ ID NOS: 1-1485, or a fragment thereof; and detecting modulation of a biological activity of the gene product relative to a level of biological activity of the gene product in the absence of the candidate agent.